QUICK REVIEW

[Paper Review] Discovering objects and their relations from entangled scene representations

David Raposo, Adam Santoro | arXiv (Cornell University) | February 16, 2017

Multimodal Machine Learning Applications | Citations: 73

One-Line Summary

Relation Networks (RNs) learn the relations between objects in a scene, are permutation invariant, and can factorize entangled inputs; paired with memory-augmented networks, they enable one-shot relational learning.

ABSTRACT

Our world can be succinctly and compactly described as structured scenes of objects and relations. A typical room, for example, contains salient objects such as tables, chairs and books, and these objects typically relate to each other by their underlying causes and semantics. This gives rise to correlated features, such as position, function and shape. Humans exploit knowledge of objects and their relations for learning a wide spectrum of tasks, and more generally when learning the structure underlying observed data. In this work, we introduce relation networks (RNs) - a general purpose neural network architecture for object-relation reasoning. We show that RNs are capable of learning object relations from scene description data. Furthermore, we show that RNs can act as a bottleneck that induces the factorization of objects from entangled scene description inputs, and from distributed deep representations of scene images provided by a variational autoencoder. The model can also be used in conjunction with differentiable memory mechanisms for implicit relation discovery in one-shot learning tasks. Our results suggest that relation networks are a potentially powerful architecture for solving a variety of problems that require object relation reasoning.

Motivation and Goals

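Argues that reasoning about objects and their relations in structured scenes is needed.
Proposes Relation Networks, a neural network architecture that operates on pairs of objects and is permutation invariant.
Demonstrates that RNs can classify scenes based on their relational structure.
Shows that an RN can act as a bottleneck that factorizes entangled inputs into object-like representations.
Demonstrates one-shot relational learning by combining RNs with memory-augmented neural networks.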
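Why does an RN not care about the order in which objects are listed? The core operation is a sum of one shared function over every ordered object pair, and reordering the objects only reorders the terms of that sum. The minimal sketch below checks this exhaustively for a small scene. `pair_relation` is a hypothetical stand-in for the learned pairwise function, and integer features are used so the comparison is exact rather than subject to floating-point rounding.

```python
import itertools
import random


def pair_relation(o_i, o_j):
    # Hypothetical stand-in for g_psi: an arbitrary, order-sensitive function
    # of one ordered object pair (o_i, o_j).
    return sum(a * b for a, b in zip(o_i, o_j)) + 2 * o_i[0] - o_j[-1]


def aggregate_pairs(objects):
    # Sum the pair function over every ordered pair (i, j), including i == j.
    # Integer addition is commutative and associative, so reordering the
    # objects only reorders the terms being added.
    return sum(pair_relation(o_i, o_j) for o_i in objects for o_j in objects)


def is_permutation_invariant(objects):
    # Compare the aggregate across every possible ordering of the objects.
    baseline = aggregate_pairs(objects)
    return all(
        aggregate_pairs(list(order)) == baseline
        for order in itertools.permutations(objects)
    )


rng = random.Random(0)
scene = [[rng.randint(-5, 5) for _ in range(3)] for _ in range(4)]  # 4 objects, 3 features
print(is_permutation_invariant(scene))  # True
```

Note that `pair_relation` itself is not symmetric in its arguments. Invariance comes from summing over all pairs, not from the pair function.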

Proposed Method

Represents each object as a feature vector in a scene description matrix D (m objects, n features per object).
Computes a relation for every pair of objects with a shared MLP $g_\psi$, then aggregates the results with a commutative and associative function $a$, such as a sum.
Applies a final function $f_\phi$ to the aggregate to produce the prediction, for example $\tilde{r} = f_\phi\left(\sum_{i,j} g_\psi(o_i, o_j)\right)$.
Evaluates RNs on supervised learning tasks whose targets are adjacency matrices describing the relations among objects.
Demonstrates that an RN can induce the factorization of objects from entangled inputs when preceded by a linear bottleneck layer or a VAE preprocessor.
Combines RNs with a Memory-Augmented Neural Network (MANN) to perform one-shot relational learning.
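The forward pass $\tilde{r} = f_\phi\left(\sum_{i,j} g_\psi(o_i, o_j)\right)$ can be sketched in plain Python as follows. This is a minimal, untrained illustration, not the authors' implementation. The layer sizes, the random initialization, the ReLU activations, and feeding $g_\psi$ the concatenation of the two object vectors are all assumptions of this sketch; `init_layer`, `dense`, `mlp`, and `relation_network` are hypothetical helper names.

```python
import random


def init_layer(n_in, n_out, rng):
    # One dense layer: an (n_out x n_in) weight matrix and a zero bias vector.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def mlp(x, layers):
    # ReLU after every hidden layer; the final layer stays linear.
    for layer in layers[:-1]:
        x = [max(0.0, v) for v in dense(x, layer)]
    return dense(x, layers[-1])


def relation_network(D, g_layers, f_layers):
    # D holds one feature vector per object (m objects, n features each).
    rel_dim = len(g_layers[-1][1])
    pair_sum = [0.0] * rel_dim
    for o_i in D:
        for o_j in D:
            # Shared g_psi applied to the concatenated pair (assumption of this sketch).
            rel = mlp(o_i + o_j, g_layers)
            pair_sum = [s + r for s, r in zip(pair_sum, rel)]
    # f_phi maps the summed relation vector to the prediction r_tilde.
    return mlp(pair_sum, f_layers)


rng = random.Random(0)
m, n, hidden, rel_dim, out_dim = 4, 3, 16, 8, 2  # hypothetical sizes
g_layers = [init_layer(2 * n, hidden, rng), init_layer(hidden, rel_dim, rng)]
f_layers = [init_layer(rel_dim, hidden, rng), init_layer(hidden, out_dim, rng)]
D = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]
r_tilde = relation_network(D, g_layers, f_layers)
print(len(r_tilde))  # 2
```

Because the same $g_\psi$ weights are reused for all $m^2$ pairs, the parameter count does not grow with the number of objects; only the cost of the pairwise loop does.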

Experimental Results

Research Questions

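RQ1: Can RNs learn object-relation structure from scene descriptions and generalize it?
RQ2: Can RNs infer object factorization and relations from entangled or pixel-based inputs?
RQ3: Do RNs support one-shot learning when combined with a memory module?
RQ4: How does RN performance on relational tasks compare with MLP baselines?
RQ5: Do RN-mediated representations enable generalization to unseen relation graphs?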

Key Findings

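RNs outperform MLPs of comparable size on relational scene classification tasks and generalize to unseen classes.
RNs infer object relations from entangled inputs when a linear disentangler is learned in front of the RN, and that disentangler develops emergent object-factorized representations.
A VAE-based perceptual pathway can feed its latent codes to an RN, demonstrating that RNs are compatible with distributed image representations.
A MANN with an RN preprocessor achieves high one-shot relational classification accuracy within episodes, whereas a MANN with an MLP preprocessor performs at chance level.
RN capabilities persist when RNs are combined with memory and perception modules, suggesting broad applicability to relational reasoning tasks.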
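The data flow behind the disentangler finding can be sketched as a single bias-free linear layer that maps an entangled input vector to m × n values, which are then split into m object vectors and passed to an RN-style pairwise sum. The review does not say how the entangled inputs are produced or what sizes were used, so everything here is hypothetical: the input size `k`, the random weights, and `toy_g`, which stands in for the learned $g_\psi$. In the setting described above, the linear weights would be trained jointly with the RN; this sketch shows only the untrained plumbing.

```python
import random


def linear(x, weights):
    # Bias-free linear map: one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def disentangle(x, weights, m, n):
    # Hypothetical linear disentangler: project the entangled vector x to
    # m * n values, then split them into m object vectors of n features.
    flat = linear(x, weights)
    return [flat[k * n:(k + 1) * n] for k in range(m)]


def pairwise_sum(objects, g):
    # RN-style aggregation: sum g over all ordered object pairs.
    total = [0] * len(g(objects[0], objects[0]))
    for o_i in objects:
        for o_j in objects:
            total = [t + r for t, r in zip(total, g(o_i, o_j))]
    return total


def toy_g(o_i, o_j):
    # Hypothetical relation function standing in for the learned g_psi.
    return [max(0, a - b) for a, b in zip(o_i, o_j)]


# Untrained example: random weights and an entangled input of hypothetical size k.
rng = random.Random(1)
m, n, k = 4, 3, 20
W = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(m * n)]
x = [rng.uniform(-1.0, 1.0) for _ in range(k)]
objects = disentangle(x, W, m, n)
relations = pairwise_sum(objects, toy_g)
print(len(objects), len(objects[0]), len(relations))  # 4 3 3
```

Since the RN sums over pairs of the disentangler's m output slices, training pressure from the relational task is what would push each slice toward representing one object; the linear layer itself has no built-in notion of objects.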

This review was generated by AI and reviewed by a human editor.