QUICK REVIEW

[Paper Review] Towards a Definition of Disentangled Representations

Irina Higgins, David Amos | arXiv (Cornell University) | 2018-12-05

Generative Adversarial Networks and Image Synthesis | References: 5 | Citations: 294

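One-Line Summary

The paper uses group theory and symmetry to give a formal definition of disentangled representations: a representation is disentangled when it decomposes into subspaces, each of which is transformed by exactly one subgroup of the world's symmetry group and left unchanged by the others.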
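The summary can be restated compactly in symbols. The block below is a sketch in the notation used elsewhere in this review (world states W, symmetry group G, representation space Z, map f); the component-wise form in the last line is one way of writing "each subspace is affected by only one subgroup" and assumes the product decomposition of Z.

```latex
% Setting: G acts on the world states W, and f : W \to Z is the representation.
G = G_1 \times \cdots \times G_n, \qquad Z = Z_1 \times \cdots \times Z_n

% (1) G also acts on Z, and (2) f is equivariant between the two actions:
g \cdot f(w) = f(g \cdot w) \qquad \forall\, g \in G,\ w \in W

% (3) each factor Z_i is affected only by G_i and is invariant to G_j for j \neq i:
(g_1, \dots, g_n) \cdot (z_1, \dots, z_n) = (g_1 \cdot z_1, \dots, g_n \cdot z_n)
```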

ABSTRACT

How can intelligent agents solve a diverse set of tasks in a data-efficient manner? The disentangled representation learning approach posits that such an agent would benefit from separating out (disentangling) the underlying structure of the world into disjoint parts of its representation. However, there is no generally agreed-upon definition of disentangling, not least because it is unclear how to formalise the notion of world structure beyond toy datasets with a known ground truth generative process. Here we propose that a principled solution to characterising disentangled representations can be found by focusing on the transformation properties of the world. In particular, we suggest that those transformations that change only some properties of the underlying world state, while leaving all other properties invariant, are what gives exploitable structure to any kind of data. Similar ideas have already been successfully applied in physics, where the study of symmetry transformations has revolutionised the understanding of the world structure. By connecting symmetry transformations to vector representations using the formalism of group and representation theory we arrive at the first formal definition of disentangled representations. Our new definition is in agreement with many of the current intuitions about disentangling, while also providing principled resolutions to a number of previous points of contention. While this work focuses on formally defining disentangling - as opposed to solving the learning problem - we believe that the shift in perspective to studying data transformations can stimulate the development of better representation learning algorithms.

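Motivation and Objectives

Propose a principled, formal definition of disentangled representations based on symmetry transformations.
Connect concepts from physics (group theory and representation theory) to representations in machine learning.
Clarify what the generative factors of the data are and how they can be represented and manipulated.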

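Proposed Method

Introduces symmetry transformations as group actions that change some properties of the world state while leaving all other properties invariant.
Posits that a vector representation decomposes into independent subspaces, each affected by exactly one subgroup of the world's symmetry group.
Equivariance condition: for a representation f to count as disentangled, there must be an action of G on Z such that f is G-equivariant with respect to the world W, i.e. g · f(w) = f(g · w) for every g in G and w in W.
Formalises disentangled representations with respect to a decomposition G = G1 × ... × Gn and a corresponding decomposition of Z as Z1 ⊕ ... ⊕ Zn or Z1 × ... × Zn.
Discusses linear disentangled representations, in which each subgroup acts linearly on its subspace.
Provides a worked grid-world example to illustrate the concepts.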
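The conditions listed above can be checked numerically on a small grid world. The following is a minimal sketch, not the paper's code: it assumes an N×N wrap-around grid with N colours, takes G as a product of three cyclic groups (horizontal shift, vertical shift, colour shift), and uses a representation that maps each coordinate to a point on the unit circle. The grid size `N` and all function names are hypothetical choices made for illustration.

```python
import cmath
import itertools

N = 5  # assumed grid size and number of colours (hypothetical choice)

def act_world(g, w):
    """Apply g = (dx, dy, dc) to the world state w = (x, y, c), wrapping modulo N."""
    return tuple((wi + gi) % N for wi, gi in zip(w, g))

def f(w):
    """Representation: one complex number on the unit circle per world property."""
    return tuple(cmath.exp(2j * cmath.pi * wi / N) for wi in w)

def act_repr(g, z):
    """Action of G on Z: each subgroup rotates only its own coordinate of z."""
    return tuple(zi * cmath.exp(2j * cmath.pi * gi / N) for zi, gi in zip(z, g))

def close(a, b, tol=1e-9):
    """Component-wise comparison of two tuples of complex numbers."""
    return all(abs(ai - bi) < tol for ai, bi in zip(a, b))

def is_equivariant():
    """Check g . f(w) == f(g . w) for every group element and world state."""
    elements = list(itertools.product(range(N), repeat=3))
    return all(close(act_repr(g, f(w)), f(act_world(g, w)))
               for g in elements for w in elements)

def moves_only_own_subspace(i):
    """Check that the i-th subgroup leaves every other coordinate of Z fixed."""
    for step in range(N):
        g = tuple(step if k == i else 0 for k in range(3))
        for w in itertools.product(range(N), repeat=3):
            z = f(w)
            gz = act_repr(g, z)
            if any(abs(z[k] - gz[k]) > 1e-9 for k in range(3) if k != i):
                return False
    return True

equivariant = is_equivariant()
disentangled = all(moves_only_own_subspace(i) for i in range(3))
```

Under these assumptions both `equivariant` and `disentangled` evaluate to `True`: the representation commutes with the group action, and a horizontal move, for example, never changes the vertical or colour coordinate of the representation.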

Figure 1 : A : an example grid world where the object can move horizontally or vertically, as well as change colour. Moving beyond the edge of the grid transports the object to the opposite side of the grid. B : an example of non-commutativity of 3D rotations. Rotating by $90^{\circ}$ along axis 1 t
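The non-commutativity mentioned in panel B is easy to verify directly. The sketch below uses exact integer matrices for 90-degree rotations about the x and y axes; the axis choice and the names `ROT_X_90` and `ROT_Y_90` are illustrative assumptions and are not taken from the figure.

```python
def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Exact rotation matrices for a 90-degree turn about the x axis and the y axis.
ROT_X_90 = [[1, 0, 0],
            [0, 0, -1],
            [0, 1, 0]]
ROT_Y_90 = [[0, 0, 1],
            [0, 1, 0],
            [-1, 0, 0]]

# Matrices act on column vectors, so the rightmost factor is applied first.
x_then_y = matmul(ROT_Y_90, ROT_X_90)
y_then_x = matmul(ROT_X_90, ROT_Y_90)

rotations_commute = x_then_y == y_then_x
```

Here `rotations_commute` is `False`: the order of the two rotations changes the result. This matters for the definition because a direct product G1 × ... × Gn requires elements of different factors to commute, so rotations about different axes cannot simply be assigned to separate factors the way horizontal and vertical translations can.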

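Results

Research Questions

RQ1: How can symmetry transformations be formalised so as to define disentangled representations?
RQ2: Under what conditions on the decomposition of the world's symmetry group is a representation guaranteed to separate the factors of variation into independent subspaces?
RQ3: How does equivariance relate to the ability to map world states into the representation space while preserving the symmetry structure?
RQ4: What are the implications of choosing a different subgroup decomposition for a given dataset?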

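Key Results

Presents the first principled, formal definition of disentangled representations, grounded in group theory and representation theory.
Shows that a disentangled representation corresponds to a decomposition of the representation space that is aligned with a decomposition of the world's symmetry group.
Argues that several subgroup decompositions may exist, but that only a natural decomposition reflecting the structure of the world yields useful disentangling.
Emphasises that disentangled representations can improve learning efficiency by promoting compositionality and by allowing transformations to act linearly.
Clarifies how disentangling can be evaluated against a symmetry-based definition rather than purely empirical intuition.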
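The points about linearity and compositionality can be illustrated with a block-diagonal matrix representation. This is a hedged sketch under simplifying assumptions, not a construction taken from the paper: it considers only two cyclic factors (horizontal and vertical shifts on a hypothetical grid of size `N`), represents each by a 2×2 rotation matrix, and stacks the two blocks on the diagonal. The helper names are invented for illustration.

```python
import itertools
import math

N = 4  # assumed grid size (hypothetical choice)

def rotation(k):
    """2x2 rotation by 2*pi*k/N, a linear representation of the cyclic group of order N."""
    a = 2 * math.pi * k / N
    return [[math.cos(a), -math.sin(a)],
            [math.sin(a), math.cos(a)]]

def block_diag(a, b):
    """4x4 matrix with the 2x2 blocks a and b on the diagonal and zeros elsewhere."""
    out = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            out[i][j] = a[i][j]
            out[i + 2][j + 2] = b[i][j]
    return out

def rho(g):
    """Linear action of g = (gx, gy): the x block sees only gx, the y block only gy."""
    gx, gy = g
    return block_diag(rotation(gx), rotation(gy))

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def allclose(a, b, tol=1e-9):
    """Entry-wise comparison of two matrices of the same shape."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def compose(g, h):
    """Group law in the product of two cyclic groups of order N."""
    return tuple((gi + hi) % N for gi, hi in zip(g, h))

elements = list(itertools.product(range(N), repeat=2))

# Compositionality: applying h and then g equals applying the single element g*h.
is_homomorphism = all(allclose(matmul(rho(g), rho(h)), rho(compose(g, h)))
                      for g in elements for h in elements)

# Disentanglement: a purely horizontal move acts as the identity on the y block.
y_block = [row[2:] for row in rho((1, 0))[2:]]
x_move_fixes_y = allclose(y_block, [[1.0, 0.0], [0.0, 1.0]])
```

In this sketch `is_homomorphism` and `x_move_fixes_y` both come out `True`. The zero off-diagonal blocks are what make the representation disentangled in the linear sense: each subgroup acts by matrix multiplication on its own subspace and does nothing to the other.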

Figure 2 : Top left : pixel observations $o\in O$ of world states $w\in W$ under the action of group $G=G_{x}\times G_{y}\times G_{c}$ , where $G_{x}$ stands for a cyclic group of translations along the x coordinate, $G_{y}$ stands for a cyclic group of translations along the y coordinate, and $G_{c

This review was generated by AI and checked by a human editor.