QUICK REVIEW

[Paper Review] Independently Controllable Factors

Valentin Thomas, Jules Pondard|arXiv (Cornell University)|Aug 3, 2017

Neural Networks and Applications | References: 6 | Citations: 51

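ONE-LINE SUMMARY

This work proposes a learning objective that jointly trains an autoencoder and a set of policies to discover independently controllable factors in an interactive environment, yielding disentangled representations without any extrinsic reward.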

ABSTRACT

It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal.

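Motivation and Objectives

Motivates representation learning that disentangles the factors of variation in interactive environments.
Introduces a mechanism in which some factors are independently controllable by learned policies.
Proposes an objective function that combines a reconstruction loss with a selectivity (disentanglement) term.
Shows that controllable factors can be recovered without any extrinsic reward.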

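Proposed Method

Defines a latent representation with an autoencoder and learns one policy per latent feature.
Introduces a selectivity objective that measures how much a policy changes only its own associated feature.
Minimizes the reconstruction loss while maximizing selectivity to disentangle the controllable factors.
Indexes factors by learned embeddings and extends the approach to continuous embeddings using an attribute-variation selector.
Optimizes the selectivity objective with a policy gradient method (REINFORCE).
Demonstrates the method on gridworld and MazeBase environments, showing that the controllable factors are disentangled.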
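
The steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple ratio-style selectivity (the share of the total absolute latent change that falls on feature k), a mean-squared reconstruction error, and a softmax policy over discrete actions. The names `selectivity`, `total_loss`, `reinforce_grad`, and the weight `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np

def selectivity(h, h_next, k, eps=1e-8):
    """Share of the total absolute latent change that falls on feature k.

    Assumed ratio form for illustration; not necessarily the paper's exact term.
    """
    delta = np.abs(np.asarray(h_next, dtype=float) - np.asarray(h, dtype=float))
    return float(delta[k] / (delta.sum() + eps))

def total_loss(x, x_recon, h, h_next, k, lam=1.0):
    """Reconstruction error minus a weighted selectivity term (lower is better)."""
    x = np.asarray(x, dtype=float)
    x_recon = np.asarray(x_recon, dtype=float)
    recon = float(np.mean((x - x_recon) ** 2))
    return recon - lam * selectivity(h, h_next, k)

def reinforce_grad(logits, action, reward):
    """REINFORCE estimate of the gradient of expected reward w.r.t. softmax logits.

    Uses grad log pi(action) = one_hot(action) - probs, scaled by the reward.
    """
    z = np.asarray(logits, dtype=float)
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    one_hot = np.zeros_like(probs)
    one_hot[action] = 1.0
    return reward * (one_hot - probs)

# Toy transition: the action moved only latent feature 0.
h = np.array([0.0, 0.0, 0.0])
h_next = np.array([0.5, 0.0, 0.0])

sel_on = selectivity(h, h_next, k=0)   # close to 1: the change is selective for feature 0
sel_off = selectivity(h, h_next, k=1)  # 0: feature 1 did not move

# Selectivity serves as the reward for the policy attached to feature 0.
grad = reinforce_grad(np.zeros(3), action=0, reward=sel_on)
```

In this toy case the policy attached to feature 0 receives a reward near 1, so the gradient raises the logit of the action it took and lowers the others. A full training loop would also backpropagate `total_loss` through the autoencoder, which this sketch leaves out.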

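Experimental Results

Research Questions

RQ1: Can independently controllable factors be discovered autonomously by jointly learning features and policies?
RQ2: Can minimizing reconstruction error while maximizing selectivity disentangle controllable factors without extrinsic reward?
RQ3: How can continuous embeddings be used to scale the method to complex environments?
RQ4: Can the learned representation support planning and policy inference tasks?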

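Key Findings

In the gridworld setting, the method learns latent features that correspond to controllable factors such as object position.
Directed selectivity achieves disentanglement and recovers the ground-truth factors without explicit supervision.
In MazeBase, the continuous-embedding approach clusters variations that correspond to the underlying factors, enabling planning-like inference.
The approach yields a latent space in which changes in controllable factors correspond to distinct directions in feature space, enabling simple policy inference tasks.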
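
The last finding suggests one simple way to read the learned space: if each controllable factor moves the latent code along its own direction, an observed transition can be attributed to a factor by comparing its latent change against those directions. The sketch below assumes this reading and uses cosine similarity. The function `infer_factor` and the four 2-D directions are hypothetical and are not taken from the paper.

```python
import numpy as np

def infer_factor(h, h_next, directions):
    """Return the index of the direction most aligned with the latent change."""
    delta = np.asarray(h_next, dtype=float) - np.asarray(h, dtype=float)
    dirs = np.asarray(directions, dtype=float)
    # Cosine similarity between the observed change and each candidate direction.
    norms = np.linalg.norm(dirs, axis=1) * np.linalg.norm(delta)
    cos = dirs @ delta / np.maximum(norms, 1e-12)
    return int(np.argmax(cos))

# Hypothetical directions for four factors (for example, moves along two axes).
directions = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])

# The latent code moved along the second axis, so direction 1 matches best.
best = infer_factor([0.2, 0.3], [0.2, 0.8], directions)
```

Here `best` is 1, the index of the direction `[0.0, 1.0]`. In a trained model the directions would be estimated from the transitions each policy produces rather than fixed by hand.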


This review was generated by AI and checked by a human editor.