QUICK REVIEW

[Paper Review] Causal Induction from Visual Observations for Goal Directed Tasks

Suraj Nair, Yuke Zhu|arXiv (Cornell University)|Oct 3, 2019

Multimodal Machine Learning Applications | References: 45 | Citations: 45

ONE-LINE SUMMARY

The paper presents iterative causal induction from visual observations and an attention-based goal-conditioned policy to enable agents to complete multi-step, goal-directed tasks in environments with unseen causal structures.

ABSTRACT

Causal reasoning has been an indispensable capability for humans and other intelligent animals to interact with the physical world. In this work, we propose to endow an artificial agent with the capability of causal reasoning for completing goal-directed tasks. We develop learning-based approaches to inducing causal knowledge in the form of directed acyclic graphs, which can be used to contextualize a learned goal-conditional policy to perform tasks in novel environments with latent causal structures. We leverage attention mechanisms in our causal induction model and goal-conditional policy, enabling us to incrementally generate the causal graph from the agent's visual observations and to selectively use the induced graph for determining actions. Our experiments show that our method effectively generalizes towards completing new tasks in novel environments with previously unseen causal structures.

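MOTIVATION AND OBJECTIVES

Motivates enabling agents to carry out goal-directed tasks by reasoning about latent causal structure.
Proposes a two-stage meta-learning framework: causal induction, which builds a DAG over macro-variables from observations, and causal reasoning, which guides a goal-conditioned policy.
Develops an iterative causal induction network with an attention mechanism that incrementally updates the causal graph from interaction data.
Introduces an attention-based graph encoding in the policy that focuses on the relevant causal edges at each step.
Shows that decoupling induction from reasoning through the causal graph generalizes to unseen structures from a limited number of training cases.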

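PROPOSED METHOD

An iterative causal induction network $F$ builds a DAG $\hat{C}$ from trajectories of visual observations and actions.
The Edge Decoder outputs an edge update $\Delta e$ and an attention vector that determines which graph nodes the update is applied to.
The attention bottleneck in the policy, with attention weights $\alpha$, focuses on the edges relevant to the current step when selecting an action.
The policy $\gamma_G(s, g, \hat{C})$ uses attention over the causal graph to select edges and produce an action.
Training uses supervised learning for the induction network, minimizing the L2 loss between the ground-truth graph and the predicted $\hat{C}$, and DAgger with oracle guidance for the policy.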
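The induction and policy steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`soft_update`, `induce_graph`, `l2_loss`, `attend_to_edges`) are hypothetical, the Edge Decoder and the policy encoder are replaced by hand-written outputs, and the outer-product form of the update (attention over source nodes multiplied by the edge update) is an assumption about how the attention vector applies $\Delta e$ to the graph.

```python
import math


def soft_update(graph, delta_e, attention):
    """Assumed update rule: graph[i][j] += attention[i] * delta_e[j]."""
    return [
        [graph[i][j] + attention[i] * delta_e[j] for j in range(len(delta_e))]
        for i in range(len(attention))
    ]


def induce_graph(num_nodes, decoder_outputs):
    """Start from an empty graph and apply one soft update per step.

    decoder_outputs is a list of (delta_e, attention) pairs standing in
    for what a learned Edge Decoder would emit at each trajectory step.
    """
    graph = [[0.0] * num_nodes for _ in range(num_nodes)]
    for delta_e, attention in decoder_outputs:
        graph = soft_update(graph, delta_e, attention)
    return graph


def l2_loss(predicted, target):
    """Sum of squared differences between two edge-weight matrices."""
    return sum(
        (p - t) ** 2
        for pred_row, true_row in zip(predicted, target)
        for p, t in zip(pred_row, true_row)
    )


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_edges(graph, logits):
    """Attention bottleneck: softmax-weighted sum over the rows of the graph.

    logits are hypothetical scores a policy encoder would compute from the
    current state and goal; the result is the edge vector the policy sees.
    """
    alpha = softmax(logits)
    return [
        sum(alpha[i] * graph[i][j] for i in range(len(graph)))
        for j in range(len(graph[0]))
    ]


# Two hand-written decoder steps: node 0 -> node 1, then node 1 -> node 2.
steps = [
    ([0.0, 1.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 0.0, 1.0], [0.0, 1.0, 0.0]),
]
induced = induce_graph(3, steps)
selected = attend_to_edges(induced, [5.0, 0.0, 0.0])  # mostly row 0
```

Routing the graph through a single attended edge vector is what makes this a bottleneck: the action head only sees the part of the graph the attention selects, rather than the full matrix.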

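EXPERIMENTAL RESULTS

Research Questions

RQ1: Can the iterative, attention-based causal induction network accurately recover the underlying causal graph from visual interaction data?
RQ2: Does the attention bottleneck in the goal-conditioned policy improve generalization to unseen causal structures?
RQ3: Does combining iterative causal induction with the attention-based policy outperform prior methods on visual goal-directed tasks with novel causal relationships?
RQ4: How does performance vary with the number of causal structures seen during training and with task size?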

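Key Findings

The iterative induction network with attention (ICIN) recovers the causal graph better than non-iterative variants and ablations, as measured by F1 score on unseen structures.
The policy with an attention bottleneck (ICIN) achieves a higher success rate than the baselines on unseen causal structures, regardless of the number of switches or the structure type.
ICIN nearly matches Oracle performance in the 5-switch, 50-seen-structures setting, indicating that causal graph induction is strong.
The attention bottleneck in the policy substantially improves generalization, by about 10 points in the 1:1 and Masterswitch cases and about 40 points in the 1:K and K:1 cases.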
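To make the graph-recovery metric concrete, the sketch below computes an edge-level F1 score between a predicted and a ground-truth adjacency matrix. The helper name `edge_f1` and the 0.5 threshold used to binarize predicted edge weights are assumptions for illustration; the review does not state the exact evaluation protocol.

```python
def edge_f1(predicted, target, threshold=0.5):
    """Edge-level F1 between two adjacency matrices (lists of lists).

    An entry counts as an edge when it exceeds `threshold`
    (an assumed binarization rule, not taken from the paper).
    """
    true_pos = false_pos = false_neg = 0
    for pred_row, true_row in zip(predicted, target):
        for p, t in zip(pred_row, true_row):
            pred_edge = p > threshold
            true_edge = t > threshold
            if pred_edge and true_edge:
                true_pos += 1
            elif pred_edge:
                false_pos += 1
            elif true_edge:
                false_neg += 1
    if true_pos == 0:
        return 0.0  # no correctly recovered edges
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Ground truth: 0 -> 1 and 1 -> 2. The prediction adds a spurious 2 -> 0 edge.
truth = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
guess = [[0.1, 0.9, 0.0], [0.0, 0.2, 0.8], [0.7, 0.0, 0.0]]
score = edge_f1(guess, truth)  # precision 2/3, recall 1
```

Both true edges are recovered here, so only the single spurious edge lowers the score.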

This review was generated by AI and checked by a human editor.