QUICK REVIEW

[Paper Review] Causal Discovery in Physical Systems from Videos

Yunzhu Li, Antonio Torralba|arXiv (Cornell University)|Jul 1, 2020

Explainable Artificial Intelligence (XAI) | References: 53 | Citations: 27

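One-Line Summary

This paper proposes V-CDN, an end-to-end unsupervised framework that discovers causal structure from video data by learning a keypoint representation, inferring a causal graph with a graph neural network, and predicting future dynamics. The method enables one-shot generalization to unseen interaction graphs and counterfactual reasoning, without supervised causal labels or explicit interventions.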

ABSTRACT

Causal discovery is at the core of human cognition. It enables us to reason about the environment and make counterfactual predictions about unseen scenarios that can vastly differ from our previous experiences. We consider the task of causal discovery from videos in an end-to-end fashion without supervision on the ground-truth graph structure. In particular, our goal is to discover the structural dependencies among environmental and object variables: inferring the type and strength of interactions that have a causal effect on the behavior of the dynamical system. Our model consists of (a) a perception module that extracts a semantically meaningful and temporally consistent keypoint representation from images, (b) an inference module for determining the graph distribution induced by the detected keypoints, and (c) a dynamics module that can predict the future by conditioning on the inferred graph. We assume access to different configurations and environmental conditions, i.e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions. We evaluate our method in a planar multi-body interaction environment and scenarios involving fabrics of different shapes like shirts and pants. Experiments demonstrate that our model can correctly identify the interactions from a short sequence of images and make long-term future predictions. The causal structure assumed by the model also allows it to make counterfactual predictions and extrapolate to systems of unseen interaction graphs or graphs of various sizes.

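Motivation and Objectives

Enable end-to-end causal discovery from videos of physical systems without access to the ground-truth causal graph or the hidden confounders.
Learn a compact, temporally consistent keypoint representation from raw images for downstream causal modeling.
Estimate the structural causal model (SCM) and the hidden confounders from observational data collected under unknown interventions.
Use the inferred causal structure to enable long-term future prediction and counterfactual reasoning.
Generalize to systems whose graph topology or number of objects was not seen during training.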

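Proposed Method

The perception module uses unsupervised keypoint detection to extract a semantically meaningful, temporally consistent representation from video frames.
The inference module uses a graph neural network to estimate the exogenous variables and infer the causal graph structure over the keypoints.
The dynamics module predicts future keypoint trajectories conditioned on the inferred causal graph and the hidden confounders.
The model treats data from diverse configurations and environmental conditions as implicit interventions for identifying the true causal graph.
A meta-learning setup enables one-shot discovery of unseen causal mechanisms.
Model class estimation, parameter estimation, and dynamics learning are performed jointly, end to end, in a self-supervised manner.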
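
The flow from perception to graph-conditioned dynamics can be pictured with a toy pipeline. The sketch below is not the authors' implementation. It assumes keypoints are read off heatmaps with a spatial soft-argmax, and it swaps the learned dynamics network for a hand-written spring law so that the role of the inferred graph is easy to see. The names `spatial_soft_argmax` and `graph_conditioned_step`, and the parameters `stiffness`, `rest_length`, and `dt`, are hypothetical.

```python
import numpy as np


def spatial_soft_argmax(heatmaps):
    """Convert K heatmaps of shape (K, H, W) into K (x, y) keypoints in [-1, 1].

    Assumption: a soft-argmax readout; the review does not specify the detector.
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # subtract max for stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(k, h, w)
    xs = np.linspace(-1.0, 1.0, w)
    ys = np.linspace(-1.0, 1.0, h)
    x = (probs.sum(axis=1) * xs).sum(axis=1)  # expected column coordinate
    y = (probs.sum(axis=2) * ys).sum(axis=1)  # expected row coordinate
    return np.stack([x, y], axis=1)


def graph_conditioned_step(positions, velocities, adjacency,
                           stiffness=1.0, rest_length=1.0, dt=0.1):
    """Advance keypoints one step; only edges with adjacency[i, j] != 0 act.

    Assumption: a spring force stands in for the learned dynamics module.
    """
    diff = positions[None, :, :] - positions[:, None, :]  # diff[i, j] = p_j - p_i
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)
    direction = np.divide(diff, dist, out=np.zeros_like(diff), where=dist > 0)
    force = adjacency[..., None] * stiffness * (dist - rest_length) * direction
    acceleration = force.sum(axis=1)  # sum the influence of all parents j on i
    velocities = velocities + dt * acceleration
    positions = positions + dt * velocities
    return positions, velocities


# Usage: two heatmaps -> two keypoints -> one step under a hypothetical graph.
heatmaps = np.zeros((2, 5, 5))
heatmaps[0, 2, 0] = 50.0  # peak at the left edge
heatmaps[1, 2, 4] = 50.0  # peak at the right edge
keypoints = spatial_soft_argmax(heatmaps)
graph = np.array([[0.0, 1.0], [1.0, 0.0]])  # stand-in for the inferred graph
next_pos, next_vel = graph_conditioned_step(
    keypoints, np.zeros_like(keypoints), graph)
```

In the framework described above, the adjacency matrix would come from the inference module rather than being written by hand, and the force law would be learned from video.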

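Experimental Results

Research Questions

RQ1: Can the model discover the true causal graph among physical entities from video, without supervised labels or explicit interventions?
RQ2: Can the model generalize at inference time to unseen interaction graph structures and to different numbers of objects?
RQ3: Does the inferred causal structure enable accurate long-term future prediction and counterfactual reasoning?
RQ4: How robust is the method to input noise and to changes in system configuration?
RQ5: Can the model learn interpretable causal mechanisms from purely visual data in complex physical systems?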

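Key Findings

The model effectively identified causal interactions from short video sequences of multi-body physical systems of varying complexity.
It generalized to unseen interaction graphs and to systems with different numbers of objects, demonstrating one-shot causal discovery.
The inferred causal structure enabled accurate long-term future prediction beyond the training distribution.
It supported counterfactual reasoning by modifying the causal graph and predicting the alternative outcome.
Experiments in the fabric environment confirmed generalization across different shapes and topologies, such as shirts and pants.
It was robust to input noise and performed well under unknown interventions, supporting its capacity for unsupervised causal discovery.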
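
Counterfactual prediction by editing the graph can be illustrated with a toy system. This is a minimal sketch under stated assumptions, not the paper's model: the linear coupling rule, the function name `rollout`, and the example graphs are all hypothetical. The initial state is held fixed, one edge is cut, and the two rollouts are compared.

```python
import numpy as np


def rollout(x0, adjacency, steps=20, dt=0.1):
    """Roll out a toy coupled system in which each node drifts toward its parents.

    Assumption: linear coupling stands in for a learned dynamics model.
    """
    x = np.asarray(x0, dtype=float).copy()
    trajectory = [x.copy()]
    for _ in range(steps):
        # pull[i] = sum_j adjacency[i, j] * (x[j] - x[i])
        pull = adjacency @ x - adjacency.sum(axis=1) * x
        x = x + dt * pull
        trajectory.append(x.copy())
    return np.stack(trajectory)


# Factual graph: node 1 is influenced by node 0; node 2 is isolated.
factual = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0]])

# Counterfactual graph: cut the 0 -> 1 edge and keep everything else fixed.
counterfactual = factual.copy()
counterfactual[1, 0] = 0.0

x0 = [0.0, 1.0, 2.0]
factual_traj = rollout(x0, factual)
counterfactual_traj = rollout(x0, counterfactual)

# The gap between the two rollouts is the effect attributed to the removed edge.
effect_of_edge = factual_traj[-1] - counterfactual_traj[-1]
```

Only node 1 differs between the two rollouts, because it is the only node downstream of the removed edge.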

This review was generated by AI and checked by a human editor.