QUICK REVIEW

[Paper Review] Causal Discovery in Physical Systems from Videos

Yunzhu Li, Antonio Torralba|arXiv (Cornell University)|Jul 1, 2020

Explainable Artificial Intelligence (XAI) | References: 53 | Cited by: 27

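ONE-SENTENCE SUMMARY

This paper proposes V-CDN, an end-to-end unsupervised framework that discovers causal structure from video by learning keypoint representations, inferring a causal graph with graph neural networks, and predicting future dynamics. The method generalizes in one shot to unseen interaction graph structures and supports counterfactual reasoning without ground-truth causal labels or explicit interventions.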

ABSTRACT

Causal discovery is at the core of human cognition. It enables us to reason about the environment and make counterfactual predictions about unseen scenarios that can vastly differ from our previous experiences. We consider the task of causal discovery from videos in an end-to-end fashion without supervision on the ground-truth graph structure. In particular, our goal is to discover the structural dependencies among environmental and object variables: inferring the type and strength of interactions that have a causal effect on the behavior of the dynamical system. Our model consists of (a) a perception module that extracts a semantically meaningful and temporally consistent keypoint representation from images, (b) an inference module for determining the graph distribution induced by the detected keypoints, and (c) a dynamics module that can predict the future by conditioning on the inferred graph. We assume access to different configurations and environmental conditions, i.e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions. We evaluate our method in a planar multi-body interaction environment and scenarios involving fabrics of different shapes like shirts and pants. Experiments demonstrate that our model can correctly identify the interactions from a short sequence of images and make long-term future predictions. The causal structure assumed by the model also allows it to make counterfactual predictions and extrapolate to systems of unseen interaction graphs or graphs of various sizes.

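RESEARCH MOTIVATION AND GOALS

Perform end-to-end causal discovery for physical systems from video, without access to the ground-truth causal graph or the hidden confounders.
Learn a compact, temporally consistent keypoint representation from raw images to support downstream causal modeling.
Infer the structural causal model (SCM) and hidden confounders from observational data collected under unknown interventions.
Use the inferred causal structure for long-term future prediction and counterfactual reasoning.
Generalize to graph topologies and object counts that were not seen during training.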

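PROPOSED METHOD

The perception module uses unsupervised keypoint detection to extract a semantically meaningful, temporally consistent representation from video frames.
The inference module uses a graph neural network to estimate exogenous variables and infer the causal graph structure among the keypoints.
The dynamics module predicts future keypoint trajectories conditioned on the inferred causal graph and hidden confounders.
The model treats data from diverse configurations and environmental conditions as implicit interventions, which lets it identify the true underlying causal graph.
The framework operates in a meta-learning setting to enable one-shot discovery of unseen causal mechanisms.
It jointly performs model-class estimation, parameter inference, and dynamics learning in an end-to-end, self-supervised manner.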
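
To make the data flow between the inference and dynamics modules concrete, here is a toy sketch in plain Python. It is not the paper's implementation: V-CDN learns both modules with graph neural networks, whereas the hypothetical `infer_edges` and `predict_next` below are hand-written heuristics (constant pairwise distance as a proxy for an interaction, and constant-velocity extrapolation with a single distance-correction pass). All names, thresholds, and the toy data are assumptions made for illustration.

```python
import itertools
import math
from statistics import pvariance


def infer_edges(trajectory, var_threshold=1e-3):
    """Hypothetical stand-in for the inference module (NOT the paper's GNN).

    trajectory: list of frames, each a list of (x, y) keypoints.
    Returns the set of pairs (i, j), i < j, whose distance stays nearly
    constant over the observed frames, a crude proxy for a rigid link.
    """
    n = len(trajectory[0])
    edges = set()
    for i, j in itertools.combinations(range(n), 2):
        dists = [math.dist(frame[i], frame[j]) for frame in trajectory]
        if pvariance(dists) < var_threshold:
            edges.add((i, j))
    return edges


def predict_next(prev, curr, edges):
    """Hypothetical stand-in for the dynamics module, conditioned on `edges`.

    Extrapolates each keypoint at constant velocity, then makes one pass
    over the edges, moving both endpoints so that their distance matches
    the distance in the current frame.
    """
    nxt = [[2 * c[0] - p[0], 2 * c[1] - p[1]] for p, c in zip(prev, curr)]
    for i, j in sorted(edges):
        rest = math.dist(curr[i], curr[j])
        dx, dy = nxt[j][0] - nxt[i][0], nxt[j][1] - nxt[i][1]
        d = math.hypot(dx, dy)
        if d == 0:
            continue
        corr = 0.5 * (d - rest) / d  # each endpoint absorbs half the error
        nxt[i][0] += corr * dx
        nxt[i][1] += corr * dy
        nxt[j][0] -= corr * dx
        nxt[j][1] -= corr * dy
    return [tuple(p) for p in nxt]


# Toy observation: keypoints 0 and 1 translate together; keypoint 2 moves on its own.
frames = [[(t, 0.0), (t, 1.0), (0.0, 5.0 + t * t)] for t in range(5)]
edges = infer_edges(frames)
prediction = predict_next(frames[-2], frames[-1], edges)
print(edges, prediction)
```

The point of the sketch is the interface: a short window of keypoint frames goes in, a graph comes out, and the predictor takes that graph as an explicit input rather than hiding it in its weights.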

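EXPERIMENTAL RESULTS

Research Questions

RQ1: Can the model discover the true causal graph among physical entities from unseen videos, without ground-truth labels or explicit interventions?
RQ2: Can the model generalize at inference time to unseen interaction graph structures and to different numbers of objects?
RQ3: Can the inferred causal structure support accurate long-term future prediction and counterfactual reasoning?
RQ4: How robust is the method to input noise and to changes in system configuration?
RQ5: Can the model learn interpretable causal mechanisms for complex physical systems from purely visual data?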

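Key Findings

The model identifies the causal interactions from short video sequences of multi-body physical systems of varying complexity.
It generalizes to unseen interaction graph structures and to systems with different numbers of objects, demonstrating one-shot causal discovery.
The inferred causal structure supports long-term future prediction beyond the training distribution.
The model supports counterfactual reasoning by modifying the causal graph and predicting the alternative outcome.
Experiments in the fabric environment show that the method generalizes across shapes and topologies, such as shirts and pants.
The framework is robust to input noise and performs well under unknown interventions, which supports its ability to discover causal structure without supervision.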
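
The counterfactual finding above amounts to rerunning the same rollout under an edited graph. Below is a minimal sketch of that idea, assuming a hand-written one-dimensional spring system instead of the paper's learned dynamics module; the `rollout` function, the spring constants, and the initial state are all hypothetical choices for illustration.

```python
def rollout(pos, vel, edges, steps=50, dt=0.05, k=4.0, rest=1.0):
    """Semi-implicit Euler rollout of unit-mass points on a line.

    edges: iterable of (i, j) pairs, each treated as a spring of stiffness k
    and rest length `rest` (all values here are illustrative assumptions).
    Returns the final positions.
    """
    pos, vel = list(pos), list(vel)
    for _ in range(steps):
        force = [0.0] * len(pos)
        for i, j in edges:
            d = pos[j] - pos[i]
            direction = 1.0 if d >= 0 else -1.0
            f = k * (abs(d) - rest) * direction  # stretched spring pulls i toward j
            force[i] += f
            force[j] -= f
        vel = [v + dt * f for v, f in zip(vel, force)]  # update velocity first
        pos = [p + dt * v for p, v in zip(pos, vel)]    # then position
    return pos


start_pos = [0.0, 1.0, 2.0]   # three points at rest spacing
start_vel = [1.0, 0.0, 0.0]   # only point 0 is pushed

# Factual graph: a chain 0-1-2, so the push propagates to point 2.
factual = rollout(start_pos, start_vel, {(0, 1), (1, 2)})

# Counterfactual graph: the 1-2 edge is removed, so point 2 feels no force.
counterfactual = rollout(start_pos, start_vel, {(0, 1)})

print("factual:", factual)
print("counterfactual:", counterfactual)
```

With the edge removed, point 2 never leaves its starting position, while under the factual chain it is pushed to the right. The same initial conditions produce a different outcome only because the graph was changed.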

This review was generated by AI and checked by human editors.