QUICK REVIEW

[Paper Review] Physics-as-Inverse-Graphics: Joint Unsupervised Learning of Objects and Physics from Video.

Miguel Jaques, Michael Burke|arXiv (Cornell University)|May 27, 2019

Model Reduction and Neural Networks|References: 43|Cited by: 16

ONE-SENTENCE SUMMARY

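The paper proposes a physics-as-inverse-graphics framework that jointly learns object identities, states, and physical parameters from video, without object or state supervision. By combining a differentiable physics engine with vision-as-inverse-graphics, the method achieves accurate long-term video prediction on systems such as ball-spring and three-body gravitational systems, and it enables data-efficient, interpretable model-predictive control, which the abstract demonstrates on a pendulum.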

ABSTRACT

We propose a model that is able to perform unsupervised physical parameter estimation of systems from video, where the differential equations governing the scene dynamics are known, but labeled states or objects are not available. Existing physical scene understanding methods require either object state supervision, or do not integrate with differentiable physics to learn interpretable system parameters and states. We address this problem through a physics-as-inverse-graphics approach that brings together vision-as-inverse-graphics and differentiable physics engines, enabling objects and explicit state and velocity representations to be discovered. This framework allows us to perform long term extrapolative video prediction, as well as vision-based model-predictive control. Our approach significantly outperforms related unsupervised methods in long-term future frame prediction of systems with interacting objects (such as ball-spring or 3-body gravitational systems), due to its ability to build dynamics into the model as an inductive bias. We further show the value of this tight vision-physics integration by demonstrating data-efficient learning of vision-actuated model-based control for a pendulum system. We also show that the controller's interpretability provides unique capabilities in goal-driven control and physical reasoning for zero-data adaptation.

MOTIVATION AND OBJECTIVES

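Estimate physical parameters without supervision in dynamic scenes where object states and labels are unavailable.
Combine vision-as-inverse-graphics with a differentiable physics engine to jointly discover objects, their states, and the parameters of the system dynamics.
Improve long-term video prediction for systems of interacting objects, such as ball-spring or gravitational systems.
Enable data-efficient, vision-based model-predictive control with an interpretable controller that supports physical reasoning and zero-data adaptation.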
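
The setting assumes that the governing differential equations are known and only their parameters are not. For concreteness, the two interacting systems named in this review can be written in textbook form as below. This is a sketch for orientation only: the symbols (p_i, m_i, k, l, G) are chosen here for illustration, and the paper's exact parameterization may differ.

```latex
% Newton's second law for object i with position p_i and mass m_i
m_i \ddot{p}_i = \sum_{j \neq i} F_{ij}

% Spring between objects i and j, stiffness k and rest length l (Hooke's law)
F_{ij} = -k \left( \lVert p_i - p_j \rVert - l \right)
         \frac{p_i - p_j}{\lVert p_i - p_j \rVert}

% Newtonian gravity between objects i and j, gravitational constant G
F_{ij} = -G \, m_i m_j \, \frac{p_i - p_j}{\lVert p_i - p_j \rVert^{3}}
```

In this reading, quantities such as k, l, and the masses are the physical parameters to be estimated, while the positions and velocities are the states to be inferred from frames.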

PROPOSED METHOD

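The model uses a differentiable physics engine as an inductive bias that guides the inverse-graphics process of recovering scene state from video frames.
Object identities, positions, velocities, and physical parameters (such as spring constants and masses) are optimized jointly through end-to-end differentiable inference.
A neural rendering head reconstructs video frames from the predicted scene state and physical parameters, which makes self-supervised training possible.
The video reconstruction error is backpropagated through the differentiable renderer and the physics simulation to both the vision and the physical-dynamics components.
Jointly optimizing for visual and physical consistency yields disentangled representations of objects and their physical properties.
The learned physical model is then used to plan actions from visual observations, which supports model-predictive control.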
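
The core idea of fitting a physical parameter by differentiating through a simulator can be illustrated with a much smaller toy problem. The sketch below is not the paper's model: it has no encoder or renderer, it assumes positions are observed directly instead of being inferred from frames, and it uses a single unit-mass 1D spring. The function names `simulate` and `fit_spring_constant` are hypothetical. The gradient with respect to the spring constant is obtained by propagating sensitivities through the integrator (forward-mode differentiation written out by hand).

```python
def simulate(k, x0=1.0, v0=0.0, dt=0.01, steps=200):
    """Semi-implicit Euler rollout of a unit-mass spring, x'' = -k * x.

    Returns the positions and the sensitivities dx/dk at every step, so the
    gradient of a trajectory loss with respect to k is exact for this
    discrete integrator.
    """
    x, v = x0, v0
    dx_dk, dv_dk = 0.0, 0.0
    xs, sens = [], []
    for _ in range(steps):
        # Differentiate v_new = v - k * x * dt with respect to k
        # (uses the old x and old dx_dk, so it must come before the x update).
        dv_dk = dv_dk - (x + k * dx_dk) * dt
        v = v - k * x * dt
        # Differentiate x_new = x + v_new * dt with respect to k.
        dx_dk = dx_dk + dv_dk * dt
        x = x + v * dt
        xs.append(x)
        sens.append(dx_dk)
    return xs, sens


def fit_spring_constant(observed, k_init=1.0, lr=2.0, iters=500):
    """Gradient descent on the mean squared error between rollout and data."""
    k = k_init
    n = len(observed)
    for _ in range(iters):
        xs, sens = simulate(k, steps=n)
        # d/dk of mean((x - y)^2) is mean(2 * (x - y) * dx/dk).
        grad = sum(2.0 * (x - y) * s for x, y, s in zip(xs, observed, sens)) / n
        k -= lr * grad
    return k


# Synthetic "observations" generated with a spring constant of 4.0.
observed, _ = simulate(4.0)
k_est = fit_spring_constant(observed)
print(f"estimated spring constant: {k_est:.4f}")
```

In the full method described above, the trajectory loss would be replaced by a frame reconstruction loss, and an automatic-differentiation framework would supply the gradients instead of hand-written sensitivities. The learning rate and initial guess here were picked for this toy setup; trajectory-matching losses are generally non-convex in frequency-like parameters, so other settings may need tuning.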

EXPERIMENTAL RESULTS

Research Questions

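RQ1: Can a vision system jointly discover objects and physical parameters from video alone, without state or object supervision?
RQ2: How well does the physics-informed inverse-graphics model generalize in long-term video prediction for complex dynamical systems?
RQ3: How much does the tight integration of vision and differentiable physics improve data efficiency in model-based control?
RQ4: Does the learned controller's interpretability allow it to support zero-data adaptation and goal-driven physical reasoning?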

Key Findings

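On ball-spring and three-body gravitational systems, the model significantly outperforms existing unsupervised methods in long-term video prediction accuracy.
Using differentiable physics as an inductive bias yields stable, physically plausible extrapolation beyond the training sequences.
The method achieves data-efficient, vision-actuated model-based control on a pendulum system, outperforming baseline methods when demonstration data is limited.
The controller's interpretability enables zero-data adaptation and supports physical reasoning in goal-driven tasks without fine-tuning.
The model recovers disentangled object identities, positions, velocities, and physical parameters from raw, unlabeled video.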
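
To make the control findings more concrete, the sketch below shows one simple form of model-predictive control on a pendulum with an explicit physical model. It is an illustration under stated assumptions, not the paper's controller: the state (angle and angular velocity) is assumed to be given instead of inferred from frames, the physical parameters are assumed known instead of learned, and the planner only searches over a fixed grid of constant torques. All names (`pendulum_step`, `rollout_cost`, `plan`, `run_mpc`) and all parameter values are hypothetical.

```python
import math


def pendulum_step(theta, omega, torque, dt=0.05, g=9.81, length=1.0, damping=0.5):
    """One semi-implicit Euler step; theta is measured from the hanging position."""
    alpha = -(g / length) * math.sin(theta) - damping * omega + torque
    omega = omega + alpha * dt
    theta = theta + omega * dt
    return theta, omega


def rollout_cost(theta, omega, torque, target, horizon):
    """Cost of holding one constant torque for `horizon` steps of the model."""
    cost = 0.0
    for _ in range(horizon):
        theta, omega = pendulum_step(theta, omega, torque)
        cost += (theta - target) ** 2 + 0.1 * omega ** 2
    return cost


def plan(theta, omega, target, horizon=20, max_torque=5.0, n_candidates=41):
    """Return the candidate torque with the lowest predicted cost."""
    step = 2.0 * max_torque / (n_candidates - 1)
    candidates = [-max_torque + i * step for i in range(n_candidates)]
    return min(candidates, key=lambda u: rollout_cost(theta, omega, u, target, horizon))


def run_mpc(target=0.3, steps=200):
    """Receding-horizon loop: replan at every step and apply one torque."""
    theta, omega = 0.0, 0.0
    for _ in range(steps):
        u = plan(theta, omega, target)
        theta, omega = pendulum_step(theta, omega, u)
    return theta, omega


final_theta, final_omega = run_mpc(target=0.3)
print(f"final angle: {final_theta:.3f} rad, final angular velocity: {final_omega:.3f} rad/s")
```

Because the model is explicit, changing `target` or a physical constant such as `g` or `length` changes the controller's behavior without any retraining. That is one way to read the zero-data adaptation finding above, although the paper's own mechanism may differ.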

This review was generated by AI and reviewed by a human editor.