QUICK REVIEW

[Paper Review] Causal Reasoning from Meta-reinforcement Learning

Ishita Dasgupta, Jane X. Wang | arXiv (Cornell University) | Jan 23, 2019

Bayesian Modeling and Causal Inference | References: 6 | Cited by: 74

One-Sentence Summary

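This paper shows that a model-free, meta-learned RNN agent can learn to perform causal reasoning in observational, interventional, and counterfactual settings: it infers the effects of interventions from observational data (reasoning of the kind formalized by do-calculus), designs informative experiments by choosing its own interventions, and makes counterfactual predictions.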

ABSTRACT

Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel situations in order to obtain rewards. The agent can select informative interventions, draw causal inferences from observational data, and make counterfactual predictions. Although established formal causal reasoning algorithms also exist, in this paper we show that such reasoning can arise from model-free reinforcement learning, and suggest that causal reasoning in complex settings may benefit from the more end-to-end learning-based approaches presented here. This work also offers new strategies for structured exploration in reinforcement learning, by providing agents with the ability to perform -- and interpret -- experiments.

Motivation and Goals

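Investigate whether causal reasoning can emerge from meta-learning without explicit causal priors.
Demonstrate causal inference abilities across data regimes: estimating causal effects from observational data, reasoning from interventions, and making counterfactual predictions.
Show that active data collection improves both causal understanding and task reward.
Evaluate whether the meta-learned agent generalizes to causal graphs not seen during training.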

Proposed Method

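Train an LSTM-based agent with model-free reinforcement learning to act in randomly generated causal Bayesian networks (CBNs).
Split each episode into an information phase, in which the agent gathers observations or performs interventions to infer the causal structure, and a quiz phase, in which it must use that knowledge to earn reward.
Test whether the agent can derive causal effects from observational data using reasoning akin to do-calculus, and evaluate it on counterfactual reasoning tasks.
Compare active (informative) data collection with random data collection to measure the value of structured exploration.
Evaluate in three settings (observational, interventional, and counterfactual), including tests on held-out causal graphs.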
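To make the gap between conditioning and intervening concrete, here is a minimal hand-written sketch. It is not the paper's code: the agent in the paper is an LSTM that has to learn this implicitly, whereas this example uses a hypothetical three-node linear-Gaussian CBN (A -> B, A -> C, B -> C, unit-variance noise, all weights 1.0) and hypothetical helper names (`sample`, `naive_slope`, `adjusted_slope`). Because B has a parent A, regressing C on B alone (the purely correlational answer) is biased toward Cov(B, C) / Var(B) = 3 / 2 = 1.5. Adjusting for A (backdoor adjustment, one consequence of do-calculus) or intervening on B directly recovers the true effect w_bc = 1.0.

```python
import random


def sample(n, w_ab=1.0, w_ac=1.0, w_bc=1.0, do_b=None, seed=0):
    """Draw n samples (a, b, c) from the toy CBN A -> B, A -> C, B -> C.

    Each node is a weighted sum of its parents plus N(0, 1) noise. If do_b
    is given, B is clamped to that value (an intervention), which cuts the
    A -> B edge; otherwise B is generated from its parent A.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        noise_b = rng.gauss(0.0, 1.0)
        b = do_b if do_b is not None else w_ab * a + noise_b
        c = w_ac * a + w_bc * b + rng.gauss(0.0, 1.0)
        rows.append((a, b, c))
    return rows


def naive_slope(rows):
    """Regress C on B alone: the associative (correlational) estimate.

    All variables are zero-mean by construction, so no intercept is fitted.
    """
    sbb = sum(b * b for _, b, _ in rows)
    sbc = sum(b * c for _, b, c in rows)
    return sbc / sbb


def adjusted_slope(rows):
    """Regress C on both B and A, adjusting for the confounder A.

    Solves the 2x2 normal equations by Cramer's rule; the coefficient on B
    estimates the causal effect d E[C | do(B = b)] / db.
    """
    saa = sum(a * a for a, _, _ in rows)
    sbb = sum(b * b for _, b, _ in rows)
    sab = sum(a * b for a, b, _ in rows)
    sac = sum(a * c for a, _, c in rows)
    sbc = sum(b * c for _, b, c in rows)
    det = saa * sbb - sab * sab
    return (saa * sbc - sab * sac) / det


obs = sample(20_000)
print(naive_slope(obs))     # near 1.5: the confounder A inflates the slope
print(adjusted_slope(obs))  # near 1.0 = w_bc, the causal effect of B on C
exp_rows = sample(20_000, do_b=2.0, seed=1)
print(sum(c for _, _, c in exp_rows) / len(exp_rows))  # near w_bc * 2.0 = 2.0
```

Linear regression here only stands in for whatever the agent learns internally; the point is that conditioning on B and setting B give different answers whenever B has a parent, which is exactly the case where the findings report an advantage over the correlational baseline.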

Experimental Results

Research Questions

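RQ1: Can a meta-trained, model-free RL agent perform causal reasoning from observational data alone?
RQ2: Does access to interventional data let the agent answer causal questions in the presence of unobserved confounders?
RQ3: Can the agent perform counterfactual reasoning, and does explicitly inferring the latent noise (abduction) improve performance in degenerate cases?
RQ4: Does the agent learn to actively choose informative observations or interventions to increase quiz-phase reward?
RQ5: How well do the learned policies transfer to causal graphs not seen during training?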

Key Findings

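The meta-trained agent can perform causal reasoning from observational data, and it outperforms the best purely correlational baseline when the intervened node has parents.
Interventional data helps resolve unobserved confounders: in confounded cases, the agent that actively intervenes outperforms the observation-only agent.
The counterfactual agent, which exploits the latent noise (abduction), outperforms the interventional agent in cases with a degenerate maximum (several nodes tied for the highest expected value) and when using tailored interventions.
Active data-collection policies achieve higher quiz-phase reward than random data-collection policies in all three experimental settings.
Without explicit causal priors, the agent learns behavior consistent with do-calculus-style inference, plans effective interventions, and makes counterfactual predictions.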
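The abduction step behind the counterfactual findings can be sketched in the same kind of hypothetical linear-Gaussian setting. This is an illustration under stated assumptions (additive noise, a made-up graph, weights, and observation, and hypothetical function names `abduce` and `predict`), not the paper's environment or agent. Abduction recovers each node's noise from one observed sample; action clamps the intervened node; prediction re-propagates values with that same noise. In the toy graph below, B and C are tied in expectation under do(A = -1), a degenerate case, and only the abduced noise breaks the tie.

```python
def abduce(obs, parents, weights):
    """Abduction: recover each node's exogenous noise from one observed sample.

    Assumes a linear additive model X_i = sum_j w_ji * X_j + eps_i, so
    eps_i = x_i - sum_j w_ji * x_j. weights maps (parent, child) -> weight.
    """
    return {
        node: obs[node] - sum(weights[(p, node)] * obs[p] for p in parents[node])
        for node in obs
    }


def predict(noise, parents, weights, order, do=None, use_noise=True):
    """Action and prediction: propagate values in topological order.

    Nodes listed in `do` are clamped (the intervention). With use_noise=True
    the abduced noise is reused (a counterfactual); with use_noise=False the
    noise is replaced by its mean of 0 (the expected interventional outcome).
    """
    do = do or {}
    values = {}
    for node in order:
        if node in do:
            values[node] = do[node]
        else:
            eps = noise[node] if use_noise else 0.0
            values[node] = sum(weights[(p, node)] * values[p] for p in parents[node]) + eps
    return values


# Hypothetical graph A -> B, A -> C with equal weights, so B and C tie in expectation.
parents = {"A": [], "B": ["A"], "C": ["A"]}
weights = {("A", "B"): 1.0, ("A", "C"): 1.0}
order = ["A", "B", "C"]
observed = {"A": 0.5, "B": 0.25, "C": 1.5}

noise = abduce(observed, parents, weights)
interventional = predict(noise, parents, weights, order, do={"A": -1.0}, use_noise=False)
counterfactual = predict(noise, parents, weights, order, do={"A": -1.0})
print(interventional)  # B and C both -1.0: no way to choose between them
print(counterfactual)  # C's larger abduced noise makes it the maximum
```

The interventional prediction averages the noise away, so it cannot separate B from C; reusing the noise inferred from this particular episode is what lets a counterfactual reasoner pick the better node.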

This review was generated by AI and reviewed by human editors.