QUICK REVIEW

[Paper Review] InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations

Yunzhu Li, Jiaming Song | arXiv (Cornell University) | Mar 26, 2017

Reinforcement Learning in Robotics | References: 39 | Cited by: 141

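One-Sentence Summary

InfoGAIL extends GAIL with a latent variable that uncovers and disentangles the latent factors in expert demonstrations, enabling interpretable imitation from visual inputs and unsupervised learning of multiple behavior modes.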

ABSTRACT

The goal of imitation learning is to mimic expert behavior without access to an explicit reward signal. Expert demonstrations provided by humans, however, often show significant variability due to latent factors that are typically not explicitly modeled. In this paper, we propose a new algorithm that can infer the latent structure of expert demonstrations in an unsupervised way. Our method, built on top of Generative Adversarial Imitation Learning, can not only imitate complex behaviors, but also learn interpretable and meaningful representations of complex behavioral data, including visual demonstrations. In the driving domain, we show that a model learned from human demonstrations is able to both accurately reproduce a variety of behaviors and accurately anticipate human actions using raw visual inputs. Compared with various baselines, our method can better capture the latent structure underlying expert demonstrations, often recovering semantically meaningful factors of variation in the data.

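Motivation and Goals

Advance imitation learning for settings where expert demonstrations vary because they come from several underlying policies.
Extend GAIL to discover and disentangle the latent factors of variation in the demonstrations.
Enable learning from raw visual inputs and recover semantically meaningful latent structure.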

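Proposed Method

Add a latent variable c to GAIL that selects among a mixture of expert policies π_E = {π_E^0, π_E^1, ...}.
Introduce a mutual information regularizer I(c; τ), approximated by the variational lower bound L_I(π, Q), to force c to capture the salient factors in the trajectories.
Optimize the InfoGAIL objective: min_{π,Q} max_D E_π[log D(s,a)] + E_{π_E}[log(1−D(s,a))] − λ1 L_I(π,Q) − λ2 H(π).
Use a simplified posterior Q(c|s,a) to avoid costly trajectory-level computation.
Strengthen optimization with a Wasserstein GAN objective, reward augmentation, variance reduction techniques, and TRPO for policy updates.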
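
The mutual information term and the policy-side part of the objective above can be illustrated with a small numerical sketch. It assumes a discrete latent code with prior p(c), the simplified per-step posterior Q(c|s,a), and the common form of the variational bound L_I(π,Q) = E[log Q(c|s,a)] + H(c). The function names, the clipping constant, and the λ values are hypothetical and chosen for illustration only. The sketch uses the log-form objective as written above, not the Wasserstein variant.

```python
import math

def categorical_entropy(prior):
    """Entropy H(c) of a discrete prior p(c), in nats."""
    return -sum(p * math.log(p) for p in prior if p > 0.0)

def mutual_info_lower_bound(posteriors, codes, prior):
    """Monte Carlo estimate of L_I(pi, Q) = E[log Q(c|s,a)] + H(c).

    posteriors: one distribution Q(.|s,a) over codes per sampled (s, a)
    codes:      index of the code c that generated each sample
    prior:      the prior p(c) the codes were drawn from
    """
    if not codes or len(posteriors) != len(codes):
        raise ValueError("need one posterior per sampled code")
    # Clip to avoid log(0); 1e-12 is an arbitrary illustrative floor.
    log_q = [math.log(max(q[c], 1e-12)) for q, c in zip(posteriors, codes)]
    return sum(log_q) / len(log_q) + categorical_entropy(prior)

def policy_objective(disc_probs, mi_bound, policy_entropy, lam1, lam2):
    """Policy-side terms: E_pi[log D(s,a)] - lam1 * L_I - lam2 * H(pi)."""
    mean_log_d = sum(math.log(max(d, 1e-12)) for d in disc_probs) / len(disc_probs)
    return mean_log_d - lam1 * mi_bound - lam2 * policy_entropy

# Illustrative values only (not taken from the paper).
prior = [1 / 3, 1 / 3, 1 / 3]
codes = [0, 1, 2]
uninformative = [[1 / 3, 1 / 3, 1 / 3]] * 3               # Q ignores (s, a)
informative = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

bound_low = mutual_info_lower_bound(uninformative, codes, prior)
bound_high = mutual_info_lower_bound(informative, codes, prior)
loss = policy_objective([0.5, 0.5], bound_high, policy_entropy=2.0,
                        lam1=0.1, lam2=0.01)
```

Under these assumptions the bound is zero when Q cannot recover the code and reaches H(c) when Q recovers it perfectly, which is why maximizing it pushes the policy to make c identifiable from its behavior.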

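Experimental Results

Research Questions

RQ1: Can a latent variable in the policy capture and disentangle the latent variation in expert demonstrations?
RQ2: Can InfoGAIL identify semantically meaningful factors of variation (such as driving style) from visual inputs without supervision?
RQ3: Can the method learn from raw pixels and produce mode-specific behavior in driving scenarios?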

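Key Findings

On a synthetic 2D task with three circular trajectories, InfoGAIL learns to distinguish and imitate multiple expert modes.
In the driving experiments, InfoGAIL is trained on raw visual inputs and reproduces and distinguishes, with high accuracy, the driving behaviors that correspond to the latent codes (e.g., turning in the inside vs. outside lane; passing on the left vs. right).
In the driving tasks, posterior inference with Q(c|s,a) identifies the latent code with over 81% accuracy for passing (pass) and near-perfect accuracy for turning (turn).
InfoGAIL achieves a higher average rollout distance than behavior cloning and standard GAIL, and with reward augmentation and the improved optimization it even exceeds some human demonstrations.
Transfer learning with pre-trained CNN features (ImageNet) helps learning from high-dimensional visual inputs when few demonstrations are available.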
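
A posterior accuracy such as the one reported for Q(c|s,a) can be computed along the following lines. This is a minimal sketch, assuming each sample has a known reference code and that the predicted code is the argmax of the posterior. The helper names and the toy numbers are hypothetical and are not the paper's data or evaluation code.

```python
def predict_code(posterior):
    """Return the index of the most probable latent code under Q(.|s,a)."""
    return max(range(len(posterior)), key=lambda i: posterior[i])

def posterior_accuracy(posteriors, true_codes):
    """Fraction of samples whose argmax code matches the reference code."""
    if not true_codes or len(posteriors) != len(true_codes):
        raise ValueError("need one posterior per reference code")
    correct = sum(1 for q, c in zip(posteriors, true_codes)
                  if predict_code(q) == c)
    return correct / len(true_codes)

# Toy posteriors over two codes (illustrative values only).
toy_posteriors = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
toy_codes = [0, 1, 1, 1]
accuracy = posterior_accuracy(toy_posteriors, toy_codes)  # 3 of 4 correct
```

Because the codes are learned without supervision, the mapping between code indices and behavior labels is arbitrary, so in practice the indices may need to be matched to the labels before accuracy is computed.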


This review was generated by AI and checked by a human editor.