QUICK REVIEW

[Paper Review] Data Quality in Imitation Learning

Suneel Belkhale, Yuchen Cui | arXiv (Cornell University) | Jun 4, 2023
Domain Adaptation and Few-Shot Learning | Cited by 7
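One-Sentence Summary

This paper formalizes data quality in imitation learning through action divergence and transition diversity. It empirically shows how these properties relate to horizon length and noise, and how they affect policy performance and data curation.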

ABSTRACT

In supervised learning, the question of data quality and curation has been overshadowed in recent years by increasingly more powerful and expressive models that can ingest internet-scale data. However, in offline learning for robotics, we simply lack internet-scale data, and so high quality datasets are a necessity. This is especially true in imitation learning (IL), a sample efficient paradigm for robot learning using expert demonstrations. Policies learned through IL suffer from state distribution shift at test time due to compounding errors in action prediction, which leads to unseen states that the policy cannot recover from. Instead of designing new algorithms to address distribution shift, an alternative perspective is to develop new ways of assessing and curating datasets. There is growing evidence that the same IL algorithms can have substantially different performance across different datasets. This calls for a formalism for defining metrics of "data quality" that can further be leveraged for data curation. In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift: a high quality dataset encourages the policy to stay in distribution at test time. We propose two fundamental properties that shape the quality of a dataset: i) action divergence: the mismatch between the expert and learned policy at certain states; and ii) transition diversity: the noise present in the system for a given state and action. We investigate the combined effect of these two key properties in imitation learning theoretically, and we empirically analyze models trained on a variety of different data sources. We show that state diversity is not always beneficial, and we demonstrate how action divergence and transition diversity interact in practice.
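The compounding-error argument above can be illustrated with a toy example. The sketch below uses a hypothetical 1-D integrator (x' = x + a), not an environment from the paper: the learned policy copies an assumed expert controller but adds a constant per-step prediction error `bias`. Without corrective feedback (`gain = 0`) the gap to the expert trajectory grows linearly with the horizon; with feedback it stays bounded near `bias / gain`.

```python
from typing import List


def tracking_gap(bias: float, horizon: int, gain: float) -> List[float]:
    """Per-step |x_learned - x_expert| in a toy 1-D integrator x' = x + a.

    Hypothetical expert: a_E(x) = -gain * x (gain = 0 means no corrective feedback).
    Learned policy: a_E(x) + bias, i.e. a constant action-prediction error.
    """
    x_expert = x_learned = 1.0
    gaps = []
    for _ in range(horizon):
        x_expert += -gain * x_expert
        x_learned += -gain * x_learned + bias
        gaps.append(abs(x_learned - x_expert))
    return gaps


if __name__ == "__main__":
    # Open loop: the error compounds, reaching about horizon * bias.
    print(tracking_gap(0.01, 100, gain=0.0)[-1])
    # With feedback, the gap settles near bias / gain.
    print(tracking_gap(0.01, 100, gain=0.5)[-1])
```

The gap obeys d' = (1 - gain) * d + bias, so any gain in (0, 2) keeps it bounded, while gain = 0 lets it accumulate step by step.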

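Motivation and Objectives

  • Define a formal notion of data quality for imitation learning grounded in distribution shift.
  • Identify the two core properties that shape data quality: action divergence and transition diversity.
  • Analyze how these properties interact over time and with data characteristics such as horizon length and noise.
  • Provide data-centric insights for curating datasets that improve imitation learning performance.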

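Proposed Approach

  • Model data quality as the negative f-divergence between the state visitation distribution of the policy learned by an IL algorithm and the expert's state visitation distribution.
  • Define action divergence as the mismatch between the learned and expert action distributions at a given state.
  • Define transition diversity as the diversity of environment noise/dynamics for a given state and action.
  • Prove bounds relating distribution shift to action divergence, and show how these effects evolve over time.
  • Empirically study, across different environments, the effects of data noise (system noise and policy noise), and apply data measurements to human versus machine-generated data.
  • Use metrics such as action variance, horizon length, and state similarity to measure how these data quality factors manifest in datasets.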
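The definition of data quality as a negative f-divergence can be made concrete on discretized states. The sketch below is a minimal illustration under stated assumptions: states are already discretized into hashable cells, empirical visitation frequencies stand in for the true visitation distributions, the divergence is taken as D_f(learned ‖ expert), and KL divergence (generator f(t) = t log t) serves as one example of an f-divergence. All function names are hypothetical, not from the paper.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable


def visitation(states: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Empirical state-visitation distribution over discretized states."""
    counts = Counter(states)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def kl_generator(t: float) -> float:
    """Generator f(t) = t log t, for which D_f(P || Q) equals KL(P || Q)."""
    return t * math.log(t) if t > 0 else 0.0


def f_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float],
                 f: Callable[[float], float]) -> float:
    """D_f(P || Q) = sum_x Q(x) f(P(x) / Q(x))."""
    total = 0.0
    for x in set(p) | set(q):
        px, qx = p.get(x, 0.0), q.get(x, 0.0)
        if qx == 0.0:
            if px > 0.0:
                # The learned policy visits a state the expert never does.
                # This is infinite for KL (other generators may stay finite).
                return math.inf
            continue
        total += qx * f(px / qx)
    return total


def data_quality(learned_states, expert_states, f=kl_generator) -> float:
    """Hypothetical score: negative f-divergence between visitation distributions."""
    return -f_divergence(visitation(learned_states), visitation(expert_states), f)


if __name__ == "__main__":
    expert = [0, 0, 1, 1]
    print(data_quality([0, 0, 1, 1], expert))  # identical visitation: zero divergence
    print(data_quality([0, 0, 0, 1], expert))  # shifted visitation: negative score
    print(data_quality([0, 2, 2, 1], expert))  # unseen state 2: -inf
```

Higher (closer to zero) scores mean the learned policy stays closer to the expert's state distribution, which matches the intuition that a high quality dataset keeps the policy in distribution at test time.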
Figure 1: Case Study: Trajectories and action variance for scripted demonstration data (left two plots) compared to human demonstration data (right two plots). Even though the human data (right) has high state coverage, its action variance is high, leading to high action divergence, and vice versa.
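Figure 1 uses action variance as a signal of action divergence. Below is a minimal sketch of such a measurement, assuming real-valued state and action vectors, a uniform state grid with a hypothetical `cell_size` to group nearby states, and a score equal to the count-weighted mean of within-cell action variance (summed over action dimensions). This binning scheme is an illustrative choice, not necessarily the measurement used in the paper.

```python
import math
from collections import defaultdict
from statistics import pvariance
from typing import Dict, List, Sequence, Tuple


def binned_action_variance(states: Sequence[Sequence[float]],
                           actions: Sequence[Sequence[float]],
                           cell_size: float = 0.1) -> float:
    """Count-weighted mean of per-cell action variance (hypothetical metric)."""
    cells: Dict[Tuple[int, ...], List[Sequence[float]]] = defaultdict(list)
    for s, a in zip(states, actions):
        key = tuple(math.floor(x / cell_size) for x in s)
        cells[key].append(a)

    weighted_sum, weight = 0.0, 0
    for acts in cells.values():
        if len(acts) < 2:
            continue  # a single sample says nothing about action consistency
        var = sum(pvariance([a[d] for a in acts]) for d in range(len(acts[0])))
        weighted_sum += len(acts) * var
        weight += len(acts)
    return weighted_sum / weight if weight else 0.0


if __name__ == "__main__":
    states = [(0.01, 0.01), (0.02, 0.02)]  # two visits to the same grid cell
    scripted = [(1.0, 0.0), (1.0, 0.0)]    # consistent actions
    human = [(1.0, 0.0), (-1.0, 0.0)]      # conflicting actions
    print(binned_action_variance(states, scripted))  # 0.0
    print(binned_action_variance(states, human))     # 1.0
```

Cells visited only once are skipped, so broad but sparse state coverage does not lower the score by itself; only conflicting actions in the same region raise it.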

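Experimental Results

Research Questions

  • RQ1: How should data quality in imitation learning be defined and measured so that it accounts for distribution shift?
  • RQ2: What roles do action divergence and transition diversity play in shaping dataset quality and policy performance?
  • RQ3: In practice, how do these properties interact with dataset size, horizon length, and environment noise?
  • RQ4: Do data-centric curation strategies improve IL performance more than algorithm-centric corrections?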

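Key Findings

  • Action divergence and transition diversity jointly affect distribution shift and IL performance.
  • State diversity alone does not guarantee better IL performance; action consistency is critical.
  • System noise during data collection can improve robustness to action divergence, provided there is enough data.
  • Policy noise (suboptimal, human-like actions) can hurt performance when data is scarce, unless it is offset by transition diversity.
  • Data measurements on human datasets show that higher action variance and longer horizons do not necessarily correlate with higher success rates, highlighting the complexity of data quality.
  • Providing some degree of transition diversity can mitigate the negative effects of noisy or suboptimal expert demonstrations.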
Figure 2: BC Success rates in PMObstacle (top row) for 1000 and 10 episodes of data, and in Square (bottom row) for 200 and 50 episodes of data (error bars over 3 datasets). X-axis corresponds to injected Gaussian noise in the dataset and each line corresponds to injected system noise ($\sigma_{s}$).
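The distinction between system noise and policy noise can be illustrated in a toy setting. The sketch below assumes a hypothetical 1-D point mass with a proportional expert controller (not the PMObstacle or Square environments). It models system noise (`sigma_s`) as a perturbation of the transition while the recorded action labels stay expert, which widens state coverage, and policy noise (`sigma_p`) as a perturbation of the demonstrator's action that is both executed and recorded, so the labels diverge from the expert. This modeling of the two noise types is an assumption made for illustration.

```python
import math
import random
from typing import List, Tuple


def expert_action(x: float) -> float:
    """Hypothetical expert: proportional controller driving the state toward 0."""
    return -0.5 * x


def collect(num_episodes: int, horizon: int, sigma_s: float, sigma_p: float,
            seed: int = 0) -> List[Tuple[float, float]]:
    """Collect (state, recorded_action) pairs under system and policy noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_episodes):
        x = 1.0
        for _ in range(horizon):
            # Policy noise: the noisy action is both executed and recorded.
            a = expert_action(x) + rng.gauss(0.0, sigma_p)
            data.append((x, a))
            # System noise: perturbs the next state, not the recorded label.
            x = x + a + rng.gauss(0.0, sigma_s)
    return data


def label_error(data: List[Tuple[float, float]]) -> float:
    """Mean |recorded - expert| action gap, a proxy for action divergence."""
    return sum(abs(a - expert_action(x)) for x, a in data) / len(data)


def state_coverage(data: List[Tuple[float, float]], cell: float = 0.05) -> int:
    """Number of distinct state cells visited, a proxy for state diversity."""
    return len({math.floor(x / cell) for x, _ in data})


if __name__ == "__main__":
    for name, s, p in [("clean", 0.0, 0.0), ("system", 0.1, 0.0), ("policy", 0.0, 0.1)]:
        d = collect(10, 20, sigma_s=s, sigma_p=p)
        print(name, state_coverage(d), round(label_error(d), 3))
```

In this toy model, system noise raises state coverage while leaving the labels consistent with the expert, whereas policy noise introduces label error; this mirrors the contrast drawn in the findings, though it says nothing about the success rates in Figure 2.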


This review was generated by AI and reviewed by human editors.