QUICK REVIEW

[Paper Review] Learning human behaviors from motion capture by adversarial imitation

Josh Merel, Yuval Tassa | arXiv (Cornell University) | Jul 7, 2017

Reinforcement Learning in Robotics | References: 1 | Citations: 154

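One-Line Summary

This paper extends generative adversarial imitation learning (GAIL) to train humanlike movement policies from partially observed motion capture data, enabling transfer across bodies and reuse of the learned policies as sub-skills within a higher-level controller.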

ABSTRACT

Rapid progress in deep reinforcement learning has made it increasingly feasible to train controllers for high-dimensional humanoid bodies. However, methods that use pure reinforcement learning with simple reward functions tend to produce non-humanlike and overly stereotyped movement behaviors. In this work, we extend generative adversarial imitation learning to enable training of generic neural network policies to produce humanlike movement patterns from limited demonstrations consisting only of partially observed state features, without access to actions, even when the demonstrations come from a body with different and unknown physical parameters. We leverage this approach to build sub-skill policies from motion capture data and show that they can be reused to solve tasks when controlled by a higher level controller.
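The abstract's key mechanism, an adversarial reward computed from partial state features alone with no actions, can be illustrated with a small sketch. It assumes a linear logistic discriminator standing in for the paper's neural network and the common GAIL-style reward r = -log(1 - D(x)); the class name `FeatureDiscriminator`, its methods, and the toy feature vectors are hypothetical. The point to notice is that the discriminator only ever receives feature vectors.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class FeatureDiscriminator:
    """Hypothetical linear discriminator D(x) over partial state features x.

    D(x) estimates the probability that x came from the demonstrations.
    Actions are never part of the input.
    """

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_demo(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, demo_batch, policy_batch):
        # One gradient-ascent step on the log-likelihood of labels demo=1, policy=0
        labeled = [(x, 1.0) for x in demo_batch] + [(x, 0.0) for x in policy_batch]
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, label in labeled:
            err = label - self.prob_demo(x)
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(labeled)
        self.w = [wi + self.lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b += self.lr * grad_b / n

    def reward(self, x, eps=1e-8):
        # GAIL-style surrogate reward: large when the features look demo-like
        return -math.log(max(1.0 - self.prob_demo(x), eps))


# Toy usage with made-up 2-D feature vectors (assumed data, not from the paper)
demo_feats = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1]]
policy_feats = [[-1.0, -0.8], [-0.9, -1.1], [-1.1, -1.0]]
disc = FeatureDiscriminator(dim=2)
for _ in range(200):
    disc.update(demo_feats, policy_feats)
r_demo = disc.reward([1.0, 1.0])
r_policy = disc.reward([-1.0, -1.0])
print(f"demo-like reward {r_demo:.3f}, policy-like reward {r_policy:.3f}")
```

In the full method this reward would drive a policy-gradient update (the review names TRPO); the sketch stops at the reward signal.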

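Motivation and Objectives

Produce humanlike movement for high-dimensional humanoid bodies without relying on hand-crafted reward functions.
Develop a motion-imitation pipeline that works from only a subset of state features and requires no action data.
Demonstrate body transfer, robustness to noisy motion capture data, and reuse of learned sub-skills under high-level control.
Show that multi-behavior learning gives rise to robust transitions between behaviors.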

Proposed Method

Extend GAIL to partial observations and include a context variable for multi-behavior policy learning.
Train a stochastic neural network policy that outputs Gaussian distributions over actuator actions.
Update the policy with TRPO, using adversarial rewards derived from a discriminator trained to distinguish demonstration data from policy data.
Provide an end-to-end pipeline from motion capture demonstrations to low-level controllers, which are then integrated with a high-level controller for task learning.
Use the MuJoCo physics engine with a variety of bodies, including a complex humanoid, for training and evaluation.
Expose end-effector-based features (vectors from the root to the feet, hands, and head) and inertial sensor readings to stabilize imitation from noisy motion capture data.
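Three items in the list above (end-effector features, a context variable for multiple behaviors, and a Gaussian action distribution) can be sketched together. This is a minimal illustration under stated assumptions: the body-part names, the one-hot context encoding, feeding inertial readings to the discriminator, and skipping rotation into the root's frame are all simplifications chosen here, and every function name is hypothetical rather than taken from the paper.

```python
import math
import random

# Hypothetical body-part names; a real MuJoCo model defines its own
END_EFFECTORS = ("left_foot", "right_foot", "left_hand", "right_hand", "head")


def end_effector_features(positions, root_name="root"):
    """Root-to-end-effector vectors, flattened in a fixed order.

    Simplification: vectors are translated to the root but not rotated
    into the root's local frame.
    """
    rx, ry, rz = positions[root_name]
    feats = []
    for name in END_EFFECTORS:
        x, y, z = positions[name]
        feats.extend([x - rx, y - ry, z - rz])
    return feats


def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v


def discriminator_input(positions, inertial, context, num_behaviors):
    # Features + inertial readings + behavior context (assumed layout)
    return (end_effector_features(positions)
            + list(inertial)
            + one_hot(context, num_behaviors))


def sample_gaussian_action(mean, log_std, rng):
    # Diagonal Gaussian policy head: one independent normal per actuator
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


# Toy usage with a made-up pose (positions in metres, assumed values)
pose = {
    "root": (0.0, 0.0, 1.0),
    "left_foot": (0.1, 0.1, 0.0),
    "right_foot": (-0.1, 0.1, 0.0),
    "left_hand": (0.3, 0.2, 1.2),
    "right_hand": (-0.3, 0.2, 1.2),
    "head": (0.0, 0.0, 1.6),
}
x = discriminator_input(pose, inertial=(0.0, 0.0, 9.81), context=1, num_behaviors=3)
action = sample_gaussian_action(mean=[0.5, -0.5], log_std=[-1.0, -1.0],
                                rng=random.Random(0))
print(len(x), x[:3], x[-3:], action)
```

Because the features are relative to the root, they stay comparable across bodies with different limb parameters, which is the intuition behind using them for noisy or cross-body demonstrations. The same one-hot context could also be appended to the policy's observation.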

Experimental Results

Research Questions

RQ1: Can GAIL-based imitation learning succeed when demonstrations contain only partial state observations and no actions?
RQ2: Is imitation robust to differences in body dynamics between the demonstrator and the imitator (body transfer)?
RQ3: Can multiple behaviors be learned, with robust transitions between them, using a context-conditioned discriminator?
RQ4: Can low-level skills learned from motion capture be reused by a higher-level controller to solve new tasks?
RQ5: How well can a complex humanoid learn from limited, noisy motion capture data while still exhibiting humanlike motion?

Key Findings

Imitation learning from partial state observations can reproduce demonstrated behaviors without access to actions.
Conditioning the discriminator on body-invariant features enables cross-body imitation and retargeting across different degree-of-freedom (DoF) configurations.
Multi-behavior training with context variables yields robust transitions between skills and supports switching behaviors mid-trajectory.
Learning from motion capture yields a more natural gait and more capable get-up behaviors for a complex humanoid than random initialization or pure RL.
End-effector-based feature representations stabilize imitation from noisy motion capture data and improve the perceived humanlikeness of the learned motions.
Sub-skills learned from motion capture can be composed and modulated by a higher-level controller to perform tasks like navigation, turning, running, and stair climbing.
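The reuse of sub-skills by a higher-level controller can be pictured as a simple two-level loop. This sketch assumes a discrete-switching variant in which the high-level controller picks a sub-skill (a context index) every `period` steps and a frozen low-level policy executes it; the paper's controller may instead modulate the low-level policy continuously, and the function name, the 1-D toy environment, and the three toy skills are all hypothetical.

```python
def run_hierarchy(high_level, low_level, state, step_fn, horizon, period):
    """Run a two-level controller and return (final_state, chosen contexts).

    high_level(state) -> context index, re-evaluated every `period` steps
    low_level(state, context) -> action from the frozen sub-skill policy
    step_fn(state, action) -> next state
    """
    trace = []
    context = None
    for t in range(horizon):
        if t % period == 0:
            context = high_level(state)
        action = low_level(state, context)
        state = step_fn(state, action)
        trace.append(context)
    return state, trace


# Toy 1-D task: reach position 5 using three hypothetical sub-skills
SKILLS = {0: 1, 1: 0, 2: -1}  # 0 = forward, 1 = stand, 2 = backward


def high_level(state):
    # Hand-written stand-in for a learned high-level policy
    if state < 5:
        return 0
    if state > 5:
        return 2
    return 1


final_state, trace = run_hierarchy(
    high_level=high_level,
    low_level=lambda s, c: SKILLS[c],
    state=0,
    step_fn=lambda s, a: s + a,
    horizon=8,
    period=2,
)
print(final_state, trace)
```

With `period=2` the toy controller overshoots the target and then corrects, which shows why the rate at which a high-level controller can switch skills matters for tasks like turning or stair climbing.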

This review was generated by AI and checked by a human editor.