QUICK REVIEW

[Paper Review] Learning to Walk in the Real World with Minimal Human Effort

Sehoon Ha, Peng Xu|arXiv (Cornell University)|Feb 20, 2020

Robotic Locomotion and Control | References: 57 | Citations: 76

One-Line Summary

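This paper proposes an autonomous real-world RL system that uses multi-task learning and safety-constrained SAC to train walking policies on multiple terrains with minimal human intervention, achieving efficient on-robot learning on a Minitaur while reducing falls.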

ABSTRACT

Reliable and stable locomotion has been one of the most fundamental challenges for legged robots. Deep reinforcement learning (deep RL) has emerged as a promising method for developing such control policies autonomously. In this paper, we develop a system for learning legged locomotion policies with deep RL in the real world with minimal human effort. The key difficulties for on-robot learning systems are automatic data collection and safety. We overcome these two challenges by developing a multi-task learning procedure and a safety-constrained RL framework. We tested our system on the task of learning to walk on three different terrains: flat ground, a soft mattress, and a doormat with crevices. Our system can automatically and efficiently learn locomotion skills on a Minitaur robot with little human intervention. The supplemental video can be found at: https://youtu.be/cwyiq6dCgOc

Motivation and Objectives

Reduce human effort required for real-world legged robot learning by automating data collection, resets, and safety in training.
Enable learning of multiple locomotion skills (forward, backward, turns) within a single training session.
Demonstrate robust learning across diverse terrains (flat ground, a soft mattress, and a doormat with crevices).
Show that safety-constrained RL reduces falls and training interruptions in real-world data collection.

Proposed Method

Adopt a multi-task RL framework with a task scheduler that selects walking directions to keep the robot within the training workspace.
Define per-task rewards that encode desired direction of motion relative to episode start with a three-dimensional task vector w^i.
Train separate SAC policies for each task without shared actors/critics to avoid cross-task interference.
Formulate safety as a constrained MDP, using a Lagrange multiplier to bound torso pitch and roll during training.
Optimize with dual gradient descent, updating both the policy and the Lagrange multiplier.
Use early termination near workspace boundaries with safety-aware return calculations to prevent excessive resets.
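
The scheduling, reward, and multiplier-update ideas listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the task-vector values, the `(x, y, yaw)` pose layout, the workspace center at the origin, the `pi / 12` tilt limit, the learning rate, and every function name are hypothetical choices made for the example. In a Lagrangian setup of this kind, each policy would typically be trained on the task reward plus the multiplier times the safety margin; the SAC training loop itself is omitted.

```python
import math

# Hypothetical task vectors w^i over (forward, lateral, yaw) motion.
TASKS = {
    "forward": (1.0, 0.0, 0.0),
    "backward": (-1.0, 0.0, 0.0),
    "turn_left": (0.0, 0.0, 1.0),
    "turn_right": (0.0, 0.0, -1.0),
}


def to_local_frame(dx, dy, yaw):
    """Rotate a world-frame displacement into a frame with heading `yaw`."""
    c, s = math.cos(yaw), math.sin(yaw)
    return c * dx + s * dy, -s * dx + c * dy


def task_reward(w, prev_pose, pose, start_yaw):
    """Score one step: task vector dotted with motion relative to episode start."""
    fx, fy = to_local_frame(pose[0] - prev_pose[0], pose[1] - prev_pose[1], start_yaw)
    d = pose[2] - prev_pose[2]
    dyaw = math.atan2(math.sin(d), math.cos(d))  # wrap yaw change to [-pi, pi]
    return w[0] * fx + w[1] * fy + w[2] * dyaw


def schedule_task(pose, center=(0.0, 0.0), tasks=TASKS):
    """Pick the task whose desired translation points most toward the center."""
    fx, fy = to_local_frame(center[0] - pose[0], center[1] - pose[1], pose[2])
    return max(tasks, key=lambda name: tasks[name][0] * fx + tasks[name][1] * fy)


def safety_margin(pitch, roll, limit=math.pi / 12):
    """Positive while |pitch| and |roll| stay under the assumed tilt limit."""
    return min(limit - abs(pitch), limit - abs(roll))


def dual_update(lam, mean_margin, lr=0.1):
    """Projected gradient step on the multiplier; it grows when the margin is negative."""
    return max(0.0, lam - lr * mean_margin)
```

In this sketch, a robot at `(1.0, 0.0)` facing the +x direction has the workspace center behind it, so `schedule_task((1.0, 0.0, 0.0))` selects `"backward"`; the same position with yaw `math.pi` selects `"forward"`. A negative average margin (torso tilting past the limit) raises the multiplier, which would weight the safety term more heavily in subsequent policy updates.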

Experimental Results

Research Questions

RQ1: Can autonomous real-world RL learn legged locomotion with minimal human intervention on multiple terrains?
RQ2: Is it feasible to train multiple locomotion policies simultaneously in real-world settings?
RQ3: Does a safety-constrained RL framework reduce the number of falls and training disruptions compared to standard methods?

Key Findings

The system learns to walk on flat ground, a soft mattress, and a doormat with crevices with minimal human intervention (zero manual resets in two of three flat-ground runs).
Four locomotion policies (forward, backward, turn left, turn right) can be learned in a single session, enabling a complete directional walking controller at test time.
On flat terrain, training two policies (forward and backward) required about 1.5 hours (~60k steps per policy).
On challenging surfaces, forward/backward walking was learned in 5.5 hours (mattress, ~200k steps) and 4.5 hours (doormat, ~150k steps).
Safety-constrained SAC substantially reduces falls (roughly 40 vs. more than 100 for a SAC baseline with no safety constraint), which in turn reduces training interruptions.
Multi-task learning significantly lowers out-of-workspace failures to about 5-10% of the baseline across workspace sizes.


This review was generated by AI and checked by a human editor.