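[Paper Review] Learning to Walk in the Real World with Minimal Human Effort
This paper proposes an autonomous real-world RL system that combines multi-task learning with safety-constrained SAC to train walking policies on multiple terrains with minimal human intervention. The system learns efficiently on a Minitaur robot in the real world and reduces the number of falls.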
Reliable and stable locomotion has been one of the most fundamental challenges for legged robots. Deep reinforcement learning (deep RL) has emerged as a promising method for developing such control policies autonomously. In this paper, we develop a system for learning legged locomotion policies with deep RL in the real world with minimal human effort. The key difficulties for on-robot learning systems are automatic data collection and safety. We overcome these two challenges by developing a multi-task learning procedure and a safety-constrained RL framework. We tested our system on the task of learning to walk on three different terrains: flat ground, a soft mattress, and a doormat with crevices. Our system can automatically and efficiently learn locomotion skills on a Minitaur robot with little human intervention. The supplemental video can be found at: https://youtu.be/cwyiq6dCgOc.
Motivation and Objectives
- Reduce human effort required for real-world legged robot learning by automating data collection, resets, and safety in training.
- Enable learning of multiple locomotion skills (forward, backward, turns) within a single training session.
- Demonstrate robust learning across diverse terrains (flat ground, a soft mattress, and a doormat with crevices).
- Show that safety-constrained RL reduces falls and training interruptions in real-world data collection.
Proposed Method
- Adopt a multi-task RL framework with a task scheduler that selects walking directions to keep the robot within the training workspace.
- Define per-task rewards that encode desired direction of motion relative to episode start with a three-dimensional task vector w^i.
- Train separate SAC policies for each task without shared actors/critics to avoid cross-task interference.
- Introduce a safety constraint as a constrained MDP using a Lagrangian multiplier to bound torso pitch/roll during training.
- Optimize using dual gradient descent to jointly update policy and the Lagrangian multiplier.
- Use early termination near workspace boundaries with safety-aware return calculations to prevent excessive resets.
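The scheduler, per-task reward, and multiplier update described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the task-vector values, the "point toward the workspace center" selection rule, the margin-based safety signal, and the learning rate are all hypothetical choices made for this example. Only forward and backward tasks are included, since pure turning tasks have no translation component to score.

```python
import math

# Hypothetical task vectors w^i = (w_x, w_y, w_yaw); values are illustrative only.
TASKS = {
    "forward": (1.0, 0.0, 0.0),
    "backward": (-1.0, 0.0, 0.0),
}


def task_reward(w, dx, dy, dyaw):
    """Reward as the dot product of the task vector with the displacement
    (dx, dy, dyaw), expressed in the frame of the episode's starting pose."""
    return w[0] * dx + w[1] * dy + w[2] * dyaw


def select_task(x, y, yaw, tasks=TASKS):
    """Assumed scheduling rule: pick the task whose desired walking direction
    points most directly at the workspace center (taken to be the origin)."""
    dist = math.hypot(x, y)
    if dist == 0.0:
        return next(iter(tasks))  # already at the center: any task will do
    to_center = (-x / dist, -y / dist)

    def score(name):
        w_x, w_y, _ = tasks[name]
        # Rotate the body-frame direction (w_x, w_y) into the world frame.
        dir_x = w_x * math.cos(yaw) - w_y * math.sin(yaw)
        dir_y = w_x * math.sin(yaw) + w_y * math.cos(yaw)
        return dir_x * to_center[0] + dir_y * to_center[1]

    return max(tasks, key=score)


def safety_margin(pitch, roll, pitch_limit, roll_limit):
    """Assumed safety signal: positive while torso pitch and roll (radians)
    are inside their bounds, negative once either bound is exceeded."""
    return min(pitch_limit - abs(pitch), roll_limit - abs(roll))


def update_multiplier(lam, mean_margin, lr=0.01):
    """One dual gradient step on the Lagrangian multiplier: it grows when the
    average margin is negative (constraint violated) and shrinks otherwise,
    and it is clipped so that it never becomes negative."""
    return max(0.0, lam - lr * mean_margin)
```

For example, a robot at (1, 0) facing the +x direction is heading away from the center, so `select_task(1.0, 0.0, 0.0)` chooses `"backward"`. In a full training loop the multiplier update would alternate with SAC policy updates whose objective includes the multiplier-weighted safety term; that part is omitted here.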
Experimental Results
Research Questions
- RQ1: Can autonomous real-world RL learn legged locomotion with minimal human intervention on multiple terrains?
- RQ2: Is it feasible to train multiple locomotion policies simultaneously in real-world settings?
- RQ3: Does a safety-constrained RL framework reduce the number of falls and training disruptions compared to standard methods?
Key Findings
- The system learns to walk on flat ground, a soft mattress, and a doormat with crevices with minimal human intervention (zero manual resets in two of three flat-ground runs).
- Four locomotion policies (forward, backward, turn left, turn right) can be learned in a single session, enabling a complete directional walking controller at test time.
- On flat terrain, training two policies (forward and backward) required about 1.5 hours (~60k steps per policy).
- On challenging surfaces, forward and backward walking were learned in 5.5 hours (mattress, ~200k steps) and 4.5 hours (doormat, ~150k steps).
- The safety-constrained SAC reduces falls substantially compared with SAC without safety constraints (roughly 40 falls vs. >100 in a baseline with no safety constraint), reducing training interruptions.
- Multi-task learning significantly lowers out-of-workspace failures to about 5-10% of the baseline across workspace sizes.
This review was generated by AI and checked by a human editor.