QUICK REVIEW

[Paper Review] Legged Locomotion in Challenging Terrains using Egocentric Vision

Ananye Agarwal, Ashish Kumar|arXiv (Cornell University)|Nov 14, 2022

Robotic Locomotion and Control | Cited by 31

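One-line summary

This paper proposes an end-to-end locomotion system for a small quadruped robot that uses egocentric depth vision. The policy is trained in simulation with a two-phase method on stairs, curbs, stepping stones, and gaps, then deployed in the real world.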

ABSTRACT

Animals are capable of precise and agile locomotion using vision. Replicating this ability has been a long-standing goal in robotics. The traditional approach has been to decompose this problem into elevation mapping and foothold planning phases. The elevation mapping, however, is susceptible to failure and large noise artifacts, requires specialized hardware, and is biologically implausible. In this paper, we present the first end-to-end locomotion system capable of traversing stairs, curbs, stepping stones, and gaps. We show this result on a medium-sized quadruped robot using a single front-facing depth camera. The small size of the robot necessitates discovering specialized gait patterns not seen elsewhere. The egocentric camera requires the policy to remember past information to estimate the terrain under its hind feet. We train our policy in simulation. Training has two phases - first, we train a policy using reinforcement learning with a cheap-to-compute variant of depth image and then in phase 2 distill it into the final policy that uses depth using supervised learning. The resulting policy transfers to the real world and is able to run in real-time on the limited compute of the robot. It can traverse a large variety of terrain while being robust to perturbations like pushes, slippery surfaces, and rocky terrain. Videos are at https://vision-locomotion.github.io

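Motivation and goals

Demonstrate end-to-end locomotion from egocentric depth, without elevation maps.
Enable a small quadruped to traverse diverse terrain, including stairs, gaps, and stepping stones.
Develop a two-phase training pipeline that uses a cheap proxy for depth to make learning efficient.
Show real-world transfer of a simulation-trained policy that is robust to perception noise and perturbations.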

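Proposed method

Two-phase training: phase 1 uses reinforcement learning with scandots, cheap terrain queries under the robot, to produce a reference policy π1; phase 2 distills π1 into a depth-based policy π2 via supervised learning.
Phase 1 uses PPO with recurrent memory (GRU) to map scandots, proprioception, and commanded velocity to target joint angles; phase 2 uses either a monolithic GRU-based policy or an RMA architecture that separates visual and proprioceptive inputs.
Phase 2 distills to onboard sensor inputs (depth d, proprioception x) in one of two ways: (a) a monolithic approach that preprocesses depth with a convnet and is trained with DAgger to imitate π1; or (b) an RMA approach that estimates γ (terrain geometry) and z (environment parameters) through a GRU and feeds them to a base MLP policy.
The training environment uses a terrain curriculum and parameter randomization to promote robustness; no gait priors are imposed, so gaits emerge during training.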
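
The phase 2 distillation step can be illustrated with a toy DAgger loop. This is a minimal sketch under strong simplifying assumptions, not the authors' implementation: the terrain is a 1-D sine profile, the teacher stands in for π1 and reads a privileged scandot, and the student stands in for π2 but is a plain linear model on a single depth value rather than a convnet with a GRU. All names and constants (`terrain_height`, `CAMERA_HEIGHT`, `CLEARANCE`, the step-size rule) are hypothetical.

```python
import math
import random

CAMERA_HEIGHT = 1.0  # hypothetical toy constant: camera height above the zero plane
CLEARANCE = 0.05     # hypothetical toy constant: foot clearance the teacher adds


def terrain_height(x):
    """Toy 1-D terrain profile (assumption, not from the paper)."""
    return 0.2 * math.sin(x)


def scandot(x):
    """Privileged terrain query, available only in simulation (phase 1 input)."""
    return terrain_height(x + 0.1)


def depth(x):
    """Toy stand-in for an onboard depth reading (phase 2 input)."""
    return CAMERA_HEIGHT - terrain_height(x + 0.1)


def teacher(x):
    """Stand-in for pi_1: target foot height computed from the scandot."""
    return scandot(x) + CLEARANCE


class Student:
    """Stand-in for pi_2: a linear map from depth to action."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def act(self, d):
        return self.w * d + self.b

    def fit(self, data):
        # Closed-form 1-D least squares on (depth, teacher_action) pairs.
        n = len(data)
        mean_d = sum(d for d, _ in data) / n
        mean_a = sum(a for _, a in data) / n
        var = sum((d - mean_d) ** 2 for d, _ in data)
        cov = sum((d - mean_d) * (a - mean_a) for d, a in data)
        self.w = cov / var if var > 0 else 0.0
        self.b = mean_a - self.w * mean_d


def dagger(iterations=5, steps=200, seed=0):
    rng = random.Random(seed)
    student = Student()
    dataset = []  # aggregated across all iterations
    for _ in range(iterations):
        x = rng.uniform(0.0, 2.0 * math.pi)
        for _ in range(steps):
            d = depth(x)
            a_student = student.act(d)
            # Label the state the student visited with the teacher's action.
            dataset.append((d, teacher(x)))
            # The student's own action drives the rollout (toy dynamics).
            x += 0.1 + 0.5 * abs(a_student)
        student.fit(dataset)  # supervised regression on the aggregate
    return student


if __name__ == "__main__":
    s = dagger()
    print(f"w={s.w:.3f}, b={s.b:.3f}")
```

The point of the sketch is the data flow: rollouts follow the student's actions, labels come from the teacher, and the student is refit on the growing dataset. In this toy the teacher's action is an exact linear function of the depth reading, so the student can match it; in the paper's setting the student has to infer terrain from depth images and memory.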

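Experimental results

Research questions

RQ1 Can a small quadruped traverse challenging terrain using egocentric depth vision, without elevation maps or gait priors?
RQ2 Can a two-phase training pipeline, RL with scandots followed by supervised distillation into depth-based control, achieve reliable sim-to-real transfer on hardware?
RQ3 How do the monolithic GRU-based and RMA architectures compare for end-to-end visuomotor control in legged locomotion?
RQ4 How does vision-based control affect robustness to perception noise and to perturbations such as pushes, slippery surfaces, and rocky terrain?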

Key findings

Terrain	RMA mean x-displacement (↑)	Monolithic mean x-displacement (↑)	Noisy mean x-displacement (↑)	Blind mean x-displacement (↑)	RMA mean time to fall (s)	Monolithic mean time to fall (s)	Noisy mean time to fall (s)	Blind mean time to fall (s)
Slopes	43.98	44.09	36.14	34.72	88.99	85.68	70.25	67.07
Stepping stones	18.83	20.72	1.09	1.02	34.30	41.32	2.51	2.49
Stairs	31.24	42.40	6.74	16.64	69.99	90.48	15.77	39.17
Discrete obstacles	40.13	28.64	29.08	32.41	85.17	57.53	59.30	66.33
Total	134.18	135.85	73.05	84.79	278.45	275.01	147.83	175.06
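
As a quick sanity check on the table, the x-displacement totals can be recomputed from the per-terrain rows and compared across methods. The sketch below only copies numbers from the table; the dictionary layout, the short keys (`"MLith"` for the monolithic policy) and the variable names are illustrative choices.

```python
# Mean x-displacement per terrain, copied from the table above.
x_disp = {
    "Slopes":             {"RMA": 43.98, "MLith": 44.09, "Noisy": 36.14, "Blind": 34.72},
    "Stepping stones":    {"RMA": 18.83, "MLith": 20.72, "Noisy": 1.09,  "Blind": 1.02},
    "Stairs":             {"RMA": 31.24, "MLith": 42.40, "Noisy": 6.74,  "Blind": 16.64},
    "Discrete obstacles": {"RMA": 40.13, "MLith": 28.64, "Noisy": 29.08, "Blind": 32.41},
}

# Totals as reported in the last row of the table.
reported_total = {"RMA": 134.18, "MLith": 135.85, "Noisy": 73.05, "Blind": 84.79}

# Recompute each column total from the four terrain rows.
totals = {m: sum(row[m] for row in x_disp.values()) for m in reported_total}

# Ratio of each method's total to the blind baseline's total.
ratio_vs_blind = {m: totals[m] / totals["Blind"] for m in totals}

if __name__ == "__main__":
    for m in totals:
        print(f"{m}: total={totals[m]:.2f}, vs blind={ratio_vs_blind[m]:.2f}x")
```

The recomputed totals match the reported row, and both vision-based policies cover roughly 1.6 times the blind baseline's total distance. The per-terrain rows show where that gap comes from: mostly stepping stones and stairs, while on discrete obstacles the monolithic policy is slightly below both baselines.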

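The proposed system achieves real-time control on a small quadruped that traverses stairs, curbs, stepping stones, and gaps using a single front-facing depth camera.
The two-phase training, phase 1 with scandots and phase 2 with supervised distillation, transfers successfully from simulation to the real world, and the phase 2 policy runs at 50 Hz on the limited onboard compute.
Both the monolithic GRU-based and RMA architectures outperform the blind and elevation-map-based baselines in aggregate over the simulated terrains.
In simulation, the vision-based methods travel markedly farther and stay up longer before falling on most terrains: about 20 m on stepping stones versus about 1 m for the baselines, and roughly 31 to 42 m on stairs versus 7 to 17 m. On discrete obstacles, RMA is clearly ahead of the baselines, while the monolithic policy is not.
In real-world trials, the policy achieved 100% success on upstairs, downstairs, and gaps, and 94% on stepping stones; the blind baseline failed on gaps and stepping stones.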
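
Running at 50 Hz leaves a budget of 20 ms per control step for sensing, inference, and actuation. The following is a minimal sketch of a fixed-rate loop that counts steps exceeding that budget; it is a generic pattern, not the robot's actual control stack, and `run_fixed_rate`, `step`, and the injectable `clock` and `sleep` parameters are hypothetical names introduced for illustration.

```python
import time


def run_fixed_rate(step, rate_hz=50.0, n_steps=10,
                   clock=time.monotonic, sleep=time.sleep):
    """Call `step` at a fixed rate and return how many steps missed their deadline."""
    period = 1.0 / rate_hz  # 0.02 s at 50 Hz
    overruns = 0
    next_deadline = clock() + period
    for _ in range(n_steps):
        step()  # e.g. read depth and proprioception, run the policy, send joint targets
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)  # wait out the rest of the period
        else:
            overruns += 1     # the step took longer than the period
        next_deadline += period
    return overruns


if __name__ == "__main__":
    # Dummy step that pretends inference takes about 5 ms.
    missed = run_fixed_rate(lambda: time.sleep(0.005), rate_hz=50.0, n_steps=20)
    print("missed deadlines:", missed)
```

Passing the clock and sleep functions as arguments keeps the loop testable without real delays. Deadlines advance by a fixed period rather than from the current time, so a single slow step does not shift the schedule of later ones.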

This review was generated by AI and checked by a human editor.