QUICK REVIEW

[Paper Review] Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models

Pierre Schumacher, Thomas Geijtenbeek|arXiv (Cornell University)|Sep 6, 2023

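Robotic Locomotion and Control|Cited by 10

One-line summary

This paper shows that reinforcement learning without demonstrations can produce robust, natural walking in high-dimensional musculoskeletal models. It uses an adaptive reward and evaluates the approach across multiple 2D/3D models and simulators.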

ABSTRACT

Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem considering stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on particular motions at a time and the resulting controllers are limited when compensating for perturbations. In robotics, reinforcement learning (RL) methods recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its incredible robustness might pave the way for a novel approach to studying human walking in complex natural environments. Videos: https://sites.google.com/view/naturalwalkingrl

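Motivation and Objectives

To achieve robust musculoskeletal walking with RL without relying on motion-capture demonstrations.
To design a reward function built from biologically plausible objectives (velocity, effort, pain) that produces human-like gait.
To show that the method transfers across multiple models and simulation engines without any change to the reward.
To show robustness to perturbations and uneven terrain that exceeds that of conventional reflex-based controllers.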

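Proposed Method

Adopts the DEP-RL framework with an adaptive, constraint-inspired reward to learn muscle-driven walking.
Reward terms: r_vel (reward for tracking a target center-of-mass velocity of about 1.2 m/s), c_effort (cost on muscle activity and on the smoothness of excitations), and c_pain (penalties on joint-limit violations and on ground reaction force (GRF) loading).
Implements a performance-based adaptive weight alpha(t) on the effort term, and relabels off-policy data so that stored rewards stay consistent with the current weight.
Together, the three terms promote natural walking: maintaining velocity, energy efficiency, and safety (avoiding unnatural joint loading).
Tests on multiple models (from the 2D planar H0918 to the high-dimensional MyoLeg) and two simulation engines (Hyfydy and MuJoCo/MyoSuite).
Initializes agents from randomized states and clips muscle stimulation to reduce initial bias and encourage realistic energy use.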
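The reward terms above can be sketched in Python as follows. This is a minimal sketch, not the paper's definitions: apart from the ~1.2 m/s target velocity, every functional form, function name (velocity_reward, effort_cost, pain_cost, total_reward, relabel), and constant (width, smooth_weight, grf_limit, pain_weight) is a hypothetical placeholder. It assumes the total reward is r_vel minus alpha(t) times c_effort minus a weighted c_pain, and that the replay buffer stores the three components separately, so rewards of old transitions can be recomputed whenever alpha changes.

```python
import numpy as np

# HYPOTHETICAL sketch: functional forms and constants are placeholders,
# not the definitions used in the paper. Only V_TARGET comes from the review.
V_TARGET = 1.2  # target center-of-mass velocity [m/s]


def velocity_reward(com_vel, v_target=V_TARGET, width=0.5):
    # Hypothetical Gaussian-shaped bonus that peaks (value 1) at the target velocity.
    return float(np.exp(-((com_vel - v_target) / width) ** 2))


def effort_cost(activations, excitations, prev_excitations, smooth_weight=0.1):
    # Hypothetical: mean cubed muscle activation plus a penalty on
    # step-to-step changes in excitation (the "smoothness" part).
    activity = np.mean(np.asarray(activations) ** 3)
    change = np.asarray(excitations) - np.asarray(prev_excitations)
    return float(activity + smooth_weight * np.mean(change ** 2))


def pain_cost(joint_limit_torques, grf, body_weight, grf_limit=1.2):
    # Hypothetical: magnitude of joint-limit torques plus any GRF
    # exceeding grf_limit body weights.
    limit_term = np.sum(np.abs(joint_limit_torques))
    excess = np.maximum(np.asarray(grf) / body_weight - grf_limit, 0.0)
    return float(limit_term + np.sum(excess))


def total_reward(r_vel, c_effort, c_pain, alpha, pain_weight=1.0):
    # Assumed combination: r = r_vel - alpha(t) * c_effort - pain_weight * c_pain.
    # Works element-wise on NumPy arrays as well as on scalars.
    return r_vel - alpha * c_effort - pain_weight * c_pain


def relabel(batch, alpha):
    # Off-policy transitions were stored under an older alpha; recompute their
    # rewards with the current alpha so the critic sees one consistent objective.
    return total_reward(batch["r_vel"], batch["c_effort"], batch["c_pain"], alpha)


# Usage: relabel a replay batch after alpha has been raised to 2.0.
batch = {
    "r_vel": np.array([1.0, 0.5]),
    "c_effort": np.array([0.1, 0.2]),
    "c_pain": np.array([0.0, 0.1]),
}
print(relabel(batch, alpha=2.0))
```

Storing the components instead of a single scalar reward is the design choice that makes relabeling cheap: no simulator replay is needed, only a re-weighting.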

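Experimental Results

Research Questions

RQ1: Can RL generate natural and robust walking in high-dimensional musculoskeletal models without demonstrations?
RQ2: Does an adaptive, biologically motivated reward yield human-like gait kinematics and GRFs across diverse models and simulators?
RQ3: How robust are the learned policies to perturbations and uneven terrain not seen during training?
RQ4: To what extent does the same training setup generalize across model complexities and biomechanical simulators?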

Key Findings

Controller	Model	Mean effort	Fit to experimental data	Mean distance [m]
reflex	H0918	0.041 ± 3×10^-3	0.68 ± 0.08	2.46 ± 0.98
RL	H0918	0.013 ± 3×10^-4	0.67 ± 0.03	10.42 ± 0.94
RL	H1622	0.015 ± 2×10^-3	0.73 ± 0.01	5.6 ± 0.99
RL	H2190	0.017 ± 1×10^-5	0.50 ± 0.01	10.59 ± 2.51
RL	MyoLeg	0.013 ± 2×10^-4	0.43 ± 0.05	n.a.
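Only H0918 has both a reflex row and an RL row, so it is the one like-for-like comparison in the table. A quick calculation on the mean values puts the gap in concrete terms. This sketch ignores the reported spreads, and the review does not state the protocol behind "mean distance" (for example, which perturbation or terrain setting), so the ratios are indicative only.

```python
# Mean values copied from the table (H0918 rows only); spreads are ignored.
reflex_h0918 = {"effort": 0.041, "distance_m": 2.46}
rl_h0918 = {"effort": 0.013, "distance_m": 10.42}

# How many times more effort the reflex controller spends than the RL policy.
effort_ratio = reflex_h0918["effort"] / rl_h0918["effort"]
# How many times farther the RL policy travels than the reflex controller.
distance_ratio = rl_h0918["distance_m"] / reflex_h0918["distance_m"]

print(f"reflex/RL effort: {effort_ratio:.2f}x")  # reflex/RL effort: 3.15x
print(f"RL/reflex distance: {distance_ratio:.2f}x")  # RL/reflex distance: 4.24x
```

On H0918, then, the RL policy uses roughly a third of the reflex controller's mean effort and covers a bit more than four times the distance, while its fit to experimental data is about the same (0.67 vs. 0.68).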

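The RL policies achieved gaits whose kinematics and GRFs were close to experimental human data, closer than several previous RL approaches.
Without any change to the reward, learning produced robust walking on four models (2D and 3D, up to 90 muscles) and in two simulation engines.
On both flat and perturbed terrain, the RL policies were more robust than reflex-based control on comparable tasks.
The adaptive effort weight produced energy-efficient gaits and avoided the brittleness caused by fixed cost schedules.
Even the high-dimensional models (80 and 90 muscles) maintained natural-looking gaits, although some artifacts remained that are attributable to limits in the accuracy of the biomechanical models.
The approach required minimal hyperparameter tuning and no motion-capture data during training.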
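The review credits the performance-based effort weight alpha(t) with avoiding the brittleness of fixed cost schedules, but it does not give the update rule. The sketch below assumes a hypothetical additive rule: while a moving average of some performance signal (for example, episode-mean r_vel) stays at or above a threshold, alpha is raised to push the agent toward efficiency; when performance slips, alpha is lowered so the task can be relearned. The class AdaptiveEffortWeight, the function fixed_schedule, and all parameters (threshold, step, window, alpha_max) are illustrative assumptions, not the paper's implementation. A fixed linear ramp is included for contrast.

```python
from collections import deque


class AdaptiveEffortWeight:
    """HYPOTHETICAL performance-based schedule for the effort weight alpha(t)."""

    def __init__(self, threshold=0.8, step=0.01, window=100, alpha_max=1.0):
        self.alpha = 0.0  # start with no effort penalty
        self.threshold = threshold
        self.step = step
        self.alpha_max = alpha_max
        self.history = deque(maxlen=window)  # recent performance samples

    def update(self, performance):
        """Record one performance sample and return the adapted alpha."""
        self.history.append(performance)
        mean_perf = sum(self.history) / len(self.history)
        if mean_perf >= self.threshold:
            # Task is solved well enough: tighten the effort penalty.
            self.alpha = min(self.alpha + self.step, self.alpha_max)
        else:
            # Performance dropped: relax the penalty so walking can recover.
            self.alpha = max(self.alpha - self.step, 0.0)
        return self.alpha


def fixed_schedule(step_index, total_steps, alpha_max=1.0):
    """Fixed linear ramp for contrast: ignores how well the agent is doing."""
    return alpha_max * min(step_index / total_steps, 1.0)


# Usage: alpha rises while performance is good and backs off when it drops.
schedule = AdaptiveEffortWeight(threshold=0.8, step=0.1, window=1)
for perf in [0.9, 0.9, 0.5]:
    print(schedule.update(perf))
```

The contrast illustrates the brittleness argument: a fixed ramp keeps increasing the penalty even if the agent starts falling, whereas the adaptive rule backs off. Because alpha changes during training, rewards stored under older values go stale, which is why the method relabels off-policy data.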

This review was generated by AI and checked by a human editor.