QUICK REVIEW

[Paper Review] Query-Efficient Imitation Learning for End-to-End Autonomous Driving

Jiakai Zhang, Kyunghyun Cho|arXiv (Cornell University)|May 20, 2016

Reinforcement Learning in Robotics | References: 15 | Citations: 115

One-Line Summary

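SafeDAgger extends DAgger with a safety policy that cuts the number of queries to the reference policy. This makes end-to-end autonomous driving more query-efficient and yields faster, safer convergence in the TORCS simulator.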

ABSTRACT

One way to approach end-to-end autonomous driving is to learn a policy function that maps from a sensory input, such as an image frame from a front-facing camera, to a driving action, by imitating an expert driver, or a reference policy. This can be done by supervised learning, where a policy function is tuned to minimize the difference between the predicted and ground-truth actions. A policy function trained in this way however is known to suffer from unexpected behaviours due to the mismatch between the states reachable by the reference policy and trained policy functions. More advanced algorithms for imitation learning, such as DAgger, addresses this issue by iteratively collecting training examples from both reference and trained policies. These algorithms often requires a large number of queries to a reference policy, which is undesirable as the reference policy is often expensive. In this paper, we propose an extension of the DAgger, called SafeDAgger, that is query-efficient and more suitable for end-to-end autonomous driving. We evaluate the proposed SafeDAgger in a car racing simulator and show that it indeed requires less queries to a reference policy. We observe a significant speed up in convergence, which we conjecture to be due to the effect of automated curriculum learning.

Motivation and Objectives

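Motivates end-to-end autonomous driving through imitation learning from a reference policy.
Addresses DAgger's high query cost when the reference policy is expensive (e.g., a human driver).
Proposes SafeDAgger, a query-efficient extension of DAgger with a safety policy that minimizes queries to the reference policy.
Demonstrates in the TORCS simulator that SafeDAgger converges faster and reduces crashes and damage.
Highlights the automated curriculum learning effect of subset selection guided by the safety assessment.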

Proposed Method

Introduces a safety policy that predicts, without querying the reference policy, whether the primary policy is likely to deviate from it.
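Defines the deviation ε(π, π*, φ(s)) = ||π(φ(s)) − π*(φ(s))||^2 and a threshold τ, which together form π_safe*: a state is labeled safe (1) when the deviation is at most τ and unsafe (0) otherwise.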
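Integrates the safety policy into the SafeDAgger loop so that the reference policy is queried only on the "hard examples" for which the safety policy returns 0.
Restricts the states queried during data collection through subset selection, yielding data efficiency and curriculum-like learning.
Retains a DAgger-like learn-and-explore framework, updating both the primary policy and the safety policy at every iteration.
Applies a deep CNN primary policy to TORCS to predict driving outputs (steering, braking, affordances), while the safety policy predicts whether driving is safe or unsafe.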
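
The labeling and query-gating steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`deviation`, `safe_labels`, `select_queries`, `aggregate`) are hypothetical, actions are treated as plain NumPy vectors, and `safety_predictions` stands in for the output of a trained safety classifier that is not shown here.

```python
import numpy as np

def deviation(primary_actions, reference_actions):
    """Squared L2 distance between primary and reference actions, one value per state."""
    diff = (np.asarray(primary_actions, dtype=float)
            - np.asarray(reference_actions, dtype=float))
    return np.sum(diff ** 2, axis=-1)

def safe_labels(primary_actions, reference_actions, tau):
    """Training targets for the safety policy: 1 = safe (deviation <= tau), 0 = unsafe."""
    return (deviation(primary_actions, reference_actions) <= tau).astype(int)

def select_queries(safety_predictions):
    """Indices of states where the safety policy predicted 0 (the hard examples)."""
    return np.flatnonzero(np.asarray(safety_predictions) == 0)

def aggregate(dataset, features, reference_policy, safety_predictions):
    """Append (feature, reference action) pairs for flagged states only.

    `reference_policy` is a hypothetical callable standing in for the expert.
    Returns the number of reference queries spent on this batch.
    """
    idx = select_queries(safety_predictions)
    for i in idx:
        dataset.append((features[i], reference_policy(features[i])))
    return len(idx)
```

In this sketch the reference policy is called only inside `aggregate`, and only for states flagged as unsafe, which is where the query savings come from. The labels from `safe_labels` need reference actions, so they would be computed on data that already carries them and then used to train the safety classifier.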

Experimental Results

Research Questions

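RQ1: Can SafeDAgger reduce the number of queries to the reference policy in end-to-end driving compared with conventional DAgger?
RQ2: Can SafeDAgger converge faster than supervised learning and DAgger in a simulated driving environment, and improve driving performance (fewer crashes, less damage)?
RQ3: Does the safety policy enable a meaningful automated curriculum that improves data efficiency and policy quality?
RQ4: How does SafeDAgger perform in TORCS with and without traffic?
RQ5: Can the safety policy concept be generalized to imitation learning frameworks other than DAgger?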

Key Findings

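SafeDAgger requires substantially fewer queries to the reference policy during training than the original DAgger.
After three iterations, the policy trained with SafeDAgger achieves near-perfect driving in the TORCS setup.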
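The safety policy shortens the time the reference policy is used at test time, showing reductions of 7.11% without traffic and 10.81% with traffic in the early stage.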
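In the reported setup, about 77.70% of the training examples are considered safe.
SafeDAgger converges faster than vanilla DAgger and shows a clearer downward trend in reliance on the reference policy.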

This review was generated by AI and checked by a human editor.