QUICK REVIEW

[Paper Review] DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents

Namhoon Lee, Wongun Choi|arXiv (Cornell University)|Apr 14, 2017

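Reinforcement Learning in Robotics | References: 49 | Citations: 75

One-Line Summary

DESIRE is a deep stochastic IOC-RNN encoder-decoder that predicts diverse, long-horizon futures for multiple interacting agents by combining CVAE-sampled hypotheses, inverse optimal control (IOC)-based ranking, scene-context fusion, and iterative refinement.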

ABSTRACT

We introduce a Deep Stochastic IOC RNN Encoder-Decoder framework, DESIRE, for the task of future predictions of multiple interacting agents in dynamic scenes. DESIRE effectively predicts future locations of objects in multiple scenes by 1) accounting for the multi-modal nature of the future prediction (i.e., given the same context, future may vary), 2) foreseeing the potential future outcomes and making a strategic prediction based on that, and 3) reasoning not only from the past motion history, but also from the scene context as well as the interactions among the agents. DESIRE achieves these in a single end-to-end trainable neural network model, while being computationally efficient. The model first obtains a diverse set of hypothetical future prediction samples employing a conditional variational autoencoder, which are ranked and refined by the following RNN scoring-regression module. Samples are scored by accounting for accumulated future rewards, which enables better long-term strategic decisions similar to IOC frameworks. An RNN scene context fusion module jointly captures past motion histories, the semantic scene context and interactions among multiple agents. A feedback mechanism iterates over the ranking and refinement to further boost the prediction accuracy. We evaluate our model on two publicly available datasets: KITTI and Stanford Drone Dataset. Our experiments show that the proposed model significantly improves the prediction accuracy compared to other baseline methods.
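The inference flow described in the abstract (draw several hypotheses, score them by accumulated future reward, refine them with predicted displacements, and repeat) can be sketched as a control-flow skeleton. This is a minimal sketch under heavy assumptions, not the authors' implementation: `sample_hypotheses`, `score`, `refine`, and `desire_like_inference` are hypothetical stand-ins for the CVAE decoder, the IOC scoring head, the regression head, and the overall loop, and the hand-written "negative distance to a goal point" reward replaces the reward that DESIRE learns.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_hypotheses(past, k, horizon, latent_dim=4):
    """Hypothetical stand-in for the CVAE decoder at test time: draw K latent
    codes z ~ N(0, I) and turn each one into a future trajectory."""
    z = rng.standard_normal((k, latent_dim))
    velocity = past[-1] - past[-2]                    # last observed step, (2,)
    steps = np.arange(1, horizon + 1)[None, :, None]  # (1, T, 1)
    offset = 0.5 * z[:, None, :2]                     # (K, 1, 2), one per sample
    # Constant-velocity extrapolation, perturbed differently for each z.
    return past[-1] + steps * (velocity + offset)     # (K, T, 2)


def score(hyps, goal):
    """Hypothetical stand-in for the IOC scoring RNN: sum a per-step reward
    over the whole horizon (accumulated future reward). The reward here is
    simply the negative distance to an assumed goal point."""
    per_step_reward = -np.linalg.norm(hyps - goal, axis=-1)  # (K, T)
    return per_step_reward.sum(axis=1)                       # (K,)


def refine(hyps, goal, rate=0.2):
    """Hypothetical stand-in for the regression head: predict a displacement
    for every future step and add it to the hypothesis."""
    displacement = rate * (goal - hyps)
    return hyps + displacement


def desire_like_inference(past, goal, k=8, horizon=10, iterations=3):
    hyps = sample_hypotheses(past, k, horizon)
    for _ in range(iterations):
        # Feedback loop: the refined samples are fed back for another pass.
        hyps = refine(hyps, goal)
    scores = score(hyps, goal)
    order = np.argsort(-scores)  # highest accumulated reward first
    return hyps[order], scores[order]


past = np.array([[0.0, 0.0], [1.0, 0.5]])  # two observed (x, y) positions
goal = np.array([10.0, 5.0])               # assumed goal for the toy reward
ranked, scores = desire_like_inference(past, goal)
print(ranked.shape, scores[:3])
```

The skeleton scores only once, after the last refinement pass, to keep the control flow visible; a learned model would typically emit scores and displacements together on every pass.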

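Motivation and Objectives

Advance accurate distant-future prediction in dynamic scenes containing multiple interacting agents.
Develop an end-to-end trainable framework that captures multimodality and long-term rewards.
Improve prediction quality by incorporating past motion, scene context, and interactions between agents.
Generate multiple plausible future trajectories and refine them through iterative feedback.
Make the approach applicable and scalable to both autonomous driving and aerial surveillance settings.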

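Proposed Method

Diverse sample generation with a conditional variational autoencoder (CVAE) that produces multiple hypotheses of future trajectories from past trajectories.
IOC-based ranking and refinement that scores samples by their accumulated future rewards and iteratively adjusts the predictions.
SCF (Scene Context Fusion) integrates past motion, CNN-based scene context, and interactions between agents into the RNN decoding.
A GRU-based RNN encoder-decoder architecture that encodes past trajectories (with the scene represented by CNN features) and decodes multiple future samples.
An iterative feedback loop that refines samples with predicted displacements so that they better fit the long-term reward.
Joint optimization combining a reconstruction loss, a KL-divergence (KLD) loss, a cross-entropy loss for sample ranking, and a regression loss for refinement.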
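The joint objective in the last item can be written down term by term. The sketch below is a NumPy illustration under stated assumptions, not the paper's exact loss: the four terms are summed with equal (hypothetical) weights, errors are mean per-step L2 distances, the KL term uses the closed form for a diagonal Gaussian posterior against a standard normal prior, and the ranking target is assumed to be a softmax over negative per-sample errors, so that samples closer to the ground truth receive more probability mass. The function names are illustrative only.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - 1.0 - logvar)


def log_softmax(x):
    x = x - x.max()  # shift for numerical stability
    return x - np.log(np.exp(x).sum())


def desire_style_loss(y_true, samples, mu, logvar, score_logits, displacements):
    """
    y_true:        (T, 2)    ground-truth future trajectory
    samples:       (K, T, 2) CVAE hypotheses
    mu, logvar:    (D,)      recognition-network posterior parameters
    score_logits:  (K,)      ranking-head outputs
    displacements: (K, T, 2) regression-head outputs
    Equal weights on the four terms are an assumption of this sketch.
    """
    # Mean per-step L2 error of each hypothesis, shape (K,).
    err = np.linalg.norm(samples - y_true, axis=-1).mean(axis=1)
    recon = err.mean()                     # reconstruction loss
    kld = gaussian_kl(mu, logvar)          # KLD loss
    target = np.exp(log_softmax(-err))     # assumed ranking target distribution
    ce = -np.sum(target * log_softmax(score_logits))  # cross-entropy ranking loss
    refined = samples + displacements
    regress = np.linalg.norm(refined - y_true, axis=-1).mean()  # regression loss
    parts = {"recon": recon, "kld": kld, "ce": ce, "regress": regress}
    return recon + kld + ce + regress, parts


rng = np.random.default_rng(0)
T, K, D = 12, 5, 8
total, parts = desire_style_loss(
    y_true=np.zeros((T, 2)),
    samples=rng.normal(size=(K, T, 2)),
    mu=rng.normal(size=D),
    logvar=np.zeros(D),
    score_logits=rng.normal(size=K),
    displacements=np.zeros((K, T, 2)),
)
print(total, parts)
```

Returning the individual terms alongside the total makes it easy to check which part of the objective dominates during training.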

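Experimental Results

Research Questions

RQ1: Can DESIRE generate diverse, multimodal future trajectories for multiple interacting agents under varied scene contexts?
RQ2: Does incorporating scene context and interactions between agents improve long-term prediction accuracy?
RQ3: Do IOC-based ranking and iterative refinement yield more accurate and stable predictions than deterministic or reactive baselines?
RQ4: How does the model perform in driving (KITTI) and aerial surveillance (Stanford Drone Dataset) scenarios?
RQ5: How do the number of samples and iterative feedback affect prediction quality?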

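Key Findings

DESIRE substantially improves future trajectory prediction over linear and RNN baselines on KITTI and SDD (Stanford Drone Dataset).
CVAE-based sampling captures diversity, and drawing more samples improves oracle (best-sample) prediction accuracy.
Incorporating scene context and agent interactions through SCF improves accuracy over scene-agnostic variants.
Iterative regression refinement steadily reduces prediction error and strengthens long-term prediction.
DESIRE-S (semantic scene context only) and DESIRE-SI (scene context plus interactions) perform more strongly when multiple agents are present, most notably on SDD.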
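The observation that more samples improve oracle accuracy follows almost by construction when the oracle metric keeps the best of the first K samples: adding samples can never make the minimum larger. A minimal sketch, assuming the oracle error is the mean per-step L2 error of the single best trajectory (the paper's exact protocol, such as reporting errors at fixed time horizons, may differ); `oracle_error` is a hypothetical helper and the data are synthetic, purely for illustration.

```python
import numpy as np


def oracle_error(samples, y_true, k):
    """Best-of-k error: mean per-step L2 error of whichever of the first k
    sampled trajectories is closest to the ground truth."""
    per_sample = np.linalg.norm(samples[:k] - y_true, axis=-1).mean(axis=1)  # (k,)
    return per_sample.min()


# Synthetic example: a straight-line ground truth and 50 noisy hypotheses.
rng = np.random.default_rng(0)
y_true = np.cumsum(np.ones((20, 2)), axis=0)
samples = y_true + rng.normal(scale=2.0, size=(50, 20, 2))
for k in (1, 10, 50):
    print(k, oracle_error(samples, y_true, k))  # never increases as k grows
```

Because each larger k only adds candidates to the same pool, oracle error measures coverage of the sampler rather than the quality of the top-ranked prediction, which is why ranking accuracy is usually reported separately.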


This review was generated by AI and checked by a human editor.