QUICK REVIEW

[Paper Review] HumanDiffusion: A Vision-Based Diffusion Trajectory Planner with Human-Conditioned Goals for Search and Rescue UAV

Faryal Batool, Iana Zhura|arXiv (Cornell University)|Jan 21, 2026

Robotics and Sensor-Based Localization | Citations: 0

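One-Line Summary

tldr: HumanDiffusion defines dynamic goals from YOLO-based human detection and uses a pixel-space diffusion model to generate safe, map-free UAV trajectories. It is evaluated in simulation and in real indoor trials, where it reaches an 80% mission success rate.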

ABSTRACT

Reliable human-robot collaboration in emergency scenarios requires autonomous systems that can detect humans, infer navigation goals, and operate safely in dynamic environments. This paper presents HumanDiffusion, a lightweight image-conditioned diffusion planner that generates human-aware navigation trajectories directly from RGB imagery. The system combines YOLO-11 based human detection with diffusion-driven trajectory generation, enabling a quadrotor to approach a target person and deliver medical assistance without relying on prior maps or computationally intensive planning pipelines. Trajectories are predicted in pixel space, ensuring smooth motion and a consistent safety margin around humans. We evaluate HumanDiffusion in simulation and real-world indoor mock-disaster scenarios. On a 300-sample test set, the model achieves a mean squared error of 0.02 in pixel-space trajectory reconstruction. Real-world experiments demonstrate an overall mission success rate of 80% across accident-response and search-and-locate tasks with partial occlusions. These results indicate that human-conditioned diffusion planning offers a practical and robust solution for human-aware UAV navigation in time-critical assistance settings.

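Motivation and Objectives

Enable autonomous, human-aware navigation in time-critical search-and-rescue scenarios without explicit maps or predefined goals.
Develop an end-to-end pipeline that couples vision-based human detection with diffusion-driven trajectory generation.
Achieve real-time pixel-space trajectory planning that approaches a person safely while maintaining a safety margin.
Demonstrate sim-to-real transfer by training on simulated data and evaluating in real indoor experiments.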

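Proposed Method

Integrate YOLO-based human detection that produces a dynamic goal point from RGB frames.
Use a conditional UNet diffusion model that generates pixel-space trajectories conditioned on the start point, the estimated goal, and the RGB image.
Implement a forward diffusion process with a squared-cosine noise schedule and DDPM-style reverse denoising to produce a clean trajectory mask.
Train with a composite loss that combines trajectory reconstruction and endpoint accuracy ($L_{path}$ and $L_{endpoint}$).
Generate 2D pixel-space trajectories and lift them to 3D paths in the world frame using depth and camera intrinsics for UAV execution.
Evaluate on simulated data (9,800 ground-truth trajectories: 8,000 training, 1,500 evaluation, 300 test) and in real indoor scenarios.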
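
The diffusion-related steps in this list can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the squared-cosine schedule takes the widely used form alpha_bar(t) = cos^2(((t/T + s) / (1 + s)) * pi/2) with offset s = 0.008, and that both loss terms are squared errors combined with a hypothetical weight `w_endpoint`. The function names, the flattened-list mask representation, and the weight are all illustrative; this review does not specify them.

```python
import math
import random


def cosine_alpha_bar(num_steps, s=0.008):
    """Cumulative signal level alpha_bar[t] for t = 0..num_steps (assumed schedule form)."""
    def f(t):
        return math.cos((t / num_steps + s) / (1.0 + s) * math.pi / 2.0) ** 2
    return [f(t) / f(0) for t in range(num_steps + 1)]


def cosine_betas(num_steps, max_beta=0.999):
    """Per-step noise variances beta_t, clipped so the final step stays below 1."""
    ab = cosine_alpha_bar(num_steps)
    return [min(1.0 - ab[t + 1] / ab[t], max_beta) for t in range(num_steps)]


def q_sample(x0, alpha_bar_t, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def composite_loss(pred_mask, true_mask, pred_end, true_end, w_endpoint=1.0):
    """Hypothetical L_path + w_endpoint * L_endpoint, both assumed to be squared errors."""
    l_path = sum((p - q) ** 2 for p, q in zip(pred_mask, true_mask)) / len(true_mask)
    l_endpoint = sum((p - q) ** 2 for p, q in zip(pred_end, true_end))
    return l_path + w_endpoint * l_endpoint


# Noise a tiny flattened "trajectory mask" halfway through a 100-step schedule.
alpha_bar = cosine_alpha_bar(100)
clean_mask = [0.0, 1.0, 1.0, 0.0]
noisy_mask, noise = q_sample(clean_mask, alpha_bar[50], random.Random(0))
```

In a DDPM-style setup the network would be trained to undo this noising given the conditioning inputs (start point, goal, RGB image); the reverse sampler is omitted here because its details are not given in this review.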

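Experimental Results

Research Questions

RQ1: Can a diffusion-based planner conditioned on RGB images generate reliable, map-free trajectories for human-centered SAR tasks?
RQ2: Does integrating YOLO-derived human goals with a diffusion trajectory generator yield robust performance under occlusion and partial visibility?
RQ3: How well does sim-to-real diffusion-based planning transfer to real indoor environments for time-critical assistance?
RQ4: What are the limitations and failure modes of vision-conditioned diffusion planning in dynamic human-robot collaboration scenarios?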

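Key Findings

On the 300-sample test set, the model achieves a mean squared error of 0.02 in pixel-space trajectory reconstruction.
In real-world trials, the overall mission success rate reached 80% across accident-response and search-and-locate tasks with partial occlusions.
Scenario 1 (accident response) achieved 9/10 successful deliveries.
Scenario 2 (search-and-locate in an occluded environment) succeeded in 7/10 trials.
Failures were attributed to perception loss (2 trials), controller tracking error (1 trial), and communication dropout (1 trial).
The system takes 0.2–0.3 s per frame for trajectory generation and uses a 1 m stopping margin for safety.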
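
To make the 1 m stopping margin concrete, the sketch below lifts pixel waypoints to 3D with the standard pinhole model and then cuts the path where it would enter the margin around the person. It is an illustration only: the intrinsics (`fx`, `fy`, `cx`, `cy`), the depth values, and the function names are hypothetical, the points stay in the camera frame (the camera-to-world transform is omitted), and this review does not say whether the authors enforce the margin this way.

```python
import math


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth into the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def truncate_at_margin(path_xyz, person_xyz, margin_m=1.0):
    """Keep waypoints until the first one that lies inside the margin around the person."""
    kept = []
    for point in path_xyz:
        if math.dist(point, person_xyz) < margin_m:
            break
        kept.append(point)
    return kept


# Hypothetical intrinsics for a 640x480 camera and a short pixel path with depths in metres.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0
pixel_path = [(320, 240, 0.5), (320, 240, 1.5), (320, 240, 2.5), (320, 240, 3.5)]
person = backproject(320, 240, 4.0, FX, FY, CX, CY)

path_3d = [backproject(u, v, d, FX, FY, CX, CY) for u, v, d in pixel_path]
safe_path = truncate_at_margin(path_3d, person, margin_m=1.0)
```

With these made-up numbers the last waypoint sits 0.5 m from the person and is dropped, so the path ends 1.5 m away, outside the margin.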


This review was generated by AI and checked by a human editor.