QUICK REVIEW

[Paper Review] Voice-Driven Semantic Perception for UAV-Assisted Emergency Networks

Nuno Saavedra, Patrick Rasmussen Ribeiro|arXiv (Cornell University)|Feb 19, 2026

UAV Applications and Optimization | Citations: 0

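One-line summary

SIREN converts unstructured emergency voice communications into structured, machine-readable semantic output, supporting UAV-assisted network management through ASR, LLMs, and NLP validation; evaluation on synthetic scenarios shows both its feasibility and its main limitations.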

ABSTRACT

Unmanned Aerial Vehicle (UAV)-assisted networks are increasingly foreseen as a promising approach for emergency response, providing rapid, flexible, and resilient communications in environments where terrestrial infrastructure is degraded or unavailable. In such scenarios, voice radio communications remain essential for first responders due to their robustness; however, their unstructured nature prevents direct integration with automated UAV-assisted network management. This paper proposes SIREN, an AI-driven framework that enables voice-driven perception for UAV-assisted networks. By integrating Automatic Speech Recognition (ASR) with Large Language Model (LLM)-based semantic extraction and Natural Language Processing (NLP) validation, SIREN converts emergency voice traffic into structured, machine-readable information, including responding units, location references, emergency severity, and Quality-of-Service (QoS) requirements. SIREN is evaluated using synthetic emergency scenarios with controlled variations in language, speaker count, background noise, and message complexity. The results demonstrate robust transcription and reliable semantic extraction across diverse operating conditions, while highlighting speaker diarization and geographic ambiguity as the main limiting factors. These findings establish the feasibility of voice-driven situational awareness for UAV-assisted networks and show a practical foundation for human-in-the-loop decision support and adaptive network management in emergency response operations.

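Research motivation and objectives

Motivate robust, adaptive emergency communications for situations where terrestrial infrastructure is degraded or unavailable.
Extract structured semantic information from voice traffic to enable human-in-the-loop UAV network management.
Bridge unstructured voice exchanges and programmable network decisions through a cognitive layer.
Evaluate the robustness and limits of voice-driven semantic perception under diverse language and acoustic conditions.
Provide a practical foundation for adaptive UAV placement and resource allocation using voice-derived context.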

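Proposed method

Proposes SIREN as a modular AI pipeline that converts emergency voice traffic into structured output suitable for UAV management.
Integrates ASR with Large Language Models (LLMs) so that semantic extraction runs under a schema-constrained prompting regime.
Applies deterministic NLP validation (NER, speaker diarization, sentiment analysis) to audit and refine the LLM output.
Generates a JSON Lines-style structured representation containing locations, units, emergency_level, QoS expectations, and related fields, and integrates it with UAV control and planning systems.
Maps validated location entities to coordinates and visualizes the results through an interactive georeferenced interface.
Evaluates the pipeline on a synthetic multi-scenario dataset that varies language, speaker count, background noise, and message complexity.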
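To make the idea of a schema-constrained, machine-readable record more concrete, the following is a minimal sketch of how one JSON line produced by an LLM could be checked before it reaches a UAV planner. The field names other than emergency_level, the allowed severity values, and the function `validate_record` are illustrative assumptions, not the schema or code used in the paper.

```python
import json
from typing import List, Optional, Tuple

# Hypothetical schema: field names and allowed values are assumptions
# made for illustration, not taken from the paper.
ALLOWED_LEVELS = {"low", "medium", "high", "critical"}
REQUIRED_FIELDS = {
    "units": list,
    "locations": list,
    "emergency_level": str,
    "qos": dict,
}


def validate_record(raw: str) -> Tuple[Optional[dict], List[str]]:
    """Parse one JSON line and return (record, errors).

    The record is returned only when no errors were found.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value must be an object"]

    errors: List[str] = []
    # Check presence and type of every required field.
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}")

    # Check the severity label against the closed vocabulary.
    level = record.get("emergency_level")
    if isinstance(level, str) and level not in ALLOWED_LEVELS:
        errors.append(f"unknown emergency_level: {level}")

    return (record if not errors else None), errors
```

A deterministic check of this kind rejects malformed or out-of-vocabulary output instead of passing it downstream, which is one plausible reading of the "audit and refine" role described above.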

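Experimental results

Research questions

RQ1: Can voice-derived semantic perception provide reliable input for UAV-assisted emergency network management?
RQ2: How robust is the SIREN pipeline to language variation, speaker similarity, and background noise?
RQ3: What are the main bottlenecks (e.g., diarization, geocoding) for air-ground coordination based on voice-driven perception?
RQ4: To what extent can the structured output support human decision-making and adaptive network control?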

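Key findings

SIREN achieves robust transcription and semantic extraction under synthetic conditions.
Speaker diarization and geographic ambiguity are the main limiting factors in multi-speaker and ambiguous-location scenarios.
API-based transcription outperforms local offline models under noisy conditions, and both improve with higher-capacity backends.
Geographic ambiguity and the constraints of external services can degrade coordinate accuracy even when the text extraction itself is correct.
LLM-based semantic processing combined with deterministic NLP validation produces structured JSON output suitable for UAV positioning and resource allocation tasks.
Execution time grows with scenario complexity, and LLM inference becomes the dominant cost as complexity increases.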
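The geographic ambiguity finding can be illustrated with a small sketch: a place name that was extracted correctly may still match several candidate coordinates, in which case it is safer to flag the result for a human operator than to pick one silently. The gazetteer entries, coordinates, and the names `GeoResult` and `resolve` below are made up for illustration; a real deployment would query an external geocoding service, which the paper does not specify here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]

# Toy gazetteer with made-up entries (assumption for illustration only).
GAZETTEER: Dict[str, List[Coord]] = {
    "central station": [(41.1496, -8.6110)],
    "main street": [(41.1579, -8.6291), (41.2357, -8.6199)],
}


@dataclass
class GeoResult:
    name: str
    coords: Optional[Coord]
    status: str  # "resolved", "ambiguous", or "unknown"
    candidates: int


def resolve(name: str) -> GeoResult:
    """Look up a place name and report ambiguity instead of guessing."""
    matches = GAZETTEER.get(name.strip().lower(), [])
    if len(matches) == 1:
        return GeoResult(name, matches[0], "resolved", 1)
    if matches:
        # Several candidates: leave coordinates empty for human review.
        return GeoResult(name, None, "ambiguous", len(matches))
    return GeoResult(name, None, "unknown", 0)
```

Returning an explicit status keeps the human-in-the-loop step visible: only "resolved" entries would be plotted or handed to a planner automatically.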

This review was generated by AI and checked by a human editor.