QUICK REVIEW

[Paper Review] Seeing Eye to Eye: Enabling Cognitive Alignment Through Shared First-Person Perspective in Human-AI Collaboration

Zhuyu Teng, Pei Chen|arXiv (Cornell University)|Mar 13, 2026

Human-Automation Interaction and Safety|Citations: 0

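One-Sentence Summary

The paper proposes Eye2Eye, a first-person-perspective framework that combines joint attention, accumulated common ground, and reflective feedback, implements it as an AR prototype, and evaluates it in a user study.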

ABSTRACT

Despite advances in multimodal AI, current vision-based assistants often remain inefficient in collaborative tasks. We identify two key gulfs: a communication gulf, where users must translate rich parallel intentions into verbal commands due to the channel mismatch, and an understanding gulf, where AI struggles to interpret subtle embodied cues. To address these, we propose Eye2Eye, a framework that leverages first-person perspective as a channel for human-AI cognitive alignment. It integrates three components: (1) joint attention coordination for fluid focus alignment, (2) revisable memory to maintain evolving common ground, and (3) reflective feedback allowing users to clarify and refine AI's understanding. We implement this framework in an AR prototype and evaluate it through a user study and a post-hoc pipeline evaluation. Results show that Eye2Eye significantly reduces task completion time and interaction load while increasing trust, demonstrating its components work in concert to improve collaboration.

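Motivation and Objectives

Identify the communication and understanding gulfs that hinder collaboration with wearable AI.
Propose Eye2Eye, which turns the first-person perspective into a shared perceptual channel for cognitive alignment.
Implement Eye2Eye in an AR prototype and validate its effectiveness through a user study and a pipeline evaluation.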

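Proposed Method

Define and operationalize Eye2Eye through three core components: Joint Attention Coordination, Accumulated Common Ground, and Reflective Situated Feedback.
Develop an AR prototype on Apple Vision Pro to enable real-time multisensory perception and feedback.
Implement an object-card memory module that persistently accumulates and updates the interaction history.
Build a two-stage attention pipeline in which lightweight perception is followed by semantic interpretation with a vision-language model.
Adopt a retrieval-augmented memory workflow that updates context with user corrections and new interactions.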
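
The review does not describe how the memory module or the pipeline is built, so the following is only a minimal sketch of the general idea under stated assumptions. Every name here (`ObjectCard`, `ObjectCardMemory`, `fixated_object`, `attend`) is hypothetical and not taken from the paper. A simple word-overlap ranking stands in for whatever retrieval the authors use, a run-length check on gaze samples stands in for the lightweight perception stage, and a plain callable stands in for the vision-language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ObjectCard:
    """Hypothetical record of what the assistant believes about one object."""
    object_id: str
    label: str
    notes: List[str] = field(default_factory=list)
    revision: int = 0  # incremented on each user correction


class ObjectCardMemory:
    """Toy revisable memory: cards are created, corrected, and retrieved."""

    def __init__(self) -> None:
        self._cards: Dict[str, ObjectCard] = {}

    def observe(self, object_id: str, label: str, note: str) -> ObjectCard:
        """Create a card on first sight, then append the new interaction note."""
        card = self._cards.get(object_id)
        if card is None:
            card = ObjectCard(object_id, label)
            self._cards[object_id] = card
        card.notes.append(note)
        return card

    def correct(self, object_id: str, new_label: str) -> ObjectCard:
        """Apply a user correction: overwrite the label and bump the revision."""
        card = self._cards[object_id]
        card.label = new_label
        card.revision += 1
        return card

    def retrieve(self, query: str, top_k: int = 1) -> List[ObjectCard]:
        """Rank cards by word overlap with the query (assumed stand-in for real retrieval)."""
        query_words = set(query.lower().split())

        def overlap(card: ObjectCard) -> int:
            text = " ".join([card.label] + card.notes).lower()
            return len(query_words & set(text.split()))

        ranked = sorted(self._cards.values(), key=overlap, reverse=True)
        return [card for card in ranked[:top_k] if overlap(card) > 0]


def fixated_object(gaze_samples: List[Optional[str]], min_run: int = 3) -> Optional[str]:
    """Stage 1 (cheap): return the first object gazed at for min_run consecutive samples."""
    current, run = None, 0
    for sample in gaze_samples:
        run = run + 1 if (sample is not None and sample == current) else 1
        current = sample
        if current is not None and run >= min_run:
            return current
    return None


def attend(
    memory: ObjectCardMemory,
    gaze_samples: List[Optional[str]],
    interpret: Callable[[str], Tuple[str, str]],
) -> Optional[ObjectCard]:
    """Stage 2 (expensive): call the interpreter only when stage 1 finds a fixation."""
    object_id = fixated_object(gaze_samples)
    if object_id is None:
        return None
    label, note = interpret(object_id)
    return memory.observe(object_id, label, note)


def stub_interpreter(object_id: str) -> Tuple[str, str]:
    """Placeholder for a vision-language model call; returns a fixed guess."""
    return "mug", "red mug on the desk"


memory = ObjectCardMemory()
attend(memory, [None, "obj1", "obj1", "obj1"], stub_interpreter)
memory.correct("obj1", "measuring cup")  # the user corrects the assistant's label
hits = memory.retrieve("where is the measuring cup")
```

In this sketch the costly interpreter runs only after the cheap fixation check passes, and a correction changes the stored card in place, so later retrievals see the revised label instead of the original guess.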

Experimental Results

Research Questions

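RQ1: In real-time tasks, can a first-person perspective shared between human and AI establish and maintain aligned attention?
RQ2: Compared with a baseline wearable assistant, does Eye2Eye reduce grounding cost, lower interaction friction, and increase trust?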
RQ3: How do multimodal signals such as gaze, gesture, and speech contribute to forming and updating common ground?
RQ4: What role does persistent object-card memory play in maintaining cognitive alignment across multi-turn interactions?

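Key Findings

Eye2Eye significantly reduces task completion time and interaction load in collaborative tasks.
The framework increases users' trust in the AI collaborator.
Each modality of multimodal expression makes a distinct contribution to forming common ground.
The pipeline evaluation suggests a synergistic effect when all components of the system are integrated.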

This review was generated by AI and checked by a human editor.