QUICK REVIEW

[Paper Review] Dynamic Multimodal Expression Generation for LLM-Driven Pedagogical Agents: From User Experience Perspective

Ninghao Wan, Jiarun Song | arXiv (Cornell University) | Mar 10, 2026

Social Robot Interaction and HRI | Citations: 0

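ONE-LINE SUMMARY

The paper presents an LLM-driven method for generating dynamic multimodal expressions (speech and gestures) for VR pedagogical agents, and reports improvements in learners' self-reported experience and engagement.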

ABSTRACT

In virtual reality (VR) educational scenarios, Pedagogical agents (PAs) enhance immersive learning through realistic appearances and interactive behaviors. However, most existing PAs rely on static speech and simple gestures. This limitation reduces their ability to dynamically adapt to the semantic context of instructional content. As a result, interactions often lack naturalness and effectiveness in the teaching process. To address this challenge, this study proposes a large language model (LLM)-driven multimodal expression generation method that constructs semantically sensitive prompts to generate coordinated speech and gesture instructions, enabling dynamic alignment between instructional semantics and multimodal expressive behaviors. A VR-based PA prototype was developed and evaluated through user experience-oriented subjective experiments. Results indicate that dynamically generated multimodal expressions significantly enhance learners' perceived learning effectiveness, engagement, and intention to use, while effectively alleviating feelings of fatigue and boredom during the learning process. Furthermore, the combined dynamic expression of speech and gestures notably enhances learners' perceptions of human-likeness and social presence. The findings provide new insights and design guidelines for building more immersive and naturally expressive intelligent PAs.

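Motivation and Objectives

Highlight the need for adaptive, semantically aligned multimodal expression in VR pedagogical agents.
Develop a method that generates coordinated speech and gesture instructions guided by instructional semantics.
Build a VR-based pedagogical agent prototype for evaluating user experience.
Evaluate how dynamic multimodal expression affects perceived learning effectiveness, engagement, fatigue, and social presence.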

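Proposed Method

Propose semantically sensitive prompts that coordinate speech and gesture generation.
Use an LLM-driven pipeline to align instructional semantics with multimodal expression.
Implement a VR-based pedagogical agent prototype.
Conduct user experience-oriented subjective experiments to evaluate effectiveness.
Analyze the effects on perceived learning effectiveness, engagement, fatigue, and social presence.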
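The prompt-and-parse step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the gesture and tone vocabularies, the JSON reply format, and the names `build_prompt`, `parse_expression`, and `Expression` are all assumptions made for the example, and no actual LLM call is included.

```python
import json
from dataclasses import dataclass

# Hypothetical vocabularies; the paper's actual gesture and tone sets are not given here.
ALLOWED_GESTURES = ("idle", "point", "open_palms", "nod", "count_fingers")
ALLOWED_TONES = ("neutral", "emphatic", "encouraging")


@dataclass
class Expression:
    """One coordinated speech-plus-gesture instruction for the agent."""
    text: str
    tone: str
    gesture: str


def build_prompt(sentence: str) -> str:
    """Build a prompt asking for a tone and a gesture that match the sentence's meaning."""
    return (
        "You control a teaching agent in VR.\n"
        f"Instructional sentence: {sentence}\n"
        f"Pick a tone from {list(ALLOWED_TONES)} and a gesture from "
        f"{list(ALLOWED_GESTURES)} that fit the meaning of the sentence.\n"
        'Reply with JSON only: {"text": ..., "tone": ..., "gesture": ...}'
    )


def parse_expression(reply: str, fallback_text: str) -> Expression:
    """Validate the model reply; fall back to a static expression on any problem."""
    static = Expression(fallback_text, "neutral", "idle")
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return static
    if not isinstance(data, dict):
        return static
    tone = data.get("tone")
    gesture = data.get("gesture")
    # Reject anything outside the known vocabularies so the agent never
    # receives an animation or voice style it cannot play.
    if tone not in ALLOWED_TONES or gesture not in ALLOWED_GESTURES:
        return static
    text = data.get("text")
    if not isinstance(text, str) or not text:
        text = fallback_text
    return Expression(text, tone, gesture)
```

In this sketch the speech and gesture are requested in a single reply so that both are chosen from the same reading of the sentence, and any malformed reply degrades to the static behavior (neutral tone, idle gesture) rather than failing.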

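Experimental Results

Research Questions

RQ1: Do dynamically generated multimodal expressions improve perceived learning effectiveness compared with static expressions?
RQ2: Do coordinated speech and gesture expressions increase learner engagement and intention to use the agent?
RQ3: Can dynamic multimodal expressions alleviate fatigue and boredom during VR learning?
RQ4: How do combined speech and gesture expressions affect perceived human-likeness and social presence?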

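Key Findings

Dynamically generated multimodal expressions significantly enhance learners' perceived learning effectiveness, engagement, and intention to use.
Dynamic expressions help alleviate feelings of fatigue and boredom during learning.
Combined dynamic expression of speech and gestures enhances perceptions of human-likeness and social presence.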

This review was generated by AI and checked by a human editor.