QUICK REVIEW

[Paper Review] Memory Bear AI Memory Science Engine for Multimodal Affective Intelligence: A Technical Report

Deliang Wen, Ke Sun|arXiv (Cornell University)|Mar 18, 2026

Emotion and Mood Recognition | Citations: 0

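One-line summary

The Memory Bear AI Memory Science Engine treats affective information as a structured memory system built from Emotion Memory Units (EMUs), enabling long-horizon, robust multimodal affective judgment and retrieval. It outperforms baselines on several datasets and keeps its advantage under noisy conditions.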

ABSTRACT

Affective judgment in real interaction is rarely a purely local prediction problem. Emotional meaning often depends on prior trajectory, accumulated context, and multimodal evidence that may be weak, noisy, or incomplete at the current moment. Although multimodal emotion recognition (MER) has improved the integration of text, speech, and visual signals, many existing systems remain optimized for short-range inference and provide limited support for persistent affective memory, long-horizon dependency modeling, and robust interpretation under imperfect input. This technical report presents the Memory Bear AI Memory Science Engine, a memory-centered framework for multimodal affective intelligence. Instead of treating emotion as a transient output label, the framework models affective information as a structured and evolving variable within a memory system. It organizes processing through structured memory formation, working-memory aggregation, long-term consolidation, memory-driven retrieval, dynamic fusion calibration, and continuous memory updating. At its core, multimodal signals are transformed into structured Emotion Memory Units (EMUs), enabling affective information to be preserved, reactivated, and revised across interaction horizons. Experimental results show consistent gains over comparison systems across benchmark and business-grounded settings, with stronger accuracy and robustness, especially under noisy or missing-modality conditions. The framework offers a practical step from local emotion recognition toward more continuous, robust, and deployment-relevant affective intelligence.

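Motivation and objectives

Reframe affective judgment as a memory-centered problem rather than a purely local prediction task.
Propose a structured memory architecture that encodes reusable Emotion Memory Units (EMUs).
Improve robustness under missing or noisy modalities through memory-based short-term aggregation, long-term consolidation, retrieval, and dynamic fusion.
Demonstrate performance and robustness on benchmark datasets and a business-oriented dataset, and provide a deployment-oriented analysis.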

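Proposed method

Stage 1: multimodal preprocessing and representation learning to produce modality-specific emotion encodings (LLM-based semantic encoding for text, Higgs-Audio for speech, and VLM-driven representations for vision).
Stage 2: an emotion-structured memory model that forms EMUs capturing the emotion e_t, source reliability m_t, context anchor c_t, salience α_t, and temporal phase τ_t.
Stage 2 also includes an emotion working memory for short-term aggregation, an emotion long-term memory for consolidation, and memory-driven retrieval.
Stage 3: a dynamic fusion strategy that adjusts each modality's contribution in light of past memory.
Stage 4: classification, decision-making, and memory updating, with a memory lifecycle that includes forgetting and updating.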
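
The stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report's implementation: the `EMU` field names, the exponential half-life decay standing in for forgetting, the cosine-similarity retrieval weight, and the confidence-gated blend in `fuse` are all hypothetical choices made here to show how the five EMU components (e_t, m_t, c_t, α_t, τ_t) could interact.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class EMU:
    """Hypothetical Emotion Memory Unit; field names are illustrative only."""
    emotion: List[float]   # e_t: here a probability vector over emotion classes
    reliability: float     # m_t: source reliability in [0, 1]
    context: List[float]   # c_t: context anchor embedding
    salience: float        # alpha_t: salience in [0, 1]
    timestamp: float       # tau_t: time of the observation


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(memory, query_context, now, n_classes, half_life=10.0):
    """Return a memory prior: a weighted average of stored emotion vectors.

    Weight = context similarity * salience * reliability * time decay.
    The decay form (assumed, not from the report) models forgetting.
    """
    prior = [0.0] * n_classes
    total = 0.0
    for unit in memory:
        decay = 0.5 ** ((now - unit.timestamp) / half_life)
        sim = max(cosine(unit.context, query_context), 0.0)
        w = sim * unit.salience * unit.reliability * decay
        total += w
        for i in range(n_classes):
            prior[i] += w * unit.emotion[i]
    if total == 0.0:
        # Nothing relevant in memory: fall back to a uniform prior.
        return [1.0 / n_classes] * n_classes
    return [p / total for p in prior]


def fuse(current, current_confidence, prior):
    """Blend the current prediction with the memory prior.

    A low-confidence (noisy or missing) input leans more on memory.
    """
    g = min(max(current_confidence, 0.0), 1.0)
    return [g * c + (1.0 - g) * p for c, p in zip(current, prior)]


# Two stored units with orthogonal context anchors (toy values).
memory = [
    EMU([0.9, 0.1], 1.0, [1.0, 0.0], 1.0, 0.0),
    EMU([0.2, 0.8], 1.0, [0.0, 1.0], 1.0, 0.0),
]
prior = retrieve(memory, [1.0, 0.0], now=0.0, n_classes=2)
fused = fuse([0.5, 0.5], current_confidence=0.2, prior=prior)
```

In this toy run only the first unit matches the query context, so the prior equals its emotion vector, and an uninformative current input with confidence 0.2 is pulled most of the way toward that prior. A memory update step (Stage 4) would append a new `EMU` built from `fused` and drop units whose decayed weight falls below some threshold.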

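Experimental results

Research questions

RQ1: How does a memory-centered design affect the stability and accuracy of affective judgment over long interaction horizons?
RQ2: Can EMUs and memory-driven retrieval improve robustness under missing or degraded modalities compared with conventional fusion methods?
RQ3: How much do accuracy and stability improve on standard MER benchmarks (IEMOCAP, CMU-MOSEI) and on a business-oriented dataset?
RQ4: How does memory-guided calibration affect real-time emotion interpretation under noisy input?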

Key findings

Dataset / Setting	Metric	Value
IEMOCAP	Accuracy	78.8%
CMU-MOSEI	Accuracy	66.7%
Memory Bear AI Business Dataset	Accuracy	68.4%
Memory Bear AI Business Dataset	Weighted F1	48.6
Memory Bear AI Business Dataset	Macro F1	45.9
Degraded multimodal conditions	Performance retained vs. complete condition	92.3%
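
The table reports both weighted F1 and macro F1 for the business dataset. The two differ only in how per-class F1 scores are averaged: macro F1 gives every class equal weight, while weighted F1 weights each class by its number of true instances. The sketch below illustrates the difference; the per-class counts are invented for illustration and are not taken from the report.

```python
def f1(tp, fp, fn):
    """F1 score from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_and_weighted_f1(per_class):
    """per_class: list of (tp, fp, fn) tuples, one per class."""
    scores = [f1(tp, fp, fn) for tp, fp, fn in per_class]
    supports = [tp + fn for tp, fp, fn in per_class]  # true instances per class
    macro = sum(scores) / len(scores)
    weighted = sum(s * n for s, n in zip(scores, supports)) / sum(supports)
    return macro, weighted


# Illustrative counts only: one frequent class that is easy to predict
# and one rare class that is hard to predict.
counts = [(80, 10, 20), (5, 20, 10)]
macro, weighted = macro_and_weighted_f1(counts)
```

Here the rare class drags macro F1 well below weighted F1. A weighted F1 above macro F1, as in the table, is what one would expect when less frequent classes score lower than frequent ones.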

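On IEMOCAP, Memory Bear AI achieves 78.8% accuracy.
On CMU-MOSEI, Memory Bear AI achieves 66.7% accuracy.
On the Memory Bear AI Business Dataset, accuracy is 68.4%, weighted F1 is 48.6, and macro F1 is 45.9.
On the business dataset, it achieves an accuracy gain of 8.2 points over conventional fusion-based baselines.
Under degraded multimodal conditions, it retains 92.3% of its complete-condition performance, demonstrating robustness.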
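
The two relative figures above can be unpacked with simple arithmetic. The sketch below assumes that "8.2 points" means absolute percentage points of accuracy and that retention is the ratio of the degraded score to the complete-condition score. The summary does not say which dataset or metric the 92.3% figure refers to, so the degraded score used here is a hypothetical input chosen only to show the calculation.

```python
def implied_baseline(accuracy_pct, gain_points):
    """Baseline accuracy implied by a reported absolute gain in points."""
    return accuracy_pct - gain_points


def retention(degraded_score, complete_score):
    """Fraction of complete-condition performance kept under degradation."""
    return degraded_score / complete_score


# Business dataset: 68.4% accuracy with an 8.2-point gain over the baseline.
baseline = implied_baseline(68.4, 8.2)

# Hypothetical degraded score (not reported in this summary), for illustration.
kept = retention(63.1, 68.4)
```

Under these assumptions the fusion-based baseline would sit at roughly 60.2% accuracy on the business dataset, and a degraded score of 63.1 against a complete score of 68.4 would correspond to about 92% retention.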

This review was generated by AI and checked by a human editor.