QUICK REVIEW

[Paper Review] Same Answer, Different Representations: Hidden instability in VLMs

Farooq Ahmad Wani, Alessandro Suglia|arXiv (Cornell University)|Feb 6, 2026

Multimodal Machine Learning Applications | Cited by: 0

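One-Line Summary

The paper proposes a representation-aware and frequency-aware robustness framework for Vision-Language Models (VLMs) and shows that internal representations can drift under perturbation even when the output stays the same.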

ABSTRACT

The robustness of Vision Language Models (VLMs) is commonly assessed through output-level invariance, implicitly assuming that stable predictions reflect stable multimodal processing. In this work, we argue that this assumption is insufficient. We introduce a representation-aware and frequency-aware evaluation framework that measures internal embedding drift, spectral sensitivity, and structural smoothness (spatial consistency of vision tokens), alongside standard label-based metrics. Applying this framework to modern VLMs across the SEEDBench, MMMU, and POPE datasets reveals three distinct failure modes. First, models frequently preserve predicted answers while undergoing substantial internal representation drift; for perturbations such as text overlays, this drift approaches the magnitude of inter-image variability, indicating that representations move to regions typically occupied by unrelated inputs despite unchanged outputs. Second, robustness does not improve with scale; larger models achieve higher accuracy but exhibit equal or greater sensitivity, consistent with sharper yet more fragile decision boundaries. Third, we find that perturbations affect tasks differently: they harm reasoning when they disrupt how models combine coarse and fine visual cues, but on the hallucination benchmarks, they can reduce false positives by making models generate more conservative answers.

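Motivation and Objectives

Motivate robustness evaluation that goes beyond output invariance under meaning-preserving perturbations, and detect hidden multimodal instability in VLMs.
Propose a representation-aware framework that measures embedding drift, spectral change, and structural smoothness.
Identify failure modes and quantify how perturbations affect reasoning and hallucination tasks.
Evaluate robustness across model scales, datasets, and architectures to understand scaling effects.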

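Proposed Method

Develop an evaluation framework that combines label stability with internal representation metrics and margin dynamics.
Measure embedding stability, Dirichlet Energy (structural smoothness), Perturbation Drift and Control Drift, and Drift-to-Prior across multiple prompt regimes.
Use a logit-likelihood MCQ scoring protocol to track margin dynamics and decision boundaries.
Evaluate six perturbation families, among them translation, padding/cropping, scaling, rotation, and text overlay, covering semantic overlays and occlusion.
Study cross-dataset and cross-architecture robustness across SEEDBench, MMMU, and POPE.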
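Two of the quantities named above can be illustrated with a minimal sketch. It assumes that Dirichlet energy is computed as the sum of squared feature differences between 4-connected neighbours on the vision-token grid, and that the MCQ margin is the gap between the best and second-best option log-likelihoods. The review does not give the paper's exact graph, normalization, or scoring details, so the function names `dirichlet_energy` and `mcq_margin` and the toy values are hypothetical.

```python
def dirichlet_energy(grid):
    """Sum of squared feature differences over 4-connected neighbours.

    grid[r][c] is the feature vector of the vision token at row r, column c.
    Assumption: unnormalized energy on a regular grid graph.
    """
    rows, cols = len(grid), len(grid[0])

    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # edge to the right-hand neighbour
                energy += sqdist(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:  # edge to the neighbour below
                energy += sqdist(grid[r][c], grid[r + 1][c])
    return energy


def mcq_margin(option_logliks):
    """Return (predicted option, best minus runner-up log-likelihood)."""
    ranked = sorted(option_logliks.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_ll), (_, second_ll) = ranked[0], ranked[1]
    return best, best_ll - second_ll


# Toy 2x2 token grid with 1-dimensional features (hypothetical values).
checkerboard = [[[0.0], [1.0]], [[1.0], [0.0]]]
print("Dirichlet energy:", dirichlet_energy(checkerboard))

# Toy option log-likelihoods before and after a perturbation (hypothetical).
print("base:", mcq_margin({"A": -1.0, "B": -2.5, "C": -4.0}))
print("perturbed:", mcq_margin({"A": -1.2, "B": -1.4, "C": -4.0}))
```

In the toy example the predicted option stays "A" while the margin shrinks, which is the kind of change a label-only metric would not register.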

Figure 1: Cosine distance ( $1-\cos$ ), Drift versus control drift for the ans_mcq_free embedding under Translation and Textoverlay perturbation. Blue shows perturbation-induced drift relative to the base image; orange shows control drift (base image versus randomly sampled other images). Left: Tran
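
The comparison described in the Figure 1 caption can be sketched as follows, assuming each image yields one embedding vector, drift is the cosine distance $1-\cos$ between the base and perturbed embedding of the same image, and control drift is the distance between a base embedding and that of a randomly sampled other image. The helper names and the synthetic embeddings are hypothetical, and how the paper pools or aggregates embeddings is not stated in this review.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def perturbation_drift(base, perturbed):
    """Mean distance between each base embedding and its perturbed counterpart."""
    return sum(cosine_distance(b, p) for b, p in zip(base, perturbed)) / len(base)


def control_drift(base, rng):
    """Mean distance between each base embedding and a random *other* image."""
    total = 0.0
    for i, b in enumerate(base):
        j = rng.choice([k for k in range(len(base)) if k != i])
        total += cosine_distance(b, base[j])
    return total / len(base)


# Synthetic stand-ins for model embeddings (hypothetical data).
rng = random.Random(0)
base = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
perturbed = [[x + 0.1 * rng.gauss(0, 1) for x in e] for e in base]

drift = perturbation_drift(base, perturbed)
control = control_drift(base, rng)
print(f"perturbation drift={drift:.3f}, control drift={control:.3f}, "
      f"ratio={drift / control:.3f}")
```

A ratio near 1 would correspond to the abstract's observation that drift under some perturbations approaches the magnitude of inter-image variability.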

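Experimental Results

Research Questions

RQ1: Can output-level robustness hide internal representation drift under meaning-preserving perturbations?
RQ2: How do perturbations affect internal embeddings, spectral content, and local token smoothness?
RQ3: Does model scale improve robustness, or do larger models become more fragile under specific perturbations?
RQ4: How do perturbations affect reasoning and hallucination tasks?
RQ5: What role do frequency components and cross-frequency consistency play in VLM robustness?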

Key Findings

Perturbation	IFR	IV
Translation	0.062	0.168
Pad/Crop	0.065	0.169
Scale	0.079	0.079
Scale+Pad	0.080	0.100
Rotation	0.122	0.166
TextOverlay(semantic)	0.192	0.239
TextOverlay(random)	0.064	0.086
TextOverlay(empty)	0.043	0.044
Any (union)	0.079	0.376

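Across the union of all perturbations, 37.6% of images undergo at least one decision flip.
Semantic text overlay is the most disruptive single perturbation, with IFR ≈ 19.2% and IV ≈ 23.9%.
Representation drift can be large even when the prediction is unchanged, and its magnitude is often comparable to inter-image variability.
Model scale does not guarantee robustness: larger models show equal or greater representation drift and error transitions under perturbation.
Perturbations harm reasoning tasks, but on hallucination benchmarks they can reduce false positives by pushing models toward more conservative predictions.
Robustness deficits persist across datasets and architectures, and robustness does not improve monotonically with capacity.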
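
The gap between the per-perturbation rates and the "Any (union)" figure can be illustrated with a short sketch. It assumes a flip means the predicted answer differs from the prediction on the unperturbed image. It is not claimed to reproduce the paper's definitions of IFR or IV, which this review does not spell out, and the function name and toy predictions are hypothetical.

```python
def flip_rates(base_preds, perturbed_preds):
    """Compute per-perturbation and any-perturbation flip fractions.

    base_preds: predictions on the unperturbed images.
    perturbed_preds: dict mapping a perturbation name to predictions
        aligned index-by-index with base_preds.
    Returns (per-perturbation flip fraction, fraction flipped by any).
    """
    n = len(base_preds)
    per_perturbation = {}
    flipped_by_any = [False] * n
    for name, preds in perturbed_preds.items():
        flips = [p != b for p, b in zip(preds, base_preds)]
        per_perturbation[name] = sum(flips) / n
        # An image counts toward the union if any perturbation flips it.
        flipped_by_any = [a or f for a, f in zip(flipped_by_any, flips)]
    return per_perturbation, sum(flipped_by_any) / n


# Toy predictions for four images (hypothetical data).
base = ["A", "B", "C", "D"]
perturbed = {
    "rotation": ["A", "B", "C", "A"],      # flips image 4 only
    "text_overlay": ["B", "B", "C", "D"],  # flips image 1 only
}
per_rate, union_rate = flip_rates(base, perturbed)
print(per_rate, union_rate)
```

Because different perturbations tend to flip different images, the union rate can be well above every individual rate, which is consistent with the 0.376 union value sitting above each single-perturbation row in the table.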

Figure 2: Qwen3-VL (Instruct) scaling on SEEDBench. Left: base accuracy versus ground truth. Right: average flip rate under natural perturbations (lower is better).


This review was generated by AI and checked by a human editor.