QUICK REVIEW

[Paper Review] Revealing the Attention Floating Mechanism in Masked Diffusion Models

Xin Dai, Pengcheng Huang|arXiv (Cornell University)|Jan 12, 2026

Generative Adversarial Networks and Image Synthesis | Cited by 0

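One-Sentence Summary

The paper identifies the Attention Floating phenomenon in masked diffusion models (MDMs) and shows that their attention is structure-aware in shallow layers and content-focused in deep layers. According to the authors, this pattern improves in-context knowledge use and robustness, so that MDMs outperform autoregressive models (ARMs) when using retrieved context.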

ABSTRACT

Masked diffusion models (MDMs), which leverage bidirectional attention and a denoising process, are narrowing the performance gap with autoregressive models (ARMs). However, their internal attention mechanisms remain under-explored. This paper investigates the attention behaviors in MDMs, revealing the phenomenon of Attention Floating. Unlike ARMs, where attention converges to a fixed sink, MDMs exhibit dynamic, dispersed attention anchors that shift across denoising steps and layers. Further analysis reveals its Shallow Structure-Aware, Deep Content-Focused attention mechanism: shallow layers utilize floating tokens to build a global structural framework, while deeper layers allocate more capability toward capturing semantic content. Empirically, this distinctive attention pattern provides a mechanistic explanation for the strong in-context learning capabilities of MDMs, allowing them to double the performance compared to ARMs in knowledge-intensive tasks. All codes and datasets are available at https://github.com/NEUIR/Attention-Floating.

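Research Motivation and Objectives

Investigate how attention inside MDMs behaves during the denoising process.
Characterize the Attention Floating phenomenon and contrast it with the attention sinks in ARMs.
Understand how attention dynamics contribute to in-context learning and knowledge utilization in MDMs.
Examine the robustness of MDMs under context noise, positional bias, and varying evidence layouts.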

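Proposed Method

Define and quantify the attention patterns of MDMs across denoising steps and layers.
Visualize per-token attention weights and perform a layer-wise geometric decomposition of query-key (QK) scores into a norm product and a directional cosine.
Identify floating tokens and classify them as structural or lexical tokens.
Analyze retrieval heads to assess their role in context-sensitive information flow.
Perform region-level attention flow analysis to trace how attention moves between input regions during inference.
Compare MDMs with ARMs on knowledge-intensive tasks, with and without retrieved context.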
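
The QK geometric decomposition mentioned above rests on the identity q·k = ‖q‖‖k‖cos θ, which separates how large a query-key score is (the norm product) from how well the two vectors align (the directional cosine). The following is a minimal sketch of that identity only, not the authors' analysis code; the function name `qk_decompose` and the toy vectors are hypothetical and chosen purely for illustration.

```python
import math

def qk_decompose(q, k):
    """Split the dot product q.k into a norm product and a direction cosine."""
    dot = sum(a * b for a, b in zip(q, k))
    norm_product = math.hypot(*q) * math.hypot(*k)
    # Guard against zero vectors, where the cosine is undefined.
    cosine = dot / norm_product if norm_product > 0 else 0.0
    return norm_product, cosine

# Toy query and key vectors (hypothetical values, not taken from the paper).
q = [3.0, 4.0]
k = [4.0, 3.0]

norm_product, cosine = qk_decompose(q, k)
# Multiplying the two factors recovers the raw (unscaled) attention logit.
logit = norm_product * cosine
```

Reading the two factors separately makes it possible to ask whether a token attracts attention because its key is large or because it points in the same direction as the query. Which of the two the paper attributes to floating tokens is not stated in this summary.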

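Experimental Results

Research Questions

RQ1: What are the properties of the attention distribution in MDMs across denoising steps and layers?
RQ2: How do floating tokens differ from the attention sinks in ARMs, and which kinds of tokens (structural vs. lexical) do they tend to be?
RQ3: Compared with ARMs, how does the Attention Floating mechanism contribute to in-context learning and robustness in MDMs?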

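Key Findings

MDMs exhibit Attention Floating: the dominant attention anchors drift across positions and denoising steps instead of converging to a fixed sink.
Shallow layers rely on floating structural tokens to build a global framework, while deeper layers shift attention toward semantic content.
Retrieval-head analysis shows that content-oriented retrieval heads become more influential as depth increases, consistent with the proposed Shallow Structure-Aware, Deep Content-Focused mechanism.
On knowledge-intensive tasks, MDMs gain more from retrieved context and outperform retrieval-augmented ARMs in a variety of settings.
MDMs remain robust under stress tests involving context noise, positional perturbation, and evidence distribution, and outperform ARMs in these tests.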
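
To make the contrast between a fixed sink and a floating anchor concrete, the sketch below tracks which key position receives the most attention at each denoising step. It is an illustrative toy under stated assumptions, not the metric used in the paper: the helper names `dominant_anchor` and `anchor_trajectory` are hypothetical, and the attention maps are hand-written values rather than model outputs.

```python
def dominant_anchor(attn):
    """Return the key index that receives the most attention, averaged over queries.

    `attn` is a list of rows (one per query); each row holds weights over keys.
    """
    n_keys = len(attn[0])
    received = [sum(row[j] for row in attn) / len(attn) for j in range(n_keys)]
    return max(range(n_keys), key=received.__getitem__)

def anchor_trajectory(attn_per_step):
    """Dominant anchor index at each step (one attention map per step)."""
    return [dominant_anchor(attn) for attn in attn_per_step]

# Hand-written toy maps (assumed values): 2 steps, 2 queries, 3 keys.
sink_like = [
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
]
floating_like = [
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],
]

sink_path = anchor_trajectory(sink_like)          # anchor stays at key 0
floating_path = anchor_trajectory(floating_like)  # anchor moves from key 0 to key 2
```

In this toy setting a sink-like pattern yields a constant trajectory, while a floating pattern yields a trajectory that changes between steps. The same bookkeeping could in principle be repeated per layer to look for drift across depth.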

This review was generated by AI and reviewed by human editors.