QUICK REVIEW

[Paper Review] DynHD: Hallucination Detection for Diffusion Large Language Models via Denoising Dynamics Deviation Learning

Yanyu Qian, Yue Tan|arXiv (Cornell University)|Mar 17, 2026
Mental Health via Writing | Cited by 0
One-Sentence Summary

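DynHD builds semantic-aware evidence from token entropy and models denoising dynamics with a detector based on a reference trajectory and deviations from it, achieving state-of-the-art AUROC on benchmark datasets.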

ABSTRACT

Diffusion large language models (D-LLMs) have emerged as a promising alternative to auto-regressive models due to their iterative refinement capabilities. However, hallucinations remain a critical issue that hinders their reliability. To detect hallucinated responses from model outputs, token-level uncertainty (e.g., entropy) has been widely used as an effective signal to indicate potential factual errors. Nevertheless, the fixed-length generation paradigm of D-LLMs implies that tokens contribute unevenly to hallucination detection, with only a small subset providing meaningful signals. Moreover, the evolution trend of uncertainty throughout the diffusion process can also provide important signals, highlighting the necessity of modeling its denoising dynamics for hallucination detection. In this paper, we propose DynHD, which bridges these gaps from both spatial (token sequence) and temporal (denoising dynamics) perspectives. To address the information density imbalance across tokens, we propose a semantic-aware evidence construction module that extracts hallucination-indicative signals by filtering out non-informative tokens and emphasizing semantically meaningful ones. To model denoising dynamics for hallucination detection, we introduce a reference evidence generator that learns the expected evolution trajectory of uncertainty evidence, along with a deviation-based hallucination detector that makes predictions by measuring the discrepancy between the observed and reference trajectories. Extensive experiments demonstrate that DynHD consistently outperforms state-of-the-art baselines while achieving higher efficiency across multiple benchmarks and backbone models.

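Research Motivation and Objectives

  • Motivate reliable hallucination detection for diffusion large language models (D-LLMs), which generate fixed-length sequences through iterative denoising.
  • Address the imbalance in information density across tokens so that detection signals are not diluted by uninformative tokens.
  • Model the temporal evolution of uncertainty (denoising dynamics) to capture process-level hallucination signals.
  • Develop a two-stage framework that constructs semantic-aware evidence and learns deviations from a reference trajectory.
  • Demonstrate robustness and efficiency across multiple datasets and D-LLM backbones.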

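Proposed Method

  • Semantic-aware evidence construction: filter out non-informative structural tokens and summarize the entropy of the remaining semantic tokens at each step with three statistics (mean entropy over semantic tokens, maximum entropy, and mean of the top-k entropies).
  • Build the evidence trajectory E = (a_T, a_{T-1}, ..., a_0) from these per-step statistics.
  • Dynamics deviation learning: train a query-conditioned reference evidence dynamics generator g_theta that models how evidence evolves in normal (non-hallucinated) responses.
  • A deviation-based detector predicts hallucination by combining the observed evidence a_t, the reference a_hat_t, their difference Delta a_t, and learnable temporal weights.
  • Regularization terms emphasize late-stage stagnation and potential rebound of uncertainty, supported by an EMA-based adaptive margin.
  • The end-to-end objective combines the classification loss with path and rebound regularizers: L = L_cls + lambda1*L_path + lambda2*L_reb.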
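The evidence-construction steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the semantic-token mask is taken as given (here a hypothetical boolean list, assumed to be fixed from the final decoded tokens), entropies are in nats, and the helper names `token_entropy`, `step_statistics`, and `evidence_trajectory` are hypothetical.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def step_statistics(entropies: Sequence[float],
                    semantic_mask: Sequence[bool], k: int) -> List[float]:
    """Summarize one denoising step as [mean, max, top-k mean] over semantic tokens."""
    kept = [h for h, keep in zip(entropies, semantic_mask) if keep]
    if not kept:
        # Assumption: a step with no semantic tokens contributes zero evidence.
        return [0.0, 0.0, 0.0]
    top_k = sorted(kept, reverse=True)[:k]
    return [sum(kept) / len(kept), max(kept), sum(top_k) / len(top_k)]


def evidence_trajectory(step_entropies: Sequence[Sequence[float]],
                        semantic_mask: Sequence[bool], k: int = 2) -> List[List[float]]:
    """Build E = (a_T, ..., a_0); step_entropies[0] holds per-token entropies at step T."""
    return [step_statistics(h, semantic_mask, k) for h in step_entropies]


if __name__ == "__main__":
    mask = [True, False, True, True]      # hypothetical: token 1 is a structural token
    steps = [[2.0, 0.1, 1.5, 1.0],        # step T (noisiest)
             [1.0, 0.1, 0.5, 0.3]]        # step 0 (final)
    for a_t in evidence_trajectory(steps, mask, k=2):
        print([round(x, 3) for x in a_t])
```

Filtering happens before any statistic is computed, so a low-entropy structural token (index 1 above) cannot pull the mean down and dilute the signal from the high-entropy semantic tokens.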
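The deviation-based detector can be illustrated with a small scoring function. This sketch rests on several assumptions that the summary does not specify: the reference trajectory a_hat_t is passed in directly (standing in for the output of g_theta), each step's features [a_t ; a_hat_t ; Delta a_t] are scored by a single linear head `w`, and the learnable temporal weights are softmax-normalized logits. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over temporal logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def deviation_score(observed: Sequence[Sequence[float]],
                    reference: Sequence[Sequence[float]],
                    w: Sequence[float], bias: float,
                    temporal_logits: Sequence[float]) -> float:
    """Hallucination probability from observed a_t, reference a_hat_t, and Delta a_t."""
    alphas = softmax(temporal_logits)  # temporal weights, normalized over steps
    pooled = 0.0
    for alpha, a, a_hat in zip(alphas, observed, reference):
        delta = [x - y for x, y in zip(a, a_hat)]
        feats = list(a) + list(a_hat) + delta        # [a_t ; a_hat_t ; Delta a_t]
        pooled += alpha * sum(wi * fi for wi, fi in zip(w, feats))
    return sigmoid(pooled + bias)


if __name__ == "__main__":
    obs = [[1.5, 2.0, 1.75], [0.6, 1.0, 0.75]]   # observed trajectory (a_T, a_0)
    ref = [[1.4, 1.9, 1.60], [0.2, 0.4, 0.30]]   # hypothetical reference from g_theta
    w = [0.0] * 6 + [1.0] * 3                    # toy head that only looks at Delta a_t
    print(deviation_score(obs, ref, w, 0.0, [0.0, 2.0]))  # later step weighted more
```

Feeding the raw evidence and the reference alongside their difference lets the head learn both absolute uncertainty and how far the response strays from expected dynamics; larger logits on late steps are one way the temporal weights could emphasize the end of denoising.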
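The objective L = L_cls + lambda1*L_path + lambda2*L_reb can be assembled as follows. The summary does not define L_path or L_reb, so every concrete form below is a hypothetical illustration: L_cls is assumed to be binary cross-entropy, L_path a squared gap between the learnable reference and the observed trajectory on non-hallucinated samples, and L_reb a hinge on late-stage rises of the reference beyond an adaptive margin kept as an exponential moving average (`EMAMargin` is a hypothetical helper). Trajectories are scalar here for brevity.

```python
import math
from typing import Sequence


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for predicted hallucination probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


class EMAMargin:
    """Hypothetical adaptive margin maintained as an exponential moving average."""

    def __init__(self, decay: float = 0.9, init: float = 0.0):
        self.decay = decay
        self.value = init

    def update(self, x: float) -> float:
        self.value = self.decay * self.value + (1.0 - self.decay) * x
        return self.value


def path_loss(observed: Sequence[float], reference: Sequence[float]) -> float:
    """Hypothetical L_path: mean squared gap between observed and reference trajectories."""
    return sum((o - r) ** 2 for o, r in zip(observed, reference)) / len(observed)


def rebound_loss(reference: Sequence[float], late_steps: int, margin: float) -> float:
    """Hypothetical L_reb: hinge on late-stage rises of the reference trajectory.

    Trajectories are ordered a_T ... a_0, so a positive step-to-step difference
    means uncertainty went back up (a rebound)."""
    late = reference[-(late_steps + 1):]
    rises = [max(0.0, nxt - cur - margin) for cur, nxt in zip(late, late[1:])]
    return sum(rises) / max(len(rises), 1)


def total_loss(p: float, y: int, observed: Sequence[float], reference: Sequence[float],
               margin: float, lambda1: float = 0.1, lambda2: float = 0.1,
               late_steps: int = 3) -> float:
    """L = L_cls + lambda1 * L_path + lambda2 * L_reb for a single sample."""
    l_cls = bce(p, y)
    # Assumption: the reference is fitted only to non-hallucinated (y == 0) trajectories.
    l_path = path_loss(observed, reference) if y == 0 else 0.0
    l_reb = rebound_loss(reference, late_steps, margin)
    return l_cls + lambda1 * l_path + lambda2 * l_reb
```

Both regularizers act on the reference trajectory rather than on the observed evidence, because only the reference is produced by a learnable model; the observed entropies are fixed inputs and would receive no gradient.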
Figure 1: Visualization of spatial uncertainty distribution during decoding. Tokens exhibiting the highest entropy spikes serve as the primary indicators of factual instability. On the contrary, intermediate and structural tokens provide limited cues for hallucination detection.

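Experimental Results

Research Questions

  • RQ1: How do semantic-token filtering and multi-statistic entropy summaries improve hallucination signals in D-LLMs?
  • RQ2: Does modeling denoising dynamics against a reference trajectory detect hallucinations better than state-of-the-art trajectory-based methods?
  • RQ3: Do late-stage dynamics (stagnation/rebound) provide stronger cues to the factuality of D-LLM outputs across datasets?
  • RQ4: Is the DynHD framework robust and efficient across different D-LLM backbones and question-answering tasks?
  • RQ5: How do ablations of the evidence-construction and deviation-modeling components affect detection performance?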

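Key Findings

  • DynHD exceeds the state of the art in AUROC on TriviaQA, HotpotQA, and CSQA with both backbones, LLaDA-8B-Instruct and Dream-7B-Instruct, improving on the baselines on average.
  • In the reported settings, DynHD improves average AUROC over TraceDet by 12.2%.
  • Ablations show that token filtering during evidence construction and the three entropy statistics are essential for strong performance; removing these components lowers AUROC.
  • Uniform temporal pooling and attention-based weighting improve the temporal aggregation of evidence, and the emphasis on later steps is consistent with the stagnation/rebound signals.
  • DynHD offers a favorable speed-accuracy trade-off: it is more efficient than multi-sample methods while also being more accurate.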
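The findings are reported in AUROC, which equals the probability that a randomly chosen hallucinated response receives a higher detector score than a randomly chosen faithful one, with ties counted as one half. The sketch below computes it directly from that definition; it is a generic illustration of the metric, not the paper's evaluation script, and the labeling convention (1 = hallucinated) is an assumption.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative), ties = 1/2.

    labels: 1 = hallucinated, 0 = faithful. O(P * N) pairwise version for clarity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores: one hallucinated response (0.4) ranks below a faithful one (0.6).
    print(auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

Because AUROC depends only on how scores rank, it is threshold-free, which makes it convenient for comparing detectors whose raw scores live on different scales.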


This review was generated by AI and reviewed by human editors.