QUICK REVIEW

[Paper Review] Advancing Block Diffusion Language Models for Test-Time Scaling

Yi Lu, Deyang Kong|arXiv (Cornell University)|Feb 10, 2026

Topic Modeling | Cited by 0

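One-Sentence Summary

The paper proposes BACD and TCCF, which give Block Diffusion Language Models (BDLMs) adaptive test-time scaling, yielding faster inference and better reasoning accuracy on complex benchmarks.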

ABSTRACT

Recent advances in block diffusion language models have demonstrated competitive performance and strong scalability on reasoning tasks. However, existing BDLMs have limited exploration under the test-time scaling setting and face more severe decoding challenges in long Chain-of-Thought reasoning, particularly in balancing the decoding speed and effectiveness. In this work, we propose a unified framework for test-time scaling in BDLMs that introduces adaptivity in both decoding and block-wise generation. At the decoding level, we propose Bounded Adaptive Confidence Decoding (BACD), a difficulty-aware sampling strategy that dynamically adjusts denoising based on model confidence, accelerating inference while controlling error accumulation. Beyond step-wise adaptivity, we introduce Think Coarse, Critic Fine (TCCF), a test-time scaling paradigm that allocates large block sizes to exploratory reasoning and smaller block sizes to refinement, achieving an effective efficiency-effectiveness balance. To enable efficient and effective decoding with a large block size, we adopt Progressive Block Size Extension, which mitigates performance degradation when scaling block sizes. Extensive experiments show that applying BACD and TCCF to TDAR-8B yields significant improvements over strong baselines such as TraDo-8B (2.26x speedup, +11.2 points on AIME24). These results mark an important step toward unlocking the potential of BDLMs for test-time scaling in complex reasoning tasks.

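Motivation and Objectives

Motivate efficient long chain-of-thought reasoning for BDLMs under test-time scaling.
Develop adaptive decoding and block-size strategies that balance speed and accuracy.
Propose Progressive Block Size Extension to enable decoding with large blocks.
Demonstrate improvements on math, code, and STEM reasoning benchmarks.
Release open-source code and models for reproducibility.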

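Proposed Method

Proposes Bounded Adaptive Confidence Decoding (BACD): a dynamic, bounded-threshold strategy that uses the average confidence of previously decoded tokens to select which tokens to decode at each step.
Introduces Think Coarse, Critic Fine (TCCF): at test time, large block sizes are allocated to exploratory thinking and smaller block sizes to the refinement stage.
Applies Progressive Block Size Extension: a multi-stage fine-tuning procedure that gradually increases the block size to mitigate the degradation caused by scaling blocks up.
Bounds the confidence threshold between an upper and a lower limit, adjusting the sampling strategy to stabilize the speed-accuracy trade-off in BDLMs.
Evaluates on six benchmarks covering math, code generation, and STEM reasoning.
Provides training details, including Progressive Block Size Extension from B=4 to B=64 and the choice of B=16 for the 8B model.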
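
The bounded-threshold idea behind BACD can be sketched as follows. This is a minimal illustration based only on the description above, not the authors' implementation: the function names, the default bounds (0.5 and 0.9), the choice to start at the upper bound when no history exists, and the fallback that always decodes at least one token are all assumptions made for the example.

```python
from typing import List, Sequence


def bounded_threshold(history: Sequence[float], lower: float, upper: float) -> float:
    """Clamp the mean of past token confidences into [lower, upper]."""
    if not history:
        # Assumption: be conservative before anything has been decoded.
        return upper
    mean_conf = sum(history) / len(history)
    return max(lower, min(upper, mean_conf))


def select_positions(
    confidences: Sequence[float],
    history: List[float],
    lower: float = 0.5,   # hypothetical lower bound
    upper: float = 0.9,   # hypothetical upper bound
) -> List[int]:
    """Pick the masked positions to decode in one denoising step.

    `confidences` holds the model's top-token probability for each
    still-masked position in the current block; `history` accumulates
    the confidences of tokens decoded so far and is updated in place.
    """
    tau = bounded_threshold(history, lower, upper)
    chosen = [i for i, c in enumerate(confidences) if c >= tau]
    if not chosen:
        # Assumption: always decode the most confident token so that
        # every step makes progress.
        chosen = [max(range(len(confidences)), key=lambda i: confidences[i])]
    history.extend(confidences[i] for i in chosen)
    return chosen
```

When the model has been confident so far, the threshold rises toward the upper bound and fewer tokens are accepted per step; when past confidence is low, the lower bound prevents the threshold from collapsing and accepting near-random tokens.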

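Experimental Results

Research Questions

RQ1: How can test-time decoding adapt to varying difficulty within the long reasoning trajectories of BDLMs?
RQ2: Does varying the block size across inference stages improve the efficiency-accuracy trade-off?
RQ3: Does Progressive Block Size Extension enable stable training and inference with large blocks?
RQ4: How do BACD and TCCF affect performance and speed on math, code, and STEM reasoning benchmarks?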

Key Findings

Method	Math500 (TPF)	Math500 (ACC)	AIME24 (TPF)	AIME24 (ACC)	AIME25 (TPF)	AIME25 (ACC)	AMC23 (TPF)	AMC23 (ACC)	LCB (TPF)	LCB (ACC)	GPQA (TPF)	GPQA (ACC)	AVG (TPF)	AVG (ACC)
+ BACD +TCCF (TDAR-8B-thinking, ours)	1.75	84.0	3.04	42.9	2.79	35.8	2.68	80.0	1.32	42.6	1.39	50.0	2.16	55.9
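
As a quick arithmetic check, the AVG columns in the row above appear to be plain unweighted means over the six benchmarks. The snippet below recomputes them from the per-benchmark values in the table; the dictionary names are just for illustration.

```python
# Per-benchmark values copied from the table row above.
tpf = {
    "Math500": 1.75, "AIME24": 3.04, "AIME25": 2.79,
    "AMC23": 2.68, "LCB": 1.32, "GPQA": 1.39,
}
acc = {
    "Math500": 84.0, "AIME24": 42.9, "AIME25": 35.8,
    "AMC23": 80.0, "LCB": 42.6, "GPQA": 50.0,
}

# Unweighted means, rounded to the precision used in the table.
avg_tpf = round(sum(tpf.values()) / len(tpf), 2)
avg_acc = round(sum(acc.values()) / len(acc), 1)

print(avg_tpf, avg_acc)  # 2.16 55.9
```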

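With BACD, TDAR-8B-thinking decodes faster (speedup of up to 3.37x) and achieves higher accuracy than the baseline on AIME24.
TCCF further improves reasoning performance and provides a better speed-accuracy trade-off across benchmarks.
Progressive Block Size Extension mitigates the performance degradation that occurs when block sizes are scaled up, and significantly outperforms direct extension.
BACD maintains stable performance across different confidence thresholds and is more stable and robust than dynamic confidence decoding.
BACD and TCCF improve robustness and performance on longer generation tasks (complex reasoning).
Generalization: BACD and TCCF also improve TraDo-8B-Thinking, indicating broad applicability to BDLMs.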
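
The coarse-to-fine allocation in TCCF can be pictured as a simple block plan: large blocks while the model is exploring, smaller blocks while it critiques and refines. The sketch below is illustrative only. The phase names, the token budgets, and the block sizes (16 and 4) are hypothetical values chosen for the example, not settings reported in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TCCFConfig:
    think_block_size: int = 16   # hypothetical "coarse" block size
    critic_block_size: int = 4   # hypothetical "fine" block size


def plan_blocks(
    think_tokens: int,
    critic_tokens: int,
    cfg: TCCFConfig = TCCFConfig(),
) -> List[Tuple[str, int]]:
    """Return a list of (phase, block_size) entries, one per block.

    Exploratory thinking is covered with large blocks and the critique
    or refinement phase with small ones.
    """
    if cfg.critic_block_size > cfg.think_block_size:
        raise ValueError("critic blocks should not be larger than think blocks")
    plan: List[Tuple[str, int]] = []
    phases = (
        ("think", think_tokens, cfg.think_block_size),
        ("critic", critic_tokens, cfg.critic_block_size),
    )
    for phase, budget, size in phases:
        n_blocks = -(-budget // size)  # ceiling division
        plan.extend([(phase, size)] * n_blocks)
    return plan
```

For example, `plan_blocks(40, 10)` covers the 40 thinking tokens with three blocks of size 16 and the 10 critique tokens with three blocks of size 4, so most of the sequence is generated with few large blocks while the final refinement proceeds in finer steps.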


This review was generated by AI and reviewed by human editors.