QUICK REVIEW

[Paper Review] Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion

Linrui Ma, Yufei Cui|arXiv (Cornell University)|Jan 20, 2026

Topic Modeling|Cited by: 0

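One-line summary

Proposes Diffusion in Diffusion, a multi-stage draft-then-refine framework. It combines small-block drafting with global bidirectional revision to restore global coherence in semi-autoregressive diffusion models, and achieves a strong generative perplexity improvement on OpenWebText with a modest fine-tuning budget.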

ABSTRACT

One of the most compelling features of global discrete diffusion language models is their global bidirectional contextual capability. However, existing block-based diffusion studies tend to introduce autoregressive priors, which, while offering benefits, can cause models to lose this global coherence at the macro level. To regain global contextual understanding while preserving the advantages of the semi-autoregressive paradigm, we propose Diffusion in Diffusion, a 'draft-then-refine' framework designed to overcome the irreversibility and myopia problems inherent in block diffusion models. Our approach first employs block diffusion to generate rapid drafts using small blocks, then refines these drafts through global bidirectional diffusion with a larger bidirectional receptive field. We utilize snapshot confidence remasking to identify the most critical tokens that require modification, and apply mix-scale training to expand the block diffusion model's global capabilities. Empirical results demonstrate that our approach sets a new benchmark for discrete diffusion models on the OpenWebText dataset. Using only 26% of the fine-tuning budget of baseline models, we reduce generative perplexity from 25.7 to 21.9, significantly narrowing the performance gap with autoregressive models.

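Motivation and objectives

Motivate and address the trade-off between global coherence and inference efficiency in semi-autoregressive diffusion models.
Propose a structured diffusion framework that drafts with small blocks and refines with large-block global diffusion.
Introduce snapshot confidence remasking, which selects the tokens to revise between stages.
Develop mix-scale training to enable effective learning across multiple block sizes.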

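Proposed method

Proposes a multi-stage generation pipeline that enlarges the block size stage by stage (draft → revision).
Selects the tokens to revise through inter-stage remasking based on snapshot confidence.
Applies global bidirectional diffusion steps with a larger receptive field during revision.
Introduces a mix-scale training objective to balance drafting and revision ability (the block-size distribution is bimodal).
Trains and evaluates on OpenWebText with a 110M-parameter Transformer initialized from a pretrained BD3-LM checkpoint.
Provides an algorithm for structured block diffusion sampling (Algorithm 1).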
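
The inter-stage remasking step listed above can be illustrated with a minimal sketch. It is not the paper's Algorithm 1. It assumes that "snapshot confidence" is the probability the model assigned to each token at the moment that token was committed during drafting, and that a revision ratio gamma controls what fraction of tokens is handed back to the revision stage. The names MASK, select_remask_positions, and remask are hypothetical.

```python
MASK = -1  # hypothetical mask token id, chosen only for this sketch


def select_remask_positions(snapshot_conf, gamma):
    """Return the positions of the lowest-confidence tokens, in ascending order.

    snapshot_conf: one confidence value per drafted token (assumed to be
                   recorded when the token was committed in the draft stage).
    gamma:         fraction of the sequence to hand to the revision stage.
    """
    n = len(snapshot_conf)
    k = min(n, max(0, round(gamma * n)))
    # Rank positions by confidence; ties are broken by position for determinism.
    ranked = sorted(range(n), key=lambda i: (snapshot_conf[i], i))
    return sorted(ranked[:k])


def remask(tokens, snapshot_conf, gamma):
    """Replace the selected low-confidence tokens with MASK, keep the rest."""
    selected = set(select_remask_positions(snapshot_conf, gamma))
    return [MASK if i in selected else tok for i, tok in enumerate(tokens)]


draft = [10, 11, 12, 13, 14, 15, 16, 17]
conf = [0.9, 0.2, 0.8, 0.95, 0.1, 0.7, 0.99, 0.6]
print(remask(draft, conf, gamma=0.25))  # [10, -1, 12, 13, -1, 15, 16, 17]
```

In this reading, the revision stage would then denoise only the masked positions while attending to the whole sequence, so confident draft tokens act as fixed context.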

Figure 1: Overview of Diffusion in Diffusion method

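Experimental results

Research questions

RQ1: Can a multi-stage block diffusion framework restore global coherence in semi-autoregressive diffusion models without sacrificing speed?
RQ2: Can snapshot confidence-based remasking effectively identify the tokens that benefit from revision?
RQ3: Does mix-scale training improve generalization across the draft (small-block) and revision (large-block) stages?
RQ4: How much data efficiency is gained on OpenWebText by adopting the draft-then-refine paradigm?

Key findings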

Model	Gen. PPL (L=1024)	NFEs (L=1024)	Gen. PPL (L=2048)	NFEs (L=2048)
AR	14.1	1K	13.2	2K
SEDD	52.0	1K	41.3	2K
MDLM	46.8	1K	35.3	2K
SSD-LM (L'=25)	37.2	40K	35.3	80K
BD3-LM (L'=16)	33.4	1K	31.5	2K
BD3-LM (L'=8)	30.4	1K	28.2	2K
BD3-LM (L'=4)	25.7	1K	23.6	2K
Ours (Stage 1 only)	27.4	1.0K	25.1	2.0K
Ours (Full 2-Stage)	24.6	1.1K	22.5	2.2K
Ours (Stage 2+ Stage 3?)	22.6	1.2K	21.2	2.5K
Ours (Full)	21.9	1.5K	20.6	3.0K
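
A few derived figures follow directly from the L=1024 columns of the table above. The short script below only recomputes them from the reported numbers. The "share of the gap to AR closed" metric is an illustrative quantity introduced here, not one reported in the paper.

```python
# Gen. PPL and NFEs at L=1024, copied from the table above.
gen_ppl = {
    "AR": 14.1,
    "BD3-LM (L'=4)": 25.7,
    "Ours (Stage 1 only)": 27.4,
    "Ours (Full)": 21.9,
}
nfes = {"BD3-LM (L'=4)": 1000, "Ours (Full)": 1500}


def relative_reduction(before, after):
    """Fractional drop in perplexity when going from `before` to `after`."""
    return (before - after) / before


# Draft stage versus the full pipeline.
draft_to_full = relative_reduction(gen_ppl["Ours (Stage 1 only)"], gen_ppl["Ours (Full)"])
# Strongest block diffusion baseline versus the full pipeline.
baseline_to_full = relative_reduction(gen_ppl["BD3-LM (L'=4)"], gen_ppl["Ours (Full)"])
# Illustrative metric: share of the baseline-to-AR perplexity gap that is closed.
gap_before = gen_ppl["BD3-LM (L'=4)"] - gen_ppl["AR"]
gap_after = gen_ppl["Ours (Full)"] - gen_ppl["AR"]
gap_closed = (gap_before - gap_after) / gap_before
# Extra sampling cost of the full pipeline in function evaluations.
nfe_ratio = nfes["Ours (Full)"] / nfes["BD3-LM (L'=4)"]

print(f"draft -> full:    {draft_to_full:.1%}")     # 20.1%
print(f"baseline -> full: {baseline_to_full:.1%}")  # 14.8%
print(f"gap to AR closed: {gap_closed:.1%}")        # 32.8%
print(f"NFE ratio:        {nfe_ratio:.1f}x")        # 1.5x
```

Read together, the numbers suggest the full pipeline buys roughly a 15% perplexity reduction over the best single-pass baseline for about 1.5 times the function evaluations.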

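Stage 1 (draft) reaches a Gen. PPL of 27.4 at L=1024; with full revision in Stage 2, Gen. PPL drops to 21.9, a relative improvement of about 20%.
Using only 26% of the fine-tuning budget, the two-stage method outperforms the single-pass block diffusion baseline and approaches autoregressive quality.
Snapshot confidence remasking is better at steering revision than random masking or post-hoc confidence strategies.
Mix-scale training (bimodal over block sizes 4 and 1024) is essential because it enables both drafting and global revision. Single-scale training of the baseline fails at revision.
On OpenWebText, the method sets a new state of the art for discrete diffusion models at this scale, with a favorable quality-efficiency trade-off.
Compared with the autoregressive baseline, the method narrows the perplexity gap, showing the benefit of reintroducing a global receptive field into block diffusion.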
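
To make the mix-scale finding concrete, the sketch below shows one way a bimodal block-size draw and the matching attention pattern could look. It is a simplified illustration under stated assumptions, not the paper's training code. It assumes block sizes are drawn from {4, 1024} with a hypothetical mixing probability p_small, and that block diffusion attention is bidirectional within a block and causal across blocks. The function names are hypothetical.

```python
import random


def sample_block_size(rng, small=4, large=1024, p_small=0.5):
    """Draw a block size from a two-point (bimodal) distribution.

    p_small is a hypothetical mixing weight; the review does not state it.
    """
    return small if rng.random() < p_small else large


def block_attention_mask(seq_len, block_size):
    """mask[i][j] is True when position i may attend to position j.

    Assumed pattern: full attention inside a block, and attention to all
    earlier blocks, but never to later blocks.
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


rng = random.Random(0)
print(sorted({sample_block_size(rng) for _ in range(200)}))  # [4, 1024]

small_mask = block_attention_mask(8, 4)
print(small_mask[0][3], small_mask[0][4])  # True False

# When the block covers the whole sequence, attention is fully bidirectional.
global_mask = block_attention_mask(8, 8)
print(all(all(row) for row in global_mask))  # True
```

Under these assumptions, the large block size turns the same model into a globally bidirectional denoiser, which is the regime the revision stage relies on.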

Figure 2: Ablation on Revision Scope. Generative Perplexity (Gen PPL) as a function of the Stage 2 revision ratio $\gamma$ across varying block sizes $\mathcal{B}^{(2)}$ . The gray dashed line represents the Stage 1 baseline (BS=4).

This review was generated by AI and checked by a human editor.