QUICK REVIEW

[Paper Review] Adaptation to Intrinsic Dependence in Diffusion Language Models

Yunxiao Zhao, Changxiao Cai|arXiv (Cornell University)|Feb 23, 2026

Topic Modeling | Citations: 0

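One-Sentence Summary

Summary: This paper proposes a distribution-agnostic, randomized unmasking schedule for diffusion language models that adapts to the intrinsic dependence of the data and comes with KL convergence guarantees for parallel sampling.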

ABSTRACT

Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) approaches, enabling parallel token generation beyond a rigid left-to-right order. Despite growing empirical success, the theoretical understanding of how unmasking schedules -- which specify the order and size of unmasked tokens during sampling -- affect generation quality remains limited. In this work, we introduce a distribution-agnostic unmasking schedule for DLMs that adapts to the (unknown) dependence structure of the target data distribution, without requiring any prior knowledge or hyperparameter tuning. In contrast to prior deterministic procedures that fix unmasking sizes, our method randomizes the number of tokens revealed at each iteration. We show that, for two specific parameter choices, the sampling convergence guarantees -- measured by Kullback-Leibler (KL) divergence -- scale as $\widetilde O(\mathsf{TC}/K)$ and $\widetilde O(\mathsf{DTC}/K)$ respectively. Here, $K$ is the number of iterations, and $\mathsf{TC}$ and $\mathsf{DTC}$ are the total correlation and dual total correlation of the target distribution, capturing the intrinsic dependence structure underlying the data. Importantly, our guarantees hold in the practically relevant parallel-sampling regime $K

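Motivation and Objectives

Motivate and address the trade-off between sampling efficiency and accuracy in diffusion language models (DLMs).
Propose a distribution-agnostic, randomized unmasking schedule that adapts to unknown data structure without prior knowledge.
Establish theoretical KL divergence convergence guarantees that depend on intrinsic measures of the data (TC and DTC).
Show that parallel sampling with K < L can be accelerated for low-complexity distributions.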
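
For reference, the two dependence measures named above have standard information-theoretic definitions, which this review does not spell out. The sketch below assumes a sequence $X = (X_1, \dots, X_L)$, with $H$ denoting Shannon entropy and $X_{-i}$ denoting all tokens except the $i$-th; the paper's own notation may differ.

$$
\mathsf{TC}(X) = \sum_{i=1}^{L} H(X_i) - H(X_1, \dots, X_L),
\qquad
\mathsf{DTC}(X) = H(X_1, \dots, X_L) - \sum_{i=1}^{L} H(X_i \mid X_{-i}).
$$

Both quantities are nonnegative and vanish when the tokens are independent, which is consistent with the review's point that low-dependence distributions are easier to sample in parallel.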

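Proposed Method

Introduce a randomized unmasking scheme that draws the size of the unmasking set at random at each iteration.
Specify two concrete schemes, TC-adaptive and DTC-adaptive, together with the definitions of their coefficients and weights.
Provide KL-based convergence guarantees that scale as TC(X)/K and DTC(X)/(K - log(L - 1) - 1).
Show that the guarantees hold in the parallel-sampling regime K < L.
Describe the training objective of the mask predictor, which approximates the conditional distribution of the masked tokens by a product of per-token conditional marginals (Eq. 3).
Discuss the computational cost and precomputation of the unmasking schemes (O(KL) precomputation; O(K + L) for sampling afterwards).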
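
The steps above can be pictured with a minimal Python sketch of a masked-diffusion sampling loop whose per-iteration unmasking sizes are random. Everything named here is an assumption for illustration: `random_unmask_sizes` is a hypothetical placeholder and is not the paper's $\pi_{\mathsf{tc}}$ or $\pi_{\mathsf{dtc}}$ (their coefficients and weights are not given in this review), `predict_marginals` is an assumed interface for the mask predictor, and choosing positions uniformly at random is likewise an assumption.

```python
import numpy as np

def random_unmask_sizes(L, K, rng):
    """Hypothetical placeholder: K random nonnegative sizes that sum to L.

    NOT the paper's pi_tc or pi_dtc; it only illustrates that the sizes
    are drawn at random rather than fixed in advance.
    """
    # Assign each of the L tokens to one of the K iterations uniformly at random.
    return np.bincount(rng.integers(0, K, size=L), minlength=K)

def sample_sequence(predict_marginals, L, K, vocab_size, rng, mask_id=-1):
    """Generic sampling loop driven by a randomized size schedule.

    predict_marginals(x) is an assumed interface: it receives the partially
    masked sequence x of shape (L,) and returns an (L, vocab_size) array of
    per-position conditional marginals.
    """
    x = np.full(L, mask_id, dtype=np.int64)
    sizes = random_unmask_sizes(L, K, rng)
    for s in sizes:
        if s == 0:
            continue  # nothing is revealed in this iteration
        masked = np.flatnonzero(x == mask_id)
        # Assumption: positions to reveal are chosen uniformly at random.
        chosen = rng.choice(masked, size=s, replace=False)
        probs = predict_marginals(x)
        # Tokens revealed in the same iteration are sampled independently
        # from their marginals; this is the source of the parallel-sampling error.
        for i in chosen:
            x[i] = rng.choice(vocab_size, p=probs[i])
    return x
```

Because the sizes sum to L, every position is unmasked after K iterations. With K < L, at least one iteration must reveal several tokens at once, which is the parallel regime the guarantees address.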

Figure 1: Empirical mean unmasking size vs. iteration index $k$: (a) TC-adaptive scheme $\pi_{\mathsf{tc}}$; (b) DTC-adaptive scheme $\pi_{\mathsf{dtc}}$. The total number of iterations is $K=1000$ and the sequence length is $L=2000$.

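Experimental Results

Research Questions

RQ1: Can an unmasking schedule for DLMs be designed that adapts to unknown intrinsic data structure without any knowledge of the distribution?
RQ2: How do the TC and DTC of the target distribution affect the KL convergence rate of DLM sampling?
RQ3: Do the TC-adaptive and DTC-adaptive schedules yield practical gains under parallel sampling with K < L?
RQ4: What are the computational cost and practical feasibility of implementing randomized unmasking schedules?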

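Key Findings

The distribution-agnostic randomized unmasking schedules achieve KL convergence guarantees that depend on TC(X) or DTC(X).
The TC-adaptive scheme attains a bound that scales as TC(X)/K, up to a log L factor.
The DTC-adaptive scheme attains a bound that scales as DTC(X)/(K - log(L - 1) - 1), up to logarithmic factors.
The guarantees hold in the parallel-generation regime K < L, so sampling can be accelerated for low-complexity distributions.
The two concrete schemes adapt to the intrinsic dependence structure without prior knowledge or estimation of TC or DTC.
Randomizing the unmasking sizes, rather than fixing them, is key to distribution-agnostic adaptivity.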
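
To make the roles of TC and DTC in these findings concrete, the following Python sketch computes both quantities for a small joint distribution using their standard entropy-based definitions. The helper names (`entropy`, `tc_and_dtc`) and the two toy distributions are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability array (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def tc_and_dtc(joint):
    """Return (TC, DTC) for a joint pmf stored as an array with one axis per token."""
    joint = np.asarray(joint, dtype=float)
    n = joint.ndim
    h_joint = entropy(joint)
    # H(X_i): sum out every axis except i.
    h_marg = [entropy(joint.sum(axis=tuple(j for j in range(n) if j != i)))
              for i in range(n)]
    # H(X_i | X_{-i}) = H(X) - H(X_{-i}), where X_{-i} is obtained by summing out axis i.
    h_cond = [h_joint - entropy(joint.sum(axis=i)) for i in range(n)]
    tc = sum(h_marg) - h_joint
    dtc = h_joint - sum(h_cond)
    return tc, dtc

# Toy example 1: three identical binary tokens (X1 = X2 = X3).
copy = np.zeros((2, 2, 2))
copy[0, 0, 0] = copy[1, 1, 1] = 0.5

# Toy example 2: X1, X2 uniform and independent, X3 = X1 XOR X2.
parity = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        parity[a, b, a ^ b] = 0.25

tc_copy, dtc_copy = tc_and_dtc(copy)
tc_parity, dtc_parity = tc_and_dtc(parity)
```

In the copy example TC is 2 ln 2 and DTC is ln 2, whereas in the parity example the values are reversed (TC is ln 2, DTC is 2 ln 2). Neither measure dominates the other in general, which is one way to read why the review lists separate TC-adaptive and DTC-adaptive schemes.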

Figure 2: Expected KL error of the TC-adaptive unmasking scheme $\pi_{\mathsf{tc}}$: (a) KL error vs. iteration number $K$ for codimension $L-d=5$; (b) KL error vs. TC for number of iterations $K=500$. Sequence length $L=2000$ and alphabet size $q=2048$.


This review was generated by AI and checked by a human editor.