QUICK REVIEW

[Paper Review] Auto-Regressive Masked Diffusion Models

Mahdi Karami, Ali Ghodsi|arXiv (Cornell University)|Jan 23, 2026

Topic Modeling | Citations: 0

One-Line Summary

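tldr: ARMD reframes masked diffusion as a block-wise causal model and introduces a strictly causal, permutation-equivariant architecture. This unifies the training efficiency of autoregressive models with the parallel generation of diffusion models, while supporting autoregressive-style decoding.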

ABSTRACT

Masked diffusion models (MDMs) have emerged as a promising approach for language modeling, yet they face a performance gap compared to autoregressive models (ARMs) and require more training iterations. In this work, we present the Auto-Regressive Masked Diffusion (ARMD) model, an architecture designed to close this gap by unifying the training efficiency of autoregressive models with the parallel generation capabilities of diffusion-based models. Our key insight is to reframe the masked diffusion process as a block-wise causal model. This perspective allows us to design a strictly causal, permutation-equivariant architecture that computes all conditional probabilities across multiple denoising steps in a single, parallel forward pass. The resulting architecture supports efficient, autoregressive-style decoding and a progressive permutation training scheme, allowing the model to learn both canonical left-to-right and random token orderings. Leveraging this flexibility, we introduce a novel strided parallel generation strategy that accelerates inference by generating tokens in parallel streams while maintaining global coherence. Empirical results demonstrate that ARMD achieves state-of-the-art performance on standard language modeling benchmarks, outperforming established diffusion baselines while requiring significantly fewer training steps. Furthermore, it establishes a new benchmark for parallel text generation, effectively bridging the performance gap between parallel and sequential decoding.
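To make the block-wise causal view concrete, the sketch below builds two attention masks from a generation order: a causal mask (a token sees its own block and earlier blocks) and a strictly causal mask (a token sees only earlier blocks). Under the strictly causal mask, each position's output can act as a prediction of that token given everything revealed before its denoising step, which is how all steps' conditionals could come out of a single forward pass. This is a minimal NumPy illustration under stated assumptions: the function name `blockwise_causal_masks`, the fixed `block_size` per step, and reading the causal/strictly causal pair as an XLNet-style two-stream setup are ours, not details confirmed by this review.

```python
import numpy as np

def blockwise_causal_masks(order, block_size):
    """Hypothetical helper (not the paper's code): attention masks for a
    block-wise causal view of masked diffusion.

    order: permutation of positions 0..n-1 giving the order tokens are revealed.
    block_size: tokens revealed per denoising step (assumed constant here).
    Returns (causal, strict), boolean arrays of shape (n, n) where
    mask[i, j] is True if position i may attend to position j.
    """
    n = len(order)
    # step[p] = denoising step at which position p is revealed.
    step = np.empty(n, dtype=int)
    step[np.asarray(order)] = np.arange(n) // block_size
    # Causal: attend to positions revealed in the same or an earlier step.
    causal = step[None, :] <= step[:, None]
    # Strictly causal: attend only to positions revealed in earlier steps, so a
    # position's output never sees its own token or the rest of its block.
    strict = step[None, :] < step[:, None]
    return causal, strict
```

For `order=[0, 1, 2, 3]` and `block_size=2`, positions 2 and 3 may attend to positions 0 and 1 under the strict mask, while positions 0 and 1 attend to nothing; a real model would presumably prepend a prompt or start token. Because the masks are indexed by position and derived only from the order, permuting the order permutes the masks consistently.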

Motivation and Objectives

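Motivated by the performance gap between masked diffusion models (MDMs) and autoregressive models (ARMs) in language modeling.
Proposes a strictly causal, permutation-equivariant architecture that evaluates all conditionals in parallel.
Enables autoregressive-style decoding through key-value caching and a strided parallel generation strategy.
Provides a training scheme that supports learning both the canonical left-to-right order and random token orders.
Demonstrates state-of-the-art performance while requiring fewer training steps than diffusion-based baselines.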
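The review does not say how the progressive permutation training schedule works. One plausible reading, sketched here purely as an assumption, is a curriculum that starts with the canonical left-to-right order and gradually mixes in uniformly random orders. The function `sample_training_order`, the parameter `max_random_prob`, and the linear ramp are all hypothetical.

```python
import random

def sample_training_order(n, step, total_steps, max_random_prob=0.5, rng=None):
    """Hypothetical progressive permutation curriculum (an assumption, not ARMD's
    published schedule): return a generation order over n positions.

    The probability of using a uniformly random order instead of the canonical
    left-to-right order ramps linearly from 0 to max_random_prob as training
    progresses from step 0 to total_steps.
    """
    rng = rng if rng is not None else random.Random()
    p_random = max_random_prob * min(1.0, step / total_steps)
    order = list(range(n))  # canonical left-to-right order
    if rng.random() < p_random:
        rng.shuffle(order)  # uniformly random token order
    return order
```

At step 0 this always returns `list(range(n))`; by the end of training, roughly a `max_random_prob` fraction of sampled orders are random permutations, so the model keeps seeing the canonical order throughout.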

Proposed Method

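Redefines masked diffusion as a block-wise causal model so that all conditionals can be evaluated in parallel in a single forward pass.
Introduces a strictly causal, permutation-equivariant, attention-based architecture with a two-stream attention mechanism built from causal layers and strictly causal layers.
Enables hybrid learning through progressive permutation training, which encourages the model to learn both left-to-right and random orderings.
Incorporates KV caching to make autoregressive-style decoding efficient at inference time.
Develops a strided parallel generation (SBP) strategy that generates tokens in parallel streams while maintaining global coherence.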
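The strided parallel generation strategy is described only at a high level here. A minimal sketch of one possible interpretation: split the sequence into `num_streams` contiguous segments and decode the next token of every segment at the same step, so the number of decoding steps drops from `seq_len` to `seq_len / num_streams`. The function `strided_generation_order` and this particular segmentation are hypothetical; how ARMD actually assigns positions to streams and keeps them globally coherent is not given in this review.

```python
def strided_generation_order(seq_len, num_streams):
    """Hypothetical strided schedule (one interpretation, not the paper's code).

    Splits positions 0..seq_len-1 into num_streams contiguous segments and, at
    each decoding step, emits the next position of every segment together.
    Returns a list of steps; each step lists the positions decoded in parallel.
    """
    if seq_len % num_streams != 0:
        raise ValueError("seq_len must be divisible by num_streams")
    stride = seq_len // num_streams  # length of each stream's segment
    # Step t decodes position t of every segment: t, stride + t, 2*stride + t, ...
    return [[s * stride + t for s in range(num_streams)] for t in range(stride)]
```

For example, `strided_generation_order(8, 2)` returns `[[0, 4], [1, 5], [2, 6], [3, 7]]`: four parallel steps instead of eight sequential ones. Since position 4 is decoded before positions 1 to 3 exist, this kind of schedule presumably relies on a model that was trained on orders other than strict left-to-right.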

Experimental Results

Research Questions

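RQ1: Can masked diffusion models be redefined as block-wise causal models that enable parallel evaluation of conditionals?
RQ2: Does a strictly causal, permutation-equivariant architecture improve training efficiency and language modeling performance compared with existing MDMs?
RQ3: Does strided parallel generation narrow the gap between diffusion-based and autoregressive decoding in both speed and quality?
RQ4: How does progressive permutation training affect learning from both canonical and random orderings?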

Key Findings

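ARMD achieves state-of-the-art performance on standard language modeling benchmarks.
ARMD outperforms established diffusion-based baselines while requiring significantly fewer training steps.
The model establishes a new benchmark for parallel text generation, bridging the performance gap between parallel and sequential decoding.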

This review was generated by AI and checked by a human editor.