QUICK REVIEW

[Paper Review] DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises

Jiasheng Ye, Zaixiang Zheng|arXiv (Cornell University)|Feb 20, 2023

Music and Audio Processing|Cited by 9

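One-line summary

DiNoiSer introduces noise scale clipping and condition-enhanced diffusion sampling, improving the translation and text generation quality of diffusion-based conditional sequence learning on multilingual benchmarks.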

ABSTRACT

While diffusion models have achieved great success in generating continuous signals such as images and audio, it remains elusive for diffusion models in learning discrete sequence data like natural languages. Although recent advances circumvent this challenge of discreteness by embedding discrete tokens as continuous surrogates, they still fall short of satisfactory generation quality. To understand this, we first dive deep into the denoised training protocol of diffusion-based sequence generative models and determine their three severe problems, i.e., 1) failing to learn, 2) lack of scalability, and 3) neglecting source conditions. We argue that these problems can be boiled down to the pitfall of the not completely eliminated discreteness in the embedding space, and the scale of noises is decisive herein. In this paper, we introduce DINOISER to facilitate diffusion models for sequence generation by manipulating noises. We propose to adaptively determine the range of sampled noise scales for counter-discreteness training; and encourage the proposed diffused sequence learner to leverage source conditions with amplified noise scales during inference. Experiments show that DINOISER enables consistent improvement over the baselines of previous diffusion-based sequence generative models on several conditional sequence modeling benchmarks thanks to both effective training and inference strategies. Analyses further verify that DINOISER can make better use of source conditions to govern its generative process.

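Motivation and objectives

Identify the main limitations of diffusion models in discrete sequence learning (the pitfall of discreteness, lack of scalability, and underuse of source conditions).
Develop training and inference strategies that mitigate discreteness through adaptive manipulation of noise scales.
Show performance above the baselines on several conditional sequence tasks (machine translation, text simplification, paraphrasing).
Analyze how the noise scale affects reliance on source conditions and generation quality.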

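Proposed method

Analyzes the shortcomings of embedding-based diffusion on discrete sequences and relates them to the noise scale.
Introduces noise scale clipping, which keeps training out of small-noise regimes, and adapts the clipping threshold to the properties of the embedding space.
Proposes CeDi (condition-enhanced denoiser), which forces reliance on source conditions at inference time through high-noise indicators.
Adopts DDIM-style sampling with modified timesteps and a two-timestep schedule to emphasize source-conditioned generation.
Provides a training objective L'diffusion, which includes a minimum noise threshold and a reconstruction term, within a latent-variable diffusion framework.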
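
The noise scale clipping idea above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a variance-preserving corruption z_t = sqrt(1 - sigma^2) * z_0 + sigma * eps, uniform sampling of the noise scale, and a fixed threshold `sigma_min` supplied by hand, whereas the paper adapts the threshold to the embedding space. The function names are hypothetical.

```python
import numpy as np

def sample_clipped_noise_scales(batch_size, sigma_min, rng, sigma_max=1.0):
    """Draw one noise scale per example from [sigma_min, sigma_max).

    Assumption: uniform sampling; small-noise scales below sigma_min
    are never used for training.
    """
    if not 0.0 <= sigma_min < sigma_max <= 1.0:
        raise ValueError("need 0 <= sigma_min < sigma_max <= 1")
    return rng.uniform(sigma_min, sigma_max, size=batch_size)

def corrupt_embeddings(z0, sigma, rng):
    """Corrupt embeddings z0 of shape (batch, length, dim).

    Assumption: variance-preserving form
    z_t = sqrt(1 - sigma^2) * z0 + sigma * eps.
    """
    sigma = sigma.reshape(-1, 1, 1)          # broadcast over length and dim
    eps = rng.standard_normal(z0.shape)      # Gaussian noise
    return np.sqrt(1.0 - sigma**2) * z0 + sigma * eps

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 6, 8))          # toy target embeddings
sigma = sample_clipped_noise_scales(4, sigma_min=0.3, rng=rng)
zt = corrupt_embeddings(z0, sigma, rng)      # noisy inputs for the denoiser
```

A denoiser trained on `zt` never sees inputs that are almost clean, which is the regime the review describes as the pitfall of discreteness.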

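Experimental results

Research questions

RQ1: Can adaptive noise scaling mitigate the pitfall of discreteness in diffusion-based sequence learning?
RQ2: Does imposing a higher minimum noise scale during training improve conditional generation quality?
RQ3: Does condition-enhanced sampling with CeDi improve the use of source conditions at inference time?
RQ4: How does DiNoiSer compare with autoregressive models, CMLM, and previous diffusion-based sequence models on multilingual MT, text simplification, and paraphrasing?

Key findings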

Methods	IWSLT14 De→En	IWSLT14 En→De	WMT14 De→En	WMT14 En→De	WMT16 Ro→En	WMT16 En→Ro
Transformer (AR, beam=5)	33.61	28.30	30.55	26.85	33.08	32.86
CMLM (NAR, LB=5)	29.41	24.33	28.71	23.22	31.13	31.26
CMLM (NAR, LB=5, MBR=1)	29.32	24.34	28.43	23.09	31.07	30.92
DiffusionLM (LB=5, MBR=1)	26.61	20.29	17.31	15.33	28.61	27.01
DiffusionLM (LB=5, MBR=10)	29.11	22.91	19.69	17.41	30.17	29.39
CDCD (MBR=10)	-	-	25.40	19.70	-	-
CDCD (MBR=100)	-	-	26.00	20.00	-	-
Difformer (LBxMBR=20)	-	-	-	23.80	-	-
DiffuSeq (KD, LBxMBR=10)	-	-	-	15.37	-	25.45
SeqDiffuSeq (KD, LBxMBR=10)	-	-	-	17.14	-	26.17
DiNoiSer (LB=5, MBR=1)	31.29	25.55	28.83	24.25	31.14	30.93
DiNoiSer (LB=5, MBR=10)	31.61	25.70	29.05	24.26	31.22	31.08
DiNoiSer (LB=10, MBR=5)	31.44	26.14	29.01	24.62	31.24	31.03
DiNoiSer (KD, LB=10, MBR=5)	-	-	30.30	25.88	33.13	32.84
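
The table labels decoding settings with LB and MBR, which presumably stand for length beam size and the number of candidates used for Minimum Bayes-Risk (MBR) selection. As a rough illustration of MBR selection only, the sketch below picks the candidate with the highest total similarity to the other candidates. The unigram F1 utility is a stand-in chosen for brevity, not necessarily the utility used in the paper, and the function names are hypothetical.

```python
from collections import Counter

def unigram_f1(hyp, ref):
    """Unigram-overlap F1 between two whitespace-tokenized strings.

    Assumption: a simple stand-in utility for illustration.
    """
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def mbr_select(candidates, utility=unigram_f1):
    """Return the candidate with the highest summed utility against the others."""
    def score(i):
        return sum(utility(candidates[i], other)
                   for j, other in enumerate(candidates) if j != i)
    best = max(range(len(candidates)), key=score)
    return candidates[best]

candidates = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog ran away",
]
best = mbr_select(candidates)
```

Under this reading, a setting such as MBR=10 would mean that ten sampled candidates are compared in this way before one is returned.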

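DiNoiSer achieved consistent improvements over diffusion-based baselines on several conditional sequence tasks, including bidirectional and multilingual MT, text simplification, and paraphrasing.
Training with the noise scale clipping strategy prevents training in the small-noise regime and mitigates the pitfall of discreteness.
Inference with large noise indicators through CeDi increases reliance on source conditions and reduces hallucinations.
Ablation studies confirmed that both the training improvement (noise clipping) and the inference improvement (CeDi sampling) contribute to the performance gains.
Post-hoc analyses showed that the condition-enhanced denoiser makes better use of source conditions to produce accurate predictions.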
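
The inference-side finding, feeding the denoiser a large noise indicator so that it leans on the source, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact sampler: it uses a deterministic DDIM-style update in a variance-preserving parameterization, and it replaces the indicator passed to the denoiser with `max(sigma_t, indicator_floor)`. The names `condition_enhanced_sample`, `indicator_floor`, and `toy_denoiser` are hypothetical, and the toy denoiser merely stands in for a trained network.

```python
import numpy as np

def condition_enhanced_sample(denoiser, source, shape, sigmas, indicator_floor, rng):
    """DDIM-style sampling where the denoiser is told a noise level
    at least as large as indicator_floor (assumed simplification)."""
    z = rng.standard_normal(shape)                # start from pure noise
    for sigma_t, sigma_s in zip(sigmas[:-1], sigmas[1:]):
        indicator = max(sigma_t, indicator_floor)  # amplified noise indicator
        z0_hat = denoiser(z, source, indicator)    # predicted clean embedding
        alpha_t = np.sqrt(1.0 - sigma_t**2)
        eps_hat = (z - alpha_t * z0_hat) / sigma_t  # implied noise estimate
        alpha_s = np.sqrt(1.0 - sigma_s**2)
        z = alpha_s * z0_hat + sigma_s * eps_hat    # deterministic step to sigma_s
    return z

def toy_denoiser(z, source, indicator):
    """Toy stand-in: the larger the indicator, the more the prediction
    follows the source signal rather than the noisy input."""
    return indicator * source + (1.0 - indicator) * z

rng = np.random.default_rng(0)
source = rng.standard_normal((2, 5, 4))   # toy source-derived target
sigmas = np.linspace(1.0, 0.0, 11)        # decreasing noise scales, ending at 0
out = condition_enhanced_sample(toy_denoiser, source, source.shape,
                                sigmas, indicator_floor=1.0, rng=rng)
```

With `indicator_floor=1.0` the toy denoiser always predicts the source signal, so the sample collapses onto it; lower floors let the noisy input influence the prediction more.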


This review was generated by AI and checked by a human editor.