QUICK REVIEW

[Paper Review] Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

Boxin Wang, Ping Wei|arXiv (Cornell University)|Feb 8, 2022

Topic Modeling | Cited by 20

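One-Line Summary

The paper studies detoxification of language models through domain-adaptive training. It introduces Self-Generation Enabled domain-Adaptive Training (SGEAT) and compares adapter and prefix tuning with whole-model adaptation across model sizes.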

ABSTRACT

Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we propose to leverage the generative power of LMs and generate nontoxic datasets for domain-adaptive training, which mitigates the exposure bias and is shown to be more data-efficient than using a curated pre-training corpus. We demonstrate that the self-generation method consistently outperforms the existing baselines across various model sizes on both automatic and human evaluations, even when it uses a 1/3 smaller training corpus. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3x larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar toxicity levels as smaller ones given the same pre-training corpus, and ii) large LMs require more endeavor to detoxify. We also explore parameter-efficient training methods for detoxification. We demonstrate that adding and training adapter-only layers in LMs not only saves a lot of parameters but also achieves a better trade-off between toxicity and perplexity than whole model adaptation for the large-scale models.

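Motivation and Objectives

Investigate how far domain-adaptive training can detoxify language models along three dimensions: training corpus, model size, and parameter efficiency.
Show that self-generated data makes detoxification more data-efficient and works better than a curated pre-training corpus.
Evaluate the trade-off between toxicity reduction and LM quality (perplexity and downstream utility).
Evaluate parameter-efficient methods for detoxification (adapters and prefix tuning) against whole-model adaptation.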

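Proposed Method

Propose Self-Generation Enabled domain-Adaptive Training (SGEAT), which builds a nontoxic corpus from text generated by the LM itself.
Generate up to 1,000 tokens per document with nucleus sampling (p = 0.9, temperature = 1), ending each sample at a sentence boundary.
Filter the generated data with Perspective API and keep about 50% of it as training data.
Fine-tune the pre-trained LM on the resulting curated nontoxic corpus with the standard log-likelihood loss.
Compare the standard, heuristic, and augmented variants of SGEAT against the DAPT and Jigsaw baselines and against decoding-time methods.
Measure toxicity with Perspective API (Expected Maximum Toxicity and Toxicity Probability) and LM quality with perplexity and downstream utility across tasks.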
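
The filtering step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary only says that about 50% of the generated data is kept, so ranking by toxicity score and keeping the least toxic half is an assumption here. The names `keep_least_toxic` and `toy_scorer` are hypothetical, and `toy_scorer` is a stand-in for a real classifier such as Perspective API, which is not called.

```python
from typing import Callable, List, Tuple


def keep_least_toxic(
    texts: List[str],
    score_toxicity: Callable[[str], float],
    keep_fraction: float = 0.5,
) -> List[str]:
    """Return the `keep_fraction` of texts with the lowest toxicity scores."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Score every generated text once; scores are assumed to lie in [0, 1].
    scored: List[Tuple[float, str]] = [(score_toxicity(t), t) for t in texts]
    # Sort by score only; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0])
    n_keep = int(len(scored) * keep_fraction)
    return [text for _, text in scored[:n_keep]]


def toy_scorer(text: str) -> float:
    """Hypothetical stand-in for a toxicity classifier: share of flagged words."""
    flagged = {"hate", "stupid"}
    words = text.lower().split()
    return sum(w in flagged for w in words) / len(words) if words else 0.0


# Example: four generated samples, keep the least toxic half.
generated = [
    "you are stupid",
    "the weather is nice",
    "i hate hate this",
    "thanks for the help",
]
kept = keep_least_toxic(generated, toy_scorer)
# kept == ["the weather is nice", "thanks for the help"]
```

Passing the scorer in as a function keeps the selection logic separate from whichever classifier produces the scores, so the same routine works with any scorer that returns a value between 0 and 1.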

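Experimental Results

Research Questions

RQ1: How does the detoxification effect of domain-adaptive training scale with model size (126M to 530B parameters)?
RQ2: How do self-generated data and pre-training data compare in their effect on detoxification efficiency and LM quality?
RQ3: For large LMs, what is the trade-off between adapter or prefix tuning and whole-model adaptation?
RQ4: Does combining SGEAT with decoding-time methods improve toxicity reduction without substantially hurting perplexity or task performance?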

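Key Findings

Data self-generated with SGEAT consistently outperforms the baselines across model sizes, even with a training corpus one-third smaller.
With the pre-training corpus held fixed, large LMs show toxicity levels similar to those of smaller models, which suggests that toxicity stems from the data rather than from model size.
Detoxification becomes less effective as model size grows: reaching the same toxicity reduction takes more data or more training.
For large LMs, adapter-based domain-adaptive training gives a better toxicity-perplexity-utility trade-off than whole-model adaptation, and the advantage grows with model size.
Prefix tuning is less effective at reducing toxicity and gives weaker control over perplexity and downstream utility than adapters.
Combining SGEAT with decoding-time methods achieves the lowest toxicity among the methods evaluated while preserving LM quality and utility.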
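
The toxicity comparisons in these findings rest on two metrics, Expected Maximum Toxicity and Toxicity Probability. The review names them without defining them, so the sketch below assumes the commonly used definitions: several continuations are sampled per prompt, Expected Maximum Toxicity is the mean over prompts of the highest score, and Toxicity Probability is the share of prompts with at least one continuation scoring at or above 0.5. The function names, the 0.5 threshold, and the example scores are illustrative assumptions, not values from the paper.

```python
from typing import List


def expected_max_toxicity(scores_per_prompt: List[List[float]]) -> float:
    """Mean, over prompts, of the highest toxicity score among its generations."""
    if not scores_per_prompt:
        raise ValueError("need at least one prompt")
    # Each inner list is assumed to hold at least one score in [0, 1].
    return sum(max(scores) for scores in scores_per_prompt) / len(scores_per_prompt)


def toxicity_probability(
    scores_per_prompt: List[List[float]], threshold: float = 0.5
) -> float:
    """Fraction of prompts with at least one generation at or above `threshold`."""
    if not scores_per_prompt:
        raise ValueError("need at least one prompt")
    hits = sum(
        any(score >= threshold for score in scores) for scores in scores_per_prompt
    )
    return hits / len(scores_per_prompt)


# Made-up scores: 4 prompts, 3 sampled continuations each.
example_scores = [
    [0.1, 0.7, 0.3],
    [0.2, 0.25, 0.0],
    [0.5, 0.25, 0.25],
    [0.0, 0.0, 0.25],
]
emt = expected_max_toxicity(example_scores)   # mean of 0.7, 0.25, 0.5, 0.25
prob = toxicity_probability(example_scores)   # 2 of 4 prompts reach 0.5
```

Both metrics are worst-case measures per prompt: a single toxic continuation among many is enough to raise them, which is why lower values indicate a model that rarely produces toxic text at all.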


This review was generated by AI and checked by a human editor.