QUICK REVIEW

[Paper Review] Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

Boxin Wang, Ping Wei|arXiv (Cornell University)|Feb 8, 2022

Topic Modeling | Cited by 20

One-Sentence Summary

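This paper studies how to detoxify large language models through domain-adaptive training. It proposes Self-Generation Enabled domain-Adaptive Training (SGEAT) and compares adapter and prefix tuning against whole-model adaptation across model scales.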

ABSTRACT

Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we propose to leverage the generative power of LMs and generate nontoxic datasets for domain-adaptive training, which mitigates the exposure bias and is shown to be more data-efficient than using a curated pre-training corpus. We demonstrate that the self-generation method consistently outperforms the existing baselines across various model sizes on both automatic and human evaluations, even when it uses a 1/3 smaller training corpus. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3x larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar toxicity levels as smaller ones given the same pre-training corpus, and ii) large LMs require more endeavor to detoxify. We also explore parameter-efficient training methods for detoxification. We demonstrate that adding and training adapter-only layers in LMs not only saves a lot of parameters but also achieves a better trade-off between toxicity and perplexity than whole model adaptation for the large-scale models.

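Motivation and Goals

Study how domain-adaptive training can detoxify LMs along three dimensions: training corpus, model size, and parameter efficiency.
Show that self-generated data can outperform curated pre-training data for detoxification while being more data-efficient.
Assess the trade-off between toxicity reduction and language-model quality (perplexity and downstream utility).
Evaluate how well parameter-efficient methods (Adapter and Prefix-tuning) detoxify compared with whole-model adaptation.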

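Proposed Method

Proposes Self-Generation Enabled domain-Adaptive Training (SGEAT), which uses self-generated prompts to build a nontoxic corpus.
Generates up to 1,000 tokens per document with nucleus sampling (p = 0.9, temperature = 1), truncating at the end-of-document token.
Filters the generated data with the Perspective API, retaining about 50% of the text for training.
Fine-tunes the pre-trained LM on the filtered nontoxic corpus with the standard log-likelihood loss.
Compares SGEAT variants (standard, heuristic, augmented) against the DAPT and Jigsaw baselines, as well as against decoding-time methods.
Evaluates toxicity with the Perspective API (Expected Maximum Toxicity, Toxicity Probability) and LM quality with perplexity and zero-shot utility.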
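
The generation and filtering steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. `next_logits` is a hypothetical stand-in for a real LM's next-token scores. The toxicity scores are placeholders for Perspective API output, and no API call is made. "Retaining about 50%" is assumed to mean ranking documents by score and keeping the lower half. The review does not say whether a fixed threshold was used instead.

```python
import math
import random


def top_p_sample(logits, p=0.9, temperature=1.0, rng=random):
    """Draw one token index with nucleus (top-p) sampling."""
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Keep the smallest prefix whose cumulative probability reaches p.
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # random.choices renormalizes the nucleus weights before drawing.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


def generate_document(next_logits, eos_id, max_tokens=1000, p=0.9,
                      temperature=1.0, rng=random):
    """Sample up to max_tokens tokens, stopping at the end-of-document id.

    next_logits is a hypothetical callable standing in for a real LM.
    """
    tokens = []
    for _ in range(max_tokens):
        token = top_p_sample(next_logits(tokens), p, temperature, rng)
        if token == eos_id:
            break  # truncate at the end-of-document token
        tokens.append(token)
    return tokens


def keep_least_toxic(docs, toxicity_scores, keep_fraction=0.5):
    """Keep the least toxic fraction of docs, preserving their original order.

    toxicity_scores are placeholders for Perspective API scores (assumption).
    """
    ranked = sorted(range(len(docs)), key=lambda i: toxicity_scores[i])
    keep = sorted(ranked[:int(len(docs) * keep_fraction)])
    return [docs[i] for i in keep]


# Toy usage with a hypothetical "model" that strongly prefers token 0.
rng = random.Random(0)
doc = generate_document(lambda toks: [5.0, 0.0, 0.0], eos_id=2,
                        max_tokens=5, rng=rng)
kept = keep_least_toxic(["a", "b", "c", "d"], [0.9, 0.1, 0.5, 0.2])
```

In the toy run, token 0 alone holds more than 90% of the probability mass, so the nucleus contains only that token and `doc` is five zeros. `kept` holds the two lowest-scoring documents, "b" and "d".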
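
The evaluation quantities named above can be computed as follows. This sketch assumes the usual RealToxicityPrompts-style definitions: several sampled continuations per prompt (25 in that benchmark) and a 0.5 toxicity threshold. The review names the metrics but does not state these settings. The toxicity scores are placeholders for Perspective API output. Perplexity is taken as the exponential of the average per-token negative log-likelihood, which is the quantity the log-likelihood fine-tuning loss minimizes.

```python
import math
from statistics import mean


def expected_max_toxicity(scores_per_prompt):
    """Mean over prompts of the highest toxicity among that prompt's continuations."""
    return mean(max(scores) for scores in scores_per_prompt)


def toxicity_probability(scores_per_prompt, threshold=0.5):
    """Fraction of prompts with at least one continuation scored >= threshold."""
    return mean(1.0 if max(scores) >= threshold else 0.0
                for scores in scores_per_prompt)


def perplexity(token_log_probs):
    """exp of the average negative log-likelihood per token (natural log)."""
    return math.exp(-mean(token_log_probs))


# Placeholder scores: 2 prompts x 2 continuations (assumed; real setups use more).
scores = [[0.25, 0.75], [0.25, 0.25]]
emt = expected_max_toxicity(scores)
tp = toxicity_probability(scores)
ppl = perplexity([math.log(0.5)] * 4)
```

Here the per-prompt maxima are 0.75 and 0.25, so both the Expected Maximum Toxicity and the Toxicity Probability come out to 0.5. A model that assigns probability 0.5 to every token has a perplexity of about 2.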

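Experimental Results

Research Questions

RQ1: How does the detoxification effectiveness of domain-adaptive training change with model size (from 126M to 530B parameters)?
RQ2: How does training on self-generated data, compared with pre-training corpus data, affect detoxification efficiency and LM quality?
RQ3: Do parameter-efficient methods (Adapter, Prefix-tuning) offer a more favorable trade-off than whole-model adaptation when detoxifying large LMs?
RQ4: Can combining SGEAT with decoding-time methods achieve greater toxicity reduction without unduly hurting perplexity or task performance?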

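Key Findings

Self-generated data from SGEAT consistently outperforms the baselines at every model size, even with a training corpus one-third smaller.
When the pre-training data is held fixed, large LMs have toxicity levels similar to those of smaller models, suggesting that toxicity stems from the data rather than from model scale.
Detoxification becomes less effective as model size grows: larger models need more data or training to reach a comparable toxicity reduction.
Adapter-based domain-adaptive training gives large LMs a better toxicity-perplexity-utility trade-off than whole-model adaptation, especially as model size increases.
Prefix-tuning detoxifies less effectively than Adapter and preserves perplexity and downstream utility less well.
SGEAT combined with decoding-time methods achieves the lowest toxicity among the evaluated methods while maintaining LM quality and utility.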
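
A bottleneck adapter, the kind of module the Adapter findings refer to, can be sketched as below. The review does not describe the paper's adapter architecture, bottleneck width, or placement, so this sketch assumes the common Houlsby-style design: a down-projection, a ReLU, an up-projection, and a residual connection. Biases are omitted, and the up-projection is zero-initialized so the adapter starts as the identity map. Only the adapter's 2 × d_model × bottleneck weights would be trained while the base LM stays frozen, which is where the parameter savings come from.

```python
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) list-of-lists matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class Adapter:
    """Assumed residual bottleneck adapter: h + W_up @ relu(W_down @ h), no biases."""

    def __init__(self, d_model, bottleneck, rng=random):
        # Down-projection d_model -> bottleneck with small random weights.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(bottleneck)]
        # Zero-initialized up-projection, so the adapter is initially the identity.
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]

    def __call__(self, hidden):
        bottleneck = [max(0.0, z) for z in matvec(self.w_down, hidden)]  # ReLU
        delta = matvec(self.w_up, bottleneck)
        return [h + d for h, d in zip(hidden, delta)]  # residual connection

    def num_trainable(self):
        """Trainable weights: d_model*bottleneck (down) + bottleneck*d_model (up)."""
        return (sum(len(row) for row in self.w_down)
                + sum(len(row) for row in self.w_up))


adapter = Adapter(d_model=8, bottleneck=2, rng=random.Random(0))
hidden = [0.5] * 8
out = adapter(hidden)
```

At initialization `out` equals `hidden`, and the toy adapter has 2 × 8 × 2 = 32 trainable weights. With illustrative sizes d_model = 1024 and bottleneck = 64 (not the paper's configuration), one adapter holds 2 × 1024 × 64 = 131,072 weights.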

This review was generated by AI and reviewed by human editors.