QUICK REVIEW

[Paper Review] A Distributional Approach to Controlled Text Generation

Muhammad Khalifa, Hady Elsahar|arXiv (Cornell University)|Dec 21, 2020

Topic Modeling | References: 55 | Cited by: 31

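One-Sentence Summary

The paper proposes GDC (Generation with Distributional Control), a framework that unifies pointwise and distributional constraints by deriving the optimal energy-based model (EBM) that satisfies them while minimizing KL divergence from the original LM, and then trains an autoregressive policy to approximate that EBM with KL-Adaptive Distributional Policy Gradient.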

ABSTRACT

We propose a Distributional Approach for addressing Controlled Text Generation from pre-trained Language Models (LMs). This approach permits to specify, in a single formal framework, both "pointwise" and "distributional" constraints over the target LM -- to our knowledge, the first model with such generality -- while minimizing KL divergence from the initial LM distribution. The optimal target distribution is then uniquely determined as an explicit EBM (Energy-Based Model) representation. From that optimal representation we then train a target controlled Autoregressive LM through an adaptive distributional variant of Policy Gradient. We conduct a first set of experiments over pointwise constraints showing the advantages of our approach over a set of baselines, in terms of obtaining a controlled LM balancing constraint satisfaction with divergence from the initial LM. We then perform experiments over distributional constraints, a unique feature of our approach, demonstrating its potential as a remedy to the problem of Bias in Language Models. Through an ablation study, we show the effectiveness of our adaptive technique for obtaining faster convergence. (Code available at https://github.com/naver/gdc)

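Motivation and Objectives

Formalize controlled text generation as a constraint satisfaction problem over the target distribution, so that pointwise and distributional requirements can be met together.
Minimize the KL divergence between the constrained target distribution and the pretrained LM, so that language quality is preserved.
Derive the unique optimal target distribution in EBM form, and make sampling practical through a learnable autoregressive policy.
Demonstrate the potential for bias mitigation, and achieve faster convergence with an adaptive sampling technique.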

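Proposed Method

Define moment constraints as expectations of feature functions φ_i under the target distribution p.
Obtain the unique p by minimizing D_KL(p||a) subject to the moment constraints, where a is the pretrained LM; this gives p(x) ∝ a(x) exp( sum_i λ_i φ_i(x) ).
Estimate the moments of the EBM with Self-Normalized Importance Sampling (SNIS) and solve for λ by SGD (Algorithm 1).
Train an autoregressive policy π_θ to approximate p using KL-Adaptive Distributional Policy Gradient (Algorithm 2).
Decouple the process into two stages that pivot on the EBM: first specify the target as an EBM, then train a policy for efficient sampling at inference time. This makes it straightforward to monitor D_KL(p||π_θ) and D_KL(π_θ||a).
Handle pointwise, distributional, and hybrid constraints in a single framework.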
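
The λ-fitting step can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: the LM a is replaced by a toy sampler in which a binary feature φ holds for roughly 7.4% of samples, there is a single feature, and full-batch gradient descent stands in for SGD. The names `snis_moment` and `fit_lambda`, the learning rate, and the sample size are hypothetical. Because p(x)/a(x) ∝ exp(λ φ(x)), the importance weight of a sample drawn from a is simply exp(λ φ(x)).

```python
import math
import random


def snis_moment(lam, feats):
    """Self-normalized importance sampling estimate of E_p[phi].

    feats holds phi(x) for samples x drawn from the base model a.
    The unnormalized weight of each sample is exp(lam * phi(x)).
    """
    weights = [math.exp(lam * f) for f in feats]
    z = sum(weights)
    return sum(w * f for w, f in zip(weights, feats)) / z


def fit_lambda(feats, target, lr=1.0, steps=500):
    """Minimize (mu_hat(lam) - target)^2 by full-batch gradient descent."""
    lam = 0.0
    for _ in range(steps):
        weights = [math.exp(lam * f) for f in feats]
        z = sum(weights)
        mu = sum(w * f for w, f in zip(weights, feats)) / z
        second = sum(w * f * f for w, f in zip(weights, feats)) / z
        # d(mu_hat)/d(lam) is the weighted variance of phi.
        var = second - mu * mu
        lam -= lr * 2.0 * (mu - target) * var
    return lam


# Toy stand-in for sampling from a: phi(x) = 1 for about 7.4% of samples.
random.seed(0)
feats = [1.0 if random.random() < 0.074 else 0.0 for _ in range(2000)]

lam = fit_lambda(feats, target=0.5)
print("fitted lambda:", lam)
print("SNIS moment under p:", snis_moment(lam, feats))
```

For a single binary feature with empirical frequency q and target moment μ, the fitted value should approach log( μ(1 − q) / (q(1 − μ)) ), which gives a quick sanity check on the optimizer.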

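Experimental Results

Research Questions

RQ1: Can pointwise and distributional constraints be satisfied together within a single KL-minimizing framework while staying as close as possible to the original LM?
RQ2: Does the optimal target distribution take the form of a unique energy-based model that satisfies the constraints?
RQ3: Can KL-Adaptive DPG efficiently approximate the optimal distribution for autoregressive sampling?
RQ4: Compared with the baselines, how does distributional control differ in constraint satisfaction, diversity, and bias mitigation?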

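Key Findings

GDC outperforms the baselines on constraint satisfaction while keeping divergence from the pretrained GPT-2 low and preserving diversity.
In the distributional experiments, GDC mitigates bias by raising the proportion of a target demographic or topic (for example, raising the share of female biographies from 7.4% to 35.6% in one setting).
In the profession/biography experiments, GDC moves proportions toward their targets in the hybrid settings (for example, Science rises from 1.5% to 20.1% and Art from 11.4% to 88.6%; see the paper's tables for the full figures).
GDC converges more stably than the baselines, reaches a lower D_KL(p||π_θ), and in many cases produces more diverse text (lower Self-BLEU-5).
The method avoids the degeneration seen in some reinforcement learning baselines because it enforces the constraints while staying close to the original LM.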
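
The convergence behaviour behind these findings, tracked through D_KL(p||π_θ), can be illustrated with a toy version of the KL-adaptive DPG loop. This is a sketch under stated assumptions rather than the paper's code: the sequence space is replaced by six discrete outcomes, the policy is a plain softmax over logits, and the KL terms are computed exactly, whereas with a real LM they would have to be estimated from samples. The function name `kl_adaptive_dpg` and all hyperparameters are hypothetical.

```python
import math
import random


def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p, q):
    """Exact KL(p || q) for two distributions over the same finite set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_adaptive_dpg(P, a, lr=0.01, steps=8000, eval_every=200, seed=0):
    """Toy DPG: fit pi_theta to the EBM P, adapting the proposal q."""
    rng = random.Random(seed)
    z = sum(P)
    p = [v / z for v in P]            # normalized target (tractable only in this toy)
    theta = [math.log(v) for v in a]  # pi_theta is initialized to the base model a
    q = list(a)                       # the proposal also starts as a
    kl_q_history = [kl(p, q)]
    for step in range(1, steps + 1):
        x = rng.choices(range(len(q)), weights=q)[0]  # sample x from q
        pi = softmax(theta)
        w = P[x] / q[x]                               # importance weight P(x)/q(x)
        # The gradient of log softmax(theta)[x] is onehot(x) - pi.
        for i in range(len(theta)):
            theta[i] += lr * w * ((1.0 if i == x else 0.0) - pi[i])
        if step % eval_every == 0:
            pi = softmax(theta)
            if kl(p, pi) < kl(p, q):  # adaptive step: keep the better proposal
                q = pi
            kl_q_history.append(kl(p, q))
    return softmax(theta), p, kl_q_history


# Toy EBM P(x) = a(x) * exp(lam * phi(x)) with one binary feature.
a = [1.0 / 6.0] * 6
phi = [1, 1, 0, 0, 0, 0]
lam = math.log(3.0)
P = [ai * math.exp(lam * f) for ai, f in zip(a, phi)]

pi, p, history = kl_adaptive_dpg(P, a)
print("KL(p || a) at start:", history[0])
print("KL(p || pi_theta) at end:", kl(p, pi))
```

Because the proposal q is replaced only when the current policy is closer to p, the recorded KL(p||q) can never increase, which is the property the adaptive variant relies on for more stable convergence.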


This review was generated by AI and reviewed by a human editor.