[Paper Walkthrough] Recovering Private Text in Federated Learning of Language Models
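The paper proposes FILM, a text-specific gradient inversion attack on federated learning of language models that can recover single or multiple sentences from batches of up to 128 sentences. It also proposes freezing word embeddings as a defense, which offers a better privacy-utility trade-off when fine-tuning a public pre-trained model.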
Federated learning allows distributed users to collaboratively train a model while keeping each user's data private. Recently, a growing body of work has demonstrated that an eavesdropping attacker can effectively recover image data from gradients transmitted during federated learning. However, little progress has been made in recovering text data. In this paper, we present a novel attack method FILM for federated learning of language models (LMs). For the first time, we show the feasibility of recovering text from large batch sizes of up to 128 sentences. Unlike image-recovery methods that are optimized to match gradients, we take a distinct approach that first identifies a set of words from gradients and then directly reconstructs sentences based on beam search and a prior-based reordering strategy. We conduct the FILM attack on several large-scale datasets and show that it can successfully reconstruct single sentences with high fidelity for large batch sizes and even multiple sentences if applied iteratively. We evaluate three defense methods: gradient pruning, DPSGD, and a simple approach to freeze word embeddings that we propose. We show that both gradient pruning and DPSGD lead to a significant drop in utility. However, if we fine-tune a public pre-trained LM on private text without updating word embeddings, it can effectively defend the attack with minimal data utility loss. Together, we hope that our results can encourage the community to rethink the privacy concerns of LM training and its standard practices in the future.
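Research Motivation and Goals
- Raise awareness of privacy concerns in federated learning of language models and demonstrate that private text can be recovered from gradients.
- Develop a text-focused attack (FILM) that uses embedding gradients to recover words and reconstruct sentences.
- Evaluate the attack on large-scale language modeling datasets and assess defense methods.
- Propose a simple embedding-freezing defense and analyze the privacy-utility trade-off under different training settings.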
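Proposed Method
- Extract a bag of words from the word-embedding gradients to identify candidate words in the private batch.
- Reconstruct sentences from the word set using beam search guided by a language model that is either pre-trained or has memorized the training text.
- Apply a prior-based reordering step that combines perplexity and gradient norm to refine the recovered sentences.
- Apply the attack iteratively to the same batch to recover multiple sentences.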
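The first two steps above can be illustrated with a toy sketch. It is not the paper's implementation: it assumes the observed gradient comes from an input embedding lookup only, so rows for words absent from the batch stay exactly zero, and it replaces the GPT-2 prior with a hypothetical hand-written bigram table. The names `embedding_gradient`, `recover_bag_of_words`, `beam_search`, and `log_prob` are illustrative and do not come from the paper.

```python
import math
import random


def embedding_gradient(batch, vocab_size, dim, seed=0):
    """Toy embedding-lookup gradient: only rows of tokens in the batch are updated."""
    rng = random.Random(seed)
    grad = [[0.0] * dim for _ in range(vocab_size)]
    for sentence in batch:
        for token_id in sentence:
            for j in range(dim):
                # Random stand-in for the upstream gradient scattered into this row.
                grad[token_id][j] += rng.gauss(0.0, 1.0)
    return grad


def recover_bag_of_words(grad, tol=1e-12):
    """Return ids of tokens whose embedding-gradient row has a non-zero norm."""
    return {i for i, row in enumerate(grad)
            if math.sqrt(sum(g * g for g in row)) > tol}


def beam_search(bag, log_prob, start, length, beam_width=3):
    """Order words from `bag` to maximize a summed bigram log-probability."""
    beams = [([start], 0.0)]
    for _ in range(length - 1):
        candidates = []
        for seq, score in beams:
            for tok in bag:
                if tok in seq:  # simplification: each word is used at most once
                    continue
                candidates.append((seq + [tok], score + log_prob(seq[-1], tok)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]


vocab = ["the", "cat", "sat", "down", "dog", "ran"]
batch = [[0, 1, 2, 3]]  # private sentence: "the cat sat down"

grad = embedding_gradient(batch, len(vocab), dim=4)
bag = sorted(vocab[i] for i in recover_bag_of_words(grad))

# Hypothetical bigram prior standing in for a pre-trained language model.
bigram = {("the", "cat"): -0.1, ("cat", "sat"): -0.2, ("sat", "down"): -0.3}


def log_prob(prev, nxt):
    return bigram.get((prev, nxt), -5.0)


sentence = beam_search(bag, log_prob, start="the", length=4)
print(bag, sentence)
```

The sketch stops after beam search. The reordering step described above would additionally rescore candidate sentences using perplexity and gradient norm, which requires a real model and is omitted here.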
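Experimental Results
Research Questions
- RQ1: In federated training of language models, can an eavesdropper recover private text from gradients at batch sizes of up to 128 sentences?
- RQ2: How effective is an attack that exploits embedding gradients and language priors at reconstructing sentences from a private batch?
- RQ3: Which defenses mitigate this leakage without excessive utility loss, and how do they perform for publicly pre-trained versus randomly initialized language models?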
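Key Findings
- FILM recovers single sentences with high fidelity from batches of up to 128 sentences and, when applied iteratively, recovers some of the remaining sentences as well.
- The attack is more effective when training starts from a pre-trained language model, and it grows stronger as training progresses because the model memorizes the private text.
- Gradient pruning and DPSGD significantly reduce utility, whereas starting from a public LM and freezing the word embeddings defends effectively against FILM with little utility loss.
- Training from scratch on private text incurs a higher utility loss than starting from a public LM and freezing the embeddings.
- The method works on WikiText-103 and Enron Email with GPT-2 as the base model, indicating a practical privacy risk in real-world federated language model settings.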
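The embedding-freezing defense can be pictured as the client simply never transmitting a gradient for the word-embedding table, which removes the signal the bag-of-words step relies on. The following is a minimal sketch under that assumption; the parameter names (`embedding.weight`, `block0.attn.weight`, `lm_head.bias`) and the helper `client_update` are hypothetical and not taken from the paper.

```python
def client_update(gradients, frozen_prefixes=("embedding",)):
    """Return only the gradients of parameters that are still being trained."""
    return {name: g for name, g in gradients.items()
            if not name.startswith(frozen_prefixes)}


# Hypothetical per-parameter gradients computed on a private batch.
gradients = {
    "embedding.weight": [[0.0, 0.3], [0.0, 0.0]],  # non-zero rows would reveal words
    "block0.attn.weight": [[0.1]],
    "lm_head.bias": [0.2],
}

shared = client_update(gradients)
print(sorted(shared))
```

In a real training framework the same effect is usually obtained by marking the embedding parameters as non-trainable before fine-tuning the public pre-trained model, so that no embedding gradient is computed at all.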
This walkthrough was generated by AI and reviewed by a human editor.