QUICK REVIEW

[Paper Review] Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey

Victoria Smith, Ali Shahin Shamsabadi|arXiv (Cornell University)|Sep 27, 2023

Topic Modeling | Cited by 9

One-Sentence Summary

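This technical survey maps the privacy attack surface of language models, reviews existing attacks and mitigations across the pre-training, fine-tuning, and compression stages, and highlights unresolved problems and gaps.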

ABSTRACT

Large Language Models (LLMs) have shown greatly enhanced performance in recent years, attributed to increased size and extensive training data. This advancement has led to widespread interest and adoption across industries and the public. However, training data memorization in Machine Learning models scales with model size, particularly concerning for LLMs. Memorized text sequences have the potential to be directly leaked from LLMs, posing a serious threat to data privacy. Various techniques have been developed to attack LLMs and extract their training data. As these models continue to grow, this issue becomes increasingly critical. To help researchers and policymakers understand the state of knowledge around privacy attacks and mitigations, including where more work is needed, we present the first SoK on data privacy for LLMs. We (i) identify a taxonomy of salient dimensions where attacks differ on LLMs, (ii) systematize existing attacks, using our taxonomy of dimensions to highlight key trends, (iii) survey existing mitigation strategies, highlighting their strengths and limitations, and (iv) identify key gaps, demonstrating open problems and areas for concern.

Motivation and Goals

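Clarify how privacy attacks differ across LM stages and architectures.
Survey existing privacy attacks using a taxonomy of dimensions (attack goal, adversary knowledge, attacked stage, model type).
Evaluate mitigation strategies (pre-processing, training-time, and post-processing) and identify their strengths, limitations, and gaps.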

Proposed Approach

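Propose a taxonomy of the salient dimensions along which LM privacy attacks differ (attack goal, adversary knowledge, training stage, model architecture).
Survey and categorize existing attacks (membership inference, model inversion/attribute inference, data extraction, model extraction) in both black-box and white-box settings.
Synthesize mitigation strategies into pre-processing, training-time, and post-processing methods, and discuss their effectiveness and limitations.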
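
To illustrate the pre-processing category, the sketch below combines two mitigations the review mentions: data sanitization and deduplication. It is a minimal sketch under stated assumptions, not the survey's method. The regex patterns, the placeholder tokens `[EMAIL]`/`[PHONE]`, and the helper names `sanitize`, `dedup`, and `preprocess` are all hypothetical. Deduplication here only removes exact matches after case and whitespace normalization; real pipelines often also remove near-duplicates.

```python
import hashlib
import re

# Hypothetical patterns for two common PII types; real scrubbers cover far more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def sanitize(text: str) -> str:
    """Replace spans matching the PII patterns with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def dedup(docs: list[str]) -> list[str]:
    """Drop exact duplicates (after lowercasing and collapsing whitespace), keeping the first copy."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        normalized = " ".join(doc.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


def preprocess(docs: list[str]) -> list[str]:
    """Deduplicate the corpus, then scrub PII from each surviving document."""
    return [sanitize(d) for d in dedup(docs)]
```

For example, `preprocess(["Email alice@example.com now", "email  ALICE@example.com now", "Call 555-123-4567 today"])` keeps only the first of the two near-identical documents and returns `["Email [EMAIL] now", "Call [PHONE] today"]`. Pattern-based scrubbing misses any PII that does not match a fixed pattern. This is one reason sanitization alone is considered insufficient.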

Results

Research Questions

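RQ1: What are the key dimensions along which LM privacy attacks differ?
RQ2: What are the main families of attacks affecting language models, and how do they vary with access level and model stage?
RQ3: What privacy-preserving techniques exist, and what gaps remain in mitigating LM privacy risks?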

Key Findings

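Memorization and data leakage increase with model size and data duplication, especially for recently seen fine-tuning data.
Membership inference attacks affect supervised language models, static embeddings, and pre-trained, fine-tuned, and compressed LLMs, in both black-box and white-box settings; deduplicating the training data can reduce leakage.
Model inversion and attribute inference can reconstruct private training data or attributes, with significant risk in white-box and federated learning settings; some attacks can recover sentences or attributes from fine-tuned models.
Data extraction attacks can reveal verbatim training data even in black-box settings, particularly from pre-trained and fine-tuned LLMs.
Model extraction attacks threaten models released through APIs by replicating their functionality; an attacker can then run white-box attacks on the replica, leading to further leakage.
Mitigations include data sanitization, deduplication, differential privacy, and machine unlearning; sanitization alone is insufficient and should be combined with other methods.
Gaps remain in evaluating privacy risk across all training stages (pre-training, fine-tuning, compression) and in developing robust, scalable privacy-preserving techniques.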
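
Membership inference, listed among the findings, can be illustrated with one of the simplest attacks of this kind. The attacker predicts that a candidate text was in the training set if the model's loss on it falls below a threshold, because models tend to fit their training examples more closely. This sketch is not an attack taken from the survey. It uses a hypothetical toy add-one-smoothed unigram model (`UnigramLM`) in place of a real LM, plus a hand-picked threshold. Practical attacks calibrate the threshold, for instance against reference models or held-out data.

```python
import math
from collections import Counter


class UnigramLM:
    """Toy add-one-smoothed unigram model standing in for a real LM (hypothetical)."""

    def __init__(self, corpus: list[str]) -> None:
        self.counts = Counter(tok for doc in corpus for tok in doc.split())
        self.total = sum(self.counts.values())
        # +1 reserves one shared slot of probability mass for unseen tokens.
        self.vocab = len(self.counts) + 1

    def loss(self, text: str) -> float:
        """Mean per-token negative log-likelihood of `text`, in nats."""
        toks = text.split()
        if not toks:
            raise ValueError("empty candidate")
        denom = self.total + self.vocab
        nll = -sum(math.log((self.counts[t] + 1) / denom) for t in toks)
        return nll / len(toks)


def loss_threshold_mia(model: UnigramLM, candidates: list[str], threshold: float) -> list[bool]:
    """Predict 'member' (True) when the model's loss on a candidate is below the threshold."""
    return [model.loss(c) < threshold for c in candidates]
```

Suppose the model is trained on `["the patient alice has diabetes", "the patient bob has asthma"]`. The training sentence "the patient alice has diabetes" then has a loss of about 1.95 nats per token, and the unseen "the weather today is sunny" has a loss of about 2.67. A threshold of 2.3 therefore yields `[True, False]`. Deduplication helps against this attack because repeated sequences are fit especially closely, which lowers their loss.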

This review was generated by AI and checked by human editors.