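[Paper Walkthrough] SoK: Memorization in General-Purpose Large Language Models
This survey proposes a taxonomy of memorization types in large language models, analyzes their implications for performance, privacy, security, copyright, and auditing, and discusses detection and mitigation strategies for verbatim text, facts, ideas and algorithms, writing styles, distributional properties, and alignment goals.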
Large Language Models (LLMs) are advancing at a remarkable pace, with myriad applications under development. Unlike most earlier machine learning models, they are no longer built for one specific application but are designed to excel in a wide range of tasks. A major part of this success is due to their huge training datasets and the unprecedented number of model parameters, which allow them to memorize large amounts of information contained in the training data. This memorization goes beyond mere language, and encompasses information only present in a few documents. This is often desirable since it is necessary for performing tasks such as question answering, and therefore an important part of learning, but also brings a whole array of issues, from privacy and security to copyright and beyond. LLMs can memorize short secrets in the training data, but can also memorize concepts like facts or writing styles that can be expressed in text in many different ways. We propose a taxonomy for memorization in LLMs that covers verbatim text, facts, ideas and algorithms, writing styles, distributional properties, and alignment goals. We describe the implications of each type of memorization - both positive and negative - for model performance, privacy, security and confidentiality, copyright, and auditing, and ways to detect and prevent memorization. We further highlight the challenges that arise from the predominant way of defining memorization with respect to model behavior instead of model weights, due to LLM-specific phenomena such as reasoning capabilities or differences between decoding algorithms. Throughout the paper, we describe potential risks and opportunities arising from memorization in LLMs that we hope will motivate new research directions.
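Motivation and Goals
- Provide a comprehensive taxonomy of memorization in large language models that covers multiple types of information.
- Discuss the implications of memorization for model performance, privacy, security, copyright, and auditing.
- Identify the challenges in defining and measuring memorization, and outline approaches to detection and mitigation.
- Highlight open problems and research directions to advance the understanding and governance of memorization in LLMs.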
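Proposed Approach
- Propose a taxonomy of memorization types in LLMs covering verbatim text, facts, ideas and algorithms, writing styles, properties of the training distribution, and alignment goals.
- Review and synthesize literature from LLM research, machine learning, privacy, security, and law, connecting memorization to performance, privacy, security, copyright, and auditing.
- Discuss the definition, detection methods, and mitigation measures for each type of memorization.
- Contrast memorization with hallucination and reasoning to clarify whether an output stems from memorized content or from generalization.
- Emphasize measurement challenges, such as inference attacks and distribution inference, and how they affect memorization research.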
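Results
Research Questions
- RQ1: What different types of information do LLMs memorize, and how can each be defined and detected?
- RQ2: How does each type of memorization affect model performance, privacy, security, copyright, and auditing?
- RQ3: Given the challenges posed by prompting, decoding, and model behavior, how can memorization be measured, mitigated, and governed in practice?
- RQ4: Looking at memorization beyond verbatim text, which research directions remain to be explored?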
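Key Findings
- Verbatim text memorization is common, ranging from entire documents to short sequences and paraphrased versions; the challenges of detecting and mitigating it are tied to decoding and prompting.
- Memorized facts, including world knowledge, domain knowledge, and personally identifiable information (PII), can be studied through tuples, KaRR-style metrics, and counterfactual memorization; they affect both knowledge correctness and privacy.
- Memorization of ideas, algorithms, and writing styles aids generalization and transfer, but can also lead to the reproduction of harmful content, loss of context, or copyright issues.
- Memorization related to training-distribution properties and alignment goals affects learning effectiveness, bias, and safety, as well as the potential leakage of human preferences or labels.
- The paper stresses the difficulty of distinguishing memorization, reasoning, and hallucination, and advocates auditing methods that can reveal dataset contamination and weaknesses in model safety.
- Multiple detection and prevention strategies are discussed, including deduplication, clipping, semantic-level training objectives, canaries in benchmarks, and post-processing safeguards.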
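To make two of the strategies listed above more concrete, here is a minimal Python sketch of verbatim-overlap detection and exact deduplication. It is an illustration under stated assumptions, not the paper's method: the function names (`ngrams`, `verbatim_overlap`, `deduplicate`), whitespace tokenization, and the default n-gram length of 5 are all hypothetical choices made for this example.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(output: str, corpus: Iterable[str], n: int = 5) -> float:
    """Fraction of the output's word n-grams that also occur in the corpus.

    Assumption: whitespace tokenization and case-sensitive matching.
    A high value hints at verbatim reproduction; it is not proof.
    """
    out_grams = ngrams(output.split(), n)
    if not out_grams:
        return 0.0  # output shorter than n tokens: nothing to compare
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in corpus:
        corpus_grams |= ngrams(doc.split(), n)
    return len(out_grams & corpus_grams) / len(out_grams)


def deduplicate(corpus: Iterable[str]) -> List[str]:
    """Exact deduplication after lowercasing and collapsing whitespace.

    Keeps the first occurrence of each normalized document.
    """
    seen: Set[str] = set()
    unique: List[str] = []
    for doc in corpus:
        key = " ".join(doc.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


corpus = [
    "the quick brown fox jumps over the lazy dog",
    "The quick  brown fox jumps over the lazy dog",  # near-identical copy
    "hello world",
]
deduped = deduplicate(corpus)
score = verbatim_overlap("the quick brown fox jumps over the lazy dog", corpus)
```

A check like this only catches exact n-gram matches, so it would miss the paraphrased versions mentioned in the findings; near-duplicate or semantic matching would be needed for those.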
This walkthrough was generated by AI and reviewed by human editors.