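[Paper Summary] Memory poisoning and secure multi-agent systems
This paper analyzes memory poisoning in agentic AI and multi-agent systems. It classifies memory types (semantic, episodic, short-term), assesses the poisoning risk for each, and proposes cryptography-based and privacy-preserving mitigation strategies, including a Prolog-like private inference prototype.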
Memory poisoning attacks for Agentic AI and multi-agent systems (MAS) have recently caught attention. It is partially due to the fact that Large Language Models (LLMs) facilitate the construction and deployment of agents. Different memory systems are being used nowadays in this context, including semantic, episodic, and short-term memory. This distinction between the different types of memory systems focuses mostly on their duration but also on their origin and their localization. It ranges from the short-term memory originated at the user's end localized in the different agents to the long-term consolidated memory localized in well established knowledge databases. In this paper, we first present the main types of memory systems, we then discuss the feasibility of memory poisoning attacks in these different types of memory systems, and we propose mitigation strategies. We review the already existing security solutions to mitigate some of the alleged attacks, and we discuss adapted solutions based on cryptography. We propose to implement local inference based on private knowledge retrieval as an example of mitigation strategy for memory poisoning for semantic memory. We also emphasize actual risks in relation to interactions between agents, which can cause memory poisoning. These latter risks are not so much studied in the literature and are difficult to formalize and solve. Thus, we contribute to the construction of agents that are secure by design.
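Research motivation and goals
- Classify the memory systems used in agents and MAS, and identify where poisoning can occur (semantic, episodic, short-term).
- Assess the feasibility and impact of memory poisoning for each memory type.
- Develop mitigation strategies based on cryptography, provenance tracking, and privacy-preserving retrieval and inference.
- Propose and demonstrate a practical implementation that mitigates semantic memory poisoning.
- Highlight open challenges in secure-by-design agents and in memory interaction risks within MAS.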
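Proposed method
- Review the memory types used in MAS and define semantic, episodic, and short-term memory, together with the consolidation links between them.
- Model semantic memory poisoning attacks in terms of poisoned knowledge-base data and the resulting retrieval outputs (formulas and logs).
- Propose mitigation through secure memory mechanisms (hashes, signatures) and secure updates that carry a provenance structure.
- Explore approaches inspired by private information retrieval (PIR) for accessing untrusted knowledge bases without revealing the query.
- Implement a local, Prolog-like Horn-clause inference engine that uses private knowledge retrieval from two non-cooperating knowledge bases, plus a lighter single-server PIR variant based on k-anonymity.
- Discuss the limitations of private inference and the role of trust and reputation mechanisms for untrusted agents.
- Extend the mitigations to episodic and short-term memory, including secure updates, provenance tracking, and safeguards against malicious interactions.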
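The hash, signature, and provenance ideas listed above can be illustrated with a small sketch. It is not the paper's implementation: the class name `SignedMemory`, the entry fields, and the demo key are hypothetical, and an HMAC stands in for a digital signature (a multi-party deployment would more likely need asymmetric signatures). Each entry records its source and is chained to the previous entry's tag, so editing or reordering stored facts becomes detectable.

```python
import hashlib
import hmac
import json


def _canonical(body: dict) -> bytes:
    # Canonical JSON so the same entry always serializes identically.
    return json.dumps(body, sort_keys=True).encode("utf-8")


class SignedMemory:
    """Append-only memory log (hypothetical design, for illustration only)."""

    def __init__(self, key: bytes):
        self._key = key
        self.entries = []

    def _tag(self, body: dict) -> str:
        return hmac.new(self._key, _canonical(body), hashlib.sha256).hexdigest()

    def append(self, fact: str, source: str) -> dict:
        # Chain each entry to the tag of the previous one.
        prev = self.entries[-1]["tag"] if self.entries else ""
        body = {"fact": fact, "source": source, "prev": prev}
        entry = dict(body, tag=self._tag(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every tag and check that the chain is unbroken.
        prev = ""
        for e in self.entries:
            body = {"fact": e["fact"], "source": e["source"], "prev": e["prev"]}
            if e["prev"] != prev:
                return False
            if not hmac.compare_digest(self._tag(body), e["tag"]):
                return False
            prev = e["tag"]
        return True


mem = SignedMemory(key=b"demo-key-not-for-production")
mem.append("paris is the capital of france", source="kb:geo")
mem.append("user prefers metric units", source="agent:planner")
ok_before = mem.verify()

# Simulated poisoning: an attacker rewrites a stored fact in place.
mem.entries[0]["fact"] = "lyon is the capital of france"
ok_after = mem.verify()
print(ok_before, ok_after)  # prints: True False
```

The sketch only detects tampering by parties who do not hold the key. It does not judge whether a correctly signed fact is true, which is where the provenance field and the trust mechanisms discussed in the summary would come in.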
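Experimental results
Research questions
- RQ1: What are the main memory systems in MAS and agentic AI, and how do poisoning attacks manifest in semantic, episodic, and short-term memory?
- RQ2: How can cryptographic and privacy-preserving techniques mitigate memory poisoning, especially its effect on semantic memory?
- RQ3: Which practical implementations can demonstrate secure memory updates, provenance tracking, and private retrieval and inference in MAS?
- RQ4: What limitations and open challenges remain in mitigating episodic and short-term memory poisoning and in securing agent interactions?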
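Key findings
- Semantic memory poisoning can alter factual knowledge and affect future decisions.
- Secure memory mechanisms, knowledge-base updates that carry provenance, and secure communication can reduce the risk of poisoning.
- Private knowledge retrieval and private inference can reduce reliance on untrusted knowledge bases.
- A Python-based, Prolog-like inference prototype demonstrates private retrieval from two knowledge bases; a lighter single-server PIR variant based on k-anonymity is also implemented.
- Episodic and short-term memory poisoning calls for secure updates, provenance tracking, and safeguards against malicious interactions; trust and reputation mechanisms are recommended.
- The work identifies two key open research problems: secure episodic memory updates and memory manipulation through agent interactions.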
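To make the private-retrieval finding concrete, here is a minimal sketch under stated assumptions. It uses the classic two-server XOR scheme for private information retrieval, which may differ from the protocol in the paper, and a propositional (variable-free) forward-chaining loop in place of a full Prolog-like engine. The names `KBServer`, `private_fetch`, and `forward_chain` are hypothetical, and the agent is assumed to already know which record indices it needs. As long as the two servers do not collude, neither one learns the requested index.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class KBServer:
    """Hypothetical replica of a knowledge base holding fixed-width records."""

    def __init__(self, records, width=64):
        if any(len(r.encode("utf-8")) > width for r in records):
            raise ValueError("record longer than the fixed width")
        self.width = width
        self.records = [r.encode("utf-8").ljust(width, b"\0") for r in records]

    def answer(self, mask):
        # XOR together every record selected by the mask.
        out = bytes(self.width)
        for bit, rec in zip(mask, self.records):
            if bit:
                out = xor_bytes(out, rec)
        return out


def private_fetch(index, server_a, server_b):
    # Server A gets a uniformly random mask; server B gets the same mask
    # with one bit flipped. Each mask alone reveals nothing about `index`.
    n = len(server_a.records)
    mask_a = [secrets.randbelow(2) for _ in range(n)]
    mask_b = list(mask_a)
    mask_b[index] ^= 1
    block = xor_bytes(server_a.answer(mask_a), server_b.answer(mask_b))
    return block.rstrip(b"\0").decode("utf-8")


def parse_clause(text):
    # "head :- g1, g2" becomes (head, [g1, g2]); a bare fact has no body.
    head, _, body = text.partition(":-")
    return head.strip(), [g.strip() for g in body.split(",") if g.strip()]


def forward_chain(clauses):
    # Local inference: apply rules until no new atom can be derived.
    known = {head for head, body in clauses if not body}
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if body and head not in known and all(g in known for g in body):
                known.add(head)
                changed = True
    return known


kb = ["bird", "healthy", "can_fly :- bird, healthy", "fish", "migrates :- can_fly"]
a, b = KBServer(kb), KBServer(kb)
needed = [0, 1, 2, 4]  # assumed to be known to the agent, e.g. from a public index
clauses = [parse_clause(private_fetch(i, a, b)) for i in needed]
derived = forward_chain(clauses)
print(sorted(derived))  # prints: ['bird', 'can_fly', 'healthy', 'migrates']
```

Because inference runs locally on the fetched clauses, the servers see neither the query nor the conclusions. The scheme hides what is retrieved but does not by itself guarantee that the retrieved clauses are correct, so it complements rather than replaces integrity checks.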
This summary was generated by AI and reviewed by a human editor.