QUICK REVIEW

[Paper Review] CODE: A Contradiction-Based Deliberation Extension Framework for Overthinking Attacks on Retrieval-Augmented Generation

Xiaolei Zhang, Xiaojun Jia|arXiv (Cornell University)|Jan 19, 2026
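One-Sentence Summary

The paper proposes CODE, a three-agent framework that poisons the external knowledge base of a retrieval-augmented generation (RAG) system to induce overthinking in commercial reasoning models while preserving answer accuracy.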

ABSTRACT

Introducing reasoning models into Retrieval-Augmented Generation (RAG) systems enhances task performance through step-by-step reasoning, logical consistency, and multi-step self-verification. However, recent studies have shown that reasoning models suffer from overthinking attacks, where models are tricked into generating an unnecessarily high number of reasoning tokens. In this paper, we reveal that such overthinking risk can be inherited by RAG systems equipped with reasoning models, by proposing an end-to-end attack framework named Contradiction-Based Deliberation Extension (CODE). Specifically, CODE develops a multi-agent architecture to construct poisoning samples that are injected into the knowledge base. These samples 1) are highly correlated with the user query, such that they can be retrieved as inputs to the reasoning model; and 2) contain contradictions between the logical and evidence layers that cause models to overthink, and are optimized to exhibit highly diverse styles. Moreover, the inference overhead of CODE is extremely difficult to detect, as no modification is needed to the user query, and the task accuracy remains unaffected. Extensive experiments on two datasets across five commercial reasoning models demonstrate that the proposed attack causes a 5.32x-24.72x increase in reasoning token consumption, without degrading task performance. Finally, we also discuss and evaluate potential countermeasures to mitigate overthinking risks.

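Motivation and Objectives

  • Enable an end-to-end attack on RAG systems by poisoning external knowledge, without touching the prompt or the model weights.
  • Design a multi-agent pipeline that constructs contradiction-based adversarial passages the system will retrieve.
  • Demonstrate, across multiple commercial models, a significant increase in reasoning cost while preserving accuracy.
  • Analyze robustness and discuss potential defenses against overthinking-based attacks.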

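Proposed Method

  • Proposes CODE, a three-agent framework consisting of a Contradiction Architect, a Conflict Weaver, and a Style Adapter, to generate adversarial passages.
  • The Contradiction Architect creates a cross-layer contradiction blueprint that ties logical constraints to conflicting evidence.
  • The Conflict Weaver turns the contradiction blueprint into fluent, retrieval-friendly adversarial text that stays semantically aligned with the query.
  • The Style Adapter performs evolutionary, style-based rewriting that maximizes reasoning-token consumption without hurting retrieval, guided by a fitness function with a soft accuracy objective.
  • Under a black-box threat model, the poisoned documents are injected into the external knowledge base and retrieved during RAG processing.
  • Evaluation covers multiple commercial reasoning models and standard numerical-reasoning QA datasets, measuring token amplification and task accuracy.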
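
The fitness-guided selection described for the Style Adapter can be illustrated with a small sketch. This summary does not give the paper's actual fitness function, so everything below is an assumption made for illustration: the `Candidate` fields, the linear penalty on accuracy shortfall, and the `target_accuracy` and `penalty_weight` parameters are all hypothetical. The sketch only shows one plausible way a "soft" accuracy term could trade off against reasoning-token count when ranking rewritten passages.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One rewritten passage plus measurements taken from a trial run (hypothetical fields)."""
    text: str
    reasoning_tokens: float  # reasoning tokens the model spent with this passage in context
    accuracy: float          # fraction of trial answers that were still correct
    retrieved: bool          # whether the retriever actually returned the passage


def fitness(c: Candidate, target_accuracy: float = 1.0,
            penalty_weight: float = 1000.0) -> float:
    """Assumed fitness: token count minus a soft penalty for falling below target accuracy."""
    if not c.retrieved:
        # A passage that is never retrieved cannot influence reasoning at all.
        return 0.0
    shortfall = max(0.0, target_accuracy - c.accuracy)
    return c.reasoning_tokens - penalty_weight * shortfall


def select_survivors(candidates: list[Candidate], k: int) -> list[Candidate]:
    """Keep the k highest-fitness candidates for the next rewriting round."""
    return sorted(candidates, key=fitness, reverse=True)[:k]


# Made-up measurements, for illustration only.
pool = [
    Candidate("style A", reasoning_tokens=2000, accuracy=1.0, retrieved=True),
    Candidate("style B", reasoning_tokens=2400, accuracy=0.5, retrieved=True),
    Candidate("style C", reasoning_tokens=9000, accuracy=1.0, retrieved=False),
]
survivors = select_survivors(pool, k=2)
```

With these made-up numbers, "style B" uses more tokens than "style A" but ranks lower because of its accuracy shortfall, and "style C" scores zero because it is never retrieved. The penalty is soft in the sense that low accuracy lowers the score rather than disqualifying the candidate outright.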

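Experimental Results

Research Questions

  • RQ1: Does poisoning external knowledge with a structured contradiction framework raise the reasoning cost of RAG systems without reducing answer accuracy?
  • RQ2: How effective is the three-agent CODE framework (Contradiction Architect, Conflict Weaver, Style Adapter) at generating retrievable, contradictory passages that inflate reasoning cost?
  • RQ3: How does adversarial style adaptation affect token-level and task-level amplification across models and datasets?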

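Key Findings

  • Adversarial reasoning cost rises sharply, with token-level amplification factors ranging from 5.32× to 24.72× across the evaluated models.
  • Task-level amplification ranges from roughly 12.70× to 43.45×, indicating deeper reasoning inflation when style adaptation is applied.
  • Across models, answer accuracy is comparable to the non-adversarial setting, indicating covert manipulation with no loss of output quality.
  • Under the tested configurations, the retrieval hit rate for adversarial passages stays at 100%, so the poisoned content reliably reaches the reasoning stage.
  • Ablations show that the Contradiction Architect and Conflict Weaver are the main drivers of amplification, while the Style Adapter adds a further but smaller gain.
  • Defenses such as prompt constraints and retrieval filtering reduce but do not fully suppress the inflation in reasoning cost.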
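
For readers who want to monitor their own RAG deployment for this kind of cost inflation, an amplification factor can be computed from logged reasoning-token counts. The sketch below assumes a simple definition: mean reasoning tokens with poisoned retrieval divided by mean reasoning tokens with clean retrieval. The summary does not state the paper's exact formula, nor how task-level amplification is defined, so treat this definition and the function names as assumptions. The numbers are invented for illustration and are not results from the paper.

```python
from statistics import mean


def amplification_factor(clean_tokens, attacked_tokens):
    """Assumed metric: mean reasoning tokens under attack / mean under clean retrieval."""
    if not clean_tokens or not attacked_tokens:
        raise ValueError("both token lists must be non-empty")
    baseline = mean(clean_tokens)
    if baseline <= 0:
        raise ValueError("clean baseline must be positive")
    return mean(attacked_tokens) / baseline


def accuracy(predictions, references):
    """Exact-match accuracy, used to check that answer quality did not change."""
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and equal length")
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


# Invented per-query reasoning-token counts, for illustration only.
clean = [400, 500, 600]
attacked = [4000, 5000, 6000]
factor = amplification_factor(clean, attacked)  # 10.0 for these made-up numbers
```

Tracking this ratio alongside accuracy matters because, per the findings above, accuracy alone stays roughly flat under attack and would not reveal the extra cost.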


This review was generated by AI and reviewed by a human editor.