QUICK REVIEW

[Paper Review] When Large Language Models are More Persuasive Than Incentivized Humans, and Why

Philipp Schoenegger, Fabrizio Salvi|ArXiv.org|May 14, 2025
Misinformation and Its Impacts|Cited by 4
One-Sentence Summary

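This paper compares two large language models (Claude 3.5 Sonnet and DeepSeek v3) with incentivized humans in a real-time persuasion task. The models were generally more persuasive, but the effect depended on whether the target answer was true and on which model was used; persuasion raised quiz accuracy when it was truthful and lowered it when it was deceptive.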

ABSTRACT

Large Language Models (LLMs) have been shown to be highly persuasive, but when and why they outperform humans is still an open question. We compare the persuasiveness of two LLMs (Claude 3.5 Sonnet and DeepSeek v3) against humans who had incentives to persuade, using an interactive, real-time conversational setting. We demonstrate that LLMs' persuasive superiority is context-dependent: it depends on whether the persuasion attempt is truthful (towards the right answer) or deceptive (towards the wrong answer) and on the LLM model, and wanes over repeated interactions (unlike human persuasiveness). In our first large-scale experiment, humans vs LLMs (Claude 3.5 Sonnet) interacted with other humans who were completing an online quiz for a reward, attempting to persuade them toward a given (either correct or incorrect) answer. Claude was more persuasive than incentivized human persuaders both in truthful and deceptive contexts and it significantly increased accuracy if persuasion was truthful, but decreased it if persuasion was deceptive. In a follow-up experiment with DeepSeek v3, we replicated the findings about accuracy but found greater LLM persuasiveness only if the persuasion was deceptive. Linguistic analyses of the persuaders' texts suggest that these effects may be due to LLMs expressing higher conviction than humans.

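Research Motivation and Goals

  • Investigate when LLMs outperform incentivized humans in persuasion tasks.
  • Test how truthful versus deceptive persuasion affects outcomes.
  • Compare the persuasiveness of two LLMs (Claude 3.5 Sonnet and DeepSeek v3).
  • Analyze the linguistic features that underlie persuasiveness and expressed conviction.

Proposed Method

  • Run interactive, real-time conversational experiments in which persuaders try to steer online quiz takers toward a given correct or incorrect answer.
  • Compare Claude 3.5 Sonnet and DeepSeek v3 against incentivized human persuaders.
  • Measure accuracy outcomes under truthful and deceptive persuasion.
  • Apply linguistic analysis to the persuaders' texts to identify features such as expressed conviction.
  • Repeat the experiment with a second LLM to test the robustness of the findings.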
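
The comparison described in the steps above can be illustrated with a minimal sketch. It assumes each quiz trial records who the persuader was, whether the persuasion was truthful or deceptive, the answer argued for, the correct answer, and the answer finally chosen. The `Trial` fields, the `summarize` function, and the toy records are all hypothetical and are not the authors' data or analysis code; the sketch only shows how a compliance rate (how often the quiz taker picked the persuader's target) and an accuracy rate could be tabulated per group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    persuader: str  # hypothetical label: "human" or "llm"
    condition: str  # hypothetical label: "truthful" or "deceptive"
    target: str     # answer the persuader argued for
    correct: str    # correct quiz answer
    chosen: str     # answer the quiz taker finally gave


def summarize(trials):
    """Return {(persuader, condition): (compliance_rate, accuracy)}."""
    counts = defaultdict(lambda: [0, 0, 0])  # [n, complied, accurate]
    for t in trials:
        c = counts[(t.persuader, t.condition)]
        c[0] += 1
        c[1] += t.chosen == t.target   # quiz taker followed the persuader
        c[2] += t.chosen == t.correct  # quiz taker answered correctly
    return {k: (c[1] / c[0], c[2] / c[0]) for k, c in counts.items()}


# Toy records invented only to exercise the function; not data from the paper.
toy = [
    Trial("llm", "truthful", "A", "A", "A"),
    Trial("llm", "deceptive", "B", "A", "B"),
    Trial("human", "truthful", "A", "A", "B"),
    Trial("human", "deceptive", "B", "A", "A"),
]
result = summarize(toy)
```

In a truthful trial the target equals the correct answer, so compliance and accuracy move together; in a deceptive trial they move in opposite directions, which is why the summary reports both measures separately.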

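Experimental Results

Research Questions

  • RQ1: Do LLMs persuade more effectively than incentivized humans in truthful and deceptive contexts?
  • RQ2: Does persuasiveness depend on the specific LLM?
  • RQ3: Does repeated interaction change persuasiveness?
  • RQ4: Which linguistic features of LLM persuasion are associated with higher expressed conviction and effectiveness?

Main Findings

  • In the first experiment, Claude outperformed incentivized humans in both truthful and deceptive persuasion.
  • Claude increased quiz takers' accuracy when persuasion was truthful and decreased it when persuasion was deceptive.
  • In the follow-up DeepSeek v3 experiment, the accuracy findings replicated, but the LLM was more persuasive than humans only when persuasion was deceptive.
  • Linguistic analysis suggests that LLMs express higher conviction than humans, which may drive the observed effects.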

This review was generated by AI and reviewed by human editors.