QUICK REVIEW

[Paper Review] When Large Language Models are More Persuasive Than Incentivized Humans, and Why

Philipp Schoenegger, Fabrizio Salvi | ArXiv.org | 2025. 05. 14.

Misinformation and Its Impacts | Citations: 4

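One-Line Summary

This paper compares two LLMs (Claude 3.5 Sonnet and DeepSeek v3) against incentivized humans in a real-time persuasion task. It finds that the LLMs are often more persuasive, and that their effect on quiz accuracy depends on whether the targeted answer is correct and on which model is used.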

ABSTRACT

Large Language Models (LLMs) have been shown to be highly persuasive, but when and why they outperform humans is still an open question. We compare the persuasiveness of two LLMs (Claude 3.5 Sonnet and DeepSeek v3) against humans who had incentives to persuade, using an interactive, real-time conversational setting. We demonstrate that LLMs' persuasive superiority is context-dependent: it depends on whether the persuasion attempt is truthful (towards the right answer) or deceptive (towards the wrong answer) and on the LLM model, and wanes over repeated interactions (unlike human persuasiveness). In our first large-scale experiment, humans vs LLMs (Claude 3.5 Sonnet) interacted with other humans who were completing an online quiz for a reward, attempting to persuade them toward a given (either correct or incorrect) answer. Claude was more persuasive than incentivized human persuaders both in truthful and deceptive contexts and it significantly increased accuracy if persuasion was truthful, but decreased it if persuasion was deceptive. In a follow-up experiment with DeepSeek v3, we replicated the findings about accuracy but found greater LLM persuasiveness only if the persuasion was deceptive. Linguistic analyses of the persuaders' texts suggest that these effects may be due to LLMs expressing higher conviction than humans.

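Research Motivation and Objectives

Investigate when LLMs outperform incentivized humans in persuasion tasks.
Analyze how truthful versus deceptive persuasion affects outcomes.
Compare the persuasiveness of two LLMs (Claude 3.5 Sonnet and DeepSeek v3).
Analyze the linguistic features associated with persuasiveness and conviction.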

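Proposed Method

Run a conversational experiment in which persuaders interact in real time with online quiz participants, steering them toward either a correct or an incorrect answer.
Compare the performance of Claude 3.5 Sonnet and DeepSeek v3 against incentivized human persuaders.
Evaluate accuracy outcomes in truthful and deceptive persuasion contexts.
Analyze the persuaders' texts linguistically to identify conviction and other features.
Test robustness by repeating the experiment with a second LLM (DeepSeek v3) after the initial Claude 3.5 Sonnet study.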
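
The comparison described above can be illustrated with a minimal sketch. It assumes each quiz trial is recorded with the persuader type, the condition (truthful or deceptive), whether the participant chose the persuader's target answer, and whether the participant answered correctly. The record layout, the `summarize` function, and all data values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical trial records, NOT data from the paper:
# (persuader, condition, followed_target, answered_correctly)
TRIALS = [
    ("llm", "truthful", True, True),
    ("llm", "truthful", True, True),
    ("llm", "deceptive", True, False),
    ("llm", "deceptive", False, True),
    ("human", "truthful", True, True),
    ("human", "truthful", False, False),
    ("human", "deceptive", False, True),
    ("human", "deceptive", False, True),
]


def summarize(trials):
    """Return {(persuader, condition): (compliance_rate, accuracy)}."""
    # Per cell: [number of trials, times target was followed, times correct]
    counts = defaultdict(lambda: [0, 0, 0])
    for persuader, condition, followed, correct in trials:
        cell = counts[(persuader, condition)]
        cell[0] += 1
        cell[1] += int(followed)
        cell[2] += int(correct)
    return {key: (f / n, c / n) for key, (n, f, c) in counts.items()}


if __name__ == "__main__":
    for (persuader, condition), (compliance, accuracy) in sorted(summarize(TRIALS).items()):
        print(f"{persuader:5s} {condition:9s} compliance={compliance:.2f} accuracy={accuracy:.2f}")
```

In this framing, persuasiveness is the compliance rate within a condition, while accuracy moves in opposite directions depending on the condition: following a truthful persuader raises it, and following a deceptive one lowers it.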

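Experimental Results

Research Questions

RQ1: Are LLMs more persuasive than incentivized humans in both truthful and deceptive contexts?
RQ2: Does persuasiveness depend on the specific LLM used?
RQ3: Does persuasiveness change over repeated interactions?
RQ4: Which linguistic characteristics of LLM persuasion are associated with higher conviction and effectiveness?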

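Key Findings

In the first experiment, Claude outperforms incentivized humans in both truthful and deceptive persuasion contexts.
Claude increases accuracy when persuasion is truthful and decreases it when persuasion is deceptive.
The follow-up experiment with DeepSeek v3 replicates the accuracy findings, but the LLM is more persuasive than humans only in the deceptive context.
Linguistic analysis suggests that LLMs tend to express higher conviction than humans, which may account for the observed effects.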

This review was generated by AI and reviewed by a human editor.