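[Paper Summary] Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews
Compared with a survey or a pre-scripted chatbot, interacting with a generative chatbot significantly increases immediate false memories and confidence in them, and the effect persists after one week.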
This study examines the impact of AI on human false memories -- recollections of events that did not occur or deviate from actual occurrences. It explores false memory induction through suggestive questioning in Human-AI interactions, simulating crime witness interviews. Four conditions were tested: control, survey-based, pre-scripted chatbot, and generative chatbot using a large language model (LLM). Participants (N=200) watched a crime video, then interacted with their assigned AI interviewer or survey, answering questions including five misleading ones. False memories were assessed immediately and after one week. Results show the generative chatbot condition significantly increased false memory formation, inducing over 3 times more immediate false memories than the control and 1.7 times more than the survey method. 36.4% of users' responses to the generative chatbot were misled through the interaction. After one week, the number of false memories induced by generative chatbots remained constant. However, confidence in these false memories remained higher than the control after one week. Moderating factors were explored: users who were less familiar with chatbots but more familiar with AI technology, and more interested in crime investigations, were more susceptible to false memories. These findings highlight the potential risks of using advanced AI in sensitive contexts, like police interviews, emphasizing the need for ethical considerations.
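Research Motivation and Objectives
- Investigate how AI-mediated questioning affects false memory formation in a witness-like setting.
- Compare the effect of four interaction modes (control, survey-based, pre-scripted chatbot, generative chatbot) on false memories.
- Examine the induced false memories immediately and after one week, including how persistent they are and how confident participants are in them.
- Identify individual factors that affect susceptibility to AI-induced false memories.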
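Proposed Method
- A two-phase experiment in which 200 participants were randomly assigned to four conditions.
- Participants watched a crime video, then answered 25 questions, five of which were misleading.
- Phase 1 used the control, survey-based, pre-scripted chatbot, or generative chatbot condition.
- Phase 2, one week later, reassessed memory and confidence to measure persistence.
- False memories were quantified from immediate and one-week recall; confidence was measured for both false and true memories.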
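The assignment and scoring steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: equal group sizes, the question ids, and the `endorsed` response encoding are all assumptions made for the example.

```python
import random
from collections import Counter

CONDITIONS = ["control", "survey", "pre_scripted_chatbot", "generative_chatbot"]


def assign_conditions(n_participants, seed=0):
    """Randomly assign participants to conditions.

    Equal group sizes are an assumption; the summary only states that
    assignment was random.
    """
    slots = [CONDITIONS[i % len(CONDITIONS)] for i in range(n_participants)]
    rng = random.Random(seed)
    rng.shuffle(slots)
    return slots


def count_false_memories(endorsed, misleading_ids):
    """Count misleading questions whose suggested detail was accepted.

    `endorsed` maps question id -> True if the participant accepted the
    suggested detail (a hypothetical encoding).
    """
    return sum(1 for q in misleading_ids if endorsed.get(q, False))


assignment = assign_conditions(200)
group_sizes = Counter(assignment)

# Hypothetical participant: 25 questions with ids 1..25; which five are
# misleading is invented here for illustration.
misleading_ids = [21, 22, 23, 24, 25]
endorsed = {q: False for q in range(1, 26)}
endorsed[21] = True
endorsed[24] = True
score = count_false_memories(endorsed, misleading_ids)
print(group_sizes, score)
```

Under this encoding a participant's false memory score ranges from 0 to 5, and running the same count on the one-week responses would give the persistence measure.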

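Experimental Results
Research Questions
- RQ1: How do different AI interaction modes affect false memory formation in a witness-interview-like setting?
- RQ2: Is a generative chatbot more effective at inducing false memories than a survey or a pre-scripted chatbot?
- RQ3: Which user factors affect susceptibility to AI-induced false memories?
- RQ4: Do AI-induced false memories persist after one week, and how does confidence change over time?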
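Key Findings
- The generative chatbot produced significantly more immediate false memories than the survey-based and pre-scripted chatbot conditions (means: control 0.54, survey 1.08, pre-scripted 1.34, generative 1.82).
- In the generative chatbot condition, 36.4% of responses were misled immediately.
- All interventions increased immediate false memories and confidence relative to the control, with the generative chatbot yielding the highest confidence in false memories.
- After one week, false memories induced by the generative chatbot remained essentially unchanged (36.4% immediately vs. 36.8% after one week), unlike the control and survey conditions, where false memories increased.
- Moderating factors: lower familiarity with chatbots, higher familiarity with AI technology, and greater interest in crime investigations increased susceptibility to AI-induced false memories.
- After one week, confidence in false memories induced by the generative chatbot was higher than in the control and pre-scripted conditions.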
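As a quick consistency check, the reported condition means can be related to the ratios quoted in the abstract. The sketch below assumes the 36.4% figure is the generative mean divided by the five misleading questions; that reading is an inference from the numbers, not something the summary states.

```python
# Mean immediate false memories per condition, as reported in the findings.
means = {
    "control": 0.54,
    "survey": 1.08,
    "pre_scripted": 1.34,
    "generative": 1.82,
}
N_MISLEADING = 5  # misleading questions per participant

ratio_vs_control = means["generative"] / means["control"]
ratio_vs_survey = means["generative"] / means["survey"]
# Assumption: share of misled responses = mean count / number of misleading questions.
misled_share = means["generative"] / N_MISLEADING

print(f"generative vs. control: {ratio_vs_control:.2f}x")  # 3.37x
print(f"generative vs. survey:  {ratio_vs_survey:.2f}x")   # 1.69x
print(f"share misled:           {misled_share:.1%}")       # 36.4%
```

These values line up with the abstract's "over 3 times" and "1.7 times" statements.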

This summary was generated by AI and reviewed by a human editor.