QUICK REVIEW

[Paper Review] Talk, Listen, Connect: How Humans and AI Evaluate Empathy in Responses to Emotionally Charged Narratives

Mahnaz Roshanaei, Rezvaneh Rezapour|arXiv (Cornell University)|Sep 23, 2024

AI in Service Interactions | Cited by 7

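One-Sentence Summary

This paper compares how empathy is expressed and perceived in human-human and human-AI interactions, using personal narratives, GPT-4o, and an alignment evaluation of persona-attribute fine-tuning.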

ABSTRACT

Social interactions promote well-being, yet barriers like geographic distance, time limitations, and mental health conditions can limit face-to-face interactions. Emotionally responsive AI systems, such as chatbots, offer new opportunities for social and emotional support, but raise critical questions about how empathy is perceived and experienced in human-AI interactions. This study examines how empathy is evaluated in AI-generated versus human responses. Using personal narratives, we explored how persona attributes (e.g., gender, empathic traits, shared experiences) and story qualities affect empathy ratings. We compared responses from standard and fine-tuned AI models with human judgments. Results show that while humans are highly sensitive to emotional vividness and shared experience, AI responses are less influenced by these cues and often lack nuance in empathic expression. These findings highlight challenges in designing emotionally intelligent systems that respond meaningfully across diverse users and contexts, and inform the design of ethically aware tools to support social connection and well-being.

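Research Motivation and Objectives

Investigate how the experience and expression of empathy differ between human-human and human-AI interactions.
Examine how AI persona attributes (gender, empathic concern, perspective-taking, similarity of experience) affect empathic expression.
Evaluate whether instruction fine-tuning and data-driven persona cues improve the alignment between AI and human empathy.
Identify the factors in the storytelling context and in perceived similarity that elicit empathy in humans and AI.
Discuss the ethical and practical implications of empathic AI in mental-health-related settings.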

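Proposed Method

Narratives are drawn from stories written by 126 undergraduates, each describing their three best and three worst life events.
Ratings of empathy, its affective and cognitive dimensions, and the reasons for empathizing were collected on MTurk (n≈2,586 ratings covering 756 stories).
GPT-4o responses were generated for the same stories using a baseline prompt and persona-attribute prompts (gender, empathic concern, perspective-taking, similarity of experience).
GPT-4o was fine-tuned in two configurations: (a) stories only, together with human empathy ratings; (b) all attributes, including reader demographics and similarity measures.
Empathy ratings were evaluated with means, standard deviations, RMSE, t-tests, and Wasserstein distance; elicited empathy was analyzed with multilevel models (lme4 in R).
The analysis tests whether persona prompts and fine-tuning reduce the mismatch between human and AI empathy.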
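
The comparison metrics listed above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the choice of Welch's unequal-variance t statistic (the summary does not say which t-test was used), and the example ratings are all assumptions made for illustration. Only the standard library is used.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, stdev, variance


def rmse(human, ai):
    """Root-mean-square error between paired ratings of the same stories."""
    if len(human) != len(ai):
        raise ValueError("paired ratings must have the same length")
    return sqrt(mean((h - a) ** 2 for h, a in zip(human, ai)))


def welch_t(x, y):
    """Welch's t statistic for two independent samples (assumed test variant)."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


def wasserstein_1d(x, y):
    """1-D Wasserstein-1 distance: area between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


# Made-up ratings for four stories (illustrative only, not data from the paper).
human_ratings = [3.0, 4.0, 2.0, 5.0]
ai_ratings = [4.0, 4.0, 3.0, 4.0]

summary = {
    "human_mean": mean(human_ratings),
    "ai_mean": mean(ai_ratings),
    "human_sd": stdev(human_ratings),
    "ai_sd": stdev(ai_ratings),
    "rmse": rmse(human_ratings, ai_ratings),
    "welch_t": welch_t(ai_ratings, human_ratings),
    "wasserstein": wasserstein_1d(human_ratings, ai_ratings),
}
```

RMSE compares ratings story by story, whereas the Wasserstein distance compares the two rating distributions as a whole, so a model can score well on one and poorly on the other. A lower standard deviation for the AI ratings in such a summary would correspond to the reduced variability reported in the findings.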

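Experimental Results

Research Questions

RQ1: How do experienced and expressed empathy differ between human-human and human-AI interactions?
RQ2: How do persona attributes (gender, empathic concern, perspective-taking, similarity of experience) affect the empathy expressed and perceived by AI and by humans?
RQ3: To what extent does fine-tuning AI models improve the alignment between human and AI empathy?
RQ4: Which factors elicit empathy in humans and in AI systems?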

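Key Findings

GPT-4o's empathy ratings are higher and less variable than human ratings: the overall mean is 3.615 for AI versus 3.23 for humans.
The AI shows a larger discrepancy on cognitive empathy than on affective empathy, suggesting a relatively shallow depth of understanding.
Persona attributes in the prompt change GPT-4o's baseline empathy only slightly, whereas fine-tuning significantly improves alignment when similarity of experience is included.
Fine-tuning on all attributes (story plus reader attributes) brings the aggregate metrics close to human empathy (mean difference near zero) and improves performance on cognitive empathy.
Fine-tuning the model to reflect similarity of experience improves human-AI alignment in elicited empathy more than any other persona attribute.
Humans rely on factors such as emotional intensity and perceived similarity; GPT-4o initially captures some of these factors but misses others that humans take into account (such as a pleasant narrator context).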

This review was generated by AI and reviewed by human editors.