QUICK REVIEW

[Paper Review] Situated Understanding of Errors in Older Adults' Interactions with Voice Assistants: A Month-Long, In-Home Study

Amama Mahmood, Junxiang Wang | arXiv (Cornell University) | Mar 4, 2024

Topic: AI in Service Interactions | Cited by 5

One-Sentence Summary

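This paper presents a month-long, in-home study of 15 older adults using smart speakers. It analyzes conversational errors through in-the-wild audio recordings and examines an LLM-powered voice assistant (ChatGPT + Alexa) as a technology probe.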

ABSTRACT

Our work addresses the challenges older adults face with commercial Voice Assistants (VAs), notably in conversation breakdowns and error handling. Traditional methods of collecting user experiences (usage logs and post-hoc interviews) do not fully capture the intricacies of older adults' interactions with VAs, particularly regarding their reactions to errors. To bridge this gap, we equipped 15 older adults' homes with smart speakers integrated with custom audio recorders to collect "in-the-wild" audio interaction data for detailed error analysis. Recognizing the conversational limitations of current VAs, our study also explored the capabilities of Large Language Models (LLMs) to handle natural and imperfect text for improving VAs. Midway through our study, we deployed a ChatGPT-powered VA to investigate its efficacy for older adults. Our research suggests leveraging vocal and verbal responses combined with LLMs' contextual capabilities for enhanced error prevention and management in VAs, while proposing design considerations to align VA capabilities with older adults' expectations.

Research Motivation and Objectives

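Examine how additional audio data captured in the wild provides deeper context for older adults' interactions with voice assistants, including errors and conversation progression.
Assess the potential benefits and challenges of integrating large language models to improve older adults' interactions with voice assistants.
Develop and validate an in-home data collection toolkit that captures complete audio interactions and reactions, going beyond usage logs.
Explore the feasibility and usability of an LLM-powered VA as a technology probe for understanding how older adults respond to advanced conversational agents.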

Proposed Method

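Conduct a 4-week field study in the homes of 15 older adults using Amazon Echo Dot devices.
Attach a custom audio recorder that captures each interaction from start to finish plus the 10 seconds that follow, using a privacy-first activation strategy.
Transcribe the audio recordings and align them with Alexa usage logs to analyze errors and conversational recovery.
Integrate a ChatGPT-powered Alexa skill as a technology probe to investigate its efficacy for older adults.
Apply qualitative coding with a predefined codebook to classify error types, occurrences, recovery strategies, and multi-turn conversations.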
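
The alignment step between recorded audio and usage logs can be pictured roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the `Segment` and `LogEntry` types, the `align_logs_to_segments` function, and the 2-second clock-skew tolerance are all hypothetical names and values chosen for the example. It assumes each recorded segment already spans the interaction plus the trailing capture window, and that log entries carry comparable timestamps.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LogEntry:
    """One row of a usage log (hypothetical schema)."""
    timestamp: float  # seconds, same clock as the recorder
    text: str


@dataclass
class Segment:
    """One recorded audio segment with its transcript (hypothetical schema)."""
    start: float
    end: float  # assumed to already include the post-interaction window
    transcript: str


def align_logs_to_segments(
    segments: List[Segment],
    logs: List[LogEntry],
    tolerance: float = 2.0,  # assumed allowance for clock skew, in seconds
) -> Tuple[Dict[int, List[LogEntry]], List[LogEntry]]:
    """Attach each log entry to the first segment whose time span contains it.

    Returns a mapping from segment index to its log entries, plus the
    log entries that fall outside every recorded segment.
    """
    aligned: Dict[int, List[LogEntry]] = {i: [] for i in range(len(segments))}
    unmatched: List[LogEntry] = []
    for entry in logs:
        for i, seg in enumerate(segments):
            if seg.start - tolerance <= entry.timestamp <= seg.end + tolerance:
                aligned[i].append(entry)
                break
        else:
            # No segment covers this timestamp.
            unmatched.append(entry)
    return aligned, unmatched
```

Segments that end up with transcript text but no matching log entries are the interesting case here: they would correspond to reactions or follow-up speech that a log-only analysis would miss.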

Experimental Results

Research Questions

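RQ1: How does additional in-the-wild audio interaction data provide deeper, situated insight into older adults' behavior and interaction dynamics with voice assistants, particularly in identifying errors and advancing conversations?
RQ2: What potential benefits and challenges does an LLM-powered VA bring to improving the quality of older adults' interactions, given the limitations of current commercial voice assistants?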

Key Findings

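Of the 2,552 conversations analyzed, 1,173 (about 46%) were single-turn and 1,379 (about 54%) were multi-turn.
The error taxonomy identifies 13 categories, including Intent (204 occurrences) and Limitation (130 occurrences); a number of errors went unresolved or led to conversation breakdowns.
Audio data complements usage logs by capturing reactions, interruptions, and overlapping speech that the logs do not contain, giving a richer picture of conversational dynamics.
The ChatGPT-powered Alexa skill demonstrates the feasibility of integrating LLMs for more fluid conversation, while also highlighting the learning curve and usability considerations for older adults.
The study offers design considerations for aligning VA capabilities with older adults' expectations, emphasizing context retention and recovery strategies in error handling.
The analysis suggests using vocal cues and immediate reactions as implicit signals to improve error detection and management in VAs.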
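
As a quick check on the conversation counts reported above, the split between single-turn and multi-turn conversations can be recomputed from the two figures given. The snippet below is a minimal sketch; the dictionary keys and the `percentage_shares` helper are hypothetical names, and only the two counts come from the summary.

```python
# Counts as reported in the findings above.
conversation_counts = {"single_turn": 1173, "multi_turn": 1379}


def percentage_shares(counts: dict) -> dict:
    """Return each category's share of the total, as a percentage rounded to 1 decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_conversations = sum(conversation_counts.values())  # 2552
shares = percentage_shares(conversation_counts)
print(total_conversations, shares)
# prints: 2552 {'single_turn': 46.0, 'multi_turn': 54.0}
```

The same helper could be applied to per-category error counts, but the summary only reports two of the 13 categories, so no overall error shares can be derived from it.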

This review was generated by AI and reviewed by a human editor.