QUICK REVIEW

[Paper Review] Understanding User Experience in Large Language Model Interactions

Jiayin Wang, Weizhi Ma|arXiv (Cornell University)|Jan 16, 2024

Topic Modeling|Cited by 16

One-Sentence Summary

This study develops a user-intent taxonomy for general LLM interfaces, conducts a survey with 411 participants to assess satisfaction and concerns, and proposes 6 future research directions to enhance user-centered human-AI collaboration.

ABSTRACT

In the rapidly evolving landscape of large language models (LLMs), most research has primarily viewed them as independent individuals, focusing on assessing their capabilities through standardized benchmarks and enhancing their general intelligence. This perspective, however, tends to overlook the vital role of LLMs as user-centric services in human-AI collaboration. This gap in research becomes increasingly critical as LLMs become more integrated into people's everyday and professional interactions. This study addresses the important need to understand user satisfaction with LLMs by exploring four key aspects: comprehending user intents, scrutinizing user experiences, addressing major user concerns about current LLM services, and charting future research paths to bolster human-AI collaborations. Our study develops a taxonomy of 7 user intents in LLM interactions, grounded in analysis of real-world user interaction logs and human verification. Subsequently, we conduct a user survey to gauge their satisfaction with LLM services, encompassing usage frequency, experiences across intents, and predominant concerns. This survey, compiling 411 anonymous responses, uncovers 11 first-hand insights into the current state of user engagement with LLMs. Based on this empirical analysis, we pinpoint 6 future research directions prioritizing the user perspective in LLM developments. This user-centered approach is essential for crafting LLMs that are not just technologically advanced but also resonate with the intricate realities of human interactions and real-world applications.

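Research Motivation and Objectives

Define a user-intent taxonomy for general-purpose LLM interfaces, grounded in real-world logs and human verification.
Assess user satisfaction with current LLM services across each intent through a large-scale survey.
Identify usage patterns, experiences, and core concerns to inform user-centered LLM design.
Expose the gap between current evaluation practice and real-world user needs to guide future research directions.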

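Proposed Method

Develop and validate a seven-intent taxonomy of LLM interactions using related literature, real-world logs, and human verification.
Validate and refine the taxonomy through multi-annotator labeling of English ShareGPT logs.
Design and run a 12-question user survey, collecting 411 responses, to measure usage, experience, and concerns under each intent.
Analyze usage frequency, intent distribution, satisfaction, and tool expectations across Chinese and English responses.
Cluster the intents by their interdependence under chi-square tests, identifying three usage types: GUI-based objective use, GUI-based subjective use, and API-based use.
Extract and summarize 11 insights, and discuss 6 future research directions for user-oriented LLM development.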
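
The chi-square clustering step above can be illustrated with a small sketch. This is not the authors' procedure, which this summary does not spell out. It assumes each respondent gives a binary "uses this intent" answer, tests every pair of intents with a Pearson chi-square test of independence on a 2x2 table (no continuity correction), and groups intents that are linked by a significant pair. The function names, the connected-component grouping rule, and the toy responses are all hypothetical. The only fixed constant is 3.841, the chi-square critical value for one degree of freedom at the 0.05 level.

```python
from itertools import combinations

# Chi-square critical value for df=1 at alpha=0.05.
CHI2_CRIT_DF1 = 3.841


def chi_square_2x2(xs, ys):
    """Pearson chi-square statistic for two binary (0/1) response vectors."""
    n = len(xs)
    # Observed counts of the 2x2 contingency table, indexed as obs[x][y].
    obs = [[0, 0], [0, 0]]
    for x, y in zip(xs, ys):
        obs[x][y] += 1
    row = [sum(obs[0]), sum(obs[1])]
    col = [obs[0][0] + obs[1][0], obs[0][1] + obs[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected == 0:
                # Degenerate table (a constant column): no evidence of dependence.
                return 0.0
            stat += (obs[i][j] - expected) ** 2 / expected
    return stat


def cluster_intents(responses, crit=CHI2_CRIT_DF1):
    """Group intents into connected components of significantly dependent pairs.

    Assumption: a pair counts as "dependent" when its statistic exceeds crit.
    The test is two-sided, so it does not distinguish positive from negative
    association.
    """
    names = list(responses)
    parent = {name: name for name in names}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(names, 2):
        if chi_square_2x2(responses[a], responses[b]) > crit:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy data: 8 respondents, 1 = "I use LLMs for this intent".
toy_responses = {
    "Text Assistant":        [1, 1, 1, 1, 0, 0, 0, 0],
    "Information Retrieval": [1, 1, 1, 1, 0, 0, 0, 0],
    "Seek Creativity":       [1, 0, 1, 0, 1, 0, 1, 0],
    "Ask for Advice":        [1, 0, 1, 0, 1, 0, 1, 0],
}
clusters = cluster_intents(toy_responses)
print(clusters)
```

With real survey data, expected cell counts below about 5 would make the chi-square approximation unreliable, and testing many intent pairs would call for a multiple-comparison correction. Both are left out here to keep the sketch short.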

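Experimental Results

Research Questions

RQ1: What are the main user intents when interacting with LLM-powered conversational interfaces?
RQ2: How do users perceive their experience of interacting with current LLM services in real-world settings?
RQ3: What are users' main concerns when using large language models?
RQ4: What are the future directions for building user-centered large language models that foster better human-AI collaboration?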

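Key Findings

About 80% of participants use LLMs at least once a week; roughly half of the English-language respondents and 42.09% of the Chinese-language respondents report daily use.
The seven intents cluster into three groups: objective use via GUI, subjective use via GUI, and use via API.
Text Assistant, Information Retrieval, and Solve Problems in Specialized Areas are the top three usage scenarios.
Subjective intents such as Seek Creativity and Ask for Advice are common but may have been underestimated in prior research; use for entertainment is comparatively low.
Text-related (text manipulation) tasks show high satisfaction (over 80%), whereas Seek Creativity has the highest dissatisfaction. Cross-cultural differences also affect satisfaction; for example, satisfaction with Solve Problems differs between Chinese and English users.
For subjective intents, personalization is valued, and LLMs need to be tailored to different linguistic and cultural backgrounds. User concerns center on capability and trustworthiness (hallucination, long context, multimodality, privacy, and safety).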

This review was generated by AI and reviewed by human editors.