QUICK REVIEW

[Paper Review] Understanding User Experience in Large Language Model Interactions

Jiayin Wang, Weizhi Ma|arXiv (Cornell University)|2024. 01. 16.
Topic Modeling | Citations: 16
One-Line Summary

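This study develops a taxonomy of user intents for general LLM interfaces, surveys 411 participants to assess their satisfaction and concerns, and proposes six future research directions for strengthening user-centered human-AI collaboration.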

ABSTRACT

In the rapidly evolving landscape of large language models (LLMs), most research has primarily viewed them as independent individuals, focusing on assessing their capabilities through standardized benchmarks and enhancing their general intelligence. This perspective, however, tends to overlook the vital role of LLMs as user-centric services in human-AI collaboration. This gap in research becomes increasingly critical as LLMs become more integrated into people's everyday and professional interactions. This study addresses the important need to understand user satisfaction with LLMs by exploring four key aspects: comprehending user intents, scrutinizing user experiences, addressing major user concerns about current LLM services, and charting future research paths to bolster human-AI collaborations. Our study develops a taxonomy of 7 user intents in LLM interactions, grounded in analysis of real-world user interaction logs and human verification. Subsequently, we conduct a user survey to gauge their satisfaction with LLM services, encompassing usage frequency, experiences across intents, and predominant concerns. This survey, compiling 411 anonymous responses, uncovers 11 first-hand insights into the current state of user engagement with LLMs. Based on this empirical analysis, we pinpoint 6 future research directions prioritizing the user perspective in LLM developments. This user-centered approach is essential for crafting LLMs that are not just technologically advanced but also resonate with the intricate realities of human interactions and real-world applications.

Research Motivation and Objectives

  • Define a taxonomy of user intents for general LLM interfaces grounded in real-world logs and human verification.
  • Assess user satisfaction with current LLM services across intents via a large-scale survey.
  • Identify usage patterns, experiences, and core concerns to inform user-centric LLM design.
  • Reveal gaps between current evaluations and real-world user needs to guide future research directions.

Proposed Method

  • Develop and validate a seven-intent taxonomy for LLM interactions using related literature, real-world logs, and human verification.
  • Validate and refine the taxonomy through annotation of English ShareGPT logs with multiple raters.
  • Design and administer a 12-question, 411-response user survey to measure usage, experience across intents, and concerns.
  • Analyze usage frequency, intent distribution, satisfaction, and tool expectations across Chinese and English responses.
  • Cluster intents based on chi-square interdependence to identify three usage categories: GUI-based objective, GUI-based subjective, and API-based usage.
  • Extract and summarize 11 insights and discuss 6 future research directions for user-centered LLM development.
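The chi-square step in the list above can be illustrated with a minimal sketch. It assumes each survey response is reduced to the set of intents a respondent reports using, and it tests each pair of intents for dependence with a Pearson chi-square statistic on a 2x2 table. The function names, the toy responses, and the choice of the 0.05 significance level are all assumptions for illustration; the review does not describe the authors' actual procedure in this detail.

```python
from itertools import combinations

# Critical value of the chi-square distribution for df = 1 at alpha = 0.05.
CRITICAL_DF1_ALPHA_05 = 3.841


def contingency(responses, x, y):
    """Build a 2x2 table of counts: rows = uses x / not, cols = uses y / not."""
    table = [[0, 0], [0, 0]]
    for intents in responses:
        table[0 if x in intents else 1][0 if y in intents else 1] += 1
    return table


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    if 0 in rows or 0 in cols:
        # A zero marginal makes the expected counts degenerate; report no evidence.
        return 0.0
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def dependent_pairs(responses, intents, critical=CRITICAL_DF1_ALPHA_05):
    """Return the intent pairs whose statistic exceeds the critical value."""
    pairs = []
    for x, y in combinations(intents, 2):
        if chi_square_2x2(contingency(responses, x, y)) > critical:
            pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy responses, NOT data from the paper.
    toy = (
        [{"Text Assistant", "Information Retrieval"}] * 12
        + [{"Seek Creativity", "Ask for Advice"}] * 10
        + [{"Text Assistant"}] * 2
        + [{"Ask for Advice"}] * 2
    )
    names = ["Text Assistant", "Information Retrieval",
             "Seek Creativity", "Ask for Advice"]
    for pair in dependent_pairs(toy, names):
        print(pair)
```

Pairs flagged as dependent could then be grouped into usage categories such as the three the paper reports. With seven intents there are 21 pairwise tests, so a real analysis would likely also need a multiple-comparison correction, which this sketch omits.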

Experimental Results

Research Questions

  • RQ1: What are the primary user intents for engaging with conversational interfaces powered by LLMs?
  • RQ2: How do users perceive their experience when interacting with current LLM services in real-world settings?
  • RQ3: What major concerns do users have for using large language models?
  • RQ4: What are future directions in building user-centered large language models for better human-AI collaboration?

Key Findings

  • Approximately 80% of participants use LLMs at least weekly, with about half of English-language respondents and 42.09% of Chinese-language respondents reporting daily use.
  • Seven intents cluster into three groups: Objective Usage via GUIs, Subjective Usage via GUIs, and Usage through APIs.
  • Text Assistant, Information Retrieval, and Solve Problems in Specialized Areas are the top three usage scenarios.
  • Subjective intents like Seek Creativity and Ask for Advice are common but may be underrepresented in prior research; Leisure uses are comparatively lower.
  • Textual/text-manipulation tasks show high satisfaction (over 80%), while Seek Creativity sees the most dissatisfaction, and cross-cultural differences affect satisfaction (e.g., Solve Problems varies between Chinese and English speakers).
  • Personalization is valued across subjective intents, and there is a need to tailor LLMs to different languages and cultural contexts; user concerns center on capability and trustworthiness (hallucinations, long context, multimodal, privacy, safety).


This review was generated by AI and reviewed by a human editor.