QUICK REVIEW

[Paper Review] Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review

Rock Yuren Pang, Hope Schroeder | arXiv.org | Jan 22, 2025

Law, AI, and Intellectual Property | Cited by 4

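One-Sentence Summary

A systematic literature review of 153 CHI papers from 2020–2024 that analyzes the application domains, roles, and contribution types of LLMs in HCI, along with the associated limitations and risks.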

ABSTRACT

Large language models (LLMs) have been positioned to revolutionize HCI, by reshaping not only the interfaces, design patterns, and sociotechnical systems that we study, but also the research practices we use. To-date, however, there has been little understanding of LLMs' uptake in HCI. We address this gap via a systematic literature review of 153 CHI papers from 2020-24 that engage with LLMs. We taxonomize: (1) domains where LLMs are applied; (2) roles of LLMs in HCI projects; (3) contribution types; and (4) acknowledged limitations and risks. We find LLM work in 10 diverse domains, primarily via empirical and artifact contributions. Authors use LLMs in five distinct roles, including as research tools or simulated users. Still, authors often raise validity and reproducibility concerns, and overwhelmingly study closed models. We outline opportunities to improve HCI research with and on LLMs, and provide guiding questions for researchers to consider the validity and appropriateness of LLM-related work.

Research Motivation and Goals

Identify domains where LLMs are applied in CHI papers from 2020-2024.
Characterize how LLMs are used in CHI projects (roles and workflows).
Catalog contribution types (empirical, artifact, etc.) and their prevalence.
Document acknowledged limitations and risks of LLM-based research in CHI.
Provide an open-source dataset and guiding questions to improve rigor and responsibility in LLM-related CHI work.

Proposed Method

Collected full-text CHI proceedings from 2020-2024 focusing on generative LLMs.
Applied keyword-based filtering to identify LLM-relevant papers (e.g., 'language model', 'llm', 'GPT', 'ChatGPT').
Validated the corpus by randomly sampling excluded papers to check for false negatives (0.5% found).
Developed a codebook through iterative open coding guided by four research questions and existing taxonomies.
Assessed interrater reliability using Krippendorff’s alpha for multiple coding dimensions.
Coded 153 papers to produce domain, role, contribution, and limitation/risk taxonomies.
Provided an open-source dataset of the 153 papers with qualitative codes and metadata.
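
The keyword-filtering and sampling steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the keyword list contains only the four examples quoted in this summary, the case-insensitive prefix matching rule is an assumption, and the function names (`is_llm_relevant`, `split_corpus`, `sample_for_audit`) are hypothetical.

```python
import random
import re

# Only the example keywords quoted in the summary; the paper's full list may differ.
KEYWORDS = ["language model", "llm", "GPT", "ChatGPT"]

# Assumption: match case-insensitively at the start of a word,
# so that "LLMs", "language models", and "GPT-4" also count.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")",
    re.IGNORECASE,
)


def is_llm_relevant(full_text: str) -> bool:
    """Return True if the paper text mentions any keyword."""
    return PATTERN.search(full_text) is not None


def split_corpus(papers: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split paper IDs into keyword matches and non-matches."""
    matched = [pid for pid, text in papers.items() if is_llm_relevant(text)]
    unmatched = [pid for pid, text in papers.items() if not is_llm_relevant(text)]
    return matched, unmatched


def sample_for_audit(unmatched: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw a reproducible random sample of non-matches to read by hand,
    looking for LLM papers the keyword filter missed (false negatives)."""
    rng = random.Random(seed)
    return rng.sample(unmatched, min(k, len(unmatched)))


if __name__ == "__main__":
    toy_papers = {
        "a": "We prompt GPT-4 to draft feedback for students.",
        "b": "A study of touch gestures on smartwatches.",
        "c": "Large language models (LLMs) as writing assistants.",
    }
    matched, unmatched = split_corpus(toy_papers)
    print(matched, unmatched, sample_for_audit(unmatched, k=5))
```

A filter like this favors recall over precision, which is why the matched papers still need manual screening and the unmatched ones a sampled audit.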

Figure 1. The raw number of LLM-related papers, followed by the percentage of the total number of papers in each year 2020-2024.

Results

Research Questions

RQ1: Where have LLMs been applied at CHI?
RQ2: How have researchers used LLMs in their CHI papers?
RQ3: What contributions have LLM-related CHI papers made to HCI?
RQ4: What concerns or risks about LLMs do authors articulate in CHI papers?

Key Findings

LLMs were applied across 10 diverse domains within CHI (e.g., Communication & Writing, Augmenting Capabilities, Education, Responsible Computing, Programming, Reliability & Validity, Well-being & Health, Design, Accessibility & Aging, Creativity).
Authors engaged with LLMs in five distinct roles: as system engines, as research tools, as participants/users, as objects of study, and as the focus of studies exploring users’ perceptions.
Empirical and artifact contributions dominated CHI LLM work, with 98.7% empirical and 61.4% artifact-related, often in combination.
A total of 29 limitations and risks were identified, spanning LLM performance, research validity, resources, and societal impacts.
Authors often raise validity and reproducibility concerns, and they overwhelmingly study closed models rather than open alternatives.
An open-source dataset of the 153 papers and qualitative codes is provided for replication and further study.
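
As a quick sanity check on the contribution percentages above, the arithmetic below converts them into approximate paper counts. It assumes both percentages are taken over the full set of 153 papers, which this summary does not state explicitly. The counts are derived here, not reported by the authors, and the helper `implied_count` is hypothetical.

```python
TOTAL_PAPERS = 153  # corpus size stated in the summary


def implied_count(percent: float, total: int = TOTAL_PAPERS) -> int:
    """Nearest whole number of papers corresponding to a percentage."""
    return round(percent / 100 * total)


empirical = implied_count(98.7)
artifact = implied_count(61.4)

# Inclusion-exclusion lower bound: if the two groups together exceed the
# corpus size, at least the excess must belong to both groups.
min_both = max(0, empirical + artifact - TOTAL_PAPERS)

print(empirical, artifact, min_both)
```

Under that assumption, the two categories sum to well over 153, so a large share of papers must make both an empirical and an artifact contribution, in line with the "often in combination" remark.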

Figure 2. A flow diagram of our sample selection and refinement process.

This review was generated by AI and reviewed by human editors.