QUICK REVIEW

[Paper Review] Using ChatGPT for Thematic Analysis

Aleksei Turobov, Diane Coyle|arXiv (Cornell University)|May 13, 2024

Topic Modeling | Cited by 28

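One-Sentence Summary

The paper presents a custom GPT model that performs initial coding in a thematic analysis of UN policy documents, validates its output against topic modeling, and discusses limitations and risk mitigation.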

ABSTRACT

The utilisation of AI-driven tools, notably ChatGPT, within academic research is increasingly debated from several perspectives including ease of implementation, and potential enhancements in research efficiency, as against ethical concerns and risks such as biases and unexplained AI operations. This paper explores the use of the GPT model for initial coding in qualitative thematic analysis using a sample of UN policy documents. The primary aim of this study is to contribute to the methodological discussion regarding the integration of AI tools, offering a practical guide to validation for using GPT as a collaborative research assistant. The paper outlines the advantages and limitations of this methodology and suggests strategies to mitigate risks. Emphasising the importance of transparency and reliability in employing GPT within research methodologies, this paper argues for a balanced use of AI in supported thematic analysis, highlighting its potential to elevate research efficacy and outcomes.

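Research Motivation and Objectives

Investigate the feasibility of using a GPT model as a collaborative tool for initial coding in thematic analysis.
Develop and test a custom GPT model tailored to qualitative coding (Supported Thematic Analysis. AIxGEO).
Evaluate how GPT-generated codes compare with manual methods and with topic modeling (LDA).
Identify the advantages, limitations, and risk-mitigation strategies of AI-assisted thematic analysis.
Offer practical guidance on prompt design, validation, and maintaining research integrity.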

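Proposed Method

Develop a custom GPT model aligned with a structured thematic analysis workflow.
Use few-shot learning, chain-of-thought, and role-playing prompts to improve the analytical output.
Create a knowledge base and step-by-step instruction scripts that guide the GPT model through familiarization, coding, clustering, and theme development.
Pilot-test on UN policy documents and press releases (63 documents, 2017–2024).
Validate GPT output by comparing it with Latent Dirichlet Allocation (LDA) topic modeling results.
Document limitations and propose validation and transparency strategies.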
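
The combination of role-playing, few-shot, and chain-of-thought prompting listed above can be sketched as a small prompt builder. This is a minimal illustration only, not the authors' actual instruction script: the `CodedExample` structure, the `build_coding_prompt` function, the wording of the instructions, and the example excerpts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CodedExample:
    """One worked example for few-shot prompting (hypothetical structure)."""
    excerpt: str
    rationale: str
    code: str


def build_coding_prompt(role: str, examples: list, passage: str) -> str:
    """Assemble a role + few-shot + chain-of-thought prompt for initial coding."""
    # Role-playing: tell the model which persona to adopt.
    parts = [f"You are {role}.", "Task: propose initial codes for the passage below."]
    # Few-shot: show worked examples so the model imitates their format.
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\nExcerpt: {ex.excerpt}\n"
            f"Reasoning: {ex.rationale}\nCode: {ex.code}"
        )
    # Chain of thought: ask for the reasoning before the final code.
    parts.append(
        "For the new passage, first explain your reasoning step by step, "
        "then give the code, then quote the supporting text verbatim."
    )
    parts.append(f"Passage: {passage}")
    return "\n\n".join(parts)


# Invented example data, for illustration only.
examples = [
    CodedExample(
        excerpt="Member states should protect privacy in AI systems.",
        rationale="The sentence assigns states a duty to protect privacy.",
        code="Privacy protection",
    )
]
prompt = build_coding_prompt(
    "a qualitative researcher performing thematic analysis",
    examples,
    "AI may widen inequality between countries.",
)
print(prompt)
```

Keeping the worked examples in a separate structure makes it easier to record exactly which examples were shown to the model, which supports the transparency goal described in the list above.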

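Experimental Results

Research Questions

RQ1: Can a GPT model perform the initial coding of a thematic analysis while still allowing researchers to validate and refine its output?
RQ2: How do GPT-generated codes compare with traditional manual coding and with topic modeling (LDA) on UN policy documents?
RQ3: Which prompt and knowledge-base designs improve GPT's performance in thematic analysis?
RQ4: What are the limitations and ethical considerations of AI-assisted thematic analysis, and how can they be mitigated?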

Key Findings

Year	Topic	Terms (from LDA)	Topic Labels (interpreted)	Notes
2019	Topic 1	system militar[y] human applic[ation] weapon[s]	AI Security and Military application	LDA topic about security and military applications
2019	Topic 2	right[s] human privac[y] state protect[ion]	Human Rights approach	Rights and privacy in AI governance
2019	Topic 3	unit[ed] nation educ[ation] develop[ment] learn[ing]	UN Role in AI Education	Education and development role of the UN
2019	Topic 4	artifici[al] intellig[ence] develop[ment] technolog[y] work	AI Development	Technological progress and AI deployment
2019	Topic 5	technolog[y] industri[al] revolut[ion] chang[es] respons[e]	Response to Technological Transformation	Industrial revolution and change management
2019	Topic 6	technolog[y] countr[ies] region develop[ment] govern[ance]	Technological Governance and Regional Development	Governance and regional development in tech
2021	Topic 1	right[s] human intellig[ence] artifici[al] data	AI and Human Rights	Rights and data considerations in AI
2021	Topic 2	technolog[ies] countr[ies] develop[ment] ineq[ality]	Technological Development and Inequality	Inequality implications of tech development
2021	Topic 3	data learn[ing] machin[e] model[s] statist[ics]	Machine Learning	Data, learning, models, statistics focus
2021	Topic 4	nation[s] unit[ed] member[s] develop[ment] work	UN Members Role	UN member involvement in AI development
2021	Topic 5	group[s] terrorist attack individu[al] technolog[ies]	Terrorism and AI in Security	AI in security and counter-terrorism contexts
2021	Topic 6	digit[al] solut[ions] process technolog[ies] strateg[y]	Digital Strategy	Digital solutions and strategic technology deployment
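
One simple way to relate GPT code labels to the stemmed LDA terms in the table above is prefix matching: a stem such as `militar` counts as shared if any word of the code label starts with it. The sketch below is an assumption about how such a cross-check could look, not the comparison procedure used in the paper. The stems come from two 2019 rows of the table; the GPT code labels and the function names are invented for illustration.

```python
import re
from typing import Optional

# Stemmed LDA terms copied from two 2019 topics in the table above.
lda_topics = {
    "AI Security and Military application": ["system", "militar", "human", "applic", "weapon"],
    "Human Rights approach": ["right", "human", "privac", "state", "protect"],
}

# Hypothetical GPT code labels, invented for illustration only.
gpt_codes = [
    "Military applications of autonomous weapons",
    "Privacy protection by states",
    "Digital skills training",
]


def matched_stems(code: str, stems: list) -> set:
    """Return the stems that are a prefix of at least one word in the code label."""
    words = re.findall(r"[a-z]+", code.lower())
    return {s for s in stems if any(w.startswith(s) for w in words)}


def best_topic(code: str, topics: dict) -> tuple:
    """Pick the topic sharing the most stems with the code, or None if none match."""
    best_label: Optional[str] = None
    best_hits: set = set()
    for label, stems in topics.items():
        hits = matched_stems(code, stems)
        if len(hits) > len(best_hits):
            best_label, best_hits = label, hits
    return best_label, best_hits


for code in gpt_codes:
    label, hits = best_topic(code, lda_topics)
    print(f"{code!r} -> {label} (shared stems: {sorted(hits)})")
```

Codes that match no topic (here, the third label) are not necessarily wrong; they are candidates for a closer manual look, since LDA topics are broader and more abstract than individual codes.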

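The custom GPT model generated more than 700 distinct codes from 63 UN policy documents and press releases (2017 to March 2024).
The GPT codes span a wide range of AI-related themes, from ethics and governance to security, and reflect shifts in UN discourse.
GPT output is generally descriptive and occasionally contains errors in quotations and code naming, so human review is required.
Topic modeling (LDA) provides a broader, more abstract validation layer that complements the GPT codes.
A change in OpenAI policy in 2024 restricted direct quotation, so the model paraphrases and quotations must be verified manually.
A balanced AI-human workflow built on prompt engineering and validation can improve efficiency while preserving research rigor.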
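
Part of the quotation check described above can be automated before human review: a script can flag any quote reported by the model that does not appear verbatim in the source document. The following is a minimal sketch under that assumption. The function names, the source passage, and the reported quotes are all invented for illustration, and a flagged quote still needs a human decision because it may be a legitimate paraphrase.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_verbatim(quote: str, source_text: str) -> bool:
    """True if the non-empty quote appears in the source after normalization."""
    q = normalize(quote)
    return bool(q) and q in normalize(source_text)


# Invented source passage and model-reported quotes, for illustration only.
source_text = """Artificial intelligence must be developed
in a way that respects human rights and privacy."""

reported = [
    ("Human rights safeguards", "respects human rights and privacy"),
    ("Privacy risks", "AI threatens the privacy of all citizens"),
]

# Collect the codes whose supporting quote could not be found in the source.
needs_review = [code for code, quote in reported if not quote_is_verbatim(quote, source_text)]
print(needs_review)  # prints ['Privacy risks']
```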


This review was generated by AI and reviewed by human editors.