QUICK REVIEW

[Paper Review] Exploring Qualitative Research Using LLMs

Muneera Bano, Didar Zowghi|arXiv (Cornell University)|2023. 06. 23.

Computational and Text Analysis Methods | Citations: 11

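One-Line Summary

This paper compares how humans and LLMs classify and reason about Alexa app reviews, finding only partial alignment and pointing to potential synergy in human-LLM collaboration.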

ABSTRACT

The advent of AI driven large language models (LLMs) have stirred discussions about their role in qualitative research. Some view these as tools to enrich human understanding, while others perceive them as threats to the core values of the discipline. This study aimed to compare and contrast the comprehension capabilities of humans and LLMs. We conducted an experiment with small sample of Alexa app reviews, initially classified by a human analyst. LLMs were then asked to classify these reviews and provide the reasoning behind each classification. We compared the results with human classification and reasoning. The research indicated a significant alignment between human and ChatGPT 3.5 classifications in one third of cases, and a slightly lower alignment with GPT4 in over a quarter of cases. The two AI models showed a higher alignment, observed in more than half of the instances. However, a consensus across all three methods was seen only in about one fifth of the classifications. In the comparison of human and LLMs reasoning, it appears that human analysts lean heavily on their individual experiences. As expected, LLMs, on the other hand, base their reasoning on the specific word choices found in app reviews and the functional components of the app itself. Our results highlight the potential for effective human LLM collaboration, suggesting a synergistic rather than competitive relationship. Researchers must continuously evaluate LLMs role in their work, thereby fostering a future where AI and humans jointly enrich qualitative research.

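Research Motivation and Objectives

Advance understanding of the role of AI-driven LLMs in qualitative research.
Evaluate how well LLMs classify qualitative data compared with a human analyst.
Examine the reasoning that humans and LLMs give for their qualitative classifications.
Explore the potential for effective human-LLM collaboration in qualitative research.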

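Proposed Method

A small sample of Alexa app reviews was first classified by a human analyst.
The LLMs were then asked to classify the same reviews and explain the reasoning behind each classification.
The LLM classifications and reasoning were compared with the human analyst's classifications and reasoning.
Alignment (agreement) was measured among the human, ChatGPT 3.5, and GPT-4 classifications.
Differences in reasoning style between the human analyst and the LLMs were analyzed.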
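
The alignment measurement described above can be sketched as a simple percent-agreement calculation. This is a minimal illustration, not the authors' procedure: the review does not say how alignment was computed, and the category labels, the toy data, and the function names `agreement_rate` and `consensus_rate` are all hypothetical.

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two raters assigned the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both raters must label the same items")
    if not labels_a:
        raise ValueError("at least one labeled item is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def consensus_rate(*label_lists):
    """Fraction of items on which every rater assigned the same label."""
    if len({len(labels) for labels in label_lists}) != 1:
        raise ValueError("all raters must label the same items")
    rows = list(zip(*label_lists))
    if not rows:
        raise ValueError("at least one labeled item is required")
    return sum(len(set(row)) == 1 for row in rows) / len(rows)


# Hypothetical toy labels for six reviews; NOT data from the paper.
human = ["bug", "feature", "praise", "bug", "other", "feature"]
chatgpt_3_5 = ["bug", "bug", "praise", "feature", "other", "praise"]
gpt_4 = ["bug", "bug", "other", "feature", "praise", "praise"]

human_vs_chatgpt = agreement_rate(human, chatgpt_3_5)
human_vs_gpt4 = agreement_rate(human, gpt_4)
chatgpt_vs_gpt4 = agreement_rate(chatgpt_3_5, gpt_4)
all_three = consensus_rate(human, chatgpt_3_5, gpt_4)

print(f"human vs ChatGPT 3.5: {human_vs_chatgpt:.2f}")
print(f"human vs GPT-4:       {human_vs_gpt4:.2f}")
print(f"ChatGPT 3.5 vs GPT-4: {chatgpt_vs_gpt4:.2f}")
print(f"all three agree:      {all_three:.2f}")
```

Raw percent agreement is used here only because it matches the "one third of cases" style of reporting; it does not correct for agreement that would occur by chance.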

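Experimental Results

Research Questions

RQ1: How closely do LLM classifications of Alexa app reviews match human classifications?
RQ2: How do the reasoning processes of humans and LLMs compare when classifying qualitative data?
RQ3: What is the level of agreement among the human, ChatGPT 3.5, and GPT-4 classifications?
RQ4: Do humans and LLMs show complementary strengths that suggest potential for collaboration?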

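Key Findings

Human and ChatGPT 3.5 classifications aligned in about one third of cases.
Human and GPT-4 alignment was slightly lower, at just over a quarter of cases.
The two AI models agreed with each other more often, in more than half of cases.
All three (human, ChatGPT 3.5, and GPT-4) reached consensus in only about one fifth of classifications.
Human analysts tend to rely on personal experience, whereas the LLMs base their reasoning on the word choices in the reviews and the functional components of the app.
The results point to a synergistic rather than competitive relationship between humans and LLMs in qualitative research.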
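
As a sanity check on how the reported fractions relate to one another, the three-way consensus rate is bounded by the pairwise rates. With three raters, any two pairwise agreements on an item imply that all three agree, so the consensus rate t satisfies (a + b + c - 1) / 2 <= t <= min(a, b, c), where a, b, and c are the pairwise agreement rates. The sketch below plugs in rounded values; the abstract says only "one third", "over a quarter", "more than half", and "about one fifth", so the exact fractions are assumptions, and `consensus_bounds` is a hypothetical helper name.

```python
from fractions import Fraction


def consensus_bounds(p_ab, p_ac, p_bc):
    """Bounds on three-way consensus implied by three pairwise agreement rates."""
    # Inclusion-exclusion: P(any pair agrees) = a + b + c - 2t <= 1.
    lower = max(Fraction(0), (p_ab + p_ac + p_bc - 1) / 2)
    # Consensus is a subset of every pairwise agreement.
    upper = min(p_ab, p_ac, p_bc)
    return lower, upper


# Rounded stand-ins for the approximate figures in the abstract (assumptions).
human_chatgpt = Fraction(1, 3)   # "one third of cases"
human_gpt4 = Fraction(1, 4)      # "over a quarter of cases"
chatgpt_gpt4 = Fraction(1, 2)    # "more than half of the instances"
reported_consensus = Fraction(1, 5)  # "about one fifth"

lower, upper = consensus_bounds(human_chatgpt, human_gpt4, chatgpt_gpt4)
print(f"consensus must lie in [{lower}, {upper}]")
print("reported value is consistent:", lower <= reported_consensus <= upper)
```

Under these rounded inputs, the reported consensus of about one fifth sits close to the upper bound set by the human and GPT-4 agreement rate, meaning that most items on which the human and GPT-4 agreed were also items on which ChatGPT 3.5 agreed.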


This review was generated by AI and checked by a human editor.