QUICK REVIEW

[Paper Review] HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis

Christoforos Vasilatos, Manaar Alam | arXiv (Cornell University) | 2023-05-26

Artificial Intelligence in Healthcare and Education | Citations: 10

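One-Line Summary

HowkGPT uses metadata-derived perplexity thresholds, computed with a pretrained GPT-2 model, to distinguish ChatGPT-generated university assignments from student-written ones; category-specific thresholds improve detection accuracy.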

ABSTRACT

As the use of Large Language Models (LLMs) in text generation tasks proliferates, concerns arise over their potential to compromise academic integrity. The education sector currently tussles with distinguishing student-authored homework assignments from AI-generated ones. This paper addresses the challenge by introducing HowkGPT, designed to identify homework assignments generated by AI. HowkGPT is built upon a dataset of academic assignments and accompanying metadata [17] and employs a pretrained LLM to compute perplexity scores for student-authored and ChatGPT-generated responses. These scores then assist in establishing a threshold for discerning the origin of a submitted assignment. Given the specificity and contextual nature of academic work, HowkGPT further refines its analysis by defining category-specific thresholds derived from the metadata, enhancing the precision of the detection. This study emphasizes the critical need for effective strategies to uphold academic integrity amidst the growing influence of LLMs and provides an approach to ensuring fair and accurate grading in educational institutions.

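Motivation and Objectives

Promote academic integrity by distinguishing student-written assignments from AI-generated submissions.
Refine perplexity-based detection for homework by leveraging a metadata-rich dataset.
Show that category-specific perplexity thresholds outperform a single dataset-wide threshold.
Provide a public web tool for real-time assessment of an assignment's origin.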

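Proposed Method

Because the study had no access to GPT-3.5/4, text perplexity is computed with a pretrained GPT-2 model.
A sliding-window approach accumulates token-level losses, and the exponential of their mean gives the text's perplexity.
Dataset metadata, including knowledge and cognitive-process categories, is incorporated to define category-specific thresholds.
Thresholds are evaluated with ROC/AUC and F1 metrics across different dataset configurations to select the optimal perplexity cutoff.
Offline and real-time web-application workflows are deployed for threshold computation and origin classification.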
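
The sliding-window perplexity step can be illustrated with a minimal sketch. It is not the authors' implementation: the `score` callback is a hypothetical stand-in for a pretrained GPT-2 model, and `uniform_scorer`, `max_len`, and `stride` are names and values assumed here for illustration. Each window scores only the tokens that no earlier window has scored, using the earlier tokens in the window as context; the summed negative log-likelihood is then averaged and exponentiated.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer signature (an assumption, not a real library API):
# given a context and the tokens that follow it, return one negative
# log-likelihood (in nats) per target token. In practice this would wrap
# a pretrained language model such as GPT-2.
Scorer = Callable[[Sequence[str], Sequence[str]], List[float]]


def sliding_window_perplexity(
    tokens: Sequence[str], score: Scorer, max_len: int = 8, stride: int = 4
) -> float:
    """Perplexity = exp(mean token-level NLL), accumulated over strided windows."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if not 0 < stride <= max_len:
        raise ValueError("stride must be in 1..max_len so no token is skipped")

    total_nll = 0.0
    counted = 0
    prev_end = 0  # index one past the last token already scored
    for begin in range(0, len(tokens), stride):
        end = min(begin + max_len, len(tokens))
        n_new = end - prev_end  # tokens in this window not yet scored
        window = tokens[begin:end]
        context = window[: len(window) - n_new]
        targets = window[len(window) - n_new:]
        nlls = score(context, targets)
        total_nll += sum(nlls)
        counted += len(nlls)
        prev_end = end
        if end == len(tokens):
            break
    return math.exp(total_nll / counted)


def uniform_scorer(context, targets, vocab_size=50):
    # Toy model (illustration only): every token has probability 1/vocab_size,
    # regardless of context.
    return [math.log(vocab_size)] * len(targets)


tokens = "the quick brown fox jumps over the lazy dog again".split()
ppl = sliding_window_perplexity(tokens, uniform_scorer)
print(f"perplexity under the toy uniform model: {ppl:.2f}")
```

With the toy uniform scorer the perplexity comes out at approximately the assumed vocabulary size, which is a convenient sanity check; a real model would assign lower perplexity to text it finds more predictable.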

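Experimental Results

Research Questions

RQ1: Can perplexity distinguish student-written assignments from AI-generated text within a university dataset?
RQ2: Does introducing metadata-based text categorization improve detection accuracy over a single threshold?
RQ3: What are the optimal perplexity thresholds across question categories and dataset configurations?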

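Key Findings

Category-specific thresholds yield higher perplexity-based detection accuracy than a single dataset-wide threshold.
Dataset configuration, such as excluding math/code content, substantially affects the shape of the perplexity distributions and threshold performance.
ROC/AUC and F1 analyses identify different optimal thresholds depending on the chosen metric, reflecting the trade-off between precision and recall.
The study develops a public web platform that performs real-time, perplexity-based origin assessment.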
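
To make the threshold findings concrete, here is a hedged sketch of selecting a perplexity cutoff per category versus one cutoff for the whole dataset. Everything in it is an assumption for illustration: the data is synthetic, the category names `cat_A` and `cat_B` are placeholders, and the rule "predict AI-generated when perplexity is at or below the threshold" is one plausible convention rather than the paper's confirmed procedure. Youden's J (true-positive rate minus false-positive rate) is used here as a simple ROC-based selection criterion.

```python
from collections import defaultdict


def confusion(samples, threshold):
    """samples: list of (perplexity, is_ai). Predict AI when perplexity <= threshold."""
    tp = sum(1 for p, ai in samples if ai and p <= threshold)
    fp = sum(1 for p, ai in samples if not ai and p <= threshold)
    fn = sum(1 for p, ai in samples if ai and p > threshold)
    tn = sum(1 for p, ai in samples if not ai and p > threshold)
    return tp, fp, fn, tn


def f1_score(samples, threshold):
    tp, fp, fn, _ = confusion(samples, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def youden_j(samples, threshold):
    # ROC-based criterion: true-positive rate minus false-positive rate.
    tp, fp, fn, tn = confusion(samples, threshold)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr - fpr


def best_threshold(samples, metric):
    # Candidate cutoffs are the observed perplexities; ties go to the lowest one.
    candidates = sorted({p for p, _ in samples})
    return max(candidates, key=lambda t: metric(samples, t))


def per_category_thresholds(records, metric):
    """records: list of (category, perplexity, is_ai)."""
    by_cat = defaultdict(list)
    for category, ppl, is_ai in records:
        by_cat[category].append((ppl, is_ai))
    return {c: best_threshold(s, metric) for c, s in by_cat.items()}


# Synthetic perplexities (made up for illustration, not from the paper).
records = (
    [("cat_A", p, True) for p in (10.0, 12.0, 14.0)]
    + [("cat_A", p, False) for p in (20.0, 25.0, 30.0)]
    + [("cat_B", p, True) for p in (22.0, 26.0, 28.0)]
    + [("cat_B", p, False) for p in (40.0, 45.0, 50.0)]
)
all_samples = [(p, ai) for _, p, ai in records]

per_cat = per_category_thresholds(records, f1_score)
global_t = best_threshold(all_samples, f1_score)
print("per-category thresholds:", per_cat)
print("global threshold:", global_t, "F1:", round(f1_score(all_samples, global_t), 3))
```

In this toy data each category is perfectly separable with its own cutoff, while the single global cutoff misclassifies some `cat_A` student texts because the two categories' perplexity ranges overlap. On imbalanced data, `best_threshold(samples, f1_score)` and `best_threshold(samples, youden_j)` can also return different cutoffs, since F1 ignores true negatives.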

This review was generated by AI and reviewed by a human editor.