QUICK REVIEW

[Paper Review] DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

Xianjun Yang, Wei Cheng | arXiv (Cornell University) | 2023-05-27

Topic Modeling | Citations: 20

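One-Line Summary

DNA-GPT is a training-free, zero-shot detector that truncates the input, regenerates continuations with an LLM, and analyzes the divergence in n-grams or probabilities to distinguish GPT-generated text from human-written text. It achieves state-of-the-art performance in both black-box and white-box settings and provides explainable evidence.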

ABSTRACT

Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also presents a significant challenge in detecting the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limitations in flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then use only the preceding portion as input to the LLMs to regenerate the new remaining parts. By analyzing the differences between the original and new remaining parts through N-gram analysis in black-box or probability divergence in white-box, we unveil significant discrepancies between the distribution of machine-generated text and the distribution of human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMa-13B. Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human and GPT-generated text on four English and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of text. Additionally, our methods provide reasonable explanations and evidence to support our claim, which is a unique feature of explainable detection. Our method is also robust under the revised text attack and can additionally solve model sourcing. Codes are available at https://github.com/Xianjun-Yang/DNA-GPT.

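Research Motivation and Goals

Argue that flexible, explainable detection of AI-generated text is increasingly needed as LLMs evolve.
Propose a training-free detection framework (DNA-GPT) that contrasts machine and human continuations conditioned on the preceding text.
Provide black-box and white-box detection mechanisms that come with explainable evidence.
Validate the approach across OpenAI models and open-source LLMs on multiple datasets.
Demonstrate robustness to the revised-text attack and enable model sourcing.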

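Proposed Method

Truncate the input text at ratio gamma, splitting it into a prefix X and the remainder Y0.
Use the target LLM to regenerate K continuations Y1,...,YK from X.
Compute a score (BScore for black-box, WScore for white-box) from the divergence between {Yk} and Y0.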
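In the black-box setting, weight the n-gram overlap between Yk and Y0 by a weighting function f(n), with f(n) = n log n, for n-gram lengths from n0 = 4 to N = 25.
In the white-box setting, use the log ratio of model probabilities, log( p(Y0|X) / p(Yk|X) ).
Provide evidence in the form of the n-grams that overlap with the regenerations.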
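
The steps above can be sketched in Python. This is a minimal illustration, not the authors' code: the function names (`truncate`, `ngram_set`, `bscore`, `wscore`), the whitespace tokenization, and the normalization by regeneration length and by the number of reference n-grams are assumptions made for the example. Only gamma, K, f(n) = n log n, n0 = 4, and N = 25 come from the review. The call to the LLM is left out, so the regenerated continuations and their log-probabilities are passed in as plain values.

```python
import math


def truncate(tokens, gamma=0.5):
    """Split a token list into prefix X and remainder Y0 at ratio gamma."""
    cut = int(len(tokens) * gamma)
    return tokens[:cut], tokens[cut:]


def ngram_set(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def bscore(y0, regenerations, n0=4, n_max=25):
    """Black-box score: weighted n-gram overlap between Y0 and each Yk.

    The normalization below is an assumption of this sketch.
    """
    if not regenerations:
        raise ValueError("need at least one regeneration")
    total = 0.0
    for yk in regenerations:
        if not yk:
            continue  # an empty regeneration contributes no overlap
        for n in range(n0, n_max + 1):
            ref = ngram_set(y0, n)
            if not ref:
                break  # Y0 is shorter than n, so longer n-grams cannot match
            overlap = len(ngram_set(yk, n) & ref)
            total += n * math.log(n) * overlap / (len(yk) * len(ref))
    return total / len(regenerations)


def wscore(logp_y0, logp_regens):
    """White-box score: mean of log p(Y0|X) - log p(Yk|X) over the K regenerations."""
    if not logp_regens:
        raise ValueError("need at least one regeneration log-probability")
    return sum(logp_y0 - lp for lp in logp_regens) / len(logp_regens)


# Usage with hand-written stand-ins for LLM output (hypothetical data).
x, y0 = truncate("a b c d e f g h".split(), gamma=0.5)
regens = [["e", "f", "g", "h"], ["p", "q", "r", "s"]]  # K = 2 continuations of X
print(bscore(y0, regens))
print(wscore(-10.0, [-8.0, -12.0]))
```

Under this sketch's sign convention, a larger score means the original remainder Y0 looks more like the model's own continuations, so a detector would flag text whose score exceeds a threshold chosen on held-out data.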

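Experimental Results

Research Questions

RQ1: Can a training-free, zero-shot detector reliably distinguish GPT-generated text from human-written text across multiple models and domains?
RQ2: Does conditioning on the preceding text expose a difference between the distributions of machine and human continuations (the likelihood-gap hypothesis) that makes detection possible?
RQ3: How do the black-box and white-box variants differ in performance and explainability across datasets and languages?
RQ4: Is the method robust to the revised-text attack, and can it perform model sourcing?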

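Key Results

DNA-GPT achieves state-of-the-art detection performance on multiple datasets and models, with higher AUROC and higher TPR at 1% FPR than training-based baselines.
Black-box detection with BScore and white-box detection with WScore both work well on OpenAI models (GPT-3.5-turbo, GPT-4) and open-source models (GPT-NeoX-20B, LLaMa-13B).
A truncation ratio gamma ≈ 0.5 with K in the range 5–20 gives strong, robust results, depending on the setting.
The method provides explainable evidence through overlapping n-grams, which aids interpretation and plagiarism assessment.
DNA-GPT is robust to the revised-text attack and enables model sourcing by comparing regeneration patterns across models.
Non-English detection (German) is competitive with English, demonstrating zero-shot capability without language-specific training.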
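
The two metrics named in the results, AUROC and TPR at 1% FPR, can be computed from detector scores as in the sketch below. It is an illustration of the metrics only, not the paper's evaluation code: the function names `auroc` and `tpr_at_fpr`, the rule that a text is flagged when its score is strictly above the threshold, and the sample scores are all assumptions.

```python
def auroc(human_scores, machine_scores):
    """Probability that a random machine score exceeds a random human score.

    Ties count as 0.5. Pairwise comparison, fine for small samples.
    """
    if not human_scores or not machine_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(human_scores) * len(machine_scores))


def tpr_at_fpr(human_scores, machine_scores, max_fpr=0.01):
    """Return (threshold, TPR) for the lowest threshold with FPR <= max_fpr.

    A text is flagged as machine-generated when its score is strictly
    greater than the threshold.
    """
    if not human_scores or not machine_scores:
        raise ValueError("both score lists must be non-empty")
    allowed = int(max_fpr * len(human_scores))  # human texts we may misflag
    ranked = sorted(human_scores, reverse=True)
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    tpr = sum(s > threshold for s in machine_scores) / len(machine_scores)
    return threshold, tpr


# Usage with made-up scores (hypothetical data, higher = more machine-like).
human = list(range(100))
machine = [50, 150, 200]
print(auroc(human, machine))
print(tpr_at_fpr(human, machine, max_fpr=0.01))
```

Reporting TPR at a fixed low FPR complements AUROC because it measures how many machine-generated texts are caught while almost no human-written text is wrongly flagged.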


This review was generated by AI and reviewed by a human editor.