QUICK REVIEW

[Paper Review] DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection

Xiao Yu, Yuang Qi | arXiv (Cornell University) | 2023-05-21

Topic Modeling | Citations: 12

One-Line Summary

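DPIC separates prompt-derived features from the intrinsic characteristics of a text, and detects machine-generated text by using a Siamese network to compare the original text with a version re-answered by GPT.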

ABSTRACT

Large language models (LLMs) have the potential to generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets. Consequently, detecting whether a text is generated by LLMs has become increasingly important. Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics. However, since we do not have access to the interior of the black-box model, we must resort to surrogate models, which impacts detection quality. In order to achieve high-quality detection of black-box models, we would like to extract deep intrinsic characteristics of the black-box model generated texts. We view the generation process as a coupled process of prompt and intrinsic characteristics of the generative model. Based on this insight, we propose to decouple prompt and intrinsic characteristics (DPIC) for LLM-generated text detection method. Specifically, given a candidate text, DPIC employs an auxiliary LLM to reconstruct the prompt corresponding to the candidate text, then uses the prompt to regenerate text by the auxiliary LLM, which makes the candidate text and the regenerated text align with their prompts, respectively. Then, the similarity between the candidate text and the regenerated text is used as a detection feature, thus eliminating the prompt in the detection process, which allows the detector to focus on the intrinsic characteristics of the generative model. Compared to the baselines, DPIC has achieved an average improvement of 6.76% and 2.91% in detecting texts from different domains generated by GPT4 and Claude3, respectively.
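The pipeline described in the abstract (reconstruct the prompt, regenerate from it, compare) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `reconstruct_prompt` and `regenerate` are hypothetical callables standing in for the auxiliary LLM, and `surface_similarity` is a simple stdlib placeholder for the similarity measure the paper actually learns.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class DetectionFeature:
    prompt: str        # prompt reconstructed from the candidate text
    regenerated: str   # text regenerated from that prompt
    similarity: float  # candidate-vs-regenerated similarity in [0, 1]


def surface_similarity(a: str, b: str) -> float:
    """Placeholder similarity: character-level match ratio from difflib.

    Assumption: this only stands in for the similarity used in the paper.
    """
    return SequenceMatcher(None, a, b).ratio()


def extract_feature(
    candidate: str,
    reconstruct_prompt: Callable[[str], str],
    regenerate: Callable[[str], str],
    similarity: Callable[[str, str], float] = surface_similarity,
) -> DetectionFeature:
    """Run the three steps: reconstruct prompt, regenerate, compare."""
    prompt = reconstruct_prompt(candidate)
    regenerated = regenerate(prompt)
    return DetectionFeature(prompt, regenerated, similarity(candidate, regenerated))


# Toy stand-ins (hypothetical) so the sketch runs without any LLM access.
def toy_reconstruct(text: str) -> str:
    return "Write a short passage about: " + " ".join(text.split()[:3])


def toy_regenerate(prompt: str) -> str:
    return prompt.split(": ", 1)[1] + " is a topic of interest."


feature = extract_feature(
    "Large language models can write fluent text.",
    toy_reconstruct,
    toy_regenerate,
)
print(feature.prompt)
print(feature.similarity)
```

In a real setting the two callables would wrap calls to an auxiliary LLM, and the resulting similarity would be passed to a trained detector rather than thresholded directly.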

Motivation and Objectives

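Motivates the need for machine-generated text detection that stays robust on data outside the training domain.
Introduces the idea of separating prompt effects from the intrinsic characteristics of a text.
Proposes a GPT-driven re-answering mechanism to expose the inheritance of generated text.
Develops a Siamese-embedding-based similarity module and a classifier for detection.
Evaluates robustness to perturbations and attacks that reflect real-world use.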

Proposed Method

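Defines GPT genetic inheritance: LLM output is shaped by the training data and the prompt.
Prompts a GPT model to summarize the original text and then to answer again, producing a re-answered text.
Uses a Siamese network to compute high-dimensional semantic embeddings and measures their cosine similarity.
Feeds the embeddings together with the similarity into a classifier that predicts whether the text is machine-generated.
Trains on HC3 and evaluates generalization on the Wiki, CCNews, CovidCM, and ACLAbs datasets.
Compares against a PPL-based detector, DetectGPT, and a RoBERTa-based detector, and evaluates robustness to re-translation and polishing attacks.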
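The comparison step in the list above can be illustrated with a small sketch. It assumes the two texts have already been encoded into fixed-length embedding vectors (the encoder itself is not shown), and the feature layout `[original, re-answer, similarity]` is an illustrative assumption rather than the paper's exact classifier input.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def classifier_features(
    emb_original: Sequence[float], emb_reanswer: Sequence[float]
) -> List[float]:
    """Concatenate both embeddings and their cosine similarity.

    Assumption: a hypothetical feature layout for a downstream classifier.
    """
    sim = cosine_similarity(emb_original, emb_reanswer)
    return list(emb_original) + list(emb_reanswer) + [sim]


# Made-up 3-dimensional embeddings, for illustration only.
features = classifier_features([1.0, 0.0, 1.0], [1.0, 0.0, 0.0])
print(features)
```

Keeping the raw embeddings alongside the scalar similarity lets a classifier use both signals, which matches the review's description of combining embeddings and similarity.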

Experimental Results

Research Questions

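RQ1: Can GPT-generated text be detected using the similarity between the original text and a GPT-generated re-answer?
RQ2: Does using high-dimensional semantic embeddings improve the cross-domain generalization of detection?
RQ3: How robust is the approach to common text perturbations and adaptive attacks?
RQ4: How does GPT-Pat compare with state-of-the-art detectors across diverse datasets?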

Key Results

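Columns are grouped by metric (accuracy, precision, F1); the suffixes P, D, R, and G denote the PPL-based detector, DetectGPT, the RoBERTa-based detector, and GPT-Pat.
Dataset	Acc_P	Acc_D	Acc_R	Acc_G	Prec_P	Prec_D	Prec_R	Prec_G	F1_P	F1_D	F1_R	F1_G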
HC3	0.9344	0.8140	0.9943	0.9989	0.9519	0.8036	0.9936	0.9984	0.9341	0.8171	0.9944	0.9989
Wiki	0.8547	0.7155	0.8843	0.9532	0.8721	0.7181	0.8152	0.9348	0.8512	0.7138	0.8958	0.9541
CCNews	0.7156	0.7650	0.7011	0.9337	0.6825	0.7477	0.6304	0.9670	0.7393	0.7729	0.7648	0.9313
CovidCM	0.8353	0.7192	0.9676	0.9676	0.8758	0.7286	0.9634	0.9903	0.8260	0.7133	0.9678	0.9669
ACLAbs	0.7050	0.8859	0.8745	0.8983	0.9692	0.9000	1.0000	1.0000	0.5915	0.8839	0.8571	0.8872
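For readers less familiar with the three metrics reported in the table, the sketch below shows how accuracy, precision, and F1 follow from confusion-matrix counts, treating "machine-generated" as the positive class. The counts in the example are made up for illustration and do not come from the paper.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    # Precision: share of texts flagged as machine-generated that really are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of machine-generated texts that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 100 machine-generated and 100 human-written texts.
metrics = detection_metrics(tp=90, fp=10, tn=90, fn=10)
print(metrics)
```

Because precision falls only when human-written text is wrongly flagged, a higher precision corresponds directly to fewer false positives.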

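On the four generalization datasets (Wiki, CCNews, CovidCM, ACLAbs), GPT-Pat reaches an average accuracy of 0.9457, 12.34% higher on average than the RoBERTa-based detector.
GPT-Pat achieves higher precision on several datasets (e.g., a precision of 0.9670 on CCNews), which reduces false positives.
The Siamese + embedding classifier, which uses both similarity and embedding features, performs best among the architectures tested.
Adaptive attacks (re-translation and partial polishing) degrade GPT-Pat less than they degrade the RoBERTa-based detector, indicating greater robustness in practice.
GPT-Pat maintains state-of-the-art performance on HC3 and generalizes well to out-of-domain data.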

This review was generated by AI and reviewed by a human editor.