QUICK REVIEW

[Paper Review] Survey on Plagiarism Detection in Large Language Models: The Impact of ChatGPT and Gemini on Academic Integrity

Shushanta Pudasaini, Luis Miralles-Pechuán | arXiv (Cornell University) | June 4, 2024

Academic integrity and plagiarism | Citations: 9

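One-Line Summary

This paper surveys how LLMs such as ChatGPT and Gemini affect academic integrity, reviews methods for detecting AI-generated content and plagiarism, and discusses the gaps in existing solutions and possible future remedies.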

ABSTRACT

The rise of Large Language Models (LLMs) such as ChatGPT and Gemini has posed new challenges for the academic community. With the help of these models, students can easily complete their assignments and exams, while educators struggle to detect AI-generated content. This has led to a surge in academic misconduct, as students present work generated by LLMs as their own, without putting in the effort required for learning. As AI tools become more advanced and produce increasingly human-like text, detecting such content becomes more challenging. This development has significantly impacted the academic world, where many educators are finding it difficult to adapt their assessment methods to this challenge. This research first demonstrates how LLMs have increased academic dishonesty, and then reviews state-of-the-art solutions for academic plagiarism in detail. A survey of datasets, algorithms, tools, and evasion strategies for plagiarism detection has been conducted, focusing on how LLMs and AI-generated content (AIGC) detection have affected this area. The survey aims to identify the gaps in existing solutions. Lastly, potential long-term solutions are presented to address the issue of academic plagiarism using LLMs based on AI tools and educational approaches in an ever-changing world.

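Research Motivation and Goals

Show how LLMs have increased academic misconduct and how they have affected plagiarism detection.
Survey the latest datasets, algorithms, tools, and evasion strategies for AI-generated content detection.
Identify the gaps, limitations, and evaluation challenges of current detectors.
Discuss long-term technical and educational solutions to AI-driven plagiarism.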

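Proposed Method

Review the existing literature on plagiarism and AIGC detection.
Catalog the datasets, detection algorithms, and tools used for AIGC detection.
Examine evasion techniques and their effect on detector reliability.
Analyze watermarking, zero-shot, and other detection approaches.
Highlight the gaps and propose potential benchmarks and educational solutions.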

Figure 1: Timeline indicating the release date and parameter of different GPT models by OpenAI.

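Experimental Results

Research Questions

RQ1: How have LLMs such as ChatGPT and Gemini affected academic misconduct and plagiarism detection?
RQ2: What are the main datasets, algorithms, and tools used for AI-generated content detection, and how effective are they?
RQ3: What evasion strategies exist for bypassing detectors, and what do they imply for current solutions?
RQ4: What are the gaps in current AIGC and plagiarism detection approaches, and what future directions are proposed?
RQ5: Which educational and technical solutions can address AI-driven plagiarism in the long term?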

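Key Findings

LLMs exacerbate academic dishonesty and complicate traditional plagiarism detection.
A variety of datasets, algorithms, and tools exist for AIGC detection, but continually evolving evasion strategies challenge their reliability.
Watermarking, zero-shot, and prompt-based methods form the core detection approaches, each trading off robustness against practicality.
Significant gaps in benchmark datasets and domain diversity hinder fair comparison across studies.
Several educational and policy-oriented solutions are discussed as part of a holistic response to AI-generated plagiarism.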
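
To make the watermarking approach named above more concrete, the following is a minimal sketch of one well-known family of watermark detectors, the "green list" scheme, in which a generator favors a pseudo-random subset of tokens and a detector counts how often that subset appears. It is an illustration only and is not taken from the surveyed paper: the hashing rule, the `GAMMA` value, the toy vocabulary, and the helper names `is_green`, `watermark_z_score`, and `toy_generate` are all assumptions made for this example.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed fraction of tokens treated as "green" at each step


def is_green(prev_token, token, gamma=GAMMA):
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode("utf-8")).digest()
    # Map the first 8 bytes to a number in [0, 1) and compare it with gamma.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def watermark_z_score(tokens, gamma=GAMMA):
    """z-score of the observed green-token count against the no-watermark expectation."""
    scored = len(tokens) - 1  # the first token has no predecessor to seed the hash
    if scored < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def toy_generate(vocab, length, watermark, rng):
    """Hypothetical stand-in for an LLM: samples uniformly, optionally only from green tokens."""
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        candidates = vocab
        if watermark:
            green = [t for t in vocab if is_green(tokens[-1], t)]
            candidates = green or vocab  # fall back if no green token exists
        tokens.append(rng.choice(candidates))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = [f"w{i}" for i in range(50)]
    marked = toy_generate(vocab, 201, watermark=True, rng=rng)
    plain = toy_generate(vocab, 201, watermark=False, rng=rng)
    print("watermarked z:", round(watermark_z_score(marked), 2))
    print("unmarked z:   ", round(watermark_z_score(plain), 2))
```

In this toy setting, a sequence sampled only from green tokens should score far above a decision threshold such as z > 4, while unmarked text should stay near zero. Rewriting or paraphrasing the text replaces tokens and pulls the score back toward zero, which is one way to picture the robustness versus practicality trade-off noted in the findings.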

Figure 2: Diagram demonstrating how ChatGPT and paraphrasing tools can be used to complete assignments.

This review was generated by AI and reviewed by a human editor.