QUICK REVIEW

[Paper Review] Can ChatGPT Really Understand Modern Chinese Poetry?

Shanshan Wang, Derek F. Wong | arXiv (Cornell University) | 2026. 03. 21.

Artificial Intelligence in Healthcare and Education | Citations: 0

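One-Line Summary

The paper proposes ECUMP, a framework for evaluating ChatGPT's understanding of modern Chinese poetry, and finds that across 48 poems its interpretations match the original poets' intent in over 73% of cases while its performance on poeticity lags.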

ABSTRACT

ChatGPT has demonstrated remarkable capabilities on both poetry generation and translation, yet its ability to truly understand poetry remains unexplored. Previous poetry-related work merely analyzed experimental outcomes without addressing fundamental issues of comprehension. This paper introduces a comprehensive framework for evaluating ChatGPT's understanding of modern poetry. We collaborated with professional poets to evaluate ChatGPT's interpretation of modern Chinese poems by different poets along multiple dimensions. Evaluation results show that ChatGPT's interpretations align with the original poets' intents in over 73% of the cases. However, its understanding in certain dimensions, particularly in capturing poeticity, proved to be less satisfactory. These findings highlight the effectiveness and necessity of our proposed framework. This study not only evaluates ChatGPT's ability to understand modern poetry but also establishes a solid foundation for future research on LLMs and their application to poetry-related tasks.

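Motivation and Objectives

Identify, based on expert input, five dimensions essential to understanding modern poetry: content, form of expression, thoughts and feelings, modernity, and poeticity.
Develop a prompt design that elicits multi-dimensional poem interpretations from ChatGPT.
Establish ground truth by comparing ChatGPT's interpretations against professional poets' evaluations.
Provide an evaluation framework and evidence to guide future LLM-based poetry tasks and research.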

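Proposed Method

Define five dimensions of poetry understanding, grounded in poetic theory and expert input.
Design and optimize a ChatGPT prompt for interpreting modern poems that covers all five dimensions (content, form of expression, thoughts and feelings, modernity, poeticity).
Build a dataset of 48 poems from six professional poets (Com-Poetry and Spe-Poetry) for the interpretation task.
Generate per-dimension interpretations with GPT-4 (gpt-4-0125) under fixed generation settings.
Collect ratings from the original poets, on a 0–100 scale for four dimensions and a 0/50/100 scale for poeticity, and obtain parallel LLM-based evaluations.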
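The two rating scales described above can be sketched as a small validated record. This is only an illustration under stated assumptions: the class name `PoetRating`, the dimension identifiers, and the `mean_scores` helper are hypothetical and do not come from the paper, which does not describe its data format in this review.

```python
from dataclasses import dataclass

# Dimension names follow the review's list; the identifiers are illustrative.
CONTINUOUS_DIMENSIONS = ("content", "expression", "thought_emotion", "modernity")
# Poeticity is rated on a three-level scale rather than a continuous one.
POETICITY_LEVELS = (0, 50, 100)


@dataclass(frozen=True)
class PoetRating:
    """One poet's rating of one GPT-4 interpretation (hypothetical schema)."""
    poem_id: str
    scores: dict    # dimension name -> score in [0, 100]
    poeticity: int  # one of POETICITY_LEVELS

    def __post_init__(self):
        # Reject records that are missing a dimension or fall outside 0-100.
        for dim in CONTINUOUS_DIMENSIONS:
            if dim not in self.scores:
                raise ValueError(f"missing score for dimension: {dim}")
            value = self.scores[dim]
            if not 0 <= value <= 100:
                raise ValueError(f"{dim} score {value} is outside 0-100")
        if self.poeticity not in POETICITY_LEVELS:
            raise ValueError(f"poeticity must be one of {POETICITY_LEVELS}")


def mean_scores(ratings):
    """Average each dimension over a non-empty list of PoetRating records."""
    if not ratings:
        raise ValueError("no ratings to average")
    n = len(ratings)
    means = {d: sum(r.scores[d] for r in ratings) / n for d in CONTINUOUS_DIMENSIONS}
    means["poeticity"] = sum(r.poeticity for r in ratings) / n
    return means
```

Keeping poeticity as a separate field, rather than a fifth entry in `scores`, makes it harder to mix the three-level scale with the continuous one by accident.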

Figure 1: The framework for evaluating ChatGPT’s understanding of modern poetry.

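Experimental Results

Research Questions

RQ1: Does ChatGPT genuinely understand modern Chinese poetry across the predefined dimensions?
RQ2: How well do ChatGPT's interpretations align with the original poets' intent across poem types (Com-Poetry versus Spe-Poetry)?
RQ3: Which dimension is hardest for ChatGPT to capture (for example, poeticity versus imagery)?

Key Results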

Dataset	Content	Language	Imagery	Rhetoric	Rhythm	Emotion	Thought	Modernity
Com-Poetry	80.33	79.05	81.18	77.83	76.15	79.40	78.80	79.88
Spe-Poetry	77.50	73.75	81.25	88.75	82.50	77.50	78.75	82.50

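GPT-4's interpretations align with the original poets' intent at a rate of 73% or higher across dimensions.
On Com-Poetry, imagery is the best-understood dimension, with an average score of 81.18.
On Spe-Poetry, the strengths are rhetoric (88.75), rhythm (82.50), and modernity (82.50).
Poeticity is GPT-4's weak dimension: it often struggles to identify the most poetic sentence in a poem (the tables report the 0/50/100 results for many poems).
Human poets' evaluations are more reliable than automatic LLM evaluations for assessing poetry understanding.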
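The strongest and weakest dimensions named above can be recomputed from the table's averages. The sketch below is an illustration only: the English column labels are translations of the table header (with "emotion" a best-effort reading), the row assignment to Com-Poetry and Spe-Poetry is inferred from the findings, and the unweighted mean across columns is not a figure reported by the paper.

```python
# Average scores copied from the table in this review; the English column
# labels are translations, and the row assignment follows the findings list.
DIMENSIONS = ("content", "language", "imagery", "rhetoric",
              "rhythm", "emotion", "thought", "modernity")

SCORES = {
    "Com-Poetry": (80.33, 79.05, 81.18, 77.83, 76.15, 79.40, 78.80, 79.88),
    "Spe-Poetry": (77.50, 73.75, 81.25, 88.75, 82.50, 77.50, 78.75, 82.50),
}


def summarize(row):
    """Return the unweighted mean plus the strongest and weakest dimension."""
    by_dim = dict(zip(DIMENSIONS, row))
    return {
        "mean": sum(row) / len(row),
        "strongest": max(by_dim, key=by_dim.get),
        "weakest": min(by_dim, key=by_dim.get),
    }


SUMMARY = {name: summarize(row) for name, row in SCORES.items()}

if __name__ == "__main__":
    for name, stats in SUMMARY.items():
        print(f"{name}: mean={stats['mean']:.2f}, "
              f"strongest={stats['strongest']}, weakest={stats['weakest']}")
```

With these numbers, imagery comes out strongest for Com-Poetry and rhetoric for Spe-Poetry, matching the findings listed above. Poeticity is absent because it is scored on the separate 0/50/100 scale and does not appear in this table.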

Figure 2: The distribution of evaluation scores for GPT-4’s interpretation of 48 poems.


This review was generated by AI and checked by a human editor.