QUICK REVIEW

[Paper Review] Can ChatGPT Really Understand Modern Chinese Poetry?

Shanshan Wang, Derek F. Wong|arXiv (Cornell University)|Mar 21, 2026

Artificial Intelligence in Healthcare and Education · Citations: 0

One-Line Summary

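This paper presents ECUMP, a framework for evaluating ChatGPT's understanding of modern Chinese poetry. Across 48 poems, ChatGPT's interpretations align with the original poets' intent in over 73% of cases, but its ability to capture poeticity is weak.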

ABSTRACT

ChatGPT has demonstrated remarkable capabilities on both poetry generation and translation, yet its ability to truly understand poetry remains unexplored. Previous poetry-related work merely analyzed experimental outcomes without addressing fundamental issues of comprehension. This paper introduces a comprehensive framework for evaluating ChatGPT's understanding of modern poetry. We collaborated with professional poets to evaluate ChatGPT's interpretation of modern Chinese poems by different poets along multiple dimensions. Evaluation results show that ChatGPT's interpretations align with the original poets' intents in over 73% of the cases. However, its understanding in certain dimensions, particularly in capturing poeticity, proved to be less satisfactory. These findings highlight the effectiveness and necessity of our proposed framework. This study not only evaluates ChatGPT's ability to understand modern poetry but also establishes a solid foundation for future research on LLMs and their application to poetry-related tasks.

Motivation and Objectives

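Identify, with expert input, five dimensions essential to understanding modern poetry: content, expressive techniques, thought and emotion, modernity, and poeticity.
Design prompts that elicit multi-dimensional interpretations of poems from ChatGPT.
Use professional poets' evaluations as the ground truth against which ChatGPT's interpretations are compared.
Provide an evaluation framework and supporting evidence to guide future LLM-based poetry tasks and research.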

Proposed Method

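Define five dimensions of poetry understanding grounded in poetic theory and expert opinion.
Design and refine ChatGPT prompts that elicit interpretations of modern poems across these dimensions (content, expressive techniques, thought and emotion, modernity, poeticity).
Build a dataset of 48 Com-Poetry and Spe-Poetry poems by six professional poets for the interpretation task.
Have GPT-4 (gpt-4-0125) generate an interpretation for each dimension under fixed generation settings.
Collect ratings from the original poets, with four dimensions scored on a 0–100 scale and poeticity scored as 0, 50, or 100, alongside a parallel LLM-based evaluation.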
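The rating scheme above mixes two scales: four dimensions on a continuous 0–100 range and poeticity restricted to three levels. The sketch below is a minimal, hypothetical record type that enforces both scales and averages each dimension over a set of ratings. The field names, dimension identifiers, and sample numbers are illustrative assumptions, not the paper's data format, and the finer-grained sub-dimensions reported in the results table are not modeled here.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the four dimensions rated on a 0-100 scale.
CONTINUOUS_DIMS = ("content", "expression", "thought_emotion", "modernity")
# Poeticity is rated on a three-level scale.
POETICITY_LEVELS = (0, 50, 100)


@dataclass(frozen=True)
class PoetRating:
    """One poet's rating of GPT-4's interpretation of one poem (hypothetical schema)."""
    poem_id: str
    scores: dict[str, float]  # the four 0-100 dimensions
    poeticity: int            # must be 0, 50 or 100

    def __post_init__(self) -> None:
        missing = set(CONTINUOUS_DIMS) - set(self.scores)
        if missing:
            raise ValueError(f"missing dimensions: {sorted(missing)}")
        for dim, value in self.scores.items():
            if dim not in CONTINUOUS_DIMS:
                raise ValueError(f"unknown dimension: {dim}")
            if not 0 <= value <= 100:
                raise ValueError(f"{dim} score {value} is outside 0-100")
        if self.poeticity not in POETICITY_LEVELS:
            raise ValueError(f"poeticity must be one of {POETICITY_LEVELS}")


def dimension_means(ratings: list[PoetRating]) -> dict[str, float]:
    """Average every dimension, including poeticity, over a list of ratings."""
    if not ratings:
        raise ValueError("no ratings given")
    n = len(ratings)
    means = {dim: sum(r.scores[dim] for r in ratings) / n for dim in CONTINUOUS_DIMS}
    means["poeticity"] = sum(r.poeticity for r in ratings) / n
    return means


if __name__ == "__main__":
    # Made-up ratings for two poems, for illustration only.
    sample = [
        PoetRating("p1", {"content": 80, "expression": 75, "thought_emotion": 70, "modernity": 90}, 50),
        PoetRating("p2", {"content": 60, "expression": 85, "thought_emotion": 80, "modernity": 70}, 100),
    ]
    print(dimension_means(sample))
```

Validating at construction time means a mis-entered poeticity value such as 30 is rejected immediately rather than silently skewing the averages.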

Figure 1: The framework for evaluating ChatGPT’s understanding of modern poetry.

Experimental Results

Research Questions

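RQ1: Does ChatGPT truly understand modern Chinese poetry across the predefined dimensions?
RQ2: How closely do ChatGPT's interpretations align with the original poets' intent across poem types (Com-Poetry vs. Spe-Poetry)?
RQ3: Which dimensions (for example, poeticity versus imagery) are hardest for GPT-4 to capture?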

Key Findings

Average scores (0–100) by dimension
Type	Cont	Lang	Imag	Rhet	Rhyt	Defa	Thou	Mode
Com-Poetry	80.33	79.05	81.18	77.83	76.15	79.40	78.80	79.88
Spe-Poetry	77.50	73.75	81.25	88.75	82.50	77.50	78.75	82.50

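Across the dimensions, GPT-4's interpretations align with the original poets' intent in over 73% of cases.
For Com-Poetry, imagery is the strongest dimension, with an average score of 81.18.
For Spe-Poetry, the strongest dimensions are rhetorical devices (88.75), rhythm (82.50), and modernity (82.50).
Poeticity, rated on the 0/50/100 scale, is GPT-4's weakest dimension: in many cases it fails to identify the most poetic line of a poem.
Evaluations by human poets are more reliable than automatic LLM-based evaluation for assessing poetry understanding.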
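The per-type strengths listed above can be read directly off the score table. As a small check, the sketch below copies the table values, ranks the dimensions within each poetry type, and computes an unweighted mean of the eight columns. The column abbreviations are kept exactly as in the table; the row labels follow the findings above. The unweighted mean is only an illustrative summary and is not the paper's 73% alignment figure, whose computation the review does not describe.

```python
from statistics import mean

# Column abbreviations as they appear in the table.
DIMENSIONS = ("Cont", "Lang", "Imag", "Rhet", "Rhyt", "Defa", "Thou", "Mode")

# Average scores copied from the table.
SCORES = {
    "Com-Poetry": (80.33, 79.05, 81.18, 77.83, 76.15, 79.40, 78.80, 79.88),
    "Spe-Poetry": (77.50, 73.75, 81.25, 88.75, 82.50, 77.50, 78.75, 82.50),
}


def summarize(row: tuple[float, ...]) -> dict[str, object]:
    """Return the best and worst (dimension, score) pairs and the unweighted mean of one row."""
    # Sort descending by score; ties keep the table's column order (sorted is stable).
    ranking = sorted(zip(DIMENSIONS, row), key=lambda pair: pair[1], reverse=True)
    return {"best": ranking[0], "worst": ranking[-1], "mean": mean(row)}


if __name__ == "__main__":
    for poetry_type, row in SCORES.items():
        s = summarize(row)
        print(f"{poetry_type}: best={s['best']}, worst={s['worst']}, mean={s['mean']:.2f}")
```

Running it confirms imagery (Imag) as the top Com-Poetry dimension and rhetoric (Rhet) as the top Spe-Poetry dimension, consistent with the findings.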

Figure 2: The distribution of evaluation scores for GPT-4’s interpretation of 48 poems.


This review was generated by AI and checked by a human editor.