QUICK REVIEW

[Paper Review] Human Label Variation in Implicit Discourse Relation Recognition

Frances Yung, Daniil Ignatev|arXiv (Cornell University)|Feb 26, 2026

Topic Modeling | Citations: 0

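One-Sentence Summary

This study shows that, for English implicit discourse relation recognition on DiscoGeM, label distribution learning yields more stable predictions, whereas annotator-specific models struggle at the fine-grained level because of disagreement driven by cognitive demand.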

ABSTRACT

There is growing recognition that many NLP tasks lack a single ground truth, as human judgments reflect diverse perspectives. To capture this variation, models have been developed to predict full annotation distributions rather than majority labels, while perspectivist models aim to reproduce the interpretations of individual annotators. In this work, we compare these approaches on Implicit Discourse Relation Recognition (IDRR), a highly ambiguous task where disagreement often arises from cognitive complexity rather than ideological bias. Our experiments show that existing annotator-specific models perform poorly in IDRR unless ambiguity is reduced, whereas models trained on label distributions yield more stable predictions. Further analysis indicates that frequent cognitively demanding cases drive inconsistency in human interpretation, posing challenges for perspectivist modeling in IDRR.

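Motivation and Objectives

Motivate the need to handle annotator disagreement in implicit discourse relation recognition (IDRR) rather than relying on a single ground-truth label.
Evaluate how different modeling paradigms (disagreement-aware, distribution-focused, and perspectivist) perform on the IDRR task.
Evaluate performance at two label granularities: Level-1 (5 classes) and Level-2 (17 classes).
Analyze the sources of annotator disagreement and the cognitive demands that affect model training in IDRR.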

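Proposed Method

Use the DiscoGeM dataset, which provides multi-annotator labels for English implicit discourse relations.
Train and compare three task settings: single-label prediction, label distribution prediction, and annotator-specific label prediction.
Use RoBERTa-base as the shared backbone across all architectures.
Evaluate three soft-label approaches: multi-label BCE, label-dist with KL-divergence, and an ST-based baseline.
Compare perspectivist models (MT and AE) against non-perspectivist baselines and a majority-vote baseline.
Report results using macro-F1, accuracy, cross-entropy, Jensen–Shannon divergence, Manhattan distance, and Euclidean distance.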
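
To make the distribution-based setup more concrete, the following is a minimal sketch of how per-item annotations might be turned into a soft label and compared with a predicted distribution using the distance measures listed above. It is not the authors' code: the label names (`rel_0` to `rel_4`), the toy annotations, and the predicted vector are hypothetical placeholders, and the direction KL(human || predicted) is one common choice rather than a detail confirmed by this review.

```python
import math
from collections import Counter

# Hypothetical placeholder names for the five Level-1 classes.
LABELS = ["rel_0", "rel_1", "rel_2", "rel_3", "rel_4"]

def label_distribution(annotations, labels):
    """Turn the annotator labels for one item into a probability vector."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts[label] / total for label in labels]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); q is clipped away from zero, terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy of prediction q against target distribution p."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log, so bounded above by ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def manhattan(p, q):
    """L1 distance between two distributions."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def euclidean(p, q):
    """L2 distance between two distributions."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Toy example: five annotators labeled one item (invented data).
human = label_distribution(["rel_0", "rel_0", "rel_1", "rel_0", "rel_3"], LABELS)
predicted = [0.5, 0.3, 0.1, 0.1, 0.0]  # hypothetical model output

print("soft label:", human)
print("KL:", kl_divergence(human, predicted))
print("cross-entropy:", cross_entropy(human, predicted))
print("JS:", js_divergence(human, predicted))
print("Manhattan:", manhattan(human, predicted))
print("Euclidean:", euclidean(human, predicted))
```

Unlike KL divergence and cross-entropy, the Jensen-Shannon, Manhattan, and Euclidean measures are symmetric and bounded, which may be one reason to report several of them side by side.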

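Experimental Results

Research Questions

RQ1: How do disagreement-aware, distribution-based, and perspectivist models perform on IDRR at Level-1 (coarse) and Level-2 (fine) granularity?
RQ2: Which yields more robust IDRR performance under high label ambiguity: learning from label distributions or learning from annotator-specific predictions?
RQ3: Which cognitive or annotation factors give rise to human disagreement, and how do they affect how predictable the labels are for a model?
RQ4: Can perspectivist models predict fine-grained annotator interpretations in a realistic setting, or are distribution-focused approaches more reliable?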

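Key Findings

Disagreement-aware approaches improve prediction accuracy for the majority (single) label.
Annotator-specific models perform well at Level-1 but degrade at Level-2 as the number of classes grows.
Learning from label distributions (soft labels) gives the strongest distribution-prediction performance, especially when the annotators are unseen.
Perspectivist models can predict distributions, but they lose effectiveness at the fine-grained level because annotators are inconsistent on cognitively demanding cases.
Annotator consistency and agreement with the majority vary from worker to worker, and high disagreement correlates with lower perspectivist model performance.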
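
As an illustration of the last finding, per-worker agreement with the majority label could be computed roughly as follows. This is a sketch under stated assumptions, not the paper's analysis: the data layout (item id mapped to a worker-to-label dictionary), the worker ids, and the labels are invented, ties are broken by first occurrence, and each worker's own vote is counted when forming the majority.

```python
from collections import Counter, defaultdict

def majority_label(labels):
    """Most frequent label; ties are broken by first occurrence in the list."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label

def agreement_with_majority(annotations):
    """annotations: dict of item_id -> dict of worker_id -> label.

    Returns, for each worker, the fraction of their labels that match the
    per-item majority label (the worker's own vote is included in the majority).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for by_worker in annotations.values():
        maj = majority_label(list(by_worker.values()))
        for worker, label in by_worker.items():
            totals[worker] += 1
            hits[worker] += int(label == maj)
    return {worker: hits[worker] / totals[worker] for worker in totals}

# Toy data with hypothetical workers and labels.
toy = {
    "item1": {"w1": "a", "w2": "a", "w3": "b"},
    "item2": {"w1": "c", "w2": "c", "w3": "c"},
    "item3": {"w1": "a", "w2": "b", "w3": "b"},
}
rates = agreement_with_majority(toy)
for worker, rate in sorted(rates.items()):
    print(worker, round(rate, 3))
```

A leave-one-out majority, which excludes the worker being scored, would be a stricter variant; which definition the paper uses is not stated in this review.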

This review was generated by AI and checked by a human editor.