QUICK REVIEW

[Paper Review] The Gray Area: Characterizing Moderator Disagreement on Reddit

Shayan Alipour, Shruti Phadke|arXiv (Cornell University)|Jan 4, 2026

Hate Speech and Cyberbullying Detection | Citations: 0

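One-line summary

This paper analyzes gray-area moderation on Reddit and shows that moderators dispute roughly one in seven cases, that automated actions are often overturned by humans, and that LLMs struggle more with gray-area content than with undisputed cases.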

ABSTRACT

Volunteer moderators play a crucial role in sustaining online dialogue, but they often disagree about what should or should not be allowed. In this paper, we study the complexity of content moderation with a focus on disagreements between moderators, which we term the "gray area" of moderation. Leveraging 5 years and 4.3 million moderation log entries from 24 subreddits of different topics and sizes, we characterize how gray area, or disputed cases, differ from undisputed cases. We show that one-in-seven moderation cases are disputed among moderators, often addressing transgressions where users' intent is not directly legible, such as in trolling and brigading, as well as tensions around community governance. This is concerning, as almost half of all gray area cases involved automated moderation decisions. Through information-theoretic evaluations, we demonstrate that gray area cases are inherently harder to adjudicate than undisputed cases and show that state-of-the-art language models struggle to adjudicate them. We highlight the key role of expert human moderators in overseeing the moderation process and provide insights about the challenges of current moderation processes and tools.

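Motivation and objectives

Quantify how often moderation cases are disputed among moderators (the gray area) compared with undisputed cases.
Break down the characteristics of the gray area by whether human or automated moderators are involved.
Examine the moderator justifications, content features, and dynamics that give rise to disputes.
Assess the feasibility and limits of LLM-based moderation on gray-area cases.
Provide insights for improving moderator tooling and the transparency of governance.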

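Proposed method

Analyze OpenModLog, a longitudinal dataset of 4.3 million moderation actions from 24 subreddits over 5 years.
Define the gray area as cases in which multiple moderators took two or more conflicting moderation actions, treating the last action as ground truth.
Stratify cases into four strata: gray-human, gray-bot, undisputed-human, and undisputed-bot.
Annotate moderator justifications using Bies and related taxonomies, mapping free-text reasons to 472 clusters and 7 thematic categories.
Apply topic modeling (BERTopic with debiasing) to the comment text to identify gray-area content themes.
Assess text difficulty with pointwise V-information (PVI), and benchmark six LLMs on how well they align with the final moderation decision.
Evaluate the LLMs across strata under deterministic prompting, measuring macro-F1 with bootstrapped confidence intervals.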
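For readers unfamiliar with PVI, the following is a sketch of the standard definition from the V-usable information literature. This review does not state the model family or the exact training setup the authors used, so the symbols g, g', x, and y are notation assumed here for illustration, not taken from the paper.

```latex
\mathrm{PVI}(x \to y) = -\log_2 g'[\varnothing](y) + \log_2 g[x](y)
```

Here g is a model fine-tuned on (text, label) pairs, g' is the same model fine-tuned with the text replaced by a null input, x is a comment, and y is its final moderation decision. A higher PVI means the text makes the label easier to predict; a lower or negative PVI marks an instance as harder, which is the sense in which gray-area cases are described as harder to adjudicate.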

Figure 1: (a) An example showing multiple moderation actions on a single comment. (b) Cases are partitioned into four mutually exclusive strata by (i) whether there is within-case disagreement defined as more than one unique moderator and more than one unique action and (ii) whether any moderator is
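The partition into four strata can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `Action` record and its fields are hypothetical, since the log schema is not given in this review, and the bot rule (a case counts as "bot" if at least one of its moderators is a bot) is inferred from the findings rather than from the truncated caption.

```python
from typing import List, NamedTuple


class Action(NamedTuple):
    """One moderation log entry (hypothetical schema, assumed for illustration)."""
    moderator: str  # moderator account name
    action: str     # e.g. "remove" or "approve"
    is_bot: bool    # assumed flag marking automated moderators


def stratum(actions: List[Action]) -> str:
    """Assign one case (all actions on a single comment, in time order) to a stratum."""
    moderators = {a.moderator for a in actions}
    unique_actions = {a.action for a in actions}
    # Disagreement: more than one unique moderator AND more than one unique action.
    disputed = len(moderators) > 1 and len(unique_actions) > 1
    # Assumption: a case is a "bot" case if any moderator in it is a bot.
    has_bot = any(a.is_bot for a in actions)
    return f"{'gray' if disputed else 'undisputed'}-{'bot' if has_bot else 'human'}"


def final_decision(actions: List[Action]) -> str:
    """The last action in the case is treated as ground truth."""
    return actions[-1].action
```

For example, a case with a bot removal followed by a human approval falls into the gray-bot stratum, and its ground-truth label is the approval.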

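Experimental results

Research questions

RQ1: What are the prevalence and composition of gray-area moderation cases across multiple subreddits on Reddit?
RQ2: How do gray-area cases differ between moderation by humans and moderation by automated systems?
RQ3: Which justifications and content features give rise to gray-area disputes?
RQ4: Can current LLMs reliably align with final moderation decisions, especially for gray-area content?
RQ5: What factors affect LLM performance on gray-area and undisputed moderation decisions?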

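Key findings

Gray-area moderation is common, accounting for about 13.54% of all moderation actions (n=578,251 / 4,272,178).
Of the gray-area cases, 53.86% involve only human moderators and 46.14% include at least one bot moderator.
Initial bot actions are overwhelmingly removals (95.49%), whereas humans show a more balanced pattern (68.67% approvals, 31.33% removals).
In bot→human sequences, 93.46% of cases consist of an initial bot removal followed by a human approval, suggesting that over-moderation by bots is later corrected by humans.
Experienced human moderators tend to overturn the decisions of less experienced peers; the overturning moderator is on average 51 days more senior (95% CI [37, 65]).
LLMs align better on gray-bot and undisputed-human cases than on gray-human and undisputed-bot cases: macro-F1 is about 0.61 for undisputed-human and about 0.59 for gray-bot, but about 0.51 for gray-human and undisputed-bot.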
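The macro-F1 scores with bootstrapped confidence intervals could be computed along the following lines. This is a hedged sketch under assumptions, not the authors' evaluation code: it uses a plain percentile bootstrap over cases, and the function names, the number of resamples, and the seed are all illustrative choices.

```python
import random


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for macro-F1, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

Running `bootstrap_ci` separately on each stratum, with the final moderation action as `y_true` and the LLM's decision as `y_pred`, would give one interval per stratum that can be compared across the four groups.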

Figure 2: Share of moderation cases by stratum (gray vs. undisputed) across rule categories. Gray-area cases are relatively overrepresented in trolling, brigading, and doxxing, while spam, link-only, and formatting violations make up a larger share of undisputed cases.


This review was generated by AI and checked by a human editor.