QUICK REVIEW

[Paper Review] See, Explain, and Intervene: A Few-Shot Multimodal Agent Framework for Hateful Meme Moderation

Naquee Rizwan, Subhankar Swain|arXiv (Cornell University)|Jan 8, 2026

Hate Speech and Cyberbullying Detection|Citations: 0

One-line summary

This paper proposes a few-shot multimodal agent framework that jointly detects hateful memes, generates explanations, and suggests interventions. It uses task-specific agents to create silver data for few-shot prompting of large models.

ABSTRACT

In this work, we examine hateful memes from three complementary angles - how to detect them, how to explain their content and how to intervene them prior to being posted - by applying a range of strategies built on top of generative AI models. To the best of our knowledge, explanation and intervention have typically been studied separately from detection, which does not reflect real-world conditions. Further, since curating large annotated datasets for meme moderation is prohibitively expensive, we propose a novel framework that leverages task-specific generative multimodal agents and the few-shot adaptability of large multimodal models to cater to different types of memes. We believe this is the first work focused on generalizable hateful meme moderation under limited data conditions, and has strong potential for deployment in real-world production scenarios. Warning: Contains potentially toxic contents.

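Motivation and objectives

Bridge the gap between detection, explanation, and intervention in hateful meme moderation by presenting an end-to-end framework that works under limited data conditions.
Use task-specific, fine-tuned multimodal agents to generate silver training data that enables end-to-end few-shot learning on large models.
Extend existing hateful meme benchmarks by curating a consistent dataset that supports end-to-end evaluation of classification, explanation, and intervention.
Show state-of-the-art results on standard hateful meme benchmarks in low-resource settings by combining silver-data few-shot prompting with GPT-4o.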

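Proposed method

Train three task-specific agents (captioning, explanation, intervention) using a small multimodal model (paligemma-3b-pt-448) and existing datasets (MemeCap, HatReDAug, MemeSense), then use them to generate silver data.
Use exemplar-based few-shot prompting with cosine similarity over SigLIP embeddings, selecting highly relevant exemplars from the neighborhood of each test sample.
Process each exemplar with the three agents to obtain a caption, an explanation, and an intervention (where applicable), and feed this enriched context to the larger multimodal model for prediction.
Apply a two-stage framework: (i) generate silver data with the task-specific agents; (ii) run few-shot prompting for classification, explanation, and intervention with large models (GPT-4o, Intern-VL3, Pixtral).
Evaluate classification with accuracy and macro-F1, and evaluate explanation and intervention with Rouge-L, Semantic Similarity, and BertScore-F1.
Compare against multiple baselines, including PromptHate, Pro-Cap, ModHate, few-shot methods, and MemeSense, on the FHM and MAMI datasets.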
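
The exemplar selection and context-building steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary keys, and the prompt layout are all assumptions made for this example, and the toy 3-dimensional vectors merely stand in for SigLIP embeddings. The silver annotations are shown as placeholders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_exemplars(query_embedding, pool, k=2):
    """Return the k pool entries most similar to the query embedding."""
    ranked = sorted(
        pool,
        key=lambda ex: cosine_similarity(query_embedding, ex["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_few_shot_context(exemplars):
    """Format exemplars and their silver annotations as plain-text context.

    The layout is hypothetical; the review does not specify a prompt format.
    """
    blocks = []
    for i, ex in enumerate(exemplars, start=1):
        lines = [f"Example {i}", f"Caption: {ex['caption']}", f"Label: {ex['label']}"]
        # Explanation and intervention are added only where applicable.
        if ex.get("explanation"):
            lines.append(f"Explanation: {ex['explanation']}")
        if ex.get("intervention"):
            lines.append(f"Intervention: {ex['intervention']}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Toy exemplar pool; embeddings and annotations are placeholders.
pool = [
    {"id": "a", "embedding": [1.0, 0.0, 0.0], "label": "hateful",
     "caption": "<silver caption A>", "explanation": "<silver explanation A>",
     "intervention": "<silver intervention A>"},
    {"id": "b", "embedding": [0.0, 1.0, 0.0], "label": "non-hateful",
     "caption": "<silver caption B>"},
    {"id": "c", "embedding": [0.7, 0.7, 0.0], "label": "non-hateful",
     "caption": "<silver caption C>"},
]

query = [0.9, 0.1, 0.0]  # stand-in for the embedding of a test meme
selected = select_exemplars(query, pool, k=2)
context = build_few_shot_context(selected)
print(context)
```

In a real pipeline the resulting context would be passed, together with the test meme, to the large multimodal model. The value of `k` here is arbitrary.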

Figure 1: Overview of our novel task formulation.

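Experimental results

Research questions

RQ1: Under limited data conditions, can an end-to-end hateful meme moderation system perform classification, explanation, and intervention together?
RQ2: How effective are small, task-specific multimodal agents at generating silver data that is useful for few-shot learning with large models?
RQ3: Does few-shot prompting with enriched exemplars and agent-generated explanations and interventions outperform existing baselines on standard hateful meme benchmarks?
RQ4: What are the qualitative properties (consistency, sentiment, tokenization) of the explanations and interventions generated in this setting?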

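Key findings

Few-shot classification with GPT-4o achieves a macro-F1 of 80.25% on FHM and 89.07% on MAMI, outperforming the baselines.
Explanations generated by GPT-4o have higher semantic similarity than HatReD-based explanations on FHM and MAMI (0.679 and 0.654, respectively).
Intern-VL3 and Pixtral outperform MemeSense on intervention generation, with semantic similarity of 0.777 and 0.849 on FHM and MAMI.
GPT-4o produces the most consistent explanations and interventions, with more consistent token counts, lower perplexity, and better semantic alignment across datasets.
Open models tend to produce more repetitive intervention text, whereas GPT-4o shows higher lexical diversity in its explanations but somewhat larger variation on non-hateful cases.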
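
For readers unfamiliar with the classification metrics reported above, the following sketch shows how accuracy and macro-F1 are computed: macro-F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of its frequency. The function names and the toy labels are illustrative assumptions and do not come from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels: 1 = hateful, 0 = non-hateful (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)
score = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} macro-F1={score:.4f}")
```

On this toy input the per-class F1 scores are 0.8 for class 0 and about 0.667 for class 1, so macro-F1 is about 0.733 while accuracy is 0.75.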

Figure 2: Overview of fine-tuning task specific agents and using them for silver data generation of FHM and MAMI datasets.

This review was generated by AI and checked by a human editor.