QUICK REVIEW

[Paper Review] Unveiling Covert Toxicity in Multimodal Data via Toxicity Association Graphs: A Graph-Based Metric and Interpretable Detection Framework

Gang Wu, Zihao Zhu | arXiv (Cornell University) | Feb 3, 2026

Hate Speech and Cyberbullying Detection | Citations: 0

One-Line Summary

The paper introduces Toxicity Association Graphs (TAGs) and the Multimodal Toxicity Covertness (MTC) metric, and contributes a new Covert Toxic Dataset (CTD) together with TA-CTD, an interpretable framework for detecting latent toxicity in image-text data.

ABSTRACT

Detecting toxicity in multimodal data remains a significant challenge, as harmful meanings often lurk beneath seemingly benign individual modalities: only emerging when modalities are combined and semantic associations are activated. To address this, we propose a novel detection framework based on Toxicity Association Graphs (TAGs), which systematically model semantic associations between innocuous entities and latent toxic implications. Leveraging TAGs, we introduce the first quantifiable metric for hidden toxicity, the Multimodal Toxicity Covertness (MTC), which measures the degree of concealment in toxic multimodal expressions. By integrating our detection framework with the MTC metric, our approach enables precise identification of covert toxicity while preserving full interpretability of the decision-making process, significantly enhancing transparency in multimodal toxicity detection. To validate our method, we construct the Covert Toxic Dataset, the first benchmark specifically designed to capture high-covertness toxic multimodal instances. This dataset encodes nuanced cross-modal associations and serves as a rigorous testbed for evaluating both the proposed metric and detection framework. Extensive experiments demonstrate that our approach outperforms existing methods across both low- and high-covertness toxicity regimes, while delivering clear, interpretable, and auditable detection outcomes. Together, our contributions advance the state of the art in explainable multimodal toxicity detection and lay the foundation for future context-aware and interpretable approaches. Content Warning: This paper contains examples of toxic multimodal content that may be offensive or disturbing to some readers. Reader discretion is advised.

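Research Motivation and Objectives

Model the semantic associations between benign visual/textual concepts and latent toxic implications through a graph structure (TAGs).
Quantify covert toxicity with a scalar MTC score that measures the degree of concealment in multimodal expressions.
Build CTD, a benchmark targeting high-covertness toxic multimodal instances.
Provide an interpretable detection path that includes explanations for toxicity decisions.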

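Proposed Method

Define a Toxicity Association Graph (TAG) as a tuple consisting of a pair of association trees, one visual and one textual, plus a cross-modal bipartite graph.
Construct TAGs by rooting the tree at an image concept, limiting the branching of child nodes, and using transition probabilities to form hierarchical reasoning paths.
Compute Multimodal Toxicity Covertness (MTC) as c = 1 - p_hat, where p_hat is the product of the cumulative root-to-node transition probabilities in both modalities.
Develop TA-CTD, which uses TAGs to match against an oracle toxic set and generates LLM explanations to detect toxicity.
Introduce a multi-agent data generation pipeline built on GPT-4.1 and GPT-Image-1 that produces the human-verified, high-covertness toxic image-text pairs of CTD.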
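
The MTC definition above can be illustrated with a small Python sketch. It assumes that each modality contributes one root-to-node path whose edges carry transition probabilities. The function names `path_probability` and `mtc_score` and the example probabilities are hypothetical and not taken from the paper, and this review does not specify how the paper selects the paths.

```python
from math import prod


def path_probability(transition_probs):
    """Cumulative probability of one root-to-node path.

    Assumption: the cumulative probability is the product of the
    per-edge transition probabilities along the path.
    """
    for p in transition_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"transition probability out of range: {p}")
    # prod of an empty sequence is 1.0, i.e. the root itself is the target node
    return prod(transition_probs, start=1.0)


def mtc_score(visual_path, textual_path):
    """Covertness c = 1 - p_hat, where p_hat is the product of the
    cumulative path probabilities in the visual and textual trees."""
    p_hat = path_probability(visual_path) * path_probability(textual_path)
    return 1.0 - p_hat


# Hypothetical example: two hops in the visual tree, one hop in the textual tree.
# p_hat = (0.5 * 0.5) * 0.5 = 0.125, so c = 0.875
print(mtc_score([0.5, 0.5], [0.5]))
```

Under these assumptions, longer or less likely association paths push p_hat toward 0 and the covertness toward 1, while content whose root concepts are already toxic (empty paths) receives a covertness of 0.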

Figure 1 : Image-text examples with increasing covertness levels: (a) both modalities are toxic, (b) only one modality is toxic, (c) both modalities are non-toxic.

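Experimental Results

Research Questions

RQ1: Do Toxicity Association Graphs enable detection of covert toxicity that arises only from cross-modal associations?
RQ2: How can the covertness of multimodal content be quantified with a metric such as MTC?
RQ3: Does integrating TAGs with moderation models improve toxicity detection in both overt and covert cases?
RQ4: Is the Covert Toxic Dataset a challenging benchmark for high-covertness multimodal toxicity?
RQ5: Do TAG-derived explanations provide transparent and auditable decision paths?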

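Key Findings

TAG-based reasoning improves covert toxicity detection for multiple MLLMs compared with basic input.
TA-CTD substantially raises F2 scores on CTD; for example, Gemma3 improves from 0.31 to 0.82, and Llama 3.2 Vision approaches 0.97 in the high-covertness setting.
The CTD dataset exhibits predominantly high MTC values and, unlike several existing datasets, focuses on high-covertness toxicity.
Ablation experiments show that deep TAGs (l_max = 4) are essential for high-covertness detection, whereas shallow TAGs suffice for low covertness.
TA-CTD generalizes to overt and covert toxicity on Hateful Memes and VLSBench, improving the F2 score over Vanilla on the mixed dataset.
Case studies show interpretable paths that link benign visual/textual cues to latent toxic implications.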
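
The F2 scores reported above weight recall more heavily than precision, which suits moderation settings where missing a toxic item is costlier than a false alarm. As a reminder of how the metric behaves, here is a minimal sketch of the general F-beta formula. The function name `f_beta` and the confusion counts are made up for illustration and are not results from the paper.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion counts.

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R),
    with precision P = tp / (tp + fp) and recall R = tp / (tp + fn).
    beta = 2 gives the F2 score, which favors recall.
    """
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical detector: catches every toxic item but raises many false alarms.
# Precision = 0.5, recall = 1.0
print(f_beta(tp=5, fp=5, fn=0, beta=2.0))  # F2, rewards the perfect recall
print(f_beta(tp=5, fp=5, fn=0, beta=1.0))  # F1, penalizes the low precision more
```

With these counts, F2 comes out higher than F1, because the detector's perfect recall counts for more under beta = 2.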

Figure 2 : Workflow of TA-CTD and computation of Multimodal Toxicity Covertness score.

This review was generated by AI and checked by a human editor.