QUICK REVIEW

[Paper Review] Into the LAIONs Den: Investigating Hate in Multimodal Datasets

Abeba Birhane, Vinay Uday Prabhu|arXiv (Cornell University)|Nov 6, 2023

Hate Speech and Cyberbullying Detection|Citations: 16

One-Line Summary

This paper examines two open-source vision-language datasets, LAION-400M and LAION-2B-en, and shows that hateful content increases with dataset scale and that image-based NSFW filtering does not fully remove harmful alt-text.

ABSTRACT

'Scale the model, scale the data, scale the compute' is the reigning sentiment in the world of generative AI today. While the impact of model scaling has been extensively studied, we are only beginning to scratch the surface of data scaling and its consequences. This is especially of critical importance in the context of vision-language datasets such as LAION. These datasets are continually growing in size and are built based on large-scale internet dumps such as the Common Crawl, which is known to have numerous drawbacks ranging from quality, legality, and content. The datasets then serve as the backbone for large generative models, contributing to the operationalization and perpetuation of harmful societal and historical biases and stereotypes. In this paper, we investigate the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B. Our results show that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively using a metric that we term as Hate Content Rate (HCR). We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text. Instead, we found that trace amounts of hateful, targeted, and aggressive text remain even when carrying out conservative filtering. We end with a reflection and a discussion of the significance of our results for dataset curation and usage in the AI community. Code and the meta-data assets curated in this paper are publicly available at https://github.com/vinayprabhu/hate_scaling. Content warning: This paper contains examples of hateful text that might be disturbing, distressing, and/or offensive.

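Motivation and Objectives

Motivate and ground the need to audit large-scale multimodal datasets, looking beyond the model-centric belief in scaling.
Evaluate how scaling the sample from 400M to 2B-en affects hateful, targeted, and aggressive alt-text content.
Evaluate the relationship between image NSFW labels and the toxicity of the accompanying alt-text.
Propose methodological and policy recommendations for creating and using datasets transparently and fairly.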

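Proposed Method

Subsample 100,000 rows from each dataset shard, yielding 3.2 million (LAION-400M) and 12.8 million (LAION-2B-en) image-text pairs.
Use the pysentimiento hate speech analyzer to obtain three scores (hateful, targeted, aggressive) for each alt-text.
Define the Hate Content Rate (HCR) as the fraction of samples whose score exceeds a threshold P_threshold, computed for each category and for any-of-the-three.
Compare HCR across the datasets and assess statistical differences using threshold-based curves and Wilson score intervals.
Analyze file-wise HCR over 32 shards for LAION-400M and 128 shards for LAION-2B-en, and run Welch's t-test to compare the means.
Using the LAION-2B-en sample, evaluate the Pearson correlation between image NSFW labels and alt-text toxicity.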
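
The HCR definition and the Wilson score interval mentioned above can be illustrated with a minimal sketch. It assumes the three per-category scores have already been computed for each alt-text and stored as dictionaries; the function names (hate_content_rate, wilson_interval), the dictionary layout, and the 95% z-value are illustrative assumptions, not the authors' code.

```python
import math

# Category names follow the three scores described in the review.
CATEGORIES = ("hateful", "targeted", "aggressive")


def count_hits(scores, p_threshold, category=None):
    """Count samples whose score exceeds p_threshold.

    scores: list of dicts mapping each category to a probability (assumed layout).
    category: one of CATEGORIES, or None for any-of-the-three.
    """
    if category is None:
        return sum(1 for s in scores if any(s[c] > p_threshold for c in CATEGORIES))
    return sum(1 for s in scores if s[category] > p_threshold)


def hate_content_rate(scores, p_threshold, category=None):
    """HCR: fraction of samples whose score exceeds p_threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return count_hits(scores, p_threshold, category) / len(scores)


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = hits / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical toy scores, for illustration only (not data from the paper).
toy_scores = [
    {"hateful": 0.9, "targeted": 0.1, "aggressive": 0.2},
    {"hateful": 0.1, "targeted": 0.6, "aggressive": 0.2},
    {"hateful": 0.1, "targeted": 0.1, "aggressive": 0.2},
    {"hateful": 0.2, "targeted": 0.3, "aggressive": 0.4},
]
any_hcr = hate_content_rate(toy_scores, p_threshold=0.5)
any_ci = wilson_interval(count_hits(toy_scores, 0.5), len(toy_scores))
```

Sweeping p_threshold over a grid and plotting the resulting rates for each dataset would give threshold-based curves of the kind the method describes; non-overlapping Wilson intervals at a given threshold are one simple way to read off a difference.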

Figure 1 : HCR curves for the LAION400M and LAION-2B-en datasets using pysentimiento outputs showing that Hate Content Rate increased with dataset size.

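Experimental Results

Research Questions

RQ1: Does scaling from LAION-400M to LAION-2B-en increase the prevalence of hateful, targeted, and aggressive alt-text?
RQ2: How well does image-based NSFW filtering agree with the toxicity detected in alt-text?
RQ3: Does file-level HCR agree with dataset-level HCR when shards are compared across the datasets?
RQ4: What recommendations follow for transparent and robust auditing and curation of large-scale vision-language datasets?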

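Key Findings

LAION-2B-en has a higher any-of-the-three HCR than LAION-400M across thresholds, indicating that hateful content increases with scale.
At P_threshold = 0.5, the hate speech HCR reaches up to 0.7 for LAION-2B-en and up to 0.6 for LAION-400M.
File-wise HCR is higher for LAION-2B-en in each of the hateful, targeted, and aggressive categories, with strong statistical support (very small p-values) under Welch's t-test.
Image NSFW labels correlate only weakly with hateful/targeted alt-text (correlation coefficients of roughly 0.215 to 0.227), and more weakly still with aggressive content (0.076).
NSFW filtering based on images alone does not reliably remove hateful or targeted alt-text; some harmful content remains even in subsets deemed safe.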
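
The two statistics behind the file-wise comparison and the NSFW correlation can be sketched as follows. This is a minimal pure-Python illustration using made-up numbers; the helper names (welch_t, pearson_r) are assumptions, and the sketch returns only the Welch t statistic and its Welch-Satterthwaite degrees of freedom rather than a p-value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unequal-variance t statistic and degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two values")
    # Per-sample variance of the mean (sample variance divided by sample size).
    va, vb = variance(a) / na, variance(b) / nb
    if va + vb == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least two")
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical per-file HCR values, for illustration only (not data from the paper).
hcr_small = [1.0, 2.0, 3.0, 4.0, 5.0]
hcr_large = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(hcr_small, hcr_large)
```

In practice the 32 and 128 file-level HCR values would take the place of the toy lists, and a p-value would come from the t distribution with the computed degrees of freedom. For the correlation, one sequence would hold the image NSFW values and the other the alt-text scores for the same rows.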

Figure 2 : Fused swarm-box-violinplot that captures the file-wise HCR metrics for all the 160 (=32+128) parquet files from LAION400M and LAION-2B-en. HCRs for LAION-2B-en (the red swarms) are higher than the 32 file-level HCRs for the LAION400M (the blue swarms) for all three sub-categories – hatefu

This review was generated by AI and checked by a human editor.