QUICK REVIEW

[Paper Review] AMMeBa: A Large-Scale Survey and Dataset of Media-Based Misinformation In-The-Wild

Nicholas Dufour, Arkanath Pathak|arXiv (Cornell University)|May 19, 2024

Misinformation and Its Impacts|Cited by 6

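ONE-SENTENCE SUMMARY

AMMeBa presents a two-year human-annotation study of media-based misinformation in the wild, focusing on images and media-related claims drawn from ClaimReview fact checks, and releases a public dataset of annotations describing media types and manipulation methods.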

ABSTRACT

The prevalence and harms of online misinformation is a perennial concern for internet platforms, institutions and society at large. Over time, information shared online has become more media-heavy and misinformation has readily adapted to these new modalities. The rise of generative AI-based tools, which provide widely-accessible methods for synthesizing realistic audio, images, video and human-like text, have amplified these concerns. Despite intense public interest and significant press coverage, quantitative information on the prevalence and modality of media-based misinformation remains scarce. Here, we present the results of a two-year study using human raters to annotate online media-based misinformation, mostly focusing on images, based on claims assessed in a large sample of publicly-accessible fact checks with the ClaimReview markup. We present an image typology, designed to capture aspects of the image and manipulation relevant to the image's role in the misinformation claim. We visualize the distribution of these types over time. We show the rise of generative AI-based content in misinformation claims, and that its commonality is a relatively recent phenomenon, occurring significantly after heavy press coverage. We also show "simple" methods dominated historically, particularly context manipulations, and continued to hold a majority as of the end of data collection in November 2023. The dataset, Annotated Misinformation, Media-Based (AMMeBa), is publicly-available, and we hope that these data will serve as both a means of evaluating mitigation methods in a realistic setting and as a first-of-its-kind census of the types and modalities of online misinformation.

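MOTIVATION AND OBJECTIVES

Quantify the prevalence and modality of media-based misinformation in the wild, using publicly accessible fact checks with ClaimReview markup.
Develop an image-focused typology that captures the media characteristics relevant to how misinformation works and how it can be mitigated.
Provide a public dataset of annotated misinformation claims to support the evaluation of mitigation methods and future research.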
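
For readers unfamiliar with ClaimReview, the following is a minimal sketch of how a single fact check marked up this way might be read. The field names (`claimReviewed`, `reviewRating`, `author`, `datePublished`, `url`) come from the schema.org ClaimReview type; the example values and the `extract_claim` helper are hypothetical and are not taken from the paper or its pipeline.

```python
import json

# Hypothetical example markup. Field names follow schema.org's ClaimReview
# type, but every value here is invented for illustration.
EXAMPLE_JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "url": "https://example.org/fact-checks/123",
  "datePublished": "2023-05-01",
  "claimReviewed": "A photo shows a flooded airport in 2023.",
  "author": {"@type": "Organization", "name": "Example Fact Checker"},
  "reviewRating": {"@type": "Rating", "alternateName": "False"}
}
"""


def extract_claim(json_ld: str) -> dict:
    """Return the fields of interest from one ClaimReview JSON-LD object."""
    data = json.loads(json_ld)
    if data.get("@type") != "ClaimReview":
        raise ValueError("not a ClaimReview object")
    # Nested objects may be missing in real markup, so fall back to {}.
    rating = data.get("reviewRating") or {}
    author = data.get("author") or {}
    return {
        "claim": data.get("claimReviewed"),
        "verdict": rating.get("alternateName"),
        "fact_checker": author.get("name"),
        "published": data.get("datePublished"),
        "url": data.get("url"),
    }


claim = extract_claim(EXAMPLE_JSON_LD)
print(claim["claim"], "->", claim["verdict"])
```

Real-world markup is often less tidy than this (multiple objects per page, missing ratings), so a production reader would need more defensive handling than this sketch shows.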

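PROPOSED METHOD

Misinformation claims were sampled from public fact checks carrying ClaimReview markup (135,838 annotated claims).
Annotators classified media-based claims by modality (image, video, audio) and by manipulation type (content, context, text-based, fake document).
Images were assigned subtypes (basic, complex, screenshot, analog gap, self-contextualizing, text-based, fake document) and recorded as manipulated through content, context, or text-based methods.
A staged annotation workflow managed cognitive load: four stages of increasing granularity, supported by a web-based annotation interface.
Raters (83 in total) took part over an extended period, with training and localization used to improve contextual accuracy.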
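
The categories listed above can be pictured as a small record type. The sketch below is only an illustration built from the category names in this review: the class names, field names, and the validation rule are assumptions, not the dataset's actual schema or column names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modality(Enum):
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


class Manipulation(Enum):
    CONTENT = "content"
    CONTEXT = "context"
    TEXT_BASED = "text_based"
    FAKE_DOCUMENT = "fake_document"


class ImageSubtype(Enum):
    BASIC = "basic"
    COMPLEX = "complex"
    SCREENSHOT = "screenshot"
    ANALOG_GAP = "analog_gap"
    SELF_CONTEXTUALIZING = "self_contextualizing"
    TEXT_BASED = "text_based"
    FAKE_DOCUMENT = "fake_document"


@dataclass(frozen=True)
class Annotation:
    """One hypothetical annotation record for a fact-checked claim."""
    claim_id: str
    modality: Modality
    manipulation: Manipulation
    image_subtype: Optional[ImageSubtype] = None

    def __post_init__(self) -> None:
        # Assumed rule: an image subtype is only meaningful for image claims.
        if self.image_subtype is not None and self.modality is not Modality.IMAGE:
            raise ValueError("image_subtype is only valid for image claims")


ann = Annotation("claim-001", Modality.IMAGE, Manipulation.CONTEXT,
                 ImageSubtype.SCREENSHOT)
print(ann.modality.value, ann.manipulation.value, ann.image_subtype.value)
```

Keeping the coarse label (modality) separate from the finer ones (manipulation, subtype) mirrors the idea of a staged workflow, where later stages only refine claims that earlier stages marked as relevant.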

Figure 1: Examples of media occurring alongside fact-checked misinformation claims. In this report, we introduce a typology to capture the enormous variation in media-based (particularly image-based) misinformation seen in-the-wild and categorize a very large sample of misinformation claims with i

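EXPERIMENTAL RESULTS

Research Questions

RQ1: What are the prevalence and temporal distribution of media-based misinformation in the wild across a large, multilingual fact-check corpus?
RQ2: Which types and manipulation categories characterize image-based misinformation in real-world claims?
RQ3: How is the use of AI-generated media in misinformation claims evolving, and which modalities and manipulation types dominate over time?
RQ4: How can a richly annotated dataset of media-based misinformation support mitigation methods and their evaluation in realistic settings?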

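Key Findings

Media-based misinformation claims make up the large majority of the analyzed cases, about 80%.
Images historically dominated misinformation claims, but videos have become more common since 2022 and now account for more than 60% of claims involving media.
AI-generated content was rare before spring 2023, then increased dramatically in fact-checked misinformation claims.
Most image manipulations are simple and context-based; context manipulations often misrepresent an image's details, provenance, or what it depicts.
Images frequently contain text that states the misinformation claim explicitly, and text-based images form a manipulation category of their own.
AMMeBa, the annotated misinformation dataset, is publicly available on Kaggle for research use.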
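
Findings such as the shift from images to videos rest on per-year shares of each modality. A minimal sketch of that kind of tally is shown below. The `records` list is toy data invented purely to exercise the code and does not reproduce any figure from the paper or the dataset; the function name and the `(year, modality)` record layout are likewise assumptions.

```python
from collections import Counter, defaultdict

# Toy (year, modality) records, made up for illustration only.
# These are NOT values from the paper or from the AMMeBa dataset.
records = [
    (2021, "image"), (2021, "image"), (2021, "video"),
    (2022, "image"), (2022, "video"),
    (2023, "video"), (2023, "video"), (2023, "image"), (2023, "video"),
]


def modality_share_by_year(rows):
    """Return {year: {modality: fraction of that year's claims}}."""
    counts = defaultdict(Counter)
    for year, modality in rows:
        counts[year][modality] += 1
    shares = {}
    for year, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[year] = {m: n / total for m, n in counter.items()}
    return shares


shares = modality_share_by_year(records)
for year, share in shares.items():
    print(year, {m: round(s, 2) for m, s in share.items()})
```

Normalizing within each year, rather than over the whole corpus, keeps years with many fact checks from dominating the trend.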

Figure 2: Media manipulations have a long history. Top Left: A comparison of an image of Joseph Stalin, originally taken in 1937, where an associate, Nikolai Yezhov, is present along with a later version where he has been manually removed from the official image with airbrushing, following his fa

This review was generated by AI and checked by a human editor.