QUICK REVIEW

[Paper Review] AMMeBa: A Large-Scale Survey and Dataset of Media-Based Misinformation In-The-Wild

Nicholas Dufour, Arkanath Pathak | arXiv (Cornell University) | May 19, 2024

Misinformation and Its Impacts | Cited by 6

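One-Sentence Summary

AMMeBa presents a two-year, human-annotated study of media-based misinformation in the wild, focusing on images and media-related claims drawn from fact checks with ClaimReview markup, and provides a public annotated dataset describing media types and manipulation methods.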

ABSTRACT

The prevalence and harms of online misinformation is a perennial concern for internet platforms, institutions and society at large. Over time, information shared online has become more media-heavy and misinformation has readily adapted to these new modalities. The rise of generative AI-based tools, which provide widely-accessible methods for synthesizing realistic audio, images, video and human-like text, have amplified these concerns. Despite intense public interest and significant press coverage, quantitative information on the prevalence and modality of media-based misinformation remains scarce. Here, we present the results of a two-year study using human raters to annotate online media-based misinformation, mostly focusing on images, based on claims assessed in a large sample of publicly-accessible fact checks with the ClaimReview markup. We present an image typology, designed to capture aspects of the image and manipulation relevant to the image's role in the misinformation claim. We visualize the distribution of these types over time. We show the rise of generative AI-based content in misinformation claims, and that its commonality is a relatively recent phenomenon, occurring significantly after heavy press coverage. We also show "simple" methods dominated historically, particularly context manipulations, and continued to hold a majority as of the end of data collection in November 2023. The dataset, Annotated Misinformation, Media-Based (AMMeBa), is publicly-available, and we hope that these data will serve as both a means of evaluating mitigation methods in a realistic setting and as a first-of-its-kind census of the types and modalities of online misinformation.

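Research Motivation and Goals

Quantify the prevalence and forms of media-based misinformation in the wild, using publicly accessible fact checks with ClaimReview markup.
Develop an image-focused typology that captures the media characteristics relevant to how misinformation works and how it can be mitigated.
Provide a publicly accessible dataset of annotated misinformation claims to support the evaluation of mitigation methods and future research.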

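Proposed Method

Sample misinformation claims from publicly accessible fact checks with ClaimReview markup (135,838 annotated claims).
Annotators classify media-based claims by modality (image, video, audio) and by manipulation type (content, context, text-based, fake documents).
Images are divided into subtypes (basic, complex, screenshots, analog gap, self-contextualizing, text-based, fake documents) and are manipulated through content, context, or text-based methods.
A stage-based annotation workflow manages cognitive load: four stages of progressively finer granularity, supported by a web-based annotation interface.
Raters (83) took part over an extended period, with training and localization to improve contextual accuracy.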
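
The typology in the list above can be pictured as a small data structure. The following is a minimal Python sketch, not the dataset's actual schema: the class names, the field names, and the rule that an image subtype requires an image modality are assumptions made for illustration. Only the category names are taken from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modality(Enum):
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


class ImageSubtype(Enum):
    BASIC = "basic"
    COMPLEX = "complex"
    SCREENSHOT = "screenshot"
    ANALOG_GAP = "analog gap"
    SELF_CONTEXTUALIZING = "self-contextualizing"
    TEXT_BASED = "text-based"
    FAKE_DOCUMENT = "fake document"


class Manipulation(Enum):
    CONTENT = "content"
    CONTEXT = "context"
    TEXT_BASED = "text-based"


@dataclass(frozen=True)
class Annotation:
    """One rater judgment for one claim (hypothetical record layout)."""
    claim_id: str
    modality: Optional[Modality] = None          # None: no media involved
    image_subtype: Optional[ImageSubtype] = None
    manipulation: Optional[Manipulation] = None

    def __post_init__(self):
        # Assumed consistency rule: image-level labels only apply to image claims.
        if self.image_subtype is not None and self.modality is not Modality.IMAGE:
            raise ValueError("image_subtype requires modality=IMAGE")


# Example: a screenshot shared with a misleading description.
example = Annotation(
    "claim-001", Modality.IMAGE, ImageSubtype.SCREENSHOT, Manipulation.CONTEXT
)
```

Leaving the later fields optional loosely mirrors the staged workflow, in which coarse labels such as modality are assigned before finer ones; how the real stages map onto fields is not specified here.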

Figure 1: Examples of media occurring alongside fact-checked misinformation claims. In this report, we introduce a typology to capture the enormous variation in media-based (particularly image-based) misinformation seen in-the-wild and categorize a very large sample of misinformation claims with i

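Experimental Results

Research Questions

RQ1: What are the prevalence and temporal distribution of media-based misinformation in the wild, within a large multilingual fact-check corpus?
RQ2: Which typology and manipulation categories characterize image-based misinformation in real-world claims?
RQ3: How has the use of AI-generated media in misinformation claims evolved, and which modalities and manipulation types dominate over time?
RQ4: How can a richly annotated dataset of media-based misinformation support mitigation methods and their evaluation in realistic settings?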

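Key Findings

Media-based misinformation claims make up the majority of the analyzed cases, roughly 80%.
Images historically dominated misinformation claims, but video has become more common since 2022 and now appears in over 60% of media-related claims.
AI-generated content was rare before spring 2023, then increased markedly in fact-checked misinformation claims.
Image manipulations tend to be simple and context-based; context manipulations often supply false details about the image, its source, or what it depicts.
Text frequently appears in images to convey the misinformation claim, and text-based images form a distinct manipulation category.
The AMMeBa annotated misinformation dataset is publicly available on Kaggle for research.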
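
Findings of this kind rest on per-period shares of each modality. As a rough sketch of that computation in Python, assuming a hypothetical input of (year, set of modalities) pairs rather than the dataset's real column layout, one could write:

```python
from collections import defaultdict


def modality_share_by_year(claims):
    """claims: iterable of (year, set_of_modalities), one per media-based claim.

    Returns {year: {modality: fraction of that year's claims involving it}}.
    Shares within a year can sum to more than 1, because a single claim
    may involve several modalities.
    """
    totals = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for year, modalities in claims:
        totals[year] += 1
        for m in modalities:
            counts[year][m] += 1
    return {
        year: {m: n / totals[year] for m, n in sorted(counts[year].items())}
        for year in sorted(totals)
    }


# Toy rows, invented purely to exercise the function; not AMMeBa figures.
toy = [
    (2021, {"image"}),
    (2021, {"image"}),
    (2021, {"video"}),
    (2021, {"image", "video"}),
    (2023, {"video"}),
    (2023, {"image", "video"}),
]
shares = modality_share_by_year(toy)
```

Counting involvement per claim, rather than forcing each claim into one modality, matches the phrasing that video "appears in" a share of media-related claims; whether the original analysis counted this way is an assumption.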

Figure 2: Media manipulations have a long history. Top Left: A comparison of an image of Joseph Stalin, originally taken in 1937, where an associate, Nikolai Yezhov, is present along with a later version where he has been manually removed from the official image with airbrushing, following his fa


This review was generated by AI and checked by a human editor.