QUICK REVIEW

[Paper Review] As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli

Di Cooke, Abigail Edwards|arXiv (Cornell University)|Mar 25, 2024

Ethics and Social Impacts of AI | Cited by 5

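One-Sentence Summary

This study measures how well people can tell AI-generated media from authentic content across image, audio, video, and audiovisual formats, finding that detection is close to chance level and identifying several factors that lower accuracy.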

ABSTRACT

One of the current principal defenses against weaponized synthetic media continues to be the ability of the targeted individual to visually or auditorily recognize AI-generated content when they encounter it. However, as the realism of synthetic media continues to rapidly improve, it is vital to have an accurate understanding of just how susceptible people currently are to potentially being misled by convincing but false AI generated content. We conducted a perceptual study with 1276 participants to assess how capable people were at distinguishing between authentic and synthetic images, audio, video, and audiovisual media. We find that on average, people struggled to distinguish between synthetic and authentic media, with the mean detection performance close to a chance level performance of 50%. We also find that accuracy rates worsen when the stimuli contain any degree of synthetic content, features foreign languages, and the media type is a single modality. People are also less accurate at identifying synthetic images when they feature human faces, and when audiovisual stimuli have heterogeneous authenticity. Finally, we find that higher degrees of prior knowledgeability about synthetic media does not significantly impact detection accuracy rates, but age does, with older individuals performing worse than their younger counterparts. Collectively, these results highlight that it is no longer feasible to rely on the perceptual capabilities of people to protect themselves against the growing threat of weaponized synthetic media, and that the need for alternative countermeasures is more critical than ever before.

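Motivation and Objectives

Assess how well people can distinguish authentic media from AI-generated media across multiple modalities (image, audio, video, audiovisual).
Identify the factors that affect detection accuracy in scenarios close to how synthetic media is encountered in the real world.
Evaluate how prior knowledge and demographic factors influence detection performance.
Inform defenses against weaponized synthetic media that do not rely on human perceptual ability alone.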

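Proposed Method

Conduct a perceptual study with 1,276 participants who judge whether image, audio, video, and audiovisual stimuli are authentic or AI-generated.
Analyze detection performance (accuracy) and how it varies with the degree of synthetic manipulation, language features, and modality.
Examine how facial content in images and heterogeneous authenticity in audiovisual stimuli affect accuracy.
Study how participant age and prior knowledge of synthetic media affect detection performance.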

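Experimental Results

Research Questions

RQ1: To what extent can people distinguish AI-generated media from authentic media across image, audio, video, and audiovisual formats?
RQ2: Which factors (degree of synthetic manipulation, language, modality, facial content, heterogeneous authenticity) affect detection accuracy?
RQ3: Does prior knowledge of synthetic media improve detection, and how does age affect performance?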

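Key Findings

Across all media types, mean detection performance was close to chance level (about 50%).
Accuracy drops when stimuli contain any degree of synthetic content, feature foreign languages, or consist of a single modality.
People are less accurate at identifying synthetic images when those images feature human faces.
Audiovisual stimuli with heterogeneous authenticity lower detection accuracy.
Greater prior knowledge of synthetic media does not significantly affect accuracy, whereas older participants perform worse than younger ones.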
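
To make "close to chance level" concrete, the sketch below runs an exact two-sided binomial test of a set of correct/incorrect judgments against 50% accuracy. It is not the paper's analysis: this summary does not say which statistical tests the authors used, and the function name `binomial_two_sided_p` and the response counts are illustrative assumptions, not data from the study.

```python
from math import comb


def binomial_two_sided_p(correct: int, total: int) -> float:
    """Exact two-sided p-value for H0: true accuracy is 0.5 (coin toss)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    # With p = 0.5 the binomial distribution is symmetric, so the
    # two-sided p-value is twice the smaller tail, capped at 1.
    k = min(correct, total - correct)
    smaller_tail = sum(comb(total, i) for i in range(k + 1)) / 2 ** total
    return min(1.0, 2 * smaller_tail)


# Hypothetical judgments for one participant (made up for illustration):
# True means the stimulus was classified correctly.
responses = [True] * 11 + [False] * 9

accuracy = sum(responses) / len(responses)
p_value = binomial_two_sided_p(sum(responses), len(responses))

print(f"accuracy = {accuracy:.2f}, p-value vs. chance = {p_value:.3f}")
```

A large p-value here means the observed accuracy cannot be told apart from guessing with this many trials; it does not by itself show that the participant was guessing.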

This review was generated by AI and reviewed by human editors.