QUICK REVIEW

[Paper Review] Facebook Ads Monitor: An Independent Auditing System for Political Ads on Facebook

Márcio Silva, Lucas Santos de Oliveira|arXiv (Cornell University)|Jan 28, 2020

Social Media and Politics | References: 24 | Cited by: 72

ONE-SENTENCE SUMMARY

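This paper presents Facebook Ads Monitor, an independent auditing system that collects Facebook ads through a browser extension installed by volunteers, manually labels a gold-standard dataset, and uses a CNN and other machine-learning models to detect political ads. The system reveals undeclared political ads that are missing from the Facebook Ad Library.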

ABSTRACT

The 2016 United States presidential election was marked by the abuse of targeted advertising on Facebook. Concerned with the risk of the same kind of abuse happening in the 2018 Brazilian elections, we designed and deployed an independent auditing system to monitor political ads on Facebook in Brazil. To do that, we first adapted a browser plugin to gather ads from the timelines of volunteers using Facebook. We managed to convince more than 2000 volunteers to help our project and install our tool. Then, we used a Convolutional Neural Network (CNN) to detect political Facebook ads using word embeddings. To evaluate our approach, we manually labeled a data collection of 10k ads as political or non-political and then provided an in-depth evaluation of the proposed approach for identifying political ads by comparing it with classic supervised machine learning methods. Finally, we deployed a real system that shows the ads identified as related to politics. We noticed that not all political ads we detected were present in the Facebook Ad Library for political ads. Our results emphasize the importance of enforcement mechanisms for declaring political ads and the need for independent auditing platforms.

MOTIVATION AND OBJECTIVES

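Address the risk that targeted political advertising would be abused in Brazil during the 2018 elections.
Design and deploy an independent auditing platform that monitors political ads with the help of volunteers.
Develop a political-ad classifier and evaluate several machine-learning models.
Evaluate classifier performance on both the gold-standard dataset and real-world ads.
Demonstrate the feasibility and potential impact of independent auditing for improving electoral transparency.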

PROPOSED METHOD

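Adapted a browser extension to collect the ads shown in volunteers' timelines, together with the explanation Facebook gives under "Why am I seeing this?".
Built two datasets: a gold-standard collection of ads labeled political or non-political, and a larger AdCollector dataset of ads collected in Brazil.
Implemented and compared six classifiers for political-ad detection: CNN, SVM, logistic regression, random forest, Naive Bayes with feature hashing, and gradient boosting.
Represented ads with 300-dimensional Word2Vec embeddings and designed a CNN with 120 filters and dropout, trained with RMSProp.
Evaluated the models with 10-fold cross-validation, reporting accuracy, AUC, and Macro-F1; to reflect the class imbalance of real-world data, set decision thresholds at low false-positive rates.
Deployed a live political-ad detector and analyzed detected ads that are absent from the Ad Library, to assess the Library's coverage and the need for enforcement.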
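
The summary lists Naive Bayes with feature hashing among the compared classifiers. The following is a minimal, standard-library-only sketch of that idea, not the authors' implementation: tokens are hashed with zlib.crc32 into a fixed number of buckets, and a multinomial Naive Bayes with Laplace smoothing is trained on the bucket counts. The bucket count N_BUCKETS, the names hash_features and HashingNaiveBayes, and the toy Portuguese ads are all hypothetical.

```python
from __future__ import annotations

import math
import re
import zlib
from collections import defaultdict

N_BUCKETS = 2 ** 18  # hypothetical hash-space size, not taken from the paper


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-word characters; \w is Unicode-aware in
    # Python 3, so accented Portuguese words such as "eleição" stay whole.
    return re.findall(r"\w+", text.lower())


def hash_features(text: str) -> dict[int, int]:
    # Map each token to a bucket with a stable hash (crc32), so the feature
    # space has a fixed size and no vocabulary has to be stored.
    counts: dict[int, int] = defaultdict(int)
    for tok in tokenize(text):
        counts[zlib.crc32(tok.encode("utf-8")) % N_BUCKETS] += 1
    return counts


class HashingNaiveBayes:
    """Multinomial Naive Bayes over hashed token counts (Laplace smoothing)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[str]) -> "HashingNaiveBayes":
        self.classes = sorted(set(labels))
        # Class priors estimated from label frequencies.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.bucket_counts = {c: defaultdict(int) for c in self.classes}
        self.total = {c: 0 for c in self.classes}
        for text, c in zip(texts, labels):
            for bucket, n in hash_features(text).items():
                self.bucket_counts[c][bucket] += n
                self.total[c] += n
        return self

    def log_score(self, text: str, c: str) -> float:
        # log P(c) + sum over buckets of count * log P(bucket | c), smoothed.
        denom = self.total[c] + self.alpha * N_BUCKETS
        score = self.log_prior[c]
        for bucket, n in hash_features(text).items():
            score += n * math.log((self.bucket_counts[c].get(bucket, 0) + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.classes, key=lambda c: self.log_score(text, c))


texts = [
    "vote no candidato para presidente",
    "eleição e voto consciente",
    "promoção de sapatos hoje",
    "desconto em pizza",
]
labels = ["political", "political", "non-political", "non-political"]
model = HashingNaiveBayes().fit(texts, labels)
print(model.predict("candidato à presidente"))  # political
```

Hashing trades a small risk of unrelated words sharing a bucket for a fixed memory footprint and no vocabulary bookkeeping, which suits a continuously growing stream of ads.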
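
The CNN described above can be pictured as a sentence-classification CNN over word vectors. This is a minimal forward-pass sketch that uses only the two hyperparameters the summary reports (300-dimensional vectors, 120 filters); everything else is an assumption for illustration: a filter width of 3 tokens, ReLU, max-over-time pooling, inverted dropout at rate 0.5, a single sigmoid output, and random vectors standing in for pretrained Word2Vec embeddings. Training is omitted, and the names TextCNN and embed are hypothetical.

```python
from __future__ import annotations

import math
import random

EMB_DIM = 300    # Word2Vec dimensionality reported in the summary
N_FILTERS = 120  # number of convolution filters reported in the summary
WINDOW = 3       # ASSUMPTION: filter width in tokens
DROPOUT = 0.5    # ASSUMPTION: dropout rate (the summary gives none)

rng = random.Random(0)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def embed(tokens: list[str], vectors: dict[str, list[float]]) -> list[list[float]]:
    # Look up each token; out-of-vocabulary tokens map to a zero vector.
    zero = [0.0] * EMB_DIM
    return [vectors.get(t, zero) for t in tokens]


class TextCNN:
    def __init__(self) -> None:
        fan_in = WINDOW * EMB_DIM
        # One weight vector per filter over WINDOW concatenated word vectors.
        self.W = [[rng.gauss(0.0, fan_in ** -0.5) for _ in range(fan_in)]
                  for _ in range(N_FILTERS)]
        self.b = [0.0] * N_FILTERS
        self.w_out = [rng.gauss(0.0, N_FILTERS ** -0.5) for _ in range(N_FILTERS)]
        self.b_out = 0.0

    def forward(self, emb: list[list[float]], train: bool = False) -> float:
        # Pad short ads with zero vectors so at least one full window exists.
        emb = emb + [[0.0] * EMB_DIM] * max(0, WINDOW - len(emb))
        windows = [[x for vec in emb[i:i + WINDOW] for x in vec]
                   for i in range(len(emb) - WINDOW + 1)]
        pooled = []
        for w, b in zip(self.W, self.b):
            # Convolution + ReLU at every position, then max-over-time pooling.
            acts = [max(0.0, sum(wi * xi for wi, xi in zip(w, win)) + b)
                    for win in windows]
            pooled.append(max(acts))
        if train:
            # Inverted dropout: drop features at random, rescale the survivors.
            keep = 1.0 - DROPOUT
            pooled = [h / keep if rng.random() < keep else 0.0 for h in pooled]
        logit = sum(wo * h for wo, h in zip(self.w_out, pooled)) + self.b_out
        return sigmoid(logit)  # interpreted as P(political)


vectors = {t: [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
           for t in ["vote", "no", "candidato", "promoção"]}
model = TextCNN()
print(model.forward(embed(["vote", "no", "candidato"], vectors)))
```

Max-over-time pooling makes the output independent of ad length: each filter contributes only its strongest match anywhere in the text.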
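
The CNN is trained with RMSProp, which divides each parameter's step by a running root-mean-square of its recent gradients. A minimal sketch of the standard update rule on a one-dimensional toy objective; the learning rate, decay rho, and epsilon are common illustrative values, not settings reported for this paper, and rmsprop_step is a hypothetical name.

```python
import math


def rmsprop_step(params, grads, cache, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSProp update, applied in place.

    cache holds a running average of squared gradients per parameter; each
    step is divided by its square root, so parameters with consistently
    large gradients take proportionally smaller steps.
    """
    for i, g in enumerate(grads):
        cache[i] = rho * cache[i] + (1.0 - rho) * g * g
        params[i] -= lr * g / (math.sqrt(cache[i]) + eps)


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
params, cache = [0.0], [0.0]
for _ in range(2000):
    rmsprop_step(params, [2.0 * (params[0] - 3.0)], cache)
print(params[0])  # close to 3.0
```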
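
The evaluation protocol above (10-fold cross-validation with accuracy, AUC, and Macro-F1) can be sketched with the standard library. Assumptions, since the summary does not specify them: folds are stratified by label, labels are 1 for political and 0 for non-political, and AUC uses the rank (Mann-Whitney) formulation. All function names are hypothetical.

```python
from __future__ import annotations

import random
from collections import defaultdict


def stratified_folds(labels: list[int], k: int = 10, seed: int = 0) -> list[list[int]]:
    # Shuffle each class's indices and deal them round-robin into k folds,
    # so every fold keeps roughly the overall political/non-political ratio.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[int], y_pred: list[int]) -> float:
    # Unweighted mean of per-class F1, so the minority class counts equally.
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    # Probability that a random positive outscores a random negative;
    # ties count as half a win.
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1] * 50 + [0] * 50
folds = stratified_folds(labels, k=10)
print([len(f) for f in folds])
```

In a full loop, each fold serves once as the test set while the other nine train the model, and the per-fold metrics are averaged.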

EXPERIMENTAL RESULTS

Research Questions

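RQ1: How can an independent auditing platform collect and analyze political ads on Facebook in Brazil?
RQ2: Which machine-learning models are best suited to detecting political ads in Portuguese-language Facebook content?
RQ3: How does the set of detected political ads compare with the Facebook Ad Library in terms of coverage?
RQ4: On a real-world, imbalanced dataset, which thresholds balance true positives against false positives?
RQ5: What are the implications for the policy and enforcement of political-ad disclosure?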

Key Findings

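The CNN and the other models reached high accuracy, around 94%, on the nearly balanced gold-standard dataset.
The CNN and Naive Bayes performed well, with AUC of about 0.98–0.99 and Macro-F1 of about 0.94.
At a 1% false-positive rate, the true-positive rate was 78% for the CNN and 85% for Naive Bayes; at a 3% false-positive rate, it was 90% and 95%, respectively.
In the AdCollector dataset, the detector flagged 835 political ads among 38,110 Portuguese-language ads collected during the election period.
Only part of the detected political ads appeared in the Ad Library; many did not, which points to undeclared political content and an enforcement gap in the declaration mechanism.
The study demonstrates the feasibility and potential positive impact of independent auditing platforms for electoral transparency.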
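
True-positive rates at a fixed false-positive rate, like the 1% and 3% figures above, come from choosing a score threshold so that only a small share of non-political ads is flagged. One common way to do this is sketched below, assuming a higher score means "more political" and an ad is flagged when its score is strictly above the threshold: take the threshold from the ranked scores of known negatives, then measure recall on the positives. The function names and the toy scores are hypothetical, not the paper's data.

```python
from __future__ import annotations

import math


def threshold_at_fpr(neg_scores: list[float], target_fpr: float) -> float:
    # With "flag if score > t", choosing t as the (k+1)-th highest negative
    # score lets at most k = floor(target_fpr * n_neg) negatives through.
    ranked = sorted(neg_scores, reverse=True)
    k = int(target_fpr * len(ranked) + 1e-9)  # small epsilon guards float error
    return ranked[k] if k < len(ranked) else -math.inf


def tpr_at_fpr(y_true: list[int], scores: list[float], target_fpr: float):
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    t = threshold_at_fpr(neg, target_fpr)
    fpr = sum(s > t for s in neg) / len(neg)
    tpr = sum(s > t for s in pos) / len(pos)
    return t, tpr, fpr


# Toy scores: 100 non-political ads spread over [0, 0.99], 4 political ads.
y = [0] * 100 + [1] * 4
s = [i / 100 for i in range(100)] + [0.5, 0.985, 0.995, 0.999]
print(tpr_at_fpr(y, s, 0.01))  # threshold 0.98, TPR 0.75, FPR 0.01
```

Because real ad streams are dominated by non-political ads, even a 1% false-positive rate can produce many false alarms in absolute terms, which is why the operating point is fixed on the negative class.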
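
Once detected ads and Ad Library entries share a key, comparing them reduces to a set operation. This sketch assumes a hypothetical shared ad identifier; the summary does not say how the authors matched collected ads to Library entries, and in practice that matching step may be the hard part. The function name library_coverage is also hypothetical.

```python
from __future__ import annotations


def library_coverage(detected_ids: list[str], library_ids: list[str]) -> tuple[float, list[str]]:
    """Share of detected political ads that also appear in the Ad Library,
    plus the detected ads that do not (candidate undeclared ads)."""
    detected = set(detected_ids)
    declared = detected & set(library_ids)
    undeclared = sorted(detected - declared)
    share = len(declared) / len(detected) if detected else 0.0
    return share, undeclared


share, missing = library_coverage(
    ["ad1", "ad2", "ad3", "ad4"],  # flagged by the classifier (hypothetical IDs)
    ["ad2", "ad4", "ad9"],         # found in the Ad Library (hypothetical IDs)
)
print(share, missing)  # 0.5 ['ad1', 'ad3']
```

Because the classifier has a nonzero false-positive rate, ads in the undeclared list are candidates for manual review rather than confirmed violations.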

This review was generated by AI and reviewed by human editors.