QUICK REVIEW

[Paper Review] The Surveillance AI Pipeline

Pratyusha Kalluri, William Agnew|arXiv (Cornell University)|Sep 26, 2023

Ethics and Social Impacts of AI|Cited by 24

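One-Sentence Summary

This paper analyzes three decades of computer vision papers and their downstream patents, revealing how AI research enables the extraction of human data and fuels surveillance, and exposing a fieldwide norm that links research to surveillance patents.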

ABSTRACT

A rapidly growing number of voices argue that AI research, and computer vision in particular, is powering mass surveillance. Yet the direct path from computer vision research to surveillance has remained obscured and difficult to assess. Here, we reveal the Surveillance AI pipeline by analyzing three decades of computer vision research papers and downstream patents, more than 40,000 documents. We find the large majority of annotated computer vision papers and patents self-report their technology enables extracting data about humans. Moreover, the majority of these technologies specifically enable extracting data about human bodies and body parts. We present both quantitative and rich qualitative analysis illuminating these practices of human data extraction. Studying the roots of this pipeline, we find that institutions that prolifically produce computer vision research, namely elite universities and "big tech" corporations, are subsequently cited in thousands of surveillance patents. Further, we find consistent evidence against the narrative that only these few rogue entities are contributing to surveillance. Rather, we expose the fieldwide norm that when an institution, nation, or subfield authors computer vision papers with downstream patents, the majority of these papers are used in surveillance patents. In total, we find the number of papers with downstream surveillance patents increased more than five-fold between the 1990s and the 2010s, with computer vision research now having been used in more than 11,000 surveillance patents. Finally, in addition to the high levels of surveillance we find documented in computer vision papers and patents, we unearth pervasive patterns of documents using language that obfuscates the extent of surveillance. Our analysis reveals the pipeline by which computer vision research has powered the ongoing expansion of surveillance.

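Research Motivation and Goals

Assess how computer vision research self-reports enabling the extraction of human data.
Quantify the prevalence and types of human data extraction in papers and downstream patents.
Trace the roots of Surveillance AI across institutions, nations, subfields, and years to reveal fieldwide norms.
Identify language practices in CV research and patents that obscure surveillance implications.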

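Proposed Method

Collect and analyze more than 40,000 computer vision papers and downstream patents, linking 19,000+ CV papers to 23,000+ patents.
Conduct qualitative content analysis on a subset (100 papers and 100 patents) to categorize the targets of human data extraction.
Quantify the prevalence of human data extraction across the full corpus, comparing papers with patents.
Analyze the trend over decades in the share of CV papers with downstream patents that are used in surveillance patents.
Examine the texts for language patterns and obfuscating wording that mask surveillance implications.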
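
The decade-level trend step above can be illustrated with a minimal sketch. It is not the authors' code: the record layout (`year`, `patents`, `is_surveillance`), the function name, and the toy data are all hypothetical, and the rule that a paper counts once any of its downstream patents is flagged as surveillance is an assumption made for illustration.

```python
from collections import defaultdict


def surveillance_share_by_decade(papers):
    """Share of papers with downstream patents that feed at least one
    surveillance patent, grouped by publication decade.

    `papers` is an iterable of dicts with hypothetical keys:
      'year'    -> publication year (int)
      'patents' -> list of dicts, each with a boolean 'is_surveillance'
    """
    totals = defaultdict(int)   # papers with any downstream patent
    flagged = defaultdict(int)  # papers with a surveillance patent
    for paper in papers:
        patents = paper["patents"]
        if not patents:
            continue  # papers without downstream patents are excluded
        decade = paper["year"] // 10 * 10
        totals[decade] += 1
        if any(p["is_surveillance"] for p in patents):
            flagged[decade] += 1
    return {d: flagged[d] / totals[d] for d in sorted(totals)}


# Toy records, invented purely to exercise the function.
toy_papers = [
    {"year": 1994, "patents": [{"is_surveillance": True}]},
    {"year": 1997, "patents": [{"is_surveillance": False}]},
    {"year": 2012, "patents": [{"is_surveillance": False},
                               {"is_surveillance": True}]},
    {"year": 2015, "patents": []},
    {"year": 2016, "patents": [{"is_surveillance": True}]},
]
shares = surveillance_share_by_decade(toy_papers)
print(shares)
```

Dividing by the number of papers that have downstream patents, rather than by all papers, matches how the proportions in this review are phrased; a different denominator would answer a different question.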

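Experimental Results

Research Questions

RQ1: What proportion of annotated computer vision papers and downstream patents extract data about humans?
RQ2: What are the four targets of human data extraction identified in CV papers and patents, and how prevalent is each?
RQ3: To what extent do CV papers with downstream patents contribute to surveillance patents across decades, institutions, nations, and subfields?
RQ4: Is there evidence of obfuscating language that masks the surveillance potential of CV research and patents?
RQ5: How did the relationship between CV research and surveillance patents evolve from the 1990s to the 2010s?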

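Key Findings

90% of annotated CV papers and patents self-report enabling the extraction of data about humans.
68% of papers and patents specifically extract data about human bodies and body parts.
Among CV papers with downstream patents, the share used in surveillance patents rose from 50% in the 1990s to 79% in the 2010s.
The number of CV papers with downstream surveillance patents increased more than five-fold between the 1990s and the 2010s.
Obfuscating language is pervasive: documents treat humans as objects, or obscure human data extraction within figures and datasets.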


This review was generated by AI and reviewed by human editors.