QUICK REVIEW

[Paper Review] Fine-Grained Car Detection for Visual Census Estimation

Timnit Gebru, Jonathan Krause | arXiv (Cornell University) | Sep 7, 2017

Video Surveillance and Tracking Methods | References: 20 | Cited by: 20

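One-Sentence Summary

This paper presents a computer vision pipeline that uses fine-grained car detection in Google Street View images to predict socioeconomic attributes such as income, crime rates, and carbon emissions at scale. By training a large-scale detection model on a new 2,657-class car dataset, the method achieves a strong correlation with ground-truth income data (r=0.82) and reveals sociological links between car types and neighborhood demographics.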

ABSTRACT

Targeted socioeconomic policies require an accurate understanding of a country's demographic makeup. To that end, the United States spends more than 1 billion dollars a year gathering census data such as race, gender, education, occupation and unemployment rates. Compared to the traditional method of collecting surveys across many years which is costly and labor intensive, data-driven, machine learning driven approaches are cheaper and faster--with the potential ability to detect trends in close to real time. In this work, we leverage the ubiquity of Google Street View images and develop a computer vision pipeline to predict income, per capita carbon emission, crime rates and other city attributes from a single source of publicly available visual data. We first detect cars in 50 million images across 200 of the largest US cities and train a model to predict demographic attributes using the detected cars. To facilitate our work, we have collected the largest and most challenging fine-grained dataset reported to date consisting of over 2600 classes of cars comprised of images from Google Street View and other web sources, classified by car experts to account for even the most subtle of visual differences. We use this data to construct the largest scale fine-grained detection system reported to date. Our prediction results correlate well with ground truth income data (r=0.82), Massachusetts department of vehicle registration, and sources investigating crime rates, income segregation, per capita carbon emission, and other market research. Finally, we learn interesting relationships between cars and neighborhoods allowing us to perform the first large scale sociological analysis of cities using computer vision techniques.

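Research Motivation and Goals

Develop a scalable computer vision pipeline that predicts socioeconomic attributes from publicly available visual data.
Address the high cost and long turnaround of traditional survey-based demographic data collection.
Create the largest and most challenging fine-grained car dataset to date in order to improve object recognition in urban scenes.
Explore the sociological relationships between vehicle types and neighborhood characteristics such as income, segregation, and crime.
Show that complex urban metrics can be predicted with high accuracy from Street View imagery alone.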

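Proposed Method

Built a large-scale fine-grained car detection system and applied it to 50 million Google Street View images from 200 US cities.
Collected a new 2,657-class car dataset of 700,000 images from web sources and Street View, labeled by car experts.
Extracted attributes of each detected car (make, model, year, body type, and price) and aggregated them by zip code to characterize each neighborhood's vehicle mix.
Trained ridge regression models on the car feature vectors to predict median household income and crime rates.
Used spatial autocorrelation measures (e.g., Moran's I and Getis-Ord G) to analyze segregation patterns in car ownership across cities.
Computed the Pearson correlation coefficient between predicted and ground-truth socioeconomic variables to evaluate model performance.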
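
The regression and evaluation steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the detection schema `(zip_code, is_foreign, price_usd)`, the two features, the synthetic data, and the helper names are all assumptions made here for the example. The sketch uses only the standard library and skips feature standardization, which a real ridge pipeline would normally apply.

```python
import math
import random
from collections import defaultdict


def aggregate_by_zip(detections):
    """Turn per-car detections into one feature vector per zip code.

    Each detection is (zip_code, is_foreign, price_usd); this schema is hypothetical.
    Returns {zip_code: [foreign_share, mean_price_usd]}.
    """
    groups = defaultdict(list)
    for zip_code, is_foreign, price in detections:
        groups[zip_code].append((is_foreign, price))
    return {
        z: [sum(f for f, _ in cars) / len(cars), sum(p for _, p in cars) / len(cars)]
        for z, cars in groups.items()
    }


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Minimize ||y - b - Xw||^2 + alpha * ||w||^2 (intercept b is not penalized)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    yc = [v - y_mean for v in y]
    gram = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    w = solve_linear(gram, rhs)
    b = y_mean - sum(wj * mj for wj, mj in zip(w, means))
    return w, b


def predict(X, w, b):
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def synthetic_regions(n, seed=0):
    """Made-up regions: income (in $1000s) rises with foreign share and mean price."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        foreign_share = rng.uniform(0.0, 1.0)
        mean_price_k = rng.uniform(10.0, 50.0)
        income_k = 30.0 + 40.0 * foreign_share + 0.8 * mean_price_k + rng.gauss(0.0, 5.0)
        X.append([foreign_share, mean_price_k])
        y.append(income_k)
    return X, y


if __name__ == "__main__":
    X, y = synthetic_regions(200)
    w, b = fit_ridge(X[:150], y[:150], alpha=1.0)  # fit on 150 regions
    r = pearson_r(predict(X[150:], w, b), y[150:])  # evaluate on the other 50
    print(f"held-out Pearson r = {r:.2f}")
```

Evaluating on held-out regions rather than on the training regions keeps the reported correlation from being inflated by overfitting. The split used by the paper is not stated in this summary.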

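Experimental Results

Research Questions

RQ1: Can fine-grained car detection in Street View images predict city-level socioeconomic indicators such as income and crime rates?
RQ2: What relationships exist between specific car attributes and neighborhood-level demographics such as income and segregation?
RQ3: Can visual data from a single source (Google Street View) accurately predict the diverse urban metrics that traditionally depend on expensive surveys?
RQ4: Does car ownership in US cities show measurable spatial patterns that reflect socioeconomic segregation?
RQ5: At the zip code level, which car features are the strongest predictors of income and crime?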

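Key Findings

At the city level, the Pearson correlation between predicted and actual median household income reaches r=0.82; at the zip code level, it is r=0.70.
The share of foreign-made cars correlates most strongly with income (r=0.47), followed by average car price (r=0.44).
The number of cars per image is the strongest predictor of crime, with r=0.36 for crimes against persons and r=0.31 for property crime.
Vans are a notable predictor of crime, with a correlation of r=0.30 with the overall crime rate, suggesting that higher vehicle density may be associated with increased criminal activity.
Chicago shows the highest level of segregation (Moran's I = 0.82), while Jacksonville shows the lowest (only 33% of Chicago's value), consistent with external sociological rankings.
Using only visual data from Street View images, the system successfully predicted per capita carbon emissions, vehicle registration data, and income segregation levels.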
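
To make the segregation figures above easier to interpret, here is a small sketch of how Moran's I behaves on toy data. It is illustrative only: this summary does not say which spatial weights or which per-region variable the paper used, so the binary chain adjacency and the example values below are assumptions.

```python
def chain_weights(n):
    """Binary adjacency for n regions laid out in a line (an illustrative assumption)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


def morans_i(values, weights):
    """Moran's I = (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,

    where m is the mean of the values and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


if __name__ == "__main__":
    w = chain_weights(6)
    clustered = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]    # similar regions sit next to each other
    alternating = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]  # every neighbor differs
    print(f"clustered:   {morans_i(clustered, w):.2f}")
    print(f"alternating: {morans_i(alternating, w):.2f}")
```

On this toy layout the clustered pattern gives I = 0.60 and the alternating pattern gives I = -1.00. Positive values mean that neighboring regions resemble each other, which is why a high Moran's I is read as a sign of spatial segregation.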


This review was generated by AI and reviewed by a human editor.