QUICK REVIEW

[Paper Review] Learning from Synthetic Data for Crowd Counting in the Wild

Qi Wang, Junyu Gao|arXiv (Cornell University)|Mar 8, 2019

Video Surveillance and Tracking Methods|References: 37|Cited by: 51

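One-Sentence Summary

The paper builds a large synthetic dataset, GCC, from GTA5 with automatic annotation, and demonstrates two ways to improve real-world crowd counting: pretraining on the synthetic data and then fine-tuning on real data, and domain adaptation with an SSIM-embedded CycleGAN that translates synthetic images into realistic ones so that a counter can be trained without real labels.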

ABSTRACT

Recently, counting the number of people in crowd scenes has become a hot topic because of its widespread applications (e.g., video surveillance, public security). It is a difficult task in the wild: changeable environments and a large range in the number of people cause current methods to perform poorly. In addition, due to scarce data, many methods suffer from over-fitting to varying extents. To remedy these two problems, first, we develop a data collector and labeler, which can generate synthetic crowd scenes and simultaneously annotate them without any manpower. Based on it, we build a large-scale, diverse synthetic dataset. Second, we propose two schemes that exploit the synthetic data to boost the performance of crowd counting in the wild: 1) pretrain a crowd counter on the synthetic data, then fine-tune it using the real data, which significantly improves the model's performance on real data; 2) propose a crowd counting method via domain adaptation, which can free humans from heavy data annotation. Extensive experiments show that the first method achieves state-of-the-art performance on four real datasets, and the second outperforms our baselines. The dataset and source code are available at https://gjy3035.github.io/GCC-CL/.

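Motivation and Goals

Advance crowd counting in the wild by addressing data scarcity and the domain gap between synthetic and real scenes.
Build a large-scale, diverse synthetic dataset (GCC) from GTA5 with automatic annotation.
Propose a supervised pretrain-then-fine-tune strategy that uses GCC to improve performance on real data.
Develop a domain adaptation method (SE Cycle GAN) that translates synthetic scenes into photo-realistic images while preserving local textures, enabling training without real-data labels.
Evaluate on multiple real datasets to demonstrate performance gains and domain transferability.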

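Proposed Method

Introduce the Spatial Fully Convolutional Network (SFCN), which directly regresses density maps.
Create the GTA5 Crowd Counting (GCC) dataset: 15,212 images with 7,625,843 annotated heads across 400 scenes, covering diverse weather, times of day, and locations.
Pretrain the crowd counter on GCC and fine-tune it on real data, giving a better initialization and reducing over-fitting.
Propose the SSIM Embedding (SE) Cycle GAN, which translates synthetic scenes into photo-realistic images using an SSIM-based cycle-consistency loss that preserves local textures.
Introduce Density/Scene Regularization (DSR), which constrains output density values with a maximum value (MAX_S) and selectively samples the translated synthetic data for target datasets with a large domain discrepancy.
Show that pretraining on GCC yields lower MAE/MSE on real datasets than training from scratch or initializing from ImageNet.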
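SFCN outputs a density map rather than a single number; the predicted count is the sum of that map. The sketch below shows how such a training target could be built from head-point annotations like GCC's and how a count is read back out. It assumes a fixed 15×15 Gaussian kernel with σ = 4 (a common choice for this kind of target, not confirmed by this review), and `points_to_density` and `count_from_density` are hypothetical names, not part of the released GCC-CL code.

```python
import numpy as np


def gaussian_kernel(size: int = 15, sigma: float = 4.0) -> np.ndarray:
    """Return a normalized 2-D Gaussian kernel (entries sum to 1)."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd so it centers on the head point")
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()


def points_to_density(points, height, width, size=15, sigma=4.0):
    """Build a density map with one Gaussian per annotated head (assumed scheme).

    Kernels cut off by the image border are renormalized over their visible
    part, so the map sums to the number of heads inside the image.
    """
    density = np.zeros((height, width), dtype=np.float64)
    kernel = gaussian_kernel(size, sigma)
    r = size // 2
    for x, y in points:  # (column, row) of each head center, in pixels
        cx, cy = int(round(x)), int(round(y))
        if not (0 <= cx < width and 0 <= cy < height):
            continue  # ignore heads that fall outside the image
        y0, y1 = max(0, cy - r), min(height, cy + r + 1)
        x0, x1 = max(0, cx - r), min(width, cx + r + 1)
        # matching crop of the kernel: kernel index = image index - (center - r)
        patch = kernel[y0 - (cy - r):y1 - (cy - r), x0 - (cx - r):x1 - (cx - r)]
        density[y0:y1, x0:x1] += patch / patch.sum()
    return density


def count_from_density(density) -> float:
    """The crowd count is the integral (sum) of the density map."""
    return float(np.sum(density))


heads = [(10, 10), (50, 20), (0, 0)]
dmap = points_to_density(heads, height=64, width=64)
print(round(count_from_density(dmap), 6))  # 3.0
```

Renormalizing clipped kernels keeps the target count exact for heads near the border; without it, heads at the image edge would be under-counted in the target.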
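The SE Cycle GAN adds an SSIM-based cycle-consistency term to CycleGAN, so that a synthetic image mapped to the real domain and back must keep its local structure, not just its pixel values. A minimal NumPy sketch of that term follows. It assumes the loss has the form 1 − SSIM(x, x̂) between an image and its round-trip reconstruction, computed on single-channel images with a uniform 7×7 window (standard SSIM usually uses a Gaussian window). The generators and adversarial losses are omitted, and `ssim` and `se_cycle_loss` are illustrative names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim(x, y, window: int = 7, data_range: float = 1.0) -> float:
    """Mean SSIM between two single-channel images using a uniform window."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    # all window x window patches: shape (H - window + 1, W - window + 1, window, window)
    px = sliding_window_view(x, (window, window))
    py = sliding_window_view(y, (window, window))
    mu_x = px.mean(axis=(-2, -1))
    mu_y = py.mean(axis=(-2, -1))
    var_x = px.var(axis=(-2, -1))
    var_y = py.var(axis=(-2, -1))
    cov_xy = (px * py).mean(axis=(-2, -1)) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(np.mean(num / den))


def se_cycle_loss(x, x_cycled, **kwargs) -> float:
    """Assumed SSIM cycle term: 1 - SSIM(x, G_back(G_forward(x)))."""
    return 1.0 - ssim(x, x_cycled, **kwargs)
```

A perfect reconstruction gives a loss of about 0, while a reconstruction that blurs or scrambles local texture is penalized even when its average intensity is right, which is the property a pixel-wise L1 term alone does not enforce.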
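The two levels of Density/Scene Regularization can be sketched as simple filters. Both rules below are assumptions based on the short description in this review: at the density level, predicted pixels above a per-dataset ceiling `max_s` (MAX_S) are treated as implausible and zeroed; at the scene level, only synthetic scenes whose head count falls inside the target dataset's range are kept for training. The thresholds and the names `density_regularize` and `select_scenes` are hypothetical.

```python
import numpy as np


def density_regularize(pred_density, max_s: float) -> np.ndarray:
    """Assumed density-level rule: zero out pixels whose value exceeds MAX_S."""
    out = np.array(pred_density, dtype=np.float64, copy=True)  # leave the input untouched
    out[out > max_s] = 0.0
    return out


def select_scenes(scene_counts: dict, min_count: int, max_count: int) -> list:
    """Assumed scene-level rule: keep scenes whose head count lies in the target range."""
    return [sid for sid, count in scene_counts.items() if min_count <= count <= max_count]


pred = np.array([[0.1, 0.5], [0.02, 0.3]])
print(density_regularize(pred, max_s=0.25).tolist())  # [[0.1, 0.0], [0.02, 0.0]]
print(select_scenes({"a": 30, "b": 800, "c": 120}, 25, 500))  # ['a', 'c']
```

Using count range as the scene-selection criterion is only one plausible choice; other properties of the target dataset could be used in the same way.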

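Experimental Results

Research Questions

RQ1: Does pretraining on synthetic GCC data, followed by fine-tuning, improve crowd counting on real datasets?
RQ2: Can domain-adaptive translation (SE Cycle GAN) narrow the synthetic-to-real domain gap enough to train on real data without labels?
RQ3: What improvements do density-aware regularization and data selection bring to domain adaptation?
RQ4: How does the proposed SFCN compare with established baselines under the different GCC train/test splits?
RQ5: To what extent can synthetic data push real crowd counting benchmarks to the state of the art?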

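Key Findings

Pretraining on GCC and then fine-tuning on real data lowers counting error (MAE/MSE) compared with training from scratch or ImageNet initialization; for MCNN, for example, from 277/426 to 199.8/311.2 on UCF-QNRF and from 26.4/41.3 to 18.8/28.2 on ShanghaiTech Part B.
SFCN is competitive with, and often better than, other methods on GCC's random, cross-camera, and cross-location splits (SFCN MAE/MSE: 36.2/81.1 random, 56.0/129.7 cross-camera, 89.3/216.8 cross-location).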
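SFCN pretrained on GCC and fine-tuned on real data achieves state-of-the-art results on four real datasets, as the abstract reports (e.g., UCF-QNRF: 102.0/171.4 MAE/MSE vs. the previous best of 132/191).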
SE Cycle GAN clearly outperforms both CycleGAN and no adaptation across multiple real datasets (e.g., on ShanghaiTech Part A, MAE is 160.0 with no adaptation, 143.3 with CycleGAN, and 123.4 with SE Cycle GAN).
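Density/Scene Regularization (DSR) further improves adaptation by filtering out synthetic scenes that do not occur in the target real dataset; in the ShanghaiTech Part A experiments, applying DSR lowers the error compared with the same GAN-based adaptation without DSR.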
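The MAE/MSE pairs quoted above follow the usual crowd-counting convention, in which "MSE" is actually the square root of the mean squared counting error over the test images. A small pure-Python sketch of both metrics follows, together with the relative error reduction that is handy for comparing training schemes; the function names are illustrative.

```python
import math


def mae(gt_counts, pred_counts):
    """Mean absolute counting error over test images."""
    if len(gt_counts) != len(pred_counts) or not gt_counts:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(g - p) for g, p in zip(gt_counts, pred_counts)) / len(gt_counts)


def crowd_mse(gt_counts, pred_counts):
    """'MSE' as reported in crowd counting: root of the mean squared counting error."""
    if len(gt_counts) != len(pred_counts) or not gt_counts:
        raise ValueError("need two non-empty sequences of equal length")
    sq = sum((g - p) ** 2 for g, p in zip(gt_counts, pred_counts))
    return math.sqrt(sq / len(gt_counts))


def relative_reduction(before, after):
    """Fractional error reduction, e.g. 0.25 means 25% lower error."""
    return (before - after) / before


print(f"{relative_reduction(277, 199.8):.1%}")  # 27.9%
```

Applied to the MCNN numbers above, GCC pretraining cuts MAE on UCF-QNRF by about 27.9% relative to the 277 baseline.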

This review was generated by AI and reviewed by human editors.