QUICK REVIEW

[Paper Review] Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond

Xi Ouyang, Yu Cheng|arXiv (Cornell University)|Apr 5, 2018

Video Surveillance and Tracking Methods|References: 32|Cited by: 72

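ONE-SENTENCE SUMMARY

PS-GAN uses two discriminators and spatial pyramid pooling to synthesize realistic pedestrians in real scenes. The generated labeled data improves CNN-based pedestrian detectors when used for data augmentation, and the approach generalizes across datasets.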

ABSTRACT

State-of-the-art pedestrian detection models have achieved great success in many benchmarks. However, these models require lots of annotation information and the labeling process usually takes much time and efforts. In this paper, we propose a method to generate labeled pedestrian data and adapt them to support the training of pedestrian detectors. The proposed framework is built on the Generative Adversarial Network (GAN) with multiple discriminators, trying to synthesize realistic pedestrians and learn the background context simultaneously. To handle the pedestrians of different sizes, we adopt the Spatial Pyramid Pooling (SPP) layer in the discriminator. We conduct experiments on two benchmarks. The results show that our framework can smoothly synthesize pedestrians on background images of variations and different levels of details. To quantitatively evaluate our approach, we add the generated samples into training data of the baseline pedestrian detectors and show the synthetic images are able to improve the detectors' performance.

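MOTIVATION AND OBJECTIVES

Meet the need for labeled pedestrian data without heavy manual annotation.
Develop a GAN-based framework that synthesizes realistic pedestrians in background scenes.
Provide ground-truth bounding boxes for the synthesized pedestrians so they can be used to train detectors.
Ensure that generated pedestrians blend naturally with the background across scales and scenes.
Demonstrate data augmentation on Cityscapes and cross-dataset transfer (Cityscapes to Tsinghua-Daimler).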

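PROPOSED METHOD

Proposes Pedestrian-Synthesis-GAN (PS-GAN) with two discriminators: Db learns the background context, and Dp judges whether the pedestrian is realistic.
Uses a U-Net generator to fill a noise box placed where the pedestrian should appear in the image.
Crops the synthesized pedestrian from the generated image and applies Spatial Pyramid Pooling in Dp to handle pedestrians of different sizes.
Trains with a combination of an LSGAN loss for Db, a standard GAN loss for Dp, and an L1 reconstruction loss weighted by λ=100.
Adopts a Pix2Pix-style paired training setup that supervises synthesis inside a fixed bounding box.
Evaluates by augmenting the training data of Faster R-CNN detectors on Cityscapes with synthetic samples, and by cross-dataset testing on Tsinghua-Daimler.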
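
Two of the steps above can be made concrete with a small sketch: how spatial pyramid pooling turns pedestrian crops of any size into a fixed-length vector for Dp, and how the three loss terms might be combined on the generator side. This is an illustration under stated assumptions, not the authors' implementation. The function names, the pyramid levels (1, 2, 4), the use of max pooling, the plain-list feature map, the scalar discriminator outputs, and the non-saturating form of the standard GAN term are all assumptions made here. Only the loss types and λ=100 come from the summary.

```python
import math


def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map (a list of rows) into a fixed-length vector.

    For each pyramid level n, the map is split into an n x n grid and the
    maximum of each cell is kept. The output length is sum(n * n) over the
    levels, regardless of the input height and width.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceiling for the end keeps every
            # cell non-empty, even when the map is smaller than the grid.
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                pooled.append(
                    max(feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1))
                )
    return pooled


def generator_objective(db_score, dp_prob, fake_pixels, real_pixels, lam=100.0):
    """Combine the three generator-side terms into one scalar.

    db_score: raw score from Db on the generated image (LSGAN, target 1).
    dp_prob:  probability in (0, 1] from Dp that the cropped pedestrian is real.
    fake_pixels, real_pixels: flat sequences of pixel values, same length.
    """
    lsgan_term = 0.5 * (db_score - 1.0) ** 2   # least-squares term for Db
    gan_term = -math.log(dp_prob)              # assumed non-saturating GAN term for Dp
    l1_term = sum(abs(f - r) for f, r in zip(fake_pixels, real_pixels)) / len(real_pixels)
    return lsgan_term + gan_term + lam * l1_term


# Two crops of different sizes map to vectors of the same length (1 + 4 + 16 = 21).
small_crop = [[float(r * 5 + c) for c in range(5)] for r in range(3)]
large_crop = [[float(r + c) for c in range(12)] for r in range(30)]
print(len(spp_max_pool(small_crop)), len(spp_max_pool(large_crop)))
```

Because the pooled vector has the same length for every crop, the layers that follow it in a discriminator can have a fixed input size without resizing the pedestrian crop first.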

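EXPERIMENTAL RESULTS

Research Questions

RQ1: Can PS-GAN generate photorealistic pedestrians that blend smoothly into real backgrounds?
RQ2: Does augmenting real training data with PS-GAN-synthesized pedestrians improve detector performance?
RQ3: How well does PS-GAN generalize to a new dataset without additional annotation?
RQ4: How do the architectural choices (SPP in Dp, two discriminators, loss types) affect synthesis quality?
RQ5: Is there an optimal amount of synthetic data that improves detection without distorting the training data distribution?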

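Key Findings

PS-GAN produces sharp, photorealistic pedestrians that blend into the background better than those from the Pix2Pix baseline.
Adding PS-GAN-synthesized pedestrians to the training data consistently improves Faster R-CNN AP on the Cityscapes test data.
Cross-dataset experiments show that PS-GAN data generated from Cityscapes improves detection on the Tsinghua-Daimler Cyclist benchmark without additional annotation.
SPP in Dp combined with the LSGAN loss for Db yields better background fidelity and pedestrian detail than the other configurations.
Adding too many Pix2Pix-synthesized pedestrians can reduce performance, whereas PS-GAN maintains or improves AP as synthetic data is added.
Across the experiments, PS-GAN data improves detector performance more than Pix2Pix data.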
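
The augmentation experiments vary how many synthetic samples are added to the real training set. A minimal sketch of that mixing step is shown below, assuming each sample is an (image path, bounding boxes) pair. The function name, the file names, and the sweep sizes are hypothetical and chosen only for illustration; this summary does not state the sample counts used in the paper.

```python
import random


def augment_with_synthetic(real_samples, synthetic_samples, num_synthetic, seed=0):
    """Return the real samples plus a random subset of the synthetic samples.

    All real samples are kept. num_synthetic synthetic samples are drawn
    without replacement, and the combined list is shuffled.
    """
    synthetic_samples = list(synthetic_samples)
    if not 0 <= num_synthetic <= len(synthetic_samples):
        raise ValueError("num_synthetic must be between 0 and the number of synthetic samples")
    rng = random.Random(seed)  # fixed seed so a sweep is repeatable
    chosen = rng.sample(synthetic_samples, num_synthetic)
    mixed = list(real_samples) + chosen
    rng.shuffle(mixed)
    return mixed


# Hypothetical samples: (image path, list of [x, y, width, height] boxes).
real = [(f"real_{i}.png", [[10, 20, 30, 80]]) for i in range(4)]
synthetic = [(f"synthetic_{i}.png", [[40, 25, 28, 75]]) for i in range(10)]

# Sweep the amount of synthetic data; a detector would be trained on each mix.
for k in (0, 5, 10):
    training_set = augment_with_synthetic(real, synthetic, k)
    print(k, len(training_set))
```

Training one detector per value of k and comparing AP on a held-out real test set is one way to check whether extra synthetic data still helps or starts to hurt.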

This review was generated by AI and reviewed by human editors.