QUICK REVIEW

[Paper Review] Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond

Xi Ouyang, Yu Cheng | arXiv (Cornell University) | 2018. 04. 05.

Video Surveillance and Tracking Methods | References: 32 | Citations: 72

One-line summary

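PS-GAN uses two discriminators and spatial pyramid pooling to synthesize realistic pedestrians in real scenes, producing labeled data that improves CNN-based pedestrian detectors when used for data augmentation. The improvement generalizes across datasets.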

ABSTRACT

State-of-the-art pedestrian detection models have achieved great success in many benchmarks. However, these models require lots of annotation information and the labeling process usually takes much time and efforts. In this paper, we propose a method to generate labeled pedestrian data and adapt them to support the training of pedestrian detectors. The proposed framework is built on the Generative Adversarial Network (GAN) with multiple discriminators, trying to synthesize realistic pedestrians and learn the background context simultaneously. To handle the pedestrians of different sizes, we adopt the Spatial Pyramid Pooling (SPP) layer in the discriminator. We conduct experiments on two benchmarks. The results show that our framework can smoothly synthesize pedestrians on background images of variations and different levels of details. To quantitatively evaluate our approach, we add the generated samples into training data of the baseline pedestrian detectors and show the synthetic images are able to improve the detectors' performance.

Research motivation and goals

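Address the need for labeled pedestrian data without extensive annotation work.
Develop a GAN-based framework that synthesizes realistic pedestrians within background scenes.
Provide ground-truth bounding boxes for the synthetic pedestrians so that they can be used to train detectors.
Ensure that generated pedestrians blend with the background across varied scales and contexts.
Show the effect of data augmentation on Cityscapes and demonstrate cross-dataset transfer (from Cityscapes to Tsinghua-Daimler).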

Proposed method

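Propose Pedestrian-Synthesis-GAN (PS-GAN) with two discriminators: Db learns the background context and Dp judges whether the pedestrian looks real.
Use a U-Net generator to fill a noise box placed in the image where the pedestrian should appear.
Crop the synthesized pedestrian from the generated image and apply Spatial Pyramid Pooling in Dp to handle variable pedestrian sizes.
Train with a combination of an LSGAN loss for Db, a standard GAN loss for Dp, and an L1 reconstruction loss with λ=100.
Adopt a Pix2Pix-style paired training setup to supervise synthesis within fixed bounding boxes.
Evaluate by augmenting Faster R-CNN detector training on Cityscapes with synthetic data and by cross-dataset testing on Tsinghua-Daimler.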
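
The three-part training objective (LSGAN for Db, standard GAN for Dp, L1 with λ=100) can be sketched from the generator's side as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the adversarial terms are assumed to carry a weight of 1, the standard GAN term is assumed to be in its non-saturating form, and discriminator outputs and pixels are passed in as plain lists of floats.

```python
import math

L1_WEIGHT = 100.0  # lambda = 100, the only weight stated in the review


def lsgan_generator_loss(db_scores_fake):
    """Least-squares term: push Db's scores on generated images toward 1."""
    return sum((s - 1.0) ** 2 for s in db_scores_fake) / len(db_scores_fake)


def gan_generator_loss(dp_probs_fake, eps=1e-12):
    """Cross-entropy term (assumed non-saturating): minimize -log Dp(crop)."""
    return -sum(math.log(p + eps) for p in dp_probs_fake) / len(dp_probs_fake)


def l1_loss(generated, target):
    """Mean absolute difference between flattened generated and real pixels."""
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(target)


def generator_objective(db_scores_fake, dp_probs_fake, generated, target):
    """Sum of the three terms; adversarial weights of 1 are an assumption."""
    return (
        lsgan_generator_loss(db_scores_fake)
        + gan_generator_loss(dp_probs_fake)
        + L1_WEIGHT * l1_loss(generated, target)
    )


# Toy values: Db scores and Dp probabilities for two generated samples,
# plus four flattened pixel values inside the bounding box.
loss = generator_objective(
    db_scores_fake=[0.7, 0.9],
    dp_probs_fake=[0.6, 0.8],
    generated=[0.2, 0.4, 0.6, 0.8],
    target=[0.25, 0.4, 0.5, 0.8],
)
print(f"generator objective: {loss:.4f}")
```

With λ=100 the L1 term dominates even for small pixel errors, which is consistent with the Pix2Pix-style paired setup where the reconstruction term anchors the output to the real image.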

Experimental results

Research questions

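RQ1: Can PS-GAN generate photorealistic pedestrians that integrate seamlessly with real backgrounds?
RQ2: Does augmenting real training data with PS-GAN synthetic pedestrians improve detector performance?
RQ3: Does PS-GAN generalize to a new dataset without additional annotation?
RQ4: How do the architectural choices (SPP in Dp, dual discriminators, loss type) affect synthesis quality?
RQ5: What amount of synthetic data improves detection performance without degrading the data distribution?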

Key results

Data	Pix2Pix GAN	PS-GAN
1826 real images (7729 labels)	60.11%
+ 3000 synthetic pedestrians	59.95%	61.02%
+ 5000 synthetic pedestrians	60.23%	61.79%
+ 8000 synthetic pedestrians	58.41%	61.59%
Pascal VOC 2007	34.13%
Pascal VOC 2007 & 2012	36.85%
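
To make the comparison in the table easier to read, the snippet below computes the change relative to the real-data-only row for each amount of synthetic data. It is a small worked calculation, assuming the values are Faster R-CNN AP on Cityscapes and that the 60.11% row is the shared baseline for both columns; the Pascal VOC rows are left out because the table does not say which column they belong to. The names `BASELINE_AP`, `AP_WITH_SYNTHETIC`, and `ap_gains` are illustrative.

```python
BASELINE_AP = 60.11  # 1826 real images (7729 labels), no synthetic data

# AP (%) after adding synthetic pedestrians, copied from the table above
AP_WITH_SYNTHETIC = {
    3000: {"Pix2Pix GAN": 59.95, "PS-GAN": 61.02},
    5000: {"Pix2Pix GAN": 60.23, "PS-GAN": 61.79},
    8000: {"Pix2Pix GAN": 58.41, "PS-GAN": 61.59},
}


def ap_gains(baseline, table):
    """Return the change in AP, in percentage points, relative to the baseline."""
    return {
        count: {method: round(ap - baseline, 2) for method, ap in row.items()}
        for count, row in table.items()
    }


gains = ap_gains(BASELINE_AP, AP_WITH_SYNTHETIC)
for count, row in gains.items():
    print(f"+{count} synthetic: " + ", ".join(f"{m} {d:+.2f}" for m, d in row.items()))
```

Under these assumptions, PS-GAN is above the baseline at every amount (largest gain at 5000 samples), while Pix2Pix falls below it at 3000 and 8000 samples.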

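PS-GAN generates sharp, photorealistic pedestrians that fit the background better than the Pix2Pix baseline.
Adding PS-GAN synthetic pedestrians to training consistently improves Faster R-CNN AP on the Cityscapes test set (from 60.11% to between 61.02% and 61.79%).
In the cross-dataset experiment, PS-GAN data generated from Cityscapes improves detection performance on the Tsinghua-Daimler Cyclist Benchmark without additional annotation.
SPP in Dp and the LSGAN loss for Db yield better background fidelity and pedestrian detail than the alternative configurations.
With Pix2Pix, adding too many synthetic pedestrians can hurt performance (58.41% with 8000 samples), whereas PS-GAN stays above the real-data baseline at every amount tested.
Across the experiments, PS-GAN data improves detector performance more than Pix2Pix data.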
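
The SPP layer in Dp is what lets one discriminator score pedestrian crops of different sizes, because it turns any feature map into a vector of fixed length. The sketch below shows the idea on a single-channel map in plain Python. It is an illustration under stated assumptions, not the paper's layer: the pyramid levels (1, 2, 4), the use of max pooling, and the function name `spatial_pyramid_pool` are all choices made here, and a real network would apply the same pooling to every channel.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map (list of rows) into a fixed-length vector.

    For each pyramid level n, the map is split into an n x n grid of bins
    and the maximum of each bin is kept, so the output length is
    sum(n * n for n in levels) whatever the input height and width are.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor/ceil bin edges guarantee that every bin is non-empty,
            # even when the map is smaller than the grid.
            top, bottom = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                left, right = (j * width) // n, math.ceil((j + 1) * width / n)
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two hypothetical pedestrian crops of different sizes
small = [[r * 5 + c for c in range(5)] for r in range(7)]     # 7 x 5
large = [[r * 24 + c for c in range(24)] for r in range(40)]  # 40 x 24

# Both vectors have 1 + 4 + 16 = 21 entries
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Because the output length depends only on the pyramid levels, the fully connected layers that follow can stay the same for near and distant pedestrians without resizing the crops first.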


This review was generated by AI and checked by a human editor.