QUICK REVIEW

[Paper Review] Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

Bingchen Liu, Yizhe Zhu | arXiv (Cornell University) | 2021. 01. 12.

Generative Adversarial Networks and Image Synthesis | References: 53 | Citations: 109

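One-line summary

This paper proposes a lightweight GAN built around a Skip-Layer Channel-wise Excitation (SLE) module and a self-supervised discriminator trained as a feature encoder, which together make it possible to synthesize high-fidelity 1024×1024 images from few-shot data when training from scratch on limited hardware.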

ABSTRACT

Training Generative Adversarial Networks (GAN) on high-fidelity images usually requires large-scale GPU-clusters and a vast number of training images. In this paper, we study the few-shot image synthesis task for GAN with minimum computing cost. We propose a light-weight GAN structure that gains superior quality on 1024*1024 resolution. Notably, the model converges from scratch with just a few hours of training on a single RTX-2080 GPU, and has a consistent performance, even with less than 100 training samples. Two technique designs constitute our work, a skip-layer channel-wise excitation module and a self-supervised discriminator trained as a feature-encoder. With thirteen datasets covering a wide variety of image domains (The datasets and code are available at: https://github.com/odegeasslbc/FastGAN-pytorch), we show our model's superior performance compared to the state-of-the-art StyleGAN2, when data and computing budget are limited.

Research motivation and goals

Aim to train unconditional GANs for high-resolution images with limited data and modest compute.
Develop a lightweight generator-discriminator architecture that converges from scratch on a single GPU.
Improve training stability and synthesis quality under few-shot data regimes.
Enable automatic style-content disentanglement similar to StyleGAN through architectural design.

Proposed method

Introduce Skip-Layer Channel-wise Excitation (SLE) to re-calibrate high-resolution feature maps using low-resolution activations.
Make SLE operate across resolutions with long-range skip connections and channel-wise gating to improve gradient flow.
Add a self-supervised discriminator trained as a feature encoder: small decoders reconstruct real images from D's intermediate feature maps, and the resulting reconstruction loss regularizes D.
Train GAN with hinge adversarial loss and include a lightweight auto-encoding reconstruction objective for D.
Compare against StyleGAN2 and a strong DCGAN-derived baseline, focusing on few-shot and high-resolution settings.
Evaluate on 13 diverse datasets up to 1024×1024, using FID and LPIPS as metrics.
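
The SLE gating and the hinge objective listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the released code is in PyTorch): the function names, weight shapes, the LeakyReLU slope, and the assumption that the low-resolution map has height and width divisible by 4 are all chosen here for illustration. The discriminator's reconstruction term is passed in as a precomputed number rather than modeled.

```python
import numpy as np

def adaptive_avg_pool_4x4(x):
    """Average-pool a (C, H, W) map to (C, 4, 4); assumes H and W are multiples of 4."""
    c, h, w = x.shape
    return x.reshape(c, 4, h // 4, 4, w // 4).mean(axis=(2, 4))

def sle_gate(x_low, w1, w2, slope=0.1):
    """Compute one gate per high-resolution channel from a low-resolution map.

    w1 has shape (C_high, C_low, 4, 4) and acts like an unpadded 4x4 convolution;
    w2 has shape (C_high, C_high) and acts like a 1x1 convolution (both hypothetical).
    """
    pooled = adaptive_avg_pool_4x4(x_low)                   # (C_low, 4, 4)
    hidden = np.einsum("ochw,chw->o", w1, pooled)           # (C_high,)
    hidden = np.where(hidden > 0, hidden, slope * hidden)   # LeakyReLU
    logits = w2 @ hidden                                    # (C_high,)
    return 1.0 / (1.0 + np.exp(-logits))                    # sigmoid

def sle(x_low, x_high, w1, w2):
    """Re-calibrate the high-resolution map channel by channel."""
    gate = sle_gate(x_low, w1, w2)
    return x_high * gate[:, None, None]                     # broadcast over H and W

def d_hinge_loss(d_real, d_fake, recon_loss=0.0):
    """Hinge loss for D plus an externally computed reconstruction term."""
    real_term = np.mean(np.maximum(0.0, 1.0 - d_real))
    fake_term = np.mean(np.maximum(0.0, 1.0 + d_fake))
    return real_term + fake_term + recon_loss

def g_hinge_loss(d_fake):
    """Generator loss paired with the hinge loss: raise D's score on fakes."""
    return -np.mean(d_fake)
```

Because the gate is a single scalar per channel, the low-resolution input only needs to agree with the high-resolution map on the number of output channels, not on spatial size, which is what lets the skip connection span resolutions.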

Experimental results

Research questions

RQ1: Can a compact GAN with specialized architectural modules achieve high-fidelity 1024×1024 synthesis from limited data and small compute budgets?
RQ2: Do cross-resolution skip connections (SLE) and self-supervised discriminator training improve training stability and reduce mode collapse?
RQ3: How do the proposed techniques compare to StyleGAN2 and strong baselines under few-shot and small-data regimes?
RQ4: To what extent can the discriminator be regularized via self-supervision to benefit G without hindering adversarial training?

Key findings

The proposed model achieves superior synthesis quality versus the state-of-the-art StyleGAN2 under limited data and compute across multiple datasets.
SLE improves gradient flow and enables automatic content–style disentanglement, contributing to faster convergence.
Self-supervised D, especially auto-encoding, provides the largest performance boost and stabilizes training against mode collapse.
The method remains robust across high-resolution (1024×1024) and small datasets, often requiring only hours of training on a single GPU.
Qualitative and quantitative results show that the proposed model outperforms the baselines on many few-shot datasets and stays stable in settings where StyleGAN2 can fail to converge.

This review was generated by AI and checked by a human editor.