QUICK REVIEW

[Paper Review] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

Axel Sauer, K Schwarz | arXiv (Cornell University) | 2022. 02. 01.

Computer Graphics and Visualization Techniques | Citations: 20

One-Line Summary

StyleGAN-XL extends StyleGAN3 with Projected GANs, progressive growing, and classifier guidance to achieve state-of-the-art image synthesis on ImageNet-scale data, including at 1024^2 resolution, and enables image inversion and editing.

ABSTRACT

Computer graphics has experienced a recent surge of data-centric approaches for photorealistic and controllable content creation. StyleGAN in particular sets new standards for generative modeling regarding image quality and controllability. However, StyleGAN's performance severely degrades on large unstructured datasets such as ImageNet. StyleGAN was designed for controllability; hence, prior works suspect its restrictive design to be unsuitable for diverse datasets. In contrast, we find the main limiting factor to be the current training strategy. Following the recently introduced Projected GAN paradigm, we leverage powerful neural network priors and a progressive growing strategy to successfully train the latest StyleGAN3 generator on ImageNet. Our final model, StyleGAN-XL, sets a new state-of-the-art on large-scale image synthesis and is the first to generate images at a resolution of $1024^2$ at such a dataset scale. We demonstrate that this model can invert and edit images beyond the narrow domain of portraits or specific object classes.

Research Motivation and Goals

Investigate why StyleGAN struggles on large, unstructured datasets like ImageNet and identify bottlenecks.
Develop architectural and training strategies to enable stable, high-quality StyleGAN3 training on ImageNet-scale data.
Explore the benefits of Projected GANs, multi-network feature guidance, and classifier guidance for diverse data.
Demonstrate high-resolution synthesis (up to 1024^2) and enable inversion/editing on ImageNet classes.

Proposed Method

Adopt StyleGAN3 as the base generator, in its StyleGAN3-T configuration, to improve translational equivariance and reduce aliasing.
Employ Projected GAN training, whose feature projectors apply cross-channel and cross-scale mixing, to stabilize training on diverse data.
Reduce latent code dimensionality (z from 512 to 64) while keeping style code (w) at 512 to maintain capacity.
Condition the generator on pretrained class embeddings, rather than embeddings learned from scratch, to prevent embedding collapse and improve class diversity.
Reintroduce progressive growing with a tailored schedule to manage aliasing and scale from 16^2 to 1024^2, coupled with large-batch training at low resolutions.
Apply classifier guidance by adding a cross-entropy loss from a pretrained classifier to the generator, scaled by a factor (lambda).
Combine multiple pretrained feature networks (EfficientNet and DeiT) as the feature network F, providing complementary representations for Projected GAN training.
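
Two of the steps above, the progressive growing schedule and the classifier-guidance term, can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the schedule assumes the resolution doubles at every stage, and the adversarial loss, logits, and `lam` values are placeholders chosen only for the example.

```python
import math


def growth_resolutions(start=16, final=1024):
    """Resolutions visited when doubling from `start` up to `final`.

    Assumption: each progressive-growing stage doubles the resolution.
    """
    if start <= 0 or final < start:
        raise ValueError("expected 0 < start <= final")
    resolutions = [start]
    while resolutions[-1] < final:
        resolutions.append(resolutions[-1] * 2)
    if resolutions[-1] != final:
        raise ValueError("final must be start times a power of two")
    return resolutions


def cross_entropy(logits, target):
    """Cross-entropy for one sample, computed from raw classifier logits."""
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[target]


def guided_generator_loss(adv_loss, logits, target, lam):
    """Adversarial generator loss plus `lam` times the classifier cross-entropy."""
    return adv_loss + lam * cross_entropy(logits, target)


if __name__ == "__main__":
    print(growth_resolutions(16, 1024))
    # Placeholder values: a pretrained classifier would supply the logits.
    print(guided_generator_loss(adv_loss=1.0, logits=[2.0, 0.5, -1.0], target=0, lam=0.5))
```

In a real training loop the logits would come from a frozen pretrained classifier applied to generated images, and the extra term would be back-propagated into the generator only.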

Experimental Results

Research Questions

RQ1: Can StyleGAN3 be effectively scaled to ImageNet-scale datasets without losing image quality or diversity?
RQ2: What training strategies (Projected GANs, progressive growing, feature-network fusion, classifier guidance) best enable high-quality, diverse generation on large, unstructured data?
RQ3: How does class-conditioning interact with Projected GANs and latent-space design to improve mode coverage on ImageNet?
RQ4: Can inversion and editing be effectively performed on ImageNet-scale models, including out-of-domain inputs?
RQ5: What resolution and computation are required to reach state-of-the-art performance on ImageNet-scale synthesis?

Key Findings

StyleGAN-XL achieves state-of-the-art image synthesis on ImageNet across multiple resolutions, including 1024^2.
Projected GANs with a low-dimensional latent z and pretrained class embeddings stabilize training and improve sample diversity.
Combining an EfficientNet and a DeiT backbone for feature projections yields the best FID/IS trade-off among the ablations.
Progressive growing with aliasing control substantially reduces training time and enables megapixel synthesis.
Classifier guidance further improves image fidelity at higher resolutions.
Inversion via Pivotal Tuning Inversion (PTI) yields faithful reconstructions, smooth interpolations, and smooth latent-space edits, including for out-of-domain inputs.

This review was generated by AI and reviewed by a human editor.