QUICK REVIEW

[Paper Review] PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications

Tim Salimans, Andrej Karpathy | arXiv (Cornell University) | 2017-01-19
Citations: 567
One-Line Summary

PixelCNN++ introduces a discretized logistic mixture likelihood together with further architectural changes (whole-pixel conditioning, downsampling with skip connections, and dropout), improving PixelCNN performance on CIFAR-10 and achieving state-of-the-art log-likelihood.

ABSTRACT

PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixel-cnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional short-cut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications.
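The discretized logistic mixture likelihood in item 1 can be sketched for a single sub-pixel value as follows. This is a minimal illustration, not the code from the linked repository: it works directly in 0-255 pixel units, takes hypothetical mixture parameters as plain lists, and lets the edge values 0 and 255 absorb the full left and right tails, so the 256 bin probabilities sum to one.

```python
import math

def sigmoid(z: float) -> float:
    # Logistic sigmoid via tanh, which avoids overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def discretized_logistic_mixture_prob(x: int, weights, means, scales) -> float:
    """Probability of an integer sub-pixel value x in [0, 255] under a mixture of
    logistic distributions discretized into unit-width bins [x - 0.5, x + 0.5].
    The edge bins 0 and 255 absorb the left and right tails respectively."""
    if not 0 <= x <= 255:
        raise ValueError("x must be an integer in [0, 255]")
    prob = 0.0
    for pi, mu, s in zip(weights, means, scales):
        # Logistic CDF at the upper bin edge (treated as +inf when x == 255).
        upper = 1.0 if x == 255 else sigmoid((x + 0.5 - mu) / s)
        # Logistic CDF at the lower bin edge (treated as -inf when x == 0).
        lower = 0.0 if x == 0 else sigmoid((x - 0.5 - mu) / s)
        prob += pi * (upper - lower)
    return prob

# Hypothetical two-component mixture; the weights must sum to 1.
weights, means, scales = [0.3, 0.7], [50.0, 200.0], [5.0, 10.0]
p = discretized_logistic_mixture_prob(200, weights, means, scales)
print(p, -math.log(p))  # probability and negative log-likelihood in nats
```

Because adjacent bins share a boundary, the sum over all 256 values telescopes to exactly 1. In this sketch each component needs only a weight, a mean and a scale, compared with 256 logits per sub-pixel for a softmax. A practical implementation would typically evaluate these CDF differences in log space to avoid precision loss when a bin lies far in a component's tail.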

Motivation and Goals

  • Improve the PixelCNN family of tractable-likelihood models, yielding better log-likelihood and perceptual quality on images.
  • Simplify the model structure while speeding up training and improving convergence.
  • Explore multi-resolution processing and regularization techniques to boost performance.
  • Demonstrate state-of-the-art log-likelihood on CIFAR-10 with the proposed changes.

Proposed Method

  • Use a discretized mixture of logistic distributions for pixel likelihood instead of a 256-way softmax.
  • Condition on whole pixels (R, G, B together) and capture dependencies between color channels by making each channel's mean depend linearly on the preceding channels.
  • Incorporate downsampling with stride-2 convolutions to capture multi-resolution structure.
  • Add long-range shortcut connections to recover information lost by downsampling/upsampling.
  • Apply dropout regularization to reduce overfitting and improve the perceptual quality of generated samples.
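The whole-pixel conditioning in the second bullet can be sketched as below, assuming one set of mixture weights shared by all three channels, with the green mean shifted linearly by the red value and the blue mean shifted linearly by the red and green values. The coefficient names alpha, beta and gamma, the dictionary layout, the 0-255 pixel units and the example numbers are illustrative assumptions, not the reference implementation.

```python
import math

def sigmoid(z: float) -> float:
    # Logistic sigmoid via tanh, which avoids overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def discretized_logistic_prob(x: int, mu: float, s: float) -> float:
    """Probability of integer x in [0, 255] under one discretized logistic."""
    upper = 1.0 if x == 255 else sigmoid((x + 0.5 - mu) / s)
    lower = 0.0 if x == 0 else sigmoid((x - 0.5 - mu) / s)
    return upper - lower

def pixel_prob(rgb, weights, components) -> float:
    """P(r, g, b) for a whole pixel. Each mixture weight is shared by all three
    channels; within a component, the green mean shifts linearly with r and the
    blue mean shifts linearly with r and g."""
    r, g, b = rgb
    total = 0.0
    for pi, c in zip(weights, components):
        mu_g = c["mu_g"] + c["alpha"] * r
        mu_b = c["mu_b"] + c["beta"] * r + c["gamma"] * g
        total += pi * (discretized_logistic_prob(r, c["mu_r"], c["s_r"])
                       * discretized_logistic_prob(g, mu_g, c["s_g"])
                       * discretized_logistic_prob(b, mu_b, c["s_b"]))
    return total

# Hypothetical single-component parameters (in 0-255 pixel units).
component = {"mu_r": 100.0, "mu_g": 0.0, "mu_b": 0.0,
             "alpha": 1.0, "beta": 0.5, "gamma": 0.5,
             "s_r": 2.0, "s_g": 2.0, "s_b": 2.0}
print(pixel_prob((100, 100, 100), [1.0], [component]))  # channels agree: high
print(pixel_prob((100, 30, 100), [1.0], [component]))   # green departs: near zero
```

With these hypothetical parameters, a pixel whose green and blue values track its red value receives far more probability than one whose green value departs from it; that inter-channel correlation is what the linear terms let a single per-pixel output express.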

Experimental Results

Research Questions

  • RQ1: How does the discretized logistic mixture likelihood compare with a 256-way softmax in training speed and log-likelihood?
  • RQ2: What is the impact of conditioning on whole pixels rather than sub-pixels on model capacity and sample quality?
  • RQ3: Do downsampling and short-cut connections provide benefits comparable to dilated convolutions for multi-resolution modeling?
  • RQ4: What is the effect of dropout on training stability and generated image quality?
  • RQ5: What log-likelihood do PixelCNN++ and its variants achieve on CIFAR-10 compared with the previous state of the art?

Key Results

Negative log-likelihood on CIFAR-10 (bits per sub-pixel; lower is better)

Model                            Bits per sub-pixel
Deep Diffusion                   5.40
NICE                             4.48
DRAW                             4.13
Deep GMMs                        4.00
Conv DRAW                        3.58
Real NVP                         3.49
PixelCNN (van den Oord et al.)   3.14
VAE with IAF                     3.11
Gated PixelCNN                   3.03
PixelRNN                         3.00
PixelCNN++                       2.92
  • PixelCNN++ achieves 2.92 bits per sub-pixel on CIFAR-10, improving over prior PixelCNN variants.
  • Class-conditional PixelCNN++ reaches 2.94 bits per sub-pixel, and its samples are qualitatively distinct across classes.
  • The softmax-based ablation trains more slowly and is less efficient than the discretized logistic mixture in this setting.
  • Modeling dequantized (noise-added) pixels with a continuous logistic mixture gives a variational bound of only 3.11 bits per sub-pixel, worse than with the discretized likelihood.
  • Without the short-cut connections, the downsampling model makes no training progress, showing that these connections are essential once downsampling is used.
  • Without dropout, the model overfits: training likelihood is high, but the perceptual quality of generated images is poorer.
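The table and bullets report bits per sub-pixel (also called bits per dimension): the negative log-likelihood in nats, summed over the evaluated images, divided by the total number of sub-pixels and by ln 2. A small helper is sketched below; the function name is hypothetical, and the defaults assume CIFAR-10's 32 × 32 × 3 = 3,072 sub-pixels per image.

```python
import math

def bits_per_subpixel(total_nll_nats: float, num_images: int,
                      height: int = 32, width: int = 32, channels: int = 3) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per sub-pixel."""
    num_subpixels = num_images * height * width * channels
    # Divide by ln 2 to turn nats into bits, then average over all sub-pixels.
    return total_nll_nats / (num_subpixels * math.log(2.0))

# Example: a uniform model over 256 values, evaluated on 10 CIFAR-10-sized images.
uniform_nll = 10 * 32 * 32 * 3 * math.log(256.0)
print(bits_per_subpixel(uniform_nll, num_images=10))
```

As a reference point, a model that assigns every value the uniform probability 1/256 scores 8 bits per sub-pixel, and the gap between PixelRNN (3.00) and PixelCNN++ (2.92) amounts to about 0.08 × 3,072 ≈ 246 bits per CIFAR-10 image.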


This review was generated by AI and reviewed by a human editor.