QUICK REVIEW

[Paper Review] Avoiding Latent Variable Collapse With Generative Skip Models

Adji Bousso Dieng, Yoon Kim | arXiv (Cornell University) | July 12, 2018

Generative Adversarial Networks and Image Synthesis | References: 32 | Citations: 35

One-line summary

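This paper proposes skip variational autoencoders (Skip-VAEs), which add skip connections to the generative model to prevent latent variable collapse in VAEs. By enforcing a stronger dependence between the latent variables and the observations, Skip-VAEs increase mutual information and yield more meaningful representations. On MNIST, Omniglot, and the Yahoo text dataset, they learn better representations than standard VAEs while keeping likelihood performance similar.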

ABSTRACT

Variational autoencoders learn distributions of high-dimensional data. They model data with a deep latent-variable model and then fit the model by maximizing a lower bound of the log marginal likelihood. VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially if the likelihood model is powerful. Specifically, the lower bound involves an approximate posterior of the latent variables; this posterior "collapses" when it is set equal to the prior, i.e., when the approximate posterior is independent of the data. While VAEs learn good generative models, latent variable collapse prevents them from learning useful representations. In this paper, we propose a simple new way to avoid latent variable collapse by including skip connections in our generative model; these connections enforce strong links between the latent variables and the likelihood function. We study generative skip models both theoretically and empirically. Theoretically, we prove that skip models increase the mutual information between the observations and the inferred latent variables. Empirically, we study images (MNIST and Omniglot) and text (Yahoo). Compared to existing VAE architectures, we show that generative skip models maintain similar predictive performance but lead to less collapse and provide more meaningful representations of the data.
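For reference, the lower bound and the collapse condition described in the abstract can be written in standard VAE notation. The symbols below (generative parameters θ, inference parameters φ, prior p(z)) are the usual convention and are assumed here rather than quoted from the paper.

```latex
\begin{align}
\log p_\theta(x) &\ge \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big) \\
\text{collapse:}\quad q_\phi(z \mid x) &= p(z)
  \;\Rightarrow\; \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big) = 0
\end{align}
```

When the KL term is zero for every x, the approximate posterior carries no information about the data, which is why collapse hurts representation learning even when the likelihood is good.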

Motivation and objectives

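Address latent variable collapse in VAEs, where the approximate posterior collapses to the prior and fails to capture a meaningful representation of the data.
Strengthen the link between the latent variables and the observed data to improve the representational power of VAEs.
Increase the mutual information between the observations and the inferred latent variables by adding skip connections to the likelihood model.
Verify that Skip-VAEs reduce collapse in deep models and high-dimensional latent spaces while maintaining strong likelihood performance.
Evaluate the effect of combining skip connections with more advanced training techniques such as semi-amortized inference (sa-VAE).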

Proposed method

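Introduce skip connections that link the latent variable z to the hidden states of multiple layers of the generative network.
Build a generative skip model: a deep network in which z is fed into the intermediate layers, similar in spirit to a residual structure. In this model the likelihood pθ(x|z) is parameterized by a deep network with skip connections.
Train the model with amortized variational inference by maximizing the evidence lower bound (ELBO), jointly optimizing the generative model parameters θ and the inference network parameters φ.
Use a spherical Gaussian prior p(z) = N(0, I) and deep networks to parameterize the likelihood and the approximate posterior.
Combine skip connections with semi-amortized inference (sa-VAE) to improve posterior quality and reduce collapse.
Evaluate on MNIST and the Yahoo text data using mutual information, KL divergence, active-unit analysis, and downstream classification accuracy.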
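The core architectural idea above, feeding z into every layer of the generative network, can be illustrated with a small sketch. This is not the authors' implementation: the class name `SkipDecoder`, the tanh activations, the concatenation of z with each hidden state, the Bernoulli output, and all layer sizes are assumptions chosen only to keep the example short and dependency-free.

```python
import math
import random


def linear(weights, bias, vec):
    """Affine map: weights is a list of rows, vec is a list of floats."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class SkipDecoder:
    """Toy generative network in which z is re-injected at every layer."""

    def __init__(self, z_dim, hidden_dim, x_dim, n_layers, seed=0):
        rng = random.Random(seed)
        self.first = init_layer(rng, hidden_dim, z_dim)
        # Every later layer takes [h, z] as input: the extra z is the skip connection.
        self.hidden = [init_layer(rng, hidden_dim, hidden_dim + z_dim)
                       for _ in range(n_layers - 1)]
        self.out = init_layer(rng, x_dim, hidden_dim + z_dim)

    def forward(self, z):
        h = [math.tanh(a) for a in linear(*self.first, z)]
        for layer in self.hidden:
            h = [math.tanh(a) for a in linear(*layer, h + z)]
        logits = linear(*self.out, h + z)
        # Bernoulli means, a common (assumed) choice for binarised image pixels.
        return [1.0 / (1.0 + math.exp(-a)) for a in logits]


decoder = SkipDecoder(z_dim=2, hidden_dim=8, x_dim=4, n_layers=3)
probs = decoder.forward([0.5, -1.0])
```

Concatenating z with the hidden state before a linear layer is equivalent to adding a separate linear function of z to that layer's pre-activation, so z has a direct path to every layer instead of reaching the output only through the first one.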

Experimental results

Research questions

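RQ1: Does adding skip connections to the generative model reduce latent variable collapse in VAEs?
RQ2: How much do skip connections increase the mutual information between the observed data and the inferred latent variables?
RQ3: How do the representation quality and likelihood performance of Skip-VAEs compare with those of standard VAEs and sa-VAE?
RQ4: Do the benefits of skip connections grow as the model depth or the number of latent dimensions increases?
RQ5: Can skip connections effectively mitigate collapse in sequential (autoregressive) VAEs for text generation?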

Key findings

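On MNIST, the Skip-VAE reached 98.10% classification accuracy when the posterior mean was used as the feature vector, compared with 97.19% for the standard VAE.
With a weaker model (MLP-based encoder and decoder), the Skip-VAE reached 98.25% accuracy, compared with 97.70% for the standard VAE.
On the Yahoo text dataset, the Skip-sa-VAE made effective use of all 64 latent dimensions, whereas the standard sa-VAE showed lower mutual information and fewer active units at high dimensionality.
As the number of latent dimensions increases, Skip-VAEs maintain or improve mutual information and reduce the collapse metrics, whereas the standard VAE degrades at high dimensionality.
The Skip-sa-VAE showed higher mutual information and better collapse mitigation than sa-VAE alone, indicating that skip connections and semi-amortized inference are complementary.
In t-SNE visualizations, the latent representations of the Skip-VAE formed more structured, more clearly class-separated clusters than those of the standard VAE.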
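Two of the collapse diagnostics reported above, the KL divergence and the number of active units, are simple to compute from the encoder outputs. The sketch below assumes a diagonal-Gaussian approximate posterior and the N(0, I) prior; the function names are hypothetical, and the variance threshold of 0.01 is a commonly used cutoff assumed here, since the review does not state the exact criterion used in the paper.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, logvar))


def active_units(posterior_means, threshold=0.01):
    """Count latent dimensions whose posterior mean varies across the data.

    posterior_means: one list of per-dimension posterior means per data point.
    threshold: assumed cutoff on the variance of the mean across data points.
    """
    n = len(posterior_means)
    dims = len(posterior_means[0])
    count = 0
    for d in range(dims):
        column = [row[d] for row in posterior_means]
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / n
        if variance > threshold:
            count += 1
    return count
```

A fully collapsed dimension has a posterior mean that does not move with the input and a KL contribution of zero, so it adds nothing to either metric. For the downstream classification results, the same per-example posterior means serve as the feature vectors.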


This review was generated by AI and checked by a human editor.