QUICK REVIEW

[Paper Review] Auto-Encoding Variational Bayes

Diederik P. Kingma, Max Welling | UvA-DARE (University of Amsterdam) | December 20, 2013

Gaussian Processes and Bayesian Inference | References: 15 | Citations: 15,549

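One-Line Summary

Introduces a stochastic variational inference framework, built on the reparameterization-based SGVB estimator and the Auto-Encoding VB (AEVB) algorithm, that scales to large datasets and enables efficient inference over continuous latent variables. When a neural network is used as the recognition model, this yields the variational auto-encoder.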

ABSTRACT

How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.

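Motivation and Objectives

Provide efficient approximate inference and learning for directed probabilistic models with continuous latent variables.
Handle intractable posteriors and large datasets without expensive per-datapoint inference.
Introduce the reparameterization trick to obtain low-variance gradient estimates of the variational bound.
Develop, via AEVB, a framework for i.i.d. datasets that jointly trains the recognition model q_phi(z|x) and the generative model p_theta(x|z).
Highlight the connection to auto-encoders and demonstrate the method on image datasets.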

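Proposed Method

Derive the variational lower bound (ELBO) on the marginal likelihood and decompose it into a KL term and a reconstruction term.
Introduce the SGVB estimator, which reparameterizes z as z = g_phi(epsilon, x) so that the Monte Carlo estimate of the bound is differentiable.
Present two SGVB variants: (A) a generic estimator and (B) a KL-regularized estimator whose gradients typically have lower variance.
Propose the AEVB algorithm, which jointly trains q_phi(z|x) and p_theta(x|z) by minibatch stochastic gradient ascent.
For continuous latent variables, implement a neural-network encoder with a Gaussian approximate posterior q_phi(z|x) and a Gaussian prior p_theta(z), so that the KL term has a closed form where possible.
Scale to large datasets with minibatch training (N datapoints, minibatches of size M), optimizing with SGD or Adagrad.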
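The steps above can be written out in symbols. This is a sketch in the notation of the original paper; the datapoint index $(i)$, the sample index $(l)$, and the latent dimensionality $J$ are not otherwise used in this review, and the last line assumes a diagonal Gaussian $q_\phi(z \mid x)$ with a standard normal prior $p_\theta(z) = \mathcal{N}(0, I)$.

The marginal log-likelihood of a datapoint splits into a non-negative KL gap and the lower bound, which in turn splits into a KL term and a reconstruction term:

$$
\log p_\theta(x^{(i)}) = D_{KL}\big(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z \mid x^{(i)})\big) + \mathcal{L}(\theta, \phi; x^{(i)}) \;\ge\; \mathcal{L}(\theta, \phi; x^{(i)})
$$

$$
\mathcal{L}(\theta, \phi; x^{(i)}) = -D_{KL}\big(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\big) + \mathbb{E}_{q_\phi(z \mid x^{(i)})}\big[\log p_\theta(x^{(i)} \mid z)\big]
$$

Estimator (B) keeps the KL term analytic and replaces only the expectation with $L$ reparameterized samples:

$$
\tilde{\mathcal{L}}^{B}(\theta, \phi; x^{(i)}) = -D_{KL}\big(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\big) + \frac{1}{L} \sum_{l=1}^{L} \log p_\theta(x^{(i)} \mid z^{(i,l)}), \qquad z^{(i,l)} = g_\phi(\epsilon^{(i,l)}, x^{(i)}), \quad \epsilon^{(i,l)} \sim p(\epsilon)
$$

For a dataset $X$ of $N$ datapoints and a minibatch of size $M$, the full-data bound is approximated by rescaling the minibatch sum:

$$
\mathcal{L}(\theta, \phi; X) \approx \frac{N}{M} \sum_{i=1}^{M} \tilde{\mathcal{L}}(\theta, \phi; x^{(i)})
$$

In the Gaussian case, both the reparameterization and the KL term have simple forms:

$$
z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I), \qquad -D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big) = \frac{1}{2} \sum_{j=1}^{J} \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big)
$$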

Figure 2: Comparison of our AEVB method to the wake-sleep algorithm, in terms of optimizing the lower bound, for different dimensionality of latent space ( $N_{\mathbf{z}}$ ). Our method converged considerably faster and reached a better solution in all experiments. Interestingly enough, more latent

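Experimental Results

Research Questions

RQ1: Can inference and learning be performed efficiently in directed models with continuous latent variables even when the posterior is intractable?
RQ2: Does reparameterizing the variational lower bound yield a differentiable, low-variance gradient estimator suited to stochastic optimization?
RQ3: Does training an approximate inference model (recognition model) jointly with the generative model enable fast per-datapoint inference (AEVB)?
RQ4: On real datasets such as MNIST and Frey Face, how does the variational auto-encoder framework perform compared with existing learning methods such as wake-sleep and Monte Carlo EM (MCEM)?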

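Key Findings

The SGVB estimator is a differentiable, unbiased estimator of the lower bound that can be optimized with standard stochastic gradient methods.
Using a recognition model q_phi(z|x) together with reparameterization makes per-datapoint inference and learning efficient, which gives the AEVB algorithm.
The KL term acts as a regularizer, while the reconstruction term is estimated by sampling, typically with L=1 sample per datapoint.
AEVB trains on minibatches, so it scales to large datasets, and in the experiments it converged faster and reached a better lower bound than wake-sleep.
Experiments on MNIST and Frey Face show that adding latent variables did not lead to more overfitting, which the authors attribute to the regularizing effect of the variational bound; AEVB achieves competitive lower bounds and marginal likelihood estimates.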
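To make the roles of the KL term and the sampled reconstruction term concrete, here is a minimal standard-library Python sketch of the KL-regularized estimate for one datapoint with L=1. It assumes a diagonal Gaussian q_phi(z|x), a standard normal prior, and Bernoulli outputs. The function names and `toy_decode` (a fixed linear map plus sigmoid standing in for a trained decoder) are hypothetical and are not taken from the paper. The sketch only evaluates the estimate; actual training would need an automatic-differentiation framework to take gradients through it.

```python
import math
import random


def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one datapoint."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) per dimension.

    All randomness lives in eps, so z is a deterministic function of
    (mu, log_var) once eps is fixed.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def bernoulli_log_lik(x, probs, tiny=1e-7):
    """log p(x|z) for binary x under independent Bernoulli outputs."""
    total = 0.0
    for xi, p in zip(x, probs):
        p = min(max(p, tiny), 1.0 - tiny)  # keep log() finite
        total += xi * math.log(p) + (1.0 - xi) * math.log(1.0 - p)
    return total


def sgvb_estimate(x, mu, log_var, decode, rng, num_samples=1):
    """Single-datapoint estimate: -KL + mean sampled reconstruction log-likelihood."""
    recon = sum(
        bernoulli_log_lik(x, decode(reparameterize(mu, log_var, rng)))
        for _ in range(num_samples)
    ) / num_samples
    return -gaussian_kl(mu, log_var) + recon


def toy_decode(z):
    """Hypothetical stand-in decoder: fixed 2-to-3 linear map followed by a sigmoid."""
    weights = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
    return [1.0 / (1.0 + math.exp(-sum(w * zj for w, zj in zip(row, z)))) for row in weights]


rng = random.Random(0)
x = [1.0, 0.0, 1.0]
# In a real model, mu and log_var would be produced by the encoder from x.
estimate = sgvb_estimate(x, mu=[0.2, -0.1], log_var=[-0.5, 0.3], decode=toy_decode, rng=rng)
print(estimate)
```

Because the KL term is non-negative and the Bernoulli log-likelihood is negative, the printed estimate is always below zero. Raising `num_samples` averages more reconstruction samples per datapoint at proportionally higher cost.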

Figure 3: Comparison of AEVB to the wake-sleep algorithm and Monte Carlo EM, in terms of the estimated marginal likelihood, for a different number of training points. Monte Carlo EM is not an on-line algorithm, and (unlike AEVB and the wake-sleep method) can’t be applied efficiently for the full MNI


This review was generated by AI and reviewed by a human editor.