QUICK REVIEW

[Paper Review] Learning Generative Models with Sinkhorn Divergences

Aude Genevay, Gabriel Peyré | arXiv (Cornell University) | June 1, 2017

Generative Adversarial Networks and Image Synthesis | References: 8 | Citations: 73

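One-line summary

The paper introduces the Sinkhorn loss, an entropy-regularized OT objective for training generative models that interpolates between OT and MMD losses and is computed with Sinkhorn iterations and automatic differentiation, giving stable and scalable training.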

ABSTRACT

The ability to compare two degenerate probability distributions (i.e. two probability distributions supported on two distinct low-dimensional manifolds living in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations such as those arising in computer vision or natural language. It is known that optimal transport metrics can represent a cure for this problem, since they were specifically designed as an alternative to information divergences to handle such problematic scenarios. Unfortunately, training generative machines using OT raises formidable computational and statistical challenges, because of (i) the computational burden of evaluating OT losses, (ii) the instability and lack of smoothness of these losses, (iii) the difficulty to estimate robustly these losses and their gradients in high dimension. This paper presents the first tractable computational method to train large scale generative models using an optimal transport loss, and tackles these three issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed point iterations; (b) algorithmic (automatic) differentiation of these iterations. These two approximations result in a robust and differentiable approximation of the OT loss with streamlined GPU execution. Entropic smoothing generates a family of losses interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus allowing to find a sweet spot leveraging the geometry of OT and the favorable high-dimensional sample complexity of MMD which comes with unbiased gradient estimates. The resulting computational architecture complements nicely standard deep network generative models by a stack of extra layers implementing the loss function.
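
The entropic smoothing and the resulting loss family mentioned in the abstract can be written compactly. The block below is a sketch in assumed notation (mu and nu are the data and model distributions, c is the ground cost, Pi(mu, nu) is the set of couplings with those marginals); it follows one common formulation and may differ in detail from the paper's exact statement.

```latex
% Entropy-regularized OT with ground cost c (notation assumed for this review)
\pi_\varepsilon = \operatorname*{arg\,min}_{\pi \in \Pi(\mu,\nu)}
  \int c(x,y)\,\mathrm{d}\pi(x,y) + \varepsilon\,\mathrm{KL}(\pi \,\|\, \mu \otimes \nu),
\qquad
W_{c,\varepsilon}(\mu,\nu) = \int c(x,y)\,\mathrm{d}\pi_\varepsilon(x,y)

% Normalized Sinkhorn loss
\bar{W}_{c,\varepsilon}(\mu,\nu)
  = 2\,W_{c,\varepsilon}(\mu,\nu) - W_{c,\varepsilon}(\mu,\mu) - W_{c,\varepsilon}(\nu,\nu)
```

The subtraction of the two self-terms is what makes the loss vanish when the two distributions coincide.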

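Motivation and goals

Motivates the use of optimal transport (OT) geometry for fitting generative models when the target distribution may be singular or supported on a low-dimensional manifold.
Introduces a differentiable, robust OT-based loss (the Sinkhorn loss) for high-dimensional generative modeling.
Provides a scalable, SGD-friendly algorithm that combines mini-batch estimates with differentiable Sinkhorn iterations.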

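Proposed method

Defines the Sinkhorn loss as a normalized, entropy-regularized OT distance and shows that its limits recover OT (epsilon -> 0) and MMD (epsilon -> infinity).
Formulates density fitting as minimizing the Sinkhorn loss between the push-forward of the model and the data distribution.
Uses entropic smoothing so that the loss can be evaluated with Sinkhorn iterations on the Gibbs kernel, which keeps the optimization differentiable and GPU-friendly.
Approximates the loss with mini-batches and L Sinkhorn iterations, yielding a differentiable surrogate that automatic differentiation can handle.
Optionally learns a parametric cost c_phi through a feature map f_phi to improve the distance between generated and real samples (minimax over theta, phi).
Provides an algorithm of complexity O(L m n) that embeds the Sinkhorn steps in standard SGD and remains compatible with automatic differentiation.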
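
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the squared Euclidean cost, the uniform mini-batch weights, and the default values of eps and n_iters are all assumptions made for the example. NumPy does not differentiate anything, so the sketch only covers the forward pass; in an automatic differentiation framework the same loop would be differentiated through.

```python
import numpy as np

def cost_matrix(x, y):
    """Pairwise squared Euclidean cost, shape (n, m). The cost choice is an assumption."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def sinkhorn_cost(x, y, eps, n_iters):
    """Cost of the approximate entropic plan after n_iters >= 1 Sinkhorn steps."""
    n, m = x.shape[0], y.shape[0]
    C = cost_matrix(x, y)
    K = np.exp(-C / eps)                 # Gibbs kernel
    mu = np.full(n, 1.0 / n)             # uniform weights on each mini-batch
    nu = np.full(m, 1.0 / m)
    a, b = np.ones(n), np.ones(m)
    for _ in range(n_iters):             # two matrix-vector products per step: O(n m)
        a = mu / (K @ b)
        b = nu / (K.T @ a)
    plan = a[:, None] * K * b[None, :]   # approximate transport plan
    return float(np.sum(plan * C))

def sinkhorn_loss(x, y, eps=1.0, n_iters=50):
    """Normalized loss: 2 W(x, y) - W(x, x) - W(y, y)."""
    return (2.0 * sinkhorn_cost(x, y, eps, n_iters)
            - sinkhorn_cost(x, x, eps, n_iters)
            - sinkhorn_cost(y, y, eps, n_iters))

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))             # stand-in for a mini-batch of real samples
y = rng.normal(size=(48, 2)) + 3.0       # stand-in for generated samples, shifted
print(sinkhorn_loss(x, y), sinkhorn_loss(x, x))
```

Running L iterations on an n-by-m kernel gives the O(L m n) cost quoted above. For small eps the plain kernel exp(-C / eps) can underflow, so a log-domain variant is usually preferable in that regime.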

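Experimental results

Research questions

RQ1: Can entropic regularization yield a tractable, differentiable OT-based loss for training generative models on high-dimensional data?
RQ2: How does the Sinkhorn loss interpolate between OT and MMD, and what are the practical implications for sample complexity and gradient stability?
RQ3: Can a data-driven ground cost be learned to improve the alignment between generated samples and the real distribution?
RQ4: Is Sinkhorn-based training feasible on standard hardware using mini-batches and automatic differentiation?
RQ5: How do epsilon, the batch size, and the number of Sinkhorn iterations affect convergence and generation quality?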

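Key results

The Sinkhorn loss interpolates smoothly between OT (epsilon -> 0) and MMD (epsilon -> infinity), trading off geometry against sample efficiency.
Entropic smoothing reduces gradient bias and improves behavior in high dimension, which makes training through Sinkhorn iterations stable.
A practical algorithm based on automatic differentiation, using mini-batches and L Sinkhorn iterations, gives differentiable, GPU-friendly training.
Learning a parametric cost through the feature map f_phi can further improve the distance measure and leads to an optimization of the form min_theta max_phi.
In the ellipse-fitting and image generation (MNIST, CIFAR-10) experiments, results were sensitive to epsilon, the batch size, and L; a larger epsilon often allowed faster convergence.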
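
The interpolation and the sensitivity to epsilon can be checked numerically on toy data. The sketch below is an illustration under stated assumptions, not the paper's experiment: it uses a Euclidean ground cost, uniform weights, a log-domain form of the Sinkhorn updates (chosen so that small epsilon does not underflow), and made-up Gaussian samples. It also assumes that the large-epsilon limit is an MMD with kernel k = -c, estimated over all pairs. All function names are hypothetical.

```python
import numpy as np

def euclid_cost(x, y):
    """Pairwise Euclidean distances, shape (n, m). The cost choice is an assumption."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    z_max = np.max(z, axis=axis, keepdims=True)
    out = z_max + np.log(np.sum(np.exp(z - z_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def entropic_cost(C, eps, n_iters=100):
    """Transport cost of the entropic plan, with Sinkhorn run on dual potentials."""
    n, m = C.shape
    log_mu, log_nu = -np.log(n), -np.log(m)   # uniform weights
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iters):
        f = -eps * logsumexp((g[None, :] - C) / eps + log_nu, axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_mu, axis=0)
    log_plan = (f[:, None] + g[None, :] - C) / eps + log_mu + log_nu
    return float(np.sum(np.exp(log_plan) * C))

def sinkhorn_loss(x, y, eps):
    """Normalized loss: 2 W(x, y) - W(x, x) - W(y, y)."""
    return (2.0 * entropic_cost(euclid_cost(x, y), eps)
            - entropic_cost(euclid_cost(x, x), eps)
            - entropic_cost(euclid_cost(y, y), eps))

def energy_mmd(x, y):
    """MMD with kernel k = -c, estimated over all pairs (diagonal included)."""
    return (2.0 * euclid_cost(x, y).mean()
            - euclid_cost(x, x).mean()
            - euclid_cost(y, y).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(40, 2))                  # toy "data" samples
y = 0.5 * rng.normal(size=(40, 2)) + 1.0      # toy "model" samples

for eps in (0.1, 1.0, 10.0, 1e4):
    print(f"eps={eps:g}  sinkhorn_loss={sinkhorn_loss(x, y, eps):.4f}")
print(f"energy MMD (kernel -c) = {energy_mmd(x, y):.4f}")
```

With a very large epsilon the entropic plan approaches the product of the two empirical measures, so the printed loss should approach the MMD value on the last line. Small epsilon typically needs more iterations to converge, which is one plausible source of the sensitivity reported above.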


This review was generated by AI and checked by a human editor.