QUICK REVIEW

[Paper Review] Building Normalizing Flows with Stochastic Interpolants

Michael S. Albergo, Eric Vanden-Eijnden | arXiv (Cornell University) | 2022-09-30

Generative Adversarial Networks and Image Synthesis | Citations: 21

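One-Line Summary

The paper introduces InterFlow, a continuous-time normalizing flow learned from a probability current. Using the probability current of a density that interpolates between the base and target distributions, it enables finite-time transport, likelihood evaluation along the path, and efficient training without backpropagating through an ODE solver.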

ABSTRACT

A generative model based on a continuous-time normalizing flow between any pair of base and target probability densities is proposed. The velocity field of this flow is inferred from the probability current of a time-dependent density that interpolates between the base and the target in finite time. Unlike conventional normalizing flow inference methods based on the maximum likelihood principle, which require costly backpropagation through ODE solvers, our interpolant approach leads to a simple quadratic loss for the velocity itself which is expressed in terms of expectations that are readily amenable to empirical estimation. The flow can be used to generate samples from either the base or target, and to estimate the likelihood at any time along the interpolant. In addition, the flow can be optimized to minimize the path length of the interpolant density, thereby paving the way for building optimal transport maps. In situations where the base is a Gaussian density, we also show that the velocity of our normalizing flow can also be used to construct a diffusion model to sample the target as well as estimate its score. However, our approach shows that we can bypass this diffusion completely and work at the level of the probability flow with greater simplicity, opening an avenue for methods based solely on ordinary differential equations as an alternative to those based on stochastic differential equations. Benchmarking on density estimation tasks illustrates that the learned flow can match and surpass conventional continuous flows at a fraction of the cost, and compares well with diffusions on image generation on CIFAR-10 and ImageNet $32 \times 32$. The method scales ab-initio ODE flows to previously unreachable image resolutions, demonstrated up to $128 \times 128$.

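Research Motivation and Goals

Motivate efficient transport between a base density and a target density within a continuous-time framework.
Estimate the velocity field v_t by minimizing a simple quadratic objective that enforces the continuity equation.
Enable sampling and likelihood estimation along the interpolant path, and connect the approach to optimal transport.
Demonstrate scalability and competitive performance on density estimation and image generation while avoiding backpropagation through ODE solvers.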
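
For reference, the two objects these goals refer to can be written out as follows. This is a sketch in the notation of this review (ρ_t, v_t, I_t), assuming x_0 and x_1 are drawn independently and t is integrated over [0, 1]; the exact normalization in the paper may differ.

```latex
\begin{align}
  % Continuity equation linking the interpolating density to its velocity field
  \partial_t \rho_t(x) + \nabla \cdot \big( v_t(x)\, \rho_t(x) \big) &= 0,
  \qquad \rho_{t=0} = \rho_0, \quad \rho_{t=1} = \rho_1, \\
  % Quadratic objective, written purely as an expectation over samples
  G(v) &= \int_0^1 \mathbb{E}_{x_0 \sim \rho_0,\; x_1 \sim \rho_1}
  \Big[ \, |v_t(x_t)|^2 - 2\, \partial_t I_t(x_0, x_1) \cdot v_t(x_t) \Big] \, dt,
  \qquad x_t = I_t(x_0, x_1).
\end{align}
```

Because G(v) involves only samples of x_0, x_1, and t, no ODE has to be solved during training.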

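Proposed Method

Define a stochastic interpolant x_t = I_t(x_0, x_1), where x_0 ~ ρ_0 and x_1 ~ ρ_1.
Show that the density ρ_t(x) of x_t satisfies the continuity equation with velocity v_t(x), and that this velocity minimizes a quadratic objective G(v).
Express G(v) and its minimizer as expectations over samples of x_0 ~ ρ_0, x_1 ~ ρ_1, and t, so that they can be estimated empirically.
Show that the interpolant path length can be optimized by adjusting the interpolant I_t and/or the base ρ_0, so that it approaches the optimal transport (Benamou–Brenier) solution.
When ρ_0 is Gaussian, relate the velocity to the score, connect the method to score-based diffusion, and derive an SDE interpretation of sampling.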
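
The sample-based form of the objective can be sketched in a few lines of NumPy. This is an illustration rather than the authors' code: the function names are hypothetical, the trigonometric interpolant I_t = cos(πt/2) x_0 + sin(πt/2) x_1 is assumed, and the 1D Gaussian-to-Gaussian setup (with a closed-form velocity derived for this toy case) stands in for a neural network.

```python
import numpy as np

def interpolant(t, x0, x1):
    # Assumed trigonometric interpolant: equals x0 at t=0 and x1 at t=1.
    return np.cos(0.5 * np.pi * t) * x0 + np.sin(0.5 * np.pi * t) * x1

def interpolant_dt(t, x0, x1):
    # Time derivative of the interpolant above.
    return 0.5 * np.pi * (-np.sin(0.5 * np.pi * t) * x0 + np.cos(0.5 * np.pi * t) * x1)

def empirical_G(velocity, x0, x1, t):
    # Monte Carlo estimate of E[ |v_t(x_t)|^2 - 2 dI_t/dt . v_t(x_t) ].
    xt = interpolant(t, x0, x1)
    v = velocity(t, xt)
    dI = interpolant_dt(t, x0, x1)
    return float(np.mean(np.sum(v * v - 2.0 * dI * v, axis=-1)))

def exact_gaussian_velocity(t, x, target_std=2.0):
    # Toy case only: closed-form minimizer for N(0, 1) -> N(0, target_std^2),
    # obtained as E[dI_t/dt | x_t = x] for jointly Gaussian variables.
    s, c = np.sin(0.5 * np.pi * t), np.cos(0.5 * np.pi * t)
    var_t = c**2 + target_std**2 * s**2
    return 0.5 * np.pi * s * c * (target_std**2 - 1.0) / var_t * x

rng = np.random.default_rng(0)
n = 200_000
x0 = rng.standard_normal((n, 1))        # samples from the base
x1 = 2.0 * rng.standard_normal((n, 1))  # samples from the target
t = rng.uniform(size=(n, 1))            # times drawn uniformly in [0, 1]

g_exact = empirical_G(exact_gaussian_velocity, x0, x1, t)
g_half = empirical_G(lambda t, x: 0.5 * exact_gaussian_velocity(t, x), x0, x1, t)
g_zero = empirical_G(lambda t, x: np.zeros_like(x), x0, x1, t)
print(g_exact, g_half, g_zero)
```

In this toy setting the closed-form velocity should give a lower loss than a rescaled or zero velocity, which is the behaviour a trained model would be pushed toward. In practice `velocity` would be a parametric model and the estimate would be minimized by stochastic gradient descent.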

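Experimental Results

Research Questions

RQ1: Can a velocity field be learned directly from the stochastic interpolant such that the resulting ρ_t satisfies the continuity equation between ρ_0 and ρ_1?
RQ2: How does the training objective relate to the Wasserstein-2 distance between the true target and the pushed-forward base?
RQ3: Does optimizing the interpolant I_t (and possibly the base) shorten the transport path and yield an optimal transport map?
RQ4: How does this velocity-based, interpolant-driven approach compare with MLE-based continuous flows and diffusion models on density estimation and image generation?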

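Key Results

The interpolant-induced density ρ_t satisfies the continuity equation with its own velocity v_t, which minimizes the quadratic objective G(v).
The objective G(v) can be estimated empirically from samples, enabling simulation-free training of the velocity field.
The framework can generate samples from either the base or the target and compute likelihoods along the interpolant path.
Maximizing the minimum of G(v) over interpolants yields, under suitable conditions, a path corresponding to the Benamou–Brenier optimal transport solution.
With a Gaussian base ρ_0 and a sinusoidal (trigonometric) interpolant, the interpolant velocity is tied to the score of the density, which links the method to score-based models and supports an SDE interpretation of sampling.
Empirical results show competitive density estimation on tabular data, image generation that scales up to 128×128, and competitive NLL/FID on CIFAR-10 and ImageNet 32×32 relative to modern continuous-flow and diffusion methods.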
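
Once a velocity field is available, generating target samples amounts to integrating dx/dt = v_t(x) from t = 0 to t = 1. The sketch below illustrates this with a classical RK4 integrator. It is not the paper's sampler: `push_forward` and `gaussian_velocity` are hypothetical names, and the closed-form 1D velocity (for N(0, 1) to N(0, 4) under an assumed trigonometric interpolant) stands in for a trained network.

```python
import numpy as np

def gaussian_velocity(t, x, target_std=2.0):
    # Toy stand-in for a learned v_t: closed form for N(0, 1) -> N(0, target_std^2)
    # under the interpolant cos(pi t / 2) x0 + sin(pi t / 2) x1.
    s, c = np.sin(0.5 * np.pi * t), np.cos(0.5 * np.pi * t)
    var_t = c**2 + target_std**2 * s**2
    return 0.5 * np.pi * s * c * (target_std**2 - 1.0) / var_t * x

def push_forward(x0, velocity, n_steps=200):
    # Integrate dx/dt = v_t(x) from t=0 to t=1 with fixed-step RK4.
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        k1 = velocity(t, x)
        k2 = velocity(t + 0.5 * h, x + 0.5 * h * k1)
        k3 = velocity(t + 0.5 * h, x + 0.5 * h * k2)
        k4 = velocity(t + h, x + h * k3)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

rng = np.random.default_rng(0)
base_samples = rng.standard_normal((50_000, 1))   # x0 ~ N(0, 1)
generated = push_forward(base_samples, gaussian_velocity)
print("std of generated samples:", generated.std())
```

In this linear toy case the flow simply rescales each base sample by the target standard deviation, so the generated samples should have a standard deviation close to 2. Running the same integrator backward in time would map target samples to the base.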

This review was generated by AI and reviewed by a human editor.