QUICK REVIEW

[Paper Review] Maximum Mean Discrepancy Gradient Flow

Michael Arbel, Anna Korba|arXiv (Cornell University)|Jun 11, 2019
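Generative Adversarial Networks and Image Synthesis | References: 49 | Citations: 37
One-line summary

Introduces a Wasserstein gradient flow for the MMD, analyzes its convergence to the global optimum, and proposes a noise-regularized particle algorithm for practical implementation.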

ABSTRACT

We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties. The MMD is an integral probability metric defined for a reproducing kernel Hilbert space (RKHS), and serves as a metric on probability measures for a sufficiently rich RKHS. We obtain conditions for convergence of the gradient flow towards a global optimum, that can be related to particle transport when optimizing neural networks. We also propose a way to regularize this MMD flow, based on an injection of noise in the gradient. This algorithmic fix comes with theoretical and empirical evidence. The practical implementation of the flow is straightforward, since both the MMD and its gradient have simple closed-form expressions, which can be easily estimated with samples.

Research motivation and objectives

  • Motivate and construct a gradient flow on the space of probability measures endowed with the Wasserstein metric using the MMD as the objective.
  • Derive and analyze the continuous-time and discrete-time (Euler) gradient flows of the MMD towards the target distribution.
  • Investigate conditions for convergence to the global optimum and identify the barriers that arise because the objective is non-convex.
  • Propose a regularization strategy by injecting noise into the gradient to improve convergence in practice and provide theoretical justification.

Proposed method

  • Define the MMD between a fixed target distribution µ and a variable ν in a reproducing kernel Hilbert space, and express F(ν) = (1/2) MMD^2(µ, ν).
  • Formulate a gradient flow on P_2(X) via the continuity equation with velocity V_t = −∇f_{µ,ν_t}, yielding ∂_t ν_t = div(ν_t ∇f_{µ,ν_t}).
  • Show that F(ν_t) decreases along the flow with dF(ν_t)/dt = −∫ ||∇f_{µ,ν_t}(x)||^2 dν_t(x).
  • Provide a forward-Euler discretization ν_{n+1} = (I − γ∇f_{µ,ν_n})_# ν_n and establish conditions under which F(ν_n) decreases.
  • Introduce a noisy gradient update X_{n+1} = X_n − γ ∇f_{µ,ν_n}(X_n + β_n U_n) as a regularized scheme.
  • Present a practical particle-based algorithm updating X^i_{n+1} = X^i_n − γ ∇f_{µ̂,ν̂_n}(X^i_n + β_n U^i_n) using samples from µ and ν_n, and analyze its convergence.
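
The particle update in the last bullet can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, its bandwidth `sigma`, the geometric decay of the noise level `beta`, and all function names are choices made here for the example. The witness function is taken as f(z) = mean_j k(x_j, z) − mean_m k(y_m, z), with x the current particles and y the target samples.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of MMD^2 between samples x and y."""
    return (gaussian_kernel(x, x, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean())


def _grad_mean_kernel(z, pts, sigma):
    """Gradient in z of mean_j k(pts_j, z), evaluated at each row of z."""
    diff = z[:, None, :] - pts[None, :, :]              # (n_z, n_pts, d)
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
    # grad_z k(p, z) = -(z - p) / sigma^2 * k(p, z) for the Gaussian kernel
    return -(diff * k[..., None]).mean(axis=1) / sigma ** 2


def witness_grad(z, x, y, sigma=1.0):
    """Gradient of the empirical witness function at the points z."""
    return _grad_mean_kernel(z, x, sigma) - _grad_mean_kernel(z, y, sigma)


def noisy_mmd_flow(x0, y, gamma=0.3, beta0=1.0, decay=0.99,
                   n_steps=300, sigma=1.0, seed=0):
    """Iterate X <- X - gamma * grad f(X + beta * U) with U ~ N(0, I).

    The geometric schedule beta_n = beta0 * decay**n is an assumption of
    this sketch; the review does not state the schedule used in the paper.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    beta = beta0
    for _ in range(n_steps):
        u = rng.standard_normal(x.shape)
        # The gradient is evaluated at the perturbed points, while the
        # empirical measure is built from the unperturbed particles x.
        x = x - gamma * witness_grad(x + beta * u, x, y, sigma)
        beta *= decay
    return x
```

Each step costs on the order of N(N + M) kernel evaluations for N particles and M target samples. Calling `noisy_mmd_flow(x0, y)` and comparing `mmd2(x0, y)` with `mmd2(x_final, y)` gives a quick check that the particles are moving toward the target samples.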

Experimental results

Research questions

  • RQ1: Under what conditions does the MMD Wasserstein gradient flow converge to the global optimum?
  • RQ2: How can the non-convexity of F be mitigated to ensure convergence in practice?
  • RQ3: What regularization (via noise) best promotes global convergence without altering the true optimum?
  • RQ4: How does a particle-based sampling implementation approximate the population flow, and what are the convergence guarantees?

Key findings

  • The MMD gradient flow in W2 is well-defined, with a Lyapunov decrease of F along the flow.
  • A discrete forward-Euler scheme yields decreasing F provided the step-size γ is small enough (γ ≤ 2/(3L)).
  • F is not generally displacement convex; instead it is Λ-displacement convex, leading to nontrivial convergence analysis and potential barriers.
  • A regularization by injecting noise into the gradient (noisy update) can guarantee convergence to the global minimum under suitable conditions on the noise schedule.
  • The proposed particle-based algorithm has polynomial-time per-iteration complexity and converges to the population flow as sample sizes increase, with quantitative propagation-of-chaos bounds.
  • Empirical evidence shows that noise-injected MMD flow can outperform plain MMD and KSD in training regression-like networks on synthetic tasks.
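
The Lyapunov decrease in the first finding follows from a short calculation, sketched here under assumptions that this review does not spell out: ν_t has a smooth density, boundary terms vanish in the integration by parts, and the witness function is defined as in the first line.

```latex
\begin{align*}
f_{\mu,\nu}(z) &= \int k(x,z)\,d\nu(x) - \int k(x,z)\,d\mu(x),
\qquad
\mathcal{F}(\nu) = \tfrac{1}{2}\,\|f_{\mu,\nu}\|_{\mathcal{H}}^{2}, \\
\frac{d}{dt}\,\mathcal{F}(\nu_t)
  &= \int f_{\mu,\nu_t}(x)\,\partial_t \nu_t(x)\,dx
   = \int f_{\mu,\nu_t}(x)\,
     \operatorname{div}\!\big(\nu_t \nabla f_{\mu,\nu_t}\big)(x)\,dx \\
  &= -\int \big\|\nabla f_{\mu,\nu_t}(x)\big\|^{2}\,d\nu_t(x) \;\le\; 0 .
\end{align*}
```

The first equality uses the fact that f_{µ,ν} is the first variation of F at ν, the second substitutes the continuity equation ∂_t ν_t = div(ν_t ∇f_{µ,ν_t}), and the third is integration by parts. The decrease stalls wherever ∇f_{µ,ν_t} vanishes on the support of ν_t, which is why it does not by itself give global convergence.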


This review was generated by AI and checked by a human editor.