QUICK REVIEW

[Paper Review] Learning from Conditional Distributions via Dual Embeddings

Bo Dai, Niao He|arXiv (Cornell University)|Jul 15, 2016

Advanced Bandit Algorithms Research|References: 29|Cited by: 20

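One-Sentence Summary

This paper proposes Embedding-SGD, an algorithm built on a new min-max reformulation of learning from conditional distributions, which estimates functions efficiently using only one sample from each conditional distribution. Using dual embeddings and kernel methods, the approach reports state-of-the-art performance on policy evaluation and learning with invariance, backed by provable sample complexity guarantees.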

ABSTRACT

Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample $x$ itself is associated with a conditional distribution $p(z|x)$ represented by samples $\{z_i\}_{i=1}^M$, and the goal is to learn a function $f$ that links these conditional distributions to target values $y$. These learning problems become very challenging when we only have limited samples or in the extreme case only one sample from each conditional distribution. Commonly used approaches either assume that $z$ is independent of $x$, or require an overwhelmingly large samples from each conditional distribution. To address these challenges, we propose a novel approach which employs a new min-max reformulation of the learning from conditional distribution problem. With such new reformulation, we only need to deal with the joint distribution $p(z,x)$. We also design an efficient learning algorithm, Embedding-SGD, and establish theoretical sample complexity for such problems. Finally, our numerical experiments on both synthetic and real-world datasets show that the proposed approach can significantly improve over the existing algorithms.
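
The reformulation described in the abstract can be written out as a short derivation. This is a sketch under stated assumptions: the loss $\ell(y,\cdot)$ is taken to be convex and closed, $\ell_y^*$ denotes its Fenchel conjugate in the second argument, and the dual function space is assumed rich enough for the pointwise maximum to be exchanged with the expectation. The symbols $L$, $\ell$, $u$, and $\mathcal{F}$ are notation introduced here for illustration.

```latex
% Learning from conditional distributions (x, z, y, f as in the abstract)
\min_{f \in \mathcal{F}} \; L(f) := \mathbb{E}_{x,y}\Big[ \ell\big(y,\; \mathbb{E}_{z|x}[f(z,x)]\big) \Big]

% Fenchel duality for a convex, closed loss \ell(y,\cdot) with conjugate \ell_y^*
\ell(y, v) = \max_{u \in \mathbb{R}} \big\{ u\,v - \ell_y^*(u) \big\}

% Exchanging the pointwise maximum with the expectation gives a maximum
% over dual functions u(x,y)
L(f) = \max_{u(\cdot)} \; \mathbb{E}_{x,y,z}\big[ f(z,x)\,u(x,y) \big]
       - \mathbb{E}_{x,y}\big[ \ell_y^*(u(x,y)) \big]
```

In the last line the inner conditional expectation no longer appears: every term is an expectation over the joint distribution of $(x, y, z)$, so a single sample $z$ per $x$ yields an unbiased stochastic gradient for both $f$ and $u$.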

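Research Motivation and Objectives

Address the challenge of learning from conditional distributions when only one or a few samples are available for each input.
Overcome the limitations of existing approaches, which either assume that z is independent of x or require a large number of samples from each conditional distribution.
Develop a theoretically grounded, sample-efficient algorithm for problems involving nested expectations and conditional distributions.
Enable effective learning in settings where data per conditional distribution is sparse, such as policy evaluation in reinforcement learning and learning with invariance.
Provide a unified framework that supports both nonparametric and parametric function approximators, including neural networks through dual embeddings.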

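Proposed Method

Proposes a min-max reformulation that turns the original problem into one involving only the joint distribution p(z,x), which avoids handling conditional expectations directly.
Uses kernel embeddings to represent conditional distributions in a reproducing kernel Hilbert space (RKHS), enabling nonparametric estimation.
Designs Embedding-SGD, a stochastic optimization algorithm that alternately updates the primal and dual functions within a saddle-point framework.
Uses a dual function space that is more flexible than the constrained space used by earlier methods such as GTD2, which improves optimization.
Combines kernel embeddings with stochastic gradient descent to minimize the mean squared Bellman error directly, without a surrogate objective.
Extends to parametric models through random features and supports deep learning through dual neural network embeddings, enabling end-to-end learning.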
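
The alternating primal-dual updates listed above can be sketched with random Fourier features. This is a minimal illustration, not the authors' implementation: the synthetic data, the squared loss, the feature maps `phi` and `psi`, the length scales, the constant step size, and the averaging of late iterates are all assumptions chosen for the example. For the squared loss $\frac{1}{2}(y - v)^2$ the saddle-point objective is $\mathbb{E}[u(x,y) f(z) - u(x,y)\,y - \frac{1}{2}u(x,y)^2]$, which is minimized over $f$ and maximized over $u$.

```python
import numpy as np

def make_rff(dim, n_feat, lengthscale, rng):
    """Random Fourier feature map approximating a Gaussian kernel (assumed choice)."""
    W = rng.normal(scale=1.0 / lengthscale, size=(dim, n_feat))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)
    def features(v):
        # v must have shape (n, dim); returns shape (n, n_feat)
        return np.sqrt(2.0 / n_feat) * np.cos(v @ W + b)
    return features

def sample(rng, n, sigma_z=0.3, sigma_y=0.1):
    """Hypothetical data: ONE z per x, and y = E[sin(2z) | x] + noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    z = x + sigma_z * rng.normal(size=n)
    y = np.exp(-2.0 * sigma_z**2) * np.sin(2.0 * x) + sigma_y * rng.normal(size=n)
    return x, z, y

def embedding_sgd_sketch(n_steps=20000, lr=0.05, n_feat=100, seed=0):
    rng = np.random.default_rng(seed)
    phi = make_rff(1, n_feat, 0.5, rng)   # primal features: f(z) = w . phi(z)
    psi = make_rff(2, n_feat, 1.0, rng)   # dual features: u(x, y) = theta . psi(x, y)
    x, z, y = sample(rng, n_steps)
    Phi = phi(z[:, None])
    Psi = psi(np.column_stack([x, y]))
    w = np.zeros(n_feat)
    theta = np.zeros(n_feat)
    w_avg = np.zeros(n_feat)
    n_avg = 0
    for t in range(n_steps):
        f_t = Phi[t] @ w
        u_t = Psi[t] @ theta
        # Both gradients are evaluated at the current (w, theta).
        grad_w = u_t * Phi[t]                      # d/dw of the sampled objective
        grad_theta = (f_t - y[t] - u_t) * Psi[t]   # d/dtheta of the sampled objective
        w = w - lr * grad_w                        # descent step on the primal f
        theta = theta + lr * grad_theta            # ascent step on the dual u
        if t >= n_steps // 2:                      # average the second half of iterates
            n_avg += 1
            w_avg += (w - w_avg) / n_avg
    return phi, w_avg

def conditional_mse(phi, w, seed=1, n_x=200, n_z=200, sigma_z=0.3):
    """Monte Carlo error of E[f(z) | x] against the true conditional mean."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n_x)
    z = x[:, None] + sigma_z * rng.normal(size=(n_x, n_z))
    pred = (phi(z.reshape(-1, 1)) @ w).reshape(n_x, n_z).mean(axis=1)
    target = np.exp(-2.0 * sigma_z**2) * np.sin(2.0 * x)
    return float(np.mean((pred - target) ** 2))

phi, w_learned = embedding_sgd_sketch()
mse_learned = conditional_mse(phi, w_learned)
mse_zero = conditional_mse(phi, np.zeros_like(w_learned))  # baseline: f = 0
print(f"learned: {mse_learned:.4f}  zero baseline: {mse_zero:.4f}")
```

Each update touches a single triple (x, z, y), so no second sample from p(z|x) is ever needed. The step size and length scales here are untuned; a decaying step size may be preferable in practice.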

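Experimental Results

Research Questions

RQ1: Can we design a learning algorithm that handles learning from conditional distributions effectively when only one sample is available from each distribution?
RQ2: How can nested-expectation problems involving conditional distributions be recast as a joint optimization framework?
RQ3: What is the theoretical sample complexity of learning from conditional distributions under limited sampling?
RQ4: With very little data, can we outperform existing policy evaluation algorithms such as GTD2, RG, and kernel MDP?
RQ5: How can dual embeddings be combined with kernel methods to improve generalization and optimization in distribution learning?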

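Key Findings

The proposed Embedding-SGD algorithm significantly outperforms GTD2, residual gradient (RG), and kernel MDP for policy evaluation on navigation, swing-up control, and PUMA-560 manipulation tasks.
On the navigation task, Embedding-SGD achieves a lower mean squared Bellman error than all baselines, demonstrating its sample efficiency when only one sample per conditional distribution is used.
On the swing-up control task, the algorithm maintains a stable and lower error than GTD2 and RG even with very little data per state-action pair.
On the PUMA-560 manipulation task, the method shows consistent improvements in value function estimation accuracy, supporting its robustness in high-dimensional control settings.
The algorithm reaches state-of-the-art performance by optimizing the mean squared Bellman error directly, whereas GTD2 and RG rely on surrogate objectives.
The theoretical analysis establishes provable sample complexity, making this the first algorithm to offer such guarantees in the single-sample conditional setting.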
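
The mean squared Bellman error mentioned above fits the same saddle-point template. The derivation below is a sketch in notation introduced here for illustration ($V$ for the value function, $R$ for the reward, $\gamma$ for the discount factor, $\delta$ for the temporal-difference term, $u$ for the dual function); the paper's exact conditioning and scaling may differ.

```latex
% Temporal-difference term for a transition (s, a, s')
\delta := R(s,a) + \gamma V(s') - V(s)

% Mean squared Bellman error: the square sits OUTSIDE the conditional expectation
\min_{V} \; \mathbb{E}_{s}\Big[ \big( \mathbb{E}_{a,s'|s}[\delta] \big)^2 \Big]

% A single-sample squared TD error is biased by the conditional variance
\mathbb{E}_{a,s'|s}\big[\delta^2\big]
  = \big( \mathbb{E}_{a,s'|s}[\delta] \big)^2 + \mathrm{Var}_{a,s'|s}(\delta)

% Using c^2 = \max_u \{ 2uc - u^2 \} pointwise in s removes the nested expectation
\min_{V} \max_{u(\cdot)} \; \mathbb{E}_{s,a,s'}\big[ 2\,\delta\,u(s) - u(s)^2 \big],
\qquad u^*(s) = \mathbb{E}_{a,s'|s}[\delta]
```

The final objective is an expectation over observed transitions $(s, a, s')$ only, so one transition per state is enough for an unbiased stochastic gradient.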


This review was generated by AI and checked by a human editor.