QUICK REVIEW

[Paper Digest] Importance Sampling for Minibatches

Dominik Csiba, Peter Richtárik|arXiv (Cornell University)|Feb 6, 2016

Stochastic Gradient Optimization Techniques | References: 36 | Citations: 24

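One-Sentence Summary

This paper proposes the first importance sampling strategy for minibatches in stochastic optimization, accelerating convergence by combining the variance reduction of non-uniform sampling with minibatch training. The method comes with a rigorous complexity analysis and achieves speedups of up to an order of magnitude on real datasets and of several orders of magnitude on heavy-tailed synthetic data.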

ABSTRACT

Minibatching is a very well studied and highly popular technique in supervised learning, used by practitioners due to its ability to accelerate training through better utilization of parallel processing power and reduction of stochastic variance. Another popular technique is importance sampling -- a strategy for preferential sampling of more important examples also capable of accelerating the training process. However, despite considerable effort by the community in these areas, and due to the inherent technical difficulty of the problem, there is no existing work combining the power of importance sampling with the strength of minibatching. In this paper we propose the first importance sampling for minibatches and give simple and rigorous complexity analysis of its performance. We illustrate on synthetic problems that for training data of certain properties, our sampling can lead to several orders of magnitude improvement in training time. We then test the new sampling on several popular datasets, and show that the improvement can reach an order of magnitude.

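Motivation and Objectives

Close the theoretical gap in combining importance sampling with minibatching in stochastic optimization.
Reduce the variance of gradient estimates in minibatch SGD by assigning higher sampling probabilities to more informative samples.
Provide a theoretically grounded complexity analysis of the proposed method under general data conditions.
Show empirically that the method significantly accelerates training on both synthetic and real-world datasets.
Demonstrate that combining importance sampling with minibatching yields a multiplicative, rather than merely additive, improvement in convergence speed.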

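Proposed Method

Proposes a novel sampling scheme, called 'tau-importance sampling', that selects minibatches with non-uniform probabilities based on data-dependent importance scores.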
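Defines a bucket-based sampling mechanism: the samples are partitioned into buckets, and a minibatch is formed by independently drawing one sample from each bucket according to its within-bucket probability.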
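Builds a probability-matrix representation of the sampling distribution, using the Hadamard product and diagonal matrices, to model joint inclusion probabilities.
Establishes theoretical bounds via matrix inequalities, in particular the Cauchy-Schwarz inequality, that link the sampling probabilities to variance reduction.
Analyzes the convergence rate through the notion of a normalized eigenvalue of the probability matrix and derives complexity bounds.
Applies the framework to synthetic and real datasets and compares its performance with uniform minibatch sampling and other baselines.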
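
The bucket mechanism and the probability matrix described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a minibatch takes exactly one index from each of tau buckets, uses squared row norms as a stand-in importance score, and treats the rows of a random matrix as placeholder per-sample gradients. All function names are hypothetical.

```python
import itertools
import numpy as np

def bucket_probabilities(scores, buckets):
    """Within each bucket, make p_i proportional to the importance score."""
    p = np.zeros(len(scores))
    for b in buckets:
        p[b] = scores[b] / scores[b].sum()
    return p

def sample_minibatch(p, buckets, rng):
    """Draw one index from every bucket, independently across buckets."""
    return np.array([rng.choice(b, p=p[b]) for b in buckets])

def probability_matrix(p, buckets):
    """P[i, j] = probability that i and j are both in the minibatch."""
    n = len(p)
    P = np.outer(p, p)            # independent draws across buckets
    for b in buckets:
        P[np.ix_(b, b)] = 0.0     # two distinct indices of one bucket never co-occur
    P[np.arange(n), np.arange(n)] = p   # diagonal: marginal inclusion probabilities
    return P

def gradient_estimate(grads, S, p):
    """Reweighted minibatch estimate of the average of all per-sample gradients."""
    return (grads[S] / p[S][:, None]).sum(axis=0) / len(p)

def exact_expectation(grads, p, buckets):
    """Expectation of the estimate, by enumerating every possible minibatch."""
    total = np.zeros(grads.shape[1])
    for combo in itertools.product(*buckets):
        S = np.array(combo)
        total += np.prod(p[S]) * gradient_estimate(grads, S, p)
    return total

rng = np.random.default_rng(0)
n, d, tau = 12, 3, 4
# Placeholder data: rows with very different scales (assumed, for illustration only).
A = rng.standard_normal((n, d)) * (1.0 + rng.pareto(1.5, size=(n, 1)))
scores = (A ** 2).sum(axis=1)                       # assumed importance score
buckets = np.array_split(rng.permutation(n), tau)   # random partition into tau buckets

p = bucket_probabilities(scores, buckets)
S = sample_minibatch(p, buckets, rng)
P = probability_matrix(p, buckets)
estimate = gradient_estimate(A, S, p)
```

Dividing each sampled gradient by its probability is what keeps the estimate unbiased: `exact_expectation(A, p, buckets)` agrees with `A.mean(axis=0)` up to floating-point error, whatever positive scores are used. The scores only change the variance of the estimate, which is where a data-dependent choice can help.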

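Experimental Results

Research Questions

RQ1: Can importance sampling be combined effectively with minibatching to reduce gradient variance and accelerate convergence?
RQ2: What is the theoretical complexity of the proposed importance sampling minibatch method under general data conditions?
RQ3: How does the method compare with uniform minibatch sampling on datasets whose samples differ widely in importance?
RQ4: Does the method deliver significant speedups in practice, especially on data with heavy-tailed importance distributions?
RQ5: Does combining importance sampling with minibatching yield a multiplicative, rather than merely additive, improvement in convergence speed?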

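Key Findings

On synthetic datasets with heavy-tailed importance distributions, the proposed method improves training time by several orders of magnitude over uniform minibatch sampling.
On real-world datasets, the method reduces training time by up to an order of magnitude relative to standard minibatch SGD with uniform sampling.
The theoretical analysis shows that importance sampling replaces the maximum of a data-dependent quantity with its average, improving the leading constant in the convergence rate.
The method is reported to be robust across several data types, including uniform, chi-square, and extreme importance distributions.
The complexity analysis confirms that the convergence rate improves because the variance of the gradient estimate is reduced, giving provably faster linear convergence under strong convexity.
The empirical results are consistent with the theoretical analysis and show speedups across multiple benchmark datasets.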
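
The finding that a maximum is replaced by an average suggests a quick way to gauge when importance sampling has room to help: compare the largest per-sample score with the mean score. The sketch below is only an illustration under assumptions. The three score distributions are synthetic stand-ins for the data types named above, and the max/mean ratio is a rough proxy for the potential gain, not the paper's complexity bound.

```python
import numpy as np

def max_to_mean_ratio(v):
    """How much larger the worst-case score is than the average score."""
    v = np.asarray(v, dtype=float)
    return v.max() / v.mean()

rng = np.random.default_rng(0)
n = 10_000
# Hypothetical per-sample importance scores drawn from three distributions.
scores = {
    "uniform": rng.uniform(0.0, 1.0, size=n),
    "chi_square": rng.chisquare(df=1, size=n),
    "heavy_tailed": 1.0 + rng.pareto(1.0, size=n),
}

ratios = {name: max_to_mean_ratio(v) for name, v in scores.items()}
for name, r in ratios.items():
    print(f"{name:>12}: max/mean = {r:.1f}")
```

A ratio close to 1 means the samples are about equally important, so uniform sampling loses little. A large ratio, typical of heavy-tailed scores, is the regime in which the digest reports the biggest improvements.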

This digest was generated by AI and reviewed by human editors.