QUICK REVIEW

[Paper Review] Learning What Data to Learn

Fan Yang, Fei Tian|arXiv (Cornell University)|Feb 28, 2017

Machine Learning and Data Classification | References: 27 | Cited by: 54

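One-Sentence Summary

The paper proposes Neural Data Filter (NDF), a deep reinforcement learning framework that automatically selects which instances in each mini-batch to train on, speeding up SGD training while maintaining accuracy, as demonstrated on MLP, CNN, and RNN tasks.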

ABSTRACT

Machine learning is essentially the sciences of playing with data. An adaptive data selection strategy, enabling to dynamically choose different data at various training stages, can reach a more effective model in a more efficient way. In this paper, we propose a deep reinforcement learning framework, which we call Neural Data Filter (NDF), to explore automatic and adaptive data selection in the training process. In particular, NDF takes advantage of a deep neural network to adaptively select and filter important data instances from a sequential stream of training data, such that the future accumulative reward (e.g., the convergence speed) is maximized. In contrast to previous studies in data selection that is mainly based on heuristic strategies, NDF is quite generic and thus can be widely suitable for many machine learning tasks. Taking neural network training with stochastic gradient descent (SGD) as an example, comprehensive experiments with respect to various neural network modeling (e.g., multi-layer perceptron networks, convolutional neural networks and recurrent neural networks) and several applications (e.g., image classification and text understanding) demonstrate that NDF powered SGD can achieve comparable accuracy with standard SGD process by using less data and fewer iterations.

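Research Motivation and Objectives

Frame data selection as a general, forward-looking problem whose solution can improve training efficiency.
Develop a DRL-based teacher-student framework in which data selection is learned to optimize long-term reward.
Apply NDF to mini-batch SGD across multiple neural network architectures and domains to test its generality.
Show that learned data selection accelerates convergence while reaching similar final accuracy.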

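Proposed Method

Formulate SGD training with data filtering as an MDP (SGD-MDP) whose state combines the arriving mini-batch with the current model parameters.
Use a policy A(s;Θ) to decide which instances in a mini-batch to keep and which to filter out (a ∈ {0,1}^M).
Represent the state s with data features, base-model features, and combined data-model features, yielding the feature vector f(s) that the policy takes as input.
Optimize the policy with REINFORCE (policy gradient) to maximize the expected cumulative reward R(s,a).
Define the reward from training signals such as validation accuracy, and set a discount factor γ to capture long-term effects.
Train the policy on a sampled subset D′, then apply the learned policy to the full dataset D during SGD.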
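The keep-or-filter decision and the REINFORCE update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a logistic policy over the state features f(s), a single scalar reward per episode, and a constant baseline. The class name `FilterPolicy`, its methods, the learning rate, and the example feature values are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class FilterPolicy:
    """Hypothetical logistic policy: P(keep | s) = sigmoid(theta . f(s) + b)."""

    def __init__(self, n_features, lr=0.1, seed=0):
        self.theta = [0.0] * n_features
        self.bias = 0.0
        self.lr = lr  # assumed learning rate, not taken from the paper
        self.rng = random.Random(seed)

    def keep_prob(self, features):
        z = self.bias + sum(t * x for t, x in zip(self.theta, features))
        return sigmoid(z)

    def sample_actions(self, batch_features):
        # One Bernoulli keep (1) / filter (0) decision per instance,
        # giving an action vector a in {0,1}^M for a mini-batch of M instances.
        return [1 if self.rng.random() < self.keep_prob(f) else 0
                for f in batch_features]

    def reinforce_update(self, episode, reward, baseline=0.0):
        # episode: list of (features, action) pairs collected during training.
        # For a Bernoulli policy with a logistic link, the gradient of
        # log pi(a | s) with respect to the logit is (a - P(keep | s)).
        advantage = reward - baseline
        grad_theta = [0.0] * len(self.theta)
        grad_bias = 0.0
        for features, action in episode:
            g = action - self.keep_prob(features)
            for i, x in enumerate(features):
                grad_theta[i] += g * x
            grad_bias += g
        # Gradient ascent on the expected reward.
        self.theta = [t + self.lr * advantage * gt
                      for t, gt in zip(self.theta, grad_theta)]
        self.bias += self.lr * advantage * grad_bias


# Usage with made-up two-dimensional state features per instance.
policy = FilterPolicy(n_features=2)
batch = [[0.2, 1.0], [1.5, 0.3], [0.7, 0.7]]
actions = policy.sample_actions(batch)
kept = [x for x, a in zip(batch, actions) if a == 1]  # instances passed to SGD
policy.reinforce_update(list(zip(batch, actions)), reward=0.8, baseline=0.5)
```

In this sketch the caller supplies the reward, for example a quantity derived from validation accuracy; how the reward is computed and discounted with γ is left out because the summary does not specify it.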

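Experimental Results

Research Questions

RQ1: Can reinforcement learning automatically learn a data filtering policy that improves SGD convergence?
RQ2: Does the learned data selection policy generalize across model types (MLP, CNN, RNN) and domains (vision, text)?
RQ3: How does NDF compare with heuristic data selection methods such as self-paced learning (SPL) in convergence speed and final accuracy?
RQ4: Which features best represent the training state and thus effectively support learning the data filtering policy?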

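Key Findings

NDF accelerates convergence and reduces the amount of training data needed in the MLP, CNN, and RNN experiments.
The learned filtering policy tends to select harder samples during training, unlike the heuristic SPL behavior.
NDF consistently outperforms unfiltered SGD and RandDrop in convergence speed, and surpasses SPL in many cases.
Policies trained with NDF are robust to hyperparameter settings and generalize well across tasks.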
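For contrast with the learned policy, the two heuristic baselines named above can be sketched as simple selection rules. This is an illustration under stated assumptions, not the paper's experimental code: SPL is shown in its common hard-threshold form (keep instances whose loss is below a threshold), and RandDrop is shown as uniform random dropping at a fixed rate. The function names, threshold, and drop rate are hypothetical.

```python
import random


def spl_select(losses, threshold):
    # Hard-threshold self-paced learning: keep (1) the "easy" instances,
    # i.e. those whose current loss is below the threshold; filter (0) the rest.
    return [1 if loss < threshold else 0 for loss in losses]


def rand_drop_select(n_instances, drop_rate, rng):
    # Random baseline: filter each instance independently with probability
    # drop_rate, regardless of its loss or the model state.
    return [0 if rng.random() < drop_rate else 1 for _ in range(n_instances)]


# Usage with made-up per-instance losses.
losses = [0.1, 2.0, 0.5]
easy_mask = spl_select(losses, threshold=1.0)
random_mask = rand_drop_select(len(losses), drop_rate=0.3, rng=random.Random(0))
```

Neither rule looks ahead: SPL depends only on the current loss and RandDrop on nothing at all, whereas the NDF policy is trained against a long-term reward.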

This review was generated by AI and reviewed by human editors.