QUICK REVIEW

[Paper Review] Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning

Kunkun Pang, Mingzhi Dong|arXiv (Cornell University)|Jun 12, 2018

Machine Learning and Algorithms|References: 11|Cited by: 57

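One-Sentence Summary

This paper frames active learning as a meta-learning problem and trains a DRL query policy, guided by a dataset embedding and a meta-network, to select unlabelled points, with the goal of generalising across datasets and remaining agnostic to the base learner.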

ABSTRACT

Active learning (AL) aims to enable training high performance classifiers with low annotation cost by predicting which subset of unlabelled instances would be most beneficial to label. The importance of AL has motivated extensive research, proposing a wide variety of manually designed AL algorithms with diverse theoretical and intuitive motivations. In contrast to this body of research, we propose to treat active learning algorithm design as a meta-learning problem and learn the best criterion from data. We model an active learning algorithm as a deep neural network that inputs the base learner state and the unlabelled point set and predicts the best point to annotate next. Training this active query policy network with reinforcement learning produces the best non-myopic policy for a given dataset. The key challenge in achieving a general solution to AL then becomes that of learner generalisation, particularly across heterogeneous datasets. We propose a multi-task dataset-embedding approach that allows dataset-agnostic active learners to be trained. Our evaluation shows that AL algorithms trained in this way can directly generalise across diverse problems.

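Research Motivation and Objectives

Motivate learning active learning criteria through meta-learning rather than relying on hand-designed heuristics.
Propose a DRL framework, augmented with a dataset embedding and a meta-network, that produces transferable AL policies.
Achieve cross-dataset generalisation by training on diverse source datasets combined with unsupervised domain adaptation.
Show that the learned policy generalises across datasets and is agnostic to the base classifier.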

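Proposed Method

Model an active learning criterion as a neural network policy π(a|s) that selects one unlabelled instance to query.
Use a policy network whose encoder weights W_e are generated by a meta-network Ψ from the dataset state (L,U,f).
Introduce a dataset embedding built from representative and discriminative histograms, which is used to produce the dataset-conditioned weights.
Jointly train the policy and the meta-network with REINFORCE to maximise final test accuracy, with auxiliary reconstruction and entropy regularisation terms.
Keep the base learner as a configurable component (base-learner agnostic), so the approach applies to a variety of classifiers.
Train in a multi-task fashion over multiple source datasets to learn a dataset-agnostic policy.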
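
The steps above can be pictured with a small NumPy sketch. It is not the authors' implementation: the linear meta-network, the toy 0/1 reward (a stand-in for the base learner's test accuracy), the random histogram scores, and all names (`histogram_embedding`, `forward`, `reinforce_grads`, `psi`, `v`) are assumptions made for illustration. The reconstruction and entropy terms are left out, and the dataset state is held fixed across episodes for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def histogram_embedding(scores, bins=5):
    """Normalised histogram of per-point scores assumed to lie in [0, 1]."""
    hist, _ = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def forward(X, emb, psi, v):
    """Hypothetical linear meta-network -> W_e -> softmax policy over points."""
    W_e = (emb @ psi).reshape(X.shape[1], v.shape[0])  # dataset-conditioned weights
    h = np.tanh(X @ W_e)                               # per-point encoding
    logits = h @ v                                     # one score per unlabelled point
    p = np.exp(logits - logits.max())
    return p / p.sum(), h

def reinforce_grads(X, emb, v, h, p, action, reward):
    """Gradients of reward * log pi(action | s) with respect to psi and v."""
    g = -p                                   # new array: -softmax probabilities
    g[action] += 1.0                         # d log pi(a) / d logits = onehot(a) - p
    g = reward * g
    d_v = h.T @ g
    d_pre = np.outer(g, v) * (1.0 - h ** 2)  # back through tanh
    d_We = X.T @ d_pre
    d_psi = np.outer(emb, d_We.ravel())      # back through the linear meta-network
    return d_psi, d_v

# Toy problem (all values are illustrative assumptions).
n_points, in_dim, hid_dim = 6, 4, 3
X = rng.normal(size=(n_points, in_dim))      # features of the unlabelled points
emb = np.concatenate([
    histogram_embedding(rng.uniform(size=n_points)),  # stand-in "representative" scores
    histogram_embedding(rng.uniform(size=n_points)),  # stand-in "discriminative" scores
])
psi = 0.1 * rng.normal(size=(emb.size, in_dim * hid_dim))
v = 0.1 * rng.normal(size=hid_dim)

target = 2  # toy reward: 1 if the policy queries this point, 0 otherwise
p_before, _ = forward(X, emb, psi, v)
for _ in range(300):
    p, h = forward(X, emb, psi, v)
    a = rng.choice(n_points, p=p)            # sample a query from the policy
    reward = 1.0 if a == target else 0.0
    d_psi, d_v = reinforce_grads(X, emb, v, h, p, a, reward)
    psi += 0.1 * d_psi                       # gradient ascent on expected reward
    v += 0.1 * d_v
p_after, _ = forward(X, emb, psi, v)
```

Because the encoder weights are a function of the embedding rather than free parameters, the same `psi` could in principle be reused on a dataset with a different embedding, which is the intuition behind the transfer claim. In a fuller version the embedding and point features would be recomputed after every query, and the reward would come from evaluating the retrained base learner.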

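Experimental Results

Research Questions

RQ1: Can a DRL-based AL policy generalise across datasets with different feature spaces and statistics?
RQ2: Can a meta-network that generates dataset-conditioned policy weights enable cross-dataset transfer?
RQ3: How does multi-task training on diverse datasets affect generalisation to unseen datasets?
RQ4: Is the learned policy agnostic to the underlying base classifier?
RQ5: How do the auxiliary losses (reconstruction, entropy) affect policy learning?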

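Key Findings

The meta-learned AL policy (MLP-GAL) outperforms several baselines in cross-dataset evaluation.
In the cross-task setting, MLP-GAL (Te) achieves higher average performance on unseen datasets than SingleRL and the other methods.
As dataset diversity increases, multi-task training improves generalisation to unseen datasets, although performance on any single training dataset may decline as more domains are added.
The method is agnostic to the base learner and adapts to different datasets through the dataset embedding.
Sophisticated methods such as QUIRE perform well on some datasets but can perform poorly on others, which highlights the generalisation challenge in active learning.
The meta-network-based dataset embedding provides robust transferability across diverse active learning tasks.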
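
One way to picture the base-learner agnosticism noted above is a query loop that only touches the classifier through a `fit` / `predict_proba` interface. The sketch below is illustrative and not taken from the paper: `NearestCentroid` is a tiny stand-in base learner, and `margin_query` is a hand-designed margin heuristic used as a placeholder where a learned policy would go.

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in base learner exposing fit / predict_proba."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # Squared distance to each class centroid, turned into soft scores.
        d = ((X[:, None, :] - self.centroids_[None]) ** 2).sum(axis=2)
        z = np.exp(-(d - d.min(axis=1, keepdims=True)))
        return z / z.sum(axis=1, keepdims=True)

def margin_query(proba):
    """Placeholder for a learned policy: pick the smallest top-2 margin."""
    top2 = np.sort(proba, axis=1)[:, -2:]
    return int(np.argmin(top2[:, 1] - top2[:, 0]))

def active_learning_loop(learner, query, X, y, labelled, budget):
    """Pool-based loop; the query function never inspects the learner's internals."""
    labelled = list(labelled)
    for _ in range(budget):
        pool = [i for i in range(len(X)) if i not in labelled]
        learner.fit(X[labelled], y[labelled])
        pick = pool[query(learner.predict_proba(X[pool]))]
        labelled.append(pick)                # the oracle reveals y[pick]
    learner.fit(X[labelled], y[labelled])
    return labelled, learner

# Toy data: two Gaussian blobs, one seed label per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(20, 2)),
               rng.normal(1.0, 1.0, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)
labelled, clf = active_learning_loop(NearestCentroid(), margin_query,
                                     X, y, labelled=[0, 20], budget=5)
```

Any classifier that offers the same two methods could be swapped in for `NearestCentroid` without changing the loop or the query function.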

This review was generated by AI and reviewed by human editors.