QUICK REVIEW

[Paper Review] Train Once, Test Anywhere: Zero-Shot Learning for Text Classification

Pushpankar Kumar Pushp, Muktabh Mayank Srivastava|arXiv (Cornell University)|Dec 16, 2017

Domain Adaptation and Few-Shot Learning | References: 2 | Cited by: 75

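One-Sentence Summary

This paper proposes a zero-shot learning framework for text classification that predicts, in a binary setting, whether a sentence is related to a label, which lets it generalize across datasets without retraining. It presents three neural network architectures and demonstrates cross-dataset transfer using web news headlines annotated with SEO tags as the source dataset.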

ABSTRACT

Zero-shot Learners are models capable of predicting unseen classes. In this work, we propose a Zero-shot Learning approach for text categorization. Our method involves training model on a large corpus of sentences to learn the relationship between a sentence and embedding of sentence's tags. Learning such relationship makes the model generalize to unseen sentences, tags, and even new datasets provided they can be put into same embedding space. The model learns to predict whether a given sentence is related to a tag or not; unlike other classifiers that learn to classify the sentence as one of the possible classes. We propose three different neural networks for the task and report their accuracy on the test set of the dataset used for training them as well as two other standard datasets for which no retraining was done. We show that our models generalize well across new unseen classes in both cases. Although the models do not achieve the accuracy level of the state of the art supervised models, yet it evidently is a step forward towards general intelligence in natural language processing.

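Motivation and Objectives

Introduce a zero-shot learning framework that casts text classification as a binary relatedness task between a sentence and a label.
Achieve cross-dataset generalization, so that a model trained on one dataset can classify data from other datasets without retraining.
Propose and evaluate three neural network architectures for zero-shot text classification.
Show that training on large-scale noisy data improves generalization to unseen classes and datasets.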

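Proposed Method

Model the task as binary classification: predict whether a given sentence is related to a given label.
Train with a binary cross-entropy loss on a large source dataset of news headlines with SEO tags.
Develop three architectures: Architecture 1 concatenates mean-pooled word embeddings with the label embedding; Architecture 2 runs an LSTM over the sentence's words and concatenates the last hidden state with the label embedding for prediction; Architecture 3 runs an LSTM over inputs formed by concatenating the label embedding with each word ([label embedding : word]) and predicts from the last hidden state.
Initialize the word embeddings with pretrained Google News embeddings.
Evaluate on unseen labels from the source dataset and on the UCI News Aggregator and Tweet Classification datasets, using a category-tree approach that maps labels to broader categories.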
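
To make the binary formulation concrete, here is a minimal forward-pass sketch in the spirit of Architecture 1: mean-pool the word vectors, concatenate the result with the label vector, and score the pair with a small feed-forward network trained under binary cross-entropy. The hidden-layer size, the tanh activation, the random initialization, and the toy 3-dimensional vectors are all assumptions made for illustration; the summary does not specify these details, and a real setup would use the pretrained embeddings and a trained network.

```python
import math
import random

def mean_pool(word_vectors):
    """Average equal-length word vectors into a single sentence vector."""
    dim = len(word_vectors[0])
    n = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / n for i in range(dim)]

def init_params(input_dim, hidden_dim, seed=0):
    """Random weights for one hidden layer (sizes are illustrative assumptions)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
               for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)],
        "b2": 0.0,
    }

def relatedness(word_vectors, tag_vector, params):
    """Probability that the sentence is related to the tag."""
    # Concatenate the pooled sentence vector with the tag vector.
    features = mean_pool(word_vectors) + list(tag_vector)
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(params["W1"], params["b1"])]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid

def binary_cross_entropy(prob, target, eps=1e-12):
    """Loss for one (sentence, tag) pair; target is 1 if related, else 0."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

# Toy 3-dimensional vectors standing in for pretrained embeddings.
sentence = [[0.1, 0.3, -0.2], [0.4, -0.1, 0.0]]
tag = [0.2, 0.2, -0.1]
params = init_params(input_dim=6, hidden_dim=4)

p = relatedness(sentence, tag, params)
loss = binary_cross_entropy(p, target=1)
```

Because the label enters the model only as a vector, any label that has an embedding can be scored at test time, which is what allows unseen labels and datasets to be handled without retraining.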

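Experimental Results

Research Questions

RQ1: Can zero-shot learning predict sentence-label relatedness on unseen labels and datasets without retraining?
RQ2: Can neural architectures that use sentence and label embeddings generalize to new datasets and to different levels of category granularity?
RQ3: In cross-dataset evaluation, how does performance differ between using a category tree and using direct label names?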

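Key Findings

On the test set of the source dataset's binary relatedness task, the three architectures reach accuracies of up to 74%.
On unseen labels from the source dataset, Architecture 3's accuracy rises to 78%.
On the UCI News Aggregator dataset, accuracy with the category-tree approach ranges from 61.73% to 64.21% depending on the architecture. This is below the supervised state of the art, but it demonstrates cross-dataset generalization without retraining.
On the Tweet Classification dataset, the category-tree result is about 64.5% (Architecture 3), whereas direct class-name classification reaches 49% with Architecture 3.
Overall, the models show an ability to learn the relatedness between sentences and labels and to generalize to unseen datasets and concepts, although there is still room for improvement.
This work emphasizes that representations learned from noisy web data generalize better than those learned from smaller, task-specific datasets.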
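
The category-tree evaluation mentioned in the findings can be sketched as follows: each target category is expanded into a set of related tags, the binary relatedness model scores the sentence against every tag, and the category with the best-scoring tag wins. The max aggregation, the example tree, and the word-overlap `toy_score` function are hypothetical stand-ins chosen for illustration; the summary does not state how tag scores are combined, and a real pipeline would call the trained model instead.

```python
def toy_score(sentence, tag):
    """Hypothetical stand-in for a trained relatedness model.

    Returns 1.0 if the tag appears as a word in the sentence, else 0.0.
    """
    words = sentence.lower().split()
    return 1.0 if tag.lower() in words else 0.0

def classify_with_category_tree(score_fn, sentence, category_tree):
    """Pick the category whose best-matching tag scores highest.

    category_tree maps a category name to a list of related tags.
    Using max over tags is an assumption; mean or sum would also work.
    """
    best_category, best_score = None, float("-inf")
    for category, tags in category_tree.items():
        score = max(score_fn(sentence, tag) for tag in tags)
        if score > best_score:
            best_category, best_score = category, score
    return best_category, best_score

# Illustrative tree; the categories and tags here are made up.
tree = {
    "business": ["economy", "stocks", "market"],
    "health": ["medicine", "disease", "hospital"],
}

label, score = classify_with_category_tree(
    toy_score, "Stocks rally as market rebounds", tree
)
```

Under this sketch, direct class-name classification is the special case where each category's tag list contains only the category's own name, for example `{name: [name] for name in tree}`.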

This review was generated by AI and reviewed by a human editor.