QUICK REVIEW

[Paper Review] Joint Embedding of Words and Labels for Text Classification

Guoyin Wang, Chunyuan Li|arXiv (Cornell University)|May 10, 2018

Topic Modeling | References: 34 | Cited by: 45

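One-Sentence Summary

LEAM embeds words and labels in a shared space, uses label-word compatibility to compute attention weights over the words, and classifies text from the attention-weighted word embeddings, achieving high accuracy at low model complexity.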

ABSTRACT

Word embeddings are effective intermediate representations for capturing semantic regularities between words, when learning the representations of text sequences. We propose to view text classification as a label-word joint embedding problem: each label is embedded in the same space with the word vectors. We introduce an attention framework that measures the compatibility of embeddings between text sequences and labels. The attention is learned on a training set of labeled samples to ensure that, given a text sequence, the relevant words are weighted higher than the irrelevant ones. Our method maintains the interpretability of word embeddings, and enjoys a built-in ability to leverage alternative sources of information, in addition to input text sequences. Extensive results on the several large text datasets show that the proposed framework outperforms the state-of-the-art methods by a large margin, in terms of both accuracy and speed.

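Motivation and Objectives

Explain why label information should shape the text representation early on, not just the final classifier.
Propose a joint word-label embedding framework that produces label-aware, interpretable text representations.
Develop an attention mechanism based on word-label compatibility to weight the words used for classification.
Show that label embedding is both computationally efficient and accurate for text classification across multiple datasets.
Demonstrate potential clinical applications through multi-label medical code prediction and the interpretability of the attended words.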

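Proposed Method

Embed words and class labels in a shared latent space.
Compute the word-label compatibility G from cosine similarity, then apply a nonlinear local attention over phrase windows.
Form the text representation z as a weighted average of the word embeddings, with weights given by a SoftMax over the phrase-level attention scores.
Train end to end with a standard classification loss (cross-entropy for single-label tasks; a sigmoid-based loss for multi-label tasks) plus a regularization term that keeps each label embedding on its class manifold.
Regularize the label embeddings to align with the actual class descriptions, so that they serve as interpretable, meaningful anchors.
Optionally initialize the word embeddings with pretrained vectors (such as GloVe) and learn them jointly with the label embeddings.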
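The attention steps listed above can be sketched for a single sequence in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `leam_forward`, the parameter shapes (a window weight vector `w` of length 2r+1 shared across labels, a per-label bias `b`), the zero padding at the sequence edges, and the max-pooling over labels are choices made here for the example.

```python
import numpy as np

def leam_forward(V, C, w, b, r):
    """Label-attentive representation of one text sequence (illustrative sketch).

    V: (L, P) word embeddings, C: (K, P) label embeddings,
    w: (2r+1,) window weights, b: (K,) bias, r: phrase-window radius.
    The shapes of w and b are assumptions made for this sketch.
    """
    L = V.shape[0]
    # Cosine similarity between every word and every label: G has shape (L, K).
    V_hat = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-8)
    C_hat = C / (np.linalg.norm(C, axis=1, keepdims=True) + 1e-8)
    G = V_hat @ C_hat.T
    # Zero-pad along the sequence axis so every position has a full window.
    G_pad = np.pad(G, ((r, r), (0, 0)))
    m = np.empty(L)
    for l in range(L):
        window = G_pad[l:l + 2 * r + 1]            # (2r+1, K) phrase window
        u = np.maximum(0.0, window.T @ w + b)      # ReLU, one score per label
        m[l] = u.max()                             # strongest label response
    # SoftMax over positions gives the attention weights.
    e = np.exp(m - m.max())
    beta = e / e.sum()
    # Text representation: attention-weighted average of the word embeddings.
    z = beta @ V
    return z, beta, G

# Toy usage with random embeddings (not real data).
rng = np.random.default_rng(0)
V_demo = rng.normal(size=(6, 4))   # 6 words, embedding size 4
C_demo = rng.normal(size=(3, 4))   # 3 labels in the same space
z_demo, beta_demo, G_demo = leam_forward(
    V_demo, C_demo, w=np.ones(3) / 3, b=np.zeros(3), r=1
)
print(z_demo.shape, beta_demo.round(3))
```

With `w` and `b` set to zero every position gets the same score, so `beta` is uniform and `z` reduces to a plain average of the word embeddings. The learned window is what lets the weighting depart from that baseline.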

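Experimental Results

Research Questions

RQ1: Can joint word-label embedding improve text classification by using label information to guide the word representations?
RQ2: Does the proposed label-embedding attentive model, LEAM, match the accuracy of deep attention architectures at a lower computational cost?
RQ3: Do label embeddings provide interpretable attention that highlights the key informative words behind a prediction?
RQ4: How does LEAM perform on standard benchmark datasets and on a real-world medical code prediction task?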

Key Findings

Yahoo	DBPedia	AGNews	Yelp P.	Yelp F.
77.42	99.02	92.45	95.31	64.09
75.22	98.32	91.75	93.43	61.03
69.98	98.15	89.13	94.46	58.59
70.94	98.28	91.45	95.11	59.48
70.84	98.55	86.06	94.74	58.17
73.43	98.71	91.27	95.72	64.26
73.53	98.42	92.24	93.76	61.11
76.28	98.77	93.32	94.56	62.13

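LEAM outperforms several state-of-the-art baselines on the benchmark document classification datasets, with the top results on Yahoo and DBPedia (77.42 and 99.02 in the table above).
LEAM compares favorably in model size and speed: it converges faster than some baselines and has fewer parameters than the CNN/LSTM models.
The nonlinear, spatially aware (phrase-based) attention is necessary for high performance and outperforms the linear variant.
The label embeddings are meaningful: they correlate highly with the class centers and support interpretable attention that highlights task-relevant keywords.
On multi-label clinical text, LEAM achieves the best AUC and competitive F1 and P@5 scores, and attention visualizations highlight health-related terms.
LEAM's attention could reduce clinicians' reading burden by making informative words more salient.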
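As a rough illustration of how attention weights could be used to surface informative words for a reader, the sketch below ranks tokens by weight and returns the top k. The helper name `top_attended_words`, the example sentence, and the weights are all made up for illustration; they are not output from LEAM or data from the paper.

```python
def top_attended_words(tokens, beta, k=3):
    """Return the k (token, weight) pairs with the largest attention weight."""
    if len(tokens) != len(beta):
        raise ValueError("tokens and beta must have the same length")
    # Sort by weight, highest first, and keep the first k pairs.
    ranked = sorted(zip(tokens, beta), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical clinical-style sentence with invented attention weights.
tokens = ["patient", "reports", "chest", "pain", "since", "monday"]
beta = [0.10, 0.05, 0.35, 0.40, 0.04, 0.06]
print(top_attended_words(tokens, beta, k=2))
```

In a real pipeline the weights would come from the trained model's attention over the same token sequence, and the selected words could be highlighted in the displayed note.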

This review was generated by AI and reviewed by a human editor.