QUICK REVIEW

[Paper Review] AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification

Ronghui You, Zihan Zhang | arXiv (Cornell University) | Nov 1, 2018

Text and Document Classification Technologies | References: 31 | Cited by: 81

One-Line Summary

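AttentionXML introduces a label tree-based deep model with a multi-label attention mechanism that attends separately for each label, together with a shallow and wide probabilistic label tree, and achieves state-of-the-art XMTC performance on long texts and tail labels.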

ABSTRACT

Extreme multi-label text classification (XMTC) is an important problem in the era of big data, for tagging a given text with the most relevant multiple labels from an extremely large-scale label set. XMTC can be found in many applications, such as item categorization, web page tagging, and news annotation. Traditionally most methods used bag-of-words (BOW) as inputs, ignoring word context as well as deep semantic information. Recent attempts to overcome the problems of BOW by deep learning still suffer from 1) failing to capture the important subtext for each label and 2) lack of scalability against the huge number of labels. We propose a new label tree-based deep learning model for XMTC, called AttentionXML, with two unique features: 1) a multi-label attention mechanism with raw text as input, which allows to capture the most relevant part of text to each label; and 2) a shallow and wide probabilistic label tree (PLT), which allows to handle millions of labels, especially for "tail labels". We empirically compared the performance of AttentionXML with those of eight state-of-the-art methods over six benchmark datasets, including Amazon-3M with around 3 million labels. AttentionXML outperformed all competing methods under all experimental settings. Experimental results also show that AttentionXML achieved the best performance against tail labels among label tree-based methods. The code and datasets are available at http://github.com/yourh/AttentionXML .

Motivation and Objectives

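Motivates the XMTC problem of extremely large label sets and tail labels, on which traditional BOW models and single-representation models fall short.
Introduces a label tree-based deep architecture that scales to millions of labels.
Uses a multi-label attention mechanism so that, for each label, the model focuses on the most informative parts of the text.
Uses a shallow and wide probabilistic label tree to reduce training and inference complexity while maintaining accuracy.
Demonstrates empirical superiority over state-of-the-art XMTC methods across diverse datasets.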

Proposed Method

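Builds a shallow and wide probabilistic label tree (PLT) by compressing an initial hierarchical partition of the labels into subtrees, with a bounded number of children per internal node.
Trains a level-wise deep model (AttentionXML_d) for each PLT level: a BiLSTM over raw text with a multi-label attention mechanism that assigns a distinct attention vector to each label.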
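Computes label-specific representations with the multi-label attention mechanism: $m_j = \sum_{i=1}^{T} \alpha_{ij} h_i$, where $\alpha_{ij} = \frac{\exp(h_i^\top w_j)}{\sum_{t=1}^{T} \exp(h_t^\top w_j)}$ is a softmax over the T text positions, $h_i$ is the BiLSTM hidden state at position i, and $w_j$ is the attention vector of label j.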
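Shares the parameters of the fully connected and output layers across all labels to control model size and encourage consistency between labels.
Trains with binary cross-entropy loss and initializes each deeper level's model from the parameters of the shallower level to speed up convergence.
At inference, uses beam search that keeps only a limited number of candidate nodes at each PLT level, for efficiency.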
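
The attention formula and the level-by-level beam search described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the hidden states H stand in for BiLSTM outputs, a single hypothetical output vector `u` plays the role of the shared fully connected and output layers, one scorer is reused at every level (the paper trains a separate AttentionXML_d per level), and a node's score is taken as the product of node probabilities along its path from the root. The helper names (`label_attention`, `node_prob`, `beam_search`), the `"root"` node name, and the toy tree are all hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def label_attention(H, w_j):
    """m_j = sum_i alpha_ij * h_i, with alpha_ij = softmax over i of (h_i . w_j).

    H   : T hidden vectors h_i (stand-ins for BiLSTM outputs), each of length D.
    w_j : attention vector of label j, length D.
    """
    alpha = softmax([dot(h, w_j) for h in H])
    return [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(len(H[0]))]

def node_prob(H, w_j, u):
    # Hypothetical shared scorer: the same output vector u is used for every
    # label, standing in for the shared fully connected and output layers.
    z = dot(u, label_attention(H, w_j))
    return 1.0 / (1.0 + math.exp(-z))

def beam_search(H, children, attn, u, beam=2):
    """Expand the label tree one level at a time, keeping the `beam` best nodes.

    children : node -> list of child nodes; leaves have no entry.
               Assumes every leaf sits at the same depth and beam >= 1.
    attn     : node -> attention vector w_j.
    A node's score is the product of node probabilities on its root path.
    """
    frontier = [("root", 1.0)]
    while children.get(frontier[0][0]):
        candidates = [
            (child, score * node_prob(H, attn[child], u))
            for node, score in frontier
            for child in children[node]
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        frontier = candidates[:beam]
    return frontier

# Toy example: two hidden states and a two-level tree with four leaf labels.
H = [[1.0, 0.0], [0.0, 1.0]]
u = [2.0, -2.0]
children = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
attn = {"A": [4.0, 0.0], "B": [0.0, 4.0],
        "a1": [4.0, 0.0], "a2": [0.0, 4.0],
        "b1": [4.0, 0.0], "b2": [0.0, 4.0]}

top = beam_search(H, children, attn, u, beam=2)
print(top[0][0])  # a1
```

Here the leaf `a1`, whose attention vector (like that of its parent `A`) points at the hidden state that `u` weights positively, ranks first. Each level scores at most `beam` times the number of children per node, so inference cost grows with the beam width and branching factor rather than with the total number of labels.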

Experimental Results

Research Questions

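RQ1 Can a shallow label tree enable training on millions of labels while maintaining or improving XMTC performance?
RQ2 Does label-specific multi-label attention over raw text capture label-relevant subtext better than single-representation or bag-of-words baselines?
RQ3 Does combining the level-wise training strategy with the PLT hierarchy improve training efficiency and tail-label accuracy?
RQ4 How does AttentionXML compare with state-of-the-art XMTC methods on both standard and extreme-scale datasets, particularly for long texts and tail labels?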

Key Findings

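AttentionXML consistently outperforms eight competing methods on six XMTC benchmarks, including Amazon-3M with about 3 million labels.
AttentionXML-1 (a single PLT) also outperforms the competing methods on large-scale datasets, showing strong performance even without an ensemble.
An ensemble of three shallow PLTs further improves accuracy, especially on extreme-scale datasets, at the cost of longer training and prediction time.
BiLSTM with multi-label attention outperforms XML-CNN and BiLSTM with max-pooling on long texts (EUR-Lex, Wiki10-31K).
AttentionXML outperforms Parabel and Bonsai on tail labels (PSP@k), showing its advantage for long-tailed XMTC.
Longer texts benefit more from label-specific attention, with notable gains on Wiki-500K, EUR-Lex, and Wiki10-31K.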
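
PSP@k, the tail-label metric cited above, weights each correct prediction by the inverse of the label's propensity, so getting a rarely observed label right counts for more than getting a frequent one right. Below is a minimal sketch of P@k and the unnormalized PSP@k, assuming the per-label propensities are already given. The label names and propensity values are hypothetical; in practice propensities are estimated from label frequencies in the training data, and some evaluation toolkits additionally normalize PSP@k by its best achievable value.

```python
def top_k_labels(scores, k):
    # Rank labels by predicted score, highest first, and keep the top k.
    return sorted(scores, key=scores.get, reverse=True)[:k]

def precision_at_k(scores, true_labels, k):
    """P@k: fraction of the top-k predicted labels that are relevant."""
    return sum(1 for l in top_k_labels(scores, k) if l in true_labels) / k

def psp_at_k(scores, true_labels, propensity, k):
    """Unnormalized propensity-scored precision at k.

    A correct label l contributes 1 / propensity[l] instead of 1, so labels
    with a small propensity (tail labels) are weighted more heavily.
    """
    hits = [l for l in top_k_labels(scores, k) if l in true_labels]
    return sum(1.0 / propensity[l] for l in hits) / k

# Hypothetical example: "head" is a frequent label, "tail" a rare one.
true_labels = {"head", "tail"}
propensity = {"head": 0.5, "tail": 0.1, "other": 0.5}

finds_tail = {"head": 0.9, "tail": 0.8, "other": 0.1}
misses_tail = {"head": 0.9, "other": 0.8, "tail": 0.1}

print(precision_at_k(finds_tail, true_labels, 2),
      psp_at_k(finds_tail, true_labels, propensity, 2))   # 1.0 6.0
print(precision_at_k(misses_tail, true_labels, 2),
      psp_at_k(misses_tail, true_labels, propensity, 2))  # 0.5 1.0
```

Both rankings retrieve the frequent label, but only the one that also retrieves the rare label scores well under PSP@k, which is what makes it a measure of tail-label performance.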

This review was generated by AI and checked by a human editor.