QUICK REVIEW

[Paper Review] A no-regret generalization of hierarchical softmax to extreme multi-label classification

Marek Wydmuch, Kalina Jasińska|arXiv (Cornell University)|Oct 27, 2018

Text and Document Classification Technologies | Cited by 41

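One-Sentence Summary

This paper treats probabilistic label trees (PLTs) as a no-regret generalization of hierarchical softmax (HSM) to extreme multi-label classification; proves that the pick-one-label heuristic is inconsistent in the multi-label setting; introduces extremeText (XT), a PLT-based implementation; and shows that XT outperforms HSM with pick-one-label and XML-CNN while remaining competitive with state-of-the-art methods in statistical performance, model size, and prediction time.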

ABSTRACT

Extreme multi-label classification (XMLC) is a problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. Large label spaces can be efficiently handled by organizing labels as a tree, like in the hierarchical softmax (HSM) approach commonly used for multi-class problems. In this paper, we investigate probabilistic label trees (PLTs) that have been recently devised for tackling XMLC problems. We show that PLTs are a no-regret multi-label generalization of HSM when precision@k is used as a model evaluation metric. Critically, we prove that pick-one-label heuristic - a reduction technique from multi-label to multi-class that is routinely used along with HSM - is not consistent in general. We also show that our implementation of PLTs, referred to as extremeText (XT), obtains significantly better results than HSM with the pick-one-label heuristic and XML-CNN, a deep network specifically designed for XMLC problems. Moreover, XT is competitive to many state-of-the-art approaches in terms of statistical performance, model size and prediction time which makes it amenable to deploy in an online system.

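Motivation and Objectives

Motivate XMLC and the need for scalable, accurate estimation of label probabilities over extremely large label spaces.
Present probabilistic label trees (PLTs) as the proper multi-label generalization of hierarchical softmax (HSM).
Establish theoretical guarantees showing that PLTs are no-regret under precision@k.
Develop XT, an efficient implementation built on fastText.
Compare XT empirically against strong baselines, showing a favorable trade-off among accuracy, model size, and prediction time.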

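Proposed Method

Formulate XMLC in terms of the marginal label probabilities eta_j(x), with precision@k as the key metric.
Show that the pick-one-label reduction used with HSM is in general not consistent for multi-label precision@k.
Introduce PLTs with an extended encoding that includes a root indicator, so that node classifiers can be trained independently and produce calibrated probability estimates at prediction time.
Give theoretical bounds: the estimation error of eta_j is bounded by the regrets of the node classifiers along the path to label j (Theorem 1), and reg_p@k is bounded by the label-wise estimation errors (Theorem 2).
Describe the XT implementation: online training on dense representations, TF-IDF weighted features, L2 regularization, and a balanced multi-way tree built by top-down balanced clustering.
Explain the choice of tree (e.g., Huffman versus clustering) and justify balancedness in terms of the statistical/computational trade-off.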
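To make the path-based estimate concrete, the following is a minimal sketch of how a PLT can turn node-level probabilities into marginal label probabilities eta_j(x): each label is a leaf, and its estimate is the product of the node classifier outputs along the path from the root to that leaf. The tree, the label names, and the probability values are all hypothetical and chosen only for illustration; this is not the XT code, and a real model would obtain the node probabilities from trained classifiers.

```python
from math import prod

# Hypothetical toy tree: node -> parent (the root, node 0, has no parent).
# Leaves 3, 4, 5, 6 stand for the labels "a", "b", "c", "d".
PARENT = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
LEAF_OF_LABEL = {"a": 3, "b": 4, "c": 5, "d": 6}

# Made-up node classifier outputs for one instance x:
# NODE_PROB[t] approximates P(z_t = 1 | z_parent(t) = 1, x), where z_t = 1
# means "at least one relevant label lies in the subtree of t".
# Unlike HSM, sibling probabilities need not sum to 1, because several
# labels can be relevant at the same time.
NODE_PROB = {0: 1.0, 1: 0.9, 2: 0.5, 3: 0.8, 4: 0.5, 5: 0.2, 6: 0.6}


def path_to_root(node):
    """Return the nodes on the path from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def marginal(label):
    """Estimate eta_label(x) as the product of node probabilities on the path."""
    return prod(NODE_PROB[t] for t in path_to_root(LEAF_OF_LABEL[label]))


def top_k(k):
    """Return the k labels with the highest estimated marginals."""
    scores = {label: marginal(label) for label in LEAF_OF_LABEL}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    for label in LEAF_OF_LABEL:
        print(label, round(marginal(label), 4))
    print("top-2:", top_k(2))
```

This sketch scores every leaf, which is fine for four labels. With an extremely large label set, an implementation would more plausibly search the tree and prune subtrees whose partial product is already too small, since the product can only shrink further down the path.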

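Experimental Results

Research Questions

RQ1: Can PLTs provide no-regret marginal probability estimates for extreme multi-label classification?
RQ2: Under common evaluation metrics such as precision@k, is the pick-one-label reduction a consistent approach to multi-label XMLC?
RQ3: How does the PLT-based method (XT) compare with HSM-based methods and a deep network (XML-CNN) in accuracy, model size, and speed?
RQ4: Which practical guidelines (tree construction, feature representation, regularization) yield robust XT performance across XMLC datasets?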

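Key Findings

PLTs provide no-regret marginal probability estimates for multi-label XMLC, resolving the inconsistency of the pick-one-label approach.
The pick-one-label heuristic is in general not consistent for precision@k, whereas PLTs carry theoretical guarantees under strongly proper composite losses.
XT (extremeText) significantly outperforms HSM-based methods (fastText, Learned Tree) and is competitive with state-of-the-art XMLC methods, while offering faster prediction and smaller models.
XT achieves near state-of-the-art precision@k on several large benchmarks, with online prediction that is orders of magnitude faster than some baselines (such as DiSMEC and PPDSparse).
The tree structure (top-down clustering) and the TF-IDF weighted representation contribute substantially to XT's performance and robustness.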
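The inconsistency of pick-one-label can be checked by hand on a small case. The sketch below assumes a hypothetical conditional distribution over label sets for a single instance (three labels, values chosen for illustration) and compares the true marginals eta_j(x) with the class probabilities that the pick-one-label reduction induces, where each relevant label in a set is sampled uniformly. It uses exact fractions so the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction as F

# Hypothetical conditional distribution P(y | x) over label sets.
DIST = {
    frozenset({1}): F(1, 10),
    frozenset({1, 2}): F(5, 10),
    frozenset({3}): F(4, 10),
}
LABELS = [1, 2, 3]


def marginals(dist):
    """eta_j(x): total probability of the label sets that contain label j."""
    return {j: sum(p for y, p in dist.items() if j in y) for j in LABELS}


def pick_one_label(dist):
    """Class probabilities when one relevant label is drawn uniformly per set."""
    return {j: sum(p / len(y) for y, p in dist.items() if j in y) for j in LABELS}


eta = marginals(DIST)         # {1: 3/5, 2: 1/2, 3: 2/5}
pol = pick_one_label(DIST)    # {1: 7/20, 2: 1/4, 3: 2/5}

best_marginal = max(LABELS, key=eta.get)   # label 1
best_pick_one = max(LABELS, key=pol.get)   # label 3

# The expected precision@1 of predicting label j equals eta_j(x), so the gap
# below is the precision@1 regret incurred by following pick-one-label.
regret_at_1 = eta[best_marginal] - eta[best_pick_one]   # 1/5

if __name__ == "__main__":
    print("marginals:", eta)
    print("pick-one-label:", pol)
    print("regret@1:", regret_at_1)
```

In this example even a perfectly trained multi-class model on the reduced problem would rank label 3 first, although label 1 has the highest marginal probability. Ranking by the marginals, which is what PLTs estimate, avoids this loss.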


This review was generated by AI and checked by human editors.