QUICK REVIEW

[Paper Review] ZLPR: A Novel Loss for Multi-label Classification

Jianlin Su, Mingren Zhu|arXiv (Cornell University)|Aug 5, 2022

Text and Document Classification Technologies|Cited by 26

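One-sentence summary

ZLPR is a zero-bounded log-sum-exp and pairwise rank-based loss for multi-label classification that handles an uncertain number of target labels and captures label dependencies while keeping prediction efficient.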

ABSTRACT

In the era of deep learning, loss functions determine the range of tasks available to models and algorithms. To support the application of deep learning in multi-label classification (MLC) tasks, we propose the ZLPR (zero-bounded log-sum-exp & pairwise rank-based) loss in this paper. Compared to other rank-based losses for MLC, ZLPR can handle problems in which the number of target labels is uncertain, which, from this point of view, makes it as capable as the other two strategies often used in MLC, namely binary relevance (BR) and label powerset (LP). Additionally, ZLPR takes the correlation between labels into consideration, which makes it more comprehensive than the BR methods. In terms of computational complexity, ZLPR can compete with the BR methods because its prediction is also label-independent, which makes it take less time and memory than the LP methods. Our experiments demonstrate the effectiveness of ZLPR on multiple benchmark datasets and multiple evaluation metrics. Moreover, we propose a soft version of ZLPR and the corresponding KL-divergence calculation method, which makes it possible to apply regularization tricks such as label smoothing to enhance the generalization of models.

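Motivation and objectives

Propose a loss function for multi-label classification (MLC) tasks that is suited to deep learning.
Develop a loss that handles an uncertain number of target labels while also accounting for correlations between labels.
Provide a computationally efficient alternative to LP methods that handles label dependencies better than BR methods.
Propose a soft version with KL-divergence regularization to improve generalization.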

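Proposed method

Defines the ZLPR loss as L_zlpr = log(1 + sum_{i in Omega_pos} e^{-s_i}) + log(1 + sum_{j in Omega_neg} e^{s_j}), where Omega_pos is the set of positive labels and Omega_neg = Λ \ Omega_pos is its complement in the full label set Λ.
Explains that s_i is the model's logit for label i, and that "zero-bounded" refers to the sign of s_i deciding relevance at prediction time: label i is predicted as a target when s_i > 0.
Shows that ZLPR combines the strengths of the BR and LR (rank-based) paradigms: it captures label dependencies while keeping prediction efficient.
Derives an equivalent inner-product form: L_zlpr = log(1 + <y, e^{-s}>) + log(1 + <1 - y, e^{s}>), where y is the multi-hot label vector.
Introduces the precursor TLPR form and simplifies it to a threshold-free form by setting s_0 = 0, which makes it more practical.
Discusses a soft-label version, L_zlpr^soft, for probabilistic labels, together with its gradient, so that techniques such as label smoothing can be applied.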
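The two forms of the loss and the sign-based prediction rule can be sketched in a few lines of Python. This is a minimal illustration for a single sample, not the authors' implementation: the function names `zlpr_loss_sets`, `zlpr_loss_inner`, and `zlpr_predict` are hypothetical, labels are assumed to be indexed 0..K-1, and no numerical stabilization is applied (a practical version would likely use a log-sum-exp trick for large logits).

```python
import math

def zlpr_loss_sets(s, pos):
    """ZLPR loss for one sample, using explicit index sets.

    s   : list of logits, one per label
    pos : indices of the positive labels (Omega_pos)
    """
    pos = set(pos)
    # Sum of e^{-s_i} over positive labels.
    pos_sum = sum(math.exp(-s[i]) for i in pos)
    # Sum of e^{s_j} over negative labels, i.e. every index not in Omega_pos.
    neg_sum = sum(math.exp(s[j]) for j in range(len(s)) if j not in pos)
    return math.log1p(pos_sum) + math.log1p(neg_sum)

def zlpr_loss_inner(s, y):
    """Same loss in inner-product form, with y a multi-hot (0/1) vector."""
    pos_sum = sum(y_k * math.exp(-s_k) for s_k, y_k in zip(s, y))        # <y, e^{-s}>
    neg_sum = sum((1 - y_k) * math.exp(s_k) for s_k, y_k in zip(s, y))   # <1 - y, e^{s}>
    return math.log1p(pos_sum) + math.log1p(neg_sum)

def zlpr_predict(s):
    """Zero-bounded prediction: a label is a target when its logit is positive."""
    return [1 if s_k > 0 else 0 for s_k in s]

# Illustrative logits for four labels; labels 0 and 2 are the targets.
logits = [2.0, -1.0, 0.5, -3.0]
print(zlpr_loss_sets(logits, [0, 2]))
print(zlpr_loss_inner(logits, [1, 0, 1, 0]))
print(zlpr_predict(logits))
```

Both loss functions should print the same value for this input, and the number of predicted labels is not fixed in advance: it is simply however many logits are positive. Multiplying the arguments of the two logarithms gives 1 + sum_i e^{-s_i} + sum_j e^{s_j} + sum_{i,j} e^{s_j - s_i}, which makes the pairwise comparison between negative and positive logits explicit.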

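Experimental results

Research questions

RQ1: Can ZLPR effectively handle MLC tasks in which the number of target labels is uncertain?
RQ2: Does ZLPR capture label dependencies better than BR, and is it competitive with LR-based losses?
RQ3: Is ZLPR computationally efficient for deep learning models compared with LP-based methods?
RQ4: Does the soft version of ZLPR with KL-divergence regularization improve generalization?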

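Key findings

ZLPR performs strongly on SubACC, achieving the best result in 16 of 22 experiments, which suggests that it captures label dependencies effectively.
ZLPR also performs well across datasets on ranking-related metrics such as AvgPrec and RankLoss.
DL2 is competitive with ZLPR on MLC-F1, Micro-F1, and Macro-F1, but ZLPR outperforms BCE, FL, and DL1 on several F1-based metrics.
Compared with LSEP and BCE, ZLPR balances the influence of positive and negative labels while retaining dependency information, without requiring expensive sampling.
ZLPR admits a soft-label variant with KL divergence, which enables regularization techniques such as label smoothing and dropout-based regularization.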
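To make the SubACC finding concrete, the sketch below computes an exact-match score from logits. It assumes SubACC denotes subset accuracy, i.e. the fraction of samples whose entire predicted label set equals the true label set; the function name `subset_accuracy` and the toy data are hypothetical and do not come from the paper's experiments.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full multi-hot vector is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)

# Toy logits for three samples and three labels (illustrative values only).
logits = [
    [1.2, -0.7, 0.3],
    [-2.0, 0.9, 0.4],
    [-0.5, -1.1, -0.2],
]
# Zero-bounded decision rule: predict a label when its logit is positive.
y_pred = [[1 if s > 0 else 0 for s in row] for row in logits]
y_true = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]
print(subset_accuracy(y_true, y_pred))
```

Because a single wrong label makes the whole sample count as a miss, this metric rewards models that get label combinations right, which is presumably why the summary reads strong SubACC results as evidence of captured label dependencies.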


This review was generated by AI and reviewed by a human editor.