QUICK REVIEW

[Paper Review] Building Efficient Universal Classifiers with Natural Language Inference

Moritz Laurer, Wouter van Atteveldt|arXiv (Cornell University)|Dec 29, 2023

Topic Modeling | Cited by 10

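One-Sentence Summary

The paper shows how Natural Language Inference (NLI) can serve as a universal, efficient classification task. It provides a practical pipeline and a universal classifier trained on 33 datasets with 389 classes, which improves zeroshot performance by 9.4% over NLI-only models.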

ABSTRACT

Generative Large Language Models (LLMs) have become the mainstream choice for fewshot and zeroshot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when they only want to automate a classification task. Smaller BERT-like models can also learn universal tasks, which allow them to do any text classification task without requiring fine-tuning (zeroshot classification) or to learn new tasks with only a few examples (fewshot), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task that follows similar principles as instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier that is trained on 33 datasets with 389 diverse classes. Parts of the code we share has been used to train our older zeroshot classifiers that have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zeroshot performance by 9.4%.

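Motivation and Goals

Demonstrate that NLI can serve as a universal classification task for zeroshot and fewshot learning.
Provide a practical, reproducible pipeline that combines NLI and non-NLI data to build a universal classifier.
Release a universal classifier trained on diverse datasets, with guidance on adapting it to new tasks and domains.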

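Proposed Method

Combine five NLI datasets with 28 non-NLI datasets into a single binary entailment format.
Convert non-NLI class labels into hypothesis statements, and pair each text with the hypotheses of all classes for evaluation.
Fine-tune an encoder-only Transformer (DeBERTaV3) on the concatenated hypothesis-premise data with a binary entailment objective.
Evaluate the model with balanced accuracy on 28 held-out tasks as well as on in-domain tasks.
Provide notebooks and tools for training, evaluating, and adapting the universal classifier; release deberta-v3-zeroshot-v1.1-all-33 as the recommended model.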
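
The conversion step above can be illustrated with a minimal sketch. It assumes a made-up label set, made-up hypothesis wording, and a label convention (0 = entailment, 1 = not entailment) chosen only for illustration; the names `HYPOTHESES`, `NliPair`, and `to_nli_pairs` are hypothetical and are not taken from the authors' notebooks.

```python
from dataclasses import dataclass

# Hypothetical label-to-hypothesis mapping; the wording is an illustrative assumption.
HYPOTHESES = {
    "politics": "This text is about politics.",
    "sports": "This text is about sports.",
    "economy": "This text is about the economy.",
}


@dataclass(frozen=True)
class NliPair:
    premise: str     # the original text to classify
    hypothesis: str  # a class label verbalised as a statement
    label: int       # assumed convention: 0 = entailment, 1 = not entailment


def to_nli_pairs(text, true_label, hypotheses):
    """Pair one text with every class hypothesis in binary entailment format."""
    if true_label not in hypotheses:
        raise ValueError(f"unknown label: {true_label!r}")
    return [
        NliPair(premise=text, hypothesis=hyp, label=0 if name == true_label else 1)
        for name, hyp in hypotheses.items()
    ]


pairs = to_nli_pairs("The central bank raised interest rates.", "economy", HYPOTHESES)
for pair in pairs:
    print(pair.label, "|", pair.hypothesis)
```

Because every dataset ends up in the same premise, hypothesis, and binary label shape, NLI and non-NLI data can be concatenated and used to train a single model.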

Figure 1: Illustration of universal classification with BERT-NLI based on Laurer et al., 2023a
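
The prediction step of NLI-based classification can be sketched as follows: score each class hypothesis against the text and return the class whose hypothesis is most entailed. The scorer here is a toy word-overlap function that merely stands in for a fine-tuned NLI model; `classify`, `toy_entailment_score`, and the example hypotheses are hypothetical names and values used only for illustration.

```python
def classify(text, hypotheses, entailment_score):
    """Return the label whose hypothesis receives the highest entailment score."""
    scores = {
        label: entailment_score(text, hypothesis)
        for label, hypothesis in hypotheses.items()
    }
    return max(scores, key=scores.get)


def toy_entailment_score(premise, hypothesis):
    """Toy stand-in for a real NLI model: share of hypothesis words found in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().rstrip(".").split()
    return sum(word in premise_words for word in hypothesis_words) / len(hypothesis_words)


# Illustrative hypotheses; the wording is an assumption, not the authors' templates.
hypotheses = {
    "sports": "This text is about football.",
    "economy": "This text is about inflation.",
}

prediction = classify(
    "Inflation rose again while wages stayed flat.",
    hypotheses,
    toy_entailment_score,
)
print(prediction)
```

In practice the scoring function would be the entailment probability produced by the fine-tuned model; the surrounding selection logic stays the same.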

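Experimental Results

Research Questions

RQ1: Can NLI serve as a universal task for zeroshot classification across diverse tasks without task-specific fine-tuning?
RQ2: Does mixing NLI data with non-NLI classification data improve zeroshot and fewshot generalization compared with using NLI data alone?
RQ3: What is the computational cost trade-off of using NLI for multi-class classification, and how does it scale with the number of classes?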
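
The scaling question in RQ3 can be made concrete with a small back-of-the-envelope sketch. It assumes that NLI-style classification scores one text-hypothesis pair per class, while a conventional classifier with a softmax head needs one pass per text. It counts model calls only and ignores batching and sequence-length effects; `forward_passes` is a hypothetical helper.

```python
def forward_passes(n_texts, n_classes, nli_style):
    """Count model forward passes needed to classify n_texts texts.

    Assumption: NLI-style classification scores one (text, hypothesis) pair
    per class, whereas a conventional classifier needs one pass per text.
    """
    if n_texts < 0 or n_classes < 1:
        raise ValueError("n_texts must be >= 0 and n_classes must be >= 1")
    return n_texts * n_classes if nli_style else n_texts


for n_classes in (2, 10, 100):
    nli_cost = forward_passes(1_000, n_classes, nli_style=True)
    standard_cost = forward_passes(1_000, n_classes, nli_style=False)
    print(f"{n_classes:>3} classes: NLI-style {nli_cost:>7} passes, standard {standard_cost} passes")
```

Under this assumption the cost grows linearly with the number of classes, so the approach is most attractive for tasks with a small or moderate label set.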

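Key Findings

Training on a mix of NLI and non-NLI data yields higher zeroshot performance than NLI-only training, with an average improvement of 9.4%.
The universal classifier trained on 33 datasets and 389 classes covers a broader range of tasks and generalizes better, including to unseen datasets.
The model deberta-v3-zeroshot-v1.1-all-33 is recommended for downstream zeroshot classification tasks.
Each training run covers roughly 9 million hypothesis-premise pairs and takes several hours on a modern GPU; evaluating on unseen data requires multiple runs.
Some negative transfer occurs, where the mixed-task model performs worse than the NLI-only model on certain datasets, but the overall gains are robust.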

Figure 3: Mean performance across 28 classification tasks.
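
Since the evaluation is reported in balanced accuracy, a minimal sketch of that metric may help when reading the results. It uses the standard definition (the unweighted mean of per-class recall); the toy labels are made up and are not results from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(y_true, y_pred):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Made-up toy data: the classifier always predicts the majority class "a".
y_true = ["a", "a", "a", "a", "b"]
y_pred = ["a", "a", "a", "a", "a"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

On this imbalanced toy set the majority-class predictor reaches a plain accuracy of 0.8 but a balanced accuracy of only 0.5, which is why the metric suits datasets with uneven class sizes.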

This review was generated by AI and reviewed by human editors.