QUICK REVIEW

[Paper Review] Language Models are General-Purpose Interfaces

Yaru Hao, Haoyu Song|arXiv (Cornell University)|Jun 13, 2022

Topic Modeling | Cited by 27

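One-Sentence Summary

MetaLM trains a semi-causal language model that docks bidirectional encoders (for language and vision) with a causal language model. The causal model serves as a general-purpose interface across tasks and supports multitask finetuning, instruction tuning, and in-context learning in both language-only and vision-language settings.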

ABSTRACT

Foundation models have received much attention due to their effectiveness across a broad range of downstream applications. Though there is a big convergence in terms of architecture, most pretrained models are typically still developed for specific tasks or modalities. In this work, we propose to use language models as a general-purpose interface to various foundation models. A collection of pretrained encoders perceive diverse modalities (such as vision, and language), and they dock with a language model that plays the role of a universal task layer. We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders. We subsume the advantages and capabilities from both causal and non-causal modeling, thereby combining the best of two worlds. Specifically, the proposed method not only inherits the capabilities of in-context learning and open-ended generation from causal language modeling, but also is conducive to finetuning because of the bidirectional encoders. More importantly, our approach seamlessly unlocks the combinations of the above capabilities, e.g., enabling in-context learning or instruction following with finetuned encoders. Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

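Motivation and Objectives

Unify diverse tasks under a single open-ended generation interface, mediated by a universal task layer.
Develop a semi-causal pretraining objective that jointly trains the encoders and the interface.
Show that the MetaLM interface supports in-context learning, finetuning, and zero-shot/few-shot generalization across language-only and vision-language tasks.
Show that combining non-causal encoders with a causal decoder matches or exceeds the performance of task-specific models.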

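Proposed Method

Introduce MetaLM, a semi-causal language model with a unidirectional decoder and multiple bidirectional encoders for different modalities, attached through connectors.
Design a semi-causal objective that generates tokens autoregressively while conditioning on bidirectional span representations from the encoders.
Use connector layers to map encoder outputs into the universal task layer, with a shared output vocabulary for the predicted tokens.
Pretrain on large-scale English text (the Pile); for vision-language tasks, additionally use image-text pairs under a joint pretraining objective.
Evaluate on language-only and vision-language benchmarks under multitask finetuning, single-task finetuning, instruction tuning, in-context learning, zero-shot/few-shot settings, and downstream finetuning.
Report comparisons between MetaLM and GPT under multitask finetuning across multiple task clusters.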
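
The semi-causal objective described above can be illustrated with a small sketch. It is not the authors' implementation: the function name `semi_causal_plan`, the half-open `(start, end)` span format, and the idea of expressing the information flow as one visibility matrix are all assumptions made for illustration. The sketch marks which positions sit inside a bidirectionally encoded span, which earlier or same-span positions each position can draw on, and which positions would be predicted autoregressively.

```python
def semi_causal_plan(seq_len, spans):
    """Illustrative layout for a semi-causal objective (hypothetical helper).

    spans: non-overlapping half-open (start, end) index ranges that are
    assumed to be handed to a bidirectional encoder.
    Returns (visible, loss_positions).
    """
    span_id = [-1] * seq_len  # -1 means "outside every encoded span"
    for k, (start, end) in enumerate(spans):
        if not 0 <= start < end <= seq_len:
            raise ValueError(f"span {(start, end)} is outside the sequence")
        for t in range(start, end):
            if span_id[t] != -1:
                raise ValueError("spans must not overlap")
            span_id[t] = k

    # visible[i][j]: position i may use information from position j if j is
    # not in the future (causal), or if i and j share an encoded span
    # (bidirectional within the span).
    visible = [
        [j <= i or (span_id[i] != -1 and span_id[i] == span_id[j])
         for j in range(seq_len)]
        for i in range(seq_len)
    ]

    # Only tokens outside the encoded spans are generated autoregressively.
    loss_positions = [t for t in range(seq_len) if span_id[t] == -1]
    return visible, loss_positions


# Six tokens, with positions 1 and 2 treated as one encoded span.
visible, loss_positions = semi_causal_plan(6, [(1, 3)])
print(loss_positions)  # [0, 3, 4, 5]
```

In this toy layout, position 1 can see position 2 because both lie in the same encoded span, while position 4 cannot see position 5 because everything outside a span stays strictly causal.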

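Experimental Results

Research Questions

RQ1: Does the semi-causal pretraining objective let a general-purpose language model interface benefit from both causal language modeling and bidirectional encoders?
RQ2: Does connecting multiple bidirectional encoders to one causal decoder enable effective multitask finetuning, instruction following, and in-context learning on language-only and multimodal tasks?
RQ3: How does MetaLM compare with baseline models on zero-shot/few-shot generalization, in-context learning, and finetuning?
RQ4: What gains does single-task finetuning yield when only the encoders are updated and the interface is kept fixed?
RQ5: With suitable connectors, can vision-language tasks be handled effectively through the same semi-causal interface?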

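Key Findings

Task cluster	GPT	MetaLM
Natural language inference	65.0	79.1
Sentiment analysis	92.9	94.6
Paraphrase	83.9	89.6
Coreference resolution	67.1	84.3
Commonsense reasoning	63.3	84.2
Reading comprehension	64.5	73.1
Miscellaneous	80.3	84.3
Closed-book QA	38.2	44.3
Struct-to-text	44.2	44.1
Summarization	29.8	31.0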
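
The per-cluster gaps in the table above can be tabulated with a few lines of Python. This is only a convenience sketch: the scores are copied from the table, while the English cluster names and the helper name `compare` are introduced here for illustration.

```python
# Scores copied from the table above: (cluster, GPT, MetaLM).
SCORES = [
    ("Natural language inference", 65.0, 79.1),
    ("Sentiment analysis", 92.9, 94.6),
    ("Paraphrase", 83.9, 89.6),
    ("Coreference resolution", 67.1, 84.3),
    ("Commonsense reasoning", 63.3, 84.2),
    ("Reading comprehension", 64.5, 73.1),
    ("Miscellaneous", 80.3, 84.3),
    ("Closed-book QA", 38.2, 44.3),
    ("Struct-to-text", 44.2, 44.1),
    ("Summarization", 29.8, 31.0),
]


def compare(scores):
    """Return per-cluster gaps (MetaLM minus GPT), MetaLM wins, and the largest gap."""
    gaps = {name: round(metalm - gpt, 1) for name, gpt, metalm in scores}
    wins = [name for name, gap in gaps.items() if gap > 0]
    largest = max(gaps, key=gaps.get)
    return gaps, wins, largest


gaps, wins, largest = compare(SCORES)
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {gap:+.1f}")
print(f"MetaLM ahead on {len(wins)} of {len(gaps)} clusters; largest gap: {largest}")
```

On these numbers MetaLM is ahead on nine of the ten clusters, with struct-to-text as the one cluster where GPT is marginally higher.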

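MetaLM outperforms GPT on nine of the ten multitask-finetuning clusters in the table above. The largest margins are on natural language understanding clusters such as commonsense reasoning (+20.9), coreference resolution (+17.2), and natural language inference (+14.1), followed by reading comprehension (+8.6); struct-to-text is essentially tied (44.1 vs. 44.2).
Single-task finetuning that updates only the encoders, with the interface kept fixed, yields results competitive with strong baselines.
Instruction tuning with MetaLM markedly improves zero-shot and best-template performance across clusters.
For in-context learning, MetaLM matches or exceeds GPT on several StoryCloze, HellaSwag, Winograd-style, and commonsense tasks.
On vision-language tasks, the framework supports zero-shot, in-context learning, and finetuning modes, with competitive results on VQA, captioning, visual reasoning, and explanation.
On language-only tasks, finetuning MetaLM gives marked improvements over zero-shot on clusters such as natural language inference, sentiment analysis, paraphrase, and question answering.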


This review was generated by AI and reviewed by human editors.