QUICK REVIEW

[Paper Review] Standing on the Shoulders of Giant Frozen Language Models

Yoav Levine, Itay Dalmedigos | arXiv (Cornell University) | Apr 21, 2022
Topic Modeling | Cited by 20
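One-Sentence Summary

The paper shows that three novel frozen-model methods (input-dependent prompt tuning, frozen readers, and recursive LMs) match or exceed fine-tuned models on challenging NLP tasks without updating the base model's weights. It argues that frozen language models retain their versatility, and it introduces techniques for leveraging them that cost more than existing frozen-model methods but, per the abstract, remain negligible next to a single pass through the huge frozen LM.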

ABSTRACT

Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM -- i.e., leaving its weights untouched -- still often underperform fine-tuning approaches which modify these weights in a task-dependent way. Those, in turn, suffer forgetfulness and compromise versatility, suggesting a tradeoff between performance and versatility. The main message of this paper is that current frozen-model techniques such as prompt tuning are only the tip of the iceberg, and more powerful methods for leveraging frozen LMs can do just as well as fine tuning in challenging domains without sacrificing the underlying model's versatility. To demonstrate this, we introduce three novel methods for leveraging frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs, each of which vastly improves on current frozen-model approaches. Indeed, some of our methods even outperform fine-tuning approaches in domains currently dominated by the latter. The computational cost of each method is higher than that of existing frozen model methods, but still negligible relative to a single pass through a huge frozen LM. Each of these methods constitutes a meaningful contribution in its own right, but by presenting these contributions together we aim to convince the reader of a broader message that goes beyond the details of any given method: that frozen models have untapped potential and that fine-tuning is often unnecessary.

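Motivation and Objectives

  • Show that frozen language models can reach competitive performance on multi-task and open-domain question answering benchmarks without fine-tuning the backbone model.
  • Propose and validate methods that extend what frozen language models can do, going beyond conventional prompt tuning.
  • Show that components built around a frozen LM can match or exceed fine-tuning approaches in challenging domains while preserving the model's versatility.
  • Highlight practical considerations for deploying frozen language models, such as cost and scaling.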

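Proposed Methods

  • Introduce input-dependent prompt tuning (ID-PT), in which a small prompt-generator network produces an input-specific prompt.
  • Demonstrate retrieval-augmented generation that uses a huge frozen language model as the reader and re-ranks the retrieved documents.
  • Develop LM recursion (textual and neural variants), which runs the frozen LM more than once to extract more information from the input.
  • Compare the frozen-LM methods against strong fine-tuned baselines on multi-task and open-domain question answering benchmarks.
  • Provide architecture and training details for the ID-PT prompt generator and its cross-attention-based prompt synthesis mechanism.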
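
The ID-PT data flow described in the list above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's architecture: the embedding table, the `PromptGenerator` class (mean pooling followed by one linear map per prompt position, used here in place of the cross-attention mechanism the paper describes), and all dimensions are hypothetical stand-ins chosen so the example runs with the standard library only. It shows one thing: a small trainable network maps each input to its own soft prompt, which is prepended to the input embeddings of a model whose weights are never modified.

```python
import random

EMBED_DIM = 4    # toy embedding width (assumption)
PROMPT_LEN = 2   # soft-prompt vectors generated per input (assumption)

rng = random.Random(0)

# Frozen component: a fixed embedding table standing in for the frozen LM's
# input embeddings. Nothing below ever writes to it.
VOCAB = ["what", "is", "the", "capital", "of", "france", "<unk>"]
FROZEN_EMBEDDINGS = {
    tok: tuple(rng.uniform(-1, 1) for _ in range(EMBED_DIM)) for tok in VOCAB
}


def embed(tokens):
    """Look up frozen embeddings for a non-empty list of tokens."""
    return [FROZEN_EMBEDDINGS.get(t, FROZEN_EMBEDDINGS["<unk>"]) for t in tokens]


class PromptGenerator:
    """Hypothetical small trainable network: pooled input -> soft prompt."""

    def __init__(self):
        # One EMBED_DIM x EMBED_DIM weight matrix per prompt position.
        # In a real setup these would be the only trained parameters.
        self.weights = [
            [[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
             for _ in range(EMBED_DIM)]
            for _ in range(PROMPT_LEN)
        ]

    def __call__(self, input_embeddings):
        n = len(input_embeddings)
        # Mean-pool the input so the prompt depends on the whole input.
        pooled = [sum(vec[d] for vec in input_embeddings) / n
                  for d in range(EMBED_DIM)]
        return [
            tuple(sum(w * x for w, x in zip(row, pooled)) for row in matrix)
            for matrix in self.weights
        ]


def build_frozen_lm_input(tokens, generator):
    """Prepend the input-specific soft prompt to the frozen input embeddings."""
    x = embed(tokens)
    return generator(x) + x


if __name__ == "__main__":
    gen = PromptGenerator()
    seq = build_frozen_lm_input(["what", "is", "france"], gen)
    print(len(seq), "vectors would be fed to the frozen LM")
```

Two different inputs receive two different prompts from the same generator, which is what separates this from ordinary prompt tuning, where one learned prompt is shared by every input.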

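Experimental Results

Research Questions

  • RQ1: In a large-scale multi-task setting, can a frozen language model match or even surpass fine-tuned models?
  • RQ2: Does a frozen LM augmented with external components (prompt generator, re-ranker, recursive multi-pass inference) narrow the gap to fine-tuning in open-domain question answering?
  • RQ3: How much can retrieval-augmented generation with a frozen reader improve results on benchmarks such as Natural Questions?
  • RQ4: What are the practical cost and scalability implications of deploying frozen-LM systems compared with fine-tuned models?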

Key Findings

Task Cluster | T0++ | ID-PT+J1-Large
Extractive QA | 28.5 | 26.0
Multiple-Choice QA | 62.8 | 62.9
Sentiment | 84.6 | 91.9
Paraphrase identification | 62.9 | 66.8
Topic classification | 95.4 | 95.5
Closed-book QA | 64.7 | 65.1
Sentence completion | 49.3 | 49.6
Structure-to-text | 57.9 | 50.7
Summarization | 40.0 | 35.9
Natural language inference | 36.0 | 33.7
Avg all datasets | 61.6 | 61.9
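
To make the per-cluster comparison easier to read, the short script below recomputes the gap between the two columns from the scores listed above. The numbers are copied from this page; the function and variable names are illustrative. It does not try to reproduce the "Avg all datasets" row, since that row averages over individual datasets that this review does not list.

```python
# (T0++, ID-PT+J1-Large) scores per task cluster, copied from the scores above.
SCORES = {
    "Extractive QA": (28.5, 26.0),
    "Multiple-Choice QA": (62.8, 62.9),
    "Sentiment": (84.6, 91.9),
    "Paraphrase identification": (62.9, 66.8),
    "Topic classification": (95.4, 95.5),
    "Closed-book QA": (64.7, 65.1),
    "Sentence completion": (49.3, 49.6),
    "Structure-to-text": (57.9, 50.7),
    "Summarization": (40.0, 35.9),
    "Natural language inference": (36.0, 33.7),
}


def deltas(scores):
    """Return {cluster: ID-PT score minus T0++ score}, rounded to one decimal."""
    return {name: round(idpt - t0pp, 1) for name, (t0pp, idpt) in scores.items()}


def clusters_where_idpt_leads(scores):
    """List the clusters on which the frozen ID-PT+J1-Large setup scores higher."""
    return [name for name, d in deltas(scores).items() if d > 0]


if __name__ == "__main__":
    ranked = sorted(deltas(SCORES).items(), key=lambda kv: kv[1], reverse=True)
    for name, d in ranked:
        print(f"{name:<28}{d:+.1f}")
```

On these numbers the frozen setup is ahead on six of the ten clusters and behind on four, with the largest gaps on sentiment (+7.3) and structure-to-text (-7.2).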
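  • ID-PT on the frozen 7B J1-Large model is on par with the fine-tuned 11B T0++ model on the P3 multi-task suite (61.9 vs. 61.6 averaged over all datasets), with each model ahead on some task clusters.
  • ID-PT+J1-Large scores higher on sentiment (91.9 vs. 84.6) and paraphrase identification (66.8 vs. 62.9), while T0++ does better on structure-to-text (57.9 vs. 50.7) and summarization (40.0 vs. 35.9).
  • With the same retriever (DPR) plus re-ranking, a frozen J1-Large-7B reader can outperform some fine-tuned readers on Natural Questions, and it shows gains with Spider+BM25 retrieval.
  • Under specific retriever settings, the frozen 17B J1-Grande reader combined with re-ranking matches or exceeds the FiD-Distill and EMDR2 baselines on Natural Questions.
  • LM recursion, which makes multiple passes through the frozen LM, yields significant gains over a single pass in closed-book open-domain question answering.
  • Overall, the frozen-LM methods are competitive with, and in some cases better than, several strong fine-tuned baselines while preserving the model's versatility.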
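
The frozen-reader findings above rest on a retrieve, re-rank, read pipeline. A minimal sketch of the re-ranking step follows. Everything in it is a hypothetical stand-in: the term-overlap `score` function replaces a trained re-ranker, whitespace word counts replace a real tokenizer, and the word budget and prompt template are assumptions made for illustration, not details reported in this review. The point is only that re-ranking decides which retrieved passages reach the frozen LM.

```python
def score(question, passage):
    """Toy relevance score: fraction of question terms found in the passage."""
    q_terms = set(question.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def rerank(question, passages):
    """Order passages from most to least relevant (ties keep input order)."""
    return sorted(passages, key=lambda p: score(question, p), reverse=True)


def pack_prompt(question, passages, budget=30):
    """Build a reader prompt from top-ranked passages within a word budget."""
    kept, used = [], 0
    for passage in rerank(question, passages):
        cost = len(passage.split())
        if used + cost > budget:
            continue  # skip passages that would overflow the assumed budget
        kept.append(passage)
        used += cost
    context = "\n".join(f"Passage: {p}" for p in kept)
    # The resulting string would be sent to the frozen LM, which is not
    # modelled here.
    return f"{context}\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "Berlin is a city in Germany",
        "Paris is the capital of France",
        "France borders Spain",
    ]
    print(pack_prompt("capital of france", docs, budget=10))
```

With a budget of 10 words, the example keeps the two passages that mention France and drops the unrelated one, so the reader never sees it.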


This review was generated by AI and checked by a human editor.