QUICK REVIEW

[Paper Review] Towards a Unified View of Parameter-Efficient Transfer Learning

Junxian He, Chunting Zhou | arXiv (Cornell University) | October 8, 2021

Topic Modeling · References: 36 · Citations: 279

One-Line Summary

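The paper reframes state-of-the-art parameter-efficient transfer learning methods as modifications to specific hidden states of a frozen pretrained model, and uses this view to build new variants that match full fine-tuning on several NLP tasks while tuning far fewer parameters.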

ABSTRACT

Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP. However, conventional approaches fine-tune all the parameters of the pre-trained model, which becomes prohibitive as the model size and the number of tasks grow. Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance. While effective, the critical ingredients for success and the connections among the various methods are poorly understood. In this paper, we break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them. Specifically, we re-frame them as modifications to specific hidden states in pre-trained models, and define a set of design dimensions along which different methods vary, such as the function to compute the modification and the position to apply the modification. Through comprehensive empirical studies across machine translation, text summarization, language understanding, and text classification benchmarks, we utilize the unified view to identify important design choices in previous methods. Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune fewer parameters than previous methods while being more effective, achieving comparable results to fine-tuning all parameters on all four tasks.
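As a reading aid, the reframing of prefix tuning can be condensed into a few lines. This is our summary of the derivation, not a verbatim copy: x is the query-side input, C the context, P_k and P_v the prefix key/value vectors, h = Attn(xW_q, CW_k, CW_v) the unmodified attention output, the 1/sqrt(d) softmax scaling is omitted, and f is softmax for prefix tuning and a nonlinearity such as ReLU for adapters (s = 1 recovers a plain parallel adapter).

```latex
\begin{aligned}
\mathrm{head} &= \mathrm{Attn}\big(xW_q,\ \mathrm{concat}(P_k, CW_k),\ \mathrm{concat}(P_v, CW_v)\big) \\
&= (1-\lambda(x))\,\mathrm{Attn}(xW_q, CW_k, CW_v) + \lambda(x)\,\mathrm{Attn}(xW_q, P_k, P_v) \\[4pt]
\lambda(x) &= \frac{\sum_i \exp(xW_qP_k^{\top})_i}{\sum_i \exp(xW_qP_k^{\top})_i + \sum_j \exp(xW_qW_k^{\top}C^{\top})_j} \\[4pt]
\text{prefix tuning:}\quad h &\leftarrow (1-\lambda(x))\,h + \lambda(x)\,f(xW_1)W_2,
  \qquad W_1 = W_qP_k^{\top},\ W_2 = P_v \\
\text{scaled parallel adapter:}\quad h &\leftarrow h + s\cdot f(xW_{\mathrm{down}})W_{\mathrm{up}}
\end{aligned}
```

Read side by side, both methods add a learned term computed from x to the frozen output h; they differ in the functional form, the composition (gating by λ(x) versus a fixed scale s), and where they are applied.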

Motivation and Goals

Break down and connect existing parameter-efficient tuning methods.
Identify design elements critical for effectiveness across tasks.
Propose a unified framework to transfer design choices between methods.
Instantiate and evaluate new variants that use fewer parameters while maintaining performance.

Proposed Method

Reframe parameter-efficient tuning methods as modifications to hidden representations in frozen pretrained language models.
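Define the design dimensions along which methods differ: the functional form of the modification, the modified representation (e.g., the attention or FFN sublayer), the insertion form (sequential or parallel), and the composition function that combines the modification with the original hidden representation.
Show that prefix tuning is closely connected to adapters (it can be rewritten as a gated, parallel modification of the attention output), and introduce variants such as parallel adapters, multi-head parallel adapters, and scaled parallel adapters.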
Instantiate new methods by transferring design elements across approaches and evaluate on multiple NLP tasks.
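The design dimensions above can be illustrated with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' code: a ReLU bottleneck is assumed as the functional form, a toy frozen FFN stands in for the modified sublayer, and all sizes and names (`frozen_ffn`, `delta_h`, `sequential_adapter`, `parallel_adapter`) are hypothetical. The two insertion forms differ only in whether the modification reads the sublayer's input x or its output h, and `s` plays the role of the composition scale.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def frozen_ffn(x, w1, w2):
    # Stand-in for a frozen pretrained sublayer: its weights are never updated.
    return relu(x @ w1) @ w2


def delta_h(inp, w_down, w_up):
    # Functional form (assumed): bottleneck projection d -> r -> d with a ReLU.
    return relu(inp @ w_down) @ w_up


def sequential_adapter(x, sublayer, w_down, w_up, s=1.0):
    # Sequential insertion: the modification reads the sublayer's OUTPUT h.
    h = sublayer(x)
    return h + s * delta_h(h, w_down, w_up)


def parallel_adapter(x, sublayer, w_down, w_up, s=1.0):
    # Parallel insertion: the modification reads the sublayer's INPUT x and
    # runs alongside the frozen sublayer; s is the composition scale.
    h = sublayer(x)
    return h + s * delta_h(x, w_down, w_up)


# Hypothetical sizes: hidden width, FFN width, bottleneck, number of tokens.
d, d_ff, r, n = 16, 64, 4, 3
x = rng.standard_normal((n, d))
w1 = rng.standard_normal((d, d_ff)) * 0.1
w2 = rng.standard_normal((d_ff, d)) * 0.1


def sublayer(inp):
    return frozen_ffn(inp, w1, w2)


# Only these two small matrices would be trained (2*d*r weights, biases omitted).
w_down = rng.standard_normal((d, r)) * 0.1
w_up = np.zeros((r, d))  # assumed zero init: the adapter starts as a no-op

h = sublayer(x)
seq_out = sequential_adapter(x, sublayer, w_down, w_up)
par_out = parallel_adapter(x, sublayer, w_down, w_up, s=4.0)
```

With `w_up` at zero, both variants return the frozen output `h` unchanged; once `w_up` is trained, the parallel variant adds `s * ReLU(x @ w_down) @ w_up` while the sequential variant adds `ReLU(h @ w_down) @ w_up`.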

Experimental Results

Research Questions

RQ1: How are parameter-efficient tuning methods connected within a unified framework?
RQ2: What design elements are essential for the effectiveness of these methods?
RQ3: Can useful ingredients be transferred across methods to create better variants?
RQ4: Do new variants outperform existing approaches under various resource budgets?

Key Findings

Existing methods are competitive with less than 1% of parameters tuned on classification tasks such as MNLI and SST-2, but gaps remain on higher-resource tasks such as XSum summarization and en-ro machine translation.
Parallel insertion (as in prefix tuning) generally outperforms sequential insertion; in particular, parallel adapters tend to beat sequential adapters.
Modifying the FFN consistently outperforms modifying attention when the parameter budget is larger, which suggests allocating more of the budget to FFN modifications; at very small budgets, modifying attention (as prefix tuning does) can be the better choice.
A multi-head parallel adapter (MH PA) and a Mix-And-Match adapter (MAM Adapter) achieve strong performance and can match full fine-tuning while tuning around 6.7% of parameters on XSum and MT, and around 0.5% on MNLI/SST2.
Combining and scaling design elements, e.g., prefix tuning at the attention sublayer together with a scaled parallel adapter at the FFN (the MAM Adapter), yields the strongest results among the methods compared within the unified framework.
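To make the parameter percentages above more concrete, here is a small sketch of how a tuned-parameter fraction can be counted. The counting rules are assumptions (an adapter is a down- and an up-projection plus biases; a prefix is l key and l value vectors per attention sublayer, ignoring any reparameterization used during training), and every size below is hypothetical; the numbers are not meant to reproduce the paper's reported percentages.

```python
def adapter_params(d: int, r: int, bias: bool = True) -> int:
    """Trainable parameters of one bottleneck adapter on a d-dim hidden state."""
    weights = 2 * d * r              # W_down (d x r) and W_up (r x d)
    biases = (r + d) if bias else 0  # optional biases of the two projections
    return weights + biases


def prefix_params(d: int, l: int) -> int:
    """Trainable parameters of l prefix key and l prefix value vectors of width d."""
    return 2 * l * d


def tuned_fraction(per_site: int, n_sites: int, base_params: int) -> float:
    """Share of tuned parameters relative to the frozen base model."""
    return per_site * n_sites / base_params


# Hypothetical configuration, chosen only for illustration.
D_MODEL = 1024
N_SITES = 24               # number of sublayers that receive the module
BASE_PARAMS = 400_000_000  # size of the frozen pretrained model

ffn_adapter = tuned_fraction(adapter_params(D_MODEL, r=512), N_SITES, BASE_PARAMS)
attn_prefix = tuned_fraction(prefix_params(D_MODEL, l=30), N_SITES, BASE_PARAMS)
print(f"FFN parallel adapter, r=512: {ffn_adapter:.2%}")
print(f"attention prefix, l=30:      {attn_prefix:.2%}")
```

Under these assumptions an adapter's cost grows linearly with its bottleneck r and a prefix's with its length l, so enlarging the FFN adapter is one direct way to spend a larger budget, in line with the FFN-versus-attention finding above.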

This review was generated by AI and reviewed by a human editor.