QUICK REVIEW

[Paper Review] HyperPrompt: Prompt-based Task-Conditioning of Transformers

Yun He, Huaixiu Zheng | arXiv (Cornell University) | 2022-03-01

Topic Modeling · Citations: 29

One-Line Summary

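HyperPrompt conditions a Transformer on the task by injecting a set of HyperNetwork-generated hyper-prompts into self-attention. It achieves strong multi-task performance and good efficiency with minimal additional parameters. When fully fine-tuned, it outperforms several baselines on GLUE/SuperGLUE across many model sizes.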

ABSTRACT

Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. The hyper-prompts are end-to-end learnable via generation by a HyperNetwork. HyperPrompt allows the network to learn task-specific feature maps where the hyper-prompts serve as task global memories for the queries to attend to, at the same time enabling flexible information sharing among tasks. We show that HyperPrompt is competitive against strong multi-task learning baselines with as few as $0.14\%$ of additional task-conditioning parameters, achieving great parameter and computational efficiency. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior performances over strong T5 multi-task learning baselines and parameter-efficient adapter variants including Prompt-Tuning and HyperFormer++ on Natural Language Understanding benchmarks of GLUE and SuperGLUE across many model sizes.
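The abstract describes hyper-prompts as task global memories that the queries attend to. Below is a minimal single-head sketch of that idea. It assumes the hyper-prompts are simply prepended to the keys and values of one attention layer. This is not the authors' implementation. The paper uses multi-head attention and generates the prompts with a HyperNetwork, whereas here the prompts are fixed, hypothetical numbers. The function name `attention_with_hyper_prompts` and all inputs are illustrative choices.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_with_hyper_prompts(queries, keys, values, prompt_keys, prompt_values):
    """Single-head scaled dot-product attention in which l hyper-prompt rows
    are prepended to the keys and values; the queries are left unchanged.
    (Hypothetical helper for illustration, not the paper's code.)"""
    all_keys = prompt_keys + keys        # (l + n) rows
    all_values = prompt_values + values  # (l + n) rows
    scale = math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in all_keys]
        w = softmax(scores)
        # Weighted sum of value rows, one output row per query token.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, all_values))
                        for j in range(len(all_values[0]))])
        weights.append(w)
    return outputs, weights

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
prompt_k = [[5.0, 0.0]]    # l = 1 hypothetical hyper-prompt key
prompt_v = [[10.0, 10.0]]  # matching hypothetical hyper-prompt value
out, w = attention_with_hyper_prompts(q, k, v, prompt_k, prompt_v)
print(len(out), len(w[0]))  # 2 3
```

In this sketch only the keys and values grow, from n to l + n rows. The output therefore still has one row per query token, and the extra cost scales with the prompt length l.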

Motivation and Goals

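Apply a prompt-based, task-conditioned Transformer architecture built on hyper-prompts to multi-task learning.
Show that hyper-prompts injected into self-attention act as task-specific memories while using parameters efficiently.
Show that HyperNetwork-generated prompts enable flexible information sharing across tasks and Pareto-efficient performance.
Compare against multi-task baselines and parameter-efficient adapters on GLUE and SuperGLUE across T5 model sizes.
Highlight, for harder tasks, the trade-off between tuning all parameters and tuning only the task-specific parameters.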

Proposed Method

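In each Transformer block, $l$ learnable hyper-prompts are injected into multi-head self-attention, one set for the keys and one for the values.
A HyperNetwork generates layer- and task-specific hyper-prompts, either from global task prompts (HyperPrompt-Global) or from task-specific local prompts (HyperPrompt-Share/Sep).
In HyperPrompt-Global, a global HyperNetwork conditioned on layer-aware task embeddings generates the projection matrices that produce $P^m_{\tau,k}$ and $P^m_{\tau,v}$ (the key and value hyper-prompts for layer $m$ and task $\tau$).
To limit parameter growth, the local HyperNetworks use a bottleneck structure ($D$ and $U$ for the down- and up-projections).
Model performance and efficiency (Pareto) are evaluated for full fine-tuning versus tuning only the task-conditioning parameters.
Using T5 variants, HyperPrompt is compared against the MTL baseline, Vanilla Adapter, HyperFormer++, and Prompt-Tuning on GLUE and SuperGLUE.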
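The hyper-prompt generation step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The assumptions are:

* the bottleneck is $h(P) = \mathrm{ReLU}(P D)\,U$, with $D \in \mathbb{R}^{d \times b}$ and $U \in \mathbb{R}^{b \times d}$, applied to an $l \times d$ prompt matrix;
* there are no bias terms;
* the global HyperNetwork is a single linear map whose output is reshaped into $D$ and $U$.

The function names (`bottleneck_project`, `global_hypernet_weights`, `local_hypernet_params`), the embedding values, and the demo sizes are all hypothetical.

```python
import random

def matmul(a, b):
    """Multiply an r x k matrix by a k x c matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def reshape(flat, rows, cols):
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def bottleneck_project(prompt, down, up):
    """Assumed bottleneck: l x d -> (P @ D) l x b -> ReLU -> (@ U) l x d."""
    return matmul(relu(matmul(prompt, down)), up)

def global_hypernet_weights(task_layer_embedding, generator, d_model, bottleneck):
    """Hypothetical global HyperNetwork: one linear map from a layer-aware
    task embedding to flattened D (d x b) and U (b x d). A second generator
    would be needed for the value-side projections."""
    flat = matmul([task_layer_embedding], generator)[0]
    split = d_model * bottleneck
    return (reshape(flat[:split], d_model, bottleneck),
            reshape(flat[split:], bottleneck, d_model))

def local_hypernet_params(num_tasks, num_layers, d_model, bottleneck, variant):
    """Count D/U weights (no biases) of the key and value local HyperNetworks."""
    per_layer = 2 * (2 * d_model * bottleneck)  # key + value, each with D and U
    if variant == "share":  # one set per layer, shared by all tasks
        return num_layers * per_layer
    if variant == "sep":    # a separate set per task and per layer
        return num_tasks * num_layers * per_layer
    raise ValueError(f"unknown variant: {variant}")

rng = random.Random(0)
d_model, bottleneck, prompt_len, emb_dim = 8, 2, 4, 3
prompt = random_matrix(prompt_len, d_model, rng)                   # P_tau: l x d
generator = random_matrix(emb_dim, 2 * d_model * bottleneck, rng)
embedding = [0.5, -1.0, 0.25]                                      # hypothetical task/layer embedding
down_k, up_k = global_hypernet_weights(embedding, generator, d_model, bottleneck)
p_key = bottleneck_project(prompt, down_k, up_k)                   # key hyper-prompts
print(len(p_key), len(p_key[0]))                           # 4 8
print(local_hypernet_params(8, 24, 1024, 32, "share"))     # 3145728
print(local_hypernet_params(8, 24, 1024, 32, "sep"))       # 25165824
```

In this count, the Sep variant's HyperNetwork weights grow linearly with the number of tasks, while Share's do not. The sizes passed to `local_hypernet_params` are illustrative, not the paper's T5 configurations.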

Experimental Results

Research Questions

RQ1: Can HyperNetwork-generated hyper-prompts injected into self-attention outperform standard multi-task learning and parameter-efficient adapters on GLUE/SuperGLUE?
RQ2: How does HyperPrompt-Global compare to HyperPrompt-Share/Sep in terms of performance, parameter efficiency, and information sharing across tasks?
RQ3: Does fine-tuning the entire model yield better Pareto efficiency on hard tasks like SuperGLUE than tuning only the task-conditioning parameters?
RQ4: How do the hyper-prompt length and the placement of prompts in the encoder/decoder affect performance and efficiency?
RQ5: How does the proposed approach scale with model size and number of tasks?

Key Findings

HyperPrompt-Global achieves state-of-the-art performance on SuperGLUE for T5 models up to XXL.
HyperPrompt-Global is competitive with strong MTL baselines using as few as 0.14% additional task-conditioning parameters, and it outperforms HyperFormer++.
Full fine-tuning with HyperPrompt-Global yields larger gains on SuperGLUE than tuning only task-specific parameters.
HyperPrompt-Global provides a favorable balance of lower compute (FLOPs) and competitive accuracy compared to adapters and Prompt-Tuning.
HyperPrompt-Global consistently outperforms the baselines with both T5 Base and T5 Large on the GLUE and SuperGLUE benchmarks.

This review was generated by AI and reviewed by a human editor.