QUICK REVIEW

[Paper Review] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Renrui Zhang, Jiaming Han | arXiv (Cornell University) | March 28, 2023

Multimodal Machine Learning Applications | Citations: 167

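One-Line Summary

LLaMA-Adapter freezes LLaMA 7B and trains only 1.2M adapter parameters with zero-initialized attention, reaching instruction-following performance comparable to the fully fine-tuned Alpaca. Fine-tuning takes less than one hour on 8 A100 GPUs, and the method extends to multi-modal tasks.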

ABSTRACT

We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserves its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Besides language commands, our approach can be simply extended to multi-modal instructions for learning image-conditioned LLaMA model, which achieves superior reasoning performance on ScienceQA and COCO Caption benchmarks. Furthermore, we also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.

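Motivation and Objectives

Fine-tune LLaMA into an instruction-following model efficiently, without updating all of its parameters.
Use a small set of learnable adaption prompts inserted at the higher transformer layers.
Keep training stable and preserve pre-trained knowledge by using zero-initialized attention with a learnable gate.
Demonstrate that the approach extends to multi-modal instructions and to other model families.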

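Proposed Method

Insert learnable adaption prompts into the topmost L transformer layers of LLaMA.
At each target layer l, prepend the prompt P_l to the word tokens T_l, giving the sequence [P_l; T_l].
Replace standard attention with zero-initialized attention, in which a gating factor g_l (initialized to zero) controls how strongly the adaption prompts influence the output.
Apply softmax separately to the attention scores of the adaption prompts and of the word tokens to stabilize training.
Extend to multi-modal input by incorporating image tokens I_p into the adaption prompts for visual conditioning.
Apply zero-initialized attention to fine-tuning ViT and RoBERTa to show that the mechanism generalizes.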
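The gating and the two softmax paths described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single attention head, omits the output projection and positional encoding, and applies the gate g_l directly to the prompt attention weights. The function name `zero_gated_attention` and all shapes are hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def zero_gated_attention(tokens, prompts, w_q, w_k, w_v, gate):
    """Single-head attention over [P_l; T_l] with a gated prompt path.

    tokens:  (M, C) word-token features T_l
    prompts: (K, C) learnable adaption prompts P_l
    gate:    scalar g_l, assumed to start at 0
    """
    M, C = tokens.shape
    q = tokens @ w_q                          # queries come from word tokens only
    k_p, v_p = prompts @ w_k, prompts @ w_v   # keys/values of the prompts
    k_t, v_t = tokens @ w_k, tokens @ w_v     # keys/values of the word tokens

    scale = 1.0 / np.sqrt(C)
    s_prompt = (q @ k_p.T) * scale            # (M, K) scores against prompts
    s_token = (q @ k_t.T) * scale             # (M, M) scores against word tokens

    # Causal mask: a word token may not attend to later word tokens.
    future = np.triu(np.ones((M, M), dtype=bool), k=1)
    s_token = np.where(future, -np.inf, s_token)

    # Separate softmax per path; only the prompt path is scaled by the gate.
    a_prompt = softmax(s_prompt) * gate
    a_token = softmax(s_token)
    return a_prompt @ v_p + a_token @ v_t

rng = np.random.default_rng(0)
M, K, C = 5, 3, 8
tokens = rng.normal(size=(M, C))
prompts = rng.normal(size=(K, C))
w_q, w_k, w_v = (rng.normal(size=(C, C)) for _ in range(3))

out_init = zero_gated_attention(tokens, prompts, w_q, w_k, w_v, gate=0.0)
out_other = zero_gated_attention(tokens, rng.normal(size=(K, C)), w_q, w_k, w_v, gate=0.0)
out_open = zero_gated_attention(tokens, prompts, w_q, w_k, w_v, gate=0.5)
```

With the gate at zero, `out_init` and `out_other` coincide even though the prompts differ, so the frozen model's behavior is untouched at the start of training. Once the gate moves away from zero (`out_open`), the prompts begin to contribute. Because the softmax is computed separately per path, the word-token attention weights are not renormalized by the presence of the prompts.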

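Experimental Results

Research Questions

RQ1: Can lightweight adaption with zero-initialized attention match full-model fine-tuning on instruction-following tasks?
RQ2: How does freezing the base model and training only a small adapter affect training efficiency and resource usage?
RQ3: Does the method generalize beyond text to multi-modal inputs and other modalities?
RQ4: Is zero-initialized attention the decisive factor for stability during fine-tuning and for final performance?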

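Key Findings

1.2M learnable parameters are enough to reach instruction-following performance close to that of Alpaca, which fully fine-tunes all 7B parameters.
Training takes less than one hour on 8 A100 GPUs.
LLaMA-Adapter handles multi-modal instructions and achieves competitive results on ScienceQA and COCO Caption.
Zero-initialized attention with gating markedly improves training stability and final performance, with a large gain over random initialization.
The approach also generalizes well when applied to ViT (VTAB-1k) and RoBERTa (SQuAD).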
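As a rough sanity check on the 1.2M figure, the prompt parameters can be counted as layers × prompt length × hidden size. The values below (30 adapted layers, 10 prompt tokens per layer, hidden size 4096) are assumptions for illustration and are not stated in this review; the helper `adapter_param_count` is hypothetical, and per-layer gating scalars are ignored as negligible.

```python
def adapter_param_count(num_layers: int, prompt_len: int, hidden_dim: int) -> int:
    # Each adapted layer holds prompt_len learnable vectors of size hidden_dim.
    return num_layers * prompt_len * hidden_dim

# Assumed configuration (not given in the review text).
count = adapter_param_count(num_layers=30, prompt_len=10, hidden_dim=4096)
share = count / 7e9 * 100  # share of a 7B-parameter backbone, in percent

print(f"{count:,} prompt parameters (~{count / 1e6:.1f}M)")
print(f"about {share:.3f}% of 7B frozen parameters")
```

Under these assumptions the count comes to 1,228,800, consistent with the roughly 1.2M learnable parameters reported above.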


This review was generated by AI and reviewed by a human editor.