QUICK REVIEW

[Paper Review] Label Supervised LLaMA Finetuning

Zongxi Li, Xianming Li | arXiv (Cornell University) | 2023-10-02

Topic Modeling | Citations: 20

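One-Line Summary

This paper proposes LS-LLaMA and LS-unLLaMA, label-supervised adaptations of LLaMA for text classification and NER. It reports consistent gains over BERT-Large and RoBERTa-Large in text classification and, once the causal mask is removed, state-of-the-art NER performance.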

ABSTRACT

The recent success of Large Language Models (LLMs) has gained significant attention in both academia and industry. Substantial efforts have been made to enhance the zero- and few-shot generalization capabilities of open-source LLMs through finetuning. Currently, the prevailing approach is instruction-tuning, which trains LLMs to complete real-world tasks by generating responses guided by natural language instructions. It is worth noticing that such an approach may underperform in sequence and token classification tasks. Unlike text generation tasks, classification tasks have a limited label space, where precise label prediction is more appreciated than generating diverse and human-like responses. Prior research has unveiled that instruction-tuned LLMs cannot outperform BERT, prompting us to explore the potential of leveraging latent representations from LLMs for supervised label prediction. In this paper, we introduce a label-supervised adaptation for LLMs, which aims to finetuning the model with discriminant labels. We evaluate this approach with Label Supervised LLaMA (LS-LLaMA), based on LLaMA-2-7B, a relatively small-scale LLM, and can be finetuned on a single GeForce RTX4090 GPU. We extract latent representations from the final LLaMA layer and project them into the label space to compute the cross-entropy loss. The model is finetuned by Low-Rank Adaptation (LoRA) to minimize this loss. Remarkably, without intricate prompt engineering or external knowledge, LS-LLaMA substantially outperforms LLMs ten times its size in scale and demonstrates consistent improvements compared to robust baselines like BERT-Large and RoBERTa-Large in text classification. Moreover, by removing the causal mask from decoders, LS-unLLaMA achieves the state-of-the-art performance in named entity recognition (NER). Our work will shed light on a novel approach to adapting LLMs for various downstream tasks.

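Motivation and Objectives

Explain why instruction tuning can underperform on sequence and token classification tasks.
Propose a label-supervised adaptation of LLaMA that maps latent representations into a discriminative label space.
Show that latent LLaMA representations can outperform larger generative LLMs and strong baselines on several benchmarks.
Investigate how removing the causal mask affects token-level tasks such as NER.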

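Proposed Method

Latent representations are extracted from the final LLaMA layer and projected into the label space to compute a cross-entropy loss.
The model is finetuned with LoRA to minimize this cross-entropy loss.
For token-level tasks, LlamaForTokenClassification enables per-token classification; the LS-unLLaMA variant additionally removes the causal mask so that attention becomes bidirectional.
Three pooling methods (max, mean, last) are compared, and max pooling works best for the model without the causal mask.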
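
The pooling and projection steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`pool`, `project`, `cross_entropy`), the toy hidden states, and the fixed 2-class head are all hypothetical, and the real model trains the head together with LoRA adapters on LLaMA-2.

```python
import math

def pool(hidden_states, method="max"):
    """Collapse a [seq_len][hidden_dim] list of token vectors into one vector."""
    if method == "last":
        return list(hidden_states[-1])
    columns = list(zip(*hidden_states))  # one tuple per hidden dimension
    if method == "max":
        return [max(col) for col in columns]
    if method == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling method: {method}")

def project(vector, weight, bias):
    """Linear projection into label space: logits[k] = weight[k] . vector + bias[k]."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weight, bias)]

def cross_entropy(logits, label):
    """Negative log-likelihood of the gold label under a softmax over the logits."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]

# Toy final-layer states: 3 tokens, hidden size 2 (made-up values).
hidden_states = [[0.5, -1.0], [2.0, 0.0], [-0.5, 1.5]]
# Hypothetical 2-class head (assumption: identity weights, zero bias).
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

pooled = pool(hidden_states, "max")      # sequence-level representation
logits = project(pooled, weight, bias)   # scores over the label space
loss = cross_entropy(logits, label=0)    # quantity minimized during finetuning
```

For token classification the pooling step would be skipped and `project` applied to each token vector separately, giving one loss term per token.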

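Experimental Results

Research Questions

RQ1: Can LLaMA's latent representations be used effectively for discriminative label prediction in text classification?
RQ2: Does label-supervised finetuning outperform instruction tuning and discriminative models on standard benchmarks?
RQ3: How does removing the causal mask affect token classification tasks such as NER?
RQ4: How do LS-LLaMA and LS-unLLaMA perform in multilingual and low-data settings?
RQ5: Is the smaller LLaMA-2 model (7B) sufficient, under label supervision, to outperform larger discriminative models?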

Key Results

Model	SST2	AGNews	Twitter-Fin	SST5
LS-LLaMA-2-7B	96.67	95.38	91.87	62.31
LS-LLaMA-2-13B	96.90	95.66	91.20	62.17
LS-unLLaMA-2-7B	97.36	95.68	91.54	60.50
LS-unLLaMA-2-13B	92.77	95.44	87.94	52.99

LS-LLaMA-2-7B achieves 96.67, 95.38, 91.87, and 62.31 on SST2, AGNews, Twitter-Fin, and SST5, respectively.
LS-LLaMA-2-13B achieves 96.90, 95.66, 91.20, and 62.17 on SST2, AGNews, Twitter-Fin, and SST5, respectively.
LS-unLLaMA-2-7B achieves 97.36, 95.68, 91.54, and 60.50 on SST2, AGNews, Twitter-Fin, and SST5, respectively.
LS-unLLaMA-2-13B achieves 92.77, 95.44, 87.94, and 52.99 on SST2, AGNews, Twitter-Fin, and SST5, respectively.
On NER, LS-unLLaMA outperforms the LS-LLaMA, BERT, and RoBERTa baselines, showing a marked benefit from removing the causal mask.
With the causal mask removed from the decoder, LS-unLLaMA reaches state-of-the-art NER performance on CoNLL2003 and OntoNotes V5.
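
To make the effect of removing the causal mask concrete, the following toy sketch compares attention weights with and without it. It is an illustration only: `attention_weights` is a hypothetical helper, the scores are made up, and the actual models change the attention mask inside LLaMA's decoder layers rather than using code like this.

```python
import math

def attention_weights(scores, causal):
    """Row-wise softmax over a square score matrix, optionally causally masked."""
    n = len(scores)
    weights = []
    for i in range(n):
        # With a causal mask, token i may only attend to positions 0..i.
        visible = [j for j in range(n) if not causal or j <= i]
        peak = max(scores[i][j] for j in visible)
        exps = {j: math.exp(scores[i][j] - peak) for j in visible}
        total = sum(exps.values())
        # Masked positions receive zero attention weight.
        weights.append([exps.get(j, 0.0) / total for j in range(n)])
    return weights

# Uniform toy scores for a 3-token sequence (made-up values).
scores = [[0.0] * 3 for _ in range(3)]
causal = attention_weights(scores, causal=True)
bidirectional = attention_weights(scores, causal=False)
```

Under the causal mask the first token attends only to itself, so its representation carries no right-hand context. Without the mask every token sees the whole sequence, which is the property token-level labeling tasks such as NER can benefit from.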


This review was generated by AI and checked by a human editor.