QUICK REVIEW

[Paper Review] Efficient Large Language Models: A Survey

Zhongwei Wan, Xin Wang | arXiv (Cornell University) | 2023-12-06

Topic Modeling | Citations: 23

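One-Line Summary

A systematic survey of efficient LLMs, organized into model-centric, data-centric, and framework-centric approaches, accompanied by a maintained GitHub repository that collects the related research.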

ABSTRACT

Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding and language generation, and thus have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency challenges. In this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspective, respectively. We have also created a GitHub repository where we organize the papers featured in this survey at https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey. We will actively maintain the repository and incorporate new research as it emerges. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of efficient LLMs research and inspire them to contribute to this important and exciting field.

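Motivation and Objectives

Provides an overall taxonomy of efficient LLM research from model-centric, data-centric, and framework-centric perspectives.
Summarizes the key techniques for improving the efficiency of LLM training, inference, and deployment.
Highlights the data and framework considerations that affect efficiency and scalability.
Provides an actively maintained reference repository of the related papers.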

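Proposed Method

Proposes a three-way classification of efficiency topics: model-centric, data-centric, and framework-centric.
Reviews the techniques within each category (model-centric: compression, pre-training, fine-tuning, inference, and architecture; data-centric: data selection and prompt engineering; framework-centric: specialized frameworks).
Synthesizes the findings into a structured overview and provides a GitHub resource that collects the papers.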

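Experimental Results

Research Questions

RQ1: What are the main model-centric approaches to making LLMs more efficient (quantization, pre-training, fine-tuning, inference, architecture)?
RQ2: Do data-centric strategies such as data selection and prompt engineering contribute to improving LLM efficiency?
RQ3: Which framework-level tools and frameworks are specialized for developing and deploying efficient LLMs?
RQ4: What trade-offs and practical implications do these efficiency techniques carry for large-scale models?
RQ5: How can researchers effectively navigate the efficient LLM literature through a maintained repository?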

Key Results

Model	Parameter size	Data scale	Hardware (accelerators)	Training time
GPT-3 (Brown et al., 2020)	175B	300B tokens	-	-
GPT-NeoX-20B (Black et al., 2022)	20B	825GB corpus	96 A100-40G	-
OPT (Zhang et al., 2022a)	175B	180B tokens	992 A100-80G	-
BLOOM (Scao et al., 2022)	176B	366B tokens	384 A100-80G	105 days
GLM (Zeng et al., 2022)	130B	400B tokens	786 A100-40G	60 days
LLaMA (Touvron et al., 2023a)	65B	1.4T tokens	2048 A100-80G	21 days
LLaMA-2 (Touvron et al., 2023b)	70B	2T tokens	A100-80G	71,680 GPU days
Gopher (Rae et al., 2021)	280B	300B tokens	1024 A100	13.4 days
LaMDA (Thoppilan et al., 2022)	137B	768B tokens	1024 TPU-v3	57.7 days
GLaM (Du et al., 2022)	1200B	280B tokens	1024 TPU-v4	574 hours
PanGu-alpha (Zeng et al., 2021)	13B	1.1TB corpus	2048 Ascend 910	-
PanGu-Σ (Ren et al., 2023b)	1085B	329B tokens	512 Ascend 910	100 days
PaLM (Chowdhery et al., 2022)	540B	780B tokens	6144 TPU-v4	-
PaLM-2 (Anil et al., 2023)	-	3.6T tokens	TPUv4	-
WeLM (Su et al., 2022b)	10B	300B tokens	128 A100-40G	24 days
Flan-PaLM (Chung et al., 2022)	540B	-	512 TPU-v4	37 hours
AlexaTM (Soltan et al., 2022)	20B	1.3T tokens	128 A100	120 days
Codegeex (Zheng et al., 2023)	13B	850B tokens	1536 Ascend 910	60 days
MPT-7B (Team, 2023)	7B	1T tokens	-	-
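
The table mixes days and hours, so one rough way to compare rows is to convert each to device-days (accelerator count multiplied by training time in days). The sketch below does this for a handful of rows copied from the table. The function names and the choice of rows are illustrative assumptions, not something taken from the survey, and device-days across different hardware (A100, TPU-v4, Ascend 910) are not directly comparable.

```python
# A few rows from the table above: (model, accelerator count, training time, unit).
# This is a hand-picked subset; rows missing a count or a duration are skipped.
ROWS = [
    ("BLOOM", 384, 105, "days"),
    ("LLaMA", 2048, 21, "days"),
    ("GLaM", 1024, 574, "hours"),
    ("WeLM", 128, 24, "days"),
    ("Flan-PaLM", 512, 37, "hours"),
    ("Codegeex", 1536, 60, "days"),
]

HOURS_PER_DAY = 24


def device_days(count, duration, unit):
    """Return accelerator count multiplied by training time expressed in days."""
    if unit == "hours":
        duration = duration / HOURS_PER_DAY
    elif unit != "days":
        raise ValueError(f"unsupported unit: {unit}")
    return count * duration


def summarize(rows):
    """Map each model name to its estimated device-days, largest first."""
    totals = {name: device_days(n, t, unit) for name, n, t, unit in rows}
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for name, total in summarize(ROWS).items():
        print(f"{name:<10} {total:>10,.0f} device-days")
```

For example, LLaMA's row works out to 2048 × 21 = 43,008 device-days, which gives a sense of scale next to the 71,680 GPU days reported directly for LLaMA-2.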

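The survey presents an overall taxonomy of efficient LLM research from model-centric, data-centric, and framework-centric perspectives.
It highlights a range of techniques: model compression such as quantization, pruning, low-rank approximation, and knowledge distillation; data-centric efficiency such as data selection and prompt engineering; and specialized frameworks for efficient training and serving.
The paper stresses that efficiency research spans algorithmic, system-level, and data considerations, and it provides a GitHub repository for organizing and maintaining the related literature.
Larger models deliver higher performance but sharply increase resource demands, which calls for comprehensive efficiency strategies.
It compiles representative pre-training costs and model characteristics to put the need for efficiency in context across a range of well-known LLMs.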
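
To make one of the compression techniques named in the results concrete, here is a minimal sketch of symmetric (absmax) int8 quantization, together with a back-of-the-envelope estimate of weight memory at different bit widths. This is a generic textbook illustration, not a method taken from the survey: the function names and sample weights are made up, and practical systems typically use per-channel or per-group scales and calibrated ranges.

```python
def quantize_absmax_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the shared scale."""
    return [q * scale for q in quantized]


def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 1.27, -1.0]  # made-up values for illustration
    q, scale = quantize_absmax_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("int8 values:", q)
    print(f"scale: {scale:.4f}, worst-case error: {worst:.5f}")

    # Weight-only footprint of a 175B-parameter model at several precisions.
    for bits in (16, 8, 4):
        print(f"175B parameters at {bits}-bit: {weight_memory_gb(175e9, bits):.1f} GB")
```

The rounding error per weight is bounded by half the scale, which is the accuracy cost traded for the smaller footprint. The memory estimate covers weights only and ignores activations, optimizer state, and the KV cache.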

This review was generated by AI and reviewed by a human editor.