QUICK REVIEW

[Paper Review] A Survey of Large Language Models

Wayne Xin Zhao, Kun Zhou | arXiv.org | March 31, 2023

Topic Modeling | Citations: 1,375

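One-line summary

This survey reviews recent advances in large language models (LLMs), focusing on background, scaling laws, emergent abilities, pre-training, adaptation, utilization, alignment, and evaluation, and summarizes available resources and future directions.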

ABSTRACT

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable progress is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way how we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.

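Research motivation and goals

Summarize the evolution from statistical models to Transformer-based models and its significance, and define the scope of LLMs as models with tens of billions of parameters trained on massive text data.
Synthesize four core aspects: pre-training, adaptation tuning, utilization, and capacity evaluation.
Highlight scaling laws, emergent abilities, and the practical techniques that enable LLM capabilities.
Provide an up-to-date resource guide (e.g., GitHub projects) and discuss remaining challenges and future directions.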

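Proposed method

Discusses the background and evolution of LLMs and the GPT-series models.
Presents scaling laws (the KM scaling law and the Chinchilla scaling law) and explains their implications for model size, data, and compute.
Describes the emergent abilities of LLMs (in-context learning, instruction following, step-by-step reasoning) and their relationship to scaling.
Outlines key techniques, including scaling, distributed training, ability eliciting, alignment tuning, and tool/plugin integration.
Summarizes practical resources and remaining open challenges to guide future research and development.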
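
The Chinchilla scaling law mentioned above can be made concrete with a small sketch. It assumes the parametric loss form L(N, D) = E + A/N^alpha + B/D^beta and the common approximation C ≈ 6·N·D for training FLOPs. The constants are the fitted values reported by Hoffmann et al. (2022) and are not given in this review, so they should be read as illustrative assumptions. The function names are hypothetical.

```python
from typing import Tuple

# Assumed constants: parametric fit reported by Hoffmann et al. (2022).
# They are not stated in this review and are used here for illustration only.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(flops: float) -> Tuple[float, float]:
    """Split a FLOP budget into (parameters, tokens) that minimize the predicted loss.

    Assumes flops ~= 6 * N * D. Minimizing A/N^alpha + B/D^beta under that
    constraint gives N_opt = G * (C/6)^a and D_opt = (C/6)^b / G.
    """
    a = BETA / (ALPHA + BETA)   # exponent for the parameter count
    b = ALPHA / (ALPHA + BETA)  # exponent for the token count
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    budget = flops / 6.0
    return g * budget**a, budget**b / g


if __name__ == "__main__":
    n_opt, d_opt = compute_optimal(1e21)
    print(f"params: {n_opt:.3e}, tokens: {d_opt:.3e}, "
          f"predicted loss: {chinchilla_loss(n_opt, d_opt):.3f}")
```

Under these assumptions, shifting the same budget toward a larger model with fewer tokens, or a smaller model with more tokens, raises the predicted loss. This is the "optimal allocation" reading of the law.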

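Experimental results

Research questions

RQ1: Compared with earlier pre-trained language models (PLMs), what are the defining characteristics and capabilities of large language models?
RQ2: How do scaling laws relate model size, data, and compute to performance, and what are the practical implications for LLM training?
RQ3: How do techniques for eliciting emergent abilities, together with alignment and tool use, improve the usefulness and safety of LLMs?
RQ4: What resources (data, tools, platforms) support LLM development, evaluation, and deployment, and what future directions are suggested?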

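Key findings

LLMs exhibit emergent abilities such as in-context learning, instruction following, and step-by-step reasoning, which are absent in smaller, simpler models.
Two representative scaling laws (KM and Chinchilla) describe how model size, data, and compute affect performance and how to allocate them optimally.
Training and deploying LLMs relies on distributed training frameworks, optimization tricks, and alignment techniques, including reinforcement learning from human feedback (RLHF) and instruction tuning.
External tools and plugins extend capabilities beyond text generation, compensating for limitations such as access to up-to-date information and numerical accuracy.
The survey provides a GitHub repository of curated resources and supporting materials, and discusses ongoing challenges such as data quality, alignment with human values, and interpretability.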
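
The point about external tools compensating for weak numerical accuracy can be illustrated with a minimal sketch. It assumes the model emits a hypothetical marker of the form `[[tool: argument]]` in its output. That marker syntax, the `TOOLS` registry, and the function names are invented for this example and do not come from the survey or from any specific plugin API.

```python
import ast
import operator
import re

# Only basic arithmetic is allowed, so the tool never executes arbitrary code.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculator(expression: str):
    """Evaluate an arithmetic expression by walking its syntax tree."""
    def ev(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)


# Hypothetical tool registry; a real system would register more tools here.
TOOLS = {"calculator": calculator}
_CALL = re.compile(r"\[\[(\w+):\s*(.+?)\]\]")


def run_tool_calls(model_output: str) -> str:
    """Replace each [[tool: argument]] marker with the tool's result."""
    def substitute(match):
        tool = TOOLS.get(match.group(1))
        if tool is None:
            return match.group(0)  # leave unknown tool markers untouched
        return str(tool(match.group(2)))
    return _CALL.sub(substitute, model_output)


if __name__ == "__main__":
    print(run_tool_calls("The total is [[calculator: 1234 * 5678]]."))
```

The design choice here is that the model only decides when to call a tool and with what argument. The exact arithmetic is delegated to ordinary code.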


This review was generated by AI and checked by a human editor.