QUICK REVIEW

[Paper Review] DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Zhewei Yao, Reza Yazdani Aminabadi|arXiv (Cornell University)|2023. 08. 02.

Machine Learning and Data Classification | Citations: 10

One-Line Summary

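The paper introduces DeepSpeed-Chat, a unified open-source RLHF training pipeline with a Hybrid Engine that makes training very large ChatGPT-like models, including those with billions of parameters, easy, fast, and cost-effective on anything from a single GPU to a multi-node cluster.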

ABSTRACT

ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.

Motivation and Objectives

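Provide an accessible end-to-end pipeline that democratizes RLHF training for ChatGPT-like models.
Reproduce InstructGPT's three-step RLHF workflow (SFT, reward model fine-tuning, RLHF) with data abstraction and blending support.
Provide a unified, efficient system (the Hybrid Engine) that optimizes both training and inference for RLHF workloads.
Demonstrate scalability and cost efficiency across models ranging from billions to hundreds of billions of parameters.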

Proposed Method

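Provides easy-to-use scripts that run all three RLHF steps, taking a pretrained HuggingFace model through InstructGPT-style training.
Implements the DeepSpeed-RLHF pipeline, which follows SFT, reward model fine-tuning, and RLHF, together with data abstraction and blending capabilities.
Develops the DeepSpeed Hybrid Engine (DeepSpeed-HE) for efficient RLHF training and generation, combining fast inference kernels, tensor parallelism, ZeRO-based memory optimization, and LoRA.
Introduces EMA (exponential moving average) checkpoints and Mixture Training to improve final model quality and preserve pretraining capabilities.
Offers APIs for customizing the RLHF pipeline, built on a reusable engine and PPO trainer, to make research experimentation easier.
Benchmarks throughput and scalability against Colossal-AI and HuggingFace DDP, highlighting improvements in both single-GPU and multi-node settings.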
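The EMA checkpoints mentioned in the list above can be illustrated with a minimal sketch. It assumes parameters are plain Python lists of floats, and the function name `ema_update` and the decay value are hypothetical choices for illustration; this is not DeepSpeed-Chat's own implementation, only the general exponential-moving-average rule.

```python
def ema_update(ema_params, model_params, decay=0.9):
    """Blend the running average with the current parameters.

    new_ema = decay * old_ema + (1 - decay) * current
    (decay is a hypothetical value chosen for illustration)
    """
    if len(ema_params) != len(model_params):
        raise ValueError("parameter lists must have the same length")
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, model_params)]


# Hypothetical actor weights recorded after three training steps.
checkpoints = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]

# Start the average from the first checkpoint, then fold in the later ones.
ema = list(checkpoints[0])
for params in checkpoints[1:]:
    ema = ema_update(ema, params, decay=0.5)

print(ema)
```

The averaged weights lag behind the latest checkpoint, which is the smoothing effect an EMA checkpoint relies on; a larger decay would make the average change more slowly.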

Experimental Results

Research Questions

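RQ1: How can RLHF training of ChatGPT-like models be made accessible, fast, and cost-effective at a range of scales?
RQ2: Which system designs and optimizations enable end-to-end RLHF (SFT, RM fine-tuning, RLHF) with large actor and reward models?
RQ3: How does the unified Hybrid Engine perform against existing frameworks in the generation and training phases?
RQ4: What are the practical cost, time, and scalability benefits when training models from billions to hundreds of billions of parameters?
RQ5: Can a flexible API be used to customize the RLHF pipeline and explore new RLHF strategies?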

Key Results

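In the reported settings, DeepSpeed-HE trains RLHF more than 15x faster than existing systems.
On a single node with 8x A100-40G on Azure, OPT-13B trains in 9 hours and OPT-30B in 18 hours, at a cost of under about $300 and $600, respectively.
On 64x A100-80G across multiple nodes, OPT-13B trains in 1.25 hours and OPT-175B in 20 hours, at a cost of up to about $5,120.
DeepSpeed-HE scales to models from 13B to 175B parameters on scalable hardware, and can also train models with more than 13B parameters on a single GPU.
In comparisons, DeepSpeed-HE delivers a 6–19x speedup over Colossal-AI and 1.4–10.5x over HuggingFace DDP, and supports models up to 7.5x larger on the same hardware.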
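As a rough sanity check on the cost figures listed above, the sketch below converts each reported run into GPU-hours and an implied price per GPU-hour. The helper names are hypothetical, and because the review gives the costs as approximate upper bounds, the resulting rates should be read as approximate upper bounds too, not as Azure's actual pricing.

```python
def gpu_hours(num_gpus, hours):
    """Total GPU-hours consumed by a run."""
    return num_gpus * hours


def cost_per_gpu_hour(total_cost_usd, num_gpus, hours):
    """Implied price per GPU-hour for a run with a known total cost."""
    return total_cost_usd / gpu_hours(num_gpus, hours)


# (label, GPU count, wall-clock hours, approximate cost in USD),
# taken from the figures quoted in the results list above.
runs = [
    ("OPT-13B, 8x A100-40G", 8, 9.0, 300.0),
    ("OPT-30B, 8x A100-40G", 8, 18.0, 600.0),
    ("OPT-175B, 64x A100-80G", 64, 20.0, 5120.0),
]

rates = {name: cost_per_gpu_hour(cost, n, h) for name, n, h, cost in runs}

for name, rate in rates.items():
    print(f"{name}: about ${rate:.2f} per GPU-hour")
```

All three runs land at roughly the same per-GPU-hour rate, which suggests the quoted costs scale with GPU-hours and do not reflect separate pricing tiers.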

This review was generated by AI and checked by a human editor.