QUICK REVIEW

[Paper Review] RLHFless: Serverless Computing for Efficient RLHF

Rui Wei, Hanfei Yu | arXiv (Cornell University) | 2026-02-26

Explainable Artificial Intelligence (XAI) | Citations: 0

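One-Line Summary

RLHFless is the first scalable training framework for synchronous RLHF built on serverless computing; it reduces idle time and cost through dynamic resource adaptation, pre-computation of shared prefixes, and cost-aware actor scaling.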

ABSTRACT

Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LLM) post-training to align model outputs with human preferences. Recent models, such as DeepSeek-R1, have also shown RLHF's potential to improve LLM reasoning on complex tasks. In RL, inference and training co-exist, creating dynamic resource demands throughout the workflow. Compared to traditional RL, RLHF further challenges training efficiency due to expanding model sizes and resource consumption. Several RLHF frameworks aim to balance flexible abstraction and efficient execution. However, they rely on serverful infrastructures, which struggle with fine-grained resource variability. As a result, during synchronous RLHF training, idle time between or within RL components often causes overhead and resource wastage. To address these issues, we present RLHFless, the first scalable training framework for synchronous RLHF, built on serverless computing environments. RLHFless adapts to dynamic resource demands throughout the RLHF pipeline, pre-computes shared prefixes to avoid repeated computation, and uses a cost-aware actor scaling strategy that accounts for response length variation to find sweet spots with lower cost and higher speed. In addition, RLHFless assigns workloads efficiently to reduce intra-function imbalance and idle time. Experiments on both physical testbeds and a large-scale simulated cluster show that RLHFless achieves up to 1.35x speedup and 44.8% cost reduction compared to the state-of-the-art baseline.

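Research Motivation and Goals

Improve RLHF training efficiency as model sizes and resource demands grow.
Address the inefficiency that idle time and resource variability cause on serverful RLHF infrastructure.
Introduce a scalable serverless framework that adapts to dynamic RLHF workloads.
Minimize idle time and imbalance by reducing redundant computation and balancing workloads.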

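Proposed Method

Adapt the RLHF pipeline to a serverless environment to handle dynamic resource demands.
Pre-compute shared prefixes to avoid repeated computation in the RLHF workflow.
Apply a cost-aware actor scaling strategy that accounts for variation in response length.
Assign workloads efficiently to reduce intra-function imbalance and idle time.
Evaluate RLHFless on physical testbeds and a large-scale simulated cluster.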
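
To make two of the ideas above concrete, here is a minimal sketch. It is not the RLHFless implementation, which this review does not describe in detail. It assumes that the "shared prefix" is the prompt reused across several sampled responses, and it substitutes a generic greedy longest-processing-time-first heuristic for the paper's workload assignment. The names `prefill_tokens`, `assign_lpt`, and `worker_loads` are hypothetical, and the response lengths are made-up illustration values.

```python
import heapq
from typing import Dict, List, Tuple


def prefill_tokens(prompt_len: int, num_samples: int, share_prefix: bool) -> int:
    """Prompt tokens processed when sampling num_samples responses for one prompt."""
    # Assumption: with sharing, the prompt is processed once and reused by every sample.
    return prompt_len if share_prefix else prompt_len * num_samples


def assign_lpt(lengths: List[int], num_workers: int) -> Dict[int, List[int]]:
    """Greedy longest-first assignment of responses (by length) to workers."""
    # Min-heap of (current load, worker id); the least-loaded worker is popped first.
    heap: List[Tuple[int, int]] = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    plan: Dict[int, List[int]] = {w: [] for w in range(num_workers)}
    # Place long responses first so that short ones can even out the loads later.
    for idx in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        load, worker = heapq.heappop(heap)
        plan[worker].append(idx)
        heapq.heappush(heap, (load + lengths[idx], worker))
    return plan


def worker_loads(lengths: List[int], plan: Dict[int, List[int]]) -> Dict[int, int]:
    """Total assigned length per worker; a smaller spread means less idle time."""
    return {w: sum(lengths[i] for i in idxs) for w, idxs in plan.items()}


if __name__ == "__main__":
    # Hypothetical numbers: a 512-token prompt sampled 8 times.
    print(prefill_tokens(512, 8, share_prefix=False))
    print(prefill_tokens(512, 8, share_prefix=True))

    # Hypothetical predicted response lengths, split across 2 workers.
    lengths = [900, 100, 400, 300, 300]
    plan = assign_lpt(lengths, num_workers=2)
    print(plan, worker_loads(lengths, plan))
```

In a synchronous step every worker waits for the slowest one, so evening out the per-worker totals is one simple way to cut idle time. A real system would have to predict response lengths before generation, which this sketch takes as given.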

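Experimental Results

Research Questions

RQ1: How can serverless computing be used to run synchronous RLHF training efficiently?
RQ2: Which techniques (e.g., pre-computation, cost-aware scaling) reduce idle time and cost?
RQ3: How do the speed and cost of RLHFless compare with the state-of-the-art baseline under varying resource conditions?
RQ4: Which workload management strategies minimize intra-function imbalance in the RLHF pipeline?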

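Key Results

RLHFless achieves up to a 1.35x speedup over the state-of-the-art baseline.
RLHFless reduces cost by up to 44.8% compared to the same baseline.
The framework adapts to dynamic resource demands and reduces idle time through pre-computation and workload balancing.
Experiments on physical testbeds and a large-scale simulated cluster confirm the efficiency gains.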
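
As a reading aid for the two headline numbers, the short sketch below converts a speedup factor and a percentage cost reduction into fractions of the baseline. The helper names are hypothetical. Both figures are reported as "up to" values, so they are best-case numbers and need not come from the same experiment.

```python
def time_fraction(speedup: float) -> float:
    """Fraction of baseline wall-clock time that remains after a given speedup."""
    return 1.0 / speedup


def cost_fraction(reduction_pct: float) -> float:
    """Fraction of baseline cost that remains after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


if __name__ == "__main__":
    # A 1.35x speedup leaves roughly 74% of the baseline time (about 26% saved).
    print(f"time remaining: {time_fraction(1.35):.1%}")
    # A 44.8% reduction leaves 55.2% of the baseline cost.
    print(f"cost remaining: {cost_fraction(44.8):.1%}")
```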

This review was generated by AI and reviewed by a human editor.