QUICK REVIEW

[논문 리뷰] Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference

Baolin Li, Yankai Jiang|arXiv (Cornell University)|2024. 03. 19.

Topic Modeling인용 수 5

한 줄 요약

Sprout는 생성 지시문과 선형 계획 최적화를 도입하여 LLM 추론에서 생성 길이를 안내하고 품질을 보존하면서 탄소 배출을 줄이며, Llama2 13B를 사용한 실제 테스트에서 40%가 넘는 배출 감소를 달성한다.

ABSTRACT

The rapid advancement of Generative Artificial Intelligence (GenAI) across diverse sectors raises significant environmental concerns, notably the carbon emissions from their cloud and high performance computing (HPC) infrastructure. This paper presents Sprout, an innovative framework designed to address these concerns by reducing the carbon footprint of generative Large Language Model (LLM) inference services. Sprout leverages the innovative concept of "generation directives" to guide the autoregressive generation process, thereby enhancing carbon efficiency. Our proposed method meticulously balances the need for ecological sustainability with the demand for high-quality generation outcomes. Employing a directive optimizer for the strategic assignment of generation directives to user prompts and an original offline quality evaluator, Sprout demonstrates a significant reduction in carbon emissions by over 40% in real-world evaluations using the Llama2 LLM and global electricity grid data. This research marks a critical step toward aligning AI technology with sustainable practices, highlighting the potential for mitigating environmental impacts in the rapidly expanding domain of generative artificial intelligence.

연구 동기 및 목표

GenAI 추론의 환경 영향에 대한 동기 부여 및 정량화와 모델 크기 감소를 넘어선 탄소 배출 감소의 기회 식별.
생성 지시문을 토큰 생성 및 배출에 영향을 주는 새로운 제어 메커니즘으로 도입.
다양한 그리드 탄소 강도 하에서 탄소 절감과 유지된 생성 품질 사이의 균형을 맞추는 시스템 전반의 최적화기 개발.
Llama2 13B 및 실제 전력망 데이터를 사용하여 탄소 감소를 시연하고 출력 품질을 유지하는 프로토타입 Sprout를 구현.

제안 방법

자 autoregressive LLM 추론 중 토큰 생성을 제약하거나 안내하는 생성 지시문 수준을 정의합니다.
프롬프트 전반에 걸친 지시문 수준 확률을 선택하여 추론당 예상 탄소 배출을 최소화하는 선형 계획 최적화 문제를 형식화합니다.
오프라인 품질 평가기를 자동 평가 LLM으로 사용하여 품질 선호 벡터를 생성하고 지시문 사용을 제약하는 방법을 포함합니다.
저탄소 강도 기간에 평가를 촉발하는 오프라인 품질 평가 일정으로 추가 배출을 최소화하는 기회를 포착합니다.
시스템 프롬프트를 통한 지시문 및 CarbonTracker를 통한 로깅으로 Sprout를 기존 추론 서버에 통합하고 최적화기의 e와 p 벡터를 계산합니다.

Figure 1: The auto-regressive generation process of generative language model inference.

실험 결과

연구 질문

RQ1생성된 토큰 수가 모델 크기와 무관하게 LLM 추론의 탄소 발자국에 어떤 영향을 미치는가?
RQ2생성 지시문이 토큰 생성을 유도하여 배출을 줄이되 다양한 작업에서 생성 품질에 심각한 악화를 초래하지 않는가?
RQ3시스템 전반의 확률적 지시문 정책 접근이 프롬프트별 최적 지시문을 근접하게 근사하면서도 고처리량 환경에서 실용적인가?
RQ4그리드 탄소 강도에 적응하는 탄소 인식 최적화가 배출 제약 하에서 품질 유지를 얼마나 효과적으로 달성하는가?

주요 결과

생성 지시문은 고품질 출력을 유지하면서 토큰 생성 길이를 단축해 탄소 절감을 가능하게 한다.
Llama2 13B 모델에 간결한 지시문(L1)을 사용하면 기저선이 있는 7B보다 탄소 효율성 및 정확도 측면에서 더 나은 성능을 보일 수 있다.
Sprout는 Llama2 13B 및 전 세계 전력망 데이터를 사용한 실제 평가에서 추론 배출을 40% 이상 감소시킨다.
최적화 문제는 선형이며 HiGHS 이중 단순법 솔버를 사용해 시스템 전반의 지시문 확률을 계산할 수 있다.
품질 피드백은 오프라인 자동 평가 LLM을 통해 얻으며, 온라인 추론에 지연 없이 제약 기반 최적화를 가능하게 한다.

Figure 2: Two factors that impact a request’s carbon footprint during LLM inference: (a) the number of model parameters and (b) the number of generated tokens.

더 나은 연구,지금 바로 시작하세요

연구 설계부터 논문 작성까지, 연구 시간을 획기적으로 줄여보세요.

카드 등록 없음 · 무료 플랜 제공

이 리뷰는 AI가 만들고, 인간 에디터가 검토했습니다.