QUICK REVIEW

[Paper Review] SeedFlood: A Step Toward Scalable Decentralized Training of LLMs

Jihun Kim, Namhoon Lee|arXiv (Cornell University)|2026. 02. 20.

Software-Defined Networks and 5G | Citations: 0

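One-Line Summary

SeedFlood floods seed-reconstructible zeroth-order updates across the network, achieving topology-invariant, all-gather-style consensus with near-zero communication, and adds Sub CGE to aggregate many updates efficiently, enabling scalable decentralized LLM fine-tuning.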

ABSTRACT

This work presents SeedFlood, a new approach to decentralized training designed to scale for large models across complex network topologies and achieve global consensus with minimal communication overhead. Traditional gossip-based methods suffer from message communication costs that grow with model size, while information decay over network hops renders global consensus inefficient. SeedFlood departs from these practices by exploiting the seed-reconstructible structure of zeroth-order updates and effectively making the messages near-zero in size, allowing them to be flooded to every client in the network. This mechanism makes communication overhead negligible and independent of model size, removing the primary scalability bottleneck in decentralized training. Consequently, SeedFlood enables training in regimes previously considered impractical, such as billion-parameter models distributed across hundreds of clients. Our experiments on decentralized LLM fine-tuning demonstrate that SeedFlood consistently outperforms gossip-based baselines in both generalization performance and communication efficiency, and even achieves results comparable to first-order methods in large-scale settings.

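Motivation and Objectives

Establish the need for decentralized training that scales with both model size and network topology.
Remove the main communication bottleneck by using seed-reconstructible updates whose size does not depend on the model dimension.
Develop a computationally efficient aggregation mechanism that can process many updates per iteration.
Demonstrate empirical scalability on large models and networks while maintaining competitive performance.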

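Proposed Method

Represents each zeroth-order update as a seed-scalar pair and uses a shared RNG to reconstruct the perturbation from the seed.
Propagates each zeroth-order update across the entire network by flooding instead of gossip.
Introduces Subspace Canonical-basis Gradient Estimation (Sub CGE) to aggregate many updates efficiently in a low-rank subspace.
Uses layer-wise global low-rank subspaces (U, V) so that the updates can be applied with O(n + rd) computation.
Provides an algorithm outline (SeedFlood) that periodically reinitializes the subspaces and floods updates for a number of steps equal to the network diameter.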
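
The seed-scalar idea in the first item can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a Gaussian perturbation, a two-point (SPSA-style) loss difference as the scalar, and plain gradient descent as the update. The names `perturbation`, `make_message`, `apply_message`, and `quadratic_loss` are hypothetical.

```python
import numpy as np

def perturbation(seed: int, dim: int) -> np.ndarray:
    """Regenerate the perturbation identified by `seed` (shared RNG convention)."""
    return np.random.default_rng(seed).standard_normal(dim)

def make_message(loss_fn, params, seed, eps=1e-3):
    """Two-point zeroth-order estimate; only (seed, scalar) leaves the client."""
    z = perturbation(seed, params.size)
    scalar = (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)
    return seed, float(scalar)

def apply_message(params, message, lr=0.1):
    """Rebuild the perturbation from the seed and take one descent step."""
    seed, scalar = message
    return params - lr * scalar * perturbation(seed, params.size)

def quadratic_loss(w):
    # Toy objective, used only to make the sketch runnable.
    return 0.5 * float(w @ w)

sender = np.ones(8)
receiver = sender.copy()

msg = make_message(quadratic_loss, sender, seed=1234)
sender_next = apply_message(sender, msg)
receiver_next = apply_message(receiver, msg)  # same result, no vector was sent
```

The message is two numbers regardless of how many parameters the model has, which is the property that lets such updates be flooded cheaply.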

Figure 1: Task performance vs. total communication cost for different decentralized training methods. SeedFlood ($\bigstar$) is extremely efficient, using $10^{2}$ to $10^{7}\times$ fewer communication bytes, while maintaining a reasonable performance level compared to its rivals and strong-but

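Experimental Results

Research Questions

RQ1: How can decentralized training scale to billion-parameter models with a communication cost that does not depend on model size?
RQ2: Can seed-reconstructible zeroth-order updates, spread by flooding instead of gossip, enable full consensus on arbitrary network topologies?
RQ3: What computational technique is needed to aggregate a large number of distinct updates efficiently?
RQ4: How does SeedFlood perform in terms of generalization and communication efficiency in practical large-scale LLM fine-tuning?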

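Key Findings

By flooding seed-based updates, SeedFlood achieves near-zero communication cost that is independent of model size.
Flooding provides topology-invariant, all-gather-like consensus, mitigating the distance-dependent degradation of consensus.
Sub CGE reduces the aggregation cost from O(nd) to O(n + rd), making it feasible to scale to many updates.
In experiments, SeedFlood outperforms gossip-based baselines in generalization and communication efficiency, and performs comparably to first-order methods at large scale.
In experiments with OPT models on 16 to 128 clients, SeedFlood is robust to topology changes and scales better than first-order gossip baselines.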
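
The O(nd) versus O(n + rd) cost argument can be illustrated with a small sketch. It is one plausible reading only and may differ from the actual Sub CGE construction: it assumes n is the number of received updates, r the subspace rank, and d the size of one layer, and that each update selects a single canonical direction (i, j) in the r x r coefficient space spanned by the shared bases U and V. All shapes, names, and the random messages are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, r, n = 64, 48, 4, 200   # hypothetical layer shape, rank, number of updates

# Shared low-rank bases, standing in for the layer-wise (U, V).
U = rng.standard_normal((m, r))
V = rng.standard_normal((k, r))

# Hypothetical messages: one canonical direction (i, j) and one scalar each.
rows = rng.integers(0, r, size=n)
cols = rng.integers(0, r, size=n)
scalars = rng.standard_normal(n)

def aggregate_dense(U, V, rows, cols, scalars):
    """Naive route: materialise every full-size perturbation, O(n * d) work."""
    total = np.zeros((U.shape[0], V.shape[0]))
    for i, j, s in zip(rows, cols, scalars):
        total += s * np.outer(U[:, i], V[:, j])
    return total

def aggregate_subspace(U, V, rows, cols, scalars):
    """Accumulate n scalars in an r x r matrix, then project to full size once."""
    coeff = np.zeros((U.shape[1], V.shape[1]))
    np.add.at(coeff, (rows, cols), scalars)   # O(n), handles repeated indices
    return U @ coeff @ V.T                    # one projection, roughly O(r * d)

dense = aggregate_dense(U, V, rows, cols, scalars)
cheap = aggregate_subspace(U, V, rows, cols, scalars)
```

Both routes produce the same aggregated matrix; the second touches full-size tensors only once, however many updates arrive.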

Figure 2: Consensus dynamics of a single gradient under gossip-based model averaging (a) and flooding-based gradient dissemination (b). In gossip, time-varying gradient coefficients induce prohibitive aggregation cost. In contrast, flooding propagates each gradient with a fixed coefficient, without
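
The contrast in this caption can be reproduced on a toy ring network. The sketch below is an illustration under assumptions, not the paper's experiment: it uses an 8-node ring, uniform 1/3 gossip weights over each node and its two neighbours, and a simplified flooding loop that resends everything a node knows each round (a practical version would forward only newly seen messages). `ring_neighbors`, `flood`, and `gossip_coefficients` are hypothetical names.

```python
import numpy as np

def ring_neighbors(n):
    """Adjacency list of an n-node ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def flood(neighbors, rounds):
    """Each round, every node passes all message ids it knows to its neighbours."""
    known = {i: {i} for i in neighbors}
    for _ in range(rounds):
        incoming = {i: set() for i in neighbors}
        for i, nbrs in neighbors.items():
            for j in nbrs:
                incoming[j] |= known[i]
        for i in neighbors:
            known[i] |= incoming[i]
    return known

def gossip_coefficients(n, steps):
    """Weight of node 0's contribution at each node after `steps` averaging rounds."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return np.linalg.matrix_power(W, steps)[:, 0]

n = 8
diameter = n // 2   # diameter of an 8-node ring

known = flood(ring_neighbors(n), rounds=diameter)
coeffs = gossip_coefficients(n, steps=diameter)
```

After `diameter` rounds every node holds every message, so each update can be applied with the same fixed weight everywhere. Under gossip, the weight of node 0's contribution still differs across nodes and changes with the step count.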


This review was generated by AI and reviewed by a human editor.