QUICK REVIEW

[Paper Review] SeedFlood: A Step Toward Scalable Decentralized Training of LLMs

Jihun Kim, Namhoon Lee | arXiv (Cornell University) | Feb 20, 2026

Topic: Software-Defined Networks and 5G | Cited by: 0

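One-Sentence Summary

SeedFlood floods seed-reconstructible zeroth-order updates across the network to achieve topology-invariant, all-gather-style consensus at near-zero communication cost, and pairs this with Sub CGE to aggregate large numbers of updates efficiently, enabling scalable decentralized fine-tuning of large language models.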

ABSTRACT

This work presents a new approach to decentralized training, SeedFlood, designed to scale for large models across complex network topologies and achieve global consensus with minimal communication overhead. Traditional gossip-based methods suffer from message communication costs that grow with model size, while information decay over network hops renders global consensus inefficient. SeedFlood departs from these practices by exploiting the seed-reconstructible structure of zeroth-order updates and effectively making the messages near-zero in size, allowing them to be flooded to every client in the network. This mechanism makes communication overhead negligible and independent of model size, removing the primary scalability bottleneck in decentralized training. Consequently, SeedFlood enables training in regimes previously considered impractical, such as billion-parameter models distributed across hundreds of clients. Our experiments on decentralized LLM fine-tuning demonstrate that SeedFlood consistently outperforms gossip-based baselines in both generalization performance and communication efficiency, and even achieves results comparable to first-order methods in large-scale settings.

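Motivation and Goals

Advance decentralized training that scales with both model size and network topology.
Eliminate the main communication bottleneck by using seed-reconstructible updates whose size is independent of model dimension.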
Develop a computation-efficient aggregation mechanism to handle many updates per iteration.
Demonstrate empirical scalability to large models and networks while maintaining competitive performance.

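Proposed Method

Represent each zeroth-order update as a seed-scalar pair and reconstruct the perturbation with a shared RNG.
Replace gossip with flooding so that every zeroth-order update is propagated globally across the network.
Introduce Subspace Canonical-basis Gradient Estimation (Sub CGE) to aggregate many updates efficiently in a low-rank subspace.
Use layer-wise global low-rank subspaces (U, V) to obtain rank-1-style updates at a cost of O(n + rd) per iteration.
Provide an algorithm outline (SeedFlood) that periodically re-initializes the subspace and floods each update within a number of steps equal to the network diameter.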
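
The seed-scalar representation in the first item can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a two-point (central-difference) zeroth-order estimator, uses NumPy's `default_rng` as the shared RNG, and runs on a toy quadratic loss. The names `perturbation`, `make_update`, and `apply_update` are hypothetical.

```python
import numpy as np

def perturbation(seed: int, dim: int) -> np.ndarray:
    """Regenerate the random direction z from a seed (the shared-RNG convention)."""
    return np.random.default_rng(seed).standard_normal(dim)

def make_update(loss_fn, params: np.ndarray, seed: int, eps: float = 1e-3):
    """Sender side: two-point zeroth-order estimate, returned as a (seed, scalar) message."""
    z = perturbation(seed, params.size)
    scalar = (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)
    return seed, float(scalar)

def apply_update(params: np.ndarray, message, lr: float = 0.01) -> np.ndarray:
    """Receiver side: rebuild z from the seed and take an SGD-style step."""
    seed, scalar = message
    return params - lr * scalar * perturbation(seed, params.size)

# Toy quadratic loss, assumed purely for illustration.
target = np.arange(5, dtype=float)
loss = lambda w: 0.5 * float(np.sum((w - target) ** 2))

sender = np.zeros(5)
receiver = np.zeros(5)

msg = make_update(loss, sender, seed=42)   # the only data that crosses the network
sender = apply_update(sender, msg)
receiver = apply_update(receiver, msg)
```

The message holds one integer and one float regardless of `params.size`, which is why the communication cost does not depend on model dimension. Both sides end up with identical parameters because they regenerate the same direction from the same seed.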

Figure 1: Task performance vs. total communication cost for different decentralized training methods. SeedFlood ($\bigstar$) is extremely efficient, using $10^{2}$ to $10^{7}\times$ fewer communication bytes, while maintaining a reasonable performance level compared to its rivals and strong-but

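Experimental Results

Research Questions

RQ1: How can decentralized training scale to billion-parameter models with a communication cost that is independent of model size?
RQ2: Can seed-reconstructible zeroth-order updates, disseminated by flooding rather than gossip, achieve perfect consensus across arbitrary network topologies?
RQ3: What computational techniques are needed to aggregate a large number of seed-based updates efficiently?
RQ4: How does SeedFlood perform in practice, in terms of generalization and communication efficiency, on large-scale LLM fine-tuning?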

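Key Findings

SeedFlood achieves near-zero communication cost, independent of model size, by flooding seed-based updates.
Flooding provides topology-invariant, all-gather-style consensus and mitigates distance-dependent consensus degradation.
Sub CGE reduces aggregation overhead from O(nd) to O(n + rd), making it feasible to process large numbers of updates.
Empirically, SeedFlood outperforms gossip-based baselines in generalization and communication efficiency, and approaches the performance of first-order methods in large-scale settings.
In experiments with OPT models on 16 to 128 clients, SeedFlood is robust to topology changes and scales better than first-order gossip baselines.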
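
The O(nd) versus O(n + rd) gap can be illustrated with a small sketch. It rests on an assumption not confirmed by this review: that each update is a scalar attached to one canonical coordinate (i, j) of an r x r coefficient matrix, so that in parameter space it is the rank-1 matrix formed from column i of U and column j of V. The shapes, the random bases, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, r, n = 64, 48, 4, 1000   # layer shape (m x k), subspace rank, number of updates

# Shared low-rank bases; assumed here to be known to every client.
U = rng.standard_normal((m, r))
V = rng.standard_normal((k, r))

# Each update: a coordinate (i, j) in the r x r coefficient space plus a scalar.
rows = rng.integers(0, r, size=n)
cols = rng.integers(0, r, size=n)
scalars = rng.standard_normal(n)

def aggregate_naive(U, V, rows, cols, scalars):
    """Materialize every rank-1 update in parameter space: O(n * m * k) work."""
    total = np.zeros((U.shape[0], V.shape[0]))
    for i, j, g in zip(rows, cols, scalars):
        total += g * np.outer(U[:, i], V[:, j])
    return total

def aggregate_subspace(U, V, rows, cols, scalars):
    """Accumulate scalars in the r x r coefficient matrix (O(n)), then project once."""
    coeff = np.zeros((U.shape[1], V.shape[1]))
    np.add.at(coeff, (rows, cols), scalars)   # unbuffered add handles repeated (i, j) pairs
    return U @ coeff @ V.T                    # single projection, roughly O(r * m * k)

naive = aggregate_naive(U, V, rows, cols, scalars)
fast = aggregate_subspace(U, V, rows, cols, scalars)
```

Both functions produce the same aggregated update up to floating-point error. The second one touches the full m x k parameter matrix once instead of n times, which is where the savings come from when n is large.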

Figure 2: Consensus dynamics of a single gradient under gossip-based model averaging (a) and flooding-based gradient dissemination (b). In gossip, time-varying gradient coefficients induce prohibitive aggregation cost. In contrast, flooding propagates each gradient with a fixed coefficient, without
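
The contrast between the two dissemination schemes can be simulated on a toy network. The sketch below assumes a ring topology, synchronous rounds, and uniform neighbor averaging as the gossip rule; message ids stand in for the actual (seed, scalar) pairs. None of these choices are taken from the paper.

```python
import numpy as np

def ring_neighbors(n):
    """Adjacency of a ring with n nodes."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def flood(neighbors, rounds):
    """Each round, every node forwards the messages it learned last round to all neighbors."""
    known = {i: {i} for i in neighbors}   # message id == id of the originating node
    fresh = {i: {i} for i in neighbors}
    for _ in range(rounds):
        incoming = {i: set() for i in neighbors}
        for i, nbrs in neighbors.items():
            for j in nbrs:
                incoming[j] |= fresh[i]
        fresh = {i: incoming[i] - known[i] for i in neighbors}
        for i in neighbors:
            known[i] |= fresh[i]
    return known

def gossip(neighbors, values, rounds):
    """Each round, every node replaces its value with the mean over itself and its neighbors."""
    x = np.array(values, dtype=float)
    for _ in range(rounds):
        x = np.array([(x[i] + sum(x[j] for j in nbrs)) / (1 + len(nbrs))
                      for i, nbrs in neighbors.items()])
    return x

n = 8
nbrs = ring_neighbors(n)
diameter = n // 2                         # diameter of a ring with n nodes

known = flood(nbrs, diameter)             # every node holds every message
x = gossip(nbrs, list(range(n)), diameter)  # values are still spread around the mean
```

After a number of rounds equal to the diameter, flooding has delivered every message to every node unchanged, so each node can apply all updates with the same weight. Gossip preserves the mean but has only partially mixed the values in the same number of rounds.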


This review was generated by AI and reviewed by a human editor.