QUICK REVIEW

[Paper Review] A Fundamental Tradeoff between Computation and Communication in Distributed Computing

Songze Li, Mohammad Ali Maddah-Ali|arXiv (Cornell University)|Apr 24, 2016

Stochastic Gradient Optimization Techniques | References: 42 | Cited by: 29

One-Sentence Summary

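This paper introduces Coded Distributed Computing (CDC) and uses it to establish a fundamental tradeoff between computation and communication in distributed computing. By adding redundancy to the Map computations, CDC uses coding to cut the communication load to 1/r of its original value, which exactly matches the information-theoretic lower bound and therefore precisely characterizes the optimal tradeoff.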

ABSTRACT

How can we optimally trade extra computing power to reduce the communication load in distributed computing? We answer this question by characterizing a fundamental tradeoff between computation and communication in distributed computing, i.e., the two are inversely proportional to each other. More specifically, a general distributed computing framework, motivated by commonly used structures like MapReduce, is considered, where the overall computation is decomposed into computing a set of "Map" and "Reduce" functions distributedly across multiple computing nodes. A coded scheme, named "Coded Distributed Computing" (CDC), is proposed to demonstrate that increasing the computation load of the Map functions by a factor of $r$ (i.e., evaluating each function at $r$ carefully chosen nodes) can create novel coding opportunities that reduce the communication load by the same factor. An information-theoretic lower bound on the communication load is also provided, which matches the communication load achieved by the CDC scheme. As a result, the optimal computation-communication tradeoff in distributed computing is exactly characterized. Finally, the coding techniques of CDC are applied to the Hadoop TeraSort benchmark to develop a novel CodedTeraSort algorithm, which is empirically demonstrated to speed up the overall job execution by $1.97\times$ - $3.39\times$, for typical settings of interest.

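Research Motivation and Objectives

Address the high communication overhead of the data shuffling phase in distributed computing, particularly in frameworks such as MapReduce and Spark.
Investigate whether coding can reduce the communication load without increasing network bandwidth.
Characterize the fundamental tradeoff between the computation load (Map phase) and the communication load (Shuffle phase) in distributed systems.
Design a coded scheme that achieves the optimal balance between computation and communication.
Validate the practical benefit of the proposed scheme by implementing it on real-world benchmarks such as Hadoop TeraSort.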

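Proposed Method

Propose a Coded Distributed Computing (CDC) framework that replicates each Map task on r nodes, increasing the Map computation load by a factor of r.
Design a coded shuffling strategy that exploits the redundant intermediate values to enable multicasting, cutting the communication load to 1/r of its original value.
Create coding opportunities across nodes through a structured placement of input files and intermediate data.
Express the communication load as a function of r and derive an information-theoretic lower bound on the minimum achievable load.
Apply the CDC scheme to the Hadoop TeraSort benchmark to design the CodedTeraSort algorithm, which uses coding to achieve faster execution.
Show that coded shuffling achieves near-optimal performance even under random data placement (e.g., HDFS-style replication), supporting the robustness of the scheme.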
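The coded shuffling idea can be illustrated with a small sketch. It assumes K = 3 nodes, 6 input files, and r = 2 (each file is mapped on two nodes), with node k responsible for Reduce function k. The `intermediate` function, the one-byte values, and the hand-picked `messages` table are hypothetical stand-ins chosen for illustration, not the paper's implementation. Each node multicasts one XOR of two intermediate values, and each receiver cancels the value it already computed locally to recover the one it is missing.

```python
K = 3  # number of nodes; node k also computes Reduce function k

# Structured placement with r = 2: every file is stored on exactly two nodes.
placement = {1: {1, 2, 3, 4}, 2: {3, 4, 5, 6}, 3: {5, 6, 1, 2}}
all_files = set().union(*placement.values())


def intermediate(q: int, n: int) -> int:
    """Hypothetical Map output for Reduce function q on file n (one byte)."""
    return (17 * q + 31 * n) % 256


# Map phase: each node evaluates every Reduce function's Map output on its local files.
known = {
    k: {(q, n): intermediate(q, n) for q in range(1, K + 1) for n in files}
    for k, files in placement.items()
}

# Coded multicast plan (assumed, hand-built for this toy case):
# sender -> the two (q, n) intermediate values it XORs together.
messages = {
    1: ((2, 1), (3, 3)),
    2: ((1, 5), (3, 4)),
    3: ((1, 6), (2, 2)),
}


def shuffle() -> dict:
    """Run the coded shuffle and return what each node knows afterwards."""
    received = {k: dict(known[k]) for k in placement}
    for sender, (a, b) in messages.items():
        coded = known[sender][a] ^ known[sender][b]
        for receiver in placement:
            if receiver == sender:
                continue
            # The receiver already computed one of the two values; XOR removes it.
            have, want = (a, b) if a in known[receiver] else (b, a)
            received[receiver][want] = coded ^ known[receiver][have]
    return received


received = shuffle()
# Without coding, each node would have to be sent every missing value separately.
uncoded_transmissions = sum(len(all_files - files) for files in placement.values())
coded_transmissions = len(messages)
print(uncoded_transmissions, coded_transmissions)
```

In this toy case, three coded multicasts replace six unicast transmissions, which is the factor-of-r reduction (r = 2) described above.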

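Experimental Results

Research Questions

RQ1: Can coding reduce the communication load in distributed computing by exploiting computation redundancy?
RQ2: What is the fundamental tradeoff between computation load and communication load in distributed systems?
RQ3: Is there a coded scheme that achieves the information-theoretic lower bound on the communication load for a given computation load?
RQ4: Can the proposed CDC scheme be applied effectively to real workloads such as TeraSort?
RQ5: Can the data redundancy already present in existing storage systems (e.g., HDFS) enable practical coded shuffling without explicit control over data placement?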

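Key Findings

When the Map computation load is increased by a factor of r, the CDC scheme cuts the communication load to 1/r of its original value, exactly matching the information-theoretic lower bound.
The optimal computation-communication tradeoff is exactly characterized, proving that the proposed scheme is information-theoretically optimal.
The CDC-based CodedTeraSort algorithm speeds up job execution by 1.97x to 3.39x in typical Hadoop cluster settings.
Even under random data placement (e.g., HDFS-style replication), coded shuffling achieves a communication load close to that of the optimal CDC design.
The framework extends to hierarchical network topologies and edge/fog computing environments, where coding can effectively reduce bandwidth usage and latency.
The results suggest that coding can be a transformative tool in distributed and edge computing, enabling scalable, low-latency computation.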
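The tradeoff in the first two findings can be made concrete with a short calculation. The sketch below uses the closed form from the paper for integer computation loads r in {1, ..., K}: the uncoded shuffle has normalized load 1 - r/K, and CDC achieves (1/r)(1 - r/K), where the load is normalized by the total number of intermediate values. The function names and the choice K = 10 are illustrative assumptions.

```python
from fractions import Fraction


def uncoded_load(K: int, r: int) -> Fraction:
    """Normalized shuffle load without coding when each file is mapped on r of K nodes."""
    return 1 - Fraction(r, K)


def cdc_load(K: int, r: int) -> Fraction:
    """Normalized shuffle load achieved by CDC (valid for integer 1 <= r <= K)."""
    return Fraction(1, r) * (1 - Fraction(r, K))


if __name__ == "__main__":
    K = 10  # assumed cluster size, for illustration only
    for r in range(1, K + 1):
        print(r, uncoded_load(K, r), cdc_load(K, r))
```

For any r below K, the ratio of the two loads is exactly r, which is the inverse-proportional relationship between computation and communication stated in the abstract.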


This review was generated by AI and reviewed by a human editor.