QUICK REVIEW

[Paper Review] Occupy the Cloud: Distributed Computing for the 99%

Eric Jonas, Qifan Pu|arXiv (Cornell University)|Feb 13, 2017

Cloud Computing and Resource Management | References: 29 | Cited by: 57

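One-Sentence Summary

This paper proposes a serverless, stateless-function model for distributed computing, prototyped as PyWren, that simplifies elasticity and deployment. It shows that, with remote storage holding all state, the model can implement several paradigms such as MapReduce and BSP.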

ABSTRACT

Distributed computing remains inaccessible to a large number of users, in spite of many open source platforms and extensive commercial offerings. While distributed computation frameworks have moved beyond a simple map-reduce model, many users are still left to struggle with complex cluster management and configuration tools, even for running simple embarrassingly parallel jobs. We argue that stateless functions represent a viable platform for these users, eliminating cluster management overhead, fulfilling the promise of elasticity. Furthermore, using our prototype implementation, PyWren, we show that this model is general enough to implement a number of distributed computing models, such as BSP, efficiently. Extrapolating from recent trends in network bandwidth and the advent of disaggregated storage, we suggest that stateless functions are a natural fit for data processing in future computing environments.

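Motivation and Objectives

Make cloud computing accessible to a broader range of users by eliminating cluster management overhead.
Propose a serverless architecture built on stateless functions, with remote storage as the core abstraction for data processing.
Show, through PyWren, that several distributed models (such as MapReduce, BSP, and parameter servers) can be implemented efficiently with minimal state.
Evaluate the performance trade-offs of using only remote storage for input and output, and identify bottlenecks and open challenges.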

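Proposed Method

Introduce a serverless execution model with stateless functions and a global scheduler.
Develop PyWren, which serializes the user's Python functions and deploys them to AWS Lambda via S3, providing a map primitive.
Benchmark I/O against remote object storage (S3) and per-core compute throughput, showing scalability across thousands of functions.
Demonstrate BSP-style and MapReduce-like workloads and compare them with Spark on a dedicated cluster.
Discuss generality by building higher-level abstractions (Map+Reduce, BSP, parameter server) on top of stateless functions.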
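
The execution model described above can be illustrated with a small local simulation. This is a minimal sketch, not PyWren's implementation: an in-memory dictionary (`STORE`, hypothetical) stands in for S3, threads stand in for Lambda invocations, and the helper names (`put`, `get`, `run_stateless`, `storage_map`, `merge_counts`) are invented for illustration. Unlike PyWren, which per the summary also serializes the user function and ships it through S3, the sketch passes the function directly and serializes only the data.

```python
import pickle
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a remote object store such as S3.
STORE = {}

def put(key, value):
    """Serialize a value and write it under a key."""
    STORE[key] = pickle.dumps(value)

def get(key):
    """Read a key and deserialize its value."""
    return pickle.loads(STORE[key])

def run_stateless(func, in_key, out_key):
    """One invocation: read input from storage, compute, write output back."""
    put(out_key, func(get(in_key)))
    return out_key

def storage_map(func, items, job="job0"):
    """Map func over items; every input and output passes through STORE."""
    in_keys = [f"{job}/in/{i}" for i in range(len(items))]
    out_keys = [f"{job}/out/{i}" for i in range(len(items))]
    for key, item in zip(in_keys, items):
        put(key, item)
    # Threads stand in for independent function invocations (assumption).
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_stateless, [func] * len(items), in_keys, out_keys))

def count_words(text):
    """Map step: word frequencies for one partition."""
    return Counter(text.split())

def merge_counts(out_keys):
    """Reduce step: read every map output from storage and merge them."""
    total = Counter()
    for key in out_keys:
        total.update(get(key))
    return total

partitions = ["the cloud", "occupy the cloud", "the 99 percent"]
word_counts = merge_counts(storage_map(count_words, partitions))
```

Because workers share nothing except the store, the reduce step only needs the output keys. The same pattern extends to a BSP-style loop by treating each round of `storage_map` as one superstep.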

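Experimental Results

Research Questions

RQ1: Can stateless functions plus remote storage deliver adequate performance for embarrassingly parallel and moderately coupled workloads?
RQ2: What are the compute and I/O performance characteristics when remote storage is the only medium for state?
RQ3: To what extent can common distributed computing abstractions (MapReduce, BSP, parameter server) be implemented on a stateless-function platform?
RQ4: What are the practical system bottlenecks (startup overhead, storage throughput, scheduling), and how can they be mitigated?
RQ5: How do the cost and elasticity of the serverless approach compare with traditional clusters for large-scale analytics?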

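Key Findings

Stateless functions using remote storage achieve 30-40 MB/s of write/read throughput per core against S3, scaling to 60-80 GB/s with 2800 functions running simultaneously.
With distributed Lambda workers and Redis shards, PyWren completes a 1 TB sort benchmark, demonstrating scalable shuffle performance.
A word-count workload over 83M items runs only about 17% slower than PySpark on dedicated servers, showing competitive performance on large jobs.
Matrix multiplication inside each Lambda reaches 18 GFLOPS per core, scaling to more than 40 TFLOPS with 2800 workers.
Shuffle-heavy workloads are feasible, but storage throughput becomes the main bottleneck as data movement grows.
Parameter-server workloads (such as Hogwild!) can be implemented with stateless functions, using a centralized key-value store for coordination.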
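
As a back-of-envelope check on the scaling figures in the first and fourth findings, the per-function rates can be multiplied by the 2800 concurrent functions and compared with the reported aggregates. This is illustrative arithmetic only. It assumes decimal units (1 GB = 1000 MB), and pairing the low and high ends of each range is an assumption rather than something the summary states.

```python
workers = 2800
per_worker_mb_s = (30, 40)   # per-function S3 throughput range from the findings
per_worker_gflops = 18       # per-core matrix-multiply rate from the findings

# Ideal aggregates if every function sustained its standalone rate.
ideal_io_gb_s = tuple(workers * rate / 1000 for rate in per_worker_mb_s)
ideal_tflops = workers * per_worker_gflops / 1000

# Reported aggregate I/O range; low/high pairing is an assumption.
reported_io_gb_s = (60, 80)
io_efficiency = tuple(
    reported / ideal for reported, ideal in zip(reported_io_gb_s, ideal_io_gb_s)
)

print(ideal_io_gb_s, ideal_tflops, io_efficiency)
```

Under these assumptions the ideal I/O aggregate is 84-112 GB/s, so the reported 60-80 GB/s corresponds to roughly 71% of linear scaling. The ideal compute aggregate of about 50 TFLOPS is consistent with the reported figure of more than 40 TFLOPS.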

This review was generated by AI and reviewed by a human editor.