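[Paper Summary] Reinventing High Performance Computing: Challenges and Opportunities
This paper analyzes how cloud-scale vendors, semiconductor constraints, and end-to-end co-design are reshaping high-performance computing (HPC), and argues for a fundamental rethinking of how leading-edge systems are designed and deployed.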
The world of computing is in rapid transition, now dominated by a world of smartphones and cloud services, with profound implications for the future of advanced scientific computing. Simply put, high-performance computing (HPC) is at an important inflection point. For the last 60 years, the world's fastest supercomputers were almost exclusively produced in the United States on behalf of scientific research in the national laboratories. Change is now in the wind. While costs now stretch the limits of U.S. government funding for advanced computing, Japan and China are now leaders in the bespoke HPC systems funded by government mandates. Meanwhile, the global semiconductor shortage and political battles surrounding fabrication facilities affect everyone. However, another, perhaps even deeper, fundamental change has occurred. The major cloud vendors have invested in global networks of massive scale systems that dwarf today's HPC systems. Driven by the computing demands of AI, these cloud systems are increasingly built using custom semiconductors, reducing the financial leverage of traditional computing vendors. These cloud systems are now breaking barriers in game playing and computer vision, reshaping how we think about the nature of scientific computation. Building the next generation of leading edge HPC systems will require rethinking many fundamentals and historical approaches by embracing end-to-end co-design; custom hardware configurations and packaging; large-scale prototyping, as was common thirty years ago; and collaborative partnerships with the dominant computing ecosystem companies, smartphone, and cloud computing vendors.
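Motivation and Objectives
- Assess the technical and economic forces shaping HPC (cloud, semiconductors, geopolitics).
- Argue for a transition from traditional HPC design to end-to-end co-design and custom packaging.
- Highlight the implications for research institutions, industry, and policy in sustaining HPC leadership.
- Identify future directions and opportunities for advancing leading-edge scientific computing.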
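Proposed Approach
- Analyze history and trends from the Cray-1 era to modern clusters and supercomputers.
- Review the economic and geopolitical shifts affecting the HPC ecosystem (governments, cloud providers, vendors).
- Discuss the cloud-native software frameworks and AI-driven demands that affect the HPC software stack.
- Synthesize semiconductor trends (the end of Dennard scaling, the slowing of Moore's law) and packaging approaches (chiplets) as drivers of future HPC.
- Consider policy and national strategy as they relate to silicon fabrication and the distribution of talent.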
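Results
Research Questions
- RQ1: What are the key technical and economic forces shaping the future of high-performance computing?
- RQ2: How do the cloud services ecosystem and AI workloads affect the design, deployment, and accessibility of HPC?
- RQ3: What role do semiconductor challenges and packaging (such as chiplets) play in enabling future HPC?
- RQ4: What strategic directions and policy actions are needed to sustain leadership in advanced computing?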
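Key Findings
- Cloud-scale vendors and AI workloads are reshaping computing infrastructure and now influence HPC priorities more than traditional vendors alone do.
- With the end of Dennard scaling and the slowing of Moore's law, performance gains must rely more heavily on scale, co-design, GPUs, and custom accelerators.
- Chiplet packaging and EUV/FET innovations are emerging as practical paths to better performance and manufacturability, despite rising manufacturing costs.
- Traditional HPC vendors are economically small by comparison and increasingly depend on government investment to drive leading-edge technology development.
- Talent migration to large commercial technology companies and startups is accelerating, affecting HPC capabilities in academia and the national laboratories.
- Sustaining leadership in next-generation scientific computing requires collaboration across a broader ecosystem, beyond the traditional HPC vendors.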
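As background for the Dennard-scaling finding above, the classical scaling argument can be sketched as follows. This is a textbook-style illustration, not a derivation taken from the paper: the symbols (linear scaling factor $\kappa > 1$, switched capacitance $C$, supply voltage $V$, clock frequency $f$, die area $A$) are assumed notation, and the activity factor and leakage power are omitted for simplicity.

```latex
\begin{align*}
  P &= C V^{2} f
    && \text{(dynamic power of a circuit block)} \\
  C' &= \frac{C}{\kappa}, \quad V' = \frac{V}{\kappa}, \quad
  f' = \kappa f, \quad A' = \frac{A}{\kappa^{2}}
    && \text{(ideal Dennard scaling)} \\
  P' &= \frac{C}{\kappa} \cdot \frac{V^{2}}{\kappa^{2}} \cdot \kappa f
      = \frac{P}{\kappa^{2}}
    \;\Longrightarrow\; \frac{P'}{A'} = \frac{P}{A}
    && \text{(power density stays constant)} \\
  V' &\approx V:\quad
  P' = \frac{C}{\kappa} \cdot V^{2} \cdot \kappa f = P
    \;\Longrightarrow\; \frac{P'}{A'} = \kappa^{2}\,\frac{P}{A}
    && \text{(voltage no longer scales)}
\end{align*}
```

Under these simplifying assumptions, once supply voltage stops shrinking, raising the clock frequency at each process generation increases power density quadratically in $\kappa$. That is consistent with the finding that further performance gains have to come from scale, co-design, GPUs, and custom accelerators instead of faster general-purpose cores.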
This summary was generated by AI and reviewed by a human editor.