QUICK REVIEW

[Paper Review] Workflows Community Summit: Bringing the Scientific Workflows Community Together

Rafael Ferreira da Silva, Henri Casanova|arXiv (Cornell University)|Mar 16, 2021
Scientific Computing and Data Management | References: 12 | Cited by: 27
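One-Sentence Summary

This paper reports on the Workflows Community Summit (January 2021), summarizes its six thematic discussions, and proposes short- and long-term community efforts to advance scientific workflow management systems and the broader workflow ecosystem.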

ABSTRACT

Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs) have been developed to provide abstractions for creating and executing workflows conveniently, efficiently, and portably. While these efforts are all worthwhile, there are now hundreds of independent WMSs, many of which are moribund. As a result, the WMS landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. As a result, many teams, small and large, still elect to build their own custom workflow solution rather than adopt, or build upon, existing WMSs. This current state of the WMS landscape negatively impacts workflow users, developers, and researchers. The "Workflows Community Summit" was held online on January 13, 2021. The overarching goal of the summit was to develop a view of the state of the art and identify crucial research challenges in the workflow community. Prior to the summit, a survey sent to stakeholders in the workflow community (including both developers of WMSs and users of workflows) helped to identify key challenges in this community that were translated into 6 broad themes for the summit, each of them being the object of a focused discussion led by a volunteer member of the community. This report documents and organizes the wealth of information provided by the participants before, during, and after the summit.

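Motivation and Objectives

  • Document the landscape of scientific workflows and WMSs and its fragmentation.
  • Identify six key thematic challenges facing the workflow community.
  • Summarize the summit's structure, participants, and outputs.
  • Propose short- and long-term community efforts to address the identified challenges.
  • Outline how two NSF/DOE projects (WorkflowsRI and ExaWorks) collaborate to advance the field.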

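Proposed Approach

  • A pre-summit community survey on research infrastructure to identify needs and challenges.
  • An online summit with 48 invited participants drawn from international WMS developers and users; plenary lightning talks were followed by breakout discussions.
  • Thematic synthesis of the breakout discussions to identify challenges and proposed actions.
  • Documentation of the results, including short- and long-term community efforts for each theme.
  • Cross-project collaboration (WorkflowsRI and ExaWorks) to inform infrastructure and SDK development.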

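Results

Research Questions

  • RQ1: What are the core challenges for FAIR computational workflows in terms of lifecycle, reuse, provenance, and annotation?
  • RQ2: What training and education needs exist for workflow users, and how can they be addressed?
  • RQ3: What unique requirements and challenges do AI/ML-enabled workflows pose within scientific workflows?
  • RQ4: How do exascale computing and considerations beyond HPC affect workflow execution, resource management, and fault tolerance?
  • RQ5: How can interoperability, APIs, and standards be advanced to reduce WMS fragmentation?
  • RQ6: How can a unified workflow community be built and sustained across developers and users?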

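Key Findings

  • Identifies six themes (FAIR workflows, training/education, AI workflows, exascale challenges, APIs/interoperability/standards, building a workflows community) and their associated challenges.
  • Proposes concrete short- and long-term community efforts for each theme to address the identified challenges.
  • Documents the summit structure, including the survey, lightning talks, and breakout discussions, as well as the involvement of the NSF/DOE projects WorkflowsRI and ExaWorks.
  • Outlines the need for a shared knowledge base and community-driven guidelines to reduce fragmentation of the WMS landscape.
  • Recommends leveraging existing registries, workflow repositories, and curricula to promote FAIR practices, training, and standards.
  • Recommends creating AI workflow use cases and, eventually, benchmark mini-apps to guide HPC co-design and evaluation.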

This review was generated by AI and reviewed by human editors.