QUICK REVIEW

[Paper Review] Workflows Community Summit 2022: A Roadmap Revolution

Rafael Ferreira da Silva, Rosa M. Badía|arXiv (Cornell University)|Mar 31, 2023
Scientific Computing and Data Management · Cited by 15
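One-Sentence Summary

This technical report summarizes the 2022 Workflows Community Summit, outlining its six cross-cutting themes, the discussions held under each, and a recommended roadmap for advancing scientific workflow technologies in standards, AI, data management, HPC/quantum computing, FAIRness, and the computing continuum.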

ABSTRACT

Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, the need to process large-scale datasets produced by instruments at the edge, the intensification of near real-time data processing, support for long-term experiment campaigns, and the emergence of quantum computing as an adjunct to HPC have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge-to-cloud-to-HPC, enable the management of many small-sized files, allow data reduction while ensuring high accuracy, and orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities, among others. Further, to accelerate science, it is also necessary that these systems implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, as well as enable the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit that took place on November 29 and 30, 2022.

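Research Motivation and Goals

  • Identify and update the 2021 scientific workflows research and development roadmap in light of the new computing landscape (edge-to-cloud-to-HPC).
  • Highlight the cross-cutting themes of the 2022 summit and the concrete outcomes of their discussions.
  • Propose actionable steps and community-driven milestones to advance workflow interoperability, data management, and AI integration.

Proposed Approach

  • Organize and report on a two-day virtual summit attended by 106 participants from multiple countries, representing a broad range of workflow stakeholders.
  • Lead discussions through co-leads for each of the six themes, with every theme comprising plenary talks and breakout sessions.
  • Summarize the outcomes, challenges, and proposed solutions of the previous and current roadmaps (2021 and 2022).
  • Reference and catalog the talks and videos available on the summit website and YouTube channel.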
Figure 1: Screenshot of the 2022 edition of the Workflows Community Summit participants. (The event was held virtually via Zoom on November 29 and 30, 2022.)

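Results

Research Questions

  • RQ1: What are the key challenges and milestones for interoperability, standards, and APIs in modern scientific workflows?
  • RQ2: How should AI/ML workflows be characterized, benchmarked, and integrated with HPC environments?
  • RQ3: What are the data management and in situ workflow requirements for edge-to-cloud-to-HPC and streaming/urgent computing scenarios?
  • RQ4: What recommendations apply to FAIR workflows, the computing continuum, and cross-facility workflow execution?
  • RQ5: How can the community evolve toward a shared knowledge base and governance model (e.g., a workflows guild) to sustain collaboration?

Key Findings

  • Six cross-cutting themes were identified for 2022: specifications/APIs, AI workflows, high-performance data management and in situ workflows, HPC/quantum workflows, FAIR computational workflows, and continuum/cross-facility computing.
  • Breakout discussions produced concrete challenges, examples, and actionable recommendations for each theme.
  • The report reaffirms and revisits the milestones of the 2021 roadmap, aligning them with current workflow needs and emerging infrastructure.
  • It emphasizes developing a common terminology, standardizing where feasible, and building a community-driven knowledge base to improve interoperability.
  • The summit organized discussions around reference specifications, standards, and APIs, and evaluated approaches such as the Common Workflow Scheduler API and GA4GH-style interfaces.
  • It calls for benchmarks, including a dedicated AI workflow benchmark suite, to enable comparative studies across facilities.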
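The findings cite GA4GH-style interfaces as one approach to standardized workflow APIs. As a minimal sketch of why a shared interface helps, the Python below models the run-state vocabulary defined in the GA4GH Workflow Execution Service (WES) 1.0 specification and a client that polls any engine reporting those states until a run finishes. The `get_status` callable stands in for the WES `GET /runs/{run_id}/status` endpoint; `wait_for_run` and `fake_engine` are hypothetical helpers for illustration only, not part of the report or of any GA4GH library.

```python
from enum import Enum
from typing import Callable, Iterator


class RunState(str, Enum):
    # State names as listed in the GA4GH WES 1.0 specification.
    UNKNOWN = "UNKNOWN"
    QUEUED = "QUEUED"
    INITIALIZING = "INITIALIZING"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    COMPLETE = "COMPLETE"
    EXECUTOR_ERROR = "EXECUTOR_ERROR"
    SYSTEM_ERROR = "SYSTEM_ERROR"
    CANCELED = "CANCELED"
    CANCELING = "CANCELING"


# States after which a run will not change again.
TERMINAL = {RunState.COMPLETE, RunState.EXECUTOR_ERROR,
            RunState.SYSTEM_ERROR, RunState.CANCELED}


def wait_for_run(get_status: Callable[[str], str], run_id: str,
                 max_polls: int = 100) -> RunState:
    """Poll a WES-style status source until the run reaches a terminal state.

    Hypothetical helper: `get_status` stands in for GET /runs/{run_id}/status.
    A real client would make HTTP requests and sleep between polls.
    """
    for _ in range(max_polls):
        state = RunState(get_status(run_id))  # reject unknown state strings
        if state in TERMINAL:
            return state
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")


def fake_engine(states: list[str]) -> Callable[[str], str]:
    """Hypothetical backend that replays a fixed sequence of states."""
    it: Iterator[str] = iter(states)
    return lambda run_id: next(it)


if __name__ == "__main__":
    engine = fake_engine(["QUEUED", "INITIALIZING", "RUNNING", "COMPLETE"])
    print(wait_for_run(engine, "run-1").value)  # prints "COMPLETE"
```

Because the client depends only on the shared state names, swapping one workflow engine for another leaves the polling logic unchanged, which is the kind of horizontal integration the summit's standards discussion targets.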

This review was generated by AI and reviewed by human editors.