QUICK REVIEW

[Paper Review] A reliability- and latency-driven task allocation framework for workflow applications in the edge-hub-cloud continuum

Andreas Kouloumpris, Georgios L. Stavrinides | arXiv (Cornell University) | Feb 20, 2026
Cloud Computing and Resource Management | Cited by 0
One-Sentence Summary

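The paper proposes an exact multi-objective binary integer linear programming framework for allocating workflow tasks across edge, hub, and cloud devices, jointly optimizing reliability and latency while incorporating time redundancy and a comprehensive set of constraints.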

ABSTRACT

A growing number of critical workflow applications leverage a streamlined edge-hub-cloud architecture, which diverges from the conventional edge computing paradigm. An edge device, in collaboration with a hub device and a cloud server, often suffices for their reliable and efficient execution. However, task allocation in this streamlined architecture is challenging due to device limitations and diverse operating conditions. Given the inherent criticality of such workflow applications, where reliability and latency are vital yet conflicting objectives, an exact task allocation approach is typically required to ensure optimal solutions. As no existing method holistically addresses these issues, we propose an exact multi-objective task allocation framework to jointly optimize the overall reliability and latency of a workflow application in the specific edge-hub-cloud architecture. We present a comprehensive binary integer linear programming formulation that considers the relative importance of each objective. It incorporates time redundancy techniques, while accounting for crucial constraints often overlooked in related studies. We evaluate our approach using a relevant real-world workflow application, as well as synthetic workflows varying in structure, size, and criticality. In the real-world application, our method achieved average improvements of 84.19% in reliability and 49.81% in latency over baseline strategies, across relevant objective trade-offs. Overall, the experimental results demonstrate the effectiveness and scalability of our approach across diverse workflow applications for the considered system architecture, highlighting its practicality with runtimes averaging between 0.03 and 50.94 seconds across all examined workflows.
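To make "exact" and "relative importance of each objective" concrete, here is a minimal sketch on a toy instance. It is not the paper's BILP: it swaps in brute-force enumeration of every task-to-device assignment, which is exact but only viable for a handful of tasks. The weights play the role of w_rel and w_lat (the weights swept in Figure 3). All numbers, the function names `feasible`, `evaluate`, and `allocate`, the min-max normalization, the series-system reliability (product of task reliabilities), and the chain latency (sum of execution times, communication ignored) are illustrative assumptions.

```python
from itertools import product
from math import prod

DEVICES = ("edge", "hub", "cloud")
TASKS = ("t1", "t2", "t3")

# Hypothetical data (not from the paper): latency in seconds and
# probability of fault-free execution of each task on each device.
LATENCY = {
    "t1": {"edge": 0.8, "hub": 0.5, "cloud": 0.3},
    "t2": {"edge": 1.2, "hub": 0.7, "cloud": 0.4},
    "t3": {"edge": 0.6, "hub": 0.4, "cloud": 0.2},
}
RELIABILITY = {
    "t1": {"edge": 0.99, "hub": 0.97, "cloud": 0.93},
    "t2": {"edge": 0.98, "hub": 0.96, "cloud": 0.92},
    "t3": {"edge": 0.99, "hub": 0.98, "cloud": 0.95},
}
MEMORY_MB = {"t1": 300, "t2": 500, "t3": 200}
CAPACITY_MB = {"edge": 600, "hub": 1000, "cloud": 10_000}


def feasible(assign):
    """Memory constraint: tasks placed on a device must fit its capacity."""
    used = {d: 0 for d in DEVICES}
    for task, dev in assign.items():
        used[dev] += MEMORY_MB[task]
    return all(used[d] <= CAPACITY_MB[d] for d in DEVICES)


def evaluate(assign):
    """Assumed model: series-system reliability and sequential-chain latency."""
    rel = prod(RELIABILITY[t][d] for t, d in assign.items())
    lat = sum(LATENCY[t][d] for t, d in assign.items())
    return rel, lat


def normalize(x, lo, hi):
    """Min-max normalization to [0, 1]; 0 if all candidates are equal."""
    return 0.0 if hi == lo else (x - lo) / (hi - lo)


def allocate(w_rel, w_lat):
    """Return the feasible assignment maximizing the weighted, normalized score."""
    candidates = []
    for devices in product(DEVICES, repeat=len(TASKS)):
        assign = dict(zip(TASKS, devices))
        if feasible(assign):
            candidates.append((assign, *evaluate(assign)))
    rels = [c[1] for c in candidates]
    lats = [c[2] for c in candidates]
    lo_r, hi_r, lo_l, hi_l = min(rels), max(rels), min(lats), max(lats)

    def score(c):
        _, rel, lat = c
        # Higher reliability is better; lower latency is better.
        return (w_rel * normalize(rel, lo_r, hi_r)
                + w_lat * (1.0 - normalize(lat, lo_l, hi_l)))

    return max(candidates, key=score)[0]


if __name__ == "__main__":
    for w in (1.0, 0.5, 0.0):
        print(w, allocate(w_rel=w, w_lat=1.0 - w))
```

With w_rel = 1 the sketch keeps the most reliable tasks on the edge device as far as its 600 MB allows, and with w_lat = 1 it pushes everything to the cloud; intermediate weights trace the trade-off. A real solver-based BILP avoids the exponential enumeration, typically by linearizing the reliability product (for example, through logarithms).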

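Motivation and Objectives

  • Establish the need for reliable and timely execution of workflow applications in edge-hub-cloud architectures.
  • Propose an exact optimization framework that maps tasks to edge, hub, and cloud devices while balancing reliability against latency.
  • Incorporate time redundancy (double/triple execution) and realistic constraints (memory, storage, energy, bandwidth, and connectivity).
  • Demonstrate effectiveness and scalability on real-world and synthetic workflows in the edge-hub-cloud architecture.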

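Proposed Method

  • Formulate the problem as a multi-objective binary integer linear program (BILP).
  • Apply a two-step task graph transformation: from the initial task graph (TG) to an intermediate edge-hub-cloud graph (EG), and then to the final reliability-aware graph (REG).
  • Capture both computation and communication costs, modeling computation cost as $E_{ik} = L_{ik} P_{ik}$ and data transfer cost as $CE_{ik \rightarrow jl}$.
  • Model reliability through vulnerability-based execution modes (single, double, and triple execution: SE/DE/TE), driven by the application's criticality and the thresholds $VT_{DE}$ and $VT_{TE}$.
  • Build time redundancy (double/triple execution) together with majority voting or verification into the REG transformation.
  • Solve the problem offline at design time to obtain Pareto-optimal trade-offs between reliability and latency.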
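The summary does not spell out how the thresholds select a mode or how the REG is wired, so the following is only a minimal sketch under stated assumptions. Each task carries a hypothetical vulnerability score in [0, 1]. A task at or above VT_TE runs three times (TE), one at or above VT_DE runs twice (DE), and any other task runs once (SE). In the expanded graph, each replicated task becomes k replica nodes feeding a verify node (DE) or a vote node (TE). The names `select_mode` and `expand_graph`, the node-naming scheme, and the comparison rule are illustrative, not the paper's construction.

```python
COPIES = {"SE": 1, "DE": 2, "TE": 3}


def select_mode(vulnerability, vt_de, vt_te):
    """Hypothetical rule: higher vulnerability -> more executions."""
    if vt_de > vt_te:
        raise ValueError("expected vt_de <= vt_te")
    if vulnerability >= vt_te:
        return "TE"
    if vulnerability >= vt_de:
        return "DE"
    return "SE"


def expand_graph(edges, vulnerability, vt_de, vt_te):
    """Replace each task with its replicas plus a checker node.

    edges: (u, v) pairs of the original task graph.
    vulnerability: score for every task that appears in edges.
    Returns (modes, expanded edge list).
    """
    modes = {t: select_mode(v, vt_de, vt_te) for t, v in vulnerability.items()}
    entry, exit_ = {}, {}  # nodes that receive / emit each task's data
    new_edges = []
    for task, mode in modes.items():
        k = COPIES[mode]
        if k == 1:
            entry[task], exit_[task] = [task], task
            continue
        replicas = [f"{task}#{i}" for i in range(1, k + 1)]
        checker = f"{task}_verify" if mode == "DE" else f"{task}_vote"
        entry[task], exit_[task] = replicas, checker
        new_edges += [(r, checker) for r in replicas]
    # Rewire original dependencies: checker (or single node) -> every replica.
    for u, v in edges:
        new_edges += [(exit_[u], r) for r in entry[v]]
    return modes, new_edges


if __name__ == "__main__":
    modes, edges = expand_graph(
        [("a", "b"), ("b", "c")], {"a": 0.1, "b": 0.5, "c": 0.9},
        vt_de=0.4, vt_te=0.8)
    print(modes)
    print(edges)
```

On the chain a → b → c with thresholds 0.4 and 0.8, task a stays single, b gets two replicas and a verifier, and c gets three replicas and a voter. An allocation model would then place each of these nodes on a device.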
Figure 1: Examples of transforming an application’s initial TG $G$ into its corresponding intermediate EG $\dot{G}$ and final REG $\ddot{G}$ .

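Experimental Results

Research Questions

  • RQ1: How can task allocation in the edge-hub-cloud architecture be formulated to jointly optimize reliability and latency?
  • RQ2: Under device and network constraints, how do time redundancy techniques (SE/DE/TE) affect overall application reliability and latency?
  • RQ3: Can the exact BILP-based approach scale to a real-world UAV-based workflow and to synthetic workflows in the edge-hub-cloud continuum?
  • RQ4: How does the application's criticality level influence allocation and replication strategies across edge, hub, and cloud?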

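Key Findings

  • On a real-world UAV-based inspection workflow, the method improved reliability by 84.19% and latency by 49.81% on average over baseline strategies, across relevant objective trade-offs.
  • Average runtimes ranged from 0.03 to 50.94 seconds across all examined workflows, which makes the approach practical for offline use.
  • The framework jointly accounts for memory, storage, computation/communication latency, energy consumption, and reliability in the edge-hub-cloud setting.
  • Time redundancy is integrated through a structured two-step graph transformation that explicitly selects each task's execution mode (SE/DE/TE) based on its vulnerability and the application's criticality.
  • The evaluation covers one real-world workflow and a variety of synthetic workflows, demonstrating scalability across structures, sizes, and criticality levels.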
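A textbook triple-modular-redundancy calculation shows why redundancy trades latency for reliability. The sketch makes three assumptions: faults in the three runs are independent, the voter is perfect, and the copies re-execute back to back on the same device. Under these assumptions, TE turns a per-run reliability r into 3r² − 2r³ while roughly tripling execution time. These are standard textbook assumptions, not necessarily the paper's reliability model. DE is left out because comparing two results can detect a single fault but cannot mask it, so its effect depends on a recovery policy the summary does not describe. The function names and the check-time parameter are hypothetical.

```python
def se_reliability(r: float) -> float:
    """Single execution: the task succeeds with probability r."""
    return r


def te_reliability(r: float) -> float:
    """Triple execution with a perfect 2-out-of-3 majority voter and
    independent faults: all three runs correct, or exactly two correct."""
    return r**3 + 3 * r**2 * (1 - r)  # equals 3r^2 - 2r^3


def redundant_latency(exec_time: float, copies: int, check_time: float = 0.0) -> float:
    """Time redundancy (assumed): copies run sequentially, then a check/vote step."""
    return copies * exec_time + check_time


if __name__ == "__main__":
    r, t = 0.9, 2.0  # hypothetical per-run reliability and execution time (s)
    print("SE:", se_reliability(r), redundant_latency(t, 1))
    print("TE:", round(te_reliability(r), 3), redundant_latency(t, 3, check_time=0.1))
```

With r = 0.9, TE raises the task's reliability from 0.9 to 0.972, but its latency grows from 2 s to about 6.1 s under these assumptions. For r below 0.5, majority voting actually lowers reliability. Replication therefore has a real cost and can even hurt, which is why an optimizer that chooses the mode per task is useful.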
Figure 3: Normalized overall reliability and latency, and percentage of allocated tasks (primary and replicas) per device, with respect to $w_{\mathrm{rel}}$ and $w_{\mathrm{lat}}$ , for the real-world workflow.

This review was generated by AI and reviewed by human editors.