QUICK REVIEW

[Paper Review] Where to Split? A Pareto-Front Analysis of DNN Partitioning for Edge Inference

Adiba Masud, Nicholas Foley|arXiv (Cornell University)|Jan 12, 2026
Advanced Neural Network Applications | Cited by 0
One-Sentence Summary

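The paper presents ParetoPipe, an open-source framework that treats DNN partitioning for edge inference as a multi-objective problem and maps the latency-throughput Pareto front across heterogeneous edge hardware and network conditions.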

ABSTRACT

The deployment of deep neural networks (DNNs) on resource-constrained edge devices is frequently hindered by their significant computational and memory requirements. While partitioning and distributing a DNN across multiple devices is a well-established strategy to mitigate this challenge, prior research has largely focused on single-objective optimization, such as minimizing latency or maximizing throughput. This paper challenges that view by reframing DNN partitioning as a multi-objective optimization problem. We argue that in real-world scenarios, a complex trade-off between latency and throughput exists, which is further complicated by network variability. To address this, we introduce ParetoPipe, an open-source framework that leverages Pareto front analysis to systematically identify optimal partitioning strategies that balance these competing objectives. Our contributions are threefold: we benchmark pipeline partitioned inference on a heterogeneous testbed of Raspberry Pis and a GPU-equipped edge server; we identify Pareto-optimal points to analyze the latency-throughput trade-off under varying network conditions; and we release a flexible, open-source framework to facilitate distributed inference and benchmarking. This toolchain features dual communication backends, PyTorch RPC and a custom lightweight implementation, to minimize overhead and support broad experimentation.
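Pareto front analysis, in the sense used above, keeps only the split points that no other split beats on both objectives at once. The following is a minimal Python sketch that assumes each candidate split has already been measured as a (latency, throughput) pair. The `SplitResult` type and the `dominates` and `pareto_front` helpers are hypothetical illustrations, not part of ParetoPipe's published API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitResult:
    """Measured outcome of one candidate split point (hypothetical type)."""
    split: int          # index of the block after which the model is cut
    latency_s: float    # end-to-end latency per inference, lower is better
    throughput: float   # inferences per second, higher is better


def dominates(a: SplitResult, b: SplitResult) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.latency_s <= b.latency_s and a.throughput >= b.throughput
    strictly_better = a.latency_s < b.latency_s or a.throughput > b.throughput
    return no_worse and strictly_better


def pareto_front(results: list[SplitResult]) -> list[SplitResult]:
    """Return the non-dominated splits sorted by latency (O(n^2), fine for tens of splits)."""
    front = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(front, key=lambda r: r.latency_s)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    candidates = [
        SplitResult(split=2, latency_s=0.90, throughput=2.0),
        SplitResult(split=5, latency_s=0.80, throughput=1.5),
        SplitResult(split=8, latency_s=0.95, throughput=1.8),  # dominated by split 2
        SplitResult(split=11, latency_s=1.20, throughput=2.6),
    ]
    for r in pareto_front(candidates):
        print(r.split, r.latency_s, r.throughput)
```

The quadratic scan is chosen for clarity. With only a few dozen split points per model, a sort-and-sweep algorithm would add complexity without a noticeable gain.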

Research Motivation and Objectives

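  • Reframe DNN partitioning for edge inference as a multi-objective optimization problem that balances latency and throughput.
  • Benchmark pipeline-partitioned inference on heterogeneous edge hardware to map the Pareto-optimal front.
  • Evaluate how robust partitioning strategies are under varying network latency and bandwidth conditions.
  • Provide an open-source framework that simplifies benchmarking and analysis of distributed inference.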
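The multi-objective framing can be written compactly under a simplifying assumption: an idealized two-stage pipeline with a single cut, where stages overlap across consecutive inputs. This is an illustration, not the paper's stated cost model. For a split point p, let T_1(p) and T_2(p) be the compute times on the two devices, s(p) the size of the cut activation in bits, B the link bandwidth, and d the one-way network delay.

```latex
\begin{aligned}
T_{\text{net}}(p) &= d + \frac{s(p)}{B}, \\
L(p) &= T_1(p) + T_{\text{net}}(p) + T_2(p), \\
X(p) &= \frac{1}{\max\{\,T_1(p),\; T_{\text{net}}(p),\; T_2(p)\,\}}, \\
p \succ q \;&\iff\; L(p) \le L(q) \,\wedge\, X(p) \ge X(q) \,\wedge\, \bigl(L(p) < L(q) \,\vee\, X(p) > X(q)\bigr).
\end{aligned}
```

Here L is latency, X is throughput, and the Pareto front is the set of splits that no other split dominates. In this model, a larger d or a smaller B raises T_net(p) for every split. Splits with large activations s(p) are penalized most. This is consistent with the paper's observation that offloading pays off less when transfer costs are high.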

Proposed Method

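  • Proposes ParetoPipe, an extensible framework that partitions DNNs across edge devices using pipeline parallelism.
  • Implements two communication backends, PyTorch RPC and a custom lightweight TCP socket backend, to study communication overhead.
  • Profiles block-level execution time for six CNN models to identify optimal split points.
  • Exhaustively tests partition points on Pi-to-Pi and Pi-to-GPU setups to generate latency-throughput Pareto fronts.
  • Uses tc to emulate degraded network conditions and study how the fronts shift under latency and bandwidth constraints.
  • Compares the custom backend against PyTorch RPC to quantify differences in overhead and performance.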
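The exhaustive split search and the network emulation can be mimicked offline with a toy cost model. This sketch assumes hypothetical per-block timings on each device and per-block output sizes, which is the kind of data block-level profiling would produce. It then evaluates every split point for a given bandwidth and delay. All numbers, and the names `BlockProfile`, `sweep_splits`, and `best_splits`, are illustrative assumptions. ParetoPipe itself measures real runs rather than predicting them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockProfile:
    """Hypothetical block-level profile entry."""
    t_dev1_s: float   # execution time of this block on device 1 (e.g. a Raspberry Pi)
    t_dev2_s: float   # execution time of this block on device 2 (e.g. a GPU server)
    out_bytes: int    # size of this block's output activation in bytes


def sweep_splits(blocks, bandwidth_bps, delay_s):
    """Evaluate every cut point p: blocks[:p] run on device 1, blocks[p:] on device 2.

    Returns a list of (p, latency_s, throughput) under an idealized two-stage
    pipeline: latency adds both stages and the transfer; throughput is bounded
    by the slowest of the three stages.
    """
    results = []
    for p in range(1, len(blocks)):  # keep at least one block on each device
        t1 = sum(b.t_dev1_s for b in blocks[:p])
        t2 = sum(b.t_dev2_s for b in blocks[p:])
        # Transfer the activation produced by the last block placed on device 1.
        t_net = delay_s + blocks[p - 1].out_bytes * 8 / bandwidth_bps
        results.append((p, t1 + t_net + t2, 1.0 / max(t1, t_net, t2)))
    return results


def best_splits(results):
    """Return (split with the lowest latency, split with the highest throughput)."""
    lowest_latency = min(results, key=lambda r: r[1])[0]
    highest_tput = max(results, key=lambda r: r[2])[0]
    return lowest_latency, highest_tput


if __name__ == "__main__":
    # Made-up profile: device 2 is ~10x faster and early activations are large.
    blocks = [
        BlockProfile(0.10, 0.01, 800_000),
        BlockProfile(0.20, 0.02, 400_000),
        BlockProfile(0.30, 0.03, 100_000),
        BlockProfile(0.10, 0.01, 4_000),
    ]
    for label, bandwidth, delay in [("fast LAN", 100e6, 0.002), ("constrained", 5e6, 0.050)]:
        print(label, best_splits(sweep_splits(blocks, bandwidth, delay)))
```

With these made-up numbers, the fast link favors cutting after the first block, which offloads most of the work. The constrained link favors cutting after the third block, which keeps more computation on device 1. This mirrors, in miniature, the frontier shift the paper reports, but it says nothing about the paper's actual measurements.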

Experimental Results

Research Questions

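  • RQ1: How can DNN partitioning for edge inference be analyzed as a multi-objective optimization problem that balances latency and throughput?
  • RQ2: What are the Pareto-optimal partition points for common CNN models on heterogeneous edge hardware?
  • RQ3: How do network latency and bandwidth limits shift the latency-throughput front and affect partitioning decisions?
  • RQ4: How large is the performance difference between a custom socket-based backend and PyTorch RPC for distributed inference?
  • RQ5: How does block-level profiling shape the optimal partitioning strategy across models and configurations?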

Key Findings

Model (Split)      Pi1-Exe(s)  Pi2-Exe(s)  Net-time(s)  Pi1-CPU Util(%)  Pi2-CPU Util(%)  Pi1 Mem(%)  Pi2 Mem(%)
AlexNet (P10)      0.451       0.427       0.050        272.7            271.9            24.84       21.76
AlexNet (P11)      0.419       0.379       0.045        347.7            271.9            21.72       20.36
InceptionV3 (P10)  2.873       2.766       0.055        328.8            322.7            22.24       20.76
InceptionV3 (P19)  5.791       0.002       0.040        345.9            0.19             19.48       20.20
MobileNetV2 (P3)   0.969       0.941       0.048        281.2            309.6            18.51       16.16
MobileNetV2 (P17)  1.818       0.065       0.049        311.9            12.1             16.84       15.39
ResNet18 (P2)      0.699       0.846       0.043        296.7            329.8            20.21       16.73
ResNet18 (P6)      1.290       0.278       0.046        351.7            76.2             17.33       16.58
ResNet50 (P5)      2.690       2.650       0.052        342.2            336.0            23.65       19.29
ResNet50 (P15)     5.233       0.196       0.041        360.4            14.0             19.48       18.74
VGG16 (P14)        6.827       6.319       0.056        328.9            305.7            35.43       32.87
VGG16 (P29)        13.37       0.894       0.044        347.6            19.3             33.95       34.76
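One rough way to turn these per-stage measurements into latency-throughput points is the idealized two-stage pipeline model. Latency is approximately Pi1-Exe + Net-time + Pi2-Exe. Steady-state throughput is approximately 1 divided by the largest of the three stage times. The Python sketch below applies that assumption to the table rows and checks, per model, whether one split dominates the other. The results are derived estimates, not latency or throughput figures reported by the paper, and the function names are illustrative.

```python
# Stage times in seconds, copied from the table above: (Pi1-Exe, Net-time, Pi2-Exe).
STAGE_TIMES = {
    ("AlexNet", "P10"): (0.451, 0.050, 0.427),
    ("AlexNet", "P11"): (0.419, 0.045, 0.379),
    ("InceptionV3", "P10"): (2.873, 0.055, 2.766),
    ("InceptionV3", "P19"): (5.791, 0.040, 0.002),
    ("MobileNetV2", "P3"): (0.969, 0.048, 0.941),
    ("MobileNetV2", "P17"): (1.818, 0.049, 0.065),
    ("ResNet18", "P2"): (0.699, 0.043, 0.846),
    ("ResNet18", "P6"): (1.290, 0.046, 0.278),
    ("ResNet50", "P5"): (2.690, 0.052, 2.650),
    ("ResNet50", "P15"): (5.233, 0.041, 0.196),
    ("VGG16", "P14"): (6.827, 0.056, 6.319),
    ("VGG16", "P29"): (13.37, 0.044, 0.894),
}


def estimate(stages):
    """Idealized pipeline estimate: (latency in s, throughput in inferences/s)."""
    return sum(stages), 1.0 / max(stages)


def dominates(a, b):
    """True if estimate a is no worse than b on both objectives and strictly better on one."""
    (lat_a, tput_a), (lat_b, tput_b) = a, b
    no_worse = lat_a <= lat_b and tput_a >= tput_b
    return no_worse and (lat_a < lat_b or tput_a > tput_b)


def compare_splits(table):
    """For each model, name the dominating split, or report a genuine trade-off."""
    by_model = {}
    for (model, split), stages in table.items():
        by_model.setdefault(model, []).append((split, estimate(stages)))
    verdicts = {}
    for model, ((s1, e1), (s2, e2)) in by_model.items():  # exactly two splits per model
        if dominates(e1, e2):
            verdicts[model] = s1
        elif dominates(e2, e1):
            verdicts[model] = s2
        else:
            verdicts[model] = "trade-off"
    return verdicts


if __name__ == "__main__":
    for model, verdict in compare_splits(STAGE_TIMES).items():
        print(f"{model}: {verdict}")
```

Under this estimate, only MobileNetV2's two splits form a genuine trade-off: P17 has slightly lower estimated latency, and P3 has higher estimated throughput. For the other five models, one split dominates the other. Treat this as a reading aid only. The idealized model ignores serialization, queuing, and framework overhead, and it does not use the CPU and memory columns.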
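  • The Pareto fronts reveal markedly different optimal split points for Pi-to-Pi and Pi-to-GPU deployments. MobileNetV2 and similar models favor asymmetric splits on Pi-to-Pi and lean toward offloading more work when a GPU is involved.
  • Under realistic network constraints, the front shifts toward doing more computation on the edge device, and the benefit of GPU offloading shrinks when data-transfer overhead is high.
  • The custom socket backend substantially lowers end-to-end latency compared with PyTorch RPC (by up to 76%, in the MobileNetV2 example) and raises throughput (by up to 53%).
  • Block-level profiling shows that blocks differ widely in cost, which steers split points toward a balance between computation and cross-device communication.
  • Network conditions are a first-class bottleneck: high latency and low bandwidth inflate data-transfer overhead and erode the speedup from GPU acceleration.
  • Under network bottlenecks the Pareto front becomes sparse, which underscores the need for network-aware, adaptive partitioning.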
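The findings attribute part of the speedup to a lightweight socket backend. The summary does not describe that backend's wire format, so the following is only a hypothetical sketch of one common approach: length-prefixed framing of a serialized activation over TCP. It uses only the Python standard library, with a raw float32 `array` standing in for a tensor. The header layout and the names `send_activation`, `recv_exact`, and `recv_activation` are assumptions, not ParetoPipe's protocol.

```python
import socket
import struct
from array import array

# Hypothetical frame header: payload length (uint64) and element count (uint32),
# both in network byte order.
HEADER = struct.Struct("!QI")


def send_activation(sock: socket.socket, values: array) -> None:
    """Send a float32 array as one length-prefixed frame.

    The payload uses the host's native float32 layout; both ends are assumed to
    share it (common ARM and x86 Linux hosts are little-endian).
    """
    payload = values.tobytes()
    sock.sendall(HEADER.pack(len(payload), len(values)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, because a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_activation(sock: socket.socket) -> array:
    """Receive one frame and rebuild the float32 array."""
    nbytes, count = HEADER.unpack(recv_exact(sock, HEADER.size))
    values = array("f")
    values.frombytes(recv_exact(sock, nbytes))
    if len(values) != count:
        raise ValueError("frame header and payload disagree")
    return values


if __name__ == "__main__":
    # Loopback demo: socketpair() returns two connected sockets without a server.
    left, right = socket.socketpair()
    with left, right:
        send_activation(left, array("f", [0.5, -1.25, 3.0]))
        print(list(recv_activation(right)))
```

A fixed binary header avoids the per-call metadata and serialization layers of a general RPC framework. That is one plausible source of lower overhead, though the summary does not say which mechanisms account for the reported gains.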


This review was generated by AI and reviewed by human editors.