QUICK REVIEW

[Paper Review] Where to Split? A Pareto-Front Analysis of DNN Partitioning for Edge Inference

Adiba Masud, Nicholas Foley|arXiv (Cornell University)|Jan 12, 2026

Advanced Neural Network Applications | Cited by 0

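One-Sentence Summary

The paper presents ParetoPipe, an open-source framework that treats DNN partitioning for edge inference as a multi-objective problem and maps the latency-throughput Pareto front across heterogeneous edge hardware and network conditions.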

ABSTRACT

The deployment of deep neural networks (DNNs) on resource-constrained edge devices is frequently hindered by their significant computational and memory requirements. While partitioning and distributing a DNN across multiple devices is a well-established strategy to mitigate this challenge, prior research has largely focused on single-objective optimization, such as minimizing latency or maximizing throughput. This paper challenges that view by reframing DNN partitioning as a multi-objective optimization problem. We argue that in real-world scenarios, a complex trade-off between latency and throughput exists, which is further complicated by network variability. To address this, we introduce ParetoPipe, an open-source framework that leverages Pareto front analysis to systematically identify optimal partitioning strategies that balance these competing objectives. Our contributions are threefold: we benchmark pipeline partitioned inference on a heterogeneous testbed of Raspberry Pis and a GPU-equipped edge server; we identify Pareto-optimal points to analyze the latency-throughput trade-off under varying network conditions; and we release a flexible, open-source framework to facilitate distributed inference and benchmarking. This toolchain features dual communication backends, PyTorch RPC and a custom lightweight implementation, to minimize overhead and support broad experimentation.

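Research Motivation and Goals

Reframe DNN partitioning for edge inference as a multi-objective optimization problem that balances latency and throughput.
Benchmark pipeline-partitioned inference on heterogeneous edge hardware to map the Pareto-optimal front.
Evaluate the robustness of partitioning strategies under varying network latency and bandwidth.
Provide an open-source framework that simplifies distributed inference benchmarking and analysis.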

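Proposed Method

Propose ParetoPipe, an extensible framework that partitions DNNs across edge devices using pipeline parallelism.
Implement two communication backends, PyTorch RPC and a custom lightweight TCP socket backend, to study overhead.
Profile block-level execution time for six CNN models to identify optimal split points.
Exhaustively test partition points on Pi-to-Pi and Pi-to-GPU setups to generate latency-throughput Pareto fronts.
Use tc to emulate degraded network conditions and study how the front shifts under latency and bandwidth constraints.
Compare the custom backend against PyTorch RPC to quantify overhead and performance differences.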
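
The step from exhaustive split-point measurements to a Pareto front can be sketched as follows. This is a minimal illustration, not ParetoPipe's actual code: the `SplitResult` type, the `pareto_front` function, and all measurement values are hypothetical. A split point is kept only if no other split is at least as good on both objectives and strictly better on one.

```python
from typing import List, NamedTuple


class SplitResult(NamedTuple):
    split: int         # index of the block after which the model is cut
    latency_s: float   # end-to-end latency in seconds (lower is better)
    throughput: float  # inferences per second (higher is better)


def pareto_front(results: List[SplitResult]) -> List[SplitResult]:
    """Return the results that no other result dominates, sorted by latency."""
    front = []
    for a in results:
        dominated = any(
            b.latency_s <= a.latency_s
            and b.throughput >= a.throughput
            and (b.latency_s < a.latency_s or b.throughput > a.throughput)
            for b in results
        )
        if not dominated:
            front.append(a)
    return sorted(front, key=lambda r: r.latency_s)


# Hypothetical measurements for five candidate split points (not from the paper).
measurements = [
    SplitResult(2, 1.60, 1.10),
    SplitResult(5, 1.20, 0.90),
    SplitResult(8, 1.45, 1.00),
    SplitResult(11, 1.70, 1.05),
    SplitResult(14, 1.30, 0.80),
]

front = pareto_front(measurements)
for r in front:
    print(f"split P{r.split}: {r.latency_s:.2f} s, {r.throughput:.2f} inf/s")
```

With these made-up numbers, splits 11 and 14 are dominated (split 2 and split 5 respectively beat them on both objectives), so the front consists of splits 5, 8, and 2. The quadratic scan is adequate here because the number of candidate split points per model is small.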

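Experimental Results

Research Questions

RQ1: How can DNN partitioning for edge inference be analyzed as a multi-objective optimization problem that balances latency and throughput?
RQ2: What are the Pareto-optimal partition points for common CNN models on heterogeneous edge hardware?
RQ3: How do network latency and bandwidth limits shift the latency-throughput front and affect partitioning decisions?
RQ4: How large is the performance difference between the custom socket-based backend and PyTorch RPC in distributed inference?
RQ5: How does block-level profiling inform the optimal partitioning strategy across models and configurations?

Key Findings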

Model (Split)	Pi1-Exe(s)	Pi2-Exe(s)	Net-time(s)	Pi1-CPU Util(%)	Pi2-CPU Util(%)	Pi1 Mem(%)	Pi2 Mem(%)
AlexNet (P10)	0.451	0.427	0.050	272.7	271.9	24.84	21.76
AlexNet (P11)	0.419	0.379	0.045	347.7	271.9	21.72	20.36
InceptionV3 (P10)	2.873	2.766	0.055	328.8	322.7	22.24	20.76
InceptionV3 (P19)	5.791	0.002	0.040	345.9	0.19	19.48	20.20
MobileNetV2 (P3)	0.969	0.941	0.048	281.2	309.6	18.51	16.16
MobileNetV2 (P17)	1.818	0.065	0.049	311.9	12.1	16.84	15.39
ResNet18 (P2)	0.699	0.846	0.043	296.7	329.8	20.21	16.73
ResNet18 (P6)	1.290	0.278	0.046	351.7	76.2	17.33	16.58
ResNet50 (P5)	2.690	2.650	0.052	342.2	336.0	23.65	19.29
ResNet50 (P15)	5.233	0.196	0.041	360.4	14.0	19.48	18.74
VGG16 (P14)	6.827	6.319	0.056	328.9	305.7	35.43	32.87
VGG16 (P29)	13.37	0.894	0.044	347.6	19.3	33.95	34.76
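
To make the trade-off in the table concrete, the sketch below applies a simple two-stage pipeline model to four of its rows. The model is an assumption, not something the paper states: it treats each column as a per-inference time, takes latency as the sum of the two execution times plus network time, and takes steady-state throughput as the inverse of the slowest stage (perfect overlap between stages). The `estimate` function and `ROWS` dictionary are illustrative names.

```python
# Rows copied from the table above: (Pi1 exec s, Pi2 exec s, network s).
ROWS = {
    "MobileNetV2 (P3)": (0.969, 0.941, 0.048),
    "MobileNetV2 (P17)": (1.818, 0.065, 0.049),
    "ResNet18 (P2)": (0.699, 0.846, 0.043),
    "ResNet18 (P6)": (1.290, 0.278, 0.046),
}


def estimate(pi1_s: float, pi2_s: float, net_s: float):
    """Assumed model: latency sums the stages; throughput is set by the slowest stage."""
    latency_s = pi1_s + net_s + pi2_s
    throughput = 1.0 / max(pi1_s, pi2_s, net_s)
    return latency_s, throughput


estimates = {name: estimate(*row) for name, row in ROWS.items()}
for name, (lat, thr) in estimates.items():
    print(f"{name:18s} latency ~ {lat:.3f} s, throughput ~ {thr:.3f} inferences/s")
```

Under these assumptions, MobileNetV2 (P17) has a slightly lower estimated latency than MobileNetV2 (P3) but roughly half the estimated throughput, because nearly all the work sits on one device. These are estimates from the simplified model, not results reported by the paper.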

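The Pareto fronts show markedly different optimal partition points for Pi-to-Pi and Pi-to-GPU deployments: MobileNetV2 and similar models favor asymmetric splits on Pi-to-Pi and tend toward more offloading when a GPU is involved.
Under realistic network constraints, the front shifts toward more computation on the edge device; when data-transfer overhead is high, the benefit of GPU offloading shrinks.
The custom socket backend substantially reduces end-to-end latency relative to PyTorch RPC (by up to 76% in the MobileNetV2 example) and improves throughput (by up to 53%).
Block-level profiling shows that blocks are not equally costly, which steers partition points toward a balance between computation and cross-device communication.
Network conditions are a first-class bottleneck: high latency and low bandwidth increase data-transfer overhead and erode the benefit of GPU acceleration.
Under network bottlenecks the Pareto front becomes sparse, underscoring the need for network-aware adaptive partitioning.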
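
The effect of network conditions on the preferred split can be illustrated with a toy cost model. Everything here is hypothetical: the per-block timings, the activation sizes, and the functions `transfer_time_s` and `best_split` are assumptions chosen for illustration, and the model minimizes latency only rather than tracing a full Pareto front. Transfer time is approximated as one-way delay plus payload size divided by bandwidth.

```python
def transfer_time_s(payload_bytes: int, bandwidth_mbps: float, delay_ms: float) -> float:
    """Approximate transfer time: one-way delay plus serialization time."""
    return delay_ms / 1000.0 + payload_bytes * 8 / (bandwidth_mbps * 1e6)


def best_split(edge_s, server_s, cut_bytes, bandwidth_mbps, delay_ms):
    """Pick the split k (first k blocks on the edge) with the lowest total latency.

    cut_bytes[k] is the size of the tensor sent when the cut is placed after
    block k (k = 0 sends the raw input, k = len(edge_s) sends the final output).
    """
    best_k, best_total = None, float("inf")
    for k in range(len(cut_bytes)):
        total = (
            sum(edge_s[:k])
            + transfer_time_s(cut_bytes[k], bandwidth_mbps, delay_ms)
            + sum(server_s[k:])
        )
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total


# Hypothetical four-block model: slow edge device, fast GPU server.
edge_s = [0.30, 0.40, 0.50, 0.20]      # per-block time on the edge device (s)
server_s = [0.02, 0.03, 0.03, 0.01]    # per-block time on the server (s)
cut_bytes = [600_000, 800_000, 400_000, 100_000, 4_000]

fast = best_split(edge_s, server_s, cut_bytes, bandwidth_mbps=100, delay_ms=1)
slow = best_split(edge_s, server_s, cut_bytes, bandwidth_mbps=1, delay_ms=100)
print("good network: split after block", fast[0])
print("poor network: split after block", slow[0])
```

With these made-up numbers, the good network favors offloading everything (split 0), while the constrained network favors running all blocks on the edge device (split 4), because sending the large early activations costs more than the GPU saves.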

This review was generated by AI and checked by a human editor.