QUICK REVIEW

[Paper Review] FCN-Pose: A Pruned and Quantized CNN for Robot Pose Estimation for Constrained Devices

Marrone Silvério Melo Dantas, Iago Richard Rodrigues|arXiv (Cornell University)|May 26, 2022

Advanced Neural Network Applications|Cited by 20

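ONE-SENTENCE SUMMARY

This paper proposes FCN-Pose, a lightweight fully convolutional network for robot pose estimation that uses pruning and quantization to run efficiently on constrained devices such as the Raspberry Pi, cutting parameter count and FLOPs substantially while improving real-time throughput.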

ABSTRACT

IoT devices suffer from resource limitations, such as processor, RAM, and disc storage. These limitations become more evident when handling demanding applications, such as deep learning, well-known for their heavy computational requirements. A case in point is robot pose estimation, an application that predicts the critical points of the desired image object. One way to mitigate processing and storage problems is compressing that deep learning application. This paper proposes a new CNN for pose estimation while applying the compression techniques of pruning and quantization to reduce its demands and improve the response time. While the pruning process reduces the total number of parameters required for inference, quantization decreases the precision of the floating-point values. We run the approach using a pose estimation task for a robotic arm and compare the results on a high-end device and a constrained device. As metrics, we consider the number of Floating-point Operations Per Second (FLOPS), the total of mathematical computations, the calculation of parameters, the inference time, and the number of video frames processed per second. In addition, we undertake a qualitative evaluation where we compare the output image predicted for each pruned network with the corresponding original one. We reduce the originally proposed network to a 70% pruning rate, implying an 88.86% reduction in parameters, a 94.45% reduction in FLOPS, and a 70% reduction in the disc storage requirement, while increasing error by a mere 1%. With regard to input image processing, this metric increases from 11.71 FPS to 41.9 FPS for the Desktop case. When using the constrained device, image processing increased from 2.86 FPS to 10.04 FPS. The higher processing rate of image frames achieved by the proposed approach allows a much shorter response time.

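RESEARCH MOTIVATION AND GOALS

Enable real-time robot pose estimation on IoT/edge devices with limited compute and storage.
Develop a compact CNN architecture (FCN-Pose) for keypoint-based robot pose estimation.
Apply pruning and quantization to cut model size and computation substantially while preserving accuracy.
Demonstrate performance on a desktop and on a constrained device (Raspberry Pi 3).
Provide a qualitative and quantitative evaluation of how compression affects the pose estimation output.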

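PROPOSED METHOD

Design FCN-Pose, a lightweight fully convolutional network with 10 convolutional layers, 5 max-pooling layers, 4 upsampling layers, and 9 output segmentation maps (8 keypoints + skeleton).
Train on a robotic-arm pose dataset with 8 keypoints and the corresponding segmentation masks; use data augmentation (rotation, padding) to mitigate overfitting.
Prune by ranking filters by L1 norm, removing the redundant filters, and then retraining.
Apply post-training quantization from FP32 to FP16 to reduce storage.
Compress end to end by pruning first, then retraining, then quantizing; no additional training follows quantization.
Post-process with a clustering-based refinement (Expansion Clustering) to derive keypoint coordinates from the segmented regions.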
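
The pruning and quantization steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy layer, the function names (`l1_norm`, `prune_filters`, `to_fp16`), and the flat-list filter representation are assumptions made for illustration, and the retraining step between pruning and quantization is omitted.

```python
import struct

def l1_norm(filt):
    """Sum of absolute weights of one filter (given as a flat list of floats)."""
    return sum(abs(w) for w in filt)

def prune_filters(filters, prune_rate):
    """Drop the `prune_rate` fraction of filters with the smallest L1 norm.

    Returns the indices of the kept filters, in their original order.
    """
    n_prune = int(round(len(filters) * prune_rate))
    # Rank filter indices from smallest to largest L1 norm.
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    pruned = set(ranked[:n_prune])
    return [i for i in range(len(filters)) if i not in pruned]

def to_fp16(value):
    """Round-trip a float through IEEE 754 half precision (2 bytes per value)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Hypothetical toy layer: 10 filters, each flattened to 4 weights.
layer = [[0.1 * (i + 1) * sign for sign in (1, -1, 1, -1)] for i in range(10)]

# Step 1: prune 70% of the filters by L1 norm (retraining would follow here).
kept = prune_filters(layer, prune_rate=0.7)

# Step 2: post-training quantization of the surviving weights to FP16.
quantized = [[to_fp16(w) for w in layer[i]] for i in kept]
```

In this toy layer the L1 norm grows with the filter index, so only the three largest filters survive. Each FP16 value occupies 2 bytes instead of the 4 bytes of FP32, which is where the storage saving of the quantization step comes from.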

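EXPERIMENTAL RESULTS

Research questions

RQ1: Can FCN-Pose accurately predict the keypoints of a robotic arm under resource constraints?
RQ2: How do pruning and quantization affect keypoint detection accuracy (PCK) and processing speed on the desktop and on the constrained device?
RQ3: What are the trade-offs among parameter count, FLOPs, disk storage, and pose estimation error after compression?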
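
PCK (Percentage of Correct Keypoints) counts a predicted keypoint as correct when it lies within a threshold distance of the ground truth. The sketch below shows the general idea only. The reference length used for normalization (`ref_length`) and the sample coordinates are assumptions, since this summary does not state how the paper normalizes the threshold.

```python
import math

def pck(pred, truth, ref_length, alpha=0.5):
    """Fraction of keypoints within alpha * ref_length of the ground truth.

    pred, truth: lists of (x, y) tuples in the same keypoint order.
    ref_length: normalization length (assumed here; e.g. an object size in pixels).
    """
    threshold = alpha * ref_length
    correct = sum(
        1
        for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= threshold
    )
    return correct / len(truth)

# Hypothetical keypoints: three predictions are close, one is 15 pixels off.
truth = [(10, 10), (20, 20), (30, 30), (40, 40)]
pred = [(10, 11), (20, 20), (30, 45), (41, 40)]
score = pck(pred, truth, ref_length=10, alpha=0.5)  # threshold is 5 pixels
```

With a 5-pixel threshold, three of the four hypothetical keypoints count as correct, so `score` is 0.75.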

Key findings

Fold ID	PCK@0.5	Inference time (s)	FPS (CPU)
0	0.997	0.088	11.346
1	0.997	0.085	11.731
2	0.999	0.084	11.825
3	0.998	0.085	11.754
4	0.996	0.084	11.899
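
As a quick check on the table above, the per-fold values can be averaged directly. The snippet below is only an arithmetic sketch; the variable names are illustrative, and the data are the five rows of the table.

```python
from statistics import mean

# (fold id, PCK@0.5, inference time in seconds, CPU FPS), copied from the table.
folds = [
    (0, 0.997, 0.088, 11.346),
    (1, 0.997, 0.085, 11.731),
    (2, 0.999, 0.084, 11.825),
    (3, 0.998, 0.085, 11.754),
    (4, 0.996, 0.084, 11.899),
]

mean_pck = mean(row[1] for row in folds)
mean_fps = mean(row[3] for row in folds)

# FPS should be roughly the reciprocal of the inference time. The match is
# only approximate because the reported times are rounded to three decimals.
fps_gaps = [abs(1.0 / row[2] - row[3]) for row in folds]
```

The averages come out to about 0.9974 for PCK@0.5 and 11.711 FPS, and every fold's FPS is within 0.1 of the reciprocal of its inference time.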

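FCN-Pose has 131,705 parameters and requires 1.7 MB of storage, significantly smaller than typical FCN-style networks.
At a 70% pruning rate, parameters drop by 88.86%, FLOPs by 94.45%, and storage by 70%, while error increases by only about 1%.
On the desktop CPU, mean PCK@0.5 is about 0.997 and mean CPU throughput is about 11.711 FPS (5-fold cross-validation).
On the constrained Raspberry Pi 3, compression raises input image throughput from 2.86 FPS to 10.04 FPS.
The pruned and quantized FCN-Pose delivers markedly better real-time performance on the constrained device with only a small drop in accuracy.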
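
The reported figures imply a few derived quantities that make the trade-off easier to read. The snippet below is a back-of-the-envelope sketch using only numbers quoted in this review; the derived values are not stated in the source, and the variable names are illustrative.

```python
# Figures quoted in this review.
params_original = 131_705
param_reduction = 0.8886           # 88.86% fewer parameters at 70% pruning
desktop_fps = (11.71, 41.9)        # before, after compression
raspberry_pi_fps = (2.86, 10.04)   # before, after compression

# Derived values (not stated in the source).
params_remaining = round(params_original * (1 - param_reduction))
desktop_speedup = desktop_fps[1] / desktop_fps[0]
pi_speedup = raspberry_pi_fps[1] / raspberry_pi_fps[0]
```

Under these numbers, roughly 14,672 parameters remain, and throughput improves by about 3.58x on the desktop and 3.51x on the Raspberry Pi 3. One plausible reason the parameter reduction exceeds the 70% pruning rate is that removing filters in one layer also removes the matching input channels of the next layer: if both sides of an inner convolution shrink to 30%, its weight count shrinks to about 0.3 × 0.3 = 9%.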

This review was generated by AI and checked by a human editor.