QUICK REVIEW

[Paper Review] RepGhost: A Hardware-Efficient Ghost Module via Re-parameterization

Chengpeng Chen, Zichao Guo | arXiv (Cornell University) | Nov 11, 2022

Advanced Neural Network Applications | Cited by 31

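One-sentence summary

RepGhost replaces concatenation-based feature reuse with structural re-parameterization, yielding a hardware-friendly RepGhost module and RepGhostNet with a better accuracy-latency trade-off on mobile devices.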

ABSTRACT

Feature reuse has been a key technique in light-weight convolutional neural networks (CNNs) architecture design. Current methods usually utilize a concatenation operator to keep large channel numbers cheaply (thus large network capacity) by reusing feature maps from other layers. Although concatenation is parameters- and FLOPs-free, its computational cost on hardware devices is non-negligible. To address this, this paper provides a new perspective to realize feature reuse implicitly and more efficiently instead of concatenation. A novel hardware-efficient RepGhost module is proposed for implicit feature reuse via reparameterization, instead of using concatenation operator. Based on the RepGhost module, we develop our efficient RepGhost bottleneck and RepGhostNet. Experiments on ImageNet and COCO benchmarks demonstrate that our RepGhostNet is much more effective and efficient than GhostNet and MobileNetV3 on mobile devices. Specially, our RepGhostNet surpasses GhostNet 0.5x by 2.5% Top-1 accuracy on ImageNet dataset with less parameters and comparable latency on an ARM-based mobile device. Code and model weights are available at https://github.com/ChengpengChen/RepGhost.

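Motivation and objectives

Motivate hardware-efficient feature reuse that goes beyond concatenation in lightweight convolutional neural networks.
Propose the RepGhost module, which achieves implicit feature reuse by fusing branches in weight space.
Build RepGhostNet by replacing Ghost modules with RepGhost modules, and evaluate it on mobile devices.
Demonstrate an improved accuracy-latency trade-off on ImageNet and COCO tasks.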

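Proposed method

Analyze the difference in hardware cost between concatenation and addition when used for feature reuse.
Replace the concatenation in the Ghost module with addition and restructure the module to satisfy the rules of re-parameterization.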
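Move the ReLU backward, to after the add operator, so that the summed branches stay linear and satisfy the rules of structural re-parameterization; the diversity learned at training time can then be fused in weight space.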
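Add BN to the identity branch and fuse it at inference time to form the RepGhost module.
Replace the Ghost bottleneck with the RepGhost bottleneck while keeping input/output channels aligned, forming RepGhostNet.
Evaluate on ImageNet classification and COCO detection/segmentation, and run ablation studies on the re-parameterization structure and the shortcut connections.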
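
The fusion step in the method above can be illustrated for a single channel. The following is a minimal sketch, not the authors' implementation. It assumes a 3x3 depthwise convolution with stride 1 and zero padding, BN layers with fixed (inference-time) statistics, and hypothetical helper names (`bn_affine`, `dwconv3x3`, `two_branch_forward`, `fuse`). It checks numerically that ReLU(BN(dwconv(x)) + BN(x)) equals the ReLU of one depthwise convolution whose kernel and bias absorb both BN layers.

```python
import math
import random

def bn_affine(gamma, beta, mean, var, eps=1e-5):
    """Return (scale, shift) such that BN(x) == scale * x + shift with fixed statistics."""
    scale = gamma / math.sqrt(var + eps)
    return scale, beta - mean * scale

def dwconv3x3(x, kernel, bias=0.0):
    """3x3 depthwise convolution on one channel (stride 1, zero padding 1)."""
    h, w = len(x), len(x[0])
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(3):
                for dj in range(3):
                    r, c = i + di - 1, j + dj - 1
                    if 0 <= r < h and 0 <= c < w:
                        out[i][j] += kernel[di][dj] * x[r][c]
    return out

def relu(x):
    return [[max(v, 0.0) for v in row] for row in x]

def two_branch_forward(x, kernel, bn_conv, bn_id):
    """Unfused form: ReLU(BN(dwconv(x)) + BN(x)), with the ReLU after the add."""
    s1, b1 = bn_affine(*bn_conv)
    s2, b2 = bn_affine(*bn_id)
    y = dwconv3x3(x, kernel)
    h, w = len(x), len(x[0])
    return relu([[s1 * y[i][j] + b1 + s2 * x[i][j] + b2 for j in range(w)]
                 for i in range(h)])

def fuse(kernel, bn_conv, bn_id):
    """Fold both BN layers and the identity branch into one 3x3 kernel and one bias."""
    s1, b1 = bn_affine(*bn_conv)
    s2, b2 = bn_affine(*bn_id)
    fused = [[s1 * k for k in row] for row in kernel]
    fused[1][1] += s2  # identity branch == 3x3 kernel that is 1 at the center
    return fused, b1 + b2

random.seed(0)
x = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
kernel = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
bn_conv = (1.2, 0.1, 0.05, 0.8)  # (gamma, beta, running_mean, running_var), made-up values
bn_id = (0.7, -0.2, -0.1, 1.5)   # BN on the identity branch, made-up values

fused_kernel, fused_bias = fuse(kernel, bn_conv, bn_id)
unfused_out = two_branch_forward(x, kernel, bn_conv, bn_id)
fused_out = relu(dwconv3x3(x, fused_kernel, fused_bias))
max_diff = max(abs(p - q)
               for row_a, row_b in zip(unfused_out, fused_out)
               for p, q in zip(row_a, row_b))
print("max difference:", max_diff)
```

Because the ReLU sits after the addition, both branches are affine in x, which is why they collapse into a single kernel and bias. A ReLU inside either branch would break this equivalence.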

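Experimental results

Research questions

RQ1: Is feature reuse via structural re-parameterization cheaper in hardware cost than the concatenation used in Ghost-based architectures?
RQ2: Can the RepGhost module match or exceed the accuracy on mobile hardware while lowering latency?
RQ3: Which re-parameterization structure best supports both training-time diversity and inference-time efficiency?
RQ4: Are shortcut connections still beneficial in ultra-lightweight CNNs such as RepGhostNet?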

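Key findings

Measured by latency on an ARM-based mobile device, RepGhostNet offers a better accuracy-latency trade-off than GhostNet and MobileNetV3.
RepGhostNet 0.5x is 20% faster than GhostNet 0.5x, with 0.5% higher ImageNet Top-1 accuracy.
RepGhostNet 1.0x is 14% faster than MobileNetV3 Large 0.75x, with 0.7% higher Top-1 accuracy.
At comparable latency, RepGhostNet 0.58x beats GhostNet 0.5x by 2.5% Top-1 accuracy on ImageNet.
On COCO, RepGhostNet runs inference faster than MobileNetV2, MobileNetV3, and GhostNet; for example, RepGhostNet 1.3x achieves a higher mAP than the GhostNet baseline.
Ablation studies show that BN-based re-parameterization gives the best performance, that the ReLU is moved backward to make fusion possible, and that shortcut connections remain beneficial in small models.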


This review was generated by AI and reviewed by a human editor.