QUICK REVIEW

[Paper Review] DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks

Yikang Zhang, Jian Zhang|arXiv (Cornell University)|Apr 22, 2020

Advanced Neural Network Applications|References: 43|Cited by: 69

ONE-SENTENCE SUMMARY

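DyNet introduces a dynamic convolution that adaptively generates convolution kernels from image content by linearly fusing fixed kernels with predicted coefficients. Across most CNNs it either significantly reduces FLOPs or improves accuracy at similar cost, and it also yields faster CPU/GPU inference.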

ABSTRACT

Convolution operator is the core of convolutional neural networks (CNNs) and occupies the most computation cost. To make CNNs more efficient, many methods have been proposed to either design lightweight networks or compress models. Although some efficient network structures have been proposed, such as MobileNet or ShuffleNet, we find that there still exists redundant information between convolution kernels. To address this issue, we propose a novel dynamic convolution method to adaptively generate convolution kernels based on image contents. To demonstrate the effectiveness, we apply dynamic convolution on multiple state-of-the-art CNNs. On one hand, we can reduce the computation cost remarkably while maintaining the performance. For ShuffleNetV2/MobileNetV2/ResNet18/ResNet50, DyNet can reduce 37.0/54.7/67.2/71.3% FLOPs without loss of accuracy. On the other hand, the performance can be largely boosted if the computation cost is maintained. Based on the architecture MobileNetV3-Small/Large, DyNet achieves 70.3/77.1% Top-1 accuracy on ImageNet with an improvement of 2.9/1.9%. To verify the scalability, we also apply DyNet on segmentation task, the results show that DyNet can reduce 69.3% FLOPs while maintaining Mean IoU on segmentation task.

MOTIVATION AND OBJECTIVES

Identify and address the redundancy among convolution kernels in CNNs, which leads to wasted computation.
Propose a simple, trainable dynamic convolution framework that generates kernels based on input content.
Show that dynamic convolution can be a drop-in module for mainstream architectures to reduce FLOPs with minimal accuracy loss or even accuracy gains.
Demonstrate scalability on image classification (ImageNet) and segmentation tasks across multiple architectures.

PROPOSED METHOD

Introduce a coefficient prediction module that predicts the weights for fusing several fixed kernels.
Propose a dynamic generation module that constructs a dynamic kernel as a weighted sum of fixed kernels: $\tilde{w}_t = \sum_i \eta_t^i \, w_t^i$.
Use a group-based design with a hyperparameter g_t to control fixed kernel count.
During training, fuse feature maps rather than kernels; Eq. (2) of the paper shows that, because convolution is linear, this is equivalent to fusing the kernels before convolving.
Apply DyNet to MobileNetV2, ShuffleNetV2, and ResNet variants to create Dy-Mobile, Dy-Shuffle, Dy-ResNet18, and Dy-ResNet50.
Evaluate across ImageNet (top-1 accuracy) and Cityscapes segmentation to demonstrate both classification and segmentation benefits.
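
The fusion step and the training-time equivalence described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a coefficient predictor made of global average pooling followed by a single fully connected layer with a sigmoid, uses a multi-channel 1-D cross-correlation as a stand-in for a 2-D convolution, and all function names (`predict_coefficients`, `fuse_kernels`, `fuse_feature_maps`) are hypothetical.

```python
import math
import random


def predict_coefficients(x, fc_weights, fc_bias):
    """Assumed predictor: global average pooling, one FC layer, sigmoid."""
    pooled = [sum(channel) / len(channel) for channel in x]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def conv1d(x, kernel):
    """Valid multi-channel 1-D cross-correlation with one output channel."""
    k = len(kernel[0])
    n = len(x[0]) - k + 1
    return [sum(x[c][i + j] * kernel[c][j]
                for c in range(len(x)) for j in range(k))
            for i in range(n)]


def fuse_kernels(fixed_kernels, coeffs):
    """Dynamic kernel as a weighted sum of fixed kernels (inference path)."""
    channels, k = len(fixed_kernels[0]), len(fixed_kernels[0][0])
    return [[sum(eta * w[c][j] for eta, w in zip(coeffs, fixed_kernels))
             for j in range(k)]
            for c in range(channels)]


def fuse_feature_maps(x, fixed_kernels, coeffs):
    """Convolve with every fixed kernel, then fuse the outputs (training path)."""
    outputs = [conv1d(x, w) for w in fixed_kernels]
    return [sum(eta * out[i] for eta, out in zip(coeffs, outputs))
            for i in range(len(outputs[0]))]


# Toy sizes chosen only for illustration.
rng = random.Random(0)
C, L, K, G = 3, 8, 3, 4  # channels, signal length, kernel width, fixed kernels
x = [[rng.uniform(-1, 1) for _ in range(L)] for _ in range(C)]
fixed = [[[rng.uniform(-1, 1) for _ in range(K)] for _ in range(C)]
         for _ in range(G)]
fc_w = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(G)]
fc_b = [0.0] * G

eta = predict_coefficients(x, fc_w, fc_b)
out_kernel_fusion = conv1d(x, fuse_kernels(fixed, eta))
out_map_fusion = fuse_feature_maps(x, fixed, eta)
max_gap = max(abs(a - b) for a, b in zip(out_kernel_fusion, out_map_fusion))
```

Here `max_gap` should differ from zero only by floating-point rounding. The kernel-fusion path runs one convolution per input, whereas the feature-map path runs one per fixed kernel, which is why fusing kernels first is the cheaper choice at inference time.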

EXPERIMENTAL RESULTS

Research Questions

RQ1: Can dynamic convolution reduce redundant computations in CNNs without large accuracy losses?
RQ2: Does input-adaptive kernel fusion maintain or improve performance while reducing FLOPs across vision backbones?
RQ3: How does dynamic convolution affect inference speed on CPU/GPU and training speed on multi-GPU setups?
RQ4: Is the approach scalable to segmentation tasks and larger backbones?
RQ5: What is the impact of the group size g_t on performance and parameter budget?

Key Findings

Method	MFLOPs	Top-1 err. (%)
MobileNetV3-Small(1.0)	56	32.60
ShuffleNet V2 (1.0)	146	30.60
MobileNetV2 (1.0)	298	28.00
MobileNetV3-Large(1.0)	219	24.8
ResNet18	1730	30.41
ResNet50	3890	23.67
ShuffleNet v1 (1.0)	140	32.60
MobileNet v2 (0.75)	145	32.10
MobileNet v2 (0.6)	141	33.30
MobileNet v1 (0.5)	149	36.30
DenseNet (1.0)	142	45.20
Xception (1.0)	145	34.10
IGCV2 (0.5)	156	34.50
IGCV3-D (0.7)	210	31.50
Dy-MobileNetV3-Small	59	29.7
Dy-shuffle (1.0)	92	29.6
Dy-mobile (1.0)	135	28.27
Dy-MobileNetV3-Large	228	22.9
Dy-ResNet18	567	31.01
Dy-ResNet50	1119	23.75
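
As a quick sanity check, the FLOP reductions and error changes implied by the table can be recomputed directly from its rows. The snippet below is only arithmetic on the numbers listed above; the dictionary layout and the helper name `reduction_percent` are illustrative choices, not part of the paper.

```python
# (baseline MFLOPs, DyNet MFLOPs, baseline Top-1 err., DyNet Top-1 err.)
# copied from the table above.
rows = {
    "ShuffleNetV2": (146, 92, 30.60, 29.6),
    "MobileNetV2": (298, 135, 28.00, 28.27),
    "ResNet18": (1730, 567, 30.41, 31.01),
    "ResNet50": (3890, 1119, 23.67, 23.75),
}


def reduction_percent(baseline, dynamic):
    """Relative FLOP saving of the dynamic model, in percent."""
    return 100.0 * (baseline - dynamic) / baseline


flop_reduction = {name: round(reduction_percent(b, d), 1)
                  for name, (b, d, _, _) in rows.items()}
# Positive values mean the DyNet variant has a higher Top-1 error.
error_change = {name: round(dy_err - base_err, 2)
                for name, (_, _, base_err, dy_err) in rows.items()}

for name in rows:
    print(name, flop_reduction[name], error_change[name])
```

Because the table lists rounded MFLOPs, the recomputed savings can differ from the reported figures in the last digit (the ResNet50 row works out to roughly 71.2% rather than 71.3%).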

DyNet cuts FLOPs by 37.0%, 54.7%, 67.2%, and 71.3% for ShuffleNetV2, MobileNetV2, ResNet18, and ResNet50 respectively, while Top-1 error rises by at most 0.6 percentage points (and falls by 1.0 point for ShuffleNetV2).
On MobileNetV3-Small/Large, DyNet improves Top-1 accuracy by 2.9 and 1.9 percentage points respectively (to 70.3% and 77.1%), with only a marginal change in FLOPs (56 to 59 and 219 to 228 MFLOPs).
DyNet accelerates CPU inference by 1.87x for Dy-MobileNetV2, 1.32x for Dy-ResNet18, and 1.48x for Dy-ResNet50.
In segmentation (Cityscapes), Dy-ResNet50 reduces FLOPs by 69.3% while maintaining Mean IoU.
Ablations show that dynamic kernels outperform the fixed-kernel baselines (Fix-mobile, Fix-shuffle) and that performance improves as g_t increases.
Dynamic kernels exhibit reduced correlation between kernels, indicating reduced redundancy and more efficient representations.
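
One simple way to quantify the kind of redundancy mentioned in the last finding is the mean absolute Pearson correlation over all pairs of flattened kernels. The sketch below is an assumption about how such a measurement could be set up, not the paper's exact protocol; `pearson` and `mean_abs_correlation` are hypothetical helpers, and the two kernel sets are hand-made toy data.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long flattened kernels."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant kernel")
    return cov / math.sqrt(var_a * var_b)


def mean_abs_correlation(kernels):
    """Average |correlation| over all unordered kernel pairs."""
    pairs = [(i, j) for i in range(len(kernels))
             for j in range(i + 1, len(kernels))]
    if not pairs:
        raise ValueError("need at least two kernels")
    return sum(abs(pearson(kernels[i], kernels[j]))
               for i, j in pairs) / len(pairs)


# Toy data: scaled copies of one kernel versus mutually uncorrelated kernels.
redundant = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [-1.0, -2.0, -3.0, -4.0]]
diverse = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0], [1.0, -1.0, 1.0, -1.0]]

redundant_score = mean_abs_correlation(redundant)
diverse_score = mean_abs_correlation(diverse)
```

A score near 1 indicates kernels that are close to scaled copies of one another, while a score near 0 indicates kernels that carry largely independent information.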


This review was generated by AI and reviewed by a human editor.