QUICK REVIEW

[Paper Review] Parallel Separable 3D Convolution for Video and Volumetric Data Understanding.

Felix Gonda, Donglai Wei|arXiv (Cornell University)|Jan 1, 2018

Advanced Neural Network Applications | Cited by 4

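ONE-SENTENCE SUMMARY

This paper proposes Parallel Separable 3D Convolution (PmSCn), a new convolution block that replaces 3D convolution with m parallel streams, each made of n 2D convolution layers and one 1D convolution layer applied along different dimensions. By drawing on tensor decomposition and jointly replacing consecutive 3D convolution layers, PmSCn achieves, on average, around a 14% improvement in test performance and a 40% smaller model across video action recognition, MRI brain segmentation, and electron microscopy segmentation.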

ABSTRACT

For video and volumetric data understanding, 3D convolution layers are widely used in deep learning, however, at the cost of increasing computation and training time. Recent works seek to replace the 3D convolution layer with convolution blocks, e.g. structured combinations of 2D and 1D convolution layers. In this paper, we propose a novel convolution block, Parallel Separable 3D Convolution (PmSCn), which applies m parallel streams of n 2D and one 1D convolution layers along different dimensions. We first mathematically justify the need of parallel streams (Pm) to replace a single 3D convolution layer through tensor decomposition. Then we jointly replace consecutive 3D convolution layers, common in modern network architectures, with the multiple 2D convolution layers (Cn). Lastly, we empirically show that PmSCn is applicable to different backbone architectures, such as ResNet, DenseNet, and UNet, for different applications, such as video action recognition, MRI brain segmentation, and electron microscopy segmentation. In all three applications, we replace the 3D convolution layers in state-of-the-art models with PmSCn and achieve around 14% improvement in test performance and 40% reduction in model size on average.

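MOTIVATION AND OBJECTIVES

Address the high computational cost and training time that 3D convolution layers incur in video and volumetric data understanding.
Overcome the limitations of existing 2D+1D or 1D+2D separable approaches by introducing parallel streams that better approximate a 3D convolution.
Develop a flexible, plug-and-play block that integrates into existing 3D CNN backbones such as ResNet, DenseNet, and UNet.
Reduce model size and inference time while maintaining or improving performance across a range of 3D learning tasks.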

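PROPOSED METHOD

Use tensor decomposition to justify replacing a single 3D convolution layer with m parallel streams (Pm).
Design a multi-stream block in which each of the m streams combines n 2D convolutions with one 1D convolution, applied along different spatial or temporal dimensions.
Jointly replace consecutive 3D convolution layers in a deep network with multiple 2D convolutions (Cn), reducing the parameter count while preserving representational capacity.
Use separable operations to keep computation efficient while preserving spatial and temporal invariance.
Keep the block differentiable and end-to-end trainable so that it is compatible with standard deep learning frameworks.
Apply PmSCn as a drop-in replacement for 3D convolution in state-of-the-art models, without redesigning the network architecture.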
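The tensor-decomposition argument for parallel streams can be illustrated with a simplified, single-channel derivation. This is a sketch under stated assumptions and not necessarily the paper's own proof: the kernel K, its sizes k_t, k_h, k_w, the factors a_i and B_i, and the input X are illustrative symbols introduced here, and channels and nonlinearities are ignored. Unfolding a 3D kernel into a k_t-by-(k_h k_w) matrix and taking its singular value decomposition gives a sum of separable terms, and linearity of convolution turns each term into one 2D+1D stream:

```latex
\begin{align}
K[t,h,w] &= \sum_{i=1}^{r} a_i[t]\, B_i[h,w],
  \qquad r \le \min(k_t,\; k_h k_w), \\
K * X &= \sum_{i=1}^{r} a_i *_{t} \bigl( B_i *_{h,w} X \bigr).
\end{align}
```

Under these assumptions, a single 2D+1D stream (m = 1) reproduces only kernels of rank 1 in this unfolding, whereas m parallel streams summed together can reproduce kernels of rank up to m, which is one way to read the motivation for Pm.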

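EXPERIMENTAL RESULTS

Research Questions

RQ1: Do parallel 2D and 1D convolution streams approximate a 3D convolution better than sequential or single-stream alternatives?
RQ2: Does jointly replacing consecutive 3D convolution layers with multiple 2D convolutions preserve or improve feature representation?
RQ3: How far can PmSCn reduce model size and inference time while maintaining or improving accuracy on 3D video and volumetric learning tasks?
RQ4: How well does the PmSCn block generalize across architectures (e.g. ResNet, DenseNet, UNet) and tasks (e.g. action recognition, segmentation)?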

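Key Findings

PmSCn improves test performance by around 14% on average across the three evaluated applications: video action recognition, MRI brain segmentation, and electron microscopy segmentation.
Replacing the 3D convolution layers in state-of-the-art models with PmSCn reduces model size by 40% on average.
Through tensor decomposition and parallel streams, the method lowers computational complexity while maintaining high accuracy.
PmSCn integrates into ResNet, DenseNet, and UNet backbones without changes to the network architecture.
The empirical results indicate that the parallel-stream design approximates 3D convolution better than sequential or single-stream alternatives.
Jointly replacing consecutive 3D convolution layers with multiple 2D convolutions preserves representational capacity and improves generalization.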
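To make the model-size finding concrete, the following back-of-the-envelope sketch counts weights for n stacked 3D convolutions versus a PmSCn-style block. It rests on assumptions that are not taken from the paper: every layer keeps the same channel width, kernels are cubic with side k, biases are ignored, and each stream is simply n 2D convolutions followed by one 1D convolution. The function names are hypothetical and the numbers are illustrative only; they are not the paper's reported figures.

```python
def conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in one k x k x k 3D convolution (no bias)."""
    return c_in * c_out * k ** 3


def stream_params(c_in: int, c_out: int, k: int, n: int) -> int:
    """Weights in one assumed stream: n 2D convs (k x k), then one 1D conv (k)."""
    first_2d = c_in * c_out * k * k              # first 2D conv maps c_in -> c_out
    other_2d = (n - 1) * c_out * c_out * k * k   # remaining 2D convs keep c_out
    final_1d = c_out * c_out * k                 # 1D conv along the remaining axis
    return first_2d + other_2d + final_1d


def pmscn_params(c_in: int, c_out: int, k: int, m: int, n: int) -> int:
    """Weights in m parallel streams (assumed identical in shape)."""
    return m * stream_params(c_in, c_out, k, n)


if __name__ == "__main__":
    c, k, m, n = 64, 3, 2, 2
    baseline = n * conv3d_params(c, c, k)   # n consecutive 3D conv layers
    block = pmscn_params(c, c, k, m, n)     # one block replacing all n layers
    print(f"3D baseline: {baseline}, PmSCn-style block: {block}, "
          f"ratio: {block / baseline:.3f}")
```

Under these assumptions the saving depends strongly on m and n: with m = 2 and n = 2 the block is smaller than the baseline, while m = 3 with the same channel width would be larger, so the actual reduction depends on the configuration and channel widths used in practice.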

This review was generated by AI and reviewed by human editors.