QUICK REVIEW

[Paper Review] Channel-wise Distillation for Semantic Segmentation.

Changyong Shu, Yifan Liu | arXiv (Cornell University) | Nov 26, 2020

Advanced Neural Network Applications | References: 41 | Cited by: 9

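One-Sentence Summary

This paper proposes a channel-wise distillation method for semantic segmentation that transfers knowledge by minimizing, channel by channel, the KL divergence between softmax-normalized teacher and student feature maps rather than aligning the feature maps spatially, and it outperforms almost all spatial distillation baselines at lower training cost on three benchmarks with various network structures.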

ABSTRACT

Knowledge distillation (KD) has been proven to be a simple and effective tool for training compact models. Almost all KD variants for semantic segmentation align the student and teacher networks' feature maps in the spatial domain, typically by minimizing point-wise and/or pair-wise discrepancy. Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogue to class activation mapping), we propose to align features channel-wise between the student and teacher networks. To this end, we first transform the feature map of each channel into a distribution using softmax normalization, and then minimize the Kullback-Leibler (KL) divergence of the corresponding channels of the two networks. By doing so, our method focuses on mimicking the soft distributions of channels between networks. In particular, the KL divergence enables learning to pay more attention to the most salient regions of the channel-wise maps, presumably corresponding to the most useful signals for semantic segmentation. Experiments demonstrate that our channel-wise distillation outperforms almost all existing spatial distillation methods for semantic segmentation considerably, and requires less computational cost during training. We consistently achieve superior performance on three benchmarks with various network structures. Code is available at: this https URL

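Motivation and Objectives

Address the limitation that existing knowledge distillation methods for semantic segmentation align feature maps almost exclusively in the spatial domain.
Investigate whether channel-wise feature alignment better captures the semantic relevance and saliency encoded in feature maps.
Reduce computational cost during training while maintaining or improving model performance.
Develop a distillation method that uses each channel's soft distribution to emphasize the most salient regions of the feature maps.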

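Proposed Method

Convert each channel's feature map into a probability distribution over its spatial locations using softmax normalization.
Compute the Kullback-Leibler (KL) divergence between corresponding channels of the student and teacher networks.
Minimize this KL divergence to align the soft activation distributions of corresponding channels.
Focus learning on the most salient regions of each channel map, which presumably carry the most useful semantic signal.
Apply the channel-wise distillation loss during training so that the student network mimics the teacher network's channel-wise activation patterns.
Evaluate the method on multiple backbone architectures and benchmark datasets to assess its generalization and efficiency.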
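
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' released code: the function names, the list-of-flattened-channels layout, the `temperature` parameter, and the averaging over channels are all assumptions made for the example.

```python
import math


def log_spatial_softmax(channel, temperature=1.0):
    """Log-probabilities of one flattened channel (H*W activations).

    The softmax runs over spatial locations, so each channel becomes a
    distribution. `temperature` is an assumed, illustrative parameter.
    """
    scaled = [v / temperature for v in channel]
    peak = max(scaled)  # subtract the max for numerical stability
    log_total = math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - peak - log_total for v in scaled]


def channel_wise_distillation_loss(student, teacher, temperature=1.0):
    """Average over channels of KL(teacher || student).

    `student` and `teacher` are lists of channels; each channel is a flat
    list of H*W activations. Averaging over channels is an assumption.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher need the same number of channels")
    total = 0.0
    for s_channel, t_channel in zip(student, teacher):
        if len(s_channel) != len(t_channel):
            raise ValueError("channels must have the same spatial size")
        log_p = log_spatial_softmax(t_channel, temperature)  # teacher
        log_q = log_spatial_softmax(s_channel, temperature)  # student
        # KL(p || q) = sum_i p_i * (log p_i - log q_i)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher)


# Toy feature maps: 2 channels, 4 spatial locations each (made-up values).
teacher_features = [[4.0, 0.0, 0.0, 0.0], [0.0, 1.0, 2.0, 3.0]]
student_features = [[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 2.0, 3.0]]
loss = channel_wise_distillation_loss(student_features, teacher_features)
```

In this toy case the second channel already matches the teacher and contributes nothing, so the whole loss comes from the first channel, where the student is flat and the teacher has a single strong peak.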

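Experimental Results

Research Questions

RQ1: Can channel-wise distillation outperform spatial distillation methods for semantic segmentation?
RQ2: Does aligning soft distributions along the channel dimension yield better feature representations than aligning feature maps spatially?
RQ3: Can channel-wise distillation reduce training cost while maintaining or improving performance?
RQ4: How well does the method generalize across network architectures and benchmark datasets?

Key Findings

The proposed channel-wise distillation outperforms almost all existing spatial distillation methods on three semantic segmentation benchmarks.
The method consistently improves segmentation accuracy across a variety of network architectures, which indicates that it generalizes well.
Training with channel-wise distillation requires less computation than training with spatial distillation methods.
Applying KL divergence to softmax-normalized channel features lets the student focus on the most salient regions, which presumably improves feature representation learning.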
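
The last finding follows from the form of the KL divergence: each location's term is weighted by the teacher's probability there, so a mismatch at a salient location costs far more than one in the background. The toy example below illustrates this; the activation values and function names are made up for the illustration and are not taken from the paper.

```python
import math


def softmax(values):
    """Softmax over the spatial locations of one flattened channel."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_terms(teacher_channel, student_channel):
    """Per-location terms p_i * log(p_i / q_i) of KL(teacher || student)."""
    p = softmax(teacher_channel)
    q = softmax(student_channel)
    return [pi * math.log(pi / qi) for pi, qi in zip(p, q)]


# The teacher has one salient location; the student is flat everywhere.
teacher_channel = [4.0, 0.0, 0.0, 0.0]
student_channel = [0.0, 0.0, 0.0, 0.0]
terms = kl_terms(teacher_channel, student_channel)

for index, term in enumerate(terms):
    print(f"location {index}: {term:+.4f}")
```

The term at the salient location is more than ten times larger in magnitude than the term at any background location, so gradient descent on this loss mostly corrects the student where the teacher's response is strongest.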


This review was generated by AI and reviewed by a human editor.