QUICK REVIEW

[Paper Review] Channel-wise Distillation for Semantic Segmentation

Changyong Shu, Yifan Liu|arXiv (Cornell University)|Nov 26, 2020

Advanced Neural Network Applications | References: 41 | Citations: 9

ONE-LINE SUMMARY

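This paper proposes channel-wise distillation, which transfers knowledge by minimizing the KL divergence between corresponding channels of softmax-normalized feature maps instead of aligning feature maps spatially. The method outperforms spatial distillation baselines at lower training cost and achieves state-of-the-art results across multiple benchmarks and network architectures.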

ABSTRACT

Knowledge distillation (KD) has been proven to be a simple and effective tool for training compact models. Almost all KD variants for semantic segmentation align the student and teacher networks' feature maps in the spatial domain, typically by minimizing point-wise and/or pair-wise discrepancy. Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogue to class activation mapping), we propose to align features channel-wise between the student and teacher networks. To this end, we first transform the feature map of each channel into a distribution using softmax normalization, and then minimize the Kullback-Leibler (KL) divergence of the corresponding channels of the two networks. By doing so, our method focuses on mimicking the soft distributions of channels between networks. In particular, the KL divergence enables learning to pay more attention to the most salient regions of the channel-wise maps, presumably corresponding to the most useful signals for semantic segmentation. Experiments demonstrate that our channel-wise distillation outperforms almost all existing spatial distillation methods for semantic segmentation considerably, and requires less computational cost during training. We consistently achieve superior performance on three benchmarks with various network structures. Code is available at: this https URL

Motivation and Objectives

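Address the limitations of existing knowledge distillation methods for semantic segmentation, which focus on spatial alignment of feature maps.
Examine whether channel-wise alignment of feature maps better captures the semantic relevance and saliency they encode.
Reduce computational cost during training while maintaining or improving model performance.
Develop a distillation method that focuses on the most salient regions of the feature maps by exploiting the soft distribution of each channel.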

Proposed Method

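Convert each channel's feature map into a probability distribution with softmax normalization.
Compute the Kullback-Leibler (KL) divergence between corresponding channels of the student and teacher models.
Minimize the KL divergence to align the channel-wise soft activation distributions.
Concentrate learning on the most salient regions, on the assumption that they carry the most useful semantic signals.
Apply the channel-wise distillation loss during training to guide the student network to mimic the teacher's channel-wise activation patterns.
Evaluate generality and efficiency across a range of backbone architectures and benchmarks.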
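
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the temperature parameter `tau`, the `eps` guard, and the mean-over-channels reduction are all assumptions made here, and the sketch assumes the teacher and student feature maps already have the same shape (C, H, W).

```python
import numpy as np


def channel_softmax(feat, tau=1.0):
    """Softmax over the spatial positions of each channel.

    feat: array of shape (C, H, W).
    Returns an array of shape (C, H*W) whose rows each sum to 1.
    tau is a hypothetical temperature; tau=1.0 is a plain softmax.
    """
    c = feat.shape[0]
    logits = feat.reshape(c, -1) / tau
    # Subtract the per-channel maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def channel_wise_kl(teacher_feat, student_feat, tau=1.0, eps=1e-12):
    """Mean over channels of KL(teacher || student).

    Each channel is first turned into a spatial distribution, then the
    teacher and student distributions of the same channel are compared.
    """
    p_t = channel_softmax(teacher_feat, tau)
    p_s = channel_softmax(student_feat, tau)
    # eps guards against log(0) when a probability underflows.
    kl_per_channel = np.sum(
        p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1
    )
    return float(kl_per_channel.mean())


# Toy feature maps: 4 channels on an 8x8 grid (random, for illustration only).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8, 8))
student = rng.normal(size=(4, 8, 8))

loss_same = channel_wise_kl(teacher, teacher)         # identical maps
loss_diff = channel_wise_kl(teacher, student)         # unrelated maps
loss_shift = channel_wise_kl(teacher, teacher + 3.0)  # constant offset only
```

Because the softmax is taken within each channel, adding a constant to a feature map leaves the loss unchanged: only the relative spatial pattern of each channel is compared, not its absolute magnitude. In a real training setup this term would be added to the ordinary segmentation loss with some weight, and a projection layer would be needed if the two networks had different channel counts; both details are left out of this sketch.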

Experimental Results

Research Questions

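RQ1: Can channel-wise distillation outperform spatial distillation methods in semantic segmentation?
RQ2: Does aligning channel-wise soft distributions yield better feature representations than aligning spatial feature maps?
RQ3: Can channel-wise distillation reduce training cost while maintaining or improving performance?
RQ4: How well does the method generalize across different network architectures and benchmark datasets?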

Key Findings

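The proposed channel-wise distillation method outperformed almost all existing spatial distillation methods on three major semantic segmentation benchmarks.
The method consistently improved segmentation accuracy across a variety of network architectures, showing strong generality.
Training with channel-wise distillation required less computation than training with spatial distillation methods.
Applying the KL divergence to softmax-normalized channel features let the model focus on the most salient regions and improved feature representation learning.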
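
One way to see why the KL term emphasizes salient regions is to write it out for a single channel. The notation below is an assumption introduced for illustration, since the review gives no formula: $y_{c,i}$ is the activation at spatial position $i$ of channel $c$, $p_{c,i}$ is its spatial softmax, and the superscripts $T$ and $S$ mark the teacher and the student.

```latex
p_{c,i} = \frac{\exp(y_{c,i})}{\sum_{j=1}^{HW} \exp(y_{c,j})},
\qquad
\mathrm{KL}\left(p^{T}_{c} \,\middle\|\, p^{S}_{c}\right)
  = \sum_{i=1}^{HW} p^{T}_{c,i} \,\log \frac{p^{T}_{c,i}}{p^{S}_{c,i}}
```

Each position's log-ratio is weighted by the teacher probability $p^{T}_{c,i}$. Positions where the teacher's channel responds strongly therefore dominate the sum, while positions with near-zero teacher probability contribute almost nothing, which is consistent with the finding that the loss steers the student toward the most salient regions.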


This review was generated by AI and checked by a human editor.