QUICK REVIEW

[Paper Review] Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons

Byeongho Heo, Minsik Lee|arXiv (Cornell University)|Nov 8, 2018

Neural Networks and Applications|References 23|Citations 56

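One-line summary

This paper proposes activation boundary distillation, which transfers whether each neuron is activated rather than its exact output value. It uses an activation transfer loss with a differentiable hinge-like surrogate to improve knowledge transfer and transfer learning over prior methods.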

ABSTRACT

An activation boundary for a neuron refers to a separating hyperplane that determines whether the neuron is activated or deactivated. It has been long considered in neural networks that the activations of neurons, rather than their exact output values, play the most important role in forming classification friendly partitions of the hidden feature space. However, as far as we know, this aspect of neural networks has not been considered in the literature of knowledge transfer. In this paper, we propose a knowledge transfer method via distillation of activation boundaries formed by hidden neurons. For the distillation, we propose an activation transfer loss that has the minimum value when the boundaries generated by the student coincide with those by the teacher. Since the activation transfer loss is not differentiable, we design a piecewise differentiable loss approximating the activation transfer loss. By the proposed method, the student learns a separating boundary between activation region and deactivation region formed by each neuron in the teacher. Through the experiments in various aspects of knowledge transfer, it is verified that the proposed method outperforms the current state-of-the-art.

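Motivation and objectives

Motivate better knowledge transfer by focusing on the activation boundaries of neurons rather than the magnitude of their activations.
Propose an activation transfer loss that minimizes the difference between the activation states of teacher and student neurons.
Develop a differentiable surrogate loss that approximates the non-differentiable activation transfer loss so that gradient-based optimization becomes possible.
Handle networks of different sizes with a connector function, and extend the method to convolutional networks with a spatially shared transfer.
Show that the method outperforms state-of-the-art approaches across a range of transfer learning scenarios.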

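Proposed method

Define the teacher T(I) and the student S(I) as pre-activation responses, that is, the neuron outputs before the nonlinearity, for an input I.
Introduce the activation transfer loss L(I) = ||rho(T(I)) - rho(S(I))||_1, where rho is the binary indicator of whether a neuron is activated, so that the loss targets the activation boundary.
Provide a differentiable surrogate loss, similar to the hinge loss, that approximates the activation transfer loss and permits gradient-based optimization.
Introduce a margin parameter mu to stabilize training, and derive the resulting gradient behavior.
Allow a connector function r that maps the student output to a teacher-sized representation when the network sizes differ.
Extend the framework to convolutional networks by sharing a 1x1 connector across spatial positions and summing the loss over those positions.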
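The steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the surrogate is a squared hinge with margin mu applied to each neuron, and the helper names (`step`, `surrogate_loss`, `linear_connector`, `spatial_surrogate_loss`) and the example values are hypothetical. A real training setup would use an autodiff framework and learn the connector weights.

```python
from typing import List, Sequence

def step(x: float) -> float:
    """rho: 1.0 if the pre-activation is positive (neuron active), else 0.0."""
    return 1.0 if x > 0 else 0.0

def relu(x: float) -> float:
    return x if x > 0 else 0.0

def activation_transfer_loss(t: Sequence[float], s: Sequence[float]) -> float:
    """L1 distance between binary activation patterns; counts mismatched neurons.
    Piecewise constant in s, so it gives no useful gradient."""
    return sum(abs(step(ti) - step(si)) for ti, si in zip(t, s))

def surrogate_loss(t: Sequence[float], s: Sequence[float], mu: float = 1.0) -> float:
    """Assumed hinge-like surrogate: where the teacher neuron is active, push the
    student response above +mu; where it is inactive, push it below -mu."""
    total = 0.0
    for ti, si in zip(t, s):
        hinge = relu(mu - si) if ti > 0 else relu(mu + si)
        total += hinge * hinge
    return total

def linear_connector(s: Sequence[float],
                     weight: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> List[float]:
    """Hypothetical connector r: an affine map from the student dimension to the
    teacher dimension (one weight row per teacher neuron)."""
    return [sum(w * x for w, x in zip(row, s)) + b for row, b in zip(weight, bias)]

def spatial_surrogate_loss(t_map, s_map, weight, bias, mu: float = 1.0) -> float:
    """Convolutional case: each map is a list of per-position channel vectors.
    The same connector is applied at every position (the effect of a 1x1
    convolution) and the per-position losses are summed."""
    return sum(
        surrogate_loss(t_pos, linear_connector(s_pos, weight, bias), mu)
        for t_pos, s_pos in zip(t_map, s_map)
    )

teacher = [2.0, -1.0, 0.5]    # illustrative pre-activations
student = [0.3, -2.0, -0.4]
print(activation_transfer_loss(teacher, student))  # one neuron disagrees
print(surrogate_loss(teacher, student, mu=1.0))    # approximately 2.45
```

In this example the first student neuron is already on the correct side of the boundary but inside the margin, so the surrogate still penalizes it; that is the role of mu in this sketch.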

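Experimental results

Research questions

RQ1: Does transferring activation boundaries between teacher and student improve knowledge transfer beyond transfer based on activation magnitude?
RQ2: Does activation-focused distillation outperform existing KD-based and related transfer methods across a variety of architectures and data regimes?
RQ3: How does the proposed method perform in transfer learning scenarios with limited training data and in network compression (differences in size or dimension)?
RQ4: Can the method be extended effectively to convolutional networks and spatial feature maps?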

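Key findings

The proposed activation boundary distillation consistently outperforms state-of-the-art transfer methods across multiple experimental settings.
It speeds up learning and improves generalization, especially with small training sets.
In transfer learning tasks, it often outperforms the conventional ImageNet-pretraining baseline.
The connector function enables knowledge transfer even when teacher and student differ in size, which makes effective compression scenarios possible.
Results averaged over experiments indicate robust activation boundary transfer, and the ablations suggest that the method does approximate the non-differentiable activation transfer loss.
The analysis shows that the proposed method achieves higher activation similarity between teacher and student than Lp-based losses, and also improves classification performance.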
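One simple way to quantify activation similarity is the fraction of neurons whose on/off state agrees between teacher and student. The sketch below is an assumption made for illustration; this review does not state the exact metric used in the paper, and the function name `activation_agreement` and the example values are hypothetical.

```python
from typing import Sequence

def activation_agreement(t: Sequence[float], s: Sequence[float]) -> float:
    """Fraction of neurons whose activation state (pre-activation > 0) matches
    between teacher responses t and student responses s."""
    if len(t) != len(s) or len(t) == 0:
        raise ValueError("t and s must be non-empty and of equal length")
    matches = sum((ti > 0) == (si > 0) for ti, si in zip(t, s))
    return matches / len(t)

teacher = [2.0, -1.0, 0.5, -0.2]   # illustrative pre-activations
student = [0.3, -2.0, -0.4, -0.1]
print(activation_agreement(teacher, student))  # 0.75
```

A magnitude-based (Lp) loss can be small while this agreement stays low, because a small error near zero may still flip a neuron's state.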


This review was generated by AI and checked by a human editor.