QUICK REVIEW

[Paper Review] Deep Model Compression: Distilling Knowledge from Noisy Teachers

Bharat Bhusan Sau, Vineeth N Balasubramanian|arXiv (Cornell University)|Oct 30, 2016

Advanced Neural Network Applications | References: 23 | Citations: 99

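ONE-LINE SUMMARY

This paper strengthens teacher-student deep model compression by perturbing the teacher's logits with noise, which mimics learning from multiple noisy teachers, and improves the performance of shallow students on MNIST, SVHN, and CIFAR-10.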

ABSTRACT

The remarkable successes of deep learning models across various applications have resulted in the design of deeper networks that can solve complex problems. However, the increasing depth of such models also results in a higher storage and runtime complexity, which restricts the deployability of such very deep models on mobile and portable devices, which have limited storage and battery capacity. While many methods have been proposed for deep model compression in recent years, almost all of them have focused on reducing storage complexity. In this work, we extend the teacher-student framework for deep model compression, since it has the potential to address runtime and train time complexity too. We propose a simple methodology to include a noise-based regularizer while training the student from the teacher, which provides a healthy improvement in the performance of the student network. Our experiments on the CIFAR-10, SVHN and MNIST datasets show promising improvement, with the best performance on the CIFAR-10 dataset. We also conduct a comprehensive empirical evaluation of the proposed method under related settings on the CIFAR-10 dataset to show the promise of the proposed approach.

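MOTIVATION AND OBJECTIVES

Motivate deep model compression that reduces runtime and training time, not only storage.
Extend the teacher-student framework with a noise-based regularizer implemented through logit perturbation.
Show that perturbing the teacher's logits for a subset of samples acts as a regularizer and improves student accuracy.
Evaluate the method on MNIST, SVHN, and CIFAR-10, and analyze the performance gains and robustness.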

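PROPOSED METHOD

Builds on the approach of distilling knowledge from a pre-trained teacher by using its logits as regression targets.
Perturbs the teacher logits as z′(i) = (1 + ξ) z(i), where ξ ~ N(0, σ^2 I).
Before the loss is computed, perturbs a subset of the samples in each mini-batch, chosen with probability α.
Trains the student with an L2 loss on the perturbed logits, L(x, z′, θ).
Shows that logit perturbation is equivalent to adding a noise-based regularizer to the loss function.
Interprets the noise-induced diversity of the targets as a conceptual form of learning from multiple teachers.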
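The perturbation and loss described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy logit values, the 0.5 factor and batch mean in the loss, and the choice to draw independent noise for each logit are all assumptions made here. The last function checks numerically that the loss on perturbed targets equals the loss on clean targets plus an extra noise-dependent term, which follows from expanding 0.5 * ||g - z - ξz||^2 and is one way to read the regularizer claim.

```python
import random

def perturb_logits(teacher_logits, alpha, sigma, rng):
    """With probability alpha per sample, replace z by (1 + xi) * z,
    where each xi is drawn from N(0, sigma^2) (assumed per-logit noise)."""
    perturbed = []
    for z in teacher_logits:
        if rng.random() < alpha:
            # Multiplicative Gaussian noise, drawn independently per logit.
            z = [(1.0 + rng.gauss(0.0, sigma)) * z_k for z_k in z]
        else:
            z = list(z)
        perturbed.append(z)
    return perturbed

def l2_loss(student_logits, target_logits):
    """Batch mean of 0.5 * ||g(x) - target||^2 (scaling is an assumption)."""
    total = 0.0
    for g, t in zip(student_logits, target_logits):
        total += 0.5 * sum((g_k - t_k) ** 2 for g_k, t_k in zip(g, t))
    return total / len(student_logits)

def noise_term(student_logits, teacher_logits, noisy_logits):
    """Extra term added by the noise: 0.5 * ||d||^2 - (g - z) . d,
    with d = z' - z = xi * z, averaged over the batch."""
    total = 0.0
    for g, z, zp in zip(student_logits, teacher_logits, noisy_logits):
        d = [zp_k - z_k for zp_k, z_k in zip(zp, z)]
        total += 0.5 * sum(d_k * d_k for d_k in d)
        total -= sum((g_k - z_k) * d_k for g_k, z_k, d_k in zip(g, z, d))
    return total / len(student_logits)

rng = random.Random(0)
teacher = [[2.0, -1.0, 0.5], [0.1, 3.0, -2.0]]  # hypothetical teacher logits
student = [[1.5, -0.5, 0.0], [0.0, 2.5, -1.0]]  # hypothetical student outputs

noisy = perturb_logits(teacher, alpha=0.5, sigma=0.1, rng=rng)
clean_loss = l2_loss(student, teacher)
noisy_loss = l2_loss(student, noisy)
extra = noise_term(student, teacher, noisy)
```

In a real training loop the perturbation would be redrawn for every mini-batch, so the student sees a different noisy target for the same input across epochs; setting alpha to 0 recovers plain logit regression.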

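EXPERIMENTAL RESULTS

Research Questions

RQ1: Does perturbing the teacher's logits (a noisy teacher) improve the accuracy of a shallow student compared with standard logit regression in teacher-student compression?
RQ2: How do the perturbation parameters (α, σ) affect performance across datasets?
RQ3: Can learning from multiple noisy teachers be simulated effectively to narrow the performance gap between teacher and student?
RQ4: How does the proposed method compare with standard regularization techniques such as dropout?
RQ5: How does noisy-teacher regularization affect the runtime and storage trade-off on CIFAR-10?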

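Key Findings

MNIST: Perturbation yields consistent gains, with up to 11.3% relative improvement over the baseline.
SVHN: Perturbation gives modest gains, with a best relative improvement of about 3.3%; higher noise levels can degrade performance.
CIFAR-10: In some settings, perturbation achieves up to 12.7% relative improvement over the baseline.
A higher α (perturbing more logits) generally improves performance on CIFAR-10, although the optimal α depends on the gap between teacher and student.
Perturbing the teacher's logits is more effective than perturbing the student or regularizing with dropout.
Learning from multiple teachers (including noisy teachers) can further improve student performance compared with a single-teacher baseline.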

This review was generated by AI and checked by a human editor.