QUICK REVIEW

[Paper Review] DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks

Xingjian Li, Haoyi Xiong|arXiv (Cornell University)|Jan 26, 2019

Domain Adaptation and Few-Shot Learning | References: 23 | Citations: 88

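One-line summary

DELTA regularizes fine-tuning by aligning the outer-layer feature maps of the target and source networks, weighted by a supervised attention mechanism, and reaches higher fine-tuning accuracy than L2 and L2-SP.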

ABSTRACT

Transfer learning through fine-tuning a pre-trained neural network with an extremely large dataset, such as ImageNet, can significantly accelerate training while the accuracy is frequently bottlenecked by the limited dataset size of the new target task. To solve the problem, some regularization methods, constraining the outer layer weights of the target network using the starting point as references (SPAR), have been studied. In this paper, we propose a novel regularized transfer learning framework DELTA, namely DEep Learning Transfer using Feature Map with Attention. Instead of constraining the weights of neural network, DELTA aims to preserve the outer layer outputs of the target network. Specifically, in addition to minimizing the empirical loss, DELTA intends to align the outer layer outputs of two networks, through constraining a subset of feature maps that are precisely selected by attention that has been learned in a supervised learning manner. We evaluate DELTA with the state-of-the-art algorithms, including L2 and L2-SP. The experiment results show that our proposed method outperforms these baselines with higher accuracy for new tasks.

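Motivation and objectives

Motivate regularization that goes beyond weight distance in transfer learning, in order to improve generalization on small target datasets.
Propose a behavior-based regularization that aligns the outer-layer feature maps of the source and target networks.
Introduce a supervised attention mechanism that weights feature maps by their discriminative power.
Incorporate a proximal term inspired by SPAR (starting point as reference) to accelerate optimization and stabilize fine-tuning.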

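Proposed method

Define a regularization term that measures, for each training sample, the distance between the outer-layer feature maps of the source network and the target network.
Compute the feature map FM_j through the j-th convolutional filter and ReLU, and vectorize it as FM_j(z, w, x).
Weight the contribution of each feature map by W_j, obtained from supervised attention: the change in loss when the j-th filter is removed, passed through a softmax over the loss differences.
Formulate the total loss as the empirical loss plus the weighted behavior regularization term Omega(ω, ω*, x, y, z) and a proximal term on the private parameters (SPAR).
Use L2-FE as the baseline model for estimating filter importance and computing the attention weights. Train together with SPAR to accelerate optimization.
Extract and regularize feature maps rather than raw weights, so that discriminative behavior is what gets transferred.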
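
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy loss values, the coefficients alpha and beta, and the choice of a squared L2 penalty for the private-parameter term are all hypothetical. The feature maps are random arrays standing in for real ReLU outputs of a convolutional layer.

```python
import numpy as np

def attention_weights(loss_without_filter, loss_full):
    """Softmax over the loss increase caused by removing each filter.

    loss_without_filter[j] is the loss of the reference model with filter j
    removed; loss_full is its loss with all filters in place.
    """
    gaps = np.asarray(loss_without_filter, dtype=float) - float(loss_full)
    gaps = gaps - gaps.max()          # shift for numerical stability
    exp = np.exp(gaps)
    return exp / exp.sum()

def behavior_regularizer(fm_target, fm_source, weights):
    """Attention-weighted squared L2 distance between feature maps.

    fm_target and fm_source have shape (num_filters, H, W) for one sample.
    """
    diff = (fm_target - fm_source).reshape(fm_target.shape[0], -1)
    per_filter = np.sum(diff ** 2, axis=1)   # one distance per filter
    return float(np.dot(weights, per_filter))

def total_loss(empirical, omega_behavior, private_params, alpha=0.01, beta=0.01):
    """Empirical loss plus both regularizers.

    alpha, beta and the squared L2 form of the private-parameter term are
    assumptions of this sketch, not values taken from the paper.
    """
    proximal = float(np.sum(np.asarray(private_params, dtype=float) ** 2))
    return empirical + alpha * omega_behavior + beta * proximal

# Toy data: 4 filters with 3x3 feature maps (hypothetical values).
rng = np.random.default_rng(0)
fm_source = np.maximum(rng.normal(size=(4, 3, 3)), 0.0)   # ReLU-like outputs
fm_target = fm_source + 0.1 * rng.normal(size=(4, 3, 3))

w = attention_weights([1.2, 0.9, 2.0, 0.95], loss_full=0.9)
omega = behavior_regularizer(fm_target, fm_source, w)
loss = total_loss(empirical=0.7, omega_behavior=omega, private_params=[0.5, -0.2])
```

With these toy numbers the third filter, whose removal raises the loss the most, receives the largest attention weight, so deviations in its feature map are penalized most heavily.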

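Experimental results

Research questions

RQ1: Does aligning outer-layer feature maps (behavior) between the source and target networks improve transfer performance over weight-distance regularization?
RQ2: When fine-tuning a CNN on a small target dataset, does a supervised attention mechanism that reweights feature maps improve transfer learning?
RQ3: How does DELTA compare with L2 and L2-SP regularization across different vision tasks and base architectures?
RQ4: How does attention affect the activation maps, and what does it imply for the idea of reusing unactivated channels?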

Key findings

Dataset	Model	L2	L2-SP	DELTA (w/o attention)	DELTA
MIT Indoors 67	ResNet-101	83.7	85.1	85.3	85.5
Stanford Dogs 120	ResNet-101	83.3	88.3	88.3	88.7
Caltech 256-30	ResNet-101	84.7	85.4	85.7	86.6
Caltech 256-60	ResNet-101	87.2	87.2	87.6	88.7
CUB-200-2011	ResNet-101	78.4	79.5	78.9	80.5
Food-101	ResNet-101	85.3	86.4	85.9	86.3
MIT Indoors 67	Inception-V3	74.8	74.6	76.9	78.1
Stanford Dogs 120	Inception-V3	88.6	89.4	88.7	88.7
Caltech 256-30	Inception-V3	83.6	83.3	83.4	84.9
Caltech 256-60	Inception-V3	85.8	85.3	85.1	86.8
CUB-200-2011	Inception-V3	74.3	75.2	74.5	76.5
Food-101	Inception-V3	76.9	75.9	76.2	80.8

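DELTA achieves higher top-1 accuracy than both the L2 and L2-SP baselines in 10 of the 12 dataset/model settings in the table, covering MIT Indoors 67, Stanford Dogs 120, Caltech 256-30/60, CUB-200-2011, and Food-101. The two exceptions are Food-101 with ResNet-101 (L2-SP 86.4 vs. DELTA 86.3) and Stanford Dogs 120 with Inception-V3 (L2-SP 89.4 vs. DELTA 88.7).
DELTA with attention is more accurate than DELTA without attention in 11 of the 12 settings and ties in the remaining one (Stanford Dogs 120 with Inception-V3, 88.7 for both).
With data augmentation, DELTA improves further and remains the best of the compared methods.
In a case study, DELTA produces activations that focus more discriminatively on the key image regions, which supports the unactivated-channel reuse hypothesis.
Under the evaluated scheduler, DELTA converges faster than L2-SP and shows a smoother learning curve.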
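
The comparisons in the table can be tallied mechanically. The short script below copies the reported top-1 accuracies and lists the settings where DELTA is not strictly ahead of both L2 and L2-SP, and the settings where attention gives no strict gain over DELTA without attention. The variable names are illustrative only; the numbers are the ones in the table.

```python
# Top-1 accuracy (%) per setting: (L2, L2-SP, DELTA without attention, DELTA)
results = {
    ("MIT Indoors 67", "ResNet-101"): (83.7, 85.1, 85.3, 85.5),
    ("Stanford Dogs 120", "ResNet-101"): (83.3, 88.3, 88.3, 88.7),
    ("Caltech 256-30", "ResNet-101"): (84.7, 85.4, 85.7, 86.6),
    ("Caltech 256-60", "ResNet-101"): (87.2, 87.2, 87.6, 88.7),
    ("CUB-200-2011", "ResNet-101"): (78.4, 79.5, 78.9, 80.5),
    ("Food-101", "ResNet-101"): (85.3, 86.4, 85.9, 86.3),
    ("MIT Indoors 67", "Inception-V3"): (74.8, 74.6, 76.9, 78.1),
    ("Stanford Dogs 120", "Inception-V3"): (88.6, 89.4, 88.7, 88.7),
    ("Caltech 256-30", "Inception-V3"): (83.6, 83.3, 83.4, 84.9),
    ("Caltech 256-60", "Inception-V3"): (85.8, 85.3, 85.1, 86.8),
    ("CUB-200-2011", "Inception-V3"): (74.3, 75.2, 74.5, 76.5),
    ("Food-101", "Inception-V3"): (76.9, 75.9, 76.2, 80.8),
}

# Settings where DELTA does not strictly beat both weight-based baselines.
not_best = [key for key, (l2, l2sp, _, delta) in results.items()
            if delta <= max(l2, l2sp)]

# Settings where attention gives no strict improvement.
no_attention_gain = [key for key, (_, _, no_att, delta) in results.items()
                     if delta <= no_att]

print(len(results) - len(not_best), "of", len(results), "settings won by DELTA")
print("exceptions:", not_best)
print("no gain from attention:", no_attention_gain)
```

Running it reports 10 of 12 settings won by DELTA, with Food-101 on ResNet-101 and Stanford Dogs 120 on Inception-V3 as the exceptions, and Stanford Dogs 120 on Inception-V3 as the only setting without a gain from attention.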

This review was generated by AI and checked by a human editor.