QUICK REVIEW

[Paper Review] Distilling Object Detectors via Decoupled Features

Jianyuan Guo, Kai Han|arXiv (Cornell University)|Mar 26, 2021

Advanced Neural Network Applications|References: 65|Citations: 26

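One-Line Summary

This paper introduces DeFeat, which improves knowledge transfer from a teacher model to a student model by decoupling feature regions (object vs. background) and by decoupling positive from negative RoI proposals before distillation. It improves both one-stage and two-stage detectors on COCO and VOC.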

ABSTRACT

Knowledge distillation is a widely used paradigm for inheriting information from a complicated teacher network to a compact student network and maintaining the strong performance. Different from image classification, object detectors are much more sophisticated with multiple loss functions in which features that semantic information rely on are tangled. In this paper, we point out that the information of features derived from regions excluding objects are also essential for distilling the student detector, which is usually ignored in existing approaches. In addition, we elucidate that features from different regions should be assigned with different importance during distillation. To this end, we present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector. Specifically, two levels of decoupled features will be processed for embedding useful information into the student, i.e., decoupled features from neck and decoupled proposals from classification head. Extensive experiments on various detectors with different backbones show that the proposed DeFeat is able to surpass the state-of-the-art distillation methods for object detection. For example, DeFeat improves ResNet50 based Faster R-CNN from 37.4% to 40.9% mAP, and improves ResNet50 based RetinaNet from 36.5% to 39.7% mAP on COCO benchmark. Our implementation is available at https://github.com/ggjy/DeFeat.pytorch.

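Motivation and Objectives

Improve knowledge transfer for object detectors by taking into account intermediate features from both object and background regions.
Show that background regions can complement object regions, challenging the assumption that background is useless for distillation.
Propose a decoupled feature distillation framework (DeFeat) that operates on neck (FPN) features and on RoI-aligned proposals.
Demonstrate effectiveness on the COCO and VOC datasets across two-stage and one-stage detectors and multiple backbones.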

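Proposed Method

Use ground-truth masks to split intermediate FPN features into object and background regions, and apply a separate imitation loss to each (Eq. 5).
Split the region proposals in the classification head into positive (object) and negative (background) proposals, and distill each group with its own KL-divergence loss (Eq. 8, Eq. 9).
Train end to end on the combination of decoupled feature distillation (L_fea), decoupled classification distillation (L_cls), and the standard detection losses (L_reg, L_rpn) (Eq. 3).
Adopt a teacher-student setup with adaptive weighting (α_obj, α_bg, β_obj, β_bg) and temperature scaling (T_obj, T_bg) to balance gradient magnitudes (Eq. 5, Eq. 8).
Demonstrate applicability to both Faster R-CNN/FPN (two-stage) and RetinaNet (one-stage) on COCO and VOC.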
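
The two decoupled losses described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation (which is in PyTorch at the repository linked in the abstract). It assumes NumPy arrays, student and teacher features of identical shape (no adaptation layer), per-region averaging as the normalization, and hypothetical function names (`boxes_to_mask`, `decoupled_feature_loss`, `decoupled_cls_loss`). The hyperparameter values in the usage lines are placeholders, not values from the paper.

```python
import numpy as np


def boxes_to_mask(boxes, height, width, stride):
    """Binary mask on the feature grid: 1 inside any ground-truth box, else 0.

    boxes are (x1, y1, x2, y2) in image pixels; stride maps pixels to cells.
    """
    mask = np.zeros((height, width), dtype=np.float32)
    for x1, y1, x2, y2 in boxes:
        c1, r1 = int(np.floor(x1 / stride)), int(np.floor(y1 / stride))
        c2, r2 = int(np.ceil(x2 / stride)), int(np.ceil(y2 / stride))
        mask[max(r1, 0):min(r2, height), max(c1, 0):min(c2, width)] = 1.0
    return mask


def decoupled_feature_loss(student, teacher, mask, alpha_obj, alpha_bg):
    """Squared-error imitation on (C, H, W) features, split by region."""
    sq_err = (student - teacher) ** 2            # (C, H, W)
    n_obj = max(float(mask.sum()), 1.0)          # guard against empty regions
    n_bg = max(float((1.0 - mask).sum()), 1.0)
    loss_obj = (sq_err * mask).sum() / (2.0 * n_obj)
    loss_bg = (sq_err * (1.0 - mask)).sum() / (2.0 * n_bg)
    return alpha_obj * loss_obj + alpha_bg * loss_bg


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def decoupled_cls_loss(student_logits, teacher_logits, is_positive,
                       beta_obj, beta_bg, t_obj, t_bg):
    """KL(teacher || student), averaged separately over positive and
    negative proposals, each group with its own temperature and weight."""
    student_logits = np.asarray(student_logits, dtype=np.float64)
    teacher_logits = np.asarray(teacher_logits, dtype=np.float64)
    pos = np.asarray(is_positive, dtype=bool)

    def mean_kl(s, t, temperature):
        if len(s) == 0:                          # no proposals in this group
            return 0.0
        log_p_t = log_softmax(t, temperature)
        log_p_s = log_softmax(s, temperature)
        kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
        return float(kl.mean())

    loss_obj = mean_kl(student_logits[pos], teacher_logits[pos], t_obj)
    loss_bg = mean_kl(student_logits[~pos], teacher_logits[~pos], t_bg)
    return beta_obj * loss_obj + beta_bg * loss_bg


# Usage with random data; all weights and temperatures are placeholders.
rng = np.random.default_rng(0)
teacher_feat = rng.normal(size=(8, 16, 16))
student_feat = rng.normal(size=(8, 16, 16))
mask = boxes_to_mask([(32, 48, 96, 112)], height=16, width=16, stride=8)
l_fea = decoupled_feature_loss(student_feat, teacher_feat, mask,
                               alpha_obj=1.0, alpha_bg=0.5)
l_cls = decoupled_cls_loss(rng.normal(size=(5, 4)), rng.normal(size=(5, 4)),
                           is_positive=[True, False, False, True, False],
                           beta_obj=1.0, beta_bg=0.5, t_obj=1.0, t_bg=2.0)
```

Normalizing each term by the size of its own region or proposal group keeps the usually much larger background set from dominating the gradient, which is the balancing role the summary attributes to the separate weights and temperatures.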

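Experimental Results

Research Questions

RQ1 Do background regions of the neck features make knowledge distillation for object detection more effective?
RQ2 Does decoupling object/positive proposals from background/negative proposals in the classification head improve distillation performance?
RQ3 Is DeFeat applicable to both two-stage and one-stage detectors and to different backbones?
RQ4 Compared with prior KD methods, how large is the quantitative effect of decoupled features on the COCO and VOC benchmarks?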

Key Findings

Dataset	Teacher -> Student	Method	mAP	AP_S	AP_M	AP_L
COCO	R152-FPN -> R50-FPN	Baseline (no KD)	37.4	21.8	41.0	47.8
COCO	R152-FPN -> R50-FPN	FGFI	39.9	22.9	43.6	52.8
COCO	R152-FPN -> R50-FPN	TADF	40.1	23.0	43.6	53.0
COCO	R152-FPN -> R50-FPN	DeFeat (Decoupled-Neck)	40.4	23.4	44.4	53.1
COCO	R152-FPN -> R50-FPN	DeFeat (Decoupled-Neck + Decoupled-Cls)	40.8	23.5	44.8	53.3
COCO	R152-FPN -> R50-FPN	DeFeat (Backbone + Decoupled-Neck + Decoupled-Cls)	40.9	23.6	44.8	53.5
COCO	R152 -> R50 (RetinaNet)	Baseline (no KD)	36.5	20.9	40.2	47.0
COCO	R152 -> R50 (RetinaNet)	FGFI	38.9	21.9	42.5	52.2
COCO	R152 -> R50 (RetinaNet)	DeFeat	39.7	23.4	43.6	52.9
VOC	R152-FPN -> R50-FPN	Baseline (no KD)	80.53	-	-	-
VOC	R152-FPN -> R50-FPN	DeFeat	82.28	-	-	-

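On COCO, DeFeat improves ResNet50-FPN Faster R-CNN from 37.4% to 40.9% mAP and ResNet50-FPN RetinaNet from 36.5% to 39.7% mAP.
Decoupled neck features (object and background) yield a larger gain than decoupled proposals alone, and adding backbone distillation gives the best result (40.9% mAP on COCO).
Decoupling positive and negative proposals in the classification head balances gradients better than treating all proposals equally and yields higher mAP (e.g., 40.9% vs. 40.5% in some settings).
On Pascal VOC, DeFeat in the teacher-student setting raises the student from the 80.53% mAP baseline to 82.28% mAP.
DeFeat consistently outperforms the FGFI and TADF baselines across the COCO settings (e.g., 39.7% mAP with RetinaNet and 40.9% mAP with FPN).
Ablations show that both regions contribute to the overall gains: object regions drive localization, and background regions reduce false positives.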
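
To make the size of these improvements explicit, the short sketch below recomputes the absolute mAP gains from the COCO rows of the table above. The numbers are copied from that table; the setting labels and the helper name `gain_over` are shorthand introduced here for illustration.

```python
# (setting, method) -> COCO mAP, copied from the results table in this review.
coco_map = {
    ("Faster R-CNN R50-FPN", "Baseline"): 37.4,
    ("Faster R-CNN R50-FPN", "FGFI"): 39.9,
    ("Faster R-CNN R50-FPN", "TADF"): 40.1,
    ("Faster R-CNN R50-FPN", "DeFeat"): 40.9,
    ("RetinaNet R50", "Baseline"): 36.5,
    ("RetinaNet R50", "FGFI"): 38.9,
    ("RetinaNet R50", "DeFeat"): 39.7,
}


def gain_over(setting, method, reference="Baseline"):
    """Absolute mAP difference (percentage points) between two methods."""
    diff = coco_map[(setting, method)] - coco_map[(setting, reference)]
    return round(diff, 1)


for setting in ("Faster R-CNN R50-FPN", "RetinaNet R50"):
    print(setting,
          "DeFeat vs. baseline:", gain_over(setting, "DeFeat"),
          "DeFeat vs. FGFI:", gain_over(setting, "DeFeat", "FGFI"))
```

Read this way, DeFeat's margin over the strongest prior distillation method listed for each setting is under one point, while its margin over the undistilled student is more than three points.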

This review was generated by AI and checked by a human editor.