QUICK REVIEW

[Paper Review] Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation

Hang Gao, Xizhou Zhu|arXiv (Cornell University)|Oct 7, 2019

Advanced Neural Network Applications | References: 37 | Citations: 35

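One-Line Summary

Introduces Deformable Kernels (DKs), a drop-in convolutional operator that adapts the effective receptive field (ERF) directly by sampling in kernel space. This lets a network handle object deformation better without altering the data sampling grid.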

ABSTRACT

Convolutional networks are not aware of an object's geometric variations, which leads to inefficient utilization of model and data capacity. To overcome this issue, recent works on deformation modeling seek to spatially reconfigure the data towards a common arrangement such that semantic recognition suffers less from deformation. This is typically done by augmenting static operators with learned free-form sampling grids in the image space, dynamically tuned to the data and task for adapting the receptive field. Yet adapting the receptive field does not quite reach the actual goal -- what really matters to the network is the "effective" receptive field (ERF), which reflects how much each pixel contributes. It is thus natural to design other approaches to adapt the ERF directly during runtime. In this work, we instantiate one possible solution as Deformable Kernels (DKs), a family of novel and generic convolutional operators for handling object deformations by directly adapting the ERF while leaving the receptive field untouched. At the heart of our method is the ability to resample the original kernel space towards recovering the deformation of objects. This approach is justified with theoretical insights that the ERF is strictly determined by data sampling locations and kernel values. We implement DKs as generic drop-in replacements of rigid kernels and conduct a series of empirical studies whose results conform with our theories. Over several tasks and standard base models, our approach compares favorably against prior works that adapt during runtime. In addition, further experiments suggest a working mechanism orthogonal and complementary to previous works.

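Motivation and Objectives

Motivate improving how networks cope with object deformation by targeting the ERF rather than the theoretical receptive field.
Introduce Deformable Kernels (DKs), which resample the kernel space to adapt the ERF at inference time.
Provide two DK variants (global and local) and analyze their effects and their compatibility with existing methods.
Demonstrate DKs as drop-in replacements on image classification and object detection tasks, with improvements over the baselines.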

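Proposed Method

Define the ERF and analyze its dependence on data sampling locations and kernel values.
Formulate Deformable Kernels (DKs) as a resampling of the original kernel space via learned offsets: O_j = sum_{k∈K} I_{j+k} W_{k+Δk}.
Show that the ERF under DKs is given by ERF_DK(i; j) = sum_{k∈K} R^{(n)}(i; j+k, k+Δk).
Implement two DK variants: global DK (one offset grid per image) and local DK (offsets per output position), using bilinear interpolation for sub-pixel kernel sampling.
Discuss connections to and compatibility with Deformable Convolutions and Dynamic/Conditional Convolutions.
Evaluate DKs on ImageNet (classification) and COCO (object detection) with ResNet-50-DW and MobileNet-V2 backbones.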
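
The operator O_j = sum_k I_{j+k} W_{k+Δk} can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it handles a single channel, uses "valid" cross-correlation, and takes the offsets Δk as a plain input array rather than producing them with a learned generator. The function names, the choice to center the sampled kernel grid inside the kernel scope, and the clamping of sample coordinates to the scope boundary are all assumptions made for illustration.

```python
import numpy as np


def bilinear_sample(w, y, x):
    """Bilinearly interpolate the 2D array w at fractional (y, x).

    Coordinates are clamped to w's extent (an assumption of this sketch).
    """
    h, wd = w.shape
    y = float(np.clip(y, 0, h - 1))
    x = float(np.clip(x, 0, wd - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, wd - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * w[y0, x0] + (1 - wy) * wx * w[y0, x1]
            + wy * (1 - wx) * w[y1, x0] + wy * wx * w[y1, x1])


def deformable_kernel_conv(image, scope, offsets, ksize=3):
    """Single-channel sketch of O_j = sum_k I_{j+k} W_{k+dk}.

    image:   (H, W) input; the data sampling grid j+k is left untouched.
    scope:   (S, S) original kernel space W from which weights are resampled.
    offsets: (ksize, ksize, 2) for a global DK (one offset grid per image), or
             (H_out, W_out, ksize, ksize, 2) for a local DK (per output position).
    """
    h, w = image.shape
    s = scope.shape[0]
    h_out, w_out = h - ksize + 1, w - ksize + 1
    # A global offset grid is broadcast to every output position.
    offsets = np.broadcast_to(offsets, (h_out, w_out, ksize, ksize, 2))
    # Assumption: the ksize x ksize grid sits centered inside the scope.
    base = (s - ksize) / 2.0
    out = np.zeros((h_out, w_out))
    for jy in range(h_out):
        for jx in range(w_out):
            for ky in range(ksize):
                for kx in range(ksize):
                    dy, dx = offsets[jy, jx, ky, kx]
                    weight = bilinear_sample(scope, base + ky + dy, base + kx + dx)
                    out[jy, jx] += image[jy + ky, jx + kx] * weight
    return out


# Hypothetical usage: random offsets stand in for a learned offset generator.
rng = np.random.default_rng(0)
image = rng.normal(size=(6, 6))
scope = rng.normal(size=(4, 4))                      # 4x4 kernel scope
global_offsets = rng.uniform(-0.5, 0.5, size=(3, 3, 2))
local_offsets = rng.uniform(-0.5, 0.5, size=(4, 4, 3, 3, 2))
out_global = deformable_kernel_conv(image, scope, global_offsets)
out_local = deformable_kernel_conv(image, scope, local_offsets)
print(out_global.shape, out_local.shape)
```

With a 3x3 scope and zero offsets the sketch reduces to an ordinary rigid 3x3 cross-correlation, which is one way to read the "drop-in replacement" claim: only the kernel values seen at each position change, never the input pixels that are read.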

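Experimental Results

Research Questions

RQ1: Can directly adapting the ERF through learned kernel-space sampling, without changing the data sampling grid, improve robustness to object deformation?
RQ2: How do the global and local DK variants compare in accuracy and efficiency across backbones and tasks?
RQ3: Are DKs complementary to existing deformation or dynamic-inference methods (e.g., Deformable Convolutions, Conditional Convolutions) when combined with them?
RQ4: How does kernel scope size affect performance and optimization?
RQ5: Do DKs learn ERFs that align with object scale rather than semantics, and how does this affect their complementary behavior with other methods?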

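Key Findings

DKs improve on rigid kernels, and the gain depends on the kernel scope size; for example, a local DK with a 4x4 scope on ResNet-50-DW reaches 78.1% Top-1 accuracy (76.3% without DK).
Increasing the scope beyond 4x4 yields diminishing returns; the largest ImageNet gain in the ablations presented is roughly +1.4 Top-1 points.
In object detection, local DKs give consistent mAP gains over the baseline; for example, a local DK on ResNet-50-DW reaches 37.8 mAP (baseline 36.6), and 38.4 mAP when combined with a global DK.
DKs learn ERFs that are more localized to object regions and tend to be more scale-aware, which suggests dynamics complementary to semantics-focused methods such as Conditional Convolutions.
Combining DKs with Deformable Convolutions and Conditional Convolutions yields additive improvements, indicating an orthogonal and compatible working mechanism.
Across tasks, DKs are competitive with or better than strong baselines, in some configurations with fewer parameters (e.g., compared with SCC), and remain strongly compatible when combined with prior methods.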

This review was generated by AI and checked by a human editor.