QUICK REVIEW

[Paper Review] Dynamic Filter Networks

Bert De Brabandere, Xu Jia | arXiv (Cornell University) | May 31, 2016

Advanced Vision and Imaging | References: 16 | Citations: 476

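One-Line Summary

Dynamic Filter Networks generate filters conditioned on the input data, enabling sample-specific and position-specific filtering. They are effective on video and stereo prediction and reach state-of-the-art results on Moving MNIST with a much smaller model.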

ABSTRACT

In a traditional convolutional layer, the learned filters stay fixed after training. In contrast, we introduce a new framework, the Dynamic Filter Network, where filters are generated dynamically conditioned on an input. We show that this architecture is a powerful one, with increased flexibility thanks to its adaptive nature, yet without an excessive increase in the number of model parameters. A wide variety of filtering operations can be learned this way, including local spatial transformations, but also others like selective (de)blurring or adaptive feature extraction. Moreover, multiple such layers can be combined, e.g. in a recurrent architecture. We demonstrate the effectiveness of the dynamic filter network on the tasks of video and stereo prediction, and reach state-of-the-art performance on the moving MNIST dataset with a much smaller model. By visualizing the learned filters, we illustrate that the network has picked up flow information by only looking at unlabelled training data. This suggests that the network can be used to pretrain networks for various supervised tasks in an unsupervised way, like optical flow and depth estimation.

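Motivation and Objectives

Motivate learning sample-specific transformations that can handle diverse motion patterns and deformations.
Combine a filter-generating network with a dynamic filtering layer that applies the resulting sample-specific filters.
Explore dynamic convolution and dynamic local filtering as flexible, differentiable operations.
Demonstrate effectiveness on video prediction and stereo prediction.
Show the potential for unsupervised pretraining of flow- and depth-related representations.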

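Proposed Method

Introduces a two-part dynamic filter module: a filter-generating network and a dynamic filtering layer.
Dynamic convolution: generates filters per sample and applies them uniformly at every position of the input.
Dynamic local filtering: generates a separate filter for each position and applies it only there, making the filtering position-specific.
Leaves the filters unconstrained or loosely constrains them (e.g., with a softmax) to encourage sparse, less noisy filters.
Optionally adds a dynamic per-pixel bias.
Trains end to end with backpropagation and visualizes the learned filters to interpret motion/flow.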
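To make the difference between the two dynamic filtering modes concrete, here is a minimal pure-Python sketch. It assumes single-channel images stored as nested lists, zero padding, and k × k filters stored as flat lists of k*k taps. The filter-generating network (a trainable network in the paper) is replaced by hand-picked filter values, so `IDENTITY`, `FROM_LEFT`, and every helper name here are hypothetical. Like most deep-learning "convolutions", the sketch computes a correlation (no filter flip).

```python
import math

def softmax(logits):
    """Turn a flat list of filter logits into positive taps that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def read(img, y, x):
    """Zero padding: reads outside the image return 0.0."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return img[y][x]
    return 0.0

def apply_filter_at(img, taps, k, y, x):
    """Correlate one flat k*k filter with the neighborhood centered on (y, x)."""
    r = k // 2
    acc = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += taps[(dy + r) * k + (dx + r)] * read(img, y + dy, x + dx)
    return acc

def dynamic_convolution(img, taps, k):
    """Dynamic convolution: ONE generated filter (per sample) used at every position."""
    return [[apply_filter_at(img, taps, k, y, x) for x in range(len(img[0]))]
            for y in range(len(img))]

def dynamic_local_filtering(img, per_pixel_taps, k, bias=None):
    """Dynamic local filtering: a DIFFERENT generated filter at each position (y, x),
    plus an optional dynamic per-pixel bias."""
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            v = apply_filter_at(img, per_pixel_taps[y][x], k, y, x)
            if bias is not None:
                v += bias[y][x]
            row.append(v)
        out.append(row)
    return out

# Hypothetical stand-ins for the filter-generating network's output.
K = 3
IDENTITY = [1.0 if i == 4 else 0.0 for i in range(K * K)]   # tap at (dy=0, dx=0)
FROM_LEFT = [1.0 if i == 3 else 0.0 for i in range(K * K)]  # tap at (dy=0, dx=-1)

frame = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]

# Same filter everywhere: each pixel copies its left neighbor.
shifted = dynamic_convolution(frame, FROM_LEFT, K)

# Per-pixel filters: the left column keeps its value, the others copy from the left.
per_pixel = [[IDENTITY if x == 0 else FROM_LEFT for x in range(3)] for _ in range(3)]
local = dynamic_local_filtering(frame, per_pixel, K)

# A softmax-constrained filter that mostly, but not entirely, copies from the left.
soft = softmax([0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

With `FROM_LEFT` applied everywhere, the content translates one pixel to the right (`shifted` is `[[0, 1, 2], [0, 4, 5], [0, 7, 8]]`), while the local-filtering call moves only the two right-hand columns, a position-dependent behavior that one shared filter cannot express. In the actual model these taps would come from the filter-generating network, with the softmax keeping them positive and summing to 1.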

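Experimental Results

Research Questions

RQ1: Do filters generated dynamically, conditioned on the input, offer more flexibility than fixed convolutional filters?
RQ2: How do dynamic convolution and dynamic local filtering perform on video prediction and stereo prediction (disparity-based view synthesis)?
RQ3: Do dynamic filters learned without supervision encode motion/flow information that can serve as pretraining for other tasks?
RQ4: What parameter-efficiency advantages do dynamic filtering architectures have over conventional ones?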

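Key Findings

On Moving MNIST, DFN achieves state-of-the-art performance with far fewer parameters (637,361) than FC-LSTM (142,667,776) and Conv-LSTM (7,585,296), roughly 224× and 12× fewer, respectively.
The dynamic filters learned by the network capture motion patterns, enabling accurate frame prediction and separation of the moving digits.
Dynamic local filtering enables position-specific transformations that can model local deformations and photometric changes.
The dynamic filters can be visualized as flow-like maps learned from unlabeled data alone.
Applied to highway driving data, the model predicts structured features such as lanes and bridges, showing a degree of generalization.
In stereo prediction, horizontal filters capture depth-dependent flow (disparity), suggesting potential for unsupervised pretraining of depth estimation.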
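Both the flow-like visualizations and the stereo finding rest on reading a generated filter as a displacement. The sketch below makes one simple version of that reading concrete, under an assumption of ours (not necessarily the paper's exact visualization procedure): a softmax-normalized filter can be summarized by its tap-weighted average offset. `filter_to_flow` and `horizontal_filter_to_disparity` are hypothetical helper names, and the filter values are hand-picked rather than produced by a trained network.

```python
import math

def softmax(logits):
    """Turn a flat list of filter logits into positive taps that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def filter_to_flow(taps, k):
    """Summarize one softmax-normalized k x k filter as a (u, v) motion vector.

    Tap (dy, dx) means the output pixel copies the input at (y+dy, x+dx), so the
    content moves by (-dx, -dy); the flow is the tap-weighted average of that motion.
    """
    r = k // 2
    u = v = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            w = taps[(dy + r) * k + (dx + r)]
            u -= w * dx
            v -= w * dy
    return u, v

def horizontal_filter_to_disparity(taps):
    """Expected horizontal sampling offset (pixels) of a 1 x k filter.

    Its magnitude acts as a disparity proxy; its sign depends on which stereo
    view is synthesized from which.
    """
    r = len(taps) // 2
    return sum(w * (i - r) for i, w in enumerate(taps))

K = 3
FROM_LEFT = [1.0 if i == 3 else 0.0 for i in range(K * K)]  # copy from (dy=0, dx=-1)

flow_shift = filter_to_flow(FROM_LEFT, K)                  # content moves right
flow_blur = filter_to_flow(softmax([0.0] * (K * K)), K)    # uniform blur: no net motion
disparity = horizontal_filter_to_disparity(softmax([0.0, 0.0, 0.0, 3.0, 3.0]))
```

Coloring each pixel by its `(u, v)` vector gives a flow-like map; for the 1 x k horizontal filters of the stereo setting only the horizontal component is meaningful, which is why a single expected offset per pixel suffices there.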

This review was generated by AI and checked by a human editor.