QUICK REVIEW

[Paper Review] Top-Down Feedback for Crowd Counting Convolutional Neural Network

Deepak Babu Sam, R. Venkatesh Babu|arXiv (Cornell University)|Jul 24, 2018

Video Surveillance and Tracking Methods|Cited by 72

One-line summary

This paper introduces a top-down feedback mechanism that gates bottom-up CNN activations to correct the crowd density prediction, improving accuracy across the major crowd datasets.

ABSTRACT

Counting people in dense crowds is a demanding task even for humans. This is primarily due to the large variability in appearance of people. Often people are only seen as a bunch of blobs. Occlusions, pose variations and background clutter further compound the difficulty. In this scenario, identifying a person requires larger spatial context and semantics of the scene. But the current state-of-the-art CNN regressors for crowd counting are feedforward and use only limited spatial context to detect people. They look for local crowd patterns to regress the crowd density map, resulting in false predictions. Hence, we propose top-down feedback to correct the initial prediction of the CNN. Our architecture consists of a bottom-up CNN along with a separate top-down CNN to generate feedback. The bottom-up network, which regresses the crowd density map, has two columns of CNN with different receptive fields. Features from various layers of the bottom-up CNN are fed to the top-down network. The feedback, thus generated, is applied on the lower layers of the bottom-up network in the form of multiplicative gating. This masking weighs activations of the bottom-up network at spatial as well as feature levels to correct the density prediction. We evaluate the performance of our model on all major crowd datasets and show the effectiveness of top-down feedback.

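Motivation and objectives

Motivate the need for high-level scene context to correct density predictions when counting dense crowds.
Propose a two-path architecture consisting of a bottom-up density regressor and a top-down feedback generator.
Demonstrate that multiplicative gating driven by top-down feedback improves counting accuracy across multiple datasets.
Present ablations that examine the effectiveness and reliability of the feedback mechanism.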

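Proposed method

A bottom-up CNN regressor with two columns of different receptive fields predicts the density map.
A separate top-down CNN generates feedback from features taken from various layers of the bottom-up network.
The feedback is applied as multiplicative gating to the lower-layer activations of the bottom-up CNN.
Training is staged: the bottom-up CNN is trained first, and the top-down network is then trained with a count loss and L1 regularization on the gated features.
The final density map is produced after gating is applied; bottom-up training uses a standard L2 loss, and top-down training uses the count loss.
Evaluation uses MAE and MSE on four datasets, and density maps are downsampled to 1/4 resolution because of pooling.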
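
The gating step and the top-down training objective described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the sigmoid that turns feedback into gates, the squared form of the count loss, the `l1_weight` value, and all function names are assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    """Squash feedback values into (0, 1) so they can act as soft gates."""
    return 1.0 / (1.0 + np.exp(-x))


def apply_gating(bottom_up_features, feedback_logits):
    """Multiplicatively gate bottom-up activations with top-down feedback.

    Both arrays have shape (channels, height, width), so the gate weighs
    activations per feature channel and per spatial location.
    The sigmoid is an assumption; the review only says "multiplicative gating".
    """
    gate = sigmoid(feedback_logits)
    return bottom_up_features * gate


def top_down_loss(pred_density, gt_density, gated_features, l1_weight=1e-4):
    """Count loss plus an L1 penalty on the gated features (assumed form)."""
    # The count is the sum of the density map over all pixels.
    count_error = pred_density.sum() - gt_density.sum()
    count_loss = 0.5 * count_error ** 2
    # The L1 term pushes gated feature maps toward sparsity.
    l1_penalty = np.abs(gated_features).sum()
    return count_loss + l1_weight * l1_penalty


# Toy usage with random, non-negative "activations".
rng = np.random.default_rng(0)
features = rng.random((4, 8, 8))
feedback = rng.normal(size=(4, 8, 8))
gated = apply_gating(features, feedback)
```

Because each gate value lies strictly between 0 and 1, gating can only attenuate a non-negative activation, which is consistent with the idea of masking out spurious responses rather than creating new ones.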

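Experimental results

Research questions

RQ1: Can high-level scene context, supplied through a top-down module, reduce false detections in dense crowds?
RQ2: Does controlling bottom-up activations with multiplicative gating improve crowd density estimation over the baseline bottom-up CNN?
RQ3: Is the top-down feedback framework robust across datasets that differ in density and viewpoint?
RQ4: Is the top-down approach more parameter-efficient than other multi-column networks?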

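Key findings

TDF-CNN achieves lower MAE and MSE than many baselines on ShanghaiTech Part A and Part B, with fewer parameters.
In the ablation, the bottom-up CNN without feedback reaches an MAE of 147.4 on ShanghaiTech Part A, which drops to 97.5 with top-down feedback.
With a single 9×9 column, top-down feedback remains effective, reducing MAE by 21.4%.
The gated feature maps act as sparse masks, suppressing spurious activations while preserving valid responses.
On UCF_CC_50, TDF-CNN achieves an MAE of 354.7 and an MSE of 491.4 with 0.13M parameters, competitive with methods that have more parameters.
On WorldExpo’10, the average MAE and the per-scene MAE on several scenes are better, indicating that the benefit carries across datasets.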
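
The findings above are reported as MAE and MSE over per-image counts. The sketch below shows how these metrics are commonly computed in the crowd-counting literature, where the figure labelled "MSE" is usually the square root of the mean squared count error. Whether the reviewed paper uses exactly this definition is an assumption here, and the function names and toy counts are hypothetical.

```python
import math


def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    errors = [abs(p - g) for p, g in zip(pred_counts, gt_counts)]
    return sum(errors) / len(errors)


def mse(pred_counts, gt_counts):
    """Root of the mean squared count error (assumed meaning of 'MSE')."""
    squared = [(p - g) ** 2 for p, g in zip(pred_counts, gt_counts)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Toy counts, invented for illustration only.
predicted = [100, 210, 95]
ground_truth = [110, 200, 100]
toy_mae = mae(predicted, ground_truth)
toy_mse = mse(predicted, ground_truth)

# Relative MAE reduction implied by the two ablation figures quoted above.
part_a_reduction = relative_reduction(147.4, 97.5)
```

Applied to the two ShanghaiTech Part A figures quoted in the findings (147.4 and 97.5), `relative_reduction` gives a drop of roughly 33.9% in MAE.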

This review was generated by AI and checked by a human editor.