QUICK REVIEW

[Paper Review] Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras

Daniel Gehrig, Davide Scaramuzza|arXiv (Cornell University)|Nov 22, 2022

Advanced Memory and Neural Computing | Cited by: 22

One-Line Summary

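This paper introduces a scalable, efficient asynchronous graph neural network for event-based object detection. It substantially increases network depth and capacity while reducing per-event computation, and it achieves state-of-the-art accuracy among asynchronous methods on Gen1 and N-Caltech101 at far lower MFLOPS per event.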

ABSTRACT

State-of-the-art machine-learning methods for event cameras treat events as dense representations and process them with conventional deep neural networks. Thus, they fail to maintain the sparsity and asynchronous nature of event data, thereby imposing significant computation and latency constraints on downstream systems. A recent line of work tackles this issue by modeling events as spatiotemporally evolving graphs that can be efficiently and asynchronously processed using graph neural networks. These works showed impressive computation reductions, yet their accuracy is still limited by the small scale and shallow depth of their network, both of which are required to reduce computation. In this work, we break this glass ceiling by introducing several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation. On object detection tasks, our smallest model shows up to 3.7 times lower computation, while outperforming state-of-the-art asynchronous methods by 7.4 mAP. Even when scaling to larger model sizes, we are 13% more efficient than state-of-the-art while outperforming it by 11.5 mAP. As a result, our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass. This opens the door to efficient and accurate object detection in edge-case scenarios.
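To make "spatiotemporally evolving graphs that can be processed asynchronously" concrete, here is a minimal sketch of inserting events one at a time into a directed spatiotemporal graph. It assumes a brute-force radius search and timestamps already scaled to units comparable with pixels. Edges only point from earlier events to the new one (the directed-event-graph idea the review lists among the method's components), so an insertion never adds inputs to existing nodes. `EventGraph`, `radius`, and `insert` are hypothetical names, not the authors' API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EventGraph:
    """Directed spatiotemporal event graph built incrementally (illustrative only)."""
    radius: float                                 # hypothetical connection radius
    nodes: list = field(default_factory=list)     # (x, y, t) per event, t pre-scaled
    in_edges: dict = field(default_factory=dict)  # node id -> ids of its source nodes

    def insert(self, x: float, y: float, t: float) -> int:
        """Add one event and connect earlier events within `radius` to it.

        Edges only run from past nodes to the new node, so no existing node
        gains an input and only the new node needs a first-layer update.
        """
        new_id = len(self.nodes)
        sources = [
            i
            for i, (xi, yi, ti) in enumerate(self.nodes)
            if ti <= t and math.dist((xi, yi, ti), (x, y, t)) <= self.radius
        ]
        self.nodes.append((x, y, t))
        self.in_edges[new_id] = sources
        return new_id
```

For example, with `radius=1.5`, inserting `(0, 0, 0)`, `(1, 0, 0.5)` and `(5, 5, 1.0)` gives `in_edges == {0: [], 1: [0], 2: []}`: the second event connects to the first, the distant third event connects to nothing, and node 0 never gains an input. The linear scan costs O(N) per event; a practical version handling graphs of up to 50k nodes would likely use a spatial index instead.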

Research Motivation and Objectives

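Motivation: asynchronous GNNs for event cameras are efficient, but their accuracy is limited by the small, shallow networks needed to keep computation low; the goal is to enable deeper, higher-capacity models without sacrificing speed.
Propose architectural choices and computational techniques (pruning, early temporal aggregation, LUT-Spline Convolutions, directed event graphs) that improve accuracy while keeping per-event cost low.
Design and evaluate detectors at several sizes (nano, small, medium, large) to demonstrate scalability and efficiency.
Compare against dense and sparse state-of-the-art methods on the Gen1 and N-Caltech101 datasets, showing gains in both accuracy and efficiency.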

Proposed Method

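Represent events as directed spatiotemporal graphs with up to 50k nodes.
Use Look-up-Table Spline Convolutions (LUT-SCs) as the core message-passing operator.
Add early temporal aggregation via max pooling, which fuses information quickly and makes the use of LUT-SCs possible.
Prune node updates based on pooling, position rounding, and feature changes, skipping unnecessary computation (up to 73%).
Use directed event graphs (DEGs) at the input, which stabilize the model and improve performance at minimal cost.
Design a YOLOX-inspired multi-scale detection head that operates on the graph outputs and produces bounding boxes and class scores.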
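The LUT-SC idea can be illustrated with a toy version. A spline convolution computes each edge's weight matrix by B-spline interpolation of learned kernel weights at the edge's relative position. If positions are rounded to integers and neighbors lie within a fixed radius, only a finite set of offsets can occur, so the interpolated matrices can be computed once and then looked up per edge. This sketch assumes 2D integer positions (our assumption that early temporal aggregation has removed the time axis), degree-1 B-splines, mean aggregation, and no root weight or bias. `build_lut` and `lut_spline_conv` are hypothetical helpers; the paper's exact formulation may differ.

```python
import itertools


def linear_bspline(u: float, k: int) -> list[tuple[int, float]]:
    """Degree-1 B-spline basis on [0, 1] with k >= 2 control points.

    Returns the two (control-point index, weight) pairs that are nonzero at u.
    """
    pos = u * (k - 1)
    lo = min(int(pos), k - 2)
    frac = pos - lo
    return [(lo, 1.0 - frac), (lo + 1, frac)]


def build_lut(kernel, k: int, r: int) -> dict:
    """Precompute the interpolated weight matrix for every integer offset in [-r, r]^2.

    kernel[a][b] is a (c_out x c_in) matrix for control point (a, b); requires r >= 1.
    """
    c_out, c_in = len(kernel[0][0]), len(kernel[0][0][0])
    lut = {}
    for dx, dy in itertools.product(range(-r, r + 1), repeat=2):
        # Map the integer offset to pseudo-coordinates in [0, 1]^2.
        ux, uy = (dx + r) / (2 * r), (dy + r) / (2 * r)
        mat = [[0.0] * c_in for _ in range(c_out)]
        for a, wa in linear_bspline(ux, k):
            for b, wb in linear_bspline(uy, k):
                for o in range(c_out):
                    for i in range(c_in):
                        mat[o][i] += wa * wb * kernel[a][b][o][i]
        lut[(dx, dy)] = mat
    return lut


def lut_spline_conv(feats, positions, neighbors, lut):
    """Mean-aggregated message passing that looks weights up instead of interpolating."""
    c_out = len(lut[(0, 0)])
    out = []
    for node, sources in enumerate(neighbors):
        x0, y0 = positions[node]
        acc = [0.0] * c_out
        for j in sources:
            # Integer offset of the source node relative to the receiving node.
            mat = lut[(positions[j][0] - x0, positions[j][1] - y0)]
            for o in range(c_out):
                acc[o] += sum(w * f for w, f in zip(mat[o], feats[j]))
        out.append([a / len(sources) for a in acc] if sources else acc)
    return out
```

The per-edge cost drops to one dictionary lookup plus a matrix-vector product, which is the kind of saving LUT-SCs aim for. Offsets outside `[-r, r]` in either axis raise `KeyError`, so neighbor search must respect the same radius used to build the table.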

Experimental Results

Research Questions

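RQ1: How can asynchronous graph-based networks for event cameras be scaled in depth and capacity without a sharp increase in computation?
RQ2: Which architectural changes (e.g., pruning, early aggregation, LUT-SCs, DEGs) offer the best trade-off between accuracy and efficiency?
RQ3: Can asynchronous GNN-based detectors compete with dense and recurrent state-of-the-art methods on standard event datasets?
RQ4: How does model size (nano to large) affect mAP and MFLOPS/ev on Gen1 and N-Caltech101?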

Key Findings

Method	Async.	Gen1 mAP	Gen1 MFLOPS/ev	N-Caltech101 mAP	N-Caltech101 MFLOPS/ev
Inception+SSD [21]	✗	30.1	> 8’245*	-	-
Events+RRC [6]	✗	30.7	> 21’758	-	-
MatrixLSTM+YOLOv3 [5]	✗	31.0	> 34’519*	-	-
Events+YOLOv3 [24]	✗	31.2	> 34’518*	-	-
RED [38]	✗	40.0	4’712	-	-
ASTM-Net [26]	✗	46.7	> 21’758*	-	-
NVS-S [27]	✓	8.60	7.80	34.6	7.80
AsyNet [32]	✓	14.5	205	64.3	200
AEGNN [45]	✓	16.3	5.26	59.5	7.41
Spiking DenseNet [7]	✓	18.9	N/A	-	-
YOLE [4]	✓	-	-	39.8	3682
EAGR-N (ours)	✓	26.3	1.36	62.9	2.28
EAGR (ours)	✓	30.4	4.58	70.2	6.85
EAGR-M (ours)	✓	31.8	9.94	72.7	12.2
EAGR-L (ours)	✓	32.1	17.4	73.2	18.9
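The headline comparisons can be re-derived from the Gen1 columns of the table. This small script (the dictionary and function names are illustrative) recomputes the mAP gaps over the best prior asynchronous method and the compute saving of EAGR relative to the most efficient prior asynchronous method. Spiking DenseNet has no reported MFLOPS/ev, so it is excluded from the efficiency comparison only.

```python
# Gen1 columns from the table: method -> (mAP, MFLOPS/ev); None where the table says N/A.
GEN1 = {
    "NVS-S": (8.60, 7.80),
    "AsyNet": (14.5, 205.0),
    "AEGNN": (16.3, 5.26),
    "Spiking DenseNet": (18.9, None),
    "EAGR-N": (26.3, 1.36),
    "EAGR": (30.4, 4.58),
}
PRIOR_ASYNC = ["NVS-S", "AsyNet", "AEGNN", "Spiking DenseNet"]


def headline_numbers() -> dict:
    """Recompute the Gen1 comparisons against prior asynchronous methods."""
    best_map = max(GEN1[m][0] for m in PRIOR_ASYNC)
    # Most efficient prior method among those with a reported MFLOPS/ev.
    with_flops = [m for m in PRIOR_ASYNC if GEN1[m][1] is not None]
    ref = min(with_flops, key=lambda m: GEN1[m][1])
    return {
        "EAGR-N mAP gain": round(GEN1["EAGR-N"][0] - best_map, 1),
        "EAGR mAP gain": round(GEN1["EAGR"][0] - best_map, 1),
        "most efficient prior": ref,
        "EAGR compute saving %": round(100 * (1 - GEN1["EAGR"][1] / GEN1[ref][1])),
    }


print(headline_numbers())
# prints {'EAGR-N mAP gain': 7.4, 'EAGR mAP gain': 11.5, 'most efficient prior': 'AEGNN', 'EAGR compute saving %': 13}
```

These reproduce the 7.4 mAP, 11.5 mAP, and 13% figures quoted in the abstract, and show that the 13% / 11.5 mAP pair corresponds to the EAGR model in the table.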

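On Gen1, the smallest model (EAGR-N) outperforms the best prior asynchronous method by 7.4 mAP (26.3 vs. 18.9) while requiring up to 3.7x less computation.
The small model (EAGR) needs 13% less computation than the most efficient prior asynchronous method, AEGNN (4.58 vs. 5.26 MFLOPS/ev on Gen1), while exceeding the best prior asynchronous mAP by 11.5 (30.4 vs. 18.9).
The large model (EAGR-L) outperforms all other asynchronous methods and all dense methods except RED and ASTM-Net (40.0 and 46.7 mAP on Gen1), reaching 32.1 mAP on Gen1 and 73.2 mAP on N-Caltech101.
Asynchronous processing runs 3.7x faster than a dense GNN, taking only 8.4 ms per forward pass.
In the ablations, pruning via max pooling and early aggregation brings computation down to 4.58 MFLOPS/ev with only a small mAP loss.
LUT-SCs reduce computation by about 4.5x compared with a naive spline convolution implementation.
Directed event graphs add 1.8 mAP at a small computational cost.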
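Pruning via max pooling rests on a simple observation: a new node can only change a max-pooled output if it raises at least one channel's current maximum; otherwise every layer after the pool can skip the update. A minimal sketch, assuming insertion-only updates (no existing feature ever decreases) and a single pooling cell. `MaxPoolCell` and `count_skipped` are hypothetical names, and the method's other pruning criteria (position rounding, feature changes) are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class MaxPoolCell:
    """One output cell of a max-pooling layer, updated one node at a time."""
    channels: int
    pooled: list = field(default_factory=list)

    def __post_init__(self):
        # Start every channel at -inf so the first node always sets the max.
        if not self.pooled:
            self.pooled = [float("-inf")] * self.channels

    def add(self, feature):
        """Pool in a new node's feature; return True if the pooled output changed."""
        changed = False
        for c, value in enumerate(feature):
            if value > self.pooled[c]:
                self.pooled[c] = value
                changed = True
        return changed


def count_skipped(features, channels):
    """Return how many insertions left the pooled output unchanged, plus the final pool."""
    cell = MaxPoolCell(channels)
    skipped = 0
    for feature in features:
        if not cell.add(feature):
            skipped += 1  # output unchanged: downstream layers need no update
    return skipped, cell.pooled
```

For instance, `count_skipped([[1.0, 0.0], [0.5, -1.0], [0.2, 3.0], [1.0, 3.0]], channels=2)` returns `(2, [1.0, 3.0])`: the second and fourth nodes raise no channel, so their downstream updates can be pruned. If features could also decrease, the cell would have to recompute the max over its inputs instead of comparing against the running value.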


This review was generated by AI and checked by a human editor.