QUICK REVIEW

[Paper Review] Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras

Daniel Gehrig, Davide Scaramuzza | arXiv (Cornell University) | 2022-11-22

Advanced Memory and Neural Computing | Citations: 22

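One-line summary

The paper introduces a scalable, efficient asynchronous graph neural network (GNN) for event-based object detection that cuts per-event computation and greatly increases depth and capacity, while reaching state-of-the-art accuracy among asynchronous methods on Gen1 and N-Caltech101.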

ABSTRACT

State-of-the-art machine-learning methods for event cameras treat events as dense representations and process them with conventional deep neural networks. Thus, they fail to maintain the sparsity and asynchronous nature of event data, thereby imposing significant computation and latency constraints on downstream systems. A recent line of work tackles this issue by modeling events as spatiotemporally evolving graphs that can be efficiently and asynchronously processed using graph neural networks. These works showed impressive computation reductions, yet their accuracy is still limited by the small scale and shallow depth of their network, both of which are required to reduce computation. In this work, we break this glass ceiling by introducing several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation. On object detection tasks, our smallest model shows up to 3.7 times lower computation, while outperforming state-of-the-art asynchronous methods by 7.4 mAP. Even when scaling to larger model sizes, we are 13% more efficient than state-of-the-art while outperforming it by 11.5 mAP. As a result, our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass. This opens the door to efficient, and accurate object detection in edge-case scenarios.

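Motivation and goals

Close the efficiency gap of asynchronous GNNs for event cameras by enabling deep, high-capacity models without slowing them down.
Propose architectural and computational tricks, namely pruning, early temporal aggregation, LUT-Spline Convolutions, and directed event graphs, that raise accuracy while keeping per-event cost low.
Design and evaluate detectors at several scales (nano, small, medium, large) to demonstrate scalability and efficiency across regimes.
Demonstrate accuracy and efficiency gains over dense methods and sparse asynchronous methods on the Gen1 and N-Caltech101 datasets.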

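Proposed method

Represent events as a directed spatiotemporal graph with up to 50,000 nodes.
Use Look-up-Table Spline Convolutions (LUT-SCs) as the core message-passing operator.
Introduce early temporal aggregation via max pooling to enable fast information fusion and the deployment of LUT-SCs.
Skip unnecessary computation (up to 73%) by pruning node updates, guided by pooling, position rounding, and feature changes.
Introduce directed event graphs (DEGs) at the input stage to stabilize and improve performance at minimal cost.
Design a YOLOX-inspired multi-scale detection head that operates on the graph output and produces bounding boxes and class scores.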
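
The directed spatiotemporal graph in the first step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_directed_event_graph` and the parameters `radius`, `dt`, and `max_neighbors` (including their default values) are assumptions chosen for illustration. Each event becomes a node, and edges only point from earlier events to later ones.

```python
import numpy as np

def build_directed_event_graph(events, radius=3.0, dt=0.05, max_neighbors=8):
    """Connect each event to earlier events that are close in space and time.

    events: (N, 3) array of (x, y, t), sorted by ascending t.
    Returns an (E, 2) integer array of (src, dst) pairs with src < dst,
    i.e. every edge points from the past to the present.
    All thresholds are hypothetical values, not taken from the paper.
    """
    edges = []
    for j in range(len(events)):
        xj, yj, tj = events[j]
        past = events[:j]                      # only earlier events can be sources
        if len(past) == 0:
            continue
        close_t = (tj - past[:, 2]) <= dt      # within the time window
        close_xy = np.hypot(past[:, 0] - xj, past[:, 1] - yj) <= radius
        candidates = np.flatnonzero(close_t & close_xy)
        # keep only the most recent candidates to bound the node degree
        for i in candidates[-max_neighbors:]:
            edges.append((int(i), j))
    return np.array(edges, dtype=np.int64).reshape(-1, 2)

# Toy stream: three nearby events and one far-away event (index 2).
events = np.array([
    [0.0, 0.0, 0.00],
    [1.0, 0.0, 0.01],
    [10.0, 10.0, 0.02],
    [1.0, 1.0, 0.03],
])
edges = build_directed_event_graph(events)
```

In this sketch a new event only receives incoming edges from earlier events, so inserting it leaves the neighborhoods of all existing nodes unchanged.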

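Experimental results

Research questions

RQ1: How can depth and capacity be scaled in asynchronous graph-based networks for event cameras without a blow-up in computation?
RQ2: Which architectural changes (e.g., pruning, early aggregation, LUT-SCs, DEGs) give the best trade-off between accuracy and efficiency?
RQ3: Can asynchronous GNN-based detectors compete with dense and recurrent state-of-the-art methods on standard event datasets?
RQ4: How does model size (nano to large) affect mAP and MFLOPS/ev on Gen1 and N-Caltech101?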

Key results

Method	Asynchronous	Gen1 mAP	Gen1 MFLOPS/ev	N-Caltech101 mAP	N-Caltech101 MFLOPS/ev
Inception+SSD [21]	✗	30.1	> 8’245*	-	-
Events+RRC [6]	✗	30.7	> 21’758	-	-
MatrixLSTM+YOLOv3 [5]	✗	31.0	> 34’519*	-	-
Events+YOLOv3 [24]	✗	31.2	> 34’518*	-	-
RED [38]	✗	40.0	4’712	-	-
ASTM-Net [26]	✗	46.7	> 21’758*	-	-
NVS-S [27]	✓	8.60	7.80	34.6	7.80
AsyNet [32]	✓	14.5	205	64.3	200
AEGNN [45]	✓	16.3	5.26	59.5	7.41
Spiking DenseNet [7]	✓	18.9	N/A	-	-
YOLE [4]	✓	-	-	39.8	3682
EAGR-N (ours)	✓	26.3	1.36	62.9	2.28
EAGR (ours)	✓	30.4	4.58	70.2	6.85
EAGR-M (ours)	✓	31.8	9.94	72.7	12.2
EAGR-L (ours)	✓	32.1	17.4	73.2	18.9
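
The headline margins quoted in the abstract can be cross-checked against the Gen1 columns of the table. The short calculation below is only a sanity check; which baselines the abstract compares against is an assumption here (the highest-mAP and lowest-MFLOPS/ev prior asynchronous rows in the table).

```python
# (Gen1 mAP, Gen1 MFLOPS/ev) copied from the table; None where not reported.
gen1 = {
    "AEGNN": (16.3, 5.26),
    "Spiking DenseNet": (18.9, None),
    "EAGR-N": (26.3, 1.36),
    "EAGR": (30.4, 4.58),
}

# Assumed baselines: best prior asynchronous mAP and AEGNN's compute cost.
best_prior_map = max(gen1["AEGNN"][0], gen1["Spiking DenseNet"][0])
prior_cost = gen1["AEGNN"][1]

gain_nano = round(gen1["EAGR-N"][0] - best_prior_map, 1)       # mAP gain of EAGR-N
gain_small = round(gen1["EAGR"][0] - best_prior_map, 1)        # mAP gain of EAGR
saving_small = round(100 * (1 - gen1["EAGR"][1] / prior_cost)) # % less compute than AEGNN

print(gain_nano, gain_small, saving_small)  # 7.4 11.5 13
```

Under these assumptions the 7.4 mAP margin lines up with EAGR-N, and the 11.5 mAP and 13% figures line up with EAGR.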

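The smallest model uses up to 3.7 times less computation than state-of-the-art asynchronous methods while outperforming them by 7.4 mAP on Gen1.
A larger model is 13% more efficient than the state of the art while outperforming it by 11.5 mAP; in the table these figures match EAGR (4.58 MFLOPS/ev, 30.4 mAP on Gen1) rather than EAGR-M.
The large model outperforms all other asynchronous methods in the table and several dense baselines (Inception+SSD, Events+RRC, MatrixLSTM+YOLOv3, Events+YOLOv3), reaching 32.1 mAP on Gen1 and 73.2 mAP on N-Caltech101; the dense RED (40.0) and ASTM-Net (46.7) remain ahead on Gen1.
Asynchronous processing runs 3.7 times faster than a dense GNN, taking 8.4 ms per forward pass.
Pruning, combined with max pooling and early temporal aggregation, reduces computation to 4.58 MFLOPS/ev with negligible mAP loss; LUT-SCs give roughly a 4.5-fold computation reduction over a naive spline convolution implementation.
Directed event graphs provide a boost of 1.8 mAP at a comparatively small computational cost.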
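
One way to picture where a look-up table can save work in a spline convolution is the toy sketch below. It assumes that neighbor offsets take only a finite set of integer values, so the kernel matrix for each offset can be precomputed once and then indexed. The functions `spline_kernel`, `build_lut`, and `message`, the bilinear (degree-1) basis, and the 2D setting are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def spline_kernel(u, control):
    """Evaluate a degree-1 (bilinear) B-spline kernel at u in [0, 1]^2.

    control: (K, K, C_in, C_out) grid of trainable weight matrices.
    Returns the interpolated (C_in, C_out) weight matrix.
    """
    K = control.shape[0]
    p = u * (K - 1)
    lo = np.minimum(np.floor(p).astype(int), K - 2)  # lower grid corner
    f = p - lo                                        # fractional position
    W = np.zeros(control.shape[2:])
    for dx in (0, 1):
        for dy in (0, 1):
            b = (f[0] if dx else 1 - f[0]) * (f[1] if dy else 1 - f[1])
            W += b * control[lo[0] + dx, lo[1] + dy]
    return W

def build_lut(control, radius):
    """Precompute the kernel for every integer offset in [-radius, radius]^2."""
    size = 2 * radius + 1
    lut = np.empty((size, size) + control.shape[2:])
    for ox in range(-radius, radius + 1):
        for oy in range(-radius, radius + 1):
            u = (np.array([ox, oy]) + radius) / (2 * radius)
            lut[ox + radius, oy + radius] = spline_kernel(u, control)
    return lut

def message(lut, radius, offset, x_j):
    """Message from neighbor j: one table lookup plus one matrix product."""
    return x_j @ lut[offset[0] + radius, offset[1] + radius]

rng = np.random.default_rng(0)
control = rng.normal(size=(3, 3, 4, 2))   # K=3 grid, 4 input and 2 output channels
radius = 2
lut = build_lut(control, radius)

x_j = rng.normal(size=4)
offset = (1, -2)
out_lut = message(lut, radius, offset, x_j)

# Reference: evaluate the spline basis directly for the same offset.
u = (np.array(offset) + radius) / (2 * radius)
out_direct = x_j @ spline_kernel(u, control)
```

Both paths produce the same message; the table version replaces the per-edge basis evaluation with an index. The sketch makes no claim about reproducing the roughly 4.5-fold saving reported above.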


This review was generated by AI and reviewed by a human editor.