QUICK REVIEW

[Paper Review] Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks

Zheng Xu, Yexin Liu|arXiv (Cornell University)|Feb 17, 2023

Advanced Memory and Neural Computing|Citations: 39

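One-sentence summary

This survey provides a comprehensive taxonomy of deep learning methods for event-based vision, benchmarks the main DL approaches across both reconstruction/restoration and scene understanding, and discusses challenges and future directions, accompanied by an open-source repository.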

ABSTRACT

Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously and produce event streams encoding the time, pixel position, and polarity (sign) of the intensity changes. Event cameras possess a myriad of advantages over canonical frame-based cameras, such as high temporal resolution, high dynamic range, low latency, etc. Being capable of capturing information in challenging visual conditions, event cameras have the potential to overcome the limitations of frame-based cameras in the computer vision and robotics community. In very recent years, deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential. However, there is still a lack of taxonomies in DL techniques for event-based vision. We first scrutinize the typical event representations with quality enhancement methods as they play a pivotal role as inputs to the DL models. We then provide a comprehensive survey of existing DL-based methods by structurally grouping them into two major categories: 1) image/video reconstruction and restoration; 2) event-based scene understanding and 3D vision. We conduct benchmark experiments for the existing methods in some representative research directions, i.e., image reconstruction, deblurring, and object recognition, to identify some critical insights and problems. Finally, we have discussions regarding the challenges and provide new perspectives for inspiring more research studies.

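Motivation and objectives

Provide a comprehensive overview of the event representations used as DL inputs and of quality enhancement methods.
Classify DL methods into image reconstruction/restoration and scene understanding/3D vision.
Benchmark representative DL methods to identify performance insights and gaps.
Discuss challenges and future directions to guide further research on event-based DL.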

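Proposed approach

Classify event representations into six categories (image-based, surface-based, learning-based, voxel-based, graph-based, and spike-based) and analyze the task suitability of each.
Review quality enhancement techniques (denoising and super-resolution) for noisy, low-resolution event data.
Survey DL-based image/video restoration and event-guided SR/VSR approaches, comparing performance in terms of MSE, SSIM, LPIPS, and latency metrics.
Summarize DL pipelines for event-based scene understanding tasks such as classification, detection, tracking, semantic segmentation, and depth estimation.
Provide an open-source taxonomy and maintain up-to-date code links in a public repository.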
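
As an illustration of the image-based and voxel-based categories listed above, here is a minimal sketch in Python with NumPy. It is not taken from the survey: the function names events_to_frame and events_to_voxel_grid, the (t, x, y, p) row layout with polarity in {-1, +1}, and the nearest-bin time assignment are all assumptions made for this example.

```python
import numpy as np


def events_to_frame(events, height, width):
    """Image-based representation (assumed form): per-pixel sum of polarities.

    events: float array of shape (N, 4) with columns (t, x, y, p), p in {-1, +1}.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    x = events[:, 1].astype(np.int64)
    y = events[:, 2].astype(np.int64)
    p = events[:, 3].astype(np.float32)
    # np.add.at performs unbuffered addition, so repeated pixels accumulate.
    np.add.at(frame, (y, x), p)
    return frame


def events_to_voxel_grid(events, num_bins, height, width):
    """Voxel-based representation (assumed form): polarities summed per time bin."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 0]
    span = t.max() - t.min()
    # Normalize timestamps to [0, 1]; a zero span puts every event in bin 0.
    t_norm = np.zeros_like(t) if span == 0 else (t - t.min()) / span
    # Nearest-bin assignment; the last timestamp is clamped into the final bin.
    b = np.minimum((t_norm * num_bins).astype(np.int64), num_bins - 1)
    x = events[:, 1].astype(np.int64)
    y = events[:, 2].astype(np.int64)
    p = events[:, 3].astype(np.float32)
    np.add.at(grid, (b, y, x), p)
    return grid


if __name__ == "__main__":
    # Three toy events: two positive at pixel (x=1, y=0), one negative at (x=2, y=1).
    toy = np.array([[0.0, 1, 0, 1], [0.5, 1, 0, 1], [1.0, 2, 1, -1]])
    print(events_to_frame(toy, height=2, width=3))
    print(events_to_voxel_grid(toy, num_bins=2, height=2, width=3))
```

Summing the voxel grid over its time axis gives back the polarity frame, which shows the trade-off in this sketch: the frame discards event timing, while the voxel grid keeps a coarse version of it at the cost of a larger input tensor.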

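Experimental results

Research questions

RQ1: How should event data be represented or converted into DNN-friendly inputs?
RQ2: What advantages does deep learning offer over optimization-based methods when learning from events?
RQ3: Does effective event-based vision require very deep neural models?
RQ4: How should DL methods balance model complexity against the low latency and high temporal resolution of event cameras?
RQ5: Are convolution operations essential for filtering events, or might alternative architectures be better suited?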

Key findings

Method	Type	MSE	SSIM	LPIPS	Time
E2VID [90]	DL-based	0.069	0.395	0.438	0.2448 s
ECNN [91]	DL-based	0.056	0.416	0.442	0.2839 s
BTEB [92]	DL-based	0.090	0.357	0.520	0.4059 s
Tikhonov [89]	Model-based	0.121	0.356	0.485	0.4401 s
TV [89]	Model-based	0.113	0.386	0.502	4.0443 s
CNN [89]	DL-based	0.080	0.437	0.485	28.3904 s
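
The table is easier to read with the ranking direction made explicit. Below is a small Python sketch that picks the best method per column. The values are copied from the table; the usual convention is assumed (lower is better for MSE, LPIPS, and time; higher is better for SSIM), and the names RESULTS, METRICS, and best_per_metric are hypothetical helpers for this illustration.

```python
# Values copied from the table above: (MSE, SSIM, LPIPS, time in seconds).
RESULTS = {
    "E2VID [90]": (0.069, 0.395, 0.438, 0.2448),
    "ECNN [91]": (0.056, 0.416, 0.442, 0.2839),
    "BTEB [92]": (0.090, 0.357, 0.520, 0.4059),
    "Tikhonov [89]": (0.121, 0.356, 0.485, 0.4401),
    "TV [89]": (0.113, 0.386, 0.502, 4.0443),
    "CNN [89]": (0.080, 0.437, 0.485, 28.3904),
}

# Assumed convention: column index and whether the smallest or largest value wins.
METRICS = {"MSE": (0, min), "SSIM": (1, max), "LPIPS": (2, min), "Time": (3, min)}


def best_per_metric(results):
    """Return the method name with the best value for each metric."""
    best = {}
    for name, (col, pick) in METRICS.items():
        # min/max iterate over the method names and compare the chosen column.
        best[name] = pick(results, key=lambda method: results[method][col])
    return best


if __name__ == "__main__":
    for metric, method in best_per_metric(RESULTS).items():
        print(f"{metric}: {method}")
```

On these numbers, ECNN has the lowest MSE, CNN [89] has the highest SSIM, and E2VID has both the lowest LPIPS and the shortest time, so no single method leads on every column.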

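Provides a comprehensive taxonomy of event representations and quality enhancement methods for DL-based event vision.
Summarizes DL-based approaches to image/video restoration and event-guided SR/VSR, highlighting their strengths and limitations.
Summarizes DL-based methods for scene understanding and 3D vision using event data.
Benchmark experiments, including object recognition and restoration, reveal practical insights and remaining challenges.
Introduces an open-source repository with the taxonomy and code links to support ongoing research.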

This review was generated by AI and checked by a human editor.