QUICK REVIEW

[Paper Review] Efficient Event Camera Volume System

Juan Camilo Soto, Ian Wilfred Noronha|arXiv (Cornell University)|Mar 16, 2026

Advanced Memory and Neural Computing | Citations: 0

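One-Line Summary

EECVS models event streams as continuous-time Dirac impulse trains, adaptively selects among DCT, DTFT, and DWT according to event density, and prunes coefficients. The result is artifact-free compression with real-time deployment and strong cross-dataset generalization.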

ABSTRACT

Event cameras promise low latency and high dynamic range, yet their sparse output challenges integration into standard robotic pipelines. We introduce EECVS (Efficient Event Camera Volume System), a novel framework that models event streams as continuous-time Dirac impulse trains, enabling artifact-free compression through direct transform evaluation at event timestamps. Our key innovation combines density-driven adaptive selection among DCT, DTFT, and DWT transforms with transform-specific coefficient pruning strategies tailored to each domain's sparsity characteristics. The framework eliminates temporal binning artifacts while automatically adapting compression strategies based on real-time event density analysis. On EHPT-XC and MVSEC datasets, our framework achieves superior reconstruction fidelity with DTFT delivering the lowest earth mover distance. In downstream segmentation tasks, EECVS demonstrates robust generalization. Notably, our approach demonstrates exceptional cross-dataset generalization: when evaluated with EventSAM segmentation, EECVS achieves mean IoU 0.87 on MVSEC versus 0.44 for voxel grids at 24 channels, while remaining competitive on EHPT-XC. Our ROS2 implementation provides real-time deployment with DCT processing achieving 1.5 ms latency and 2.7× higher throughput than alternative transforms, establishing the first adaptive event compression framework that maintains both computational efficiency and superior generalization across diverse robotic scenarios.

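Motivation and Objectives

Motivate robust perception with event cameras in dynamic, high-contrast environments.
Develop an adaptive, transform-based compression framework that suits a wide range of scene densities.
Eliminate temporal binning artifacts by modeling events in continuous time.
Enable real-time deployment and evaluation across multiple robotics datasets.
Evaluate how well the compressed representation generalizes to downstream tasks.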

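Proposed Method

Model the event stream as a continuous-time Dirac impulse train, which avoids temporal binning artifacts.
Introduce density-driven selection among DCT, DTFT, and DWT based on the event density within each window.
Compute coefficients as inner products between the Dirac impulse model and the transform atoms: $c_{w,k} = \sum_i p_i \, \phi_k(t_i)$.
Retain a fixed budget of M coefficients per window with transform-specific pruning (DCT: keep the low frequencies; DTFT/DWT: keep the largest-magnitude coefficients).
Pack the retained coefficients into a dense representation for standard perception pipelines.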
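
The coefficient and pruning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that p_i is the polarity and t_i the timestamp of event i, and it uses an unnormalized cosine atom cos(πk(t − t0)/T) as a stand-in for a continuous-time DCT atom. The function names, the window parameters `t0` and `T`, and the atom normalization are all hypothetical.

```python
import math

def dct_atom(k, t, t0, T):
    """Cosine atom k evaluated at continuous time t inside the window [t0, t0 + T).

    Assumption: an unnormalized cos(pi * k * u) atom; the paper's exact atoms
    are not given in this review.
    """
    u = (t - t0) / T  # normalized time in [0, 1)
    return math.cos(math.pi * k * u)

def window_coefficients(timestamps, polarities, t0, T, num_atoms):
    """c_k = sum_i p_i * phi_k(t_i), evaluated directly at the event timestamps."""
    return [
        sum(p * dct_atom(k, t, t0, T) for t, p in zip(timestamps, polarities))
        for k in range(num_atoms)
    ]

def prune_low_frequency(coeffs, M):
    """DCT-style pruning: keep the first M (lowest-frequency) coefficients."""
    return {k: c for k, c in enumerate(coeffs[:M])}

def prune_top_magnitude(coeffs, M):
    """DTFT/DWT-style pruning: keep the M coefficients with the largest |c_k|."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    return {k: coeffs[k] for k in sorted(order[:M])}

# Two events in a 1-second window: +1 at t = 0.0 and -1 at t = 0.5.
coeffs = window_coefficients([0.0, 0.5], [1, -1], t0=0.0, T=1.0, num_atoms=4)
kept_low = prune_low_frequency(coeffs, M=2)
kept_top = prune_top_magnitude(coeffs, M=1)
```

Because the atoms are evaluated at the exact event times, no events are rounded into time bins before the transform, which is the property the continuous-time model is meant to provide.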

Figure 1: Event-to-dense representation in EECVS. Incoming event streams are processed within the framework and converted into compact dense representations through the application of DCT, DTFT, or DWT.

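Experimental Results

Research Questions

RQ1: Can adaptive transform selection based on real-time event density improve the compression quality and efficiency of event camera streams?
RQ2: How well do DCT, DTFT, and DWT preserve temporal fidelity and spatial detail in sparse, moderate, and dense event regimes?
RQ3: Do density-driven compression and the resulting representations generalize well across downstream tasks and datasets?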

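Key Findings

DTFT achieves the lowest earth mover distance (EMD) in most reconstruction scenarios (8 of 9).
DCT processing has the lowest latency (1.5 ms) and roughly 2.7× higher throughput than DTFT or DWT at M=8 coefficients.
At 24 channels on MVSEC, EECVS reaches a mean IoU of 0.87 versus 0.44 for voxel grids, indicating strong cross-dataset generalization.
On EHPT-XC, EECVS remains competitive, staying within 7 points of the voxel-grid IoU while offering computational advantages.
DTFT provides robust temporal fidelity across diverse scenes, DWT is preferred for sparse patterns, and DCT is preferred for efficiency under high-density activity.
Overall IoU is stable at 0.82 for channel budgets of 16 and 24.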
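
For readers unfamiliar with the two metrics quoted above, the following Python sketch shows one common way to compute them. It is an assumption-laden illustration: the review does not state which distributions the paper compares with EMD, so `emd_1d` simply compares two equal-size sets of timestamps, and `iou` operates on flat binary masks. All function names are hypothetical.

```python
def emd_1d(a, b):
    """Earth mover's distance between two equal-size 1D samples.

    For equal-size samples with uniform weights, the optimal transport plan
    matches the sorted values, so EMD is the mean absolute difference of the
    sorted samples.
    """
    if len(a) != len(b) or not a:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def iou(pred, target):
    """Intersection over union of two binary masks given as flat 0/1 sequences."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0  # two empty masks count as a match

def mean_iou(pairs):
    """Average IoU over a list of (prediction, target) mask pairs."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)

# Reconstructed timestamps shifted by one unit relative to the originals.
shift_cost = emd_1d([0, 1, 2], [1, 2, 3])
overlap = iou([1, 1, 0, 0], [1, 0, 1, 0])
```

A lower EMD means the reconstructed event timing is closer to the original, while a higher mean IoU means the downstream segmentation masks agree more closely with the reference masks.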

Figure 2: Compression process for a single event window. Events are aggregated, transformed with a basis selected according to activity density, pruned by either low-frequency retention (DCT) or magnitude selection (DTFT/DWT), and packed into dense representations.
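
The selection and packing stages in this pipeline could look roughly like the Python sketch below. It is only a guess at the structure: the event-rate thresholds `sparse_max` and `dense_min` are hypothetical values chosen for illustration, the mapping (DWT for sparse, DTFT for moderate, DCT for dense windows) is inferred from the findings summarized in this review, and storing one coefficient per channel per pixel is an assumed layout.

```python
def select_transform(num_events, window_s, sparse_max=1e3, dense_min=1e5):
    """Choose a transform name from the event rate in events per second.

    The thresholds are hypothetical; the review does not report the values
    used in the paper.
    """
    rate = num_events / window_s
    if rate < sparse_max:
        return "DWT"   # sparse patterns
    if rate >= dense_min:
        return "DCT"   # dense activity, cheapest to compute
    return "DTFT"      # moderate density

def pack_dense(pixel_coeffs, height, width, M):
    """Pack per-pixel coefficient lists into an M x height x width volume.

    pixel_coeffs maps (x, y) to that pixel's retained coefficients; pixels
    without events stay at zero.
    """
    volume = [[[0.0] * width for _ in range(height)] for _ in range(M)]
    for (x, y), coeffs in pixel_coeffs.items():
        for k, c in enumerate(coeffs[:M]):
            volume[k][y][x] = c
    return volume

# A 10 ms window with 100 events, and one active pixel at x=1, y=0.
choice = select_transform(num_events=100, window_s=0.01)
volume = pack_dense({(1, 0): [0.5, -0.25, 9.0]}, height=2, width=3, M=2)
```

A fixed-size volume like this is what lets the output feed a standard dense perception model regardless of how many events arrived in the window.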


This review was generated by AI and checked by a human editor.