QUICK REVIEW

[Paper Review] Speck: A Smart event-based Vision Sensor with a low latency 327K Neuron Convolutional Neuronal Network Processing Pipeline

Ole Richter, Yannan Xing|arXiv (Cornell University)|Apr 13, 2023

Advanced Memory and Neural Computing | References: 49 | Cited by: 8

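One-Sentence Summary

Speck1 is a single-chip, asynchronous, event-driven vision sensor with a 327K-neuron sCNN pipeline that achieves 3.36 µs latency per event and high throughput on edge vision tasks.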

ABSTRACT

Edge computing solutions that enable the extraction of high-level information from a variety of sensors are in increasingly high demand. This is due to the increasing number of smart devices that require sensory processing for their application on the edge. To tackle this problem, we present a smart vision sensor System on Chip (SoC), featuring an event-based camera and a low-power asynchronous spiking Convolutional Neural Network (sCNN) computing architecture embedded on a single chip. By combining both sensor and processing on a single die, we can lower unit production costs significantly. Moreover, the simple end-to-end nature of the SoC facilitates small stand-alone applications as well as functioning as an edge node in larger systems. The event-driven nature of the vision sensor delivers high-speed signals in a sparse data stream. This is reflected in the processing pipeline, which focuses on optimising highly sparse computation and minimising latency for 9 sCNN layers to 3.36 μs for an incoming event. Overall, this results in an extremely low-latency visual processing pipeline deployed on a small form factor with a low energy budget and sensor cost. We present the asynchronous architecture, the individual blocks, and the sCNN processing principle and benchmark against other sCNN-capable processors.

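Research Motivation and Objectives

Address the growing demand for edge computing that delivers high-speed, low-power sensory processing at the edge.
Demonstrate a system that integrates an event-driven camera and a low-power asynchronous sCNN processor on a single chip.
Enable real-time vision tasks characterized by ultra-low latency and sparsity-driven computation.
Evaluate Speck1 against other sCNN processors in terms of latency, throughput, and energy efficiency.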

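Proposed Method

Design and implement a 128x128 event-driven vision pixel sensor with temporal contrast encoding on a single ASIC.
Develop a nine-layer asynchronous spiking CNN (sCNN) pipeline with in-place computation and 4-phase handshaking, using quasi-delay-insensitive (QDI) dual-rail (DR) encoding.
Integrate a star-topology network-on-chip (NoC) that routes address-event representation (AER) events to up to two destinations while minimizing collisions.
Create a sensor event pre-processing module for region-of-interest (ROI) selection, pooling, rotation/flipping, polarity filtering, and source mapping.
Implement convolution cores with kernel anchoring, address sweeping, and in-place neuron compute units with LIF-like dynamics.
Provide a readout core capable of multi-channel spike/class counting and time-based statistics, with an asynchronous-to-synchronous interface.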
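
To make the pre-processing and event-driven convolution steps more concrete, the following is a minimal software sketch, not the chip's actual logic. The names `preprocess_event` and `EventConvLayer`, the parameter names, the reset-to-zero rule, and the omission of leak are all illustrative assumptions; the source only states that the hardware uses kernel anchoring, address sweeping, and in-place neuron updates with LIF-like dynamics.

```python
import numpy as np


def preprocess_event(x, y, polarity, roi=(0, 0, 128, 128), pool=1,
                     keep_polarity=None, flip_x=False, width=128):
    """Hypothetical pre-processing: return remapped (x, y, polarity) or None."""
    x0, y0, x1, y1 = roi
    if not (x0 <= x < x1 and y0 <= y < y1):
        return None                        # event lies outside the ROI
    if keep_polarity is not None and polarity != keep_polarity:
        return None                        # polarity filter drops the event
    if flip_x:
        x = width - 1 - x                  # mirror the address horizontally
    return x // pool, y // pool, polarity  # pooling as integer address division


class EventConvLayer:
    """Event-driven 'same' convolution with integrate-and-fire neurons.

    Assumptions (not from the source): square feature maps, stride 1,
    no leak, and reset-to-zero after a spike.
    """

    def __init__(self, kernel, in_size, threshold=1.0):
        self.kernel = np.asarray(kernel, dtype=float)  # (out_ch, in_ch, k, k)
        self.k = self.kernel.shape[-1]
        self.size = in_size
        self.threshold = threshold
        # One membrane state per output neuron, updated in place per event.
        self.v = np.zeros((self.kernel.shape[0], in_size, in_size))

    def process(self, ch, x, y):
        """Integrate one input event and return output events (c, x, y)."""
        half = self.k // 2
        spikes = []
        # Sweep the kernel footprint around the incoming event address.
        for ky in range(self.k):
            for kx in range(self.k):
                ox, oy = x + half - kx, y + half - ky
                if not (0 <= ox < self.size and 0 <= oy < self.size):
                    continue               # footprint falls off the map edge
                self.v[:, oy, ox] += self.kernel[:, ch, ky, kx]
                fired = np.nonzero(self.v[:, oy, ox] >= self.threshold)[0]
                for c in fired:
                    spikes.append((int(c), ox, oy))
                    self.v[c, oy, ox] = 0.0  # assumed reset-to-zero
        return spikes
```

In this sketch, work is done only when an event arrives, and each event touches at most k x k output neurons per output channel. That is the sparsity property the pipeline is described as exploiting; a frame-based convolution would visit every output position regardless of activity.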

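Experimental Results

Research Questions

RQ1: Can a system that integrates an event-driven sensor with a dedicated sCNN pipeline achieve ultra-low latency on real-time vision tasks while maintaining practical accuracy?
RQ2: In edge vision applications, how does an asynchronous event-driven architecture differ from frame-based CNN accelerators in latency, throughput, and energy?
RQ3: How do on-chip synaptic memory and on-the-fly convolution kernel computation affect area and power for sparse event processing?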

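Key Findings

With a nine-layer convolution and pooling sCNN, Speck1 achieves a latency of 3.36 μs per event, measured at the ASIC input/output boundary.
The architecture supports a throughput of ≈30 M events/s per neuron compute unit, enabling high parallelism on sparse event streams.
On the spike-converted NMNIST benchmark, Speck1 reaches an on-chip accuracy of 86.17% (ANN2SNN) and 98.56% (offline BPTT-CNN training scheme).
Compared with frame-based CNN accelerators, Speck1 offers significantly lower latency for event input and processes in real time without batching full frames.
Compared with some larger-scale SNN processors (such as Loihi1/2), the synaptic memory and on-the-fly convolution kernels achieve higher synaptic utilization at a similar or lower area/energy budget.
In the evaluated configurations, Speck1's inference energy is about 141 μJ (ANN2SNN) and 180 μJ (BPTT-CNN).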
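
As a rough sanity check on the figures above, the short calculation below derives two quantities that are not reported in this review: an average latency per layer and the time budget per event at the quoted throughput. It assumes the 3.36 μs is spread evenly over the nine layers, which the source does not claim, so the per-layer figure is only an average. The variable names are illustrative.

```python
# Figures quoted in this review.
LATENCY_US = 3.36                  # end-to-end latency per event, 9 layers
NUM_LAYERS = 9
THROUGHPUT_EVENTS_PER_S = 30e6     # per neuron compute unit (approximate)

# Assumption: latency is divided evenly across layers (average only).
avg_latency_per_layer_ns = LATENCY_US * 1e3 / NUM_LAYERS

# Time available per event at the quoted throughput.
event_period_ns = 1e9 / THROUGHPUT_EVENTS_PER_S

print(f"average latency per layer: {avg_latency_per_layer_ns:.1f} ns")
print(f"time budget per event:     {event_period_ns:.1f} ns")
```

Under these assumptions, the average works out to roughly 373 ns per layer, and a compute unit running at about 30 M events/s has roughly 33 ns per event.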


This review was generated by AI and reviewed by human editors.