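[Paper summary] Memristive tabular variational autoencoder for compression of analog data in high energy physics
The paper presents an edge-AI pipeline that uses a VAE to compress 48 analog calorimeter signals into a 4-dimensional latent space, distills the encoder into decision trees, and deploys it on a memristor-based ACAM, achieving 12x compression, 24 ns latency, a throughput of 330 million compressions per second, and 4.1 nJ per compression.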
We present an implementation of edge AI to compress data on an in-memory analog content-addressable memory (ACAM) device. A variational autoencoder is trained on a simulated sample of energy measurements from incident high-energy electrons on a generic three-layer scintillator-based calorimeter. The encoding part is distilled into tabular format by regressing the latent space variables using decision trees, which is then programmed on a memristor-based ACAM. In real-time, the ACAM compresses 48 continuously valued incoming energies measured by the calorimeter sensors into the latent space, achieving a compression factor of 12x, which is transmitted off-detector for decompression. The performance result of the ACAM, obtained using the Structural Simulation Toolkit, the SST open source framework, gives a latency value of 24 ns and a throughput of 330M compressions per second, i.e., 3 ns between successive inputs, and an average energy consumption of 4.1 nJ per compression.
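The headline figures in the abstract are related by simple arithmetic, sketched below. The compression factor is taken here as the ratio of value counts (48 inputs to 4 latent variables), which assumes equal width per value. The number of inputs in flight and the sustained power are derived quantities that the abstract does not state; they are shown only to make the pipelining explicit.

```python
# Figures quoted in the abstract.
N_INPUTS = 48                      # calorimeter sensor energies per event
N_LATENT = 4                       # latent variables sent off-detector
THROUGHPUT_HZ = 330e6              # compressions per second
ENERGY_PER_COMPRESSION_J = 4.1e-9  # average energy per compression
LATENCY_NS = 24.0                  # time for one input to pass through

compression_factor = N_INPUTS / N_LATENT
interval_ns = 1e9 / THROUGHPUT_HZ  # spacing between successive inputs

# Derived here, not stated in the paper: how many inputs overlap in the
# pipeline, and the power drawn if the device ran at full rate.
inputs_in_flight = LATENCY_NS / interval_ns
sustained_power_w = THROUGHPUT_HZ * ENERGY_PER_COMPRESSION_J

print(f"compression factor: {compression_factor:.0f}x")
print(f"input interval:     {interval_ns:.2f} ns")
print(f"inputs in flight:   {inputs_in_flight:.1f}")
print(f"sustained power:    {sustained_power_w:.2f} W")
```

The latency (24 ns) is longer than the input interval (about 3 ns) because the design is pipelined: a new input enters before the previous one has finished.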
Research motivation and goals
- Motivate and enable real-time, front-end compression of high-rate calorimeter data to reduce storage and bandwidth while preserving key physics observables.
- Develop an end-to-end pipeline that combines a neural encoder with a tree-based surrogate deployed on analog memory to enable in-memory compression.
- Quantify physics fidelity after compression and compare hardware performance against a fully digital FPGA baseline.
- Assess latency, throughput, and energy efficiency of the ACAM-based implementation across bit-precision regimes.
- Demonstrate that distillation to tabular representations does not significantly degrade physics observables.
Proposed method
- Train a variational autoencoder (VAE) on simulated ECAL shower data to learn a 4-dimensional latent representation from 48 input energy deposits.
- Distill the VAE encoder using boosted decision trees (BDT) to regress latent variables from the same inputs.
- Tabularize the regressed latent variables into root-to-leaf decision paths and map them to an analog content-addressable memory (ACAM) in a memristor crossbar with an SRAM periphery.
- Deploy the tabular encoder on ACAM for in-memory, row-parallel, analog range-compare inference, obtaining the compressed latent outputs for streaming data.
- Evaluate physics observables (E_tot, E_l, f_l, shower depth, and lateral widths) before and after compression to ensure preservation of shower characteristics.
- Compare ACAM performance against a digital FPGA implementation in terms of latency, throughput, and energy per compression.
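The tabularization and range-compare steps above can be illustrated with a minimal sketch. It flattens a single decision tree into one row per root-to-leaf path, where each row stores a lower and upper bound per input, and then emulates the row-parallel match in software. The toy tree, its thresholds, its leaf values, and the helper names `tree_to_rows`, `acam_lookup`, and `tree_predict` are all hypothetical and not taken from the paper; the real device performs the comparison in analog hardware over 48 inputs.

```python
import math

# Hypothetical toy tree over 2 inputs (the paper uses 48); thresholds and
# leaf values are made up for illustration.
# Internal node: (feature_index, threshold, left, right), where the left
# branch is taken when x[feature_index] < threshold. Leaf: a float.
TOY_TREE = (0, 0.5,
            (1, 0.2, -1.0, 0.3),
            (1, 0.7, 0.8, 1.5))

def tree_to_rows(node, n_features, bounds=None):
    """Flatten a tree into one (ranges, leaf_value) row per path.

    ranges[i] = (lo, hi) means the row matches when lo <= x[i] < hi.
    """
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * n_features
    if not isinstance(node, tuple):
        return [(list(bounds), float(node))]
    feat, thr, left, right = node
    lo, hi = bounds[feat]
    left_bounds = list(bounds)
    left_bounds[feat] = (lo, min(hi, thr))
    right_bounds = list(bounds)
    right_bounds[feat] = (max(lo, thr), hi)
    return (tree_to_rows(left, n_features, left_bounds)
            + tree_to_rows(right, n_features, right_bounds))

def acam_lookup(rows, x):
    """Software stand-in for the range compare: find the matching row."""
    matches = [value for ranges, value in rows
               if all(lo <= xi < hi for (lo, hi), xi in zip(ranges, x))]
    assert len(matches) == 1  # the paths of one tree partition the inputs
    return matches[0]

def tree_predict(node, x):
    """Ordinary tree traversal, used to check the table against the tree."""
    while isinstance(node, tuple):
        feat, thr, left, right = node
        node = left if x[feat] < thr else right
    return float(node)

rows = tree_to_rows(TOY_TREE, n_features=2)
sample = [0.6, 0.1]
print(len(rows), acam_lookup(rows, sample), tree_predict(TOY_TREE, sample))
```

For a boosted ensemble, one would typically build such a table per tree and sum the matching leaf values to obtain each latent variable; how the ACAM accumulates them is not described in this summary.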
Experimental results
Research questions
- RQ1: Can a VAE-encoded 4D latent representation faithfully compress ECAL shower data without significant loss of physics information?
- RQ2: Does distilling the VAE encoder into a tabular, ACAM-deployable form preserve performance and fidelity compared with the original neural encoder?
- RQ3: What are the latency, throughput, and energy implications of an ACAM-based, in-memory tabular encoder across different bit-precisions?
- RQ4: How does the ACAM-based approach compare with a digital FPGA baseline for front-end data compression in high energy physics?
- RQ5: Is the end-to-end pipeline robust for real-time streaming data at high channel counts and collision rates?
Key findings
| Input width (bits) | 4 | 8 | 16 | 32 |
|---|---|---|---|---|
| Output width (bits) | 16 | 16 | 16 | 16 |
| Clock speed (MHz) | 300 | 300 | 300 | 300 |
| Timing (clock ticks) | 13 | 13 | 13 | 13 |
| Latency (ns) | 43 | 43 | 43 | 43 |
| Resource utilization (LUT) | 146k | 146k | 197k | 365k |
| Register | 56k | 75k | 80k | 88k |
| Lutram | 2k | 2k | 2k | 2k |
| Slice | 12k | 26k | 34k | 63k |
| Lookahead8 | 2k | 2k | 2k | 2k |
| Energy per compression (nJ) | 20 | 41 | 50 | 74 |
- The VAE learns a 4D latent space from 48 inputs, giving an effective 12x compression while preserving shower structure.
- BDT distillation yields latent reconstructions that are statistically indistinguishable from those produced by the neural encoder for key observables.
- ACAM implementation achieves 24 ns core latency, and up to 330 million compressions per second in a pipelined design, with energy ~4.1 nJ per compression at optimal settings.
- The FPGA baseline shows higher latency and varying energy per compression with bit width, while ACAM shows substantially lower energy at low precision.
- Physics observables such as E_tot, layer energies, fractions, shower depth, and lateral widths are well reproduced after compression, with small tails and few-percent KS differences.
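A distribution-level comparison of this kind can be sketched as follows, assuming "KS differences" refers to the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs). The `ks_statistic` helper and the synthetic `original` and `reconstructed` samples are illustrative stand-ins, not the paper's code or data.

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both CDFs are step functions, so the largest gap occurs at a sample.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Synthetic stand-in for an observable such as E_tot: the "reconstructed"
# values are the originals with a made-up 1% multiplicative smearing.
random.seed(0)
original = [random.gauss(100.0, 5.0) for _ in range(2000)]
reconstructed = [e * random.gauss(1.0, 0.01) for e in original]

print(f"KS statistic: {ks_statistic(original, reconstructed):.3f}")
```

A statistic near 0 means the two distributions are nearly identical, and a value of 1 means they do not overlap at all.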
This summary was generated by AI and reviewed by a human editor.