QUICK REVIEW

[Paper Review] Memristive tabular variational autoencoder for compression of analog data in high energy physics

Rajat Gupta, Yuvaraj Elangovan|arXiv (Cornell University)|Feb 17, 2026

Advanced Memory and Neural Computing | Cited by 0

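One-sentence summary

The paper presents an edge-AI pipeline that uses a VAE to compress 48 analog calorimeter signals into a 4-dimensional latent space, distills the encoder into decision trees, and deploys it on a memristor-based ACAM, achieving 12x compression, 24 ns latency, a throughput of 330 million compressions per second, and 4.1 nJ per compression.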

ABSTRACT

We present an implementation of edge AI to compress data on an in-memory analog content-addressable memory (ACAM) device. A variational autoencoder is trained on a simulated sample of energy measurements from incident high-energy electrons on a generic three-layer scintillator-based calorimeter. The encoding part is distilled into tabular format by regressing the latent space variables using decision trees, which is then programmed on a memristor-based ACAM. In real-time, the ACAM compresses 48 continuously valued incoming energies measured by the calorimeter sensors into the latent space, achieving a compression factor of 12x, which is transmitted off-detector for decompression. The performance result of the ACAM, obtained using the Structural Simulation Toolkit, the SST open source framework, gives a latency value of 24 ns and a throughput of 330M compressions per second, i.e., 3 ns between successive inputs, and an average energy consumption of 4.1 nJ per compression.
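
The headline figures in the abstract are related by simple arithmetic, which the following minimal sketch spells out. The compression factor and the interval between inputs come straight from the quoted numbers. The number of inputs in flight and the average power are derived here under the assumption that the quoted latency, throughput, and average energy all hold simultaneously at the full sustained rate; the source does not state them.

```python
# Figures quoted in the summary and abstract.
N_INPUTS = 48            # calorimeter sensor energies per compression
N_LATENT = 4             # latent variables produced by the encoder
LATENCY_NS = 24.0        # ACAM latency for one compression
THROUGHPUT_HZ = 330e6    # compressions per second
ENERGY_NJ = 4.1          # average energy per compression

# 48 values in, 4 values out.
compression_factor = N_INPUTS / N_LATENT

# Time between successive inputs at the quoted throughput.
interval_ns = 1e9 / THROUGHPUT_HZ

# Derived (assumption: fully pipelined): inputs being processed at once.
in_flight = LATENCY_NS / interval_ns

# Derived (assumption: average energy applies at the full rate).
avg_power_w = ENERGY_NJ * 1e-9 * THROUGHPUT_HZ

print(f"compression factor : {compression_factor:.0f}x")
print(f"input interval     : {interval_ns:.2f} ns")
print(f"inputs in flight   : {in_flight:.2f}")
print(f"average power      : {avg_power_w:.3f} W")
```

The interval of about 3 ns is shorter than the 24 ns latency, so the quoted throughput only makes sense for a pipelined design in which roughly eight compressions overlap.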

Research motivation and goals

Motivate and enable real-time, front-end compression of high-rate calorimeter data to reduce storage and bandwidth while preserving key physics observables.
Develop an end-to-end pipeline that combines a neural encoder with a tree-based surrogate deployed on analog memory to enable in-memory compression.
Quantify physics fidelity after compression and compare hardware performance against a fully digital FPGA baseline.
Assess latency, throughput, and energy efficiency of the ACAM-based implementation across bit-precision regimes.
Demonstrate that distillation to tabular representations does not significantly degrade physics observables.

Proposed method

Train a variational autoencoder (VAE) on simulated ECAL shower data to learn a 4-dimensional latent representation from 48 input energy deposits.
Distill the VAE encoder using boosted decision trees (BDT) to regress latent variables from the same inputs.
Tabularize the regressed latent variables into root-to-leaf decision paths and map them to an analog content-addressable memory (ACAM) in a memristor crossbar with an SRAM periphery.
Deploy the tabular encoder on ACAM for in-memory, row-parallel, analog range-compare inference, obtaining the compressed latent outputs for streaming data.
Evaluate physics observables (E_tot, E_l, f_l, shower depth, and lateral widths) before and after compression to ensure preservation of shower characteristics.
Compare ACAM performance against a digital FPGA implementation in terms of latency, throughput, and energy per compression.
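
The tabularization and range-compare steps above can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the tree, its thresholds, and the helper names `tabularize` and `acam_lookup` are hypothetical, the tree has 3 inputs instead of 48, and the row-parallel analog search is emulated with a sequential loop. Each root-to-leaf path becomes one row of per-feature ranges, and an input matches a row when every feature falls inside that row's range.

```python
import math

# Hypothetical regression tree for ONE latent variable over 3 inputs.
# Internal node: (feature_index, threshold, left_subtree, right_subtree),
# where the left branch is taken when x[feature_index] < threshold.
# Leaf: a float (the regressed latent value).
TREE = (0, 0.5,
        (2, 1.0, -0.8, -0.1),
        (1, 2.0, 0.3, 1.2))

def tree_predict(node, x):
    """Walk the tree from root to leaf in the usual sequential way."""
    while isinstance(node, tuple):
        feat, thr, left, right = node
        node = left if x[feat] < thr else right
    return node

def tabularize(node, n_features, bounds=None):
    """Flatten a tree into rows of per-feature [low, high) ranges plus a leaf value."""
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * n_features
    if not isinstance(node, tuple):
        return [(list(bounds), node)]
    feat, thr, left, right = node
    lo, hi = bounds[feat]
    left_bounds = list(bounds)
    left_bounds[feat] = (lo, min(hi, thr))      # x[feat] < thr
    right_bounds = list(bounds)
    right_bounds[feat] = (max(lo, thr), hi)     # x[feat] >= thr
    return (tabularize(left, n_features, left_bounds)
            + tabularize(right, n_features, right_bounds))

def acam_lookup(rows, x):
    """Emulate a range-compare search: return the value of the single matching row."""
    matches = [value for ranges, value in rows
               if all(lo <= xi < hi for (lo, hi), xi in zip(ranges, x))]
    if len(matches) != 1:
        raise ValueError("rows from one tree must match exactly once")
    return matches[0]

ROWS = tabularize(TREE, n_features=3)
sample = [0.7, 1.5, 0.0]
print(len(ROWS), acam_lookup(ROWS, sample), tree_predict(TREE, sample))
# prints: 4 0.3 0.3
```

Because the rows of a single tree partition the input space, exactly one row matches any input, which is what lets a content-addressable lookup replace the sequential tree walk. For a boosted ensemble, one would expect each tree to contribute its own matched row and the values to be summed; how the paper handles that step on the ACAM is not described in this summary.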

Experimental results

Research questions

RQ1: Can a VAE-encoded 4D latent representation faithfully compress ECAL shower data without significant loss of physics information?
RQ2: Does distilling the VAE encoder into a tabular, ACAM-deployable form preserve performance and fidelity compared with the original neural encoder?
RQ3: What are the latency, throughput, and energy implications of an ACAM-based, in-memory tabular encoder across different bit-precisions?
RQ4: How does the ACAM-based approach compare with a digital FPGA baseline for front-end data compression in high energy physics?
RQ5: Is the end-to-end pipeline robust for real-time streaming data at high channel counts and collision rates?

Key findings

Input width (bits)	4	8	16	32
Output width (bits)	16	16	16	16
Clock speed (MHz)	300	300
4	16	43	146k	56k	2k	12k	2k	2k	---	---	120	-
8	16	43	146k	75k	2k	26k	2k	2k	---	---	110	-
16	16	43	197k	80k	2k	34k	2k	2k	---	---	100	-
32	16	43	365k	88k	2k	63k	2k	2k	---	---	74	-

The VAE learns a 4D latent space from 48 inputs, giving an effective 12x compression while preserving shower structure.
BDT distillation yields latent reconstructions that are statistically indistinguishable from those produced by the neural encoder for key observables.
ACAM implementation achieves 24 ns core latency, and up to 330 million compressions per second in a pipelined design, with energy ~4.1 nJ per compression at optimal settings.
The FPGA baseline shows higher latency and varying energy per compression with bit width, while ACAM shows substantially lower energy at low precision.
Physics observables such as E_tot, layer energies, fractions, shower depth, and lateral widths are well reproduced after compression, with small tails and few-percent KS differences.
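
The "KS differences" mentioned above refer to the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distributions of an observable before and after compression. The sketch below shows how such a comparison could be set up. Everything about the data is an assumption for illustration: the 3 x 16 cell layout, the exponential toy showers, and the `smear` function standing in for the encode-decode round trip are hypothetical and are not the paper's geometry, simulation, or model.

```python
import random
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: max gap between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for v in a + b:  # the maximum is attained at one of the data points
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

random.seed(0)
N_LAYERS, CELLS_PER_LAYER = 3, 16  # hypothetical layout giving 48 cells

def toy_shower():
    """Toy stand-in for one simulated shower: 48 positive cell energies."""
    return [random.expovariate(1.0) for _ in range(N_LAYERS * CELLS_PER_LAYER)]

def smear(shower, rel=0.02):
    """Stand-in for compress-then-decompress: a small multiplicative distortion."""
    return [max(0.0, e * random.gauss(1.0, rel)) for e in shower]

def e_tot(shower):
    """Total deposited energy."""
    return sum(shower)

def layer_fraction(shower, layer):
    """Fraction of the total energy deposited in one layer."""
    start = layer * CELLS_PER_LAYER
    return sum(shower[start:start + CELLS_PER_LAYER]) / sum(shower)

original = [toy_shower() for _ in range(2000)]
decoded = [smear(s) for s in original]

ks_e_tot = ks_statistic([e_tot(s) for s in original],
                        [e_tot(s) for s in decoded])
ks_f0 = ks_statistic([layer_fraction(s, 0) for s in original],
                     [layer_fraction(s, 0) for s in decoded])
print(f"KS(E_tot) = {ks_e_tot:.3f}, KS(f_0) = {ks_f0:.3f}")
```

A statistic near 0 means the two distributions are nearly identical and a value of 1 means they do not overlap at all, so "few-percent" differences correspond to values of a few hundredths.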


This review was generated by AI and reviewed by human editors.