QUICK REVIEW

[Paper Review] Memristive tabular variational autoencoder for compression of analog data in high energy physics

Rajat Gupta, Yuvaraj Elangovan|arXiv (Cornell University)|Feb 17, 2026
Advanced Memory and Neural Computing | Cited by 0
One-sentence summary

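The paper presents an edge-AI pipeline that uses a VAE to compress 48 analog calorimeter signals into a 4-dimensional latent space, distills the encoder into decision trees, and deploys the result on a memristor-based ACAM, achieving 12x compression, 24 ns latency, a throughput of 330 million compressions per second, and 4.1 nJ per compression.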

ABSTRACT

We present an implementation of edge AI to compress data on an in-memory analog content-addressable memory (ACAM) device. A variational autoencoder is trained on a simulated sample of energy measurements from incident high-energy electrons on a generic three-layer scintillator-based calorimeter. The encoding part is distilled into tabular format by regressing the latent space variables using decision trees, which is then programmed on a memristor-based ACAM. In real-time, the ACAM compresses 48 continuously valued incoming energies measured by the calorimeter sensors into the latent space, achieving a compression factor of 12x, which is transmitted off-detector for decompression. The performance result of the ACAM, obtained using the Structural Simulation Toolkit, the SST open source framework, gives a latency value of 24 ns and a throughput of 330M compressions per second, i.e., 3 ns between successive inputs, and an average energy consumption of 4.1 nJ per compression.
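The headline figures in the abstract are related by simple arithmetic, which the short check below makes explicit. The 4-dimensional latent size comes from the summary rather than the abstract itself, and the number of inputs in flight is a derived quantity that assumes a steadily filled pipeline; it is not a figure reported by the paper.

```python
# Figures quoted in the abstract and summary.
n_inputs = 48        # calorimeter energies per event
n_latent = 4         # latent variables (from the summary)
latency_ns = 24.0    # time for one input to pass through the ACAM
interval_ns = 3.0    # time between successive inputs

# 48 values reduced to 4 values.
compression_factor = n_inputs / n_latent

# One compression every 3 ns is roughly 333 million per second,
# which the abstract rounds to 330M.
throughput_per_s = 1e9 / interval_ns

# Derived, assuming a steadily filled pipeline: inputs being processed at once.
inputs_in_flight = latency_ns / interval_ns

print(compression_factor, round(throughput_per_s / 1e6), inputs_in_flight)
```

The latency and the throughput are therefore not in conflict: a new input can enter every 3 ns even though each one takes 24 ns to complete.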

Motivation and objectives

  • Motivate and enable real-time, front-end compression of high-rate calorimeter data to reduce storage and bandwidth while preserving key physics observables.
  • Develop an end-to-end pipeline that combines a neural encoder with a tree-based surrogate deployed on analog memory to enable in-memory compression.
  • Quantify physics fidelity after compression and compare hardware performance against a fully digital FPGA baseline.
  • Assess latency, throughput, and energy efficiency of the ACAM-based implementation across bit-precision regimes.
  • Demonstrate that distillation to tabular representations does not significantly degrade physics observables.

Proposed method

  • Train a variational autoencoder (VAE) on simulated ECAL shower data to learn a 4-dimensional latent representation from 48 input energy deposits.
  • Distill the VAE encoder using boosted decision trees (BDT) to regress latent variables from the same inputs.
  • Tabularize the regressed latent variables into root-to-leaf decision paths and map them to an analog content-addressable memory (ACAM) in a memristor crossbar with an SRAM periphery.
  • Deploy the tabular encoder on ACAM for in-memory, row-parallel, analog range-compare inference, obtaining the compressed latent outputs for streaming data.
  • Evaluate physics observables (E_tot, E_l, f_l, shower depth, and lateral widths) before and after compression to ensure preservation of shower characteristics.
  • Compare ACAM performance against a digital FPGA implementation in terms of latency, throughput, and energy per compression.
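The tabularization and range-compare steps above can be sketched in software. The example below is a minimal illustration, not the authors' implementation: the tree, its thresholds, its leaf values, and the use of only three input features are all hypothetical, and `acam_lookup` is a plain Python stand-in for what the hardware would evaluate across all rows in parallel.

```python
import math

# Hypothetical regression tree for ONE latent variable over 3 inputs.
# Internal node: (feature_index, threshold, left_subtree, right_subtree)
# Leaf: a float holding the regressed latent value.
TREE = (0, 0.5,
        (2, 1.0, -0.8, -0.1),   # taken when x[0] < 0.5
        (1, 0.3, 0.4, 1.2))     # taken when x[0] >= 0.5
N_FEATURES = 3

def tree_predict(node, x):
    """Ordinary sequential tree traversal, used as the reference."""
    while isinstance(node, tuple):
        feat, thr, left, right = node
        node = left if x[feat] < thr else right
    return node

def tabularize(node, bounds=None):
    """Flatten each root-to-leaf path into one row of per-feature [lo, hi) ranges."""
    if bounds is None:
        # An unbounded range acts as a "don't care" entry for that feature.
        bounds = [(-math.inf, math.inf)] * N_FEATURES
    if not isinstance(node, tuple):
        return [(list(bounds), node)]
    feat, thr, left, right = node
    lo, hi = bounds[feat]
    left_bounds = list(bounds)
    left_bounds[feat] = (lo, min(hi, thr))
    right_bounds = list(bounds)
    right_bounds[feat] = (max(lo, thr), hi)
    return tabularize(left, left_bounds) + tabularize(right, right_bounds)

def acam_lookup(table, x):
    """A row matches when every input lies inside that row's range."""
    matches = [value for ranges, value in table
               if all(lo <= xi < hi for (lo, hi), xi in zip(ranges, x))]
    assert len(matches) == 1   # paths of one tree partition the input space
    return matches[0]

TABLE = tabularize(TREE)
sample = [0.7, 0.1, 2.0]
print(len(TABLE), tree_predict(TREE, sample), acam_lookup(TABLE, sample))
```

Because the rows of a single tree never overlap, exactly one row matches any finite input, so the lookup needs no sequential branching. For a boosted ensemble, one would presumably sum the matched leaf value from each tree's table, though this summary does not describe how the paper combines them.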

Experimental results

Research questions

  • RQ1: Can a VAE-encoded 4D latent representation faithfully compress ECAL shower data without significant loss of physics information?
  • RQ2: Does distilling the VAE encoder into a tabular, ACAM-deployable form preserve performance and fidelity compared with the original neural encoder?
  • RQ3: What are the latency, throughput, and energy implications of an ACAM-based, in-memory tabular encoder across different bit-precisions?
  • RQ4: How does the ACAM-based approach compare with a digital FPGA baseline for front-end data compression in high energy physics?
  • RQ5: Is the end-to-end pipeline robust for real-time streaming data at high channel counts and collision rates?
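RQ1 concerns the 48-to-4 VAE encoding. As a rough illustration of what such a model computes, the sketch below runs one forward pass of an untrained, single-layer VAE in plain Python. Only the sizes 48 and 4 come from the summary; the linear layers, the random weights, and the squared-error reconstruction term are assumptions made for illustration, and the paper's actual architecture and loss may differ.

```python
import math
import random

N_IN, N_LATENT = 48, 4      # sizes from the summary
rng = random.Random(0)      # fixed seed for repeatability

def linear(n_out, n_in):
    """Random weights for an untrained layer (illustration only)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def matvec(weights, vec):
    """Matrix-vector product on plain lists."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

W_MU = linear(N_LATENT, N_IN)       # encoder head for the latent mean
W_LOGVAR = linear(N_LATENT, N_IN)   # encoder head for the latent log-variance
W_DEC = linear(N_IN, N_LATENT)      # decoder

def vae_forward(x):
    mu = matvec(W_MU, x)
    logvar = matvec(W_LOGVAR, x)
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, logvar)]
    x_hat = matvec(W_DEC, z)
    # Squared-error reconstruction term (an assumed choice).
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # KL divergence between N(mu, sigma^2) and N(0, 1), summed over latents.
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                    for m, lv in zip(mu, logvar))
    return mu, z, x_hat, recon + kl, kl

x = [rng.random() for _ in range(N_IN)]
mu, z, x_hat, loss, kl = vae_forward(x)
print(len(mu), len(x_hat))
```

Training would minimize `loss` over many simulated showers; only the encoder side is needed on the detector, since decompression happens off-detector.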

Key findings

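| Input width (bits) | 4 | 8 | 16 | 32 |
| --- | --- | --- | --- | --- |
| Output width (bits) | 16 | 16 | 16 | 16 |
| Clock speed (MHz) | 300 | 300 | 300 | 300 |
| Timing (clock ticks) | 13 | 13 | 13 | 13 |
| Latency (ns) | 43 | 43 | 43 | 43 |
| Resource utilization: LUT | 146k | 146k | 197k | 365k |
| Resource utilization: Register | 56k | 75k | 80k | 88k |
| Resource utilization: Lutram | 2k | 2k | 2k | 2k |
| Resource utilization: Slice | 12k | 26k | 34k | 63k |
| Resource utilization: Lookahead8 | 2k | 2k | 2k | 2k |
| Energy per compression (nJ) | 20 | 41 | 50 | 74 |

The source also lists one further value per input width (120, 110, 100, and 74 for 4, 8, 16, and 32 bits) without a recoverable column label.
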
  • The VAE learns a 4D latent space from 48 inputs, giving an effective 12x compression while preserving shower structure.
  • BDT distillation yields latent reconstructions that are statistically indistinguishable from those produced by the neural encoder for key observables.
  • ACAM implementation achieves 24 ns core latency, and up to 330 million compressions per second in a pipelined design, with energy ~4.1 nJ per compression at optimal settings.
  • The FPGA baseline has a fixed 43 ns latency at every tested input width, compared with 24 ns for the ACAM core, and its energy per compression varies with bit width; the ACAM shows substantially lower energy at low precision.
  • Physics observables such as E_tot, layer energies, fractions, shower depth, and lateral widths are well reproduced after compression, with small tails and few-percent KS differences.
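The fidelity comparison in the last finding can be sketched as follows: compute shower observables for original and reconstructed events, then compare their distributions with a two-sample Kolmogorov-Smirnov statistic. This is a minimal sketch under stated assumptions. The split of 48 cells into three layers of 16 is hypothetical, shower depth is taken here as the energy-weighted mean layer index, which may not match the paper's definition, and lateral widths are omitted because they need cell geometry that this summary does not give.

```python
def shower_observables(cells, cells_per_layer=16):
    """Return (E_tot, layer energies E_l, layer fractions f_l, shower depth).

    Assumes cells are ordered layer by layer with equal cells per layer
    (a hypothetical layout).
    """
    layers = [sum(cells[i:i + cells_per_layer])
              for i in range(0, len(cells), cells_per_layer)]
    e_tot = sum(layers)
    fractions = [e / e_tot for e in layers]
    # Assumed definition: energy-weighted mean layer index.
    depth = sum(index * e for index, e in enumerate(layers)) / e_tot
    return e_tot, layers, fractions, depth

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        value = min(a[i], b[j])
        # Step both CDFs past every sample equal to the current value.
        while i < len(a) and a[i] <= value:
            i += 1
        while j < len(b) and b[j] <= value:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

e_tot, layers, fractions, depth = shower_observables([1.0] * 48)
print(e_tot, layers, depth)
print(ks_statistic([1, 2, 3], [1, 2, 3]), ks_statistic([1, 2], [3, 4]))
```

A KS statistic of 0 means the two samples have identical empirical distributions and 1 means they do not overlap at all, so "few-percent KS differences" corresponds to values of a few hundredths.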


This review was generated by AI and reviewed by human editors.