We present an implementation of edge AI to compress data on an in-memory analog content-addressable memory (ACAM) device. A variational autoencoder is trained on a simulated sample of energy measurements from high-energy electrons incident on a generic three-layer scintillator-based calorimeter. The encoder is distilled into tabular form by regressing the latent-space variables with decision trees, which are then programmed onto a memristor-based ACAM. In real time, the ACAM compresses the 48 continuously valued energies measured by the calorimeter sensors into the latent space, achieving a compression factor of 12x; the latent data are transmitted off-detector for decompression. The performance of the ACAM, evaluated with the open-source Structural Simulation Toolkit (SST), is a latency of 24 ns, a throughput of 330M compressions per second (i.e., 3 ns between successive inputs), and an average energy consumption of 4.1 nJ per compression.
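As a brief consistency check (our arithmetic, not stated explicitly above): the 12x compression factor on 48 inputs implies a four-dimensional latent space, assuming equal per-value precision, and the quoted throughput matches the stated 3 ns input spacing:

```latex
\frac{N_{\text{in}}}{N_{\text{latent}}} = \frac{48}{4} = 12,
\qquad
\frac{1}{330\times10^{6}\,\text{s}^{-1}} \approx 3.0\,\text{ns}.
```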
The growth of data in high-energy physics is exploding, mirroring the trend in industry [1]. We describe the data-acquisition challenges of proposed experiments at a future lepton collider, and our solution to them. For instance, the electron-positron version of the Future Circular Collider (FCC), called FCC-ee [2], or the Muon Collider (𝜇C) [3,4] may need a streaming data-acquisition system to read out the collisions. For these and other such large-scale systems, we consider an autoencoder (AE) from artificial intelligence (AI) that is distilled and implemented on a power-efficient memristive analog memory device to compress analog data near the front end. We consider a generic scenario of compressing energy deposits measured by a three-layer calorimeter detector system in a streaming-readout setup. The setup and dataflow are shown in the top row of Figure 1; the bottom row shows the hardware implementation, which is detailed later in Methods.
Figure 1. Top: the energy deposits are projected onto the transverse planes and simplified by grouping the energies of nearby sensor elements, which serves as input to the tabular AE. Bottom: a close-up of the memristor-based analog content-addressable memory shows the crossbar structure in which the input data (x) cross the match lines, which are read out through sense amplifiers (SA) into static RAM. A further close-up of each crosspoint shows the memristor circuit (6 transistors + 2 memristors) that produces a binary output. The latent data are transmitted and decompressed.
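The match-line behavior sketched in the figure can be summarized in a few lines. The following is a minimal behavioral model in Python, assuming the usual analog-CAM semantics in which each 6T2M cell stores an acceptance range via its two memristor conductances and a row's match line stays high only if every input falls inside that row's ranges; all array values and names here are illustrative, not the authors' simulation code.

```python
import numpy as np

def acam_match(x, lo, hi):
    """Behavioral model of analog CAM matching.
    x: (n_inputs,) analog input word applied to the crossbar columns;
    lo, hi: (n_rows, n_inputs) acceptance ranges stored per cell.
    Returns one boolean per row, i.e., the sense-amplifier outputs."""
    return np.all((x >= lo) & (x <= hi), axis=1)

# Hypothetical toy array: 3 rows over 4 inputs, with the value each row
# reads out stored in SRAM (here, one latent-variable value per row).
lo = np.array([[0.0, 0.0, 0.0, 0.0],
               [0.2, 0.0, 0.1, 0.0],
               [0.5, 0.3, 0.0, 0.0]])
hi = np.array([[0.2, 1.0, 1.0, 1.0],
               [0.5, 0.3, 1.0, 1.0],
               [1.0, 1.0, 1.0, 1.0]])
sram = np.array([-1.3, 0.4, 2.1])

x = np.array([0.3, 0.1, 0.6, 0.9])
row = np.argmax(acam_match(x, lo, hi))  # first matching row (assumes one matches)
print(sram[row])                        # the compressed value read out
```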
Lepton collisions, in contrast to proton collisions or collisions involving heavy ions at, e.g., the Large Hadron Collider (LHC) at CERN, produce partonic-level interactions such that a near-unity fraction of the collision rate is saved offline for further study. In terms of dataflow, it is not inconceivable to have a billion channels operating at tens of kHz of collision rate [5]. Regardless of whether it is necessary to save the full-resolution data, it is desirable to compress the data to reduce storage requirements, provided the key physics attributes of the collision are maintained. We focus on a compression scheme based on a variational AE (VAE). Many related ideas exist in the literature. For instance, some implementations of VAEs on field-programmable gate arrays (FPGA) focus on their use as anomaly detectors [6], where a score is computed by comparing the input quantities with the decompressed quantities. Other FPGA-based VAE implementations focus on simplifying the neural networks by penalizing complexity [7,8] and on distilling the model with decision-tree regressors [9]. There are also non-neural approaches, such as density estimates in the input space that naturally result in decision-tree models [10,11]. Lastly, an implementation on an application-specific integrated circuit (ASIC) presents a method to compress data at the front end [12]. While innovations on this front continue, we note that all of these approaches may be susceptible to scaling limitations that can be overcome with the in-memory computing (IMC) used in our approach. A brief background on IMC at the edge is presented before we discuss our setup.
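To make the distillation step concrete, here is a minimal sketch assuming a scikit-learn-style workflow; the linear `encoder_mu` placeholder, the random data, and the tree depth are illustrative assumptions, not the authors' configuration. One decision-tree regressor per latent dimension is fit to reproduce the trained encoder's latent outputs, yielding a tabular model whose root-to-leaf threshold ranges map one-to-one onto ACAM rows.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((10_000, 48))         # stand-in for the 48 grouped energies
W = rng.standard_normal((48, 4))     # stand-in for trained encoder weights

def encoder_mu(X):
    # placeholder for the trained VAE encoder's latent means mu(x);
    # linear here purely for illustration
    return X @ W

Z = encoder_mu(X)                    # (n_samples, 4) distillation targets
trees = [DecisionTreeRegressor(max_depth=6).fit(X, Z[:, j]) for j in range(4)]

# Each root-to-leaf path is a conjunction of per-feature threshold ranges,
# i.e., exactly one ACAM row; the leaf value is the word read out from SRAM.
Z_hat = np.column_stack([t.predict(X) for t in trees])
```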
In general, the performance of AI inference is increasingly constrained by the memory wall [13]. For example, in deep-learning workloads, moving weights and activations between separate computing and memory units leads to increased latency, limited scalability and integration, and low energy efficiency [14]. This bottleneck becomes a significant limitation in data-intensive applications such as the scenario described above [15]. The IMC paradigm [16-20] moves away from conventional von Neumann computing architectures and addresses the memory wall by co-locating storage and computation. New integrated-circuit topologies based on emerging memory technologies, such as resistive RAM (ReRAM or RRAM) and phase-change memory (PCM), or on classic memory cells, such as static RAM (SRAM), enable the parallel execution of arithmetic operations within memory. A successful example of this paradigm is the ReRAM crossbar array [21,22], which performs analog dot products in situ via Ohm's and Kirchhoff's laws, reducing data movement and the cost per multiply-accumulate (MAC) unit. Within the IMC framework, content-addressable memory (CAM) can be built from analog components. Mixed-signal IMC chips based on phase-change memory have demonstrated higher compute density and energy efficiency than contemporary GPU chips on targeted inference workloads [23,24], while high-Technology-Readiness-Level (TRL) digital SRAM compute-in-memory macros achieve peak energy efficiencies up to 50x those of GPU chips [25]. The acceleration of small ML models at the edge for high-energy physics has been demonstrated at high TRL on FPGA platforms, using on-chip URAM and BRAM and following the dataflow paradigm [11,26]. However, this solution
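As an illustration of the in-situ dot product mentioned above (an idealized model that ignores device non-idealities; the numerical values are hypothetical): input voltages applied to the crossbar rows and programmed conductances at the crosspoints yield the column currents I = Gᵀ V in a single step, by Ohm's and Kirchhoff's laws, with no weight movement.

```python
import numpy as np

# Idealized ReRAM crossbar MAC: voltages V drive the rows, conductances G
# sit at the crosspoints, and each column wire sums its cell currents.
V = np.array([0.1, 0.4, 0.25])        # input voltages (V), hypothetical
G = np.array([[1e-5, 2e-5],           # programmed conductances (S)
              [5e-6, 1e-5],
              [2e-5, 4e-6]])
I = G.T @ V                           # per-column currents (A) = the MACs
print(I)
```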