Arxiv 2512.05342

Reading time: 5 minutes
...

📝 Original Info

  • Title: Arxiv 2512.05342
  • ArXiv ID: 2512.05342
  • Date: Pending
  • Authors: **Author information was not specified in the paper.**

📝 Abstract

Second-order optimization methods, which leverage curvature information, offer faster and more stable convergence than first-order methods such as stochastic gradient descent (SGD) and Adam. However, their practical adoption is hindered by the prohibitively high cost of inverting the second-order information matrix, particularly in large-scale neural network training. Here, we present the first demonstration of a second-order optimizer powered by in-memory analog matrix computing (AMC) using resistive random-access memory (RRAM), which performs matrix inversion (INV) in a single step. We validate the optimizer by training a two-layer convolutional neural network (CNN) for handwritten letter classification, achieving 26% and 61% fewer training epochs than SGD with momentum and Adam, respectively. On a larger task using the same second-order method, our system delivers a 5.88× improvement in throughput and a 6.9× gain in energy efficiency compared to state-of-the-art digital processors. These results demonstrate the feasibility and effectiveness of AMC circuits for second-order neural network training, opening a new path toward energy-efficient AI acceleration.

💡 Deep Analysis

📄 Full Content

Training and inference are the two fundamental operational phases of neural networks. Training is essential for determining valid model weights, which are then deployed on target devices for inference [1][2]. Due to its more complex computational nature, training poses greater challenges than inference and demands significantly more processing power. Traditional first-order methods use only gradient information, without accounting for the curvature of the loss landscape, which can lead to slow convergence and, in some cases, suboptimal performance [3]. In contrast, second-order methods explicitly capture this curvature by performing matrix inversion (INV) on the second-order information matrix (Fig. 1a). This allows them to precondition the gradient through automatic scaling and rotation, resulting in more effective optimization (Fig. 1b) [4]. However, implementing second-order optimization on conventional digital computers faces several challenges: (1) the high computational cost of INV due to its cubic complexity; (2) the limited memory capacity of digital systems for storing the rapidly growing parameters of AI models; and (3) inefficient data transmission in traditional architectures (Fig. 2) [5].
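
To make the contrast concrete, here is a minimal NumPy sketch (the quadratic loss, curvature matrix, and learning rate are illustrative choices, not taken from the paper) of a plain gradient step versus a curvature-preconditioned step; the `np.linalg.inv` call is the O(n³) INV that the AMC circuit is meant to replace.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T H w with an ill-conditioned curvature matrix H.
H = np.diag([100.0, 1.0])               # steep direction vs. flat direction
w = np.array([1.0, 1.0])
grad = H @ w                            # gradient of the quadratic loss at w

# First-order step: one global learning rate, limited by the steepest direction.
lr = 0.009
w_first_order = w - lr * grad           # moves ~90% of the way on one axis, ~1% on the other

# Second-order step: precondition the gradient with the inverse curvature matrix.
# np.linalg.inv costs O(n^3); this is the INV step the AMC circuit performs in one shot.
w_second_order = w - np.linalg.inv(H) @ grad   # reaches the minimum of this quadratic directly

print(w_first_order, w_second_order)
```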

Recently, an RRAM-based in-memory AMC circuit was developed to perform real-valued INV in a single step, leveraging the high parallelism and storage density of RRAM [6]. Meanwhile, Kronecker-factored approximate curvature (KFAC) can reduce the size of the matrix to be inverted by approximating the Fisher information matrix (FIM), a form of second-order information [7]. In this work, we develop a second-order optimizer based on a scalable, high-precision in-memory AMC system. By approximating the FIM using KFAC, we experimentally demonstrate second-order training of a two-layer CNN for handwritten letter classification, achieving performance comparable to the software baseline. Evaluation on a larger training task reveals significant improvements in both throughput and energy efficiency.
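
As a rough sense of scale (the layer dimensions below are hypothetical, not from the paper), the snippet compares the size of a single layer's full FIM block with the two Kronecker factors that KFAC actually inverts.

```python
# Hypothetical fully connected layer (dimensions are illustrative, not from the paper).
d_in, d_out = 512, 256

n_params = d_in * d_out                 # 131,072 weights in this layer
print(f"full FIM block to invert: {n_params} x {n_params}")               # ~131k x 131k
print(f"KFAC factors to invert:   {d_in} x {d_in} and {d_out} x {d_out}")  # 512x512 and 256x256
```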

Fig. 3a illustrates the in-memory AMC circuit based on a 1T1R array, which solves the equation Ax = b. The matrix A (N × N) is mapped as the conductance values of the RRAM devices, referenced to a unit conductance value G0. To represent negative values, a column-wise splitting scheme with analog inverters is used, decomposing the original matrix as A = A⁺ − A⁻, where both A⁺ (N × N) and A⁻ (N × N) contain only positive entries [8]. Each source line (SL) in the array connects to the inverting input terminal of an operational amplifier (OA). The bit lines (BLs) of the A⁺ array are connected to the OA outputs, while those of the A⁻ array are connected to the analog inverter outputs. Fig. 3b shows an optical photograph of the PCB system implementing the AMC circuit. Input voltages, generated by DACs, are applied to the SLs through unit conductance elements and represent −b (N × 1). During operation, a bias voltage is applied to the word lines (WLs) by the Arduino MCU to control the transistors, so that the output voltages of the OAs correspond to the solution vector x (N × 1) = A⁻¹b, which is then captured by ADCs. All DACs and ADCs are controlled by the MCU.
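
A minimal numerical sketch of this mapping, assuming an ideal closed-loop circuit (the unit conductance, matrix values, and element-wise splitting are illustrative assumptions, not measured parameters): the signed matrix is split into two non-negative conductance maps, and the ideal OA outputs settle to x = A⁻¹b.

```python
import numpy as np

G0 = 100e-6                               # assumed unit conductance (100 uS), illustrative
A = np.array([[ 1.2, -0.4],
              [-0.3,  0.8]])              # example signed matrix to invert
b = np.array([0.5, -0.2])                 # right-hand side, applied to the SLs as -b voltages

# Splitting A = A_plus - A_minus so that each part holds only positive entries
# and can be mapped onto RRAM conductances G = a * G0.
A_plus  = np.where(A > 0,  A, 0.0)
A_minus = np.where(A < 0, -A, 0.0)
G_plus, G_minus = A_plus * G0, A_minus * G0

# Ideal closed-loop behaviour: the OA outputs settle so that the SL currents balance,
# which is equivalent to solving A x = b in a single step.
A_mapped = (G_plus - G_minus) / G0        # matrix as seen by the circuit
x = np.linalg.solve(A_mapped, b)          # ideal OA output voltages (solution vector)
print(x)
```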

To construct the closed-loop AMC circuit in Fig. 3a, and considering device and circuit reliability, the front-end transistor array was fabricated using 350 nm CMOS technology by a commercial foundry (Fig. 4a). The back-end Pt/Ta/HfO₂/Pt RRAM was then integrated in a university laboratory (Fig. 4b). An optical photograph of the 8×8 1T1R array is also shown in Fig. 4c. Fig. 5 presents the I-V characteristics of a typical 1T1R cell under 8 distinct gate voltage conditions, with 20 cycles per condition. Fig. 6 shows the retention characteristics of the RRAM for over 100,000 seconds across 8 analog conductance states. Fig. 7 displays the cumulative distribution function (CDF) test results for the 8 states (ranging from 20 μS to 220 μS). The cells can be programmed to arbitrary conductance states using a write-verify method with a predefined tolerance (Fig. 8). Fig. 9 shows the repeated conductance programming results of 32 1T1R cells used in the real-valued 4×4 INV circuit experiments performed in this work, with a write error of 10 μS. It includes 2,800 RRAM array updates, resulting in 89,600 conductance values in the figure. To evaluate endurance, the two extreme conductance states (20 μS and 200 μS) were selected for testing. Fig. 10 shows that the RRAM device remains functional after more than 10,000 cycles.
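
The write-verify method can be sketched as a simple read-compare-pulse feedback loop. Everything in the toy device model below (pulse response, read noise, pulse limit) is an assumption for illustration; only the 10 μS tolerance figure comes from the text.

```python
import random

def write_verify(target_uS, read_fn, pulse_fn, tol_uS=10.0, max_pulses=100):
    """Program one cell toward target_uS, stopping once |read - target| <= tol_uS."""
    for _ in range(max_pulses):
        g = read_fn()                        # read the current conductance (uS)
        err = target_uS - g
        if abs(err) <= tol_uS:
            return g                         # within tolerance: programming done
        pulse_fn(+1 if err > 0 else -1)      # SET pulse if too low, RESET pulse if too high
    return read_fn()                         # stop after max_pulses attempts

# Toy stand-in for a real 1T1R cell (illustrative only, not a device model from the paper).
state = {"g": 20.0}
def read_fn():   return state["g"] + random.gauss(0.0, 1.0)      # read noise
def pulse_fn(d): state["g"] += d * random.uniform(3.0, 8.0)      # stochastic pulse response

print(write_verify(150.0, read_fn, pulse_fn))
```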

By approximating the inverse of the layer-wise diagonal block of the FIM, F_l (n_l × n_l), as the Kronecker product of the inverses of two smaller matrices A_l (d_in × d_in) and G_l (d_out × d_out), KFAC decomposes the INV of the full FIM (Σ n_l × Σ n_l) into INVs of multiple small matrices (Fig. 11a). Damping factors λ_A and λ_G are also added to regularize the optimization direction. After vectorizing, the update vector ∇θ_l for layer l can be derived and used to update the model parameters. In the training process of our second-order optimizer (Fig. 11b), forward and backward ...
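
A minimal sketch of the layer-wise KFAC update for a fully connected layer, using the standard KFAC formulation (the dimensions, damping values, and learning rate are illustrative; the two small inversions are the INVs that would be offloaded to the AMC circuit):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, batch = 8, 4, 32                    # illustrative layer and batch sizes
W = rng.standard_normal((d_out, d_in))           # current layer weights

a = rng.standard_normal((batch, d_in))           # layer input activations
g = rng.standard_normal((batch, d_out))          # backpropagated pre-activation gradients
grad_W = g.T @ a / batch                         # weight gradient, shape (d_out, d_in)

# Kronecker factors of this layer's FIM block: F_l ≈ A ⊗ G.
A = a.T @ a / batch                              # (d_in x d_in) activation covariance
G = g.T @ g / batch                              # (d_out x d_out) gradient covariance

# Damped inverses of the two small factors -- the INVs the AMC circuit would perform.
lam_A, lam_G = 1e-2, 1e-2                        # damping factors (illustrative)
A_inv = np.linalg.inv(A + lam_A * np.eye(d_in))
G_inv = np.linalg.inv(G + lam_G * np.eye(d_out))

# Preconditioned update: vec(dW) = (A ⊗ G)^(-1) vec(grad_W)  <=>  dW = G_inv @ grad_W @ A_inv
delta_W = G_inv @ grad_W @ A_inv
lr = 0.1                                         # learning rate (illustrative)
W = W - lr * delta_W                             # KFAC update for this layer's parameters
print(delta_W.shape)
```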


This content is AI-processed based on open access ArXiv data.
