AERO: Adaptive Erase Operation for Improving Lifetime and Performance of Modern NAND Flash-Based SSDs
This work investigates a new erase scheme in NAND flash memory to improve the lifetime and performance of modern solid-state drives (SSDs). In NAND flash memory, an erase operation applies a high voltage (e.g., > 20 V) to flash cells for a long time (e.g., > 3.5 ms), which degrades cell endurance and potentially delays user I/O requests. While a large body of prior work has proposed various techniques to mitigate the negative impact of erase operations, no work has yet investigated how erase latency should be set to fully exploit the potential of NAND flash memory; most existing techniques use a fixed latency for every erase operation, which is set to cover the worst-case operating conditions. To address this, we propose AERO (Adaptive ERase Operation), a new erase scheme that dynamically adjusts erase latency to be just long enough for reliably erasing target cells, depending on the cells’ current erase characteristics. AERO accurately predicts such near-optimal erase latency based on the number of fail bits during an erase operation. To maximize its benefits, we further optimize AERO in two aspects. First, at the beginning of an erase operation, AERO attempts to erase the cells for a short time (e.g., 1 ms), which enables AERO to always obtain the number of fail bits necessary to accurately predict the near-optimal erase latency. Second, AERO aggressively yet safely reduces erase latency by leveraging a large reliability margin present in modern SSDs. We demonstrate the feasibility and reliability of AERO using 160 real 3D NAND flash chips, showing that it enhances SSD lifetime over the conventional erase scheme by 43% without changes to existing NAND flash chips. Our system-level evaluation using eleven real-world workloads shows that an AERO-enabled SSD reduces read tail latency by 34% on average over a state-of-the-art technique.
💡 Research Summary
The paper introduces AERO (Adaptive Erase Operation), a novel scheme that dynamically adjusts the erase latency of NAND flash memory to match the current erase characteristics of the target cells, thereby improving both SSD lifetime and performance. Conventional erase schemes use a fixed latency for every erase operation, applying a high voltage (e.g., > 20 V) for a long time (e.g., > 3.5 ms) so that even worst-case operating conditions are covered. This conservative approach wastes energy, accelerates wear (by creating charge traps and oxide damage), and adds latency that can stall garbage collection and user I/O.
AERO’s core idea is to treat erase time as a variable that can be predicted in real time. It operates in two phases. First, at the start of an erase, AERO applies a very short pulse (≈1 ms) and counts the number of “fail bits”—cells that have not been fully cleared. The fail‑bit count directly reflects the block’s residual charge, trap density, and overall health. Second, using a lightweight pre‑trained model (e.g., linear regression or a small neural network), AERO maps the observed fail‑bit count to a near‑optimal additional erase duration. The controller then issues exactly that extra pulse, ensuring that almost every cell is reliably erased while avoiding the unnecessary overhead of the worst‑case timing.
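The two-phase flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the lookup table `EXTRA_MS_TABLE`, its thresholds, the `SimulatedBlock` class, and the assumed 1000 fail bits per missing millisecond are all hypothetical, and a simple table stands in for whatever predictor the controller actually uses.

```python
import math

# Hypothetical timing constants in milliseconds (illustrative only).
PROBE_MS = 1.0        # short initial erase pulse
WORST_CASE_MS = 3.5   # fixed latency of the conventional scheme

# Hypothetical predictor: (max fail bits, extra erase time in ms).
# A real controller would derive these values from chip characterization.
EXTRA_MS_TABLE = [(0, 0.0), (500, 0.5), (1000, 1.0), (1500, 1.5), (2000, 2.0)]


def predict_extra_ms(fail_bits):
    """Map a fail-bit count to the additional erase time to apply."""
    for max_fail_bits, extra_ms in EXTRA_MS_TABLE:
        if fail_bits <= max_fail_bits:
            return extra_ms
    # Unknown territory: fall back to the conventional worst-case latency.
    return WORST_CASE_MS - PROBE_MS


class SimulatedBlock:
    """Toy block that needs `required_ms` of erase time in total (assumption)."""

    def __init__(self, required_ms):
        self.required_ms = required_ms
        self.erased_ms = 0.0

    def erase_pulse(self, duration_ms):
        self.erased_ms += duration_ms

    def count_fail_bits(self):
        # Assumed model: 1000 fail bits per millisecond of erase still missing.
        remaining_ms = max(0.0, self.required_ms - self.erased_ms)
        return math.ceil(remaining_ms * 1000)


def adaptive_erase(block):
    """Phase 1: short probe pulse. Phase 2: predicted extra pulse."""
    block.erase_pulse(PROBE_MS)
    extra_ms = predict_extra_ms(block.count_fail_bits())
    if extra_ms > 0:
        block.erase_pulse(extra_ms)
    return PROBE_MS + extra_ms
```

In this toy model, a block that needs 1.8 ms in total is erased in 2.0 ms instead of the fixed 3.5 ms, while a block that genuinely needs 3.5 ms still receives the full worst-case latency.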
A crucial enabler is the reliability margin already built into modern SSDs. Error-correcting codes (ECC) and internal retry mechanisms tolerate a certain bit-error rate. AERO deliberately operates within this margin, shortening the erase time without compromising data integrity. Because the margin is large in contemporary devices, AERO can reduce erase latency aggressively yet safely.
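The margin argument can be made concrete with a small check. The sketch below is only an illustration: the function name, the `safety_factor` parameter, and all numbers are assumptions rather than values from the paper. It treats the margin as the gap between what the ECC can correct per codeword and the worst error count currently observed, and allows a shorter erase only if the extra errors it is expected to cause fit within a fraction of that gap.

```python
def can_shorten_erase(ecc_correctable_bits, observed_max_errors,
                      extra_errors_from_short_erase, safety_factor=0.5):
    """Return True if a shortened erase stays inside the reliability margin.

    All arguments are per-codeword bit counts. `safety_factor` (hypothetical)
    reserves part of the margin for retention loss, read disturb, and so on.
    """
    margin = ecc_correctable_bits - observed_max_errors
    if margin <= 0:
        return False  # no headroom left: keep the conservative latency
    return extra_errors_from_short_erase <= margin * safety_factor


# Hypothetical example: ECC corrects 72 bits per codeword, 40 are already
# consumed, so only half of the remaining 32 bits may be spent on the erase.
print(can_shorten_erase(72, 40, 10))  # True  (10 <= 16)
print(can_shorten_erase(72, 40, 20))  # False (20 > 16)
```

Reserving part of the margin is a design choice made here for illustration; how much of the margin can actually be spent is what the characterization of real chips has to determine.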
The authors evaluate AERO on two fronts. At the device level, they test 160 real 3D NAND flash chips. Compared with the conventional fixed-latency erase scheme, AERO improves SSD lifetime by 43%, without requiring changes to existing chips. This gain stems from reduced trap formation and lower stress on the oxide layer, as the cells are exposed to the high erase voltage only for as long as necessary.
At the system level, they integrate AERO into an SSD simulator and run eleven diverse, real-world workloads (databases, web servers, virtualization, file servers, etc.). The results show that an AERO-enabled SSD reduces read tail latency (99th-percentile) by 34% on average over a state-of-the-art technique, while maintaining or slightly improving overall throughput. The latency improvement is primarily due to shorter erase operations: blocks are freed sooner during garbage collection, and read requests spend less time waiting in the I/O queue behind an in-progress erase.
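To see why shorter erases show up in the tail rather than the average, consider a toy calculation. Everything in this sketch is hypothetical (the read latency, the fraction of reads that queue behind an erase, and both erase latencies) and is not taken from the paper's evaluation. It uses the nearest-rank definition of a percentile and integer microseconds to keep the arithmetic exact.

```python
def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100) with integers
    return ordered[rank - 1]


def read_latencies_us(erase_us, n_reads=1000, n_blocked=20, read_us=100):
    """Toy model: `n_blocked` reads wait for a full erase before being served."""
    fast = [read_us] * (n_reads - n_blocked)
    blocked = [read_us + erase_us] * n_blocked
    return fast + blocked


fixed = read_latencies_us(erase_us=3500)     # conventional fixed-latency erase
adaptive = read_latencies_us(erase_us=2000)  # hypothetical shortened erase

print(percentile_nearest_rank(fixed, 50))     # 100  (median unaffected)
print(percentile_nearest_rank(fixed, 99))     # 3600
print(percentile_nearest_rank(adaptive, 99))  # 2100
```

The median is identical in both cases, while the 99th percentile is dominated entirely by the erase latency, which is why a change to the erase scheme is measured with tail-latency metrics.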
A major practical advantage of AERO is that it requires no changes to the NAND flash die. All modifications are confined to firmware: the controller must be able to issue a short initial erase, read the fail‑bit count (which many modern chips already expose for health monitoring), and run the prediction algorithm. Consequently, manufacturers can adopt AERO with minimal cost, and data‑center operators can reap immediate benefits in both durability and latency.
The paper also discusses limitations. The initial 1 ms erase and fail‑bit measurement introduce a small overhead that may be non‑trivial for ultra‑low‑power mobile SSDs. The prediction model must remain accurate as the flash ages; temperature, voltage fluctuations, and long‑term wear can alter the relationship between fail‑bit count and required erase time, suggesting the need for periodic model retraining or online adaptation. Moreover, very old NAND (several years of use) may exhibit non‑linear behavior that a simple model cannot capture, requiring additional calibration.
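One simple way to guard against the prediction drift described above is to verify after the predicted pulse and fall back to extra short pulses if fail bits remain. The sketch below assumes this verify-and-retry policy purely for illustration; it is not presented in the source as part of AERO, and the callback names, step size, and latency budget are hypothetical.

```python
def erase_with_fallback(apply_pulse, count_fail_bits, predicted_ms,
                        step_ms=0.5, budget_ms=3.5, pass_threshold=0):
    """Apply the predicted pulse, then top up in small steps if needed.

    `apply_pulse(ms)` and `count_fail_bits()` are hypothetical controller
    callbacks. The total erase time never exceeds `budget_ms`.
    """
    apply_pulse(predicted_ms)
    total_ms = predicted_ms
    while count_fail_bits() > pass_threshold and total_ms < budget_ms:
        step = min(step_ms, budget_ms - total_ms)
        apply_pulse(step)
        total_ms += step
    return total_ms


# Toy block that needs 2.3 ms although the (stale) predictor said 1.5 ms.
state = {"erased_ms": 0.0, "required_ms": 2.3}


def apply_pulse(ms):
    state["erased_ms"] += ms


def count_fail_bits():
    return 0 if state["erased_ms"] >= state["required_ms"] else 1


print(erase_with_fallback(apply_pulse, count_fail_bits, predicted_ms=1.5))  # 2.5
```

Each retry adds a verify step, so a policy like this trades some of the latency savings for robustness; logging how often the fallback triggers would also indicate when the predictor needs recalibration.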
Future research directions include: (1) extending the predictor to a multivariate model that incorporates temperature, supply‑voltage variation, and usage history; (2) coordinating AERO with existing wear‑leveling and dynamic voltage scaling techniques to form a holistic SSD optimization framework; and (3) exploring concurrent multi‑channel erase scheduling to mitigate interference while still leveraging adaptive timing. By integrating these ideas, AERO could evolve from a single‑operation improvement into a cornerstone of next‑generation SSD design, delivering simultaneous gains in endurance, latency, and energy efficiency.