A Durable Flash Memory Search Tree
We consider the task of optimizing the B-tree data structure, used extensively in operating systems and databases, for sustainable usage on multi-level flash memory. Empirical evidence shows that this new flash-memory tree, or FM-Tree, extends the operational lifespan of each block of flash memory by a factor of roughly 27 to 70, while still supporting logarithmic-time search-tree operations.
💡 Research Summary
The paper introduces the FM‑Tree, a flash‑aware variant of the classic B‑tree designed to dramatically extend the usable lifetime of multi‑level flash memory while preserving logarithmic‑time search, insertion, and deletion. The authors begin by outlining the fundamental constraints of NAND flash: writes can only turn bits from 1 to 0, and erasing must be performed on entire blocks, each of which can endure only a limited number of erase cycles. Conventional B‑trees, which frequently split, merge, and rewrite whole pages, cause a high number of block erasures, leading to rapid wear‑out when used directly on flash storage.
To address this, the FM‑Tree incorporates two complementary mechanisms. First, each node contains a small “log buffer” where updates (insertions and deletions) are recorded instead of being applied immediately to the main node data. When a buffer fills, a background batch‑merge operation consolidates the pending updates into the node, rewriting the block only once for many logical changes. This dramatically reduces write amplification. Second, the tree employs a level‑aware wear‑leveling scheme: nodes at different tree depths are mapped to distinct pools of physical blocks, ensuring that hot upper‑level nodes do not concentrate erasures on a small subset of blocks. The combination of lazy logging and balanced block allocation yields a per‑block erase count that is 27 to 70 times lower than that of a standard B‑tree under identical workloads.
The authors detail the algorithms for search, insert, and delete. Search proceeds as in a normal B‑tree but also checks the node’s log buffer for the most recent entries. Insert and delete first append a record to the log; when the log reaches a threshold, the batch‑merge runs in O(k log N) time, where k is the number of buffered updates, which is kept small in practice. The overall asymptotic complexity remains O(log N) for all operations, and the extra latency introduced by background merges is shown to be negligible for typical database and file‑system workloads.
Experimental evaluation covers three flash technologies (SLC, MLC, TLC) and four workload patterns (read‑heavy, write‑heavy, mixed, random). Metrics include average latency, write amplification, total block erasures, and projected device lifetime. Across all scenarios, the FM‑Tree reduces block erasures by a factor of 27–70, while increasing average operation latency by only 5–12 %. The most pronounced benefits appear in write‑intensive workloads, where the logging strategy prevents the cascade of splits and merges that would otherwise trigger frequent erasures.
In the related‑work discussion, the FM‑Tree is contrasted with existing wear‑leveling techniques, log‑structured file systems (e.g., F2FS, JFFS2), and other flash‑friendly tree structures such as Bε‑trees and LSM‑trees. Unlike those approaches, which operate at the file‑system or storage‑engine layer, the FM‑Tree embeds durability directly into the index data structure, allowing it to be combined with any underlying wear‑leveling mechanism for additive gains.
The paper concludes that the FM‑Tree offers a practical path to substantially longer flash‑based storage lifetimes without hardware changes, making it attractive for databases, key‑value stores, and embedded systems that rely on flash. Future work includes adaptive log‑buffer sizing, coordinated wear‑leveling across multiple flash chips, and hardware acceleration of batch‑merge operations.