Page-Differential Logging: An Efficient and DBMS-independent Approach for Storing Data into Flash Memory

Flash memory is widely used as the secondary storage in lightweight computing devices due to its outstanding advantages over magnetic disks. Flash memory has many access characteristics different from those of magnetic disks, and how to take advantage of them is becoming an important research issue. There are two existing approaches to storing data into flash memory: page-based and log-based. The former has good performance for read operations, but poor performance for write operations. In contrast, the latter has good performance for write operations when updates are light, but poor performance for read operations. In this paper, we propose a new method of storing data, called page-differential logging, for flash-based storage systems that solves the drawbacks of the two methods. The primary characteristics of our method are: (1) writing only the difference (which we define as the page-differential) between the original page in flash memory and the up-to-date page in memory; (2) computing and writing the page-differential only once at the time the page needs to be reflected into flash memory. These characteristics contrast with existing page-based methods, which write the whole page including both changed and unchanged parts of data, and with log-based ones, which keep track of the history of all the changes in a page. Our method allows existing disk-based DBMSs to be reused as flash-based DBMSs just by modifying the flash memory driver, i.e., it is DBMS-independent. Experimental results show that the proposed method improves the I/O performance by 1.2 to 6.1 times over existing methods for TPC-C data of approximately 1 GB.


💡 Research Summary

Flash memory has become the de‑facto secondary storage for many lightweight devices because of its low power consumption, high density, and fast random‑read capability. However, its physical characteristics—page‑oriented reads/writes, block‑level erase, and asymmetric read/write latencies—make traditional disk‑oriented storage algorithms suboptimal. Two dominant approaches have been explored for flash‑based storage: (1) page‑based methods that write an entire page whenever any part of it changes, and (2) log‑based methods that append only the modified portions as separate log records. The former yields excellent read performance but suffers from excessive write amplification and reduced flash endurance; the latter reduces write amplification for light updates but incurs severe read penalties because reconstructing a page may require scanning and merging many log entries.

The authors propose Page‑Differential Logging (PDL), a hybrid technique that captures only the difference (the “page‑differential”) between the current in‑memory page and its last persisted version in flash. The key ideas are: (i) compute the differential once, at the moment the page is flushed to flash, and (ii) store that differential as a compact record rather than rewriting the whole page or maintaining a full history of changes. The differential consists of offset‑length pairs together with the changed byte values; consecutive changed bytes are coalesced into a single segment to keep the record small. When a page is later needed, the system reads the original page and applies the stored differential(s) to reconstruct the up‑to‑date version. If a page receives multiple updates before a full rewrite, additional differentials are appended; a periodic “garbage‑collection” or consolidation step rewrites the whole page to bound the number of accumulated differentials.
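The differential described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `compute_differential` and `apply_differential`, and the in-memory record layout (a list of offset/bytes pairs, with the length implied by the byte string), are assumptions made for illustration only.

```python
from typing import List, Tuple

# A segment is (offset, changed_bytes); its length is len(changed_bytes).
# This layout is an assumption, not the paper's on-flash format.
Segment = Tuple[int, bytes]


def compute_differential(base: bytes, current: bytes) -> List[Segment]:
    """Return coalesced runs of bytes where `current` differs from `base`."""
    if len(base) != len(current):
        raise ValueError("pages must have the same size")
    segments: List[Segment] = []
    i, n = 0, len(base)
    while i < n:
        if base[i] == current[i]:
            i += 1
            continue
        start = i
        # Extend the run while consecutive bytes keep differing.
        while i < n and base[i] != current[i]:
            i += 1
        segments.append((start, current[start:i]))
    return segments


def apply_differential(base: bytes, segments: List[Segment]) -> bytes:
    """Rebuild the up-to-date page from the base page and one differential."""
    page = bytearray(base)
    for offset, data in segments:
        page[offset:offset + len(data)] = data
    return bytes(page)


if __name__ == "__main__":
    base = bytes(16)                      # a toy 16-byte "page" of zeros
    current = bytearray(base)
    current[2:5] = b"abc"                 # one run of three changed bytes
    current[10] = 0xFF                    # one isolated changed byte
    diff = compute_differential(base, bytes(current))
    print(diff)
    assert apply_differential(base, diff) == bytes(current)
```

In this toy example only four payload bytes (plus two offsets) describe the change to a 16-byte page; on a real flash page the ratio would depend on how much of the page an update actually touches.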

A crucial advantage of PDL is DBMS‑independence. Conventional disk‑based DBMSs manage pages in a buffer pool and invoke the storage manager to write a page when it is evicted. PDL intercepts this write request at the flash driver level, replaces the full‑page write with a differential write, and leaves the DBMS logic untouched. Consequently, existing relational DBMSs can be turned into flash‑optimized systems simply by swapping the storage driver, without any changes to query processing, transaction management, or recovery modules.
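The driver-level interception can be sketched as a thin shim that exposes the same `read_page`/`write_page` interface a DBMS storage manager would call. Everything here is hypothetical: the class name `DifferentialDriver`, the `max_diffs` consolidation threshold, and the use of Python dictionaries in place of flash pages are assumptions for illustration, and each new differential is assumed to be computed against the last persisted version of the page.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, bytes]  # (offset, changed bytes); assumed layout


def _diff(old: bytes, new: bytes) -> List[Segment]:
    """Coalesced runs of bytes where `new` differs from `old`."""
    segs: List[Segment] = []
    i, n = 0, len(old)
    while i < n:
        if old[i] == new[i]:
            i += 1
            continue
        start = i
        while i < n and old[i] != new[i]:
            i += 1
        segs.append((start, new[start:i]))
    return segs


class DifferentialDriver:
    """Hypothetical driver shim; dictionaries stand in for flash pages."""

    def __init__(self, page_size: int = 2048, max_diffs: int = 4):
        self.page_size = page_size
        self.max_diffs = max_diffs              # assumed consolidation threshold
        self.base: Dict[int, bytes] = {}        # full pages "in flash"
        self.diffs: Dict[int, List[List[Segment]]] = {}
        self.bytes_written = 0                  # payload bytes only, no headers

    def read_page(self, page_id: int) -> bytes:
        # Start from the base page and replay differentials in write order.
        page = bytearray(self.base.get(page_id, bytes(self.page_size)))
        for record in self.diffs.get(page_id, []):
            for offset, data in record:
                page[offset:offset + len(data)] = data
        return bytes(page)

    def write_page(self, page_id: int, data: bytes) -> None:
        if len(data) != self.page_size:
            raise ValueError("unexpected page size")
        records = self.diffs.setdefault(page_id, [])
        if page_id not in self.base or len(records) >= self.max_diffs:
            # First write, or consolidation: rewrite the whole page.
            self.base[page_id] = data
            records.clear()
            self.bytes_written += self.page_size
            return
        record = _diff(self.read_page(page_id), data)
        if record:                              # skip no-op writes
            records.append(record)
            self.bytes_written += sum(len(d) for _, d in record)
```

The caller sees only whole pages going in and out, which is the sense in which the DBMS above the driver needs no changes; the threshold-based consolidation is one simple policy among many that could bound reconstruction cost.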

The experimental evaluation uses the TPC‑C benchmark (≈1 GB of data) under a variety of update intensities. The test platform comprises an 8 GB DRAM buffer, a 256 GB NAND flash array, and a commodity SSD controller. Three schemes are compared: (a) traditional page‑based (PB), (b) log‑based (LB), and (c) the proposed PDL. Results show:

  • Write latency – PDL reduces write latency by 2.1×–5.3× relative to PB and by 1.3×–2.8× relative to LB. When an update changes ≤5 % of a page, write latency drops by more than 80 % because only a few bytes need to be transferred.
  • Read latency – LB suffers from high read latency due to the need to merge many log entries; PDL’s read latency is only 1.2×–2.0× higher than PB, because reconstruction requires at most a handful of differentials.
  • Overall I/O performance – Across all workloads, PDL improves I/O performance by 1.2×–6.1× over PB and LB, reflecting a balanced improvement in both reads and writes.
  • Flash endurance – By avoiding full‑page rewrites, PDL cuts the number of erase‑program cycles by 30 %–45 % in simulation, translating into an estimated 1.5×–2× increase in flash lifetime.

The authors also discuss limitations. Accumulated differentials can grow, increasing reconstruction cost; therefore, a periodic consolidation (full‑page rewrite) is required, and the optimal interval depends on workload characteristics. Differential computation incurs CPU overhead, which may become a bottleneck under extremely high update rates. Future work suggested includes hardware acceleration of differential generation (e.g., SIMD or FPGA), adaptive consolidation policies, and extending PDL to multi‑channel, multi‑plane flash architectures.

In summary, Page‑Differential Logging offers a practical, DBMS‑agnostic solution that mitigates the write amplification of page‑based methods while keeping read latency close to that of page‑based methods and well below that of log‑based ones. The experimental evidence confirms substantial I/O performance gains (up to 6.1×) and notable endurance benefits, making PDL a compelling candidate for integrating flash memory into existing database systems without extensive software redesign.

