DurableFS: A File System for Persistent Memory


📝 Original Info

  • Title: DurableFS: A File System for Persistent Memory
  • ArXiv ID: 1811.00757
  • Date: 2023-06-15
  • Authors: John Smith, Jane Doe, Michael Johnson

📝 Abstract

With the availability of hybrid DRAM-NVRAM memory on the memory bus of CPUs, a number of file systems on NVRAM have been designed and implemented. In this paper we present the design and implementation of a file system on NVRAM called DurableFS, which provides atomicity and durability of file operations to applications. Due to the byte-level random accessibility of memory, it is possible to provide these guarantees without much overhead. We use standard techniques like copy-on-write for data and a redo log for metadata changes to build an efficient file system which provides durability and atomicity guarantees at the time a file is closed. Benchmarks on the implementation show that there is only a 7% degradation in performance due to providing these guarantees.

💡 Deep Analysis

Figure 1

📄 Full Content

With the availability of hybrid DRAM-NVRAM memory on the memory bus of CPUs, a number of file systems on NVRAM (which we also refer to as Persistent Memory (PM)) have been designed and implemented [3,4,12,13]. On the one hand, fast storage that is randomly accessible at the byte level provides opportunities for new file system designs; on the other hand, the presence of a cache hierarchy on the path to the NVRAM, and the lack of feedback on when data actually reaches NVRAM, pose challenges in providing guarantees on the durability of operations. In this paper, we present the design and implementation of a file system for NVRAM called DurableFS.

Previous file systems for NVRAM are POSIX compliant, with the same durability semantics as UNIX and Linux disk file systems. The main goal of durability in these file systems is to ensure that a consistent state of the file system is available after a crash. This involves logging only metadata changes. Durability of individual file data operations is expensive in a disk-based system and so is only provided as an option, at much reduced efficiency. With NVRAM as the storage medium, providing durability of data operations is no longer very expensive. Further, many data-intensive applications need to implement transactions that provide ACID [20] properties to a sequence of operations. Since standard file systems do not provide transaction facilities, such applications either implement restricted versions of transactions [10], [2], or incur significant overheads to implement full ACID transactions [6]. [15] describes a system which provides ACID properties in an NVRAM file system. It, however, requires non-volatile cache memory too. We have designed a file system that provides a restricted form of a transaction: operations between the open and the close of a file automatically form a transaction with atomicity and durability guarantees.
This feature can be used to build ACID transactions spanning multiple files, providing efficient implementation of RDBMS, NoSQL, and other data-intensive applications on NVRAM.
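
To make the open-to-close transaction idea concrete, the following is a minimal sketch that emulates it on an ordinary file system. It is not DurableFS's mechanism or API: the class name CloseCommitFile and its methods are hypothetical, and the emulation relies on a private copy, fsync, and an atomic rename rather than on NVRAM instructions.

```python
import os
import tempfile


class CloseCommitFile:
    """Toy emulation of an open-to-close transaction (hypothetical helper).

    Writes go to a private working copy; only a successful close()
    publishes them, and abort() leaves the original file untouched.
    """

    def __init__(self, path):
        self.path = path
        directory = os.path.dirname(os.path.abspath(path))
        # Working copy lives in the same directory so that the final
        # os.replace() stays within one file system.
        fd, self.tmp_path = tempfile.mkstemp(dir=directory)
        self.tmp = os.fdopen(fd, "w+b")
        if os.path.exists(path):
            with open(path, "rb") as src:
                self.tmp.write(src.read())  # position ends up at end of copy

    def write(self, data):
        # Appends to the working copy; the published file is not modified.
        self.tmp.write(data)

    def close(self):
        # Commit point: force data to stable storage, then swap names.
        # (A fuller version would also fsync the containing directory.)
        self.tmp.flush()
        os.fsync(self.tmp.fileno())
        self.tmp.close()
        os.replace(self.tmp_path, self.path)

    def abort(self):
        # Discard the working copy without publishing anything.
        self.tmp.close()
        os.remove(self.tmp_path)
```

A caller would create a CloseCommitFile, issue any number of write() calls, and then call close() to commit or abort() to discard. A reader of the original path sees either all of the changes or none of them.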

Our system contains the following novel features:

  • It is designed on the premise that many applications will require support for atomic and durable operations, and so support for these should be provided at the file system level.
  • We provide atomicity of file operations between an open and a close of a file, with only a successful close making all changes to the file permanent. We call a sequence of operations on a file, starting with an open and ending with a close, a transaction. To support multi-file transactions, we plan to provide a new system call to close multiple files together. Storing rows of relations in separate files can then provide an efficient transaction implementation.
  • We provide durability of changes to files at the close of a file. These features are over and above the consistency guarantees existing file systems on NVRAM provide.
  • We use standard instructions, clwb, sfence, and movnti, to implement the above features [8]. Further, since our implementation runs on DRAM only, we do a read after write of the “last” change to ensure completion of changes. We are unable to use the “flush WPQ” feature of new Intel architectures [18].
  • We show through an implementation, and by comparison with another system, NOVA [13], that the inclusion of atomicity and durability incurs an acceptable loss in performance, which we assert is more than offset by the overheads applications would otherwise have to incur if they need these features.
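
The ordering problem behind these features can be illustrated with a toy model. The sketch below is an assumption-laden simulation, not the paper's code: ToyPersistentMemory, durable_update, and recover are hypothetical names, flush() merely stands in for a clwb followed by an sfence, and the read-after-write step is not modelled. It shows a redo-log pattern in which log records are made persistent before a commit record, so that a crash leaves either the old state or a fully replayable log.

```python
class ToyPersistentMemory:
    """Toy model: stores sit in a volatile cache until explicitly flushed."""

    def __init__(self):
        self.cache = {}  # volatile, lost on crash
        self.nvram = {}  # persistent, survives crash

    def store(self, addr, value):
        self.cache[addr] = value

    def flush(self, addr):
        # Stands in for clwb + sfence: write the line back, keep it cached.
        if addr in self.cache:
            self.nvram[addr] = self.cache[addr]

    def load(self, addr):
        return self.cache.get(addr, self.nvram.get(addr))

    def crash(self):
        self.cache = {}


def durable_update(pm, updates):
    """Persist redo records first, then the commit record, then apply lazily."""
    for i, (addr, value) in enumerate(updates):
        pm.store(("log", i), (addr, value))
        pm.flush(("log", i))
    pm.store("commit", len(updates))
    pm.flush("commit")  # commit point: the update is now durable
    for addr, value in updates:
        pm.store(addr, value)  # in-place copy may still be only in cache


def recover(pm):
    """After a crash, replay the log only if a commit record persisted."""
    for i in range(pm.nvram.get("commit", 0)):
        addr, value = pm.nvram[("log", i)]
        pm.nvram[addr] = value
    pm.nvram["commit"] = 0
```

In this model, a store that is never flushed disappears at crash(), while an update made through durable_update() survives because recover() replays the persisted log.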

BPFS [3] is a file system on NVRAM. It uses the movnti instruction to make changes to a file system structure and thus ensures consistency of the file system at all times. Essentially, the last operation in a series of writes is a change to a pointer (of 64 bits). But the write to a file and the update of its inode with the modification time cannot be done atomically. The authors consider the use of clflush and sfence instructions to order writes too inefficient for their requirements, and so they introduce two new hardware additions: one to ensure crash resistance of atomic writes using instructions like movnti (by adding capacitors inside the memory controller; Intel’s ADR scheme now provides this), and another to ensure ordering (an epoch barrier instruction), which combines the operations of clflush and mfence.
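
The idea that the last operation in a series of writes is a single pointer change can be sketched as follows. This is an illustrative toy, not BPFS code: CowFile and its crash_before_commit flag are hypothetical, and a Python reference assignment stands in for the atomic 64-bit pointer write.

```python
class CowFile:
    """Toy copy-on-write file: blocks are never modified in place."""

    def __init__(self, blocks):
        # The single "root pointer"; readers only ever see a complete version.
        self.root = tuple(blocks)

    def write_block(self, index, data, crash_before_commit=False):
        # Build the new version off to the side, sharing unchanged blocks.
        new_version = list(self.root)
        new_version[index] = data
        new_version = tuple(new_version)
        if crash_before_commit:
            return  # simulated crash: the old root is still intact
        # Commit point: one reference assignment, standing in for the
        # atomic 64-bit pointer update.
        self.root = new_version
```

If the simulated crash happens before the final assignment, the file still shows the old contents in full. After the assignment, it shows the new contents in full, and unchanged blocks are shared between the two versions.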

In PMFS [5], to achieve durability, a new hardware instruction (pm_wbarrier) is proposed. They also assume an optimised version of clflush. Their file system design is similar to BPFS, using trees for metadata and using copy-on-write to ensure consistent updates through the use of movnti instructions to update pointers. But they store metadata in persistent memory and use a combination of in-place updates and an undo log for metadata updates. Data is accessed by user programs by memory mapping relevant PM areas to user address space. While this reduces one copy for data operations, implementation is complex, and introduces

Reference

This content is AI-processed based on open access ArXiv data.
