A distributed file system for a wide-area high performance computing infrastructure

We describe our work in implementing a wide-area distributed file system for the NSF TeraGrid. The system, called XUFS, allows private distributed name spaces to be created for transparent access to personal files across over 9000 computer nodes. XUFS builds on many principles from prior distributed file systems research, but extends key design goals to support the workflow of computational science researchers. Specifically, XUFS supports file access from the desktop to the wide-area network seamlessly, survives transient disconnected operations robustly, and demonstrates comparable or better throughput than some current high performance file systems on the wide-area network.


💡 Research Summary

The paper presents XUFS, a wide-area distributed file system designed for the NSF TeraGrid, which interconnects more than 9,000 compute nodes across geographically dispersed sites via a dedicated 30 Gbps WAN. The authors begin by outlining the limitations of existing solutions such as GPFS-WAN, OpenAFS, and NFSv4: while these systems provide high-throughput parallel I/O, they require manual copying of a researcher's personal code and data to multiple sites, lack transparent access from a desktop environment, and offer limited support for the mobile personal workstations that scientists increasingly work from.

To address these gaps, the authors first analyze the typical computational science workflow: code development, source transfer, input staging, simulation execution, result analysis, staging of results back to the user, and archival. Empirical measurements of TeraGrid usage reveal that a small fraction of files (those larger than 100 MB) accounts for over 98% of the bytes transferred, indicating a strong bias toward large, sequential I/O. From this observation they derive five design assumptions: (1) personal file access outweighs inter-user sharing; (2) the user's workstation should act as the primary file server; (3) some files must never be copied back to the workstation; (4) client machines have ample local storage; and (5) the system should expose the native parallel file-system interface to applications.
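The skew toward large files can be illustrated with a toy calculation; the sizes below are invented for illustration and are not the paper's measurements:

```python
# Hypothetical size distribution: many small files, a few very large ones.
sizes_mb = [1] * 980 + [10_000] * 20   # 1,000 imaginary files

total = sum(sizes_mb)                              # all bytes transferred
large = sum(s for s in sizes_mb if s > 100)        # bytes in files > 100 MB
n_large = sum(1 for s in sizes_mb if s > 100)
share = large / total

print(f"{n_large} of {len(sizes_mb)} files hold {share:.1%} of the bytes")
# → 20 of 1000 files hold 99.5% of the bytes
```

Even in this made-up distribution, 2% of the files carry nearly all of the traffic, which is the pattern that motivates optimizing for large, sequential transfers.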

XUFS is implemented as a shared object library (libxufs.so) that is preloaded on the client side, intercepting standard libc file-system calls (open, read, write, stat, etc.). When a remote namespace is first mounted, a private cache directory is created on the client (typically on a parallel file-system partition at the TeraGrid site). Directory listings are fetched once and stored locally; subsequent metadata operations (stat, readdir) are served from this cache. File contents are transferred only on the first open, and all write operations are buffered in a shadow file. The buffered data is flushed to the remote server only when the file is closed, implementing "write-on-close" semantics that dramatically reduce WAN traffic for large files.
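The caching behavior can be sketched as follows. The class name and API here are hypothetical: XUFS itself interposes on libc calls from a preloaded C shared object rather than exposing anything like a Python class, but the fetch-once / buffer-locally / flush-on-close life cycle is the same in spirit:

```python
import os
import shutil

class WriteOnCloseFile:
    """Illustrative sketch of write-on-close semantics (hypothetical API).

    Reads are served from a local copy fetched on first open; writes go to
    a shadow file and reach the "remote" path only when the file is closed.
    """

    def __init__(self, remote_path, cache_dir):
        self.remote_path = remote_path
        self.cache_path = os.path.join(cache_dir, os.path.basename(remote_path))
        if not os.path.exists(self.cache_path):
            # First open: fetch the file contents into the private cache.
            shutil.copy(remote_path, self.cache_path)
        # All I/O happens against a shadow file seeded from the cache.
        self.shadow = open(self.cache_path + ".shadow", "w+b")
        with open(self.cache_path, "rb") as f:
            self.shadow.write(f.read())
        self.shadow.seek(0)

    def read(self, n=-1):
        return self.shadow.read(n)

    def write(self, data):
        self.shadow.write(data)          # buffered locally; no WAN traffic yet

    def close(self):
        self.shadow.seek(0)
        data = self.shadow.read()
        self.shadow.close()
        with open(self.remote_path, "wb") as f:   # single flush on close
            f.write(data)
        with open(self.cache_path, "wb") as f:    # keep the cache current
            f.write(data)
        os.remove(self.shadow.name)
```

A file written through this sketch leaves the remote copy untouched until `close()`, at which point the entire buffered contents cross the network once; a second open is then served from the cache without another fetch.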

Cache consistency is maintained through a TCP‑based notification callback manager. The remote file server registers with each client; any modification on the server side triggers an invalidation message, forcing the client to discard or refresh the cached copy before the next access. This mechanism guarantees that a client always sees a view of the file consistent with the server, even after network partitions. In the event of a client crash, a command‑line tool allows users to replay pending meta‑operations stored in a queue. Server crashes are handled by a cron‑based recovery script that restarts the service, while clients periodically attempt to re‑establish the notification channel.
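The notification channel can be sketched with plain TCP sockets. The one-line `INVALIDATE <path>` message format and the paths below are invented for illustration; the paper does not specify XUFS's wire protocol:

```python
import socket
import threading

def serve_invalidations(conn, paths):
    """Server side: push an invalidation message for each modified path."""
    for p in paths:
        conn.sendall(f"INVALIDATE {p}\n".encode())
    conn.close()

def client_loop(sock, cache):
    """Client side: evict each named path from the local cache."""
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:                      # server closed the channel
            break
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            _, path = line.decode().split(" ", 1)
            cache.pop(path, None)          # discard the stale cached copy

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

cache = {"/home/user/run.cfg": b"old", "/home/user/out.dat": b"old"}
client = socket.create_connection(("127.0.0.1", port))
conn, _ = server.accept()

t = threading.Thread(target=serve_invalidations,
                     args=(conn, ["/home/user/run.cfg"]))
t.start()
client_loop(client, cache)                 # returns when the server closes
t.join()
client.close()
server.close()

print(sorted(cache))                       # → ['/home/user/out.dat']
```

The next access to the evicted path would then miss the cache and refetch from the server, which is how the real system restores a consistent view after a modification.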

Performance experiments compare XUFS with GPFS‑WAN under identical WAN conditions. For large files (≥500 MB), XUFS achieves throughput comparable to or slightly better than GPFS‑WAN, primarily because it eliminates the initial metadata round‑trip that GPFS‑WAN incurs for each file. Moreover, XUFS demonstrates robust behavior during simulated network outages: clients continue to operate on their local cache, and once connectivity is restored, pending changes are automatically synchronized without user intervention.

The authors conclude that XUFS successfully blends the high‑performance parallel I/O capabilities of existing WAN file systems with new features tailored to modern scientific practice: transparent access from personal workstations, resilience to transient disconnections, and user‑controlled data placement policies. Future work includes strengthening security (e.g., Kerberos integration), extending cross‑site sharing mechanisms, and adding automated tiered storage (SSD ↔ HDD) to further optimize performance and cost.

