Towards Marrying Files to Objects


To deal with the constant growth of unstructured data, vendors have deployed scalable, resilient, and cost-effective object-based storage systems built on RESTful web services. However, many applications rely on richer file-system APIs and semantics, and cannot benefit from object stores. This leads to storage sprawl, as object stores are deployed alongside file systems and data is accessed and managed across both systems in an ad-hoc fashion. We believe there is a critical need for a transparent merger of objects and files, consolidating data into a single platform. Such a merger would extend the capabilities of both object and file stores while preserving existing semantics and interfaces. In this position paper, we examine the viability of unifying object stores and file systems, and the various design tradeoffs that exist. Then, using our own implementation of an object-based, POSIX-complete file system, we experimentally demonstrate several critical design considerations.


💡 Research Summary

The paper addresses the growing problem of “storage sprawl” that arises when enterprises deploy both traditional file systems and modern object storage systems. While object stores such as Amazon S3 or OpenStack Swift provide excellent scalability, cost efficiency, and a simple GET/PUT/DELETE API, many existing applications still rely on POSIX‑defined file‑system semantics—hierarchical namespaces, hard‑ and symbolic links, in‑place updates, and fine‑grained permissions. Because these two worlds are usually kept separate, data ends up duplicated or managed through ad‑hoc bridges, leading to over‑provisioning, higher operational costs, and degraded QoS.

The authors propose a “dual‑access” solution called ObjectFS, a FUSE‑based file system that sits on top of a generic object store while exposing a full POSIX interface. The core idea is that the same data can be accessed either through traditional file‑system calls (open, read, write, rename, etc.) or directly via the native object API, without requiring any changes to the applications. To explore the design space, the paper systematically examines four major dimensions:

  1. File‑to‑object mapping – four strategies are considered:

    • 1⇒1 (one file per object) – simplest and most intuitive, but any modification forces a full object rewrite, severely limiting write performance.
    • 1⇒N (file split into multiple objects) – enables partial updates and improves write throughput; however, it introduces extra metadata and makes direct object‑API access more complex.
    • N⇒1 (multiple files packed into a single object) – can boost throughput for workloads that frequently read many small files together, but essentially eliminates file‑level random access.
    • Hybrid – combines the above, e.g., creating new objects for each write (extents) and later merging them in the background, trading consistency for performance.
  2. Object naming policy – how a file is translated into an object identifier:

    • FILE‑NAME – identical names; suffers from flat‑namespace collisions.
    • FILE‑PATH – full path as object name; avoids collisions but makes rename/move operations expensive because each affected object must be copied.
    • INODE‑NUMBER – uses the file system’s inode number; cheap renames but requires a lookup step for object‑side clients.
    • USER‑DEFINED – lets administrators or applications supply custom naming functions, offering flexibility at the cost of added complexity.
  3. Metadata storage location – three alternatives:

    • IN‑OBJECT – metadata stored inside the same object or in dedicated “metadata objects”. This can expose internal metadata to object‑side clients and incurs high latency due to object‑store round‑trips.
    • IN‑OBJECT‑META – leverages object‑store‑provided user‑defined metadata (similar to extended attributes). It offers low‑latency metadata access but depends on richer APIs that are not universally available.
    • IN‑DEPENDENT – stores all file‑system metadata in an external, low‑latency key‑value store (Redis in the prototype). This yields fast inode lookups and stat calls, at the expense of additional system components and synchronization logic.
  4. Caching strategy – two models:

    • Local cache – each client node maintains its own RAM/SSD cache. Provides low latency for that node but requires explicit coherence mechanisms; otherwise, object‑native applications may see stale data.
    • Unified (distributed) cache – a shared cache service (e.g., Redis or Memcached) that all clients consult. It not only speeds up file‑system operations but can also re‑export a coherent object‑API endpoint, giving object‑native workloads the same cache benefits.
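
The naming policies in dimension 2 can be sketched as simple key-derivation functions. The following is a hypothetical illustration of the trade-offs (collisions vs. rename cost vs. lookup indirection), not code from the paper:

```python
# Hypothetical sketches of three of the object-naming policies described
# above; function names are illustrative, not from the ObjectFS source.

def name_by_filename(path: str) -> str:
    # FILE-NAME: basename only; "a/x" and "b/x" collide in a flat namespace.
    return path.rsplit("/", 1)[-1]

def name_by_path(path: str) -> str:
    # FILE-PATH: collision-free, but a rename changes the key, so the
    # object must be copied under the new name.
    return path

def name_by_inode(inode: int) -> str:
    # INODE-NUMBER: stable across renames; object-side clients first need
    # a path -> inode lookup against the metadata service.
    return f"inode-{inode}"

# FILE-NAME collides on identical basenames:
assert name_by_filename("a/x") == name_by_filename("b/x") == "x"
# FILE-PATH keeps the same two files distinct:
assert name_by_path("a/x") != name_by_path("b/x")
```

A USER-DEFINED policy would simply let the administrator plug in their own key-derivation function in place of the three above.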

The implementation of ObjectFS uses FUSE for the user‑space file‑system layer, Redis as the metadata service, and a thin object library that supports both S3 and Swift. The prototype adopts the 1⇒1 mapping for simplicity, the FILE‑PATH naming policy, and stores metadata in Redis (IN‑DEPENDENT). A write‑back cache is enabled by default: data is written to Redis on each write() call and flushed to the object store only when the file is closed. The system also exploits multipart upload/download to parallelize large object transfers.
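The effect of the write-back cache can be illustrated with a minimal sketch (assumed behavior, not the actual ObjectFS implementation): writes are absorbed by an in-memory cache keyed by path, and the whole object is uploaded once on close, matching the prototype's 1⇒1 mapping and FILE-PATH naming:

```python
# Minimal write-back sketch: many small write() calls, one PUT on close.
# The dict stands in for S3/Swift; the cache stands in for Redis.

class WriteBackFS:
    def __init__(self, object_store: dict):
        self.store = object_store      # backend object store (S3/Swift)
        self.cache = {}                # write-back cache (Redis in ObjectFS)
        self.puts = 0                  # number of object-store uploads

    def write(self, path: str, offset: int, data: bytes):
        # Writes only touch the cache; no object-store traffic yet.
        buf = bytearray(self.cache.get(path, b""))
        buf[offset:offset + len(data)] = data
        self.cache[path] = bytes(buf)

    def close(self, path: str):
        # One full-object PUT on close, regardless of how many writes.
        self.store[path] = self.cache.pop(path)
        self.puts += 1

fs = WriteBackFS(object_store={})
for i in range(100):                   # 100 one-byte writes
    fs.write("/data/log", i, b"x")
fs.close("/data/log")
print(fs.puts)                         # prints 1: a single upload, not 100
```

Without the cache, each of those 100 writes would trigger a read-modify-write of the entire object under the 1⇒1 mapping, which is exactly the pathology the evaluation measures below.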

Evaluation is performed on an AWS t2.2xlarge instance with S3 as the backend. Four workloads are examined: sequential reads, sequential writes, random writes, and file renames. Key findings include:

  • Sequential reads: With multipart downloads (2, 4, 8 threads), ObjectFS achieves 80‑95 % of raw S3 bandwidth. The remaining gap is mainly due to metadata lookups and cache overhead.
  • Sequential writes: Without caching, each write triggers a read‑modify‑write cycle on the whole object, limiting throughput to < 2 MB/s. Enabling the write‑back cache and multipart uploads raises throughput to ~30 MB/s, comparable to native S3 performance.
  • Random writes: A 1⇒N variant of the prototype (splitting files into chunks), combined with caching, shows substantial gains over the naïve 1⇒1 scheme, confirming that partial updates are essential for write‑intensive workloads.
  • Renames: Using the FILE‑PATH naming policy, renaming a directory forces a copy of every object in that directory, causing severe performance degradation. In contrast, the INODE‑NUMBER policy updates only the inode‑to‑object mapping, making renames cheap.
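
The rename finding above amounts to a back-of-envelope cost model (a hypothetical illustration; the function and numbers are not from the paper):

```python
# Directory-rename cost under the two naming policies. With FILE-PATH,
# every object's key embeds its path, so each of the N objects must be
# copied; with INODE-NUMBER, keys never change and only namespace
# metadata in the metadata service is rewritten.

def rename_cost(policy: str, n_files: int) -> int:
    """Return the number of object copies forced by a directory rename."""
    if policy == "FILE-PATH":
        return n_files          # one server-side copy per affected object
    if policy == "INODE-NUMBER":
        return 0                # metadata-only update, no object traffic
    raise ValueError(f"unknown policy: {policy}")

print(rename_cost("FILE-PATH", 10_000))     # prints 10000
print(rename_cost("INODE-NUMBER", 10_000))  # prints 0
```

The asymmetry grows linearly with directory size, which is why the FILE-PATH policy degrades so severely on rename-heavy workloads.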

These experiments validate the authors’ design trade‑offs: the choice of mapping and naming directly influences the cost of metadata‑heavy operations (renames, moves), while caching and multipart transfers are critical to hide the high latency and large‑object constraints of underlying object stores.

Conclusion: ObjectFS demonstrates that a carefully engineered “dual‑access” file system can bridge the gap between POSIX‑centric applications and modern object storage, thereby reducing storage sprawl and preserving the economic benefits of object stores. The paper’s systematic taxonomy of design options, coupled with quantitative measurements, provides a valuable roadmap for future research. Open challenges remain, such as handling highly concurrent workloads, supporting richer object‑store features (versioning, ACLs), and ensuring strong consistency across the metadata service and the cache. Nonetheless, the work establishes a solid foundation for unified storage platforms that can serve both legacy and cloud‑native workloads.

