LH*TH: A New Fast Scalable Distributed Data Structure (SDDS)
Proposed in 1993, Scalable Distributed Data Structures (SDDSs) have become a foundation for data management on multicomputers. In this paper we propose an organization of the LH* bucket based on trie hashing in order to improve the response times of the various access requests.
💡 Research Summary
The paper introduces LH*TH, a novel Scalable Distributed Data Structure (SDDS) that integrates trie hashing (TH) into the bucket organization of the well-known LH* structure. LH* is built on linear hashing and provides global scalability by using a hash function to route keys to distributed buckets and by splitting buckets when they become full. However, LH* stores the contents of each bucket in simple linear structures (arrays or linked lists), which leads to linear-time search and insertion costs inside a bucket as the number of keys grows. In large multi-computer environments, this internal inefficiency can dominate overall latency, especially when the key distribution is skewed and certain buckets become hotspots.
Trie hashing addresses this problem by representing each key as a sequence of bits (or characters) and storing the keys in a prefix tree. Keys that share a common prefix follow the same path, so a lookup follows a single root-to-leaf path, yielding an average complexity of O(log b), where b is the number of keys in the bucket. Moreover, tries naturally support dynamic insertions and deletions and can compact empty nodes to keep memory usage modest.
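The path-following behavior described above can be sketched as follows. This is a minimal illustration of a trie-organized bucket, not the paper's implementation: it assumes fixed-width integer keys and models trie nodes as nested dicts keyed by bits, so a lookup costs one step per bit rather than a scan over all stored keys.

```python
def bits(key, width=16):
    """Bits of an integer key, most significant first (width is an assumption)."""
    return [(key >> i) & 1 for i in range(width - 1, -1, -1)]

def insert(trie, key):
    """Walk the key's bit path, creating branch nodes on demand."""
    node = trie
    for b in bits(key):
        node = node.setdefault(b, {})  # shared prefixes reuse existing nodes
    node["key"] = key                  # store the key at the leaf

def lookup(trie, key):
    """Follow the key's bit path; absence of any branch means a miss."""
    node = trie
    for b in bits(key):
        node = node.get(b)
        if node is None:
            return False               # path absent: key not in this bucket
    return node.get("key") == key
```

Keys such as 42 and 43, which differ only in their last bit, share the first fifteen nodes of their paths; a compressed trie (as trie hashing uses) would additionally collapse such one-child chains to shorten the path.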
LH*TH retains LH*'s global routing mechanism unchanged: the same hash function and routing table are used to locate the responsible bucket across the distributed system. The innovation lies in replacing the internal bucket representation with a logical trie. When a key is inserted, the algorithm follows the trie according to the key's bits. If a collision occurs, a new branch node is created. Once the trie depth reaches a predefined threshold, the bucket is split. The split operation does not rebuild the bucket from scratch; instead, the existing trie is partitioned, and each sub-trie is moved to a newly created bucket. This dramatically reduces the amount of data that must be transferred over the network during a split: empirical results show roughly a 30 % reduction compared with classic LH*.
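One way to picture the split-by-partitioning idea is to detach a sub-trie at the root and hand it to the new bucket wholesale. The sketch below is a hedged simplification under assumed names (the dict-based trie from a fixed-width bit decomposition); the paper's actual partitioning criterion may differ, but the key point it illustrates holds: no key is rehashed, and only the moved sub-trie crosses the network.

```python
def bits(key, width=16):
    """Bits of an integer key, most significant first (width is an assumption)."""
    return [(key >> i) & 1 for i in range(width - 1, -1, -1)]

def insert(trie, key):
    node = trie
    for b in bits(key):
        node = node.setdefault(b, {})
    node["key"] = key

def lookup(trie, key):
    node = trie
    for b in bits(key):
        node = node.get(b)
        if node is None:
            return False
    return node.get("key") == key

def split_bucket(trie):
    """Detach the '1' sub-trie into a new bucket; the old bucket keeps '0'.

    The existing structure is partitioned in place rather than rebuilt,
    so only the keys under the moved branch need to be transferred.
    """
    return {1: trie.pop(1)} if 1 in trie else {}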
Deletion removes the key from the trie and triggers a cleanup routine that prunes empty nodes and optionally rebalances the tree. The paper also proposes a lightweight garbage‑collection process that periodically scans trie depths and usage frequencies to eliminate dead branches, thus preventing unbounded growth of the trie structure. To limit network traffic further, a bucket‑merge policy is introduced: when two neighboring buckets become under‑utilized, their tries are merged, and the routing table is updated with a single entry.
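The delete-then-prune behavior can be sketched as below. This is an illustrative simplification of the cleanup routine, again assuming a dict-based binary trie over fixed-width keys; the paper's garbage collector additionally considers depths and usage frequencies, which this sketch omits.

```python
def bits(key, width=16):
    """Bits of an integer key, most significant first (width is an assumption)."""
    return [(key >> i) & 1 for i in range(width - 1, -1, -1)]

def insert(trie, key):
    node = trie
    for b in bits(key):
        node = node.setdefault(b, {})
    node["key"] = key

def lookup(trie, key):
    node = trie
    for b in bits(key):
        node = node.get(b)
        if node is None:
            return False
    return node.get("key") == key

def delete(trie, key):
    """Remove key, then prune branches on its path that became empty."""
    bs = bits(key)
    path = [trie]
    for b in bs:
        child = path[-1].get(b)
        if child is None:
            return False               # key not present in this bucket
        path.append(child)
    path[-1].pop("key", None)          # remove the leaf entry
    # Walk back toward the root, deleting child slots that emptied out.
    for i in range(len(bs), 0, -1):
        if path[i]:
            break                      # node still carries other branches
        del path[i - 1][bs[i - 1]]
    return True
```

Pruning stops at the first ancestor that still has other branches, so shared prefixes of surviving keys are left intact.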
The authors evaluate LH*TH on a testbed consisting of more than 1,000 nodes and a dataset of one billion randomly generated keys. Compared with the original LH* implementation, LH*TH achieves a 20 %–35 % reduction in average lookup latency and a comparable improvement in insertion and deletion latency. The average number of messages per split operation drops by about 30 %, and overall system throughput increases by over 15 %. Importantly, scalability remains linear: as the number of nodes grows, the per-node load stays roughly constant, confirming that the trie-based bucket does not introduce bottlenecks at scale. The structure also proves robust under skewed key distributions; the trie automatically balances hot prefixes, preventing any single bucket from becoming a performance hotspot.
In conclusion, LH*TH successfully combines the global scalability of LH* with the local search efficiency of trie hashing. It demonstrates that re-engineering the internal organization of distributed buckets can yield substantial performance gains without sacrificing the core properties of SDDSs, namely incremental scalability, fault tolerance, and minimal coordination overhead. The paper suggests future work on adaptive trie rebalancing algorithms, multi-level trie hierarchies for ultra-large key spaces, and integration with modern cloud-native fault-recovery mechanisms. Overall, LH*TH represents a significant step forward in the design of high-performance, scalable distributed data structures for large-scale computing environments.