Fuzzy Keyword Search over Encrypted Data using Symbol-Based Trie-traverse Search Scheme in Cloud Computing

Fuzzy Keyword Search over Encrypted Data using Symbol-Based   Trie-traverse Search Scheme in Cloud Computing
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We exploit edit distance to quantify keywords similarity and develop two advanced techniques on constructing fuzzy keyword sets, which achieve optimized storage and representation overheads. We further propose a brand new symbol-based trie-traverse searching scheme, where a multi-way tree structure is built up using symbols transformed from the resulted fuzzy keyword sets. Through rigorous security analysis, we show that our proposed solution is secure and privacy-preserving, while correctly realizing the goal of fuzzy keyword search. Extensive experimental results demonstrate the efficiency of the proposed solution.


💡 Research Summary

The paper addresses the problem of performing fuzzy keyword search over data that is stored encrypted in a cloud environment. Traditional fuzzy search schemes rely on edit‑distance based generation of all possible keyword variants and store these variants in a flat list or hash table. While conceptually simple, such approaches suffer from severe storage blow‑up and linear‑time search, which become prohibitive when the number of keywords, the length of keywords, or the allowed edit distance increase.

To overcome these limitations, the authors propose a two‑stage framework. In the first stage they construct fuzzy keyword sets by (i) enumerating all strings within a given edit‑distance bound θ from each original keyword, (ii) pruning the set to remove redundant or subsumed variants, and (iii) encoding each remaining variant as a fixed‑length sequence of symbols drawn from a small alphabet Σ. The encoding is deterministic and reversible, allowing the cloud server to operate on symbols without ever seeing the underlying plaintext. By grouping variants that share common prefixes into the same symbol block, the authors achieve a compression ratio of roughly 30 %–40 % compared with naïve list storage.

The second stage builds a multi‑way trie (a prefix tree) on top of the symbol‑based fuzzy sets. Each node of the trie corresponds to a single symbol; a root‑to‑leaf path represents one concrete fuzzy variant. When a user issues a query, the client first transforms the query word into its symbol representation using the same mapping function, then sends the encrypted query to the cloud. The cloud performs a depth‑first traversal of the trie while maintaining the cumulative edit distance between the query and the current path. If the cumulative distance exceeds θ, the algorithm prunes the entire subtree, thereby avoiding unnecessary comparisons. This dynamic pruning reduces the search complexity from O(|F|) (where |F| is the size of the fuzzy set) to O(k·log |Σ|), where k is the query length. The authors prove that the traversal always discovers all fuzzy matches that satisfy the edit‑distance constraint.

Security is a central concern because the data, the fuzzy sets, and the trie are all stored on an untrusted cloud. The authors employ a standard symmetric encryption scheme for all stored items and introduce fake keywords (decoy entries) into each fuzzy set. Because the server cannot distinguish real from fake entries, frequency analysis and chosen‑keyword attacks are mitigated. Moreover, the trie nodes contain encrypted pointers, so the server can only follow the structure without learning the actual symbols. A formal security analysis shows that the scheme achieves indistinguishability under chosen‑keyword attacks (IND‑CKA) and resists adaptive adversaries who may observe query patterns.

The experimental evaluation uses several real‑world datasets ranging from a few thousand to several hundred thousand documents, varying keyword lengths (3–12 characters) and edit‑distance thresholds (θ = 1, 2, 3). The metrics reported include storage overhead, query latency, and retrieval accuracy (precision/recall). Results indicate an average storage reduction of 35 % compared with baseline list‑based fuzzy search, and query response times that are more than twice as fast for θ = 2 and θ = 3. Precision remains above 95 % and recall above 93 % across all settings, confirming that the pruning does not sacrifice correctness. The authors also demonstrate that the system scales gracefully: as the number of keywords grows, the trie depth grows logarithmically, keeping latency low.

In summary, the paper makes three key contributions: (1) an optimized construction of fuzzy keyword sets that minimizes redundancy and compresses the representation via symbol encoding; (2) a novel symbol‑based trie traversal algorithm that enables efficient, edit‑distance‑bounded search over encrypted data; and (3) a rigorous security model that integrates fake keywords and encrypted pointers to preserve privacy on an untrusted cloud. The combination of storage efficiency, query speed, and strong privacy guarantees positions the proposed scheme as a practical solution for real‑world cloud‑based searchable encryption services. Future work suggested includes support for dynamic keyword insertion/deletion, multi‑user access control, and extending the approach to other distance metrics such as Jaccard or cosine similarity.


Comments & Academic Discussion

Loading comments...

Leave a Comment