Image Retrieval System Base on EMD Similarity Measure and S-Tree

The paper constructs a binary signature for each image from the percentage of pixels in each color, and defines a similarity measure between images based on EMD (Earth Mover’s Distance). It then builds an S-tree, organized by the EMD similarity measure, to store the images’ binary signatures so that signature data can be queried quickly. On this basis, the paper develops an image retrieval algorithm and a CBIR (Content-Based Image Retrieval) system that use the EMD measure and the S-tree. Finally, it implements an application and experimentally evaluates image queries on a database of more than 10,000 images.

💡 Research Summary

The paper presents a content‑based image retrieval (CBIR) system that combines a compact binary signature, the Earth Mover’s Distance (EMD) as a similarity measure, and an S‑tree indexing structure to enable fast queries over a large color‑image database. First, each image is transformed into a fixed‑length binary signature. The original RGB (or HSV) image is quantized into a predefined number of color bins; the proportion of pixels falling into each bin is computed, normalized, and then binarized using a simple threshold. The resulting bit string (e.g., 128 or 256 bits) captures the global color distribution while requiring only linear time in the number of pixels.
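The signature pipeline described above (quantize, count proportions, threshold) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the uniform RGB quantization, the function names, the `levels` parameter, and the 0.05 threshold are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_bin(pixel: Pixel, levels: int = 4) -> int:
    """Map an RGB pixel to one of levels**3 uniform color bins (assumed scheme)."""
    r, g, b = (channel * levels // 256 for channel in pixel)
    return (r * levels + g) * levels + b


def color_histogram(pixels: Sequence[Pixel], levels: int = 4) -> List[float]:
    """Return the fraction of pixels that fall into each color bin."""
    counts = [0] * (levels ** 3)
    for pixel in pixels:  # one pass over the pixels: linear time
        counts[color_bin(pixel, levels)] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def binary_signature(histogram: Sequence[float], threshold: float = 0.05) -> str:
    """Set bit i to 1 when bin i holds at least `threshold` of the pixels."""
    return "".join("1" if share >= threshold else "0" for share in histogram)


if __name__ == "__main__":
    # Toy "image": three pure-red pixels and one pure-blue pixel.
    pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
    hist = color_histogram(pixels, levels=2)  # 2**3 = 8 bins
    print(binary_signature(hist, threshold=0.05))  # prints 01001000
```

With `levels=4` the sketch yields 64-bit signatures. Longer signatures, such as the 128 or 256 bits mentioned above, would need a finer quantization or several bits per bin.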

For similarity assessment, the authors adopt EMD, which treats one color histogram as piles of earth and the other as holes to be filled; the distance is the minimum total work needed to move the earth into the holes. By defining a ground distance based on Euclidean color differences, the minimum transportation cost provides a perceptually meaningful metric that outperforms naïve L1 or L2 distances on color histograms. Although solving EMD exactly takes roughly cubic time in the number of bins, the limited dimensionality of the quantized histograms keeps the computation tractable for the intended scale.
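For reference, EMD is commonly written as a transportation problem. The summary does not reproduce the paper's own notation, so the symbols here are assumptions: $p_i$ and $q_j$ are the bin weights of the two histograms, $d_{ij}$ is the ground distance between bin $i$ and bin $j$, and $f_{ij}$ is the amount of mass moved from bin $i$ to bin $j$.

```latex
\begin{aligned}
\min_{f_{ij} \ge 0} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{s.t.} \quad & \sum_{j=1}^{n} f_{ij} \le p_i, \qquad i = 1, \dots, m, \\
& \sum_{i=1}^{m} f_{ij} \le q_j, \qquad j = 1, \dots, n, \\
& \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\Big( \sum_{i=1}^{m} p_i,\; \sum_{j=1}^{n} q_j \Big),
\end{aligned}
\qquad
\mathrm{EMD}(P, Q) = \frac{\sum_{i,j} f^{*}_{ij}\, d_{ij}}{\sum_{i,j} f^{*}_{ij}}
```

Here $f^{*}$ denotes an optimal flow. When both histograms are normalized to total mass 1, the denominator equals 1 and EMD reduces to the minimum transportation cost.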

To avoid exhaustive pairwise comparisons, the paper introduces an S‑tree, a multi‑branch balanced tree reminiscent of a B‑tree but organized around distance metrics. Each internal node stores a small set of representative signatures and the associated EMD radii that bound the signatures in its subtree. During insertion, a new signature is routed to the child whose representative yields the smallest EMD; when a node overflows, the representatives are recomputed and the node is split, preserving balance and locality.
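The routing and splitting behaviour described above might look like the following sketch. It is an illustrative reading of the summary, not the paper's S-tree: the class and method names, the farthest-pair split heuristic, and the `capacity` parameter are assumptions, and Hamming distance stands in for EMD so that the example stays self-contained.

```python
from typing import Callable, List, Optional, Union


def hamming(a: str, b: str) -> int:
    """Stand-in metric (NOT EMD): number of differing bits."""
    return sum(x != y for x, y in zip(a, b))


class Node:
    def __init__(self, representative: str, is_leaf: bool):
        self.representative = representative
        self.radius = 0.0  # upper bound on distance to any signature below
        self.is_leaf = is_leaf
        self.entries: List[Union[str, "Node"]] = []  # signatures or child nodes


class STree:
    def __init__(self, distance: Callable[[str, str], float] = hamming,
                 capacity: int = 4):
        self.distance, self.capacity = distance, capacity
        self.root: Optional[Node] = None

    def insert(self, sig: str) -> None:
        if self.root is None:
            self.root = Node(sig, is_leaf=True)
            self.root.entries.append(sig)
            return
        sibling = self._insert(self.root, sig)
        if sibling is not None:  # root was split: grow the tree by one level
            new_root = Node(self.root.representative, is_leaf=False)
            new_root.entries = [self.root, sibling]
            self._refresh(new_root)
            self.root = new_root

    def _insert(self, node: Node, sig: str) -> Optional[Node]:
        if node.is_leaf:
            node.entries.append(sig)
        else:
            # Route to the child whose representative is closest.
            child = min(node.entries,
                        key=lambda c: self.distance(sig, c.representative))
            sibling = self._insert(child, sig)
            if sibling is not None:
                node.entries.append(sibling)
        if len(node.entries) > self.capacity:
            return self._split(node)
        self._refresh(node)
        return None

    def _key(self, entry: Union[str, Node]) -> str:
        return entry.representative if isinstance(entry, Node) else entry

    def _split(self, node: Node) -> Node:
        keys = [self._key(e) for e in node.entries]
        # Assumed heuristic: pick two far-apart seeds as new representatives.
        seed_a = max(keys, key=lambda k: self.distance(node.representative, k))
        seed_b = max(keys, key=lambda k: self.distance(seed_a, k))
        left, right = [], []
        for e in node.entries:
            k = self._key(e)
            near_a = self.distance(k, seed_a) <= self.distance(k, seed_b)
            (left if near_a else right).append(e)
        if not right:  # all keys coincide: fall back to an even split
            half = len(left) // 2
            left, right = left[:half], left[half:]
        node.representative, node.entries = seed_a, left
        sibling = Node(seed_b, node.is_leaf)
        sibling.entries = right
        self._refresh(node)
        self._refresh(sibling)
        return sibling

    def _refresh(self, node: Node) -> None:
        """Recompute the covering radius from the node's direct entries."""
        if node.is_leaf:
            node.radius = max(self.distance(node.representative, s)
                              for s in node.entries)
        else:
            node.radius = max(
                self.distance(node.representative, c.representative) + c.radius
                for c in node.entries)
```

Because splits propagate upward and the tree only grows at the root, all leaves stay at the same depth, as in a B-tree. The radius bound for internal nodes relies on the triangle inequality, so this sketch assumes the distance function is a true metric.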

Query processing proceeds hierarchically. The query image is first encoded into its binary signature. Starting at the root, the algorithm computes the EMD between the query signature and each node’s representative. Subtrees whose lower‑bound distance exceeds a predefined threshold are pruned, dramatically reducing the search space. At leaf nodes, the exact EMD between the query and stored signatures is evaluated, and the results are ranked. This branch‑and‑bound approach yields sub‑linear query times while maintaining high retrieval accuracy.
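The pruning step can be illustrated with a small range query. This is a hedged sketch rather than the paper's algorithm: the `Node` layout and the `range_query` name are hypothetical, the lower bound "distance to representative minus radius" is an assumption based on the triangle inequality, and Hamming distance again stands in for EMD.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    representative: str
    radius: float  # every signature below lies within this distance
    children: List["Node"] = field(default_factory=list)
    signatures: List[str] = field(default_factory=list)  # used by leaves only


def hamming(a: str, b: str) -> int:
    """Stand-in metric (NOT EMD): number of differing bits."""
    return sum(x != y for x, y in zip(a, b))


def range_query(root: Node, query: str, threshold: float,
                distance: Callable[[str, str], float] = hamming
                ) -> List[Tuple[float, str]]:
    """Return (distance, signature) pairs within `threshold`, closest first."""
    results = []
    stack = [root]
    while stack:
        node = stack.pop()
        # No signature in this subtree can be closer than this bound.
        lower_bound = distance(query, node.representative) - node.radius
        if lower_bound > threshold:
            continue  # prune the whole subtree
        if node.children:
            stack.extend(node.children)
        else:
            for sig in node.signatures:
                d = distance(query, sig)  # exact distance at the leaf
                if d <= threshold:
                    results.append((d, sig))
    results.sort()
    return results


if __name__ == "__main__":
    leaf_a = Node("0000", 1, signatures=["0000", "0001"])
    leaf_b = Node("1111", 1, signatures=["1111", "1110"])
    root = Node("0000", 5, children=[leaf_a, leaf_b])
    print(range_query(root, "0001", threshold=1))
    # prints [(0, '0001'), (1, '0000')]; leaf_b is pruned without being scanned
```

The bound is only safe when the distance obeys the triangle inequality. With a non-metric measure, pruning of this kind could discard valid matches.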

The authors evaluate the system on a collection of more than 10,000 color images. They measure signature generation time, index construction time, average query latency, and retrieval quality (precision and recall). Compared with a baseline that linearly scans all histograms using L2 distance, the proposed method reduces average query time by over 70 % while preserving precision and recall above 0.85. Experiments also explore the trade‑off between signature length and performance: longer signatures improve discrimination but increase index size and query cost.
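For readers unfamiliar with the retrieval-quality metrics mentioned above, precision and recall for a single query can be computed as in this short sketch. The function name and the toy image identifiers are illustrative assumptions and do not come from the paper's experiments.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str],
                     relevant: Iterable[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy query: 4 images returned, 3 images are truly relevant, 2 overlap.
    p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
    print(p, round(r, 3))  # prints 0.5 0.667
```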

Strengths of the work include the simplicity of the binary signature, the perceptual relevance of EMD, and the effective use of an S‑tree to achieve scalable search. Limitations are acknowledged: the method relies solely on color information, ignoring texture and shape cues that are important for many retrieval tasks; exact EMD remains computationally intensive, suggesting that approximate or hybrid distance measures could be beneficial for real‑time applications; and the S‑tree’s split strategy based on average distances may lead to imbalance when data are highly non‑uniform.

Future directions proposed by the authors involve extending the signature to incorporate multi‑modal features (e.g., texture descriptors, shape signatures), integrating fast approximate EMD algorithms, and developing dynamic rebalancing techniques for the S‑tree to handle evolving databases. Overall, the paper demonstrates that a well‑designed combination of compact representation, a robust similarity metric, and distance‑aware indexing can deliver both efficiency and effectiveness in large‑scale image retrieval.

📜 Original Paper Content