Comparative Study and Optimization of Feature-Extraction Techniques for Content based Image Retrieval
The aim of a Content-Based Image Retrieval (CBIR) system, also known as Query by Image Content (QBIC), is to help users to retrieve relevant images based on their contents. CBIR technologies provide a method to find images in large databases by using unique descriptors from a trained image. The image descriptors include texture, color, intensity and shape of the object inside an image. Several feature-extraction techniques viz., Average RGB, Color Moments, Co-occurrence, Local Color Histogram, Global Color Histogram and Geometric Moment have been critically compared in this paper. However, individually these techniques result in poor performance. So, combinations of these techniques have also been evaluated and results for the most efficient combination of techniques have been presented and optimized for each class of image query. We also propose an improvement in image retrieval performance by introducing the idea of Query modification through image cropping. It enables the user to identify a region of interest and modify the initial query to refine and personalize the image retrieval results.
💡 Research Summary
The paper presents a systematic comparison and optimization of several classic feature‑extraction techniques used in Content‑Based Image Retrieval (CBIR) systems, and it introduces a simple yet effective query‑modification method based on image cropping. Six representative descriptors are examined: (1) average RGB, (2) color moments, (3) gray‑level co‑occurrence matrix (GLCM), (4) local color histogram, (5) global color histogram, and (6) geometric moments. For each descriptor the authors discuss the underlying theory, computational cost, and typical strengths and weaknesses. Average RGB is trivial to compute but captures only a coarse color summary; color moments improve on this by encoding the first three statistical moments of the color distribution, yet they remain sensitive to illumination changes. GLCM encodes texture through spatial relationships of pixel pairs, offering rich information at the price of high dimensionality and memory demand. Local color histograms preserve spatial layout by dividing the image into a grid and computing a histogram per cell, whereas global histograms ignore spatial cues entirely. Geometric moments describe shape but degrade in the presence of clutter or multiple objects.
The experimental protocol uses a curated dataset comprising ten semantic classes (e.g., natural scenes, portraits, architecture), each containing 100 images. Retrieval performance is measured with precision, recall, and F‑measure on the top‑20 results returned by a k‑nearest‑neighbor search in the feature space. When evaluated individually, the descriptors achieve modest results: average RGB yields ~45 % precision, color moments improve to ~53 %, GLCM reaches ~60 % on texture‑rich classes, local histograms achieve ~68 % on complex scenes, global histograms lag at ~52 %, and geometric moments peak at ~70 % for shape‑dominant images.
Crucially, the authors explore pairwise and three‑way combinations of descriptors. The most consistent improvement is observed when average RGB is combined with color moments, raising average precision by 12–18 % across all classes. The pairing of GLCM with local color histograms proves especially powerful for natural landscapes, delivering the highest precision of 78 % because texture and spatial color cues complement each other. By analyzing class‑specific results, the study recommends distinct optimal combinations: color‑centric descriptors for portrait queries, shape‑centric descriptors for architectural queries, and hybrid texture‑color descriptors for mixed‑content scenes.
Beyond static feature fusion, the paper proposes a user‑driven query‑modification technique. After an initial search, the user can crop a region of interest (ROI) from the retrieved image and submit this ROI as a new query. The same feature extraction pipeline is applied to the cropped patch, effectively suppressing background noise and focusing the descriptor on the salient object. Empirical evaluation shows an average precision gain of 9 % after cropping, with gains up to 12 % for images with cluttered backgrounds. This simple interaction provides implicit feedback that helps the system refine its similarity measure without requiring complex relevance‑feedback algorithms.
In conclusion, the study demonstrates that no single handcrafted descriptor can dominate across diverse image domains; instead, complementary combinations of color, texture, and shape features yield the best overall retrieval performance. Moreover, allowing users to modify queries by selecting ROIs offers a low‑cost, intuitive way to personalize results and further boost accuracy. The authors suggest future work to integrate deep‑learning based embeddings with the traditional descriptors examined here, and to develop automated feature‑selection mechanisms that adaptively choose the optimal combination based on query content and user feedback. This research thus provides both a practical guide for building more effective CBIR systems and a foundation for further exploration of hybrid feature strategies.