A Fusion of Labeled-Grid Shape Descriptors with Weighted Ranking Algorithm for Shapes Recognition


Retrieving similar images from a large dataset based on image content is an active and challenging research area. Studies have shown that retrieving similar images based on their shape is a particularly effective approach, and a large number of shape-based methods exist in the literature. Combining more than one feature has also been investigated for this purpose and has shown promising results. In this paper a fusion-based shape recognition method is proposed. A set of local boundary-based and region-based features is derived from a labeled-grid representation of the shape and combined with a few global shape features to produce a composite shape descriptor. This composite descriptor is then used in a weighted ranking algorithm to find similar shapes in a large dataset. Experimental analysis shows that the proposed method is powerful enough to discriminate geometrically similar shapes from non-similar ones.


💡 Research Summary

The paper addresses the challenging problem of content‑based image retrieval (CBIR) by focusing on shape similarity, which has been shown to be a highly effective cue for distinguishing objects in large image collections. While many shape descriptors have been proposed, most existing approaches rely on a single type of feature—either boundary‑based (e.g., Shape Context, curvature) or region‑based (e.g., Zernike moments). Recent studies have explored the fusion of multiple descriptors, yet they often lack a systematic way to balance the contributions of each feature type, especially when the underlying representation of the shape itself is not robust to noise, scale, or rotation.

To overcome these limitations, the authors introduce a Labeled‑Grid (LG) representation. An input binary silhouette is partitioned into a regular grid of cells; each cell receives a label indicating whether it belongs to foreground, background, or lies on the shape boundary. This discretization preserves spatial relationships while dramatically reducing sensitivity to pixel‑level noise. The grid resolution (cell size) is a controllable parameter, empirically set between 8 and 16 pixels for the experiments, and can be tuned to trade off between detail preservation and computational cost.
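The cell-labeling step above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the label values (0 = background, 1 = foreground, 2 = boundary) and the function name `labeled_grid` are assumptions, and the boundary test simply marks any cell that contains a mix of foreground and background pixels.

```python
import numpy as np

def labeled_grid(mask: np.ndarray, cell_size: int = 8) -> np.ndarray:
    """Label each grid cell of a binary silhouette.

    Labels (illustrative convention): 0 = background, 1 = foreground
    (fully inside the shape), 2 = boundary (cell crosses the contour).
    """
    h, w = mask.shape
    rows = (h + cell_size - 1) // cell_size
    cols = (w + cell_size - 1) // cell_size
    labels = np.zeros((rows, cols), dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            cell = mask[r * cell_size:(r + 1) * cell_size,
                        c * cell_size:(c + 1) * cell_size]
            if cell.all():        # entirely inside the shape
                labels[r, c] = 1
            elif cell.any():      # mixed pixels -> cell crosses the contour
                labels[r, c] = 2
            # else: stays 0 (pure background)
    return labels

# Toy example: a filled square silhouette whose edges cut through cells
mask = np.zeros((32, 32), dtype=bool)
mask[6:26, 6:26] = True
grid = labeled_grid(mask, cell_size=8)
```

With an 8-pixel cell size the 32x32 silhouette yields a 4x4 label grid: the four central cells are fully interior and the twelve surrounding cells are marked as boundary, which is exactly the coarse-but-structured view the LG representation aims for.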

From the LG representation, two families of local features are extracted:

  1. Boundary‑based descriptors – for each cell that touches the shape contour, the algorithm computes the length of the contour segment inside the cell, the average curvature, and a histogram of tangent directions. These capture fine‑grained edge geometry and are particularly useful for distinguishing shapes with similar overall silhouettes but differing local curvature patterns.

  2. Region‑based descriptors – for interior cells, the proportion of foreground pixels, the cell’s centroid relative to the shape’s global centroid, and second‑order moments (principal axes lengths) are calculated. These encode the distribution of mass and internal structure, complementing the boundary information.
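As a concrete illustration of the region-based family, the sketch below computes two of the listed quantities per occupied cell: the proportion of foreground pixels and the cell centroid's offset from the global shape centroid (normalized by the image diagonal for scale robustness). The function name and the normalization choice are assumptions, not the paper's exact formulation.

```python
import numpy as np

def region_descriptors(mask: np.ndarray, cell_size: int = 8) -> np.ndarray:
    """Per-cell region features: [fill ratio, row offset, col offset]."""
    ys, xs = np.nonzero(mask)
    gc = np.array([ys.mean(), xs.mean()])   # global shape centroid
    diag = np.hypot(*mask.shape)            # scale-normalizer for offsets
    feats = []
    h, w = mask.shape
    for r in range(0, h, cell_size):
        for c in range(0, w, cell_size):
            cell = mask[r:r + cell_size, c:c + cell_size]
            if not cell.any():              # skip background-only cells
                continue
            fill = cell.mean()              # proportion of foreground pixels
            cy, cx = np.nonzero(cell)
            centroid = np.array([r + cy.mean(), c + cx.mean()])
            offset = (centroid - gc) / diag # centroid offset, scale-normalized
            feats.append([fill, offset[0], offset[1]])
    return np.array(feats)

# Same toy silhouette as before: a 20x20 square inside a 32x32 image
mask = np.zeros((32, 32), dtype=bool)
mask[6:26, 6:26] = True
feats = region_descriptors(mask, cell_size=8)
```

For this silhouette every one of the 16 cells contains some foreground, and exactly the four interior cells reach a fill ratio of 1.0, so the descriptor cleanly separates interior mass from boundary cells.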

In addition to the local descriptors, the authors incorporate three global shape features that summarize the entire silhouette:

  • Normalized Distance Histogram (NDH) – a radial distribution of distances from the shape’s centroid to its boundary, providing a scale‑invariant signature.
  • Fourier Descriptor (FD) – the first few coefficients of the discrete Fourier transform of the boundary, capturing overall contour shape while being robust to rotation.
  • Center‑Axis Ratio (CAR) – the ratio of the lengths of the principal axes derived from the shape’s covariance matrix, a compact measure of elongation and orientation.
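Two of these global features are straightforward to sketch. The snippet below is an illustrative reading, not the authors' code: the NDH bin count (16) and the minor/major eigenvalue convention for CAR are assumptions.

```python
import numpy as np

def ndh(boundary_pts: np.ndarray, bins: int = 16) -> np.ndarray:
    """Normalized Distance Histogram: centroid-to-boundary distances,
    scaled by the maximum distance for scale invariance."""
    centroid = boundary_pts.mean(axis=0)
    d = np.linalg.norm(boundary_pts - centroid, axis=1)
    d = d / d.max()                                   # scale invariance
    hist, _ = np.histogram(d, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()                          # probability histogram

def car(mask: np.ndarray) -> float:
    """Center-Axis Ratio: ratio of principal-axis lengths derived from
    the covariance matrix of the foreground pixel coordinates."""
    pts = np.column_stack(np.nonzero(mask)).astype(float)
    cov = np.cov(pts, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(cov))            # ascending eigenvalues
    return float(np.sqrt(eig[0] / eig[1]))            # minor/major, in (0, 1]

# A square gives CAR ~ 1 (no elongation); an elongated bar gives CAR << 1.
square = np.zeros((20, 20), dtype=bool); square[2:18, 2:18] = True
bar = np.zeros((30, 30), dtype=bool);    bar[13:17, 5:25] = True
```

Because NDH normalizes by the maximum radius and CAR is a ratio of eigenvalue-derived axis lengths, both are invariant to uniform scaling, matching the invariance claims above.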

All feature vectors are individually normalized to unit length. The final composite descriptor $\mathbf{F}$ is formed as a weighted linear combination of the $n$ normalized feature vectors $\mathbf{f}_i$:

$$\mathbf{F} = \sum_{i=1}^{n} w_i\,\mathbf{f}_i,$$

where each weight $w_i$ controls the relative contribution of the corresponding descriptor.
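Since the local and global descriptors generally differ in dimensionality, one practical reading of the weighted combination is to unit-normalize each feature vector, scale it by its weight, and concatenate the results; a query shape is then matched against the dataset by ranking descriptors by distance. The sketch below uses that reading with Euclidean distance; the function names and the weights are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def composite(features: list, weights: list) -> np.ndarray:
    """Weighted composite descriptor: unit-normalize each feature vector,
    scale by its weight, and concatenate (one plausible reading of the
    weighted combination when vectors have different lengths)."""
    parts = []
    for f, w in zip(features, weights):
        n = np.linalg.norm(f)
        parts.append(w * (f / n if n > 0 else f))
    return np.concatenate(parts)

def rank(query: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Indices of database descriptors sorted by ascending Euclidean
    distance to the query (most similar shape first)."""
    d = np.linalg.norm(database - query, axis=1)
    return np.argsort(d)

# Toy usage: a boundary feature and a region feature with assumed weights
f_boundary = np.array([3.0, 4.0])
f_region = np.array([1.0, 0.0, 0.0])
F = composite([f_boundary, f_region], [2.0, 1.0])
db = np.stack([F, F + 0.5, F - 1.0])
order = rank(F, db)   # the identical descriptor ranks first
```

Per-feature unit normalization before weighting keeps any single high-dimensional descriptor from dominating the distance, so the weights alone govern each feature family's influence on the ranking.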