Evaluation of Distance Measures for Feature based Image Registration using AlexNet


Image registration is a classic computer-vision problem with applications in areas such as defence, remote sensing, and medicine. Feature-based image registration methods have traditionally used hand-crafted feature extraction algorithms, which detect key points in an image and describe them using a region around each point. Such features are matched by thresholding either the distances, or the ratio of distances, computed between feature descriptors. The evolution of deep learning, in particular convolutional neural networks (CNNs), has enabled researchers to address several vision problems such as recognition, tracking, and localization. Outputs of the convolutional or fully connected layers of a CNN trained for tasks like visual recognition have proved effective as features in other applications such as retrieval. In this work, a deep CNN, AlexNet, is used in place of handcrafted features for feature extraction in the first stage of image registration. However, a suitable distance measure and matching method must be identified for effective results. Several distance metrics are evaluated within nearest-neighbour and nearest-neighbour-ratio matching frameworks on a benchmark dataset. Evaluation is done by comparing matching and registration performance using metrics computed from ground truth.

Keywords: Distance measures; deep learning; feature detection; feature descriptor; image matching


💡 Research Summary

This paper investigates the use of deep convolutional neural network (CNN) features—specifically those extracted from the fully‑connected layers (fc6 and fc7) of a pre‑trained AlexNet—as local descriptors for feature‑based image registration. Traditional feature‑based registration pipelines rely on handcrafted keypoint detectors (e.g., SIFT, SURF) and hand‑engineered descriptors, followed by a similarity measure (Euclidean, Manhattan, etc.) and a matching strategy (nearest‑neighbor, nearest‑neighbor‑ratio). The authors replace the handcrafted descriptor with a high‑dimensional vector produced by feeding a SIFT‑detected keypoint’s surrounding patch (resized to 224 × 224) through AlexNet and extracting the activations of fc6 or fc7 (4096‑dimensional).

The study evaluates five distance metrics—Euclidean (L2), Manhattan (L1), Minkowski (generalized Lp), Cosine, and Correlation—combined with four matching schemes: one‑way nearest neighbor (1‑NN), two‑way nearest neighbor (2‑NN), one‑way nearest‑neighbor‑ratio (1‑NNR), and two‑way nearest‑neighbor‑ratio (2‑NNR). Thresholds for the ratio test are varied (1.1, 1.2, 1.3) while NN thresholds are set at 0.3, 0.5, and 0.7.
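The matching schemes above can be sketched with a pairwise distance matrix and a ratio test. The snippet below is an illustrative one-way NNR implementation (not the authors' code); metric names follow `scipy.spatial.distance.cdist`, and the ratio threshold 1.1 is one of the values the paper evaluates. Note the paper's convention: a match is kept when the second-best distance is at least `ratio` times the best.

```python
# Sketch of one-way nearest-neighbor-ratio (1-NNR) matching between two
# descriptor sets, using any of the paper's metrics via scipy's cdist.
import numpy as np
from scipy.spatial.distance import cdist

def match_1nnr(desc_a, desc_b, metric="cosine", ratio=1.1):
    """Accept a match when second_best >= ratio * best (distance ratio test)."""
    d = cdist(desc_a, desc_b, metric=metric)   # pairwise distance matrix
    order = np.argsort(d, axis=1)              # neighbors of each row, nearest first
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(d))
    keep = d[rows, second] >= ratio * d[rows, best]
    return [(i, int(best[i])) for i in np.where(keep)[0]]

# Toy check: descriptors in b are slightly perturbed copies of those in a,
# so every point should match its own counterpart.
rng = np.random.default_rng(0)
a = rng.standard_normal((20, 4096))
b = a + 0.05 * rng.standard_normal((20, 4096))
matches = match_1nnr(a, b, metric="cosine", ratio=1.1)
```

A two-way (2-NNR) variant would run the same test in both directions and keep only matches that agree, which explains the much lower match counts the paper reports for it.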

Experiments are conducted on the Oxford VGG Affine dataset, which comprises eight subsets, each containing six images: an original scene and five transformed versions (zoom, rotation, illumination change, compression, and viewpoint shift). Ground‑truth homographies are provided, allowing the authors to compute quantitative registration metrics: true positives (TP), keypoint error with respect to ground‑truth homography (KE_GH), keypoint error with respect to the estimated homography (KE_CH), and inlier ratio (IR). All processing is performed in MATLAB on a modest workstation (Intel i7‑2.7 GHz, 8 GB RAM).
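Because the Oxford VGG dataset supplies ground-truth homographies, the evaluation metrics reduce to projecting matched keypoints and measuring residuals. The sketch below illustrates TP, KE_GH, and IR under an assumed pixel tolerance; the function names and the tolerance value are illustrative, not taken from the paper (KE_CH is computed the same way with the estimated homography in place of the ground-truth one).

```python
# Sketch: evaluate matches against a ground-truth homography H_gt.
# A match is a true positive (TP) when the matched point in image B lies
# within `tol` pixels of the projection of its keypoint from image A.
import numpy as np

def project(H, pts):
    """Apply a 3x3 homography to Nx2 points via homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def evaluate(H_gt, pts_a, pts_b_matched, tol=3.0):
    residuals = np.linalg.norm(project(H_gt, pts_a) - pts_b_matched, axis=1)
    tp = int(np.sum(residuals <= tol))      # true positives
    ke_gh = float(residuals.mean())         # keypoint error w.r.t. ground truth
    ir = tp / len(residuals)                # inlier ratio
    return tp, ke_gh, ir

# Toy check: identity homography, matched points offset by exactly 1 pixel
# in x and y, so every residual is sqrt(2) and all matches are inliers.
H = np.eye(3)
pts_a = np.array([[10.0, 10.0], [50.0, 20.0], [30.0, 40.0]])
pts_b = pts_a + 1.0
tp, ke_gh, ir = evaluate(H, pts_a, pts_b)
print(tp, ke_gh, ir)  # 3 1.4142135623730951 1.0
```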

Key findings are as follows:

  1. Distance Metric Performance – Cosine distance consistently yields the highest TP and IR values and the lowest KE_GH/KE_CH across all deformation types. Correlation distance performs similarly, albeit slightly worse than Cosine. Euclidean, Manhattan, and generic Minkowski distances lag behind, indicating that direction‑based similarity (angle between vectors) is more robust for AlexNet descriptors than raw magnitude differences.

  2. Matching Strategy – One‑way nearest‑neighbor‑ratio (1‑NNR) with a ratio threshold of 1.1 emerges as the most balanced approach, delivering a large number of correct matches while maintaining low keypoint error. Two‑way nearest neighbor (2‑NN) also provides competitive IR but with fewer matches. Two‑way NNR, although occasionally achieving high IR, suffers from an extremely low match count (<10), making it impractical for most registration tasks.

  3. Feature Layer Comparison – Descriptors derived from the fc6 layer outperform those from fc7 in terms of both match quantity and registration accuracy. The authors attribute this to fc6 retaining more localized texture and edge information, whereas fc7 is more abstract and geared toward global object classification, thus losing fine‑grained spatial detail needed for precise point‑to‑point alignment.

  4. Computational Considerations – The pipeline still depends on SIFT for keypoint detection, and each keypoint requires a full forward pass through AlexNet, which is computationally intensive. The authors acknowledge that real‑time or large‑scale applications would benefit from lighter networks or from extracting descriptors directly from the whole image without per‑keypoint cropping.

The paper concludes that integrating deep CNN features into a classic registration pipeline is feasible and advantageous, provided that an appropriate similarity measure (Cosine) and matching scheme (1‑NNR) are selected. The authors propose future work to (i) evaluate other deep architectures such as VGG and ResNet, and (ii) develop end‑to‑end deep similarity networks (e.g., Siamese or triplet‑loss models) that learn the distance function directly, potentially eliminating the need for handcrafted ratio thresholds.

Overall, the study offers a practical guideline for researchers seeking to modernize feature‑based registration: use AlexNet’s fc6 activations as local descriptors, compare them with Cosine distance, and adopt a one‑way ratio test with a modest threshold. This combination delivers robust registration across a variety of geometric and photometric transformations while leveraging the expressive power of deep learned features.

