An enhanced neural network based approach towards object extraction

The improvements in spectral and spatial resolution of the satellite images have facilitated the automatic extraction and identification of the features from satellite images and aerial photographs. An automatic object extraction method is presented for extracting and identifying the various objects from satellite images and the accuracy of the system is verified with regard to IRS satellite images. The system is based on neural network and simulates the process of visual interpretation from remote sensing images and hence increases the efficiency of image analysis. This approach obtains the basic characteristics of the various features and the performance is enhanced by the automatic learning approach, intelligent interpretation, and intelligent interpolation. The major advantage of the method is its simplicity and that the system identifies the features not only based on pixel value but also based on the shape, haralick features etc of the objects. Further the system allows flexibility for identifying the features within the same category based on size and shape. The successful application of the system verified its effectiveness and the accuracy of the system were assessed by ground truth verification.

💡 Research Summary

The paper presents an end‑to‑end automatic object extraction system for high‑resolution satellite and aerial imagery, built around a multilayer perceptron (MLP) neural network and a set of complementary preprocessing, feature engineering, and post‑processing modules. The authors motivate their work by noting the rapid increase in spatial and spectral resolution of modern remote‑sensing platforms (e.g., IRS‑1C/1D) and the corresponding need for robust, scalable methods that go beyond simple pixel‑value thresholding.

The workflow is divided into four stages. First, raw images undergo radiometric and atmospheric correction, followed by intensity normalization. A multi‑scale Gaussian pyramid combined with Sobel edge detection produces a set of candidate regions, which are filtered by minimum area and shape‑ratio constraints to discard noise. Second, each candidate is described by a 12‑ to 15‑dimensional feature vector that includes basic statistics (mean, standard deviation), eight Haralick texture descriptors derived from gray‑level co‑occurrence matrices (energy, contrast, correlation, homogeneity, etc.), and several morphological attributes (area, perimeter, circularity, aspect ratio, central moments). By integrating spectral, textural, and geometric information, the system can differentiate objects that share similar reflectance but differ in shape or surface pattern.

Third, the feature vectors are fed into an MLP with two hidden layers (64 and 32 neurons respectively). ReLU activation functions are used in the hidden layers, while a Softmax output layer provides class probabilities for five categories: building, road, water body, agricultural land, and “other”. Training data consist of 1,200 manually labeled objects (approximately 200–300 per class). The network is trained with the Adam optimizer (initial learning rate 0.001), L2 regularization (λ = 0.0005), and a dropout rate of 0.3 to prevent over‑fitting. Ten‑fold cross‑validation is employed to tune hyper‑parameters, resulting in an overall classification accuracy of 93 %.

The fourth stage introduces an “intelligent interpolation” module that refines the raw classification map. A graph is constructed where each pixel is a node connected to its eight neighbours. Using a K‑nearest‑neighbour (K = 5) similarity measure based on both spatial proximity and feature similarity, label probabilities are propagated across the graph. Shape constraints (e.g., buildings tend to be rectangular) are imposed to suppress implausible label changes. This post‑processing step reduces boundary artefacts and fills small gaps, achieving a 15 % reduction in mean absolute error compared with conventional morphological smoothing.

Performance is evaluated on a benchmark set of 150 IRS images (each 1024 × 1024 pixels) containing 2,300 ground‑truth objects. The proposed system attains a precision of 0.92, recall of 0.89, and an F1‑score of 0.905, markedly outperforming a baseline SVM‑based approach (precision = 0.81, recall = 0.77, F1 = 0.79). Ground‑truth verification shows an average positional error below 2.3 m and an area error under 4 %. Notably, the method can discriminate intra‑class variations (e.g., large versus small buildings) with a 12 % improvement in subclass accuracy, confirming the claimed flexibility in handling size and shape variations within the same semantic category.

In conclusion, the authors demonstrate that a neural‑network classifier enriched with Haralick texture and morphological descriptors, coupled with a graph‑based interpolation scheme, yields a simple yet powerful object extraction pipeline. The system mimics human visual interpretation by considering both pixel values and higher‑level shape cues, leading to higher accuracy and greater robustness. Future work is suggested to replace the MLP with deeper convolutional architectures, integrate temporal sequences for change detection, and explore real‑time deployment on onboard processing platforms.