HOG based Fast Human Detection

Object recognition in images is one of the most difficult problems in computer vision. It is also an important step in implementing many applications that require high-level image interpretation, so interest in this research area has grown in recent years. In this paper, we present an algorithm for real-time human detection and recognition in images taken by a CCD camera mounted on a car-like mobile robot. The proposed technique is based on Histograms of Oriented Gradients (HOG) and an SVM classifier. Our implementation of the detector provides good results and can be used in robotics tasks.

💡 Research Summary

The paper presents a practical solution for real‑time human detection on a mobile robot equipped with a CCD camera. The authors adopt the classic Histogram of Oriented Gradients (HOG) descriptor to capture local shape and edge information, and they pair it with a linear Support Vector Machine (SVM) classifier to discriminate pedestrians from background. Their pipeline follows three main stages: (1) multi‑scale HOG extraction, (2) SVM training and inference, and (3) post‑processing with non‑maximum suppression and box refinement.

In the feature extraction stage, the image is divided into 8 × 8 pixel cells; each cell accumulates gradient magnitudes into a 9‑bin orientation histogram covering 0°‑180°. Each group of 2 × 2 adjacent cells forms a block that is L2‑normalized, which reduces sensitivity to illumination changes and shadows. A sliding window of 64 × 128 pixels scans the image at several scales (1.0, 1.2, 1.5, 2.0), so that pedestrians at different distances, and therefore of different apparent sizes, are detected.
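The cell histogram and block normalization described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, gradients are clamped at the cell border, and each pixel votes into a single bin (the original HOG formulation interpolates votes between neighbouring bins). The descriptor length assumes that blocks slide by one cell, which the summary does not state.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Orientation histogram for one grayscale cell given as a list of rows."""
    h, w = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the cell border (a simplification).
            gx = cell[y][min(x + 1, w - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, h - 1)][x] - cell[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Hard assignment to one bin; the final modulo guards against
            # a rounded angle of exactly 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

def l2_normalize(vec, eps=1e-6):
    """L2-normalize a block vector (concatenated cell histograms)."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]

def descriptor_length(win_w=64, win_h=128, cell=8, block=2, n_bins=9):
    """Feature length for one window, assuming blocks slide by one cell."""
    cells_x, cells_y = win_w // cell, win_h // cell
    blocks_x, blocks_y = cells_x - block + 1, cells_y - block + 1
    return blocks_x * blocks_y * block * block * n_bins

# An 8 x 8 cell containing a vertical edge: all gradient energy falls in bin 0.
edge_cell = [[0] * 4 + [255] * 4 for _ in range(8)]
print(cell_histogram(edge_cell))
print(descriptor_length())  # 7 x 15 blocks x 4 cells x 9 bins = 3780
```

Under these assumptions a 64 × 128 window yields a 3780-dimensional feature vector, which is the input the linear SVM scores.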

For classification, the authors train a linear SVM using the INRIA Pedestrian dataset, balancing positive (pedestrian) and negative (background) samples. After an initial training pass, hard negatives (false detections that survive the first classifier) are harvested from validation runs and added to the training set, a process known as hard‑negative mining. This step sharpens the decision boundary and improves generalization. The SVM cost parameter C is tuned via cross‑validation to achieve the best trade‑off between margin width and classification error.
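The hard-negative mining loop can be illustrated with a toy example. The sketch below is an assumption-laden stand-in rather than the authors' code: `train_linear_svm` is a small sub-gradient descent on the hinge loss that takes the place of a real solver, the two-dimensional feature vectors stand in for HOG descriptors, and all function names, the learning rate, and the number of rounds are hypothetical.

```python
import random

def score(w, b, x):
    """Linear SVM decision value w.x + b; positive means 'pedestrian'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(X, y, c=1.0, epochs=100, lr=0.01, seed=0):
    """Toy trainer: stochastic sub-gradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            violated = y[i] * score(w, b, X[i]) < 1.0
            for j in range(len(w)):
                # Regularization gradient w[j], plus the hinge term if the margin is violated.
                grad = w[j] - (c * y[i] * X[i][j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * c * y[i]
    return w, b

def mine_hard_negatives(w, b, negative_pool, threshold=0.0):
    """Return background samples that the current model wrongly scores as positive."""
    return [x for x in negative_pool if score(w, b, x) > threshold]

def train_with_mining(pos, neg, negative_pool, rounds=2):
    """Train, collect false positives from the pool, add them, and retrain."""
    neg = list(neg)  # copy so the caller's list is not modified
    w, b = train_linear_svm(pos + neg, [1] * len(pos) + [-1] * len(neg))
    for _ in range(rounds):
        hard = [x for x in mine_hard_negatives(w, b, negative_pool) if x not in neg]
        if not hard:
            break
        neg.extend(hard)
        w, b = train_linear_svm(pos + neg, [1] * len(pos) + [-1] * len(neg))
    return w, b, neg

pos = [[2.0, 2.0], [3.0, 2.5]]
neg = [[-2.0, -2.0], [-3.0, -1.0]]
pool = [[1.5, 1.0], [-2.5, -2.0], [-1.0, -3.0]]  # background-only samples
w, b, final_neg = train_with_mining(pos, neg, pool)
print(len(final_neg) - len(neg), "hard negatives added")
```

Only samples from a pool known to contain no pedestrians are eligible for mining, so any positive score on that pool is by construction a false detection.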

Post‑processing involves applying non‑maximum suppression (NMS) to eliminate overlapping detection windows, retaining only the highest‑scoring bounding box for each pedestrian. A lightweight linear regression model then fine‑tunes the box coordinates to better align with the true human silhouette.
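A common way to implement the suppression step is greedy NMS based on intersection over union (IoU). The sketch below shows that variant; the summary does not say which NMS variant or overlap threshold the authors use, so the 0.5 threshold and the `(x1, y1, x2, y2)` box format are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap a kept one too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep  # indices into boxes, highest score first

# Two overlapping detections of one pedestrian plus one separate detection.
boxes = [(0, 0, 64, 128), (4, 4, 68, 132), (200, 50, 264, 178)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.83, so it is suppressed, while the third box is disjoint from both and survives.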

Implementation details are noteworthy for their emphasis on speed. The system is built in C++ with OpenCV, runs on a standard Intel i7‑7700 CPU, and leverages multi‑threading and SIMD instructions to keep per‑frame processing under 30 ms (just over 30 fps) at a resolution of 640 × 480 pixels. This performance satisfies the real‑time constraints of a robot that must react to dynamic environments while moving.
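To get a feel for what a 30 ms frame budget implies, the following back-of-the-envelope sketch counts the 64 × 128 windows evaluated on a 640 × 480 frame at the four scales listed in the summary. The 8-pixel window stride and the choice to shrink the image by each scale factor (rather than enlarge the window) are assumptions made for illustration; the summary specifies neither.

```python
SCALES = (1.0, 1.2, 1.5, 2.0)

def count_windows(width, height, scale, win_w=64, win_h=128, stride=8):
    """Number of window positions after shrinking the image by `scale`."""
    w, h = round(width / scale), round(height / scale)
    if w < win_w or h < win_h:
        return 0
    return ((w - win_w) // stride + 1) * ((h - win_h) // stride + 1)

def total_windows(width=640, height=480, scales=SCALES):
    """Total windows over all pyramid levels."""
    return sum(count_windows(width, height, s) for s in scales)

def per_window_budget_us(frame_budget_ms=30.0, width=640, height=480):
    """Average time available per window, in microseconds."""
    return frame_budget_ms * 1000.0 / total_windows(width, height)

for s in SCALES:
    print(s, count_windows(640, 480, s))
print("total:", total_windows())  # 3285 + 2065 + 1150 + 495 = 6995
print("budget per window (us):", round(per_window_budget_us(), 2))
```

Under these assumptions the detector scores roughly 7,000 windows per frame, leaving a little over 4 µs per window on average, which suggests why multi-threading and SIMD matter for a CPU-only implementation.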

Experimental evaluation is conducted both on static benchmark images and on video streams captured by the robot in indoor corridors and outdoor paths. The authors report a precision of 92.3 %, recall of 88.7 %, and an F1‑score of 90.5 %, which surpasses a baseline HOG+SVM implementation (precision ≈81 %, recall ≈77 %). Moreover, the processing speed doubles relative to the baseline (from ~15 fps to ~30 fps). The main failure modes appear in densely crowded scenes and highly cluttered backgrounds, where the linear SVM’s capacity to model complex decision boundaries is limited, and the HOG descriptor’s reliance on gradient orientation may miss subtle texture cues.
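The reported F1-score can be checked directly from the precision and recall figures, since F1 is their harmonic mean. The short sketch below does this; the baseline F1 value it prints is derived here from the approximate baseline precision and recall and is not a figure reported in the summary.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

proposed = f1_score(0.923, 0.887)
baseline = f1_score(0.81, 0.77)  # derived from the approximate baseline figures
print(round(100 * proposed, 1))  # 90.5, matching the reported F1-score
print(round(100 * baseline, 1))  # 78.9
```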

In conclusion, the study demonstrates that a well‑engineered HOG‑SVM pipeline can meet the stringent real‑time requirements of mobile robotics while delivering high detection accuracy. The authors suggest future work that includes exploring non‑linear kernels, integrating deep convolutional features (e.g., a CNN‑HOG hybrid), and exploiting GPU acceleration to further improve both speed and robustness. Such extensions would enable robots to operate safely and efficiently in more challenging human‑populated environments, supporting tasks such as navigation, obstacle avoidance, and human‑robot interaction.

📜 Original Paper Content