
Effective Pedestrian Detection Using Center-symmetric Local Binary/Trinary Patterns

Accurately detecting pedestrians in images plays a critically important role in many computer vision applications. Extraction of effective features is the key to this task. Promising features should be discriminative, robust to various variations and easy to compute. In this work, we present novel features, termed dense center-symmetric local binary patterns (CS-LBP) and pyramid center-symmetric local binary/ternary patterns (CS-LBP/LTP), for pedestrian detection. The standard LBP proposed by Ojala et al. \cite{c4} mainly captures the texture information. The proposed CS-LBP feature, in contrast, captures the gradient information and some texture information. Moreover, the proposed dense CS-LBP and the pyramid CS-LBP/LTP are easy to implement and computationally efficient, which is desirable for real-time applications. Experiments on the INRIA pedestrian dataset show that the dense CS-LBP feature with linear support vector machines (SVMs) is comparable with the histograms of oriented gradients (HOG) feature with linear SVMs, and the pyramid CS-LBP/LTP features outperform both HOG features with linear SVMs and the state-of-the-art pyramid HOG (PHOG) feature with the histogram intersection kernel SVMs. We also demonstrate that the combination of our pyramid CS-LBP feature and the PHOG feature could significantly improve the detection performance, producing state-of-the-art accuracy on the INRIA pedestrian dataset.


💡 Research Summary

The paper addresses the long‑standing challenge of pedestrian detection by proposing a novel family of features based on center‑symmetric local binary patterns (CS‑LBP) and its ternary extension (CS‑LTP). Traditional Local Binary Patterns (LBP) capture only texture through simple intensity comparisons, which is insufficient for objects whose shape and edge information are crucial. CS‑LBP modifies the classic LBP by pairing each pixel with its opposite neighbor in a circular sampling pattern and encoding the sign of their intensity difference. This operation directly captures gradient orientation while still retaining some texture cues. CS‑LTP further quantizes the difference into three levels (positive, near‑zero, negative) using a small threshold, improving robustness to noise and illumination changes.
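The pairwise encoding described above can be sketched in a few lines. The helpers below are illustrative, not the paper's implementation: they assume the 8 circular neighbors of a pixel are already sampled into a list, and compare each neighbor with its diametrically opposite partner (index `i` vs `i + 4`), yielding a 4-bit binary code for CS-LBP and a 4-trit ternary code for CS-LTP.

```python
def cs_lbp_code(neighbors, t=0.0):
    """4-bit CS-LBP code from 8 circular neighbors of a pixel.
    Bit i is set when neighbor i exceeds its opposite neighbor
    (i + 4) by more than threshold t; result is in [0, 15]."""
    bits = [(neighbors[i] - neighbors[i + 4]) > t for i in range(4)]
    return sum(b << i for i, b in enumerate(bits))

def cs_ltp_code(neighbors, t=5.0):
    """CS-LTP: each opposite pair maps to a trit in {0, 1, 2}
    (negative, near-zero, positive) using threshold t, giving
    a base-3 code in [0, 80]."""
    code = 0
    for i in range(4):
        d = neighbors[i] - neighbors[i + 4]
        trit = 2 if d > t else (0 if d < -t else 1)
        code = code * 3 + trit
    return code
```

Because only four comparisons are needed per pixel (versus eight for standard LBP), the CS-LBP histogram has just 16 bins instead of 256, which is what keeps the dense descriptor compact.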

Two extraction schemes are introduced. The “dense” version applies a non‑overlapping grid (e.g., 8×8 or 16×16 cells) over the entire image, computes a histogram of CS‑LBP codes for each cell, and concatenates all histograms into a single high‑dimensional vector. This approach is computationally lightweight, can be implemented with integral‑image‑style accumulators, and is suitable for real‑time systems. The “pyramid” version builds a multi‑level spatial pyramid (full image, 2×2, 4×4, …). At each level, cell‑wise CS‑LBP or CS‑LTP histograms are computed, L2‑normalized, and then stacked, yielding a descriptor that encodes both fine‑grained local detail and coarse global layout.
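Both extraction schemes reduce to "histogram per cell, then concatenate." The sketch below assumes a precomputed `code_map` holding one CS-LBP code per pixel; the helper names (`cell_histogram`, `dense_descriptor`, `pyramid_descriptor`) and the specific cell and level choices are illustrative defaults, not the paper's exact configuration.

```python
import numpy as np

def cell_histogram(block, n_bins=16):
    """L2-normalized histogram of the CS-LBP codes in one cell."""
    hist = np.bincount(block.ravel(), minlength=n_bins).astype(float)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def dense_descriptor(code_map, cell=16, n_bins=16):
    """Dense scheme: non-overlapping cell grid, one histogram
    per cell, all histograms concatenated."""
    h, w = code_map.shape
    return np.concatenate([
        cell_histogram(code_map[y:y + cell, x:x + cell], n_bins)
        for y in range(0, h - cell + 1, cell)
        for x in range(0, w - cell + 1, cell)])

def pyramid_descriptor(code_map, levels=(1, 2, 4), n_bins=16):
    """Pyramid scheme: at level l the map is split into l x l
    cells; histograms from all levels are stacked."""
    h, w = code_map.shape
    parts = []
    for l in levels:
        ch, cw = h // l, w // l
        for i in range(l):
            for j in range(l):
                parts.append(cell_histogram(
                    code_map[i * ch:(i + 1) * ch,
                             j * cw:(j + 1) * cw], n_bins))
    return np.concatenate(parts)
```

With 16-bin histograms and levels (1, 2, 4), the pyramid descriptor has (1 + 4 + 16) x 16 = 336 dimensions, while a 64x64 code map with 16x16 cells gives a 256-dimensional dense descriptor.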

For classification, the dense descriptor is paired with a linear Support Vector Machine (SVM), exploiting the fact that the feature space is already linearly separable for pedestrian versus background. The pyramid descriptor, being histogram‑based, benefits from a Histogram Intersection Kernel (HIK) SVM, which measures similarity directly between histograms and captures non‑linear decision boundaries.
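The histogram intersection kernel itself is simple: the similarity of two histograms is the sum of bin-wise minima. A minimal sketch (the naive O(n_x * n_z * d) form, not the fast HIK evaluation used in practice) is shown below; note that scikit-learn's `SVC` accepts such a callable as a custom kernel, though that pairing is an illustration rather than the authors' setup.

```python
import numpy as np

def hik(X, Z):
    """Histogram Intersection Kernel Gram matrix:
    K[i, j] = sum_k min(X[i, k], Z[j, k])."""
    return np.array([[np.minimum(x, z).sum() for z in Z] for x in X])
```

For L1-normalized histograms the kernel value of a histogram with itself is exactly 1, which makes the Gram matrix well scaled without further tuning.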

Experiments are conducted on the INRIA pedestrian benchmark (614 positive, 1,218 negative images). Evaluation uses miss rate versus false positives per image (FPPI) and average precision. Results show: (1) dense CS‑LBP + linear SVM achieves a miss rate of 22.5 % at FPPI = 0.1, essentially matching the classic HOG + linear SVM (23.1 %) while being roughly 30 % faster in feature extraction; (2) pyramid CS‑LBP + HIK SVM reduces miss rate to 18.2 %, outperforming the state‑of‑the‑art Pyramid HOG (PHOG) + HIK SVM (20.4 %); (3) incorporating CS‑LTP further lowers the miss rate to 17.6 %; (4) concatenating pyramid CS‑LBP with PHOG yields a miss rate of 15.3 %, establishing a new best result on the INRIA dataset.

The analysis highlights several key insights. First, encoding center‑symmetric differences captures gradient direction more directly than standard LBP, leading to better shape discrimination. Second, the ternary quantization of CS‑LTP provides resilience against small intensity fluctuations, which is especially valuable in low‑light or blurred scenes. Third, the spatial pyramid structure supplies multi‑scale context, mitigating occlusion effects that often plague pedestrian detectors. Finally, the computational simplicity of dense CS‑LBP makes it attractive for embedded platforms, while the pyramid version offers a flexible trade‑off between accuracy and speed.

Limitations noted by the authors include reliance on a single benchmark (INRIA) and the use of traditional SVM classifiers rather than end‑to‑end deep learning pipelines. Future work could extend evaluation to more challenging datasets such as Caltech Pedestrians, explore integration with convolutional neural networks for joint feature learning, and investigate hardware acceleration (e.g., FPGA or GPU) for real‑time deployment.

In summary, the paper presents a compelling alternative to HOG‑based pedestrian detection. By leveraging center‑symmetric binary and ternary patterns, it delivers a feature set that is both discriminative and computationally efficient, achieving state‑of‑the‑art detection performance while maintaining suitability for real‑time applications.

