A Study on the Effectiveness of Different Patch Size and Shape for Eyes and Mouth Detection
Template matching is one of the simplest methods used for eye and mouth detection, yet it can be modified and extended to become a powerful tool. Because the patch itself plays a significant role in detection performance, a study of the influence of patch size and shape is carried out, and the optimum patch size and shape are determined using the proposed method. Template matching is also usually combined with other methods to improve detection accuracy. Thus, this paper analyzes the effectiveness of two image processing methods, grayscale conversion and the Haar wavelet transform, when used with template matching.
💡 Research Summary
This paper presents a detailed study on optimizing template matching for eye and mouth detection by rigorously analyzing the impact of two critical factors: the size/shape of the search patch and the type of image preprocessing.
The core methodology starts from a manually defined face region, which is partitioned into three search areas (left eye, right eye, and mouth) based on a geometric model. Template matching is performed on two versions of the image: a standard grayscale image and an image transformed with the horizontal Haar wavelet. The Haar transform emphasizes horizontal edges (characteristic of eyes and mouth) while suppressing vertical noise (such as hair). Matching uses a 10x10-pixel average template, created separately for the grayscale and Haar images, with the normalized correlation coefficient as the similarity measure.
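The two building blocks described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`haar_horizontal_detail`, `ncc`, `best_match`) are hypothetical, the Haar step is assumed to be a single-level decomposition that keeps only the horizontal-detail band, and the correlation formula is the standard mean-subtracted normalized correlation coefficient.

```python
from math import sqrt

def haar_horizontal_detail(img):
    """One-level Haar horizontal-detail band (assumed variant).

    Each 2x2 block is summed along its rows and differenced between
    its top and bottom row, so horizontal edges give a strong response
    and vertical edges cancel out.
    """
    rows, cols = len(img) // 2, len(img[0]) // 2
    out = []
    for r in range(rows):
        top, bottom = img[2 * r], img[2 * r + 1]
        out.append([
            (top[2 * c] + top[2 * c + 1]
             - bottom[2 * c] - bottom[2 * c + 1]) / 2.0
            for c in range(cols)
        ])
    return out

def ncc(window, template):
    """Normalized correlation coefficient between two equal-size patches."""
    w = [v for row in window for v in row]
    t = [v for row in template for v in row]
    mean_w, mean_t = sum(w) / len(w), sum(t) / len(t)
    num = sum((a - mean_w) * (b - mean_t) for a, b in zip(w, t))
    den = sqrt(sum((a - mean_w) ** 2 for a in w)
               * sum((b - mean_t) ** 2 for b in t))
    # A flat window or template has zero variance; treat it as no match.
    return num / den if den > 0 else 0.0

def best_match(search, template):
    """Slide the template over the search area; return (score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))  # NCC is always >= -1, so -2 is a safe start
    for y in range(len(search) - th + 1):
        for x in range(len(search[0]) - tw + 1):
            window = [row[x:x + tw] for row in search[y:y + th]]
            score = ncc(window, template)
            if score > best[0]:
                best = (score, (y, x))
    return best
```

In this sketch, `best_match` would be called once per search area (left eye, right eye, mouth), with the template and the search area both taken from the same representation, either grayscale or the Haar band.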
The primary experiment investigates two patch shapes: square and rectangle. For each shape, the detection accuracy is tested while progressively reducing the patch size from an initially proposed dimension. The rectangular patch sizes are derived from literature (e.g., eye width = 50% of face width), while square patches are defined as a percentage of face width. The system is evaluated on 201 frontal face images from the FERET database, categorized into normal faces, faces with long front hair, and faces with spectacles.
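The size sweep can be illustrated with a short helper. This is only a sketch: `patch_size` and its parameters are hypothetical names, the 50% eye-width ratio is the one example the summary gives, the height ratio and the square-patch fraction used below are placeholder values, and applying the reduction equally to width and height is an assumption.

```python
def patch_size(face_width, width_frac, height_frac, reduction=0.0):
    """Return (width, height) of a patch in pixels.

    width_frac and height_frac are fractions of the face width;
    reduction is the fractional shrink applied to both dimensions
    (0.1 means 10% smaller). Equal shrinking of both sides is an
    assumption of this sketch.
    """
    scale = 1.0 - reduction
    width = max(1, round(face_width * width_frac * scale))
    height = max(1, round(face_width * height_frac * scale))
    return width, height

# Rectangular eye patch: width is 50% of face width (from the summary);
# the 0.25 height fraction is a placeholder, not a value from the paper.
for reduction in (0.0, 0.1, 0.3):
    print(reduction, patch_size(200, 0.5, 0.25, reduction))

# Square patch: the same placeholder fraction for both sides.
print(patch_size(200, 0.3, 0.3))
```

Sweeping `reduction` this way reproduces the structure of the experiment: the same detector is run at each patch size and the accuracies are compared.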
The key findings are:
- Patch Size is Critical: The originally proposed patch sizes yielded the highest accuracy. Reducing the patch size by just 10% caused a significant drop in performance, and reductions of 30% rendered the system nearly ineffective. This underscores that the patch must be large enough to encapsulate sufficient structural information of the target feature.
- Rectangular Patches & Haar Transform are Generally Optimal: The combination of rectangular patches and Haar wavelet preprocessing achieved the highest overall accuracy (Left Eye: 90.46%, Right Eye: 95.37%, Mouth: 90.10%). Rectangular patches were particularly superior for mouth detection compared to square patches, as they better accommodated the horizontal shape of the mouth.
- Preprocessing Method Depends on Facial Context:
  - Haar Transform excelled on faces with long front hair (achieving ~96% eye detection accuracy) because it effectively filtered out vertical hair noise.
  - Grayscale images performed better on faces with spectacles (~91% accuracy), as the Haar transform sometimes misinterpreted the vertical edges of spectacle frames as noise.
  - Both methods performed equally well on normal, unoccluded faces.
In conclusion, the research demonstrates that the performance of a simple template-matching detector is highly sensitive to the careful design of the search patch and the choice of image representation. It provides concrete evidence that there is no universally “best” setting; instead, optimal performance depends on the specific facial attributes present (e.g., hair, glasses). The findings offer practical guidance for building robust detection systems, suggesting considerations for adaptive patch sizing or context-aware selection of preprocessing techniques.