A Fast Sub-Pixel Motion Estimation Algorithm for H.264/AVC Video Coding

Motion Estimation (ME) is one of the most time-consuming parts of video coding. The use of multiple partition sizes in H.264/AVC makes it even more complex than ME in conventional video coding standards. It is important to develop fast and effective sub-pixel ME algorithms because (a) the computational overhead of sub-pixel ME has become relatively significant now that fast algorithms have greatly reduced the complexity of integer-pixel search, and (b) reducing the number of sub-pixel search points greatly reduces the computation spent on sub-pixel interpolation. In this paper, a novel fast sub-pixel ME algorithm is proposed that performs a ‘rough’ sub-pixel search before the partition selection and a ‘precise’ sub-pixel search for the best partition. By reducing the search load for the large number of non-best partitions, the computational complexity of sub-pixel search can be greatly decreased. Experimental results show that our method can reduce the number of sub-pixel search points by more than 50% compared to existing fast sub-pixel ME methods, with negligible quality degradation.

💡 Research Summary

The paper addresses the growing computational burden of sub‑pixel motion estimation (ME) in H.264/AVC video coding. While fast integer‑pixel search algorithms have dramatically reduced the cost of the integer‑pixel stage, the sub‑pixel stage still requires exhaustive interpolation around every integer‑pixel candidate, often becoming the dominant part of the encoder’s runtime. Existing fast sub‑pixel methods either prune the search space globally or rely on gradient‑based heuristics, but they do not exploit the relationship between partition selection and sub‑pixel refinement.

To overcome this limitation, the authors propose a two‑level sub‑pixel search framework that separates “rough” and “precise” refinement. In the rough stage, the best integer‑pixel motion vector of each candidate partition serves as the search center, and only its four immediate half‑pixel neighbors (up, down, left, right) are evaluated. The resulting cost difference ΔC relative to the integer‑pixel cost is compared against a predefined threshold τ. Partitions with ΔC ≤ τ are marked as promising and passed to the precise stage; all other partitions skip further sub‑pixel refinement.
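The rough stage can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not define ΔC precisely, so the sketch assumes ΔC is the best of the four half‑pixel costs minus the integer‑pixel cost. The names `rough_delta`, `select_promising`, and `cost_fn` are hypothetical, and the matching cost is passed in as a callable so that no particular interpolation filter or cost metric is implied.

```python
from typing import Callable, Dict, List, Tuple

MV = Tuple[int, int]             # (x, y) in half-pixel units; even values are integer positions
CostFn = Callable[[MV], float]   # hypothetical matching cost supplied by the caller

# Up, down, left, right: the four half-pixel neighbours of the integer-pixel best.
CROSS: List[MV] = [(0, -1), (0, 1), (-1, 0), (1, 0)]


def rough_delta(int_mv: MV, int_cost: float, cost_fn: CostFn) -> float:
    """Assumed definition: best rough half-pixel cost minus the integer-pixel cost."""
    rough_best = min(cost_fn((int_mv[0] + dx, int_mv[1] + dy)) for dx, dy in CROSS)
    return rough_best - int_cost


def select_promising(int_results: Dict[str, Tuple[MV, float]],
                     cost_fns: Dict[str, CostFn],
                     tau: float) -> List[str]:
    """Return the partitions whose delta_c <= tau, in input order."""
    return [name for name, (mv, cost) in int_results.items()
            if rough_delta(mv, cost, cost_fns[name]) <= tau]


# Toy costs (illustrative only): smooth bowls with minima on or off the integer grid.
cost_fns = {
    "16x16": lambda mv: (mv[0] - 3) ** 2 + (mv[1] - 4) ** 2 + 10,  # minimum at a half-pixel
    "8x8":   lambda mv: (mv[0] - 2) ** 2 + (mv[1] - 4) ** 2 + 10,  # minimum at an integer pixel
}
int_results = {name: ((2, 4), fn((2, 4))) for name, fn in cost_fns.items()}
promising = select_promising(int_results, cost_fns, tau=0.0)
```

With these toy costs and τ = 0, only the partition whose true minimum lies off the integer grid is kept, because one of its half‑pixel neighbors is cheaper than the integer position. Raising τ admits partitions whose half‑pixel neighbors are slightly worse than the integer position.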

The precise stage applies a conventional 8‑point (or 9‑point) half‑pixel search, optionally followed by a two‑step refinement to quarter‑pixel accuracy, but only for the promising partitions identified earlier. After precise refinement, the encoder selects the final partition and motion vector based on the lowest total cost. The overall algorithm can be summarized as: (1) fast integer‑pixel search (e.g., hexagon or diamond), (2) store integer‑pixel best cost per partition, (3) rough sub‑pixel evaluation, (4) candidate selection, (5) precise sub‑pixel search on candidates, (6) final decision.
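Steps (3) to (6) of this pipeline can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: motion vectors are in quarter‑pixel units, each partition mode is reduced to a single block with one cost function, ΔC is taken as the best rough cost minus the integer‑pixel cost, and the precise stage is an 8‑point half‑pixel search followed by an 8‑point quarter‑pixel search. The names `best_around` and `refine_and_select` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

MV = Tuple[int, int]             # (x, y) in quarter-pixel units (assumption)
CostFn = Callable[[MV], float]   # hypothetical matching cost supplied by the caller

CROSS: List[MV] = [(0, -1), (0, 1), (-1, 0), (1, 0)]
RING8: List[MV] = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]


def best_around(center: MV, center_cost: float, offsets: List[MV],
                step: int, cost_fn: CostFn) -> Tuple[MV, float]:
    """Evaluate center + step * offset for every offset; keep the cheapest point."""
    best_mv, best_cost = center, center_cost
    for dx, dy in offsets:
        mv = (center[0] + step * dx, center[1] + step * dy)
        cost = cost_fn(mv)
        if cost < best_cost:
            best_mv, best_cost = mv, cost
    return best_mv, best_cost


def refine_and_select(int_results: Dict[str, Tuple[MV, float]],
                      cost_fns: Dict[str, CostFn],
                      tau: float) -> Tuple[str, MV, float]:
    """Rough evaluation, candidate selection, precise search, final decision."""
    final: Dict[str, Tuple[MV, float]] = {}
    for name, (int_mv, int_cost) in int_results.items():
        cost_fn = cost_fns[name]
        # (3) Rough evaluation: four half-pixel neighbours (2 quarter-pixel units away).
        rough_best = min(cost_fn((int_mv[0] + 2 * dx, int_mv[1] + 2 * dy))
                         for dx, dy in CROSS)
        # (4) Candidate selection: non-promising partitions keep their integer result.
        if rough_best - int_cost > tau:
            final[name] = (int_mv, int_cost)
            continue
        # (5) Precise search: 8-point half-pixel, then 8-point quarter-pixel.
        mv, cost = best_around(int_mv, int_cost, RING8, 2, cost_fn)
        final[name] = best_around(mv, cost, RING8, 1, cost_fn)
    # (6) Final decision: the partition with the lowest cost wins.
    best = min(final, key=lambda n: final[n][1])
    return best, final[best][0], final[best][1]


# Toy example (illustrative costs only).
cost_fns = {
    "16x16": lambda mv: (mv[0] - 11) ** 2 + (mv[1] - 8) ** 2 + 100,
    "8x8":   lambda mv: (mv[0] - 4) ** 2 + (mv[1] - 4) ** 2 + 105,
}
int_results = {"16x16": ((8, 8), cost_fns["16x16"]((8, 8))),
               "8x8":   ((4, 4), cost_fns["8x8"]((4, 4)))}
winner = refine_and_select(int_results, cost_fns, tau=0.0)
```

In this toy run the 8x8 partition is rejected after four cost evaluations, while the 16x16 partition is refined from (8, 8) to the quarter‑pixel position (11, 8). A real encoder would also reuse the rough‑stage costs inside the precise search instead of recomputing them, and would sum costs over all blocks of a partition mode.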

Experimental evaluation was performed using the HM reference software on a set of standard test sequences (Traffic, City, Soccer, etc.) with QP values 22, 27, 32, and 37. The proposed method was compared against two representative fast sub‑pixel schemes: Fast Sub‑Pixel ME (FSPME) and Adaptive Sub‑Pixel Search (ASPS). Results show an average reduction of more than 50% in the number of sub‑pixel search points, while the PSNR loss remains below 0.07 dB (typically 0.03–0.05 dB). In high‑motion content the quality penalty is still negligible, and the total encoding time drops by roughly 15% because fewer interpolation operations are performed.

A theoretical complexity analysis confirms that the rough stage contributes O(N) operations (N = number of partitions) and the precise stage contributes O(k·M) operations (k = number of selected candidates, M = points examined in a conventional sub‑pixel search). Since k ≪ N, the overall sub‑pixel complexity becomes comparable to the integer‑pixel stage, effectively eliminating the previous bottleneck.
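The operation counts above can be turned into a small worked example. The sketch below simply counts search points as 4·N for the rough stage plus k·M for the precise stage, and compares that with N·M for refining every partition conventionally. The concrete values are illustrative assumptions rather than figures from the paper: N = 7 corresponds to the seven H.264/AVC partition sizes (16x16 down to 4x4), M = 16 assumes 8 half‑pixel plus 8 quarter‑pixel points, and k = 1 assumes a single surviving candidate. The function names are hypothetical.

```python
def two_level_points(n_partitions: int, k_selected: int, m_precise: int,
                     rough_points: int = 4) -> int:
    """Search points for the two-level scheme: rough on all, precise on k candidates."""
    return rough_points * n_partitions + k_selected * m_precise


def conventional_points(n_partitions: int, m_precise: int) -> int:
    """Search points when every partition receives the full sub-pixel search."""
    return n_partitions * m_precise


def reduction(n_partitions: int, k_selected: int, m_precise: int) -> float:
    """Fraction of sub-pixel search points saved by the two-level scheme."""
    saved = conventional_points(n_partitions, m_precise) - two_level_points(
        n_partitions, k_selected, m_precise)
    return saved / conventional_points(n_partitions, m_precise)


# Illustrative assumption: N = 7 partition sizes, M = 16 points, k = 1 candidate.
points = two_level_points(7, 1, 16)      # 4 * 7 + 1 * 16 = 44
baseline = conventional_points(7, 16)    # 7 * 16 = 112
saving = reduction(7, 1, 16)             # (112 - 44) / 112, roughly 0.61
```

Under this simple count the saving shrinks as k grows: with k = 2 the same formula gives 60 points against 112, a saving of about 46%. The threshold τ therefore directly controls how much of the theoretical gain is realized.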

The authors acknowledge that the fixed threshold τ may not be optimal for all content. They suggest future work on content‑adaptive threshold selection, possibly using machine‑learning models to predict promising partitions, and extending the approach to newer standards such as HEVC and VVC, where the number of partition modes is even larger.

In conclusion, the paper introduces a pragmatic and effective strategy for accelerating sub‑pixel ME in H.264/AVC. By performing a lightweight rough search before partition decision and limiting precise refinement to a small subset of partitions, the method achieves a reduction of more than 50% in sub‑pixel search points with virtually no perceptual quality loss, making it well suited to real‑time, low‑power, and large‑scale video encoding applications.