Shape-Adaptive Motion Estimation Algorithm for MPEG-4 Video Coding


📝 Original Info

  • Title: Shape-Adaptive Motion Estimation Algorithm for MPEG-4 Video Coding
  • ArXiv ID: 1002.1168
  • Date: 2010-02-08
  • Authors: Researchers mentioned in the ArXiv original paper

📝 Abstract

This paper presents a gradient-based motion estimation algorithm built on shape-motion prediction. It exploits the correlation between neighboring Binary Alpha Blocks (BABs) to fit the MPEG-4 shape coding case and to speed up the estimation process. The PSNR and computation time achieved by the proposed algorithm appear to be better than those obtained by the most popular motion estimation techniques.

📄 Full Content

Motion estimation and compensation is a key component of high-quality video compression, and it is characterized by high computational complexity and memory requirements. Motion estimation (ME) is considered the most time-consuming stage in MPEG processing [1] (up to 90% of the total execution time [2]). Therefore, to reach the performance required by real-time applications, it is imperative to consider the hardware architecture and to use a motion estimation algorithm that reduces computational complexity. The best performance in terms of PSNR is achieved by exhaustive search (ES) ME algorithms, since they examine all possible motion vectors; however, this increases computation time and slows down the compression process [3]. Fast search algorithms, such as the 2-D log search scheme [4], the Three Step Search (TSS) [5], the Four Step Search (FSS) [6] and the Diamond Search (DS) [7], have been proposed. All of them try to achieve the same PSNR as the ES while considering only the most probable motion vectors.

Many researchers have focused on ME algorithms based on texture coding. However, one of the most important concepts introduced by the MPEG-4 visual standard is the use of the video object (VO) as an entity the user can access and manipulate. The instance of a VO at a particular point in time is called a video object plane (VOP) [8]. To support coding of arbitrarily shaped objects, each position in the picture is associated with a Binary Alpha Block (BAB), and the macro-blocks of the image are thus classified as opaque (fully inside the VOP), transparent (not part of the VOP) or on the boundary of the VOP. Therefore, in MPEG-4 video coding, ME of shape is also imperative for real-time VOP-based encoding. Several papers have proposed software implementation methods for shape coders [9], [10], where shape information is used to reduce the number of search points per macro-block and only valid predecessors are evaluated [10] for boundary macro-blocks.

Since a hardware implementation is usually better suited to reaching the complexity required by real-time applications, we propose in this document a gradient-based algorithm in which ME for shape coding is combined with ME for texture. We will use it in a hardware implementation of an MPEG-4 encoder IP to accelerate the convergence process. The algorithm uses shape ME for boundary macro-blocks and texture ME for opaque macro-blocks.

To check its performance, we implemented and tested the proposed algorithm on several test video sequences. Results show that the algorithm delivers good PSNR with a clear decrease in the number of iterations and in computation time. The next section presents background information about video coding and motion estimation, the main idea of the proposed algorithm is described in section 3, and the obtained results are evaluated in section 4.

In video compression, the goal is to remove the redundancy in images and to reduce the number of bits required to represent the video sequence. In addition to the discrete cosine transform (DCT) and the quantization block used to remove spatial redundancy, a typical MPEG encoder uses a motion estimation (ME) and compensation system to remove temporal redundancy between successive frames of the video. In block-based video coding standards such as MPEG-4, the first encoding stage performs motion estimation and compensation for each frame of the video sequence. In this step, we compare the content of the current and previous images and encode only the displaced difference blocks, together with motion vectors, instead of encoding all original blocks. Conventional algorithms generally use matching-based or gradient-based techniques to compute motion vectors.

Matching-based techniques: in these approaches, true motion vectors are determined from the differences of pixel intensities. The best match is the one with the smallest differences between the pixel intensities of the current and reference frames.

Gradient-based techniques: in these approaches, which rely on the “intensity conservation over time” assumption, the spatio-temporal derivatives of pixel intensities are measured to determine true motion vectors. The total derivative of the image intensity function (I) should be zero at every time and for each position in the image.
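The intensity conservation constraint is usually written as the optical-flow equation shown below. This is the standard textbook form, given here as a sketch; the symbols u_x and u_y for the horizontal and vertical velocity components are an assumption chosen to match the u_y notation used later in the text.

```latex
\frac{dI}{dt}
  = \frac{\partial I}{\partial x}\,u_x
  + \frac{\partial I}{\partial y}\,u_y
  + \frac{\partial I}{\partial t}
  = 0,
\qquad
u_x = \frac{dx}{dt},\quad u_y = \frac{dy}{dt}
```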

In the search process, the problem is to find the motion vector MV for the current block B(y,x) of the current frame such that the error, measured as the SAD (sum of absolute differences) between block B and the matching block C of the reference frame, is minimized.
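A minimal sketch of this SAD criterion follows, assuming frames stored as lists of rows and displacements given as (dy, dx) applied to the reference frame. The helper names `sad`, `get_block` and `best_vector`, and the toy ramp frames, are hypothetical and only serve the illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Extract the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_vector(cur, ref, top, left, size, candidates):
    """Return (sad, (dy, dx)) for the candidate with the smallest SAD.

    Candidates that point outside the reference frame are skipped.
    Returns None if no candidate is usable.
    """
    height, width = len(ref), len(ref[0])
    target = get_block(cur, top, left, size)
    best = None
    for dy, dx in candidates:
        r, c = top + dy, left + dx
        if r < 0 or c < 0 or r + size > height or c + size > width:
            continue
        cost = sad(target, get_block(ref, r, c, size))
        if best is None or cost < best[0]:
            best = (cost, (dy, dx))
    return best


# Toy data: the reference frame is an intensity ramp, and the current
# frame at (y, x) holds the reference value found at (y + 1, x + 2).
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]

result = best_vector(cur, ref, top=2, left=2, size=4,
                     candidates=[(0, 0), (1, 2), (-1, 0), (0, 1)])
print(result)
```

Only the listed candidates are scored, which is what separates this kind of search from an exhaustive scan of the whole window.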

For commonly used motion estimation algorithms, there is no limit on the number of steps that the search algorithm can take. We therefore exploit the optical-flow principle and use recursive motion estimation, which is a less complex method for computing dense displacement fields [10]. The proposed algorithm can be divided into two main steps, as shown in Fig. 2. The first step is a block recursive search (BRS), where four candidate vectors (three spatial and one temporal) are evaluated for the current block by recursive block matching. The second step is a pixel recursive search (PRS), where the chosen vector is adjusted by a gradient-based method to find the best approximation.

As shown in Fig. 3, the motion vector is selected among the three motion vectors of the neighboring blocks (A, B and C) and the temporal motion vector of the current block (X). For each vector we compute the motion compensation error as the SAD between the current block B(y,x) and the predicted block C. The best selection is the motion vector that minimizes the SAD.

Since transparent macro-blocks are not part of the video objects, no vectors are evaluated for them. For boundary macro-blocks, which mainly contain shape information, a shape ME is performed: the three spatial vectors are evaluated by referring to shape, computing the compensation error on the BABs around the processed macro-block [9], while the temporal candidate vector is evaluated by referring to texture. For opaque macro-blocks, a texture ME is performed: motion information is calculated by referring to the texture around the processed macro-block, and the candidate vectors are evaluated by computing the compensation error on texture information.
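The candidate selection by macro-block type can be sketched as follows. This is a simplified illustration, not the authors' implementation: all names are hypothetical, displacements are applied to the previous frame, and for brevity all four candidates of a boundary block are scored on the shape data, as in the pseudocode fragment near the end of this document.

```python
TRANSPARENT, BOUNDARY, OPAQUE = "transparent", "boundary", "opaque"


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def block_at(plane, top, left, size):
    """Return the size x size block at (top, left), or None if out of bounds."""
    if (top < 0 or left < 0
            or top + size > len(plane) or left + size > len(plane[0])):
        return None
    return [row[left:left + size] for row in plane[top:top + size]]


def block_recursive_search(mb_type, top, left, size,
                           shape_cur, shape_prev, tex_cur, tex_prev,
                           spatial_candidates, temporal_candidate):
    """Pick the candidate vector with the smallest SAD, or None if skipped."""
    if mb_type == TRANSPARENT:
        return None  # not part of the video object: nothing to evaluate
    if mb_type == BOUNDARY:
        cur, prev = shape_cur, shape_prev   # binary alpha (shape) data
    else:
        cur, prev = tex_cur, tex_prev       # texture (pixel) data
    target = block_at(cur, top, left, size)
    best = None
    for dy, dx in list(spatial_candidates) + [temporal_candidate]:
        candidate = block_at(prev, top + dy, left + dx, size)
        if candidate is None:
            continue
        cost = sad(target, candidate)
        if best is None or cost < best[0]:
            best = (cost, (dy, dx))
    return best


# Toy shape planes: a vertical object edge that moved one pixel to the left.
shape_prev = [[1 if x >= 4 else 0 for x in range(8)] for _ in range(8)]
shape_cur = [[1 if x >= 3 else 0 for x in range(8)] for _ in range(8)]
# Toy texture planes: an intensity ramp displaced by (1, 2).
tex_prev = [[10 * y + x for x in range(8)] for y in range(8)]
tex_cur = [[tex_prev[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)]
           for y in range(8)]

planes = (shape_cur, shape_prev, tex_cur, tex_prev)
boundary = block_recursive_search(BOUNDARY, 2, 2, 4, *planes,
                                  [(0, 0), (0, 1), (1, 0)], (0, -1))
opaque = block_recursive_search(OPAQUE, 2, 2, 4, *planes,
                                [(0, 0), (1, 2), (1, 1)], (0, 2))
skipped = block_recursive_search(TRANSPARENT, 2, 2, 4, *planes,
                                 [(0, 0), (0, 1), (1, 0)], (0, -1))
print(boundary, opaque, skipped)
```

Because at most four vectors are scored per block, the cost of this step does not depend on the size of the search window.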

This stage refines the ME process: the selected vector is updated using a gradient-based technique.

The displacement vector d at the current position is obtained by a recursive update [11], in which the correction term is scaled by the so-called convergence factor.
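For orientation, a widely used textbook form of such a pel-recursive (steepest-descent) update is given below. It is a sketch of the general technique, not a reproduction of the paper's own equation: the symbol ε for the convergence factor, the frame indices t and t-1, and the definition of the DPD are assumptions made for this illustration.

```latex
\begin{align}
\mathrm{DPD}(\mathbf{x}, \mathbf{d})
  &= f(\mathbf{x}, t) - f(\mathbf{x} - \mathbf{d},\, t-1) \\
\mathbf{d}_{i}
  &= \mathbf{d}_{i-1}
   - \varepsilon\,\mathrm{DPD}(\mathbf{x}, \mathbf{d}_{i-1})\,
     \nabla f(\mathbf{x} - \mathbf{d}_{i-1},\, t-1)
\end{align}
```

The second line follows from descending the gradient of the squared DPD with respect to d, with the constant factor absorbed into ε.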

The displaced pixel difference (DPD) is computed iteratively until its minimum value is reached, starting from the selected vector d_i, which corresponds to the minimum value obtained from the BRS. Replacing the gradient function in equation (2) by its approximation yields the displacement vector equation (5).

In that equation, f(x) is the pixel's gray level at the location given by position x, and a threshold value decreases the sensitivity of the pixel recursion to noise; it is usually set to two or three [10].

The corresponding equation for u_y is obtained by exchanging the index.

In the MPEG-4 visual standard the default block size for motion compensation is 16×16; to improve compression efficiency, the standard supports four motion vectors per macro-block. Therefore, the recursive search works with 8×8 blocks.

The PRS compares the DPD at the current position, pointed to by the vector selected in the BRS, with the DPD at the other positions, which are obtained by shifting the predicted block by an update vector MV in the eight directions (Fig. 4).

[Fig. 4: update vectors around the current block; recoverable labels: m = (0,0), (1,0), (1,1), (0,1)]

The final motion vector will be computed for the position with the smallest DPD.
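The eight-direction comparison can be sketched as below, assuming the DPD of a block is accumulated as a sum of absolute pixel differences and that the update vectors are unit shifts. The function names, the 16×16 toy frames and the (dy, dx) convention are hypothetical choices for this illustration.

```python
# The unshifted position plus the eight unit shifts around it.
UPDATES = [(0, 0)] + [(uy, ux) for uy in (-1, 0, 1) for ux in (-1, 0, 1)
                      if (uy, ux) != (0, 0)]


def block_dpd(cur, ref, top, left, size, vector):
    """Absolute displaced pixel difference summed over one block.

    Returns None when the displaced block leaves the reference frame.
    """
    r, c = top + vector[0], left + vector[1]
    if r < 0 or c < 0 or r + size > len(ref) or c + size > len(ref[0]):
        return None
    return sum(abs(cur[top + y][left + x] - ref[r + y][c + x])
               for y in range(size) for x in range(size))


def refine(cur, ref, top, left, size, vector):
    """Compare the selected vector with its eight shifted versions.

    Returns (best_vector, best_dpd) among the nine tested positions.
    """
    best_vec, best_cost = None, None
    for uy, ux in UPDATES:
        cand = (vector[0] + uy, vector[1] + ux)
        cost = block_dpd(cur, ref, top, left, size, cand)
        if cost is not None and (best_cost is None or cost < best_cost):
            best_vec, best_cost = cand, cost
    return best_vec, best_cost


# Toy frames: an intensity ramp whose content is displaced by (1, 2).
ref = [[10 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)]
       for y in range(16)]

# An 8x8 block whose block-level estimate (1, 1) is one pixel off.
refined = refine(cur, ref, top=4, left=4, size=8, vector=(1, 1))
# Starting from (0, 0), the true vector is out of reach in a single pass.
from_zero = refine(cur, ref, top=4, left=4, size=8, vector=(0, 0))
print(refined, from_zero)
```

A single pass tests nine positions, so it can only correct an estimate that is already within one pixel of the true displacement; that is why the quality of the block-level candidate matters.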

To check its performance, the proposed algorithm is evaluated on four MPEG-4 test video sequences in QCIF format (176×144): Caltrain, Weather, Foreman and Carphone. The Caltrain sequence contains several moving objects on a textured background; Weather and Carphone are low-motion video clips, while Foreman contains some quick-motion scenes.

All tested video scenes are used to generate frame-by-frame motion vectors, with a distance of two frames between the current frame and the reference frame. The comparison between the reference and reconstructed images, together with the residual image (Fig. 5d), indicates the efficiency and performance of the proposed algorithm. There is no significant visual difference between the reference and reconstructed images, and the difference image does not contain significant energy, which means that the estimation performed by the algorithm is good.

The same video sequence is used to compare the performance of the proposed algorithm with that of the motion estimation techniques presented above, which are widely accepted by the video compression community and have been used in the implementation of various standards. Motion-compensated images, created from the motion vectors, are compared to the reference frame by computing the Peak Signal-to-Noise Ratio (PSNR).
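The PSNR comparison can be sketched as follows, assuming 8-bit gray-level frames (peak value 255) stored as lists of rows. The function name `psnr` and the tiny test frames are hypothetical.

```python
import math


def psnr(reference, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_error = 0
    count = 0
    for row_ref, row_rec in zip(reference, reconstructed):
        for a, b in zip(row_ref, row_rec):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


frame = [[100, 120], [140, 160]]
off_by_one = [[101, 121], [141, 161]]   # every pixel differs by 1, so MSE = 1

print(psnr(frame, frame))
print(round(psnr(frame, off_by_one), 2))
```

A higher PSNR means the motion-compensated image is closer to the reference frame, which is why the exhaustive search serves as the upper bound in the comparisons.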

In the ES case, since the current block is compared with all blocks in the search window, the PSNR values are the highest. Fast algorithms attempt to achieve the same PSNR as the ES with minimum computation. Fig. 6 and Fig. 7 show, respectively, the PSNR (in dB) and the computation time obtained for the “Caltrain” sequence. Experimental results demonstrate that the proposed algorithm (P.A) has a PSNR comparable to that of the ES algorithm and achieves a consistent improvement in PSNR over the TSS algorithm, which has been widely accepted as one of the best ME algorithms for low bit rate real-time video applications [12,13].

Table 1 shows the average number of search points per macro-block for the tested sequences, obtained for a search window size of 7x7. While the ES tests around 205 search points per macro-block, the other tested ME algorithms achieve good performance with a higher speed-up ratio. All tested fast algorithms clearly reduce the number of comparisons required per macro-block relative to the ES (an average of 15 search points for DS, for example), but the proposed algorithm has the best computation time and brings the number of comparisons per macro-block down to an average of 6.5 search points. Although the PSNR results obtained for DS and FSS are roughly the same as those of the P.A for video scenes with low motion (Weather, Carphone), the P.A achieves better PSNR than both for video scenes with quick or complex motion (Foreman, Caltrain), even though it checks fewer search points.
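As a quick check of the figures quoted above, the reduction in search points relative to the ES can be worked out directly. The ratios below count search points only; they are not measured run-time speed-ups.

```latex
\frac{N_{\mathrm{ES}}}{N_{\mathrm{DS}}} = \frac{205}{15} \approx 13.7,
\qquad
\frac{N_{\mathrm{ES}}}{N_{\mathrm{P.A}}} = \frac{205}{6.5} \approx 31.5
```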

In this work we have proposed a new, efficient algorithm for motion estimation based on the spatio-temporal gradient, which uses block and pixel recursive search. The algorithm relies on shape motion estimation and takes advantage of the texture-shape correlation. Simulations show that the proposed algorithm reduces the number of comparisons and the computation time required for the motion estimation process with negligible quality degradation. Compared with commonly used fast algorithms, the proposed algorithm gives the best PSNR results, very close to those obtained by the ES algorithm, with a number of compared blocks that is largely independent of the kind of video processed. Also, for real-time applications, one can take advantage of the invariable computations of the algorithm to control and reduce processing delay.

current image (BAB) and the previous one (BAB_P) %%
    SAD_A = SAD[BAB_P(X), BAB(X+A)];
    SAD_B = SAD[BAB_P(X), BAB(X+B)];
    SAD_C = SAD[BAB_P(X), BAB(X+C)];
    SAD_X = SAD[BAB_P(X), BAB(X+T)];
    %% computation of the minimum distortion
    [position, sad_min] = Min[SAD_A, SAD_B, SAD_C, SAD_X]
else %% type == Opaque
    %% distortion will be measured based on video data of the current image (VID_C) and the previous one (VID_P) %%
    SAD_A = SAD[VID_P(X), VID_C(X+A)];
    SAD_B = SAD[VID_P(X), VID_C(X+B)];
    SAD_C = SAD[VID_P(X), VID_C(X+C)];
    SAD_X = SAD[VID_P(X), VID_C(X+T)];

shows average values of PSNR obtained for each video sequence. Results demonstrate that the proposed algorithm (P.A) has nearly the same results as the ES algorithm for the four sequences and achieves a consistent improvement in PSNR over the TSS algorithm.

Reference

This content is AI-processed based on open access ArXiv data.