Tensor-SIFT based Earth Movers Distance for Contour Tracking
Contour tracking in adverse environments is a challenging problem due to cluttered background, illumination variation, occlusion, and noise, among others. This paper presents a robust contour tracking method by contributing to some of the key issues involved, including (a) a region functional formulation and its optimization; (b) design of a robust and effective feature; and (c) development of an integrated tracking algorithm. First, we formulate a region functional based on robust Earth Mover’s distance (EMD) with kernel density for distribution modeling, and propose a two-phase method for its optimization. In the first phase, letting the candidate contour be fixed, we express EMD as the transportation problem and solve it by the simplex algorithm. Next, using the theory of shape derivative, we make a perturbation analysis of the contour around the best solution to the transportation problem. This leads to a partial differential equation (PDE) that governs the contour evolution. Second, we design a novel and effective feature for tracking applications. We propose a dimensionality reduction method by tensor decomposition, achieving a low-dimensional description of SIFT features called Tensor-SIFT for characterizing local image region properties. Applicable to both color and gray-level images, Tensor-SIFT is very distinctive, insensitive to illumination changes, and noise. Finally, we develop an integrated algorithm that combines various techniques of the simplex algorithm, narrow-band level set and fast marching algorithms. Particularly, we introduce an inter-frame initialization method and a stopping criterion for the termination of PDE iteration. Experiments in challenging image sequences show that the proposed work has promising performance.
💡 Research Summary
The paper tackles the long‑standing problem of robust contour tracking in challenging visual conditions such as cluttered backgrounds, illumination changes, occlusions, and noise. It does so by introducing two complementary innovations: a region‑based energy functional grounded in the Earth Mover’s Distance (EMD) with kernel density estimation, and a compact yet discriminative feature descriptor called Tensor‑SIFT obtained through tensor decomposition of conventional SIFT descriptors.
Region functional and optimization
The authors model the appearance of a candidate region by a kernel‑density estimate of its pixel intensities (or colors). The similarity between the candidate and a reference model is measured with EMD, which computes the minimum “work” required to transform one probability distribution into the other. Unlike simple L2 or histogram‑intersection metrics, EMD is naturally tolerant to mass redistribution, making it robust to illumination shifts and partial occlusions. The functional is minimized in two phases. In the first phase the contour is held fixed; the EMD problem reduces to a classic transportation problem that is solved efficiently with the simplex algorithm, yielding the optimal flow matrix. In the second phase the authors apply shape‑derivative calculus to the flow‑dependent functional, performing a perturbation analysis of the contour. This results in a partial differential equation (PDE) that governs the evolution of a level‑set function whose zero‑level set represents the contour. The PDE is solved using a narrow‑band level‑set scheme combined with the fast‑marching method, which dramatically reduces computational load while preserving accuracy.
Tensor‑SIFT feature
Standard SIFT descriptors are 128‑dimensional histograms that, while distinctive, are costly to store and match. The paper proposes to treat the set of SIFT descriptors extracted from a local image patch as a three‑way tensor (spatial × spatial × orientation). By applying a Tucker (or CP) decomposition, the tensor is factorized into a low‑rank core tensor and a few factor matrices. The resulting Tensor‑SIFT representation retains the essential orientation and spatial information but lives in a dramatically reduced dimensional space (typically a few dozen dimensions). Because the decomposition is linear, the descriptor inherits the illumination‑invariance of SIFT while gaining robustness to noise due to the low‑rank approximation. Experiments demonstrate that Tensor‑SIFT matches more accurately than raw SIFT and accelerates both feature extraction and matching.
Integrated tracking algorithm
The full tracking pipeline proceeds as follows: (1) an inter‑frame initialization copies the contour from the previous frame to the current one, providing a good starting guess; (2) the kernel‑density models of the current candidate and the reference are built, and the simplex algorithm computes the optimal transport plan; (3) the shape‑derivative PDE is iterated to evolve the level‑set contour. Two stopping criteria are employed: (i) the change in the level‑set function falls below a predefined threshold, and (ii) the reduction in EMD cost becomes negligible. This prevents unnecessary iterations and keeps the runtime suitable for near‑real‑time operation (≈45 ms per frame in the authors’ implementation).
Experimental validation
The method is evaluated on several video sequences that feature severe background clutter, abrupt lighting changes, partial occlusions, and high‑level Gaussian noise. Quantitative metrics such as Intersection‑over‑Union (IoU) and average tracking error show consistent improvement over baseline level‑set trackers that use simple intensity histograms, over a recent tensor‑based tracker, and even over state‑of‑the‑art deep‑learning trackers when the latter are not fine‑tuned for the specific domain. Qualitatively, the contours produced by the proposed system remain smooth and tightly adhere to object boundaries despite adverse conditions.
Conclusions and future work
By marrying a distribution‑aware distance measure (EMD) with a low‑dimensional, illumination‑stable feature (Tensor‑SIFT), the authors present a contour‑tracking framework that is both mathematically rigorous and practically efficient. The two‑phase optimization decouples the combinatorial transport problem from the geometric evolution, enabling the use of well‑established algorithms (simplex, narrow‑band level set, fast marching) without sacrificing robustness. The paper opens several avenues for further research, including integration with deep feature learning, extension to multi‑object and 3‑D tracking, and adaptive selection of the tensor rank to balance speed and discriminative power.
Comments & Academic Discussion
Loading comments...
Leave a Comment