MAP Estimation of Semi-Metric MRFs via Hierarchical Graph Cuts

We consider the task of obtaining the maximum a posteriori estimate of discrete pairwise random fields with arbitrary unary potentials and semi-metric pairwise potentials. For this problem, we propose an accurate hierarchical move-making strategy where each move is computed efficiently by solving an st-MINCUT problem. Unlike previous move-making approaches, e.g. the widely used α-expansion algorithm, our method obtains the guarantees of the standard linear programming (LP) relaxation for the important special case of metric labeling. Unlike the existing LP relaxation solvers, e.g. interior-point algorithms or tree-reweighted message passing, our method is significantly faster as it uses only the efficient st-MINCUT algorithm in its design. Using both synthetic and real data experiments, we show that our technique outperforms several commonly used algorithms.

💡 Research Summary

The paper addresses the problem of computing a maximum‑a‑posteriori (MAP) labeling for discrete pairwise Markov random fields (MRFs) whose unary terms are unrestricted and whose pairwise potentials are semi‑metrics. This setting covers many vision and graphics tasks in which the cost of assigning different labels to two neighboring variables is symmetric and non‑negative but need not obey the triangle inequality. Among existing move‑making algorithms, α‑expansion requires metric potentials for its approximation guarantee (a factor of 2 for the Potts model), while α‑β‑swap applies to semi‑metrics but carries no comparable bound; both can converge slowly when the label space is large. Linear‑programming (LP) relaxations, on the other hand, come with provable approximation guarantees for metric labeling but are computationally expensive: interior‑point solvers require large memory footprints, and message‑passing schemes such as sequential tree‑reweighted message passing (TRW‑S) can be difficult to implement and slow to converge.
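To make the distinction between semi‑metric and metric potentials concrete, here is a minimal Python sketch that tests a distance function over a small label set. The function names and the two example distances are illustrative assumptions, not taken from the paper, and the check also assumes the common convention that the distance is zero exactly when the two labels coincide.

```python
from itertools import product


def is_semimetric(d, labels):
    """Symmetric, non-negative, and zero exactly for identical labels (assumed convention)."""
    return all(
        d(a, b) == d(b, a) and d(a, b) >= 0 and ((d(a, b) == 0) == (a == b))
        for a, b in product(labels, repeat=2)
    )


def obeys_triangle(d, labels):
    """True if d(a, c) <= d(a, b) + d(b, c) for every triple of labels."""
    return all(
        d(a, c) <= d(a, b) + d(b, c) for a, b, c in product(labels, repeat=3)
    )


def truncated_linear(a, b):
    # Illustrative metric: absolute difference capped at 3.
    return min(abs(a - b), 3)


def truncated_quadratic(a, b):
    # Illustrative semi-metric: squared difference capped at 9.
    return min((a - b) ** 2, 9)


labels = range(5)
for name, d in [("truncated linear", truncated_linear),
                ("truncated quadratic", truncated_quadratic)]:
    print(name, "semi-metric:", is_semimetric(d, labels),
          "triangle inequality:", obeys_triangle(d, labels))
```

The truncated quadratic distance is a typical case of a semi‑metric that is not a metric: the cost between labels 0 and 2 is 4, which exceeds the cost of going through label 1 (1 + 1).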

The authors propose a hierarchical move‑making framework that retains the computational simplicity of graph‑cut based methods while matching the guarantees of the standard LP relaxation for metric labeling. The key idea is to organize the label set into a tree‑structured hierarchy (a label clustering tree). Each internal node of the tree represents a cluster of labels, and the root represents the whole label space. At each iteration the algorithm selects an internal node (i.e., a cluster) and asks every variable whether it should keep its current label (a leaf) or move to a candidate label inside the selected cluster. This binary decision for each variable can be encoded as an s‑t cut in a graph: the cost of cutting the edge from a variable to the source encodes the increase in unary energy if the variable stays, while the cost of cutting the edge to the sink encodes the increase if it moves to the cluster. Pairwise semi‑metric costs are represented by edges between neighboring variables, weighted by the semi‑metric distance between the two candidate labels (the current label and the candidate label in the chosen cluster). The minimum s‑t cut yields the optimal move for the chosen cluster provided the resulting binary pairwise terms are submodular. Symmetry and non‑negativity alone ensure this for swap‑type moves; expansion‑type moves additionally need the triangle inequality to hold among the labels involved, so the move construction has to be chosen to keep the terms submodular.
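The reduction of one binary "keep or switch" move to an s‑t minimum cut can be sketched as follows. This is a generic textbook‑style construction for submodular binary energies with a small Edmonds‑Karp max‑flow, written under stated assumptions; it is not the paper's exact graph, and all names (`min_cut_labels`, `best_binary_move`, the toy data) are hypothetical.

```python
from collections import defaultdict, deque


def energy(labels, unary, edges, dist):
    e = sum(unary[i][l] for i, l in enumerate(labels))
    return e + sum(w * dist(labels[i], labels[j]) for (i, j), w in edges.items())


def min_cut_labels(n, unary, pairwise):
    """Minimise sum_i unary[i][t_i] + sum_ij pairwise[(i, j)][t_i][t_j], t_i in {0, 1}.

    Each pairwise table p must be submodular: p[0][0] + p[1][1] <= p[0][1] + p[1][0].
    """
    s, t = n, n + 1
    cap = defaultdict(lambda: defaultdict(int))
    cost = [list(u) for u in unary]
    for (i, j), p in pairwise.items():
        a, b, c, d = p[0][0], p[0][1], p[1][0], p[1][1]
        if a + d > b + c:
            raise ValueError(f"pairwise term {(i, j)} is not submodular")
        # p(ti, tj) = a + (c - a) ti + (d - c) tj + (b + c - a - d) (1 - ti) tj
        cost[i][1] += c - a
        cost[j][1] += d - c
        cap[i][j] += b + c - a - d       # paid when ti = 0 and tj = 1
    for i in range(n):
        m = min(cost[i])                 # shift so both capacities are non-negative
        cap[s][i] += cost[i][1] - m      # cut when i ends on the sink side (label 1)
        cap[i][t] += cost[i][0] - m      # cut when i ends on the source side (label 0)

    def reachable():
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:                          # Edmonds-Karp augmentation
        parent = reachable()
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        f = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= f
            cap[v][u] += f
    # Source side of the final residual graph takes label 0, the rest label 1.
    return [0 if i in parent else 1 for i in range(n)]


def best_binary_move(current, candidate, unary, edges, dist):
    """t_i = 0 keeps current[i]; t_i = 1 switches variable i to candidate[i]."""
    n = len(current)
    choice = list(zip(current, candidate))
    u = [[unary[i][choice[i][0]], unary[i][choice[i][1]]] for i in range(n)]
    p = {(i, j): [[w * dist(choice[i][a], choice[j][b]) for b in (0, 1)]
                  for a in (0, 1)]
         for (i, j), w in edges.items()}
    t = min_cut_labels(n, u, p)
    return [choice[i][t[i]] for i in range(n)]


# Toy chain of three variables with labels 0..2 and a metric distance (assumed data).
unary = [[0, 3, 5], [4, 1, 3], [5, 1, 0]]
edges = {(0, 1): 2, (1, 2): 2}
dist = lambda a, b: abs(a - b)
current, candidate = [0, 0, 2], [1, 1, 1]
new_labels = best_binary_move(current, candidate, unary, edges, dist)
print(new_labels, energy(new_labels, unary, edges, dist))
```

The sketch raises an error instead of silently returning a wrong answer when a pairwise table is not submodular, which is the situation an expansion‑type move can run into with a distance that violates the triangle inequality.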

The algorithm proceeds hierarchically: starting from the root, it performs a cut for each internal node, updates the labeling according to the optimal move, and then recurses down the tree. A move is accepted only if it strictly reduces the overall energy; the process stops when no further improvement can be found at any level. Because every move is a single max‑flow/min‑cut computation, the total runtime is estimated to scale roughly as O(|E|·log|L|), where |E| is the number of edges in the MRF and |L| is the number of labels. Importantly, the authors prove that for metric potentials the hierarchical moves obtain the same approximation guarantees as the standard LP relaxation. For semi‑metric potentials, the method still descends monotonically in energy and often attains lower energies than α‑expansion, despite the lack of a formal bound.
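The outer loop described above (visit the clusters from the root down, accept a move only if it strictly lowers the energy, stop when a full pass brings no improvement) might look like the sketch below. Several things here are assumptions made for illustration: the label tree is a nested tuple, each variable's candidate is the cluster label with the lowest unary cost, and the move itself is solved by brute force as a stand‑in for the st‑MINCUT step, which is only feasible for a handful of variables.

```python
from itertools import product


def energy(labels, unary, edges, dist):
    e = sum(unary[i][l] for i, l in enumerate(labels))
    return e + sum(w * dist(labels[i], labels[j]) for (i, j), w in edges.items())


def leaves(node):
    """Labels under a tree node; a tuple is an internal node, anything else a label."""
    if not isinstance(node, tuple):
        return [node]
    return [label for child in node for label in leaves(child)]


def internal_clusters(tree):
    """Yield the label set of every internal node, root first (breadth-first)."""
    queue = [tree]
    while queue:
        node = queue.pop(0)
        if isinstance(node, tuple):
            yield leaves(node)
            queue.extend(node)


def brute_force_move(labels, cluster, unary, edges, dist):
    """Stand-in for the min-cut step: try every keep/switch pattern."""
    cand = [min(cluster, key=lambda l: unary[i][l]) for i in range(len(labels))]
    best, best_e = labels, energy(labels, unary, edges, dist)
    for t in product((0, 1), repeat=len(labels)):
        trial = [c if ti else l for l, c, ti in zip(labels, cand, t)]
        e = energy(trial, unary, edges, dist)
        if e < best_e:
            best, best_e = trial, e
    return best


def hierarchical_descent(labels, tree, unary, edges, dist, solve_move=brute_force_move):
    labels = list(labels)
    improved = True
    while improved:                      # repeat passes until nothing improves
        improved = False
        for cluster in internal_clusters(tree):
            proposal = solve_move(labels, cluster, unary, edges, dist)
            # Accept only strict decreases, so the energy never goes up.
            if energy(proposal, unary, edges, dist) < energy(labels, unary, edges, dist):
                labels, improved = proposal, True
    return labels


# Toy problem (assumed data): 3 variables, 4 labels, truncated quadratic semi-metric.
tree = ((0, 1), (2, 3))
unary = [[0, 3, 5, 6], [4, 1, 3, 6], [6, 5, 1, 0]]
edges = {(0, 1): 1, (1, 2): 1}
dist = lambda a, b: min((a - b) ** 2, 4)
start = [3, 3, 3]
final = hierarchical_descent(start, tree, unary, edges, dist)
print(final, energy(final, unary, edges, dist))
```

Because only strictly improving moves are accepted and the number of labelings is finite, the loop always terminates; swapping `solve_move` for a min‑cut based solver would not change that argument.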

Experimental evaluation is carried out on both synthetic benchmarks and real‑world vision problems (e.g., stereo disparity, image segmentation, and denoising) with label sets of 50 to 200 labels. The proposed hierarchical graph‑cut method is compared against α‑expansion, α‑β‑swap, TRW‑S, and an interior‑point LP solver. Results show that the new algorithm consistently achieves the lowest or near‑lowest energy across all test cases. In terms of speed, it outperforms the interior‑point LP solver by an order of magnitude and is 3–6× faster than α‑expansion when the label set is large. Moreover, in scenarios where the pairwise terms are far from metric (e.g., color differences that violate the triangle inequality), α‑expansion often stalls in poor local minima, whereas the hierarchical moves can make large‑scale label changes that escape such traps, leading to substantially better solutions.

The paper’s contributions can be summarized as follows:

Introduction of a hierarchical clustering of labels that enables large‑scale moves, each solved optimally by a single s‑t cut.
Demonstration that the binary move energies remain submodular under semi‑metric pairwise terms, allowing the use of standard max‑flow/min‑cut solvers without any modification.
Proof that for metric labeling the method attains the same approximation guarantees as the standard LP relaxation, bridging the gap between fast move‑making algorithms and theoretically sound LP‑based solvers.
Empirical evidence that the approach is both faster and more accurate than state‑of‑the‑art move‑making, message‑passing, and LP‑based methods on a variety of synthetic and real datasets.

Future directions suggested by the authors include extending the framework to non‑symmetric distances, incorporating higher‑order potentials, learning the label hierarchy from data, and exploiting GPU‑accelerated max‑flow implementations to further scale the method to millions of variables and thousands of labels. In summary, the work presents a practical, theoretically grounded algorithm that brings together the speed of graph‑cut based move making and the robustness of LP relaxation for semi‑metric MRFs, offering a compelling alternative for large‑scale MAP inference in computer vision and related fields.