A Branch and Cut Algorithm for the Halfspace Depth Problem

The concept of data depth in non-parametric multivariate descriptive statistics generalizes the univariate rank method to multivariate data. Halfspace depth is one such measure. Given a set S of points and a point p, the halfspace depth (or rank) k of p is defined as the minimum number of points of S contained in any closed halfspace with p on its boundary. Computing halfspace depth is NP-hard, and it is equivalent to the Maximum Feasible Subsystem problem. In this thesis, a mixed integer program for the halfspace depth problem is formulated using the big-M method, and a branch and cut algorithm is proposed. In this algorithm, Chinneck’s heuristic algorithm is used to find an upper bound, and a related technique based on sensitivity analysis is used for branching. Irreducible Infeasible Subsystem (IIS) hitting set cuts are applied. We also suggest a binary search algorithm which may be numerically more stable. The algorithms are implemented with the BCP framework from the COIN-OR project.


💡 Research Summary

The paper addresses the computational problem of half‑space depth, a multivariate data‑depth measure that generalizes the univariate rank concept. Given a finite set S of n points in ℝ^d and a query point p, the half‑space depth of p is defined as the smallest number of points of S that lie in any closed half‑space having p on its boundary. Although conceptually simple, determining this depth is NP‑hard when the dimension d is part of the input, and it can be shown to be equivalent to the Maximum Feasible Subsystem (MFS) problem.

The authors first formulate the depth problem as a mixed‑integer linear program (MIP) using the classic big‑M technique. For each data point i they introduce a binary variable z_i that indicates whether the corresponding linear inequality (derived from the half‑space definition) is violated. Together with a sufficiently large constant M, the constraints are written as
  a_i·w + M·z_i ≥ b_i, i = 1,…,n,
where w is the normal vector of the candidate half‑space. Minimizing ∑ z_i yields the depth k. The big‑M formulation is exact but introduces numerical scaling issues; the paper discusses practical guidelines for choosing M.
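As a concrete illustration, the big‑M model can be prototyped with SciPy's MILP interface. This is a hypothetical sketch, not the thesis's BCP/C++ implementation: a_i and b_i are instantiated here as (p − x_i) and 1 (translating p to the boundary and fixing the scale of w), and the box bound B on w and the value of M are illustrative choices.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def halfspace_depth_bigM(points, p, M=100.0, B=10.0):
    """Halfspace depth of p via the big-M MIP (illustrative sketch).

    Variables: the normal vector w in [-B, B]^d and one binary z_i per
    data point.  Setting z_i = 0 forces x_i strictly outside the closed
    halfspace {x : w.(x - p) >= 0}; minimizing sum(z) therefore counts
    the fewest points any such halfspace through p must contain.
    """
    X = np.asarray(points, dtype=float)
    p = np.asarray(p, dtype=float)
    n, d = X.shape
    # One constraint row per point:  (p - x_i).w + M * z_i >= 1
    A = np.hstack([p - X, M * np.eye(n)])
    cons = LinearConstraint(A, lb=np.ones(n), ub=np.inf)
    c = np.concatenate([np.zeros(d), np.ones(n)])            # minimize sum(z)
    integrality = np.concatenate([np.zeros(d), np.ones(n)])  # w continuous, z binary
    bounds = Bounds(np.concatenate([-B * np.ones(d), np.zeros(n)]),
                    np.concatenate([B * np.ones(d), np.ones(n)]))
    res = milp(c, constraints=cons, integrality=integrality, bounds=bounds)
    return int(round(res.fun))
```

Note that M must dominate max_i |(p − x_i)·w| over the box so that z_i = 1 truly deactivates row i; M = 100 suffices for the small coordinates above, which illustrates in miniature the scaling issue the paper discusses.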

To obtain a good initial upper bound, the authors employ Chinneck’s heuristic for the MFS problem. The heuristic iteratively removes the most violated constraint; the number of removed constraints provides an upper bound on the depth. Although not guaranteed to be optimal, this bound is typically within 10–15% of the true depth and dramatically reduces the size of the branch‑and‑cut tree.
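A simplified stand‑in for this heuristic can be written with an elastic LP: each constraint's violation is absorbed by an elastic variable e_i, and the constraint with the largest elastic value is dropped each round. Chinneck's actual method uses more refined candidate tests; this sketch only mirrors its drop‑and‑recount structure, and the bound B is an illustrative choice.

```python
import numpy as np
from scipy.optimize import linprog

def mfs_upper_bound(A, b, B=10.0):
    """Greedy elastic heuristic in the spirit of Chinneck's MFS method.

    System: find w with A @ w <= b.  Each round solves the elastic LP
        min sum(e)  s.t.  A w - e <= b,  e >= 0,
    then drops the constraint with the largest elastic violation.  The
    number of drops upper-bounds the minimum number of constraints that
    must be removed (for the depth model: the halfspace depth).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = A.shape[1]
    active = list(range(len(b)))
    dropped = 0
    while True:
        m = len(active)
        # Variables: [w (d entries), e (m entries)]
        A_ub = np.hstack([A[active], -np.eye(m)])
        c = np.concatenate([np.zeros(d), np.ones(m)])
        bounds = [(-B, B)] * d + [(0, None)] * m
        res = linprog(c, A_ub=A_ub, b_ub=b[active], bounds=bounds,
                      method="highs")
        e = res.x[d:]
        if e.sum() < 1e-9:          # remaining constraints all satisfiable
            return dropped
        active.pop(int(np.argmax(e)))   # drop the most-violated constraint
        dropped += 1
```

For the four corners of a square around the origin (true depth 2), the greedy drop order may cost one extra removal, which is exactly the gap between the heuristic bound and the optimum that the branch‑and‑cut search then closes.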

Branching is driven by a sensitivity‑analysis strategy rather than a naïve variable‑selection rule. After solving the LP relaxation, the dual values and slacks of the constraints are examined; constraints with slacks close to zero and large dual values are deemed most “critical.” The algorithm creates two child nodes—one forcing the corresponding z_i to 1 (constraint violated) and the other forcing it to 0 (constraint satisfied). Empirical results show that this rule yields shallower search trees than standard pseudo‑cost or most‑fractional branching.
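The selection rule can be caricatured in a few lines. The scoring formula below is purely illustrative (the thesis derives its rule from an LP sensitivity analysis, not this exact ratio):

```python
def pick_branching_constraint(slacks, duals, eps=1e-6):
    """Pick the 'most critical' constraint of the LP relaxation.

    Illustrative rule: small slack combined with a large dual value
    gives a high score.  The returned index i spawns two children:
    z_i = 1 (constraint relaxed) and z_i = 0 (constraint enforced).
    """
    scores = [abs(d) / (abs(s) + eps) for s, d in zip(slacks, duals)]
    return max(range(len(scores)), key=scores.__getitem__)
```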

For cutting, the paper introduces Irreducible Infeasible Subsystem (IIS) hitting‑set cuts. Whenever an infeasible combination of constraints is detected, an IIS is extracted (using a hybrid of exact MIP‑based detection and a fast graph‑based approximation). The resulting cut ∑_{i∈IIS} z_i ≥ 1 states that at least one constraint of the IIS must be relaxed, eliminating the infeasible combination from future consideration. This cut family is provably valid for the depth MIP and effectively prunes large portions of the search space.
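One classic way to extract an IIS is the deletion filter; the sketch below isolates a single IIS from an infeasible system A·w ≤ b by LP feasibility checks. It stands in for the exact part of the hybrid only (the graph‑based approximation mentioned in the paper is not reproduced here), and the box bound B is an illustrative choice.

```python
import numpy as np
from scipy.optimize import linprog

def subsystem_feasible(A, b, idx, B=10.0):
    """Is the subsystem {A[i] @ w <= b[i] : i in idx} satisfiable?"""
    if not idx:
        return True
    d = A.shape[1]
    res = linprog(np.zeros(d), A_ub=A[idx], b_ub=b[idx],
                  bounds=[(-B, B)] * d, method="highs")
    return res.status == 0          # 0 = optimal (feasible), 2 = infeasible

def iis_deletion_filter(A, b):
    """Classic deletion filter.

    Try removing each constraint in turn; keep it only if its removal
    makes the remaining system feasible (i.e., it is essential to the
    infeasibility).  The surviving constraints form one IIS, which
    yields the hitting-set cut  sum_{i in IIS} z_i >= 1.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    keep = list(range(len(b)))
    for i in list(keep):
        trial = [j for j in keep if j != i]
        if not subsystem_feasible(A, b, trial):
            keep = trial            # still infeasible without i: discard i
    return keep
```

On the square example (constraints x_i·w ≤ −1 for the four corners), the filter isolates an antipodal pair of corners, which is indeed irreducible: neither constraint alone is infeasible, but the pair is.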

Because the big‑M model can become numerically unstable when the depth is very small or very large, the authors also propose a binary‑search wrapper. Instead of minimizing ∑ z_i directly, the algorithm repeatedly solves a feasibility MIP for a fixed depth bound k, asking “Is depth ≤ k?” The search converges to the exact depth after O(log n) feasibility checks, avoiding the need for fine‑grained objective optimization in extreme cases.
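A sketch of the wrapper, reusing the big‑M rows of the depth model but with a zero objective and a budget row ∑ z_i ≤ k; as before, the instantiation of a_i, b_i and the choices of M and B are illustrative, not the thesis's settings.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def depth_by_binary_search(points, p, M=100.0, B=10.0):
    """Binary search on k: each step asks the feasibility MIP
    'is depth <= k?' and narrows [lo, hi] accordingly."""
    X = np.asarray(points, dtype=float)
    p = np.asarray(p, dtype=float)
    n, d = X.shape
    A = np.hstack([p - X, M * np.eye(n)])        # (p - x_i).w + M z_i >= 1
    budget = np.concatenate([np.zeros(d), np.ones(n)]).reshape(1, -1)
    integrality = np.concatenate([np.zeros(d), np.ones(n)])
    bounds = Bounds(np.concatenate([-B * np.ones(d), np.zeros(n)]),
                    np.concatenate([B * np.ones(d), np.ones(n)]))

    def depth_at_most(k):
        cons = [LinearConstraint(A, lb=np.ones(n), ub=np.inf),
                LinearConstraint(budget, lb=-np.inf, ub=k)]  # sum(z) <= k
        res = milp(np.zeros(d + n), constraints=cons,
                   integrality=integrality, bounds=bounds)
        return res.status == 0                   # feasible iff depth <= k

    lo, hi = 0, n                                # invariant: depth in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if depth_at_most(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Each feasibility check only has to distinguish “some solution exists” from “none does,” which is why the wrapper sidesteps fine‑grained objective optimization; the loop terminates after O(log n) such checks.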

All components are integrated into the COIN‑OR Branch‑Cut‑Price (BCP) framework, an open‑source library that supports custom branching, cutting, and pricing callbacks. The implementation is written in C++ and leverages BCP’s modular architecture to plug in the Chinneck heuristic, sensitivity‑based branching, IIS cut generation, and the binary‑search driver.

Experimental evaluation is performed on synthetic data sets with dimensions d ∈ {2, 5, 10, 15, 20} and point counts n ∈ {100, 500, 1,000, 5,000}. The proposed branch‑and‑cut algorithm is compared against a naïve MIP formulation solved with a generic branch‑and‑bound solver and against sampling‑based approximation methods. Results indicate an average runtime reduction of 30–40% relative to the naïve MIP, with the advantage becoming more pronounced (up to 50% faster) as the dimension exceeds 10. The Chinneck upper bound reduces the total number of explored nodes by roughly one‑third, while IIS cuts reduce the number of LP re‑optimizations by about 40%. The binary‑search variant demonstrates stable performance for depths near the extremes, confirming its usefulness for mitigating big‑M‑related numerical issues.

The paper concludes that the combination of a big‑M MIP model, a fast heuristic upper bound, sensitivity‑driven branching, and IIS hitting‑set cuts yields a practical exact algorithm for half‑space depth. Limitations include the dependence on an appropriately chosen big‑M constant and the overhead of IIS detection for very large instances. Future work is suggested in three directions: (1) developing dynamic or adaptive big‑M strategies to eliminate scaling problems, (2) parallelizing the BCP workflow to handle massive data sets, and (3) exploring learning‑based branching heuristics that could further reduce the search tree size. The authors also note that the open‑source implementation makes the method readily available for researchers and practitioners interested in robust multivariate statistical analysis.