Segmentation of Natural Images by Texture and Boundary Compression

We present a novel algorithm for segmentation of natural images that harnesses the principle of minimum description length (MDL). Our method is based on observations that a homogeneously textured region of a natural image can be well modeled by a Gaussian distribution and the region boundary can be effectively coded by an adaptive chain code. The optimal segmentation of an image is the one that gives the shortest coding length for encoding all textures and boundaries in the image, and is obtained via an agglomerative clustering process applied to a hierarchy of decreasing window sizes as multi-scale texture features. The optimal segmentation also provides an accurate estimate of the overall coding length and hence the true entropy of the image. We test our algorithm on the publicly available Berkeley Segmentation Dataset. It achieves state-of-the-art segmentation results compared to other existing methods.


💡 Research Summary

The paper introduces a novel image segmentation algorithm grounded in the Minimum Description Length (MDL) principle. The authors argue that a natural image can be efficiently described by two complementary components: the texture of each homogeneous region and the shape of the region boundaries. By modeling texture with a multivariate Gaussian distribution and encoding boundaries with an adaptive chain code, the total coding length of an image becomes a quantitative measure of how well a particular segmentation explains the data. The optimal segmentation is defined as the one that minimizes this total coding length.

Texture modeling: For a given window size, the algorithm extracts local patches as feature vectors and fits a Gaussian model, estimating the mean vector and covariance matrix. The number of bits needed to encode the patches under this model, together with the model parameters themselves, follows from standard coding-length (entropy) arguments, giving a quantitative cost for representing a region's texture.
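This texture cost can be illustrated with a rate-distortion style Gaussian coding length. The sketch below is ours, not the paper's exact formula: the function name and the distortion parameter `eps` are illustrative assumptions.

```python
import numpy as np

def gaussian_coding_length(patches, eps=0.1):
    """Approximate bits to code the rows of `patches` as samples from a
    single Gaussian, with allowed distortion `eps` per coordinate.
    A sketch of the kind of cost the paper uses, not its exact formula."""
    m, n = patches.shape              # m samples, n-dimensional features
    mu = patches.mean(axis=0)
    centered = patches - mu
    cov = centered.T @ centered / m   # empirical covariance
    # Bits for the (quantized) residuals: grows with the covariance volume.
    _, logdet = np.linalg.slogdet(np.eye(n) + (n / eps**2) * cov)
    bits_data = (m + n) / 2.0 * logdet / np.log(2.0)
    # Bits for transmitting the mean vector.
    bits_mean = n / 2.0 * np.log2(1.0 + mu @ mu / eps**2)
    return bits_data + bits_mean
```

A tightly clustered region (small covariance) costs far fewer bits than a spread-out one, which is exactly the behavior the MDL objective exploits.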

Boundary coding: Region borders are represented as a sequence of directional symbols (the classic Freeman chain code). The probability of each symbol is learned from the data, yielding an adaptive code length: simple, straight edges receive short codes, while intricate, highly curved boundaries incur longer codes. This adaptive scheme directly penalizes overly complex boundaries.
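A minimal sketch of this boundary cost: map 8-adjacent boundary pixels to Freeman chain-code symbols and charge the stream its empirical entropy. Note the paper's actual scheme codes direction differences with probabilities learned from training data; here, for illustration, the symbol distribution is estimated from the stream itself.

```python
import numpy as np
from collections import Counter

# 8-connected Freeman chain code: map (drow, dcol) steps to symbols 0..7.
DIRS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
        (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def chain_code(boundary):
    """Freeman chain code of a boundary given as an ordered list of
    (row, col) pixels; consecutive pixels must be 8-adjacent."""
    return [DIRS[(r2 - r1, c2 - c1)]
            for (r1, c1), (r2, c2) in zip(boundary, boundary[1:])]

def adaptive_code_length(symbols):
    """Bits to code the symbol stream with a code adapted to its
    empirical distribution (count times empirical entropy)."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * np.log2(c / total) for c in counts.values())
```

A perfectly straight edge emits a single repeated symbol and costs zero bits under the adapted code, while a jagged boundary mixes symbols and pays for every turn.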

MDL formulation: The overall description length L is the sum of texture costs for all regions plus the sum of boundary costs for all region interfaces. Minimizing L simultaneously encourages compact, statistically coherent regions and parsimonious boundaries.
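The objective itself is just a sum of the two kinds of costs. In the sketch below, texture_cost and boundary_cost stand in for the Gaussian texture and chain-code boundary coding lengths described above; the function name and signature are our own.

```python
def total_description_length(regions, boundaries,
                             texture_cost, boundary_cost):
    """MDL objective: texture coding length summed over regions plus
    boundary coding length summed over region interfaces."""
    return (sum(texture_cost(r) for r in regions) +
            sum(boundary_cost(b) for b in boundaries))
```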

Hierarchical agglomerative clustering: The algorithm starts with each pixel as an individual region. For every pair of adjacent regions it computes the change in description length, ΔL, that merging them would produce. The pair with the most negative ΔL (i.e., the greatest reduction in total cost) is merged, and the ΔL values of the affected neighboring pairs are updated. This greedy merging continues until no merge can further reduce L.
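The greedy loop can be sketched as follows. Here delta_L(a, b) is a placeholder for the change in total coding length from merging regions a and b, and for simplicity the merged region keeps a's identity; a real implementation would maintain a priority queue of candidate merges rather than rescanning all pairs.

```python
def greedy_mdl_merge(regions, adjacency, delta_L):
    """Repeatedly merge the adjacent pair whose merge most reduces the
    total description length, until no merge helps.  `delta_L(a, b)` is
    assumed to return the coding-length change (negative = saves bits)
    for the current partition."""
    regions = set(regions)
    adjacency = {r: set(nbrs) for r, nbrs in adjacency.items()}
    while True:
        best, best_delta = None, 0.0
        for a in regions:
            for b in adjacency[a]:
                d = delta_L(a, b)
                if d < best_delta:        # strictly negative merges only
                    best, best_delta = (a, b), d
        if best is None:
            return regions                # no merge reduces L: done
        a, b = best
        # Merge b into a: b's neighbors become a's neighbors.
        for n in adjacency.pop(b):
            adjacency[n].discard(b)
            if n != a:
                adjacency[n].add(a)
                adjacency[a].add(n)
        regions.discard(b)
```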

Multi‑scale processing: To capture both coarse structure and fine detail, the authors construct a hierarchy of decreasing window sizes. At the coarsest scale, large windows provide robust texture estimates and drive the formation of large regions. The process is then repeated with progressively smaller windows, allowing the algorithm to refine region boundaries where finer texture variations become apparent. The multi‑scale hierarchy is integrated into the same MDL‑driven merging framework, so decisions at every scale are driven by a single objective: the total coding length.
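A hedged sketch of such a coarse-to-fine driver, where refine stands in for one MDL merging pass at a given window size; the signature, the window sizes, and the sequential re-seeding are our simplification of the paper's integrated hierarchy.

```python
def coarse_to_fine(image, refine, window_sizes=(7, 5, 3, 1)):
    """Run one merging pass per window size, largest first, seeding
    each scale with the previous segmentation (None at the coarsest).
    `refine(image, window, segmentation)` is a hypothetical callback."""
    segmentation = None
    for w in window_sizes:
        segmentation = refine(image, w, segmentation)
    return segmentation
```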

Experiments: The method is evaluated on the Berkeley Segmentation Dataset (BSDS500), using standard metrics such as F‑measure, Probabilistic Rand Index (PRI), and Variation of Information (VOI). Compared with state‑of‑the‑art techniques—including graph‑cut, Mean Shift, and the Felzenszwalb‑Huttenlocher method—the proposed approach achieves higher scores across most metrics. Notably, it balances boundary precision and region recall, avoiding both over‑segmentation and under‑segmentation. Additionally, the final description length closely approximates the true entropy of the image, offering a by‑product estimate of image complexity.

Limitations and future work: The Gaussian texture assumption may be insufficient for highly non‑Gaussian or mixed textures, and the chain‑code boundary model can be sub‑optimal for extremely thin or fractal contours. The hierarchical merging process is computationally intensive, suggesting the need for GPU acceleration or approximate merging heuristics for real‑time applications. The authors propose extending the framework with more expressive texture models (e.g., mixture models or deep feature embeddings) and richer boundary representations (e.g., spline‑based codes).

In summary, the paper demonstrates that an MDL‑based formulation, which jointly compresses texture and boundary information, yields a principled and effective segmentation algorithm. It not only delivers competitive segmentation quality on benchmark data but also provides a meaningful estimate of image entropy, thereby contributing both practical performance and theoretical insight to the fields of computer vision and image analysis.

