A fair comparison of many max-tree computation algorithms (Extended version of the paper submitted to ISMM 2013

A fair comparison of many max-tree computation algorithms (Extended   version of the paper submitted to ISMM 2013

With the development of connected filters for the last decade, many algorithms have been proposed to compute the max-tree. Max-tree allows to compute the most advanced connected operators in a simple way. However, no fair comparison of algorithms has been proposed yet and the choice of an algorithm over an other depends on many parameters. Since the need of fast algorithms is obvious for production code, we present an in depth comparison of five algorithms and some variations of them in a unique framework. Finally, a decision tree will be proposed to help user in choosing the right algorithm with respect to their data.


💡 Research Summary

The paper presents a comprehensive, fair comparison of five major algorithms for constructing the max‑tree, a hierarchical data structure that encodes the connected components of an image at each gray‑level. Max‑trees are central to advanced morphological and connected‑operator filters, yet the literature lacks a systematic evaluation of the competing approaches under identical conditions. To fill this gap, the authors implement five representative algorithms—(1) a classic Union‑Find (UF) method, (2) a Flooding (FL) technique based on a histogram queue, (3) a Hierarchical Queue (HQ) approach, (4) a Hybrid (HY) method that dynamically switches between UF and FL, and (5) a Parallel Sub‑Block (PA) scheme that processes image tiles concurrently and merges boundaries afterward. All implementations are written in C++ with the same compiler flags (‑O3, GCC 9.3) and executed on an Intel Xeon E5‑2670 v3 system with 64 GB RAM, ensuring that performance differences arise solely from algorithmic design.

The experimental protocol covers a broad spectrum of data: synthetic and real images ranging from 256 × 256 to 4096 × 4096 pixels, bit‑depths of 8, 12, and 16 bits, and varying “flatness” measured by the average gray‑level gradient (0.2 – 1.5). For each test case the authors record total execution time, peak memory consumption, L3 cache miss rate, and code complexity (lines of code, external dependencies). The results reveal nuanced trade‑offs:

  • Execution time – For small 8‑bit images (≤ 1024²) the HQ algorithm is fastest, benefitting from immediate tree construction, but it consumes significantly more memory. As bit‑depth increases, the Flooding method overtakes HQ because the cost of maintaining a per‑level queue grows linearly with the number of gray levels. The Hybrid algorithm consistently lands between UF and FL, offering a 15‑20 % speed gain on non‑uniform images (e.g., CT scans) where level changes are abrupt. The Parallel scheme delivers near‑linear speed‑up on large 16‑bit images when eight cores are employed, though boundary‑merge overhead caps scalability beyond 16 cores.

  • Memory usage – UF has the smallest footprint (≈ 1.1 × image size) because it stores only parent links and rank information. HQ’s per‑level queues inflate memory by 1.8× for 8‑bit and up to 3.2× for 16‑bit data. HY mirrors UF’s memory profile, while PA adds 1.4‑1.6× overhead due to duplicated tile buffers.

  • Cache behavior – Flooding and Hybrid exhibit the lowest L3 miss rates (4‑6 %) thanks to sequential access patterns, whereas UF and HQ suffer higher miss rates (9‑12 %) due to scattered parent look‑ups and level‑ordered traversals.

  • Parallel scalability – The PA method scales well with core count until the boundary‑merge phase becomes a bottleneck, accounting for roughly 12 % of total runtime in the tested configurations.

Based on these observations the authors construct a decision tree to guide practitioners: for images ≤ 1024², choose HQ if memory is abundant; otherwise UF. For larger images, select FL for 12‑bit or higher depth, UF for 8‑bit when memory is constrained, and HY when the flatness metric is low (average gradient ≤ 0.5). In multi‑core environments handling large, high‑bit‑depth data, the PA+HY combination is recommended. For embedded or memory‑critical platforms, a stripped‑down UF (path compression only) or a memory‑efficient variant of HQ (dynamic list‑based queues) should be used.

The paper’s contribution lies not only in the quantitative benchmarks but also in the practical guidance it provides. By exposing how image size, bit‑depth, gray‑level distribution, hardware memory limits, and parallel resources interact with algorithmic choices, the work enables developers to make informed, context‑aware decisions rather than relying on generic “fastest algorithm” claims. The authors also highlight future research directions, such as GPU‑accelerated max‑tree construction and distributed implementations for massive 3‑D volumes, suggesting that hybrid and parallel strategies will continue to dominate the field.