Restrictive Hierarchical Semantic Segmentation for Stratified Tooth Layer Detection
Accurate understanding of anatomical structure is essential for reliably staging certain dental diseases. One way of introducing this knowledge into semantic segmentation models is through hierarchy-aware methods. However, existing hierarchy-aware segmentation methods largely encode anatomical structure through the loss function, providing weak, indirect supervision. We introduce a general framework that embeds an explicit anatomical hierarchy into semantic segmentation by coupling a recurrent, level-wise prediction scheme with restrictive output heads and top-down feature conditioning. At each depth of the class tree, the backbone is re-run on the original image concatenated with the logits from the previous level. Child-class features are conditioned via Feature-wise Linear Modulation on their parent-class probabilities, modulating the child feature space for fine-grained detection. A probabilistic composition rule enforces consistency between parent and descendant classes, and the hierarchical loss combines per-level class-weighted Dice and cross-entropy terms with a consistency term that encourages parent predictions to equal the sum of their children. We validate the approach on our proposed dataset, TL-pano, containing 194 panoramic radiographs with dense instance and semantic segmentation annotations of tooth layers and alveolar bone. Using UNet and HRNet as donor models under a 5-fold cross-validation scheme, the hierarchical variants consistently increase IoU, Dice, and recall, particularly for fine-grained anatomies, and produce more anatomically coherent masks. However, the hierarchical variants also favour recall over precision, implying more false positives. The results demonstrate that explicit hierarchical structuring improves both performance and clinical plausibility, especially in the low-data regimes typical of dental imaging.
💡 Research Summary
This paper addresses the challenge of accurately segmenting tooth layers and alveolar bone in panoramic dental radiographs by introducing a novel hierarchical segmentation framework called Restrictive Hierarchical Semantic Segmentation (RHSS). Traditional hierarchy‑aware methods typically embed hierarchical information only in the loss function, which provides indirect supervision and often fails to exploit the easy‑to‑detect coarse features of parent classes. RHSS instead embeds the hierarchy directly into the network architecture and inference process.
The authors first construct a new dataset, TL‑pano, comprising 194 anonymised panoramic X‑ray images with dense pixel‑wise annotations for seven semantic classes: Upper Alveolar Bone, Lower Alveolar Bone, Tooth (parent), and four child classes – Enamel, Dentin, Pulp, and Composite. The hierarchy is stored as a JSON tree; parent masks are not present in the ground‑truth files but are generated on‑the‑fly as the sum of their children.
The core of RHSS consists of four components:
- Recurrent Level‑wise Connections with Restrictive Output Heads – The backbone (UNet or HRNet) processes the original image to predict only the level‑0 classes (the parent Tooth class). Its logits are concatenated with the original image and fed back into the same backbone to predict level‑1 child classes. This process repeats for any deeper levels, ensuring that each pass focuses on a limited set of classes.
- Feature‑wise Linear Modulation (FiLM) Conditioning – After each level, the class probability map is globally averaged to produce a conditioning vector. A shallow, level‑specific MLP converts this vector into scaling and shifting parameters that modulate the feature maps of the next level. This top‑down modulation aligns fine‑level feature representations with the coarse‑level confidence, effectively guiding the network to attend to regions already identified as belonging to the parent class.
- Hierarchical Probability Composition – The final probability for a child class is obtained by multiplying its predicted probability with the probability of its parent. This enforces logical consistency: a child cannot be active where the parent is absent.
- Hierarchical Loss – For each level, a weighted Dice plus cross‑entropy loss is computed. An additional consistency loss penalises mismatches between the summed child probabilities and the parent probability, encouraging the network to respect the hierarchy during training.
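The FiLM conditioning, probability composition, and consistency loss described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: the array shapes, the single linear layer standing in for the level-specific MLP, independent sigmoid child heads, and a squared-error consistency term are all assumptions made for the sake of a short, runnable example, and the pieces are shown on random inputs rather than wired into a real backbone.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

H, W = 8, 8

# Level 0: parent prediction (background vs. Tooth) as a 2-channel map.
parent_logits = rng.normal(size=(2, H, W))
parent_prob = softmax(parent_logits, axis=0)
tooth_prob = parent_prob[1]                      # per-pixel P(Tooth)

# FiLM conditioning: global-average the parent probabilities into a
# conditioning vector, then map it (here via one toy linear layer in
# place of the level-specific MLP) to per-channel scale/shift params.
C = 4                                            # child feature channels (assumed)
cond = parent_prob.mean(axis=(1, 2))             # conditioning vector, shape (2,)
mlp = rng.normal(size=(2, 2 * C))                # toy stand-in for the MLP
gamma, beta = np.split(cond @ mlp, 2)            # each of shape (C,)
features = rng.normal(size=(C, H, W))            # child-level feature maps
modulated = gamma[:, None, None] * features + beta[:, None, None]

# Level 1: restrictive child heads (Enamel, Dentin, Pulp, Composite),
# predicted independently, then composed with the parent probability:
# P(child) = P(child | Tooth) * P(Tooth).
child_logits = rng.normal(size=(4, H, W))
child_given_parent = sigmoid(child_logits)
child_prob = child_given_parent * tooth_prob[None]

# Composition guarantees a child is never more confident than its parent,
# so a child cannot be active where the parent is absent.
assert np.all(child_prob <= tooth_prob[None] + 1e-12)

# Consistency loss: penalise mismatch between the summed child
# probabilities and the parent probability.
consistency_loss = np.mean((child_prob.sum(axis=0) - tooth_prob) ** 2)
```

The composition step is what makes the heads "restrictive": even with independent sigmoid children, no child probability can exceed its parent's, while the consistency term pushes the children to jointly account for the parent mass.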
The framework is evaluated using 5‑fold cross‑validation on TL‑pano with two backbone models. Compared with baseline (non‑hierarchical) versions, the hierarchical variants achieve consistent improvements: mean IoU rises by 3–5 percentage points, Dice by 4–6 points, and recall for fine‑grained classes (Enamel, Pulp, Composite) increases dramatically (up to +15 pp). However, precision slightly declines, indicating a higher false‑positive rate, which the authors attribute to the hierarchical constraint allowing child predictions wherever the parent is predicted positive.
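The recall/precision trade-off reported above follows directly from how these metrics are computed on binary masks. The sketch below (a hypothetical helper, not from the paper) shows that an over-segmented prediction — for example, a child mask leaking slightly beyond its ground-truth region — keeps recall at 1.0 while precision, IoU, and Dice drop.

```python
import numpy as np

def seg_metrics(pred, gt):
    """Per-class IoU, Dice, precision, recall from boolean masks."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return iou, dice, precision, recall

# Ground truth: a 2x2 square (4 pixels) in an 4x4 image.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True

# Prediction covers all of gt plus 2 extra pixels (over-segmentation).
pred = gt.copy()
pred[1, 3] = True
pred[3, 1] = True

iou, dice, precision, recall = seg_metrics(pred, gt)
# tp=4, fp=2, fn=0 -> iou=2/3, dice=0.8, precision=2/3, recall=1.0
```

This mirrors the paper's observation: a hierarchical constraint that permits child predictions wherever the parent fires tends to raise recall at some cost to precision.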
Qualitative results show more anatomically plausible masks, with fewer instances of a child class appearing outside its parent region. The method adds modest computational overhead (≈1.5× inference time) but remains feasible for offline clinical analysis.
The paper concludes that embedding explicit hierarchical structure through recurrent connections, restrictive output heads, FiLM conditioning, and a consistency‑aware loss yields superior segmentation performance and better clinical plausibility, especially in low‑data regimes typical of medical imaging. Limitations include the increased false‑positive rate and the focus on a single imaging modality; future work is suggested on precision‑recall balancing, multi‑modal extensions (e.g., CBCT), and model compression for real‑time deployment.