Edge-Aligned Initialization of Kernels for Steered Mixture-of-Experts
Steered Mixture-of-Experts (SMoE) has recently emerged as a powerful framework for spatial-domain image modeling, enabling high-fidelity image representation using a remarkably small number of parameters. Its ability to steer kernel-based experts toward structural image features has led to successful applications in image compression, denoising, super-resolution, and light field processing. However, practical adoption is hindered by the reliance on gradient-based optimization to estimate model parameters on a per-image basis, a process that is computationally intensive and difficult to scale. Initialization strategies for SMoE are an essential component that directly affects convergence and reconstruction quality. In this paper, we propose a novel, edge-based initialization scheme that achieves good reconstruction quality while significantly reducing the need for stochastic optimization. The method uses Canny edge detection to extract a sparse set of image contours, from which kernel positions and orientations are inferred deterministically. A separate approach enables the direct estimation of initial expert coefficients. This initialization reduces both memory consumption and computational cost.
💡 Research Summary
Steered Mixture‑of‑Experts (SMoE) is a spatial‑domain regression framework that models an image as a weighted sum of locally steered Gaussian kernels. While SMoE can achieve high‑fidelity reconstructions with relatively few parameters, its practical deployment has been limited by the need to estimate kernel positions, scales, orientations, and expert coefficients for each image through computationally intensive gradient‑based optimization. This paper introduces a deterministic, edge‑driven initialization scheme that dramatically reduces the optimization burden while preserving reconstruction quality.
The proposed pipeline begins with a standard Canny edge detector applied to a normalized grayscale image, producing a binary edge mask. The mask is then scanned along four canonical directions (0°, 90°, 45°, –45°) to extract maximal connected line segments. For each segment the geometric centre and orientation are computed, yielding a set of candidate kernel locations P = { (µ_x, µ_y, θ) }. To avoid an explosion of redundant kernels, each candidate is assigned an importance score that balances proximity to same‑orientation neighbours (d_sim) and to differently oriented neighbours (d_dis) via a weighted sum s_i = (1‑λ)d_sim + λd_dis (λ≈0.1). Low‑scoring candidates are clustered with DBSCAN using a spatial radius ε; each resulting cluster is represented by its mean position and modal orientation. The final initialization set K consists of the top‑scoring unclustered candidates together with the cluster representatives, allowing the user to control the maximum number of kernels (M) and thus the compression rate.
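The scanning and scoring steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration: the function names `extract_segments` and `importance_scores`, the minimum run length `min_len`, the sign convention of the diagonal angles (image y grows downward), reading d_sim and d_dis as nearest-neighbour distances, and the cap `far` used when a candidate has no neighbour of a given kind. The Canny step and the DBSCAN clustering are not shown; the sketch starts from a binary edge mask.

```python
import math

# Scan directions as (dx, dy, angle in degrees). Image y grows downward,
# so the sign convention of the diagonal angles is an assumption.
DIRECTIONS = [(1, 0, 0.0), (0, 1, 90.0), (1, 1, 45.0), (1, -1, -45.0)]


def extract_segments(mask, min_len=3):
    """Return (mu_x, mu_y, theta) for each maximal run of edge pixels."""
    h, w = len(mask), len(mask[0])

    def on(x, y):
        return 0 <= x < w and 0 <= y < h and bool(mask[y][x])

    candidates = []
    for dx, dy, theta in DIRECTIONS:
        for y in range(h):
            for x in range(w):
                # Start a run only where the previous pixel along the scan
                # direction is off, so every run found is maximal.
                if not on(x, y) or on(x - dx, y - dy):
                    continue
                n = 1
                while on(x + n * dx, y + n * dy):
                    n += 1
                if n >= min_len:
                    # Geometric centre of the run.
                    cx = x + (n - 1) * dx / 2.0
                    cy = y + (n - 1) * dy / 2.0
                    candidates.append((cx, cy, theta))
    return candidates


def importance_scores(candidates, lam=0.1, far=1e6):
    """Score s_i = (1 - lam) * d_sim + lam * d_dis for every candidate.

    Assumption: d_sim / d_dis are the distances to the nearest candidate
    with the same / a different orientation, capped at `far`.
    """
    scores = []
    for i, (xi, yi, ti) in enumerate(candidates):
        d_sim = d_dis = far
        for j, (xj, yj, tj) in enumerate(candidates):
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            if tj == ti:
                d_sim = min(d_sim, d)
            else:
                d_dis = min(d_dis, d)
        scores.append((1.0 - lam) * d_sim + lam * d_dis)
    return scores


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1],
    ]
    cands = extract_segments(mask)
    print(cands)
    print(importance_scores(cands))
```

Under this reading, a candidate that lies far from its neighbours receives a high score and is kept individually, while closely packed candidates score low and would be handed to the clustering stage.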
Kernel placement follows directly from the line‑segment representation: for every segment two Gaussian kernels are positioned orthogonal to the segment direction, separated by a fixed pixel distance Δµ. The steering matrix Σ is initialized isotropically as Σ = ½Δµ² I, which simplifies early computation while still permitting later refinement. Expert coefficients m are first set to the image intensity sampled at each kernel centre (m⁰_n = L
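The placement rule for a single segment can be sketched as follows. This is a hedged illustration, not the paper's code: the names `place_kernels` and `sample_intensity` are hypothetical, the two centres are assumed to sit symmetrically at ±Δµ/2 from the segment centre along its normal (so that they are Δµ apart), and nearest-pixel sampling with clamping at the image border is an assumption about how the intensity is read.

```python
import math


def place_kernels(mu_x, mu_y, theta_deg, delta_mu=2.0):
    """Return two kernel centres straddling a segment and the initial Sigma."""
    t = math.radians(theta_deg)
    # Unit normal to the segment direction (cos t, sin t).
    nx, ny = -math.sin(t), math.cos(t)
    half = delta_mu / 2.0  # assumption: symmetric offset of delta_mu / 2
    centres = [
        (mu_x + half * nx, mu_y + half * ny),
        (mu_x - half * nx, mu_y - half * ny),
    ]
    # Isotropic initial steering matrix: Sigma = 1/2 * delta_mu^2 * I.
    var = 0.5 * delta_mu ** 2
    sigma = [[var, 0.0], [0.0, var]]
    return centres, sigma


def sample_intensity(image, x, y):
    """Nearest-pixel intensity at (x, y), clamped to the image border."""
    h, w = len(image), len(image[0])
    xi = min(max(int(math.floor(x + 0.5)), 0), w - 1)
    yi = min(max(int(math.floor(y + 0.5)), 0), h - 1)
    return image[yi][xi]


if __name__ == "__main__":
    centres, sigma = place_kernels(1.5, 1.0, 0.0, delta_mu=2.0)
    print(centres, sigma)
```

Placing one kernel on each side of an edge lets the two experts start from the intensities on either side of the contour, which is the motivation for sampling the image at the kernel centres.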