Acceleration of the shiftable O(1) algorithm for bilateral filtering and non-local means
A direct implementation of the bilateral filter [1] requires O(\sigma_s^2) operations per pixel, where \sigma_s is the (effective) width of the spatial kernel. A fast implementation of the bilateral filter was recently proposed in [2] that required O(1) operations per pixel with respect to \sigma_s. This was done by using trigonometric functions for the range kernel of the bilateral filter, and by exploiting their so-called shiftability property. In particular, a fast implementation of the Gaussian bilateral filter was realized by approximating the Gaussian range kernel using raised cosines. Later, it was demonstrated in [3] that this idea could be extended to a larger class of filters, including the popular non-local means filter [4]. As already observed in [2], a flip side of this approach was that the run time depended on the width \sigma_r of the range kernel. For an image with (local) intensity variations in the range [0,T], the run time scaled as O(T^2/\sigma^2_r) with \sigma_r. This made it difficult to implement narrow range kernels, particularly for images with large dynamic range. We discuss this problem in this note, and propose some simple steps to accelerate the implementation in general, and for small \sigma_r in particular. [1] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images”, Proc. IEEE International Conference on Computer Vision, 1998. [2] K.N. Chaudhury, Daniel Sage, and M. Unser, “Fast O(1) bilateral filtering using trigonometric range kernels”, IEEE Transactions on Image Processing, 2011. [3] K.N. Chaudhury, “Constant-time filtering using shiftable kernels”, IEEE Signal Processing Letters, 2011. [4] A. Buades, B. Coll, and J.M. Morel, “A review of image denoising algorithms, with a new one”, Multiscale Modeling and Simulation, 2005.
💡 Research Summary
The bilateral filter combines a spatial Gaussian kernel gₛ with a range Gaussian kernel gᵣ to achieve edge‑preserving smoothing. A naïve implementation requires O(σₛ²) operations per pixel, which becomes prohibitive for large spatial widths. Chaudhury et al. introduced a constant‑time (O(1) with respect to σₛ) algorithm by exploiting the shiftability of trigonometric functions: the Gaussian range kernel is approximated by a raised‑cosine φ(s)=h·cosⁿ(πs/2T) (or equivalently by a polynomial). While this yields O(1) spatial complexity, the runtime still scales as O(T²/σᵣ²) because the order N of the approximation must satisfy N ≥ N₀≈c·(T/σᵣ)². In practice the worst‑case dynamic range T is set to 255 for 8‑bit images, leading to excessively large N for narrow range kernels (small σᵣ).
The paper presents two complementary accelerations that dramatically reduce N and thus the overall runtime, especially when σᵣ is small.
-
Exact computation of the dynamic range T in O(1) time.
The true T is defined as the maximum absolute intensity difference within the spatial window:
T = maxₓ max_{‖y‖≤R} |f(x) – f(x–y)|.
By observing that |a–b| = max(a–b, b–a) and that the max operator is associative, the authors rewrite T as the maximum of f(x) minus the windowed maximum of f. Computing the windowed maximum for every pixel can be done with the classic van Herk–Gil–Werman max‑filter, which runs in constant time per pixel regardless of window size. The algorithm applies the 1‑D max‑filter along rows, then columns, and finally subtracts the result from the original image to obtain T. Because real images rarely use the full 0‑255 range, the computed T is often substantially smaller, reducing N₀ proportionally to (T/σᵣ)². -
Truncation of the shiftable series.
The raised‑cosine approximation can be expressed as a finite sum:
φ(s) = Σₙ₌₀ᴺ aₙ cos((2n–N)s/√N σᵣ), where the coefficients aₙ are positive, unimodal, and sum to one. For large N most aₙ are tiny, especially those corresponding to frequencies far from the centre. Since the Gaussian gᵣ(s) is essentially zero outside ±3σᵣ, the contributions of these tail terms are negligible for the filter’s effective region. The authors therefore drop all terms whose coefficient aₙ falls below a user‑defined tolerance ε (e.g., ε = 0.005). This truncation can eliminate 80‑95 % of the terms while preserving the shape of φ(s) in the central region; the remaining series still approximates gᵣ(s) with high fidelity, and the monotonicity/positivity constraints are retained because the discarded terms only affect the far tails.
The combined effect is a substantial reduction in the number of basis functions N required for a given σᵣ. For example, with σᵣ = 10 and a true T = 210 (instead of 255), N₀ drops from 263 to about 178 (≈30 % reduction). With truncation, σᵣ = 5 sees N₀ reduced from 1053 to 95, a >90 % cut. Experimental results on the standard 512 × 512 Lena image confirm speed‑ups of 10‑20 % from exact T computation alone, and up to 80‑90 % when truncation is applied, while maintaining PSNR differences of less than 0.2 dB compared to the untruncated O(1) implementation. The accelerated algorithm also outperforms Porikli’s polynomial‑based O(1) filter in both speed and accuracy.
In summary, the paper resolves the long‑standing bottleneck of range‑kernel dependence in shiftable constant‑time bilateral and non‑local means filtering. By providing an O(1) method to compute the true dynamic range and by intelligently pruning insignificant terms from the shiftable series, the authors achieve near‑constant runtime irrespective of σᵣ, making the technique practical for real‑time high‑resolution imaging, video processing, and low‑power embedded vision systems.
Comments & Academic Discussion
Loading comments...
Leave a Comment