KAN/H: Kolmogorov-Arnold Network using Haar-like bases

Notice: This research summary and analysis were generated automatically using AI. For authoritative details, please refer to the original arXiv source.

Function approximation using Haar basis systems offers an efficient implementation when compressed via Patricia trees while retaining the flexibility of wavelets for both global and local fitting. However, like B-spline-based approximations, achieving high accuracy in high dimensions remains challenging. This paper proposes KAN/H, a variant of the Kolmogorov-Arnold Network (KAN) that uses a Haar-like hierarchical basis system with nonzero first-order derivatives, instead of B-splines. We also propose a learning-rate scheduling method and a method for handling unbounded real-valued inputs, leveraging properties of linear approximation with Haar-like hierarchical bases. By applying the resulting algorithm to function-approximation problems and MNIST, we confirm that our approach requires minimal problem-specific hyperparameter tuning.


💡 Research Summary

The paper introduces KAN/H, a new variant of the Kolmogorov‑Arnold Network (KAN) that replaces the B‑spline unary functions traditionally used in KAN with a Haar‑like hierarchical basis system called Slash‑Haar (denoted ∖H). The motivation is twofold: Haar bases provide a natural multi‑resolution representation that is both globally and locally expressive, and they can be implemented efficiently with Patricia trees; however, classic Haar wavelets have zero first‑order derivatives almost everywhere, so back‑propagation yields no gradient signal through them. To overcome this, the authors design a piecewise‑linear “Slash‑Haar” wavelet whose first derivative is constant (−2^j) on its support, enabling gradient computation while preserving the hierarchical structure of the original Haar system.

The Slash‑Haar basis is defined by scaling the traditional Haar wavelet with a discount factor β and adding a linear term, resulting in bases that decay exponentially with depth but retain a non‑zero slope. Shallow bases (small j) act as coarse, global approximators, whereas deeper bases provide fine, local detail. This duality mirrors the original Kolmogorov‑Arnold decomposition, where each layer computes a sum of unary functions.
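To make the multi‑resolution idea concrete, here is a minimal sketch of a piecewise‑linear, Haar‑like “slash” basis and a β‑discounted hierarchical sum. The exact Slash‑Haar formula is not reproduced in this summary, so the shape below (falling linearly from 1 to 0 on each dyadic cell, giving the stated constant derivative −2^j) and the function names are illustrative assumptions, not the paper’s definition:

```python
def slash_haar(x, j, k):
    """Illustrative 'slash'-shaped basis on the dyadic cell [k/2^j, (k+1)/2^j).

    The value falls linearly from 1 to 0 across the cell, so the first
    derivative is the constant -2**j on the support (and 0 elsewhere).
    NOTE: an assumed form, not necessarily the paper's exact definition.
    """
    lo, hi = k / 2 ** j, (k + 1) / 2 ** j
    if lo <= x < hi:
        return 1.0 - 2 ** j * (x - lo)
    return 0.0


def approximate(x, coeffs, beta=0.5):
    """Hierarchical approximation: each depth-j coefficient is discounted
    by beta**j, so shallow bases act globally and deep bases add local detail.
    `coeffs` maps (j, k) -> coefficient; only visited bases need entries."""
    return sum(c * beta ** j * slash_haar(x, j, k)
               for (j, k), c in coeffs.items())
```

The β‑discount makes deep (fine‑scale) corrections exponentially small, matching the summary’s description of coarse global approximators refined by local detail.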

Implementation leverages an extended Patricia tree: only visited bases are stored, and consecutive single‑child edges are collapsed into a single edge holding a coefficient and index. This yields O(min{n, 2^p}) memory usage (n = number of samples, p = input precision bits) and O(min{log n, p}) update time per sample. Because ∖H values depend on the exact position of x within a support, the authors adopt a “relaxed hill‑climbing” update rule: the forward pass and loss are computed using ∖H, but coefficient updates follow the same sign‑only rule as pure Haar (i.e., they ignore the exact magnitude of the gradient, only its direction). Empirically this approximation does not harm accuracy.
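The sparse storage and sign‑only update can be sketched as follows. This is a simplified stand‑in: a plain dict keyed by (depth, cell) mimics the “only visited bases are stored” behavior of the extended Patricia tree (without its path compression), and constant per‑cell basis values stand in for the ∖H magnitudes, consistent with the relaxed hill‑climbing rule that keeps only the sign of the gradient. Class and method names are invented for illustration:

```python
class SparseHaarModel:
    """Sketch of sparse hierarchical coefficients with sign-only updates.

    Only cells actually visited by training samples get an entry, loosely
    approximating the Patricia tree's O(min{n, 2^p}) memory behavior.
    """

    def __init__(self, depth, lr=0.1):
        self.depth = depth          # number of resolution levels
        self.lr = lr                # fixed step size for sign-only updates
        self.coef = {}              # (j, k) -> coefficient, created lazily

    def _cells(self, x):
        # Dyadic cells containing x (assumed 0 <= x < 1), one per depth.
        return [(j, int(x * 2 ** j)) for j in range(self.depth)]

    def predict(self, x):
        # Unvisited cells contribute 0, so memory stays proportional
        # to the number of distinct cells actually touched.
        return sum(self.coef.get(cell, 0.0) for cell in self._cells(x))

    def update(self, x, target):
        # Relaxed hill-climbing: move every visited coefficient by a fixed
        # step in the direction of the residual, ignoring its magnitude.
        err = target - self.predict(x)
        step = self.lr * ((err > 0) - (err < 0))
        for cell in self._cells(x):
            self.coef[cell] = self.coef.get(cell, 0.0) + step
```

A few repeated updates on a single sample drive the prediction toward the target while storing coefficients only for the cells that sample touches.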

A major practical challenge is that intermediate layer activations in a KAN can take any real value, while Haar bases are defined on a bounded domain. The paper addresses this with a dedicated scheme for handling unbounded real‑valued inputs, exploiting properties of linear approximation with the Haar‑like hierarchical bases.
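The paper’s actual input‑handling scheme is not detailed in this summary. Purely as a generic illustration of the problem being solved, one common way to feed unbounded activations into bases defined on a bounded domain is a monotone squashing map such as the logistic function (this is an assumption for illustration, not the authors’ method):

```python
import math


def to_unit_interval(x, scale=1.0):
    """Squash any real x into (0, 1) with a logistic map.

    NOTE: a generic illustration of domain mapping, not necessarily the
    scheme KAN/H actually uses; `scale` controls how fast inputs saturate.
    """
    return 1.0 / (1.0 + math.exp(-x / scale))
```

Any strictly monotone bounded map would serve the same illustrative purpose; the trade‑off is that resolution is lost in the saturated tails.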

