Improved Approximation of Linear Threshold Functions

We prove two main results on how arbitrary linear threshold functions $f(x) = \operatorname{sign}(w\cdot x - \theta)$ over the $n$-dimensional Boolean hypercube can be approximated by simple threshold functions. Our first result shows that every $n$-variable threshold function $f$ is $\epsilon$-close to a threshold function depending only on $\operatorname{Inf}(f)^2 \cdot \operatorname{poly}(1/\epsilon)$ many variables, where $\operatorname{Inf}(f)$ denotes the total influence or average sensitivity of $f$. This is an exponential sharpening of Friedgut’s well-known theorem \cite{Friedgut:98}, which states that every Boolean function $f$ is $\epsilon$-close to a function depending only on $2^{O(\operatorname{Inf}(f)/\epsilon)}$ many variables, for the case of threshold functions. We complement this upper bound by showing that $\Omega(\operatorname{Inf}(f)^2 + 1/\epsilon^2)$ many variables are required for $\epsilon$-approximating threshold functions. Our second result is a proof that every $n$-variable threshold function is $\epsilon$-close to a threshold function with integer weights at most $\operatorname{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$. This is a significant improvement, in the dependence on the error parameter $\epsilon$, on an earlier result of \cite{Servedio:07cc} which gave a $\operatorname{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2})}$ bound. Our improvement is obtained via a new proof technique that uses strong anti-concentration bounds from probability theory. The new technique also gives a simple and modular proof of the original \cite{Servedio:07cc} result, and extends to give low-weight approximators for threshold functions under a range of probability distributions beyond just the uniform distribution.

💡 Research Summary

This paper presents two major advances in the structural approximation of linear threshold functions (LTFs), i.e., Boolean functions of the form $f(x)=\operatorname{sign}(w\cdot x-\theta)$ over the hypercube $\{-1,1\}^n$.

Result 1 – Approximation by juntas.
Friedgut’s classic theorem (1998) shows that any Boolean function with total influence $\operatorname{Inf}(f)$ can be $\varepsilon$-approximated by a junta (a function depending on a small set of variables) of size $2^{O(\operatorname{Inf}(f)/\varepsilon)}$. For general Boolean functions this bound is essentially tight, but it becomes vacuous (larger than $n$) once $\operatorname{Inf}(f)/\varepsilon$ grows beyond order $\log n$. The authors prove that for the restricted class of LTFs the dependence on $\operatorname{Inf}(f)$ can be reduced exponentially: every LTF is $\varepsilon$-close to an LTF junta of size $\operatorname{Inf}(f)^2\cdot\operatorname{poly}(1/\varepsilon)$. The quadratic dependence on $\operatorname{Inf}(f)$ cannot be improved, since the authors also show that $\Omega(\operatorname{Inf}(f)^2+1/\varepsilon^2)$ variables are required for $\varepsilon$-approximating threshold functions.
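To make the quantity $\operatorname{Inf}(f)$ concrete, the following is a minimal brute-force sketch that computes the total influence of a small LTF by enumerating the hypercube. The helper names `ltf` and `total_influence` and the convention $\operatorname{sign}(0)=+1$ are assumptions made for this illustration; they do not come from the paper.

```python
from itertools import product

def ltf(w, theta):
    """Return f(x) = sign(w.x - theta), taking sign(0) = +1 (assumed convention)."""
    def f(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1
    return f

def total_influence(f, n):
    """Sum over i of Pr_x[f(x) != f(x with coordinate i flipped)], x uniform on {-1,1}^n."""
    pivotal = 0
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = x[:i] + (-x[i],) + x[i + 1:]  # flip coordinate i
            if f(y) != fx:
                pivotal += 1
    return pivotal / 2 ** n

maj3 = ltf((1, 1, 1), 0)       # majority of three bits
dictator = ltf((1, 0, 0), 0)   # depends on the first bit only

inf_maj3 = total_influence(maj3, 3)      # 1.5: each bit is pivotal with probability 1/2
inf_dict = total_influence(dictator, 3)  # 1.0: only the first bit is ever pivotal
print(inf_maj3, inf_dict)
```

The enumeration takes time $n\cdot 2^n$, so it is only meant for small $n$; for larger instances the influence would have to be estimated by sampling.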

The proof proceeds by first defining a regular LTF, meaning that each weight $w_i$ is small relative to the Euclidean norm $\|w\|_2$. For regular LTFs the authors construct a randomized distribution over approximators: they sample a small subset of coordinates according to the normalized absolute weights and form a new threshold function that uses only those coordinates. Lemmas 8 and 9 show that a random draw from this distribution has high expected accuracy and depends on few variables, with the bound expressed directly in terms of the weights and the regularity parameter.
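The sampling idea can be sketched as follows. This is an illustrative reading of the description above, not the construction from the paper's lemmas: it assumes that coordinates are drawn i.i.d. with probability $|w_i|/\sum_j |w_j|$, that each draw contributes $\pm 1$ to an integer weight, and that the threshold is rescaled by $k/\sum_j |w_j|$ so that the sampled linear form matches $w\cdot x-\theta$ in expectation. All function names and the test instance are hypothetical.

```python
import random

def sample_sparse_weights(w, k, rng):
    """Draw k coordinates with Pr[i] proportional to |w_i|; each draw adds sign(w_i) to v[i]."""
    draws = rng.choices(range(len(w)), weights=[abs(wi) for wi in w], k=k)
    v = [0] * len(w)
    for i in draws:
        v[i] += 1 if w[i] > 0 else -1
    return v

def sign_of(w, theta, x):
    """sign(w.x - theta) with sign(0) = +1 (assumed convention)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1

def estimate_agreement(w, theta, v, theta_v, trials, rng):
    """Monte Carlo estimate of Pr_x[the two threshold functions agree], x uniform."""
    hits = 0
    for _ in range(trials):
        x = [rng.choice((-1, 1)) for _ in w]
        hits += sign_of(w, theta, x) == sign_of(v, theta_v, x)
    return hits / trials

rng = random.Random(0)
n, k = 200, 40                                   # hypothetical sizes
w = [rng.uniform(0.5, 1.5) for _ in range(n)]    # all weights comparable, so w is regular
theta = 0.0
v = sample_sparse_weights(w, k, rng)
theta_v = theta * k / sum(abs(wi) for wi in w)   # rescaled threshold
support = sum(1 for vi in v if vi != 0)          # number of variables the sample uses
acc = estimate_agreement(w, theta, v, theta_v, 2000, rng)
print(support, acc)
```

By construction the sampled function depends on at most `k` variables. The printed agreement depends on the seed and on `k`, and nothing in this sketch should be read as reproducing the paper's quantitative guarantee.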

Since not every LTF is regular, the authors invoke a recent structural result of O’Donnell and Servedio (2008), which states that any LTF can be $\varepsilon/2$-approximated by a near-regular LTF $f'$: only a few “large” weights remain, while the remaining (small) weights are essentially scaled versions of the influences of the original variables. By fixing the large-weight variables, each restriction of $f'$ becomes a regular LTF, to which the random-approximation scheme applies. A union bound over all possible restrictions yields a single approximator that depends on at most $\operatorname{Inf}(f)^2\cdot\operatorname{poly}(1/\varepsilon)$ variables, completing the proof of Theorem 1.

Result 2 – Low‑weight integer approximators.
Servedio (2007) previously showed that any LTF can be $\varepsilon$-approximated by another LTF whose integer weights have magnitude at most $\operatorname{poly}(n)\cdot2^{\tilde O(1/\varepsilon^{2})}$. That bound is exponential in $1/\varepsilon^{2}$, which limits applications that require fine accuracy. The present work lowers the exponent, giving $\operatorname{poly}(n)\cdot2^{\tilde O(1/\varepsilon^{2/3})}$.

The key technical innovation is the use of strong anti-concentration inequalities. The authors prove that every LTF admits a representation in which many weights are well-separated: no short interval on the real line captures a large fraction of the total weight mass. Under this separation condition, Halász’s anti-concentration theorem (1977) guarantees that the random variable $w\cdot x$ (with $x$ uniform on $\{-1,1\}^n$) does not concentrate on any narrow interval, yielding a bound on the probability that the sign flips when the weights are rounded to integers. By carefully rounding the well-separated weights and scaling the remaining small weights, they construct an integer-weight LTF whose weights are bounded by $n^{3/2}\cdot2^{\tilde O(1/\varepsilon^{2/3})}$ and that agrees with the original function on all but an $\varepsilon$ fraction of inputs.
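The link between rounding and anti-concentration can be illustrated with a small brute-force experiment. This sketch uses naive scale-and-round, which is an assumption for illustration and not the paper's rounding scheme. With scale factor $s$, each rounded weight (and the threshold) moves by at most $1/(2s)$ after rescaling, so the sign can only flip on inputs with $|w\cdot x-\theta|\le (n+1)/(2s)$. The fraction of such inputs is exactly what an anti-concentration bound controls. The function names and the example weights are hypothetical.

```python
from itertools import product

def round_ltf(w, theta, scale):
    """Naive rounding: multiply by `scale` and round to the nearest integers."""
    return [round(scale * wi) for wi in w], round(scale * theta)

def rounding_report(w, theta, scale):
    """Return (integer weights, fraction of sign flips, fraction of inputs near the threshold)."""
    n = len(w)
    v, theta_v = round_ltf(w, theta, scale)
    margin = (n + 1) / (2 * scale)       # worst-case total rounding error
    flipped = near = 0
    for x in product((-1, 1), repeat=n):
        s = sum(wi * xi for wi, xi in zip(w, x)) - theta
        t = sum(vi * xi for vi, xi in zip(v, x)) - theta_v
        flipped += (s >= 0) != (t >= 0)  # the two LTFs disagree on x
        near += abs(s) <= margin + 1e-9  # x lies in the danger zone (small float tolerance)
    return v, flipped / 2 ** n, near / 2 ** n

w = [0.91, 0.64, 0.47, 0.33, 0.22, 0.16, 0.09, 0.04]  # hypothetical weights
v, flip_rate, near_rate = rounding_report(w, 0.05, 10)
print(v, flip_rate, near_rate)
```

The flip rate never exceeds the near-threshold rate, so a bound on how much mass $w\cdot x$ places in a short interval around $\theta$ translates directly into a bound on the approximation error. Increasing `scale` shrinks the interval at the cost of larger integer weights, which is the trade-off the theorem quantifies.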

Unlike the Berry-Esseen based approach used in earlier work (which essentially reduces to a Gaussian approximation and cannot beat the $2^{\Theta(1/\varepsilon^{2})}$ barrier), the anti-concentration method exploits the additive structure of the weight vector itself, leading to the improved exponent. Moreover, the technique is modular: the same analysis works for a broad class of product distributions (including biased product measures) and for $k$-wise independent distributions, provided $k=\tilde O(1/\varepsilon^{2})$.

Implications and broader context.
Both results tighten fundamental parameters governing LTF complexity. The junta bound shows that the number of relevant variables needed for a good approximation scales quadratically with total influence, and the $\Omega(\operatorname{Inf}(f)^2)$ lower bound shows that this dependence on the influence is tight; the dependence on $1/\varepsilon$ is pinned down only up to polynomial factors. This has immediate consequences for learning theory (e.g., sample-complexity bounds for learning LTFs under the uniform distribution), property testing, and circuit lower bounds where small-junta approximations are a common tool.

The low‑weight integer approximation improves the feasibility of representing LTFs in hardware or in algorithms that require integer arithmetic, and it strengthens several downstream results that rely on Servedio’s 2007 construction (e.g., works on bounded‑independence fooling LTFs, pseudorandom generators, and hardness of approximation).

Finally, the paper’s methodology, which combines recent Fourier-analytic structural theorems with probabilistic constructions and refined anti-concentration tools, offers a template for tackling similar approximation problems for other Boolean function classes, such as polynomial threshold functions of higher degree. The authors also conjecture extensions of the junta result to degree-$d$ polynomial thresholds, suggesting a rich avenue for future research.
