The Median is Easier than it Looks: Approximation with a Constant-Depth, Linear-Width ReLU Network

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

We study the approximation of the median of $d$ inputs using ReLU neural networks. We present depth-width tradeoffs under several settings, culminating in a constant-depth, linear-width construction that achieves exponentially small approximation error with respect to the uniform distribution over the unit hypercube. By further establishing a general reduction from the maximum to the median, our results break a barrier suggested by prior work on the maximum function, which indicated that linear width should require depth growing at least as $\log\log d$ to achieve comparable accuracy. Our construction relies on a multi-stage procedure that iteratively eliminates non-central elements while preserving a candidate set around the median. We overcome obstacles that do not arise for the maximum to yield approximation results that are strictly stronger than those previously known for the maximum itself.


💡 Research Summary

This paper investigates how well a ReLU neural network can approximate the median of d real‑valued inputs when the inputs are drawn uniformly from the unit hypercube. The authors view the problem through the lens of “rank‑k” functions, where the median corresponds to k = ⌈d/2⌉, and they study depth‑width trade‑offs for approximating these functions in the L₂ sense.
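As a point of reference for the notation, the rank-k function is just the k-th smallest input; a minimal sketch (the function name is ours, not the paper's):

```python
def rank_k(x, k):
    """The rank-k function: the k-th smallest of the d inputs.
    Setting k = ceil(d/2) recovers the median (exactly so for odd d)."""
    return sorted(x)[k - 1]

# For d = 5 inputs, the median is the rank-3 element:
print(rank_k([0.3, 0.9, 0.1, 0.5, 0.7], 3))
```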

The first construction (Theorem 3.1) shows that a depth‑3 network with O(d²) hidden units can achieve any prescribed mean‑squared error ϵ>0. The idea is straightforward: the first hidden layer computes a pairwise comparison indicator for every ordered pair (x_i, x_j); the second hidden layer aggregates these indicators to count how many inputs lie below each x_i; and the output neuron selects the entry whose count matches the desired rank. Because ReLU networks are continuous while the comparison indicator is not, the extractor is exact except on the small set of near‑ties whose measure is controlled by ϵ, and the quadratic width stems from the O(d²) pairwise comparisons.
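The comparison-count-select pipeline can be simulated numerically. The sketch below is our own simplification, not the paper's exact weights: each indicator is a pair of ReLUs forming a ramp of width `delta`, and the output gates the entry whose count equals k − 1.

```python
def relu(z):
    return max(z, 0.0)

def soft_step(z, delta):
    # Two ReLUs form a ramp approximating the indicator 1[z > 0]:
    # 0 for z <= 0, linear on (0, delta), 1 for z >= delta.
    return (relu(z) - relu(z - delta)) / delta

def rank_k_net(x, k, delta=1e-9):
    """Toy simulation of the depth-3 idea: layer 1 forms O(d^2) pairwise
    comparison indicators, layer 2 counts how many inputs fall below each
    x_i, and the output gates the entry whose count equals k - 1."""
    d = len(x)
    counts = [sum(soft_step(x[i] - x[j], delta) for j in range(d) if j != i)
              for i in range(d)]
    out = 0.0
    for i in range(d):
        # bump that equals 1 exactly when counts[i] == k - 1 (integer counts)
        gate = (soft_step(counts[i] - (k - 2), 1.0)
                - soft_step(counts[i] - (k - 1), 1.0))
        out += gate * x[i]
    return out
```

With distinct inputs and a tiny `delta`, the counts are effectively integers and the gate isolates the rank-k entry; when two inputs fall within `delta` of each other the ramp blurs, which is exactly the ϵ-measure failure set described above.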

Increasing the depth modestly to five layers allows a substantial reduction in width (Theorem 3.2). The network first partitions the d inputs into batches, computes a “local median” for each batch using the depth‑3 construction, and then leverages the fact that a sufficiently large collection of local medians contains the global median with probability 1 − exp(−Ω(d^{2γ})), where γ>0 is a confidence parameter. The remaining two layers compare all candidates against all inputs, thereby identifying the true median while keeping the hidden width at O(d^{5/3+γ}). The error bound consists of the target ϵ plus an exponentially small term exp(−Ω(d^{2γ})).
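The batch-then-verify strategy can be sketched as follows. This is an assumed simplification (exact arithmetic in place of the ReLU layers); the candidate whose global rank sits closest to the median's rank is returned, mirroring the final comparison layers.

```python
import statistics

def median_via_local_medians(x, batch_size):
    """Sketch of the depth-5 strategy: compute a local median per batch,
    then compare every candidate against all inputs and keep the one
    whose global rank is closest to the median's rank."""
    d = len(x)
    k = (d + 1) // 2                     # median rank (odd d)
    candidates = [statistics.median(x[i:i + batch_size])
                  for i in range(0, d, batch_size)]
    def rank_gap(c):
        below = sum(1 for v in x if v < c)
        return abs(below - (k - 1))
    return min(candidates, key=rank_gap)
```

When the global median happens to be one of the local medians, the rank check recovers it exactly; the paper's probabilistic argument bounds how often that event fails.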

The central contribution is a constant‑depth, linear‑width construction (Theorem 3.3). The authors design a depth‑46 ReLU network with O(d) hidden units that attains mean‑squared error ϵ + exp(−Ω(d)). The architecture proceeds through four iterative “window‑estimate‑and‑zero‑out” stages. In each stage a ReLU‑based threshold isolates the current candidate set’s lower and upper halves, zeros out elements outside a shrinking interval, and re‑centers the remaining candidates. After four rounds the candidate set shrinks to O(log d) elements. A novel hashing trick—implemented by a carefully chosen linear combination of the remaining activations—compresses this small set into a single scalar from which the exact median can be recovered. The construction requires weights of magnitude O(d²/ϵ); allowing the weights to grow as ϵ shrinks is what drives the error term down.

Beyond upper bounds, the paper establishes lower bounds that connect the median to the maximum function. A generic reduction (Theorem 4.3) shows that any approximation lower bound for the maximum immediately yields a corresponding bound for the median. By invoking the known exact‑computation lower bound for the maximum (which forces super‑linear width for constant depth), the authors derive an Ω(d) width lower bound for exact median computation (Theorem 4.2). Moreover, they prove that for any depth, achieving exponentially small error requires at least linear width (Theorem 4.9). These lower bounds contrast sharply with the constant‑depth, linear‑width approximation results, highlighting a genuine separation between exact and approximate regimes.
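One natural way such a max-to-median reduction can work is by padding: a network that computes the median of 2d − 1 inputs also computes the maximum of d inputs once the extra slots are filled with an upper bound. This concrete padding trick is our illustration of the reduction's flavor, not necessarily the exact argument of Theorem 4.3.

```python
import statistics

def max_via_median(x, upper_bound=1.0):
    """Appending d - 1 copies of an upper bound M (here M = 1, valid for
    inputs in the unit hypercube) makes the median of the 2d - 1 padded
    values equal the maximum of the original d inputs."""
    d = len(x)
    padded = list(x) + [upper_bound] * (d - 1)
    return statistics.median(padded)
```

Since the d − 1 padding values are the largest entries, the median of the padded list (rank d out of 2d − 1) is exactly the largest original input; any width or depth lower bound for the maximum therefore transfers to the median at a comparable input size.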

The paper also discusses extensions to other continuous input distributions. Although the proofs are presented for the uniform distribution, the key ingredients—pairwise comparisons, batch processing, and the probabilistic guarantee that a random sample of local medians captures the global median—depend only on permutation invariance and bounded density, suggesting that the results extend to any i.i.d. bounded continuous distribution.

In summary, the work delivers three major messages: (1) the median, despite lacking the associativity that makes the maximum amenable to divide‑and‑conquer, can be approximated with a ReLU network of constant depth and linear width; (2) a careful multi‑stage reduction combined with a hashing trick enables exponentially small L₂ error; and (3) exact computation remains fundamentally harder, as evidenced by linear‑width lower bounds that do not apply to the approximation setting. These findings deepen our theoretical understanding of depth‑width trade‑offs for piecewise‑linear functions and provide practical guidance for designing compact neural architectures that need to estimate order statistics.

