Learning with Boolean threshold functions


We develop a method for training neural networks on Boolean data in which the values at all nodes are strictly $\pm 1$, and the resulting models are typically equivalent to networks whose nonzero weights are also $\pm 1$. The method replaces loss minimization with a nonconvex constraint formulation. Each node implements a Boolean threshold function (BTF), and training is expressed through a divide-and-concur decomposition into two complementary constraints: one enforces local BTF consistency between inputs, weights, and output; the other imposes architectural concurrence, equating neuron outputs with downstream inputs and enforcing weight equality across training-data instantiations of the network. The reflect-reflect-relax (RRR) projection algorithm is used to reconcile these constraints. Each BTF constraint includes a lower bound on the margin. When this bound is sufficiently large, the learned representations are provably sparse and equivalent to networks composed of simple logical gates with $\pm 1$ weights. Across a range of tasks – including multiplier-circuit discovery, binary autoencoding, logic-network inference, and cellular automata learning – the method achieves exact solutions or strong generalization in regimes where standard gradient-based methods struggle. These results demonstrate that projection-based constraint satisfaction provides a viable and conceptually distinct foundation for learning in discrete neural systems, with implications for interpretability and efficient inference.


💡 Research Summary

The paper introduces a fundamentally different paradigm for training neural networks on Boolean data, where every neuron’s activation and every weight are constrained to the discrete set {‑1, +1}. Rather than minimizing a conventional loss function, the authors formulate learning as a feasibility problem involving two complementary constraint sets, denoted A and B, and solve it using the Reflect‑Reflect‑Relax (RRR) projection algorithm.

Boolean Threshold Functions (BTFs).
Each neuron implements a Boolean threshold function y = sgn(w·x). In addition to the sign constraint, a margin condition |w·x| ≥ μ is imposed. The margin μ is scaled with the input dimension m as μₘ = p·m/σ, where σ (the “support hyper‑parameter”) limits the number of non‑zero entries in a weight vector. When σ is small and μ is sufficiently large, the optimal weight vectors become exactly σ‑sparse with entries ±1, which mathematically corresponds to simple logical gates: σ = 1 yields a copy (identity) gate, σ = 2 yields 2‑input AND/OR, and σ = 3 yields a 3‑input majority (Maj) gate that can emulate AND/OR when one input is fixed. This sparsity‑induced equivalence is proved in the paper and provides a direct link between learned BTFs and interpretable Boolean circuitry.
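As a minimal concrete sketch (illustrative code, not the paper's implementation), a σ = 3 majority gate is simply a BTF whose weight vector has three ±1 entries:

```python
import numpy as np

def btf(w, x, mu=0.0):
    """Boolean threshold function y = sgn(w.x); the configuration is
    only a valid BTF when the margin |w.x| >= mu also holds."""
    s = float(np.dot(w, x))
    assert abs(s) >= mu, "margin violated"
    return 1 if s > 0 else -1

# A sigma = 3 neuron: three +/-1 weights (all other entries zero)
# acts as a 3-input majority (Maj) gate.
w_maj = np.array([1, 1, 1])
assert btf(w_maj, np.array([ 1,  1, -1]), mu=1) == 1   # two of three high
assert btf(w_maj, np.array([-1,  1, -1]), mu=1) == -1  # two of three low
# Pinning one input at -1 turns Maj into a 2-input AND of the rest:
assert btf(w_maj, np.array([-1,  1,  1]), mu=1) == 1   # AND(+1, +1)
assert btf(w_maj, np.array([-1, -1,  1]), mu=1) == -1  # AND(-1, +1)
```

Pinning an input at +1 instead recovers a 2-input OR, which is the gate equivalence the sparsity theory exploits.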

Divide‑and‑Concur Decomposition.
The learning problem is split into two high‑level constraints:

  • Constraint A (local BTF consistency).
    For every neuron in every data‑specific network replica, the triple (w, x, y) must satisfy the sign equation and the margin bound. This set is non‑convex because of the sign operation, but a closed‑form Euclidean projection onto A can be derived: given any (w, x, y), replace it with the nearest point that satisfies both conditions, i.e., the minimizer of ‖w′‑w‖² + ‖x′‑x‖² + (y′‑y)².

  • Constraint B (architectural concurrence).
    All replicas share a single set of weights, i.e., w must be identical across data items, and each neuron’s output must equal the downstream neuron’s input. These equalities define an affine subspace; the one non‑linear element is the normalization w·w = m (mirroring the input norm x·x = m), which places the weights on a sphere. Projection onto B is still simple: average the replicated weights, enforce the linear equalities, and rescale to the required norm.

By keeping the replicas independent, the A‑projection remains simple; the B‑projection then enforces the global consistency that would otherwise make A intractable.
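The B‑projection itself is little more than an average. A sketch following the description above (the function name and replica layout are illustrative, not the paper's code):

```python
import numpy as np

def project_B(w_replicas, m):
    """Concur projection, sketched: the nearest point at which all
    weight replicas agree is their mean; the result is then rescaled
    so the norm condition w.w = m holds.  The output/input equalities
    would be handled the same way, by averaging each tied pair."""
    w_bar = w_replicas.mean(axis=0)
    w_bar = w_bar * np.sqrt(m / np.dot(w_bar, w_bar))
    return np.tile(w_bar, (len(w_replicas), 1))

# Two replicas of a 3-dimensional weight vector, with m = 3:
W = np.array([[1.0, 0.5, -1.2],
              [0.8, 0.7, -0.9]])
W_B = project_B(W, m=3)   # identical rows with squared norm 3
```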

RRR Algorithm.
At each iteration the current variable vector z (collecting all w, x, y for all replicas) is first projected onto B, giving z_B; the reflection 2 z_B − z of z through B is then projected onto A, giving z_A. The update rule is

 z ← z + β (z_A − z_B),

where β ∈ (0, 2) plays the role of a step‑size. The “gap” Δ = ‖z_A − z_B‖ measures how far the two constraints are from agreement; Δ = 0 indicates a feasible solution. Because A is non‑convex, Δ can temporarily increase, allowing the algorithm to escape local dead‑ends—a behavior absent in stochastic gradient descent (SGD), which relies on smooth gradients and small minibatches. Consequently, large batch sizes (as large as memory permits) are advantageous for the constraint‑based method.
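The iteration is easy to exercise on a toy two‑set feasibility problem standing in for A and B (purely illustrative, not the paper's constraint sets: a non‑convex circle constraint and a line):

```python
import numpy as np

def P_A(z):   # non-convex set A: the circle ||z|| = sqrt(2)
    return np.sqrt(2.0) * z / np.linalg.norm(z)

def P_B(z):   # affine set B: the vertical line x = 1
    return np.array([1.0, z[1]])

z, beta = np.array([3.0, 0.5]), 0.5
for _ in range(300):
    z_B = P_B(z)
    z_A = P_A(2.0 * z_B - z)     # reflect through B, project onto A
    z = z + beta * (z_A - z_B)   # relax toward agreement

# When the gap ||z_A - z_B|| vanishes, z_B lies in both sets; here it
# lands on an intersection of the circle and the line.
print(P_B(z))
```

Note that the feasible point is read off from z_B, not from z itself: at a fixed point z_A = z_B ∈ A ∩ B, while z itself may lie on neither set.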

Theoretical Guarantees on Sparsity.
The authors prove that if the margin is set to μₘ = p·m/σ, then, with high probability, every trained BTF will have at most σ non‑zero weights of equal magnitude. When σ is odd, the only way to satisfy the margin equality is to have exactly σ non‑zero entries, each of magnitude √(m/σ) (so that the normalization w·w = m holds), yielding the logical‑gate equivalence described above.

Empirical Evaluation.
Six experiments illustrate the method’s capabilities:

  1. Multiplier‑circuit discovery.
    Random 5‑layer logic circuits (AND/OR or Maj gates) generate 32‑bit multiplication tables. With as few as 256 training examples, the RRR‑based learner achieves 100 % test accuracy, while a baseline multilayer perceptron (trained with SGD for up to 10⁶ steps) plateaus below 100 % and shows no sharp transition.

  2. Binary autoencoding.
    An autoencoder built from BTFs learns exact 0/1 codes without the “½” saturation typical of sigmoid‑based autoencoders, demonstrating that the margin constraint forces truly binary latent representations.

  3. MNIST binary classification.
    After binarizing MNIST images, the BTF network reaches high accuracy with far fewer epochs than SGD, highlighting robustness to discrete data.

  4. Logic‑network inference.
    Given partial input‑output pairs from an unknown Boolean circuit, the method reconstructs the exact gate layout, confirming the theoretical claim that limited data suffice when the target function is realizable by a sparse BTF network.

  5. Cellular automata learning.
    Training on a subset of space‑time patterns from a 1‑D cellular automaton (e.g., Rule 110) enables the network to predict the rule perfectly, again with a modest number of examples.

  6. Weight‑distribution analysis.
    Histograms of learned weights show peaks at ±√(m/σ) for σ = 3 and at ±√(m) for σ = 1, matching the expected magnitudes for Maj/Copy gates, confirming that the algorithm indeed discovers logical structures rather than arbitrary real‑valued solutions.
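The space‑time training data in the cellular‑automata experiment can be generated along the following lines (Rule 110 and periodic boundaries are assumptions for illustration; the paper's exact setup may differ):

```python
import numpy as np

def rule110_step(row):
    """One synchronous update of a 1-D cellular automaton under
    Wolfram's Rule 110, with periodic boundaries; cells are 0/1."""
    left, right = np.roll(row, 1), np.roll(row, -1)
    idx = 4 * left + 2 * row + right             # 3-cell neighborhood code
    table = np.array([0, 1, 1, 1, 0, 1, 1, 0])   # Rule 110 truth table
    return table[idx]

row = np.zeros(32, dtype=int)
row[-1] = 1                                      # single seed cell
history = [row]
for _ in range(16):
    row = rule110_step(row)
    history.append(row)
# Each consecutive pair (history[t], history[t+1]) is one training
# example; mapping 0/1 to -1/+1 gives the +/-1 data a BTF network uses.
```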

Across all tasks, the constraint‑based approach either finds exact solutions or generalizes dramatically better than SGD, especially when the underlying function is highly discrete and sparse.

Implications and Contributions.
The paper’s key contributions are:

  • A rigorous reformulation of Boolean neural network training as a non‑convex feasibility problem, sidestepping gradients entirely.
  • A margin‑driven sparsity theory that guarantees learned neurons correspond to interpretable logical gates.
  • Demonstration that the RRR projection algorithm, originally from phase‑retrieval, scales to large batches and yields superior performance on discrete learning problems.
  • Empirical evidence that the method excels in settings where traditional back‑propagation struggles, suggesting new avenues for hardware‑efficient (±1 weight) implementations and for building inherently interpretable models.

In summary, “Learning with Boolean threshold functions” provides a compelling alternative to gradient‑based learning for discrete neural systems, blending ideas from constraint satisfaction, geometry, and Boolean logic to achieve both theoretical clarity and practical success.

