Term Coding and Dispersion: A Perfect-vs-Rate Complexity Dichotomy for Information Flow

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We introduce a new framework, term coding, for extremal problems in discrete mathematics and information flow, in which one chooses interpretations of function symbols so as to maximise the number of satisfying assignments of a finite system of term equations. We then focus on dispersion, the special case in which the system defines a term map $\Theta^{\mathcal{I}}: A^k \to A^r$ and the objective is the size of its image. Writing $n := |A|$, we show that the maximum dispersion is $\Theta(n^D)$ for an integer exponent $D$ equal to the guessing number of an associated directed graph, and we give a polynomial-time algorithm to compute $D$. In contrast, deciding whether \emph{perfect dispersion} ever occurs (i.e.\ whether $\mathrm{Disp}_n(\mathbf{t}) = n^r$ for some finite $n \ge 2$) is undecidable once $r \ge 3$, even though the corresponding asymptotic rate-threshold questions are polynomial-time decidable.


💡 Research Summary

The paper introduces a novel formalism called “term coding” that captures a broad class of extremal problems in discrete mathematics, information theory, and network coding. In a term‑coding instance one is given a finite alphabet A of size n, a set V of variables, a signature F of function symbols (each with a fixed arity), and a finite set Γ of term equations over V and F. An interpretation I assigns to each k‑ary function symbol a concrete total function f^I : A^k → A. For a fixed interpretation, the set of global assignments a ∈ A^V that satisfy all equations is denoted Sol_I(Γ; n). The central optimisation problem, TERM‑CODING‑MAX, asks for the maximum possible size S_n(Γ) = max_I |Sol_I(Γ; n)| over all interpretations.
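To make the definitions concrete, the sketch below brute‑forces S_n(Γ) for a tiny hypothetical instance of our own devising (not one from the paper): a single unary symbol f and the single equation f(x) = x, whose solutions under an interpretation f^I are exactly the fixed points of f^I.

```python
from itertools import product

def s_n_fixed_point_instance(n):
    """Brute-force S_n(G) for the toy instance G = { f(x) = x } with one
    unary symbol f over A = {0, ..., n-1}.  An interpretation f^I is any
    total function A -> A, encoded as the tuple (f(0), ..., f(n-1));
    the satisfying assignments are the fixed points of f^I."""
    A = range(n)
    best = 0
    for f in product(A, repeat=n):       # all n^n interpretations of f
        sols = sum(1 for x in A if f[x] == x)
        best = max(best, sols)
    return best

print(s_n_fixed_point_instance(3))  # the identity attains all 3 assignments
```

Even this toy case shows why the problem is an optimisation over interpretations: most choices of f^I satisfy few assignments, and the maximum (here the identity, giving S_n = n) is what TERM‑CODING‑MAX asks for.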

The authors first develop a systematic preprocessing pipeline that converts any term‑coding instance into a canonical, graph‑based form without changing S_n(Γ). The pipeline consists of (i) flattening all terms to depth‑one equations of the shape f(u₁,…,u_k)=v (introducing fresh auxiliary variables as needed), (ii) quotienting equalities that share the same left‑hand side, and (iii) enforcing a collision‑free normal form where each variable appears on the left‑hand side of at most one equation. These steps are all polynomial‑time and preserve the optimum code size.
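Step (i), flattening, can be sketched as follows. The nested-tuple term encoding and the fresh-variable names (w0, w1, …) are illustrative choices of ours, not the paper's notation.

```python
from itertools import count

def flatten(term, equations, fresh):
    """Flatten a nested term into depth-one equations of the shape
    f(u1, ..., uk) = v, introducing fresh auxiliary variables.
    Terms are either strings (variables) or tuples
    (symbol, arg1, ..., argk).  Returns the variable naming the term."""
    if isinstance(term, str):              # a bare variable is already flat
        return term
    sym, *args = term
    flat_args = [flatten(a, equations, fresh) for a in args]
    v = f"w{next(fresh)}"                  # fresh auxiliary variable
    equations.append((sym, tuple(flat_args), v))
    return v

eqs, fresh = [], count()
flatten(('f', ('g', 'x'), 'y'), eqs, fresh)  # the term f(g(x), y)
for sym, args, v in eqs:
    print(f"{sym}({', '.join(args)}) = {v}")
# prints:  g(x) = w0
#          f(w0, y) = w1
```

Steps (ii) and (iii) would then merge equations sharing a left-hand side and rename variables so that each appears on at most one left-hand side; the point is that all three steps are syntactic and polynomial-time.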

From the collision‑free functional form one extracts a directed dependency graph G_Γ whose vertices are the variables and whose edges encode functional dependence (an edge u→v means that v is defined as a function of u, among others). This graph provides a bridge to the well‑studied notion of guessing games on directed graphs. The guessing number of a graph, originally introduced in the context of network coding, counts (as a base‑n logarithm) the maximum number of global configurations on which every vertex simultaneously guesses its own value correctly, when each vertex sees only the values of its in‑neighbours. The paper proves a "guessing‑number sandwich" theorem: for any term‑coding instance the optimal code size lies between the winning configuration counts of two diversified versions of the instance, each of which corresponds exactly to a guessing game on G_Γ. Consequently, the integer D = log_n S_n(Γ) (when it exists) coincides with the guessing number of G_Γ.
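The guessing game can be brute‑forced on very small graphs. The sketch below (our own illustration, feasible only for tiny instances) enumerates every strategy, i.e. every assignment of a local function to each vertex, and counts the configurations on which all vertices guess correctly; for the directed 3‑cycle over a binary alphabet the maximum is 2, giving guessing number log_2 2 = 1.

```python
from itertools import product

def guessing_value(n, in_neighbours):
    """Maximum number of configurations won in the guessing game on a
    small digraph: each vertex guesses its own value as a function of
    its in-neighbours' values.  The guessing number is log_n of the
    returned count.  Brute force -- tiny graphs only."""
    A = list(range(n))
    verts = sorted(in_neighbours)
    # Possible local inputs seen by each vertex: tuples over its in-neighbours.
    local = {v: list(product(A, repeat=len(in_neighbours[v]))) for v in verts}
    best = 0
    # A strategy assigns each vertex a function (local inputs) -> A.
    for strat in product(*(product(A, repeat=len(local[v])) for v in verts)):
        fns = {v: dict(zip(local[v], s)) for v, s in zip(verts, strat)}
        wins = sum(
            all(cfg[verts.index(v)] ==
                fns[v][tuple(cfg[verts.index(u)] for u in in_neighbours[v])]
                for v in verts)
            for cfg in product(A, repeat=len(verts)))
        best = max(best, wins)
    return best

# Directed 3-cycle 0 -> 1 -> 2 -> 0 over a binary alphabet:
cycle3 = {0: [2], 1: [0], 2: [1]}
print(guessing_value(2, cycle3))   # 2 wins, so guessing number = log_2 2 = 1
```

The winning strategy here is for every vertex to copy its in‑neighbour: all vertices then guess correctly exactly on the two constant configurations, and no strategy does better on a directed cycle.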

The authors then focus on the special case of dispersion. A dispersion instance is defined by k input variables x = (x₁,…,x_k) and a tuple of r output terms t(x) = (t₁(x),…,t_r(x)). Under an interpretation I the terms induce a map Θ^I : A^k → A^r, and the dispersion is Disp_n(t) = max_I |Im(Θ^I)|. The dispersion exponent D(t) = lim_{n→∞} log_n Disp_n(t) is shown to exist, to be an integer, and to equal the guessing number of the associated dependency graph. Moreover, D(t) can be computed in polynomial time by constructing a flow network N(t) whose size is polynomial in the syntactic size of t and applying a standard max‑flow/min‑cut algorithm. This yields a max‑flow/min‑cut characterisation of the exponent (Theorem 5.4).
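For intuition about why the exponent is a min‑cut quantity, the sketch below brute‑forces Disp_n(t) for a hypothetical tuple of our own, t(x1, x2) = (f(x1, x2), g(x1, x2), x1): it has k = 2 inputs feeding r = 3 outputs, so the two inputs form a bottleneck of size 2 and the image can never exceed n².

```python
from itertools import product

def dispersion_example(n):
    """Brute-force Disp_n(t) for the illustrative tuple
        t(x1, x2) = (f(x1, x2), g(x1, x2), x1)
    over A = {0, ..., n-1}.  An interpretation of a binary symbol is
    encoded as a dict (a, b) -> value.  Feasible only for small n."""
    A = list(range(n))
    pairs = list(product(A, repeat=2))          # the domain A^2
    best = 0
    for f_vals, g_vals in product(product(A, repeat=len(pairs)), repeat=2):
        f = dict(zip(pairs, f_vals))
        g = dict(zip(pairs, g_vals))
        image = {(f[x1, x2], g[x1, x2], x1) for x1, x2 in pairs}
        best = max(best, len(image))
    return best

print(dispersion_example(2))   # 4 = 2^2: the exponent is 2, and no
                               # interpretation reaches n^3 = 8
```

Taking f^I(x1, x2) = x2 already makes the map injective, attaining the bound n² = n^k; the paper's flow network makes this bottleneck argument precise and computes the exponent in polynomial time, without any brute force over interpretations.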

The most striking contribution is a complexity dichotomy for dispersion. For output dimension r ≥ 3, deciding whether perfect dispersion occurs—i.e., whether there exists any finite alphabet size n ≥ 2 and interpretation I such that Disp_n(t) = n^r (equivalently Θ^I is surjective onto A^r)—is proved undecidable (Theorem 6.7). The proof reduces from the finite‑satisfiability and finite‑bijectivity problems, embedding them into term‑coding constraints. In stark contrast, deciding any asymptotic rate question—such as whether D(t) ≥ k for a given integer k—is polynomial‑time decidable via the flow‑network construction. Thus exact solvability (perfect dispersion) is algorithmically intractable, while rate‑threshold questions are efficiently solvable, establishing a perfect‑vs‑rate complexity jump.

Beyond the theoretical results, the paper supplies a toolbox for practitioners: (C1) a concrete preprocessing pipeline that yields a compact functional representation and dependency graph; (C2) a bridge to guessing games that imports graph‑entropy techniques into term‑coding analysis; (C3) an algorithmic method to compute the dispersion exponent using max‑flow/min‑cut. These tools unify and extend earlier work on network coding entropy (Riis & Gadouleau, 2015) to a broader equational setting.

In summary, the work defines a versatile term‑coding framework, shows that its optimisation reduces to graph‑based guessing numbers, provides a polynomial‑time method to compute the dispersion exponent via max‑flow/min‑cut, and demonstrates a sharp complexity dichotomy: perfect dispersion is undecidable for r≥3, whereas rate‑based dispersion thresholds are tractable. This delineates a clear boundary between what can be decided exactly and what can be decided asymptotically in deterministic information‑flow design problems.

