Parikh Images of Regular Languages: Complexity and Applications
We show that the Parikh image of the language of an NFA with n states over an alphabet of size k can be described as a finite union of linear sets with at most k generators and total size 2^{O(k^2 log n)}, i.e., polynomial for all fixed k >= 1. Previously, it was not known whether the number of generators could be made independent of n, and the best upper bounds on the total size were exponential in n. Furthermore, we give an algorithm for performing such a translation in time 2^{O(k^2 log(kn))}. Our proof exploits a previously unknown connection to the theory of convex sets, and establishes a normal form theorem for semilinear sets, which is of independent interest. To complement these results, we show that our upper bounds are tight and that the results cannot be extended to context-free languages. We give four applications: (1) a new polynomial fragment of integer programming, (2) precise complexity of membership for Parikh images of NFAs, (3) an answer to an open question about polynomial PAC-learnability of semilinear sets, and (4) an optimal algorithm for LTL model checking over discrete-timed reversal-bounded counter systems.
Research Summary
The paper investigates the Parikh image of regular languages, focusing on nondeterministic finite automata (NFAs) with n states over an alphabet of size k. The authors prove that such a Parikh image can always be expressed as a finite union of linear sets, each using at most k generators, and that the total size of this representation is bounded by 2^{O(k² log n)}. Since 2^{O(k² log n)} = n^{O(k²)}, the representation size is polynomial in n for any fixed alphabet size k ≥ 1. This settles the previously open question of whether the number of generators can be made independent of n.
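To make the objects concrete, here is a minimal Python sketch of a Parikh vector and of membership in a single linear set (a base vector plus non-negative integer combinations of generators). The function names `parikh_vector` and `in_linear_set` are illustrative assumptions, not names from the paper, and the membership test is a naive bounded search rather than anything the authors propose.

```python
from collections import Counter
from itertools import product

def parikh_vector(word, alphabet):
    """Count occurrences of each letter, in the order given by `alphabet`."""
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)

def in_linear_set(v, base, generators):
    """Brute-force test: is v = base + sum(c_i * g_i) with integers c_i >= 0?

    Assumes all vectors have the same length and non-negative entries.
    """
    k = len(v)
    rest = tuple(v[i] - base[i] for i in range(k))
    if any(x < 0 for x in rest):
        return False
    # Zero generators contribute nothing, so drop them.
    gens = [g for g in generators if any(g)]
    # A coefficient can never exceed the largest multiple of g that fits in rest.
    bounds = [min(rest[i] // g[i] for i in range(k) if g[i] > 0) for g in gens]
    for coeffs in product(*(range(b + 1) for b in bounds)):
        total = tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) for i in range(k))
        if total == rest:
            return True
    return False
```

For example, the language (ab)* has the Parikh image {(m, m) : m ≥ 0}, which is the linear set with base (0, 0) and the single generator (1, 1); a semilinear set is a finite union of such linear sets.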
The technical core is a novel connection between Parikh images and convex geometry. By interpreting the set of all reachable Parikh vectors as points in ℤ^k, the authors show that the convex hull of these points is a k‑dimensional polytope whose vertices can be described with O(k) coordinates of size O(log n). Each vertex gives rise to a linear set, and the integer points inside the polytope decompose into a union of at most k‑generator linear sets. This yields a “normal‑form theorem for semilinear sets” that is of independent interest.
On the algorithmic side, the paper presents a constructive procedure that computes the aforementioned normal form in time 2^{O(k² log(k n))}. The algorithm proceeds in four stages: (1) extraction of all cycle‑ and path‑induced Parikh vectors from the NFA’s transition graph, (2) construction of the convex hull in ℝ^k, (3) a convex‑integer decomposition that identifies the generators, and (4) merging and pruning to obtain a minimal union of linear sets. The key subroutine, called the convex integer partition technique, efficiently splits a high‑dimensional polytope into a small number of regions, each of which can be enumerated for integer points using lattice‑basis reduction. This avoids the exponential blow‑up of earlier constructions that relied on exhaustive enumeration of paths.
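For contrast with the construction summarized above, the following Python sketch shows the naive approach it is said to avoid: enumerating the Parikh vectors of all accepted words up to a length bound. It is not the paper's algorithm. The NFA encoding (a triple of initial states, final states, and a set of `(source, letter, target)` transitions without epsilon moves) and the name `parikh_vectors_up_to` are assumptions made for illustration.

```python
def parikh_vectors_up_to(nfa, alphabet, max_len):
    """Return the Parikh vectors of all accepted words of length <= max_len."""
    initial, final, transitions = nfa
    index = {a: i for i, a in enumerate(alphabet)}
    zero = (0,) * len(alphabet)
    # Frontier holds (state, Parikh vector) pairs reachable in exactly j steps.
    frontier = {(q, zero) for q in initial}
    accepted = {vec for q, vec in frontier if q in final}
    for _ in range(max_len):
        nxt = set()
        for q, vec in frontier:
            for p, a, r in transitions:
                if p == q:
                    i = index[a]
                    # Reading letter a increments the a-th coordinate.
                    nxt.add((r, vec[:i] + (vec[i] + 1,) + vec[i + 1:]))
        frontier = nxt
        accepted |= {vec for q, vec in frontier if q in final}
    return accepted

# Two-state NFA for (ab)*: state 0 is both initial and final.
AB_STAR = ({0}, {0}, {(0, "a", 1), (1, "b", 0)})
vectors = parikh_vectors_up_to(AB_STAR, "ab", 4)
```

This enumeration only ever yields a finite fragment of the Parikh image, which is why a finite description by linear sets with few generators is the more useful output.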
The authors also prove tightness of their bounds. By designing a family of NFAs whose Parikh images require 2^{Ω(k² log n)} linear sets, they show that the upper bound cannot be improved asymptotically. Moreover, they demonstrate that the same normal‑form result fails for context‑free languages: the Parikh image of a CFL may need an unbounded number of generators even when k is fixed, highlighting a sharp separation between regular and context‑free families.
Four concrete applications are explored. (1) A new polynomial fragment of integer programming is identified: feasibility of linear constraints whose coefficient matrix corresponds to the generator matrix of an NFA’s Parikh image can be decided in polynomial time for fixed k. (2) The exact complexity of the membership problem for Parikh images of NFAs is settled: the problem is NP‑complete, in contrast to ordinary word membership for NFAs, which is decidable in polynomial time. (3) The authors answer an open question on the PAC‑learnability of semilinear sets by providing a polynomial‑time learning algorithm that exploits the bounded‑generator normal form. (4) They apply the normal form to discrete‑timed reversal‑bounded counter systems, yielding an optimal LTL model‑checking algorithm whose complexity matches the lower bound and improves upon previous exponential‑time approaches.
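The membership problem in application (2) asks whether a given vector is the Parikh vector of some accepted word. A naive way to decide it, sketched below in Python, is a memoized search over pairs (state, remaining letter counts). This is an illustrative baseline under an assumed NFA encoding (initial states, final states, `(source, letter, target)` transitions, no epsilon moves); the name `parikh_member` is hypothetical, and the paper's own procedures are not reproduced here.

```python
from functools import lru_cache

def parikh_member(nfa, alphabet, target):
    """Decide whether some accepted word has Parikh vector `target`."""
    initial, final, transitions = nfa
    index = {a: i for i, a in enumerate(alphabet)}
    edges = {}
    for p, a, q in transitions:
        edges.setdefault(p, []).append((index[a], q))

    @lru_cache(maxsize=None)
    def reach(state, remaining):
        # All letters consumed: accept iff we stand in a final state.
        if not any(remaining):
            return state in final
        for i, q in edges.get(state, []):
            if remaining[i] > 0:
                nxt = remaining[:i] + (remaining[i] - 1,) + remaining[i + 1:]
                if reach(q, nxt):
                    return True
        return False

    return any(reach(q, tuple(target)) for q in initial)

# Two-state NFA for (ab)*: state 0 is both initial and final.
AB_STAR = ({0}, {0}, {(0, "a", 1), (1, "b", 0)})
```

The number of distinct (state, remaining) pairs is at most n times the product of (target[i] + 1) over all letters, so the search is exponential in the bit length of the target and recurses to a depth equal to the sum of its entries; it is meant only to pin down the problem statement.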
In summary, the paper delivers a deep theoretical insight—linking Parikh images to convex polytopes—and translates it into concrete algorithmic benefits. It resolves several open problems, establishes optimal bounds, and opens new avenues for research in automata theory, integer programming, learning theory, and verification of counter systems.