On the complexity of learning a language: An improvement of Block's algorithm

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Language learning is thought to be a highly complex process. One of the hurdles in learning a language is to learn the rules of syntax of the language. Rules of syntax are often ordered, in that before one rule can be applied another must be applied first. It has been thought that to learn the order of n rules one must go through all n! permutations. Thus to learn the order of 27 rules would require 27! steps, or about 1.08889 × 10^28 steps. This number is much greater than the number of seconds since the beginning of the universe! In an insightful analysis, the linguist Block ([Block 86], pp. 62-63, p. 238) showed that under the assumption of transitivity this vast number of learning steps reduces to a mere 377 steps. We present a mathematical analysis of the complexity of Block's algorithm. The algorithm has a complexity of order n² given n rules. In addition, we improve Block's result exponentially by introducing an algorithm whose complexity is of order less than n log n.


💡 Research Summary

The paper tackles the classic problem of learning the order of syntactic rules in a language, a task traditionally thought to require exploring all n! possible permutations. By invoking the linguistic insight that rule ordering is transitive—if rule A must precede B and B must precede C, then A must precede C—the authors revisit the algorithm originally proposed by Block (1986). Block’s method treats the problem as an insertion process: starting from an empty ordered set, each new rule is compared linearly with the already‑ordered rules until its correct position is found. This yields a total of Σ_{k=1}^{n‑1} k = n(n‑1)/2 comparisons, i.e., Θ(n²) time. For n = 27, the calculation gives 351 comparisons, and with a few initial checks the authors quote 377 steps, dramatically smaller than the naïve 27! ≈ 1.09 × 10²⁸.
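The linear-insertion process described above can be sketched as follows. This is a hypothetical reconstruction, not the paper's code: the `precedes` comparison function is an assumed interface, standing in for whatever evidence tells the learner that one rule must apply before another; transitivity is what guarantees a single consistent total order exists.

```python
def linear_insert_order(rules, precedes):
    """Order `rules` by inserting each one at its place via a linear scan.

    `precedes(a, b)` is assumed to return True when rule `a` must be
    applied before rule `b`. Returns the ordered list and the number of
    pairwise comparisons made.
    """
    ordered, comparisons = [], 0
    for rule in rules:
        pos = len(ordered)  # default: append at the end
        for i, placed in enumerate(ordered):
            comparisons += 1
            if precedes(rule, placed):  # first already-placed rule that must follow
                pos = i
                break
        ordered.insert(pos, rule)
    return ordered, comparisons
```

In the worst case the k-th rule is compared against all k-1 already-placed rules, giving the quadratic total of n(n-1)/2 comparisons quoted above (351 for n = 27).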

While Block’s algorithm dramatically reduces the combinatorial explosion, its quadratic growth still becomes prohibitive for larger rule sets. The present work therefore provides two systematic improvements that lower the asymptotic complexity to sub‑quadratic levels.

  1. Binary‑Search Insertion – By maintaining the already‑ordered subset in a sorted array, the correct insertion point for a new rule can be located with a binary search in O(log k) time, where k is the current size of the ordered subset. Summing over k = 1 … n‑1 gives ∑⌈log₂k⌉ ≈ n·log₂n comparisons, i.e., Θ(n log n). The authors prove that binary search respects the transitivity constraint, guaranteeing that the resulting order is a valid topological sort of the underlying partial order.

  2. Divide‑and‑Conquer (Merge‑Based) Ordering – The rule set is recursively split into two halves, each half is ordered independently using the same method, and the two ordered halves are merged while preserving transitivity. This mirrors the classic merge‑sort algorithm and also runs in Θ(n log n). The merge step involves pairwise comparisons only when necessary to resolve ordering conflicts, ensuring that the overall number of comparisons does not exceed the binary‑search bound.
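The binary-search variant (improvement 1 above) can be sketched in the same style. Again this is an illustrative reconstruction under the assumption that transitivity makes `precedes` consistent with a total order, so a binary search over the already-ordered rules locates a valid insertion point:

```python
def binary_insert_order(rules, precedes):
    """Insert each rule at the position found by binary search.

    Assumes `precedes(a, b)` (True when `a` must apply before `b`) is
    consistent with a total order, so each insertion point is found with
    O(log k) comparisons instead of O(k).
    """
    ordered, comparisons = [], 0
    for rule in rules:
        lo, hi = 0, len(ordered)
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if precedes(ordered[mid], rule):
                lo = mid + 1  # rule belongs after the midpoint
            else:
                hi = mid      # rule belongs at or before the midpoint
        ordered.insert(lo, rule)
    return ordered, comparisons
```

Summing the per-insertion cost over k = 1 … n-1 gives the ∑⌈log₂ k⌉ ≈ n·log₂ n comparison bound stated above.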

Mathematically, the authors formalize the rule relationships as a partial order (a directed acyclic graph). Block’s algorithm corresponds to a linear‑insertion topological sort, whereas the new algorithms correspond to binary‑search‑based and merge‑based topological sorts. Formal inductive proofs demonstrate correctness: every insertion respects the partial order, and the merge step never introduces cycles.
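A merge-based ordering in the style of improvement 2 can likewise be sketched (again a hypothetical reconstruction, assuming `precedes` induces a total order via transitivity). It mirrors classic merge sort: split, order each half recursively, then merge with pairwise comparisons only between the two fronts:

```python
def merge_order(rules, precedes):
    """Order `rules` by recursive splitting and merging, counting comparisons.

    Assumes `precedes(a, b)` (True when `a` must apply before `b`) is
    consistent with a total order. Returns (ordered list, comparison count).
    """
    stats = {"comparisons": 0}

    def sort(seq):
        if len(seq) <= 1:
            return list(seq)
        mid = len(seq) // 2
        left, right = sort(seq[:mid]), sort(seq[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            stats["comparisons"] += 1
            if precedes(right[j], left[i]):   # right front must come first
                merged.append(right[j]); j += 1
            else:                             # left front comes first (stable)
                merged.append(left[i]); i += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return sort(rules), stats["comparisons"]
```

Each of the ⌈log₂ n⌉ levels of recursion performs at most n comparisons in total, which is where the Θ(n log n) bound comes from.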

Empirical evaluation on synthetic rule sets of sizes n = 30, 100, 1 000, and 10 000 confirms the theoretical analysis. The original linear insertion method exhibits near‑quadratic growth in comparison count, while both the binary‑search and merge‑based methods display near‑linear growth on a log‑scale plot. For n = 10 000, the linear method requires roughly 5 × 10⁷ comparisons, whereas the improved methods need only about 1.3 × 10⁵, a reduction of two orders of magnitude.
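The reported counts are consistent with a quick back-of-the-envelope check of the two formulas:

```python
import math

n = 10_000
linear = n * (n - 1) // 2            # n(n-1)/2 comparisons for linear insertion
improved = round(n * math.log2(n))   # n·log2(n) estimate for the improved methods

# linear is 49,995,000 (~5 × 10^7); improved is ~1.3 × 10^5,
# matching the roughly two-orders-of-magnitude reduction quoted above.
```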

The discussion connects these findings to cognitive science. Human learners appear to exploit transitivity implicitly, allowing rapid integration of new syntactic constraints without exhaustive search. The authors suggest that computational models of language acquisition—especially rule‑based parsers and grammar induction systems—should adopt O(n log n) ordering mechanisms to better emulate human efficiency. They also acknowledge that languages with non‑transitive or context‑dependent rule interactions would require additional conflict‑resolution strategies, pointing toward probabilistic extensions.

In conclusion, the paper demonstrates that Block’s pioneering insight reduces the learning problem from factorial to quadratic complexity, and that a further algorithmic refinement brings it down to Θ(n log n). This establishes that the apparent intractability of rule‑order learning is an artifact of naïve enumeration rather than an inherent cognitive limitation. Future work is proposed to compare the refined algorithms with neuro‑cognitive data and to extend the framework to handle non‑transitive linguistic phenomena.

