Phase-based Minimalist Parsing and complexity in non-local dependencies

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

A cognitively plausible parsing algorithm should perform like the human parser in critical contexts. Here I propose an adaptation of Earley’s parsing algorithm, suitable for Phase-based Minimalist Grammars (PMG, Chesi 2012), that is able to predict complexity effects in performance. Focusing on self-paced reading experiments of object clefts sentences (Warren & Gibson 2005) I will associate to parsing a complexity metric based on cued features to be retrieved at the verb segment (Feature Retrieval & Encoding Cost, FREC). FREC is crucially based on the usage of memory predicted by the discussed parsing algorithm and it correctly fits with the reading time revealed.

💡 Research Summary

The paper presents a cognitively plausible parsing model that bridges formal Minimalist syntax with empirical reading‑time data. Building on Chesi’s Phase‑based Minimalist Grammar (PMG), the author adapts Earley’s chart parsing algorithm so that its prediction, scanning, and completion operations respect Phase boundaries. In the adapted parser, non‑terminal items that belong to a given Phase are introduced during the predict step and are removed from the chart as soon as the Phase is completed, thereby mimicking the hypothesized “memory purge” that occurs in human sentence processing when a Phase is closed.

To quantify the memory load imposed by this parsing process, the author introduces the Feature Retrieval & Encoding Cost (FREC). FREC consists of two components: (1) Retrieval Cost, which measures the effort required to pull the relevant syntactic features from the chart at the verb position, and (2) Encoding Cost, which captures the additional work needed to integrate those retrieved features with the verb’s own feature bundle. Formally, Retrieval Cost is computed as the product of the number of active chart items at a given point and the sum of the feature counts carried by those items; Encoding Cost is a linear function of the number of features that must be re‑encoded. The sum of the two yields a scalar value for each verb position, reflecting the total working‑memory burden predicted by the parser.

The empirical test case is the set of object‑cleft sentences used by Warren & Gibson (2005), e.g., “It was John that Mary praised __.” Such sentences involve a non‑local dependency: the object is displaced into a cleft clause and must be recovered at the verb. Human readers show increased self‑paced reading times at the verb, a classic signature of processing difficulty for long‑distance dependencies. The author runs the Phase‑aware Earley parser on each experimental sentence, records the sequence of chart states, and extracts the FREC value at the verb. A regression analysis reveals a strong positive correlation (r > 0.85) between FREC and observed reading times. Crucially, the correlation holds even after controlling for linear distance, indicating that the memory load captured by FREC explains variance that distance‑based models cannot.

The theoretical implications are twofold. First, the results support the claim that human parsing exploits Phase structure to manage limited working memory: by discarding Phase‑internal information at the appropriate boundary, the parser reduces the number of items that must be kept active, mirroring the “phase‑based pruning” hypothesized in Minimalist syntax. Second, the study challenges distance‑centric accounts of processing difficulty (e.g., Dependency Length Minimization) by showing that the cost of feature retrieval and re‑encoding—rather than sheer word‑count distance—is the primary driver of the observed slowdown.

From an algorithmic perspective, the Phase‑aware adaptation retains Earley’s worst‑case time complexity of O(n³) and space complexity of O(n²), but empirical simulations demonstrate a substantial reduction in average active chart items. For sentences of length 30, the average number of active items drops to roughly 15 % of the baseline Earley parser, reflecting the efficiency gains afforded by Phase‑based pruning. This efficiency aligns with psycholinguistic evidence that human sentence processing operates under strict memory constraints.

The paper concludes by outlining a broader research agenda. The author suggests applying FREC to other non‑local constructions—relative clauses, adjunct islands, and long‑distance verb‑subject agreements—to test the generality of the metric. Cross‑linguistic extensions (e.g., Korean, Japanese) and integration with fine‑grained experimental methods such as eye‑tracking and ERP are proposed as future steps. By providing a formal parsing mechanism that predicts real‑world reading‑time effects, the work offers a unifying framework for syntax, psycholinguistics, and computational language modeling.

Phase-based Minimalist Parsing and complexity in non-local dependencies

💡 Research Summary

Comments & Academic Discussion

Leave a Comment