Joint RNN-Based Greedy Parsing and Word Composition

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

This paper introduces a greedy parser based on neural networks, which leverages a new compositional sub-tree representation. The greedy parser and the compositional procedure are jointly trained and tightly depend on each other. The composition procedure outputs a vector representation which summarizes sub-trees both syntactically (parsing tags) and semantically (words). Composition and tagging operate over continuous (word or tag) representations, using recurrent neural networks. We reach F1 performance on par with well-known existing parsers, while having the advantage of speed, thanks to the greedy nature of the parser. We provide a fully functional implementation of the method described in this paper.


💡 Research Summary

The paper presents a novel greedy constituency parser that jointly learns a recurrent neural network (RNN) based parsing model and a compositional sub‑tree representation. Each word and each syntactic tag (POS or parsing label) is embedded into a continuous vector space (dimensions D for words and T for tags). These embeddings are concatenated and fed into two main components: a sliding‑window BIOES tagger and a set of composition networks C_k.
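The lookup-and-concatenate step can be sketched as follows; the vocabulary, tag set, dimensions, and random initialization here are illustrative placeholders, not values from the paper:

```python
import numpy as np

D, T = 50, 10          # illustrative embedding sizes: D for words/phrases, T for tags
rng = np.random.default_rng(0)

# Lookup tables mapping each symbol to a trainable vector (randomly initialized here).
word_table = {w: rng.standard_normal(D) for w in ["the", "cat", "sat"]}
tag_table = {t: rng.standard_normal(T) for t in ["DT", "NN", "VBD"]}

def embed(word, tag):
    """Concatenate the word and tag embeddings into one (D+T)-dimensional input vector."""
    return np.concatenate([word_table[word], tag_table[tag]])

x = embed("cat", "NN")
assert x.shape == (D + T,)
```

In training, gradients would flow back into both tables, so the word and tag vectors are learned jointly with the rest of the model.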

The tagger is a two‑layer feed‑forward network that scans a fixed‑size context window (typically K=5) over the current sequence of constituents. For each position it outputs scores for all BIOES‑prefixed parsing tags. A lightweight dynamic‑programming step enforces BIOES consistency, after which the highest‑scoring tag sequence is selected greedily.
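The consistency constraint enforced by that dynamic-programming step boils down to which BIOES tag may follow which. A minimal sketch of the transition rule (function name and tag encoding are assumptions for illustration):

```python
def bioes_ok(prev, cur):
    """Check whether tag `cur` may legally follow tag `prev` under the BIOES scheme.
    Tags are 'O' or '<prefix>-<label>' with prefix in {B, I, E, S}."""
    def split(t):
        return ("O", None) if t == "O" else tuple(t.split("-", 1))
    p, plab = split(prev)
    c, clab = split(cur)
    if p in ("B", "I"):                  # inside an open span: must continue it
        return c in ("I", "E") and clab == plab
    # after O, E, or S the span is closed: only a fresh start (B, S, or O) is legal
    return c in ("B", "S", "O")

assert bioes_ok("B-NP", "E-NP")          # a span may close with a matching label
assert not bioes_ok("B-NP", "E-VP")      # labels inside one span must agree
assert not bioes_ok("O", "I-NP")         # I requires an open span
```

Scoring tag sequences under these pairwise constraints is exactly the kind of first-order structure a Viterbi-style dynamic program handles cheaply.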

Whenever a contiguous span of constituents is assigned a label A (e.g., B‑NP … E‑NP), the parser invokes the appropriate composition network C_k, where k is the number of merged children. C_k receives the concatenated vectors of the children (both their semantic word embeddings and their tag embeddings) and applies a linear transformation M_k followed by a non‑linearity (tanh) to produce a new D‑dimensional vector representing the newly formed node. This vector is stored back into the word lookup table, allowing subsequent composition or tagging steps to treat it exactly like a leaf word vector.
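A minimal numpy sketch of such a composition network, assuming illustrative dimensions and a random initialization (the paper's actual initialization and training are not reproduced here):

```python
import numpy as np

D, T = 50, 10                      # illustrative dims: word/phrase vectors and tag vectors
rng = np.random.default_rng(0)

def make_composer(k):
    """Composition network C_k: merges k children into one D-dimensional phrase vector."""
    M_k = rng.standard_normal((D, k * (D + T))) * 0.01   # linear map M_k, toy init
    def compose(children):
        # children: list of k (word_vec, tag_vec) pairs
        z = np.concatenate([np.concatenate([w, t]) for w, t in children])
        return np.tanh(M_k @ z)    # tanh keeps outputs in (-1, 1), like leaf embeddings
    return compose

c2 = make_composer(2)
kids = [(rng.standard_normal(D), rng.standard_normal(T)) for _ in range(2)]
v = c2(kids)
assert v.shape == (D,) and np.all(np.abs(v) < 1)
```

Because the output is D-dimensional, it can be dropped back into the word lookup table and consumed by later tagging or composition steps exactly like a leaf vector.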

The entire process repeats iteratively (tag, compose, replace) until a single root node remains. Because both the tagging decisions and the compositional vectors are differentiable, the whole system can be trained end-to-end using a cross-entropy loss over the BIOES tags. Gradients flow through the composition networks, the tagger, and the embedding tables, updating word embeddings, tag embeddings, composition matrices, and tagger weights simultaneously.
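The outer loop can be sketched as below. Both callables are stand-ins invented for illustration: `tag_spans` abstracts the BIOES tagger's span decisions, and `compose` abstracts the C_k networks; the toy run merely builds nested tuples to show the control flow:

```python
def greedy_parse(leaves, tag_spans, compose):
    """Greedy parse loop sketch: repeatedly tag the current constituent
    sequence, merge each labeled span into one node, and substitute the
    composed vector back, until a single root remains.

    tag_spans(nodes) is assumed to return [(start, end, label), ...] for the
    spans closed this round; compose(children, label) returns the merged node.
    """
    nodes = list(leaves)
    while len(nodes) > 1:
        spans = tag_spans(nodes)
        if not spans:
            break                      # tagger closed no span: stop early
        # merge right-to-left so earlier span indices stay valid
        for start, end, label in sorted(spans, reverse=True):
            nodes[start:end] = [compose(nodes[start:end], label)]
    return nodes[0]

# Toy run: a fake tagger that always merges the first two nodes under label 'X'.
root = greedy_parse([1, 2, 3, 4],
                    lambda ns: [(0, 2, "X")],
                    lambda kids, lab: (lab, tuple(kids)))
assert root == ("X", (("X", (("X", (1, 2)), 3)), 4))
```

In the real model the merged node is a composed vector stored back into the lookup table, so each iteration shortens the sequence until the root's representation is produced.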

Experiments were conducted on the Wall Street Journal portion of the Penn Treebank. Pre-trained word embeddings (derived from Lebret & Collobert, 2014) were used as initialization, while tag embeddings were learned from scratch. The model achieved an F1 score of approximately 90.2%, comparable to state-of-the-art discriminative parsers that rely on PCFG re-ranking or extensive feature engineering. Crucially, the greedy nature of the algorithm yields a parsing speed several times faster than traditional chart-based parsers, making it attractive for real-time applications.

Beyond parsing, the learned sub‑tree vectors encode both syntactic category and lexical semantics, offering a reusable phrase embedding that could benefit downstream tasks such as semantic role labeling, coreference resolution, or sentence‑level representation learning.

The authors discuss several strengths: (1) no reliance on handcrafted head‑word features or refined PCFG rules; (2) a unified representation space for words and composed phrases; (3) competitive accuracy combined with high speed. Limitations include the need for separate composition matrices for each possible arity k (which modestly increases parameters) and the inherent risk that greedy decisions may miss globally optimal tree structures, especially for very long or ambiguous sentences.

Future work suggested includes augmenting the greedy decoder with a small beam to explore alternative parses, replacing the simple linear‑tanh composition with attention‑based or transformer modules to capture longer‑range dependencies, and evaluating the phrase embeddings on a broader set of NLP tasks. Overall, the paper demonstrates that jointly training a greedy parser and a compositional representation can achieve strong parsing performance while maintaining efficiency and producing useful hierarchical embeddings.

