Entity-Augmented Distributional Semantics for Discourse Relations


Discourse relations bind smaller linguistic elements into coherent texts. However, automatically identifying discourse relations is difficult, because it requires understanding the semantics of the linked sentences. A more subtle challenge is that it is not enough to represent the meaning of each sentence of a discourse relation, because the relation may depend on links between lower-level elements, such as entity mentions. Our solution computes distributional meaning representations by composition up the syntactic parse tree. A key difference from previous work on compositional distributional semantics is that we also compute representations for entity mentions, using a novel downward compositional pass. Discourse relations are predicted not only from the distributional representations of the sentences, but also of their coreferent entity mentions. The resulting system obtains substantial improvements over the previous state-of-the-art in predicting implicit discourse relations in the Penn Discourse Treebank.


💡 Research Summary

The paper tackles the challenging problem of automatically identifying implicit discourse relations, which are crucial for higher‑level language understanding tasks such as summarization, sentiment analysis, and coherence evaluation. Implicit relations lack explicit connective cues, making them difficult to detect with surface‑level features alone. The authors argue that representing each sentence as a single vector is insufficient because discourse relations often hinge on the roles that coreferent entities play across sentences. To address this, they propose a novel “entity‑augmented distributional semantics” framework that jointly learns vector representations for whole sentences and for each entity mention that participates in the discourse.

The model consists of two compositional passes over a binary constituency parse tree. In the upward pass, a standard Recursive Neural Network (RNN) composition is applied: each non‑terminal node receives a K‑dimensional vector computed from its two children's vectors via a learned matrix U and a tanh non‑linearity, ultimately yielding a sentence‑level vector u₀ at the root. The downward pass is the key innovation: proceeding top‑down, each node i receives a downward vector dᵢ computed from its parent's downward vector d_{ρ(i)} and its sibling's upward vector u_{s(i)} via a second learned matrix V and tanh. The root's downward vector is initialized to the upward root vector u₀, ensuring that the downward flow starts from the global sentence meaning. The downward vectors at entity‑mention nodes then serve as the mention representations: each coreferent entity receives a vector that encodes its semantic role within the discourse context.
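The two passes can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the `Node` class, the toy word embeddings, and the randomly initialized matrices `U` and `V` are all stand‑ins (in the paper, U and V are learned jointly with the classifier).

```python
import numpy as np

K = 4  # embedding dimensionality (kept small for illustration)
rng = np.random.default_rng(0)

# Stand-ins for learned parameters: U drives the upward composition,
# V the downward one. Both are random here, trained in the real model.
U = rng.normal(scale=0.1, size=(K, 2 * K))
V = rng.normal(scale=0.1, size=(K, 2 * K))

# Toy word embeddings (assumed pretrained in a real system).
embed = {w: rng.normal(scale=0.1, size=K) for w in ("the", "dog", "barked")}

class Node:
    def __init__(self, word=None, left=None, right=None):
        self.word, self.left, self.right = word, left, right
        self.u = None  # upward (compositional) vector
        self.d = None  # downward (role-specific) vector

def upward(node):
    """u_i = tanh(U [u_left; u_right]); leaves use word embeddings."""
    if node.word is not None:
        node.u = embed[node.word]
    else:
        node.u = np.tanh(U @ np.concatenate([upward(node.left),
                                             upward(node.right)]))
    return node.u

def downward(node, d=None):
    """Root: d_0 = u_0.  Child i: d_i = tanh(V [d_parent; u_sibling])."""
    node.d = node.u if d is None else d
    if node.word is None:
        downward(node.left,
                 np.tanh(V @ np.concatenate([node.d, node.right.u])))
        downward(node.right,
                 np.tanh(V @ np.concatenate([node.d, node.left.u])))

# Binary parse tree for "((the dog) barked)".
tree = Node(left=Node(left=Node(word="the"), right=Node(word="dog")),
            right=Node(word="barked"))
u0 = upward(tree)
downward(tree)
mention = tree.left  # suppose "the dog" is an entity mention
```

After both passes, `mention.d` is the role‑specific vector that would feed the entity‑interaction term of the classifier.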

For relation classification, the authors define a bilinear scoring function:
ψ(y) = (u_0^{(m)})^⊤ A_y u_0^{(n)} + Σ_{(i,j)∈A(m,n)} (d_i^{(m)})^⊤ B_y d_j^{(n)} + β_y^⊤ φ(m,n) + b_y.
The first term captures interactions between the two sentence vectors, the second term aggregates interactions between every pair of coreferent entity vectors across the two sentences, φ(m,n) is a small set of handcrafted surface features (e.g., presence of a connective, sentence length), and b_y is a bias. The predicted relation is the label y that maximizes ψ(y). Parameters A_y, B_y, β_y are learned jointly with the composition matrices U and V using a discriminative loss over the training data.
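Scoring and prediction reduce to a handful of matrix products. The sketch below assumes the representations from the compositional passes are already computed; the parameters `A`, `B`, `beta`, and `b`, the feature vector, and the toy inputs are hypothetical stand‑ins (in the paper these parameters are learned discriminatively).

```python
import numpy as np

K, F, n_labels = 4, 3, 2  # vector dim, #surface features, #relation labels
rng = np.random.default_rng(1)

# Per-label parameters (random stand-ins; learned jointly in the paper):
A = rng.normal(scale=0.1, size=(n_labels, K, K))  # sentence-sentence bilinear
B = rng.normal(scale=0.1, size=(n_labels, K, K))  # entity-entity bilinear
beta = rng.normal(scale=0.1, size=(n_labels, F))  # surface-feature weights
b = rng.normal(scale=0.1, size=n_labels)          # per-label bias

def psi(y, u_m, u_n, entity_pairs, phi):
    """Score label y for the sentence pair (m, n).

    entity_pairs holds (d_i^(m), d_j^(n)) vectors for aligned coreferent
    mentions; phi is the handcrafted surface-feature vector phi(m, n)."""
    s = u_m @ A[y] @ u_n + beta[y] @ phi + b[y]
    for d_m, d_n in entity_pairs:
        s += d_m @ B[y] @ d_n
    return s

def predict(u_m, u_n, entity_pairs, phi):
    """Return the label y maximizing psi(y)."""
    return max(range(n_labels),
               key=lambda y: psi(y, u_m, u_n, entity_pairs, phi))

# Toy example with one aligned coreferent entity pair.
u_m, u_n = rng.normal(size=K), rng.normal(size=K)
pairs = [(rng.normal(size=K), rng.normal(size=K))]
phi_mn = np.array([1.0, 8.0, 12.0])  # toy surface features
y_hat = predict(u_m, u_n, pairs, phi_mn)
```

With no coreferent entities, `entity_pairs` is empty and the entity term vanishes, which matches the paper's observation that the model's benefit is reduced in that case.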

Experiments are conducted on the Penn Discourse Treebank (PDTB) 2.0, focusing on the second‑level implicit relation classification task. The authors automatically generate constituency parses with Stanford CoreNLP and coreference links with the Berkeley coreference system. They compare several configurations: (1) a baseline using only surface features (Lin et al., 2009), (2) their model with only sentence vectors, (3) only entity vectors, (4) sentence + surface features, and (5) the full model combining sentence, entity, and surface features. The full model achieves 43.56 % accuracy, surpassing the prior state‑of‑the‑art (≈40.2 %) by 3.4 % absolute (p < 0.05). Notably, adding entity vectors yields a larger gain than adding surface features alone, confirming that entity‑role information is highly discriminative for implicit discourse relations.

The paper’s contributions are threefold: (i) introducing a downward compositional pass that produces role‑specific entity embeddings, (ii) integrating sentence‑level and entity‑level interactions in a unified bilinear classifier, and (iii) demonstrating empirically that this joint representation outperforms strong feature‑based baselines. The authors also discuss the importance of syntactic structure for semantic composition, arguing that purely sequential models (e.g., left‑to‑right LSTMs) may struggle to capture the hierarchical role information essential for discourse parsing. Limitations include dependence on the quality of automatic parses and coreference resolution, and reduced benefit when no coreferent entities are present.

In conclusion, the work shows that enriching distributional sentence semantics with entity‑specific vectors, derived via a novel downward pass over parse trees, substantially improves implicit discourse relation identification. Future directions suggested include making the composition robust to parsing errors, extending the approach to multilingual settings where parsers and coreference tools are less mature, and exploring deeper or attention‑based variants of the upward/downward composition to capture more nuanced discourse phenomena.

