TuLiPA: Towards a Multi-Formalism Parsing Environment for Grammar Engineering


📝 Original Info

  • Title: TuLiPA: Towards a Multi-Formalism Parsing Environment for Grammar Engineering
  • ArXiv ID: 0807.3622
  • Date: 2009-09-29
  • Authors: Laura Kallmeyer, Yannick Parmentier, Timm Lichte, Johannes Dellert, Wolfgang Maier, Kilian Evang

📝 Abstract

In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.

📄 Full Content

arXiv:0807.3622v1 [cs.CL] 23 Jul 2008

TuLiPA: Towards a Multi-Formalism Parsing Environment for Grammar Engineering

  • Laura Kallmeyer, SFB 441, Universität Tübingen, D-72074 Tübingen, Germany, lk@sfs.uni-tuebingen.de
  • Yannick Parmentier, CNRS - LORIA, Nancy Université, F-54506 Vandœuvre, France, parmenti@loria.fr
  • Timm Lichte, SFB 441, Universität Tübingen, D-72074 Tübingen, Germany, timm.lichte@uni-tuebingen.de
  • Johannes Dellert, SFB 441 - SfS, Universität Tübingen, D-72074 Tübingen, Germany, jdellert@sfs.uni-tuebingen.de
  • Wolfgang Maier, SFB 441, Universität Tübingen, D-72074 Tübingen, Germany, wo.maier@uni-tuebingen.de
  • Kilian Evang, SFB 441 - SfS, Universität Tübingen, D-72074 Tübingen, Germany, kevang@sfs.uni-tuebingen.de

Abstract

In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.

1 Introduction

Grammars and lexicons represent important linguistic resources for many NLP applications, among which one may cite dialog systems, automatic summarization or machine translation. Developing such resources is known to be a complex task that needs useful tools such as parsers and generators (Erbach, 1992).

Furthermore, there is a lack of a common framework allowing for multi-formalism grammar engineering. Thus, many formalisms have been proposed to model natural language, each coming with specific implementations. Having a common framework would facilitate the comparison between formalisms (e.g., in terms of parsing complexity in practice), and would allow for a better sharing of resources (e.g., having a common lexicon, from which different features would be extracted depending on the target formalism).

In this context, we present a parsing environment relying on a general architecture that can be used for parsing with mildly context-sensitive (MCS) formalisms¹ (Joshi, 1987). Its underlying idea is to use Range Concatenation Grammar (RCG) as a pivot formalism, for RCG has been shown to strictly include MCS languages while being parsable in polynomial time (Boullier, 2000).

Currently, this architecture supports tree-based grammars (Tree-Adjoining Grammars and Multi-Component Tree-Adjoining Grammars with Tree Tuples (Lichte, 2007)). More precisely, tree-based grammars are first converted into equivalent RCGs, which are then used for parsing. The result of RCG parsing is finally interpreted to extract a derivation structure for the input grammar, as well as to perform additional processing (e.g., semantic calculus, extraction of dependency views).

The paper is structured as follows. In section 2, we present the architecture of the TuLiPA parsing environment and show how the use of RCG as a pivot formalism makes it easier to design a modular system that can be extended to support several dimensions (syntax, semantics) and/or formalisms. In section 3, we give some desiderata for grammar engineering and present TuLiPA's current state with respect to these. In section 4, we compare this system with existing approaches for parsing and more generally for grammar engineering. Finally, in section 5, we conclude by presenting future work.

© 2008. Licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported license (http://creativecommons.org/licenses/by-nc-sa/3.0/). Some rights reserved.

¹ A formalism is said to be mildly context sensitive (MCS) iff (i) it generates limited cross-serial dependencies, (ii) it is polynomially parsable, and (iii) the string languages generated by the formalism have the constant growth property (e.g., $\{a^{2^n} \mid n \geq 0\}$ does not have this property). Examples of MCS formalisms include Tree-Adjoining Grammars, Combinatory Categorial Grammars and Linear Indexed Grammars.

2 Range Concatenation Grammar as a pivot formalism

The main idea underlying TuLiPA is to use RCG as a pivot formalism, for RCG has appealing formal properties (e.g., a generative capacity lying beyond Linear Context Free Rewriting Systems and a polynomial parsing complexity) and there exist efficient algorithms for RCG parsing (Boullier, 2000) and for grammar transformation into RCG (Boullier, 1998; Boullier, 1999).

Parsing with TuLiPA is thus a 3-step process:

1. The input tree-based grammar is converted into an RCG (using the algorithm of Kallmeyer and Parmentier (2008) when dealing with TT-MCTAG).

2. The resulting RCG is used for parsing the input string using an extension of the parsing algorithm of Boullier (2000).

3. The RCG deriva

…(Full text truncated)…
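
To make the notion of a Range Concatenation Grammar more concrete, here is a minimal sketch of a recognizer for one small RCG. Everything in it is an illustrative assumption, not taken from TuLiPA: the grammar is the familiar textbook RCG for the non-context-free language a^n b^n c^n, with clauses S(XYZ) → A(X, Y, Z), A(aX, bY, cZ) → A(X, Y, Z) and A(ε, ε, ε) → ε, and the names `Range`, `is_empty`, `pred_a` and `recognize` are hypothetical. The clauses are hard-coded, so this is neither a general RCG parser nor the algorithm of Boullier (2000); it only shows how predicates are evaluated on ranges (spans) of the input string.

```python
from typing import Tuple

# A range is a half-open span (start, end) over the input string.
Range = Tuple[int, int]


def is_empty(r: Range) -> bool:
    """True if the range covers no input symbol."""
    return r[0] == r[1]


def pred_a(w: str, x: Range, y: Range, z: Range) -> bool:
    """Predicate A of the toy grammar, evaluated on three ranges of w."""
    # Clause A(eps, eps, eps) -> eps
    if is_empty(x) and is_empty(y) and is_empty(z):
        return True
    # Clause A(aX, bY, cZ) -> A(X, Y, Z): each range must start with its
    # terminal; the remainders X, Y, Z are passed to the recursive call.
    if not (is_empty(x) or is_empty(y) or is_empty(z)):
        if w[x[0]] == "a" and w[y[0]] == "b" and w[z[0]] == "c":
            return pred_a(w, (x[0] + 1, x[1]), (y[0] + 1, y[1]), (z[0] + 1, z[1]))
    return False


def recognize(w: str) -> bool:
    """Start clause S(XYZ) -> A(X, Y, Z).

    The argument XYZ must cover the whole input, so every way of cutting
    w into three adjacent ranges is tried.
    """
    n = len(w)
    for i in range(n + 1):
        for j in range(i, n + 1):
            if pred_a(w, (0, i), (i, j), (j, n)):
                return True
    return False


if __name__ == "__main__":
    for s in ["", "abc", "aabbcc", "aabbc", "abcabc"]:
        print(repr(s), recognize(s))
```

The point of the sketch is that RCG predicates take ranges of the input, not derived strings, as arguments. The start clause enumerates quadratically many splits and each call to `pred_a` does at most linear work, so even this naive version stays polynomial in the input length.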

Reference

This content is AI-processed based on ArXiv data.
