In this paper, we present an open-source parsing environment (Tuebingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars, TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.
Deep Dive into TuLiPA: Towards a Multi-Formalism Parsing Environment for Grammar Engineering.
In this paper, we present an open-source parsing environment (Tuebingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars, TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German.
arXiv:0807.3622v1 [cs.CL] 23 Jul 2008
TuLiPA: Towards a Multi-Formalism Parsing Environment for
Grammar Engineering
Laura Kallmeyer
SFB 441
Universit¨at T¨ubingen
D-72074, T¨ubingen, Germany
lk@sfs.uni-tuebingen.de
Yannick Parmentier
CNRS - LORIA
Nancy Universit´e
F-54506, Vandœuvre, France
parmenti@loria.fr
Timm Lichte
SFB 441
Universit¨at T¨ubingen
D-72074, T¨ubingen, Germany
timm.lichte@uni-tuebingen.de
Johannes Dellert
SFB 441 - SfS
Universit¨at T¨ubingen
D-72074, T¨ubingen, Germany
{jdellert,kevang}@sfs.uni-tuebingen.de
Wolfgang Maier
SFB 441
Universit¨at T¨ubingen
D-72074, T¨ubingen, Germany
wo.maier@uni-tuebingen.de
Kilian Evang
SFB 441 - SfS
Universit¨at T¨ubingen
D-72074, T¨ubingen, Germany
Abstract
In this paper, we present an open-source
parsing environment (T¨ubingen Linguistic
Parsing Architecture, TuLiPA) which uses
Range Concatenation Grammar (RCG)
as a pivot formalism, thus opening the
way to the parsing of several mildly
context-sensitive formalisms.
This en-
vironment currently supports tree-based
grammars (namely Tree-Adjoining Gram-
mars (TAG) and Multi-Component Tree-
Adjoining Grammars with Tree Tuples
(TT-MCTAG)) and allows computation not
only of syntactic structures, but also of the
corresponding semantic representations. It
is used for the development of a tree-based
grammar for German.
1
Introduction
Grammars and lexicons represent important lin-
guistic resources for many NLP applications,
among which one may cite dialog systems, auto-
matic summarization or machine translation. De-
veloping such resources is known to be a complex
task that needs useful tools such as parsers and
generators (Erbach, 1992).
Furthermore, there is a lack of a common frame-
work allowing for multi-formalism grammar engi-
neering. Thus, many formalisms have been pro-
posed to model natural language, each coming
with specific implementations.
Having a com-
mon framework would facilitate the comparison
c⃝2008.
Licensed under the Creative Commons
Attribution-Noncommercial-Share Alike 3.0 Unported
li-
cense
(http://creativecommons.org/licenses/by-nc-sa/3.0/).
Some rights reserved.
between formalisms (e.g., in terms of parsing com-
plexity in practice), and would allow for a better
sharing of resources (e.g., having a common lex-
icon, from which different features would be ex-
tracted depending on the target formalism).
In this context, we present a parsing environ-
ment relying on a general architecture that can
be used for parsing with mildly context-sensitive
(MCS) formalisms1 (Joshi, 1987).
Its underly-
ing idea is to use Range Concatenation Grammar
(RCG) as a pivot formalism, for RCG has been
shown to strictly include MCS languages while be-
ing parsable in polynomial time (Boullier, 2000).
Currently, this architecture supports tree-based
grammars (Tree-Adjoining Grammars and Multi-
Component Tree-Adjoining Grammars with Tree
Tuples (Lichte, 2007)).
More precisely, tree-
based grammars are first converted into equivalent
RCGs, which are then used for parsing. The result
of RCG parsing is finally interpreted to extract a
derivation structure for the input grammar, as well
as to perform additional processings (e.g., seman-
tic calculus, extraction of dependency views).
The paper is structured as follows. In section 2,
we present the architecture of the TuLiPA parsing
environment and show how the use of RCG as a
pivot formalism makes it easier to design a modu-
lar system that can be extended to support several
dimensions (syntax, semantics) and/or formalisms.
In section 3, we give some desiderata for gram-
mar engineering and present TuLiPA’s current state
1A formalism is said to be mildly context sensitive (MCS)
iff (i) it generates limited cross-serial dependencies, (ii) it is
polynomially parsable, and (iii) the string languages gener-
ated by the formalism have the constant growth property (e.g.,
{a2n|n ≥0} does not have this property). Examples of MCS
formalisms include Tree-Adjoining Grammars, Combinatory
Categorial Grammars and Linear Indexed Grammars.
with respect to these. In section 4, we compare
this system with existing approaches for parsing
and more generally for grammar engineering. Fi-
nally, in section 5, we conclude by presenting fu-
ture work.
2
Range Concatenation Grammar as a
pivot formalism
The main idea underlying TuLiPA is to use RCG
as a pivot formalism for RCG has appealing for-
mal properties (e.g., a generative capacity ly-
ing beyond Linear Context Free Rewriting Sys-
tems and a polynomial parsing complexity) and
there exist efficient algorithms, for RCG parsing
(Boullier, 2000) and for grammar transformation
into RCG (Boullier, 1998; Boullier, 1999).
Parsing with TuLiPA is thus a 3-step process:
1. The input tree-based grammar is converted
into
an
RCG
(using
the
algorithm
of
Kallmeyer and Parmentier (2008) when deal-
ing with TT-MCTAG).
2. The resulting RCG is used for parsing the in-
put string using an extension of the parsing
algorithm of Boullier (2000).
3. The RCG deriva
…(Full text truncated)…
This content is AI-processed based on ArXiv data.