Framework and Resources for Natural Language Parser Evaluation

Reading time: 5 minutes
...

📝 Original Info

  • Title: Framework and Resources for Natural Language Parser Evaluation
  • ArXiv ID: 0712.3705
  • Date: 2007-12-24
  • Authors: Researchers from original ArXiv paper

📝 Abstract

Because of the wide variety of contemporary practices used in the automatic syntactic parsing of natural languages, it has become necessary to analyze and evaluate the strengths and weaknesses of different approaches. This research is all the more necessary because there are currently no genre- and domain-independent parsers that are able to analyze unrestricted text with 100% preciseness (I use this term to refer to the correctness of analyses assigned by a parser). All these factors create a need for methods and resources that can be used to evaluate and compare parsing systems. This research describes: (1) A theoretical analysis of current achievements in parsing and parser evaluation. (2) A framework (called FEPa) that can be used to carry out practical parser evaluations and comparisons. (3) A set of new evaluation resources: FiEval is a Finnish treebank under construction, and MGTS and RobSet are parser evaluation resources in English. (4) The results of experiments in which the developed evaluation framework and the two resources for English were used for evaluating a set of selected parsers.
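The abstract defines preciseness as the correctness of the analyses a parser assigns. One common way to quantify this for phrase-structure output, shown here only as an illustration, is PARSEVAL-style labeled bracket scoring: each constituent is reduced to a (label, start, end) span, and the spans in a parser's output are compared with those in a gold-standard tree. The sketch below assumes trees encoded as nested Python tuples and skips preterminal (part-of-speech) brackets. Both choices, the example trees, and the function names `labeled_spans` and `bracket_scores` are hypothetical and are not taken from FEPa.

```python
from collections import Counter
from typing import Dict, Tuple, Union

# Hypothetical encoding: a tree is a word (str) or a tuple (label, child, ...).
Tree = Union[str, tuple]


def labeled_spans(tree: Tree, start: int = 0) -> Tuple[Counter, int]:
    """Collect (label, start, end) brackets of `tree` and return its end index.

    Preterminals (a label directly over one word) are skipped, following the
    common convention of not scoring part-of-speech tags as brackets.
    """
    if isinstance(tree, str):  # a word occupies exactly one position
        return Counter(), start + 1
    label, *children = tree
    spans: Counter = Counter()
    end = start
    for child in children:
        child_spans, end = labeled_spans(child, end)
        spans.update(child_spans)
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    if not is_preterminal:
        spans[(label, start, end)] += 1
    return spans, end


def bracket_scores(gold: Tree, test: Tree) -> Dict[str, float]:
    """Labeled bracket precision, recall and F1 of `test` against `gold`."""
    gold_spans, _ = labeled_spans(gold)
    test_spans, _ = labeled_spans(test)
    matched = sum((gold_spans & test_spans).values())  # multiset intersection
    precision = matched / sum(test_spans.values()) if test_spans else 0.0
    recall = matched / sum(gold_spans.values()) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# "I saw the man with a telescope": the gold tree attaches the PP to the verb
# phrase; the (hypothetical) parser output attaches it to the noun phrase.
gold_tree = ("S", ("NP", ("PRP", "I")),
             ("VP", ("V", "saw"), ("NP", ("D", "the"), ("N", "man")),
              ("PP", ("P", "with"), ("NP", ("D", "a"), ("N", "telescope")))))
parser_output = ("S", ("NP", ("PRP", "I")),
                 ("VP", ("V", "saw"),
                  ("NP", ("NP", ("D", "the"), ("N", "man")),
                   ("PP", ("P", "with"), ("NP", ("D", "a"), ("N", "telescope"))))))

scores = bracket_scores(gold_tree, parser_output)
print({name: round(value, 3) for name, value in scores.items()})
# prints {'precision': 0.857, 'recall': 1.0, 'f1': 0.923}
```

In this example the misattached PP yields one extra bracket, NP(2,7), so every gold bracket is found (recall 1.0) while one of the seven output brackets is wrong (precision 6/7).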

📄 Full Content

I wish to extend my thanks and acknowledgement to everyone who has helped me with this dissertation. I would like to express my gratitude to my supervisor, Professor Erkki Sutinen, without whom I would never have begun this project. It was Professor Sutinen's encouragement that originally inspired me to undertake scientific research. I would also like to thank PhD Stefan Werner for acting as the co-supervisor, Roger Loveday for proofreading the manuscript, and Phil. Lic. Simo Vihjanen of Lingsoft Inc for initially directing me to the field of parser evaluation. The comments of the two reviewers, Professor Mike Joy and PhD Krister Lindén, were valuable for preparing the final version of the manuscript. I thank them for their work.

My visits to other research institutes and groups were an integral part of the research process. I appreciated from the very beginning how contacts with other researchers with different skills and interests enabled me to develop my ideas, and I would like to extend a warm word of thanks to all those with whom I made contact during this research. I would in particular like to thank Professor Koenraad de Smedt, who acted as my supervisor during my stay in Bergen, Norway, in 2005, where I was privileged to be a research fellow at the Marie Curie Early Stage Researcher Training Site MULTILINGUA. I would also like to thank Professor Etienne Barnard, PhD Willie Smit and others on the staff of the Human Language Technology Group of the Council for Scientific and Industrial Research (CSIR) in Pretoria, South Africa, for creating a pleasant working environment in which I was able to accomplish a great deal. I would also like to thank Professor Joško Božanić for giving me permission to work in the friendly atmosphere of the Faculty of Philosophy at the University of Split, Croatia, in 2006 and 2007. Working …

  • 9.1.2 Preciseness
  • 9.1.3 Coverage
  • 9.1.4 Robustness
  • 9.1.5 Efficiency
  • 9.1.6 …
  • 10.3.1 Previous work
  • 10.3.2 Test settings
  • 10.3.3 Results
  • 10.3.4 Conclusion
  • 10.4.1 Previous work
  • 10.4.2 Test settings
  • 10.4.3 …
  • 10.6.1 Test settings
  • 10.6.2 Results
  • 10.6.3 Conclusion

This thesis reports research into the syntactic parsing of natural languages and the evaluation of parsing systems. In this work, techniques and algorithms for parsing have been analyzed and compared on the theoretical level, and resources, methods and tools for the practical evaluation and comparison of syntactic parsers have been designed and implemented. A natural language is a language that has evolved through use in a social system and is used by human beings for everyday communication. A grammar specifies the rules by which sentences are constructed from their parts. Parsing is the process of identifying the syntactic structure of a given sentence. A natural language parser is computer software that automatically performs parsing and outputs the structural description of a given character string in the context of a specific grammar. The output of a parser is called a parse, and it describes the structure of a particular analyzed language fragment.
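The definitions of grammar, parser and parse can be made concrete with a small chart parser. The following is a minimal sketch using the CKY algorithm, assuming a hypothetical toy grammar in Chomsky normal form; the grammar, the encoding of trees as nested tuples, and the name `cky_parse` are illustrative only and do not correspond to any parser discussed in the thesis. Given a character string split into words, it returns a parse (a structural description of the string licensed by the grammar) or `None` when the grammar assigns the string no structure.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (CNF).
# Binary rules A -> B C, stored as (B, C) -> {A, ...}.
BINARY = {
    ("NP", "VP"): {"S"},
    ("D", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules A -> word, stored as word -> {A, ...}.
LEXICON = {
    "the": {"D"}, "a": {"D"},
    "dog": {"N"}, "cat": {"N"},
    "saw": {"V"},
}


def cky_parse(words, start="S"):
    """Return one parse tree (nested tuples) for `words`, or None if none exists.

    chart[i][j] maps each category that can span words[i:j] to a back-pointer.
    """
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, ()):
            chart[i][i + 1][cat] = word  # leaf back-pointer: the word itself
    for length in range(2, n + 1):  # fill longer spans from shorter ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # every split point inside the span
                for left, right in product(chart[i][k], chart[k][j]):
                    for parent in BINARY.get((left, right), ()):
                        # Keep only the first analysis found for each category.
                        chart[i][j].setdefault(parent, (k, left, right))

    def build(i, j, cat):
        """Follow back-pointers to rebuild the tree rooted at `cat` over [i, j)."""
        back = chart[i][j][cat]
        if isinstance(back, str):
            return (cat, back)
        k, left, right = back
        return (cat, build(i, k, left), build(k, j, right))

    return build(0, n, start) if start in chart[0][n] else None


print(cky_parse("the dog saw a cat".split()))
# prints ('S', ('NP', ('D', 'the'), ('N', 'dog')), ('VP', ('V', 'saw'), ('NP', ('D', 'a'), ('N', 'cat'))))
print(cky_parse("dog the saw".split()))
# prints None
```

The toy grammar is unambiguous, so keeping the first analysis per chart cell never matters here. With a realistic grammar and unrestricted text, choosing among competing analyses is the hard part, and the correctness of that choice is what parser evaluation tries to measure.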

Because of the ubiquity of the Internet, among other factors, the amount of available textual information has grown explosively in recent decades. This has resulted in an ever-increasing demand for software that can automatically pro

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.