Overview of Annotation Creation: Processes & Tools
To appear in James Pustejovsky & Nancy Ide (2016) “Handbook of Linguistic Annotation.” New York: Springer.

Mark A. Finlayson and Tomaž Erjavec
Abstract
Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour potentially involving many people, stages, and tools. This chapter outlines the process of creating end-to-end linguistic annotations, identifying specific tasks that researchers often perform. Because tool support is so central to achieving high-quality, reusable annotations at low cost, the focus is on identifying capabilities that are necessary or useful for annotation tools, as well as common problems these tools present that reduce their utility. Although examples of specific tools are provided in many cases, this chapter concentrates more on abstract capabilities and problems because new tools appear continuously, while old tools disappear into disuse or disrepair. The two core capabilities tools must have are support for the chosen annotation scheme and the ability to work on the language under study. Additional capabilities are organized into three categories: those that are widely provided; those that are often useful but found in only a few tools; and those that have as yet little or no available tool support.
1 Annotation: More than just a scheme
Creating manually annotated linguistic corpora requires more than just a reliable annotation scheme. A reliable scheme, of course, is a central ingredient of successful annotation; but even the most carefully designed scheme will not answer a number of practical questions about how to actually create the annotations, progressing from raw linguistic data to annotated linguistic artifacts that can be used to answer interesting questions or do interesting things. Annotation, especially high-quality annotation of large language datasets, can be a complex process potentially involving many people, stages, and tools, and the scheme only specifies the conceptual content of the annotation. By way of example, the following questions are relevant to a text annotation project and are not answered by a scheme:
How should linguistic artifacts be prepared? Will the originals be annotated directly, or will their textual content be extracted into separate files for annotation? In the latter case, what layout or formatting will be kept (lines, paragraphs, page breaks, section headings, highlighted text)? What file format will be used? How will typographical errors be handled? Will typos be ignored, changed in the original, changed in the extracted content, or encoded as an additional annotation? Who will be allowed to make corrections: the annotators themselves, adjudicators, or perhaps only the project manager?
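One common answer to the typo question is to leave the source text untouched and record each correction as a stand-off annotation anchored by character offsets. The following is a minimal, hypothetical sketch; the `Annotation` class and its field names are illustrative and not taken from any particular annotation tool:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int   # character offset where the annotated span begins
    end: int     # character offset where the span ends (exclusive)
    tag: str     # annotation type, e.g. "typo-correction"
    value: str   # the corrected form; the original text is left untouched

text = "The cat sat on teh mat."
typo = Annotation(start=15, end=18, tag="typo-correction", value="the")

# The original artifact is preserved; the correction lives only in the
# stand-off annotation, so the raw data remains citable and unmodified.
assert text[typo.start:typo.end] == "teh"
```

Because the correction is stored separately, different project roles (annotator, adjudicator, manager) can be granted or denied permission to create such records without anyone editing the source files.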
How will annotators be provided with artifacts to annotate? How will the order of annotation be specified (if at all), and how will this order be enforced? How will the project manager ensure that each document is annotated the appropriate number of times (e.g., by two different people for double annotation)?
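The double-annotation bookkeeping described above can be automated. Here is a minimal sketch in Python, under the assumption that documents are assigned to rotating pairs of annotators to balance workload; the function name and assignment strategy are illustrative, not a standard:

```python
from itertools import combinations, cycle

def assign_double_annotation(documents, annotators):
    """Assign each document to two distinct annotators, cycling through
    all annotator pairs so the workload is spread roughly evenly."""
    pairs = cycle(combinations(annotators, 2))
    return {doc: next(pairs) for doc in documents}

docs = ["doc1", "doc2", "doc3", "doc4"]
people = ["ana", "ben", "cid"]
plan = assign_double_annotation(docs, people)

# Every document is assigned to exactly two different annotators.
assert all(len(set(pair)) == 2 for pair in plan.values())
```

A real project manager's tool would also need to enforce the plan, e.g. by only releasing a document to the annotators it was assigned to.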
What inter-annotator agreement measures (IAAs) will be computed, and when? Will IAAs be measured continuously, on batches, or on other subsets of the corpus? How will their measurement at the right time be enforced? Will IAAs be used to track annotator training? If so, what level of IAA will be considered to indicate that training has succeeded?
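As a concrete example of one widely used IAA measure, Cohen's kappa compares the observed agreement between two annotators against the agreement expected by chance from each annotator's label distribution. A minimal sketch (illustrative, not tied to any particular annotation tool):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Proportion of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["N", "V", "N", "N", "V", "N"]
b = ["N", "V", "N", "V", "V", "N"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

What threshold counts as "training has succeeded" is a project-level decision; commonly cited rules of thumb treat values above roughly 0.8 as strong agreement, but the appropriate level depends on the difficulty of the scheme.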
These questions are only a small selection of those that arise during the practical process of conducting annotation. The first goal of this chapter is to give an overview of the process of annotation from start to finish, pointing out these sorts of questions and subtasks at each stage. We will start with a known conceptual framework for the annotation process, the MATTER framework (Pustejovsky & Stubbs, 2013), and expand upon it. Our expanded framework is not guaranteed to be complete, but it will give the reader a strong flavor of the kinds of issues that arise, so that they can anticipate them in the design of their own annotation project.
The second goal is to explore the capabilities required of annotation tools. Tool support is central to achieving high-quality, reusable annotations at low cost. The focus will be on identifying capabilities that are necessary or useful for annotation tools. Again, this list will not be exhaustive, but it will be fairly representative, as the majority of it was generated by surveying a number of annotation experts about their opinions of available tools. Also listed are common problems that reduce tool utility (gathered during the same survey). Although specific examples of tools will be provided in many cases, the focus will be on more abstract capabilities and problems, because new tools appear all the time while old tools disappear into disuse or disrepair.
Before beginning, it is well to first i