Proc. Int’l Conf. on Dublin Core and Metadata Applications 2008
Comparing human and automatic thesaurus mapping
approaches in the agricultural domain
Boris Lauser
Gudrun Johannsen
Caterina Caracciolo
Johannes Keizer
FAO,
Italy
boris.lauser@fao.org
gudrun.johannsen@fao.org
caterina.caracciolo@fao.org
johannes.keizer@fao.org
Willem Robert van Hage
TNO Science & Industry /
Vrije Universiteit
Amsterdam,
the Netherlands
wrvhage@few.vu.nl
Philipp Mayr
GESIS Social Science
Information Centre
Bonn,
Germany
philipp.mayr@gesis.org
Abstract
Knowledge organization systems (KOS), like thesauri and other controlled vocabularies, are used
to provide subject access to information systems across the web. Due to the heterogeneity of
these systems, mapping between vocabularies becomes crucial for retrieving relevant
information. However, mapping thesauri is a laborious task, and considerable effort is therefore
being made to automate the mapping process. This paper examines two mapping approaches
involving the agricultural thesaurus AGROVOC, one machine-created and one human-created. We
address the basic question “What are the pros and cons of human and automatic mapping and
how can they complement each other?” By pointing out the difficulties in specific cases or groups
of cases and grouping the sample into simple and difficult types of mappings, we show the
limitations of current automatic methods and offer some basic recommendations on which
approach to use when.
Keywords: mapping thesauri, knowledge organization systems, intellectual mapping, ontology
matching.
- Introduction
Information on the Internet is constantly growing and with it the number of digital libraries,
databases and information management systems. Each system describes its metadata in a
different way and uses different sets of keywords, thesauri and other knowledge organization
systems (KOS) to describe its subject content. Accessing and synthesizing information by subject
across distributed databases is a challenging task, and retrieving all information available on a
specific subject in different information systems is nearly impossible. One of the reasons is the
different vocabularies used for subject indexing. For example, one system might use the keyword
‘snakes’, whereas the other system uses the taxonomic name ‘Serpentes’ to classify information
about the same subject. If users are not aware of the different ‘languages’ used by the systems,
they might not be able to find all the relevant information. If, however, the system itself “knows”,
by means of mappings, that ‘snakes’ maps to ‘Serpentes’, the system can appropriately translate
the user’s query and therefore retrieve the relevant information without the user having to know
about all synonyms or variants used in the different databases.
Mapping major thesauri and other knowledge organization systems in specific domains of
interest can therefore greatly enhance the access to information in these domains. System
developers for library search applications can programmatically incorporate mapping files into
the search applications. The mappings can hence be utilized at query time to translate a user
query into the terminology used in the different systems of the available mappings and seamlessly
retrieve consolidated information from various databases.
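The query-time translation described above can be sketched as follows. This is a minimal illustration, not the mechanism used by any of the systems discussed: the mapping table, the vocabulary names `vocab_a` and `vocab_b`, and the function `translate_query` are all hypothetical. Only the pair ‘snakes’ and ‘Serpentes’ is taken from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping table: (source vocabulary, lower-cased term) ->
# {target vocabulary: equivalent term}. The vocabulary names are placeholders;
# only the 'snakes' / 'Serpentes' pair comes from the text.
Mappings = Dict[Tuple[str, str], Dict[str, str]]

MAPPINGS: Mappings = {
    ("vocab_a", "snakes"): {"vocab_b": "Serpentes"},
}


def translate_query(
    term: str,
    source: str,
    targets: List[str],
    mappings: Mappings = MAPPINGS,
) -> Dict[str, str]:
    """Return the term to send to each vocabulary's search system.

    The source vocabulary keeps the user's term. Each target vocabulary
    gets its mapped term, or the unchanged term if no mapping is known.
    """
    known = mappings.get((source, term.lower()), {})
    result = {source: term}
    for target in targets:
        result[target] = known.get(target, term)
    return result


if __name__ == "__main__":
    # One query per system, each in that system's own vocabulary.
    print(translate_query("snakes", "vocab_a", ["vocab_b"]))
```

Falling back to the unchanged term when no mapping exists is one possible design choice. A real application might instead skip the target system or flag the term as unmapped.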
Mappings are usually established by domain experts, but this is a very labor-intensive,
time-consuming and error-prone task (Doerr, 2001). For this reason, great attention is being
devoted to the possibility of creating mappings in an automatic or semi-automatic way
(Vizine-Goetz, Hickey, Houghton, Thompsen, 2004; Euzenat & Shvaiko, 2007; Kalfoglou &
Schorlemmer, 2003; Maedche, Motik, Silva, Volz, 2002). So far, however, research has focused
mainly on the quantitative analysis of automatically obtained mappings, i.e. purely in terms of
the precision and recall of either end-to-end document retrieval or the sets of mappings
produced by a system. Little attention has been paid to comparing manual and
automatic mappings. A qualitative analysis is necessary to learn how and when automatic
techniques are a suitable alternative to high-quality but very expensive manual mapping. This
paper aims to fill that gap. We will elaborate on mappings between three KOS in the agricultural
domain: AGROVOC, NALT and SWD.
• AGROVOC is a multilingual, structured and controlled vocabulary designed to cover
the terminology of all subject fields in agriculture, forestry, fisheries, food and related
domains (e.g. environment). The AGROVOC Thesaurus was developed by the Food
and Agriculture Organization of the United Nations (FAO) and the European
Commission in the early 1980s. It is currently available online in 17 languages (more
are under development) and cont
…(Full text truncated)…