Comparing human and automatic thesaurus mapping approaches in the agricultural domain

📝 Original Info

  • Title: Comparing human and automatic thesaurus mapping approaches in the agricultural domain
  • ArXiv ID: 0808.2246
  • Date: 2019-01-15
  • Authors: Boris Lauser, Gudrun Johannsen, Caterina Caracciolo, Johannes Keizer, Willem Robert van Hage, Philipp Mayr

📝 Abstract

Knowledge organization systems (KOS), such as thesauri and other controlled vocabularies, are used to provide subject access to information systems across the web. Because these systems are heterogeneous, mapping between vocabularies becomes crucial for retrieving relevant information. However, mapping thesauri is a laborious task, and considerable effort is therefore being made to automate the mapping process. This paper examines two mapping approaches involving the agricultural thesaurus AGROVOC, one machine-created and one human-created. We address the basic question "What are the pros and cons of human and automatic mapping and how can they complement each other?" By pointing out the difficulties in specific cases or groups of cases and grouping the sample into simple and difficult types of mappings, we show the limitations of current automatic methods and offer some basic recommendations on which approach to use when.

📄 Full Content

Proc. Int’l Conf. on Dublin Core and Metadata Applications 2008

Comparing human and automatic thesaurus mapping approaches in the agricultural domain

Boris Lauser, Gudrun Johannsen, Caterina Caracciolo, Johannes Keizer
FAO, Italy
boris.lauser@fao.org, gudrun.johannsen@fao.org, caterina.caracciolo@fao.org, johannes.keizer@fao.org

Willem Robert van Hage
TNO Science & Industry / Vrije Universiteit Amsterdam, the Netherlands
wrvhage@few.vu.nl

Philipp Mayr
GESIS Social Science Information Centre, Bonn, Germany
philipp.mayr@gesis.org

Abstract

Knowledge organization systems (KOS), such as thesauri and other controlled vocabularies, are used to provide subject access to information systems across the web. Because these systems are heterogeneous, mapping between vocabularies becomes crucial for retrieving relevant information. However, mapping thesauri is a laborious task, and considerable effort is therefore being made to automate the mapping process. This paper examines two mapping approaches involving the agricultural thesaurus AGROVOC, one machine-created and one human-created. We address the basic question “What are the pros and cons of human and automatic mapping and how can they complement each other?” By pointing out the difficulties in specific cases or groups of cases and grouping the sample into simple and difficult types of mappings, we show the limitations of current automatic methods and offer some basic recommendations on which approach to use when.

Keywords: mapping thesauri, knowledge organization systems, intellectual mapping, ontology matching.

  1. Introduction

    Information on the Internet is constantly growing, and with it the number of digital libraries, databases and information management systems. Each system describes its metadata in its own way and uses different sets of keywords, thesauri and other knowledge organization systems (KOS) to describe its subject content. Accessing and synthesizing information by subject across distributed databases is a challenging task, and retrieving all information available on a specific subject in different information systems is nearly impossible. One of the reasons is the different vocabularies used for subject indexing. For example, one system might use the keyword ‘snakes’, whereas another uses the taxonomic name ‘Serpentes’ to classify information about the same subject. If users are not aware of the different ‘languages’ used by the systems, they might not be able to find all the relevant information. If, however, the system itself “knows”, by means of mappings, that ‘snakes’ maps to ‘Serpentes’, it can translate the user’s query appropriately and retrieve the relevant information without the user having to know all the synonyms or variants used in the different databases.
    Mapping major thesauri and other knowledge organization systems in specific domains of interest can therefore greatly enhance access to information in these domains. Developers of library search applications can programmatically incorporate mapping files into those applications. The mappings can then be used at query time to translate a user query into the terminology used in the different systems covered by the available mappings and seamlessly retrieve consolidated information from various databases.
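The query-time translation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory table `MAPPINGS`, the system names `keyword_db` and `taxonomic_db`, and the helper functions are all hypothetical, and only the ‘snakes’ to ‘Serpentes’ pair is taken from the text. A real application would presumably load the mappings from a mapping file rather than hard-code them.

```python
from typing import Dict, List

# Hypothetical mapping table: user keyword -> {system name: term used there}.
# Only the 'snakes' -> 'Serpentes' pair comes from the text; the system
# names are invented for illustration.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "snakes": {"keyword_db": "snakes", "taxonomic_db": "Serpentes"},
}


def translate_query(term: str, system: str,
                    mappings: Dict[str, Dict[str, str]] = MAPPINGS) -> str:
    """Return the term that `system` uses for `term`.

    Falls back to the original term when no mapping is known, so an
    unmapped query is still sent to the system unchanged.
    """
    key = term.strip().lower()
    return mappings.get(key, {}).get(system, term)


def federated_queries(term: str, systems: List[str],
                      mappings: Dict[str, Dict[str, str]] = MAPPINGS) -> Dict[str, str]:
    """Build one translated query term per target system."""
    return {system: translate_query(term, system, mappings) for system in systems}


if __name__ == "__main__":
    print(federated_queries("snakes", ["keyword_db", "taxonomic_db"]))
```

The fallback to the untranslated term is a design choice of this sketch: a missing mapping then degrades to an ordinary keyword search instead of returning nothing.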
    Mappings are usually established by domain experts, but this is a very labor-intensive, time-consuming and error-prone task (Doerr, 2001). For this reason, great attention is being devoted to the possibility of creating mappings automatically or semi-automatically (Vizine-Goetz, Hickey, Houghton, Thompsen, 2004; Euzenat & Shvaiko, 2007; Kalfoglou & Schorlemmer, 2003; Maedche, Motik, Silva, Volz, 2002). So far, however, research has focused mainly on the quantitative analysis of automatically obtained mappings, i.e. purely in terms of precision and recall, either of end-to-end document retrieval or of the sets of mappings produced by a system. Little attention has been paid to comparing manual and automatic mappings. A qualitative analysis is necessary to learn how and when automatic techniques are a suitable alternative to high-quality but very expensive manual mapping. This paper aims to fill that gap.

    We will elaborate on mappings between three KOS in the agricultural domain: AGROVOC, NALT and SWD.

  • AGROVOC is a multilingual, structured and controlled vocabulary designed to cover the terminology of all subject fields in agriculture, forestry, fisheries, food and related domains (e.g. environment). The AGROVOC Thesaurus was developed by the Food and Agriculture Organization of the United Nations (FAO) and the European Commission in the early 1980s. It is currently available online in 17 languages (more are under development) and cont

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
