Title: Establishing a Multi-Thesauri-Scenario based on SKOS and Cross-Concordances
ArXiv ID: 1009.5352
Date: 2010-09-28
Authors: Mayr, B. (KoMoHe project) - Petras, V. (KoMoHe project) - Zapilko, M. (GESIS) - Sure, Y. (GESIS) - Neubert, H. (ZBW) - other contributors (STW, IBLK-Thesaurus, etc.)
📝 Abstract
This case study proposes a scenario with three topic-related thesauri that have been connected by bilateral cross-concordances as part of a major terminology mapping initiative in the project KoMoHe (Mayr & Petras, 2008). The thesauri have already been or will be converted to SKOS, and in order not to omit the relevant crosswalks, the SKOS mapping properties will be used to model them adequately.
📄 Full Content
With the standardization of SKOS (Simple Knowledge Organization System) in August 2009, a data model became available for publishing controlled vocabularies and taxonomies on the web in a technically and semantically interoperable way. The heterogeneous landscape of vocabularies worldwide can thus be technically harmonized in the future, and in particular the content of traditional databases can be made accessible and connectable for Semantic Web applications, e.g. as Linked Open Data. Vocabularies in SKOS format, and the crosswalks between them, can play a relevant role in this context because they can serve as a bridging hub for interlinking different published and indexed data sets.
This case study proposes a scenario with three topic-related thesauri that have been connected by bilateral cross-concordances as part of a major terminology mapping initiative in the project KoMoHe (Mayr & Petras, 2008). The thesauri have already been or will be converted to SKOS, and in order not to omit the relevant crosswalks, the SKOS mapping properties will be used to model them adequately. The participating thesauri in this approach are: (i) TheSoz (Thesaurus for the Social Sciences, GESIS), which was converted to SKOS in a first experimental version in 2009 (Zapilko & Sure, 2009). The current version uses SKOS-XL to represent preferred and non-preferred terms and defines additional extensions, oriented on the SKOS extensions introduced for the EUROVOC thesaurus (Smedt, 2009), to model more complex relations between terms, e.g. "use combination" relations; (ii) STW (Standard-Thesaurus for Economics, ZBW), which has also been published in SKOS format (Neubert, 2009); and (iii) IBLK-Thesaurus (SWP).
Currently, the conversion of vocabularies to SKOS is an active research area, but there are still relevant unsolved issues that have not yet been treated satisfactorily. Our approach focuses on applying existing crosswalks to the SKOS mapping properties and on establishing a linked data application based on the connected thesauri.
The SKOS mapping properties provide standardized relations for linking SKOS concepts of different concept schemes, which are represented in this scenario by the three participating thesauri. When cross-concordances are modeled in SKOS format, inconsistencies and problems can occur that are caused by idiosyncrasies of the thesauri. Although SKOS provides a standard model for representing vocabularies, transformed or converted thesauri can differ considerably because of their varying complexity and heterogeneous structure. Modeling mostly term-based thesauri in a concept-based way can be realized in different ways. One reason for inconsistencies is that the given cross-concordances were defined on term-based thesauri, whereas the SKOS versions of those thesauri are concept-based.
Therefore, under certain conditions, crosswalks between traditional thesauri cannot simply be carried over to the SKOS mapping properties. It has to be checked whether the two terms of a given crosswalk represent adequate concepts in the corresponding SKOS versions, e.g. by being used as skos:prefLabel of a concept. The cross-concordances of the KoMoHe project were defined only between preferred terms, which means that a conversion to SKOS should be feasible without further complications. In general, if the requirements described above are met, existing cross-concordances can be transformed to SKOS relatively easily (see Listing 1). Depending on where the SKOS cross-concordances are physically stored, the full URIs of the referenced concepts have to be used.
<rdf:Description rdf:about="http://lod.gesis.org/thesoz/concept/10039068">
  <skos:exactMatch rdf:resource="http://zbw.eu/stw/descriptor/11971-0"/>
</rdf:Description>
Listing 1: Example for a simple cross-concordance in SKOS between TheSoz and STW
If there are crosswalks between non-preferred terms, each participating SKOS vocabulary has to be checked for how it models non-preferred terms, because the SKOS mapping properties can only be used between concepts. In the current state, those crosswalks cannot be modeled directly in SKOS; additional extensions would have to be defined in order to preserve the relevant information.
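The check described above, that both terms of a crosswalk are used as preferred labels before a mapping is emitted, can be sketched as follows. This is a minimal illustration and not the procedure used in the project: the function name, the `Crosswalk` type, and the term labels are hypothetical, and the preferred-label lookups are assumed to be plain dictionaries from label to concept URI. Only the two concept URIs are taken from Listing 1. The output is written as N-Triples lines, and all mappings are treated as skos:exactMatch for simplicity.

```python
from typing import Dict, List, NamedTuple, Tuple

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


class Crosswalk(NamedTuple):
    """A term-based crosswalk between two thesauri (hypothetical structure)."""
    source_term: str
    target_term: str


def to_skos_mappings(
    crosswalks: List[Crosswalk],
    source_pref_labels: Dict[str, str],
    target_pref_labels: Dict[str, str],
) -> Tuple[List[str], List[Crosswalk]]:
    """Turn term-based crosswalks into N-Triples mapping statements.

    A crosswalk is converted only if both terms are preferred labels, i.e.
    both resolve to a concept URI. All others are returned separately so
    that they can be handled by other means.
    """
    triples: List[str] = []
    unresolved: List[Crosswalk] = []
    for cw in crosswalks:
        source_uri = source_pref_labels.get(cw.source_term)
        target_uri = target_pref_labels.get(cw.target_term)
        if source_uri is None or target_uri is None:
            # At least one term is not a preferred label of any concept.
            unresolved.append(cw)
            continue
        triples.append(f"<{source_uri}> <{SKOS_EXACT_MATCH}> <{target_uri}> .")
    return triples, unresolved


# Hypothetical labels; the concept URIs are the ones used in Listing 1.
thesoz_pref = {
    "hypothetical TheSoz term": "http://lod.gesis.org/thesoz/concept/10039068",
}
stw_pref = {
    "hypothetical STW term": "http://zbw.eu/stw/descriptor/11971-0",
}
crosswalks = [
    Crosswalk("hypothetical TheSoz term", "hypothetical STW term"),
    Crosswalk("hypothetical non-preferred term", "hypothetical STW term"),
]

triples, unresolved = to_skos_mappings(crosswalks, thesoz_pref, stw_pref)
for line in triples:
    print(line)
```

In this sketch the second crosswalk ends up in `unresolved`, which mirrors the case of non-preferred terms that cannot be expressed with the SKOS mapping properties alone.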
Domain-specific differences between thesauri can cause conversion problems as well. For example, a concept in one thesaurus can correspond to a combination of two concepts in another thesaurus. Cross-concordances can be as complex as the associative relations between terms within a single vocabulary. However, the SKOS mapping properties are so restrictive in their current definition that alternative approaches, e.g. defining custom extensions, have to be found to deal with these special use cases.
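One conceivable shape for such an extension, where a single concept corresponds to a combination of concepts in another thesaurus, is an intermediate node that groups the target concepts. The sketch below is purely illustrative: the namespace `http://example.org/mapping-ext#`, the names `compoundMatch`, `CompoundEquivalence` and `component`, and all concept URIs are hypothetical and are not part of SKOS or of any of the thesauri discussed here.

```python
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical extension namespace, not part of SKOS.
EXT = "http://example.org/mapping-ext#"


def compound_mapping(source_uri: str, target_uris: List[str], node_id: str) -> List[str]:
    """Describe a one-to-many mapping via an intermediate blank node.

    Returns N-Triples lines linking the source concept to a node that
    lists every target concept taking part in the combination.
    """
    if len(target_uris) < 2:
        raise ValueError("a compound mapping needs at least two target concepts")
    node = f"_:{node_id}"
    triples = [
        f"<{source_uri}> <{EXT}compoundMatch> {node} .",
        f"{node} <{RDF_TYPE}> <{EXT}CompoundEquivalence> .",
    ]
    # One statement per concept that is part of the combination.
    triples.extend(f"{node} <{EXT}component> <{uri}> ." for uri in target_uris)
    return triples


# Hypothetical concept URIs for illustration only.
example = compound_mapping(
    "http://example.org/thesaurus-a/concept/1",
    [
        "http://example.org/thesaurus-b/concept/10",
        "http://example.org/thesaurus-b/concept/20",
    ],
    "m1",
)
for line in example:
    print(line)
```

Keeping the combination in a separate node leaves the standard SKOS mapping properties untouched, at the cost of requiring consuming applications to understand the extension.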
In order to provide interoperability between the participating thesauri and the external data sets, the thesauri and the cross-concordances between them have to be made accessible on the web