A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage

📝 Original Info

  • Title: A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage
  • ArXiv ID: 0708.1150
  • Date: 2007-08-09
  • Authors: Author information is not stated in the provided text (check the original PDF or a scholarly database).

📝 Abstract

The large-scale analysis of scholarly artifact usage is constrained primarily by current practices in usage data archiving, privacy issues concerned with the dissemination of usage data, and the lack of a practical ontology for modeling the usage domain. As a remedy to the third constraint, this article presents a scholarly ontology that was engineered to represent those classes for which large-scale bibliographic and usage data exists, supports usage research, and whose instantiation is scalable to the order of 50 million articles along with their associated artifacts (e.g. authors and journals) and an accompanying 1 billion usage events. The real world instantiation of the presented abstract ontology is a semantic network model of the scholarly community which lends the scholarly process to statistical analysis and computational support. We present the ontology, discuss its instantiation, and provide some example inference rules for calculating various scholarly artifact metrics.

📄 Full Content

New publications are added to the scholarly record at an accelerating pace. This is evident from the growth in the number of publications indexed in Thomson Scientific's citation database over the last fifteen years: 875,310 in 1990; 1,067,292 in 1995; 1,164,015 in 2000; and 1,511,067 in 2005. However, the extent of the scholarly record reaches far beyond what is indexed by Thomson Scientific. While Thomson Scientific focuses primarily on quality-driven journals (roughly 8,700 in 2005), it does not index more novel scholarly artifacts such as preprints deposited in institutional or discipline-oriented repositories, datasets, software, and simulations, which are increasingly being considered scholarly communication units in their own right.

While the size (and growth) of the scholarly record is impressive, the extent of its use is even more staggering. For instance, in November 2006, Elsevier’s ScienceDirect, which provides access to articles from approximately 2,000 journals, celebrated its 1 billionth full-text download since counting started in April 1999. And, again, the extent of scholarly usage clearly reaches far beyond Elsevier’s repository. Furthermore, usage events include not only full-text downloads, but also events such as requesting services from linking servers, downloading bibliographic citations, and emailing abstracts.

To a large extent, the effect of usage behavior on the scholarly process is only beginning to be understood. If properly studied, it will offer clues to the evolutionary trends of science [1,2,3], quantitative models of the value of scholarly artifacts [4,5], and services to support scholars [6]. The MESUR project, funded by the Andrew W. Mellon Foundation and based at the Research Library of the Los Alamos National Laboratory, aims to develop metrics for assessing scholarly communication artifacts (e.g. articles, journals, conference proceedings) and agents (e.g. authors, institutions, publishers, repositories) on the basis of scholarly usage. To do this, the MESUR project makes use of a representative collection of bibliographic, citation, and usage data. This data is collected from a wide variety of sources, including academic publishers, secondary publishers, and institutional linking servers. The collected data is expected to eventually encompass tens of millions of bibliographic records, hundreds of millions of citations, and billions of usage events. Mining such a vast data set in an efficient, performant, and flexible manner presents significant challenges regarding data representation and data access. This article presents the OWL ontology [7] used by MESUR to represent bibliographic, citation, and usage data in an integrated manner. The proposed MESUR ontology is practical, as opposed to all-encompassing, in that it represents those artifacts and properties that, as previously shown in [6], are realistically available from modern scholarly information systems. This includes bibliographic data such as author, title, identifier, and publication date, and usage data such as the IP address of the accessing agent, the date and time of access, and the type of usage. Finally, another novel contribution of this work is the hybrid storage and access architecture, in which relational database and triple store technology are combined. This is achieved by storing core data and relationships in the triple store and auxiliary data in a relational database. This design choice is driven by the need to keep the size of the triple store at a level that can realistically be handled by current technologies. The combination of the data architecture and the scholarly ontology presented in this article provides the foundation for the large-scale modeling and analysis of scholarly artifacts and their usage.
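The hybrid storage idea can be illustrated with a minimal sketch. It is not the MESUR implementation: the class `HybridStore`, the `ex:` identifiers, the predicate name `ex:authoredBy`, and the choice of titles and dates as "auxiliary" data are all assumptions made for illustration. A Python set stands in for the triple store and an in-memory SQLite table stands in for the relational database.

```python
import sqlite3


class HybridStore:
    """Toy hybrid store: relationships as triples, auxiliary fields as rows."""

    def __init__(self):
        # Stand-in for a triple store: a set of (subject, predicate, object).
        self.triples = set()
        # Stand-in for the relational database holding auxiliary data.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE aux (uri TEXT PRIMARY KEY, title TEXT, pub_date TEXT)"
        )

    def add_relation(self, subject, predicate, obj):
        """Store a core relationship in the triple set."""
        self.triples.add((subject, predicate, obj))

    def add_aux(self, uri, title, pub_date):
        """Store auxiliary fields in the relational table, keyed by URI."""
        self.db.execute(
            "INSERT OR REPLACE INTO aux VALUES (?, ?, ?)", (uri, title, pub_date)
        )

    def subjects(self, predicate, obj):
        """Return all subjects linked to obj by predicate, sorted for stability."""
        return sorted(s for (s, p, o) in self.triples if p == predicate and o == obj)

    def title(self, uri):
        """Look up the auxiliary title for a URI, or None if absent."""
        row = self.db.execute(
            "SELECT title FROM aux WHERE uri = ?", (uri,)
        ).fetchone()
        return row[0] if row else None


# Hypothetical data: two articles by the same author.
store = HybridStore()
store.add_relation("ex:article1", "ex:authoredBy", "ex:authorA")
store.add_relation("ex:article2", "ex:authoredBy", "ex:authorA")
store.add_aux("ex:article1", "A hypothetical title", "2007-08-09")
store.add_aux("ex:article2", "Another hypothetical title", "2006-01-15")

# Graph traversal happens in the triple set; titles are fetched from SQL by key.
titles_by_author_a = [
    store.title(uri) for uri in store.subjects("ex:authoredBy", "ex:authorA")
]
```

In this sketch the shared URI is the only link between the two stores, so the triple set stays small (relationships only) while descriptive fields are resolved on demand.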

A semantic network (sometimes called a multi-relational network or multi-graph) is composed of a set of nodes (representing heterogeneous artifacts) connected to one another by a set of qualified, or labeled, edges [8]. In a graph-theoretic sense, a semantic network is a directed labeled graph. Because edges are labeled, two nodes can in principle be connected to one another by an unlimited number of edges, one per distinct label. In most cases, however, the possible interconnections between node types are constrained to a predetermined set. This predetermined set is made explicit in the semantic network’s associated ontology. An ontology is generally defined as a set of abstract classes, their relationships to one another, and a collection of inference rules for deriving implicit relationships [9]. An ontology makes no explicit reference to the actual instances of the defined abstract classes; this is the role of the semantic network.
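The distinction between an ontology (permitted class-level relationships) and a semantic network (actual instances and edges) can be sketched in a few lines. The class names, edge labels, and the `SemanticNetwork` helper below are hypothetical and are not taken from the MESUR ontology; the sketch only assumes that the ontology can be reduced to a set of allowed (source class, label, target class) patterns.

```python
# Ontology: the predetermined set of allowed (source class, label, target class)
# patterns. These particular classes and labels are illustrative assumptions.
ONTOLOGY = {
    ("Author", "authored", "Article"),
    ("Author", "reviewed", "Article"),
    ("Article", "cites", "Article"),
}


class SemanticNetwork:
    """Directed labeled multigraph whose edges are checked against an ontology."""

    def __init__(self, ontology):
        self.ontology = ontology
        self.node_class = {}  # node identifier -> class name
        self.edges = set()    # (source, label, target) triples

    def add_node(self, node, cls):
        self.node_class[node] = cls

    def add_edge(self, source, label, target):
        # Reject any edge whose class-level pattern the ontology does not allow.
        pattern = (self.node_class[source], label, self.node_class[target])
        if pattern not in self.ontology:
            raise ValueError(f"edge not permitted by ontology: {pattern}")
        self.edges.add((source, label, target))

    def labels_between(self, source, target):
        """Return the sorted labels of all edges from source to target."""
        return sorted(l for (s, l, t) in self.edges if s == source and t == target)


net = SemanticNetwork(ONTOLOGY)
net.add_node("au", "Author")
net.add_node("a1", "Article")
net.add_node("a2", "Article")

# Two differently labeled edges may connect the same pair of nodes.
net.add_edge("au", "authored", "a1")
net.add_edge("au", "reviewed", "a1")
net.add_edge("a1", "cites", "a2")

# An Author cannot "cite" an Article under this ontology, so this is rejected.
try:
    net.add_edge("au", "cites", "a1")
    rejected = False
except ValueError:
    rejected = True
```

The ontology here never mentions `au`, `a1`, or `a2`; it only constrains which kinds of edges instances may carry, which mirrors the division of roles described above.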

An ontology is comparable to the developer’s API in object-oriented programming languages such as C++ and Java (minus the explicit representation of class methods/functions). For example, the set of relationships of an ontological class are known as the cl

This content is AI-processed based on open access ArXiv data.
