Title: Web Usage Analysis: New Science Indicators and Co-usage
ArXiv ID: 0811.0719
Date: 2008-11-06
Authors: Xavier Polanco, Ivana Roche, Dominique Besagni
📝 Abstract
A new type of statistical analysis of scientific and technical information (STI) in the Web context is presented. We propose a set of indicators about Web users, viewed bibliographic records, and e-commerce transactions. In addition, we introduce two Web usage factors. Finally, we give an overview of co-usage analysis. For these tasks, we introduce a computer-based system, called Miri@d, which produces descriptive statistical information about Web users' searching behaviour and about what is effectively used from a free-access digital bibliographic database. The system is conceived both as a server of statistical data computed beforehand and as an interactive server for online statistical work. The results will be made available to analysts, who can use this descriptive statistical information as raw data for their indicator design tasks, and as input for multivariate data analysis, clustering analysis, and mapping. Managers can also exploit the results in order to improve management and decision-making.
📄 Full Content
Web Usage Analysis: New Science Indicators and Co-usage
Xavier Polanco, Ivana Roche, Dominique Besagni {polanco,roche,besagni}@inist.fr
Institut de l’Information Scientifique et Technique (INIST / CNRS) 2 allée du Parc de Brabois – 54514 Vandoeuvre-lès-Nancy – France
Mots clés : webométrie, fouille de données d’usage du Web, indicateurs de la science, comportement
utilisateur Web, analyse co-usage, serveur Web
Keywords: webometrics, Web usage mining, science indicators, Web user behaviour, co-usage analysis,
Web server
Palabras clave: webometría, minería utilización de la Web, indicadores de la ciencia, comportamiento
utilizador de la Web, análisis co-utilización, servidor Web
1 Introduction
Two scientific communities deal with questions of Web analysis, and the literature accordingly shows
two traditions: one developed by people coming from documentation, the other by computer scientists.
The first was developed in information science under the names “webometrics” (Almind & Ingwersen,
1997) or “cybermetrics” (cf. http://www.cindoc.csic.es/cybermetrics), seeking to extend informetric
techniques to the analysis of the Web (Björneborn & Ingwersen, 2001; Ingwersen & Björneborn,
2004). The second arose in computer science, seeking to extend data mining techniques to Web
analysis under the name “Web mining” (Chakrabarti, 2003), with three main categories: Web
structure mining, Web content mining, and Web usage mining (Kosala & Blockeel, 2000). We work at
the border of these two traditions: we consider informetrics from the point of view of computer-based
technologies. The Web represents a new environment for the quantitative studies of science, in which
a new family of computer-based science indicators can be developed. This article presents a system
able to produce descriptive bibliometric statistics and statistical information on Web users’ behaviour.
The article is organized as follows. Section 2 presents the Miri@d server, and section 3 the statistical
indicators that Miri@d is able to produce. The results of applying Miri@d are presented in section 4.
Section 5 describes co-usage analysis, and section 6 applies it to Web user data coming from the
Miri@d server.
2 Server organisation
This section describes the structure of the Miri@d server in detail. We begin by distinguishing the
conceptual model that Miri@d represents from its actual technological implementation. The first is
general and the second is local.
2.1 The model
Figure 1 represents what we call the model. The model is general in the sense that it is not limited
to the particular characteristics represented in figure 2. Importantly, the model involves three
families of data that it can exploit: log-file data on the one hand, and bibliographic data and
commercial data on the other. From the economic point of view, the bibliographic database can be
replaced by the concept of an unspecified product database. From the point of view of scientific
information, any bibliographic database can be used. At least theoretically, i.e., at the level of
its concept, the model is not tied to the data sources that it currently uses.
Figure 1: The model
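To make the three data families concrete, the following minimal Python sketch shows one way they could be represented and combined. It is only an illustration under stated assumptions, not the Miri@d implementation: the class names (LogEntry, CatalogueRecord, Transaction), their fields, and the usage_summary function are all hypothetical. The catalogue is deliberately generic, reflecting the point that the bibliographic database could be replaced by any product database.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class LogEntry:
    """One consultation recorded in the Web server log (hypothetical fields)."""
    session_id: str
    record_id: str


@dataclass(frozen=True)
class CatalogueRecord:
    """One item of the catalogue: a bibliographic record or any other product."""
    record_id: str
    title: str


@dataclass(frozen=True)
class Transaction:
    """One commercial event, such as an order placed for a catalogue item."""
    session_id: str
    record_id: str


def usage_summary(
    catalogue: Dict[str, CatalogueRecord],
    logs: Iterable[LogEntry],
    transactions: Iterable[Transaction],
) -> Dict[str, Tuple[int, int]]:
    """Return, per catalogue record, a pair (consultations, orders).

    Log entries and transactions that refer to unknown records are ignored.
    """
    views = Counter(e.record_id for e in logs if e.record_id in catalogue)
    orders = Counter(t.record_id for t in transactions if t.record_id in catalogue)
    return {rid: (views[rid], orders[rid]) for rid in catalogue}


# Illustrative data only; none of these values come from the paper.
catalogue = {
    "r1": CatalogueRecord("r1", "Record A"),
    "r2": CatalogueRecord("r2", "Record B"),
}
logs = [
    LogEntry("s1", "r1"),
    LogEntry("s2", "r1"),
    LogEntry("s2", "r2"),
    LogEntry("s3", "unknown"),
]
transactions = [Transaction("s2", "r1")]
summary = usage_summary(catalogue, logs, transactions)
```

Keying all three families on a shared record identifier is a design choice of this sketch: it is what allows the catalogue to be swapped for another product database without changing the counting logic.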
2.2 The server structure
Figure 2 represents the server structure, which consists of a set of external resources that provide
raw data, and a set of databases internal to the server.