Corrigendum to 'Is the expertise of evaluation panels congruent with the research interests of the research groups: a quantitative approach based on barycenters' [Journal of Informetrics 9(4) (2015) 704-721]
ArXiv ID: 1609.07975
Date: 2016-09-27
Authors: A. I. M. Jakaria Rahman, Raf Guns, Ronald Rousseau, Tim C. E. Engels
📝 Abstract
In Rahman, Guns, Rousseau, and Engels (2015) we described several approaches to determine the cognitive distance between two units. One of these approaches was based on what we called barycenters in N dimensions. This note corrects this terminology and introduces the more adequate term 'similarity-adapted publication vectors'.
📄 Full Content
and another one based on so-called "barycenters" in N dimensions, where N denotes the total number of Web of Science Subject Categories (WoS SCs). The first of these approaches uses overlay maps (Rafols, Porter, & Leydesdorff, 2010). Each SC has a place on this map, characterized by its coordinates, denoted as $(L_{j,1}, L_{j,2})$, $j = 1, \ldots, N$. For each panel member and for each research group a barycenter is calculated, and Euclidean distances between barycenters can then be determined. The coordinates of these barycenters (in 2 dimensions) are given as

$$C_1 = \frac{\sum_{j=1}^{N} m_j L_{j,1}}{T}, \qquad C_2 = \frac{\sum_{j=1}^{N} m_j L_{j,2}}{T} \tag{1}$$

where $m_j$ is the number of publications of the unit under investigation (panel member, research group) belonging to category $j$; this category $j$ has coordinates $(L_{j,1}, L_{j,2})$ in the base map. The total number of publications of the unit under investigation is denoted as $T = \sum_{j=1}^{N} m_j$.
This approach is the barycenter approach announced in the title of Rahman et al. (2015). In this way, distances between entities, as represented by their barycenters, can be calculated, leading to quantitative results answering the aforementioned research question.
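To make formula (1) concrete, here is a minimal Python sketch (using NumPy) of the 2-dimensional barycenter and the Euclidean distance between two units. The function name `barycenter_2d`, the three-category base map, and the publication counts are hypothetical illustrations, not the Rafols et al. (2010) coordinates or data from Rahman et al. (2015).

```python
import numpy as np

def barycenter_2d(counts, coords):
    """Barycenter of a unit on a 2-D base map, following formula (1).

    counts: length-N array of publication counts m_j per subject category.
    coords: N x 2 array; row j holds (L_{j,1}, L_{j,2}) for category j.
    """
    counts = np.asarray(counts, dtype=float)
    coords = np.asarray(coords, dtype=float)
    total = counts.sum()  # T, the unit's total number of publications
    if total <= 0:
        raise ValueError("unit has no publications")
    # (sum_j m_j L_{j,1}, sum_j m_j L_{j,2}) / T
    return counts @ coords / total

# Hypothetical base map with N = 3 categories (not real overlay-map coordinates).
coords = [[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]]
panel_member = [2, 0, 0]       # all output in category 1
research_group = [10, 10, 0]   # output split between categories 1 and 2

b_expert = barycenter_2d(panel_member, coords)
b_group = barycenter_2d(research_group, coords)
distance = np.linalg.norm(b_expert - b_group)  # Euclidean distance
```

With these made-up numbers the expert's barycenter is (0, 0), the group's is (2, 0), and their distance is 2.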
Urged by a colleague, we point out that the term ‘barycenter’, taken on its own, has no meaning: any point can be the barycenter of infinitely many sets of points, possibly with weights attached. We refer the reader to appendix A for a formal description of the notion of a barycenter.
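For readers without appendix A at hand, the usual definition of a weighted barycenter of points $P_1, \ldots, P_n$ with strictly positive weights $w_i$ is given below. The second line is a small illustration of our own, with arbitrarily chosen points, showing that the same point (here the origin) is the barycenter of two different weighted point sets.

```latex
B = \frac{\sum_{i=1}^{n} w_i P_i}{\sum_{i=1}^{n} w_i}, \qquad w_i > 0

\frac{1\cdot(-1,0) + 1\cdot(1,0)}{1+1} = (0,0) = \frac{1\cdot(-2,0) + 2\cdot(1,0)}{1+2}
```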
We further note that, in order to obtain meaningful distances, these values must be scale-invariant. This means that the distance between points P and Q must be the same as the distance between the points P and cQ, where c is a strictly positive number. Indeed, the total output of a research group can be several orders of magnitude larger than that of one expert.
This difference must not play a role in determining cognitive distances. The barycenter method explained above, and in particular formula (1), satisfies this requirement, as multiplying all $m_j$ by the same strictly positive factor leads to the same barycenter.
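The scale-invariance argument can be checked numerically. The sketch below (hypothetical coordinates and counts, NumPy assumed) multiplies a unit's publication counts by 250 and confirms that formula (1) returns the same barycenter, because the factor cancels between the numerator and the denominator $T$.

```python
import numpy as np

def barycenter(counts, coords):
    # Formula (1): publication-weighted mean of the category coordinates.
    counts = np.asarray(counts, dtype=float)
    return counts @ np.asarray(coords, dtype=float) / counts.sum()

# Hypothetical 2-D coordinates for N = 3 categories and a hypothetical unit.
coords = [[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]]
expert = np.array([1, 2, 1])
group = 250 * expert  # same publication profile, 250 times the output

same = np.allclose(barycenter(expert, coords), barycenter(group, coords))
```

Both calls return (2, 0.75), so `same` is `True`.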
As stated earlier, we also used another quantitative approach, which was referred to as a barycenter approach in N dimensions. In this approach, we used a matrix of similarity values between the WoS SCs as made available by Rafols et al. (2010) at http://www.leydesdorff.net/overlaytoolkit/map10.paj.
These authors created a matrix of citing to cited SCs based on the Science Citation Index (SCI) and the Social Sciences Citation Index (SSCI), which was cosine-normalized in the citing direction. The result is a symmetric N×N similarity matrix (here, N = 224), which we denote by $S = (s_{ij})_{ij}$. Each unit’s publications are then represented by an N-dimensional vector whose coordinates are the numbers of publications in each WoS SC. We then wrote in Rahman et al. (2015):
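As a rough illustration of what "cosine-normalized in the citing direction" can mean, the sketch below computes the Salton cosine between the rows of a hypothetical citing-by-cited count matrix, which yields a symmetric similarity matrix with ones on the diagonal. This is our own reading, written in Python with NumPy; the counts are invented and the code is not claimed to reproduce the map10.paj file of Rafols et al. (2010).

```python
import numpy as np

def cosine_similarity_rows(citations):
    """Salton cosine between the rows of a citing x cited count matrix.

    Row i is assumed to hold the citations given by citing category i to each
    cited category. Assumes no row is all zeros.
    """
    c = np.asarray(citations, dtype=float)
    norms = np.linalg.norm(c, axis=1)
    unit_rows = c / norms[:, None]   # scale each citing profile to unit length
    return unit_rows @ unit_rows.T   # pairwise cosines: symmetric N x N matrix

# Hypothetical citation counts among N = 3 categories (not SCI/SSCI data).
citations = [[50, 10, 0],
             [8, 40, 2],
             [0, 3, 30]]
S = cosine_similarity_rows(citations)
```

Because every entry is a cosine between non-negative vectors, all values of `S` lie between 0 and 1.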
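A barycenter in N dimensions is determined as the point $C = (C_1, C_2, \ldots, C_N)$, where:

$$C_k = \frac{\sum_{j=1}^{N} s_{jk}\, m_j}{T} \tag{2}$$

Here $s_{jk}$ denotes the $k$-th coordinate of WoS subject category $j$, $m_j$ is the number of publications in subject category $j$, and $T = \sum_{j=1}^{N} m_j$ is the total number of publications.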
Observe that, for clarity, we replaced L = A as used in Rahman et al. (2015) by S (the similarity matrix), and M (in the original publication) by T (the total number of publications of the unit under investigation). In Rahman et al. (2015) we provided concrete calculations of distances between these so-called barycenters of units. Although formulas (1) and (2) look the same, their interpretation is different, as explained below.
The numerator of formula (2) is equal to the $k$-th coordinate of $S \cdot M$, the product of the similarity matrix $S$ and the column matrix of publications $M = (m_j)_j$. We next include an example showing what actually happens.
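The equivalence between formula (2) and the matrix product can be checked with a small sketch. Assuming a hypothetical symmetric 3×3 similarity matrix and publication column (not taken from the paper), the explicit sum over $j$ and NumPy's `S @ M / T` give the same vector; the indexings $s_{jk}$ and $s_{kj}$ coincide only because $S$ is symmetric.

```python
import numpy as np

def formula_2(S, M):
    """Coordinates C_k = sum_j s_jk m_j / T, written out as explicit sums."""
    S = np.asarray(S, dtype=float)
    M = np.asarray(M, dtype=float)
    T = M.sum()
    N = len(M)
    return np.array([sum(S[j, k] * M[j] for j in range(N)) / T for k in range(N)])

# Hypothetical symmetric similarity matrix (N = 3) and publication column M.
S = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
M = np.array([3, 1, 0])

explicit = formula_2(S, M)
matrix_form = S @ M / M.sum()  # (S*M)_k = sum_j s_kj m_j, divided by T
```

With these values both evaluate to (0.8, 0.4, 0.125).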
Let N be 4. Assume that a unit has publication column
Dividing by T=5 yields the vector
Clearly, the resulting column vector is not a barycenter as it is not obtained as the result of a barycenter operation on a set of vectors.
The column vector $S \cdot M / T$, resulting from the matrix product of $S$ and the column vector $M/T$, can be interpreted as a ‘pseudo-normalized’ publication vector that takes similarity into account. It is not a real normalization, because the division is performed with respect to the sum of the coordinates of $M$ and not with respect to the sum of the coordinates of $S \cdot M$. For this reason, we call $S \cdot M$ a similarity-adapted publication vector, denoted as $M_{sa}$. In this example, this means, for instance, that the one publication in the second category also contributes (with 10%) to the publications in category 1. Although there is no original publication in category 4, we end up with a value of 3.3, because categories 1 and 4 are very similar (80% similarity) and the second category also contributes. If we neglect similarity, then $S$ is the identity matrix and publication columns stay unchanged.
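Two observations from the preceding paragraph can be illustrated with the following sketch: $S \cdot M / T$ is only a pseudo-normalization, and the identity matrix leaves the publication column unchanged. The similarity matrix and publication column are hypothetical (they are not the N = 4 example of the text), and `similarity_adapted` is an illustrative helper name.

```python
import numpy as np

def similarity_adapted(S, M):
    # S*M: each category also receives weight from similar categories.
    return np.asarray(S, dtype=float) @ np.asarray(M, dtype=float)

# Hypothetical data (N = 3); not the example from the text.
S = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
M = np.array([3.0, 1.0, 0.0])
T = M.sum()

pseudo = similarity_adapted(S, M) / T          # S*M/T
total = pseudo.sum()                           # not 1: only a pseudo-normalization
unchanged = similarity_adapted(np.eye(3), M)   # S = I leaves M as it is
```

Here the coordinates of $S \cdot M / T$ sum to about 1.325 rather than 1, since the similarity weights add mass to neighbouring categories.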
Hence, the distances we calculated through the N-dimensional approach in Rahman et al. (2015) are not normalized and not scale-invariant, although they should be. In retrospect, we admit that re