Author Identifiers in Scholarly Repositories

February 23, 2026

📝 Original Info

Title: Author Identifiers in Scholarly Repositories
ArXiv ID: 1003.1345
Date: 2011-03-25
Authors: Simeon Warner

📝 Abstract

Bibliometric and usage-based analyses and tools highlight the value of information about scholarship contained within the network of authors, articles and usage data. Less progress has been made on populating and using the author side of this network than the article side, in part because of the difficulty of unambiguously identifying authors. I briefly review a sample of author identifier schemes, and consider use in scholarly repositories. I then describe preliminary work at arXiv to implement public author identifiers, services based on them, and plans to make this information useful beyond the boundaries of arXiv.

📄 Full Content

Author Identifiers in Scholarly Repositories

Simeon Warner
Cornell Information Science and Cornell University Library
Ithaca, NY 14850, USA
simeon.warner@cornell.edu

Submitted: 2009-10-09

1 Context

In an ideal scholarly communication system there would be tools to browse, navigate, make recommendations and assess influence based on the complete graph of all actors (people, collaborations, institutions) and all communication artifacts (articles, comments, blog posts, usage data [footnote 1]). As a shorthand I will call this complete graph the publication network. Contained within it are the familiar citation, usage, co-authorship, and co-citation graphs. In recent bibliometric and usage-based work, significant progress has been made with the artifact part of this graph (see, for example the work of the MESUR project [3]). Much less progress has been made with the actor part of the graph, in part because it is much harder to unambiguously identify authors than articles.

Consider table 1 which shows the most frequently occurring lastname, initial pairs in arXiv user accounts. This illustrates one facet of the name disambiguation problem, namely that there are many authors with the same name. This is compounded by inconsistent spellings, use of initials or full first names, and even name changes.
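A tally of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the account names below are invented, the `name_key` helper is hypothetical, and nothing here reflects how arXiv actually produced its counts. The sketch reduces each "Lastname, Firstname" string to a lastname, initial key and counts how many accounts share each key.

```python
from collections import Counter


def name_key(full_name: str) -> str:
    """Reduce 'Lastname, Firstname' to 'Lastname, F' (hypothetical helper)."""
    last, _, first = full_name.partition(",")
    first = first.strip()
    initial = first[0].upper() if first else ""
    return f"{last.strip()}, {initial}"


# Invented account names, for illustration only (not arXiv data).
accounts = [
    "Zhang, Yan",
    "Zhang, Yi",
    "Zhang, Y.",
    "Lee, Jin",
    "Lee, J",
    "Wang, Yu",
]

# Count how many accounts collapse onto each lastname, initial key.
counts = Counter(name_key(name) for name in accounts)

for key, count in counts.most_common(2):
    print(key, count)
```

The same collapse that makes "Zhang, Yan" and "Zhang, Yi" indistinguishable also merges "Zhang, Y." into their key, so the key alone cannot say whether three accounts belong to three people or fewer.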
Within a single repository such as arXiv it is not usually possible to accurately answer the question “show me all the articles by this Zhang, Y”. In recent years there has been considerable work on unsupervised and supervised author name disambiguation using many different heuristic, machine learning and clustering techniques, and many different properties including co-authorship, citations and subjects/topics. While much better than naive approaches, these techniques are still far from perfect.

Table 1: Most frequently occurring lastname, initial pairs in arXiv user accounts. There may be a few duplicate accounts but this indicates that nearly 100 different people named “Zhang, Y” have created user accounts at arXiv (as of May 2009).

Lastname, Initial    Count
Zhang, Y             100
Lee, J               97
Wang, Y              89
Wang, J              84
Chen, Y              77
Kim, J               77
Wang, X              76
Lee, S               74
Kim, S               69
Liu, Y               69

Footnote 1: Logically usage data would be links between actors and artifacts. However, for historical, cultural and practical reasons most usage data is treated as anonymous even though co-usage information may be extracted.

arXiv:1003.1345v1 [cs.DL] 6 Mar 2010

In a recent Nature Correspondence, Raf Aerts asked “If it is possible to have DOIs for objects (or, so they say, enough IPv6 addresses for every molecule on Earth), why is it so difficult to implement DAIs [Digital Author Identifiers] for authors?” [1]. Raf had earlier hinted at part of the answer by pointing out that he has more than one identifier in Scopus [6]. As we have already discussed, it is difficult to mine existing data to disambiguate references to authors. The more fundamental part of the answer is that it is much easier to create DOIs for articles when the one owner for an article creates the one DOI for it and presents it with the article (ignoring the issue of multiple versions of articles).
As authors, we are not owned by a single authority and even if an identifier were created for us at birth by the appropriate government, there would be significant privacy concerns about using it for everything. Consider, for example, concerns over the uses and misuses of social security numbers in the USA. While we want to link a single author’s works together, do we want that identity to immediately link us to all other digital information about the private life of the individual?

2 Author Identifiers

To illustrate the diversity of currently used author identifiers, table 2 shows several example schemes used in the scholarly domain. A more detailed inventory is provided on the repinf wiki [5]. The OpenID and ISNI schemes are not limited to the scholarly domain. OpenID is aimed primarily at authentication, however, if it continues to see growing acceptance it may well be a useful open system that repositories could use. It is not clear whether ISNI will develop into a widely used system. The largest efforts to create author identifiers specifically for the scholarly domain, Scopus Author Identifiers and ResearcherID, come from commercial entities and are clearly motivated by the desire to provide improved services b

…(Full text truncated)…

This content is AI-processed based on ArXiv data.
