Author Identifiers in Scholarly Repositories

📝 Original Info

  • Title: Author Identifiers in Scholarly Repositories
  • ArXiv ID: 1003.1345
  • Date: 2011-03-25
  • Authors: Simeon Warner

📝 Abstract

Bibliometric and usage-based analyses and tools highlight the value of information about scholarship contained within the network of authors, articles and usage data. Less progress has been made on populating and using the author side of this network than the article side, in part because of the difficulty of unambiguously identifying authors. I briefly review a sample of author identifier schemes, and consider use in scholarly repositories. I then describe preliminary work at arXiv to implement public author identifiers, services based on them, and plans to make this information useful beyond the boundaries of arXiv.

📄 Full Content

Author Identifiers in Scholarly Repositories

Simeon Warner
Cornell Information Science and Cornell University Library
Ithaca, NY 14850, USA
simeon.warner@cornell.edu

Submitted: 2009-10-09

Abstract

Bibliometric and usage-based analyses and tools highlight the value of information about scholarship contained within the network of authors, articles and usage data. Less progress has been made on populating and using the author side of this network than the article side, in part because of the difficulty of unambiguously identifying authors. I briefly review a sample of author identifier schemes, and consider use in scholarly repositories. I then describe preliminary work at arXiv to implement public author identifiers, services based on them, and plans to make this information useful beyond the boundaries of arXiv.

1 Context

In an ideal scholarly communication system there would be tools to browse, navigate, make recommendations and assess influence based on the complete graph of all actors (people, collaborations, institutions) and all communication artifacts (articles, comments, blog posts, usage data¹). As a shorthand I will call this complete graph the publication network. Contained within it are the familiar citation, usage, co-authorship, and co-citation graphs. In recent bibliometric and usage-based work, significant progress has been made with the artifact part of this graph (see, for example the work of the MESUR project [3]). Much less progress has been made with the actor part of the graph, in part because it is much harder to unambiguously identify authors than articles.

Consider table 1 which shows the most frequently occurring lastname, initial pairs in arXiv user accounts. This illustrates one facet of the name disambiguation problem, namely that there are many authors with the same name. This is compounded by inconsistent spellings, use of initials or full first names, and even name changes.
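The collision that table 1 reports can be illustrated with a minimal sketch. The helper `name_key` and the account names below are hypothetical, invented for illustration; they are not arXiv data or arXiv code. The sketch assumes names arrive in "Lastname, Firstname" form and shows how distinct people collapse onto one lastname, initial key.

```python
from collections import Counter


def name_key(full_name: str) -> str:
    """Reduce 'Lastname, Firstname' to a 'Lastname, Initial' key (hypothetical helper)."""
    last, _, first = full_name.partition(",")
    first = first.strip()
    # Keep only the first letter of the first name, if any is present.
    initial = first[0].upper() if first else ""
    return f"{last.strip()}, {initial}"


# Invented account names, for illustration only.
accounts = [
    "Zhang, Yan",
    "Zhang, Yi",
    "Zhang, Y.",
    "Lee, Jae",
    "Lee, John",
    "Warner, Simeon",
]

# Count how many accounts share each 'Lastname, Initial' key.
counts = Counter(name_key(name) for name in accounts)
for key, count in counts.most_common():
    print(f"{key}: {count}")
```

Any key with a count above one is ambiguous on its own: the key cannot tell whether "Zhang, Yan" and "Zhang, Y." are the same person or different people, which is the gap an explicit author identifier is meant to close.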
Within a single repository such as arXiv it is not usually possible to accurately answer the question “show me all the articles by this Zhang, Y”. In recent years there has been considerable work on unsupervised and supervised author name disambiguation using many different heuristic, machine learning and clustering techniques, and many different properties including co-authorship, citations and subjects/topics. While much better than naive approaches, these techniques are still far from perfect.

¹ Logically usage data would be links between actors and artifacts. However, for historical, cultural and practical reasons most usage data is treated as anonymous even though co-usage information may be extracted.

arXiv:1003.1345v1 [cs.DL] 6 Mar 2010

Lastname, Initial    Count
Zhang, Y             100
Lee, J               97
Wang, Y              89
Wang, J              84
Chen, Y              77
Kim, J               77
Wang, X              76
Lee, S               74
Kim, S               69
Liu, Y               69

Table 1: Most frequently occurring lastname, initial pairs in arXiv user accounts. There may be a few duplicate accounts but this indicates that nearly 100 different people named “Zhang, Y” have created user accounts at arXiv (as of May 2009).

In a recent Nature Correspondence, Raf Aerts asked “If it is possible to have DOIs for objects (or, so they say, enough IPv6 addresses for every molecule on Earth), why is it so difficult to implement DAIs [Digital Author Identifiers] for authors?” [1]. Raf had earlier hinted at part of the answer by pointing out that he has more than one identifier in Scopus [6]. As we have already discussed, it is difficult to mine existing data to disambiguate references to authors. The more fundamental part of the answer is that it is much easier to create DOIs for articles when the one owner for an article creates the one DOI for it and presents it with the article (ignoring the issue of multiple versions of articles).
As authors, we are not owned by a single authority and even if an identifier were created for us at birth by the appropriate government, there would be significant privacy concerns about using it for everything. Consider, for example, concerns over the uses and misuses of social security numbers in the USA. While we want to link a single author’s works together, do we want that identity to immediately link us to all other digital information about the private life of the individual?

2 Author Identifiers

To illustrate the diversity of currently used author identifiers, table 2 shows several example schemes used in the scholarly domain. A more detailed inventory is provided on the repinf wiki [5]. The OpenID and ISNI schemes are not limited to the scholarly domain. OpenID is aimed primarily at authentication, however, if it continues to see growing acceptance it may well be a useful open system that repositories could use. It is not clear whether ISNI will develop into a widely used system. The largest efforts to create author identifiers specifically for the scholarly domain, Scopus Author Identifiers and ResearcherID, come from commercial entities and are clearly motivated by the desire to provide improved services b

…(Full text truncated)…
