On the Composition of Scientific Abstracts
Preprint: I. Atanassova, M. Bertin, V. Lariviere (2016) On the Composition of Scientific Abstracts, Journal of
Documentation, vol. 72, issue 4. Submitted 12-Sep-2015, accepted 10-Feb-2016.
Iana Atanassova (1), Marc Bertin (2) and Vincent Lariviere (3)
(1) iana.atanassova@univ-fcomte.fr
Centre de Recherche en Linguistique et Traitement Automatique des Langues “Lucien Tesnière”, Université de Franche-Comté, 30 rue Mégevand, Besançon 25000 (France)
(2) bertin.marc@gmail.com
Centre Interuniversitaire de Recherche sur la Science et la Technologie (CIRST), Université du Québec à Montréal, CP 8888, Succ. Centre-Ville, Montréal, QC H3C 3P8 (Canada)
(3) vincent.lariviere@umontreal.ca
École de bibliothéconomie et des sciences de l’information, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, QC H3C 3J7 (Canada) and Observatoire des Sciences et des Technologies (OST), Centre Interuniversitaire de Recherche sur la Science et la Technologie (CIRST), Université du Québec à Montréal, CP 8888, Succ. Centre-Ville, Montréal, QC H3C 3P8 (Canada)
Abstract
Scientific abstracts contain the information that the author(s) consider best describes a document’s content. They represent a compressed view of the informational content of a document and allow readers to evaluate the relevance of the document to a particular information need. However, little is known about their composition. This paper contributes to the understanding of the structure of abstracts by measuring the similarity between scientific abstracts and the full text of research articles. More specifically, using sentence-based similarity metrics, we quantify text reuse in abstracts and examine where, within the IMRaD structure (Introduction, Methods, Results and Discussion), the article sentences that are similar to abstract sentences are located, using a corpus of over 85,000 research articles published in the seven PLOS journals. We provide evidence that 84% of abstracts have at least one sentence in common with the body of the article. Our results also show that the sections of the paper from which abstract sentences are taken are invariant across the PLOS journals, with sentences mainly coming from the beginning of the introduction and the end of the conclusion.
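The abstract names "sentence-based similarity metrics" without specifying them, so the following is only a minimal sketch of the general idea, assuming cosine similarity over bag-of-words term counts and a hypothetical threshold of 0.8. For each abstract sentence it finds the most similar sentence in the article body, then records the section that sentence comes from and its relative position within that section (0 = first sentence, 1 = last). The names `tokenize`, `cosine`, `match_abstract` and the threshold value are illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase alphanumeric word tokens (a deliberately simple tokenizer, assumed)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_abstract(abstract_sentences, sections, threshold=0.8):
    """Match each abstract sentence to its most similar body sentence.

    sections: list of (section_name, [sentences]) in document order,
    e.g. the IMRaD sections of one article.
    Returns (abstract_index, section_name, relative_position, score) tuples
    for best matches whose score reaches the (hypothetical) threshold.
    """
    # Precompute each body sentence's section, relative position and term counts.
    body = []
    for name, sentences in sections:
        n = len(sentences)
        for i, sentence in enumerate(sentences):
            relative_position = i / (n - 1) if n > 1 else 0.0
            body.append((name, relative_position, Counter(tokenize(sentence))))

    matches = []
    for k, sentence in enumerate(abstract_sentences):
        vector = Counter(tokenize(sentence))
        best = max(body, key=lambda entry: cosine(vector, entry[2]), default=None)
        if best is None:  # article body is empty
            continue
        score = cosine(vector, best[2])
        if score >= threshold:
            matches.append((k, best[0], best[1], score))
    return matches


if __name__ == "__main__":
    abstract = [
        "We measure text reuse between abstracts and article bodies.",
        "Completely unrelated sentence about weather.",
    ]
    sections = [
        ("Introduction", [
            "We measure text reuse between abstracts and article bodies.",
            "Prior work studied abstract writing.",
        ]),
        ("Discussion", ["Abstracts mostly reuse introduction sentences."]),
    ]
    for k, section, position, score in match_abstract(abstract, sections):
        print(k, section, round(position, 2), round(score, 2))  # prints: 0 Introduction 0.0 1.0
```

An abstract counts as reusing body text when `match_abstract` returns at least one tuple; aggregating that flag and the recorded positions over a corpus gives the kind of statistics the abstract reports. Bag-of-words cosine is chosen here only because it is short and dependency-free; it ignores word order, so a stricter measure may be preferable in practice.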
Introduction
Scientific abstracts contain the information that the author(s) consider best describes a document’s content. They represent a compressed view of the informational content of a document and allow readers to evaluate the relevance of the document to a particular information need. According to Hartley (2008), an abstract summarises the content of an article in a way comparable to its title and key words, but at a different degree of detail:
“All articles begin with a title. Most include an abstract. Several include ‘key words’. All three of these features describe an article’s content in varying degrees of detail and abstraction. The title is designed to stimulate the reader’s interest. The abstract summarises the content.” (Hartley, 2008: p. 23).
Given the difficulties in obtaining and processing the full text of scientific documents, and the fact that large-scale databases typically index abstracts, most bibliometric studies use abstracts as a proxy for the content of scientific articles. Working with abstracts rather than the entire text of articles is motivated by the fact that, by definition, abstracts are intended to represent as much as possible of the quantitative and qualitative information in documents. Moreover, abstracts are relatively short (between 150 and 300 words), which allows efficient processing, and they are often available as part of the metadata of scientific articles.
However, abstracts reproduce only part of the information and of the argumentative complexity of a scientific article. Previous work on the topic has provided recommendations on how to write an effective abstract (Andrade, 2011), on conventions in abstract writing (Hernon & Schwartz, 2010; Swales & Feak, 2009), and on the advantages of structured abstracts (Hartley, 2014; Hartley & Sydes, 1997). An important question arises: to what extent, and with what accuracy, do scientific abstracts reflect an article’s content? Studying the properties of abstracts and, more specifically, the relationships between abstracts and the full text of papers can provide important insight into the structure of scientific writing and into the possible biases introduced by representing scientific articles by their abstracts.
Since abstracts include only a limited portion of an article’s information, they convey only part of the originality and relevance of the research study.