On the Composition of Scientific Abstracts


📝 Abstract

Scientific abstracts contain what the author(s) consider to be the information that best describes a document’s content. They represent a compressed view of the informational content of a document and allow readers to evaluate the relevance of the document to a particular information need. However, little is known about their composition. This paper contributes to the understanding of the structure of abstracts by comparing scientific abstracts with the text content of research articles. More specifically, using sentence-based similarity metrics, we quantify the phenomenon of text re-use in abstracts and examine where, in the IMRaD structure (Introduction, Methods, Results and Discussion), the body sentences that are similar to abstract sentences are located, using a corpus of over 85,000 research articles published in the seven PLOS journals. We provide evidence that 84% of abstracts have at least one sentence in common with the body of the article. Our results also show that the sections of the paper from which abstract sentences are taken are invariant across the PLOS journals, with sentences mainly coming from the beginning of the introduction and the end of the conclusion.

📄 Content

Preprint: I. Atanassova, M. Bertin, V. Lariviere (2016) On the Composition of Scientific Abstracts, Journal of Documentation, vol. 72, issue 4. Submitted 12-Sep-2015, accepted 10-Feb-2016.

On the Composition of Scientific Abstracts

Iana Atanassova (1), Marc Bertin (2) and Vincent Lariviere (3)

(1) iana.atanassova@univ-fcomte.fr, Centre de Recherche en Linguistique et Traitement Automatique des Langues “Lucien Tesnière”, Université de Franche-Comté, 30 rue Mégevand, Besançon 25000 (France)
(2) bertin.marc@gmail.com, Centre Interuniversitaire de Recherche sur la Science et la Technologie (CIRST), Université du Quebec à Montréal, CP 8888, Succ. Centre-Ville, Montreal, QC. H3C 3P8 (Canada)
(3) vincent.lariviere@umontreal.ca, École de bibliothéconomie et des sciences de l’information, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, QC. H3C 3J7 (Canada) and Observatoire des Sciences et des Technologies (OST), Centre Interuniversitaire de Recherche sur la Science et la Technologie (CIRST), Université du Quebec à Montréal, CP 8888, Succ. Centre-Ville, Montreal, QC. H3C 3P8 (Canada)

Introduction

Scientific abstracts contain what the author(s) consider to be the information that best describes a document’s content. They represent a compressed view of the informational content of a document and allow readers to evaluate the relevance of the document to a particular information need. According to Hartley (2008), an abstract gives a summary of the content of an article that is comparable to its title and key words but provides a different degree of detail: “All articles begin with a title. Most include an abstract. Several include ‘key words’. All three of these features describe an article’s content in varying degrees of detail and abstraction. The title is designed to stimulate the reader’s interest. The abstract summarises the content.” (Hartley, 2008: p. 23).

Given the difficulties in obtaining and processing the full text of scientific documents, as well as the fact that large-scale databases typically index abstracts, most bibliometric studies use abstracts as a proxy for the content of scientific articles. The motivation for working with abstracts rather than the entire text body of articles is that, by definition, abstracts are intended to represent as much as possible of the quantitative and qualitative information in documents. Moreover, abstracts are relatively short (between 150 and 300 words), which allows efficient processing, and they are often available as part of the metadata of scientific articles.

However, abstracts reproduce only part of the information and the complexity of argumentation in a scientific article. Previous work on the topic has provided recommendations on how to write an efficient abstract (Andrade, 2011), on conventions in abstract writing (Hernon & Schwartz, 2010; Swales & Feak, 2009), as well as on the advantages of structured abstracts (Hartley, 2014; Hartley & Sydes, 1997). An important question arises: to what extent and with what accuracy do scientific abstracts reflect an article’s content? Studying the properties of abstracts and, more specifically, the relationships that exist between abstracts and the full text of papers can provide important insight into the structure of scientific writing and the possible biases related to representing scientific articles by their abstracts.

Since abstracts include very limited information about an article, they convey only part of the originality and relevance of the research study.
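
The sentence-level comparison between an abstract and the article body that the abstract describes can be made concrete with a small sketch. This is an illustration only, not the authors' method: the Jaccard token-overlap measure, the 0.5 threshold, the naive regular-expression sentence splitter, and all function names are assumptions introduced here. For each abstract sentence it finds the most similar body sentence and reports its relative position in the body (0.0 for the first sentence, 1.0 for the last).

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(sentence):
    """Lowercased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of the token sets of two sentences (assumed metric)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def reused_sentences(abstract, body, threshold=0.5):
    """Match each abstract sentence to its most similar body sentence.

    Returns a list of (abstract_sentence, body_index, relative_position, score)
    for matches whose score is at or above the (hypothetical) threshold.
    """
    body_sents = split_sentences(body)
    matches = []
    for a in split_sentences(abstract):
        best_idx, best_score = -1, 0.0
        for i, b in enumerate(body_sents):
            score = jaccard(a, b)
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx >= 0 and best_score >= threshold:
            # Relative position of the matched sentence within the body.
            position = best_idx / max(len(body_sents) - 1, 1)
            matches.append((a, best_idx, position, best_score))
    return matches
```

Aggregating the relative positions over many articles, per section, would give a rough picture of where re-used sentences tend to sit; a real study would need a proper sentence segmenter and a carefully chosen similarity measure.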
