Analyzing Linguistic Complexity and Scientific Impact
📝 Abstract
The number of publications and the number of citations received have become the most common indicators of scholarly success. In this context, scientific writing plays an increasingly important role in scholars’ careers. To understand the relationship between scientific writing and scientific impact, this paper selected 12 variables of linguistic complexity as a proxy for scientific writing. We then analyzed these features in 36,400 full-text Biology articles and 1,797 full-text Psychology articles. These features were compared against the scientific impact of the articles, grouped into high, medium, and low categories. The results suggested no practically significant relationship between linguistic complexity and citation strata in either discipline. This suggests that textual complexity plays little role in scientific impact in our data sets.
📄 Content
Chao Lu¹, Yi Bu²,³, Xianlei Dong⁴, Jie Wang⁵, Ying Ding²,⁶, Vincent Larivière⁷, Cassidy R. Sugimoto², Logan Paul², Chengzhi Zhang¹*
1. School of Economics and Management, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
2. School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, U.S.A.
3. Center for Complex Networks and Systems Research, School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, U.S.A.
4. School of Management Science and Engineering, Shandong Normal University, Jinan, Shandong, China
5. School of Information Management, Nanjing University, Nanjing, Jiangsu, China
6. School of Information Management, Wuhan University, Wuhan, Hubei, China
7. École de bibliothéconomie et des sciences de l’information, Université de Montréal, Montréal, Québec, Canada
Corresponding author: Chengzhi Zhang, Email: zhangcz@njust.edu.cn
Keywords: English scientific writing; syntactic complexity; lexical complexity; lexical
diversity; lexical density; lexical sophistication.
1. INTRODUCTION
The success of scholars can be assessed with the aid of several indicators (e.g., high-impact publications, distinguished positions, and prizes). Among these, high-impact publications have become one of the most important criteria, and several scholars have attempted to understand the factors that affect the impact of scholarly works (e.g., Amjad et al., 2017; Onodera & Yoshikane, 2015). Wang, Song & Barabási (2013) found that fitness (accounting for the perceived novelty and importance of a discovery) plays a vital role in the long-term impact of a work. Amjad et al. (2017) found that collaboration with advanced researchers correlates positively with the citation counts of publications in a domain. Other variables have also been shown to correlate with citation counts, such as publication venues and review cycles (Larivière & Gingras, 2010; Shen et al., 2015; Onodera & Yoshikane, 2015; Tang, Shapira, & Youtie, 2015; Waltman, 2016) as well as collaboration (Larivière, Gingras, Sugimoto, & Tsou, 2015; Wu, Wang, & Evans, 2019; Wuchty, Jones, & Uzzi, 2007; Zhang, Bu, Ding, & Xu, 2018).
The growth of big data provides new opportunities and challenges for the field of bibliometrics (Ding et al., 2014). The advent of digital publishing, which has led to an abundance of structured full-text scholarly documents, allows for opportunities that did not exist when Garfield developed the Science Citation Index. Several publishers provide the full text of open-access papers (e.g., PLoS), which can be used to enrich existing studies (Ding & Stirling, 2016). The availability of both full text and metadata has allowed for the combination of computational linguistics and citation analysis (e.g., Bertin, Atanassova, Gingras, & Larivière, 2016; Ding et al., 2013; McKeown et al., 2016; Teufel, 2000; Wan & Liu, 2014).
Several studies have examined the relationship between writing features and scientific impact, focusing on, inter alia, title length, abstract length, keyword selection, and figure usage within the text (e.g., Didegah & Thelwall, 2013; Lee, West, & Howe, 2018; Moat & Preis, 2015; Uddin & Khan, 2016). However, these studies have emphasized external features or descriptive attributes rather than the internal structure and writing. To address this gap, we examined the relationship between linguistic complexity and scientific impact, following the framework developed in Lu et al. (2019). Linguistic complexity generally takes two forms, syntactic and lexical (Levinson, 2007; Lu et al., 2019; Nolan, 2013), by which the variety and sophistication of forms in language production can be quantitatively measured. Taking advantage of the recent development in computational linguistics techniques and the availability of the full-t