Search for Evergreens in Science: A Functional Data Analysis

Evergreens in science are papers that display a continual rise in annual citations without decline, at least within a sufficiently long time period. Aiming to better understand evergreens in particular and patterns of citation trajectory in general, …

Authors: Ruizhi Zhang, Jian Wang, Yajun Mei

Ruizhi Zhang¹, Jian Wang²,³ & Yajun Mei¹

¹ H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology
² Center for R&D Monitoring and Department of Managerial Economics, Strategy & Innovation, KU Leuven
³ German Center for Higher Education Research and Science Studies, DZHW Berlin

Emails: zrz123@gatech.edu, jian.wang@kuleuven.be, ymei@isye.gatech.edu

May 17, 2017

Abstract

Evergreens in science are papers that display a continual rise in annual citations without decline, at least within a sufficiently long time period. Aiming to better understand evergreens in particular and patterns of citation trajectory in general, this paper develops a functional data analysis method to cluster citation trajectories of a sample of 1699 research papers published in 1980 in the American Physical Society (APS) journals. We propose a functional Poisson regression model for individual papers’ citation trajectories, and fit the model to the observed 30-year citations of individual papers by functional principal component analysis and maximum likelihood estimation. Based on the estimated paper-specific coefficients, we apply the K-means clustering algorithm to cluster papers into different groups, for uncovering general types of citation trajectories. The result demonstrates the existence of an evergreen cluster of papers that do not exhibit any decline in annual citations over 30 years.

Keywords: citation trajectory; evergreen; functional Poisson regression; functional principal component analysis; K-means clustering

∗ Ruizhi Zhang, Jian Wang & Yajun Mei. (2017). Search for evergreens in science: A functional data analysis. Journal of Informetrics, 11(3), 629–644. http://dx.doi.org/10.1016/j.joi.2017.05.007 © 2017 Elsevier Ltd.
The authors thank the editor and three anonymous referees for their constructive comments, which have substantially improved this paper. R. Zhang and Y. Mei were supported in part by the NSF grant CMMI-1362876, and J. Wang by a postdoctoral fellowship from the Research Foundation – Flanders (FWO). Data used in this paper are from a bibliometric database developed by the Competence Center for Bibliometrics for the German Science System (KB) and derived from the 1980 to 2012 Science Citation Index Expanded (SCI-E), Social Sciences Citation Index (SSCI), Arts and Humanities Citation Index (AHCI), Conference Proceedings Citation Index–Science (CPCI-S), and Conference Proceedings Citation Index–Social Science & Humanities (CPCI-SSH) prepared by Thomson Reuters (Scientific) Inc. (TR®), Philadelphia, Pennsylvania, USA: © Copyright Thomson Reuters (Scientific) 2013. KB is funded by the German Federal Ministry of Education and Research (BMBF, project number: 01PQ08004A).

1. Introduction

Science is a skewed world where a small number of publications receive a disproportionate amount of citations. What do citation trajectories of the most cited papers look like? Do they follow the “typical” citation trajectory documented in the literature, specifically, the annual citations of a paper rise to a peak in the first few years after publication and then slowly fade away over time? Fig. 1 plots annual citations of the top ten most cited papers published in the American Physical Society (APS) journals, and their annual citations are counted in the Web of Science (WoS) from the year of publication to 2016. Among them the youngest was published in 1999, and the oldest 1964.
Correspondingly, the length of their observed citation trajectories ranges from 18 to 53 years (Appendix A). In addition to their exceptionally large number of citations, a remarkable observation is that most of them (at least seven out of ten) do not even show any sign that their annual citations are about to peak and will start to decline in the near future. We refer to this phenomenon of continual rise in annual citations without decline as evergreens, which clearly violates the “typical” pattern of citation trajectory. Although we cannot predict whether these papers will remain highly cited in the future, the fact that they have not yet become obsolete after up to 53 years calls for attention, especially considering that the majority of papers reach their citation peak around the 3rd or 5th year after publication and that most bibliometric analyses examine citations in a relatively short time window.

Insert Fig. 1 here

The objective of this paper is to better understand evergreens in particular and patterns of citation trajectory in general. Moreover, do evergreens constitute a general type of citation trajectory, or are they so rare that they cannot be captured in any statistical cluster analysis?

To this end, we develop a functional data analysis (FDA) method to analyze the 30-year citation trajectories of a sample of publications published in 1980 in APS journals. Our FDA method integrates functional principal components analysis, Poisson regression, and K-means clustering.
More specifically, we model the citation trajectories of individual publications by a small number of common basis functions and paper-specific coefficients on these basis functions. For each paper, its 30-dimensional vector of citations can be characterized by its coefficients on the common basis functions, which subsequently serve as inputs for the K-means clustering, to uncover general types of citation trajectories. Results of our cluster analysis provide strong evidence that evergreens exist as a general class of citation trajectory. In addition, we are not able to predict whether a paper will become an evergreen by some ex ante paper features such as the number of authors and references.

The remainder of this paper is organized as follows. We begin with a brief review of previous cluster analyses of citation trajectories, as well as the method of functional data analysis, followed by a description of our dataset. Next, our proposed model and method are presented, with the emphasis on how to combine functional principal component analysis, Poisson regression, and the K-means clustering algorithm for modeling and clustering citation trajectories. Then we report the empirical results of applying our proposed model and method to the real citation dataset. Implications of our findings are also discussed.

2. Prior literature

2.1. Clustering citation trajectories

Citation ageing has been a long-standing research topic, and different patterns of citation trajectories have been documented in the bibliometrics literature (Avramescu, 1979; Garfield, 1980; Aversa, 1985; Line, 1993; Glänzel & Schoepflin, 1995; Redner, 2005; Rogers, 2010; Wang, 2013; Baumgartner & Leydesdorff, 2014). Aversa (1985) conducted probably the first rigorous statistical analysis of citation trajectories, investigating 9-year citation trajectories of 400 highly cited papers published in 1972 and applying the K-means clustering algorithm to the normalized annual citation counts (i.e., annual citations divided by total citations in the whole studied time period). Aversa (1985) identified two clusters: delayed rise-slow decline and early rise-rapid decline. Costas, van Leeuwen, and van Raan (2010) analyzed about 30 million documents in WoS published between 1980 and 2008. Following Price’s observation, documented in his personal communication to Aversa (1985), Costas et al. (2010) classified papers into three categories: 50% of papers as normal documents, 25% as delayed documents, and 25% as flashes-in-the-pan. However, these three clusters are defined based on a single real-valued summary statistic of individual papers, Year 50%, defined as the year when a paper has cumulated half of its total citations up to year 2008. In addition, there is no statistical justification for the proportions of these three clusters.
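The two summary quantities mentioned above, the normalized annual citation counts used by Aversa (1985) and the Year 50% statistic used by Costas et al. (2010), can be sketched in a few lines of Python. This is only an illustration: the function names and the toy citation counts are hypothetical, and the tie-breaking convention (the first year in which the cumulative count reaches at least half of the total) is an assumption rather than something specified in the text.

```python
from itertools import accumulate


def normalized_annual(annual):
    """Annual citations divided by total citations over the studied window."""
    total = sum(annual)
    if total == 0:
        raise ValueError("paper has no citations in the window")
    return [c / total for c in annual]


def year_50_percent(annual):
    """First year (1-based, counted from publication) in which the cumulative
    count reaches at least half of the window total (assumed convention)."""
    total = sum(annual)
    if total == 0:
        raise ValueError("paper has no citations in the window")
    for year, cumulative in enumerate(accumulate(annual), start=1):
        if 2 * cumulative >= total:
            return year


# Hypothetical 9-year annual citation counts (not taken from any dataset).
early_peak = [10, 20, 15, 8, 4, 2, 1, 0, 0]
late_riser = [0, 1, 2, 4, 6, 9, 12, 15, 18]

print(year_50_percent(early_peak), year_50_percent(late_riser))
print([round(v, 3) for v in normalized_annual(early_peak)])
```

A single number such as Year 50% separates the two toy papers, but it cannot distinguish a late peak followed by decline from a curve that never declines, which is the limitation noted above.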
More recently, Colavizza and Franceschet (2016) examined about half a million papers published in APS journals and applied the spectral clustering method on the normalized annual citations received by these papers within the APS database. The three identified general types of citation trajectories are middle-of-the-roads, sprinters, and marathoners. Middle-of-the-roads papers display an average citation ageing pattern, and can be viewed as corresponding to normal documents. Sprinters have an early and high peak and a fast decline, which can be viewed as flashes-in-the-pan. Marathoners represent “fast or slow-rise, moderately peaked histories, followed by a slow decline, or absence of decline, or even a constant rise in received citations over time” and therefore can correspond to delayed documents or evergreens. The phenomenon of evergreens, which was emphasized by Avramescu (1979) and Price (see Aversa (1985)), was not identified by the clustering analyses in Aversa (1985) and Costas et al. (2010), while marathoners in some specifications in Colavizza and Franceschet (2016) also display a continually increasing annual citation curve.

2.2. Functional data analysis

Functional data analysis (FDA) is a relatively recent development in the field of statistics and has grown tremendously over the past decades (Besse & Ramsay, 1986; Rice & Silverman, 1991; Hoover, Rice, Wu, & Yang, 1998; Ramsay & Silverman, 2005; Yao, Müller, & Wang, 2005; Hall, Müller, & Wang, 2006; Leng & Müller, 2006; Hadjipantelis, Aston, & Evans, 2012).
FDA might be particularly useful for bibliometric analysis for two reasons. First, FDA is a non-parametric method and is therefore useful for analyzing bibliometric data for which the underlying distribution is often unclear. Second, FDA analyzes high-dimensional data, such as curves and shapes, which are of particular interest to bibliometric studies. Using regression analysis as an analogy, while traditional regression analysis only allows one real-valued dependent variable, FDA allows both dependent and independent variables to be multidimensional.

Most FDA methods deal with continuous data, but paper citations to be analyzed in this study are discrete count data. There are only a few FDA studies dealing with count variables (van der Linde, 2009; Serban, Staicu, & Carroll, 2013; Wu, Müller, & Zhang, 2013), and our proposed method is different from these limited existing FDA methods for Poisson data. Specifically, we adapt the methods in Rice and Silverman (1991) from Gaussian distributed data to Poisson count data by exploring the close relationship between Poisson and Gaussian distributions.

3. Data

The data used for this study are research papers published in 1980 in the American Physical Society (APS) journals, specifically six journals which were active in 1980: Physical Review A, B, C, D, Physical Review Letters, and Reviews of Modern Physics. APS journal paper citation trajectories have been extensively studied in prior literature (Redner, 2005; Wang, Song, & Barabási, 2013; Colavizza & Franceschet, 2016).
We only include original research papers labeled as “article” and exclude other document types such as “review” or “note.” There are a total of 4023 research articles, and their cumulative citations in the first 30 years after publication, i.e., between 1980 and 2009, are retrieved from the Web of Science (WoS). Since a sufficient number of citations is required for reliable modeling of the citation trajectories (Aversa, 1985; Wang et al., 2013; Colavizza & Franceschet, 2016), we decide to focus on papers with at least 30 citations in the first 30 years after publication. The resulting dataset consists of 1699 papers. For a robustness check, we tested two different thresholds; specifically, we replicated the analysis using two different datasets: (1) papers with at least 20 total citations and (2) papers with at least 50 total citations. We obtained consistent clustering results (Appendix B).

Insert Fig. 2 here

There is considerable variation in individual papers’ citation trajectories in our dataset. Fig. 2 plots the citation trajectories of four selected papers. Three of them loosely resemble the three general types labeled by Costas et al. (2010) as flash-in-the-pan (red curve), normal document (blue curve), and delayed document (purple curve). The normal document (blue) follows the “typical” citation ageing pattern, where the citations gradually increase and then decrease over time.
The flash-in-the-pan (red) has relatively faster citation rising and declining processes, while the delayed document (purple) has relatively slower citation rising and declining processes. All these three types follow the “typical” pattern of citation trajectory, although they vary in the general speed of citation ageing. However, the green curve has a continually rising annual citation curve without a declining stage during the first 30 years after being published. The green curve in Fig. 2 illustrates the annual and cumulative citations of the paper in Physical Review Letters entitled “Ground State of the Electron Gas by a Stochastic Method” coauthored by Ceperley and Alder, which is the most cited paper up to year 2009 among all papers published in 1980 in APS journals.

In addition, Fig. 2 suggests a high level of uncertainty in paper citations and the difficulty of using short-term citations to predict long-term citations. Specifically, while evergreen papers would eventually become extremely highly cited, their citations in the first few years are not necessarily very large. Moreover, the distribution of the 30-year cumulative citation counts is highly skewed: 10 papers (0.6%) in our sample have citations greater than 1000, 50 papers (2.9%) greater than 400, and 1244 papers (73.2%) fewer than 100. This implies that the distribution of papers across different types of citation trajectories might also be uneven.

4. Methodology

The objective of this paper is to empirically uncover general types of citation trajectories based on the observed paper citation time series and examine whether evergreens constitute a general type of citation trajectory. The main idea is to use functional principal component analysis and Poisson regression to model citation trajectories. This approach allows us to conduct dimension reduction, that is, to characterize the vector of 30-year citation counts of a paper by a smaller number of parameters derived from our model. Subsequently these parameters can be used as inputs for the K-means cluster analysis for uncovering general types of citation trajectories.

4.1. Functional Poisson regression model

We develop a nonparametric model for the cumulative citations based on functional Poisson regression. The nonparametric approach does not impose any theoretical assumptions on the mechanisms underlying the citation process but lets the data speak for themselves. We adopt this explorative approach in order to better understand divergent citation patterns in real life.

For each paper i = 1, …, N, $X_i(j)$ denotes the observed cumulative number of citations for the i-th paper in year j after being published, where j is a discrete time variable (i.e., year) and j = 1, …, T. For notational convenience, we denote $X_i(0) = 0$. Our proposed functional Poisson regression model assumes that the observed cumulative citations $X_i(j)$ are the realization of a counting process $X_i(t)$ for the continuous time variable t and 0 ≤ t ≤ T:

$$X_i(t) \sim \mathrm{Poisson}\big(\mu_i(t)\big) \qquad (1)$$

for t ≥ 0, where the mean function $\mu_i(t)$ satisfies

$$\sqrt{\mu_i(t)} = \theta(t) + \sum_{\nu=1}^{p} \beta_{i\nu}\,\phi_\nu(t) \qquad (2)$$

and the function $\theta(t)$ and the basis functions $\phi_\nu(t)$ are smooth functions of t that are the same for all papers. Their estimation will be further discussed later.

In Equation (2) we adopt a square-root transformation for the mean function $\mu_i(t)$. Note that for Poisson regression, or more generally Generalized Linear Models, there are two popular transformations for the mean $\mu_i(t)$ of the count data: the log-transformation $\log(\mu_i(t))$ and the square-root transformation $\sqrt{\mu_i(t)}$ (Nelder & Baker, 1972; McCullagh & Nelder, 1989). For any given basis functions $\phi_\nu(t)$, both transformation strategies have been widely used in the statistics literature, and which transformation is better depends on the specific application and dataset. In the context of this study, the square-root transformation is preferable. As will be explained in Section 4.2, in our functional principal component analysis for deriving basis functions, we will approximate the Poisson distribution of citation counts by a Gaussian distribution via the square-root transformation, which allows us to take advantage of the rich literature of FDA for Gaussian distributed data (Rice & Silverman, 1991; Ramsay & Silverman, 2005). Therefore, adopting the square-root transformation strategy here for the Poisson regression matches the square-root transformation in functional principal component analysis and consequently yields a better fit to the data.
In addition, in standard principal component analysis, the number p of basis functions is assumed to be relatively small, while the retained basis functions should be able to explain most information of the original data. Under our model in Equations (1) and (2), the goal is essentially to find an estimate $\hat{\mu}_i(t)$ that is a smooth version of $X_i(t)$ with a certain correlation structure. In addition, it is also useful to think of our proposed model as a dimension reduction, representing the T-dimensional cumulative citations of a paper by a p-dimensional vector of coefficients $\beta_{i\nu}$. Subsequently, the problem of identifying general citation patterns can be reduced to the cluster analysis of the p-dimensional vector of coefficients.

4.2. Model parameter estimation

When fitting the functional Poisson regression model in Equations (1) and (2) to the observed cumulative citations $X_i(j)$ of the N papers, we need to estimate two kinds of unknown quantities: the common basis functions $\theta(t)$ and $\phi_\nu(t)$, which are the same for all papers, and the paper-specific coefficients $\beta_{i\nu}$, which are tailored for each paper individually. Clearly they are closely related, and there are no unique estimation methods. Here we propose to estimate them by using the functional principal component analysis method and Poisson regression, respectively.

Regarding the estimation of the common basis functions $\theta(t)$ and $\phi_\nu(t)$ in Equation (2), intuitively one should use information across all the observed N papers.
From the functional decomposition viewpoint, these basis functions can be any set of orthogonal bases, although some bases are more efficient than others. In the functional data analysis literature, the estimation of these basis functions has been well-studied for Gaussian distributed data, e.g., Rice and Silverman (1991) and Ramsay and Silverman (2005). Here we propose to adapt these prior methods to Poisson count data by exploring the close relationship between Poisson and Gaussian distributions. For a Poisson random variable X with a large mean μ > 0, a well-known fact is $\sqrt{X} \approx N(\sqrt{\mu}, 1/4)$ (Thacker & Bromiley, 2001). Note that the variance of $\sqrt{X}$ is approximately constant, and thus the square-root transformation of Poisson data is also referred to as the variance-stabilizing transformation in the statistical literature (Anscombe, 1948). Brown, Carter, Low, and Zhang (2004) also used the square-root transformation to establish the global asymptotic equivalence between Poisson process and Gaussian process.

In this paper we consider the square-root transformation of the count variable, $\sqrt{X_i(t)}$, so that the bases $\theta(t)$ and $\phi_\nu(t)$ in Equation (2) can be estimated by applying the rich functional data analysis literature to the “approximate Gaussian” data $\sqrt{X_i(t)}$, e.g., Rice and Silverman (1991) and Ramsay and Silverman (2005). Specifically, the square-root transformations of the observed citation counts for each paper can be modeled as independent realizations of a stochastic process $Y(t) = \sqrt{X(t)}$, with mean $E[Y(t)] = \theta(t)$ and covariance function $G(s,t) = \mathrm{cov}(Y(s), Y(t))$. We assume that there is an orthogonal expansion (in the $L^2$ sense) of $G(s,t)$ in terms of eigenfunctions $\phi_\nu$:

$$G(s,t) = \sum_{\nu=1}^{\infty} \lambda_\nu\,\phi_\nu(s)\,\phi_\nu(t) \qquad (3)$$

Then, according to the Karhunen–Loève expansion theorem, a random citation curve $Y_i(t) = \sqrt{X_i(t)}$ can be expressed as

$$\sqrt{X_i(t)} = \theta(t) + \sum_{\nu=1}^{\infty} \beta_{i\nu}\,\phi_\nu(t) \qquad (4)$$

where the coefficients $\beta_{i\nu}$ are uncorrelated random variables with mean 0 and variance $E[\beta_{i\nu}^2] = \lambda_\nu$ (Rice & Silverman, 1991; Hall et al., 2006). Therefore, the functions $\theta(t)$ and $\phi_\nu(t)$ in Equation (4) are closely related to the mean function and correlation function of the stochastic process $Y(t) = \sqrt{X(t)}$, and we will use them as the basis functions in Equation (2). The basis functions $\theta(t)$ and $\phi_\nu(t)$ in Equation (4) for Gaussian data can be estimated by the spline smoothing and functional principal component analysis methods in Rice and Silverman (1991) and Ramsay and Silverman (2005).

After estimating the common basis functions $\theta(t)$ and $\phi_\nu(t)$, the next step is to estimate the coefficients $\beta_{i\nu}$ in the standard Poisson regression model from the observed raw citations $X_i(j)$. This can be done by maximum likelihood estimation for Poisson regression, which is implemented in many statistical packages.
In our analysis, the estimation of the coefficients $\beta_{i\nu}$ is done on a Windows 8 laptop with an Intel i7-4510U CPU (2.0 GHz) by using the glm() function in the free statistical software R (version 3.1.1).

4.3. Cluster analysis

Given that the N papers and their corresponding cumulative citation curves $X_i(t)$ can be represented as N points in the p-dimensional space of coefficients $(\beta_{i1}, \ldots, \beta_{ip})$, we propose to conduct cluster analysis by applying the K-means clustering algorithm to the reduced p-dimensional coefficient space. In addition, in this p-dimensional coefficient space, the coefficients $\beta_{i\nu}$ in Equation (2) correspond to different basis functions $\phi_\nu(t)$ and vary considerably in scale. Therefore, we first standardize the coefficients $\beta_{i\nu}$ by

$$\tilde{\beta}_{i\nu} = \frac{\beta_{i\nu} - \bar{\beta}_\nu}{s_\nu} \qquad (5)$$

where $\bar{\beta}_\nu$ and $s_\nu$ are respectively the mean and standard deviation of the N fitted coefficient values $(\beta_{1\nu}, \ldots, \beta_{N\nu})$, for each principal component ν = 1, …, p. Subsequently, we define the distance between papers in terms of citation trajectories as the Euclidean distance of the standardized coefficients $(\tilde{\beta}_{i1}, \ldots, \tilde{\beta}_{ip})$ in the p-dimensional space, based on which we use the K-means clustering algorithm to cluster papers into K different groups. Given the explorative nature of this study, we experiment and compare clustering results for K = 2, 3, 4, 5, and 6 clusters.

4.4. Summary of methodology

Our proposed functional Poisson regression model for clustering paper citation trajectories can be summarized as follows.
• Given the T-year cumulative citation trajectories of N papers $X_i(j)$ for i = 1, 2, …, N, and j = 1, 2, …, T, first derive the square-root transformed data, $Y_i(j) = \sqrt{X_i(j)}$.
• Estimate the mean function $\theta(t)$ and eigenfunctions $\phi_\nu(t)$ of the transformed data $Y_i(j)$, using functional principal component analysis.
• Determine p, the number of eigenfunctions $\phi_\nu(t)$ to retain.
• For each individual paper i, use the mean function $\theta(t)$ and the p eigenfunctions $\phi_\nu(t)$ as basis functions and fit a Poisson regression model to its observed cumulative citation trajectory $(X_i(1), X_i(2), \ldots, X_i(T))$. This yields, for each individual paper, the estimated coefficients $\beta_{i1}, \beta_{i2}, \ldots, \beta_{ip}$. Accordingly, the T-dimensional vector of cumulative citations for paper i, $(X_i(1), X_i(2), \ldots, X_i(T))$, can be represented by its p-dimensional vector of coefficients, $(\beta_{i1}, \beta_{i2}, \ldots, \beta_{ip})$.
• Standardize each coefficient $\beta_{i\nu}$ by $\tilde{\beta}_{i\nu} = (\beta_{i\nu} - \bar{\beta}_\nu)/s_\nu$, where $\bar{\beta}_\nu$ and $s_\nu$ are the mean and standard deviation of the N fitted coefficient values $(\beta_{1\nu}, \beta_{2\nu}, \ldots, \beta_{N\nu})$, for each principal component ν = 1, …, p.
• Apply the K-means clustering algorithm to the standardized coefficients $\tilde{\beta}_{i\nu}$ to group the N papers into K clusters.

5. Results

This section reports the numerical results of applying our proposed model and method to our sample of 1699 APS journal papers.

5.1. Estimating basis functions

The basis functions $\theta(t)$ and $\phi_\nu(t)$ play an important role in our proposed model and method, and they are estimated in R (version 3.1.1) using the codes of Ramsay, Hooker, and Graves (2009).

Insert Fig. 3 here

Fig. 3 plots the estimated mean curve $\theta(t)$ and its first derivative $\theta'(t)$. Here $\theta(t)$ and $\theta'(t)$ are closely related to the average cumulative citations and average annual citations over time, respectively. The estimated first derivative $\theta'(t)$ is positive but decreases over time. This is consistent with the “typical” citation pattern that the annual citations generally are the largest in early years and subsequently decline slowly.

Insert Fig. 4 here

Fig. 4 plots the estimated smoothed versions of the first four eigenfunctions $\phi_\nu(t)$. They correspond to the four largest eigenvalues of 299.86, 16.39, 2.17 and 0.65, and these four eigenfunctions account for 93.8%, 5.1%, 0.7% and 0.2% of the total variability, respectively. The shape of these eigenfunctions indicates how a paper’s cumulative citation trajectory might deviate from the mean curve $\theta(t)$. Specifically, the first smoothed eigenfunction $\phi_1(t)$ is positive and monotonically increasing. Therefore, if a paper has a positive coefficient on $\phi_1(t)$, then this paper will have more citations than the average paper (i.e., the mean curve) across all years, and more importantly its advantage over the average paper magnifies over time. This observation is consistent with the well-known cumulative advantage or preferential attachment phenomenon in citations.
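The interpretation of the first eigenfunction can be checked with a small numerical illustration in Python. Under the square-root link of Equation (2), a paper with coefficient $\beta$ on $\phi_1(t)$ and zero coefficients elsewhere has expected cumulative citations $(\theta(t) + \beta\phi_1(t))^2$, compared with $\theta(t)^2$ for a paper with all coefficients equal to zero. The function name and all values below are hypothetical and only mimic the qualitative shapes described for Fig. 3 and Fig. 4 (both curves positive and increasing); they are not the estimated curves.

```python
def advantage_over_mean(theta, phi1, beta1):
    """Expected cumulative-citation gap between a paper with coefficient beta1
    on the first eigenfunction and a paper with all coefficients zero:
    (theta + beta1 * phi1)^2 - theta^2, evaluated year by year."""
    return [(th + beta1 * ph) ** 2 - th ** 2 for th, ph in zip(theta, phi1)]


# Hypothetical values on a 5-year grid; both curves are positive and increasing.
theta = [2.0, 3.0, 3.8, 4.5, 5.0]
phi1 = [0.10, 0.15, 0.19, 0.22, 0.24]

gap = advantage_over_mean(theta, phi1, beta1=4.0)
print([round(g, 2) for g in gap])
```

Expanding the square gives a gap of $2\theta(t)\beta\phi_1(t) + \beta^2\phi_1(t)^2$, which grows with t whenever $\beta > 0$ and both $\theta(t)$ and $\phi_1(t)$ are positive and increasing.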
The second smoothed eigenfunction $\phi_2(t)$ is positive in early years but negative in late years. If a paper has a positive coefficient on $\phi_2(t)$, it would have relatively more citations in early years than an average paper but fewer citations in later years, displaying a relatively fast citation ageing process. The third and fourth smoothed eigenfunctions, $\phi_3(t)$ and $\phi_4(t)$, capture more fine-grained fluctuations in citation trajectories over time. Furthermore, they both exhibit a periodic pattern, suggesting that being relatively highly or less cited can be cyclic.

5.2. Determining the number of eigenfunctions

A critical step of our analysis is to decide how many eigenfunctions to retain, for which there is still no standard procedure in the FDA literature (Wang, Chiou, & Mueller, 2015). The rule of thumb is to choose a reasonably small number $K$ of eigenfunctions that not only explain a high percentage (e.g., 95% or 99%) of the total variation but also fit the observed data well. Therefore, we take into account both the total explained variability and the goodness of fit. In terms of explained variability, the first one, two and three eigenfunctions together account for 93.8%, 98.9% and 99.6% of the total variability, respectively. According to the rule of thumb of retaining 95% or 99% of the total variation, we can choose $K$ = 2 or 3.

Insert Fig. 5 here

We then examine the goodness of fit. Fig. 5 evaluates the goodness of fit for the first $K$ = 2, 3, 4, 5 basis functions using the mean squared error (MSE) criterion.
More precisely, the results in Fig. 5 are based on 10-fold cross-validation: we randomly partition the 1699 papers into 10 subgroups, where 9 subgroups have 170 papers each and the 10th subgroup has 169 papers. Of the 10 subgroups, a single subgroup is retained as the validation set for evaluating the goodness of fit, and the remaining 9 subgroups are used as training data to fit our proposed functional Poisson regression model using the first $K$ = 2, 3, 4, 5 basis functions. We repeat the process 10 times, with each of the 10 subgroups used exactly once as the validation data to calculate the mean squared error (MSE). The average MSEs from these 10 repetitions are plotted in Fig. 5. Based on this graph, we adopt a strategy similar to Cattell's scree test, that is, we search for the elbow point. The goodness of fit improves considerably when increasing $K$ from 2 to 3, while a further increase in $K$ improves the goodness of fit only marginally. Therefore, we choose $K$ = 3, partly because increasing $K$ from 2 to 3 brings the largest improvement in fitting performance and partly because the first three eigenfunctions capture 99.6% of the variability, which is sufficiently high.

5.3. Fitting individual paper models

Based on the estimated basis functions, we fit our proposed functional Poisson regression model to each individual paper in the dataset, following the procedure described in section 4.2. To evaluate the fit of our model, we compare it with a recently developed parametric model for individual papers' citations documented in Wang et al. (2013) (hereafter the WSB model). Wang et al. (2013) model paper citations by a Poisson process; specifically, the expected cumulative number of citations of the i-th paper in year t (t ≥ 0) is

$$m \left( \exp\left\{ \lambda_i \, \Phi\!\left( \frac{\ln(t) - \mu_i}{\sigma_i} \right) \right\} - 1 \right) \qquad (6)$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal N(0,1) random variable, $\lambda_i$, $\mu_i$, and $\sigma_i$ are three paper-specific parameters that describe the citation trajectory of the i-th paper, and the parameter m is a global constant for the average citations of all papers and is set at 30 in Wang et al. (2013).

For fitting individual paper models, the natural choice is to use the basis functions $\mu(t)$ and $\phi_k(t)$ estimated in section 5.1 directly to derive the estimated coefficients in Equation (2) for each individual paper (as will be implemented in the next subsection for clustering analysis). However, using this approach for comparing model fitting performance is unfair to the WSB model, because our functional Poisson regression model would have used the same dataset twice: once at the population level for estimating the basis functions and once at the individual paper level for estimating paper-specific coefficients on the common basis functions. The WSB model, by contrast, uses the data only once. Therefore, for a relatively fair comparison of model fitting, we use the same 10-fold cross-validation as discussed in Section 5.2. Specifically, we randomly partition the 1699 papers evenly into ten subgroups.
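Two ingredients of this comparison, the WSB mean function in Equation (6) and the random ten-fold partition, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the random seed and the example parameter values are hypothetical, and the sketch does not fit either model.

```python
import math
import random


def standard_normal_cdf(x):
    """CDF of N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def wsb_expected_cumulative(t, lam, mu, sigma, m=30.0):
    """Expected cumulative citations at time t >= 0 under Equation (6)."""
    if t <= 0:
        return 0.0  # Phi(-inf) = 0, so the expression vanishes at t = 0
    z = (math.log(t) - mu) / sigma
    return m * (math.exp(lam * standard_normal_cdf(z)) - 1.0)


def ten_fold_partition(n_papers, n_folds=10, seed=0):
    """Randomly split paper indices 0..n_papers-1 into near-equal folds."""
    indices = list(range(n_papers))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_papers, n_folds)
    folds, start = [], 0
    for f in range(n_folds):
        # The first `extra` folds receive one additional paper.
        size = base + (1 if f < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


folds = ten_fold_partition(1699)
print([len(f) for f in folds])  # nine folds of 170 papers and one of 169

# Illustrative parameter values only (not estimates from the paper).
curve = [wsb_expected_cumulative(t, lam=2.0, mu=1.5, sigma=1.0)
         for t in range(0, 31)]
print(round(curve[30], 2))
```

In a cross-validated comparison, each fold in turn plays the role of the held-out subgroup while the remaining nine are used for any population-level estimation.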
For papers in each subgroup, we fit our functional Poisson regression model using the 3 basis functions estimated from papers in all the other nine subgroups. For papers in each subgroup, we also fit the WSB model separately. Then we calculate the mean squared error (MSE) of the fit by our model and by the WSB model.

Insert Fig. 6 here

To assess the goodness of fit, we compare the distribution of residuals. In addition, we plot log MSEs instead of MSEs at the original scale, considering that the distribution of MSEs is highly skewed. The left panel of Fig. 6 plots the kernel densities of log MSEs. Our functional Poisson regression model clearly has smaller MSEs, and the Wilcoxon rank-sum test further suggests that the MSEs of our proposed functional Poisson regression model are stochastically smaller than those of the WSB model. In addition, the right panel of Fig. 6 reports a scatter plot of log MSEs, which suggests that our proposed model fits most papers (i.e., points below the diagonal line) better than the WSB model.

It is important to note that this comparison is still to some extent unfair to the WSB model. We adopted a 10-fold cross-validation strategy and used a separate training set for estimating our basis functions. Although the training set does not overlap with the testing set, they are sampled from the same dataset and therefore still have much in common. In addition, the WSB model is developed for predicting long-term citations, while the goal of our model is to have a parsimonious characterization of citation trajectories with satisfactory goodness of fit.
Therefore, the WSB model would avoid overfitting, while our model would intentionally overfit the data to a certain degree. For the same reason, we opted for the original WSB model documented in Wang et al. (2013) for this comparison, instead of the WSB-with-prior model documented in Shen, Wang, Song, and Barabási (2014). The WSB-with-prior model incorporates a conjugate prior and thereby reduces the number of estimated parameters, to avoid overfitting. Compared with the original WSB model, the WSB-with-prior model has lower fitting power but higher prediction power. In summary, based on the comparison results, we do not claim that our model is superior to the WSB model, especially considering their different purposes and structures, but only conclude that our model does fit the data well.

5.4. Clustering paper trajectories

Using the estimated basis functions $\mu(t)$ and the first three $\phi_k(t)$ from the whole sample of 1699 papers as reported in subsection 5.1, we estimate the coefficients in Equation (2) for each of the 1699 papers. These estimated coefficients are then standardized and used as inputs for the K-means clustering analysis. Given the explorative nature of this clustering analysis, we experiment with different numbers of clusters, ranging from two to six.

Insert Fig. 7 here

We first report results for four clusters.
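The clustering step described above, standardizing the paper-specific coefficients and running K-means on them, can be sketched as below. This is an assumption-laden illustration rather than the code behind the reported results: the coefficients are synthetic, the algorithm is a plain Lloyd iteration with random restarts, and the names `standardize`, `assign` and `kmeans` are hypothetical.

```python
import numpy as np


def standardize(x):
    """Column-wise z-scores."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)


def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) per row."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with restarts; returns (labels, centers, inertia)."""
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        # Initialize centers at k distinct, randomly chosen points.
        centers = points[rng.choice(len(points), size=k, replace=False)]
        for _step in range(max_iter):
            labels = assign(points, centers)
            # Recompute each center; keep the old one if a cluster is empty.
            new_centers = np.array([
                points[labels == j].mean(axis=0) if np.any(labels == j)
                else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = assign(points, centers)
        inertia = ((points - centers[labels]) ** 2).sum()
        if best is None or inertia < best[2]:
            best = (labels, centers, inertia)
    return best


# Synthetic 3-dimensional coefficients with unequal group sizes (assumed data).
rng = np.random.default_rng(1)
group_means = np.array([[0, 0, 0], [6, 1, 0], [3, -4, 1], [12, -6, 2]], float)
group_sizes = [200, 100, 50, 10]
coefs = np.vstack([mu + rng.normal(size=(n, 3))
                   for mu, n in zip(group_means, group_sizes)])

z = standardize(coefs)
labels, centers, inertia = kmeans(z, k=4)
print(np.bincount(labels, minlength=4))

# Cluster centers expressed on the original coefficient scale.
centers_original = centers * coefs.std(axis=0) + coefs.mean(axis=0)
```

Standardizing first keeps the coefficient with the largest variance from dominating the Euclidean distances used by K-means.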
To illustrate the characteristics of the four identified clusters, i.e., four general types of citation trajectories, we find the center of each cluster in the 3-dimensional standardized coefficient space and then convert it back into the original paper citation space to derive central curves in terms of cumulative and annual citations (Fig. 7). The number of observations in each cluster is as follows: red (972 papers, that is, 57.2% of the whole sample of 1699 papers), blue (454 papers, 26.7%), purple (228 papers, 13.4%), and green (45 papers, 2.6%).

Both the red and blue curves in Fig. 7 are consistent with previous clustering studies (Aversa, 1985; Costas et al., 2010; Colavizza & Franceschet, 2016), in the sense that the speed of citation ageing is slow for some papers while relatively fast for others. However, the year of the citation peak seems to be the same for both the red and blue curves, and the only difference is the scale of the peak. Therefore, both the red and blue curves might belong to the category of normal documents as labeled by Costas et al. (2010). We name the red curve normal-low and the blue curve normal-high.

The purple curve, compared with both the red and blue ones, displays a slower rising process, as well as a slower declining process after the citation peak. The timing of its citation peak is later than that of the red and blue ones. The scale of its citation peak is lower than the blue one but higher than the red one. In addition, its total number of 30-year citations is larger than both the red and blue ones.
The purple curve corresponds to the delayed documents, as labeled by Costas et al. (2010).

The most interesting curve in Fig. 7 is the green one, which clearly demonstrates a continual rise in annual citations without declining within the 30-year period after publication. We refer to this type of paper as evergreens, which were emphasized by Price (see Aversa (1985)) and Avramescu (1979) but were not identified by later cluster analyses (Aversa, 1985; Costas et al., 2010). Marathoners in some specifications in Colavizza and Franceschet (2016) also display a continually increasing annual citation curve. These evergreens appear to have fewer citations than the normal-high and delayed documents in the first few years after publication but clearly many more citations in the long run. Furthermore, all other types (i.e., normal-low, normal-high, and delayed documents) still follow the “typical” citation trajectory, where a paper's annual citations rise to their peak shortly after publication and then slowly decline, although some types reach the citation peak higher or faster than others. Evergreens, however, clearly violate this “typical” pattern, at least within the 30-year time window, which is much longer than the citation time window adopted in most bibliometric analyses.

Insert Fig. 8 here

Results for other choices of K are reported in Fig. 8. On the one hand, decreasing K would miss some types of citation trajectories. For example, the three-cluster result (Fig. 8A3) misses delayed documents, and the two-cluster result (Fig. 8A2) additionally misses evergreens. On the other hand, increasing K from 4 to 5 or 6 does not uncover new types that are sufficiently distinct from the four identified types, and the additional clusters in Fig. 8A5-6 are located in a continuous space from fast to slow ageing, following the “typical” pattern.

In order to better evaluate the performance of our proposed clustering approach, which clusters citation trajectories based on the $K$-dimensional vector of standardized paper-specific coefficients (here $K$ = 3 eigenfunctions), we compare it with two alternative approaches, specifically, clustering based on (a) the T-dimensional vector of the raw annual citations (raw annual method) and (b) the T-dimensional vector of the proportion of annual citations (proportion method, i.e., normalized annual citations, the number of annual citations in each year divided by the number of total citations over the T years). For the comparison of clustering results we focus on two aspects: the shape of the central curves and the distribution of papers across clusters.

Clustering results using the proportion method for K = 2, …, 6 are reported in Fig. 8B2-6. Compared with our proposed method, the proportion method distributes papers more evenly across clusters. In terms of the shape of the central curves, using K = 4 (Fig. 8B4) as an example, all four curves seem to reach their peak around the same time (the green curve has an initial local peak at around the same time, followed by a decline, and then starts rising again), although they display very different speeds of citation decline. In addition, the speed of citation decline seems to be positively associated with the scale of the peak. For example, the blue curve has the highest peak and also the fastest citation decline after the peak. It is difficult to interpret the clusters. The red, purple, and blue curves could perhaps be labeled as delayed document, normal document, and flash-in-the-pan respectively, according to their speed of rising and declining, but the red one does not seem to have a later peak than the others. In addition, it is unclear how to interpret the green curve. It also exhibits a continual rise in annual citations (if we ignore the decline following the first local peak), similar to our identified evergreens. However, unlike evergreens, the number of annual citations of the green type in Fig. 8B4 is a small constant, and most papers in the green cluster have a very limited number of total citations. One possible explanation is that this alternative approach uses the proportion of annual citations, which is very sensitive when a paper has a relatively small number of total citations.

Central curves of annual citations resulting from the clustering method based on raw annual citations are plotted in Fig. 8C2-6.
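The two alternative feature vectors compared here, raw annual citations and the proportion of annual citations, can be sketched as follows. The function names and the toy citation counts are hypothetical, and assigning an all-zero row to a paper with no citations is an assumption made only for this sketch.

```python
import numpy as np


def raw_annual(cumulative):
    """Annual citations from cumulative counts (rows = papers, columns = years)."""
    cumulative = np.asarray(cumulative, dtype=float)
    return np.diff(cumulative, axis=1, prepend=0.0)


def proportion_annual(annual):
    """Annual citations divided by each paper's total over the T years."""
    annual = np.asarray(annual, dtype=float)
    totals = annual.sum(axis=1, keepdims=True)
    # Assumption: papers with zero total citations get an all-zero row.
    return np.divide(annual, totals, out=np.zeros_like(annual),
                     where=totals > 0)


# Toy cumulative counts over T = 3 years (assumed data).
cumulative = np.array([
    [1, 3, 4],      # low-cited paper
    [10, 30, 40],   # same shape, ten times the scale
    [0, 1, 2],      # only two citations in total
    [0, 0, 0],      # never cited
])
annual = raw_annual(cumulative)
prop = proportion_annual(annual)
print(annual)
print(prop)
```

The first two papers differ only in scale, so their raw annual vectors are far apart while their proportion vectors coincide. In the third paper a single citation accounts for half of the total, which illustrates why proportions become unstable for rarely cited papers.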
The clustering result of the raw annual method is dominated by the scale of citations, but does not reveal distinct features between different clusters in terms of the shape of citation curves. Take the 4-cluster results (Fig. 8C4) as an example: 94.2% of papers (red) have a moderate number of citations, 5.1% of papers (purple) have even fewer citations, 0.6% of papers (blue) have considerably more citations, and 0.1% of papers (green) are extremely highly cited. Except for the green curve, all others show a similar shape in the citation curve, and the difference between them is the scale of citations. Although this alternative approach also successfully identifies a small number of evergreen papers (i.e., the green curve), it misses some true evergreen papers; that is, papers that are classified as evergreens by our proposed method but not by the raw annual method actually also exhibit a pattern of continual rise in annual citations. Thus, we conclude that clustering using raw annual citations is over-dominated by the scale of citations and is inadequate for capturing nuanced differences in the shape of citation trajectories.

5.5. Exploring characteristics of evergreens

The clustering result clearly suggests the existence of evergreens as a general type of citation trajectory, in addition to the previously documented normal and delayed documents. It also raises the question of how the four clusters differ from each other in terms of various paper features, such as the number of authors and references.
In addition, how do evergreens differ from others, and can we predict whether a paper will become an evergreen paper based on its observed paper features? To answer these questions, Table 1 reports the means and medians of various paper features by cluster, and Table 2 conducts nonparametric Wilcoxon rank-sum tests for pairwise comparisons between the four identified clusters.

Insert Table 1 here

Insert Table 2 here

The order of clusters from largest to smallest in terms of the number of 3-year citations is: normal-high, delayed, evergreen, and normal-low, and the pairwise differences are all significant except between evergreens and normal-low documents. On the other hand, the order from largest to smallest in terms of 30-year citations is: evergreens, delayed, normal-high, and normal-low. This is in line with Fig. 7: evergreens and delayed documents have fewer citations in the short run but many more citations in the long run, compared with normal-high documents, while normal-low documents are the majority of papers and have relatively few citations throughout the whole time period. In terms of the number of references, the only significant differences are that delayed documents have more references than normal-low and normal-high documents. The order of clusters from largest to smallest according to the number of pages is: delayed, evergreens, normal-low, normal-high, while evergreens are not significantly different from delayed or normal-low.
There seems to be a positive association between the number of pages and citation delay, in line with Wang, Thijs, and Glänzel (2015). Normal-high, normal-low, delayed, and evergreens, in this order, have from the largest to the smallest number of authors, while the pairwise difference between normal-low and delayed documents is insignificant. Evergreens involve fewer institutes than normal-high, while normal-high have more institutes than normal-low. In addition, there are no significant differences between clusters in the number of countries. Interestingly, evergreens seem to have a small team size in terms of the number of authors or involved institutes. This observation recalls the thesis that breakthroughs are often delivered by “lone wolves” (Steinbeck, 1952), and future research is needed for a better understanding of this observation.

Insert Table 3 here

Insert Table 4 here

Focusing on evergreens, Table 3 provides detailed characteristics of the evergreen cluster, such as the average annual and cumulative citations in various years. Furthermore, we estimate two logistic regression models, testing whether evergreens can be identified based on the set of previously discussed paper features (Table 4). In both logistic regressions, the response for each paper is a binary variable: whether the paper is an evergreen paper or not.
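A binary-response logistic regression of the kind just described can be sketched as follows, together with one common way to score it: classify a paper as positive when its fitted probability exceeds a threshold and count the misclassifications. The sketch uses synthetic data and hypothetical names (`fit_logistic`, `misclassification_rate`); it does not use the paper features behind Table 4.

```python
import numpy as np


def fit_logistic(x, y, n_iter=50, ridge=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    x: (n, p) feature matrix without an intercept column; y: (n,) array of 0/1.
    Returns the coefficient vector with the intercept first.
    """
    design = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (y - p)
        hessian = (design * (p * (1.0 - p))[:, None]).T @ design
        hessian += ridge * np.eye(len(beta))  # tiny ridge for numerical stability
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def predict_proba(x, beta):
    """Fitted probabilities of the positive class."""
    design = np.column_stack([np.ones(len(x)), x])
    return 1.0 / (1.0 + np.exp(-design @ beta))


def misclassification_rate(prob, y, threshold):
    """Share of papers whose thresholded prediction differs from the label."""
    predicted = prob > threshold
    return float(np.mean(predicted != (y == 1)))


# Synthetic rare-event data (assumed, not the paper's features).
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=(n, 2))
p_true = 1.0 / (1.0 + np.exp(-(-4.0 + 1.0 * x[:, 0])))
y = (rng.random(n) < p_true).astype(float)

beta = fit_logistic(x, y)
prob = predict_proba(x, beta)
base_rate = y.mean()
print(misclassification_rate(prob, y, threshold=base_rate))
print(misclassification_rate(prob, y, threshold=0.5))
```

When the positive class is rare, a high threshold can classify almost nothing as positive, in which case the misclassification rate approaches the share of positives in the sample and says little about the model's ability to identify them.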
The first logistic regression incorporates five ex ante explanatory variables which are determined upon publication, while the second adds two more ex post explanatory variables, specifically the log number of 3- and 30-year citations. Using the first model, if we classify papers as evergreens when the fitted probability is greater than 0.026, then the misclassification rate is 34.0%. If we classify papers with a fitted probability greater than 0.5 as evergreens, then the misclassification rate is 2.6%, but in this case no papers are classified as evergreens. The explanatory power of the first model is very low. Adding the citations improves the fitting performance. Using the second model, the misclassification rate is 8.5% and 1.9%, respectively, if we classify papers with a fitted probability greater than 0.026 and 0.5 as evergreens. The regression result confirms that evergreens tend to have relatively fewer citations in the short run but many more citations in the long run. More importantly, the regression result suggests that we are not able to predict evergreens using readily available ex ante paper features such as the number of authors or references, reflecting a high level of uncertainty in scientific impact.

6. Discussion

This paper proposes a nonparametric functional Poisson regression model to describe citation trajectories of individual papers and combines our model with the K-means clustering algorithm for cluster analysis, using the coefficients of the eigenfunctions in our model.
Results suggest the existence of evergreens as a general type of citation trajectory.

This paper makes two methodological contributions. First, we develop a functional data analysis method for discrete count data, by combining principal component analysis and Poisson regression, while the prior literature on functional data analysis is dominated by analyses of continuous data. Second, this paper also demonstrates the usefulness of functional data analysis for bibliometric studies. Because it is a nonparametric approach and is designed for analyzing high-dimensional data, functional data analysis can be a powerful tool for bibliometric analysis.

6.1. Limitations and future research

This study has several limitations. First, constrained by data availability, we cannot claim whether our observed evergreen papers will remain (highly) cited in the future or will eventually become obsolete. Although the latter is very plausible, the former is not entirely impossible. Larivière, Archambault, and Gingras (2008) show that researchers have been relying on an increasingly old body of literature since the mid-1960s, so it is still possible that some classic pieces will never experience obsolescence or obliteration by incorporation, that is, becoming so commonly known and integrated into the daily work in the field that they are no longer explicitly cited (Merton, 1983).
Although we cannot draw a conclusive inference on the fate of our identified evergreen papers, the finding that a considerable number of papers exhibit the characteristics of evergreens over a 30-year time period is still very relevant for science and bibliometric studies, since most studies and evaluations use a shorter time window and assume the “typical” citation trajectory. Second, this study uses a sample of journal articles in one field (i.e., physics) and one year (i.e., 1980), and accordingly has a limitation in terms of generalizability. Third, although our method can single evergreens out, it does not identify sleeping beauties in science (Van Raan, 2004; Ke, Ferrara, Radicchi, & Flammini, 2015). This is probably because sleeping beauties are very rare and therefore are difficult to identify in large-scale statistical analyses (Colavizza & Franceschet, 2016).

There is plenty of room for improving our functional data analysis method for citation data, calling for further research. From the functional smoothing viewpoint, the cumulative citation curve must be non-decreasing. While our proposed fitting method yields non-decreasing fitted curves numerically for the cumulative citations of all 1699 papers in our dataset, it is important to develop a better estimation method that guarantees the non-decreasing property theoretically, e.g., using the monotone smoothing method developed in Ramsay (1998).
From the cluster analysis viewpoint, we conduct unsupervised learning on our dataset and rely on prior literature and our domain knowledge of paper citation behavior for assessing the classification results of different approaches. It will be useful to develop a more objective criterion for evaluating the results of cluster analysis. In addition, we have some interesting observations, for example that evergreens have a relatively small number of authors; more research is needed to better understand what determines the citation trajectory of a paper. The regression model using readily available paper features for predicting evergreens has very poor performance, and it would be interesting to investigate what kinds of intrinsic paper quality might predict whether a paper will become an evergreen in science.

6.2. Implications

Results of this paper have three important implications for bibliometric studies and research evaluations. First, our findings demonstrate that papers with similar citations in the short run may have completely different citation patterns in the long run. Delayed documents and evergreens receive fewer citations in the short run but more citations in the long run, compared with normal documents. This serves as a warning about the bias in the use of short-time-window citation counts in research evaluations. Second, the observation of evergreens calls for more research on the “longevity” of citation impact, in addition to the aspect of “delay” emphasized in prior literature.
Phenomena of scientific prematurity (Stent, 1972), delayed recognition (Garfield, 1980), and sleeping beauties (Van Raan, 2004) have been extensively studied in previous literature, which focuses on the long time lag before a scientific contribution makes a noticeable impact. On the other hand, evergreens, similar to the term marathoners in Colavizza and Franceschet (2016), point to the other important but understudied aspect of citation trajectory: unfading or long-lasting impact.

Third, evergreens also have implications for parametric models of citation trajectories. There is a strong interest in modelling citation trajectories, partly because it is a challenging scientific problem and partly because of the policy interest in predicting long-term citations. In a recent report published in Science, Wang et al. (2013) proposed a parametric nonhomogeneous Poisson process to model the citation trajectory of individual papers. Although this model is elegant from the purely mathematical viewpoint, its predictive power is unsatisfactory, especially for highly cited papers (Wang, Mei, & Hicks, 2014). One possible explanation is that it assumes the “typical” citation trajectory, while evergreens, which are highly cited, do not follow this pattern. Results of our nonparametric analysis, in particular the observation of evergreens, cast doubt on this assumption and shed light on future parametric modeling of citation trajectories.

References

Anscombe, F. J. (1948). The Transformation of Poisson, Binomial and Negative-Binomial Data. Biometrika, 35(3/4), 246-254. doi:10.2307/2332343

Aversa, E. (1985). Citation patterns of highly cited papers and their relationship to literature aging: A study of the working literature. Scientometrics, 7(3-6), 383-389.

Avramescu, A. (1979). Actuality and obsolescence of scientific literature. Journal of the American Society for Information Science, 30(5), 296-303.

Baumgartner, S. E., & Leydesdorff, L. (2014). Group-based trajectory modeling (GBTM) of citations in scholarly literature: dynamic qualities of “transient” and “sticky knowledge claims”. Journal of the Association for Information Science and Technology, 65(4), 797-811.

Besse, P., & Ramsay, J. O. (1986). Principal components analysis of sampled functions. Psychometrika, 51(2), 285-311.

Brown, L. D., Carter, A. V., Low, M. G., & Zhang, C.-H. (2004). Equivalence theory for density estimation, Poisson processes and Gaussian white noise with drift. 2074-2097. doi:10.1214/009053604000000012

Colavizza, G., & Franceschet, M. (2016). Clustering citation histories in the Physical Review. Journal of Informetrics, 10(4), 1037-1051. doi:http://dx.doi.org/10.1016/j.joi.2016.07.009

Costas, R., van Leeuwen, T. N., & van Raan, A. F. (2010). Is scientific literature subject to a ‘Sell-By-Date’? A general methodology to analyze the ‘durability’ of scientific documents. Journal of the American Society for Information Science and Technology, 61(2), 329-339.

Garfield, E. (1980). Premature discovery or delayed recognition - Why. Current Contents (21), 5-10.

Glänzel, W., & Schoepflin, U. (1995). A bibliometric study on ageing and reception processes of scientific literature. Journal of Information Science, 21(1), 37-53.

Hadjipantelis, P. Z., Aston, J. A., & Evans, J. P. (2012). Characterizing fundamental frequency in Mandarin: A functional principal component approach utilizing mixed effect models. The Journal of the Acoustical Society of America, 131(6), 4651-4664.

Hall, P., Müller, H.-G., & Wang, J.-L. (2006). Properties of principal component methods for functional and longitudinal data analysis. The Annals of Statistics, 34, 1493-1517.

Hoover, D. R., Rice, J. A., Wu, C. O., & Yang, L.-P. (1998). Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika, 85(4), 809-822.

Ke, Q., Ferrara, E., Radicchi, F., & Flammini, A. (2015). Defining and identifying Sleeping Beauties in science. Proceedings of the National Academy of Sciences, 112(24), 7426-7431. doi:10.1073/pnas.1424329112

Larivière, V., Archambault, É., & Gingras, Y. (2008). Long-term variations in the aging of scientific literature: From exponential growth to steady-state science (1900–2004). Journal of the American Society for Information Science and Technology, 59(2), 288-296.

Leng, X., & Müller, H.-G. (2006). Classification using functional data analysis for temporal gene expression data. Bioinformatics, 22(1), 68-76.

Line, M. B. (1993). Changes in the use of literature with time - obsolescence revisited. Library Trends, 41(4), 665-684.

McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models, no. 37 in Monograph on Statistics and Applied Probability: Chapman & Hall.

Merton, R. K. (1983). Foreword. In E. Garfield (Ed.), Citation indexing, its theory and application in science, technology, and humanities (pp. xiii, 274 p.). Philadelphia, PA: ISI Press.

Nelder, J. A., & Baker, R. J. (1972). Generalized linear models. Encyclopedia of statistical sciences.

Ramsay, J. O. (1998). Estimating smooth monotone functions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(2), 365-375.

Ramsay, J. O., Hooker, G., & Graves, S. (2009). Functional data analysis with R and MATLAB: Springer Science & Business Media.

Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis: Springer.

Redner, S. (2005). Citation statistics from 110 years of Physical Review. Physics Today, 58(6), 49-54. doi:10.1063/1.1996475

Rice, J. A., & Silverman, B. W. (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society. Series B (Methodological), 233-243.

Rogers, J. D. (2010).
C i t at i on ana l y si s o f nan ot ec hno l ogy at t he f i el d l ev el : i m pl i ca t i ons of R & D ev al uat i on. R es ea rc h E val u at i o n, 19 ( 4) , 2 81 - 290. Ser ban , N ., Sta i cu , A . M., & C ar r o l l , R . J. ( 2 013) . Mu l t i l ev el C r os s‐ D ep ende nt B i nar y Long i t ud i na l D at a. Biomet ri c s, 69 ( 4) , 903 - 913. Shen, H . - W., W ang , D ., Song , C ., & B ar abá s i , A . - L. (2014) . Mode l i ng and pr edi c t i ng popu l ar i t y dy na m i cs v i a r ei n f or ce d Po i ss o n pr o ce s se s. arXi v pr ep ri n t arX i v: 1401.07 78 . Ste i nbec k , J . ( 1952 ) . Eas t o f Eden . N ew Y or k ,: V i k i ng Pre ss . Ste n t , G . ( 19 72) . P r em at ur i t y and uni quene ss i n s ci e nt i f i c di s cov er y . Sci e nt i f i c Am er i c an, 227 ( 6 ) , 84 - 93. T ha ck er , N . A ., & B r om i l e y , P. A . ( 2001) . T h e e f f e ct s of a s qua r e r oot t r ans f or m on a Poi ss on di s t r i bu t ed quant i t y . Tin a m emo, 10 , 2 001. v an de r Lin de, A. ( 2009 ) . A B ay es i an l a t en t v ar i a bl e ap pr oac h t o f un ct i ona l pr i nc i p al com ponen t s a n al y si s w i t h b i na r y and c ou nt d at a . AStA Adva nce s i n St a t i st i ca l Anal ys i s, 93 ( 3 ) , 307 - 333. V an R aa n, A . ( 2004 ) . Sl ee p i ng bea ut i e s i n sc i enc e. Sc i ent om e t r i cs , 59 ( 3 ) , 46 7 - 472. Wa ng , D ., Song , C ., & B ar abá s i , A . - L. (2013) . Q ua nt i f y i ng Long - T er m Sci ent i f i c I m pac t . Sc i en ce , 342 ( 615 4) , 1 27 - 132. R et r i ev ed f r om ht t p : / / w w w .sc i e nce m ag .or g / cont en t / 34 2/ 61 54/ 127. abs t r ac t Wa ng , J . - L., C hi ou, J . - M., & Mue l l e r , H . - G . ( 2015) . R ev i ew of f unc t i on al d at a a n al y si s. arXi v pr epr i nt arXi v : 1507 .05135 . Wa ng , J . ( 2 013) . C i t at i on t i m e window choi c e f o r r es e ar ch i m pac t ev al u at i on. S ci ent om e t r i cs , 94 ( 3 ) , 85 1 - 872. do i : 10. 
1007 / s111 92 - 0 12 - 0775 - 9 Wa ng , J ., M ei , Y ., & H i ck s, D. (2014 ) . C om m ent on “ Q uant i f y i ng l ong - t er m sc i ent i f i c i m pac t ” . Sc i enc e, 345 ( 619 3) , 1 49 - 149. Wa ng , J ., T h i j s, B ., & G l än z el , W. ( 2015 ) . I nt er di sc i pl i nar i t y and I m pac t : D i st i nc t Eff e ct s of V a r i et y , B al anc e, a n d D i sp ar i t y . Plo s One, 10 ( 5) , e0127 298. do i : 10. 1371 / j o ur na l .po ne.012 7298 Wu, S., Mül l e r , H . - G ., & Zhang , Z . ( 2013) . Fun ct i ona l dat a ana l y si s f or poi n t p r oc es se s wi t h r ar e ev ent s. St at i s t i ca S i ni ca, 23 , 1 - 23. Y ao, F., Mü l l er , H . - G ., & Wa ng , J . - L. (2005) . Fun ct i onal d at a a na l y si s f or spa r s e l ong i t udi n al dat a. Jour na l o f t h e Am er i can St at i st i ca l Ass oci at i on, 10 0 ( 4 70) , 577 - 590. 31 T able 1 Me an s a nd m edi a ns o f se l e ct ed v ar i ab l e s by f our i de nt i f i ed c l us t e r s. n o r ma l - lo w n o r ma l - h ig h d ela ye d ev erg r ee n m ea n m ed ian m ea n m ed ian m ea n m ed ian m ea n m ed ian 3 - y ea r citatio n s 8 . 3 9 8 . 0 2 8 . 3 9 2 3 . 0 1 7 . 7 1 1 4 . 0 1 2 . 8 0 8 . 0 30 - y ea r citatio n s 5 6 . 5 9 4 7 . 0 1 0 6 . 7 0 7 9 . 0 2 1 6 . 9 6 1 5 0 . 5 6 0 2 . 6 7 3 5 5 . 0 R ef er e n ce s 2 6 . 2 8 2 2 . 0 2 7 . 5 8 2 0 . 0 3 2 . 6 8 2 6 . 5 3 0 . 2 2 2 2 . 0 P ag es 8 . 5 0 7 . 0 8 . 1 9 4 . 0 1 3 . 9 3 1 0 . 0 1 2 . 6 4 9 . 0 Au t h o r s 3 . 1 1 2 . 0 4 . 6 5 3 . 0 2 . 9 6 3 . 0 2 . 4 2 2 . 0 I n s tit u te s 1 . 5 5 1 . 0 1 . 7 7 1 . 0 1 . 5 1 1 . 0 1 . 3 6 1 . 0 C o u n tr ies 1 . 1 6 1 . 0 1 . 1 8 1 . 0 1 . 1 5 1 . 0 1 . 0 7 1 . 0 T able 2 C l ust er p ai r w i s e c om par i so n: W i l coxon r ank sum t es t s . ev erg r ee n – n o r ma l - lo w ev erg r ee n – n o r ma l - h ig h ev erg r ee n – d ela ye d d ela ye d – n o r ma l - lo w d ela ye d – n o r ma l - h ig h n o r ma l - h ig h – n o r ma l - lo w 3 y r citat io n s 0 . 
7 1 - 7 . 7 7 - 3 . 0 2 1 1 . 1 - 1 1 . 8 1 2 9 . 7 3 0 y r citatio n s 1 0 . 9 8 9 . 1 6 5 . 4 7 2 0 . 3 1 1 1 . 4 9 1 6 . 7 7 R ef er e n ce s - 0 . 5 5 - 0 . 1 6 - 1 . 9 4 3 . 5 7 4 . 3 4 - 1 . 8 3 P ag es 1 . 8 9 3 . 8 0 . 4 2 3 . 3 3 6 . 2 3 - 6 . 5 4 Au t h o r s - 2 . 4 5 - 3 . 6 6 - 2 . 9 4 0 . 9 6 - 2 . 2 3 4 . 1 4 I n s tit u te s - 1 . 0 7 - 1 . 8 4 - 1 . 0 4 0 . 0 5 - 1 . 5 5 2 . 2 4 C o u n tr ies - 0 . 8 2 - 0 . 7 7 - 0 . 8 3 0 . 1 3 0 . 1 5 - 0 . 0 4 N um ber s a r e Z st a t i st i c s, bo l d num ber s a r e si g ni f i c ant a t p < .0 5. 32 T able 3 Ever gr ee n p ape r s ( N = 45 ) An n u al cita tio n s C u m u lati v e citatio n s Me an  Stan d ar d E r r Me an  Stan d ar d E r r 3 - y ea r citatio n s 7 . 0   1 . 2 1 2 . 8   2 . 1 5 - y ea r citatio n s 8 . 9   1 . 6 3 0 . 1   5 . 0 10 - y ea r citatio n s 1 2 . 6  2 . 8 8 3 . 9   1 5 . 0 15 - y ea r citatio n s 1 8 . 9   4 . 9 1 6 7 . 0   3 4 . 3 20 - y ea r citatio n s 2 5 . 9  7 . 2 2 8 3 . 8  6 5 . 6 25 - y ea r citatio n s 3 1 . 5      4 3 1 . 0      30 - y ea r citatio n s 3 6 . 8  1 0 . 3 6 0 2 . 7   1 4 4 . 5 T able 4 Log i t r eg r e ss i on ( N = 1699 ) B ein g ev erg r ee n s lo g it ( 1 ) ( 2 ) 3 - y ea r citatio n s ( ln ) - 2 . 5 4 5 [ 0 . 3 5 8 ] * * * 30 - y ea r citatio n s ( ln ) 3 . 7 0 0 [ 0 . 3 9 9 ] * * * R ef er e n ce s ( l n ) - 0 . 7 9 9 [ 0 . 2 8 3 ] * * * 0 . 3 5 7 [ 0 . 4 2 7 ] P ag es ( ln ) 1 . 1 5 8 [ 0 . 2 7 8 ] * * * 0 . 4 7 5 [ 0 . 4 3 8 ] Au t h o r s ( ln ) - 1 . 0 4 4 [ 0 . 4 5 3 ] * * - 0 . 0 1 6 [ 0 . 6 1 0 ] I n s tit u te s ( ln ) - 0 . 0 2 3 [ 0 . 8 0 1 ] - 0 . 1 6 1 [ 0 . 4 0 0 ] C o u n tr ies ( ln ) - 1 . 0 1 0 [ 1 . 2 3 0 ] - 2 . 6 6 5 [ 2 . 3 7 0 ] I n ter ce p t - 1 . 6 1 6 [ 1 . 1 1 1 ] - 1 3 . 7 9 7 [ 2 . 4 3 8 ] * * * R esid u al d ev ia n ce 3 9 1 . 8 1 6 8 . 9 ∆ R esid u al d e v ian ce 2 2 2 . 9 *** Sta nda r d er r or s i n b r a ck et s. *** p<.0 1, ** p <.05, * p< .10. 33 App e n d ix A. 
L ist o f top c ite d A P S journal p ap e r s This appe ndix re ports the lis t of the top 10 m ost ci ted A P S journa l pape rs i n Fig. 1 . T able 5 T op 10 m ost ci t ed A PS j ou r nal pap er s T it le J o urna l P ub lica t io n Yea r T o t a l cit a t io ns 1 Dev elo p m e n t o f t h e co lle - s al v etti c o r r elatio n - e n er g y f o r m u la in to a f u n ctio n al o f t h e elec tr o n - d en s it y P h y s ical R ev ie w B 1988 5 7 2 0 6 2 Gen er alize d g r ad ien t a p p r o x i m atio n m ad e s i m p le P h y s ical R ev ie w L etter s 1996 5 4 6 6 2 3 Den s i t y - f u n ctio n al ex c h a n g e - en er g y ap p r o x i m atio n w it h co r r ec t a s y m p to tic - b eh a v io r P h y s ical R ev ie w A 1988 3 2 0 9 5 4 E f f icien t iter ativ e s ch e m e s f o r ab in itio to tal - e n er g y ca lcu latio n s u s i n g a p la n e - w a v e b asis s e t P h y s ical R ev ie w B 1996 2 9 0 6 4 5 I n h o m o g e n eo u s elec tr o n g as P h y s ical R ev ie w B 1964 2 6 0 0 5 6 Sp ec ial p o in ts f o r b r illo u i n - zo n e in te g r atio n s P h y s ical R ev ie w B 1976 2 3 8 0 6 7 Fro m u ltra s o f t p s e u d o p o ten tia ls to th e p r o j ec to r au g m e n ted - w av e m et h o d P h y s ical R ev ie w B 1999 2 1 6 0 6 8 P r o j ec to r au g m e n ted - w a v e m e th o d P h y s ical R ev ie w B 1994 2 0 7 5 3 9 A cc u r ate an d s i m p le an a l y t ic r ep r esen tatio n o f th e elec tr o n - g as c o r r elatio n - en er g y P h y s ical R ev ie w B 1992 1 5 1 2 2 10 A b i n itio m o lec u lar - d y n a m ics f o r liq u id - m etal s P h y s ical R ev ie w B 1993 1 4 9 0 8 Note : t otal c it a ti ons a re r e trie ve d f rom W e b of Scie nc e onli ne int e r fa c e on F e bru a r y 27, 2017. App e n d ix B . 
Ro b u stnes s to alt e r ativ e c itat ion t h r e sh old s R e sult s re porte d in t he main tex t ar e ba se d on 169 9 pa pe rs w it h a t l e a st 30 c it a ti ons i n the f irst 30 y e a rs a fte r pub li c a ti on, we test whe ther th e re sult s a re robu st t o a lt e ra ti ve c h oice s of c it a ti on thre shol ds. W e re pli c a te d the a na l y se s usi n g two other da tase ts: ( a ) 220 3 p a pe rs w it h a t l e a st 20 tot a l citations and ( b) 10 78 pa pe rs w it h a t l e a st 50 tot a l citations. C lust e rin g r e sult s a re c onsi stent with wha t we r e port in t he main tex t ( Fig. 9 ). I ns e rt F i g . 9 h e re 0 10 20 30 40 50 0 2000 4000 6000 8000 Y ear Annual citations Figure 1: Annual citations of the top ten cited APS pap ers. One curv e represen ts one pap er, and details ab out these ten pap ers are rep orted in App endix A. 0 5 10 15 20 25 30 0 100 200 300 400 Y ear Annual citations 0 5 10 15 20 25 30 0 2000 4000 6000 Y ear Cumulativ e citations Figure 2: Ann ual and cumulativ e citations of four selected pap ers. One curve represents one selected pap er. The red, blue, purple, and green curves corresp ond to flash-in-the- p an , normal do cument , delaye d do cument , and ever gr e en resp ectively . 34 0 5 10 15 20 25 30 2 4 6 8 Mean curve year 0 5 10 15 20 25 30 0.5 1.0 1.5 First deriv ative of the mean cur ve year Figure 3: Mean function and its first deriv ativ e. 35 0 5 10 15 20 25 30 0.05 0.10 0.15 0.20 0.25 93.8% of variability y ear 0 5 10 15 20 25 30 −0.2 0.0 0.1 0.2 0.3 5.1% of variability y ear 0 5 10 15 20 25 30 −0.2 0.0 0.1 0.2 0.3 0.7% of variability 0 5 10 15 20 25 30 −0.2 0.0 0.2 0.4 0.2% of variability Figure 4: The first four eigenfunctions. 36 ● ● ● ● 20 40 60 80 Number of basis function Mean of MSE 2 3 4 5 Figure 5: Determining the n um b er of eigenfunctions. 
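As a rough illustration of what Figures 4 and 5 summarize, the sketch below runs a discretized functional principal component analysis on synthetic curves and reports the share of variability per eigenfunction together with the reconstruction error for 2 to 5 eigenfunctions. It is a minimal sketch under stated assumptions, not the authors' fitting procedure: the data are simulated, the function names `fpca_on_grid` and `reconstruction_mse` are hypothetical, and a plain SVD on Gaussian-noise curves stands in for the paper's functional Poisson regression with maximum likelihood estimation.

```python
import numpy as np


def fpca_on_grid(curves):
    """Discretized FPCA via SVD of the centered curve matrix (rows are papers)."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are the estimated eigenfunctions evaluated on the year grid.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance_share = singular_values**2 / np.sum(singular_values**2)
    return mean_curve, vt, variance_share


def reconstruction_mse(curves, mean_curve, eigenfunctions, k):
    """Mean squared error when each curve is rebuilt from its first k scores."""
    basis = eigenfunctions[:k]                  # shape (k, n_years)
    scores = (curves - mean_curve) @ basis.T    # shape (n_papers, k)
    fitted = mean_curve + scores @ basis
    return float(np.mean((curves - fitted) ** 2))


# Synthetic stand-in data (hypothetical, not the APS sample).
rng = np.random.default_rng(0)
years = np.arange(1, 31)
n_papers = 200
level = rng.gamma(2.0, 1.0, size=(n_papers, 1))
trend = rng.normal(0.0, 0.05, size=(n_papers, 1))
noise = rng.normal(0.0, 0.1, size=(n_papers, years.size))
curves = level * np.exp(-years / 15.0) + trend * years + noise

mean_curve, eigenfunctions, variance_share = fpca_on_grid(curves)
# Reconstruction error for 2 to 5 eigenfunctions, the range shown in Figure 5.
mse_by_k = {k: reconstruction_mse(curves, mean_curve, eigenfunctions, k)
            for k in range(2, 6)}
```

Because the eigenfunctions are orthonormal, adding one can only lower the reconstruction error, so a criterion of this kind typically looks for the point where further gains become negligible.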
[Figure 6 plots: left panel, kernel density of log MSE for the WSB model and our model; right panel, scatterplot with WSB on the x-axis and our model on the y-axis.]
Figure 6: Goodness of fit. The left panel plots kernel densities of log MSEs. The right panel is a scatterplot, where one point represents one paper, and its X- and Y-axes are the log MSEs for the WSB model and our functional Poisson regression model respectively.
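The comparison in Figure 6 rests on a per-paper log MSE for each model. The following minimal sketch shows how such a quantity could be computed and compared across two sets of fitted trajectories. Everything in it is an assumption for illustration: the counts are simulated, `log_mse` is a hypothetical helper, and the two "fits" are simple stand-ins rather than the WSB model or the functional Poisson regression model.

```python
import numpy as np


def log_mse(observed, fitted):
    """Log of the per-paper mean squared error; rows are papers, columns are years."""
    return np.log(np.mean((observed - fitted) ** 2, axis=1))


# Synthetic stand-in data (hypothetical): Poisson counts around a known rate.
rng = np.random.default_rng(1)
years = np.arange(1, 31)
rate = (1.0 + rng.gamma(2.0, 2.0, size=(100, 1))) * np.exp(-years / 20.0)
observed = rng.poisson(rate).astype(float)

# Two illustrative fits: the true rate, and a constant equal to each paper's mean.
fit_flexible = rate
fit_flat = np.repeat(observed.mean(axis=1, keepdims=True), years.size, axis=1)

log_mse_flexible = log_mse(observed, fit_flexible)
log_mse_flat = log_mse(observed, fit_flat)

# Share of papers for which the first fit has the smaller log MSE.
share_flexible_better = float(np.mean(log_mse_flexible < log_mse_flat))
```

In a scatterplot laid out like the right panel of Figure 6, papers falling below the diagonal are those for which the model on the Y-axis has the smaller log MSE.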
[Figure 7 plots: left panel, Cumulative citations against year; right panel, Annual citations against year. Cluster shares: 57.2%, 26.7%, 13.4%, 2.6%.]
Figure 7: Clustering results: Four general types of citation trajectories. The red, blue, purple, and green curves represent normal-low, normal-high, delayed, and evergreen papers respectively.

[Figure 8 plots: fifteen panels of Annual citations against year, with cluster shares as follows.
Proposed method, A2: 67.0%, 33.0%. A3: 65%, 32.2%, 2.8%. A4: 57.2%, 26.7%, 13.4%, 2.6%. A5: 48.6%, 27.3%, 18.4%, 3.2%, 2.5%. A6: 46.3%, 23.8%, 16.6%, 9.6%, 3.2%, 0.5%.
Proportion method, B2: 56.1%, 43.9%. B3: 44.7%, 30%, 25.3%. B4: 36.7%, 27.0%, 19.7%, 16.6%. B5: 30.3%, 23.0%, 17.6%, 16.3%, 12.8%. B6: 24.9%, 20.4%, 17.9%, 16.1%, 13.0%, 7.8%.
Raw Annual method, C2: 99.5%, 0.5%. C3: 98.3%, 1.6%, 0.1%. C4: 94.2%, 5.1%, 0.6%, 0.1%. C5: 85.7%, 12.1%, 1.6%, 0.5%, 0.1%. C6: 79.4%, 17.0%, 2.0%, 1.0%, 0.5%, 0.1%.]
Figure 8: Clustering results: K = 2-6, three methods.
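The cluster shares reported for K = 2 to 6 come from running K-means for each K and counting the papers assigned to each cluster. A minimal sketch of that bookkeeping follows, assuming a plain Lloyd's-algorithm K-means written in NumPy and simulated four-dimensional scores. The functions `kmeans` and `cluster_shares`, the seed, and the data are all hypothetical, and the sketch makes no claim about the initialization or software the authors used.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster went empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers


def cluster_shares(labels, k):
    """Fraction of points in each cluster, largest first."""
    counts = np.bincount(labels, minlength=k)
    return np.sort(counts / counts.sum())[::-1]


# Synthetic stand-in for paper-specific coefficients (hypothetical data).
rng = np.random.default_rng(2)
scores = np.vstack([
    rng.normal(0.0, 1.0, size=(300, 4)),
    rng.normal(4.0, 1.0, size=(150, 4)),
    rng.normal(-5.0, 1.0, size=(50, 4)),
])

shares_by_k = {}
for k in range(2, 7):
    labels, _ = kmeans(scores, k)
    shares_by_k[k] = cluster_shares(labels, k)
```

Comparing `shares_by_k` across K shows whether a small cluster persists as K grows, which is the kind of stability the panels of Figure 8 display for the evergreen group.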
[Figure 9 plots: two panels of Annual citations against year. Cluster shares: 59.9%, 26.6%, 11.2%, 2.3% (first panel) and 51.6%, 35.5%, 9.2%, 3.7% (second panel).]
Figure 9: Clustering results: Alternative citation thresholds.
