On the relationship between set-based and network-based measures of gender homophily in scholarly publications
There is an increased interest in the scientific community in the problem of measuring gender homophily in co-authorship on scholarly publications (Eisen, 2016). For a given set of publications and co-authorships, we assume that author identities hav…
Authors: Y. Samuel Wang, Elena A. Erosheva
Working Paper no. 157
Center for Statistics and the Social Sciences
University of Washington

Y. Samuel Wang (1) and Elena A. Erosheva (1,2,3)
(1) Department of Statistics
(2) School of Social Work
(3) Center for Statistics and the Social Sciences

October 26, 2016

Abstract

There is an increased interest in the scientific community in the problem of measuring gender homophily in co-authorship on scholarly publications (Eisen, 2016). For a given set of publications and co-authorships, we assume that author identities have not been disambiguated in that we do not know when one person is an author on more than one paper. In this case, one way to think about measuring gender homophily is to consider all observed co-authorship pairs and obtain a set-based gender homophily coefficient (e.g., Bergstrom et al., 2016). Another way is to consider papers as observed disjoint networks of co-authors and use a network-based assortativity coefficient (e.g., Newman, 2003). In this note, we review both metrics and show that the gender homophily set-based index is equivalent to the gender assortativity network-based coefficient with properly weighted edges.

KEY WORDS: homophily; gender bias; social networks; assortativity; coauthorship

1 Introduction

The phenomenon of individuals with similar characteristics being more likely to form ties than individuals with dissimilar characteristics is known as assortativity or homophily. When studying patterns in co-authorship in scientific publications, researchers typically consider sets of co-authorship occurrences and corresponding individual characteristics for some collection of papers, and measure homophily on a set of co-authorship occurrences.
For example, previous studies in economics, an academic field dominated by men, have found evidence for gender-based homophily in coauthorship, where homophily is the principle that similarity breeds connection between individuals (McPherson et al., 2001). An earlier study that analyzed publications from a cohort sample of 178 PhDs in economics found that women were more than five times more likely than men to have women co-authors (McDowell and Smith, 1992). A recent study that analyzed coauthor teams from 3,090 articles in the top three economics journals between 1991 and 2002 found evidence in favor of gender-based homophily in team formation at the subfield level (Boschini and Sjögren, 2007).

Given a set of co-authorship occurrences, Bergstrom (2003) suggests using a coefficient of homophily $\alpha$ for set-based data where individuals take on a binary characteristic. The coefficient of homophily $\alpha$ has a simple and intuitive interpretation: the difference between the probability that a randomly chosen coauthor of a man is a man and the probability that a randomly chosen coauthor of a woman is a man. Bergstrom et al. (2016)¹ show that it is equal to the observed coauthor-gender correlation in the given collection of papers, and, in the case of two-author papers, it is equal to Sewell Wright's coefficient of inbreeding (Wright, 1949).

In the presence of ties between individuals, another natural way to think about assortativity is through networks. Thus, assortative interactions have been studied in biological networks (Piraveenan et al., 2012), networks among animals and fish (Lusseau and Newman, 2004; Croft et al., 2005), and social networks in humans (Foster et al., 2010; Rivera et al., 2010).
Various metrics have been proposed to measure the assortativity within an observed network, including Newman's (2003) network-based assortativity coefficient where individuals are assigned a single categorical characteristic.

In this paper, we consider a common scenario when gender indicators are known for coauthors on a set of publications but the author identities have not been disambiguated. We describe a set-based gender homophily coefficient (e.g., Bergstrom et al., 2016) and show that it is equivalent to the network-based assortativity coefficient (e.g., Newman, 2003) when edges within each paper are weighted inversely proportional to the number of co-authors on a paper.

¹ http://eigenfactor.org/gender/assortativity/measuring_homophily.pdf

[Figure panels: (a) Assortative network; $r = .71$ and $\alpha = .78$. (b) Disassortative network; $r = -.13$ and $\alpha = -.33$.]

2 Newman's Measure of Assortativity

We first consider a measure of assortativity defined by Newman (2003) which explicitly assumes a network-based representation. When the relational data are represented as a graph, each individual is represented as a node and edges between nodes indicate a relationship between the two nodes. If the relationship is asymmetric, the edges may be directed; if the relationship is symmetric, an undirected edge may be used. Assuming we have $i = 1, 2, \ldots, K$ groups and that each individual in our sample belongs to a single group, let $e_{ij}$ be the proportion of all edges which point from an individual in category $i$ to an individual in category $j$, $a_i$ be the proportion of all edges which point from an individual in category $i$, and $b_i$ be the proportion of all edges which point to an individual in category $i$. Then,

$$
r = \frac{\sum_i e_{ii} - \sum_i a_i b_i}{1 - \sum_i a_i b_i}. \tag{1}
$$

When there is no observed assortativity, $r = 0$; when individuals form ties exclusively with other individuals with the same characteristic, $r = 1$; and when the network is perfectly disassortative (each node is only connected to nodes of different characteristics), $r$ is negative and bounded below by $-1$ (Newman, 2003).

3 Bergstrom's $\alpha$

For set-based data where individuals take on a binary characteristic, Bergstrom (2003) proposes an index of assortativity $\alpha$ by constructing a difference in risks. More formally, suppose we have a single characteristic which is either positive or negative. Let $p$ be the probability that a randomly selected tie of a randomly selected positive individual connects to another positive individual. Let $q$ be the probability that a randomly selected tie of a randomly selected negative individual connects to a positive individual. We then define the assortativity measure as the difference of these risks, $\alpha = p - q$. Because $\alpha$ is the difference between two probabilities, it must lie in $[-1, 1]$. The lower bound of $-1$ is only achieved in the extreme case where every individual has exactly one tie to an individual of the opposite characteristic, and the upper bound is only achieved in the extreme case where individuals form ties exclusively with others with the same characteristic.

Define $N_+$ and $N_-$ to be the sets of all individuals with the positive and negative characteristics, of size $|N_+|$ and $|N_-|$ respectively. Let $\pi_s$ and $\nu_s$ be the number of ties to positive and negative individuals for individual $s$. Finally, let $K^\star$ be the size of the largest clique in the network (by assumption this is also equivalent to the max degree of the network), $[K^\star]$ denote the set $\{0, 1, \ldots, K^\star\}$ and $n_{ij}$ denote the number of cliques with $i$ positive individuals and $j$ negative individuals.
We can then calculate $\alpha$ for a given network:

$$
\begin{aligned}
\alpha &= \frac{1}{|N_+|} \sum_{s \in N_+} \frac{\pi_s}{\pi_s + \nu_s} - \frac{1}{|N_-|} \sum_{s \in N_-} \frac{\pi_s}{\pi_s + \nu_s} \\
&= \frac{1}{|N_+|} \sum_{(i,j) \in [K^\star] \times [K^\star]} n_{ij}\, i\, \frac{i - 1}{i + j - 1} - \frac{1}{|N_-|} \sum_{(i,j) \in [K^\star] \times [K^\star]} n_{ij}\, j\, \frac{i}{i + j - 1}.
\end{aligned} \tag{2}
$$

The first formulation arises explicitly from the difference of risks interpretation of $\alpha$, while the second formulation provides a computationally convenient way to calculate $\alpha$ from the sufficient statistics $n_{ij}$: a clique with $i$ positive and $j$ negative individuals contributes $i$ positive individuals, each with a share $(i-1)/(i+j-1)$ of ties to positives, and $j$ negative individuals, each with a share $i/(i+j-1)$.

4 Equivalence of $\alpha$ and $r$

Because Newman's $r$ is a function of edge counts, individuals with higher degrees (number of edges) will influence the calculation of $r$ more than individuals with fewer co-authors. On the other hand, $\alpha$ explicitly places equal weight on each individual. Thus, although $r$ and $\alpha$ both measure assortativity, they are not equivalent in general. However, it can be shown that in a carefully specified network representation, Newman's $r$ is equal to $\alpha$.

This work is motivated by the study of gender assortativity within co-authorships. In this case, the authors on each paper form a clique (a subset of nodes in which every node is connected to every other node) and the entire network is composed of disjoint cliques (see Figure 1). However, this result holds for any graph in which all edges are reciprocated (an edge from $s \to t$ implies there is also an edge from $t \to s$).

Figure 1: Network of disjoint cliques. Each clique might represent an article and edges represent co-authorships. Note that we assume each individual has at least one edge.

Specifically, we construct a network $G = \{V, E\}$, where $V$ is the set of all individuals and $E \subseteq V \times V$ denotes the set of directed edges. If two individuals are tied, there are two directed edges, so that when individuals $s$ and $t$ are connected, both $s \to t$ and $s \leftarrow t$ are in $E$.
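As a rough illustration of why $r$ and $\alpha$ differ in general, the following sketch computes $\alpha$ two ways (individual by individual, and from the clique counts $n_{ij}$) and compares it with Newman's $r$ when every directed edge has equal weight. The toy data `PAPERS` and all function names are assumptions made for this example, not part of the original note; each paper is encoded as a pair (number of positive authors, number of negative authors) and is assumed to have at least two authors.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy data: each paper is (i, j) = (positive authors, negative authors).
# Every paper is assumed to have at least two authors, so each author has a tie.
PAPERS = [(2, 0), (0, 2), (1, 1), (3, 1)]


def alpha_individuals(papers):
    """Alpha as a difference of averages taken over individuals."""
    pos, neg = [], []
    for i, j in papers:
        k = i + j - 1                      # number of co-authors of each author
        pos += [Fraction(i - 1, k)] * i    # a positive author has i - 1 positive ties
        neg += [Fraction(i, k)] * j        # a negative author has i positive ties
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def alpha_counts(papers):
    """Alpha from the clique counts n[(i, j)]."""
    n = Counter(papers)
    n_pos = sum(c * i for (i, j), c in n.items())
    n_neg = sum(c * j for (i, j), c in n.items())
    p = sum(c * i * Fraction(i - 1, i + j - 1) for (i, j), c in n.items())
    q = sum(c * j * Fraction(i, i + j - 1) for (i, j), c in n.items())
    return p / n_pos - q / n_neg


def newman_r_unweighted(papers):
    """Newman's r for two groups with every directed edge given weight 1."""
    pp = sum(i * (i - 1) for i, j in papers)   # directed + -> + edges
    nn = sum(j * (j - 1) for i, j in papers)   # directed - -> - edges
    pn = sum(i * j for i, j in papers)         # + -> - edges (same count as - -> +)
    total = pp + nn + 2 * pn
    a_pos = Fraction(pp + pn, total)
    a_neg = Fraction(nn + pn, total)
    s = a_pos**2 + a_neg**2                    # a_i = b_i because edges are reciprocated
    return (Fraction(pp + nn, total) - s) / (1 - s)


print(alpha_individuals(PAPERS), alpha_counts(PAPERS), newman_r_unweighted(PAPERS))
# prints: 1/6 1/6 0
```

On this toy collection the two computations of $\alpha$ agree, while the unweighted $r$ is pulled toward zero by the four-author paper, which contributes twelve of the eighteen directed edges.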
Theorem 1 In the graph $G = \{V, E\}$, if each outgoing edge is weighted inversely proportional to the node degree, then Newman's $r$ is equal to Bergstrom's $\alpha$.

Intuitively, we can see that this is true because downweighting the edges of authors with many co-authors results in each author being counted equally regardless of the number of co-authors. The following corollary is a direct result of Theorem 1.

Corollary 2 If every clique has the same number of individuals, then in the graph $G = \{V, E\}$ where each edge has weight 1, Newman's $r$ is equal to Bergstrom's $\alpha$.

5 Proof of Theorem 1

Let $V$ be the set of individuals, and $N_+$ and $N_-$ be the sets of positive and negative individuals respectively. Let $\pi_s$ be the number of edges from individual $s$ to a positive individual and $\nu_s$ be the number of edges from individual $s$ to a negative individual. Let $K_s = \pi_s + \nu_s$ denote the out-degree for individual $s$. Finally, let $Z_s$ denote the weight of edges for node $s$. Note that $e_{+-} = e_{-+}$ since for every edge from a positive to a negative individual, there must be the corresponding edge back from the negative to the positive. These quantities can be organized in a joint distribution table (same as Table 1 in Newman):

$$
\begin{array}{c|cc|c}
 & + & - & \text{Marginal} \\ \hline
+ & e_{++} & e_{+-} & a_+ = e_{++} + e_{+-} \\
- & e_{-+} & e_{--} & a_- = e_{-+} + e_{--} \\ \hline
\text{Marginal} & b_+ = e_{++} + e_{-+} & b_- = e_{+-} + e_{--} &
\end{array} \tag{3}
$$

where the marginal quantities $a$ and $b$ are simply the row and column sums respectively. Note that since $e_{+-} = e_{-+}$, $a_i = b_i$.
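Before the algebra, the claims in Theorem 1 and Corollary 2 can be checked numerically on a small example. The sketch below builds the weighted version of table (3) for a collection of disjoint cliques and evaluates Newman's $r$ from it. The data sets `PAPERS` and `SAME_SIZE` and the function names are hypothetical, introduced only for this check; each paper is a pair (positive authors, negative authors) with at least two authors.

```python
from fractions import Fraction

# Hypothetical toy data: (positive authors, negative authors) per paper.
PAPERS = [(2, 0), (0, 2), (1, 1), (3, 1)]        # mixed clique sizes
SAME_SIZE = [(2, 1), (1, 2), (3, 0), (0, 3)]     # every clique has three authors


def mixing_table(papers, weight):
    """Weighted proportions (e_pp, e_pn, e_np, e_nn) of directed edges.

    weight(k) is the weight Z_s on each outgoing edge of a node with out-degree k.
    """
    w_pp = w_pn = w_np = w_nn = Fraction(0)
    for i, j in papers:
        z = Fraction(weight(i + j - 1))
        w_pp += z * i * (i - 1)   # + -> + edges inside this clique
        w_pn += z * i * j         # + -> - edges
        w_np += z * j * i         # - -> + edges
        w_nn += z * j * (j - 1)   # - -> - edges
    total = w_pp + w_pn + w_np + w_nn
    return w_pp / total, w_pn / total, w_np / total, w_nn / total


def newman_r(table):
    """Newman's r from a two-group mixing table."""
    e_pp, e_pn, e_np, e_nn = table
    a_pos, a_neg = e_pp + e_pn, e_np + e_nn      # row sums
    b_pos, b_neg = e_pp + e_np, e_pn + e_nn      # column sums
    s = a_pos * b_pos + a_neg * b_neg
    return (e_pp + e_nn - s) / (1 - s)


def bergstrom_alpha(papers):
    """Alpha as the difference of per-individual average shares of positive ties."""
    n_pos = sum(i for i, _ in papers)
    n_neg = sum(j for _, j in papers)
    p = sum(i * Fraction(i - 1, i + j - 1) for i, j in papers) / n_pos
    q = sum(j * Fraction(i, i + j - 1) for i, j in papers) / n_neg
    return p - q


r_weighted = newman_r(mixing_table(PAPERS, lambda k: Fraction(1, k)))  # Z_s = 1 / K_s
r_unit = newman_r(mixing_table(PAPERS, lambda k: 1))                   # all weights 1
print(r_weighted, bergstrom_alpha(PAPERS), r_unit)
# prints: 1/6 1/6 0
```

With inverse-degree weights the coefficient matches $\alpha$ on the mixed-size collection, whereas unit weights do not. On `SAME_SIZE`, where every author has the same number of co-authors, unit weights already give `newman_r(mixing_table(SAME_SIZE, lambda k: 1)) == bergstrom_alpha(SAME_SIZE)`, which is the situation described by Corollary 2. A numerical check on one example is of course no substitute for the proof.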
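Because we have only two groups and the table is symmetric, we have

$$
r = \frac{\sum_i e_{ii} - \sum_i a_i b_i}{1 - \sum_i a_i b_i} = \frac{e_{++} + e_{--} - a_+^2 - a_-^2}{1 - a_+^2 - a_-^2}. \tag{4}
$$

We then define the following quantities:

$$
\begin{aligned}
\text{Weighted sum of } + \to + \text{ edges} &= \sum_{s \in N_+} Z_s \pi_s \\
\text{Weighted sum of } - \to - \text{ edges} &= \sum_{s \in N_-} Z_s \nu_s \\
\text{Weighted sum of } + \to - \text{ edges} &= \sum_{s \in N_+} Z_s \nu_s \\
\text{Weighted sum of } + \text{ outgoing edges} &= \sum_{s \in N_+} Z_s (\pi_s + \nu_s) \\
\text{Weighted sum of } - \text{ outgoing edges} &= \sum_{s \in N_-} Z_s (\pi_s + \nu_s)
\end{aligned} \tag{5}
$$

where the weighted proportions of the edges $e_{++}$, $e_{+-}$, $e_{--}$, $a_+$, $a_-$ are the quantities above normalized by the total weight of all edges

$$
\sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_-} Z_s \pi_s + \sum_{s \in N_-} Z_s \nu_s = \sum_{s \in N_+} Z_s \pi_s + 2 \sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_-} Z_s \nu_s .
$$

The simplification in the total weight uses the assumption that each edge is reciprocated, so $\sum_{s \in N_+} Z_s \nu_s = \sum_{s \in N_-} Z_s \pi_s$.

First we simplify the numerator of $r$ from equation 4, multiplied by the squared total weight:

$$
\begin{aligned}
&\left( \sum_{s \in N_+} Z_s \pi_s + 2 \sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_-} Z_s \nu_s \right)^2 \left( e_{++} + e_{--} - a_+^2 - a_-^2 \right) \\
&\quad = \left( \sum_{s \in N_+} Z_s \pi_s + 2 \sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_-} Z_s \nu_s \right) \left( \sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_-} Z_s \nu_s \right) \\
&\qquad - \left( \sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s \right)^2 - \left( \sum_{s \in N_-} Z_s \nu_s + \sum_{s \in N_+} Z_s \nu_s \right)^2 \\
&\quad = \left( \sum_{s \in N_+} Z_s \pi_s \right)^2 + 2 \sum_{s \in N_+} Z_s \nu_s \sum_{s \in N_+} Z_s \pi_s + 2 \sum_{s \in N_-} Z_s \nu_s \sum_{s \in N_+} Z_s \pi_s \\
&\qquad + 2 \sum_{s \in N_+} Z_s \nu_s \sum_{s \in N_-} Z_s \nu_s + \left( \sum_{s \in N_-} Z_s \nu_s \right)^2 \\
&\qquad - \left( \sum_{s \in N_+} Z_s \pi_s \right)^2 - 2 \sum_{s \in N_+} Z_s \pi_s \sum_{s \in N_+} Z_s \nu_s - \left( \sum_{s \in N_+} Z_s \nu_s \right)^2 \\
&\qquad - \left( \sum_{s \in N_-} Z_s \nu_s \right)^2 - 2 \sum_{s \in N_+} Z_s \nu_s \sum_{s \in N_-} Z_s \nu_s - \left( \sum_{s \in N_+} Z_s \nu_s \right)^2 \\
&\quad = 2 \left[ \sum_{s \in N_-} Z_s \nu_s \sum_{s \in N_+} Z_s \pi_s - \left( \sum_{s \in N_+} Z_s \nu_s \right)^2 \right].
\end{aligned} \tag{6}
$$

Now considering the denominator from equation 4,

$$
\begin{aligned}
&\left( \sum_{s \in N_+} Z_s \pi_s + 2 \sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_-} Z_s \nu_s \right)^2 \left( 1 - a_+^2 - a_-^2 \right) \\
&\quad = \left( \sum_{s \in N_+} Z_s \pi_s + 2 \sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_-} Z_s \nu_s \right)^2 - \left( \sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s \right)^2 - \left( \sum_{s \in N_-} Z_s \nu_s + \sum_{s \in N_+} Z_s \nu_s \right)^2 \\
&\quad = \left( \sum_{s \in N_+} Z_s \pi_s \right)^2 + 4 \left( \sum_{s \in N_+} Z_s \nu_s \right)^2 + \left( \sum_{s \in N_-} Z_s \nu_s \right)^2 + 4 \sum_{s \in N_+} Z_s \nu_s \sum_{s \in N_+} Z_s \pi_s \\
&\qquad + 4 \sum_{s \in N_+} Z_s \nu_s \sum_{s \in N_-} Z_s \nu_s + 2 \sum_{s \in N_-} Z_s \nu_s \sum_{s \in N_+} Z_s \pi_s \\
&\qquad - \left( \sum_{s \in N_+} Z_s \pi_s \right)^2 - 2 \sum_{s \in N_+} Z_s \pi_s \sum_{s \in N_+} Z_s \nu_s - \left( \sum_{s \in N_+} Z_s \nu_s \right)^2 \\
&\qquad - \left( \sum_{s \in N_-} Z_s \nu_s \right)^2 - 2 \sum_{s \in N_-} Z_s \nu_s \sum_{s \in N_+} Z_s \nu_s - \left( \sum_{s \in N_+} Z_s \nu_s \right)^2 \\
&\quad = 2 \left( \sum_{s \in N_+} Z_s \nu_s \right)^2 + 2 \sum_{s \in N_+} Z_s \pi_s \sum_{s \in N_+} Z_s \nu_s + 2 \sum_{s \in N_+} Z_s \nu_s \sum_{s \in N_-} Z_s \nu_s + 2 \sum_{s \in N_-} Z_s \nu_s \sum_{s \in N_+} Z_s \pi_s \\
&\quad = 2 \sum_{s \in N_+} Z_s \nu_s \left( \sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_+} Z_s \pi_s \right) + 2 \sum_{s \in N_-} Z_s \nu_s \left( \sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_+} Z_s \pi_s \right) \\
&\quad = 2 \left( \sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_-} Z_s \nu_s \right) \left( \sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_+} Z_s \pi_s \right).
\end{aligned} \tag{7}
$$

Recall that $\sum_{s \in N_-} Z_s \pi_s = \sum_{s \in N_+} Z_s \nu_s$. Simplifying the numerator and denominator together yields

$$
\begin{aligned}
r &= \frac{\sum_{s \in N_-} Z_s \nu_s \sum_{s \in N_+} Z_s \pi_s - \left( \sum_{s \in N_+} Z_s \nu_s \right)^2}{\left( \sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s \right) \left( \sum_{s \in N_-} Z_s \nu_s + \sum_{s \in N_+} Z_s \nu_s \right)} \\
&= \frac{\sum_{s \in N_+} Z_s \pi_s \left( \sum_{s \in N_-} Z_s \nu_s + \sum_{s \in N_+} Z_s \nu_s \right) + \sum_{s \in N_-} Z_s \nu_s \left( \sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s \right)}{\left( \sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s \right) \left( \sum_{s \in N_-} Z_s \nu_s + \sum_{s \in N_+} Z_s \nu_s \right)} \\
&\qquad - \frac{\left( \sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s \right) \left( \sum_{s \in N_-} Z_s \nu_s + \sum_{s \in N_+} Z_s \nu_s \right)}{\left( \sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s \right) \left( \sum_{s \in N_-} Z_s \nu_s + \sum_{s \in N_+} Z_s \nu_s \right)} \\
&= \frac{\sum_{s \in N_+} Z_s \pi_s}{\sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s} + \frac{\sum_{s \in N_-} Z_s \nu_s}{\sum_{s \in N_-} Z_s \nu_s + \sum_{s \in N_+} Z_s \nu_s} - 1 \\
&= \frac{\sum_{s \in N_+} Z_s \pi_s}{\sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s} - \frac{\sum_{s \in N_-} Z_s \pi_s}{\sum_{s \in N_+} Z_s \nu_s + \sum_{s \in N_-} Z_s \nu_s}.
\end{aligned} \tag{8}
$$

When $Z_s = c / K_s$ for some constant $c$, the denominators simplify:

$$
\sum_{s \in N_+} Z_s \pi_s + \sum_{s \in N_+} Z_s \nu_s = \sum_{s \in N_+} \frac{c}{K_s} (\pi_s + \nu_s) = \sum_{s \in N_+} c = c\,|N_+| \tag{9}
$$

$$
\sum_{s \in N_-} Z_s \nu_s + \sum_{s \in N_+} Z_s \nu_s = \sum_{s \in N_-} Z_s \nu_s + \sum_{s \in N_-} Z_s \pi_s = \sum_{s \in N_-} \frac{c}{K_s} (\pi_s + \nu_s) = \sum_{s \in N_-} c = c\,|N_-| \tag{10}
$$

and

$$
r = \frac{\sum_{s \in N_+} \frac{c}{K_s} \pi_s}{c\,|N_+|} - \frac{\sum_{s \in N_-} \frac{c}{K_s} \pi_s}{c\,|N_-|} = \frac{1}{|N_+|} \sum_{s \in N_+} \frac{\pi_s}{K_s} - \frac{1}{|N_-|} \sum_{s \in N_-} \frac{\pi_s}{K_s} = \alpha. \tag{11}
$$

Thus, Newman's assortativity coefficient, when obtained from a network of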
co-authorships where edges are weighted inversely proportional to the number of co-authors, is equal to $\alpha$.

For a proof of Corollary 2, if all authors have the same number of co-authors $K$, then we let $c = K$ so that $Z_s = c / K_s = K / K = 1$ and all edges have weight 1.

6 Conclusion

Scientists have become increasingly aware of the gender imbalances present in professional academic activities (e.g., West et al., 2013), and have raised the issue of the importance of proper analysis and measurement (Eisen, 2016). In this note, by showing under what circumstances Bergstrom's $\alpha$ and Newman's $r$ are equivalent, we also hope to highlight how they differ. In particular, we note that authors with many co-authors (or equivalently papers with many authors) have a greater effect on the originally proposed Newman's $r$ (with all edge weights equal) than they would have in Bergstrom's $\alpha$. This may or may not be desirable and should be considered carefully depending on the specific context.

References

Bergstrom, T., Bergstrom, M., King, M., Jacquet, J., West, J., and Correll, S. (2016). A note on measuring gender homophily among scholarly authors. http://eigenfactor.org/gender/assortativity/measuring_homophily.pdf.

Bergstrom, T. C. (2003). The algebra of assortative encounters and the evolution of cooperation. International Game Theory Review, 5(03):211–228.

Boschini, A. and Sjögren, A. (2007). Is team formation gender neutral? Evidence from coauthorship patterns. Journal of Labor Economics, 25(2):325–365.

Croft, D., James, R., Ward, A., Botham, M., Mawdsley, D., and Krause, J. (2005). Assortative interactions and social networks in fish. Oecologia, 143(2):211–219.

Eisen, M. (2016). Exploring the relationship between gender and author order and composition in NIH-funded research. http://www.michaeleisen.org/blog/?p=1931#respond.

Foster, J. G., Foster, D. V., Grassberger, P., and Paczuski, M. (2010). Edge direction and the structure of networks. Proceedings of the National Academy of Sciences, 107(24):10815–10820.

Lusseau, D. and Newman, M. E. J. (2004). Identifying the role that animals play in their social networks. Proceedings of the Royal Society of London B: Biological Sciences, 271(Suppl 6):S477–S481.

McDowell, J. M. and Smith, J. K. (1992). The effect of gender-sorting on propensity to coauthor: Implications for academic promotion. Economic Inquiry, 30(1):68–82.

McPherson, M., Smith-Lovin, L., and Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, pages 415–444.

Newman, M. (2003). Mixing patterns in networks. Physical Review E, 67(2).

Piraveenan, M., Prokopenko, M., and Zomaya, A. (2012). Assortative mixing in directed biological networks. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 9(1):66–78.

Rivera, M. T., Soderstrom, S. B., and Uzzi, B. (2010). Dynamics of dyads in social networks: Assortative, relational, and proximity mechanisms. Annual Review of Sociology, 36:91–115.

West, J. D., Jacquet, J., King, M. M., Correll, S. J., and Bergstrom, C. T. (2013). The role of gender in scholarly authorship. PloS one, 8(7):e66212.

Wright, S. (1949). The genetical structure of populations. Annals of Eugenics, 15(1):323–354.