Loss of information in feedforward social networks



Simon Stolarczyk (1), Manisha Bhardwaj (1), Kevin E. Bassler (1,2,3), Wei Ji Ma (4), Krešimir Josić (1,5,6,*)

1 Department of Mathematics, University of Houston, Houston, TX 77204
2 Department of Physics, University of Houston, Houston, TX 77204
3 Texas Center for Superconductivity, University of Houston, Houston, TX 77204
4 Center for Neural Science and Department of Psychology, New York University, New York, NY 10003
5 Department of Biology and Biochemistry, University of Houston, Houston, TX 77204
6 Department of BioSciences, Rice University, Houston, TX 77251
* Corresponding author: josic@math.uh.edu

Abstract

We consider model social networks in which information propagates directionally across layers of rational agents. Each agent makes a locally optimal estimate of the state of the world, and communicates this estimate to agents downstream. When agents receive information from the same source their estimates are correlated. We show that the resulting redundancy can lead to the loss of information about the state of the world across layers of the network, even when all agents have full knowledge of the network's structure. A simple algebraic condition identifies networks in which information loss occurs, and we show that all such networks must contain a particular network motif. We also study random networks asymptotically as the number of agents increases, and find a sharp transition in the probability of information loss at the point at which the number of agents in one layer exceeds the number in the previous layer.

Keywords: social networks, random graphs, information propagation, networks, graphs, Bayesian agents

Preprint submitted to Elsevier, October 30, 2018

1. Introduction

While there are billions of people on the planet, we exchange information with only a small fraction of them. How does information propagate through such social networks, shape our opinions, and influence our decisions? How do our interactions impact our choice of career or candidate in an election? More generally, how do we as agents in a network aggregate noisy signals to infer the state of the world?

These questions have a long history. The general problem is not easy to describe using a tractable mathematical model, as it is difficult to provide a reasonable probabilistic description of the state of the world. We also lack a full understanding of how perception (Brunton et al., 2013; Beck et al., 2012) and the information we exchange (Bahrami et al., 2010) shape our decisions. Progress has therefore relied on tractable, idealized models that mimic some of the main features of information exchange in social networks.

Early models relied on computationally tractable interactions, such as the majority rule assumed in Condorcet's Jury Theorem (Condorcet, 1976), or the local averaging assumed in the DeGroot model (DeGroot, 1974). More recent models rely on the assumption of rational agents, mostly Bayesian agents who use private signals and measurements (observations) of each other's actions to maximize utility. These rational models are generally either sequential or iterative. In sequential models, agents are ordered and act in turn based on a private signal and the observed actions of their predecessors (Banerjee, 1992; Bikhchandani et al., 1992).
In iterative models, agents make a measurement, and then iteratively exchange information with their neighbors (Gale and Kariv, 2003; Mossel et al., 2014). Sequential models have been used to illustrate information cascades (Bikhchandani et al., 1998), while iterative models have been used to illustrate agreement and learning (Mossel and Tamuz, 2014).

Here we consider a sequential model in which information propagates directionally through layers of rational agents. The agents are part of a structured network, rather than a simple chain. As in sequential models, we assume that information transfer is directional, and the recipient does not communicate information back to its source. This assumption could describe the propagation of information via print or any other fixed medium. We assume that at each step a layer of agents receives information from those in the previous layer. This differs from previous sequential models in which agents received information in turn from all their predecessors, as in Banerjee (1992); Easley and Kleinberg (2010); Welch (1992) and Bharat and Mihaila (2001). Importantly, the same information can reach an agent via multiple paths, so information received from agents in the previous layer can be redundant. We show that, depending on the network structure, rational agents with full knowledge of the network structure cannot always resolve such redundancy. As a result, an estimate of the state of the world can degrade over layers. We also show that network architectures that lead to information loss can amplify an agent's bias in subsequent layers.

As an example, consider the network in Fig. 1.1(a). We assume that the first-layer agents make measurements $x_1$, $x_2$, and $x_3$ of the state of the world, $s$, and that these measurements are normally distributed with equal variance. Each agent makes an estimate, $\hat{s}^{(1)}_1$, $\hat{s}^{(1)}_2$, and $\hat{s}^{(1)}_3$, of $s$. The superscript and subscript refer to the layer and agent number, respectively. An agent with global access to all first-layer estimates would be able to make the optimal (minimum-variance) estimate $\hat{s}_{\mathrm{ideal}} = \tfrac{1}{3}\big(\hat{s}^{(1)}_1 + \hat{s}^{(1)}_2 + \hat{s}^{(1)}_3\big)$ of $s$. All agents in the first layer then communicate their estimates to one or both of the second-layer agents. These in turn use the received information to make their own estimates, $\hat{s}^{(2)}_1 = \tfrac{1}{2}\big(\hat{s}^{(1)}_1 + \hat{s}^{(1)}_2\big)$ and $\hat{s}^{(2)}_2 = \tfrac{1}{2}\big(\hat{s}^{(1)}_2 + \hat{s}^{(1)}_3\big)$. An agent receiving the two estimates from the second layer then takes their linear combination to estimate $s$. However, in this network no linear combination of the locally optimal estimates, $\hat{s}^{(2)}_1$ and $\hat{s}^{(2)}_2$, equals the best estimate, $\hat{s}_{\mathrm{ideal}}$, obtainable from all measurements in the first layer. Indeed,
$$\hat{s} = \beta_1 \hat{s}^{(2)}_1 + \beta_2 \hat{s}^{(2)}_2 = \frac{\beta_1}{2}\big(\hat{s}^{(1)}_1 + \hat{s}^{(1)}_2\big) + \frac{\beta_2}{2}\big(\hat{s}^{(1)}_2 + \hat{s}^{(1)}_3\big) \neq \hat{s}_{\mathrm{ideal}} = \frac{1}{3}\big(\hat{s}^{(1)}_1 + \hat{s}^{(1)}_2 + \hat{s}^{(1)}_3\big),$$
with the inequality holding for any choice of $\beta_1, \beta_2$.

Fig. 1.1: Illustration of the general setup. Agents in the first layer make measurements, $x_1$, $x_2$, and $x_3$, of a parameter $s$. In each layer agents make an estimate of this parameter, and communicate it to agents in the subsequent layer. We show that information about $s$ degrades across layers in the network in panel (a), but not in the network in (b).

Moreover, assume the estimates of first-layer agents are biased, so that $\hat{s}^{(1)}_i = x_i + b_i$. If the other agents are unaware of this bias then, as we will show, the final estimate is
$$\hat{s} = \big(\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4}\big) \cdot \big(x_1 + b_1,\, x_2 + b_2,\, x_3 + b_3\big) = \big(\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4}\big) \cdot \mathbf{x} + \big(\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4}\big) \cdot (b_1, b_2, b_3).$$
Thus the bias of the second agent in the first layer, $a^{(1)}_2$, has disproportionate weight in the final estimate.

In this example the information about the state of the world, $s$, available from second-layer agents is less than that available from first-layer agents. The measurement $x_2$ is used by both agents in the second layer. The estimates of the two second-layer agents are therefore correlated, and the final agent cannot disentangle them to recover the ideal estimate. We will show that the type of subgraph shown in Fig. 1.1(a), which we call a W-motif, provides the main obstruction to obtaining the best estimate in subsequent layers.
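The claim that no combination of the two second-layer estimates recovers the ideal estimate can be checked directly. The following Python sketch is our own minimal illustration (not part of the paper); it searches for coefficients $\beta_1, \beta_2$ by least squares and shows that the residual is nonzero, so the ideal weights $(1/3, 1/3, 1/3)$ are not in the span of the second-layer weight vectors.

import numpy as np

# Rows: weights the two second-layer estimates place on (x1, x2, x3) in Fig. 1.1(a).
S2 = np.array([[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5]])

# Ideal weights on (x1, x2, x3) for equal-variance measurements.
ideal = np.array([1/3, 1/3, 1/3])

# Best (beta1, beta2) in the least-squares sense, and the size of the mismatch.
beta, _, _, _ = np.linalg.lstsq(S2.T, ideal, rcond=None)
print("best (beta1, beta2):", beta)                       # approximately (0.44, 0.44)
print("residual:", np.linalg.norm(S2.T @ beta - ideal))   # about 0.19 > 0: no exact solution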
2. The Model

We consider feedforward networks having $n$ layers and identify each node of a network with an agent. The structure of the network is thus given by a directed graph with agents occupying the vertices. Agents in each layer only communicate with those in the next layer. For convenience, we will assume that layer $n$ consists of a single agent that receives information from all agents in layer $n-1$ (Fig. 1.1). This final agent in the last layer therefore makes the best estimate based on all the estimates in the next-to-last layer. We will use this last agent's estimate to quantify information loss in the network.

We assume that all agents are Bayesian, and know the structure of the network. Every agent estimates an unknown parameter, $s \in \mathbb{R}$, but only the agents in the first layer make a measurement of this parameter. Each agent makes the best possible estimate given the information it receives and communicates this estimate to a subset of agents in the next layer. We also assume that the measurements, $x_i$, made by agents in the first layer are normally distributed and independent, $x_i \sim \mathcal{N}(x \mid s, \sigma_i^2)$. Furthermore, every agent in the network knows the variance, $\sigma_i^2$, of each measurement in the first layer. Also, for simplicity, we will assume that all agents share an improper, flat prior over $s$. This assumption does not affect the main results.

An agent with access to all of the measurements, $\{x_i\}_i$, has access to all the information available about $s$ in the network. This agent can make an ideal estimate, $\hat{s}_{\mathrm{ideal}} = \operatorname{argmax}_s p(s \mid x_1, \ldots, x_{L_1})$. We assume that the agents in the network make locally optimal, maximum-likelihood estimates of $s$, and ask when the estimate of the final agent equals the ideal estimate, $\hat{s}_{\mathrm{ideal}}$.

Individual Estimate Calculations. Each agent in the first layer only has access to its own measurement, and makes an estimate equal to this measurement. We therefore write $\hat{s}^{(1)}_i = x_i$. We denote the $j$th agent in layer $k$ by $a^{(k)}_j$. Each of these agents makes an estimate, $\hat{s}^{(k)}_j$, of $s$, using the estimates communicated by its neighbors in the previous layer. Under our assumptions, the posterior computed by any agent is normal, and the vector of estimates in a layer follows a multivariate Gaussian distribution. As agents in the second layer and beyond can share upstream neighbors, the covariance between their estimates is typically nonzero. We show that, under the assumption that the variances of the initial measurements and the structure of the network are known to all agents, each agent knows the full joint posterior distribution over $s$ for all agents it receives information from.
Weight Matrices. We define the connectivity matrix $C^{(k)}$ for $1 \le k \le n-1$ by
$$C^{(k)}_{ij} = \begin{cases} 1, & \text{if } a^{(k)}_j \text{ communicates with } a^{(k+1)}_i, \\ 0, & \text{otherwise.} \end{cases} \qquad (2.1)$$
An agent receives a subset of estimates from the previous layer determined by this connectivity matrix. The agent then uses this information to make its own, maximum-likelihood estimate of $s$. By our assumptions, this estimate will be a linear combination of the communicated estimates (Kay, 1993). Denoting by $\hat{\mathbf{s}}^{(k)}$ the vector of estimates in the $k$th layer, we can therefore write $\hat{s}^{(k+1)}_i = \mathbf{w}^{(k+1)}_i \cdot \hat{\mathbf{s}}^{(k)}$, and $\hat{\mathbf{s}}^{(k+1)} = W^{(k+1)} \hat{\mathbf{s}}^{(k)}$. Here $W^{(k+1)}$ is a matrix of weights applied to the estimates in the $k$th layer.

Weighting by Precision. We can write $\hat{\mathbf{s}}^{(1)} = W^{(1)} \mathbf{x}$, where $W^{(1)}$ is the identity matrix and $\mathbf{x}$ is the vector of measurements made in the first layer. We assume that all measurements have finite, nonzero variance. Defining $w_i := 1/\sigma_i^2$, we can calculate $W^{(2)}$ entrywise: $w^{(2)}_{ij}$ is 0 if agent $a^{(2)}_i$ does not communicate with $a^{(1)}_j$; otherwise
$$w^{(2)}_{ij} = \frac{w_j}{\sum_{k \to i} w_k},$$
where the sum is taken over all agents in the first layer that communicate with agent $a^{(2)}_i$. Therefore,
$$\hat{\mathbf{s}}^{(2)} = W^{(2)} \hat{\mathbf{s}}^{(1)} = W^{(2)} W^{(1)} \mathbf{x}. \qquad (2.2)$$

Covariance Matrices. The estimates in the second layer and beyond can be correlated. Let $L_k$ be the number of agents in the $k$th layer and, for $2 \le k \le n-1$, define $\Omega^{(k)} = (\xi^{(k)}_{ij})$ as the $L_k \times L_k$ covariance matrix of the estimates in the $k$th layer, $\xi^{(k)}_{ij} = \mathrm{Cov}\big(\hat{s}^{(k)}_i, \hat{s}^{(k)}_j\big)$. When all of the weights are known, we have
$$\hat{\mathbf{s}}^{(k)} = W^{(k)} \hat{\mathbf{s}}^{(k-1)} = W^{(k)} W^{(k-1)} \hat{\mathbf{s}}^{(k-2)} = \cdots = \left( \prod_{l=0}^{k-2} W^{(k-l)} \right) \hat{\mathbf{s}}^{(1)}. \qquad (2.3)$$
The $i$th row of $\prod_{l=0}^{k-2} W^{(k-l)}$ is the vector of weights that agent $a^{(k)}_i$ applies to the first-layer estimates, since its entries are the coefficients in $\hat{s}^{(k)}_i$. The complete covariance matrix, $\Omega^{(k)}$, can therefore be written as
$$\Omega^{(k)} = \mathrm{Cov}\big(\hat{\mathbf{s}}^{(k)}\big) = \mathrm{Cov}\big(W^{(k)} \hat{\mathbf{s}}^{(k-1)}\big) = W^{(k)} \mathrm{Cov}\big(\hat{\mathbf{s}}^{(k-1)}\big) \big( W^{(k)} \big)^T \qquad (2.4)$$
$$= \left( \prod_{l=0}^{k-2} W^{(k-l)} \right) \mathrm{Cov}\big(\hat{\mathbf{s}}^{(1)}\big) \left( \prod_{l=0}^{k-2} W^{(k-l)} \right)^T = \left( \prod_{l=0}^{k-2} W^{(k-l)} \right) \mathrm{Diag}\left( \frac{1}{w_1}, \ldots, \frac{1}{w_{L_1}} \right) \left( \prod_{l=0}^{k-2} W^{(k-l)} \right)^T.$$

Now the $i$th agent in layer $k \ge 3$, $a^{(k)}_i$, can use $\Omega^{(k-1)}$ to calculate $\mathbf{w}^{(k)}_i$. If the agent is not connected to all agents in the $(k-1)$th layer, it uses the submatrix of $\Omega^{(k-1)}$ with rows and columns corresponding to the agents in the previous layer that communicate their estimates to it. We denote this submatrix by $R^{(k-1)}_i$. As in Mossel and Tamuz (2010), we assume that edges are removed from the graph so that all submatrices $R^{(k-1)}_i$ are invertible, while all estimates remain the same as in the original network. An agent thus receives estimates that follow a multivariate normal distribution, $\mathcal{N}\big(\hat{\mathbf{s}}^{(k-1)}_{j \to i}, R^{(k-1)}_i\big)$; see Kay (1993). The weights assigned by agent $a^{(k)}_i$ to the estimates of agents in the previous layer are therefore (see also Mossel and Tamuz (2010))
$$\tilde{\mathbf{w}}^{(k)}_i = \frac{\mathbf{1}^T \big( R^{(k-1)}_i \big)^{-1}}{\mathbf{1}^T \big( R^{(k-1)}_i \big)^{-1} \mathbf{1}}. \qquad (2.5)$$
We define $\mathbf{w}^{(k)}_i$ by using the corresponding entries from $\tilde{\mathbf{w}}^{(k)}_i$ and setting the remainder to zero.
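To make Eqs. (2.2)-(2.5) concrete, the following Python sketch (our illustration; the function and variable names are not from the paper) propagates the weight and covariance matrices through a feedforward network, computing each agent's weights from the covariance submatrix of its inputs. We assume every agent has at least one upstream neighbor and that the relevant submatrices are invertible.

import numpy as np

def propagate(C_list, variances):
    """Given connectivity matrices C_list[k] (layer k+1 -> k+2) and first-layer
    measurement variances, return the weight matrices W^(k) and the weights the
    final processed layer effectively applies to the first-layer measurements."""
    prec = 1.0 / np.asarray(variances, dtype=float)
    A = np.eye(len(prec))                       # rows: weights on first-layer measurements
    cov = np.diag(1.0 / prec)                   # Cov of first-layer estimates
    Ws = []
    for C in C_list:
        W = np.zeros(C.shape)
        for i in range(C.shape[0]):
            idx = np.flatnonzero(C[i])          # upstream neighbors of agent i
            R = cov[np.ix_(idx, idx)]           # covariance of the received estimates
            ones = np.ones(len(idx))
            u = np.linalg.solve(R, ones)
            W[i, idx] = u / (ones @ u)          # Eq. (2.5)
        Ws.append(W)
        A = W @ A                               # running product of weight matrices, Eq. (2.3)
        cov = A @ np.diag(1.0 / prec) @ A.T     # covariance of the current layer, Eq. (2.4)
    return Ws, A

# Network of Fig. 1.1(b): three unit-variance measurements, three second-layer
# agents, one final agent reading all of them.
C1 = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
C2 = np.array([[1, 1, 1]])
Ws, A = propagate([C1, C2], [1.0, 1.0, 1.0])
print(A)    # final agent's weights on (x1, x2, x3): [[1/3, 1/3, 1/3]], the ideal estimate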
In the following we describe the maximum-likelihood estimate that can be made from all the estimates in a layer. For simplicity, we denote this final estimate by $\hat{s}$. The following results are standard (Kay, 1993).

Proposition 1. The posterior distribution over $s$ of the final agent is normal with
$$\hat{s} = \frac{\mathbf{1}^T \big(\Omega^{(n-1)}\big)^{-1}}{\mathbf{1}^T \big(\Omega^{(n-1)}\big)^{-1} \mathbf{1}} \, \hat{\mathbf{s}}^{(n-1)} \qquad \text{and} \qquad \mathrm{Var}[\hat{s}] = \frac{1}{\mathbf{1}^T \big(\Omega^{(n-1)}\big)^{-1} \mathbf{1}}, \qquad (2.6)$$
where $\Omega^{(n-1)}$ is defined by Eq. (2.4) and Eq. (2.5). Here $\hat{s}$ is the maximum-likelihood, as well as minimum-variance, unbiased estimate of $s$.

It follows from Eq. (2.3) that the estimate of any agent in the network is a convex linear combination of the estimates in the first layer.

Examples. Returning to the example in Fig. 1.1(a), we have
$$C^{(1)} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \quad W^{(2)} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}, \quad \Omega^{(2)} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{1}{2} \end{pmatrix}, \quad \big(\Omega^{(2)}\big)^{-1} = \frac{16}{3} \begin{pmatrix} \tfrac{1}{2} & -\tfrac{1}{4} \\ -\tfrac{1}{4} & \tfrac{1}{2} \end{pmatrix}.$$
The final agent applies the weights $W^{(3)} = \big( \tfrac{1}{2}, \tfrac{1}{2} \big)$ to the estimates from the second layer. We thus have the final estimate $\hat{s} = \big( \tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4} \big) \cdot \hat{\mathbf{s}}^{(1)}$ with $\mathrm{Var}[\hat{s}] = \tfrac{3}{8}$. The variance of the ideal estimate is $\tfrac{1}{3}$.

On the other hand, the final agent in the example in Fig. 1.1(b) makes an ideal estimate. Here
$$W^{(2)} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}, \qquad \Omega^{(2)} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{2} \end{pmatrix},$$
and after inverting $\Omega^{(2)}$ we see that applying a weight of $\tfrac{1}{3}$ to every agent in the second layer gives the ideal estimate, $\hat{s} = \big( \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3} \big) \cdot \hat{\mathbf{s}}^{(1)}$.

Remark. If the agents have a proper prior,
$$p(s \mid \chi, \nu) = \mathcal{N}\!\left(s \,\middle|\, \chi, \tfrac{1}{\nu}\right) = \sqrt{\frac{\nu}{2\pi}} \exp\left( -\frac{\nu}{2} (s - \chi)^2 \right), \qquad (2.7)$$
then agents in the first layer make the estimate
$$\hat{s}^{(1)}_i = \frac{\tfrac{1}{\sigma_i^2}}{\tfrac{1}{\sigma_i^2} + \nu} \, x_i + \frac{\nu}{\tfrac{1}{\sigma_i^2} + \nu} \, \chi,$$
with a similar form in the following layers. This does not change the subsequent results as long as all agents have the same prior. Also, if each agent in the network makes a measurement, the general ideas remain unchanged.
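The numbers in these examples follow directly from Proposition 1. Below is a short Python check (ours, not from the paper) that computes the final weights and variance for both networks of Fig. 1.1, assuming unit-variance measurements.

import numpy as np

def final_weights_and_variance(W2):
    """Three-layer network whose final agent reads all second-layer estimates.
    Returns its weights on the first-layer measurements and its variance, Eq. (2.6)."""
    omega = W2 @ W2.T                      # Cov of second-layer estimates (unit-variance x_i)
    ones = np.ones(W2.shape[0])
    u = np.linalg.solve(omega, ones)       # (Omega)^(-1) 1
    w_layer2 = u / (ones @ u)              # weights on the second-layer estimates
    return w_layer2 @ W2, 1.0 / (ones @ u)

# Network (a): information is lost.
W2a = np.array([[0.5, 0.5, 0.0],
                [0.0, 0.5, 0.5]])
print(final_weights_and_variance(W2a))    # ([0.25, 0.5, 0.25], 0.375), worse than the ideal 1/3

# Network (b): the ideal estimate is recovered.
W2b = np.array([[0.5, 0.5, 0.0],
                [0.5, 0.0, 0.5],
                [0.0, 0.5, 0.5]])
print(final_weights_and_variance(W2b))    # ([1/3, 1/3, 1/3], 1/3)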
3. Results

We ask what graphical conditions need to be satisfied so that the agent in the final layer makes an ideal estimate. That is, when does knowing all estimates of the agents in the $(n-1)$st layer give an estimate that is as good as possible given the measurements of all first-layer agents? We refer to a network in which the final estimate is ideal as an ideal network.

Proposition 2. A network with $n$ layers and $\sigma_i^2 \neq 0$ for $i = 1, \ldots, L_1$ is ideal if and only if the vector of inverse variances, $(w_1, \ldots, w_{L_1})$, is in the row space of the weight matrix product $\prod_{l=0}^{n-3} W^{(n-1-l)}$.

Proof. In this setting the ideal estimate is
$$\hat{s}_{\mathrm{ideal}} = \frac{1}{\sum_i w_i} \sum_{i=1}^{L_1} w_i \hat{s}^{(1)}_i. \qquad (3.1)$$
The network is ideal if and only if there are coefficients $\beta_j \in \mathbb{R}$ such that
$$\hat{s}_{\mathrm{ideal}} = \sum_{j=1}^{L_{n-1}} \beta_j \hat{s}^{(n-1)}_j.$$
Matching coefficients with Eq. (3.1), we need
$$\frac{1}{\sum_j w_j} \sum_{i=1}^{L_1} w_i \hat{s}^{(1)}_i = \big( \beta_1, \ldots, \beta_{L_{n-1}} \big) \cdot \hat{\mathbf{s}}^{(n-1)},$$
or equivalently,
$$\frac{1}{\sum_j w_j} (w_1, \ldots, w_{L_1}) \cdot \hat{\mathbf{s}}^{(1)} = \big( \beta_1, \ldots, \beta_{L_{n-1}} \big) \cdot W^{(n-1)} \hat{\mathbf{s}}^{(n-2)} = \big( \beta_1, \ldots, \beta_{L_{n-1}} \big) \cdot \left( \prod_{l=0}^{n-3} W^{(n-1-l)} \right) \hat{\mathbf{s}}^{(1)}.$$
Equality holds exactly when $(w_1, \ldots, w_{L_1})$ is in the row space of $\prod_{l=0}^{n-3} W^{(n-1-l)}$.

In particular, a three-layer network with $\sigma_i^2 = \sigma^2$ for all $i \in \{1, \ldots, L_1\}$ is ideal if and only if the vector $\vec{1} = (1, 1, \ldots, 1)$ is in the row space of the connectivity matrix $C^{(1)}$ defined by Eq. (2.1). We will use and extend this observation below.

3.1. Graphical Conditions for Ideal Networks

We say that a network contains a W-motif if two agents downstream receive common input from a first-layer agent, as well as private input from two distinct first-layer agents. Examples are shown in Fig. 1.1(a) and Fig. 3.1; a rigorous definition follows. We will show that all networks that are not ideal contain a W-motif. However, the converse is not true: the network in Fig. 1.1(b) contains many W-motifs, but is ideal. Therefore ideal networks can contain a W-motif, as the redundancy introduced by a W-motif can sometimes be resolved. Hence, additional graphical conditions determine whether a network is ideal.

Fig. 3.1: A W-motif spanning three layers.

As shown in Fig. 3.1, in a W-motif there is a directed path from a single agent in the first layer to two agents in the third layer. There are also paths from distinct first-layer agents to the two third-layer agents. This general structure is captured by the following definitions.

Definition 1. The path matrix $P^{kl}$, $l < k$, from layer $l$ to layer $k$ is defined by
$$P^{kl}_{ij} = \begin{cases} 1, & \text{if there is a directed path from agent } a^{(l)}_j \text{ to agent } a^{(k)}_i, \\ 0, & \text{otherwise.} \end{cases}$$

Definition 2. A network contains a W-motif if a path matrix from the first layer, $P^{k1}$, has a $2 \times 3$ submatrix equal to $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$ (modulo column permutation). Graphically, two agents in layer $k$ are connected to one common, and two distinct, agents in layer 1.

Theorem 1. A non-ideal network in which every agent communicates its estimate to the subsequent layer must contain a W-motif. Equivalently, if there are no W-motifs, then the network is ideal.

The proof of this theorem can be found in Appendix A. Intuitively, any agent receives estimates that are linear combinations of first-layer measurements. If there are no W-motifs, any two estimates are either obtained from disjoint sets of measurements, or the measurements in the estimate of one agent contain the measurements in the estimate of the other. When the measurements are disjoint, there are no correlations between the estimates and thus no degradation of information. When one set of measurements contains the other, the estimates based on the subset are redundant and can be discarded. Therefore, this redundant information does not cause a degradation of the final estimate.

3.2. Sufficient Conditions for Ideal Three-Layer Networks

We next consider only three-layer networks. This allows us to give a graphical interpretation of the algebraic condition describing ideal networks in Proposition 2. To do so, we will use the following corollary of the proposition.

Corollary 1. Let $C^{(1)}$ be defined as in Eq. (2.1). Then a three-layer network is ideal if and only if the vector $m\vec{1}$ is in the row space of $C^{(1)}$ over $\mathbb{Z}$ for some nonzero $m \in \mathbb{N}$.

The proof is straightforward and provided in Appendix B for completeness.
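The condition in Proposition 2, specialized by Corollary 1, is easy to test numerically: a three-layer network is ideal exactly when the all-ones vector lies in the row space of $C^{(1)}$ over $\mathbb{R}$, equivalently when appending $\vec{1}$ to $C^{(1)}$ does not increase its rank. The sketch below is our own code (not part of the paper) and applies this test to the two networks of Fig. 1.1.

import numpy as np

def is_ideal_three_layer(C1):
    """A three-layer network is ideal iff the all-ones vector lies in the row
    space of C^(1) (Proposition 2 / Corollary 1)."""
    ones = np.ones((1, C1.shape[1]))
    return np.linalg.matrix_rank(np.vstack([C1, ones])) == np.linalg.matrix_rank(C1)

C1_a = np.array([[1, 1, 0],
                 [0, 1, 1]])          # Fig. 1.1(a)
C1_b = np.array([[1, 1, 0],
                 [1, 0, 1],
                 [0, 1, 1]])          # Fig. 1.1(b)
print(is_ideal_three_layer(C1_a))     # False: (1, 1, 1) is not in the row space
print(is_ideal_three_layer(C1_b))     # True: the rows sum to (2, 2, 2) = 2 * (1, 1, 1)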
Note that the corollary is not restricted to the case where first-layer agents make measurements of equal variance; whether the network is ideal or not depends entirely on the connection matrix $C^{(1)}$.

The $i$th row of the matrix $C^{(1)}$ corresponds to the inputs of agent $a^{(2)}_i$, and the sum of the $j$th column is the out-degree of agent $a^{(1)}_j$. Therefore, Corollary 1 is equivalent to the following: if each second-layer agent applies equal integer weights to all of its received estimates, then a three-layer network is ideal if and only if, for some choice of weights, the weighted out-degrees of all agents in the first layer are equal. Hence, we have the following special case:

Corollary 2. A three-layer network is ideal if all first-layer agents have equal out-degree in each connected component of the network restricted to the first two layers.

In the connected network in Fig. 1.1(a), the second agent in the first layer has greater out-degree than the others, while the agents in the first layer of the connected network in Fig. 1.1(b) have equal out-degree.

Some row reduction operations can be interpreted graphically. Let $g$ be the input-map which maps an agent, $a^{(2)}_i$, to the subset of agents in the first layer that it receives estimates from. If $g(a^{(2)}_i) \subseteq g(a^{(2)}_j)$ for some $i \neq j$, then some of the information received by $a^{(2)}_j$ is redundant, as it is already contained in the estimate of agent $a^{(2)}_i$. We can then reduce the network by eliminating the directed edges from $g(a^{(2)}_i)$ to $a^{(2)}_j$, so that in the reduced network $g(a^{(2)}_i) \cap g(a^{(2)}_j) = \emptyset$. This reduction is equivalent to subtracting row $i$ from row $j$ of $C^{(1)}$, resulting in a connection matrix with the same row space. By Proposition 2, the reduced network is ideal if and only if the original network is ideal. This motivates the following definition.

Definition 3. A three-layer network is said to be reduced if $g(a^{(2)}_i)$ is not a subset of $g(a^{(2)}_j)$ for all $1 \le i \neq j \le L_2$.

Reducing a network eliminates edges, and results in a simpler network structure. In a three-layer network, this does not affect the final estimate: since reduction leaves the row space of $C^{(1)}$ unchanged, the final estimate in the reduced and unreduced networks is the result of applying the same weights to the first-layer estimates. This reduction procedure often simplifies the identification of ideal networks to a counting of out-degrees (see Corollary 2).

Example. In Fig. 3.2 we illustrate a two-step reduction of a network. In both steps, an agent (in yellow) has an input set which is overlapped by the input sets of some other agents (bolded). We use this to cancel the common inputs to the bolded agents and simplify the network. In the first step, note that the yellow agent receives input (in red) from a single first-layer agent. We use this to remove all of the other connections (in green) emanating from this first-layer agent. In the second step, we again see that the yellow agent receives input (red) that is overlapped by the input to the agent next to it. We can thus remove the redundant inputs (in green) to the bolded agent. The reduced network has five connected components, all containing vertices with equal out-degree. Hence, this network is ideal by Corollary 2.

Fig. 3.2: Example of a two-step network reduction. It is difficult to tell whether the network on the left is ideal. However, after the reduction, all first-layer agents in each of the five connected components have equal out-degree. The network is therefore ideal.
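The reduction procedure is mechanical and can be automated. The following Python sketch (our illustration; the names and the final example network are hypothetical, not from the paper) repeatedly removes redundant inputs whenever one second-layer agent's input set contains another's, and then reports the first-layer out-degrees within each connected component of the reduced graph, which is what Corollary 2 asks us to compare.

import numpy as np

def reduce_network(C1):
    """Subtract row i from row j whenever the inputs of second-layer agent i are a
    subset of the inputs of agent j. The row space is unchanged (Proposition 2)."""
    C = np.array(C1, dtype=int)
    changed = True
    while changed:
        changed = False
        for i in range(C.shape[0]):
            if not C[i].any():
                continue
            for j in range(C.shape[0]):
                if i != j and C[j].any() and np.all(C[i] <= C[j]):
                    C[j] = C[j] - C[i]          # remove the redundant edges into agent j
                    changed = True
    return C

def component_out_degrees(C):
    """Out-degrees of first-layer agents, grouped by connected component of the
    bipartite graph defined by C (first two layers only)."""
    m, n = C.shape
    parent = list(range(n + m))                 # union-find over layer-1 and layer-2 nodes
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i in range(m):
        for j in range(n):
            if C[i, j]:
                parent[find(j)] = find(n + i)
    comps = {}
    for j in range(n):
        comps.setdefault(find(j), []).append(int(C[:, j].sum()))
    return list(comps.values())

# Hypothetical example: the fourth second-layer agent's inputs contain everything
# the others receive, so its edges are redundant.
C_example = np.array([[1, 0, 0, 0],
                      [1, 1, 0, 0],
                      [0, 0, 1, 1],
                      [1, 1, 1, 1]])
C_red = reduce_network(C_example)
print(C_red)
print(component_out_degrees(C_red))   # e.g. [[1], [1], [1, 1]]: equal within components, so ideal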
3.3. Variance and Bias of the Final Estimate

We next consider how the variance and bias of the estimate in layer $n$ depend on the network structure. By definition, the variance of the ideal estimate is $\mathrm{Var}(\hat{s}) = \big( \sum_{i=1}^{L_1} w_i \big)^{-1}$. Therefore, as the size of the network increases, the final estimate in an ideal network is consistent: as the number of measurements increases, the final estimate converges in probability to the true value of $s$ (Kay, 1993). We next show that the final estimate in non-ideal networks is not necessarily consistent. We also show that the biases of certain first-layer agents can have a disproportionate impact on the bias of the final estimate.

Variance Maximizing Network Structure. Fig. 3.3 shows an example of a network structure for which the variance of the final estimate converges to a positive number as the number of agents in the first layer increases. We assume that all first-layer agents make measurements with unit variance. We will show that as the number of agents in both layers increases, the variance of the final estimate approaches $1/4$.

Fig. 3.3: Example of a network with an inconsistent final estimate. The green and blue nodes represent agents in the first and second layer, respectively. Each second-layer agent receives input from the common, central agent and from a distinct first-layer agent.

If there are $n+1$ agents in the first layer and $n$ agents in the second layer, we have
$$C^{(1)} = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{pmatrix}, \quad W^{(2)} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 & \cdots & 0 \\ \tfrac{1}{2} & 0 & \tfrac{1}{2} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ \tfrac{1}{2} & 0 & 0 & \cdots & \tfrac{1}{2} \end{pmatrix}, \quad \Omega^{(2)} = \frac{1}{4}\begin{pmatrix} 2 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 2 \end{pmatrix}.$$
The inverse of $\Omega^{(2)}$ can be computed explicitly, but note that $\Omega^{(2)}$ is a circulant matrix, and hence so is its inverse. This means that every row of the inverse sums to the same number, and so $W^{(3)} = \big( \tfrac{1}{n}, \ldots, \tfrac{1}{n} \big)$. This gives the estimate
$$\hat{s} = W^{(3)} W^{(2)} \hat{\mathbf{s}}^{(1)} = \big( \tfrac{1}{2}, \tfrac{1}{2n}, \ldots, \tfrac{1}{2n} \big) \cdot \hat{\mathbf{s}}^{(1)}.$$
Therefore, the estimate of the central agent (which communicates with all agents in the second layer) receives a much higher weight than all other estimates from the first layer. The variance of this estimate is equal to the sum of the squares of the weights,
$$\mathrm{Var}(\hat{s}) = \frac{1}{4} + \frac{1}{4n}.$$
Hence, the final estimate is not consistent, as its variance remains positive as the number of first-layer agents diverges.

Given a restriction on the number of second-layer agents, we show that this network leads to the highest possible variance of the final estimate:

Proposition 3. The final estimate in the network in Fig. 3.3 has the largest variance among all three-layer networks with a fixed number $n \ge 4$ of first-layer and $m \ge n-1$ second-layer agents, assuming that every first-layer agent makes at least one connection.

The idea of the proof is to limit the possible out-degrees of the agents in the first layer and show that the structure in Fig. 3.3 has the highest variance under this restriction. The proof is provided in Appendix C.
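A quick numerical check of this example is below (our code; it assumes unit-variance measurements, so that $\Omega^{(2)} = (I + J)/4$ with $J$ the all-ones matrix). The variance of the final estimate matches $1/4 + 1/(4n)$ and approaches $1/4$.

import numpy as np

def star_network_variance(n):
    """Variance of the final estimate for the network of Fig. 3.3 with n + 1
    first-layer agents (one central) and n second-layer agents."""
    W2 = np.zeros((n, n + 1))
    W2[:, 0] = 0.5                                # every second-layer agent weighs the central agent
    W2[np.arange(n), np.arange(1, n + 1)] = 0.5   # ... and one private first-layer agent
    omega = W2 @ W2.T                             # covariance of the second-layer estimates
    ones = np.ones(n)
    u = np.linalg.solve(omega, ones)
    return 1.0 / (ones @ u)                       # Eq. (2.6)

for n in (4, 16, 64, 256):
    print(n, star_network_variance(n), 0.25 + 0.25 / n)   # the last two columns agree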
In general, we conjecture that for the final estimate to have large variance, some agents upstream must have a disproportionately large out-degree, with the remaining agents making few connections. On the other hand, as the in-degree of a second-layer agent increases, the variance of its estimate shrinks. Thus when a few agents communicate information to many, the resulting redundancy is difficult to resolve downstream. But when downstream agents receive many estimates, we expect the estimates to be good.

We next show that the biases of the agents with the highest out-degrees can have an outsized influence on the estimates downstream.

Propagation of Biases. We next ask whether the bias of the final estimate can remain finite in the limit of infinitely many measurements. We assume constant, additive biases, $\hat{s}^{(1)}_i = x_i + b_i$, with the constant bias, $b_i$, unknown to agents downstream. Since all estimates in the network are convex linear combinations of first-layer measurements, the final estimate has the form
$$\hat{s} = \sum_i \alpha_i (x_i + b_i) = \sum_i \alpha_i x_i + \sum_i \alpha_i b_i, \qquad (3.2)$$
and thus has a finite bias bounded by the maximum of the individual biases.

We have provided examples of network structures in which the estimate of a first-layer agent was given higher weight than others, even when all first-layer measurements had equal variance. Eq. (3.2) shows that this agent's bias will also be disproportionately represented in the bias of the final estimate. Indeed, in the example in Fig. 1.1(a), the estimate of the second agent in the first layer has weight $\tfrac{1}{2}$, and its bias will have twice the weight of the other agents' biases in the final estimate. Similarly, the bias of the central agent in Fig. 3.3 will account for half the bias of the final estimate as $n \to \infty$. Thus, even if the biases, $b_i$, are distributed randomly with zero mean, the asymptotic bias of the final estimate does not always disappear as the number of measurements increases. More generally, in networks that contain W-motifs, the biases of some first-layer agents can have a disproportionate impact on the final estimate. As with the variance, we conjecture that the biases of agents that communicate their estimates to many agents downstream will be disproportionately represented in the final estimate. Equivalently, if the network contains agents that receive many estimates, we expect the bias of the final estimate to be reduced.
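To illustrate this point numerically, the sketch below (ours, not from the paper) draws independent zero-mean biases for the first-layer agents of the Fig. 3.3 network and computes the bias of the final estimate, $\sum_i \alpha_i b_i$, using the weights $\alpha = (1/2, 1/(2n), \ldots, 1/(2n))$ derived above. Its standard deviation stays close to half the bias scale instead of shrinking as $n$ grows.

import numpy as np

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    alpha = np.full(n + 1, 1.0 / (2 * n))       # final weights for the Fig. 3.3 network
    alpha[0] = 0.5                              # central agent
    # Draw many independent bias vectors with zero mean and unit variance and
    # look at the spread of the induced bias of the final estimate.
    b = rng.standard_normal((5000, n + 1))
    final_bias = b @ alpha
    print(n, final_bias.std())                  # stays near 0.5; it does not vanish with n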
3.4. Inference in random feedforward networks

We have shown that networks with specific structures can lead to inconsistent and asymptotically biased final estimates. We now consider networks with randomly and independently chosen connections between layers. Such networks are likely to contain many W-motifs, but it is unclear whether these motifs are resolved and whether the final estimate is ideal. We will use results of random matrix theory to show that there is a sharp transition in the probability that a network is ideal when the number of agents in one layer exceeds that of the previous layer (Bollobás, 2001). We assume that connections between agents in different layers are random, independent, and made with fixed probability, $p$. We will use the following result of Komlós (1968), also discussed by Bollobás (2001):

Theorem 2 (Komlós). Let $\xi_{ij}$, $i, j = 1, \ldots, n$ be i.i.d. with non-degenerate distribution function $F(x)$. Then the probability that the matrix $X = (\xi_{ij})$ is singular converges to 0 with the size of the matrix,
$$\lim_{n \to \infty} P(\det X = 0) = 0.$$

Fig. 3.4: The probability that a random, three-layer network is ideal for connection probabilities $p = 0.1$ (left), $0.5$ (center), and $0.9$ (right). In each panel, the different curves correspond to different, but fixed, numbers of agents in the first layer (10, 25, 50, and 100); the probability that the network is ideal is plotted against the ratio of layer-2 to layer-1 agents. There is a sharp transition in the probability that a network is ideal when the number of agents in the second layer exceeds the number in the first.

Corollary 3. For a three-layer network with independent, random, equally probable ($p = 1/2$) connections from the first to the second layer, as the numbers of agents $L_1$ and $L_2$ increase,
$$\frac{L_1}{L_2} \le 1 \implies P(\hat{s} = \hat{s}_{\mathrm{ideal}}) \to 1, \qquad \text{and} \qquad \frac{L_1}{L_2} > 1 \implies P(\hat{s} = \hat{s}_{\mathrm{ideal}}) \to 0.$$

The proof is given in Appendix D. The same proof works when $L_1/L_2 \le 1$ and the probability of a connection is arbitrary, $p \in (0, 1]$. We conjecture that the result also holds for $L_1/L_2 > 1$ and arbitrary $p$, but the present proof relies on the assumption that $p = 1/2$. Fig. 3.4 shows the results of simulations which support this conjecture: the different panels correspond to different connection probabilities, and the curves to different numbers of agents in the first layer. As the number of agents in the second layer exceeds that in the first, the probability that the network is ideal approaches 1 as the number of first-layer agents increases. With 100 agents in the first layer, the curve is approximately a step function for all connection probabilities we tested.

More than 3 Layers. We conjecture that a similar result holds for networks with more than three layers:

Conjecture. For a network with $n$ layers with independent, random, equally probable connections between consecutive layers, as the total number of agents increases,
$$L_k \le L_{k+1} \text{ for } 1 \le k < n-1 \implies P(\hat{s} = \hat{s}_{\mathrm{ideal}}) \to 1, \qquad \text{and} \qquad L_1 > L_k \text{ for some } 1 < k < n \implies P(\hat{s} = \hat{s}_{\mathrm{ideal}}) \to 0.$$

Fig. 3.5 shows the results for four-layer networks with different connection probabilities across layers. The numbers of agents in the first and second layers are equal, and we varied the number of agents in the third layer. The results support our conjecture.

With multiple layers, if $L_1 > L_2$ then, in the limit, the network will not be ideal, since by Corollary 3 the estimate of $s$ will already fail to be ideal in the second layer. If the number of agents does not decrease across layers, we conjecture that the probability that information is lost across layers is small when the number of agents is large. Indeed, it seems reasonable that the products of the random weight matrices will be full rank with increasing probability, allowing us to apply Proposition 2. However, the entries in these matrices are no longer independent, so classical results of random matrix theory no longer apply.

Fig. 3.5: The probability that a random, four-layer network is ideal for connection probabilities $p = 0.1$ (left), $0.5$ (center), and $0.9$ (right). Each curve corresponds to equal, fixed numbers of agents in the first two layers (10, 25, 50, and 100), with the probability that the network is ideal plotted against the ratio of layer-3 to layer-2 agents.
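The transition in Fig. 3.4 is straightforward to reproduce by simulation. The sketch below (our code, not from the paper) draws random connection matrices and estimates the probability that the all-ones vector lies in the row space of $C^{(1)}$, which by Corollary 1 is the probability that the three-layer network is ideal. Exact rank computation over the rationals would avoid floating-point issues; the numerical rank is a reasonable proxy at these sizes.

import numpy as np

def prob_ideal(L1, L2, p=0.5, trials=500, seed=0):
    """Monte Carlo estimate of the probability that a random three-layer network
    with L1 first-layer and L2 second-layer agents is ideal (Corollary 1)."""
    rng = np.random.default_rng(seed)
    ones = np.ones((1, L1))
    hits = 0
    for _ in range(trials):
        C = (rng.random((L2, L1)) < p).astype(float)
        if np.linalg.matrix_rank(np.vstack([C, ones])) == np.linalg.matrix_rank(C):
            hits += 1
    return hits / trials

L1 = 50
for L2 in (25, 45, 50, 55, 75):
    print(L2 / L1, prob_ideal(L1, L2))   # jumps from near 0 to near 1 around L2 = L1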
4. Conclusion

We examined how information about the world propagates through layers of agents. We assumed that at each step, a group of agents makes an inference about the state of the world from information provided by their predecessors. The setup is related to, but different from, information cascades, where a chain of rational agents makes decisions in turn (Banerjee, 1992; Easley and Kleinberg, 2010; Welch, 1992; Bharat and Mihaila, 2001), and from recurrent networks, where agents exchange information iteratively (Mossel et al., 2014).

We translated the question of whether the estimate of the state of the world degrades across layers of the network into a simple algebraic condition. This allowed us to use results of random matrix theory in the case of random networks, to find equivalent networks through an intuitive reduction process, and to identify a class of networks in which estimates do not degrade across layers, as well as another class in which degradation is maximal.

Networks in which estimates degrade across layers must contain a W-motif. This motif introduces redundancies in the information that is communicated downstream which may not be removable. Such redundancies, also known as "bad correlations," are known to limit the information that can be decoded from neural responses (Moreno-Bote et al., 2014; Bhardwaj et al., 2015). This suggests that agents with large out-degrees and small in-degrees can hinder the propagation of information, as they introduce redundant information into the network. On the other hand, agents with large in-degrees integrate information from many sources, which can help improve the final estimate. However, the detailed structure of a network is important: for example, an agent with large in-degree in the second layer can have a large out-degree without hindering the propagation of information, as it has already integrated most of the available first-layer measurements.

To make the problem tractable, we have made a number of simplifying assumptions. We made the strong assumption that agents have full knowledge of the network structure. Some agents may have to make several calculations in order to make an estimate, so we also do not assume bounded rationality (Bala and Goyal, 1998). This is unlikely to hold in realistic situations. Even when making simple decisions, pairs of agents are not always rational (Bahrami et al., 2010): when two agents each make a measurement with different variance, exchanging information can degrade the better estimate.

The assumption that only agents in the first layer make a measurement is not crucial. We can obtain similar results if all agents in the network make independent measurements, and the information is propagated directionally, as we assume here. However, in such cases, the confidence (inverse variance of the estimates) typically becomes unbounded across layers.

5. Acknowledgments

Funding: This research was supported by NSF-DMS-1517629 (SS and KJ), NSF/NIGMS-R01GM104974 (KJ), NSF-DMR-1507371 (KB), and NSF-IOS-1546858 (KB).

Appendix A. Proof of Theorem 1

We start with the simpler case of a W-motif between the first two layers and then extend it to the general case. We begin with definitions that will be used in the proof. Let $g$ be the input-map which maps an agent to the subset of agents in the first layer that it receives information from (through some path).
That is, $g(a^{(j)}_i)$ is the set of agents in the first layer that provide input to $a^{(j)}_i$. It is intuitive, and we show it formally in Lemma 1, that a network contains a W-motif if the inputs to two agents, $A$ and $B$, are not contained one in the other, while their intersection is not empty. That is,
$$g(A) \not\subseteq g(B) \quad \text{and} \quad g(B) \not\subseteq g(A), \quad \text{but} \quad g(A) \cap g(B) \neq \emptyset.$$
If these conditions are met, we also say that the inputs of $A$ and $B$ have a nontrivial intersection. If $g(A) \subseteq g(B)$, we say that the input of $B$ overlaps the input of $A$: every agent which contributes to the estimate of $A$ also contributes to the estimate of $B$. Similarly, we let $f$ be the output-map which maps an agent, $a^{(j)}_i$, to the set of all agents in the next, $(j+1)$st, layer that receive input from $a^{(j)}_i$. We first prove a few lemmas essential to the proof of Theorem 1.

Lemma 1. Assume a network does not contain a W-motif and there are two agents, $a^{(k)}_{i_1}$ and $a^{(k)}_{i_2}$, with $g(a^{(k)}_{i_1}) \cap g(a^{(k)}_{i_2})$ nonempty. Then $g(a^{(k)}_{i_1})$ overlaps or is overlapped by $g(a^{(k)}_{i_2})$.

Proof. We prove the claim by contradiction. If neither input overlaps the other, then there are two distinct first-layer agents $a^{(1)}_{n_1}$ and $a^{(1)}_{n_2}$ such that $a^{(1)}_{n_1} \in g(a^{(k)}_{i_1}) \setminus g(a^{(k)}_{i_2})$ and $a^{(1)}_{n_2} \in g(a^{(k)}_{i_2}) \setminus g(a^{(k)}_{i_1})$. This means $P^{k1}_{i_1 n_1} = P^{k1}_{i_2 n_2} = 1$ and $P^{k1}_{i_1 n_2} = P^{k1}_{i_2 n_1} = 0$. Since the inputs of the agents have nonempty intersection, we also have $P^{k1}_{i_1 m} = P^{k1}_{i_2 m} = 1$ for some $m$. Thus there is a $2 \times 3$ submatrix of $P^{k1}$ which, up to rearrangement of the columns, is equal to
$$\begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix},$$
and the network contains a W-motif, contrary to assumption.

Every agent's estimate is a convex linear combination of estimates in the first layer, given by Eq. (2.3). We will use the corresponding weight vectors in the following proofs. We show that in networks without W-motifs, agents only receive collections of estimates whose weight vectors pairwise either have disjoint support (sets of nonzero indices) or the support of one is contained in the support of the other. Thus, with no W-motifs, no two agents have inputs with nontrivial intersection. The next two lemmas allow us to easily calculate the estimates of such agents.

Lemma 2. Let $r, s, t$ be positive integers, $w_i = \sigma_i^{-2}$, and consider three weight vectors applied by three agents in layer $k$, $a^{(k)}_1$, $a^{(k)}_2$, and $a^{(k)}_3$, to the estimates of the first layer:
$$v_1 = \left( \frac{w_1}{\sum_{i=1}^{r} w_i}, \ldots, \frac{w_r}{\sum_{i=1}^{r} w_i}, 0, \ldots, 0 \right),$$
$$v_2 = \left( \frac{w_1}{\sum_{i=1}^{r+s} w_i}, \ldots, \frac{w_{r+s}}{\sum_{i=1}^{r+s} w_i}, 0, \ldots, 0 \right),$$
$$v_3 = \left( 0, \ldots, 0, \frac{w_{r+s+1}}{\sum_{i=r+s+1}^{r+s+t} w_i}, \ldots, \frac{w_{r+s+t}}{\sum_{i=r+s+1}^{r+s+t} w_i}, 0, \ldots, 0 \right).$$
An agent $a^{(k+1)}_i$ in $f(a^{(k)}_1) \cap f(a^{(k)}_2)$, but not in $f(a^{(k)}_3)$, will use weight vector $v_2$. An agent $a^{(k+1)}_i$ in $f(a^{(k)}_2) \cap f(a^{(k)}_3)$, but not in $f(a^{(k)}_1)$, will use weight vector
$$v_4 = \left( \frac{w_1}{\sum_{i=1}^{r+s+t} w_i}, \ldots, \frac{w_{r+s+t}}{\sum_{i=1}^{r+s+t} w_i}, 0, \ldots, 0 \right).$$

Proof. First, consider an agent receiving the first two estimates, with weight vectors $v_1$ and $v_2$. Suppose that a fictitious agent receives a collection of estimates with weight vectors $\{z_1, \ldots, z_{r+s}\}$, where $z_i = (0, \ldots, 0, 1, 0, \ldots, 0)$,
i.e., each estimate equals the measurement of agent $a^{(1)}_i$. This fictitious agent can obtain any linear combination of the first $r+s$ measurements. The linear combination with lowest variance has weights given by $v_2$. Therefore, an agent receiving measurements corresponding to the weight vectors $v_1$ and $v_2$ cannot do better than the estimate of agent $a^{(k)}_2$, with weights given by $v_2$. A similar argument works when estimates are received from agents $a^{(k)}_2$ and $a^{(k)}_3$. Since these two agents make locally optimal estimates based on non-overlapping sets of measurements in the first layer, the best estimate is obtained by combining the two sets of measurements. This is precisely the estimate corresponding to the weights given by vector $v_4$.

Lemma 3. Suppose an agent, $a^{(k)}_i$, receives a collection of estimates such that for any pair, there is a relabeling of agents in the first layer that makes the pair look like $v_1$ and $v_2$, or like $v_2$ and $v_3$, in Lemma 2. Then, up to some relabeling of the agents in the first layer, that agent will make an estimate with corresponding weight vector
$$v = \left( \frac{w_1}{\sum_{i=1}^{r} w_i}, \ldots, \frac{w_r}{\sum_{i=1}^{r} w_i}, 0, \ldots, 0 \right).$$

Proof. Let the vectors $z_i$ be defined as in the proof of Lemma 2. Relabel the first-layer agents so that only the first $r$ entries of the weight vector applied by agent $a^{(k)}_i$ are nonzero. Then a fictitious agent receiving estimates with weight vectors $z_i$, $1 \le i \le r$, can construct any estimate that agent $a^{(k)}_i$ can obtain. The optimal estimate of this fictitious agent has weight vector $v$. Hence, if some linear combination of the weight vectors of the estimates communicated to agent $a^{(k)}_i$ equals $v$, this linear combination defines the best estimate. For each $j = 1, \ldots, r$, we can find a weight vector, $v_j$, which is nonzero in the $j$th entry and whose support contains the support of every other weight vector that is nonzero in the $j$th entry. Such a vector exists by the assumption that any two vectors have disjoint supports or the support of one contains the other. Therefore, we can find the weight vector with maximal support for each entry. If we take the distinct elements of $\{v_j : 1 \le j \le r\}$, then these maximal weight vectors have disjoint supports that partition the first $r$ indices. Therefore,
$$v = \frac{1}{\sum_{i=1}^{r} w_i} \sum_{\text{distinct } v_j} \Bigg( \sum_{\substack{1 \le i \le r \\ (v_j)_i \neq 0}} w_i \Bigg) v_j,$$
which shows the lemma.

We now state and prove the three-layer case of Theorem 1, and then use it to finish the proof of Theorem 1.

Proposition 4. If a three-layer network is not ideal and every first-layer agent communicates with at least one second-layer agent, then the network must contain a W-motif.

Proof. Assume the network does not contain a W-motif. Given a first-layer agent $a^{(1)}_i$, Lemma 1 says that for any two agents in $f(a^{(1)}_i)$, one agent's input must overlap the other's. Two second-layer agents thus receive estimates with input sets where one overlaps the other, or the sets do not intersect. Thus the set of weight vectors in the second layer satisfies the assumptions of Lemma 3. As all agents from the first layer communicate with the final agent, the network is ideal.

To obtain the proof of Theorem 1, we use induction, with Proposition 4 as the base case.

Proof of Theorem 1.
Assume the network has $n$ layers, there are no W-motifs, and every agent (except those in the first layer) receives input from at least one other agent. Lemma 1 implies that in the second layer each pair of agents has either disjoint inputs or one overlaps the other. Thus in the third layer, by relabeling the agents, each agent makes an estimate with weight vector of the form $\frac{1}{\sum_{i=1}^{r} w_i} (w_1, \ldots, w_r, 0, \ldots, 0)$. Now assume that any estimate in layer $k$ can be put in this form by relabeling the agents. Since there are no W-motifs, Lemma 1 implies that the sets of measurements used by agents $a^{(k)}_{i_1}$ and $a^{(k)}_{i_2}$ are either disjoint or one overlaps the other. This again allows us to apply Lemma 3, and any agent in layer $k+1$ makes an estimate whose weight vector again has the form $\frac{1}{\sum_{i=1}^{r} w_i} (w_1, \ldots, w_r, 0, \ldots, 0)$. Applying the same argument to the final agent, where every entry is nonzero in some penultimate-layer agent's weight vector, we conclude that the network is ideal.

Appendix B. Proof of Corollary 1

We will show that a three-layer network is ideal if and only if $m\vec{1}$ is in the row space of $C^{(1)}$ over $\mathbb{Z}$ for some $m \in \mathbb{N}$. We do this by first showing that the network is ideal if and only if $\vec{1}$ is in the row space of $C^{(1)}$ over $\mathbb{R}$, and then showing that this is equivalent to $m\vec{1}$ being in the row space of $C^{(1)}$ over $\mathbb{Z}$.
, 1) , up to r elab eling. 17 Pr o of of Claim. Given a netw ork structure consider the na ¨ ıv e estimate: 1 Z X i | g ( a (2) i ) | ˆ s (2) i = 1 P ij C (1) ij X i C (1) i · ˆ s (1) , (C.1) where Z is a normalizing factor that mak es the en tries of the corresp onding vector of w eights sum to 1. This estimate can alw a ys b e made and is the same as using a linear com bination of estimates of agen ts a (1) j with weigh ts d i P n j =1 d j . Thus the v ariance of the optimal estimate of the agen t in the final lay er is b ounded ab o ve b y the v ariance of the na ¨ ıve estimate in Eq. (C.1). By assumption 1 ≤ d j ≤ m for all j . F or the netw ork in Fig. 3.3, this na ¨ ıve estimate equals the final estimate. Th us it is sufficient to show that the na ¨ ıve estimate has maximal v ariance when d = ( m, 1 , . . . , 1), up to relab eling. The v ariance, V , of the na ¨ ıve estimate is: V ( d 1 , . . . , d n ) = X j  d j P n k =1 d k  2 . If w e treat the degrees as contin uous v ariables then V is contin uous on d ∈ [1 , m ] n and w e can calculate the gradien t of V to find the critical p oin ts. ∂ V ∂ d i = 2  d i P k d k  P k d k − d i ( P k d k ) 2 + X j 6 = i 2  d j P k d k  − d j ( P k d k ) 2 Setting ∂ V ∂ d i = 0 and m ultiplying b oth sides b y 1 2 ( P n k =1 d k ) 3 giv es 0 = d i ( X k 6 = i d k ) − X j 6 = i d 2 j = X j 6 = i d j ( d i − d j ) . This shows that d = k ~ 1 for k = 1 , . . . , m are the only critical p oints, since if there exist d i ≤ d j , for all j 6 = i and d i < d k for some k 6 = i then the right hand side w ould b e negativ e. These critical p oin ts are the first-lay er out-degrees of ideal netw orks by Corollary 2, hence they are minima. This implies that V tak es on its maximum v alues on the b oundary . The b oundary of [1 , m ] n consists of p oin ts where at least one co ordinate is 1 or m . Since V is inv ariant under p erm utation of the v ariables, we set d 1 equal to one of these v alues and in vestigate the b eha vior of V on this restricted set. First set d 1 = m . Setting ∂ V ∂ d i to 0 on this b oundary gives: 0 = m ( d i − m ) + X j 6 = i, 1 d j ( d i − d j ) One critical p oin t is thus m ~ 1. If d i ≤ d j for j 6 = i and d i < m then again the righ t hand side w ould b e negative. Hence d i = m for all i , and there are no critical p oin ts on the interior of { m } × [1 , d ] n − 1 . Next if d 1 = 1, setting ∂ V ∂ d i to 0 on this b oundary and multiplying by − 1 gives: 0 = 1 − d i + X j 6 = i, 1 d j ( d j − d i ) 18 Here a critical p oint is ~ 1. If d i ≤ d j for j 6 = i and 1 < d i < m then again the righ t hand side w ould b e negative. Hence d i = 1 for all i , and there are no critical p oints on the in terior of { 1 } × [1 , d ] n − 1 . If w e iterate this pro cedure, we see that the maximum v alue of V m ust o ccur on the corners of the hypercub e [1 , d ] n . Cho ose one of these corners, c , and, without loss of generalit y , assume that the first l co ordinates are m and the last n − l co ordinates are 1, 1 ≤ l < n . Then V ( c ) = l X j =1  m P n k =1 d k  2 + n X j = l +1  1 P n k =1 d k  2 =  1 l m + ( n − l )  2  l m 2 + ( n − l )  = l m 2 + n − l l 2 m 2 + 2 lm ( n − l ) + ( n − l ) 2 = l ( m 2 − 1) + n l 2 ( m − 1) 2 + l 2 n ( m − 1) + n 2 Under the assumption that m ≥ n − 1, a lengthy algebra calculation that we omit sho ws that this is maximized for l = 1. Hence the maxim um v alue of V is ac hieved at ( m, 1 , . . . , 1), or an y of its co ordinate p erm utations. Finally , to ha ve d = ( m, 1 , . . . 
, 1), one first-lay er agen t, a (1) 1 , comm unicates with all second- la yer agents and ev ery other agen t has exactly one output. Since there are at least n − 1 agen ts in the second lay er, this means that each first-la yer agen t must comm unicate with a distinct second-la yer agen t and each second-la y er agen t m ust receiv e input from a (1) 1 . Otherwise, some agen t in the second lay er would receiv e only the input from a (1) i and thus the final estimate could use that estimate to decorrelate all of the second-la y er estimates. So, the na ¨ ıve estimate for an alternative net work has smaller v ariance than the ideal estimate for the ring net work in Fig. 3.3. Hence the final estimate in an y alternativ e net work will hav e smaller v ariance. Since the only net work with d = ( m, 1 , . . . , 1) is the netw ork in Fig. 3.3, w e ha ve shown that this structure maximizes the v ariance of the final estimate among all net works with L 2 ≥ L 1 − 1. App endix D. Pro of of Corollary 3 Whether or not ˆ s ideal = ˆ s is determined by C (1) . F or simplicity , we drop the sup erscript and refer to this connectivity matrix as C . By our assumption, this is a random matrix with P ( C ij = 0) = P ( C ij = 1) = 1 / 2. First assume that there are at least as man y second-lay er agen ts as there are first-la yer agen ts: L 2 ≥ L 1 or L 1 L 2 ≤ 1. Then C is a random L 2 × L 1 matrix with i.i.d. non-degenerate en tries that has more ro ws than columns. By Theorem 2, this means that the L 1 × L 1 submatrix formed b y the first L 1 ro ws and columns is nonsingular with probabilit y approaching 1 as L 1 , L 2 → ∞ . Thus the probabilit y that the ro w space of C con tains the v ector ~ 1 con verges to 1 with the size of the net work. 19 Next assume that there are few er second-lay er agents than first-la yer agents, that is L 2 < L 1 or L 1 L 2 > 1. W e will show that the probability that the row space of C contains ~ 1 go es to zero as L 1 , L 2 → ∞ . Since increasing the num b er of ro ws will not decrease the probabilit y that C contains a vector in its row space we assume that L 2 = L 1 − 1 and let L 1 = n : lim L 1 ,L 2 →∞ P ( ˆ s = ˆ s ideal ) ≤ lim n →∞ P ( ~ 1 ∈ R ( C ( n − 1 , n ))) where C ( n − 1 , n ) refers to the random matrix as b efore, and iden tifies that it has n − 1 rows and n columns. W e first use: P ( ~ 1 ∈ R ( C ( n − 1 , n ))) ≤ P (  ~ 1 C  is singular) since if ~ 1 is the ro w space of C , then attaching that row of ones to it would create a singular matrix. Lemma 1. P  det(  ~ 1 C  ) = 0  → 0 as n → ∞ . W e can rewrite C =  B v  , where v is the n th column of C and B is the remaining submatrix. W e claim (  ~ 1 C  ) = − 1 k det(  ~ 1 1 ˜ B ~ 0  ) = − 1 k + n +1 ∗ det( ˜ B ) (D.1) where ˜ B is a random ( n − 1) × ( n − 1) matrix distributed like C . Assuming this claim, then b y Koml´ os (1968) : P (det(  ~ 1 C = 0  )) = P  det( ˜ B ) = 0  → 0 as n → ∞ . Th us P ( ~ 1 ∈ R ( M ( n − 1 , n ))) → 0 as n → ∞ . T o prov e the first equality in Eq. (D.1), w e use ro w op erations on  ~ 1 1 B v  : If v i = 1 then subtract the first row from the i th ro w, ( B i v i ), to get a v ector whose entries are all 0 and − 1. Then ( B i v i ) → − ( ˜ B i 0) where ( ˜ B i 0) is a v ector of entries which are again either 0 or 1 with equal probability . W e do this for every ro w which has a 1 in its last entry and m ultiply the determinant a factor − 1 and denote the num b er of these reductions as k . Since P ( C ij = 0) = 1 2 w e also hav e P ( ˜ B ij = 0) = 1 2 . 
References

Bahrami, B., Olsen, K., Latham, P. E., Roepstorff, A., Rees, G., Frith, C. D., 2010. Optimally interacting minds. Science 329 (5995), 1081-1085.

Bala, V., Goyal, S., 1998. Learning from neighbours. Rev. Econ. Stud. 65 (3), 595-621.

Banerjee, A. V., 1992. A simple model of herd behavior. Q. J. Econ., 797-817.

Beck, J. M., Ma, W. J., Pitkow, X., Latham, P. E., Pouget, A., 2012. Not noisy, just wrong: the role of suboptimal inference in behavioral variability. Neuron 74 (1), 30-39.

Bharat, K., Mihaila, G. A., 2001. When experts agree: using non-affiliated experts to rank popular topics. In: Proceedings of the 10th International Conference on World Wide Web. ACM, pp. 597-602.

Bhardwaj, M., Carroll, S., Ma, W. J., Josić, K., 2015. Visual decisions in the presence of measurement and stimulus correlations. Neural Comput. 27 (11), 2318-2353.

Bikhchandani, S., Hirshleifer, D., Welch, I., 1992. A theory of fads, fashion, custom, and cultural change as informational cascades. J. Polit. Econ., 992-1026.

Bikhchandani, S., Hirshleifer, D., Welch, I., 1998. Learning from the behavior of others: Conformity, fads, and informational cascades. J. Econ. Perspect. 12 (3), 151-170.

Bollobás, B., 2001. Random Graphs. No. 73 in Cambridge Studies in Advanced Mathematics. Cambridge University Press.

Brunton, B. W., Botvinick, M. M., Brody, C. D., 2013. Rats and humans can optimally accumulate evidence for decision-making. Science 340 (6128), 95-98.

Condorcet, M. de, 1976. Essay on the application of mathematics to the theory of decision-making. Reprinted in Condorcet: Selected Writings, Keith Michael Baker, ed., 33.

DeGroot, M. H., 1974. Reaching a consensus. JASA 69 (345), 118-121.

Easley, D., Kleinberg, J., 2010. Networks, Crowds, and Markets. Vol. 1. Cambridge University Press.

Gale, D., Kariv, S., 2003. Bayesian learning in social networks. GEB 45 (2), 329-346.

Kay, S. M., 1993. Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Prentice Hall.

Komlós, J., 1968. On the determinant of random matrices. Stud. Sci. Math. Hung. 3 (4), 387-399.

Moreno-Bote, R., Beck, J., Kanitscheider, I., Pitkow, X., Latham, P., Pouget, A., 2014. Information-limiting correlations. Nat. Neurosci. 17 (10), 1410-1417.

Mossel, E., Sly, A., Tamuz, O., 2014. Asymptotic learning on Bayesian social networks. Probab. Theory Related Fields 158 (1-2), 127-157.

Mossel, E., Tamuz, O., 2010. Efficient Bayesian learning in social networks with Gaussian estimators. arXiv preprint.

Mossel, E., Tamuz, O., 2014. Opinion exchange dynamics. arXiv preprint.

Welch, I., 1992. Sequential sales, learning, and cascades. J. Finance 47 (2), 695-732.
