Bayesian Learning without Recall


Authors: M. Amin Rahimian, Ali Jadbabaie

Abstract—We analyze a model of learning and belief formation in networks in which agents follow Bayes rule yet do not recall their history of past observations and cannot reason about how other agents' beliefs are formed. They do so by making rational inferences about their observations, which include a sequence of independent and identically distributed private signals as well as the actions of their neighboring agents at each time. Successive applications of Bayes rule to the entire history of past observations lead to forebodingly complex inferences, due to lack of knowledge about the global network structure, unavailability of private observations, and third-party interactions preceding every decision. Such difficulties make Bayesian updating of beliefs an implausible mechanism for social learning. To address these complexities, we consider a Bayesian without Recall model of inference. On the one hand, this model provides a tractable framework for analyzing the behavior of rational agents in social networks. On the other hand, it provides a behavioral foundation for the variety of non-Bayesian update rules in the literature. We present the implications of various choices for the structure of the action space and utility functions for such agents, and investigate the properties of learning, convergence, and consensus in special cases.

Index Terms—Learning Models and Methods, Adaptation and Learning over Graphs, Sequential Learning, Sequential Decision Methods, Social Learning, Bayesian Learning, Non-Bayesian Learning, Rational Learning, Observational Learning, Statistical Learning, Distributed Learning, Distributed Hypothesis Testing, Distributed Detection.

I. INTRODUCTION & BACKGROUND

Individuals often exchange opinions with their peers in order to learn from their knowledge and experiences, and in making various decisions such as investing in stock markets, voting in elections, choosing their political affiliations, or selecting a brand of a product or a medical treatment. These interactions occur through a variety of media which we collectively refer to as social networks. James Surowiecki, in his popular science book on the wisdom of crowds [1], provides well-known cases of information aggregation in social networks and argues how, under the right circumstances (diversity of opinion, independence, decentralization and aggregation), groups outperform even their smartest or best-informed members; see for example the essentially perfect performance of the middlemost estimate at the weight-judging competition of the 1906 West of England Fat Stock and Poultry Exhibition studied by Francis Galton in his 1907 Nature article [2], entitled "Vox Populi" (The Wisdom of Crowds), or the study of market reaction to the 1986 Challenger disaster in [3], where it is pointed out that the stock of the company chiefly responsible (Morton Thiokol) was hit hardest of all, even months before the cause of the accident could be officially determined.

⋆ Correspondence to: Ali Jadbabaie, Institute for Data, Systems, and Society (IDSS), Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA (email: jadbabai@mit.edu). This work was supported by ARO MURI W911NF-12-1-0509.

On the other hand, several studies point out that the evolution of people's opinions and decisions in social networks is subject to various kinds of biases and inefficiencies [4]–[8].
Such deviations from the efficient and/or rational outcome are often attributed to the structural effects that arise in networked interactions; in particular, the predominant influence of more central agents in shaping the group decision, in spite of the fact that such influential agents do not necessarily enjoy high-quality observations or superior knowledge; cf. persuasion bias in [9], obstructions to wisdom of crowds in [10], and data incest in [11]. A better understanding of social learning can therefore help us analyze the effect of such biases, use our insights to improve policy designs that aim at implementing desirable social norms or eradicating undesirable ones, devise more efficient procedures for aggregating individual beliefs, and understand how media sources, prominent agents, governments and politicians are able to manipulate public opinion and influence the spread of beliefs in society [12].

We model the set of possible alternatives that are of common interest to all individuals in society by a set of finitely many states of the world, and endow each individual agent with a belief representing her opinion or understanding of the true state of the world. Agents thereby exchange beliefs in social networks to benefit from each other's opinions and private information in trying to learn an unknown state of the world. The problem of social learning is to characterize and understand such interactions, and it is a classical focus of research in behavioral microeconomic theory [13, Chapter 8], [14, Chapter 5], [15]. Research on the formation and evolution of beliefs in social networks, and the subsequent shaping of individual and mass behavior, has attracted much attention amongst diverse communities in engineering [16]–[18], statistics [19], economics [20], and sociology [21]. The problem of social learning has close siblings in distributed estimation [22], [23], data fusion [24], and statistical learning theory [25], while relations can also be traced to the consensus and coordination problems studied in distributed control theory [26], [27].

Consider an agent trying to estimate an unknown state of the world. She bases her estimation on a sequence of independent and identically distributed (i.i.d.) private signals that she observes and whose common distribution is determined by the unknown state. Suppose further that her belief about the unknown state is represented by a discrete probability distribution over the set of finitely many possibilities $\Theta$, and that she sequentially applies Bayes rule to her observations at each step and updates her beliefs accordingly. It is a well-known consequence of the classical results in merging and learning theory [28], [29] that beliefs formed in this manner constitute a bounded martingale and converge to a limiting distribution as the number of observations increases. However, the limiting distribution may differ from a point mass centered at the truth, in which case the agent fails to learn the true state asymptotically. This may be the case, for instance, if the agent faces an identification problem, that is, when there are states other than the true state that are observationally equivalent to it and induce the same distribution on her sequence of privately observed signals.
Accordingly, the agents have an incentive to communicate in a social network so that they can resolve their identification problems by relying on each other's observational abilities. Rational agents in a social network would apply Bayes rule successively to their observations at each step, which include not only their private signals but also the beliefs and actions communicated by their neighbors. However, such repeated applications of Bayes rule in networks become very complex, especially if the agents are unaware of the global network structure. This is due to the fact that the agents at each step must use their local data, which grows with time, and make very complex inferences about the possible signal structures leading to their observations. Indeed, tractable modeling and analysis of rational behavior in networks is an important problem in network economics and has attracted much attention [19], [30], [31].

Modeling memory constraints in the context of social learning is an important research problem [15, Chapter 5]. In recent results, Wilson [32] considers the model of a decision maker who chooses between two actions with payoffs that depend on the true state of the world. Furthermore, the decision maker must always summarize her information into one of finitely many states, leading to optimal decision rules that specify the transfers between these states. The problem of learning with finite memory in the context of hypothesis testing was originally formulated in [33], [34], where memory constraints restrict the storage capacity for the test statistics. Accordingly, while sufficient statistics are very useful computational tools, their utility for memory reduction is not clear. Subsequent results provide sophisticated algorithms that perform hypothesis testing using test statistics taking only finitely many values and guarantee an asymptotically vanishing error probability [35]–[38]. More recently, the authors in [39] have considered this problem in a setting where agents each receive an independent private signal and make decisions sequentially. Memory in this context refers to the number of immediate predecessors whose decisions are observable by any given agent at the time of making her decision. Accordingly, while almost-sure convergence of the sequence of individual decisions to the correct state is not possible in this finite-memory setting, the authors construct decision rules that achieve convergence and learning in probability. They then consider the behavior of rational (payoff-maximizing) agents in this context and show that learning cannot occur in any equilibrium of the associated Bayesian game.

To avoid the complexities of fully rational inference, a variety of non-Bayesian update rules have been proposed that rely on the seminal work of DeGroot on linear opinion pooling [40], where agents update their opinions to a convex combination of their neighbors' beliefs and the coefficients correspond to the level of confidence that each agent puts in each of her neighbors. More recently, [20], [41] consider a variation of this model for streaming observations, where in addition to the neighboring beliefs the agents also receive private signals.
Other forms of non-Bayesian rules were studied in [42], which considers a variation of observational learning in which agents observe the actions and payoffs of their neighbors and make rational inferences about this action/payoff correspondence, together with the choices made by their neighbors, but ignore the fact that their neighbors are themselves learning from their own observations. In more recent results, [43], [44] consider models of autarkic play where players at each generation observe their predecessors but naively think that any predecessor's action relies solely on that player's private information, thus ignoring the possibility that successive generations are learning from each other.

A. Motivation & Contributions

In this paper, we analyze a model of repeated interactions for social learning in which agents receive private signals and observe their neighbors' decisions (actions) at every epoch of time. Such a model is a good descriptor for online reputation and polling systems such as Yelp® and TripAdvisor®, where individuals' recommendations are based on their private observations and the recommendations of their friends [45, Chapter 5]. The analysis of such systems is important not only because they play a significant role in generating revenues for the businesses being ranked [46], but also for the purposes of designing fair rankings and accurate recommendation systems. Heuristics are widely used in the literature to model social interactions and decision making [47]–[49]. They provide tractable tools for analyzing boundedly rational behavior and offer useful insights about decision making under uncertainty [10], [50]. They are also verified to be good descriptors of the behavior of real-world agents in the experimental studies of Grimm and Mengel [51] and Chandrasekhar, Larreguy and Xandri [52]. Despite their widespread application, the theoretical and axiomatic foundations of social inference using non-Bayesian (heuristic) update rules have received limited attention, and only recently [53], [54]. A comprehensive theory of non-Bayesian learning that reconciles the rational and boundedly rational approaches with the widely used heuristics remains in demand. A chief contribution of this paper is to establish a behavioral foundation for the existing non-Bayesian updates in the literature. In particular, this paper addresses the question of how one can limit the information and cognitive requirements of Bayesian inference and still ensure consensus or learning for the agents. Some non-Bayesian update rules have the property that they resemble the replication of the first step of a Bayesian update from a common prior, cf. [9]; in this paper we formalize and expand upon this idea. In particular, we propose the so-called Bayesian without Recall (BWR) model as a belief formation and update rule for rational but memoryless agents. To model the behavior of such agents, we replicate the time-one update of a Bayesian agent for all future time steps. On the one hand, the BWR model of inference is motivated by the real-world behavior of people as reflected in their spur-of-the-moment decisions and impromptu behavior: basing decisions only on the immediately observed actions and without regard for the history of such actions. On the other hand, BWR offers a boundedly rational approach to modeling decision making over social networks.
The latter is in contrast with the Bayesian approach, which is not only unrealistic in the amount of cognitive burden it imposes on the agents, but is also often computationally intractable and complex to analyze.

B. Brief Overview & Organization of Paper

A key message of this paper is to show how the BWR scheme can provide justification for some well-known update rules. These rules can be equivalently explained as the update of a Bayesian agent who naively assumes that the belief of each of her neighbors was formed by a private observation, and not through repeated interactions with others. We begin by specializing the BWR model to a case where agents try to decide between one of two possible states and are rewarded for every correct choice they make. We show that the BWR action updates in this case are given by weighted majority and threshold rules that linearly combine the observed binary actions and the log-likelihoods of the private signals. We show that under these update rules the action profiles evolve as a Markov chain on the Boolean cube, and that the properties of consensus and learning are subsequently determined by the equilibria of this Markov chain. When there are only finitely many states of the world and agents choose actions over the probability simplex, the action spaces are rich enough to reveal the beliefs of every communicating agent. We show that the BWR updates in this second case are log-linear in the reported beliefs of the neighbors and the likelihood of the private signals. We investigate the properties of convergence and learning for such agents in a strongly connected social network, provided that the truth is identifiable through the aggregate observations of the agents across the entire network. This is of particular interest when the agents cannot distinguish the truth based solely on their private observations, and yet together they learn. Analysis of convergence and learning in this case reveals that almost-sure learning happens only if the agents are arranged in a directed circle. We explain how the circular BWR updates generalize to any strongly connected topology by choosing a single neighbor randomly at every time step. We characterize the rate of learning in such cases as being asymptotically exponentially fast, with an exponent that is linear in time and whose coefficient can be expressed as a weighted average of the relative entropies of the signal likelihoods of all agents.

The remainder of this paper is organized as follows. The details of our BWR model for social learning are explained in Section II. In Section III we specialize the BWR model to a binary state and action space and investigate the evolution of actions in the resultant Ising model. Next, in Section IV, we analyze the case where the network agents announce their beliefs at every epoch and the BWR updates become log-linear. Section V concludes the paper, followed by five mathematical appendices, labeled A to E, which provide the detailed derivations and proofs for the technical results presented throughout the paper.

II. THE BWR MODEL OF NAIVE INFERENCE

A dual process theory for the psychology of thinking identifies two systems for the operations of the mind [55]: one that is fast, intuitive, non-deliberative, habitual and automatic (system one); and a second that is slow, attentive, effortful, deliberative, and conscious (system two) [56].
Major advances in behavioral economics are due to the incorporation of this dual process theory and the subsequent models of bounded rationality [57]. Reliance on heuristics for decision making is a distinctive feature of system one, which avoids the computational burdens of a rational evaluation; system two, on the other hand, is bound to deliberate on the options based on the available information before making recommendations. The interplay between these two systems and how they shape individual decisions is of paramount importance [58]. According to the BWR model of naive inference for social learning, as the agent experiences her environment, her initial response engages her system two: she rationally evaluates the reports of her neighbors and uses them along with her own private signal to make a decision. However, after her initial experience, and by engaging in repeated interactions with the environment, her system one takes over her decision processes, implementing a heuristic that imitates her (rational/Bayesian) inferences from her initial experience, hence avoiding the burden of additional cognitive processing in the ensuing interactions with her environment. In the following subsections we set forth the mathematical and modeling details that allow us to implement the BWR model in a variety of environments with different utility, observation and information structures.

A. Signals and Environment

Consider a set of $n$ agents that are labeled by $[n]$ and interact according to a digraph $\mathcal{G} = ([n], \mathcal{E})$.¹ The neighborhood of agent $i$ is the set of all agents who communicate with agent $i$, denoted by $\mathcal{N}(i) = \{j \in [n] : (j,i) \in \mathcal{E}\}$; the degree of agent $i$ is the cardinality of its neighborhood, $\mathrm{card}(\mathcal{N}(i))$. The digraph $\mathcal{G}$ is strongly connected if there is a directed path from any agent to any other agent. Let $\Theta$ denote a finite set of possible states of the world and $\Delta\Theta$ the space of all probability measures on the set $\Theta$.

¹ Throughout the paper, $\mathbb{R}$ is the set of real numbers, $\mathbb{N}$ denotes the set of all natural numbers, and $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$. For a fixed integer $n \in \mathbb{N}$, the set of integers $\{1, 2, \ldots, n\}$ is denoted by $[n]$, while any other set is represented by a calligraphic capital letter. The cardinality of a set $\mathcal{X}$, which is the number of its elements, is denoted by $\mathrm{card}(\mathcal{X})$. The set difference between $\mathcal{X}$ and $\mathcal{Y}$, denoted by $\mathcal{X} \setminus \mathcal{Y}$, is the set of all elements of $\mathcal{X}$ that do not belong to $\mathcal{Y}$.

A parameter $\theta \hookleftarrow \Theta$ is chosen arbitrarily from $\Theta$ by nature.² Associated with each agent $i$ is a finite set $\mathcal{S}_i$, called the signal space of $i$, and given $\theta$, $\ell_i(\cdot\,|\,\theta)$ is a probability mass function on $\mathcal{S}_i$, which is referred to as the signal structure or likelihood function of agent $i$. Let $t \in \mathbb{N}_0$ denote the time index; for each agent $i$, $\{\mathbf{s}_{i,t},\, t \in \mathbb{N}_0\}$ is a sequence of independent and identically distributed (i.i.d.) random variables taking values in $\mathcal{S}_i$ with probability mass function $\ell_i(\cdot\,|\,\theta)$. This sequence represents the private signals that agent $i$ observes over time. Note that the private signals are independent over time and across the agents.

² For a set $\mathcal{X}$, $x \hookleftarrow \mathcal{X}$ denotes an arbitrary choice from the elements of $\mathcal{X}$ that is assigned to $x$. The power set of $\mathcal{X}$ is the set of all its subsets, denoted by $\mathcal{P}(\mathcal{X}) = \{\mathcal{M} : \mathcal{M} \subset \mathcal{X}\}$. Boldface letters denote random variables, vectors are denoted by a bar over their respective upper- or lower-case letters, and $T$ denotes matrix transpose.
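To make the setup concrete, the following Python sketch (ours, purely for illustration; all numbers and variable names are hypothetical) encodes a finite state space, per-agent signal structures as rows of likelihood tables, and i.i.d. private signals drawn under the realized state:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 0  # state realized by nature, out of Theta = {0, 1, 2} (illustrative labels)

# l[i] is agent i's signal structure: row k is the pmf l_i(.|theta_k) over S_i.
l = [
    np.array([[0.7, 0.3],    # agent 0: states 0 and 1 induce the same pmf,
              [0.7, 0.3],    # so they are observationally equivalent to her
              [0.4, 0.6]]),  # (an identification problem)
    np.array([[0.5, 0.5],
              [0.2, 0.8],
              [0.5, 0.5]]),  # agent 1 cannot tell state 0 from state 2
]

def draw_signal(i):
    """One private signal s_{i,t} ~ l_i(.|theta); the sequence over t is i.i.d."""
    return rng.choice(l[i].shape[1], p=l[i][theta])

signals = {i: [draw_signal(i) for _ in range(5)] for i in range(len(l))}
```

Note how the rows of each table built here exhibit exactly the kind of identification problem that motivates communication, as discussed in the remark that follows.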
Remark 1 (Learning from others). The fact that different people make independent observations about the underlying true state $\theta$ gives them an incentive to communicate in social networks, in order to benefit from each other's observations and to augment their private information. Moreover, different people differ in their observational abilities. For instance, suppose that the signal structure of agent $i$ allows her to distinguish the truth $\theta$ from the false state $\check\theta$, while the two states $\hat\theta$ and $\theta$ are indistinguishable to her: i.e., $\ell_i(s_i|\check\theta) \neq \ell_i(s_i|\theta)$ for some $s_i \in \mathcal{S}_i$, whereas $\ell_i(s_i|\hat\theta) = \ell_i(s_i|\theta)$ for all $s_i \in \mathcal{S}_i$. In such circumstances, agent $i$ can never resolve her ambiguity between $\theta$ and $\hat\theta$ on her own; hence, she has no choice but to rely on other people's observations to be able to learn the true state with certainty.

For any $\check\theta \in \Theta$, let $\lambda_{\check\theta} : \cup_{i\in[n]} \mathcal{S}_i \to \mathbb{R}$ be the real-valued function measuring the log-likelihood ratio of the signal $s_i$ under the states $\check\theta$ and $\theta$, defined as

$\lambda_{\check\theta}(s_i) := \log\big(\ell_i(s_i|\check\theta)\,/\,\ell_i(s_i|\theta)\big).$

This is a measure of the information content that the signal $s_i$ provides for distinguishing the false state $\check\theta$ from the truth $\theta$. Subsequently, we work with the probability triplet $(\Omega, \mathcal{F}, \mathbb{P}_\theta)$, where $\Omega = \big(\prod_{i\in[n]} \mathcal{S}_i\big)^{\mathbb{N}_0}$ is an infinite product space with a typical element $\omega = ((s_{1,0},\ldots,s_{n,0}), (s_{1,1},\ldots,s_{n,1}), \ldots)$ and the associated sigma field $\mathcal{F} = \mathcal{P}(\Omega)$. The probability measure on $\Omega$ is $\mathbb{P}_\theta(\cdot)$, which assigns probabilities consistently with the likelihood functions $\ell_i(\cdot|\theta)$, $i \in [n]$, and in such a way that, conditional on $\theta$, the random variables $\mathbf{s}_{i,t}$, $i \in [n]$, $t \in \mathbb{N}_0$, taking values in $\mathcal{S}_i$, are independent. The expectation operator $\mathbb{E}_\theta\{\cdot\}$ represents integration with respect to $d\mathbb{P}_\theta(\omega)$, $\omega \in \Omega$.

B. Beliefs, Observations, Actions and Rewards

An agent's belief about the unknown allows her to make decisions even as her payoff depends on the unknown state $\theta$. These beliefs about the unknown state are probability distributions over $\Theta$. Even before the true state $\theta$ is assigned and any observations are made, every agent $i \in [n]$ holds a prior belief $\nu_i(\cdot) \in \Delta\Theta$ with full support: $\nu_i(\hat\theta) > 0$, $\forall \hat\theta \in \Theta$; this represents her subjective biases about the true value of $\theta$. For each time instant $t$, let $\mu_{i,t}(\cdot)$ be a probability mass function on $\Theta$, representing the opinion or belief of agent $i$ at time $t$ about the realized value of $\theta$, and define

$\phi_{i,t}(\check\theta) := \log\big(\mu_{i,t}(\check\theta)\,/\,\mu_{i,t}(\theta)\big)$

as the log-belief ratio of agent $i$ at time $t$ under the states $\check\theta$ and $\theta$. Moreover, let $\mathbb{P}_{i,t}\{\cdot\}$ denote the probability measure on $\Theta \times \Omega$ that assigns probabilities consistently with $\mu_{i,t}(\cdot)$ and the independent signal likelihoods, and let the associated expectation operator be $\mathbb{E}_{i,t}\{\cdot\}$. At $t = 0$, after $\theta \hookleftarrow \Theta$ is assigned, the values $s_i \in \mathcal{S}_i$ of $\mathbf{s}_{i,0}$ are realized, and the latter is observed privately by each agent $i$, for all $i \in [n]$. Associated with every agent $i$ is an action space $\mathcal{A}_i$, representing all the choices available to her at every point of time $t \in \mathbb{N}_0$, and a utility function $u_i(\cdot,\cdot) : \mathcal{A}_i \times \Theta \to \mathbb{R}$. Subsequently, at every time $t \in \mathbb{N}_0$, each agent $i \in [n]$ chooses an action $a_{i,t} \in \mathcal{A}_i$ and is rewarded $u_i(a_{i,t}, \theta)$.³
³ The utility functions $u_i(\cdot,\cdot)$, signal structures $\ell_i(\cdot|\cdot)$, and priors $\nu_i(\cdot)$, as well as the corresponding sample spaces $\mathcal{A}_i$, $\mathcal{S}_i$ and $\Theta$, are common knowledge amongst the communicating agents for all $i \in [n]$. The assumption of common knowledge in the case of fully rational (Bayesian) agents implies that, given the same observations of one another's actions or private signals, distinct agents would make identical inferences, in the sense that, starting from the same belief about the unknown $\theta$, their updated beliefs given the same observations would be the same; in Aumann's words, rational agents cannot agree to disagree [59].

Such modeling of rewards to actions at successive time periods is commonplace in the study of learning in games. A central question of interest is that of regret, which measures the possible gains to the players had they played actions other than those chosen on a realized path of play; in particular, the existence of update/decision rules that guarantee a vanishing time-average of regret as $t \to \infty$, and the possible equilibria that characterize the limiting behavior of agents under such rules, have attracted much attention, cf. [25], [60], [61]. Under a similar framework, Rosenberg et al. [62] consider consensus in a general setting with players who observe a private signal, choose an action, and receive a payoff at every stage, with payoffs that depend only on an unknown parameter and the players' actions. They show that in this setting, with no payoff externalities and purely informational interactions, players asymptotically play their best replies given their beliefs and will agree in their payoffs; in particular, all motives for experimentation eventually disappear.

C. Naive (Memoryless) Action Updates

Given $\mathbf{s}_{i,0}$, agent $i$ forms an initial Bayesian opinion $\mu_{i,0}(\cdot)$ about the value of $\theta$, given by

$\mu_{i,0}(\hat\theta) = \dfrac{\nu_i(\hat\theta)\,\ell_i(s_{i,0}|\hat\theta)}{\sum_{\tilde\theta\in\Theta} \nu_i(\tilde\theta)\,\ell_i(s_{i,0}|\tilde\theta)}, \quad \forall \hat\theta \in \Theta. \qquad (1)$

She then chooses the action $a_{i,0} \hookleftarrow \arg\max_{a_i \in \mathcal{A}_i} \sum_{\hat\theta\in\Theta} u_i(a_i, \hat\theta)\,\mu_{i,0}(\hat\theta)$, maximizing her expected reward $\mathbb{E}_{i,0}\{u_i(a_{i,0}, \theta)\}$. Not being notified of the actual realized value of $u_i(a_{i,0}, \theta)$, she then observes the actions that her neighbors have taken: $a_{j,0}$, $j \in \mathcal{N}(i)$. Given her extended set of observations at time $t = 1$, she makes a second and possibly different move $a_{i,1}$ according to

$a_{i,1} \hookleftarrow \arg\max_{a_i \in \mathcal{A}_i} \sum_{\hat\theta\in\Theta} u_i(a_i, \hat\theta)\,\mu_{i,1}(\hat\theta), \qquad (2)$

maximizing her expected payoff conditional on everything that she has observed thus far: $\mathbb{E}_{i,1}\{u_i(a_{i,1}, \theta)\} = \mathbb{E}\{u_i(a_{i,1}, \theta) \mid \mathbf{s}_{i,0},\, \mathbf{a}_{j,0} : j \in \mathcal{N}(i)\}$. Subsequently, she is granted her net reward of $u_i(a_{i,0},\theta) + u_i(a_{i,1},\theta)$ from her past two plays. Following the realization of rewards for their first two plays, at any subsequent time instant $t > 1$ each agent $i \in [n]$ observes a private signal $\mathbf{s}_{i,t}$ together with the preceding actions of her neighbors, $\mathbf{a}_{j,t-1}$, $j \in \mathcal{N}(i)$. She then takes an option $a_{i,t}$ out of the set $\mathcal{A}_i$ such that her expected utility given her observations is maximized.
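As a minimal illustration of (1) and the myopic choice in (2), here is a Python sketch (ours, not from the paper; `utility_i` and `actions_i` stand in for the generic $u_i$ and $\mathcal{A}_i$):

```python
import numpy as np

def bayes_time_zero(prior_i, l_i, s_i0):
    """Eq. (1): agent i's posterior over the m states after her time-zero signal.
    prior_i: nu_i(.), shape (m,); l_i: likelihood table, shape (m, |S_i|)."""
    post = prior_i * l_i[:, s_i0]
    return post / post.sum()

def myopic_action(belief_i, utility_i, actions_i):
    """Eqs. (2)/(4): the action maximizing expected utility under the current belief.
    utility_i(a, k) stands in for u_i(a, theta_k); actions_i for the set A_i."""
    expected = [sum(belief_i[k] * utility_i(a, k) for k in range(len(belief_i)))
                for a in actions_i]
    return actions_i[int(np.argmax(expected))]

# Usage with the binary-state, binary-action model of Section III below:
# myopic_action(belief, lambda a, k: 1.0 if a == [1, -1][k] else -1.0, [1, -1])
```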
Of particular significance in our description of the behavior of agents in the succeeding time periods $t > 1$ is the relation

$f_i(s_{i,0},\, a_{j,0} : j \in \mathcal{N}(i)) := a_{i,1} \hookleftarrow \arg\max_{a_i \in \mathcal{A}_i} \mathbb{E}_{i,1}\{u_i(a_i, \theta)\}, \qquad (3)$

derived in (2), which, given the observations of agent $i$ from time $t = 0$, specifies her (Bayesian) payoff-maximizing action for time $t = 1$. Note that in writing (2) we assumed that the agents do not receive any private signals at $t = 1$, so no $\mathbf{s}_{i,1}$ appears in the updates of any agent $i$; this convention is exactly to facilitate the derivation of the mapping $f_i : \mathcal{S}_i \times \prod_{j\in\mathcal{N}(i)} \mathcal{A}_j \to \mathcal{A}_i$ from the private signal space and the action spaces of the neighbors to the succeeding actions of each agent. In every following instance we aim to model the inferences of agents about their observations as being rational but memoryless: as of those who know their immediate observations, which include the actions of their neighbors and their last private signals, but cannot trace these observations to their roots and have no ability to reason about why their neighbors may be behaving the way they do. In particular, such agents have no incentive to experiment with false reports, as their lack of memory prevents them from reaping the benefits of such experiments, including any possible revelations that a truthful report would not afford. Subsequently, we argue on normative grounds that such rational but memoryless agents would replicate the behavior of a Bayesian (fully rational) agent between times zero and one; whence, by regarding their observations as direct consequences of inferences made based on the initial priors, they reject any possibility of a past history beyond their immediate observations:⁴

$a_{i,t} = f_i(\mathbf{s}_{i,t},\, \mathbf{a}_{j,t-1} : j \in \mathcal{N}(i)), \quad \forall t > 1.$

⁴ Extensions of the above behavioral model to rational agents with bounded memory are of interest; nonetheless, the analysis of the Bayesian update even in the simplest cases becomes increasingly complex. As an example, consider a rational agent who recalls only the last two epochs of her past. In order for such an agent to interpret her observations in the penultimate and ultimate steps, she needs not only a prior to interpret her neighbors' beliefs at the penultimate step, but also a prior to interpret her neighbors' inferences about what she reported to them at the penultimate step, leading to their ultimate beliefs. In other words, she needs a prior on what her neighbors regard as her prior when they interpret what she reports to them as her penultimate belief. Indeed, such belief hierarchies are commonplace in game-theoretic analyses of incomplete information and are captured by the formalism of type spaces [63], [64].

On the other hand, note that the rationality of agents constrains their beliefs $\mu_{i,t}(\cdot)$ given their immediate observations; hence, we can also write

$a_{i,t} \hookleftarrow \arg\max_{a_i \in \mathcal{A}_i} \mathbb{E}_{i,t}\{u_i(a_i, \theta)\}, \qquad (4)$

or equivalently $a_{i,t} \hookleftarrow \arg\max_{a_i \in \mathcal{A}_i} \sum_{\hat\theta\in\Theta} u_i(a_i, \hat\theta)\,\mu_{i,t}(\hat\theta)$. In the sequel, we explore various structures for the action space and the resultant update rules $f_i$. In Section III we show how a common heuristic such as weighted majority can be explained as rational but memoryless behavior with actions taken from a binary set. In Section IV we shift focus to a finite state space with the probability simplex as the action space; there, agents exchange beliefs and the belief updates are log-linear.

III. WEIGHTED MAJORITY AND THRESHOLD RULES

Consider a binary state space $\Theta = \{+1, -1\}$, and suppose that the agents have a common binary action space $\mathcal{A}_i = \{-1, +1\}$ for all $i$. Let their utilities be given by $u_i(a, \theta) = 2\cdot\mathbb{1}_a(\theta) - 1$ for any agent $i$ and all $\theta, a \in \{-1, 1\}$; here $\mathbb{1}_a(\theta)$ equals one only if $\theta = a$ and equals zero otherwise. Subsequently, the agent is rewarded by $+1$ every time she correctly determines the value of $\theta$ and is penalized by $-1$ otherwise (Fig. 1).

Fig. 1: Bipartisanship is an example of a binary state space.
We can now calculate

$\sum_{\hat\theta\in\Theta} u_i(a, \hat\theta)\,\mu_{i,t}(\hat\theta) = a\,\big(\mu_{i,t}(+1) - \mu_{i,t}(-1)\big) = a\,\big(2\mu_{i,t}(+1) - 1\big), \quad \forall a \in \{-1, 1\},$

and from (4) we get⁵

$a_{i,t} = \begin{cases} +1 & \text{if } \mu_{i,t}(+1) \ge \mu_{i,t}(-1), \\ -1 & \text{if } \mu_{i,t}(+1) < \mu_{i,t}(-1). \end{cases} \qquad (5)$

⁵ In writing (5) we follow the convention that agents choose $+1$ when they are indifferent between their two options. Similarly, the sign function is assumed to take the value $+1$ when its argument is zero. This convention is followed consistently everywhere throughout this paper, except in Proposition 1 and its proof in Appendix C; see the footnote therein for further details.

We can now proceed to derive the memoryless update rule $f_i$ under the settings prescribed above. This is achieved by the following expression of the action update of agent $i$ at time one. Throughout this section, and without any loss of generality, we assume that $\theta = -1$.

Lemma 1 (Time-One Bayesian Actions). The Bayesian action of agent $i$ at time one, following her observations of the actions of her neighbors at time zero and her own private signal at time zero, is given by

$a_{i,1} = \operatorname{sign}\Big(\sum_{j\in\mathcal{N}(i)} w_j a_{j,0} + \eta_i + \lambda_1(s_{i,0})\Big),$

where $w_i$ and $\eta_i$ are constants for each $i$, completely determined by the initial priors and signal structures of agent $i$ and her neighbors.

The exact expressions for the constants $w_i$ and $\eta_i$ and their derivations can be found in Appendix A. Indeed, making the necessary substitutions, we derive the following memoryless update $f_i$ for all $t > 1$:

$a_{i,t} = \operatorname{sign}\Big(\sum_{j\in\mathcal{N}(i)} w_j a_{j,t-1} + \eta_i + \lambda_1(s_{i,t})\Big).$

This update rule has a familiar format as a weighted majority and threshold function, with the weights and threshold given by $w_i$ and $t_{i,t} := -\lambda_1(s_{i,t}) - \eta_i$, the latter being random and time-varying. Majority and threshold functions are studied in the analysis of Boolean functions [65, Chapter 5], and several of their properties, including their noise stability, are of particular interest [66]–[68]. This update rule also appears as the McCulloch–Pitts model of an artificial neuron [69], with important applications in neural networks and computing [70]. It is also important in the study of Glauber dynamics in the Ising model, where the $\pm 1$ states represent atomic spins. The spins are arranged in a graph, and each spin configuration has a probability associated with it depending on the temperature and the interaction structure [71, Chapter 15], [72]. The Ising model provides a natural setting for the study of cooperative behavior in social networks. Recent studies have explored applications of the Ising model to the analysis of social and economic phenomena such as rumor spreading [73], market equilibria [74], and opinion dynamics [75]. Following this model, every agent $i \in [n]$ chooses her action $a_{i,t} \in \{\pm 1\}$ as the sign of $\sum_{j\in\mathcal{N}(i)} w_j a_{j,t-1} + \eta_i + \lambda_1(s_{i,t})$.
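The resulting rule is simple to implement. A direct transcription into Python (illustrative; the constants `w` and `eta` are taken as given here, whereas in the paper they are derived in Appendix A, which we do not reproduce):

```python
def bwr_binary_action(i, a_prev, s_it, w, eta, llr1, neighbors):
    """BWR majority/threshold rule:
    a_{i,t} = sign( sum_{j in N(i)} w_j a_{j,t-1} + eta_i + lambda_1(s_{i,t}) ),
    with sign(0) := +1 per the tie-breaking convention of (5).
    w, eta: the Appendix A constants (supplied as plain numbers here);
    llr1[i][s]: lambda_1(s) = log(l_i(s|theta_1)/l_i(s|theta_2))."""
    field = sum(w[j] * a_prev[j] for j in neighbors[i]) + eta[i] + llr1[i][s_it]
    return 1 if field >= 0 else -1
```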
Subsequently, in processing her available data and choosing her action $a_{i,t}$, every agent seeks to maximize $\sum_{j\in\mathcal{N}(i)} w_j a_{j,t-1} a_{i,t} + a_{i,t}\,(\eta_i + \lambda_1(s_{i,t}))$. Hence, we can interpret each of the terms appearing in the argument of the sign function according to how they influence agent $i$'s choice of action. In particular, the term $\eta_i + \lambda_1(s_{i,t})$ represents the propensity of agent $i$ to choose the false action $\theta_1 := 1$ at time $t$; it is determined by the log-likelihood ratio of the private signal, $\lambda_1(s_{i,t})$, as well as her innate tendency towards $+1$ irrespective of any observations. The latter is reflected in the constant $\eta_i := \log(\nu_i(\theta_1)/\nu_i(\theta_2)) + \log V_i$, based on the log-ratio of her initial prior belief and her knowledge of her neighbors' signal structures, as captured by the constant $V_i$ in (17) of Appendix A. The latter is increasing in $\ell_j(s_j|\theta_1)$ and decreasing in $\ell_j(s_j|\theta_2)$ for any fixed signal $s_j \in \mathcal{S}_j$, $j \in \mathcal{N}(i)$; cf. Lemma 3 of Appendix A. By the same token, we can also interpret the interaction terms $w_j a_{j,t-1} a_{i,t}$. Lemma 4 of Appendix A establishes that the constants $w_j$ are non-negative for every agent $j \in [n]$. Hence, in maximizing $\sum_{j\in\mathcal{N}(i)} w_j a_{j,t-1}\,a + a\,(\eta_i + \lambda_1(s_{i,t}))$ through her choice of $a \in \{\pm 1\}$ at every time $t$, agent $i$ aspires to align her choice with as many of her neighbors $j \in \mathcal{N}(i)$ as possible. However, in doing so she puts more weight on the actions of those neighbors $j \in \mathcal{N}(i)$ who have larger constants $w_j$. The constant $w_j := \log W_j$, with $W_j$ given in (18) of Appendix A, is a measure of the observational ability of agent $j$ in our model: agents with large constants $w_j$ are those who hold expert opinions in the social network, and they play a major role in shaping the actions of their neighboring agents. Positivity of $w_i$ for any $i \in [n]$, per Lemma 4 of Appendix A, also signifies a case of positive externalities: an agent is more likely to choose an action if her neighbors make the same decision.

A. Analysis of Convergence and Learning in Ising Networks

To begin the analysis of the binary action update dynamics derived above, we introduce some useful notation. For all $t \in \mathbb{N}_0$, let $\bar{a}_t := (a_{1,t}, \ldots, a_{n,t})^T$ be the profile of actions taken by all agents at time $t$. Subsequently, we are interested in the probabilistic evolution of the action profiles $\bar{a}_t$, $t \in \mathbb{N}_0$, under the following dynamics:

$a_{i,0} = \operatorname{sign}\Big(\log\dfrac{\nu_i(\theta_1)}{\nu_i(\theta_2)} + \lambda_1(s_{i,0})\Big), \qquad (6)$

$a_{i,t} = \operatorname{sign}\Big(\sum_{j\in\mathcal{N}(i)} w_j a_{j,t-1} + \eta_i + \lambda_1(s_{i,t})\Big), \quad t \ge 1, \qquad (7)$

for all $i \in [n]$. The two constants $w_i$ and $\eta_i$ for each agent $i$ are specified in Appendix A; they depend only on the signal structures and initial priors of that agent and her neighbors. The evolution of the action profiles $\bar{a}_t$ in (7) specifies a finite Markov chain that jumps between the vertices of the Boolean hypercube $\{\pm 1\}^n$. The analysis of the time-evolution of the action profiles is facilitated by classical results from the theory of finite Markov chains, with the details spelled out in Appendix B.
If the signal structures are rich enough to allow for sufficiently strong signals (having large absolute log-likelihood ratios), or if the initial priors are sufficiently balanced (dividing the probability mass almost equally between $\theta_1$ and $\theta_2$), then any action profile belonging to $\{\pm 1\}^n$ is realizable as $\bar{a}_0$ with positive probability under (6). In particular, any recurrent state of the finite Markov chain over the Boolean cube is reachable with positive probability, and the asymptotic behavior can only be determined up to a distribution over the first set of communicating recurrent states reached by $\bar{a}_t$, cf. Proposition 2 of Appendix B. However, if a recurrent class constitutes a singleton, then our model makes sharper predictions: $\lim_{t\to\infty} \bar{a}_t$ almost surely exists and is identified as an absorbing state of the finite Markov chain. This special case is treated next, due to its interesting implications.

B. Equilibrium, Consensus, and (Mis-)Learning

We begin by noting that the absorbing states of the Markov chain of action profiles specify the equilibria under the action update dynamics in (7). Formally, an equilibrium $\bar{a}^* \in \{\pm 1\}^n$ is such that if the dynamics in (7) are initialized by $\bar{a}_0 = \bar{a}^*$, then with probability one $\bar{a}_t = \bar{a}^*$ for all $t \ge 1$. Subsequently, the set of all equilibria is completely characterized as the set of all absorbing states, i.e., all action profiles $\bar{a}^* \in \{\pm 1\}^n$ satisfying $P(\bar{a}^*, \bar{a}^*) = 1$, where $P : \{\pm 1\}^n \times \{\pm 1\}^n \to [0,1]$ specifies the transition probabilities of the Markov chain of action profiles, as defined in (21) of Appendix B. It is useful to express this condition in terms of the model parameters as follows; the proof is included in Appendix C, with a caveat explained in its footnote.

Proposition 1 (Characterization of the Equilibria). An action profile $(a_1^*, \ldots, a_n^*) \in \{\pm 1\}^n$ is an equilibrium of (7) if, and only if,

$-\min_{s_i \in \mathcal{S}_i} a_i^* \big(\lambda_1(s_i) + \eta_i\big) \;\le\; \sum_{j\in\mathcal{N}(i)} w_j a_j^* a_i^*, \quad \forall i \in [n].$

Of particular interest are the two action profiles $(1, \ldots, 1)^T$ and $(-1, \ldots, -1)^T$, which specify a consensus amongst the agents in their chosen actions. The preceding characterization of the equilibria is specialized next to highlight the necessary and sufficient conditions for the agents to be at equilibrium whenever they are in consensus.

Corollary 1 (Equilibrium at Consensus). The agents will be at equilibrium in consensus if, and only if,

$\max_{s_i \in \mathcal{S}_i} \big|\lambda_1(s_i) + \eta_i\big| \;<\; \sum_{j\in\mathcal{N}(i)} w_j, \quad \forall i \in [n].$

The requirement of learning in our model is that the agents reach a consensus on the truth, that is, for the action profiles $\bar{a}_t$ to converge to $(\theta, \ldots, \theta)$ as $t \to \infty$. In particular, as in Corollary 1, we need the agents to be at equilibrium when in consensus; hence, there is always a positive probability that the agents reach consensus on an untruth: with positive probability, the agents (mis-)learn. In Section IV we show that when the action space is rich enough to reveal the beliefs of the agents, the rational but memoryless behavior culminates in a log-linear updating of the beliefs with the observations. The analysis of convergence and learning under these log-linear updates consumes the bulk of that section.
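Both conditions above are easy to verify numerically for given model constants; a sketch (ours, assuming the Appendix A constants `w`, `eta` and the log-likelihood ratios `llr1` are supplied as plain arrays, with `neighbors` a dict mapping each agent to her neighbor list):

```python
def is_equilibrium(a_star, w, eta, llr1, neighbors):
    """Proposition 1: a* is an equilibrium of (7) iff, for every agent i,
    -min_{s_i} a*_i (lambda_1(s_i) + eta_i) <= sum_{j in N(i)} w_j a*_j a*_i."""
    for i, nbrs in neighbors.items():
        lhs = -min(a_star[i] * (llr + eta[i]) for llr in llr1[i])
        rhs = sum(w[j] * a_star[j] * a_star[i] for j in nbrs)
        if lhs > rhs:
            return False
    return True

def consensus_is_equilibrium(w, eta, llr1, neighbors):
    """Corollary 1: both consensus profiles are equilibria iff, for every agent i,
    max_{s_i} |lambda_1(s_i) + eta_i| < sum_{j in N(i)} w_j."""
    return all(max(abs(llr + eta[i]) for llr in llr1[i]) < sum(w[j] for j in nbrs)
               for i, nbrs in neighbors.items())
```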
IV. LOG-LINEAR UPDATE RULES

Suppose that $\Theta = \{\theta_1, \ldots, \theta_m\}$, and for all $j$ label $\theta_j$ by $e_j \in \mathbb{R}^m$, the column vector of all zeros except for its $j$-th element, which equals one. Furthermore, for all agents $i \in [n]$ let $\mathcal{A}_i$ be the $m$-dimensional probability simplex, $\mathcal{A}_i = \{(x_1,\ldots,x_m)^T \in \mathbb{R}^m : \sum_{k=1}^m x_k = 1 \text{ and } x_k \ge 0,\ \forall k\}$, and for all $a := (a_1, \ldots, a_m)^T \in \mathcal{A}_i$ and $\theta_j \in \Theta$, set

$u_i(a, \theta_j) = -\|a - e_j\|_2^2 := -(1 - a_j)^2 - \sum_{k=1,\,k\neq j}^m a_k^2.$

Subsequently, we can calculate the expected payoff from every such action $a$ as

$\mathbb{E}_{i,t}\{u_i(a, \theta)\} = -\sum_{k=1}^m a_k^2 - 1 + 2\sum_{j=1}^m a_j\,\mu_{i,t}(\theta_j). \qquad (8)$

Over the $m$-dimensional probability simplex, (8) is uniquely maximized by $a^* := (\mu_{i,t}(\theta_1), \ldots, \mu_{i,t}(\theta_m))^T$. Hence, with the probability simplex as her action space $\mathcal{A}_i$ and subject to the aforementioned utility structure, every agent truthfully announces her belief as her optimal action at every epoch of time: $a_{i,t} = \arg\max_{a\in\mathcal{A}_i} \mathbb{E}_{i,t}\{u_i(a,\theta)\} \equiv \mu_{i,t}(\cdot)$. Therefore, the memoryless update rule $f_i$ in (3) describes how agents' beliefs are updated following their observations (Fig. 2).

Fig. 2: People reveal their beliefs through status updates and what they share and post on various social media platforms.

In Appendix D we calculate the following Bayesian belief at time one, in terms of the observed neighboring beliefs and the private signal at time zero.

Lemma 2 (Time-One Bayesian Beliefs). The Bayesian belief of agent $i$ at time one, following her observations of the beliefs of her neighbors at time zero and her own private signal at time zero, is given by

$\mu_{i,1}(\hat\theta) = \dfrac{\nu_i(\hat\theta)\,\ell_i(s_{i,0}|\hat\theta)\,\prod_{j\in\mathcal{N}(i)} \big(\mu_{j,0}(\hat\theta)/\nu_j(\hat\theta)\big)}{\sum_{\tilde\theta\in\Theta} \nu_i(\tilde\theta)\,\ell_i(s_{i,0}|\tilde\theta)\,\prod_{j\in\mathcal{N}(i)} \big(\mu_{j,0}(\tilde\theta)/\nu_j(\tilde\theta)\big)}, \quad \forall \hat\theta \in \Theta. \qquad (9)$

Subsequently, at any time step $t > 1$, each agent $i$ observes the realized value of $\mathbf{s}_{i,t}$ as well as the current beliefs of her neighbors, $\mu_{j,t-1}(\cdot)$, $\forall j \in \mathcal{N}(i)$, and forms a refined opinion $\mu_{i,t}(\cdot)$ using the following rule:

$\mu_{i,t}(\hat\theta) = \dfrac{\nu_i(\hat\theta)\,\ell_i(s_{i,t}|\hat\theta)\,\prod_{j\in\mathcal{N}(i)} \big(\mu_{j,t-1}(\hat\theta)/\nu_j(\hat\theta)\big)}{\sum_{\tilde\theta\in\Theta} \nu_i(\tilde\theta)\,\ell_i(s_{i,t}|\tilde\theta)\,\prod_{j\in\mathcal{N}(i)} \big(\mu_{j,t-1}(\tilde\theta)/\nu_j(\tilde\theta)\big)}, \qquad (10)$

for all $\hat\theta \in \Theta$ and any $t > 1$. In writing (10), at every time agent $i$ regards each of her neighbors $j \in \mathcal{N}(i)$ as having started from the prior belief $\nu_j(\cdot)$ and arrived at their currently reported belief $\mu_{j,t-1}(\cdot)$ directly, hence rejecting any possibility of a past history. This is equivalent to the assumption that the reported beliefs of every neighbor are formed from a private observation and a fixed prior, and not through repeated communications. Such a rule is of course not the optimal Bayesian update of agent $i$'s belief at any step $t > 1$, because the agent does not take into account the complete observed history of her private signals and neighbors' beliefs, and instead bases her inference entirely on the immediately observed signal and neighboring beliefs; hence the name memoryless. Here, the status of a rational but memoryless agent is akin to a person who is possessed of a piece of knowledge but cannot see how she came to be possessed of it. Likewise, it is by the requirement of rationality in such a predicament that we impose a fixed prior $\nu_i(\cdot)$ on every agent $i$ and carry it through for all times $t$.
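A direct transcription of (10) into Python may help fix ideas (illustrative; all arrays are indexed by state, and `mu_prev[j]` is the reported belief of neighbor $j$):

```python
import numpy as np

def bwr_loglinear_update(i, mu_prev, s_it, priors, l, neighbors):
    """Eq. (10): multiply own prior, the fresh signal's likelihood, and each
    neighbor's reported-belief-to-prior ratio; then normalize over the states."""
    post = priors[i] * l[i][:, s_it]
    for j in neighbors[i]:
        post = post * (mu_prev[j] / priors[j])
    return post / post.sum()
```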
Indeed, it is in the grand tradition of Bayesian statistics, as advocated in the prominent and influential works of [76], [77], [78], [79] and many others, to argue on normative grounds that rational behavior in a decision-theoretic framework forces individuals to employ Bayes rule and apply it to their personal priors.

A. Analysis of Convergence and Log-Linear Learning

A main question of interest is whether the agents can learn the true realized value $\theta$:

Definition 1 (Learning). An agent $i$ is said to learn the truth if $\lim_{t\to\infty} \mu_{i,t}(\theta) = 1$, $\mathbb{P}_\theta$-almost surely.

We begin our analysis of convergence and learning under the update rule in (10) by considering the case of a single agent $i$ who starts from a prior belief $\nu_i(\cdot)$ and sequentially updates her beliefs according to Bayes rule:

$\mu_{i,t}(\hat\theta) = \dfrac{\mu_{i,t-1}(\hat\theta)\,\ell_i(s_{i,t}|\hat\theta)}{\sum_{\tilde\theta\in\Theta} \mu_{i,t-1}(\tilde\theta)\,\ell_i(s_{i,t}|\tilde\theta)}, \quad \forall \hat\theta \in \Theta. \qquad (11)$

The Bayesian belief update in (11) linearizes in terms of the log-ratios of beliefs and signal likelihoods, $\phi_{i,t}(\cdot)$ and $\lambda_{\check\theta}(\cdot)$, leading to

$\phi_{i,t}(\check\theta) = \log\Big(\dfrac{\nu_i(\check\theta)}{\nu_i(\theta)}\Big) + \sum_{\tau=0}^{t} \lambda_{\check\theta}(s_{i,\tau}) \;\to\; \log\Big(\dfrac{\nu_i(\check\theta)}{\nu_i(\theta)}\Big) + (t+1)\,\mathbb{E}_\theta\{\lambda_{\check\theta}(\mathbf{s}_{i,0})\}, \qquad (12)$

$\mathbb{P}_\theta$-almost surely as $t \to \infty$, by the strong law of large numbers [80, Theorem 22.1] applied to the sequence of $\mathbb{E}_\theta$-integrable, independent and identically distributed variables $\lambda_{\check\theta}(\mathbf{s}_{i,t})$, $t \in \mathbb{N}_0$. In particular, if $D_{KL}\big(\ell_i(\cdot|\theta)\,\|\,\ell_i(\cdot|\check\theta)\big) := -\mathbb{E}_\theta\{\lambda_{\check\theta}(\mathbf{s}_{i,t})\} > 0$, then $\phi_{i,t}(\check\theta) \to -\infty$ almost surely, and agent $i$ asymptotically rejects the false state $\check\theta$ in favor of the true state $\theta$, putting a vanishing belief on the former relative to the latter. Therefore, the single Bayesian agent following (11) learns the truth if and only if $D_{KL}\big(\ell_i(\cdot|\theta)\,\|\,\ell_i(\cdot|\check\theta)\big) > 0$ for all $\check\theta \neq \theta$, and learning is asymptotically exponentially fast at the rate $\min_{\check\theta\in\Theta\setminus\{\theta\}} D_{KL}\big(\ell_i(\cdot|\theta)\,\|\,\ell_i(\cdot|\check\theta)\big)$, as shown in [81].⁶

⁶ Note from the information inequality for the Kullback–Leibler divergence that $D_{KL}(\cdot\|\cdot) \ge 0$, with strict inequality whenever $\ell_i(\cdot|\check\theta) \not\equiv \ell_i(\cdot|\theta)$, i.e., $\exists s \in \mathcal{S}_i$ such that $\ell_i(s|\check\theta) \neq \ell_i(s|\theta)$ [82, Theorem 2.6.3]. Further note that whenever $\ell_i(\cdot|\check\theta) \equiv \ell_i(\cdot|\theta)$, or equivalently $D_{KL}\big(\ell_i(\cdot|\theta)\,\|\,\ell_i(\cdot|\check\theta)\big) = 0$, the two states $\check\theta$ and $\theta$ are statistically indistinguishable to agent $i$: there is no way for agent $i$ to distinguish between $\check\theta$ and $\theta$ based only on her received signals, because both $\theta$ and $\check\theta$ induce the same probability distribution on her sequence of observed i.i.d. signals. Since different states $\hat\theta \in \Theta$ are distinguished through their different likelihood functions $\ell_i(\cdot|\hat\theta)$, the more refined such differences are, the better the states are distinguished; hence the proposed asymptotic rate is a measure of resolution for the likelihood structure of agent $i$.

The preceding result also applies to a Bayesian agent with direct (centralized) access to all observations across the network: consider an outside Bayesian agent $\hat{o}$ who shares the same common knowledge of the priors and signal structures with the networked agents; in particular, $\hat{o}$ knows the signal structures $\ell_i(\cdot|\hat\theta)$ for all $\hat\theta \in \Theta$ and $i \in [n]$, and thence makes the same inferences as any other agent when given access to the same observations. Consider next a Gedankenexperiment in which $\hat{o}$ is granted direct access to all the signals of every agent at all times. The analysis leading to (12) applies to the evolution of the log-belief ratios of $\hat{o}$, whose observation at every time $t \in \mathbb{N}_0$ is an element of the product space $\prod_{i\in[n]} \mathcal{S}_i$. Subsequently, the centralized Bayesian beliefs concentrate on the true state at the asymptotically exponentially fast rate

$R_n := \min_{\check\theta\in\Theta\setminus\{\theta\}} D_{KL}\Big(\textstyle\prod_{i\in[n]} \ell_i(\cdot|\theta)\,\Big\|\,\prod_{i\in[n]} \ell_i(\cdot|\check\theta)\Big) = \min_{\check\theta\in\Theta\setminus\{\theta\}} \sum_{i\in[n]} D_{KL}\big(\ell_i(\cdot|\theta)\,\|\,\ell_i(\cdot|\check\theta)\big). \qquad (13)$
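For finite signal alphabets, the rates in this subsection reduce to sums and minima of Kullback–Leibler divergences and are straightforward to compute; a sketch (ours, assuming each `l[i]` is a states-by-signals likelihood table with strictly positive entries):

```python
import numpy as np

def kl(p, q):
    """D_KL(p||q) for finite pmfs; strictly positive entries assumed."""
    return float(np.sum(p * np.log(p / q)))

def single_agent_rate(l_i, theta):
    """min over false states of D_KL(l_i(.|theta) || l_i(.|false)); zero iff
    some false state is statistically indistinguishable from the truth."""
    return min(kl(l_i[theta], l_i[k]) for k in range(l_i.shape[0]) if k != theta)

def centralized_rate(l, theta):
    """Eq. (13): R_n, with the KL divergences summed across agents before minimizing."""
    m = l[0].shape[0]
    return min(sum(kl(l_i[theta], l_i[k]) for l_i in l)
               for k in range(m) if k != theta)
```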
Next, to understand the evolution of beliefs under the log-linear updates in (10), consider the network graph structure as encoded by the adjacency matrix $A$, defined by $[A]_{ij} = 1 \iff (j,i) \in \mathcal{E}$ and $[A]_{ij} = 0$ otherwise. For a strongly connected $\mathcal{G}$, Perron–Frobenius theory [83, Theorem 1.5] implies that $A$ has a simple positive real eigenvalue, denoted by $\rho > 0$, equal to its spectral radius. Moreover, the left eigenspace associated with $\rho$ is one-dimensional, with the corresponding eigenvector $\bar\alpha = (\alpha_1, \ldots, \alpha_n)^T$ uniquely satisfying $\sum_{i=1}^n \alpha_i = 1$, $\alpha_i > 0$, $\forall i \in [n]$, and $\bar\alpha^T A = \rho\,\bar\alpha^T$. The entry $\alpha_i$ is called the centrality of agent $i$, and as the name suggests, it measures how central the agent's location in the network is. Our main result states that almost-sure learning cannot be realized in a strongly connected network unless the network has unit spectral radius, which is the case only for a directed circle.

Theorem 1 (No Learning when the Spectral Radius $\rho > 1$). In a strongly connected social network and under the memoryless belief updates in (10), no agent can learn the truth unless the spectral radius $\rho = 1$.

Proof outline: A complete proof is included in Appendix E; here we describe the mechanism and the interplay between belief aggregation and information propagation. To facilitate the exposition of the underlying logic, we introduce some notation. We define a global (network-wide) random variable $\Phi_t(\check\theta) := \sum_{i=1}^n \alpha_i \phi_{i,t}(\check\theta)$, where $\alpha_i$ is the centrality of agent $i$; $\Phi_t(\check\theta)$ characterizes how biased (away from the truth and towards $\check\theta$) the network beliefs and priors are at each point in time. In particular, if any agent is to learn the truth, then $\Phi_t(\check\theta) \to -\infty$ as $t \to \infty$ for every false state $\check\theta \in \Theta\setminus\{\theta\}$. To proceed, we define another network-wide random variable, $\Lambda_t(\check\theta) := \sum_{i=1}^n \alpha_i \lambda_{\check\theta}(\mathbf{s}_{i,t})$, which characterizes the information content of the observed signals (received information) for the entire network at each time $t$. Moreover, since the received signal vectors $\{(\mathbf{s}_{1,t}, \ldots, \mathbf{s}_{n,t}),\, t \in \mathbb{N}_0\}$ are i.i.d. over time, for every $\check\theta \neq \theta$, $\{\Lambda_t(\check\theta),\, t \in \mathbb{N}_0\}$ constitutes a sequence of i.i.d. random variables satisfying $\mathbb{E}\{\Lambda_t(\check\theta)\} = -\sum_{i=1}^n \alpha_i\, D_{KL}\big(\ell_i(\cdot|\theta)\,\|\,\ell_i(\cdot|\check\theta)\big) \le 0$. In order for the agents to learn the true state of the world based on their observations, it is necessary that for each false state $\check\theta \neq \theta$ some agent be able to distinguish $\check\theta$ from the truth $\theta$, in which case $\mathbb{E}\{\Lambda_t(\check\theta)\} < 0$; we refer to this criterion as global identifiability of the true state $\theta$.⁷

⁷ The global identifiability condition can also be viewed in the following sense: consider a Gedankenexperiment in which an external fully rational observer $\hat{o}$ is granted direct access to the signals of all agents in the network, and assume further that she shares the same common knowledge of the priors and signal structures with the network agents. Then $\hat{o}$ learns the truth if, and only if, it is globally identifiable.

In Appendix E we argue that under the update rules in (10), the global log-belief-ratio statistic $\Phi_t(\check\theta)$ evolves as a sum of the weighted i.i.d. variables $\rho^\tau \Lambda_{t-\tau}(\check\theta)$:
$\Phi_t(\check\theta) = \sum_{\tau=0}^{t} \rho^\tau \Big(\Lambda_{t-\tau}(\check\theta) + (1-\rho)\,\beta(\check\theta)\Big), \qquad (14)$

where $\beta(\check\theta) := \sum_{i=1}^n \alpha_i \log\big(\nu_i(\check\theta)/\nu_i(\theta)\big)$ is a measure of the bias in the initial prior beliefs. The weights in (14) form a geometric progression in $\rho$; hence, the summands grow unbounded in variance, and convergence cannot hold in a strongly connected social network unless $\rho = 1$. This is because $\rho$ upper-bounds the average degree of the graph [84, Chapter 2], and every node in a strongly connected graph has degree greater than or equal to one; consequently, $\rho \ge 1$ for all strongly connected graphs. ∎

Remark 2 (Polarization, data incest and unlearning). The unlearning in the case of $\rho > 1$ in Theorem 1, which applies to all strongly connected topologies except directed circles (where $\rho = 1$; see Subsection IV-B below), is related to the inefficiencies associated with social learning and can be attributed to the agents' naivety in inferring the sources of their information and their inability to interpret the actions of their neighbors rationally [85]. In particular, when $\rho > 1$, the noise or randomness in the agents' observations is amplified at every stage of network interaction, since the agents fail to correct for the repetitions in the sources of their observations, as in the case of persuasion bias argued by DeMarzo, Vayanos and Zwiebel [9], or the data incest argued by Krishnamurthy and Hoiles [11]. When $\rho > 1$, the effect of the agents' priors is also amplified through the network interactions, and those states $\check\theta$ for which $\beta(\check\theta) > 0$ in (14) will be asymptotically rejected, as $\sum_{\tau=0}^t \rho^\tau (1-\rho)\,\beta(\check\theta) \to -\infty$, irrespective of the observed data $\Lambda_\tau(\check\theta)$, $\tau \in \mathbb{N}_0$. This phenomenon arises as agents engage in excessive anti-imitative behavior, compensating for the neighboring priors at every period [44]. It is justified as a case of choice shift toward more extreme opinions [5], [6] or group polarization [7], [8], whereby like-minded people, after interacting with each other and under the influence of their mutually positive feedback, become more extreme in their opinions and less receptive of opposing beliefs.

B. Learning in Circles and General Connected Topologies

For a strongly connected digraph $\mathcal{G}$, if $\rho = 1$, then it must be the case that all nodes have degree one and the graph is a directed circle. Subsequently, the progression for $\Phi_t(\check\theta)$ in (14) reduces to a sum of i.i.d. variables in $L^1$, and by the strong law of large numbers [80, Theorem 22.1] it converges almost surely to the mean value:

$\Phi_t(\check\theta) = \beta(\check\theta) + \sum_{\tau=0}^{t} \Lambda_\tau(\check\theta) \;\to\; \beta(\check\theta) + (t+1)\,\mathbb{E}\{\Lambda_0(\check\theta)\} \;\to\; -\infty,$

as $t \to \infty$, provided that $\mathbb{E}\{\Lambda_0(\check\theta)\} < 0$, i.e., if the truth is globally identifiable. Note also the analogy with (12), where $\Lambda_t(\check\theta)$ replaces $\lambda_{\check\theta}(s_{i,t})$, as both represent the observed signal(s) or received information at time $t$.
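The contrast between $\rho = 1$ and $\rho > 1$ in (14) can be seen in a few lines of simulation. The sketch below (ours) draws the $\Lambda_t(\check\theta)$ as i.i.d. Gaussians purely for illustration; the paper only assumes an i.i.d. sequence with negative mean under global identifiability, and all parameter values here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

def phi_trajectory(rho, beta, lam_mean, lam_std, T=200):
    """Simulate eq. (14): Phi_t = sum_{tau=0..t} rho^tau (Lambda_{t-tau} + (1-rho) beta),
    with the i.i.d. Lambda_t drawn as Gaussians purely for illustration."""
    lam = rng.normal(lam_mean, lam_std, size=T)
    phi = np.empty(T)
    for t in range(T):
        weights = rho ** np.arange(t + 1)          # geometric progression in rho
        phi[t] = np.sum(weights * (lam[t::-1] + (1.0 - rho) * beta))
    return phi

# rho = 1: a plain i.i.d. random walk; with E{Lambda} < 0 it drifts to -infinity.
# rho > 1: the geometric weights blow up the variance and convergence fails.
phi_circle = phi_trajectory(rho=1.0, beta=0.1, lam_mean=-0.05, lam_std=0.3)
phi_dense  = phi_trajectory(rho=1.5, beta=0.1, lam_mean=-0.05, lam_std=0.3)
```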
Indeed, if we further assume that $\nu_i(\cdot) \equiv \nu(\cdot)$ for all $i$, i.e., that all agents share the same common prior, then (10) for a circular network becomes

$\mu_{i,t}(\hat\theta) = \dfrac{\mu_{j,t-1}(\hat\theta)\,\ell_i(s_{i,t}|\hat\theta)}{\sum_{\tilde\theta\in\Theta} \mu_{j,t-1}(\tilde\theta)\,\ell_i(s_{i,t}|\tilde\theta)}, \quad \forall \hat\theta \in \Theta, \qquad (15)$

where $j$ is the unique vertex in $\mathcal{N}(i)$. The update (15) replicates the Bayesian update of a single agent in (11), but with the self-belief $\mu_{i,t-1}(\cdot)$ on the right-hand side replaced by the belief $\mu_{j,t-1}(\cdot)$ of the unique neighbor $\{j\} = \mathcal{N}(i)$. Indeed, learning in this case is asymptotically exponentially fast at the rate $(1/n) \min_{\check\theta\in\Theta\setminus\{\theta\}} \sum_{j=1}^n D_{KL}\big(\ell_j(\cdot|\theta)\,\|\,\ell_j(\cdot|\check\theta)\big) = (1/n)\,R_n$; hence, the same exponential rate as that of a central Bayesian can be achieved through the BWR update rule, except for a $1/n$ factor that decreases with increasing cycle length, cf. [81].

Example 1 (Eight Agents with Binary Signals in a Tri-State World). Consider the network of agents in Fig. 3, with the true state of the world being 1, the first of the three possible states $\Theta = \{1, 2, 3\}$. The agents receive binary signals about the true state $\theta$ according to the likelihoods listed below (with $\ell_i(s_{i,t} = 1 \mid \hat\theta) = 1 - \ell_i(s_{i,t} = 0 \mid \hat\theta)$):

likelihoods              θ̂ = 1    θ̂ = 2    θ̂ = 3
ℓ₁(s₁,ₜ = 0 | θ̂)         1/3      1/3      1/5
ℓ₂(s₂,ₜ = 0 | θ̂)         1/2      2/3      1/2
ℓ₃(s₃,ₜ = 0 | θ̂)         1/4      1/4      1/4

Fig. 3: A hybrid structure: agents 1, 2 and 3 form a directed circle, with the peripheral agents 4 to 8 directed away from it.

We begin with the observation that this network can be thought of as a rooted directed tree in which the root node is replaced with a directed circle (the root circle).⁸ Next, note that the root circle comprises three agents, none of whom can learn the truth on their own. Indeed, agent 3 does not receive any informative signals; therefore, in isolation, i.e., using (11), her beliefs never depart from their initial priors. We further set $\ell_j(\cdot|\cdot) \equiv \ell_3(\cdot|\cdot)$ for all $j \in [8]\setminus[3]$, so that all the peripheral follower agents are also unable to infer anything about the true state of the world from their own private signals. Starting from a uniform common prior and following the proposed rule (15), all agents asymptotically learn the true state, even though none of them can learn it on their own. The plots in Figs. 4 and 5 depict the evolution of the beliefs of the third agent, as well as the difference between the beliefs of the first and eighth agents. We can further show that all agents learn the true state at the same exponentially fast asymptotic rate. In fact, the three nodes belonging to the directed circle learn the true state of the world at the exponentially fast asymptotic rate of $(1/3)\,R_3$ noted above, irrespective of the peripheral nodes. The remaining peripheral nodes then follow the beliefs of the root-circle nodes, except for a vanishing difference that increases with the increasing distance of a peripheral node from the root circle: following (15), the first three agents form a circle of leaders where they combine their observations and reach a consensus; every other agent in the network then follows whatever state the leaders have collectively agreed upon.

⁸ Any weakly connected digraph $\mathcal{G}$ that has only degree-zero or degree-one nodes can be drawn as a rooted tree whose root is replaced by a directed circle, a so-called root circle. This is true since any such digraph can have at most one directed circle, and all other nodes connected to this circle should be directed away from it; otherwise $\mathcal{G}$ would have to include a node of degree two or higher.
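A small simulation of the root circle in Example 1 under the rule (15) is given below (our own illustrative code: the likelihoods are those of the table above, the circle orientation is one arbitrary choice, and no claim is made beyond reproducing the qualitative behavior described in the example):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 0  # the true state ("1" in the example), using 0-based indices

# Probability of observing signal 0 under each of the three states (table above).
p0 = np.array([[1/3, 1/3, 1/5],
               [1/2, 2/3, 1/2],
               [1/4, 1/4, 1/4]])

mu = np.full((3, 3), 1/3)   # uniform common prior over the three states
nbr = [2, 0, 1]             # agent i's unique neighbor on one orientation of the circle

for t in range(5000):
    new_mu = np.empty_like(mu)
    for i in range(3):
        s1 = rng.random() >= p0[i, theta]     # signal is 1 w.p. 1 - p0[i, theta]
        lik = (1 - p0[i]) if s1 else p0[i]    # l_i(s|.) as a vector over the states
        post = mu[nbr[i]] * lik               # eq. (15) with the neighbor's belief
        new_mu[i] = post / post.sum()
    mu = new_mu

# mu[:, 0] should approach 1: the circle pools the agents' complementary abilities.
```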
In [86] the authors show the application of the update rule (15) to general strongly connected topologies, where agents have more than a single neighbor in their neighborhoods. There, each agent i chooses a neighbor j ∈ N(i) independently at random at every time step and then applies (15) with the belief reported by that neighbor. Here again, if the truth is globally identifiable, all agents learn the truth at an asymptotically exponentially fast rate given by

min_{θ̌∈Θ∖{θ}} Σ_{j=1}^{n} π_j D_KL( ℓ_j(·|θ) ‖ ℓ_j(·|θ̌) ),

where the π_j are the probabilities in the stationary distribution of the Markov chain whose transition probabilities coincide with the probabilities for the random choice of neighbors at every point in time.¹ It is notable that the asymptotic rate here is a weighted average of the KL distances D_KL( ℓ_i(·|θ) ‖ ℓ_i(·|θ̌) ), in contrast with the arithmetic (unweighted) mean (1/n)R_n that arises in the circular case. Both rates are upper bounded by the centralized Bayesian learning rate R_n calculated in (13).

[Footnote 1: In many distributed learning models over random and switching networks, agents must have positive self-reliance at all times, as for instance in gossip algorithms [87] and ergodic stationary processes [88]. This condition is relaxed under (15), as our agents rely entirely on the beliefs of their neighbors every time they select a neighbor to communicate with. Moreover, unlike the majority of results that rely on the convergence properties of products of stochastic matrices and are applicable only to irreducible and aperiodic communication matrices, cf. [10, Proposition 1], the convergence results in [86] do not require the transition probability matrix to be aperiodic, as they rely on properties of ergodic Markov chains and hold true for any irreducible, finite-state chain [89, Theorems 1.5.6 and 1.7.7].]

Finally, we point out that the rate of distributed learning upper bounds the (weighted) average of the individual learning rates. This is because the observations of different agents complement each other: while one agent may be good at distinguishing one false state from the truth, she can rely on the observational abilities of other agents for distinguishing the remaining false states. Consider agents 1 and 2 in Example 1: per the likelihood table, agent 1 can distinguish θ̂ = 3 from θ = 1, while agent 2 is good at distinguishing θ̂ = 2 from θ = 1; together they can distinguish all states. Hence, the overall rate of distributed learning upper bounds the average of the individual learning rates and is itself upper bounded by the learning rate of a central Bayesian agent:

(1/n) Σ_{i=1}^{n} min_{θ̌∈Θ∖{θ}} D_KL( ℓ_i(·|θ) ‖ ℓ_i(·|θ̌) ) < min_{θ̌∈Θ∖{θ}} (1/n) Σ_{i=1}^{n} D_KL( ℓ_i(·|θ) ‖ ℓ_i(·|θ̌) ) = (1/n) R_n < R_n.
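The strict ordering of the three rates is easy to check numerically for the root-circle agents of Example 1. The sketch below (illustrative, not from the paper) computes the Bernoulli KL divergences from the likelihood table: each individual rate is zero, since every agent is blind to at least one false state, yet the distributed rate (1/n)R_n is strictly positive and smaller than the centralized rate R_n.

```python
import numpy as np

def dkl(p, q):
    """KL divergence between two Bernoulli signal distributions, given P(s=0)."""
    p, q = np.array([p, 1 - p]), np.array([q, 1 - q])
    return float(np.sum(p * np.log(p / q)))

# P(s = 0 | theta) for the three root-circle agents of Example 1; true theta = 1.
L0 = {1: [1/3, 1/3, 1/5], 2: [1/2, 2/3, 1/2], 3: [1/4, 1/4, 1/4]}
false_states = [1, 2]                      # indices of theta-check = 2, 3

# Individual rates: alone, each agent fails to reject at least one false state.
indiv = [min(dkl(L0[i][0], L0[i][k]) for k in false_states) for i in L0]

# Centralized rate R_n: the false states are rejected jointly.
R_n = min(sum(dkl(L0[i][0], L0[i][k]) for i in L0) for k in false_states)

n = len(L0)
print("average individual rate:", np.mean(indiv))     # = 0 here
print("distributed (circle) rate (1/n) R_n:", R_n / n)
print("centralized Bayesian rate R_n:", R_n)
```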
Fixing the priors over time will not result in convergence of beliefs, except in very specific cases as discussed above. In [90] we investigate the properties of convergence and learning under the update rules in (10), where the priors ν_j(·) are replaced by time-varying distributions ξ_{i,j}(·, t) that parametrize the log-linear updating of the agents' beliefs over time. It is notable that the memoryless Bayesian update in (10) has a log-linear structure similar to the non-Bayesian update rules studied in the literature [91]–[95]; the roots of such geometric averaging of the neighboring beliefs can be traced to logarithmic opinion pools [96], [97], and it can also be justified under specific behavioral assumptions [53].

V. CONCLUSIONS

This work addressed a social and observational learning model in social networks. Agents attempt to learn an unknown state of the world that belongs to a finite state space. Conditioned on the true state, a sequence of i.i.d. private signals is generated and observed by each agent of the network. The private signals do not provide any single agent with adequate information to identify the truth. Hence, agents interact with their neighbors to augment their imperfect observations with those of their neighbors. We proposed a belief aggregation and inference scheme that we call Bayesian without Recall (BWR), as a behavioral model to interpret and justify the variety of non-Bayesian update rules suggested in the literature. Accordingly, by replicating, for all future time steps, the rule that maps the initial priors, the neighbors' decisions, and the private signal to the Bayesian posterior at one time step, one can derive non-Bayesian updates with well-known and intuitive structures, such as majority rules or log-linear belief updates. Following the BWR approach, the complexities of fully rational inference at the forthcoming epochs are avoided, while some essential features of Bayesian inference are preserved. We analyzed the specific form of the BWR updates in two cases: a binary state and action space, and a finite state space with actions taken over the probability simplex. In the case of binary actions, the BWR updates take the form of a linear majority rule, whereas if the action spaces are rich enough for the agents to reveal their beliefs, then the belief updates take a log-linear form. In each case we investigated the properties of convergence, consensus, and learning; the latter is of particular interest in a strongly connected social network where the truth is identifiable through the aggregate private observations of all individuals, but not individually. On the one hand, the specific forms of the BWR update rules in each case help us better understand the mechanisms of naive inference when rational agents are deprived of their ability to make recollections. On the other hand, by comparing our predictions with the rational learning outcomes, our results also highlight the consequences of such naivety in shaping mass behavior. In particular, we saw in Subsection III-B that there is a positive probability for rational but memoryless agents in an Ising model to mislearn by reaching consensus on an untruth. However, Bayesian (fully rational) beliefs constitute a bounded martingale; hence, when the truth is identifiable and the number of observations increases, the beliefs of rational agents converge almost surely to a point mass centered at the true state [28], [29].
Similarly, Theorem 1 states the impossibility of asymptotic learning under the BWR belief updates whenever the spectral radius of the adjacency matrix of the interconnection graph is greater than one. Last but not least, we pinpoint a key difference between the BWR action and belief updates: the former are weighted updates, whereas the latter are unweighted, symmetric updates. Accordingly, an agent weighs each neighbor's action differently, in accordance with the quality of the private signals that she infers from the actions. On the other hand, when agents communicate their beliefs, the quality of each neighbor's signal is already internalized in her reported belief; hence, when incorporating her neighboring beliefs, an agent regards the reported beliefs of all her neighbors equally, irrespective of the quality of their private signals.

APPENDIX A
PROOF OF LEMMA 1: TIME-ONE BAYESIAN ACTIONS

Note that, given her observation of the private signal s_{i,0}, the posterior probability assigned by agent i to the state θ_1 is given by (1) with θ̂ = θ_1. We form a dichotomy of the signal space S_i of each agent into S_i^1 and S_i^{−1}, by setting S_i^1 := { s ∈ S_i : ℓ_i(s|θ_1) ν_i(θ_1) ≥ ℓ_i(s|θ_2) ν_i(θ_2) } and S_i^{−1} := S_i ∖ S_i^1. It thus follows from (6) that, for any j ∈ N(i), the observation a_{j,0} = 1 is equivalent to the information { s_{j,0} ∈ S_j^1 }, and a_{j,0} = −1 is equivalent to the information { s_{j,0} ∈ S_j^{−1} }. Thereby, the belief of agent i at time t = 1, given her observation of the actions of her neighbors and her private signal s_{i,0}, is given by

μ_{i,1}(θ_1) = [ ℓ_i(s_{i,0}|θ_1) ∏_{j∈N(i)} ( Σ_{s_j∈S_j^{a_{j,0}}} ℓ_j(s_j|θ_1) ) ν_i(θ_1) ] / [ Σ_{θ̂∈Θ} ℓ_i(s_{i,0}|θ̂) ∏_{j∈N(i)} ( Σ_{s_j∈S_j^{a_{j,0}}} ℓ_j(s_j|θ̂) ) ν_i(θ̂) ],

and we can thus form the ratio

μ_{i,1}(θ_1) / μ_{i,1}(θ_2) = [ ℓ_i(s_{i,0}|θ_1) ν_i(θ_1) / ℓ_i(s_{i,0}|θ_2) ν_i(θ_2) ] ∏_{j∈N(i)} [ Σ_{s_j∈S_j^{a_{j,0}}} ℓ_j(s_j|θ_1) / Σ_{s_j∈S_j^{a_{j,0}}} ℓ_j(s_j|θ_2) ]
= [ ℓ_i(s_{i,0}|θ_1) ν_i(θ_1) / ℓ_i(s_{i,0}|θ_2) ν_i(θ_2) ] V_i ∏_{j∈N(i)} W_j^{a_{j,0}},    (16)

where for all i ∈ [n] we have defined

V_i = ∏_{j∈N(i)} [ ( Σ_{s_j∈S_j^1} ℓ_j(s_j|θ_1) / Σ_{s_j∈S_j^1} ℓ_j(s_j|θ_2) ) × ( Σ_{s_j∈S_j^{−1}} ℓ_j(s_j|θ_1) / Σ_{s_j∈S_j^{−1}} ℓ_j(s_j|θ_2) ) ]^{1/2},    (17)

W_i = [ ( Σ_{s_i∈S_i^1} ℓ_i(s_i|θ_1) / Σ_{s_i∈S_i^1} ℓ_i(s_i|θ_2) ) × ( Σ_{s_i∈S_i^{−1}} ℓ_i(s_i|θ_2) / Σ_{s_i∈S_i^{−1}} ℓ_i(s_i|θ_1) ) ]^{1/2}.    (18)

Furthermore, let w_i := log W_i and η_i := log( V_i ν_i(θ_1)/ν_i(θ_2) ) be constants that are determined completely by the initial priors and the signal structures of each agent and her neighbors. Subsequently, taking logarithms of both sides in (16) yields the following update rule for the log-ratio of the beliefs at time one:

log( μ_{i,1}(θ_1) / μ_{i,1}(θ_2) ) = Σ_{j∈N(i)} w_j a_{j,0} + η_i + λ_1(s_{i,0}).    (19)

Finally, we can apply (5) to derive the claimed expression in Lemma 1 for the updated Bayesian action of agent i following her observations of her neighbors' actions a_{j,0}, j ∈ N(i), and her own private signal s_{i,0}. We end our derivation by pointing out some facts concerning the constants η_i and w_i that appear in (19).
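Before stating these facts, the construction of w_i and η_i can be illustrated with a short numerical sketch (not from the paper; the two-agent signal structures and priors below are hypothetical). The code forms the dichotomy S_i^1, S_i^{−1}, evaluates (17) and (18), and then thresholds the time-one log-ratio (19) at zero to produce the Bayesian action.

```python
import numpy as np

# Hypothetical binary-signal structures: lik[i] = (l_i(.|theta_1), l_i(.|theta_2)),
# each indexed by s in {0, 1}; priors nu[i] = (nu_i(theta_1), nu_i(theta_2)).
lik = {1: ([0.7, 0.3], [0.4, 0.6]), 2: ([0.6, 0.4], [0.5, 0.5])}
nu = {1: (0.5, 0.5), 2: (0.5, 0.5)}

def split(i):
    """Dichotomy of S_i: S_i^1 holds signals with l(s|t1)nu(t1) >= l(s|t2)nu(t2)."""
    l1, l2 = lik[i]
    s_plus = [s for s in (0, 1) if l1[s] * nu[i][0] >= l2[s] * nu[i][1]]
    s_minus = [s for s in (0, 1) if s not in s_plus]
    return s_plus, s_minus            # both assumed non-empty for these numbers

def mass(i, theta_idx, s_set):
    return sum(lik[i][theta_idx][s] for s in s_set)

def w(i):
    """w_i = log W_i, with W_i given by (18)."""
    sp, sm = split(i)
    W_sq = (mass(i, 0, sp) / mass(i, 1, sp)) * (mass(i, 1, sm) / mass(i, 0, sm))
    return 0.5 * np.log(W_sq)

def eta(i, neighbors):
    """eta_i = log(V_i nu_i(t1)/nu_i(t2)), with V_i given by (17)."""
    logV = 0.0
    for j in neighbors:
        sp, sm = split(j)
        logV += 0.5 * np.log(mass(j, 0, sp) / mass(j, 1, sp))
        logV += 0.5 * np.log(mass(j, 0, sm) / mass(j, 1, sm))
    return logV + np.log(nu[i][0] / nu[i][1])

# Time-one log belief ratio of agent 1 after observing neighbor 2's action a_{2,0}
# and her own signal s_{1,0}, per (19); the action follows by thresholding at zero.
s10, a20 = 0, +1
lam1 = np.log(lik[1][0][s10] / lik[1][1][s10])
log_ratio = w(2) * a20 + eta(1, [2]) + lam1
print("a_{1,1} =", 1 if log_ratio >= 0 else -1)
print("w_1, w_2 nonnegative (Lemma 4):", w(1) >= 0 and w(2) >= 0)
```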
Lemma 3 (Monotonicity of η_i). Consider any i ∈ [n] and fix a signal s_j ∈ S_j for some j ∈ N(i). Then the constant η_i is increasing in ℓ_j(s_j|θ_1) and decreasing in ℓ_j(s_j|θ_2).

Proof. The claim follows directly from the defining relation η_i = log( ν_i(θ_1)/ν_i(θ_2) ) + log V_i, as replacing from (17) yields

log V_i = (1/2) Σ_{j∈N(i)} log Σ_{s_j∈S_j^1} ℓ_j(s_j|θ_1) + (1/2) Σ_{j∈N(i)} log Σ_{s_j∈S_j^{−1}} ℓ_j(s_j|θ_1) − (1/2) Σ_{j∈N(i)} log Σ_{s_j∈S_j^1} ℓ_j(s_j|θ_2) − (1/2) Σ_{j∈N(i)} log Σ_{s_j∈S_j^{−1}} ℓ_j(s_j|θ_2).    (20)

The proof now follows upon realizing that, for any fixed s_j ∈ S_j, j ∈ N(i), the term ℓ_j(s_j|θ_1) appears in exactly one of the first two sums, which enter (20) with a plus sign, and the term ℓ_j(s_j|θ_2) appears in exactly one of the last two sums, which enter (20) with a minus sign. Hence, all else kept constant, log V_i, and subsequently η_i, is increasing in ℓ_j(s_j|θ_1) and decreasing in ℓ_j(s_j|θ_2). ∎

Lemma 4 (Positivity of w_i). For any i ∈ [n], it holds that w_i ≥ 0.

Proof. First note from the definitions of the sets S_i^1 and S_i^{−1} that

∀s ∈ S_i^1: ℓ_i(s|θ_1)/ℓ_i(s|θ_2) ≥ ν_i(θ_2)/ν_i(θ_1), and ∀s ∈ S_i^{−1}: ℓ_i(s|θ_2)/ℓ_i(s|θ_1) > ν_i(θ_1)/ν_i(θ_2).

Next, we sum the numerators and the denominators of the likelihood ratios of the signals in each of the sets S_i^1 and S_i^{−1}; since each ratio obeys the respective bound, so does the ratio of the sums (the mediant inequality), yielding

Σ_{s∈S_i^1} ℓ_i(s|θ_1) / Σ_{s∈S_i^1} ℓ_i(s|θ_2) ≥ ν_i(θ_2)/ν_i(θ_1), and Σ_{s∈S_i^{−1}} ℓ_i(s|θ_2) / Σ_{s∈S_i^{−1}} ℓ_i(s|θ_1) > ν_i(θ_1)/ν_i(θ_2).

Subsequently, replacing from (18) yields

W_i² = ( Σ_{s∈S_i^1} ℓ_i(s|θ_1) / Σ_{s∈S_i^1} ℓ_i(s|θ_2) ) × ( Σ_{s∈S_i^{−1}} ℓ_i(s|θ_2) / Σ_{s∈S_i^{−1}} ℓ_i(s|θ_1) ) ≥ ( ν_i(θ_2)/ν_i(θ_1) ) × ( ν_i(θ_1)/ν_i(θ_2) ) = 1,

and the proof follows from the defining relation w_i := log W_i ≥ 0. ∎

APPENDIX B
A MARKOV CHAIN ON THE BOOLEAN CUBE

To begin, for any vertex a := (a_1, ..., a_n)ᵀ ∈ {±1}ⁿ of the Boolean hypercube and each agent i, define the function π_i : {±1}ⁿ → [0, 1] as

π_i(a) := P{ a_{i,t+1} = +1 | a_t = a } = P_θ{ −λ_1(s_{i,t+1}) ≤ Σ_{j∈N(i)} w_j a_j + η_i }.

The transition probabilities for the Markov chain of action profiles on the Boolean hypercube are then given by

P(a′, a) := P{ a_{t+1} = a′ | a_t = a } = ∏_{i: a′_i = +1} π_i(a) ∏_{i: a′_i = −1} ( 1 − π_i(a) ),    (21)

for all t ∈ ℕ₀ and any pair of vertices a′ := (a′_1, ..., a′_n)ᵀ ∈ {±1}ⁿ and a ∈ {±1}ⁿ. It follows from the classification of states and chains in [98, Section 2.4] that {±1}ⁿ can be partitioned into transient communication classes C′_1, ..., C′_{r′} and recurrent (ergodic) communication classes C_1, ..., C_r. Moreover, as t → ∞, a_t almost surely belongs to ∪_{i∈[r]} C_i. It is further true that if a_{t₀} ∈ C_i for some i ∈ [r] and t₀ ∈ ℕ, then a_t ∈ C_i almost surely for all t ≥ t₀: the process will almost surely leave the set of transient action profiles ∪_{i∈[r′]} C′_i and will remain in the first recurrent set that it reaches, before reaching any other. Let r* ∈ [r] be the random index of the first ergodic set of action profiles reached by the Markov chain { a_t, t ∈ ℕ₀ }; set τ := card( C_{r*} ) and denote C_{r*} := { a*_1, ..., a*_τ }.
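As an illustration, the following sketch assembles the transition matrix (21) for a hypothetical three-agent example; the weights, biases, signal probabilities, and neighborhoods below are all made-up constants (with w_i ≥ 0, as Lemma 4 requires). With these numbers the two consensus profiles come out absorbing, i.e., singleton recurrent classes, echoing the Ising-model consensus discussion in the conclusions.

```python
import itertools
import numpy as np

# Hypothetical 3-agent setting: lam[i][s] = lambda_1(s) for s in {0, 1},
# p[i][s] = P_theta{s_{i,t} = s}, weights w_i >= 0, biases eta_i, and
# directed neighborhoods N; all constants here are illustrative.
lam = np.array([[0.8, -0.9], [0.5, -0.6], [0.3, -0.4]])
p = np.array([[0.6, 0.4], [0.55, 0.45], [0.5, 0.5]])
w = np.array([1.5, 1.2, 1.0])
eta = np.array([0.05, -0.05, 0.0])
N = {0: [1, 2], 1: [0], 2: [0, 1]}

def pi(i, a):
    """pi_i(a) = P_theta{ -lambda_1(s_{i,t+1}) <= sum_{j in N(i)} w_j a_j + eta_i }."""
    thr = sum(w[j] * a[j] for j in N[i]) + eta[i]
    return sum(p[i][s] for s in (0, 1) if -lam[i][s] <= thr)

cube = list(itertools.product([1, -1], repeat=3))
P = np.zeros((8, 8))      # P[k, m] = P{a_{t+1} = cube[k] | a_t = cube[m]}, as in (21)
for m, a in enumerate(cube):
    for k, a_next in enumerate(cube):
        pr = 1.0
        for i in range(3):
            pr *= pi(i, a) if a_next[i] == 1 else 1.0 - pi(i, a)
        P[k, m] = pr

print(np.allclose(P.sum(axis=0), 1.0))                       # columns are distributions
print([cube[k] for k in range(8) if np.isclose(P[k, k], 1)]) # absorbing profiles
```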
The asymptotic behavior of the process can now be characterized as follows.

Proposition 2 (Asymptotic Distribution of Action Profiles). Let p := (p_1, ..., p_τ)ᵀ be the stationary distribution over C_{r*}, which uniquely satisfies p_k = Σ_{j=1}^{τ} P(a*_k, a*_j) p_j for all k ∈ [τ]. Then P{ lim_{t→∞} a_t = a*_k } = p_k for all k ∈ [τ]. □

APPENDIX C
PROOF OF PROPOSITION 1: EQUILIBRIUM ACTION PROFILES

Any equilibrium a* := (a*_1, ..., a*_n) of (7) must satisfy a*_i = sign( Σ_{j∈N(i)} w_j a*_j + η_i + λ_1(s_{i,t}) ) with probability one, for all i and t. Hence, a* ∈ {±1}ⁿ is an equilibrium of (7) if, and only if, −λ_1(s_i) ≤ Σ_{j∈N(i)} w_j a*_j + η_i for all s_i ∈ S_i whenever a*_i = 1, and −λ_1(s_i) ≥ Σ_{j∈N(i)} w_j a*_j + η_i for all s_i ∈ S_i whenever a*_i = −1. Multiplying both sides of the inequalities by a*_i in each case and reordering the terms, we derive the claimed characterization of the equilibria, i.e., of the absorbing states under the action update dynamics in (7).¹ ∎

[Footnote 1: Here, as well as in writing the conditions for the case of a*_i = −1 as non-strict inequalities, we have violated our earlier convention that agents choose +1 when they are indifferent between +1 and −1. Instead, we are assuming that ties are broken in favor of the equilibrium action profile. This assumption facilitates a compact expression of the characterizing conditions for the equilibrium action profiles, and it has no effect except in pathological settings of the signal structures and priors leading to λ_1(s_i) + Σ_{j∈N(i)} w_j a*_j + η_i = 0 for some s_i ∈ S_i, i ∈ [n].]

APPENDIX D
PROOF OF LEMMA 2: TIME-ONE BAYESIAN BELIEFS

We begin by applying Bayes rule to the observations of agent i at time 1, which include her neighbors' initial beliefs { μ_{j,0}(·); j ∈ N(i) } as well as her private signal s_{i,0}. Accordingly, for any θ̂ ∈ Θ:

μ_{i,1}(θ̂) = P_{i,0}( θ̂ | s_{i,0}, { μ_{j,0}(·); j ∈ N(i) } )    (22)
= P_{i,0}( θ̂, s_{i,0}, { μ_{j,0}(·); j ∈ N(i) } ) / P_{i,0}( s_{i,0}, { μ_{j,0}(·); j ∈ N(i) } )
= P_{i,0}( θ̂, s_{i,0}, { μ_{j,0}(·); j ∈ N(i) } ) / Σ_{θ̃∈Θ} P_{i,0}( θ̃, s_{i,0}, { μ_{j,0}(·); j ∈ N(i) } ).

The succeeding steps follow those in [99] for the case of two communicating agents. For any j ∈ [n] and all π(·) ∈ ΔΘ, define the correspondence I_j : ΔΘ → P(S_j) and the function K_j : ΔΘ → ℝ by

I_j( π(·) ) = { s ∈ S_j : π(θ̂) = ν_j(θ̂) ℓ_j(s|θ̂) / Σ_{θ̃∈Θ} ν_j(θ̃) ℓ_j(s|θ̃), ∀θ̂ ∈ Θ },
K_j( π(·) ) = Σ_{s∈I_j(π(·))} Σ_{θ̃∈Θ} ν_j(θ̃) ℓ_j(s|θ̃).    (23)

In (23), I_j(π(·)) signifies the set of private signals of agent j that are consistent with observing the belief π(·) in that agent. By the same token, K_j(π(·)) in (23) is the ex-ante probability of the event that the private signal of agent j belongs to the set I_j(π(·)).
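For a finite signal space, I_j and K_j in (23) can be computed by direct enumeration, as in the following sketch (not from the paper; the tri-state world and the Bernoulli likelihoods are illustrative assumptions).

```python
import numpy as np

# Hypothetical finite world Theta = {0, 1, 2} with binary signals for agent j:
# lik_j[s][t] = l_j(s | theta = t), and a uniform prior nu_j.
nu_j = np.array([1/3, 1/3, 1/3])
lik_j = np.array([[0.5, 0.7, 0.2],      # l_j(s = 0 | theta)
                  [0.5, 0.3, 0.8]])     # l_j(s = 1 | theta)

def bayes_posterior(s):
    post = nu_j * lik_j[s]
    return post / post.sum()

def I_j(pi):
    """Signals of agent j consistent with an observed time-zero belief pi, per (23)."""
    return [s for s in (0, 1) if np.allclose(bayes_posterior(s), pi)]

def K_j(pi):
    """Ex-ante probability that agent j's signal lies in I_j(pi), per (23)."""
    return sum(float(nu_j @ lik_j[s]) for s in I_j(pi))

pi = bayes_posterior(1)        # a belief that agent j could actually report
print(I_j(pi), K_j(pi))        # -> [1] and the prior mass of the signal s = 1
```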
The terms P_{i,0}( θ̃, s_{i,0}, { μ_{j,0}(·); j ∈ N(i) } ) for θ̃ ∈ Θ, which appear in both the numerator and the denominator of (22), can be simplified by conditioning on the neighbors' observed signals { s_{j,0}; j ∈ N(i) }:

P_{i,0}( θ̃, s_{i,0}, { μ_{j,0}(·); j ∈ N(i) } ) = Σ_{s_j∈S_j, j∈N(i)} P( θ̃, s_{i,0}, { μ_{j,0}(·); j ∈ N(i) } | { s_{j,0} = s_j; j ∈ N(i) } ) × P_{i,0}( { s_{j,0} = s_j; j ∈ N(i) } ).    (24)

We next express P_{i,0}(·) in terms of the priors and the signal structures, leading to

P_{i,0}( θ̃, s_{i,0}, { μ_{j,0}(·); j ∈ N(i) } ) = Σ_{ s_j ∈ I_j(μ_{j,0}(·)), j∈N(i) } ν_i(θ̃) ℓ_i(s_{i,0}|θ̃) ∏_{j∈N(i)} ℓ_j(s_j|θ̃)
= [ ν_i(θ̃) ℓ_i(s_{i,0}|θ̃) / ∏_{j∈N(i)} ν_j(θ̃) ] ∏_{j∈N(i)} ( Σ_{s_j∈I_j(μ_{j,0}(·))} ν_j(θ̃) ℓ_j(s_j|θ̃) ).    (25)

Bayes rule in (1), together with the functions defined in (23), can now be used to eliminate the product terms involving s_j from (25), giving

P_{i,0}( θ̃, s_{i,0}, { μ_{j,0}(·); j ∈ N(i) } ) = [ ν_i(θ̃) ℓ_i(s_{i,0}|θ̃) / ∏_{j∈N(i)} ν_j(θ̃) ] ∏_{j∈N(i)} ( μ_{j,0}(θ̃) Σ_{s∈I_j(μ_{j,0}(·))} Σ_{θ̄∈Θ} ν_j(θ̄) ℓ_j(s|θ̄) )
= ν_i(θ̃) ℓ_i(s_{i,0}|θ̃) ( ∏_{j∈N(i)} μ_{j,0}(θ̃)/ν_j(θ̃) ) ∏_{j∈N(i)} K_j( μ_{j,0}(·) ).    (26)

Upon replacing (26) in (22), the product terms involving K_j(μ_{j,0}(·)) cancel out and (9) follows. ∎

APPENDIX E
PROOF OF THEOREM 1: NO LEARNING WHEN ρ > 1

We begin the analysis of the belief propagation under (10) by forming the ratio

μ_{i,t}(θ̌) / μ_{i,t}(θ) = [ ν_i(θ̌)/ν_i(θ) ] × [ ℓ_i(s_{i,t}|θ̌)/ℓ_i(s_{i,t}|θ) ] × ∏_{j∈N(i)} [ μ_{j,t−1}(θ̌)/μ_{j,t−1}(θ) ] × [ ν_j(θ)/ν_j(θ̌) ],

for any false state θ̌ ∈ Θ∖{θ} and each agent i ∈ [n] at all times t ∈ ℕ. The above has the advantage of removing the normalization factor in the denominator from the picture, focusing instead on the evolution of belief ratios, which has a log-linear format. The latter motivates definitions of log-likelihood ratios for signals, beliefs, and priors, as follows. Similarly to λ_θ̌(s_{i,t}) and φ_{i,t}(θ̌), define the log-ratios of the prior beliefs as γ_i(θ̌) := log( ν_i(θ̌)/ν_i(θ) ). Starting from the above iteration for the belief ratios and taking logarithms of both sides yields

φ_{i,t}(θ̌) = γ_i(θ̌) + λ_θ̌(s_{i,t}) + Σ_{j∈N(i)} ( φ_{j,t−1}(θ̌) − γ_j(θ̌) ).    (27)

Multiplying both sides of (27) by α_i, the centrality of agent i, and summing over all i ∈ [n] yields

Φ_t(θ̌) = Σ_{i=1}^{n} α_i γ_i(θ̌) + Σ_{i=1}^{n} α_i λ_θ̌(s_{i,t}) + Σ_{i=1}^{n} α_i Σ_{j∈N(i)} ( φ_{j,t−1}(θ̌) − γ_j(θ̌) ),    (28)

where Λ_t(θ̌) := Σ_{i=1}^{n} α_i λ_θ̌(s_{i,t}) denotes the second term. First note that we can write

Σ_{i=1}^{n} α_i γ_i(θ̌) − Σ_{i=1}^{n} α_i Σ_{j∈N(i)} γ_j(θ̌) = αᵀ( I − A ) γ(θ̌) = (1 − ρ) β(θ̌),    (29)

where γ(θ̌) := ( γ_1(θ̌), ..., γ_n(θ̌) )ᵀ. Next, note that by the choice of α as the left eigenvector associated with the eigenvalue ρ of the matrix A, i.e., αᵀA = ραᵀ, we get

Σ_{i=1}^{n} α_i Σ_{j∈N(i)} φ_{j,t−1}(θ̌) = αᵀ A φ_{t−1}(θ̌) = ρ αᵀ φ_{t−1}(θ̌) = ρ Φ_{t−1}(θ̌),    (30)

where φ_t(θ̌) := ( φ_{1,t}(θ̌), ..., φ_{n,t}(θ̌) )ᵀ. Now, replacing (29) and (30) in (28) yields the following recursion for Φ_t(θ̌):

Φ_t(θ̌) = Λ_t(θ̌) + ρ Φ_{t−1}(θ̌) + (1 − ρ) β(θ̌),    (31)

initialized by Φ_0(θ̌) = β(θ̌) + Λ_0(θ̌), where β(θ̌) := Σ_{i=1}^{n} α_i log( ν_i(θ̌)/ν_i(θ) ) is a constant determined by the initial prior beliefs; it measures the total bias in the network between the two states θ̌ and θ. In particular, if the agents are unbiased, starting from uniform priors on Θ, then β(θ̌) = 0 for all θ̌ ∈ Θ. Note also that the assumption of full-support priors implies that |β(θ̌)| is finite. By iterating (31) for t ∈ ℕ we obtain (14).
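The passage from the network iteration (27) to the scalar recursion (31) can be verified numerically. The sketch below (illustrative, not from the paper) builds a small strongly connected adjacency matrix, extracts its spectral radius ρ and the associated left eigenvector α, and checks step by step that Φ_t = αᵀφ_t computed from (27) obeys (31); the graph and the Gaussian log-ratios are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)

# A small strongly connected digraph (a directed circle plus one extra link),
# encoded as A[i, j] = 1 iff j is in N(i); rho and alpha are its spectral
# radius and the associated left eigenvector, so that alpha^T A = rho alpha^T.
A = np.array([[0, 0, 1], [1, 0, 0], [1, 1, 0]], dtype=float)
vals, vecs = np.linalg.eig(A.T)
k = np.argmax(vals.real)
rho = vals[k].real
alpha = np.abs(vecs[:, k].real)

gamma = rng.normal(size=3)                  # log prior ratios gamma_i(theta_check)
beta = float(alpha @ gamma)                 # network bias beta(theta_check)

phi = gamma + rng.normal(size=3)            # phi_{i,0} = gamma_i + lambda(s_{i,0})
Phi = float(alpha @ phi)
for t in range(1, 6):
    lam = rng.normal(size=3)                # fresh log-likelihood ratios lambda(s_{i,t})
    phi = gamma + lam + A @ (phi - gamma)   # the network iteration (27)
    Phi = float(alpha @ lam) + rho * Phi + (1 - rho) * beta   # the recursion (31)
    assert np.isclose(alpha @ phi, Phi)     # the two evolutions agree

print("rho =", round(rho, 4), "> 1; recursion (31) matches (27) exactly")
```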
Next, note that in a strongly connected graph every node has degree greater than or equal to one, so that ρ ≥ 1 [84, Chapter 2]. If ρ > 1, then the term ρᵗ Λ_0(θ̌) increases in variance as t → ∞, and unless Λ_0(θ̌) < ε with P_θ-probability one for some ε < 0, almost sure convergence of Φ_t(θ̌) in (14) to −∞ cannot hold. ∎

REFERENCES

[1] J. Surowiecki, The Wisdom of Crowds. Knopf Doubleday Publishing Group, 2005.
[2] F. Galton, "Vox populi (the wisdom of crowds)," Nature, vol. 75, no. 7, pp. 450–451, 1907.
[3] M. T. Maloney and J. H. Mulherin, "The complexity of price discovery in an efficient market: the stock market reaction to the Challenger crash," Journal of Corporate Finance, vol. 9, no. 4, pp. 453–479, 2003.
[4] I. L. Janis, Groupthink: Psychological Studies of Policy Decisions and Fiascoes. Houghton Mifflin, Boston, 1982.
[5] K. Eliaz, D. Ray, and R. Razin, "Choice shifts in groups: A decision-theoretic basis," The American Economic Review, pp. 1321–1332, 2006.
[6] J. A. Stoner, "Risky and cautious shifts in group decisions: The influence of widely held values," Journal of Experimental Social Psychology, vol. 4, no. 4, pp. 442–459, 1968.
[7] D. J. Isenberg, "Group polarization: a critical review and meta-analysis," Journal of Personality and Social Psychology, vol. 50, no. 6, p. 1141, 1986.
[8] N. Roux and J. Sobel, "Group polarization in a model of information aggregation," American Economic Journal: Microeconomics, vol. 7, no. 4, pp. 202–232, 2015.
[9] P. M. DeMarzo, D. Vayanos, and J. Zwiebel, "Persuasion bias, social influence, and unidimensional opinions," The Quarterly Journal of Economics, vol. 118, pp. 909–968, 2003.
[10] B. Golub and M. O. Jackson, "Naïve learning in social networks and the wisdom of crowds," American Economic Journal: Microeconomics, vol. 2, no. 1, pp. 112–149, Feb. 2010.
[11] V. Krishnamurthy and W. Hoiles, "Online reputation and polling systems: Data incest, social learning, and revealed preferences," IEEE Transactions on Computational Social Systems, vol. 1, no. 3, pp. 164–179, 2014.
[12] D. Acemoglu and A. Ozdaglar, "Opinion dynamics and learning in social networks," Dynamic Games and Applications, vol. 1, no. 1, pp. 3–49, 2011.
[13] M. O. Jackson, Social and Economic Networks. Princeton, NJ: Princeton University Press, 2008.
[14] S. Goyal, Connections: An Introduction to the Economics of Networks. Princeton University Press, 2012.
[15] C. P. Chamley, Rational Herds: Economic Models of Social Learning. Cambridge University Press, 2004.
[16] V. Krishnamurthy and H. V. Poor, "Social learning and Bayesian games in multiagent signal processing: How do local and global decision makers interact?" IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 43–57, 2013.
[17] Y. Wang and P. Djuric, "Social learning with Bayesian agents and random decision making," IEEE Transactions on Signal Processing, vol. 63, no. 12, pp. 3241–3250, 2015.
[18] V. Krishnamurthy, O. N. Gharehshiran, and M. Hamdi, "Interactive sensing and decision making in social networks," Foundations and Trends in Signal Processing, vol. 7, no. 1–2, pp. 1–196, Apr. 2014.
[19] E. Mossel, A. Sly, and O. Tamuz, "Asymptotic learning on Bayesian social networks," Probability Theory and Related Fields, vol. 158, no. 1–2, pp. 127–157, 2014.
[20] A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi, "Non-Bayesian social learning," Games and Economic Behavior, vol. 76, no. 1, pp. 210–225, 2012.
[21] E. M. Rogers, Diffusion of Innovations, 5th ed. Simon and Schuster, 2003.
[22] V. Borkar and P. Varaiya, "Asymptotic agreement in distributed estimation," IEEE Transactions on Automatic Control, vol. 27, no. 3, pp. 650–655, Jun. 1982.
[23] J. N. Tsitsiklis and M. Athans, "Convergence and asymptotic agreement in distributed decision problems," IEEE Transactions on Automatic Control, vol. 29, no. 1, pp. 42–50, 1984.
[24] S. McLaughlin, V. Krishnamurthy, and S. Challa, "Managing data incest in a distributed sensor network," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'03), vol. 5. IEEE, 2003, pp. V-269.
[25] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. New York, NY, USA: Cambridge University Press, 2006.
[26] A. Jadbabaie, J. Lin, and A. S. Morse, "Coordination of groups of mobile autonomous agents using nearest neighbor rules," IEEE Transactions on Automatic Control, vol. 48, no. 6, pp. 988–1001, 2003.
[27] M. Mesbahi and M. Egerstedt, Graph Theoretic Methods in Multiagent Networks. Princeton University Press, 2010.
[28] D. Blackwell and L. Dubins, "Merging of opinions with increasing information," The Annals of Mathematical Statistics, vol. 33, pp. 882–886, 1962.
[29] E. Lehrer and R. Smorodinsky, "Merging and learning," Lecture Notes–Monograph Series, pp. 147–168, 1996.
[30] D. Acemoglu, M. A. Dahleh, I. Lobel, and A. Ozdaglar, "Bayesian learning in social networks," The Review of Economic Studies, vol. 78, no. 4, pp. 1201–1236, 2011.
[31] M. Mueller-Frank, "A general framework for rational learning in social networks," Theoretical Economics, vol. 8, no. 1, pp. 1–40, 2013.
[32] A. Wilson, "Bounded memory and biases in information processing," Econometrica, vol. 82, no. 6, pp. 2257–2294, 2014.
[33] T. M. Cover, "A note on the two-armed bandit problem with finite memory," Information and Control, vol. 12, no. 5, pp. 371–377, 1968.
[34] ——, "Hypothesis testing with finite statistics," The Annals of Mathematical Statistics, pp. 828–835, 1969.
[35] L. Kontorovich, "Statistical estimation with bounded memory," Statistics and Computing, vol. 22, no. 5, pp. 1155–1164, 2012.
[36] M. E. Hellman and T. M. Cover, "Learning with finite memory," The Annals of Mathematical Statistics, pp. 765–782, 1970.
[37] T. M. Cover, M. A. Freedman, and M. E. Hellman, "Optimal finite memory learning algorithms for the finite sample problem," Information and Control, vol. 30, no. 1, pp. 49–85, 1976.
[38] T. M. Cover and M. E. Hellman, "The two-armed-bandit problem with time-invariant finite memory," IEEE Transactions on Information Theory, vol. 16, no. 2, pp. 185–195, 1970.
[39] K. Drakopoulos, A. Ozdaglar, and J. N. Tsitsiklis, "On learning with finite memory," IEEE Transactions on Information Theory, vol. 59, no. 10, pp. 6859–6872, 2013.
[40] M. H. DeGroot, "Reaching a consensus," Journal of the American Statistical Association, vol. 69, pp. 118–121, 1974.
[41] A. Jadbabaie, P. Molavi, and A. Tahbaz-Salehi, "Information heterogeneity and the speed of learning in social networks," Revise and Resubmit, Review of Economic Studies, 2013.
[42] V. Bala and S. Goyal, "Learning from neighbours," The Review of Economic Studies, vol. 65, no. 3, pp. 595–621, 1998.
[43] E. Eyster and M. Rabin, "Naive herding in rich-information settings," American Economic Journal: Microeconomics, vol. 2, no. 4, pp. 221–243, 2010.
[44] ——, "Extensive imitation is irrational and harmful," The Quarterly Journal of Economics, p. qju021, 2014.
[45] V. Krishnamurthy, Partially Observed Markov Decision Processes. Cambridge University Press, 2016.
[46] M. Luca, "Reviews, reputation, and revenue: The case of Yelp.com," Harvard Business School NOM Unit Working Paper, no. 12-016, September 16, 2011.
[47] G. Gigerenzer and W. Gaissmaier, "Heuristic decision making," Annual Review of Psychology, vol. 62, pp. 451–482, 2011.
[48] G. Gigerenzer and D. G. Goldstein, "Reasoning the fast and frugal way: models of bounded rationality," Psychological Review, vol. 103, no. 4, p. 650, 1996.
[49] G. Gigerenzer and P. M. Todd, Simple Heuristics That Make Us Smart. Oxford University Press, USA, 1999.
[50] R. Hegselmann and U. Krause, "Opinion dynamics driven by various ways of averaging," Computational Economics, vol. 25, no. 4, pp. 381–405, 2005.
[51] V. Grimm and F. Mengel, "An experiment on belief formation in networks," Available at SSRN 2361007, 2014.
[52] A. G. Chandrasekhar, H. Larreguy, and J. P. Xandri, "Testing models of social learning on networks: Evidence from a lab experiment in the field," National Bureau of Economic Research, Tech. Rep., 2015.
[53] P. Molavi, A. Tahbaz-Salehi, and A. Jadbabaie, "Foundations of non-Bayesian social learning," Columbia Business School Research Paper, 2015.
[54] M. Mueller-Frank and C. Neri, "A general model of boundedly rational observational learning: Theory and experiment," Available at SSRN 2566210, 2015.
[55] J. S. B. Evans, "In two minds: dual-process accounts of reasoning," Trends in Cognitive Sciences, vol. 7, no. 10, pp. 454–459, 2003.
[56] D. Kahneman, Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011.
[57] ——, "Maps of bounded rationality: Psychology for behavioral economics," The American Economic Review, vol. 93, no. 5, pp. 1449–1475, 2003.
[58] I. Brocas and J. D. Carrillo, "Dual-process theories of decision-making: A selective survey," Journal of Economic Psychology, vol. 41, pp. 45–54, 2014.
[59] R. J. Aumann, "Agreeing to disagree," The Annals of Statistics, pp. 1236–1239, 1976.
[60] D. Fudenberg, The Theory of Learning in Games. MIT Press, 1998.
[61] H. P. Young, Strategic Learning and Its Limits. Oxford University Press, 2004.
[62] D. Rosenberg, E. Solan, and N. Vieille, "Informational externalities and emergence of consensus," Games and Economic Behavior, vol. 66, no. 2, pp. 979–994, 2009.
[63] J.-F. Mertens and S. Zamir, "Formulation of Bayesian analysis for games with incomplete information," International Journal of Game Theory, vol. 14, no. 1, pp. 1–29, 1985.
[64] A. Brandenburger and E. Dekel, "Hierarchies of beliefs and common knowledge," Journal of Economic Theory, vol. 59, no. 1, pp. 189–198, 1993.
[65] R. O'Donnell, Analysis of Boolean Functions. Cambridge University Press, 2014.
[66] I. Benjamini, G. Kalai, and O. Schramm, "Noise sensitivity of boolean functions and applications to percolation," Publications Mathématiques de l'Institut des Hautes Études Scientifiques, vol. 90, no. 1, pp. 5–43, 1999.
[67] E. Mossel, R. O'Donnell, and K. Oleszkiewicz, "Noise stability of functions with low influences: invariance and optimality," 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 21–30, 2005.
[68] Y. Peres, "Noise stability of weighted majority," arXiv preprint math/0412377, 2004.
[69] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," The Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115–133, 1943.
[70] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, no. 8, pp. 2554–2558, 1982.
[71] D. A. Levin, Y. Peres, and E. L. Wilmer, Markov Chains and Mixing Times. American Mathematical Society, 2009.
[72] F. Martinelli, Lectures on Glauber Dynamics for Discrete Spin Models. Springer, 1999.
[73] M. Ostilli, E. Yoneki, I. X. Y. Leung, J. F. F. Mendes, P. Lió, and J. Crowcroft, "Statistical mechanics of rumour spreading in network communities," Procedia Computer Science, vol. 1, no. 1, pp. 2331–2339, 2010.
[74] J.-P. Nadal, D. Phan, M. B. Gordon, and J. Vannimenus, "Multiple equilibria in a monopoly market with heterogeneous agents and externalities," Quantitative Finance, vol. 5, no. 6, pp. 557–568, 2005.
[75] M. H. Afrasiabi, R. Guerin, and S. S. Venkatesh, "Opinion formation in Ising networks," in Information Theory and Applications Workshop. IEEE, 2013, pp. 1–10.
[76] D. V. Lindley, Introduction to Probability and Statistics from a Bayesian Viewpoint. Part I: Probability; Part II: Inference. CUP Archive, 1965.
[77] M. H. DeGroot, Optimal Statistical Decisions, ser. Wiley Classics Library. New York: McGraw-Hill, 1969.
[78] L. J. Savage, The Foundations of Statistics. Courier Dover Publications, 1972.
[79] J. O. Berger, Statistical Decision Theory and Bayesian Analysis. Springer, 1985.
[80] P. Billingsley, Probability and Measure, 3rd ed. Wiley-Interscience, 1995.
[81] M. A. Rahimian and A. Jadbabaie, "Learning without recall in directed circles and rooted trees," American Control Conference, pp. 4222–4227, 2015.
[82] T. M. Cover and J. A. Thomas, Elements of Information Theory, ser. A Wiley-Interscience Publication. Wiley, 2006.
[83] E. Seneta, Non-negative Matrices and Markov Chains. Springer, 2006.
[84] R. A. Brualdi, The Mutually Beneficial Relationship of Graphs and Matrices. American Mathematical Society, 2011, no. 115.
[85] T. Gagnon-Bartsch and M. Rabin, "Naive social learning, mislearning, and unlearning," Working Paper.
[86] M. A. Rahimian, S. Shahrampour, and A. Jadbabaie, "Learning without recall by random walks on directed graphs," IEEE Conference on Decision and Control (CDC), 2015.
[87] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2508–2530, 2006.
[88] A. Tahbaz-Salehi and A. Jadbabaie, "Consensus over ergodic stationary graph processes," IEEE Transactions on Automatic Control, vol. 55, no. 1, pp. 225–230, 2010.
[89] J. R. Norris, Markov Chains. Cambridge University Press, 1999.
[90] M. A. Rahimian and A. Jadbabaie, "Learning without recall: A case for log-linear learning," 5th IFAC Workshop on Distributed Estimation and Control in Networked Systems, 2015.
[91] S. Shahrampour and A. Jadbabaie, "Exponentially fast parameter estimation in networks using distributed dual averaging," 52nd IEEE Conference on Decision and Control (CDC), pp. 6196–6201, 2013.
[92] K. Rahnama Rad and A. Tahbaz-Salehi, "Distributed parameter estimation in networks," 49th IEEE Conference on Decision and Control (CDC), pp. 5050–5055, 2010.
[93] A. Nedić, A. Olshevsky, and C. A. Uribe, "Nonasymptotic convergence rates for cooperative learning over time-varying directed graphs," arXiv preprint arXiv:1410.1977, 2014.
[94] A. Lalitha, A. Sarwate, and T. Javidi, "Social learning and distributed hypothesis testing," IEEE International Symposium on Information Theory, pp. 551–555, 2014.
[95] S. Bandyopadhyay and S.-J. Chung, "Distributed estimation using Bayesian consensus filtering," American Control Conference, pp. 634–641, 2014.
[96] G. L. Gilardoni and M. K. Clayton, "On reaching a consensus using DeGroot's iterative pooling," The Annals of Statistics, pp. 391–401, 1993.
[97] M. J. Rufo, J. Martin, C. J. Pérez et al., "Log-linear pool to combine prior distributions: A suggestion for a calibration-based approach," Bayesian Analysis, vol. 7, no. 2, pp. 411–438, 2012.
[98] J. G. Kemeny and J. L. Snell, Finite Markov Chains. New York: Van Nostrand, 1960.
[99] M. A. Rahimian, P. Molavi, and A. Jadbabaie, "(Non-)Bayesian learning without recall," IEEE Conference on Decision and Control (CDC), pp. 5730–5735, 2014.
