A New Approach to Distributed Hypothesis Testing and Non-Bayesian Learning: Improved Learning Rate and Byzantine-Resilience

Aritra Mitra, John A. Richards, and Shreyas Sundaram∗

July 9, 2019

Abstract

We study a setting where a group of agents, each receiving partially informative private signals, seek to collaboratively learn the true underlying state of the world (from a finite set of hypotheses) that generates their joint observation profiles. To solve this problem, we propose a distributed learning rule that differs fundamentally from existing approaches in that it does not employ any form of "belief-averaging". Instead, agents update their beliefs based on a min-rule. Under standard assumptions on the observation model and the network structure, we establish that each agent learns the truth asymptotically almost surely. As our main contribution, we prove that with probability 1, each false hypothesis is ruled out by every agent exponentially fast, at a network-independent rate that is strictly larger than existing rates. We then develop a computationally efficient variant of our learning rule that is provably resilient to agents who do not behave as expected (as represented by a Byzantine adversary model) and deliberately try to spread misinformation.

1 Introduction

Given noisy data, the task of making meaningful inferences about a quantity of interest is at the heart of various complex estimation and detection problems arising in signal processing, information theory, machine learning, and control systems. When the information required to solve such problems is dispersed over a network, several interesting questions arise. How should the individual entities in the network combine their own private observations with the information received from neighbors to learn the quantity of interest?
What are the minimal requirements on the information structure of the entities and the topology of the network for this to happen? How fast does information spread as a function of the diffusion rule and the structure of the network? What can be said when the underlying network changes with time and/or certain entities deviate from nominal behavior? In this paper, we provide rigorous theoretical answers to such questions for the setting where a group of agents receive a stream of private signals generated by an unknown quantity known as the "true state of the world". Communication among such agents is modeled by a graph. The goal of each agent is to eventually identify the true state from a finite set of hypotheses. However, while the collective signals across all agents might facilitate identification of the true state, signals received by any given agent may, in general, not be rich enough for identifying the state in isolation. Thus, the problem of interest is to develop and analyze local interaction rules that facilitate inference of the true state at every agent.

∗ A. Mitra and S. Sundaram are with the School of Electrical and Computer Engineering at Purdue University. J. A. Richards is with Sandia National Laboratories. Email: {mitra14, sundara2}@purdue.edu, jaricha@sandia.gov. This work was supported in part by NSF CAREER award 1653648, and by the Laboratory Directed Research and Development program at Sandia National Laboratories. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
The setup described above serves as a common mathematical abstraction for modeling and analyzing various decision-making problems in social and economic networks (e.g., opinion formation and spreading), and classification/detection problems arising in large-scale engineered systems (e.g., object recognition by a group of aerial robots). While the former is typically studied under the moniker of non-Bayesian social learning, the latter usually goes by the name of distributed detection/hypothesis testing. In what follows, we discuss relevant literature.

Related Literature: Much of the earlier work on this topic assumed the existence of a centralized fusion center for performing computational tasks [1–3]. Our work, however, belongs to a more recent body of literature wherein individual agents are endowed with computational capabilities, and interactions among them are captured by a graph [4–14]. These works are essentially inspired by the model in [4], where each agent maintains a belief vector (over the set of hypotheses) that is sequentially updated as the convex combination of its own Bayesian posterior and the priors of its neighbors. Subsequent approaches share a common theme: they typically involve a learning rule that combines a local Bayesian update with a consensus-based opinion pooling of neighboring beliefs. The key point of distinction among such rules stems from the specific manner in which neighboring opinions are aggregated. Specifically, linear opinion pooling is studied in [4–6], whereas log-linear opinion pooling is studied in [7–14]. Under appropriate conditions on the observation model and the network structure, each of these approaches enables every agent to learn the true state exponentially fast, with probability 1. The rate of convergence, however, depends on the specific nature of the learning rule.
Notably, finite-time concentration results are derived in [9–11], and a large-deviation analysis is conducted in [12, 13] for a broad class of distributions that generate the agents' observation profiles. Extensions to different types of time-varying graphs have also been considered in [6, 8–11]. In a recent paper [15], the authors go beyond specific functional forms of belief-update rules and, instead, adopt an axiomatic framework that identifies the fundamental factors responsible for social learning. We point out that belief-consensus algorithms on graphs have been studied prior to [4] as well, in [16, 17]. The model in [16, 17] differs from that in [4–14] in one key aspect: while in the former each agent has access to only one observation, the latter allows for an influx of new information into the network in the form of a time-series of observations at every agent.

Our Contributions: In light of the above developments, we now elaborate on the main contributions of this work.

1) A Novel Distributed Learning Rule: In [10, Section III], the authors explain that the commonly studied linear and log-linear forms of belief aggregation are specific instances of a more general class of opinion pooling known as g-Quasi-Linear Opinion pools (g-QLOP), introduced in [18]. Our first contribution is the development of a novel belief-update rule that deviates fundamentally from the broad family of g-QLOP learning rules. Specifically, the learning algorithm that we propose in Section 3 does not rely on any linear consensus-based belief-aggregation protocol. Instead, each agent maintains two sets of belief vectors: a local belief vector and an actual belief vector. Each agent updates its local belief vector in a Bayesian manner based on only its private observations, i.e., without the influence of neighbors.
The actual belief on each hypothesis is updated (up to normalization) as the minimum of the agent's own local belief and the actual beliefs of its neighbors on that particular hypothesis. We provide theoretical guarantees on the performance of this algorithm in Section 4. As we explain later in the paper, establishing such guarantees requires proof techniques that differ substantially from existing ones.

2) Strict Improvement in Rate of Learning: While data-aggregation via arithmetic or geometric averaging of neighboring beliefs allows asymptotic learning, such schemes may potentially dilute the rate at which false hypotheses are eliminated. In particular, for the linear consensus protocol introduced in [4], the limiting rate at which a particular false hypothesis is eliminated is almost surely upper-bounded by a quantity that depends on the relative entropies and centralities of the agents [5]. The log-linear rules in [9–13] improve upon this rate: with probability 1, the asymptotic rate of rejection of a false hypothesis under such rules is a convex combination of the agents' relative entropies, where the convex weights correspond to the eigenvector centralities of the agents. In contrast, based on our approach, each false hypothesis is rejected by every agent exponentially fast, at a rate that is almost surely lower-bounded by the best relative entropy (between the true state and the false hypothesis) among all agents, provided the underlying network is static and strongly-connected. In Theorem 1, we show that the above result continues to hold even when the network changes with time, as long as a mild joint strong-connectivity condition is met. Thus, to the best of our knowledge, our approach leads to a strict improvement in the rate of learning over all existing approaches: this constitutes our main contribution.
3) Resilience to Adversaries: Despite the wealth of literature on distributed inference, there is limited understanding of the impact of misbehaving agents who do not follow the prescribed learning algorithm. Such agents may represent stubborn individuals or ideological extremists in the context of a social network, or model faults (either benign or malicious) in a networked control system. In the presence of such misbehaving entities, how should the remaining agents process their private observations and the beliefs of their neighbors to eventually learn the truth? To answer this question, we capture deviant behavior via the classical Byzantine adversary model [19], and develop a provably correct, resilient version of our proposed learning rule in Section 5. Theorem 5 characterizes the performance of this rule and, in particular, reveals that each regular agent can infer the truth exponentially fast. Furthermore, we identify conditions on the observation model and the network structure that guarantee applicability of our Byzantine-resilient learning rule, and argue that such conditions can be checked in polynomial time. The only related work that we are aware of in this regard is [14]. As we discuss in detail in Section 5, our proposed approach has various computational advantages relative to those in [14].

In addition to the main contributions discussed above, a minor contribution of this paper is the following. For static graphs where all agents behave normally, Theorem 3 establishes consistency of our learning rule under conditions that are necessary for any belief-update rule to work, when agents make conditionally independent observations. In particular, we show that the typical assumption of strong-connectivity on the network can be relaxed, and identify the minimal requirement for uniquely learning any state that gets realized.
A preliminary version of this paper appeared as [20]. We significantly expand upon the content in [20] by (i) providing detailed convergence-rate analyses of our algorithms, (ii) extending our results to the case of time-varying graphs, and (iii) elaborating on the significance of our results relative to prior work, and validating them via suitable simulation studies.

¹ Despite its various advantages, our approach cannot, in general, handle the scenario where there does not exist any single true state that generates signals consistent with those seen by every agent. The method in [10, 11], however, is applicable to this case as well, and enables each agent to identify the hypothesis that best explains the group's observations.

Terminology: a strongly-connected graph has a directed path between every pair of nodes.

2 Model and Problem Formulation

Network Model: Let N and N+ denote the sets of non-negative integers and positive integers, respectively. We consider a group of agents V = {1, 2, ..., n} interacting over a time-varying, directed communication graph G[t] = (V, E[t]), where t ∈ N. An edge (i, j) ∈ E[t] indicates that agent i can directly transmit information to agent j at time-step t. If (i, j) ∈ E[t], then at time t, agent i will be called a neighbor of agent j, and agent j will be called an out-neighbor of agent i. The set N_i[t] will be used to denote the neighbors of agent i (excluding itself) at time t, whereas the set N_i[t] ∪ {i} will be referred to as the inclusive neighborhood of agent i at time t. We will use |C| to denote the cardinality of a set C.

Observation Model: Let Θ = {θ_1, θ_2, ..., θ_m} denote the m possible states of the world; each θ_i ∈ Θ will be called a hypothesis. At each time-step t ∈ N+, every agent i ∈ V privately observes a signal s_{i,t} ∈ S_i, where S_i denotes the signal space of agent i.
The joint observation profile so generated across the network is denoted s_t = (s_{1,t}, s_{2,t}, ..., s_{n,t}), where s_t ∈ S, and S = S_1 × S_2 × ... × S_n. The signal s_t is generated based on a conditional likelihood function l(·|θ⋆), governed by the true state of the world θ⋆ ∈ Θ. Let l_i(·|θ⋆), i ∈ V, denote the i-th marginal of l(·|θ⋆). The signal structure of each agent i ∈ V is then characterized by a family of parameterized marginals {l_i(w_i|θ) : θ ∈ Θ, w_i ∈ S_i}.²

We make the following standard assumptions [4–6, 8–14]:

(i) The signal space of each agent i, namely S_i, is finite.³
(ii) Each agent i has knowledge of its local likelihood functions {l_i(·|θ_p)}, p = 1, ..., m, and it holds that l_i(w_i|θ) > 0, ∀w_i ∈ S_i and ∀θ ∈ Θ.
(iii) The observation sequence of each agent is described by an i.i.d. random process over time; however, at any given time-step, the observations of different agents may potentially be correlated.
(iv) There exists a fixed true state of the world θ⋆ ∈ Θ (unknown to the agents) that generates the observations of all the agents.⁴

Finally, we define a probability triple (Ω, F, P_θ⋆), where Ω ≜ {ω : ω = (s_1, s_2, ...), s_t ∈ S, t ∈ N+}, F is the σ-algebra generated by the observation profiles, and P_θ⋆ is the probability measure induced by sample paths in Ω. Specifically, P_θ⋆ = ∏_{t=1}^∞ l(·|θ⋆). For the sake of brevity, we will say that an event occurs almost surely to mean that it occurs almost surely w.r.t. the probability measure P_θ⋆. Note that assumptions (i) and (ii) on the observation model imply the existence of a constant L ∈ (0, ∞) such that:

    max_{i∈V} max_{w_i∈S_i} max_{θ_p,θ_q∈Θ} | log ( l_i(w_i|θ_p) / l_i(w_i|θ_q) ) | ≤ L.    (1)

We will make use of the above fact later in our analysis.
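As a concrete illustration, the constant L of equation (1) can be computed by direct enumeration once the likelihood tables are in hand. The following is a minimal sketch; the dictionary layout and the example numbers are our own hypothetical construction, not from the paper:

```python
import math

def log_likelihood_bound(likelihoods):
    """Compute the constant L of equation (1): the largest absolute
    log-likelihood ratio over all agents, signals, and hypothesis pairs.
    likelihoods[i][theta][w] is l_i(w | theta); assumption (ii) requires
    every entry to be strictly positive, so the ratios are well-defined."""
    L = 0.0
    for tables in likelihoods:          # one table per agent i
        for p in tables:                # hypothesis theta_p
            for q in tables:            # hypothesis theta_q
                for w in tables[p]:     # signal w_i in S_i
                    L = max(L, abs(math.log(tables[p][w] / tables[q][w])))
    return L

# Hypothetical two-agent, two-hypothesis example; agent 1 is uninformative.
likelihoods = [
    {"theta1": {"a": 0.9, "b": 0.1}, "theta2": {"a": 0.5, "b": 0.5}},
    {"theta1": {"a": 0.5, "b": 0.5}, "theta2": {"a": 0.5, "b": 0.5}},
]
print(log_likelihood_bound(likelihoods))  # log(0.5/0.1) = log 5 ≈ 1.609
```

The bound is attained here by agent 0's signal "b", whose likelihood ratio between the two hypotheses is 0.1/0.5.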
Given the above setup, the goal of each agent in the network is to discern the true state of the world θ⋆. The challenge associated with such a task stems from the fact that the private signal structure of any given agent is, in general, only partially informative. To make this notion precise, define Θ_i^θ⋆ ≜ {θ ∈ Θ : l_i(w_i|θ) = l_i(w_i|θ⋆), ∀w_i ∈ S_i}. In words, Θ_i^θ⋆ represents the set of hypotheses that are observationally equivalent to the true state θ⋆ from the perspective of agent i. In general, for any agent i ∈ V, we may have |Θ_i^θ⋆| > 1, necessitating collaboration among agents subject to the restrictions imposed by the time-varying communication topology. Our objective in this paper will be to design a distributed learning rule that allows each agent i ∈ V to identify the true state of the world asymptotically almost surely. To this end, we now introduce the following notion of source agents that will be useful in our subsequent developments.

² Whereas w_i ∈ S_i will be used to refer to a generic element of the signal space of agent i, s_{i,t} will denote the random variable (with distribution l_i(·|θ⋆)) that corresponds to the observation of agent i at time-step t.
³ The analysis in [7] applies to continuous parameter spaces.
⁴ The approach in [10] applies to a more general setting where there may not exist such a true hypothesis.

Definition 1 (Source agents). An agent i is said to be a source agent for a pair of distinct hypotheses θ_p, θ_q ∈ Θ if D(l_i(·|θ_p) || l_i(·|θ_q)) > 0, where D(l_i(·|θ_p) || l_i(·|θ_q)) represents the KL-divergence between the distributions l_i(·|θ_p) and l_i(·|θ_q), and is given by:

    D( l_i(·|θ_p) || l_i(·|θ_q) ) = Σ_{w_i∈S_i} l_i(w_i|θ_p) log ( l_i(w_i|θ_p) / l_i(w_i|θ_q) ).    (2)

The set of all source agents for the pair θ_p, θ_q is denoted by S(θ_p, θ_q).
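Definition 1 is straightforward to operationalize: compute the finite-support KL-divergence of equation (2) for each agent, and keep the agents for which it is strictly positive. A minimal sketch with hypothetical likelihood tables of our own (not the paper's):

```python
import math

def kl_divergence(p, q):
    """K_i(theta_p, theta_q): KL-divergence between two finite-support
    likelihood tables, per equation (2). All entries assumed positive."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def source_agents(likelihoods, theta_p, theta_q):
    """S(theta_p, theta_q): indices of agents whose private marginals
    distinguish the pair, i.e., have strictly positive KL-divergence."""
    return {i for i, tab in enumerate(likelihoods)
            if kl_divergence(tab[theta_p], tab[theta_q]) > 0}

# Hypothetical example: agent 0 can tell theta1 from theta2; agent 1,
# whose marginals coincide under both hypotheses, cannot.
likelihoods = [
    {"theta1": {"a": 0.8, "b": 0.2}, "theta2": {"a": 0.3, "b": 0.7}},
    {"theta1": {"a": 0.5, "b": 0.5}, "theta2": {"a": 0.5, "b": 0.5}},
]
print(source_agents(likelihoods, "theta1", "theta2"))  # {0}
```

Swapping the two hypotheses returns the same set, reflecting the symmetry S(θ_p, θ_q) = S(θ_q, θ_p) noted below.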
In words, a source agent for a pair θ_p, θ_q ∈ Θ is an agent that can distinguish between the pair of hypotheses θ_p, θ_q based on its private signal structure. It should be noted that S(θ_p, θ_q) = S(θ_q, θ_p), since D(l_i(·|θ_p) || l_i(·|θ_q)) > 0 ⟺ D(l_i(·|θ_q) || l_i(·|θ_p)) > 0 [21]. To avoid cluttering the exposition, we will henceforth use K_i(θ_p, θ_q) as a shorthand for D(l_i(·|θ_p) || l_i(·|θ_q)). In this work, we will assume that each state θ ∈ Θ is globally identifiable w.r.t. the joint observation model of the entire network. Based on our terminology of source agents, this translates to the following.

Assumption 1 (Global Identifiability). For each pair θ_p, θ_q ∈ Θ such that θ_p ≠ θ_q, the set S(θ_p, θ_q) of agents that can distinguish between the pair θ_p, θ_q is non-empty.

The above assumption is standard in the related literature. We will additionally make a mild assumption on the time-varying communication topology. To this end, let the union graph over an interval [t_1, t_2], 0 ≤ t_1 < t_2, indicate a graph with vertex set equal to V, and edge set equal to ∪_{τ=t_1}^{t_2} E[τ]. Based on this convention, we will assume (unless stated otherwise) that the sequence of communication graphs {G[t]}_{t=0}^∞ is jointly strongly-connected, in the following sense.

Assumption 2 (Joint Strong-Connectivity). There exists T ∈ N+ such that the union graph over every interval of the form [rT, (r+1)T), where r ∈ N, is strongly-connected.

While the above assumption on the network connectivity pattern is not necessary for solving the problem at hand, it is fairly standard in the analysis of distributed algorithms over time-varying networks [10, 22, 23]. Having established the model and the problem formulation, we now proceed to a formal description of our distributed learning algorithm.
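For a given finite edge-sequence, Assumption 2 can be checked mechanically: form the union graph over each window [rT, (r+1)T) and test strong connectivity (e.g., by a breadth-first search in the graph and in its reverse). A minimal sketch; the node labels and edge sequence below are made-up illustrations, not from the paper:

```python
from collections import deque

def union_graph(edge_seq, t1, t2):
    """Union of the directed edge sets E[t] over the interval [t1, t2)."""
    edges = set()
    for t in range(t1, t2):
        edges |= set(edge_seq[t])
    return edges

def strongly_connected(nodes, edges):
    """True iff every node reaches every other node along directed edges.
    It suffices to check that one node reaches all nodes in the graph
    and in its reverse."""
    def reachable(adj, start):
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen
    fwd, rev = {}, {}
    for (u, v) in edges:
        fwd.setdefault(u, []).append(v)
        rev.setdefault(v, []).append(u)
    start = next(iter(nodes))
    return reachable(fwd, start) == set(nodes) == reachable(rev, start)

# Hypothetical 3-agent example: neither E[0] nor E[1] is strongly
# connected on its own, but their union over [0, 2) is a directed ring.
edge_seq = [[(1, 2), (2, 3)], [(3, 1)]]
print(strongly_connected({1, 2, 3}, union_graph(edge_seq, 0, 2)))  # True
```

This mirrors the spirit of Assumption 2: connectivity is only required of the union graph over each window of length T, not of any single snapshot.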
3 Proposed Learning Rule

In this section, we propose a novel belief-update rule (Algorithm 1) and discuss the intuition behind it. Every agent i maintains and updates (at every time-step t) two separate belief vectors, namely π_{i,t} and µ_{i,t}. Each of these vectors is a probability distribution over the hypothesis set Θ. We will refer to π_{i,t} and µ_{i,t} as the "local" belief vector (for reasons that will soon become obvious) and the "actual" belief vector, respectively, maintained by agent i. The goal of each agent i ∈ V in the network will be to use its own private signals and the information available from its neighbors to update µ_{i,t} sequentially, so that lim_{t→∞} µ_{i,t}(θ⋆) = 1 almost surely. To do so, at each time-step t+1 (where t ∈ N), agent i does the following for each θ ∈ Θ. It first generates π_{i,t+1}(θ) via a local Bayesian update rule that incorporates the private observation s_{i,t+1}, using π_{i,t}(θ) as a prior (line 5 in Algo. 1). Having generated π_{i,t+1}(θ), agent i updates µ_{i,t+1}(θ) (up to normalization) by setting it to be the minimum of its locally generated belief π_{i,t+1}(θ) and the actual beliefs µ_{j,t}(θ), j ∈ N_i[t] ∪ {i}, of its inclusive neighborhood at the previous time-step (line 6 in Algo. 1). It then reports µ_{i,t+1} to each of its out-neighbors at time t+1.⁵

⁵ Note that based on our algorithm, agents only exchange their actual beliefs, and not their local beliefs.
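Before the formal statement in Algorithm 1 below, the two updates just described can be sketched concretely. This is an illustrative rendering of the updates on lines 5 and 6 (equations (3) and (4)) for a single agent and a single time-step; the likelihood table, beliefs, and signal are hypothetical numbers of our own:

```python
def bayes_local_update(local_belief, lik_row, signal):
    """Equation (3): Bayesian update of the local belief pi_{i,t} using
    only agent i's private signal; lik_row[theta][signal] is l_i(signal|theta)."""
    unnorm = {th: lik_row[th][signal] * b for th, b in local_belief.items()}
    z = sum(unnorm.values())
    return {th: v / z for th, v in unnorm.items()}

def min_rule_update(local_belief, neighborhood_beliefs):
    """Equation (4): the new actual belief is the entrywise minimum of the
    agent's freshly updated local belief and the actual beliefs of its
    inclusive neighborhood (neighbors plus itself), then renormalized."""
    unnorm = {th: min([b[th] for b in neighborhood_beliefs] + [local_belief[th]])
              for th in local_belief}
    z = sum(unnorm.values())
    return {th: v / z for th, v in unnorm.items()}

# Hypothetical one-step example with two hypotheses: the agent observes
# signal "a", updates its local belief, then applies the min-rule over
# its own previous actual belief and one neighbor's actual belief.
lik = {"theta1": {"a": 0.8, "b": 0.2}, "theta2": {"a": 0.4, "b": 0.6}}
pi = bayes_local_update({"theta1": 0.5, "theta2": 0.5}, lik, "a")
mu = min_rule_update(pi, [{"theta1": 0.5, "theta2": 0.5},
                          {"theta1": 0.9, "theta2": 0.1}])
print(round(sum(mu.values()), 10))  # 1.0 (still a probability distribution)
```

Note how the neighbor's low belief of 0.1 on theta2 survives the minimum and pulls the agent's actual belief on theta2 down, which is exactly the propagation mechanism discussed next.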
Algorithm 1: Belief-update rule for each i ∈ V

1: Initialization: µ_{i,0}(θ) > 0, π_{i,0}(θ) > 0, ∀θ ∈ Θ, with Σ_{θ∈Θ} µ_{i,0}(θ) = 1 and Σ_{θ∈Θ} π_{i,0}(θ) = 1
2: Transmit µ_{i,0} to out-neighbors at time 0
3: for t + 1 ∈ N+ do
4:   for θ ∈ Θ do
5:     Update local belief on θ as

         π_{i,t+1}(θ) = l_i(s_{i,t+1}|θ) π_{i,t}(θ) / Σ_{p=1}^m l_i(s_{i,t+1}|θ_p) π_{i,t}(θ_p)    (3)

6:     Update actual belief on θ as

         µ_{i,t+1}(θ) = min{ {µ_{j,t}(θ)}_{j∈N_i[t]∪{i}}, π_{i,t+1}(θ) } / Σ_{p=1}^m min{ {µ_{j,t}(θ_p)}_{j∈N_i[t]∪{i}}, π_{i,t+1}(θ_p) }    (4)

7:   end for
8:   Transmit µ_{i,t+1} to out-neighbors at time t + 1
9: end for

Intuition behind the learning rule: At the core of our learning algorithm are two key principles: (1) preservation of the intrinsic discriminatory capabilities of the agents, and (2) propagation of low beliefs on each false hypothesis. We now elaborate on these features. Consider the set of source agents S(θ⋆, θ) that can differentiate between a certain false hypothesis θ and the true state θ⋆. By definition, the signal structures of such agents are rich enough for them to be able to eliminate θ on their own, i.e., without the support of their neighbors. To achieve this, we require each agent to maintain a local belief vector that is updated (via (3)) without any network influence, using only the agent's own private signals. Doing so ensures that π_{i,t}(θ) → 0 a.s. for each i ∈ S(θ⋆, θ). Next, leveraging this property, we want to be able to propagate low beliefs on θ from S(θ⋆, θ) to V \ S(θ⋆, θ), i.e., the agents in S(θ⋆, θ) should contribute towards driving the actual beliefs of their out-neighbors (and eventually, of all the agents in the set V \ S(θ⋆, θ)) on the hypothesis θ to zero.
Using a min-rule of the form (4), with π_{i,t+1}(θ) featuring as an external network-independent input, facilitates such propagation without compromising the ability of agents in S(θ⋆, θ) to eliminate θ. When set in motion, our learning rule triggers a process of belief reduction on θ originating at S(θ⋆, θ) that eventually propagates to each agent in the network reachable from S(θ⋆, θ).

Remark 1. We emphasize that the proposed learning rule given by Algorithm 1 does not employ any form of "belief-averaging". This feature is in stark contrast with existing approaches to distributed hypothesis testing that rely either on linear opinion pooling [4–6] or log-linear opinion pooling [7–14]. As such, the lack of linearity in our belief-update rule precludes (direct or indirect) adaptation of existing analysis techniques to suit our needs.

4 Analysis of Algorithm 1

4.1 Statement of the Results

In this section, we characterize the performance of Algorithm 1. We start with one of the main results of the paper, proven in Appendix A.

Theorem 1. Suppose the observation model satisfies the global identifiability condition (Assumption 1), and the sequence of communication graphs {G[t]}_{t=0}^∞ is jointly strongly-connected (Assumption 2). Then, Algorithm 1 provides the following guarantees.

• (Consistency): For each agent i ∈ V, µ_{i,t}(θ⋆) → 1 a.s.
• (Asymptotic Rate of Rejection of False Hypotheses): Consider any false hypothesis θ ∈ Θ \ {θ⋆}. Then, the following holds for each agent i ∈ V:

    liminf_{t→∞} −log µ_{i,t}(θ) / t ≥ max_{v∈S(θ⋆,θ)} K_v(θ⋆, θ)  a.s.    (5)

The above result tells us that with probability 1, every agent i will be able to rule out each false hypothesis θ exponentially fast, at a rate that is eventually lower-bounded by the best KL-divergence across the network between the pair of hypotheses θ⋆ and θ.
In particular, this implies that given any ε > 0, the probability that agent i's instantaneous rate of rejection of θ, namely −log µ_{i,t}(θ)/t, is lower than the quantity max_{v∈S(θ⋆,θ)} K_v(θ⋆, θ) by an additive factor of ε, decays to zero. The next result, proven in Appendix B, sheds some light on the rate of decay of this probability.

Theorem 2. Suppose the conditions in Theorem 1 hold. Fix θ ∈ Θ \ {θ⋆}, and let K̄(θ⋆, θ) = max_{v∈S(θ⋆,θ)} K_v(θ⋆, θ). Then for every ε > 0 and δ ∈ (0, 1), there exists a set Ω_0(δ) ⊆ Ω with P_θ⋆(Ω_0(δ)) ≥ 1 − δ, such that the following holds for each agent i ∈ V:

    liminf_{t→∞} −(1/t) log P_θ⋆( { −log µ_{i,t}(θ)/t ≤ K̄(θ⋆, θ) − ε } ∩ Ω_0(δ) ) ≥ ε² / (8L²).    (6)

Our next result pertains to the special case when the communication graph does not change over time, i.e., when G[t] = G, ∀t ∈ N. To state the result, we will employ the following terminology. Given two disjoint sets C_1, C_2 ⊆ V, we say C_2 is reachable from C_1 if for every i ∈ C_2, there exists a directed path in G from some j ∈ C_1 to agent i (note that j will in general be a function of i).

Theorem 3. Let the communication graph be time-invariant and be denoted by G. Suppose the following conditions hold. (i) The observation model satisfies the global identifiability condition (Assumption 1). (ii) For every pair of hypotheses θ_p ≠ θ_q ∈ Θ, the set V \ S(θ_p, θ_q) is reachable from the set S(θ_p, θ_q) in G. Then, Algorithm 1 guarantees consistency as in Theorem 1. Furthermore, for every θ ∈ Θ \ {θ⋆}, the following holds for each agent i ∈ V:

    liminf_{t→∞} −log µ_{i,t}(θ) / t ≥ max_{v∈S_i(θ⋆,θ)} K_v(θ⋆, θ)  a.s.,    (7)

where S_i(θ⋆, θ) ⊆ S(θ⋆, θ) are those source agents from which there exists a directed path to i in G.

Proof. Fix θ ∈ Θ \ {θ⋆}, and consider an agent i ∈ V \ S(θ⋆, θ).
The sets S(θ⋆, θ) and S_i(θ⋆, θ) are non-empty based on conditions (i) and (ii) of the theorem, respectively. Following a similar line of argument as in the proof of Theorem 1, one can establish the following for each v ∈ S_i(θ⋆, θ):

    liminf_{t→∞} −log µ_{i,t}(θ) / t ≥ K_v(θ⋆, θ)  a.s.    (8)

The assertion regarding equation (7) then follows readily. Consistency follows by noting that since S_i(θ⋆, θ) ⊆ S(θ⋆, θ), we have K_v(θ⋆, θ) > 0, ∀v ∈ S_i(θ⋆, θ).

Our next result reveals that the combination of conditions (i) and (ii) in Theorem 3 constitutes minimal requirements on the observation model and the network structure for any learning algorithm to guarantee consistency, when the observations of the agents are conditionally independent.

Theorem 4. Let the communication graph be time-invariant and be denoted by G. Then, the following assertions hold. (i) Conditions (i) and (ii) in Theorem 3, taken together, are equivalent to global identifiability of each source component of G.⁶ (ii) Suppose the observations of the agents are independent conditional on the realization of any state, i.e., l(·|θ) = ∏_{i=1}^n l_i(·|θ), ∀θ ∈ Θ. Then, global identifiability of each source component of G is necessary and sufficient for unique identification of any true state that gets realized, at every agent, with probability 1.

The proof of the above result is fairly straightforward and hence omitted here. We now leverage the above results to quantify the rate at which the overall network uncertainty about the true state decays to zero. To measure such uncertainty, we employ the following metric from [5], which captures the total variation distance between the agents' beliefs at time-step t and the probability distribution concentrated entirely on the true state of the world, namely 1_θ⋆(·):

    e_t(θ⋆) ≜ (1/2) Σ_{i=1}^n || µ_{i,t}(·) − 1_θ⋆(·) ||_1 = Σ_{i=1}^n Σ_{θ≠θ⋆} µ_{i,t}(θ).    (9)

Given that θ⋆ gets realized, the rate of social learning is then defined as [5, 12]:

    ρ_L(θ⋆) ≜ liminf_{t→∞} −(1/t) log e_t(θ⋆).    (10)

Notice that the above expression depends on the state being realized; to account for the realization of any state, one can simply look at the quantity min_{θ⋆∈Θ} ρ_L(θ⋆), which provides a sense of the least rate of learning one can expect given a certain observation model, a network, and a consistent learning algorithm. We have the following immediate corollaries of Theorems 1 and 3; their proofs are trivial and hence omitted.

Corollary 1. Suppose the conditions stated in Theorem 1 are met. Then, Algorithm 1 guarantees:

    ρ_L(θ⋆) ≥ min_{θ≠θ⋆} max_{v∈S(θ⋆,θ)} K_v(θ⋆, θ)  a.s.    (11)

Corollary 2. Suppose the conditions stated in Theorem 3 are met. Then, Algorithm 1 guarantees:

    ρ_L(θ⋆) ≥ min_{θ≠θ⋆} min_{i∈V} max_{v∈S_i(θ⋆,θ)} K_v(θ⋆, θ)  a.s.    (12)

⁶ A source component of a time-invariant graph G is a strongly connected component with no incoming edges.

4.2 Discussion of the Results

Comments on Theorem 1: Let us compare the rate of learning based on our method to those existing in the literature. Under identical assumptions of global identifiability of the observation model and strong-connectivity (or joint strong-connectivity, as in [10]) of the underlying communication graph, both linear [4, 5] and log-linear [9, 10, 12] opinion pooling lead, for each agent, to an asymptotic rate of rejection of the form Σ_{v∈V} ν_v K_v(θ⋆, θ) for each false hypothesis θ ∈ Θ \ {θ⋆}.⁷ Here, ν_v represents the eigenvector centrality of agent v ∈ V, which is strictly positive for a strongly-connected graph.
Thus, referring to equation (5) reveals that the asymptotic rate of rejection of each false hypothesis (and hence, the rate of social learning) resulting from our algorithm (see (11)) is a strict improvement over all existing rates; this constitutes a significant contribution of our paper. Furthermore, observe from Corollary 1 that the lower bound on the rate of social learning is independent of both the size and the structure of the network. A key implication of this result is that as long as the total information content of the network remains the same, the specific manner in which signals are allocated to agents does not impact the long-run learning rate of our approach. In sharp contrast, existing learning rates that depend on the agents' eigenvector centralities may suffer under poor signal allocations; see [5] for a discussion on this topic.

Comments on Theorem 2: At any given time t, for some i ∈ V and θ ≠ θ⋆, consider the set of all sample paths where agent i's instantaneous rate of rejection of θ is lower than its asymptotic lower bound by a constant additive factor of ε. Theorem 2 complements Theorem 1 by telling us that an arbitrarily accurate approximation of the measure of such "bad" sample paths eventually decays to zero at an exponential rate no smaller than ε²/(8L²) (the approximation is arbitrarily accurate since the set Ω_0(δ) can be chosen to have measure arbitrarily close to 1). It is instructive to compare the concentration result of Theorem 2 with [10, Theorem 2], [12, Theorem 2], and [9, Lemma 3]. The analogous results in these papers are cleaner than ours, since they do not involve a set of the form Ω_0(δ) that shows up in our analysis.
A refinement of Theorem 2 to obtain a cleaner non-asymptotic result would require a precise characterization of the transient dynamics generated by our learning rule; we reserve investigations along this line for future work.

Comments on Theorem 3: While Theorem 4 identifies an algorithm-independent necessary condition for ensuring unique identifiability of any realized state at every agent (when the communication graph is time-invariant and agents receive conditionally independent signals), Theorem 3 reveals that such a condition is also sufficient for our proposed learning algorithm to work. We believe that a result of this flavor is missing in the existing literature on distributed hypothesis testing, where strong-connectivity is a standard assumption. The authors in [24] do relax the strong-connectivity assumption, but require every strongly connected component of G to be globally identifiable for learning to take place [24, Proposition 4]. In contrast, Theorem 3 requires only the source components of G to satisfy the global identifiability requirement. Interestingly, our conclusions in this context align with an analogous result that identifies joint detectability of each source component as the minimal requirement for solving the related problem of distributed state estimation [25, 26]. The more general network condition in Theorem 3 (as opposed to strong-connectivity) comes at the cost of a potential reduction in the rate of social learning, as reflected in Corollary 2. When the underlying graph is strongly connected, S_i(θ⋆, θ) = S(θ⋆, θ). Consequently, the min w.r.t. the agent set V in equation (12) goes away, and we recover Corollary 1.

Footnote 7: In [10], the consensus weights are chosen to obtain a network-structure-independent (albeit network-size-dependent) rate of rejection of θ of the form (1/n) Σ_{i∈V} K_i(θ⋆, θ).
5 Learning despite Misinformation

In this section, we address the problem of learning the true state of the world despite the presence of certain agents who do not behave as expected and deliberately try to spread misinformation. In order to isolate the challenges introduced by such malicious entities, we consider a time-invariant communication graph G for our subsequent discussion; we anticipate that our proposed approach will extend to the time-varying case with suitable modifications. We now describe the model of agent misbehavior that we consider. (Footnote 8.)

Adversary Model: We assume that a certain subset of the agents are adversarial, and model their behavior based on the Byzantine fault model [28]. Specifically, Byzantine agents possess complete knowledge of the observation model, the network model, the algorithms being used, the information being exchanged, and the true state of the world. Leveraging such information, adversarial agents can behave arbitrarily and in a coordinated manner; in particular, they can send incorrect, potentially inconsistent information to their out-neighbors. In return for allowing such worst-case adversarial behavior and knowledge, we restrict the number of adversaries: we consider an f-local adversarial model, i.e., we assume that there are at most f adversaries in the neighborhood of any non-adversarial agent, where f ∈ ℕ. Finally, we emphasize that the non-adversarial agents are unaware of the identities of the adversaries in their neighborhood. As is fairly standard in the distributed fault-tolerant literature [29–36], we only assume that non-adversarial agents know the upper bound f on the number of adversaries in their neighborhood. The adversarial set will be denoted by A ⊂ V, and the remaining agents R = V \ A will be called the regular agents.
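Operationally, the f-local property above says that no regular agent has more than f adversarial neighbors. A minimal check of this property can be sketched as follows; this is our own illustration, and the 4-agent ring used in the example is hypothetical.

```python
def is_f_local(neighbors, adversaries, f):
    """True iff every regular agent has at most f adversarial neighbors.

    neighbors: dict mapping each agent id to its set of neighbors.
    adversaries: set of adversarial agent ids.
    """
    for i, nbrs in neighbors.items():
        if i in adversaries:
            continue  # the f-local condition constrains only regular agents
        if len(nbrs & adversaries) > f:
            return False
    return True

# Hypothetical 4-agent ring: each agent sees its two ring neighbors.
ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(is_f_local(ring, {0}, f=1))     # True: at most one adversary per neighborhood
print(is_f_local(ring, {1, 3}, f=1))  # False: agents 0 and 2 each see two adversaries
```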
Our immediate goals are as follows. (i) Devise an algorithm that enables each regular agent to asymptotically identify the true state with probability 1, despite the presence of an f-local Byzantine adversarial set. (ii) Identify conditions on the observation model and the network structure that guarantee correctness of such an algorithm. Prior to addressing these goals, we briefly motivate the need for a novel Byzantine-resilient learning algorithm.

Motivation: A standard way to analyze the impact of adversarial agents while designing resilient distributed consensus-based protocols (for applications in consensus [29, 30], optimization [32, 33], hypothesis testing [14], and multi-agent rendezvous [37]) is to construct an equivalent matrix representation of the linear update rule that involves only the regular agents [38]. In particular, this requires expressing the iterates of a regular agent as a convex combination of the iterates of its regular neighbors, based on appropriate filtering techniques, and under certain assumptions on the network structure. While this can indeed be achieved efficiently for scalar consensus problems, for problems requiring consensus on vectors (like the belief vectors in our setting), such an approach typically requires the computation of sets known as Tverberg partitions. However, there is no known algorithm that can compute an exact Tverberg partition in polynomial time for a general d-dimensional finite point set [39]. Consequently, since the filtering approach developed in [14] requires each regular agent to compute a Tverberg partition at every iteration, the resulting computational costs are prohibitively high. The authors in [14] do briefly discuss an alternate pairwise learning rule that requires agents to perform scalar consensus on relative confidence levels (instead of beliefs) of one hypothesis over another.
Under such a rule, for each regular agent, its relative confidence in the true state over every false hypothesis approaches infinity, a condition that is difficult to verify in practice. Moreover, the pairwise learning rule in [14] requires each agent to maintain and update at each time-step a vector of dimension O(m²). In contrast, we propose a simple, lightweight Byzantine-resilient learning rule that avoids the computation of Tverberg partitions, and requires agents to update two m-dimensional belief vectors.

Footnote 8: Different from our setting, the forceful agents in [27] do not behave arbitrarily and, in fact, update their beliefs (even if infrequently) by interacting with their neighbors; our adversary model makes no such assumptions.

Algorithm 2 Belief update rule for each i ∈ R
1: Initialization: μ_{i,0}(θ) > 0, π_{i,0}(θ) > 0, ∀θ ∈ Θ, and Σ_{θ∈Θ} μ_{i,0}(θ) = 1, Σ_{θ∈Θ} π_{i,0}(θ) = 1
2: Transmit μ_{i,0} to out-neighbors
3: for t + 1 ∈ ℕ₊ do
4:   for θ ∈ Θ do
5:     Update local belief on θ as per (3)
6:     if |N_i| ≥ 2f + 1 then
7:       Sort μ_{j,t}(θ), j ∈ N_i, from highest to lowest, and reject the highest f and the lowest f of such beliefs.
8:       Let M^{θ}_{i,t} be the set of agents whose beliefs are not rejected in the previous step. Update μ_{i,t+1}(θ) as

           μ_{i,t+1}(θ) = min{ {μ_{j,t}(θ)}_{j∈M^{θ}_{i,t}}, π_{i,t+1}(θ) } / Σ_{p=1}^{m} min{ {μ_{j,t}(θ_p)}_{j∈M^{θ_p}_{i,t}}, π_{i,t+1}(θ_p) } (13)

9:     else
10:      Update μ_{i,t+1}(θ) as μ_{i,t+1}(θ) = π_{i,t+1}(θ) (14)
11:    end if
12:  end for
13:  Transmit μ_{i,t+1} to out-neighbors
14: end for

5.1 A Byzantine-Resilient Distributed Learning Rule

In this section, we develop an easy-to-implement and computationally efficient extension of Algorithm 1 that guarantees learning despite the presence of Byzantine adversaries. We call it the Local-Filtering based Resilient Hypothesis Elimination (LFRHE) algorithm (Algorithm 2).
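The per-hypothesis filtering and min-rule of lines 6–10 of Algorithm 2 can be sketched as follows. This is our own minimal Python rendering: `pi_next` stands for the agent's local belief vector produced by the Bayesian update (3), and the numerical example is hypothetical.

```python
def lfrhe_update(neighbor_beliefs, pi_next, f):
    """One LFRHE step for a single agent (a sketch of eqs. (13)-(14)).

    neighbor_beliefs: list of neighbors' belief vectors over the m hypotheses.
    pi_next: the agent's freshly updated local belief vector.
    Returns the agent's new actual belief vector mu.
    """
    m = len(pi_next)
    if len(neighbor_beliefs) < 2 * f + 1:
        return list(pi_next)  # too few neighbors: fall back to the local belief (14)
    unnorm = []
    for p in range(m):
        vals = sorted(b[p] for b in neighbor_beliefs)
        kept = vals[f:len(vals) - f]              # reject lowest f and highest f beliefs
        unnorm.append(min(kept + [pi_next[p]]))   # min-rule over the survivors (13)
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Example: f = 1, three neighbors, the last one reporting extreme (possibly Byzantine) values.
nbrs = [[0.8, 0.2], [0.7, 0.3], [0.01, 0.99]]
mu = lfrhe_update(nbrs, pi_next=[0.9, 0.1], f=1)
print([round(x, 3) for x in mu])  # [0.875, 0.125]: the outlier report is filtered out
```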
Like Algorithm 1, the LFRHE algorithm requires every regular agent i to maintain and update (at every time-step t) a local belief vector π_{i,t} and an actual belief vector μ_{i,t}. While π_{i,t} is updated as before via (3), the update of μ_{i,t} is the key feature of Algorithm 2. To update μ_{i,t+1}(θ), agent i ∈ R first checks whether it has at least 2f + 1 neighbors. If it does, then it rejects the highest f and the lowest f neighboring beliefs μ_{j,t}(θ), j ∈ N_i (line 7 of Algo. 2), and employs a min-rule as before, but using only the remaining beliefs (line 8 of Algo. 2). Thus, agent i filters out the most extreme neighboring beliefs on each hypothesis, and retains only the moderate ones to update its own actual belief. If agent i has strictly fewer than 2f + 1 neighbors, then it decides against using neighboring information and, instead, updates its actual belief vector to be equal to its local belief vector (line 10 of Algo. 2). To state our main result concerning the correctness of Algorithm 2, we require the following definitions.

Definition 2 (r-reachable set [30]). For a graph G = (V, E), a set C ⊆ V, and an integer r ∈ ℕ₊, C is an r-reachable set if there exists an i ∈ C such that |N_i \ C| ≥ r.

Definition 3 (strongly r-robust graph w.r.t. S(θ_p, θ_q)). For r ∈ ℕ₊ and θ_p, θ_q ∈ Θ, a graph G = (V, E) is strongly r-robust w.r.t. the set of source agents S(θ_p, θ_q) if for every non-empty subset C ⊆ V \ S(θ_p, θ_q), C is r-reachable.

Theorem 5. Suppose that for every pair of hypotheses θ_p, θ_q ∈ Θ, the graph G is strongly (2f + 1)-robust w.r.t. the source set S(θ_p, θ_q). Then, Algorithm 2 guarantees the following despite the actions of any f-local set of Byzantine adversaries.

• (Consistency): For each agent i ∈ R, μ_{i,t}(θ⋆) → 1 a.s.
• (Asymptotic Rate of Rejection of False Hypotheses): Consider any false hypothesis θ ∈ Θ \ {θ⋆}. Then, the following holds for each agent i ∈ R:

    lim inf_{t→∞} −(1/t) log μ_{i,t}(θ) ≥ min_{v∈S(θ⋆,θ)∩R} K_v(θ⋆, θ) a.s. (15)

Proof. See Appendix C.

Remark 2. For any pair θ_p, θ_q ∈ Θ, notice that the strong-robustness condition in Theorem 5 (together with Def. 3) requires |S(θ_p, θ_q)| ≥ 2f + 1 if V \ S(θ_p, θ_q) is non-empty. In particular, it blends requirements on the signal structures of the agents with those on the communication graph. To gain intuition about this condition, suppose Θ = {θ_1, θ_2}, and consider an agent i ∈ V \ S(θ_1, θ_2). To enable i to learn the truth despite potential adversaries in its neighborhood, one requires (i) redundancy in the signal structures of the agents, and (ii) redundancy in the network structure to ensure reliable information flow from S(θ_1, θ_2) to agent i. These requirements are encapsulated by Theorem 5. For a fixed source set S(θ_p, θ_q), checking whether G is strongly (2f + 1)-robust w.r.t. S(θ_p, θ_q) can be done in polynomial time by drawing connections to the process of bootstrap percolation on networks [34, Proposition 5]. Since the source sets for each pair θ_p, θ_q ∈ Θ can also be computed in polynomial time via a simple inspection of the agents' signal structures, it follows that the strong-robustness condition in Theorem 5 can be checked in polynomial time.

Leveraging Theorem 5, we can characterize the rate of decay of the collective uncertainty of the regular agents regarding the true state. To do so, we employ the following modification of the metric (9):

    e^R_t(θ⋆) ≜ (1/2) Σ_{i∈R} ‖μ_{i,t}(·) − 1_{θ⋆}(·)‖_1 = Σ_{i∈R} Σ_{θ≠θ⋆} μ_{i,t}(θ).
(16)

Note that this metric only considers the beliefs of the regular agents, as the Byzantine agents can update their beliefs however they wish. With θ⋆ as the true state, we define the rate of social learning in the presence of Byzantine adversaries as:

    ρ^R_L(θ⋆) ≜ lim inf_{t→∞} −(1/t) log e^R_t(θ⋆). (17)

We have the following immediate corollary of Theorem 5.

Corollary 3. Suppose the conditions stated in Theorem 5 are met. Then, Algorithm 2 guarantees:

    ρ^R_L(θ⋆) ≥ min_{θ≠θ⋆} min_{v∈S(θ⋆,θ)∩R} K_v(θ⋆, θ) a.s. (18)

Figure 1: Figures 1(a) and 1(b) represent the network models for simulation examples 1 and 2, respectively.

Figure 2: Consider the setup of simulation example 1 with n = 5 agents. Fig. 2(a) depicts the evolution of agent 3's belief on the true state θ_2, and Fig. 2(b) depicts the evolution of the instantaneous rate of rejection of θ_1 for agent 3, namely q_{3,t}(θ_1) = −log μ_{3,t}(θ_1)/t.

6 Simulations

Example 1 (Impact of Network Size on Rate of Convergence): For our first simulation study, we consider a binary hypothesis testing problem, i.e., Θ = {θ_1, θ_2}, where the signal space of each agent is identical and comprises the signals w_1 and w_2. The (time-invariant) undirected network for this example is depicted in Figure 1(a). The likelihood models of the agents are as follows: l_1(w_1|θ_1) = 0.7, l_1(w_1|θ_2) = 0.5, and l_i(w_1|θ_1) = l_i(w_1|θ_2) = 0.5, ∀i ∈ V \ {1}, i.e., agent 1 is the only informative agent. In order to compare the performance of Algorithm 1 to the linear and log-linear belief update rules in [4] and [10], we implement the latter assuming consensus weights are assigned based on the lazy Metropolis scheme (see [10] for details). Based on this weight assignment, it is easy to verify that the eigenvector centrality of each agent is 1/n.
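For reference, the discrimination constant governing this example can be computed in one line, assuming (as is standard in this literature) that K_1(θ_2, θ_1) denotes the KL divergence between agent 1's signal distributions under θ_2 and θ_1:

```python
from math import log

def kl(p, q):
    """KL divergence between two discrete distributions (in nats)."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Agent 1 in Example 1: l_1(w1|th1) = 0.7, l_1(w1|th2) = 0.5.
l_th1 = [0.7, 0.3]
l_th2 = [0.5, 0.5]
K_1 = kl(l_th2, l_th1)  # true state th2, false hypothesis th1
print(round(K_1, 4))  # 0.0872
```

Every other agent is uninformative (identical likelihoods under both states), so K_i(θ_2, θ_1) = 0 for i ≠ 1; this is what makes the example a stress test for centrality-weighted rules.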
All agents start out with uniform priors. With θ⋆ = θ_2 and n = 5, Figure 2 illustrates the performance of the three algorithms w.r.t. agent 3. In particular, Figure 2(a) reveals that based on our approach, agent 3's belief on the true state θ_2 converges to 1 faster than under the other algorithms. Figure 2(b) makes this observation precise by plotting the instantaneous rate of rejection of θ_1 for agent 3, namely q_{3,t}(θ_1) = −log μ_{3,t}(θ_1)/t. Consistent with the respective theoretical findings, q_{3,t}(θ_1) is eventually lower-bounded by K_1(θ_2, θ_1) for our algorithm (see Theorem 1), approaches K_1(θ_2, θ_1)/n for the log-linear rule in [10], and is eventually upper-bounded by K_1(θ_2, θ_1)/n for the linear rule in [4]. Similar conclusions hold for the other agents.

Suppose we now double the number of agents in the network. Agent 1 continues to remain the only informative agent. Figure 3 compares the performances of the three algorithms for this case. Notably, the convergence rate of our approach remains unaffected, whereas that of the linear and log-linear rules gets diluted.

Figure 3: Consider the setup of simulation example 1 with n = 10 agents. Fig. 3 illustrates the dilution in the rates of social learning for the linear and log-linear rules with an increase in the number of uninformative agents. Figures 3(a) and 3(b) are analogous to those in Figure 2.

Figure 4: Consider the setup of simulation example 2, where agent 5 acts as an adversary. Figures 4(a) and 4(b) depict the evolution of agent 7's belief on the true state, when θ⋆ = θ_1 and θ⋆ = θ_2, respectively.
This observation can be attributed to the fact that while the rate provided by our algorithm is both network-structure and network-size independent for strongly connected networks (see Section 4.2), the rates of the linear and log-linear rules depend crucially on the eigenvector centralities of the agents, which, in this case, correspond to 1/n. Thus, the gap between the performance of our algorithm and that of the linear and log-linear update rules (as measured by convergence rates) becomes more pronounced as the number of uninformative agents increases (i.e., as n increases, but the total information content of the network remains the same).

Example 2 (Impact of Adversaries): While the previous example highlighted the benefits of Algorithm 1, we now focus on an example that demonstrates the resilience of its variant, namely the LFRHE algorithm (Algorithm 2), to the presence of Byzantine adversaries. To this end, consider the undirected network in Figure 1(b). For this example, Θ = {θ_1, θ_2, θ_3} and S_i = {w_1, w_2}, ∀i ∈ V. Suppose the agent likelihood models are given by l_i(w_1|θ_1) = 3/4, l_i(w_1|θ_2) = l_i(w_1|θ_3) = 1/3, ∀i ∈ {1, 2, 3}; l_i(w_1|θ_1) = l_i(w_1|θ_2) = 2/5, l_i(w_1|θ_3) = 1/7, ∀i ∈ {4, 5, 6}; and l_i(w_1|θ_1) = l_i(w_1|θ_2) = 1/2, l_i(w_1|θ_3) = 5/6, ∀i ∈ {7, 8, 9}. Suppose f = 1 and agent 5 is the only adversarial agent. It is easy to see that the strong-robustness condition in Theorem 5 is met. We will compare the performance of Algorithm 2 with the linear rule in [4] and the log-linear rule in [10]. For implementing the latter, we again assign consensus weights based on the lazy Metropolis scheme. All agents start out with uniform priors. The adversary, agent 5, maintains a belief of 0.1 on the true state and 0.45 on each of the false hypotheses for all t ≥ 20.
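The strong-robustness condition invoked above can be checked mechanically via the bootstrap-percolation connection mentioned in Remark 2: seed the source set as "active" and repeatedly activate any agent with at least r active neighbors; the graph is strongly r-robust w.r.t. the source set iff every agent eventually activates. A sketch is below; since we do not reproduce the exact edge set of Figure 1(b) here, the 5-agent complete graph used in the example is hypothetical.

```python
def strongly_r_robust(neighbors, source, r):
    """Check strong r-robustness w.r.t. `source` via bootstrap percolation.

    Matches Def. 3: if the process stalls, the inactive set C is a non-empty
    subset of V \\ source in which no agent has r neighbors outside C.
    """
    active = set(source)
    changed = True
    while changed:
        changed = False
        for i, nbrs in neighbors.items():
            if i not in active and len(nbrs & active) >= r:
                active.add(i)
                changed = True
    return len(active) == len(neighbors)

# Hypothetical 5-agent complete graph (undirected adjacency), f = 1 so r = 2f + 1 = 3.
g = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 4}, 2: {0, 1, 3, 4},
     3: {0, 1, 2, 4}, 4: {0, 1, 2, 3}}
print(strongly_r_robust(g, source={0, 1, 2}, r=3))  # True: |source| = 2f + 1 suffices here
print(strongly_r_robust(g, source={0}, r=3))        # False: too few seeds to percolate
```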
Figures 4(a) and 4(b) illustrate the repercussions of this action on agent 7, when θ⋆ = θ_1 and θ⋆ = θ_2, respectively: while the linear and log-linear rules fail to recover from the attack, Algorithm 2 enables agent 7 to infer the truth. Similar conclusions hold for the other regular agents.

7 Conclusion

We proposed and analyzed a novel algorithm for addressing the problem of distributed hypothesis testing. The key distinguishing feature of our learning algorithm is that it does not employ any linear consensus-based data-aggregation protocol. Instead, it relies on a "min-rule" to spread beliefs through the network. Under mild assumptions of global identifiability and joint strong-connectivity, we established consistency of our learning rule. In particular, we showed that the rate of learning resulting from our approach strictly improves upon all existing rates. For static networks, we established consistency of our algorithm under minimal requirements on the observation model and the network structure. Finally, we proposed a simple and computationally efficient version of our learning rule that accounts for worst-case adversarial behavior on the part of certain agents in the network. As future work, we plan to investigate the impact of communication constraints on the performance of distributed inference/estimation algorithms.

A Proof of Theorem 1

The proof of Theorem 1 is based on several intermediate results. We start with the following simple lemma that characterizes the asymptotic behavior of the local belief sequences generated based on (3); we provide a proof (adapted to our notation) to keep the paper self-contained, and to introduce certain quantities that will be referenced later in our analysis.

Lemma 1. Consider a false hypothesis θ ∈ Θ \ {θ⋆}, and an agent i ∈ S(θ⋆, θ). Suppose π_{i,0}(θ_p) > 0, ∀θ_p ∈ Θ.
Then, the update rule (3) ensures that (i) π_{i,t}(θ) → 0 a.s., (ii) π_{i,∞}(θ⋆) ≜ lim_{t→∞} π_{i,t}(θ⋆) exists a.s. and satisfies π_{i,∞}(θ⋆) ≥ π_{i,0}(θ⋆), and (iii) the following holds:

    lim_{t→∞} (1/t) log( π_{i,t}(θ) / π_{i,t}(θ⋆) ) = −K_i(θ⋆, θ) a.s. (19)

Proof. Consider any agent i ∈ S(θ⋆, θ), and define:

    ρ_{i,t}(θ) ≜ log( π_{i,t}(θ) / π_{i,t}(θ⋆) ),  λ_{i,t}(θ) ≜ log( l_i(s_{i,t}|θ) / l_i(s_{i,t}|θ⋆) ). (20)

Then, based on (3), we obtain the following recursion:

    ρ_{i,t+1}(θ) = ρ_{i,t}(θ) + λ_{i,t+1}(θ), ∀t ∈ ℕ. (21)

Rolling out the above equation over time yields

    ρ_{i,t}(θ) = ρ_{i,0}(θ) + Σ_{k=1}^{t} λ_{i,k}(θ), ∀t ∈ ℕ₊. (22)

Notice that {λ_{i,t}(θ)} is a sequence of i.i.d. random variables with finite means (see equation (1)). In particular, it is easy to verify that each random variable λ_{i,t}(θ) has mean (Footnote 9) −K_i(θ⋆, θ). Thus, based on the strong law of large numbers, (1/t) Σ_{k=1}^{t} λ_{i,k}(θ) → −K_i(θ⋆, θ) almost surely. Dividing both sides of (22) by t, and taking the limit as t goes to infinity, we then obtain

    lim_{t→∞} (1/t) ρ_{i,t}(θ) = −K_i(θ⋆, θ) a.s., (23)

establishing part (iii) of the lemma. Now note that, based on the definition of the set S(θ⋆, θ), K_i(θ⋆, θ) > 0. It then follows from (23) that ρ_{i,t}(θ) → −∞ almost surely, and hence π_{i,t}(θ) → 0 almost surely. This establishes part (i) of the lemma. For any θ ∈ Θ^{θ⋆}_i, observe that λ_{i,t}(θ) = 0, ∀t ∈ ℕ₊. It then follows from (21) that for each θ ∈ Θ^{θ⋆}_i, ρ_{i,t}(θ) = ρ_{i,0}(θ), ∀t ∈ ℕ₊. From the above discussion, we conclude that a limiting belief vector π_{i,∞} exists almost surely, with non-zero entries corresponding to each θ ∈ Θ^{θ⋆}_i. Part (ii) of the lemma then follows readily.
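The log-belief-ratio recursion (21)–(22) can be simulated directly to watch the rate in (19) emerge. The sketch below (ours, purely illustrative) uses agent 1's likelihoods from simulation example 1 with θ⋆ = θ_2, and tracks ρ_{1,t}(θ_1) in the log domain to avoid numerical underflow of π_{1,t}(θ_1):

```python
import random
from math import log

random.seed(0)
# Agent 1's likelihoods: l(w1|th1) = 0.7, l(w1|th2) = 0.5; the true state is th2.
K = 0.5 * log(0.5 / 0.7) + 0.5 * log(0.5 / 0.3)  # K_1(th2, th1), about 0.0872

T = 200_000
rho = 0.0  # rho_t = log(pi_t(th1) / pi_t(th2)); uniform prior gives rho_0 = 0
for _ in range(T):
    w1 = random.random() < 0.5  # draw a signal from l(.|th2)
    # lambda_t = log l(s|th1) - log l(s|th2), cf. (20)
    rho += log(0.7 / 0.5) if w1 else log(0.3 / 0.5)

# By the SLLN, rho / T approaches -K (about -0.087), as predicted by (23).
print(round(rho / T, 3))
```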
While our proposed learning rule is tailored to facilitate propagation of low beliefs on false hypotheses, it is crucial to also ensure that the beliefs of all agents on the true state remain bounded away from zero. In particular, consider the following scenario. During a transient phase, certain agents see private signals that cause them to temporarily lower their local beliefs on the true state. This effect manifests itself in the actual beliefs of the agents via the min-rule (4). We ask: can such a transient phenomenon trigger a cascade of progressively lower beliefs on the true state? The next important result asserts that this will almost surely never be the case.

Lemma 2. Suppose the conditions stated in Theorem 1 hold, and Algorithm 1 is employed by each agent. Then, there exists a set Ω̄ ⊆ Ω with the following properties: (i) ℙ_{θ⋆}(Ω̄) = 1, and (ii) for each ω ∈ Ω̄, there exist constants η(ω) ∈ (0, 1) and t_0(ω) ∈ (0, ∞) such that on the sample path ω,

    π_{i,t}(θ⋆) ≥ η(ω), μ_{i,t}(θ⋆) ≥ η(ω), ∀t ≥ t_0(ω), ∀i ∈ V. (24)

Proof. Let Ω̄ ⊆ Ω denote the set of sample paths for which assertions (i)-(iii) in Lemma 1 hold for each false hypothesis θ ∈ Θ \ {θ⋆}. Based on Lemma 1, we note that ℙ_{θ⋆}(Ω̄) = 1. Consequently, to prove the result, it suffices to establish the existence of η(ω) ∈ (0, 1) and t_0(ω) ∈ (0, ∞) for each sample path ω ∈ Ω̄, such that (24) holds. To this end, fix an arbitrary sample path ω ∈ Ω̄. We first argue that the local beliefs of every agent on the true state θ⋆ are bounded away from 0 on ω. To see this, pick any agent i ∈ V. Suppose there exists some θ ∈ Θ \ {θ⋆} for which i ∈ S(θ⋆, θ). Then, based on our choice of ω, Lemma 1 implies that π_{i,∞}(θ⋆) ≥ π_{i,0}(θ⋆) > 0, where the last inequality follows from the requirement of non-zero priors in line 1 of Algo. 1.
In particular, given the structure of the update rule (3), it follows that π_{i,t}(θ⋆) > 0 for all time. This is true since if π_{i,t}(θ⋆) = 0 at any instant, then the corresponding belief would remain at 0 for all subsequent time-steps, thereby violating the fact that π_{i,∞}(θ⋆) ≥ π_{i,0}(θ⋆) > 0. Now consider the scenario where there exists no θ ∈ Θ \ {θ⋆} for which i ∈ S(θ⋆, θ), i.e., every hypothesis in Θ is observationally equivalent to θ⋆ from the point of view of agent i. In this case, it is easy to see that, based on (3), π_{i,t} = π_{i,0}, ∀t ∈ ℕ₊. In particular, this implies π_{i,t}(θ⋆) = π_{i,0}(θ⋆) > 0, ∀t ∈ ℕ₊. This establishes our claim that on ω, π_{i,t}(θ⋆) remains bounded away from zero ∀i ∈ V.

Footnote 9: More precisely, the mean here is obtained by using the expectation operator E_{θ⋆}[·] associated with the measure ℙ_{θ⋆}.

To proceed, define γ_1 ≜ min_{i∈V} π_{i,0}(θ⋆) > 0, where the inequality follows from line 1 of Algo. 1. Pick a small number δ > 0 such that δ < γ_1, and notice that our discussion concerning the evolution of the local beliefs readily implies the existence of a time-step t_0(ω) such that for all t ≥ t_0(ω), π_{i,t}(θ⋆) ≥ γ_1 − δ > 0, ∀i ∈ V. With γ_2(ω) ≜ min_{i∈V} {μ_{i,t_0(ω)}(θ⋆)}, we claim that γ_2(ω) > 0. The claim follows by noting that, given the structure of the update rule (4) and the requirement of non-zero priors in Algo. 1, γ_2(ω) can equal 0 if and only if some agent in the network sets its local belief on θ⋆ to 0 at some time-step prior to t_0(ω). However, this possibility is ruled out in view of the previously established fact that on ω, π_{i,t}(θ⋆) > 0, ∀t ∈ ℕ, ∀i ∈ V. Let η(ω) = min{γ_1 − δ, γ_2(ω)} > 0. In words, η(ω) lower-bounds the lowest belief (considering both local and actual beliefs) on the true state θ⋆ held by any agent at time-step t_0(ω).
It is apparent from the preceding discussion that π_{i,t}(θ⋆) ≥ η(ω), ∀t ≥ t_0(ω), ∀i ∈ V. Thus, to complete the proof, it remains to establish that μ_{i,t}(θ⋆) ≥ η(ω), ∀t ≥ t_0(ω), ∀i ∈ V. To this end, let us fix an agent i and observe the following:

    μ_{i,t_0(ω)+1}(θ⋆)
    (a)= min{ {μ_{j,t_0(ω)}(θ⋆)}_{j∈N_i[t_0(ω)]∪{i}}, π_{i,t_0(ω)+1}(θ⋆) } / Σ_{p=1}^{m} min{ {μ_{j,t_0(ω)}(θ_p)}_{j∈N_i[t_0(ω)]∪{i}}, π_{i,t_0(ω)+1}(θ_p) }
    (b)≥ η(ω) / Σ_{p=1}^{m} min{ {μ_{j,t_0(ω)}(θ_p)}_{j∈N_i[t_0(ω)]∪{i}}, π_{i,t_0(ω)+1}(θ_p) }
    ≥ η(ω) / Σ_{p=1}^{m} π_{i,t_0(ω)+1}(θ_p)
    (c)= η(ω), (25)

where (a) is given by (4), (b) follows from the way η(ω) is defined and by noting that π_{i,t}(θ⋆) ≥ η(ω), ∀t ≥ t_0(ω), ∀i ∈ V, and (c) follows by noting that the local belief vectors generated via (3) are valid probability distributions over the hypothesis set Θ at each time-step, and hence Σ_{p=1}^{m} π_{i,t_0(ω)+1}(θ_p) = 1. The above reasoning applies to every agent in the network, and can be repeated to establish (24) via induction.

The next result establishes that the intrinsic discriminatory capabilities of an agent are preserved under our learning rule.

Lemma 3. Suppose the conditions stated in Theorem 1 hold, and Algorithm 1 is employed by each agent. Consider any false hypothesis θ ∈ Θ \ {θ⋆}, and an agent i ∈ S(θ⋆, θ). Then,

    lim inf_{t→∞} −(1/t) log μ_{i,t}(θ) ≥ K_i(θ⋆, θ) a.s. (26)

Proof. With Ω̄ defined as in Lemma 2, recall that ℙ_{θ⋆}(Ω̄) = 1, and pick any ω ∈ Ω̄. Now consider any false hypothesis θ ∈ Θ \ {θ⋆}, and an agent i ∈ S(θ⋆, θ). Fix any ε > 0, and notice that since i ∈ S(θ⋆, θ), Eq. (19) in Lemma 1 implies that there exists t_i(ω, θ, ε) such that

    π_{i,t}(θ) < e^{−(K_i(θ⋆,θ)−ε)t}, ∀t ≥ t_i(ω, θ, ε).
(27)

Furthermore, since ω ∈ Ω̄, Lemma 2 guarantees the existence of a time-step t_0(ω) ∈ (0, ∞) and a constant η(ω) ∈ (0, 1) such that on ω, π_{i,t}(θ⋆) ≥ η(ω), μ_{i,t}(θ⋆) ≥ η(ω), ∀t ≥ t_0(ω), ∀i ∈ V. Let t̄_i(ω, θ, ε) = max{t_0(ω), t_i(ω, θ, ε)}. Let us suppress the dependence of t̄_i(ω, θ, ε) on i, ω, θ, and ε for simplicity of notation, and observe the following inequalities:

    μ_{i,t̄+1}(θ)
    (a)≤ π_{i,t̄+1}(θ) / Σ_{p=1}^{m} min{ {μ_{j,t̄}(θ_p)}_{j∈N_i[t̄]∪{i}}, π_{i,t̄+1}(θ_p) }
    ≤ π_{i,t̄+1}(θ) / min{ {μ_{j,t̄}(θ⋆)}_{j∈N_i[t̄]∪{i}}, π_{i,t̄+1}(θ⋆) }
    (b)< e^{−(K_i(θ⋆,θ)−ε)(t̄+1)} / η(ω). (28)

In the above inequalities, (a) follows from (4), whereas (b) follows from (27) and by noting that all agents have both their local and actual beliefs lower-bounded by η(ω) beyond time-step t̄. In particular, it is easy to see that the arguments used to arrive at (28) apply to each time-step t ≥ t̄ + 1. Based on (28), we then obtain that ∀t ≥ t̄ + 1:

    −(1/t) log μ_{i,t}(θ) > (K_i(θ⋆, θ) − ε) + (1/t) log η(ω). (29)

Taking the limit inferior on both sides of (29), and noting that ε can be made arbitrarily small, readily leads to (26).

For the subsequent discussion, let us fix a particular false hypothesis θ ∈ Θ \ {θ⋆}, and assume that global identifiability holds. Let v_θ ∈ argmax_{l∈S(θ⋆,θ)} K_l(θ⋆, θ) represent any agent with the best discriminatory power w.r.t. the false hypothesis θ, given that θ⋆ gets realized. Based on Lemma 3, we have

    lim inf_{t→∞} −(1/t) log μ_{v_θ,t}(θ) ≥ K_{v_θ}(θ⋆, θ) a.s. (30)

Our goal is now to establish that each agent i ∈ V \ {v_θ} inherits the same asymptotic rate of rejection of θ as that of agent v_θ in (30).
Roughly speaking, we will achieve this by showing that, under the assumption of joint strong-connectivity, the belief of any agent i ∈ V \ {v_θ} on θ is "not too far off" from the belief of agent v_θ on θ. In what follows, we make this idea precise. First, we require some additional notation: with each agent i ∈ V, we associate a non-negative scalar c_{i,t}(θ) ∈ [0, ∞]. These parameters evolve based on the following rules. (Footnote 10.)

(i) c_{v_θ,t}(θ) = 0, ∀t ∈ ℕ.
(ii) c_{i,0}(θ) = ∞, ∀i ∈ V \ {v_θ}.
(iii) For each i ∈ V \ {v_θ} and t ∈ ℕ, define τ_{i,t}(θ) ≜ min_{j∈N_i[t]∪{i}} c_{j,t}(θ), and

    c_{i,t+1}(θ) ≜ τ_{i,t}(θ) + 1. (31)

To explain the purpose of the above rules, we will adhere to the following terminology. We say that there exists a path of length m ∈ ℕ₊ from v_θ to i ∈ V \ {v_θ} over [t − m, t − 1] if there exist agents x(t − m + 1), ..., x(t) ∈ V \ {v_θ} such that (x(τ − 1), x(τ)) ∈ E[τ − 1], where τ ∈ {t − m + 1, ..., t}, x(t − m) = v_θ, and x(t) = i. Note that the agents appearing in the path need not be distinct, and that we have assumed the presence of self-loops in each graph G[t], t ∈ ℕ.

Footnote 10: Note that the agents do not actually maintain or update the parameters c_{i,t}(θ); they have been introduced solely for the purpose of analysis.

Rules (i)-(iii) have been designed in a manner such that if c_{i,t}(θ) is finite at any time-step t ∈ ℕ for any agent i ∈ V \ {v_θ}, then there exists a path of length c_{i,t}(θ) from v_θ to i over [t − c_{i,t}(θ), t − 1], in the sense described above. Analyzing the time-evolution of c_{i,t}(θ) enables us to then relate the belief μ_{i,t}(θ) of agent i to a delayed version of the belief μ_{v_θ,t}(θ) of agent v_θ, where the delay is precisely c_{i,t}(θ) (the above statements are formalized and proven in Lemma 5). Since agent v_θ is the reference agent here, its delay w.r.t.
its own belief on θ is set to 0 for all time, thus explaining rule (i). Initially, all agents in V \ {v_θ} start out with an "infinite delay" w.r.t. the belief of agent v_θ; this is captured by rule (ii). Finally, the rationale behind updating c_{i,t}(θ) via rule (iii) is to formalize the intuition that, under the assumption of joint strong-connectivity, the lengths of paths linking v_θ to agents in V \ {v_θ} (and hence, the corresponding delays) should eventually remain uniformly bounded; we begin by establishing this fact in the following lemma.

Lemma 4. Consider any θ ∈ Θ \ {θ⋆} and suppose the joint strong-connectivity assumption (Assumption 2) holds. Then, the following is true:

    c_{i,t}(θ) ≤ 2(n − 1)T, ∀i ∈ V, ∀t ≥ (n − 1)T, (32)

where T is the constant appearing in Assumption 2.

Proof. Observe that the conclusion in (32) is trivially true for agent v_θ since c_{v_θ,t}(θ) = 0, ∀t ∈ ℕ. To prove the result for agents in the set V \ {v_θ}, we begin by claiming that

    c_{i,(n−1)T}(θ) ≤ (n − 1)T, ∀i ∈ V. (33)

To prove this claim, let L_0(θ⋆, θ) = {v_θ}, and define

    L_1(θ⋆, θ) ≜ { i ∈ V \ L_0(θ⋆, θ) : { ∪_{τ=0}^{T−1} N_i[τ] } ∩ L_0(θ⋆, θ) ≠ ∅ } (34)

as the set of agents in V \ {v_θ} that have a direct edge from agent v_θ at least once over the interval [0, T). Assumption 2 implies that L_1(θ⋆, θ) is non-empty (barring the trivial case when V = {v_θ}). Now pick any agent i ∈ L_1(θ⋆, θ), and notice that since v_θ ∈ N_i[τ] for some τ ∈ [0, T), update rule (31) implies c_{i,τ+1}(θ) = 1. (Footnote 11.) In particular, based on (31),

    c_{i,t+1}(θ) ≤ c_{i,t}(θ) + 1. (35)

Based on the above discussion, it follows that for each agent i ∈ L_1(θ⋆, θ), c_{i,T}(θ) ≤ T. The claim in (33) then follows readily for each agent i ∈ L_1(θ⋆, θ) by appealing to (35). Let us now recursively define the sets L_r(θ⋆
, θ ) , 1 ≤ r ≤ ( n − 1), as L r ( θ ? , θ ) , { i ∈ V \ ( r − 1) [ q =0 L q ( θ ? , θ ) : { rT − 1 [ τ =( r − 1) T N i [ τ ] } ∩ { ( r − 1) [ q =0 L q ( θ ? , θ ) } 6 = ∅} . (36) In w ords, L r ( θ ? , θ ) are those agen ts belonging to V \ ( r − 1) S q =0 L q ( θ ? , θ ) that eac h ha ve at least one neigh b or from the set ( r − 1) S q =0 L q ( θ ? , θ ) ov er the interv al [( r − 1) T , r T − 1]. W e complete the pro of of 11 Notice that based on the update rule ( 31 ), c i,t ( θ ) ≥ 1 , ∀ i ∈ V \ { v θ } . Thus, argmin j ∈N i [ t ] ∪{ i } c j,t ( θ ) = v θ whenev er v θ ∈ N i [ t ], since c v θ ,t ( θ ) = 0 , ∀ t ∈ N . 19 the claim b y inducting on r . The base case with r = 1 has already been prov en abov e. No w supp ose the follo wing is true: c i,rT ( θ ) ≤ r T , ∀ i ∈ L r ( θ ? , θ ), where r ∈ { 1 , . . . , m − 1 } , and m ∈ { 2 , . . . , n − 1 } . Let r = m. If V \ ( m − 1) S q =0 L q ( θ ? , θ ) is empty , then we are done. Else, based on Assumption 2 , it must b e that L m ( θ ? , θ ) is non-empty . Pick an y agent i ∈ L m ( θ ? , θ ), and notice that it has a neigh b or j (sa y) from the set ( m − 1) S q =0 L q ( θ ? , θ ) at some time-step τ ∈ [( m − 1) T , mT ). The induction hypothesis coupled with ( 35 ) implies that c j,τ ( θ ) ≤ τ , and hence c i,τ +1 ( θ ) ≤ c j,τ ( θ ) + 1 ≤ τ + 1 based on ( 31 ). App ealing to ( 35 ) then reveals that c i,mT ( θ ) ≤ mT , th us completing the induction step. Finally , noting that ( n − 1) S q =0 L q ( θ ? , θ ) = V completes our pro of of the claim ( 33 ). An iden tical line of argument as ab o ve can b e employ ed to show that c i, 2( n − 1) T ≤ ( n − 1) T , ∀ i ∈ V . In particular, this can b e done b y first taking C 0 ( θ ? , θ ) = { v θ } , and recursiv ely defining the sets C r ( θ ? , θ ) , 1 ≤ r ≤ ( n − 1) as C r ( θ ? , θ ) , { i ∈ V \ ( r − 1) [ q =0 C q ( θ ? , θ ) : { ( n + r − 1) T − 1 [ τ =( n + r − 2) T N i [ τ ] } ∩ { ( r − 1) [ q =0 C q ( θ ? , θ ) } 6 = ∅} . 
(37) One can then easily prov e via induction that c i, ( n − 1+ r ) T ( θ ) ≤ r T , ∀ i ∈ C r ( θ ? , θ ), where 1 ≤ r ≤ ( n − 1). The rest then follows from ( 35 ). W e can rep eat the ab ov e argument to establish that c i,m ( n − 1) T ( θ ) ≤ ( n − 1) T , ∀ i ∈ V , ∀ m ∈ N + . Finally , based on the ab ov e b ound and ( 35 ), it follows that for each agent i ∈ V , c i,t ( θ ) is upp er- b ounded by 2( n − 1) T at any time-step t ∈ ( m ( n − 1) T , ( m + 1)( n − 1) T ), where m ∈ N + . This establishes ( 32 ) and completes the pro of. The next lemma relates µ i,t ( θ ) , i ∈ V \ { v θ } to µ v θ ,t ( θ ) in terms of the parameter c i,t ( θ ) and, in turn, pro vides the final ingredient required to prov e Theorem 1 . Lemma 5. Consider any θ ∈ Θ \ { θ ? } . Supp ose the joint str ong-c onne ctivity assumption holds (Assumption 2 ), and e ach agent applies Algorithm 1 . Supp ose c i,t ( θ ) is finite, wher e i ∈ V \ { v θ } , and t ∈ N . Then, the fol lowing ar e true. (i) Ther e exists a p ath of length c i,t ( θ ) fr om v θ to i over [ t − c i,t ( θ ) , t − 1] . (ii) L et the p ath linking v θ to i over [ t − c i,t ( θ ) , t − 1] in p art (i) b e denote d x ( t − c i,t ( θ )) , x ( t − c i,t ( θ ) + 1) , . . . , x ( t ) , wher e x ( t − c i,t ( θ )) = v θ and x ( t ) = i . Then µ i,t ( θ ) ≤ µ v θ ,a i,t ( θ ) ( θ ) t Q τ = a i,t ( θ )+1 η x ( τ ) ,τ ( θ ? ) , (38) wher e a i,t ( θ ) = t − c i,t ( θ ) , and η i,t ( θ ? ) , min {{ µ j,t − 1 ( θ ? ) } j ∈N i [ t − 1] ∪{ i } , π i,t ( θ ? ) } , ∀ i ∈ V . (39) Pr o of. W e prov e part (i) b y inducting on the v alue of c i,t ( θ ). F or the base case, supp ose c i,t ( θ ) = 1 for some agent i ∈ V \ { v θ } at some time-step t . Based on ( 31 ), notice that this can happ en if and only if v θ ∈ N i [ t − 1]; the claim in part (i) then follo ws readily for the base case. 
Fix an in teger m ≥ 2, and supp ose that the assertion of part (i) holds for any agen t i ∈ V \ { v θ } and at an y 20 time-step t , whenever c i,t ( θ ) ∈ { 1 , . . . , m − 1 } . No w supp ose that at some time-step t , c i,t ( θ ) = m for some agen t i ∈ V \ { v θ } . Referring to ( 31 ), this is true only if c l,t − 1 ( θ ) = m − 1 for some l ∈ N i [ t − 1] ∪ { i } . Since m ≥ 2, w e ha ve c l,t − 1 ( θ ) ≥ 1, and hence l ∈ V \ { v θ } . The induction h yp othesis th us applies to agent l , implying the existence of a path of length m − 1 from v θ to l ov er [( t − 1) − c l,t − 1 ( θ ) , t − 2], i.e., o ver [ t − m, t − 2]. App ending this path with the edge ( l , i ) ∈ E [ t − 1] immediately leads to the desired conclusion. F or part (ii), consider the path x ( t − c i,t ( θ )) , x ( t − c i,t ( θ ) + 1) , . . . , x ( t ) from v θ to i o ver [ t − c i,t ( θ ) , t − 1], where x ( t − c i,t ( θ )) = v θ and x ( t ) = i . By definition of this path, x ( τ − 1) ∈ N x ( τ ) [ τ − 1] ∪ { x ( τ ) } , for all τ ∈ { a i,t ( θ ) + 1 , . . . , t } . Th us, referring to ( 4 ), w e obtain µ x ( τ ) ,τ ( θ ) ≤ µ x ( τ − 1) ,τ − 1 ( θ ) m P p =1 min {{ µ j,τ − 1 ( θ p ) } j ∈N x ( τ ) [ τ − 1] ∪{ x ( τ ) } , π x ( τ ) ,τ ( θ p ) } ≤ µ x ( τ − 1) ,τ − 1 ( θ ) η x ( τ ) ,τ ( θ ? ) . (40) Using the ab o v e inequality recursively with τ ∈ { a i,t ( θ ) + 1 , . . . , t } immediately leads to ( 38 ). Pr o of. (Theorem 1) : Fix a false hypothesis θ ∈ Θ \ { θ ? } . Based on the assumption of global iden tifiability , note that the set S ( θ ? , θ ) is non-empt y . Recall that v θ is any agent for whic h K i ( θ ? , θ ) , i ∈ S ( θ ? , θ ) is maximum, and note that we hav e already established that the assertion of Theorem 1 , namely inequality ( 5 ), holds for agen t v θ in Lemma 3 . Now consider an agent i ∈ V \ { v θ } , and notice that if t ≥ ( n − 1) T , then c i,t ( θ ) is uniformly b ounded based on Lemma 4 . Th us, the assertions in Lemma 5 hold for all t ≥ ( n − 1) T . 
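Rules (i)-(iii) amount to a breadth-first, shortest-path-style recursion over the time-varying graph, and the bound of Lemma 4 can be checked numerically. The sketch below is purely illustrative and not taken from the paper: the switching topology (a 4-agent directed cycle revealed one edge per step, so that the union over any window of $T = 4$ steps is strongly connected) and all variable names are assumptions.

```python
import math

def step_delays(c, in_nbrs, v_theta):
    """One application of rules (i)-(iii): the reference agent keeps delay 0;
    every other agent takes the minimum delay over its in-neighborhood
    (including itself) and adds 1."""
    n = len(c)
    nxt = [0.0] * n
    for i in range(n):
        if i == v_theta:
            nxt[i] = 0                                         # rule (i)
        else:
            nxt[i] = min(c[j] for j in in_nbrs[i] | {i}) + 1   # rule (iii)
    return nxt

# illustrative switching topology: a directed 4-cycle revealed one edge per
# step, so the union over any T = 4 consecutive steps is strongly connected
n, T, v_theta = 4, 4, 0
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
c = [0.0] + [math.inf] * (n - 1)                               # rules (i), (ii)
history = []
for t in range(10 * T):
    u, w = edges[t % T]                     # the single edge active at time t
    in_nbrs = [set() for _ in range(n)]
    in_nbrs[w].add(u)                       # u is an in-neighbor of w
    c = step_delays(c, in_nbrs, v_theta)
    history.append(list(c))

# Lemma 4: once t >= (n-1)T, every delay is at most 2(n-1)T
for c_t in history[(n - 1) * T:]:
    assert all(ci <= 2 * (n - 1) * T for ci in c_t)
```

In this toy run the delays settle into a periodic pattern well below the worst-case bound $2(n-1)T$, which is what the lemma guarantees in general.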
Taking the natural log on both sides of (38), dividing throughout by $t$, and simplifying, we obtain the following for all $t \geq (n-1)T$:

$$-\frac{\log \mu_{i,t}(\theta)}{t} \geq -\frac{\log \mu_{v_\theta, a_{i,t}(\theta)}(\theta)}{t} + \sum_{\tau = a_{i,t}(\theta)+1}^{t} \frac{\log \eta_{x(\tau),\tau}(\theta^\star)}{t}, \tag{41}$$

where $a_{i,t}(\theta) = t - c_{i,t}(\theta)$, $\eta_{i,t}(\theta^\star)$ is as defined in (39), and $x(\tau), \tau \in \{a_{i,t}(\theta)+1, \ldots, t\}$, are the agents in the path linking $v_\theta$ to $i$ over $[a_{i,t}(\theta), t-1]$. For the remainder of the proof, to lighten the notation, let us drop the subscript on $v_\theta$, and let $a(t) = a_{i,t}(\theta)$. Based on (4), we then have:

$$\mu_{v,a(t)}(\theta) \leq \frac{\pi_{v,a(t)}(\theta)}{\eta_{v,a(t)}(\theta^\star)}. \tag{42}$$

A bit of straightforward algebra then yields:

$$-\frac{\log \mu_{v,a(t)}(\theta)}{t} \geq -\frac{\log \pi_{v,t}(\theta)}{t} + \frac{1}{t}\log \frac{\pi_{v,t}(\theta)}{\pi_{v,a(t)}(\theta)} + \frac{\log \eta_{v,a(t)}(\theta^\star)}{t}. \tag{43}$$

Combining (41) and (43), we obtain for $t \geq (n-1)T$:

$$-\frac{\log \mu_{i,t}(\theta)}{t} \geq -\frac{\log \pi_{v,t}(\theta)}{t} + b(t), \tag{44}$$

where $b(t) = b_1(t) + b_2(t) + b_3(t)$,

$$b_1(t) = \sum_{\tau = a(t)+1}^{t} \frac{\log \eta_{x(\tau),\tau}(\theta^\star)}{t}, \quad b_2(t) = \frac{1}{t}\log \frac{\pi_{v,t}(\theta)}{\pi_{v,a(t)}(\theta)}, \tag{45}$$

and

$$b_3(t) = \frac{\log \eta_{v,a(t)}(\theta^\star)}{t}. \tag{46}$$

We now argue that each of the terms $b_1(t)$, $b_2(t)$ and $b_3(t)$ converges to 0 almost surely as $t \to \infty$. To do so, recall that the set $\bar{\Omega} \subseteq \Omega$ in Lemma 2 has measure 1. In what follows, we prove that $b_1(t)$, $b_2(t)$ and $b_3(t)$ converge to 0 for each sample path $\omega \in \bar{\Omega}$. Accordingly, fix $\omega \in \bar{\Omega}$, and recall $\eta(\omega) \in (0,1)$ and $t_0(\omega) \in (0, \infty)$ from Lemma 2. Suppose $t > t_0(\omega) + 2\bar{T}$, where $\bar{T} = (n-1)T$. We then claim the following:

$$\pi_{l,\tau}(\theta^\star) \geq \eta(\omega), \quad \mu_{l,\tau}(\theta^\star) \geq \eta(\omega), \ \forall l \in \mathcal{V}, \ \forall \tau \geq a(t). \tag{47}$$

To see why this is true, notice that based on Lemma 4, the following holds when $t > t_0(\omega) + 2\bar{T}$:

$$a(t) = t - c_{i,t}(\theta) \geq t - 2\bar{T} > t_0(\omega). \tag{48}$$

The claim regarding (47) then follows readily from equation (24) in Lemma 2. Based on the above discussion, and referring to (39), we immediately note that when $t > t_0(\omega) + 2\bar{T}$,

$$\eta_{l,\tau}(\theta^\star) \geq \eta(\omega), \ \forall l \in \mathcal{V}, \ \forall \tau \geq a(t). \tag{49}$$

For establishing the convergence of $b_1(t)$, $b_2(t)$ and $b_3(t)$, suppose $t > t_0(\omega) + 2\bar{T}$. Regarding $b_1(t)$, we then observe:

$$|b_1(t)| = \Bigg| \sum_{\tau=a(t)+1}^{t} \frac{\log \eta_{x(\tau),\tau}(\theta^\star)}{t} \Bigg| \overset{(a)}{\leq} \sum_{\tau=a(t)+1}^{t} \frac{\big|\log \eta_{x(\tau),\tau}(\theta^\star)\big|}{t} \overset{(b)}{\leq} \frac{t - a(t)}{t} \log \frac{1}{\eta(\omega)} \overset{(c)}{\leq} \frac{2\bar{T}}{t} \log \frac{1}{\eta(\omega)}, \tag{50}$$

where (a) follows from the triangle inequality, (b) follows from (49), and (c) follows from (48). From (50), we immediately note that $b_1(t) \to 0$ along $\omega$. Let us now turn our attention to $b_2(t)$, and take note of the following:

$$|b_2(t)| \overset{(a)}{=} \frac{1}{t} \Bigg| \log \frac{\pi_{v,t}(\theta^\star)}{\pi_{v,a(t)}(\theta^\star)} + \sum_{\tau=a(t)+1}^{t} \log \frac{l_v(s_{v,\tau} \mid \theta)}{l_v(s_{v,\tau} \mid \theta^\star)} \Bigg| \overset{(b)}{\leq} \frac{1}{t} \Bigg| \log \frac{\pi_{v,t}(\theta^\star)}{\pi_{v,a(t)}(\theta^\star)} \Bigg| + \frac{1}{t} \sum_{\tau=a(t)+1}^{t} \Bigg| \log \frac{l_v(s_{v,\tau} \mid \theta)}{l_v(s_{v,\tau} \mid \theta^\star)} \Bigg| \overset{(c)}{\leq} \frac{2}{t} \log \frac{1}{\eta(\omega)} + \frac{(t - a(t)) L}{t} \overset{(d)}{\leq} \frac{2}{t} \Big( \log \frac{1}{\eta(\omega)} + L\bar{T} \Big), \tag{51}$$

where (a) follows from (22) and some simple manipulations, (b) is a consequence of the triangle inequality, (c) follows from (1) and (47), and (d) follows from (48). Based on (51), we then note that $b_2(t) \to 0$ along $\omega$. Finally, the fact that $b_3(t)$ converges to 0 along $\omega$ follows immediately by appealing to (49). We have thus established that $b(t) \to 0$ almost surely. The desired conclusion then follows by taking the limit inferior on both sides of (44), and noting that

$$\lim_{t \to \infty} -\frac{\log \pi_{v,t}(\theta)}{t} = \lim_{t \to \infty} -\frac{\rho_{v,t}(\theta)}{t} = K_v(\theta^\star, \theta) \ \text{a.s.}, \tag{52}$$

where $\rho_{v,t}(\theta)$ is as defined in Lemma 1. The fact that $\mu_{i,t}(\theta) \to 0$ is immediate, since $K_v(\theta^\star, \theta) > 0$ based on global identifiability. The above analysis applies identically to each $\theta \in \Theta \setminus \{\theta^\star\}$.
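To make the mechanics above concrete, the following sketch simulates a min-rule of the form implied by the bound (40): each agent's actual belief is the minimum of its in-neighbors' actual beliefs and its own freshly updated local Bayesian belief, normalized over the hypotheses. The exact update rule (4) is only referenced, not reproduced, in this appendix, so this form is an inference from (40); the Bernoulli observation models, the 3-agent directed ring, and all numbers are illustrative assumptions. Only agent 0 can distinguish the two hypotheses, yet every agent rules out the false one.

```python
import random

random.seed(1)
n, THETA = 3, 2                              # agents, hypotheses; theta_star = 0
# assumed Bernoulli observation models: p[i][theta] = P(s_i = 1 | theta);
# only agent 0 has an informative likelihood for this pair of hypotheses
p = [[0.9, 0.4], [0.5, 0.5], [0.6, 0.6]]
nbrs = [{2}, {0}, {1}]                       # static directed ring (strongly connected)

pi = [[0.5, 0.5] for _ in range(n)]          # local (Bayesian) beliefs
mu = [[0.5, 0.5] for _ in range(n)]          # actual beliefs, updated by the min-rule

for t in range(500):
    # each agent observes a private signal generated under theta_star = 0
    s = [1 if random.random() < p[i][0] else 0 for i in range(n)]
    for i in range(n):
        # local Bayesian update of pi_i using the agent's own signal only
        post = [pi[i][th] * (p[i][th] if s[i] else 1 - p[i][th]) for th in range(THETA)]
        z = sum(post)
        pi[i] = [x / z for x in post]
    new_mu = []
    for i in range(n):
        # min-rule: minimum over neighbors' (and own) actual beliefs and the
        # updated local belief, then normalization over the hypothesis set
        raw = [min(min(mu[j][th] for j in nbrs[i] | {i}), pi[i][th]) for th in range(THETA)]
        z = sum(raw)
        new_mu.append([x / z for x in raw])
    mu = new_mu

# every agent rules out the false hypothesis (index 1) exponentially fast
assert all(mu[i][1] < 1e-6 for i in range(n))
```

The decay of each agent's belief on the false hypothesis tracks agent 0's local learning rate up to a bounded delay, which is exactly the mechanism the proof of Theorem 1 formalizes.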
This establishes consistency of our rule, and completes the proof.

B Proof of Theorem 2

To prove Theorem 2, we will make use of one of Littlewood's three principles: every pointwise convergent sequence of measurable functions is nearly uniformly convergent.

Theorem 6. (Egoroff's Theorem) [40, Chapter 18] Let $(X, \mathcal{M}, \mu)$ be a finite measure space and $\{f_n\}$ a sequence of measurable functions on $X$ that converge pointwise a.e. (almost everywhere) on $X$ to a function $f$ that is finite a.e. on $X$. Then for each $\epsilon > 0$, there is a measurable subset $X_\epsilon$ of $X$ for which $f_n \to f$ uniformly on $X_\epsilon$, and $\mu(X_\epsilon) \geq 1 - \epsilon$.

Proof. (Theorem 2): Consider a $\theta \in \Theta \setminus \{\theta^\star\}$, and recall that $K_{v_\theta}(\theta^\star, \theta) = \max_{l \in \mathcal{S}(\theta^\star,\theta)} K_l(\theta^\star, \theta) = \bar{K}(\theta^\star, \theta)$. We only prove the result for $i \in \mathcal{V} \setminus \{v_\theta\}$, since the argument for agent $v_\theta$ is similar. To this end, let us fix an agent $i \in \mathcal{V} \setminus \{v_\theta\}$. We adhere to the notation used in the proof of Lemma 1, and for simplicity assume that the initial local belief vectors $\pi_{i,0}, i \in \mathcal{V}$, are uniform distributions over the hypothesis set $\Theta$; our subsequent arguments will continue to hold (with simple modifications) under the more general assumption on priors in line 1 of Algorithm 1. We immediately note that based on the assumption of uniform priors, $\rho_{i,0}(\theta) = 0, \forall i \in \mathcal{V}$. Now referring to inequality (44) in the proof of Theorem 1, we obtain the following for $t \geq (n-1)T$:

$$\mathbb{P}^{\theta^\star}\Big( -\frac{\log \mu_{i,t}(\theta)}{t} \leq \bar{K}(\theta^\star,\theta) - \frac{\epsilon}{2} + b(t) \Big) \overset{(a)}{\leq} \mathbb{P}^{\theta^\star}\Big( -\frac{\log \pi_{v_\theta,t}(\theta)}{t} \leq \bar{K}(\theta^\star,\theta) - \frac{\epsilon}{2} \Big) \overset{(b)}{\leq} \mathbb{P}^{\theta^\star}\Big( -\frac{\rho_{v_\theta,t}(\theta)}{t} \leq \bar{K}(\theta^\star,\theta) - \frac{\epsilon}{2} \Big) \overset{(c)}{=} \mathbb{P}^{\theta^\star}\Bigg( \frac{1}{t}\sum_{k=1}^{t} \lambda_{v_\theta,k}(\theta) - \big(-K_{v_\theta}(\theta^\star,\theta)\big) \geq \frac{\epsilon}{2} \Bigg) \overset{(d)}{\leq} \exp\Big(-\frac{\epsilon^2 t}{8L^2}\Big). \tag{53}$$

In the above steps, (a) follows directly from (44), and (b) follows by noting that, based on the definition of $\rho_{v_\theta,t}(\theta)$,

$$\frac{\log \pi_{v_\theta,t}(\theta)}{t} \leq \frac{\rho_{v_\theta,t}(\theta)}{t}, \ \forall t \in \mathbb{N}. \tag{54}$$

Step (c) follows directly from (22) with $\rho_{v_\theta,0}(\theta) = 0$. Finally, noting that $\frac{1}{t}\sum_{k=1}^{t} \lambda_{v_\theta,k}(\theta) \to -K_{v_\theta}(\theta^\star,\theta)$ a.s. (as argued in the proof of Lemma 1), using the fact that $|\lambda_{v_\theta,t}(\theta)| \leq L, \forall t \in \mathbb{N}_+$, based on (1), and applying Hoeffding's inequality [41, Theorem 2], leads to (d). Now recall from the proof of Theorem 1 that $b(t) \to 0$ almost surely. Appealing to Egoroff's theorem, we then infer that given any arbitrarily small $\delta \in (0,1)$, there exists a set $\Omega_0(\delta) \subseteq \Omega$ of $\mathbb{P}^{\theta^\star}$-measure at least $(1-\delta)$, such that $b(t)$ converges to 0 uniformly on $\Omega_0(\delta)$. Thus, given any $\epsilon > 0$, there exists an $\omega$-independent constant $t(\epsilon, \delta) \in (0, \infty)$ such that $|b(t)| \leq \frac{\epsilon}{2}, \forall t \geq t(\epsilon,\delta)$, along each sample path $\omega \in \Omega_0(\delta)$. Setting $t_0(\epsilon, \delta, n, T) = \max\{t(\epsilon,\delta), (n-1)T\}$, and referring to (53), we immediately obtain that $\forall t \geq t_0(\epsilon,\delta,n,T)$,

$$\mathbb{P}^{\theta^\star}\Big( \Big\{ -\frac{\log \mu_{i,t}(\theta)}{t} \leq \bar{K}(\theta^\star,\theta) - \epsilon \Big\} \cap \Omega_0(\delta) \Big) \leq \mathbb{P}^{\theta^\star}\Big( \Big\{ -\frac{\log \mu_{i,t}(\theta)}{t} \leq \bar{K}(\theta^\star,\theta) - \frac{\epsilon}{2} + b(t) \Big\} \cap \Omega_0(\delta) \Big) \leq \mathbb{P}^{\theta^\star}\Big( -\frac{\log \mu_{i,t}(\theta)}{t} \leq \bar{K}(\theta^\star,\theta) - \frac{\epsilon}{2} + b(t) \Big) \leq \exp\Big(-\frac{\epsilon^2 t}{8L^2}\Big). \tag{55}$$

Taking the natural log on both sides of the resulting inequality, dividing throughout by $t$, simplifying, and then taking the limit inferior on both sides, leads to the desired result.

C Proof of Theorem 5

Proof. Consider an $f$-local adversarial set $\mathcal{A} \subset \mathcal{V}$, and let $\mathcal{R} = \mathcal{V} \setminus \mathcal{A}$. We study two separate cases.

Case 1: Consider a regular agent $i \in \mathcal{R}$ such that $|\mathcal{N}_i| < (2f+1)$. Based on the hypothesis of the theorem, we claim that $i \in \mathcal{S}(\theta_p, \theta_q)$ for every pair $\theta_p, \theta_q \in \Theta$. We prove this claim via contradiction. To do so, suppose there exists a pair $\theta_p, \theta_q \in \Theta$ such that $i \in \mathcal{V} \setminus \mathcal{S}(\theta_p, \theta_q)$. As $|\mathcal{N}_i| < (2f+1)$, the set $\{i\}$ is clearly not $(2f+1)$-reachable (see Def. 2). Thus, $\mathcal{G}$ is not strongly $(2f+1)$-robust w.r.t. the source set $\mathcal{S}(\theta_p, \theta_q)$, a fact that contradicts the hypothesis of the theorem. Thus, we have established that if the graph-theoretic condition identified in the theorem is met, then regular agents with fewer than $(2f+1)$ neighbors can distinguish between every pair of hypotheses. For such agents, the assertion of the theorem then follows directly from Lemma 1, and update rules (3) and (14).

Case 2: We now focus only on regular agents $i$ satisfying $|\mathcal{N}_i| \geq (2f+1)$. A key property of the LFRHE algorithm (Algo. 2) that will be used throughout the proof is as follows. For any $i \in \mathcal{R}$ and any $\theta \in \Theta$, the filtering operation in line 7 of Algo. 2 ensures that at each $t \in \mathbb{N}$, we have

$$\mu_{j,t}(\theta) \in \mathcal{C}onv(\Psi^\theta_{i,t}), \ \forall j \in \mathcal{M}^\theta_{i,t}, \tag{56}$$

where

$$\Psi^\theta_{i,t} \triangleq \{\mu_{l,t}(\theta) : l \in \mathcal{N}_i \cap \mathcal{R}\}, \tag{57}$$

and $\mathcal{C}onv(\Psi^\theta_{i,t})$ denotes the convex hull formed by the points in the set $\Psi^\theta_{i,t}$ (recall that $\mathcal{M}^\theta_{i,t}$ was defined in line 8 of Algo. 2 to be the set of agents in $\mathcal{N}_i$ whose beliefs are retained by agent $i$ after it removes the highest $f$ and lowest $f$ beliefs $\mu_{j,t}(\theta), j \in \mathcal{N}_i$). In words, any neighboring belief (on a particular hypothesis) that agent $i$ uses in the update rule (13) lies in the convex hull of the actual beliefs of its regular neighbors (on that particular hypothesis). To see why (56) is true, partition the neighbor set $\mathcal{N}_i$ of a regular agent into three sets $\mathcal{U}^\theta_{i,t}$, $\mathcal{M}^\theta_{i,t}$, and $\mathcal{J}^\theta_{i,t}$ as follows. Sets $\mathcal{U}^\theta_{i,t}$ and $\mathcal{J}^\theta_{i,t}$ are each of cardinality $f$, and contain the neighbors of agent $i$ that transmit the highest $f$ and the lowest $f$ actual beliefs, respectively, on the hypothesis $\theta$ to agent $i$ at time-step $t$. The set $\mathcal{M}^\theta_{i,t}$ contains the remaining neighbors of agent $i$, and is non-empty at every time-step since $|\mathcal{N}_i| \geq (2f+1)$. If $\mathcal{M}^\theta_{i,t} \cap \mathcal{A} = \emptyset$, then (56) holds trivially. Thus, consider the case when there are adversaries in the set $\mathcal{M}^\theta_{i,t}$, i.e., $\mathcal{M}^\theta_{i,t} \cap \mathcal{A} \neq \emptyset$.
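The partition into $\mathcal{U}^\theta_{i,t}$, $\mathcal{M}^\theta_{i,t}$, $\mathcal{J}^\theta_{i,t}$ and the resulting sandwich property (56) can be illustrated directly. The sketch below uses hypothetical neighbor ids and belief values (with $f = 1$); it discards the highest $f$ and lowest $f$ reported beliefs and checks that every retained value, including one reported by a retained adversary, lies in the convex hull (here, an interval) spanned by the regular neighbors' beliefs.

```python
def lfr_filter(beliefs, f):
    """Sort the reported beliefs on one hypothesis and discard the ids
    transmitting the highest f and the lowest f values; the returned set
    plays the role of M (the retained neighbors)."""
    order = sorted(beliefs, key=beliefs.get)     # neighbor ids ordered by value
    return set(order[f:len(order) - f])

# hypothetical reports from 5 neighbors (|N_i| = 5 >= 2f + 1 for f = 1);
# neighbor 4 is adversarial but reports a value inside the regular range
beliefs = {0: 0.30, 1: 0.55, 2: 0.40, 3: 0.70, 4: 0.50}
regular = {0, 1, 2, 3}
M = lfr_filter(beliefs, f=1)

lo = min(beliefs[j] for j in regular)
hi = max(beliefs[j] for j in regular)
# property (56): every retained belief lies between the smallest and largest
# regular beliefs, even though the adversary (id 4) was retained in M
assert 4 in M
assert all(lo <= beliefs[j] <= hi for j in M)
```

An adversary can only influence the retained set by reporting a value sandwiched between two regular reports, which is precisely the argument that follows.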
Given the $f$-locality of the adversarial model, and the nature of the filtering operation in the LFRHE algorithm, we infer that for each $j \in \mathcal{M}^\theta_{i,t} \cap \mathcal{A}$, there exist regular agents $u, v \in \mathcal{N}_i \cap \mathcal{R}$ such that $u \in \mathcal{U}^\theta_{i,t}$, $v \in \mathcal{J}^\theta_{i,t}$, and $\mu_{v,t}(\theta) \leq \mu_{j,t}(\theta) \leq \mu_{u,t}(\theta)$. This establishes our claim regarding equation (56).

With the above property in hand, let $\bar{\Omega} \subseteq \Omega$ denote the set of sample paths for which assertions (i)-(iii) in Lemma 1 (Appendix A) hold when restricted to the set of regular agents $\mathcal{R}$. Since the evolution of the local beliefs is unaffected by the presence of adversaries, Lemma 1 implies $\mathbb{P}^{\theta^\star}(\bar{\Omega}) = 1$. Now, as in Lemma 2, fix a sample path $\omega \in \bar{\Omega}$. Define $\gamma_1 \triangleq \min_{i \in \mathcal{R}} \pi_{i,0}(\theta^\star)$, pick a small number $\delta > 0$ satisfying $\delta < \gamma_1$, and observe that arguments similar to those in the proof of Lemma 2 imply the existence of a time-step $t_0(\omega)$ such that for all $t \geq t_0(\omega)$, $\pi_{i,t}(\theta^\star) \geq \gamma_1 - \delta > 0, \forall i \in \mathcal{R}$. Let $\gamma_2(\omega) \triangleq \min_{i \in \mathcal{R}} \{\mu_{i,t_0(\omega)}(\theta^\star)\}$. As before, we claim $\gamma_2(\omega) > 0$. To establish this claim, we need to answer the following question: can an adversarial agent cause its out-neighbors to set their actual beliefs on $\theta^\star$ to 0 by setting its own actual belief on $\theta^\star$ to 0? We argue that this is impossible under the LFRHE algorithm. By way of contradiction, suppose there exists a time-step $\bar{t}(\omega)$ satisfying:

$$\bar{t}(\omega) = \min\{t \in \mathbb{N} : \exists i \in \mathcal{R} \text{ with } \mu_{i,t}(\theta^\star) = 0\}. \tag{58}$$

In words, $\bar{t}(\omega)$ represents the first time-step at which some regular agent $i$ sets its actual belief on the true hypothesis to zero. Clearly, $\bar{t}(\omega) \neq 0$ based on line 1 of Algo. 2. Suppose $\bar{t}(\omega)$ is some positive integer, and focus on how agent $i$ updates $\mu_{i,\bar{t}(\omega)}(\theta^\star)$ based on (13). Following similar arguments as in the proof of Lemma 2, we know that $\pi_{i,t}(\theta^\star) > 0, \forall t \in \mathbb{N}, \forall i \in \mathcal{R}$. At the same time, every belief featuring in the set $\Psi^{\theta^\star}_{i,\bar{t}(\omega)-1}$ (as defined in equation (57)) is strictly positive based on the way $\bar{t}(\omega)$ is defined. In light of the above arguments, and based on (56), (57), we infer:

$$\min\big\{ \{\mu_{j,\bar{t}(\omega)-1}(\theta^\star)\}_{j \in \mathcal{M}^{\theta^\star}_{i,\bar{t}(\omega)-1}}, \ \pi_{i,\bar{t}(\omega)}(\theta^\star) \big\} > 0. \tag{59}$$

Thus, based on (13), we must have $\mu_{i,\bar{t}(\omega)}(\theta^\star) > 0$, yielding the desired contradiction. With $\eta(\omega) \triangleq \min\{\gamma_1 - \delta, \gamma_2(\omega)\} > 0$, one can easily verify the following by referring to (13):

$$\mu_{i,t}(\theta^\star) \geq \eta(\omega), \ \forall t \geq t_0(\omega), \ \forall i \in \mathcal{R}. \tag{60}$$

In particular, (60) follows by (i) noting that for each $i \in \mathcal{R}$, $\pi_{i,t_0(\omega)+1}(\theta^\star) \geq \eta(\omega)$, and each belief featuring in the set $\Psi^{\theta^\star}_{i,t_0(\omega)}$ is lower bounded by $\eta(\omega)$, (ii) leveraging (56), (57), and (iii) using a similar string of arguments as those used to arrive at (25). Thus, we have established an analogue of Lemma 2 for the regular agents.

To proceed, let us fix a false hypothesis $\theta \neq \theta^\star$, and define $\tilde{K}(\theta^\star,\theta) \triangleq \min_{v \in \mathcal{S}(\theta^\star,\theta) \cap \mathcal{R}} K_v(\theta^\star,\theta)$. Then, given any $\epsilon > 0$, Lemma 1 implies the existence of a time-step $\tilde{t}_1(\omega, \theta, \epsilon)$ such that:

$$\pi_{i,t}(\theta) < e^{-(\tilde{K}(\theta^\star,\theta)-\epsilon)t}, \ \forall t \geq \tilde{t}_1(\omega,\theta,\epsilon), \ \forall i \in \mathcal{S}(\theta^\star,\theta) \cap \mathcal{R}. \tag{61}$$

Let $\tilde{t}_2 = \max\{t_0(\omega), \tilde{t}_1(\omega,\theta,\epsilon)\}$, where we have suppressed the dependence of $\tilde{t}_2$ on $\omega$, $\theta$ and $\epsilon$. For any agent $i \in \mathcal{S}(\theta^\star,\theta) \cap \mathcal{R}$, observe that based on (56), (57) and (60),

$$\min\big\{ \{\mu_{j,t}(\theta^\star)\}_{j \in \mathcal{M}^{\theta^\star}_{i,t}}, \ \pi_{i,t+1}(\theta^\star) \big\} \geq \eta(\omega), \ \forall t \geq \tilde{t}_2. \tag{62}$$

Combining the above with a similar line of argument as used to arrive at (28), we obtain:

$$\mu_{i,t}(\theta) < C_1(\omega) e^{-(\tilde{K}(\theta^\star,\theta)-\epsilon)t}, \ \forall t \geq \tilde{t}_2 + 1, \ \forall i \in \mathcal{S}(\theta^\star,\theta) \cap \mathcal{R}, \tag{63}$$

where $C_1(\omega) = \eta(\omega)^{-1}$. If $\mathcal{V} \setminus \mathcal{S}(\theta^\star,\theta)$ is empty, then we are essentially done. Else, define

$$L_1(\theta^\star,\theta) \triangleq \{i \in \mathcal{V} \setminus \mathcal{S}(\theta^\star,\theta) : |\mathcal{N}_i \cap \mathcal{S}(\theta^\star,\theta)| \geq (2f+1)\}. \tag{64}$$

Whenever $\mathcal{V} \setminus \mathcal{S}(\theta^\star,\theta)$ is non-empty, we claim that $L_1(\theta^\star,\theta)$ (as defined above) is also non-empty based on the hypothesis of the theorem. To see this, note that if $L_1(\theta^\star,\theta)$ were empty, then $\mathcal{C} = \mathcal{V} \setminus \mathcal{S}(\theta^\star,\theta)$ would not be $(2f+1)$-reachable, violating the fact that $\mathcal{G}$ is strongly $(2f+1)$-robust w.r.t. $\mathcal{S}(\theta^\star,\theta)$. We claim that the following holds for each $i \in L_1(\theta^\star,\theta) \cap \mathcal{R}$:

$$\min_{j \in \mathcal{M}^\theta_{i,t}} \mu_{j,t}(\theta) < C_1(\omega) e^{-(\tilde{K}(\theta^\star,\theta)-\epsilon)t}, \ \forall t \geq \tilde{t}_2 + 1. \tag{65}$$

To verify the above claim, pick any agent $i \in L_1(\theta^\star,\theta) \cap \mathcal{R}$, and suppose $t \geq \tilde{t}_2 + 1$. When $|\mathcal{M}^\theta_{i,t} \cap \{\mathcal{S}(\theta^\star,\theta) \cap \mathcal{R}\}| > 0$, the claim follows immediately based on (63). Consider the case when $|\mathcal{M}^\theta_{i,t} \cap \{\mathcal{S}(\theta^\star,\theta) \cap \mathcal{R}\}| = 0$. Since $i \in L_1(\theta^\star,\theta)$, it has at least $(2f+1)$ neighbors in $\mathcal{S}(\theta^\star,\theta)$, out of which at least $f+1$ are regular based on the $f$-locality of the adversarial model. Since the set $\mathcal{J}^\theta_{i,t}$ has cardinality $f$, it must then be that $|\mathcal{U}^\theta_{i,t} \cap \{\mathcal{S}(\theta^\star,\theta) \cap \mathcal{R}\}| > 0$. Let $u \in \mathcal{U}^\theta_{i,t} \cap \{\mathcal{S}(\theta^\star,\theta) \cap \mathcal{R}\}$. Based on the way $\mathcal{M}^\theta_{i,t}$ is defined, it must be that $\mu_{j,t}(\theta) \leq \mu_{u,t}(\theta) < C_1(\omega) e^{-(\tilde{K}(\theta^\star,\theta)-\epsilon)t}, \forall j \in \mathcal{M}^\theta_{i,t}$, where the last inequality follows from (63). This establishes our claim regarding (65). Now consider the update of $\mu_{i,t+1}(\theta)$ based on (13), when $t \geq \tilde{t}_2 + 1$. In light of the above arguments, the numerator of the fraction on the RHS of (13) is upper-bounded by $C_1(\omega) e^{-(\tilde{K}(\theta^\star,\theta)-\epsilon)t}$, while the denominator is lower-bounded by $\eta(\omega)$. We conclude that for all $i \in L_1(\theta^\star,\theta) \cap \mathcal{R}$:

$$\mu_{i,t}(\theta) < (C_1(\omega))^2 \, C_2(\theta,\epsilon) \, e^{-(\tilde{K}(\theta^\star,\theta)-\epsilon)t}, \ \forall t \geq \tilde{t}_2 + 2, \tag{66}$$

where $C_2(\theta,\epsilon) = e^{(\tilde{K}(\theta^\star,\theta)-\epsilon)}$. With $L_0(\theta^\star,\theta) \triangleq \mathcal{S}(\theta^\star,\theta)$, we recursively define the sets $L_r(\theta^\star,\theta), 1 \leq r \leq (n-1)$, as:

$$L_r(\theta^\star,\theta) \triangleq \Big\{ i \in \mathcal{V} \setminus \bigcup_{q=0}^{r-1} L_q(\theta^\star,\theta) : \Big| \mathcal{N}_i \cap \Big\{ \bigcup_{q=0}^{r-1} L_q(\theta^\star,\theta) \Big\} \Big| \geq (2f+1) \Big\}.$$
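The recursion above peels the vertex set into layers, each reachable from the union of earlier layers through at least $2f+1$ neighbors. A minimal sketch of this peeling is given below; the 7-node graph and its in-neighbor sets are hypothetical, chosen so that the source set feeds every remaining agent.

```python
def robust_layers(nodes, source, in_nbrs, f):
    """Peel the graph into layers: L_0 = source, and L_r collects the agents
    outside earlier layers with at least 2f+1 in-neighbors inside them."""
    layers, covered = [set(source)], set(source)
    while covered != set(nodes):
        nxt = {i for i in set(nodes) - covered
               if len(in_nbrs[i] & covered) >= 2 * f + 1}
        if not nxt:
            # no new layer: the graph is not strongly (2f+1)-robust
            # w.r.t. the source set
            break
        layers.append(nxt)
        covered |= nxt
    return layers

# hypothetical example with f = 1: source set S = {0, 1, 2} reaches every
# other agent through at least 2f + 1 = 3 in-neighbors, layer by layer
in_nbrs = {0: set(), 1: set(), 2: set(),
           3: {0, 1, 2}, 4: {0, 1, 3}, 5: {2, 3, 4}, 6: {3, 4, 5}}
layers = robust_layers(range(7), {0, 1, 2}, in_nbrs, f=1)
assert layers == [{0, 1, 2}, {3}, {4}, {5}, {6}]
```

The union of the layers covers the whole vertex set, mirroring the fact used at the end of the proof that $\bigcup_{q=0}^{n-1} L_q(\theta^\star,\theta) = \mathcal{R}$ (up to the adversarial agents).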
We claim that the following is true for all $i \in L_r(\theta^\star,\theta) \cap \mathcal{R}$:

$$\mu_{i,t}(\theta) < (C_1(\omega))^{r+1} (C_2(\theta,\epsilon))^{r} e^{-(\tilde{K}(\theta^\star,\theta)-\epsilon)t}, \ \forall t \geq \tilde{t}_2 + (r+1). \tag{68}$$

To prove the claim, we proceed via induction on $r$. The base cases when $r \in \{0, 1\}$ have already been established. Suppose equation (68) holds for all $r \in \{0, \ldots, m-1\}$, where $m \in \{2, \ldots, n-1\}$. The claim easily extends to the case $r = m$ by noting that (i) $L_m(\theta^\star,\theta)$ is non-empty if $\mathcal{V} \setminus \{\bigcup_{q=0}^{m-1} L_q(\theta^\star,\theta)\}$ is non-empty (based on the hypothesis of the theorem), (ii) any agent $i \in L_m(\theta^\star,\theta) \cap \mathcal{R}$ has at least $(2f+1)$ neighbors in the set $\bigcup_{q=0}^{m-1} L_q(\theta^\star,\theta)$, of which at least $f+1$ are regular (based on the $f$-locality of the adversarial model), and (iii) using the induction hypothesis and arguments similar to those used to arrive at (66). We have thus verified the correctness of (68). Now taking the natural log on both sides of (68), dividing throughout by $t$, simplifying, and then taking the limit inferior on both sides of the resulting inequality immediately leads to (15). Finally, to complete the proof, it suffices to note that $\bigcup_{q=0}^{n-1} L_q(\theta^\star,\theta) = \mathcal{R}$.

References

[1] V. V. Veeravalli, T. Basar, and H. V. Poor, "Decentralized sequential detection with a fusion center performing the sequential test," IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 433–442, 1993.
[2] R. Viswanathan and P. K. Varshney, "Distributed detection with multiple sensors Part I. Fundamentals," Proc. of the IEEE, vol. 85, no. 1, pp. 54–63, 1997.
[3] J. N. Tsitsiklis, "Decentralized detection by a large number of sensors," Math. of Control, Signals and Systems, vol. 1, no. 2, pp. 167–182, 1988.
[4] A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi, "Non-Bayesian social learning," Games and Economic Behavior, vol. 76, no. 1, pp. 210–225, 2012.
[5] A. Jadbabaie, P. Molavi, and A. Tahbaz-Salehi, "Information heterogeneity and the speed of learning in social networks," Columbia Bus. Sch. Res. Paper, pp. 13–28, 2013.
[6] Q. Liu, A. Fang, L. Wang, and X. Wang, "Social learning with time-varying weights," Journal of Systems Science and Complexity, vol. 27, no. 3, pp. 581–593, 2014.
[7] K. R. Rad and A. Tahbaz-Salehi, "Distributed parameter estimation in networks," in Proceedings of the 49th IEEE Decision and Control Conference, 2010, pp. 5050–5055.
[8] S. Shahrampour and A. Jadbabaie, "Exponentially fast parameter estimation in networks using distributed dual averaging," in Proc. of the 52nd Decision and Control Conference, 2013, pp. 6196–6201.
[9] S. Shahrampour, A. Rakhlin, and A. Jadbabaie, "Distributed detection: Finite-time analysis and impact of network topology," IEEE Trans. on Autom. Control, vol. 61, no. 11, pp. 3256–3268, 2016.
[10] A. Nedić, A. Olshevsky, and C. A. Uribe, "Fast convergence rates for distributed non-Bayesian learning," IEEE Trans. on Autom. Control, vol. 62, no. 11, pp. 5538–5553, 2017.
[11] ——, "Nonasymptotic convergence rates for cooperative learning over time-varying directed graphs," in Proc. of the American Control Conference. IEEE, 2015, pp. 5884–5889.
[12] A. Lalitha, T. Javidi, and A. Sarwate, "Social learning and distributed hypothesis testing," IEEE Trans. on Information Theory, vol. 64, no. 9, pp. 6161–6179, 2018.
[13] A. Lalitha and T. Javidi, "Large deviation analysis for learning rate in distributed hypothesis testing," in Proc. of the 49th Asilomar Conference on Signals, Systems and Computers. IEEE, 2015, pp. 1065–1069.
[14] L. Su and N. H. Vaidya, "Defending non-Bayesian learning against adversarial attacks," Distributed Computing, pp. 1–13, 2016.
[15] P. Molavi, A. Tahbaz-Salehi, and A. Jadbabaie, "A theory of non-Bayesian social learning," Econometrica, vol. 86, no. 2, pp. 445–490, 2018.
[16] R. Olfati-Saber, E. Franco, E. Frazzoli, and J. S. Shamma, "Belief consensus and distributed hypothesis testing in sensor networks," in Networked Embedded Sens. and Cont. Springer, 2006, pp. 169–182.
[17] V. Saligrama, M. Alanyali, and O. Savas, "Distributed detection in sensor networks with packet losses and finite capacity links," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4118–4132, 2006.
[18] G. L. Gilardoni and M. K. Clayton, "On reaching a consensus using DeGroot's iterative pooling," The Annals of Stat., pp. 391–401, 1993.
[19] N. A. Lynch, Distributed Algorithms. Morgan Kaufmann, 1996.
[20] A. Mitra, J. A. Richards, and S. Sundaram, "A new approach for distributed hypothesis testing with extensions to Byzantine-resilience," in Proceedings of the American Control Conference, 2019.
[21] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
[22] A. Jadbabaie, J. Lin, and A. S. Morse, "Coordination of groups of mobile autonomous agents using nearest neighbor rules," IEEE Trans. on Autom. Control, vol. 48, no. 6, pp. 988–1001, 2003.
[23] A. Nedić and A. Olshevsky, "Distributed optimization over time-varying directed graphs," IEEE Trans. on Autom. Control, vol. 60, no. 3, pp. 601–615, 2014.
[24] P. Molavi, A. Jadbabaie, K. R. Rad, and A. Tahbaz-Salehi, "Reaching consensus with increasing information," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 2, pp. 358–369, 2013.
[25] S. Park and N. C. Martins, "Design of distributed LTI observers for state omniscience," IEEE Trans. on Autom. Control, vol. 62, no. 2, pp. 561–576, 2017.
[26] A. Mitra and S. Sundaram, "Distributed observers for LTI systems," IEEE Trans. on Autom. Control, vol. 63, no. 11, pp. 3689–3704, 2018.
[27] D. Acemoglu, A. Ozdaglar, and A. ParandehGheibi, "Spread of (mis)information in social networks," Games and Economic Behavior, vol. 70, no. 2, pp. 194–227, 2010.
[28] D. Dolev, N. A. Lynch, S. S. Pinter, E. W. Stark, and W. E. Weihl, "Reaching approximate agreement in the presence of faults," Journal of the ACM (JACM), vol. 33, no. 3, pp. 499–516, 1986.
[29] N. H. Vaidya, L. Tseng, and G. Liang, "Iterative approximate Byzantine consensus in arbitrary directed graphs," in Proc. of the ACM Symp. on Principles of Distributed Computing, 2012, pp. 365–374.
[30] H. J. LeBlanc, H. Zhang, X. Koutsoukos, and S. Sundaram, "Resilient asymptotic consensus in robust networks," IEEE Journal on Selected Areas in Communications, vol. 31, no. 4, pp. 766–781, 2013.
[31] S. M. Dibaji and H. Ishii, "Resilient consensus of second-order agent networks: Asynchronous update rules with delays," Automatica, vol. 81, pp. 123–132, 2017.
[32] L. Su and N. H. Vaidya, "Fault-tolerant multi-agent optimization: optimal iterative distributed algorithms," in Proc. of the 2016 ACM Symp. on Principles of Dist. Comp. ACM, 2016, pp. 425–434.
[33] S. Sundaram and B. Gharesifard, "Distributed optimization under adversarial nodes," IEEE Trans. on Autom. Control, vol. 64, no. 3, pp. 1063–1076, 2019.
[34] A. Mitra and S. Sundaram, "Byzantine-resilient distributed observers for LTI systems," Automatica, (to appear).
[35] J. Usevitch and D. Panagou, "Resilient leader-follower consensus to arbitrary reference values," in Proc. of the Annual American Control Conference. IEEE, 2018, pp. 1292–1298.
[36] C.-Y. Koo, "Broadcast in radio networks tolerating Byzantine adversarial behavior," in Proc. of the ACM Symposium on Principles of Distributed Computing. ACM, 2004, pp. 275–282.
[37] H. Park and S. A. Hutchinson, "Fault-tolerant rendezvous of multirobot systems," IEEE Trans. on Robotics, vol. 33, no. 3, pp. 565–582, 2017.
[38] N. Vaidya, "Matrix representation of iterative approximate Byzantine consensus in directed graphs," arXiv preprint arXiv:1203.1888, 2012.
[39] W. Mulzer and D. Werner, "Approximating Tverberg points in linear time for any fixed dimension," Discrete & Computational Geometry, vol. 50, no. 2, pp. 520–535, 2013.
[40] H. Royden and P. Fitzpatrick, Real Analysis. Prentice Hall, 2010.
[41] W. Hoeffding, "Probability inequalities for sums of bounded random variables," in The Collected Works of Wassily Hoeffding. Springer, 1994, pp. 409–426.
