Adversarial Source Identification Game with Corrupted Training
Mauro Barni, Fellow, IEEE, and Benedetta Tondi, Student Member, IEEE

Abstract—We study a variant of the source identification game with training data in which part of the training data is corrupted by an attacker. In the addressed scenario, the defender aims at deciding whether a test sequence has been drawn according to a discrete memoryless source $X \sim P_X$, whose statistics are known to him through the observation of a training sequence generated by $X$. In order to undermine the correct decision under the alternative hypothesis that the test sequence has not been drawn from $X$, the attacker can modify a sequence produced by a source $Y \sim P_Y$ up to a certain distortion, and corrupt the training sequence either by adding some fake samples or by replacing some samples with fake ones. We derive the unique rationalizable equilibrium of the two versions of the game in the asymptotic regime and by assuming that the defender bases its decision by relying only on the first order statistics of the test and the training sequences. By mimicking Stein's lemma, we derive the best achievable performance for the defender when the first type error probability is required to tend to zero exponentially fast with an arbitrarily small, yet positive, error exponent. We then use such a result to analyze the ultimate distinguishability of any two sources as a function of the allowed distortion and the fraction of corrupted samples injected into the training sequence.

Index Terms—Hypothesis testing, adversarial signal processing, cybersecurity, game theory, source identification, optimal transportation theory, earth mover distance, adversarial learning, Sanov's theorem.

M. Barni is with the Department of Information Engineering and Mathematics, University of Siena, Via Roma 56, 53100 Siena, Italy, phone: +39 0577 234850 (int. 1005), e-mail: barni@dii.unisi.it; B. Tondi is with the Department of Information Engineering and Mathematics, University of Siena, Via Roma 56, 53100 Siena, Italy, e-mail: benedettatondi@gmail.com.

I. INTRODUCTION

Adversarial Signal Processing (AdvSP) is an emerging discipline aiming at modelling the interplay between a defender wishing to carry out a certain processing task, and an attacker aiming at impeding it [1]. Binary decision in an adversarial setup is one of the most recurrent problems in AdvSP, due to its importance in many application scenarios. Among binary decision problems, source identification is one of the most studied subjects, since it lies at the heart of several security-oriented disciplines, like multimedia forensics, anomaly detection, traffic monitoring, steganalysis and so on. The source identification game has been introduced in [2] to model the interplay between the defender and the attacker by resorting to concepts drawn from game theory and information theory. According to the model put forward in [2], the defender and the attacker have a perfect knowledge of the to-be-distinguished sources. In [3] the analysis is pushed a step forward by considering a scenario in which the sources are known only through the observation of a training sequence.
Finally, [4] introduces the security margin concept, a synthetic parameter characterising the ultimate distinguishability of two sources under adversarial conditions. In this paper, we extend the analysis further, by considering a situation in which the attacker may interfere with the learning phase by corrupting part of the training sequence. Adversarial learning is a rather novel concept, which has been studied for some years from a machine learning perspective [5], [6], [7]. Due to the natural vulnerability of machine learning systems, in fact, the attacker may take an important advantage if no countermeasures are adopted by the defender. The use of a training sequence to gather information about the statistics of the to-be-distinguished sources can be seen as a very simple learning mechanism, and the analysis of the impact that an attack carried out in such a phase has on the performance of a decision system may help shedding new light on this important problem.

To be specific, we extend the game-theoretic framework introduced in [3] and [4] to model a situation in which the attacker is given the possibility of corrupting part of the training sequence. By adopting a game-theoretic perspective, we derive the optimal strategy for the defender and the optimal corruption strategy for the attacker when the lengths of the training sequence and the observed sequence tend to infinity. Given such optimum strategies, expressed in the form of a game equilibrium point, we analyse the best achievable performance when the type I and type II error probabilities tend to zero exponentially fast. Specifically, we study the distinguishability of the sources as a function of the fraction of training samples corrupted by the attacker and when the test sequence can be modified up to a certain distortion level. The results of the analysis are summarised in terms of the blinding corruption level, defined as the fraction of corrupted samples making a reliable distinction between the two sources impossible, and the security margin, defined as the maximum distortion of the observed sequence for which a reliable distinction is possible (see [4]). The analysis is applied to two different scenarios wherein the attacker is allowed, respectively, to add a certain amount of fake samples to the training sequence and to selectively replace a fraction of the samples of the training sequence with fake samples. As we will see, the second case is more favourable to the attacker, since a lower distortion and a lower number of corrupted training samples are enough to prevent a correct decision.
Given the above general framework, the main results proven in this paper can be summarised as follows:

1) We rigorously define the source identification game with addition of corrupted training samples (SI^a_{c-tr} game) and show that such a game is a dominance solvable game admitting an asymptotic equilibrium point when the lengths of the training and test sequences tend to infinity (Theorem 1 and following discussion in Section III);
2) We evaluate the payoff of the game at the equilibrium and derive the expression of the indistinguishability region, defined as the region containing the sources $Y$ which cannot be distinguished from $X$ because of the attack (Theorems 2 and 3, Section III);
3) Given any two sources $X$ and $Y$, we derive the security margin and the blinding corruption level, defined as the maximum distortion introduced into the test sequence and the maximum fraction of fake training samples introduced by the attacker that still allow the distinction of $X$ and $Y$ while ensuring positive error exponents for the two kinds of errors of the test (Theorem 4 and Definition 3 in Section V);
4) We repeat the entire analysis for the source identification game with selective replacement of training samples (SI^r_{c-tr} game), and compare the two versions of the game (Theorem 5 and subsequent discussion in Section VI);
5) The main proofs of the paper rely on a generalised version of Sanov's theorem [8], [9], which is proven in Appendix A. In fact, Theorem 6, and its use to simplify some of the proofs in the paper, can be seen as a further methodological contribution of our work.

This paper considerably extends the analysis presented in [10], by providing a formal proof of the results anticipated in [10] (we also give a more precise formulation of the problem, by correcting some inaccuracies present in [10]) and by making a step forward through the study of a more complex corruption scenario in which the attacker has the freedom to replace a given percentage of the training samples rather than simply adding some fake samples to the original training sequence.

The paper is organised as follows. Section II summarises the notation used throughout the paper, gives some definitions and introduces some basic concepts of game theory that will be used in the sequel. Section III gives a rigorous definition of the SI^a_{c-tr} game, explaining the rationale behind the various assumptions made in the definition. In Section IV, we prove the main theorems of the paper regarding the asymptotic equilibrium point of the SI^a_{c-tr} game and the payoff at the equilibrium. Section V leverages the results proven in Section IV to introduce the concepts of blinding corruption level and security margin, and evaluates them in the setting provided by the SI^a_{c-tr} game. Section VI introduces and solves the SI^r_{c-tr} game, paying attention to comparing the results of the analysis with the corresponding results of the SI^a_{c-tr} game. The paper ends in Section VII, with a summary of the main results proven in the paper and the description of possible directions for future work. In order to avoid burdening the main body of the paper, the most technical details of the proofs are gathered in the Appendix.

II. NOTATION AND DEFINITIONS

In this section, we introduce the notation and definitions used throughout the paper. We will use capital letters to indicate discrete memoryless sources (e.g. $X$).
Sequences of length $n$ drawn from a source will be indicated by the corresponding lowercase letters (e.g. $x^n$); accordingly, $x_i$ will denote the $i$-th element of a sequence $x^n$. The alphabet of an information source will be indicated by the corresponding calligraphic capital letter (e.g. $\mathcal{X}$). The probability mass function (pmf) of a discrete memoryless source $X$ will be denoted by $P_X$. The calligraphic letter $\mathcal{P}$ will be used to indicate the class of all the probability mass functions, namely, the probability simplex in $\mathbb{R}^{|\mathcal{X}|}$. The notation $P_X$ will also be used to indicate the probability measure ruling the emission of sequences from a source $X$, so we will use the expressions $P_X(a)$ and $P_X(x^n)$ to indicate, respectively, the probability of symbol $a \in \mathcal{X}$ and the probability that the source $X$ emits the sequence $x^n$, the exact meaning of $P_X$ being always clearly recoverable from the context wherein it is used. We will use the notation $P_X(A)$ to indicate the probability of a set $A$ (be it a subset of $\mathcal{X}$ or $\mathcal{X}^n$) under the probability measure $P_X$. Finally, the probability of a generic event will be denoted by $\Pr\{\cdot\}$.

Our analysis relies extensively on the concepts of type and type class, defined as follows (see [8] and [11] for more details). Let $x^n$ be a sequence with elements belonging to a finite alphabet $\mathcal{X}$. The type $P_{x^n}$ of $x^n$ is the empirical pmf induced by the sequence $x^n$, i.e. $\forall a \in \mathcal{X}$, $P_{x^n}(a) = \frac{1}{n}\sum_{i=1}^{n}\delta(x_i, a)$, where $\delta(x_i, a) = 1$ if $x_i = a$ and zero otherwise. In the following, we indicate with $\mathcal{P}_n$ the set of types with denominator $n$, i.e. the set of types induced by sequences of length $n$. Given $P \in \mathcal{P}_n$, we indicate with $T(P)$ the type class of $P$, i.e. the set of all the sequences in $\mathcal{X}^n$ having type $P$.

We denote by $D(P\|Q)$ the Kullback-Leibler (KL) divergence between two distributions $P$ and $Q$ defined on the same finite alphabet $\mathcal{X}$ [8]:

$$D(P\|Q) = \sum_{a \in \mathcal{X}} P(a)\log_2\frac{P(a)}{Q(a)}. \quad (1)$$

Most of our results are expressed in terms of the generalised log-likelihood ratio function $h$ (see [3], [12], [13]), which for any two given sequences $x^n$ and $t^m$ is defined as:

$$h(P_{x^n}, P_{t^m}) = D(P_{x^n}\|P_{r^{n+m}}) + \frac{m}{n}D(P_{t^m}\|P_{r^{n+m}}), \quad (2)$$

where $P_{r^{n+m}}$ denotes the type of the sequence $r^{n+m}$ obtained by concatenating $x^n$ and $t^m$, i.e. $r^{n+m} = x^n \| t^m$. The intuitive meaning behind the above definition is that $P_{r^{n+m}}$ is the pmf which maximises the probability that a memoryless source generates two independent sequences belonging to $T(P_{x^n})$ and $T(P_{t^m})$, and that such a probability is equal to $2^{-n h(P_{x^n}, P_{t^m})}$ at the first order in the exponent (see [13] or Lemma 1 in [3]).

Throughout the paper, we will need to compute limits and distances in $\mathcal{P}$. We can do so by choosing one of the many available distances defined over $\mathbb{R}^{|\mathcal{X}|}$ for which $\mathcal{P}$ is a bounded set, for instance the $L_p$ distance, for which we have:

$$d_{L_p}(P, Q) = \Big(\sum_{a \in \mathcal{X}} |P(a) - Q(a)|^p\Big)^{1/p}. \quad (3)$$

Without loss of generality, we will prove all our results by adopting the $L_1$ distance, the generalisation to different $L_p$ metrics being straightforward. In the sequel, distances between pmf's in $\mathcal{P}$ will be simply indicated as $d(\cdot,\cdot)$ as a shorthand for $d_{L_1}(\cdot,\cdot)$; throughout the paper, the symbol $d(\cdot,\cdot)$ is used to indicate both the distortion between two sequences in $\mathcal{X}^n$ and the $L_1$ distance between two pmf's in $\mathcal{P}$, the exact meaning being always clear from the context.
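As a concrete illustration of these definitions, the short Python sketch below (our own addition, not code from the paper) computes the type of a sequence, the KL divergence in (1) and the generalised log-likelihood ratio $h$ in (2) for a toy pair of sequences; the alphabet size and the sequences themselves are arbitrary choices made only for the example.

```python
# Minimal numerical sketch of the quantities defined in this section.
import numpy as np

def empirical_type(seq, alphabet_size):
    """Type P_{x^n}: relative frequency of each symbol in the sequence."""
    counts = np.bincount(np.asarray(seq, dtype=int), minlength=alphabet_size)
    return counts / len(seq)

def kl_divergence(P, Q):
    """D(P||Q) in bits; terms with P(a)=0 contribute 0, P>0 with Q=0 gives +inf."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    mask = P > 0
    if np.any(Q[mask] == 0):
        return np.inf
    return float(np.sum(P[mask] * np.log2(P[mask] / Q[mask])))

def h(P_x, P_t, n, m):
    """h(P_{x^n}, P_{t^m}) = D(P_x||P_r) + (m/n) D(P_t||P_r),
    where P_r is the type of the concatenated sequence x^n || t^m, as in eq. (2)."""
    P_r = (n * np.asarray(P_x, float) + m * np.asarray(P_t, float)) / (n + m)
    return kl_divergence(P_x, P_r) + (m / n) * kl_divergence(P_t, P_r)

# Example: two short ternary sequences (arbitrary illustrative data)
x = [0, 0, 1, 2, 0, 1, 0, 2]          # n = 8
t = [0, 1, 1, 2, 0, 1, 2, 1, 0, 1]    # m = 10
P_x, P_t = empirical_type(x, 3), empirical_type(t, 3)
print(P_x, P_t, h(P_x, P_t, len(x), len(t)))
```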
We also need to introduce the Hausdorff distance as a way to measure distances between subsets of a metric space [14]. Let $S$ be a generic space and $d$ a distance measure defined over $S$. For any point $x \in S$ and any non-empty subset $A \subseteq S$, the distance of $x$ from the subset $A$ is defined as:

$$d(x, A) = \inf_{a \in A} d(a, x). \quad (4)$$

Given the above definition, the Hausdorff distance between any two subsets of $S$ is defined as follows.

Definition 1. For any two subsets $A$ and $B$ of $S$, let us define $\delta_B(A) = \sup_{b \in B} d(b, A)$. The Hausdorff distance $\delta_H(A, B)$ between $A$ and $B$ is given by:

$$\delta_H(A, B) = \max\{\delta_A(B), \delta_B(A)\}. \quad (5)$$

If the sets $A$ and $B$ are bounded with respect to $d$, then the Hausdorff distance always takes a finite value. The Hausdorff distance does not define a true metric, but only a pseudometric, since $\delta_H(A, B) = 0$ implies that the closures of the sets $A$ and $B$ coincide, namely $cl(A) = cl(B)$, but not necessarily that $A = B$. For this reason, in order for $\delta_H$ to be a metric, we need to restrict its definition to closed subsets (note that in this case the inf and sup operations involved in the definition of the Hausdorff distance can be replaced by min and max, respectively). Let then $\mathcal{L}(S)$ denote the space of non-empty closed and bounded subsets of $S$ and let $\delta_H : \mathcal{L}(S) \times \mathcal{L}(S) \to [0, \infty)$. Then, the space $\mathcal{L}(S)$ endowed with the Hausdorff distance is a metric space [15] and we can give the following definition:

Definition 2. Let $\{K_n\}$ be a sequence of closed and bounded subsets of $S$, i.e., $K_n \in \mathcal{L}(S)$ $\forall n$. We use the notation $K_n \xrightarrow{H} K$ to indicate that the sequence has a limit in $(\mathcal{L}(S), \delta_H)$ and the limiting set is $K$.

A. Basic notions of Game Theory

In this section, we introduce some basic notions and definitions of game theory. A 2-player game is defined as a quadruple $(\mathcal{S}_1, \mathcal{S}_2, u_1, u_2)$, where $\mathcal{S}_1 = \{s_{1,1} \ldots s_{1,n_1}\}$ and $\mathcal{S}_2 = \{s_{2,1} \ldots s_{2,n_2}\}$ are the sets of strategies the first and the second player can choose from, and $u_l(s_{1,i}, s_{2,j})$, $l = 1, 2$, is the payoff of the game for player $l$ when the first player chooses the strategy $s_{1,i}$ and the second chooses $s_{2,j}$. A pair of strategies $(s_{1,i}, s_{2,j})$ is called a profile. When $u_1(s_{1,i}, s_{2,j}) = -u_2(s_{1,i}, s_{2,j})$, the win of one player is equal to the loss of the other and the game is said to be a zero-sum game. The sets $\mathcal{S}_1$, $\mathcal{S}_2$ and the payoff functions are assumed to be known to both players. Throughout the paper we consider strategic games, i.e., games in which the players choose their strategies beforehand without knowing the strategy chosen by the opponent player. The final goal of game theory is to determine the existence of equilibrium points, i.e. profiles that in some sense represent the best choice for both players [16]. The most famous notion of equilibrium is due to Nash. A profile is said to be a Nash equilibrium if no player can improve its payoff by changing its strategy unilaterally. Despite its popularity, the practical meaning of Nash equilibrium is often unclear, since there is no guarantee that the players will end up playing at the equilibrium. A particular kind of games for which stronger forms of equilibrium exist are the so-called dominance solvable games [16].
To be specific, a strategy is said to be strictly dominant for one player if it is the best strategy for that player, i.e., the strategy which corresponds to the largest payoff, no matter how the other player decides to play. When one such strategy exists for one of the players, he will surely adopt it. In a similar way, we say that a strategy $s_{l,i}$ is strictly dominated by strategy $s_{l,j}$ if the payoff achieved by player $l$ choosing $s_{l,i}$ is always lower than that obtained by playing $s_{l,j}$, regardless of the choice made by the other player. The recursive elimination of dominated strategies is a common technique for solving games. In the first step, all the dominated strategies are removed from the set of available strategies, since no rational player would ever play them. In this way, a new, smaller game is obtained. At this point, some strategies that were not dominated before may be dominated in the remaining game, and hence are eliminated. The process goes on until no dominated strategy exists for any player. A rationalizable equilibrium is any profile which survives the iterated elimination of dominated strategies [17], [18]. If at the end of the process only one profile is left, the remaining profile is said to be the only rationalizable equilibrium of the game. The corresponding strategies are the only rational choice for the two players and the game is said to be dominance solvable.

III. SOURCE IDENTIFICATION GAME WITH ADDITION OF CORRUPTED TRAINING SAMPLES (SI^a_{c-tr})

In this section, we give a rigorous definition of the source identification game with addition of corrupted training samples. Given a discrete and memoryless source $X \sim P_X$ and a test sequence $v^n$, the goal of the defender (D) is to decide whether $v^n$ has been drawn from $X$ (hypothesis $H_0$) or not (alternative hypothesis $H_1$). By adopting a Neyman-Pearson perspective, we assume that D must ensure that the false positive error probability ($P_{fp}$), i.e., the probability of rejecting $H_0$ when $H_0$ holds (type I error), is lower than a given threshold. Similarly to the previous versions of the game studied in [2] and [3], we assume that D relies only on first order statistics to make a decision. For mathematical tractability, as in earlier papers, we study the asymptotic version of the game when $n \to \infty$, by requiring that $P_{fp}$ decays exponentially fast when $n$ increases, with an error exponent at least equal to $\lambda$, i.e. $P_{fp} \le 2^{-n\lambda}$. On his side, the attacker (A) aims at increasing the false negative error probability ($P_{fn}$), i.e., the probability of accepting $H_0$ when $H_1$ holds (type II error). Specifically, A takes a sequence $y^n$ drawn from a source $Y \sim P_Y$ and modifies it in such a way that D decides that the modified sequence $z^n$ has been generated by $X$. In doing so, A must respect a distortion constraint requiring that the average per-letter distortion between $y^n$ and $z^n$ is lower than $L$. Players A and D know the statistics of $X$ through a training sequence; however, the training sequence can be partly corrupted by A. Depending on how the training sequence is modified by the attacker, we can define different versions of the game.
In this paper, we focus on two possible cases: in the first case, hereafter referred to as the source identification game with addition of corrupted samples (SI^a_{c-tr}), the attacker can add some fake samples to the original training sequence. In the second case, analysed in Section VI, the attacker can replace some of the training samples with fake values (source identification game with replacement of training samples, SI^r_{c-tr}). It is worth stressing that, even if the goal of the attacker is to increase the false negative error probability, the training sequence is corrupted regardless of whether $H_0$ or $H_1$ holds; hence, in general, this part of the attack also affects the false positive error probability. As will be clear later on, this forces the defender to adopt a worst case perspective to ensure that $P_{fp}$ is surely lower than $2^{-\lambda n}$. As to $Y$, we assume that the attacker knows $P_Y$ exactly. For a proper definition of the payoff of the game, we also assume that D knows $P_Y$. This may seem too strong an assumption; however, we will show later on that the optimum strategy of D does not depend on $P_Y$, thus allowing us to relax the assumption that D knows $P_Y$. With the above ideas in mind, we are now ready to give a formal definition of the SI^a_{c-tr} game.

A. Structure of the SI^a_{c-tr} game

A schematic representation of the SI^a_{c-tr} game is given in Figure 1. Let $\tau^{m_1}$ be a sequence drawn from $X$. We assume that $\tau^{m_1}$ is accessible to A, who corrupts it by concatenating to it a sequence of fake samples $\tau^{m_2}$. Then A reorders the overall sequence in a random way so as to hide the position of the fake samples. Note that reordering does not alter the statistics of the training sequence, since the sequence is supposed to be generated by a memoryless source (by using the terminology introduced in [6], the above scenario can be referred to as a causative attack with control over training data). In the following, we denote by $m$ the final length of the training sequence ($m = m_1 + m_2$), and by $\alpha = \frac{m_2}{m_1 + m_2}$ the fraction of fake samples within it. The corrupted training sequence observed by D is indicated by $t^m$. Eventually, we hypothesize a linear relationship between the lengths of the test and the corrupted training sequences, i.e. $m = cn$ for some constant value $c$. Strictly speaking, we should ensure that, when $n$ grows, all the quantities $m$, $m_1$ and $m_2$ are integers for the given $c$ and $\alpha$; in practice, we will neglect such an issue, since when $n$ grows the ratios $m/n$ and $m_2/(m_1+m_2)$ can approximate any real values $c$ and $\alpha$ (more rigorously, we could consider only rational values of $c$ and $\alpha$, and focus on subsequences of $n$ including only those values for which $m/n = c$ and $m_2/(m_1+m_2) = \alpha$).

The goal of D is to decide if an observed sequence $v^n$ has been drawn from the same source that generated $t^m$ ($H_0$) or not ($H_1$). We assume that D knows that a certain percentage of the samples in the training sequence are corrupted, but he has no clue about the position of the corrupted samples.

Fig. 1. Schematic representation of the SI^a_{c-tr} game: A concatenates the fake samples $\tau^{m_2}$ to $\tau^{m_1}$ and permutes the result, $t^m = \sigma(\tau^{m_1} \| \tau^{m_2})$, and transforms $y^n$ into $z^n$ with $d(z^n, y^n) < nL$; D observes $v^n$ and $t^m$ and decides between $H_0$ and $H_1$. The symbol $\|$ denotes concatenation of sequences and $\sigma(\cdot)$ is a random permutation of the sequence samples.
The attacker can also modify the sequence generated by $Y$ so as to induce a decision error. The corrupted sequence is indicated by $z^n$. With regard to the two phases of the attack, we assume that A first corrupts the training sequence and then modifies the sequence $y^n$. This means that, in general, $z^n$ will depend both on $y^n$ and $t^m$, while $t^m$ (notably $\tau^{m_2}$) does not depend on $y^n$. Stated another way, the corruption of the training sequence can be seen as a preparatory part of the attack, whose goal is to ease the subsequent camouflage of $y^n$. For a formal definition of the SI^a_{c-tr} game, we must define the sets of strategies available to D and A (respectively $\mathcal{S}_D$ and $\mathcal{S}_A$) and the corresponding payoffs.

B. Defender's strategies

The basic assumption behind the definition of the space of strategies available to D is that, to make his decision, D relies only on the first order statistics of $v^n$ and $t^m$. This assumption is equivalent to requiring that the acceptance region for hypothesis $H_0$, hereafter referred to as $\Lambda^{n \times m}$ (the superscript $n \times m$ indicates explicitly that $\Lambda^{n \times m}$ refers to $n$-long test sequences and $(m = cn)$-long training sequences), is a union of pairs of type classes, or equivalently, pairs of types $(P, R)$, where $P \in \mathcal{P}_n$ and $R \in \mathcal{P}_m$. To define $\Lambda^{n \times m}$, D follows a Neyman-Pearson approach, requiring that the false positive error probability is lower than a certain threshold. Specifically, we require that the false positive error probability tends to zero exponentially fast with a decay rate at least equal to $\lambda$. Given that the pmf $P_X$ ruling the emission of sequences under $H_0$ is not known, and given that the corruption of the training sequence is going to impair D's decision under $H_0$, we adopt a worst case approach and require that the constraint on the false positive error probability holds for all possible $P_X$ and for all the possible strategies available to the attacker. Given the above setting, the space of strategies available to D is defined as follows:

$$\mathcal{S}_D = \Big\{ \Lambda^{n \times m} \subset \mathcal{P}_n \times \mathcal{P}_m : \max_{P_X \in \mathcal{P}} \max_{s \in \mathcal{S}_A} P_{fp} \le 2^{-\lambda n} \Big\}, \quad (6)$$

where the inner maximization is performed over all the strategies available to the attacker. We will refine this definition at the end of the next section, after the exact definition of the space of strategies of the attacker.

C. Attacker's strategies

With regard to A, the attack consists of two parts. Given a sequence $y^n$ drawn from $P_Y$ and the original training sequence $\tau^{m_1}$, the attacker first generates a sequence of fake samples $\tau^{m_2}$ and mixes them up with those in $\tau^{m_1}$, producing the training sequence $t^m$ observed by D. Then he transforms $y^n$ into $z^n$, eventually trying to generate a pair of sequences $(z^n, t^m)$ whose types belong to $\Lambda^{n \times m}$ (while reordering is essential to hide the position of the fake samples from D, it does not have any impact on the position of $(z^n, t^m)$ with respect to $\Lambda^{n \times m}$, since we assumed that the defender bases his decision only on the first order statistics of the observed sequences; for this reason, we omit the reordering operator $\sigma$ in the attacking procedure). In doing so, he must ensure that $d(y^n, z^n) \le nL$ for some distortion function $d$.

Let us consider the corruption of the training sequence first. Given that the defender bases his decision only on the type of $t^m$, we are only interested in the effect that the addition of the fake samples has on $P_{t^m}$.
By considering the different lengths of $\tau^{m_1}$ and $\tau^{m_2}$, we have:

$$P_{t^m} = \alpha P_{\tau^{m_2}} + (1-\alpha)P_{\tau^{m_1}}, \quad (7)$$

where $P_{t^m} \in \mathcal{P}_m$, $P_{\tau^{m_1}} \in \mathcal{P}_{m_1}$ and $P_{\tau^{m_2}} \in \mathcal{P}_{m_2}$. The first part of the attack, then, is equivalent to choosing a pmf in $\mathcal{P}_{m_2}$ and mixing it with $P_{\tau^{m_1}}$. By the same token, it is reasonable to assume that the choice of the attacker depends only on $P_{\tau^{m_1}}$ rather than on the single sequence $\tau^{m_1}$. Arguably, the best choice of the pmf in $\mathcal{P}_{m_2}$ will depend on $P_Y$, since the corruption of the training sequence is instrumental in letting the defender think that a sequence generated by $Y$ has been drawn from the same source that generated $t^m$.

To describe the part of the attack applied to the test sequence, we follow the approach used in [4], based on transportation theory [19]. Let us indicate by $n(i,j)$ the number of times that the $i$-th symbol of the alphabet is transformed into the $j$-th one as a consequence of the attack. Similarly, let $S^n_{YZ}(i,j) = n(i,j)/n$ be the relative frequency with which such a transformation occurs. In the following, we refer to $S^n_{YZ}$ as the transportation map. For any additive distortion measure, the distortion introduced by the attack can be expressed in terms of $n(i,j)$ and $S^n_{YZ}$. In fact, we have:

$$d(y^n, z^n) = \sum_{i,j} n(i,j)\, d(i,j), \quad (8)$$

$$\frac{d(y^n, z^n)}{n} = \sum_{i,j} S^n_{YZ}(i,j)\, d(i,j), \quad (9)$$

where $d(i,j)$ is the distortion introduced when symbol $i$ is transformed into symbol $j$. The map $S^n_{YZ}$ also determines the type of the attacked sequence. In fact, by indicating with $P_{z^n}(j)$ the relative frequency of symbol $j$ in $z^n$, we have:

$$P_{z^n}(j) = \sum_{i} S^n_{YZ}(i,j) \triangleq S^n_Z(j). \quad (10)$$

Finally, we observe that the attacker cannot change more symbols than there are in the sequence $y^n$; as a consequence, a map $S^n_{YZ}$ can be applied to a sequence $y^n$ only if $S^n_Y(i) \triangleq \sum_j S^n_{YZ}(i,j) = P_{y^n}(i)$. Sometimes we find it convenient to make explicit the dependence of the map chosen by the attacker on the types of $t^m$ and $y^n$, and hence we will also adopt the notation $S^n_{YZ}(P_{t^m}, P_{y^n})$. By remembering that $\Lambda^{n \times m}$ depends on $v^n$ only through its type, and given that the type of the attacked sequence depends on $y^n$ only through $S^n_{YZ}$, we can define the second phase of the attack as the choice of a transportation map among all admissible maps, a map being admissible if:

$$S^n_Y = P_{y^n}, \qquad \sum_{i,j} S^n_{YZ}(i,j)\, d(i,j) \le L. \quad (11)$$

Hereafter, we will refer to the set of admissible maps as $\mathcal{A}^n(L, P_{y^n})$. With the above ideas in mind, the set of strategies of the attacker can be defined as follows:

$$\mathcal{S}_A = \mathcal{S}_{A,T} \times \mathcal{S}_{A,O}, \quad (12)$$

where $\mathcal{S}_{A,T}$ and $\mathcal{S}_{A,O}$ indicate, respectively, the part of the attack affecting the training sequence and the part affecting the observed sequence, and are defined as:

$$\mathcal{S}_{A,T} = \{ Q(P_{\tau^{m_1}}) : \mathcal{P}_{m_1} \to \mathcal{P}_{m_2} \}, \quad (13)$$

$$\mathcal{S}_{A,O} = \{ S^n_{YZ}(P_{y^n}, P_{t^m}) : \mathcal{P}_n \times \mathcal{P}_m \to \mathcal{A}^n(L, P_{y^n}) \}. \quad (14)$$

Note that the first part of the attack ($\mathcal{S}_{A,T}$) is applied regardless of whether $H_0$ or $H_1$ holds, while the second part ($\mathcal{S}_{A,O}$) is applied only under $H_1$.
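To make the notion of transportation map concrete, the following Python sketch (our own illustration with arbitrary values, not code from the paper) evaluates the per-letter distortion (9), the induced type (10) and the admissibility condition (11) for a given map on a binary alphabet with Hamming distortion.

```python
# Small sketch of a transportation map S_{YZ} as used in (8)-(11).
import numpy as np

def map_distortion(S, d):
    """Average per-letter distortion sum_{i,j} S(i,j) d(i,j), eq. (9)."""
    return float(np.sum(S * d))

def induced_type(S):
    """Type of the attacked sequence: P_{z^n}(j) = sum_i S(i,j), eq. (10)."""
    return S.sum(axis=0)

def is_admissible(S, P_y, d, L, tol=1e-9):
    """S belongs to A^n(L, P_{y^n}) if its row marginal equals P_{y^n},
    its entries are nonnegative and the distortion does not exceed L, eq. (11)."""
    return (np.all(S >= -tol)
            and np.allclose(S.sum(axis=1), P_y, atol=tol)
            and map_distortion(S, d) <= L + tol)

# Example on a binary alphabet with Hamming distortion d(i,j) = 1{i != j}
d = np.array([[0.0, 1.0], [1.0, 0.0]])
P_y = np.array([0.7, 0.3])
S = np.array([[0.6, 0.1],   # a 0.1 fraction of the letters is changed from '0' to '1'
              [0.0, 0.3]])
print(induced_type(S), map_distortion(S, d), is_admissible(S, P_y, d, L=0.15))
```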
We also stress that the choice of $Q(P_{\tau^{m_1}})$ depends only on the training sequence $\tau^{m_1}$, while the transportation map used in the second phase of the attack is a function of both $y^n$ and $\tau^{m_1}$ (through $t^m$). Finally, we observe that, with these definitions, the set of strategies of the defender can be redefined by explicitly indicating that the constraint on the false positive error probability must be verified for all possible choices of $Q(\cdot) \in \mathcal{S}_{A,T}$, since this is the only part of the attack affecting $P_{fp}$. Specifically, we can rewrite (6) as

$$\mathcal{S}_D = \Big\{ \Lambda^{n \times m} \subset \mathcal{P}_n \times \mathcal{P}_m : \max_{P_X} \max_{Q(\cdot) \in \mathcal{S}_{A,T}} P_{fp} \le 2^{-\lambda n} \Big\}. \quad (15)$$

D. Payoff

The payoff is defined in terms of the false negative error probability, namely:

$$u(\Lambda^{n \times m}, (Q(\cdot), S^n_{YZ}(\cdot,\cdot))) = -P_{fn}. \quad (16)$$

Of course, D aims at maximising $u$ while A wants to minimise it.

E. The SI^a_{c-tr} game with targeted corruption (SI^{a,t}_{c-tr} game)

The SI^a_{c-tr} game is difficult to solve directly, because of the 2-step attacking strategy. We will work around this difficulty by first tackling a slightly different version of the game, namely the source identification game with targeted corruption of the training sequence, SI^{a,t}_{c-tr}, depicted in Fig. 2. Whereas the strategies available to the defender remain the same, for the attacker the choice of $Q(\cdot)$ is targeted to the counterfeiting of a given sequence $y^n$. In other words, we will assume that the attacker corrupts the training sequence $\tau^{m_1}$ to ease the counterfeiting of a specific sequence $y^n$, rather than to increase the probability that the second part of the attack succeeds. This means that the part of the attack aiming at corrupting the training sequence also depends on $y^n$, that is:

$$\mathcal{S}_{A,T} = \{ Q(P_{\tau^{m_1}}, P_{y^n}) : \mathcal{P}_{m_1} \times \mathcal{P}_n \to \mathcal{P}_{m_2} \}. \quad (17)$$

Even if this setup is not very realistic and is more favourable to the attacker, who can exploit the exact knowledge of $y^n$ (rather than its statistical properties) also for the corruption of the training sequence, in the next section we will show that, for large $n$, the SI^{a,t}_{c-tr} game is equivalent to the non-targeted version of the game we are interested in. With the above ideas in mind, the SI^{a,t}_{c-tr} game is formally defined as follows.

1) Defender's strategies:

$$\mathcal{S}_D = \Big\{ \Lambda^{n \times m} \subset \mathcal{P}_n \times \mathcal{P}_m : \max_{P_X} \max_{Q(\cdot,\cdot) \in \mathcal{S}_{A,T}} P_{fp} \le 2^{-\lambda n} \Big\}. \quad (18)$$

2) Attacker's strategies:

$$\mathcal{S}_A = \mathcal{S}_{A,T} \times \mathcal{S}_{A,O}, \quad (19)$$

with $\mathcal{S}_{A,T}$ and $\mathcal{S}_{A,O}$ defined as in (17) and (14), respectively.

Fig. 2. The SI^a_{c-tr} game with targeted corruption of the training sequence.

3) Payoff: The payoff is still defined in terms of the false negative error probability:

$$u(\Lambda^{n \times m}, (Q(\cdot,\cdot), S^n_{YZ}(\cdot,\cdot))) = -P_{fn}. \quad (20)$$

IV. ASYMPTOTIC EQUILIBRIUM AND PAYOFF OF THE SI^{a,t}_{c-tr} AND SI^a_{c-tr} GAMES

In this section, we derive the asymptotic equilibrium point of the SI^{a,t}_{c-tr} and the SI^a_{c-tr} games when the lengths of the test and training sequences tend to infinity, and evaluate the payoff at the equilibrium.

A. Optimum defender's strategy

We start by deriving the asymptotically optimum strategy for D. As we will see, a dominant and universal strategy with respect to $P_Y$ exists for D.
In other words, the optimum choice of D depends neither on the strategy chosen by the attacker nor on $P_Y$. In addition, since the constraint on the false positive probability must be satisfied for all the attacker's strategies, the optimum strategy for the defender is the same for both the targeted and the non-targeted versions of the game.

As a first step, we look for an explicit expression of the false positive error probability. Such a probability depends on $P_X$ and on the strategy used by A to corrupt the training sequence. In fact, the mapping of $y^n$ into $z^n$ does not have any impact on D's decision under $H_0$. We carry out our derivations by focusing on the game with targeted corruption. It will be clear from our analysis that the dependence on $y^n$ has no impact on $P_{fp}$, and hence the same results hold for the game with non-targeted corruption.

For a given $P_X$ and $Q(\cdot,\cdot)$, $P_{fp}$ is equal to the probability that $Y$ generates a sequence $y^n$ and $X$ generates two sequences $x^n$ and $\tau^{m_1}$ such that the pair of types $(P_{x^n}, \alpha Q(P_{\tau^{m_1}}, P_{y^n}) + (1-\alpha)P_{\tau^{m_1}})$ falls outside $\Lambda^{n \times m}$. Such a probability can be expressed as:

$$P_{fp} = \Pr\{(P_{x^n}, \alpha Q(P_{\tau^{m_1}}, P_{y^n}) + (1-\alpha)P_{\tau^{m_1}}) \in \bar{\Lambda}^{n \times m}\} = \sum_{P_{y^n} \in \mathcal{P}_n} P_Y(T(P_{y^n})) \sum_{(P_{x^n}, P_{t^m}) \in \bar{\Lambda}^{n \times m}} P_X(T(P_{x^n})) \sum_{\substack{P_{\tau^{m_1}} \in \mathcal{P}_{m_1}:\\ \alpha Q(P_{\tau^{m_1}}, P_{y^n}) + (1-\alpha)P_{\tau^{m_1}} = P_{t^m}}} P_X(T(P_{\tau^{m_1}})), \quad (21)$$

where $\bar{\Lambda}^{n \times m}$ is the complement of $\Lambda^{n \times m}$, and where we have exploited the fact that under $H_0$ the training sequence $\tau^{m_1}$ and the test sequence $x^n$ are generated independently by $X$. Given the above formulation, the set of strategies available to D can be rewritten as:

$$\mathcal{S}_D = \Big\{ \Lambda^{n \times m}: \max_{P_X} \max_{Q(\cdot,\cdot)} \sum_{P_{y^n} \in \mathcal{P}_n} P_Y(T(P_{y^n})) \sum_{(P_{x^n}, P_{t^m}) \in \bar{\Lambda}^{n \times m}} P_X(T(P_{x^n})) \sum_{\substack{P_{\tau^{m_1}} \in \mathcal{P}_{m_1}:\\ \alpha Q(P_{\tau^{m_1}}, P_{y^n}) + (1-\alpha)P_{\tau^{m_1}} = P_{t^m}}} P_X(T(P_{\tau^{m_1}})) \le 2^{-\lambda n} \Big\}. \quad (22)$$

We are now ready to prove the following lemma, which describes the asymptotically optimum strategy for the defender for both versions of the game.

Lemma 1. Let $\Lambda^{n \times m,*}$ be defined as follows:

$$\Lambda^{n \times m,*} = \Big\{ (P_{v^n}, P_{t^m}): \min_{Q \in \mathcal{P}_{m_2}} h\Big(P_{v^n}, \frac{P_{t^m} - \alpha Q}{1-\alpha}\Big) \le \lambda - \delta_n \Big\}, \quad (23)$$

with

$$\delta_n = \frac{|\mathcal{X}| \log\big[(n+1)\big((1-\alpha)nc + 1\big)\big]}{n}, \quad (24)$$

where $|\mathcal{X}|$ is the cardinality of the source alphabet and where the minimisation over $Q$ is limited to all the $Q$'s such that $P_{t^m} - \alpha Q$ is nonnegative for all the symbols in $\mathcal{X}$. Then:

1) $\max_{P_X} \max_{s \in \mathcal{S}_A} P_{fp} \le 2^{-n(\lambda - \nu_n)}$, with $\lim_{n \to \infty} \nu_n = 0$;
2) $\forall \Lambda^{n \times m} \in \mathcal{S}_D$, we have $\bar{\Lambda}^{n \times m} \subseteq \bar{\Lambda}^{n \times m,*}$.

Proof. To prove the first part of the lemma, we see that from the expression of the false positive error probability given by eq. (21) we can write:

$$\max_{P_X} \max_{Q(\cdot,\cdot)} P_{fp} \le \max_{P_X} \sum_{P_{y^n} \in \mathcal{P}_n} P_Y(T(P_{y^n})) \sum_{(P_{x^n}, P_{t^m}) \in \bar{\Lambda}^{n \times m,*}} P_X(T(P_{x^n})) \cdot \max_{Q(\cdot,\cdot)} \sum_{\substack{P_{\tau^{m_1}} \in \mathcal{P}_{m_1}:\\ \alpha Q(P_{\tau^{m_1}}, P_{y^n}) + (1-\alpha)P_{\tau^{m_1}} = P_{t^m}}} P_X(T(P_{\tau^{m_1}})). \quad (25)$$

Let us consider the term within the inner summation. For each $P_{\tau^{m_1}}$ such that $\alpha Q(P_{\tau^{m_1}}, P_{y^n}) + (1-\alpha)P_{\tau^{m_1}} = P_{t^m}$, we have:

$$P_X(T(P_{\tau^{m_1}})) \le \max_{Q \in \mathcal{P}_{m_2}} P_X\Big(T\Big(\frac{P_{t^m} - \alpha Q}{1-\alpha}\Big)\Big), \quad (27)$$

with the understanding that the maximisation is carried out only over the $Q$'s such that $P_{t^m} - \alpha Q$ is nonnegative for all the symbols in $\mathcal{X}$ (it is easy to see that the bound (27) holds also for the non-targeted game, when $Q$ depends on the training sequence only, i.e. $Q(P_{\tau^{m_1}})$).
Thanks to the above observation, we can upper bound the false positive error probability as follows:

$$\max_{P_X} \max_{Q(\cdot,\cdot)} P_{fp} \le \max_{P_X} \sum_{P_{y^n} \in \mathcal{P}_n} P_Y(T(P_{y^n})) \sum_{(P_{x^n}, P_{t^m}) \in \bar{\Lambda}^{n \times m,*}} P_X(T(P_{x^n})) \cdot |\mathcal{P}_{m_1}| \cdot \max_{Q \in \mathcal{P}_{m_2}} P_X\Big(T\Big(\frac{P_{t^m} - \alpha Q}{1-\alpha}\Big)\Big)$$
$$\overset{(a)}{=} \max_{P_X} \sum_{(P_{x^n}, P_{t^m}) \in \bar{\Lambda}^{n \times m,*}} P_X(T(P_{x^n}))\, |\mathcal{P}_{m_1}| \max_{Q \in \mathcal{P}_{m_2}} P_X\Big(T\Big(\frac{P_{t^m} - \alpha Q}{1-\alpha}\Big)\Big)$$
$$\le |\mathcal{P}_{m_1}| \sum_{(P_{x^n}, P_{t^m}) \in \bar{\Lambda}^{n \times m,*}} \max_{Q \in \mathcal{P}_{m_2}} \max_{P_X} P_X(T(P_{x^n}))\, P_X\Big(T\Big(\frac{P_{t^m} - \alpha Q}{1-\alpha}\Big)\Big), \quad (28)$$

where in $(a)$ we exploited the fact that the rest of the expression no longer depends on $P_{y^n}$. From this point, the proof goes along the same lines as the proof of Lemma 2 in [3], by observing that $\max_{P_X} P_X(T(P_{x^n}))\, P_X\big(T\big(\frac{P_{t^m} - \alpha Q}{1-\alpha}\big)\big)$ is upper bounded by $2^{-n h(P_{x^n}, \frac{P_{t^m} - \alpha Q}{1-\alpha})}$, and that for each pair of types in $\bar{\Lambda}^{n \times m,*}$, $h\big(P_{x^n}, \frac{P_{t^m} - \alpha Q}{1-\alpha}\big)$ is larger than $\lambda - \delta_n$ for every $Q$, by the very definition of $\Lambda^{n \times m,*}$.

We now pass to the second part of the lemma. Let $\Lambda^{n \times m}$ be a strategy in $\mathcal{S}_D$, and let $(P_{x^n}, P_{t^m})$ be a pair of types contained in $\bar{\Lambda}^{n \times m}$. Given that $\Lambda^{n \times m}$ is an admissible decision region (see (18)), the probability that $X$ emits a test sequence belonging to $T(P_{x^n})$ and a training sequence $\tau^{m_1}$ such that, after the attack, $(\tau^{m_1} \| \tau^{m_2}) \in T(P_{t^m})$ must be lower than $2^{-\lambda n}$ for all $P_X$ and all possible attacking strategies, that is:

$$2^{-\lambda n} > \max_{P_X} \max_{Q(\cdot,\cdot)} \sum_{P_{y^n} \in \mathcal{P}_n} P_Y(T(P_{y^n}))\, P_X(T(P_{x^n})) \sum_{\substack{P_{\tau^{m_1}}:\\ \alpha Q(P_{\tau^{m_1}}, P_{y^n}) + (1-\alpha)P_{\tau^{m_1}} = P_{t^m}}} P_X(T(P_{\tau^{m_1}}))$$
$$\overset{(a)}{=} \max_{P_X} \sum_{P_{y^n} \in \mathcal{P}_n} P_Y(T(P_{y^n}))\, P_X(T(P_{x^n})) \max_{Q(\cdot, P_{y^n})} \sum_{\substack{P_{\tau^{m_1}}:\\ \alpha Q(P_{\tau^{m_1}}, P_{y^n}) + (1-\alpha)P_{\tau^{m_1}} = P_{t^m}}} P_X(T(P_{\tau^{m_1}}))$$
$$\overset{(b)}{\ge} \max_{P_X} \sum_{P_{y^n} \in \mathcal{P}_n} P_Y(T(P_{y^n}))\, P_X(T(P_{x^n})) \max_{Q(P_{\tau^{m_1}}, P_{y^n})} P_X\Big(T\Big(\frac{P_{t^m} - \alpha Q(P_{\tau^{m_1}}, P_{y^n})}{1-\alpha}\Big)\Big)$$
$$\overset{(c)}{=} \max_{P_X} P_X(T(P_{x^n})) \max_{Q \in \mathcal{P}_{m_2}} P_X\Big(T\Big(\frac{P_{t^m} - \alpha Q}{1-\alpha}\Big)\Big), \quad (29)$$

where $(a)$ is obtained by replacing the maximisation over all possible strategies $Q(\cdot,\cdot)$ with a maximisation over $Q(\cdot, P_{y^n})$ for each specific $P_{y^n}$, and $(b)$ is obtained by considering only one term $P_{\tau^{m_1}}$ of the inner summation and optimising $Q(P_{\tau^{m_1}}, P_{y^n})$ for that term. Finally, $(c)$ follows by observing that the optimum $Q(\cdot, P_{y^n})$ is the same for any $P_{y^n}$. As usual, the maximization over $Q$ in the last expression is restricted to the $Q$'s for which $P_{t^m} - \alpha Q \ge 0$ for all the symbols in $\mathcal{X}$ (it is easy to see that the same lower bound can be derived also for the non-targeted case, as the optimum $Q$ in the second-to-last expression does not depend on $P_{y^n}$).
By lower bounding the probability that a memoryless source $X$ generates a sequence belonging to a certain type class (see [8], Chapter 12), we can continue the above chain of inequalities as follows:

$$2^{-\lambda n} > \max_{P_X} \max_{Q \in \mathcal{P}_{m_2}} \frac{2^{-n\big[D(P_{x^n}\|P_X) + \frac{m_1}{n}D\big(\frac{P_{t^m}-\alpha Q}{1-\alpha}\,\big\|\,P_X\big)\big]}}{(n+1)^{|\mathcal{X}|}(m_1+1)^{|\mathcal{X}|}} \ge \frac{2^{-n\min_{Q \in \mathcal{P}_{m_2}}\min_{P_X}\big[D(P_{x^n}\|P_X) + \frac{m_1}{n}D\big(\frac{P_{t^m}-\alpha Q}{1-\alpha}\|P_X\big)\big]}}{(n+1)^{|\mathcal{X}|}(m_1+1)^{|\mathcal{X}|}} \overset{(a)}{=} \frac{2^{-n\min_{Q \in \mathcal{P}_{m_2}} h\big(P_{x^n}, \frac{P_{t^m}-\alpha Q}{1-\alpha}\big)}}{(n+1)^{|\mathcal{X}|}(m_1+1)^{|\mathcal{X}|}}, \quad (30)$$

where $(a)$ derives from the minimisation properties of the generalised log-likelihood ratio function $h(\cdot)$ (see Lemma 1 in [3]). By taking the log of both terms we have:

$$\min_{Q \in \mathcal{P}_{m_2}} h\Big(P_{x^n}, \frac{P_{t^m} - \alpha Q}{1-\alpha}\Big) > \lambda - \delta_n, \quad (31)$$

thus completing the proof of the lemma.

Lemma 1 shows that the strategy $\Lambda^{n \times m,*}$ is asymptotically admissible (point 1) and optimal (point 2), regardless of the attack. From a game-theoretic perspective, this means that such a strategy is a dominant strategy for D and implies that the game is dominance solvable [17]. Similarly, the optimum strategy is a semi-universal one, since it depends on $P_X$ but does not depend on $P_Y$. It is clear from the proof of Lemma 1 that the same optimum strategy holds for the targeted and non-targeted versions of the game.

The situation is rather different with regard to the optimum strategy for the attacker. Despite the existence of a dominant strategy for the defender, in fact, the identification of the optimum attacker's strategy for the SI^a_{c-tr} game is not easy, due to the 2-step nature of the attack. For this reason, in the following sections we will focus on the targeted version of the game, which is easier to study. We will then use the results obtained for the SI^{a,t}_{c-tr} game to derive the best achievable performance for the case of a non-targeted attack.

B. The SI^{a,t}_{c-tr} game: optimum attacker's strategy and equilibrium point

Given the dominant strategy of D, for any given $\tau^{m_1}$ and $y^n$, the optimum attacker's strategy for the SI^{a,t}_{c-tr} game boils down to the following double minimisation:

$$(Q^*(P_{\tau^{m_1}}, P_{y^n}), S^{n,*}_{YZ}(P_{y^n}, P_{t^m})) = \arg\min_{\substack{Q \in \mathcal{P}_{m_2}\\ S^n_{YZ} \in \mathcal{A}^n(L, P_{y^n})}} \min_{Q_0} h\Big(P_{z^n}, \frac{(1-\alpha)P_{\tau^{m_1}} + \alpha Q - \alpha Q_0}{1-\alpha}\Big), \quad (32)$$

where $P_{z^n}$ is obtained by applying the transportation map $S^n_{YZ}$ to $P_{y^n}$, and where $P_{t^m} = (1-\alpha)P_{\tau^{m_1}} + \alpha Q$. As usual, the minimisation over $Q_0$ is limited to the $Q_0$'s such that all the entries of the resulting pmf are nonnegative.

As a remark, for $L = 0$ (corruption of the training sequence only), we get:

$$Q^*(P_{\tau^{m_1}}, P_{y^n}) = \arg\min_{Q \in \mathcal{P}_{m_2}} \min_{Q_0} h\Big(P_{y^n}, P_{\tau^{m_1}} + \frac{\alpha}{1-\alpha}(Q - Q_0)\Big), \quad (33)$$

while for $\alpha = 0$ (classical setup, without corruption of the training sequence) we have:

$$S^{n,*}_{YZ}(P_{y^n}, P_{t^m}) = \arg\min_{S^n_{YZ} \in \mathcal{A}^n(L, P_{y^n})} h(P_{z^n}, P_{t^m}), \quad (34)$$

falling back to the known case of source identification with uncorrupted training, already studied in [3]. Having determined the optimum strategies of both players, it is immediate to state the following:

Theorem 1. The SI^{a,t}_{c-tr} game is a dominance solvable game, whose only rationalizable equilibrium corresponds to the profile $(\Lambda^{n \times m,*}, (Q^*(\cdot,\cdot), S^{n,*}_{YZ}(\cdot,\cdot)))$.

Proof. The theorem is a direct consequence of the fact that $\Lambda^{n \times m,*}$ is a dominant strategy for D.
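To make the equilibrium strategies more tangible, the following self-contained Python sketch (our own illustration, not taken from the paper) evaluates the defender's test of Lemma 1, i.e. $\min_Q h(P_{v^n}, (P_{t^m}-\alpha Q)/(1-\alpha))$ compared against $\lambda - \delta_n$, for a binary alphabet. The minimisation over $Q$ is done by brute-force grid search, and all numerical values ($n$, $c$, $\alpha$, $\lambda$ and the two types) are arbitrary choices made only for the example.

```python
import numpy as np

def kl(P, R):
    """D(P||R) in bits; 0*log0 = 0, and P>0 with R=0 gives +inf."""
    P, R = np.asarray(P, float), np.asarray(R, float)
    m = P > 0
    return np.inf if np.any(R[m] == 0) else float(np.sum(P[m] * np.log2(P[m] / R[m])))

def h(P, T, n, m):
    """Generalised log-likelihood ratio of eq. (2) with weight m/n."""
    U = (n * np.asarray(P, float) + m * np.asarray(T, float)) / (n + m)
    return kl(P, U) + (m / n) * kl(T, U)

def defender_statistic(P_v, P_t, alpha, n, m1, grid=2001):
    """Brute-force min over Q (binary alphabet) of h(P_v, (P_t - alpha*Q)/(1-alpha)),
    restricted to the Q's for which P_t - alpha*Q is nonnegative (eq. (23))."""
    best = np.inf
    for q0 in np.linspace(0.0, 1.0, grid):
        Q = np.array([q0, 1.0 - q0])
        R = (P_t - alpha * Q) / (1.0 - alpha)
        if np.all(R >= 0):
            best = min(best, h(P_v, R, n, m1))
    return best

# Illustrative numbers only (not from the paper)
n, c, alpha, lam = 100000, 1.0, 0.2, 0.05
m = int(c * n); m1 = int((1 - alpha) * m)
delta_n = 2 * np.log2((n + 1) * (m1 + 1)) / n            # eq. (24) with |X| = 2
P_v = np.array([0.55, 0.45])                             # type of the test sequence
P_t = np.array([0.40, 0.60])                             # type of the training sequence
stat = defender_statistic(P_v, P_t, alpha, n, m1)
print(stat, "accept H0" if stat <= lam - delta_n else "reject H0")
```

The quantity being minimised here is also the statistic the attacker tries to drive below the threshold in (32)-(33) by choosing the fake-sample pmf that shapes $P_{t^m}$.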
We remind that the concept of rationalizable equilibrium is much stronger than the usual notion of Nash equilibrium, since the strategies corresponding to such an equilibrium are the only ones that two rational players may adopt [16], [17].

C. The SI^{a,t}_{c-tr} game: payoff at the equilibrium

In this section, we derive the asymptotic value of the payoff at the equilibrium, to see who is going to win the game and under which conditions. To start with, we identify the set of pairs $(P_{y^n}, P_{\tau^{m_1}})$ for which, as a consequence of A's action, D accepts $H_0$:

$$\Gamma^n(\lambda, \alpha, L) = \{(P_{y^n}, P_{\tau^{m_1}}): \exists (P_{z^n}, P_{t^m}) \in \Lambda^{n \times m,*} \text{ s.t. } P_{t^m} = (1-\alpha)P_{\tau^{m_1}} + \alpha Q \text{ and } P_{z^n} = S^n_Z \text{ for some } Q \in \mathcal{P}_{m_2} \text{ and } S^n_{YZ} \in \mathcal{A}^n(L, P_{y^n})\}. \quad (35)$$

If we fix the type of the non-corrupted training sequence ($P_{\tau^{m_1}}$), we obtain:

$$\Gamma^n(P_{\tau^{m_1}}, \lambda, \alpha, L) = \{P_{y^n}: \exists P_{z^n} \in \Lambda^{n,*}((1-\alpha)P_{\tau^{m_1}} + \alpha Q) \text{ s.t. } P_{z^n} = S^n_Z \text{ for some } Q \in \mathcal{P}_{m_2} \text{ and } S^n_{YZ} \in \mathcal{A}^n(L, P_{y^n})\}, \quad (36)$$

where $\Lambda^{n,*}(P)$ denotes the acceptance region for a fixed type of the training sequence in $\mathcal{P}_m$. It is interesting to notice that, since in the current setting A has two degrees of freedom, the attack has a twofold effect: the sequence $y^n$ is modified in order to bring it inside the acceptance region $\Lambda^{n,*}(P_{t^m})$, and the acceptance region itself is modified so as to facilitate the former action. To go on, we find it convenient to rewrite the set $\Gamma^n(P_{\tau^{m_1}}, \lambda, \alpha, L)$ as follows:

$$\Gamma^n(P_{\tau^{m_1}}, \lambda, \alpha, L) = \{P_{y^n}: \exists S^n_{PV} \in \mathcal{A}^n(L, P_{y^n}) \text{ s.t. } S^n_V \in \Gamma^n_0(P_{\tau^{m_1}}, \lambda, \alpha)\}, \quad (37)$$

where

$$\Gamma^n_0(P_{\tau^{m_1}}, \lambda, \alpha) = \{P_{y^n}: \exists Q \in \mathcal{P}_{m_2} \text{ s.t. } P_{y^n} \in \Lambda^{n,*}((1-\alpha)P_{\tau^{m_1}} + \alpha Q)\} \quad (38)$$

is the set containing all the test sequences (or, equivalently, test types) for which it is possible to corrupt the training set in such a way that they fall within the acceptance region. As the subscript 0 suggests, this set corresponds to the set in (36) when A cannot modify the sequence drawn from $Y$ (i.e. $L = 0$) and then tries to hamper the decision by corrupting the training sequence only. By considering the expression of the acceptance region, the set $\Gamma^n_0(P_{\tau^{m_1}}, \lambda, \alpha)$ can be expressed in a more explicit form as follows:

$$\Gamma^n_0(P_{\tau^{m_1}}, \lambda, \alpha) = \Big\{P_{y^n}: \exists Q, Q_0 \in \mathcal{P}_{m_2} \text{ s.t. } h\Big(P_{y^n}, P_{\tau^{m_1}} + \frac{\alpha}{1-\alpha}(Q - Q_0)\Big) \le \lambda - \delta_n\Big\}, \quad (39)$$

where the second argument of $h(\cdot)$ denotes a type in $\mathcal{P}_{m_1}$ obtained from the original training sequence $\tau^{m_1}$ by first adding $m_2$ samples and later removing (in a possibly different way) the same number of samples. Note that in this formulation $Q$ accounts for the fake samples introduced by the attacker and $Q_0$ for the worst case guess made by the defender about the position of the corrupted samples. We also observe that, since we are treating the SI^{a,t}_{c-tr} game, in general $Q$ will depend on $P_{y^n}$. As usual, we implicitly assume that $Q$ and $Q_0$ are chosen in such a way that $P_{\tau^{m_1}} + \frac{\alpha}{1-\alpha}(Q - Q_0)$ is nonnegative and smaller than or equal to 1 for all the alphabet symbols.

We are now ready to derive the asymptotic payoff of the game by following a path similar to that used in [2], [3]. First of all, we generalise the definition of the sets $\Lambda^{n \times m,*}$, $\Gamma^n$ and $\Gamma^n_0$ so that they can be evaluated for a generic pmf in $\mathcal{P}$ (that is, without requiring that the pmf's are induced by sequences of finite length).
This step passes through the generalisation of the $h$ function. Specifically, given any pair of pmf's $(P, P_0) \in \mathcal{P} \times \mathcal{P}$, we define:

$$h_c(P, P_0) = D(P\|U) + c\, D(P_0\|U), \qquad U = \frac{1}{1+c}P + \frac{c}{1+c}P_0, \quad (40)$$

where $c \in [0, 1]$. Note that when $(P, P_0) \in \mathcal{P}_n \times \mathcal{P}_n$, $h_c(P, P_0) = h(P, P_0)$. The asymptotic version of $\Lambda^{n \times m,*}$ is:

$$\Lambda^* = \Big\{(P, R): \min_{Q} h_c\Big(P, \frac{R - \alpha Q}{1-\alpha}\Big) \le \lambda\Big\}. \quad (41)$$

In a similar way, we can derive the asymptotic versions of $\Gamma^n$ and $\Gamma^n_0$ in (37) and (38)-(39). To do so, we first observe that the transportation map $S^n_{YZ}$ depends on the sources only through their pmf's. By denoting with $S^n_{PV}$ a transportation map from a pmf $P \in \mathcal{P}_n$ to another pmf $V \in \mathcal{P}_n$ and rewriting the set $\Gamma^n$ accordingly, we can easily derive the asymptotic version of the set as follows:

$$\Gamma(R, \lambda, \alpha, L) = \{P \in \mathcal{P}: \exists S_{PV} \in \mathcal{A}(L, P) \text{ s.t. } V \in \Gamma_0(R, \lambda, \alpha)\}, \quad (42)$$

with

$$\Gamma_0(R, \lambda, \alpha) = \{P \in \mathcal{P}: \exists Q \in \mathcal{P} \text{ s.t. } P \in \Lambda^*((1-\alpha)R + \alpha Q)\} = \Big\{P \in \mathcal{P}: \exists Q, Q_0 \in \mathcal{P} \text{ s.t. } h_c\Big(P, R + \frac{\alpha}{1-\alpha}(Q - Q_0)\Big) \le \lambda\Big\}, \quad (43)$$

where the definitions of $S_{PV}$ and $\mathcal{A}(L, P)$ derive from those of $S^n_{PV}$ and $\mathcal{A}^n(L, P)$ by relaxing the requirement that the terms $S_{PV}(i,j)$ and $P(i)$ are rational numbers with denominator $n$. We now have all the necessary tools to prove the following theorem.

Theorem 2 (Asymptotic payoff of the SI^{a,t}_{c-tr} game). For the SI^{a,t}_{c-tr} game, the false negative error exponent at the equilibrium is given by

$$\varepsilon = \min_{R}\Big[(1-\alpha)c\, D(R\|P_X) + \min_{P \in \Gamma(R, \lambda, \alpha, L)} D(P\|P_Y)\Big]. \quad (44)$$

Accordingly: 1) if $P_Y \in \Gamma(P_X, \lambda, \alpha, L)$ then $\varepsilon = 0$; 2) if $P_Y \notin \Gamma(P_X, \lambda, \alpha, L)$ then $\varepsilon > 0$.

Proof. The theorem could be proven going along the same lines as the proof of Theorem 4 in [3]. We instead provide a proof based on the extension of Sanov's theorem provided in the Appendix (see Theorem 6). In fact, Theorem 2, as well as Theorem 4 in [3], can be seen as an application of such a generalized version of Sanov's theorem. Let us consider

$$P_{fn} = \sum_{(P_{y^n}, P_{\tau^{m_1}}) \in \Gamma^n(\lambda, \alpha, L)} P_X(T(P_{\tau^{m_1}}))\, P_Y(T(P_{y^n})) = \sum_{R \in \mathcal{P}_{m_1}} P_X(T(R)) \sum_{P \in \Gamma^n(R, \lambda, \alpha, L)} P_Y(T(P)) = \sum_{R \in \mathcal{P}_{m_1}} P_X(T(R))\, P_Y(\Gamma^n(R, \lambda, \alpha, L)). \quad (45)$$

We start by deriving an upper bound on the false negative error probability. We can write:

$$P_{fn} \le \sum_{R \in \mathcal{P}_{m_1}} P_X(T(R)) \sum_{P \in \Gamma^n(R, \lambda, \alpha, L)} 2^{-n D(P\|P_Y)} \le \sum_{R \in \mathcal{P}_{m_1}} P_X(T(R))\, (n+1)^{|\mathcal{X}|}\, 2^{-n \min_{P \in \Gamma^n(R, \lambda, \alpha, L)} D(P\|P_Y)} \le \sum_{R \in \mathcal{P}_{m_1}} P_X(T(R))\, (n+1)^{|\mathcal{X}|}\, 2^{-n \min_{P \in \Gamma(R, \lambda, \alpha, L)} D(P\|P_Y)}$$
$$\le (n+1)^{|\mathcal{X}|}(m_1+1)^{|\mathcal{X}|}\, 2^{-n \min_{R \in \mathcal{P}_{m_1}}\big[\frac{m_1}{n}D(R\|P_X) + \min_{P \in \Gamma(R, \lambda, \alpha, L)} D(P\|P_Y)\big]} \le (n+1)^{|\mathcal{X}|}(m_1+1)^{|\mathcal{X}|}\, 2^{-n \min_{R \in \mathcal{P}}\big[(1-\alpha)c\, D(R\|P_X) + \min_{P \in \Gamma(R, \lambda, \alpha, L)} D(P\|P_Y)\big]}, \quad (46)$$

where the use of the minimum instead of the infimum is justified by the fact that $\Gamma^n(R, \lambda, \alpha, L)$ and $\Gamma(R, \lambda, \alpha, L)$ are compact sets. By taking the log and dividing by $n$ we find:

$$-\frac{\log P_{fn}}{n} \ge \min_{R \in \mathcal{P}}\Big[(1-\alpha)c\, D(R\|P_X) + \min_{P \in \Gamma(R, \lambda, \alpha, L)} D(P\|P_Y)\Big] - \beta_n, \quad (47)$$

where $\beta_n = \frac{|\mathcal{X}|\log[(n+1)((1-\alpha)nc+1)]}{n}$ tends to 0 when $n$ tends to infinity.

We now turn to the analysis of a lower bound for $P_{fn}$. Let $R^*$ be the pmf achieving the minimum in the outer minimisation of eq. (44).
Due to the density of the rational numbers within the real numbers, we can find a sequence of pmf's $R_{m_1} \in \mathcal{P}_{m_1}$ ($m_1 = (1-\alpha)nc$) that tends to $R^*$ when $n$ (and hence $m_1$) tends to infinity. We can write:

$$P_{fn} = \sum_{R \in \mathcal{P}_{m_1}} P_X(T(R))\, P_Y(\Gamma^n(R, \lambda, \alpha, L)) \ge P_X(T(R_{m_1}))\, P_Y(\Gamma^n(R_{m_1}, \lambda, \alpha, L)) \ge \frac{2^{-m_1 D(R_{m_1}\|P_X)}}{(m_1+1)^{|\mathcal{X}|}}\, P_Y(\Gamma^n(R_{m_1}, \lambda, \alpha, L)), \quad (48)$$

where in the first inequality we have replaced the sum with the single element of the subsequence $R_{m_1}$ defined previously, and where the second inequality derives from the well-known lower bound on the probability of a type class [8]. From (48), by taking the log and dividing by $n$, we obtain:

$$-\frac{\log P_{fn}}{n} \le (1-\alpha)c\, D(R_{m_1}\|P_X) - \frac{1}{n}\log P_Y(\Gamma^n(R_{m_1}, \lambda, \alpha, L)) + \beta'_n, \quad (49)$$

where $\beta'_n = \frac{|\mathcal{X}|\log(m_1+1)}{n}$ tends to 0 when $n$ tends to infinity. In order to compute the probability $P_Y(\Gamma^n(R_{m_1}, \lambda, \alpha, L))$, we resort to Corollary 1 of the generalised version of Sanov's theorem given in Appendix A. To apply the corollary, we must show that $\Gamma^n(R_{m_1}, \lambda, \alpha, L) \xrightarrow{H} \Gamma(R^*, \lambda, \alpha, L)$. First of all, we observe that, by exploiting the continuity of the $h_c$ function and the density of the rational numbers in the reals, it is easy to prove that $\Gamma^n_0(R_{m_1}, \lambda, \alpha) \xrightarrow{H} \Gamma_0(R^*, \lambda, \alpha)$. Then the Hausdorff convergence of $\Gamma^n(R_{m_1}, \lambda, \alpha, L)$ to $\Gamma(R^*, \lambda, \alpha, L)$ follows from the regularity properties of the set of transportation maps stated in Appendix B. To see how, we observe that any transformation $S_{PV} \in \mathcal{A}(L, P)$ mapping $P$ into $V$ can be applied in inverse order through the transformation $S_{VP}(i,j) = S_{PV}(j,i)$. It is also immediate to see that $S_{VP}$ introduces the same distortion introduced by $S_{PV}$, that is $S_{VP} \in \mathcal{A}(L, V)$. Let now $P$ be a point in $\Gamma(R^*, \lambda, \alpha, L)$. By definition, we can find a map $S_{PV} \in \mathcal{A}(L, P)$ such that $V \in \Gamma_0(R^*, \lambda, \alpha)$. Since $\Gamma^n_0(R_{m_1}, \lambda, \alpha) \xrightarrow{H} \Gamma_0(R^*, \lambda, \alpha)$, for large enough $n$ we can find a point $V_0 \in \Gamma^n_0(R_{m_1}, \lambda, \alpha)$ which is arbitrarily close to $V$. Thanks to the second part of Theorem 7 in Appendix B, we know that a map $S_{V_0 P_0} \in \mathcal{A}^n(L, V_0)$ exists such that $P_0$ is arbitrarily close to $P$ and $P_0 \in \mathcal{P}_n$. By applying the inverse map $S_{P_0 V_0}$ to $P_0$, we see that $P_0 \in \Gamma^n(R_{m_1}, \lambda, \alpha, L)$, thus permitting us to conclude that, when $n$ increases, $\delta_{\Gamma(R^*, \lambda, \alpha, L)}(\Gamma^n(R_{m_1}, \lambda, \alpha, L)) \to 0$. In a similar way, we can prove that $\delta_{\Gamma^n(R_{m_1}, \lambda, \alpha, L)}(\Gamma(R^*, \lambda, \alpha, L)) \to 0$, hence permitting us to conclude that $\Gamma^n(R_{m_1}, \lambda, \alpha, L) \xrightarrow{H} \Gamma(R^*, \lambda, \alpha, L)$. We can now apply the generalised version of Sanov's theorem as expressed in Corollary 1 of Appendix A to conclude that:

$$-\lim_{n \to \infty}\frac{1}{n}\log P_Y(\Gamma^n(R_{m_1}, \lambda, \alpha, L)) = \min_{P \in \Gamma(R^*, \lambda, \alpha, L)} D(P\|P_Y). \quad (50)$$

Going back to equation (49), and by exploiting the continuity of the divergence function, we can say that for large $n$ we have:

$$-\frac{\log P_{fn}}{n} \le (1-\alpha)c\, D(R^*\|P_X) + \min_{P \in \Gamma(R^*, \lambda, \alpha, L)} D(P\|P_Y) + \nu_n, \quad (51)$$

where the sequence $\nu_n$ tends to zero when $n$ tends to infinity. By coupling equations (47) and (51) and by letting $n \to \infty$, we eventually obtain:

$$-\lim_{n \to \infty}\frac{\log P_{fn}}{n} = \min_{R}\Big[(1-\alpha)c\, D(R\|P_X) + \min_{P \in \Gamma(R, \lambda, \alpha, L)} D(P\|P_Y)\Big], \quad (52)$$

thus proving the theorem.
As an immediate consequence of Theorem 2, the set $\Gamma(P_X, \lambda, \alpha, L)$ defines the indistinguishability region of the test, that is, the set of all the sources for which A induces D to decide in favour of $H_0$ even if $H_1$ holds.

D. Analysis of the SI^a_{c-tr} game

We now focus on the SI^a_{c-tr} game. For a given choice of $Q(P_{\tau^{m_1}}) \in \mathcal{S}_{A,T}$ (and hence $t^m$), and given a sequence $y^n$, the optimum choice of the second part of the attack derives quite easily from the definition of $\Lambda^{n \times m,*}$, namely

$$S^{n,*}_{YZ}(P_{y^n}, P_{t^m}) = \arg\min_{S^n_{YZ} \in \mathcal{A}^n(L, P_{y^n})} \min_{Q \in \mathcal{P}_{m_2}} h\Big(P_{z^n}, \frac{P_{t^m} - \alpha Q}{1-\alpha}\Big). \quad (53)$$

Now the point is to determine the strategy $Q(P_{\tau^{m_1}})$ which maximises the probability that the attack in (53) succeeds. To this purpose, of course, the attacker must exploit the knowledge of $P_Y$. Since solving such a maximisation problem is not an easy task, we will proceed in a different way. We first introduce a simple (and possibly suboptimum) strategy, then we argue that such a strategy is asymptotically optimum, in that the set of the sources that cannot be distinguished from $X$ with this choice is the same set that we have obtained for the SI^{a,t}_{c-tr} setup, which is known to be more favourable to the attacker.

More specifically, we consider the following two-step attacking strategy. In the first step of the attack, A does not know $y^n$, hence he trusts the law of large numbers and optimises $Q(P_{\tau^{m_1}})$ by using $P_Y$ as a proxy for $P_{y^n}$. To do so, he applies equation (32), replacing $P_{y^n}$ with $P_Y$. Specifically, by indicating with $Q^{\dagger}$ the resulting strategy for the first step of the attack, we have

$$Q^{\dagger}(P_{\tau^{m_1}}) = \arg\min_{Q \in \mathcal{P}_{m_2}} \min_{\substack{Q_0 \in \mathcal{P}_{m_2}\\ S_{YZ} \in \mathcal{A}(L, P_Y)}} h_c\Big(P_Z, P_{\tau^{m_1}} + \frac{\alpha}{1-\alpha}(Q - Q_0)\Big). \quad (54)$$

As a by-product of the above minimisation, the attacker also finds the map $S^{n,\dagger}_{YZ}$ representing the optimum attack when $P_{y^n} = P_Y$. Let us indicate the result of the application of such a map to $P_Y$ by $P^{\dagger}_Z$. In the second part of the attack, A tries to move $P_{y^n}$ as close as possible to $P^{\dagger}_Z$, that is:

$$S^{n,\dagger}_{YZ}(P_{y^n}, P^{\dagger}_{t^m}) = \arg\min_{S^n_{YZ} \in \mathcal{A}^n(L, P_{y^n})} d(S^n_Z, P^{\dagger}_Z), \quad (56)$$

where $S^{n,\dagger}_{YZ}(P_{y^n}, P^{\dagger}_{t^m})$ depends upon the corrupted training sequence obtained after the application of the first part of the attack, namely $P^{\dagger}_{t^m} = (1-\alpha)P_{\tau^{m_1}} + \alpha Q^{\dagger}(P_{\tau^{m_1}})$, through $P^{\dagger}_Z$. The asymptotic optimality of the strategy $(Q^{\dagger}(P_{\tau^{m_1}}), S^{n,\dagger}_{YZ}(P_{y^n}, P^{\dagger}_{t^m}))$ derives from the following theorem.

Theorem 3 (Indistinguishability region of the SI^a_{c-tr} game). The indistinguishability region of the SI^a_{c-tr} game is equal to that of the SI^{a,t}_{c-tr} game (see eq. (42)) and is asymptotically achieved by the attacking strategy $(Q^{\dagger}(P_{\tau^{m_1}}), S^{n,\dagger}_{YZ}(P_{y^n}, P^{\dagger}_{t^m}))$.

Proof (sketch). The theorem derives from the observation that, due to the law of large numbers, when $n$ grows $P_{y^n}$ tends to $P_Y$; hence, for large enough $n$, optimising the first part of the attack by replacing $P_{y^n}$ with $P_Y$ does not introduce a significant performance loss. The rigorous proof goes along lines similar to those used to prove Theorem 2 and ultimately relies on the continuity of the $h_c$ function and the regularity properties of the set $\mathcal{A}^n(L, P_{y^n})$. The details of the proof are omitted for the sake of brevity.
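The second step of the two-step attack in (56) is a small optimal-transport problem: among the admissible maps of $\mathcal{A}^n(L, P_{y^n})$, find the one whose induced type is closest to the target $P^{\dagger}_Z$. The sketch below (our own formulation, offered only as an illustration and not as the paper's algorithm) casts this search as a linear program with scipy, linearising the $L_1$ objective with auxiliary variables and ignoring the integrality of the map entries; the alphabet, distortion matrix, budget and pmf's are arbitrary examples.

```python
import numpy as np
from scipy.optimize import linprog

def closest_admissible_type(P_y, P_target, d, L):
    """Find S >= 0 with row marginal P_y and distortion <= L whose column marginal
    (the induced type) is as close as possible, in L1 distance, to P_target."""
    K = len(P_y)
    nS, nT = K * K, K                      # map entries S(i,j) and slack variables t_j
    cost = np.concatenate([np.zeros(nS), np.ones(nT)])       # minimise sum_j t_j
    # row marginals of S must equal P_y
    A_eq = np.zeros((K, nS + nT)); b_eq = np.asarray(P_y, float)
    for i in range(K):
        A_eq[i, i * K:(i + 1) * K] = 1.0
    # distortion constraint plus two linearised L1 constraints per column j
    A_ub, b_ub = [], []
    A_ub.append(np.concatenate([np.asarray(d, float).ravel(), np.zeros(nT)]))
    b_ub.append(L)
    for j in range(K):
        col = np.zeros(nS); col[j::K] = 1.0                   # column-sum selector
        e_t = np.zeros(nT); e_t[j] = 1.0
        A_ub.append(np.concatenate([col, -e_t])); b_ub.append(P_target[j])
        A_ub.append(np.concatenate([-col, -e_t])); b_ub.append(-P_target[j])
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (nS + nT))
    S = res.x[:nS].reshape(K, K)
    return S, S.sum(axis=0), res.fun       # map, induced type, residual L1 distance

# Example: ternary alphabet, Hamming distortion, budget L = 0.1 (illustrative values)
d = 1.0 - np.eye(3)
P_y = np.array([0.5, 0.3, 0.2])
P_target = np.array([0.34, 0.33, 0.33])
S, P_z, gap = closest_admissible_type(P_y, P_target, d, L=0.1)
print(P_z, gap)
```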
Given the asymptotic equivalence of the $SI^{a}_{c\text{-}tr}$ and $SI^{a,t}_{c\text{-}tr}$ games, in the rest of the paper we will generally refer to the $SI^{a}_{c\text{-}tr}$ game without specifying whether we are considering the targeted or the non-targeted case.

V. SOURCE DISTINGUISHABILITY FOR THE $SI^{a}_{c\text{-}tr}$ GAME

In this section, we study the behaviour of the $SI^{a}_{c\text{-}tr}$ game when we vary the decay rate $\lambda$ of the false positive error probability. By letting $\lambda$ tend to zero, in fact, we can derive the best achievable performance of the defender when we only require that $P_{fp}$ tends to zero exponentially fast, regardless of the decay rate. Then, we use such a result to derive the conditions under which the reliable distinction between two sources is possible, in terms of the fraction $\alpha$ of corrupted training samples and the maximum allowed distortion $L$.

A. Ultimate achievable performance of the game

As we said, the goal of this section is to study the limit of the indistinguishability region when $\lambda \to 0$. This limit, in fact, determines all the pmf's $P_Y$ that cannot be distinguished from $P_X$ while ensuring that the two types of error probabilities tend to zero exponentially fast (with vanishingly small, yet positive, error exponents). We start by exploiting optimal transport theory to rewrite the indistinguishability region as:
\[
\Gamma(P_X,\lambda,\alpha,L) = \{P : \exists V \in \Gamma_0(P_X,\lambda,\alpha) \text{ s.t. } \mathrm{EMD}(P,V) \le L\}, \quad (57)
\]
where EMD (Earth Mover Distance) is the term used in computer vision to denote the minimum transportation cost [19], [20], that is
\[
\mathrm{EMD}(P,V) = \min_{S_{PV}:\, S_P = P,\, S_V = V} \sum_{i,j} S_{PV}(i,j)\, d(i,j). \quad (58)
\]
With this definition, the main result of this section is stated by the following theorem.

Theorem 4. Given two sources X and Y, a maximum allowed average per-letter distortion $L$, and a fraction $\alpha$ of training samples provided by the attacker, the maximum achievable false negative error exponent $\varepsilon$ for the $SI^{a}_{c\text{-}tr}$ game is:
\[
\lim_{\lambda\to 0}\lim_{n\to\infty} -\frac{1}{n}\log P_{fn} = \min_{R}\Big[(1-\alpha)c\, D(R\|P_X) + \min_{P \in \Gamma(R,\alpha,L)} D(P\|P_Y)\Big], \quad (59)
\]
where $\Gamma(R,\alpha,L) = \Gamma(R,\lambda=0,\alpha,L)$. Accordingly, the ultimate indistinguishability region is given by:
\[
\Gamma(P_X,\alpha,L) = \{P : \exists V \in \Gamma_0(P_X,\alpha) \text{ s.t. } \mathrm{EMD}(P,V) \le L\}, \quad (60)
\]
where $\Gamma_0(P_X,\alpha) = \Gamma_0(P_X,\lambda=0,\alpha)$. Moreover, $\Gamma(P_X,\alpha,L)$ can be rewritten as:
\[
\Gamma(P_X,\alpha,L) = \Big\{P : \min_{V:\,\mathrm{EMD}(P,V)\le L} \sum_i [V(i) - P_X(i)]^+ \le \frac{\alpha}{1-\alpha}\Big\} = \Big\{P : \min_{V:\,\mathrm{EMD}(P,V)\le L} d_{L_1}(V,P_X) \le \frac{2\alpha}{1-\alpha}\Big\}, \quad (61)
\]
with $[a]^+ = \max\{a,0\}$.

Proof. The proof of the first part goes along the same steps used in the proofs of Theorems 3 and 4 in [4] and is not repeated here. We show, instead, that $\Gamma(P_X,\alpha,L)$ can be rewritten as in (61). By observing that $h_c(P,Q) = 0$ if and only if $P = Q$, it is immediate to see that the set $\Gamma_0(P_X,\lambda=0,\alpha)$ takes the following expression:
\[
\Gamma_0(P_X,\alpha) = \Big\{P : \exists\, Q, Q' \in \mathcal{P} \text{ s.t. } P = P_X + \frac{\alpha}{1-\alpha}(Q - Q')\Big\}. \quad (62)
\]
Expression (62) can be rewritten by avoiding the introduction of the auxiliary pmf's $Q$ and $Q'$. To do so, we observe that $Q(i)$ must be larger than $Q'(i)$ for all the bins $i$ for which $P(i) > P_X(i)$ (and vice versa). In addition, $Q$ and $Q'$ must be valid pmf's, hence we have $\sum_i [Q(i)-Q'(i)]^+ = \sum_i [Q'(i)-Q(i)]^+ \le 1$.
Then, it is easy to see that (62) is equivalent to the following definition:
\[
\Gamma_0(P_X,\alpha) = \Big\{P : \sum_i [P(i) - P_X(i)]^+ \le \frac{\alpha}{1-\alpha}\Big\} = \Big\{P : d_{L_1}(P,P_X) \le \frac{2\alpha}{1-\alpha}\Big\}, \quad (63)
\]
where the second equality follows by observing that $d_{L_1}(P,P_X) = \sum_i [P(i)-P_X(i)]^+ + \sum_i [P_X(i)-P(i)]^+$, and that the two sums are equal since both $P$ and $P_X$ sum to one. Eventually, equation (61) derives immediately from the expression of $\Gamma_0(P_X,\alpha)$ given in (63).

According to Theorem 4, $\Gamma(P_X,\alpha,L)$ provides the ultimate indistinguishability region of the test, that is, the set of all the pmf's for which A wins the game. Before going on, we pause to discuss the geometrical meaning of the set $\Gamma_0(P_X,\alpha)$ in (62). To do so, we introduce the set $\Lambda_0^*$, obtained from $\Lambda^*$ by letting $\lambda \to 0$:
\[
\Lambda_0^* = \Big\{(P,P') : \exists\, Q \text{ s.t. } P' = \frac{P - \alpha Q}{1-\alpha}\Big\}. \quad (64)
\]
As usual, we can fix the pmf $P$ and define:
\[
\Lambda_0^*(P) = \Big\{P' : \exists\, Q \text{ s.t. } P' = \frac{P - \alpha Q}{1-\alpha}\Big\}. \quad (65)
\]
By referring to Figure 3 (left part), we can geometrically interpret $\Lambda_0^*(P)$ as the set of the pmf's $P'$ such that $P$ is a convex combination (with coefficient $\alpha$) of $P'$ with a point $Q$ of the probability simplex. Starting from (43), we can then rewrite $\Gamma_0(P_X,\alpha)$ as follows:
\[
\Gamma_0(P_X,\alpha) = \{P : \exists\, Q \in \mathcal{P} \text{ s.t. } P \in \Lambda_0^*((1-\alpha)P_X + \alpha Q)\}. \quad (66)
\]
Accordingly, $\Gamma_0(P_X,\alpha)$ is geometrically obtained as the union of the acceptance regions built from the points which can be written as a convex combination of $P_X$ with some point $Q$ in the simplex. As shown in the right part of Figure 3, such a region corresponds to a hexagon centred in $P_X$, which, in the probability simplex, is equivalent to the set of points whose $L_1$ distance from $P_X$ is smaller than or equal to $2\alpha/(1-\alpha)$ (as stated in (63)). Of course, only the points of the hexagon that lie inside the simplex are valid pmf's and must therefore be accounted for. A pictorial representation of the set $\Gamma(P_X,\alpha,L)$ is given in Figure 4.

Fig. 3. Geometrical interpretation of $\Lambda_0^*(P)$ (left) and geometrical construction of $\Gamma_0(P_X,\alpha)$ (right). The sizes of the sets are exaggerated for graphical purposes.

Fig. 4. Geometrical interpretation of $\Gamma(P_X,\alpha,L)$ as stated in Theorem 4.

B. Security margin and blinding corruption level ($\alpha_b$)

By a closer inspection of the ultimate indistinguishability region $\Gamma(P_X,\alpha,L)$, we can derive some interesting parameters characterising the distinguishability of two sources in an adversarial setting. Let $X \sim P_X$ and $Y \sim P_Y$ be two sources. Let us focus first on the case in which the attacker cannot modify the test sequence ($L=0$). In this situation, the ultimate indistinguishability region boils down to $\Gamma_0(P_X,\alpha)$. We then conclude that D can tell the two sources apart if $d_{L_1}(P_Y,P_X) > 2\alpha/(1-\alpha)$. On the contrary, if $d_{L_1}(P_Y,P_X) \le 2\alpha/(1-\alpha)$, A is able to make the sources indistinguishable by corrupting the training sequence. Clearly, the larger $\alpha$, the easier it is for A to win the game. We define the blinding corruption level $\alpha_b$ as the minimum value of $\alpha$ for which the two sources X and Y cannot be distinguished.
Specifically, we have:
\[
\alpha_b(P_X,P_Y) = \frac{d_{L_1}(P_Y,P_X)}{2 + d_{L_1}(P_Y,P_X)} = \frac{\sum_i [P_Y(i)-P_X(i)]^+}{1 + \sum_i [P_Y(i)-P_X(i)]^+}. \quad (67)
\]
From (67) it is easy to see that $\alpha_b$ is never larger than $1/2$, with the limit case $\alpha_b = 1/2$ corresponding to a situation in which $P_X$ and $P_Y$ have completely disjoint supports (recall that, for any pair of pmf's $(P,Q)$, $d_{L_1}(P,Q) \le 2$). It is interesting to notice that $\alpha_b$ is symmetric with respect to the two sources. Since the attacker is only allowed to add samples to the training sequence, without removing existing ones, this might seem a counterintuitive result. Actually, the symmetry of $\alpha_b$ is a consequence of the worst-case approach adopted by the defender: D itself discards a subset of samples from the training sequence in such a way as to maximise the probability that the remaining part of the training sequence and the test sequence have been drawn from the same source.

Let us now consider the more general case in which $L \ne 0$. For a given $\alpha < \alpha_b$, we look for the maximum distortion allowed to A for which it is still possible to reliably distinguish between the two sources. From equation (61), we see that the attack does not succeed if:
\[
\min_{V:\,\mathrm{EMD}(P_Y,V)\le L} d_{L_1}(V,P_X) > \frac{2\alpha}{1-\alpha}. \quad (68)
\]
This leads to the following definition, which extends the concept of security margin, introduced in [4], to the more general setup considered in this paper.

Definition 3 (Security Margin in the $SI^{a}_{c\text{-}tr}$ setup). Let $X \sim P_X$ and $Y \sim P_Y$ be two discrete memoryless sources. The maximum distortion allowed to the attacker for which the two sources can be reliably distinguished in the $SI^{a}_{c\text{-}tr}$ setup with a fraction $\alpha$ of possibly corrupted samples is called Security Margin and is given by
\[
SM_\alpha(P_X,P_Y) = L_\alpha^*, \quad (69)
\]
where $L_\alpha^* = 0$ if $P_Y \in \Gamma_0(P_X,\alpha)$, while, if $P_Y \notin \Gamma_0(P_X,\alpha)$, $L_\alpha^*$ is the quantity which satisfies
\[
\min_{V:\,\mathrm{EMD}(P_Y,V)\le L_\alpha^*} d_{L_1}(V,P_X) = \frac{2\alpha}{1-\alpha}. \quad (70)
\]

Fig. 5. Geometrical interpretation of the Security Margin between two sources X and Y.

A geometric interpretation of $L_\alpha^*$ is given in Figure 5. By focusing on the case $P_Y \notin \Gamma_0(P_X,\alpha)$, and by observing that
\[
\min_{V:\,\mathrm{EMD}(P_Y,V)\le L} d_{L_1}(V,P_X) \quad (71)
\]
is a monotonic non-increasing function of $L$, the security margin can be expressed in explicit form as
\[
SM_\alpha(P_X,P_Y) = \min\Big\{ L' \ge 0 : \min_{V:\,\mathrm{EMD}(P_Y,V)\le L'} d_{L_1}(V,P_X) \le \frac{2\alpha}{1-\alpha} \Big\}. \quad (72)
\]
When $L > SM_\alpha(P_X,P_Y)$, it is not possible for D to distinguish between the two sources with positive error exponents of the two kinds. By looking at the behaviour of the security margin as a function of $\alpha$, we see that $SM_{\alpha_b}(P_X,P_Y) = 0$, meaning that, whenever the fraction of corrupted samples reaches the critical value, the sources cannot be distinguished even if the attacker does not introduce any distortion. On the contrary, setting $\alpha = 0$ corresponds to studying the distinguishability of the sources with uncorrupted training; in this case we have $SM_0(P_X,P_Y) = \mathrm{EMD}(P_X,P_Y)$, in agreement with [4].
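For small alphabets, the security margin in (69)-(70) can be computed directly: by the monotonicity of (71), $SM_\alpha(P_X,P_Y)$ is the minimum transportation cost needed to bring $P_Y$ within $L_1$ distance $2\alpha/(1-\alpha)$ of $P_X$, which is a linear program over transportation maps. The sketch below is not from the paper; the pmfs and the distortion matrix are illustrative choices, and scipy's LP solver is an implementation detail.

```python
import numpy as np
from scipy.optimize import linprog

def security_margin(P_X, P_Y, d, alpha):
    """Minimum average per-letter distortion needed to move P_Y inside the
    L1 ball of radius 2*alpha/(1-alpha) centred in P_X (cf. eq. (70));
    returns 0 if P_Y is already inside the ball."""
    K = len(P_X)
    radius = 2 * alpha / (1 - alpha)
    if np.abs(np.asarray(P_Y, float) - np.asarray(P_X, float)).sum() <= radius:
        return 0.0
    nS, nT = K * K, K
    # objective: transportation cost sum_ij S_ij d_ij
    c = np.concatenate([np.asarray(d, float).ravel(), np.zeros(nT)])
    # row marginals of S must equal P_Y
    A_eq = np.zeros((K, nS + nT))
    for i in range(K):
        A_eq[i, i * K:(i + 1) * K] = 1.0
    b_eq = np.asarray(P_Y, float)
    A_ub, b_ub = [], []
    # |V_j - P_X(j)| <= t_j, with V_j = sum_i S_ij
    for j in range(K):
        row = np.zeros(nS + nT)
        for i in range(K):
            row[i * K + j] = 1.0
        row[nS + j] = -1.0
        A_ub.append(row.copy());               b_ub.append(P_X[j])
        row2 = -row.copy(); row2[nS + j] = -1.0
        A_ub.append(row2);                     b_ub.append(-P_X[j])
    # sum_j t_j <= 2*alpha/(1-alpha)
    A_ub.append(np.concatenate([np.zeros(nS), np.ones(nT)]))
    b_ub.append(radius)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# toy check with the Bernoulli pair used in the example below (p=0.3, q=0.7)
# and d(0,1)=d(1,0)=1; the result should be about |q-p| - alpha/(1-alpha)
d = np.array([[0.0, 1.0], [1.0, 0.0]])
print(security_margin([0.3, 0.7], [0.7, 0.3], d, alpha=0.1))   # ~0.289
```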
With reference to Figure 5, it is easy to see that when $\alpha = 0$ the hexagon representing $\Gamma_0(P_X,\alpha)$ collapses into the single point $P_X$, and the security margin corresponds to the Earth Mover Distance between Y and X. Eventually, we notice that, for $\alpha > 0$, the value of the security margin in (72) is smaller than $\mathrm{EMD}(P_X,P_Y)$. This is also an expected behaviour, since the general setting considered in this paper is more favourable to the attacker than the setting in [4].

By looking at (72), we can argue that the Security Margin is symmetric with respect to the two sources X and Y, that is, $SM_\alpha(P_Y,P_X) = SM_\alpha(P_X,P_Y)$. To show that this is the case, we observe that the pmf $V'$ associated with the minimum $L$, for which we have $\mathrm{EMD}(P_Y,V') = SM_\alpha(P_X,P_Y)$, can be obtained through the application of a map $S_{P_Y V}$ that works as follows: it does not modify a portion $\alpha/(1-\alpha)$ of $P_Y$ and moves the remaining mass onto an equal amount of mass of $P_X$ in a convenient way (i.e., so as to minimise the overall distance travelled by the masses). The inverse map can be applied to bring the same quantity of mass from $P_X$ to $P_Y$, while leaving the remaining mass untouched, thus obtaining a $V''$ which satisfies $\mathrm{EMD}(P_X,V'') = \mathrm{EMD}(P_Y,V')$ (because of the symmetry of the per-symbol distortion $d$) and $d_{L_1}(V'',P_Y) = d_{L_1}(V',P_X) = 2\alpha/(1-\alpha)$. Arguably, $V''$ is the pmf for which $\mathrm{EMD}(P_X,V'') = SM_\alpha(P_Y,P_X)$; hence, $SM_\alpha(P_Y,P_X) = SM_\alpha(P_X,P_Y)$.

1) Bernoulli sources: In order to get some insight into the practical meaning of $\alpha_b$ and $SM_\alpha$, we consider the simple case of two Bernoulli sources with parameters $q = P_X(1)$ and $p = P_Y(1)$. Assuming that no distortion is allowed to the attacker, the minimum fraction of samples that A must add to induce a decision error is, according to (67),
\[
\alpha_b = \frac{|p-q|}{1+|p-q|}.
\]
For instance, and rather obviously, when $|p-q| = 1$, to win the game A must introduce a number of fake samples equal to the number of samples of the correct training sequence, i.e. $\alpha = 0.5$. With regard to $SM$, we have:
\[
SM_\alpha(p,q) = \begin{cases} |q-p| - \dfrac{\alpha}{1-\alpha}, & \alpha < \alpha_b \\[4pt] 0, & \alpha \ge \alpha_b. \end{cases} \quad (73)
\]
Figure 6 illustrates the behaviour of $SM_\alpha(p,q)$ as a function of $\alpha$ when $p = 0.3$ and $q = 0.7$. The blinding corruption value is $\alpha_b = 0.286$.

VI. SOURCE IDENTIFICATION GAME WITH REPLACEMENT OF TRAINING SAMPLES

In this section, we study a variant of the game with corrupted training in which A observes the training sequence and can replace a selected fraction of its samples. Let $\tau^m$ indicate the original $m$-sample-long training sequence drawn from X, and let $M$ be a subset of $m_2 = \alpha m$ indexes in $[1,2\ldots m]$. The attacker can choose the index set $M$ and replace the corresponding samples with $m_2$ fake samples.

Fig. 6. Security margin as a function of $\alpha$ for Bernoulli sources with parameters $p = 0.3$ and $q = 0.7$ ($\alpha_b = 0.286$).

Fig. 7. Block diagram of the $SI^{r}_{c\text{-}tr}$ game (targeted corruption). Given the original training sequence $\tau^m$, the adversary has the possibility to replace a selected subset of $m_2$ training samples with fake ones.
More formally, given the original training sequence $\tau^m$, the training sequence seen by the defender is $t^m = \sigma(\tau^{m_1}_{\bar M} \,\|\, \tau^{m_2})$, where $\bar M$ is the complement of $M$ in $[1,2\ldots m]$, $\tau^{m_1}_{\bar M}$ is the set of original (non-attacked) samples, and $\tau^{m_2}$ is the sequence with the fake samples introduced by the attacker. Figure 7 illustrates the adversarial setup considered in this section for the case of a targeted attack. Arguably, this scenario is more favourable to the attacker than the $SI^{a}_{c\text{-}tr}$ game.

A. Formal definition of the $SI^{r}_{c\text{-}tr}$ game

In the sequel, we formally define the source identification game with replacement of selected samples, namely the $SI^{r}_{c\text{-}tr}$ game. As anticipated, we focus on a version of the game in which the corruption of the training samples depends on the to-be-attacked sequence $y^n$ (targeted attack); the extension to the case of a non-targeted attack, in fact, can easily be obtained by following the same approach used in Section IV-D.

1) Defender's strategies: As in the $SI^{a}_{c\text{-}tr}$ game, in order to ensure that the false positive error probability is lower than $2^{-n\lambda}$, the defender adopts a worst-case strategy and considers the maximum of the false positive error probability over all the possible $P_X$ and over all the possible attacks that the training sequence may have undergone, yielding:
\[
\mathcal{S}_D = \{\Lambda^{n\times m} \subset \mathcal{P}_n \times \mathcal{P}_m : \max_{P_X \in \mathcal{P}} \max_{s \in \mathcal{S}_{A,T}} P_{fp} \le 2^{-\lambda n}\}. \quad (74)
\]
While the above expression is formally equal to that of the $SI^{a}_{c\text{-}tr}$ game (see eq. (15)), the maximisation over $\mathcal{S}_{A,T}$ is now more cumbersome, due to the additional degree of freedom available to the attacker, who can selectively remove samples of the original training sequence. In fact, even if D knew the position of the corrupted samples, simply throwing them away would not guarantee that the remaining part of the sequence follows the same statistics of X, since the attacker might have deliberately biased it by selectively choosing the samples to replace.

2) Attacker's strategies: With regard to the attacker, the part of the attack working on the test sequence $y^n$ is the same as in the $SI^{a}_{c\text{-}tr}$ case, while the part regarding the corruption of the training sequence must be redefined. To this purpose, we observe that the corrupted training sequence may be any sequence $t^m$ for which $d_H(t^m,\tau^m) \le \alpha m$, where $d_H$ denotes the Hamming distance. Given that the defender bases his decision on the type of $t^m$, it is convenient to rewrite the constraint on the Hamming distance between the sequences as a constraint on the $L_1$ distance between the corresponding types. In fact, by looking at the empirical distribution of the corrupted sequence, searching for a sequence $t^m$ such that $d_H(t^m,\tau^m) \le \alpha m$ is equivalent to searching for a pmf $P_{t^m} \in \mathcal{P}_m$ for which $d_{L_1}(P_{t^m},P_{\tau^m}) \le 2\alpha$ (see the proof of Lemma 2 in [2]). Therefore, the set of strategies of the attacker is defined by $\mathcal{S}_A = \mathcal{S}_{A,T} \times \mathcal{S}_{A,O}$, where
\[
\mathcal{S}_{A,T} = \{Q(P_{\tau^m},P_{y^n}) : \mathcal{P}_m \times \mathcal{P}_n \to \mathcal{P}_m \text{ such that } d_{L_1}(Q(P_{\tau^m},P_{y^n}),P_{\tau^m}) \le 2\alpha\}, \quad (75)
\]
\[
\mathcal{S}_{A,O} = \{S_{YZ}^n(P_{y^n},P_{t^m}) : \mathcal{P}_n \times \mathcal{P}_m \to \mathcal{A}^n(L,P_{y^n})\}. \quad (76)
\]
Note that, in this case, the function $Q(\cdot,\cdot)$ gives the type of the whole training sequence observed by D (not only of the fake subpart, as it was in the $SI^{a}_{c\text{-}tr}$ game), that is, $P_{t^m} = Q(P_{\tau^m},P_{y^n})$.
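The equivalence used above, namely that replacing at most $\alpha m$ samples moves the type by at most $2\alpha$ in $L_1$ distance, is easy to verify empirically. The short Python sketch below is an illustration, not the paper's code; the alphabet size, sequence length and replacement rule are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_pmf(seq, alphabet_size):
    return np.bincount(seq, minlength=alphabet_size) / len(seq)

m, alpha, K = 1000, 0.2, 4
tau = rng.integers(0, K, size=m)               # original training sequence
t = tau.copy()
idx = rng.choice(m, size=int(alpha * m), replace=False)
t[idx] = rng.integers(0, K, size=len(idx))     # replace alpha*m samples

d_L1 = np.abs(empirical_pmf(t, K) - empirical_pmf(tau, K)).sum()
print(d_L1, d_L1 <= 2 * alpha)                 # always within the 2*alpha bound
```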
In the following, we will find it convenient to express the attacking strategies in $\mathcal{S}_{A,T}$ in an alternative way. Since the attacker replaces the samples of a subpart of the training sequence, the corruption strategy is equivalent to first removing a subpart of the training sequence and then adding a fake subsequence of the same length; the sequence is then reordered to hide the position of the fake samples. By focusing on the type of the observed training sequence, we can write:
\[
P_{t^m} = P_{\tau^m} - \alpha Q_R(P_{\tau^m},P_{y^n}) + \alpha Q_A(P_{\tau^m},P_{y^n}), \quad (77)
\]
where $Q_R(P_{\tau^m},P_{y^n})$ and $Q_A(P_{\tau^m},P_{y^n})$ (both belonging to $\mathcal{P}_{m_2}$) are the types of the removed and injected subsequences, respectively. In order to simplify the notation, in the following we will avoid indicating explicitly the dependence of $Q_R(P_{\tau^m},P_{y^n})$ and $Q_A(P_{\tau^m},P_{y^n})$ on $P_{\tau^m}$ and $P_{y^n}$, and will indicate them as $Q_R()$ and $Q_A()$. Furthermore, we will use the notation $Q_R$ and $Q_A$ whenever the dependence on the arguments is not relevant. By varying $Q_R$ and $Q_A$, we obtain all the pmf's that can be produced from $P_{\tau^m}$ by first removing and later adding $m_2$ samples. Of course, not all pairs $(Q_R,Q_A)$ are admissible, since the $P_{t^m}$ resulting from eq. (77) must be a valid pmf, i.e. it must be non-negative for all the symbols of the alphabet $\mathcal{X}$.

3) Payoff: As usual, the payoff function is defined as
\[
u(\Lambda^{n\times m}, (Q_R(),Q_A(),S_{YZ}^n())) = -P_{fn}. \quad (78)
\]

B. Equilibrium point and payoff at the equilibrium

In order to ensure that $P_{fp}$ is always lower than $2^{-\lambda n}$, it is convenient to use the attack formulation given in (77). For given $P_X$, $Q_R$ and $Q_A$, $P_{fp}$ is the probability that X generates two sequences $x^n$ and $\tau^m$ such that the pair of types $(P_{x^n},\, P_{\tau^m} - \alpha(Q_R() - Q_A()))$ falls outside $\Lambda^{n\times m}$. Accordingly, the set of strategies available to D can be rewritten as:
\[
\mathcal{S}_D = \Big\{\Lambda^{n\times m} : \max_{P_X \in \mathcal{P}}\, \max_{Q_R(),Q_A()} \sum_{P_{y^n} \in \mathcal{P}_n} P_Y(T(P_{y^n})) \sum_{(P_{x^n},P_{t^m}) \in \bar\Lambda^{n\times m}} P_X(T(P_{x^n})) \sum_{\substack{P_{\tau^m} \in \mathcal{P}_m:\\ P_{\tau^m} - \alpha(Q_R()-Q_A()) = P_{t^m}}} P_X(T(P_{\tau^m})) \le 2^{-\lambda n}\Big\}. \quad (79)
\]
By proceeding as in the proof of Lemma 1, it is easy to prove that the asymptotically optimum strategy for the defender corresponds to the following:
\[
\Lambda^{n\times m,*} = \Big\{(P_{x^n},P_{t^m}) : \min_{Q_R,Q_A \in \mathcal{P}_{m_2}} h(P_{x^n}, P_{t^m} + \alpha(Q_R - Q_A)) \le \lambda - \delta_n\Big\}, \quad (80)
\]
where $\delta_n$ tends to 0 as $n \to \infty$ and the minimisation is limited to $Q_R$ and $Q_A$ in $\mathcal{P}_{m_2}$ such that $P_{t^m} + \alpha(Q_R - Q_A)$ is a valid pmf. Consequently, the optimum attacking strategy is given by:
\[
(Q^*(P_{\tau^m},P_{y^n}),\, S_{YZ}^{n,*}(P_{y^n},P_{t^m})) = \arg\min_{\substack{P_{t^m} \text{ s.t. } d_{L_1}(P_{t^m},P_{\tau^m}) \le 2\alpha \\ S_{YZ}^n \in \mathcal{A}^n(L,P_{y^n})}} \; \min_{Q_R,Q_A} h(P_{z^n}, P_{t^m} + \alpha(Q_R - Q_A)), \quad (81)
\]
hence resulting in the following theorem.

Theorem 5. The $SI^{r}_{c\text{-}tr}$ game with targeted corruption is a dominance solvable game, whose only rationalizable equilibrium corresponds to the profile $(\Lambda^{n\times m,*}, (Q^*(), S_{YZ}^{n,*}()))$ given by equations (80) and (81).

In order to study the asymptotic payoff of the $SI^{r}_{c\text{-}tr}$ game at the equilibrium, we parallel the analysis carried out in Sec. IV-C.
By considering the case $L=0$, the set of pairs of types for which D accepts $H_0$ as a consequence of the attack on the training sequence is given by
\[
\Gamma_0^n(\lambda,\alpha) = \{(P_{y^n},P_{\tau^m}) : \exists\, P_{t^m} \text{ s.t. } d_{L_1}(P_{t^m},P_{\tau^m}) \le 2\alpha \text{ and } (P_{y^n},P_{t^m}) \in \Lambda^{n\times m,*}\}. \quad (82)
\]
If we fix the type of the original training sequence, we get:
\[
\Gamma_0^n(P_{\tau^m},\lambda,\alpha) = \{P_{y^n} : \exists\, P_{t^m} \text{ s.t. } d_{L_1}(P_{t^m},P_{\tau^m}) \le 2\alpha \text{ and } P_{y^n} \in \Lambda^{n,*}(P_{t^m})\}
= \{P_{y^n} : \exists\, P_{t^m},\, \exists\, Q,Q' \in \mathcal{P}_{m_2} \text{ s.t. } d_{L_1}(P_{t^m},P_{\tau^m}) \le 2\alpha \text{ and } h(P_{y^n}, P_{t^m} - \alpha Q' + \alpha Q) \le \lambda - \delta_n\}. \quad (83)
\]
By letting $n$ go to infinity, we obtain the asymptotic counterpart of the above set, which, for a generic $R \in \mathcal{P}$, takes the following expression:
\[
\Gamma_0(R,\lambda,\alpha) = \{P : \exists\, P',Q,Q' \text{ s.t. } d_{L_1}(P',R) \le 2\alpha \text{ and } h_c(P, P' - \alpha Q' + \alpha Q) \le \lambda\}. \quad (84)
\]
When $L \ne 0$, we obtain:
\[
\Gamma(R,\lambda,\alpha,L) = \{P : \exists\, V \in \Gamma_0(R,\lambda,\alpha) \text{ s.t. } \mathrm{EMD}(P,V) \le L\}. \quad (85)
\]
With the above definitions, it is straightforward to extend Theorem 2 to the $SI^{r}_{c\text{-}tr}$ case, thus proving that the set in (85), evaluated in $R = P_X$, represents the indistinguishability region of the $SI^{r}_{c\text{-}tr}$ game.

C. Security margin and blinding corruption level

As a last contribution, we are interested in studying the ultimate distinguishability of two sources X and Y in the $SI^{r}_{c\text{-}tr}$ setting and in comparing it with the result obtained for the $SI^{a}_{c\text{-}tr}$ case. To do so, we consider the behaviour of the indistinguishability region when $\lambda$ tends to 0. We have:
\[
\Gamma(P_X,\alpha,L) = \{P : \exists\, V \in \Gamma_0(P_X,\alpha) \text{ s.t. } \mathrm{EMD}(P,V) \le L\}, \quad (86)
\]
where
\[
\Gamma_0(P_X,\alpha) = \{P : \exists\, P',Q,Q' \text{ s.t. } d_{L_1}(P',P_X) \le 2\alpha \text{ and } P = P' + \alpha(Q-Q')\}
= \{P : \exists\, P' \text{ s.t. } d_{L_1}(P',P_X) \le 2\alpha \text{ and } d_{L_1}(P,P') \le 2\alpha\}. \quad (87)
\]
The set in (87) can be equivalently rewritten as
\[
\Gamma_0(P_X,\alpha) = \{P : d_{L_1}(P,P_X) \le 4\alpha\}. \quad (88)
\]
To see why, we first notice that the set in (87) is contained in the set in (88): from the triangular inequality we have, for any $P'$, $d_{L_1}(P,P_X) \le d_{L_1}(P,P') + d_{L_1}(P',P_X)$, so that, if $P$ belongs to $\Gamma_0(P_X,\alpha)$ in (87), it also belongs to the set in (88). To see that the two sets are indeed equivalent, it is sufficient to show that the reverse inclusion also holds. To this purpose, we observe that, whenever $d_{L_1}(P,P_X) \le 4\alpha$, a pmf $P^*$ can be found whose distance from both $P$ and $P_X$ is at most equal to $2\alpha$. In fact, by letting $P^* = (P+P_X)/2$, we have
\[
d_{L_1}(P,P^*) = d_{L_1}(P^*,P_X) = \sum_i \left|\frac{P(i)-P_X(i)}{2}\right| = \frac{1}{2}\, d_{L_1}(P,P_X). \quad (89)
\]
If $d_{L_1}(P,P_X) \le 4\alpha$, then $d_{L_1}(P,P^*) = d_{L_1}(P^*,P_X) = d_{L_1}(P,P_X)/2 \le 2\alpha$, permitting us to conclude that the sets in (87) and (88) are equivalent.

Upon inspection of equation (88), we can conclude that, as expected, the indistinguishability region for $L=0$ (and hence also for the case $L \ne 0$) is larger than that of the $SI^{a}_{c\text{-}tr}$ game (see (63)), thus confirming that the game with sample replacement is more favourable to the attacker. A graphical comparison between the indistinguishability regions for the two setups is shown in Figure 8.

Fig. 8. Comparison of the indistinguishability regions for the $SI^{a}_{c\text{-}tr}$ and $SI^{r}_{c\text{-}tr}$ games with $L=0$.
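The gap between the two regions is easy to quantify: with $L=0$, the $SI^{a}_{c\text{-}tr}$ region is an $L_1$ ball of radius $2\alpha/(1-\alpha)$ around $P_X$ (eq. (63)), while the $SI^{r}_{c\text{-}tr}$ region has radius $4\alpha$ (eq. (88)). The following sketch (illustrative only, not from the paper) tabulates the two radii as $\alpha$ varies, matching the qualitative comparison discussed next.

```python
import numpy as np

alphas = np.linspace(0.0, 0.5, 11)
radius_add = 2 * alphas / (1 - alphas)   # SI^a_{c-tr}: eq. (63)
radius_rep = 4 * alphas                  # SI^r_{c-tr}: eq. (88)

for a, ra, rr in zip(alphas, radius_add, radius_rep):
    print(f"alpha={a:.2f}  add={ra:.3f}  replace={rr:.3f}  gap={rr - ra:.3f}")

# the gap 4a - 2a/(1-a) vanishes at a=0 and a=0.5 and peaks at a = 1 - 1/sqrt(2)
print("max gap at alpha =", 1 - 1 / np.sqrt(2))   # ~0.293
```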
As a matter of fact, for the attacker, the advantage of the $SI^{r}_{c\text{-}tr}$ game with respect to the $SI^{a}_{c\text{-}tr}$ game depends on $\alpha$. For small $\alpha$ and for $\alpha$ close to $1/2$, the indistinguishability regions of the two games are very similar, while for intermediate values of $\alpha$ the indistinguishability region of the $SI^{r}_{c\text{-}tr}$ game is considerably larger than that of the $SI^{a}_{c\text{-}tr}$ game (the maximum difference between the two regions is obtained for $\alpha \approx 0.3$). When $\alpha = 1/2$ the attacker always wins, since he is able to bring any pmf inside the acceptance region regardless of the game version, while for $\alpha = 0$ we fall back into the source identification game without corruption of the training sequence, thus making the two versions of the game equivalent.

Given two sources X and Y, the blinding corruption level takes the expression:
\[
\alpha_b = \frac{d_{L_1}(P_Y,P_X)}{4}. \quad (90)
\]
Since $d_{L_1}(P_Y,P_X) \le 2$ for any couple $(P_Y,P_X)$ (the maximum value 2 is taken when the two distributions have disjoint supports), the blinding value for the $SI^{r}_{c\text{-}tr}$ game is lower than the blinding value of the $SI^{a}_{c\text{-}tr}$ game. The two expressions are identical when the two sources have disjoint supports, in which case $\alpha_b = 1/2$.

When the attacker can also corrupt the test sequence, the ultimate indistinguishability region of the $SI^{r}_{c\text{-}tr}$ game is:
\[
\Gamma(P_X,\alpha,L) = \Big\{P : \min_{V:\,\mathrm{EMD}(P,V)\le L} d_{L_1}(V,P_X) \le 4\alpha\Big\}. \quad (91)
\]
Starting from (91), we can define the security margin in the $SI^{r}_{c\text{-}tr}$ setup.

Definition 4 (Security Margin in the $SI^{r}_{c\text{-}tr}$ setup). Let $X \sim P_X$ and $Y \sim P_Y$ be two discrete memoryless sources. The maximum distortion for which the two sources can be reliably distinguished in the $SI^{r}_{c\text{-}tr}$ setup is called Security Margin and is given by
\[
SM_\alpha(P_X,P_Y) = L_\alpha^*, \quad (92)
\]
where $L_\alpha^*$ is the quantity which satisfies the relation
\[
\min_{V:\,\mathrm{EMD}(P_Y,V)\le L_\alpha^*} d_{L_1}(V,P_X) = 4\alpha, \quad (93)
\]
if $P_Y \notin \Gamma_0(P_X,\alpha)$, and $L_\alpha^* = 0$ otherwise.

Considering again the case of two Bernoulli sources, and adopting the same notation as in Section V-B1, from (90) we have $\alpha_b = |p-q|/2$ (since $d_{L_1}(P_Y,P_X) = 2|p-q|$), while the security margin is
\[
SM_\alpha(p,q) = \begin{cases} |q-p| - 2\alpha, & \alpha < \alpha_b \\[2pt] 0, & \alpha \ge \alpha_b. \end{cases} \quad (94)
\]
Figure 9 plots $SM_\alpha$ as a function of $\alpha$ when $p = 0.3$ and $q = 0.7$. The blinding value is $\alpha_b = 0.2$ which, as expected, is lower than the value found for the $SI^{a}_{c\text{-}tr}$ setup.

Fig. 9. Security margin as a function of $\alpha$ for Bernoulli sources with parameters $p = 0.3$ and $q = 0.7$ ($\alpha_b = 0.2$).

VII. CONCLUSIONS

We studied the distinguishability of two sources in an adversarial setup when the sources are known through training data, part of which can be corrupted by the attacker himself. We considered two different scenarios. In the first one, the attacker simply adds fake samples to the original training sequence, while in the second one the attacker replaces a selected subset of training samples with fake ones. We formalised both cases in a game-theoretic setup, then we derived the equilibrium point of the games and analysed the (asymptotic) payoff at the equilibrium.
The result of the game can be summarised in a compact and elegant way by introducing two parameters, namely the Security Margin under corruption of the training sequence, and the blinding corruption level $\alpha_b$, defined as the portion of fake samples the attacker must introduce to make any reliable distinction between the sources impossible. Based on these two parameters, the performance of the two games with corruption of the training data can easily be compared.

Though rather theoretical, our findings can guide more practical research in several fields belonging to the emerging areas of adversarial signal processing [1] and secure machine learning [6]. In many cases, in fact, the defender must take into account the possibility that the data he is using to tune the system he is working on, or during the learning phase, is corrupted by the attacker. The analysis carried out in this paper can be extended in several ways, for instance by considering continuous sources, or by assuming that the sources X and Y are not memoryless, but are still amenable to be studied by using the method of types [21]. Following the analysis in [22], we could also consider a more general setup in which the attacker is active under both $H_0$ and $H_1$. An interesting generalisation consists in studying a symmetric setup in which the training and the test sequences can be corrupted by applying the same kinds of processing. For instance, the attacker could be allowed to replace samples in both the training and the test sequences, or he could be allowed to modify the training sequence up to a certain distortion. Other kinds of attacks on the training data could also be considered, like sample removal with no addition of fake samples. As a matter of fact, the kind of attack strongly depends on the application scenario, and it is arguable that the availability of a large variety of theoretical models would help bridge the gap between theory and practice.

ACKNOWLEDGMENT

This work has been partially supported by a research sponsored by DARPA and Air Force Research Laboratory (AFRL) under agreement number FA8750-16-2-0173. The U.S. Government is authorised to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA and Air Force Research Laboratory (AFRL) or the U.S. Government.

REFERENCES

[1] M. Barni and F. Pérez-González, "Coping with the enemy: advances in adversary-aware signal processing," in ICASSP 2013, IEEE Int. Conf. Acoustics, Speech and Signal Processing, Vancouver, Canada, 26-31 May 2013, pp. 8682–8686.
[2] M. Barni and B. Tondi, "The source identification game: an information-theoretic perspective," IEEE Transactions on Information Forensics and Security, vol. 8, no. 3, pp. 450–463, March 2013.
[3] M. Barni and B. Tondi, "Binary hypothesis testing game with training data," IEEE Transactions on Information Theory, vol. 60, no. 8, August 2014, doi:10.1109/TIT.2014.2325571.
[4] M. Barni and B. Tondi, "Source distinguishability under distortion-limited attack: an optimal transport perspective," IEEE Transactions on Information Forensics and Security, vol. 11, no. 10, pp. 2145–2159, October 2016, doi:10.1109/TIFS.2016.2570739.
[5] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar, "Can machine learning be secure?" in Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, ser. ASIACCS '06. New York, NY, USA: ACM, 2006, pp. 16–25. [Online]. Available: http://doi.acm.org/10.1145/1128817.1128824
[6] M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar, "The security of machine learning," Machine Learning, vol. 81, no. 2, pp. 121–148, 2010.
[7] H. Xiao, B. Biggio, B. Nelson, H. Xiao, C. Eckert, and F. Roli, "Support vector machines under adversarial label contamination," Neurocomputing, vol. 160, pp. 53–62, 2015.
[8] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley Interscience, 1991.
[9] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. Springer Science & Business Media, 2009.
[10] M. Barni and B. Tondi, "Source distinguishability under corrupted training," in Proc. of WIFS 2014, IEEE International Workshop on Information Forensics and Security, Atlanta, Georgia, 3-5 December 2014.
[11] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd edition. Cambridge University Press, 2011.
[12] M. Gutman, "Asymptotically optimal classification for multiple tests with empirically observed statistics," IEEE Transactions on Information Theory, vol. 35, no. 2, pp. 401–408, March 1989.
[13] M. Kendall and S. Stuart, The Advanced Theory of Statistics, vol. 2, 4th edition. New York: MacMillan, 1979.
[14] J. Munkres, Topology, ser. Featured Titles for Topology Series. Prentice Hall, Incorporated, 2000. [Online]. Available: https://books.google.it/books?id=XjoZAQAAIAAJ
[15] J. Henrikson, "Completeness and total boundedness of the Hausdorff metric," MIT Undergraduate Journal of Mathematics, vol. 1, pp. 69–80, 1999.
[16] M. J. Osborne and A. Rubinstein, A Course in Game Theory. MIT Press, 1994.
[17] Y. C. Chen, N. Van Long, and X. Luo, "Iterated strict dominance in general games," Games and Economic Behavior, vol. 61, no. 2, pp. 299–315, November 2007.
[18] D. Bernheim, "Rationalizable strategic behavior," Econometrica, vol. 52, pp. 1007–1028, 1984.
[19] S. T. Rachev, Mass Transportation Problems: Volume I: Theory. Springer, 1998, vol. 1.
[20] Y. Rubner, C. Tomasi, and L. J. Guibas, "The earth mover's distance as a metric for image retrieval," Int. J. Comput. Vision, vol. 40, no. 2, pp. 99–121, November 2000.
[21] I. Csiszár, "The method of types," IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2505–2523, October 1998.
[22] B. Tondi, M. Barni, and N. Merhav, "Detection games with a fully active attacker," in IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, 2015, pp. 1–6.
[23] I. N. Sanov, "On the probability of large deviations of random variables," Mat. Sbornik, vol. 42, pp. 11–44, 1957.
[24] K. Kuratowski, Topology, ser. Topology. Academic Press, 1968, vol. 1.
[25] G. Salinetti and J. B. Wets, "On the convergence of sequences of convex sets in finite dimensions," SIAM Review, vol. 21, no. 1, pp. 18–33, 1979.
[26] D. Bertsimas and J. N. Tsitsiklis, Introduction to Linear Optimization. Belmont, MA: Athena Scientific, 1997, vol. 6.
[27] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

APPENDIX
A. Generalized Sanov's theorem

Let us consider a sequence of $n$ i.i.d. discrete random variables taking values in a finite alphabet $\mathcal{X}$ and distributed according to a pmf $P$. We denote by $P_n$ the empirical pmf of the sequence. Let $E \subseteq \mathcal{P}$ be a set of pmf's. Sanov's theorem [8], [23], [9] states that
\[
\inf_{Q\in E} D(Q\|P) \le -\limsup_{n\to\infty}\frac{1}{n}\log P(P_n\in E) \le -\liminf_{n\to\infty}\frac{1}{n}\log P(P_n\in E) \le \inf_{Q\in \mathrm{int}\,E} D(Q\|P), \quad (A1)
\]
where $\mathrm{int}\,S$ denotes the interior of the set $S$. When $\mathrm{cl}(E) = \mathrm{cl}(\mathrm{int}(E))$ (here $\mathrm{cl}(E)$ denotes the closure of $E$; clearly $\mathrm{cl}(E) \equiv E$ if $E$ is a closed set), or $E \subseteq \mathrm{cl}(\mathrm{int}(E))$, the left- and right-hand sides of (A1) coincide and we get the exact rate:
\[
-\lim_{n\to\infty}\frac{1}{n}\log P(P_n \in E) = \inf_{Q\in E} D(Q\|P). \quad (A2)
\]
If we define the set $E_n = E \cap \mathcal{P}_n$, we have $P(P_n \in E) = P(P_n \in E_n)$ and we can rewrite Sanov's theorem as:
\[
\inf_{Q\in E} D(Q\|P) \le -\limsup_{n\to\infty}\frac{1}{n}\log P(P_n\in E_n) \le -\liminf_{n\to\infty}\frac{1}{n}\log P(P_n\in E_n) \le \inf_{Q\in \mathrm{int}\,E} D(Q\|P). \quad (A3)
\]
Note that, by construction, we have $\mathrm{cl}(E) = \mathrm{cl}(\cup_n E_n)$. In the following, we extend the formulation of Sanov's theorem given in (A3) to more general sequences of sets $E_n$, for which it does not necessarily hold that $E_n = E \cap \mathcal{P}_n$ for some set $E$. We start by introducing the notion of convergence for sequences of subsets due to Kuratowski, which is a more general notion of convergence than the one based on the Hausdorff distance. Let $(S,d)$ be a metric space. We first provide the definition of lower closed limit, or Kuratowski limit inferior [24].

Definition 5. A point $p$ belongs to the lower limit $\mathrm{Li}_{n\to\infty} K_n$ (or simply $\mathrm{Li}\,K_n$) of a sequence of sets $K_n$ if every neighbourhood of $p$ intersects all the $K_n$'s from a sufficiently great index $n$ onward. Given the above definition, the expression $p \in \mathrm{Li}_{n\to\infty} K_n$ is equivalent to the existence of a sequence of points $\{p_n\}$ such that:
\[
p = \lim_{n\to\infty} p_n, \qquad p_n \in K_n. \quad (A4)
\]
Stated in another way, $\mathrm{Li}\,K_n$ is the set of the accumulation points of sequences in $K_n$. As an alternative, equivalent definition we can let:
\[
\mathrm{Li}_{n\to\infty} K_n = \{p \in S \text{ s.t. } \limsup_{n\to\infty} d(p,K_n) = 0\}. \quad (A5)
\]
Similarly, we have the following definition of upper closed limit, or Kuratowski limit superior [24].

Definition 6. A point $p$ belongs to the upper limit $\mathrm{Ls}_{n\to\infty} K_n$ (or simply $\mathrm{Ls}\,K_n$) of a sequence of sets $K_n$ if every neighbourhood of $p$ intersects an infinite number of terms of $K_n$. The expression $p \in \mathrm{Ls}_{n\to\infty} K_n$ is equivalent to the existence of a subsequence of points $\{p_{k_n}\}$, with $k_1 < k_2 < \ldots$ and $p_{k_n} \in K_{k_n}$, converging to $p$.

[...]

[...] verified with $\delta = 0$. If the number of elements in $Q_{n(j)}$ is infinite, then, due to the boundedness of $\mathcal{P}$, the elements of $Q_{n(j)}$ must have at least one accumulation point (Bolzano-Weierstrass theorem). Let the $A_i$'s be the accumulation points of $Q_{n(j)}$. By definition of $\mathrm{Ls}$, all the $A_i$'s belong to $\mathrm{Ls}(\mathrm{cl}(E^{(n)}))$. In addition, for any radius $\rho$, from a certain $j$ onward, all the points in $Q_{n(j)}$ belong to $R = \cup_i \mathcal{B}(A_i,\rho)$, where $\mathcal{B}(A_i,\rho)$ is a ball of radius $\rho$ centred in $A_i$. For large enough $n$, we then have:
\[
\min_{Q\in \mathrm{cl}(E^{(n)})} D(Q\|P) \;\ge\; \min_{Q\in \mathrm{Ls}(\mathrm{cl}(E^{(n)})) \cup R} D(Q\|P) \;\ge\; \min_{Q\in \mathrm{Ls}(\mathrm{cl}(E^{(n)}))} D(Q\|P) - \delta, \quad (A13)
\]
where the second inequality derives from the continuity of the $D$ function and the arbitrariness of $\rho$.
By inserting equation (A11) in (A10), we have that, for large $n$,
\[
\frac{1}{n}\log P(E^{(n)}) \le -\min_{Q\in \mathrm{Ls}\,E^{(n)}} D(Q\|P) + \frac{\log(n+1)^{|\mathcal{X}|}}{n} + \delta, \quad (A14)
\]
and hence, by the arbitrariness of $\delta$,
\[
-\limsup_{n\to\infty}\frac{1}{n}\log P(E^{(n)}) \ge \min_{Q\in \mathrm{Ls}\,E^{(n)}} D(Q\|P). \quad (A15)
\]
We now pass to the upper bound. Let $Q^*$ be a point achieving the minimum of the divergence over the set $\mathrm{Li}\,E_n$. By definition of limit inferior, there exists a sequence of points $\{Q_n\}$, $Q_n \in E_n$, such that $Q_n \to Q^*$ as $n\to\infty$. Then, by exploiting the continuity of $D$, it follows that:
\[
D(Q_n\|P) \le D(Q^*\|P) + \gamma, \quad (A16)
\]
where $\gamma$ can be made arbitrarily small for large $n$. We can then write:
\[
P(E^{(n)}) = \sum_{Q\in E_n} P(T(Q)) \ge P(T(Q_n)) \ge \frac{2^{-n D(Q_n\|P)}}{(n+1)^{|\mathcal{X}|}}. \quad (A17)
\]
Hence, we get
\[
\frac{1}{n}\log P(E^{(n)}) \ge -D(Q_n\|P) - \frac{|\mathcal{X}|\log(n+1)}{n} \ge -D(Q^*\|P) - \gamma - \frac{|\mathcal{X}|\log(n+1)}{n} \ge -\min_{Q\in \mathrm{Li}\,E_n} D(Q\|P) - \gamma - \frac{|\mathcal{X}|\log(n+1)}{n}, \quad (A18)
\]
and then, by the arbitrariness of $\gamma$,
\[
-\liminf_{n\to\infty}\frac{1}{n}\log P(E^{(n)}) \le \min_{Q\in \mathrm{Li}\,E_n} D(Q\|P), \quad (A19)
\]
which concludes the proof of the first part (relation (A7)). For the proof of the second part, we observe that, when $\mathrm{Ls}\,E^{(n)} = \mathrm{Li}(E^{(n)}\cap\mathcal{P}_n)$, the two bounds in (A7) coincide. Moreover, the following chain of inclusions holds: $\mathrm{Li}\,E^{(n)} \subseteq \mathrm{Ls}\,E^{(n)} = \mathrm{Li}(E^{(n)}\cap\mathcal{P}_n) \subseteq \mathrm{Li}\,E^{(n)}$, and then $\mathrm{Li}\,E^{(n)} = \mathrm{Ls}\,E^{(n)} = \mathrm{Lim}\,E^{(n)}$, yielding (A8).

We observe that, in general, the Kuratowski convergence of $E^{(n)}$ is a necessary condition for the existence of the generalized Sanov limit in (A8), but it is not sufficient. In fact, we could have $\mathrm{Li}\,E^{(n)} \supseteq \mathrm{Li}(E^{(n)}\cap\mathcal{P}_n)$, in which case the lower and upper bounds in (A7) do not coincide. It is also interesting to notice that, when $E^{(n)} \subseteq \mathcal{P}_n$ is a sequence of sets in $\mathcal{P}_n$, Sanov's limit holds whenever $E^{(n)} \xrightarrow{K} E$ for some set $E$, or, by exploiting the compactness of $\mathcal{P}$, $E^{(n)} \xrightarrow{H} E$. Based on the above observation, we can state the following corollary.

Corollary 1. Let $E^{(n)}$ be a sequence of sets in $\mathcal{P}_n$ such that $E^{(n)} \xrightarrow{H} E$. Then:
\[
-\lim_{n\to\infty}\frac{1}{n}\log P(P_n \in E^{(n)}) = \min_{Q\in E} D(Q\|P). \quad (A20)
\]

B. Regularity properties of the set of admissible maps

To prove the theorems on the asymptotic behaviour of the payoff in the two versions of the source identification game studied in this paper, we need to prove some regularity theorems on the set of admissible maps. To start with, we need to define a distance between transportation maps, that is, a function $d_s : \mathbb{R}^{|\mathcal{X}|\times|\mathcal{X}|} \times \mathbb{R}^{|\mathcal{X}|\times|\mathcal{X}|} \to \mathbb{R}^+$. In accordance with the rest of the paper, let us choose the $L_1$ distance, that is, given two maps $(S_{PV}, S_{QR})$, we define $d_s(S_{PV},S_{QR}) = \sum_{i,j} |S_{PV}(i,j) - S_{QR}(i,j)|$. Our first result regards the regularity of $\mathcal{A}(L,P)$ as a function of $P$.

Lemma 2. Let $P \in \mathcal{P}$ and let $P'$ be any pmf in the neighbourhood of $P$ of radius $\tau$, i.e., $P' \in \mathcal{B}(P,\tau)$. Then $\delta_H(\mathcal{A}(L,P),\mathcal{A}(L,P')) \le \tau$, and hence $\lim_{\tau\to 0}\delta_H(\mathcal{A}(L,P),\mathcal{A}(L,P')) = 0$, uniformly in $P$. Moreover, if we insist that $P' \in \mathcal{P}_n$, the following result holds: $\forall \varepsilon > 0$, $\exists\, \tau^*$ and $n^*$ such that $\forall \tau < \tau^*$ and $n > n^*$, $\delta_H(\mathcal{A}(L,P),\mathcal{A}^n(L,P')) \le \varepsilon$, $\forall P' \in \mathcal{B}(P,\tau)\cap\mathcal{P}_n$, $\forall P \in \mathcal{P}$.

Proof.
From a general perspective, the lemma follows from the fact that $\mathcal{A}^n(L,P_{y^n})$ (and $\mathcal{A}(L,P)$) is built by imposing a number of linear constraints on the admissible transportation maps (see eq. (11)), i.e. $\mathcal{A}(L,P)$ is a convex polytope [26], [27]. By considering a $P'$ close to $P$, we are perturbing the vector of known terms of the linear constraints which define the admissibility set. Instead of invoking the above general principle, in the following we give an explicit proof of the lemma.

Given $P \in \mathcal{P}$ and $P' \in \mathcal{B}(P,\tau)$, let $\tau(i) = P(i) - P'(i)$ be the excess (or defect) of mass of $P$ with respect to $P'$ in bin $i$. For any map in $\mathcal{A}(L,P)$, we can choose a map $S_{P'V'}$ that works as follows. For the bins $i$ such that $\tau(i) \le 0$, let $S_{P'V'}(i,j) = S_{PV}(i,j)$ for $j \ne i$, while for $j = i$ we let $S_{P'V'}(i,i) = S_{PV}(i,i) + |\tau(i)|$. For the bins $i$ for which $\tau(i) > 0$, we first sort the index set $\{j : S_{PV}(i,j) \ne 0\}$ in decreasing order with respect to the amount of distortion introduced per unit of mass delivered from $i$ to $j$ (that is, $d(i,j)$). Then, starting from the first index in the ordered list, we let $S_{P'V'}(i,j) = \max(0, S_{PV}(i,j) - \tau(i))$. If $S_{P'V'}(i,j) = 0$, we update $\tau(i)$ to a new value $\tau'(i) = \tau(i) - S_{PV}(i,j)$ and iterate the procedure by subtracting the updated value $\tau'(i)$ from the next $S_{PV}(i,j)$ in the list. This procedure goes on until the subtraction gives $S_{P'V'}(i,j) \ne 0$, that is, when we have removed all the excess mass from the $i$-th row of $S_{PV}(i,j)$.

It is easy to see that the map built in this way satisfies the distortion constraint; in fact, by construction, the distortion associated with $S_{P'V'}$ is not larger than that introduced by $S_{PV}$. Then $S_{P'V'} \in \mathcal{A}(L,P')$. In addition, by construction, $\sum_j |S_{P'V'}(i,j) - S_{PV}(i,j)| \le |\tau(i)|$, and hence $\sum_{i,j} |S_{P'V'}(i,j) - S_{PV}(i,j)| \le \tau$. Accordingly, we have:
\[
\delta_{\mathcal{A}(L,P)}(\mathcal{A}(L,P')) = \max_{S_{PV}\in\mathcal{A}(L,P)}\; \min_{S_{P'V'}\in\mathcal{A}(L,P')} d_s(S_{PV},S_{P'V'}) \le \tau, \quad (A21)
\]
since, as shown by the preceding construction, the inner minimum is always lower than or equal to $\tau$. By repeating the same argument with the roles of $\mathcal{A}(L,P)$ and $\mathcal{A}(L,P')$ exchanged, we find that $\delta_H(\mathcal{A}(L,P'),\mathcal{A}(L,P)) \le \tau$, thus concluding the first part of the proof.

In the second part of the lemma, we require that $P' \in \mathcal{P}_n$ and that the map produce a sequence in $\mathcal{P}_n$. The proof is easily obtained by exploiting the first part of the lemma, according to which, for any map $S_{PV}$ in $\mathcal{A}(L,P)$, we can find a map $S_{P'V'}$ in $\mathcal{A}(L,P')$ which is arbitrarily close to $S_{PV}$, and then approximating $S_{P'V'}$ with a map $S^n_{P'V'} \in \mathcal{A}^n(L,P')$. Due to the density of the rational numbers in the reals, such an approximation can be made arbitrarily accurate by increasing $n$, thus completing the proof.

Given a transformation $S_{PV}$ mapping $P$ into $V$, Lemma 2 states that, for any pmf $P'$ close to $P$, we can find a map $S_{P'V'}$ close to $S_{PV}$. The following theorem extends such a result to the pmf resulting from the application of the mapping.

Theorem 7. Let $P \in \mathcal{P}$, and let $P'$ be any pmf in the neighbourhood of $P$ of radius $\tau$, i.e., $P' \in \mathcal{B}(P,\tau)$. Let $S_{PV} \in \mathcal{A}(L,P)$. Then, we can always find a map $S_{P'V'} \in \mathcal{A}(L,P')$ such that $V' \in \mathcal{B}(V,\tau)$.
Similarly, for any $\varepsilon > 0$, there exist $\tau^*$ and $n^*$ such that, $\forall \tau < \tau^*$ and $n > n^*$, given a $P \in \mathcal{P}$, a map $S_{PV} \in \mathcal{A}(L,P)$ and $P' \in \mathcal{P}_n \cap \mathcal{B}(P,\tau)$, we can find a map $S^n_{P'V'}$ in $\mathcal{A}^n(L,P')$ such that $V'_n \in \mathcal{B}(V,\varepsilon)\cap\mathcal{P}_n$.

Proof. For any two maps $S_{PV}$ and $S_{P'V'}$, we have:
\[
V'(j) = \sum_i S_{P'V'}(i,j) = \sum_i \big(S_{PV}(i,j) + (S_{P'V'}(i,j) - S_{PV}(i,j))\big) \le V(j) + \sum_i |S_{P'V'}(i,j) - S_{PV}(i,j)|, \quad (A22)
\]
and
\[
V'(j) = \sum_i S_{P'V'}(i,j) = \sum_i \big(S_{PV}(i,j) + (S_{P'V'}(i,j) - S_{PV}(i,j))\big) \ge V(j) - \sum_i |S_{P'V'}(i,j) - S_{PV}(i,j)|, \quad (A23)
\]
yielding:
\[
|V'(j) - V(j)| \le \sum_i |S_{P'V'}(i,j) - S_{PV}(i,j)|. \quad (A24)
\]
By summing over $j$ and exploiting Lemma 2, we can choose $S_{P'V'}$ so that:
\[
\sum_j |V'(j) - V(j)| \le \sum_{i,j} |S_{P'V'}(i,j) - S_{PV}(i,j)| \le \delta_H(\mathcal{A}(L,P'),\mathcal{A}(L,P)) \le \tau, \quad (A25)
\]
and hence $V' \in \mathcal{B}(V,\tau)$. Similarly to the second part of Lemma 2, the second part of the theorem follows immediately from the density of the rational numbers in the real line.
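The constructive step in the proof of Lemma 2 is essentially a small mass-shuffling algorithm, and it can be made concrete in a few lines of Python. The sketch below is illustrative only; the distortion matrix and the input map are arbitrary choices, ties in the sorting are broken arbitrarily, and the code assumes $d(i,i)=0$ as in the paper. It builds the perturbed map $S_{P'V'}$ from $S_{PV}$ exactly as described: deficit bins receive the missing mass on the diagonal (zero distortion), while excess bins shed mass starting from their most expensive entries.

```python
import numpy as np

def perturb_map(S, P_prime, d):
    """Given a transportation map S with row sums P, build a map S' with row
    sums P_prime that stays close to S and does not increase the total
    distortion (the construction used in the proof of Lemma 2)."""
    S_new = np.asarray(S, float).copy()
    P = S_new.sum(axis=1)
    for i, excess in enumerate(P - np.asarray(P_prime, float)):
        if excess <= 0:
            # deficit bin: park the extra mass on the diagonal (d(i,i)=0)
            S_new[i, i] += -excess
        else:
            # excess bin: remove mass from the most distorting entries first
            order = np.argsort(-np.asarray(d, float)[i])
            left = excess
            for j in order:
                if left <= 0:
                    break
                take = min(S_new[i, j], left)
                S_new[i, j] -= take
                left -= take
    return S_new

# toy check with arbitrary numbers
d = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
S = np.array([[0.2, 0.1, 0.1], [0.0, 0.3, 0.1], [0.05, 0.05, 0.1]])
P_prime = np.array([0.35, 0.45, 0.20])
S_new = perturb_map(S, P_prime, d)
print(np.allclose(S_new.sum(axis=1), P_prime))              # rows now match P'
print((S_new * d).sum() <= (S * d).sum() + 1e-12)            # distortion not increased
print(np.abs(S_new - S).sum() <= np.abs(S.sum(axis=1) - P_prime).sum() + 1e-12)
```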