Smart Jammer and LTE Network Strategies in An Infinite-Horizon Zero-Sum Repeated Game with Asymmetric and Incomplete Information


Authors: Farhan M. Aziz, Lichun Li, Jeff S. Shamma, and Gordon L. Stüber

Abstract—LTE/LTE-Advanced networks are known to be vulnerable to denial-of-service (DOS) and loss-of-service attacks from smart jammers. In this article, the interaction between a smart jammer and the LTE network (eNode B) is modeled as an infinite-horizon, zero-sum, repeated game with asymmetric and incomplete information. The smart jammer and the eNode B are modeled as the informed and the uninformed player, respectively. The main purpose of this article is to construct efficient suboptimal strategies for both players that can be used to solve the above-mentioned infinite-horizon repeated game with asymmetric and incomplete information. It has been shown in the game-theoretic literature that security strategies provide the optimal solution in zero-sum games. It is also shown that both players' security strategies in an infinite-horizon asymmetric game depend only on the history of the informed player's actions. However, fixed-sized sufficient statistics are needed for both players to solve the above-mentioned game efficiently. The smart jammer (informed player) uses its evolving belief state as the fixed-sized sufficient statistics for the repeated game, whereas the LTE network (uninformed player) uses the worst-case regret of its security strategy and its anti-discounted update as the fixed-sized sufficient statistics. Although fixed-sized sufficient statistics are employed by both players, optimal security strategy computation in λ-discounted asymmetric games is still hard to perform because of non-convexity. Hence, the problem is convexified in this article by devising “approximated” security strategies for both players that are based on an approximated optimal game value. However, “approximated” strategies require full monitoring.
Therefore, a simplistic yet effective “expected” strategy is also constructed for the LTE network (uninformed player) that does not require full monitoring. The simulation results show that the smart jammer plays non-revealing and misleading strategies against the network for its own long-term advantage.

Index Terms—LTE; smart jamming; λ-discounted repeated games; asymmetric information; linear programming.

F. M. Aziz is a systems engineer working on 4G/5G technologies at Intel Corporation, San Diego, CA, and an alumnus of the Wireless Systems Laboratory (WSL), School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA. G. L. Stüber is the professor and director of the Wireless Systems Laboratory (WSL). E-mail: {faziz,stuber}@ece.gatech.edu. L. Li is an assistant professor with the Department of Industrial and Manufacturing Engineering, Florida A&M University - Florida State University College of Engineering, Tallahassee, FL 32310, USA. E-mail: lichunli@eng.famu.fsu.edu. J. S. Shamma is the professor and director of the RISC Lab, Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia. E-mail: jeff.shamma@kaust.edu.sa.

I. INTRODUCTION

LTE/LTE-A ([1], [2]) networks have been deployed around the world providing advanced data, Voice-over-LTE (VoLTE), multimedia and location-based services to more than 3.2 billion subscribers via 681 commercial networks [3]. However, it has been previously shown that Long Term Evolution (LTE) and LTE-Advanced (LTE-A) networks are vulnerable to control-channel jamming attacks from smart jammers who can “learn” network parameters and “synchronize” themselves with the network even when they are not attached to it (cf. [4]–[9]).
It is shown in the above-referenced articles that such a smart jammer can launch very effective denial-of-service (DOS) and loss-of-service attacks without even hacking the network or its components. Recently, Garnaev and Trappe [10] also looked into the possibility that the rival is not smart in a jamming game with incomplete information. Hence, pursuing autonomous techniques to address the potentially devastating wireless jamming problem has become an active research topic.

In this article, the interaction between the LTE network and the smart jammer is modeled as an infinite-horizon zero-sum repeated Bayesian game with asymmetric and incomplete information. This article is similar to previously published articles by the authors ([5], [7], [9]) with the exception of the zero-sum and λ-discounted utility. The main purpose of this article is to construct efficient suboptimal strategies for both players to solve the above-mentioned infinite-horizon game with asymmetric and incomplete information. Asymmetric information games (cf. [11]–[15]) provide a rich framework to model situations in which one player lacks complete knowledge about the “state of nature”. The player who possesses complete knowledge about the state of nature is known as the informed player and the one who lacks this knowledge is called the uninformed player. The smart jammer is modeled as the informed row player, whereas the LTE eNode B is modeled as the uninformed column player. The informed player deals with the ultimate and subtle tradeoff of exploiting its superior information at the cost of revealing that information via its actions or some other (unavoidable) signals during repeated interactions with other players (cf. [11], [13]). In most game-theoretic literature on repeated games with asymmetric information, the informed player's strategy is computed based on how much information it should reveal for an optimal or suboptimal policy.
Furthermore, many informed-player zero-sum formulations model the uninformed player as a Bayesian player in order to solve asymmetric games (cf. [16]–[20]). (A repeated game results when an underlying stage game is played over many stages, and a player may take into account its observations about all previous stages when making a decision at each stage [11].) However, relatively little work has been done to address the optimal strategy computation of the uninformed player in an infinite-horizon repeated zero-sum game with asymmetric information [21]. The main difficulty arises from the fact that the uninformed player lacks complete knowledge about the state of nature and the informed player's belief state, which plays a crucial role in determining players' payoffs and strategies. However, it has been shown in [14] that the uninformed player's security strategy does exist in finite-horizon and infinite-horizon games with discounted and bounded average cost formulations. Furthermore, it has been shown in [22] that the uninformed player's security strategy does not depend on the history of its own actions.

This article attempts to solve the above-mentioned LTE network vs. smart jammer game by constructing efficient linear programming (LP) formulations for both players' “approximated” strategy computation and a unique “expected” strategy computation for the eNode B. The informed player's (the smart jammer's) security strategy (optimal strategy in the worst-case scenario) only depends on the history of its own actions and is independent of the other player's actions. The smart jammer models the uninformed player as a Bayesian player, making Bayesian updates with an evolving belief state. However, in order to solve the infinite-horizon game efficiently, fixed-sized sufficient statistics are needed for both players that do not grow with the horizon.
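The Bayesian update behind the informed player's evolving belief state is standard conditional probability. As a minimal sketch (the jammer types, actions, and strategy probabilities below are illustrative placeholders, not values from this article):

```python
# Hypothetical sketch: one Bayesian belief update over the jammer type,
# as maintained when the opponent is modeled as a Bayesian observer.

def update_belief(belief, strategies, action):
    """belief[theta] -> P(theta | observed action).

    belief     : dict mapping type -> prior probability
    strategies : dict mapping type -> (dict mapping action -> probability),
                 the informed player's type-dependent mixed strategy
    action     : the action just observed
    """
    joint = {t: belief[t] * strategies[t].get(action, 0.0) for t in belief}
    total = sum(joint.values())
    if total == 0.0:  # observed action had zero probability; keep the prior
        return dict(belief)
    return {t: p / total for t, p in joint.items()}

# Example: uniform prior over two types; the "cheater" type jams the CS-RS
# more often than the "saboteur" type (numbers are made up).
prior = {"cheater": 0.5, "saboteur": 0.5}
sigma = {
    "cheater":  {"jam_csrs": 0.8, "inactive": 0.2},
    "saboteur": {"jam_csrs": 0.4, "inactive": 0.6},
}
posterior = update_belief(prior, sigma, "jam_csrs")
# posterior["cheater"] = 0.5*0.8 / (0.5*0.8 + 0.5*0.4) = 2/3
```

Repeating this update after each action yields the evolving belief state that serves as the informed player's fixed-sized sufficient statistics: its size equals |Θ| regardless of how long the game runs.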
The evolving belief state serves as the sufficient statistics for the informed player in a λ-discounted asymmetric repeated game. On the other hand, the uninformed player's (the eNode B's) security strategy does not depend on the history of its own actions, but rather depends on the history of the informed player's actions. However, the uninformed player does not have access to the informed player's belief state and needs to find different fixed-sized sufficient statistics. Fortunately, the uninformed player's security strategy in the dual game depends only on a fixed-sized sufficient statistic that is fully available to it. Furthermore, the uninformed player's security strategy in the dual game, with an initial worst-case regret vector, also serves as its security strategy in the primal game. Therefore, the initial worst-case regret of its security strategy and its anti-discounted update (which is the same size as the cardinality of the system state) is used as the fixed-sized sufficient statistics for the uninformed player.

Although the above-mentioned sufficient statistics are fixed-sized for both players in an infinite-horizon game, the optimal security strategy in a λ-discounted asymmetric game is still hard to compute because of non-convexity [23]. Consequently, “approximated” security strategies based on an approximated optimal game value with guaranteed performance are computed for both players, based on recent work by Li and Shamma [20]. The above-mentioned “approximated” security strategies require full monitoring. (Full monitoring requires that all players are capable of observing previous actions of their opponents with certainty after each stage [11].) Since the eNode B cannot observe the smart jammer's actions with complete certainty, a unique “expected” strategy formulation is also presented for the uninformed player (the eNode B) that does not require full monitoring. Both the smart jammer and the LTE eNode B exploit the
“approximated” and “expected” formulations to compute their suboptimal yet efficient strategies in order to maximize their corresponding utilities. The main idea of the zero-sum formulation is to find an (almost) optimal yet tractable strategy for both players in the worst-case scenario. It is to be noted here that the smart jammer is the maximizer in the zero-sum game and the eNode B is the minimizer. Also, full monitoring is only limited to observing actions of the opponent and, hence, does not reveal any private information of ordinary UEs in the network to the smart jammer.

A. Related Work

Game theory (cf. [11]–[15], [24], [25]) provides a rich set of mathematical tools to analyze and address conflict and cooperation scenarios in multi-player situations, and as such has been applied to a multitude of real-world situations in economics, biology, cyber security, multi-agent networks, wireless networks (cf. [26]–[28]) and more. In this article, the interaction between the LTE network and the smart jammer is modeled as an infinite-horizon zero-sum repeated game with asymmetric information.

Zero-sum repeated game formulations have been studied extensively in the game-theoretic literature, including asymmetric information cases, such as Chapter 5 of [11], Chapter 4 of [13], Chapters 2–4 of [14], and Chapter 2 of [25]. However, most of the prior work on asymmetric zero-sum repeated games deals with the informed player's viewpoint. For example, [11] and [13] pointed out that the informed player might reveal its superior information implicitly by its actions and, hence, may want to refrain from certain actions in order not to reveal that information. In the case of full monitoring, playing a non-revealing strategy for the informed player is equivalent to not using its superior information [11].
Furthermore, [14] showed that the informed player's belief state (the conditional probability of the game being played given the history of the informed player's actions) is its sufficient statistics for making long-run decisions. Hence, many informed player strategies (cf. [16]–[20]) use the belief state as their sufficient statistics. However, [23] showed that computing the optimal value of the infinite-horizon repeated game is non-convex and identified the computational complexities involved in solving infinite-horizon games. Therefore, the above-mentioned articles approximate the optimal game value via linear programming. On the other hand, limited work has been done on the uninformed player's optimal strategy computation as compared to the vast research done for the informed player [21]. It is, however, known that the uninformed player's security strategy exists in infinite-horizon repeated zero-sum games, and that it does not depend on the history of its own actions (cf. [19], [22]). But efficient computation of the uninformed player's optimal security strategy is still an open problem. Recently, [21] suggested that the uninformed player could use its expected payoff for each candidate game as sufficient statistics since it is unaware of the game being played. Similarly, [19] used the realized vector payoff as the uninformed player's sufficient statistics to compute its efficient but suboptimal strategy in finite-horizon zero-sum repeated games. However, it is to be noted here that all of these formulations are based on the commonly-assumed notion of full monitoring in which players can perfectly observe their opponent's actions. This article also utilizes the notion of full monitoring for its “approximated” strategy computation.
Although there has been quite a lot of work done on infinite-horizon repeated zero-sum games with asymmetric information, there do not exist any tractable non-zero-sum formulations for the uninformed player that can be used for its optimal strategy computation in infinite-horizon asymmetric repeated games. Most of the classic general-sum (non-zero-sum) game-theoretic literature, like Chapter 6 of [11] and Chapters V and IX of [15], focuses on the characterization and existence of equilibria in repeated games with asymmetric information, and deals with optimal strategy construction for the full monitoring case. Chapter V of [15] also suggests using approachability theory for the construction of the uninformed player's strategy in the full monitoring case. However, none of these formulations result in efficient computation of the uninformed player's optimal strategy. This problem gets further complicated for general-sum (non-zero-sum) games with imperfect monitoring. For example, [29] pointed out that the solution of a general-sum (non-zero-sum) stochastic game with both incomplete knowledge and imperfect monitoring is an open problem and there is no well-established solution available so far. To the best of our knowledge, that is still the case for repeated as well as stochastic general-sum (non-zero-sum) games (e.g., see [21], [34]). This is one of the main reasons that the LTE network and smart jammer interaction is not modeled as a general-sum (non-zero-sum) game in this paper.

Bayesian approaches have been widely used to solve asymmetric information problems in which the updated belief state can be used as sufficient statistics for the informed player. The belief state often serves as a tool for updating the internal notion of a player's knowledge about another. For example, [16]–[20] modeled the uninformed player as a Bayesian player in order to compute the informed player's suboptimal strategies in repeated zero-sum games.
Similarly, [30]–[32] used Bayesian approaches to devise an uninformed player's strategy based on expected payoff, and [29] employed Bayesian Nash-Q learning in an incomplete-information stochastic game and used Bayes' formula to update the belief of an Intrusion Detection System (IDS). Another commonly used technique to address lack-of-information problems is state estimation. For example, [33] used a Kalman filter to estimate the state of an observable, linear, stochastic dynamic system in an infrastructure security game. However, the system of interest and game dynamics in this article are nonlinear and may not be completely observable. Therefore, the applicability of state estimation techniques is very limited.

To the best of our knowledge, there do not exist any explicit formulations for the optimal strategy computation of the uninformed player in an infinite-horizon repeated zero-sum game with asymmetric information [21]. However, it has been shown in [14] that the uninformed player's security strategy does exist in finite-horizon and infinite-horizon games with discounted and bounded average cost formulations. Moreover, it has been shown in [22] that the uninformed player's security strategy does not depend on the history of its own actions. Nevertheless, a recent LP formulation in [20] provides an efficient technique for the explicit “approximated” strategy computation of the uninformed player in infinite-horizon asymmetric repeated zero-sum games, but with the assumption of full monitoring.

Furthermore, there are multiple differences between this article and [9], which is focused only on estimating the jammer type at the beginning of the game; as such, strategies computed in that article cannot be used for long-term interaction. On the other hand, this article is focused on computing both players' strategies for very long-term interaction.
Moreover, the smart jammer in [9] is modeled as a myopic player (i.e., it only cares about short-term utility) as opposed to being modeled as a strategic player in this article. In addition, the interaction between the smart jammer and the eNode B in [9] is modeled as a general-sum (non-zero-sum) game without perfect monitoring, as opposed to the zero-sum formulation in this article. The eNode B exploits the “jamming sense” part of the algorithm presented in [9] to invoke the strategy computation algorithms discussed in this article. On the other hand, the article [20] written by co-authors Li and Shamma is focused on an approximated yet efficient LP formulation for both players' strategies in an abstract repeated zero-sum game with perfect monitoring. This “approximated” strategy construction technique is further extended to a realistic smart jammer and LTE network interaction in this article. On top of that, a unique “expected” strategy formulation is also explored in the article.

The smart jamming problem in LTE networks has been studied extensively lately. However, to the best of our knowledge, none of the articles published so far studied the smart jamming problem in LTE networks in a game-theoretic manner.

II. SMART JAMMING IN LTE NETWORKS

Potential smart jamming attacks and suggested network countermeasures are the same as described in [5]. They are briefly discussed here for the sake of completeness.

A. Smart Jamming Attacks on an LTE Network

The set of the smart jammer's pure actions consists of the following jamming attacks:

1) a_j^1 = Inactive (no jamming): corresponds to default UE operation when no jammer is active in the network.

2) a_j^2 = Jam CS-RS: corresponds to OFDM pilot jamming/nulling, extensively studied in the literature.
This action prevents all UEs from demodulating data channels by prohibiting them from performing coherent demodulation, degrades cell quality measurements, and blocks initial cell acquisition in the jamming area.

3) a_j^3 = Jam CS-RS + PUCCH: corresponds to jamming the PUCCH in the UL in addition to DL CS-RS jamming. This action could be more catastrophic for Connected mode UEs as compared to jamming the CS-RS alone, due to the eNode B's loss of critical UL control information, but requires more sophisticated dual-band jamming. (See [1] or [2] for a description of the various LTE channels.)

4) a_j^4 = Jam CS-RS + PBCH + PRACH: corresponds to jamming the DL broadcast channel PBCH and the UL random access channel PRACH in addition to pilot nulling. This action is intended to block reselection/handover of UEs from neighboring cells and block synchronization of idle mode and out-of-sync UEs.

5) a_j^5 = Jam CS-RS + PCFICH + PUCCH + PRACH: corresponds to jamming the CS-RS and PCFICH in the DL and jamming the PUCCH and PRACH in the UL. This action is intended to cause loss of DL and UL grants, radio link failures, and loss of UE synchronization, mostly in Connected mode UEs.

Although jamming individual control channels may also cause denial-of-service (DOS) attacks, the jamming effects may not have the specific consequences desired by the smart jammer. For example, jamming the CS-RS alone could be limited to a specific part of a cell depending on the jammer location and its transmit power, and jamming the PBCH alone may only prevent a small fraction of UEs from reselecting/handing over to the cell if they have not visited that particular cell recently. Even though concatenating jamming attacks on multiple control channels requires distribution of the jammer's transmit power among all jamming activities, it allows the smart jammer to target specific aspects of network operation, such as cell reselection/handover, data download and upload, etc.
In addition to the above-mentioned pure actions, the smart jammer uses its probability of jamming (p_j) and transmit power (P_j) to decide when to jam the network and how much power to use during a particular jamming attack. The duty cycle of each action is also implicitly modeled in the utility function of the jammer. Thus, the smart jammer launches denial-of-service (DoS) and loss-of-service attacks on the LTE network by employing these actions, which can be easily implemented using a software-defined radio (SDR) and a colluding UE.

B. Suggested Network Countermeasures

It is proposed that the network can use the following (pure) countermeasures in case of a jamming attack:

1) a_0^1 = Normal (default action): corresponds to default network operation.

2) a_0^2 = Increase CS-RS Transmit Power: corresponds to pilot boosting in order to alleviate CS-RS jamming, at the expense of transmitting other channels at lower transmit power than in normal operation.

3) a_0^3 = Throttle: corresponds to a specific threat mechanism in which all active UEs' DL/UL grants (and hence throughputs) are throttled.

4) a_0^4 = Change eNode B f_c + SIB 2: corresponds to a specific interference avoidance mechanism in which the network “relocates” its carrier frequency f_c to a different carrier within its allocated band/channel and rearranges itself into a lower occupied bandwidth configuration. It also changes its PRACH configuration parameters in SIB 2 to alleviate PRACH jamming.

5) a_0^5 = Change eNode B Timing: corresponds to a specific interference avoidance mechanism in which the network “resets” its frame/subframe/slot/symbol timing and SIB 2 parameters. Ongoing data sessions would be handed over to neighboring cells before the “reset” and the cell would not be available during the transition.
It is to be noted here that the above-mentioned countermeasures do not require any exogenous information or significant changes in 3GPP specifications and can be implemented easily with current technology. Furthermore, the network is not aware of the jammer's location, jamming waveform, or its probability of jamming p_j. Also, the average duty cycle and the eNode B's transmit power (P_0) determine the power consumption of the network, which is modeled in its utility function. The curious reader is encouraged to see [9], [34] for further details on smart jammer actions and network countermeasures.

III. LTE NETWORK & SMART JAMMER DYNAMICS

A. Network Model

The network model used in this article is the same as developed in [9] and is briefly discussed here for the sake of completeness. UEs arrive in the cell according to a homogeneous 2D Stationary Spatial Poisson Point Process (SPPP) with rate Λ per unit area and are uniformly distributed over the entire cell conditioned on the total number of users N. The large-scale path loss is modeled using the Simplified Path Loss Model [35]:

    P_r (dBm) = P_t (dBm) + K (dB) − 10 γ log10(d / d_0)    (1)

where P_r is the received power, P_t is the transmitted power, K (dB) = 20 log10(λ / (4π d_0)) is a constant, γ is the path loss exponent, d is the distance between the transmitter and the receiver, and d_0 is the outdoor reference distance for the antenna far field. The small-scale multipath fading is modeled using exponentially distributed Rayleigh-faded channel gains at each subcarrier.
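The path loss model in (1) can be sketched directly; the carrier wavelength, transmit power, reference distance, and path loss exponent below are illustrative values, not parameters from the article:

```python
import math

# Minimal sketch of the Simplified Path Loss Model in (1).
# All numeric values below are made up for illustration.

def received_power_dbm(pt_dbm, wavelength_m, d_m, d0_m, gamma):
    """P_r(dBm) = P_t(dBm) + K(dB) - 10*gamma*log10(d/d0),
    with K(dB) = 20*log10(wavelength / (4*pi*d0))."""
    k_db = 20.0 * math.log10(wavelength_m / (4.0 * math.pi * d0_m))
    return pt_dbm + k_db - 10.0 * gamma * math.log10(d_m / d0_m)

# Example: ~2 GHz carrier (wavelength ~0.15 m), 46 dBm transmit power,
# path loss exponent 3.5, reference distance 10 m, receiver at 500 m.
pr = received_power_dbm(46.0, 0.15, 500.0, 10.0, 3.5)
```

At d = d_0 the formula reduces to P_t + K, and received power falls off by 10γ dB per decade of distance beyond that, which is the usual behavior of this model.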
Thus, the instantaneous SINR Γ[k] of a particular OFDM subcarrier k is modeled as follows:

    Γ[k] = P_0[k] |h|² K (R_0/d_0)^(−γ) / ( σ² + P_j[k] |g|² K (R_j/d_0)^(−γ) )    (2)

where P_0 and P_j are the desired and jammer transmit powers, |h|² and |g|² are exponentially distributed Rayleigh-faded channel gains, R_0 and R_j are the large-scale distances from the desired transmitter and the jammer respectively, γ is the path loss exponent, and σ² is the noise variance at the receiver. It is assumed that Inter-Cell Interference (ICI) is independent of jamming and, hence, any residual ICI can be lumped together in the noise variance σ² for the scope of this article. It is further assumed that σ² is the same at all receivers. The SINR in (2) can be re-written in terms of the Carrier-to-Jammer ratio C/J as follows:

    Γ[k] = (C/J) |h|² K (R_0/d_0)^(−γ) / ( σ²/P_j[k] + |g|² K (R_j/d_0)^(−γ) )    (3)

Equations (2) and (3) are used to model the SINR of narrowband flat-faded signals and channels like the CS-RS, PCFICH, PUCCH, etc. However, wideband channels like the PDSCH and PUSCH cannot be modeled using (2) or (3). Furthermore, SINR estimation is done in the frequency domain. In addition, the LTE network's m-th user's DL PDSCH throughput R_m(k, l) in the k-th resource block during the l-th subframe is modeled as a fraction ε ∈ (0, 1] of Shannon's AWGN channel capacity as described in (4):

    R_m(k, l) = ε W_RB log2[1 + Γ_m^PDSCH(k, l)]    (4)

where W_RB is the bandwidth of a single RB, i.e., 180 kHz. For the purposes of this article, it is assumed that ε = 1. The m-th user's total throughput in a given subframe is the sum of its assigned RBs' throughputs for that particular subframe. It is modeled that the eNode B uses a Proportional Fair Scheduling (PFS) [36] algorithm to allocate resources to its users.
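The narrowband SINR model (2), the per-RB Shannon-rate model (4), and a proportional-fair selection step can be sketched together; all numeric parameters are illustrative, and the scheduler is simplified to a single resource block per step:

```python
import math

W_RB = 180e3   # bandwidth of one resource block, Hz
EPS  = 1.0     # epsilon in (4); the article assumes epsilon = 1

def sinr(p0, h2, r0, pj, g2, rj, k_lin, d0, gamma, sigma2):
    """Gamma = P0*|h|^2*K*(R0/d0)^-gamma / (sigma^2 + Pj*|g|^2*K*(Rj/d0)^-gamma),
    with K given in linear (not dB) scale."""
    sig = p0 * h2 * k_lin * (r0 / d0) ** (-gamma)
    jam = pj * g2 * k_lin * (rj / d0) ** (-gamma)
    return sig / (sigma2 + jam)

def rb_rate(gamma_sinr):
    """R = eps * W_RB * log2(1 + Gamma), bits/s for one RB, per (4)."""
    return EPS * W_RB * math.log2(1.0 + gamma_sinr)

def pfs_step(inst_rates, avg_tput, t_c=100.0):
    """One proportional-fair step: pick the user maximizing the ratio of
    instantaneous rate to long-term average throughput (assumed nonzero),
    then update the exponentially weighted averages with window t_c."""
    winner = max(range(len(inst_rates)), key=lambda m: inst_rates[m] / avg_tput[m])
    new_avg = [(1 - 1 / t_c) * a + (1 / t_c) * (inst_rates[m] if m == winner else 0.0)
               for m, a in enumerate(avg_tput)]
    return winner, new_avg
```

For instance, `pfs_step([1e6, 5e5], [1e6, 1e5])` selects user 1: its instantaneous rate is lower, but its rate-to-average ratio (5.0) beats user 0's (1.0), which is exactly the fairness behavior the PFS rule is designed to produce.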
User m is allocated resource block k during the l-th subframe if the ratio of its achievable instantaneous data rate to its long-term average throughput in (5) is the highest among all the active users in the network. The long-term average throughput R̄_m(l) of user m during subframe l is computed using the recursive equations (5) and (6) below:

    m̂_k = arg max_{m′=1,...,N} [ R_{m′}(k, l) / R̄_{m′}(l) ]    (5)

    R̄_m(l) = (1 − 1/t_c) R̄_m(l − 1) + (1/t_c) Σ_{k=1}^{K} R_m(k, l) I(m̂_k = m)    (6)

where t_c represents the fairness time window and I is an indicator function.

In general, the overall LTE network dynamics can be modeled as a highly nonlinear dynamical system described by:

    χ⁺ = f(χ, θ, a_0, a_j, ω)    (7)

where χ ∈ R^(M×K) represents the state of the network (not to be confused with the game-theoretic state of nature θ), with each row corresponding to a user m ∈ M and including K elements for each user (such as the SINRs Γ_m of its control and data channels, and the average throughput for user m ∈ M); θ represents the game-theoretic state of nature (jammer type) described in the next section; a_0 ∈ A_0 represents the eNode B action; a_j ∈ A_j represents the jammer's action; and ω characterizes the randomness in the network induced by the channel, arbitrary user locations, varying transmit power levels, PFS scheduling and other sources of randomness in the network. Thus, the network dynamics are modeled as a Partially-Observable Markov Decision Process (POMDP). Evidently, it is nontrivial and intractable to model the LTE network and smart jammer dynamics analytically. Hence, these abstracted dynamics are simulated in MATLAB without losing any modeling fidelity. Although this article can be used as a building block for more complicated scenarios, multi-cell and multi-jammer scenarios are beyond the scope of this article.

B. Game-Theoretic Model

Notations: For the rest of this paper, the following mathematical notations are used.
Let A be a finite set. Its cardinality is denoted by |A|, and the set of all probability distributions over A is denoted by Δ(A).

1) Game Model: The interaction between the LTE network and the smart jammer is modeled as a strictly competitive infinite-horizon zero-sum repeated Bayesian game G with asymmetric and incomplete information, with the smart jammer as the informed (row) player and the eNode B as the uninformed (column) player. Infinite-horizon games are used to model situations in which the horizon length is not fixed in advance and there is a non-zero probability at the end of each stage that the game will continue to the next stage (e.g., see [12]). The game G is described by:

• N = {smart jammer, eNode B}, the set of players,
• Θ, the set of states of nature (jammer types),
• p_0 ∈ Δ(Θ), the prior probability distribution on Θ, which is common knowledge,
• A_j and A_0, the sets of pure actions of the smart jammer and the eNode B, respectively, as described in Section II, where a_j ∈ A_j and a_0 ∈ A_0 represent corresponding elements in these sets,
• H, a set of sequences such that each H ∈ H is a history of observations,
• I_i, the information partition of player i, and
• U_i : Θ × A_j × A_0 → R, the single-stage utility function of player i, with U_i^θ ∈ R^(|A_j|×|A_0|) the utility matrix of player i given jammer type θ, whose element U_i^θ(a_j, a_0) is U_i(θ, a_j, a_0), ∀ a_j ∈ A_j, a_0 ∈ A_0.

Following the convention used in the game-theoretic literature, including [20], the informed player, i.e., the smart jammer, is played as the maximizer (row player), whereas the uninformed player, i.e., the eNode B, is played as the minimizer (column player). It is to be noted here that ordinary user equipments (UEs) are not modeled as players in this article.
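For intuition about security strategies in this zero-sum setting, consider a single-stage 2×2 matrix game: when there is no pure saddle point, the row player's (maximizer's) security strategy equalizes its payoff across the opponent's columns and has a closed form. A sketch with arbitrary payoff entries (not the article's utility matrices):

```python
def security_strategy_2x2(U):
    """Row player's mixed security strategy and game value for a 2x2
    zero-sum game with payoff matrix U = [[a, b], [c, d]] (row maximizes).
    Assumes no pure saddle point, so the optimal strategy is fully mixed:
    equalizing p*a + (1-p)*c = p*b + (1-p)*d gives the formulas below."""
    (a, b), (c, d) = U
    denom = a - b - c + d
    p = (d - c) / denom            # probability of playing row 0
    value = (a * d - b * c) / denom
    return p, value

# Matching-pennies-style example: the unique security strategy mixes 50/50
# and guarantees value 0 regardless of the column player's choice.
p, v = security_strategy_2x2([[1, -1], [-1, 1]])
```

For larger action sets, such as the 5×5 jammer/eNode B stage game described in Section II, the same security strategy is obtained by solving a small linear program rather than a closed form; the LP view is what the “approximated” formulations in this article build on.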
2) Jammer Types: The type θ ∈ Θ of the smart jammer is classified as:

• Type I: Cheater
• Type II: Saboteur

The type Cheater is used to model a jammer with the intent of getting more resources for itself as a result of reduced competition among UEs. Thus, a cheating UE is always present in the network with an active data session. On the other hand, the type Saboteur is used to model a jammer with the intent of causing the highest possible damage to the network resources. Thus, a sabotaging UE may be unattached to the network. It is to be noted here that the “Normal (inactive)” jammer type is not modeled here because the jammer is not present in that state and it is in the best interest of the network to play the default normal action in that case. The strategy algorithms presented in this article are invoked when a jammer is present in the network and/or when jamming is sensed by the network using the “jamming sense” part of the algorithm presented in [9]. It is to be noted here that the absence of jamming does not imply the jammer's absence, as an active smart jammer may also decide to play the “inactive” action as part of its strategy during an attack.

3) Strategies: Both the network and the jammer are modeled as rational and strategic. By definition, a pure strategy of a player is a mapping from each non-terminal history to a pure action, and a mixed strategy is a probability measure Δ over the set of its pure strategies. A behavioral strategy specifies a probability measure Δ over its available actions at each stage when an action needs to be taken [12]. Also, the best response (BR) is the strategy (or strategies) that produces the most favorable outcome for a player given other players' strategies [24]. Two types of suboptimal security strategies for the infinite-horizon game are presented in this article, as discussed in Section V.

4) Information Partitions: The jammer is informed of its own type θ.
However, the eNode B is only informed of the prior probability distribution p_0 ∈ ∆(Θ). This results in a game with asymmetric information, with the lack of information on the network side making the eNode B the uninformed player.

5) Observable Signals: For the "approximated" strategy computation, it is assumed that the players can observe each other's actions with certainty after each stage, i.e., full monitoring requirements are satisfied. This is a widely used assumption in both classic and modern game-theoretic literature (e.g., Chapter 6 of [11]). The network can distinguish between the smart jammer's different actions at high SNR and can make reasonable estimates at low SNR. However, the imperfect monitoring case is beyond the scope of the "approximated" formulation presented in this article. On the other hand, the "expected" strategy formulation for repeated games does not require full monitoring.

6) Utilities: Both players' utility functions are based on their key performance indicators (KPIs) and are defined to reflect a strictly competitive (zero-sum) setting, i.e., one player's gain is the other player's loss:

$$U_0 = -U_j \quad (8)$$

When the system state is Cheater, the zero-sum utility function simplifies to

$$U_j^c(a_j, a_0) = \alpha_c^R\, \mathbb{E}_w\big[\delta(R_c^{\mathrm{norm}})(a_j, a_0)\big] - \alpha_c^N\, \mathbb{E}_w\big[N_c^{\mathrm{norm}}(a_j, a_0)\big] \quad (9)$$

where δ(R_c^norm) represents the change in the Cheater's normalized average throughput from the baseline scenario, α_c^R its corresponding weight, N_c^norm the normalized average number of Connected-mode UEs in the network when the Cheater is present, α_c^N its corresponding weight, and E_w the expectation with respect to the randomness caused by w, as mentioned in (7).
The Cheater tries to maximize (9) in order to increase its throughput relative to the baseline scenario and to reduce the number of Connected-mode UEs in the network, which at the same time reduces the competition for limited network resources. The eNode B, on the other hand, tries to minimize (9) to achieve the opposite, hence creating a proper zero-sum game. Similarly, the zero-sum utility function when the system state is Saboteur is defined in (10):

$$U_j^s(a_j, a_0) = -\alpha_s^N\, \mathbb{E}_w\big[N_s^{\mathrm{norm}}(a_j, a_0)\big] - \alpha_{\mathrm{eNB}}^R\, \mathbb{E}_w\big[R_{\mathrm{eNB}}^{\mathrm{norm}}(a_j, a_0)\big] \quad (10)$$

where N_s^norm represents the normalized average number of Connected-mode UEs in the network when the Saboteur is present, α_s^N its corresponding weight, R_eNB^norm the eNode B's normalized average throughput per UE, α_eNB^R its corresponding weight, and E_w, again, the expectation with respect to the randomness caused by w, as mentioned above. The Saboteur tries to maximize the negative of the eNode B's utility, defined in terms of the average number of Connected-mode users and the average throughput per UE, hence defining the zero-sum game.

Note that there are no "unilateral" fixed costs associated with either player in the above zero-sum construction. This means that the game is played without modeling higher-"fidelity" parameters such as the players' duty cycles and the implicit cost associated with eNode B actions like 'f_c Change' and 'Timing Change'. However, this "fidelity" loss does not affect the inherent nature of the smart jammer and network interaction and, hence, can be discounted. Note also that the utility functions differ across jammer types, which is a common phenomenon in Bayesian games. Furthermore, the key performance indicators (KPIs) are functions of observable parameters only; for example, the eNode B's utility is a function of parameters observed from Connected-mode UEs.
Interestingly, neither player needs to compute its utility explicitly, as it is not used to make strategy decisions in repeated games, as discussed later in Section V. The payoffs are received by both players as a result of the interaction between the smart jammer and the network. Even though the network does not know the jammer type, it can compute the expected utility for minimization based on the prior (and updated belief) probability of a specific jammer type's presence.

7) Game Play: At the beginning of the game, nature flips a coin and selects θ ∈ Θ (the jammer type) according to p_0 ∈ ∆(Θ), which remains fixed for the rest of the game. The jammer is informed of its selected type but the eNode B is not. However, in a repeated game, the eNode B's history of interaction with the jammer evolves with time, which may affect its belief about θ.

IV. SINGLE-SHOT GAME

The single-shot game is played between the smart jammer as the maximizer (row player) and the network as the minimizer (column player). The maxmin value for the row player for a given state θ is denoted by $\underline{v}$, whereas the minmax value for the column player is denoted by $\overline{v}$. It is well known that $\underline{v} \le \overline{v}$ always holds. When $\underline{v} = \overline{v}$, the game is said to have a value $v = \underline{v} = \overline{v}$. Von Neumann's Minimax Theorem states that any matrix game has a value v in mixed strategies and that the players have optimal strategies [14], i.e., the minmax solution of a zero-sum game coincides with the Nash equilibrium. Both players play their security strategies in a zero-sum game to guarantee the best outcome under the worst conditions, owing to the game's strictly competitive (zero-sum) nature. The single-shot game simulation results are obtained from a Monte-Carlo simulation of LTE network and smart jammer dynamics, as discussed in Section III.
The following parameters are used for our simulations:
• Carrier-to-jammer power ratio: C/J = 0 dB,
• Probability of jamming: p_j = 1.0,
• Weight of the number of Connected UEs for Type I: α_c^N = 4,
• Weight of the number of Connected UEs for Type II: α_s^N = 5,
• Weight of average throughput for Type I: α_c^R = 5,
• Weight of average throughput for Type II: α_eNB^R = 4.
The following simulation results are obtained for the single-shot game when the jammer type is Cheater. The (1,1) entry of each utility matrix represents the baseline scenario, in which no jammer is active in the network and the network plays its default normal action.

$$U_j^c = \begin{bmatrix} -1.0000 & -1.0239 & -2.2464 & -1.3840 & -1.0000 \\ -0.9642 & -1.0029 & -2.2130 & -1.3398 & -0.9642 \\ -0.8016 & -0.8239 & -2.0553 & -1.1366 & -0.8016 \\ -0.9714 & -1.0078 & -2.2212 & -1.3525 & -0.9714 \\ -0.8181 & -0.8399 & -2.0716 & -1.1610 & -0.8181 \end{bmatrix}$$

Similarly, the simulation results for the single-shot game when the jammer type is Saboteur are presented below.

$$U_j^s = \begin{bmatrix} -1.0000 & -0.9933 & -0.5635 & -0.9128 & -1.0000 \\ -0.9879 & -0.9805 & -0.5446 & -0.9022 & -0.9898 \\ -0.9905 & -0.9805 & -0.4578 & -0.8849 & -0.9867 \\ -0.9900 & -0.9827 & -0.5498 & -0.9050 & -0.9919 \\ -0.9895 & -0.9800 & -0.4666 & -0.8880 & -0.9875 \end{bmatrix}$$

For the complete-information case in which the network is aware of the jammer type Cheater, the game has a single pure-strategy Nash Equilibrium, (a_j^*, a_0^*) = ('Jam CS-RS + PUCCH', 'Throttling'), with the game value v = -2.0553, satisfying

$$U_j^c(a_j^*, a_0^*) = \min_{a_0 \in A_0} U_j^c(a_j^*, a_0) = \max_{a_j \in A_j} U_j^c(a_j, a_0^*)$$

For the complete-information case in which the network is aware of the jammer type Saboteur, the game does not have any pure-strategy Nash Equilibrium.
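The saddle-point condition above can be checked mechanically. The following sketch (with the Cheater utilities reported above; `pure_saddle_points` is a helper name chosen here for illustration, not from the paper) finds every entry that is simultaneously its row's minimum and its column's maximum:

```python
# Cheater utility matrix U_j^c from the simulation above (rows: jammer actions,
# columns: eNode B actions; the jammer maximizes, the eNode B minimizes).
U_c = [
    [-1.0000, -1.0239, -2.2464, -1.3840, -1.0000],
    [-0.9642, -1.0029, -2.2130, -1.3398, -0.9642],
    [-0.8016, -0.8239, -2.0553, -1.1366, -0.8016],
    [-0.9714, -1.0078, -2.2212, -1.3525, -0.9714],
    [-0.8181, -0.8399, -2.0716, -1.1610, -0.8181],
]

def pure_saddle_points(U):
    """Find entries that are simultaneously the minimum of their row and the
    maximum of their column: exactly the pure-strategy equilibria of a
    zero-sum matrix game with a maximizing row player."""
    return [
        (i, j)
        for i, row in enumerate(U)
        for j, u in enumerate(row)
        if u == min(row) and u == max(r[j] for r in U)
    ]

print(pure_saddle_points(U_c))  # -> [(2, 2)]
```

Running it returns `[(2, 2)]`, the 0-indexed position of ('Jam CS-RS + PUCCH', 'Throttling') with value -2.0553, confirming the unique pure-strategy equilibrium stated above.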
If the players are allowed to use mixed strategies, i.e., probability distributions over their action sets, then there exists a mixed-strategy Nash Equilibrium (x^*, y^*), where x^* = [0 0.51 0 0 0.49]^T ∈ ∆(A_j) and y^* = [0.59 0 0 0 0.41] ∈ ∆(A_0), with the game value v = -0.9887, satisfying the equation below. This mixed-strategy probability distribution loosely translates to playing ('Jam CS-RS', 'Jam CS-RS + PCFICH + PUCCH + PRACH') and ('Normal', 'Timing Change') roughly equally likely by the jammer and the eNode B, respectively.

$$\mathbb{E}_{x^*,y^*}\big[U_j^s(a_j, a_0)\big] = \min_{y \in \Delta(A_0)} \mathbb{E}_{x^*,y}\big[U_j^s(a_j, a_0)\big] = \max_{x \in \Delta(A_j)} \mathbb{E}_{x,y^*}\big[U_j^s(a_j, a_0)\big]$$

where $\mathbb{E}_{x,y}\big[U_j^s(a_j, a_0)\big] = x^T U_j^s\, y$ is the expected value of the single-stage utility given mixed strategies x and y. Given the utility matrix, linear programming is used to compute the Nash Equilibrium [12], i.e., x^*, y^*, and the game value v.

However, in the asymmetric-information case, the eNode B knows only the probability distribution p_0 over the jammer's types, which is public information, while the jammer knows its own type exactly. Knowing its own type, the jammer can use a different strategy for each state θ. Therefore, in the asymmetric game, the jammer's mixed strategy x is a mapping from Θ to ∆(A_j). The single-shot asymmetric game still has a mixed-strategy Nash Equilibrium (x^*, y^*), where x^* ∈ ∆(A_j)^{|Θ|} and y^* ∈ ∆(A_0) satisfy

$$\mathbb{E}_{p_0,x^*,y^*}\big[U_j^\theta(a_j, a_0)\big] = \min_{y \in \Delta(A_0)} \mathbb{E}_{p_0,x^*,y}\big[U_j^\theta(a_j, a_0)\big] = \max_{x \in \Delta(A_j)^{|\Theta|}} \mathbb{E}_{p_0,x,y^*}\big[U_j^\theta(a_j, a_0)\big]$$

where $\mathbb{E}_{p_0,x,y}\big[U_j^\theta(a_j, a_0)\big] = \sum_{\theta \in \Theta} p_0^\theta\, x^{\theta T} U_j^\theta\, y$ is the expected value of the single-stage utility given the initial probability p_0 and mixed strategies x and y. Note that the utility functions are common knowledge.
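The LP used for this computation can be sketched as follows, assuming NumPy and SciPy are available; the formulation (maximize v subject to x^T U ≥ v·1^T) is the textbook LP for zero-sum matrix games, not code from the paper:

```python
import numpy as np
from scipy.optimize import linprog

# Saboteur utility matrix U_j^s from the simulation above.
U_s = np.array([
    [-1.0000, -0.9933, -0.5635, -0.9128, -1.0000],
    [-0.9879, -0.9805, -0.5446, -0.9022, -0.9898],
    [-0.9905, -0.9805, -0.4578, -0.8849, -0.9867],
    [-0.9900, -0.9827, -0.5498, -0.9050, -0.9919],
    [-0.9895, -0.9800, -0.4666, -0.8880, -0.9875],
])

def solve_zero_sum(U):
    """Maximizer's optimal mixed strategy and game value via the standard LP:
    maximize v subject to x^T U >= v 1^T, 1^T x = 1, x >= 0."""
    m, n = U.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # linprog minimizes, so minimize -v
    A_ub = np.hstack([-U.T, np.ones((n, 1))])  # v - sum_i x_i U[i, j] <= 0, each j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

x_star, v = solve_zero_sum(U_s)
print(np.round(x_star, 2), round(v, 4))  # support on rows 2 and 5, v ≈ -0.9887
```

Applying the same routine to the transposed (negated) matrix yields the column player's strategy y^*.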
Although the eNode B is unaware of the jammer type, it knows that the utility function is either U_j^c or U_j^s, given that the jammer is present in the network. For a given prior probability p_0, the eNode B can evaluate its expected utility $\mathbb{E}_{p_0,x,y}\big[U_j^\theta(a_j, a_0)\big]$, whose minmax value (i.e., the game value) is a function of the prior probability p_0. Since p_0 is fixed, the game value also remains fixed. The Nash Equilibrium for the asymmetric-information game can be computed by solving an LP with the time horizon set to a single stage [37].

V. INFINITE-HORIZON ASYMMETRIC REPEATED GAME STRATEGY ALGORITHMS

The repetition of a zero-sum game in its basic form does not warrant further study, as the players can play their optimal security strategies i.i.d. at each stage to guarantee the optimal game value [14]. However, in repeated asymmetric games, playing the optimal strategy of the single-stage asymmetric game i.i.d. at each stage does not guarantee the player the optimal game value [14]. Therefore, the repeated game needs to be studied further.

It is assumed that both players' actions are publicly known at the end of each stage. The jammer's action history at stage t ≥ 1 is H_t^j = {a_1^j, a_2^j, ..., a_{t-1}^j}, and ℋ_t^j denotes the set of all possible action histories of the jammer at stage t. Similarly, the eNode B's action history at stage t ≥ 1 is H_t^0 = {a_1^0, a_2^0, ..., a_{t-1}^0}, and the set of all possible action histories of the eNode B at stage t is denoted by ℋ_t^0. Since the optimal strategies of both players do not depend on the action history of the uninformed player (the eNode B) [14], the behavior strategy σ_t of the jammer at stage t is defined as a mapping from Θ × ℋ_t^j to ∆(A_j), and the behavior strategy τ_t of the eNode B at stage t is a mapping from ℋ_t^j to ∆(A_0).
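The expected-utility evaluation the eNode B performs, $\mathbb{E}_{p_0,x,y}[U_j^\theta] = \sum_\theta p_0^\theta\, x^{\theta T} U^\theta y$, is a straightforward weighted sum. A minimal sketch with hypothetical two-type, two-action numbers (not taken from the paper):

```python
def expected_utility(p0, x, y, U):
    """E_{p0,x,y}[U^theta] = sum_theta p0[theta] * x[theta]^T U[theta] y:
    the expected single-stage utility the eNode B can evaluate knowing only
    the prior p0 and the (type-dependent) mixed strategies."""
    total = 0.0
    for theta, p_theta in enumerate(p0):
        stage = sum(
            x[theta][i] * U[theta][i][j] * y[j]
            for i in range(len(x[theta]))
            for j in range(len(y))
        )
        total += p_theta * stage
    return total

# Illustrative 2-type, 2-action example (numbers are hypothetical).
U = [[[1.0, -1.0], [-1.0, 1.0]],   # U^1
     [[0.5, 0.0], [0.0, 0.5]]]     # U^2
p0 = [0.6, 0.4]
x = [[1.0, 0.0], [0.0, 1.0]]       # the jammer's strategy may differ per type
y = [0.5, 0.5]
print(expected_utility(p0, x, y, U))  # -> 0.1
```

The key point the sketch makes explicit is that x is indexed by the type θ: the informed player can condition on θ, while the eNode B can only average over it with p_0.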
The behavior strategies of the jammer and the eNode B are denoted by σ = (σ_t)_{t=1}^∞ and τ = (τ_t)_{t=1}^∞, and the sets of all possible behavior strategies are denoted by Σ and 𝒯, respectively.

This article considers a λ-discounted utility function as the overall utility function in the infinite-horizon game. The λ-discounted utility function is commonly used in both classic and modern game-theoretic literature (cf. [11], [13]–[15], [18]). The main reason for selecting the discounted utility formulation is its guarantee of convergence in infinite-horizon games. If the average utility formulation is selected, then the overall payoff is taken as a limit, which may or may not exist. Also, uniformity conditions are required for equilibrium in average utility formulations [11]. It is also shown in [14] that as λ → 0, the game value of a discounted infinite-horizon game converges to that of an average-reward game. Note that λ is merely a mathematical constant dictating the discount factor and does not represent any physical quantity such as received signal strength or SNR. The λ-discounted utility function is defined as follows:

$$u_\lambda(p_0, \sigma, \tau) = \mathbb{E}_{p_0,\sigma,\tau}\left[\sum_{t=1}^{\infty} \lambda(1-\lambda)^{t-1}\, U(\theta, a_t^j, a_t^0)\right] \quad (11)$$

Discounted utility captures the idea that players often focus more on the current reward and apply a discount factor of (1 - λ) to future rewards. Based on the discounted utility function, the jammer has a security level $\underline{V}(p_0) = \max_{\sigma \in \Sigma} \min_{\tau \in \mathcal{T}} u_\lambda(p_0, \sigma, \tau)$, which is the maximum utility it can get in the game if the eNode B always plays the best-response strategy. The strategy σ^* that guarantees this value no matter what strategy the eNode B plays is called the jammer's security strategy. Similarly, the eNode B's security level is defined as $\overline{V}(p_0) = \min_{\tau \in \mathcal{T}} \max_{\sigma \in \Sigma} u_\lambda(p_0, \sigma, \tau)$, and the strategy τ^* that guarantees this security level no matter what strategy the jammer plays is called the eNode B's security strategy.
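A quick numeric illustration of the stage weights λ(1−λ)^{t−1} in (11), a sketch with the λ = 0.90 value used later in the simulations:

```python
lam = 0.90  # discount factor used in the simulations of Section VI

# Stage weights lambda * (1 - lambda)^(t-1) of the discounted utility (11).
weights = [lam * (1 - lam) ** (t - 1) for t in range(1, 31)]

# The weights form a geometric series summing to 1, so u_lambda is a weighted
# average of stage payoffs dominated by the earliest stages.
print(round(sum(weights), 6))
print([round(w, 4) for w in weights[:4]])  # 0.9, 0.09, 0.009, 0.0009
```

At λ = 0.90, the first stage alone carries 90% of the total weight, which is consistent with the utilities observed later to stabilize within the first few stages of the repeated game.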
If the security levels of both players are the same, which is true in our case, the game is said to have a value V_λ(p_0), and there exists a Nash Equilibrium. This article is concerned with the security strategies of both players. However, in our case, the system has multiple states and the game is played with a lack of information on one side. Li et al. showed that the security strategies of both players in finite-horizon asymmetric-information repeated zero-sum games depend only on the informed player's action history [19]. For infinite-horizon games, this would imply using a large amount of memory to record the action history. It is, therefore, necessary for the players to find fixed-size sufficient statistics for decision making in λ-discounted infinite-horizon games. However, it is still non-trivial to compute optimal security strategies even with fixed-size sufficient statistics. Therefore, Li & Shamma provided approximated security strategies with guaranteed performance to solve infinite-horizon games [20].

A. The Smart Jammer's Approximated Security Strategy Algorithm

Since the jammer's behavior strategy depends on its type, the type of the jammer may be revealed through its action history. The revelation is characterized by the conditional probability p_t over Θ, conditioned on the jammer's action history H_t^j, which is updated as follows:

$$p_{t+1}^\theta(H_{t+1}^j) = \pi(p_t, x_t, a_t^j) = \frac{p_t^\theta(H_t^j)\, x_t^\theta(a_t^j)}{\bar{x}_{p_t,x_t}(a_t^j)} \quad (12)$$

with p_1 = p_0, where $\bar{x}_{p_t,x_t}(a_t^j) = \sum_{\theta \in \Theta} p_t^\theta(H_t^j)\, x_t^\theta(a_t^j)$ is the weighted average of x_t. The conditional probability p_t is also called the belief state, which is the sufficient statistic for the informed player (the jammer) to make a decision at stage t.
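The belief update (12) is a one-line Bayes rule. The sketch below uses hypothetical two-type numbers to show both a non-revealing move (type-independent strategy, belief unchanged) and a revealing one:

```python
def update_belief(p, x, a_j):
    """Bayesian belief update (12): p_{t+1}[theta] is proportional to
    p_t[theta] * x_t[theta](a_j), normalized by the weighted average
    x_bar(a_j) = sum_theta p_t[theta] * x_t[theta](a_j)."""
    x_bar = sum(p[theta] * x[theta][a_j] for theta in range(len(p)))
    return [p[theta] * x[theta][a_j] / x_bar for theta in range(len(p))]

# Illustrative numbers (hypothetical): two types, two jammer actions.
p_t = [0.5, 0.5]
x_same = [[0.8, 0.2],   # type-1 strategy
          [0.8, 0.2]]   # type-2 strategy: identical, i.e. non-revealing
print(update_belief(p_t, x_same, 0))  # belief unchanged

x_diff = [[0.9, 0.1],
          [0.3, 0.7]]   # type-dependent strategy: action 0 carries information
print(update_belief(p_t, x_diff, 0))  # -> approximately [0.75, 0.25]
```

The first case is exactly the "non-revealing" play discussed later: when the jammer's mixed move does not depend on θ, observing the action leaves p_t untouched, so full monitoring teaches the network nothing.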
It was shown in [14] that the informed player has a stationary security strategy that depends only on p_t, and that the game value V_λ(p_0) satisfies the following recursive equation:

$$V_\lambda(p_0) = \max_{x \in \Delta(A_j)^{|\Theta|}} \min_{y \in \Delta(A_0)} \Big[\lambda \sum_{\theta \in \Theta} p_0^\theta\, x^{\theta T} U^\theta y + (1-\lambda)\, T_{p_0,x}(V_\lambda)\Big] \quad (13)$$

where x^θ represents the jammer's behavioral strategy given state θ, y represents the eNode B's behavioral strategy, and $T_{p,x}(V_\lambda) = \sum_{a_j \in A_j} \bar{x}_{p,x}(a_j)\, V_\lambda(\pi(p, x, a_j))$. Although the game value V_λ(p_t) from a stage t, ∀t ≠ 1, to the end of the game changes as p_t evolves with time, the game value V_λ(p_0) from the first stage t = 1 to the end of the game remains the same, as p_0 is fixed.

Computing V_λ(p) and the corresponding security strategies is non-convex [23]; therefore, an approximated strategy is proposed. The basic idea is to use the game value V_{λ,T}(p) of a T-stage λ-discounted asymmetric repeated game to approximate the game value V_λ(p), and to compute the approximated security strategy based on the approximated game value V_{λ,T}(p). Define the jammer's stationary behavior strategy as $\bar{\sigma} : \Theta \times \Delta(\Theta) \to \Delta(A_j)$. The approximated stationary security strategy is

$$\bar{\sigma}(:, p) = \arg\max_{x \in \Delta(A_j)^{|\Theta|}} \min_{y \in \Delta(A_0)} \Big[\lambda \sum_{\theta \in \Theta} p^\theta\, x^{\theta T} U^\theta y + (1-\lambda)\, T_{p,x}(V_{\lambda,T})\Big] \quad (14)$$

where $\bar{\sigma}(:, p)$ is an |A_j| × |Θ| matrix whose θth column is $\bar{\sigma}_{\lambda,T}(\theta, p)$.

Furthermore, Li & Shamma constructed a linear program to compute the approximated game value V_{λ,T+1}(p) and the corresponding approximated security strategy $\bar{\sigma}(\theta, p)$. It was shown that V_{λ,T+1}(p) satisfies the following linear program in the λ-discounted zero-sum asymmetric game Γ_λ(p):

$$V_{\lambda,T+1}(p) = \max_{q,l} \sum_{t=1}^{T+1} \sum_{H_t^j \in \mathcal{H}_t^j} \lambda(1-\lambda)^{t-1}\, l_{H_t^j} \quad (15)$$

s.t.
$$\sum_{\theta \in \Theta,\, a_j \in A_j} q_{t+1}\big(\theta, (H_t^j, a_j)\big)\, U_{a_j,:}^\theta \ge l_{H_t^j} \mathbf{1}^T, \quad \forall t = 1, 2, \ldots, T+1,\ H_t^j \in \mathcal{H}_t^j, \quad (16), (17)$$

$$q_1(H_1^j; \theta) = 1, \quad \forall \theta \in \Theta, \quad (18)$$

$$\sum_{a_t^j \in A_j} q_{t+1}\big((H_t^j, a_t^j); \theta\big) = q_t(H_t^j; \theta), \quad \forall \theta \in \Theta,\ H_t^j \in \mathcal{H}_t^j,\ \forall t = 1, \ldots, T, \quad (19)$$

$$q_t(H_t^j; \theta) \ge 0, \quad \forall \theta \in \Theta,\ H_t^j \in \mathcal{H}_t^j,\ \forall t = 2, \ldots, T+1, \quad (20)$$

where q ∈ Q, a set of properly dimensioned real vectors, l lies in a properly dimensioned real space L, and (H_t^j, a_t^j) denotes concatenation. The approximated security strategy is

$$\bar{\sigma}_{a_j}(\theta, p) = q_2^*(a_j; \theta), \quad \forall a_j \in A_j \quad (21)$$

where q_2^* is part of the optimal solution of the linear program (15)-(20). The interested reader is referred to [20] for further details.

1) The Algorithm: The LP-based algorithm for the informed player to compute its approximated security strategy and update its belief state in the λ-discounted asymmetric repeated game is as follows [20]:
1) Initialization:
   a) Read the payoff matrices U, prior probability p_0, and system state θ.
   b) Set the receding horizon length T.
   c) Let t = 1 and p_1 = p_0.
2) Compute the informed player's approximated security strategy σ̄_{λ,T} based on (21) with p = p_t.
3) Choose an action a_j ∈ A_j according to the probability σ̄_{λ,T}(θ, p_t), and announce it publicly.
4) Update the belief state p_{t+1} according to (12).
5) Update t = t + 1 and go to Step 2.

B. The eNode B's Approximated Security Strategy Algorithm

The uninformed player does not have access to the informed player's strategy or belief state p_t; therefore, p_t cannot serve as its sufficient statistic. The sufficient statistic of the uninformed player (the eNode B) was shown to be the anti-discounted regret, which is explained below.
The regret μ_t^θ(H_t^j) in state θ is defined as the difference between the expected realized utility so far and the security level of the eNode B's security strategy, given state θ, i.e.,

$$\mu_t^\theta(H_t^j) = \mathbb{E}_\tau\left(\sum_{s=1}^{t-1} \lambda(1-\lambda)^{s-1}\, U_j(a_s^j, a_s^0) \,\Big|\, \theta, H_t^j\right) - \mu^{\theta *}$$

where

$$\mu^{\theta *} = \max_{\sigma(\theta) \in \Sigma(\theta)} \mathbb{E}_{\sigma(\theta),\tau^*}\left(\sum_{s=1}^{\infty} \lambda(1-\lambda)^{s-1}\, U_j(a_s^j, a_s^0) \,\Big|\, \theta\right),$$

τ^* is the eNode B's security strategy, σ(θ) denotes the jammer's behavior strategy given θ ∈ Θ, and Σ(θ) is the corresponding set of all σ(θ). The anti-discounted regret is defined as $w_t^\theta(H_t^j) = \mu_t^\theta(H_t^j)/(1-\lambda)^{t-1}$, ∀θ ∈ Θ, and is updated according to

$$w_{t+1}^\theta(H_t^j, a_t^j) = \frac{w_t^\theta(H_t^j) + \lambda\, U^\theta(a_t^j, :)\, \tau(H_t^j)}{1-\lambda}, \quad \forall \theta \in \Theta, \quad (22)$$

where U^θ(a_t^j, :) is the (a_t^j)th row of the matrix U^θ.

Computing the security level μ^* of the eNode B's security strategy is non-convex [20]. Therefore, an approximated security level μ^{θ⋆} is used, which is the security level of the eNode B's security strategy given state θ in the T-stage λ-discounted asymmetric repeated game. The approximated security level μ^{θ⋆} is computed according to the following linear program:

$$\min_{y \in Y,\, l \in \mathbb{R}^{|\Theta|}} \sum_{\theta \in \Theta} p^\theta l^\theta \quad (23)$$

s.t.

$$\sum_{t=1}^{T} \lambda(1-\lambda)^{t-1}\, U^\theta(a_t^j, :)\, y_{H_t^j} \le l^\theta, \quad \forall \theta \in \Theta,\ \forall H_T^j \in \mathcal{H}_T^j,\ a_t^j \in A_j, \quad (24)$$

$$\mathbf{1}^T y_{H_t^j} = 1, \quad \forall H_t^j \in \mathcal{H}_t^j,\ \forall t = 1, \ldots, T, \quad (25)$$

$$y_{H_t^j} \ge 0, \quad \forall H_t^j \in \mathcal{H}_t^j,\ \forall t = 1, \ldots, T \quad (26)$$

where Y is a properly dimensioned real space. The approximated security level is μ^{θ⋆} = l^*, where l^* is the optimal solution of the LP problem (23)-(26). The eNode B has a stationary security strategy that depends only on the anti-discounted regret w_t [20]. Define the eNode B's stationary behavior strategy as $\bar{\tau} : \mathbb{R}^{|\Theta|} \to \Delta(A_0)$. Computing the stationary security strategy of the eNode B is non-convex [20].
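Per state θ, the anti-discounted regret recursion (22) is a scalar update. A minimal sketch with a hypothetical 2×2 stage matrix and a uniform eNode B mixture; the form of the update follows the recursion in (22), and the initialization w_1 = −μ* follows the algorithm of [20]:

```python
def update_regret(w_theta, U_theta, a_j, tau, lam):
    """One step of the anti-discounted regret recursion (22) for a single
    state theta: w_{t+1} = (w_t + lam * U^theta(a_j, :) . tau) / (1 - lam),
    where tau is the eNode B's mixed action at this stage."""
    expected_payoff = sum(U_theta[a_j][k] * tau[k] for k in range(len(tau)))
    return (w_theta + lam * expected_payoff) / (1 - lam)

# Hypothetical 2x2 stage game for illustration (not from the paper).
U_theta = [[1.0, -1.0],
           [-0.5, 0.5]]
w = -2.0           # initialized to -mu^{theta*} at t = 1
tau = [0.5, 0.5]   # eNode B mixes uniformly at this stage
w = update_regret(w, U_theta, a_j=0, tau=tau, lam=0.9)
print(w)           # (-2.0 + 0.9 * 0.0) / 0.1, i.e. about -20
```

The division by (1 − λ) is what makes the statistic "anti-discounted": it inflates the regret at the same geometric rate at which the stage weights decay, keeping it a fixed-size summary of the whole history.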
Therefore, an approximated stationary security strategy τ̄(w) for the eNode B is proposed in [20], which can be computed by solving the following LP problem:

$$\min_{y \in Y,\, l \in \mathbb{R}^{|\Theta|},\, L \in \mathbb{R}} L \quad (27)$$

s.t.

$$w + l \le L\mathbf{1}, \quad (28)$$

$$\sum_{t=1}^{T+1} \lambda(1-\lambda)^{t-1}\, U^\theta(a_j, :)\, y_{H_t^j} \le l^\theta, \quad \forall \theta \in \Theta,\ \forall H_{T+1}^j \in \mathcal{H}_{T+1}^j,\ a_j \in A_j, \quad (29)$$

$$\mathbf{1}^T y_{H_t^j} = 1, \quad \forall H_t^j \in \mathcal{H}_t^j,\ \forall t = 1, \ldots, T+1, \quad (30)$$

$$y_{H_t^j} \ge 0, \quad \forall H_t^j \in \mathcal{H}_t^j,\ \forall t = 1, \ldots, T+1 \quad (31)$$

where Y is a properly dimensioned real space. The uninformed player's approximated security strategy τ̄(w) is $y^*_{H_1^j}$. The interested reader is referred to [20] for further details.

1) The Algorithm: The LP-based algorithm for the uninformed player to compute its approximated security strategy in the λ-discounted asymmetric repeated game Γ_λ(p_0) is as follows [20]:
1) Initialization:
   a) Read the payoff matrices U and prior probability p_0.
   b) Set the receding horizon length T.
   c) Solve the LP problem (23)-(26) with p = p_0 and let μ^* = l^*.
   d) Let t = 1 and w_1 = -μ^*.
2) Solve the LP problem (27)-(31) with w = w_t; the uninformed player's approximated security strategy τ̄(w_t) is $y^*_{H_1^j}$.
3) Choose an action a_0 ∈ A_0 according to the probability τ̄(w_t), and announce it publicly.
4) Read the informed player's action, and update the anti-discounted regret w_{t+1} according to (22).
5) Update t = t + 1 and go to Step 2.

Note that jamming sense is still required by the network, i.e., the network first has to decide whether or not it is under a jamming attack in order to invoke the above algorithm.

C. The eNode B's Expected Security Strategy Algorithm

The "expected strategy" algorithm for the eNode B is defined as a mixture over its complete-information single-shot game security strategies σ_1, weighted by the prior p_0.
In other words, the eNode B plays the complete-information single-shot security strategies σ_1|_{θ=1} and σ_1|_{θ=2} with probabilities p_0^1 and p_0^2, respectively. Since the prior is common knowledge, this relieves the eNode B of "learning" and full monitoring in a repeated game. Thus, the eNode B essentially plays a single-shot strategy in a repeated game but without the requirements of full monitoring, which may not be such a bad idea if the jammer plays "non-revealing" strategies. Furthermore, the network does not need to observe the jammer's action with certainty, which leads to more practical implementations. Both discounted and average payoff formulations can be used with this algorithm. Note that the expected security strategy algorithm is novel and not based on prior work.

In the next section, the approximated security strategy and expected security strategy algorithms are used to design strategies for both the smart jammer and the LTE network. The λ-discounted cost formulation is used for both algorithms in the infinite-horizon game.

VI. PERFORMANCE ANALYSIS OF REPEATED GAME STRATEGY ALGORITHMS

The zero-sum game-theoretic algorithms presented earlier are used to devise "approximated" strategy formulations for both players in a λ-discounted utility sense. However, these algorithms require full monitoring, i.e., the network has to observe the jammer's action at every stage with certainty. Therefore, the "expected" formulation is devised, in which the network, being the uninformed player, simply plays its single-shot best response in an expected sense, i.e., it plays its single-shot Best Response (BR) with the same probability distribution as the prior probability (which is common knowledge) of the jammer's occurrence.
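The expected-strategy construction above can be sketched in a few lines; the action names and single-shot strategy vectors below are hypothetical placeholders, not the strategies computed in the paper:

```python
import random

def expected_strategy_action(p0, sigma1_by_type, actions, rng=random):
    """The 'expected strategy': mix the complete-information single-shot
    security strategies sigma_1|theta with the prior p0, then sample an
    action. No monitoring of the jammer is needed; the mixture is fixed
    by p0 and never updated."""
    mixed = [
        sum(p0[theta] * sigma1_by_type[theta][k] for theta in range(len(p0)))
        for k in range(len(actions))
    ]
    return rng.choices(actions, weights=mixed, k=1)[0], mixed

# Hypothetical single-shot security strategies over three eNode B actions.
actions = ["Normal", "Throttling", "Change Timing"]
sigma1 = [[0.0, 1.0, 0.0],   # sigma_1 | theta = 1 (vs. Cheater)
          [0.6, 0.0, 0.4]]   # sigma_1 | theta = 2 (vs. Saboteur)
p0 = [0.3, 0.7]
action, mixed = expected_strategy_action(p0, sigma1, actions)
print(mixed)   # approximately [0.42, 0.30, 0.28]
```

Because `mixed` depends only on p_0, the same distribution is sampled i.i.d. at every stage, which is exactly why this algorithm needs no observation of the jammer's play.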
This enables the network to relax the full monitoring requirement, i.e., the network does not have to observe the jammer's action with certainty, which leads to more practical implementations. The performance of both the "approximated" and "expected" algorithms under the discounted utility formulation is characterized in the following section. However, not all of the simulation results can be shared here due to space constraints. The following parameters were used for both players in the repeated game simulations (in addition to those of the single-shot case): discount factor λ = 0.90 and receding horizon length T = 4. Note that the receding horizon length T = 4 is chosen for simulation efficiency; almost the same results are obtained at higher values of T.

A. eNode B vs. Cheater

1) Jammer Strategy: When the Cheater (θ = 1) is in the network, it always uses its "approximated" algorithm to devise its repeated game strategy against the network. Also, being the informed player, there is no ambiguity about the system state, so the Cheater can decide to reveal only as much of its superior information as suits it. The Cheater's steady-state belief state p_t and repeated game strategy vs. prior probability are shown in Figs. 1 and 2, respectively, where p^1 and p^2 represent the updated belief (probability) about the states θ = 1 and θ = 2, respectively, and a_j^k represents the kth pure action of the Cheater. It is interesting to note that the Cheater always plays the same (pure) security strategy (a_j^3 = 'Jam CS-RS + PUCCH') that it uses for the single-shot game, independent of the prior probability. It is also interesting that the Cheater's strategies are non-revealing^4, even at a relatively low prior probability of its occurrence, whenever p_0^1 ≥ 0.25.
This means that the network does not "learn" anything new about the jammer type from the jammer's repeated actions, despite full monitoring, when p_0^1 ≥ 0.25, and the Cheater takes full advantage of its superior information. At a relatively low prior probability of the Cheater's occurrence (p_0^1 < 0.25), the jammer reveals very little information in the first stage, when the belief state gets updated to p = [0.25 0.75]^T, but the belief remains the same thereafter. For instance, Fig. 3 shows the evolution of the Cheater's belief state and its strategy at every stage when the prior probability is p_0^1 = 0.05. This puts the network at a disadvantage in the game if it plays as a Bayesian player, even when it can observe the jammer's actions perfectly at every stage.

^4 The informed player is said to play non-revealing at stage n when the posterior probabilities in (12) do not change at that stage, i.e., when its mixed move at stage n is independent of the state θ ∈ Θ for all values of θ for which p_n^θ > 0. When full monitoring is assumed, not revealing the information is equivalent to not using that information [15].

Fig. 1: Cheater's Steady-State Belief State vs. Prior p_0^1
Fig. 2: Cheater's Steady-State Approximated Security Strategy vs. Prior p_0^1
Fig. 3: Cheater's Belief & Strategy vs. Time when p_0^1 = 0.05

2) eNode B Strategies: The eNode B's steady-state "approximated" and "expected" security strategies vs. prior probability p_0^1 are plotted in Figs. 4 and 5, respectively, where a_0^k represents the kth pure action of the eNode B. The network's strategies (both "expected" and "approximated") evolve with varying prior probability levels, as it is the uninformed player. The "approximated" strategy relies on full monitoring and switches to a different strategy at p_0^1 ≥ 0.35, when it starts playing a_0^3 = 'Throttling' (its security strategy against the Cheater in the complete-information single-shot game) in addition to a_0^4 = 'Change f_c'. On the other hand, the "expected" algorithm does not rely on full monitoring and, hence, uses an expectation of its single-shot strategies, playing a mixed strategy over 'Normal', 'Throttling', and 'Change Timing'. The "expected" strategy is pre-computed based on the prior probability and does not change as the game proceeds, whereas the "approximated" algorithm converges in around 12 stages. The "expected" strategy algorithm may work well enough for the network, as the jammer's strategies are mostly non-revealing and the "approximated" algorithm requires full monitoring.

Fig. 4: eNode B's Steady-State Approximated Security Strategy against Cheater vs. Prior p_0^1
Fig. 5: eNode B's Expected Strategy vs. Prior p_0^1

3) eNode B's λ-discounted Utilities: A snapshot of both players' actions and the eNode B's utility at every stage is shown in Fig. 6 for λ = 0.90 and p_0^1 = 0.05. It is apparent that the eNode B's (and hence the Cheater's) utility stabilizes very quickly at the beginning of the game, a trend observed throughout the repeated game. The eNode B's "approximated" and "expected" λ-discounted utility values against the Cheater are plotted in Fig. 7 at different prior probability levels p_0^1. The "approximated security" algorithm performs almost optimally when p_0^1 ≥ 0.35, whereas the "expected" algorithm performs poorly compared to the "approximated" algorithm, with the exception of low prior values. The "approximated" algorithm uses full monitoring and the repeated game linear programming (LP) formulation to compute its strategy and, hence, performs much better than its counterpart.

Fig. 6: Players' Actions & Utility vs. Time when p_0^1 = 0.05
On the other hand, the "expected" algorithm relies only on the prior probability and does not observe the jammer's actions and, hence, ends up underperforming even when the jammer uses its single-shot security strategy. When the prior probability of the Cheater's occurrence is low (i.e., p_0^1 < 0.35), the eNode B's strategies fail to even come close to the complete-information single-shot value. This happens because it is rather unlikely for the Cheater to be present in the network at such low prior values, and the eNode B's strategy algorithms are not robust enough to address this problem.

Fig. 7: eNode B's Utility against Cheater vs. Prior p_0^1

B. eNode B vs. Saboteur

1) Jammer Strategy: Similar to the eNode B vs. Cheater game, the Saboteur's steady-state belief states p_t and "approximated security" strategies vs. the prior probability of its occurrence p_0^2 are shown in Figs. 8 and 9, respectively. It is very interesting to note that, being the informed player, the Saboteur plays non-revealing and "misleading" strategies even at prior probability values as high as p_0^2 = 0.75 (this value goes up to p_0^2 = 0.85 for λ = 0.70). It plays its type θ = 1 (Cheater) dominant security strategy (a_j^3 = 'Jam CS-RS + PUCCH') while actually being a type θ = 2 (Saboteur) jammer. For example, Fig. 10 shows the evolution of the Saboteur's belief state and its strategy at every stage when the prior probability is p_0^2 = 0.80. At high prior probability values of 0.75 < p_0^2 < 0.90, the jammer's belief state goes through a transition period because the network forces it to reveal its true identity by playing a_0^4 with certainty. Hence, the belief state eventually settles down to the completely revealing state p^2 = 1.
During the transition period, while the jammer's belief state converges to p^2 = 1, it plays its single-shot security strategy for state θ = 2, i.e., it plays a_j^2 = 'Jam CS-RS' and a_j^5 = 'Jam CS-RS + PUCCH + PCFICH + PRACH' with almost the same probability. At very high prior probability levels of p_0^2 ≥ 0.90, the state information (θ = 2) is completely revealed and the jammer plays its single-shot security strategy for state θ = 2, as mentioned above. Hence, the jammer uses its superior information to its complete advantage even when full monitoring is allowed. This is a good example of the strength of superior information and how it can be exploited in asymmetric games against an adversary.

2) eNode B Strategies: As in the repeated game against the Cheater, the eNode B adapts its repeated-game strategy against the Saboteur as the game proceeds. From the simulations, the eNode B's strategy seems to converge in 12 stages. The "expected" strategy is shown in Fig. 11 and is deployed as in the game against the Cheater. Since the "expected" strategy algorithm is oblivious to the actual jammer type and does not use full monitoring, its mixed strategy does not depend on the system state θ and is played solely based on the prior probability value. On the other hand, the "approximated" security strategy algorithm relies on the repeated game and full monitoring to adapt its strategy. The network's steady-state "approximated" strategy vs. the prior probability p_0^2 is plotted in Fig. 12.

[Fig. 8: Saboteur's Steady State Belief State vs. Prior p_0^2]
[Fig. 9: Saboteur's Steady State Approximated Security Strategy vs. Prior p_0^2]
[Fig. 10: Saboteur's Belief & Strategy vs. Time when p_0^2 = 0.80]
[Fig. 11: eNode B's Expected Strategy vs. Prior p_0^2]
[Fig. 12: eNode B's Steady State Approximated Security Strategy against Saboteur vs. Prior p_0^2]

As discussed above, the jammer plays completely non-revealing and misleading strategies for p_0^2 ≤ 0.75 and, hence, the eNode B is tricked into believing that it is playing against the Cheater (θ = 1) when it is in fact playing against the Saboteur (θ = 2). This leads the network to play the same strategy that it played against the Cheater until p_0^2 ≤ 0.70. At 0.70 ≤ p_0^2 ≤ 0.75, the network plays a_0^4 = 'Change frequency' with certainty to force the jammer to reveal its state. When the jammer starts playing its security strategy for state θ = 2 at high prior values, the network switches to its own security strategy against the Saboteur and plays a_0^1 = 'Normal' + a_0^5 = 'Change Timing'. The network also plays a_0^2 = 'Pilot Boosting' with a very low probability. This trend continues whenever the network observes (thanks to full monitoring) the jammer playing its state θ = 2 security strategies. It is curious to see how the network gets tricked by the jammer even with full monitoring, because it lacks information about the system state.

[Fig. 13: Players' Actions & Utility vs. Time when p_0^2 = 0.80]

3) eNode B's λ-discounted Utilities: A snapshot of both the eNode B's and the Saboteur's actions, along with the eNode B's utility at every stage, is shown in Fig. 13 for λ = 0.90 and p_0^2 = 0.80. As in the game against the Cheater, the eNode B's (and hence the Saboteur's) utility stabilizes very quickly at the beginning of the game. The network's λ-discounted utility values for both the "approximated" and "expected" security strategy algorithms are plotted against the prior probability p_0^2 in Fig. 14. The jammer's strategies are mostly non-revealing and, hence, the eNode B does not seem to "learn" much about the jammer type from the repeated interaction. Therefore, the "approximated security strategy" formulation performs very poorly for p_0^2 < 0.70. At p_0^2 = 0.70, the eNode B switches its strategy to playing a_0^4 = 'Change f_c' and catches up to the optimal value at p_0^2 ≥ 0.80.
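The λ-discounted utilities compared here weight early stages more heavily than later ones. A minimal sketch, assuming the common normalized convention v_λ = (1 − λ) Σ_{t≥1} λ^{t−1} u_t (the utility stream below is illustrative, not taken from the paper's simulations):

```python
# Normalized lambda-discounted payoff of a stage-utility stream.
# The normalization (1 - lam) makes a constant stream u_t = c evaluate to c.

def discounted_value(stage_utilities, lam):
    """Compute (1 - lam) * sum over t of lam^(t-1) * u_t (t starting at 1)."""
    return (1 - lam) * sum((lam ** t) * u for t, u in enumerate(stage_utilities))

# A stream that stabilizes quickly, as observed in the simulations:
# a short costly transient followed by a constant stage payoff.
stream = [-2.0, -1.0] + [0.5] * 200
v = discounted_value(stream, lam=0.90)
print(round(v, 3))  # prints 0.115
```

With λ = 0.90, the early transient drags the value well below the long-run stage payoff of 0.5; as λ approaches 1 (and the stream is long enough), the discounted value approaches that long-run payoff, which is why the transient matters less for patient players.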
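The single-shot security strategies that both players fall back on are the optimal mixed strategies of a zero-sum matrix game. The sketch below approximates them by fictitious play on a prior-averaged game, mirroring the idea behind the "expected" strategy of averaging over the jammer type; the payoff matrices and prior are toy stand-ins (scaled matching pennies), not the paper's eNode B/jammer utilities:

```python
# Approximate security (maxmin) strategies of a zero-sum matrix game
# via fictitious play. The matrices are illustrative, not the paper's.

def fictitious_play(G, iters=20000):
    """G: payoff matrix (row player maximizes, column player minimizes).
    Returns both players' empirical mixed strategies."""
    m, n = len(G), len(G[0])
    row_counts, col_counts = [0] * m, [0] * n
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mix so far.
        i_star = max(range(m),
                     key=lambda i: sum(G[i][j] * col_counts[j] for j in range(n)))
        j_star = min(range(n),
                     key=lambda j: sum(G[i][j] * row_counts[i] for i in range(m)))
        row_counts[i_star] += 1
        col_counts[j_star] += 1
    return ([c / iters for c in row_counts], [c / iters for c in col_counts])

# "Expected" strategy idea: average the per-type games by the prior and
# solve the averaged game. Both toy matrices are scaled matching pennies,
# so the security strategy is 50/50 regardless of the prior.
G_cheater = [[1, -1], [-1, 1]]
G_saboteur = [[2, -2], [-2, 2]]
p0 = 0.80  # prior on the Saboteur type
G_expected = [[(1 - p0) * G_cheater[i][j] + p0 * G_saboteur[i][j]
               for j in range(2)] for i in range(2)]
x, y = fictitious_play(G_expected)
print([round(p, 2) for p in x])  # close to [0.5, 0.5]
```

Because both toy matrices are scalings of the same game, the averaged game's security strategy is 50/50 for any prior; with distinct per-type utility matrices, as in the actual eNode B/jammer game, the averaged game's solution would shift with p_0^2, which is exactly the prior dependence of the "expected" strategy noted above.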
Obviously, the jammer also uses full monitoring and is forced to come out and play a revealing strategy at p_0^2 ≥ 0.90. On the other hand, the "expected strategy" algorithm seems to perform better than the "approximated security strategy" because, being oblivious to the jammer's actions, it does not get tricked by the jammer's non-revealing strategies. It appears that the "expected strategy" algorithm outperforms its counterpart when p_0^1 ≤ 0.30 (or, equivalently, p_0^2 ≥ 0.70) given that the Cheater (θ = 1) is present in the network, and when p_0^2 ≤ 0.70 (or, equivalently, p_0^1 ≥ 0.30) given that the Saboteur (θ = 2) is present in the network. Thus, it performs better in low prior probability regions, i.e., when the eNode B does not expect a certain jammer type in the network. Nevertheless, it becomes clear that the network is at a very disadvantageous position in the game against the smart jammer due to its lack of information and can be easily misled by the jammer. Furthermore, the "approximated" and "expected" strategy algorithms work in a complementary fashion in favor of the network.

[Fig. 14: eNode B's Utility against Saboteur vs. Prior p_0^2]

VII. CONCLUSION

In this article, the smart jammer and eNode B dynamics are modeled as a strictly competitive (zero-sum) repeated asymmetric game with incomplete information and lack of information on the network side. The solution of the complete-information single-shot game is based on the familiar security strategies that lead to a Nash equilibrium. However, tractable optimal strategy formulations for infinite-horizon asymmetric repeated games do not exist in the game-theoretic literature, especially for the uninformed player. Therefore, efficient LP formulations from recent work are used for "approximated" security strategy computation for both players, which requires full monitoring.
Therefore, a simple yet effective "expected" security strategy algorithm is also devised for the network that does not require full monitoring. This article also presents and discusses the performance characterization of the above-mentioned algorithms. It turns out that the jammer is able to play non-revealing strategies most of the time, which implies that the network is unable to learn any new information about the jammer type in the repeated game. Hence, at low prior values, the network performs worse (or, equivalently, the smart jammer performs better) in repeated games as compared to the complete-information single-shot game. In the game against the Cheater, the "approximated security strategy" algorithm is able to strategize against the Cheater rather quickly and achieves its optimal utility, because the jammer plays its single-shot game security strategy in the repeated game. However, this advantage goes away in the game against the Saboteur, where the jammer plays misleading strategies for a wide range of prior probabilities. Nevertheless, the network's algorithm eventually catches up and forces the jammer to reveal its true type. The "expected security strategy" algorithm performs equally well as, or sometimes better than, its counterpart "approximated security strategy" algorithm against the Saboteur type. This is because it does not get duped by the "misinformation" spread by the jammer: its lack of full monitoring works to its advantage. The "expected strategy" algorithm also performs better than the "approximated" one at low prior probability values against the Cheater type, because the smart jammer always plays its single-shot game security strategy. Nevertheless, the biggest advantage of the "expected strategy" algorithm is that it does not require full monitoring and, hence, can be easily deployed in practical networks.
