Joint Coverage and Power Control in Highly Dynamic and Massive UAV Networks: An Aggregative Game-theoretic Learning Approach


Authors: Zhuoying Li, Pan Zhou, Yanru Zhang, Lin Gao

Abstract—Unmanned aerial vehicle (UAV) ad-hoc networks are a significant contingency plan for communication after a natural disaster, such as a typhoon or earthquake. To achieve efficient and rapid network deployment, we employ noncooperative game theory and an amended binary log-linear algorithm (BLLA) to seek the Nash equilibrium (NE) that achieves the optimal network performance. We take into account not only channel overlap and power control but also coverage and the complexity of interference. However, existing UAV game-theoretic models show limitations in post-disaster scenarios, which require large-scale UAV network deployments. Moreover, highly dynamic post-disaster scenarios impose strategy-updating constraints and strategy-deciding errors on UAV ad-hoc networks. To handle these problems, we employ an aggregative game, which can capture and cover those characteristics. We further propose a novel synchronous payoff-based binary log-linear learning algorithm (SPBLLA) to lessen information exchange and reduce time consumption. Ultimately, the experiments indicate that, under the same strategy-deciding error rate, SPBLLA's learning rate is manifestly faster than that of the revised BLLA. Hence, the new model and algorithm are more suitable and promising for large-scale, highly dynamic scenarios.

Index Terms—UAV, aggregative game, synchronous learning, large-scale model, highly dynamic scenario

(Z. Li is with the School of Optical and Electronic Information, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: zhuoyingli@hust.edu.cn). P. Zhou is with the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: panzhou@hust.edu.cn). Y. Zhang is with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: yanruzhang@uestc.edu.cn). L. Gao is with the Department of Electronic and Information Engineering, Harbin Institute of Technology, Shenzhen 518055, China (e-mail: gaol@hit.edu.cn).)

I. INTRODUCTION

A. Background and Motivation

Catastrophic natural and man-made disasters, such as earthquakes, typhoons, and wars, usually involve great loss of life and property and damage to historical interests over vast areas. Though sometimes unavoidable, the loss of life and property can be effectively reduced if proper disaster management is implemented. Since telecommunication infrastructure is vital for search and rescue missions after disasters [1], rapidly deploying wireless coverage for ground users is essential for disaster management when communication networks have been damaged [2]. Considering that repairing communication infrastructure takes a long time, building vehicle relay networks is a preferable solution during the critical first 72 hours [3]. In such relay networks, vehicles are distributed over the post-disaster area and act as communication infrastructure, supporting mission-critical communication.
However, vehicles are likely to be blocked by damaged roads, torrential rivers, and precipices across these vast areas. Faced with such challenges, unmanned aerial vehicle (UAV) ad-hoc networks become a good alternative for emergency response, owing to advantages such as quick deployment and resilience in large-scale harsh conditions [4][5].

To investigate UAV networks, novel network models should jointly consider power control and altitude for practicability. Energy consumption, signal-to-noise ratio (SNR), and coverage size are the key determinants of a UAV network's performance [6]. Power control determines a UAV's SNR and energy consumption; altitude decides the number of users that can be supported [7] and also determines the minimum required SNR: the higher a UAV flies, the more users it can support and the higher the SNR it requires. Therefore, power control and altitude are two essential factors. Extensive research has built models focusing on various network factors. For example, the work in [8] established a system model with channel and time-slot selection, and the authors of [9] constructed a coverage model that considered each agent's coverage size on a network graph. However, such models usually consider only one specific characteristic of the network and ignore the system's multiplicity, which causes great loss in practice, since UAVs may consume too much power to improve SNR or to increase coverage size. Even though [7] studied a 3D UAV system with the coupled factors of coverage and charging strategies, it overlooks power control, so UAVs may waste a great deal of energy. In summary, for UAV ad-hoc networks in post-disaster scenarios, power control and altitude, which jointly determine energy consumption, SNR, and coverage size, ought to be considered for the model to be credible [10].

In post-disaster scenarios, a great many UAVs are required to support users [4]. We therefore bring aggregative game theory into such scenarios and let each UAV learn within a constrained strategy set. Because an aggregative game integrates the impact of all other UAVs on any one UAV, it reduces the complexity of receiving information and the data-processing burden on each UAV. For instance, in a conventional game applied to a scenario with $N$ UAVs, a UAV must analyze the strategies of every other individual UAV to determine the noise and coverage it experiences, whereas in an aggregative game it only needs to process the aggregated noise and coverage sizes of all other UAVs. This advantage becomes decisive when the number of UAVs is extremely large, since figuring out each other's strategies is unrealistic [8]. As for constrained strategy sets, environmental factors such as violent winds [11] and tempestuous rainstorms restrict a UAV's actions: it cannot switch rapidly between extreme power or altitude levels, but only to levels adjacent to the current one [12]. For instance, the power can change from 1 mW to 1.5 mW in the first time slot and from 1.5 mW to 2 mW in the next one, but it cannot jump directly from 1 mW to 2 mW. Therefore, an aggregative game with constrained strategy sets is an ideal model for post-disaster scenarios.
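To make the constrained action set concrete, the following minimal Python sketch (our own illustration, not code from the paper) enumerates a UAV's constrained strategy set $C_i(s_i)$ over discrete power and altitude grids. The grid spacings mirror the simulation values quoted later in Section VI, and the assumptions that both coordinates may move one level in the same iteration and that staying put is allowed are ours.

    import itertools

    # Hypothetical discrete level grids (Section VI uses 40 power levels
    # spaced by 0.025 W and 46 altitude levels spaced by 0.2 km).
    POWER_LEVELS = [round(0.025 * k, 3) for k in range(1, 41)]      # 0.025 .. 1.0 W
    ALTITUDE_LEVELS = [round(1.0 + 0.2 * k, 1) for k in range(46)]  # 1.0 .. 10.0 km

    def constrained_strategy_set(power_idx, alt_idx):
        """Return C_i(s_i): all (power, altitude) index pairs reachable in one
        move, i.e. each coordinate stays or shifts by one level (assumption)."""
        moves = []
        for dp, dh in itertools.product((-1, 0, 1), repeat=2):
            p, h = power_idx + dp, alt_idx + dh
            if 0 <= p < len(POWER_LEVELS) and 0 <= h < len(ALTITUDE_LEVELS):
                moves.append((p, h))
        return moves

    # From the lowest levels, only adjacent levels are reachable:
    print(constrained_strategy_set(0, 0))  # [(0, 0), (0, 1), (1, 0), (1, 1)]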
A new algorithm that can learn from previous experience is required, and the faster its learning rate, the more desirable it is. Existing algorithms learn by prediction: a UAV knows its current strategy and the corresponding payoff, randomly selects another strategy, calculates that strategy's payoff, and compares the two; if the new strategy's payoff is larger, it replaces the current strategy, otherwise the UAV keeps the current one. Under highly dynamic scenarios, however, complicated network conditions make it hard for UAVs to calculate expected payoffs; they can only learn from previous experience [11]. In this situation, a UAV merely knows the current and the last strategy together with their realized payoffs, and it can only learn from these. If the two strategies differ, the UAV chooses the one with the larger payoff as the current strategy of the next iteration; if they are the same, a new strategy is randomly selected as the next current strategy. In short, the difference is that the existing algorithms calculate the payoff of a new strategy (equivalent to prediction) and choose based on that prediction, whereas the required algorithm must work when a UAV only observes a strategy's payoff after that strategy has been played, deciding whether to stick to the current strategy or return to the previous one by comparing their payoffs. Therefore, an algorithm that learns from previous experience is required.

The learning rate of the extant algorithms is also unsatisfactory [13]. Recently, a fast algorithm called the binary log-linear learning algorithm (BLLA) was proposed in [14]. In this algorithm, however, only one UAV is allowed to change its strategy in each iteration based on the current game state, after which another UAV changes its strategy in the next iteration based on the new game state; UAVs are not permitted to update strategies simultaneously. Moreover, determining which UAV updates next requires a coordination process that occupies considerable channel capacity and adds time between iterations [15]. If the algorithm could learn synchronously, more than one UAV could update strategies based on the current game state within one iteration, making the algorithm far more efficient. In summary, synchronous update algorithms that learn from previous experience are desirable, but little research has investigated them.

B. Contribution

We establish a multi-factor system model for large-scale UAV networks in highly dynamic post-disaster scenarios. Considering the limitations of existing algorithms, we devise a novel algorithm capable of updating strategies simultaneously to fit highly dynamic environments. The main contributions of this paper are as follows:
• We design a model that jointly considers multiple factors, such as coverage and power control, in a multi-channel scenario. Including more network factors makes the model more reliable.
• We propose a novel UAV ad-hoc network model based on the aggregative game, which is compatible with large-scale, highly dynamic environments in which several influences are coupled together.
In the aggregative game, the interference from all other UAVs is treated as one integrated influence, which makes the model more practical and efficient.
• We revise BLLA and construct a novel payoff-based binary log-linear learning algorithm (PBLLA). PBLLA outperforms BLLA in that it learns from previous utilities and strategies, fitting post-disaster scenarios.
• We propose the synchronous payoff-based binary log-linear learning algorithm (SPBLLA), which has the following properties: 1) SPBLLA can learn with restricted information; 2) under certain conditions, SPBLLA approaches the NE with constrained strategy sets; 3) SPBLLA allows UAVs to update strategies synchronously, which significantly speeds up the learning rate; 4) SPBLLA avoids system disorder while ensuring synchronous learning, which is a main breakthrough.

We organize this paper as follows. In Section II, we introduce related works. In Section III, we first introduce the UAV's power control in multi-channel communication and the coverage problem, then form a system model for highly dynamic scenarios. In Section IV, we formulate our problem as an aggregative game and prove the existence of the NE. In Section V, we propose two algorithms for approaching the NE. Section VI presents the simulation results and discussion. Finally, Section VII concludes the study.

II. RELATED WORK

The literature lies mainly in four areas; we present studies of UAV communication networks, channel reuse in communication networks, game-theoretic approaches, and implementation algorithms in turn.

A. UAV Communication Network

With the rapid commercialization of UAVs, a great deal of research has emerged in this field [16]. To deploy UAVs efficiently, studies have examined UAV distribution on a network graph [9], and a graphical model has been proposed for channel reuse [17]. The allocation of channel and time resources is also an active area of study [8]. As mentioned previously, those models are impractical for post-disaster scenarios since they fail to jointly consider power control and coverage, which are coupled factors.

B. Channel Reuse

The typical wireless protocols 802.11b/g provide only a limited number of channels for users, which is far from enough for high-quality communication services [18]. To reduce the load on the central system, making use of distributed available resources in the network is an ideal solution. Underlay device-to-device (D2D) communication is considered one of the crucial technologies for cellular spectrum reuse by user devices in communication networks [19]. The advantage of D2D communication, namely allowing end users to operate on licensed channels through power control, sheds light on how interference management can work in UAV ad-hoc networks [22].

C. Game-Theoretic Approaches

Game theory provides an efficient tool for cooperation through resource allocation and sharing [20][21]. A computation offloading game has been designed to balance a UAV's trade-off between execution time and energy consumption [25]. A sub-modular game has been adopted to schedule beaconing periods for lower energy consumption [23]. Sedjelmaci et al. applied Bayesian game-theoretic methodology to UAV intrusion detection and attacker ejection [24].
However, most existing models focus on common scenarios with small numbers of UAVs and are not compatible with large-scale scenarios with large numbers of UAVs [26]. The aggregative game is a characteristic game model that treats the other agents' strategies as a single aggregate influence, thus avoiding overwhelming strategy information from every single agent [27][28]. Inspired by this, our model is built upon aggregative game theory, which suits large-scale scenarios.

D. Cooperative Algorithms

Compared with other algorithms, the novel algorithm SPBLLA has an advantage in learning rate. Various algorithms have been employed in UAV networks in search of the optimal channel selection [31][29], such as the stochastic learning algorithm [30]. The most widely used algorithm, LLA, is an ideal method for approaching the NE [9][32]. BLLA, a modification of LLA that updates strategies in each iteration, has been employed by [33] to converge to the NE. However, only a single agent is allowed to alter its strategy in each iteration; in large-scale scenarios, more iterations are required, which makes BLLA inefficient. Clearly, letting more UAVs alter strategies in one iteration would be more efficient. To achieve this, the works [34] and [35] provided a novel synchronous algorithm; however, its superabundant restrictions make it impractical in most scenarios. Compared with these, SPBLLA has fewer constraints and achieves synchronous operation, which significantly improves computational efficiency.

Fig. 1: The topological structure of UAV ad-hoc networks. a) The UAV ad-hoc network supports user communications. b) The coverage of a UAV depends on its altitude and field angle. c) There are two kinds of links between users, and the link supported by UAVs is better.

In summary, our work differs significantly from each of the above-mentioned works and from the other literature on UAV ad-hoc networks. To the best of our knowledge, our proposed algorithm is capable of learning from previous utilities and strategies, achieving the NE with restricted information and constrained strategy sets, and updating strategies synchronously, which significantly speeds up the learning rate.

III. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model

We construct a UAV ad-hoc network in a post-disaster scenario with $M$ identical UAVs deployed randomly, where $M$ is huge compared with a normal multi-UAV system. All UAVs have the same battery capacity $E$ and communication capability. The topological structure of the multi-UAV network is shown in Fig. 1(a). To support the communication mission, all UAVs are required to cooperate and serve the users in need. UAVs work above the post-disaster area $D$.
If a user (User 1) needs to communicate with another user (User 2) far away, the communication link can be set either directly (User 1 → User 2) or through UAVs acting as relays (User 1 → UAV 1 → UAV 2 → ··· → UAV n → User 2), as shown in Fig. 1(b). Since the users are far from each other, the link through UAVs provides better communication quality than the direct one, so users choose Link 2 whenever they are covered by UAVs. Note that users may not be covered by UAVs at all, owing to the high uncertainty of post-disaster areas.

In the UAV ad-hoc network, $N$ channels are available for UAV communication, labeled $C = [C_1, \ldots, C_n, \ldots, C_N]$. Each channel holds at most $C_{max}$ UAVs due to the interference constraint. Following the water-filling allocation scheme [36], each UAV selects one channel $C_n$ at the beginning of the process and does not change this decision afterwards. UAV $i$'s channel decision is represented by the vector $C_i = [C_{i1}, \ldots, C_{in}, \ldots, C_{iN}]$, where $C_{in} = 1$ if UAV $i$ selects channel $C_n$ and $C_{in} = 0$ otherwise. Since each UAV's channel is fixed, a UAV cannot switch to a less noisy channel to improve its SNR; it can only raise its SNR by adjusting its own power. The power of UAV $i$ across channels is denoted by the vector $P_i = [P_{i1}, \ldots, P_{in}, \ldots, P_{iN}]$, where $P_{in} > 0$ when UAV $i$ selects channel $C_n$ and $P_{in} = 0$ otherwise. The intrinsic noise in the channels is $Noise = [Ns_1, \ldots, Ns_n, \ldots, Ns_N]$.

UAVs have several power levels and altitude levels. In the midst of extreme environments, a UAV cannot change its voltage dramatically but may only move to an adjacent power level [12]. Similarly, only adjacent altitude-level transitions are permitted in each move. We denote the power set and altitude set by $P = \{P_1, \ldots, P_k, \ldots, P_{np}\}$ and $h = \{h_1, \ldots, h_k, \ldots, h_{nh}\}$, respectively, where $np$ is the number of power levels and $nh$ the number of altitude levels. We assume the gaps between different levels of power and of altitude are uniform, and let $\Delta P$ and $\Delta h$ denote the spacing of adjacent power levels and altitude levels, respectively.

When UAVs communicate, the signal-to-noise ratio (SNR) mainly determines the quality of service. The other UAVs' power and the inherent noise are interference for each UAV. Since there are hundreds of UAVs in the system, each UAV is unable to sense all the other UAVs' powers explicitly; it can only sense and measure the aggregate interference and treat it as an integral influence. Although increasing power improves the SNR, excessively large power causes more energy consumption and results in less running time. Therefore, power control for UAVs must be carefully designed.

Coverage is another factor that determines the performance of each UAV. As presented in Fig. 1(c), the altitude of a UAV plays an important role in coverage adjustment: the higher the altitude, the larger the coverage size. A large coverage size offers a substantial opportunity to support more users, but it demands a higher SNR. Furthermore, the turbulence of the upper air disrupts the stability of UAVs and increases energy consumption. Thus, a suitable altitude is essential in determining the coverage area.

B. Problem Formulation
Define the UAV ad-hoc network game $\Gamma = (U_i, S_i)_{i \in M}$, where $U_i$ is the utility function of UAV $i$ and $S_i$ is its strategy set. Take $s_i \in S_i$ as one strategy of UAV $i$; then $s_i = [C_i, P_i, h_i]$. For simplicity, we denote by $S_{-i} = \prod_{j \neq i} S_j$ the strategy sets of all UAVs except UAV $i$, with $s_{-i} \in S_{-i}$; $S = \prod_i S_i$, and $s \in S$ is the strategy profile of the game.

As described previously, each UAV can only sense the aggregate influence in the channel it selects. We define this influence as a power interaction term, which for UAV $i$ is written as:

$\sigma_{ip}(s_{-i}) = (\sigma - P_i) \otimes C_i$,  (1)

where $\sigma = \sum_{i \in M} P_i + Noise$ and $\otimes$ is the operator that multiplies corresponding elements of two vectors. For convenience of notation, the $n$th element of $\sigma_{ip}(s_{-i})$ is denoted $\sigma_{ip}(s_{-i})(n)$, the power interaction in $C_n$. Note that when UAV $i$ does not choose $C_n$, $\sigma_{ip}(s_{-i})(n) = 0$.

Suppose a UAV covers a round area below it with field angle $\theta$, as shown in Fig. 1(b). The coverage of UAV $i$ is thus $D_i = \pi (h_i \tan\theta)^2$. Considering that the higher the UAV's altitude, the severer the air turbulence it suffers, the utility of a coverage area is

$\tilde{D}_i = \pi (h_i \tan\theta)^2 \cdot \beta$,  (2)

where $\beta$ is the air-turbulence index, which decreases as the altitude increases.

Fig. 2: Coverage overlap between two UAVs. When two UAVs are close, there are overlap areas between them and the utility of coverage decreases.

When there are many UAVs in the network, the coverage areas of different UAVs may overlap. Overlapping UAVs do not each support all users but share the mission: the users in the overlap are served randomly with equal probability by each UAV. Fig. 2 presents the overlap between two UAVs, i.e., UAV 1 and UAV 2 each support half of the users in the overlap area. In this condition, the true coverage $\bar{D}_i$ is smaller than $D_i$ and is written as

$\bar{D}_i = D_i - \kappa \sum_{j \neq i} D_j$,  (3)

where $\kappa$ is the index that decides the influence of overlap. Since $\bar{D}_i$ must satisfy $\bar{D}_i > 0$, $\kappa$ is a tiny index: in a large-scale network, $D_i \ll \sum_{j \neq i} D_j$ and $\kappa \ll 1$.

To support as many users as possible, UAVs are required to enlarge their coverage size, which is equivalent to enlarging their coverage proportion of the mission area. Higher altitude means larger coverage size, as shown in Fig. 1(c). The utility of coverage size is denoted

$U_D = \tilde{D}_i / D$.  (4)

Now we define the utility functions of energy, SNR, and coverage to calculate UAVs' payoffs. For the power selection of UAV $i$, a large power does not necessarily result in high utility, owing to the large interference that comes with it. Taking energy saving and a longer lifetime into consideration, choosing the amount of power that balances the trade-off between power and interference brings the highest utility. We define the energy utility function $U_E$ as:

$U_E = E / \sum_{1 \le n \le N} P_{in}$.  (5)

It should be noted that higher power provides UAVs with better communication quality, and larger coverage requires higher power.
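The following Python sketch (an illustration under our own assumptions about array shapes and parameter values, not the authors' code) computes the power interaction term of Eq. (1) and the coverage quantities of Eqs. (2)-(3) for a toy instance; $\beta$ is held constant here for brevity, although the paper lets it decrease with altitude.

    import numpy as np

    M, N = 4, 3
    rng = np.random.default_rng(0)

    C = np.zeros((M, N)); C[np.arange(M), rng.integers(0, N, M)] = 1  # channel choices
    P = C * rng.uniform(0.025, 1.0, (M, N))       # power only on the chosen channel
    noise = rng.uniform(0.025, 1.0, N)            # intrinsic channel noise

    def power_interaction(i):
        """Eq. (1): sigma_ip(s_{-i}) = (sigma - P_i) (x) C_i, the aggregate
        power-plus-noise UAV i senses in its own channel."""
        sigma = P.sum(axis=0) + noise
        return (sigma - P[i]) * C[i]

    h = rng.uniform(1.0, 10.0, M)                 # altitudes (km)
    theta, beta, kappa = np.deg2rad(30), 0.9, 1e-4
    D_i = np.pi * (h * np.tan(theta)) ** 2        # raw coverage disks
    D_tilde = D_i * beta                          # Eq. (2), constant-beta assumption
    D_bar = D_i - kappa * (D_i.sum() - D_i)       # Eq. (3): overlap-corrected coverage

    print(power_interaction(0), D_bar[0])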
As for power control, the SNR of UAV $i$ in channel $C_n$ is

$SNR = \dfrac{P_{in}}{\sum_{j \neq i} P_{jn} + Ns_n}$.  (6)

It can also be written as $\mu C_{in} [P_{in} - \gamma\, \sigma_{ip}(s_{-i})(n)]$ [8], where $\gamma$ is the SNR balance index and $\mu$ is the SNR index. Given the Shannon capacity

$C = B \log_2(1 + SNR)$,  (7)

to improve the communication quality, the SNR should be enlarged. The total SNR of UAV $i$ is $\mu \sum [P_i - \gamma\, \sigma_{ip}(s_{-i})] \otimes C_i$, where we broaden the meaning of $\sum$ so that it sums all elements of the vector. According to the previous definitions, we use an index to balance the trade-off between power control and coverage and formulate the SNR utility function as:

$U_{SNR} = \mu \sum [P_i - \gamma\, \sigma_{ip}(s_{-i})] \otimes C_i - \alpha \bar{D}_i$,  (8)

where $\alpha$ is the index that balances the trade-off between SNR and coverage size.

Finally, the utility function $U_i$ of UAV $i$ should balance power limitation, SNR, and coverage size. It is defined as:

$U_i(s_i, \sigma_{ip}(s_{-i})) = A U_E + B U_{SNR} + C U_D = \dfrac{AE}{\sum_{1 \le n \le N} P_{in}} + B \Big\{ \mu \sum [P_i - \gamma\, \sigma_{ip}(s_{-i})] \otimes C_i - \alpha \bar{D}_i \Big\} + C \dfrac{\tilde{D}_i}{D}$,  (9)

where $A$, $B$, and $C$ are indices that balance the three utilities according to the post-disaster scenario. The ultimate goal of enlarging the utility of the network is to enlarge the sum of the utility functions (9) over all UAVs, so we define the global utility function as the goal function:

$U = \sum_{i \in M} U_i(s_i, \sigma_{ip}(s_{-i}))$,  (10)

which represents the performance of the UAV ad-hoc network.
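Continuing the sketch above (and reusing the arrays C, P, D_tilde, D_bar and power_interaction() defined there), the per-UAV utility of Eq. (9) and the goal function of Eq. (10) can be assembled as follows; the default balance indices are the values quoted later in Section VI, and Cw stands in for the balance index C to avoid clashing with the channel matrix.

    def utility(i, A=0.002, B=0.005, Cw=0.03, alpha=0.002, gamma=0.002,
                mu=10.0, E=5.0, D_area=4000.0):
        """Sketch of Eq. (9): U_i = A*U_E + B*U_SNR + C*U_D."""
        U_E = E / P[i].sum()                                    # Eq. (5)
        sigma_ip = power_interaction(i)
        U_SNR = mu * ((P[i] - gamma * sigma_ip) * C[i]).sum() \
                - alpha * D_bar[i]                              # Eq. (8)
        U_D = D_tilde[i] / D_area                               # Eq. (4)
        return A * U_E + B * U_SNR + Cw * U_D

    # Global goal function, Eq. (10): the sum of all UAVs' utilities.
    U_global = sum(utility(i) for i in range(M))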
IV. MULTI-AGGREGATOR AGGREGATIVE GAME

A. Multi-aggregator Aggregative Game Model

We formulate the UAV ad-hoc network game over a large-scale post-disaster area as a multi-aggregator aggregative game [27], calibrating its definition to our UAV network model as follows.

Definition 1: (Multi-aggregator Aggregative Game) The game $\Gamma = (U_i, S_i)_{i \in M}$ is called a multi-aggregator aggregative game with two or more aggregators $g = g_1, g_2, \ldots, g_K$, $g_k : S \to \mathbb{R}^n$ ($n$ is the dimension of the aggregators), and $S$ bounded, if there exist two or more interaction functions $\sigma_i = \{\sigma_{i1}, \sigma_{i2}, \ldots, \sigma_{iK}\}$, $\sigma_{ik} : S_{-i} \to X_{-i} \subseteq \mathbb{R}^n$, $i \in M$, such that each of the payoff functions, $i \in M$, can be written as:

$U_i(s) = u_i(s_i, \sigma_{i1}(s_{-i}), \sigma_{i2}(s_{-i}), \ldots, \sigma_{iK}(s_{-i}))$,  (11)

where $u_i : S_i \times X_{-i} \times \cdots \times X_{-i} \to \mathbb{R}$, and for any fixed strategy profile $s$:

$g_k(s) = g(\sigma_{ik}(s_i, s_{-i}))$.  (12)

In the UAV ad-hoc network game, $\sigma_{i1}(s_{-i})$ is the power interaction term $\sigma_{ip}(s_{-i})$, the aggregate power influence of the other UAVs on UAV $i$. The other interaction term, $\sigma_{i2}(s_{-i})$, is named the area interaction term $\sigma_{ia}(s_{-i})$:

$\sigma_{i2}(s_{-i}) = \sigma_{ia}(s_{-i}) = \sum_{j \neq i} D_j$.  (13)

Besides, the aggregators $g_1(s)$ and $g_2(s)$ of the UAV ad-hoc network game are as follows:

$g_1(s)(n) = (\sigma_{ip}(s_{-i}) + P_i)(n)$,  (14)

where $g_1(s)(n)$, the $n$th element of $g_1(s)$, is the sum of power and noise in channel $n$, the channel chosen by UAV $i$. If the strategy profile $s$ is fixed, the sum of power and noise in every channel is fixed; hence $g_1(s)(n)$ is the same for any value of $i$.

$g_2(s) = \sigma_{ia}(s_{-i}) + D_i$.  (15)

$\sigma_{ia}(s_{-i})$ is the area covered by all UAVs except UAV $i$; therefore $g_2(s)$ is the total coverage area of all UAVs. For any value of $i$, $g_2(s)$ is the same if the strategy profile $s$ is fixed. The utility function $U_i(s_i, \sigma_{ip}(s_{-i}))$ of the UAV ad-hoc network game conforms to the definition of the payoff function $U_i(s)$. Therefore, the UAV ad-hoc network game is a multi-aggregator aggregative game.

B. Analysis of Nash Equilibrium

In game theory, a Nash equilibrium (NE) is a special state in which no UAV can gain more payoff by changing its strategy. Thus, the NE is an ideal solution for all UAVs in the multi-UAV relay mission game [8]. The potential game is usually used to analyze the existence of an NE. We first define the pure-strategy Nash equilibrium (PSNE), then prove that the UAV ad-hoc network game has an NE.

Definition 2: (Pure Strategy Nash Equilibrium) A strategy profile $s^* = \{s_1^*, s_2^*, \ldots, s_m^*\}$ is a pure-strategy Nash equilibrium if and only if no UAV can gain more payoff by altering its strategy while no other UAV changes theirs, i.e.,

$U_i(s_i, s_{-i}^*) \le U_i(s_i^*, s_{-i}^*), \quad i \in M, \; s_i \neq s_i^*$.  (16)

Definition 3: (Potential Game) Consider a game $\Gamma = (U_i, S_i)_{i \in M}$ such that $S_i$ is the strategy set of UAV $i$, $s_i \in S_i$. We broaden the meaning of $\prod$ and write the set of all strategy profiles as $S = \prod S_i$, and the strategy sets of the UAVs other than UAV $i$ as $S_{-i} = \prod_{j \neq i} S_j$, $s_{-i} \in S_{-i}$. If there exists a potential function $\phi : S \to \mathbb{R}$ such that for all $i \in M$

$U_i(s_i', s_{-i}) - U_i(s_i, s_{-i}) = \phi(s_i', s_{-i}) - \phi(s_i, s_{-i})$,  (17)

then this game is a potential game. Definition 3 indicates that the change of any utility function equals the change of the potential function, which gives the potential game an ideal property.

Theorem 1: Any potential game with finite strategy sets has at least one PSNE.

Proof: Please refer to the proof of Corollary 2.2 in [37].

Theorem 2: The UAV ad-hoc network game is a potential game and has at least one PSNE.

Proof: Please refer to Appendix A.

Since the UAV ad-hoc network game is a special type of potential game, we can apply the properties of potential games in the later analysis, and algorithms designed for potential games can also be employed here. In the next section, we investigate the existing algorithm and its learning rate in large-scale post-disaster scenarios and propose a new algorithm that is more suitable for the UAV ad-hoc network in such scenarios.
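Before moving to the algorithms, a toy numerical illustration of the potential-game property may help. The Python check below (our own generic construction, unrelated to the specific UAV utilities) builds a two-player game from a shared potential plus opponent-only terms and verifies that Eq. (17) holds for every unilateral deviation.

    import itertools

    S = [0, 1, 2]
    phi = {(a, b): a * b - (a - 1) ** 2 - (b - 2) ** 2 for a in S for b in S}

    def U(i, s):
        a, b = s
        # A term depending only on the opponent's strategy cancels in Eq. (17).
        opponent_only = b ** 2 if i == 0 else 3 * a
        return phi[(a, b)] + opponent_only

    for a, a2, b in itertools.product(S, repeat=3):
        # player 0 deviates a -> a2 while player 1 holds b
        assert U(0, (a2, b)) - U(0, (a, b)) == phi[(a2, b)] - phi[(a, b)]
        # player 1 deviates a -> a2 while player 0 holds b
        assert U(1, (b, a2)) - U(1, (b, a)) == phi[(b, a2)] - phi[(b, a)]
    print("Eq. (17) holds for every unilateral deviation; the game is potential.")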
V. LEARNING ALGORITHM

A. Payoff-based Binary Log-Linear Learning

In the literature, most works search for the PSNE using the binary log-linear learning algorithm (BLLA). However, this algorithm has limitations. In BLLA, each UAV can calculate and predict its utility for any $s_i \in S_i$ in the complete strategy set, whereas in UAV ad-hoc networks, UAVs have access only to the constrained strategy set and the corresponding utilities of the last two decision periods. Thus, conventional BLLA is no longer suitable for the scenario considered here, and we propose a revised algorithm based on BLLA, called the payoff-based binary log-linear learning algorithm (PBLLA), to resolve the issue. In this section, we present how PBLLA works and prove its convergence.

Theorem 3: Denote by $\tau$ the dynamic degree of the scenario; the harsher the environment the network suffers, the higher $\tau$ is. In highly dynamic scenarios we suppose $\tau \ge 0.01$. With proper $\tau$, PBLLA asymptotically converges and leads the UAV ad-hoc network game to the PSNE.

Proof: The proof of the convergence of PBLLA is similar to that of BLLA in [38]. According to [38], we only need to show that the UAV ad-hoc network game satisfies the following two assumptions.

Assumption 1: For all $i \in M$ and all strategy pairs $s_i^0, s_i^n \in A$, there exists a series of strategies $s_i^0 \to s_i^1 \to \cdots \to s_i^n$ such that $s_i^k \in C_i(s_i^{k-1})$ for all $k \in \{1, 2, \ldots, n\}$, where $C_i(s_i^k)$ is the constrained strategy set at $s_i^k$.

Assumption 2: For all $i \in M$ and all strategy pairs $s_1^0, s_2^n \in A$,

$s_i^2 \in C_i(s_i^1) \Leftrightarrow s_i^1 \in C_i(s_i^2)$.  (18)

In the UAV ad-hoc network game, UAVs may only select adjacent power and altitude levels. It is evident that for any strategy pair $s_i^0, s_i^n \in A$, $s_i^0$ can change power and altitude step by step to reach $s_i^n$, and vice versa. Thus, the UAV ad-hoc network game satisfies Assumptions 1 and 2. The remaining proof follows Theorem 5.1 in [38].

Algorithm 1 Payoff-based Binary Log-linear Learning Algorithm (PBLLA)
1: Initialization: Select an arbitrary power and altitude profile $s \in S$ and arbitrary channels for each UAV; the number of UAVs in a channel must be less than $C_{max}$.
2: Set $t = 1$.
3: Set $s(t) = s$ and $x_i(t) = 0$ for every UAV $i \in M$.
4: Set the dynamic degree index $\tau \in (0, \infty)$.
5: Repeat
6:   if $x_i(t) = 0$ for all $i \in M$ then
7:     Select UAV $i$ randomly.
8:     Select $s_i(t+1)$ randomly from $C_i(s_i(t))$, the constrained strategy set at the current strategy $s_i(t)$.
9:     $x_i(t+1) = 1$.
10:  else
11:    Select the UAV $i$ that satisfies $x_i(t) = 1$.
12:    With probability $\exp[\frac{1}{\tau} U_i(s(t-1))] / \big( \exp[\frac{1}{\tau} U_i(s(t-1))] + \exp[\frac{1}{\tau} U_i(s(t))] \big)$, set $s_i(t+1) = s_i(t-1)$;
13:    or with probability $\exp[\frac{1}{\tau} U_i(s(t))] / \big( \exp[\frac{1}{\tau} U_i(s(t-1))] + \exp[\frac{1}{\tau} U_i(s(t))] \big)$, set $s_i(t+1) = s_i(t)$.
14:    $x_i(t+1) = 0$.
15:  end if
16:  for every UAV $j \in M$, $j \neq i$ do
17:    $s_j(t+1) = s_j(t)$ and $x_j(t+1) = 0$.
18:  end for
19:  $t = t + 1$.
20: End Repeat

The essence of PBLLA is to select one UAV at random in each iteration and improve its utility by altering power and altitude with a certain probability, which is determined by the utilities of the two strategies and $\tau$. A UAV prefers to select the power and altitude that provide higher utility. Nevertheless, highly dynamic scenarios cause UAVs to make mistakes and pick the worse strategy. The dynamic degree index $\tau$ captures the dynamics of the situation and the UAV's performance: small $\tau$ means a less dynamic scenario and fewer mistakes in decision-making. As $\tau \to 0$, which corresponds to a stable environment, a UAV always selects the power and altitude with higher utility; as $\tau \to \infty$, under severe dynamics, it chooses randomly. However, PBLLA has the limitation that only one single UAV is allowed to alter its strategy in each iteration. We propose a new algorithm in the next section to overcome this restriction.
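For concreteness, here is a compact Python sketch of one PBLLA iteration; the state layout (previous profile, current profile, index of the experimenting UAV) and the helper signatures utility_fn and neighbors_fn are our own reading of Algorithm 1, not the authors' implementation.

    import math, random

    def pblla_step(state, utility_fn, neighbors_fn, tau=0.01):
        """One PBLLA iteration. `utility_fn(i, profile)` returns UAV i's
        realized payoff; `neighbors_fn(s_i)` returns the constrained set C_i(s_i)."""
        s_prev, s_cur, experimenting = state
        if experimenting is None:
            # Exploration phase: one random UAV tries a constrained neighbor.
            i = random.randrange(len(s_cur))
            s_new = list(s_cur)
            s_new[i] = random.choice(neighbors_fn(s_cur[i]))
            return (s_cur, s_new, i)
        # Commit phase: the experimenting UAV keeps or reverts its trial
        # with log-linear probabilities based on the two realized payoffs.
        i = experimenting
        u_old, u_new = utility_fn(i, s_prev), utility_fn(i, s_cur)
        z = min((u_new - u_old) / tau, 700.0)        # clamp to avoid overflow
        p_revert = 1.0 / (1.0 + math.exp(z))
        s_next = list(s_cur)
        if random.random() < p_revert:
            s_next[i] = s_prev[i]
        return (s_cur, s_next, None)

    # Usage: state = (s0, s0, None); repeat state = pblla_step(state, U, C_i).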
B. Synchronous Payoff-based Binary Log-linear Learning

Since PBLLA allows only a single UAV to alter its strategy per iteration, this defect causes computation time to grow rapidly in large-scale UAV systems. For a large-scale UAV ad-hoc network with $M$ UAVs, $M^2$ message exchanges are needed to coordinate and guarantee that only one UAV changes strategy in each iteration. Such a process not only consumes a lot of energy but also prolongs convergence time. Algorithms that improve the learning rate and reduce message exchange are urgently needed. Thus, we propose the synchronous payoff-based binary log-linear learning algorithm (SPBLLA), which permits every UAV to alter its strategy synchronously and to learn with no message exchange.

Algorithm 2 Synchronous Payoff-based Binary Log-linear Learning Algorithm (SPBLLA)
1: Initialization: Select an arbitrary power and altitude profile $s \in S$ and arbitrary channels for each UAV; the number of UAVs in a channel must be less than $C_{max}$.
2: Set $t = 1$, $s(1) = s$.
3: Set $x_i(t) = 0$ for every UAV $i \in M$.
4: Set the dynamic degree index $\tau \in (0, \infty)$.
5: Set the probability index $m$ and the strategy-altering probability $\omega = (e^{-\frac{1}{\tau}})^m$.
6: Repeat
7:   for every UAV $i$ do
8:     if $x_i(t) = 0$ then
9:       With probability $\omega$: UAV $i$ selects $s_i(t+1)$ randomly from $C_i(s_i(t))$, the constrained strategy set at the current strategy $s_i(t)$, and sets $x_i(t+1) = 1$;
10:      or with probability $1 - \omega$: $s_i(t+1) = s_i(t)$ and $x_i(t+1) = 0$.
11:    else
12:      With probability $\exp[\frac{1}{\tau} U_i(s(t-1))] / \big( \exp[\frac{1}{\tau} U_i(s(t-1))] + \exp[\frac{1}{\tau} U_i(s(t))] \big)$, set $s_i(t+1) = s_i(t-1)$;
13:      or with probability $\exp[\frac{1}{\tau} U_i(s(t))] / \big( \exp[\frac{1}{\tau} U_i(s(t-1))] + \exp[\frac{1}{\tau} U_i(s(t))] \big)$, set $s_i(t+1) = s_i(t)$.
14:      $x_i(t+1) = 0$.
15:    end if
16:  end for
17:  $t = t + 1$.
18: End Repeat

The process of SPBLLA frees UAVs from message exchange. Therefore, no energy or time is wasted between two iterations, which significantly improves learning efficiency. All UAVs alter strategies with a certain probability $\omega$, which is determined by $\tau$ and $m$; $\tau$ again represents the dynamic degree of the scenario, and, as in PBLLA, the dynamic degree determines the chance that UAVs make mistakes when altering strategies.
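Analogously, a sketch of one synchronous SPBLLA iteration follows (again our own reading of Algorithm 2, with the same hypothetical helpers); every UAV acts in parallel, idle UAVs experimenting with probability $\omega = e^{-m/\tau}$ and experimenting UAVs resolving their trials.

    import math, random

    def spblla_step(state, utility_fn, neighbors_fn, tau=0.01, m=0.03):
        """One synchronous SPBLLA iteration over all UAVs."""
        s_prev, s_cur, flags = state
        omega = math.exp(-m / tau)
        s_next, new_flags = list(s_cur), list(flags)
        for i in range(len(s_cur)):
            if not flags[i]:
                if random.random() < omega:          # start an experiment
                    s_next[i] = random.choice(neighbors_fn(s_cur[i]))
                    new_flags[i] = True
                # else: keep s_cur[i] and stay idle
            else:                                    # resolve an experiment
                u_old, u_new = utility_fn(i, s_prev), utility_fn(i, s_cur)
                z = min((u_new - u_old) / tau, 700.0)  # clamp to avoid overflow
                p_revert = 1.0 / (1.0 + math.exp(z))
                s_next[i] = s_prev[i] if random.random() < p_revert else s_cur[i]
                new_flags[i] = False
        return (s_cur, s_next, new_flags)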
To prove the convergence of SPBLLA, we first provide some concepts.

Definition 4: (Regular Perturbed Markov Process) Denote by $P$ the transition matrix of a Markov process on a finite state space $S$. The Markov process is called a regular perturbed Markov process with noise $\epsilon$ if the following conditions are met: 1) $P^\epsilon$ is aperiodic and irreducible when $\epsilon > 0$; 2) $\lim_{\epsilon \to 0} P^\epsilon = P^0$, where $P^0$ is the unperturbed process; 3) for any $s_n, s_m \in S$ with $P^\epsilon(s_n, s_m) > 0$, there exists a function $R(s_n \to s_m) \ge 0$, called the resistance of changing strategy from $s_n$ to $s_m$, such that

$0 < \lim_{\epsilon \to 0^+} \dfrac{P^\epsilon(s_n, s_m)}{\epsilon^{R(s_n \to s_m)}} < \infty$.  (19)

Definition 5: (Stochastically Stable Strategy) Denote by $P^\epsilon$ the transition probability of a regular perturbed Markov process on a state space $S$, and by $\mu^\epsilon(s)$ the probability that the state transitions to $s$. The state is a stochastically stable strategy if

$\lim_{\epsilon \to 0^+} \mu^\epsilon(s) > 0$.  (20)

Fig. 3: Two trees rooted at $s_3$.

In our model, when the UAVs of the repeated UAV ad-hoc network game adhere to a regular perturbed Markov process, the probability of being in $s$ is

$\mu^\epsilon(s) = \dfrac{\epsilon^{-\phi(s)}}{\sum_{\tilde{s} \in S} \epsilon^{-\phi(\tilde{s})}}$.  (21)

Let $L$ denote a series of adjacent strategies, $L = \{s^0 \to s^1 \to \cdots \to s^n\}$. For any adjacent strategy pair $(s^{k-1}, s^k)$, the resistance of changing strategy from $s^{k-1}$ to $s^k$ is $R(s^{k-1}, s^k)$. The resistance of the path $L$ is the sum over each move:

$R(L) = \sum_{k=1}^{n} R(s^{k-1} \to s^k)$.  (22)

Based on $L$, we give the definition of a resistance tree.

Definition 6: (Resistance Tree) In a strategy profile space $S$, strategy profiles are linked. A tree $T$ rooted at strategy profile $s$ is a set of directed edges such that every other strategy profile has exactly one directed path, consisting of several directed edges, leading to $s$. The resistance of $T$ is the sum of the resistances of all its directed edges:

$R(T) = \sum_{s' \to s'' \in T} R(s' \to s'')$.  (23)

Denoting by $T(s)$ the set of all trees rooted at strategy profile $s$, the stochastic potential of strategy profile $s$ can be written as

$\gamma(s) = \min_{T \in T(s)} R(T)$.  (24)

The minimum resistance tree of the strategy space $S$ is the tree with minimum stochastic potential,

$R(T_{min}) = \min_{s \in S} \gamma(s)$.  (25)

For example, Fig. 3 shows different trees rooted at $s_3$, whose branches are linked in different ways. With these definitions, we can prove the convergence of SPBLLA.

Theorem 4: Consider several UAVs playing the UAV ad-hoc network game with potential function $\phi : S \to \mathbb{R}$. When all UAVs adhere to SPBLLA, if $m$ is large enough, the stochastically stable strategies are maximizers of the potential function, which are PSNEs.

Proof: Refer to Appendix B.

Remark 1: According to Appendix B, to make SPBLLA converge, $m$ should be more than twice the largest possible change of any UAV's utility function.

Theorem 5: $m$ is an index that indirectly influences the learning rate. If $m$ satisfies

$m > 2\Delta$,  (26)

where

$\Delta = \dfrac{AE\,\Delta P}{\sum C_i \cdot P_1 \cdot (P_1 + \Delta P)} + B \sum C_i \cdot \Delta P + B\gamma\,\Delta P \cdot \sum \Big( \sum_{j \neq i} C_j \otimes C_i \Big) + B\alpha\pi \tan^2\theta \,(2 h_{nh} \Delta h - \Delta h^2) + B\alpha\kappa (M-1) \pi \tan^2\theta \,(2 h_{nh} \Delta h - \Delta h^2) + \dfrac{C}{D} \pi \tan^2\theta \,\big[ h_{nh}^2 \beta - (h_{nh} - \Delta h)^2 \beta \big]$,

then SPBLLA converges.

Proof: Refer to Appendix C.

Remark 2: Let each UAV alter its strategy as much as possible so that the utility function changes the most. By calculating the largest difference a utility function can make in one iteration, we can learn the range of $m$.

VI. SIMULATION RESULTS AND DISCUSSION

In this section, we study how the key parameters of the UAV ad-hoc network affect the performance of PBLLA and SPBLLA. In the simulation, apart from the numbers of UAVs and channels, all other parameters are fixed as constant values. We set the post-disaster area to $D = 4000\,\mathrm{km}^2$, with $M = 100$ UAVs scattered in the area. There are $N = 30$ channels, each holding at most $C_{max} = 25$ UAVs, and each UAV selects $N_C = 5$ channels. $np = 40$ power levels are accessible, $P = \{0.025\,\mathrm{W}, 0.05\,\mathrm{W}, 0.075\,\mathrm{W}, \ldots, 0.975\,\mathrm{W}, 1\,\mathrm{W}\}$, with gap $\Delta P = 0.025\,\mathrm{W}$ between adjacent levels. Noise is randomly distributed from $0.025\,\mathrm{W}$ to $1\,\mathrm{W}$. $nh = 46$ altitude levels are accessible, $h = \{1\,\mathrm{km}, 1.2\,\mathrm{km}, 1.4\,\mathrm{km}, \ldots, 9.8\,\mathrm{km}, 10\,\mathrm{km}\}$, with gap $\Delta h = 0.2\,\mathrm{km}$ between adjacent levels.
The battery capacity of each UAV for communication is $E = 5$ mAh. The field angle determining the coverage size is $\theta = 30°$. The balance indices vary across scenarios; here we choose $A = 0.002$, $B = 0.005$, $C = 0.03$, and $\alpha = 0.002$, $\gamma = 0.002$, $\kappa = 10^{-4}$, $\mu = 10$. $\beta$ decreases as $h$ increases. According to Theorem 5 in the last section, SPBLLA is feasible when $m > 2\Delta$.

A. Effect of the Scenario's Dynamic Degree τ

In this part, we investigate the influence of environmental dynamics on the network state. With different dynamic degrees $\tau \in (0, \infty)$, PBLLA and SPBLLA converge to the maximizer of the goal function with different strategy-altering probabilities. Fig. 4 presents the influence of the dynamics on PBLLA. The fluctuation during convergence is severe for both algorithms, which differs from other related works; it results not from poor algorithm performance but from the highly dynamic scenario. A highly dynamic environment, i.e., a high value of $\tau$, brings about more mistakes in selecting powers and altitudes: when UAVs have a low probability of selecting the right strategy, non-optimal decisions result. Similar phenomena can be observed in the remaining simulations. As the dynamic degree index $\tau$ decreases from 0.03 to 0.01, the goal function's value increases, which shows that lower values of $\tau$ approach the maximizer of the global utility function. When $\tau = 0.03$, the value of $U$ does not increase much before convergence; severe interference from the environment seriously influences the UAV network, making UAVs err when altering strategies, which accounts for this result. In Fig. 5, $\Delta U$ is the difference between the value at each iteration and the average function value after convergence. Comparing the curves for $\tau = 0.01$ and $\tau = 0.03$ in Fig. 5, the fluctuation at $\tau = 0.03$ is around 50% larger than at $\tau = 0.01$, showing that rough environments lead to unstable converged states. Figs. 7 and 8 show the performance of SPBLLA, which is similar to that of PBLLA. In the SPBLLA simulation, $m = 0.03$, following Theorem 5. These figures show that when the $\tau$ values of PBLLA and SPBLLA are equal, the final optimum states and maximizers are similar.

B. Performance on SNR and Coverage

Since PBLLA behaves similarly to SPBLLA, we skip the performance analysis of PBLLA and focus on SPBLLA only. We set $m = 0.03$ and vary $\tau$ from 0.01 to 0.03. Fig. 6 and Fig. 9 present the average SNR and the coverage size of the whole UAV ad-hoc network. The interference from the environment causes fluctuation. As the scenario's dynamic degree decreases, the network approaches a better outcome, and the SNR and coverage size grow. In the less dynamic scenario ($\tau = 0.01$), the UAV ad-hoc network covers 95% of the post-disaster area; even when the interference from the environment is serious ($\tau = 0.03$), the coverage proportion reaches 80%.

C. Altering-Strategy Probability Index m in SPBLLA

The opportunity for UAVs to change strategies in each SPBLLA iteration is determined by the probability index $m$. The altering probability $\omega = (e^{-1/\tau})^m$ shows that a higher $m$ gives UAVs a lower probability of altering strategies.
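As a quick check of this monotonicity (a restatement of the definition, not a new result):

$\omega = (e^{-1/\tau})^m = e^{-m/\tau}, \qquad \dfrac{\partial \omega}{\partial m} = -\dfrac{1}{\tau} e^{-m/\tau} < 0, \qquad \dfrac{\partial \omega}{\partial \tau} = \dfrac{m}{\tau^2} e^{-m/\tau} > 0,$

so a larger $m$ suppresses experimentation, while a more dynamic scenario (larger $\tau$) makes simultaneous strategy changes more frequent.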
A higher probability enables more UAVs to alter strategies at the same time; thus a lower $m$ ensures frequent strategy changes and a faster convergence rate. According to the limitation of $m$ in Eq. (26), we obtain $m > 0.028$. Fig. 10 shows the effect of $m$ on the behavior of SPBLLA. Setting $\tau = 0.01$ and $m > 0.028$, we choose five values from 0.03 to 0.05. As $m$ grows, SPBLLA needs more time to converge: a higher $m$ leaves UAVs less opportunity to alter strategies, so fewer UAVs change strategies at the same time, and reaching the same optimum state takes longer. Fig. 10 also shows that $m$ does not influence the optimum state: for the same value of $\tau$, various values of $m$ converge to states with similar global utility values, and a higher altering probability saves the UAV ad-hoc network convergence time.

Fig. 4: Effect of the dynamic degree index τ on PBLLA (10^6 iterations). Severely dynamic scenarios yield lower utilities for the whole network.
Fig. 5: Effect of the dynamic degree index τ on PBLLA's fluctuation. Two horizontal lines show the fluctuation range; a more dynamic scenario causes severe fluctuation.
Fig. 6: Performance of SNR. In the less dynamic scenario, SNR is much higher after convergence.
Fig. 7: Effect of the dynamic degree index τ on SPBLLA (2 × 10^5 iterations). The result is the same as for PBLLA, which illustrates that the algorithm does not affect the convergence state.
Fig. 8: Fluctuation of the global utility function (ΔU) under SPBLLA for τ = 0.01 and τ = 0.03.
Fig. 9: Performance of coverage size. A lower dynamic degree index increases coverage size; when τ = 0.01, the coverage proportion is up to 95%.

D. Power Control

Allocating appropriate power is one of the key factors that affect a UAV's utility, and a proper power value significantly improves the quality of service. Fig. 11 presents a sketch of a UAV's utility as its power alters, with the altitudes of all UAVs fixed. As the other UAVs' power profiles alter, the interference increases and the curve moves down: high interference reduces the utility of the UAV. Fig. 11 also shows that utility first decreases and then increases as power grows; both small and large powers provide high utilities, because small power saves energy while large power increases SNR. A UAV might therefore select the largest power to increase its own utility.
However, the more power one UAV uses, the more interference the other UAVs receive and the lower their utilities become. For the sake of enlarging the global utility, the largest power is not the optimal strategy for the whole UAV ad-hoc network; the best power lies at some value smaller than the largest power (the optimal value in the figure is a sketch value).

E. The Number of UAVs

In large-scale UAV ad-hoc networks, the number of UAVs is another feature that should be investigated. Since the demanded channel capacity should not exceed the channel capacity provided, we limit the number of UAVs to the range in which every UAV's channel selection can be satisfied. In this scenario, there are $N = 50$ channels, so the number of UAVs should be limited to $M \le 250$. Fig. 12 shows how the number of UAVs affects the computational complexity of SPBLLA. Since the total number of UAVs differs, the goal functions differ: the goal function's value at the optimum state increases with the number of UAVs. As the goal function is the sum of the utility functions, more UAVs contribute more utility, which results in a higher potential function value; moreover, more UAVs cover more area and support more users, which also corresponds to more utility. Fig. 12 also shows how many iterations the UAV ad-hoc network needs to approach convergence: as the number of UAVs grows, more iterations are required.

F. Comparison of PBLLA and SPBLLA

In this subsection, we compare PBLLA and SPBLLA to investigate the superiority of SPBLLA. We fix $\tau$ at several different values and compare the convergence rates of PBLLA and SPBLLA. Fig. 13 presents the learning rates of PBLLA and SPBLLA when $\tau = 0.01$. As $m$ increases, the learning rate of SPBLLA decreases, as shown in Fig. 13.

Fig. 10: Effect of the probability index m on SPBLLA. Lower probability indices create higher altering chances and higher learning rates.
Fig. 11: Effect of power on the utility. The best power lies at values smaller than the largest power.
Fig. 12: Effect of the number of UAVs in SPBLLA. More UAVs provide more utility for the UAV ad-hoc network.
Fig. 13: Comparison of PBLLA with SPBLLA when τ = 0.01. With three altering probabilities, SPBLLA's learning rates are much higher than that of PBLLA.
Fig. 14: Comparison of PBLLA with SPBLLA for various τ. SPBLLA's learning rates are not always higher than those of PBLLA; the red circle shows the situation where SPBLLA is slower than PBLLA.
Fig. 15: τ and m's impact on the probability of altering strategies ω = (e^{-1/τ})^m. The lines are equal-probability lines for various τ and m.

However, when $m$ is small, SPBLLA's learning rate is about three times that of PBLLA, showing the great advantage of synchronous learning. The same phenomenon also exists for $\tau = 0.015$ and $\tau = 0.02$, as shown in Fig. 14. Since PBLLA permits only a single UAV to alter its strategy per iteration, SPBLLA's synchronous learning rate is much higher than PBLLA's. Moreover, in a large-scale UAV network with high dynamics, PBLLA needs information exchange to decide the update order, which severely prolongs the learning time; PBLLA's learning time can be four times as long as SPBLLA's. Thus we conclude that under the same conditions (the same $\tau$ and other indices), SPBLLA performs better and is more suitable for large-scale, highly dynamic environments than PBLLA, improving the learning rate several times over; with a larger altering probability, SPBLLA is even more powerful.

However, we have to recognize that the altering probability $\omega$ severely impacts the efficiency of SPBLLA. If Theorem 5 forces $m$ to be a large value, the probability decreases; when $m$ is too large, UAVs hardly move and the learning rate drops, to the point where SPBLLA becomes slower than PBLLA. In our UAV ad-hoc network scenario, when $\tau = 0.01$ and $m = 0.03$, circled in Fig. 14, the probability of altering strategies is $\omega < 0.01$; the altering probability in SPBLLA is then less than that of PBLLA, and SPBLLA spends more learning time. Fig. 15 shows the impact of $\tau$ and $m$ on the probability of altering strategies $\omega = (e^{-1/\tau})^m$. The red circle in Fig. 15 matches the red circle in Fig. 14, where SPBLLA is not efficient. The line $\omega = (e^{-1/\tau})^m = 0.01$ marks equal efficiency of the two algorithms, where SPBLLA alters one UAV per iteration on average. For post-disaster scenarios with the same dynamic degree, a lower $m$ permits a higher strategy-changing probability. If we want to increase the altering probability, we can limit the utility change in each iteration so as to reduce $m$; nevertheless, this reduces the update amount per iteration, which also harms the learning rate. How to balance the altering probability and the update amount is a new topic that needs further investigation, as is how to reduce $m$ in Theorem 5.

VII. CONCLUSION

In this paper, we establish a UAV ad-hoc network over a large-scale post-disaster area with an aggregative game. We propose a synchronous learning algorithm (SPBLLA) that expedites the learning rate compared with the asynchronous learning algorithm (PBLLA) and shows the desired behavior in highly dynamic scenarios. The learning rate of SPBLLA can be ten times that of PBLLA. Even though fluctuation exists during convergence, the SNR is improved and the network can cover over 95% of the post-disaster area. Our analysis also shows that more UAVs provide higher utilities in the network.
Though our proposed algorithm fits highly dynamic environments, the learning rate of SPBLLA decreases when $m$ becomes large. In future work, an improved SPBLLA that supports a wide range of $m$ will be needed.

APPENDIX A
PROOF OF THEOREM 2

Proof: Formulate a function $\phi$:

$\phi(s) = \sum_{i \in M} \Big\{ \dfrac{AE}{\sum_{1 \le n \le N} P_{in}} + B \Big[ \sum_{1 \le n \le N} P_{in} - \alpha \bar{D}_i \Big] + C \dfrac{\tilde{D}_i}{D} \Big\}$.  (27)

When UAV $i$ changes its strategy from $s$ to $s'$, its utility changes by

$U_i(s_i', s_{-i}) - U_i(s_i, s_{-i}) = \Big( \dfrac{AE}{\sum_{1 \le n \le N} P'_{in}} + B \big\{ \sum [P_i' - \gamma \sigma_{ip}(s_{-i})] \otimes C_i - \alpha \bar{D}_i' \big\} + C \dfrac{\tilde{D}_i'}{D} \Big) - \Big( \dfrac{AE}{\sum_{1 \le n \le N} P_{in}} + B \big\{ \sum [P_i - \gamma \sigma_{ip}(s_{-i})] \otimes C_i - \alpha \bar{D}_i \big\} + C \dfrac{\tilde{D}_i}{D} \Big) = A \Big( \dfrac{E}{\sum_{1 \le n \le N} P'_{in}} - \dfrac{E}{\sum_{1 \le n \le N} P_{in}} \Big) + B \sum_{1 \le n \le N} (P'_{in} - P_{in}) + B\alpha \Big[ \Big( D_i' - \kappa \sum_{j \neq i} D_j \Big) - \Big( D_i - \kappa \sum_{j \neq i} D_j \Big) \Big] + C \Big( \dfrac{\tilde{D}_i'}{D} - \dfrac{\tilde{D}_i}{D} \Big) = A \Big( \dfrac{E}{\sum_{1 \le n \le N} P'_{in}} - \dfrac{E}{\sum_{1 \le n \le N} P_{in}} \Big) + B \sum_{1 \le n \le N} (P'_{in} - P_{in}) + B\alpha (D_i' - D_i) + C \Big( \dfrac{\tilde{D}_i'}{D} - \dfrac{\tilde{D}_i}{D} \Big)$.  (28)

As for the function $\phi$,

$\phi(s_i', s_{-i}) - \phi(s_i, s_{-i}) = A \Big( \dfrac{E}{\sum_{1 \le n \le N} P'_{in}} - \dfrac{E}{\sum_{1 \le n \le N} P_{in}} \Big) + B \sum_{1 \le n \le N} (P'_{in} - P_{in}) + B\alpha (D_i' - D_i) + C \Big( \dfrac{\tilde{D}_i'}{D} - \dfrac{\tilde{D}_i}{D} \Big)$.  (29)

Therefore $U_i(s_i', s_{-i}) - U_i(s_i, s_{-i}) = \phi(s_i', s_{-i}) - \phi(s_i, s_{-i})$, and $\phi$ is the potential function of the UAV ad-hoc network game.

APPENDIX B
PROOF OF THEOREM 4

Proof: Define the state in which the UAV ad-hoc network holds its latest two strategy profiles $s(t-1)$ and $s(t)$ as the tuple $a(t) = [s(t-1), s(t), x(t)]$, where $x(t) = [x_1(t), x_2(t), \ldots, x_M(t)]$. Let $\epsilon = e^{-1/\tau}$. As $\epsilon \to 0^+$, $\omega \to 0$, and the unperturbed process corresponds to UAVs never altering strategies. It is evident that the recurrent classes of the UAV ad-hoc network game under the unperturbed process are the states $[s, s, \mathbf{0}]$, in which the two strategy profiles are identical and every element of $x(t)$ is 0. Any transition of the SPBLLA process can be written as $z_1 = [s^0, s^1, x^1] \to z_2 = [s^1, s^2, x^2]$; specifically,

$x_i^1 = 0 \Rightarrow \big( x_i^2 = 0, \; s_i^2 = s_i^1 \big)$ or $\big( x_i^2 = 1, \; s_i^2 \in C_i(s_i^1) \big)$; $\quad x_i^1 = 1 \Rightarrow x_i^2 = 0, \; s_i^2 \in \{s_i^0, s_i^1\}$.

The probability of the transition from $z_1$ to $z_2$ is

$P^\epsilon_{z_1 \to z_2} = \Big( \prod_{i: x_i^1 = 0, x_i^2 = 0} (1 - \omega) \Big) \Big( \prod_{i: x_i^1 = 0, x_i^2 = 1} \dfrac{\omega}{|C_i(s_i^1)|} \Big) \Big( \prod_{i: x_i^1 = 1, s_i^2 = s_i^0} \dfrac{\epsilon^{-U_i(s^0)}}{\epsilon^{-U_i(s^0)} + \epsilon^{-U_i(s^1)}} \Big) \Big( \prod_{i: x_i^1 = 1, s_i^2 = s_i^1} \dfrac{\epsilon^{-U_i(s^1)}}{\epsilon^{-U_i(s^0)} + \epsilon^{-U_i(s^1)}} \Big)$.  (30)

Denote

$V_i(s^0, s^1) := \max\{U_i(s^0), U_i(s^1)\}$,  (31)

$s(x) := \sum_i x_i$.  (32)

Multiplying the numerator and denominator of $P^\epsilon_{z_1 \to z_2}$ by $\prod_{i \in M} \epsilon^{V_i(s^0, s^1)}$, and letting

$R(z_1 \to z_2) = m\, s(x^2) + \sum_{i: x_i^1 = 1, s_i^2 = s_i^0} \big( V_i(s^0, s^1) - U_i(s^0) \big) + \sum_{i: x_i^1 = 1, s_i^2 = s_i^1} \big( V_i(s^0, s^1) - U_i(s^1) \big)$,

we obtain

$\dfrac{P^\epsilon_{z_1 \to z_2}}{\epsilon^{R(z_1 \to z_2)}} = \Big( \prod_{i: x_i^1 = 0, x_i^2 = 0} (1 - \epsilon^m) \Big) \Big( \prod_{i: x_i^1 = 0, x_i^2 = 1} \dfrac{1}{|C_i(s_i^1)|} \Big) \Big( \prod_{i: x_i^1 = 1} \dfrac{1}{\epsilon^{V_i(s^0,s^1) - U_i(s^0)} + \epsilon^{V_i(s^0,s^1) - U_i(s^1)}} \Big)$.  (33)
APPENDIX B
PROOF OF THEOREM 4

Proof: Define the state in which the UAV ad-hoc network holds the latest two strategy profiles s(t−1) and s(t) as the tuple z(t) = [s(t−1), s(t), x(t)], where x(t) = [x_1(t), x_2(t), ..., x_M(t)]. Let ϵ = e^{-1/τ}. When ϵ approaches 0+, ω = ϵ^m → 0, so the unperturbed process corresponds to UAVs never altering strategies. It is clear that the recurrent classes of the UAV ad-hoc network game under the unperturbed process are the states [s, s, 0], in which the two strategy profiles are identical and every element of x(t) is 0.

Any transition in the SPBLLA process can be written as z^1 = [s^0, s^1, x^1] → z^2 = [s^1, s^2, x^2], where specifically

x^1_i = 0 ⇒ either x^2_i = 0 with s^2_i = s^1_i, or x^2_i = 1 with s^2_i ∈ C_i(s^1_i);
x^1_i = 1 ⇒ x^2_i = 0 with s^2_i ∈ {s^0_i, s^1_i}.

The probability of the transition from z^1 to z^2 is

$$P^{\epsilon}_{z^1\to z^2}=\Big(\prod_{i:x^1_i=0,\,x^2_i=0}(1-\omega)\Big)\Big(\prod_{i:x^1_i=0,\,x^2_i=1}\frac{\omega}{|C_i(s^1_i)|}\Big)\Big(\prod_{i:x^1_i=1,\,s^2_i=s^0_i}\frac{\epsilon^{-U_i(s^0)}}{\epsilon^{-U_i(s^0)}+\epsilon^{-U_i(s^1)}}\Big)\Big(\prod_{i:x^1_i=1,\,s^2_i=s^1_i}\frac{\epsilon^{-U_i(s^1)}}{\epsilon^{-U_i(s^0)}+\epsilon^{-U_i(s^1)}}\Big).\tag{30}$$

Denote

$$V_i(s^0,s^1):=\max\{U_i(s^0),U_i(s^1)\},\tag{31}$$

$$s(x):=\sum_i x_i.\tag{32}$$

Multiplying the numerator and denominator of $P^{\epsilon}_{z^1\to z^2}$ by $\prod_{i\in\mathcal{M}}\epsilon^{V_i(s^0,s^1)}$, and letting

$$R(z^1\to z^2)=m\,s(x^2)+\sum_{i:x^1_i=1,\,s^2_i=s^0_i}\big(V_i(s^0,s^1)-U_i(s^0)\big)+\sum_{i:x^1_i=1,\,s^2_i=s^1_i}\big(V_i(s^0,s^1)-U_i(s^1)\big),$$

we obtain

$$\frac{P^{\epsilon}_{z^1\to z^2}}{\epsilon^{R(z^1\to z^2)}}=\Big(\prod_{i:x^1_i=0,\,x^2_i=0}(1-\epsilon^m)\Big)\Big(\prod_{i:x^1_i=0,\,x^2_i=1}\frac{1}{|C_i(s^1_i)|}\Big)\Big(\prod_{i:x^1_i=1,\,s^2_i=s^0_i}\frac{1}{\epsilon^{V_i(s^0,s^1)-U_i(s^0)}+\epsilon^{V_i(s^0,s^1)-U_i(s^1)}}\Big)\Big(\prod_{i:x^1_i=1,\,s^2_i=s^1_i}\frac{1}{\epsilon^{V_i(s^0,s^1)-U_i(s^0)}+\epsilon^{V_i(s^0,s^1)-U_i(s^1)}}\Big).\tag{33}$$

When ϵ → 0+, we have ϵ^{V_i(s^0,s^1)-U_i(s^0)} + ϵ^{V_i(s^0,s^1)-U_i(s^1)} → 1, since one of the two exponents is zero (at the maximizing profile) and the other is positive. Thus 0 < lim_{ϵ→0+} P^{ϵ}_{z^1→z^2}/ϵ^{R(z^1→z^2)} < ∞, and R(z^1→z^2) is the resistance of the transition from z^1 to z^2. Hence SPBLLA induces a regular perturbed Markov process.

According to Lemma 1 in [39], the stochastically stable strategy profiles are those with minimum stochastic potential, and a strategy profile is stochastically stable only if it lies in a recurrent class of the unperturbed process P^0. We therefore only need to prove that the root of the minimum-resistance trees over strategy profiles is a maximizer of ϕ. Denote three kinds of states X, Y, and Z, where X is any state that cannot be written as [s, s, 0], Y = [s, s, 0] with s not a maximizer of ϕ, and Z = [s, s, 0] with s a maximizer of ϕ. Only Y and Z are candidates for stochastically stable states.

[Fig. 16: Tree T contains an edge s → s̃ in which multiple UAVs alter strategies. The transformation from T to T', which removes s → s̃: add the edges of L (edges s → s^1, s^1 → s^2, and s^2 → s̃) to the tree T, and then remove the edges of L_R (edges s^1 → s^5, s^2 → s^3, and s → s̃) from the tree T.]

Since X cannot be a stochastically stable state, we focus on Y and Z. Build a minimum-resistance tree T rooted at s*, in which some edges may have multiple UAVs altering strategies. On each such edge there are several UAVs with s^k_i ≠ s^{k-1}_i, which compose a group G ⊆ M. The probability of the transition from s to s' is

$$P^{\epsilon}_{s\to s'}=\sum_{S\subseteq\mathcal{M}:\,G\subseteq S}\epsilon^{m|S|}(1-\epsilon^m)^{|\mathcal{M}\setminus S|}\prod_{i\in\mathcal{M}}\frac{\epsilon^{-U_i(s')}}{\epsilon^{-U_i(s')}+\epsilon^{-U_i(s)}},$$

and the resistance of the transition from s to s' is

$$R(s\to s')=m|G|+\sum_{i\in G}\big(V_i(s,s')-U_i(s')\big).\tag{34}$$

Denoting by Δ the upper bound of V_i(s,s') − U_i(s') for any i ∈ M, the resistance of the transition from s to s' satisfies

$$m|G|+\Delta|G|\;\ge\;R(s\to s')\;\ge\;m|G|.\tag{35}$$

In the first case, a single UAV alters its strategy on each edge [s → s̃] ∈ T; the argument in [38] then shows that s* is a maximizer of the potential function. In the other case, assume there exists an edge [s → s̃] ∈ T on which multiple UAVs alter strategies, and let G̃ be the group of these UAVs. The resistance of this edge satisfies

$$R(s\to\tilde{s})\;\ge\;m|\tilde{G}|.\tag{36}$$

Consider a path L = {s = s^0 → s^1 → ... → s^{|G̃|} = s̃} in which, at each step k ∈ {1, ..., |G̃|}, exactly one UAV alters its strategy and the remaining UAVs keep their original strategies. Eq. (35) shows that the resistance of each of these edges satisfies

$$R(s^{k-1}\to s^k)\;\le\;m+\Delta,\tag{37}$$

so the resistance of L is at most

$$R(L)\;\le\;|\tilde{G}|(m+\Delta).\tag{38}$$

Build a new tree T', also rooted at s*, by removing the redundant edges L_R from T and adding the edges of L. The redundant edges consist of [s → s̃] and the other directed edges leaving the strategy profiles in {s^1, s^2, ..., s^{|G̃|-1}} that are not in L. Fig. 16 illustrates how the new tree T' is formed. From Eq. (36), R(s → s̃) ≥ m|G̃|, and each remaining edge of L_R has resistance at least m. The resistance of L_R therefore satisfies

$$R(L_R)\;\ge\;m|\tilde{G}|+m(|\tilde{G}|-1).\tag{39}$$

The resistance of tree T' is

$$R(T')=R(T)+R(L)-R(L_R)\;\le\;R(T)+|\tilde{G}|(m+\Delta)-\big(m|\tilde{G}|+m(|\tilde{G}|-1)\big)=R(T)+\Delta|\tilde{G}|-m(|\tilde{G}|-1).\tag{40}$$

Since |G̃| ≥ 2, if m > 2Δ then R(T') < R(T). This contradiction implies that trees containing edges on which multiple UAVs change strategies are not minimum-resistance trees, so this case cannot occur. Therefore, only maximizers of the potential function are stochastically stable.
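The contradiction step hinges on the sign of the bound in (40). A minimal numeric sketch (Δ and m are assumed illustrative values, not quantities from the paper) confirming that Δ|G̃| − m(|G̃|−1) < 0 for every group size |G̃| ≥ 2 once m > 2Δ:

```python
# Illustrative check of the contradiction step in the proof of Theorem 4:
# with an assumed Delta (upper bound on V_i(s, s') - U_i(s')), any
# m > 2 * Delta makes the right-hand side of (40) negative for all |G~| >= 2.
DELTA = 0.8
m = 2 * DELTA + 0.1

for g in range(2, 9):                  # g plays the role of |G~|
    bound = DELTA * g - m * (g - 1)    # R(T') - R(T) is at most this, by (40)
    assert bound < 0                   # T would not be a minimum-resistance tree
    print(f"|G~| = {g}: R(T') - R(T) <= {bound:+.3f} < 0")
```

Since |G̃|/(|G̃|−1) ≤ 2 for |G̃| ≥ 2, the condition m > 2Δ is exactly what makes the bound negative uniformly in the group size, which is why Theorem 4 imposes it.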
APPENDIX C
PROOF OF THEOREM 5

Proof: Notice that Δ_max = |U_i(s') − U_i(s)|_max, where the difference between s' and s is one that can be produced in a single iteration. Define δ(x) as the change of x in one iteration. In one iteration,

$$\begin{aligned}
U_i(s')-U_i(s)&=\Big(\frac{AE}{\sum_{1\le n\le N}P'_{in}}+B\Big\{\sum\big[(P'_i-\gamma\sigma_{ip}(s_{-i}))\otimes C_i\big]-\alpha\bar{D}'_i\Big\}+C\frac{\tilde{D}'_i}{D}\Big)\\
&\quad-\Big(\frac{AE}{\sum_{1\le n\le N}P_{in}}+B\Big\{\sum\big[(P_i-\gamma\sigma_{ip}(s_{-i}))\otimes C_i\big]-\alpha\bar{D}_i\Big\}+C\frac{\tilde{D}_i}{D}\Big)
\end{aligned}\tag{41}$$

$$=\delta\Big(\frac{AE}{\sum_{1\le n\le N}P_{in}}+B\sum_{1\le n\le N}P_{in}\Big)+\delta\Big(B\gamma\sum\sigma_{ip}(s_{-i})\Big)+\delta(-B\alpha D_i)+\delta\Big(B\alpha\kappa\sum\sigma_{ia}\Big)+\delta\Big(C\frac{\tilde{D}_i}{D}\Big).\tag{42}$$

If each δ(x) approaches its maximum, U_i(s') − U_i(s) attains its maximum. The maxima of the δ(x) terms are as follows:

$$\delta\Big(\frac{AE}{\sum_{1\le n\le N}P'_{in}}+B\sum_{1\le n\le N}P'_{in}\Big)_{\max}=AE\Big(\frac{1}{\sum C_i\cdot P_1}-\frac{1}{\sum C_i\,(P_1+\Delta P)}\Big)+B\sum C_i\cdot\Delta P=\frac{AE\,\Delta P}{\sum C_i\cdot P_1(P_1+\Delta P)}+B\sum C_i\cdot\Delta P,\tag{43}$$

$$\delta\Big(B\gamma\sum\sigma_{ip}(s_{-i})\Big)_{\max}=B\gamma\,\Delta P\cdot\sum\Big(\sum_{j\ne i}C_j\otimes C_i\Big),\tag{44}$$

$$\delta(-B\alpha D_i)_{\max}=B\alpha\pi\tan^2\theta\,\big[h_{nh}^2-(h_{nh}-\Delta h)^2\big]=B\alpha\pi\tan^2\theta\,(2h_{nh}\Delta h-\Delta h^2),\tag{45}$$

$$\delta\Big(B\alpha\kappa\sum\sigma_{ia}\Big)_{\max}=B\alpha\kappa(M-1)\pi\tan^2\theta\,(2h_{nh}\Delta h-\Delta h^2),\tag{46}$$

$$\delta\Big(C\frac{\tilde{D}_i}{D}\Big)_{\max}=\frac{C}{D}\pi\tan^2\theta\,\big[h_{nh}^{2\beta}-(h_{nh}-\Delta h)^{2\beta}\big].\tag{47}$$

Thus, Δ_max equals the sum of the right-hand sides of Eqs. (43)–(47). When m > Δ_max, m satisfies the assumption that m is large enough.
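For a concrete sense of how Δ_max is assembled, the following sketch sums the right-hand sides of Eqs. (43)–(47). Every parameter value below is a hypothetical placeholder (the weights, coverage-cone angle, step sizes, and channel sums would come from the actual network model), so only the structure of the computation, not the resulting number, is meaningful.

```python
import math

# Hypothetical placeholders for the model parameters in (43)-(47).
A, B, C = 1.0, 0.5, 2.0
ALPHA, KAPPA, GAMMA, BETA = 0.3, 0.1, 0.2, 0.5
E, D, M = 10.0, 100.0, 100
SUM_CI = 8.0          # stands in for sum(C_i), the channel-assignment sum
SUM_OVERLAP = 3.0     # stands in for sum over j != i of C_j (x) C_i
P1, DP = 1.0, 0.1     # base power level P_1 and one-step change Delta_P
H_NH, DH = 50.0, 1.0  # hover altitude h_nh and one-step change Delta_h
THETA = math.radians(30.0)  # half-angle of the coverage cone

tan2 = math.tan(THETA) ** 2
d43 = A * E * DP / (SUM_CI * P1 * (P1 + DP)) + B * SUM_CI * DP              # eq. (43)
d44 = B * GAMMA * DP * SUM_OVERLAP                                          # eq. (44)
d45 = B * ALPHA * math.pi * tan2 * (2 * H_NH * DH - DH ** 2)                # eq. (45)
d46 = B * ALPHA * KAPPA * (M - 1) * math.pi * tan2 * (2 * H_NH * DH - DH ** 2)  # eq. (46)
d47 = (C / D) * math.pi * tan2 * (H_NH ** (2 * BETA) - (H_NH - DH) ** (2 * BETA))  # eq. (47)

delta_max = d43 + d44 + d45 + d46 + d47
print(f"Delta_max = {delta_max:.4f}; Theorem 5 asks for m > Delta_max")
```

Note how the per-iteration step sizes ΔP and Δh drive every term: tightening them shrinks Δ_max, which is exactly the lever discussed in Section VI for reducing m at the cost of smaller updates.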
REFERENCES

[1] S. Saha, S. Nandi, and P. S. Paul, "Designing delay constrained hybrid ad hoc network infrastructure for post-disaster communication," Ad Hoc Networks, vol. 25, pp. 406-429, Sep. 2015.
[2] P. Doherty and P. Rudol, "A UAV search and rescue scenario with human body detection and geolocalization," in Australian Joint Conference on Advances in Artificial Intelligence, Gold Coast, Australia, 2007, pp. 1-13.
[3] V. Akbari and F. S. Salman, "Multi-vehicle synchronized arc routing problem to restore post-disaster network connectivity," European Journal of Operational Research, vol. 257, no. 2, pp. 625-640, Mar. 2017.
[4] G. Tuna, B. Nefzi, and G. Conte, "Unmanned aerial vehicle-aided communications system for disaster recovery," Journal of Network and Computer Applications, vol. 41, no. 1, pp. 27-36, Jan. 2014.
[5] S. O. Koray, "Networking models in flying ad-hoc networks (FANETs): concepts and challenges," Journal of Intelligent & Robotic Systems, vol. 74, no. 1-2, pp. 513-527, Apr. 2014.
[6] A. A. Pirzada and C. McDonald, "Establishing trust in pure ad-hoc networks," in Australasian Conference on Computer Science, Feb. 2004, pp. 1-8.
[7] A. Trotta, M. Di Felice, and F. Montori, "Joint coverage, connectivity, and charging strategies for distributed UAV networks," IEEE Transactions on Robotics, vol. 34, no. 4, pp. 883-900, Aug. 2018.
[8] J. Chen, Q. Wu, and Y. Xu, "Distributed demand-aware channel-slot selection for multi-UAV networks: a game-theoretic learning approach," IEEE Access, vol. 6, pp. 14799-14811, Mar. 2018.
[9] A. Y. Yazıcıoğlu, M. Egerstedt, and J. S. Shamma, "Communication-free distributed coverage for networked systems," IEEE Transactions on Control of Network Systems, vol. 4, no. 3, pp. 499-510, May 2017.
[10] D. H. Tu, J. Park, and S. Shimamoto, "Power and performance tradeoff of MAC protocol for wireless sensor network employing UAV," in International Conference on Advanced Technologies for Communications, Ho Chi Minh City, Vietnam, 2010, pp. 23-28.
[11] J. C. Fan, D. H. Chen, and Y. Peng, "Research on force test technique of UAV model with large aspect ratio in high speed wind tunnel," Journal of Experiments in Fluid Mechanics, vol. 21, no. 3, pp. 62-65, Sep. 2007.
[12] G. Tuna, T. V. Mumcu, and K. Gulez, "Unmanned aerial vehicle-aided wireless sensor network deployment system for post-disaster monitoring," in Emerging Intelligent Computing Technology and Applications, Huangshan, China, 2012, pp. 298-305.
[13] L. E. Blume, "The statistical mechanics of strategic interaction," Games and Economic Behavior, vol. 5, no. 3, pp. 387-424, Feb. 1993.
[14] J. R. Marden, G. Arslan, and J. S. Shamma, "Connections between cooperative control and potential games illustrated on the consensus problem," in 2007 European Control Conference (ECC), Kos, Greece, 2007, pp. 4604-4611.
[15] L. Lin, Q. Sun, and S. Wang, "A geographic mobility prediction routing protocol for ad hoc UAV network," in Globecom Workshops (GC Wkshps), Anaheim, CA, USA, 2012, pp. 1597-1602.
[16] K. Peng, J. Du, F. Lu, Q. Sun, Y. Dong, P. Zhou, and M. Hu, "A hybrid genetic algorithm on routing and scheduling for vehicle-assisted multi-drone parcel delivery," IEEE Access, vol. 7, no. 1, pp. 49191-49200, Dec. 2019.
[17] Y. Xu, Q. Wu, and S. Liang, "Opportunistic spectrum access with spatial reuse: graphical game and uncoupled learning solutions," IEEE Transactions on Wireless Communications, vol. 12, no. 10, pp. 4814-4826, Oct. 2013.
[18] H. Li, J. Ran, and D. Yuan, "An adaptive channel allocation algorithm for hierarchical wireless network," in IEEE International Conference on Computer & Communications, Chengdu, China, 2018, pp. 1-6.
[19] S. Dominic and L. Jacob, "Learning algorithms for joint resource block and power allocation in underlay D2D networks," Telecommunication Systems, vol. 69, no. 3, pp. 285-301, Nov. 2018.
[20] P. Zhou, Y. Chang, and J. Copeland, "Reinforcement learning for repeated power control game in cognitive radio networks," IEEE Journal on Selected Areas in Communications, vol. 30, no. 1, pp. 54-66, Jan. 2012.
[21] P. Zhou, W. Wei, K. Bian, D. O. Wu, Y. Hu, and Q. Wang, "Private and truthful aggregative game for large-scale spectrum sharing," IEEE Journal on Selected Areas in Communications, vol. 35, no. 2, pp. 463-477, Feb. 2017.
[22] J. Koshal, A. Nedić, and U. V. Shanbhag, "Distributed algorithms for aggregative games on graphs," Operations Research, vol. 64, no. 3, pp. 680-704, May 2016.
[23] S. Koulali, E. Sabir, and T. Taleb, "A green strategic activity scheduling for UAV networks: A sub-modular game perspective," IEEE Communications Magazine, vol. 54, no. 5, pp. 58-64, May 2016.
[24] H. Sedjelmaci, S. M. Senouci, and N. Ansari, "Intrusion detection and ejection framework against lethal attacks in UAV-aided networks: a Bayesian game-theoretic methodology," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 5, pp. 1-11, Jan. 2017.
[25] M. A. Messous, H. Sedjelmaci, and N. Houari, "Computation offloading game for an UAV network in mobile edge computing," in IEEE International Conference on Communications, Paris, France, 2017, pp. 1-6.
[26] I. Bekmezci, O. K. Sahingoz, and S. Temel, "Flying ad-hoc networks (FANETs): A survey," Ad Hoc Networks, vol. 11, no. 3, pp. 1254-1270, May 2013.
[27] M. K. Jensen, "Aggregative games and best-reply potentials," Economic Theory, vol. 43, no. 1, pp. 45-66, Apr. 2010.
[28] R. Cornes and R. Hartley, "The geometry of aggregative games," School of Economics, Jul. 2005.
[29] U. Challita and W. Saad, "Network formation in the sky: Unmanned aerial vehicles for multi-hop wireless backhauling," in GLOBECOM 2017 - 2017 IEEE Global Communications Conference, Singapore, 2017, pp. 1-6.
[30] Y. Xu, J. Wang, and Q. Wu, "Opportunistic spectrum access in unknown dynamic environment: A game-theoretic stochastic learning solution," IEEE Transactions on Wireless Communications, vol. 11, no. 4, pp. 1380-1391, Apr. 2012.
[31] D. Wu, Y. Xu, and Q. Wu, "Resource allocation for D2D wireless networks with asymmetric social weighted graph," IEEE Communications Letters, vol. 21, no. 9, pp. 2085-2088, May 2017.
[32] M. S. Ali, P. Coucheney, and M. Coupechoux, "Load balancing in heterogeneous networks based on distributed learning in near-potential games," IEEE Transactions on Wireless Communications, vol. 15, no. 7, pp. 5046-5059, Jul. 2015.
[33] S. Rahili and W. Ren, "Game theory control solution for sensor coverage problem in unknown environment," in 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA, 2014, pp. 1173-1178.
[34] M. Hasanbeig and L. Pavel, "On synchronous binary log-linear learning and second order Q-learning," IFAC-PapersOnLine, vol. 50, no. 1, pp. 8987-8992, Jul. 2017.
[35] M. Hasanbeig and L. Pavel, "From game-theoretic multi-agent log linear learning to reinforcement learning," arXiv preprint, Feb. 2018.
[36] L. Bing, Q. Cui, and W. Hui, "Optimal joint water-filling for OFDM systems with multiple cooperative power sources," in Global Telecommunications Conference, Miami, FL, USA, 2010, pp. 1-5.
[37] D. Monderer and L. S. Shapley, "Potential games," Games & Economic Behavior, vol. 14, no. 1, pp. 124-143, May 1996.
[38] J. R. Marden and J. S. Shamma, "Revisiting log-linear learning: Asynchrony, completeness and payoff-based implementation," in 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Allerton, IL, USA, 2011, pp. 1-40.
[39] H. P. Young, "The evolution of conventions," Econometrica, vol. 61, no. 1, pp. 57-84, Jan. 1993.
[40] Y. Ding, Y. Huang, and G. Zeng, "Using partially overlapping channels to improve throughput in wireless mesh networks," IEEE Transactions on Mobile Computing, vol. 11, no. 11, pp. 1720-1733, Nov. 2012.
