Decentralized Knowledge and Learning in Strategic Multi-user Communication


Authors: Yi Su, Mihaela van der Schaar

Yi Su and Mihaela van der Schaar
Dept. of Electrical Engineering, UCLA
{yisu, mihaela}@ee.ucla.edu

I. INTRODUCTION

A. Motivation

Multi-user wireless communication systems form competitive environments in which heterogeneous, strategic users compete for the available spectrum resources. These heterogeneous devices may differ in their adopted standards and architectures, their deployed communication algorithms, their experienced environment (e.g. channel conditions and traffic characteristics), their application-defined utilities, etc. [1]. Moreover, these devices can also differ significantly in their ability to sense the environment and acquire information about other users sharing the same spectrum, to exchange information and negotiate spectrum access rules with the competing users and, ultimately, to learn and reason based on the available information in order to select their optimal transmission strategies. Importantly, in most communication scenarios, users compete for resources in a strategic manner, i.e. they aim at maximizing their own utilities.

The majority of past multi-user communications research has focused on analyzing and quantifying the performance of various multi-user environments in which transceivers with similar standards, algorithms or utilities share the available spectrum resources. An underlying assumption has been that the users select their transmission actions based on either complete or no knowledge about their competitors’ protocols, strategies, utility functions, environment, etc. In this report, the term “knowledge” represents the correct beliefs of users about their opponents and the environment. (Footnote 1: Plato defined knowledge as "justified true belief" [http://en.wikipedia.org/wiki/knowledge].)
Specifically, two extreme types of multi-user interaction have been studied extensively in the past:
• the complete knowledge scenario, which assumes that all participating users know (see footnote 2) the other users’ actions, utility forms, etc.;
• the private knowledge scenario (e.g. a user knows only its own channel or traffic characteristics), where users interact by assuming no knowledge about each other.

Hence, these communication scenarios neglect the fact that, in a multi-user communication system, the users’ knowledge is mostly decentralized and heterogeneous. Another assumption in the existing research has been that there is consensus about what constitutes a fair distribution of resources and thus that efficiency and fairness can be simultaneously satisfied. However, this is at odds with the strategic nature of users and, as a result, it encourages a passive participation of transceivers in the multi-user interaction, because wireless users are not able to proactively influence the resource division based on their knowledge. Thus, current multi-user network designs do not consider the heterogeneity and decentralization of the knowledge of the participating devices and do not take advantage of the transceivers’ “smartness” (i.e. their ability to acquire information, learn and reason about their opponents), which in practice often leads to inefficient spectrum usage. For instance, in existing centralized resource management solutions for the 802.11e Hybrid Coordination Function, users operate in a non-collaborative manner and request spectrum access from the moderator (e.g. the access point or base station) by declaring their worst-case resource needs, while disregarding the resource requests of their competing users [48].
In contrast, in this report we present relatively recent research developments aimed at building a new paradigm for multi-user communication environments, where the emerging interaction among users and their resulting performance are driven by the heterogeneous users’ ability to strategically adapt their transmission actions, based on their knowledge about their competitors and the environment. Hence, voluntary rather than externally imposed (e.g. protocol-enforced) collaborations may evolve among strategic users, because such collaborations may allow them to increase their own utility [44]. The development of such new paradigms for multi-user communication scenarios is essential, given the recent proliferation of heterogeneous protocols, devices, applications and services, as well as diverse user needs.

(Footnote 2: The assumption is that this knowledge was obtained by users publicly announcing their private information, or by implicitly assuming that all co-existing users adopt the same protocol [7][13].)

B. A user-centric approach for multi-user communications

In this report, we do not adopt the conventional view of characterizing the multi-user interactions from a system designer’s perspective, which aims to maximize the social welfare of the users by assuming that there is common knowledge about the users’ preferences etc. and that there is consensus about how to define the welfare [45][46]. Instead, we take the viewpoint of self-interested users, which have their own preferences and knowledge, and present a theoretical foundation that enables devices to proactively improve their own utilities by acquiring knowledge based on their strategic interaction with competing users. Note that this user-centric approach can also positively impact the overall communication system performance, as users are now proactively interacting and improving their knowledge.
This decentralized knowledge paradigm is well suited to the heterogeneous and distributed communication environments considered here, where there is little or no common knowledge available.

In the presented paradigm for multi-user wireless systems, the heterogeneous transceivers are modeled as self-interested agents that strategically interact by adapting their transmission actions in order to maximize their own payoffs. The transmission actions are the algorithms and configurations available in the protocols of the various OSI layers (from the application to the physical layer). The heterogeneous users may differ not only in their available transmission actions, but also in their strategies for selecting actions based on their utilities, and in their available knowledge about other users and the environment. The emerging interaction among users and their resulting performance in this paradigm are driven not only by the users’ ability to efficiently adapt their transmission strategies (e.g. their available modulation and coding schemes, or their ability to shape their traffic), but also by their ability to proactively acquire information and form accurate beliefs about their competitors and the environment.

The knowledge that the users can build depends on their ability to acquire information (which may be restricted by their hardware constraints, adopted protocols, etc.) and their capability to reason and learn based on this information, in order to form beliefs about the other users’ actions, policies, protocols, etc. Using their knowledge, the users can predict the responses of their opponents to their actions, and hence optimize their strategies for selecting a specific action given their utility. Thus, the multi-user wireless interaction becomes a game played by strategic agents, based on their available actions, strategies and, additionally, their knowledge.
In short, unlike in traditional communication theory, where the performance of each user and of the system is influenced by the users’ actions, protocols and experienced environment, in the communication paradigm discussed in this report the knowledge based on which users select their actions also plays an essential role. In this report, we show how different forms of knowledge about the environment and competing users can influence the actions selected by various users and, consequently, shape the resulting multi-user interactions and the utilities obtained by the users. We also introduce strategic learning techniques that enable users to accumulate knowledge and maximize their utilities by acquiring information via their interactions with other users and, based on this information, building their beliefs about their competing users and the environment, and optimizing their transmission actions.

C. Quantifying performance in multi-user environments with decentralized knowledge

To characterize multi-user environments with decentralized knowledge, we discuss in this report two related metrics:

(a) The value of knowledge. First, we discuss which performance bounds can be attained by individual users when users and/or resource moderators with different degrees of knowledge about the entire communication system and the competing users (e.g. knowledge of other users’ action spaces, strategies, utility functions, used protocols, etc.) interact. We term these resulting performance bounds “the value of knowledge”. We show in this report how the heterogeneous users can select their actions, given their available knowledge.

(b) The value of learning. We also discuss methods for constructing operational solutions and algorithms that enable each user to approach the performance bound.
For this, users can systematically acquire information about other users and, based on this information, deploy strategic learning solutions that enable them to forecast the other users’ responses and, ultimately, to optimize their transmission actions. We highlight several strategic learning techniques, which require various information overheads and complexity costs, and lead to different performance gains for the wireless users. To quantify these gains, we use a new metric, which we refer to as “the value of learning”.

Fig. 1. Components of the heterogeneous multi-user communication framework with decentralized knowledge.

As shown in Figure 1, while the value of learning depends on the practical capability of the user to reason and dynamically improve its beliefs based on its observations and explicit information exchanges, the value of knowledge represents the upper bound that can be achieved by a user, given its observations and information. The term observation refers to the observable outcome that users can access in the multi-user interaction without explicitly exchanging messages, while information is obtained by users through explicit message exchanges.

In summary, the goal of this report is to provide the reader with a thorough understanding of how the users’ learning abilities, their knowledge, and their strategies for selecting actions impact and shape the multi-user interaction and the users’ resulting utilities. The report is divided into two main parts. First, we discuss how the users can optimize their actions based on their available knowledge and show how to compute the performance bounds achievable by heterogeneous users with different knowledge.
Second, to approach these bounds, we discuss how to construct operational algorithms that allow wireless users to proactively drive the information acquisition and use strategic learning to form beliefs, based on which they can optimize their transmission strategies.

Section II describes the game-theoretic approach in the communication setting and introduces a general mathematical model for communication games. Section III defines the value of available knowledge in communication games, formulates the optimal decision making for users with heterogeneous available knowledge, and validates the performance improvement in power control games. In Section IV, the theory of learning in games is introduced as a tool for users to acquire knowledge and improve their own performance. For instance, it is shown that in power control games, users can improve their performance by applying the regret matching algorithm. Section V concludes the report with the key challenges for future research.

II. A GAME THEORETIC MODEL FOR COMMUNICATION NETWORKS

Game theory provides a formal framework for studying the interactions of strategic agents. Recently, there has been a surge in research activities that employ game theory to model and analyze a wide range of application scenarios in modern communication networks [2]-[4], including resource allocation [5][6], power control and spectrum management [7]-[13], medium access control [14]-[17], routing and congestion control [18]-[22], information theoretical analysis [23]-[25], etc. Depending on the characteristics of different applications, numerous game-theoretic models and solution concepts have been proposed to describe the multi-user interactions and optimize the users’ decisions in communication networks.

Fig. 2.
A general illustration of knowledge-driven communication games.

In this report, we first formulate the multi-user interaction in communication systems as a knowledge-driven strategic game, where users with different levels of knowledge availability compete for limited resources. As shown in Figure 2, the game is formally defined as a tuple $\Omega_{Game} = \langle \mathcal{N}, \mathcal{W}, \mathcal{A}, \mathbf{U}, \mathcal{K} \rangle$, where:
• $\mathcal{N} = \{1, \ldots, N\}$ is the set of wireless users, which are the rational decision-makers in the system;
• $\mathcal{W}$ denotes the available network resources, e.g. frequency bands, time slots, or routes;
• $\mathcal{A} = \times_{n \in \mathcal{N}} \mathcal{A}_n$ is the joint action space, with $\mathcal{A}_n$ being the action set available to user $n$ for playing the resource sharing game (e.g. selecting a specific frequency channel, time slot, or route for transmission);
• $\mathbf{U} = \times_{n \in \mathcal{N}} u_n : \mathcal{A} \times \mathcal{W} \to \mathbb{R}_+^N$ is the utility vector function, defined as a mapping from joint actions and the network resources to an $N$-dimensional real vector, each element of which is the utility of a particular user;
• $\mathcal{K}$ represents the joint knowledge set of all the users. Specifically, for any $\mathbf{k} = (k_1, \ldots, k_N) \in \mathcal{K}$, $k_n$ indicates user $n$’s available knowledge about its opponents and the network resource.

In a communication environment, users interact with each other and determine their actions such that they maximize their utility functions based on their available knowledge. Note that, as opposed to traditional strategic game definitions [26], we purposely include two new components, $\mathcal{W}$ and $\mathcal{K}$, in the game formulation, because both of them play very important roles in communication games. Specifically, the available network resources $\mathcal{W}$ over which the game is played directly influence the utility achievable by the participating users. The next section will investigate the impact of the knowledge $\mathcal{K}$ in communication games. The proposed game formulation is general and can be applied in numerous communication scenarios.
As we will see in the next section, many well-known game models are special cases of the above formulation.

III. KNOWLEDGE IN COMMUNICATION GAMES

A. The Value of Knowledge

Fig. 3. The value of knowledge. (Labeled points: $\mathbf{V}(\mathbf{k}^{priv})$, $\mathbf{V}(\mathbf{k}^{heter})$, $\mathbf{V}(\mathbf{k}^{comp})$.)

We define the policy with which user $n$ engages in the multi-user game, $\pi_n : \mathcal{K}_n \to \mathcal{A}_n$, as a mapping from the user’s available knowledge to its selected action. The value of knowledge is defined as

$$\mathbf{V}(\mathbf{k}) = \mathbf{U}\big(\{\pi_1(k_1), \ldots, \pi_N(k_N)\}, w\big), \tag{1}$$

which is the vector of utilities of the individual users given the actions they select based on their knowledge (see footnote 3). This metric represents the resulting performance given the decentralized knowledge available to the users. However, as we will discuss later, existing research on communication games usually implicitly assumes that the knowledge vector $\mathbf{k}$ takes one of two extremes: either complete knowledge $\mathbf{k}^{comp}$ about the entire multi-user system, or private knowledge $\mathbf{k}^{priv}$ of the users about themselves. By explicitly allowing the knowledge $k_n$ in equation (1) to vary, i.e. by letting individual users have different degrees of knowledge, we aim at providing a systematic characterization of the impact of knowledge availability in communication networks. Figure 3 shows an example of the value of heterogeneous and decentralized knowledge $\mathbf{k}^{heter}$ by comparing it with the conventional solution concepts, namely the Nash equilibrium and the Pareto optimum that will be defined in the next subsection, where users have private knowledge $\mathbf{k}^{priv}$ or complete knowledge $\mathbf{k}^{comp}$, respectively.

(Footnote 3: For the remaining part of this report, we omit $w$ in the expression of utility functions for static network resources.)

Intuitively, users can benefit themselves by gaining more knowledge. However, this may potentially harm other users [49]. As we will show later, it is surprising that there are also settings where it can lead to $\mathbf{V}(\mathbf{k}^{heter}) \succeq \mathbf{V}(\mathbf{k}^{priv})$ (see footnote 4) if $\mathbf{k}^{heter} \supseteq \mathbf{k}^{priv}$, i.e.
a subset of self-interested users with more knowledge may benefit the utilities of all users in the communication system.

B. Special Cases: Private Knowledge and Complete Knowledge

Roughly speaking, the existing multi-user research can be categorized into two types: non-cooperative games and cooperative games. In wireless networks, any action taken by a single user usually affects the utilities of the other users sharing the same resources. Various game-theoretic solutions were developed to characterize the resulting performance in both game models, among which the best known are Nash equilibrium (NE) and Pareto optimality [26]. As we will see, both scenarios implicitly assume that the users’ available knowledge takes a homogeneous form. This is mainly because most research assumes that only users with homogeneous protocols, actions, etc. can interact. However, new systems are currently emerging in which heterogeneous users with different protocols and running different applications co-exist and influence each other.

First, non-cooperative approaches generally assume that the participating users only have private knowledge $k_n^{priv}$ about their own channel conditions, utility functions, etc. These users simply choose actions $a_n \in \mathcal{A}_n$ to selfishly maximize their individual utility functions. In other words, the policy of each user is

$$\pi_n(k_n^{priv}) = \arg\max_{a_n \in \mathcal{A}_n} u_n(a_n, a_{-n}),$$

in which $a_{-n} = (a_1, \ldots, a_{n-1}, a_{n+1}, \ldots, a_N)$. Most non-cooperative approaches are devoted to investigating the existence and properties of NE. An NE is defined as a profile $(a_1^*, \ldots, a_N^*)$ of actions with the property that, for every player $n$, $u_n(a_n^*, a_{-n}^*) \ge u_n(a_n, a_{-n}^*)$ for all $a_n \in \mathcal{A}_n$, i.e. given the other users’ actions, no user can increase its utility alone by changing its action.

(Footnote 4: The symbol $\succeq$ denotes component-wise inequality.)

As shown in Figure
3, it is well known that operating in a non-cooperative manner will generally limit the performance of the device itself as well as that of the whole system, because the available resources are not always effectively exploited due to the conflicts of interest occurring among users [27].

On the other hand, cooperative approaches focus on studying how users can jointly optimize a common objective function $f(u_1, \ldots, u_N)$. This function represents the social welfare (allocation rule) based on which the system-wide resource allocation is performed. Allocation rules, e.g. weighted sum maximization, can provide reasonable allocation outcomes by considering the trade-off between fairness and efficiency. However, this is true only when users agree on the allocation rules or are price-takers, an assumption that disregards the decentralized knowledge of users and their strategic nature. A profile of actions is Pareto optimal if there is no other profile of actions that makes every player at least as well off and at least one player strictly better off. Most cooperative approaches focus on studying how to efficiently find the optimum joint policy $a^* = \arg\max_{a \in \mathcal{A}} f(u_1, \ldots, u_N)$. It should be pointed out that, in order to achieve Pareto optimality, information exchange throughout the whole system is required so that all the users can collaboratively maximize $f(u_1, \ldots, u_N)$ and improve the system efficiency. These cooperative scenarios either assume that complete knowledge $\mathbf{k}^{comp}$ about all the users is held by a trusted moderator or peer (e.g.
access point, base station, selected network leader, etc.), which is given the authority to centrally divide the available resources among the participating users [11], or, in the distributed setting [7][12][13], assume that users exchange price signals $\mathbf{p} = \{p_1, \ldots, p_N\}$ that reflect the “cost” of consuming the constrained resources and maintain the required knowledge $k_n = \{u_n, p_n\}$ to maximize the social welfare and reach Pareto optimal allocations. In both the centralized and the decentralized scenarios, the users are assumed to have the maximization of social welfare as their objective. However, in many practical settings, strategic users aim at optimizing their own utilities based on their knowledge, rather than that of the system.

C. An Example: Power Control Games with Decentralized and Heterogeneous Knowledge

As discussed previously, existing approaches only assume two extremes for the users’ knowledge. However, devices in current wireless networks can differ considerably in their abilities to sense the environment, to gather information, to reason about this information and to proactively make decisions, which motivates us to investigate the impact of heterogeneity (in terms of action sets, utilities, knowledge) in the communication system. The decentralized heterogeneous knowledge of devices can take various forms, e.g. knowledge about resource allocation rules, available resources, opponents’ actions, their strategies, and their utility functions. As an illustration, we investigate a multi-user power control game that considers how users should allocate their power in multi-carrier systems [10].

1) Power Control Games

Fig. 4. Gaussian interference channel model.

Figure 4 shows the frequency-selective Gaussian interference channel model. There are $N$ transmitters and $N$ receivers in the system. Each transmitter and receiver pair can be viewed as a player.
The transfer function of the channel from transmitter $i$ to receiver $j$ is denoted as $H_{ij}(f)$, where $0 \le f \le F_s$. The noise power spectral density (PSD) that receiver $n$ experiences is denoted as $\sigma_n(f)$, and the transmit PSD of user $n$ as $P_n(f)$. The transmit PSD is subject to the power constraint

$$\int_0^{F_s} P_n(f)\, df \le \mathbf{P}_n. \tag{2}$$

For a fixed $P_n(f)$, if interference is treated as noise, user $n$ can achieve the following data rate:

$$R_n = \int_0^{F_s} \log_2\left(1 + \frac{P_n(f)\,|H_{nn}(f)|^2}{\sigma_n(f) + \sum_{j \ne n} P_j(f)\,|H_{jn}(f)|^2}\right) df. \tag{3}$$

In the power control games, the set of players is $\mathcal{N} = \{1, \ldots, N\}$. The set $\mathcal{A}_n$ of actions available to user $n$ is the set of transmit PSDs satisfying the constraint in (2), and $u_n$ is user $n$’s achievable rate $R_n$. To fully capture the performance tradeoff in the system, the rate region is defined as

$$\mathcal{R} = \big\{(R_1, \ldots, R_N) : \exists\, (P_1(f), \ldots, P_N(f)) \text{ satisfying (2) and (3)}\big\}, \tag{4}$$

which contains all the achievable data rate combinations among the users subject to the power constraints. In the following subsections, we investigate the power control games with homogeneous and heterogeneous users, and show that self-interested users can potentially improve the system performance if they have more knowledge.

2) Power Control Game with Homogeneous Users

Both non-cooperative and cooperative solutions have been developed for power control games. First, a non-cooperative iterative water-filling (IW) algorithm has been proposed to optimize the performance without the need for a central controller [10]. It assumes that each user only has knowledge of its own channel gain and of the noise-and-interference PSD, i.e. $k_n^{priv} = \big\{|H_{nn}(f)|^2,\ \sigma_n(f) + \sum_{j \ne n} P_j(f)\,|H_{jn}(f)|^2\big\}$. At every decision stage, self-interested users deploying this algorithm try to myopically maximize their immediate achievable rates.
Specifically, for a two-user system, each user simply updates its transmit PSD by choosing the optimal solutions of

$$\max_{P_1(f)} \int_0^{F_s} \ln\left(1 + \frac{|H_{11}(f)|^2 P_1(f)}{\sigma_1(f) + |H_{21}(f)|^2 P_2(f)}\right) df \quad \text{s.t.} \quad \int_0^{F_s} P_1(f)\, df \le \mathbf{P}_1, \quad P_1(f) \ge 0\ \ \forall f \tag{5}$$

and

$$\max_{P_2(f)} \int_0^{F_s} \ln\left(1 + \frac{|H_{22}(f)|^2 P_2(f)}{\sigma_2(f) + |H_{12}(f)|^2 P_1(f)}\right) df \quad \text{s.t.} \quad \int_0^{F_s} P_2(f)\, df \le \mathbf{P}_2, \quad P_2(f) \ge 0\ \ \forall f. \tag{6}$$

Both users iteratively perform water-filling with respect to their experienced noise-and-interference PSD $\sigma_n(f) + \sum_{j \ne n} P_j(f)\,|H_{jn}(f)|^2$ across the whole frequency band until a Nash equilibrium is reached.

On the other hand, cooperative solutions in power control games can attain Pareto optimality by sharing the complete knowledge $\mathbf{k}^{comp} = \big\{\{H_{ij}(f)\}, \{\sigma_n(f)\}\big\}$ throughout the system and solving the optimization globally [11]. To find the Pareto optimal operating points on the boundary of the convex hull of the rate region $\mathcal{R}$, the cooperative approaches focus on solving the weighted rate-sum maximization

$$\max_{P_1(f), \ldots, P_N(f)} \sum_{n=1}^{N} w_n \int_0^{F_s} \log_2\left(1 + \frac{P_n(f)\,|H_{nn}(f)|^2}{\sigma_n(f) + \sum_{j \ne n} P_j(f)\,|H_{jn}(f)|^2}\right) df \quad \text{s.t.} \quad \int_0^{F_s} P_n(f)\, df \le \mathbf{P}_n\ \ \forall n, \quad P_n(f) \ge 0\ \ \forall n, f, \tag{7}$$

in which the $n$th element of the weight vector $\mathbf{w} = [w_1 \cdots w_N]$ indicates the relative importance of user $n$’s utility [11][12]. Figure 5 shows the achievable rate region for both approaches [12]. As in Figure 3, we can see that there is a substantial performance gain obtained by the cooperative algorithms over the non-cooperative IW algorithm. However, all these algorithms still assume homogeneous participants in the sense that they deploy the same algorithms and have similar information availability.
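The iterative water-filling updates in (5) and (6) can be sketched for a band that is discretized into a small number of frequency bins. The Python code below is only an illustrative sketch under stated assumptions, not the implementation of [10]: the bisection search for the water level, the function names `water_fill` and `iterative_water_filling`, the unit bin width, and the two-bin gain values are all assumptions introduced for this example. Cross gains are stored as `cross[j][n]` (transmitter `j` into receiver `n`) and are power gains.

```python
def water_fill(floor, budget, iterations=100):
    """Single-user water-filling over discrete bins.

    floor[c] is (noise + interference) / direct power gain in bin c.
    Returns p[c] = max(0, mu - floor[c]), where the water level mu is
    found by bisection so that sum(p) matches the power budget.
    """
    low, high = 0.0, max(floor) + budget
    for _ in range(iterations):
        mu = 0.5 * (low + high)
        if sum(max(0.0, mu - x) for x in floor) > budget:
            high = mu
        else:
            low = mu
    return [max(0.0, low - x) for x in floor]


def iterative_water_filling(direct, cross, noise, budgets, rounds=50):
    """Users take turns water-filling against the current interference."""
    users, bins = len(budgets), len(noise[0])
    psd = [[0.0] * bins for _ in range(users)]
    for _ in range(rounds):
        for n in range(users):
            floor = []
            for c in range(bins):
                # Interference seen by receiver n in bin c (treated as noise).
                interference = sum(
                    cross[j][n][c] * psd[j][c] for j in range(users) if j != n
                )
                floor.append((noise[n][c] + interference) / direct[n][c])
            psd[n] = water_fill(floor, budgets[n])
    return psd


# Illustrative two-user, two-bin setting (assumed numbers).
direct = [[1.0, 1.0], [1.0, 1.0]]
cross = [
    [None, [0.8, 0.4]],   # transmitter 1 into receiver 2
    [[0.4, 0.4], None],   # transmitter 2 into receiver 1
]
noise = [[1.0, 1.0], [1.0, 1.0]]
psd = iterative_water_filling(direct, cross, noise, budgets=[10.0, 10.0])
print([[round(p, 3) for p in user] for user in psd])
```

For these assumed numbers the two water-filling conditions reduce to $a = 7 - 0.4b$ and $b = 7 - 0.6a$, where $a$ and $b$ are the powers that users 1 and 2 place in the first bin, so the iteration should settle near $a \approx 5.53$ and $b \approx 3.68$. Each user only needs its own direct gain and its measured noise-plus-interference level, which is exactly the private knowledge assumed by the IW algorithm.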
These homogeneous-knowledge algorithms are therefore not well suited to scenarios where users differ in their ability to acquire information and improve performance.

Fig. 5. Rate regions obtained by various algorithms [12].

3) Heterogeneous Knowledge and Stackelberg Equilibrium

To give an example of how users with heterogeneous knowledge can improve the system performance, we introduce a simple two-user two-channel power control game, where $H_{11} = H_{22} = 1$ and $\sigma_1 = \sigma_2 = 1$ for both channels, $H_{12}(1) = 0.8$, and $H_{12}(2) = H_{21}(1) = H_{21}(2) = 0.4$ (in the utility calculations below, these values enter directly as power gains). The users’ power constraints are $\mathbf{P}_1 = \mathbf{P}_2 = 10$. User 1 has two actions, Spread and Concentrate. In the Spread mode, user 1 splits its power equally between the two channels. In the Concentrate mode, it concentrates all its power in channel 1. User 2 has two similar actions, except that in the Concentrate mode it transmits at its maximum power in channel 2. If the users choose a (Concentrate, Spread) play, user 1’s utility is $\log_2\left(1 + \frac{10}{1 + 0.4 \cdot 5}\right) = 2.12$ and user 2’s is $\log_2\left(1 + \frac{5}{1 + 0.8 \cdot 10}\right) + \log_2(1 + 5) = 3.22$. Similarly, we can calculate the users’ utilities for all the action combinations; the resulting payoff matrix of this game is shown in Figure 6. Note that in this game, user 1 has a strictly dominant strategy, Spread. Therefore, the two players will end up with a (Spread, Spread) play if they have only private knowledge and always take the best response. The resulting outcome in this case is $(2.83, 2.42)$. If user 1 is aware of user 2’s coupled reaction, it will play Concentrate instead, so that the users end up with a (Concentrate, Concentrate) play, which leads to an increased utility of $(3.46, 3.46)$ for both players. It is worth noting that additional knowledge is needed to attain this performance improvement, i.e. user 1 needs to know
the utility and the response strategy of the other user.

                       User 2: Spread    User 2: Concentrate
User 1: Concentrate    2.12, 3.22        3.46, 3.46
User 1: Spread         2.83, 2.42        3.59, 2.12

Fig. 6. A simple power control game: user 1’s utility is given first in each cell, with user 2’s following.

To address heterogeneity in the above case, we introduce the concept of Stackelberg equilibrium [2].

Definition of Stackelberg Equilibrium: Let $NE(a_n)$ be the Nash equilibrium strategy of the remaining players if player $n$ chooses to play $a_n$. The strategy profile $\big(a_n^*, NE(a_n^*)\big)$ is a Stackelberg equilibrium for user $n$ iff $u_n\big(a_n^*, NE(a_n^*)\big) \ge u_n\big(a_n, NE(a_n)\big),\ \forall a_n \in \mathcal{A}_n$.

4) Power Control Game with Heterogeneous Users

In contrast with existing approaches, we study a power control game with two heterogeneous users [28]. Specifically, user 2 has only the knowledge $k_2^{priv} = \big\{|H_{22}(f)|^2,\ \sigma_2(f) + P_1(f)\,|H_{12}(f)|^2\big\}$ and it always takes the best response strategy of water-filling. The knowledge $k_1^{heter}$ of user 1 includes $\{H_{ij}(f)\}$, $\{\sigma_k(f)\}$, and user 2’s water-filling policy $\pi_2(k_2^{priv})$. We apply the concept of Stackelberg equilibrium to determine user 1’s optimal action by formulating the following bi-level programming problem:

$$\begin{aligned}
\text{upper level problem:}\quad & \max_{P_1(f)} \int_0^{F_s} \ln\left(1 + \frac{|H_{11}(f)|^2 P_1(f)}{\sigma_1(f) + |H_{21}(f)|^2 P_2(f)}\right) df \\
& \text{s.t.}\ \int_0^{F_s} P_1(f)\, df \le \mathbf{P}_1, \quad P_1(f) \ge 0, \\
\text{lower level problem:}\quad & P_2(f) = \arg\max_{P_2'(f)} \int_0^{F_s} \ln\left(1 + \frac{|H_{22}(f)|^2 P_2'(f)}{\sigma_2(f) + |H_{12}(f)|^2 P_1(f)}\right) df \\
& \text{s.t.}\ \int_0^{F_s} P_2'(f)\, df \le \mathbf{P}_2, \quad P_2'(f) \ge 0.
\end{aligned} \tag{8}$$

As indicated in (8), user 1 should optimize its transmit PSD by exploiting its knowledge and including the opponent’s policy $\pi_2(k_2^{priv})$ as a constraint in the lower level problem of the bi-level program, whereas in the iterative water-filling algorithm, user 1 solves only the upper level problem, since it only has private knowledge.
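The two-user, two-channel example of Figure 6 can be checked numerically. The following Python sketch is an illustration rather than code from the report: it recomputes the payoffs from the stated gains, treating the cross gains 0.8 and 0.4 as power gains (as the worked utilities do) and assuming that Spread means an equal 5/5 split. The helper names `payoffs`, `best_response`, `nash_equilibria`, `stackelberg_leader_1` and `pareto_optimal` are hypothetical and introduced only for this example.

```python
import math
from itertools import product

ACTIONS = ("Spread", "Concentrate")
NOISE = 1.0          # sigma_1 = sigma_2 = 1 on both channels
DIRECT_GAIN = 1.0    # H_11 = H_22 = 1 on both channels
# Cross power gains per channel, keyed by (transmitter, receiver).
CROSS_GAIN = {(1, 2): (0.8, 0.4), (2, 1): (0.4, 0.4)}
# Power placed on (channel 1, channel 2); each user has total power 10.
# Assumption: "Spread" is an equal split.
POWER = {
    1: {"Spread": (5.0, 5.0), "Concentrate": (10.0, 0.0)},
    2: {"Spread": (5.0, 5.0), "Concentrate": (0.0, 10.0)},
}


def rate(receiver, own_power, rival_power):
    """Sum over channels of log2(1 + SINR), interference treated as noise."""
    rival = 2 if receiver == 1 else 1
    gains = CROSS_GAIN[(rival, receiver)]
    return sum(
        math.log2(1.0 + DIRECT_GAIN * p / (NOISE + g * q))
        for p, q, g in zip(own_power, rival_power, gains)
    )


def payoffs(a1, a2):
    p1, p2 = POWER[1][a1], POWER[2][a2]
    return rate(1, p1, p2), rate(2, p2, p1)


def best_response(player, rival_action):
    if player == 1:
        return max(ACTIONS, key=lambda a: payoffs(a, rival_action)[0])
    return max(ACTIONS, key=lambda a: payoffs(rival_action, a)[1])


def nash_equilibria():
    """Pure-strategy profiles in which both actions are best responses."""
    return [
        (a1, a2) for a1, a2 in product(ACTIONS, repeat=2)
        if best_response(1, a2) == a1 and best_response(2, a1) == a2
    ]


def stackelberg_leader_1():
    """User 1 commits first, anticipating user 2's best response."""
    a1 = max(ACTIONS, key=lambda a: payoffs(a, best_response(2, a))[0])
    return a1, best_response(2, a1)


def pareto_optimal():
    profiles = list(product(ACTIONS, repeat=2))

    def dominated(p):
        u = payoffs(*p)
        return any(
            all(x >= y for x, y in zip(payoffs(*q), u)) and payoffs(*q) != u
            for q in profiles
        )

    return [p for p in profiles if not dominated(p)]


for a1, a2 in product(ACTIONS, repeat=2):
    u1, u2 = payoffs(a1, a2)
    print(f"({a1}, {a2}): {u1:.3f}, {u2:.3f}")
print("Nash:", nash_equilibria())
print("Stackelberg (user 1 leads):", stackelberg_leader_1())
print("Pareto optimal:", pareto_optimal())
```

Under these assumptions the computed payoffs agree with Figure 6 to within rounding of the last digit, the only pure Nash equilibrium is (Spread, Spread), and user 1 acting as a Stackelberg leader selects Concentrate, which moves both users to (Concentrate, Concentrate). This mirrors the comparison between private and heterogeneous knowledge described in the text.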
Fig. 7. Power allocations of different algorithms. (Four panels: IW Algorithm (User 1), Proposed Algorithm (User 1), IW Algorithm (User 2), Proposed Algorithm (User 2); horizontal axis: frequency bins; vertical axes: $P_1(f)$ and $P_2(f)$.)

Fig. 8. Value of heterogeneous knowledge in the power control game. (Axes: $R_1$ (Kbps) and $R_2$ (Kbps); curves: private knowledge (Nash), heterogeneous knowledge (Stackelberg), complete knowledge (Pareto boundary).)

We provide a sub-optimal solution of problem (8) to verify the performance improvement obtained with heterogeneous knowledge. Figure 7 shows the power allocations of both users under the iterative water-filling algorithm and under our proposed algorithm. In the iterative water-filling algorithm, each user water-fills the whole frequency band by regarding its competitor’s interference as background noise. In contrast, user 1 will not water-fill if it has the additional knowledge $k_1^{heter}$ of user 2’s channel state information and power allocation strategy. As shown in Figure 7, user 1 concentrates its power in the interval $[30, 117]$, even though it could gain an immediate increase in $R_1$ by re-allocating some of its power to the regions where the noise PSD is below its water-level, e.g. $[20, 30]$ and $[117, 140]$.

Fig. 9. Histogram of the ratios $R_1' / R_1^{nash}$ and $R_2' / R_2^{nash}$. (Horizontal axis: $R_i' / R_i^{nash}$; vertical axis: probability density.)

To evaluate the performance, we tested a variety of frequency-selective fading channels for which the iterative water-filling algorithm reaches the Nash equilibrium.
The total power of all rays of $H_{11}(f)$ and $H_{22}(f)$ is normalized to one, and that of $H_{12}(f)$ and $H_{21}(f)$ is normalized to 0.5. Figure 8 shows the achievable rate regions of different algorithms for one realization of randomly generated channels, which represent the values of various knowledge, including $k^{priv}$, $k^{heter}$, and $k^{comp}$, in the two-user power control games with different combinations of power constraints. We can see that the achievable rate region is enlarged in the case with heterogeneous users compared with the iterative water-filling solution. In other words, as illustrated in Figure 3, user 1’s additional knowledge can benefit both users even though user 2 behaves in a non-collaborative manner. Figure 9 shows the simulated histogram of the ratios $R_1'/R_1^{nash}$ and $R_2'/R_2^{nash}$, where $R_i'$ is user $i$’s achievable rate under our proposed algorithm in the heterogeneous case and $R_i^{nash}$ is that of user $i$ if both users adopt the water-filling algorithm. Simulation results show that both users gain a rate improvement in most of the realizations. The average rate improvement for user 1 is 16.4%. Interestingly, the average rate improvement for user 2 is 74%, which is significantly higher than that of user 1.

IV. STRATEGIC LEARNING IN COMMUNICATION GAMES

In the previous section, we defined the value of knowledge, which represents the achievable performance bound when heterogeneous users with different amounts of knowledge interact and reach steady-state. In this section, we discuss how users can approach these bounds using strategic learning.

A. Learning in Communication Games

Most non-cooperative game theory research focuses on studying various equilibrium concepts.
The traditional explanation for when and why equilibrium arises is that it results from analysis and introspection by the players in a situation where the rules of the game, the rationality of the players, and the players’ payoff functions are all common knowledge. However, this is difficult to attain in communication networks, where the information is decentralized and often private to the users. Thus, we use another strand of game theory known as “learning in games” [29] in order to construct operational solutions that allow autonomous users to approach, via strategic learning, the performance bounds associated with the knowledge they try to obtain. In contrast with issues such as equilibrium characterization, the area of learning in games focuses on developing constructive methods to achieve various points in the utility space, where agents may or may not be person-by-person optimal. It also provides an alternative explanation of how equilibrium arises, as the long-run outcome of a process in which less than fully rational players grope for optimality over time.

The main interest of strategic learning is to understand the emergent behavior of interacting agents that use simple adaptation rules to adjust and optimize their strategies. The outcome of these interactions need not converge to an equilibrium, i.e., perpetual adaptation of strategies may persist, but the users’ and the system’s utilities may substantially improve compared to the performance of the NE. While learning in games was originally used to develop descriptive models in social systems [47], these methods can also be used as a prescriptive model driving the design of dynamic reconfiguration mechanisms in multi-agent systems. This has recently motivated a growing interest in the application of strategic learning to engineered systems (e.g. in robotics [30]-[32]). However, strategic learning algorithms have only recently been used in communication environments [33]-[41].
In the previous section, we demonstrated the incentives of self-interested users to increase their knowledge. However, the process of acquiring the desired knowledge can be very difficult in general, because the intentions of users can be difficult to predict, and their knowledge and skills can be difficult to envisage. Thus, a strategic user can only predict these uncertainties caused by the competing users based on its explicit information exchanges with other users and its observations of past interaction. To address this challenge, a user can deploy strategic learning solutions to build beliefs based on its available information and, based on these beliefs, optimize its actions.

Strategic learning algorithms have the following benefits when applied in multi-user communications settings. First, strategic learning can address the challenges arising from the distributed availability of information and knowledge in communication networks, where the decisions on how to adapt the transmission strategies need to be made in a distributed manner, as the delay caused by information exchanges and the communication overheads usually do not allow propagating messages back and forth to a central decision maker. Second, as long as communication devices are built according to specific protocols, users’ behaviors, actions and even payoff functions can be parameterized. As a result, users are able to easily model their opponents and strategically learn the desired knowledge, thereby improving their performance through repeated interactions. Third, in dynamic environments, the operating conditions of interacting users are time-varying, and thus users need to adapt to the network dynamics in a timely manner.
In short, strategic learning is an appealing approach for driving the information acquisition and decision making in multi-user communication systems, where users need to optimize their transmission strategies in an informationally decentralized manner.

The existing learning in games literature provides a broad spectrum of analytical and practical results on learning algorithms and underlying game structures for a variety of competitive interaction scenarios. In general, the main issue considered has been to characterize long-term behavior in terms of a generalized equilibrium concept, or to characterize the lack of convergence for general classes of learning dynamics. However, when selecting learning solutions for communication games, the specific constraints and features of the considered multi-user interactions need to be taken into account. For instance, the learning algorithm that should be deployed by a user in a wireless environment strongly depends on what information it can observe about the other users, given the adopted protocols or spectrum regulation rules.

In this report, we assume that knowledge can be gathered by users based on observations and/or explicitly exchanged information. We differentiate these two types of knowledge gathering solutions, since we want to highlight the different communication overheads incurred as well as the challenges that arise when heterogeneous users, deploying different protocols and/or having different learning abilities, interact. There are several different forms of information in communication systems:
- Private information: this can include the utility forms, characteristics of the application traffic, channel gains or channel conditions (SINR, etc.);
- Network information: this refers to the network resource states (e.g. which channels are available for transmission);
- Opponents’ information: this can include the protocols, action sets, strategies, and even utilities of the opponents.

B. Definition of Learning Algorithms and Beliefs in Communication Games

[Fig. 10 diagram labels: $o_i^t$, $s_i^t$, $I_{-i}^t$, $\Omega^t$.]
Fig. 10. An illustration of strategic learning.

We note that a learning algorithm is built based on all the observations $o_i^t$ and exchanged information $I_{-i}^t$ obtained by user $i$, and hence it is denoted as $\mathcal{L}_i(o_i, I_{-i})$. Based on its available information and observations, a user can build its belief about other users’ strategies, policies, and the network resource, and update its response policy. As shown in Figure 10, in a communication environment, a multi-agent strategic learning algorithm $\mathcal{L} = \times_{i \in \mathcal{N}} \mathcal{L}_i$ can be defined using the following equations:

$$a_i^t = \pi_i^t\big(\hat{k}_i^t\big), \quad \text{with } \hat{k}_i^t = \big[s_i^t,\ B^t_{\mathbf{s}_{-i}},\ B^t_{\boldsymbol{\pi}_{-i}},\ B^t_w\big], \tag{9}$$

$$\Omega^t = \mathrm{Game}\big(\mathcal{N}, \mathbf{a}^t, \mathbf{u}^t, \hat{\mathbf{k}}^t, w^t\big), \tag{10}$$

$$\mathbf{o}^t = \times_{i \in \mathcal{N}}\, o_i^t = O\big(\Omega^t\big), \tag{11}$$

$$\pi_i^{t+1} = \mathcal{F}_i\big(\pi_i^t, \hat{k}_i^t\big), \tag{12}$$

$$B^{t+1}_{\boldsymbol{\pi}_{-i}} = \mathcal{F}_{\boldsymbol{\pi}_{-i}}\big(B^t_{\boldsymbol{\pi}_{-i}}, o_i^t, I_{-i}^t\big), \quad B^{t+1}_w = \mathcal{F}_w\big(B^t_w, o_i^t, I_{-i}^t\big), \quad B^{t+1}_{\mathbf{s}_{-i}} = \mathcal{F}_{\mathbf{s}_{-i}}\big(B^t_{\mathbf{s}_{-i}}, o_i^t, I_{-i}^t\big), \tag{13}$$

where $\pi_i^t$ is the policy used by user $i$ to select its action, $s_i^t$ represents the status of user $i$’s private information, $B^t_{\mathbf{s}_{-i}}$, $B^t_{\boldsymbol{\pi}_{-i}}$ and $B^t_w$ are the beliefs about other users’ private information $s_{-i}^t$, policies $\pi_{-i}$ and the network resource $w$, $\hat{k}_i^t$ approximates the true knowledge $k_i^t$, and $I_{-i}^t$ is the explicit information that user $i$ has received about other users and the network resource; $\Omega^t$ is the output of the multi-user interaction; $o_i^t$ is the observation of user $i$ and $O$ is the observation function, which depends on the current game output; $\mathcal{F}_{\boldsymbol{\pi}_{-i}}$, $\mathcal{F}_w$, and $\mathcal{F}_{\mathbf{s}_{-i}}$ are the belief update functions in (13).
Note that different learning algorithms can employ various update functions, and a given algorithm need not include all three of the update functions listed above. Eq. (9) shows that user $i$ takes its action based on its own private information and its beliefs about the other users’ private information, their policies, and the available network resources. After each user determines its action, a multi-user communication game is played and the results of the game are produced as shown in Eq. (10). The results of the multi-user game may or may not be fully observed by the users. Eq. (11) represents the observation function, which depends on the implemented network protocols and on the sensing and measuring abilities of the users. Because observations may be incomplete, a user may have incentives to exchange information with other users. Eq. (12) shows that a user updates its policy based on its private information and beliefs. The exchanged information $I_{-i}^t$ may be used to update the beliefs about the other users’ states and policies and about the network resource state. Eq. (13) represents the updates of these beliefs.

[Fig. 11 diagram labels: $o_i^t = u_i^t(a_i^t)$; $I_{-i}^t = a_{-i}^t$, $s_i^t = u_i^t(a_i^t, a_{-i}^t)$.]
Fig. 11. Illustration of strategic learning solutions based on the required observation and information.

An important consideration in strategic learning solutions is the observation $o_i^t$ and the information $I_{-i}^t$ available to each user, as well as the users’ ability to process this information.
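The act, play, observe, update cycle of Eqs. (9)-(12) can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses the two-action game of Fig. 6 as the stage game, lets each user observe nothing but its own payoff ($o_i^t = u_i^t$), and instantiates the policy update $\mathcal{F}_i$ with a simple cumulative-payoff reinforcement rule. The class and function names are hypothetical, and belief updates as in Eq. (13) are omitted because this learner keeps no beliefs about its opponent.

```python
import random

# Stage game of Fig. 6 (row/column reading of the figure is an assumption):
# (user 1 action, user 2 action) -> (u1, u2).
ACTIONS = ("Spread", "Concentrate")
PAYOFF = {
    ("Concentrate", "Spread"): (2.12, 3.22),
    ("Concentrate", "Concentrate"): (3.46, 3.46),
    ("Spread", "Spread"): (2.83, 2.42),
    ("Spread", "Concentrate"): (3.59, 2.12),
}


class ReinforcementLearner:
    """Keeps one propensity per action; the policy is the normalized propensity."""

    def __init__(self, n_actions, rng):
        self.propensity = [1.0] * n_actions
        self.rng = rng

    def policy(self):
        total = sum(self.propensity)
        return [q / total for q in self.propensity]

    def act(self):
        # Eq. (9): draw an action from the current policy.
        indices = range(len(self.propensity))
        return self.rng.choices(indices, weights=self.propensity)[0]

    def update(self, action, payoff):
        # Eq. (12): reinforce the played action by the payoff it earned.
        self.propensity[action] += payoff


def run_learning_loop(rounds, seed=0):
    rng = random.Random(seed)
    users = [ReinforcementLearner(len(ACTIONS), rng) for _ in range(2)]
    for _ in range(rounds):
        chosen = [user.act() for user in users]
        # Eq. (10): the game maps the joint action to an outcome.
        outcome = PAYOFF[(ACTIONS[chosen[0]], ACTIONS[chosen[1]])]
        # Eq. (11): each user observes only its own payoff.
        for user, action, payoff in zip(users, chosen, outcome):
            user.update(action, payoff)
    return [user.policy() for user in users]
```

Calling `run_learning_loop(200)` returns the two users' mixed policies after 200 stages. Swapping in a different `update` rule or adding belief variables changes the learning algorithm without changing the loop itself.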
Figure 11 highlights three conventional strategic learning solutions [29][42]:
1) Fictitious play: a wireless user’s strategy is the best response to the empirically observed actions of all other users;
2) Regret matching: the user’s strategies are selected to eliminate the “regret” of not having played different actions, based on retrospective observations of other users’ strategies;
3) Reinforcement learning: playing a strategy increases or decreases the probability of it being played in the future, depending on the utility received when playing that action.

As shown in Figure 11, both regret matching and fictitious play require the functional expression of the utility $u_i^t(a_i^t, a_{-i}^t)$ and assume that the actions $a_{-i}^t$ of other users can be observed or explicitly exchanged. Of course, this may not be feasible, especially for “large scale” games that involve many users, or when the other users adopt different protocols. In contrast, reinforcement learning only assumes that each user knows the payoffs $u_i^t(a_i^t)$ it received for its own past actions. It is therefore easier to implement in wireless networks where users with different protocols operate and thus cannot easily exchange information or interpret their opponents’ strategies based on their observations. Another difference among these learning methods is their required complexity, which is especially important if they are deployed by low-power wireless devices. Hence, depending on their available observations and information, strategic users can deploy different learning solutions to achieve various performance versus complexity trade-offs. If users deploy the same set of actions, strategies, and protocols, and are able to observe the other users’ actions, they can model the other users using “self-play” [47], and hence they can easily develop strategies for efficiently sharing the network resources by anticipating the other users’ actions.
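To make the information requirements concrete, here is a minimal fictitious-play sketch for two users. It assumes, as the text above notes for this method, that each user knows its own utility function and can observe the opponent's past actions. The stage game is the two-action game of Fig. 6 (with the row/column reading of the figure taken as an assumption), initial beliefs are one pseudo-observation per opponent action, and all function names are illustrative.

```python
# (user 1 action, user 2 action) -> (u1, u2), as read from Fig. 6.
ACTIONS = ("Spread", "Concentrate")
PAYOFF = {
    ("Concentrate", "Spread"): (2.12, 3.22),
    ("Concentrate", "Concentrate"): (3.46, 3.46),
    ("Spread", "Spread"): (2.83, 2.42),
    ("Spread", "Concentrate"): (3.59, 2.12),
}


def utility(user, own, other):
    """Utility of `user` (0 or 1) when it plays `own` and the opponent plays `other`."""
    profile = (own, other) if user == 0 else (other, own)
    return PAYOFF[profile][user]


def best_response(user, opponent_counts):
    """Best response to the empirical frequency of the opponent's past actions."""
    total = sum(opponent_counts.values())

    def expected(own):
        return sum(count / total * utility(user, own, other)
                   for other, count in opponent_counts.items())

    return max(ACTIONS, key=expected)


def fictitious_play(rounds):
    # counts[i][a] = how often user i has seen its opponent play a (plus a prior of 1).
    counts = [{a: 1 for a in ACTIONS}, {a: 1 for a in ACTIONS}]
    history = []
    for _ in range(rounds):
        a1 = best_response(0, counts[0])
        a2 = best_response(1, counts[1])
        counts[0][a2] += 1  # user 1 observes user 2's action
        counts[1][a1] += 1  # user 2 observes user 1's action
        history.append((a1, a2))
    return history
```

In this particular game Spread is a dominant action for user 1, so both users play Spread in every round and the process stays at the Nash equilibrium; richer games can produce cycling instead.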
However, if the heterogeneous users are not able to observe or interpret the other users’ actions (e.g. because they deploy different protocols), they may share the available network resources in a very inefficient manner, as previously discussed. Hence, users may decide that it is beneficial for them to exchange explicit information about their actions, and potentially even their strategies or utilities [40]. (Such information exchanges can be implemented using existing or new signaling protocols [1] at the application or MAC layer.) Therefore, users need to decide which strategic learning solution, information and observations have higher value for them. Next, we discuss how users can quantify the value of learning, information, and observations.

C. Value of Learning, Information, and Observations

As mentioned previously, the performance of a learning algorithm can be quantified based on the resulting payoffs obtained by users. We denote the policy generated by a learning algorithm $\mathcal{L}$ as $\pi_{\mathcal{L}}$. Users learn in order to improve their policies and payoffs from participating in the communication game. Given the available information and observations, the value of the learning algorithm $\mathcal{L}$ is defined as the time-average payoff obtained in a time window of length $T$ in which this learning algorithm was used:

$$V_{\mathcal{L}}(\mathbf{o}, \mathbf{I}, \mathbf{s}, T) = \frac{1}{T} \sum_{t=1}^{T} U^t\big(\pi_{\mathcal{L}}(\mathbf{o}, \mathbf{I}, \mathbf{s})\big), \tag{14}$$

where the attainable payoff of the learning approach $\mathcal{L}$ depends on the observation $\mathbf{o} = \times_{i \in \mathcal{N}}\, o_i$, private information $\mathbf{s} = \times_{i \in \mathcal{N}}\, s_i$, and exchanged information $\mathbf{I} = \times_{i \in \mathcal{N}}\, I_{-i}$. Thus, using this definition, the value of a learning scheme can be determined. As illustrated in Figure 12, different learning algorithms provide operational ways to approach the value of perfectly having the knowledge that the users are trying to learn.
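Computationally, Eq. (14) is a windowed time average of the payoffs a user collects while running a given learning algorithm. A minimal sketch, assuming the per-stage payoffs $U^t$ have already been logged in a list (the function name and the logging format are assumptions of this example):

```python
def value_of_learning(payoff_trace, window):
    """Time-average payoff over the first `window` stages, as in Eq. (14).

    payoff_trace[t - 1] holds the payoff U^t obtained at stage t while the
    learning algorithm under evaluation was in use.
    """
    if not 1 <= window <= len(payoff_trace):
        raise ValueError("window must be between 1 and the trace length")
    return sum(payoff_trace[:window]) / window
```

Evaluating the same function on traces produced with different observations or exchanged information gives one simple way to compare how much each of them is worth to a given learner.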
The “value of information/observations” for various available information and observations with respect to a learning algorithm $\mathcal{L}$ can be computed similarly, and it will play a significant role in determining what information/observations should be exchanged/gathered by users.

[Fig. 12 diagram labels: $V(k^{priv})$, $V(k^{heter})$, $V(k^{comp})$, $V_{\mathcal{L}}(\mathbf{o}, \mathbf{I}, \mathbf{s}, T)$.]
Fig. 12. An illustration of the value of knowledge and learning.

D. An Example: Regret Matching

In this subsection, we introduce the concept of correlated equilibrium, which can be achieved using regret matching [42]. As an illustration, we also show numerically that users that employ this strategic learning algorithm can improve their performance in the power control games without any explicit information exchange with their competitors.

1) Correlated Equilibrium

                    User 2: Aggress    User 2: Backoff
User 1: Aggress     0, 0               7, 2
User 1: Backoff     2, 7               6, 6

Fig. 13. Game in contention networks: user 1’s utility is given first in each cell, with user 2’s following.

First, we consider the game in contention networks, e.g. networks deploying CSMA/CA based protocols, where two users compete with each other in a packet transmission contest in which each of them can either aggressively transmit or obey the backoff principle [17]. If one user is going to Aggress, it is better for the other to Backoff; otherwise the network will be congested and few packets can get through. But if one user chooses to Backoff, it is better for the other to Aggress, which results in a better outcome for that user than both users obeying the backoff principle. The payoff matrix is shown in Figure 13. There are three NE in this game: two pure-strategy NE, (Aggress, Backoff) and (Backoff, Aggress), and a mixed-strategy NE in which both users Aggress with probability 1/3 and each obtains an average utility of 14/3 ≈ 4.67. Next, we introduce the concept of correlated equilibrium, which is a generalization of the NE concept.
The correlated equilibrium is defined in a context where the players are able to access certain common signals. These signals allow players to coordinate their actions and to perform joint randomization over strategies. Given the recommended strategy, it is in the players’ best interests to conform with this strategy. The distribution of the strategies is called a correlated equilibrium.

Definition of Correlated Equilibrium: Let $\Delta(\mathcal{A})$ be the set of probability distributions on $\mathcal{A}$. A probability distribution on the strategy profiles $\mu \in \Delta(\mathcal{A})$ is a correlated equilibrium (CE) of game $\Omega$ if and only if, for all $n \in \mathcal{N}$ and all $a_n \in \mathcal{A}_n$,

$$\sum_{a_{-n} \in \mathcal{A}_{-n}} \mu(a_n, a_{-n})\, u_n(a_n, a_{-n}) \ \ge \sum_{a_{-n} \in \mathcal{A}_{-n}} \mu(a_n, a_{-n})\, u_n(a_n', a_{-n}), \quad \text{for all } a_n' \in \mathcal{A}_n.$$

A simple example of a correlated equilibrium for the contention game is to let the two users choose (Backoff, Backoff), (Aggress, Backoff), and (Backoff, Aggress) with identical probability 1/3. Note that the resulting utility for each user is 5, which is higher than that of the mixed-strategy NE. For finite games, the set of CE is a non-empty polytope which contains the convex hull of all the NE. Figure 14 shows the geometry of the equilibria of the 2-user contention game, where the tetrahedron is the simplex of probability distributions on outcomes of the game, the saddle is the set of distributions that are independent between players, the polytope with 5 vertices and 6 facets is the set of CE, and their three points of intersection are the NE [43]. Importantly, as shown in Figure 14, since CE is a generalization of NE, a CE may lie outside the convex hull of the NE. Therefore, the performance of CE may be better than that of NE.

Fig. 14. Geometry of CE and NE (A = Aggress, B = Backoff) [43].

2) Regret Matching in Power Control Games

To achieve the set of CE, a class of algorithms named regret matching was designed.
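Because membership in the set of CE is a finite list of linear inequalities, the numbers quoted for the contention game of Fig. 13 can be checked directly. The sketch below uses exact rational arithmetic; the helper names are illustrative and the payoff layout (rows for user 1, columns for user 2) is assumed from the figure.

```python
from fractions import Fraction

A, B = "Aggress", "Backoff"
ACTIONS = (A, B)
# (user 1 action, user 2 action) -> (u1, u2), as read from Fig. 13.
U = {(A, A): (0, 0), (A, B): (7, 2), (B, A): (2, 7), (B, B): (6, 6)}


def profile(n, own, other):
    """Joint action with user n playing `own` and the opponent playing `other`."""
    return (own, other) if n == 0 else (other, own)


def is_correlated_equilibrium(mu):
    """Check the CE inequalities for every user, recommendation and deviation."""
    for n in (0, 1):
        for rec in ACTIONS:
            for dev in ACTIONS:
                follow = sum(mu.get(profile(n, rec, o), 0) * U[profile(n, rec, o)][n]
                             for o in ACTIONS)
                deviate = sum(mu.get(profile(n, rec, o), 0) * U[profile(n, dev, o)][n]
                              for o in ACTIONS)
                if follow < deviate:
                    return False
    return True


def expected_utilities(mu):
    return tuple(sum(p * U[a][n] for a, p in mu.items()) for n in (0, 1))


def mixed_nash_aggress_probability():
    """Probability p of Aggress that makes the opponent indifferent (symmetric game)."""
    numerator = U[(B, B)][0] - U[(A, B)][0]
    denominator = (U[(A, A)][0] - U[(A, B)][0]) - (U[(B, A)][0] - U[(B, B)][0])
    return Fraction(numerator, denominator)


third = Fraction(1, 3)
CE_EXAMPLE = {(B, B): third, (A, B): third, (B, A): third}
p = mixed_nash_aggress_probability()
MIXED_NE = {(a1, a2): (p if a1 == A else 1 - p) * (p if a2 == A else 1 - p)
            for a1 in ACTIONS for a2 in ACTIONS}
```

With these definitions, `CE_EXAMPLE` passes the check with expected utility 5 for each user, and the product distribution `MIXED_NE` (with `p` equal to 1/3) also passes, with expected utility 14/3 for each user, in line with the values stated in the text.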
The stationary solution of this algorithm exhibits no regret, and the probabilities with which users update their play in (9)-(13) are proportional to the “regrets” for not having played other actions [42]. In particular, for any two distinct actions $a_n^t \ne a_n'$ in $\mathcal{A}_n$ and at every time $t$, the regret of user $n$ at time $t$ for not playing $a_n'$ is

$$r_n^t\big(a_n^t, a_n'\big) = \max\left\{0,\ \frac{1}{t} \sum_{t' \le t} \Big[u_n\big(a_n', a_{-n}^{t'}\big) - u_n\big(a_n^t, a_{-n}^{t'}\big)\Big]\right\}. \tag{15}$$

In other words, the regret for an action is defined as the increase in utility that would have been obtained if such a change had always been made in the past. Computing the regret completes the belief update part of the learning process in Figure 10. Users update their policies by simply assigning the probability of selecting a certain transmission action in proportion to the regret of that action. It has been shown that, if every player plays according to the regret matching algorithm, the empirical distributions of play converge almost surely to the set of CE distributions of the game [42]. Specifically, in regret matching, the observation $o_i^t$ of the past play serves the role of the common signal that enables users to perform joint coordination. The advantage of the algorithm is that it requires no information exchange $I_{-i}^t$ among the users, and hence users can learn the correlated equilibrium in a fully distributed manner.

[Fig. 15 plot: $R_2$ (Kbps) versus $R_1$ (Kbps); curves: Nash strategy (iterative water-filling), Stackelberg strategy, Pareto boundary, best result of regret-based learning.]
Fig. 15. Performance of regret matching in power control games.

We applied this regret matching algorithm in a two-user power control game, and the simulation results are shown in Figure 15. In the simulation, we discretize the users’ action space by dividing their maximum power into multiple parts.
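For a finite action space such as the discretized power levels just described, one common form of regret matching is the conditional procedure of Hart and Mas-Colell surveyed in [42]: a user stays with its current action unless it has positive regret for an alternative, in which case it switches with probability proportional to that regret. The sketch below applies this procedure to the small contention game of Fig. 13 rather than to the power control game. The normalization constant `mu`, the function names, and the choice of stage game are assumptions of this example, and it may differ in detail from the variant used for Fig. 15.

```python
import random

AGGRESS, BACKOFF = 0, 1
ACTIONS = (AGGRESS, BACKOFF)
# (user 1 action, user 2 action) -> (u1, u2), as read from Fig. 13.
U = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}


def utility(n, own, other):
    """Utility of user n when it plays `own` and the opponent plays `other`."""
    return U[(own, other) if n == 0 else (other, own)][n]


def regret_matching(rounds, mu=14.0, seed=0):
    """Run conditional regret matching; return empirical play and the largest regret.

    mu must exceed (number of actions - 1) times the largest payoff gap so that
    the switching probabilities sum to less than one; 14 satisfies this here.
    """
    rng = random.Random(seed)
    current = [rng.choice(ACTIONS), rng.choice(ACTIONS)]
    # gain[n][j][k]: cumulative gain user n would have had from playing k
    # in every past stage where it actually played j.
    gain = [[[0.0 for _ in ACTIONS] for _ in ACTIONS] for _ in range(2)]
    counts = {joint: 0 for joint in U}
    for t in range(1, rounds + 1):
        counts[tuple(current)] += 1
        following = []
        for n in (0, 1):
            j, other = current[n], current[1 - n]
            for k in ACTIONS:
                gain[n][j][k] += utility(n, k, other) - utility(n, j, other)
            # Switch from j to k with probability regret(j, k) / mu, else stay.
            draw, cumulative, choice = rng.random(), 0.0, j
            for k in ACTIONS:
                if k == j:
                    continue
                cumulative += max(0.0, gain[n][j][k] / t) / mu
                if draw < cumulative:
                    choice = k
                    break
            following.append(choice)
        current = following
    empirical = {joint: c / rounds for joint, c in counts.items()}
    max_regret = max(max(0.0, gain[n][j][k] / rounds)
                     for n in (0, 1) for j in ACTIONS for k in ACTIONS)
    return empirical, max_regret
```

Note that each user only needs its own utility function and the observed joint actions of past play; no messages are exchanged, which matches the fully distributed property emphasized above. The returned empirical distribution can be tested against the CE inequalities to monitor how close the play is to the set of CE.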
Each user calculates the regrets for all the possible actions in its finite action space and updates its strategy repeatedly according to (15). The numerical results for different power constraints are shown in Figure 15. Because the NE form only a subset of the set of CE, the best performance that regret matching can achieve is at least as good as, and in these simulations better than, that of the NE reached by the IW algorithm.

V. SOME CHALLENGES FOR FUTURE RESEARCH

Numerous challenges still need to be addressed to characterize the performance of knowledge-driven games among heterogeneous users, as well as to make these solutions widely adopted in practice. For example, the value of having different knowledge (e.g. the knowledge of other users’ action space, strategies, utility functions, the environment, etc.) needs to be quantified in various communication systems, and application-specific learning algorithms (e.g. ones that consider the available common knowledge about the protocols present in the system) need to be designed to achieve an increased utility efficiency, instead of applying learning algorithms designed for generic games. Moreover, in each application scenario, the learning algorithms need to be benchmarked along the following three dimensions: (i) the incurred communication overhead for the information exchanges, (ii) the incurred computational overhead, and (iii) the achievable utility that can be derived by different learning algorithms (i.e. the value of learning).

Another interesting topic is to investigate the scenarios in which users only have a (probabilistic) belief about the other players’ characteristics. In these cases, Bayesian games are an appealing approach to model the interaction and determine the achievable performance [26]. Considering the dynamic nature of wireless networks, where users’ CSI and source characteristics change over time, all the previously addressed issues will be further complicated.
More advanced multi-user interaction models, e.g. stochastic games, are required to cope with these dynamics in the multi-user interaction [41]. Research advances will also need to be made to allow users to proactively improve the available common knowledge based on which they interact, and to design protocols that can encourage the accumulation of common knowledge, which can increase the system performance and even prevent malicious users from misbehaving.

Summarizing, in this report, we have provided a fresh glimpse of how heterogeneous knowledge and strategic learning can change the conventional way people have designed and characterized the multi-user interactions in wireless communication systems. Considering the complexity of this research topic and the diversity of existing literature, our presentation is far from comprehensive. However, we hope that this report can motivate further research activities in studying the knowledge-driven interaction of users in communication networks. Finally, it is our vision that designing communication systems where users with different amounts of knowledge can compete according to “free market” principles, rather than based on guidelines imposed by resource owners and protocol providers, will lead to unprecedented performance improvements for communication devices and systems, as well as catalyze new algorithm and system designs.

REFERENCES

[1] S.-F. Chang and A. Vetro, “Video Adaptation: Concepts, Technologies, and Open Issues,” Proc. of IEEE, Special Issue on Advances in Video Coding and Delivery, vol. 93, pp. 148-158, Jan. 2005.
[2] E. Altman, T. Boulogne, R. El-Azouzi, T. Jimenez, and L. Wynter, “A survey on networking games in telecommunications,” Computers & Operations Research, vol. 33, pp. 286-311, Feb. 2006.
[3] A. MacKenzie and S. Wicker, “Game Theory and the Design of Self-Configuring, Adaptive Wireless Networks,” IEEE Commun. Magazine, vol. 39, pp. 126-131, Nov. 2001.
[4] V. Srivastava, J. Neel, A. MacKenzie, R. Menon, L. A. DaSilva, J. Hicks, J. H. Reed, and R. Gilles, “Using game theory to analyze wireless ad hoc networks,” IEEE Commun. Surveys Tutorials, vol. 7, pp. 46-56, 4th Quart. 2005.
[5] F. Meshkati, V. Poor, and S. Schwartz, “Energy-efficient resource allocation in wireless networks: An overview of game-theoretic approaches,” IEEE Signal Processing Magazine, vol. 24, pp. 58-68, May 2007.
[6] R. Johari and J. N. Tsitsiklis, “Efficiency loss in a network resource allocation game,” Mathematics of Operations Research, vol. 29, no. 3, pp. 407-435, Aug. 2004.
[7] C. U. Saraydar, N. B. Mandayam, and D. J. Goodman, “Efficient power control via pricing in wireless data networks,” IEEE Trans. Commun., vol. 50, no. 2, pp. 291-303, Feb. 2002.
[8] E. Altman and Z. Altman, “S-modular games and power control in wireless networks,” IEEE Trans. Automatic Control, vol. 48, no. 5, pp. 839-842, May 2003.
[9] R. Etkin, A. Parekh, and D. Tse, “Spectrum Sharing for Unlicensed Bands,” IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 517-528, April 2007.
[10] W. Yu, G. Ginis, and J. Cioffi, “Distributed Multiuser Power Control for Digital Subscriber Lines,” IEEE J. Sel. Areas Commun., vol. 20, no. 5, pp. 1105-1115, June 2002.
[11] R. Cendrillon, W. Yu, M. Moonen, J. Verlinden, and T. Bostoen, “Optimal Multiuser Spectrum Balancing for Digital Subscriber Lines,” IEEE Trans. Commun., vol. 54, no. 5, pp. 922-933, May 2006.
[12] R. Cendrillon, J. Huang, M. Chiang, and M. Moonen, “Autonomous Spectrum Balancing for Digital Subscriber Lines,” IEEE Trans. Signal Processing, vol. 55, no. 8, pp. 4241-4257, Aug. 2007.
[13] J. Huang, R. Berry, and M. Honig, “Distributed Interference Compensation for Wireless Networks,” IEEE J. Sel. Areas Commun., vol. 24, no. 5, pp. 1074-1084, May 2006.
[14] A. B. MacKenzie and S. B. Wicker, “Selfish users in Aloha: A game theoretic approach,” Proc. IEEE VTC Fall, vol. 3, pp. 1354-1357, Oct. 2001.
[15] Y. Jin and G. Kesidis, “Equilibria of a non-cooperative game for heterogeneous users of an ALOHA network,” IEEE Commun. Letters, vol. 6, no. 7, pp. 282-284, July 2002.
[16] J.-W. Lee, A. Tang, J. Huang, M. Chiang, and A. R. Calderbank, “Reverse Engineering MAC: A Non-Cooperative Game Model,” IEEE J. Sel. Areas Commun., vol. 24, no. 6, pp. 1135-1147, Aug. 2007.
[17] M. Cagalj, S. Ganeriwal, I. Aad, and J.-P. Hubaux, “On selfish behavior in CSMA/CA networks,” Proc. IEEE Infocom, vol. 4, pp. 2513-2524, Mar. 2005.
[18] Y. A. Korilis, A. A. Lazar, and A. Orda, “Architecting noncooperative networks,” IEEE J. Sel. Areas Commun., vol. 13, pp. 1241-1251, Sep. 1995.
[19] A. Orda, R. Rom, and N. Shimkin, “Competitive routing in multiuser communication networks,” IEEE/ACM Trans. Networking, vol. 1, pp. 510-521, Oct. 1993.
[20] R. La and V. Anantharam, “Optimal Routing Control: Repeated Game Approach,” IEEE Trans. Automatic Control, vol. 47, no. 3, pp. 437-450, March 2002.
[21] T. Roughgarden and E. Tardos, “How Bad is Selfish Routing?,” Journal of the ACM, vol. 49, issue 2, pp. 236-259, Mar. 2002.
[22] A. Tang, J. Wang, S. H. Low, and M. Chiang, “Equilibrium of heterogeneous congestion control protocols,” IEEE/ACM Trans. Networking, vol. 15, no. 4, pp. 824-837, Oct. 2007.
[23] R. J. La and V. Anantharam, “A game-theoretic look at the Gaussian multiaccess channel,” DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 66 (Advances in Network Information Theory), pp. 87-106, 2004.
[24] S. Mathur, L. Sankaranarayanan, and N. B. Mandayam, “Coalitional Games in Gaussian Interference Channels,” Proc. IEEE ISIT, pp. 2210-2214, July 2006.
[25] L. Lai and H. El-Gamal, “Fading Multiple Access Channels: A Game Theoretic Perspective,” Proc. IEEE ISIT, pp. 1334-1338, July 2006.
[26] M. Osborne and A. Rubinstein, A Course in Game Theory. MIT Press, 1994.
[27] R. W. Lucky, “Tragedy of the commons,” IEEE Spectrum, vol. 43, no. 1, p. 88, Jan. 2006.
[28] Y. Su and M. van der Schaar, “A New Look at Multi-user Power Control Games,” in Proc. Int. Conf. Commun. (ICC) 2008, to appear.
[29] D. Fudenberg and D. K. Levine, The Theory of Learning in Games. Cambridge, MIT Press, 1998.
[30] P. Stone and M. Veloso, “Multiagent systems: A survey from a machine learning perspective,” Autonomous Robots, vol. 8, no. 3, pp. 345-383, July 2000.
[31] Y. Shoham, R. Powers, and T. Grenager, “If Multi-Agent Learning is the Answer, What is the Question?,” Artificial Intelligence, vol. 171, issue 7, pp. 365-377, May 2007.
[32] S. Mannor and J. S. Shamma, “Multi-agent learning for engineers,” Artificial Intelligence, vol. 171, issue 7, pp. 417-422, May 2007.
[33] E. Friedman and S. Shenker, “Learning and Implementation on the Internet,” Manuscript. New Brunswick: Rutgers University, Department of Economics, 1997. Available at http://citeseer.ist.psu.edu/eric98learning.html.
[34] Y. Chang, T. Ho, and L. Kaelbling, “All learning is local: Multi-agent learning in global reward games,” Advances in Neural Information Processing Systems 16 (NIPS), 2003.
[35] S. Haykin, “Cognitive Radio: Brain-empowered wireless communications,” IEEE J. Sel. Areas Commun., vol. 23, pp. 201-220, Feb. 2005.
[36] C. Pandana and K. J. R. Liu, “Near Optimal Reinforcement Learning Framework for Energy-Aware Wireless Sensor Communications,” IEEE J. Sel. Areas Commun., vol. 23, no. 4, pp. 788-797, April 2005.
[37] Z. Han, C. Pandana, and K. J. R. Liu, “Distributive Opportunistic Spectrum Access for Cognitive Radio using Correlated Equilibrium and No-regret Learning,” Proc. IEEE Wireless Commun. and Net. Conf. 2007 (WCNC 2007), pp. 11-15, Mar. 2007.
[38] D. Djonin and V. Krishnamurthy, “Q-Learning Algorithms for Constrained Markov Decision Processes with Randomized Monotone Policies: Applications to MIMO Transmission Control,” IEEE Trans. Signal Processing, vol. 55, no. 5, pp. 2170-2181, May 2007.
[39] C. Long, Q. Zhang, B. Li, H. Yang, and X. Guan, “Non-Cooperative Power Control for Wireless Ad Hoc Networks with Repeated Games,” IEEE J. Sel. Areas Commun., Special Issue on Non-Cooperative Behavior in Networking, vol. 25, pp. 1101-1112, Aug. 2007.
[40] H. P. Shiang and M. van der Schaar, “Distributed Resource Management in Multi-hop Cognitive Radio Networks for Delay Sensitive Transmission,” IEEE Trans. Veh. Tech., to appear.
[41] F. Fu and M. van der Schaar, “Dynamic Spectrum Sharing Using Learning for Delay-Sensitive Applications,” in Proc. Int. Conf. Commun. (ICC) 2008, to appear.
[42] S. Hart, “Adaptive Heuristics,” Econometrica, vol. 73, issue 5, pp. 1401-1430, Sep. 2005.
[43] R. Nau, S. G. Canovas, and P. Hansen, “On the Geometry of Nash Equilibria and Correlated Equilibria,” Int. J. Game Theory, vol. 32, no. 4, pp. 443-453, Aug. 2004.
[44] R. Axelrod and W. D. Hamilton, “The Evolution of Cooperation,” Science, vol. 211, pp. 1390-1396, 1981.
[45] F. P. Kelly, A. Maulloo, and D. Tan, “Rate control in communication networks: shadow prices, proportional fairness and stability,” Journal of the Operational Research Society, vol. 49, pp. 237-252, 1998.
[46] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, “Layering as optimization decomposition: A mathematical theory of network architectures,” Proc. of IEEE, vol. 95, no. 1, pp. 255-312, Jan. 2007.
[47] H. P. Young, Strategic Learning and its Limits, Oxford, Oxford University Press, 2006.
[48] F. Fu and M. van der Schaar, “Non-collaborative resource management for wireless multimedia applications using mechanism design,” IEEE Trans. Multimedia, vol. 9, no. 4, pp. 851-868, Jun. 2007.
[49] Y. Su and M. van der Schaar, “A Simple Characterization of Strategic Behaviors in Broadcast Channels,” IEEE Signal Process. Lett., vol. 15, pp. 37-40, 2008.
