Resource-Aware Control via Dynamic Pricing for Congestion Game with Finite-Time Guarantees
Authors: Ezra Tampubolon, Haris Ceribasic, Holger Boche
Technische Universität München, Lehrstuhl für Theoretische Informationstechnik; Munich Center for Quantum Science and Technology (MCQST)
{ezra.tampubolon, haris.ceribasic, boche}@tum.de

Abstract—The congestion game is a widely used model for modern networked applications. A central issue in such applications is that the selfish behavior of the participants may result in resource overloading and negative externalities for the system participants. In this work, we propose a pricing mechanism that guarantees a sub-linear increase of the time-cumulative violation of the resource load constraints. The feature of our method is that it is resource-centric, in the sense that it depends on the congestion state of the resources and not on specific characteristics of the system participants. This feature makes our mechanism scalable, flexible, and privacy-preserving. Moreover, we show by numerical simulations that, in contrast to the improvement of the capacity violation, our pricing mechanism has no significant effect on the agents' welfare.

Index Terms—Congestion Game, Resource Allocation, Decentralized Algorithm, Mirror Descent, Pricing Algorithm, Network Routing

I. INTRODUCTION

Optimizing users/devices competing for the utilization of resources (be it network links, power supply, or wireless spectrum) has become an essential component of modern networked systems such as IoT, smart grids, and cognitive radio. A trend in recent years is that the number of users in such applications has increased tremendously (see e.g. [1]). For instance, analysts predicted that more than 50 billion things would be connected over the internet by the end of 2020 [1]. Such rapid growth certainly involves a series of challenges.
One of the main challenges facing system managers is congestion control (CC) of the available resources. Without it, negative externalities in the form of quality degradation of resource services might occur due to overload. For instance, in wireless communication networks, an excessive amount of traffic through a base station or an access point (resource) might result in bufferbloat, and consequently in inefficiency of the system in the form of high latency and reduced network throughput, causing a negative experience for all users. Furthermore, a sophisticated congestion control method is crucial for making electrically powered technologies environment-friendly. Another trend visible in recent years is that power consumption due to technical applications constitutes a non-negligible part of global power consumption, with a tendency of enormous growth (see e.g., [2]).

The excessive number of users in modern networked systems and the decreasing degree of cooperativeness justify the attractiveness of the famous game-theoretical concepts for system modeling. A natural foundation for developing a CC method is the concept of the congestion game introduced in [3], [4]. The corresponding model assumes non-cooperative rational participants, whose strategy is an allocation policy over resources, and whose loss depends proportionally on the total load of the utilized resources. The most prominent classical example of a congestion game is the traffic routing model [5], where the arcs in a given network represent the resources, the different origin-destination pairs specify the players, and the possible actions of a player are the allocations over the paths in the system. This concept has also led to fruitful discussions in the wireless network literature.
It has recently been used in wireless network modeling, e.g., access point selection in WiFi networks [6], [7], uplink resource allocation in multichannel wireless access networks [8], wireless channels with multipacket reception capability [9], and the impact of interference sets in studying the congestion game in wireless mesh networks [10].

Many CC methods are user-centric in the sense that they require observability of the system participants' actions and behaviors and provide specific instructions for all of the system users. Such practices are not suitable for modern large-scale applications. The reason is threefold: First, such methods often lack scalability and flexibility; second, the typically high number of participants in such applications makes the approaches computationally infeasible; third, due to users' growing demands for sovereignty and privacy in recent years, direct observation and influence of users' acts by a higher authority are highly undesirable.

Our Contributions: In this work, we assume that the agents are rational and cost-oriented, in the sense that they choose actions minimizing the accumulated historical costs, and that they are non-cooperative, i.e., they do not mutually communicate. Based on that, we propose resource-centric dynamic pricing that offers the system participants appropriate incentives to adhere to the resource constraints and to jointly support sustainable use of the resources. We present a theoretical guarantee that our proposed method ensures that the average violation of the capacity constraints decays at rate $O(n^{-1/2})$ w.r.t. the time horizon $n$. Complementary to this result, we provide numerical simulations for the network routing game, an instance of the congestion game. As a by-product of our practical investigation, we observe that, compared to the gain in resource sustainability, our pricing mechanism does not significantly affect the agents' welfare (expressed by their average loss).
Relation to prior work: The congestion game has been investigated in several directions. Closely related to our work are the following approaches, which consider the game played repeatedly: Under different models of the individual agents, [11]–[14] study the convergence of selfish behavior toward the Nash equilibrium. Besides the fact that it yields a sub-optimal welfare of the agents [15], the Nash equilibrium of this sort of game might not be a resource-sustainable population state (see also the notion of generalized Nash equilibrium in [16], [17]). In order to relieve those undesired effects, several works introduce interesting approaches. Closely related to ours are pricing-based methods, e.g., [18]–[23]. The common aspect of the listed works is that they design a population dynamic which converges to the corresponding (designed) equilibrium fulfilling the capacity constraints (see e.g., the concept of generalized Nash equilibrium [16]) of the problem-specific potential game [24]. A clear contrast to our work is that they only provide asymptotic guarantees. Moreover, the methods proposed in some of those works require agents' personalized information, such as their utilities.

Basic Notions and Notations: For a real vector $a$, $[a]_+$ denotes the vector whose entries are the non-negative parts of the entries of $a$. Let $(X, \|\cdot\|)$ be a normed space and $A, B \subseteq X$. We denote $A - B := \{x - y : x \in A, y \in B\}$ and $\|A\| := \sup_{x \in A} \|x\|$. In this work we assume that a probability space $(\Omega, \Sigma, \mathbb{P})$ and a filtration $\mathcal{F} := (\mathcal{F}_n)_{n \in \mathbb{N}_0}$ therein are given.

II. SETTING

Congestion Game: A congestion game consists of a finite set of agents/players $[N]$ and a finite set $\mathcal{R}$ of resources. To each agent $i \in [N]$, there corresponds a collection $\mathcal{P}_i \subseteq 2^{\mathcal{R}}$ of resource bundles.
One may encode the latter assumption by defining the adjacency matrix $M^{(i)} \in \mathbb{R}^{|\mathcal{R}| \times |\mathcal{P}_i|}$ whose $P_i$-th column provides the information about all the resources contained in the bundle $P_i$, i.e.:
\[
[M^{(i)}]_{r, P_i} = \begin{cases} 1 & r \in P_i \\ 0 & \text{else.} \end{cases}
\]
The aim of each agent $i \in [N]$ is to execute a certain amount $m_i > 0$ of tasks by utilizing the bundles of resources from $\mathcal{P}_i$. We describe the corresponding (utilization) action/strategy of agent $i$ by a vector $x^{(i)} \in \mathcal{X}_i$, where $\mathcal{X}_i$ is a scaled simplex on $\mathcal{P}_i$, i.e.:
\[
\mathcal{X}_i := \Big\{ x^{(i)} := (x^{(i)}_{P_i})_{P_i \in \mathcal{P}_i} \in \mathbb{R}^{|\mathcal{P}_i|} : \textstyle\sum_{P_i \in \mathcal{P}_i} x^{(i)}_{P_i} = m_i \Big\}.
\]
For any $P_i \in \mathcal{P}_i$, $x^{(i)}_{P_i}$ corresponds to the amount of tasks agent $i$ allocates to the bundle $P_i$. Equivalently, we can describe the task allocation strategy of agent $i$ by means of the simplex
\[
\Delta_i := \Big\{ \mu^{(i)} := (\mu^{(i)}_{P_i})_{P_i \in \mathcal{P}_i} \in \mathbb{R}^{|\mathcal{P}_i|} : \textstyle\sum_{P_i \in \mathcal{P}_i} \mu^{(i)}_{P_i} = 1 \Big\}.
\]
In this paper we describe the allocation strategy of agent $i$ by means of the simplex $\Delta_i$ instead of $\mathcal{X}_i$. We denote the set of population strategies by $\Delta = \prod_{i=1}^{N} \Delta_i$.

Let $\mu^{(i)} \in \Delta_i$ be an allocation action of agent $i$. The total load $\phi^{(i)}_r(\mu^{(i)})$ of the resource $r \in \mathcal{R}$ caused by the allocation action $\mu^{(i)} \in \Delta_i$ of agent $i$ is given by $\phi^{(i)}_r(\mu^{(i)}) = \sum_{P_i \in \mathcal{P}_i : r \in P_i} m_i \mu^{(i)}_{P_i}$. Accordingly, the total load $\phi_r(\mu)$ of resource $r$ caused by the population strategy $\mu \in \Delta$ is given by $\phi_r(\mu) = \sum_{i=1}^{N} \phi^{(i)}_r(\mu^{(i)})$. We sometimes also use the notation $\phi := (\phi_r)_{r \in \mathcal{R}}$. Moreover, we consider in this work the case where the load of each resource $r \in \mathcal{R}$ is desired not to exceed the capacity $L_r \in \mathbb{R}_{>0}$, i.e.,
\[
\phi_r(\mu) - L_r =: \Gamma_r(\mu) \le 0.
\]
To each resource $r \in \mathcal{R}$, we associate a function $\ell_r : \mathbb{R}_{\ge 0} \to \mathbb{R}$ which quantifies the negative externalities induced on the resource $r$ due to the load $\phi_r(\mu)$. We refer to $\ell_r$ as the loss function of the resource $r$.
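The load map $\phi$ and the constraint slacks $\Gamma$ above can be sketched numerically. The following minimal illustration (the toy instance, all variable names, and the function `resource_loads` are our own, not from the paper) computes $\phi(\mu) = \sum_i m_i M^{(i)} \mu^{(i)}$ from the bundle adjacency matrices:

```python
import numpy as np

def resource_loads(M_list, mu_list, m_list):
    """phi(mu) = sum_i m_i * M^(i) @ mu^(i), with M^(i) a |R| x |P_i| 0/1 matrix."""
    phi = np.zeros(M_list[0].shape[0])
    for M, mu, m in zip(M_list, mu_list, m_list):
        phi += m * (M @ mu)
    return phi

# Toy instance: 3 resources, one agent with two bundles {r0, r1} and {r2}.
M1 = np.array([[1.0, 0.0],
               [1.0, 0.0],
               [0.0, 1.0]])
mu1 = np.array([0.5, 0.5])      # half the tasks on each bundle
m1 = 4.0                        # total amount of tasks of the agent
L = np.array([1.0, 1.0, 3.0])   # capacities L_r

phi = resource_loads([M1], [mu1], [m1])
Gamma = phi - L                 # constraint slack; feasible iff Gamma <= 0
```

Here the first two resources are overloaded (`Gamma > 0`) while the third has slack, which is exactly the situation the pricing mechanism of Section III reacts to.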
We assume throughout:

Assumption 1: For all $r \in \mathcal{R}$, $\ell_r : \mathbb{R}_{\ge 0} \to \mathbb{R}$ is continuous, convex, and non-decreasing.

Assumption 2 (Slater's Condition): There exists $\hat{\mu} \in \Delta$ s.t. $\Gamma(\hat{\mu}) < 0$.

The loss of a bundle $P_i \in \mathcal{P}_i$ (for agent $i$) is correspondingly given by $\ell^{(i)}_{P_i}(\mu) = \sum_{r \in P_i} \ell_r(\phi_r(\mu))$. Throughout this work we use the notations $\ell^{(i)} := (\ell^{(i)}_{P_i})_{P_i \in \mathcal{P}_i}$ and $\ell := (\ell^{(i)})_{i \in [N]}$.

An example of a congestion game is the following:

Example 1 (Network Routing Game): Given a directed graph $G = (\mathcal{V}, \mathcal{E})$ with a vertex set $\mathcal{V}$ and an edge set $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$. In a routing game, the task of agent $i \in [N]$ is to transport a certain amount of commodity $m_i > 0$ from a starting point $s^{(i)} \in \mathcal{V}$ to a destination $t^{(i)} \in \mathcal{V}$. To fulfill this task, agent $i$ can use a prescribed collection $\mathcal{P}_i \subseteq 2^{\mathcal{E}}$ of edge sets connecting $s^{(i)}$ and $t^{(i)}$. To every edge (resource) $e \in \mathcal{E}$ there corresponds a function $c_e$ (loss) that maps the total flow caused by the transport of commodities on the edge $e$ to a non-negative number determining the delay on $e$, and also a constant $L_e > 0$ which prescribes the amount of flow admissible on edge $e$.

Remark 1: The congestion game which we investigate in this work is an instance of the so-called potential games, which are games admitting a potential function: a real-valued function whose unilateral change describes the change in the player's payoff. The finite-player potential game was first studied by Rosenthal [3], who also recognized its relation to the congestion game, and was systematically investigated by Monderer and Shapley [24]. The work [24] also provides a generalization of the finite-player setting to the infinite-player setting, which is the subject of our investigations.
Remark 2: The infinite-player setting can either be seen as a mixed-strategy version of the finite-player setting, or as an approximation of the finite-player setting with a large population, and it is more convenient, since the latter case can be cumbersome to analyze. However, there is a key difference between the finite and the infinite case: while the Nash equilibria in the finite case are exactly the optimizers of the corresponding potential function, not all equilibria are optimizers [11] in the infinite case, although in that case all optimizers of the potential function are equilibria.

Performance Measures: Let $k \in \mathbb{N}$ and let $\mu(\tau)$, $\tau \in [k]_0$, be a given sequence of population actions from the initial time until time slot $k$. To evaluate the population performance in the congestion game we use the following criteria: We measure the resource sustainability of the population's sequential actions $(\mu(\tau))_{\tau \in [k]_0}$ by the (norm of the) aggregated admissible-flow violation defined by:
\[
\mathrm{ACV}(k) = \Bigg\| \Bigg[ \sum_{\tau=0}^{k-1} \Gamma(\mu(\tau)) \Bigg]_+ \Bigg\|_2.
\]
In addition to the resource-sustainability behavior, we investigate the loss incurred by the population applying the resource allocation decisions $(\mu(\tau))_{\tau \in [k]_0}$ in the form of the aggregated delay:
\[
\mathrm{AD}(k) = \sum_{\tau=0}^{k} \sum_{i \in [N]} D_i(\tau),
\]
where $D_i(\tau)$ denotes the delay experienced by agent $i$ at time $\tau$, i.e., $D_i(\tau) = \sum_{P_i \in \mathcal{P}_i} \ell^{(i)}_{P_i}(\mu(\tau)) \mu^{(i)}_{P_i}(\tau)$.

It should be noted that resource sustainability and loss minimization need not be coinciding objectives, but can display a trade-off behavior depending on the model parameters, i.e., they may appear as conflicting objectives. Therefore it can happen that resource sustainability implies a disadvantage for some agents.
III. RESOURCE-CENTRIC PRICING FOR CONGESTION GAME

Population Dynamic via Score and Hedge Strategy: Throughout this work, we consider the congestion game played repeatedly with time horizon $n \in \mathbb{N}$. We provide a summary of our model for the agents' decision-making process in Algorithm 1. According to Algorithm 1, every agent $i \in [N]$ accumulates at each round $k \in [n]$ the present and historical costs (discounted by a given parameter $\gamma$) of each resource bundle available to him, aiming to provide the scores of his bundle preferences. This model corresponds to non-myopic, data-driven agents that utilize historical data to derive their strategy. This assumption on the agents' behavior is quite plausible for recent applications, which mostly utilize statistical and learning methods by accumulating past data (in this context: resource costs).

The corresponding actual cost of an available bundle consists of the actual noisy loss caused by negative externalities and the price set exogenously by a regulator (cf. Algorithm 2). We assume that the noise $(\xi_n)_{n \in \mathbb{N}}$ is an $\mathbb{R}^{\sum_{i=1}^{N} |\mathcal{P}_i|}$-valued $\mathcal{F}$-martingale difference sequence, which is a quite general noise model. One reason that we model the loss as noisy is that the environment or the imperfectness of the agents' sensing devices can cause imperfectness of the agents' feedback. Another reason is that we can handle the case where the agents' actions are discrete while their strategies are mixed states, and thus the resource congestion only represents an unbiased sample of the congestion specified by the mixed strategies (see [25]).

The mapping $\Phi^{(i)}$ serves to model how the $i$-th agent builds up his allocation strategy from the actual scores of the bundles. In this work, we investigate the case where it takes the following specific form:
\[
(\Phi^{(i)}(y^{(i)}))_{P_i} = \frac{\exp(y^{(i)}_{P_i})}{\sum_{\tilde{P}_i \in \mathcal{P}_i} \exp(y^{(i)}_{\tilde{P}_i})}. \tag{2}
\]
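The logit choice map (2) is a softmax over the bundle scores. A minimal sketch (our own illustration; subtracting the maximum is a standard numerical-stability device, not part of (2), and leaves the result unchanged since softmax is shift-invariant):

```python
import numpy as np

def logit_choice(y):
    """Logit choice map (2): map a score vector y^(i) to a point of the simplex."""
    z = np.exp(y - np.max(y))   # shift-invariant; avoids overflow for large scores
    return z / z.sum()

# Scores chosen so the components are proportional to (1, 1, 2).
mu = logit_choice(np.array([0.0, 0.0, np.log(2.0)]))
```

The output always lies on the simplex $\Delta_i$, with higher scores receiving exponentially larger weight.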
Algorithm 1 Hedge Algorithm with Prices
Require: $n \in \mathbb{N}$, $\gamma > 0$, $\Phi^{(i)} : \mathbb{R}^{|\mathcal{P}_i|} \to \Delta_i$.
  for every agent $i \in [N]$ do
    Initialize the score vector $Y^{(i)}(0) \leftarrow 0$
  end for
  for time $k = 1, 2, \ldots, n$ do
    Population applies the allocation strategy: $X(k) = (m_i \mu^{(i)}(k))_{i \in [N]}$
    for every agent $i \in [N]$ do
      Receive the price vector $(\Lambda_r(k))_{r \in \mathcal{R}}$ broadcast by the regulator.
      for all bundles of resources $P_i \in \mathcal{P}_i$ do
        Experience the disturbed cost: $\hat{\ell}^{(i)}_{P_i}(k) \leftarrow \ell^{(i)}_{P_i}(\mu(k)) + \xi^{(i)}_{P_i}(k+1)$
        Compute the price per amount of task: $\pi^{(i)}_{P_i}(k) = \sum_{r \in P_i} \Lambda_r(k)$
        Update the score of bundle $P_i$:
        \[
        Y^{(i)}_{P_i}(k+1) \leftarrow Y^{(i)}_{P_i}(k) - \gamma \big[ \hat{\ell}^{(i)}_{P_i}(k) + \pi^{(i)}_{P_i}(k) \big] \tag{1}
        \]
      end for
      Generate the allocation strategy (see (2)): $\mu^{(i)}(k+1) \leftarrow \Phi^{(i)}(Y^{(i)}(k+1))$
    end for
  end for

Remark 3: Without altering the analysis given in this work, one can more generally use the concept of the mirror map (see also [25]) for specifying the choice map $\Phi^{(i)}$. Specifically, $\Phi^{(i)}$ takes the form
\[
\Phi^{(i)}(y^{(i)}) := \arg\max_{\mu^{(i)} \in \Delta_i} \Big\{ \big\langle \mu^{(i)}, y^{(i)} \big\rangle - \psi_i(\mu^{(i)}) \Big\},
\]
for a function $\psi_i : \Delta_i \to \mathbb{R}$ strongly convex w.r.t. $\|\cdot\|_1$. For instance, the usual Euclidean projection onto the simplex can be used as the choice map.

Pricing Algorithm: To encourage sustainable use of the resources, we specify the price vector required by Algorithm 1 via the mechanism described in Algorithm 2.
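One round of an agent's update in Algorithm 1 can be sketched as follows (a minimal illustration under our own naming; the noisy losses and the prices would in practice come from the environment and from Algorithm 2, respectively):

```python
import numpy as np

def hedge_step(Y, noisy_loss, bundle_prices, gamma):
    """Score update (1) followed by the logit choice map (2)."""
    Y_next = Y - gamma * (noisy_loss + bundle_prices)   # Y(k+1) = Y(k) - gamma*(loss-hat + pi)
    z = np.exp(Y_next - Y_next.max())                   # logit choice, numerically stabilized
    mu_next = z / z.sum()
    return Y_next, mu_next

Y = np.zeros(2)                    # Y(0) = 0, agent with two bundles
loss_hat = np.array([1.0, 3.0])    # observed disturbed bundle costs
pi = np.array([0.5, 0.0])          # pi_{P_i} = sum of Lambda_r over r in P_i
Y, mu = hedge_step(Y, loss_hat, pi, gamma=1.0)
# the bundle with the lower total cost (index 0) receives the larger probability
```

Iterating `hedge_step` accumulates the discounted historical costs in `Y`, which is exactly the non-myopic behavior described above.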
We underline the fundamental role of the price in reflecting the scarcity of a resource by setting the price update (3) proportional to the present congestion state $\phi_{r,k} - L_r$ (with the parameter $\beta$ specifying the sensitivity of the prices to the congestion state). This aspect allows the regulators to indicate a possible resource overload implicitly.

Algorithm 2 Resource-Centric Pricing
Require: $n \in \mathbb{N}$, $\beta > 0$, $\alpha \in (0, 1]$
  Initialize the price vector $\Lambda(0) \leftarrow 0$
  for time $k = 1, 2, \ldots, n$ do
    for $r \in \mathcal{R}$ do
      Check the actual load $\phi_{r,k} := \phi_r(X(k))$ of resource $r$ caused by Algorithm 1
      Update the price of resource $r$:
      \[
      \Lambda_r(k+1) \leftarrow \big[ (1 - \alpha) \Lambda_r(k) + \beta (\phi_{r,k} - L_r) \big]_+ \tag{3}
      \]
    end for
  end for

Furthermore, we introduce "memory" into the price dynamics by involving the previous price $\Lambda_r(k)$ in (3). The reason is twofold: firstly, to ensure the alignment of the incentives with the non-myopic behavior of the agents, and secondly, to track the congestion dynamics for analytical purposes. The latter reason becomes clear by iterating (3) (with $\alpha = 0$) and recognizing that the prices give an upper bound for the ACV. However, a possible drawback of this procedure is that a sharp price increase might result in a domination of the agents' preferences (expressed by their losses): It follows from (1) that unusually high prices cause the agents to decide for the resources having the lowest prices and not for the ones giving them the lowest loss, resulting in a degradation of the population's welfare. Thus, we introduce in Algorithm 2 the parameter $\alpha$, whose role is to bypass the phenomenon above by offsetting the memory in the price dynamic.
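The price update (3) is a one-liner; the following sketch (our own toy instance) also illustrates the "memory" remark above: with $\alpha = 0$, a persistently overloaded resource accumulates its violations in the price, while an underloaded resource stays free of charge.

```python
import numpy as np

def price_step(Lam, phi, L, alpha, beta):
    """Price update (3): Lambda(k+1) = [(1 - alpha) Lambda(k) + beta (phi_k - L)]_+."""
    return np.maximum((1.0 - alpha) * Lam + beta * (phi - L), 0.0)

Lam = np.zeros(2)                  # Lambda(0) = 0
L = np.array([1.0, 1.0])           # capacities
loads = [np.array([2.0, 0.5]),     # resource 0 overloaded by 1, resource 1 idle
         np.array([2.0, 0.5])]
for phi in loads:
    Lam = price_step(Lam, phi, L, alpha=0.0, beta=1.0)
# with alpha = 0 the price of resource 0 grows by the violation (1.0) each round
```

Choosing $\alpha > 0$ instead geometrically discounts this memory, which is precisely the offsetting role the parameter plays in Algorithm 2.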
Fig. 1. Sketch of Algorithms 1 and 2. The blue color marks the agents' affairs, the red the price setters' (resources), and the black the network's.

Relation between Algorithms 1 and 2: In order to clarify the relationship between Algorithm 1 and the price setters, i.e., the resources, we sketch the connection between Algorithms 1 and 2 in Figure 1. It is apparent that the resource prices are decided in parallel and in situ, and do not require any centralized instance, in contrast to most resource control mechanisms, such as bidding-based and auction-based mechanisms. This aspect is an advantage since centralized solutions are known to be sensitive to malicious attacks and to require rather sophisticated computations, e.g., solving an optimization problem. Also, we want to stress that the price of a resource $r$ is based purely on the congestion state $\phi_{r,k} - L_r$ and not on the (preferences of the) agents utilizing the resource $r \in \mathcal{R}$. Since the agents do not have to reveal their strategies and preferences (e.g., resource bundles), our method respects the sovereignty and the privacy of the individuals. Moreover, since our method does not cling to a specific agent's feedback, agents can be added or removed, making this approach particularly flexible. By not knowing the preferences of the agents, and due to the absence of a centralized instance, we may sacrifice some desired properties of the mechanism (e.g., driving the population toward a socially optimal state, and budget balance).

Fig. 2. Performance for $L_r = 14$: (a) AD averaged over time; (b) ACV averaged over time.
However, in order to approach the fulfillment of the first property, one can tune the parameters of the mechanism and accept looser resource constraints so that the prices do not dominate the losses of the agents; this results in resource sustainability with a lower cost of welfare degradation (for details see Section V).

IV. PERFORMANCE ANALYSIS

Throughout, $C_1, C_2, C_3, m^*$ denote non-negative constants fulfilling, for all $\mu \in \Delta$ and $\lambda \in \mathbb{R}^{\mathcal{R}}_{\ge 0}$:
\[
\sum_{i=1}^{N} m_i \| M^{(i),T} \lambda \|_\infty^2 \le C_1^2 \| \lambda \|_2^2, \qquad
\sum_{i=1}^{N} m_i \| \ell^{(i)}(\mu) \|_\infty^2 \le C_2^2, \qquad
\| \phi(\mu) - L \|_2 \le C_3, \qquad
m_i \le m^*, \ \forall i \in [N].
\]
Our main result is the following:

Theorem 1: Let $\gamma > 0$ be given, $\beta = \gamma$, and $\alpha = \delta \gamma^2$ with $\delta > 0$ satisfying
\[
C_1^2 + \gamma^2 \delta^2 - \frac{\delta}{2} \le 0. \tag{4}
\]
It holds:
\[
\mathbb{E}\bigg[ \frac{\| \Lambda(n) - \lambda^* \|_2^2}{2} \bigg] \le \frac{\Delta\psi^2}{2} + \frac{(1 + \alpha n) \| \lambda^* \|_2^2}{2} + \frac{\tilde{C}_1^2}{2} \gamma^2 n + 2 \gamma^2 m^* N \sum_{k=1}^{n} \mathbb{E}\big[\| \xi_k \|_\infty^2\big], \tag{5}
\]
where $\tilde{C}_1^2 := 2 (C_2^2 + 2 C_3^2)$ and $\Delta\psi^2 = 2 m^* \sum_{i=1}^{N} \ln(|\mathcal{P}_i|)$.

Remark 4: A necessary condition on $\gamma$ such that there exists a $\delta > 0$ satisfying (4) is:
\[
\gamma \le \frac{1}{4 C_1}. \tag{6}
\]
If this is fulfilled, then (4) is equivalent to:
\[
\frac{1 - \sqrt{1 - 16 \gamma^2 C_1^2}}{4 \gamma^2} \le \delta \le \frac{1 + \sqrt{1 - 16 \gamma^2 C_1^2}}{4 \gamma^2}. \tag{7}
\]
We also observe that for small enough $\gamma$, we can choose $\delta \approx 2 C_1^2$, which does not depend on the horizon length.

The attentive reader may recognize, by inspecting the proof of the above theorem, that for the result to hold it is not necessary that $\alpha$ is of the form $\alpha = \delta \gamma^2$, and thus that the regulator knows the agents' step size precisely. The only requirement is that $\alpha$ has to decay more slowly than $\gamma^2$ with the time horizon $n$. However, one obtains the best rate for the performance guarantee in case $\alpha$ is of order $\gamma^2$ (w.r.t. $n$). The proof of Theorem 1 is given in the Appendix.
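Reading (4) as the quadratic inequality $C_1^2 + \gamma^2\delta^2 - \delta/2 \le 0$ (the reading consistent with (6) and (7)), the admissible range of $\delta$ follows by solving for its roots:

```latex
\gamma^2 \delta^2 - \frac{\delta}{2} + C_1^2 \le 0
\quad\Longleftrightarrow\quad
\delta \in \left[ \delta_-, \delta_+ \right],
\qquad
\delta_\pm = \frac{\tfrac{1}{2} \pm \sqrt{\tfrac{1}{4} - 4\gamma^2 C_1^2}}{2\gamma^2}
           = \frac{1 \pm \sqrt{1 - 16\gamma^2 C_1^2}}{4\gamma^2}.
```

The roots are real iff $16\gamma^2 C_1^2 \le 1$, i.e., $\gamma \le 1/(4C_1)$, which is exactly (6); the interval $[\delta_-, \delta_+]$ is (7), and as $\gamma \to 0$ the lower root satisfies $\delta_- \to 2C_1^2$, explaining the horizon-independent choice in Remark 4.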
An immediate consequence of Theorem 1 is the following guarantee for the accumulation of the capacity violation (for the proof see the full version [26]):

Corollary 2: Suppose that the conditions of Theorem 1 are fulfilled and that the noise is persistent, in the sense that there exists $\sigma^2 > 0$ s.t. $\mathbb{E}[\| \xi_k \|_\infty^2] \le \frac{\sigma^2}{4 m^* N}$ for all $k \in \mathbb{N}$. It holds:
\[
\mathbb{E}[\| \Lambda(n) \|_2] \le \Delta\psi + \big(1 + \sqrt{1 + \delta \gamma^2 n}\big) \| \lambda^* \|_2 + (\tilde{C}_1 + \sigma) \gamma \sqrt{n}, \tag{8}
\]
where $\Delta\psi$ and $\tilde{C}_1$ are given as in Theorem 1. Now, suppose that $\gamma := c/\sqrt{n}$ for a constant $c > 0$ and $\delta \in (0, 1/\gamma^2)$ s.t. (4) is fulfilled. It holds:
\[
\mathbb{E}[\mathrm{ACV}(n)] \le \Big( \delta c + \frac{1}{c} \Big) A \sqrt{n}, \tag{9}
\]
where $A := \Delta\psi + \big(1 + \sqrt{1 + \delta c^2}\big) \| \lambda^* \|_2 + (\tilde{C}_1 + \sigma) c$.

V. SIMULATION

Game Setting: We consider the network routing problem given in Example 1, which we specify as follows: $\mathcal{V}$ consists of 15 nodes and $\mathcal{E}$ is built from a randomly generated adjacency matrix (without self-loops) with independent entries, where each non-diagonal entry is 1 with probability 0.5. Furthermore, we consider $N = 10$ agents, each with the starting point and the destination chosen uniformly at random from $\mathcal{V}$. Given the latter, for each agent $i$ we randomly create bundles up to a maximal size of $|\mathcal{P}_i| \le 10$. We set the total resource load $m_i = 20$, $\forall i \in [N]$, and the admissible flow per resource $L_r = 14$, $\forall r \in \mathcal{R}$. For the cost per resource $\ell_r$, we consider a quadratic polynomial of the form $\ell_r(\phi_r(k)) = a^{(r)}_2 \phi_r(k)^2 + a^{(r)}_1 \phi_r(k) + a^{(r)}_0$, where the coefficients $(a^{(r)}_2, a^{(r)}_1, a^{(r)}_0)$ for each resource $r \in \mathcal{R}$ are independently and uniformly drawn from $[0, 0.05]$.

Parameter Setting: We set the parameters required by Algorithms 1 and 2 as follows: We consider the time horizon $n = 10^3$, the agents' learning rate $\gamma = 0.1/\sqrt{n} \approx 0.0032$, and the response parameter $\alpha = 10^{-5}$.
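The random instance described above can be generated as follows. This is a sketch under stated assumptions (the seed, the path/bundle construction, and all names are our own; bundle generation per agent is elided):

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed, not from the paper

# 15 nodes; each off-diagonal arc present independently with probability 0.5
V = 15
A = (rng.random((V, V)) < 0.5).astype(int)
np.fill_diagonal(A, 0)                       # no self-loops

edges = [(u, v) for u in range(V) for v in range(V) if A[u, v]]

# quadratic edge loss l_r(phi) = a2*phi^2 + a1*phi + a0, coefficients ~ U[0, 0.05]
coeffs = rng.uniform(0.0, 0.05, size=(len(edges), 3))

def edge_loss(e_idx, flow):
    a2, a1, a0 = coeffs[e_idx]
    return a2 * flow**2 + a1 * flow + a0

# parameters of Algorithms 1 and 2 as in the Parameter Setting
n = 10**3
gamma = 0.1 / np.sqrt(n)     # agents' learning rate, approximately 0.0032
alpha = 1e-5                 # price-memory offset
```

Since all coefficients are non-negative, each `edge_loss` is convex and non-decreasing on the non-negative flows, so Assumption 1 holds for this instance.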
We are not only interested in the case $\beta = \gamma$ analyzed in Section IV, but also in the case where the regulator is uncertain about the agents' learning rate, and therefore $\beta$ differs significantly from $\gamma$ by a factor of 10: $\beta = 10\gamma$ ($\beta > \gamma$) and $\beta = 10^{-1}\gamma$ ($\beta < \gamma$). For the noise modeling w.r.t. the disturbed cost, we consider i.i.d. samples uniformly distributed on $[-0.01, 0.01]$.

Performance Evaluation: Fig. 2 shows that our pricing mechanism reduces the aggregated capacity violation even if $\beta \ne \gamma$, since the ACV for each of the parameter choices is significantly lower than the ACV of the purely anarchistic case (red, dashed). However, we observe that a higher $\beta$ may accelerate this process. Additionally, we see that our pricing method does not yield a significant discrimination of the agents when compared to the improvement of the capacity violation, as the differences between the aggregated delays for the different cases are marginal at worst (see Fig. 2 (b)). Still, we note a trade-off behavior in the choice of $\beta$: In case $\beta$ is high ($\beta > \gamma$), the capacity violation is the lowest, but the experienced delay is the highest. This occurrence reflects the increasing dominance of the price regulation over the agents' personal interest in decreasing the incurred delay.

Fig. 3. Pricing over time: (a) $L_r = 14$; (b) $L_r = 11$.

Another observation we make is that if $\beta = \gamma$, some prices might at worst be constant for large times, as predicted in Corollary 2, indicating that even if the population fulfills the resource constraints, a control mechanism is necessary to maintain this desired status quo.

Overly Strict Capacity Constraints: We also investigate the performance of our method under stricter capacity constraints, i.e., $L_r = 11$.
We see that our method still yields an improvement of the capacity violation compared to the no-pricing case (see Fig. 4). However, this comes with a significant reduction of the agents' welfare in the form of a higher AD (see Fig. 4 (a)). One may explain this as follows: Taking a look at the evolution of the prices of exemplary resources (Fig. 3 (b)), we observe a linear increase in prices dominating the personal preferences ($\hat{\ell}^{(i)}_{P_i}$ in (1)) of the agents at large times. Consequently, each of the affected agents decides for routes that have the lower prices rather than those that incur the lowest delay.

The enormous increase of prices shown in Fig. 3 (b) gives a hint that the minimizer of the Rosenthal potential corresponding to the network routing game over $Q$ does not exist (cf. the proof of Theorem 1) due to the overly strict resource constraints. However, one may still be able to show the sub-linearity of the ACV, of order $O(n^{1/4})$. Moreover, the increase in prices is in contrast to the case where the capacity constraints are rather loose (Fig. 3 (a)). The latter observations give the following heuristic: In case one observes a linear increase of some prices, one may set looser constraints so that the reduction of the capacity violations does not come with a significant reduction of the population's welfare.

VI. SUMMARY, DISCUSSION, AND FUTURE WORK

Assuming that the agents choose their actions based on the average historical costs of the resource bundles and the logit choice rule, we introduced a resource-centric pricing mechanism which allows a non-asymptotic guarantee of the sub-linear growth of the expected aggregated violation of the resource constraints of order $O(\sqrt{n})$.
Fig. 4. Performance for $L_r = 11$: (a) AD averaged over time; (b) ACV averaged over time.

In case the resource constraints are not overly strict, we observe numerically that the resource sustainability delivered by our method does not come with a significant discrimination of the agents. For the general case, a trade-off effect between resource sustainability and the population's welfare might occur. In the future, we plan to explain these aspects formally.

REFERENCES

[1] D. Evans, "The Internet of Things: How the Next Evolution of the Internet is Changing Everything," CISCO, Tech. Rep., April 2011.
[2] M. Pickavet, W. Vereecken, S. Demeyer, P. Audenaert, B. Vermeulen, C. Develder, D. Colle, B. Dhoedt, and P. Demeester, "Worldwide energy needs for ICT: The rise of power-aware networking," in 2nd IEEE ANTS, Dec. 2008, pp. 1–3.
[3] R. W. Rosenthal, "A class of games possessing pure-strategy Nash equilibria," Int. J. of Game Th., vol. 2, no. 1, pp. 65–67, Dec. 1973.
[4] D. Schmeidler, "Equilibrium points of nonatomic games," J. of Stat. Phy., vol. 7, no. 4, pp. 295–300, Apr. 1973.
[5] J. G. Wardrop, "Some Theoretical Aspects of Road Traffic Research," Proc. of the Inst. of Civ. Eng., vol. 1, no. 3, pp. 325–362, 1952.
[6] O. Ercetin, "Association games in IEEE 802.11 wireless local area networks," IEEE Trans. on Wire. Comm., vol. 7, no. 12, pp. 5136–5143, Dec. 2008.
[7] L. Chen, "A Distributed Access Point Selection Algorithm Based on No-Regret Learning for Wireless Access Networks," in IEEE 71st Veh. Tech. Conf., 2010, pp. 1–5.
[8] E. Altman, A. Kumar, and Y. Hayel, "A potential game approach for uplink resource allocation in a multichannel wireless access network," in 4th Int. ICST Conf. on Perf. Eval. Meth.
and Tools, 2009.
[9] D. Sanyal, S. Chakraborty, M. Chattopadhyay, and S. Chattopadhyay, "Congestion games in wireless channels with multipacket reception capability," in Information and Communication Technologies, 2010, pp. 201–205.
[10] A. Argento, M. Cesana, and I. Malanchini, "On access point association in wireless mesh networks," in 2010 IEEE WoWMoM, 2010, pp. 1–6.
[11] W. H. Sandholm, "Potential games with continuous player sets," J. of Econ. Th., vol. 97, no. 1, pp. 81–108, 2001.
[12] S. Fischer and B. Vöcking, "On the evolution of selfish routing," in ESA. Springer, 2004, pp. 323–334.
[13] A. Blum, E. Even-Dar, and K. Ligett, "Routing Without Regret: On Convergence to Nash Equilibria of Regret-Minimizing Algorithms in Routing Games," in Proc. of the 25th Ann. ACM Symp. on Princ. of Dist. Comp., 2006, pp. 45–52.
[14] W. Krichene, B. Drighés, and A. Bayen, "Online Learning of Nash Equilibria in Congestion Games," SIAM J. on Cont. and Opt., vol. 53, no. 2, pp. 1056–1081, 2015.
[15] T. Roughgarden and É. Tardos, "How bad is selfish routing?" J. ACM, vol. 49, no. 2, pp. 236–259, 2002.
[16] F. Facchinei and C. Kanzow, "Generalized Nash equilibrium problems," 4OR, vol. 5, no. 3, pp. 173–210, Sep. 2007.
[17] G. Scutari, D. P. Palomar, F. Facchinei, and J.-S. Pang, Distributed Decision Making and Control, ser. Lecture Notes in Control and Information Sciences. Springer, 2012, ch. Monotone Games for Cognitive Radio Systems, pp. 83–112.
[18] T. Alpcan and T. Basar, "A game-theoretic framework for congestion control in general topology networks," in Proc. 41st IEEE CDC, vol. 2, 2002, pp. 1218–1224.
[19] G. Scutari, S. Barbarossa, and D. P. Palomar, "Potential Games: A Framework for Vector Power Control Problems With Coupled Constraints," in IEEE ICASSP, vol. 4, 2006.
[20] A. Ozdaglar and R. Srikant, "Incentives and pricing in communication networks," in Algorithmic Game Theory.
Cambridge Press, 2007, pp. 571–591. [21] F . Farokhi and K. H. Johansson, “ A piecewise-constant congestion taxing policy for repeated routing games, ” T ransportation Resear ch P art B: Methodological , vol. 78, pp. 123 – 143, 2015. [22] J. Barrera and A. Garcia, “Dynamic Incentiv es for Congestion Control, ” IEEE T rans. on Aut. Cont. , vol. 60, no. 2, pp. 299 – 310, Feb. 2015. [23] D. Paccagnan, B. Gentile, F . Parise, M. Kamgarpour , and J.Lygeros, “Nash and Wardrop Equilibria in Aggregati ve Games with Coupling Constraints, ” IEEE T rans. on Aut. Cont. , 2017. [24] D. Monderer and L. S. Shapley , “Potential games, ” Games Econ. Beh. , vol. 14, pp. 124 – 143, May 1996. [25] P . Mertikopoulos and Z. Zhou, “Learning in games with continuous action sets and unknown payoff functions, ” Math. Pro g. , Mar . 2018. [26] E. T ampubolon, H. Ceribasic, and H. Boche, “Resource-aware control via dynamic pricing for congestion game with finite-time guarantees, ” arXiv e-prints , 2020. A P P E N D I X A. Additional Notations ‚ Ă M p i q “ m i M p i q ‚ One can express φ p µ q more compactly by φ p µ q “ Ă M µ , where Ă M “ r Ă M p 1 q | ¨ ¨ ¨ | Ă M p N q s B. Auxiliary Statements Lemma 3: Suppose that Λ 0 “ 0 . F or all r P r R s and k P N : A CV p k q ď } Λ p k q} 2 ` α ř k ´ 1 τ “ 1 } Λ p τ q} 2 β (10) 1) Pr oof of Lemma 3: The definition of our price policy giv es: Λ r p τ ` 1 q ě Λ r p τ q ` β Γ r p τ q ´ α Λ r p τ q . So by summing this inequality and subsequent telescoping, we hav e since Λ 0 “ 0 : k ´ 1 ÿ τ “ 0 Γ r p τ q ď Λ r p k q ` ř k ´ 1 τ “ 1 α Λ r p τ q β . W e observe that the inequality also holds when applying r¨s ` to the L.H.S. due to the R.H.S. of the inequality being non- negati ve and since r¨s ` is monotonically increasing. 
Moreover, since the resulting inequality holds for all $r \in [R]$ and both sides are non-negative, it follows by monotonicity that:
$$\Bigg\| \Bigg[\sum_{\tau=0}^{k-1} \Gamma(\tau)\Bigg]_+ \Bigg\|_2 \le \Bigg\| \frac{\Lambda(k) + \sum_{\tau=1}^{k-1} \alpha \Lambda(\tau)}{\beta} \Bigg\|_2.$$
We notice that the L.H.S. corresponds to our definition of the aggregated capacity violation at stage $k-1$. Therefore, applying the triangle inequality to the R.H.S. yields the upper bound (10), and hence an estimate of the aggregated capacity violation purely in terms of the price. ∎

C. Monotonicity of the KKT-operators

An operator $F: \mathbb{R}^D \to \mathbb{R}^D$ is said to be monotone on $Z \subseteq \mathbb{R}^D$ if $\langle x_1 - x_2, F(x_1) - F(x_2)\rangle \ge 0$ for all $x_1, x_2 \in Z$. If the latter inequality is strict for $x_1 \ne x_2$, then $F$ is said to be strictly monotone. $F$ is said to be $c$-strongly monotone on $Z$ if $\langle x_1 - x_2, F(x_1) - F(x_2)\rangle \ge c\,\|x_1 - x_2\|^2$ for all $x_1, x_2 \in Z$.

Proposition 4: Let $X \subset \mathbb{R}^D$. Consider the operator $\tilde{v}: X \times \mathbb{R}^M_{\ge 0} \to \mathbb{R}^D \times \mathbb{R}^M$ given by:
$$(x, \lambda) \mapsto \big[v(x) + A^{\mathsf T}\lambda,\; b - Ax\big]^{\mathsf T}, \quad (11)$$
where $v: X \to \mathbb{R}^D$, $A \in \mathbb{R}^{M \times D}$, and $b \in \mathbb{R}^M$. It holds that:
$$\langle \alpha_1 - \alpha_2, \tilde v(\alpha_1) - \tilde v(\alpha_2)\rangle = \langle x_1 - x_2, v(x_1) - v(x_2)\rangle \quad (12)$$
for all $\alpha_i := (x_i, \lambda_i) \in X \times \mathbb{R}^M_{\ge 0}$, $i = 1, 2$.

Proof: A straightforward computation yields:
$$\langle \alpha_1 - \alpha_2, \tilde v(\alpha_1) - \tilde v(\alpha_2)\rangle = \langle x_1 - x_2, v(x_1) - v(x_2)\rangle + \langle x_1 - x_2, A^{\mathsf T}\lambda_1 - A^{\mathsf T}\lambda_2\rangle - \langle \lambda_1 - \lambda_2, Ax_1 - Ax_2\rangle.$$
Moreover, we have $\langle \lambda_1 - \lambda_2, Ax_1 - Ax_2\rangle = \langle A^{\mathsf T}\lambda_1 - A^{\mathsf T}\lambda_2, x_1 - x_2\rangle$. Combining both computations, we obtain (12). ∎

D. Proof of the main result

Proof (of Theorem 1): The logit choice $\Phi^{(i)}$ given in (2) is a mirror map (Definition 3.1 in [25]) induced by the negative Gibbs entropy $\psi_i(\mu^{(i)}) = \sum_{P_i \in \mathcal{P}_i} \mu^{(i)}_{P_i} \ln(\mu^{(i)}_{P_i})$ as regularizer on the simplex, which is a compact convex subset.
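For concreteness: the mirror map induced by the negative Gibbs entropy on the simplex is the familiar softmax. The following minimal sketch (the function name and the score values are ours, not from the paper) shows the map and checks that it lands in the simplex interior:

```python
import math

def logit_choice(scores):
    """Mirror map of the negative Gibbs entropy on the simplex: softmax.

    Maps a vector of scores Y to a mixed strategy mu with
    mu_P proportional to exp(Y_P).
    """
    m = max(scores)                        # shift for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [x / z for x in w]

mu = logit_choice([1.0, 2.0, 3.0])
assert abs(sum(mu) - 1.0) < 1e-12         # lies on the simplex
assert all(x > 0 for x in mu)             # in the interior, as a mirror map requires
```

In Algorithm 1 the scores would be the players' cumulative (priced) payoffs $Y^{(i)}(k)$, scaled by the step size; those specifics are omitted here.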
Let $F_m(\mu, Y(k)) := \sum_{i=1}^N m_i F_i(\mu^{(i)}, Y^{(i)}(k))$, where $F_i$ is the Fenchel coupling (Definition 4.2 in [25]) induced by the negative Gibbs entropy as a $1$-strongly convex regularizer on the simplex $\Delta_i$ (with corresponding dual norm $\|\cdot\|_\infty$).

By means of $F_m$, we can estimate the evolution of Algorithm 1 under the dynamic pricing mechanism given in Algorithm 2 by a Lyapunov-type argument. Toward this end, we use the usual bound for the one-step difference of the Fenchel coupling (see e.g. Proposition 4.3 (c) in [25]), insert the given iterate at time $k+1$ into the resulting inequality, and apply the triangle inequality to obtain:
$$F_m(\mu, Y(k+1)) - F_m(\mu, Y(k)) \le \underbrace{-\gamma \sum_{i=1}^N m_i \big\langle \mu^{(i)}(k) - \mu^{(i)},\, \hat\ell^{(i)}(k) + \pi^{(i)}(k)\big\rangle}_{=:\text{(a)}} + \frac{\gamma^2}{2} \underbrace{\sum_{i=1}^N m_i \big\|\hat\ell^{(i)}(k) + \pi^{(i)}(k)\big\|_\infty^2}_{=:\text{(b)}}. \quad (13)$$

By the triangle inequality and the definition of the constants given in Section IV, we can estimate the summand (b) as follows:
$$\text{(b)}/2 \le C_1^2 \|\Lambda(k)\|_2^2 + 2\Big(C_2^2 + \sum_{i=1}^N m_i \|\xi^{(i)}_{k+1}\|_\infty^2\Big). \quad (14)$$

Now, to estimate the summand (a), notice that we can write:
$$\sum_{i=1}^N m_i \big\langle \mu^{(i)}(k) - \mu^{(i)}, \pi^{(i)}(k)\big\rangle = \big\langle \mu(k) - \mu, \widetilde M^{\mathsf T} \Lambda(k)\big\rangle, \quad (15)$$
and that $\sum_{i=1}^N m_i \langle \mu^{(i)}(k) - \mu^{(i)}, \ell^{(i)}(\mu(k))\rangle = \langle \mu(k) - \mu, v(\mu(k))\rangle$, since $m_i\,\ell^{(i)}(\mu) = \nabla_{\mu^{(i)}} V(\mu)$.

Combining all the previous observations, summing the resulting inequality over all $k = 0, \ldots, n-1$, and subsequently telescoping, we obtain an upper bound for the cumulative difference $V^{(1)}_n(\mu) := F_m(\mu, Y(n)) - F_m(\mu, Y(0))$:
$$V^{(1)}_n(\mu) \le -\gamma \sum_{k=0}^{n-1} \langle \mu(k) - \mu, v(\mu(k))\rangle - \gamma \sum_{k=0}^{n-1} \big\langle \mu(k) - \mu, \widetilde M^{\mathsf T}\Lambda(k)\big\rangle + \gamma^2 C_1^2 \sum_{k=0}^{n-1} \|\Lambda(k)\|_2^2 + \gamma S_n + 2\gamma^2 R_n + 2 C_2^2 \gamma^2 n, \quad (16)$$
where:
$$S_n := -\sum_{k=0}^{n-1} \langle X(k) - x^*, \xi(k+1)\rangle, \qquad R_n := m^* N \sum_{k=1}^{n} \|\xi(k)\|_\infty^2,$$
and where $v(\mu) := \nabla V(\mu)$, with $V$ denoting the Rosenthal potential:
$$V: \Delta \to \mathbb{R}, \quad \mu \mapsto \sum_{r \in \mathcal{R}} \int_0^{\phi_r(\mu)} \ell_r(u)\, \mathrm{d}u. \quad (17)$$

We now estimate the evolution of the price vector by providing a bound for $V^{(2)}_n(\lambda) := \big(\|\Lambda(n) - \lambda\|_2^2 - \|\Lambda(0) - \lambda\|_2^2\big)/2$, where $\lambda \ge 0$. By similar computations as before, and by the elementary bound $2\langle \lambda - \Lambda(k), \Lambda(k)\rangle \le \|\lambda\|_2^2 - \|\Lambda(k)\|_2^2$, we obtain:
$$V^{(2)}_n(\lambda) \le \beta \sum_{k=0}^{n-1} \langle \Lambda(k) - \lambda, \phi(\mu(k)) - L\rangle + \frac{\alpha}{2} \sum_{k=0}^{n-1} \big(\|\lambda\|_2^2 - \|\Lambda(k)\|_2^2\big) + \sum_{k=0}^{n-1} \big(\beta^2 C_3^2 + \alpha^2 \|\Lambda(k)\|_2^2\big). \quad (18)$$

Combining the bounds (16) and (18), it holds:
$$V^{(1)}_n(\mu) + V^{(2)}_n(\lambda) \le -\gamma \sum_{k=0}^{n-1} \langle z(k) - z, \tilde v(z(k))\rangle + (\beta - \gamma) \sum_{k=0}^{n-1} \big\langle \Lambda(k) - \lambda, \widetilde M \mu(k) - L\big\rangle + \Big(\gamma^2 C_1^2 - \frac{\alpha}{2} + \alpha^2\Big) \sum_{k=0}^{n-1} \|\Lambda(k)\|_2^2 + \Big(2 C_2^2 \gamma^2 + C_3^2 \beta^2 + \frac{\alpha \|\lambda\|_2^2}{2}\Big) n + \gamma S_n + 2\gamma^2 R_n,$$
where $z(k) := (\mu(k), \Lambda(k))$, $z = (\mu, \lambda)$, and $\tilde v(z(k)) = [\nabla V(\mu(k)) + \widetilde M^{\mathsf T} \Lambda(k),\; L - \widetilde M \mu(k)]$. Setting $\beta = \gamma$ and $\alpha = \delta \gamma^2$ with $\delta \in (0, 1/\gamma^2)$ fulfilling (4), we have:
$$V^{(1)}_n(\mu) + V^{(2)}_n(\lambda) \le -\gamma \sum_{k=0}^{n-1} \underbrace{\langle z(k) - z, \tilde v(z(k))\rangle}_{=: \Upsilon_k(z, z(k))} + \Big((2 C_2^2 + C_3^2)\gamma^2 + \frac{\alpha \|\lambda\|_2^2}{2}\Big) n + \gamma S_n + 2\gamma^2 R_n. \quad (19)$$

Notice that $v$ is monotone (for the definition of a monotone operator, see Appendix C), since $V$ is convex. Thus, by Proposition 4, $\tilde v$ is also monotone, implying: $\Upsilon_k(z, z(k)) \ge \langle z(k) - z, \tilde v(z)\rangle$. Moreover, by Slater's condition and a KKT argument, we can find a Lagrangian dual optimizer $\lambda^* \in \mathbb{R}^R_{\ge 0}$ corresponding to the minimizer $\mu^*$ of $V$ over $Q := \{\mu \in \Delta : \Gamma(\mu) \le 0\}$. It follows that $(\mu^*, \lambda^*) \in \mathrm{SOL}(X \times \mathbb{R}^R, \tilde v)$, and consequently:
$$\Upsilon_k(z^*, z(k)) \ge \langle z(k) - z^*, \tilde v(z^*)\rangle \ge 0, \qquad z^* = (\mu^*, \lambda^*). \quad (20)$$

Setting this observation into (19), we obtain:
$$V^{(1)}_n(\mu^*) + V^{(2)}_n(\lambda^*) \le \Big((2 C_2^2 + C_3^2)\gamma^2 + \frac{\alpha \|\lambda^*\|_2^2}{2}\Big) n + \gamma S_n + 2\gamma^2 R_n. \quad (21)$$

Now, since $Y_0 = 0$, we have:
$$V^{(1)}_n(\mu^*) \ge -\sum_{i=1}^N m_i \Big(\max_{\Delta_i} \psi_i - \min_{\Delta_i} \psi_i\Big) \ge -m^* \sum_{i=1}^N \ln(|\mathcal{P}_i|),$$
and thus $V^{(1)}_n(\mu^*) \ge -\Delta_\psi^2/2$. Combining this observation with (21), and using $\Lambda_0 = 0$, we obtain:
$$\frac{\|\Lambda(n) - \lambda^*\|_2^2}{2} \le \frac{\Delta_\psi^2}{2} + \underbrace{\frac{\|\Lambda(0) - \lambda^*\|_2^2}{2}}_{= \|\lambda^*\|_2^2/2} + \Big((2 C_2^2 + C_3^2)\gamma^2 + \frac{\alpha \|\lambda^*\|_2^2}{2}\Big) n + \gamma S_n + 2\gamma^2 R_n = \frac{\Delta_\psi^2}{2} + \frac{(1 + \alpha n)\|\lambda^*\|_2^2}{2} + (2 C_2^2 + C_3^2)\gamma^2 n + \gamma S_n + 2\gamma^2 R_n. \quad (22)$$

Since $S_n$ is a martingale with $\mathbb{E}[S_1] = 0$, taking the expectation (and noticing that $\mathbb{E}[S_n] = 0$) yields the desired result. ∎

E. Proof of consequences of the main result

Proof (of Corollary 2): Jensen's inequality and the triangle inequality assert that:
$$\sqrt{\mathbb{E}\big[\|\Lambda(n) - \lambda^*\|_2^2\big]} \ge \mathbb{E}\big[\|\Lambda(n) - \lambda^*\|_2\big] \ge \mathbb{E}\big[\|\Lambda(n)\|_2\big] - \|\lambda^*\|_2.$$
Applying this to (5), and by the persistence of the noise, we obtain (8). For any $k \in [n]$, we have by Corollary 2:
$$\mathbb{E}\big[\|\Lambda(k)\|_2\big] \le \Delta_\psi + \big(1 + \sqrt{1 + \delta \gamma^2 n}\big)\|\lambda^*\|_2 + (\tilde C_1 + \sigma)\gamma \sqrt{n}.$$
Now, setting our choice of parameters into (8) yields:
$$\mathbb{E}\big[\|\Lambda(k)\|_2\big] \le \Delta_\psi + \big(1 + \sqrt{1 + \delta c^2}\big)\|\lambda^*\|_2 + (\tilde C_1 + \sigma)\, c =: A.$$
Consequently:
$$\frac{\alpha}{\beta}\, \mathbb{E}\Bigg[\sum_{k=0}^{n-1} \|\Lambda(k)\|_2\Bigg] = \frac{\delta c}{\sqrt{n}} \sum_{k=1}^{n-1} \mathbb{E}\big[\|\Lambda(k)\|_2\big] \le \frac{\delta c\,(n-1)}{\sqrt{n}}\, A \le \delta c A \sqrt{n}. \quad (23)$$
Moreover, we have $\mathbb{E}[\|\Lambda(n)\|_2]/\beta \le A\sqrt{n}/c$. Setting this observation and (23) into (10) yields the remaining statement. ∎
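As a numerical sanity check on Lemma 3, the following minimal Python sketch simulates a projected price recursion of the form $\Lambda(\tau+1) = [\Lambda(\tau) + \beta\Gamma(\tau) - \alpha\Lambda(\tau)]_+$, which satisfies the inequality used in the proof since $[x]_+ \ge x$, and verifies the bound (10) along the trajectory. The violation signal $\Gamma$ and all parameter values are synthetic choices of ours, not from the paper:

```python
import math
import random

random.seed(0)
R, K = 4, 500               # number of resources, horizon (hypothetical values)
beta, alpha = 0.1, 0.01     # price step size and price decay

Lam = [0.0] * R             # Lambda(0) = 0, as assumed in Lemma 3
lam_hist = [Lam[:]]
gam_sum = [0.0] * R         # running sums of the violation signals Gamma_r

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

for tau in range(K):
    Gam = [random.uniform(-1.0, 1.0) for _ in range(R)]  # synthetic violations
    gam_sum = [s + g for s, g in zip(gam_sum, Gam)]
    # projected price update; since max(x, 0) >= x, it satisfies
    # Lam(tau+1) >= Lam(tau) + beta*Gam(tau) - alpha*Lam(tau)
    Lam = [max(l + beta * g - alpha * l, 0.0) for l, g in zip(Lam, Gam)]
    lam_hist.append(Lam[:])

    k = tau + 1
    acv = norm2([max(s, 0.0) for s in gam_sum])          # ||[sum Gamma]_+||_2
    bound = (norm2(lam_hist[k])
             + alpha * sum(norm2(lam_hist[t]) for t in range(1, k))) / beta
    assert acv <= bound + 1e-9

print("Lemma 3's bound (10) held at every stage k <=", K)
```

The check mirrors the proof: the cumulative violation is controlled entirely by the price trajectory, without any reference to the individual agents, which is the resource-centric feature emphasized in the abstract.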