Exploiting Channel Memory for Multi-User Wireless Scheduling without Channel Measurement: Capacity Regions and Algorithms

Chih-ping Li, Student Member, IEEE, and Michael J. Neely, Senior Member, IEEE

Abstract—We study the fundamental network capacity of a multi-user wireless downlink under two assumptions: (1) Channels are not explicitly measured and thus instantaneous states are unknown, (2) Channels are modeled as ON/OFF Markov chains. This is an important network model to explore because channel probing may be costly or infeasible in some contexts. In this case, we can use channel memory with ACK/NACK feedback from previous transmissions to improve network throughput. Computing in closed form the capacity region of this network is difficult because it involves solving a high-dimensional partially observed Markov decision problem. Instead, in this paper we construct an inner and an outer bound on the capacity region, showing that the bounds are tight when the number of users is large and the traffic is symmetric. For the case of heterogeneous traffic and any number of users, we propose a simple queue-dependent policy that can stabilize the network with any data rates strictly within the inner capacity bound. The stability analysis uses a novel frame-based Lyapunov drift argument. The outer-bound analysis uses stochastic coupling and state aggregation to bound the performance of a restless bandit problem using a related multi-armed bandit system. Our results are useful in cognitive radio networks, opportunistic scheduling with delayed/uncertain channel state information, and restless bandit problems.

Index Terms—stochastic network optimization, Markovian channels, delayed channel state information (CSI), partially observable Markov decision process (POMDP), cognitive radio, restless bandit, opportunistic spectrum access, queueing theory, Lyapunov analysis.

Chih-ping Li (web: http://www-scf.usc.edu/~chihpinl) and Michael J. Neely (web: http://www-rcf.usc.edu/~mjneely) are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA. This material is supported in part by one or more of the following: the DARPA IT-MANET program grant W911NF-07-0028, the NSF Career grant CCF-0747525, and continuing through participation in the Network Science Collaborative Technology Alliance sponsored by the U.S. Army Research Laboratory. This paper appears in part in [1].

I. INTRODUCTION

Due to the increasing demand for cellular network services, efficient communication over a single-hop wireless downlink has been extensively studied in the past fifteen years. In this paper we study the fundamental network capacity of a time-slotted wireless downlink under the following assumptions: (1) Channels are never explicitly probed, and thus their instantaneous states are never known, (2) Channels are modeled as two-state ON/OFF Markov chains. This network model is important because, due to the energy and timing overhead, learning instantaneous channel states by probing may be costly or infeasible. Even if this is feasible (when channel coherence time is relatively large), the time consumed by channel probing cannot be re-used for data transmission, and transmitting data without probing may achieve higher throughput [2].$^{1}$ In addition, since wireless channels can be adequately modeled as Markov chains [3], [4], we shall take advantage of channel memory to improve network throughput.

Specifically, we consider a time-slotted wireless downlink where a base station serves $N$ users through $N$ (possibly different) positively correlated Markov ON/OFF channels. Channels are never probed so that their instantaneous states are unknown. In every slot, the base station selects at most one user to which it transmits a packet.
We assume every packet transmission takes exactly one slot. Whether the transmission succeeds depends on the unknown state of the channel. At the end of a slot, an ACK/NACK is fed back from the served user to the base station. Since channels are either ON or OFF, this feedback reveals the channel state of the served user in the last slot and provides partial information about future states. Our goal is to characterize all achievable throughput vectors in this network, and to design simple throughput-achieving algorithms.

We define the network capacity region $\Lambda$ as the closure of the set of all achievable throughput vectors. We can compute $\Lambda$ by locating its boundary points. Every boundary point can be computed by formulating a partially observable Markov decision process (POMDP) [5], with information states defined as the probabilities that channels are ON, conditioned on the channel observation history. This approach, however, is computationally prohibitive because the information state space is countably infinite (which we will show later) and grows exponentially fast with $N$.

The first contribution of this paper is that we construct an outer and an inner bound on $\Lambda$. The outer bound comes from analyzing a fictitious channel model in which every scheduling policy yields higher throughput than it does in the real network. The inner bound is the achievable rate region of a special class of randomized round robin policies (introduced in Section IV-A). These policies are simple and take advantage of channel memory. In the case of symmetric channels (that is, channels are i.i.d.) and when the network serves a large number of users, we show that as data rates become more balanced, or in a geometric sense as the direction of the data rate vector in the Euclidean space gets closer to the 45-degree angle, the inner bound converges geometrically fast to the outer bound, and the bounds are tight. This analysis uses results in [6], [7] that derive an outer bound on the maximum sum throughput for a symmetric system.

$^{1}$ One quick example is to consider a time-slotted channel with state space $\{B, G\}$. Suppose channel states are i.i.d. over slots with stationary probabilities $\Pr[B] = 0.2$ and $\Pr[G] = 0.8$. In states $B$ and $G$, at most 1 and 2 packets can be successfully delivered in a slot, respectively. Packet transmissions beyond the capacity all fail and need retransmission. Channel probing can be done in each slot, and consumes a 0.2 fraction of the slot. Then the policy that always probes the channel yields throughput $0.8\,(2 \cdot 0.8 + 1 \cdot 0.2) = 1.44$, while the policy that never probes the channel and always sends packets at rate 2 packets/slot yields throughput $2 \cdot 0.8 = 1.6 > 1.44$.

The inner capacity bound is indeed useful. First, the structure of the bound itself shows how channel memory improves throughput. Second, we show analytically that a large class of intuitively good heuristic policies achieve throughput that is at least as good as this bound, and hence the bound acts as a (non-trivial) performance guarantee. Finally, supporting throughput outside this bound may inevitably involve solving a much more complicated POMDP. Thus, for simplicity and practicality, we may regard the inner bound as an operational network capacity region.

In this paper we also derive a simple queue-dependent dynamic round robin policy that stabilizes the network whenever the arrival rate vector is interior to our inner bound. This policy has polynomial time complexity and is derived by a novel variable-length frame-based Lyapunov analysis, first used in [8] in a different context.
This analysis is important because the inner bound is based on a mixture of many different types of round robin policies, and an offline computation of the proper time average mixtures needed to achieve a given point in this complex inner bound would require solving $\Theta(2^N)$ unknowns in a linear system, which is impractical when $N$ is large. The Lyapunov analysis overcomes this complexity difficulty with online queue-dependent decisions.

The results of this paper apply to the emerging area of opportunistic spectrum access in cognitive radio networks (see [9] and references therein), where the channel occupancy of a primary user acts as a Markov ON/OFF channel to the secondary users. Specifically, our results apply to the important case where each of the secondary users has a designated channel and they cooperate via a centralized controller.

This paper is also a study on efficient scheduling over wireless networks with delayed/uncertain channel state information (CSI) (see [10]–[12] and references therein). The work on delayed CSI that is most closely related to ours is [11], [12], where the authors study the capacity region and throughput-optimal policies of different wireless networks, assuming that channel states are persistently probed but fed back with delay. We note that our paper is significantly different. Here channels are never probed, and new (delayed) CSI of a channel is only acquired when the channel is served. Implicitly, acquiring the delayed CSI of any channel is part of the control decisions in this paper.

This paper is organized as follows. The network model is given in Section II, inner and outer bounds are constructed in Sections III and IV, and compared in Section V in the case of symmetric channels. Section VI gives the queue-dependent policy to achieve the inner bound.

II. NETWORK MODEL

Consider a base station transmitting data to $N$ users through $N$ Markov ON/OFF channels.
Suppose time is slotted with normalized slots $t \in \{0, 1, 2, \ldots\}$. Each channel is modeled as a two-state ON/OFF Markov chain (see Fig. 1). The state evolution of channel $n \in \{1, 2, \ldots, N\}$ follows the transition probability matrix

$$P_n = \begin{bmatrix} P_{n,00} & P_{n,01} \\ P_{n,10} & P_{n,11} \end{bmatrix},$$

where state ON is represented by 1 and OFF by 0, and $P_{n,ij}$ denotes the transition probability from state $i$ to $j$.

Fig. 1. A two-state Markov ON/OFF chain for channel $n \in \{1, 2, \ldots, N\}$, with states ON (1) and OFF (0) and transition probabilities $P_{n,00}$, $P_{n,01}$, $P_{n,10}$, $P_{n,11}$.

We assume $P_{n,11} < 1$ for all $n$ so that no channel is constantly ON. Incorporating constantly ON channels like wired links is easy and thus omitted in this paper. We suppose channel states are fixed in every slot and may only change at slot boundaries. We assume all channels are positively correlated, which, in terms of transition probabilities, is equivalent to assuming $P_{n,11} > P_{n,01}$, or equivalently $P_{n,01} + P_{n,10} < 1$, for all $n$.$^{2}$

We suppose the base station keeps $N$ queues of infinite capacity to store exogenous packet arrivals destined for the $N$ users.
At the beginning of every slot, the base station attempts to transmit a packet (if there is any) to a selected user. We suppose the base station has no channel probing capability and must select users oblivious of the current channel states. If a user is selected and its current channel state is ON, one packet is successfully delivered to that user. Otherwise, the transmission fails and zero packets are served. At the end of a slot in which the base station serves a user, an ACK/NACK message is fed back from the selected user to the base station through an independent error-free control channel, according to whether the transmission succeeds. Failing to receive an ACK is regarded as a NACK. Since channel states are either ON or OFF, such feedback reveals the channel state of the selected user in the last slot.

$^{2}$ The assumption $P_{n,11} > P_{n,01}$ implies that the state $s_n(t)$ of channel $n$ has positive auto-covariance, $E[(s_n(t) - E s_n(t))(s_n(t+1) - E s_n(t+1))] > 0$. In addition, we note that the case $P_{n,11} = P_{n,01}$ corresponds to a channel having i.i.d. states over slots. Although we can naturally incorporate i.i.d. channels into our model and all our results still hold, we exclude them in this paper because we shall show how throughput can be improved by channel memory, which i.i.d. channels do not have. The degenerate case where all channels are i.i.d. over slots is fully solved in [2].

Conditioning on all past channel observations, define the $N$-dimensional information state vector $\omega(t) = (\omega_n(t) : 1 \le n \le N)$, where $\omega_n(t)$ is the conditional probability that channel $n$ is ON in slot $t$. We assume initially $\omega_n(0) = \pi_{n,\text{ON}}$ for all $n$, where $\pi_{n,\text{ON}}$ denotes the stationary probability that channel $n$ is ON. As discussed in [5, Chapter 5.4], vector $\omega(t)$ is a sufficient statistic. That is, instead of tracking the whole system history, the base station can act optimally based only on $\omega(t)$.
The base station shall keep track of the $\{\omega(t)\}$ process. We assume transition probability matrices $P_n$ for all $n$ are known to the base station. We denote by $s_n(t) \in \{\text{OFF}, \text{ON}\}$ the state of channel $n$ in slot $t$. Let $n(t) \in \{1, 2, \ldots, N\}$ denote the user served in slot $t$. Based on the ACK/NACK feedback, vector $\omega(t)$ is updated as follows. For $1 \le n \le N$,

$$\omega_n(t+1) = \begin{cases} P_{n,01}, & \text{if } n = n(t),\ s_n(t) = \text{OFF} \\ P_{n,11}, & \text{if } n = n(t),\ s_n(t) = \text{ON} \\ \omega_n(t) P_{n,11} + (1 - \omega_n(t)) P_{n,01}, & \text{if } n \neq n(t). \end{cases} \tag{1}$$

If in the most recent use of channel $n$, we observed (through feedback) its state was $i \in \{0, 1\}$ in slot $(t - k)$ for some $k \le t$, then $\omega_n(t)$ is equal to the $k$-step transition probability $P^{(k)}_{n,i1}$. In general, for any fixed $n$, probabilities $\omega_n(t)$ take values in the countably infinite set $\mathcal{W}_n = \{P^{(k)}_{n,01}, P^{(k)}_{n,11} : k \in \mathbb{N}\} \cup \{\pi_{n,\text{ON}}\}$. By eigenvalue decomposition on $P_n$ [13, Chapter 4], we can show the $k$-step transition probability matrix $P^{(k)}_n$ is

$$P^{(k)}_n \triangleq \begin{bmatrix} P^{(k)}_{n,00} & P^{(k)}_{n,01} \\ P^{(k)}_{n,10} & P^{(k)}_{n,11} \end{bmatrix} = (P_n)^k = \frac{1}{x_n} \begin{bmatrix} P_{n,10} + P_{n,01}(1 - x_n)^k & P_{n,01}\left(1 - (1 - x_n)^k\right) \\ P_{n,10}\left(1 - (1 - x_n)^k\right) & P_{n,01} + P_{n,10}(1 - x_n)^k \end{bmatrix}, \tag{2}$$

where we have defined $x_n \triangleq P_{n,01} + P_{n,10}$. Assuming that channels are positively correlated, i.e., $x_n < 1$, by (2) we have the following lemma.

Lemma 1. For a positively correlated ($P_{n,11} > P_{n,01}$) Markov ON/OFF channel with transition probability matrix $P_n$, we have
1) The stationary probability $\pi_{n,\text{ON}} = P_{n,01} / x_n$.
2) The $k$-step transition probability $P^{(k)}_{n,01}$ is nondecreasing in $k$ and $P^{(k)}_{n,11}$ is nonincreasing in $k$. Both $P^{(k)}_{n,01}$ and $P^{(k)}_{n,11}$ converge to $\pi_{n,\text{ON}}$ as $k \to \infty$.
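As a numerical check of the update rule (1), the closed form (2), and Lemma 1, the following Python sketch iterates the idle-channel branch of (1) and compares the result with the $k$-step probabilities in (2). The function names and the parameter values $P_{01} = 0.2$, $P_{10} = 0.3$ are illustrative assumptions, not taken from the paper.

```python
def k_step(p01, p10, k):
    """Closed-form (P01^(k), P11^(k)) from (2), with x = P01 + P10."""
    x = p01 + p10
    decay = (1.0 - x) ** k
    return p01 * (1.0 - decay) / x, (p01 + p10 * decay) / x


def update_belief(omega, served, channel_on, p01, p10):
    """One step of (1) for a single channel.

    served:     True if this channel was used in the slot.
    channel_on: observed state (only meaningful when served is True).
    """
    p11 = 1.0 - p10
    if served:
        return p11 if channel_on else p01
    return omega * p11 + (1.0 - omega) * p01


# Hypothetical example values (assumed for illustration only).
P01, P10 = 0.2, 0.3
PI_ON = P01 / (P01 + P10)  # stationary ON probability, Lemma 1

# Observe OFF (resp. ON) once, then leave the channel idle afterwards.
after_off = [update_belief(0.0, True, False, P01, P10)]
after_on = [update_belief(0.0, True, True, P01, P10)]
for _ in range(1, 30):
    after_off.append(update_belief(after_off[-1], False, None, P01, P10))
    after_on.append(update_belief(after_on[-1], False, None, P01, P10))

# Entry k-1 of each list is the belief k slots after the observation,
# which should coincide with the k-step probabilities in (2).
for k in range(1, 31):
    p01_k, p11_k = k_step(P01, P10, k)
    assert abs(after_off[k - 1] - p01_k) < 1e-12
    assert abs(after_on[k - 1] - p11_k) < 1e-12
```

Under these assumed values, the beliefs in `after_off` rise and those in `after_on` fall toward `PI_ON`, which is the behavior stated in Lemma 1.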
As a corollary of Lemma 1, it follows that

$$P_{n,11} \ge P^{(k_1)}_{n,11} \ge P^{(k_2)}_{n,11} \ge \pi_{n,\text{ON}} \ge P^{(k_3)}_{n,01} \ge P^{(k_4)}_{n,01} \ge P_{n,01} \tag{3}$$

for any integers $k_1 \le k_2$ and $k_3 \ge k_4$ (see Fig. 2).

Fig. 2. Diagram of the $k$-step transition probabilities $P^{(k)}_{n,01}$ and $P^{(k)}_{n,11}$ of a positively correlated Markov ON/OFF channel.

To maximize network throughput, (3) has some fundamental implications. We note that $\omega_n(t)$ represents the transmission success probability over channel $n$ in slot $t$. Thus we shall keep serving a channel whenever its information state is $P_{n,11}$, for it is the best state possible. Second, given that a channel was OFF in its last use, its information state improves as long as the channel remains idle. Thus we shall wait as long as possible before reusing such a channel. Actually, when channels are symmetric ($P_n = P$ for all $n$), it is shown that a myopic policy with this structure maximizes the sum throughput of the network [7].

III. A ROUND ROBIN POLICY

For any integer $M \in \{1, 2, \ldots, N\}$, we present a special round robin policy $\mathsf{RR}(M)$ serving the first $M$ users $\{1, 2, \ldots, M\}$ in the network. The $M$ users are served in the circular order $1 \to 2 \to \cdots \to M \to 1 \to \cdots$. In general, we can use this policy to serve any subset of users. This policy is the fundamental building block of all the results in this paper.

A. The Policy

Round Robin Policy $\mathsf{RR}(M)$:
1) At time 0, the base station starts with channel 1. Suppose initially $\omega_n(0) = \pi_{n,\text{ON}}$ for all $n$.
2) Suppose at time $t$, the base station switches to channel $n$. Transmit a data packet to user $n$ with probability $P^{(M)}_{n,01} / \omega_n(t)$ and a dummy packet otherwise. In both cases, we receive ACK/NACK information at the end of the slot.
3) At time $(t + 1)$, if a dummy packet is sent at time $t$, switch to channel $(n \bmod M) + 1$ and go to Step 2. Otherwise, keep transmitting data packets over channel $n$ until we receive a NACK. Then switch to channel $(n \bmod M) + 1$ and go to Step 2. We note that dummy packets are only sent on the first slot every time the base station switches to a new channel.
4) Update $\omega(t)$ according to (1) in every slot.

Step 2 of $\mathsf{RR}(M)$ only makes sense if $\omega_n(t) \ge P^{(M)}_{n,01}$, which we prove in the next lemma.

Lemma 2.
Under $\mathsf{RR}(M)$, whenever the base station switches to channel $n \in \{1, 2, \ldots, M\}$ for another round of transmission, its current information state satisfies $\omega_n(t) \ge P^{(M)}_{n,01}$.

Proof of Lemma 2: See Appendix A.

We note that policy $\mathsf{RR}(M)$ is very conservative and not throughput-optimal. For example, we can improve the throughput by always sending data packets but no dummy ones. Also, it does not follow the guidelines we provide at the end of Section II for maximum throughput. Yet, we will see that, in the case of symmetric channels, throughput under $\mathsf{RR}(M)$ is close to optimal when $M$ is large. Moreover, the underlying analysis of $\mathsf{RR}(M)$ is tractable so that we can mix such round robin policies over different subsets of users to form a non-trivial inner capacity bound. The tractability of $\mathsf{RR}(M)$ comes from its equivalence to the following fictitious round robin policy (which can be proved as a corollary of Lemma 3 provided later).

Equivalent Fictitious Round Robin:
1) At time 0, start with channel 1.
2) When the base station switches to channel $n$, set its current information state to $P^{(M)}_{n,01}$.$^{3}$ Keep transmitting data packets over channel $n$ until we receive a NACK. Then switch to channel $(n \bmod M) + 1$ and repeat Step 2.

For any round robin policy that serves channels in the circular order $1 \to 2 \to \cdots \to M \to 1 \to \cdots$, the technique of resetting the information state to $P^{(M)}_{n,01}$ creates a system with an information state that is worse than the information state under the actual system. To see this, since in the actual system channels are served in the circular order, after we switch away from serving a particular channel $n$, we serve the other $(M - 1)$ channels for at least one slot each, and so we return to channel $n$ after at least $M$ slots. Thus, its starting information state is always at least $P^{(M)}_{n,01}$ (the proof is similar to that of Lemma 2).
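The steps of $\mathsf{RR}(M)$ in Section III-A can be sketched as a small slot-by-slot simulation. The Python code below is a minimal sketch rather than the authors' implementation: it assumes statistically identical channels for brevity, uses 0-based channel indices, and all names (`simulate_rr`, `min_slack`, and so on) and parameter values are hypothetical. It also records the smallest value of $\omega_n(t) - P^{(M)}_{n,01}$ seen at a switch, which Lemma 2 says should be nonnegative (up to floating-point rounding).

```python
import random


def k_step_p01(p01, p10, k):
    """k-step OFF-to-ON probability P01^(k) from (2)."""
    x = p01 + p10
    return p01 * (1.0 - (1.0 - x) ** k) / x


def simulate_rr(p01, p10, M, num_slots, seed=0):
    """Run RR(M) on M statistically identical channels.

    Returns (data packets delivered per slot,
             smallest omega_n(t) - P01^(M) observed at a switch).
    """
    rng = random.Random(seed)
    p11 = 1.0 - p10
    pi_on = p01 / (p01 + p10)
    target = k_step_p01(p01, p10, M)

    state = [rng.random() < pi_on for _ in range(M)]  # hidden ON/OFF states
    omega = [pi_on] * M                               # information states
    n = 0                 # channel currently held (0-based index)
    just_switched = True  # Step 2 applies on the first slot of a visit
    send_data = False
    delivered = 0
    min_slack = float("inf")

    for _ in range(num_slots):
        if just_switched:
            # Step 2: data packet w.p. P01^(M) / omega_n(t), else dummy.
            min_slack = min(min_slack, omega[n] - target)
            send_data = rng.random() < target / omega[n]

        ack = state[n]  # ACK iff the served channel is ON in this slot
        if send_data and ack:
            delivered += 1

        # Step 4: update every information state according to (1).
        for m in range(M):
            if m == n:
                omega[m] = p11 if ack else p01
            else:
                omega[m] = omega[m] * p11 + (1.0 - omega[m]) * p01

        # Step 3: stay only after a successfully delivered data packet.
        if send_data and ack:
            just_switched = False
        else:
            n = (n + 1) % M
            just_switched = True

        # Channels evolve at the slot boundary.
        state = [rng.random() < (p11 if s else p01) for s in state]

    return delivered / num_slots, min_slack


# Hypothetical parameters (assumed for illustration only).
rate, slack = simulate_rr(p01=0.2, p10=0.3, M=4, num_slots=50_000)
```

Here `rate` is the empirical number of data packets delivered per slot, and `slack` staying at or above zero (within rounding) is consistent with Lemma 2.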
Intuitively, since information states represent the packet transmission success probabilities, resetting them to lower values degrades throughput. This is the reason why our inner capacity bound constructed later using $\mathsf{RR}(M)$ provides a throughput lower bound for a large class of policies.

$^{3}$ In reality we cannot set the information state of a channel, and therefore the policy is fictitious.

B. Network Throughput under $\mathsf{RR}(M)$

Next we analyze the throughput vector achieved by $\mathsf{RR}(M)$.

1) General Case: Under $\mathsf{RR}(M)$, let $L_{kn}$ denote the duration of the $k$th time the base station stays with channel $n$. A sample path of the $\{L_{kn}\}$ process is

$$(\underbrace{L_{11}, L_{12}, \ldots, L_{1M}}_{\text{round } k = 1},\ \underbrace{L_{21}, L_{22}, \ldots, L_{2M}}_{\text{round } k = 2},\ L_{31}, \ldots). \tag{4}$$

The next lemma presents useful properties of $L_{kn}$, which serve as the foundation of the throughput analysis in the rest of the paper.

Lemma 3. For any integer $k$ and $n \in \{1, 2, \ldots, M\}$,
1) The probability mass function of $L_{kn}$ is independent of $k$, and is

$$L_{kn} = \begin{cases} 1 & \text{with prob. } 1 - P^{(M)}_{n,01} \\ j \ge 2 & \text{with prob. } P^{(M)}_{n,01} (P_{n,11})^{(j-2)} P_{n,10}. \end{cases}$$

As a result, for all $k \in \mathbb{N}$ we have

$$E[L_{kn}] = 1 + \frac{P^{(M)}_{n,01}}{P_{n,10}} = 1 + \frac{P_{n,01}\left(1 - (1 - x_n)^M\right)}{x_n P_{n,10}}.$$

2) The number of data packets served in $L_{kn}$ is $(L_{kn} - 1)$.
3) For every fixed channel $n$, time durations $L_{kn}$ are i.i.d. random variables over all $k$.

Proof of Lemma 3:
1) Note that $L_{kn} = 1$ if, on the first slot of serving channel $n$, either a dummy packet is transmitted or a data packet is transmitted but the channel is OFF. This event occurs with probability

$$\left(1 - \frac{P^{(M)}_{n,01}}{\omega_n(t)}\right) + \frac{P^{(M)}_{n,01}}{\omega_n(t)} \left(1 - \omega_n(t)\right) = 1 - P^{(M)}_{n,01}.$$

Next, $L_{kn} = j \ge 2$ if in the first slot a data packet is successfully served, and this is followed by $(j - 2)$ consecutive ON slots and one OFF slot. This happens with probability $P^{(M)}_{n,01} (P_{n,11})^{(j-2)} P_{n,10}$.
The expectation of $L_{kn}$ can be directly computed from the probability mass function.
2) One data packet is served in every slot of $L_{kn}$ except the last one (when a dummy packet is sent over channel $n$, we have $L_{kn} = 1$ and zero data packets are served).
3) At the beginning of every $L_{kn}$, we observe from the equivalent fictitious round robin policy that $\mathsf{RR}(M)$ effectively fixes $P^{(M)}_{n,01}$ as the current information state, regardless of the true current state $\omega_n(t)$. Neglecting $\omega_n(t)$ amounts to discarding all system history, including all past $L_{k'n}$ for $k' < k$. Thus the $L_{kn}$ are i.i.d. Specifically, for any $k' < k$ and integers $l_{k'}$ and $l_k$ we have
$$\Pr[L_{kn} = l_k \mid L_{k'n} = l_{k'}] = \Pr[L_{kn} = l_k].$$

Now we can derive the throughput vector supported by $\mathsf{RR}(M)$. Fix an integer $K > 0$. By Lemma 3, the time average throughput over channel $n$ after all channels finish their $K$th rounds, which we denote by $\mu_n(K)$, is
$$\mu_n(K) \triangleq \frac{\sum_{k=1}^{K} (L_{kn} - 1)}{\sum_{k=1}^{K} \sum_{n=1}^{M} L_{kn}}.$$
Passing $K \to \infty$, we get
$$\lim_{K\to\infty} \mu_n(K) = \lim_{K\to\infty} \frac{(1/K)\sum_{k=1}^{K} (L_{kn} - 1)}{\sum_{n=1}^{M} (1/K) \sum_{k=1}^{K} L_{kn}} \overset{(a)}{=} \frac{E[L_{1n}] - 1}{\sum_{n=1}^{M} E[L_{1n}]} \overset{(b)}{=} \frac{P_{n,01}(1-(1-x_n)^M)/(x_n P_{n,10})}{M + \sum_{n=1}^{M} P_{n,01}(1-(1-x_n)^M)/(x_n P_{n,10})}, \tag{5}$$
where (a) is by the Law of Large Numbers (noting by Lemma 3 that $L_{kn}$ are i.i.d. over $k$), and (b) is by Lemma 3.

2) Symmetric Case: We are particularly interested in the sum throughput under $\mathsf{RR}(M)$ when channels are symmetric, that is, all channels have the same statistics $\mathbf{P}_n = \mathbf{P}$ for all $n$. In this case, by channel symmetry every channel has the same throughput. From (5), the sum throughput is
$$\sum_{n=1}^{M} \lim_{K\to\infty} \mu_n(K) = \frac{P_{01}(1-(1-x)^M)}{x P_{10} + P_{01}(1-(1-x)^M)},$$
where in the last term the subscript $n$ is dropped due to channel symmetry.
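The derivation of (5) can be checked numerically. The sketch below encodes the probability mass function of Lemma 3, confirms that it sums to one with mean $1 + P^{(M)}_{n,01}/P_{n,10}$, and evaluates the throughput vector in (5). The helper names (`p_m_01`, `stay_pmf`, `mean_stay`, `rr_throughput`) and the channel parameters are hypothetical choices made for illustration.

```python
def p_m_01(p01, p10, M):
    """M-step OFF->ON probability P^(M)_01, with x = p01 + p10."""
    x = p01 + p10
    return p01 * (1.0 - (1.0 - x) ** M) / x


def stay_pmf(j, p01, p10, M):
    """Pr[L_kn = j] from Lemma 3, for integer j >= 1."""
    pm = p_m_01(p01, p10, M)
    if j == 1:
        return 1.0 - pm
    p11 = 1.0 - p10
    return pm * p11 ** (j - 2) * p10


def mean_stay(p01, p10, M):
    """E[L_kn] = 1 + P^(M)_01 / P_10."""
    return 1.0 + p_m_01(p01, p10, M) / p10


def rr_throughput(channels):
    """Per-channel throughput under RR(M) as in (5).

    channels: list of (p01, p10) pairs, one pair per served channel.
    """
    M = len(channels)
    stays = [mean_stay(p01, p10, M) for p01, p10 in channels]
    total = sum(stays)
    return [(s - 1.0) / total for s in stays]


if __name__ == "__main__":
    # Illustrative heterogeneous example (values are assumptions)
    print(rr_throughput([(0.2, 0.2), (0.1, 0.3)]))
    # Symmetric example: the entries sum to c_M for M = 2
    print(sum(rr_throughput([(0.2, 0.2), (0.2, 0.2)])))
```

In the symmetric call, each channel has $E[L] = 2.6$, so each throughput is $1.6/5.2$ and the sum equals $8/13$, matching the closed-form sum throughput above.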
It is handy to define a function $c(\cdot): \mathbb{N} \to \mathbb{R}$ as
$$c_M \triangleq \frac{P_{01}(1-(1-x)^M)}{x P_{10} + P_{01}(1-(1-x)^M)}, \quad x \triangleq P_{01} + P_{10}, \tag{6}$$
and define $c_\infty \triangleq \lim_{M\to\infty} c_M = P_{01}/(x P_{10} + P_{01})$ (note that $x < 1$ because every channel is positively correlated over time slots). The function $c(\cdot)$ will be used extensively in this paper. We summarize the above derivation in the next lemma.

Lemma 4. Policy $\mathsf{RR}(M)$ serves channel $n \in \{1, 2, \ldots, M\}$ with throughput
$$\frac{P_{n,01}(1-(1-x_n)^M)/(x_n P_{n,10})}{M + \sum_{n=1}^{M} P_{n,01}(1-(1-x_n)^M)/(x_n P_{n,10})}.$$
In particular, in symmetric channels the sum throughput under $\mathsf{RR}(M)$ is $c_M$, defined as
$$c_M = \frac{P_{01}(1-(1-x)^M)}{x P_{10} + P_{01}(1-(1-x)^M)}, \quad x = P_{01} + P_{10},$$
and every channel has throughput $c_M / M$.

We remark that the sum throughput $c_M$ of $\mathsf{RR}(M)$ in the symmetric case is nondecreasing in $M$, and thus can be improved by serving more channels. Interestingly, the sum throughput is improved by multiuser diversity in the network, even though instantaneous channel states are never known.

C. How Good is $\mathsf{RR}(M)$?

Next, in symmetric channels, we quantify how close the sum throughput $c_M$ is to optimal. The following lemma presents a useful upper bound on the maximum sum throughput.

Lemma 5 ([6], [7]). In symmetric channels, any scheduling policy that conforms to our model has sum throughput less than or equal to $c_\infty$ (see footnote 4).

By Lemmas 4 and 5, the loss of the sum throughput of $\mathsf{RR}(M)$ is no larger than $c_\infty - c_M$. Define $\tilde{c}_M$ as
$$\tilde{c}_M \triangleq \frac{P_{01}(1-(1-x)^M)}{x P_{10} + P_{01}} = c_\infty \big(1-(1-x)^M\big)$$
and note that $\tilde{c}_M \le c_M \le c_\infty$. It follows that
$$c_\infty - c_M \le c_\infty - \tilde{c}_M = c_\infty (1-x)^M. \tag{7}$$
The last term of (7) decreases to zero geometrically fast as $M$ increases. This indicates that $\mathsf{RR}(M)$ yields near-optimal sum throughput even when it serves only a moderately large number of channels.

Footnote 4: We note that the throughput analysis in [6] makes a minor assumption on the existence of some limiting time average. Using ideas similar to those of [6], in Theorem 2 of Section IV-C we will construct an upper bound on the maximum sum throughput for general positively correlated Markov ON/OFF channels. When restricted to the symmetric case, we get the same upper bound without any assumption.

IV. RANDOMIZED ROUND ROBIN POLICY, INNER AND OUTER CAPACITY BOUND

A. Randomized Round Robin Policy

Lemma 4 specifies the throughput vector achieved by implementing $\mathsf{RR}(M)$ over a particular collection of $M$ channels. Here we are interested in the set of throughput vectors achievable by randomly mixing $\mathsf{RR}(M)$-like policies over different channel subsets and allowing a different round-robin ordering on each subset. To generalize the $\mathsf{RR}(M)$ policy, first let $\Phi$ denote the set of all $N$-dimensional binary vectors excluding the all-zero vector $(0, 0, \ldots, 0)$. For any binary vector $\boldsymbol{\phi} = (\phi_1, \phi_2, \ldots, \phi_N)$ in $\Phi$, we say channel $n$ is active in $\boldsymbol{\phi}$ if $\phi_n = 1$. Each vector $\boldsymbol{\phi} \in \Phi$ represents a different subset of active channels. We denote by $M(\boldsymbol{\phi})$ the number of active channels in $\boldsymbol{\phi}$.

For each $\boldsymbol{\phi} \in \Phi$, consider the following round robin policy $\mathsf{RR}(\boldsymbol{\phi})$ that serves the active channels in $\boldsymbol{\phi}$ in every round.

Dynamic Round Robin Policy $\mathsf{RR}(\boldsymbol{\phi})$:
1) Deciding the service order in each round: At the beginning of each round, we denote by $\tau_n$ the time duration between the last use of channel $n$ and the beginning of the current round. Active channels in $\boldsymbol{\phi}$ are served in decreasing order of $\tau_n$ in this round (in other words, the active channel that is least recently used is served first).
2) On each active channel in a round:
a) Suppose at time $t$ the base station switches to channel $n$. Transmit a data packet to user $n$ with probability $P^{(M(\boldsymbol{\phi}))}_{n,01} / \omega_n(t)$ and a dummy packet otherwise.
In both cases, we receive ACK/NACK information at the end of the slot.
b) At time $(t+1)$, if a dummy packet was sent at time $t$, switch to the next active channel following the order given in Step 1. Otherwise, keep transmitting data packets over channel $n$ until we receive a NACK. Then switch to the next active channel and go to Step 2a. We note that dummy packets are only sent on the first slot every time the base station switches to a new channel.
3) Update $\boldsymbol{\omega}(t)$ according to (1) in every slot.

Using $\mathsf{RR}(\boldsymbol{\phi})$ as building blocks, we consider the following class of randomized round robin policies.

Randomized Round Robin Policy $\mathsf{RandRR}$:
1) Pick $\boldsymbol{\phi} \in \Phi$ with probability $\alpha_{\boldsymbol{\phi}}$, where $\sum_{\boldsymbol{\phi} \in \Phi} \alpha_{\boldsymbol{\phi}} = 1$.
2) Run policy $\mathsf{RR}(\boldsymbol{\phi})$ for one round. Then go to Step 1.

Note that active channels may be served in a different order in different rounds, according to the least-recently-used service order. This allows more time for OFF channels to return to better information states (note that $P^{(k)}_{n,01}$ is nondecreasing in $k$) and thus improves throughput. The next lemma guarantees the feasibility of executing any $\mathsf{RR}(\boldsymbol{\phi})$ policy in $\mathsf{RandRR}$ (similar to Lemma 2, whenever the base station switches to a new channel $n$, we need $\omega_n(t) \ge P^{(M(\boldsymbol{\phi}))}_{n,01}$ in Step 2a of $\mathsf{RR}(\boldsymbol{\phi})$).

Lemma 6. When $\mathsf{RR}(\boldsymbol{\phi})$ is chosen by $\mathsf{RandRR}$ for a new round of transmission, every active channel $n$ in $\boldsymbol{\phi}$ starts with information state no worse than $P^{(M(\boldsymbol{\phi}))}_{n,01}$.

Proof of Lemma 6: See Appendix B.

Although $\mathsf{RandRR}$ randomly selects subsets of users and serves them in an order that depends on previous choices, its throughput can, perhaps surprisingly, still be analyzed. This is done by using the throughput analysis of $\mathsf{RR}(M)$, as shown in the following corollary to Lemma 3:

Corollary 1. For each policy $\mathsf{RR}(\boldsymbol{\phi})$, $\boldsymbol{\phi} \in \Phi$, within time periods in which $\mathsf{RR}(\boldsymbol{\phi})$ is executed by $\mathsf{RandRR}$, denote by $L^{\boldsymbol{\phi}}_{kn}$ the duration of the $k$th time the base station stays with active channel $n$.
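Then:
1) The probability mass function of $L^{\boldsymbol{\phi}}_{kn}$ is independent of $k$, and is
$$L^{\boldsymbol{\phi}}_{kn} = \begin{cases} 1 & \text{with prob. } 1 - P^{(M(\boldsymbol{\phi}))}_{n,01}, \\ j \ge 2 & \text{with prob. } P^{(M(\boldsymbol{\phi}))}_{n,01}\,(P_{n,11})^{j-2}\,P_{n,10}. \end{cases}$$
As a result, for all $k \in \mathbb{N}$ we have
$$E\big[L^{\boldsymbol{\phi}}_{kn}\big] = 1 + \frac{P^{(M(\boldsymbol{\phi}))}_{n,01}}{P_{n,10}}. \tag{8}$$
2) The number of data packets served in $L^{\boldsymbol{\phi}}_{kn}$ is $(L^{\boldsymbol{\phi}}_{kn} - 1)$.
3) For every fixed $\boldsymbol{\phi}$ and every fixed active channel $n$ in $\boldsymbol{\phi}$, the time durations $L^{\boldsymbol{\phi}}_{kn}$ are i.i.d. random variables over all $k$.

B. Achievable Network Capacity: An Inner Capacity Bound

Using Corollary 1, we next present the achievable rate region of the class of $\mathsf{RandRR}$ policies. For each $\mathsf{RR}(\boldsymbol{\phi})$ policy, define an $N$-dimensional vector $\boldsymbol{\eta}^{\boldsymbol{\phi}} = (\eta^{\boldsymbol{\phi}}_1, \eta^{\boldsymbol{\phi}}_2, \ldots, \eta^{\boldsymbol{\phi}}_N)$ where
$$\eta^{\boldsymbol{\phi}}_n \triangleq \begin{cases} \dfrac{E[L^{\boldsymbol{\phi}}_{1n}] - 1}{\sum_{n: \phi_n = 1} E[L^{\boldsymbol{\phi}}_{1n}]} & \text{if channel } n \text{ is active in } \boldsymbol{\phi}, \\ 0 & \text{otherwise,} \end{cases} \tag{9}$$
where $E[L^{\boldsymbol{\phi}}_{1n}]$ is given in (8). Intuitively, by the analysis prior to Lemma 4, round robin policy $\mathsf{RR}(\boldsymbol{\phi})$ yields throughput $\eta^{\boldsymbol{\phi}}_n$ over channel $n$ for each $n \in \{1, 2, \ldots, N\}$. Incorporating all possible random mixtures of $\mathsf{RR}(\boldsymbol{\phi})$ policies for different $\boldsymbol{\phi}$, $\mathsf{RandRR}$ can support any data rate vector that is entrywise dominated by a convex combination of the vectors $\{\boldsymbol{\eta}^{\boldsymbol{\phi}}\}_{\boldsymbol{\phi} \in \Phi}$, as shown by the next theorem.

Theorem 1 (Generalized Inner Capacity Bound). The class of $\mathsf{RandRR}$ policies supports all data rate vectors $\boldsymbol{\lambda}$ in the set $\Lambda_{\text{int}}$ defined as
$$\Lambda_{\text{int}} \triangleq \Big\{ \boldsymbol{\lambda} \;\Big|\; \mathbf{0} \le \boldsymbol{\lambda} \le \boldsymbol{\mu},\ \boldsymbol{\mu} \in \operatorname{conv}\big(\{\boldsymbol{\eta}^{\boldsymbol{\phi}}\}_{\boldsymbol{\phi} \in \Phi}\big) \Big\},$$
where $\boldsymbol{\eta}^{\boldsymbol{\phi}}$ is defined in (9), $\operatorname{conv}(A)$ denotes the convex hull of set $A$, and $\le$ is taken entrywise.

Proof of Theorem 1: See Appendix C.

Applying Theorem 1 to symmetric channels yields the following corollary.

Corollary 2 (Inner Capacity Bound for Symmetric Channels). In symmetric channels, the class of $\mathsf{RandRR}$ policies supports all rate vectors $\boldsymbol{\lambda} \in \Lambda_{\text{int}}$ where
$$\Lambda_{\text{int}} = \Big\{ \boldsymbol{\lambda} \;\Big|\; \mathbf{0} \le \boldsymbol{\lambda} \le \boldsymbol{\mu},\ \boldsymbol{\mu} \in \operatorname{conv}\Big(\Big\{ \frac{c_{M(\boldsymbol{\phi})}}{M(\boldsymbol{\phi})}\, \boldsymbol{\phi} \Big\}_{\boldsymbol{\phi} \in \Phi}\Big) \Big\},$$
where $c_{M(\boldsymbol{\phi})}$ is defined in (6).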
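As an illustration of Theorem 1, the following sketch enumerates the vectors $\boldsymbol{\eta}^{\boldsymbol{\phi}}$ of (9), whose convex hull (together with entrywise domination) defines $\Lambda_{\text{int}}$. It is a minimal sketch under the assumption that $P^{(k)}_{n,01} = P_{n,01}(1-(1-x_n)^k)/x_n$; the function names and the example parameters are hypothetical.

```python
from itertools import product


def k_step_p01(p01, p10, k):
    """k-step OFF->ON probability, with x = p01 + p10."""
    x = p01 + p10
    return p01 * (1.0 - (1.0 - x) ** k) / x


def eta(channels, phi):
    """Throughput vector eta^phi of RR(phi), following (8) and (9).

    channels: list of (p01, p10) pairs for the N channels.
    phi: tuple of 0/1 flags marking the active channels.
    """
    m = sum(phi)  # M(phi), the number of active channels
    stays = {
        n: 1.0 + k_step_p01(p01, p10, m) / p10  # E[L^phi_{1n}] from (8)
        for n, (p01, p10) in enumerate(channels)
        if phi[n]
    }
    total = sum(stays.values())
    return [(stays[n] - 1.0) / total if phi[n] else 0.0
            for n in range(len(channels))]


def inner_bound_vertices(channels):
    """Map every nonzero binary vector phi to its eta^phi."""
    n_channels = len(channels)
    return {phi: eta(channels, phi)
            for phi in product((0, 1), repeat=n_channels) if any(phi)}


if __name__ == "__main__":
    for phi, vec in inner_bound_vertices([(0.2, 0.2), (0.2, 0.2)]).items():
        print(phi, vec)
```

For symmetric channels each vector reduces to $(c_{M(\boldsymbol{\phi})}/M(\boldsymbol{\phi}))\,\boldsymbol{\phi}$, in agreement with Corollary 2. The number of vectors grows as $2^N - 1$, so direct enumeration is practical only for small $N$.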
An example of the inner capacity bound and a simple queue-dependent dynamic policy that supports all data rates within this nontrivial inner bound will be provided later.

C. Outer Capacity Bound

We construct an outer bound on $\Lambda$ using several novel ideas. First, by state aggregation, we transform the information state process $\{\omega_n(t)\}$ for each channel $n$ into non-stationary two-state Markov chains (in Fig. 4 provided later). Second, we create a set of bounding stationary Markov chains (in Fig. 5 provided later), which has the structure of a multi-armed bandit system. Finally, we create an outer capacity bound by relating the bounding model to the original non-stationary Markov chains using stochastic coupling. We note that since the control of the set of information state processes $\{\omega_n(t)\}$ for all $n$ can be viewed as a restless bandit problem [14], it is interesting to see how we bound the optimal performance of a restless bandit problem by a related multi-armed bandit system.

We first map channel information states $\omega_n(t)$ into modes for each $n \in \{1, 2, \ldots, N\}$. Inspired by (3), we observe that each channel $n$ must be in one of the following two modes:
M1: The last observed state is ON, and the channel has not been seen (through feedback) to turn OFF. In this mode the information state $\omega_n(t) \in [\pi_{n,\text{ON}}, P_{n,11}]$.
M2: The last observed state is OFF, and the channel has not been seen to turn ON. Here $\omega_n(t) \in [P_{n,01}, \pi_{n,\text{ON}}]$.

On channel $n$, recall that $\mathcal{W}_n$ is the state space of $\omega_n(t)$, and define a map $f_n: \mathcal{W}_n \to \{\text{M1}, \text{M2}\}$ where
$$f_n(\omega_n(t)) = \begin{cases} \text{M1} & \text{if } \omega_n(t) \in (\pi_{n,\text{ON}}, P_{n,11}], \\ \text{M2} & \text{if } \omega_n(t) \in [P_{n,01}, \pi_{n,\text{ON}}]. \end{cases}$$
This mapping is illustrated in Fig. 3.
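The map $f_n$ can be written as a small function. The sketch below assumes $\pi_{\text{ON}} = P_{01}/(P_{01} + P_{10})$ and $P_{11} = 1 - P_{10}$ for a two-state ON/OFF chain; the function names and the sample values are illustrative only.

```python
def stationary_on(p01, p10):
    """Stationary ON probability of a two-state ON/OFF Markov chain."""
    return p01 / (p01 + p10)


def mode(omega, p01, p10):
    """Map an information state omega to its mode, following f_n.

    Returns "M1" if omega lies in (pi_ON, P11] and "M2" if omega lies
    in [P01, pi_ON]. Values outside [P01, P11] are rejected because they
    are not valid information states for a positively correlated channel.
    """
    pi_on = stationary_on(p01, p10)
    p11 = 1.0 - p10
    if pi_on < omega <= p11:
        return "M1"
    if p01 <= omega <= pi_on:
        return "M2"
    raise ValueError("omega is outside the interval [P01, P11]")


if __name__ == "__main__":
    # With P01 = P10 = 0.2: pi_ON = 0.5 and P11 = 0.8 (illustrative values)
    print(mode(0.7, 0.2, 0.2))  # a state above pi_ON
    print(mode(0.3, 0.2, 0.2))  # a state below pi_ON
```

The mode only records which side of $\pi_{\text{ON}}$ the information state lies on, which is exactly the aggregation used to build the two-state chains discussed next.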
Fig. 3. The mapping $f_n$ from information states $\omega_n(t)$ to modes $\{\text{M1}, \text{M2}\}$.

For any information state process $\{\omega_n(t)\}$ (controlled by some scheduling policy), the corresponding mode transition process under $f_n$ can be represented by the Markov chains shown in Fig. 4. Specifically, when channel $n$ is served in a slot, the associated mode transition follows the upper non-stationary chain of Fig. 4. When channel $n$ is idled in a slot, the mode transition follows the lower stationary chain of Fig. 4. In the upper chain of Fig. 4, regardless of the current mode, mode M1 is visited in the next slot if and only if channel $n$ is ON in the current slot, which occurs with probability $\omega_n(t)$. In the lower chain of Fig. 4, when channel $n$ is idled, its information state changes from a $k$-step transition probability to the $(k+1)$-step transition probability with the same most recently observed channel state. Therefore, the next mode stays the same as the current mode. We emphasize that, in the upper chain of Fig. 4, at mode M1 we always have $\omega_n(t) \le P_{n,11}$, and at mode M2 we have $\omega_n(t) \le \pi_{n,\text{ON}}$. A packet is served if and only if M1 is visited in the upper chain of Fig. 4.

[Fig. 4 panels. When channel $n$ is served in a slot: from either mode, the chain moves to M1 with probability $\omega_n(t)$ and to M2 with probability $1 - \omega_n(t)$. When channel $n$ is idled in a slot: each mode is kept with probability 1.]
Fig. 4. Mode transition diagrams for the real channel $n$.

[Fig. 5 panels. When channel $n$ is served in a slot: M1 moves to M1 with probability $P_{n,11}$ and to M2 with probability $1 - P_{n,11}$; M2 moves to M1 with probability $\pi_{n,\text{ON}}$ and stays in M2 with probability $1 - \pi_{n,\text{ON}}$. When channel $n$ is idled in a slot: each mode is kept with probability 1.]
Fig. 5. Mode transition diagrams for the fictitious channel $n$.
To upper bound throughput, we compare Fig. 4 to the mode transition diagrams in Fig. 5, which correspond to a fictitious model for channel $n$. This fictitious channel has constant information state $\omega_n(t) = P_{n,11}$ whenever it is in mode M1, and $\omega_n(t) = \pi_{n,\text{ON}}$ whenever it is in M2.
In other words, when the fictitious channel $n$ is in mode M1 (or M2), it sets its current information state to be the best state possible when the corresponding real channel $n$ is in the same mode. It follows that, when both the real and the fictitious channel $n$ are served, the probabilities of transitions M1 → M1 and M2 → M1 in the upper chain of Fig. 5 are greater than or equal to those in Fig. 4, respectively. In other words, the upper chain of Fig. 5 is more likely to go to mode M1 and serve packets than that of Fig. 4. Therefore, intuitively, if we serve both the real and the fictitious channel $n$ in the same infinite sequence of time slots, the fictitious channel $n$ will yield throughput at least as high for all $n$. This observation is made precise by the next lemma.

Lemma 7. Consider two discrete-time Markov chains $\{X(t)\}$ and $\{Y(t)\}$, both with state space $\{0, 1\}$. Suppose $\{X(t)\}$ is stationary and ergodic with transition probability matrix
$$\mathbf{P} = \begin{bmatrix} P_{00} & P_{01} \\ P_{10} & P_{11} \end{bmatrix},$$
and $\{Y(t)\}$ is non-stationary with
$$\mathbf{Q}(t) = \begin{bmatrix} Q_{00}(t) & Q_{01}(t) \\ Q_{10}(t) & Q_{11}(t) \end{bmatrix}.$$
Assume $P_{01} \ge Q_{01}(t)$ and $P_{11} \ge Q_{11}(t)$ for all $t$. In $\{X(t)\}$, let $\pi_X(1)$ denote the stationary probability of state 1; $\pi_X(1) = P_{01}/(P_{01} + P_{10})$. In $\{Y(t)\}$, define
$$\pi_Y(1) \triangleq \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} Y(t)$$
as the limiting fraction of time $\{Y(t)\}$ stays at state 1. Then we have $\pi_X(1) \ge \pi_Y(1)$.

Proof of Lemma 7: Given in Appendix E.

We note that executing a scheduling policy in the network amounts to generating a sequence of channel selection decisions. By Lemma 7, if we apply the same sequence of channel selection decisions of some scheduling policy to the set of fictitious channels, we get throughput on every channel that is at least as high. A direct consequence is that the maximum sum throughput over the fictitious channels is greater than or equal to that over the real channels.

Lemma 8. The maximum sum throughput over the set of fictitious channels is no more than
$$\max_{n \in \{1, 2, \ldots, N\}} \{c_{n,\infty}\}, \quad c_{n,\infty} \triangleq \frac{P_{n,01}}{x_n P_{n,10} + P_{n,01}}.$$

Proof of Lemma 8: We note that finding the maximum sum throughput over the fictitious channels in Fig. 5 is equivalent to solving a multi-armed bandit problem [15] with each channel acting as an arm (see Fig. 5 and note that a channel can change mode only when it is served), where one unit of reward is earned if a packet is delivered (recall that a packet is served if and only if mode M1 is visited in the upper chain of Fig. 5). The optimal solution to the multi-armed bandit system is to always play the arm (channel) with the largest average reward (throughput). The average reward over channel $n$ is equal to the stationary probability of mode M1 in the upper chain of Fig. 5, which is
$$\frac{\pi_{n,\text{ON}}}{P_{n,10} + \pi_{n,\text{ON}}} = \frac{P_{n,01}}{x_n P_{n,10} + P_{n,01}}.$$
This finishes the proof.

Together with the fact that the throughput over any real channel $n$ cannot exceed its stationary ON probability $\pi_{n,\text{ON}}$, we have constructed an outer bound on the network capacity region $\Lambda$ (the proof follows the above discussion and is omitted).

Theorem 2 (Generalized Outer Capacity Bound). Any supportable throughput vector $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots, \lambda_N)$ necessarily satisfies
$$\lambda_n \le \pi_{n,\text{ON}}, \quad \text{for all } n \in \{1, 2, \ldots, N\},$$
$$\sum_{n=1}^{N} \lambda_n \le \max_{n \in \{1, 2, \ldots, N\}} \{c_{n,\infty}\} = \max_{n \in \{1, 2, \ldots, N\}} \left\{ \frac{P_{n,01}}{x_n P_{n,10} + P_{n,01}} \right\}.$$
These $(N+1)$ hyperplanes create an outer capacity bound $\Lambda_{\text{out}}$ on $\Lambda$.

Corollary 3 (Outer Capacity Bound for Symmetric Channels). In symmetric channels with $\mathbf{P}_n = \mathbf{P}$, $c_{n,\infty} = c_\infty$, and $\pi_{n,\text{ON}} = \pi_{\text{ON}}$ for all $n$, we have
$$\Lambda_{\text{out}} = \Big\{ \boldsymbol{\lambda} \ge \mathbf{0} \;\Big|\; \sum_{n=1}^{N} \lambda_n \le c_\infty,\ \lambda_n \le \pi_{\text{ON}} \text{ for } 1 \le n \le N \Big\}, \tag{10}$$
where $\ge$ is taken entrywise.

We note that Lemma 5 in Section III-C follows directly from Corollary 3.

D. A Two-User Example on Symmetric Channels

Here we consider a two-user example on symmetric channels.
For simplicity we drop the subscript $n$ in the notation. From Corollary 3, we have the outer bound
$$\Lambda_{\text{out}} = \left\{ \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} \;\middle|\; \begin{array}{l} 0 \le \lambda_n \le P_{01}/x, \text{ for } 1 \le n \le 2, \\ \lambda_1 + \lambda_2 \le P_{01}/(x P_{10} + P_{01}), \\ x = P_{01} + P_{10} \end{array} \right\}.$$
For the inner bound $\Lambda_{\text{int}}$, we note that policy $\mathsf{RandRR}$ can execute three round robin policies $\mathsf{RR}(\boldsymbol{\phi})$ for $\boldsymbol{\phi} \in \Phi = \{(1,1), (0,1), (1,0)\}$. From Corollary 2, we have
$$\Lambda_{\text{int}} = \left\{ \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} \;\middle|\; \begin{array}{l} 0 \le \lambda_n \le \mu_n, \text{ for } 1 \le n \le 2, \\ \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \in \operatorname{conv}\left( \left\{ \begin{bmatrix} c_2/2 \\ c_2/2 \end{bmatrix}, \begin{bmatrix} c_1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ c_1 \end{bmatrix} \right\} \right) \end{array} \right\}.$$
Under the special case $P_{01} = P_{10} = 0.2$, the two bounds $\Lambda_{\text{int}}$ and $\Lambda_{\text{out}}$ are shown in Fig. 6.
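The numbers behind this two-user example can be reproduced with a few lines of Python. The sketch below evaluates $c_1$, $c_2$, and $c_\infty$ from (6) for $P_{01} = P_{10} = 0.2$, lists the three extreme points that generate $\Lambda_{\text{int}}$, and tests membership in $\Lambda_{\text{out}}$ from (10). The function names are hypothetical and chosen only for this illustration.

```python
def c(M, p01, p10):
    """Sum throughput c_M of RR(M) in symmetric channels, as in (6)."""
    x = p01 + p10
    g = p01 * (1.0 - (1.0 - x) ** M)
    return g / (x * p10 + g)


def c_inf(p01, p10):
    """Limit of c_M as M grows, i.e. P01 / (x * P10 + P01)."""
    x = p01 + p10
    return p01 / (x * p10 + p01)


def in_outer_bound(lam, p01, p10):
    """Check the constraints of (10) for a rate vector lam."""
    pi_on = p01 / (p01 + p10)
    per_user_ok = all(0.0 <= rate <= pi_on for rate in lam)
    return per_user_ok and sum(lam) <= c_inf(p01, p10)


if __name__ == "__main__":
    p01 = p10 = 0.2
    c1, c2 = c(1, p01, p10), c(2, p01, p10)
    # Roughly 0.5, 0.615 and 0.714 respectively
    print(c1, c2, c_inf(p01, p10))
    # Extreme points generating the inner bound
    vertices = [(c2 / 2, c2 / 2), (c1, 0.0), (0.0, c1)]
    print(vertices)
    # Every inner-bound vertex must also satisfy the outer bound
    print(all(in_outer_bound(v, p01, p10) for v in vertices))
    # Gap bound (7): c_inf - c_M <= c_inf * (1 - x)^M
    x = p01 + p10
    print(c_inf(p01, p10) - c2 <= c_inf(p01, p10) * (1.0 - x) ** 2)
```

Here $c_1 = 1/2$ coincides with $\pi_{\text{ON}}$, $c_2 = 8/13$, and $c_\infty = 5/7$, so the ratio $c_2/c_1 \approx 1.23$ quantifies what serving both users with channel memory buys over serving a single user.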
Set Λ ideal is the ideal capacity re gion when instantaneous channel states are kno wn without causing an y (timing) o v erhead [16]. Ne xt, it is sho wn in [6] that the maximum sum throughput in this netw ork is achie v ed at point A =( 0 . 325 , 0 . 325) . The (unkno wn) netw ork capacity re gion Λ is bounded between Λ int and Λ out , and has boundary points B , A , and C . Since the boundary of Λ is a conca v e curv e connecting B , A , and C , we en vision that Λ shall contain b ut be v ery close to Λ int . Finally , the rate re gion Λ blind is rendered by completely ne glecting channel memory and treating the channels as i.i.d. o v er slots [2]. W e observ e the throughput g ain Λ int \ Λ blind , as much as 23% in this e xample, is achie v ed by incorporating channel memory . In general, if channels are symmetric and treated as i.i.d. o v er slots, the maximum sum throughput in the netw ork is π ON = c 1 . Then the maximum throughput g ain of RandRR using channel memory i s c N − c 1 , which as N →∞ con v er ges to c ∞ − c 1 = P 01 x P 10 + P 01 − P 01 P 01 + P 10 , which is controlled by the f actor x = P 01 + P 10 . E. A Heuristically T ighter Inner Bound It is sho wn in [7] that the follo wing polic y maximi zes the sum throughput in a symmetric netw ork: Serve c hannels in a cir cular or der , wher e on eac h c hannel k eep tr ansmitting data pac k ets until a N A CK is r eceived . In the abo v e tw o-user e xample, this polic y achie v es throughput v ector A in Fig. 7. If we replace our round robin polic y A B C D λ 1 λ 2 0 . 25 0 . 5 0 . 25 0 . 5 Λ int Λ (unkno wn) Λ heuristic Fig. 7. Comparison of our inner bound Λ int , the unkno wn netw ork capacity re gion Λ , and a heuristically better inner bound Λ heuristic . RR ( φ ) by this one, heuristically we are able to construct a tighter inner capacity bound. F or e xample, we can support the tighter inner bound Λ heuristic in Fig. 
7 by appropriate time sharing among the abo v e polic y that serv es dif ferent subsets of channels. Ho we v er , we note that this approach is dif ﬁcult to analyze because the { L kn } process (see (4)) forms a high- order Mark o v chain. Y et, our inner bound Λ int pro vides a good throughput guarantee for this class of heuristic policies. V. P ROX I M I T Y O F T H E I NN ER B OUN D T O T HE T RU E C AP A C ITY R EG ION —S YMMET RIC C AS E Ne xt we bound the closeness of the boundaries of Λ int and Λ in the case of symmetric channels. In Section III-C, by choosing M = N , we ha v e pro vided such analysis for the boundary point in the direction (1 , 1 ,. .., 1) . Here we generalize to all boundary points. Deﬁne V �  ( v 1 ,v 2 ,. .., v N )      v n ≥ 0 for 1 ≤ n ≤ N , v n > 0 for at least one n  Fig. 6. Comparison of rate regions under dif ferent assumptions. In Fig. 6, we also compare Λ int and Λ out with other rate regions. Set Λ ideal is the ideal capacity region when instantaneous channel states are known without causing any (timing) ov erhead [16]. Next, it is shown in [6] that the maximum sum throughput in this network is achiev ed at point A = (0 . 325 , 0 . 325) . The (unkno wn) network capacity region Λ is bounded between Λ int and Λ out , and has boundary points B , A , and C . Since the boundary of Λ is a concav e curve connecting B , A , and C , we envision that Λ shall contain but be very close to Λ int . Finally , the rate region Λ blind is rendered by completely neglecting channel memory and treating the channels as i.i.d. ov er slots [2]. W e observe the throughput gain Λ int \ Λ blind , as much as 23% in this example, is achiev ed by incorporating channel memory . In general, if channels are symmetric and treated as i.i.d. over slots, the maximum sum throughput in the network is π ON = c 1 . 
Then the maximum throughput gain of RandRR using channel memory is c N − c 1 , which as N → ∞ con ver ges to c ∞ − c 1 = P 01 x P 10 + P 01 − P 01 P 01 + P 10 , which is controlled by the factor x = P 01 + P 10 . E. A Heuristically T ighter Inner Bound It is sho wn in [7] that the follo wing policy maximizes the sum throughput in a symmetric network: Serve channels in a cir cular or der , where on each channel k eep transmitting data pac kets until a N ACK is r eceived . In the abo ve two-user e xample, this policy achie ves throughput vector A in Fig. 7. If we replace our round robin policy 8 π n, ON = π ON for all n , we have Λ out =  λ ≥ 0 | N  n =1 λ n ≤ c ∞ , λ n ≤ π ON for 1 ≤ n ≤ N  , (10) wher e ≥ is tak en entrywise . W e note that Lemma 5 in Section III-C directly follo ws Corollary 3. D. A T wo-User Example on Symmetric Channels Here we consider a tw o-user e xample on symmetric chan- nels. F or simplicity we will drop the subscript n in notations. From Corollary 3, we ha v e the outer bound Λ out =       λ 1 λ 2         0 ≤ λ n ≤ P 01 /x , for 1 ≤ n ≤ 2 , λ 1 + λ 2 ≤ P 01 / ( x P 10 + P 01 ) , x = P 01 + P 10      . F or the inner bound Λ int , we note that polic y RandRR can e x ecute three round robin policies RR ( φ ) for φ ∈ Φ = { (1 , 1) , (0 , 1) , (1 , 0) } . From Corollary 2, we ha v e Λ int =     λ 1 λ 2        0 ≤ λ n ≤ µ n , for 1 ≤ n ≤ 2 ,  µ 1 µ 2  ∈ c on v   c 2 / 2 c 2 / 2  ,  c 1 0  ,  0 c 1      . Under the special case P 01 = P 10 =0 . 2 , the tw o bounds λ int and Λ out are sho wn in Fig. 6. A B C D λ 1 λ 2 0 . 25 0 . 5 0 . 25 0 . 5 Λ out Λ ideal Λ blind Λ int Λ (unkno wn) Fig. 6. Comparison of rate re gions under dif ferent assumptions. In Fig. 6, we also compare Λ int and Λ out with other rate re gions. Set Λ ideal is the ideal capacity re gion when instantaneous channel states are kno wn without causing an y (timing) o v erhead [16]. 
Ne xt, it is sho wn in [6] that the maximum sum throughput in this netw ork is achie v ed at point A =( 0 . 325 , 0 . 325) . The (unkno wn) netw ork capacity re gion Λ is bounded between Λ int and Λ out , and has boundary points B , A , and C . Since the boundary of Λ is a conca v e curv e connecting B , A , and C , we en vision that Λ shall contain b ut be v ery close to Λ int . Finally , the rate re gion Λ blind is rendered by completely ne glecting channel memory and treating the channels as i.i.d. o v er slots [2]. W e observ e the throughput g ain Λ int \ Λ blind , as much as 23% in this e xample, is achie v ed by incorporating channel memory . In general, if channels are symmetric and treated as i.i.d. o v er slots, the maximum sum throughput in the netw ork is π ON = c 1 . Then the maximum throughput g ain of RandRR using channel memory i s c N − c 1 , which as N →∞ con v er ges to c ∞ − c 1 = P 01 x P 10 + P 01 − P 01 P 01 + P 10 , which is controlled by the f actor x = P 01 + P 10 . E. A Heuristically T ighter Inner Bound It is sho wn in [7] that the follo wing polic y maximi zes the sum throughput in a symmetric netw ork: Serve c hannels in a cir cular or der , wher e on eac h c hannel k eep tr ansmitting data pac k ets until a N A CK is r eceived . In the abo v e tw o-user e xample, this polic y achie v es throughput v ector A in Fig. 7. If we replace our round robin polic y A B C D λ 1 λ 2 0 . 25 0 . 5 0 . 25 0 . 5 Λ int Λ (unknown) Λ heuristic Fig. 7. Comparison of our inner bound Λ int , the unkno wn netw ork capacity re gion Λ , and a heuristically better inner bound Λ heuristic . RR ( φ ) by this one, heuristically we are able to construct a tighter inner capacity bound. F or e xample, we can support the tighter inner bound Λ heuristic in Fig. 7 by appropriate time sharing among the abo v e polic y that serv es dif ferent subsets of channels. 
Ho we v er , we note that this approach is dif ﬁcult to analyze because the { L kn } process (see (4)) forms a high- order Mark o v chain. Y et, our inner bound Λ int pro vides a good throughput guarantee for this class of heuristic policies. V. P ROX I M I T Y O F T H E I NN ER B OUN D T O T HE T RU E C AP A C ITY R EG ION —S YMMET RIC C AS E Ne xt we bound the closeness of the boundaries of Λ int and Λ in the case of symmetric channels. In Section III-C, by choosing M = N , we ha v e pro vided such analysis for the boundary point in the direction (1 , 1 ,. .., 1) . Here we generalize to all boundary points. Deﬁne V �  ( v 1 ,v 2 ,. .., v N )      v n ≥ 0 for 1 ≤ n ≤ N , v n > 0 for at least one n  Fig. 7. Comparison of our inner bound Λ int , the unknown network capacity region Λ , and a heuristically better inner bound Λ heuristic . RR ( φ ) by this one, heuristically we are able to construct a tighter inner capacity bound. For example, we can support the tighter inner bound Λ heuristic in Fig. 7 by appropriate time sharing among the abov e policy that serves dif ferent subsets of channels. Howe ver , we note that this approach is difﬁcult to analyze because the { L kn } process (see (4)) forms a high- order Markov chain. Y et, our inner bound Λ int provides a good throughput guarantee for this class of heuristic policies. V . P RO X I M I T Y O F T H E I N N E R B O U N D T O T H E T R U E C A PAC I T Y R E G I O N — S Y M M E T R I C C A S E Next we bound the closeness of the boundaries of Λ int and Λ in the case of symmetric channels. In Section III-C, by choosing M = N , we have provided such analysis for 9 the boundary point in the direction (1 , 1 , . . . , 1) . Here we generalize to all boundary points. Deﬁne V , ( ( v 1 , v 2 , . . . , v N )      v n ≥ 0 for 1 ≤ n ≤ N , v n > 0 for at least one n ) as a set of directional vectors. For any v ∈ V , let λ int = ( λ int 1 , λ int 2 , . . . , λ int N ) and λ out = ( λ out 1 , λ out 2 , . . . 
, λ out N ) be the boundary point of Λ int and Λ out in the direction of v , respec- tiv ely . It is useful to compute P N n =1 ( λ out n − λ int n ) , because it upper bounds the loss of the sum throughput of Λ int from Λ in the direction of v . 5 W e note that computing λ int in an arbitrary direction is difﬁcult. Thus we will ﬁnd an upper bound on P N n =1 ( λ out n − λ int n ) . A. Pr eliminary T o have more intuitions on Λ int , we start with a toy example of N = 3 users. W e are interested in the boundary point of Λ int in the direction of v = (1 , 2 , 1) . Consider two RandRR -type policies ψ 1 and ψ 2 deﬁned as follows. For ψ 1 , choose      φ 1 = (1 , 0 , 0) with prob. 1 / 4 φ 2 = (0 , 1 , 0) with prob. 1 / 2 φ 3 = (0 , 0 , 1) with prob. 1 / 4 For ψ 2 , choose ( φ 4 = (1 , 1 , 0) with prob. 1 / 2 φ 5 = (0 , 1 , 1) with prob. 1 / 2 Both ψ 1 and ψ 2 support data rates in the direction of (1 , 2 , 1) . Howe ver , using the analysis of Lemma 4 and Theorem 1, we know ψ 1 supports throughput vector 1 4   c 1 0 0   + 1 2   0 c 1 0   + 1 4   0 0 c 1   = c 1 4   1 2 1   , while ψ 2 supports 1 2   c 2 / 2 c 2 / 2 0   + 1 2   0 c 2 / 2 c 2 / 2   = c 2 4   1 2 1   ≥ c 1 4   1 2 1   , where c 1 and c 2 are deﬁned in (6). W e see that ψ 2 achiev es data rates closer than ψ 1 does to the boundary of Λ int . It is because e very sub-policy of ψ 2 , namely RR ( φ 4 ) and RR ( φ 5 ) , supports sum throughput c 2 (by Lemma 4), where those of ψ 1 only support c 1 . In other words, policy ψ 2 has better multiuser diversity gain than ψ 1 does. This example suggests that we can ﬁnd a good lower bound on λ int by exploring to what extent the multiuser di versity can be exploited. W e start with the following deﬁnition. Deﬁnition 1. F or any v ∈ V , we say v is d -user div erse if v can be written as a positi ve combination of vectors in Φ d , wher e Φ d denotes the set of N -dimensional binary vectors having d entries be 1 . 
Define

$$d(v) \triangleq \max_{1 \le d \le N} \{ d \mid v \text{ is } d\text{-user diverse} \},$$

and we shall say $v$ is maximally $d(v)$-user diverse. The notion of $d(v)$ is well-defined because every $v$ must be 1-user diverse.$^6$ Definition 1 is the most useful to us through the next lemma.

Footnote 5: Note that $\sum_{n=1}^{N} (\lambda_n^{\text{out}} - \lambda_n^{\text{int}})$ also bounds the closeness between $\Lambda_{\text{out}}$ and $\Lambda$.

Footnote 6: The set $\Phi_1 = \{e_1, e_2, \ldots, e_N\}$ is the collection of unit coordinate vectors, where $e_n$ has its $n$th entry equal to 1 and all other entries equal to 0. Any vector $v \in \mathcal{V}$, $v = (v_1, v_2, \ldots, v_N)$, can be written as $v = \sum_{n : v_n > 0} v_n e_n$.

Lemma 9. The boundary point of $\Lambda_{\text{int}}$ in the direction of $v \in \mathcal{V}$ has sum throughput at least $c_{d(v)}$, where

$$c_{d(v)} \triangleq \frac{P_{01} \left(1 - (1 - x)^{d(v)}\right)}{x P_{10} + P_{01} \left(1 - (1 - x)^{d(v)}\right)}, \qquad x \triangleq P_{01} + P_{10}.$$

Proof of Lemma 9: If direction $v$ can be written as a positive weighted sum of vectors in $\Phi_{d(v)}$, we can normalize the weights, and use the new weights as probabilities to randomly mix RR($\phi$) policies for all $\phi \in \Phi_{d(v)}$. This way we achieve sum throughput $c_{d(v)}$ in every transmission round, and overall the throughput vector will be in the direction of $v$. Therefore the result follows. For details, see Appendix G.

Fig. 8 provides an example of Lemma 9 in the two-user symmetric system in Section IV-D. We observe that direction $(1, 1)$, the one that passes point D in Fig. 8, is the only direction that is maximally 2-user diverse. The sum throughput $c_2$ is achieved at D. All the other directions are maximally 1-user diverse and, from Fig. 8, only sum throughput $c_1$ is guaranteed along those directions. In general, geometrically we can show that a maximally $d$-user diverse vector, say $v_d$, forms a smaller angle with the all-1 vector $(1, 1, \ldots, 1)$ than a maximally $d'$-user diverse vector, say $v_{d'}$, does if $d' < d$.

Fig. 8. An example for Lemma 9 in the two-user symmetric network. Points B and C achieve sum throughput $c_1 = \pi_{\text{ON}} = 0.5$, and the sum throughput at D is $c_2 \approx 0.615$. Any other boundary point of $\Lambda_{\text{int}}$ has sum throughput between $c_1$ and $c_2$. (Axes: $\lambda_1$ and $\lambda_2$; lines $\lambda_1 + \lambda_2 = c_1$ and $\lambda_1 + \lambda_2 = c_2$ are drawn.)

B. Proximity Analysis

We use the notion of $d(v)$ to upper bound $\sum_{n=1}^{N} (\lambda_n^{\text{out}} - \lambda_n^{\text{int}})$ in any direction $v \in \mathcal{V}$. Let $\lambda^{\text{out}} = \theta \lambda^{\text{int}}$ (i.e., $\lambda_n^{\text{out}} = \theta \lambda_n^{\text{int}}$ for all $n$) for some $\theta \ge 1$. By (10), the boundary of $\Lambda_{\text{out}}$ is characterized by the interaction of the $(N+1)$ hyperplanes $\sum_{n=1}^{N} \lambda_n = c_\infty$ and $\lambda_n = \pi_{\text{ON}}$ for each $n \in \{1, 2, \ldots, N\}$. Specifically, in any given direction, if we consider the cross points on all the hyperplanes in that direction, the boundary point $\lambda^{\text{out}}$ is the one closest to the origin. We do not know which hyperplane $\lambda^{\text{out}}$ is on, and thus need to consider all $(N+1)$ cases. If $\lambda^{\text{out}}$ is on the plane $\sum_{n=1}^{N} \lambda_n = c_\infty$, i.e., $\sum_{n=1}^{N} \lambda_n^{\text{out}} = c_\infty$, we get

$$\sum_{n=1}^{N} (\lambda_n^{\text{out}} - \lambda_n^{\text{int}}) \overset{(a)}{\le} c_\infty - c_{d(v)} \overset{(b)}{\le} c_\infty (1 - x)^{d(v)},$$

where (a) is by Lemma 9 and (b) is by (7). If $\lambda^{\text{out}}$ is on the plane $\lambda_n = \pi_{\text{ON}}$ for some $n$, then $\theta = \pi_{\text{ON}} / \lambda_n^{\text{int}}$. It follows that

$$\sum_{n=1}^{N} (\lambda_n^{\text{out}} - \lambda_n^{\text{int}}) = (\theta - 1) \sum_{n=1}^{N} \lambda_n^{\text{int}} \le \left( \frac{\pi_{\text{ON}}}{\lambda_n^{\text{int}}} - 1 \right) c_\infty.$$

The above discussions lead to the next lemma.

Lemma 10. The loss of the sum throughput of $\Lambda_{\text{int}}$ from $\Lambda$ in the direction of $v$ is upper bounded by

$$\min \left[ c_\infty (1 - x)^{d(v)}, \; \min_{1 \le n \le N} \left\{ \left( \frac{\pi_{\text{ON}}}{\lambda_n^{\text{int}}} - 1 \right) c_\infty \right\} \right] = c_\infty \min \left[ (1 - x)^{d(v)}, \; \frac{\pi_{\text{ON}}}{\max_{1 \le n \le N} \{ \lambda_n^{\text{int}} \}} - 1 \right]. \tag{11}$$

Lemma 10 shows that, if data rates are more balanced, namely, have a larger $d(v)$, the sum throughput loss is dominated by the first term in the minimum of (11) and decreases to 0 geometrically fast with $d(v)$. If data rates are biased toward a particular user, the second term in the minimum of (11) captures the throughput loss, which goes to 0 as the rate of the favored user goes to the single-user capacity $\pi_{\text{ON}}$.

VI. THROUGHPUT-ACHIEVING QUEUE-DEPENDENT ROUND ROBIN POLICY

Let $a_n(t)$, for $1 \le n \le N$, be the number of exogenous packet arrivals destined for user $n$ in slot $t$. Suppose $a_n(t)$ are independent across users, i.i.d. over slots with rate $E[a_n(t)] = \lambda_n$, and $a_n(t)$ is bounded with $0 \le a_n(t) \le A_{\max}$, where $A_{\max}$ is a finite integer. Let $U_n(t)$ be the backlog of user-$n$ packets queued at the base station at time $t$. Define $U(t) \triangleq (U_1(t), U_2(t), \ldots, U_N(t))$ and suppose $U_n(0) = 0$ for all $n$.
The queue process $\{U_n(t)\}$ evolves as

$$U_n(t+1) = \max \left[ U_n(t) - \mu_n(s_n(t), t), \, 0 \right] + a_n(t), \tag{12}$$

where $\mu_n(s_n(t), t) \in \{0, 1\}$ is the service rate allocated to user $n$ in slot $t$. We have $\mu_n(s_n(t), t) = 1$ if user $n$ is served and $s_n(t) = \text{ON}$, and 0 otherwise. In the rest of the paper we drop $s_n(t)$ in $\mu_n(s_n(t), t)$ and use $\mu_n(t)$ for notational simplicity. We say the network is (strongly) stable if

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} E[U_n(\tau)] < \infty.$$

Consider a rate vector $\lambda$ interior to the inner capacity region bound $\Lambda_{\text{int}}$ given in Theorem 1. Namely, there exist an $\epsilon > 0$ and a probability distribution $\{\beta_\phi\}_{\phi \in \Phi}$ such that

$$\lambda_n + \epsilon < \sum_{\phi \in \Phi} \beta_\phi \eta_n^{\phi}, \quad \text{for all } 1 \le n \le N, \tag{13}$$

where $\eta_n^{\phi}$ is defined in (9). By Theorem 1, there exists a RandRR policy that yields service rates equal to the right-hand side of (13) and thus stabilizes the network with arrival rate vector $\lambda$ [17, Lemma 3.6]. The existence of this policy is useful and we shall denote it by RandRR*. Recall that on each new scheduling round, the policy RandRR* randomly picks a binary vector $\phi$ using probabilities $\alpha_\phi$ (defined over all of the $(2^N - 1)$ subsets of users). The $M(\phi)$ active users in $\phi$ are served for one round by the round robin policy RR($\phi$), serving the least recently used users first. However, solving for the probabilities needed to implement the RandRR* policy that yields (13) is intractable when $N$ is large, because we need to find $(2^N - 1)$ unknown probabilities $\{\alpha_\phi\}_{\phi \in \Phi}$, compute $\{\beta_\phi\}_{\phi \in \Phi}$ from (19), and make (13) hold. Instead of probabilistically finding the vector $\phi$ for the current round of scheduling, we use the following simple queue-dependent policy.

Queue-dependent Round Robin Policy (QRR):

1) Start with $t = 0$.
2) At time $t$, observe the current queue backlog vector $U(t)$ and find the binary vector $\phi(t) \in \Phi$ defined as$^7$

$$\phi(t) \triangleq \arg\max_{\phi \in \Phi} f(U(t), \text{RR}(\phi)), \tag{14}$$

where

$$f(U(t), \text{RR}(\phi)) \triangleq \sum_{n : \phi_n = 1} \left[ U_n(t) \, E\left[ L_{1n}^{\phi} - 1 \right] - E\left[ L_{1n}^{\phi} \right] \sum_{n=1}^{N} U_n(t) \lambda_n \right]$$

and $E[L_{1n}^{\phi}] = 1 + P_{n,01}^{(M(\phi))} / P_{n,10}$ from (8). Ties are broken arbitrarily.

3) Run RR($\phi(t)$) for one round of transmission. We emphasize that active channels in $\phi$ are served in the least-recently-used order. After the round ends, go to Step 2.

Footnote 7: The vector $\phi(t)$ is a queue-dependent decision and thus we should write $\phi(U(t), t)$ as a function of $U(t)$. For simplicity we use $\phi(t)$ instead.

The QRR policy is a frame-based algorithm similar to RandRR, except that at the beginning of every transmission round the policy selection is no longer random but based on a queue-dependent rule. We note that QRR is a polynomial time algorithm because we can compute $\phi(t)$ in (14) in polynomial time with the following divide and conquer approach:

1) Partition the set $\Phi$ into subsets $\{\Phi_1, \ldots, \Phi_N\}$, where $\Phi_M$, $M \in \{1, \ldots, N\}$, is the set of $N$-dimensional binary vectors having exactly $M$ entries equal to 1.

2) For each $M \in \{1, \ldots, N\}$, find the maximizer of $f(U(t), \text{RR}(\phi))$ among vectors in $\Phi_M$. For each $\phi \in \Phi_M$, we have

$$f(U(t), \text{RR}(\phi)) = \sum_{n : \phi_n = 1} \left[ U_n(t) \frac{P_{n,01}^{(M)}}{P_{n,10}} - \left( 1 + \frac{P_{n,01}^{(M)}}{P_{n,10}} \right) \sum_{n=1}^{N} U_n(t) \lambda_n \right],$$

and the maximizer of $f(U(t), \text{RR}(\phi))$ is to activate the $M$ channels that yield the $M$ largest summands of the above equation.

3) Obtain $\phi(t)$ by comparing the maximizers from the above step for different values of $M$.

The detailed implementation is as follows.

Polynomial time implementation of Step 2 of QRR:

1) For each fixed $M \in \{1, \ldots, N\}$, we do the following: Compute

$$U_n(t) \frac{P_{n,01}^{(M)}}{P_{n,10}} - \left( 1 + \frac{P_{n,01}^{(M)}}{P_{n,10}} \right) \sum_{n=1}^{N} U_n(t) \lambda_n \tag{15}$$

for all $n \in \{1, \ldots, N\}$. Sort these $N$ numbers and define the binary vector $\phi^M = (\phi_1^M, \ldots, \phi_N^M)$ such that $\phi_n^M = 1$ if the value (15) of channel $n$ is among the $M$ largest, otherwise $\phi_n^M = 0$. Ties are broken arbitrarily. Let $\hat{f}(U(t), M)$ denote the sum of the $M$ largest values of (15).

2) Define $M(t) \triangleq \arg\max_{1 \le M \le N} \hat{f}(U(t), M)$. Then we assign $\phi(t) = \phi^{M(t)}$.

Using a novel variable-length frame-based Lyapunov analysis, we show in the next theorem that QRR stabilizes the network with any arrival rate vector $\lambda$ strictly within the inner capacity bound $\Lambda_{\text{int}}$.$^8$ The idea is that we compare QRR with the (unknown) policy RandRR* that stabilizes $\lambda$. We show that, in every transmission round, QRR finds and executes a round robin policy RR($\phi(t)$) that yields a larger negative drift on the queue backlogs than RandRR* does in the current round. Therefore, QRR is stable.

Theorem 3. For any data rate vector $\lambda$ interior to $\Lambda_{\text{int}}$, policy QRR strongly stabilizes the network.

Proof of Theorem 3: See Appendix H.

VII. CONCLUSION

The network capacity of a wireless network is practically degraded by communication overhead. In this paper, we take a step forward by studying the fundamental achievable rate region when communication overhead is kept minimum, that is, when channel probing is not permitted. While solving the original problem is difficult, we construct an inner and an outer bound on the network capacity region, with the aid of channel memory. When channels are symmetric and the network serves a large number of users, we show the inner and outer bound are progressively tight when the data rates of different users are more balanced. We also derive a simple queue-dependent frame-based policy, as a function of packet arrival rates and channel statistics, and show that this policy stabilizes the network for any data rates strictly within the inner capacity bound.
Transmitting data without channel probing is one of the many options for communication over a wireless network. Practically each option may have pros and cons on criteria like the achievable throughput, power efficiency, implementation complexity, etc. In the future it is important to explore how to combine all possible options to push the practically achievable network capacity to the limit. It is part of our future work to generalize the methodology and framework developed in this paper to more general cases, such as when limited probing is allowed and/or other QoS metrics such as energy consumption are considered. It will also be interesting to see how this framework can be applied to solve new problems in opportunistic spectrum access in cognitive radio networks, in opportunistic scheduling with delayed/uncertain channel state information, and in restless bandit problems.

Footnote 8: In (50) we show that as long as the queue backlog vector $U(t)$ is not identically zero and the arrival rate vector $\lambda$ is interior to the inner capacity bound $\Lambda_{\text{int}}$, in Step 2 of the QRR policy we always have $\max_{\phi \in \Phi} f(U(t), \text{RR}(\phi)) > 0$.

APPENDIX A

Proof of Lemma 2: Initially, by (3) we have $\omega_n(0) = \pi_{n,\text{ON}} \ge P_{n,01}^{(M)}$ for all $n$. Suppose the base station switches to channel $n$ at time $t$, and the last use of channel $n$ ends at slot $(t - k)$ for some $k < t$. In slot $(t - k)$, there are two possible cases:

1) Channel $n$ turns OFF, and as a result the information state on slot $t$ is $\omega_n(t) = P_{n,01}^{(k)}$. Due to round robin, the other $(M - 1)$ channels must have been used for at least one slot before $t$ after slot $(t - k)$, and thus $k \ge M$. By (3) we have $\omega_n(t) = P_{n,01}^{(k)} \ge P_{n,01}^{(M)}$.

2) Channel $n$ is ON and transmits a dummy packet. Thus $\omega_n(t) = P_{n,11}^{(k)}$. By (3) we have $\omega_n(t) = P_{n,11}^{(k)} \ge P_{n,01}^{(M)}$.
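The two inequalities used in the cases above can be checked numerically. The following Python sketch is only an illustration: the helper names `k_step` and `lemma2_holds` are hypothetical, the parameters $P_{01} = P_{10} = 0.2$ and $M = 2$ are borrowed from the two-user example, and the check presumes positively correlated channels ($P_{01} + P_{10} < 1$). It computes the $k$-step probabilities by iterating the one-step ON/OFF recursion rather than by using (3), which is not reproduced here.

```python
def k_step(p01, p10, k):
    """Return (P01^(k), P11^(k)): probability of being ON after k slots,
    starting from OFF and from ON respectively."""
    on_from_off, on_from_on = 0.0, 1.0  # k = 0: the state is known exactly
    for _ in range(k):
        # One-step update: stay ON w.p. (1 - p10), or switch OFF -> ON w.p. p01.
        on_from_off = on_from_off * (1 - p10) + (1 - on_from_off) * p01
        on_from_on = on_from_on * (1 - p10) + (1 - on_from_on) * p01
    return on_from_off, on_from_on


def lemma2_holds(p01, p10, M, k_max=200):
    """Check the initial state, Case 1 (k >= M) and Case 2 (k >= 1) up to k_max."""
    bound = k_step(p01, p10, M)[0]   # P01^(M), the claimed lower bound
    pi_on = p01 / (p01 + p10)        # stationary ON probability
    ok = pi_on >= bound - 1e-12      # initial information state
    for k in range(1, k_max + 1):
        q01, q11 = k_step(p01, p10, k)
        if k >= M:
            ok = ok and q01 >= bound - 1e-12   # Case 1: channel was OFF
        ok = ok and q11 >= bound - 1e-12       # Case 2: channel was ON
    return ok


print(k_step(0.2, 0.2, 2)[0])        # two-step OFF -> ON probability
print(lemma2_holds(0.2, 0.2, M=2))
```

A finite numerical check of this kind does not replace the proof; it only confirms that the monotonicity property behaves as expected for the chosen parameters.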
APPENDIX B

Proof of Lemma 6: At the beginning of a new round, suppose round robin policy RR($\phi$) is selected. We index the $M(\phi)$ active channels in $\phi$ as $(n_1, n_2, \ldots, n_{M(\phi)})$, which is in the decreasing order of the time duration between their last use and the beginning of the current round. In other words, the last use of $n_k$ is earlier than that of $n_{k'}$ only if $k < k'$. Fix an active channel $n_k$. Then it suffices to show that when this channel is served in the current round, the time duration back to the end of its last service is at least $(M(\phi) - 1)$ slots (that this channel has information state no worse than $P_{n_k,01}^{(M(\phi))}$ then follows from the same arguments as in the proof of Lemma 2).

We partition the active channels in $\phi$ other than $n_k$ into two sets $\mathcal{A} = \{n_1, n_2, \ldots, n_{k-1}\}$ and $\mathcal{B} = \{n_{k+1}, n_{k+2}, \ldots, n_{M(\phi)}\}$. Then the last use of every channel in $\mathcal{B}$ occurs after the last use of $n_k$, and so channel $n_k$ has been idled for at least $|\mathcal{B}|$ slots at the start of the current round. Moreover, the policy in this round will serve all channels in $\mathcal{A}$ before serving $n_k$, taking at least one slot per channel, and so we wait at least $|\mathcal{A}|$ additional slots before serving channel $n_k$. The total time that this channel has been idled is thus at least $|\mathcal{A}| + |\mathcal{B}| = M(\phi) - 1$.

APPENDIX C

Proof of Theorem 1: Let $Z(t)$ denote the number of times Step 1 of RandRR is executed in $[0, t)$, in which we suppose vector $\phi$ is selected $Z_\phi(t)$ times. Define $t_i$, where $i \in \mathbb{Z}^+$, as the $(i+1)$th time instant a new vector $\phi$ is selected. Assume $t_0 = 0$, and thus the first selection occurs at time 0. It follows that $Z(t_i^-) = i$, $Z(t_i) = i + 1$, and the $i$th round of packet transmissions ends at time $t_i^-$.

Fix a vector $\phi$. Within the time periods in which policy RR($\phi$) is executed, denote by $L_{kn}^{\phi}$ the duration of the $k$th time the base station stays with channel $n$.
Then the time average throughput that policy RR($\phi$) yields on its active channel $n$ over $[0, t_i)$ is

$$\frac{\sum_{k=1}^{Z_\phi(t_i)} \left( L_{kn}^{\phi} - 1 \right)}{\sum_{\phi \in \Phi} \sum_{k=1}^{Z_\phi(t_i)} \sum_{n : \phi_n = 1} L_{kn}^{\phi}}. \tag{16}$$

For simplicity, here we focus on discrete time instants $\{t_i\}$ large enough so that $Z_\phi(t_i) > 0$ for all $\phi \in \Phi$ (so that the sums in (16) make sense). The generalization to arbitrary time $t$ can be done by incorporating fractional transmission rounds, which are amortized over time. Next, rewrite (16) as

$$\frac{\sum_{k=1}^{Z_\phi(t_i)} \sum_{n : \phi_n = 1} L_{kn}^{\phi}}{\sum_{\phi \in \Phi} \sum_{k=1}^{Z_\phi(t_i)} \sum_{n : \phi_n = 1} L_{kn}^{\phi}} \; \underbrace{\frac{\sum_{k=1}^{Z_\phi(t_i)} \left( L_{kn}^{\phi} - 1 \right)}{\sum_{k=1}^{Z_\phi(t_i)} \sum_{n : \phi_n = 1} L_{kn}^{\phi}}}_{(*)}. \tag{17}$$

As $t \to \infty$, the second term $(*)$ of (17) satisfies

$$(*) = \frac{\frac{1}{Z_\phi(t_i)} \sum_{k=1}^{Z_\phi(t_i)} \left( L_{kn}^{\phi} - 1 \right)}{\sum_{n : \phi_n = 1} \frac{1}{Z_\phi(t_i)} \sum_{k=1}^{Z_\phi(t_i)} L_{kn}^{\phi}} \overset{(a)}{\to} \frac{E\left[ L_{1n}^{\phi} - 1 \right]}{\sum_{n : \phi_n = 1} E\left[ L_{1n}^{\phi} \right]} \overset{(b)}{=} \eta_n^{\phi},$$

where (a) is by the Law of Large Numbers (we have shown in Corollary 1 that $L_{kn}^{\phi}$ are i.i.d. for different $k$) and (b) is by (9).

Denote the first term of (17) by $\beta_\phi(t_i)$, where we note that $\beta_\phi(t_i) \in [0, 1]$ for all $\phi \in \Phi$ and $\sum_{\phi \in \Phi} \beta_\phi(t_i) = 1$. We can rewrite $\beta_\phi(t_i)$ as

$$\beta_\phi(t_i) = \frac{\left[ \frac{Z_\phi(t_i)}{Z(t_i)} \right] \sum_{n : \phi_n = 1} \left[ \frac{1}{Z_\phi(t_i)} \sum_{k=1}^{Z_\phi(t_i)} L_{kn}^{\phi} \right]}{\sum_{\phi \in \Phi} \left[ \frac{Z_\phi(t_i)}{Z(t_i)} \right] \sum_{n : \phi_n = 1} \left[ \frac{1}{Z_\phi(t_i)} \sum_{k=1}^{Z_\phi(t_i)} L_{kn}^{\phi} \right]}.$$

As $t \to \infty$, we have

$$\beta_\phi \triangleq \lim_{i \to \infty} \beta_\phi(t_i) = \frac{\alpha_\phi \sum_{n : \phi_n = 1} E\left[ L_{1n}^{\phi} \right]}{\sum_{\phi \in \Phi} \alpha_\phi \sum_{n : \phi_n = 1} E\left[ L_{1n}^{\phi} \right]}, \tag{18}$$

where by the Law of Large Numbers we have

$$\frac{Z_\phi(t_i)}{Z(t_i)} \to \alpha_\phi, \qquad \frac{1}{Z_\phi(t_i)} \sum_{k=1}^{Z_\phi(t_i)} L_{kn}^{\phi} \to E\left[ L_{1n}^{\phi} \right].$$

From (16), (17), and (18), we have shown that the throughput contributed by policy RR($\phi$) on its active channel $n$ is $\beta_\phi \eta_n^{\phi}$. Consequently, RandRR parameterized by $\{\alpha_\phi\}_{\phi \in \Phi}$ supports any data rate vector $\lambda$ that is entrywise dominated by $\lambda \le \sum_{\phi \in \Phi} \beta_\phi \eta^{\phi}$, where $\{\beta_\phi\}_{\phi \in \Phi}$ is defined in (18) and $\eta^{\phi}$ in (9).
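As a concrete illustration of the limit just derived, the following Python sketch maps selection probabilities $\{\alpha_\phi\}$ to time fractions $\{\beta_\phi\}$ via (18) and then to the throughput vector $\sum_\phi \beta_\phi \eta^\phi$. It is a sketch under stated assumptions, not the authors' implementation: the function names `p01_k` and `randrr_throughput` are hypothetical, $E[L_{1n}^{\phi}] = 1 + P_{n,01}^{(M(\phi))}/P_{n,10}$ is taken from (8) as quoted in Section VI, and the $k$-step probability uses the standard closed form for a two-state chain, which is assumed to agree with the paper's definition.

```python
def p01_k(p01, p10, k):
    """k-step OFF -> ON probability of a two-state chain (standard closed form)."""
    x = p01 + p10
    return p01 * (1 - (1 - x) ** k) / x


def randrr_throughput(alpha, p01, p10):
    """alpha: dict mapping binary tuples phi to selection probabilities.
    p01, p10: per-channel transition probabilities (lists of length N).
    Returns (beta, rates) following (18) and the sum of beta_phi * eta^phi."""
    N = len(p01)
    exp_len, chi = {}, {}
    for phi in alpha:
        M = sum(phi)  # number of active channels M(phi)
        # E[L^phi_{1n}] = 1 + P01^(M) / P10 on active channels, 0 otherwise.
        exp_len[phi] = [
            1 + p01_k(p01[n], p10[n], M) / p10[n] if phi[n] else 0.0
            for n in range(N)
        ]
        chi[phi] = sum(exp_len[phi])  # expected length of one round of RR(phi)
    total = sum(alpha[phi] * chi[phi] for phi in alpha)
    beta = {phi: alpha[phi] * chi[phi] / total for phi in alpha}  # eq. (18)
    rates = [0.0] * N
    for phi in alpha:
        for n in range(N):
            if phi[n]:
                eta = (exp_len[phi][n] - 1) / chi[phi]  # eta^phi_n as in (9)
                rates[n] += beta[phi] * eta
    return beta, rates


# Toy policy psi_2 from Section V-A with symmetric channels P01 = P10 = 0.2.
psi2 = {(1, 1, 0): 0.5, (0, 1, 1): 0.5}
beta, rates = randrr_throughput(psi2, [0.2] * 3, [0.2] * 3)
print(beta)
print(rates)
```

For this symmetric example the rounds of both sub-policies have equal expected length, so $\beta_\phi = \alpha_\phi$ and the rates are proportional to $(1, 2, 1)$, matching the $\frac{c_2}{4}(1, 2, 1)$ vector of Section V-A.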
The above analysis shows that every RandRR policy achieves a boundary point of $\Lambda_{\text{int}}$ defined in Theorem 1. Conversely, the next lemma, proved in Appendix D, shows that every boundary point of $\Lambda_{\text{int}}$ is achievable by some RandRR policy, and the proof is complete.

Lemma 11. For any probability distribution $\{\beta_\phi\}_{\phi \in \Phi}$, there exists another probability distribution $\{\alpha_\phi\}_{\phi \in \Phi}$ that solves the linear system

$$\beta_\phi = \frac{\alpha_\phi \sum_{n : \phi_n = 1} E\left[ L_{1n}^{\phi} \right]}{\sum_{\phi \in \Phi} \alpha_\phi \sum_{n : \phi_n = 1} E\left[ L_{1n}^{\phi} \right]}, \quad \text{for all } \phi \in \Phi. \tag{19}$$

APPENDIX D

Proof of Lemma 11: For any probability distribution $\{\beta_\phi\}_{\phi \in \Phi}$, we prove the lemma by inductively constructing the solution $\{\alpha_\phi\}_{\phi \in \Phi}$ to (19). The induction is on the cardinality of $\Phi$. Without loss of generality, we index elements in $\Phi$ by $\Phi = \{\phi^1, \phi^2, \ldots\}$, where $\phi^k = (\phi_1^k, \ldots, \phi_N^k)$. We define $\chi_k \triangleq \sum_{n : \phi_n^k = 1} E\left[ L_{1n}^{\phi^k} \right]$ and redefine $\beta_{\phi^k} \triangleq \beta_k$ and $\alpha_{\phi^k} \triangleq \alpha_k$. Then we can rewrite (19) as

$$\beta_k = \frac{\alpha_k \chi_k}{\sum_{1 \le k \le |\Phi|} \alpha_k \chi_k}, \quad \text{for all } k \in \{1, 2, \ldots, |\Phi|\}. \tag{20}$$

We first note that $\Phi = \{\phi^1\}$ is a degenerate case where $\beta_1$ and $\alpha_1$ must both be 1. When $\Phi = \{\phi^1, \phi^2\}$, for any probability distribution $\{\beta_1, \beta_2\}$ with positive elements,$^9$ it is easy to show

$$\alpha_1 = \frac{\chi_2 \beta_1}{\chi_1 \beta_2 + \chi_2 \beta_1}, \qquad \alpha_2 = 1 - \alpha_1.$$

Let $\Phi = \{\phi^k : 1 \le k \le K\}$ for some $K \ge 2$. Assume that for any probability distribution $\{\beta_k > 0 : 1 \le k \le K\}$ we can find $\{\alpha_k : 1 \le k \le K\}$ that solves (20). For the case $\Phi = \{\phi^k : 1 \le k \le K+1\}$ and any $\{\beta_k > 0 : 1 \le k \le K+1\}$, we construct the solution $\{\alpha_k : 1 \le k \le K+1\}$ to (20) as follows. Let $\{\gamma_2, \gamma_3, \ldots, \gamma_{K+1}\}$ be the solution to the linear system

$$\frac{\gamma_k \chi_k}{\sum_{k=2}^{K+1} \gamma_k \chi_k} = \frac{\beta_k}{\sum_{k=2}^{K+1} \beta_k}, \quad 2 \le k \le K+1. \tag{21}$$

By the induction assumption, the set $\{\gamma_2, \gamma_3, \ldots, \gamma_{K+1}\}$ exists and satisfies $\gamma_k \in [0, 1]$ for $2 \le k \le K+1$ and $\sum_{k=2}^{K+1} \gamma_k = 1$.
Define

$$\alpha_1 \triangleq \frac{\beta_1\sum_{k=2}^{K+1}\gamma_k\chi_k}{\chi_1(1-\beta_1) + \beta_1\sum_{k=2}^{K+1}\gamma_k\chi_k} \tag{22}$$

$$\alpha_k \triangleq (1-\alpha_1)\gamma_k, \quad 2\le k\le K+1. \tag{23}$$

It remains to show (22) and (23) are the desired solution. It is easy to observe that $\alpha_k\in[0,1]$ for $1\le k\le K+1$, and

$$\sum_{k=1}^{K+1}\alpha_k = \alpha_1 + (1-\alpha_1)\sum_{k=2}^{K+1}\gamma_k = \alpha_1 + (1-\alpha_1) = 1.$$

By rearranging terms in (22) and using (23), we have

$$\beta_1 = \frac{\alpha_1\chi_1}{\alpha_1\chi_1 + \sum_{k=2}^{K+1}(1-\alpha_1)\gamma_k\chi_k} = \frac{\alpha_1\chi_1}{\sum_{k=1}^{K+1}\alpha_k\chi_k}. \tag{24}$$

For $2\le k\le K+1$,

$$\begin{aligned}
\frac{\alpha_k\chi_k}{\sum_{k=1}^{K+1}\alpha_k\chi_k} &= \left[\frac{\alpha_k\chi_k}{\sum_{k=2}^{K+1}\alpha_k\chi_k}\right]\left[\frac{\sum_{k=2}^{K+1}\alpha_k\chi_k}{\sum_{k=1}^{K+1}\alpha_k\chi_k}\right] \\
&\overset{(a)}{=} \left[\frac{(1-\alpha_1)\gamma_k\chi_k}{\sum_{k=2}^{K+1}(1-\alpha_1)\gamma_k\chi_k}\right]\left[1 - \frac{\alpha_1\chi_1}{\sum_{k=1}^{K+1}\alpha_k\chi_k}\right] \\
&\overset{(b)}{=} \left[\frac{\gamma_k\chi_k}{\sum_{k=2}^{K+1}\gamma_k\chi_k}\right](1-\beta_1) \\
&\overset{(c)}{=} \left(\frac{\beta_k}{\sum_{k=2}^{K+1}\beta_k}\right)(1-\beta_1) \overset{(d)}{=} \beta_k,
\end{aligned}$$

where (a) is by plugging in (23), (b) uses (24), (c) uses (21), and (d) is by $\sum_{k=1}^{K+1}\beta_k = 1$. The proof is complete.

Footnote 9: If one element of $\{\beta_1, \beta_2\}$ is zero, say $\beta_2 = 0$, we can show necessarily $\alpha_2 = 0$ and it degenerates to the one-policy case $\Phi = \{\phi^1\}$. Such degeneration happens in general cases. Thus in the rest of the proof we will only consider probability distributions that only have positive elements.

APPENDIX E

Proof of Lemma 7: Let $\mathcal{N}_1(T)\subseteq\{0,1,\ldots,T-1\}$ be the subset of time instants in which $Y(t) = 1$. Note that $\sum_{t=0}^{T-1}Y(t) = |\mathcal{N}_1(T)|$. For each $t\in\mathcal{N}_1(T)$, let $1_{[1\to 0]}(t)$ be an indicator function which is 1 if $Y(t)$ transits from 1 to 0 at time $t$, and 0 otherwise. We define $\mathcal{N}_0(T)$ and $1_{[0\to 1]}(t)$ similarly. In $\{0,1,\ldots,T-1\}$, since state transitions of $\{Y(t)\}$ from 1 to 0 and from 0 to 1 differ by at most 1, we have

$$\left|\sum_{t\in\mathcal{N}_1(T)} 1_{[1\to 0]}(t) - \sum_{t\in\mathcal{N}_0(T)} 1_{[0\to 1]}(t)\right| \le 1, \tag{25}$$

which is true for all $T$. Dividing (25) by $T$, we get

$$\left|\frac{1}{T}\sum_{t\in\mathcal{N}_1(T)} 1_{[1\to 0]}(t) - \frac{1}{T}\sum_{t\in\mathcal{N}_0(T)} 1_{[0\to 1]}(t)\right| \le \frac{1}{T}.$$
(26)

Consider the subsequence $\{T_k\}$ such that

$$\lim_{k\to\infty}\frac{1}{T_k}\sum_{t=0}^{T_k-1}Y(t) = \pi_Y(1) = \lim_{k\to\infty}\frac{|\mathcal{N}_1(T_k)|}{T_k}. \tag{27}$$

Note that $\{T_k\}$ exists because $(1/T)\sum_{t=0}^{T-1}Y(t)$ is a bounded sequence indexed by integers $T$. Moreover, there exists a subsequence $\{T_n\}$ of $\{T_k\}$ so that each of the two averages in (26) has a limit point with respect to $\{T_n\}$, because they are bounded sequences, too. In the rest of the proof we will work on $\{T_n\}$, but we drop subscript $n$ for notational simplicity. Passing $T\to\infty$, we get from (26) that

$$\underbrace{\left(\lim_{T\to\infty}\frac{|\mathcal{N}_1(T)|}{T}\right)}_{\overset{(a)}{=}\,\pi_Y(1)}\underbrace{\left(\lim_{T\to\infty}\frac{1}{|\mathcal{N}_1(T)|}\sum_{t\in\mathcal{N}_1(T)} 1_{[1\to 0]}(t)\right)}_{\triangleq\,\beta} = \underbrace{\left(\lim_{T\to\infty}\frac{|\mathcal{N}_0(T)|}{T}\right)}_{\overset{(b)}{=}\,1-\pi_Y(1)}\underbrace{\left(\lim_{T\to\infty}\frac{1}{|\mathcal{N}_0(T)|}\sum_{t\in\mathcal{N}_0(T)} 1_{[0\to 1]}(t)\right)}_{\triangleq\,\gamma}, \tag{28}$$

where (a) is by (27) and (b) is by $|\mathcal{N}_1(T)| + |\mathcal{N}_0(T)| = T$. From (28) we get

$$\pi_Y(1) = \frac{\gamma}{\beta+\gamma}.$$

The next lemma, proved in Appendix F, helps to show $\gamma \le P_{01}$.

Lemma 12 (Stochastic coupling of random binary sequences). Let $\{I_n\}_{n=1}^{\infty}$ be an infinite sequence of binary random variables. Suppose for all $n\in\{1,2,\ldots\}$ we have

$$\Pr\left[I_n = 1 \mid I_1 = i_1, \ldots, I_{n-1} = i_{n-1}\right] \le P_{01} \tag{29}$$

for all possible values of $i_1, \ldots, i_{n-1}$. Then we can construct a new sequence $\{\hat{I}_n\}_{n=1}^{\infty}$ of binary random variables that are i.i.d. with $\Pr[\hat{I}_n = 1] = P_{01}$ for all $n$ and satisfy $\hat{I}_n \ge I_n$ for all $n$. Consequently, we have

$$\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} I_n \le \limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\hat{I}_n = P_{01}.$$

To use Lemma 12 to prove $\gamma\le P_{01}$, let $t_n$ denote the $n$th time $Y(t) = 0$ and let $I_n = 1_{[0\to 1]}(t_n)$. For simplicity assume $\{t_n\}$ is an infinite sequence so that state 0 is visited infinitely often in $\{Y(t)\}$. By the assumption that $Q_{01}(t)\le P_{01}$ for all $t$, we know (29) holds. Therefore by Lemma 12 we have

$$\gamma \le \limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} 1_{[0\to 1]}(t_n) \le P_{01}.$$
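The coupling claimed in Lemma 12 can be illustrated by simulation. The sketch below follows the construction used in the proof in Appendix F: whenever $I_n = 0$, the coupled variable $\hat{I}_n$ is raised to 1 with probability $(P_{01} - q)/(1 - q)$, where $q$ is the conditional probability that $I_n = 1$ given the history. The function `coupled_sequences`, the history-dependent rule `cond_prob`, and the numerical values are hypothetical choices made only for this illustration.

```python
import random


def coupled_sequences(cond_prob, p01, n_steps, rng):
    """Generate (I, I_hat) with I_hat[n] >= I[n] for every n.

    cond_prob(history) must return Pr[I_n = 1 | history], with a value
    that is at most p01 and strictly below 1 (assumed, as in the proof).
    """
    seq, seq_hat = [], []
    for _ in range(n_steps):
        q = cond_prob(seq)
        if not (q <= p01 and q < 1.0):
            raise ValueError("conditional probability violates (29)")
        i_n = 1 if rng.random() < q else 0
        if i_n == 1:
            i_hat = 1
        else:
            # Independent extra coin so that Pr[I_hat = 1 | history] = p01.
            i_hat = 1 if rng.random() < (p01 - q) / (1.0 - q) else 0
        seq.append(i_n)
        seq_hat.append(i_hat)
    return seq, seq_hat


P01 = 0.3  # hypothetical upper bound on the conditional probability


def cond_prob(history):
    # Hypothetical rule: the chance of a 1 is halved right after a 1.
    return P01 / 2 if history and history[-1] == 1 else P01


seq, seq_hat = coupled_sequences(cond_prob, P01, 20000, random.Random(0))
mean_i = sum(seq) / len(seq)
mean_i_hat = sum(seq_hat) / len(seq_hat)
```

With this construction the empirical mean of the dominated sequence never exceeds that of the coupled sequence, and the latter should be close to `P01` for long runs, in line with the lemma.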
Similarly to the use of Lemma 12 above, we can show $\beta\ge P_{10}$ by stochastic coupling. Therefore

$$\pi_Y(1) = \frac{\gamma}{\beta+\gamma} \le \frac{\gamma}{P_{10}+\gamma} \le \frac{P_{01}}{P_{01}+P_{10}} = \pi_X(1).$$

APPENDIX F

Proof of Lemma 12: For simplicity, we assume

$$\Pr\left[I_n = 0 \mid I_1 = i_1, \ldots, I_{n-1} = i_{n-1}\right] > 0$$

for all $n$ and all possible values of $i_1, \ldots, i_{n-1}$. For each $n\in\{1,2,\ldots\}$, define $\hat{I}_n$ as follows: If $I_n = 1$, define $\hat{I}_n = 1$. If $I_n = 0$, observe the history $I_1^{n-1} \triangleq (I_1, \ldots, I_{n-1})$ and independently choose $\hat{I}_n$ as follows:

$$\hat{I}_n = \begin{cases} 1 & \text{with prob. } \dfrac{P_{01} - \Pr[I_n = 1\mid I_1^{n-1}]}{\Pr[I_n = 0\mid I_1^{n-1}]} \\[2ex] 0 & \text{with prob. } 1 - \dfrac{P_{01} - \Pr[I_n = 1\mid I_1^{n-1}]}{\Pr[I_n = 0\mid I_1^{n-1}]}. \end{cases} \tag{30}$$

The probabilities in (30) are well-defined because $P_{01}\ge\Pr[I_n = 1\mid I_1^{n-1}]$ by (29), and

$$P_{01} \le 1 = \Pr[I_n = 1\mid I_1^{n-1}] + \Pr[I_n = 0\mid I_1^{n-1}]$$

and therefore $P_{01} - \Pr[I_n = 1\mid I_1^{n-1}] \le \Pr[I_n = 0\mid I_1^{n-1}]$.

With the above definition of $\hat{I}_n$, we have $\hat{I}_n = 1$ whenever $I_n = 1$. Therefore $\hat{I}_n\ge I_n$ for all $n$. Further, for any $n$ and any binary vector $i_1^{n-1} \triangleq (i_1, \ldots, i_{n-1})$, we have

$$\begin{aligned}
\Pr\left[\hat{I}_n = 1\mid I_1^{n-1} = i_1^{n-1}\right] &= \Pr\left[I_n = 1\mid I_1^{n-1} = i_1^{n-1}\right] \\
&\quad + \Pr\left[I_n = 0\mid I_1^{n-1} = i_1^{n-1}\right]\times\frac{P_{01} - \Pr\left[I_n = 1\mid I_1^{n-1} = i_1^{n-1}\right]}{\Pr\left[I_n = 0\mid I_1^{n-1} = i_1^{n-1}\right]} = P_{01}.
\end{aligned} \tag{31}$$

Therefore, for all $n$ we have

$$\Pr\left[\hat{I}_n = 1\right] = \sum_{i_1^{n-1}}\Pr\left[\hat{I}_n = 1\mid I_1^{n-1} = i_1^{n-1}\right]\Pr\left[I_1^{n-1} = i_1^{n-1}\right] = P_{01},$$

and thus the $\hat{I}_n$ variables are identically distributed. It remains to prove that they are independent. Suppose components in $\hat{I}_1^{n} \triangleq (\hat{I}_1, \ldots, \hat{I}_n)$ are independent. We prove that components in $\hat{I}_1^{n+1} = (\hat{I}_1, \ldots, \hat{I}_{n+1})$ are also independent. For any binary vector $\hat{i}_1^{n+1} \triangleq (\hat{i}_1, \ldots, \hat{i}_{n+1})$, since

$$\begin{aligned}
\Pr\left[\hat{I}_1^{n+1} = \hat{i}_1^{n+1}\right] &= \Pr\left[\hat{I}_{n+1} = \hat{i}_{n+1}\mid\hat{I}_1^{n} = \hat{i}_1^{n}\right]\Pr\left[\hat{I}_1^{n} = \hat{i}_1^{n}\right] \\
&= \Pr\left[\hat{I}_{n+1} = \hat{i}_{n+1}\mid\hat{I}_1^{n} = \hat{i}_1^{n}\right]\prod_{k=1}^{n}\Pr\left[\hat{I}_k = \hat{i}_k\right],
\end{aligned}$$

it suffices to show

$$\Pr\left[\hat{I}_{n+1} = 1\mid\hat{I}_1^{n} = \hat{i}_1^{n}\right] = \Pr\left[\hat{I}_{n+1} = 1\right] = P_{01}.$$

Indeed,

$$\begin{aligned}
\Pr\left[\hat{I}_{n+1} = 1\mid\hat{I}_1^{n} = \hat{i}_1^{n}\right] &= \sum_{i_1^{n}}\Pr\left[\hat{I}_{n+1} = 1\mid I_1^{n} = i_1^{n}, \hat{I}_1^{n} = \hat{i}_1^{n}\right]\Pr\left[I_1^{n} = i_1^{n}\mid\hat{I}_1^{n} = \hat{i}_1^{n}\right] \\
&= \sum_{i_1^{n}}\Pr\left[\hat{I}_{n+1} = 1\mid I_1^{n} = i_1^{n}\right]\Pr\left[I_1^{n} = i_1^{n}\mid\hat{I}_1^{n} = \hat{i}_1^{n}\right] \\
&\overset{(a)}{=} \sum_{i_1^{n}} P_{01}\Pr\left[I_1^{n} = i_1^{n}\mid\hat{I}_1^{n} = \hat{i}_1^{n}\right] = P_{01},
\end{aligned}$$

where (a) is by (31), and the proof is complete.

APPENDIX G

Proof of Lemma 9: By definition of $d(v)$, there exists a nonempty subset $A\subseteq\Phi_{d(v)}$, and for every $\phi\in A$ a positive real number $\hat{\beta}_\phi > 0$, such that $v = \sum_{\phi\in A}\hat{\beta}_\phi\phi$. For each $\phi\in A$, we have $M(\phi) = d(v)$ and thus $c_{M(\phi)} = c_{d(v)}$. Define

$$\beta_\phi \triangleq \frac{\hat{\beta}_\phi}{\sum_{\phi\in A}\hat{\beta}_\phi}$$

for each $\phi\in A$, so that $\{\beta_\phi\}_{\phi\in A}$ is a probability distribution. Consider a RandRR policy that in every round selects $\phi\in A$ with probability $\beta_\phi$. By Lemma 4, this RandRR policy achieves a throughput vector $\lambda = (\lambda_1, \ldots, \lambda_N)$ that satisfies

$$\lambda = \sum_{\phi\in A}\beta_\phi\frac{c_{M(\phi)}}{M(\phi)}\phi = \frac{c_{d(v)}}{d(v)}\sum_{\phi\in A}\frac{\hat{\beta}_\phi}{\sum_{\phi\in A}\hat{\beta}_\phi}\phi = \frac{c_{d(v)}}{d(v)\sum_{\phi\in A}\hat{\beta}_\phi}\sum_{\phi\in A}\hat{\beta}_\phi\phi = \left(\frac{c_{d(v)}}{d(v)\sum_{\phi\in A}\hat{\beta}_\phi}\right)v,$$

which is in the direction of $v$. In addition, the sum throughput

$$\sum_{n=1}^{N}\lambda_n = \sum_{\phi\in A}\beta_\phi\frac{c_{M(\phi)}}{M(\phi)}\left(\sum_{n=1}^{N}\phi_n\right) = \sum_{\phi\in A}\beta_\phi c_{M(\phi)} = c_{d(v)}$$

is achieved.

APPENDIX H

Proof of Theorem 3: (A Related RandRR Policy) For each randomized round robin policy RandRR, it is useful to consider a renewal reward process where renewal epochs are defined as time instants at which RandRR starts a new round of transmission (see footnote 10). Let $T$ denote the renewal period. We say one unit of reward is earned by a user if RandRR serves a packet to that user.
Let $R_n$ denote the sum reward earned by user $n$ in one renewal period $T$, representing the number of successful transmissions user $n$ receives in one round of scheduling. Conditioning on the round robin policy RR($\phi$) chosen by RandRR for the current round of transmission, we have from Corollary 1:

$$\mathbb{E}[T] = \sum_{\phi\in\Phi}\alpha_\phi\,\mathbb{E}[T\mid\mathsf{RR}(\phi)] \tag{32}$$

$$\mathbb{E}[T\mid\mathsf{RR}(\phi)] = \sum_{n:\phi_n=1}\mathbb{E}\left[L^{\phi}_{1n}\right], \tag{33}$$

and for all $n\in\{1,2,\ldots,N\}$,

$$\mathbb{E}[R_n] = \sum_{\phi\in\Phi}\alpha_\phi\,\mathbb{E}[R_n\mid\mathsf{RR}(\phi)] \tag{34}$$

$$\mathbb{E}[R_n\mid\mathsf{RR}(\phi)] = \begin{cases}\mathbb{E}\left[L^{\phi}_{1n} - 1\right] & \text{if } \phi_n = 1 \\ 0 & \text{if } \phi_n = 0.\end{cases} \tag{35}$$

Footnote 10: We note that the renewal reward process is defined solely with respect to RandRR, and is only used to facilitate our analysis. At these renewal epochs, the state of the network, including the current queue state $U(t)$, does not necessarily renew itself.

Consider the round robin policy RR($(1,1,\ldots,1)$) that serves all $N$ channels in one round. We define $T_{\max}$ as its renewal period. From Corollary 1, we know $\mathbb{E}[T_{\max}] < \infty$ and $\mathbb{E}[(T_{\max})^2] < \infty$. Further, for any RandRR, including using a RR($\phi$) policy in every round as special cases, we can show that $T_{\max}$ is stochastically larger than the renewal period $T$, and $(T_{\max})^2$ is stochastically larger than $T^2$. It follows that

$$\mathbb{E}[T] \le \mathbb{E}[T_{\max}], \qquad \mathbb{E}[T^2] \le \mathbb{E}[(T_{\max})^2]. \tag{36}$$

We have denoted by RandRR$^*$ (in the discussion after (13)) the randomized round robin policy that achieves a service rate vector strictly larger than the target arrival rate vector $\lambda$ entrywise. Let $T^*$ denote the renewal period of RandRR$^*$, and $R_n^*$ the sum reward (the number of successful transmissions) received by user $n$ over the renewal period $T^*$. Then we have

$$\begin{aligned}
\frac{\mathbb{E}[R_n^*]}{\mathbb{E}[T^*]} &\overset{(a)}{=} \frac{\sum_{\phi\in\Phi}\alpha_\phi\,\mathbb{E}[R_n^*\mid\mathsf{RR}(\phi)]}{\sum_{\phi\in\Phi}\alpha_\phi\,\mathbb{E}[T^*\mid\mathsf{RR}(\phi)]} \\
&\overset{(b)}{=} \sum_{\phi\in\Phi}\left(\frac{\alpha_\phi}{\sum_{\phi\in\Phi}\alpha_\phi\,\mathbb{E}[T^*\mid\mathsf{RR}(\phi)]}\right)\mathbb{E}[R_n^*\mid\mathsf{RR}(\phi)] \\
&= \sum_{\phi\in\Phi}\underbrace{\frac{\alpha_\phi\,\mathbb{E}[T^*\mid\mathsf{RR}(\phi)]}{\sum_{\phi\in\Phi}\alpha_\phi\,\mathbb{E}[T^*\mid\mathsf{RR}(\phi)]}}_{\overset{(c)}{=}\,\beta_\phi}\;\underbrace{\frac{\mathbb{E}[R_n^*\mid\mathsf{RR}(\phi)]}{\mathbb{E}[T^*\mid\mathsf{RR}(\phi)]}}_{\overset{(d)}{=}\,\eta^{\phi}_n} \\
&= \sum_{\phi\in\Phi}\beta_\phi\eta^{\phi}_n \overset{(e)}{>} \lambda_n + \epsilon,
\end{aligned} \tag{37}$$

where (a) is by (32) and (34), (b) is by rearranging terms, (c) is by plugging (33) into (19), (d) is by plugging (33) and (35) into (9) in Section IV-B, and (e) is by (13). From (37) we get

$$\mathbb{E}[R_n^*] > (\lambda_n + \epsilon)\,\mathbb{E}[T^*], \quad \text{for all } n\in\{1,\ldots,N\}. \tag{38}$$

(Lyapunov Drift) From (12), in a frame of size $T$ (which is possibly random), we can show that for all $n$

$$U_n(t+T) \le \max\left[U_n(t) - \sum_{\tau=0}^{T-1}\mu_n(t+\tau),\, 0\right] + \sum_{\tau=0}^{T-1}a_n(t+\tau). \tag{39}$$

We define a Lyapunov function $L(U(t)) \triangleq (1/2)\sum_{n=1}^{N}U_n^2(t)$ and the $T$-slot Lyapunov drift

$$\Delta_T(U(t)) \triangleq \mathbb{E}\left[L(U(t+T)) - L(U(t))\mid U(t)\right],$$

where the expectation is with respect to the randomness of the whole network in frame $T$, including the randomness of $T$. By taking the square of (39) and then the conditional expectation on $U(t)$, we can show

$$\Delta_T(U(t)) \le \frac{1}{2}N(1 + A_{\max}^2)\,\mathbb{E}\left[T^2\mid U(t)\right] - \mathbb{E}\left[\sum_{n=1}^{N}U_n(t)\left[\sum_{\tau=0}^{T-1}\left(\mu_n(t+\tau) - a_n(t+\tau)\right)\right]\,\Bigg|\; U(t)\right]. \tag{40}$$

Define $f(U(t),\theta)$ as the last term of (40) (without the leading minus sign), where $\theta$ represents a scheduling policy that controls the service rates $\mu_n(t+\tau)$ and the frame size $T$. In the following analysis, we only consider $\theta$ in the class of RandRR policies, and the frame size $T$ is the renewal period of a RandRR policy. By (36), the first term on the right-hand side of (40) is less than or equal to the constant $B_1 \triangleq (1/2)N(1 + A_{\max}^2)\,\mathbb{E}[(T_{\max})^2] < \infty$. It follows that

$$\Delta_T(U(t)) \le B_1 - f(U(t),\theta). \tag{41}$$

In $f(U(t),\theta)$, it is useful to consider $\theta = \mathsf{RandRR}^*$, in which case $T$ is the renewal period $T^*$ of RandRR$^*$. Assume $t$ is the beginning of a renewal period. For each $n\in\{1,2,\ldots,N\}$, because $R_n^*$ is the number of successful transmissions user $n$ receives in the renewal period $T^*$, we have

$$\mathbb{E}\left[\sum_{\tau=0}^{T^*-1}\mu_n(t+\tau)\,\Bigg|\; U(t)\right] = \mathbb{E}[R_n^*].$$

Combining with (38), we get

$$\mathbb{E}\left[\sum_{\tau=0}^{T^*-1}\mu_n(t+\tau)\,\Bigg|\; U(t)\right] > (\lambda_n + \epsilon)\,\mathbb{E}[T^*]. \tag{42}$$

By the assumption that packet arrivals are i.i.d. over slots and independent of the current queue backlogs, we have for all $n$

$$\mathbb{E}\left[\sum_{\tau=0}^{T^*-1}a_n(t+\tau)\,\Bigg|\; U(t)\right] = \lambda_n\,\mathbb{E}[T^*]. \tag{43}$$

Plugging (42) and (43) into $f(U(t),\mathsf{RandRR}^*)$, we get

$$f(U(t),\mathsf{RandRR}^*) \ge \epsilon\,\mathbb{E}[T^*]\sum_{n=1}^{N}U_n(t). \tag{44}$$

It is also useful to consider $\theta$ as a round robin policy RR($\phi$) for some $\phi\in\Phi$. In this case the frame size $T$ is the renewal period $T_\phi$ of RR($\phi$) (note that RR($\phi$) is a special case of RandRR). From Corollary 1, we have

$$\mathbb{E}\left[T_\phi\mid U(t)\right] = \mathbb{E}\left[T_\phi\right] = \sum_{n:\phi_n=1}\mathbb{E}\left[L^{\phi}_{1n}\right], \tag{45}$$

where $\mathbb{E}[L^{\phi}_{1n}]$ can be expanded by (8). Let $t$ be the beginning of a transmission round. If channel $n$ is active, we have

$$\mathbb{E}\left[\sum_{\tau=0}^{T_\phi-1}\mu_n(t+\tau)\,\Bigg|\; U(t)\right] = \mathbb{E}\left[L^{\phi}_{1n}\right] - 1,$$

and 0 otherwise. It follows that

$$\begin{aligned}
f(U(t),\mathsf{RR}(\phi)) &= \left(\sum_{n:\phi_n=1}U_n(t)\,\mathbb{E}\left[L^{\phi}_{1n} - 1\right]\right) - \mathbb{E}\left[T_\phi\right]\sum_{n=1}^{N}U_n(t)\lambda_n \\
&\overset{(a)}{=} \sum_{n:\phi_n=1}\left[U_n(t)\,\mathbb{E}\left[L^{\phi}_{1n} - 1\right] - \mathbb{E}\left[L^{\phi}_{1n}\right]\sum_{n=1}^{N}U_n(t)\lambda_n\right],
\end{aligned} \tag{46}$$

where (a) is by (45) and rearranging terms.

(Design of QRR) Given the current queue backlogs $U(t)$, we are interested in the policy that maximizes $f(U(t),\theta)$ over all RandRR policies in one round of transmission. Although the RandRR policy space is uncountably large and thus searching for the optimal solution could be difficult, next we show that the optimal solution is a round robin policy RR($\phi$) for some $\phi\in\Phi$ and can be found by maximizing $f(U(t),\mathsf{RR}(\phi))$ in (46) over $\phi\in\Phi$.
To see this, we denote by $\phi(t)$ the binary vector associated with the RR($\phi$) policy that maximizes $f(U(t),\mathsf{RR}(\phi))$ over $\phi\in\Phi$, and we have

$$f(U(t),\mathsf{RR}(\phi(t))) \ge f(U(t),\mathsf{RR}(\phi)), \quad \text{for all } \phi\in\Phi. \tag{47}$$

For any RandRR policy, conditioning on the policy RR($\phi$) chosen for the current round of scheduling, we have

$$f(U(t),\mathsf{RandRR}) = \sum_{\phi\in\Phi}\alpha_\phi f(U(t),\mathsf{RR}(\phi)), \tag{48}$$

where $\{\alpha_\phi\}_{\phi\in\Phi}$ is the probability distribution associated with RandRR. By (47) and (48), for any RandRR we get

$$f(U(t),\mathsf{RR}(\phi(t))) \ge \sum_{\phi\in\Phi}\alpha_\phi f(U(t),\mathsf{RR}(\phi)) = f(U(t),\mathsf{RandRR}). \tag{49}$$

We note that as long as the queue backlog vector $U(t)$ is not identically zero and the arrival rate vector $\lambda$ is strictly within the inner capacity bound $\Lambda_{\text{int}}$, we get

$$\max_{\phi\in\Phi}f(U(t),\mathsf{RR}(\phi)) \overset{(a)}{=} f(U(t),\mathsf{RR}(\phi(t))) \overset{(b)}{\ge} f(U(t),\mathsf{RandRR}^*) \overset{(c)}{>} 0, \tag{50}$$

where (a) is from the definition of $\phi(t)$, (b) from (49), and (c) from (44).

The policy QRR is designed to be a frame-based algorithm where at the beginning of each round we observe the current queue backlog vector $U(t)$, find the binary vector $\phi(t)$ whose associated round robin policy RR($\phi(t)$) maximizes $f(U(t),\mathsf{RandRR})$ over RandRR policies, and execute RR($\phi(t)$) for one round of transmission. We emphasize that in every transmission round of QRR, active channels are served in the order that the least recently used channel is served first, and the ordering may change from one round to another.

(Stability Analysis) Again, policy QRR consists of a sequence of transmission rounds, where in each round QRR finds and executes policy RR($\phi(t)$) for one round, and different $\phi(t)$ may be used in different rounds. In the $k$th round, let $T_k^{\mathsf{QRR}}$ denote its time duration. Define $t_k = \sum_{i=1}^{k}T_i^{\mathsf{QRR}}$ for all $k\in\mathbb{N}$ and note that $t_k - t_{k-1} = T_k^{\mathsf{QRR}}$. Let $t_0 = 0$.
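The per-round decision of QRR described above, namely choosing the binary vector that maximizes $f(U(t),\mathsf{RR}(\phi))$ in (46), can be sketched as a brute-force search. This is an illustration under stated assumptions rather than the authors' implementation: $\Phi$ is taken to be the set of all nonzero $N$-dimensional binary vectors, enumeration is exponential in $N$ and only suitable for small examples, and `toy_mean_len` is a hypothetical stand-in for $\mathbb{E}[L^{\phi}_{1n}]$, which in the paper comes from (8).

```python
from itertools import product


def f_rr(U, lam, phi, mean_len):
    """Evaluate f(U, RR(phi)) as in eq. (46).

    mean_len(phi, n) is assumed to return E[L^phi_{1n}] for an active
    channel n; it is a user-supplied (hypothetical) function here.
    """
    active = [n for n, bit in enumerate(phi) if bit == 1]
    frame = sum(mean_len(phi, n) for n in active)            # E[T_phi], eq. (45)
    served = sum(U[n] * (mean_len(phi, n) - 1.0) for n in active)
    arrivals = sum(u * l for u, l in zip(U, lam))            # sum_n U_n * lambda_n
    return served - frame * arrivals


def qrr_choose(U, lam, mean_len):
    """Return a binary vector phi maximizing f(U, RR(phi)) over all
    nonzero binary vectors (assumed definition of Phi)."""
    candidates = [p for p in product((0, 1), repeat=len(U)) if any(p)]
    return max(candidates, key=lambda p: f_rr(U, lam, p, mean_len))


def toy_mean_len(phi, n):
    # Hypothetical model: rounds get longer when more channels are active.
    return 1.0 + 0.5 * sum(phi)


U = [10.0, 0.0, 1.0]        # current backlogs (illustrative)
lam = [0.1, 0.1, 0.1]       # arrival rates (illustrative)
best = qrr_choose(U, lam, toy_mean_len)
```

In a full simulation one would call `qrr_choose` at the start of every transmission round with the backlog vector observed at that time, then run the selected round robin policy for one round.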
Then for each $k\in\mathbb{N}$ we have

$$\begin{aligned}
\Delta_{T_k^{\mathsf{QRR}}}(U(t_{k-1})) &\overset{(a)}{\le} B_1 - f(U(t_{k-1}),\mathsf{QRR}) \\
&\overset{(b)}{\le} B_1 - f(U(t_{k-1}),\mathsf{RandRR}^*) \\
&\overset{(c)}{\le} B_1 - \epsilon\,\mathbb{E}[T^*]\sum_{n=1}^{N}U_n(t_{k-1}),
\end{aligned} \tag{51}$$

where (a) is by (41), (b) is because QRR is the maximizer of $f(U(t_{k-1}),\mathsf{RandRR})$ over all RandRR policies, and (c) is by (44). By taking expectation over $U(t_{k-1})$ in (51) and noting that $\mathbb{E}[T^*]\ge 1$, for all $k\in\mathbb{N}$ we get

$$\begin{aligned}
\mathbb{E}[L(U(t_k))] - \mathbb{E}[L(U(t_{k-1}))] &\le B_1 - \epsilon\,\mathbb{E}[T^*]\sum_{n=1}^{N}\mathbb{E}[U_n(t_{k-1})] \\
&\le B_1 - \epsilon\sum_{n=1}^{N}\mathbb{E}[U_n(t_{k-1})].
\end{aligned} \tag{52}$$

Summing (52) over $k\in\{1,2,\ldots,K\}$, we have

$$\mathbb{E}[L(U(t_K))] - \mathbb{E}[L(U(t_0))] \le KB_1 - \epsilon\sum_{k=1}^{K}\sum_{n=1}^{N}\mathbb{E}[U_n(t_{k-1})].$$

Since $U(t_K)\ge 0$ entrywise and by assumption $U(t_0) = U(0) = 0$, we have

$$\epsilon\sum_{k=1}^{K}\sum_{n=1}^{N}\mathbb{E}[U_n(t_{k-1})] \le KB_1. \tag{53}$$

Dividing (53) by $\epsilon K$ and passing $K\to\infty$, we get

$$\limsup_{K\to\infty}\frac{1}{K}\sum_{k=1}^{K}\sum_{n=1}^{N}\mathbb{E}[U_n(t_{k-1})] \le \frac{B_1}{\epsilon} < \infty. \tag{54}$$

Equation (54) shows that the network is stable when sampled at renewal time instants $\{t_k\}$. Stability when sampled over all time then follows because $T_k^{\mathsf{QRR}}$, the renewal period of the RR($\phi$) policy chosen in the $k$th round of QRR, has finite first and second moments for all $k$ (see (36)), and in every slot the number of packet arrivals to a user is bounded. These details are provided in Lemma 13, which is proved in Appendix I.

Lemma 13. Given that

$$\mathbb{E}\left[T_k^{\mathsf{QRR}}\right] \le \mathbb{E}[T_{\max}], \qquad \mathbb{E}\left[(T_k^{\mathsf{QRR}})^2\right] \le \mathbb{E}\left[(T_{\max})^2\right] \tag{55}$$

for all $k\in\{1,2,\ldots\}$, that packet arrivals to a user are bounded by $A_{\max}$ in every slot, and that the network sampled at renewal epochs $\{t_k\}$ is stable from (54), we have

$$\limsup_{K\to\infty}\frac{1}{t_K}\sum_{\tau=0}^{t_K-1}\sum_{n=1}^{N}\mathbb{E}[U_n(\tau)] < \infty.$$

APPENDIX I

Proof of Lemma 13: In $[t_{k-1}, t_k)$, it is easy to see for all $n\in\{1,\ldots,N\}$

$$U_n(t_{k-1}+\tau) \le U_n(t_{k-1}) + \tau A_{\max}, \quad 0\le\tau<T_k^{\mathsf{QRR}}.$$
(56)

Summing (56) over $\tau\in\{0,1,\ldots,T_k^{\mathsf{QRR}}-1\}$, we get

$$\sum_{\tau=0}^{T_k^{\mathsf{QRR}}-1}U_n(t_{k-1}+\tau) \le T_k^{\mathsf{QRR}}U_n(t_{k-1}) + (T_k^{\mathsf{QRR}})^2A_{\max}/2. \tag{57}$$

Summing (57) over $k\in\{1,2,\ldots,K\}$ and noting that $t_K = \sum_{k=1}^{K}T_k^{\mathsf{QRR}}$, we have

$$\sum_{\tau=0}^{t_K-1}U_n(\tau) = \sum_{k=1}^{K}\sum_{\tau=0}^{T_k^{\mathsf{QRR}}-1}U_n(t_{k-1}+\tau) \overset{(a)}{\le} \sum_{k=1}^{K}\left[T_k^{\mathsf{QRR}}U_n(t_{k-1}) + (T_k^{\mathsf{QRR}})^2A_{\max}/2\right], \tag{58}$$

where (a) is by (57). Taking expectation of (58) and dividing it by $t_K$, we have

$$\frac{1}{t_K}\sum_{\tau=0}^{t_K-1}\mathbb{E}[U_n(\tau)] \overset{(a)}{\le} \frac{1}{K}\sum_{\tau=0}^{t_K-1}\mathbb{E}[U_n(\tau)] \overset{(b)}{\le} \frac{1}{K}\sum_{k=1}^{K}\mathbb{E}\left[T_k^{\mathsf{QRR}}U_n(t_{k-1}) + \frac{(T_k^{\mathsf{QRR}})^2A_{\max}}{2}\right], \tag{59}$$

where (a) follows from $t_K\ge K$ and (b) is by (58). Next, we have

$$\begin{aligned}
\mathbb{E}\left[T_k^{\mathsf{QRR}}U_n(t_{k-1})\right] &= \mathbb{E}\left[\mathbb{E}\left[T_k^{\mathsf{QRR}}U_n(t_{k-1})\mid U_n(t_{k-1})\right]\right] \\
&\overset{(a)}{\le} \mathbb{E}\left[\mathbb{E}\left[T_{\max}U_n(t_{k-1})\mid U_n(t_{k-1})\right]\right] \\
&= \mathbb{E}\left[\mathbb{E}[T_{\max}]\,U_n(t_{k-1})\right] = \mathbb{E}[T_{\max}]\,\mathbb{E}[U_n(t_{k-1})],
\end{aligned} \tag{60}$$

where (a) is because $\mathbb{E}[T_k^{\mathsf{QRR}}]\le\mathbb{E}[T_{\max}]$. Using (55) and (60) to upper bound the last term of (59), we have

$$\frac{1}{t_K}\sum_{\tau=0}^{t_K-1}\mathbb{E}[U_n(\tau)] \le B_2 + \mathbb{E}[T_{\max}]\frac{1}{K}\sum_{k=1}^{K}\mathbb{E}[U_n(t_{k-1})], \tag{61}$$

where $B_2 \triangleq \frac{1}{2}\mathbb{E}[(T_{\max})^2]A_{\max} < \infty$. Summing (61) over $n\in\{1,\ldots,N\}$ and passing $K\to\infty$, we get

$$\begin{aligned}
\limsup_{K\to\infty}\frac{1}{t_K}\sum_{\tau=0}^{t_K-1}\sum_{n=1}^{N}\mathbb{E}[U_n(\tau)] &\le NB_2 + \mathbb{E}[T_{\max}]\left(\limsup_{K\to\infty}\frac{1}{K}\sum_{k=1}^{K}\sum_{n=1}^{N}\mathbb{E}[U_n(t_{k-1})]\right) \\
&\overset{(a)}{\le} NB_2 + \mathbb{E}[T_{\max}]B_1/\epsilon < \infty,
\end{aligned}$$

where (a) is by (54). The proof is complete.

REFERENCES

[1] C.-P. Li and M. J. Neely, "On achievable network capacity and throughput-achieving policies over Markov on/off channels," in IEEE Int. Symp. Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Avignon, France, May 2010.
[2] C.-P. Li and M. J. Neely, "Energy-optimal scheduling with dynamic channel acquisition in wireless downlinks," IEEE Trans. Mobile Comput., vol. 9, no. 4, pp. 527–539, Apr. 2010.
[3] H. S. Wang and P.-C. Chang, "On verifying the first-order Markovian assumption for a Rayleigh fading channel model," IEEE Trans. Veh. Technol., vol. 45, no. 2, pp. 353–357, May 1996.
[4] M. Zorzi, R. R. Rao, and L. B. Milstein, "A Markov model for block errors on fading channels," in Personal, Indoor and Mobile Radio Communications Symp. PIMRC, Oct. 1996.
[5] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Athena Scientific, 2005, vol. I.
[6] Q. Zhao, B. Krishnamachari, and K. Liu, "On myopic sensing for multi-channel opportunistic access: Structure, optimality, and performance," IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 5431–5440, Dec. 2008.
[7] S. H. A. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari, "Optimality of myopic sensing in multichannel opportunistic access," IEEE Trans. Inf. Theory, vol. 55, no. 9, pp. 4040–4050, Sep. 2009.
[8] M. J. Neely, "Stochastic optimization for Markov modulated networks with application to delay constrained wireless scheduling," in IEEE Conf. Decision and Control (CDC), 2009.
[9] Q. Zhao and A. Swami, "A decision-theoretic framework for opportunistic spectrum access," IEEE Wireless Commun. Mag., vol. 14, no. 4, pp. 14–20, Aug. 2007.
[10] A. Pantelidou, A. Ephremides, and A. L. Tits, "Joint scheduling and routing for ad-hoc networks under channel state uncertainty," in IEEE Int. Symp. Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Apr. 2007.
[11] L. Ying and S. Shakkottai, "On throughput optimality with delayed network-state information," in Information Theory and Application Workshop (ITA), 2008, pp. 339–344.
[12] L. Ying and S. Shakkottai, "Scheduling in mobile ad hoc networks with topology and channel-state uncertainty," in IEEE INFOCOM, Rio de Janeiro, Brazil, Apr. 2009.
[13] R. G. Gallager, Discrete Stochastic Processes. Kluwer Academic Publishers, 1996.
[14] P. Whittle, "Restless bandits: Activity allocation in a changing world," J. Appl. Probab., vol. 25, pp. 287–298, 1988.
[15] J. C. Gittins, Multi-Armed Bandit Allocation Indices. New York, NY: Wiley, 1989.
[16] L. Tassiulas and A. Ephremides, "Dynamic server allocation to parallel queues with randomly varying connectivity," IEEE Trans. Inf. Theory, vol. 39, no. 2, pp. 466–478, Mar. 1993.
[17] L. Georgiadis, M. J. Neely, and L. Tassiulas, "Resource allocation and cross-layer control in wireless networks," Foundations and Trends in Networking, vol. 1, no. 1, 2006.
