On Myopic Sensing for Multi-Channel Opportunistic Access: Structure, Optimality, and Performance


Authors:
- **Qing Zhao** (University of California, Davis)
- **Keqin Liu** (University of California, Davis)
- **Bhaskar Krishnamachari** (University of Southern California)

Abstract: We consider a multi-channel opportunistic communication system where the states of these channels evolve as independent and statistically identical Markov chains (the Gilbert-Elliot channel model). A user chooses one channel to sense and access in each slot and collects a reward determined by the state of the chosen channel. The problem is to design a sensing policy for channel selection to maximize the average reward, which can be formulated as a multi-arm restless bandit process. In this paper, we study the structure, optimality, and performance of the myopic sensing policy. We show that the myopic sensing policy has a simple robust structure that reduces channel selection to a round-robin procedure and obviates the need for knowing the channel transition probabilities. The optimality of this simple policy is established for the two-channel case and conjectured for the general case based on numerical results. The performance of the myopic sensing policy is analyzed, which, based on the optimality of myopic sensing, characterizes the maximum throughput of a multi-channel opportunistic communication system and its scaling behavior with respect to the number of channels. These results apply to cognitive radio networks, opportunistic transmission in fading environments, downlink scheduling in centralized networks, and resource-constrained jamming and anti-jamming.

Index Terms: Opportunistic access, cognitive radio, multi-channel MAC, multi-arm restless bandit process, myopic policy.

I. INTRODUCTION
A. Multi-Channel Opportunistic Access

The fundamental idea of opportunistic access is to adapt the transmission parameters (such as data rate and transmission power) according to the state of the communication environment including, for example, fading conditions, interference level, and buffer state. Since the seminal work by Knopp and Humblet in 1995 [1], the concept of opportunistic access has found applications beyond transmission and scheduling over fading channels. An emerging application is cognitive radio for opportunistic spectrum access, where secondary users search in the spectrum for idle channels temporarily unused by primary users [2]. Another application is resource-constrained jamming and anti-jamming, where a jammer seeks channels occupied by users or a user tries to avoid jammers.

We consider a general opportunistic communication system where a user has access to N parallel channels and chooses one channel to sense and access in each slot, aiming to maximize its expected long-term reward (i.e., throughput). This user can be a base station, and each channel is associated with a downlink receiver.

[Manuscript received November 30, 2007; revised June 1, 2008 and June 26, 2008; accepted July 9, 2008. Part of this work was presented at CogNet, June 2007 and ICASSP, March 2008. This work was supported by the Army Research Laboratory CTA on Communication and Networks under Grant DAAD19-01-2-0011 and by the National Science Foundation under Grants CNS-0627090, ECS-0622200, and CNS-0347621. Q. Zhao and K. Liu are with the Department of Electrical and Computer Engineering, University of California, Davis, CA 95616. Emails: {qzhao,kqliu}@ucdavis.edu. B. Krishnamachari is with the Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089. Email: bkrishna@usc.edu.]
In this case, channel selection is equivalent to receiver selection, and the general problem considered here also applies to downlink scheduling in a centralized network.

These N channels are modeled as independent and stochastically identical Gilbert-Elliot channels [3], a model commonly used to abstract physical channels with memory (see, for example, [4], [5]). As illustrated in Fig. 1, the state of a channel (good or bad) indicates the desirability of accessing this channel and determines the resulting reward. For example, in the application of cognitive radio networks, the good state represents a channel unused by primary users while the bad state represents an occupied channel^1. The transitions between these two states follow a Markov chain with transition probabilities {p_ij}, i, j ∈ {0, 1}.

Fig. 1. The Gilbert-Elliot channel model: state 0 (bad) and state 1 (good), with transition probabilities p01, p11, p00, p10.

A sensing policy that governs the channel selection in each slot is crucial to the efficiency of multi-channel opportunistic access. The design of the optimal sensing policy can be formulated as a partially observable Markov decision process (POMDP) for generally correlated channels, or a restless multi-armed bandit process for independent channels. Unfortunately, obtaining the optimal policy for a general POMDP or restless bandit process is often intractable due to the exponential computation complexity. A common approach of trading performance for tractable solutions is to consider myopic policies. A myopic policy aims solely at maximizing the immediate reward, ignoring the impact of the current action on the future reward. Obtaining a myopic policy is thus a static optimization problem instead of a sequential decision-making problem. As a consequence, the complexity is significantly reduced, often at the price of considerable performance loss.
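The Gilbert-Elliot model of Fig. 1 is easy to simulate. The following minimal sketch (function name and parameters are ours, not from the paper) draws a long sample path of one channel; the fraction of good slots should approach the stationary probability p01 / (p01 + p10):

```python
import random

def gilbert_elliot_run(p01, p11, horizon, seed=0):
    """Simulate one Gilbert-Elliot channel for `horizon` slots.

    p01 = P(bad -> good), p11 = P(good -> good); state 1 is good, 0 is bad.
    Returns the empirical fraction of good slots."""
    rng = random.Random(seed)
    state = 0  # start in the bad state
    good = 0
    for _ in range(horizon):
        # transition according to the current state's row of {p_ij}
        p_good = p11 if state == 1 else p01
        state = 1 if rng.random() < p_good else 0
        good += state
    return good / horizon

# Example: p01 = 0.2 and p10 = 1 - p11 = 0.2 give stationary probability 0.5.
frac = gilbert_elliot_run(p01=0.2, p11=0.8, horizon=100_000)
```

With the symmetric parameters above, `frac` settles near 0.5, matching the stationary distribution of the two-state chain.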
In this paper, we show that for designing sensing strategies for multi-channel opportunistic access, low complexity does not necessarily imply suboptimal performance. The myopic sensing policy, with a simple and robust structure, achieves the optimal performance under the i.i.d. Gilbert-Elliot channel model.

^1 When the primary network employs load balancing across channels, the occupancy process of all channels can be considered stochastically identical.

B. Contribution

Under the i.i.d. Gilbert-Elliot channel model, we establish the structure and optimality of the myopic sensing policy and analyze its performance.

1) Structure of Myopic Sensing: The first contribution of this paper is the establishment of a simple and robust structure of the myopic sensing policy. Besides significant implications for practical implementation, this result serves as the key to the optimality proof and the performance analysis. We show that the basic structure of the myopic policy is a round-robin scheme based on a circular ordering of the channels. For the case of p11 ≥ p01, the circular order is constant and determined by the initial information (if any) on the state of each channel. The myopic action is to stay in the same channel when it is good (state 1) and switch to the next channel in the circular order when it is bad. In the case of p11 < p01, the circular order is reversed in every slot, with the initial order determined by the initial information on channel states. The myopic policy stays in the same channel when it is bad; otherwise, it switches to the next channel in the current circular order^2.

The significance of this result in terms of the practical implementation of myopic sensing is twofold. First, it demonstrates the simplicity of myopic sensing: channel selection is reduced to a simple round-robin procedure.
The myopic sensing policy requires no computation and little memory. Second, it shows that myopic sensing is robust to model mismatch. Specifically, the myopic sensing policy has a semi-universal structure; it can be implemented without knowing the channel transition probabilities. The only required information about the channel model is the order of p11 and p01. As a result, the myopic sensing policy automatically tracks variations in the channel model provided that the order of p11 and p01 remains unchanged. Note that when p11 = p01, channel states become independent in time; all channel selections lead to the same performance. We thus expect that myopic sensing is robust to estimation errors in the order of p11 and p01, which usually occur when p11 ≈ p01. This has been confirmed by simulation results [6].

2) Optimality of Myopic Sensing: Surprisingly, the myopic sensing policy with such a simple and robust structure is, in fact, optimal, as established in this paper for N = 2. Based on numerical results, we conjecture that the optimality of the myopic policy can be generalized to N > 2. The optimality, along with the simple and robust structure, makes the myopic sensing policy particularly appealing. In a recent work [8], based on the structure of the myopic policy, the optimality result has been extended to N > 2 under the condition of p11 ≥ p01.

^2 It is easy to show that p11 > p01 corresponds to the case where the channel states in two consecutive slots are positively correlated, i.e., for any distribution of S(t), we have E[(S(t) − E[S(t)])(S(t+1) − E[S(t+1)])] > 0, where S(t) is the state of the Gilbert-Elliot channel in slot t. Similarly, p11 < p01 corresponds to the case where S(t) and S(t+1) are negatively correlated, and p11 = p01 to the case where S(t) and S(t+1) are independent.
While numerical results indicate that for a wide range of p11 and p01 the myopic policy is also optimal for N > 2 with p11 < p01, pathological cases where optimality fails have been found when p01 − p11 is close to 1. Nevertheless, the performance loss of the myopic policy in these cases is minimal and tends to diminish with the horizon length. Establishing necessary and/or sufficient conditions (potentially in the form of bounding p01 − p11) under which the myopic policy is optimal for p11 < p01 appears to be challenging. It is our hope that the results and approaches presented in this paper, in particular the simple structure of the myopic policy, may stimulate fresh ideas for completing the picture on the optimality of the myopic policy.

3) Performance of Myopic Sensing: The optimality of the myopic sensing policy motivates the performance analysis, as its performance defines the throughput limit of a multi-channel opportunistic communication system under the i.i.d. Gilbert-Elliot channel model. We are particularly interested in the relationship between the maximum throughput and the number of channels. Closed-form expressions for the performance of POMDP and restless bandit policies are rare. For the problem at hand, the simple structure of the myopic policy again renders an exception. Specifically, based on the structure of the myopic policy, we show that its performance is determined by the stationary distribution of a higher-order countable-state Markov chain. For N = 2, we have a first-order Markov chain whose stationary distribution can be obtained in closed form, leading to exact characterizations of the throughput. For N > 2, we construct first-order Markov processes that stochastically dominate or are dominated by this higher-order Markov chain.
The stationary distributions of the former, again obtained in closed form, lead to lower and upper bounds that monotonically tighten as the number N of channels increases. These analytical characterizations allow us to study the rate at which the maximum throughput of an opportunistic system increases with N, and to obtain the limiting performance as N approaches infinity. Our result demonstrates that the maximum throughput of a multi-channel opportunistic system with single-channel sensing saturates at geometric rate as the number of channels increases. This result suggests to system designers the importance of having radios capable of sensing multiple channels in order to fully exploit the communication opportunities offered by a large number of channels.

C. Related Work

The structure, optimality, and performance analysis of myopic sensing in the context of opportunistic access may bear significance in the general context of restless multi-armed bandit processes. While an index policy (Gittins index [11]) is known to be optimal for the classical bandit problems, the structure of the optimal policy for a general restless bandit process remains unknown, and the problem is shown to be PSPACE-hard [12]. Whittle proposed a Gittins-like heuristic index policy for restless bandit problems [7], which is asymptotically optimal in certain limiting regimes [13]. Beyond this asymptotic result, relatively little is known about the structure of the optimal policies for a general restless bandit process. The existing literature mainly focuses on approximation algorithms and heuristic policies [9], [10].

ω_i(t+1) = p11, if a(t) = i and S_{a(t)}(t) = 1;
ω_i(t+1) = p01, if a(t) = i and S_{a(t)}(t) = 0;
ω_i(t+1) = ω_i(t) p11 + (1 − ω_i(t)) p01, if a(t) ≠ i.    (1)
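The Bayes update in (1) translates directly into code. A minimal sketch (function name and argument conventions are ours): the sensed channel's belief collapses to a one-step transition probability, while every unobserved channel evolves by one Markov step.

```python
def belief_update(omega, a, obs, p01, p11):
    """One-step update of the belief vector per Eq. (1).

    omega: list of P(S_i(t) = 1); a: index of the sensed channel;
    obs: observed state of channel a (0 or 1)."""
    return [
        (p11 if obs == 1 else p01) if i == a          # sensed channel
        else w * p11 + (1 - w) * p01                  # unobserved channels
        for i, w in enumerate(omega)
    ]

# Sensing channel 0 and finding it good pins its belief to p11 = 0.8,
# while channel 1 drifts toward the stationary value.
nxt = belief_update([0.5, 0.5], a=0, obs=1, p01=0.2, p11=0.8)
```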
The optimality of the myopic policy shown in this paper suggests non-asymptotic conditions under which an index policy can be optimal for restless bandit processes.

The results presented in this paper apply to cognitive radio networks, which have received increasing attention recently. In this context, the design of sensing policies for tracking rapidly varying spectrum opportunities has been addressed in [14], [15] under a general Markovian model of potentially correlated channels, where a POMDP framework has been developed. This paper is also related to channel probing and transmission strategies in multichannel wireless systems (see [16]–[19] and references therein). In contrast to the Markovian model considered in this paper, these existing results adopt a memoryless channel model.

II. PROBLEM FORMULATION

We consider the scenario where a user is trying to access N independent and stochastically identical channels using a slotted transmission structure. The state S_i(t) of channel i in slot t is given by the two-state Markov chain shown in Fig. 1. At the beginning of each slot, the user selects one of the N channels to sense. If the channel is sensed to be good (state 1), the user transmits and collects one unit of reward. Otherwise, the user does not transmit (or transmits at a lower rate), collects no reward, and waits until the next slot to make another choice. The objective is to maximize the average reward (throughput) over a horizon of T slots by judiciously choosing a sensing policy that governs channel selection in each slot.

Due to limited sensing, the full system state [S_1(t), ..., S_N(t)] ∈ {0, 1}^N in slot t is not observable. The user, however, can infer the state from its decision and observation history.
It has been shown that a sufficient statistic for optimal decision making is given by the conditional probability that each channel is in state 1 given all past decisions and observations [20]. Referred to as the belief vector, this sufficient statistic is denoted by Ω(t) ≜ [ω_1(t), ..., ω_N(t)], where ω_i(t) is the conditional probability that S_i(t) = 1. Given the sensing action a(t) and the observation S_{a(t)}(t) in slot t, the belief vector for slot t+1 can be obtained via Bayes' rule as given in (1).

A sensing policy π specifies a sequence of functions π = [π_1, π_2, ..., π_T], where π_t is the decision rule at time t that maps a belief vector Ω(t) to a sensing action a(t) ∈ {1, ..., N} for slot t. Multi-channel opportunistic access can thus be formulated as the following stochastic control problem:

π* = arg max_π E_π [ Σ_{t=1}^{T} R_{π_t(Ω(t))}(t) | Ω(1) ],    (2)

where π_t(Ω(t)) is the channel selected, R_{π_t(Ω(t))}(t) = S_{π_t(Ω(t))}(t) is the reward so obtained when the belief is Ω(t), and Ω(1) is the initial belief vector. If no information about the initial system state is available, each entry of Ω(1) can be set to the stationary distribution ω_o of the underlying Markov chain:

ω_o = p01 / (p01 + p10).    (3)

This problem falls into the general model of POMDPs. It can also be considered a restless multi-armed bandit problem by treating the belief value of each channel as the state of each arm of a bandit. Note that for a given sensing policy π, the belief vectors {Ω(t)}_{t=1}^{T} form a Markov process with an uncountable state space. The expectation in (2) is with respect to this Markov process, which determines the reward process. The difficulty in obtaining the optimal policy π* and characterizing its performance largely results from the complexity of analyzing a Markov process with an uncountable state space.
III. OPTIMAL POLICY VS. MYOPIC POLICY

A. Value Function and Optimal Policy

Let V_t(Ω(t)) be the value function, which represents the maximum expected total reward that can be obtained starting from slot t given the current belief vector Ω(t). Given that the user takes action a and observes S_a(t) in slot t, the reward that can be accumulated starting from slot t consists of two parts: the expected immediate reward E[R_a(t)] = E[S_a(t)] = ω_a(t) and the maximum expected future reward V_{t+1}(T(Ω(t) | a, S_a(t))), where T(Ω(t) | a, S_a(t)) denotes the updated belief vector for slot t+1 as given in (1). Averaging over all possible observations S_a(t) and maximizing over all actions a, we arrive at the following optimality equations:

V_T(Ω(T)) = max_{a=1,...,N} ω_a(T),
V_t(Ω(t)) = max_{a=1,...,N} { ω_a(t) + ω_a(t) V_{t+1}(T(Ω(t) | a, 1)) + (1 − ω_a(t)) V_{t+1}(T(Ω(t) | a, 0)) }.    (4)

In theory, the optimal policy π* and its performance V_1(Ω(1)) can be obtained by solving the above dynamic program. Unfortunately, this approach is computationally prohibitive due to the impact of the current action on the future reward and the uncountable space of the belief vector Ω(t). Even if approximate numerical solutions are feasible, they do not provide insights for system design or analytical characterizations of the optimal performance V_1(Ω(1)).

B. Myopic Policy

A myopic policy ignores the impact of the current action on the future reward, focusing solely on maximizing the expected immediate reward E[R_a(t)]. Myopic policies are thus stationary: the mapping from belief vectors to actions does not change with time t.
The myopic action â(t) and the value function V̂_t(Ω(t)) of the myopic policy for a given belief vector Ω(t) are given by

â(t) = arg max_{a=1,...,N} ω_a(t),    (5)
V̂_t(Ω(t)) = ω_{â(t)}(t) + ω_{â(t)}(t) V̂_{t+1}(T(Ω(t) | â(t), 1)) + (1 − ω_{â(t)}(t)) V̂_{t+1}(T(Ω(t) | â(t), 0)).

In general, obtaining the myopic action in each slot requires the recursive update of the belief vector Ω(t) as given in (1), which requires knowledge of the transition probabilities {p_ij}. In the next section, we show that the myopic policy has a simple semi-universal structure that needs neither the update of the belief vector nor knowledge of the transition probabilities.

IV. STRUCTURE OF MYOPIC SENSING

In this section, we establish the simple and robust structure of the myopic policy, which lays the foundation for the optimality proof and performance analysis in subsequent sections.

A. Structure

The basic element in the structure of the myopic policy is a circular ordering K of the channels. For a circular order, the starting point is irrelevant: a circular order K = (n_1, n_2, ..., n_N) is equivalent to (n_i, n_{i+1}, ..., n_N, n_1, n_2, ..., n_{i−1}) for any 1 ≤ i ≤ N. An example of a circular order is given in Fig. 2, where all N channels are placed on a circle in the clockwise direction. We now introduce the following notation. For a circular order K, let −K denote its reverse circular order, i.e., for K = (n_1, n_2, ..., n_N), we have −K = (n_N, n_{N−1}, ..., n_1) (see Fig. 3 for an illustration, where the lower circle on the right shows the reverse circular order of that given by the circle on the left). For a channel i, let i^+_K denote the next channel in the circular order K. For example, for K = (1, 2, ..., N), we have i^+_K = i + 1 for 1 ≤ i < N and N^+_K = 1.
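The circular-order notation (−K and i^+_K) amounts to two small helper functions; a sketch (names are ours):

```python
def reverse_order(K):
    """-K: the reverse of circular order K = (n_1, ..., n_N)."""
    return tuple(reversed(K))

def next_channel(K, i):
    """i+_K: the channel following channel i in circular order K
    (wraps around, since the starting point of a circular order is irrelevant)."""
    return K[(K.index(i) + 1) % len(K)]

K = (1, 2, 3, 4)
```

For K = (1, 2, ..., N) these reproduce the text's example: `next_channel` maps i to i + 1 for i < N and maps N back to 1.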
With these notations, we present the structure of the myopic policy in Theorem 1.

Theorem 1 (Structure of Myopic Sensing): Let Ω(1) = [ω_1(1), ..., ω_N(1)] denote the initial belief vector. The circular channel order K(1) in slot 1 is determined by a descending order of Ω(1) (i.e., K(1) = (n_1, n_2, ..., n_N) implies that ω_{n_1}(1) ≥ ω_{n_2}(1) ≥ ... ≥ ω_{n_N}(1)). Let â(1) = arg max_{i=1,...,N} ω_i(1). The myopic action â(t) in slot t (t > 1) is given as follows.

• Case 1: p11 ≥ p01

â(t) = â(t−1) if S_{â(t−1)}(t−1) = 1;  â(t) = â(t−1)^+_{K(t)} if S_{â(t−1)}(t−1) = 0,    (6)

where K(t) = K(1).

• Case 2: p11 < p01

â(t) = â(t−1) if S_{â(t−1)}(t−1) = 0;  â(t) = â(t−1)^+_{K(t)} if S_{â(t−1)}(t−1) = 1,    (7)

where K(t) = K(1) when t is odd and K(t) = −K(1) when t is even.

Proof: See Appendix A.

Theorem 1 shows that the basic structure of the myopic policy is a round-robin scheme based on a circular ordering of the channels. For p11 ≥ p01, the circular order is constant: K(t) = K(1) in every slot t, where K(1) is determined by a descending order of the initial belief values. The myopic action is to stay in the same channel when it is good (state 1) and switch to the next channel in the circular order when it is bad (see Fig. 2 for an illustration).

Fig. 2. The structure of the myopic policy for p11 ≥ p01: the circular order of the channels is constant and determined by the initial belief Ω(1) (ω_1(1) ≥ ω_2(1) ≥ ... ≥ ω_N(1) is assumed in this example, thus â(1) = 1); the myopic policy switches to the next channel when the current one is in the bad state.

In the case of p11 < p01, the circular order is reversed in every slot, i.e., K(t) = K(1) when t is odd and K(t) = −K(1) when t is even, where the initial order K(1) is determined by the initial belief values. The myopic policy stays in the same channel when it is bad; otherwise, it switches to the next channel in the current circular order K(t), which is either K(1) or −K(1) depending on whether the current time t is odd or even. An illustration is given in Fig. 3.

An alternative way to see the channel switching structure of the myopic policy is through the last visit to each channel (once every channel has been visited at least once). Specifically, for p11 ≥ p01, when a channel switch is needed, the policy selects the channel visited the longest time ago. For p11 < p01, when a channel switch is needed, the policy selects, among those channels to which the last visit occurred an even number of slots ago, the one most recently visited. If there are no such channels, the user chooses the channel visited the longest time ago (see Appendix B for a proof).

B. Properties

The simple structure of the myopic policy has significant implications in both practical and technical aspects. Implementation-wise, the following properties of the myopic policy follow from its structure: belief independence and model insensitivity. Specifically, the myopic policy does not require the update of the belief vectors or knowledge of the transition probabilities except the order of p11 and p01.

For p11 ≥ p01:  q_{i,j} = Π_{k=1}^{N} p_{i_k, j_k} if i_1 = 1;  q_{i,j} = p_{i_1, j_N} Π_{k=2}^{N} p_{i_k, j_{k−1}} if i_1 = 0.
For p11 < p01:  q_{i,j} = Π_{k=1}^{N} p_{i_k, j_{N−k+1}} if i_1 = 1;  q_{i,j} = p_{i_1, j_1} Π_{k=2}^{N} p_{i_k, j_{N−k+2}} if i_1 = 0.    (8)

Here i = [i_1, i_2, ..., i_N] and j = [j_1, j_2, ..., j_N] are vectors with entries equal to 0 or 1.

Fig. 3. The structure of the myopic policy for p11 < p01: in the first slot (t = 1), the circular order K(1) is determined by the initial belief Ω(1) (ω_1(1) ≥ ω_2(1) ≥ ... ≥ ω_N(1) is assumed in this example, thus â(1) = 1). Suppose that channel 1 is in the bad state in slots 1, ..., L−2 and in the good state in slot L−1. The circular order at t = L is K(1) when L is odd and −K(1) when L is even, and â(L) is the next channel in K(L), i.e., â(L) = 2 for L odd and â(L) = N for L even.

These properties make the myopic policy particularly attractive in implementation. Besides its simplicity, this semi-universal structure leads to robustness against model mismatch and variations.

A technical benefit of this simple structure is that it provides the foundation for establishing the optimality and characterizing the performance of the myopic policy as given in Secs. V and VI, as well as the generalizations of the optimality proof to N > 2 given in [8]. The reason is that the structure allows us to work with a Markov reward process with a finite state space instead of one with an uncountable state space (i.e., belief vectors), as we encounter in a general POMDP. Details are stated in the corollary below.

Corollary 1: Let K(t) = (n_1, n_2, ..., n_N) (n_i ∈ {1, 2, ..., N} for all i) be the circular order of channels in slot t, where the starting point of the circular order is fixed to the myopic action: n_1 = â(t) for all t.
Then the resulting ordered channel states S(t) ≜ [S_{n_1}(t), S_{n_2}(t), ..., S_{n_N}(t)] form a 2^N-state Markov chain with transition probabilities {q_{i,j}} given in (8), and the performance of the myopic policy is determined by the Markov reward process (S(t), R(t)) with R(t) = S_{n_1}(t).

Proof: The proof follows directly from Theorem 1 by noticing that S_{n_1}(t) determines the channel ordering in S(t+1) and each channel evolves as an independent Markov chain. Specifically, for p11 ≥ p01, if S_{n_1}(t) = 1, the channel ordering in S(t+1) is the same as that in S(t); if S_{n_1}(t) = 0, the first channel (channel n_1) in S(t) is moved to the last position in S(t+1) with the ordering of the remaining N−1 channels unchanged. For p11 < p01, if S_{n_1}(t) = 0, the first channel in S(t) remains the first in S(t+1) while the ordering of the remaining channels is reversed; if S_{n_1}(t) = 1, the ordering of all N channels is reversed. The transition probabilities given in (8) thus follow.

V. OPTIMALITY OF MYOPIC SENSING

In this section, we establish the optimality of the myopic policy for N = 2. Our proof hinges on the structure of the myopic policy given in Theorem 1 and Corollary 1.

Theorem 2 (Optimality of Myopic Sensing): For N = 2, the myopic sensing policy is optimal, i.e., V̂_t(Ω) = V_t(Ω) for all t and Ω.

Proof: See Appendix C.

Based on extensive numerical results, we conjecture that the optimality of the myopic sensing policy can be generalized to N > 2. A recent work [8] has made partial progress towards proving this conjecture by showing that the optimality holds for N > 2 under the condition of p11 ≥ p01.
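Theorem 2 can be checked numerically by brute force over short horizons: evaluate the dynamic program (4) and the myopic recursion (5) side by side and compare. The sketch below (function names are ours) does this for a few beliefs and both orderings of p11 and p01; by Theorem 2 the gap should vanish for N = 2.

```python
def update(omega, a, obs, p01, p11):
    # Bayes update per Eq. (1)
    return tuple((p11 if obs else p01) if i == a else w * p11 + (1 - w) * p01
                 for i, w in enumerate(omega))

def v_opt(omega, steps, p01, p11):
    """Optimal value via the dynamic program (4)."""
    if steps == 0:
        return 0.0
    return max(
        w + w * v_opt(update(omega, a, 1, p01, p11), steps - 1, p01, p11)
          + (1 - w) * v_opt(update(omega, a, 0, p01, p11), steps - 1, p01, p11)
        for a, w in enumerate(omega))

def v_myopic(omega, steps, p01, p11):
    """Value of the myopic policy (5): sense the channel with the largest belief."""
    if steps == 0:
        return 0.0
    a = max(range(len(omega)), key=lambda i: omega[i])
    w = omega[a]
    return (w + w * v_myopic(update(omega, a, 1, p01, p11), steps - 1, p01, p11)
              + (1 - w) * v_myopic(update(omega, a, 0, p01, p11), steps - 1, p01, p11))

# Largest myopic-vs-optimal gap over a few N = 2 instances, horizon 5.
gap = max(abs(v_opt(om, 5, p01, p11) - v_myopic(om, 5, p01, p11))
          for om in [(0.3, 0.7), (0.9, 0.1), (0.5, 0.4)]
          for (p01, p11) in [(0.2, 0.8), (0.8, 0.2)])
```

The exhaustive recursion in `v_opt` grows exponentially in the horizon, which is exactly the intractability that makes the structural result of Theorem 1 valuable.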
Furthermore, it is shown in [8] that if the myopic policy is optimal under the sum-reward criterion over a finite horizon, it is also optimal for other criteria such as discounted and averaged rewards over a finite or infinite horizon. In the case of infinite-horizon discounted reward, it is determined that as long as the discount factor is less than 0.5, the myopic policy is optimal for all N.

VI. PERFORMANCE OF MYOPIC SENSING

In this section, we analyze the performance of the myopic policy. With the optimality results, the throughput achieved by the myopic policy defines the performance limit of a multi-channel opportunistic communication system. In particular, we are interested in the relationship between this maximum throughput and the number N of channels.

A. Uniqueness of Steady-State Performance and Its Numerical Evaluation

We first establish the existence and uniqueness of the system steady state under the myopic policy. The steady-state throughput of the myopic policy is given by

U(Ω(1)) ≜ lim_{T→∞} V̂_{1:T}(Ω(1)) / T,    (9)

where V̂_{1:T}(Ω(1)) is the expected total reward obtained in T slots under the myopic policy when the initial belief is Ω(1). From Corollary 1, U(Ω(1)) is determined by the Markov reward process {S(t), R(t)}. It is easy to see that the 2^N-state Markov chain {S(t)} is irreducible and aperiodic, and thus has a limiting distribution. As a consequence, the limit in (9) exists, and the steady-state throughput U is independent of the initial belief value Ω(1).

Corollary 1 also provides a numerical approach to evaluating U by calculating the limiting (stationary) distribution of {S(t)}, whose transition probabilities are given in (8). Specifically, the throughput U is given by the sum of the limiting probabilities of those 2^{N−1} states with first entry S(1) = 1.
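The stationary-distribution evaluation can be cross-checked by directly simulating the round-robin rule of Theorem 1. The sketch below (our own implementation and naming; the handling of the order reversal for p11 < p01 follows our reading of Theorem 1) estimates U as the long-run fraction of rewarded slots:

```python
import random

def simulate_myopic(p01, p11, n_channels, horizon, seed=0):
    """Estimate the steady-state throughput U of the structured myopic policy.

    Theorem 1: for p11 >= p01, stay on a good channel and otherwise move to
    the next channel in a fixed circular order; for p11 < p01, stay on a bad
    channel and otherwise move to the next channel in the circular order,
    which is reversed in every slot."""
    rng = random.Random(seed)
    w0 = p01 / (p01 + (1 - p11))          # stationary belief, Eq. (3)
    states = [1 if rng.random() < w0 else 0 for _ in range(n_channels)]
    order = list(range(n_channels))
    cur = 0                               # position of the sensed channel in `order`
    reward = 0
    for _ in range(horizon):
        s = states[order[cur]]
        reward += s
        switch = (s == 0) if p11 >= p01 else (s == 1)
        if p11 < p01:
            # the circular order reverses every slot; keep the sensed
            # channel's position consistent under the reversal
            order.reverse()
            cur = len(order) - 1 - cur
        if switch:
            cur = (cur + 1) % n_channels
        # all channels evolve independently by one Markov step
        states = [1 if rng.random() < (p11 if st else p01) else 0 for st in states]
    return reward / horizon

u_iid = simulate_myopic(p01=0.5, p11=0.5, n_channels=3, horizon=100_000)
u_pos = simulate_myopic(p01=0.2, p11=0.8, n_channels=2, horizon=200_000)
```

Two sanity checks: when p11 = p01 the channel states are independent across slots, so every policy attains ω_o (here 0.5); and for p01 = 0.2, p11 = 0.8, N = 2, our evaluation of the closed form in Theorem 3 gives U = 0.65, which the estimate should approach.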
This numerical approach, however, does not provide an analytical characterization of the throughput U in terms of the number N of channels and the transition probabilities {p_ij}. In the next section, we obtain analytical expressions of U and its scaling behavior with respect to N based on a stochastic dominance argument.

B. Analytical Characterization of Throughput

1) The Structure of Transmission Periods: From the structure of the myopic policy, we can see that the key to the throughput is how often the user switches channels, or equivalently, how long the user stays in the same channel. When p11 ≥ p01, the event of channel switching is equivalent to a slot without reward. The opposite holds when p11 < p01: a channel switch corresponds to a slot with reward. We thus introduce the concept of a transmission period (TP), which is the time the user stays in the same channel (see Fig. 4). Let L_k denote the length of the k-th TP. We then have a discrete-time random process {L_k}_{k=1}^{∞} with a state space of positive integers.

Fig. 4. The transmission period structure (example: L_k = 3 and L_{k+1} = 6, separated by a channel switch).

Based on the structure of the myopic policy, we have

U = lim_{K→∞} Σ_{k=1}^{K} (L_k − 1) / Σ_{k=1}^{K} L_k  for p11 ≥ p01;
U = lim_{K→∞} K / Σ_{k=1}^{K} L_k  for p11 < p01.    (10)

Let L̄ = lim_{K→∞} (Σ_{k=1}^{K} L_k) / K denote the average length of a TP. The above equation leads to

U = 1 − 1/L̄ for p11 ≥ p01;  U = 1/L̄ for p11 < p01.    (11)

Throughput analysis is thus reduced to analyzing the average TP length L̄. For N = 2, a closed-form expression of L̄ can be obtained, which leads to a closed-form expression of the throughput U (see Sec. VI-B.2). For N > 2, lower and upper bounds on U are obtained (see Sec. VI-B.3).

2) Throughput for N = 2: From the structure of the myopic policy, {L_k}_{k=1}^{∞} forms a first-order Markov chain for N = 2.
Specifically, the distribution of L_k is determined by the belief value of the chosen channel in the first slot of the k-th TP. The latter equals p_{01}^{(L_{k-1}+1)} for p_{11} \ge p_{01} and p_{11}^{(L_{k-1}+1)} for p_{11} < p_{01}, where p_{01}^{(j)} is the j-step transition probability. The transition probabilities of \{L_k\}_{k=1}^{\infty} are thus given as follows.

• For p_{11} \ge p_{01},

r_{ij} = \begin{cases} 1 - p_{01}^{(i+1)}, & i \ge 1, j = 1 \\ p_{01}^{(i+1)} p_{11}^{j-2} p_{10}, & i \ge 1, j \ge 2 \end{cases}.   (12)

• For p_{11} < p_{01},

r_{ij} = \begin{cases} p_{11}^{(i+1)}, & i \ge 1, j = 1 \\ p_{10}^{(i+1)} p_{00}^{j-2} p_{01}, & i \ge 1, j \ge 2 \end{cases}.   (13)

As shown in Appendix D, the limiting distribution \{\lambda_l\}_{l=1}^{\infty} of this countable-state Markov chain can be obtained in closed form, which leads to \bar{L} = \sum_{l=1}^{\infty} l \lambda_l and then the throughput U from (11).

Theorem 3: For N = 2, the throughput U is given by

U = \begin{cases} 1 - \frac{1 - p_{11}}{1 + \bar{\omega} - p_{11}}, & p_{11} \ge p_{01} \\ \frac{p_{01}}{1 - \bar{\omega}' + p_{01}}, & p_{11} < p_{01} \end{cases},   (14)

where \bar{\omega} and \bar{\omega}' are the expected probability that the channel the user switches to is in state 1 when p_{11} \ge p_{01} and p_{11} < p_{01}, respectively. They are given in (15) and (16).

Proof: See Appendix D.

3) Throughput for N > 2: For N > 2, \{L_k\}_{k=1}^{\infty} is a random process with higher-order memory. In particular, for p_{11} \ge p_{01}, it is an (N-1)-th order Markov chain. As a consequence, closed-form expressions of \bar{L} are difficult to obtain. Our objective is to develop lower and upper bounds on U, which allow us to study the scaling behavior of U with respect to N. The approach is to construct first-order Markov chains that stochastically dominate or are dominated by \{L_k\}_{k=1}^{\infty}. The stationary distributions of these first-order Markov chains, which can be obtained in closed form, lead to lower and upper bounds on U according to (11).
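For N = 2, Theorem 3 can also be cross-checked numerically: truncate the state space of the chain (12), power-iterate to its stationary distribution, and recover U through (11). The sketch below does this for the p_{11} \ge p_{01} branch; the truncation level M and the parameter values are illustrative assumptions.

```python
import numpy as np

def p01_j(p01, p11, j):
    """j-step transition probability p01^(j), eq. (21)."""
    return (p01 - p01 * (p11 - p01) ** j) / (p01 + (1.0 - p11))

def tp_throughput(p01, p11, M=400):
    """Throughput U for N = 2 and p11 >= p01, via the TP chain (12) and eq. (11).
    M is the truncation level of the countable state space (assumed large enough)."""
    assert p11 >= p01
    p10 = 1.0 - p11
    R = np.zeros((M, M))
    for i in range(1, M + 1):
        w = p01_j(p01, p11, i + 1)       # belief at the start of the new TP
        R[i - 1, 0] = 1.0 - w
        for j in range(2, M + 1):
            R[i - 1, j - 1] = w * p11 ** (j - 2) * p10
    lam = np.full(M, 1.0 / M)
    for _ in range(5_000):               # power iteration to the stationary dist.
        lam = lam @ R
        lam /= lam.sum()                 # renormalize against truncation loss
    L_bar = np.dot(np.arange(1, M + 1), lam)
    return 1.0 - 1.0 / L_bar             # eq. (11), p11 >= p01 branch

u = tp_throughput(0.3, 0.8)
```

For p_{01} = 0.3, p_{11} = 0.8 this agrees with the closed form of Theorem 3 (U = 0.72 with \bar{\omega} from (15)).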
Specifically, for p_{11} \ge p_{01}, a lower bound on U is obtained by constructing a first-order Markov chain whose stationary distribution is stochastically dominated by the stationary distribution of \{L_k\}_{k=1}^{\infty}. An upper bound on U is given by a first-order Markov chain whose stationary distribution stochastically dominates the stationary distribution of \{L_k\}_{k=1}^{\infty}. Similarly, bounds on U can be obtained for p_{11} < p_{01}.

Theorem 4: For N > 2, we have the following lower and upper bounds on the throughput U.

• Case 1: p_{11} \ge p_{01}

\frac{C}{C + (1 - D + C)(1 - p_{11})} \le U \le \frac{\omega_o}{1 - p_{11} + \omega_o},   (17)

where \omega_o is given by (3) and

C = \omega_o \left(1 - (p_{11} - p_{01})^N\right), \quad D = \omega_o \left(1 - \frac{(p_{11} - p_{01})^{N+1} (1 - p_{11})}{1 - p_{11}^2 + p_{11} p_{01}}\right).

• Case 2: p_{11} < p_{01}

\frac{1 - p_{10}^{(2)}}{E - p_{01} H} \le U \le \frac{1 - p_{10}^{(2)}}{E - p_{01} G},   (18)

where

p_{10}^{(2)} = p_{10} p_{00} + p_{11} p_{10},
E = p_{10}^{(2)} (1 + p_{01}) + p_{01} (1 - F),
F = (1 - p_{01})(1 - \omega_o) \left( \frac{1}{2 - p_{01}} - \frac{p_{01} (p_{11} - p_{01})^4}{1 - (p_{11} - p_{01})^2 (1 - p_{01})^2} \right),
G = (1 - \omega_o) \left( \frac{1}{2 - p_{01}} - \frac{p_{01} (p_{11} - p_{01})^6}{1 - (p_{11} - p_{01})^2 (1 - p_{01})^2} \right),
H = (1 - \omega_o) \left( \frac{1}{2 - p_{01}} - \frac{p_{01} (p_{11} - p_{01})^{2N-1}}{1 - (p_{11} - p_{01})^2 (1 - p_{01})^2} \right).

• Monotonicity: in both cases, the upper bound is independent of N while the lower bound monotonically approaches the upper bound as N increases; for p_{11} \ge p_{01}, the lower bound converges to the upper bound as N \to \infty.

The quantities \bar{\omega} and \bar{\omega}' in Theorem 3 are given by

\bar{\omega} = \frac{p_{01}^{(2)}}{1 + p_{01}^{(2)} - A}, \quad \text{where } p_{01}^{(2)} = p_{00} p_{01} + p_{01} p_{11}, \quad A = \frac{p_{01}}{1 + p_{01} - p_{11}} \left(1 - \frac{(p_{11} - p_{01})^3 (1 - p_{11})}{1 - p_{11}^2 + p_{11} p_{01}}\right),   (15)

\bar{\omega}' = \frac{B}{1 - p_{11}^{(2)} + B}, \quad \text{where } p_{11}^{(2)} = p_{10} p_{01} + p_{11} p_{11}, \quad B = \frac{p_{01}}{1 + p_{01} - p_{11}} \left(1 + \frac{(p_{11} - p_{01})^3 (1 - p_{11})}{1 - (1 - p_{01})(p_{11} - p_{01})}\right).   (16)

Proof: See Appendix E.
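The Case 1 bounds in (17) are straightforward to evaluate numerically; the sketch below computes them and checks the monotonicity and convergence claims for one illustrative parameter choice (not taken from the paper).

```python
def bounds_case1(p01, p11, N):
    """Lower and upper bounds on U from (17), valid for p11 >= p01."""
    p10 = 1.0 - p11
    w_o = p01 / (p01 + p10)              # stationary prob. of state 1, eq. (3)
    x = p11 - p01
    C = w_o * (1.0 - x ** N)
    D = w_o * (1.0 - x ** (N + 1) * p10 / (1.0 - p11 ** 2 + p11 * p01))
    lower = C / (C + (1.0 - D + C) * p10)
    upper = w_o / (p10 + w_o)            # independent of N
    return lower, upper
```

As N grows, the lower bound increases toward the constant upper bound at geometric rate p_{11} - p_{01}, in line with Corollary 2 below.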
Numerical results given in [6] have demonstrated the tightness of the bounds: the relative difference between the lower and the upper bounds is within 6% for a wide range of transition probabilities \{p_{i,j}\}. The monotonicity of the difference between the upper and lower bounds with respect to N shows that the performance of the multi-channel opportunistic system improves with the number N of channels, as suggested by intuition. For p_{11} \ge p_{01}, the upper bound gives the limiting performance of the opportunistic system as N \to \infty. In Corollary 2 below, we show that the throughput of an opportunistic system increases to a constant at (at least) geometric rate as N increases. This result conveys an important message regarding system design: the throughput of a multi-channel opportunistic system with single-channel sensing quickly saturates as the number of channels increases; it is thus crucial to enhance radio sensing capability in order to fully exploit the communication opportunities offered by a large number of channels.

Corollary 2: For p_{11} > p_{01}, the lower bound on the throughput U converges to the constant upper bound at geometric rate (p_{11} - p_{01}) as N increases; for p_{11} < p_{01}, the lower bound on U converges to a constant at geometric rate (p_{01} - p_{11})^2.

Proof: See Appendix F.

VII. CONCLUSION AND FUTURE WORK

We have considered an optimal sensing problem that is of fundamental interest in contexts involving opportunistic communications over multiple channels. We have shown that for independent and identically evolving channels, the myopic sensing policy has a simple round-robin structure, which obviates the need to know the exact channel parameters, making it extremely easy to implement in practice. We have proved that the myopic policy is optimal for the two-channel case.
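The two-channel optimality (Theorem 2) can be spot-checked by brute-force dynamic programming over a short horizon: compare the always-greedy policy against an exhaustive maximization over actions at every step. The sketch below uses illustrative parameters and a small horizon; it is a numerical check, not a proof.

```python
def tau(w, p01, p11):
    # Belief update for an unobserved channel: tau(w) = p01 + w*(p11 - p01), eq. (20)
    return p01 + w * (p11 - p01)

def value(w1, w2, t, p01, p11, greedy):
    """Expected total reward over t remaining slots for channel beliefs (w1, w2).
    greedy=True forces the myopic action; greedy=False optimizes over actions."""
    if t == 0:
        return 0.0
    def q(a):
        wa, wo = (w1, w2) if a == 1 else (w2, w1)
        wo_next = tau(wo, p01, p11)
        # Observe 1 w.p. wa (sensed channel's belief becomes p11), else p01.
        if a == 1:
            good = value(p11, wo_next, t - 1, p01, p11, greedy)
            bad = value(p01, wo_next, t - 1, p01, p11, greedy)
        else:
            good = value(wo_next, p11, t - 1, p01, p11, greedy)
            bad = value(wo_next, p01, t - 1, p01, p11, greedy)
        return wa * (1.0 + good) + (1.0 - wa) * bad
    if greedy:
        return q(1) if w1 >= w2 else q(2)
    return max(q(1), q(2))
```

Over a grid of initial beliefs, and in both the positively correlated (p_{11} > p_{01}) and negatively correlated (p_{11} < p_{01}) regimes, the greedy value should match the optimum.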
We have also characterized in closed form the throughput performance of the myopic policy and its scaling behavior with respect to the number of channels. Future directions include sensing policies for non-identical channels and with multi-channel sensing. In a recent work [21], the existence of Whittle's index policy and the closed-form expression of Whittle's index have been obtained, leading to a simple, near-optimal index policy for non-identical channels with multi-channel sensing. Furthermore, it is shown in [21] that the myopic policy is equivalent to Whittle's index policy when channels are identical. The results obtained in this paper on the myopic policy thus also apply to Whittle's index policy. The structure and optimality of the myopic policy are also extended to multichannel sensing in [22]. It is also of interest to consider sensing policies for multiple users competing for communication opportunities in multiple channels. Recent work on extending the myopic sensing policy to multi-user scenarios can be found in [23], [24].

APPENDIX A: PROOF OF THEOREM 1

We prove Theorem 1 by showing that the channel \hat{a}(t) given by (6) and (7) is indeed the channel with the largest belief value in slot t. Specifically, we prove the following lemma.

Lemma 1: Let \hat{a}(t) = i_1 be the channel determined by (6) for p_{11} \ge p_{01} and by (7) for p_{11} < p_{01}. Let K(t) = (i_1, i_2, \cdots, i_N) be the circular order of channels in slot t, where we set the starting point to \hat{a}(t) = i_1. We then have, for any t \ge 1,

\omega_{i_1}(t) \ge \omega_{i_2}(t) \ge \cdots \ge \omega_{i_N}(t),   (19)

i.e., the channel given by (6) and (7) has the largest belief value in every slot t.

To prove Lemma 1, we introduce the operator \tau(\cdot) for the belief update of unobserved channels (see (1)):

\tau(\omega) \triangleq \omega p_{11} + (1 - \omega) p_{01} = p_{01} + \omega (p_{11} - p_{01}).
(20)

Note that \tau(\omega) is an increasing function of \omega for p_{11} > p_{01} and a decreasing function of \omega for p_{11} < p_{01}. Furthermore, we note that the belief value \omega_i(t) of channel i in slot t is bounded between p_{01} and p_{11} for any i and t > 1, and an observed channel achieves either the upper bound or the lower bound of the belief values (see (1)).

We now prove Lemma 1 by induction. For t = 1, (19) holds by the definition of K(1). Assume that (19) is true for slot t, where K(t) = (i_1, i_2, \cdots, i_N) and \hat{a}(t) = i_1. We show that it is also true for slot t + 1.

Consider first p_{11} \ge p_{01}. We have K(t+1) = K(t) = (i_1, i_2, \cdots, i_N). When S_{i_1}(t) = 1, we have \hat{a}(t+1) = \hat{a}(t) = i_1 from (6). Since \omega_{i_1}(t+1) = p_{11} achieves the upper bound of the belief values and the order of the belief values of the unobserved channels remains unchanged due to the monotonically increasing property of \tau(\omega), we arrive at (19) for t + 1. When S_{i_1}(t) = 0, we have \hat{a}(t+1) = i_2 from (6). We again have (19) by noticing that \omega_{i_1}(t+1) = p_{01} achieves the lower bound of the belief values and K(t+1) = (i_2, i_3, \cdots, i_N, i_1) when the starting point is set to \hat{a}(t+1) = i_2.

For p_{11} < p_{01}, K(t+1) = -K(t) = (i_1, i_N, i_{N-1}, \cdots, i_2). When S_{i_1}(t) = 0, we have \hat{a}(t+1) = \hat{a}(t) = i_1 from (7). Since \omega_{i_1}(t+1) = p_{01} achieves the upper bound of the belief values and the order of the belief values of the unobserved channels is reversed due to the monotonically decreasing property of \tau(\omega), we have, from the induction assumption at t,

\omega_{i_1}(t+1) \ge \omega_{i_N}(t+1) \ge \omega_{i_{N-1}}(t+1) \ge \cdots \ge \omega_{i_2}(t+1),

which agrees with (19) for t + 1 and K(t+1) = (i_1, i_N, i_{N-1}, \cdots, i_2). When S_{i_1}(t) = 1, we have \hat{a}(t+1) = i_N from (7).
We again have (19) by noticing that \omega_{i_1}(t+1) = p_{11} achieves the lower bound of the belief values and K(t+1) = (i_N, i_{N-1}, \cdots, i_2, i_1) when the starting point is set to \hat{a}(t+1) = i_N. This concludes the proof of Lemma 1, hence Theorem 1.

APPENDIX B: LAST CHANNEL VISITS AND j-STEP TRANSITION PROBABILITIES

As commented in Sec. IV, another way to see the channel switching structure of the myopic policy is through the last visit to each channel once every channel has been visited at least once. An alternative proof of this structure is based on properties of the j-step transition probabilities p_{01}^{(j)} and p_{11}^{(j)} [25]:

p_{01}^{(j)} = \frac{p_{01} - p_{01}(p_{11} - p_{01})^j}{p_{01} + p_{10}},   (21)

p_{11}^{(j)} = \frac{p_{01} + p_{10}(p_{11} - p_{01})^j}{p_{01} + p_{10}}.   (22)

It is easy to see that for p_{11} > p_{01}, p_{01}^{(j)} monotonically increases to the stationary distribution \omega_o as j increases. For p_{11} < p_{01}, p_{11}^{(j)} oscillates around and converges to \omega_o, with p_{11}^{(j)} > \omega_o for even j and p_{11}^{(j)} < \omega_o for odd j (see Figs. 5 and 6). The channel switching structure thus follows by noticing that channel switching occurs only after observing 0 for p_{11} \ge p_{01} and after observing 1 for p_{11} < p_{01}.

[Fig. 5: The j-step transition probabilities p_{01}^{(j)} of the Gilbert-Elliot channel when p_{11} > p_{01}.]

[Fig. 6: The j-step transition probabilities p_{11}^{(j)} of the Gilbert-Elliot channel when p_{11} < p_{01}.]

APPENDIX C: PROOF OF THEOREM 2

Recall that \hat{V}_t(\Omega) denotes the total expected reward obtained under the myopic policy starting from slot t. Let \hat{V}_t(\Omega; a) denote the total expected reward obtained by action a in slot t followed by the myopic policy in future slots.
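As an aside, the closed forms (21) and (22) can be verified against direct powers of the two-state transition matrix, along with the monotone and oscillating behavior depicted in Figs. 5 and 6; the parameter values below are illustrative.

```python
import numpy as np

def j_step(p01, p11, j):
    """Closed-form j-step transition probabilities p01^(j), p11^(j), eqs. (21)-(22)."""
    p10 = 1.0 - p11
    x = (p11 - p01) ** j
    return ((p01 - p01 * x) / (p01 + p10),     # p01^(j)
            (p01 + p10 * x) / (p01 + p10))     # p11^(j)

def transition_matrix(p01, p11):
    # State 0 = bad, state 1 = good (Gilbert-Elliot channel).
    return np.array([[1.0 - p01, p01],
                     [1.0 - p11, p11]])
```

Since the second eigenvalue of the matrix is p_{11} - p_{01}, the j-step entries relax to \omega_o geometrically, monotonically when p_{11} > p_{01} and with alternating sign when p_{11} < p_{01}.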
We first establish the following lemma, which applies to a general POMDP/MDP.

Lemma 2: For a POMDP over a finite horizon T, the myopic policy is optimal if, for t = 1, \cdots, T,

\hat{V}_t(\Omega) \ge \hat{V}_t(\Omega; a), \quad \forall a, \Omega.   (25)

Lemma 2 can be proved by backward induction. Specifically, the initial condition \hat{V}_T(\Omega) = V_T(\Omega) is straightforward. Assume that \hat{V}_{t+1}(\Omega) = V_{t+1}(\Omega). We then have, from (25),

\hat{V}_t(\Omega) = \max_a \left\{ R_a(\Omega) + \sum_{\Omega'} \Pr[\Omega' | \Omega, a] \hat{V}_{t+1}(\Omega') \right\} = \max_a \left\{ R_a(\Omega) + \sum_{\Omega'} \Pr[\Omega' | \Omega, a] V_{t+1}(\Omega') \right\} = V_t(\Omega),

i.e., the myopic policy is optimal.

We now prove Theorem 2 based on Corollary 1. Considering all channel state realizations in slot t, we have

\hat{V}_t(\Omega; a) = \sum_s \Pr[S(t) = s | \Omega] \hat{V}_t(\Omega; a | S(t) = s) = \omega_a + \sum_s \Pr[S(t) = s | \Omega] \hat{V}_{t+1}(\mathcal{T}(\Omega | a, s_a) | S(t) = s),   (26)

where \hat{V}_{t+1}(\mathcal{T}(\Omega | a, s_a) | S(t) = s) is the conditional reward obtained starting from slot t + 1 given that the system state in slot t is s. The following expansions, which assume p_{01} > p_{11}, are used below:

\hat{V}_t(1 | [1, 0]) = p_{01} + p_{10} p_{00} V_{t+1}(2 | [0, 0]) + p_{10} p_{01} V_{t+1}(2 | [0, 1]) + p_{11} p_{00} V_{t+1}(2 | [1, 0]) + p_{11} p_{01} V_{t+1}(2 | [1, 1]),   (23)

\hat{V}_t(1 | [0, 1]) = p_{01} + p_{00} p_{10} V_{t+1}(1 | [0, 0]) + p_{00} p_{11} V_{t+1}(1 | [0, 1]) + p_{01} p_{10} V_{t+1}(1 | [1, 0]) + p_{01} p_{11} V_{t+1}(1 | [1, 1]).   (24)

From Corollary 1, we have

\hat{V}_t(\mathcal{T}(\Omega | a, s_a) | S(t-1) = s) = \hat{V}_t(\mathcal{T}(\Omega' | a, s_a) | S(t-1) = s),   (27)

i.e., the conditional total expected reward of the myopic policy starting from slot t is determined by the action a in slot t - 1 and independent of the belief vector \Omega in slot t - 1 (note that a(t-1) and S(t-1) determine \vec{S}(t), which determines the reward process).
Adopting the simplified notation \hat{V}_t(a(t-1) | S(t-1) = s), we further have, since the channels are statistically identical,

\hat{V}_t(a(t-1) = 1 | S(t-1) = [s_1, s_2]) = \hat{V}_t(a(t-1) = 2 | S(t-1) = [s_2, s_1]).   (28)

Next we show that

\hat{V}_t(a(t-1) = 1 | S(t-1) = [1, 0]) = \hat{V}_t(a(t-1) = 1 | S(t-1) = [0, 1]).   (29)

Assume that p_{01} > p_{11}. Following the structure of the myopic policy, we know that the myopic action in slot t is \hat{a}(t) = 2 for the left-hand side of (29) and \hat{a}(t) = 1 for the right, which leads to (23) and (24). We then have (29) based on (28). The case of p_{01} < p_{11} can be similarly proved.

Consider \Omega = [\omega_1, \omega_2] with \omega_1 \ge \omega_2. The myopic action is thus a = 1. We now establish (25). From (26) and (28), we have

\hat{V}_t(\Omega; a = 1) = \omega_1 + \sum_{i,j \in \{0,1\}} \Pr[S(t) = [i, j]] \hat{V}_{t+1}(1 | [i, j]),
\hat{V}_t(\Omega; a = 2) = \omega_2 + \sum_{i,j \in \{0,1\}} \Pr[S(t) = [i, j]] \hat{V}_{t+1}(1 | [j, i]).

It thus follows from (29) that

\hat{V}_t(\Omega; a = 1) - \hat{V}_t(\Omega; a = 2) = (\omega_1 - \omega_2)\left(1 + \hat{V}_{t+1}(1 | [1, 0]) - \hat{V}_{t+1}(1 | [0, 1])\right) = \omega_1 - \omega_2 \ge 0.

This concludes the proof.

APPENDIX D: PROOF OF THEOREM 3

Consider first p_{11} \ge p_{01}. Let R = \{r_{i,j}\} denote the transition matrix of \{L_k\}_{k=1}^{\infty}, where r_{i,j} is given in (12). Let R(:, k) denote the k-th column of R. We have

\mathbf{1} - R(:, 1) = \frac{R(:, 2)}{p_{10}}, \quad R(:, k) = R(:, 2) (p_{11})^{k-2},   (30)

where \mathbf{1} is the unit column vector [1, 1, \ldots]^t. By the definition of the stationary distribution, we have, for k = 1, 2, \cdots,

[\lambda_1, \lambda_2, \cdots] R(:, k) = \lambda_k,   (31)

which, combined with (30), leads to

\lambda_1 = 1 - \frac{\lambda_2}{1 - p_{11}}, \quad \lambda_k = \lambda_2 p_{11}^{k-2}.   (32)

Substituting (32) into (31) for k = 2 and solving for \lambda_2, we have \lambda_2 = \bar{\omega} p_{10}, where \bar{\omega} is given in (15).
From (32), we then have the stationary distribution

\lambda_k = \begin{cases} 1 - \bar{\omega}, & k = 1 \\ \bar{\omega} p_{11}^{k-2} p_{10}, & k > 1 \end{cases},   (33)

which leads to (14) based on (11) and \bar{L} = \sum_{k=1}^{\infty} k \lambda_k. The proof for p_{11} < p_{01} is similar, based on the transition probabilities given in (13). Based on Corollary 1, Theorem 3 can also be proved by calculating the stationary distribution of \{\vec{S}(t)\}.

APPENDIX E: PROOF OF THEOREM 4

Case 1: p_{11} \ge p_{01}

Let \omega_k denote the belief value of the chosen channel in the first slot of the k-th TP. The length L_k(\omega_k) of this TP has the following distribution:

\Pr[L_k(\omega_k) = l] = \begin{cases} 1 - \omega_k, & l = 1 \\ \omega_k p_{11}^{l-2} p_{10}, & l > 1 \end{cases}.   (34)

It is easy to see that if \omega' \ge \omega, then L_k(\omega') stochastically dominates L_k(\omega). From the round-robin structure of the myopic policy, \omega_k = p_{01}^{(J_k)}, where J_k = \sum_{i=1}^{N-1} L_{k-i} + 1. Based on the monotonically increasing property of the j-step transition probability p_{01}^{(j)} (see (21) and Fig. 5), we have \omega_k \le \omega_o, where \omega_o is the stationary distribution of the Gilbert-Elliot channel given in (3). L_k(\omega_o) thus stochastically dominates L_k(\omega_k), and the expectation of the former,

\bar{L}_k(\omega_o) = 1 + \frac{\omega_o}{1 - p_{11}},

leads to the upper bound on U given in (17).

Next, we prove the lower bound on U by constructing a hypothetical system where the initial belief value of the chosen channel in a TP is a lower bound of that in the real system. The average TP length in this hypothetical system is thus smaller than that in the real system, leading to a lower bound on U based on (11). Specifically, since \omega_k = p_{01}^{(J_k)} and J_k = \sum_{i=1}^{N-1} L_{k-i} + 1 \ge N + L_{k-1} - 1, we have \omega_k \ge p_{01}^{(N + L_{k-1} - 1)}. We thus construct a hypothetical system given by a first-order Markov chain \{L'_k\}_{k=1}^{\infty} with the following transition probabilities r_{i,j}.
r_{i,j} = \begin{cases} 1 - p_{01}^{(N+i-1)}, & i \ge 1, j = 1 \\ p_{01}^{(N+i-1)} p_{11}^{j-2} p_{10}, & i \ge 1, j \ge 2 \end{cases}.   (35)

It can be shown that the stationary distribution of \{L_k\}_{k=1}^{\infty} stochastically dominates that of the hypothetical system \{L'_k\}_{k=1}^{\infty} (see [6] for details). The latter can be obtained with the same techniques used in Appendix D. The average length of L'_k can thus be calculated, leading to the lower bound given in (17).

Case 2: p_{11} < p_{01}

In this case, the larger the initial belief of the chosen channel in a given TP, the smaller the average length of the TP. On the other hand, (11) shows that U is inversely proportional to the average TP length. Thus, similar to the case of p_{11} \ge p_{01}, we construct hypothetical systems where the initial belief of the chosen channel in a TP is an upper bound or a lower bound of that in the real system. The former leads to an upper bound on U; the latter, to a lower bound on U.

Consider first the upper bound. From the structure of the myopic policy, it is clear that when L_{k-1} is odd, in the k-th TP the user switches to the channel visited in the (k-2)-th TP. As a consequence, the initial belief \omega_k of the k-th TP is given by \omega_k = p_{11}^{(L_{k-1}+1)}. When L_{k-1} is even, we can show that \omega_k \le p_{11}^{(L_{k-1}+4)}. This is because, for N \ge 3 and L_{k-1} even, the user cannot switch to a channel visited L_{k-1} + 2 slots ago, and p_{11}^{(j)} decreases with j for even j, with p_{11}^{(j)} > p_{11}^{(i)} for any even j and odd i (see (22) and Fig. 6). We thus construct a hypothetical system given by the first-order Markov chain \{L'_k\}_{k=1}^{\infty} with the following transition probabilities:

r_{i,j} = \begin{cases} p_{11}^{(i+1)}, & i \text{ odd}, \; j = 1 \\ p_{10}^{(i+1)} p_{00}^{j-2} p_{01}, & i \text{ odd}, \; j \ge 2 \\ p_{11}^{(i+4)}, & i \text{ even}, \; j = 1 \\ p_{10}^{(i+4)} p_{00}^{j-2} p_{01}, & i \text{ even}, \; j \ge 2 \end{cases}.
It can be shown that the stationary distribution of \{L'_k\}_{k=1}^{\infty} is stochastically dominated by that of \{L_k\}_{k=1}^{\infty}. The former leads to the upper bound on U given in (18).

We now consider the lower bound. Similarly, \omega_k = p_{11}^{(L_{k-1}+1)} when L_{k-1} is odd. When L_{k-1} is even, to find a lower bound on \omega_k, we need to find the smallest odd j such that the last visit to the channel chosen in the k-th TP is j slots ago. From the structure of the myopic policy, the smallest feasible odd j is L_{k-1} + 2N - 3, which corresponds to the scenario where all N channels are visited in turn from the (k-N+1)-th TP to the k-th TP with L_{k-N+1} = L_{k-N+2} = \cdots = L_{k-2} = 2. We thus have \omega_k \ge p_{11}^{(L_{k-1}+2N-3)}. We then construct a hypothetical system given by the first-order Markov chain \{L'_k\}_{k=1}^{\infty} with the following transition probabilities:

r_{i,j} = \begin{cases} p_{11}^{(i+1)}, & i \text{ odd}, \; j = 1 \\ p_{10}^{(i+1)} p_{00}^{j-2} p_{01}, & i \text{ odd}, \; j \ge 2 \\ p_{11}^{(i+2N-3)}, & i \text{ even}, \; j = 1 \\ p_{10}^{(i+2N-3)} p_{00}^{j-2} p_{01}, & i \text{ even}, \; j \ge 2 \end{cases}.

The stationary distribution of this hypothetical system leads to the lower bound on U given in (18).

APPENDIX F: PROOF OF COROLLARY 2

Let x = |p_{11} - p_{01}|. For p_{11} > p_{01}, after some simplifications, the lower bound has the form a + b/(x^N + c), where a, b, c (c \ne 0) are constants. The upper bound is a + b/c. We have

\frac{|a + b/(x^N + c) - a - b/c|}{x^N} \to \frac{b}{c^2} \quad \text{as } N \to \infty.

Thus the lower bound converges to the upper bound at geometric rate x. For p_{11} < p_{01}, the lower bound has the form d + e/(x^{2N-1} + f), where d, e, f (f \ne 0) are constants. It converges to d + e/f as N \to \infty. We have

\frac{|d + e/(x^{2N-1} + f) - d - e/f|}{x^{2N}} \to \frac{e}{x f^2} \quad \text{as } N \to \infty.

Thus the lower bound converges at geometric rate x^2.
ACKNOWLEDGMENT

The authors would like to thank the associate editor and anonymous reviewers for their invaluable comments and suggestions.

REFERENCES

[1] R. Knopp and P. Humblet, "Information capacity and power control in single cell multi-user communications," in Proc. Intl. Conf. Comm., Seattle, WA, pp. 331-335, June 1995.
[2] Q. Zhao and B. Sadler, "A survey of dynamic spectrum access," IEEE Signal Processing Magazine, vol. 24, no. 3, pp. 79-89, May 2007.
[3] E. N. Gilbert, "Capacity of burst-noise channels," Bell Syst. Tech. J., vol. 39, pp. 1253-1265, Sept. 1960.
[4] M. Zorzi, R. Rao, and L. Milstein, "Error statistics in data transmission over fading channels," IEEE Trans. Commun., vol. 46, pp. 1468-1477, Nov. 1998.
[5] L. A. Johnston and V. Krishnamurthy, "Opportunistic file transfer over a fading channel: a POMDP search theory formulation with optimal threshold policies," IEEE Trans. Wireless Communications, vol. 5, no. 2, 2006.
[6] K. Liu and Q. Zhao, "Link throughput of multi-channel opportunistic access with limited sensing," Technical Report, Univ. of California, Davis, July 2007, http://www.ece.ucdavis.edu/~qzhao/Report.html.
[7] P. Whittle, "Restless bandits: activity allocation in a changing world," Journal of Applied Probability, vol. 25, 1988.
[8] T. Javidi, B. Krishnamachari, Q. Zhao, and M. Liu, "Optimality of myopic sensing in multi-channel opportunistic access," in Proc. of ICC, May 2008 (an extended version submitted to IEEE Trans. on Information Theory in May 2008).
[9] S. Guha and K. Munagala, "Approximation algorithms for partial-information based stochastic control with Markovian rewards," Proc. 48th IEEE Symposium on Foundations of Computer Science (FOCS), 2007.
[10] D. Bertsimas and J. E.
Niño-Mora, "Restless bandits, linear programming relaxations, and a primal-dual heuristic," Operations Research, 48(1), January-February 2000.
[11] J. C. Gittins, "Bandit processes and dynamic allocation indices," Journal of the Royal Statistical Society, Series B, vol. 41, pp. 148-177, 1979.
[12] C. H. Papadimitriou and J. N. Tsitsiklis, "The complexity of optimal queueing network control," Mathematics of Operations Research, vol. 24, 1999.
[13] R. R. Weber and G. Weiss, "On an index policy for restless bandits," Journal of Applied Probability, 27:637-648, 1990.
[14] Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: a POMDP framework," IEEE Journal on Selected Areas in Communications, vol. 25, no. 3, pp. 589-600, Apr. 2007 (also see Proc. of the First IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks, pp. 224-232, Nov. 2005).
[15] Y. Chen, Q. Zhao, and A. Swami, "Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors," IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 2053-2071, May 2008 (also see Proc. of IEEE Asilomar Conference on Signals, Systems, and Computers, Nov. 2006).
[16] A. Sabharwal, A. Khoshnevis, and E. Knightly, "Opportunistic spectral usage: bounds and a multi-band CSMA/CA protocol," IEEE/ACM Transactions on Networking, pp. 533-545, June 2007.
[17] S. Guha, K. Munagala, and S. Sarkar, "Jointly optimal transmission and probing strategies for multichannel wireless systems," Proc. of Conference on Information Sciences and Systems (CISS), March 2006.
[18] N. Chang and M. Liu, "Optimal channel probing and transmission scheduling for opportunistic spectrum access," Proc. ACM International Conference on Mobile Computing and Networking (MobiCom), September 2007.
[19] M. Agarwal and M. L.
Honig, "Spectrum sharing on a wideband fading channel with limited feedback," Proc. of International Conference on Cognitive Radio Oriented Wireless Networks and Communications (CrownCom), August 2007.
[20] R. Smallwood and E. Sondik, "The optimal control of partially observable Markov processes over a finite horizon," Operations Research, pp. 1071-1088, 1971.
[21] K. Liu and Q. Zhao, "A restless bandit formulation of opportunistic access: indexability and index policy," in Proc. of IEEE Workshop on Networking Technologies for Software Defined Radio (SDR) Networks, June 2008.
[22] K. Liu and Q. Zhao, "Channel probing for opportunistic access with multi-channel sensing," to appear in Proc. of IEEE Asilomar Conference on Signals, Systems, and Computers, October 2008.
[23] H. Liu, B. Krishnamachari, and Q. Zhao, "Cooperation and learning in multiuser opportunistic spectrum access," in Proc. of IEEE Workshop on Towards Cognition in Wireless Networks (CogNet), May 2008.
[24] K. Liu, Q. Zhao, and Y. Chen, "Distributed sensing and access in cognitive radio networks," to appear in Proc. of 10th International Symposium on Spread Spectrum Techniques and Applications (ISSSTA), August 2008.
[25] R. G. Gallager, Discrete Stochastic Processes. Kluwer Academic Publishers, 1995.
