Relay Selection with Partial Information in Wireless Sensor Networks
Our work is motivated by geographical forwarding of sporadic alarm packets to a base station in a wireless sensor network (WSN), where the nodes are sleep-wake cycling periodically and asynchronously. When a node (referred to as the source) gets a pa…
Authors: K.P.Naveen, Anurag Kumar
K. P. Naveen, Student Member, IEEE, and Anurag Kumar, Fellow, IEEE

Abstract

Our work is motivated by geographical forwarding of sporadic alarm packets to a base station in a wireless sensor network (WSN), where the nodes are sleep-wake cycling periodically and asynchronously. When a node (referred to as the source) gets a packet to forward, either by detecting an event or from an upstream node, it has to wait for its neighbors in a forwarding set (referred to as relays) to wake up. Each of the relays is associated with a random reward (e.g., the progress made towards the sink) that is independent and identically distributed (iid). To begin with, the source is uncertain about the number of relays, their wake-up times and the reward values, but knows their distributions. At each relay wake-up instant, when a relay reveals its reward value, the source's problem is to forward the packet or to wait for further relays to wake up. In this setting, we seek to minimize the expected waiting time at the source subject to a lower bound on the average reward. In terms of the operations research literature, our work can be considered as a variant of the asset selling problem. We formulate the relay selection problem as a partially observable Markov decision process (POMDP), where the unknown state is the number of relays. We begin by considering the case where the source knows the number of relays. For the general case, where the source only knows a probability mass function (pmf) on the number of relays, it has to maintain a posterior pmf on the number of relays and forward the packet iff the pmf is in an optimum stopping set. We show that the optimum stopping set is convex and obtain an inner bound to this set. We prove a monotonicity result which yields an outer bound.
The computational complexity of the above policies motivates us to formulate an alternative simplified model, the optimal policy for which is a simple threshold rule. We provide simulation results to compare the performance of the inner and outer bound policies against the simple policy, and against the optimal policy when the source knows the exact number of relays. Observing the simplicity and the good performance of the simple policy, we heuristically employ it for end-to-end packet forwarding at each hop in a multihop WSN of sleep-wake cycling nodes.

Index Terms

Relay selection, wireless sensor networks, sleep-wake cycling, partially observable Markov decision process (POMDP), asset selling problem.

Both the authors are with the Dept. of Electrical Communication Engineering, Indian Institute of Science, Bangalore 560 012, India. Email: {naveenkp, anurag}@ece.iisc.ernet.in

This research was supported in part by a project on Wireless Sensor Networks for Intrusion Detection, funded by DRDO, Government of India, and in part by IFCPAR (Indo-French Center for the Promotion of Advanced Research) (Project 4000-IT-1).

I. INTRODUCTION

We are interested in the problem of packet forwarding in a class of wireless sensor networks (WSNs) in which local inferences based on sensor measurements could result in the generation of occasional "alarm" packets that need to be routed to a base station, where some sort of action could be taken [1], [2], [3]. Such a situation could arise, for example, in a WSN for human intrusion detection or fire detection in a large region. Such WSNs often need to run on batteries or on harvested energy and, hence, must be energy conscious in all their operations. The nodes of such a WSN would be sleep-wake cycling, waking up periodically to perform their tasks.
One approach for the forwarding problem is to use a distributed algorithm to schedule the sleep-wake cycles of the nodes such that the delay of a packet from its source to the sink on a multihop path is minimized [2], [4]. An organizational phase is required for such algorithms, which increases the protocol overhead; moreover, the scheduling algorithm has to be rerun periodically since the clocks at different nodes drift at different rates (so that the previously computed schedule would have become stale after a long operation time). For a survey of routing techniques in wireless sensor and ad hoc networks and their classification, see [5], [6].

In this paper we are concerned with the sleep-wake cycling approach that permits the nodes to wake up independently of each other even though each node is waking up periodically, i.e., asynchronous periodic sleep-wake cycling [7], [1]. In fact, given the need for a long network lifetime, nodes are more likely to be sleeping than awake. In such a situation, when a node has a packet to forward, it has to wait for its neighbors to wake up. When a neighbor node wakes up, the forwarding node can evaluate it for its use as a relay, e.g., in terms of the progress it makes towards the destination node, the quality of the channel to the relay, the energy level of the relay, etc. (see [8], [9] for different routing metrics based on the above mentioned quantities). We think of this as a reward offered by the potential relay. The end-to-end network objective is to minimize the average total delay subject to a lower bound on some measure of total reward along the end-to-end path. In this paper we address this end-to-end objective by considering optimal strategies at each hop. When a node gets a packet to forward, it has to make decisions based only on the activities in its neighborhood.
Waiting for all potential relays to wake up and choosing the one with the best reward maximizes the reward at each hop, but increases the forwarding delay. On the other hand, forwarding to the first relay to wake up may result in the loss of the opportunity of choosing a node with a better reward. Hence, at each hop, there is a trade-off between the one-hop delay and the one-hop reward. By solving the one-hop problem of minimizing the average delay subject to a constraint on the average reward, we expect to capture the trade-off between the end-to-end metrics. For instance, suppose the end-to-end objective is to minimize the expected end-to-end delivery delay subject to an upper bound on the expected number of hops in the path, the motivation for this constraint being that traversing more hops entails a greater expenditure of energy in the network. In our approach, we would heuristically address this problem by considering at each hop the problem of minimizing the mean forwarding delay subject to a lower bound on the progress made towards the sink. Greater progress at each hop entails greater delay per hop, while reducing the number of hops it takes a packet to reach the sink.

The local problem setting is the following. Somewhere in the network a node has just received a packet to forward; for the local problem we refer to this forwarding node as the source and think of the time at which it gets the packet as 0. There is an unknown number of relays in the forwarding set of the source. In the geographical forwarding context, this lack of information on the number of relays could model the fact that the neighborhood of a forwarding node could vary over time due, for example, to node failures, variation in channel conditions, or (in a mobile network) the entry or exit of mobile relays.
However, we assume that the number of relays is bounded by a known number $K$, and the source has an initial probability mass function (pmf), over $\{1, \cdots, K\}$, on the number of potential relays. The source desires to forward the packet within the interval $[0, T]$, while knowing that the relays wake up independently and uniformly over $[0, T]$ and that the rewards they offer are independent and identically distributed (iid). We will formally introduce our model in Section II. Next we discuss related work and highlight our contributions.

A. Related Work

Here we provide a summary of related literature in the context of geographical forwarding and channel selection. Since our problem also belongs to the class of asset selling problems studied in the operations research literature, we survey related work from there as well.

Geographical forwarding problems: In our prior work [7] we have considered a simple model where the number of relays is a constant which is known to the source. There the reward is simply the progress made by a relay node towards the sink. In the current work we have generalized our earlier model by allowing the number of relays to be not known to the source. Also, here we allow a general reward structure.

There has been other work in the context of geographical forwarding and anycast routing, where the problem of choosing one among several neighboring nodes arises. Zorzi and Rao [10] consider a scenario of geographical forwarding in a wireless mesh network in which the nodes know their locations, and are sleep-wake cycling. They propose GeRaF (Geographical Random Forwarding), a distributed relaying algorithm, whose objective is to carry a packet to its destination in as few hops as possible, by making as large a progress as possible at each relaying stage.
For their algorithm, the authors obtain the average number of hops (for a given source-sink distance) as a function of the node density. These authors do not consider the trade-off between the relay selection delay and the reward gained by selecting a relay, which is a major contribution of our work.

Liu et al. [11] propose a relay selection approach as a part of CMAC, a protocol for geographical packet forwarding. With respect to the fixed sink, a node $i$ has a forwarding set consisting of all nodes that make progress greater than $r_0$ (an algorithm parameter). If $Y$ represents the delay until the first wake-up instant of a node in the forwarding set, and $X$ is the corresponding progress made, then, under CMAC, node $i$ chooses an $r_0$ that minimizes the expected normalized latency $E[Y/X]$. The Random Asynchronous Wakeup (RAW) protocol [12] also considers transmitting to the first node to wake up that makes a progress greater than a threshold. Interestingly, this is the structure of the optimal policy for our simplified model in [7]. For the sake of completeness we have described the simplified model in this paper as well (see Section VI). Thus we have provided analytical support for using such a threshold policy.

Kim et al. [1] consider a dense WSN. Just like the motivation for our model, an occasional alarm packet needs to be sent, from wherever in the network it is generated, to the sink. The authors develop an optimal anycast scheme to minimize the average end-to-end delay from any node $i$ to the sink when each node $i$ wakes up asynchronously with rate $r_i$. They show that periodic wake-up patterns obtain minimum delay among all sleep-wake patterns with the same rate. They propose an algorithm called LOCAL-OPT [13] which yields, for each node $i$, a threshold $h_j^{(i)}$ for each of its neighbors $j$.
If the time at which neighbor $j$ wakes up is less than $h_j^{(i)}$, then $i$ will transmit to $j$. Otherwise $j$ will go back to sleep and $i$ will continue waiting for further neighbors. A key drawback is that a configuration phase is required to run the LOCAL-OPT algorithm.

Rossi et al. [14] consider the problem where a node $i$, with a packet to forward and which is $n$ hops away from the sink, has to choose between two of its shortlisted neighbors. The first shortlisted neighbor is the one with the least cost among all others with hop count $n - 1$ (one less than node $i$). The second one is the least cost node among all its neighbors with hop count $n$ (same as that of node $i$). Though the first node is on the shortest path, sometimes when its cost is high, it may not be the best option. It turns out that it is optimal to choose one node over the other by comparing the cost difference with a threshold. The threshold depends on the cost distribution of the nodes which are two hops away from node $i$. Here there is no notion of sleep-wake cycling, so that all the neighbor costs are known when node $i$ gets a packet to forward. The problem is that of one-shot decision making. In our problem a neighbor's cost will become available only after it wakes up, at which instant node $i$ has to take a decision regarding forwarding. Hence, ours is a sequential decision problem.

Channel selection problems: Akin to the relay selection problem is the problem of channel selection. The authors in [15], [16] consider a model where there are several channels available to choose from. The transmitter has to probe the channels to learn their quality. Probing many channels yields one with a good gain but reduces the effective time for transmission within the channel coherence period. The problem is to obtain optimal strategies to decide when to stop probing and to transmit.
Here the number of channels is known and all the channels are available at the very beginning of the decision process. In our problem the number of relays is not known, and the relays become available at random times.

Asset selling problems: The basic asset selling problem [17], [18] comprises $N$ offers that arrive sequentially over discrete time slots. The offers are iid. As the offers arrive, the seller has to decide whether to take an offer or wait for future offers. The seller has to pay a cost to observe the next offer. Previous offers cannot be recalled. The decision process ends with the seller choosing an offer. Over the years, several variants of the basic problem have been studied, both with and without recalling the previous offers. Recently Kang [19] has considered a model where a cost has to be paid to recall the previous best offer. Further, the previous best offer can be lost at the next time instant with some probability. See [19] for further references to literature on models with uncertain recall. In [20], the authors consider a model in which the offers arrive at the points of a renewal process. Additional literature on such work can be found in [20]. In these models, the number of potential offers is either known or infinite. In [21], a variant is studied in which the asset selling process can reach a deadline in the next slot with some fixed probability, provided that the process has proceeded up to the present slot.

In our work the number of offers (i.e., relays) is not known. Also, the successive instants at which the offers arrive are the order statistics of an unknown number of iid uniform random variables over an interval $[0, T]$. After observing a relay, the probability that there are no more relays to go (which is the probability that the present stage is the last one) is not fixed.
This probability has to be updated depending on the previous such probabilities and the inter-wake-up times between the successive relays. Although our problem falls in the class of asset selling problems, to the best of our knowledge the particular setting we have considered in this paper has not been studied before.

B. Our Contributions

With the number of relays being unknown, the natural approach is to formulate the problem as a partially observed Markov decision process (POMDP). A POMDP is a generalization of an MDP, where at each stage the actual internal state of the system is not available to the controller. Instead, the controller can observe a value from an observation space. The observation probabilistically depends on the current actual state and the previous action. In some cases, a POMDP can be converted to an equivalent MDP by regarding a belief (i.e., a probability distribution) on the state space as the state of the equivalent MDP. For a survey of POMDPs see [22]. It is clear that, even if the actual state space is finite, the belief space is uncountable. There are several algorithms available to obtain the optimal policy when the actual state space is finite [23], starting from the seminal work by Smallwood and Sondik [24]. When the number of states is large, these algorithms are computationally intensive. In general, it is not easy to obtain an optimal policy for a POMDP. In the current work, we have characterized the optimal policy in terms of an optimum stopping set. We have made use of the convexity results in [25] and some properties specific to our problem to obtain an inner bound on the optimum stopping set. We prove a simple monotonicity result to obtain an outer bound.
In summary, the following are the main contributions of our work:

• We formulate the problem of relay selection with partial information as a finite horizon partially observable Markov decision process (POMDP), with the unknown state being the actual number of relays (Section III). The posterior pmf on the number of relays is shown to be a sufficient decision statistic.

• We first consider the completely observable MDP (COMDP) version of the problem where the source knows the number of relays with probability one (wp1) (Section IV). The optimal policy is characterized by a sequence of threshold functions.

• For the POMDP, at each stage the optimum stopping set is the set of all pmfs on the number of relays where it is optimal to stop (Section V). We prove that this set is convex (Section V-A), and provide an inner bound (subset) for it (Section V-B). We prove a monotonicity result and obtain an outer bound (superset, Section V-C). The threshold functions obtained in the COMDP version are used in the design of the bounds. These threshold functions need to be obtained recursively, which is, in general, computationally intensive.

• The complexity of the above policies motivates us to consider a simplified model (Section VI). We prove that the optimal policy for this simplified model is a simple threshold rule.

• Through simulations (Section VII-A) we compare the performance of the various policies with that of the optimal COMDP policy. The inner bound policy performs slightly better than the outer bound policy. The simple policy obtained from the simplified model performs very close to the inner bound. Also, we show the poor performance of a naive policy that assumes the actual number of relays to be simply the expected number.
• Finally, as a heuristic for the end-to-end problem in the geographical forwarding context, we apply the simple policy at each hop and study the end-to-end performance by simulation (Section VII-B). We find that it is possible to trade off between the expected end-to-end delay and the expected number of hops by tuning a parameter.

For ease of presentation, in the main sections we only provide an outline of the proof for most of the lemmas, followed by a brief description. Formal proofs are available in Appendices I, II and III. Appendix IV contains additional simulation results.

II. SYSTEM MODEL

We consider the one-stage problem in which a node in the network receives a packet to forward. We call this node the "source" and the nodes that it could potentially forward the packet to are called "relays". The local problem is taken to start at time 0. Thus at time 0, the source node has a packet to forward to a sink but needs a relay node to accomplish this task. There is a nonempty set of $N$ relay nodes, labeled by the indices $1, 2, \cdots, N$. $N$ is a random variable bounded above by $K$, a system parameter that is known to the source node, i.e., the support of $N$ is $\{1, 2, \cdots, K\}$. The source does not know $N$, but knows the bound $K$, and a pmf $p_0$ on $\{1, 2, \cdots, K\}$, which is the initial pmf of $N$. A relay node $i$, $1 \le i \le N$, becomes available to the source at the instant $T_i$. The source knows that the instants $\{T_i\}$ are iid uniformly distributed on $(0, T)$. Observe that this would be the case if the wake-up instants of all the nodes in the network are periodic with period $T$, if these (periodic) renewal processes are stationary and independent, and if the forwarding node's decision instants are stopping times w.r.t. these wake-up time processes [26]. We call $T_i$ the wake-up instant of relay $i$.
If the source forwards the packet to the relay $i$, then a reward of $R_i$ is accrued. The rewards $R_i$, $i = 1, 2, \cdots, N$, are iid random variables with pdf $f_R$. The support of $f_R$ is $[0, R]$. The source knows this statistical characterisation of the rewards, and also that the $\{R_i\}$ are independent of the wake-up instants $\{T_i\}$. When a relay wakes up at $T_i$ and reveals its reward $R_i$, the source has to decide whether to transmit to relay $i$ or to wait for further relays. If the source decides to wait, then it instructs the relay with the best reward to stay awake, while letting the rest go back to sleep. This way the source can always forward to a relay with the best reward among those that have woken up so far.

Given that $N = n$ (throughout this discussion we will focus on the event $(N = n)$), let $W_1, W_2, \cdots, W_n$ represent the order statistics of $T_1, T_2, \cdots, T_n$, i.e., the $\{W_k\}$ sequence is the $\{T_i\}$ sequence sorted in increasing order. The pdf of the $k$th ($k \le n$) order statistic [27, Chapter 2] is, for $0 < u < T$,

$$f_{W_k \mid N}(u \mid n) = \frac{n!\, u^{k-1} (T-u)^{n-k}}{(k-1)!\,(n-k)!\, T^n}. \tag{1}$$

Also the joint pdf of the $k$th and the $\ell$th order statistic (for $k < \ell \le n$) is, for $0 < u \le v < T$,

$$f_{W_k, W_\ell \mid N}(u, v \mid n) = \frac{n!\, u^{k-1} (v-u)^{\ell-k-1} (T-v)^{n-\ell}}{(k-1)!\,(\ell-k-1)!\,(n-\ell)!\, T^n}. \tag{2}$$

Using the above expressions, we can write down the conditional pdf $f_{W_{k+\ell} \mid W_k, N}$ (for $1 \le \ell \le n-k$) as, for $0 < w < T$ and $0 \le u < T - w$,

$$f_{W_{k+\ell} \mid W_k, N}(w+u \mid w, n) = \frac{f_{W_k, W_{k+\ell} \mid N}(w, w+u \mid n)}{f_{W_k \mid N}(w \mid n)} = \frac{(n-k)!\, u^{\ell-1} \big((T-w)-u\big)^{(n-k)-\ell}}{(\ell-1)!\,\big((n-k)-\ell\big)!\,(T-w)^{(n-k)}}. \tag{3}$$

Comparing (3) with (1), as expected, we observe that, given $N = n$, the pdf of the wake-up instant of the $(k+\ell)$th node, conditioned on the wake-up instant of the $k$th node, is that of the $\ell$th order statistic of $(n-k)$ iid random variables that are uniform on the remaining time $(T-w)$.

Let $W_0 = 0$ and define $U_k = W_k - W_{k-1}$ for $k = 1, 2, \cdots, n$. The $U_k$ are the inter-wake-up times between consecutive nodes (see Fig. 1). Later we will be interested in the conditional pdf $f_{U_{k+1} \mid W_k, N}$ for $k = 0, 1, \cdots, n-1$, which is given by, for $0 < w < T$ and $0 \le u < T - w$,

$$f_{U_{k+1} \mid W_k, N}(u \mid w, n) = f_{W_{k+1} \mid W_k, N}(w+u \mid w, n) = \frac{(n-k)(T-w-u)^{n-k-1}}{(T-w)^{n-k}}. \tag{4}$$

The conditional expectation is given by

$$E[U_{k+1} \mid W_k = w, N = n] = \frac{T-w}{n-k+1}, \tag{5}$$

which is simply the expected value of the minimum of $n-k$ random variables ($n-k$ is the remaining number of relays), each of which is iid uniform on the interval $[0, T-w)$ ($T-w$ is the remaining time).

Definition 1: For notational simplicity we define

$$f_k(u \mid w, n) := f_{U_{k+1} \mid W_k, N}(u \mid w, n),$$
$$E_k[\,\cdot \mid w, n] := E[\,\cdot \mid W_k = w, N = n].$$

Note that $f_k(\cdot \mid w, n)$ depends on $n$ and $k$ through the difference $n-k$ and depends on $w$ through $T-w$.

Since the reward sequence $R_1, R_2, \cdots, R_n$ is iid and independent of the wake-up instants $T_1, T_2, \cdots, T_n$, we write $(W_k, R_k)$ as the pairs of ordered wake-up instants and the corresponding rewards. Evidently, $f_{R_{k+1} \mid W_k, N}(r \mid w, n) = f_R(r)$ for $k = 0, 1, \cdots, n-1$. Further we define (when $N = n$) $W_{n+1} := T$, $U_{n+1} := (T - W_n)$ and $R_{n+1} := 0$. Also $E_n[U_{n+1} \mid w, n] := T - w$. All these variables are depicted in Fig. 1. We end this section by listing, in Table I, most of the symbols that appear in the paper with a brief description of each.
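As a sanity check on (4) and (5), the following minimal Python sketch evaluates the conditional density of the next inter-wake-up time and compares the closed-form mean with a Monte Carlo estimate. The function names (`f_k`, `mean_gap_exact`, `mean_gap_monte_carlo`) and the numerical values are illustrative assumptions and are not part of the paper; the simulation simply samples the minimum of $n-k$ iid uniforms on $[0, T-w)$, as described in the text.

```python
import random


def f_k(u, w, n, k, T):
    """Conditional pdf (4) of U_{k+1} given W_k = w and N = n (requires k < n)."""
    m = n - k        # number of relays that have not yet woken up
    rem = T - w      # time remaining in the period
    if m < 1 or not (0.0 <= u < rem):
        return 0.0   # outside the support of the density
    return m * (rem - u) ** (m - 1) / rem ** m


def mean_gap_exact(w, n, k, T):
    """Closed-form conditional mean (5): (T - w) / (n - k + 1)."""
    return (T - w) / (n - k + 1)


def mean_gap_monte_carlo(w, n, k, T, trials=50_000, seed=0):
    """Estimate E[U_{k+1} | W_k = w, N = n] by sampling the minimum of
    n - k iid uniform random variables on the remaining time T - w."""
    rng = random.Random(seed)
    m, rem = n - k, T - w
    total = 0.0
    for _ in range(trials):
        total += min(rng.uniform(0.0, rem) for _ in range(m))
    return total / trials


if __name__ == "__main__":
    # Hypothetical numbers: T = 1, five relays, two already awake, W_2 = 0.3.
    print(mean_gap_exact(0.3, 5, 2, 1.0))
    print(mean_gap_monte_carlo(0.3, 5, 2, 1.0))
```

The two printed values should agree up to sampling noise, which illustrates that the density depends on $n$ and $k$ only through $n-k$ and on $w$ only through $T-w$.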
Fig. 1. There are $N = n$ relays. $(W_k, R_k)$ represents the wake-up instant and reward, respectively, of the $k$th relay. These are shown as points in $[0, T] \times [0, R]$. $U_k$ are the inter-wake-up times. Note that $W_{n+1} = T$, $R_{n+1} = 0$ and $U_{n+1} = T - W_n$.

III. THE SEQUENTIAL DECISION PROBLEM

For the model set up in Section II, we now consider the following sequential decision problem. At each instant that a relay wakes up, i.e., $W_1, W_2, \cdots$, the source has to make the decision to forward the packet, or to hold the packet until the next wake-up instant. Since the number of available relays, $N$, is unknown, we have a decision problem with partial information. We will show how the problem can be set up in the framework of a partially observable Markov decision process (POMDP) [22], [28, Chapter 5].

A. Actions, State Space, and State Transition

Actions: We assume that the time instants at which the relays wake up, i.e., $W_1, W_2, \cdots$, constitute the decision instants or stages¹. At each decision instant, there are two actions possible at the source, denoted 0 and 1, where

• 0 represents the action to continue waiting for more relays to wake up, and

• 1 represents the action to stop and forward the packet to the relay that provides the best reward among those that have woken up to the current decision epoch.

Since there can be at most $K$ relays, the total number of decision instants is $K$. The decision process technically ends at the first instant $W_k$ at which the source chooses action 1, in which case we assume that all the subsequent decision instants, $k+1, \cdots, K$, occur at $W_k$. In cases where the source ends up waiting until time $T$ (referring to Fig. 1, this is possible if, even at $W_n$, the source decides to continue, not realizing that it has seen all the relays there are in its forwarding set), all the subsequent decision instants are assumed to occur at $T$.

¹ A better choice for the decision instants may be to allow the source to take a decision at any time $t \in (0, T]$. When $N$ is known to the source it can be argued that it is optimal to take decisions only at relay wake-up instants. However, this may not hold for our case where $N$ is unknown. In this paper we proceed with our restriction on the decision instants and consider the general case as a topic for future work.

TABLE I
LIST OF MATHEMATICAL NOTATION.

| Symbol | Description |
| --- | --- |
| $\langle a, b \rangle$ | Inner product of vectors $a$ and $b$ |
| $a_k^{\ell}(w, b)$, $b_k^{\ell}(w, b)$ | Thresholds lying on the line joining $p_k^{(k)}$ and $p_k^{(k+\ell)}$ of the simplex $\mathcal{P}_k$; used in the construction of the inner and outer bounds, respectively |
| $B_k$ | Best reward so far, i.e., $B_k = \max\{R_1, \cdots, R_k\}$ |
| $c_k(p, w, b)$ | Average cost of continuing at stage $k$ when the state is $(p, w, b)$ |
| $\mathcal{C}_k(w, b)$ | Optimum stopping set at stage $k$ when $(W_k, B_k) = (w, b)$ |
| $\underline{\mathcal{C}}_k(w, b)$ | Inner bound for the stopping set $\mathcal{C}_k(w, b)$ |
| $\overline{\mathcal{C}}_k(w, b)$ | Outer bound for the stopping set $\mathcal{C}_k(w, b)$ |
| $\mathcal{C}_{1step}$ | One-step-stopping set for the simplified model |
| $E_k[\,\cdot \mid w, n]$ | Expectation conditioned on $(W_k, N) = (w, n)$ |
| $f_k(\cdot \mid w, n)$ | pdf of $U_{k+1}$ conditioned on $(W_k, N) = (w, n)$ |
| $f_R(\cdot)$ | pdf of the iid rewards $\{R_k\}$ |
| $J_k(p, w, b)$ | Optimal cost-to-go function at stage $k$ when the state is $(p, w, b)$ |
| $K$ | Bound on the number of relays |
| $N$ | Number of relays; random variable taking values from $\{1, 2, \cdots, K\}$ |
| $\tilde{N}$ | Number of relays in the simplified model; a constant |
| $\mathbb{P}(A)$ | Probability of an event $A$ |
| $\mathcal{P}_k$ | Set of all pmfs on the set $\{k, k+1, \cdots, K\}$ |
| $(p, w, b)$ | Represents a typical state at stage $k$ where $p \in \mathcal{P}_k$ is the belief state and $(W_k, B_k) = (w, b)$ |
| $p_k^{(n)}$ | A corner point in $\mathcal{P}_k$, i.e., $p_k^{(n)}(n) = 1$ |
| $R_k$ | Reward of the $k$th relay |
| $U_{k+1}$ | Inter-wake-up time between the $(k+1)$th and $k$th relay, i.e., $U_{k+1} = W_{k+1} - W_k$ |
| $W_k$ | Wake-up instant of the $k$th relay |
| $\tilde{W}_k$, $\tilde{R}_k$, $\tilde{U}_{k+1}$ | Quantities, analogous to the ones in the exact model, for the simplified model |
| $\alpha$ | Threshold obtained from the simplified model |
| $\gamma$ | Reward constraint for the problem in (11) |
| $\delta_{n-k}(w, b)$ | When $p \in \mathcal{P}_k$ is such that $p(k) + p(n) = 1$ then it is optimal to stop iff $p(n) \le \delta_{n-k}(w, b)$ |
| $\eta$ | Lagrange multiplier, see (12) |
| $-\eta b$ | Average cost of stopping at stage $k$ when $B_k = b$ |
| $\tau_{k+1}(p, w, u)$ | Belief transition function; $\tau_{k+1}(p, w, u)$ is a pmf in $\mathcal{P}_{k+1}$ for a given $p \in \mathcal{P}_k$, $W_k = w$ and $U_{k+1} = u$ |
| $\phi_{n-k}(w, b)$ | Threshold obtained from the COMDP version of the problem; if the source knows wp1 that $N = n$, then at some stage $k \le n$ with $(W_k, B_k) = (w, b)$ it is optimal to stop iff $b \ge \phi_{n-k}(w, b)$ |

State Space: At stage 0 the state space is simply

$$\mathcal{S}^a_0 = \big\{(n, 0, 0) : 1 \le n \le K\big\}$$

and the only action possible is 0, where $a$ in the superscript signifies that $\mathcal{S}^a_0$ is the set of actual internal states of the system. The state space at stage 1 is

$$\mathcal{S}^a_1 = \big\{(n, w, b) : 1 \le n \le K,\ w \in (0, T),\ b \in [0, R]\big\}$$

and for stages $k = 2, 3, \cdots, K$ it is

$$\mathcal{S}^a_k = \big\{(n, w, b) : k \le n \le K,\ w \in (0, T),\ b \in [0, R]\big\} \cup \big\{(k-1, T, b) : b \in [0, R]\big\} \cup \{\psi\} = \mathcal{S}^a_k(1) \cup \mathcal{S}^a_k(2) \cup \mathcal{S}^a_k(3). \tag{6}$$

Thus the state space at stage $k = 2, 3, \cdots, K$ is written as the union of three sets. The physical meanings of these sets are as follows:

• $\mathcal{S}^a_k(1)$: $n$ in the state triple $(n, w, b)$ represents the actual number of relays. The states in this set correspond to the case where there are at least $k$ relays, i.e., $n$ satisfies $k \le n \le K$.
In the pair $(w, b)$, $w$ is the wake-up instant ($W_k$) of the $k$th relay, and $b$ is the best reward ($B_k = \max\{R_1, \cdots, R_k\}$) among the relays seen so far. The same remark holds for the states in $\mathcal{S}^a_1$. Stage 0 begins at time 0 with 0 reward. Hence the states in $\mathcal{S}^a_0$ are of the form $(n, 0, 0)$.

• $\mathcal{S}^a_k(2)$: Suppose there were $k-1$ relays and, at stage $k-1$, the source decides to continue. Note that it is possible for the source to take such a decision, since it does not know the number of relays. In such a case, the source ends up waiting until time $T$ and enters stage $k$. Hence the states in this set are of the form $(k-1, T, b)$ where $b$ represents the best reward among all the $k-1$ relays ($B_{k-1}$).

• $\mathcal{S}^a_k(3)$: $\psi$ is the terminating state. The state at stage $k$ will be $\psi$ if the source has already forwarded the packet at an earlier stage.

State Transition: If the state at stage $k$ is $\psi$ (i.e., the source has already forwarded the packet) then the next state is always $\psi$. Suppose $(n, w, b) \in \mathcal{S}^a_k$ is the state at some stage $k$, $0 \le k \le K-1$, and $a_k \in \{0, 1\}$ represents the action taken. If $a_k = 1$ then the decision process stops and we regard the system as entering the termination state $\psi$, so that the state at all the subsequent stages, $k+1, \cdots, K$, is $\psi$. The source will also terminate the decision process, knowing that the relays wake up within the interval $(0, T)$, if it has waited for a duration of $T$. This means that $(n, w, b) \in \mathcal{S}^a_k(2)$, i.e., $n = k-1$ and $w = T$. On the other hand, if $(n, w, b) \in \mathcal{S}^a_k(1)$ and $a_k = 0$, the source waits for a random duration of $U_{k+1}$ and encounters a relay with a random reward of $R_{k+1}$, so that the next state is $(n, w + U_{k+1}, \max\{b, R_{k+1}\})$. Note that if $n = k$, i.e., the current relay is the last one, then since we have defined $U_{k+1} = T - w$ and $R_{k+1} = 0$, the next state will be of the form $(k, T, b)$.
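The transition rule just described can be sketched in Python as follows. This is only an illustrative sketch: the names `State`, `PSI` and `next_state`, and the choice to pass the sampled gap and reward in as arguments, are assumptions made here and do not appear in the paper.

```python
from typing import NamedTuple, Union

PSI = "psi"  # stands in for the terminating state


class State(NamedTuple):
    n: int    # actual number of relays (hidden from the source)
    w: float  # wake-up instant of the current relay, or T
    b: float  # best reward seen so far


def next_state(s: Union[State, str], action: int, k: int, T: float,
               gap: float, reward: float) -> Union[State, str]:
    """Actual-state transition from stage k to stage k + 1.

    `gap` and `reward` play the roles of U_{k+1} and R_{k+1}; the caller is
    assumed to have sampled them. They are ignored when the current relay is
    the last one, where the text sets U_{k+1} = T - w and R_{k+1} = 0.
    """
    # Already terminated, source chose to stop, or source has waited until T.
    if s == PSI or action == 1 or s.w >= T:
        return PSI
    # Current relay is the last one: the source waits in vain until T.
    if s.n == k:
        return State(s.n, T, s.b)
    # Otherwise a new relay wakes up after `gap` and offers `reward`.
    return State(s.n, s.w + gap, max(s.b, reward))


if __name__ == "__main__":
    # Hypothetical walk-through with T = 1 and two relays.
    s = State(n=2, w=0.0, b=0.0)
    s = next_state(s, 0, 0, 1.0, gap=0.25, reward=0.4)  # first relay wakes up
    s = next_state(s, 0, 1, 1.0, gap=0.5, reward=0.9)   # second relay wakes up
    s = next_state(s, 0, 2, 1.0, gap=0.0, reward=0.0)   # no relays left
    print(s)
```

In the walk-through the source keeps choosing action 0, so after the second (last) relay it ends up at a state of the form $(k, T, b)$, from which the only possible next state is $\psi$.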
Thus the state at stage $k+1$ can be written down as
$$s_{k+1} = \begin{cases} \psi & \text{if } w = T \text{ and/or } a_k = 1, \\ \big(n,\; w + U_{k+1},\; \max\{b, R_{k+1}\}\big) & \text{otherwise.} \end{cases} \tag{7}$$

B. Belief State and Belief State Transition

Since the source does not know the actual number of relays $N$, the state is only partially observable. The source takes decisions based on the entire history of the wake-up instants and the best rewards. If the source has not forwarded the packet until stage $k-1$, then define $I_k = (p_0, (w_1, b_1), \cdots, (w_k, b_k))$ to be the information vector available at the source when the $k$th relay wakes up. Here $w_1, \cdots, w_k$ are the wake-up instants of the relays waking up at stages $1, \cdots, k$, and $b_1, \cdots, b_k$ are the corresponding best rewards. Define $p_k$ to be the belief state about $N$ at stage $k$ given the information vector $I_k$, i.e., $p_k(n) = P(N = n \mid I_k)$ for $n = k, k+1, \cdots, K$ (note that $p_k(k)$ is the probability that the $k$th relay is the last one). Thus, $p_k$ is a pmf in the $K-k$ dimensional probability simplex. Let us denote this simplex as $\mathcal{P}_k$.

Definition 2: For $k = 1, 2, \cdots, K$, let $\mathcal{P}_k :=$ set of all pmfs on the set $\{k, k+1, \cdots, K\}$. $\mathcal{P}_k$ is the $K-k$ dimensional probability simplex in $\Re^{K-k+1}$.

The “observation” $(w_k, b_k)$ at stage $k$ is a part of the actual state $(n, w_k, b_k)$. For a general POMDP problem the observation can belong to a completely different space than the actual state space. Moreover, the distribution of the observation at any stage can in general depend on all the previous states, observations, actions and disturbances. If this distribution depends only on the state, action and disturbance of the immediately preceding stage, then a belief on the actual state given the entire history turns out to be sufficient for taking decisions [28, Chapter 5].
For our case, this condition is met and hence at stage $k$, $(p_k, w_k, b_k)$ is a sufficient statistic for taking a decision. Therefore we modify the state space as $\mathcal{S}_0 = \{(p, 0, 0) : p \in \mathcal{P}_1\}$ and, for $k = 1, 2, \cdots, K$,
$$\mathcal{S}_k = \big\{(p, w, b) : p \in \mathcal{P}_k,\; w \in (0, T],\; b \in [0, \overline{R}]\big\} \cup \{\psi\}. \tag{8}$$

After seeing $k$ relays, suppose the source chooses not to forward the packet; then, upon the next relay waking up (if any), the source needs to update its belief about the number of relays. Formally, if $(p, w, b) \in \mathcal{S}_k$ is the state at stage $k$ and $w + u$ is the wake-up instant of the next relay then, using Bayes rule, the next belief state can be obtained via the following belief state transition function, which yields a pmf in $\mathcal{P}_{k+1}$:
$$\tau_{k+1}(p, w, u)(n) = \frac{p(n)\, f_k(u \mid w, n)}{\sum_{\ell = k+1}^{K} p(\ell)\, f_k(u \mid w, \ell)} \tag{9}$$
for $n = k+1, \cdots, K$. Note that this function does not depend on $b$. Thus, if at stage $k \in \{0, 1, \cdots, K-1\}$ the state is $(p, w, b) \in \mathcal{S}_k$, then the next state is
$$s_{k+1} = \begin{cases} \psi & \text{if } w = T \text{ and/or } a_k = 1, \\ \big(\tau_{k+1}(p, w, U_{k+1}),\; w + U_{k+1},\; \max\{b, R_{k+1}\}\big) & \text{otherwise,} \end{cases} \tag{10}$$
where $U_{k+1}$ is the random delay until the next relay wakes up and $R_{k+1}$ is the random reward offered by that relay. The explanation for the above belief state transition expression remains the same as that of the actual state transition in (7), except that if the action is to continue, then the source needs to update its belief about the number of relays. Suppose that at stage $k$ the actual number of relays happens to be $k$ and the action is to continue (which is possible since the source does not know the actual number); then the source will end up waiting until time $T$ and will then transmit to the relay with the best reward.

C. Stopping Rules and the Optimization Problem

As the relays wake up, the source's problem is to decide whether to stop or to continue waiting for further relays.
A stopping rule or a policy $\pi$ is a sequence of mappings $(\mu_1, \cdots, \mu_K)$ where $\mu_k : \mathcal{S}_k \to \{0, 1\}$. Let $\Pi$ represent the set of all policies. The delay $D_\pi$ incurred using policy $\pi$ is the instant at which the source forwards the packet. It could be either one of the $W_k$, or the instant $T$. The reward $R_\pi$ is the reward associated with the relay to which the packet is forwarded. The problem we are interested in is the following:
$$\min_{\pi \in \Pi} \; \mathbb{E}[D_\pi] \quad \text{subject to} \quad \mathbb{E}[R_\pi] \ge \gamma. \tag{11}$$
To solve the above problem, we consider the following unconstrained problem:
$$\min_{\pi \in \Pi} \; \Big( \mathbb{E}[D_\pi] - \eta\, \mathbb{E}[R_\pi] \Big) \tag{12}$$
where $\eta > 0$.

Lemma 1: Let $\pi^*$ be an optimal policy for the unconstrained problem in (12). Suppose that $\eta$ $(=: \eta_\gamma)$ is such that $\mathbb{E}[R_{\pi^*}] = \gamma$; then $\pi^*$ is optimal for the main problem in (11) as well.

Proof: For any policy $\pi$ satisfying the constraint $\mathbb{E}[R_\pi] \ge \gamma$ we can write
$$\mathbb{E}[D_{\pi^*}] \le \mathbb{E}[D_\pi] - \eta_\gamma \big( \mathbb{E}[R_\pi] - \mathbb{E}[R_{\pi^*}] \big) = \mathbb{E}[D_\pi] - \eta_\gamma \big( \mathbb{E}[R_\pi] - \gamma \big) \le \mathbb{E}[D_\pi],$$
where the first inequality is by the optimality of $\pi^*$ for (12), the equality is by the hypothesis on $\eta_\gamma$, and the last inequality is due to the restriction of $\pi$ to $\mathbb{E}[R_\pi] \ge \gamma$.

Hence we focus on solving the unconstrained problem in (12).

D. One-Step Costs

The objective in (12) can be seen as accumulating additively over each step. If the decision at a stage is to continue, then the delay until the next relay wakes up (or until $T$) gets added to the cost. On the other hand, if the decision is to stop, then the source collects the reward offered by the relay to which it forwards the packet and the decision process enters the state $\psi$. The cost in state $\psi$ is 0. Suppose $(p, w, b)$ is the state at stage $k$. Then the one-step cost function is, for $k = 0, 1, \cdots, K-1$,
$$g_k\big((p, w, b), a_k\big) = \begin{cases} -\eta b & \text{if } w = T \text{ and/or } a_k = 1, \\ U_{k+1} & \text{otherwise.} \end{cases} \tag{13}$$
The cost of termination is $g_K(p, w, b) = -\eta b$.
Also note that for $k = 0$, the possible states are of the form $(p, 0, 0)$ and, since no relay has woken up yet, the only possible action is to continue ($a_0 = 0$), so that $g_0\big((p, 0, 0), a_0\big) = U_1$.

E. Optimal Cost-to-go Functions

For $k = 1, 2, \cdots, K$, let $J_k(\cdot)$ represent the optimal cost-to-go function at stage $k$. For any state $s_k \in \mathcal{S}_k$, $J_k(s_k)$ can be written as
$$J_k(s_k) = \min\{\text{stopping cost}, \text{continuing cost}\}, \tag{14}$$
where the stopping cost (continuing cost) represents the average cost incurred if the source decides to stop (continue) at the current stage and takes the optimal action at the subsequent stages. For the termination state, since the one-step cost is zero and since the system remains in $\psi$ in all the subsequent stages, we have $J_k(\psi) = 0$. For a state $(p, w, b) \in \mathcal{S}_k$, we next evaluate the two costs in the above expression.

First let us obtain the stopping cost. Suppose that there were $K$ relay nodes and the source has seen them all. In such a case, if $(p, w, b) \in \mathcal{S}_K$ (note that $p$ will just be a point mass on $K$) is the state at stage $K$, then the optimal cost is simply the cost of termination, i.e., $J_K(p, w, b) = g_K(p, w, b) = -\eta b$. For $k = 1, 2, \cdots, K-1$, if the action is to stop then the one-step cost is $-\eta b$ and the next state is $\psi$, so that the further cost is $J_{k+1}(\psi) = 0$. Therefore, the stopping cost at any stage is simply $-\eta b$.

On the other hand, the cost of continuing when the state at stage $k$ is $(p, w, b)$ can be written, using the law of total expectation, as
$$c_k(p, w, b) = p(k)\big(T - w - \eta b\big) + \sum_{n = k+1}^{K} p(n)\, \mathbb{E}_k\Big[ U_{k+1} + J_{k+1}\big(\tau_{k+1}(p, w, U_{k+1}),\; w + U_{k+1},\; \max\{b, R_{k+1}\}\big) \,\Big|\, w, n \Big]. \tag{15}$$
Each of the expectation terms in the summation in (15) is the average cost to continue conditioned on the event $(N = n)$.
$U_{k+1}$ is the (random) time until the next relay wakes up ($U_{k+1}$ is the one-step cost) and $J_{k+1}(\cdot)$ is the optimal cost-to-go from the next stage onwards ($J_{k+1}(\cdot)$ constitutes the future cost). The next state is obtained via the state transition equation (10). The term $(T - w - \eta b)$ in (15) associated with $p(k)$ is the cost of continuing when the number of relays happens to be $k$, i.e., $(N = k)$, and there are no more relays to go. Recall that we had defined (in Section II) $U_{k+1} = T - w$ and $R_{k+1} = 0$ when the actual number of relays is $N = k$. Therefore $T - w$ is the one-step cost when $N = k$. Also $w + U_{k+1} = T$ and $\max\{b, R_{k+1}\} = b$, so that at the next stage (which occurs at $T$) the process will terminate (enter $\psi$) with a cost of $-\eta b$ (see (10) and (13)), which represents the future cost.

Thus the optimal cost-to-go function (14) at stage $k = 1, 2, \cdots, K-1$ can be written as
$$J_k(p, w, b) = \min\big\{ -\eta b,\; c_k(p, w, b) \big\}. \tag{16}$$
From the above expression it is clear that at stage $k$, when the state is $(p, w, b)$, the source has to compare the stopping cost, $-\eta b$, with the cost of continuing, $c_k(p, w, b)$, and stop iff $-\eta b \le c_k(p, w, b)$. Later, in Section V, we will use this condition ($-\eta b \le c_k(p, w, b)$) to define the optimum stopping set. We will prove that the continuing cost, $c_k(p, w, b)$, is concave in $p$, leading to the result that the optimum stopping set is convex. Equations (15) and (16) are used extensively in the subsequent development.

IV. RELATIONSHIP WITH THE CASE WHERE N IS KNOWN (THE COMDP VERSION)

In the previous section (Section III) we detailed our problem formulation as a POMDP. The state is partially observable because the source does not know the exact number of relays.
It is interesting to first consider the simpler case where this number is known, which is the contribution of our earlier work in [7]. Hence, in this section, we will consider the case when the initial pmf, $p_0$, has all its mass on some $n$, i.e., $p_0(n) = 1$. We call this the COMDP version of the problem.

First we define a sequence of threshold functions which will be useful in the subsequent proofs. These are the same threshold functions that characterize the optimal policy for our model in [7].

Definition 3: For $(w, b) \in (0, T) \times [0, \overline{R}]$, define $\{\phi_\ell : \ell = 0, 1, \cdots, K-1\}$ inductively as follows: $\phi_0(w, b) = 0$ for all $(w, b)$, and for $\ell = 1, 2, \cdots, K-1$ (recall Definition 1),
$$\phi_\ell(w, b) = \mathbb{E}_{K-\ell}\bigg[ \max\Big\{ b,\; R,\; \phi_{\ell-1}\big(w + U, \max\{b, R\}\big) \Big\} - \frac{U}{\eta} \,\bigg|\, w, K \bigg]. \tag{17}$$
In the above expression we have suppressed the subscript $K - \ell + 1$ for $R$ and $U$ for simplicity. The pdf used to take the expectation in the above expression is $f_R(\cdot)\, f_{K-\ell}(\cdot \mid w, K)$ (again recall Definition 1).

We will need the following simple property of the threshold functions in a later section.

Lemma 2: For $\ell = 1, 2, \cdots, K-1$, $-\eta \phi_\ell(w, b) \le (T - w - \eta b)$.

Proof: See Appendix I-A.

Next we state the main lemma of this section. We call this the One-point Lemma, because it gives the optimal cost, $J_k(p_k, w, b)$, at stage $k$ when the belief state $p_k \in \mathcal{P}_k$ is such that it has all its mass on some $n \ge k$.

Lemma 3 (One-point): Fix some $n \in \{1, 2, \cdots, K\}$ and $(w, b) \in (0, T) \times [0, \overline{R}]$. For any $k = 1, 2, \cdots, n$, if $p_k \in \mathcal{P}_k$ is such that $p_k(n) = 1$ then
$$J_k(p_k, w, b) = \min\big\{ -\eta b,\; -\eta \phi_{n-k}(w, b) \big\}.$$

Proof: The proof is by induction.
We make use of the fact that if at some stage $k < n$ the belief state $p_k$ is such that $p_k(n) = 1$, then the next belief state $p_{k+1}$ $(\in \mathcal{P}_{k+1})$, obtained by using the belief transition equation (9), is also of the form $p_{k+1}(n) = 1$. We complete the proof by using Definition 3 and the induction hypothesis. For a complete proof, see Appendix I-B.

Discussion of Lemma 3: At stage $k$, if the state is $(p_k, w, b)$, where $p_k$ is such that $p_k(n) = 1$ for some $n \ge k$, then from the One-point Lemma it follows that the optimal policy is to stop and transmit iff $b \ge \phi_{n-k}(w, b)$. The subscript $n - k$ of the function $\phi_{n-k}$ signifies the number of relays still to go. For instance, if we know that there are exactly 4 more relays to go, then the threshold to be used is $\phi_4$. Suppose that at stage $k$ it was optimal to continue; then from (9) it follows that the next belief state $p_{k+1} \in \mathcal{P}_{k+1}$ also has mass only on $(N = n)$, and hence at that stage it is optimal to use the threshold function $\phi_{n-(k+1)}$. Therefore, if we begin with an initial belief $p_0 \in \mathcal{P}_1$ such that $p_0(n) = 1$ for some $n$, then the optimal policy is to stop at the first stage $k$ such that $b \ge \phi_{n-k}(w, b)$, where $W_k = w$ is the wake-up instant of the $k$th relay and $B_k = \max\{R_1, \cdots, R_k\} = b$. Note that, since at stage $n$ the threshold to be used is $\phi_0(w, b) = 0$ (see Definition 3), we invariably have to stop at stage $n$ if we have not terminated earlier. This is exactly the same as our optimal policy in [7], where the number of relays is known to the source (instead of the number being known with probability one, as in our One-point Lemma here).

V. UNKNOWN N: BOUNDS ON THE OPTIMUM STOPPING SET

In this section we will consider the general case where the number of relays $N$ is not known to the source. The sequential decision problem developed in Section III was for this unknown $N$ case.
The problem was formulated as a POMDP in which the source's decision to stop and forward the packet is based on the belief state, which takes values in $\mathcal{P}_k$ after the source has observed $k$ relays waking up. We begin this section by defining the optimum stopping set. We show that this set is convex. Characterizing the exact optimum stopping set is computationally intensive. Therefore, our aim is to derive inner and outer bounds (a subset and a superset, respectively) for the optimum stopping set.

Definition 4 (Optimum stopping set): For $1 \le k \le K-1$, let
$$\mathcal{C}_k(w, b) = \big\{ p \in \mathcal{P}_k : -\eta b \le c_k(p, w, b) \big\}.$$
Referring to (16) it follows that, for a given $(w, b)$, $\mathcal{C}_k(w, b)$ represents the set of all beliefs $p \in \mathcal{P}_k$ at stage $k$ at which it is optimal to stop. We call $\mathcal{C}_k(w, b)$ the optimum stopping set at stage $k$ when the delay ($W_k$) and best reward ($B_k$) values are $w$ and $b$, respectively.

A. Convexity of the Optimum Stopping Sets

We will prove (in Lemma 4) that the continuing cost, $c_k(p, w, b)$, in (15) is concave in $p \in \mathcal{P}_k$. From the form of the stopping set $\mathcal{C}_k(w, b)$, a simple consequence of this lemma will be that the optimum stopping set is convex. We further extend the concavity result of $c_k(p, w, b)$ to $p \in \overline{\mathcal{P}}_k$, where $\overline{\mathcal{P}}_k$ is the affine set containing $\mathcal{P}_k$ (to be defined shortly in this section).

Lemma 4: For $k = 1, 2, \cdots, K-1$, and any given $(w, b)$, the cost of continuing (defined in (15)), $c_k(\cdot, w, b)$, is concave on $\mathcal{P}_k$.

Proof: The essence of the proof is the same as that in [25, Lemma 1]. From (15) we easily see that $c_{K-1}(\cdot, w, b)$ is an affine function of $p \in \mathcal{P}_{K-1}$, and hence $J_{K-1}(\cdot, w, b)$ in (16), being the minimum of an affine function and a constant, is concave. The proof then follows by induction. The induction hypothesis is that for some stage $k+1$, $J_{k+1}(\cdot, w, b)$ is concave.
Hence it can be expressed as an infimum over some collection of affine functions. The inductive step then shows that $c_k(\cdot, w, b)$ can also be expressed as an infimum over some collection of affine functions. Hence $c_k(\cdot, w, b)$ and (using (16)) $J_k(\cdot, w, b)$ are concave. A formal proof is available in Appendix II-A.

The following corollary is a straightforward application of the above lemma.

Corollary 1: For $k = 1, 2, \cdots, K-1$, and any given $(w, b)$, $\mathcal{C}_k(w, b)$ $(\subseteq \mathcal{P}_k)$ is a convex set.

Proof: From Lemma 4 we know that $c_k(p, w, b)$ is a concave function of $p \in \mathcal{P}_k$. Hence $\mathcal{C}_k(w, b)$ (see Definition 4), being a superlevel set of a concave function, is convex [29].

In the next section, while proving an inner bound for the stopping set $\mathcal{C}_k(w, b)$, we will identify a set of points that could lie outside the probability simplex $\mathcal{P}_k$. We can obtain a better inner bound if we extend the concavity result to the affine set
$$\overline{\mathcal{P}}_k = \big\{ p \in \Re^{K-k+1} : \langle p, \mathbf{1} \rangle = 1 \big\},$$
where $\langle p, \mathbf{1} \rangle = \sum_{n=k}^{K} p(n)$, i.e., in $\overline{\mathcal{P}}_k$ the components of each vector sum to one, but we do not require them to be non-negative. This can be done as follows. Define $\overline{\tau}_{k+1}(p, w, u)$ using (9) for every $p \in \overline{\mathcal{P}}_k$. Then $\overline{\tau}_{k+1}(\cdot, w, u)$, as a function of $p$, is the extension of $\tau_{k+1}(\cdot, w, u)$ from $\mathcal{P}_k$ to $\overline{\mathcal{P}}_k$. Similarly, for every $p \in \overline{\mathcal{P}}_k$, define $\overline{c}_k(p, w, b)$ and $\overline{J}_k(p, w, b)$ using (15) and (16). These are the extensions of $c_k(\cdot, w, b)$ and $J_k(\cdot, w, b)$, respectively. Then, using the same proof technique as in Lemma 4, we obtain the following corollary.

Corollary 2: For $k = 1, 2, \cdots, K-1$, and any given $(w, b)$, $\overline{c}_k(\cdot, w, b)$ is concave on the affine set $\overline{\mathcal{P}}_k$.

Using the above corollary, $\mathcal{C}_k(w, b)$ can be written as
$$\mathcal{C}_k(w, b) = \mathcal{P}_k \cap \big\{ p \in \Re^{K-k+1} : \langle p, \mathbf{1} \rangle = 1,\; -\eta b \le \overline{c}_k(p, w, b) \big\}. \tag{18}$$

B. Inner Bound on the Optimum Stopping Set

We have shown that the optimum stopping set is convex. In this section, we will identify points that lie along certain edges of the simplex $\mathcal{P}_k$. The convex hull of these points will yield an inner bound to the optimum stopping set. This will first require us to prove the following lemma, referred to as the Two-points Lemma, which is a generalization of the One-point Lemma (Lemma 3). It gives the optimal cost, $J_k(p, w, b)$, at stage $k$ when $p \in \mathcal{P}_k$ is such that it places all its mass on $k$ and on some $n > k$, i.e., $p(k) + p(n) = 1$. Throughout this and the next section (on an outer bound), $(W_k, B_k) = (w, b)$ is fixed and hence, for ease of presentation, we drop $(w, b)$ from the notations $\delta_\ell(w, b)$, $a^\ell_k(w, b)$ and $b^\ell_k(w, b)$ (to appear later in these sections). However, it is understood that these thresholds are, in general, functions of $(w, b)$.

Lemma 5 (Two-points): For $k = 1, 2, \cdots, K-1$, if $p \in \mathcal{P}_k$ is such that $p(k) + p(n) = 1$, where $k < n \le K$, then
$$J_k(p, w, b) = \min\Big\{ -\eta b,\; p(k)\big(T - w - \eta b\big) + p(n)\big(-\eta \phi_{n-k}(w, b)\big) \Big\}.$$

Proof: Using (15) we can write
$$c_k(p, w, b) = p(k)\big(T - w - \eta b\big) + p(n)\, \mathbb{E}_k\Big[ U_{k+1} + J_{k+1}\big(\tau_{k+1}(p, w, U_{k+1}),\; w + U_{k+1},\; \max\{b, R_{k+1}\}\big) \,\Big|\, w, n \Big].$$
For $p$ given as in the hypothesis, the belief in the next state is such that $\tau_{k+1}(p, w, u)(n) = 1$. Using this observation, Lemma 3 (One-point), and the definition of $\phi_{n-k}$ in (17), we obtain the desired result.

Discussion of Lemma 5: The Two-points Lemma (Lemma 5) can be used to obtain certain threshold points in the following way. When $p \in \mathcal{P}_k$ has mass only on $k$ and on some $n$, $k < n \le K$, then using Lemma 5 (with $p(k) = 1 - p(n)$), the continuing cost can be written as a function of $p(n)$ as
$$c_k(p, w, b) = \big(T - w - \eta b\big) - p(n)\Big(T - w - \eta\big(b - \phi_{n-k}(w, b)\big)\Big). \tag{19}$$
From Lemma 2, it follows that $c_k(p, w, b)$ in (19) is a non-increasing function of $p(n)$. Let $p^{(k)}_k$ and $p^{(n)}_k$ be pmfs in $\mathcal{P}_k$ with mass only on $N = k$ and $N = n$, respectively. These are two of the corner points of the simplex $\mathcal{P}_k$. (As an example, Fig. 2 illustrates the simplex and the corner points for stage $k = K-2$. With at most two more nodes to go, $\mathcal{P}_{K-2}$ is a two dimensional simplex in $\Re^3$; $p^{(K-2)}_{K-2}$, $p^{(K-1)}_{K-2}$ and $p^{(K)}_{K-2}$ are the corner points of this simplex.)

Fig. 2. Probability simplex, $\mathcal{P}_{K-2}$, at stage $K-2$. A belief state at stage $K-2$ is a pmf on the points $K-2$, $K-1$ and $K$ (i.e., no-more, one-more and two-more relays to go, respectively). Thus $\mathcal{P}_{K-2}$ is a two dimensional simplex in $\Re^3$.

At stage $k$, as we move along the line joining the points $p^{(k)}_k$ and $p^{(n)}_k$ (Fig. 3(a) and 3(b) illustrate this as $p(n)$ going from 0 to 1), the cost of continuing in (19) decreases, and there is a threshold below which it is optimal to transmit and beyond which it is optimal to continue. This threshold is the value of $p(n)$ in (19) at which the continuing cost becomes equal to $-\eta b$. Let $\delta_{n-k}$ denote this threshold value; then
$$\delta_{n-k} = \frac{T - w}{T - w - \eta\big(b - \phi_{n-k}(w, b)\big)}.$$
The cost of continuing in (19) as a function of $p(n)$, along with the stopping cost, $-\eta b$, is shown in Fig. 3(a) and 3(b). The threshold $\delta_{n-k}$ is the point of intersection of these two cost functions. The value of the continuing cost $c_k(p, w, b)$ at $p(n) = 1$ is $-\eta \phi_{n-k}(w, b)$. Note that when $b > \phi_{n-k}(w, b)$ the threshold $\delta_{n-k}$ will be greater than 1, in which case it is optimal to stop for any $p$ on the line joining $p^{(k)}_k$ and $p^{(n)}_k$.
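The edge threshold just derived can be illustrated numerically. The following is a minimal sketch, not part of the original development: it assumes that the value of $\phi_{n-k}(w, b)$ has already been computed and is passed in as the number `phi`, and the function names are hypothetical. It evaluates the continuing cost (19) along the edge and the threshold $\delta_{n-k}$ at which it meets the stopping cost $-\eta b$.

```python
def continuing_cost_on_edge(p_n, T, w, eta, b, phi):
    """Continuing cost (19) when the belief puts mass 1 - p_n on N = k and p_n on N = n.

    phi stands for phi_{n-k}(w, b), assumed to be precomputed elsewhere.
    """
    return (T - w - eta * b) - p_n * (T - w - eta * (b - phi))


def edge_threshold(T, w, eta, b, phi):
    """Value of p(n) at which the continuing cost (19) equals the stopping cost -eta * b."""
    denom = T - w - eta * (b - phi)
    if denom <= 0:
        # Lemma 2 gives denom >= 0. If denom == 0 the continuing cost stays at
        # T - w - eta * b > -eta * b, so the edge never crosses the stopping cost.
        return float("inf")
    return (T - w) / denom
```

For instance, with the illustrative values $T = 1$, $w = 0.5$, $\eta = 1$, $b = 0.3$ and $\phi_{n-k} = 0.6$, the threshold is $0.5 / 0.8 = 0.625$, and the continuing cost at that point equals $-\eta b = -0.3$. With $b = 0.7$ instead, $b > \phi_{n-k}$ and the threshold exceeds 1, matching the remark above.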
Fig. 3. Depiction of the thresholds $\delta_{n-k}(w, b)$. $c_k(p, w, b)$ in Equation (19) is plotted as a function of $p(n)$. Also shown is the constant function $-\eta b$, which is the stopping cost. $\delta_{n-k}$ is the point of intersection of these two functions. (a) When $b \le \phi_{n-k}(w, b)$. (b) When $b > \phi_{n-k}(w, b)$.

Fig. 4. Depiction of the inner bound $\underline{\mathcal{C}}_{K-2}(w, b)$. In the examples in (a), (b), and (c) we only show the face of the simplex, $\mathcal{P}_{K-2}$, in Fig. 2, with the inner bound being shown as the shaded region. (a) When $\delta_1$ and $\delta_2$ are both less than 1. (b) When $\delta_1 > 1$ and $\delta_2 < 1$. (c) When $\delta_1 > 1$ and $\delta_2 > 1$.

There are similar thresholds along each edge of the simplex $\mathcal{P}_k$ starting from the corner point $p^{(k)}_k$. In general, let us define, for $\ell = 1, 2, \cdots, K-1$,
$$\delta_\ell = \frac{T - w}{T - w - \eta\big(b - \phi_\ell(w, b)\big)}. \tag{20}$$

Remark: Note that (19) will also hold for the extended function $\overline{c}_k(p, w, b)$, where now $p \in \overline{\mathcal{P}}_k$. In terms of the extended function, $\delta_{n-k}$ represents the value of $p(n)$ (in (19) with $c_k$ replaced by $\overline{c}_k$) at which $\overline{c}_k(p, w, b) = -\eta b$.

Recall that (from Lemma 5) the above discussion began with a $p \in \mathcal{P}_k$ such that $p(k) + p(n) = 1$. At the threshold of interest we have $p(n) = \delta_{n-k}$ and hence $p(k) = 1 - \delta_{n-k}$, and the rest of the components are zero. We denote this vector as $a^{n-k}_k$. For instance, in Fig. 4, where the face of the two dimensional simplex $\mathcal{P}_{K-2}$ is shown, the threshold along the lower edge of the simplex is $a^1_{K-2} = [1 - \delta_1, \delta_1, 0]$ and that along the other edge is $a^2_{K-2} = [1 - \delta_2, 0, \delta_2]$. Since it is possible that $\delta_{n-k} > 1$, the vector threshold $a^{n-k}_k$ is not restricted to lie in the simplex $\mathcal{P}_k$; however, it always stays in the affine set $\overline{\mathcal{P}}_k$. We formally define these thresholds next.

Definition 5: For a given $k \in \{1, 2, \cdots, K-1\}$, for each $\ell = 1, 2, \cdots, K-k$, define $a^\ell_k$ as a $K-k+1$ dimensional point with the first and the $(\ell+1)$th components equal to $1 - \delta_\ell$ and $\delta_\ell$, respectively, and the rest of the components equal to zero.

As mentioned before, $a^\ell_k$ lies on the line joining $p^{(k)}_k$ and $p^{(k+\ell)}_k$. At stage $k$ there are $K-k$ such points, one corresponding to each edge in $\mathcal{P}_k$ emanating from the corner point $p^{(k)}_k$. For an illustration of these points see Fig. 4 for the case $k = K-2$.

Referring to Fig. 4(a) (which depicts the case $k = K-2$), suppose all the vector thresholds, $a^\ell_k$, lie within the simplex $\mathcal{P}_k$; then, since at these points the stopping cost ($-\eta b$) is equal to the continuing cost ($c_k(a^\ell_k, w, b)$), all these points lie in the optimum stopping set $\mathcal{C}_k(w, b)$. Note that the corner point $p^{(k)}_k$ (the belief with all its mass on no-more relays to go) also lies in $\mathcal{C}_k(w, b)$. Since we have already shown that $\mathcal{C}_k(w, b)$ is convex, the convex hull of these points will yield an inner bound. However, as mentioned earlier (and as depicted in Fig. 4(b) and 4(c)), it is possible for some or all of the thresholds $a^\ell_k$ to lie outside the simplex (and hence these thresholds do not belong to $\mathcal{C}_k(w, b)$). This is where we will use Corollary 2, where the concavity result of the continuing cost, $c_k(p, w, b)$, is extended to the affine set $\overline{\mathcal{P}}_k$.
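To make the vector thresholds of Definition 5 concrete, the sketch below builds the points $a^\ell_k$ from a given list of scalar thresholds $\delta_1, \cdots, \delta_{K-k}$ and tests whether a pmf lies in the convex hull of $p^{(k)}_k$ and these points. It is only an illustration under stated assumptions: the $\delta_\ell$ values are taken as given (and positive), the function names are hypothetical, and the membership test relies on an observation that is not in the text, namely that the $K-k+1$ hull points are affinely independent, so the convex weight on $a^\ell_k$ must be $p(k+\ell)/\delta_\ell$ and membership reduces to these weights summing to at most one.

```python
def vector_thresholds(deltas):
    """Return the points a_k^1, ..., a_k^m, where deltas[l - 1] is delta_l and m = K - k.

    Each point has m + 1 components, indexed by N = k, k + 1, ..., K.
    """
    m = len(deltas)
    points = []
    for ell, delta in enumerate(deltas, start=1):
        a = [0.0] * (m + 1)
        a[0] = 1.0 - delta   # mass on "no more relays to go"
        a[ell] = delta       # mass on "ell more relays to go"
        points.append(a)
    return points


def in_hull_of_thresholds(p, deltas, tol=1e-12):
    """True if the pmf p lies in conv{p_k^(k), a_k^1, ..., a_k^m}.

    p is assumed to be a valid pmf (non-negative, summing to one), so this is
    also membership in the hull intersected with the simplex.
    """
    # Weight on a_k^ell is forced to be p[ell] / delta_ell; the weight on the
    # corner point p_k^(k) is one minus the sum of these weights.
    weight_sum = sum(p[ell] / delta for ell, delta in enumerate(deltas, start=1))
    return weight_sum <= 1.0 + tol
```

With $\delta_1 = 0.5$ and $\delta_2 = 0.8$ (as in Fig. 4(a)), the belief $[0.7, 0.2, 0.1]$ lies in the hull, while $[0.2, 0.5, 0.3]$ does not. With both thresholds above 1 (as in Fig. 4(c)), every pmf passes the test.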
We next state this inner bound theorem.

Theorem 1 (Inner bound): For $k = 1, 2, \cdots, K-1$, recalling that $p^{(k)}_k$ is the pmf in $\mathcal{P}_k$ with point mass on $k$, define
$$\underline{\mathcal{C}}_k(w, b) := \mathcal{P}_k \cap \mathrm{conv}\big\{ p^{(k)}_k, a^1_k, \cdots, a^{K-k}_k \big\},$$
where $\mathrm{conv}$ denotes the convex hull of the given points. Then $\underline{\mathcal{C}}_k(w, b) \subseteq \mathcal{C}_k(w, b)$.

Proof: From the way the points $a^\ell_k$ are defined using $\delta_\ell$, it follows that $\overline{c}_k(a^\ell_k, w, b) = -\eta b$ (see the Remark following (20)). $p^{(k)}_k$ is the pmf with point mass on $(N = k)$, so that $\overline{c}_k(p^{(k)}_k, w, b) = c_k(p^{(k)}_k, w, b) = T - w - \eta b$ (see (15)). Therefore the points
$$p^{(k)}_k, a^1_k, \cdots, a^{K-k}_k \in \big\{ p \in \Re^{K-k+1} : \langle p, \mathbf{1} \rangle = 1,\; -\eta b \le \overline{c}_k(p, w, b) \big\},$$
which is a convex set (because $\overline{c}_k(p, w, b)$ is concave in $p$, from Corollary 2). Therefore
$$\mathrm{conv}\big\{ p^{(k)}_k, a^1_k, \cdots, a^{K-k}_k \big\} \subseteq \big\{ p \in \Re^{K-k+1} : \langle p, \mathbf{1} \rangle = 1,\; -\eta b \le \overline{c}_k(p, w, b) \big\}$$
and the result follows from (18).

In Fig. 4, for stage $k = K-2$, we illustrate the various cases that can arise. In each of the figures the shaded region is the inner bound. In Fig. 4(a) all the thresholds lie within the simplex, and the convex hull of these points gives the inner bound. When some or all of the thresholds lie outside the simplex, as in Fig. 4(b) and 4(c), the inner bound is obtained by intersecting the convex hull of the thresholds with the simplex. In Fig. 4(c), where all the thresholds lie outside the simplex, the inner bound is the entire simplex, $\mathcal{P}_{K-2}$, so that at stage $K-2$ with $(W_{K-2}, B_{K-2}) = (w, b)$ it is optimal to stop for any belief state.

C. Outer Bound on the Optimum Stopping Set

In this section we will obtain an outer bound (a superset) for the optimum stopping set. Again, as in the case of the inner bound, we will identify certain threshold points whose convex hull will contain the optimum stopping set.
This will require us to first prove a monotonicity result which compares the cost of continuing at two belief states $p, q \in \mathcal{P}_k$ which are ordered, for instance for $k = K-2$, as in Fig. 5. The belief $q$ in Fig. 5 is such that $q(K-2) = p(K-2)$ (i.e., the probability that there are no more relays to go is the same in both $p$ and $q$) and $q(K-1) = 1 - p(K-2)$ (i.e., all the remaining probability in $q$ is on the event that there is one more relay to go, while in $p$ it can be on one more or two more relays to go). Thus $q$ lies on the lower edge of the simplex. We will show that the cost of continuing at $p$ is no greater than that at $q$.

Lemma 6: Given $p \in \mathcal{P}_k$ for $k = 1, 2, \cdots, K-1$, define $q(k) = p(k)$ and $q(k+1) = 1 - p(k)$; then $c_k(p, w, b) \le c_k(q, w, b)$ for any $(w, b)$.

Proof: See Appendix II-B.

Discussion of Lemma 6: This lemma proves the intuitive result that the continuing cost with a pmf $p$ that puts mass on a larger number of relays should be no greater than with a pmf $q$ that concentrates all such mass in $p$ on just one more relay to go. With more relays, the cost of continuing is expected to decrease.

Similar to the thresholds $a^\ell_k$, we define thresholds $b^\ell_k$ that lie along certain edges of the simplex. We first identify the threshold $a^\ell_k$ that is at the maximum distance from the corner point $p^{(k)}_k$ (in Fig. 5, this point is $a^1_{K-2} = [1 - \delta_1, \delta_1, 0]$). Next we define the thresholds $b^\ell_k$ to be the points on the edges emanating from $p^{(k)}_k$ which are at this same distance. Thus in Fig. 5, $b^1_{K-2} = a^1_{K-2}$ and $b^2_{K-2} = [1 - \delta_1, 0, \delta_1]$.

Fig. 5. The light shaded region is the inner bound. The outer bound is the union of the light and the dark shaded regions.
Definition 6: Let $\ell_{\max} = \arg\max_{\ell = 1, 2, \cdots, K-k} \delta_\ell$. For $\ell = 1, 2, \cdots, K-k$, define $b^\ell_k$ as a $K-k+1$ dimensional point with the first and the $(\ell+1)$th components equal to $1 - \delta_{\ell_{\max}}$ and $\delta_{\ell_{\max}}$, respectively, and the rest of the components equal to zero. Each of the $b^\ell_k$ is at the same distance from $p^{(k)}_k$, but on a different edge starting from $p^{(k)}_k$.

Using Lemma 6, we show that the convex hull of the thresholds $b^\ell_k$ along with the corner point $p^{(k)}_k$ constitutes an outer bound for the optimum stopping set. The idea of the proof can be illustrated using Fig. 5. The belief $p$ in Fig. 5 is outside the convex hull, and $q$ is obtained from $p$ as in Lemma 6. At $q$ it is optimal to continue since it is beyond the threshold $a^1_{K-2}$, and hence the continuing cost at $q$, $c_k(q, w, b)$, is less than the stopping cost $-\eta b$. From Lemma 6 it follows that the continuing cost at $p$, $c_k(p, w, b)$, is also less than $-\eta b$, so that it is optimal to continue at $p$ as well, proving that $p$ does not belong to the optimum stopping set. Thus the convex hull contains the optimum stopping set. We formally state and prove this outer bound theorem next.

Theorem 2 (Outer bound): For $k = 1, 2, \cdots, K-1$, define
$$\overline{\mathcal{C}}_k(w, b) = \mathcal{P}_k \cap \mathrm{conv}\big\{ p^{(k)}_k, b^1_k, \cdots, b^{K-k}_k \big\}.$$
Then $\mathcal{C}_k(w, b) \subseteq \overline{\mathcal{C}}_k(w, b)$.

Proof: Let $\ell_{\max} = \arg\max_{\ell = 1, 2, \cdots, K-k} \delta_\ell$. If $\delta_{\ell_{\max}} \ge 1$, then $\overline{\mathcal{C}}_k(w, b) = \mathcal{P}_k$ $(\supseteq \mathcal{C}_k(w, b))$ and the result follows trivially. Hence, let us consider the case where $\delta_{\ell_{\max}} < 1$. Pick any $p \in \mathcal{P}_k$ with $p \notin \overline{\mathcal{C}}_k(w, b)$. We will show that $p \notin \mathcal{C}_k(w, b)$. Let $q \in \mathcal{P}_k$ be such that $q(k) = p(k)$ and $q(k+1) = 1 - p(k)$. Now $p \notin \overline{\mathcal{C}}_k(w, b)$ implies that $p(k) < 1 - \delta_{\ell_{\max}}$. Since $q(k+1) = 1 - p(k) > \delta_{\ell_{\max}} \ge \delta_1$, it follows that under $q$ it is optimal to continue, so that $q \notin \mathcal{C}_k(w, b)$, i.e., $c_k(q, w, b) < -\eta b$. Finally, by applying Lemma 6 we can write $c_k(p, w, b) \le c_k(q, w, b) < -\eta b$.
This means that at $p$ it is optimal to continue, so that $p \notin \mathcal{C}_k(w, b)$.

The outer bound for $k = K-2$ is illustrated in Fig. 5. The light shaded region is the inner bound. The outer bound is the union of the light and the dark shaded regions. The boundary of the optimum stopping set falls within the dark shaded region. For any $p$ within the inner bound we know that it is optimal to stop, and for any $p$ outside the outer bound it is optimal to continue. We are uncertain about the optimal action for belief states within the dark shaded region.

VI. OPTIMUM RELAY SELECTION IN A SIMPLIFIED MODEL

The bounds obtained in the previous section require us to compute the threshold functions $\{\phi_\ell : \ell = 0, 1, \cdots, K-1\}$ (see Definition 3) recursively. These are computationally very intensive to obtain. Hence, in this section we simplify the exact model and extract a simple selection rule. Our aim is to apply this simple rule to the exact model and compare its performance with the other policies.

A. The Simplified Model

We now describe our simplified model. There are $\tilde{N}$ relays. Here, $\tilde{N}$ is a constant and is known to the source. The key simplification in this model is that the relay nodes wake up at the first $\tilde{N}$ points of a Poisson process of rate $\tilde{N}/T$. The following are the motivations for considering such a simplification. Note that in our actual model (Section II), when $N = \tilde{N}$, the inter-wake-up times $\{U_k : 1 \le k \le \tilde{N}\}$ are identically distributed [27, Chapter 2], but not independent. Their common cdf (cumulative distribution function) is
$$F_{U_k \mid N}(u \mid \tilde{N}) = 1 - \Big(1 - \frac{u}{T}\Big)^{\tilde{N}}$$
for $u \in (0, T)$. From Fig. 6 we observe that the cdf of $\{U_k : 1 \le k \le \tilde{N}\}$ is close to that of an exponential random variable of parameter $\tilde{N}/T$, and the approximation becomes better for large values of $\tilde{N}$ (for a fixed $T$).
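The closeness of the two cdfs can also be checked numerically. The sketch below is an illustration only (the function names and the grid size are arbitrary choices, not from the paper): it evaluates the cdf $1 - (1 - u/T)^{\tilde{N}}$ and the exponential cdf $1 - e^{-\tilde{N} u / T}$ on a grid over $[0, T]$ and reports the largest gap between them.

```python
import math


def cdf_exact(u, n_relays, T=1.0):
    """Common cdf of the inter-wake-up times in the exact model, for u in [0, T]."""
    return 1.0 - (1.0 - u / T) ** n_relays


def cdf_exponential(u, n_relays, T=1.0):
    """Cdf of an exponential random variable with rate n_relays / T."""
    return 1.0 - math.exp(-n_relays * u / T)


def max_cdf_gap(n_relays, T=1.0, grid=10_000):
    """Largest absolute difference between the two cdfs on a uniform grid over [0, T]."""
    return max(
        abs(cdf_exact(i * T / grid, n_relays, T) - cdf_exponential(i * T / grid, n_relays, T))
        for i in range(grid + 1)
    )
```

Calling `max_cdf_gap(5)` and `max_cdf_gap(15)` (the two cases plotted in Fig. 6) shows the gap shrinking as $\tilde{N}$ grows, in line with the observation above.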
This motivates us to approximate the actual inter-wake-up times by exponential random variables of rate $\tilde{N}/T$. Further, in the simplified model we take the inter-wake-up times to be independent. Finally, observe that in the simplified model the average number of relays that wake up within the duty cycle $T$ is $\tilde{N}$, which is the same as that in the exact model when $N = \tilde{N}$.

Fig. 6. The cdfs $F_{U_k|N}(\cdot|\tilde{N})$ and $F_Y(\cdot)$, where $Y \sim \mathrm{Exponential}(\tilde{N}/T)$ (with $T = 1$), plotted against $u$ for (a) $\tilde{N} = 5$ and (b) $\tilde{N} = 15$.

We will use notation such as $\tilde{W}_k$, $\tilde{R}_k$, $\tilde{U}_k$, etc., to represent the analogues of the quantities that were defined for the exact model. For instance, $\tilde{W}_k$ represents the wake-up time of the $k$-th relay. However, unlike in the exact model, here $\tilde{W}_k$ can be beyond $T$. As mentioned before, $\{\tilde{U}_k : k = 1, \ldots, \tilde{N}\}$ are simply iid exponential random variables with parameter $\tilde{N}/T$. $\{\tilde{R}_k : k = 1, \ldots, \tilde{N}\}$ are iid random rewards with common pdf $f_R$, which is the same as that in the exact model.

B. MDP Formulation

Again, the decision instants are the times at which the relays wake up. At some stage $k$, $1 \leq k < \tilde{N}$, suppose $(\tilde{W}_k, \tilde{R}_k) = (w, b)$; then the one-step cost of stopping is $-\eta b$ and that of continuing is $\tilde{U}_{k+1}$. Note that since $\tilde{U}_{k+1} \sim \mathrm{Exp}(\tilde{N}/T)$, the one-step costs do not depend on $w$, which means that the optimal policy for the simplified model does not depend on the value of $w$. Also, since the number of relays $\tilde{N}$ is a constant, we do not retain it as a part of the state, unlike in the actual state space $\mathcal{S}_k^a$ (Equation (6)). Therefore we simplify the state space to $\tilde{\mathcal{S}}_0 = \{0\}$ and, for $k = 1, 2, \ldots, \tilde{N}$, $\tilde{\mathcal{S}}_k = [0, \overline{R}] \cup \{\psi\}$. As before, $\psi$ is the terminating state.
Suppose at some stage $1 \leq k < \tilde{N}$ the state is $\tilde{B}_k = b$; then the next state $s_{k+1}$ will be
$$s_{k+1} = \begin{cases} \psi & \text{if } a_k = 1 \\ \max\{b, \tilde{R}_{k+1}\} & \text{if } a_k = 0. \end{cases}$$
We mentioned the one-step costs earlier. We write them down here for the sake of completeness:
$$\tilde{g}_k(b, a_k) = \begin{cases} -\eta b & \text{if } a_k = 1 \\ \tilde{U}_{k+1} & \text{if } a_k = 0. \end{cases}$$
The cost of termination is simply $\tilde{g}_{\tilde{N}}(b) = -\eta b$.

C. Optimal Policy via One-Step-Stopping Set

In this section we will prove that the one-step-look-ahead rule is optimal for the simplified model. The idea is to show that the one-step-stopping set is absorbing [28, Section 4.4]. These terms are defined below. For an alternate derivation of the optimal policy by value iteration, see the next section (Section VI-D).

At stage $k$, $1 \leq k < \tilde{N}$, when the state is $b$, the cost of stopping is simply $c_s(b) = -\eta b$. The cost of continuing for one more step (which is $\tilde{U}_{k+1}$) and then stopping at the next stage (where the state is $\max\{b, \tilde{R}_{k+1}\}$) is
$$c_c(b) = E\left[\tilde{U}_{k+1} - \eta \max\{b, \tilde{R}_{k+1}\}\right] = -\eta \left( E[\max\{b, R\}] - \frac{T}{\eta \tilde{N}} \right).$$
By defining the function $\beta(\cdot)$ for $b \in [0, \overline{R}]$ as
$$\beta(b) = E\left[\max\{b, R\}\right] - \frac{T}{\eta \tilde{N}}, \tag{21}$$
we can write $c_c(b) = -\eta \beta(b)$. Note that both the costs, $c_s$ and $c_c$, do not depend on the stage index $k$.

Definition 7: We define the one-step-stopping set as
$$\mathcal{C}_{1step} = \left\{ b \in [0, \overline{R}] : -\eta b \leq -\eta \beta(b) \right\}, \tag{22}$$
i.e., it is the set of all states $b \in [0, \overline{R}]$ where the cost of stopping, $c_s(b)$, is no more than the cost of continuing for one more step and then stopping at the next stage, $c_c(b)$.

We will show that $\mathcal{C}_{1step}$ is characterized by a threshold $\alpha$ and can be written as $\mathcal{C}_{1step} = [\alpha, \overline{R}]$. This will require the following properties of $\beta(\cdot)$.

Lemma 7:
1) $\beta$ is continuous, increasing and convex in $b$.
2) If $\beta(0) < 0$, then $\beta(b) < b$ for all $b \in [0, \overline{R}]$.
3) If $\beta(0) \geq 0$, then there exists a unique $\alpha$ such that $\alpha = \beta(\alpha)$.
4) If $\beta(0) \geq 0$, then $\beta(b) < b$ for $b \in (\alpha, \overline{R}]$ and $\beta(b) > b$ for $b \in [0, \alpha)$.

Proof: See Appendix III-A.

Discussion of Lemma 7: When $\beta(0) \geq 0$, using Lemma 7.3 and 7.4 we can write $\mathcal{C}_{1step}$ in (22) as $\mathcal{C}_{1step} = [\alpha, \overline{R}]$. For the other case, where $\beta(0) < 0$, from Lemma 7.2 it follows that $\mathcal{C}_{1step} = [0, \overline{R}]$. Thus, by defining $\alpha = 0$ whenever $\beta(0) < 0$, we can write $\mathcal{C}_{1step} = [\alpha, \overline{R}]$ in either case.

Definition 8: Depending on the value of $\beta(0)$, define $\alpha$ as follows:
$$\alpha = \begin{cases} \beta(\alpha) & \text{if } \beta(0) \geq 0 \\ 0 & \text{otherwise.} \end{cases} \tag{23}$$

Definition 9: A policy is said to be one-step-look-ahead if at stage $k$, $1 \leq k < \tilde{N}$, it stops iff the state $b \in \mathcal{C}_{1step}$, i.e., iff the cost of stopping, $c_s(b)$, is no more than the cost of continuing for one more step and then stopping, $c_c(b)$.

Definition 10: Let $\mathcal{C}$ be some subset of the state space $[0, \overline{R}]$, i.e., $\mathcal{C} \subseteq [0, \overline{R}]$. We say that $\mathcal{C}$ is absorbing if, for every $b \in \mathcal{C}$, whenever the action at stage $k$, $1 \leq k < \tilde{N}$, is to continue, the next state $s_{k+1}$ at stage $k+1$ also falls into $\mathcal{C}$.

Since we have expressed $\mathcal{C}_{1step}$ as $[\alpha, \overline{R}]$, and since $s_{k+1} = \max\{b, \tilde{R}_{k+1}\} \geq b$, it is clear that $\mathcal{C}_{1step}$ is absorbing. Finally, referring to [28, Section 4.4], it follows that, for optimal stopping problems, whenever the one-step-stopping set is absorbing the one-step-look-ahead rule is optimal. Thus the optimal policy for the simplified model is to choose the first relay whose reward is more than $\alpha$. If none of the relays' reward values is more than $\alpha$, then at the last stage, $\tilde{N}$, choose the one with the maximum reward.

D. Optimal Policy via Value Iteration

In this section we provide an alternative derivation of the optimal policy (already obtained in the previous section). We will write down the value functions starting from the last stage $\tilde{N}$ and proceed backwards, and then simplify to obtain the optimal policy.
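As an aside on the threshold $\alpha$ of Definition 8, the fixed point $\alpha = \beta(\alpha)$ can be located by bisection, because by Lemma 7.4 the sign of $\beta(b) - b$ tells us on which side of $\alpha$ the point $b$ lies. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the expectation is computed with a simple midpoint rule, and the reward is taken to be uniform on $[0, 1]$, which is not the reward distribution used in the paper's simulations.

```python
def expected_max(b, cdf, r_max, steps=2000):
    """E[max{b, R}] = b + integral over (b, r_max) of (1 - F(r)) dr (midpoint rule)."""
    width = (r_max - b) / steps
    tail = sum(1.0 - cdf(b + (i + 0.5) * width) for i in range(steps))
    return b + tail * width


def beta(b, cdf, r_max, eta, n_relays, period=1.0):
    """beta(b) = E[max{b, R}] - T / (eta * N), as in (21)."""
    return expected_max(b, cdf, r_max) - period / (eta * n_relays)


def threshold_alpha(cdf, r_max, eta, n_relays, period=1.0, tol=1e-9):
    """alpha as in (23): 0 if beta(0) < 0, otherwise the fixed point of beta."""
    if beta(0.0, cdf, r_max, eta, n_relays, period) < 0.0:
        return 0.0
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta(mid, cdf, r_max, eta, n_relays, period) > mid:
            lo = mid  # Lemma 7.4: beta(b) > b means b lies below alpha
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniform_cdf(r):
    """Assumed reward cdf, for illustration only: R uniform on [0, 1]."""
    return min(max(r, 0.0), 1.0)


alpha = threshold_alpha(uniform_cdf, r_max=1.0, eta=10.0, n_relays=10)
print(alpha)
```

For this assumed uniform reward, $\beta(b) = (1 + b^2)/2 - T/(\eta \tilde{N})$, so the fixed point can be checked by hand: $\alpha = 1 - \sqrt{2T/(\eta \tilde{N})}$ whenever $T/(\eta \tilde{N}) \leq 1/2$.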
The value function for the last stage $\tilde{N}$ is simply $\tilde{J}_{\tilde{N}}(b) = \tilde{g}_{\tilde{N}}(b) = -\eta b$. Next, at stage $\tilde{N} - 1$,
$$\begin{aligned} \tilde{J}_{\tilde{N}-1}(b) &= \min\left\{ -\eta b, \ E\left[\tilde{U}_{\tilde{N}} + \tilde{J}_{\tilde{N}}\left(\max\{b, \tilde{R}_{\tilde{N}}\}\right)\right] \right\} \\ &= \min\left\{ -\eta b, \ E\left[\tilde{U}_{\tilde{N}} - \eta \max\{b, \tilde{R}_{\tilde{N}}\}\right] \right\} \\ &= \min\left\{ -\eta b, \ -\eta \left( E[\max\{b, R\}] - \frac{T}{\eta \tilde{N}} \right) \right\} \\ &= \min\left\{ -\eta b, \ -\eta \beta_1(b) \right\}, \end{aligned} \tag{24}$$
where the function $\beta_1(\cdot)$ is exactly the same as the function $\beta(\cdot)$ in (21), which we reproduce here for convenience:
$$\beta_1(b) = E\left[\max\{b, R\}\right] - \frac{T}{\eta \tilde{N}}.$$
$\beta_1$ satisfies the properties listed in Lemma 7.

From (24) it is clear that at stage $\tilde{N} - 1$ the optimal policy is to stop iff $-\eta b \leq -\eta \beta_1(b)$, i.e., iff $b \geq \beta_1(b)$. Whenever $\beta_1(0) < 0$, from Lemma 7.2 and (24) we observe that at stage $\tilde{N} - 1$ it is optimal to stop for any $b \in [0, \overline{R}]$. On the other hand, when $\beta_1(0) \geq 0$, from Lemma 7.3, 7.4 and (24) we can conclude that it is optimal to stop iff $b \geq \alpha$. A plot of the function $\beta_1(\cdot)$ for the case $\beta_1(0) \geq 0$ is shown in Fig. 7.

It will follow that there is a similar function at each stage. Formally, at stage $k$ there is a function $\beta_{\tilde{N}-k}(\cdot)$ such that at stage $k$ it is optimal to stop iff $b \geq \beta_{\tilde{N}-k}(b)$. Further, $\beta_{\tilde{N}-k}(\cdot)$ satisfies $\beta_{\tilde{N}-k}(b) \geq \beta_1(b)$ for $b < \alpha$, and $\beta_{\tilde{N}-k}(b) = \beta_1(b)$ for $b \geq \alpha$. This property of the $\beta$ functions is illustrated in Fig. 7 for stages $\tilde{N} - 2$ and $\tilde{N} - 3$. Thus the optimal policy at any other stage $k = 1, 2, \ldots, \tilde{N} - 2$ is the same as the above mentioned $\alpha$-threshold policy.

Fig. 7. Simplified Model: Illustration of the successive $\beta$ functions ($\beta_1(b)$, $\beta_2(b)$ and $\beta_3(b)$ plotted against $b$). The threshold $\alpha$ is the point of intersection of $\beta_1(b)$ with the linear function $b$. In the figure, $\alpha = 0.6$. $\beta_\ell(b)$ as a function of $\ell$ is increasing for $b < \alpha$. For $b \geq \alpha$, $\beta_\ell(b) = \beta_1(b)$.
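The claim that every stage shares the same threshold can be checked numerically by backward induction on a discretized reward. The sketch below is an illustration, not the paper's computation: it assumes a reward that is uniform on a finite grid over $[0, 1]$, uses the mean inter-wake-up time $T/\tilde{N}$ as the expected one-step cost of continuing, and all names are hypothetical.

```python
def stopping_sets(n_relays, eta, period=1.0, grid=200):
    """Backward induction on a reward grid; returns the stopping set of each stage.

    Assumption (illustration only): the reward is uniform on the grid
    {0, 1/grid, ..., 1}, and the mean inter-wake-up time is T / N.
    """
    rewards = [i / grid for i in range(grid + 1)]
    prob = 1.0 / (grid + 1)
    value = [-eta * b for b in rewards]  # last stage: J_N(b) = -eta * b
    sets = {}
    for stage in range(n_relays - 1, 0, -1):
        # suffix[i] = sum of value[j] over j > i, i.e. over rewards that beat b_i
        suffix = [0.0] * (grid + 1)
        for i in range(grid - 1, -1, -1):
            suffix[i] = suffix[i + 1] + value[i + 1]
        new_value, stop = [], []
        for i, b in enumerate(rewards):
            # expected cost of continuing: E[U + J_{stage+1}(max{b, R})]
            cont = period / n_relays + prob * ((i + 1) * value[i] + suffix[i])
            if -eta * b <= cont:
                stop.append(b)
            new_value.append(min(-eta * b, cont))
        sets[stage] = stop
        value = new_value
    return sets


sets = stopping_sets(n_relays=10, eta=10.0)
thresholds = {stage: min(stop) for stage, stop in sets.items()}
print(thresholds)
```

With these assumed parameters the smallest stopping reward is the same at every stage, and it lies within a couple of grid steps of the continuous-reward fixed point $1 - \sqrt{2T/(\eta \tilde{N})}$ for a uniform reward.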
First we extend the definition of $\alpha$ to the case $\beta_1(0) < 0$ by defining $\alpha = 0$ (which is the same as the definition of $\alpha$ in (23) in the previous section).

Definition 11: Depending on the value of $\beta_1(0)$, define $\alpha$ as follows:
$$\alpha = \begin{cases} \beta_1(\alpha) & \text{if } \beta_1(0) \geq 0 \\ 0 & \text{otherwise.} \end{cases}$$

Lemma 8: For every $k \in \{1, 2, \ldots, \tilde{N} - 1\}$ the following holds:
$$\tilde{J}_k(b) = \min\left\{ -\eta b, \ -\eta \beta_{\tilde{N}-k}(b) \right\}, \tag{25}$$
where $\beta_1(b)$ is as defined in (21) and, for $k = 1, 2, \ldots, \tilde{N} - 2$,
$$\beta_{\tilde{N}-k}(b) = E\left[ \max\left\{ b, R, \beta_{\tilde{N}-(k+1)}\left(\max\{b, R\}\right) \right\} \right] - \frac{T}{\eta \tilde{N}}, \tag{26}$$
and has the property $\beta_{\tilde{N}-k}(b) \geq \beta_{\tilde{N}-(k+1)}(b)$ for any $b \in [0, \overline{R}]$. In particular, if $b \geq \alpha$ then $\beta_{\tilde{N}-k}(b) = \beta_1(b)$.

Proof: Here we provide only an outline of the proof. For a complete proof, see Appendix III-B. The result already holds for $k = \tilde{N} - 1$ (see (24) and (21)). Next we prove the result for $\tilde{N} - 2$. The rest of the proof is by induction. Suppose that for some $k$, $1 < k \leq \tilde{N} - 2$, (25) and (26) hold along with the ordering property mentioned in the lemma. We write down the value function $\tilde{J}_{k-1}$ in terms of $\tilde{J}_k$, and straightforward manipulation yields (25) and (26) for $k - 1$. The ordering result for $k - 1$ can also be easily obtained by using the ordering result for $k$.

In Fig. 7 we have depicted this ordering behaviour of the $\beta_\ell$ functions. The following main theorem is a simple consequence of Lemma 7 and Lemma 8.

Theorem 3: At any stage $k = 1, 2, \ldots, \tilde{N} - 1$ the optimal policy for the simplified model is to stop iff $\tilde{B}_k = b \geq \alpha$.

Proof: From (25) in Lemma 8, it follows that the optimal policy is to stop iff $-\eta b \leq -\eta \beta_{\tilde{N}-k}(b)$, i.e., $b \geq \beta_{\tilde{N}-k}(b)$. If $b \geq \alpha$ then from Lemma 7.4 and Lemma 8 we have $b \geq \beta_1(b) = \beta_{\tilde{N}-k}(b)$, and hence it is optimal to stop (see Fig. 7 for an illustration).
On the other hand, if $b < \alpha$ then (again from Lemma 7.4 and Lemma 8) we have $b < \beta_1(b) \leq \beta_{\tilde{N}-k}(b)$, and hence the optimal action is to continue.

Thus the policy for the simplified model is simply to select the first relay with a reward of more than $\alpha$. If all the relays have rewards of less than $\alpha$, then at the last stage $\tilde{N}$ choose the one with the best reward.

E. Analysis of the $\alpha$-Threshold Policies

We have thus seen that the optimal policy for the simplified model is characterized by a threshold $\alpha$. Let $R_\alpha$ represent the reward obtained when the threshold used is $\alpha$. $R_\alpha$ is equal to the reward value of the relay to which the packet is finally forwarded. We are interested in obtaining an expression for $E[R_\alpha]$ (this will be useful later in Section VII-B). $E[R_\alpha]$ can be written as
$$E[R_\alpha] = \int_0^{\overline{R}} P(R_\alpha > r) \, dr, \tag{27}$$
which requires us to obtain $P(R_\alpha > r)$ for $r \in [0, \overline{R}]$. Let us consider two cases, $r \in [0, \alpha]$ and $r \in (\alpha, \overline{R}]$.

For $r \in [0, \alpha]$, the reward $R_\alpha > r$ whenever there is at least one relay with a reward value of more than $r$. Therefore, for $r \in [0, \alpha]$,
$$\begin{aligned} P(R_\alpha > r) &= P\left(\max\{\tilde{R}_1, \ldots, \tilde{R}_{\tilde{N}}\} > r\right) \\ &= 1 - P\left(\max\{\tilde{R}_1, \ldots, \tilde{R}_{\tilde{N}}\} \leq r\right) \\ &= 1 - F_R(r)^{\tilde{N}}. \end{aligned} \tag{28}$$
The third equality holds because the $\tilde{R}_i$ are iid with common cdf $F_R$. Now, for $r \in (\alpha, \overline{R}]$, the reward $R_\alpha > r$ whenever the set of relays whose rewards are more than $\alpha$ is nonempty and, further, the reward of the first relay to wake up from this set is more than $r$. Therefore, for $r \in (\alpha, \overline{R}]$,
$$P(R_\alpha > r) = \left(1 - F_R(\alpha)^{\tilde{N}}\right) \frac{1 - F_R(r)}{1 - F_R(\alpha)}.$$
(29)

Here $1 - F_R(\alpha)^{\tilde{N}}$ is the probability that there is at least one relay with a reward value of more than $\alpha$, and $\frac{1 - F_R(r)}{1 - F_R(\alpha)}$ is the probability that the reward of the first relay (to wake up from the set mentioned above) is more than $r$, conditioned on the fact that its reward is already more than $\alpha$. Using (28) and (29) in (27), it is possible to compute $E[R_\alpha]$ numerically. We will use these expressions while describing a policy $\pi_{A\text{-}SIMPL}$ (in Section VII-B) which is derived from the simplified model. For $\alpha_1 > \alpha_2$ it is clear that $R_{\alpha_1} \geq R_{\alpha_2}$, which means that $E[R_\alpha]$ as a function of $\alpha$ is nondecreasing.

VII. NUMERICAL AND SIMULATION RESULTS

A. One-Hop Performance

Recall (from Section II) that our model admits any general reward associated with a relay. In this section we perform and discuss a simulation study of geographical forwarding in a dense sensor network with sleep-wake cycling nodes, where the reward provided by a relay is the progress made towards the base station (or sink) if the packet is forwarded to that relay. In Appendix IV we show simulation results for other rewards (e.g., the reward being a function of the progress and the channel gain).

Fig. 8. The hatched region is the forwarding region. (Labels in the figure: source, sink, relay $i$, $r_c$, $d$, $d - Z_i$ and $Z_i$.)

The source and sink are separated by a distance of $d = 10$ (see Fig. 8). The source has a packet to forward at time 0. The communication radius of the source is $r_c = 1$. The potential relay nodes are the neighbors of the source that are closer to the sink than the source itself. The period of sleep-wake cycling is $T = 1$. Let $Z_i$ represent the progress of relay $i$, i.e., the difference between the source-sink and relay-sink distances. The reward associated with relay $i$ is simply the progress made by it, i.e., $R_i = Z_i$. We use progress and reward interchangeably in this section.
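Returning briefly to the expressions (27) to (29) of Section VI-E, the numerical evaluation of $E[R_\alpha]$ can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the integral in (27) is approximated by a midpoint rule, and the reward is taken to be uniform on $[0, 1]$ rather than distributed as the progress used in this section.

```python
def expected_reward(alpha, cdf, r_max, n_relays, steps=20_000):
    """E[R_alpha] from (27), using the tail probabilities (28) and (29).

    Assumes cdf(alpha) < 1 whenever alpha < r_max.
    """
    some_relay_qualifies = 1.0 - cdf(alpha) ** n_relays

    def tail(r):
        if r <= alpha:
            return 1.0 - cdf(r) ** n_relays                                # (28)
        return some_relay_qualifies * (1.0 - cdf(r)) / (1.0 - cdf(alpha))  # (29)

    width = r_max / steps
    return sum(tail((i + 0.5) * width) for i in range(steps)) * width      # (27)


def uniform_cdf(r):
    """Assumed reward cdf, for illustration only: R uniform on [0, 1]."""
    return min(max(r, 0.0), 1.0)


for alpha in (0.0, 0.3, 0.6, 0.9):
    print(alpha, expected_reward(alpha, uniform_cdf, r_max=1.0, n_relays=10))
```

For the assumed uniform reward the integral has the closed form $\alpha - \alpha^{\tilde{N}+1}/(\tilde{N}+1) + (1 - \alpha^{\tilde{N}})(1 - \alpha)/2$, which gives a quick check of the numerical routine and shows the nondecreasing dependence on $\alpha$.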
Each of the nodes is located uniformly in the forwarding set, independently of the other nodes. Therefore, it can be shown that the progress values are iid with pdf
$$f_Z(r) = \frac{2 (d - r) \cos^{-1}\left( \dfrac{d^2 + (d - r)^2 - r_c^2}{2 d (d - r)} \right)}{\text{Area of the forwarding region}}, \tag{30}$$
and the support of $f_Z$ is $[0, r_c]$. Hence $r_c$ is analogous to $\overline{R}$ (see System Model, Section II). We take the bound on the number of relays as $K = 50$, and the initial pmf is taken as truncated Poisson with parameter 10, i.e., for $n = 1, 2, \ldots, K$, $p_0(n) = c \, \frac{10^n}{n!} e^{-10}$, where $c$ is the normalization constant. The above mentioned reward pdf ($f_Z$) and initial belief ($p_0$) will be a good approximation if the nodes are deployed in a region according to a spatial Poisson process of rate 10. The approximation becomes better for larger values of $K$.

Since it is computationally intensive to obtain the thresholds $\{\phi_\ell\}$ in (17) inductively, we have discretized the space $[0, T] \times [0, \overline{R}]$ into $100 \times 100$ equally spaced points and obtain $\{\phi_\ell\}$ at these points. Appropriate pmfs are obtained from the pdfs. All the analysis in the previous sections holds for this discrete setting as well.

When the actual state space $\mathcal{S}_k^a$ is discrete, there are established algorithms to obtain the optimal policy for POMDP problems [22], [23], [24]. However, it is highly computationally intensive to apply these algorithms here because of the large state space. For instance, with $K = 50$, the cardinality of $\mathcal{S}_1^a$ is $50 \times 100 \times 100$. Hence we compare the performance of our suboptimal POMDP policies with the COMDP policy (Section IV), which is optimal when the actual number of relays is known and hence serves as a lower bound for the cost that can be achieved by the optimal POMDP policy.

1) Implemented Policies (one-hop): We summarize the various policies we have implemented.
• $\pi_{COMDP}$: The source knows the actual value of $N$. Suppose $N = n$; then the source begins with an initial belief with mass only on $n$. At any stage $k = 1, 2, \ldots, n$, if the delay and best reward pair is $(w, b)$, then transmit if $b \geq \phi_{n-k}(w, b)$, and continue otherwise. See the remark following Lemma 3.
• $\pi_{INNER}$: We use the inner bound $\underline{\mathcal{C}}_k(w, b)$ to obtain a suboptimal policy. At stage $k$, if the belief state is $(p, w, b)$ $(\in \mathcal{S}_k)$, then transmit iff $p \in \underline{\mathcal{C}}_k(w, b)$.
• $\pi_{OUTER}$: We use the outer bound $\overline{\mathcal{C}}_k(w, b)$ to obtain a suboptimal policy. At stage $k$, if the belief state is $(p, w, b)$ $(\in \mathcal{S}_k)$, then transmit iff $p \in \overline{\mathcal{C}}_k(w, b)$.
• $\pi_{A\text{-}COMDP}$ (Average-COMDP): The source assumes that $N$ is equal to its average value $\overline{N} = [E N]$ (here $[x]$ represents the smallest integer greater than $x$), and begins with an initial pmf with mass only on $\overline{N}$. Suppose $N = n$, which the source does not know; then at some stage $k = 1, 2, \ldots, \min\{n, \overline{N}\}$, if the delay and best reward pair is $(w, b)$, transmit iff $b \geq \phi_{\overline{N}-k}(w, b)$. In the case when $\overline{N} > n$, if the source has not transmitted until stage $n$ and, further, at stage $n$ the action is to continue, then, since there are no more relays to wake up, the source ends up waiting until time $T$ and then forwards to the node with the best reward.
• $\pi_{A\text{-}SIMPL}$ (Average-Simple): This policy is derived from the simplified model described in Section VI. The source considers the simplified model assuming that there are $\tilde{N} = [E N]$ relays. It computes the threshold $\alpha$ accordingly using (23). The policy is to transmit to the first relay that wakes up and offers a reward (progress in this case) of more than $\alpha$. If there is no such relay, then the source ends up waiting until time $T$ and then transmits to the node with the best reward.
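The behaviour of a threshold policy such as $\pi_{A\text{-}SIMPL}$ at a single hop can be sketched with a short Monte Carlo simulation. The code below is an illustration under stated assumptions, not the simulator used for the results in this section: the number of relays is held fixed, wake-up instants are iid uniform on $[0, T]$, rewards are iid uniform on $[0, 1)$ instead of following (30), and all function names are hypothetical.

```python
import random


def forward_once(alpha, n_relays, rng, period=1.0):
    """Simulate one forwarding decision; returns (delay, reward).

    Assumptions (illustration only): wake-up instants are iid uniform on
    [0, T] and rewards are iid uniform on [0, 1).
    """
    relays = sorted((rng.uniform(0.0, period), rng.random())
                    for _ in range(n_relays))
    best = 0.0
    for wake_time, reward in relays:
        best = max(best, reward)
        if reward > alpha:
            return wake_time, reward  # first relay that beats the threshold
    return period, best               # nobody qualified: wait until T


def average_performance(alpha, n_relays, runs=5000, seed=1):
    """Average delay and average reward of the threshold policy over many runs."""
    rng = random.Random(seed)
    results = [forward_once(alpha, n_relays, rng) for _ in range(runs)]
    mean_delay = sum(d for d, _ in results) / runs
    mean_reward = sum(r for _, r in results) / runs
    return mean_delay, mean_reward


for alpha in (0.2, 0.5, 0.8):
    print(alpha, average_performance(alpha, n_relays=10))
```

Raising the threshold trades a longer average wait for a larger average reward, which is the tradeoff that the parameter $\eta$ controls through $\alpha$.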
2) Discussion: We have performed simulations to obtain the average values for the above policies for several values of $\eta$ ranging from 0.1 to 1000. In Fig. 9(a), we plot the average delays of the policies described above as a function of $\eta$. The average reward is plotted in Fig. 9(b).

Fig. 9. (a) Average delay as a function of $\eta$. (b) Average reward (progress) as a function of $\eta$. Both panels show the policies $\pi_{COMDP}$, $\pi_{INNER}$, $\pi_{OUTER}$, $\pi_{A\text{-}COMDP}$ and $\pi_{A\text{-}SIMPL}$, with $\eta$ on a logarithmic axis.

Both the average delay and the average reward are increasing in $\eta$. This is because for larger $\eta$ we value the progress more, so that we tend to wait longer to obtain better progress. For very small values of $\eta$, all the thresholds ($\{\phi_\ell\}$ and $\alpha$) are very small and, most of the time, the packet is forwarded to the first node to wake up (referred to as the First Forward policy in [7]). For very high values of $\eta$ the policies end up waiting for all the relays and then choose the one with the best reward (referred to as the Max Forward policy in [7]). Therefore, as $\eta$ increases, the average progress of all the policies (excluding $\pi_{A\text{-}COMDP}$) converges to $E[\max\{Z_1, \ldots, Z_N\}]$, which is about 0.82 (see Fig. 9(b)). However, the average progress for $\pi_{A\text{-}COMDP}$ converges to a value less than 0.82. This is because whenever $\overline{N} < N$ and $\eta$ is large (where all the thresholds $\{\phi_\ell\}$ are large), $\pi_{A\text{-}COMDP}$ ends up waiting for the first $\overline{N}$ relays and obtains a progress of $\max\{Z_1, \ldots, Z_{\overline{N}}\}$, which is less than (or equal to) the progress made by the other policies (which is $\max\{Z_1, \ldots, Z_N\}$).

Recall that the main problem we are interested in is the one in (11).
We should be comparing the average delay obtained using the above policies when the average reward provided by each of them is $\gamma$. This requires us, for each policy, to use an $\eta$ such that the average reward is equal to $\gamma$. Since we do not have any closed form expression for the average reward in terms of $\eta$, we proceed as follows. We fix a target $\gamma$. For each policy, we choose among the several average reward values (corresponding to the several $\eta$ values) the one that is closest to the target $\gamma$, and consider the corresponding average delay. For different targets $\gamma$, Tables II and III list such average progress and delay values, respectively, for the different policies.

| Target $\gamma$ | 0.6800 | 0.7200 | 0.7600 | 0.8000 |
|---|---|---|---|---|
| $E[R_{\pi_{COMDP}}]$ | 0.6840 | 0.7198 | 0.7612 | 0.8000 |
| $E[R_{\pi_{INNER}}]$ | 0.6822 | 0.7212 | 0.7600 | 0.8001 |
| $E[R_{\pi_{OUTER}}]$ | 0.6789 | 0.7208 | 0.7578 | 0.8003 |
| $E[R_{\pi_{A\text{-}COMDP}}]$ | 0.6773 | 0.7195 | 0.7590 | 0.8005 |
| $E[R_{\pi_{A\text{-}SIMPL}}]$ | 0.6819 | 0.7165 | 0.7585 | 0.7996 |

TABLE II. For a given target $\gamma$ (a column) and a policy (a row), the entry in the table corresponds to the average progress value that is closest to the target $\gamma$.

| Target $\gamma$ | 0.6800 | 0.7200 | 0.7600 | 0.8000 |
|---|---|---|---|---|
| $E[D_{\pi_{COMDP}}]$ | 0.2262 | 0.2711 | 0.3529 | 0.5012 |
| $E[D_{\pi_{INNER}}]$ | 0.2343 | 0.2905 | 0.3735 | 0.5450 |
| $E[D_{\pi_{OUTER}}]$ | 0.2359 | 0.2967 | 0.3756 | 0.5551 |
| $E[D_{\pi_{A\text{-}COMDP}}]$ | 0.2336 | 0.2954 | 0.3825 | 0.5997 |
| $E[D_{\pi_{A\text{-}SIMPL}}]$ | 0.2338 | 0.2823 | 0.3684 | 0.5415 |

TABLE III. For a given target $\gamma$ (a column) and a policy (a row), the entry in the table is the average delay value corresponding to the average progress value in Table II.

The entries in the first row of both the tables contain different values of the target $\gamma$ (namely, 0.68, 0.72, 0.76 and 0.8).
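The selection step described above, picking for each policy the operating point whose average reward is nearest the target, amounts to a one-line search. In the sketch below the function name is hypothetical, and the four (average progress, average delay) pairs reported for $\pi_{A\text{-}SIMPL}$ in Tables II and III stand in for a full sweep over $\eta$, whose individual results are not listed in the text.

```python
def closest_operating_point(sweep, target):
    """Return the (average reward, average delay) pair whose reward is nearest the target."""
    return min(sweep, key=lambda point: abs(point[0] - target))


# (average progress, average delay) pairs for pi_A-SIMPL from Tables II and III,
# used here as a stand-in for the full sweep over eta.
a_simpl_sweep = [(0.6819, 0.2338), (0.7165, 0.2823), (0.7585, 0.3684), (0.7996, 0.5415)]

reward, delay = closest_operating_point(a_simpl_sweep, target=0.76)
print(reward, delay)
```

A finer sweep over $\eta$ brings the selected average reward closer to the target, at the cost of more simulation runs.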
We will discuss the entries in the last column (i.e., the entries corresponding to the target $\gamma$ of 0.8). Reading the values from the last column of Table II, which contains the average progress values, we see that the average progress for all the policies is within $\pm 0.0005$ of 0.8 (for the other columns all the entries are within $\pm 0.005$ of the corresponding target $\gamma$). Hence it is reasonable to compare the delay values of the various policies in the last column of Table III. As expected, the COMDP policy obtains the lowest delay (0.5012). There is only a very small performance gap between the INNER and OUTER bound policies, i.e., the delay obtained by the INNER bound policy (0.5450) is slightly less than that of the OUTER bound policy (0.5551). The scheme A-COMDP, which simply assumes that the actual number of relays is the average of the initial belief, results in a higher delay (0.5997). Interestingly, we observe that the policy A-SIMPL, which was derived from the simplified model, performs very close to the INNER bound policy (with an average delay of 0.5415). The other columns can be read similarly. For small values of the target progress $\gamma$, we see similar performance for all the policies. These observations are for the particular case where the reward is simply the progress and the initial belief is truncated Poisson. In Appendix IV we show simulation results for other reward structures and initial beliefs. We observe similar behavior there as well.

B. End-to-End Performance

The single-hop problem considered by us was originally motivated by the end-to-end problem. In the geographical forwarding context, the end-to-end metrics of interest are the total delay and the hop count. Hop count is important because it is proportional to the number of transmissions and hence to the energy expended by the network.
These two metrics immediately motivate us to consider two extreme policies. One policy is for each node to transmit to the first neighbor in its forwarding set to wake up. The second policy is to wait for all the neighbors in the forwarding set to wake up and then transmit to the one that makes maximum progress towards the sink. It is reasonable to expect that the first policy will minimize the end-to-end delay while the second one will result in the least hop count. Hence there is a tradeoff between the two metrics.

Suppose we want to minimize the average total end-to-end delay subject to an average hop count constraint of $h$. Let $d$ be the distance between the source and the sink. Heuristically, we expect that the hop count constraint would be (approximately) met if each node, en route to the sink, contributes an average progress of $d/h$. For this average progress constraint, if each node now uses the locally optimal policy ($\pi_{COMDP}$), we expect the average delay at each hop to be minimized and, hence, to obtain close to optimal average total delay. Instead of the optimal policy, each node can use the policy $\pi_{A\text{-}SIMPL}$, since its one-hop performance is close to the optimum. Also, its application only requires a node $i$ to compute a simple threshold $\alpha_i$, unlike the other policies, for which the computation of the thresholds $\{\phi_\ell\}$ is intensive. Fig. 10 illustrates the multihop forwarding algorithm with each node using the locally derived threshold (obtained from the simplified model in Section VI) to forward. Next we briefly describe the network setting and the implemented policies.

Fig. 10. Each node en route to the sink uses the threshold obtained from the simplified model. $\alpha_i$ and $\alpha_j$ are the thresholds used by nodes $i$ and $j$ respectively.

1) Network Setting: First we fix a network by placing $M$ nodes randomly in $[0, L]^2$ where $L = 10$.
$M$ is sampled from $\mathrm{Poisson}(\lambda L^2)$ where $\lambda = 5$. Additional source and sink nodes are placed at the locations $(0, 0)$ and $(L, L)$ respectively. Further, we have considered a network realization where the forwarding set of each node is nonempty. The wake-up times of the nodes are sampled independently from $\mathrm{Uniform}([0, T])$ with $T = 1$. If the wake-up instant of a node $i$ is $T_i$, then it wakes up at the periodic instants $\{kT + T_i : k \geq 0\}$. The communication radius of each node is $r_c = 1$. The source is given a packet at time 0 and we are interested in routing this packet to the sink.

2) Implemented Policies (end-to-end): We also compare our work with that of Kim et al. [1], who have developed end-to-end delay optimal geographical forwarding in a network setting similar to ours. We first give a brief description of their work. They minimize, for a given network, the average delay from any node to the sink when each node $i$ wakes up asynchronously with rate $r_i$. They show that periodic wake-up patterns obtain the minimum delay among all sleep-wake patterns with the same rate. A relay node with a packet to forward transmits a sequence of beacon-ID signals. They propose an algorithm called LOCAL-OPT [13] which yields, for each neighbor $j$ of node $i$, an integer $h_j^{(i)}$ such that if $j$ wakes up and listens to the $h$-th beacon signal from node $i$ and if $h \leq h_j^{(i)}$, then $j$ will send an ACK to receive the packet from $i$. Otherwise (if $h > h_j^{(i)}$) $j$ will go back to sleep and $i$ will continue waiting for further neighbors to wake up. A configuration phase is required to run the LOCAL-OPT algorithm. To make a fair comparison with the work of Kim et al., in our network setting we also introduce beacon-ID signals of duration $t_I = 5$ msec and a packet transmission duration of $t_D = 30$ msec.
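The network setting of Section VII-B1 can be sketched as follows. This is an illustrative sketch rather than the authors' simulator: the function names are hypothetical, the Poisson sample is drawn by counting unit-rate exponential arrivals (one standard way to do it with only the standard library), and the sketch does not enforce the additional condition that every node's forwarding set be nonempty.

```python
import math
import random


def sample_poisson(mean, rng):
    """Number of points of a unit-rate Poisson process that fall in [0, mean]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def build_network(side=10.0, density=5.0, period=1.0, seed=0):
    """Return node positions (source first, sink last) and wake-up offsets T_i."""
    rng = random.Random(seed)
    n_nodes = sample_poisson(density * side * side, rng)
    positions = [(0.0, 0.0)]
    positions += [(rng.uniform(0.0, side), rng.uniform(0.0, side))
                  for _ in range(n_nodes)]
    positions.append((side, side))
    offsets = [rng.uniform(0.0, period) for _ in positions]
    return positions, offsets


def forwarding_set(node, positions, radius=1.0):
    """Indices of the neighbors of `node` (within `radius`) that are closer to the sink."""
    sink = positions[-1]
    here = positions[node]
    own_distance = math.dist(here, sink)
    return [j for j, other in enumerate(positions)
            if j != node
            and math.dist(here, other) <= radius
            and math.dist(other, sink) < own_distance]


positions, offsets = build_network(seed=3)
print(len(positions), forwarding_set(0, positions))
```

A node with wake-up offset `offsets[i]` is then awake at the instants $kT$ plus that offset, and a realization can be redrawn with a different seed until every forwarding set is nonempty, as required in the text.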
A description of the policies we have implemented is given below.
• $\pi_{FF}$ (First Forward): Each node, whenever it gets a packet, always transmits to the first neighbor in its forwarding set to wake up, irrespective of the progress made by it.
• $\pi_{MF}$ (Max Forward): We assume that each node knows the number of neighbors in its forwarding set. In this policy a node, when it gets a packet, waits for all of its neighbors in the forwarding set to wake up. Finally, when the last node wakes up, it forwards the packet to the one which achieves maximum progress towards the sink.
• $\hat{\pi}_{SF}$ (Simplified Forward): This end-to-end policy works by applying the $\pi_{A\text{-}SIMPL}$ policy at each hop. First we fix $\gamma$ as a network parameter (as mentioned before, $\gamma$ can be set to $d/h$). Nodes do not know the number of neighbors in their forwarding set. However, they know the node density and thus estimate this number as $[\lambda \times \text{forwarding set area}]$. Using this estimated number, a node considers the simplified model and comes up with a threshold $\alpha$ such that the average progress $E[R_\alpha]$ in (27) is equal to $\gamma$ (see also (28) and (29)). $E[R_\alpha]$ as a function of $\alpha$ is nondecreasing. Hence, for some node $i$, if $\gamma < E[R_0]$ then node $i$ chooses its threshold as 0, and if $\gamma > E[R_{r_c}]$ then node $i$ uses $r_c$ as its threshold. Suppose node $i$ has a packet to forward. When a neighbor of node $i$, say node $j$, wakes up and hears a beacon signal from $i$, it waits for the ID signal and then sends an ACK signal containing its location information. If the progress made by $j$ is more than the threshold, then $i$ forwards the packet to $j$ (the packet duration is $t_D = 30$ msec). If the progress made by $j$ is less than the threshold, then $i$ asks $j$ to stay awake if its progress is the maximum among all the nodes that have woken up thus far; otherwise $i$ asks $j$ to return to sleep.
If more than one node wakes up during the same beacon signal, then contentions are resolved by selecting the one which makes the most progress among them. In the simulation, this happens instantly (as also for the Kim et al. algorithm that we compare with); in practice this will require a splitting algorithm; see, for example, [30, Chapter 4.3]. We assume that within $t_I = 5$ msec all these transactions (beacon signal, ID, ACK and contention resolution, if any) are over. If there is no eligible node even after the $(T/t_I)$-th beacon signal (one case when this is possible is when the actual number of nodes $N$ is less than $[\lambda \times \text{forwarding set area}]$ and none of the nodes makes a progress of more than the threshold), then $i$ will select the one which makes the maximum progress among all nodes.

Fig. 11. End-to-end performance: Plot of average end-to-end delay vs. average end-to-end hop count obtained by applying the simple $\alpha$-threshold policy, $\pi_{A\text{-}SIMPL}$, at each hop (curves for $\pi_{SF}$ and $\hat{\pi}_{SF}$). The operating points of the policies $\pi_{FF}$, $\pi_{MF}$ and Kim et al. are also shown in the figure. Each point on the curve corresponds to a different value of $\gamma$, which increases along the direction shown.

• $\pi_{SF}$: This is the same as $\hat{\pi}_{SF}$, but here we assume that each node knows the exact number of neighbors in its forwarding set and uses this exact number to come up with the threshold $\alpha$. Unlike in the previous case, here if none of the neighbors of node $i$ makes a progress of more than the threshold used by $i$ then, knowing the number of neighbors, node $i$ chooses the neighbor with the best progress when the last one wakes up. $\pi_{FF}$ and $\pi_{MF}$ can be thought of as special cases of $\pi_{SF}$ with thresholds of 0 and $r_c$ respectively.
• Kim et al.
: We run the LOCAL-OPT algorithm [13] on the network and obtain the values $h_j^{(i)}$ for each pair $(i, j)$ where $i$ and $j$ are neighbors. We use these values to route from source to sink in the presence of sleep-wake cycling. Contentions, if any, are resolved (instantly, in the simulation) by selecting a node $j$ with the highest $h_j^{(i)}$ index.

3) Discussion: In Fig. 11 we plot average total delay vs. average hop count for the different policies for a fixed node placement; the averaging is over the wake-up times of the nodes. Each point on the curve is obtained by averaging over 1000 transfers of the packet from the source node to the sink. As expected, Kim et al. achieves the minimum average delay. In comparison with $\pi_{FF}$, Kim et al. also achieves a smaller average hop count. Notice, however, that by using the $\hat{\pi}_{SF}$ (or $\pi_{SF}$) policy and properly choosing $\gamma$, it is possible to obtain a hop count similar to that of Kim et al., incurring only a slightly higher delay. The advantage of $\hat{\pi}_{SF}$ over Kim et al. is that there is no need for a configuration phase. Each relay node only has to compute a threshold that depends on the parameter $\gamma$, which can be set as a network parameter during deployment. A more interesting approach would be to allow the source node to set $\gamma$ depending on the type of application. For delay sensitive applications it is appropriate to use a smaller value of $\gamma$ so that the delay is small, whereas for energy constrained applications (where the network energy needs to be conserved) it is better to use a large $\gamma$ so that the number of hops (and hence the number of transmissions) is reduced. For other applications, moderate values of $\gamma$ can be used. $\gamma$ can be a part of the ID signal so that it is made available to the next-hop relay.

Another interesting observation from Fig. 11 is that the performance of $\hat{\pi}_{SF}$ is close to that of $\pi_{SF}$.
In practice, it may not be possible for a node to know the exact number of relays in its forwarding set, due to varying channel conditions, node failures, etc. Recall that $\hat{\pi}_{SF}$ works with the average number of nodes instead of the actual number. For small values of $\gamma$ both the policies $\pi_{SF}$ and $\hat{\pi}_{SF}$, most of the time, transmit to the first node to wake up. Hence the performance is similar for small $\gamma$. For large $\gamma$, we observe that the delay incurred by $\hat{\pi}_{SF}$ is larger.

VIII. CONCLUSION

Our work in this paper was motivated by the problem of geographical forwarding of packets in a wireless sensor network whose function is to detect certain infrequent events and forward these alarms to a base station, and whose nodes are sleep-wake cycling to conserve energy. This end-to-end problem gave rise to the local problem faced by a packet forwarding node, i.e., that of choosing one among a set of potential relays, so as to minimize the average delay in selecting a relay subject to a constraint on the average progress (or some reward, in general). The source does not know the number of available relays, which made this a sequential decision problem with partial information. We formulated the problem as a finite horizon POMDP with the unknown state being the number of available relays. The optimum stopping set is the set of all pmfs on the number of relays for which the average cost of stopping is less than that of continuing. We showed that the optimum stopping set is convex (Corollary 1) and obtained threshold points along certain edges of the simplex which belong to the optimum stopping set. A convex combination of these points gave us an inner bound for the optimum stopping set (Theorem 1). We proved a monotonicity result and obtained an outer bound (Theorem 2). We also obtained a simple threshold rule by formulating an alternate simplified model (Section VI).
We have performed simulations to compare the performance of the various policies. We observe that the inner bound policy ($\pi_{INNER}$) is better than the outer bound policy ($\pi_{OUTER}$). Further, the performance of the simple threshold policy ($\pi_{A-SIMPL}$) is comparable with that of $\pi_{INNER}$, both of which are close to the optimal policy ($\pi_{COMDP}$). We have performed one-hop simulations for a few other examples where we have considered different rewards and initial beliefs (see Appendix IV). In all the examples, we observe the good performance of the policy $\pi_{A-SIMPL}$. We have devised simple end-to-end policies ($\pi_{SF}$ and $\hat{\pi}_{SF}$) using $\pi_{A-SIMPL}$. We have shown that by varying a network parameter these policies can favourably trade off between the average total delay and the average hop count.

REFERENCES

[1] J. Kim, X. Lin, and N. Shroff, “Optimal Anycast Technique for Delay-Sensitive Energy-Constrained Asynchronous Sensor Networks,” in INFOCOM 2009. The 28th Conference on Computer Communications. IEEE, April 2009, pp. 612–620.
[2] Q. Cao, T. Abdelzaher, T. He, and J. Stankovic, “Towards Optimal Sleep Scheduling in Sensor Networks for Rare-Event Detection,” in IPSN ’05: Proceedings of the 4th International Symposium on Information Processing in Sensor Networks, 2005.
[3] K. Premkumar, A. Kumar, and J. Kuri, “Distributed Detection and Localization of Events in Large Ad Hoc Wireless Sensor Networks,” in Communication, Control, and Computing, 2009. Allerton 2009. 47th Annual Allerton Conference on, Sept. 2009, pp. 178–185.
[4] G. Lu, N. Sadagopan, B. Krishnamachari, and A. Goel, “Delay Efficient Sleep Scheduling in Wireless Sensor Networks,” in IEEE INFOCOM, 2005, pp. 2470–2481.
[5] K. Akkaya and M. Younis, “A Survey on Routing Protocols for Wireless Sensor Networks,” Ad Hoc Networks, vol. 3, no. 3, pp. 325–349, 2005.
[6] M. Mauve, J. Widmer, and H. Hartenstein, “A Survey on Position-Based Routing in Mobile Ad-Hoc Networks,” IEEE Network, vol. 15, pp. 30–39, 2001.
[7] K. P. Naveen and A. Kumar, “Tunable Locally-Optimal Geographical Forwarding in Wireless Sensor Networks with Sleep-Wake Cycling Nodes,” in INFOCOM 2010. The 29th Conference on Computer Communications. IEEE, March 2010, pp. 1–9.
[8] J. H. Chang and L. Tassiulas, “Maximum Lifetime Routing in Wireless Sensor Networks,” IEEE/ACM Trans. Netw., vol. 12, no. 4, pp. 609–619, 2004.
[9] J. Xu, B. Peric, and B. Vojcic, “Performance of Energy-Aware and Link-Adaptive Routing Metrics for Ultra Wideband Sensor Networks,” Mob. Netw. Appl., vol. 11, no. 4, pp. 509–519, 2006.
[10] M. Zorzi and R. R. Rao, “Geographic Random Forwarding (GeRaF) for Ad Hoc and Sensor Networks: Multihop Performance,” IEEE Transactions on Mobile Computing, vol. 2, pp. 337–348, 2003.
[11] S. Liu, K. W. Fan, and P. Sinha, “CMAC: An Energy Efficient MAC Layer Protocol using Convergent Packet Forwarding for Wireless Sensor Networks,” in Sensor, Mesh and Ad Hoc Communications and Networks, 2007. SECON ’07. 4th Annual IEEE Communications Society Conference on, June 2007, pp. 11–20.
[12] V. Paruchuri, S. Basavaraju, A. Durresi, R. Kannan, and S. S. Iyengar, “Random Asynchronous Wakeup Protocol for Sensor Networks,” International Conference on Broadband Networks, vol. 0, pp. 710–717, 2004.
[13] J. Kim, X. Lin, and N. B. Shroff, “Optimal Anycast Technique for Delay Sensitive Energy-Constrained Asynchronous Sensor Networks,” 2008, Technical Report, Purdue University. [Online]. Available: http://web.ics.purdue.edu/~kim309/Kim08tech3.pdf
[14] M. Rossi, M. Zorzi, and R. R. Rao, “Statistically Assisted Routing Algorithms (SARA) for Hop Count Based Forwarding in Wireless Sensor Networks,” Wirel. Netw., vol. 14, no. 1, pp. 55–70, 2008.
[15] P. Chaporkar and A. Proutiere, “Optimal Joint Probing and Transmission Strategy for Maximizing Throughput in Wireless Systems,” Selected Areas in Communications, IEEE Journal on, vol. 26, no. 8, pp. 1546–1555, October 2008.
[16] N. B. Chang and M. Liu, “Optimal Channel Probing and Transmission Scheduling for Opportunistic Spectrum Access,” in MobiCom ’07: Proceedings of the 13th Annual ACM International Conference on Mobile Computing and Networking, 2007, pp. 27–38.
[17] M. Sakaguchi, “Dynamic Programming of Some Sequential Sampling Design,” Journal of Mathematical Analysis and Applications, vol. 2, no. 3, pp. 446–466, 1961.
[18] S. Karlin, Stochastic Models and Optimal Policy for Selling an Asset, Studies in Applied Probability and Management Science / edited by Kenneth J. Arrow, Samuel Karlin, Herbert Scarf. Stanford University Press, Stanford, Calif, 1962.
[19] B. K. Kang, “Optimal Stopping Problem with Double Reservation Value Property,” European Journal of Operational Research, vol. 165, no. 3, pp. 765–785, 2005.
[20] I. David and O. Levi, “A New Algorithm for the Multi-item Exponentially Discounted Optimal Selection Problem,” European Journal of Operational Research, vol. 153, no. 3, pp. 782–789, 2004.
[21] M. S. Ee, “Asset-Selling Problem with an Uncertain Deadline, Quitting Offer, and Search Skipping Option,” European Journal of Operational Research, vol. 198, no. 1, pp. 215–222, 2009.
[22] G. E. Monahan, “A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms,” Management Sciences, vol. 28, no. 1, pp. 1–16, January 1982.
[23] W. S. Lovejoy, “A Survey of Algorithmic Methods for Partially Observed Markov Decision Processes,” Ann. Oper. Res., vol. 28, no. 1-4, pp. 47–66, 1991.
[24] R. D. Smallwood and E. J. Sondik, “The Optimal Control of Partially Observable Markov Processes Over a Finite Horizon,” Operations Research, vol. 21, no. 5, pp. 1071–1088, 1973.
[25] J. M. Porta, N. Vlassis, M. T. J. Spaan, and P. Poupart, “Point-Based Value Iteration for Continuous POMDPs,” Journal of Machine Learning Research, vol. 7, pp. 2329–2367, 2006.
[26] R. W. Wolff, Stochastic Modeling and The Theory of Queues. Prentice Hall, NJ, 1989.
[27] H. A. David and H. N. Nagaraja, Order Statistics (Wiley Series in Probability and Statistics). Wiley-Interscience, August 2003.
[28] D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. I. Athena Scientific, 2005.
[29] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, March 2004.
[30] D. Bertsekas and R. Gallager, Data Networks. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1992.

APPENDIX I
PROOFS OF LEMMAS IN SECTION IV

A. Proof of Lemma 2

From (17) (the subscripts of both $U$ and $R$ in the following expressions are $K - \ell + 1$, which we have suppressed for simplicity),
\[
\begin{aligned}
\phi_\ell(w, b) &= \mathbb{E}_{K-\ell}\left[ \max\left\{ b, R, \phi_{\ell-1}\big(w + U, \max\{b, R\}\big) \right\} - \frac{U}{\eta} \,\middle|\, w, K \right] \\
&\geq \mathbb{E}_{K-\ell}\left[ \max\left\{ b, R, \phi_{\ell-1}\big(w + U, \max\{b, R\}\big) \right\} \,\middle|\, w, K \right] - \frac{T - w}{\eta} \\
&\geq b - \frac{T - w}{\eta},
\end{aligned}
\]
where the first inequality follows from $U \leq T - w$ and the second is due to the max inside the expectation.

B. Proof of Lemma 3

We proceed by value iteration. First we will show that the lemma holds for $k = n$, where the fixed $n$ could be either less than $K$ or equal to $K$ (recall that $K$ is the bound on the number of relays). Suppose $n < K$. Since $p_n(n) = 1$, from (15), it follows that $c_n(p_n, w, b) = T - w - \eta b$. Therefore,
\[
J_n(p_n, w, b) = \min\left\{ -\eta b,\; T - w - \eta b \right\} = -\eta b.
\]
If $n = K$ then $J_n(p_n, w, b) = g_K(p_n, w, b) = -\eta b$. Thus for any fixed $n$ we can write
\[
J_n(p_n, w, b) = \min\left\{ -\eta b,\; -\eta \phi_{n-n}(w, b) \right\}.
\]
Suppose for some $k = 1, \cdots, n-1$ the following holds,
\[
J_{k+1}(p_{k+1}, w, b) = \min\left\{ -\eta b,\; -\eta \phi_{n-(k+1)}(w, b) \right\}.
\]
Then,
\[
\begin{aligned}
c_k(p_k, w, b) &= \mathbb{E}_k\left[ U_{k+1} + J_{k+1}\Big( \tau_{k+1}(p_k, w, U_{k+1}),\, w + U_{k+1},\, \max\{b, R_{k+1}\} \Big) \,\middle|\, w, n \right] \\
&= \mathbb{E}_k\left[ U_{k+1} + \min\Big\{ -\eta \max\{b, R_{k+1}\},\; -\eta \phi_{n-(k+1)}\big( w + U_{k+1}, \max\{b, R_{k+1}\} \big) \Big\} \,\middle|\, w, n \right] \\
&= -\eta\, \mathbb{E}_k\left[ \max\Big\{ b, R_{k+1}, \phi_{n-(k+1)}\big( w + U_{k+1}, \max\{b, R_{k+1}\} \big) \Big\} - \frac{U_{k+1}}{\eta} \,\middle|\, w, n \right].
\end{aligned}
\tag{31}
\]
In the second equality we have used the induction hypothesis and the fact that if $p_k(n) = 1$ then $\tau_{k+1}(p_k, w, U_{k+1})(n) = 1$. The expectation in (31) is over the pdf $f_R(\cdot) f_k(\cdot | w, n)$. From (4), note that the pdf $f_k(\cdot | w, n)$ depends on $k$ and $n$ only through the difference $n - k$. Therefore $f_k(\cdot | w, n) = f_{K-(n-k)}(\cdot | w, K)$. Using this and (17) in (31) we can write
\[
\begin{aligned}
c_k(p_k, w, b) &= -\eta\, \mathbb{E}_{K-(n-k)}\left[ \max\Big\{ b, R, \phi_{n-(k+1)}\big( w + U, \max\{b, R\} \big) \Big\} - \frac{U}{\eta} \,\middle|\, w, K \right] \\
&= -\eta \phi_{n-k}(w, b).
\end{aligned}
\]
Finally, using (16), we can write
\[
J_k(p_k, w, b) = \min\left\{ -\eta b,\; -\eta \phi_{n-k}(w, b) \right\}.
\]
Hence we have proved that the lemma holds for $k$ if it is true for $k + 1$. Since we have already shown that the lemma holds for $n$, by induction we can conclude that it holds for all $k = 1, 2, \cdots, n$.

APPENDIX II
PROOF OF LEMMAS IN SECTION V

A. Proof of Lemma 4

The essence of the proof is the same as that in [25, Lemma 1]. We provide the proof here for completeness. The cost to continue at stage $K - 1$ is (see (15)),
\[
\begin{aligned}
c_{K-1}(p, w, b) &= p(K-1)\big( T - w - \eta b \big) + p(K)\, \mathbb{E}_{K-1}\left[ U_K - \eta \max\{b, R_K\} \,\middle|\, w, K \right] \\
&= p(K-1)\big( T - w - \eta b \big) + p(K)\big( -\eta \phi_1(w, b) \big).
\end{aligned}
\tag{32}
\]
Thus we have shown that $c_{K-1}(\cdot, w, b)$ is an affine function of $p \in \mathcal{P}_{K-1}$, for every $(w, b)$. Recalling (16), $J_{K-1}(\cdot, w, b)$, being the minimum of two affine functions, $-\eta b$ and $c_{K-1}(\cdot, w, b)$, is concave on $\mathcal{P}_{K-1}$. The proof now proceeds by induction.
Induction hypothesis: For some $k = 1, 2, \cdots, K-2$, and for each $(w, b)$, $J_{k+1}(\cdot, w, b)$ is concave on $\mathcal{P}_{k+1}$ and can be written as
\[
J_{k+1}(p, w, b) = \inf_{\alpha \in \mathcal{A}_{k+1}(w, b)} \langle \alpha, p \rangle = \left\langle \alpha_{k+1}^{(p, w, b)}, p \right\rangle, \tag{33}
\]
where $\mathcal{A}_{k+1}(w, b)$ is some collection of vectors of length $K - k$ and $\alpha_{k+1}^{(p, w, b)} = \arg\min_{\alpha \in \mathcal{A}_{k+1}(w, b)} \langle \alpha, p \rangle$. There are two points to note here. First, in general a concave function can be written as an infimum over some collection of affine functions of the form $\langle \alpha, p \rangle + c$, where $c$ is some constant. However, we claim that there are no such constants associated with the $\alpha$ vectors in the set $\mathcal{A}_{k+1}(w, b)$. Second, we are claiming the existence of the vector $\alpha_{k+1}^{(p, w, b)}$. Notice that both of these claims are true for stage $K - 1$, since the set $\mathcal{A}_{K-1}(w, b)$ comprises only two vectors, $\big( (T - w - \eta b),\, -\eta \phi_1(w, b) \big)$ and $(-\eta b, -\eta b)$, i.e., the induction hypothesis holds for $k + 1 = K - 1$.

To show that $J_k(\cdot, w, b)$ is concave on $\mathcal{P}_k$, it suffices to prove that $c_k(\cdot, w, b)$ is concave. $c_k$ in (15) can be written as
\[
\begin{aligned}
c_k(p, w, b) = {} & p(k)\big( T - w - \eta b \big) + \sum_{n=k+1}^{K} p(n)\, \mathbb{E}_k\left[ U_{k+1} \,\middle|\, w, n \right] \\
& + \sum_{n=k+1}^{K} p(n)\, \mathbb{E}_k\left[ J_{k+1}\Big( \tau_{k+1}(p, w, U_{k+1}),\, w + U_{k+1},\, \max\{b, R_{k+1}\} \Big) \,\middle|\, w, n \right].
\end{aligned}
\tag{34}
\]
Let us focus on the third term in the above summation. Call it $s_3$ for convenience.
\[
\begin{aligned}
s_3 &= \sum_{n=k+1}^{K} p(n) \int_0^{\overline{R}} \int_0^{T-w} f_R(r) f_k(u | w, n)\, J_{k+1}\Big( \tau_{k+1}(p, w, u),\, w + u,\, \max\{b, r\} \Big)\, du\, dr \\
&= \sum_{n=k+1}^{K} p(n) \int_0^{\overline{R}} \int_0^{T-w} f_R(r) f_k(u | w, n) \left\langle \alpha_{k+1}^{(\tau_{k+1}(p, w, u),\, w+u,\, \max\{b, r\})},\, \tau_{k+1}(p, w, u) \right\rangle du\, dr.
\end{aligned}
\tag{35}
\]
Substituting for $\tau_{k+1}(p, w, u)$ from (9) and simplifying yields
\[
\begin{aligned}
s_3 &= \sum_{n=k+1}^{K} p(n) \int_0^{\overline{R}} \int_0^{T-w} f_R(r) f_k(u | w, n) \sum_{n'=k+1}^{K} \alpha_{k+1}^{(\tau_{k+1}(p, w, u),\, w+u,\, \max\{b, r\})}(n')\, \frac{p(n') f_k(u | w, n')}{\sum_{\ell=k+1}^{K} p(\ell) f_k(u | w, \ell)}\, du\, dr \\
&= \sum_{n'=k+1}^{K} p(n') \int_0^{\overline{R}} \int_0^{T-w} f_R(r) f_k(u | w, n')\, \alpha_{k+1}^{(\tau_{k+1}(p, w, u),\, w+u,\, \max\{b, r\})}(n')\, du\, dr.
\end{aligned}
\tag{36}
\]
Define the vector $\alpha_k^{(p, w, b)}$ of length $K - k + 1$ as $\alpha_k^{(p, w, b)}(k) = (T - w - \eta b)$ and, for $n = k+1, \cdots, K$,
\[
\alpha_k^{(p, w, b)}(n) = \mathbb{E}_k\left[ U_{k+1} \,\middle|\, w, n \right] + \int_0^{\overline{R}} \int_0^{T-w} f_R(r) f_k(u | w, n)\, \alpha_{k+1}^{(\tau_{k+1}(p, w, u),\, w+u,\, \max\{b, r\})}(n)\, du\, dr. \tag{37}
\]
Then (34) can be written as
\[
c_k(p, w, b) = \left\langle \alpha_k^{(p, w, b)}, p \right\rangle.
\]
Now for any $q \neq p$, if we write down $\left\langle \alpha_k^{(q, w, b)}, p \right\rangle$, then it will have a term similar to $s_3$ (see (35) and (36)), but with $\alpha_{k+1}^{(\tau_{k+1}(p, w, u),\, w+u,\, \max\{b, r\})}$ replaced with $\alpha_{k+1}^{(\tau_{k+1}(q, w, u),\, w+u,\, \max\{b, r\})}$. Let us call this term $\hat{s}_3$. More precisely, $\left\langle \alpha_k^{(q, w, b)}, p \right\rangle$ will be similar to the RHS of (34), but with the third term there (recall that we had named the third term $s_3$) replaced by $\hat{s}_3$. Using (33) in (35) we observe that $\hat{s}_3 \geq s_3$, so that
\[
c_k(p, w, b) \leq \left\langle \alpha_k^{(q, w, b)}, p \right\rangle. \tag{38}
\]
Hence by defining $\mathcal{A}_k(w, b) := \left\{ \alpha_k^{(q, w, b)} : q \in \mathcal{P}_k \right\}$ we can write
\[
c_k(p, w, b) = \inf_{\alpha \in \mathcal{A}_k(w, b)} \langle \alpha, p \rangle,
\]
which proves that $c_k(\cdot, w, b)$ is concave. Finally, by including in the set $\mathcal{A}_k(w, b)$ the vector of length $K - k + 1$ with each component equal to $-\eta b$, we can express $J_k(p, w, b)$ as
\[
J_k(p, w, b) = \inf_{\alpha \in \mathcal{A}_k(w, b)} \langle \alpha, p \rangle.
\]

B. Proof of Lemma 6

Since $q$ has mass only on $k$ and $k + 1$, using Lemma 5 we can write
\[
c_k(q, w, b) = p(k)\big( T - w - \eta b \big) + p(k+1)\big( -\eta \phi_1(w, b) \big).
\]
Using (5) and (17), we obtain
\[
\phi_1(w, b) = \mathbb{E}\big[ \max\{b, R\} \big] - \frac{T - w}{2\eta}.
\]
Substituting for $\phi_1(w, b)$ in the above expression we have
\[
c_k(q, w, b) = p(k)\big( T - w - \eta b \big) + p(k+1)\left( \frac{T - w}{2} - \eta\, \mathbb{E}\big[ \max\{b, R\} \big] \right). \tag{39}
\]
Recall (15),
\[
c_k(p, w, b) = p(k)\big( T - w - \eta b \big) + \sum_{n=k+1}^{K} p(n)\, \mathbb{E}_k\left[ U_{k+1} + J_{k+1}\Big( \tau_{k+1}(p, w, U_{k+1}),\, w + U_{k+1},\, \max\{b, R_{k+1}\} \Big) \,\middle|\, w, n \right].
\]
Using (16) and (5) we can write
\[
\begin{aligned}
c_k(p, w, b) &\leq p(k)\big( T - w - \eta b \big) + \sum_{n=k+1}^{K} p(n)\, \mathbb{E}_k\left[ U_{k+1} - \eta \max\{b, R_{k+1}\} \,\middle|\, w, n \right] \\
&= p(k)\big( T - w - \eta b \big) + \sum_{n=k+1}^{K} p(n) \left( \frac{T - w}{n - k + 1} - \eta\, \mathbb{E}\big[ \max\{b, R\} \big] \right) \\
&\leq p(k)\big( T - w - \eta b \big) + \sum_{n=k+1}^{K} p(n) \left( \frac{T - w}{2} - \eta\, \mathbb{E}\big[ \max\{b, R\} \big] \right) \\
&= p(k)\big( T - w - \eta b \big) + \big( 1 - p(k) \big) \left( \frac{T - w}{2} - \eta\, \mathbb{E}\big[ \max\{b, R\} \big] \right) \\
&= c_k(q, w, b).
\end{aligned}
\]

APPENDIX III
PROOF OF LEMMAS IN SECTION VI

A. Proof of Lemma 7

Proof of 7.1: Let $F_R$ represent the cumulative distribution function (cdf) of $R$. For $b \in [0, \overline{R}]$, the cdf of $\max\{b, R\}$ is
\[
F_{\max\{b, R\}}(r) =
\begin{cases}
0 & \text{if } r < b \\
F_R(r) & \text{if } r \geq b,
\end{cases}
\]
using which $\beta(b)$ in (21) can be written as
\[
\beta(b) = \int_0^{\overline{R}} \Big( 1 - F_{\max\{b, R\}}(r) \Big)\, dr - \frac{T}{\eta \tilde{N}} = b + \int_b^{\overline{R}} \big( 1 - F_R(r) \big)\, dr - \frac{T}{\eta \tilde{N}}.
\]
$\beta'(b) = F_R(b) \geq 0$ and $\beta''(b) = f_R(b) \geq 0$ imply that $\beta$ is continuous, increasing and convex in $b$.

Proof of 7.2: From (7) note that $\beta(\overline{R}) < \overline{R}$. Also $\beta$ is convex (from Lemma 7.1). Hence we can write
\[
\beta(b) \leq \frac{\overline{R} - b}{\overline{R}}\, \beta(0) + \frac{b}{\overline{R}}\, \beta(\overline{R}) < b.
\]

Proof of 7.3: Let $g(b) = b - \beta_1(b)$. Then $g(0) \leq 0$ and $g(\overline{R}) > 0$ (because $\beta(\overline{R}) < \overline{R}$). Also $g(b)$ is continuous (being differentiable) on $[0, \overline{R}]$. Hence, $\exists$ an $\alpha \in [0, \overline{R})$ such that $g(\alpha) = 0$. Suppose $\exists$ an $\alpha' > \alpha$ such that $g(\alpha') = 0$. Then by convexity of $\beta$ (from Lemma 7.1),
\[
\beta(\alpha') \leq \frac{\overline{R} - \alpha'}{\overline{R} - \alpha}\, \beta(\alpha) + \frac{\alpha' - \alpha}{\overline{R} - \alpha}\, \beta(\overline{R}),
\]
which implies that $\beta(\overline{R}) \geq \overline{R}$. This contradicts the fact that $\beta(\overline{R}) < \overline{R}$.

Proof of 7.4: Again consider $g(b) = b - \beta(b)$. $g(b)$ is continuous (being differentiable) on $[0, \overline{R}]$.
Suppose $\exists$ $b \in (\alpha, \overline{R}]$ such that $\beta_1(b) > b$; then $g(b) \leq 0$ and $g(\overline{R}) > 0$. This implies that $\exists$ $b'$ in $[b, \overline{R})$ such that $g(b') = 0$. This contradicts the uniqueness of $\alpha$ shown in Lemma 7.3. Similarly it can be shown that $\beta(b) > b$ for $b \in [0, \alpha)$.

B. Proof of Lemma 8

The proof is by induction. From (24) and (21), we see that the result is already true for $k = \tilde{N} - 1$. Next we will prove it for $k = \tilde{N} - 2$. Let us evaluate the value function at stage $\tilde{N} - 2$ and simplify using the expression for $\tilde{J}_{\tilde{N}-1}$ (from (24)),
\[
\begin{aligned}
\tilde{J}_{\tilde{N}-2}(b) &= \min\left\{ -\eta b,\; \mathbb{E}\Big[ \tilde{U}_{\tilde{N}-1} + \tilde{J}_{\tilde{N}-1}\big( \max\{b, \tilde{R}_{\tilde{N}-1}\} \big) \Big] \right\} \\
&= \min\left\{ -\eta b,\; \mathbb{E}\Big[ \tilde{U}_{\tilde{N}-1} + \min\Big\{ -\eta \max\{b, \tilde{R}_{\tilde{N}-1}\},\; -\eta \beta_1\big( \max\{b, \tilde{R}_{\tilde{N}-1}\} \big) \Big\} \Big] \right\} \\
&= \min\left\{ -\eta b,\; \frac{T}{\tilde{N}} - \eta\, \mathbb{E}\Big[ \max\Big\{ b, \tilde{R}_{\tilde{N}-1}, \beta_1\big( \max\{b, \tilde{R}_{\tilde{N}-1}\} \big) \Big\} \Big] \right\} \\
&= \min\Big\{ -\eta b,\; -\eta \beta_2(b) \Big\},
\end{aligned}
\tag{40}
\]
where
\[
\beta_2(b) = \mathbb{E}\Big[ \max\Big\{ b, R, \beta_1\big( \max\{b, R\} \big) \Big\} \Big] - \frac{T}{\eta \tilde{N}}. \tag{41}
\]
$\beta_2(b) \geq \beta_1(b)$ easily follows because $\mathbb{E}\big[ \max\big\{ b, R, \beta_1( \max\{b, R\} ) \big\} \big] \geq \mathbb{E}\big[ \max\{b, R\} \big]$. Next, if $b \geq \alpha$ then from Lemmas 7.2 and 7.4 we have $\max\{b, R\} \geq \beta_1\big( \max\{b, R\} \big)$ so that $\max\big\{ b, R, \beta_1( \max\{b, R\} ) \big\} = \max\{b, R\}$. Therefore,
\[
\beta_2(b) = \mathbb{E}\big[ \max\{b, R\} \big] - \frac{T}{\eta \tilde{N}} = \beta_1(b). \tag{42}
\]
Hence we have shown that the Lemma holds for $\tilde{N} - 2$. Suppose that the Lemma (i.e., (25), (26) and the ordering property) holds for some $k$, $1 < k \leq \tilde{N} - 1$; then, following the same arguments which were used to obtain (40) and (41) (replace $\tilde{N} - 2$ by $k - 1$ and $\tilde{N} - 1$ by $k$), we can show that (25) and (26) hold for stage $k - 1$ as well. The ordering property can be easily shown to hold for stage $k - 1$ by using the ordering property for stage $k$.
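As an illustration of Lemma 7 (not part of the original simulations), the fixed point $\alpha$ of $\beta$ can be located numerically from the expression $\beta(b) = b + \int_b^{\overline{R}} (1 - F_R(r))\, dr - \frac{T}{\eta \tilde{N}}$ derived in the proof of Lemma 7.1. The following is only a minimal sketch under stated assumptions: the function names `beta`, `alpha_threshold` and `uniform_cdf`, the trapezoidal integration, the bisection tolerance, and the numerical values of $T$, $\eta$ and $\tilde{N}$ are all hypothetical choices made here for illustration. The bisection relies on $g(b) = b - \beta(b)$ being nondecreasing with $g(0) \leq 0 < g(\overline{R})$, as used in the proof of Lemma 7.3.

```python
def beta(b, cdf, r_max, c, steps=2000):
    """Evaluate beta(b) = b + integral_b^r_max (1 - F_R(r)) dr - c,
    where c stands for T / (eta * N_tilde)."""
    if b >= r_max:
        return r_max - c
    h = (r_max - b) / steps
    # Composite trapezoidal rule for the integral of the survival function.
    total = 0.5 * ((1.0 - cdf(b)) + (1.0 - cdf(r_max)))
    for i in range(1, steps):
        total += 1.0 - cdf(b + i * h)
    return b + h * total - c


def alpha_threshold(cdf, r_max, c, tol=1e-10):
    """Bisection for the root of g(b) = b - beta(b) on [0, r_max]."""
    def g(b):
        return b - beta(b, cdf, r_max, c)

    if g(0.0) > 0.0:
        # Outside the case g(0) <= 0 treated in the proof of Lemma 7.3;
        # returning 0 here is our own convention, not the paper's.
        return 0.0
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def uniform_cdf(r):
    """cdf of a reward uniformly distributed on [0, 1]."""
    return min(max(r, 0.0), 1.0)


# Hypothetical numbers: T = 1, eta = 1.25, N_tilde = 10, so c = 0.08.
alpha = alpha_threshold(uniform_cdf, r_max=1.0, c=1.0 / (1.25 * 10))
```

For rewards uniform on $[0, 1]$ the integral is $(1 - b)^2 / 2$, so the fixed point has the closed form $\alpha = 1 - \sqrt{2T / (\eta \tilde{N})}$; with the hypothetical values above this gives $\alpha = 0.6$, which the sketch should reproduce up to the bisection tolerance.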
APPENDIX IV
ONE-HOP PERFORMANCE FOR DIFFERENT REWARD DISTRIBUTIONS ($f_R$) AND INITIAL BELIEFS ($p_0$)

In Section VII-A we performed simulations to compare the one-hop performance of the various policies (recall the description of the implemented policies from Section VII-A1). There we had considered the context of geographical forwarding (which was the primary motivation for our work), so that the reward associated with a relay is the progress it makes towards the sink (see Fig. 8). Also, the initial belief we had considered was truncated Poisson (of mean $\lambda = 10$) with $K = 50$ (recall that $K$ is the bound on the number of relays). From Tables II and III we were able to draw the following conclusions. For large values of target $\gamma$,
• The average delay of $\pi_{A-SIMPL}$ and $\pi_{INNER}$ is close to that of $\pi_{COMDP}$, which is the optimal policy.
• The difference in the delays incurred by $\pi_{INNER}$ and $\pi_{OUTER}$ is small.
• $\pi_{A-COMDP}$ incurs a larger delay.
For smaller values of target $\gamma$, we see that all the policies incur similar average delay.

In this appendix, to comment more on these conclusions, we have performed simulations for a few other examples, with different pairs of reward distributions ($f_R$) and initial beliefs ($p_0$). In each of these examples, the good performance of the policy $\pi_{A-SIMPL}$ is observed. We have fixed $T = 1$ and normalized the rewards to take values within the interval $[0, 1]$ for all the examples. The first two examples extend the scenario of geographical forwarding mentioned earlier, while in the next two we simply take $R$ to have uniform and truncated Gaussian distributions, respectively. As in Section VII-A we discretize the state space and approximate all the pmfs with pdfs in simulations. For each example we tabulate the results (i.e., average reward values for a few values of target $\gamma$ and the corresponding average delays), which have the same explanation as for Tables II and III (see the explanation following Tables II and III).

EXAMPLE 1

• Reward: We consider the same scenario of geographical forwarding as in Section VII-A. Here we allow the reward to be a function of the progress. Let $Z_i$ be the progress made by relay $i$. Small values of $Z_i$ are not favourable because the packet does not make significant progress towards the sink. On the other hand, when $Z_i$ is large, the attenuation of the signal transmitted from the source to the relay will be large. This means a higher power is required to achieve a given packet error rate. Thus, we want to penalize both small and large values of $Z_i$. This motivates us to choose the reward function to be $R_i = -a_1 Z_i \log\left( \frac{Z_i}{a_2} \right)$. $R_i$ is maximum at $Z_i = \frac{a_2}{e}$. We have chosen $a_2 = 0.4e$. $a_1$ is a constant used to normalize the maximum reward value to $1$. Using $f_Z$ in (30) one can obtain $f_R$.
• Initial Belief: The bound on the number of relays is $K = 40$. The initial belief is truncated Poisson with parameter $5$, i.e., for $n = 1, 2, \cdots, K$, $p_0(n) = c\, \frac{5^n}{n!}\, e^{-5}$, where $c$ is the normalization constant.
Results are tabulated in Tables IV and V.

TABLE IV
EXAMPLE 1: TARGET $\gamma$ AND CORRESPONDING AVERAGE REWARDS

Target $\gamma$                 0.7800  0.8200  0.8600  0.9000  0.9400
$E[R_{\pi_{COMDP}}]$            0.7751  0.8164  0.8628  0.9018  0.9402
$E[R_{\pi_{INNER}}]$            0.7755  0.8195  0.8663  0.8991  0.9397
$E[R_{\pi_{OUTER}}]$            0.7826  0.8216  0.8593  0.9006  0.9407
$E[R_{\pi_{A-COMDP}}]$          0.7730  0.8184  0.8651  0.8986  0.9401
$E[R_{\pi_{A-SIMPL}}]$          0.7755  0.8166  0.8589  0.8992  0.9406

TABLE V
EXAMPLE 1: AVERAGE DELAYS CORRESPONDING TO AVERAGE REWARDS IN TABLE IV

Target $\gamma$                 0.7800  0.8200  0.8600  0.9000  0.9400
$E[D_{\pi_{COMDP}}]$            0.1950  0.2082  0.2340  0.2715  0.3649
$E[D_{\pi_{INNER}}]$            0.1963  0.2150  0.2471  0.2839  0.4005
$E[D_{\pi_{OUTER}}]$            0.1993  0.2168  0.2431  0.2865  0.4078
$E[D_{\pi_{A-COMDP}}]$          0.1963  0.2164  0.2499  0.2871  0.4153
$E[D_{\pi_{A-SIMPL}}]$          0.1963  0.2133  0.2411  0.2840  0.4056

EXAMPLE 2

• Reward: Again we consider the scenario of geographical forwarding. Let $Z_i$ be the progress made by a relay $i$ and $H_i$ be the (normalized) data rate from the source to the relay $i$. $H_i$ is a random variable which takes values from the set $\{0.2, 0.4, 0.6, 0.8, 1.0\}$. For small (large) values of $Z_i$ there is a high (low) probability that the data rates are good. Thus, as $Z_i$ increases we want the probability of $H_i$ taking larger values to decrease. Therefore, when $Z_i = z$ we set $P(H_i = h \,|\, Z_i = z) = a_z h e^{-dzh}$ for $h \in \{0.2, \cdots, 1.0\}$. $a_z h e^{-dzh}$, as a function of $h$, attains its maximum at $\frac{1}{dz}$, so that as $Z_i$ increases $H_i$ takes lower values with high probability. We have chosen $d = 10$. $a_z$ is a constant that normalizes the total probability to $1$. Finally, the reward associated with relay $i$ is $R_i = c_1 Z_i + c_2 H_i$. We choose $c_1 = c_2 = 0.5$.
• Initial Belief: $K = 30$ and $p_0$ is binomial with parameter $0.5$, i.e., for $n = 1, 2, \cdots, K$, $p_0(n) = c \binom{K}{n} 0.5^n$, where $c$ is the normalization constant. Such an initial belief is appropriate if initially, during deployment, the source had $K$ potential relays and, at the time when the source has a packet (which happens after a significant amount of time because the events are rare), the probability with which a relay has not failed is $0.5$ (we have ignored the case where all the relays have failed).
Results are tabulated in Tables VI and VII.

TABLE VI
EXAMPLE 2: TARGET $\gamma$ AND CORRESPONDING AVERAGE REWARDS

Target $\gamma$                 0.4500  0.5000  0.5500  0.6000  0.6500
$E[R_{\pi_{COMDP}}]$            0.4510  0.5058  0.5489  0.6002  0.6500
$E[R_{\pi_{INNER}}]$            0.4510  0.5050  0.5508  0.6013  0.6503
$E[R_{\pi_{OUTER}}]$            0.4510  0.5056  0.5506  0.5989  0.6500
$E[R_{\pi_{A-COMDP}}]$          0.4510  0.5066  0.5500  0.6002  0.6500
$E[R_{\pi_{A-SIMPL}}]$          0.4510  0.5088  0.5518  0.6013  0.6499

TABLE VII
EXAMPLE 2: AVERAGE DELAYS CORRESPONDING TO AVERAGE REWARDS IN TABLE VI

Target $\gamma$                 0.4500  0.5000  0.5500  0.6000  0.6500
$E[D_{\pi_{COMDP}}]$            0.0638  0.0830  0.1179  0.2075  0.4246
$E[D_{\pi_{INNER}}]$            0.0638  0.0835  0.1218  0.2155  0.4456
$E[D_{\pi_{OUTER}}]$            0.0638  0.0836  0.1225  0.2159  0.4582
$E[D_{\pi_{A-COMDP}}]$          0.0638  0.0843  0.1213  0.2146  0.4510
$E[D_{\pi_{A-SIMPL}}]$          0.0638  0.0858  0.1225  0.2159  0.4443

EXAMPLE 3

• Reward: $R$ is distributed uniformly on $[0, 1]$.
• Initial Belief: $K = 20$ and $p_0$ is binomial with parameter $0.5$.
Results are tabulated in Tables VIII and IX.

TABLE VIII
EXAMPLE 3: TARGET $\gamma$ AND CORRESPONDING AVERAGE REWARDS

Target $\gamma$                 0.7000  0.7500  0.8000  0.8500  0.9000
$E[R_{\pi_{COMDP}}]$            0.7093  0.7566  0.8030  0.8503  0.9000
$E[R_{\pi_{INNER}}]$            0.7102  0.7588  0.7984  0.8512  0.9001
$E[R_{\pi_{OUTER}}]$            0.7099  0.7523  0.8004  0.8500  0.8999
$E[R_{\pi_{A-COMDP}}]$          0.7135  0.7580  0.8040  0.8488  0.9001
$E[R_{\pi_{A-SIMPL}}]$          0.7119  0.7538  0.7968  0.8485  0.9009

TABLE IX
EXAMPLE 3: AVERAGE DELAYS CORRESPONDING TO AVERAGE REWARDS IN TABLE VIII

Target $\gamma$                 0.7000  0.7500  0.8000  0.8500  0.9000
$E[D_{\pi_{COMDP}}]$            0.1557  0.1846  0.2279  0.3033  0.5115
$E[D_{\pi_{INNER}}]$            0.1588  0.1910  0.2288  0.3180  0.5443
$E[D_{\pi_{OUTER}}]$            0.1594  0.1870  0.2339  0.3201  0.5515
$E[D_{\pi_{A-COMDP}}]$          0.1610  0.1909  0.2367  0.3136  0.5995
$E[D_{\pi_{A-SIMPL}}]$          0.1600  0.1872  0.2279  0.3107  0.5529

EXAMPLE 4

• Reward: Truncated Gaussian of mean $0.5$ and variance $1$, i.e., for $r \in [0, 1]$, $f_R(r) = \frac{c}{\sqrt{2\pi}}\, e^{-\frac{(r - 0.5)^2}{2}}$, where $c$ is the normalization constant.
• Initial Belief: $K = 15$ and $p_0$ is uniform on $\{1, 2, \cdots, K\}$.
Results are tabulated in Tables X and XI.

TABLE X
EXAMPLE 4: TARGET $\gamma$ AND CORRESPONDING AVERAGE REWARDS

Target $\gamma$                 0.6400  0.6800  0.7200  0.7600  0.8000
$E[R_{\pi_{COMDP}}]$            0.6500  0.6725  0.7208  0.7625  0.8001
$E[R_{\pi_{INNER}}]$            0.6487  0.6728  0.7240  0.7601  0.7997
$E[R_{\pi_{OUTER}}]$            0.6388  0.6807  0.7213  0.7600  0.7998
$E[R_{\pi_{A-COMDP}}]$          0.6259  0.6791  0.7225  0.7618  0.7997
$E[R_{\pi_{A-SIMPL}}]$          0.6302  0.6769  0.7146  0.7607  0.8009

TABLE XI
EXAMPLE 4: AVERAGE DELAYS CORRESPONDING TO AVERAGE REWARDS IN TABLE X

Target $\gamma$                 0.6400  0.6800  0.7200  0.7600  0.8000
$E[D_{\pi_{COMDP}}]$            0.2092  0.2222  0.2600  0.3060  0.3799
$E[D_{\pi_{INNER}}]$            0.2274  0.2473  0.2981  0.3478  0.4386
$E[D_{\pi_{OUTER}}]$            0.2307  0.2622  0.3031  0.3576  0.4460
$E[D_{\pi_{A-COMDP}}]$          0.2290  0.2689  0.3122  0.3740  0.4735
$E[D_{\pi_{A-SIMPL}}]$          0.2218  0.2577  0.2924  0.3532  0.4473
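
The initial beliefs used in Examples 1 and 4 are simple to construct, and the following minimal sketch shows one way to do so. It is an illustration only, not the code used for the simulations reported above; the function names `truncated_poisson_belief` and `uniform_belief` are hypothetical. The normalization constant $c$ is handled implicitly by dividing the unnormalized weights by their sum, and index `n - 1` of each returned list holds $p_0(n)$ for $n = 1, \cdots, K$.

```python
import math


def truncated_poisson_belief(rate, K):
    """Return [p0(1), ..., p0(K)] with p0(n) proportional to
    rate**n / n! * exp(-rate), i.e., a Poisson pmf restricted to 1..K."""
    weights = [rate**n / math.factorial(n) * math.exp(-rate)
               for n in range(1, K + 1)]
    total = sum(weights)  # equals 1 / c, the inverse normalization constant
    return [w / total for w in weights]


def uniform_belief(K):
    """Return [p0(1), ..., p0(K)] for a belief uniform on {1, ..., K}."""
    return [1.0 / K] * K


# Example 1: truncated Poisson with parameter 5 and bound K = 40.
p0_example1 = truncated_poisson_belief(5.0, 40)
# Example 4: uniform belief with bound K = 15.
p0_example4 = uniform_belief(15)
```

Because $n = 0$ is excluded from the support, the truncated Poisson constant is slightly larger than $1$; for parameter $5$ and $K = 40$ it is approximately $1 / (1 - e^{-5})$.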