Optimal Power Allocation for a Renewable Energy Source
Authors: Abhinav Sinha, Prasanna Chaporkar
Electrical Engineering Department, Indian Institute of Technology, Bombay, India. {abhinavsinha,chaporkar}@ee.iitb.ac.in

Abstract—Battery-powered transmitters face an energy constraint; replenishing their energy from a renewable energy source (such as solar or wind power) can lead to a longer lifetime. We consider here the problem of finding the power allocation that maximizes the rate of information transfer for a wireless transmitter under random channel conditions. A rechargeable battery, periodically charged by the renewable source, powers the transmitter. All of the above is formulated as a Markov Decision Process. Structural properties derived in this paper, such as the monotonicity of the optimal value and the optimal policy, are of vital importance in understanding the kinds of algorithms and approximations needed in real-life scenarios, and they can reduce the effect of the curse of dimensionality that is prevalent in dynamic programming problems. We show our results under the most general of assumptions.

Index Terms—Optimal reward function, Monotone optimal policy, Concavity, Stochastic domination.

I. INTRODUCTION

As we move towards hand-held devices that use wireless transmitters, there is a growing need to prolong the lifetime of their batteries without having to manually recharge them on a regular basis. One natural solution is to utilize the environment, i.e., to have a renewable energy source recharge the battery periodically; this enables the system to be self-sustaining. Renewable energy sources include solar power, wind energy, geothermal energy and ocean energy (tidal and wave). Our objective here is to maximize the throughput of a wireless transmitter equipped with a renewable energy source.
(A lot of work has also been done to optimize the performance of battery-powered sensors (see Chang [1], Hou [2]) and in the field of energy harvesting (see Ammar [3]). A recent paper has experimentally shown that it is possible to power a remote sensor via magnetic resonance without being in contact with the sensor; see Kurs [4].)

Renewable sources of energy are better modelled as random sources, due to the lack of control that we have over the source (for example, in wind energy the speed of the wind is not in our control). Thus the key challenges we face arise from randomness in the recharge energy from the renewable source and from randomness in the channel state. Also, since we have a battery, the maximum energy that can be stored at any point of time is limited; this is in contrast to having a constraint only on the average power used. There could be a case for not operating at energy levels close to the maximum, lest added energy go to waste, whereas randomness in the channel state could see the optimal policy conserving energy while waiting for a better channel. We address such trade-offs in this paper.

We model the problem of maximizing the throughput of a renewable-energy-powered wireless transmitter as an infinite horizon discounted reward Markov Decision Process (MDP). We will use the reward function (J*), which represents the overall throughput, to compare policies. An optimal policy for us means deciding what power to allocate for every possible value of battery state and channel state (defined together as states) so as to obtain the maximum overall reward (J*) for every state. MDP and dynamic programming solutions generally suffer from the "curse of dimensionality", because the state space tends to be exponential in one or more system parameters. That is the case in our problem as well.
Higher complexity solutions are not preferred, as they become a nightmare to implement. In such a case, having some structure on the solution is a big advantage implementation-wise, not to mention the added analytic tractability of the problem. Our contribution here is to prove the non-decreasing nature of the optimal policy w.r.t. the states. Our proofs rely only on standard results and techniques used in MDPs. Monotonicity of the optimal policy is also important because it tells us how the structure of the system is impervious to various situations, such as having different probability distributions on the channel state and on the recharge energy. Once we have proven that the optimal policy is non-decreasing, the search space automatically reduces. Moreover, on this basis we can also try to obtain threshold behaviour (approximately, if suitable), which gives us a chance to make the implementation real-time.

As far as structural properties go, monotonicity of the optimal policy is one of the most basic results, and hence there has been a plethora of work on the matter. One of the earliest methods to prove monotonicity was provided by Serfozo [5]. In his book [6], Martin Puterman has also provided sufficient conditions for the same; here, however, we approach the problem in a different manner (we show results based on properties of J* rather than the Transition Probability Matrix). There has also been a lot of work on optimal policies for rechargeable sensors, but with different considerations: in [7] we can find a policy which takes into account not only the rate of information transfer but also the actual throughput of the queued data. Similarly, in [8], the authors have dealt with the finite horizon equivalent and have given an on-line policy which can guarantee a fraction of the optimal throughput.

After defining the problem, we set up the equations for finding the solution in Section II.
In Section III we begin by proving results about the monotonicity (non-decreasing nature) and concavity of J*, and then move on to our main result, where we prove that the optimal power allocation function is non-decreasing. Once we have our main structural result, we discuss possible generalizations of this framework. In Section IV we present simulation results that verify our result and examine the effects of varying system parameters, and we conclude by noting some of the work that is being taken up.

II. FORMALISM

A. System definition

We consider a system consisting of one receiver and one transmitter communicating over a wireless channel; moreover, a fading channel is assumed. For a fading wireless channel, the maximum rate of information transfer, i.e., the capacity of the channel (due to Shannon [9]), is

C = log(1 + SNR),   SNR = Ph / (N₀W)

where P is the transmitted power, h is the channel-fade coefficient and N₀W is the noise spectral density (SNR is thus the signal-to-noise ratio). The channel-fade coefficient h ∈ H = {e₁, e₂, ..., e_N} is distributed according to the known probability distribution P_H(·). We assume a memoryless channel, and H represents the set of possible channel states, where e_i < e_j for i < j.

On the transmitter side, power is provided by a rechargeable battery with finite capacity to store energy (this could model remote sensors placed in obscure areas which can be recharged periodically using only renewable sources like wind and solar energy, and which have a limited capacity to store energy). Our main aim is to find the optimal power allocation policy for this system, i.e., the rule by which power is to be used for data transmission, in terms of the other parameters of the system, so as to obtain the maximum rate of information transfer. Time is considered to be slotted and we also assume full channel-side information (CSI).
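The capacity expression above can be evaluated numerically. The following is a minimal sketch: a natural logarithm is assumed, and the value N₀W = 10 is borrowed from the simulation section while the other inputs are purely illustrative.

```python
import math

def capacity(P, h, N0W=10.0):
    """Channel capacity C = log(1 + SNR) with SNR = P*h/(N0*W).
    Natural log is assumed, so the rate is in nats per channel use."""
    return math.log(1.0 + P * h / N0W)

# Capacity grows with transmit power and with the fade coefficient:
assert capacity(4, 2) > capacity(2, 2) > capacity(0, 2) == 0.0
assert capacity(4, 3) > capacity(4, 2)
```

Note the diminishing returns of the log: at high SNR, doubling P increases C by much less than a factor of two, which is what drives the concavity arguments used later in the paper.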
So we have perfect channel state information before transmission in every slot. Let the energy in the battery at the beginning of the n-th time slot be ξ_n and the power allocated in that slot be P_n (energy per slot). We will use the random variable X_n to model the amount of recharge energy added to the battery at the end of the n-th slot by the renewable source. Note that the process {X_n}_{n≥1} is assumed to be i.i.d., and the random variable has finite support in the set {0, 1, ..., a}. All our variables are over non-negative integers. (For the exact distribution of X in the case of solar energy, refer to the model in [10].) Using these we can write our system equation

ξ_{n+1} = min{ (ξ_n − P_n)⁺ + X_n , ξ_m }    (1)

where (x)⁺ = max{x, 0} and ξ_m is the maximum energy that can be stored in the battery.

B. Markov Decision Process formulation

To solve this problem we formulate it as an infinite horizon Markov Decision Process (MDP). The state space S will be two-dimensional; a typical state is (ξ, h), which represents the current energy in the battery and the current channel-fade coefficient. Hence the size of the state space is |S| = (ξ_m + 1) × N (note that the energy in the battery can be 0). The valid action space (power allocation) for the state (ξ, h) is P ∈ {0, 1, ..., ξ}, because at any time we can allocate at most all the power available in the battery, and we can also choose to allocate zero power (with this, the (·)⁺ in the system equation becomes redundant). The union of all action spaces is A = {0, 1, ..., ξ_m}. We will consider discounted rewards with a constant discount factor λ ∈ (0, 1). Our reward function r : S × A → R₀⁺ is

r((ξ, h), P) = log(1 + hP / (N₀W)).

Now we define the optimal reward function J* : S → R₀⁺ as the optimal value for each state that we start with.
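To make the formulation concrete, the MDP above can be solved by standard value iteration. The following is a small sketch: the dynamics follow (1) and the reward is the log expression above, but every numeric parameter here (ξ_m = 5, a three-state channel, the recharge distribution, λ = 0.85, N₀W = 10) is an illustrative assumption, not the paper's simulation setting.

```python
import math

# Illustrative parameters (assumptions, not the paper's values).
xi_m, N0W, lam = 5, 10.0, 0.85        # battery capacity, noise level, discount
H = [1.0, 2.0, 3.0]                   # fade coefficients e1 < e2 < e3
pH = [0.3, 0.4, 0.3]                  # channel distribution P_H
pX = {0: 0.5, 1: 0.3, 2: 0.2}         # recharge distribution on X

def reward(h, P):
    # r((xi, h), P) = log(1 + h*P / (N0*W))
    return math.log(1.0 + h * P / N0W)

def backup(J, xi, hi, P):
    # One-step value of power P in state (xi, h): reward plus the
    # discounted expectation over recharge X and next channel h'.
    return reward(H[hi], P) + lam * sum(
        px * ph * J[(min(xi - P + x, xi_m), hj)]
        for x, px in pX.items() for hj, ph in enumerate(pH))

# Value iteration starting from J0 = 0; this converges because the
# per-stage reward is bounded and the state/action spaces are finite.
J = {(xi, hi): 0.0 for xi in range(xi_m + 1) for hi in range(len(H))}
for _ in range(300):
    J = {(xi, hi): max(backup(J, xi, hi, P) for P in range(xi + 1))
         for xi in range(xi_m + 1) for hi in range(len(H))}

def mu(xi, hi):
    # Optimal decision rule: a maximizing power (ties broken toward larger P).
    return max(reversed(range(xi + 1)), key=lambda P: backup(J, xi, hi, P))

# Spot-check Lemma 1: J* is non-decreasing in battery level and channel state.
assert all(J[(xi, hi)] <= J[(xi + 1, hi)] + 1e-12
           for xi in range(xi_m) for hi in range(len(H)))
assert all(J[(xi, hi)] <= J[(xi, hi + 1)] + 1e-12
           for xi in range(xi_m + 1) for hi in range(len(H) - 1))
```

The nested loop over states, actions, recharge values and next channels is exactly where the curse of dimensionality appears once ξ_m and N grow, which is what motivates the structural results that follow.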
The Transition Probability Matrix (TPM), P{(ξ′, h′) | (ξ, h), P}, represents the probability of getting to some state (ξ′, h′) starting from (ξ, h) and taking action P. Using all of the above we can write the Bellman equation of dynamic programming as

J*(ξ, h) = max_{P ≤ ξ} { log(1 + hP/(N₀W)) + λ Σ_{ξ′=ξ−P}^{ξ_m} Σ_{h′=1}^{N} P{(ξ′, h′) | (ξ, h), P} · J*(ξ′, h′) },

which we will write succinctly (using s ≡ (ξ, h) as the state) as

J*(s) = max_{P ≤ ξ} { r(s, P) + λ E_{X,h′}[ J*(f(s, P), h′) ] }    (2)

where f represents the r.h.s. of (1). A policy for this system is a map from the state space to the action space for each epoch, but as this is an infinite horizon MDP we will only look at stationary deterministic policies to obtain the maximum throughput. The optimal policy for our problem will thus be of the form π* = {µ*, µ*, ...}; for convenience we call it policy µ*. The equation for the optimal decision rule µ* : S → A is then

µ*(s) = arg max_{P ≤ ξ} { r(s, P) + λ E_{X,h′}[ J*(f(s, P), h′) ] }.

With this our formulation of the problem is complete, and we can now move towards some of the results.

III. RESULTS

Here we prove structural results about the monotonicity of J* and µ* for our optimal power allocation problem, which we have formulated as an MDP.

In the previous section we wrote the Bellman equation for our MDP; one way to solve it is the value iteration procedure (refer to the book by Bertsekas [11]). For this we start with an initial value (estimate) of the optimal reward function, say J₀(s) = 0 ∀ s ∈ S, and then write the iteration equations as

J_{k+1}(s) = max_{P ≤ ξ} { r(s, P) + λ E_{X,h′}[ J_k(f(s, P), h′) ] }    (3)

where s = (ξ, h). From the theory of infinite horizon discounted reward MDP problems we know that this converges (to J*) under the condition of bounded reward per stage, which is satisfied in our case: the reward function is bounded, and the action space and state space are finite due to the discrete nature of our formulation.

A. Preliminary Results

Here we state and prove lemmas which will be required later to prove the main theorem.

Lemma 1 (Monotone Optimal Reward Function). The optimal reward function J*(ξ, h) is non-decreasing in both arguments. There are two parts:
1) for any ξ ∈ {0, ..., ξ_m}, J*(ξ, h⁺) ≥ J*(ξ, h⁻) where h⁺ > h⁻;
2) for any h ∈ H, J*(ξ⁺, h) ≥ J*(ξ⁻, h) where ξ⁺ > ξ⁻.

Proof (Part 1): Take any ξ and consider channel states h⁻ and h⁺ where h⁺ > h⁻. Notice that, as the channel process is i.i.d., the channel transitions are independent of each other; specifically, the future channels are independent of the current channel state, so the second term in (2) for J*(ξ, h⁺) and J*(ξ, h⁻) is identical (as a function of P). Take P⁻ = µ*(ξ, h⁻); by using (2) at this power we have

J*(ξ, h⁺) − J*(ξ, h⁻) ≥ log(1 + h⁺P⁻/(N₀W)) − log(1 + h⁻P⁻/(N₀W)) ≥ 0.    (4)

Proof (Part 2): Take any h and consider ξ⁺ and ξ⁻ where ξ⁺ > ξ⁻. Starting value iteration with J₀(s) = 0 ∀ s ∈ S, we use induction (over the steps of value iteration) to prove the result. The base case is vacuously true. Now assume that J_k(ξ, h) is non-decreasing in ξ, and let P_k⁻ maximize the r.h.s. of (3) for the state (ξ⁻, h). From the iteration equations, at power P = P_k⁻ and with D = J_{k+1}(ξ⁺, h) − J_{k+1}(ξ⁻, h), we have

D ≥ λ E_{X,h′}[ J_k(f(ξ⁺, P), h′) − J_k(f(ξ⁻, P), h′) ].    (5)

Since ξ⁺ > ξ⁻, for the same power P_k⁻ we have f(ξ⁺) ≥ f(ξ⁻) (for every instance of X). By the induction hypothesis J_k(ξ, h) is non-decreasing in ξ, hence the term inside the expectation in (5) is non-negative (for every instance of X and h′), and after taking the expectation we have J_{k+1}(ξ⁺, h) ≥ J_{k+1}(ξ⁻, h). By induction this holds ∀ k ∈ Z⁺, and the result follows by taking lim_{k→∞}.

The above lemma can be written compactly as

J*(ξ⁺, h⁺) ≥ J*(ξ⁻, h⁻) ∀ ξ⁺ ≥ ξ⁻, h⁺ ≥ h⁻.

Now that we have shown the non-decreasing nature of the optimal reward function, another property that will go a long way in proving our final result is the concavity of J*. Typically, concavity (convexity), and equivalently sub-modularity (super-modularity), has been the most used method to prove monotonicity of a policy. So here, with the help of a little extra set-up, we prove the important property of concavity of J* in energy only.

Lemma 2 (Concave Optimal Reward Function). The optimal reward function J*(ξ, h) is concave in ξ for a fixed h.

Proof: We use induction on the value iteration steps, just like before, and first show that concavity of J_k implies concavity of J_{k+1}. Assuming J_k is concave, we take states

s₁ = (ξ₁, h),  s₂ = (ξ₂, h),  s̄ = (ξ̄, h),  where ξ̄ = αξ₁ + (1 − α)ξ₂ (0 < α < 1).

Taking the optimal powers for this step of the iteration as P₁ and P₂, we can write

J_{k+1}(s₁) = r(s₁, P₁) + λ E_{X,h′}[ J_k(f(s₁, P₁), h′) ],
J_{k+1}(s₂) = r(s₂, P₂) + λ E_{X,h′}[ J_k(f(s₂, P₂), h′) ].

We know that the log(·) reward here is a concave function of P and is constant w.r.t. variation in ξ, hence we have

αr(s₁, P₁) + (1 − α)r(s₂, P₂) ≤ r(s̄, P̄)    (6)

where P̄ = αP₁ + (1 − α)P₂, and s̄ can be used because it has the same channel coefficient h as s₁ and s₂.
By the induction hypothesis J_k is concave as well, so

αJ_k(f(s₁, P₁), h′) + (1 − α)J_k(f(s₂, P₂), h′) ≤ J_k( αf(s₁, P₁) + (1 − α)f(s₂, P₂), h′ ).    (7)

Beyond this point we divide the problem into cases, depending on the values of X.

Case 1: all X such that f(s₁, P₁), f(s₂, P₂) < ξ_m. Then

αf(s₁, P₁) + (1 − α)f(s₂, P₂) = αξ₁ + (1 − α)ξ₂ − (αP₁ + (1 − α)P₂) + X = ξ̄ − P̄ + X = f(s̄, P̄),

where the last equality follows since the argument in this case is clearly < ξ_m. Hence, continuing from (7), we can write

αJ_k(f(s₁, P₁), h′) + (1 − α)J_k(f(s₂, P₂), h′) ≤ J_k( f(s̄, P̄), h′ ).    (8)

Using (6) and (8) we can thus write

αJ_{k+1}(s₁) + (1 − α)J_{k+1}(s₂) ≤ r(s̄, P̄) + λ J_k( f(s̄, P̄), h′ ).    (9)

Case 2: all X such that f(s₁, P₁) = ξ_m = f(s₂, P₂). Then α(ξ₁ − P₁ + X) + (1 − α)(ξ₂ − P₂ + X) ≥ ξ_m, so f(s̄, P̄) = ξ_m and hence we can write

J_k( αf(s₁, P₁) + (1 − α)f(s₂, P₂), h′ ) = J_k(ξ_m, h′) = J_k( f(s̄, P̄), h′ ),

from which the same result as in (9) follows.

Case 3: all X such that f(s₂, P₂) < f(s₁, P₁) = ξ_m. Then ξ₂ − P₂ + X < ξ_m = ξ₁ − P₁ + X − β for some β ≥ 0, and

αf(s₁, P₁) + (1 − α)f(s₂, P₂) = ξ̄ − P̄ + X − αβ.    (10)

Clearly the term on the r.h.s. of (10) is less than ξ_m and is also ≤ ξ̄ − P̄ + X, so we can conclude

ξ̄ − P̄ + X − αβ ≤ min{ ξ̄ − P̄ + X, ξ_m } = f(s̄, P̄).

Since J_k is non-decreasing in energy (shown in the proof of Lemma 1), we can conclude the same as in (8), and from there (9) as well. This exhausts the cases.
From these three cases we have seen that (9) is satisfied for all h′ and all possible values of X, hence we can introduce the E(·) operator and conclude

αJ_{k+1}(s₁) + (1 − α)J_{k+1}(s₂) ≤ r(s̄, P̄) + λ E_{X,h′}[ J_k( f(s̄, P̄), h′ ) ] ≤ J_{k+1}(s̄),

where the last inequality holds because P̄ can generate a value only less than or equal to the optimal value for state s̄ (at the (k+1)-th iteration). We have thus shown that concavity of J_k implies concavity of J_{k+1}; starting with a concave initial value of the iteration such as J₀(s) = 0 ∀ s ∈ S, we conclude by induction that J_k is concave in ξ ∀ k ∈ Z⁺. Hence, as value iteration converges, J* is concave in ξ.

Corollary 1. If we have energy levels x ≤ w ≤ z ≤ y such that

x + y = w + z    (11)

then

J*(x, h) + J*(y, h) ≤ J*(w, h) + J*(z, h).

Proof: For a fixed h define J*(ξ, h) ≡ g(ξ), and let ∆g(i) = g(i + 1) − g(i). Then we can write

g(x) + g(y) = 2g(x) + Σ_{i=x}^{y−1} ∆g(i),
g(w) + g(z) = 2g(x) + Σ_{i=x}^{w−1} ∆g(i) + Σ_{i=x}^{z−1} ∆g(i).

As J* is concave in energy, we know that ∆g(i) is non-increasing in i (following the "law of diminishing returns" for concave functions). The summations in both equations above have the same number of terms (due to (11)), and the first equation clearly sums ∆g(i) over higher values of i and is therefore smaller. This property is called sub-modularity.

B. Main Structural Result

Now we prove the main structural result with the aid of the lemmas of the previous subsection.

Theorem 1 (Monotone Optimal Policy). The optimal power allocation policy µ*(ξ, h) is non-decreasing in both arguments. There are two parts:
1) for any ξ ∈ {0, ..., ξ_m}, µ*(ξ, h⁺) ≥ µ*(ξ, h⁻) where h⁺ > h⁻;
2) for any h ∈ H, µ*(ξ⁺, h) ≥ µ*(ξ⁻, h) where ξ⁺ > ξ⁻.

Proof (Part 1): Consider two channel states h⁻ and h⁺ where h⁺ > h⁻. We can write

µ*(ξ, h⁺) = arg max_{P ≤ ξ} { log(1 + h⁺P/(N₀W)) − log(1 + h⁻P/(N₀W)) + log(1 + h⁻P/(N₀W)) + λ E_{X,h′}[ J*(f(ξ, P), h′) ] }.

Since the last term is independent of h⁺, we have µ*(ξ, h⁺) = arg max_{P ≤ ξ} { T₁ + T₂ }, where

T₁ = log(1 + h⁺P/(N₀W)) − log(1 + h⁻P/(N₀W))

and T₂ is the full term that appears inside the max operator in the expression for µ*(ξ, h⁻), which means that T₂ achieves its maximum at P_{h⁻} = µ*(ξ, h⁻). Notice that T₁ is monotonically increasing in P, since

dT₁/dP = N₀W(h⁺ − h⁻) / ( (N₀W + h⁺P)(N₀W + h⁻P) ) > 0.    (12)

At any P < P_{h⁻}, the term T₁ has a smaller value than at P_{h⁻} (because it is monotonically increasing), and the same holds for T₂ (because its maximum is at P_{h⁻}). Hence {T₁ + T₂} cannot achieve its maximum at any P < P_{h⁻}, and we conclude µ*(ξ, h⁺) ≥ µ*(ξ, h⁻).

Proof (Part 2): First note that

ξ′ < ξ_m ⇒ P{ξ′ | ξ, P} = P{X = ξ′ − ξ + P},
ξ′ = ξ_m ⇒ P{ξ_m | ξ, P} = P{X ≥ ξ_m − ξ + P}.

From the above we can write the second term in J* as

Σ_{ξ′=ξ−P}^{ξ_m} Σ_{h′=1}^{N} P{h′} · P{ξ′ | ξ, P} · J*(ξ′, h′)
≡ E_{h′}[ Σ_{i=0}^{ξ_m−ξ+P−1} P{X = i} · J*(ξ − P + i, h′) + P{X ≥ ξ_m − ξ + P} · J*(ξ_m, h′) ].

But we can write P{X ≥ ξ_m − ξ + P} in terms of the summation preceding it, hence we will have

J*(ξ, h) = λ E_{h′}[ J*(ξ_m, h′) ] + max_{P ≤ ξ} { log(1 + Ph/(N₀W)) − λ E_{h′}[ Σ_{i=0}^{ξ_m−ξ+P−1} P{X = i} · ( J*(ξ_m, h′) − J*(ξ − P + i, h′) ) ] }.    (13)

Now we use contradiction to prove our result, i.e., we assume that there exist states ξ₁ > ξ₂ with optimal powers P₁ < P₂. Let J_P(ξ, h) represent the r.h.s. term in (2), evaluated at power P.
Then, due to the optimality of P₂ with ξ₂ and of P₁ with ξ₁, we will have

J_{P₂}(ξ₂, h) − J_{P₁}(ξ₂, h) ≥ 0,
J_{P₁}(ξ₁, h) − J_{P₂}(ξ₁, h) ≥ 0.

Adding the two inequalities, with the help of (13) and using g(ξ) ≡ J*(ξ, h) as well as p_i ≡ P{X = i}, gives us

E_{h′}[ Σ_{i=0}^{κ₁₁} p_i A(i) + Σ_{i=κ₁₁+1}^{κ₁₂} p_i B(i) + Σ_{i=κ₁₂+1}^{κ₂₁} p_i C(i) + Σ_{i=κ₂₁+1}^{κ₂₂} p_i D(i) ] ≥ 0    (14)

for κ_{ij} = ξ_m − y_{ij} − 1, y_{ij} = ξ_i − P_j, i, j ∈ {1, 2}, and

A(i) = g(y₁₁ + i) + g(y₂₂ + i) − g(y₁₂ + i) − g(y₂₁ + i),
B(i) = g(ξ_m) + g(y₂₂ + i) − g(y₁₂ + i) − g(y₂₁ + i),
C(i) = g(y₂₂ + i) − g(y₂₁ + i),
D(i) = −g(ξ_m) + g(y₂₂ + i).

In breaking the above summations appropriately we have assumed w.l.o.g. that κ₁₂ ≤ κ₂₁, which means κ₁₁ ≤ κ₁₂ ≤ κ₂₁ ≤ κ₂₂ and y₁₁ ≥ y₁₂ ≥ y₂₁ ≥ y₂₂. We will argue that (14) is a contradiction; the following calculations hold for every h′.

Simply by our construction, y₂₂ ≤ y₁₂, y₂₁ ≤ y₁₁ and y₁₁ + y₂₂ = (ξ₁ + ξ₂) − (P₁ + P₂) = y₂₁ + y₁₂, so by Corollary 1, A(i) ≤ 0 ∀ i. We know that g is non-decreasing (Lemma 1); as y₂₂ ≤ y₂₁ we have C(i) ≤ 0 ∀ i. Since the range of summation for D(i) is such that y₂₂ + i ≤ ξ_m, we also have D(i) ≤ 0 ∀ i. Now, looking at B(i), define the successive differences ∆g(l) = g(l + 1) − g(l) (using the same method as in Corollary 1); due to the concavity of J* (Lemma 2) this is non-increasing. We can express g(ξ_m), g(y₁₂ + i) and g(y₂₁ + i) as summations of ∆g starting from g(y₂₂ + i). We then see that g(ξ_m) + g(y₂₂ + i) has fewer ∆g terms in its summation compared to g(y₁₂ + i) + g(y₂₁ + i), and those ∆g(l) terms are also smaller since they are summed over higher l. Since ∆g is non-negative, we can conclude that B(i) ≤ 0 ∀ i.
So from all this we have shown that all terms in (14) are non-positive for every h′, and thus their expectation is non-positive as well. This contradicts (14), hence the result is proved.

The above result can be written concisely as

µ*(ξ⁺, h⁺) ≥ µ*(ξ⁻, h⁻) ∀ ξ⁺ ≥ ξ⁻, h⁺ ≥ h⁻.

C. Possible Generalizations

In this problem we had compact support for X and ξ. Note that as long as we have compact support for these two, the results carry through to uncountable state/action spaces as well; that is, instead of having discrete values of ξ and X, we can make them continuous (over the real numbers) and end up with the same results.

The reward function used here was log. We can list the following properties that were used explicitly in proving our results:
1) the reward r depends only on h and P; it is independent of ξ (used in Lemma 1, Part 1);
2) r((ξ, h), P) is concave in P (used in Lemma 2);
3) ∂r((ξ, h), P)/∂h ≥ 0 (used in (4));
4) ∂²r((ξ, h), P)/∂P∂h ≥ 0 (used in (12)).
No other property of the log function was used. This means that any reward function satisfying these four properties will give us the same results. (The reward function is assumed to be non-negative for all state/action pairs.)

IV. SIMULATION RESULTS

We present here simulation results which verify our results (the properties proved here were verified for a large number of parameter choices before being proved). We take the parameters of the problem as ξ_m = 50, a = 56, λ = 0.85, N = 17 and N₀W = 10; the channel states are thus in H = {1, ..., 17}. The distribution on h is taken to be bell-shaped and the distribution on X is taken to be strictly decreasing. For this system we first plot the optimal policy µ*(ξ, h) (which we have proved to be non-decreasing in both ξ and h), and then the optimal reward function J*(ξ, h), which should not only be non-decreasing in both arguments but also concave in ξ.

Fig. 1. µ*(ξ, h) vs. ξ for h = 5, 15.
Fig. 2. J*(ξ, h) vs. ξ for h = 5, 10.

Apart from verifying our proven results, another important feature to discuss is the structure of the random power added in every slot, i.e., the distribution of X. Higher power added in every slot should give us higher optimal powers to work with, since even if we spend power on a bad channel once, we would not have to wait long before the battery gets recharged (since higher values of X are more likely). In this regard we also present the graph of µ* for two different distributions on X: P_{X1} represents a distribution which decreases with x (this is also the distribution we have been using until now) and P_{X2} represents a distribution which is exactly inverted, i.e., it increases with x. Clearly P_{X2} has a higher mean than P_{X1}.

Fig. 3. µ*(ξ, h) vs. ξ for P_{X1}, P_{X2} and h = 10.

As an instructive example we can also look at the solution after varying λ. Variation in λ is of central importance because it essentially tells us how much importance is given to future rewards as opposed to the current reward, which in turn dictates the average number of recharge cycles that the battery may have to go through (and consequently its effective lifetime). We notice in our case that as λ increases, more importance is given to future rewards and consequently the optimal powers become lower, i.e., power is being saved for the future, where better channels may probably be available.

Fig. 4. µ*(ξ, h) vs. ξ for λ = 0.5, 0.85, 0.9 and h = 15.

V. CONCLUSION

In this paper we have proved one of the most important structural features of the power allocation problem constrained by the limited capacity of the battery. The results have been proved from scratch, without the use of any known results except the standard ones for a general MDP setting.
The most pleasing aspect of this result is that no assumptions were required on the distributions of X and h, only that their respective processes are i.i.d. Along with the main result, the side results, such as the monotone and concave nature of J*, are also important tools in devising a minimum-complexity algorithm. Once we have a monotonically non-decreasing optimal policy, not only does the search space for any algorithm get reduced, but the memory required to store the related tables is also reduced, which is very desirable as the sensors are quite small in size.

The policy here is an off-line policy. Other results being looked into include an actual algorithm that will take full advantage of the results proved here. Further ongoing work addresses the case of an unknown channel process, in which case Q-learning methods need to be looked into and possibly an on-line policy can be determined. Another possibility is that of the {X_n}_{n≥1} process being dependent on the state, which is actually a realistic scenario in the capacitor-charging models given for solar cells.

REFERENCES

[1] J.-H. Chang and L. Tassiulas, "Maximum lifetime routing in wireless sensor networks," IEEE/ACM Transactions on Networking, vol. 12, no. 4, pp. 609–619, Aug. 2004.
[2] Y. T. Hou, Y. Shi, and H. D. Sherali, "Rate allocation in wireless sensor networks with network lifetime requirement," in Proceedings of MobiHoc '04, 2004, pp. 67–77.
[3] Y. Ammar, A. Buhrig, M. Marzencki, B. Charlot, S. Basrour, K. Matou, and M. Renaudin, "Wireless sensor network node with asynchronous architecture and vibration harvesting micro power generator," in Proceedings of sOc-EUSAI '05, 2005, pp. 287–292.
[4] A. Kurs, A. Karalis, R. Moffatt, J. D. Joannopoulos, P. Fisher, and M. Soljačić, "Wireless power transfer via strongly coupled magnetic resonances," Science, vol. 317, no. 5834, pp. 83–86, 2007.
[5] R. F. Serfozo, "Monotone optimal policies for Markov decision processes," in Stochastic Systems: Modeling, Identification and Optimization, II, ser. Mathematical Programming Studies.
[6] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st ed. New York, NY, USA: John Wiley & Sons, Inc., 1994.
[7] Z. Mao, C. Koksal, and N. Shroff, "Resource allocation in sensor networks with renewable energy," in Proceedings of the 19th International Conference on Computer Communications and Networks (ICCCN), Aug. 2010, pp. 1–6.
[8] S. Chen, P. Sinha, N. Shroff, and C. Joo, "Finite-horizon energy allocation and routing scheme in rechargeable sensor networks," in Proceedings of IEEE INFOCOM, April 2011, pp. 2273–2281.
[9] C. E. Shannon, "A mathematical theory of communication," SIGMOBILE Mob. Comput. Commun. Rev., vol. 5, pp. 3–55, January 2001.
[10] C. Renner, J. Jessen, and V. Turau, "Lifetime prediction for supercapacitor-powered wireless sensor nodes."
[11] D. P. Bertsekas, Dynamic Programming and Optimal Control, Two Volume Set, 2nd ed. Athena Scientific, 2001.