The Reliability Value of Storage in a Volatile Environment

This paper examines the value of storage in securing reliability of a system with uncertain supply and demand, and supply friction. The storage is frictionless as a supply source, but once used, it cannot be filled up instantaneously. The focus appli…

Authors: Ali ParandehGheibi, Mardavij Roozbehani

Ali ParandehGheibi, Mardavij Roozbehani, Asuman Ozdaglar, and Munther A. Dahleh

Abstract: This paper examines the value of storage in securing reliability of a system with uncertain supply and demand, and supply friction. The storage is frictionless as a supply source, but once used, it cannot be filled up instantaneously. The focus application is a power supply network in which the base supply and demand are assumed to match perfectly, while deviations from the base are modeled as random shocks with stochastic arrivals. Due to friction, the random surge shocks cannot be tracked by the main supply sources. Storage, when available, can be used to compensate, fully or partially, for the surge in demand or loss of supply. The problem of optimal utilization of storage with the objective of maximizing system reliability is formulated as minimization of the expected discounted cost of blackouts over an infinite horizon. It is shown that when the stage cost is linear in the size of the blackout, the optimal policy is myopic in the sense that all shocks are compensated by storage up to the available level of storage. However, when the stage cost is strictly convex, it may be optimal to curtail some of the demand and allow a small current blackout in the interest of maintaining a higher level of reserve to avoid a large blackout in the future. The value of storage capacity in improving the system's reliability, as well as the effects of the associated optimal policies under different stage costs on the probability distribution of blackouts, are examined.

Index Terms: Storage, Ramp Constraints, Reliability, Probability of Large Blackouts

I. INTRODUCTION

Supply and demand in electric power networks are subject to exogenous, impulsive, and unpredictable shocks due to generator outages, failure of transmission equipment, or unexpected changes in weather conditions.
On the other hand, environmental causes along with price pressure have led to a global trend in large-scale integration of renewable resources with stochastic output. This is likely to increase the magnitude and frequency of impulsive shocks to the supply side of the network. We ask: what is the value of storage in mitigating volatility of supply and demand; what are the fundamental limits that cannot be overcome by storage due to physical ramp constraints; and what are the impacts of different control policies on system reliability, for instance, on the expected cost or the probability of large blackouts?

In this paper our focus is on the reliability value of storage, defined as the maximal improvement in system reliability as a function of storage capacity. Two metrics for quantifying reliability in a system are considered: the first is the expected long-term discounted cost of blackouts (cost of blackouts (COB) metric), and the second is the probability of loss of load by a certain amount or less. We model the system as a supply-demand model that is subject to random arrivals of energy deficit shocks, and a storage of limited capacity, with a ramp constraint on charging, but no constraint on discharging. The storage may be used to partially or completely mask the shocks to avoid blackouts. We formulate the problem of optimal storage management as the problem of minimization of the COB metric, and provide several characterizations of the optimal cost function.

(This work was supported by the National Science Foundation. The authors are with the Laboratory for Information and Decision Systems, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA. Emails: {parandeh, mardavij, asuman, dahleh}@mit.edu.)
By ignoring other factors such as the environment, cost of energy or storage, we characterize the value of storage purely from a reliability perspective, and examine the effects of physical constraints on system reliability. Moreover, for a general convex stage cost function, we present various structural properties of the optimal policy. In particular, we prove that for a linear stage cost, a myopic policy which compensates for all shocks regardless of their size by draining from storage as much as possible, is optimal. However, for nonlinear stage costs where the penalty for larger blackouts is significantly higher, the myopic policy is not optimal. Intuitively, the optimal policy is inclined to mitigate large blackouts at the cost of allowing more frequent small blackouts. Our numerical results confirm this intuition.

We further investigate the value of additional storage under different control policies, and for different ranges of system parameters. Our results suggest that if the ratio of the average rate of deficit shocks to ramp constraints is sufficiently large, there is a critical level of storage capacity above which the value of having additional capacity quickly diminishes. When this ratio is significantly large, there seems to be another critical level for storage size below which storage capacity provides very little value. Finally, we investigate the effect of storage size and volatility of the demand/supply process on the probability of large blackouts under various policies. We observe that for all control policies, there appears to be a critical level of storage size, above which the probability of suffering large blackouts diminishes quickly.

Recent works have examined the effects of ramp constraints on the economic value of storage [1]. Herein, our focus is on reliability.
Prior research on using queueing models for characterization of system reliability, particularly in power systems, has been reported in [2] and [3]. Similar models and concepts exist in the queueing theory literature [4], [5], perhaps with different application contexts. Despite similarities, our model differs from those of [2], [3] in many ways. We assume that the storage capacity (reserve in their model) is fixed and find the optimal policy for withdrawing from storage (consuming from reserve), as opposed to always draining the reserve and optimizing the capacity. Another difference is that our model of uncertainty is a compound Poisson process instead of the Brownian motion used in [2], [3]. We show that the myopic policy of always draining storage to mask every energy deficit shock is not optimal for strictly convex costs, and investigate the effects of nonlinear stage costs (strictly convex cost of blackouts) on the optimal policy and the statistics of blackouts.

The organization of this paper is as follows. Section II presents the elements of the model and the problem formulation. Section III includes the main analytical results. Section IV presents the numerical simulations and discussions. Finally, Section V includes the concluding remarks.

Notation. Throughout the paper, $\mathbb{I}_A$ denotes the indicator function of a set $A$. The operator $[x]^+ = \max\{0, x\}$ is the projection operator onto the nonnegative orthant.

II. THE MODEL

We examine an abstract model of a system consisting of a single consumer, a single fully controllable supplier, a supplier with stochastic output (e.g., wind), and a storage system with finite capacity (Figure 1). These agents each represent an aggregate of several small consumers and producers. The details of the model are outlined below.

Fig. 1. Layout of the physical layer of a power supply network with conventional and renewable generation, storage, and demand.

A. Supply

1) Controllable Supply: The controllable supply process is denoted by $G = \{G_t : t \ge 0\}$, where $G_t$ is the power output at time $t \ge 0$. It is assumed that the supplier's production is subject to an upward ramp constraint, in the sense that its output cannot increase instantaneously:

$$\frac{G_t - G_{t'}}{t - t'} \le \zeta, \quad \forall t : 0 \le t < t'.$$

We do not assume a downward ramp constraint or a maximum capacity constraint on $G_t$. Thus, production can shut down instantaneously, and can meet any large demand sufficiently far in the future.

2) Renewable Supply: The renewable supply process is denoted by $R = \{R_t : t \ge 0\}$. It is assumed that $R$ can be modeled as a process with two components: $R = \bar{R} + \Delta R$, where $\bar{R} = \{\bar{R}_t : t \ge 0\}$ is a deterministic process representing the predicted renewable supply, and $\Delta R = \{\Delta R_t : t \ge 0\}$ is the residual supply, assumed to be a random arrival process. Thus, at any given time $t \ge 0$, the total forecast supply from the renewable and controllable generators is given by $G_t + \bar{R}_t$.

B. Demand

The demand process is denoted by $D = \{D_t : t \ge 0\}$, where $D_t$ is the total power demand at time $t$, assumed to be exogenous and inelastic. Similar to the renewable supply, $D$ has two components: $D = \bar{D} + \Delta D$, where $\bar{D} = \{\bar{D}_t : t \ge 0\}$ is the predicted demand process (deterministic), and $\Delta D = \{\Delta D_t : t \ge 0\}$ is the residual demand, again assumed to be a random arrival process.

Definition 1. The power imbalance is defined as the residual demand minus the residual supply:

$$P_t = \Delta D_t - \Delta R_t \tag{1}$$

The normalized energy imbalance is defined as:

$$W_t = \frac{P_t^2}{2\zeta} \tag{2}$$

C. Storage

The storage process is denoted by $s = \{s_t \in [0, \bar{s}] : t \ge 0\}$, where $s_t$ is the amount of stored energy at time $t$, and $\bar{s} < \infty$ is the storage capacity. The storage technology is subject to an upward ramp constraint:

$$\frac{s_t - s_{t'}}{t - t'} \le r, \quad \forall t : 0 \le t < t'.$$
Thus, storage cannot be filled up instantaneously, though it can be drained (to supply power) instantaneously. Let $U = \{U_t : t \ge 0\}$ be the power withdrawal process from storage. The dynamics of storage is then given by:

$s_t = s_0 + \int_0^t \mathbb{I}_{\{ s_\tau}$ […] $0$ is the discount rate.

E. Problem Formulation

In this section we present the problem formulation. Before we proceed, we pose the following assumptions.

Assumption 1. The normalized energy imbalance process (2) is the jump process in a compound Poisson process with arrival rate $Q$ and jump size distribution $f_W$, where the support of $f_W$ lies within a bounded interval $[0, B]$. The maximum jump size is thus upper bounded by $B$.

Assumption 2. The forecast supply is equal to the forecast demand. That is:

$$\bar{D}_t = G_t + \bar{R}_t, \quad \forall t \ge 0.$$

Under Assumption 2, the energy from storage will be used only to compensate for the power imbalance, since in the absence of an energy shock, supply is equal to demand, and storage provides no additional utility. Under Assumptions 1 and 2, the dynamics of the storage process can be written as:

$s_t = s_0 + \int_0^t \mathbb{I}_{\{ s_\tau}$ […] $0$, and $g(0) = 0$.

The system reliability problem can now be formulated as an infinite horizon stochastic optimal control problem

$$C_\mu(s) \to \min_{\mu \in \Pi} \tag{7}$$

where the optimization problem (7) is subject to the state dynamics (5). A policy $\mu^* \in \Pi$ is defined to be optimal if $\mu^* \in \arg\min_{\mu \in \Pi} C_\mu(s)$. The associated value function or optimal cost function is denoted by $C(s)$, where

$$C(s) = \min_{\mu \in \Pi} C_\mu(s), \quad 0 \le s \le \bar{s}. \tag{8}$$

III. MAIN RESULTS

A. Characterizations of the Value Function

We first provide several characterizations for the value function defined in (8) and establish specific properties that are useful in characterization of the optimal policy. Let $J_\mu(s, w)$ be the expected long-term discounted cost under policy $\mu$ conditioned on the first jump arriving at time $t_1 = 0$, and being of size $w$.
Here, $s$ is the state of the system before executing the action dictated by the policy. By the memoryless property of the Poisson process, we have

$$J_\mu(s, w) = g(w - \mu(s, w)) + E\Big[\sum_{k=1}^{\infty} e^{-\theta t_k}\, g\big(W_k - \mu(s_{t_k^-}, W_k)\big) \,\Big|\, s_0 = s - \mu(s, w)\Big] \tag{9}$$

We may relate $J_\mu(s, W)$ to the total expected cost $C_\mu(s)$ defined in (6) as follows:

$$C_\mu(s) = E\big[e^{-\theta t'} J_\mu(\min\{s + r t', \bar{s}\}, W)\big], \tag{10}$$

where $t'$ is an exponential random variable with mean $1/Q$, independent of $W$, which is drawn from the distribution $f_W$.

From (10), it is clear that by minimizing $J_\mu$ across all admissible policies $\Pi$, we may obtain the optimal solution to the original problem in (8). The discrete-time formulation of $J_\mu$ given by (9) facilitates deriving the Bellman equation as the necessary and sufficient optimality condition, as well as the development of efficient numerical methods. We summarize these results in the following theorem.

Theorem 1. Given an admissible control policy $\mu \in \Pi$, let $J_\mu : [0, \bar{s}] \times [0, B] \to \mathbb{R}$ be the function defined as in (9). A function $J : [0, \bar{s}] \times [0, B] \to \mathbb{R}$ satisfies

$$J(s, w) = J^*(s, w) \overset{\text{def}}{=} \min_{\mu \in \Pi} J_\mu(s, w), \quad \forall (s, w),$$

if and only if it satisfies the following fixed-point equation:

$$J(s, w) = (TJ)(s, w) \overset{\text{def}}{=} \min_{u \in [0, \min\{s, w\}]} \Big\{ g(w - u) + E\big[e^{-\theta t'} J(\min\{s - u + r t', \bar{s}\}, W)\big] \Big\}. \tag{11}$$

Moreover, a stationary policy $\mu^*(s, w)$ is optimal if and only if $u = \mu^*(s, w)$ achieves the minimum in (11) for $J = J^*$. Finally, the value iteration algorithm

$$J_{k+1} = T J_k \tag{12}$$

converges to $J^*$ for any initial condition $J_0$.

Proof: The result follows from establishing the contraction property of $T$, which is standard for discounted problems with bounded stage cost. See [6] for more details.
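The value iteration (12) can be sketched numerically as follows. This is only an illustration under stated assumptions, not the authors' implementation: the storage level and the shock size share a uniform grid of spacing `h`, the shock is taken as uniform over the grid points of $[0, B]$, and the expectation over the exponential inter-arrival time $t'$ is evaluated with a trapezoidal rule. The function name `value_iteration` and all of its parameters are hypothetical.

```python
import numpy as np

def value_iteration(g, s_max=2.0, B=1.0, r=1.0, Q=0.8, theta=0.1,
                    h=0.05, tol=1e-8, max_iter=5000):
    """Grid-based sketch of J_{k+1} = T J_k for the operator T in (11).

    Assumptions (not from the paper): uniform grid of spacing h for both
    s and w, shocks uniform on the w-grid, trapezoidal time integration.
    """
    n_s = int(round(s_max / h)) + 1
    n_w = int(round(B / h)) + 1
    s = np.arange(n_s) * h                      # storage grid
    w = np.arange(n_w) * h                      # shock-size grid
    decay = np.exp(-(Q + theta) * h / r)        # discount over one grid step of charging
    J = np.zeros((n_s, n_w))
    H = np.zeros(n_s)
    policy = np.zeros((n_s, n_w))
    for _ in range(max_iter):
        V = J.mean(axis=1)                      # E_W[J(s, W)] on the grid
        # H(x) = E[exp(-theta t') V(min{x + r t', s_max})], t' ~ Exp(Q),
        # computed by a backward recursion starting from full storage.
        H = np.empty(n_s)
        H[-1] = Q / (Q + theta) * V[-1]
        for i in range(n_s - 2, -1, -1):
            H[i] = 0.5 * (h / r) * Q * (V[i] + decay * V[i + 1]) + decay * H[i + 1]
        J_new = np.empty_like(J)
        policy = np.empty_like(J)
        for i in range(n_s):
            for j in range(n_w):
                k = np.arange(min(i, j) + 1)    # withdrawals u = k * h in [0, min{s, w}]
                cost = g((j - k) * h) + H[i - k]
                best = int(np.argmin(cost))
                J_new[i, j] = cost[best]
                policy[i, j] = best * h
        diff = np.max(np.abs(J_new - J))
        J = J_new
        if diff < tol:
            break
    # By (10) applied to the minimizing policy, H approximates the cost C(s).
    return s, w, J, policy, H

if __name__ == "__main__":
    s, w, J, policy, C = value_iteration(lambda x: x ** 2)
    print("approximate C(0) and C(s_max):", C[0], C[-1])
```

A coarser grid (larger `h`) runs faster at the price of accuracy; the shared spacing is a design choice that keeps the post-withdrawal state $s - u$ on the grid, so no interpolation is needed.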
An alternative approach to characterization of the optimal cost function is based on continuous-time analysis of problem (8), which leads to the Hamilton-Jacobi-Bellman (HJB) equation. In the following theorem we present some basic properties of the optimal cost function as well as the HJB equation.

Theorem 2. Let $C(s)$ be the optimal cost function defined in (8). The following statements hold:

(i) $C(s)$ is strictly decreasing in $s$.

(ii) If the stage cost $g(\cdot)$ is convex, the optimal cost function $C(s)$ is also convex in $s$.

(iii) If $C$ is continuously differentiable, then for all $s \in [0, \bar{s}]$, it satisfies the following HJB equation

$$\frac{dC}{ds} = \frac{Q + \theta}{r} C(s) - \frac{Q}{r} E\Big[\min_{u \in [0, \min\{s, W\}]} g(W - u) + C(s - u)\Big], \tag{13}$$

with the boundary condition

$$\frac{dC}{ds}\Big|_{s = \bar{s}} = 0. \tag{14}$$

Moreover, the optimal policy $\mu^*(s, w)$ achieves the optimal solution of the minimization problem in (13). Furthermore, for a given policy $\mu$, if the cost function $C_\mu(s)$ is differentiable, it satisfies the following delay differential equation

$$\frac{dC_\mu}{ds} = \frac{Q + \theta}{r} C_\mu(s) - \frac{Q}{r} E\Big[g(W - \mu(s, W)) + C_\mu(s - \mu(s, W))\Big], \tag{15}$$

with the boundary condition given by (14).

Proof: See the Appendix.

The result of Theorem 2 part (iii) requires continuous differentiability of the optimal cost function, which can be established under some mild conditions such as differentiability of the stage cost function $g$ and the probability density function $f_W(\cdot)$ of Poisson jumps (cf. Benveniste and Scheinkman [7]). Throughout this paper, we assume that $C(s)$ is in fact continuously differentiable and the results of Theorem 2 are applicable.

B. Characterizations of the Optimal Policy

In this subsection, we derive some structural properties of the optimal policy using the optimal cost characterizations given in Theorems 1 and 2.
First, we show that the myopic policy of allocating reserve energy from storage to cover as much of every shock as possible is optimal for linear stage cost functions. Then, we partially characterize the structure of the optimal policy for strictly convex stage cost functions.

Theorem 3. If the stage cost is linear, i.e., $g(x) = \beta x$ for some $\beta > 0$, then the myopic policy

$$\mu^*(s, w) = \min\{s, w\}, \tag{16}$$

is optimal for problem (8).

Proof: See the Appendix.

Next, we focus on nonlinear but convex stage cost functions. In this case, the myopic policy defined in (16) is no longer optimal. Intuitively, the myopic policy greedily consumes the reserve and thereby increases the chance of a large blackout. In the linear stage cost case, the penalty for a large blackout is equivalent to the total penalty of many small blackouts. This is not so in the strictly convex case, where one large blackout costs more than several small ones of the same total size. Therefore, the optimal policy in this case tends to be more conservative in consuming the reserve. Nevertheless, the structure of the optimal policy shows some similarities with the myopic policy. In the following we present some characterizations of the structural properties of the optimal policy using the results from Section III-A.

Assumption 4. The storage process has a positive drift in the sense that the rate of the compound Poisson process is no greater than the ramp constraint, i.e., $Q\,E[W] \le r$.

Theorem 4. Let $\mu^*(s, w)$ be the optimal policy associated with problem (8). If Assumption 4 holds, then $\mu^*(s, w)$ is monotonically nondecreasing in both $s$ and $w$.

Proof: See the Appendix.

Theorem 5. Let $\mu^*$ denote the optimal policy associated with problem (8) with strictly convex stage cost $g(\cdot)$. There exists a unique kernel function $\phi : [-B, \bar{s}] \to \mathbb{R}$ such that

$$\mu^*(s, w) = \big[w - \phi(s - w)\big]^+, \quad \forall (s, w) \in [0, \bar{s}] \times [0, B], \tag{17}$$

where

$$\phi(p) = \arg\min_x \; g(x) + C(x + p) \quad \text{s.t.} \quad x \le \min\{B, \bar{s} - p\}, \quad x \ge \max\{0, -p\}. \tag{18}$$

Moreover, under Assumption 4, we can represent the kernel function $\phi(p)$ as follows:

$$\phi(p) = \begin{cases} -p, & -B \le p \le b_0 \\ \phi^\circ(p), & b_0 \le p \le b_1 \\ 0, & b_1 \le p \le \bar{s} \end{cases} \tag{19}$$

where $\phi^\circ(p)$ is the unique solution of

$$g'(x) + C'(x + p) = 0, \tag{20}$$

and $b_0$ and $b_1$ are the break-points, obtained by evaluating (20) at $(x, p) = (-b_0, b_0)$ and $(x, p) = (0, b_1)$, respectively:

$$b_0 = -(g')^{(-1)}\big(-C'(0)\big) \ge -(g')^{(-1)}\Big(\frac{Q}{r} E[g(W)]\Big) \ge -B, \tag{21}$$

$$b_1 = (C')^{(-1)}\big(-g'(0)\big) \le \bar{s}. \tag{22}$$

Proof: See the Appendix.

Theorem 5 demonstrates a very special structure for the optimal policy. In fact, it shows that the two-dimensional policy can be represented using a one-dimensional kernel function. This result allows us to significantly reduce the computational complexity of numerical methods for computing the optimal policy. In addition, using Theorem 5, we can provide a qualitative picture of the structure of the optimal policy. Figures 3 and 4 illustrate a conceptual plot of the kernel function and the optimal policy, respectively.

Fig. 3. Structure of the kernel function $\phi(p)$ defined in (18).

Fig. 4. Structure of the optimal policy $\mu^*(s, w)$ for a convex stage cost, for $w = w_1, w_2$.

In particular, we can summarize the characterization of the optimal policy as follows. If $w \ge -b_0$, we have

$$\mu^*(s, w) = \begin{cases} s, & 0 \le s \le s_0(w) \\ w - \phi^\circ(s - w), & s_0(w) \le s \le s_1(w) \\ w, & s_1(w) \le s \le \bar{s} \end{cases} \tag{23}$$

where $s_i(w) = w + b_i$ for $i = 0, 1$. In the case where $w \le -b_0$, we have

$$\mu^*(s, w) = \begin{cases} 0, & 0 \le s \le q_0(w) \\ w - \phi^\circ(s - w), & q_0(w) \le s \le s_1(w) \\ w, & s_1(w) \le s \le \bar{s} \end{cases} \tag{24}$$

where $q_0(w)$ is the unique solution of $\phi^\circ(s - w) = w$.

IV. NUMERICAL SIMULATIONS

In this part, we present numerical characterizations of the optimal cost function and optimal policy in different scenarios. Moreover, we study the effect of storage size and volatility on system performance, for various control policies.
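The kernel representation (17)-(18) of Theorem 5 suggests a simple way to tabulate a policy once a cost function is available. The sketch below is illustrative only: the kernel is approximated by brute-force minimization over a grid, and the cost function used in the example is a hypothetical convex, decreasing stand-in, not the optimal cost of any scenario in this paper. The names `kernel` and `policy_from_kernel` are assumptions introduced here.

```python
import numpy as np

def kernel(p, g, C, s_max, B, n_grid=2001):
    """Grid approximation of phi(p) in (18): minimize g(x) + C(x + p)
    over max{0, -p} <= x <= min{B, s_max - p}."""
    lo = max(0.0, -p)
    hi = min(B, s_max - p)
    x = np.linspace(lo, hi, n_grid)
    return float(x[np.argmin(g(x) + C(x + p))])

def policy_from_kernel(s, w, g, C, s_max, B):
    """Withdrawal mu(s, w) = [w - phi(s - w)]^+ as in (17)."""
    phi = kernel(s - w, g, C, s_max, B)
    return max(w - phi, 0.0)

if __name__ == "__main__":
    s_max, B = 2.0, 1.0
    g = lambda x: x ** 2                        # quadratic stage cost
    C = lambda s: 0.25 * (s_max - s) ** 2       # hypothetical stand-in for C(s)
    for s, w in [(0.5, 1.0), (1.5, 0.5), (2.0, 1.0)]:
        print(s, w, policy_from_kernel(s, w, g, C, s_max, B))
```

Because the kernel depends only on $p = s - w$, it can be tabulated once on a one-dimensional grid and reused for every pair $(s, w)$, which is the complexity reduction noted after Theorem 5.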
We use the value iteration algorithm (12) to compute the optimal policy and cost function for nonlinear stage costs. Figures 5 and 6 illustrate the optimal policy and cost function in a scenario with uniformly distributed random jumps, quadratic stage cost, and the following parameters: $\theta = 0.1$, $r = 1$, $Q = 0.8$, $\bar{s} = 2$. Observe that the optimal policy complies with the conceptual Figure 4.

Figure 7 shows the value of storage, defined as the normalized improvement of energy storage in expected cost, for different Poisson arrival rates. In this case $\theta = 0.01$, $g(x) = x^3$, $r = 1$, $W = 1$. Note that the storage process has a negative drift if and only if $Q > 1$. Observe that in the positive or zero drift cases, even a small value of storage yields a significant effect in reducing the blackout cost. However, in the negative drift case, the value of storage is significantly lower. Observe that for the negative drift case, there is a critical storage size that yields a sharp improvement in the value of storage.

Fig. 5. Optimal policy $\mu(s, w)$ versus $s$, computed by value iteration algorithm (12) for quadratic stage cost and uniform shock distribution (one curve for each of $w = 0.05, 0.1, 0.15, 0.2, 0.3, \ldots, 1$).

Fig. 6. Optimal cost function $C(s)$ versus $s$, computed by value iteration algorithm (12) for quadratic stage cost and uniform shock distribution.

A. Blackout Statistics

We discussed in Section III-B that the myopic policy given by (16) is not necessarily optimal for nonlinear stage cost functions. In this part, we study the effect of different optimal policies, in the sense of (7), for different stage costs on the distribution of large blackouts.
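Blackout statistics of this kind can be estimated by Monte Carlo simulation of the storage process at the shock arrival times. The following is a minimal sketch under stated assumptions, not the simulation code behind the figures: storage starts full, charges at rate $r$ up to $\bar{s}$ between exponential inter-arrival times of rate $Q$, and a user-supplied policy chooses the withdrawal at each shock. The names `simulate_blackouts`, `myopic`, and `large_blackout_probability` are hypothetical.

```python
import numpy as np

def myopic(s, w):
    """Myopic policy (16): cover as much of the shock as storage allows."""
    return min(s, w)

def simulate_blackouts(policy, s_max, r, Q, draw_shock, n_shocks=100_000, seed=0):
    """Return the blackout size w - u at each of n_shocks Poisson arrivals."""
    rng = np.random.default_rng(seed)
    s = s_max                                   # assumption: start with full storage
    blackouts = np.empty(n_shocks)
    for k in range(n_shocks):
        dt = rng.exponential(1.0 / Q)           # inter-arrival time with mean 1/Q
        s = min(s + r * dt, s_max)              # ramp-limited charging up to capacity
        w = draw_shock(rng)                     # shock size W
        u = min(max(policy(s, w), 0.0), s, w)   # enforce 0 <= u <= min{s, w}
        blackouts[k] = w - u
        s -= u
    return blackouts

def large_blackout_probability(blackouts, threshold=0.8):
    """Empirical fraction of shocks whose blackout exceeds the threshold."""
    return float(np.mean(blackouts > threshold))

if __name__ == "__main__":
    # Deterministic jumps of size one with rate Q = 0.8.
    b = simulate_blackouts(myopic, s_max=2.0, r=1.0, Q=0.8,
                           draw_shock=lambda rng: 1.0)
    print("Pr(blackout size > 0.8) under the myopic policy:",
          large_blackout_probability(b))
```

Any other policy, for instance one tabulated by value iteration, can be passed in place of `myopic` to compare blackout distributions on the same shock sequence by reusing the same `seed`.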
Figure 8 shows the blackout distribution in a scenario with deterministic jumps of size one, for both the myopic policy and the optimal policy for a cubic cost function. Note that the stage cost for the non-myopic policy assigns a significantly higher weight to larger blackouts. Therefore, as we can see in Figure 8, the non-myopic policy results in less frequent large blackouts at the price of more frequent small blackouts.

Next, we study the effect of storage size on the probability of large blackouts. Figure 9 plots this metric for different policies that are all optimal for different stage cost functions. Similarly to Figure 7, we observe a sharp improvement of the reliability metric at a critical storage size. It is worth mentioning that given a target reliability metric, the storage size required by the optimal policy with cubic stage cost is about half of what is required by the myopic policy.

Finally, we compare the reliability of myopic and non-myopic policies in terms of the probability of large blackouts as a function of the volatility of the demand/supply process. We define volatility as the energy of the shock process, i.e.,

$$\text{volatility} = Q\,E[W^2],$$

which depends both on the mean and variance of the compound arrival process. Figure 10 demonstrates large blackout probabilities as a function of volatility, for a system with uniformly distributed jumps with constant mean $Q\,E[W] = 1$. As shown in Figure 10, higher volatility increases the probability of large blackouts in an almost linear fashion.

Fig. 7. Value of energy storage, $100 \times \big(1 - c(\bar{s}; \bar{s}) / c(0; 0)\big)$, as a function of the storage capacity $\bar{s}$ for different Poisson arrival rates ($Q = 0.5, 1.0, 1.5, 3.0$). $c(s; \bar{s})$ denotes the optimal cost function (8) when the storage capacity is given by $\bar{s}$.

Fig. 8. Blackout distribution comparison of myopic and non-myopic policies (deterministic jumps with rate $Q = 0.8$).

V. CONCLUSIONS

We examined the reliability value of storage in a power supply network with uncertainty in supply/demand and upward ramp constraints on both supply and storage. The uncertainty was modeled as a compound Poisson arrival of energy deficit shocks. We formulated the problem of optimal control of storage for maximizing system reliability as minimization, over all stationary Markovian control policies, of the infinite horizon expected discounted cost of blackouts. We showed that for a linear stage cost, a myopic policy which uses storage to compensate for all shocks regardless of their size is optimal. However, for strictly convex stage costs the myopic policy is not optimal. Our results suggest that for high ratios of the average rate of shock size to storage ramp rate, there is a critical level of storage size above which the value of additional capacity quickly diminishes. For ratios around three and above, there seems to be another critical level below which storage capacity provides very little value. Finally, our results suggest that for all control policies, there seems to be a critical level of storage size, above which the probability of suffering large blackouts diminishes quickly.

Fig. 9. Probability of large blackouts, Pr(blackout size > 0.8), as a function of storage size $\bar{s}$ for different policies: the myopic policy and the optimal policies for $g(x) = x^3$ and $g(x) = x^2$ (deterministic jumps with rate $Q = 1.0$).

Fig. 10. Probability of large blackouts, Pr(blackout size > 0.8), vs. volatility for the myopic and non-myopic policies (uniformly distributed random jumps with $Q = 1.0$ and $E[W] = 1$).

APPENDIX

Proof of Theorem 2: Part (i): The monotonicity property of the value function follows almost immediately from the definition.
Let $0 \le s_1 < s_2 \le \bar{s}$, and assume $C(s) = C_\mu(s)$ for some policy $\mu$. Given the initial state $s_1$, let $u^{(1)}_t$ be the control process under policy $\mu$. Note that for every realization $\omega$ of the compound Poisson process, the sample path $u^{(1)}_t(\omega)$ is admissible for initial condition $s_2 > s_1$. Therefore, by definitions (6) and (8), we have $C(s_2) \le C(s_1)$.

In order to show the strict monotonicity, consider the controlled process starting from $s_1$. Let $\tau$ be the first arrival time such that $g(W_\tau - u^{(1)}_\tau) > 0$. By Assumption 3, we have $P(\tau \in [0, T]) > 0$ for some $T < \infty$. For every sample path $\omega$, define the control process

$$u^{(2)}_t(\omega) = u^{(1)}_t(\omega) + \delta \cdot \mathbb{I}_{\{t = \tau(\omega)\}},$$

for some $\delta > 0$ such that $\delta \le \min\{s_2 - s_1, W_{\tau(\omega)} - u^{(1)}_{\tau(\omega)}\}$. It is clear that $u^{(2)}_t(\omega)$ is admissible for the controlled process starting from $s_2$. Using the definition of the expected cost function in (6), we can write

$$C(s_1) - C(s_2) \ge E_\omega\Big[e^{-\theta\tau(\omega)} g\big(W_{\tau(\omega)} - u^{(1)}_{\tau(\omega)}\big) - e^{-\theta\tau(\omega)} g\big(W_{\tau(\omega)} - u^{(1)}_{\tau(\omega)} - \delta\big)\Big] \ge \epsilon\, E\big[e^{-\theta\tau(\omega)}\big] \ge \epsilon\, e^{-\theta T} P(\tau \in [0, T]) > 0,$$

for some $\epsilon > 0$, where the first step holds because $u^{(2)}$ is admissible, though not necessarily optimal, from $s_2$, and the second holds by strict monotonicity of $g$.

Part (ii): We first prove convexity of $J^*(s, w)$ defined in Theorem 1, and use it to establish convexity of $C(s)$. In order to show convexity of $J^*(s, w)$, we need to show that the operator $T$ defined in (11) preserves convexity. Then the claim would be immediate using the convergence of the value iteration algorithm (12) to the optimal cost $J^*$, where the initial condition is an arbitrary convex function such as $J_0 = 0$. Next we show that the operator $T$ preserves convexity for this particular problem. Define the objective function in (11) as $Q(s, w, u)$.
We have

$$Q(s, w, u) = g(w - u) + E\big[e^{-\theta t'} J(\min\{s - u + r t', \bar{s}\}, W)\big]$$

$$= g(w - u) + \int_{\frac{\bar{s} - s + u}{r}}^{\infty} e^{-\theta t'} E[J(\bar{s}, W)]\, Q e^{-Q t'}\, dt' + \int_0^{\frac{\bar{s} - s + u}{r}} e^{-\theta t'} E[J(s - u + r t', W)]\, Q e^{-Q t'}\, dt',$$

where, inside the integrals, $Q e^{-Q t'}$ is the density of the exponential inter-arrival time with rate $Q$. Using the fact that $J$ is convex, linearity of expectation, and the basic definition of a convex function, it is straightforward but tedious to show that $Q(s, w, u)$ is a convex function. We omit the details for brevity. Given the convexity of $Q$, the convexity of $(TJ)(s, w)$ is immediate, since we are minimizing a multidimensional convex function over one of its dimensions. Hence, we have established convexity of $J^*(s, w)$ in $(s, w)$. Finally, we can express $C(s)$ in terms of $J^*(s, w)$ as in (10). This results in convexity of $C(s)$ using the above argument for proving convexity of $Q(s, w, u)$.

Part (iii): The derivation of the Hamilton-Jacobi-Bellman equation is relatively standard. We omit the proof for brevity; a proof sketch based on the principle of optimality is presented in [8]. For a more detailed treatment, please refer to [6], [9] and [10].

Proof of Theorem 3: We establish optimality of $\mu^*$ by showing that it achieves an expected cost no higher than any other admissible policy. Consider an admissible policy $\tilde{\mu}$ such that $\tilde{\mu}(s, w) < \min\{s, w\}$ for some $(s, w) \in [0, \bar{s}] \times [0, B]$. For every sample path of the controlled process, let $\tau_1(\omega)$ be the first Poisson arrival time such that

$$\min\{s_{\tau_1^-}, W_{\tau_1}\} - \tilde{\mu}(s_{\tau_1^-}, W_{\tau_1}) = \epsilon > 0.$$

Therefore, by applying policy $\tilde{\mu}$ instead of $\mu^*$, we pay an extra penalty of $\beta \epsilon\, e^{-\theta\tau_1(\omega)}$. The reward for this extra penalty is that the state process is now biased by at most $\epsilon$, which allows us to avoid later penalties. However, since the stage cost is linear, the penalty reduction by this bias for any time $\tau_2(\omega) > \tau_1(\omega)$ is at most $\beta \epsilon\, e^{-\theta\tau_2(\omega)}$.
Hence, for this sample path $\omega$, the policy $\tilde{\mu}$ does worse than the myopic policy $\mu^*$ at least by $\beta \epsilon \big(e^{-\theta\tau_1(\omega)} - e^{-\theta\tau_2(\omega)}\big) > 0$. Therefore, by taking the expectation over all sample paths, the myopic policy cannot do worse than any other admissible policy. Note that this argument does not prove the uniqueness of $\mu^*$ as the optimal policy. In fact, we may construct optimal policies that are different from $\mu^*$ on a set $A \subseteq [0, \bar{s}] \times [0, B]$, where $P((s_{t^-}, W_t) \in A) = 0$. ∎

We delay the proof of Theorem 4 until after the proof of Theorem 5. Let us start with some useful lemmas on the structure of the kernel function.

Lemma 1. Let $\phi(p)$ be defined as in (18). We have

1) If $\phi(p_0) = -p_0$ for some $p_0$, then $\phi(p) = -p$ for all $p \le p_0$.

2) If $\phi(p_1) = 0$ for some $p_1$, then $\phi(p) = 0$ for all $p \ge p_1$.

Proof: By convexity of the stage cost function and Theorem 2(ii), $\phi(p)$ is the optimal solution of a convex program. Therefore, if $\phi(p_0) = -p_0$ for some $p_0 \le 0$, we have

$$g'(-p_0) + C'(0) \ge 0.$$

Thus, by convexity of the stage cost, $g'(-p) \ge g'(-p_0)$ for any $p \le p_0$. Therefore, by convexity of $C(\cdot)$ and $g(\cdot)$,

$$g'(x) + C'(x + p) \ge g'(-p) + C'(0) \ge 0, \quad \text{for all } x \ge -p,$$

which immediately implies optimality of $(-p)$ for $p \le p_0$. Similarly, for the case where $\phi(p_1) = 0$, we have $g'(0) + C'(p_1) \ge 0$, which implies

$$g'(x) + C'(x + p) \ge g'(0) + C'(p) \ge 0, \quad \text{for all } p \ge p_1;$$

hence, the objective is nondecreasing for all feasible $x$ and $\phi(p) = 0$.

Lemma 2. Let $C(s)$ be defined as in (8), and assume that the stage cost $g(\cdot)$ is convex. Then

$$\frac{dC}{ds}(s) \ge -\frac{Q}{r} E_W[g(W)], \quad 0 \le s \le \bar{s}. \tag{25}$$

Proof: By Theorem 2(ii), the optimal cost function $C(s)$ is convex. Hence,

$$\frac{dC}{ds}(s) \ge \frac{dC}{ds}(0).$$

On the other hand, by Theorem 2(iii), we can write

$$\frac{dC}{ds}(0) = \frac{Q + \theta}{r} C(0) - \frac{Q}{r} E_W\Big[\min_{u = 0} g(W - 0) + C(0)\Big] = \frac{\theta}{r} C(0) - \frac{Q}{r} E_W[g(W)].$$
Combining the two preceding relations proves the claim.

Lemma 3. If Assumption 4 holds, then the first constraint in (18) is never active, i.e., $\phi(p) < \min\{B, \bar{s} - p\}$.

Proof: We show that under Assumption 4, the slope of the objective function is always non-negative at $x = \min\{B, \bar{s} - p\}$. In the case where $\bar{s} - p \le B$, we have

$$\frac{\partial}{\partial x}\big(g(x) + C(x + p)\big)\Big|_{x = \bar{s} - p} = g'(\bar{s} - p) + C'(\bar{s}) \ge 0,$$

where the inequality follows from monotonicity of $g$ and (14). For the case where $\bar{s} - p \ge B$, we employ Lemma 2 and Assumption 4 to write

$$\frac{\partial}{\partial x}\big(g(x) + C(x + p)\big)\Big|_{x = B} = g'(B) + C'(B + p) \ge g'(B) - \frac{Q}{r} E_W[g(W)] \ge g'(B) - \frac{E_W[g(W)]}{E[W]} \ge 0,$$

where the last inequality holds because $g(w) \le w\, g'(B)$ for all $w \le B$, which is a consequence of convexity.

Proof of Theorem 5: By Theorem 2(iii), we can characterize the optimal policy as

$$\mu^*(s, w) = \arg\min_u \; g(w - u) + C(s - u) \quad \text{s.t.} \quad 0 \le u \le \min\{s, w\}. \tag{26}$$

Note that the optimization problem in (26) is convex, because $g(\cdot)$, and hence $C(\cdot)$, is convex (cf. Theorem 2(ii)). Using the change of variables $x = w - u$, $p = s - w$, we can rewrite (26) as $\mu^*(s, w) = w - x^*(p, w)$, where

$$x^*(p, w) = \arg\min_x \; g(x) + C(p + x) \quad \text{s.t.} \quad x \ge \max\{0, -p\}, \quad x \le w. \tag{27}$$

The optimization problem in (27) depends on both parameters $p$ and $w$. We may remove the dependency on $w$ as follows. Since $w \le B$ and $w \le \bar{s} - p$, we may relax the last constraint, $x \le w$, by replacing it with $x \le \min\{B, \bar{s} - p\}$. The optimal solution of the relaxed problem is the same as $\phi(p)$ defined in (18). If $\phi(p) < w$, then the relaxed constraint is not active, and $\phi(p)$ is also the solution of (27). Otherwise, since we have a convex problem, the constraint $x \le w$ must be active, which uniquely identifies the optimal solution as $w$. Therefore, the optimal solution of the problem in (27) is given by

$$x^*(p, w) = \min\{\phi(p), w\}.$$
Combining the preceding relations, we obtain

$$ \mu^*(s, w) = w - \min\{\phi(s - w), w\} = \big[ w - \phi(s - w) \big]^+. $$

The representation in (19) is a direct consequence of Lemmas 1 and 3. Between some break-points $b_0$ and $b_1$, the optimal solution of (18) can only be an interior solution, which is given by (20). The uniqueness of $\phi^\circ(p)$ follows from strict convexity of $g$. Finally, by continuous differentiability of the cost function, equation (20) should hold at the break-points as well. Therefore,

$$ g'(b_0) + C'(b_0 + (-b_0)) = 0, \qquad g'(0) + C'(0 + b_1) = 0, $$

which is equivalent to the characterizations in (21) and (22). The first inequality in (21) holds by Lemma 2 (inequality (25)) and convexity of $g(\cdot)$, and the second inequality holds by Assumption 4 and applying convexity of $g(\cdot)$ again. ∎

Lemma 4. Let $\phi(p)$ be defined as in (18), and assume that Assumption 4 holds and the stage cost $g(\cdot)$ is strictly convex. Then for all $p_1 \le p_2$,

$$ -(p_2 - p_1) \le \phi(p_2) - \phi(p_1) \le 0. \tag{28} $$

Proof: We first establish the monotonicity of $\phi(p)$. Let $p_1 < p_2$. Given the structure of the kernel function in (19), there are multiple cases to consider, for most of which the claim is immediate using (19). We only present the case where $-B \le p_1 \le b_1$ and $b_0 \le p_2 \le b_1$. A necessary optimality condition at $p_1$ is given by

$$ g'(\phi(p_1)) + C'(p_1 + \phi(p_1)) \ge 0. \tag{29} $$

Similarly, for $p_2$, we must have

$$ g'(\phi(p_2)) + C'(p_2 + \phi(p_2)) = 0. \tag{30} $$

Now, assume $\phi(p_2) > \phi(p_1)$. By convexity of $C(\cdot)$ (cf. Theorem 2(ii)) and strict convexity of $g(\cdot)$, we obtain

$$ g'(\phi(p_2)) + C'(p_2 + \phi(p_2)) > g'(\phi(p_1)) + C'(p_1 + \phi(p_1)) \ge 0, $$

which contradicts (30).

For the second part of the claim, again, we should consider several cases depending on the interval to which $p_1$ and $p_2$ belong. Here, we present the case where $b_0 \le p_1 \le b_1$ and $b_0 \le p_2 \le \bar{s}$.
The remaining cases are straightforward using (19). In the presented case, we have

$$ g'(\phi(p_1)) + C'(p_1 + \phi(p_1)) = 0, \tag{31} $$
$$ g'(\phi(p_2)) + C'(p_2 + \phi(p_2)) \ge 0. \tag{32} $$

Combine the optimality conditions in (31) and (32) to get

$$ g'(\phi(p_2)) + C'(p_2 + \phi(p_2)) \ge g'(\phi(p_1)) + C'(p_1 + \phi(p_1)). \tag{33} $$

Assume $\phi(p_2) < \phi(p_1)$; otherwise, the claim is trivial. By strict convexity of $g(\cdot)$, we have $g'(\phi(p_2)) < g'(\phi(p_1))$. Therefore, by (33), it is true that

$$ C'(p_2 + \phi(p_2)) > C'(p_1 + \phi(p_1)). \tag{34} $$

Now assume $\phi(p_2) - \phi(p_1) < -(p_2 - p_1)$. By rearranging the terms of this inequality and invoking the convexity of $C(\cdot)$, we get

$$ C'(p_2 + \phi(p_2)) \le C'(p_1 + \phi(p_1)), $$

which contradicts (34). Therefore, the claim holds.

Proof of Theorem 4: First, note that by Lemma 4, we get

$$ \phi(s_2 - w) \le \phi(s_1 - w), \quad \text{for all } w, \; s_1 \le s_2, $$

which implies (cf. Theorem 5)

$$ \mu^*(s_2, w) = \big[ w - \phi(s_2 - w) \big]^+ \ge \big[ w - \phi(s_1 - w) \big]^+ = \mu^*(s_1, w). $$

Moreover, for all $s$ and $w_1 \le w_2$, we can use the second part of Lemma 4 to conclude

$$ \phi(s - w_1) - \phi(s - w_2) \ge -(w_2 - w_1). $$

By rearranging the terms, it follows that

$$ \mu^*(s, w_2) = \big[ w_2 - \phi(s - w_2) \big]^+ \ge \big[ w_1 - \phi(s - w_1) \big]^+ = \mu^*(s, w_1), $$

which completes the proof. ∎

REFERENCES

[1] A. Faghih, M. Roozbehani, and M. A. Dahleh. Optimal utilization of storage and the induced price elasticity of demand in the presence of ramp constraints. Decision and Control (CDC), 2011 50th IEEE Conference on, To Appear, 2011.
[2] M. Chen, I.-K. Cho, and S. P. Meyn. Reliability by design in a distributed power transmission network. Automatica, 42:1267–1281, August 2006. (invited).
[3] I.-K. Cho and S. P. Meyn. A dynamic newsboy model for optimal reserve management in electricity markets. Submitted for publication, SIAM J. Control and Optimization, 2009.
[4] D. Gross, J. F. Shortle, J. M. Thompson, and C. M.
Harris. Fundamentals of Queueing Theory. John Wiley & Sons, Hoboken, NJ, 4th edition, 2009.
[5] Sean Meyn. Control Techniques for Complex Networks. Cambridge University Press, New York, NY, USA, 1st edition, 2007.
[6] D. Bertsekas. Dynamic Programming and Optimal Control, volume 1,2. Athena Scientific, 3rd edition, 2007.
[7] L. M. Benveniste and J. A. Scheinkman. On the differentiability of the value function in dynamic models of economics. Econometrica, 47(3), 2010.
[8] A. ParandehGheibi, M. Roozbehani, A. Ozdaglar, and M. Dahleh. The reliability value of storage in a volatile environment. LIDS report.
[9] W. H. Fleming and H. M. Soner. Controlled Markov Processes and Viscosity Solutions. Springer, 2nd edition, 2005.
[10] R. J. Elliott. Stochastic Calculus and Applications. Springer-Verlag, 1982.
