Approximate Dynamic Programming using Halfspace Queries and Multiscale Monge decomposition
Let $P=(P_1, P_2, \ldots, P_n)$, $P_i \in \mathbb{R}$ for all $i$, be a signal and let $C$ be a constant. In this work our goal is to find a function $F:[n]\rightarrow \mathbb{R}$ which optimizes the following objective function: $$ \min_{F} \sum_{i=1}^{n} (P_i - F(i))^2 + C \cdot (\text{number of constant segments of } F). $$
Authors: Gary L. Miller, Richard Peng, Russell Schwartz
The above optimization problem reduces to solving the following recurrence, which can be done efficiently using dynamic programming in $O(n^2)$ time: $$ OPT_i = \min_{0 \le j < i} \left( OPT_j + \min_{x} \sum_{m=j+1}^{i} (P_m - x)^2 \right) + C, \qquad OPT_0 = 0. $$
The above recurrence arises naturally in applications where we wish to approximate the original signal $P$ with another signal $F$ which ideally consists of few piecewise constant segments. Such applications include histogram construction in databases, speech recognition, denoising aCGH data in biology, and many more.
In this work we present two new techniques for optimizing dynamic programming that can handle cost functions not treated by other standard methods. The basis of our first algorithm is the definition of a constant-shifted variant of the objective function that can be efficiently approximated using state of the art methods for range searching. Our technique approximates the optimal value of our objective function within additive $\epsilon$ error and runs in $\tilde{O}(n^{1.5} \log(U/\epsilon))$ time, where $U = \max_i P_i$. The second algorithm we provide solves a similar recurrence within a multiplicative factor of $(1+\epsilon)$ and runs in $O(n \log^2 n/\epsilon)$ time. The new technique introduced by our algorithm is the decomposition of the initial problem into a small (logarithmic) number of Monge optimization subproblems which we can speed up using existing techniques.
Due to its importance, a lot of work has focused on speeding up naive dynamic programming implementations. Such techniques include the Knuth-Yao technique [21,28,29], a special case of the use of totally monotone matrices [5], the SMAWK algorithm for finding the row-minima of totally monotone matrices [4], and several other techniques exploiting special properties such as convexity and concavity [12,15]. The basis of these techniques lies in the theory of Monge properties for optimization [8], which dates back to the French mathematician Monge [23].
In this work, we consider the following recurrence, where $P \in \mathbb{R}^n$ and $C$ is a constant: $$ OPT_i = \min_{0 \le j < i} \left( OPT_j + \min_{x} \sum_{m=j+1}^{i} (P_m - x)^2 \right) + C, \qquad OPT_0 = 0. $$
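For concreteness, the recurrence can be evaluated directly in $O(n^2)$ time using prefix sums; the following is a brute-force sketch (the function name and indexing conventions are mine, not the paper's):

```python
def segment_dp(P, C):
    """Naive O(n^2) DP: OPT[i] = min_{0 <= j < i} OPT[j] + sse(j, i) + C,
    where sse(j, i) is the least-squares error of fitting one constant
    to the points P_{j+1}, ..., P_i (1-indexed)."""
    n = len(P)
    # Prefix sums of P and P^2 let us evaluate each segment cost in O(1).
    S = [0.0] * (n + 1)
    Q = [0.0] * (n + 1)
    for i, p in enumerate(P, 1):
        S[i] = S[i - 1] + p
        Q[i] = Q[i - 1] + p * p

    def sse(j, i):
        # min_x sum_{m=j+1}^{i} (P_m - x)^2, attained at the segment mean
        return Q[i] - Q[j] - (S[i] - S[j]) ** 2 / (i - j)

    OPT = [0.0] * (n + 1)
    for i in range(1, n + 1):
        OPT[i] = min(OPT[j] + sse(j, i) for j in range(i)) + C
    return OPT[n]
```

For instance, `segment_dp([1.0, 1.0, 5.0, 5.0], 0.5)` fits two flat segments at zero residual cost, paying $C$ per segment.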
This recurrence arises naturally in several applications where one wants to approximate a given signal f with a signal F which ideally consists of few piecewise constant segments. Such applications include histogram construction in databases (e.g., [20,17,18]), determining DNA copy numbers in cancer cells from micro-array data (e.g., [24,26]), speech recognition, data mining (e.g., [25]) and many others.
In this work we present two new techniques for optimizing dynamic programming that can handle cost functions not treated by other standard methods. The basis of our first algorithm is the definition of a constant-shifted variant of the objective function that can be efficiently approximated using state of the art methods for range searching. Our technique approximates the optimal value of our objective function within additive ǫ error and runs in Õ(n 1.5 log ( U ǫ )) time, where U = max i f i . The second algorithm we provide solves a similar recurrence within a factor of ǫ and runs in O(n log 2 n/ǫ). The new technique introduced by our algorithm is the decomposition of the initial problem into a small (logarithmic) number of Monge optimization subproblems which we can speed up using existing techniques.
The remainder of the paper is organized as follows: Section 2 presents briefly existing work on the problem and the necessary background. Section 3 presents our proposed algorithms and their theoretical analysis and Section 4 concludes the paper with a brief summary and discussion.
In this section, we briefly summarize existing work on speeding up dynamic programming. First, we briefly present existing work on speeding up dynamic programming. Then, we present state of the art results on reporting points in halfspaces, an important case of range queries in computational geometry, and on optimizing Monge functions.
Dynamic programming is a powerful problem solving technique introduced by Bellman [6] with numerous applications in biology (e.g., [24,19,27]), in control theory (e.g., [7]), in operations research and many other fields. Due to its importance, a lot of research has focused on speeding up naive dynamic programming implementations. A successful example of such a speedup is the computation of optimal binary search trees. Gilbert and Moore solved the problem efficiently using dynamic programming [16]. Their algorithm runs in $O(n^3)$ time, and for several years this running time was considered to be tight. In 1971 Knuth [21] showed that the same computation can be carried out in $O(n^2)$ time. This remarkable result was generalized by Frances Yao in [28,29]. Specifically, Yao showed that this dynamic programming speedup technique works for a large class of recurrences. She considered the recurrence $c(i,i) = 0$, $c(i,j) = w(i,j) + \min_{i < k \le j} \left( c(i,k-1) + c(k,j) \right)$ for $i < j$, and showed that it can be solved in $O(n^2)$ time when the weight function $w$ satisfies the quadrangle inequality and is monotone. For reporting points in halfspaces, Matoušek [22] gave a data structure that answers a query in $O(n^{1-1/\lfloor d/2 \rfloor} \log^{c'} n)$ time, where $c' = c'(d)$ is a constant depending on the dimension $d$, after $O(n^{1+\delta})$ preprocessing, where $\delta > 0$ is an arbitrarily small constant.
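Yao's speedup restricts the split point $k$ for interval $(i, j)$ to lie between the optimal decisions of the two neighboring subproblems, which amortizes to $O(n^2)$. A minimal sketch of this classical technique (my own illustration, not from this paper, assuming a weight that satisfies the quadrangle inequality, such as the optimal-BST frequency sums):

```python
def knuth_yao(w, n):
    """Solve c[i][i] = 0, c[i][j] = w(i, j) + min_{i < k <= j} (c[i][k-1] + c[k][j])
    in O(n^2) total time by restricting the split point k to the interval
    [K[i][j-1], K[i+1][j]], where K[i][j] is the smallest optimal split."""
    c = [[0.0] * (n + 1) for _ in range(n + 1)]
    K = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        K[i][i] = i
    for length in range(1, n + 1):
        for i in range(n + 1 - length):
            j = i + length
            best, arg = float("inf"), None
            lo = max(i + 1, K[i][j - 1])
            hi = min(j, K[i + 1][j])
            for k in range(lo, hi + 1):
                v = c[i][k - 1] + c[k][j]
                if v < best:
                    best, arg = v, k
            c[i][j] = w(i, j) + best
            K[i][j] = arg
    return c[0][n]
```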
Here, we refer to a theorem in [12] which we use in Section 3.2. A function $w$ defined on pairs of integer indices is Monge (concave) if for any 4-tuple of indices $i_1 \le i_2 \le i_3 \le i_4$: $$ w(i_1, i_4) + w(i_2, i_3) \ge w(i_1, i_3) + w(i_2, i_4). $$
Then the following theorem holds:

Theorem 2 ([12]). Given a Monge function $w : \{0, \ldots, n\} \times \{0, \ldots, n\} \rightarrow \mathbb{R}$ and values $a_0, a_1, \ldots, a_{n-1}$ supplied online, all values $b_j = \min_{0 \le i < j} \left( a_i + w(i, j) \right)$, $j = 1, \ldots, n$, can be computed in $O(n \log n)$ time.
The transition can be viewed as a weight function $w(j, i)$ that takes the two indices $j$ and $i$ as parameters, such that $$ w(j, i) = \min_{x} \sum_{m=j+1}^{i} (P_m - x)^2 + C. $$ The dynamic program is then equivalent to a shortest path from $0$ to $n$ with edge weights $w(j, i)$.
The weight function does not have the Monge property, as demonstrated by the vector $P_0 = 1$, $P_2 = 2$, $P_3 = 0$, $P_4 = 2, \ldots, P_{2k-1} = 0$, $P_{2k} = 2$, $P_{2k+1} = 1$. When $C = 1$, the optimal choices of $j$ for $i = 1, \ldots, 2k$ are $j = i - 1$, i.e., we fit one segment per point. However, once we add in $P_{2k+1}$ the optimal solution changes to fitting all points on a single segment. Therefore, choosing a transition to $j_1$ over one to $j_2$ at some $i$ does not allow us to discard $j_2$ from future considerations. This is one of the main difficulties in applying techniques based on the increasing order of decision points, such as the method of Eppstein et al. [12], to reduce the complexity of the $O(n^2)$ algorithm in [26].
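The failure of the Monge property can be checked mechanically: the sketch below (my own, with the example vector for $k = 3$ and $C = 1$) searches for a 4-tuple of indices violating the concave quadrangle inequality $w(i_1,i_4) + w(i_2,i_3) \ge w(i_1,i_3) + w(i_2,i_4)$.

```python
from itertools import combinations

def w(P, C, j, i):
    """Transition cost: best single-constant fit to points P_{j+1..i}
    (1-indexed), plus the per-segment penalty C."""
    seg = P[j:i]  # 0-indexed slice covering points j+1..i
    mean = sum(seg) / len(seg)
    return sum((p - mean) ** 2 for p in seg) + C

def find_monge_violation(P, C):
    """Return indices i1 < i2 < i3 < i4 breaking the concave quadrangle
    inequality w(i1,i4) + w(i2,i3) >= w(i1,i3) + w(i2,i4), or None."""
    n = len(P)
    for i1, i2, i3, i4 in combinations(range(n + 1), 4):
        lhs = w(P, C, i1, i4) + w(P, C, i2, i3)
        rhs = w(P, C, i1, i3) + w(P, C, i2, i4)
        if lhs < rhs - 1e-9:
            return (i1, i2, i3, i4)
    return None
```

On the example vector $(1, 2, 0, 2, 0, 2, 1)$ a violating 4-tuple exists, whereas a constant signal produces none.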
Let $S_i = \sum_{m=1}^{i} P_m$ and $DP_i = OPT_i - \sum_{m=1}^{i} P_m^2$. We claim that $DP_i$ is the solution to a simpler optimization problem:

Lemma 1. $DP_i$ satisfies the following optimization formula: $$ DP_i = \min_{0 \le j < i} \left( DP_j - \frac{(S_i - S_j)^2}{i - j} \right) + C. $$
Proof. As $\sum_{m=1}^{0} P_m^2 = 0$, the result is true for $i = 0$. If $i > 0$, substituting into equation (2) gives: $$ OPT_i = \min_{0 \le j < i} \left( DP_j + \sum_{m=1}^{j} P_m^2 + \sum_{m=j+1}^{i} P_m^2 - \frac{(S_i - S_j)^2}{i - j} \right) + C = \min_{0 \le j < i} \left( DP_j - \frac{(S_i - S_j)^2}{i - j} \right) + C + \sum_{m=1}^{i} P_m^2. $$
Eliminating the terms $\sum_{m=1}^{i} P_m^2$ from both sides gives our result. Observe that the second order moments of $OPT_i$ are absent from $DP_i$.
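Lemma 1 is easy to corroborate numerically: computing $OPT_i$ and $DP_i$ by their respective recurrences, the two must differ exactly by the prefix sum of squares (an illustrative check, not the paper's code):

```python
def opt_and_shifted(P, C):
    """Compute OPT_i directly and DP_i via the shifted recurrence
    DP_i = min_j (DP_j - (S_i - S_j)^2 / (i - j)) + C, with DP_0 = 0.
    Returns (OPT, DP, Q) where Q[i] = sum_{m<=i} P_m^2."""
    n = len(P)
    S = [0.0] * (n + 1)
    Q = [0.0] * (n + 1)
    for i, p in enumerate(P, 1):
        S[i] = S[i - 1] + p
        Q[i] = Q[i - 1] + p * p
    OPT = [0.0] * (n + 1)
    DP = [0.0] * (n + 1)
    for i in range(1, n + 1):
        # Original recurrence: segment cost = sum of squares minus mean term.
        OPT[i] = min(OPT[j] + (Q[i] - Q[j]) - (S[i] - S[j]) ** 2 / (i - j)
                     for j in range(i)) + C
        # Shifted recurrence from Lemma 1: second-order moments dropped.
        DP[i] = min(DP[j] - (S[i] - S[j]) ** 2 / (i - j) for j in range(i)) + C
    return OPT, DP, Q
```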
We use $\tilde{w}(j, i)$ to denote the shifted weight function, i.e., $\tilde{w}(j, i) = -\frac{(S_i - S_j)^2}{i - j} + C$; it is easy to check that $w(j, i) = \tilde{w}(j, i) + \sum_{m=j+1}^{i} P_m^2$.
Our proposed method (shown in Algorithm 1) uses the results of [22] to obtain a fast algorithm for the additive approximation variant of the problem. Specifically, the algorithm maintains a 4-dimensional halfspace query data structure. The algorithm then uses binary search to compute an accurate estimate of the value $DP_i$ for $i = 1, \ldots, n$. As errors are introduced at each step, we use $\widetilde{DP}_j$ to denote the approximate value of $DP_j$ calculated by earlier iterations of the binary search, and $\widehat{DP}_i$ to be the optimum value of the transition function computed using the approximate values $\widetilde{DP}_j$. Formally: $$ \widehat{DP}_i = \min_{0 \le j < i} \left( \widetilde{DP}_j - \frac{(S_i - S_j)^2}{i - j} \right) + C. $$
Theorem 3 shows that it suffices to approximate $DP_i$ to within an additive $\epsilon/n$ of $DP_i$ in order to approximate $DP_n$ within additive $\epsilon$. Let $U = \max\{\sqrt{C}, P_1, \ldots, P_n\}$; the maximum value of the objective function is upper-bounded by the cost one would incur from declaring there is a single interval with $x = 0$, giving a bound of $U^2 n$. Therefore $O(\log(\frac{U^2 n}{\epsilon/n})) = \tilde{O}(\log(U/\epsilon))$ iterations of binary search at each $i$ are sufficient. To check whether $\widehat{DP}_i \le x + C$, we need to solve the decision problem of whether there exists a $j < i$ such that the following inequality holds: $$ x \cdot i + S_i^2 \ge x \cdot j + i \cdot \widetilde{DP}_j + 2 S_i S_j - \left( S_j^2 + j \cdot \widetilde{DP}_j \right). $$
The term on the right hand side can be interpreted as the dot product between $(x, i, S_i, -1)$ and $(j, \widetilde{DP}_j, 2S_j, S_j^2 + j\widetilde{DP}_j)$. If we think of the values $(j, \widetilde{DP}_j, 2S_j, S_j^2 + j\widetilde{DP}_j)$ as points in $\mathbb{R}^4$, the decision problem becomes whether the intersection of a point set with a halfspace is empty. If the point set has size $n$, this can be done in $\tilde{O}(n^{0.5})$ time per query and $O(\log n)$ amortized time per insertion of a point [22]. So the value of $DP_i$ can be approximated within an additive error of $\epsilon/n$ using binary search in $\tilde{O}(n^{0.5} \log(U/\epsilon))$ time. Therefore, we can proceed along the indices from $1$ to $n$, find the approximately optimal value of $OPT_i$ and insert a point corresponding to it into the query structure, getting an algorithm that runs in $\tilde{O}(n^{1.5} \log(U/\epsilon))$ time. The following theorem states that a small error at each step suffices to give an overall good approximation: we show inductively that if each step introduces at most $\epsilon/n$ additive error, then $\widetilde{DP}_i$ is within $i\epsilon/n$ additive error of the optimal value $DP_i$. For the proof of Theorem 3, see the Appendix.

Theorem 3. Let $\widetilde{DP}_i$ be the approximation of our algorithm to $DP_i$. Then, the following inequality holds: $$ |\widetilde{DP}_i - DP_i| \le \frac{i \epsilon}{n}. $$
By substituting $i = n$ in Theorem 3 we obtain the following corollary, proving that our algorithm approximates the optimum within $\epsilon$ additive error.

Corollary 1. Let $\widetilde{DP}_n$ be the approximation of our algorithm to $DP_n$. Then $|\widetilde{DP}_n - DP_n| \le \epsilon$.
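The correctness of the lifting to $\mathbb{R}^4$ used in the halfspace reduction is a short algebraic identity (multiply the transition inequality by $i - j > 0$ and expand), which can be verified directly; the helper names below are mine:

```python
def lifted_point(j, DPj, Sj):
    """Lift index j and its DP value to a point in R^4."""
    return (j, DPj, 2.0 * Sj, Sj * Sj + j * DPj)

def query_vector(x, i, Si):
    """Query direction corresponding to threshold x at index i."""
    return (x, i, Si, -1.0)

def decision_direct(x, i, Si, j, DPj, Sj):
    """Direct check: does transition j beat threshold x at index i?"""
    return DPj - (Si - Sj) ** 2 / (i - j) <= x

def decision_halfspace(x, i, Si, j, DPj, Sj):
    """Same check phrased as a halfspace membership test in R^4."""
    q, p = query_vector(x, i, Si), lifted_point(j, DPj, Sj)
    dot = sum(a * b for a, b in zip(q, p))
    return dot <= x * i + Si * Si
```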
3.2 An $O(n \log^2 n/\epsilon)$ algorithm for $(1+\epsilon)$ multiplicative approximation
Once again, consider the transition function w in the optimization formula for OP T i .
Our approach is based on approximating $w$ with a logarithmic number of Monge functions, while incurring a multiplicative error of at most $(1+\epsilon)$. When viewed from the context of a shortest path problem, we are perturbing the weight of each edge by a factor of at most $(1+\epsilon)$. So as long as the weight of each edge is positive, the length of any path, and therefore the optimal answer computed in the perturbed version, is off by at most a $(1+\epsilon)$ factor as well.
The main idea of our algorithm is as follows: we break our initial problem into a small (logarithmic) number of Monge optimization subproblems which we can speed up using existing techniques, e.g., [12]. We achieve this by detecting which part of w(j, i) causes w(j, i) not to be Monge, and then by finding intervals in which we can approximate it accurately with a constant. We also make sure the optimal breakpoints of the Monge sub-problems lie in the specified subintervals by making the function outside that subinterval arbitrarily large while maintaining its Monge property.
Write $w'(j, i) = (i - j) \cdot \min_{x} \sum_{m=j+1}^{i} (P_m - x)^2$; in other words, $w(j, i) = w'(j, i)/(i - j) + C$. Since $\mathrm{Var}(X) = E(X^2) - E(X)^2$, an alternate formulation of $w'(j, i)$ is: $$ w'(j, i) = (i - j) \sum_{m=j+1}^{i} P_m^2 - (S_i - S_j)^2. $$
Lemma 2. The function $w'$ is Monge; that is, for any $i_1 \le i_2 \le i_3 \le i_4$, $w'(i_1, i_4) + w'(i_2, i_3) \ge w'(i_1, i_3) + w'(i_2, i_4)$.

Proof. Note that $w'$ can be rewritten as a sum over pairs: $w'(j, i) = \sum_{j < m_1 < m_2 \le i} (P_{m_1} - P_{m_2})^2$. Since each term in the summation is non-negative, it suffices to show that any pair $(m_1, m_2)$ is summed at least as many times on the left hand side as on the right hand side. If $i_2 + 1 \le m_1 < m_2 \le i_3$, the term is counted twice on each side. Otherwise, the term is counted once on the left hand side, since $i_1 + 1 \le m_1 < m_2 \le i_4$, and at most once on the right hand side, since being counted by both $w'(i_1, i_3)$ and $w'(i_2, i_4)$ would require $i_2 + 1 \le m_1 < m_2 \le i_3$.
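Both the pairwise-difference form of $w'$ and the resulting Monge inequality can be verified by brute force (an illustrative check, not the paper's code):

```python
from itertools import combinations

def w_prime(P, j, i):
    """w'(j, i) = (i - j) * sum_{m=j+1}^{i} P_m^2 - (S_i - S_j)^2 (1-indexed)."""
    seg = P[j:i]
    return len(seg) * sum(p * p for p in seg) - sum(seg) ** 2

def w_prime_pairs(P, j, i):
    """Equivalent pairwise form: sum of (P_m1 - P_m2)^2 over j < m1 < m2 <= i."""
    return sum((a - b) ** 2 for a, b in combinations(P[j:i], 2))
```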
Also, as $w'(j, i)$, $i - j$ and $C$ are all non-negative, approximating $w'(j, i)/(i - j)$ within a factor of $(1+\epsilon)$ gives an approximation of $w(j, i)$ within a factor of $(1+\epsilon)$. For each $i$, we approximate $i - j$ ($j < i$) with a constant $c'$ such that $c' \le i - j \le c'(1+\epsilon)$. Since $1 \le i - j \le n$, we only need $\log_{(1+\epsilon)} n = \log n / \log(1+\epsilon)$ distinct values of $c'$ for all transitions. Note that the constraint $c' \le i - j \le c'(1+\epsilon)$ is equivalent to $j$ being in a range $[l, r]$ determined by $i$ and $c'$.
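The geometric grid of scale constants can be sketched as follows (function names are mine, not the paper's):

```python
import math

def scales(n, eps):
    """Geometric grid c_k = (1+eps)^k: every length d in [1, n] lies in
    some interval [c_k, c_k * (1+eps)], so O(log n / eps) scales suffice."""
    cs = []
    c = 1.0
    while c <= n:
        cs.append(c)
        c *= (1.0 + eps)
    return cs

def scale_for(d, eps):
    """Index k of the scale covering length d = i - j, i.e. with
    (1+eps)^k <= d <= (1+eps)^(k+1)."""
    return int(math.log(d) / math.log(1.0 + eps)) if d > 1 else 0
```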
Since $c'$ is a constant, $w'(j, i)/c'$ is also a Monge function. However, notice that we can only use $j$ when $c' \le i - j \le c'(1+\epsilon)$. This constraint on pairs $(j, i)$ can be enforced by setting $w_k(j, i)$ to arbitrarily large values when $(j, i)$ do not satisfy the condition. This ensures that $j$ will not be used as a breakpoint for $i$. Furthermore, $w_k$ needs to be adjusted to remain Monge in order to keep the assumptions of Theorem 2 valid. Since $[i_1 + 1, i_4]$ is the longest range and $[i_2 + 1, i_3]$ is the shortest, one possibility is to assign exponentially large values to very long and very short ranges. The following is one possibility, where $M$ is an arbitrarily large positive constant: $$ w_k(j, i) = \begin{cases} 2^{\,i-j} M & \text{if } i - j > c'(1+\epsilon), \\ w'(j, i)/c' & \text{if } c' \le i - j \le c'(1+\epsilon), \\ 2^{\,-(i-j)} M & \text{if } i - j < c'. \end{cases} $$
Proof. As $M$ is arbitrarily large, we may assume $w_k(j, i) \ge w'(j, i)$. We assume $i_3 - i_1 \le i_4 - i_2$, since the mirror case can be considered similarly; comparing the corresponding cases of $w_k$ on both sides of the Monge condition yields two inequalities, and adding them gives the desired result.
So the equation for $OPT_i$ becomes: $$ OPT_i = \min_{1 \le k \le m} \min_{0 \le j < i} \left( OPT_j + w_k(j, i) + C \right). $$
Note that storing values of the form $2^k M$ using only their exponent $k$ suffices for comparison, so this adjustment does not result in any change in runtime. By Theorem 2, the Monge function optimization problem for each $w_k$ can be solved in $O(n \log n)$ time, giving a total runtime of $O(n \log^2 n/\epsilon)$. Pseudocode for this algorithm is shown in Algorithm 2. The algorithm implicitly uses $m = O(\log n/\epsilon)$ copies of the algorithm of Theorem 2.
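The approximation guarantee (though not the $O(n \log^2 n/\epsilon)$ running time) can be sanity-checked with a quadratic-time version that replaces $i - j$ by its scale constant; an illustrative sketch, not Algorithm 2:

```python
import math

def exact_dp(P, C):
    """Exact DP using w(j, i) = w'(j, i)/(i - j) + C."""
    n = len(P)
    S = [0.0] * (n + 1)
    Q = [0.0] * (n + 1)
    for i, p in enumerate(P, 1):
        S[i] = S[i - 1] + p
        Q[i] = Q[i - 1] + p * p
    def wp(j, i):  # w'(j, i) = (i - j) * segment sum of squares - (segment sum)^2
        return (i - j) * (Q[i] - Q[j]) - (S[i] - S[j]) ** 2
    OPT = [0.0] * (n + 1)
    for i in range(1, n + 1):
        OPT[i] = min(OPT[j] + wp(j, i) / (i - j) for j in range(i)) + C
    return OPT[n]

def approx_dp(P, C, eps):
    """Same recurrence, but i - j is rounded down to the nearest power of
    (1+eps): each edge weight w'(j, i)/(i - j) becomes w'(j, i)/c_k, which
    over- estimates it by at most a (1+eps) factor."""
    n = len(P)
    S = [0.0] * (n + 1)
    Q = [0.0] * (n + 1)
    for i, p in enumerate(P, 1):
        S[i] = S[i - 1] + p
        Q[i] = Q[i - 1] + p * p
    def wp(j, i):
        return (i - j) * (Q[i] - Q[j]) - (S[i] - S[j]) ** 2
    def c(d):
        return (1.0 + eps) ** int(math.log(d) / math.log(1.0 + eps)) if d > 1 else 1.0
    OPT = [0.0] * (n + 1)
    for i in range(1, n + 1):
        OPT[i] = min(OPT[j] + wp(j, i) / c(i - j) for j in range(i)) + C
    return OPT[n]
```

Since $c' \le i - j \le c'(1+\epsilon)$, every edge weight is preserved up to a $(1+\epsilon)$ factor, so the approximate optimum lies between the exact optimum and $(1+\epsilon)$ times it.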
In this work we presented two new techniques for optimizing dynamic programming that can handle cost functions not treated by other standard methods. The first algorithm approximates the optimal value of our objective function within additive $\epsilon$ error and runs in $\tilde{O}(n^{1.5} \log(U/\epsilon))$ time, where $U = \max_i P_i$. The efficiency of our algorithm is based on the work of Jiří Matoušek [22], since our binary search on the value of a constant-shifted variant of the objective reduces to performing halfspace queries. The second algorithm solves a similar recurrence within a multiplicative factor of $(1+\epsilon)$ and runs in $O(n \log^2 n/\epsilon)$ time. The new technique introduced by this algorithm is the decomposition of the initial problem into a small (logarithmic) number of Monge optimization subproblems, each of which can be sped up using existing techniques [12].
While the recurrences we solve are not treated exactly by existing techniques for speeding up dynamic programming, the results we obtain suggest that the $O(n^2)$ bound is not tight, i.e., there exists more structure which we can take advantage of. For example, the following lemma holds (see the Appendix for a proof): if two points $P_{i_1}$ and $P_{i_2}$ differ by a sufficiently large amount relative to $C$, then in the optimal solution of the dynamic programming using the $L_2$ norm, $i_1$ and $i_2$ are in different segments.
In future work, we plan to exploit the structure inherent to our problem to obtain a faster, exact algorithm.