A deterministic algorithm for fitting a step function to a weighted point-set


📝 Original Info

  • Title: A deterministic algorithm for fitting a step function to a weighted point-set
  • ArXiv ID: 1109.1152
  • Date: 2012-10-12
  • Authors: Herve Fournier and Antoine Vigneron

📝 Abstract

Given a set of n points in the plane, each point having a positive weight, and an integer k>0, we present an optimal O(n log n)-time deterministic algorithm to compute a step function with k steps that minimizes the maximum weighted vertical distance to the input points. It matches the expected time bound of the best known randomized algorithm for this problem. Our approach relies on Cole's improved parametric searching technique.

💡 Deep Analysis

Figure 1

📄 Full Content

A function $f : \mathbb{R} \to \mathbb{R}$ is called a k-step function if there exists a real sequence $a_1 < \dots < a_{k-1}$ such that the restriction of f to each of the intervals $(-\infty, a_1)$, $[a_i, a_{i+1})$ and $[a_{k-1}, +\infty)$ is constant. A weighted point in the plane is a triplet $p = (x, y, w) \in \mathbb{R}^3$, where $(x, y) \in \mathbb{R}^2$ represents the coordinates of p and $w > 0$ is a weight associated with p. We use $d(p, f)$ to denote the weighted vertical distance $w \cdot |f(x) - y|$ between p and f. For a set of weighted points P, we define the distance $d(P, f)$ between P and a step function f as:

$$d(P, f) = \max_{p \in P} d(p, f).$$

Given the point-set P , our goal is to find a k-step function f that minimizes d(P, f ).
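
As a concrete reading of these definitions, here is a minimal Python sketch. It assumes that a k-step function is stored as its sorted breakpoints $a_1, \dots, a_{k-1}$ plus one constant per piece. The names `step_eval` and `weighted_distance` are hypothetical helpers for illustration, not code from the paper.

```python
from bisect import bisect_right


def step_eval(breaks, values, x):
    """Evaluate a k-step function at x (hypothetical representation).

    breaks: a_1 < ... < a_{k-1}; values: the k constants, one per piece
    (-inf, a_1), [a_1, a_2), ..., [a_{k-1}, +inf).
    """
    if len(values) != len(breaks) + 1:
        raise ValueError("need exactly one more value than breakpoints")
    # bisect_right counts the breakpoints <= x, which is the piece index,
    # so x = a_i falls into the half-open piece [a_i, a_{i+1}).
    return values[bisect_right(breaks, x)]


def weighted_distance(points, breaks, values):
    """d(P, f): the maximum of w * |f(x) - y| over the points (x, y, w)."""
    return max(w * abs(step_eval(breaks, values, x) - y) for x, y, w in points)
```

For example, with P = [(0, 1, 2), (1, 3, 1), (2, 3, 5)], breakpoint list [1] and values [1.5, 3], only the first point is off its step. Its contribution is 2 · 0.5, so the distance is 1.0.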

This histogram construction problem is motivated by database applications, where one wants a compact representation of the dataset that fits in main memory, so as to optimize query processing [6]. The unweighted version, where $w_i = 1$ for all i, has been studied extensively, and optimal algorithms for it are now known. (See our previous article [5] and references therein.)

The weighted case was first considered by Guha and Shim [6], who gave an $O(n \log n + k^2 \log^6 n)$-time algorithm. Lopez and Mayster [10] gave an $O(n^2)$-time algorithm, which is therefore faster for large values of k. Then Fournier and Vigneron [5] gave an $O(n \log^4 n)$-time algorithm, which was further improved to $O(\min(n \log^2 n,\ n \log n + k^2 \log\frac{n}{k} \log n \log\log n))$ by Chen and Wang [2]. Eventually, Liu [9] obtained an optimal randomized algorithm with O(n log n) expected running time. In this note, we present a deterministic counterpart to Liu's algorithm, which runs in O(n log n) time. This time bound is optimal, as the unweighted case already requires Ω(n log n) time [5].

Our approach combines ideas from previous work on this problem [6,7] with the improved parametric searching technique by Cole [4].

Our result has a direct application to the k-center problem on the real line. Given a set of n points $r_1 < \dots < r_n \in \mathbb{R}$ with weights $w_1, \dots, w_n$, the goal is to find a set of k centers $c_1, \dots, c_k \in \mathbb{R}$ that minimizes the maximum over i of the weighted distance $w_i \, d(r_i, \{c_1, \dots, c_k\})$. Given such an instance of the weighted k-center problem, we construct an instance of our step-function approximation problem whose input points are $p_i = (i, r_i)$ for $i = 1, \dots, n$, keeping the same weights $w_1, \dots, w_n$. The two problems are then equivalent: the y-coordinates of the k steps of an optimal step function give an optimal set of k centers. So our algorithm also solves the weighted k-center problem on the line in O(n log n) time, improving on a recent result by Chen and Wang [3].
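
To make one direction of this equivalence concrete, here is a minimal Python sketch. It assumes the $r_i$ are sorted, as in the text. The function names (`kcenter_cost`, `reduction_points`, `centers_to_step_values`, `step_distance`) are hypothetical. The sketch builds the points $p_i = (i, r_i)$ and turns a set of centers into a step function by sending each $r_i$ to its nearest center. It then compares the two objectives.

```python
from bisect import bisect_left


def kcenter_cost(r, w, centers):
    """Weighted k-center objective: max_i w_i * min_j |r_i - c_j|."""
    return max(wi * min(abs(ri - c) for c in centers) for ri, wi in zip(r, w))


def reduction_points(r, w):
    """The reduction from the text: p_i = (i, r_i), same weights (i is 1-based)."""
    return [(i, ri, wi) for i, (ri, wi) in enumerate(zip(r, w), start=1)]


def centers_to_step_values(r, centers):
    """Map each r_i to its nearest center (ties go to the smaller center).

    Because r is sorted, the chosen center never decreases as i grows,
    so f(i) = values[i-1] is a step function with at most len(centers) steps.
    """
    cs = sorted(centers)
    values = []
    for ri in r:
        j = bisect_left(cs, ri)
        # Only the centers immediately left and right of r_i can be nearest.
        near = cs[max(j - 1, 0):j + 1]
        values.append(min(near, key=lambda c: (abs(ri - c), c)))
    return values


def step_distance(points, values):
    """d(P, f) for the step function with f(i) = values[i-1] at x = i."""
    return max(w * abs(v - y) for (_, y, w), v in zip(points, values))
```

Take r = [0, 1, 5, 6, 10], w = [1, 2, 1, 1, 3] and centers [1, 6]. The step values are [1, 1, 6, 6, 6], and both objectives equal 12. The converse direction also holds. The distinct values of any k-step function fitted to the $p_i$ form at most k centers. Each $r_i$ is at least as close to its nearest such value as to $f(i)$, so the k-center cost never exceeds $d(P, f)$.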

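We consider an input set of weighted points $P = \{(x_i, y_i, w_i) \mid 1 \le i \le n\}$ and an integer $k > 0$. Let $\varepsilon^*$ denote the optimal distance from P to a k-step function, that is,

$$\varepsilon^* = \min \{ d(P, f) \mid f \text{ is a } k\text{-step function} \}.$$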

Karras et al. [7] made the following observation.

Lemma 1. Given a set of n weighted points sorted with respect to their x-coordinates, an integer $k > 0$ and a real $\varepsilon > 0$, one can decide in O(n) time whether $\varepsilon \ge \varepsilon^*$.

The above lemma is obtained by a greedy method that goes through the points from left to right and creates a new step whenever necessary. More than k steps are created in this process if and only if $\varepsilon < \varepsilon^*$. As a consequence, once $\varepsilon^*$ is known, an optimal k-step function can be built in linear time by running this algorithm with $\varepsilon = \varepsilon^*$.
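
The greedy method can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. Points are (x, y, w) tuples already sorted by x. A point accepts exactly the step values c with $w |c - y| \le \varepsilon$, i.e. the interval $[y - \varepsilon/w,\ y + \varepsilon/w]$. Points sharing an x-coordinate are assumed to lie on the same step. The name `is_feasible` is hypothetical.

```python
from itertools import groupby


def is_feasible(points, k, eps):
    """Greedy decision in the spirit of Lemma 1 (a sketch, not the authors' code).

    points: list of (x, y, w) tuples sorted by x, with w > 0.
    Returns True iff some k-step function f satisfies d(P, f) <= eps,
    i.e. iff eps >= eps*. Runs in O(n) time.
    """
    steps = 0
    lo = hi = None  # interval of values the current step may still take
    for _, group in groupby(points, key=lambda p: p[0]):
        members = list(group)
        # Points with equal x must share one step, so intersect their
        # intervals [y - eps/w, y + eps/w] of admissible step values.
        g_lo = max(y - eps / w for _, y, w in members)
        g_hi = min(y + eps / w for _, y, w in members)
        if g_lo > g_hi:
            return False  # no single value fits this column of points
        if steps > 0 and max(lo, g_lo) <= min(hi, g_hi):
            lo, hi = max(lo, g_lo), min(hi, g_hi)  # extend the current step
        else:
            steps += 1  # open a new step at this x-coordinate
            if steps > k:
                return False
            lo, hi = g_lo, g_hi
    return True
```

Extending the current step for as long as its interval stays nonempty never hurts, which is why a single left-to-right pass suffices. For the points (0, 0, 1), (1, 10, 1), (2, 0, 1) and k = 1, the call succeeds for ε = 5 and fails for ε = 4.9.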

A second observation, made by Guha and Shim [6], is the following. The distance from a point $p = (x_i, y_i, w_i)$ to the constant function c is $d(p, c) = w_i \cdot |c - y_i|$. Hence, for a nonempty subset $Q \subseteq P$ of the input points, the distance $\min\{d(Q, f) \mid f \text{ is a constant function}\}$ between Q and the closest constant function is the minimum y-coordinate of the points in the region $U_Q$ defined as:

$$U_Q = \{(x, y) \in \mathbb{R}^2 \mid y \ge w_i \, |x - y_i| \text{ for all } (x_i, y_i, w_i) \in Q\}.$$

In other words, the distance between Q and the closest 1-step function is the y-coordinate of the lowest vertex of the upper envelope $U_Q$ of the lines with equations $y = \pm w_i (x - y_i)$ corresponding to the points $(x_i, y_i, w_i) \in Q$. (There is only one lowest vertex, as the slopes $\pm w_i$ are nonzero.)
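
A small worked computation may help here. The lowest vertex of $U_Q$ lies where an increasing line $y = w_i(x - y_i)$ meets a decreasing line $y = -w_j(x - y_j)$. That is the point $x = \frac{w_i y_i + w_j y_j}{w_i + w_j}$, $y = \frac{w_i w_j (y_j - y_i)}{w_i + w_j}$. The Python sketch below relies on one extra fact, Helly's theorem on the line. The intervals $[y_i - \varepsilon/w_i,\ y_i + \varepsilon/w_i]$ share a point exactly when every two of them intersect. So the optimum comes from the pair with the highest crossing. The name `one_step_fit` is hypothetical. The method is a quadratic brute force for illustration, not an upper-envelope algorithm.

```python
from itertools import combinations


def one_step_fit(Q):
    """Best constant c for Q and its distance max w_i * |c - y_i| (sketch).

    Q: nonempty list of (x, y, w) tuples; only y and w matter here.
    Checks every pair (i, j): the crossing of y = w_i (x - y_i) with
    y = -w_j (x - y_j) has height w_i w_j |y_j - y_i| / (w_i + w_j)
    (the absolute value covers both orientations of the pair).
    O(m^2) time for |Q| = m; a real envelope computation is faster.
    """
    if not Q:
        raise ValueError("Q must be nonempty")
    _, y0, _ = Q[0]
    best_eps, best_c = 0, y0  # a single point is fitted exactly
    for (_, yi, wi), (_, yj, wj) in combinations(Q, 2):
        eps = wi * wj * abs(yj - yi) / (wi + wj)
        if eps > best_eps:
            # The common point of all intervals is this pair's crossing.
            best_eps = eps
            best_c = (wi * yi + wj * yj) / (wi + wj)
    return best_eps, best_c
```

For Q = [(0, 0, 1), (1, 3, 2)], the sketch returns distance 2 at c = 2. Both points are then at weighted distance exactly 2 from the constant.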

An immediate consequence is the following. For $i \in \{1, \dots, n\}$, let $\ell_{2i-1}$ be the line with equation $y = w_i (x - y_i)$, and $\ell_{2i}$ the line with equation $y = -w_i (x - y_i)$. Let $L = \{\ell_1, \dots, \ell_{2n}\}$. We denote by A(L) the arrangement of these lines. (See Figure 1.)

The optimal distance $\varepsilon^*$ from a set of weighted points P to a k-step function is the y-coordinate of a vertex of A(L).
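
This fact already yields a naive exact algorithm. The Python sketch below enumerates candidate vertex heights and binary searches them with a greedy decision procedure. The only candidates are 0, where $\ell_{2i-1}$ meets $\ell_{2i}$, and the heights where an increasing line meets a decreasing one. This is a brute-force baseline with O(n^2) candidates and O(n^2 log n) total time, not the paper's O(n log n) method. The names `is_feasible` and `optimal_distance_bruteforce` are hypothetical. Exact `Fraction` arithmetic is an assumption made so that intervals touching at $\varepsilon^*$ are not lost to rounding.

```python
from fractions import Fraction
from itertools import combinations, groupby


def is_feasible(points, k, eps):
    """Greedy decision (same idea as Lemma 1): True iff eps >= eps*."""
    steps, lo, hi = 0, None, None
    for _, group in groupby(points, key=lambda p: p[0]):
        members = list(group)  # points with equal x share one step
        g_lo = max(y - eps / w for _, y, w in members)
        g_hi = min(y + eps / w for _, y, w in members)
        if g_lo > g_hi:
            return False
        if steps > 0 and max(lo, g_lo) <= min(hi, g_hi):
            lo, hi = max(lo, g_lo), min(hi, g_hi)
        else:
            steps += 1
            if steps > k:
                return False
            lo, hi = g_lo, g_hi
    return True


def optimal_distance_bruteforce(points, k):
    """eps* by binary search over candidate vertex heights of A(L) (naive sketch).

    Candidates: 0 and w_i w_j |y_i - y_j| / (w_i + w_j) for every pair.
    Requires k >= 1. Inputs are converted to exact Fractions.
    """
    pts = sorted((x, Fraction(y), Fraction(w)) for x, y, w in points)
    cands = {Fraction(0)}
    for (_, yi, wi), (_, yj, wj) in combinations(pts, 2):
        cands.add(wi * wj * abs(yi - yj) / (wi + wj))
    cands = sorted(cands)
    # The largest candidate is always feasible: it fits a 1-step function.
    lo, hi = 0, len(cands) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(pts, k, cands[mid]):
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]
```

The decision procedure is monotone in ε, so the binary search over the sorted candidates returns the smallest feasible vertex height, which is $\varepsilon^*$. Avoiding the explicit list of Θ(n²) vertices is exactly what the parametric search described next is for.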

The deterministic algorithm presented here is obtained by searching the vertices of A(L) while calling the decision procedure of Lemma 1 only O(log n) times, with O(n log n) additional time overall. We achieve this by applying Cole's improved parametric searching technique [4]: (ii) There exists a linear order $\prec$ on the set $\{(i, j) \mid 1 \le i < j \le n\}$ such that

and such that we can decide whether $(i, j) \prec (i', j')$ in O(1) time.

Then the array A can be sorted in O(n log n) time.

We briefly explain Cole’s method. Recall


Reference

This content is AI-processed based on open access ArXiv data.
