Online Sum-Radii Clustering

Dimitris Fotakis$^1$ and Paraschos Koutris$^2$

$^1$ School of Electrical and Computer Engineering, National Technical University of Athens, 157 80 Athens, Greece. fotakis@cs.ntua.gr
$^2$ Computer Science and Engineering, University of Washington, U.S.A. pkoutris@cs.washington.edu

Abstract. In Online Sum-Radii Clustering, $n$ demand points arrive online and must be irrevocably assigned to a cluster upon arrival. The cost of each cluster is the sum of a fixed opening cost and its radius, and the objective is to minimize the total cost of the clusters opened by the algorithm. We show that the deterministic competitive ratio of Online Sum-Radii Clustering for general metric spaces is $\Theta(\log n)$, where the upper bound follows from a primal-dual algorithm and holds for general metric spaces, and the lower bound is valid for ternary Hierarchically Well-Separated Trees (HSTs) and for the Euclidean plane. Combined with the results of (Csirik et al., MFCS 2010), this result demonstrates that the deterministic competitive ratio of Online Sum-Radii Clustering changes abruptly, from constant to logarithmic, when we move from the line to the plane. We also show that Online Sum-Radii Clustering in metric spaces induced by HSTs is closely related to the Parking Permit problem introduced by (Meyerson, FOCS 2005). Exploiting the relation to Parking Permit, we obtain a lower bound of $\Omega(\log\log n)$ on the randomized competitive ratio of Online Sum-Radii Clustering in tree metrics. Moreover, we present a simple randomized $O(\log n)$-competitive algorithm, and a deterministic $O(\log\log n)$-competitive algorithm for the fractional version of the problem.
Keywords: Online Algorithms, Competitive Analysis, Sum-$k$-Radii Clustering

This work was supported by the project AlgoNow, co-financed by the European Union (European Social Fund - ESF) and Greek national funds, through the Operational Program “Education and Lifelong Learning”, under the research funding program THALES. Part of this work was done while P. Koutris was with the School of Electrical and Computer Engineering, National Technical University of Athens, Greece. An extended abstract of this work appeared in the Proceedings of the 37th Symposium on Mathematical Foundations of Computer Science (MFCS 2012), Branislav Rovan, Vladimiro Sassone, and Peter Widmayer (Editors), Lecture Notes in Computer Science 7464, pp. 395-406, Springer, 2012.

1 Introduction

In clustering problems, we seek a partitioning of $n$ demand points into $k$ groups, or clusters, so that a given objective function, which depends on the distances between points in the same cluster, is minimized. Typical examples are the $k$-Center problem, where we minimize the maximum cluster diameter, the Sum-$k$-Radii problem, where we minimize the sum of cluster radii, and the $k$-Median problem, where we minimize the total distance of points to the nearest cluster center. These are fundamental problems in Computer Science, with many important applications, and have been extensively studied from an algorithmic viewpoint (see e.g., [19] and the references therein).

In this work, we study an online clustering problem closely related to Sum-$k$-Radii. In the online setting, the demand points arrive one-by-one and must be irrevocably assigned to a cluster upon arrival. Specifically, when a new demand point $u$ arrives, if it is not covered by an open cluster, the algorithm has to open a new cluster covering $u$ and to assign $u$ to it. Opening a new cluster means that the algorithm must irrevocably fix the center and the radius of the new cluster. We emphasize that once formed, clusters cannot be merged, split, or have their center or radius changed. The goal is to open few clusters with a small sum of radii. However, instead of requiring that at most $k$ clusters open, which would lead to an unbounded competitive ratio, we follow [7,8] and consider a Facility-Location-like relaxation of Sum-$k$-Radii, called Sum-Radii Clustering. In Sum-Radii Clustering, the cost of each cluster is the sum of a fixed opening cost and its radius, and we seek to minimize the total cost of the clusters opened by the algorithm.

In addition to clustering and data analysis, Sum-Radii Clustering has applications to the problem of base station placement for the design of wireless networks where users are scattered to various locations (see e.g., [8,3,16]). In such problems, we place some wireless base stations and set up their communication range so that the communication demands are satisfied and the total setup and operational cost is minimized. A standard assumption is that the setup cost is proportional to the number of stations installed, and the operational cost for each station is proportional to the energy consumption, which is typically modeled by a low-degree polynomial of its range. In Sum-Radii Clustering, we study the particular variant where the operational cost has a linear dependence on the range.

1.1 Previous Work

In the offline setting, Sum-$k$-Radii and the closely related problem of Sum-$k$-Diameters$^3$ have been thoroughly studied. Sum-$k$-Radii is NP-hard even in metric spaces of constant doubling dimension [15]. Gibson et al. [14] proved that Sum-$k$-Radii in Euclidean spaces of constant dimension is polynomially solvable, and presented an $O(n^{\log \Delta \log n})$-time algorithm for Sum-$k$-Radii in general metric spaces, where $\Delta$ is the diameter [15]. As for approximation algorithms, Doddi et al.
[10] proved that it is NP-hard to approximate Sum-$k$-Diameters in general metric spaces within a factor less than $2$, and gave a bicriteria algorithm that achieves a logarithmic approximation using $O(k)$ clusters. Subsequently, Charikar and Panigrahy [7] presented a primal-dual $(3.504 + \varepsilon)$-approximation algorithm for Sum-$k$-Radii in general metric spaces, which uses as a building block a primal-dual $3$-approximation algorithm for Sum-Radii Clustering. Biló et al. [3] considered a generalization of Sum-$k$-Radii, where the cost is the sum of the $\alpha$-th powers of the cluster radii, for $\alpha \ge 1$, and presented a polynomial-time approximation scheme for Euclidean spaces of constant dimension.

Charikar and Panigrahy [7] also considered the incremental version of Sum-$k$-Radii. Similarly to the online setting, an incremental algorithm processes the demands one-by-one and assigns them to a cluster upon arrival. However, an incremental algorithm can also merge any of its clusters at any time. They presented an $O(1)$-competitive incremental algorithm for Sum-$k$-Radii that uses $O(k)$ clusters.

In the online setting, where cluster reconfiguration is not allowed, the Unit Covering and the Unit Clustering problems have received considerable attention. In both problems, the demand points arrive one-by-one and must be irrevocably assigned to unit-radius balls upon arrival, so that the number of balls used is minimized. The difference is that in Unit Covering, the center of each ball is fixed when the ball is first used, while in Unit Clustering, there is no fixed center and a ball may shift and cover more demands. Charikar et al. [6] proved an upper bound of $O(2^d d \log d)$ and a lower bound of $\Omega(\log d / \log\log\log d)$ on the deterministic competitive ratio of Unit Covering in $d$ dimensions. The results of [6] imply a competitive ratio of $2$ and $4$ for Unit Covering on the line and the plane, respectively.
The Unit Clustering problem was introduced by Chan and Zarrabi-Zadeh [5]. The deterministic competitive ratio of Unit Clustering on the line is at most $5/3$ [11] and no less than $8/5$ [12]. Unit Clustering has also been studied in $d$ dimensions with respect to the $L_\infty$ norm, where the competitive ratio is at most $\frac{5}{6} 2^d$, for any $d$, and no less than $13/6$, for $d \ge 2$ [11].

Departing from this line of work, Csirik et al. [8] studied online Clustering to minimize the sum of the Setup Costs and the Diameters of the clusters, or CSDF in short. Motivated by the difference between Unit Covering and Unit Clustering, they considered three models, the strict, the intermediate, and the flexible one, depending on whether the center and the radius of a new cluster are fixed at its opening time. Csirik et al. only studied CSDF on the line and proved that its deterministic competitive ratio is $1 + \sqrt{2}$ for the strict and the intermediate model and $(1 + \sqrt{5})/2$ for the flexible model. Recently, Divéki and Imreh [9] studied online clustering in two dimensions to minimize the sum of the setup costs and the area of the clusters. They proved that the competitive ratio of this problem lies in $(2.22, 9]$ for the strict model and in $(1.56, 7]$ for the flexible model.

$^3$ These problems are closely related in the sense that a $c$-competitive algorithm for Sum-$k$-Radii implies a $2c$-competitive algorithm for Sum-$k$-Diameters, and vice versa.

1.2 Contribution

Following [8], it is natural and interesting to study the online clustering problem of CSDF in metric spaces more general than the line, such as trees, the Euclidean plane, and general metric spaces. In this work, we consider the closely related problem of Online Sum-Radii Clustering (OnlSumRad), and give upper and lower bounds on its deterministic and randomized competitive ratio for general metric spaces and for the Euclidean plane.
We restrict our attention to the strict model of [8], where the center and the radius of each new cluster are fixed at opening time. To justify our choice, we show that a $c$-competitive algorithm for the strict model implies an $O(c)$-competitive algorithm for the intermediate and the flexible model.

In Sections 4 and 5, we prove that the deterministic competitive ratio of OnlSumRad for general metric spaces is $\Theta(\log n)$, where the upper bound follows from a primal-dual algorithm, and the lower bound is valid for ternary Hierarchically Well-Separated Trees (HSTs) and for the Euclidean plane. This result is particularly interesting because it demonstrates that the deterministic competitive ratio of OnlSumRad (and of CSDF) changes abruptly, from constant to logarithmic, when we move from the line to the plane. Interestingly, this does not happen when the cost of each cluster is proportional to its area [9].

Another interesting finding is that OnlSumRad in metric spaces induced by HSTs is closely related to the Parking Permit problem introduced by Meyerson [18]. In Parking Permit, we cover a set of driving days by choosing among $K$ permit types, each with a given cost and duration. The permit costs are concave, in the sense that the cost per day decreases with the duration. The algorithm is informed of the driving days in an online fashion, and irrevocably decides on the permits to purchase, so that all driving days are covered by a permit and the total cost is minimized. Meyerson [18] proved that the competitive ratio of Parking Permit is $\Theta(K)$ for deterministic and $\Theta(\log K)$ for randomized algorithms. In Section 3, we prove that OnlSumRad in HSTs with $K + 1$ levels is a generalization of the Parking Permit problem with $K$ permit types. Combined with the randomized lower bound of [18], this implies a lower bound of $\Omega(\log\log n)$ on the randomized competitive ratio of OnlSumRad.
Moreover, we show that, under some assumptions, a $c$-competitive algorithm for Parking Permit with $K$ types implies a $c$-competitive algorithm for OnlSumRad in HSTs with $K$ levels.

We conclude, in Sections 6 and 7, with a simple randomized $O(\log n)$-competitive algorithm, and a deterministic $O(\log\log n)$-competitive algorithm for the fractional version of OnlSumRad. Both algorithms work for general metric spaces. The randomized algorithm is memoryless, in the sense that it keeps in memory only its solution, i.e., the centers and the radii of its clusters. The fractional algorithm is based on the primal-dual approach of [2,1], and generalizes the fractional algorithm of [18] for Parking Permit.

1.3 Other Related Work

OnlSumRad is a special case of Online Set Cover [2] with sets of different weight. The work of [2] presents a nearly optimal deterministic $O(\log m \log N)$-competitive algorithm, where $N$ is the number of elements and $m$ is the number of sets. Moreover, if all sets have the same weight and each element belongs to at most $d$ sets, the competitive ratio can be improved to $O(\log d \log N)$. If we cast OnlSumRad as a special case of Online Set Cover, $N$ is the number of points in the metric space, which can be much larger than the number of demands $n$, $m = \Omega(n)$, and $d = O(\log n)$. Hence, a direct application of the algorithm of [2] to OnlSumRad does not lead to an optimal deterministic competitive ratio. This holds even if one could possibly extend the improved ratio of $O(\log d \log N)$ to the weighted set structure of OnlSumRad.

At the conceptual level, OnlSumRad is related to the problem of Online Facility Location [17,13]. However, the two problems exhibit a different behavior with respect to their competitive ratio, since the competitive ratio of Online Facility Location is $\Theta\left(\frac{\log n}{\log\log n}\right)$, even on the line, for both deterministic and randomized algorithms [13].
2 Notation, Problem Definition, and Preliminaries

We consider a metric space $(M, d)$, where $M$ is the set of points and $d : M \times M \mapsto \mathbb{N}$ is the distance function, which is non-negative, symmetric, and satisfies the triangle inequality. For a set of points $M' \subseteq M$, we let $\mathrm{diam}(M') \equiv \max_{u,v \in M'} \{ d(u, v) \}$ denote the diameter, and let $\mathrm{rad}(M') \equiv \min_{u \in M'} \max_{v \in M'} \{ d(u, v) \}$ denote the radius of $M'$.

In a tree metric, the points correspond to the nodes of an edge-weighted tree and the distances are given by the tree's shortest path metric. For some $\alpha > 1$, a Hierarchically $\alpha$-Well-Separated Tree ($\alpha$-HST) is a complete rooted tree with lengths on its edges that satisfies the following properties: (i) the edge length from any node to each of its children is the same, and (ii) the edge lengths along any path from the root to a leaf decrease by a factor of at least $\alpha$ on each level. We say that an $\alpha$-HST is strict if the distance of each leaf to its parent is $1$, and the edge lengths along any path from the root to a leaf decrease by a factor of $\alpha$ on each level. Thus, in a strict $\alpha$-HST, the distance of any node $v_k$ at level $k$ to its children is $\alpha^{k-1}$ and the distance of $v_k$ to the nearest leaf is $(\alpha^k - 1)/(\alpha - 1)$. We usually identify a tree with the metric space induced by it.

A cluster $C(p, r) \equiv \{ v : d(p, v) \le r \}$ is determined by its center $p$ and its radius $r$, and consists of all points within a distance at most $r$ to $p$. The cost of a cluster $C(p, r)$ is the sum of its opening cost $f$ and its radius $r$.

Sum-Radii Clustering. In the offline version of Sum-Radii Clustering, we are given a metric space $(M, d)$, a cluster opening cost $f$, and a set $D = \{ u_1, \ldots, u_n \}$ of demand points in $M$. The goal is to find a collection of clusters $C(p_1, r_1), \ldots, C(p_k, r_k)$ that cover all demand points in $D$ and minimize the total cost, which is $\sum_{i=1}^{k} (f + r_i)$.

Online Sum-Radii Clustering.
In the online setting, the demand points arrive one-by-one and must be irrevocably assigned to an open cluster upon arrival. Formally, the input to Online Sum-Radii Clustering (OnlSumRad) consists of the cluster opening cost $f$ and a sequence $u_1, \ldots, u_n$ of (not necessarily distinct) demand points in an underlying metric space $(M, d)$. The goal is to maintain a set of clusters of minimum total cost that cover all demand points revealed so far.

In this work, we focus on the so-called Fixed-Cluster version of OnlSumRad, where the center and the radius of each new cluster are irrevocably fixed when the cluster opens. Thus, the online algorithm maintains a collection of clusters, which is initially empty. Upon arrival of a new demand $u_j$, if $u_j$ is not covered by an open cluster, the algorithm opens a new cluster $C(p, r)$ that includes $u_j$, and assigns $u_j$ to it. The algorithm incurs an irrevocable cost of $f + r$ for the new cluster $C(p, r)$.

Competitive Ratio. We evaluate the performance of online algorithms using competitive analysis (see e.g. [4]). A (randomized) algorithm is $c$-competitive if for any sequence of demand points, its (expected) cost is at most $c$ times the cost of the optimal solution for the corresponding offline Sum-Radii instance. The (expected) cost of the algorithm is compared against the cost of an optimal offline algorithm that is aware of the entire demand sequence in advance and has no computational restrictions whatsoever.

Simplified Optimal. The following proposition simplifies the structure of the optimal solution considered in the competitive analysis of our algorithms. It shows that, at the cost of a factor of $2$ in the competitive ratio, we may assume that each optimal cluster has a radius of $2^k f$, for some integer $k \ge 0$.

Proposition 1. Let $S$ be a feasible solution of an instance $I$ of OnlSumRad.
Then, there is a feasible solution $S'$ of $I$ with a cost of at most twice the cost of $S$, where each cluster has a radius of $2^k f$, for some integer $k \ge 0$.

Proof. For each cluster $C(v, r)$ of $S$, the new solution $S'$ opens a cluster $C(v, 2^k f)$, where $k = \max\{ \lceil \log_2(r/f) \rceil, 0 \}$. Clearly, $C(v, 2^k f)$ covers all demand points in $C(v, r)$, and thus $S'$ is a feasible solution. As for the cost of $S'$, we next show that the cost of $C(v, 2^k f)$, which is $(1 + 2^k) f$, is at most twice the cost of $C(v, r)$, which is $f + r$. If $r < f$, in which case $k = 0$, the cost of $C(v, 2^k f)$ is $2f \le 2(f + r)$. If $r \ge f$, $2^k f \le 2^{1 + \log_2(r/f)} f = 2r$. Therefore, the cost of $C(v, 2^k f)$ is at most $f + 2r \le 2(f + r)$. $\square$

Other Versions of Online Sum-Radii Clustering. For completeness, we discuss two seemingly less restricted versions of OnlSumRad, corresponding to the intermediate and the flexible model in [8], and show that they are essentially equivalent to the Fixed-Cluster version considered in this work. In both versions, the demands are irrevocably assigned to a cluster upon arrival. In the Fixed-Radius version, only the radius of a new cluster is fixed when the cluster opens. The algorithm incurs an irrevocable cost of $f + r$ for each new cluster $C$ of radius $r$. Then, new demands can be assigned to $C$, provided that $\mathrm{rad}(C) \le r$. In the Flexible-Cluster version, a cluster $C$ is a set of demands with neither a fixed center nor a fixed radius. The algorithm's cost for each cluster $C$ is $f + \mathrm{rad}(C)$, where $\mathrm{rad}(C)$ may increase as new demands are added to $C$.

The Fixed-Cluster version is a restriction of the Fixed-Radius version, which, in turn, is a restriction of the Flexible-Cluster version. The following proposition shows that the competitive ratios of the three versions are within a constant factor from each other.

Proposition 2. A $c$-competitive algorithm for the Fixed-Radius (resp.
Flexible-Cluster) version implies a $2c$-competitive (resp. $10c$-competitive) algorithm for the Fixed-Cluster version.

Proof. We first assume a $c$-competitive algorithm $A$ for the Fixed-Radius version. Based on $A$, we describe an algorithm $A'$ for the Fixed-Cluster version that simulates the behavior of $A$ and has a competitive ratio of at most $2c$. Whenever the algorithm $A$ opens a new cluster $C$ of radius $r$ and covers a new demand $u$, the algorithm $A'$ opens a new cluster $C' = C(u, 2r)$. The cost of $C'$ is at most twice the cost of $C$. Moreover, since any subsequent demand $u'$ assigned to $C$ by $A$ is at distance at most $2r$ to $u$, the new cluster $C'$ also covers $u'$. Therefore, $A'$ covers all demands with a total cost at most twice the total cost of $A$.

Next, we assume a $c$-competitive algorithm $A$ for the Flexible-Cluster version. Based on $A$, we describe an algorithm $A'$ for the Fixed-Cluster version that simulates the behavior of $A$ and has a competitive ratio of at most $10c$. Let $u$ be a new demand assigned to a cluster $C$ by the algorithm $A$. If $C$ is a new cluster that includes only $u$, $A'$ opens a new cluster $C(u, f)$ and assigns $u$ to it. Otherwise, let $\hat{u}$ be the demand in $C$ that arrived first. If $u$ is covered by an open cluster of $A'$ centered at $\hat{u}$, $u$ is assigned to it. Otherwise, $A'$ opens a new cluster $C(\hat{u}, 2^k f)$, where $k = \lceil \log_2(d(\hat{u}, u)/f) \rceil$, and assigns $u$ to it.

We compare the total cost of $A$ and $A'$ for covering the demands in cluster $C$ just after the assignment of $u$. The cost of $A$ is at least $f + \mathrm{diam}(C)/2$. If $\mathrm{diam}(C) \le f$, all demands in $C$ are within a distance of $f$ to $C$'s first demand $\hat{u}$, and are assigned to the cluster $C(\hat{u}, f)$ opened by $A'$ when $\hat{u}$ arrived. Thus, the total cost of $A'$ for the demands in $C$ is at most $2f$. If $\mathrm{diam}(C) > f$, $A'$ covers the demands in $C$ by opening, in the worst case, a sequence of $\ell + 1$ clusters $C(\hat{u}, f), C(\hat{u}, 2f), \ldots, C(\hat{u}, 2^{\ell} f)$, where $\ell = \lceil \log_2(\mathrm{diam}(C)/f) \rceil$. Thus, the total cost of $A'$ for the demands in $C$ is at most

$$\sum_{i=0}^{\ell} (1 + 2^i) f = (\ell + 2^{1+\ell}) f \le 5\, \mathrm{diam}(C),$$

where the inequality follows from $\lceil \log_2 x \rceil \le x$ and $\lceil \log_2 x \rceil \le 1 + \log_2 x$, for all $x > 1$. Since the cost of $A$ for $C$ is at least $f + \mathrm{diam}(C)/2$, in both cases the cost of $A'$ is at most $10$ times the cost of $A$. $\square$

Parking Permit. In Parking Permit (ParkPermit), we are given a schedule of days, some of which are marked as driving days, and $K$ types of permits, where a permit of each type $k$, $k = 1, \ldots, K$, has cost $c_k$ and duration $d_k$. The goal is to purchase a set of permits of minimum total cost that cover all driving days. In the online setting, the driving days are presented one-by-one, and the algorithm irrevocably decides on the permits to purchase based on the driving days revealed so far.

Meyerson [18] observed that by losing a constant factor in the competitive ratio, we can restrict our attention to instances with an interval structure and with permit costs that scale geometrically and permit durations non-decreasing with type. More specifically, in the interval version of ParkPermit, the permits have a hierarchical structure, in the sense that each permit is available over specific time intervals (e.g. a weekly permit is valid from Monday to Sunday), every day is covered by exactly one permit of each of the $K$ types, and each permit of type $k \ge 2$ has (an integer number of) $d_k / d_{k-1}$ permits of type $k - 1$ embedded in it (see also Fig. 1 for the structure of an interval instance). Moreover, for each permit type $k$, $1 < k \le K$, $c_k \ge 2 c_{k-1}$ and $d_k \ge d_{k-1}$.

An interesting feature of the deterministic algorithm in [18] is that it is time-sequence-independent, in the sense that it applies, with the same competitive ratio of $O(K)$, even if the order in which the driving days are revealed is not their time order (e.g. the adversary may mark August 6 as a driving day before marking May 25 as a driving day).
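To make the interval structure described above more concrete, the following Python sketch models an interval ParkPermit instance. It is only an illustration under stated assumptions: days are numbered from $0$, every type-$k$ permit is aligned to multiples of $d_k$, and the names `Permit`, `is_interval_instance`, `covering_permit`, `covers`, and `solution_cost` are hypothetical helpers that do not appear in [18].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permit:
    kind: int   # permit type k, 1-based (hypothetical representation)
    start: int  # first day covered, 0-based


def is_interval_instance(costs, durations):
    """Check c_k >= 2 c_{k-1} and that d_{k-1} divides d_k for every k >= 2."""
    return all(
        costs[k] >= 2 * costs[k - 1] and durations[k] % durations[k - 1] == 0
        for k in range(1, len(costs))
    )


def covering_permit(day, kind, durations):
    """The unique aligned permit of type `kind` whose interval contains `day`."""
    d = durations[kind - 1]
    return Permit(kind, (day // d) * d)


def covers(permits, driving_days, durations):
    """True if every driving day lies in the interval of some purchased permit."""
    return all(
        any(p.start <= t < p.start + durations[p.kind - 1] for p in permits)
        for t in driving_days
    )


def solution_cost(permits, costs):
    """Total cost of a set of purchased permits."""
    return sum(costs[p.kind - 1] for p in permits)
```

For instance, with costs `[1, 2, 4]` and durations `[1, 7, 28]`, the driving days `0`, `3`, and `30` are covered by the weekly permit starting at day `0` together with the daily permit for day `30`. Because `covering_permit` depends only on the day and the type, the driving days can be processed in any order, which is the setting in which time-sequence-independence matters.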
Meyerson [18] proves the following lower bounds on the competitive ratio of deterministic and randomized online algorithms for ParkPermit.

Theorem 1 ([18, Theorem 3.2]). Any deterministic online algorithm for ParkPermit has a competitive ratio of $\Omega(K)$.

Theorem 2 ([18, Theorem 4.6]). Any randomized online algorithm for ParkPermit has an expected competitive ratio of $\Omega(\log K)$.

3 Online Sum-Radii Clustering and Parking Permit

In this section, we show that OnlSumRad in tree metrics and the interval version of ParkPermit are closely related problems. Our results either are directly based on this correspondence or exploit it to draw ideas from ParkPermit.

The correspondence is based on the fact that we can map the action of purchasing a permit type to that of opening a cluster of a specific radius and vice versa. For example, purchasing a single-day permit corresponds to opening a cluster of zero radius. Following the same logic, driving days for ParkPermit will be mapped to demands for OnlSumRad. However, since OnlSumRad in general metric spaces has a much richer geometric structure than the one-dimensional ParkPermit, we restrict OnlSumRad to HSTs to obtain a useful correspondence between the two problems.

Fig. 1. An example of the reduction of Theorem 3. On the left, there is an instance of the interval version of ParkPermit. A feasible solution consists of the permits in grey. On the right, we depict the instance of OnlSumRad constructed in the proof of Theorem 3. The ParkPermit solution on the left is mapped to a solution with clusters centered at each black node. The radius of each cluster is equal to the distance of its center to the nearest leaf.

We start with the following theorem, which shows that OnlSumRad in tree metrics is a generalization of the interval version of ParkPermit.
Theorem 3. A $c$-competitive algorithm for Online Sum-Radii Clustering in HSTs with $K + 1$ levels implies a $c$-competitive algorithm for the interval version of Parking Permit with $K$ permit types.

Proof. Given an instance $I$ of the interval version of ParkPermit with $K$ permit types, we construct an instance $I'$ of OnlSumRad in an HST with $K + 1$ levels such that any feasible solution of $I$ is mapped, in an online fashion, to a feasible solution of $I'$ of equal cost, and vice versa.

Let $I$ be an instance of the interval version of ParkPermit with $K$ permit types of costs $c_1, \ldots, c_K$ and durations $d_1, \ldots, d_K$. For simplicity and without loss of generality, we assume that $c_1 = 1$ and that all days are covered by the permit of type $K$. Moreover, by a slight modification of the proof of [18, Theorem 2.1], we can assume that for each $k = 2, \ldots, K$, $c_k \ge 3 c_{k-1}$ and $d_k \ge d_{k-1}$.

Given the costs and the durations of the permits, we construct a tree $T$ with appropriate edge lengths, which gives the metric space for $I'$. The construction exploits the tree-like structure of the interval version (see also Fig. 1). Specifically, the tree $T$ has $K + 1$ levels, its leaves correspond to the days of $I$, and each node at level $k$, $1 \le k \le K$, corresponds to a permit of type $k$.

Formally, the tree $T$ has a leaf, at level $0$, for each day in the schedule of $I$. For each interval $D_1$ of $d_1$ days covered by a permit of type $1$, there is a level-$1$ node $v_1$ in $T$ whose children are the $d_1$ leaves corresponding to the days in $D_1$. The distance of each level-$1$ node to its children is $c_1 - 1 = 0$. Hence, opening a cluster $C(v_1, c_1 - 1)$ covers all nodes in the subtree rooted at $v_1$ (and thus all leaves corresponding to the days in $D_1$).
Similarly, for each interval $D_k$ of $d_k$ days covered by a permit of type $k$, $2 \le k \le K$, there is a node $v_k$ at level $k$ in $T$ whose children are the $d_k / d_{k-1}$ nodes at level $k - 1$ corresponding to the permits of type $k - 1$ embedded within the particular permit of type $k$. The distance of each level-$k$ node to its children is $c_k - c_{k-1}$. Therefore, opening a cluster $C(v_k, c_k - 1)$ covers all nodes in the subtree rooted at $v_k$ (and thus all leaves corresponding to the days in $D_k$).

This concludes the construction of the tree $T$ that defines the metric space for $I'$. We note that $T$ is a $2$-HST, because the distance of each level-$k$ node, $2 \le k \le K$, to its children is $c_k - c_{k-1}$, and $c_k \ge c_k - c_{k-1} \ge 2 c_{k-1}$, so the edge lengths at least double from each level to the next. The cluster opening cost in instance $I'$ is $f = 1$ $(= c_1)$. As for the demand sequence of $I'$, for each driving day $t$ in $I$, there is, in $I'$, a demand located at the leaf of $T$ corresponding to $t$.

Based on the correspondence between a type-$k$ permit and a cluster $C(v_k, c_k - 1)$ rooted at a level-$k$ node $v_k$, we next show that any solution of $I$ is mapped, in an online fashion, to a solution of $I'$ of equal cost, and vice versa.

We first describe an online mapping of any feasible solution of $I$ to a feasible solution of $I'$ of equal cost. By the construction of $T$, a permit of type $k$ that covers the driving days in an interval $D_k$ in $I$ corresponds to a node $v_k$ at level $k$ of $T$, in the sense that opening a cluster $C(v_k, c_k - 1)$ covers all demands corresponding to the driving days in $D_k$. Moreover, the cost of $C(v_k, c_k - 1)$ is $c_k$, i.e., equal to the cost of the corresponding permit. Therefore, opening the clusters corresponding to the permits bought by a feasible solution of $I$ gives a feasible solution of $I'$ of equal cost.

For the converse mapping, we assume that in any feasible solution of $I'$, all clusters are centered at nodes at levels $1, \ldots, K$ of $T$ and that any cluster centered at a level-$k$ node $v_k$ has radius $c_k - 1$. This assumption is essentially without loss of generality, since any feasible solution without this property can be translated into a feasible solution of no greater cost that satisfies this property. Indeed, let $C_k = C(v_k, r)$ be any cluster rooted at $v_k$. If $v_k$ is a leaf, we can root $C_k$ at the parent of $v_k$ (recall that a leaf and its parent are at distance $0$ to each other). If $r < c_k - 1$, $C_k$ does not cover any leaves, and can be safely removed from the solution. Otherwise, if for some level $j \ge k$, $r \in [c_j - 1, c_{j+1} - 1)$, we can replace $C_k$ by a new cluster which is rooted at the level-$j$ ancestor of $v_k$ and has a radius of $c_j - 1$. In all cases, the new cluster covers all demands covered by $C_k$ at no greater cost. In such a solution, each cluster $C(v_k, c_k - 1)$ costs $c_k$ and, by the construction of $T$, corresponds to a parking permit of type $k$ that covers all the driving days corresponding to the demand points in the subtree rooted at $v_k$. Therefore, buying the parking permits corresponding to the clusters opened by a feasible solution of $I'$ gives a feasible solution of $I$ of equal cost. $\square$

In the proof of Theorem 3, if the ParkPermit instance has $d_1 = 1$ and $c_k = 2^k$, for each type $k$, the tree $T$ is essentially a strict $2$-HST with $K$ levels where all nodes at the same level $k$ have $d_k / d_{k-1}$ children. Thus, combined with Theorem 3, the following lemma shows that OnlSumRad in strict HSTs is closely related to the interval version of ParkPermit.

Lemma 1. A $c$-competitive time-sequence-independent algorithm for the interval version of Parking Permit with $K$ permit types implies a $c$-competitive algorithm for Online Sum-Radii Clustering in strict HSTs with $K$ levels, where all nodes at the same level have the same number of children and all demands are located at the leaves.

Proof.
At the intuitive level, the proof applies the reverse of the reduction in the proof of Theorem 3. More specifically, given an instance $I$ of OnlSumRad in a strict HST with $K$ levels, we construct an instance $I'$ of the interval version of ParkPermit with $K$ permit types, such that any solution of $I$ is mapped, in an online fashion, to a solution of $I'$ of equal cost, and vice versa.

Let $I$ be an instance of OnlSumRad in a strict $\alpha$-HST $T$ with $K$ levels, where all nodes at level $k$, $1 \le k \le K - 1$, have the same number $n_k$ of children, and all demands are located at the leaves of $T$. For simplicity and without loss of generality, we assume that the cluster opening cost is $f = 1$.

The permit structure of $I'$ essentially reflects the hierarchical structure of $T$. Specifically, there is a day in the schedule of $I'$ corresponding to each leaf of $T$. For each leaf $v_0$, there is a permit of type $0$ with cost $c_0 = 1$ $(= f)$ and duration $d_0 = 1$. This permit covers the day corresponding to $v_0$ and is equivalent to a cluster $C(v_0, 0)$ of cost $1$. Similarly, for each node $v_k$ at level $k$ of $T$, $1 \le k \le K - 1$, there is a permit of type $k$ with cost $c_k = (\alpha^k + \alpha - 2)/(\alpha - 1)$ and duration $d_k = \prod_{j=1}^{k} n_j$. This permit covers the days corresponding to the leaves of the subtree rooted at $v_k$ and is equivalent to a cluster $C(v_k, (\alpha^k - 1)/(\alpha - 1))$ of cost equal to $c_k$. The permits of type $k - 1$ corresponding to the children of $v_k$ in $T$ are embedded in the permit of type $k$ corresponding to $v_k$, in the sense that the intervals covered by the former permits form a partition of the interval covered by the latter. As for the demand sequence of $I'$, for each demand of $I$ located at a leaf $v_0$ of $T$, the day corresponding to $v_0$ in $I'$ is marked as a driving day$^4$.

Next, we describe an online mapping of any feasible solution of $I$ to a feasible solution of $I'$ of equal cost.
Similarly to the proof of Theorem 3, we assume, without loss of generality, that in any feasible solution of $I$, any cluster centered at a level-$k$ node $v_k$ has a radius of $(\alpha^k - 1)/(\alpha - 1)$. Then, each cluster $C(v_k, (\alpha^k - 1)/(\alpha - 1))$ costs $c_k$, and corresponds to a permit of type $k$ that covers all driving days corresponding to leaves of the subtree rooted at $v_k$. Therefore, purchasing the permits corresponding to the clusters of a feasible solution of $I$ gives a feasible solution of $I'$ of equal cost.

For the converse mapping, we observe that a permit of type $k$ that covers the driving days in an interval $D_k$ corresponds to a level-$k$ node $v_k$ of $T$, in the sense that opening a cluster $C(v_k, c_k - 1)$, of cost $c_k$, covers all demand points corresponding to the driving days in $D_k$. Therefore, opening the clusters corresponding to the permits purchased by a feasible solution of $I'$ gives a feasible solution of $I$ of equal cost. □

4 Lower Bounds on the Competitive Ratio of Online Sum-Radii Clustering

By Theorem 3, OnlSumRad in trees with $K + 1$ levels is a generalization of ParkPermit with $K$ permit types. Therefore, the results of [18], and in particular Theorem 1 and Theorem 2, imply a lower bound of $\Omega(K)$ (resp. $\Omega(\log K)$) on the deterministic (resp. randomized) competitive ratio of OnlSumRad in trees with $K$ levels. However, a lower bound on the competitive ratio of OnlSumRad should rather be expressed in terms of the number of demands $n$, because there is no simple and natural way of defining the number of "levels" of a general metric space, and because for online clustering problems, the competitive ratio, if not constant, is typically stated as a function of $n$.
Going through the proofs of Theorem 3 and of Theorem 1 and Theorem 2 from [18], we can translate the lower bounds on the competitive ratio of ParkPermit, expressed as a function of $K$, into equivalent lower bounds for OnlSumRad, expressed as a function of $n$. In fact, the proofs of Theorem 1 and Theorem 2 require that the ratio $d_k / d_{k-1}$ of the number of days covered by permits of type $k$ and $k - 1$ is $2K$. Thus, in the proof of Theorem 3, the tree $T$ has $(2K)^K$ leaves, and the number of demands $n$ is at most $(2K)^K$. Since $n \le (2K)^K$ gives $\log n \le K \log(2K)$, we have $K = \Omega(\log n / \log\log n)$ and hence $\log K = \Omega(\log\log n)$. Combining this with the lower bound of $\Omega(\log K)$ on the randomized competitive ratio of ParkPermit (Theorem 2), we obtain the following corollary:

Corollary 1. The competitive ratio of any randomized algorithm for Online Sum-Radii Clustering in tree metrics is $\Omega(\log\log n)$, where $n$ is the number of demands.

4.1 A Stronger Lower Bound on the Deterministic Competitive Ratio

This approach gives a lower bound of $\Omega(\log n / \log\log n)$ on the deterministic competitive ratio of Online Sum-Radii Clustering. Using a strict ternary HST instead, we next obtain a stronger lower bound.

Theorem 4. The competitive ratio of any deterministic online algorithm for Online Sum-Radii Clustering in tree metrics is $\Omega(\log n)$, where $n$ is the number of demands.

Proof. For simplicity, let us assume that $n$ is an integral power of $3$. For some constant $\alpha \in [2, 3)$, we consider a strict $\alpha$-HST $T$ of height $K = \log_3 n$ whose non-leaf nodes have $3$ children each.

[Footnote 4] We highlight that the leaves of $T$ can appear in the demand sequence of $I$ in any order. Thus, we require that the ParkPermit algorithm be time-sequence-independent, i.e., it can handle driving requests that arrive out of the time order.

Fig. 2. An instance of an online algorithm. The subtree $T_a$ is not active, since no demand has arrived at its leaves. On the other hand, both subtrees $T_b$ and $T_c$ are active.
Moreover, $T_b$ is a bad subtree, since demand 1 has not resulted in opening a cluster that covers $T_b$. $T_c$ is a good subtree, since demand 5 (the last one) opens a cluster at the root of $T_c$, thus covering it.

The cluster opening cost is $f = 1$. Let $A$ be any deterministic algorithm. We consider a sequence of demands located at the leaves of $T$. More precisely, starting from the leftmost leaf and advancing towards the rightmost leaf, the next demand in the sequence is located at the next leaf not covered by an open cluster of $A$. Since $T$ has $n$ leaves, $A$ may cover all leaves of $T$ before the arrival of $n$ demands. Then, the demand sequence is completed in an arbitrary way that does not increase the optimal cost. We let $C_{OPT}$ be the optimal cost, and let $C_A$ be the cost of $A$ on this demand sequence.

We let $c_k = 1 + \sum_{\ell=0}^{k-1} \alpha^{\ell}$ denote the cost of a cluster centered at a level-$k$ node $v_k$ with radius equal to the distance of $v_k$ to the nearest leaf. We observe that for any $k \ge 1$ and any $\alpha \ge 2$, $c_k \le \alpha c_{k-1}$. We classify the clusters opened by $A$ according to their cost. Specifically, we let $L_k$, $0 \le k \le K$, be the set of $A$'s clusters with cost in $[c_k, c_{k+1})$, and let $\ell_k = |L_k|$ be the number of such clusters. The key property is that a cluster in $L_k$ can cover the demands of a subtree rooted at level at most $k$, but not higher. Therefore, we can assume that all of $A$'s clusters in $L_k$ are centered at a level-$k$ node and have cost equal to $c_k$, and obtain a lower bound of $C_A \ge \sum_{k=0}^{K} \ell_k c_k$ on the algorithm's cost.

To derive an upper bound on the optimal cost in terms of $C_A$, we distinguish between good and bad active subtrees, depending on the size of the largest radius cluster with which $A$ covers the demand points in them. Formally, a subtree $T_k$ rooted at level $k$ is active if there is a demand point located at some leaf of it.
For an active subtree $T_k$, we let $C^{\max}_{T_k}$ denote the largest radius cluster opened by $A$ when a new demand point in $T_k$ arrives. Let $j$, $0 \le j \le K$, be such that $C^{\max}_{T_k} \in L_j$. Namely, $C^{\max}_{T_k}$ is centered at a level-$j$ node $v_j$ and covers the entire subtree rooted at $v_j$. If $j \ge k$, i.e., if $C^{\max}_{T_k}$ covers $T_k$ entirely, we say that $T_k$ is a good (active) subtree (for the algorithm $A$). If $j < k$, i.e., if $C^{\max}_{T_k}$ does not cover $T_k$ entirely, we say that $T_k$ is a bad (active) subtree (for $A$) (see also Fig. 2). For each $k = 0, \dots, K$, we let $g_k$ (resp. $b_k$) denote the number of good (resp. bad) active subtrees rooted at level $k$.

To bound $g_k$ from above, we observe that the last demand point of each good active subtree rooted at level $k$ is covered by a new cluster of $A$ rooted at a level $j \ge k$. Therefore, the number of good active subtrees rooted at level $k$ is at most the number of clusters in $\bigcup_{j=k}^{K} L_j$. Formally, for each level $k \ge 0$, $g_k \le \sum_{j=k}^{K} \ell_j$.

To bound $b_k$ from above, we first observe that each active leaf / demand point is a good active level-$0$ subtree, and thus $b_0 = 0$. For each level $k \ge 1$, we observe that if $T_k$ is a bad subtree, then by the definition of the demand sequence, the $3$ subtrees rooted at the children of $T_k$'s root are all active. Moreover, each of these subtrees is either a bad subtree rooted at level $k - 1$, in which case it is counted in $b_{k-1}$, or a good subtree covered by a cluster in $L_{k-1}$, in which case it is counted in $\ell_{k-1}$. Therefore, for each level $k \ge 1$, $3 b_k \le b_{k-1} + \ell_{k-1}$.

Using these bounds on $g_k$ and $b_k$, we can bound from above the optimal cost in terms of $C_A$. To this end, the crucial observation is that we can obtain a feasible solution by opening a cluster of cost $c_k$ centered at the root of every active subtree rooted at level $k$.
Since the number of active subtrees rooted at level $k$ is $b_k + g_k$, we obtain that for every $k \ge 0$, $C_{OPT} \le c_k (b_k + g_k)$. Using the upper bound on $g_k$ and summing up for $k = 0, \dots, K$, we have that $(K + 1)\, C_{OPT} \le \sum_{k=0}^{K} c_k b_k + \sum_{k=0}^{K} c_k \sum_{j=k}^{K} \ell_j$. Using that $c_k \le \alpha^k$ and that $c_k \le \alpha c_{k-1}$, which hold for all $\alpha \ge 2$, we bound the second term by:

$$\sum_{k=0}^{K} c_k \sum_{j=k}^{K} \ell_j = \sum_{k=0}^{K} \ell_k \sum_{j=0}^{k} c_j \le \sum_{k=0}^{K} \ell_k \sum_{j=0}^{k} \alpha^j \le \sum_{k=0}^{K} \ell_k c_{k+1} \le \alpha \sum_{k=0}^{K} \ell_k c_k \le \alpha C_A$$

To bound the first term, we use that for every level $k \ge 1$, $3 b_k \le b_{k-1} + \ell_{k-1}$ and $c_k \le \alpha c_{k-1}$. Therefore, $(3/\alpha)\, b_k c_k \le (b_{k-1} + \ell_{k-1})\, c_{k-1}$. Summing up for $k = 1, \dots, K$, we have that:

$$\frac{3}{\alpha} \sum_{k=1}^{K} b_k c_k \le \sum_{k=1}^{K} b_{k-1} c_{k-1} + \sum_{k=1}^{K} \ell_{k-1} c_{k-1}$$

Using that $b_0 = 0$ and that $\alpha < 3$, we obtain that:

$$\frac{3}{\alpha} \sum_{k=0}^{K} b_k c_k \le \sum_{k=0}^{K-1} b_k c_k + \sum_{k=0}^{K-1} \ell_k c_k \le \sum_{k=0}^{K} b_k c_k + C_A \;\Rightarrow\; \sum_{k=0}^{K} b_k c_k \le \frac{\alpha}{3 - \alpha}\, C_A$$

Putting everything together, we conclude that for any $\alpha \in [2, 3)$, $(K + 1)\, C_{OPT} \le \left(\alpha + \frac{\alpha}{3 - \alpha}\right) C_A$. Since $K = \log_3 n$, this implies a lower bound of $\Omega(\log n)$ on the deterministic competitive ratio of OnlSumRad in tree metrics. □

Notably, the OnlSumRad instance constructed in the proof of Theorem 4 satisfies the conditions of Lemma 1. Moreover, since the demands in the proof of Theorem 4 appear from left to right, the ParkPermit algorithm used in the proof of Lemma 1 does not need to be time-sequence-independent. Therefore, the lower bound of Theorem 4 holds even for the subclass of OnlSumRad instances that are reducible to the interval version of ParkPermit by the competitive-ratio-preserving transformation of Lemma 1.

4.2 A Lower Bound for Deterministic Online Sum-Radii Clustering on the Plane

Motivated by the fact that the deterministic competitive ratio of OnlSumRad on the line is constant [8], we study OnlSumRad in the Euclidean plane.
The following theorem uses a constant-distortion planar embedding of a ternary strict $\alpha$-HST, and establishes a lower bound of $\Omega(\log n)$ on the deterministic competitive ratio of OnlSumRad on the Euclidean plane.

Theorem 5. The competitive ratio of any deterministic algorithm for Online Sum-Radii Clustering on the Euclidean plane is $\Omega(\log n)$, where $n$ is the number of demands.

Proof. The idea is to use a planar embedding of a ternary strict $\alpha$-HST $T$ with distortion $D_\alpha \le \sqrt{2}\,\alpha / (\alpha - 2)$, and show that a $c$-competitive algorithm for OnlSumRad on the plane implies a $2 c D_\alpha$-competitive algorithm for OnlSumRad in strict $\alpha$-HSTs.

Fig. 3. An example of the embedding used in the proof of Theorem 5. The nodes and the edges connecting them depict the structure of a ternary strict $\alpha$-HST with $4$ levels and $\alpha \approx 2.5$. The locations of the nodes correspond to the locations in the plane to which they are mapped by the embedding.

To this end, we first show that a constant-distortion planar embedding of a ternary strict $\alpha$-HST $T$ implies the theorem. Specifically, let $\alpha \in (2, 3)$ be any constant, and let $D_\alpha$ be the distortion of an embedding $e$ that maps each node $v$ of $T$ to a point $e(v)$ in the plane. Namely, for every pair of nodes $u, v$ of $T$, we have that $d_T(u, v) / D_\alpha \le d_P(e(u), e(v)) \le d_T(u, v)$, where $d_T(u, v)$ (resp. $d_P(u, v)$) denotes the distance of $u$ and $v$ in $T$ (resp. in the Euclidean plane). Assuming the embedding $e$ and a $c$-competitive deterministic algorithm $A$ for OnlSumRad on the plane, we describe a $2 c D_\alpha$-competitive algorithm $A'$ for $T$. For any demand point $u$ in $T$, we present the algorithm $A$ with a demand located at $e(u)$. If $A$ covers $e(u)$ by opening a new cluster $C(v, r)$, the algorithm $A'$ opens a new cluster $C(u, 2 D_\alpha r)$.
Then, for every node $z$ of $T$ for which $e(z)$ is covered by $C(v, r)$, $z$ is covered by the corresponding cluster $C(u, 2 D_\alpha r)$ of $A'$. This holds because both $e(u)$ and $e(z)$ lie in $C(v, r)$, so that $d_P(e(u), e(z)) \le 2r$, and the distortion of $e$ is $D_\alpha$. If $e(u)$ is covered by an existing cluster of $A$, the previous observation implies that $u$ is covered by the corresponding cluster of $A'$. Since for any demand points $u$, $u'$, the distance of $e(u)$ and $e(u')$ in the plane is no greater than their distance in $T$, the optimal cost of the instance presented to $A$ is no greater than the optimal cost of the instance presented to $A'$. Also, the cost of each cluster of $A'$ is at most $2 D_\alpha$ times the cost of the corresponding cluster of $A$. Therefore, the competitive ratio of $A'$ is at most $2 D_\alpha c$. Since $D_\alpha$ is a constant and, by Theorem 4, the competitive ratio of $A'$ is $\Omega(\log n)$, the competitive ratio of $A$ is $\Omega(\log n)$ as well.

To conclude the proof, we describe a $D_\alpha$-distortion embedding of a ternary strict $\alpha$-HST $T$ with $K + 1$ levels in the Euclidean plane. The root of $T$ is mapped to the point $(0, 0)$. The children of the root are mapped to the points $(-\alpha^K, 0)$, $(0, \alpha^K)$, $(\alpha^K, 0)$. For each level-$k$ node $v_k$, $k = K, \dots, 1$, whose parent is located along the $x$-axis on the left (resp. on the right), its children are mapped to the $3$ points at distance $\alpha^{k-1}$ to $v_k$ located along the $x$-axis on the right (resp. on the left) and along the $y$-axis up and down. For each level-$k$ node $v_k$, $k = K, \dots, 1$, whose parent is located down along the $y$-axis, its children are mapped to the $3$ points at distance $\alpha^{k-1}$ to $v_k$ located up along the $y$-axis and left and right along the $x$-axis (see also Fig. 3). We proceed to show that the distortion of this embedding is at most $\sqrt{2}\,\alpha / (\alpha - 2)$.
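The embedding just described can be checked numerically on a small tree. The following is a minimal sketch, not part of the original construction: the names `embed_leaves`, `tree_distance` and `leaf_distortion` are hypothetical, the root is assumed to sit at depth 0 with child edges of length $\alpha^K$ and leaves at depth $K + 1$, and a node whose parent lies above it is assumed to be handled symmetrically to the three cases stated in the text.

```python
import itertools
import math

# Unit steps: right, up, left, down.
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def embed_leaves(alpha, K):
    """Return [(path, (x, y))] for every leaf of the embedded ternary tree.

    Assumption: the edge from depth t to depth t + 1 has length
    alpha ** (K - t), and leaves sit at depth K + 1.
    """
    leaves = []

    def place(pos, came_from, depth, path):
        if depth == K + 1:
            leaves.append((path, pos))
            return
        step = alpha ** (K - depth)
        if came_from is None:
            dirs = [(-1, 0), (0, 1), (1, 0)]          # root: left, up, right
        else:
            back = (-came_from[0], -came_from[1])
            dirs = [d for d in DIRS if d != back]     # never step back toward the parent
        for i, (dx, dy) in enumerate(dirs):
            child = (pos[0] + step * dx, pos[1] + step * dy)
            place(child, (dx, dy), depth + 1, path + (i,))

    place((0.0, 0.0), None, 0, ())
    return leaves

def tree_distance(p, q, alpha, K):
    """Distance in the tree between two leaves given by their root-to-leaf paths."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    # The lowest common ancestor is at depth `common`; each leaf hangs
    # below it by edges of length alpha^(K - common), ..., alpha^0.
    return 2 * sum(alpha ** l for l in range(K - common + 1))

def leaf_distortion(alpha, K):
    """Largest ratio d_T / d_P over all pairs of leaves."""
    worst = 0.0
    for (p, a), (q, b) in itertools.combinations(embed_leaves(alpha, K), 2):
        d_tree = tree_distance(p, q, alpha, K)
        d_plane = math.dist(a, b)
        assert d_plane <= d_tree + 1e-9               # the embedding never expands distances
        worst = max(worst, d_tree / d_plane)
    return worst
```

For example, `leaf_distortion(2.5, 3)` enumerates all pairs among the 81 leaves; under the assumptions above, the value can be compared against the bound $\sqrt{2}\,\alpha/(\alpha - 2)$ claimed in the text.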
We first observe that for any two nodes $u$, $v$ of $T$, $d_P(e(v), e(u)) \le d_T(u, v)$, i.e., the distance of $u$ and $v$ in $T$ is no less than the distance of their images $e(u)$ and $e(v)$ in the plane. Moreover, due to the self-similarity of the embedding, the maximum distortion occurs for pairs of leaves of $T$ mapped to points in the plane that lie at symmetric locations with respect to the line $y = x$ (or to the line $y = -x$) and are closest to it (e.g., such are the pairs of leaves/points 21 and 24, 22 and 23, 31 and 32, and 30 and 33 in Fig. 3). The distance of any such pair of leaves $u$, $v$ in $T$ is $d_T(u, v) = 2(\alpha^{K+1} - 1)/(\alpha - 1)$. On the other hand, the distance of their images $e(u)$, $e(v)$ in the Euclidean plane is:

$$d_P(e(u), e(v)) = \sqrt{2} \left( \alpha^K - \frac{\alpha^K - 1}{\alpha - 1} \right) = \sqrt{2}\, \frac{\alpha^{K+1} - 2\alpha^K + 1}{\alpha - 1}$$

Therefore, the maximum distortion of the embedding is:

$$D_\alpha = \frac{2(\alpha^{K+1} - 1)}{\sqrt{2}\,(\alpha^{K+1} - 2\alpha^K + 1)} \le \frac{\sqrt{2}\,\alpha}{\alpha - 2},$$

where the inequality holds for all $\alpha \in (2, 3)$. □

5 An Asymptotically Optimal Online Algorithm

In this section, we present a deterministic primal-dual algorithm for OnlSumRad in a general metric space $(M, d)$. In the following, we assume that the optimal solution only consists of clusters with radius $2^k f$, where $k$ is a non-negative integer (see also Proposition 1). For simplicity, we let $r_k = 2^k f$, if $k \ge 0$, and $r_k = 0$, if $k = -1$. Let $\bar{\mathbb{N}} = \mathbb{N} \cup \{-1\}$. Then, the following are a Linear Programming relaxation of OnlSumRad and its dual:

$$\begin{aligned}
\min\ & \sum_{(z,k) \in M \times \bar{\mathbb{N}}} x_{zk}\,(f + r_k) \\
\text{s.t.}\ & \sum_{(z,k):\, d(u_j, z) \le r_k} x_{zk} \ge 1 && \forall u_j \\
& x_{zk} \ge 0 && \forall (z, k)
\end{aligned}
\qquad\qquad
\begin{aligned}
\max\ & \sum_{j=1}^{n} a_j \\
\text{s.t.}\ & \sum_{j:\, d(u_j, z) \le r_k} a_j \le f + r_k && \forall (z, k) \\
& a_j \ge 0 && \forall u_j
\end{aligned}$$

In the primal program, there is a variable $x_{zk}$ for each point $z$ and each $k \in \bar{\mathbb{N}}$ that indicates the extent to which cluster $C(z, r_k)$ is open. The constraints require that each demand $u_j$ is fractionally covered.
If we require that $x_{zk} \in \{0, 1\}$ for all $z$, $k$, we obtain an Integer Programming formulation of OnlSumRad. In the dual, there is a variable $a_j$ for each demand $u_j$, and the constraints require that no potential cluster is "overpaid".

The algorithm we present below maintains at all times a pair of feasible solutions for the primal and dual programs that correspond to the structure that has been revealed. When a new demand arrives, the algorithm has to update the primal variables such that the new demand is covered and further increment the dual variables, but without violating the capacity constraints. The algorithm must also guarantee that the cost of the primal and dual solutions will be close enough, since the gap between these two will determine the competitive ratio.

The Algorithm. The primal-dual algorithm, or PD-SumRad in short, maintains a collection of clusters that cover all the demands processed so far. The collection of clusters of PD-SumRad is initially empty. When a new demand $u_j$, $j = 1, \dots, n$, arrives, if $u_j$ is covered by an already open cluster $C$, PD-SumRad assigns $u_j$ to $C$ and sets $u_j$'s dual variable $a_j$ to $0$. Otherwise, PD-SumRad sets $a_j$ to $f$. This makes the dual constraint corresponding to $(u_j, -1)$ and possibly some other dual constraints tight. PD-SumRad finds the maximum $k \ge -1$ such that for some point $z \in M$, the dual constraint corresponding to $(z, k)$ becomes tight due to $a_j$. Then, PD-SumRad opens a new cluster $C(z, 3 r_k)$ and assigns $u_j$ to it.

Competitive Analysis. The main result of this section is that:

Theorem 6. The competitive ratio of PD-SumRad is at most $3\,(2 + \log_2 n)$.

The analysis of the competitive ratio consists of Lemma 2 and Lemma 3 below. Lemma 2 shows that the dual solution maintained by PD-SumRad is feasible. Thus, the optimal cost for any demand sequence is at least the value of the dual solution maintained by PD-SumRad.

Lemma 2. For any sequence $u_1, \dots$
, $u_n$ of demand points, the dual solution $a_1, \dots, a_n$ maintained by PD-SumRad satisfies all the dual constraints.

Proof. Consider an arbitrary demand sequence $u_1, \dots, u_n$. In the dual solution maintained by PD-SumRad, each variable $a_j$ is either $0$ or $f$. Since the right-hand side of any constraint is a multiple of $f$, no constraint can be violated without first becoming tight. To prove the lemma, we show that after a constraint becomes tight, its left-hand side does not increase, and thus the constraint will never be violated.

We call a cluster $C(z, r_k)$ tight if the dual constraint corresponding to $(z, k)$ is satisfied with equality. We next prove that as soon as a cluster $C(z, r_k)$ becomes tight, each subsequent demand $u \in C(z, r_k)$ is covered by some open cluster of PD-SumRad, and thus the corresponding dual variable is set to $0$. To this end, let us consider some cluster $C(z, r_k)$ that becomes tight when a demand $u_j$ is processed. Then, $d(u_j, z) \le r_k$. To cover $u_j$, PD-SumRad opens a new cluster $C' = C(z', 3 r_{k'})$. The algorithm ensures that $k' \ge k$ (and thus $r_{k'} \ge r_k$) and that $d(u_j, z') \le r_{k'}$. Now let $u$ be any subsequent demand in $C(z, r_k)$. Since $d(u, z') \le d(u, u_j) + d(u_j, z') \le 2 r_k + r_{k'} \le 3 r_{k'}$, $u$ is covered by $C'$. The first inequality above holds because the metric space satisfies the triangle inequality; the second holds because both $u$ and $u_j$ belong to $C(z, r_k)$. Finally, the third inequality follows from $r_{k'} \ge r_k$. □

We proceed to show that the total cost of PD-SumRad is at most $O(\log n)$ times the value of its dual solution, which in turn is at most the total cost of the optimal solution.

Lemma 3. For any sequence $u_1, \dots, u_n$ of demand points, the total cost of PD-SumRad is at most $3\,(2 + \log_2 n) \sum_{j=1}^{n} a_j$.

Proof.
We observe that for any integer $k > \log_2 n$ and for all points $z$, a cluster $C(z, r_k)$ cannot become tight, because the left-hand side of any dual constraint is at most $nf$. Therefore, we can restrict our attention to at most $2 + \log_2 n$ values of $k$. Next, we show that for all $k = -1, 0, \dots, \lfloor \log_2 n \rfloor$, each demand $u_j$ with $a_j > 0$ contributes to the opening cost of at most one cluster with radius $3 r_k$. Namely, PD-SumRad opens at most one cluster $C(z, 3 r_k)$ for which $u_j$ belongs to the tight cluster $C(z, r_k)$.

We prove this claim by contradiction. Let us assume that for some value of $k$, PD-SumRad opens two clusters $C_1 = C(z_1, 3 r_k)$ and $C_2 = C(z_2, 3 r_k)$ for which there is a demand $u_j$ with $a_j > 0$ that belongs to both $C(z_1, r_k)$ and $C(z_2, r_k)$. Since PD-SumRad opens at most one new cluster when a new demand is processed, one of the clusters $C_1$, $C_2$ opens before the other. So, let us assume that $C_1$ opens before $C_2$. This means that PD-SumRad opened $C_1$ in response to a demand $u_{j'}$, with $j' \le j$, that was uncovered at its arrival time and made $C(z_1, r_k)$ tight. Then, any subsequent demand $u \in C(z_2, r_k)$ is covered by $C_1$, because:

$$d(u, z_1) \le d(u, u_j) + d(u_j, z_1) \le 2 r_k + r_k = 3 r_k$$

The second inequality above holds because both $u$ and $u_j$ belong to $C(z_2, r_k)$ and $u_j$ also belongs to $C(z_1, r_k)$. Therefore, after $C_1$ opens, there are no uncovered demands in $C(z_2, r_k)$ that can force PD-SumRad to open $C_2$, a contradiction.

To conclude the proof of the lemma, we observe that when PD-SumRad opens a new cluster $C(z, 3 r_k)$, the cluster $C(z, r_k)$ is tight. Hence, the total cost of $C(z, 3 r_k)$ is at most $3 \sum_{u_j \in C(z, r_k)} a_j$.
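As a sanity check on the opening rule of PD-SumRad and on the invariants used in Lemma 2 and Lemma 3, the rule can be run on a small finite metric. The sketch below is illustrative only: the names `pd_sumrad` and `radius` are hypothetical, the set of candidate centers $M$ is assumed to be finite and to contain every demand location, and ties between centers that become tight for the same $k$ are broken by list order, a detail the text leaves open.

```python
def radius(k, f):
    """r_k = 2^k f for k >= 0, and r_{-1} = 0."""
    return 0.0 if k == -1 else (2 ** k) * f

def pd_sumrad(demands, centers, dist, f=1.0):
    """Run PD-SumRad; return (clusters, duals).

    Assumption: `centers` is a finite list that contains every demand location.
    """
    n = len(demands)
    k_max = n.bit_length()            # 2**k_max > n, so no larger k can become tight
    clusters = []                     # (center, radius) pairs opened so far
    duals = []                        # a_j for each processed demand
    for u in demands:
        if any(dist(u, z) <= r for z, r in clusters):
            duals.append(0.0)         # covered on arrival: a_j = 0
            continue
        duals.append(f)               # uncovered on arrival: a_j = f
        paid = [v for v, a in zip(demands, duals) if a > 0]
        opened = None
        for k in range(k_max, -2, -1):            # look for the maximum tight k first
            rk = radius(k, f)
            need = 1 if k == -1 else 1 + 2 ** k   # (z, k) is tight when `need` paid demands lie in C(z, r_k)
            for z in centers:
                if dist(u, z) <= rk and sum(dist(v, z) <= rk for v in paid) == need:
                    opened = (z, 3 * rk)
                    break
            if opened is not None:
                break
        if opened is None:
            raise ValueError("centers must contain every demand location")
        clusters.append(opened)
    return clusters, duals
```

On the line metric with demands `[0, 1, 2, 3, 10, 11, 12, 13, 4, 5, 6, 7, 20, 21, 22, 23]`, `dist = lambda a, b: abs(a - b)` and `centers` equal to the demand list, one can verify directly that every dual constraint over these centers holds and that the total cost $\sum (f + 3 r_k)$ of the opened clusters stays below $3\,(2 + \log_2 n) \sum_j a_j$.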
Therefore, the total cost of PD-SumRad is at most:

$$\sum_{(z,k):\, C(z, 3 r_k) \text{ opens}}\ \sum_{u_j \in C(z, r_k)} 3 a_j = 3 \sum_{j=1}^{n} a_j\, \bigl|\{ (z, k) : C(z, 3 r_k) \text{ opens and } u_j \in C(z, r_k) \}\bigr| \le 3\,(2 + \log_2 n) \sum_{j=1}^{n} a_j$$

The inequality holds because for each $k = -1, 0, \dots, \lfloor \log_2 n \rfloor$ and each $u_j$ with $a_j > 0$, there is at most one pair $(z, k)$ such that $C(z, 3 r_k)$ opens and $u_j \in C(z, r_k)$. □

6 A Randomized Online Algorithm

In this section, we present a simple randomized algorithm, or Simple-SumRad in short, of logarithmic competitiveness. Simple-SumRad is memoryless, in the sense that it keeps in memory only its solution, namely the centers and the radii of its clusters. For simplicity, we assume that $n$ is an integral power of $2$ and known to the algorithm in advance. This assumption can be removed by standard techniques, similar to those discussed in the Appendix.

When a new demand $u_j$ arrives, if $u_j$ is covered by an already open cluster $C$, Simple-SumRad assigns $u_j$ to $C$. Otherwise, for each $k = 0, \dots, \log_2 n, 1 + \log_2 n$, Simple-SumRad opens a new cluster $C(u_j, 2^k f)$ with probability $2^{-k}$, and assigns $u_j$ to the cluster $C(u_j, f)$, which opens with probability $1$.

Lemma 4. The competitive ratio of Simple-SumRad is at most $2\,(5 + \log_2 n)$.

Proof. We recall the assumption that the optimal solution only consists of clusters of radius $2^k f$, where $k$ is a non-negative integer. To establish the competitive ratio, we consider each optimal cluster $C(p, 2^k f)$ of total cost $(2^k + 1) f$, $k \le \log_2 n$, and bound the expected cost of the algorithm until it opens a cluster that covers the entire cluster $C(p, 2^k f)$.

Let $u_1, u_2, \dots, u_T$ be the subsequence of demands included in $C(p, 2^k f)$, such that the cluster opened by $u_T$ covers the entire cluster $C(p, 2^k f)$. We note that $T$ itself is a random variable.
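The opening rule of Simple-SumRad stated above is short enough to sketch directly. This is a minimal illustration rather than a reference implementation: the names `simple_sumrad` and `expected_opening_cost` are hypothetical, $n$ is assumed to be a power of $2$ known in advance, and a seeded `random.Random` is used only to make runs reproducible.

```python
import random

def simple_sumrad(demands, dist, n, f=1.0, rng=None):
    """Run Simple-SumRad; return the list of opened (center, radius) clusters."""
    rng = rng or random.Random(0)
    top = (n.bit_length() - 1) + 1          # 1 + log2(n), with n a power of 2
    clusters = []
    for u in demands:
        if any(dist(u, c) <= r for c, r in clusters):
            continue                        # already covered: nothing opens
        for k in range(top + 1):
            # Opens with probability 2^-k; k = 0 always opens because random() < 1.
            if rng.random() < 2.0 ** -k:
                clusters.append((u, (2 ** k) * f))
    return clusters

def expected_opening_cost(n, f=1.0):
    """Expected cost of the clusters opened by one uncovered demand."""
    top = (n.bit_length() - 1) + 1
    return sum((2.0 ** -i) * (2 ** i + 1) * f for i in range(top + 1))
```

For instance, `expected_opening_cost(16)` sums six terms of the form $1 + 2^{-i}$, which is the per-demand quantity that the analysis of Lemma 4 bounds from above.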
For each demand $u_i$, we let $X_i$ be the random variable for the cost of the clusters that $u_i$ opens. Hence, the algorithm's total cost for $u_1, u_2, \dots, u_T$ is $X = \sum_{i=1}^{T} X_i$. For each demand $u_i$, $X_i$ is $0$ if $u_i$ is covered upon arrival. Otherwise, $X_i$ follows the distribution in the description of Simple-SumRad. Let $Y_i$ be a new random variable such that $Y_i = X_i$ if $u_i$ is not covered, else $Y_i$ takes a value as if $u_i$ was not covered at its arrival time. Clearly, for each $i$, $X_i \le Y_i$. Thus, the expected cost of Simple-SumRad until it opens a cluster covering the entire cluster $C(p, 2^k f)$ is:

$$E\left[ \sum_{i=1}^{T} X_i \right] \le E\left[ \sum_{i=1}^{T} Y_i \right]$$

We observe that the $Y_i$ are nonnegative, independent and identically distributed random variables, and that $T$ is a stopping time. Hence, by Wald's equation we have that $E[\sum_{i=1}^{T} Y_i] = E[Y] \cdot E[T]$, where $Y$ denotes the (identical) distribution of $Y_1, \dots, Y_T$. $E[T]$ denotes the expected number of demands in $C(p, 2^k f)$ that have arrived before the first of them opens a new cluster of radius $2^{k+1} f$ that includes the entire cluster $C(p, 2^k f)$. Hence, $E[T] = 2^{k+1}$. Moreover, we have that:

$$E[Y] = \sum_{i=0}^{1 + \log_2 n} \frac{1}{2^i}\,(2^i + 1)\, f \le (4 + \log_2 n)\, f$$

The bound holds because the sum has $2 + \log_2 n$ terms, each contributing $f$, plus a geometric series $\sum_i 2^{-i} f < 2f$. Taking also into account the cost of $(2^{k+1} + 1) f$ for the cluster of radius $2^{k+1} f$ opened by $u_T$, the expected cost of the algorithm for the demands in $C(p, 2^k f)$ is at most $(2^{k+1} (4 + \log_2 n) + 2^{k+1} + 1)\, f$, which is at most $2\,(5 + \log_2 n)$ times the optimal cost for $C(p, 2^k f)$. Since this holds for all optimal clusters, the competitive ratio of Simple-SumRad is at most $2\,(5 + \log_2 n)$. □

7 A Fractional Online Algorithm

We conclude with a deterministic $O(\log\log n)$-competitive algorithm for the fractional version of OnlSumRad in general metric spaces.
The fractional algorithm is based on the primal-dual approach of [2,1], and is a generalization of the online algorithm for the fractional version of ParkPermit in [18, Section 4.1].

A fractional algorithm maintains, in an online fashion, a feasible solution to the Linear Programming relaxation of OnlSumRad. In the notation of Section 5, for each point-type pair $(z, k)$, the algorithm maintains a fraction $x_{zk}$, which denotes the extent to which the cluster $C(z, r_k)$ opens, and can only increase as new demands arrive. For each demand $u_j$, the fractions of the clusters covering $u_j$ must sum up to at least $1$, i.e., $\sum_{(z,k):\, u_j \in C(z, r_k)} x_{zk} \ge 1$. The total cost of the fractional solution maintained by the algorithm is $\sum_{(z,k)} x_{zk}\,(f + r_k)$. The competitive ratio is the worst-case ratio of the algorithm's cost to the cost of an offline optimal integral solution for the same demand sequence.

The Algorithm. For the fractional algorithm, or Frac-SumRad in short, we assume that $n$ is an integral power of $2$ and known in advance. In the Appendix, we show how to remove these assumptions, by losing a constant factor in the competitive ratio. Frac-SumRad considers only $K + 1$ different types of clusters, where $K = \log_2 n$. For each $k = 1, \dots, K + 1$, we let $c_k = f + r_k$ denote the cost of a cluster $C(p, r_k)$ of type $k$. The algorithm considers only the demand locations as potential cluster centers. For convenience, for each demand $u_j$ and for each $k$, we let $x_{jk}$ be the extent to which the cluster $C(u_j, r_k)$ is open, with the understanding that $x_{jk} = 0$ before $u_j$ arrives. Similarly, we let $F_{jk} = \sum_{i:\, u_j \in C(u_i, r_k)} x_{ik}$ be the extent to which demand $u_j$ is covered by clusters of type $k$, and let $F_j = \sum_k F_{jk}$ be the extent to which $u_j$ is covered.

When a new demand $u_j$, $j = 1, \dots, n$, arrives, if $F_j \ge 1$, $u_j$ is already covered.
Otherwise, while $F_j < 1$, Frac-SumRad performs the following operation:

1. For every $k = 1, \dots, K + 1$, $x_{jk} \leftarrow x_{jk} + \frac{1}{c_k (K + 1)}$
2. For every $k = 1, \dots, K + 1$ and every demand $u_i \in C(u_j, r_k)$, $x_{ik} \leftarrow x_{ik} \left(1 + \frac{1}{c_k}\right)$

Competitive Analysis. Frac-SumRad maintains a (fractional) feasible solution in an online fashion. The proof of the following theorem extends the competitive analysis in [18, Section 4.1].

Theorem 7. The competitive ratio of Frac-SumRad is $O(\log\log n)$.

Proof. We first consider a single operation performed when a demand $u_j$ arrives, and show that it increases the fractional cost by at most $2$. Since an operation is performed, $F_j < 1$. The first step of the operation increases the fractional cost by $1/(K + 1)$ for each cluster type. Hence, the total increase in the fractional cost due to the first step is $1$. The second step of the operation increases the fractional cost by:

$$\sum_{(i,k):\, u_i \in C(u_j, r_k)} x_{ik} = \sum_{(i,k):\, u_j \in C(u_i, r_k)} x_{ik} = \sum_{k=1}^{K+1} F_{jk} = F_j < 1$$

Therefore, each operation increases the fractional cost by at most $2$.

We next show that the number of operations performed by Frac-SumRad for the demands in an optimal cluster $C(p, r_k)$ of cost $c_k$ is $O(c_{k+1} \log K)$. We let $F_{p(k+1)} = \sum_{j:\, u_j \in C(p, r_k)} x_{j(k+1)}$. Since for any demand $u_j \in C(p, r_k)$, $C(u_j, r_{k+1})$ includes the entire cluster $C(p, r_k)$, we have that $F_{j(k+1)} \ge F_{p(k+1)}$. Hence, as soon as $F_{p(k+1)} \ge 1$, every subsequent demand $u_j \in C(p, r_k)$ has $F_j \ge 1$ at its arrival time, and Frac-SumRad does not perform any operations due to $u_j$. Consequently, the total cost of Frac-SumRad for the demands in $C(p, r_k)$ can be bounded by the total increase in the fractional cost due to operations caused by demands in $C(p, r_k)$ arriving as long as $F_{p(k+1)} < 1$.
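The per-operation bound derived at the start of this proof can be observed on a small instance. The sketch below is an illustration under stated assumptions, not the authors' implementation: the name `frac_sumrad` is hypothetical, costs are normalized so that $f = 1$, $n$ is assumed to be a power of $2$, and both steps of an operation are assumed to read the fractions as they stood when the operation began, which is the reading under which the increase is exactly $1 + F_j$.

```python
def frac_sumrad(demands, dist):
    """Run Frac-SumRad with f = 1; return (fractions, per-operation cost increases)."""
    n = len(demands)
    K = n.bit_length() - 1                      # K = log2(n), n assumed a power of 2
    types = range(1, K + 2)                     # cluster types k = 1, ..., K + 1
    r = {k: float(2 ** k) for k in types}       # r_k = 2^k f with f = 1
    c = {k: 1.0 + r[k] for k in types}          # c_k = f + r_k
    x = {}                                      # x[(i, k)]: extent to which C(u_i, r_k) is open
    increases = []

    def coverage(j):
        # F_j: total fraction of clusters (of any type) that contain u_j.
        return sum(v for (i, k), v in x.items()
                   if dist(demands[i], demands[j]) <= r[k])

    def total_cost():
        return sum(v * c[k] for (_, k), v in x.items())

    for j in range(n):
        for k in types:
            x[(j, k)] = 0.0                     # x_jk = 0 before u_j arrives
        while coverage(j) < 1.0:
            before = total_cost()
            old = dict(x)                       # snapshot read by both steps (assumption)
            for k in types:
                for i in range(j + 1):
                    if dist(demands[i], demands[j]) <= r[k]:
                        x[(i, k)] = old[(i, k)] * (1.0 + 1.0 / c[k])   # step 2
                x[(j, k)] += 1.0 / (c[k] * (K + 1))                    # step 1
            increases.append(total_cost() - before)
    return x, increases
```

Running it on eight points of the line, e.g. `frac_sumrad([0, 1, 2, 3, 10, 11, 12, 13], lambda a, b: abs(a - b))`, returns the list of cost increases, each of which should stay below $2$, together with fractions under which every demand is covered to extent at least $1$.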
To bound the number of such operations, we observe that after the first $c_{k+1}$ operations caused by demands in $C(p, r_k)$, $F_{p(k+1)}$ becomes at least $1/(K + 1)$, due to the first step of these operations. For each subsequent operation caused by a demand in $C(p, r_k)$, all fractions $x_{j(k+1)}$, with $u_j \in C(p, r_k)$, increase by a factor of $\left(1 + \frac{1}{c_{k+1}}\right)$. Therefore, $F_{p(k+1)}$ increases by a factor of $\left(1 + \frac{1}{c_{k+1}}\right)$. After $O(c_{k+1} \log K)$ such increases, $F_{p(k+1)}$ becomes at least $1$, and Frac-SumRad does not perform any additional operations due to demands in $C(p, r_k)$ arriving afterwards. Therefore, the total fractional cost of Frac-SumRad for the demands in an optimal cluster $C(p, r_k)$ of cost $c_k$ is $O(c_{k+1} \log K)$. Then, the theorem follows from $c_{k+1} \le 2 c_k$ and $K = \log_2 n$. □

8 Conclusions and Open Problems

In this work, we study the problem of Online Sum-Radii Clustering, a natural relaxation of the online version of Sum-$k$-Radii Clustering. We prove that the deterministic competitive ratio of Online Sum-Radii Clustering for general metric spaces is $\Theta(\log n)$, where the lower bound is valid even for relatively simple metric spaces, such as the Euclidean plane and metrics induced by ternary HSTs. Interestingly, we prove that Online Sum-Radii Clustering in HSTs can be regarded as a generalization of Online Parking Permit [18]. Exploiting this result, we obtain a lower bound of $\Omega(\log\log n)$ on the randomized competitive ratio of Online Sum-Radii Clustering in HSTs.

The main remaining open problem is to determine the randomized competitive ratio of Online Sum-Radii Clustering not only in general metric spaces, but also in simple metrics, such as the Euclidean plane and HSTs. In this direction, we present Frac-SumRad, a deterministic $O(\log\log n)$-competitive algorithm for the fractional version of Online Sum-Radii Clustering in general metrics.
Our main open question concerns the existence of a randomized rounding procedure that con verts, in an online fashion, the fractional solution computed by Frac-SumRad to an integral clustering of cost within a constant f actor of the cost incurred by Frac-SumRad. This would be quite interesting since it would imply that the randomized competitiv e ratio of Online Sum-Radii Clustering is Θ (log log n ) . Also, it would be interesting from a technical viewpoint, because known online randomized round- ing procedures for cov ering problems increase the competitiv e ratio by a logarithmic factor , due to feasibility constraints that hav e to fulﬁll with high probability (b ut they apply to non-metric covering problems, see e.g., [2,1]). References 1. N. Alon, B. A werbuch, Y . Azar , N. Buchbinder , and J. Naor . A General Approach to Online Network Optimization Problems. ACM T ransactions on Algorithms , 2(4):640–660, 2006. 2. N. Alon, B. A werbuch, Y . Azar, N. Buchbinder , and J. Naor . The Online Set Cover Problem. SIAM J. on Computing , 39(2):361–370, 2009. 3. V . Bil ´ o, I. Caragiannis, C. Kaklamanis, and P . Kanellopoulos. Geometric Clustering to Minimize the Sum of Cluster Sizes. In Pr oc. of the 13th European Symposium on Algorithms (ESA ’05) , v olume 3669 of LNCS , pages 460–471, 2005. 4. A. Borodin and R. El-Y ani v . Online Computation and Competitive Analysis . Cambridge University Press, 1998. 5. T .M. Chan and H. Zarrabi-Zadeh. A Randomized Algorithm for Online Unit Clustering. Theory of Computing Systems , 45(3):486–496, 2009. 6. M. Charikar , C. Chekuri, T . Feder , and R. Motwani. Incremental Clustering and Dynamic Information Retriev al. SIAM J. on Computing , 33(6):1417–1440, 2004. 7. M. Charikar and R. Panigrahy . Clustering to Minimize the Sum of Cluster Diameters. J. of Computer and System Sciences , 68(2):417–441, 2004. 8. J. Csirik, L. Epstein, C. Imreh, and A. Levin. Online clustering with variable sized clusters. 
Algorithmica, 65(2):251–274, 2013.
9. G. Divéki and C. Imreh. An Online 2-Dimensional Clustering Problem with Variable Sized Clusters. Submitted for publication, 2011.
10. S. Doddi, M.V. Marathe, S.S. Ravi, D.S. Taylor, and P. Widmayer. Approximation Algorithms for Clustering to Minimize the Sum of Diameters. Nordic J. Computing, 7(3):185–203, 2000.
11. M.R. Ehmsen and K.S. Larsen. Better Bounds on Online Unit Clustering. In Proc. of the 12th Scandinavian Symposium on Algorithm Theory (SWAT '10), volume 6139 of LNCS, pages 371–382, 2010.
12. L. Epstein and R. van Stee. On the Online Unit Clustering Problem. ACM Transactions on Algorithms, 7(1):7, 2010.
13. D. Fotakis. On the Competitive Ratio for Online Facility Location. Algorithmica, 50(1):1–57, 2008.
14. M. Gibson, G. Kanade, E. Krohn, I.A. Pirwani, and K. Varadarajan. On Clustering to Minimize the Sum of Radii. In Proc. of the 19th ACM-SIAM Symposium on Discrete Algorithms (SODA '08), pages 819–825, 2008.
15. M. Gibson, G. Kanade, E. Krohn, I.A. Pirwani, and K. Varadarajan. On Metric Clustering to Minimize the Sum of Radii. Algorithmica, 57:484–498, 2010.
16. N. Lev-Tov and D. Peleg. Polynomial Time Approximation Schemes for Base Station Coverage with Minimum Total Radii. Computer Networks, 47(4):489–501, 2005.
17. A. Meyerson. Online Facility Location. In Proc. of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS '01), pages 426–431, 2001.
18. A. Meyerson. The Parking Permit Problem. In Proc. of the 46th IEEE Symposium on Foundations of Computer Science (FOCS '05), pages 274–284, 2005.
19. S.E. Schaeffer. Graph Clustering. Computer Science Review, 1:27–64, 2007.

A Appendix: Online Estimation of the Number of Demands

To remove the assumption that $n$ is known to Frac-SumRad in advance, we run the algorithm in phases, where each phase $\ell$ uses an estimate $n_\ell = 2^{2^{2^\ell}}$ of $n$. Phase $\ell$, $\ell = 1, 2, \ldots$, ends just after $n_\ell$ demands have been processed. Then, the algorithm keeps the fractional solution for the demands processed in phase $\ell$, and starts computing a new fractional solution for the next demands arriving in phase $\ell + 1$, using the estimate $n_{\ell+1}$.

We show that running Frac-SumRad in phases increases its competitive ratio by no more than a constant factor. Let $\lambda$ be the last phase of Frac-SumRad. By Theorem 7, the cost of Frac-SumRad in phase $\ell$, $\ell = 1, \ldots, \lambda$, is at most $2^{\ell} \beta\, \mathrm{OPT}_\ell$, where $\mathrm{OPT}_\ell$ is the optimal cost for the demands arriving in phase $\ell$, and $\beta$ is the constant hidden in the $O$-notation in Theorem 7. Since the optimal cost $\mathrm{OPT}$ for all demands is no less than $\mathrm{OPT}_\ell$, summing over the phases shows that the total cost of Frac-SumRad is at most $\sum_{\ell=1}^{\lambda} 2^{\ell} \beta\, \mathrm{OPT} \le 2^{\lambda+1} \beta\, \mathrm{OPT}$. On the other hand, the total number of demands is at least $2^{2^{2^{\lambda-1}}}$, because phase $\lambda - 1$ is complete, and thus $\log \log n \ge 2^{\lambda-1}$. Therefore, the total cost of Frac-SumRad is at most $4 \beta\, 2^{\lambda-1}\, \mathrm{OPT}$, and the competitive ratio is $O(\log \log n)$.
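The phase argument above can be checked numerically with a small sketch. It is illustrative only and does not implement Frac-SumRad. The function names `phase_estimate`, `last_phase`, and `cost_bound_factor` are hypothetical, and the code assumes that each phase $\ell$ processes exactly $n_\ell$ demands of its own before the next phase starts (the text only says that the phase ends after $n_\ell$ demands have been processed). Costs are measured in units of $\beta\,\mathrm{OPT}$.

```python
import math


def phase_estimate(level: int) -> int:
    """Estimate n_l = 2^(2^(2^l)) used in phase l >= 1, so log2(log2(n_l)) = 2^l."""
    return 2 ** (2 ** (2 ** level))


def last_phase(n: int) -> int:
    """Index of the phase in which the n-th demand arrives.

    Assumption (not stated explicitly in the text): phase l handles
    exactly n_l demands of its own before phase l + 1 begins.
    """
    level, processed = 1, 0
    while True:
        processed += phase_estimate(level)
        if n <= processed:
            return level
        level += 1


def cost_bound_factor(lam: int) -> int:
    """Sum of the per-phase factors 2^1 + ... + 2^lam, in units of beta * OPT."""
    return sum(2 ** level for level in range(1, lam + 1))


if __name__ == "__main__":
    for n in (17, 1000, 10**6, 10**30):
        lam = last_phase(n)
        total = cost_bound_factor(lam)
        # With at least one completed phase, the sum is at most
        # 2^(lam + 1) = 4 * 2^(lam - 1) <= 4 * log2(log2(n)).
        print(n, lam, total, 4 * math.log2(math.log2(n)))
```

For instance, under the stated assumption `last_phase(10**6)` returns 3, since the first two phases together cover only $16 + 65536$ demands. The comparison against $4 \log \log n$ is meaningful only when at least one phase has been completed, which is the case the proof relies on.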
