Hierarchical Aggregation Clustering Algorithms Derived from the Bi-partial Objective Function
Authors: Jan W. Owsiński
Jan W. Owsiński
Systems Research Institute, Polish Academy of Sciences
owsinski@ibspan.waw.pl

Abstract

The paper outlines the principles of construction of a broad class of hierarchical aggregation algorithms of cluster analysis, essentially based on minimum distance mergers, which are derived from the general bi-partial objective function. It is shown how the algorithms arise from the bi-partial objective function, their affinity with the classical hierarchical aggregation algorithms is demonstrated, and examples of such algorithms for concrete forms of the bi-partial objective function are provided. This amounts to the first explicit and, at the same time, quite general connection between optimization in clustering and the hierarchical aggregation algorithms. Thereby, the respective hierarchical algorithms gain a deeper justification, and a means for evaluating the quality of clustering is provided, along with a criterion for stopping the cluster mergers.

Keywords: clustering, hierarchical aggregation, minimum distance, bi-partial objective function, optimization, merger rule

1. Introduction: the clustering problem

Clustering of objects or observations is the subject of a domain in the field of data analysis – cluster analysis – consisting in solving the problem of dividing a set of observations or objects into subsets (clusters) in such a way that the items belonging to the same cluster are possibly similar, or close, to each other, while the items belonging to different clusters are possibly dissimilar, or distant. This problem, apparently and intuitively quite obvious in its nature, is the basis of a very broad spectrum of human endeavors and, indeed, achievements. These range from language formation (meanings of words vs.
objects, features and phenomena in reality) down to automatic self-control of autonomous vehicles (clusters of communication and control nodes). Yet, even though the problem forms a kind of intellectual paradigm, its concrete and precise formulation for the purpose of solving it in a practical situation, with definite data, is still missing.

There are, of course, dozens of clustering algorithms currently available, both in the literature and in publicly available software libraries, based on a variety of presumptions and featuring highly diversified properties in terms of computational effort and the character of results. During the last half century many books have been published, devoted to cluster analysis; some selected examples are Hennig et al. (2015), Everitt et al. (2011), Xu and Wunsch (2009) or Mirkin (1995). Many of these books have been published in several consecutive editions. These books present not only various algorithms, but also theoretical foundations for, and analyses of, the essential elements and questions arising in the formulation and solving of the clustering problem, as defined above.

Even a superficial consideration of this verbally expressed problem leads to the conclusion that these essential elements and questions are related to: (i) the measure(s) of distance and/or proximity for individual objects / observations, and (ii) for their sets (clusters); (iii) the formulation of an overall objective function, representing the quality of a given partition, in accordance with the content of the problem; and (iv) the corresponding algorithm that would lead to the solution, or at least its approximation, for the objective function formulated. The existing methodologies do not respond satisfactorily to these challenges, especially regarding points (iii) and (iv).
This statement holds even if there exist algorithms that are very computationally effective and yield results which can be accepted on the basis of some "external" criteria, not related to these algorithms and expressing various kinds of rational qualifications of the partitions obtained.

In this paper we deal with one of the families of clustering algorithms: the hierarchical merger, or aggregation, algorithms. It is one of the most popular and deeply rooted groups of algorithms, of an obvious intuitive character. We shall show how algorithms of this kind can be derived from the objective function called bi-partial (see Owsiński, 2020), which reflects adequately the essence of the clustering problem. This objective function is formulated in a very general manner, and it is then shown how, for its more concrete forms, the corresponding hierarchical merger algorithms can be derived from it, leading to its suboptimisation. In this way, not only can particular partitions be evaluated in terms of the original clustering problem, but simple algorithms can also be designed, corresponding to the manner of evaluating these partitions and leading to suboptimal solutions. On top of this, the stop condition for the algorithms is specified, and room is established for the improvement of the suboptimal solutions.

In the next section, the hierarchical aggregation algorithms are characterized. Then, in Section 3, comments are forwarded on the relation between the functioning of the known algorithms from this group and the issue of clustering optimization. In Section 4 a telling example is provided for both the construction of the bi-partial objective function and the reasoning behind the derivation of the respective hierarchical merger algorithms. Section 5 provides the general formulation of the bi-partial objective function and the principles of design of the corresponding hierarchical merger algorithms.
The next section, Section 6, is devoted to the presentation of examples of concrete forms of the objective function and corresponding algorithms, along with some comments, mainly of a technical character. Section 7, containing conclusions, closes the paper.

2. The hierarchical aggregation algorithms

Hierarchical aggregation algorithms belong among the most popular, classical clustering algorithms. They form a relatively broad class of algorithmic procedures, including single linkage (nearest neighbor), complete linkage, average linkage, Ward's algorithm, and quite a number of other ones, including their technical variants. All these algorithms work in such a way that for a set X of n objects or observations, indexed i (or j), that is: i ∈ {1,…, n} = I, and a somehow defined distance d(.,.) for any two objects from the set X (or the respective space, E_X, of which X is a subset), one realizes the following generic procedure:

(0) treat all the objects, indexed 1,…, n, as n separate clusters;
(1) find the smallest of the distances between the pairs of clusters;
(2) merge the corresponding two clusters;
(3) check whether the new number of clusters is still bigger than 1; if not – stop;
(4) update the distances between clusters (with respect to the newly formed cluster);
(5) go to step 1.

The particular algorithms from this group differ by step 4 (distance matrix updating)¹, and this is exactly the essential source of the algorithmic variety. Lance and Williams (1966, 1967) were the first to systematize these algorithms through a parametric formula of inter-cluster distance updating. Later on, this formula was further extended in order to accommodate a broader variety of algorithms.
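The generic procedure (0)-(5) can be sketched in code. The following is an illustrative rendering of ours, not an implementation from the paper; the function and variable names are hypothetical, and step (4) is left as a pluggable rule, with single linkage shown as one standard choice:

```python
def single_linkage_update(d_ac, d_bc):
    """Step-(4) rule for single linkage: the distance from the merged
    cluster a union b to any cluster c is the smaller of the two old distances."""
    return min(d_ac, d_bc)

def generic_aggregation(dist, update=single_linkage_update):
    """Generic hierarchical aggregation, steps (0)-(5) of the text.

    dist   -- symmetric n x n list of lists of pairwise object distances
    update -- the step-(4) rule: maps (d(a,c), d(b,c)) to d(a union b, c)
    Returns the merge history as (members_a, members_b, distance) triples.
    """
    D = [row[:] for row in dist]                    # work on a copy
    clusters = {i: [i] for i in range(len(D))}      # step (0): n singletons
    history = []
    while len(clusters) > 1:                        # step (3): stop at p = 1
        keys = sorted(clusters)
        # step (1): find the smallest inter-cluster distance
        _, a, b = min((D[u][v], u, v)
                      for i, u in enumerate(keys) for v in keys[i + 1:])
        # step (2): merge the corresponding two clusters
        history.append((clusters[a][:], clusters[b][:], D[a][b]))
        # step (4): update distances to the newly formed cluster
        for c in keys:
            if c not in (a, b):
                D[a][c] = D[c][a] = update(D[a][c], D[b][c])
        clusters[a] += clusters.pop(b)
    return history

# four points on a line at 0, 1, 5, 7: single linkage merges at d = 1, 2, 4
pts = [0.0, 1.0, 5.0, 7.0]
dist = [[abs(x - y) for y in pts] for x in pts]
merges = generic_aggregation(dist)
```

The naive pairwise scan in step (1) makes this O(n³) overall; it is meant only to make the control flow of the generic procedure explicit.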
If we denote by q, q = 1,…, p, the index of clusters A_q in a partition P, and by D(A_q, A_q'), or D_qq', the distance between clusters, then the formula proposed by Lance and Williams for inter-cluster distance updating, assuming the merged clusters are indexed q* and q**, is as follows:

D_q*∪q**,q = a_1 D_q*q + a_2 D_q**q + b D_q*q** + c |D_q*q – D_q**q|    (1)

defining the distance between the new cluster A_q* ∪ A_q** and any other cluster A_q. Concrete values of the coefficients a_1, a_2, b and c correspond to particular algorithms, as exemplified in Table 1 for some of the most popular ones. (Concerning this table, and in general, n_q denotes the number of objects in the cluster indexed q.) Over time, new algorithms have been found to follow the Lance and Williams formula with different values of the coefficients, and the formula itself has been extended to encompass more coefficients, and hence also a much broader class of algorithms, e.g. by M. Jambu. For an interesting exposition the Reader is referred to Podani (1989) (another worthwhile reference is Murtagh and Contreras, 2012). The number of the thus described aggregation algorithms is now at about 20.

¹ Some of the variants may differ at other steps, e.g. step 1, where additional conditions besides minimum distance may be added.

Table 1.
Examples of hierarchical merger algorithms and their Lance-and-Williams formula coefficients

Algorithm                            | a_1                 | a_2                  | b                           | c
Single linkage (nearest neighbor)    | 1/2                 | 1/2                  | 0                           | -1/2
Complete linkage (farthest neighbor) | 1/2                 | 1/2                  | 0                           | 1/2
Unweighted average (UPGMA)           | n_q*/(n_q*+n_q**)   | n_q**/(n_q*+n_q**)   | 0                           | 0
Weighted average (WPGMA)             | 1/2                 | 1/2                  | 0                           | 0
Centroid (UPGMC)                     | n_q*/(n_q*+n_q**)   | n_q**/(n_q*+n_q**)   | -n_q*·n_q**/(n_q*+n_q**)²   | 0
Median (WPGMC)                       | 1/2                 | 1/2                  | -1/4                        | 0

The generic procedure gives rise, in a natural manner, to a graph of consecutive mergers, a dendrogram (a tree), containing n leaves, n-1 nodes (corresponding to mergers) and 2(n-1) edges. The dendrogram is a very powerful representation of the concrete way of proceeding of the algorithm (and – hopefully – also of the dataset, whose essential characteristics ought to be reflected in this tree). It shows the groups of objects at various "levels of distinction" (lengths of distances between the clusters consecutively merged) and their interconnections, as well as a measure of cluster affinity in the form of the "height" of nodes, either ordinal (sequence of mergers) or given through the value of the distance at which the merger occurred.

In connection with the above, it must be emphasised that the hierarchical aggregation algorithms are very appealing as highly intuitive and easily interpretable. Yet, in view of the computational requirements of these algorithms, primarily the necessity of processing the matrices of distances, they are not appropriate for large data sets, and so, if used for such data sets, it is usually in hybrid techniques, in which they are used in conjunction with some other methods, e.g. based on the density of objects in the space E_X. Yet, there is another issue in the applicability of the hierarchical merger algorithms.
Namely, although widely used in clustering, they actually do not solve the fundamental problem of cluster analysis, i.e., slightly reformulating the definition given at the outset: "to partition the set of objects I into subsets (clusters) A_q, q = 1,…, p, so that the objects in the same clusters be possibly close (similar) to each other, while objects in different clusters be possibly distant (dissimilar)". In other words – these algorithms do not produce the partition, unless we apply an "external" criterion or method to, say, cut the dendrogram at a given height. And this is actually what is usually being done.

The present paper is meant to attempt bridging exactly this gap, namely – to show a class of objective functions for clustering, for which there exists (from which can be derived) a class of hierarchical merger algorithms, which solve, at least in an approximate manner, the clustering problem as represented by these objective functions. This paper does not aim at mathematical precision, but is intended to present the essential principles and the logic of the approach. The paper follows some of the considerations presented in Owsiński (2020), concentrating, though, on the subject of the relation between the objective functions and the hierarchical merger algorithms, and presenting some important new developments.

3. Hierarchical aggregation algorithms and optimality

Almost starting with the initial work of Florek et al. (1956), which introduced the algorithms in question (actually: the single linkage algorithm), the relation between these algorithms and some sort of optimality has been discussed. Particularly well known has become the link between the single linkage algorithm and minimum spanning trees (Kruskal's algorithm).
Yet, there is no comprehensive theory, nor even an empirical study, showing the links of the entire class of the hierarchical merger algorithms, or its sub-classes, with definite optimality concepts.

When considering the relation between the algorithms in question and optimality, one can start with the very basic "minimum distance" heuristic of these algorithms. One deals here with an explicit step-wise rationality, oriented at some sort of quasi-optimisation, composed of two elements: (1) distance minimization (rather than any other choice of clusters for the merger), and (2) the distance updating rationality (the distance-related merger rule), expressed through the Lance-Williams-type formula.

Notwithstanding various studies showing the associations between the particular progressive merger algorithms and definite optimality concepts and objective functions (see further on), the entire class of procedures can be seen as representing various kinds of greedy-type "local optimality intuitions", having, in general, no properties related to any broader optimality.

This changes when we consider the entire dendrograms, rather than consecutive mergers. It then turns out that dendrograms produced by various aggregation procedures may be shown to have definite optimality properties. The relation between single linkage and minimum spanning tree generation algorithms was already mentioned. Similarly early (see Hartigan, 1967) the question was asked of the possibly best approximation of the actual distances between objects by the dendrograms produced with different aggregation procedures (approximation of distances by ultrametrics, corresponding more directly to dendrograms). In a similar vein, the concept of parsimony also appeared in the context of dendrograms, although understood somewhat differently.
This, however, concerns not the optimality of partitions (clustering results), but the possibly most accurate rendition of the distance structure in the data set by the dendrogram. More recently, an approach of a much broader meaning was elaborated, based on the seminal paper by Dasgupta (2016), later on developed significantly by Cohen-Addad and associates (see, e.g., Cohen-Addad et al., 2017), to link hierarchical clustering with a definite class of objective functions at the level of entire dendrograms. This approach applies to a broad class of algorithms, on the one hand, and of objective functions, on the other, with appropriate conditions of correspondence being formulated.

We are, however, interested in the relation between the "minimum distance merger" hierarchical clustering algorithms and the objective function, or functions, pertaining to the proper problem of clustering, as formulated at the outset, i.e. of obtaining an optimum partition, and not a tree (i.e. a dendrogram). Let us note, at this point, that there do exist, indeed, very many criteria or indices (dozens of them!) of clustering quality, see, e.g., Rendón et al. (2011), Vendramin, Campello and Hruschka (2010), Owsiński (2020), Section 5.4, or the choice offered by the R environment (see Cluster Analysis in R documents)². None of these, though, is directly associated with (let alone derived from) any of the hierarchical clustering algorithms, and they are used post-hoc to select the "best" clustering results among the many that can be produced with various algorithms and their diverse parameterisations, with no relation whatsoever to the logic of the algorithm whose results are being evaluated.

At this point we might quote from the already mentioned Dasgupta (2016): "These [hierarchical clustering algorithms] are widely used and are part of standard packages for data analysis.
Despite this, there remains an aura of mystery about the kinds of clusters that they find. In part, this is because they are specified procedurally rather than in terms of the objective functions they are trying to optimize. For many hierarchical clustering algorithms, it is hard to imagine what the objective function might be." (emphasis added). Dasgupta (2016) tried to make the link for the entire dendrograms, while here we shall concentrate on the partitions, i.e. on the proper problem of clustering.

Yet, even with respect to partitions, there have been – besides the work by the present author, conducted since the early 1980s and summarized in Owsiński (2020) – some early hints as to the link with hierarchical aggregation algorithms. The notable case is that of Ducimetière (1970), where the connection is indicated between the average link algorithm and the objective function of clustering proposed, within a broader framework, in Rubin (1967). Yet, this very important junction has not been developed in any way and got, in fact, totally forgotten.

In this paper, a broad approach is presented, linking a class of hierarchical aggregation algorithms ("minimum distance" algorithms) with a class of objective functions for clustering (the "bi-partial objective functions"). For a broader treatment of the subject of the bi-partial objective functions we refer to Owsiński (2020), while here we concentrate on the derivation of the corresponding hierarchical merger algorithms.

4. A telling example

For the purposes of introducing the framework that we shall promote here, let us recall the objective function for clustering introduced by Marcotorchino and Michaud (1979, 1982) as a part of the mathematical programming formulation of the clustering problem.
This formulation was, after a slight modification, as follows:

maximise Σ_{i,j∈I} ( y_ij s_ij + (1 – y_ij) d_ij )    (2)

where d_ij denote the distances between objects in the data set, and s_ij the similarities (proximities) between them, while the decision variables are y_ij = 1 when objects i and j belong, in the solution, to the same cluster, and y_ij = 0 when they belong, in the solution, to different clusters, this formulation being subject to the following obvious constraints:

y_ij ∈ {0,1}, ∀ i, j;  y_ij = y_ji, ∀ i, j, meaning symmetry;    (3)
y_ij + y_jv – y_iv ≤ 1, ∀ i, j, v, meaning transitivity.

Of course, formula (2) could be written down with an explicit expression accounting for the transformation s(d), or vice versa, but, both for purposes of preserving generality and intuitive appeal, as well as simplicity of formulation, we keep here and onwards explicitly both quantities, that is – distance (d) and proximity (s) – assuming, of course, that there is a transformation between them that can be applied in definite concrete cases.

² We mean here, of course, only the so-called "internal" criteria or indices, the "external" ones making reference to some given partitions, treated as references.

Formulation (2), (3) addresses, definitely, both the adequate representation of the clustering problem, as also formulated here (i.e. the possibly strong internal cohesion of clusters and the possibly distinct separation of clusters), and the way of approaching the solution. The obvious numerical problem arises in connection with the transitivity constraint, which requires O(n³) inequalities to be treated.
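Formulation (2)-(3) can be made concrete with a small sketch (our own illustration; `objective_2` is a hypothetical name). Evaluating (2) directly on a partition is convenient because encoding cluster membership automatically satisfies the constraints (3): y_ij = 1 exactly when i and j share a cluster, which is symmetric and transitive by construction.

```python
from itertools import combinations

def objective_2(partition, s, d):
    """Value of formulation (2) for a given partition, summing over
    unordered pairs i < j: pairs in the same cluster contribute s_ij,
    pairs in different clusters contribute d_ij."""
    label = {i: q for q, cluster in enumerate(partition) for i in cluster}
    return sum(s[i][j] if label[i] == label[j] else d[i][j]
               for i, j in combinations(sorted(label), 2))

# three objects with d_01 = 1, d_02 = 4, d_12 = 3 and, as an assumed
# transform, s_ij = 5 - d_ij
d = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
s = [[0, 4, 1], [4, 0, 2], [1, 2, 0]]

# merging only the closest pair beats both extreme partitions under (2)
best = objective_2([[0, 1], [2]], s, d)          # s_01 + d_02 + d_12 = 11
assert best > objective_2([[0], [1], [2]], s, d)  # all singletons: 8
assert best > objective_2([[0, 1, 2]], s, d)      # one cluster: 7
```

The brute-force enumeration of partitions this invites is, of course, only feasible for toy sets; the point of the paper is precisely to avoid it.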
The objective function (2) can, however, be parameterized (see Owsiński and Zadrożny, 1986, 1988) in the manner shown below:

max_P Q(P, r) = r Σ_q Σ_{i<j; i,j∈A_q} s_ij + (1 – r) Σ_{q<q'} Σ_{i∈A_q} Σ_{j∈A_q'} d_ij    (4)

where we no longer use the decision variables y_ij, but refer to the partition P as the "decision variable", and where the parameter r ∈ [0,1] reflects the weight assigned to the two parts of the objective function, now denoted Q(P, r), these two parts corresponding, respectively, to the internal proximity of objects inside clusters and to the distances between objects from various clusters. It is, naturally, tacitly assumed that the "proper solution" is found for r = ½ (i.e. when (4) is equivalent to (2)).

Assume now a procedure meant to find the optimum solution in terms of the partition P, based on moving the value of r, say, from 0 towards 1. Thus, for r = 0, the formulation of the problem, given no other constraints except for those enforcing the partition, (3), would yield the solution in which each of the objects constitutes a separate cluster, i.e. p (the number of clusters in P) = n, since the first component in (4) simply disappears. Given the constraints (3), such a solution is feasible, and shall indeed be provided by any (correct!) method whatsoever. Now, as the value of the parameter r is increased from 0, the very first encountered obvious solution to (4), satisfying the constraints (3), and, at that, different from the one obtained for r = 0, would appear to be the one which merges the two most similar (least distant) objects i* and j* (i.e. such that d_i*j* = min_{i,j} d_ij). The switch from the optimum partition for r = 0, which we shall denote P*(r=0) = P*(0) = P^0, with P^0 = I, to the one in which objects indexed i* and j* form a two-object cluster, takes place at a definite value of the parameter r > 0.
Denote this value, at which the optimum P^0 is replaced by P^1, established by the merger of i* and j*, accordingly, by r^1 (implying that r^0 = 0). Let us note that, quite obviously, if there exist in the data set pairs (i, j) for which d_ij = 0, then the value of (4) and (2) does not change, whether we merge these objects or not. Actually, we can assume that for such special cases r^1 = r^0 = 0, with r^1 corresponding to P^1, the partition which incorporates the merger of all the objects among which all the distances are zero. An analogous reasoning applies when there are more equidistant objects than just pairs at the level of minimum distance. Hence, from now on, just in order to omit triviality (while, in fact, entirely preserving the validity of the reasoning), we shall be considering that the distances among the objects in the data set are different from 0, and that all distances are different as to their values.

Thus, the (first) merger of a pair i* and j* (for which d_i*j* = min_{i,j} d_ij > 0) takes place for the parameter value r^1, determined by d_i*j* and s_i*j*. In order to show the relation, take the two partitions, P^0 and P^1, the latter one differing from the former by just one merger, and the values of the objective function for these partitions, Q(P^0, r) and Q(P^1, r). Hence, we compare the values of

Q(P^0, r) = r Σ_i s_ii + (1 – r) Σ_{i<j} d_ij    (5)

and

Q(P^1, r) = r (Σ_i s_ii + s_i*j*) + (1 – r) (Σ_{i<j} d_ij – d_i*j*)    (6)

for the parameter r increasing up from 0. Obviously, the values of Σ_i s_ii and Σ_{i<j} d_ij are constant for any given set of objects. The comparison (equation) of Q(P^0, r) and Q(P^1, r) yields:

r Σ_i s_ii + (1 – r) Σ_{i<j} d_ij = r (Σ_i s_ii + s_i*j*) + (1 – r) (Σ_{i<j} d_ij – d_i*j*);    (7)

hence, after simple operations, this leads to

r s_i*j* – (1 – r) d_i*j* = 0    (8)

and we get the sought value of r^1:

r^1 = d_i*j* / (s_i*j* + d_i*j*).    (9)
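The crossover value (9) can be checked numerically. The sketch below is our own illustration (the proximity transform s = 1/(1+d) is an arbitrary assumed choice, as the text deliberately leaves the transform open); it verifies that at r = r^1 the partitions P^0 and P^1 give equal values of (4), and that the closest pair indeed yields the smallest crossover.

```python
# three objects on a line at 0, 1, 4, with the assumed transform s = 1/(1+d)
d = {(0, 1): 1.0, (0, 2): 4.0, (1, 2): 3.0}
s = {pair: 1.0 / (1.0 + v) for pair, v in d.items()}

# formula (9) evaluated for every pair: r = d_ij / (s_ij + d_ij)
crossover = {pair: d[pair] / (s[pair] + d[pair]) for pair in d}

# the closest pair (0, 1) gives the smallest crossover value, r^1 = 2/3
r1 = min(crossover.values())
assert crossover[(0, 1)] == r1

# Q(P^0, r) from (5) and Q(P^1, r) from (6); the constant s_ii terms
# cancel in the comparison (7), so they are omitted here
q_p0 = (1 - r1) * sum(d.values())
q_p1 = r1 * s[(0, 1)] + (1 - r1) * (sum(d.values()) - d[(0, 1)])
assert abs(q_p0 - q_p1) < 1e-12   # the two partitions tie exactly at r^1
```

For any r below r^1 the all-singleton partition P^0 scores higher, and for any r above it P^1 takes over, which is exactly the switching behaviour described above.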
Formula (9) is very telling, indeed. Namely, as we shift the value of the weight parameter r from 0 upwards, we look for the smallest possible value of this parameter for which P^0 is no longer the best partition in terms of (4) and should be replaced by another partition. We conclude from (9) that we obtain this smallest value, r^1, exactly for the smallest distance between two objects. Formula (9) is valid for any pair of objects, but the value of the merger parameter r is the smallest for the closest two of them.

It is quite straightforward to observe that the reasoning leading to formula (9) applies to (i) all the subsequent (disjoint) pairs of objects in the set X, ranked according to their pairwise distances, and then to (ii) all the subsequent mergers of clusters, no matter how many objects they may contain (in this case the respective formula would account for the appropriate characteristics of the individual clusters and of the pairs of clusters involved). If, namely, we adopt quite natural definitions, closely associated with (4), namely:

D_qq' = Σ_{i∈A_q} Σ_{j∈A_q'} d_ij and S_qq' = Σ_{i∈A_q} Σ_{j∈A_q'} s_ij, with q ≠ q',    (10)

then formula (9), for the merger step indexed t, takes the more general form:

r^t = D_q*q** / (S_q*q** + D_q*q**),    (11)

where q* and q** index the pair of clusters merged at step t.

At this point, it should be noted that formulae (9) and (11) provide a simple rule for merging the objects and clusters, analogous to the rules of the classical agglomerative schemes. This particular rule, expressed by (9) and (11), is equivalent to average linkage, as noted already in Ducimetière (1970). Yet, here, we do not only proceed as in these agglomerative schemes, guided by a definite merger rule, but also dispose of the "global" objective function, which allows for the evaluation of the entire successive partitions obtained.
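A minimal sketch of the resulting merger procedure, using the sums (10) and the rule (11), may look as follows. This is our own illustrative code, not code from the paper; the names are hypothetical and the transform s = 1/(1+d) is again an assumed choice.

```python
from itertools import combinations

def bipartial_mergers(d, s):
    """Progressive mergers driven by formula (11): repeatedly merge the
    pair of clusters q*, q** minimising r = D / (S + D), where D and S
    are the between-cluster sums of distances / proximities from (10).
    Returns the sequence of merger parameter values r^t."""
    clusters = [[i] for i in range(len(d))]
    r_sequence = []
    while len(clusters) > 1:
        def r_of(a, b):
            D = sum(d[i][j] for i in clusters[a] for j in clusters[b])
            S = sum(s[i][j] for i in clusters[a] for j in clusters[b])
            return D / (S + D)
        # pick the merger with the smallest crossover value r
        r, a, b = min((r_of(a, b), a, b)
                      for a, b in combinations(range(len(clusters)), 2))
        r_sequence.append(r)
        clusters[a] += clusters.pop(b)
    return r_sequence

# three objects on a line at 0, 1, 4, with the assumed transform s = 1/(1+d)
pts = [0.0, 1.0, 4.0]
d = [[abs(x - y) for y in pts] for x in pts]
s = [[1.0 / (1.0 + v) if i != j else 0.0 for j, v in enumerate(row)]
     for i, row in enumerate(d)]
r_seq = bipartial_mergers(d, s)   # here: r^1 = 2/3, then r^2 = 7/7.45
```

Note that the returned r^t sequence, rather than raw distances, indexes the hierarchy; since the ultimate solution is sought at r = ½, a first merger with r^1 = 2/3 > ½, as in this configuration, already indicates that the all-singleton partition should be retained.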
Obviously, the iterative merger procedure proposed here is only a suboptimising one, but an improvement is always possible through certain additional procedures, although, naturally, at a cost. We look for the ultimate solution at the value of r = ½. As a complement to the global objective function and its values, we get the sequence of values of r^t, constituting a natural index of the hierarchy, with values, obtained in the course of the procedure, more closely associated with the objective function than the usually applied distance value.

5. A more general perspective

Having in mind the previously presented example, we can now try to outline a general approach. Thus, quite in analogy to (4), we can propose solving the problem

max_P Q(P, r) = r Q_S(P) + (1 – r) Q_D(P)    (12)

where Q_S(P) represents a measure of the internal similarity of clusters, extended over the entire partition P, and Q_D(P) represents a measure of the distances between clusters, also extended over the entire partition P. It appears quite natural to propose a "dual" to (12) in the form of

min_P Q'(P, r) = r Q'_S(P) + (1 – r) Q'_D(P)    (13)

where Q'_S(P) reflects the inter-cluster similarity measure, extended over the entire partition, and Q'_D(P) reflects the intra-cluster distance measure, also extended over the entire partition (see Owsiński, 2020, for a broader treatment). Yet, we shall continue considering the form (12), the reasoning concerning the "dual" (13) being fully analogous.

In order to proceed, we are obliged to make some assumptions concerning Q_S(P) and Q_D(P). Yet, it is obvious that, in any case, the choice of Q_S(P) and Q_D(P) must be made, first of all, with consideration of the clustering-oriented rationality. Hence, whatever the limitations introduced by any assumptions, they ought to be assessed from the standpoint of this rationality.
And, obviously, this rationality should be assessed with respect to Q_S(P) and Q_D(P) jointly, not separately, as is often the case in various clustering algorithms.

On the other hand, there is more than one line of reasoning, and hence of assumptions, leading to our goal, i.e. the derivation of the family of minimum distance merger rules, analogous to those of the classical hierarchical merger algorithms, from an objective function like (12). Hence, we shall concentrate here on examples thereof, in order not to extend the paper too much, and to show that the general form assumed allows for quite a margin of flexibility. In any case, it will be quite apparent that the illustrative case presented in the preceding section definitely follows the precepts introduced for the general case.

At the most general level, we shall only assume that the functions Q_S(P) and Q_D(P) are characterised by opposite monotonicity along any hierarchy of partitions arising from mergers. This is, of course, not the same as opposite monotonicity with respect to p, the number of clusters, unless we add to this condition some appropriate qualification, like max_P Q_S(P | card P = p) and, analogously, max_P Q_D(P | card P = p).³ Hence, for any hierarchy of partitions H = {P(p)}_p (any dendrogram resulting from the merging procedure) we can propose that

arg min_p Q_D(P ∈ H) = 1, and so arg max_p Q_D(P ∈ H) = n,    (14)

therefore:

arg min_p Q_S(P ∈ H) = n, and so arg max_p Q_S(P ∈ H) = 1.    (15)

The above is insofar intuitively plausible as, indeed, there are no inter-cluster distances for p = 1 (first part of (14)), and there are no intra-cluster distances nor proximities for p = n (first part of (15)).
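The opposite monotonicity of Q_S and Q_D along a merge hierarchy, together with the boundary properties (14)-(15), can be illustrated on a toy example. This is our own sketch, taking Q_S as the sum of within-cluster proximities and Q_D as the sum of between-cluster distances, in the spirit of (4); other choices satisfying the assumption are possible.

```python
from itertools import combinations

def q_s(partition, s):
    """One natural choice of Q_S(P): total within-cluster proximity."""
    return sum(s[i][j] for A in partition for i, j in combinations(A, 2))

def q_d(partition, d):
    """One natural choice of Q_D(P): total between-cluster distance."""
    return sum(d[i][j]
               for A, B in combinations(partition, 2) for i in A for j in B)

# three objects; s is an assumed complement of d (s_ij = 5 - d_ij)
d = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
s = [[0, 4, 1], [4, 0, 2], [1, 2, 0]]

# one merge hierarchy, from p = 3 down to p = 1
hierarchy = [[[0], [1], [2]], [[0, 1], [2]], [[0, 1, 2]]]
qs = [q_s(P, s) for P in hierarchy]   # grows along the mergers: 0, 4, 7
qd = [q_d(P, d) for P in hierarchy]   # shrinks along the mergers: 8, 7, 0
assert qs == sorted(qs) and qd == sorted(qd, reverse=True)
assert qs[0] == 0 and qd[-1] == 0     # the extreme cases of (15) and (14)
```

The two sequences move in opposite directions merger by merger, which is precisely the monotonicity assumption the general derivation relies on.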
Now, let us return to (12) and put r = 0:

max_P Q(P, 0) = 0·Q_S(P) + 1·Q_D(P)    (16)

which means, according to (14), that P*(0) = I, where P*(r) is the best partition, in terms of (12), for the given parameter value r. Thus, for r = 0 we get the (optimal) solution in which each object is a separate cluster.

Assume we increase r from 0 and try to maximise (12) along the values of this parameter. An alternative to the solution P*(0), which definitely stays valid for r small enough⁴, is provided by a partition P, different from I, for which, conforming to the assumption adopted, Q_S(P) > Q_S(I) and, at the same time, Q_D(P) < Q_D(I). Thus, it becomes obvious that for all P different from I a value of r can be found, denoted r*(P, I), for which P becomes a better partition, in terms of (12), than the partition I:

r*(P, I) = (Q_D(I) – Q_D(P)) / ((Q_S(P) – Q_S(I)) + (Q_D(I) – Q_D(P))).    (17)

Definitely, r*(P, I) ∈ [0, 1]. Also, the values of r*(P, I) are smaller when the difference Q_S(P) – Q_S(I) is bigger.

Let us recall that, in line with the algorithmic scheme considered in this study, we assume that we look for the subsequent partitions forming dichotomous hierarchies, i.e. for the sequences P^t, t = 0, 1,…, n-1, such that P^t is established by the merger of a pair of clusters forming partition P^(t-1). We can put P^0 = I, in accordance with the above, and so (17) becomes

r*(P^1, P^0) = (Q_D(P^0) – Q_D(P^1)) / ((Q_S(P^1) – Q_S(P^0)) + (Q_D(P^0) – Q_D(P^1))).    (18)

It is obvious that the same relation holds for each subsequent partition along the index t:

r*(P^t, P^(t-1)) = (Q_D(P^(t-1)) – Q_D(P^t)) / ((Q_S(P^t) – Q_S(P^(t-1))) + (Q_D(P^(t-1)) – Q_D(P^t))).    (19)

³ Other possibly introduced assumptions would concern the nature of the sequences of values of Q_S(P) and Q_D(P) for consecutive mergers (e.g. monotonicity of the differences between the consecutive values), and/or, especially, the association with the values of the inter-cluster and intra-cluster distance / proximity measures.
Definitely, relations (17)-(19) hold not just for the dichotomous hierarchy and its immediately neighbouring levels (partitions), but for any two partitions coming from the same hierarchy, with appropriate preservation of the order of the two. Here, however, more narrowly, we look, given some P^(t-1), for the "best" among the P^t formed by the merger of two clusters composing P^(t-1), the "best" being expressed in terms of (12) (ultimately, for r = ½). Resulting from these consecutive choices is the sequence (hierarchy) denoted {P^*t}. We shall propose that it arises from the minimisation, for each consecutive P^*(t-1), of the respective r*(P^t, P^*(t-1)), i.e.

P^*t = arg min r*(P^t, P^*(t-1)),    (20)

with P^t being limited to partitions formed by a merger of two clusters composing P^*(t-1). We shall further denote min r*(P^t, P^*(t-1)) by r^t.

The minimisation in (20), given (19), is equivalent to the maximisation of the quotient

(Q_S(P^t) – Q_S(P^*(t-1))) / (Q_D(P^*(t-1)) – Q_D(P^t)),    (21)

meaning that the merger ought, in a greedy manner, to exploit the biggest current benefit in terms of ΔQ_S(P^t) = Q_S(P^t) – Q_S(P^*(t-1)) and ΔQ_D(P^t) = Q_D(P^*(t-1)) – Q_D(P^t).⁵

Let us, then, consider, in this context, the shape of the potentially resulting function Q(P, r) = r Q_S(P) + (1 – r) Q_D(P) as the procedure outlined above is performed along r, starting from r^0 = 0.

⁴ As already mentioned, we assume that there are no zero inter-object distances in the set considered, although the existence of such zero distances (identical objects) is not a problem for the entire reasoning (all of the respective objects can be aggregated already at r = 0).

⁵ As we refer to the minimum distance procedure, a natural implication arises concerning the properties of the bi-partial objective function, as to the relation between ΔQ_S(P^t) and ΔQ_D(P^t), on the one hand, and min_{q,q'} D(A_q, A_q'), on the other hand.
Since the gradient of Q(P, r) with respect to r is Q_S(P) − Q_D(P), we start, for r_0 = 0, with P^0 = I, from the biggest decrease possible at this step. At the other extreme, for t = n−1, we have P^{n−1} = P^{*n−1} = {I}, meaning all objects in one cluster, with the gradient of Q(P, r) at this end attaining its (positive) maximum. The obvious conjecture is that we deal with a convex, piecewise linear function, changing its gradient at every merger (every t). This is definitely true along the sequence of the P^t, but not necessarily along the values of r, as this will depend upon the concrete forms of Q_S(P) and Q_D(P).

At this point we have effectively formulated a general procedure of suboptimisation of the bi-partial clustering criterion. This procedure is very much like the classical hierarchical merger procedures, but with a clear indication of the point when the generation of the dendrogram may be stopped (i.e. r_t ≤ ½, r_{t+1} > ½). Indeed, the example of this procedure that we provided in the preceding section was equivalent to one of the classical merger algorithms, which thereby gets equipped with a very definite indicator of partition quality, without the need to recur to "external" statistical measures. We shall now provide a choice of further examples of such procedures for a couple of selected definitions of Q(P, r).

6. The examples

The precepts that we have adopted for the function Q(P, r) allow for a very broad variety of formulations of this function, and hence also of the potentially associated merger algorithms. We shall now present a couple of examples thereof, starting with the simplest cases, the first one actually bordering upon triviality. In order not to prolong the exposition, we provide only the basic precepts, leaving in most cases the derivation of the concrete form of the key merger relation, analogous to (19), to the Reader.
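The convex, piecewise-linear picture described above can be illustrated numerically: each fixed partition contributes a line r ↦ rQ_S(P) + (1−r)Q_D(P), and the value of the best partition is the upper envelope of these lines. The three (Q_S, Q_D) pairs below are invented toy numbers, used only to show the envelope and the switching of the optimum as r grows.

```python
# Each fixed partition yields a line in r with slope QS(P) - QD(P); the
# optimum over partitions is the upper envelope of these lines, hence a
# convex, piecewise linear function of r.
partitions = {
    "I (all singletons)": (0.0, 3.0),   # (QS, QD): QD maximal, QS = 0
    "two clusters":       (2.2, 1.2),
    "{I} (one cluster)":  (3.1, 0.0),   # QS maximal, QD = 0
}

def value(p, r):
    qs, qd = partitions[p]
    return r * qs + (1 - r) * qd

def best(r):
    return max(partitions, key=lambda p: value(p, r))

# The optimum moves from I at r = 0 towards the single cluster at r = 1,
# switching at the crossing points of the lines.
sequence = [best(k / 10) for k in range(11)]
envelope = [max(value(p, k / 10) for p in partitions) for k in range(11)]
```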
6.1. The additive objective function with a constant cluster cost

The simple case presented here (see also Owsiński, 2020) illustrates the possibility of representing the facility location problem in terms of the explicit bi-partial objective function. Notwithstanding the possible variants of the actual facility location problem, we can represent it in the following manner, for the minimised version of the bi-partial function:

Q_D(P) = Σ_q D(A_q); Q_S(P) = p, (22)

which, even if a bit artificial from the clustering perspective, regarding especially the form of Q_S(P), can still be treated as a representation of the bi-partial paradigm, if appropriately scaled (normalised) so as to adequately represent the facility location problem. In this case, Q_S(P) represents the total cost of establishing the clusters (facility locations), while the D(A_q) represent the costs, associated with distances, generated by the subsets of demand / supply points. The D(A_q) can have, for instance, a form analogous to (10), or the one of the classical k-means algorithm (sum of distances to the cluster centroid).

With this kind of bi-partial function we can devise a progressive merger procedure based on the smallest distances between clusters. Actually, for p = n we have, assuming that D({x}) = 0 for any single object x ∈ E^K, which is quite natural,

Q_D(I) = 0; Q_S(I) = n, i.e. Q(I) = n. (23)

Then, the question is: can we find a pair of objects, say i* and j*, such that D({i*, j*}) < 1, so that the value of the overall objective function for the partition in which only this particular pair is formed is smaller than in (23), i.e. than for the partition P = I:

D({i*, j*}) + n − 1 < n. (24)
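This condition, in its incremental form (merging pays as long as the increase of Σ_q D(A_q) stays below the unit cluster cost saved), already suggests the merger loop. A minimal sketch follows; the choice of D as the sum of member distances to the cluster mean (one of the options mentioned above) and the one-dimensional toy data are assumptions for illustration.

```python
def D(cluster, pts):
    # cost of a cluster: sum of member distances to the cluster mean
    m = sum(pts[i] for i in cluster) / len(cluster)
    return sum(abs(pts[i] - m) for i in cluster)

def facility_merge(pts):
    P = [{i} for i in pts]                      # start from P = I, Q(I) = n
    while True:
        best = None
        for a in range(len(P)):
            for b in range(a + 1, len(P)):
                merged = P[a] | P[b]
                # increase of total D caused by this merger
                dD = D(merged, pts) - D(P[a], pts) - D(P[b], pts)
                if dD < 1.0 and (best is None or dD < best[0]):
                    best = (dD, a, b, merged)
        if best is None:                        # no merger pays any more
            return P
        _, a, b, merged = best
        P = [C for k, C in enumerate(P) if k not in (a, b)] + [merged]

clusters = facility_merge({0: 0.0, 1: 0.2, 2: 5.0, 3: 5.3})
```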
Consequently, it can be concluded that, as long as the value of D(A_q) for a cluster A_q, formed through the aggregation of any (two) clusters from some preceding partition, is smaller than 1, it pays to proceed with the thus defined aggregation, since Q(P) shall thereby decrease. In this manner we can easily design in detail a progressive merger algorithm that stops once no cluster with D(A_q) < 1 can any longer be formed, meaning that we have established a suboptimal solution in terms of Q(P). It suffices for the function D(A_q) to satisfy quite natural and simple conditions (no decrease of the value being possible as we join objects and clusters that are increasingly distant) for the thus outlined procedure to determine a good approximation of the optimum solution.

The above not only justifies the incorporation of this case in the bi-partial paradigm, but also indicates the issue of essential relevance for the facility location problem, namely that of the appropriate scaling of values, so that the whole problem, and hence its solution, have a practical (or at least interpretative) sense.

6.2. The case of minimum distances and maximum proximities

Another reasonable representation of the clustering problem might involve the following, actually quite classical, definitions of distances and proximities for the particular clusters:

D(A, B) = min_{i∈A, j∈B} d_ij; and S(A) = max_{i,j∈A} s_ij, (25)

with the following definitions of the components of the bi-partial objective function:

Q_D(P) = Σ_q Σ_{q'>q} D(A_q, A_{q'}) and Q_S(P) = Σ_q card A_q · S(A_q), (26)

where card A denotes the number of objects in the set A. This formulation, while offering a relatively well justified model for the clustering problem, also allows for the use of a progressive merger algorithm, very similar to those already proposed, and likewise leading to a suboptimal solution.
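Definitions (25)-(26) translate directly into code. In the sketch below, the similarity transform s = d_max − d used for the toy data is an assumed choice, not one prescribed by the text.

```python
from itertools import combinations

def QD_QS(P, d, s):
    # Q_D per (25)-(26): sum of single-link distances over cluster pairs
    QD = sum(min(d(i, j) for i in A for j in B)
             for A, B in combinations(P, 2))
    # Q_S per (25)-(26): per cluster, its cardinality times the maximal
    # within-cluster similarity (taken as 0 for singletons)
    QS = sum(len(A) * max((s(i, j) for i in A for j in A if i != j),
                          default=0.0)
             for A in P)
    return QD, QS

pts = [0.0, 1.0, 5.0]
d = lambda i, j: abs(pts[i] - pts[j])
s = lambda i, j: 5.0 - d(i, j)       # assumed transform: s = d_max - d
qd, qs = QD_QS([{0, 1}, {2}], d, s)
```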
The formulation satisfies by itself the conditions for such an algorithm to be applicable and to stop at the suboptimal solution.

6.3. The case of average distances and additive proximities

In this case it is also possible to apply the progressive merger algorithm, based on the direct application of precepts analogous to those already presented. Here, at the level of individual clusters, we refer to the following definitions:

D(A, B) = (1 / (card A · card B)) Σ_{i∈A} Σ_{j∈B} d_ij, and S(A) = ½ Σ_{i,j∈A} s_ij, (27)

while the respective two components of the overall bi-partial objective function, i.e. Q_D(P) and Q_S(P), are simple summations over all clusters forming the partition.

6.4. The k-means case: the objective function

We shall now present the bi-partial version of the classical k-means algorithm and its setting. The version introduced here differs from the original one by the formulation of the objective function, with consequences concerning the nature of the solutions obtained and, most importantly, the possibility of specifying the number of clusters, which for the classical k-means is routinely done exclusively with "external" criteria.

Let us remind that the general formulation of the classical k-means is based on the minimised objective function that we shall denote in the usual manner here adopted, Q_D(P). In the standard setting of the classical k-means algorithm, Q_D(P) has the form

Q_D(P) = Σ_q Σ_{i∈A_q} d(x_i, x̄_q), (28)

where d(·,·) is some distance function,[6] x_i is the vector of values characterising object i, and x̄_q is the (analogous) characterisation of the "representative" of cluster A_q, q = 1, ..., p.
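Formula (28), with the representatives taken as cluster means, is straightforward to evaluate. One-dimensional data and the absolute distance are used here only for brevity; the classical setting would mostly take the squared Euclidean distance (cf. the footnote).

```python
def QD_kmeans(P, pts):
    # Q_D(P) of (28): sum over clusters of the member distances to the
    # cluster representative (here: the arithmetic mean)
    total = 0.0
    for A in P:
        centre = sum(pts[i] for i in A) / len(A)
        total += sum(abs(pts[i] - centre) for i in A)
    return total

pts = [0.0, 0.2, 4.0]
qd = QD_kmeans([{0, 1}, {2}], pts)   # 0.1 + 0.1 + 0.0
```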
The minimum values of Q_D(P) are obtained with the use of the primeval generic algorithm associated with the k-means, namely the "centre-and-reallocate" one: for some initial, possibly random, set of x̄_q, q = 1, ..., p, assign the elements of the set X to the closest x̄_q, thereby determining the clusters A_q; then calculate the new x̄_q, e.g. as averages over the A_q, and go back to the assignment step, stopping the entire procedure whenever there is no change between the iterations, or the change satisfies definite conditions. It is known that this algorithm very efficiently leads to a local minimum of Q_D(P) for a given value of p, in a very limited number of iterations even for quite large data sets. In addition, even though the "centre-and-reallocate" procedure attains just a local minimum of the given objective function, starting the procedure from multiple randomly generated sets of "representatives" x̄_q ultimately yields the proper minimum for the given number of clusters p. There exist a number of techniques for improving the choice of the initial set of x̄_q, which diminish the number of necessary repetitions of the procedure.

The minimum values of Q_D(P), determined with the primeval algorithm for consecutive numbers of clusters p, these values denoted here Q*_D(p), decrease from a simile of total variance, conforming to (28), for p = 1 (i.e. for the entire set of objects considered, treated as a single cluster), down to zero for p = n (or even "earlier", in terms of p). Hence, it is necessary to run the respective algorithm, with the repetitions mentioned before, for several consecutive values of p and apply an external (statistical) criterion, say AIC, BIC, Caliński & Harabasz, etc., in order to pick the "proper" number of clusters p.
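The centre-and-reallocate loop just described can be sketched as follows; one-dimensional data and the absolute distance are used for brevity, where the classical algorithm would use multidimensional means and squared Euclidean distances.

```python
import random

def kmeans(X, p, iters=100, seed=0):
    rng = random.Random(seed)
    centres = rng.sample(X, p)                 # initial representatives
    for _ in range(iters):
        clusters = [[] for _ in range(p)]
        for x in X:                            # assignment step
            q = min(range(p), key=lambda k: abs(x - centres[k]))
            clusters[q].append(x)
        new = [sum(C) / len(C) if C else centres[q]   # centering step
               for q, C in enumerate(clusters)]
        if new == centres:                     # no change: local minimum
            break
        centres = new
    return sorted(centres)

centres = kmeans([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], p=2)
```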
In this manner we are obliged, in order to find the "best" p, to recur to a criterion that does not stem from the same "philosophy" as the objective function we used, Q_D(P), and the respective algorithm; thus the criterion applied may be poorly fit to the original k-means framework (it may be oriented at a different character of partition than that produced by the k-means algorithm). This important reservation holds true even if we accept the results produced by such an "external" criterion on the basis of a comparison of results produced by several different criteria (all of them being, in principle, similarly "external" to the original procedure).

We would like to resolve this issue by applying the bi-partial approach. The weak point of this approach is that there is no ready recipe for designing the concrete forms of Q_S(P) and Q_D(P). It is quite often so that, as in the case of k-means-type algorithms, some objective function, whether explicit or implicit, corresponding to an existing approach and representing either Q_S(P) or Q_D(P) (i.e. Q_D(P) or Q_S(P) for the dual formulation), can be complemented with an appropriate counterpart, which then has to be cleverly designed. The reward, however, may be worth the effort: for a pair of functions fulfilling certain additional conditions, which in some cases are quite natural, one obtains also a straightforward, even if not always very effective, algorithm, leading to the optimum or a sub-optimum partition P.

[6] Actually, in order to ensure the fulfilment of the appropriate properties of the k-means algorithm, especially securing that the cluster mean minimises the sum of distances to the objects in a cluster, mostly the squared Euclidean distance is referred to in this context. Otherwise, the basic properties mentioned are replaced by respective approximations.
In this particular case, the objective function matching the principles of k-means might have the following minimised form:

Q_DS(P) = Q_D(P) + Q_S(P), (29)

with Q_D(P) defined as in formula (28). The second component, Q_S(P), would then have to reflect the overall measure of inter-cluster proximity (similarity). It might take, in particular, the form of

Q_S(P) = ½ Σ_q S*(A_q), (30)

with S*(A_q) being the "outer similarity" measure for the cluster A_q, defined, say, as

S*(A_q) = Σ_{i∈A_q} max_{q'≠q} s(x_i, x̄_{q'}). (31)

With the definitions provided through formulae (29)-(31) we can observe the behaviour of the objective function upon choosing a definite measure of distance. If we choose the Manhattan, or city-block, distance, we deal in the "centering" step only with an approximation of the minimum sum of distances.

Computations carried out for simple academic data sets serve to verify the basic features of the proposed objective function. An example thereof is shown in Fig. 1. It can easily be seen that it is meant to check the possibility of indicating more than one "best" partition, and the respective (alternative / potential) partitions are well visible. It can be said that in this example one deals with truly "nested" partitions (two or even three levels of quasi-optimum partitions). Such an example, on the one hand, provides the possibility of checking the "correctness" of results from a clustering algorithm, and, on the other, poses challenges that cannot be properly addressed by the existing clustering methods. The values of the functions involved in (29), obtained in the respective consecutive solutions for the successive values of p, are provided in Table 2.

Figure 1. An academic example of a data set for the bi-partial k-means (nested clusters, 2-D, three levels; axes x_1, x_2)

Some explanations are due concerning the exemplary calculations. Thus, n = 60.
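A direct evaluation of (29)-(31) on a toy one-dimensional set shows the intended behaviour: the extreme partitions score high, while the "visible" grouping minimises Q_DS. The similarity transform s(d) = d_max − d and the data below are assumptions for illustration; the paper's own example (Fig. 1) is two-dimensional with n = 60.

```python
from itertools import combinations

def Q_DS(P, pts):
    # Q_D of (28) plus the outer-similarity term Q_S of (30)-(31),
    # with representatives = cluster means and s(d) = dmax - d (assumed)
    dmax = max(abs(a - b) for a, b in combinations(pts, 2))
    cent = [sum(pts[i] for i in C) / len(C) for C in P]
    QD = sum(abs(pts[i] - cent[q]) for q, C in enumerate(P) for i in C)
    if len(P) < 2:
        return QD                              # no "other" clusters: QS = 0
    QS = 0.5 * sum(max(dmax - abs(pts[i] - cent[qp])
                       for qp in range(len(P)) if qp != q)
                   for q, C in enumerate(P) for i in C)
    return QD + QS

pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
singletons = [{i} for i in range(6)]
visible    = [{0, 1, 2}, {3, 4, 5}]
one        = [set(range(6))]
values = [Q_DS(P, pts) for P in (singletons, visible, one)]
```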
The values of the objective function Q_DS for the extreme numbers of clusters (1 and 60), shown in Table 2, differ, despite the application of an "averaging" definition of the transformation s(d) (i.e. a transformation preserving the average value over the set of pairs of objects), due to the rounding appearing in the actual transformation. Finally, the values for p = 7 were not calculated.

Table 2. Optimum values of the objective function (29) and its components for the successive numbers of clusters for the academic example illustrated in Fig. 1

No. of clusters | Q_DS   | Q_D    | Q_S
1               | 353.65 | 353.65 | 0
2               | 338.87 | 207.05 | 131.82
3               | 333.67 | 134.87 | 198.80
4               | 265.15 | 59.67  | 205.48
5               | 288.23 | 51.05  | 237.18
6               | 307.38 | 42.52  | 264.86
7               | X      | X      | X
8               | 335.40 | 21.96  | 324.44
60              | 361.23 | 0      | 361.23

Concerning the results obtained, let us note that the bi-partial objective function, as designed for this particular case, indicates "correct" values of p as corresponding to the partitions (clusterings) visible in the data set. Thus, a very clear minimum is observed for p = 4. No proper local minima, though, are observed, as might have been hoped, for p = 8 and possibly higher values of p. Still, for p = 8 a slight deviation from the upward trend towards Q_DS(60) is observed in the respective curve of Q_DS(p). These other minima would have been obtained if a different transformation s(d) were applied (see also the next subsection) or a different weight were used in formula (30). The assumptions made appear, however, to be quite "natural", and these other solutions can reasonably be sought only after the initial, "natural" solution has been found.

6.5. The k-means case: the procedure

We shall now show the development of the corresponding algorithmic procedure for the k-means case.
Let us remind that we deal here with the objective function

Q_DS(P) = Q_D(P) + Q_S(P), with Q_D(P) = Σ_q Σ_{i∈A_q} d(x_i, x̄_q) and Q_S(P) = ½ Σ_q Σ_{i∈A_q} max_{q'≠q} s(x_i, x̄_{q'}), (32)

which is minimised. Now, as before, we parameterise this function:

Q_DS(P, r) = rQ_D(P) + (1−r)Q_S(P) (33)

and start with, say, r_0 = 1. For this value of r we have min_P Q_DS(P, 1) = min_P Q_D(P), and thus it is obvious, given the properties of Q_D(P), that the optimum is P(r_0) = P^0 = I, as Q_D(I) = 0. Starting with r_0 = 1 we decrease the value of the parameter, and so (1−r)Q_S(P) increases from zero (while rQ_D(P^0) decreases). The part of the objective function associated with Q_S(P) is equal to

(1−r) ½ Σ_{i∈I} max_{j≠i} s_ij. (34)

Conforming to the idea adopted, we look for the possibly highest (first encountered) value of r for which P^0 is replaced by another partition minimising the objective function. We look for the shift constituted by the merger of two objects (clusters) but, in fact, at this stage (parameter value) this would be a global minimum. So, we select for the merger the objects (clusters) i*, j* on the basis of max_{i,j} s_ij (i.e. min_{i,j} d_ij), and compare the values of the objective function before and after the merger, which takes place when the left-hand side (before) is bigger than the right-hand side (after):

(1−r) ½ Σ_{i∈I} max_{j≠i} s_ij > r·d_{i*j*} + (1−r) ½ (Σ_{i∈I} max_{j≠i} s_ij − 2s_{i*j*}), (35)

where the decrement 2s_{i*j*} reflects the fact that s_{i*j*}, being the largest similarity in the set, is the maximal term for both i* and j*. After straightforward transformations, we get from (35):

r < s_{i*j*} / (s_{i*j*} + d_{i*j*}), (36)

which definitely confirms our choice of s_{i*j*} = max_{i,j} s_ij, since we look for the first, i.e. the biggest, r satisfying (36). We have thus obtained the relation corresponding to (17), actually analogous to (9). Now, we would like to ascertain that when we deal with clusters composed of more objects, the relations derived before also generally hold.
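The first-merger threshold (36) can be computed directly; the similarity transform s = d_max − d used for the toy data is again an assumed choice.

```python
def first_merger_threshold(pts, s, d):
    # pick i*, j* maximising s (equivalently minimising d), as in the
    # text, and return the bound of (36): r < s*/(s* + d*)
    pairs = [(i, j) for i in range(len(pts)) for j in range(len(pts)) if i < j]
    i_s, j_s = max(pairs, key=lambda p: s(*p))
    return s(i_s, j_s) / (s(i_s, j_s) + d(i_s, j_s))

pts = [0.0, 0.5, 3.0]
d = lambda i, j: abs(pts[i] - pts[j])
s = lambda i, j: 3.0 - d(i, j)       # assumed transform: s = d_max - d
r_bound = first_merger_threshold(pts, s, d)   # pair (0, 1): 2.5 / 3.0
```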
So, again, we compare the values of the objective function before and after the merger:

before: r Σ_q D(A_q) + (1−r) ½ Σ_q S(A_q), (37a)

after: r (Σ_q D(A_q) − D(A_{q1}) − D(A_{q2}) + D(A_{q12})) + (1−r) ½ (Σ_q S(A_q) − S(A_{q1}) − S(A_{q2}) + S(A_{q12})), (37b)

where the clusters indexed q1 and q2 are supposed to be merged and to form together the cluster indexed q12. As before, when looking for the condition that (37a) be bigger than (37b), after simple transformations we obtain:

r < (S(A_{q1}) + S(A_{q2}) − S(A_{q12})) / [(S(A_{q1}) + S(A_{q2}) − S(A_{q12})) + 2(D(A_{q12}) − D(A_{q1}) − D(A_{q2}))]. (38)

Again, we try to find the biggest r by selecting the appropriate clusters, indexed q1, q2. We have thus formulated a progressive merger algorithm which fits into the k-means framework, but actually proceeds in a different manner.

At the end of this section we shall yet devote some attention to the properties of the quantities appearing in formula (38), namely the crucial two:

S_{q1q2} = S(A_{q1}) + S(A_{q2}) − S(A_{q12}), (39a)
D_{q1q2} = D(A_{q12}) − D(A_{q1}) − D(A_{q2}). (39b)

First, it is obvious that both of these are non-negative for any two clusters indexed q1 and q2. Hence, the value of the bound in (38) belongs to the interval [0, 1]. Then, also in connection with the previous statement, (39a) can be regarded as a measure of proximity between two clusters, and (39b) as a measure of distance between two clusters. Hence, we definitely deal here with the minimum distance aggregation rule.

Another interesting point is the resulting sequence of values of r_t for the subsequent cluster mergers. Although we choose the biggest possible value according to (38), we cannot be assured that the sequence we get is monotonic (non-increasing). Yet, if for some merger t the value of r_t is bigger than r_{t−1}, then we have a clear indication that the distribution of objects in space is inhomogeneous in the manner illustrated in Fig. 1 (groups of close-by clusters, these groups being separated by larger distances).
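The cluster-level bound (38) depends only on the aggregate quantities (39a)-(39b). A minimal sketch follows, with the test numbers chosen to check consistency with the singleton case (36): taking D1 = D2 = 0, D12 = d, S1 = S2 = s, and (an assumed simplification for well-separated clusters) S12 ≈ 0 reproduces r < s/(s + d).

```python
def merger_r_bound(D1, D2, D12, S1, S2, S12):
    # quantities (39a)-(39b)
    S_pair = S1 + S2 - S12     # proximity between the two clusters, >= 0
    D_pair = D12 - D1 - D2     # distance between the two clusters,  >= 0
    # bound (38); the factor 2 comes from the 1/2 weight in (30)
    return S_pair / (S_pair + 2.0 * D_pair)

# Singleton consistency check against (36).
s_, d_ = 2.0, 1.0
r_b = merger_r_bound(0.0, 0.0, d_, s_, s_, 0.0)
```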
6.6. Some computational issues

We have presented here examples of both the formulations of the bi-partial objective function and the associated merger algorithms. As already mentioned, the hierarchical merger algorithms suffer from a computational burden which makes them of little or no use for really large data sets. That is why, nowadays, such approaches are used, if at all, either for smaller data sets, for "didactic" purposes, or in conjunction with other, faster approaches, in the form of hybrid procedures. This is often done with a faster algorithm forming initial groups, which are then merged by a hierarchical merger algorithm. This is, for instance, done in Owsiński and Mejza (2007) and Owsiński (2010), with the classical k-means in the first stage, starting from a relatively high number of clusters, say n^{1/2}, the bi-partial objective values Q_DS(P) being observed for consecutive, decreasing p. Then the mergers are performed according to the bi-partial-generated rules.

The case of k-means, described before, is insofar specific as, by applying the bi-partial-based merging, we lose, of course, the advantage of the original k-means in terms of fast convergence, without the need to take into account all the distances / proximities. Yet, altogether, the verification of the applicability of the bi-partial approach to the k-means-type paradigm shows that it is not only fully feasible, but also effective. Definitely, similar (or analogous) bi-partial functions can also be defined for the fuzzy version of the paradigm (FCM). The paper by Dvoenko and Owsiński (2019) presented yet another version of the same paradigm ("the meanless k-means"), also transformed using the bi-partial approach.
7. Conclusions

The paper introduced the general outline for a broad class of progressive merger procedures for clustering, based on the concept of the bi-partial objective function, which serves to represent correctly (fully) the original problem of clustering. It was shown how the bi-partial objective function can be designed for a variety of circumstances (assumptions as to the clustering criteria), and how the merger algorithms can be derived for these formulations, approximating the corresponding optimal solutions. Thus, the approach provides, first, an effective instrument for evaluating the partitions / clusterings obtained, and also an effective, general form of the respective suboptimisation procedure.

In view of the computational issues related to the merger algorithms, it is advisable to recur to hybrid algorithms, with mergers performed in their later stages. It must be noted, though, that the broad applicability and versatility of the overall approach allows for very different forms of the merger rules, some of which may be much more computationally efficient than others. Another open question relates to the choice and consequences of stricter conditions on the clustering criteria, which could be used within the bi-partial framework regarding the correct construction of the objective function and the derivation of the merger algorithms.

Compliance with Ethical Standards

Funding: no special funding is associated with this paper.
Conflict of Interest: there are no conflicts of interest associated with this paper and its contents.
Ethical Conduct: this paper does not involve any relevant aspects from the point of view of ethical conduct.
Data Availability Statement: this paper refers solely to small academic sets of illustrative data, which can be obtained from the author upon request.

Bibliography

Cohen-Addad V., Kanade V., Mallmann-Trenn F. and Mathieu C. (2017) Hierarchical Clustering: Objective Functions and Algorithms. arXiv:1704.02147v1 [cs.DS], 7 Apr. 2017.
Dasgupta S. (2016) A Cost Function for Similarity-Based Hierarchical Clustering. STOC'16, June 19-21, 2016, Cambridge, MA, USA; ACM.
Ducimetière P. (1970) Les méthodes de la classification numérique. Revue de Statistique Appliquée, XVIII, 4, 5-25.
Dvoenko S. D. and Owsiński J. W. (2019) The Permutable k-Means for the Bi-Partial Criterion. Informatica, vol. 43, no. 2.
Everitt B. S., Landau S., Leese M. and Stahl D. (2011) Cluster Analysis. John Wiley and Sons Ltd.
Florek K., Łukaszewicz J., Perkal J., Steinhaus H. and Zubrzycki S. (1956) Taksonomia Wrocławska (The Wrocław Taxonomy; in Polish). Przegląd Antropologiczny, 17.
Hartigan J. A. (1967) Representation of similarity matrices by trees. JASA, 62, 1140-1158.
Hennig Ch., Meila M., Murtagh F. and Rocci R. (2015) Handbook of Cluster Analysis. Chapman and Hall/CRC, New York.
Lance G. N. and Williams W. T. (1966) A Generalized Sorting Strategy for Computer Classifications. Nature, 212, 218.
Lance G. N. and Williams W. T. (1967) A General Theory of Classificatory Sorting Strategies. 1. Hierarchical Systems. Computer Journal, 9, 373-380.
Marcotorchino F. and Michaud P. (1979) Optimisation en Analyse Ordinale des Données. Masson, Paris.
Marcotorchino F. and Michaud P. (1982) Aggrégation de similarités en classification automatique. Revue de Statistique Appliquée, 30, 2.
Mirkin B. (1995) Mathematical Classification and Clustering. Kluwer Academic Publishers, Dordrecht - Boston - London.
Murtagh F. and Contreras P. (2012) Algorithms for hierarchical clustering: an overview. WIREs Data Mining and Knowledge Discovery, 2, 86-97; doi: 10.1002/widm.53.
Owsiński J. W. (2010) On a two-stage clustering procedure and the choice of objective function. In: Computer Data Analysis and Modeling: Complex Stochastic Data and Systems. Proc. of the 9th International Conference, Minsk, September 7-11, 2010. Publishing center of BSU, Minsk, 157-165.
Owsiński J. W. (2020) Data Analysis in Bi-partial Perspective: Clustering and Beyond. Studies in Computational Intelligence, 818. Springer Nature Switzerland.
Owsiński J. W. and Mejza M. T. (2007) On a New Hybrid Clustering Method for General Purpose Use and Pattern Recognition. In: Proceedings of the International Multiconference on Computer Science and Information Technology, vol. 2, ISSN 1896-7094, http://www.papers2007.imcsit.org/, 121-126.
Owsiński J. W. and Zadrożny Sł. (1986) Clustering for ordinal data: a linear programming formulation. Control & Cybernetics, 15, no. 2, 183-193.
Owsiński J. W. and Zadrożny Sł. (1988) A flexible system of precedence coefficient aggregation and consensus measurement. In: Systems Analysis and Simulation 1988, A. Sydow, S. G. Tzafestas & R. Vichnevetsky, eds., vol. 2: Applications. Mathematical Research, 47. Akademie Verlag, Berlin, 364-370.
Podani J. (1989) New combinatorial clustering methods. Vegetatio, 81, 61-77.
Rendón E., Abundez I., Arizmendi A. and Quiroz E. M. (2011) Internal vs. external cluster validation indexes. International Journal of Computers and Communications, 5(1), 27-34.
Rubin J. (1967) Optimal classification into groups: an approach for solving the taxonomy problem. Journal of Theoretical Biology, 15, 103-144.
Vendramin L., Campello R. J. G. B. and Hruschka E. R. (2010) Relative Clustering Validity Criteria: A Comparative Overview. Wiley InterScience; doi: 10.1002/sam.10080.
Xu R. and Wunsch D. C. (2009) Clustering. John Wiley and Sons, Inc., Hoboken, NJ.