Axiomatic Construction of Hierarchical Clustering in Asymmetric Networks
This paper considers networks where relationships between nodes are represented by directed dissimilarities. The goal is to study methods for the determination of hierarchical clusters, i.e., a family of nested partitions indexed by a connectivity pa…
Authors: Gunnar Carlsson, Facundo Memoli, Alej
Gunnar Carlsson, Facundo Mémoli, Alejandro Ribeiro, and Santiago Segarra

Abstract: This paper considers networks where relationships between nodes are represented by directed dissimilarities. The goal is to study methods for the determination of hierarchical clusters, i.e., a family of nested partitions indexed by a connectivity parameter, induced by the given dissimilarity structures. Our construction of hierarchical clustering methods is based on defining admissible methods to be those methods that abide by the axioms of value (nodes in a network with two nodes are clustered together at the maximum of the two dissimilarities between them) and transformation (when dissimilarities are reduced, the network may become more clustered but not less). Several admissible methods are constructed and two particular methods, termed reciprocal and nonreciprocal clustering, are shown to provide upper and lower bounds in the space of admissible methods. Alternative clustering methodologies and axioms are further considered. Allowing the outcome of hierarchical clustering to be asymmetric, so that it matches the asymmetry of the original data, leads to the inception of quasi-clustering methods. The existence of a unique quasi-clustering method is shown. Allowing clustering in a two-node network to proceed at the minimum of the two dissimilarities generates an alternative axiomatic construction. There is a unique clustering method in this case too. The paper also develops algorithms for the computation of hierarchical clusters using matrix powers on a min-max dioid algebra and studies the stability of the methods proposed. We proved that most of the methods introduced in this paper are such that similar networks yield similar hierarchical clustering results.
Algorithms are exemplified through their application to networks describing internal migration within states of the United States (U.S.) and the interrelation between sectors of the U.S. economy.

Work in this paper is supported by NSF CCF-0952867, AFOSR MURI FA9550-10-1-0567, DARPA GRAPHS FA9550-12-1-0416, AFOSR FA9550-09-0-1-0531, AFOSR FA9550-09-1-0643, NSF DMS 0905823, and NSF DMS-0406992. G. Carlsson is with the Department of Mathematics, Stanford University. F. Mémoli is with the Department of Mathematics and the Department of Computer Science and Engineering, Ohio State University. A. Ribeiro and S. Segarra are with the Department of Electrical and Systems Engineering, University of Pennsylvania. Email: gunnar@math.stanford.edu, memoli@math.osu.edu, aribeiro@seas.upenn.edu, and ssegarra@seas.upenn.edu. Parts of the results in this paper appeared in [1]–[4].

I. INTRODUCTION

The problem of determining clusters in a data set admits different interpretations depending on whether the underlying data is metric, symmetric but not necessarily metric, or asymmetric. Of these three classes of problems, clustering of metric data is the most studied one in terms of both practice and theoretical foundations. In terms of practice there are literally hundreds of methods, techniques, and heuristics that can be applied to the determination of hierarchical and nonhierarchical clusters in finite metric spaces; see, e.g., [5]. Theoretical foundations of clustering methods, while not as well developed as their practical applications [6]–[8], have been evolving over the past decade [9]–[14]. Of particular relevance to our work is the case of hierarchical clustering where, instead of a single partition, we look for a family of partitions indexed by a resolution parameter; see, e.g., [15], [16, Ch. 4], and [17]. In this context, it has been shown in [13] that single linkage [16, Ch. 4] is the unique hierarchical clustering method that satisfies three reasonable axioms. These axioms require that the hierarchical clustering of a metric space with two points is the same metric space, that there be no non-singleton clusters at resolutions smaller than the smallest distance in the space, and that when distances shrink, the metric space may become more clustered but not less.

When we remove the condition that the data be metric, we move into the realm of clustering in networks, i.e., a set of nodes with pairwise and possibly directed dissimilarities represented by edge weights. For the undirected case, the knowledge of theoretical underpinnings is incipient but practice is well developed. Determining clusters in this undirected context is often termed community detection and is formulated in terms of finding cuts such that the edges between different groups have high dissimilarities (meaning that points in different groups are dissimilar from each other) and the edges within a group have small dissimilarities (meaning that points within the same cluster are similar to each other) [18]–[20]. An alternative approach for clustering nodes in graphs is the idea of spectral clustering [21]–[24]. When a graph contains several connected components its Laplacian matrix has multiple eigenvectors associated with the null eigenvalue and the nonzero elements of the corresponding eigenvectors identify the different connected components. The underlying idea of spectral clustering is that different communities should be identified by examining the eigenvectors associated with eigenvalues close to zero.

Further relaxing symmetry so that we can allow for asymmetric relationships between nodes [25] reduces the number of available methods that can deal with such data [26]–[34]. Examples of these methods are the adaptation of spectral clustering to asymmetric graphs by using a random walk perspective [32] and the use of weighted cuts of minimum aggregate cost [33].
In spite of these contributions, the rarity of clustering methods for asymmetric networks is expected because the interpretation of clusters as groups of nodes that are closer to each other than to the rest is difficult to generalize when nodes are close in one direction but far apart in the other. E.g., in the network in Fig. 1 nodes a and b are closest to each other in a clockwise sense but farthest apart in a counterclockwise sense, and c and d seem to be closest on average. Yet all nodes appear relatively close, as it is possible to loop around the network clockwise without encountering a dissimilarity larger than 3.

Although it seems difficult to articulate a general intuition for clustering of asymmetric networks, there are nevertheless some behaviors that we should demand from any reasonable clustering method. Following [11]–[14], the perspective taken in this paper is to impose these desired behaviors as axioms and proceed to characterize the space of methods that are admissible with respect to them. While different axiomatic constructions are discussed here, the general message is that surprisingly strong structure can be induced by seemingly weak axioms. E.g., by defining the result of clustering networks with two nodes and specifying the behavior of admissible methods when the given dataset shrinks, we find that two simple methods are uniformly minimal and maximal across nodes and networks among all those that are admissible (Section VI).

Fig. 1. Asymmetric network. Edges denote directed dissimilarities between nodes. Clustering intuition is precarious because there is not a clear proximity notion between nodes. E.g., the pair a, b has the smallest dissimilarity in one direction whereas the pair c, d has the smallest average dissimilarity in both directions. It is not clear which of the two pairs is less dissimilar.
Besides axiomatic constructions, this paper also studies stability with respect to perturbations of the original data and establishes that most methods are stable (Section XI). We also introduce computationally tractable algorithms to determine the hierarchical clusters that result from applying the different methods that we propose (Section VIII). These algorithms are applied to cluster the network of internal migration between states of the United States (U.S.) and the network of interactions between sectors of the U.S. economy (Section XII). We also introduce the concept of hierarchical quasi-clustering that generalizes the idea of hierarchical clustering to permit retaining asymmetric influences between the clusters (Section IX). The following sections present a more detailed preview of the results outlined above.

A. Fundamental axioms and admissible methods

Recall that hierarchical clustering methods produce a resolution-dependent clustering of a given network. Throughout this paper we introduce various axioms and properties that represent several desirable features of hierarchical clustering methods. Among these, the axioms of value and transformation underlie most of the results presented in this paper. These axioms are stated formally in Sections III and III-A but they correspond to the following intuitions:

(A1) Axiom of Value. For a network with two nodes, the nodes are first clustered together at a resolution level equal to the maximum of the two intervening dissimilarities.

(A2) Axiom of Transformation. If we consider a domain network and map it into a target network in a manner such that no pairwise dissimilarity is increased by the mapping, then the resolution level at which two nodes in the target network become part of the same cluster is not larger than the level at which they were clustered together in the original domain network.
The intuition supporting the Axiom of Transformation is that if some nodes become closer to each other, it may be that new clusters arise, but no cluster can disappear. The intuition supporting the Axiom of Value is that the two nodes in the two-node network form a single cluster at resolutions that allow them to influence each other directly, i.e., resolutions larger than the dissimilarities between them. A hierarchical clustering method satisfying axioms (A1) and (A2) is said to be admissible.

Our first theoretical study is the relationship between clustering and mutual influence in networks of arbitrary size (Section IV). In particular, we show that the outcome of any admissible hierarchical clustering method is such that a necessary condition for two nodes to cluster together is the existence of chains that allow for direct or indirect influence between the nodes. We can interpret this result as showing that the requirement of direct influence in the two-node network in the Axiom of Value (A1) induces a requirement for, possibly indirect, influence in general networks. This result is termed the Property of Influence and plays an instrumental role in the theoretical developments presented throughout the paper.

Two hierarchical clustering methods that abide by axioms (A1) and (A2), and that therefore satisfy the Property of Influence, are then derived (Section V). The first method, reciprocal clustering, requires clusters to form through edges exhibiting low dissimilarity in both directions whereas the second method, nonreciprocal clustering, allows clusters to form through cycles of small dissimilarity. More specifically, reciprocal clustering defines the cost of an edge as the maximum of the two directed dissimilarities. Nodes are clustered together at a given resolution if there exists a chain linking them such that all links in the chain have a cost smaller than said resolution.
In nonreciprocal clustering we consider directed chains and define the cost of a chain as the maximum dissimilarity encountered when traversing it from beginning to end. Nodes are clustered together at a given resolution if it is possible to find directed chains in both directions whose edge costs do not exceed the given resolution. Observe that both of these methods rely on the determination of chains of minimax cost linking any pair of nodes. This fact is instrumental in the derivation of algorithms for the computation of output dendrograms as we discuss in Section I-C.

A fundamental result regarding admissible methods is the proof that any clustering method that satisfies axioms (A1) and (A2) lies between reciprocal and nonreciprocal clustering in a well-defined sense (Section VI). Specifically, any clustering method that satisfies axioms (A1) and (A2) forms clusters at resolutions larger than the resolutions at which they are formed with nonreciprocal clustering, and smaller than the resolutions at which they are formed with reciprocal clustering. The clustering resolutions vary from method to method, but they are always contained within the specified bounds. When restricted to symmetric networks, reciprocal and nonreciprocal clustering yield equivalent outputs, which coincide with the output of single linkage (Section VI-A). This observation is consistent with the existence and uniqueness result in [13] since axioms (A1) and (A2) are reduced to two of the axioms considered in [13] when we restrict attention to metric data. The derivations in this paper show that the existence and uniqueness result in [13] is true for all symmetric, not necessarily metric, datasets and that a third axiom considered there is redundant because it is implied by the other two.

We then unveil some of the clustering methods that lie between reciprocal and nonreciprocal clustering and study their properties (Section VII).
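The informal descriptions of reciprocal and nonreciprocal clustering given above can be illustrated with a minimal Python sketch. It is not the paper's algorithm (the formal definitions appear in Section V and the algorithms in Section VIII); it assumes that a network is stored as a square list of lists `A` with zero diagonal, that "the resolution at which two nodes first cluster" is the smallest resolution not exceeded by the relevant minimax chain cost, and the function names and the three-node example network are hypothetical.

```python
from itertools import product


def minimax_chain_costs(A):
    """All-pairs directed minimax chain costs (Floyd-Warshall-style update)."""
    n = len(A)
    u = [row[:] for row in A]
    # k must be the outermost index; product varies its first coordinate slowest.
    for k, i, j in product(range(n), repeat=3):
        u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def reciprocal(A):
    """Merging resolutions when each edge costs the max of its two directions."""
    n = len(A)
    sym = [[max(A[i][j], A[j][i]) for j in range(n)] for i in range(n)]
    return minimax_chain_costs(sym)


def nonreciprocal(A):
    """Merging resolutions when directed chains are required in both directions."""
    n = len(A)
    u = minimax_chain_costs(A)
    return [[max(u[i][j], u[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical three-node network: cheap in one direction around the cycle
# (cost 1) and expensive in the opposite direction (cost 5).
cycle = [[0, 1, 5],
         [5, 0, 1],
         [1, 5, 0]]
r, nr = reciprocal(cycle), nonreciprocal(cycle)
```

On this example the nonreciprocal sketch merges every pair at resolution 1 (the cycle can be traversed cheaply), whereas the reciprocal sketch waits until resolution 5, so the nonreciprocal values never exceed the reciprocal ones. On a two-node network both functions return the maximum of the two dissimilarities, in line with the Axiom of Value (A1).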
Three families of intermediate clustering methods are introduced. The grafting methods consist of attaching the clustering output structures of the reciprocal and nonreciprocal methods in a way such that admissibility is guaranteed (Section VII-A). We further present a construction of methods that can be regarded as a convex combination in the space of clustering methods. This operation is shown to preserve admissibility, therefore giving rise to a second family of admissible methods (Section VII-B). A third family of admissible clustering methods is defined in the form of semi-reciprocal methods that allow the formation of cyclic influences in a more restrictive sense than nonreciprocal clustering but more permissive than reciprocal clustering (Section VII-C).

In some applications the requirement for bidirectional influence in the Axiom of Value is not justified as unidirectional influence suffices to establish proximity. This alternative value statement leads to the study of alternative axiomatic constructions and their corresponding admissible hierarchical clustering methods (Section X). We first propose an Alternative Axiom of Value in which clusters in two-node networks are formed at the minimum of the two dissimilarities:

(A1'') Alternative Axiom of Value. For a network with two nodes, the nodes are clustered together at the minimum of the two dissimilarities between them.

Under this axiomatic framework we define unilateral clustering as a method in which influence propagates through chains of nodes that are close in at least one direction (Section X-A). Contrary to the case of admissibility with respect to (A1)-(A2) in which a range of methods exist, unilateral clustering is the unique method that is admissible with respect to (A1'') and (A2). A second alternative is to take an agnostic position and allow nodes in two-node networks to cluster at any resolution between the minimum and the maximum dissimilarity between them.
All methods considered in the paper satisfy this agnostic axiom and, not surprisingly, outcomes of methods that satisfy this agnostic axiom are uniformly bounded between unilateral and reciprocal clustering (Section X-B).

B. Hierarchical quasi-clustering

Dendrograms are symmetric structures used to represent the outputs of hierarchical clustering methods. Having a symmetric output is, perhaps, a mismatched requirement for the processing of asymmetric data. This mismatch motivates the development of asymmetric structures that generalize the concept of dendrogram (Section IX).

Start by observing that a hierarchical clustering method is a map from the space of networks to the space of dendrograms, that a dendrogram is a collection of nested partitions indexed by a resolution parameter, and that each partition is induced by an equivalence relation, i.e., a relation satisfying the reflexivity, symmetry, and transitivity properties. Hence, the symmetry in hierarchical clustering derives from the symmetry property of equivalence relations, which we remove in order to construct the asymmetric equivalent of hierarchical clustering. To do so we define a quasi-equivalence relation as one that is reflexive and transitive but not necessarily symmetric and define a quasi-partition as the structure induced by a quasi-equivalence relation; these structures are also known as partial orders [35]. Just like regular partitions, quasi-partitions contain disjoint blocks of nodes but also include an influence structure between the blocks derived from the asymmetry in the original network. A quasi-dendrogram is further defined as a nested collection of quasi-partitions, and a hierarchical quasi-clustering method is regarded as a map from the space of networks to the space of quasi-dendrograms (Section IX-A).
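To make the notion of a quasi-partition concrete, the following Python sketch builds one from a reflexive and transitive relation (what standard order theory calls a preorder). It is only an illustration under stated assumptions: the function name `quasi_partition`, the predicate argument `related`, and the three-node example are hypothetical, the relation is assumed (not checked) to be reflexive and transitive, and an edge `(i, j)` merely records that block `i` is related to block `j`; the direction that the paper interprets as influence is fixed in Section IX and is not reproduced here.

```python
def quasi_partition(nodes, related):
    """Blocks and block-to-block edges of a reflexive, transitive relation.

    related(x, y) -> bool is assumed reflexive and transitive (not verified).
    """
    blocks = []
    for x in nodes:
        for block in blocks:
            rep = block[0]
            # Mutually related nodes share a block; by transitivity it is
            # enough to compare against one representative of the block.
            if related(x, rep) and related(rep, x):
                block.append(x)
                break
        else:
            blocks.append([x])
    # One-directional relations between distinct blocks.
    edges = {
        (i, j)
        for i, bi in enumerate(blocks)
        for j, bj in enumerate(blocks)
        if i != j and related(bi[0], bj[0])
    }
    return blocks, edges


# Hypothetical relation on {a, b, c}: a and b are mutually related,
# and both are related to c, but c is related to neither.
pairs = {("a", "a"), ("b", "b"), ("c", "c"),
         ("a", "b"), ("b", "a"), ("a", "c"), ("b", "c")}
blocks, edges = quasi_partition(["a", "b", "c"], lambda x, y: (x, y) in pairs)
```

Here `blocks` groups a and b together and leaves c alone, while `edges` keeps the one-directional relation from the first block to the second. When the relation is symmetric, `edges` is empty and the result is an ordinary partition.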
As in the case of (regular) hierarchical clustering we proceed to study admissibility with respect to asymmetric versions of the axioms of value and transformation (Section IX-C). We show that there is a unique quasi-clustering method admissible with respect to these axioms and that this method is an asymmetric version of the single linkage clustering method (Section IX-D). The analysis in this section hinges upon an equivalence between quasi-dendrograms and quasi-ultrametrics (Section IX-B) that generalizes the known equivalence between dendrograms and ultrametrics [36]. If we further recall that, for symmetric networks, single linkage is the only hierarchical clustering method that is admissible with respect to the axioms of value and transformation [cf. [13] and Section VI-A], we conclude that there is a strong parallelism between symmetric networks, equivalence relations, partitions, dendrograms, ultrametrics, and single linkage on the one hand and asymmetric networks, quasi-equivalence relations, quasi-partitions, quasi-dendrograms, quasi-ultrametrics, and directed single linkage on the other. In the same way that dendrograms are particular cases of quasi-dendrograms, every element in the former list is a particular case of the corresponding element in the latter. Moreover, every result relating two elements of the former list can be generalized as relating the two corresponding, more general elements in the latter.

C. Algorithms and Stability

Besides the characterization of methods that are admissible with respect to different sets of axioms we also develop algorithms to compute the dendrograms associated with the methods introduced throughout the paper and study their stability with respect to perturbations.

The determination of algorithms for all of the methods introduced is given by the computation of matrix powers in a min-max dioid algebra [37].
In this dioid algebra we operate in the field of positive reals and define the addition operation between two scalars to be their minimum and the product operation of two scalars to be their maximum (Section VIII). From this definition it follows that the $(i, j)$-th entry of the $n$-th dioid power of a matrix of network dissimilarities represents the minimax cost of a chain linking node $i$ to node $j$ with at most $n$ edges. As we have already mentioned, reciprocal and nonreciprocal clustering require the determination of chains of minimax cost. Similarly, other clustering methods introduced in this paper can be interpreted as minimax chain costs of a previously modified matrix of dissimilarities, which can therefore be framed in terms of dioid matrix powers as well. E.g., in unilateral clustering we define the cost of an edge as the minimum of the dissimilarities in both directions and then search for minimax chain costs, whereas in semi-reciprocal clustering we limit the length of allowable chains.

In order to study the stability of clustering methods with respect to perturbations of a network, following [13], [38] we adopt and adapt the Gromov-Hausdorff distance between finite metric spaces [39, Chapter 7.3] to furnish a notion of distance between asymmetric networks (Section XI-A). This distance allows us to compare any two networks, even when they have different node sets. Since dendrograms are equivalent to finite ultrametric spaces, which in turn are particular cases of asymmetric networks, we can use the Gromov-Hausdorff distance to quantify the difference between two dendrograms obtained when clustering two different networks. We then say that a clustering method is stable if the clustering outputs of similar networks are close to each other. More precisely, we say that a clustering method is stable if, for any pair of networks, the distance between the output dendrograms can be bounded by the distance between the original networks.
In particular, stability of a method guarantees robustness to the presence of noise in the dissimilarity values. Although not every method considered in this paper is stable, we show stability for most of the methods including the reciprocal, nonreciprocal, semi-reciprocal, and unilateral clustering methods (Section XI-B).

D. Applications

Clustering methods are exemplified through their application to two real-world networks: the network of internal migration between states of the U.S. for the year 2011 and the network of interactions between economic sectors of the U.S. economy for the year 2011 (Section XII). The purpose of these examples is to understand which information can be extracted by performing hierarchical clustering and quasi-clustering analyses based on the different methods proposed. Analyzing migration clusters provides information on population mixing (Section XII-A). Analyzing interactions between economic sectors unveils their relative importance and their differing levels of coupled interactions (Section XII-B).

The migration network example illustrates the different clustering outputs obtained when we consider the Axiom of Value (A1) or the Alternative Axiom of Value (A1'') as conditions for admissibility. Unilateral clustering, the unique method compatible with (A1''), forms clusters around influential states like California and Texas by merging each of these states with other smaller ones around them (Section XII-A4). On the other hand, methods compatible with (A1), like reciprocal clustering, tend to first merge states with balanced bidirectional influence such as two different populous states or states sharing urban areas. In this way, reciprocal clustering sees California first merging with Texas for being two very influential states and Washington merging with Oregon for sharing the urban area of Portland (Section XII-A1).
Moreover, the similarity between the reciprocal and nonreciprocal outcomes (Section XII-A2) indicates that no other clustering method satisfying axiom (A1) would reveal new information; thus, intermediate clustering methods are not applied. Clustering methods provide information about grouping but obscure information about influence. To study the latter we apply the directed single linkage quasi-clustering method (Section XII-A5). Analysis of the output quasi-dendrograms shows, e.g., the dominant roles of California and Massachusetts in the population influxes into the West Coast and New England, respectively.

The network of interactions between sectors of the U.S. economy records how much of a sector's output is used as input to another sector of the economy. For this network, reciprocal and nonreciprocal clustering output essentially different dendrograms, indicating the ubiquity of influential cycles between sectors. Reciprocal clustering first merges sectors of bidirectional influence such as professional services with administrative services and the farming sector with the food and beverage sector (Section XII-B1). Nonreciprocal clustering, on the other hand, captures cycles of influence such as the one between oil and gas extraction, petroleum and coal products, and the construction sector (Section XII-B2). However, nonreciprocal clustering propagates influence through arbitrarily large cycles, a feature which might be undesirable in practice. The observed difference between the reciprocal and the nonreciprocal dendrograms motivates the application of a clustering method with intermediate behavior such as the semi-reciprocal clustering method with parameter 3 (Section XII-B3). Its cyclic propagation of influence is closer to the real behavior of sectors within the economy and, thus, we obtain a more reasonable clustering output.
Finally, the application of the directed single linkage quasi-clustering method reveals the dominant influence of energy, manufacturing, and financial and professional services over the rest of the economy (Section XII-B5).

II. PRELIMINARIES

We define a network $N_X$ to be a pair $(X, A_X)$ where $X$ is a finite set of points or nodes and $A_X : X \times X \to \mathbb{R}_+$ is a dissimilarity function. The dissimilarity $A_X(x, x')$ between nodes $x \in X$ and $x' \in X$ is assumed to be nonnegative for all pairs $(x, x')$ and 0 if and only if $x = x'$. We do not, however, require $A_X$ to be a metric on the finite space $X$: dissimilarity functions $A_X$ need not satisfy the triangle inequality and, more consequential for the problem considered here, they may be asymmetric in that it is possible to have $A_X(x, x') \neq A_X(x', x)$ for some $x \neq x'$. In some discussions it is convenient to introduce a labeling of the elements in $X$, $X = \{x_1, \ldots, x_n\}$, and reinterpret the dissimilarity function $A_X$ as the possibly asymmetric matrix $A_X \in \mathbb{R}_+^{n \times n}$ with $(A_X)_{i,j} = A_X(x_i, x_j)$ for all $i, j \in \{1, \ldots, n\}$. The diagonal elements $(A_X)_{i,i} = A_X(x_i, x_i)$ are zero. As it does not lead to confusion we use $A_X$ to denote both the dissimilarity function and its matrix representation. We further define $\mathcal{N}$ as the set of all networks $N_X$. Networks in $\mathcal{N}$ can have different node sets $X$ as well as different dissimilarity functions $A_X$.

An example network is shown in Fig. 1. The set of nodes is $X = \{a, b, c, d\}$ with dissimilarities $A_X$ represented by a weighted directed graph. The dissimilarity from, e.g., $a$ to $b$ is $A_X(a, b) = 1$, which is different from the dissimilarity $A_X(b, a) = 7$ from $b$ to $a$. The smallest nontrivial networks contain two nodes $p$ and $q$ and two dissimilarities $\alpha$ and $\beta$ as depicted in Fig. 3.
The following special networks appear often throughout our paper: consider the dissimilarity function $A_{p,q}$ with $A_{p,q}(p, q) = \alpha$ and $A_{p,q}(q, p) = \beta$ for some $\alpha, \beta > 0$ and define the two-node network $\vec{\Delta}_2(\alpha, \beta)$ with parameters $\alpha$ and $\beta$ as

$$\vec{\Delta}_2(\alpha, \beta) := (\{p, q\}, A_{p,q}). \tag{1}$$

By a clustering of the set $X$ we always mean a partition $P_X$ of $X$; i.e., a collection of sets $P_X = \{B_1, \ldots, B_J\}$ which are pairwise disjoint, $B_i \cap B_j = \emptyset$ for $i \neq j$, and are required to cover $X$, $\cup_{i=1}^{J} B_i = X$. The sets $B_1, B_2, \ldots, B_J$ are called the blocks or clusters of $P_X$. We define the power set $\mathcal{P}(X)$ of $X$ as the set containing every subset of $X$, thus $B_i \in \mathcal{P}(X)$ for all $i$. An equivalence relation $\sim$ on $X$ is a binary relation such that for all $x, x', x'' \in X$ we have that (1) $x \sim x$, (2) $x \sim x'$ if and only if $x' \sim x$, and (3) $x \sim x'$ and $x' \sim x''$ imply $x \sim x''$. A partition $P_X = \{B_1, \ldots, B_J\}$ of $X$ always induces and is induced by an equivalence relation $\sim_{P_X}$ on $X$ where for all $x, x' \in X$ we have that $x \sim_{P_X} x'$ if and only if $x$ and $x'$ belong to the same block $B_i$ for some $i$.

In this paper we focus on hierarchical clustering methods. The output of hierarchical clustering methods is not a single partition $P_X$ but a nested collection $D_X$ of partitions $D_X(\delta)$ of $X$ indexed by a resolution parameter $\delta \geq 0$. Consistent with our previous notation, for a given $D_X$, we say that two nodes $x$ and $x'$ are equivalent at resolution $\delta \geq 0$ and write $x \sim_{D_X(\delta)} x'$ if and only if nodes $x$ and $x'$ are in the same block of $D_X(\delta)$. The nested collection $D_X$ is termed a dendrogram and is required to satisfy the following properties (cf. [13]):

(D1) Boundary conditions. For $\delta = 0$ the partition $D_X(0)$ clusters each $x \in X$ into a separate singleton and for some $\delta_0$ sufficiently large $D_X(\delta_0)$ clusters all elements of $X$ into a single set,

$$D_X(0) = \big\{ \{x\},\ x \in X \big\}, \qquad D_X(\delta_0) = \big\{ X \big\} \quad \text{for some } \delta_0 > 0. \tag{2}$$

(D2) Hierarchy. As $\delta$ increases clusters can be combined but not separated. I.e., for any $\delta_1 < \delta_2$, any pair of points $x, x'$ for which $x \sim_{D_X(\delta_1)} x'$ must satisfy $x \sim_{D_X(\delta_2)} x'$.

(D3) Right continuity. For all $\delta \geq 0$, there exists $\epsilon > 0$ such that $D_X(\delta) = D_X(\delta')$ for all $\delta' \in [\delta, \delta + \epsilon]$.

The second boundary condition in (2) together with (D2) implies that we must have $D_X(\delta) = \{X\}$ for all $\delta \geq \delta_0$. We denote by $[x]_\delta$ the equivalence class to which the node $x \in X$ belongs at resolution $\delta$, i.e., $[x]_\delta := \{x' \in X \mid x \sim_{D_X(\delta)} x'\}$. From requirement (D1) we must have that $[x]_0 = \{x\}$ and $[x]_{\delta_0} = X$ for all $x \in X$.

The interpretation of a dendrogram is that of a structure which yields different clusterings at different resolutions. At resolution $\delta = 0$ each point is in a cluster of its own. As the resolution parameter $\delta$ increases, nodes start forming clusters. According to condition (D2), nodes become ever more clustered since once they join together in a cluster, they stay together in the same cluster for all larger resolutions. Eventually, the resolutions become coarse enough so that all nodes become members of the same cluster and stay that way as $\delta$ keeps increasing. A dendrogram can be represented as a rooted tree; see Fig. 2. Its root represents $D_X(\delta_0)$ with all nodes clustered together and the leaves represent $D_X(0)$ with each node separately clustered. Forks in the tree happen at resolutions $\delta$ at which the partitions become finer, or coarser if we move from leaves to root.

Denoting by $\mathcal{D}$ the space of all dendrograms we define a hierarchical clustering method as a function

$$\mathcal{H} : \mathcal{N} \to \mathcal{D}, \tag{3}$$

from the space of networks $\mathcal{N}$ to the space of dendrograms $\mathcal{D}$ such that the underlying space $X$ is preserved. For the network $N_X = (X, A_X)$ we denote by $D_X = \mathcal{H}(X, A_X)$ the output of clustering method $\mathcal{H}$.
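As an illustration of properties (D1)-(D3), the Python sketch below stores a dendrogram as a sorted list of breakpoints and checks the boundary and hierarchy conditions. The representation and the names `partition_at`, `refines`, `is_partition`, and `is_dendrogram` are assumptions made for this example, not constructs from the paper. Right continuity (D3) is not tested explicitly; it holds by construction because each stored partition is taken to remain valid from its breakpoint up to, but excluding, the next one. The example values are the resolutions and partitions that the text reports for Fig. 2.

```python
from bisect import bisect_right


def partition_at(breaks, delta):
    """D_X(delta) for breakpoints stored as sorted (resolution, partition) pairs."""
    resolutions = [r for r, _ in breaks]
    # Largest breakpoint that does not exceed delta (assumes delta >= 0).
    return breaks[bisect_right(resolutions, delta) - 1][1]


def refines(fine, coarse):
    """True if every block of `fine` is contained in some block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)


def is_partition(blocks, X):
    """Blocks are nonempty, pairwise disjoint, and cover X."""
    return (all(blocks) and sum(len(b) for b in blocks) == len(X)
            and frozenset().union(*blocks) == frozenset(X))


def is_dendrogram(breaks, X):
    parts = [p for _, p in breaks]
    res = [r for r, _ in breaks]
    # (D1): singletons at resolution 0 and a single block at the last breakpoint.
    d1 = (res[0] == 0 and parts[0] == {frozenset([x]) for x in X}
          and parts[-1] == {frozenset(X)})
    # (D2): as the resolution grows, blocks may merge but never split.
    d2 = all(refines(p, q) for p, q in zip(parts, parts[1:]))
    ordered = all(r < s for r, s in zip(res, res[1:]))
    return d1 and d2 and ordered and all(is_partition(p, X) for p in parts)


X = "abcd"
fig2 = [
    (0, {frozenset("a"), frozenset("b"), frozenset("c"), frozenset("d")}),
    (2, {frozenset("ab"), frozenset("c"), frozenset("d")}),
    (4, {frozenset("ab"), frozenset("cd")}),
    (5, {frozenset("abcd")}),
]
```

With this representation, `partition_at(fig2, 3)` returns the partition with blocks {a, b}, {c}, {d}, and any resolution of 5 or more returns the single block {a, b, c, d}.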
In the description of hierarchical clustering methods H in general, and in those deriv ed on this paper in particular , the concepts of chain , chain cost , and minimum chain cost are important. Given a netw ork ( X, A X ) and x, x 0 ∈ X , a chain from x to x 0 is any order ed sequence of nodes in X , [ x = x 0 , x 1 , . . . , x l − 1 , x l = x 0 ] , (4) which starts at x and finishes at x 0 . W e will frequently use the notation C ( x, x 0 ) to denote one such chain. W e say that C ( x, x 0 ) links or connects x to x 0 . Given two chains C ( x, x 0 ) = [ x = x 0 , x 1 , ..., x l = x 0 ] and C ( x 0 , x 00 ) = [ x 0 = x 0 0 , x 0 1 , ..., x 0 l 0 = x 00 ] such that the end point x 0 of the first one coincides with the starting point of the second one we define the concatenated chain C ( x, x 0 ) ] C ( x 0 , x 00 ) as C ( x, x 0 ) ] C ( x 0 , x 00 ) := [ x = x 0 , . . . , x l = x 0 = x 0 0 , . . . , x 0 l 0 = x 00 ] . (5) It follows from (5) that the concatenation operation ] is associa- tiv e in that C ( x, x 0 ) ] C ( x 0 , x 00 ) ] C ( x 00 , x 000 ) = C ( x, x 0 ) ] [ C ( x 0 , x 00 ) ] C ( x 00 , x 000 )] . Observ e that the chain C ( x, x 0 ) = [ x = x 0 , x 1 , . . . , x l − 1 , x l = x 0 ] and its rev erse [ x 0 = x l , x l − 1 , . . . , x 1 , x 0 = x ] are different entities even if the inter- mediate hops are the same. The links of a chain are the edges connecting its consecutiv e nodes in the direction imposed by the chain. W e define the cost of a gi ven chain C ( x, x 0 ) = [ x = x 0 , . . . , x l = x 0 ] as max i | x i ∈ C ( x,x 0 ) A X ( x i , x i +1 ) , (6) i.e., the maximum dissimilarity encountered when trav ersing its links in order . The directed minimum chain cost ˜ u ∗ X ( x, x 0 ) be- tween x and x 0 is then defined as the minimum cost among all the chains connecting x to x 0 , ˜ u ∗ X ( x, x 0 ) := min C ( x,x 0 ) max i | x i ∈ C ( x,x 0 ) A X ( x i , x i +1 ) . 
(7)

In asymmetric networks the minimum chain costs $\tilde{u}^*_X(x, x')$ and $\tilde{u}^*_X(x', x)$ are different in general, but they are equal on symmetric networks. In this latter case, the costs $\tilde{u}^*_X(x, x') = \tilde{u}^*_X(x', x)$ are instrumental in the definition of the single linkage dendrogram [13]. Indeed, for resolution $\delta$, single linkage makes $x$ and $x'$ part of the same cluster if and only if they can be linked through a chain of cost not exceeding $\delta$. Formally, the equivalence classes at resolution $\delta$ in the single linkage dendrogram $SL_X$ over a symmetric network $(X, A_X)$ are defined by

$$x \sim_{SL_X(\delta)} x' \iff \tilde{u}^*_X(x, x') = \tilde{u}^*_X(x', x) \leq \delta.$$ (8)

Fig. 2 shows a finite metric space along with the corresponding single linkage dendrogram. For resolutions $\delta < 2$ the dendrogram partitions are $D_X(\delta) = \{\{a\}, \{b\}, \{c\}, \{d\}\}$. For resolutions $2 \leq \delta < 4$ nodes $a$ and $b$ get clustered together to yield $D_X(\delta) = \{\{a, b\}, \{c\}, \{d\}\}$. As we keep increasing the parameter $\delta$, $c$ and $d$ also get clustered together, yielding $D_X(\delta) = \{\{a, b\}, \{c, d\}\}$ for resolutions $4 \leq \delta < 5$. For $5 \leq \delta$ all nodes are part of a single cluster, $D_X(\delta) = \{\{a, b, c, d\}\}$, because we can build chains between any pair of nodes incurring maximum cost smaller than or equal to $\delta$.

We further define a loop as a chain of the form $C(x, x)$ for some $x \in X$ such that $C(x, x)$ contains at least one node other than $x$. Since a loop is a particular case of a chain, the cost of a loop is given by (6). Furthermore, consistently with (7), we define the minimum loop cost $\text{mlc}(X, A_X)$ of a network $(X, A_X)$ as the minimum across all possible loops of each individual loop cost,

$$\text{mlc}(X, A_X) := \min_{x} \ \min_{C(x, x)} \ \max_{i \mid x_i \in C(x, x)} A_X(x_i, x_{i+1}),$$ (9)

where, we recall, $C(x, x)$ contains at least one node different from $x$.
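The quantities in (7) and (9) can be evaluated numerically. The following is a minimal sketch rather than the algorithm developed later in the paper: it assumes the network is stored as a dense list-of-lists matrix `A` with zero diagonal and uses a Floyd-Warshall-style min-max recursion. The function names and the example matrix `A_fig2` (chosen to be consistent with the partitions described for Fig. 2) are illustrative assumptions.

```python
def min_chain_costs(A):
    """Directed minimum chain costs of eq. (7) for a dissimilarity matrix A.

    u[i][j] is the smallest achievable value of the largest link weight
    over all chains from node i to node j.
    """
    n = len(A)
    u = [row[:] for row in A]  # chains with a single link
    for k in range(n):         # allow node k as an intermediate hop
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def min_loop_cost(A):
    """Minimum loop cost of eq. (9).

    A loop through x that visits some x' != x splits into a chain x -> x'
    and a chain x' -> x, so its cost is the larger of the two chain costs.
    """
    u = min_chain_costs(A)
    n = len(A)
    return min(max(u[i][j], u[j][i])
               for i in range(n) for j in range(n) if i != j)


# Hypothetical symmetric network on nodes a, b, c, d (indices 0..3):
# A(a,b) = 2, A(c,d) = 4, every other dissimilarity equal to 5.
A_fig2 = [
    [0, 2, 5, 5],
    [2, 0, 5, 5],
    [5, 5, 0, 4],
    [5, 5, 4, 0],
]

u_fig2 = min_chain_costs(A_fig2)
print(u_fig2[0][1], u_fig2[2][3], u_fig2[0][2])
print(min_loop_cost(A_fig2))
```

On a symmetric network, `u[i][j]` is the resolution at which single linkage first merges nodes `i` and `j`, in agreement with (8).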
[Fig. 2. Single linkage dendrogram for a symmetric network. Dendrograms are trees representing the outcome of hierarchical clustering algorithms. The single linkage dendrogram as defined by (8) for the network on the left (nodes $a, b, c, d$ with dissimilarities 2, 4, and 5) is shown on the right. For resolutions $\delta < 2$ each node is in a separate partition, for $2 \leq \delta < 4$ nodes $a$ and $b$ form the cluster $\{a, b\}$, for $4 \leq \delta < 5$ we add the cluster $\{c, d\}$, and for $5 \leq \delta$ all nodes are part of a single cluster.]

Another relevant property of a network $(X, A_X)$ is the separation of the network $\text{sep}(X, A_X)$, which we define as its minimum positive dissimilarity,

$$\text{sep}(X, A_X) := \min_{x \neq x'} A_X(x, x').$$ (10)

Notice that from (9) and (10) we must have

$$\text{sep}(X, A_X) \leq \text{mlc}(X, A_X).$$ (11)

Further observe that in the particular case of networks with symmetric dissimilarities the two quantities coincide, i.e., $\text{sep}(X, A_X) = \text{mlc}(X, A_X)$ when $A_X(x, x') = A_X(x', x)$ for all $x, x' \in X$. For example, the network in Fig. 1 has separation equal to 1 and minimum loop cost equal to 3.

When one restricts attention to networks $(X, A_X)$ having dissimilarities $A_X$ that conform to the definition of a finite metric space, i.e., dissimilarities $A_X$ are symmetric and satisfy the triangle inequality, it has been shown [13] that single linkage is the unique hierarchical clustering method satisfying axioms (A1)-(A2) in Section III plus a third axiom stating that clusters cannot form at resolutions smaller than the minimum distance between different points of the space. In the case of asymmetric networks the space of admissible methods is richer, as we demonstrate throughout this paper.

III. AXIOMS OF VALUE AND TRANSFORMATION

To study hierarchical clustering methods on asymmetric networks we start from intuitive notions that we translate into the axioms of value and transformation discussed in this section.

The Axiom of Value is obtained from considering the two-node network $\vec{\Delta}_2(\alpha, \beta)$ defined in (1) and depicted in Fig. 3. We say that node $x$ is able to influence node $x'$ at resolution $\delta$ if the dissimilarity from $x$ to $x'$ is not greater than $\delta$. In two-node networks, our intuition dictates that a cluster is formed if nodes $p$ and $q$ are able to influence each other. This implies that the output dendrogram should be such that $p$ and $q$ are part of the same cluster at resolutions $\delta \geq \max(\alpha, \beta)$ that allow direct mutual influence. Conversely, we expect nodes $p$ and $q$ to be in separate clusters at resolutions $0 \leq \delta < \max(\alpha, \beta)$ that do not allow for mutual influence. At resolutions $\delta < \min(\alpha, \beta)$ there is no influence between the nodes, and at resolutions $\min(\alpha, \beta) \leq \delta < \max(\alpha, \beta)$ there is unilateral influence from one node over the other. In either of the latter two cases the nodes are different in nature. If we think of dissimilarities as, e.g., trust, it means one node is trustworthy whereas the other is not. If we think of the network as a Markov chain, at resolutions $0 \leq \delta < \max(\alpha, \beta)$ the states are different singleton equivalence classes: one of the states would be transient and the other one absorbing.

[Fig. 3. Axiom of Value. Nodes in a two-node network cluster at the minimum resolution at which both can influence each other. The figure shows the network $\vec{\Delta}_2(\alpha, \beta)$ on nodes $p$ and $q$ and the dendrogram $D_{p,q}$, which merges $p$ and $q$ at $\delta = \max(\alpha, \beta)$.]

Given that, according to (3), a hierarchical clustering method is a map $\mathcal{H}$ from networks to dendrograms, we formalize this intuition as the following requirement on the set of admissible maps:

(A1) Axiom of Value.
The dendrogram $D_{p,q} = \mathcal{H}(\vec{\Delta}_2(\alpha, \beta))$ produced by $\mathcal{H}$ applied to the network $\vec{\Delta}_2(\alpha, \beta)$ is such that $D_{p,q}(\delta) = \{\{p\}, \{q\}\}$ for $0 \leq \delta < \max(\alpha, \beta)$ and $D_{p,q}(\delta) = \{\{p, q\}\}$ otherwise; see Fig. 3.

Clustering nodes $p$ and $q$ together at resolution $\delta = \max(\alpha, \beta)$ is somewhat arbitrary, as any monotone increasing function of $\max(\alpha, \beta)$ would be admissible. As a value claim, however, it means that the clustering resolution parameter $\delta$ is expressed in the same units as the elements of the dissimilarity function.

The second restriction on the space of allowable methods $\mathcal{H}$ formalizes our expectations for the behavior of $\mathcal{H}$ when confronted with a transformation of the underlying space $X$ and the dissimilarity function $A_X$; see Fig. 4. Consider networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$ and denote by $D_X = \mathcal{H}(X, A_X)$ and $D_Y = \mathcal{H}(Y, A_Y)$ the corresponding dendrogram outputs. If we map all the nodes of the network $N_X = (X, A_X)$ into nodes of the network $N_Y = (Y, A_Y)$ in such a way that no pairwise dissimilarity is increased, we expect the latter network to be more clustered than the former at any given resolution. Intuitively, nodes in $N_Y$ are more capable of influencing each other; thus, clusters should be formed more easily. In terms of the respective dendrograms we expect that nodes co-clustered at resolution $\delta$ in $D_X$ are mapped to nodes that are also co-clustered at this resolution in $D_Y$. In order to formalize this notion, we introduce the following concept: given two networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$, a map $\phi : X \to Y$ is called a dissimilarity-reducing map if it holds that $A_X(x, x') \geq A_Y(\phi(x), \phi(x'))$ for all $x, x' \in X$.

[Fig. 4. Axiom of Transformation. If the network $N_X$ can be mapped to the network $N_Y$ using a dissimilarity-reducing map $\phi$, then for every resolution $\delta$ nodes clustered together in $D_X(\delta)$ must also be clustered in $D_Y(\delta)$. E.g., since points $x_1$ and $x_2$ are clustered together at resolution $\delta'$, their images through $\phi$, i.e., $y_1 = \phi(x_1)$ and $y_2 = \phi(x_2)$, must also be clustered together at this resolution because the map $\phi$ is dissimilarity reducing.]

The Axiom of Transformation that we introduce next is a formal statement of the intuition described above:

(A2) Axiom of Transformation. Consider any two networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$ and any dissimilarity-reducing map $\phi : X \to Y$. Then, the method $\mathcal{H}$ satisfies the Axiom of Transformation if the output dendrograms $D_X = \mathcal{H}(X, A_X)$ and $D_Y = \mathcal{H}(Y, A_Y)$ are such that $x \sim_{D_X(\delta)} x'$ for some $\delta \geq 0$ implies that $\phi(x) \sim_{D_Y(\delta)} \phi(x')$.

We say that a hierarchical clustering method $\mathcal{H}$ is admissible with respect to (A1) and (A2), or admissible for short, if it satisfies Axioms (A1) and (A2). Axiom (A1) states that the units of the resolution parameter $\delta$ are the same as the units of the elements of the dissimilarity function. Axiom (A2) states that if we reduce dissimilarities, clusters may be combined but cannot be separated. These axioms are an adaptation of the axioms proposed in [11], [13], [14] for the case of finite metric spaces.

A. Dendrograms as ultrametrics

Dendrograms are convenient graphical representations but otherwise cumbersome to handle. A mathematically more convenient representation is obtained when one identifies dendrograms with finite ultrametric spaces. An ultrametric defined on the space $X$ is a metric function $u_X : X \times X \to \mathbb{R}_+$ that satisfies a stronger triangle inequality, as we formally define next.

Definition 1 Given a node set $X$, an ultrametric $u_X$ is a nonnegative function $u_X : X \times X \to \mathbb{R}_+$ satisfying the following properties:

(i) Identity. The ultrametric $u_X(x, x') = 0$ if and only if $x = x'$, for all $x, x' \in X$.

(ii) Symmetry.
For all pairs of points $x, x' \in X$ it holds that $u_X(x, x') = u_X(x', x)$.

(iii) Strong triangle inequality. Given points $x, x', x'' \in X$, the ultrametrics $u_X(x, x'')$, $u_X(x, x')$, and $u_X(x', x'')$ satisfy the strong triangle inequality

$$u_X(x, x'') \leq \max\big(u_X(x, x'), u_X(x', x'')\big).$$ (12)

Since (12) implies the usual triangle inequality $u_X(x, x'') \leq u_X(x, x') + u_X(x', x'')$ for all $x, x', x'' \in X$, ultrametric spaces are particular cases of metric spaces.

[Fig. 5. Equivalence of dendrograms and ultrametrics. Given a dendrogram $D_X$, define the distance $u_X(x, x') := \min\{\delta \geq 0 \mid x \sim_{D_X(\delta)} x'\}$. This distance is an ultrametric because it satisfies the strong triangle inequality (12) and is symmetric. In the example shown, $u_X(a, b) = 2$, $u_X(c, d) = 4$, and $u_X(a/b, c/d) = 5$.]

Our interest in ultrametrics stems from the fact that it is possible to establish a structure-preserving bijective mapping between dendrograms and ultrametrics, as proved by the following construction and theorem; see also Fig. 5.

Consider the map $\Psi : \mathcal{D} \to \mathcal{U}$ from the space of dendrograms to the space of networks endowed with ultrametrics, defined as follows: for a given dendrogram $D_X$ over the finite set $X$ write $\Psi(D_X) = (X, u_X)$, where we define $u_X(x, x')$ for all $x, x' \in X$ as the smallest resolution at which $x$ and $x'$ are clustered together,

$$u_X(x, x') := \min\big\{\delta \geq 0 \mid x \sim_{D_X(\delta)} x'\big\}.$$ (13)

We also consider the map $\Upsilon : \mathcal{U} \to \mathcal{D}$ constructed as follows: for a given ultrametric $u_X$ on the finite set $X$ and each $\delta \geq 0$ define the relation $\sim_{u_X(\delta)}$ on $X$ as

$$x \sim_{u_X(\delta)} x' \iff u_X(x, x') \leq \delta.$$ (14)

Further define $D_X(\delta) := X \bmod \sim_{u_X(\delta)}$ and $\Upsilon(X, u_X) := D_X$.

Theorem 1 The maps $\Psi : \mathcal{D} \to \mathcal{U}$ and $\Upsilon : \mathcal{U} \to \mathcal{D}$ are both well defined. Furthermore, $\Psi \circ \Upsilon$ is the identity on $\mathcal{U}$ and $\Upsilon \circ \Psi$ is the identity on $\mathcal{D}$.

The proof of this result can be found in [13]; for the reader's convenience, we also present it here.
Proof: First notice that the technical condition (D3) of dendrograms ensures that the minimum in (13) exists, rendering a well-defined function $u_X$. Thus, to show that $\Psi$ is a well-defined map, we must prove that $u_X$ is an ultrametric. In order to do this, we have to show the symmetry, identity, nonnegativity, and strong triangle inequality properties. Nonnegativity follows from the nonnegativity of the resolution parameter $\delta$ in (13). Symmetry $u_X(x, x') = u_X(x', x)$ for all $x, x' \in X$ follows from the symmetry property of the equivalence relation $\sim_{D_X(\delta)}$. The identity property $u_X(x, x') = 0 \iff x = x'$ follows from reflexivity of the equivalence relation $\sim_{D_X(\delta)}$ and the boundary condition (D1) on dendrograms.

To see that $u_X$ satisfies the strong triangle inequality in (12), consider points $x$, $x'$, and $x'' \in X$ such that the lowest resolution for which $x \sim_{D_X(\delta)} x''$ is $\delta_1$ and the smallest resolution for which $x' \sim_{D_X(\delta)} x''$ is $\delta_2$. According to (13) we then have

$$u_X(x, x'') = \delta_1 := \min\big\{\delta \geq 0 \mid x \sim_{D_X(\delta)} x''\big\}, \qquad u_X(x', x'') = \delta_2 := \min\big\{\delta \geq 0 \mid x' \sim_{D_X(\delta)} x''\big\}.$$ (15)

Denote $\delta_0 := \max(\delta_1, \delta_2)$. Because the dendrogram is a nested set of partitions [cf. (D2)] it must be that $x \sim_{D_X(\delta_0)} x''$ and $x' \sim_{D_X(\delta_0)} x''$. Furthermore, since $\sim_{D_X(\delta_0)}$ is an equivalence relation, it satisfies transitivity, from where it follows that $x \sim_{D_X(\delta_0)} x'$. Using (13) for $x$, $x'$ we conclude that

$$u_X(x, x') := \min\big\{\delta \geq 0 \mid x \sim_{D_X(\delta)} x'\big\} \leq \delta_0.$$ (16)

But now observe that by definition $\delta_0 := \max(\delta_1, \delta_2)$. Substitute this expression in (16) and compare with (15) to write

$$u_X(x, x') \leq \max(\delta_1, \delta_2) = \max\big(u_X(x, x''), u_X(x', x'')\big).$$ (17)

Thus, $u_X$ satisfies the strong triangle inequality and is therefore an ultrametric, proving that the map $\Psi$ is well defined.

For the converse result, we need to show that $\Upsilon$ is a well-defined map.
In order to do so, we first need to show that the relation $\sim_{u_X(\delta)}$ as defined in (14) is an equivalence relation. Symmetry and reflexivity are implied by the symmetry and identity properties of the ultrametric $u_X$, respectively. To see that $\sim_{u_X(\delta)}$ is also transitive, consider points $x$, $x'$, and $x'' \in X$ such that $x \sim_{u_X(\delta)} x''$ and $x' \sim_{u_X(\delta)} x''$. Consequently, it follows from (14) that

$$u_X(x, x'') \leq \delta, \qquad u_X(x', x'') \leq \delta.$$ (18)

Further note that, since $u_X$ is an ultrametric, it satisfies the strong triangle inequality in (12). Combining this with (18) yields

$$u_X(x, x') \leq \max\big(u_X(x, x''), u_X(x', x'')\big) \leq \delta,$$ (19)

from where it follows that $x \sim_{u_X(\delta)} x'$ [cf. (14)]. Thus, $\sim_{u_X(\delta)}$ is an equivalence relation which, as such, induces a partition $D_X(\delta) := \{X \bmod \sim_{u_X(\delta)}\}$ of the set $X$ for every $\delta \geq 0$.

Now, we need to show that $D_X$ is a well-defined dendrogram, i.e., we need to show that the partitions $D_X(\delta)$ for $\delta \geq 0$ satisfy (D1)-(D3). The boundary conditions (D1) are satisfied from the identity property of $u_X$ and the fact that the maximum value of $u_X$ in the finite set $X$ must be upper bounded by some $\delta_0$. To see that partitions are nested in the sense of condition (D2), notice that for $\delta_1 < \delta_2$ the condition $u_X(x, x') \leq \delta_1$ implies $u_X(x, x') \leq \delta_2$. This latter inequality substituted in the definition in (14) leads to the conclusion that $x \sim_{u_X(\delta_1)} x'$ implies $x \sim_{u_X(\delta_2)} x'$ for $\delta_1 \leq \delta_2$, as in condition (D2). Finally, to see that the technical condition (D3) is satisfied, for each $\delta \geq 0$ such that $D_X(\delta) \neq \{X\}$ we may define $\epsilon(\delta)$ as any positive real satisfying

$$0 < \epsilon(\delta) < \min_{\substack{x, x' \in X \\ u_X(x, x') > \delta}} \big(u_X(x, x') - \delta\big),$$ (20)

where the finiteness of $X$ ensures that $\epsilon(\delta)$ is well defined. Hence, (14) guarantees that the equivalence relation $\sim_{u_X(\delta)}$ is the same as $\sim_{u_X(t)}$ for $t \in [\delta, \delta + \epsilon(\delta)]$.
Consequently, the partition $D_X(t)$ induced by the equivalence relation is the same for this range of resolutions, proving (D3) for these resolutions. For resolutions $\delta$ such that $D_X(\delta) = \{X\}$, (D3) is trivially satisfied since the dendrogram remains unchanged for all larger resolutions, proving that $\Upsilon$ is well defined.

In order to conclude the proof, we need to show that $\Psi \circ \Upsilon$ and $\Upsilon \circ \Psi$ are the identities on $\mathcal{U}$ and $\mathcal{D}$, respectively. To see why the former is true, pick any ultrametric network $(X, u_X)$ and consider an arbitrary pair of nodes $x, x' \in X$ such that $u_X(x, x') = \delta_0$. Also, consider the ultrametric network $\Psi \circ \Upsilon(X, u_X) := (X, u^*_X)$. From (14), in the dendrogram $\Upsilon(X, u_X)$ the nodes $x$ and $x'$ are not merged for resolutions $\delta < \delta_0$, and at resolution $\delta = \delta_0$ both nodes merge into one single cluster. When we apply $\Psi$ to the resulting dendrogram, from (13) we obtain $u^*_X(x, x') = \delta_0$. Since $x, x' \in X$ were chosen arbitrarily, we have that $u_X = u^*_X$, showing that $\Psi \circ \Upsilon$ is the identity on $\mathcal{U}$. A similar argument shows that $\Upsilon \circ \Psi$ is the identity on $\mathcal{D}$.

Given the equivalence between dendrograms and ultrametrics established by Theorem 1, we can regard hierarchical clustering methods $\mathcal{H}$ as inducing ultrametrics in node spaces $X$ based on dissimilarity functions $A_X$. However, ultrametrics are particular cases of dissimilarity functions. Thus, we can reinterpret the method $\mathcal{H}$ as a map

$$\mathcal{H} : \mathcal{N} \to \mathcal{U}$$ (21)

mapping the space of networks to the space $\mathcal{U}$ of networks endowed with ultrametrics. For all $x, x' \in X$, the ultrametric value $u_X(x, x')$ induced by $\mathcal{H}$ is the minimum resolution at which $x$ and $x'$ are co-clustered by $\mathcal{H}$. Observe that the outcome of a hierarchical clustering method defines an ultrametric in the space $X$ even when the original data does not correspond to a metric, as is the case of asymmetric networks.
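The correspondence of Theorem 1 can be exercised on a small example. The sketch below is illustrative only: it assumes an ultrametric stored as a list-of-lists matrix and represents a dendrogram by its breakpoints, i.e., a list of `(delta, partition)` pairs sorted by resolution. The function names and this representation are assumptions made for the example; the matrix `u_fig5` uses the values quoted in Fig. 5.

```python
def is_ultrametric(u):
    """Check identity, symmetry, and the strong triangle inequality (12)."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            if u[i][j] < 0 or (u[i][j] == 0) != (i == j) or u[i][j] != u[j][i]:
                return False
            for k in range(n):
                if u[i][k] > max(u[i][j], u[j][k]):
                    return False
    return True


def partition_at(u, delta):
    """One slice of Upsilon: blocks of x ~ x' iff u[x][x'] <= delta, eq. (14).

    Assumes u is an ultrametric, so the relation is an equivalence relation
    and the block of node i is exactly {j : u[i][j] <= delta}.
    """
    n = len(u)
    assigned = [False] * n
    blocks = []
    for i in range(n):
        if assigned[i]:
            continue
        block = [j for j in range(n) if u[i][j] <= delta]
        for j in block:
            assigned[j] = True
        blocks.append(block)
    return blocks


def dendrogram_levels(u):
    """Upsilon: the partitions at every resolution where the dendrogram changes."""
    breakpoints = sorted({value for row in u for value in row})
    return [(delta, partition_at(u, delta)) for delta in breakpoints]


def ultrametric_from_levels(levels, n):
    """Psi: smallest resolution at which each pair shares a block, eq. (13)."""
    u = [[None] * n for _ in range(n)]
    for delta, blocks in levels:  # levels are sorted by increasing delta
        for block in blocks:
            for i in block:
                for j in block:
                    if u[i][j] is None:
                        u[i][j] = delta
    return u


# Ultrametric of Fig. 5 on nodes a, b, c, d (indices 0..3).
u_fig5 = [
    [0, 2, 5, 5],
    [2, 0, 5, 5],
    [5, 5, 0, 4],
    [5, 5, 4, 0],
]

levels = dendrogram_levels(u_fig5)
for delta, blocks in levels:
    print(delta, blocks)
print(ultrametric_from_levels(levels, 4) == u_fig5)
```

Storing only the breakpoints is enough because, by (D3), the partition is constant between consecutive merge resolutions.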
At any rate, a simple observation with important consequences [13] for the study of the stability of methods is that $\mathcal{U} \subset \mathcal{N}$.

We say that two methods $\mathcal{H}_1$ and $\mathcal{H}_2$ are equivalent, and we write $\mathcal{H}_1 \equiv \mathcal{H}_2$, if and only if

$$\mathcal{H}_1(N) = \mathcal{H}_2(N),$$ (22)

for all $N \in \mathcal{N}$.

A further consequence of the equivalence provided by Theorem 1 is that we can now rewrite axioms (A1)-(A2) in a manner that refers to properties of the output ultrametrics. We then say that a hierarchical clustering method $\mathcal{H}$ is admissible if and only if it satisfies the following two axioms:

(A1) Axiom of Value. The ultrametric output $(\{p, q\}, u_{p,q}) = \mathcal{H}(\vec{\Delta}_2(\alpha, \beta))$ produced by $\mathcal{H}$ applied to the two-node network $\vec{\Delta}_2(\alpha, \beta)$ satisfies

$$u_{p,q}(p, q) = \max(\alpha, \beta).$$ (23)

(A2) Axiom of Transformation. Consider two networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$ and a dissimilarity-reducing map $\phi : X \to Y$, i.e., a map $\phi$ such that for all $x, x' \in X$ it holds that $A_X(x, x') \geq A_Y(\phi(x), \phi(x'))$. Then, for all $x, x' \in X$, the output ultrametrics $(X, u_X) = \mathcal{H}(X, A_X)$ and $(Y, u_Y) = \mathcal{H}(Y, A_Y)$ satisfy

$$u_X(x, x') \geq u_Y(\phi(x), \phi(x')).$$ (24)

[Fig. 6. Property of Influence. No clusters can be formed at resolutions for which it is impossible to form influence loops. Here, the loop of minimum cost is formed by circling the network clockwise, where the maximum cost encountered is $A_X(b, c) = A_X(c, a) = 1$. The top dendrogram is an invalid outcome because it has $a$ and $b$ clustering together at resolution $\delta < 1$. The bottom dendrogram satisfies the Property of Influence (P1) [cf. (26)].]

The axioms in Section III restrict admissible methods $\mathcal{H}$ by placing conditions on the dendrograms that the methods may produce. The axioms here do the same by imposing conditions on the ultrametrics produced by the methods. Axiom (A1) implies that the units of the dissimilarity function $A_X$ and the ultrametric $u_X$ are the same.
Axiom (A2) implies that not increasing any dissimilarity in the network cannot result in an increase of the output ultrametric between some pair of nodes. Despite the somewhat different interpretations, by virtue of Theorem 1, the requirements here imposed on the output ultrametrics are equivalent to the requirements imposed on the output dendrograms in the axiom statements introduced earlier in Section III.

For the particular case of symmetric networks $(X, A_X)$ we defined the single linkage dendrogram $SL_X$ through the equivalence relations in (8). According to Theorem 1 this dendrogram is equivalent to an ultrametric space that we denote by $(X, u^{SL}_X)$. Comparing (8) with (14) we conclude, as is well known [13], that the single linkage ultrametric $u^{SL}_X$ in symmetric networks is given by

$$u^{SL}_X(x, x') = \tilde{u}^*_X(x, x') = \tilde{u}^*_X(x', x) = \min_{C(x, x')} \ \max_{i \mid x_i \in C(x, x')} A_X(x_i, x_{i+1}),$$ (25)

where we also used (7) to write the last equality. We read (25) as saying that the single linkage ultrametric $u^{SL}_X(x, x')$ between $x$ and $x'$ is the minimum chain cost $\tilde{u}^*_X(x, x') = \tilde{u}^*_X(x', x)$ among all chains linking $x$ to $x'$.

IV. INFLUENCE MODALITIES

The Axiom of Value states that in order for two nodes to belong to the same cluster they have to be able to exercise mutual influence on each other. When we consider a network with more than two nodes the concept of mutual influence is more difficult because it is possible to have direct influence as well as indirect chains of influence through other nodes. In this section we introduce two intuitive notions of mutual influence in networks of arbitrary size and show that they can be derived from the axioms of value and transformation. Besides their intrinsic value, these influence modalities are important for later developments in this paper; see, e.g., the proof of Theorem 4.
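As a numerical illustration of the ultrametric form (24) of the Axiom of Transformation, the sketch below applies single linkage, computed through the min-max expression (25), to two symmetric networks related by a dissimilarity-reducing map. The two matrices, the map `phi`, and all function names are hypothetical and chosen only for this example; a single passing instance illustrates the axiom but does not prove that a method satisfies it.

```python
def single_linkage_ultrametric(A):
    """Single linkage ultrametric of eq. (25) for a symmetric matrix A."""
    n = len(A)
    u = [row[:] for row in A]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def is_dissimilarity_reducing(A_X, A_Y, phi):
    """True if A_X(x, x') >= A_Y(phi(x), phi(x')) for every pair of nodes.

    phi is a list with phi[x] giving the index in Y of the image of node x.
    """
    n = len(A_X)
    return all(A_X[i][j] >= A_Y[phi[i]][phi[j]]
               for i in range(n) for j in range(n))


def transformation_holds(u_X, u_Y, phi):
    """Check inequality (24) for one pair of output ultrametrics."""
    n = len(u_X)
    return all(u_X[i][j] >= u_Y[phi[i]][phi[j]]
               for i in range(n) for j in range(n))


# Hypothetical symmetric networks on three nodes each.
A_X = [[0, 1, 3],
       [1, 0, 2],
       [3, 2, 0]]
A_Y = [[0, 0.5, 1],
       [0.5, 0, 1],
       [1, 1, 0]]
phi = [0, 1, 2]  # node i of X is mapped to node i of Y

u_X = single_linkage_ultrametric(A_X)
u_Y = single_linkage_ultrametric(A_Y)

print(is_dissimilarity_reducing(A_X, A_Y, phi))
print(u_X)
print(u_Y)
print(transformation_holds(u_X, u_Y, phi))
```

Reversing the roles of the two networks gives a map that is not dissimilarity reducing, so (A2) imposes no constraint in that direction.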
Consider first the intuitive notion that for two nodes to be part of a cluster there has to be a way for each of them to exercise influence on the other, either directly or indirectly. To formalize this idea, recall the concept of minimum loop cost (9), which we exemplify in Fig. 6. For this network, the loops $[a, b, a]$ and $[b, a, b]$ have maximum cost 2, corresponding to the link $(b, a)$ in both cases. All other two-node loops have cost 3. All of the counterclockwise loops, e.g., $[a, c, b, a]$, have cost 3, and any of the clockwise loops have cost 1. Thus, the minimum loop cost of this network is $\text{mlc}(X, A_X) = 1$.

[Fig. 7. Canonical network $\vec{\Delta}_n(\alpha, \beta)$ for the Extended Axiom of Value. Edges from a node to another node identified with a higher number have weight $\alpha$, whereas edges going to nodes identified with lower numbers have weight $\beta$. All admissible methods $\mathcal{H}$ cluster the $n$ nodes together at resolution $\max(\alpha, \beta)$.]

For resolutions $0 \leq \delta < \text{mlc}(X, A_X)$ it is impossible to find chains of mutual influence with maximum cost smaller than $\delta$ between any pair of points. Indeed, suppose we can link $x$ to $x'$ with a chain of maximum cost smaller than $\delta$, and also link $x'$ to $x$ with a chain having the same property. Then, we can form a loop with cost smaller than $\delta$ by concatenating these two chains. Thus, the intuitive notion that clusters cannot form at resolutions for which it is impossible to observe mutual influence can be translated into the requirement that no clusters can be formed at resolutions $0 \leq \delta < \text{mlc}(X, A_X)$. In terms of ultrametrics, this implies that it must be $u_X(x, x') \geq \text{mlc}(X, A_X)$ for any pair of different nodes $x, x' \in X$, as we formally state next:

(P1) Property of Influence.
For any network $N_X = (X, A_X)$ the output ultrametric $(X, u_X) = \mathcal{H}(X, A_X)$ corresponding to the application of hierarchical clustering method $\mathcal{H}$ is such that the ultrametric $u_X(x, x')$ between any two distinct points $x$ and $x'$ cannot be smaller than the minimum loop cost $\text{mlc}(X, A_X)$ [cf. (9)] of the network,

$$u_X(x, x') \geq \text{mlc}(X, A_X) \quad \text{for all } x \neq x'.$$ (26)

Since for the network in Fig. 6 the minimum loop cost is $\text{mlc}(X, A_X) = 1$, the Property of Influence implies that $u_X(x, x') \geq \text{mlc}(X, A_X) = 1$ for any pair of nodes $x \neq x'$. Equivalently, the output dendrogram is such that for resolutions $\delta < \text{mlc}(X, A_X) = 1$ each node is in its own block. Observe that (P1) does not imply that a cluster with more than one node is formed at resolution $\delta = \text{mlc}(X, A_X)$, but states that achieving this minimum resolution is a necessary condition for the formation of clusters.

A second intuitive statement about influence in networks of arbitrary size comes in the form of the Extended Axiom of Value. To introduce this concept define a family of canonical asymmetric networks

$$\vec{\Delta}_n(\alpha, \beta) := (\{1, \ldots, n\}, A_{n,\alpha,\beta}),$$ (27)

with $n \in \mathbb{N}$ and $\alpha, \beta > 0$, where the underlying set $\{1, \ldots, n\}$ is the set of the first $n$ natural numbers and the dissimilarity value $A_{n,\alpha,\beta}(i, j)$ between points $i$ and $j$ depends on whether $i < j$ or not; see Fig. 7. For points $i < j$ we let $A_{n,\alpha,\beta}(i, j) = \alpha$, whereas for points $i > j$ we have $A_{n,\alpha,\beta}(i, j) = \beta$. Or, in matrix form,

$$A_{n,\alpha,\beta} := \begin{pmatrix} 0 & \alpha & \alpha & \cdots & \alpha \\ \beta & 0 & \alpha & \cdots & \alpha \\ \beta & \beta & 0 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \alpha \\ \beta & \beta & \cdots & \beta & 0 \end{pmatrix}.$$ (28)

In the network $\vec{\Delta}_n(\alpha, \beta)$ all pairs of nodes have dissimilarities $\alpha$ in one direction and $\beta$ in the other direction.
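The canonical matrix (28) and the bound in (26) can be checked numerically. The sketch below is an illustration under stated assumptions: nodes are indexed from 0 rather than 1, the helper names are hypothetical, and the matrix `A_fig6` is read off the loop costs quoted for Fig. 6 with node order $a, b, c$. The minimum loop cost is evaluated with a Floyd-Warshall-style min-max recursion; for the canonical network the value it returns is $\max(\alpha, \beta)$, since every loop must traverse at least one link of each weight.

```python
def canonical_network(n, alpha, beta):
    """Matrix (28): alpha above the diagonal (i < j), beta below it (i > j)."""
    return [[0 if i == j else (alpha if i < j else beta) for j in range(n)]
            for i in range(n)]


def min_chain_costs(A):
    """Directed minimum chain costs, eq. (7), by a min-max recursion."""
    n = len(A)
    u = [row[:] for row in A]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def min_loop_cost(A):
    """Minimum loop cost, eq. (9): best pair of opposite-direction chains."""
    u = min_chain_costs(A)
    n = len(A)
    return min(max(u[i][j], u[j][i])
               for i in range(n) for j in range(n) if i != j)


def separation(A):
    """Separation, eq. (10): smallest off-diagonal dissimilarity."""
    n = len(A)
    return min(A[i][j] for i in range(n) for j in range(n) if i != j)


# Network of Fig. 6 with node order a, b, c:
# A(a,b) = 1/2, A(b,c) = 1, A(c,a) = 1, A(b,a) = 2, A(a,c) = 3, A(c,b) = 3.
A_fig6 = [
    [0, 0.5, 3],
    [2, 0, 1],
    [1, 3, 0],
]

A_canon = canonical_network(4, 1, 3)
for row in A_canon:
    print(row)
print(min_loop_cost(A_canon))
print(separation(A_fig6), min_loop_cost(A_fig6))
```

For the Fig. 6 network the separation is smaller than the minimum loop cost, which is the strict case of inequality (11) that can only occur when dissimilarities are asymmetric.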
This symmetry entails that all nodes should cluster together at the same resolution, and the requirement of mutual influence along with consistency with the Axiom of Value entails that this resolution should be $\max(\alpha, \beta)$. Before formalizing this definition, notice that having clustering outcomes that depend on the ordering of the nodes in the space $\{1, \ldots, n\}$ is not desirable. Thus, we consider a permutation $\Pi = \{\pi_1, \pi_2, \ldots, \pi_n\}$ of $\{1, 2, \ldots, n\}$ and the action $\Pi(A)$ of $\Pi$ on an $n \times n$ matrix $A$, which we define by $(\Pi(A))_{i,j} = A_{\pi_i, \pi_j}$ for all $i$ and $j$. Define now the network $\vec{\Delta}_n(\alpha, \beta, \Pi) := (\{1, \ldots, n\}, \Pi(A_{n,\alpha,\beta}))$ with underlying set $\{1, \ldots, n\}$ and dissimilarity matrix given by $\Pi(A_{n,\alpha,\beta})$. With this definition we can now formally introduce the Extended Axiom of Value as follows:

(A1') Extended Axiom of Value. Consider the network $\vec{\Delta}_n(\alpha, \beta, \Pi) = (\{1, \ldots, n\}, \Pi(A_{n,\alpha,\beta}))$. Then, for all indices $n \in \mathbb{N}$, constants $\alpha, \beta > 0$, and permutations $\Pi$ of $\{1, \ldots, n\}$, the outcome $(\{1, \ldots, n\}, u) = \mathcal{H}(\vec{\Delta}_n(\alpha, \beta, \Pi))$ of hierarchical clustering method $\mathcal{H}$ applied to the network $\vec{\Delta}_n(\alpha, \beta, \Pi)$ satisfies

$$u(i, j) = \max(\alpha, \beta),$$ (29)

for all pairs of nodes $i \neq j$.

Observe that the Axiom of Value (A1) is subsumed into the Extended Axiom of Value for $n = 2$. Further note that the minimum loop cost of the canonical network $\vec{\Delta}_n(\alpha, \beta)$ is $\text{mlc}(\vec{\Delta}_n(\alpha, \beta)) = \max(\alpha, \beta)$ because forming a loop requires traversing a link while moving right and a link while moving left at least once in Fig. 7. Since a permutation of indices does not alter the minimum loop cost of the network, we have that

$$\text{mlc}(\vec{\Delta}_n(\alpha, \beta, \Pi)) = \text{mlc}(\vec{\Delta}_n(\alpha, \beta)) = \max(\alpha, \beta).$$ (30)

By the Property of Influence (P1) it follows from (30) and (26) that for the network $\vec{\Delta}_n(\alpha, \beta, \Pi)$ we must have $u(i, j) \geq \text{mlc}(\vec{\Delta}_n(\alpha, \beta)) = \max(\alpha, \beta)$ for $i \neq j$.
By the Extended Axiom of Value (A1') we have $u(i, j) = \max(\alpha, \beta)$ for $i \neq j$, which means that (A1') and (P1) are compatible requirements. We can then conceive of two alternative axiomatic formulations where admissible methods are required to abide by the Axiom of Transformation (A2), the Property of Influence (P1), and either the (regular) Axiom of Value (A1) or the Extended Axiom of Value (A1'). Axiom (A1) and (P1) are compatible because (A1) is a particular case of (A1'), which we already argued is compatible with (P1). We will see in the following section that these two alternative axiomatic formulations are equivalent to each other in the sense that a clustering method satisfies one set of axioms if and only if it satisfies the other. We further show that (P1) and (A1') are implied by (A1) and (A2). As a consequence, it follows that both alternative axiomatic formulations are equivalent to simply requiring validity of axioms (A1) and (A2).

A. Equivalent Axiomatic Formulations

We begin by stating the equivalence between admissibility with respect to (A1)-(A2) and (A1')-(A2). A theorem stating that methods admissible with respect to (A1') and (A2) satisfy the Property of Influence (P1) is presented next, to conclude that (A1)-(A2) imply (P1) as a consequence.

Theorem 2 Assume the hierarchical clustering method $\mathcal{H}$ satisfies the Axiom of Transformation (A2). Then, $\mathcal{H}$ satisfies the Axiom of Value (A1) if and only if it satisfies the Extended Axiom of Value (A1').

In proving Theorem 2, we make use of the following lemma, which proves that, given a network, if the directed minimum chain cost $\tilde{u}^*_X(x, x')$ between $x, x' \in X$ is at least $\delta$, it is possible to find a network partition separating $x$ and $x'$ such that the dissimilarities from points in the block containing $x$ to points in the block containing $x'$ are bounded below by $\delta$.

Lemma 1 Let $N = (X, A_X)$ be any network and $\delta$ any positive constant.
Suppose that $x, x' \in X$ are such that their associated minimum chain cost [cf. (7)] satisfies

$$\tilde{u}^*_X(x, x') \geq \delta.$$ (31)

Then, there exists a partition $P_\delta(x, x') = \{B_\delta(x), B_\delta(x')\}$ of the node space $X$ into blocks $B_\delta(x)$ and $B_\delta(x')$ with $x \in B_\delta(x)$ and $x' \in B_\delta(x')$ such that for all points $b \in B_\delta(x)$ and $b' \in B_\delta(x')$

$$A_X(b, b') \geq \delta.$$ (32)

Proof: We prove this result by contradiction. If a partition $P_\delta(x, x') = \{B_\delta(x), B_\delta(x')\}$ with $x \in B_\delta(x)$ and $x' \in B_\delta(x')$ satisfying (32) does not exist for all pairs of points $x, x' \in X$ satisfying (31), then there is at least one pair of nodes $x, x' \in X$ satisfying (31) such that for all partitions of $X$ into two blocks $P = \{B, B'\}$ with $x \in B$ and $x' \in B'$ we can find at least a pair of elements $b_P \in B$ and $b'_P \in B'$ for which

$$A_X(b_P, b'_P) < \delta.$$ (33)

Begin by considering the partition $P_1 = \{B_1, B'_1\}$ where $B_1 = \{x\}$ and $B'_1 = X \setminus \{x\}$. Since (33) is true for all partitions having $x \in B$ and $x' \in B'$, and $x$ is the unique element of $B_1$, there must exist a node $b'_{P_1} \in B'_1$ such that

$$A_X(x, b'_{P_1}) < \delta.$$ (34)

Hence, the chain $C(x, b'_{P_1}) = [x, b'_{P_1}]$ composed of these two nodes has cost smaller than $\delta$. Moreover, since $\tilde{u}^*_X(x, b'_{P_1})$ represents the minimum cost among all chains $C(x, b'_{P_1})$ linking $x$ to $b'_{P_1}$, we can assert that

$$\tilde{u}^*_X(x, b'_{P_1}) \leq A_X(x, b'_{P_1}) < \delta.$$ (35)

Consider now the partition $P_2 = \{B_2, B'_2\}$ where $B_2 = \{x, b'_{P_1}\}$ and $B'_2 = X \setminus B_2$. From (33), there must exist a node $b'_{P_2} \in B'_2$ that satisfies at least one of the two following conditions:

$$A_X(x, b'_{P_2}) < \delta,$$ (36)

$$A_X(b'_{P_1}, b'_{P_2}) < \delta.$$ (37)

If (36) is true, the chain $C(x, b'_{P_2}) = [x, b'_{P_2}]$ has cost smaller than $\delta$.
If (37) is true, we combine the dissimilarity bound with the one in (34) to conclude that the chain $C(x, b'_{P_2}) = [x, b'_{P_1}, b'_{P_2}]$ has cost smaller than $\delta$. In either case we conclude that there exists a chain $C(x, b'_{P_2})$ linking $x$ to $b'_{P_2}$ whose cost is smaller than $\delta$. Therefore, the minimum chain cost must satisfy

$$\tilde{u}^*_X(x, b'_{P_2}) < \delta.$$ (38)

Repeat the process by considering the partition $P_3$ with $B_3 = \{x, b'_{P_1}, b'_{P_2}\}$ and $B'_3 = X \setminus B_3$. As we did in arguing (36)-(37), it must follow from (33) that there exists a point $b'_{P_3}$ such that at least one of the dissimilarities $A_X(x, b'_{P_3})$, $A_X(b'_{P_1}, b'_{P_3})$, or $A_X(b'_{P_2}, b'_{P_3})$ is smaller than $\delta$. This observation implies that at least one of the chains $[x, b'_{P_3}]$, $[x, b'_{P_1}, b'_{P_3}]$, $[x, b'_{P_2}, b'_{P_3}]$, or $[x, b'_{P_1}, b'_{P_2}, b'_{P_3}]$ has cost smaller than $\delta$, from where it follows that

$$\tilde{u}^*_X(x, b'_{P_3}) < \delta.$$ (39)

This recursive construction can be repeated $n - 1$ times to obtain partitions $P_1, P_2, \ldots, P_{n-1}$ and corresponding nodes $b'_{P_1}, b'_{P_2}, \ldots, b'_{P_{n-1}}$ such that the minimum chain cost satisfies

$$\tilde{u}^*_X(x, b'_{P_i}) < \delta \quad \text{for all } i.$$ (40)

Observe that the nodes $b'_{P_i}$ are distinct by construction and distinct from $x$. Since there are $n$ nodes in the network it must be that $x' = b'_{P_k}$ for some $k \in \{1, \ldots, n - 1\}$. It follows from (40) that

$$\tilde{u}^*_X(x, x') < \delta.$$ (41)

This is a contradiction because $x, x' \in X$ were assumed to satisfy (31). Thus, the assumption that (33) is true for all partitions is incorrect. Hence, the claim that there is a partition $P_\delta(x, x') = \{B_\delta(x), B_\delta(x')\}$ satisfying (32) must be true.

Proof of Theorem 2: To prove that (A1)-(A2) imply (A1')-(A2), let $\mathcal{H}$ be a method that satisfies (A1) and (A2) and denote by $(\{1, 2, \ldots, n\}, u_{n,\alpha,\beta}) = \mathcal{H}(\vec{\Delta}_n(\alpha, \beta, \Pi))$ the output ultrametric resulting from applying $\mathcal{H}$ to the network $\vec{\Delta}_n(\alpha, \beta, \Pi)$ considered in the Extended Axiom of Value (A1'). We want to prove that (A1') is satisfied, which means that we have to show that for all indices $n \in \mathbb{N}$, constants $\alpha, \beta > 0$, permutations $\Pi$ of $\{1, \ldots, n\}$, and points $i \neq j$, we have $u_{n,\alpha,\beta}(i, j) = \max(\alpha, \beta)$. We will do so by showing both

$$u_{n,\alpha,\beta}(i, j) \leq \max(\alpha, \beta),$$ (42)

$$u_{n,\alpha,\beta}(i, j) \geq \max(\alpha, \beta),$$ (43)

for all $n \in \mathbb{N}$, $\alpha, \beta > 0$, $\Pi$, and $i \neq j$.

To prove (42) define a symmetric two-node network $\vec{\Delta}_2(\max(\alpha, \beta), \max(\alpha, \beta)) = (\{p, q\}, A_{p,q})$ where $A_{p,q}(p, q) = A_{p,q}(q, p) = \max(\alpha, \beta)$, and denote by $(\{p, q\}, u_{p,q}) = \mathcal{H}(\vec{\Delta}_2(\max(\alpha, \beta), \max(\alpha, \beta)))$ the outcome of method $\mathcal{H}$ when applied to $\vec{\Delta}_2(\max(\alpha, \beta), \max(\alpha, \beta))$. Since the method $\mathcal{H}$ abides by (A1),

$$u_{p,q}(p, q) = \max\big(\max(\alpha, \beta), \max(\alpha, \beta)\big) = \max(\alpha, \beta).$$ (44)

Consider now the map $\phi_{i,j} : \{p, q\} \to \{1, \ldots, n\}$ from the two-node network $\vec{\Delta}_2(\max(\alpha, \beta), \max(\alpha, \beta))$ to the permuted canonical network $\vec{\Delta}_n(\alpha, \beta, \Pi)$ where $\phi_{i,j}(p) = i$ and $\phi_{i,j}(q) = j$. Since dissimilarities in $\vec{\Delta}_n(\alpha, \beta, \Pi)$ are either $\alpha$ or $\beta$ and the dissimilarities in $\vec{\Delta}_2(\max(\alpha, \beta), \max(\alpha, \beta))$ are $\max(\alpha, \beta)$, it follows that the map $\phi_{i,j}$ is dissimilarity reducing regardless of the particular values of $i$ and $j$. Since the method $\mathcal{H}$ was assumed to satisfy (A2) as well, we must have

$$u_{p,q}(p, q) \geq u_{n,\alpha,\beta}\big(\phi_{i,j}(p), \phi_{i,j}(q)\big) = u_{n,\alpha,\beta}(i, j).$$ (45)

The inequality in (42) follows from substituting (44) into (45).

In order to show inequality (43), pick two arbitrary distinct nodes $i, j \in \{1, \ldots, n\}$ in the node set of $\vec{\Delta}_n(\alpha, \beta, \Pi)$. Denote by $C(i, j)$ and $C(j, i)$ two minimizing chains in the definition (7) of the directed minimum chain costs $\tilde{u}^*_{n,\alpha,\beta}(i, j)$ and $\tilde{u}^*_{n,\alpha,\beta}(j, i)$, respectively.
Observe that at least one of the following two inequalities must be true
$$\tilde{u}^*_{n,\alpha,\beta}(i, j) \geq \max(\alpha, \beta), \tag{46}$$
$$\tilde{u}^*_{n,\alpha,\beta}(j, i) \geq \max(\alpha, \beta). \tag{47}$$
Indeed, if both (46) and (47) were false, the concatenation of $C(i, j)$ and $C(j, i)$ would form a loop $C(i, i) = C(i, j) \uplus C(j, i)$ of cost strictly less than $\max(\alpha, \beta)$. This cannot be true because $\max(\alpha, \beta)$ is the minimum loop cost of the network $\vec{\Delta}_n(\alpha, \beta, \Pi)$, as we already showed in (30).

Without loss of generality assume (46) is true and consider $\delta = \max(\alpha, \beta)$. By Lemma 1 we are therefore guaranteed to find a partition of the node set $\{1, \ldots, n\}$ into two blocks $B_\delta(i)$ and $B_\delta(j)$ with $i \in B_\delta(i)$ and $j \in B_\delta(j)$ such that for all $b \in B_\delta(i)$ and $b' \in B_\delta(j)$ it holds that
$$\Pi(A_{n,\alpha,\beta})(b, b') \geq \delta = \max(\alpha, \beta). \tag{48}$$
Define a two-node network $\vec{\Delta}_2(\max(\alpha, \beta), \min(\alpha, \beta)) = (\{r, s\}, A_{r,s})$ where $A_{r,s}(r, s) = \max(\alpha, \beta)$ and $A_{r,s}(s, r) = \min(\alpha, \beta)$ and denote by $(\{r, s\}, u_{r,s}) = \mathcal{H}(\vec{\Delta}_2(\max(\alpha, \beta), \min(\alpha, \beta)))$. Since the method $\mathcal{H}$ satisfies (A1) we must have
$$u_{r,s}(r, s) = \max\big(\max(\alpha, \beta), \min(\alpha, \beta)\big) = \max(\alpha, \beta). \tag{49}$$
Consider the map $\phi'_{i,j} : \{1, \ldots, n\} \to \{r, s\}$ such that $\phi'_{i,j}(b) = r$ for all $b \in B_\delta(i)$ and $\phi'_{i,j}(b') = s$ for all $b' \in B_\delta(j)$. The map $\phi'_{i,j}$ is dissimilarity reducing because
$$\Pi(A_{n,\alpha,\beta})(k, l) \geq A_{r,s}\big(\phi'_{i,j}(k), \phi'_{i,j}(l)\big), \tag{50}$$
for all $k, l \in \{1, \ldots, n\}$. To see the validity of (50) consider three different possible cases. If $k$ and $l$ both belong to the same block, i.e., either $k, l \in B_\delta(i)$ or $k, l \in B_\delta(j)$, then $\phi'_{i,j}(k) = \phi'_{i,j}(l)$ and $A_{r,s}(\phi'_{i,j}(k), \phi'_{i,j}(l)) = 0$, which cannot exceed the nonnegative $\Pi(A_{n,\alpha,\beta})(k, l)$.
If $k \in B_\delta(j)$ and $l \in B_\delta(i)$ it holds that $A_{r,s}(\phi'_{i,j}(k), \phi'_{i,j}(l)) = A_{r,s}(s, r) = \min(\alpha, \beta)$, which cannot exceed $\Pi(A_{n,\alpha,\beta})(k, l)$, which is either equal to $\alpha$ or $\beta$. If $k \in B_\delta(i)$ and $l \in B_\delta(j)$, then we have $A_{r,s}(\phi'_{i,j}(k), \phi'_{i,j}(l)) = A_{r,s}(r, s) = \max(\alpha, \beta)$ but we also have $\Pi(A_{n,\alpha,\beta})(k, l) = \max(\alpha, \beta)$, as it follows by taking $b = k$ and $b' = l$ in (48).

Since $\mathcal{H}$ satisfies the Axiom of Transformation (A2) and the map $\phi'_{i,j}$ is dissimilarity reducing we must have
$$u_{n,\alpha,\beta}(i, j) \geq u_{r,s}\big(\phi'_{i,j}(i), \phi'_{i,j}(j)\big) = u_{r,s}(r, s). \tag{51}$$
Substituting (49) in (51) we obtain the inequality (43). Combining this result with the validity of (42), it follows that $u_{n,\alpha,\beta}(i, j) = \max(\alpha, \beta)$ for all $n \in \mathbb{N}$, $\alpha, \beta > 0$, $\Pi$, and $i \neq j$. Thus, admissibility with respect to (A1)-(A2) implies admissibility with respect to (A1')-(A2).

That admissibility with respect to (A1')-(A2) implies admissibility with respect to (A1)-(A2) is immediate because (A1) is a particular case of (A1'). Hence, if a method satisfies axioms (A1') and (A2) it must satisfy (A1) and (A2). The stated equivalence between admissibility with respect to (A1)-(A2) and (A1')-(A2) follows.

The Axiom of Extended Value (A1') is stronger than the (regular) Axiom of Value (A1). However, Theorem 2 shows that when considered together with the Axiom of Transformation (A2), both axioms of value are equivalent in the restrictions they impose on the set of admissible clustering methods $\mathcal{H}$. In the following theorem we show that the Property of Influence (P1) can be derived from axioms (A1') and (A2).

Theorem 3 If a clustering method $\mathcal{H}$ satisfies the axioms of extended value (A1') and transformation (A2) then it satisfies the Property of Influence (P1).

The following lemma is instrumental towards the proof of Theorem 3.

Lemma 2 Let $N = (X, A_X)$ be an arbitrary network with $n$ nodes and $\vec{\Delta}_n(\alpha, \beta) = (\{1, \ldots, n\}, A_{n,\alpha,\beta})$ be the canonical network in (28) with $0 < \alpha \leq \mathrm{sep}(X, A_X)$ [cf. (10)] and $\beta = \mathrm{mlc}(X, A_X)$ [cf. (9)]. Then, there exists a bijective map $\phi : X \to \{1, \ldots, n\}$ such that
$$A_X(x, x') \geq A_{n,\alpha,\beta}(\phi(x), \phi(x')), \tag{52}$$
for all $x, x' \in X$.

Proof: To construct the map $\phi$ consider the function $P : X \to \mathcal{P}(X)$ from the node set $X$ to its power set $\mathcal{P}(X)$ such that
$$P(x) := \{x' \in X \mid x' \neq x, \ A_X(x', x) < \beta\}, \tag{53}$$
for all $x \in X$. Having $r \in P(s)$ for some $r, s \in X$ implies that $A_X(r, s) < \beta = \mathrm{mlc}(X, A_X)$. An important observation is that we must have a node $x \in X$ whose $P$-image is empty. Otherwise, pick a node $x_n \in X$ and construct the chain $[x_0, x_1, \ldots, x_n]$ where the $i$th element of the chain $x_{i-1}$ is in the $P$-image of $x_i$. From the definition of the map $P$ it follows that all dissimilarities along this chain satisfy $A_X(x_{i-1}, x_i) < \beta = \mathrm{mlc}(X, A_X)$. But since the chain $[x_0, x_1, \ldots, x_n]$ contains $n + 1$ elements, at least one node must be repeated. Hence, we have found a loop for which all dissimilarities are strictly smaller than $\beta = \mathrm{mlc}(X, A_X)$, which is impossible because it contradicts the definition of the minimum loop cost in (9). We can then find a node $x_{i_1}$ for which $P(x_{i_1}) = \emptyset$. Fix $\phi(x_{i_1}) = 1$.

Select now a node $x_{i_2} \neq x_{i_1}$ whose $P$-image is either $\{x_{i_1}\}$ or $\emptyset$, which we write jointly as $P(x_{i_2}) \subseteq \{x_{i_1}\}$. Such a node must exist; otherwise, pick a node $x_{n-1} \in X \setminus \{x_{i_1}\}$ and construct the chain $[x_0, x_1, \ldots, x_{n-1}]$ where $x_{i-1} \in P(x_i) \setminus \{x_{i_1}\}$, i.e., $x_{i-1}$ is in the $P$-image of $x_i$ and $x_{i-1} \neq x_{i_1}$. Since the chain $[x_0, x_1, \ldots, x_{n-1}]$ contains $n$ elements from the set $X \setminus \{x_{i_1}\}$ of cardinality $n - 1$, at least one node must be repeated.
Hence, we have found a loop where all dissimilarities between consecutive nodes satisfy $A_X(x_{i-1}, x_i) < \beta = \mathrm{mlc}(X, A_X)$, contradicting the definition of minimum loop cost. We can then find a node $x_{i_2} \neq x_{i_1}$ for which $P(x_{i_2}) \subseteq \{x_{i_1}\}$. Fix $\phi(x_{i_2}) = 2$.

Repeat this process so that at step $k$ we have $\phi(x_{i_k}) = k$ for a node $x_{i_k} \notin \{x_{i_1}, x_{i_2}, \ldots, x_{i_{k-1}}\}$ whose $P$-image is a subset of the nodes already picked, that is
$$P(x_{i_k}) \subseteq \{x_{i_1}, x_{i_2}, \ldots, x_{i_{k-1}}\}. \tag{54}$$
This node must exist; otherwise, we could start with a node $x_{n-k+1} \in X \setminus \{x_{i_1}, x_{i_2}, \ldots, x_{i_{k-1}}\}$ and construct a chain $[x_0, x_1, \ldots, x_{n-k+1}]$ where $x_{i-1} \in P(x_i) \setminus \{x_{i_1}, x_{i_2}, \ldots, x_{i_{k-1}}\}$ and arrive at the same contradiction as for the case $k = 2$.

Since all the nodes $x_{i_k}$ are different, the map $\phi$ with $\phi(x_{i_k}) = k$ is bijective. By construction, $\phi$ is such that for all $l > k$, $x_{i_l} \notin P(x_{i_k})$. From (53), this implies that the dissimilarity from $x_{i_l}$ to $x_{i_k}$ must satisfy
$$A_X(x_{i_l}, x_{i_k}) \geq \beta, \quad \text{for all } l > k. \tag{55}$$
Moreover, from the definition of the canonical matrix $A_{n,\alpha,\beta}$ in (28) we have that for $l > k$
$$A_{n,\alpha,\beta}(\phi(x_{i_l}), \phi(x_{i_k})) = A_{n,\alpha,\beta}(l, k) = \beta. \tag{56}$$
By comparing (56) with (55) we conclude that (52) is true for all points with $\phi(x) > \phi(x')$. When $\phi(x) < \phi(x')$, we have $A_{n,\alpha,\beta}(\phi(x), \phi(x')) = \alpha$, which was assumed to be bounded above by the separation of the network $(X, A_X)$; thus, $A_{n,\alpha,\beta}(\phi(x), \phi(x'))$ is not greater than any positive dissimilarity in the range of $A_X$.

Proof of Theorem 3: Consider a given arbitrary network $N = (X, A_X)$ with $X = \{x_1, x_2, \ldots, x_n\}$ and denote by $(X, u_X) = \mathcal{H}(X, A_X)$ the output of applying the clustering method $\mathcal{H}$ to the network $N$.
The method $\mathcal{H}$ is known to satisfy (A1') and (A2) and we want to show that it satisfies (P1), for which we need to show that $u_X(x, x') \geq \mathrm{mlc}(X, A_X)$ for all $x \neq x'$ [cf. (26)].

Consider the canonical network $\vec{\Delta}_n(\alpha, \beta) = (\{1, \ldots, n\}, A_{n,\alpha,\beta})$ in (28) with $\beta = \mathrm{mlc}(X, A_X)$ being the minimum loop cost of the network $N$ [cf. (9)] and $\alpha > 0$ a constant not exceeding the separation of the network (10). Thus, we have $\alpha \leq \mathrm{sep}(X, A_X) \leq \mathrm{mlc}(X, A_X) = \beta$. Note that networks $N$ and $\vec{\Delta}_n(\alpha, \beta)$ have equal number of nodes.

Denote by $(\{1, \ldots, n\}, u_{\alpha,\beta}) = \mathcal{H}(\vec{\Delta}_n(\alpha, \beta))$ the ultrametric space obtained when we apply the clustering method $\mathcal{H}$ to the network $\vec{\Delta}_n(\alpha, \beta)$. Since $\mathcal{H}$ satisfies the Extended Axiom of Value (A1'), then for all indices $i, j \in \{1, \ldots, n\}$ with $i \neq j$ we have
$$u_{\alpha,\beta}(i, j) = \max(\alpha, \beta) = \beta = \mathrm{mlc}(X, A_X). \tag{57}$$
Further, focus on the bijective dissimilarity reducing map considered in Lemma 2 and notice that since the method $\mathcal{H}$ satisfies the Axiom of Transformation (A2) it follows that for all $x, x' \in X$
$$u_X(x, x') \geq u_{\alpha,\beta}(\phi(x), \phi(x')). \tag{58}$$
Since the equality in (57) is true for all $i \neq j$ and since all points $x \neq x'$ are mapped to points $\phi(x) \neq \phi(x')$ because $\phi$ is bijective, (58) implies
$$u_X(x, x') \geq \beta = \mathrm{mlc}(X, A_X), \tag{59}$$
for all distinct $x, x' \in X$. This is the definition of the Property of Influence (P1).

The fact that (P1) is implied by (A1') and (A2) as claimed by Theorem 3 implies that adding (P1) as a third axiom on top of these two is moot. Since we have already established in Theorem 2 that (A1) and (A2) yield the same space of admissible methods as (A1') and (A2), we can conclude as a corollary of Theorems 2 and 3 that (P1) is also satisfied by all methods $\mathcal{H}$ that satisfy (A1) and (A2).

Corollary 1 If a given clustering method $\mathcal{H}$ satisfies the axioms of value (A1) and transformation (A2), then it also satisfies the Property of Influence (P1).
Proof: If a clustering method $\mathcal{H}$ satisfies (A1) and (A2), then by Theorem 2 it must satisfy (A1') and (A2). If the latter is true, by Theorem 3 method $\mathcal{H}$ must satisfy property (P1).

[Figure 8: a chain $x, x_1, \ldots, x_{l-1}, x'$ with forward dissimilarities $A_X(x, x_1), A_X(x_1, x_2), \ldots, A_X(x_{l-1}, x')$ and backward dissimilarities $A_X(x_1, x), A_X(x_2, x_1), \ldots, A_X(x', x_{l-1})$.]
Fig. 8. Reciprocal clustering. Nodes $x$ and $x'$ are clustered together at resolution $\delta$ if they can be joined with a (reciprocal) chain whose maximum dissimilarity is smaller than or equal to $\delta$ in both directions [cf. (61)]. Of all methods that satisfy the axioms of value and transformation, reciprocal clustering yields the largest ultrametric between any pair of nodes.

In the discussion leading to the introduction of the Axiom of Value (A1) in Section III we argued that the intuitive notion of a cluster dictates that it must be possible for co-clustered nodes to influence each other. In the discussion leading to the definition of the Property of Influence (P1) at the beginning of this section we argued that in networks with more than two nodes the natural extension is that co-clustered nodes must be able to influence each other either directly or through their indirect influence on other intermediate nodes. The Property of Influence is a codification of this intuition because it states the impossibility of cluster formation at resolutions where influence loops cannot be formed. While (P1) and (A1) appear quite different and independent of each other, we have shown in this section that if a method satisfies axioms (A1) and (A2) it must satisfy (P1). Therefore, requiring direct influence on a two-node network as in (A1) restricts the mechanisms for indirect influence propagation so that clusters cannot be formed at resolutions that do not allow for mutual, possibly indirect, influence as stated in (P1).
In that sense the restriction of indirect influence propagation in (P1) is not just intuitively reasonable but formally implied by the more straightforward restrictions on direct influence in (A1) and dissimilarity reducing maps in (A2).

V. RECIPROCAL AND NONRECIPROCAL CLUSTERING

Pick any network $N_X = (X, A_X) \in \mathcal{N}$. One particular clustering method satisfying axioms (A1)-(A2) can be constructed by considering the symmetric dissimilarity
$$\bar{A}_X(x, x') := \max\big(A_X(x, x'), A_X(x', x)\big), \tag{60}$$
for all $x, x' \in X$. This effectively reduces the problem to clustering of symmetric data, a scenario in which the single linkage method in (8) is known to satisfy axioms analogous to (A1)-(A2) [13]. Drawing upon this connection we define the reciprocal clustering method $\mathcal{H}^{R}$ with output $(X, u^{R}_X) = \mathcal{H}^{R}(X, A_X)$ as the one for which the ultrametric $u^{R}_X(x, x')$ between points $x$ and $x'$ is given by
$$u^{R}_X(x, x') := \min_{C(x, x')} \ \max_{i \mid x_i \in C(x, x')} \bar{A}_X(x_i, x_{i+1}). \tag{61}$$
An illustration of the definition in (61) is shown in Fig. 8. We search for chains $C(x, x')$ linking nodes $x$ and $x'$. For a given chain we walk from $x$ to $x'$ and for every link, connecting say $x_i$ with $x_{i+1}$, we determine the maximum dissimilarity in both directions, i.e., the value of $\bar{A}_X(x_i, x_{i+1})$. We then determine the maximum across all the links in the chain. The reciprocal ultrametric $u^{R}_X(x, x')$ between points $x$ and $x'$ is the minimum of this value across all possible chains. Recalling the equivalence of dendrograms and ultrametrics provided by Theorem 1, we know that $R_X$, the dendrogram produced by reciprocal clustering, clusters $x$ and $x'$ together for resolutions $\delta \geq u^{R}_X(x, x')$. Combining the latter observation with (61), we can write the reciprocal clustering equivalence classes as
$$x \sim_{R_X(\delta)} x' \iff \min_{C(x, x')} \ \max_{i \mid x_i \in C(x, x')} \bar{A}_X(x_i, x_{i+1}) \leq \delta.$$
(62)

Comparing (62) with the definition of single linkage in (8) with $\tilde{u}^*_X(x, x')$ as defined in (7), we see that reciprocal clustering is equivalent to single linkage for the symmetrized network $N = (X, \bar{A}_X)$ where dissimilarities between nodes are symmetrized to the maximum value of each directed dissimilarity.

For the method $\mathcal{H}^{R}$ specified in (61) to be a properly defined hierarchical clustering method, we need to establish that $u^{R}_X$ is a valid ultrametric. One way of seeing that this is true is to observe that $u^{R}_X$ arises from applying single linkage hierarchical clustering to the symmetric dissimilarity $\bar{A}_X$, which is known to output valid ultrametrics. Nevertheless, here we directly verify that $u^{R}_X$ as defined by (61) is indeed an ultrametric on the space $X$. It is clear that $u^{R}_X(x, x') = 0$ if and only if $x = x'$ and that $u^{R}_X(x, x') = u^{R}_X(x', x)$ because the definition is symmetric in $x$ and $x'$. To verify that the strong triangle inequality in (12) holds, let $C^*(x, x')$ and $C^*(x', x'')$ be chains that achieve the minimum in (61) for $u^{R}_X(x, x')$ and $u^{R}_X(x', x'')$, respectively. The maximum cost in the concatenated chain $C(x, x'') = C^*(x, x') \uplus C^*(x', x'')$ equals the larger of the maximum costs of the two individual chains. Thus, while the maximum cost may be smaller on a different chain, the chain $C(x, x'')$ suffices to bound $u^{R}_X(x, x'') \leq \max\big(u^{R}_X(x, x'), u^{R}_X(x', x'')\big)$ as in (12). It is also possible to prove that $\mathcal{H}^{R}$ satisfies axioms (A1)-(A2), as in the following proposition.

Proposition 1 The reciprocal clustering method $\mathcal{H}^{R}$ is valid and admissible. I.e., $u^{R}_X$ defined by (61) is an ultrametric for all networks $N_X = (X, A_X)$ and $\mathcal{H}^{R}$ satisfies axioms (A1)-(A2).

Proof: That $u^{R}_X$ conforms to the definition of an ultrametric was proved in the paragraph preceding this proposition.
To see that the Axiom of Value (A1) is satisfied pick an arbitrary two-node network $\vec{\Delta}_2(\alpha, \beta)$ as defined in Section II and denote by $(\{p, q\}, u^{R}_{p,q}) = \mathcal{H}^{R}(\vec{\Delta}_2(\alpha, \beta))$ the output of applying the reciprocal clustering method to $\vec{\Delta}_2(\alpha, \beta)$. Since every possible chain from $p$ to $q$ must contain $p$ and $q$ as consecutive nodes, applying the definition in (61) yields
$$u^{R}_{p,q}(p, q) = \max\big(A_{p,q}(p, q), A_{p,q}(q, p)\big) = \max(\alpha, \beta). \tag{63}$$
Axiom (A1) is thereby satisfied.

To show fulfillment of axiom (A2), consider two networks $(X, A_X)$ and $(Y, A_Y)$ and a dissimilarity reducing map $\phi : X \to Y$. Let $(X, u^{R}_X) = \mathcal{H}^{R}(X, A_X)$ and $(Y, u^{R}_Y) = \mathcal{H}^{R}(Y, A_Y)$ be the outputs of applying the reciprocal clustering method to networks $(X, A_X)$ and $(Y, A_Y)$. For an arbitrary pair of nodes $x, x' \in X$, denote by $C^*_X(x, x') = [x = x_0, \ldots, x_l = x']$ a chain that achieves the minimum reciprocal cost in (61) so as to write
$$u^{R}_X(x, x') = \max_{i \mid x_i \in C^*_X(x, x')} \bar{A}_X(x_i, x_{i+1}). \tag{64}$$
Consider the transformed chain $C_Y(\phi(x), \phi(x')) = [\phi(x) = \phi(x_0), \ldots, \phi(x_l) = \phi(x')]$ in the space $Y$.

[Figure 9: a forward chain $x, x_1, \ldots, x_{l-1}, x'$ with dissimilarities $A_X(x, x_1), A_X(x_1, x_2), \ldots, A_X(x_{l-1}, x')$ and a possibly different backward chain $x', x'_1, \ldots, x'_{l'-1}, x$ with dissimilarities $A_X(x', x'_1), A_X(x'_1, x'_2), \ldots, A_X(x'_{l'-1}, x)$.]
Fig. 9. Nonreciprocal clustering. Nodes $x$ and $x'$ are co-clustered at resolution $\delta$ if they can be joined in both directions with possibly different (nonreciprocal) chains of maximum dissimilarity not greater than $\delta$ [cf. (67)]. Of all methods abiding by the axioms of value and transformation, nonreciprocal clustering yields the smallest ultrametric between any pair of nodes.
Since the transformation $\phi$ does not increase dissimilarities we have that for all links in this chain $A_Y(\phi(x_i), \phi(x_{i+1})) \leq A_X(x_i, x_{i+1})$ and $A_Y(\phi(x_{i+1}), \phi(x_i)) \leq A_X(x_{i+1}, x_i)$. Combining this observation with (64) we obtain
$$\max_{i \mid \phi(x_i) \in C_Y(\phi(x), \phi(x'))} \bar{A}_Y(\phi(x_i), \phi(x_{i+1})) \leq u^{R}_X(x, x'). \tag{65}$$
Further note that $C_Y(\phi(x), \phi(x'))$ is a particular chain joining $\phi(x)$ and $\phi(x')$ whereas the reciprocal ultrametric is the minimum across all such chains. Therefore,
$$u^{R}_Y(\phi(x), \phi(x')) \leq \max_{i \mid \phi(x_i) \in C_Y(\phi(x), \phi(x'))} \bar{A}_Y(\phi(x_i), \phi(x_{i+1})). \tag{66}$$
Substituting (65) in (66), it follows that $u^{R}_Y(\phi(x), \phi(x')) \leq u^{R}_X(x, x')$. This is the requirement in (24) for dissimilarity reducing transformations in the statement of Axiom (A2).

In reciprocal clustering, nodes $x$ and $x'$ belong to the same cluster at a resolution $\delta$ whenever we can go back and forth from $x$ to $x'$ at a maximum cost $\delta$ through the same chain. In nonreciprocal clustering we relax the restriction about the chain being the same in both directions and cluster nodes $x$ and $x'$ together if there are chains, possibly different, linking $x$ to $x'$ and $x'$ to $x$. To state this definition in terms of ultrametrics consider a given network $N = (X, A_X)$ and recall the definition of the unidirectional minimum chain cost $\tilde{u}^*_X$ in (7). We define the nonreciprocal clustering method $\mathcal{H}^{NR}$ with output $(X, u^{NR}_X) = \mathcal{H}^{NR}(X, A_X)$ as the one for which the ultrametric $u^{NR}_X(x, x')$ between points $x$ and $x'$ is given by the maximum of the unidirectional minimum chain costs $\tilde{u}^*_X(x, x')$ and $\tilde{u}^*_X(x', x)$ in each direction,
$$u^{NR}_X(x, x') := \max\big(\tilde{u}^*_X(x, x'), \tilde{u}^*_X(x', x)\big). \tag{67}$$
An illustration of the definition in (67) is shown in Fig. 9. We consider forward chains $C(x, x')$ going from $x$ to $x'$ and backward chains $C(x', x)$ going from $x'$ to $x$.
For each of these chains we determine the maximum dissimilarity across all the links in the chain. We then search independently for the best forward chain $C(x, x')$ and the best backward chain $C(x', x)$ that minimize the respective maximum dissimilarities across all possible chains. The nonreciprocal ultrametric $u^{NR}_X(x, x')$ between points $x$ and $x'$ is the maximum of these two minimum values.

[Figure 10: an example directed network on nodes $a$, $b$, $c$ together with its dendrograms $R_X$ and $NR_X$ drawn against the resolution axis $\delta$.]
Fig. 10. Reciprocal and nonreciprocal dendrograms. An example network with its corresponding reciprocal (bottom) and nonreciprocal (top) dendrograms is shown. The optimal reciprocal chain linking $a$ and $b$ is $[a, b]$, the optimal chain linking $b$ and $c$ is $[b, c]$, and the optimal chain linking $a$ and $c$ is $[a, b, c]$. The optimal nonreciprocal chains linking $a$ and $b$ are $[a, b]$ and $[b, c, a]$. Of these two the cost of $[b, c, a]$ is larger.

As is the case with reciprocal clustering, we can verify that $u^{NR}_X$ is a properly defined ultrametric and that, as a consequence, the nonreciprocal clustering method $\mathcal{H}^{NR}$ is properly defined. Identity and symmetry are immediate. For the strong triangle inequality consider chains $C^*(x, x')$ and $C^*(x', x'')$ that achieve the minimum costs in $\tilde{u}^*_X(x, x')$ and $\tilde{u}^*_X(x', x'')$ as well as the chains $C^*(x'', x')$ and $C^*(x', x)$ that achieve the minimum costs in $\tilde{u}^*_X(x'', x')$ and $\tilde{u}^*_X(x', x)$. The concatenation of these chains permits concluding that $u^{NR}_X(x, x'') \leq \max\big(u^{NR}_X(x, x'), u^{NR}_X(x', x'')\big)$, which is the strong triangle inequality in (12). The method $\mathcal{H}^{NR}$ also satisfies axioms (A1)-(A2), as the following proposition shows.

Proposition 2 The nonreciprocal clustering method $\mathcal{H}^{NR}$ is valid and admissible. I.e., $u^{NR}_X$ defined by (67) is an ultrametric for all networks $N = (X, A_X)$ and $\mathcal{H}^{NR}$ satisfies axioms (A1)-(A2).

Proof: See Appendix A.
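As an illustration of definitions (60), (61), and (67), the following is a minimal sketch, not the authors' algorithm, that computes both ultrametrics for a small network by a Floyd-Warshall-style min-max recursion. The function names are hypothetical. The dissimilarity matrix is meant to mirror the three-node network of Fig. 10; the entries $A_X(b, c) = 1/2$ and $A_X(a, c) = 4$ are assumptions inferred from the figure labels and are not stated in the text.

```python
def min_chain_cost(A):
    """Directed minimum chain cost: for each ordered pair (i, j), the
    smallest achievable maximum link dissimilarity over chains i -> j."""
    n = len(A)
    u = [row[:] for row in A]
    for k in range(n):          # allow node k as an intermediate hop
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def nonreciprocal_ultrametric(A):
    """Maximum of the two directed minimum chain costs, as in (67)."""
    u = min_chain_cost(A)
    n = len(A)
    return [[max(u[i][j], u[j][i]) for j in range(n)] for i in range(n)]


def reciprocal_ultrametric(A):
    """Minimum chain cost on the max-symmetrized dissimilarity, as in (60)-(61)."""
    n = len(A)
    A_bar = [[max(A[i][j], A[j][i]) for j in range(n)] for i in range(n)]
    return min_chain_cost(A_bar)


# Nodes a, b, c are indexed 0, 1, 2. A[i][j] is the dissimilarity from i to j.
# A[1][2] = 0.5 and A[0][2] = 4 are assumed values (see lead-in).
A = [
    [0, 0.5, 4],
    [2, 0, 0.5],
    [1, 3, 0],
]

u_nr = nonreciprocal_ultrametric(A)
u_r = reciprocal_ultrametric(A)
print("nonreciprocal:", u_nr)
print("reciprocal:   ", u_r)
```

Under these assumed entries every pair merges at resolution 1 in the nonreciprocal output, while the reciprocal output merges $a$ and $b$ at 2 and brings in $c$ at 3.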
We denote by $NR_X$ the dendrogram output by the nonreciprocal method $\mathcal{H}^{NR}$, equivalent to $u^{NR}_X$ by Theorem 1.

The reciprocal and nonreciprocal dendrograms for an example network are shown in Fig. 10. Notice that these dendrograms are different. In the reciprocal dendrogram nodes $a$ and $b$ cluster together at resolution $\delta = 2$ due to their direct connections $A_X(a, b) = 1/2$ and $A_X(b, a) = 2$. Node $c$ joins this cluster at resolution $\delta = 3$ because it links bidirectionally with $b$ through the direct chain $[b, c]$ whose maximum cost is $A_X(c, b) = 3$. The optimal reciprocal chain linking $a$ and $c$ is $[a, b, c]$, whose maximum cost is also $A_X(c, b) = 3$. In the nonreciprocal dendrogram we can link nodes with different chains in each direction. As a consequence, $a$ and $b$ cluster together at resolution $\delta = 1$ because the directed cost of the chain $[a, b]$ is $A_X(a, b) = 1/2$ and the directed cost of the chain $[b, c, a]$ is $A_X(c, a) = 1$. Similar chains demonstrate that $a$ and $c$ as well as $b$ and $c$ also cluster together at resolution $\delta = 1$.

VI. EXTREMAL ULTRAMETRICS

Given that we have constructed two admissible methods satisfying axioms (A1)-(A2), the question arises whether these two constructions are the only possible ones and, if not, whether they are special in some sense. We will see in Section VII that there are constructions other than reciprocal and nonreciprocal clustering that satisfy axioms (A1)-(A2). However, we prove in this section that reciprocal and nonreciprocal clustering are a peculiar pair in that all possible admissible clustering methods are contained between them in a well-defined sense. To explain this sense properly, observe that since reciprocal chains [cf. Fig. 8] are particular cases of nonreciprocal chains [cf. Fig. 9] we must have that for all pairs of nodes $x, x'$
$$u^{NR}_X(x, x') \leq u^{R}_X(x, x'). \tag{68}$$
I.e., nonreciprocal ultrametrics do not exceed reciprocal ultrametrics.
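The step behind (68) can be written out explicitly. The following is a sketch of that derivation using only definitions (7), (60), (61), and (67); the reversed-chain notation is introduced here for illustration.

```latex
% For any chain C(x,x') = [x = x_0, x_1, \ldots, x_l = x'], definition (60) gives
\max_i \bar{A}_X(x_i, x_{i+1})
  \;\geq\; \max_i A_X(x_i, x_{i+1})
  \;\geq\; \tilde{u}^*_X(x, x'),
% and, since the reversed chain [x' = x_l, \ldots, x_0 = x] is a chain from x' to x,
\max_i \bar{A}_X(x_i, x_{i+1})
  \;\geq\; \max_i A_X(x_{i+1}, x_i)
  \;\geq\; \tilde{u}^*_X(x', x).
% Both bounds hold for every chain, so minimizing the left-hand side over C(x,x') yields
u^{R}_X(x, x')
  \;\geq\; \max\big(\tilde{u}^*_X(x, x'),\, \tilde{u}^*_X(x', x)\big)
  \;=\; u^{NR}_X(x, x').
```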
An important characterization is that any method $\mathcal{H}$ satisfying axioms (A1)-(A2) yields ultrametrics that lie between $u^{NR}_X$ and $u^{R}_X$, as we formally state in the following generalization of Theorem 18 in [13].

Theorem 4 Consider an admissible clustering method $\mathcal{H}$ satisfying axioms (A1)-(A2). For an arbitrary given network $N = (X, A_X)$ denote by $(X, u_X) = \mathcal{H}(N)$ the output of $\mathcal{H}$ applied to $N$. Then, for all pairs of nodes $x, x'$
$$u^{NR}_X(x, x') \leq u_X(x, x') \leq u^{R}_X(x, x'), \tag{69}$$
where $u^{NR}_X(x, x')$ and $u^{R}_X(x, x')$ denote the nonreciprocal and reciprocal ultrametrics as defined by (67) and (61), respectively.

Proof of $u^{NR}_X(x, x') \leq u_X(x, x')$: Recall that validity of (A1)-(A2) implies validity of (P1) by Corollary 1. To show the first inequality in (69), consider the nonreciprocal clustering equivalence relation $\sim_{NR_X(\delta)}$ at resolution $\delta$ according to which $x \sim_{NR_X(\delta)} x'$ if and only if $x$ and $x'$ belong to the same nonreciprocal cluster at resolution $\delta$. Notice that this is true if and only if $u^{NR}_X(x, x') \leq \delta$. Further consider the space $Z := X \bmod \sim_{NR_X(\delta)}$ of corresponding equivalence classes and the map $\phi_\delta : X \to Z$ that maps each point of $X$ to its equivalence class. Notice that $x$ and $x'$ are mapped to the same point $z$ if they belong to the same cluster at resolution $\delta$, which allows us to write
$$\phi_\delta(x) = \phi_\delta(x') \iff u^{NR}_X(x, x') \leq \delta. \tag{70}$$
We define the network $N_Z := (Z, A_Z)$ by endowing $Z$ with the dissimilarity $A_Z$ derived from the dissimilarity $A_X$ as
$$A_Z(z, z') := \min_{x \in \phi_\delta^{-1}(z), \ x' \in \phi_\delta^{-1}(z')} A_X(x, x'). \tag{71}$$
The dissimilarity $A_Z(z, z')$ compares all the dissimilarities $A_X(x, x')$ between a member of the equivalence class $z$ and a member of the equivalence class $z'$ and sets $A_Z(z, z')$ to the value corresponding to the least dissimilar pair; see Fig. 11.
Notice that according to construction, the map $\phi_\delta$ is dissimilarity reducing
$$A_X(x, x') \geq A_Z(\phi_\delta(x), \phi_\delta(x')), \tag{72}$$
because we either have $A_Z(\phi_\delta(x), \phi_\delta(x')) = 0$ if $x$ and $x'$ are co-clustered at resolution $\delta$, or $A_X(x, x') \geq \min_{x \in \phi_\delta^{-1}(z), \ x' \in \phi_\delta^{-1}(z')} A_X(x, x') = A_Z(\phi_\delta(x), \phi_\delta(x'))$ if they are mapped to different equivalence classes.

[Figure 11: three shaded equivalence classes $z$, $z'$, $z''$ joined by the directed dissimilarities $A_Z(z, z')$, $A_Z(z', z)$, $A_Z(z', z'')$, $A_Z(z'', z')$, $A_Z(z, z'')$, $A_Z(z'', z)$.]
Fig. 11. Network of equivalence classes for a given resolution. Each shaded subset of nodes represents an equivalence class. The Axiom of Transformation permits relating the clustering of nodes in the original network and the clustering of nodes in the network of equivalence classes.

Consider now an arbitrary method $\mathcal{H}$ satisfying axioms (A1)-(A2) and denote by $(Z, u_Z) = \mathcal{H}(N_Z)$ the outcome of $\mathcal{H}$ when applied to $N_Z$. To apply Property (P1) to this outcome we determine the minimum loop cost of $N_Z$ in the following claim.

Claim 1 The minimum loop cost of the network $N_Z$ is
$$\mathrm{mlc}(N_Z) > \delta. \tag{73}$$

Proof: According to the definition in (71), if $z \neq z'$ it must be that either $A_Z(z, z') > \delta$ or $A_Z(z', z) > \delta$, for otherwise $z$ and $z'$ would be the same equivalence class. Indeed, if both $A_Z(z, z') \leq \delta$ and $A_Z(z', z) \leq \delta$ we can build chains $C(x, x')$ and $C(x', x)$ with maximum cost not exceeding $\delta$ for any $x \in \phi_\delta^{-1}(z)$ and $x' \in \phi_\delta^{-1}(z')$. For the chain $C(x, x')$ denote by $x_o \in \phi_\delta^{-1}(z)$ and $x'_i \in \phi_\delta^{-1}(z')$ the points achieving the minimum in (71) so that $A_Z(z, z') = A_X(x_o, x'_i)$. Since $x$ and $x_o$ are in the same equivalence class there is a chain $C(x, x_o)$ of maximum cost not exceeding $\delta$. Likewise, since $x'_i$ and $x'$ are in the same class there is a chain $C(x'_i, x')$ that joins them at maximum cost not exceeding $\delta$.
Therefore, the concatenated chain
$$C(x, x') = C(x, x_o) \uplus [x_o, x'_i] \uplus C(x'_i, x'), \tag{74}$$
has maximum cost not exceeding $\delta$. The construction of the chain $C(x', x)$ is analogous. However, the existence of these two chains implies that $x$ and $x'$ are clustered together at resolution $\delta$ [cf. (67)], contradicting the assumption that $z$ and $z'$ are different equivalence classes.

To prove that the minimum loop cost of $N_Z$ is $\mathrm{mlc}(Z, A_Z) > \delta$, assume that (73) is not true and denote by $[z, z', \ldots, z^{(l)}, z]$ a loop of cost not exceeding $\delta$. For any $x \in \phi_\delta^{-1}(z)$ and $x' \in \phi_\delta^{-1}(z')$ we can join $x$ to $x'$ using the chain $C(x, x')$ in (74). To join $x'$ and $x$ denote by $x^{(k)}_o$ and $x^{(k+1)}_i$ the points for which $A_Z(z^{(k)}, z^{(k+1)}) = A_X(x^{(k)}_o, x^{(k+1)}_i)$ as in (71). We can then join $x'_o$ and $x^{(l)}_o$ with the concatenated chain
$$C(x'_o, x^{(l)}_o) = \biguplus_{k=1}^{l-1} \big[x^{(k)}_o, x^{(k+1)}_i\big] \uplus C\big(x^{(k+1)}_i, x^{(k+1)}_o\big). \tag{75}$$
The maximum cost in traversing this chain does not exceed $\delta$ because the maximum cost in $C(x^{(k+1)}_i, x^{(k+1)}_o)$ does not exceed $\delta$ since both nodes belong to the same class $z^{(k+1)}$, and because $A_X(x^{(k)}_o, x^{(k+1)}_i) \leq \delta$ by assumption. We can now join $x'$ to $x$ with the concatenated chain
$$C(x', x) = C(x', x'_o) \uplus C(x'_o, x^{(l)}_o) \uplus [x^{(l)}_o, x_i] \uplus C(x_i, x), \tag{76}$$
whose maximum cost does not exceed $\delta$. Using the chains (74) and (76) it follows that $u^{NR}_X(x, x') \leq \delta$, contradicting the assumption that $x$ and $x'$ belong to different equivalence classes. Therefore, the assumption that (73) is false cannot hold. The opposite must be true.

Continuing with the main proof, recall that $(Z, u_Z) = \mathcal{H}(Z, A_Z)$. Since the minimum loop cost of $Z$ satisfies (73) it follows from Property (P1) that for all pairs $z, z'$,
$$u_Z(z, z') > \delta.$$
(77)

Further note that according to (72) and Axiom (A2) we must have $u_X(x, x') \geq u_Z(z, z')$. This fact, combined with (77), allows us to conclude that when $x$ and $x'$ map to different equivalence classes
$$u_X(x, x') \geq u_Z(z, z') > \delta. \tag{78}$$
Notice that according to (70), $x$ and $x'$ mapping to different equivalence classes is equivalent to $u^{NR}_X(x, x') > \delta$. Consequently, we can claim that $u^{NR}_X(x, x') > \delta$ implies $u_X(x, x') > \delta$, or, in set notation, that
$$\{(x, x') : u^{NR}_X(x, x') > \delta\} \subseteq \{(x, x') : u_X(x, x') > \delta\}. \tag{79}$$
Because (79) is true for arbitrary $\delta > 0$ it implies that $u^{NR}_X(x, x') \leq u_X(x, x')$ for all $x, x' \in X$ as in the first inequality in (69).

Proof of $u_X(x, x') \leq u^{R}_X(x, x')$: To prove the second inequality in (69) consider points $x$ and $x'$ with reciprocal ultrametric $u^{R}_X(x, x') = \delta$. Let $C(x, x') = [x = x_0, \ldots, x_l = x']$ be a chain achieving the minimum in (61) so that we can write
$$\delta = u^{R}_X(x, x') = \max_i \ \max\big(A_X(x_i, x_{i+1}), A_X(x_{i+1}, x_i)\big). \tag{80}$$
Turn attention to the symmetric two-node network $\vec{\Delta}_2(\delta, \delta) = (\{p, q\}, A_{p,q})$ with $A_{p,q}(p, q) = A_{p,q}(q, p) = \delta$. Denote the output of clustering method $\mathcal{H}$ applied to network $\vec{\Delta}_2(\delta, \delta)$ as $(\{p, q\}, u_{p,q}) = \mathcal{H}(\vec{\Delta}_2(\delta, \delta))$. Notice that according to Axiom (A1) we have $u_{p,q}(p, q) = \max(\delta, \delta) = \delta$.

Focus now on transformations $\phi_i : \{p, q\} \to X$ given by $\phi_i(p) = x_i$, $\phi_i(q) = x_{i+1}$ so as to map $p$ and $q$ to subsequent points in the chain $C(x, x')$ used in (80). Since it follows from (80) that $A_X(x_i, x_{i+1}) \leq \delta$ and $A_X(x_{i+1}, x_i) \leq \delta$ for all $i$, it is just a simple matter of notation to observe that
$$A_X(\phi_i(p), \phi_i(q)) \leq A_{p,q}(p, q) = \delta, \qquad A_X(\phi_i(q), \phi_i(p)) \leq A_{p,q}(q, p) = \delta.$$
(81)

Since according to (81) transformations $\phi_i$ are dissimilarity-reducing, it follows from Axiom (A2) that
$$u_X(\phi_i(p), \phi_i(q)) \le u_{p,q}(p, q) = \delta. \tag{82}$$
Substituting the equivalences $\phi_i(p) = x_i$, $\phi_i(q) = x_{i+1}$ and recalling that (82) is true for all $i$ we can equivalently write
$$u_X(x_i, x_{i+1}) \le \delta, \quad \text{for all } i. \tag{83}$$
To complete the proof we use the fact that since $u_X$ is an ultrametric and $C(x, x') = [x = x_0, \ldots, x_l = x']$ is a chain joining $x$ and $x'$ the strong triangle inequality dictates that
$$u_X(x, x') \le \max_i u_X(x_i, x_{i+1}) \le \delta, \tag{84}$$
where we used (83) in the second inequality. The proof of the second inequality in (69) follows by substituting $\delta = u^{\mathrm{R}}_X(x, x')$ [cf. (80)] into (84).

According to Theorem 4, nonreciprocal clustering applied to a given network $N = (X, A_X)$ yields a uniformly minimal ultrametric among those output by all clustering methods satisfying axioms (A1)-(A2). Reciprocal clustering yields a uniformly maximal ultrametric. Any other clustering method abiding by (A1)-(A2) yields an ultrametric such that the value $u_X(x, x')$ for any two points $x, x' \in X$ lies between the values $u^{\mathrm{NR}}_X(x, x')$ and $u^{\mathrm{R}}_X(x, x')$ assigned by nonreciprocal and reciprocal clustering. In terms of dendrograms, (69) implies that among all possible clustering methods, the smallest possible resolution at which nodes are clustered together is the one corresponding to nonreciprocal clustering. The highest possible resolution is the one that corresponds to reciprocal clustering.

A. Hierarchical clustering on symmetric networks

Restrict attention to the subspace $\mathcal{M} \subset \mathcal{N}$ of symmetric networks, that is $N = (X, A_X) \in \mathcal{M}$ if and only if $A_X(x, x') = A_X(x', x)$ for all $x, x' \in X$.
When restricted to the space $\mathcal{M}$ reciprocal and nonreciprocal clustering are equivalent methods because, for any pair of points, a minimizing nonreciprocal chain can always be chosen to be reciprocal: there may be multiple minimizing nonreciprocal chains, but at least one of them is reciprocal. To see this formally first fix $x, x' \in X$ and observe that in symmetric networks the symmetrization in (60) is unnecessary because $\bar{A}_X(x_i, x_{i+1}) = A_X(x_i, x_{i+1}) = A_X(x_{i+1}, x_i)$ and the definition of reciprocal clustering in (61) reduces to
$$u^{\mathrm{R}}_X(x, x') = \min_{C(x, x')} \max_{i \mid x_i \in C(x, x')} A_X(x_i, x_{i+1}) = \min_{C(x', x)} \max_{i \mid x_i \in C(x', x)} A_X(x_i, x_{i+1}). \tag{85}$$
Further note that the costs of any given chain $C(x, x') = [x = x_0, x_1, \ldots, x_{l-1}, x_l = x']$ and its reciprocal $C(x', x) = [x' = x_l, x_{l-1}, \ldots, x_1, x_0 = x]$ are the same. It follows that the directed minimum chain costs $\tilde{u}^*_X(x, x') = \tilde{u}^*_X(x', x)$ are equal and, according to (67), equal to the nonreciprocal ultrametric
$$u^{\mathrm{NR}}_X(x, x') = \tilde{u}^*_X(x, x') = \tilde{u}^*_X(x', x) = u^{\mathrm{R}}_X(x, x'). \tag{86}$$
To write the last equality in (86) we used the definitions of $\tilde{u}^*_X(x, x')$ and $\tilde{u}^*_X(x', x)$ in (7), which are correspondingly equivalent to the first and second equality in (85).

By further comparison of the ultrametric definition of single linkage in (25) with (86), the equivalence of reciprocal, nonreciprocal, and single linkage clustering in symmetric networks follows:
$$u^{\mathrm{NR}}_X(x, x') = u^{\mathrm{SL}}_X(x, x') = u^{\mathrm{R}}_X(x, x'). \tag{87}$$
The equivalence in (86) along with Theorem 4 demonstrates that when considering the application of hierarchical clustering methods $\mathcal{H} : \mathcal{M} \to \mathcal{U}$ to symmetric networks, there exists a unique method satisfying (A1)-(A2). The equivalence in (87) shows that this method is single linkage.
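The equivalence in (87) can be checked numerically on a small symmetric network. The following is a minimal sketch under stated assumptions: the three-node matrix `A_sym` and all helper names are introduced here for illustration, and the min-max chain costs are computed with a Floyd-Warshall-style recursion, which is a standard technique and not an algorithm taken from the paper.

```python
def minimax_closure(A):
    """Directed minimum chain costs: for every ordered pair, the smallest
    achievable value of the largest hop dissimilarity along a chain."""
    n = len(A)
    D = [row[:] for row in A]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                D[i][j] = min(D[i][j], max(D[i][k], D[k][j]))
    return D


def reciprocal(A):
    """Symmetrize first (max of both directions), then search for chains."""
    n = len(A)
    sym = [[max(A[i][j], A[j][i]) for j in range(n)] for i in range(n)]
    return minimax_closure(sym)


def nonreciprocal(A):
    """Search for directed chains first, then take the max of both directions."""
    n = len(A)
    D = minimax_closure(A)
    return [[max(D[i][j], D[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical symmetric network on three nodes (values chosen for illustration).
A_sym = [[0, 2, 7],
         [2, 0, 4],
         [7, 4, 0]]

u_r = reciprocal(A_sym)
u_nr = nonreciprocal(A_sym)
u_sl = minimax_closure(A_sym)  # single linkage on a symmetric network

print(u_r)
print(u_r == u_nr == u_sl)
```

On this input the three outputs coincide, as (87) predicts for any symmetric network; the direct dissimilarity 7 is replaced by the cost 4 of the two-hop chain.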
Before stating this result formally let us define the symmetric version of the Axiom of Value:

(B1) Symmetric Axiom of Value. Consider a symmetric two-node network $\vec{\Delta}_2(\alpha, \alpha) = (\{p, q\}, A_{p,q})$ with $A_{p,q}(p, q) = A_{p,q}(q, p) = \alpha$. The ultrametric output $(\{p, q\}, u_{p,q}) = \mathcal{H}(\vec{\Delta}_2(\alpha, \alpha))$ produced by $\mathcal{H}$ satisfies
$$u_{p,q}(p, q) = \alpha. \tag{88}$$

Since there is only one dissimilarity in a symmetric network with two nodes, (B1) states that they cluster together at the resolution that connects them to each other. We can now invoke Theorem 4 and (87) to prove that single linkage is the unique hierarchical clustering method in symmetric networks that is admissible with respect to (B1) and (A2).

Corollary 2 Let $\mathcal{H} : \mathcal{M} \to \mathcal{U}$ be a hierarchical clustering method for symmetric networks $N = (X, A_X) \in \mathcal{M}$, that is $A_X(x, x') = A_X(x', x)$ for all $x, x' \in X$, and $\mathcal{H}^{\mathrm{SL}}$ be the single linkage method with output ultrametrics as defined in (25). If $\mathcal{H}$ satisfies axioms (B1) and (A2) then $\mathcal{H} \equiv \mathcal{H}^{\mathrm{SL}}$.

Proof: When restricted to symmetric networks (B1) and (A1) are equivalent statements. Thus, $\mathcal{H}$ satisfies the hypotheses of Theorem 4 and as a consequence (69) is true for any pair of points $x, x'$ of any network $N \in \mathcal{M}$. But by (87) nonreciprocal, single linkage, and reciprocal ultrametrics coincide. Thus, we can reduce (69) to
$$u^{\mathrm{SL}}_X(x, x') \le u_X(x, x') \le u^{\mathrm{SL}}_X(x, x'). \tag{89}$$
It then must be $u^{\mathrm{SL}}_X(x, x') = u_X(x, x')$ for any pair of points $x, x'$ of any network $N \in \mathcal{M}$. This means $\mathcal{H} \equiv \mathcal{H}^{\mathrm{SL}}$.

The uniqueness result claimed by Corollary 2 strengthens the uniqueness result in [13, Theorem 18]. To explain the differences consider the symmetric version of the Property of Influence. In a symmetric network there is always a loop of minimum cost of the form $[x, x', x]$ for some pair of points $x, x'$.
Indeed, say that $C^*(x^*, x^*)$ is one of the loops achieving the minimum cost in (9) and let $A_X(x, x') = \mathrm{mlc}(X, A_X)$ be the maximum dissimilarity in this loop. Then, the cost of the loop $[x, x', x]$ is $A_X(x, x') = A_X(x', x) = \mathrm{mlc}(X, A_X)$, which means that either the loop $C^*(x^*, x^*)$ was already of the form $[x, x', x]$ or that the cost of the loop $[x, x', x]$ is the same as that of $C^*(x^*, x^*)$. In any event, there is a loop of minimum cost of the form $[x, x', x]$, which implies that in symmetric networks we must have
$$\mathrm{mlc}(X, A_X) = \min_{x \neq x'} A_X(x, x') = \mathrm{sep}(X, A_X), \tag{90}$$
where we recalled the definition of the separation of a network stated in (10) to write the second equality. With this observation we can now introduce the symmetric version of the Property of Influence (P1):

(Q1) Symmetric Property of Influence. For any symmetric network $N_X = (X, A_X)$ the output $(X, u_X) = \mathcal{H}(X, A_X)$ corresponding to the application of hierarchical clustering method $\mathcal{H}$ is such that the ultrametric $u_X(x, x')$ between any two distinct points $x$ and $x'$ cannot be smaller than the separation of the network [cf. (90)],
$$u_X(x, x') \ge \mathrm{sep}(X, A_X). \tag{91}$$

In [13] admissibility is defined with respect to (B1), (A2), and (Q1), which correspond to conditions (I), (II), and (III) of [13, Theorem 18]. Corollary 2 shows that Property (Q1) is redundant when given Axioms (B1) and (A2); correspondingly, Condition (III) of [13, Theorem 18] is redundant when given conditions (I) and (II) of [13, Theorem 18]. Corollary 2 also shows that single linkage is the unique admissible method for all symmetric, not necessarily metric, networks.

VII. INTERMEDIATE CLUSTERING METHODS

Reciprocal and nonreciprocal clustering bound the range of clustering methods satisfying axioms (A1)-(A2) in the sense specified by Theorem 4. Since methods $\mathcal{H}^{\mathrm{R}}$ and $\mathcal{H}^{\mathrm{NR}}$ are in general different (e.g.
recall the example in Fig. 10) a question of great interest is whether one can identify methods which are intermediate to $\mathcal{H}^{\mathrm{R}}$ and $\mathcal{H}^{\mathrm{NR}}$. In this section we study three types of intermediate methods. In Section VII-A we introduce grafting methods, which are built by exchanging branches between dendrograms generated by different admissible methods. In Section VII-B, we compute a form of convex combination of dendrograms generated by admissible methods to obtain new admissible methods. In Section VII-C, we present the semi-reciprocal family, which requires part of the influence to be reciprocal and allows the rest to propagate through loops. These latter methods arise as natural intermediate ultrametrics in an algorithmic sense, as further discussed in Section VIII.

A. Grafting and related constructions

A family of admissible methods can be constructed by grafting branches of the nonreciprocal dendrogram into corresponding branches of the reciprocal dendrogram; see Fig. 12. To be precise, consider a given positive constant $\beta > 0$. For any given network $N = (X, A_X)$ compute the reciprocal and nonreciprocal dendrograms and cut all branches of the reciprocal dendrogram at resolution $\beta$. For each of these branches define the corresponding branch in the nonreciprocal tree as the one whose leaves are the same. Replacing the previously cut branches of the reciprocal tree by the corresponding branches of the nonreciprocal tree yields the $\mathcal{H}^{\mathrm{R/NR}}(\beta)$ method.

Grafting is equivalent to providing the following piecewise definition of the output ultrametric; for $x, x' \in X$ let
$$u^{\mathrm{R/NR}}_X(x, x'; \beta) := \begin{cases} u^{\mathrm{NR}}_X(x, x'), & \text{if } u^{\mathrm{R}}_X(x, x') \le \beta, \\ u^{\mathrm{R}}_X(x, x'), & \text{if } u^{\mathrm{R}}_X(x, x') > \beta. \end{cases} \tag{92}$$
For pairs $x, x'$ having large reciprocal ultrametric $u^{\mathrm{R}}_X(x, x') > \beta$ we keep the reciprocal ultrametric value $u^{\mathrm{R/NR}}_X(x, x'; \beta) = u^{\mathrm{R}}_X(x, x')$.
For pairs $x, x'$ with small reciprocal ultrametric $u^{\mathrm{R}}_X(x, x') \le \beta$ we replace the reciprocal by the nonreciprocal ultrametric and make $u^{\mathrm{R/NR}}_X(x, x'; \beta) = u^{\mathrm{NR}}_X(x, x')$.

To show that (92) is an admissible method we need to show that it defines an ultrametric on the space $X$ and that the method satisfies axioms (A1) and (A2). This is asserted in the following proposition.

Proposition 3 The hierarchical clustering method $\mathcal{H}^{\mathrm{R/NR}}(\beta)$ is valid and admissible. I.e., $u^{\mathrm{R/NR}}_X(\beta)$ defined by (92) is an ultrametric for all networks $N = (X, A_X)$ and $\mathcal{H}^{\mathrm{R/NR}}(\beta)$ satisfies axioms (A1)-(A2).

Proof: See Appendix B.

Since $u^{\mathrm{R/NR}}_X(x, x'; \beta)$ coincides with either $u^{\mathrm{NR}}_X(x, x')$ or $u^{\mathrm{R}}_X(x, x')$ for all $x, x' \in X$, it satisfies Theorem 4, as should be the case for the output ultrametric of any admissible method.

An example construction of $u^{\mathrm{R/NR}}_X(x, x'; \beta)$ for a particular network and $\beta = 4$ is illustrated in Fig. 12. The nonreciprocal ultrametric (67) is $u^{\mathrm{NR}}_X(x, x') = 1$ for all $x \neq x'$ due to the outermost clockwise loop visiting all nodes at cost 1. This is represented in the nonreciprocal $\mathcal{H}^{\mathrm{NR}}$ dendrogram in Fig. 12. For the reciprocal ultrametric (61) nodes $c$ and $d$ merge at resolution $u^{\mathrm{R}}_X(c, d) = 2$, nodes $a$ and $b$ at resolution $u^{\mathrm{R}}_X(a, b) = 3$, and they all join together at resolution $\delta = 5$. This is represented in the reciprocal $\mathcal{H}^{\mathrm{R}}$ dendrogram in Fig. 12. To determine $u^{\mathrm{R/NR}}_X(x, x'; 4)$ use the piecewise definition in (92). Since the reciprocal ultrametrics $u^{\mathrm{R}}_X(c, d) = 2 \le 4$ and $u^{\mathrm{R}}_X(a, b) = 3 \le 4$ are smaller than $\beta = 4$ we set the grafted outcomes to the nonreciprocal ultrametric values to obtain $u^{\mathrm{R/NR}}_X(c, d) = u^{\mathrm{NR}}_X(c, d) = 1$ and $u^{\mathrm{R/NR}}_X(a, b) = u^{\mathrm{NR}}_X(a, b) = 1$. Since the remaining ultrametric distances are $u^{\mathrm{R}}_X(x, x') = 5$, which exceed $\beta = 4$, we set $u^{\mathrm{R/NR}}_X(x, x'; 4) = u^{\mathrm{R}}_X(x, x') = 5$. This yields the $\mathcal{H}^{\mathrm{R/NR}}$ dendrogram in Fig.
12 which we interpret as cutting branches from $\mathcal{H}^{\mathrm{R}}$ that we replace by corresponding branches of $\mathcal{H}^{\mathrm{NR}}$.

In the method $\mathcal{H}^{\mathrm{R/NR}}(\beta)$ we use the reciprocal ultrametric as a decision variable in the piecewise definition (92) and use nonreciprocal ultrametrics for nodes having small reciprocal ultrametrics. There are three other possible grafting combinations $\mathcal{H}^{\mathrm{R/R}}(\beta)$, $\mathcal{H}^{\mathrm{NR/R}}(\beta)$ and $\mathcal{H}^{\mathrm{NR/NR}}(\beta)$ depending on which ultrametric is used as decision variable to swap branches and which of the two ultrametrics is used for nodes having small values of the decision ultrametric. In the method $\mathcal{H}^{\mathrm{R/R}}(\beta)$, we use reciprocal ultrametrics as decision variables and as the choice for small values of reciprocal ultrametrics,
$$u^{\mathrm{R/R}}_X(x, x'; \beta) := \begin{cases} u^{\mathrm{R}}_X(x, x'), & \text{if } u^{\mathrm{R}}_X(x, x') \le \beta, \\ u^{\mathrm{NR}}_X(x, x'), & \text{if } u^{\mathrm{R}}_X(x, x') > \beta. \end{cases} \tag{93}$$
In the same manner in which (92) represents cutting the reciprocal dendrogram at a resolution and grafting branches of the nonreciprocal dendrogram for resolutions lower than the cut, the definition in (93) entails cutting the reciprocal dendrogram at a given resolution and grafting branches of the nonreciprocal tree for resolutions higher than the cut.

The method $\mathcal{H}^{\mathrm{R/R}}(\beta)$ as defined in (93) is not valid, however, because for some networks $N = (X, A_X)$ the function $u^{\mathrm{R/R}}_X(\beta)$ is not an ultrametric as it violates the strong triangle inequality in (12). As a counterexample consider again the network in Fig. 12. Applying the definition in (93) we make $u^{\mathrm{R/R}}_X(a, b; 4) = u^{\mathrm{R}}_X(a, b) = 3$ because $u^{\mathrm{R}}_X(a, b) = 3 \le 4$, and we make $u^{\mathrm{R/R}}_X(a, c; 4) = u^{\mathrm{NR}}_X(a, c) = 1$ and $u^{\mathrm{R/R}}_X(c, b; 4) = u^{\mathrm{NR}}_X(c, b) = 1$ because both $u^{\mathrm{R}}_X(a, c) = 5 > 4$ and $u^{\mathrm{R}}_X(c, b) = 5 > 4$.
However, this implies that $u^{\mathrm{R/R}}_X(a, b; 4) > \max\big(u^{\mathrm{R/R}}_X(a, c; 4), u^{\mathrm{R/R}}_X(c, b; 4)\big)$, violating the strong triangle inequality (12) and proving that the definition in (93) is not a valid output of a hierarchical clustering method.

In $\mathcal{H}^{\mathrm{NR/NR}}(\beta)$ we use nonreciprocal ultrametrics as decision variables and as the choice for small values of nonreciprocal ultrametrics. In $\mathcal{H}^{\mathrm{NR/R}}(\beta)$ nonreciprocal ultrametrics are used as decision variables and reciprocal ultrametrics are used for small values of nonreciprocal ultrametrics. Both of these methods are invalid as they can be seen to also violate the strong triangle inequality for some networks.

[Fig. 12. Dendrogram grafting. Reciprocal ($\mathcal{H}^{\mathrm{R}}$) and nonreciprocal ($\mathcal{H}^{\mathrm{NR}}$) dendrograms for the given network on nodes $a, b, c, d$ are shown; edges not drawn have dissimilarities greater than 5. Grafting according to (92) with $\beta = 4$ is performed to construct the dendrogram corresponding to the method $\mathcal{H}^{\mathrm{R/NR}}(4)$. Branches of the reciprocal dendrogram are cut at resolution $\beta = 4$ and replaced by corresponding branches of the nonreciprocal dendrogram.]

A second valid grafting alternative can be obtained as a modification of $\mathcal{H}^{\mathrm{R/R}}(\beta)$ in which reciprocal ultrametrics are kept for pairs having small reciprocal ultrametrics, nonreciprocal ultrametrics are used for pairs having large reciprocal ultrametrics, but all nonreciprocal ultrametrics smaller than $\beta$ are saturated to this value. Denoting the method by $\mathcal{H}^{\mathrm{R/R_{max}}}(\beta)$ the output ultrametrics are thereby given as
$$u^{\mathrm{R/R_{max}}}_X(x, x'; \beta) := \begin{cases} u^{\mathrm{R}}_X(x, x'), & \text{if } u^{\mathrm{R}}_X(x, x') \le \beta, \\ \max\big(\beta, u^{\mathrm{NR}}_X(x, x')\big), & \text{if } u^{\mathrm{R}}_X(x, x') > \beta. \end{cases} \tag{94}$$
This alternative definition outputs a valid ultrametric and the method $\mathcal{H}^{\mathrm{R/R_{max}}}(\beta)$ satisfies axioms (A1)-(A2) as we claim in the following proposition.
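Before the formal statement, the three piecewise definitions (92), (93), and (94) can be compared numerically. The sketch below is illustrative only: it hard-codes the reciprocal and nonreciprocal ultrametric values quoted in the text for the four-node example ($u^{\mathrm{NR}} = 1$ for all distinct pairs; $u^{\mathrm{R}}(c,d) = 2$, $u^{\mathrm{R}}(a,b) = 3$, and 5 otherwise), and the function names `graft` and `satisfies_strong_triangle` are assumptions introduced here.

```python
# Nodes are ordered a, b, c, d. Values are those quoted in the text.
U_NR = [[0, 1, 1, 1],
        [1, 0, 1, 1],
        [1, 1, 0, 1],
        [1, 1, 1, 0]]
U_R = [[0, 3, 5, 5],
       [3, 0, 5, 5],
       [5, 5, 0, 2],
       [5, 5, 2, 0]]


def graft(u_r, u_nr, beta, small, large):
    """Piecewise definition with the reciprocal ultrametric as decision variable.
    `small` is used when u_r <= beta and `large` when u_r > beta."""
    n = len(u_r)
    return [[small(u_r[i][j], u_nr[i][j]) if u_r[i][j] <= beta
             else large(u_r[i][j], u_nr[i][j])
             for j in range(n)] for i in range(n)]


def satisfies_strong_triangle(u):
    """Check u(i, j) <= max(u(i, k), u(k, j)) for all triples."""
    n = len(u)
    return all(u[i][j] <= max(u[i][k], u[k][j])
               for i in range(n) for j in range(n) for k in range(n))


beta = 4
# (92): nonreciprocal below the cut, reciprocal above it.
u_92 = graft(U_R, U_NR, beta, lambda r, nr: nr, lambda r, nr: r)
# (93): reciprocal below the cut, nonreciprocal above it.
u_93 = graft(U_R, U_NR, beta, lambda r, nr: r, lambda r, nr: nr)
# (94): as (93), but values above the cut are saturated to at least beta.
u_94 = graft(U_R, U_NR, beta, lambda r, nr: r, lambda r, nr: max(beta, nr))

for name, u in [("(92)", u_92), ("(93)", u_93), ("(94)", u_94)]:
    print(name, u, satisfies_strong_triangle(u))
```

For these values only (93) fails the strong triangle inequality, since it assigns 3 to the pair $(a, b)$ while both $(a, c)$ and $(c, b)$ receive 1; saturating at $\beta$ as in (94) removes the violation.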
Proposition 4 The hierarchical clustering method $\mathcal{H}^{\mathrm{R/R_{max}}}(\beta)$ is valid and admissible. I.e., $u^{\mathrm{R/R_{max}}}_X(\beta)$ defined by (94) is an ultrametric for all networks $N = (X, A_X)$ and $\mathcal{H}^{\mathrm{R/R_{max}}}(\beta)$ satisfies axioms (A1)-(A2).

Proof: See Appendix B.

Remark 1 Intuitively, the grafting combination $\mathcal{H}^{\mathrm{R/NR}}(\beta)$ allows nonreciprocal propagation of influence for resolutions smaller than $\beta$ while requiring reciprocal propagation for higher resolutions. This is of interest if we want tight clusters of small dissimilarity to be formed through loops of influence while looser clusters of higher dissimilarity are required to form through links of bidirectional influence. Conversely, the clustering method $\mathcal{H}^{\mathrm{R/R_{max}}}(\beta)$ requires reciprocal influence within tight clusters of resolution smaller than $\beta$ but allows nonreciprocal influence in clusters of higher resolutions. This latter behavior is desirable in, e.g., trust propagation in social interactions, where we want tight clusters to be formed through links of mutual trust but allow looser clusters to be formed through unidirectional trust loops.

B. Convex combinations

Another completely different family of intermediate admissible methods can be constructed from the result of performing a convex combination of methods known to satisfy axioms (A1) and (A2). Indeed, consider two admissible clustering methods $\mathcal{H}^1$ and $\mathcal{H}^2$ and a given parameter $0 \le \theta \le 1$. For an arbitrary network $N = (X, A_X)$ denote by $(X, u^1_X) = \mathcal{H}^1(N)$ and $(X, u^2_X) = \mathcal{H}^2(N)$ the respective outcomes of methods $\mathcal{H}^1$ and $\mathcal{H}^2$. Construct then the dissimilarity function $A^{12}_X(\theta)$ as the convex combination of ultrametrics $u^1_X$ and $u^2_X$: for all $x, x' \in X$
$$A^{12}_X(x, x'; \theta) := \theta\, u^1_X(x, x') + (1 - \theta)\, u^2_X(x, x'). \tag{95}$$
Although it can be shown that $A^{12}_X(\theta)$ is a well-defined dissimilarity function, it is not an ultrametric in general because it may violate the strong triangle inequality.
Thus, we can recover the ultrametric structure by applying any admissible clustering method $\mathcal{H}$ to the network $N^{12}_\theta = (X, A^{12}_X)$ to obtain $(X, u_X) = \mathcal{H}(N^{12}_\theta)$. Notice however that the network $N^{12}_\theta$ is symmetric because the ultrametrics $u^1_X$ and $u^2_X$ are symmetric by definition. Also recall that, according to Corollary 2, single linkage is the unique admissible clustering method for symmetric networks. Thus, we define the convex combination method $\mathcal{H}^{12}_\theta : \mathcal{N} \to \mathcal{U}$ as the one where the output $(X, u^{12}_X(\theta)) = \mathcal{H}^{12}_\theta(N)$ corresponding to network $N = (X, A_X)$ is given by
$$u^{12}_X(x, x'; \theta) := \min_{C(x, x')} \max_{i \mid x_i \in C(x, x')} A^{12}_X(x_i, x_{i+1}; \theta), \tag{96}$$
for all $x, x' \in X$ and $A^{12}_X$ as given in (95). The operation in (96) is equivalent to the definition of single linkage applied to the symmetric network $N^{12}_\theta$. We show that (96) defines a valid ultrametric and that $\mathcal{H}^{12}_\theta$ fulfills axioms (A1) and (A2) as stated in the following proposition.

Proposition 5 Given two admissible hierarchical clustering methods $\mathcal{H}^1$ and $\mathcal{H}^2$, the convex combination method $\mathcal{H}^{12}_\theta$ is valid and admissible. I.e., $u^{12}_X(\theta)$ defined by (96) is an ultrametric for all networks $N = (X, A_X)$ and $\mathcal{H}^{12}_\theta$ satisfies axioms (A1)-(A2).

Proof: See Appendix B.

The construction in (96) can be generalized to produce intermediate clustering methods generated by convex combinations of any number (i.e., not necessarily two) of admissible methods (such as reciprocal, nonreciprocal, members of the grafting family of Section VII-A, members of the semi-reciprocal family to be introduced in Section VII-C, etc.). These convex combinations can be seen to satisfy axioms (A1) and (A2) through recursive application of Proposition 5.
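The two steps in (95) and (96) can be illustrated with a short numerical sketch. The ultrametrics `u1` and `u2` below are hypothetical four-point examples chosen for illustration (they are not taken from the paper), and single linkage is computed with a Floyd-Warshall-style min-max recursion, which is an assumption about implementation rather than the paper's algorithm.

```python
def single_linkage(A):
    """Min over chains of the max hop dissimilarity, for a symmetric matrix A."""
    n = len(A)
    D = [row[:] for row in A]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                D[i][j] = min(D[i][j], max(D[i][k], D[k][j]))
    return D


def convex_combination(u1, u2, theta):
    """Return (A12, u12): the convex combination (95) and its single
    linkage ultrametric (96)."""
    n = len(u1)
    A12 = [[theta * u1[i][j] + (1 - theta) * u2[i][j] for j in range(n)]
           for i in range(n)]
    return A12, single_linkage(A12)


# Two hypothetical ultrametrics on the same four points.
u1 = [[0, 2, 1, 2],
      [2, 0, 2, 1],
      [1, 2, 0, 2],
      [2, 1, 2, 0]]
u2 = [[0, 3, 5, 5],
      [3, 0, 5, 5],
      [5, 5, 0, 2],
      [5, 5, 2, 0]]

A12, u12 = convex_combination(u1, u2, theta=0.5)
print(A12)  # not an ultrametric: A12[0][3] exceeds max(A12[0][2], A12[2][3])
print(u12)  # single linkage restores the strong triangle inequality
```

In this example the pair $(0, 3)$ has combined dissimilarity 3.5 while the chain through node 2 costs only 3, so (95) alone violates the strong triangle inequality and (96) lowers the value to 3.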
Remark 2 Since (96) is equivalent to single linkage applied to the symmetric network $N^{12}_\theta$, it follows [11], [13] that the ultrametric $u^{12}_X(\theta)$ in (96) is the largest ultrametric uniformly bounded by $A^{12}_X(\theta)$, i.e., the largest ultrametric for which $u^{12}_X(x, x'; \theta) \le A^{12}_X(x, x'; \theta)$ for all pairs $x, x'$. We can then think of (96) as an operation ensuring a valid ultrametric definition while deviating as little as possible from $A^{12}_X(\theta)$, thus retaining as much information as possible in the convex combination of $u^1_X$ and $u^2_X$.

C. Semi-reciprocal ultrametrics

In reciprocal clustering we require influence to propagate through bidirectional chains; see Fig. 8. We could reinterpret bidirectional propagation as allowing loops of node length two in both directions. E.g., the bidirectional chain between $x$ and $x_1$ in Fig. 8 can be interpreted as a loop between $x$ and $x_1$ composed of two chains $[x, x_1]$ and $[x_1, x]$ of node length two. Semi-reciprocal clustering is a generalization of this concept where loops consisting of at most $t$ nodes in each direction are allowed. Given $t \in \mathbb{N}$ such that $t \ge 2$, we use the notation $C_t(x, x')$ to denote any chain $[x = x_0, x_1, \ldots, x_l = x']$ joining $x$ to $x'$ where $l \le t - 1$. That is, $C_t(x, x')$ is a chain starting at $x$ and finishing at $x'$ with at most $t$ nodes, where $x$ and $x'$ need not be different nodes. Recall that the notation $C(x, x')$ represents a chain linking $x$ with $x'$ where no maximum is imposed on the number of nodes in the chain. Given an arbitrary network $N = (X, A_X)$, define as $A^{\mathrm{SR}(t)}_X(x, x')$ the minimum cost incurred when traveling from node $x$ to node $x'$ using a chain of at most $t$ nodes. I.e.,
$$A^{\mathrm{SR}(t)}_X(x, x') := \min_{C_t(x, x')} \max_{i \mid x_i \in C_t(x, x')} A_X(x_i, x_{i+1}).$$
(97)

We define the family of semi-reciprocal clustering methods $\mathcal{H}^{\mathrm{SR}(t)}$ with output $(X, u^{\mathrm{SR}(t)}_X) = \mathcal{H}^{\mathrm{SR}(t)}(X, A_X)$ as the one for which the ultrametric $u^{\mathrm{SR}(t)}_X(x, x')$ between points $x$ and $x'$ is
$$u^{\mathrm{SR}(t)}_X(x, x') := \min_{C(x, x')} \max_{i \mid x_i \in C(x, x')} \bar{A}^{\mathrm{SR}(t)}_X(x_i, x_{i+1}), \tag{98}$$
where the function $\bar{A}^{\mathrm{SR}(t)}_X(x_i, x_{i+1})$ is defined as
$$\bar{A}^{\mathrm{SR}(t)}_X(x_i, x_{i+1}) := \max\big(A^{\mathrm{SR}(t)}_X(x_i, x_{i+1}), A^{\mathrm{SR}(t)}_X(x_{i+1}, x_i)\big). \tag{99}$$
The chain $C(x, x')$ of unconstrained length in (98) is called the main chain, represented by $[x = x_0, x_1, \ldots, x_{l-1}, x_l = x']$ in Fig. 13. Between consecutive nodes $x_i$ and $x_{i+1}$ of the main chain, we build loops consisting of secondary chains in each direction, represented in Fig. 13 by $[x_i, y_{i1}, \ldots, y_{ik_i}, x_{i+1}]$ and $[x_{i+1}, y'_{i1}, \ldots, y'_{ik'_i}, x_i]$ for all $i$. For the computation of $u^{\mathrm{SR}(t)}_X(x, x')$, the maximum allowed length of secondary chains is equal to $t$ nodes, i.e., $k_i, k'_i \le t - 2$ for all $i$. In particular, for $t = 2$ we recover the reciprocal chain depicted in Fig. 8.

We can reinterpret (98) as the application of reciprocal clustering [cf. (61)] to a network with dissimilarities $A^{\mathrm{SR}(t)}_X$ as in (97), i.e., a network with dissimilarities given by the optimal choice of secondary chains. Semi-reciprocal clustering methods are valid and satisfy axioms (A1)-(A2) as shown in the following proposition.

Proposition 6 The semi-reciprocal clustering method $\mathcal{H}^{\mathrm{SR}(t)}$ is valid and admissible for all integers $t \ge 2$. I.e., $u^{\mathrm{SR}(t)}_X$ defined by (98) is an ultrametric for all networks $N = (X, A_X)$ and $\mathcal{H}^{\mathrm{SR}(t)}$ satisfies axioms (A1)-(A2).

Proof: See Appendix B.

The semi-reciprocal family is a countable family of clustering methods parameterized by integer $t$ representing the allowed maximum node length of secondary chains.
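The definitions (97)-(99) translate directly into a three-stage computation: bounded-length secondary chain costs, max-symmetrization, and an unconstrained search over main chains. The sketch below follows that reading. The four-node matrix `A` is a hypothetical network constructed here so that its reciprocal and nonreciprocal ultrametrics match the values quoted for the Fig. 12 example; its exact edge layout is an assumption, and the Floyd-Warshall-style recursion used for the main chains is a standard technique rather than the paper's algorithm.

```python
def bounded_chain_costs(A, t):
    """(97): min over chains with at most t nodes (t - 1 hops) of the max hop
    dissimilarity. Assumes A has a zero diagonal and t >= 2."""
    n = len(A)
    D = [row[:] for row in A]          # chains with at most 2 nodes
    for _ in range(t - 2):             # each pass allows one more hop
        D = [[min(max(D[i][k], A[k][j]) for k in range(n))
              for j in range(n)] for i in range(n)]
    return D


def semi_reciprocal(A, t):
    n = len(A)
    D = bounded_chain_costs(A, t)
    # (99): cost of the best pair of secondary chains, one in each direction.
    u = [[max(D[i][j], D[j][i]) for j in range(n)] for i in range(n)]
    # (98): main chains of unconstrained length.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


# Hypothetical network on nodes a, b, c, d: a directed cycle a->b->c->d->a of
# cost 1, reverse edges of cost 3, 5, 2, 5, and diagonal edges of cost 6.
A = [[0, 1, 6, 5],
     [3, 0, 1, 6],
     [6, 5, 0, 1],
     [1, 6, 2, 0]]

for t in (2, 3, 4):
    print(t, semi_reciprocal(A, t))
```

On this network $t = 2$ gives the reciprocal values (3, 2, and 5), $t = 4$ gives the nonreciprocal value 1 for every pair, and $t = 3$ lands strictly in between.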
Reciprocal and nonreciprocal ultrametrics are equivalent to semi-reciprocal ultrametrics for specific values of $t$. For $t = 2$ we have $u^{\mathrm{SR}(2)}_X = u^{\mathrm{R}}_X$, meaning that we recover reciprocal clustering. To see this formally, note that $A^{\mathrm{SR}(2)}_X(x, x') = A_X(x, x')$ [cf. (97)] since the only chain of length two joining $x$ and $x'$ is $[x, x']$. Hence, for $t = 2$, (98) reduces to
$$u^{\mathrm{SR}(2)}_X(x, x') = \min_{C(x, x')} \max_{i \mid x_i \in C(x, x')} \bar{A}_X(x_i, x_{i+1}), \tag{100}$$
which is the definition of the reciprocal ultrametric [cf. (61)].

[Fig. 13. Semi-reciprocal chains. The main chain joining $x$ and $x'$ is formed by $[x, x_1, \ldots, x_{l-1}, x']$. Between two consecutive nodes of the main chain $x_i$ and $x_{i+1}$, we have a secondary chain in each direction, $[x_i, y_{i1}, \ldots, y_{ik_i}, x_{i+1}]$ and $[x_{i+1}, y'_{i1}, \ldots, y'_{ik'_i}, x_i]$. For $u^{\mathrm{SR}(t)}_X(x, x')$, the maximum allowed node length of secondary chains is $t$, i.e., $k_i, k'_i \le t - 2$ for all $i$.]

[Fig. 14. Semi-reciprocal example. Computation of semi-reciprocal ultrametrics between nodes $x$ and $x'$ for different values of parameter $t$: $u^{\mathrm{SR}(2)}_X(x, x') = 4$, $u^{\mathrm{SR}(3)}_X(x, x') = 3$, $u^{\mathrm{SR}(4)}_X(x, x') = 2$ and $u^{\mathrm{SR}(t)}_X(x, x') = 1$ for all $t \ge 5$; see text for details.]

Nonreciprocal ultrametrics can be obtained as $u^{\mathrm{SR}(t)}_X = u^{\mathrm{NR}}_X$ for any parameter $t$ at least as large as the number $n$ of nodes in the network analyzed. To see this, notice that minimizing over $C(x, x')$ is equivalent to minimizing over $C_t(x, x')$ for all $t \ge n$, since we are looking for minimizing chains in a network with nonnegative dissimilarities. Therefore, visiting the same node twice is not an optimal choice.
This implies that $C_n(x, x')$ contains all possible minimizing chains between $x$ and $x'$. I.e., all chains of interest have at most $n$ nodes. Hence, by inspecting (97), $A^{\mathrm{SR}(t)}_X(x, x') = \tilde{u}^*_X(x, x')$ [cf. (7)] for all $t \ge n$. Furthermore, when $t \ge n$, the best main chain that can be picked is formed only by nodes $x$ and $x'$ because, in this way, no additional meeting point is enforced between the chains going from $x$ to $x'$ and vice versa. As a consequence, definition (98) reduces to
$$u^{\mathrm{SR}(t)}_X(x, x') = \max\big(\tilde{u}^*_X(x, x'), \tilde{u}^*_X(x', x)\big), \tag{101}$$
for all $x, x' \in X$ and for all $t \ge n$. The right hand side of (101) is the definition of the nonreciprocal ultrametric [cf. (67)].

For the network in Fig. 14, we calculate the semi-reciprocal ultrametrics between $x$ and $x'$ for different values of $t$. The edges which are not drawn are assigned dissimilarity values greater than the ones depicted in the figure. Since the only bidirectional chain between $x$ and $x'$ uses $x_3$ as the intermediate node, we conclude that $u^{\mathrm{R}}_X(x, x') = u^{\mathrm{SR}(2)}_X(x, x') = 4$. Furthermore, by constructing a path through the outermost clockwise cycle in the network, we conclude that $u^{\mathrm{NR}}_X(x, x') = 1$. Since the longest secondary chain in the minimizing chain for the nonreciprocal case, $[x, x_1, x_2, x_4, x']$, has node length 5, we may conclude that $u^{\mathrm{SR}(t)}_X(x, x') = 1$ for all $t \ge 5$. For intermediate values of $t$, if, e.g., we fix $t = 3$, the minimizing chain is given by the main chain $[x, x_3, x']$ and the secondary chains $[x, x_1, x_3]$, $[x_3, x_4, x']$, $[x', x_5, x_3]$ and $[x_3, x_6, x]$ joining consecutive nodes in the main chain in both directions. The maximum cost among all dissimilarities in this path is $A_X(x_1, x_3) = 3$. Hence, $u^{\mathrm{SR}(3)}_X(x, x') = 3$.
The minimizing chain for $t = 4$ is similar to the minimizing one for $t = 3$ but replaces the secondary chain $[x, x_1, x_3]$ by $[x, x_1, x_2, x_3]$. In this way, we obtain $u^{\mathrm{SR}(4)}_X(x, x') = 2$.

Remark 3 Intuitively, when propagating influence through a network, reciprocal clustering requires bidirectional influence whereas nonreciprocal clustering allows arbitrarily large unidirectional cycles. In many applications, such as trust propagation in social networks, it is reasonable to look for an intermediate situation where influence can propagate through cycles but of limited length. Semi-reciprocal ultrametrics represent this intermediate situation where the parameter $t$ represents the maximum length of chains through which influence can propagate in a nonreciprocal manner.

VIII. ALGORITHMS

In this section, given a network $N = (X, A_X)$ with $|X| = n$, we interpret $A_X$ as an $n \times n$ matrix of dissimilarities. Ultrametrics over $X$ will be denoted $u_X$ and will also be regarded as $n \times n$ symmetric matrices. Given a square matrix $A$, its transpose will be denoted by $A^T$.

By (61), reciprocal clustering searches for chains that minimize the maximum dissimilarity in the symmetric matrix
$$\bar{A}_X := \max(A_X, A_X^T), \tag{102}$$
where the max is applied element-wise. This is equivalent to finding chains in $\bar{A}_X$ that have minimum cost in an $\ell_\infty$ sense. Likewise, nonreciprocal clustering searches for directed chains of minimum $\ell_\infty$-sense cost in $A_X$ to construct the matrix $\tilde{u}^*_X$ [cf. (7)] and selects the maximum of the directed costs by performing the operation $u^{\mathrm{NR}}_X = \max(\tilde{u}^*_X, \tilde{u}^{*T}_X)$ [cf. (67)]. These operations can be performed algorithmically using matrix powers in the dioid algebra $\mathcal{A} := (\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$ [37].

In the dioid algebra $\mathcal{A}$ the regular sum is replaced by the minimization operator and the regular product by maximization.
Using $\oplus$ and $\otimes$ to denote sum and product, respectively, on this dioid algebra we have $a \oplus b := \min(a, b)$ and $a \otimes b := \max(a, b)$ for all $a, b \in \mathbb{R}_+ \cup \{+\infty\}$. Henceforth, for a natural number $n$, $[1, n]$ will denote the set $\{1, 2, \ldots, n\}$. In the algebra $\mathcal{A}$, the matrix product $A \otimes B$ of two real valued matrices of compatible sizes is therefore given by the matrix with entries
$$[A \otimes B]_{ij} := \bigoplus_{k=1}^{n} \big(A_{ik} \otimes B_{kj}\big) = \min_{k \in [1, n]} \max\big(A_{ik}, B_{kj}\big). \tag{103}$$
For integers $k \ge 2$ dioid matrix powers $A^{(k)}_X := A_X \otimes A^{(k-1)}_X$ with $A^{(1)}_X := A_X$ of a dissimilarity matrix are related to ultrametric matrices $u_X$. We delve into this relationship in the next section.

A. Dioid powers and ultrametrics

Notice that the elements of the dioid power $u^{(2)}_X$ of a given ultrametric matrix $u_X$ are given by
$$\big[u^{(2)}_X\big]_{ij} = \min_{k \in [1, n]} \max\big([u_X]_{ik}, [u_X]_{kj}\big). \tag{104}$$
Since $u_X$ satisfies the strong triangle inequality we have that $[u_X]_{ij} \le \max\big([u_X]_{ik}, [u_X]_{kj}\big)$ for all $k \in [1, n]$. And for $k = j$ in particular we further have that $\max\big([u_X]_{ik}, [u_X]_{kj}\big) = \max\big([u_X]_{ij}, [u_X]_{jj}\big) = \max\big([u_X]_{ij}, 0\big) = [u_X]_{ij}$. Combining these two observations it follows that the result of the minimization in (104) is $\big[u^{(2)}_X\big]_{ij} = [u_X]_{ij}$ since none of its arguments is smaller than $[u_X]_{ij}$ and one of them is exactly $[u_X]_{ij}$. This being valid for all $i, j$ implies
$$u^{(2)}_X = u_X. \tag{105}$$
Furthermore, a matrix having the property in (105) is such that $[u_X]_{ij} = \big[u^{(2)}_X\big]_{ij} = \min_{k \in [1, n]} \max\big([u_X]_{ik}, [u_X]_{kj}\big) \le \max\big([u_X]_{ik}, [u_X]_{kj}\big)$, which is just a restatement of the strong triangle inequality. Therefore, a nonnegative matrix $u_X$ represents a finite ultrametric if and only if (105) is true, it has null diagonal elements $[u_X]_{ii} = 0$ and positive off-diagonal elements, and it is symmetric, $u_X = u_X^T$. We then expect dioid powers and max-symmetrization operations (102) to play a role in the construction of ultrametrics.
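The product (103) and the idempotence property (105) are easy to check numerically. The following is a minimal sketch, assuming square matrices stored as lists of lists; the function name `dioid_product` and both test matrices are introduced here for illustration.

```python
def dioid_product(A, B):
    """Matrix product in the (min, max) dioid, following (103)."""
    n = len(A)
    return [[min(max(A[i][k], B[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]


# A symmetric matrix that satisfies the strong triangle inequality.
u = [[0, 3, 5, 5],
     [3, 0, 5, 5],
     [5, 5, 0, 2],
     [5, 5, 2, 0]]

# A symmetric matrix that violates it: entry (0, 1) exceeds
# max(entry (0, 2), entry (2, 1)).
not_u = [[0, 3, 1, 1],
         [3, 0, 1, 1],
         [1, 1, 0, 2],
         [1, 1, 2, 0]]

print(dioid_product(u, u) == u)              # (105) holds
print(dioid_product(not_u, not_u) == not_u)  # (105) fails
```

The second matrix is not reproduced by its own dioid square because a two-hop chain through node 2 undercuts the direct value 3, which is exactly the violation of the strong triangle inequality.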
This is indeed the case. From the definition in (103) it follows that for a given dissimilarity matrix $A_X$ the $i, j$ entry $[A^{(2)}_X]_{ij}$ of the dioid power $A^{(2)}_X$ represents the minimum $\ell_\infty$-sense cost of a chain linking $i$ to $j$ in at most 2 hops. Proceeding recursively we can show that the $l$th dioid power $A^{(l)}_X$ is such that its $i, j$ entry $[A^{(l)}_X]_{ij}$ represents the minimum $\ell_\infty$-sense cost of a chain containing at most $l$ hops.

The quasi-inverse of a matrix in a dioid algebra is a useful concept that simplifies the proofs within this section. In any dioid algebra, we call quasi-inverse of $A$, denoted $A^*$, the limit, when it exists, of the sequence of matrices [37, Ch. 4, Def. 3.1.2]
$$A^* := \lim_{k \to \infty} I \oplus A \oplus A^{(2)} \oplus \ldots \oplus A^{(k)}, \tag{106}$$
where $I$ has zeros in the diagonal and $+\infty$ in the off-diagonal elements. The utility of the quasi-inverse resides in the fact that, given a dissimilarity matrix $A_X$, then [37, Ch. 6, Sec. 6.1]
$$[A^*_X]_{ij} = \min_{C(x_i, x_j)} \max_{k \mid x_k \in C(x_i, x_j)} A_X(x_k, x_{k+1}), \tag{107}$$
where $A^*_X$ is the quasi-inverse of $A_X$ in the dioid $\mathcal{A}$ as defined in (106). I.e., the elements of the quasi-inverse $A^*_X$ correspond to the directed minimum chain costs of the associated network $(X, A_X)$ as defined in (7).

B. Algorithms for reciprocal, nonreciprocal, and semi-reciprocal clustering

Since, as already discussed in Section VII-C, given $N = (X, A_X) \in \mathcal{N}$, we can restrict candidate minimizing chains to those with at most $|X| - 1$ hops, the following result follows.

Theorem 5 For a given network $N = (X, A_X)$ with $n$ nodes the reciprocal ultrametric $u^{\mathrm{R}}_X$ defined in (61) can be computed as
$$u^{\mathrm{R}}_X = \big(\max(A_X, A_X^T)\big)^{(n-1)}, \tag{108}$$
where the operation $(\cdot)^{(n-1)}$ denotes the $(n-1)$st matrix power in the dioid algebra $\mathcal{A}$ with matrix product as defined in (103). The nonreciprocal ultrametric $u^{\mathrm{NR}}_X$ defined in (67) can be computed as
$$u^{\mathrm{NR}}_X = \max\Big(A^{(n-1)}_X, \big(A_X^T\big)^{(n-1)}\Big).$$
(109)

Proof: By comparing (107) with (7), we can see that

$$A_X^* = \tilde{u}_X^*. \tag{110}$$

It is just a matter of notation when comparing (110) with (67) to realize that

$$u_X^{NR} = \max\big(A_X^*, (A_X^*)^T\big). \tag{111}$$

Similarly, if we consider the quasi-inverse of the symmetrized matrix $\bar{A}_X := \max(A_X, A_X^T)$, expression (107) becomes

$$[\bar{A}_X^*]_{ij} = \min_{C(x_i, x_j)} \ \max_{k \,|\, x_k \in C(x_i, x_j)} \bar{A}_X(x_k, x_{k+1}). \tag{112}$$

From comparing (112) and (61) it is immediate that

$$u_X^{R} = \bar{A}_X^* = \big(\max(A_X, A_X^T)\big)^*. \tag{113}$$

If we show that $A_X^* = A_X^{(n-1)}$, then (113) and (111) imply equations (108) and (109) respectively, completing the proof. Notice in particular that when $\mathcal{A} = (\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$, the $\min$ or $\oplus$ operation is idempotent, i.e. $a \oplus a = a$ for all $a$. In this case, it can be shown that [37, Ch. 4, Prop. 3.1.1]

$$I \oplus A_X \oplus A_X^{(2)} \oplus \ldots \oplus A_X^{(k)} = (I \oplus A_X)^{(k)}, \tag{114}$$

for all $k \geq 1$. Moreover, since diagonal elements are null in both matrices in the right hand side of (114) and the off-diagonal elements in $I$ are $+\infty$, it is immediate that $I \oplus A_X = A_X$. Consequently, (114) becomes

$$I \oplus A_X \oplus A_X^{(2)} \oplus \ldots \oplus A_X^{(k)} = A_X^{(k)}. \tag{115}$$

Taking the limit $k \to \infty$ on both sides of equality (115) and invoking the definition of the quasi-inverse (106), we obtain

$$A_X^* = \lim_{k \to \infty} A_X^{(k)}. \tag{116}$$

Finally, it can be shown [37, Ch. 4, Sec. 3.3, Theo. 1] that $A_X^{(n-1)} = A_X^{(n)}$, proving that the limit in (116) exists and, more importantly, that $A_X^* = A_X^{(n-1)}$, as desired.

For the reciprocal ultrametric we symmetrize dissimilarities with a maximization operation and take the $(n-1)$st power of the resulting matrix on the dioid algebra $\mathcal{A}$. For the nonreciprocal ultrametric we reverse the order of these two operations: we first consider the matrix powers $A_X^{(n-1)}$ and $\big(A_X^T\big)^{(n-1)}$ of the dissimilarity matrix and its transpose, which we then symmetrize with a maximization operator.
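The two formulas of Theorem 5 differ only in the order of symmetrization and dioid power, which a short sketch makes explicit. The helper names and the example matrix below are assumptions chosen for illustration, and the code assumes a square matrix with null diagonal and $n \geq 2$ nodes.

```python
def dioid_product(A, B):
    """Min-max matrix product of (103)."""
    inner = len(B)
    return [[min(max(A[i][k], B[k][j]) for k in range(inner))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def dioid_power(A, k):
    """k-th dioid power, k >= 1."""
    P = A
    for _ in range(k - 1):
        P = dioid_product(A, P)
    return P


def transpose(A):
    return [list(column) for column in zip(*A)]


def entrywise_max(A, B):
    return [[max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]


def reciprocal_ultrametric(A):
    """(108): symmetrize with max first, then take the (n-1)st dioid power."""
    n = len(A)
    return dioid_power(entrywise_max(A, transpose(A)), n - 1)


def nonreciprocal_ultrametric(A):
    """(109): take (n-1)st dioid powers first, then symmetrize with max."""
    n = len(A)
    return entrywise_max(dioid_power(A, n - 1),
                         dioid_power(transpose(A), n - 1))


# Hypothetical asymmetric three-node network (illustrative values only).
A_EXAMPLE = [[0, 1, 5],
             [4, 0, 2],
             [3, 6, 0]]

U_R = reciprocal_ultrametric(A_EXAMPLE)
U_NR = nonreciprocal_ultrametric(A_EXAMPLE)
print(U_R)
print(U_NR)
```

In this example the cycle 0 → 1 → 2 → 0 has maximum link cost 3, so every pair merges at resolution 3 under nonreciprocal clustering, while reciprocal clustering must pay the larger of the two directions on every link.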
Besides emphasizing the relationship between reciprocal and nonreciprocal clustering, Theorem 5 suggests the existence of intermediate methods in which we raise dissimilarity matrices $A_X$ and $A_X^T$ to some power, perform a symmetrization, and then continue matrix multiplications. These procedures yield methods that are not only valid but coincide with the family of semi-reciprocal ultrametrics introduced in Section VII-C, as the following proposition asserts.

Proposition 7. For a given network $N = (X, A_X)$ with $n$ nodes, the $t$-th semi-reciprocal ultrametric $u_X^{SR(t)}$ in (98) for every natural $t \geq 2$ can be computed as

$$u_X^{SR(t)} = \Big(\max\Big(A_X^{(t-1)}, \big(A_X^T\big)^{(t-1)}\Big)\Big)^{(n-1)}, \tag{117}$$

where $(\cdot)^{(t-1)}$ and $(\cdot)^{(n-1)}$ denote matrix powers in the dioid algebra $\mathcal{A}$ with matrix product as defined in (103).

Proof: See Appendix C.

The result in (117) is intuitively clear. The powers $A_X^{(t-1)}$ and $\big(A_X^T\big)^{(t-1)}$ represent the minimum $\ell_\infty$-sense cost among directed chains of at most $t - 1$ links. In the terminology of Section VII-C these are the costs of optimal secondary chains containing at most $t$ nodes. Therefore, the maximization $\max\big(A_X^{(t-1)}, (A_X^T)^{(t-1)}\big)$ computes the cost of joining two nodes with secondary chains of at most $t$ nodes in each direction. This is the definition of $A_X^{SR(t)}$ in (98). Applying the dioid power $(n-1)$ to this new matrix is equivalent to looking for minimizing chains in the network with costs given by the secondary chains. Thus, the outermost dioid power computes the costs of the optimal main chains that achieve the ultrametric values in (98).

Observe that we recover (108) by making $t = 2$ in (117) and that we recover (109) when $t = n$. For this latter case note that when $t = n$ in (117), comparison with (109) shows that $\max\big(A_X^{(t-1)}, (A_X^T)^{(t-1)}\big) = \max\big(A_X^{(n-1)}, (A_X^T)^{(n-1)}\big) = u_X^{NR}$. However, since $u_X^{NR}$ is an ultrametric it is idempotent in the dioid algebra [cf.
(105)] and the outermost dioid power in (117) is moot. This recovery is consistent with the observations in (100) and (101) that reciprocal and nonreciprocal clustering are particular cases of semi-reciprocal clustering $\mathcal{H}^{SR(t)}$, in that for $t = 2$ we have $u_X^{SR(2)}(x, x') = u_X^{R}(x, x')$ and for $t \geq n$ it holds that $u_X^{SR(t)}(x, x') = u_X^{NR}(x, x')$ for arbitrary points $x, x'$ of an arbitrary network $N = (X, A_X)$. The results in Theorem 5 and Proposition 7 emphasize the extremal nature of the reciprocal and nonreciprocal methods and characterize the semi-reciprocal ultrametrics as natural intermediate clustering methods in an algorithmic sense.

C. Algorithmic intermediate clustering methods

This algorithmic perspective allows for a generalization in which the powers of the matrices $A_X$ and $A_X^T$ are different. To be precise, consider positive integers $t, t' > 0$ and define the algorithmic intermediate clustering method $\mathcal{H}^{t,t'}$ with parameters $t, t'$ as the one that maps the given network $N = (X, A_X)$ to the ultrametric space $(X, u_X^{t,t'}) = \mathcal{H}^{t,t'}(N)$ given by

$$u_X^{t,t'} := \Big(\max\Big(A_X^{(t)}, \big(A_X^T\big)^{(t')}\Big)\Big)^{(n-1)}. \tag{118}$$

The ultrametric (118) can be interpreted as a semi-reciprocal ultrametric where the allowed length of secondary chains varies with the direction. Forward secondary chains may have at most $t + 1$ nodes whereas backward secondary chains may have at most $t' + 1$ nodes. The algorithmic intermediate family $\mathcal{H}^{t,t'}$ encapsulates the semi-reciprocal family since $\mathcal{H}^{t,t} \equiv \mathcal{H}^{SR(t+1)}$, as well as the reciprocal method since $\mathcal{H}^{R} \equiv \mathcal{H}^{1,1}$, as follows from comparison of (118) with (117) and (108), respectively. We also have that $\mathcal{H}^{NR}(N) = \mathcal{H}^{n-1,n-1}(N)$ for all networks $N = (X, A_X)$ such that $|X| \leq n$. This follows from the comparison of (118) with (109) and the idempotency of $u_X^{NR} = \max\big(A_X^{(n-1)}, (A_X^T)^{(n-1)}\big)$ with respect to the dioid algebra.
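Formulas (117) and (118) can be sketched directly on top of the min-max product. The code below is an illustration under stated assumptions: the function names are hypothetical, and the four-node network is a made-up directed cycle with cost 1 along the cycle and cost 10 everywhere else, chosen so that the semi-reciprocal output for $t = 3$ falls strictly between the nonreciprocal and reciprocal outputs.

```python
def dioid_product(A, B):
    """Min-max matrix product of (103)."""
    inner = len(B)
    return [[min(max(A[i][k], B[k][j]) for k in range(inner))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def dioid_power(A, k):
    """k-th dioid power, k >= 1."""
    P = A
    for _ in range(k - 1):
        P = dioid_product(A, P)
    return P


def transpose(A):
    return [list(column) for column in zip(*A)]


def entrywise_max(A, B):
    return [[max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]


def intermediate_ultrametric(A, t, t_prime):
    """(118): powers t and t' before symmetrizing, then the (n-1)st power."""
    n = len(A)
    secondary = entrywise_max(dioid_power(A, t),
                              dioid_power(transpose(A), t_prime))
    return dioid_power(secondary, n - 1)


def semi_reciprocal_ultrametric(A, t):
    """(117): the case t = t' of (118) with both powers equal to t - 1."""
    return intermediate_ultrametric(A, t - 1, t - 1)


# Hypothetical directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0 (illustrative values).
CYCLE = [[0, 1, 10, 10],
         [10, 0, 1, 10],
         [10, 10, 0, 1],
         [1, 10, 10, 0]]

U_SR2 = semi_reciprocal_ultrametric(CYCLE, 2)   # reciprocal case, t = 2
U_SR3 = semi_reciprocal_ultrametric(CYCLE, 3)   # secondary chains of <= 3 nodes
U_SR4 = semi_reciprocal_ultrametric(CYCLE, 4)   # nonreciprocal case, t = n
print(U_SR2)
print(U_SR3)
print(U_SR4)
```

With secondary chains of at most three nodes, opposite nodes of the cycle can reach each other in both directions at cost 1, but adjacent nodes cannot, which is why only the pairs (0, 2) and (1, 3) merge early for $t = 3$.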
The intermediate algorithmic methods $\mathcal{H}^{t,t'}$ are admissible as we claim in the following proposition.

Proposition 8. The hierarchical clustering method $\mathcal{H}^{t,t'}$ is valid and admissible. I.e., $u_X^{t,t'}$ defined by (118) is an ultrametric for all networks $N = (X, A_X)$ and $\mathcal{H}^{t,t'}$ satisfies axioms (A1)-(A2).

Proof: See Appendix C.

D. Algorithms for the grafting and convex combination families of methods

Algorithms to compute ultrametrics associated with the grafting families in Section VII-A entail simple combinations of the matrices $u_X^{R}$ and $u_X^{NR}$. E.g., the ultrametrics in (92) corresponding to the grafting method $\mathcal{H}^{R/NR}(\beta)$ can be computed as

$$u_X^{R/NR}(\beta) = u_X^{NR} \circ \mathbb{I}\big\{u_X^{R} \leq \beta\big\} + u_X^{R} \circ \mathbb{I}\big\{u_X^{R} > \beta\big\}, \tag{119}$$

where $A \circ B$ denotes the Hadamard product of matrices $A$ and $B$ and $\mathbb{I}\{\cdot\}$ is an element-wise indicator function which outputs a matrix with a 1 in the positions of the elements that satisfy the condition to which it is applied and a 0 otherwise.

In symmetric networks Corollary 2 states that any admissible algorithm must output an ultrametric equal to the single linkage ultrametric $u_X^{SL}$ as defined in (25). Thus, all algorithms in this section yield the same output when restricted to symmetric matrices $A_X$ and this output is $u_X^{SL}$. Considering, e.g., the algorithm for the reciprocal ultrametric in (108) and noting that for a symmetric network $A_X = \max(A_X, A_X^T)$, we conclude that single linkage can be computed as

$$u_X^{SL} = A_X^{(n-1)}. \tag{120}$$

Algorithms for the convex combination family in Section VII-B involve computing dioid algebra powers of a convex combination of ultrametric matrices. Given two admissible methods $\mathcal{H}^1$ and $\mathcal{H}^2$ with outputs $(X, u_X^1) = \mathcal{H}^1(N)$ and $(X, u_X^2) = \mathcal{H}^2(N)$, and $\theta \in [0, 1]$, the ultrametric in (96) corresponding to the method $\mathcal{H}_\theta^{12}$ can be computed as

$$u_X^{12}(\theta) = \Big(\theta\, u_X^1 + (1 - \theta)\, u_X^2\Big)^{(n-1)}.$$
(121)

The operation $\theta\, u_X^1 + (1 - \theta)\, u_X^2$ is just the regular convex combination in (95) and the dioid power in (121) implements the single linkage operation in (96), as follows from (120).

Remark 4. It follows from (108), (109), (117), (118), and (120) that all methods presented in this paper can be computed in a number of operations of order $O(n^4)$, which coincides with the time it takes to compute $n$ matrix products of matrices of size $n \times n$. This complexity can be reduced to $O(n^3 \log n)$ by noting that the dioid matrix power $A^n$ can be computed with the sequence $A, A^2, A^4, \ldots$, which requires $O(\log n)$ matrix products at a cost of $O(n^3)$ each. Complexity can be further reduced using the subcubic dioid matrix multiplication algorithms in [40], [41] that have complexity $O(n^{2.688})$, for a total complexity of $O(n^{2.688} \log n)$ to compute the $n$th matrix power. There are also related methods with even lower complexity. For the case of reciprocal clustering, complexity of order $O(n^2)$ can be achieved by leveraging an equivalence between single linkage and a minimum spanning tree problem [42], [43]. For the case of nonreciprocal clustering, Tarjan's method [29] can be implemented to reduce complexity to $O(n^2 \log n)$.

IX. QUASI-CLUSTERING METHODS

A partition $P = \{B_1, \ldots, B_J\}$ of a set $X$ represents a clustering of $X$ into groups of nodes $B_1, \ldots, B_J \in P$ such that nodes within each group can influence each other more than they can influence or be influenced by the nodes in other groups. A partition can be interpreted as a reduction in data complexity in which variations between elements of a group are neglected in favor of the larger dissimilarities between elements of different groups. This is natural when clustering datasets endowed with symmetric dissimilarities because the concepts of a node $x \in X$ being similar to another node $x' \in X$ and $x'$ being similar to $x$ are equivalent.
In an asymmetric network these concepts are different and this difference motivates the definition of structures more general than partitions. Recalling that a partition $P = \{B_1, \ldots, B_J\}$ of $X$ induces and is induced by an equivalence relation $\sim_P$ on $X$, we search for the asymmetric analogue of a partition by removing the symmetry property in the definition of an equivalence relation. Thus, we define a quasi-equivalence $\rightsquigarrow$ as a binary relation that satisfies the reflexivity and transitivity properties but is not necessarily symmetric, as stated next.

Definition 2. A binary relation $\rightsquigarrow$ between elements of a set $X$ is a quasi-equivalence if and only if the following properties hold true for all $x, x', x'' \in X$:
(i) Reflexivity. Points are quasi-equivalent to themselves, $x \rightsquigarrow x$.
(ii) Transitivity. If $x \rightsquigarrow x'$ and $x' \rightsquigarrow x''$ then $x \rightsquigarrow x''$.

Quasi-equivalence relations are more often termed preorders or quasi-orders in the literature [35]. We choose the term quasi-equivalence to emphasize that they are a modified version of an equivalence relation. We further define a quasi-partition of the set $X$ as a directed unweighted graph, as stated next.

Definition 3. A quasi-partition of a given set $X$ is a directed unweighted graph $\tilde{P} = (P, E)$ where the vertex set $P$ is a partition $P = \{B_1, \ldots, B_J\}$ of the space $X$ and the edge set $E \subset P \times P$ is such that it contains no self-loops and the following properties are satisfied (see Fig. 15):
(QP1) Unidirectionality. For any given pair of distinct blocks $B_i$ and $B_j \in P$ we have, at most, one edge between them. Thus, if for some $i \neq j$ we have $(B_i, B_j) \in E$ then necessarily $(B_j, B_i) \notin E$.

Fig. 15. A quasi-partition $\tilde{P} = (P, E)$ on a set of nodes. The vertex set $P$ of the quasi-partition is given by a partition of the nodes $P = \{B_1, B_2, \ldots, B_6\}$. Nodes within the same block of the partition $P$ can influence each other.
The edges of the directed graph $\tilde{P} = (P, E)$ represent unidirectional influence between the blocks of the partition. In this case, block $B_1$ can influence $B_3$, $B_4$ and $B_5$ while blocks $B_2$ and $B_4$ can only influence $B_3$ and $B_5$, respectively.

(QP2) Transitivity. If there are edges from block $B_i$ to block $B_j$ and from block $B_j$ to block $B_k$, then there is an edge from block $B_i$ to block $B_k$.

The vertex set $P$ of a quasi-partition $\tilde{P} = (P, E)$ is meant to capture sets of nodes that can influence each other, whereas the edges in $E$ intend to capture the notion of directed influence from one group to the next. In the example in Fig. 15, nodes which are drawn close to each other have low dissimilarities between them in both directions. Thus, the nodes inside each block $B_i$ are close to each other but dissimilarities between nodes of different blocks are large in at least one direction. E.g., the dissimilarity from $B_1$ to $B_4$ is small but the dissimilarity from $B_4$ to $B_1$ is large. This latter fact motivates keeping $B_1$ and $B_4$ as separate blocks in the partition whereas the former motivates addition of the directed influence edge $(B_1, B_4)$. Likewise, dissimilarities from $B_1$ to $B_3$, from $B_2$ to $B_3$ and from $B_4$ to $B_5$ are small whereas those in the opposite directions are not. Dissimilarities from the nodes in $B_1$ to the nodes in $B_5$ need not be small, but $B_1$ can influence $B_5$ through $B_4$, hence the edge from $B_1$ to $B_5$, in accordance with (QP2). All other dissimilarities are large, justifying the lack of connections between the other blocks. Further observe that there are no bidirectional edges, as required by (QP1).

Requirements (QP1) and (QP2) in the definition of quasi-partition represent the relational structure that emerges from quasi-equivalence relations, as we state in the following proposition.

Proposition 9. Given a node set $X$ and a quasi-equivalence relation $\rightsquigarrow$ on $X$ [cf.
Definition 2] define the relation $\leftrightarrow$ on $X$ as

$$x \leftrightarrow x' \iff x \rightsquigarrow x' \ \text{and} \ x' \rightsquigarrow x, \tag{122}$$

for all $x, x' \in X$. Then, $\leftrightarrow$ is an equivalence relation. Let $P = \{B_1, \ldots, B_J\}$ be the partition of $X$ induced by $\leftrightarrow$. Define $E \subseteq P \times P$ such that for all distinct $B_i, B_j \in P$

$$(B_i, B_j) \in E \iff x_i \rightsquigarrow x_j, \tag{123}$$

for some $x_i \in B_i$ and $x_j \in B_j$. Then, $\tilde{P} = (P, E)$ is a quasi-partition of $X$. Conversely, given a quasi-partition $\tilde{P} = (P, E)$ of $X$, define the binary relation $\rightsquigarrow$ on $X$ so that for all $x, x' \in X$

$$x \rightsquigarrow x' \iff [x] = [x'] \ \text{or} \ ([x], [x']) \in E, \tag{124}$$

where $[x] \in P$ is the block of the partition $P$ that contains the node $x$ and similarly for $[x']$. Then, $\rightsquigarrow$ is a quasi-equivalence on $X$.

Proof: See Theorem 4.9, Ch. 1.4 in [35].

In the same way that an equivalence relation induces and is induced by a partition on a given node set $X$, Proposition 9 shows that a quasi-equivalence relation induces and is induced by a quasi-partition on $X$. We can then adopt the construction of quasi-partitions as the natural generalization of clustering problems when given asymmetric data. Further, observe that if the edge set $E$ contains no edges, then $\tilde{P} = (P, E)$ is such that $P$ is a standard partition of $X$. In this sense, partitions can be regarded as particular cases of quasi-partitions having the generic form $\tilde{P} = (P, \emptyset)$. To allow generalizations of hierarchical clustering methods with asymmetric outputs we introduce the notion of quasi-dendrogram in the following section.

A. Quasi-dendrograms

Recalling that a dendrogram is defined as a nested set of partitions, we define a quasi-dendrogram $\tilde{D}_X$ of the set $X$ as a collection of nested quasi-partitions $\tilde{D}_X(\delta) = (D_X(\delta), E_X(\delta))$ indexed by a resolution parameter $\delta \geq 0$. Recall the definition of $[x]_\delta$ from Section II. Formally, for $\tilde{D}_X$ to be a quasi-dendrogram we require the following conditions:

($\tilde{\text{D}}$1) Boundary conditions.
At resolution $\delta = 0$ all nodes are in separate clusters with no influences between them and for some $\delta_0$ sufficiently large all elements of $X$ are in a single cluster,

$$\tilde{D}_X(0) = \Big(\big\{\{x\},\ x \in X\big\},\ \emptyset\Big), \qquad \tilde{D}_X(\delta_0) = \big(\{X\},\ \emptyset\big) \quad \text{for some } \delta_0 > 0. \tag{125}$$

($\tilde{\text{D}}$2) Equivalence hierarchy. For any pair of points $x, x'$ for which $x \sim_{D_X(\delta_1)} x'$ at resolution $\delta_1$ we must have $x \sim_{D_X(\delta_2)} x'$ for all resolutions $\delta_2 \geq \delta_1$.

($\tilde{\text{D}}$3) Influence hierarchy. If there is an influence edge $([x]_{\delta_1}, [x']_{\delta_1}) \in E_X(\delta_1)$ between the equivalence classes $[x]_{\delta_1}$ and $[x']_{\delta_1}$ of nodes $x$ and $x'$ at resolution $\delta_1$, at any resolution $\delta_2 \geq \delta_1$ we either have $([x]_{\delta_2}, [x']_{\delta_2}) \in E_X(\delta_2)$ or $[x]_{\delta_2} = [x']_{\delta_2}$.

($\tilde{\text{D}}$4) Right continuity. For all $\delta \geq 0$ there exists $\epsilon > 0$ such that $\tilde{D}_X(\delta) = \tilde{D}_X(\delta')$ for all $\delta' \in [\delta, \delta + \epsilon]$.

Requirements ($\tilde{\text{D}}$1), ($\tilde{\text{D}}$2), and ($\tilde{\text{D}}$4) are counterparts to requirements (D1), (D2), and (D3) in the definition of dendrograms. The minor variation in ($\tilde{\text{D}}$1) is to specify that the edge sets at the extreme values of $\delta$ are empty. For $\delta = 0$ this is because there are no influences at that resolution and for $\delta = \delta_0$ because there is a single cluster and we declared in Definition 3 that blocks do not have self-loops. Condition ($\tilde{\text{D}}$3) imposes on the edge set the requirement analogous to the one that condition (D2), or ($\tilde{\text{D}}$2) for that matter, imposes on the node set. If there is an edge present at a given resolution $\delta_1$, that edge should persist at coarser resolutions $\delta_2 > \delta_1$ except if the two blocks linked by the edge merge into a single cluster.

Respective comparison of ($\tilde{\text{D}}$1), ($\tilde{\text{D}}$2), and ($\tilde{\text{D}}$4) to properties (D1), (D2), and (D3) in Section II implies that given a quasi-dendrogram $\tilde{D}_X = (D_X, E_X)$ on a node set $X$, the component $D_X$ is a dendrogram on $X$. I.e., the vertex sets $D_X(\delta)$ of the quasi-partitions $(D_X(\delta), E_X(\delta))$ for varying $\delta$ form a nested set of partitions.
Hence, if the edge set $E_X(\delta) = \emptyset$ for every resolution parameter $\delta \geq 0$, $\tilde{D}_X$ recovers the structure of the dendrogram $D_X$. Thus, quasi-dendrograms are a generalization of dendrograms or, equivalently, dendrograms are particular cases of quasi-dendrograms with empty edge sets. Redefining dendrograms $D_X$ so that they represent quasi-dendrograms $(D_X, \emptyset)$ with empty edge sets and reinterpreting $\mathcal{D}$ as the set of quasi-dendrograms with empty edge sets we have that $\mathcal{D} \subset \tilde{\mathcal{D}}$, where $\tilde{\mathcal{D}}$ is the space of quasi-dendrograms.

A hierarchical clustering method $\mathcal{H} : \mathcal{N} \to \mathcal{D}$ is defined as a map from the space of networks $\mathcal{N}$ to the space of dendrograms $\mathcal{D}$ [cf. (3)]. Likewise, we define a hierarchical quasi-clustering method as a map

$$\tilde{\mathcal{H}} : \mathcal{N} \to \tilde{\mathcal{D}}, \tag{126}$$

from the space of networks to the space of quasi-dendrograms such that the underlying space $X$ is preserved. Since $\mathcal{D} \subset \tilde{\mathcal{D}}$ we have that every clustering method is a quasi-clustering method but not vice versa. Our goal here is to study quasi-clustering methods satisfying suitably modified versions of the axioms of value and transformation introduced in Section III. Before that, we introduce quasi-ultrametrics as asymmetric versions of ultrametrics and show their equivalence to quasi-dendrograms in the following section after two pertinent remarks.

Remark 5. If we are given a quasi-equivalence relation $\rightsquigarrow$ and its induced quasi-partition on a node set $X$, (122) implies that all nodes inside the same block of the quasi-partition are quasi-equivalent to each other. If we combine this with the transitivity property in Definition 2, we have that if $x_i \rightsquigarrow x_j$ for some $x_i \in B_i$ and $x_j \in B_j$ or, equivalently, $(B_i, B_j) \in E$, then $x_i' \rightsquigarrow x_j'$ for all $x_i' \in B_i$ and all $x_j' \in B_j$.

Remark 6. Unidirectionality (QP1) ensures that no cycles containing exactly two blocks can exist in any quasi-partition $\tilde{P} = (P, E)$.
If there were longer cycles, transitivity (QP2) would imply that every two distinct blocks in a longer cycle would have to form a two-block cycle, contradicting (QP1). Thus, conditions (QP1) and (QP2) imply that every quasi-partition $\tilde{P} = (P, E)$ is a directed acyclic graph (DAG). The fact that a DAG represents a partial order shows that our construction of a quasi-partition from a quasi-equivalence relation is consistent with the known set theoretic construction of a partial order on a partition of a set given a preorder on the set [35].

B. Quasi-ultrametrics

Given a node set $X$, a quasi-ultrametric $\tilde{u}_X$ on $X$ is a function $\tilde{u}_X : X \times X \to \mathbb{R}_+$ satisfying the identity property and the strong triangle inequality in (12), as we formally define next.

Definition 4. Given a node set $X$, a quasi-ultrametric $\tilde{u}_X$ is a nonnegative function $\tilde{u}_X : X \times X \to \mathbb{R}_+$ satisfying the following properties for all $x, x', x'' \in X$:
(i) Identity. $\tilde{u}_X(x, x') = 0$ if and only if $x = x'$.
(ii) Strong triangle inequality. $\tilde{u}_X$ satisfies (12).

Comparison of Definitions 1 and 4 shows that a quasi-ultrametric may be regarded as a relaxation of the notion of an ultrametric in that the symmetry property is not imposed. In particular, the space $\tilde{\mathcal{U}}$ of quasi-ultrametric networks, i.e. networks with quasi-ultrametrics as dissimilarity functions, is a superset of the space of ultrametric networks, $\mathcal{U} \subset \tilde{\mathcal{U}}$. See [44] for a study of some structural properties of quasi-ultrametrics.

Analogously to the claim in Theorem 1 that provides a structure preserving bijection between dendrograms and ultrametrics, the following constructions and theorem establish a structure preserving equivalence between quasi-dendrograms and quasi-ultrametrics.
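Definition 4 can be checked mechanically on a finite node set. The sketch below assumes that the strong triangle inequality (12) reads $\tilde{u}_X(x, x'') \leq \max(\tilde{u}_X(x, x'), \tilde{u}_X(x', x''))$, as in the matrix form used for ultrametrics earlier in this section; the function names and the example matrices are hypothetical.

```python
def is_quasi_ultrametric(u):
    """Check Definition 4 on a square matrix given as nested lists:
    nonnegativity, identity, and the strong triangle inequality."""
    n = len(u)
    nodes = range(n)
    nonnegative = all(u[i][j] >= 0 for i in nodes for j in nodes)
    # Identity: u(x, x') = 0 if and only if x = x'.
    identity = all((u[i][j] == 0) == (i == j) for i in nodes for j in nodes)
    # Strong triangle inequality: u(x, x'') <= max(u(x, x'), u(x', x'')).
    strong_triangle = all(u[i][j] <= max(u[i][k], u[k][j])
                          for i in nodes for j in nodes for k in nodes)
    return nonnegative and identity and strong_triangle


def is_ultrametric(u):
    """An ultrametric is a quasi-ultrametric that is also symmetric."""
    n = len(u)
    symmetric = all(u[i][j] == u[j][i] for i in range(n) for j in range(n))
    return symmetric and is_quasi_ultrametric(u)


# Hypothetical examples (illustrative values only).
Q_EXAMPLE = [[0, 1, 3],
             [2, 0, 3],
             [2, 1, 0]]     # asymmetric, but satisfies Definition 4
A_EXAMPLE = [[0, 1, 5],
             [4, 0, 2],
             [3, 6, 0]]     # violates the strong triangle inequality
TWO_NODE = [[0, 7],
            [2, 0]]         # a generic two-node directed network

print(is_quasi_ultrametric(Q_EXAMPLE), is_ultrametric(Q_EXAMPLE))
print(is_quasi_ultrametric(A_EXAMPLE))
print(is_quasi_ultrametric(TWO_NODE))
```

The two-node case illustrates why asymmetric outputs are natural here: with only two nodes the strong triangle inequality imposes no constraint, so any pair of positive dissimilarities already forms a quasi-ultrametric.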
Consider the map $\tilde{\Psi} : \tilde{\mathcal{D}} \to \tilde{\mathcal{U}}$ defined as follows: for a given quasi-dendrogram $\tilde{D}_X = (D_X, E_X)$ over the set $X$ write $\tilde{\Psi}(\tilde{D}_X) = (X, \tilde{u}_X)$, where we define $\tilde{u}_X(x, x')$ for each $x, x' \in X$ as the smallest resolution $\delta$ at which either both nodes belong to the same equivalence class $[x]_\delta = [x']_\delta$, i.e. $x \sim_{D_X(\delta)} x'$, or there exists an edge in $E_X(\delta)$ from the equivalence class $[x]_\delta$ to the equivalence class $[x']_\delta$,

$$\tilde{u}_X(x, x') := \min\Big\{\delta \geq 0 \ \Big|\ [x]_\delta = [x']_\delta \ \text{or} \ ([x]_\delta, [x']_\delta) \in E_X(\delta)\Big\}. \tag{127}$$

We also consider the map $\tilde{\Upsilon} : \tilde{\mathcal{U}} \to \tilde{\mathcal{D}}$ constructed as follows: for a given quasi-ultrametric $\tilde{u}_X$ on the set $X$ and each $\delta \geq 0$ define the relation $\sim_{\tilde{u}_X(\delta)}$ on $X$ as

$$x \sim_{\tilde{u}_X(\delta)} x' \iff \max\big(\tilde{u}_X(x, x'), \tilde{u}_X(x', x)\big) \leq \delta. \tag{128}$$

Define further $D_X(\delta) := X \bmod \sim_{\tilde{u}_X(\delta)}$ and the edge set $E_X(\delta)$ for every $\delta \geq 0$ as follows: $B_1 \neq B_2 \in D_X(\delta)$ are such that

$$(B_1, B_2) \in E_X(\delta) \iff \min_{x_1 \in B_1,\ x_2 \in B_2} \tilde{u}_X(x_1, x_2) \leq \delta. \tag{129}$$

Finally, $\tilde{\Upsilon}(X, \tilde{u}_X) := \tilde{D}_X$, where $\tilde{D}_X := (D_X, E_X)$.

Theorem 6. The maps $\tilde{\Psi} : \tilde{\mathcal{D}} \to \tilde{\mathcal{U}}$ and $\tilde{\Upsilon} : \tilde{\mathcal{U}} \to \tilde{\mathcal{D}}$ are both well defined. Furthermore, $\tilde{\Psi} \circ \tilde{\Upsilon}$ is the identity on $\tilde{\mathcal{U}}$ and $\tilde{\Upsilon} \circ \tilde{\Psi}$ is the identity on $\tilde{\mathcal{D}}$.

Proof: See Appendix D.

Remark 7. Theorem 6 implies that every quasi-dendrogram $\tilde{D}_X$ has an equivalent representation as a quasi-ultrametric network defined on the same underlying node set $X$ given by $\tilde{\Psi}(\tilde{D}_X)$. Analogously, every quasi-ultrametric network $\tilde{U} = (X, \tilde{u}_X)$ has an equivalent quasi-dendrogram given by $\tilde{\Upsilon}(\tilde{U})$.

The equivalence between quasi-dendrograms and quasi-ultrametric networks described in Remark 7 allows us to reinterpret hierarchical quasi-clustering methods [cf. (126)] as maps

$$\tilde{\mathcal{H}} : \mathcal{N} \to \tilde{\mathcal{U}}, \tag{130}$$

from the space of networks to the space of quasi-ultrametric networks.
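The map $\tilde{\Upsilon}$ in (128)-(129) and the recovery of the quasi-ultrametric through (127) can be sketched as follows. This is an illustration under stated assumptions: blocks are represented as frozensets of node indices, the function names are hypothetical, the recovery only scans the resolutions that appear as entries of the matrix (the quasi-partition cannot change anywhere else), and the matrix uses the values described for the three-node example of Fig. 16, with indices 0, 1, 2 standing for $x_1, x_2, x_3$.

```python
def quasi_partition(u, delta):
    """Quasi-partition (blocks, edges) at resolution delta, following
    (128) for the blocks and (129) for the directed edges.
    Assumes u is a quasi-ultrametric, so that (128) is an equivalence."""
    n = len(u)
    blocks, seen = [], set()
    for i in range(n):
        if i not in seen:
            block = frozenset(j for j in range(n)
                              if max(u[i][j], u[j][i]) <= delta)
            seen |= block
            blocks.append(block)
    edges = {(a, b) for a in blocks for b in blocks
             if a != b and min(u[i][j] for i in a for j in b) <= delta}
    return blocks, edges


def recover_quasi_ultrametric(u):
    """Apply (127) to the quasi-dendrogram induced by u: the smallest
    resolution at which x and x' share a block or an edge joins their blocks."""
    n = len(u)
    levels = sorted({u[i][j] for i in range(n) for j in range(n)})
    recovered = [[None] * n for _ in range(n)]
    for delta in levels:
        blocks, edges = quasi_partition(u, delta)
        block_of = {i: b for b in blocks for i in b}
        for i in range(n):
            for j in range(n):
                joined = (block_of[i] == block_of[j]
                          or (block_of[i], block_of[j]) in edges)
                if recovered[i][j] is None and joined:
                    recovered[i][j] = delta
    return recovered


# Values described for the three-node example of Fig. 16.
Q_FIG16 = [[0, 1, 3],
           [2, 0, 3],
           [2, 1, 0]]

for delta in (0, 1, 2, 3):
    print(delta, quasi_partition(Q_FIG16, delta))
print(recover_quasi_ultrametric(Q_FIG16) == Q_FIG16)
```

On this example the round trip returns the original matrix, which is the behavior Theorem 6 asserts for $\tilde{\Psi} \circ \tilde{\Upsilon}$; the sketch checks one instance and is not a proof.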
Apart from the theoretical importance of Theorem 6, this equivalence result is of practical importance since quasi-ultrametrics are mathematically more convenient to handle than quasi-dendrograms, in the same sense in which regular ultrametrics are easier to handle than regular dendrograms. Quasi-dendrograms are still preferable for data representation, as we discuss in the numerical examples in Section XII.

Given a quasi-dendrogram $\tilde{D}_X = (D_X, E_X)$, the value $\tilde{u}_X(x, x')$ of the associated quasi-ultrametric for $x, x' \in X$ is given by the minimum resolution $\delta$ at which $x$ can influence $x'$. This may occur when $x$ and $x'$ belong to the same block of $D_X(\delta)$ or when they belong to different blocks $B, B' \in D_X(\delta)$ but there is an edge from the block containing $x$ to the block containing $x'$, i.e. $(B, B') \in E_X(\delta)$. Conversely, given a quasi-ultrametric network $(X, \tilde{u}_X)$, for a given resolution $\delta$ the graph $\tilde{D}_X(\delta)$ has as a vertex set the classes of nodes whose quasi-ultrametric is not greater than $\delta$ in both directions. Furthermore, $\tilde{D}_X(\delta)$ contains a directed edge between two distinct equivalence classes if the quasi-ultrametric from some node in the first class to some node in the second is not greater than $\delta$.

In Fig. 16 we present an example of the equivalence between quasi-dendrograms and quasi-ultrametric networks stated by Theorem 6. At the top left of the figure, we present a quasi-ultrametric $\tilde{u}_X$ defined on a three-node set $X = \{x_1, x_2, x_3\}$. At the top right, we depict the dendrogram component $D_X$ of the quasi-dendrogram $\tilde{D}_X = (D_X, E_X)$ equivalent to $(X, \tilde{u}_X)$ as given by Theorem 6. At the bottom of the figure, we present graphs $\tilde{D}_X(\delta)$ for a range of resolutions $\delta \geq 0$.

To obtain $\tilde{D}_X$ from $\tilde{u}_X$, we first obtain the dendrogram component $D_X$ by symmetrizing $\tilde{u}_X$ to the maximum [cf. (128)]: nodes $x_1$ and $x_2$ merge at resolution 2 and $x_3$ merges with $\{x_1, x_2\}$ at resolution 3.
To see how the edges in $\tilde{D}_X$ are obtained, note that at resolutions $0 \leq \delta < 1$ there are no edges since there is no quasi-ultrametric value between distinct nodes in this range [cf. (129)]. At resolution $\delta = 1$, we reach the first nonzero values of $\tilde{u}_X$ and hence the corresponding edges appear in $\tilde{D}_X(1)$. At resolution $\delta = 2$, nodes $x_1$ and $x_2$ merge and become the same vertex in graph $\tilde{D}_X(2)$. Finally, at resolution $\delta = 3$ all the nodes belong to the same equivalence class and hence $\tilde{D}_X(3)$ contains only one vertex.

Conversely, to obtain $\tilde{u}_X$ from $\tilde{D}_X$ as depicted in the figure, note that at resolution $\delta = 1$ two edges $([x_1]_1, [x_2]_1)$ and $([x_3]_1, [x_2]_1)$ appear in $\tilde{D}_X(1)$, thus the corresponding values of the quasi-ultrametric are fixed to be $\tilde{u}_X(x_1, x_2) = \tilde{u}_X(x_3, x_2) = 1$. At resolution $\delta = 2$, when $x_1$ and $x_2$ merge into the same vertex in $\tilde{D}_X(2)$, an edge is generated from $[x_3]_2$ to $[x_1]_2$, the equivalence class of $x_1$ at resolution $\delta = 2$, which did not exist before, implying that $\tilde{u}_X(x_3, x_1) = 2$. Moreover, we have that $[x_2]_2 = [x_1]_2$, hence $\tilde{u}_X(x_2, x_1) = 2$. Finally, at $\tilde{D}_X(3)$ there is only one equivalence class, thus the values of $\tilde{u}_X$ that have not been defined so far must equal 3.

C. Axioms for hierarchical quasi-clustering methods

Mimicking the development in Section III, we encode desirable properties of quasi-clustering methods into axioms which we use as a criterion for admissibility. The axioms considered are the directed versions of the axioms of value (A1) and transformation (A2) introduced in Section III. The Directed Axiom of Value ($\tilde{\text{A}}$1) and the Directed Axiom of Transformation ($\tilde{\text{A}}$2) winnow the space of quasi-clustering methods by imposing conditions on their output quasi-dendrograms.

($\tilde{\text{A}}$1) Directed Axiom of Value.
For each $\alpha, \beta \geq 0$, the quasi-dendrogram $\tilde{D}_X = (D_X, E_X) = \tilde{\mathcal{H}}(\vec{\Delta}_2(\alpha, \beta))$ produced by $\tilde{\mathcal{H}}$ on the arbitrary two-node network $\vec{\Delta}_2(\alpha, \beta)$ is such that $D_X(\delta) = \big\{\{p\}, \{q\}\big\}$ for $\delta < \max(\alpha, \beta)$ and $D_X(\delta) = \big\{\{p, q\}\big\}$ for $\delta \geq \max(\alpha, \beta)$. When $\alpha \neq \beta$, the edge sets $E_X(\delta)$ are non-empty for resolutions $\min(\alpha, \beta) \leq \delta < \max(\alpha, \beta)$ where $(q, p) \in E_X(\delta)$ if $\alpha > \beta$ and $(p, q) \in E_X(\delta)$ if $\alpha < \beta$; see Fig. 17.

Fig. 16. Equivalence between quasi-dendrograms and quasi-ultrametrics. A quasi-ultrametric $\tilde{u}_X$ is defined on three nodes $\{x_1, x_2, x_3\}$ and the equivalent quasi-dendrogram $\tilde{D}_X = (D_X, E_X)$ is presented by depicting $D_X$ and graphs $\tilde{D}_X(\delta)$ for every resolution $\delta$ (panels: $0 \leq \delta < 1$, $1 \leq \delta < 2$, $2 \leq \delta < 3$, $\delta \geq 3$).

Fig. 17. Directed Axiom of Value. Nodes in a two-node network merge into one block at the minimum resolution at which both can influence each other. For smaller resolutions, the quasi-dendrogram captures unidirectional influence.

($\tilde{\text{A}}$2) Directed Axiom of Transformation. Consider two networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$ and a dissimilarity-reducing map $\phi : X \to Y$. Then, the output quasi-dendrograms $\tilde{D}_X = (D_X, E_X) = \tilde{\mathcal{H}}(N_X)$ and $\tilde{D}_Y = (D_Y, E_Y) = \tilde{\mathcal{H}}(N_Y)$ are such that for all $\delta \geq 0$, if $[x]_\delta = [x']_\delta$ then $[\phi(x)]_\delta = [\phi(x')]_\delta$, and if $([x]_\delta, [x']_\delta) \in E_X(\delta)$ then $([\phi(x)]_\delta, [\phi(x')]_\delta) \in E_Y(\delta)$ or $[\phi(x)]_\delta = [\phi(x')]_\delta$, for all $x, x' \in X$.

Theorem 6 allows us to rewrite axioms ($\tilde{\text{A}}$1) and ($\tilde{\text{A}}$2) in terms of quasi-ultrametric networks. As was the case for ultrametrics and dendrograms, quasi-ultrametrics are mathematically more convenient to handle than quasi-dendrograms.
The first indication of this fact is the simpler reformulation of axioms ($\tilde{\text{A}}$1) and ($\tilde{\text{A}}$2) in terms of quasi-ultrametrics:

($\tilde{\text{A}}$1) Directed Axiom of Value. $\tilde{\mathcal{H}}(\vec{\Delta}_2(\alpha, \beta)) = \vec{\Delta}_2(\alpha, \beta)$ for every two-node network $\vec{\Delta}_2(\alpha, \beta)$.

($\tilde{\text{A}}$2) Directed Axiom of Transformation. Consider two networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$ and a dissimilarity-reducing map $\phi : X \to Y$, i.e. a map $\phi$ such that for all $x, x' \in X$ it holds that $A_X(x, x') \geq A_Y(\phi(x), \phi(x'))$. Then, for all $x, x' \in X$, the outputs $(X, \tilde{u}_X) = \tilde{\mathcal{H}}(X, A_X)$ and $(Y, \tilde{u}_Y) = \tilde{\mathcal{H}}(Y, A_Y)$ satisfy

$$\tilde{u}_X(x, x') \geq \tilde{u}_Y(\phi(x), \phi(x')). \tag{131}$$

The Directed Axiom of Transformation ($\tilde{\text{A}}$2) is just a restatement of the (regular) Axiom of Transformation (A2) where the ultrametrics $u_X$ and $u_Y$ in (24) are replaced by the quasi-ultrametrics $\tilde{u}_X$ and $\tilde{u}_Y$ in (131). The axioms are otherwise conceptual analogues. In terms of quasi-dendrograms, ($\tilde{\text{A}}$2) states that no influence relation can be weakened by a dissimilarity-reducing transformation. The Directed Axiom of Value ($\tilde{\text{A}}$1) simply recognizes that in any two-node network, the dissimilarity function is itself a quasi-ultrametric and that there is no valid justification to output a different quasi-ultrametric. In this sense, ($\tilde{\text{A}}$1) is similar to the Symmetric Axiom of Value (B1) that also requires two-node networks to be fixed points of (symmetric) hierarchical clustering methods. In terms of quasi-dendrograms, ($\tilde{\text{A}}$1) requires the quasi-clustering method to output the quasi-dendrogram equivalent according to Theorem 6 to the dissimilarity function of the two-node network.

D. Existence and uniqueness of admissible quasi-clustering methods: directed single linkage

We call a quasi-clustering method $\tilde{\mathcal{H}}$ admissible if it satisfies axioms ($\tilde{\text{A}}$1) and ($\tilde{\text{A}}$2) and, emulating the development in Section V, we want to find methods that are admissible with respect to these axioms.
This can be done in the following way. Recall the definition of the directed minimum chain cost $\tilde{u}_X^*$ in (7) and define the directed single linkage quasi-clustering method $\tilde{\mathcal{H}}^*$ as the one with output quasi-ultrametrics $(X, \tilde{u}_X^*) = \tilde{\mathcal{H}}^*(X, A_X)$ given by the directed minimum chain cost function $\tilde{u}_X^*$. The directed single linkage method $\tilde{\mathcal{H}}^*$ is valid and admissible as we show in the following proposition.

Proposition 10. The hierarchical quasi-clustering method $\tilde{\mathcal{H}}^*$ is valid and admissible. I.e., $\tilde{u}_X^*$ defined by (7) is a quasi-ultrametric and $\tilde{\mathcal{H}}^*$ satisfies axioms ($\tilde{\text{A}}$1)-($\tilde{\text{A}}$2).

Proof: In order to show that $\tilde{u}_X^*$ is a valid quasi-ultrametric we may apply an argument based on concatenated chains as the one preceding Proposition 1.

To show fulfillment of Axiom ($\tilde{\text{A}}$1), pick an arbitrary two-node network $\vec{\Delta}_2(\alpha, \beta)$ as defined in Section II and denote by $(\{p, q\}, \tilde{u}_{p,q}^*) = \tilde{\mathcal{H}}^*(\vec{\Delta}_2(\alpha, \beta))$. Then, we have $\tilde{u}_{p,q}^*(p, q) = \alpha$ and $\tilde{u}_{p,q}^*(q, p) = \beta$ because there is only one possible chain selection in each direction [cf. (7)]. Satisfaction of the Directed Axiom of Transformation ($\tilde{\text{A}}$2) is the intermediate result (177) in the proof of Proposition 2 in Appendix A.

From Proposition 10 we know that $\tilde{u}_X^*$ is a quasi-ultrametric. Its equivalent quasi-dendrogram according to Theorem 6 [cf. Remark 7] is related to the nonreciprocal clustering method $\mathcal{H}^{NR}$ as we show next.

Proposition 11. For every network $N = (X, A_X)$, let $\tilde{D}_X^* = (D_X^*, E_X^*)$ denote the quasi-dendrogram $\tilde{\mathcal{H}}^*(N)$. Then, $D_X^* = D_X^{NR}$ where $D_X^{NR} = \mathcal{H}^{NR}(N)$ is the output dendrogram of applying nonreciprocal clustering as defined in (67) to the network $N$.

Proof: Compare (67) with (128) and conclude that

$$x \sim_{\tilde{u}_X^*(\delta)} x' \iff u_X^{NR}(x, x') \leq \delta, \tag{132}$$

for all $x, x' \in X$.
The equivalence relation $\sim_{\tilde{u}^*_X(\delta)}$ defines $D^*_X$ and by (14) in Theorem 1 we obtain that the equivalence relation $\sim_{U^{\text{NR}}_X(\delta)}$ defining $D^{\text{NR}}_X$ is such that
$$x \sim_{U^{\text{NR}}_X(\delta)} x' \iff u^{\text{NR}}_X(x, x') \leq \delta. \tag{133}$$
Comparing (132) and (133), the result follows.

Furthermore, from (25) and (120) it follows that for every network $(X, A_X)$ with $|X| = n$, the quasi-ultrametric $\tilde{u}^*_X$ can be computed as
$$\tilde{u}^*_X = A_X^{(n-1)}, \tag{134}$$
where the operation $(\cdot)^{(n-1)}$ denotes the $(n-1)$st matrix power in the dioid algebra $(\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$ with matrix product as defined in (103).

Mimicking the developments in Sections V and VI, we next ask which other methods satisfy (Ã1)-(Ã2) and what special properties directed single linkage has. As it turns out, directed single linkage is the unique quasi-clustering method that is admissible with respect to (Ã1)-(Ã2), as we assert in the following theorem.

Theorem 7 Let $\tilde{\mathcal{H}}$ be a valid hierarchical quasi-clustering method satisfying axioms (Ã1) and (Ã2). Then, $\tilde{\mathcal{H}} \equiv \tilde{\mathcal{H}}^*$ where $\tilde{\mathcal{H}}^*$ is the directed single linkage method with output quasi-ultrametrics as in (7).

Proof: The proof is similar to the proof of Theorem 4. Given an arbitrary network $N = (X, A_X)$ denote by $(X, \tilde{u}_X) = \tilde{\mathcal{H}}(X, A_X)$ the output quasi-ultrametric resulting from application of an arbitrary admissible quasi-clustering method $\tilde{\mathcal{H}}$. We will show that for all $x, x' \in X$
$$\tilde{u}^*_X(x, x') \leq \tilde{u}_X(x, x') \leq \tilde{u}^*_X(x, x'). \tag{135}$$

To prove the rightmost inequality in (135) we begin by showing that the dissimilarity function $A_X$ acts as an upper bound on all admissible quasi-ultrametrics $\tilde{u}_X$, i.e.,
$$\tilde{u}_X(x, x') \leq A_X(x, x'), \tag{136}$$
for all $x, x' \in X$. To see this, suppose $A_X(x, x') = \alpha$ and $A_X(x', x) = \beta$.
Define the two-node network $\vec{\Delta}_2(\alpha, \beta) = (\{p, q\}, A_{p,q})$ where $A_{p,q}(p, q) = \alpha$ and $A_{p,q}(q, p) = \beta$ and denote by $(\{p, q\}, \tilde{u}_{p,q}) = \tilde{\mathcal{H}}(\vec{\Delta}_2(\alpha, \beta))$ the output of applying the method $\tilde{\mathcal{H}}$ to the network $\vec{\Delta}_2(\alpha, \beta)$. From axiom (Ã1), we have $\tilde{\mathcal{H}}(\vec{\Delta}_2(\alpha, \beta)) = \vec{\Delta}_2(\alpha, \beta)$, in particular
$$\tilde{u}_{p,q}(p, q) = A_{p,q}(p, q) = A_X(x, x'). \tag{137}$$
Moreover, notice that the map $\phi : \{p, q\} \to X$, where $\phi(p) = x$ and $\phi(q) = x'$, is a dissimilarity-reducing map from $\vec{\Delta}_2(\alpha, \beta)$ to $N$, i.e., it does not increase any dissimilarity. Hence, from axiom (Ã2), we must have
$$\tilde{u}_{p,q}(p, q) \geq \tilde{u}_X(\phi(p), \phi(q)) = \tilde{u}_X(x, x'). \tag{138}$$
Substituting (137) in (138), we obtain (136).

Consider now an arbitrary chain $C(x, x') = [x = x_0, x_1, \ldots, x_l = x']$ linking nodes $x$ and $x'$. Since $\tilde{u}_X$ is a valid quasi-ultrametric, it satisfies the strong triangle inequality (12). Thus, we have that
$$\tilde{u}_X(x, x') \leq \max_{i \mid x_i \in C(x, x')} \tilde{u}_X(x_i, x_{i+1}) \leq \max_{i \mid x_i \in C(x, x')} A_X(x_i, x_{i+1}), \tag{139}$$
where the last inequality is implied by (136). Since by definition $C(x, x')$ is an arbitrary chain linking $x$ to $x'$, we can minimize (139) over all such chains while maintaining the validity of the inequality,
$$\tilde{u}_X(x, x') \leq \min_{C(x, x')} \max_{i \mid x_i \in C(x, x')} A_X(x_i, x_{i+1}) = \tilde{u}^*_X(x, x'), \tag{140}$$
where the last equality is given by the definition of the directed minimum chain cost (7). Thus, the rightmost inequality in (135) is proved.

To prove the leftmost inequality in (135), consider an arbitrary pair of nodes $x, x' \in X$ and fix $\delta = \tilde{u}^*_X(x, x')$. Then, by Lemma 1, there exists a partition $P_\delta(x, x') = \{B_\delta(x), B_\delta(x')\}$ of the node space $X$ into blocks $B_\delta(x)$ and $B_\delta(x')$ with $x \in B_\delta(x)$ and $x' \in B_\delta(x')$ such that for all points $b \in B_\delta(x)$ and $b' \in B_\delta(x')$ we have
$$A_X(b, b') \geq \delta.$$
(141)

Focus on a two-node network $\vec{\Delta}_2(\delta, s) = (\{u, v\}, A_{u,v})$ with $A_{u,v}(u, v) = \delta$ and $A_{u,v}(v, u) = s$ where $s = \text{sep}(X, A_X)$ as defined in (10). Denote by $(\{u, v\}, \tilde{u}_{u,v}) = \tilde{\mathcal{H}}(\vec{\Delta}_2(\delta, s))$ the output of applying the method $\tilde{\mathcal{H}}$ to the network $\vec{\Delta}_2(\delta, s)$. Notice that the map $\phi : X \to \{u, v\}$ such that $\phi(b) = u$ for all $b \in B_\delta(x)$ and $\phi(b') = v$ for all $b' \in B_\delta(x')$ is dissimilarity reducing because, from (141), the dissimilarities mapped to $\delta$ in $\vec{\Delta}_2(\delta, s)$ were originally no smaller than $\delta$. Moreover, dissimilarities mapped into $s$ cannot have increased due to the definition of separation of a network (10). From axiom (Ã1),
$$\tilde{u}_{u,v}(u, v) = A_{u,v}(u, v) = \delta, \tag{142}$$
since $\vec{\Delta}_2(\delta, s)$ is a two-node network. Moreover, since $\phi$ is dissimilarity reducing, from (Ã2) we may assert that
$$\tilde{u}_X(x, x') \geq \tilde{u}_{u,v}(\phi(x), \phi(x')) = \delta, \tag{143}$$
where we used (142) for the last equality. Recalling that $\tilde{u}^*_X(x, x') = \delta$ and substituting in (143) concludes the proof of the leftmost inequality in (135).

Since both inequalities in (135) hold, we must have $\tilde{u}^*_X(x, x') = \tilde{u}_X(x, x')$ for all $x, x' \in X$. Since this is true for any arbitrary network $N = (X, A_X)$, it follows that the admissible quasi-clustering method must be $\tilde{\mathcal{H}} \equiv \tilde{\mathcal{H}}^*$.

As follows from Section VII, there exist many (actually, infinitely many) different admissible hierarchical clustering algorithms for asymmetric networks. In the case of symmetric networks, [13] establishes that there is a unique admissible method. Theorem 7 suggests that what prevents uniqueness in asymmetric networks is the insistence that the hierarchical clustering method should have a symmetric ultrametric output. If we remove the symmetry requirement there is also a unique admissible hierarchical quasi-clustering method. Furthermore, this unique method is an asymmetric version of single linkage.
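The dioid matrix power in (134) lends itself to a direct implementation. The following is a minimal sketch, not the authors' code: it assumes the network is given as a square list of lists with zero diagonal, and the function names `minmax_product` and `directed_single_linkage` are chosen here only for illustration.

```python
def minmax_product(A, B):
    """Matrix product in the dioid (R+ ∪ {+inf}, min, max):
    entry (i, j) is min over k of max(A[i][k], B[k][j])."""
    n = len(A)
    return [[min(max(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def directed_single_linkage(A):
    """Directed minimum chain cost, computed as the (n-1)st dioid power of A.

    Assumes A[i][i] == 0 for all i, so that chains with fewer than
    n-1 hops are covered by padding them with zero-cost self-loops.
    """
    n = len(A)
    U = [row[:] for row in A]      # U holds the first power of A
    for _ in range(n - 2):         # n-2 further products give the (n-1)st power
        U = minmax_product(U, A)
    return U


# Hypothetical three-node network used only as an illustration.
A = [[0, 1, 5],
     [4, 0, 2],
     [3, 6, 0]]
U = directed_single_linkage(A)
# U[0][2] is 2: the chain 0 -> 1 -> 2 has cost max(1, 2), cheaper than the direct link 5.
```

Note that the output is asymmetric in general (here `U[0][1]` is 1 while `U[1][0]` is 3), which is what distinguishes a quasi-ultrametric from an ultrametric. Each product costs on the order of $n^3$ operations in this naive form.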
Remark 8 The definition of directed single linkage as a natural extension of single linkage hierarchical clustering to asymmetric networks dates back to [28]. Our contribution is to develop a framework to study hierarchical quasi-clustering that starts from quasi-equivalence relations, builds towards quasi-partitions and quasi-dendrograms, shows the equivalence of the latter to quasi-ultrametrics, and culminates with the proof that directed single linkage is the unique admissible method to hierarchically quasi-cluster asymmetric networks. Furthermore, stability of directed single linkage is established in Section XI.

X. ALTERNATIVE AXIOMATIC CONSTRUCTIONS

The axiomatic framework that we adopted allows alternative constructions by modifying the underlying set of axioms. Among the axioms in Section III, the Axiom of Value (A1) is perhaps the most open to interpretation. Although we required the two-node network in Fig. 3 to first cluster into one single block at resolution $\max(\alpha, \beta)$, corresponding to the largest dissimilarity, and argued that this was reasonable in most situations, it is also reasonable to accept that in some situations the two nodes should be clustered together as long as one of them is able to influence the other. To account for this possibility we replace the Axiom of Value by the following alternative.

(A1'') Alternative Axiom of Value. The ultrametric output $(\{p, q\}, u_{p,q}) := \mathcal{H}(\vec{\Delta}_2(\alpha, \beta))$ produced by $\mathcal{H}$ applied to the two-node network $\vec{\Delta}_2(\alpha, \beta)$ satisfies
$$u_{p,q}(p, q) = \min(\alpha, \beta). \tag{144}$$

Axiom (A1'') replaces the requirement of bidirectional influence in Axiom (A1) with a requirement of unidirectional influence; see Fig. 18. We say that a clustering method $\mathcal{H}$ is admissible with respect to the alternative axioms if it satisfies axioms (A1'') and (A2).

The property of influence (P1), which is a keystone in the proof of Theorem 4, is not compatible with the Alternative Axiom of Value (A1'').
Indeed, just observe that the minimum loop cost of the two-node network in Fig. 18 is $\text{mlc}(\vec{\Delta}_2(\alpha, \beta)) = \max(\alpha, \beta)$ whereas in (144) we are requiring the output ultrametric to be $u_{p,q}(p, q) = \min(\alpha, \beta)$. We therefore have that Axiom (A1'') itself implies $u_{p,q}(p, q) = \min(\alpha, \beta) < \max(\alpha, \beta) = \text{mlc}(\vec{\Delta}_2(\alpha, \beta))$ whenever $\alpha \neq \beta$. Thus, we reformulate (P1) into the Alternative Property of Influence (P1') that we define next.

(P1') Alternative Property of Influence. For any network $N_X = (X, A_X)$ the output ultrametric $(X, u_X) = \mathcal{H}(X, A_X)$ corresponding to the application of a hierarchical clustering method $\mathcal{H}$ is such that the ultrametric value $u_X(x, x')$ between any two distinct points $x$ and $x'$ cannot be smaller than the separation [cf. (10)] of the network,
$$u_X(x, x') \geq \text{sep}(X, A_X) \quad \text{for all } x \neq x'. \tag{145}$$

Observe that the Alternative Property of Influence (P1') coincides with the Symmetric Property of Influence (Q1) defined in Section VI-A. This is not surprising because for symmetric networks the Axiom of Value (A1) and the Alternative Axiom of Value (A1'') impose identical restrictions. Moreover, since the separation of a network cannot be larger than its minimum loop cost, the Alternative Property of Influence (P1') is implied by the (regular) Property of Influence (P1), but not vice versa.

The Alternative Property of Influence (P1') states that no clusters are formed at resolutions at which there are no unidirectional influences between any pair of nodes and is consistent with the Alternative Axiom of Value (A1''). Moreover, in studying methods admissible with respect to (A1'') and (A2), (P1') plays a role akin to the one played by (P1) when studying methods that are admissible with respect to (A1) and (A2). In particular, as (P1) is implied by (A1) and (A2), (P1') is true if (A1'') and (A2) hold, as we assert in the following theorem.

Fig. 18. Alternative Axiom of Value. For a two-node network, nodes are clustered together at the minimum resolution at which one of them can influence the other.

Theorem 8 If a clustering method $\mathcal{H}$ satisfies the Alternative Axiom of Value (A1'') and the Axiom of Transformation (A2) then it also satisfies the Alternative Property of Influence (P1').

Proof: See Appendix E.

Theorem 8 admits the following interpretation. In (A1'') we require two-node networks to cluster at the resolution where unidirectional influence occurs. When we consider (A1'') in conjunction with (A2) we can translate this requirement into a statement about clustering in arbitrary networks. This statement is the Alternative Property of Influence (P1'), which prevents nodes from clustering at resolutions at which each node in the network is disconnected from the rest.

A. Unilateral clustering

Mimicking the developments in Sections III-VI, we move on to identify and define methods that satisfy axioms (A1'')-(A2) and then bound the range of admissible methods with respect to these axioms. To do so let $N = (X, A_X)$ be a given network and consider the symmetric dissimilarity function
$$\hat{A}_X(x, x') := \min(A_X(x, x'), A_X(x', x)), \tag{146}$$
for all $x, x' \in X$. Notice that, as opposed to the definition of $\bar{A}_X$, where the symmetrization is done by means of a max operation, $\hat{A}_X$ is defined by using a min operation. We define the unilateral clustering method $\mathcal{H}^{\text{U}}$ with output ultrametric $(X, u^{\text{U}}_X) = \mathcal{H}^{\text{U}}(N)$, where $u^{\text{U}}_X$ is defined as
$$u^{\text{U}}_X(x, x') := \min_{C(x, x')} \max_{i \mid x_i \in C(x, x')} \hat{A}_X(x_i, x_{i+1}), \tag{147}$$
for all $x, x' \in X$. To show that $\mathcal{H}^{\text{U}}$ is a properly defined clustering method, we need to establish that $u^{\text{U}}_X$ as defined in (147) is a valid ultrametric. However, comparing (147) and (25) we see that
$$\mathcal{H}^{\text{U}}(X, A_X) \equiv \mathcal{H}^{\text{SL}}(X, \hat{A}_X), \tag{148}$$
i.e.
applying the unilateral clustering method to an asymmetric network $(X, A_X)$ is equivalent to applying the single linkage clustering method to the symmetrized network $(X, \hat{A}_X)$. Since we know that single linkage produces a valid ultrametric when applied to any symmetric network such as $(X, \hat{A}_X)$, (147) is a properly defined ultrametric. Moreover, as an elaboration of the results in Section VIII, from (148), (120), and (146) we obtain an algorithmic way of computing the unilateral ultrametric output for any network,
$$u^{\text{U}}_X = \left( \min\left(A_X, A_X^T\right) \right)^{(n-1)}, \tag{149}$$
where the operation $(\cdot)^{(n-1)}$ denotes the $(n-1)$st matrix power in the dioid algebra $(\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$ with matrix product as defined in (103). Furthermore, it can be shown that $\mathcal{H}^{\text{U}}$ satisfies axioms (A1'') and (A2).

Proposition 12 The unilateral clustering method $\mathcal{H}^{\text{U}}$ with output ultrametrics defined in (147) satisfies axioms (A1'') and (A2).

Proof: See Appendix E.

In the case of admissibility with respect to (A1) and (A2), in Section VII we constructed an infinite number of clustering methods whose outcomes are uniformly bounded between those of nonreciprocal and reciprocal clustering, as predicted by Theorem 4. In contrast, in the case of admissibility with respect to (A1'') and (A2), unilateral clustering is the unique admissible method, as stated in the following theorem.

Theorem 9 Let $\mathcal{H}$ be a hierarchical clustering method satisfying axioms (A1'') and (A2). Then, $\mathcal{H} \equiv \mathcal{H}^{\text{U}}$ where $\mathcal{H}^{\text{U}}$ is the unilateral clustering method with output ultrametrics as in (147).

Proof: See Appendix E.

Remark 9 By Theorem 9, the space of methods that satisfy the Alternative Axiom of Value (A1'') and the Axiom of Transformation (A2) is inherently simpler than the space of methods that satisfy the (regular) Axiom of Value (A1) and the Axiom of Transformation (A2). Further note that in the case of symmetric networks, for all $x, x' \in X$ we have $\hat{A}_X(x, x') = A_X(x, x') = A_X(x', x)$ [cf.
(146)] and as a consequence unilateral clustering is equivalent to single linkage, as follows from comparison of (25) and (147). Thus, the result in Theorem 9 reduces to the statement in Corollary 2, which was derived upon observing that in symmetric networks reciprocal and nonreciprocal clustering yield identical outcomes. The fact that reciprocal, nonreciprocal, and unilateral clustering all coalesce into single linkage when restricted to symmetric networks is consistent with the fact that the Axiom of Value (A1) and the Alternative Axiom of Value (A1'') are both equivalent to the Symmetric Axiom of Value (B1) when restricted to symmetric dissimilarities.

B. Agnostic Axiom of Value

Axiom (A1) stipulates that every two-node network $\vec{\Delta}_2(\alpha, \beta)$ is clustered into a single block at resolution $\max(\alpha, \beta)$, whereas Axiom (A1'') stipulates that it should be clustered at $\min(\alpha, \beta)$. One can also be agnostic with respect to this issue and say that both of these situations are admissible. An agnostic version of axioms (A1) and (A1'') is given next.

(A1''') Agnostic Axiom of Value. The ultrametric output $(\{p, q\}, u_{p,q}) = \mathcal{H}(\vec{\Delta}_2(\alpha, \beta))$ produced by $\mathcal{H}$ applied to the two-node network $\vec{\Delta}_2(\alpha, \beta)$ satisfies
$$\min(\alpha, \beta) \leq u_{p,q}(p, q) \leq \max(\alpha, \beta). \tag{150}$$

Since fulfillment of (A1) or (A1'') implies fulfillment of (A1'''), any clustering method that is admissible with respect to the original axioms (A1)-(A2) or with respect to the alternative axioms (A1'')-(A2) must be admissible with respect to the agnostic axioms (A1''')-(A2). In this sense, (A1''')-(A2) is the most general combination of axioms described in this paper. For methods that are admissible with respect to (A1''') and (A2) we can bound the range of outcome ultrametrics as explained in the following theorem.

Theorem 10 Consider a clustering method $\mathcal{H}$ satisfying axioms (A1''') and (A2). For an arbitrary given network $N = (X, A_X)$ denote by $(X, u_X) = \mathcal{H}(X, A_X)$ the outcome of $\mathcal{H}$ applied to $N$.
Then, for all pairs of nodes $x, x' \in X$
$$u^{\text{U}}_X(x, x') \leq u_X(x, x') \leq u^{\text{R}}_X(x, x'), \tag{151}$$
where $u^{\text{U}}_X(x, x')$ and $u^{\text{R}}_X(x, x')$ denote the unilateral and reciprocal ultrametrics as defined by (147) and (61), respectively.

Proof: See Appendix E.

By Theorem 10, given an asymmetric network $(X, A_X)$, any hierarchical clustering method abiding by axioms (A1''') and (A2) produces outputs contained between those corresponding to two methods. The first method, unilateral clustering, symmetrizes $A_X$ by calculating $\hat{A}_X(x, x') = \min(A_X(x, x'), A_X(x', x))$ for all $x, x' \in X$ and computes single linkage on $(X, \hat{A}_X)$. The other method, reciprocal clustering, symmetrizes $A_X$ by calculating $\bar{A}_X(x, x') = \max(A_X(x, x'), A_X(x', x))$ for all $x, x' \in X$ and computes single linkage on $(X, \bar{A}_X)$.

XI. STABILITY

The collection of all compact metric spaces modulo isometry becomes a metric space of its own when endowed with the Gromov-Hausdorff distance [39, Chapter 7.3]. This distance has proven very useful in studying the stability of different methods of data analysis [13], [38], [45] and here we generalize it to the space of networks $\mathcal{N}$ modulo a properly defined notion of isomorphism. For a given hierarchical clustering method $\mathcal{H}$ we can then ask whether networks that are close to each other result in dendrograms that are also close to each other. In analogy to the symmetric case [13], the answer to this question is affirmative for semi-reciprocal methods (of which reciprocal and nonreciprocal methods are particular cases) and for most other constructions introduced earlier, as we discuss in the following sections.

A. Gromov-Hausdorff distance for asymmetric networks

Relabeling the nodes of a given network $N_X = (X, A_X)$ results in a network $N_Y = (Y, A_Y)$ that is identical from the perspective of the dissimilarity relationships between nodes.
To capture this notion formally, we say that $N_X$ and $N_Y$ are isomorphic whenever there exists a bijective map $\phi : X \to Y$ such that for all points $x, x' \in X$ we have
$$A_X(x, x') = A_Y(\phi(x), \phi(x')). \tag{152}$$
When networks $N_X$ and $N_Y$ are isomorphic we write $N_X \cong N_Y$. The space of networks where all isomorphic networks are represented by a single point is called the space of networks modulo isomorphism and denoted as $\mathcal{N} \bmod \cong$.

To motivate the definition of a distance on the space $\mathcal{N} \bmod \cong$ of networks modulo isomorphism, we start by considering networks $N_X$ and $N_Y$ with the same number of nodes and assume that a bijective transformation $\phi : X \to Y$ is given. It is then natural to define the distortion $\text{dis}(\phi)$ of the map $\phi$ as
$$\text{dis}(\phi) := \max_{(x, x')} \left| A_X(x, x') - A_Y(\phi(x), \phi(x')) \right|. \tag{153}$$
Since different maps $\phi : X \to Y$ are possible, we further focus on those maps $\phi$ that make the networks $N_X$ and $N_Y$ as similar as possible and define the distance $d_\infty$ between networks $N_X$ and $N_Y$ with the same cardinality as
$$d_\infty(N_X, N_Y) := \frac{1}{2} \min_{\phi} \text{dis}(\phi), \tag{154}$$
where the factor $1/2$ is added for consistency with the definition of the Gromov-Hausdorff distance for metric spaces [39, Chapter 7.3]. To generalize (154) to networks that may have different numbers of nodes we consider the notion of correspondence between node sets to take the role of the bijective transformation $\phi$ in (153) and (154). More specifically, for node sets $X$ and $Y$ consider subsets $R \subseteq X \times Y$ of the Cartesian product space $X \times Y$ with elements $(x, y) \in R$. The set $R$ is a correspondence between $X$ and $Y$ if for all $x_0 \in X$ we have at least one element $(x_0, y) \in R$ whose first component is $x_0$, and for all $y_0 \in Y$ we have at least one element $(x, y_0) \in R$ whose second component is $y_0$. The distortion of the correspondence $R$ is defined as
$$\text{dis}(R) := \max_{(x, y), (x', y') \in R} \left| A_X(x, x') - A_Y(y, y') \right|.$$
(155)

In a correspondence $R$ all the elements of $X$ are paired with some point in $Y$ and, conversely, all the elements of $Y$ are paired with some point in $X$. We can then think of $R$ as a mechanism to superimpose the node spaces on top of each other so that no points are orphaned in either $X$ or $Y$. As we did in going from (153) to (154) we now define the distance between networks $N_X$ and $N_Y$ as the distortion associated with the correspondence $R$ that makes $N_X$ and $N_Y$ as close as possible,
$$d_{\mathcal{N}}(N_X, N_Y) := \frac{1}{2} \min_{R} \text{dis}(R) = \frac{1}{2} \min_{R} \max_{(x, y), (x', y') \in R} \left| A_X(x, x') - A_Y(y, y') \right|. \tag{156}$$
Notice that (156) does not necessarily reduce to (154) when the networks have the same number of nodes. Since for networks $N_X$, $N_Y$ with $|X| = |Y|$ correspondences are more general than bijective maps, there may be a correspondence $R$ that results in a distance $d_{\mathcal{N}}(N_X, N_Y)$ smaller than the distance $d_\infty(N_X, N_Y)$.

The definition in (156) is a verbatim generalization of the Gromov-Hausdorff distance in [39, Theorem 7.3.25] except that the dissimilarity functions $A_X$ and $A_Y$ are not restricted to be metrics. It is legitimate to ask whether the relaxation of this condition renders $d_{\mathcal{N}}(N_X, N_Y)$ in (156) an invalid metric. We prove in the following theorem that this is not the case since $d_{\mathcal{N}}(N_X, N_Y)$ becomes a legitimate metric in the space $\mathcal{N} \bmod \cong$ of networks modulo isomorphism.

Theorem 11 The function $d_{\mathcal{N}} : \mathcal{N} \times \mathcal{N} \to \mathbb{R}_+$ defined in (156) is a metric on the space $\mathcal{N} \bmod \cong$ of networks modulo isomorphism. I.e., for all networks $N_X, N_Y, N_Z \in \mathcal{N}$, $d_{\mathcal{N}}$ satisfies the following properties:
Nonnegativity: $d_{\mathcal{N}}(N_X, N_Y) \geq 0$.
Symmetry: $d_{\mathcal{N}}(N_X, N_Y) = d_{\mathcal{N}}(N_Y, N_X)$.
Identity: $d_{\mathcal{N}}(N_X, N_Y) = 0$ if and only if $N_X \cong N_Y$.
Triangle ineq.: $d_{\mathcal{N}}(N_X, N_Y) \leq d_{\mathcal{N}}(N_X, N_Z) + d_{\mathcal{N}}(N_Z, N_Y)$.

Proof: See Appendix F.
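For very small networks, the distance in (156) can be evaluated by exhaustive search over correspondences, which may help build intuition for the definitions in (155) and (156). The sketch below is illustrative only and is not an algorithm from the paper: it enumerates all $2^{|X||Y|}$ subsets of $X \times Y$, so it is practical for a handful of nodes at most. Networks are assumed to be square lists of lists with nodes indexed from 0, and the names `distortion` and `network_distance` are hypothetical.

```python
from itertools import product


def distortion(AX, AY, R):
    """dis(R) as in (155): the largest |A_X(x, x') - A_Y(y, y')|
    over all pairs (x, y), (x', y') in the correspondence R."""
    return max(abs(AX[x][xp] - AY[y][yp]) for (x, y) in R for (xp, yp) in R)


def network_distance(AX, AY):
    """d_N as in (156), by brute force over every correspondence."""
    n, m = len(AX), len(AY)
    pairs = [(x, y) for x in range(n) for y in range(m)]
    best = float("inf")
    for mask in product([False, True], repeat=len(pairs)):
        R = [p for p, keep in zip(pairs, mask) if keep]
        # R is a correspondence only if it covers every node of X and of Y.
        covers_X = {x for x, _ in R} == set(range(n))
        covers_Y = {y for _, y in R} == set(range(m))
        if covers_X and covers_Y:
            best = min(best, distortion(AX, AY, R))
    return best / 2


# A two-node network and a relabeled copy of it are isomorphic,
# so their distance is zero (the Identity property in Theorem 11).
NX = [[0, 1], [2, 0]]
NY = [[0, 2], [1, 0]]
assert network_distance(NX, NY) == 0
```

Because correspondences need not be bijections, the same function also compares networks of different sizes, e.g. a two-node network against a single-node one.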
The guarantee offered by Theorem 11 entails that the space $\mathcal{N} \bmod \cong$ of networks modulo isomorphism endowed with the distance defined in (156) is a metric space. Restriction of (156) to symmetric networks shows that the space $\mathcal{M} \bmod \cong$ of symmetric networks [cf. Section VI-A] modulo isomorphism is also a metric space. Further restriction to metric spaces shows that the space of finite metric spaces modulo isomorphism is properly metric [39, Chapter 7.3]. A final restriction of (156) to finite ultrametric spaces shows that the space $\mathcal{U} \bmod \cong$ of ultrametrics modulo isomorphism is a metric space. As in [13], having a properly defined metric to measure distances between networks in $\mathcal{N}$, and therefore also between ultrametrics in $\mathcal{U} \subset \mathcal{N}$, permits the study of stability of hierarchical clustering methods for asymmetric networks that we undertake in the following section.

Fig. 19. Instability of the method $\mathcal{H}^{\text{R/NR}}(2)$. Some dissimilarities in the network $N_X$ are perturbed by an arbitrarily small $\epsilon$ to obtain $N_Y$ such that the distance between both networks is $\epsilon$. However, the distance between the output ultrametrics cannot be bounded by a multiple of $\epsilon$, violating the definition of stability (157).

B. Stability of clustering methods

Intuitively, a hierarchical clustering method $\mathcal{H}$ is stable if its application to networks that have small distance between each other results in dendrograms that are close to each other. Formally, we require the distance between output ultrametrics to be bounded by the distance between the original networks, as we define next.

(P2) Stability. We say that the clustering method $\mathcal{H} : \mathcal{N} \to \mathcal{U}$ is stable if
$$d_{\mathcal{N}}(\mathcal{H}(N_X), \mathcal{H}(N_Y)) \leq d_{\mathcal{N}}(N_X, N_Y), \tag{157}$$
for all $N_X, N_Y \in \mathcal{N}$.
Remark 10 Note that our definition of a stable hierarchical clustering method $\mathcal{H}$ coincides with the property of $\mathcal{H} : (\mathcal{N}, d_{\mathcal{N}}) \to (\mathcal{U}, d_{\mathcal{N}}|_{\mathcal{U} \times \mathcal{U}})$ being a 1-Lipschitz map between the metric spaces $(\mathcal{N}, d_{\mathcal{N}})$ and $(\mathcal{U}, d_{\mathcal{N}}|_{\mathcal{U} \times \mathcal{U}})$.

Recalling that the space of ultrametrics $\mathcal{U}$ is included in the space of networks $\mathcal{N}$, the distance $d_{\mathcal{N}}(\mathcal{H}(N_X), \mathcal{H}(N_Y))$ in (157) is well defined and endows $\mathcal{U}$ with a metric by Theorem 11. The relationship in (157) means that a stable hierarchical clustering method is a non-expansive map from the metric space of networks endowed with the distance defined in (156) into itself. A particular consequence of (157) is that if networks $N_X$ and $N_Y$ are at small distance $d_{\mathcal{N}}(N_X, N_Y) \leq \epsilon$ of each other, the output ultrametrics of the stable method $\mathcal{H}$ are also at small distance of each other, $d_{\mathcal{N}}(\mathcal{H}(N_X), \mathcal{H}(N_Y)) \leq d_{\mathcal{N}}(N_X, N_Y) \leq \epsilon$. This latter observation formalizes the idea that nearby networks yield nearby dendrograms when processed with a stable hierarchical clustering method.

Notice that the stability definition in (P2) extends to the hierarchical quasi-clustering methods introduced in Section IX, since the space of quasi-ultrametric networks, just like the space of ultrametric networks, is a subset of the space of asymmetric networks. Thus, we begin by showing the stability of the directed single linkage quasi-clustering method $\tilde{\mathcal{H}}^*$. The reason to start the analysis with $\tilde{\mathcal{H}}^*$ is that the proof of the following theorem can be used to simplify the proof of stability of other clustering methods. Theorem 12 below is a generalization of [13, Proposition 26].

Theorem 12 The directed single linkage quasi-clustering method $\tilde{\mathcal{H}}^*$ with outcome quasi-ultrametrics as defined in (7) is stable in the sense of property (P2).

Proof: Given two arbitrary networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$, assume $\eta = d_{\mathcal{N}}(N_X, N_Y)$ and let $R$ be a correspondence between $X$ and $Y$ such that $\text{dis}(R) = 2\eta$.
Write $(X, \tilde{u}_X) = \tilde{\mathcal{H}}^*(N_X)$ and $(Y, \tilde{u}_Y) = \tilde{\mathcal{H}}^*(N_Y)$. Fix $(x, y)$ and $(x', y')$ in $R$. Pick any $x = x_0, x_1, \ldots, x_n = x'$ in $X$ such that $\max_i A_X(x_i, x_{i+1}) = \tilde{u}_X(x, x')$. Choose $y_0, y_1, \ldots, y_n \in Y$ so that $(x_i, y_i) \in R$ for all $i = 0, 1, \ldots, n$. Then, by the definition of $\tilde{u}_Y(y, y')$ in (7) and the definition of $\eta$ in (156):
$$\tilde{u}_Y(y, y') \leq \max_i A_Y(y_i, y_{i+1}) \leq \max_i A_X(x_i, x_{i+1}) + 2\eta = \tilde{u}_X(x, x') + 2\eta. \tag{158}$$
By symmetry, one also obtains $\tilde{u}_X(x, x') \leq \tilde{u}_Y(y, y') + 2\eta$, which combined with (158) implies that
$$|\tilde{u}_X(x, x') - \tilde{u}_Y(y, y')| \leq 2\eta. \tag{159}$$
Since this is true for arbitrary pairs $(x, y)$ and $(x', y') \in R$, it must be true for the maximum as well. Moreover, $R$ need not be the minimizing correspondence for the distance between the networks $(X, \tilde{u}_X)$ and $(Y, \tilde{u}_Y)$. However, it suffices to obtain an upper bound, implying that
$$d_{\mathcal{N}}((X, \tilde{u}_X), (Y, \tilde{u}_Y)) \leq \frac{1}{2} \max_{(x, y), (x', y') \in R} |\tilde{u}_X(x, x') - \tilde{u}_Y(y, y')| \leq \eta = d_{\mathcal{N}}(N_X, N_Y), \tag{160}$$
concluding the proof.

Moving into the realm of clustering methods, we show that semi-reciprocal methods $\mathcal{H}^{\text{SR}(t)}$ are stable in the sense of property (P2) in the following theorem.

Theorem 13 The semi-reciprocal clustering method $\mathcal{H}^{\text{SR}(t)}$ with outcome ultrametrics as defined in (98) is stable in the sense of property (P2) for every integer $t \geq 2$.

The following lemma is used to prove Theorem 13.

Lemma 3 Given $a, \bar{a}, b, \bar{b}, c \in \mathbb{R}_+$ such that $|a - b| \leq c$ and $|\bar{a} - \bar{b}| \leq c$, then $|\max(a, \bar{a}) - \max(b, \bar{b})| \leq c$.

Proof: Begin by noticing that
$$a = |a - b + b| \leq |a - b| + |b| = |a - b| + b, \tag{161}$$
and similarly for $\bar{a}$ and $\bar{b}$. Thus, we may write
$$\max(a, \bar{a}) \leq \max(|a - b| + b, |\bar{a} - \bar{b}| + \bar{b}).$$
(162)

By using the bounds assumed in the statement of the lemma, we obtain
$$\max(a, \bar{a}) \leq \max(c + b, c + \bar{b}) = c + \max(b, \bar{b}). \tag{163}$$
By applying the same reasoning but starting with $\max(b, \bar{b})$, we obtain that
$$\max(b, \bar{b}) \leq c + \max(a, \bar{a}). \tag{164}$$
Finally, by combining (163) and (164) we obtain the result stated in the lemma.

Proof of Theorem 13: Here we present the proof for $t = 2$ in order to illustrate the main conceptual steps, which are similar to those in the proof of Proposition 26 of [13]. The general proof for any $t \geq 2$ can be found in Appendix F.

Recall that from (100), we know that $\mathcal{H}^{\text{SR}(2)} \equiv \mathcal{H}^{\text{R}}$. Given two networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$ denote by $(X, u^{\text{R}}_X) = \mathcal{H}^{\text{R}}(N_X)$ and $(Y, u^{\text{R}}_Y) = \mathcal{H}^{\text{R}}(N_Y)$ the outputs of applying the reciprocal clustering method to such networks. Let $\eta = d_{\mathcal{N}}(N_X, N_Y)$ be the distance between $N_X$ and $N_Y$ as defined by (156) and $R$ be an associated minimizing correspondence such that
$$|A_X(x, x') - A_Y(y, y')| \leq 2\eta, \tag{165}$$
for all $(x, y), (x', y') \in R$. By reversing the order of $(x, y)$ and $(x', y')$ we obtain that
$$|A_X(x', x) - A_Y(y', y)| \leq 2\eta. \tag{166}$$
From (165), (166), and the definition $\bar{A}_X(x, x') = \max(A_X(x, x'), A_X(x', x))$ for all $x, x' \in X$, we obtain from Lemma 3 that
$$|\bar{A}_X(x, x') - \bar{A}_Y(y, y')| \leq 2\eta, \tag{167}$$
for all $(x, y), (x', y') \in R$. By using the same argument applied in the proof of Theorem 12 to go from (159) to (160), we obtain that
$$d_{\mathcal{N}}((X, \bar{A}_X), (Y, \bar{A}_Y)) \leq d_{\mathcal{N}}(N_X, N_Y). \tag{168}$$
By comparing (7) with (61) (or equivalently, in terms of algorithms, by comparing (134) with (108)) it follows that
$$(X, u^{\text{R}}_X) = \tilde{\mathcal{H}}^*(X, \bar{A}_X), \tag{169}$$
and similarly for $(Y, u^{\text{R}}_Y)$. However, since $\tilde{\mathcal{H}}^*$ is stable by Theorem 12, we obtain that
$$d_{\mathcal{N}}((X, u^{\text{R}}_X), (Y, u^{\text{R}}_Y)) \leq d_{\mathcal{N}}((X, \bar{A}_X), (Y, \bar{A}_Y)), \tag{170}$$
which combined with (168) completes the proof.
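The chain of inequalities (165)-(170) can be checked numerically in the special case where both networks share the same node set and $R$ is taken to be the identity correspondence; the argument behind (158)-(159) and Lemma 3 holds for any correspondence with $2\eta$ replaced by its distortion, so the entrywise gap between the reciprocal ultrametrics cannot exceed the entrywise gap between the dissimilarities. The sketch below is illustrative only; the function names and the two small networks are assumptions introduced here, and reciprocal clustering is computed through (169) as a min-max matrix power of the max-symmetrized matrix.

```python
def minmax_power(A):
    """(n-1)st power of A in the (min, max) dioid; A must have zero diagonal."""
    n = len(A)
    U = [row[:] for row in A]
    for _ in range(n - 2):
        U = [[min(max(U[i][k], A[k][j]) for k in range(n)) for j in range(n)]
             for i in range(n)]
    return U


def reciprocal_ultrametric(A):
    """Reciprocal clustering via (169): single linkage on max(A, A^T)."""
    n = len(A)
    A_bar = [[max(A[i][j], A[j][i]) for j in range(n)] for i in range(n)]
    return minmax_power(A_bar)


def identity_gaps(AX, AY):
    """Entrywise sup-norm gaps between inputs and between outputs,
    i.e. the distortions of the identity correspondence."""
    n = len(AX)
    uX, uY = reciprocal_ultrametric(AX), reciprocal_ultrametric(AY)
    idx = [(i, j) for i in range(n) for j in range(n)]
    in_gap = max(abs(AX[i][j] - AY[i][j]) for i, j in idx)
    out_gap = max(abs(uX[i][j] - uY[i][j]) for i, j in idx)
    return in_gap, out_gap


# Hypothetical network and a perturbed copy (perturbations of size 0.5).
AX = [[0, 1, 4], [3, 0, 2], [5, 6, 0]]
AY = [[0, 1.5, 4], [3, 0, 2.5], [4.5, 6, 0]]
in_gap, out_gap = identity_gaps(AX, AY)
assert out_gap <= in_gap  # the output gap never exceeds the input gap
```

This only bounds the distortion of one particular correspondence; the statement in (157) concerns the minimum over all correspondences on both sides.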
Reciprocal clustering is a particular case of semi-reciprocal clustering for $t = 2$. Moreover, given any network, nonreciprocal clustering behaves as a semi-reciprocal clustering method for some large enough $t$, which by Theorem 13 is a stable method. It thus follows that these two methods are stable. This result is of sufficient merit to be stated separately in the following corollary.

Corollary 3 The reciprocal and nonreciprocal clustering methods $\mathcal{H}^{\text{R}}$ and $\mathcal{H}^{\text{NR}}$ with output ultrametrics given as in (61) and (67), respectively, are stable in the sense of property (P2).

By (157), Theorem 13 shows that semi-reciprocal clustering methods (subsuming the particular cases of reciprocal and nonreciprocal clustering) do not expand distances between pairs of input networks and their corresponding output networks. In particular, for any of the above methods, nearby networks yield nearby dendrograms. This is important when we consider noisy dissimilarity data. Property (P2) ensures that noise has limited effect on output dendrograms.

Remark 11 Theorem 13 notwithstanding, not all methods that are admissible with respect to axioms (A1) and (A2) are stable. For example, the admissible grafting method $\mathcal{H}^{\text{R/NR}}(\beta)$ introduced in Section VII-A does not abide by (P2). To see this, fix $\beta = 2$ and turn attention to the networks $N_X$ and $N_Y$ shown in Fig. 19, where $\epsilon > 0$. For network $N_X$ we have $u^{\text{NR}}_X(x, x') = 1$ and $u^{\text{R}}_X(x, x') = 2$ for all pairs $x, x'$. Since $u^{\text{R}}_X(x, x') = \beta = 2$ for all $x, x'$, the top condition in definition (92) is active and we have $u^{\text{R/NR}}_X(x, x'; 2) = u^{\text{NR}}_X(x, x') = 1$, leading to the top dendrogram in Fig. 19. For the network $N_Y$ we have that $u^{\text{R}}_Y(y, y') = 2 + \epsilon > 2 = \beta$ for all $y, y'$. Thus, the bottom condition in definition (92) is active and we have $u^{\text{R/NR}}_Y(y, y'; 2) = u^{\text{R}}_Y(y, y') = 2 + \epsilon$ for all $y, y'$.
Given the symmetry in the original network and the output ultrametrics, the correspondence $R$ with $(x_i, y_i) \in R$ for $i = 1, 2, 3$ is an optimal correspondence in the definition in (156). It then follows that
$$d_{\mathcal{N}}(\mathcal{H}^{\text{R/NR}}(N_X; 2), \mathcal{H}^{\text{R/NR}}(N_Y; 2)) = 1 + \epsilon > d_{\mathcal{N}}(N_X, N_Y) = \epsilon. \tag{171}$$
Comparing (171) with (157) we conclude that methods in the grafting family $\mathcal{H}^{\text{R/NR}}(\beta)$ are in general not stable in the sense of property (P2). This observation concurs with our intuition on instability: a small perturbation in the original data results in a large variation in the output ultrametrics. The discontinuity in the grafting method $\mathcal{H}^{\text{R/NR}}(\beta)$ arises due to the switching between reciprocal and nonreciprocal ultrametrics implied by (92).

Remark 12 The same tools used in the proofs of Theorems 12 and 13 can be used to show that the unilateral clustering method $\mathcal{H}^{\text{U}}$ introduced in Section X-A is stable. Moreover, convex combination methods introduced in Section VII-B need not be stable in general, even when the methods combined are stable. Nevertheless, it can be shown that the combination of any two of the stable methods described in this section is also stable. However, the respective proofs are omitted to avoid repetition.

XII. APPLICATIONS

We apply the hierarchical clustering and quasi-clustering methods developed throughout the paper to determine dendrograms and quasi-dendrograms for two asymmetric network datasets. In Section XII-A we analyze the internal migration network between states of the United States (U.S.) for the year 2011. In Section XII-B we analyze the network of interactions between sectors of the U.S. economy for the same year.

A. Internal migration between states of the United States

The number of migrants from state to state, including the District of Columbia (DC) as a separate entity, is published yearly by the geographical mobility section of the U.S. census bureau [46].
We denote by $S$, with cardinality $|S| = 51$, the set containing every state plus DC, and by $M : S \times S \to \mathbb{R}_+ \cup \{+\infty\}$ the migration flow similarity function given by the U.S. Census Bureau in which, for all $s, s' \in S$, $M(s, s')$ is the number of individuals that migrated from state $s$ to state $s'$ and $M(s, s) = +\infty$. We then construct the asymmetric network $N_S = (S, A_S)$ with node set $S$ and dissimilarities $A_S$ such that $A_S(s, s) = 0$ for all $s \in S$ and

$$ A_S(s, s') := f\left( \frac{M(s, s')}{\sum_{t} M(t, s')} \right), \tag{172} $$

for all $s \neq s' \in S$, where $f : [0, 1) \to \mathbb{R}_{++}$ is a given decreasing function (to be specified below). The normalization $M(s, s') / \sum_t M(t, s')$ in (172) can be interpreted as the probability that an immigrant to state $s'$ comes from state $s$. The role of the decreasing function $f$ is to transform the similarities $M(s, s') / \sum_t M(t, s')$ into corresponding dissimilarities. For the experiments here we use $f(x) = 1 - x$. Dissimilarities $A_S(s, s')$ focus attention on the composition of migration flows rather than on their magnitude. A small dissimilarity from state $s$ to state $s'$ implies that a high percentage of the immigrants into $s'$ come from $s$. E.g., if 85% of the immigration into $s'$ comes from $s$, then $A_S(s, s') = 1 - 0.85 = 0.15$. The application of hierarchical clustering to migration data has been extensively investigated by Slater, see [27], [30].

1) Reciprocal clustering $\mathcal{H}^{R}$: The outcome of applying the reciprocal clustering method $\mathcal{H}^{R}$ defined in (61) to the migration network $N_S$ was computed with the algorithmic formula in (108). The resulting output dendrogram is shown in Fig. 20-(a). Figs. 20-(b) through 20-(e) illustrate the partitions that are obtained at four representative resolutions $\delta^R_1 = 0.895$, $\delta^R_2 = 0.921$, $\delta^R_3 = 0.933$, and $\delta^R_4 = 0.947$.
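The pipeline just described (flows, then dissimilarities via (172), then a reciprocal ultrametric, then partitions at chosen resolutions) can be sketched on a toy example. This is a minimal illustration, not the authors' implementation. The three-state flow matrix is invented, the function names are hypothetical, and the sum in (172) is assumed to run over origins $t \neq s'$. The reciprocal ultrametric is written as a min-max matrix power of the max-symmetrized dissimilarities, which matches the description of bidirectional chains in the text but has not been checked against formula (108), which lies outside this section.

```python
import numpy as np

def flows_to_dissimilarities(M):
    """Eq. (172) with f(x) = 1 - x. Assumption: the sum over origins t excludes t = s'."""
    M = np.asarray(M, dtype=float).copy()
    np.fill_diagonal(M, 0.0)          # ignore within-state entries
    inflow = M.sum(axis=0)            # total immigration into each destination s'
    A = 1.0 - M / inflow              # column j is divided by inflow[j]
    np.fill_diagonal(A, 0.0)          # A_S(s, s) = 0
    return A

def minmax_product(A, B):
    """Matrix product in the min-max dioid: C[i, j] = min_k max(A[i, k], B[k, j])."""
    return np.min(np.maximum(A[:, :, None], B[None, :, :]), axis=1)

def reciprocal_ultrametric(A):
    """Cheapest bidirectional chain cost: (n-1)-th min-max power of max(A, A^T)."""
    S = np.maximum(A, A.T)
    U = S.copy()
    for _ in range(A.shape[0] - 2):
        U = minmax_product(U, S)
    return U

def clusters_at(U, delta):
    """Partition obtained by cutting the dendrogram of ultrametric U at resolution delta."""
    n = U.shape[0]
    return sorted({tuple(int(j) for j in np.flatnonzero(U[i] <= delta)) for i in range(n)})

# Hypothetical flows: M[s, t] is the number of people moving from state s to state t.
M = [[0, 85, 10],
     [30, 0, 90],
     [70, 15, 0]]
A = flows_to_dissimilarities(M)   # A[0, 1] is 0.15: 85% of inflow into state 1 comes from state 0
U = reciprocal_ultrametric(A)
print(clusters_at(U, 0.75))       # states 0 and 1 are co-clustered, state 2 is a singleton
print(clusters_at(U, 0.90))       # all three states form one cluster
```

In this toy network states 0 and 1 merge first because the weaker of their two directed flows is still stronger than the weaker flow of any other pair, which is the bidirectional requirement that the reciprocal method imposes.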
States marked with the same color other than white are co-clustered at the given resolution, whereas states in white are singleton clusters. For a given $\delta$, states that are clustered together in partitions produced by $\mathcal{H}^{R}$ are those connected by a chain of intense bidirectional migration flows in the sense dictated by the resolution under consideration.

The most definite pattern arising from Fig. 20 is that migration is highly correlated with geographical proximity. With the exceptions of California, Florida, and Texas, which we discuss below, all states merge into clusters with other neighboring states. In particular, the first non-singleton clusters to form are pairs of neighboring states that join together at resolutions smaller than $\delta^R_1$, with the exception of one cluster formed by three states (New York, New Jersey, and Pennsylvania), as shown in Fig. 20-(b). In ascending order of the resolutions at which they are formed, these pairs are Minnesota and Wisconsin (green, at resolution $\delta = 0.836$), Oregon and Washington (orange, at resolution $\delta = 0.860$), Kansas and Missouri (purple, at resolution $\delta = 0.860$), District of Columbia and Maryland (turquoise, at resolution $\delta = 0.880$), as well as Illinois and Indiana (red, at resolution $\delta = 0.891$). In the group of three states composed of New Jersey, New York, and Pennsylvania, we observe that New York and New Jersey form a cluster (blue) at a smaller resolution ($\delta = 0.853$) than the one at which they merge with Pennsylvania ($\delta = 0.859$). The formation of these clusters can be explained by the fact that these states share respective metropolitan areas.
These areas are Minneapolis and Duluth for Minnesota and Wisconsin, Portland for Oregon and Washington, Kansas City for Kansas and Missouri, Washington for the District of Columbia and Maryland, Chicago for Illinois and Indiana, New York City for New York State and New Jersey, as well as Philadelphia for Pennsylvania and New Jersey. Even while crossing state lines, migration within shared metropolitan areas corresponds to people moving to different neighborhoods or suburbs, and it occurs frequently enough to suggest that it is the reason behind the clusters formed at low resolutions in the reciprocal dendrogram.

Fig. 20. (a) Reciprocal dendrogram. Output of clustering method $\mathcal{H}^{R}$ when applied to the migration network $N_S$. (b) Clusters at resolution $\delta^R_1$. States that share urban metropolitan areas merge together first. States in white form singleton clusters at this resolution. (c) Clusters at resolution $\delta^R_2$. Clusters are highly determined by geographical proximity except for Texas and Florida. (d) Clusters at resolution $\delta^R_3$. The two coasts form separate clusters. (e) Clusters at resolution $\delta^R_4$. Most of the nation forms a single cluster. Observe New England's relative isolation.

As we continue to increase the resolution, clusters formed by pairs of neighboring states continue to appear and a few clusters with multiple states emerge. At resolution $\delta^R_2$, shown in Fig. 20-(c), clusters with two adjacent states include Louisiana and Mississippi, Iowa and Nebraska, and Idaho and Utah. Kentucky and Tennessee join Illinois and Indiana to form a midwestern cluster, while Maine, Massachusetts, and New Hampshire form a cluster of New England states. The only two exceptions to geographic proximity appear at this resolution.
These exceptions are the merging of Florida into the northeastern cluster formed by New Jersey, Pennsylvania, and New York, due to its closeness with the latter, and the formation of a cluster made of California and Texas. This anomaly occurs among the four states with the most intense outgoing and incoming migration in the country during 2011. The data analyzed shows that people move from all over the United States to New York, California, Texas, and Florida. For instance, Texas has the lowest standard deviation in the proportion of immigrants from each other state, indicating a homogeneous migration flow from the whole country. Hence, the proportion of incoming migration from neighboring states is not as significant as for other states. E.g., only 19% of the migration into California comes from its three neighboring states whereas for North Dakota, which also has three neighboring states, these provide 45% of its immigration. Based on the data, we observe that New York, California, Texas, and Florida have a strong influence on the immigration into their neighboring states but, given the mechanics of $\mathcal{H}^{R}$, the lack of influence in the opposite direction is the reason why Texas joins California and Florida joins New York before forming a cluster with their neighbors. If we require only unidirectional influence as in Section XII-A4, then these four states first join their neighboring states, as observed in Fig. 22.

Higher resolutions see the appearance of three regional clusters in the Atlantic Coast, Midwest, and New England, as well as a cluster composed of the West Coast states plus Texas. This is illustrated in Fig. 20-(d) for resolution $\delta^R_3$. This points towards the fact that people living in a coastal state have a preference to move within the same coast, that people in the Midwest tend to stay in the Midwest, and that New Englanders tend to stay in New England. At larger resolutions states start collapsing into a single cluster.
At resolution $\delta^R_4$, shown in Fig. 20-(e), all states except those in New England and the Mountain West, along with Alaska, Arkansas, Delaware, West Virginia, Hawaii, and Oklahoma, are part of a single cluster. The New England cluster includes all six New England states, which shows a remarkable degree of migrational isolation with respect to the rest of the country. This indicates that people living in New England tend to move within the region, that people outside New England rarely move into the area, or both. The same observation can be made of the pairs Arkansas-Oklahoma and Idaho-Utah. The latter could be partially attributed to the fact that Idaho and Utah are the two states with the highest percentage of Mormon population in the country [47]. Four states in the Mountain West (New Mexico, Colorado, Wyoming, and Montana) as well as Delaware, West Virginia, Hawaii, and Alaska stay as singleton clusters. Hawaii and Alaska are, respectively, the next-to-last and last states to merge with the rest of the nation, further adding evidence to the correlation between geographical proximity and migration clustering.

2) Nonreciprocal clustering $\mathcal{H}^{NR}$: The outcome of applying the nonreciprocal clustering method $\mathcal{H}^{NR}$ defined in (67) to the migration network $N_S$ is computed with the algorithmic formula in (109). The resulting output dendrogram is shown in Fig. 21. Comparing the reciprocal and nonreciprocal dendrograms in figs. 20-(a) and 21 shows that the nonreciprocal clustering method merges any pair of states into a common cluster at a resolution not higher than the resolution at which they are co-clustered by reciprocal clustering. This is as it should be because the uniform dominance of nonreciprocal ultrametrics by reciprocal ultrametrics holds for all networks [cf. (68)]. E.g., for the reciprocal method, Colorado and Florida become part of the same cluster at resolution $\delta = 0.954$ whereas for the nonreciprocal case they become part of the same cluster at resolution $\delta = 0.939$. The nonreciprocal resolution need not be strictly smaller; for example, Illinois and Tennessee are merged by both clustering methods at resolution $\delta = 0.920$.

Further observe that there are many striking similarities between the reciprocal and nonreciprocal dendrograms in figs. 20-(a) and 21. In both dendrograms, the first three clusters to emerge are the pair Minnesota and Wisconsin (at resolution $\delta = 0.836$), followed by the pair New York and New Jersey (at resolution $\delta = 0.853$), which are in turn co-clustered with Pennsylvania at resolution $\delta = 0.859$. We then see the emergence of the four pairs: Oregon and Washington (at resolution $\delta = 0.860$), Kansas and Missouri (at resolution $\delta = 0.860$), District of Columbia and Maryland (at resolution $\delta = 0.880$), and Illinois and Indiana (at resolution $\delta = 0.891$). These are the same seven groupings and resolutions at which clusters form in the reciprocal dendrogram that we attributed to the existence of shared metropolitan areas spanning more than one state [cf. Fig. 20-(b)].

Recall that the difference between the reciprocal and nonreciprocal clustering methods $\mathcal{H}^{R}$ and $\mathcal{H}^{NR}$ is that the latter allows influence to propagate through cycles whereas the former requires direct bidirectional influence for the formation of a cluster. In the particular case of the migration network $N_S$ this means that nonreciprocal clustering may be able to detect migration cycles of arbitrary length that are overlooked by reciprocal clustering. E.g., if people in state $A$ tend to move predominantly to $B$, people in $B$ to move predominantly to $C$, and people in $C$ to move predominantly to $A$, nonreciprocal clustering merges these three states according to this migration cycle but reciprocal clustering does not. The overall similarity of the reciprocal and nonreciprocal dendrograms in figs.
20-(a) and 21 suggests that migration cycles are rare in the United States. In particular, the formation of the seven clusters due to shared metropolitan areas indicates that the bidirectional migration flow between these pairs of states is higher than any migration cycle in the country. Notice that highly symmetric data would also correspond to similar reciprocal and nonreciprocal dendrograms. Nevertheless, another consequence of highly symmetric data would be to obtain a unilateral dendrogram similar to the reciprocal and the nonreciprocal ones. This is not the case, as can be seen in Section XII-A4; thus, symmetry cannot be the reason for the similarity observed between the reciprocal and nonreciprocal dendrograms.

However similar, the reciprocal and nonreciprocal dendrograms in figs. 20-(a) and 21 are not identical. E.g., the last state to merge with the rest of the country in the reciprocal dendrogram is Alaska at resolution $\delta = 0.975$, whereas the last state to merge in the nonreciprocal dendrogram is Montana at resolution $\delta = 0.962$, with Alaska joining the rest of the country at resolution $\delta = 0.948$. Given the mechanics of $\mathcal{H}^{NR}$, this must occur due to the existence of a cycle of migration involving Alaska which is stronger than the bidirectional exchange between Alaska and any other state, and direct analysis of the data confirms this fact.

As we have argued, the areas of the country that cluster together when applying the nonreciprocal method are similar to the ones depicted in Fig. 20-(d) for the reciprocal clustering method. When we cut the nonreciprocal dendrogram in Fig. 21 at resolution $\delta = 0.930$, three major clusters arise, highlighted in green, red, and orange in the dendrogram in Fig. 21. The green cluster corresponds to the exact same block containing the West Coast plus Texas that arises in the reciprocal dendrogram and is depicted in purple in Fig. 20-(d).
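The contrast between direct bidirectional influence and influence through cycles can be checked on the three-state cycle $A \to B \to C \to A$ mentioned in the text. The sketch below is an illustration under stated assumptions. The dissimilarity values 0.2 and 0.9 are invented, the function names are hypothetical, and both ultrametrics are written as min-max matrix powers following the verbal description of the methods rather than formulas (108) and (109), which are not reproduced in this section.

```python
import numpy as np

def minmax_product(A, B):
    """Matrix product in the min-max dioid: C[i, j] = min_k max(A[i, k], B[k, j])."""
    return np.min(np.maximum(A[:, :, None], B[None, :, :]), axis=1)

def minmax_power(A, k):
    """k-th power of A in the min-max dioid (k >= 1)."""
    P = A.copy()
    for _ in range(k - 1):
        P = minmax_product(P, A)
    return P

def reciprocal(A):
    """Chains must be cheap in both directions link by link: symmetrize first, then chain."""
    return minmax_power(np.maximum(A, A.T), A.shape[0] - 1)

def nonreciprocal(A):
    """Directed chains may differ in each direction: chain first, then symmetrize."""
    P = minmax_power(A, A.shape[0] - 1)
    return np.maximum(P, P.T)

# Hypothetical cycle A -> B -> C -> A: small dissimilarity along the cycle, large against it.
lo, hi = 0.2, 0.9
A = np.array([[0.0, lo, hi],
              [hi, 0.0, lo],
              [lo, hi, 0.0]])

u_r = reciprocal(A)       # every pair merges only at resolution 0.9
u_nr = nonreciprocal(A)   # every pair merges at resolution 0.2, thanks to the cycle
print(u_r)
print(u_nr)
```

The nonreciprocal values never exceed the reciprocal ones, in line with the dominance relation cited from (68). Dendrograms that look alike, as in figs. 20-(a) and 21, therefore indicate that configurations like this toy cycle are uncommon in the data.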
The red cluster in the dendrogram corresponds to the East Coast cluster found with the reciprocal method, with the exception that Alabama is not included. However, Alabama joins this block at a slightly higher resolution of $\delta = 0.931$, coinciding with the merging of the green, red, and orange clusters. The orange cluster in the nonreciprocal dendrogram corresponds to the Midwest cluster found in Fig. 20-(d). However, in contrast with the reciprocal case, Michigan and Ohio join the Midwest cluster before Minnesota, Wisconsin, and North Dakota. For the nonreciprocal case, these last three states join the main cluster at resolution $\delta = 0.933$, after the East Coast, West Coast, and Midwest become a single block.

The migrational isolation of New England with respect to the rest of the country, which we observed in reciprocal clustering, also arises in the nonreciprocal case. The New England cluster is depicted in blue in the nonreciprocal dendrogram in Fig. 21 and joins the main cluster at a resolution of $\delta = 0.946$, which coincides with the merging resolution for the reciprocal case. However, the order in which states become part of the New England cluster varies. In the nonreciprocal case, Connecticut merges with the cluster of Maine-Massachusetts-New Hampshire at resolution $\delta = 0.926$, before Rhode Island, which merges at resolution $\delta = 0.927$. For the reciprocal case, however, Rhode Island still merges at the same resolution but Connecticut merges after this, at resolution $\delta = 0.933$.

Fig. 21. Nonreciprocal dendrogram. Dendrogram obtained when applying the nonreciprocal method $\mathcal{H}^{NR}$ to the state-to-state migration network $N_S$. The resemblance with the dendrogram in Fig. 20-(a) indicates that migration cycles are not ubiquitous.
The reason for this is that in the reciprocal case, the states of Connecticut and Rhode Island merge with the cluster Maine-Massachusetts-New Hampshire at the resolution where there exist bidirectional flows with the state of Massachusetts. In the nonreciprocal case, this same situation applies for Rhode Island, but from the data it can be inferred that Connecticut joins the mentioned cluster at a lower resolution due to a migration cycle composed of the chain [Connecticut, Maine, New Hampshire, Massachusetts, Connecticut].

Up to this point we see that all the conclusions that we have extracted when applying $\mathcal{H}^{NR}$ are qualitatively similar to those obtained when applying $\mathcal{H}^{R}$. This is not surprising because the differences between the reciprocal and nonreciprocal dendrograms either occur at coarse resolutions or are relatively small. In fact, one should expect any conclusion stemming from the application of $\mathcal{H}^{R}$ and $\mathcal{H}^{NR}$ to the migration network $N_S$ to be qualitatively similar.

3) Intermediate methods: From Theorem 4 we know that any clustering method satisfying the axioms of value and transformation, when applied to the migration network $N_S$, yields an outcome dendrogram such that the resolution at which any pair of states merges into a common cluster is bounded by the resolutions at which the same pair of states is co-clustered in the dendrograms resulting from application of the nonreciprocal and reciprocal clustering methods. Given the similar conclusions obtained upon analysis of the reciprocal and nonreciprocal clustering outputs, we can assert that any other hierarchical clustering method satisfying the axioms of value and transformation would lead to similar conclusions. In particular, this is true for the intermediate methods described in Section VII and the algorithmic intermediate of Section VIII.
4) Unilateral clustering $\mathcal{H}^{U}$: The outcome of applying the unilateral clustering method $\mathcal{H}^{U}$ defined in (147) to the migration network $N_S$ is computed with the algorithmic formula in (149). The resulting output dendrogram is shown in Fig. 22-(a). The colors in the dendrogram correspond to the clusters formed at resolution $\delta^U_1 = 0.872$, which are also shown in the map in Fig. 22-(b) with the same color code. States shown in black in Fig. 22-(a) and white in Fig. 22-(b) are singleton clusters at this resolution. In Fig. 22-(c) we show the two clusters that appear when the unilateral dendrogram is cut at resolution $\delta^U_2 = 0.896$. States that are clustered together in unilateral partitions are those connected by a chain of intense unidirectional migration flows in the sense dictated by the resolution under consideration.

In unilateral clustering, the relation between geographical proximity and tendency to form clusters is even more determinant than in reciprocal and nonreciprocal clustering [cf. Sections XII-A1 and XII-A2], since the exceptions of Texas, California, and Florida do not occur in this case. Indeed, California first merges with Nevada at resolution $\delta = 0.637$, Texas with Louisiana at $\delta = 0.694$, and Florida with Alabama at $\delta = 0.830$, the three pairs of states being neighbors. Moreover, from Fig. 22-(b) it is immediate that at resolution $\delta^U_1$ every non-singleton cluster is formed by a set of neighboring states.

Recall that unilateral clustering $\mathcal{H}^{U}$ abides by the alternative axioms of value and transformation (A1'')-(A2), in contrast to the (regular) axioms of value and transformation satisfied by reciprocal $\mathcal{H}^{R}$ and nonreciprocal $\mathcal{H}^{NR}$ clustering. Consequently, unidirectional influence is enough for the formation of a cluster. In the particular case of the migration network $N_S$ this means that unilateral clustering may detect one-way migration flows that are overlooked by reciprocal and nonreciprocal clustering.
E.g., if people in state $A$ tend to move to $B$ but people in $B$ rarely move to $A$, either directly or through intermediate states, unilateral clustering merges these two states according to the intense one-way flow from $A$ to $B$, but reciprocal and nonreciprocal clustering do not. The differences between the unilateral dendrogram in Fig. 22-(a) and the reciprocal and nonreciprocal dendrograms in figs. 20-(a) and 21 indicate that migration flows which are intense in one direction but not in the other are common. E.g., the first two states to merge in the unilateral dendrogram in Fig. 22-(a) are Massachusetts and New Hampshire at resolution $\delta = 0.580$ because, of all the people that moved into New Hampshire, 42% came from Massachusetts, this being the highest value in the whole country. The flow in the direction from New Hampshire to Massachusetts is lower: only 9% of the immigrants entering the latter come from the former. This is the reason why these two states are not the first to merge in the reciprocal and nonreciprocal dendrograms. In these previous cases, Minnesota and Wisconsin were the first to merge because the relative flow in both directions is 16% and 19%.

Fig. 22. Unilateral clustering of state-to-state migration network. (a) Dendrogram output of applying the unilateral clustering method $\mathcal{H}^{U}$ to the network of state-to-state migration $N_S$. Clusters at resolution $\delta^U_1 = 0.872$ are highlighted in color. (b) Highlighted clusters are identified in a map. Clusters tend to form around highly populated states. (c) Map colored according to the partition at resolution $\delta^U_2 = 0.896$. Two clear clusters, east and west, arise.

Unilateral clusters tend to form around populous states. In Fig.
22-(b), the six clusters with more than two states contain the seven states with the largest population (California, Texas, New York, Florida, Illinois, Pennsylvania, and Ohio [46]), one in each cluster except for the blue one, which contains New York and Pennsylvania. The data suggests that the reason for this is that populous states have a strong influence on the immigration into neighboring states. Indeed, if we focus on the cyan cluster formed around Texas, the proportional immigration into Louisiana, New Mexico, Oklahoma, and Arkansas coming from Texas is 31%, 22%, 29%, and 21%, respectively. The opposite is not true, since the immigration into Texas from the four aforementioned neighboring states is 5%, 3%, 4%, and 3%, respectively. However, this flow in the opposite direction is not required for unilateral clustering to merge the states into one cluster. Between two states with large population, the immigration is more balanced in both directions, and these states thus merge at high resolutions in the unilateral dendrogram. E.g., 11% of the immigration into Texas comes from California and 8% in the opposite direction.

Unilateral clustering detects an east-west division of migration flows in the United States. The last merging in the unilateral dendrogram occurs at resolution $\delta = 0.8958$ and just below the merging resolution, e.g. at resolution $\delta^U_2$, there are two clusters, east and west, corresponding to the ones depicted in Fig. 22-(c). The cut at $\delta^U_2$ corresponds to a migrational flow of 10.45%. This implies that for any two different states within the same cluster we can find a unilateral chain where every flow is at least 10.45%. More interestingly, there is no pair of states, one from the east and one from the west, with a flow of 10.45% or more in any direction.
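The effect of requiring only one-way influence can be illustrated with the Massachusetts and New Hampshire shares quoted above (42% and 9%), treated here as an isolated two-node network. This is a sketch under stated assumptions. The function names are hypothetical, isolating the two states is a simplification for illustration, and the unilateral ultrametric is written as a min-max matrix power of the min-symmetrized dissimilarities, following the verbal description of unilateral chains rather than formula (149), which is outside this section.

```python
import numpy as np

def minmax_product(A, B):
    """Matrix product in the min-max dioid: C[i, j] = min_k max(A[i, k], B[k, j])."""
    return np.min(np.maximum(A[:, :, None], B[None, :, :]), axis=1)

def minmax_power(A, k):
    """k-th power of A in the min-max dioid (k >= 1)."""
    P = A.copy()
    for _ in range(k - 1):
        P = minmax_product(P, A)
    return P

def unilateral(A):
    """Each link of a chain only needs to be cheap in one direction."""
    return minmax_power(np.minimum(A, A.T), A.shape[0] - 1)

def reciprocal(A):
    """Each link of a chain must be cheap in both directions."""
    return minmax_power(np.maximum(A, A.T), A.shape[0] - 1)

# Node 0 = Massachusetts, node 1 = New Hampshire, with f(x) = 1 - x applied to the quoted shares.
A = np.array([[0.0, 1 - 0.42],    # 42% of the immigration into NH comes from MA
              [1 - 0.09, 0.0]])   # 9% of the immigration into MA comes from NH

u_u = unilateral(A)   # merge at the smaller dissimilarity, about 0.58
u_r = reciprocal(A)   # merge at the larger dissimilarity, about 0.91
print(u_u[0, 1], u_r[0, 1])
```

The unilateral merging resolution of about 0.58 agrees with the value $\delta = 0.580$ reported for this pair, while a method that requires influence in both directions cannot merge the pair before the weaker 9% flow is admitted.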
5) Directed single linkage quasi-clustering $\tilde{\mathcal{H}}^{*}$: The outcome of applying the directed single linkage quasi-clustering method $\tilde{\mathcal{H}}^{*}$ with output quasi-ultrametrics defined in (7) to the migration network $N_S$ is computed with the algorithmic formula in (134). In figs. 23 and 24 we show some quasi-partitions of the output quasi-dendrogram $\tilde{D}^*_S = (D^*_S, E^*_S)$, focusing on New England and an extended West Coast including Arizona and Nevada. States represented with the same color are part of the same cluster at the given resolution and states in white form singleton clusters. Arrows between clusters for a given resolution $\delta$ represent the edge set $E^*_S(\delta)$ for resolution $\delta$. The resolutions $\delta$ at which quasi-partitions are shown in figs. 23 and 24 correspond to those 0.001 smaller than those at which mergings in the dendrogram component $D^*_S$ of the output quasi-dendrogram $\tilde{D}^*_S$ occur or, in the case of the last map in each figure, correspond to the resolution of the last merging in the region shown. E.g., in Fig. 24 Oregon and Washington merge at resolution $\delta = 0.860$; thus, in the first map we look at the quasi-partition at resolution $\delta^*_1 = 0.859$.

The directed single linkage quasi-clustering method $\tilde{\mathcal{H}}^{*}$ captures not only the formation of clusters but also the asymmetric influence between them. E.g., the quasi-partition in Fig. 23 for resolution $\delta^*_1 = 0.913$ is of little interest since every state forms a singleton cluster. The influence structure, however, reveals a highly asymmetric migration pattern. At this resolution Massachusetts has migrational influence over every other state in the region, as depicted by the five arrows leaving Massachusetts and entering each of the other five states. No state has influence over Massachusetts at this resolution since this would imply the formation of a non-singleton cluster by the mechanics of $\tilde{\mathcal{H}}^{*}$.
This influence could be explained by the fact that Massachusetts contains Boston, the largest urban area of the region. Hence, Boston attracts immigrants from all over the country, reducing the proportional immigration into Massachusetts from its neighbors and generating the asymmetric influence structure observed. This is consistent with the conclusions regarding clustering around populous states that we reached by analyzing the unilateral clusters in Fig. 22-(b). However, in the quasi-partition analysis, as opposed to the unilateral clustering analysis, the influence of Massachusetts over the other states can be seen clearly as it is formally captured in the edge set $E^*_S(0.913)$. The rest of the influence pattern at this resolution sees Connecticut influencing Rhode Island and Vermont, and New Hampshire influencing Maine and Vermont.

Fig. 23. Directed single linkage quasi-clustering method applied to New England's migration flow (states CT, RI, MA, VT, NH, ME). Quasi-partitions shown for resolutions before every merging and after the last: $\delta^*_1 = 0.913$, $\delta^*_2 = 0.916$, $\delta^*_3 = 0.925$, $\delta^*_4 = 0.926$, $\delta^*_5 = 0.941$, $\delta^*_6 = 0.942$. Massachusetts' migrational influence over the region is represented by the outgoing edges in the quasi-partitions.

At resolution $\delta^*_2 = 0.916$, we see that Massachusetts has merged with New Hampshire and this main cluster exerts influence over the rest of the region. Similarly, at resolution $\delta^*_3 = 0.925$, Maine has joined the cluster formed by Massachusetts and New Hampshire and together they exert influence over the singleton clusters of Connecticut, Rhode Island, and Vermont. The influence arcs from Connecticut to Rhode Island and Vermont persist in these two diagrams. We know that this has to be the case due to the influence hierarchy property of the edge sets $E^*_S$ stated in condition ($\tilde{\text{D}}$3) in the definition of quasi-dendrogram in Section IX-A. At resolution $\delta^*_4 = 0.926$ Connecticut joins the main cluster while Rhode Island joins at resolution $\delta^* = 0.927$; thus we depict the corresponding maps at resolutions 0.001 smaller than these merging resolutions. The whole region becomes one cluster at resolution $\delta^*_6 = 0.942$, which marks the joining of Vermont into the cluster.

Fig. 24. Directed single linkage quasi-clustering method applied to the extended West Coast migration flow (states WA, OR, NV, CA, AZ). Quasi-partitions shown for resolutions before every merging and after the last: $\delta^*_1 = 0.859$, $\delta^*_2 = 0.921$, $\delta^*_3 = 0.922$, $\delta^*_4 = 0.923$. California acts as an agglutination agent in the region.

For the case of the West Coast in Fig. 24, California is the most influential state, as expected from its large population. The quasi-partition at resolution $\delta^*_1 = 0.859$ is such that all states are singleton clusters, with California exerting influence onto all other West Coast states and Washington exerting influence on Oregon. The first cluster to form does not involve California but Washington and Oregon, merging at resolution $\delta = 0.860$, and the cluster can be observed from the map at resolution $\delta^*_2 = 0.921$. However, California has influence over this two-state cluster, as shown by the arrow going from California to the green cluster in the corresponding figure. The influence over the two other states, Nevada and Arizona, remains. This is as it should be because of the persistence property of the edge set $E^*_S$. At this resolution we also see an influence arc appearing from Arizona to Nevada. At resolution $\delta^*_3 = 0.922$ California joins the Washington-Oregon cluster, which exerts influence over Arizona and Nevada. The whole region merges into a common cluster at resolution $\delta^*_4 = 0.923$.

An important property of quasi-dendrograms is that the quasi-partitions at any given resolution define a partial order between the clusters.
Recall that slicing a dendrogram at a certain resolution yields a partition of the node set where there is no defined order between the blocks of the partition. Slicing a quasi-dendrogram also yields an edge set $E^*_S(\delta)$ that defines a partial order among the clusters at that resolution. This partial order is useful because it allows us to ascertain the relative importance of different clusters. E.g., in the case of the extended West Coast in Fig. 24 one would expect California to be the dominant migration force in the region. The quasi-partition at resolution $\delta^*_1 = 0.859$ permits asserting this fact formally because the partial order at this resolution has California ranked as more important than any other state. We also see the not unreasonable dominance of Washington over Oregon, while the remaining pairs of the ordering are not defined. At larger resolutions we can ascertain the relative importance of clusters. At resolution $\delta^*_2 = 0.921$ we can say that California is more important than the cluster formed by Oregon and Washington as well as more important than Arizona and Nevada. We can also see that Arizona precedes Nevada in the migration ordering at this resolution, while the remaining pairs of the ordering are undefined. At resolution $\delta^*_3 = 0.922$ there is an interesting pattern, as we can see the cluster formed by the three West Coast states preceding Arizona and Nevada in the partial order. At this resolution the partial order also happens to be a complete order, as Arizona is seen to precede Nevada. This is not true in general, as we have already seen.

In New England and the West Coast, the respective importance of Massachusetts and California over nearby states acts as an agglutination force towards regional clustering.
Indeed, if we delete either of these two states and cluster the remaining states in the corresponding region, the resolution at which the whole region becomes one cluster is increased, showing a decreasing tendency to cluster. E.g., for the case of New England, if we delete Massachusetts and cluster the remaining five states, they become one regional cluster at a resolution of $\delta^* = 0.979$, whereas if we delete, e.g., Maine or Rhode Island, the remaining five states merge into one single cluster at resolution $\delta^*_6 = 0.942$, as in the original case [cf. Fig. 23].

Further observe that if we limit our attention to the dendrogram component of the quasi-dendrogram depicted in Fig. 23, i.e., if we ignore the edge sets $E^*_S(\delta)$, we recover the information in the nonreciprocal dendrogram in Fig. 21. In the case of New England the dendrogram part $D^*_S$ of the quasi-dendrogram $\tilde{D}^*_S$ has the mergings occurring at resolutions 0.001 larger than the resolutions used to depict the quasi-partitions, i.e., Massachusetts first merges with New Hampshire ($\delta = 0.914$), then Maine joins this cluster ($\delta = 0.917$), followed by Connecticut ($\delta = 0.926$), Rhode Island ($\delta = 0.927$), and finally Vermont ($\delta = 0.942$). The order and resolutions in which states join the main cluster coincide with the blue part of the nonreciprocal dendrogram in Fig. 21. In the case of the extended West Coast in Fig. 24 we have Oregon joining Washington ($\delta = 0.860$), which are then joined by California ($\delta = 0.922$), which are then joined by Arizona and Nevada at resolution $\delta = 0.923$. Observe that Arizona and Nevada do not form a separate cluster before joining California, Oregon, and Washington. They both join the rest of the states at the exact same resolution. This is the same order and the same resolutions corresponding to the green part of the nonreciprocal dendrogram in Fig. 21.
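The two readings of a quasi-dendrogram discussed above, namely clusters plus influence edges at a resolution and the nonreciprocal dendrogram obtained by discarding the edges, can be sketched as follows. This is an illustration under stated assumptions. The three-node dissimilarities are invented (the labels MA, NH, ME are only mnemonic), the function names are hypothetical, and the quasi-ultrametric is taken to be the cheapest directed chain cost, computed as a min-max matrix power, which follows the description of directed single linkage in the text rather than formula (134).

```python
import numpy as np

def minmax_product(A, B):
    """Matrix product in the min-max dioid: C[i, j] = min_k max(A[i, k], B[k, j])."""
    return np.min(np.maximum(A[:, :, None], B[None, :, :]), axis=1)

def directed_chain_cost(A):
    """(n-1)-th min-max power of A: cost of the cheapest directed chain between nodes."""
    P = A.copy()
    for _ in range(A.shape[0] - 2):
        P = minmax_product(P, A)
    return P

def quasi_partition(u, delta):
    """Blocks: nodes with chain cost <= delta in both directions. Edges: one direction only."""
    n = u.shape[0]
    mutual = np.maximum(u, u.T) <= delta
    blocks = sorted({tuple(int(j) for j in np.flatnonzero(mutual[i])) for i in range(n)})
    edges = [(b, c) for b in blocks for c in blocks
             if b != c and u[b[0], c[0]] <= delta]
    return blocks, edges

# Hypothetical dissimilarities; rows are origins, columns are destinations.
names = ["MA", "NH", "ME"]
A = np.array([[0.00, 0.58, 0.80],
              [0.91, 0.00, 0.85],
              [0.95, 0.93, 0.00]])

u_quasi = directed_chain_cost(A)                   # asymmetric quasi-ultrametric
u_nonreciprocal = np.maximum(u_quasi, u_quasi.T)   # dendrogram component (cf. Proposition 11)

print(quasi_partition(u_quasi, 0.90))   # three singletons, node 0 influences nodes 1 and 2
print(quasi_partition(u_quasi, 0.92))   # nodes 0 and 1 merged, and the merged block influences node 2
```

In this toy network node 0 plays the role that the text attributes to Massachusetts: before any cluster forms it already has outgoing edges to the other nodes, and no edge points back to it until the resolution at which it merges.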
Notice that while T exas appears in the nonreciprocal dendrogram it does not appear in the quasi-partitions. This is only because we decided to show a partial view of the extended W est Coast without including T exas. The fact that when we limit our attention to the dendrogram component of the quasi-dendrogram we recov er the nonreciprocal dendrogram is not a coincidence. W e know from Proposition 11 that the dendrogram component of the quasi-partitions generated by directed single linkage is equiv alent to the dendrograms generated by nonreciprocal clustering. B. Inter actions between sectors of the U.S. economy The Bureau of Economic Analysis of the U.S. Department of Commerce publishes a yearly table of input and outputs organized by economic sectors [48]. This table records how economic sectors interact to generate gross domestic product. W e focus on a particular section of this table, called uses , corresponds to the inputs to production for year 2011 . More precisely , we are given a set I of 61 industrial sectors as defined by the North American Industry Classification System (NAICS) – see T able I – and a similarity function U : I × I → R + where U ( i, i 0 ) represents how much of the production of sector i , expressed in dollars, is used as an input of sector i 0 . Notice that it is common for part of the output of some sector i ∈ I to be used as input in the same sector , i.e. U ( i, i ) can be strictly positiv e. W e define the network N I = ( I , A I ) where the dissimilarity function A I satisfies A I ( i, i ) = 0 for all i ∈ I and, for i 6 = i 0 ∈ I , is giv en by A I ( i, i 0 ) := f U ( i, i 0 ) P j U ( j, i 0 ) ! , (173) where f : [0 , 1) → R ++ is a giv en decreasing function. For the experiments here we use f ( x ) = 1 − x . The normalization U ( i, i 0 ) / P j U ( j, i 0 ) in (173) can be interpreted as the proportion of the input in dollars to productiv e sector i 0 that comes from sector i . 
In this way , we focus on the combination of inputs of a sector rather than the size of the economic sector itself. That is, a small dissimilarity from sector i to sector i 0 implies that sector i 0 highly relies on the use of sector i output as an input for its own production. E.g., if 40% of the input into sector i 0 comes from sector i , we say that sector i has an influence of 40% ov er i 0 and the dissimilarity A I ( i, i 0 ) = 1 − 0 . 40 = 0 . 60 . Gi ven that part of the output of some sector can be used as input in the same sector, if we sum the input proportion from ev ery other sector , we obtain a number less than 1. The role of the decreasing function f is to transform the similarities into corresponding dissimilarities. 1) Recipr ocal clustering H R : The outcome of applying the reciprocal clustering method H R defined in (61) to the netw ork N I is computed with the algorithmic formula in (108). The resulting output dendrogram is shown in Fig. 25-(a) where three clusters are highlighted in blue, red and green. These clusters appear at resolutions δ R 1 = 0 . 959 , δ R 2 = 0 . 969 , and δ R 3 = 0 . 977 , respectiv ely . In Fig. 25-(b) we present the three highlighted clusters with edges representing bidirectional influence between industrial sectors at the corresponding resolution. That is, a double arrow is drawn between two nodes if and only if the dissimilarity between these nodes in both directions is less than or equal to the resolution at which the corresponding cluster appears. In particular , it sho ws the bidirectional chains of minimum cost between two nodes. E.g., for the blue cluster ( δ R 1 = 0 . 959 ) the bidirectional chain of minimum cost from the sector ‘Rental and leasing services of intangible assets’ (RL) to ‘Computer and electronic products’ (CE) goes through ‘Management of companies and enterprises’ (MC). 
According to our analysis, the reciprocal clustering method H R tends to cluster sectors that satisfy one of two possible typologies. The first type of clustering occurs among sectors of balanced influence in both directions. E.g., the first two sectors to be merged by H R are ‘ Administrati ve and support services’ (AS) and ‘Miscellaneous professional, scientific and technical services’ (MP) at a resolution of δ = 0 . 887 . This occurs because 13.2% of the input of AS comes from MP – corresponding to A I ( MP , AS ) = 0 . 868 – and 11.3% of MP’ s input comes from AS – implied by A I ( AS , MP ) = 0 . 887 – both influences being similar in magnitude. It is reasonable that these two sectors hire services from each other in order to better perform their own service. This balanced behavior is more frequently observed among service sectors than between raw material extraction (primary) or manu- facturing (secondary) sectors. Notice that for two manufacturing sectors A and B to hav e balanced bidirectional influence we need 39 T ABLE I C O DE A N D D E SC R I PT I O N O F I N D US T R IA L S E CT O R S Code Industrial Sector Code Industrial Sector A C Accommodation A G Amusements, gambling, and recreation industries AH Amb ulatory health care services AP Apparel and leather and allied products A T Air transportation AS Administrativ e and support services BT Broadcasting and telecommunications CE Computer and electronic products CH Chemical products CO Construction CS Computer systems design and related services ED Educational services EL Electrical equipment, appliances, and components F A Farms FB Food and bev erage and tobacco products FO Forestry , fishing, and related activities FM Fabricated metal products FR Federal Reserve banks and credit intermediation FU Furniture and related products FP Food services and drinking places FT Funds, trusts, and other financial vehicles IC Insurance carriers and related activities ID Information and data processing services LS 
Legal services MI Mining, except oil and gas MA Machinery MV Motor vehicles, bodies and trailers, and parts MC Management of companies and enterprises MM Miscellaneous manufacturing MP Misc. professional, scientific, and technical services NM Nonmetallic mineral products OG Oil and gas extraction OS Other services, except gov ernment O T Other transportation and support activities P A Paper products PC Petroleum and coal products PE Performing arts, spectator sports and museums PL Plastics and rubber products PM Primary metals PR Printing and related support activities PS Motion picture and sound recording industries PT Pipeline transportation PU Publishing industries (includes software) RA Real estate RE Retail trade RL Rental and leasing serv . and lessors of intang. assets R T Rail transportation SA Social assistance SC Securities, commodity contracts, and in vestments SM Support activities for mining TE T extile mills and textile product mills TG T ransit and ground passenger transportation TM Other transportation equipment TT T ruck transportation UT Utilities WH Wholesale trade WM W aste management and remediation services WO W ood products WT W ater transportation WS W arehousing and storage the outputs of A to be inputs of B in the same proportion as the outputs of B are inputs of A. This situation is rarer . Further examples of this clustering typology where the influence in both directions is balanced can be found between pairs of service sectors with bidirectional edges in the blue cluster formed at reso- lution δ R 1 = 0 . 959 . E.g., the participation of RL in the input to MC is of 7.6% – since A I ( RL , MC ) = 0 . 924 – whereas the influence in the opposite direction is 8.5%, given by A I ( MC , RL ) = 0 . 915 . Similarly , 6.5% of the input to the ‘Real estate’ (RA) sector comes from AS and 6.0% vice versa. 
This implies that the RA sector hires external administrati ve and support services and the AS sector depends on the real estate services to, e.g., rent a location for their operation. The second type of clustering occurs between sectors with one natural direction of influence but where the influence in the opposite direction is meaningful. E.g., the second merging in the reciprocal dendrogram in Fig. 25-(a) occurs at resolution δ = 0 . 893 between the ‘Farm’ (F A) sector and the ‘Food, bev erage and tobacco products’ (FB) sector . In this case, one e xpects a big portion of FB’ s input to come from F A – 35.2% to be precise – as raw materials for processed food products but there is also a dependency on the opposite direction of 10.7% from, e.g., food supplementation for livestock not entirely fed with grass. This second clustering typology generally occurs between consecutiv e sectors in the production chain of a particular industry , with the strong influence in the natural direction of the material mov ement and the non negligible influence in the opposite direction which is particular of each industry . E.g., for the food industry , the primary F A sector precedes in the production process the secondary FB sector . Thus, the influence of F A ov er FB is clear . Howe ver , there is an influence of FB over F A that could be explained by the provision of food supplementation for liv estock. Further examples of this interaction between sectors can be found in the textile and metal industries. Representing the textile industry , at resolution δ = 0 . 938 the sectors ‘T extile mills and textile product mills’ (TE) and ‘ Apparel and leather and allied products’ (AP) merge. In the garment production process, there is a natural direction of influence from TE that generates fabric from a basic fiber to AP that cuts and sews the fabric to generate garments. Indeed, the influence in this direction is of 17.8% represented by A I ( TE , AP ) = 0 . 822 . 
Howe ver , there is an influence of 6.2% – corresponding to A I ( AP , TE ) = 0 . 938 – in the opposite direction. This influence can be partially attributed to companies in the TE sector which also manufacture garments and buy intermediate products from companies in the AP sector . For example, a textile mill that produces wool fabric and also manufactures wool garments with some details in leather . This leather comes from a company in the AP sector and represents a movement from AP back to TE. In the metal industry , at resolution δ = 0 . 960 ‘Mining, except oil and gas’ (MI) merges with ‘Primary metals’ (PM). The bidirectional influence between these two sectors can be observed in the red cluster formed at resolution δ R 2 = 0 . 969 in Fig. 25-(b). As before, the natural 40 influence is in the direction of the production process, i.e. from MI to PM. Indeed, 9.3% of PM’ s input comes from MI mainly as ores for metal manufacturing. Moreov er , there is an influence of 4.0% in the opposite direction from PM to MI due to, e.g., structural metals for mining infrastructure. The cluster in Fig. 25 that forms at resolution δ R 1 = 0 . 959 (blue) is mainly composed of services. The first two mergings, described in the pre vious paragraph, occur between MP-AS and RL-MC rep- resenting professional, support, rental and management services, respectiv ely . At resolution δ = 0 . 925 , the sectors ‘Federal Reserve banks, credit intermediation, and related activities’ (FR) and ‘Securities, commodity contracts, and inv estments’ (SC) merge. This is an exception to the described balanced mergings between service sectors. Indeed, 24.1% of FR’ s input comes from SC whereas only 7.5% of SC’ s input comes from FR. This is e xpected since credit intermediation entities in FR ha ve as input in vestments done in the SC sector . At resolution δ = 0 . 940 , RA joins the MP- AS cluster due to the bidirectional influence between RA and AS described in the previous paragraph. 
The MP-AS-RA cluster merges with the FR-SC cluster at resolution δ = 0 . 948 due to the relation between MP and FR. More precisely , MP provides 11.3% of FR input – corresponding to A I ( MP , FR ) = 0 . 887 – and 5.2% of MP’ s input comes from FR, gi ven by A I ( FR , MP ) = 0 . 948 . At resolution δ = 0 . 957 , CE joins the RL-MC cluster due to its bidirectional influence relation with MC. The sector of electronic products CE is the only sector in the blue cluster formed at resolution δ R 1 = 0 . 959 that does not represent a service. The ‘Insurance carriers and related acti vities’ (IC) sector joins the MP-AS-RA-FR-SC cluster at resolution δ = 0 . 959 because of its relation with SC. In fact, 4.5% of IC’ s input comes from SC in the form of securities and in vestments and 4.1% of SC’ s input comes from IC in the form of insurance policies for in vestments. Finally , at resolution δ R 1 = 0 . 959 , the clusters MP-AS-RA-FR- SC-IC and CE-RL-MC mer ge due to the relation between the supporting services AS and the management services MC. The cluster in Fig. 25 that forms at resolution δ R 2 = 0 . 969 (red) mixes the three lev els of the economy: raw material extraction or primary , manufacturing or secondary and services or tertiary . The ‘Mining, except oil and gas’ sector (MI), which is a primary activity of extraction, merges at resolution δ = 0 . 943 with the ‘Utilities’ (UT) sector which extends vertically into the secondary and tertiary industrial sectors since it generates and distributes energy . This merging occurs because 5.7% of UT’ s input comes from MI and 8.8% vice versa. This pair then merges at resolution δ = 0 . 961 with the manufacturing sector of ‘Primary metals’ (PM). PM joins this cluster due to its bidirectional relation with MI previously described. At resolution δ = 0 . 
968 , the primary sector of ‘Oil and gas extraction’ (OG) joins the MI-UT -PM cluster because 3.2% of OG’ s input comes from UT , mainly as electric po wer supply , and 57.3% of UT’ s input comes from OG as natural gas for combustion and distribution. Finally , at resolution δ R 2 = 0 . 969 the service sector of ‘Rail transportation’ (R T) merges with the rest of the cluster due to its influence relation with PM. Indeed, PM provides 7.0% of the input of R T for the construction of railroads – corresponding to A I ( PM , R T ) = 0 . 930 – and R T provides 3.1% of PM’ s input – gi ven by A I ( R T , PM ) = 0 . 969 – as transportation services for final metal products. The cluster in Fig. 25 that forms at resolution δ R 3 = 0 . 977 (green) is composed of food and wood generation and processing. It starts with the aforementioned merging between F A and FB at δ = 0 . 893 . At resolution δ = 0 . 956 , ‘Forestry , fishing, and related activities’ (FO) joins the F A-FB cluster due to its relation with F A. The farming sector F A depends 9.2% on FO due to, e.g., deforestation for crop growth. The dependence in the opposite direction is of 4.7%. Finally , at δ R 3 = 0 . 977 , ‘W ood products’ (WO) joins the cluster . Its relation with FO is highly asymmetric and corresponds to the second clustering typology described at the beginning of this section. There is a natural influence in the direction of the material movement from FO to WO. Indeed, 26.2% of WO’ s input comes from FO whereas the influence is of 2.3% in the opposite direction. Requiring direct bidirectional influence for clustering generates some cluster which are counter-intuiti ve. E.g., in the reciprocal dendrogram in Fig. 25-(a), at resolution δ = 0 . 971 when the blue and red clusters merge together we have that the oil and gas sector OG in the red cluster joins the insurance sector IC in the blue cluster . 
Howe ver , OG does not merge with ‘Petroleum and coal products’ (PC), a sector that one would expect to be more closely related, until resolution δ = 0 . 975 . In order to avoid this situation, we may allow nonreciprocal influence as we do in the following section. 2) Nonr ecipr ocal clustering H NR : The outcome of applying the nonreciprocal clustering method H NR defined in (67) to the network N I is computed with formula (109). The resulting output dendrogram is sho wn in Fig. 26-(a). Let us first observe, as we did for the case of the migration matrix in Section XII-B2, that the nonreciprocal ultrametric distances in Fig. 26-(a) are not larger than the reciprocal ultrametric distances in Fig. 25-(a) as it should be the case gi ven the inequality in (68). As a test case we ha ve that the mining sector MI and the ‘Pipeline transportation’ (PT) sectors become part of the same cluster in the reciprocal dendrogram at a resolution δ = 0 . 979 whereas they merge in the nonreciprocal dendrogram at resolution δ 0 = 0 . 912 < 0 . 979 . A more interesting observ ation is that, in contrast with the case of the migration matrix of Section XII-A, the nonreciprocal dendrogram is qualitati vely very different from the reciprocal dendrogram. In the reciprocal dendrogram we tended to see the formation of definite clusters that then merged into larger clusters at coarser resolutions. The cluster formed at resolution δ R 1 = 0 . 959 (blue) sho wn in Fig. 25-(b) gro ws by merging with singleton clusters (FP , OS, LS, BT , CS, WH, and O T in progressi ve order of resolution) until it merges at resolution δ = 0 . 971 with a cluster of fiv e nodes which emerges at resolution δ R 2 = 0 . 969 . This whole cluster then grows by adding single nodes and pairs of nodes until it merges at resolution δ = 0 . 988 with a cluster of four nodes that forms at resolution δ R 3 = 0 . 977 . 
In the nonreciprocal dendrogram, in contrast, we see the progressi ve agglutination of economic sectors into a central cluster . Indeed, the first non singleton cluster to arise is formed at resolution δ NR 1 = 0 . 885 by the sectors of oil and gas extraction OG, petroleum and coal products PC, and ‘Construction’ (CO). For reference, observe that this happens before the first reciprocal merging between AS and MP , which occurs at resolution δ = 0 . 887 [cf. Fig. 25-(a)]. The cluster formed by OG, PC, and MP is shown in the leftmost graph in Fig. 26-(b) where the directed edges represent all the dissimilarities A I ( i, i 0 ) ≤ δ NR 1 = 0 . 885 between these three nodes. W e see that this cluster forms due to the influence cycle [ OG, PC, CO, OG ] . Of all the economic input to PC, 82 . 6% comes from the OG sector – which is represented by the dissimilarity A I ( OG , PC ) = 0 . 174 – in the form of 41 MP AS RA FR SC IC CE RL MC FP OS LS BT CS WH OT OG MI UT PM RT PU FM PC ID EL WS WM PT CO AC PA PL CH PS PE MA MV RE TT NM PR FA FB FO WO AT ED MM FT SM TE AP FU WT AG TG TM AH SA HN 0.9 0.92 0.94 0.96 0.98 1 AS RA MC MP FR SC IC CE RL OG UT MI PM RT FB F A FO WO (a) δ R 1 = 0 . 959 δ R 2 = 0 . 969 δ R 3 = 0 . 977 (b) Fig. 25. (a) Reciprocal dendrogram. Output of the reciprocal clustering method H R when applied to the network N I . Three clusters formed at resolutions δ R 1 = 0 . 959 , δ R 2 = 0 . 969 , and δ R 3 = 0 . 977 are highlighted in blue, red and green, respectiv ely . (b) Highlighted clusters. Edges between sectors represent bidirectional influence between them at the corresponding resolution. raw material for its productiv e processes of which the dominant process is oil refining. In the input to CO a total of 11 . 5% comes from PC – tantamount to dissimilarity A I ( PC , CO ) = 0 . 885 – as fuel and lubricating oil for heavy machinery as well as asphalt coating, and 12 . 
3% of OG’ s input comes from CO – corresponding to dissimilarity A I ( CO , OG ) = 0 . 877 – mainly from engineering projects to enable extraction such as perforation and the construction of pipelines and their maintenance. At resolution δ NR 2 = 0 . 887 this cluster grows by the simul- taneous incorporation of the support service sector AS and the professional service sector MP . These sectors join due to the loop [ AS, MP , CO, OG, PC, AS ] . The three new edges in this loop that inv olve the new sectors are the ones from PC to AS, from AS to MP and from MP to CO. Of all the economic input to AS, 13 . 4% comes from the PC sector – which is represented by the dissimilarity A I ( PC , AS ) = 0 . 866 – in the form of, e.g., fuel for the transportation of manpo wer . Of MP’ s input, 11.3% comes from AS – given by A I ( AS , MP ) = 0 . 887 – corresponding to administrative and support services hired by the MP sector for the correct deliv ery of MP’ s professional services and in the input to CO a total of 12.8% comes from MP – corresponding to A I ( MP , CO ) = 0 . 872 – from, e.g., architecture and consulting services for the construction. W e then see the incorporation of the rental service sector RL and ‘Wholesale trade’ (WH) to the five-node cluster at resolution δ NR 3 = 0 . 895 given by the loop [ WH, RL, OG, PC, AS, MP , WH ] . T o be more precise, the sector RL joins the main cluster by the aforementioned loop and by another one excluding WH, i.e. [ RL, OG, PC, AS, MP , RL ] . The formation of both loops is simultaneous since the last edge to appear is the one going from RL to OG at resolution A I ( RL , OG ) = δ NR 3 = 0 . 895 . This implies that from OG’ s inputs, 10.5% comes from RL from, e.g., rental and leasing of generators, pumps, welding equipment and other machinery for extraction. The other edges depicted in the cluster at resolution δ NR 3 that complete the two mentioned loops are the ones from MP to RL, from MP to WH, and from WH to RL. 
These edges are associated with the corresponding dissimilarities A I ( MP , RL ) = 0 . 886 , A I ( MP , WH ) = 0 . 836 , and A I ( WH , RL ) = 0 . 894 , all of them less than δ NR 3 . At resolution δ NR 4 = 0 . 900 the financial sectors SC and FR join this cluster due to the chain [ SC, FR, RL, OG, PC, AS, SC ] . Analogous to RL ’ s merging at resolution δ NR 3 , the sector FR merges the main cluster by the aforementioned loop and by the one excluding SC, i.e., [ FR, RL, OG, PC, AS, FR ] . Both chains are formed simultaneously since the last edge to appear is the one from FR to RL at resolution A I ( FR , RL ) = δ NR 4 = 0 . 900 . This means that from RL ’ s inputs, 10% comes from FR. The remaining edges depicted in the cluster at resolution δ NR 4 that complete the two mentioned loops are the ones from MP to SC, from MP to FR, and from SC to FR. These edges are associated with the corresponding dissimilarities A I ( MP , SC ) = 0 . 837 , A I ( MP , FR ) = 0 . 887 , and A I ( SC , FR ) = 0 . 759 , all of them less than δ NR 4 . The sole exceptions to this pattern of progressiv e agglutination are the pairings of the farms F A and the food products FB sectors at resolution δ = 0 . 893 and the textile mills TE and apparel products AP sectors at resolution δ = 0 . 938 . The nonreciprocal clustering method H NR detects cyclic influ- ences which, in general, lead to clusters that are more reasonable than those requiring the bidirectional influence that defines the reciprocal method H R . E.g., H NR merges OG with PC at reso- lution δ = 0 . 885 before they merge with the insurance sector IC at resolution δ = 0 . 923 . As we had already noted in the last paragraph of the preceding section, H R merges OG with IC before their common joining with PC. Howe ver , the preponderance of cyclic influences in the network of economic interactions N I leads to the formation of clusters that look more lik e artifacts than fundamental features. 
E.g., the cluster that forms at resolution δ NR 2 = 0 . 887 has AS and MP joining the three-node cluster CO- 42 CO PC OG MP AS RL WH SC FR RA UT PM FM PT MI CS MC NM OT TT RE MA IC RT CH CE FP FO FA FB PE BT PS EL SM OS PL MV PU PA TE AP PR ID WO AC WS LS TM WM AT ED FU MM FT TG WT AG AH HN SA 0.88 0.9 0.92 0.94 0.96 0.98 1 PC CO OG PC CO OG AS MP PC CO OG AS MP WH RL PC CO OG AS MP WH RL SC FR (a) δ NR 1 = 0 . 885 δ NR 2 = 0 . 887 δ NR 3 = 0 . 895 δ NR 4 = 0 . 900 (b) Fig. 26. (a) Nonreciprocal dendrogram. Output of the nonreciprocal clustering method H NR when applied to the network N I . One cluster, formed at resolution δ NR 4 = 0 . 900 , is highlighted in blue. (b) Generation of highlighted cluster. Sequential mergings of sectors at resolutions δ NR 1 = 0 . 885 , δ NR 2 = 0 . 887 , δ NR 3 = 0 . 895 , and δ NR 4 = 0 . 900 are shown. Directed edges between sectors imply unidirectional influence between them at the corresponding resolution. Notice the cyclic influences between the sectors, e.g., OG → PC → CO → OG in the leftmost diagram. PC-OG because of an influence cycle of fi ve nodes composed of [ AS, MP , CO, OG, PC, AS ] . From our discussion above, it is thus apparent that allowing clusters to be formed by arbitrarily long cycles ov erlooks important bidirectional influences between co-clustered nodes. If we wanted a clustering method which at resolution δ NR 2 = 0 . 887 would cluster the nodes PC, CO, and OG into one cluster and AS and MP into another cluster , we should allo w influence to propagate through c ycles of at most three or four nodes. A family of methods that permits this degree of flexibility is the family of semi-reciprocal methods H SR ( t ) that we discussed in Section VII-C and whose application we exemplify in the follo wing section. 3) Semi-r ecipr ocal clustering H SR (3) : The outcome of ap- plying the semi-reciprocal clustering method H SR (3) defined in Section VII-C to the network N I is computed with the formula in (117). 
The resulting output dendrogram is shown in Fig. 27- (a). T wo clusters generated at resolutions δ SR 1 = 0 . 909 and δ SR 2 = 0 . 917 are highlighted in red and blue, respectively . These clusters are depicted in Fig. 27-(b) with directed edges between the nodes representing dissimilarities less than or equal to the corresponding resolution. E.g., for the cluster generated at resolu- tion δ SR 1 = 0 . 909 (red), we draw an edge from sector i to sector i 0 if and only if A I ( i, i 0 ) ≤ δ SR 1 . Comparing the semi-reciprocal dendrogram in Fig. 27-(a) with the reciprocal and nonreciprocal dendrograms in figs. 25-(a) and 26-(a), we observe that semi- reciprocal clustering merges any pair of sectors into a cluster at a resolution not higher than the resolution at which they are co-clustered by reciprocal clustering and not lo wer than the one at which they are co-clustered by nonreciprocal clustering. E.g., the sectors of construction CO and ‘Fabricated metal products’ (FM) become part of the same cluster at resolution δ R = 0 . 980 in the reciprocal dendrogram, at resolution δ SR = 0 . 950 in the semi-reciprocal dendrogram and at resolution δ NR = 0 . 912 in the nonreciprocal dendrogram, satisfying δ NR ≤ δ SR ≤ δ R . The inequalities described among the merging resolutions need not be strict as in the previous example, e.g., the farms (F A) sector merges with the food products FB sector at resolution δ = 0 . 893 for the reciprocal, nonreciprocal and semi-reciprocal clustering methods. This ordering of the merging resolutions is as it should be since the reciprocal and nonreciprocal ultrametrics uniformly bound the output ultrametric of any clustering method satisfying the axioms of v alue and transformation such as the semi-reciprocal clustering method [cf. (69)]. The semi-reciprocal clustering method H SR (3) allows reason- able cyclic influences and is insensiti ve to intricate influences described by long c ycles. 
As we pointed out in the two pre- ceding subsections, H R does not recognize the obvious relation between the sectors oil and gas extraction OG and the petroleum products PC sectors because it requires direct bidirectional influ- ence whereas H NR merges OG and PC at a low resolution but also considers other counter-intuitiv e cyclic influence structures represented by long loops such as the merging of the service sectors AS and MP with the cluster OG-PC-CO before forming a cluster by themselves [cf. Fig. 26]. The semi-reciprocal method H SR (3) combines the desirable features of the reciprocal and nonreciprocal methods. Indeed, as can be seen from the semi- reciprocal dendrogram in Fig. 27-(a), H SR (3) recognizes the hea vy industry cluster OG-PC-CO since these three sectors are the first to merge at resolution δ = 0 . 885 . Howe ver , the service sectors MP and AS form a cluster of their own before merging with the heavy industry cluster . T o be more precise, MP and AS merge at resolution δ = 0 . 887 due to the bidirectional influence between 43 CO PC OG RL MP AS MC FR SC RA WH IC MI NM RT PM UT PT CS RE SM MA BT CE FM EL CH OS ID PE FP PU TT MV PR FA FB FO WO AC WS PS OT LS PA PL TM TE AP WM AT ED FU MM FT TG WT AG AH HN SA 0.88 0.9 0.92 0.94 0.96 0.98 1 SC RA FR PC OG CO RL AS MP (a) δ SR 1 = 0 . 909 δ SR 2 = 0 . 917 (b) Fig. 27. (a) Semi-reciprocal dendrogram. Output of the semi-reciprocal clustering method H SR (3) when applied to the network N I . T wo clusters formed at resolution δ SR 1 = 0 . 909 and δ SR 2 = 0 . 917 are highlighted in red and blue, respectively . (b) Highlighted clusters. Directed edges between sectors imply unidirectional influence between them at the corresponding resolution. Cyclic influences can be observed. them. This resolution coincides with the first merging in the reciprocal dendrogram [cf. Fig. 25-(a)]. When we increase the resolution, at δ SR 2 = 0 . 
917 the ‘Rental and leasing services’ (RL) sector acts as an intermediary merging the OG-PC-CO cluster with the MP-AS cluster forming the blue cluster in Fig. 27-(b). The cycle containing RL with secondary chains of length at most 3 nodes is [ RL, OG, PC, AS, RL ] . The sector RL uses administrati ve and support services from AS to pro vide their o wn leasing services and leasing is a common practice in the OG sector . Thus, the influences depicted in the blue cluster . At resolution δ SR 1 = 0 . 909 the credit intermediation sector FR, the in vestment sector SC and the real estate sector RA form a three-node cluster gi ven by the influence cycle [ RA, SC, FR, RA ] and depicted in red in Fig. 27-(b). Of all the economic input to SC, 9 . 1% comes from the RA sector – which is represented by the dissimilarity A I ( RA , SC ) = 0 . 909 – in the form of, e.g., leasing services related to real estate in vestment trusts. The sector SC provides 24.1% of FR’ s input – corresponding to A I ( SC , FR ) = 0 . 759 – whereas FR represents 35.1% of RA ’ s input – corresponding to A I ( FR , RA ) = 0 . 649 . W e interpret the relation among these three sectors as follows: the credit intermediation sector FR acts as a vehicle to connect the in vestments sector SC with the sector that attracts inv estments RA. Notice that in the nonreciprocal dendrogram in Fig. 26-(a), these three sectors join the main blue cluster separately due to the formation of intricate influence loops. The semi-reciprocal method, by not allo wing the formation of long loops, distinguishes the more reasonable cluster formed by FR-RA-SC. 4) Unilater al clustering H U : The outcome of applying the unilateral clustering method H U defined in (147) to the network N I is computed with the algorithmic formula in (149). The resulting output dendrogram is sho wn in Fig. 28-(a). Four clusters appearing at resolutions δ U 1 = 0 . 775 , δ U 2 = 0 . 831 , δ U 3 = 0 . 854 , and δ U 4 = 0 . 
883 are highlighted in blue, red, orange, and green, respectiv ely . In Fig. 28-(b) we explicit the highlighted clusters and draw a directed edge between two nodes if and only if the dissimilarity between them is less than or equal to the corresponding resolution at which the clusters are formed. E.g., for the cluster generated at resolution δ U 1 = 0 . 775 (blue), we dra w an edge from sector i to sector i 0 if and only if A I ( i, i 0 ) ≤ δ U 1 . Unidirectional influence is enough for clusters to form when applying the unilateral clustering method. The asymmetry of the original network N I is put in evidence by the difference between the unilateral dendrogram in Fig. 28- (a) and the reciprocal dendrogram in Fig. 25-(a). The last merging in the unilateral dendrogram, i.e. when ‘W aste management and remediation services’ (WM) joins the main cluster, occurs at δ = 0 . 923 . If, in turn, we cut the reciprocal dendrogram at this resolution, we observ e 57 singleton clusters and tw o pairs of nodes merged together . Recall that if the original network is symmetric, the unilateral and the reciprocal dendrograms must coincide [cf. (87) and (148)] and so must every other method satisfying the agnostic set of axioms in Section X-B [cf. (151)]. Thus, the observed difference between the dendrograms is a manifestation of asymmetries in the network N I . Unilateral clustering detects intense one-way influences be- tween sectors. The first two sectors to be merged into a single cluster by the unilateral clustering method H U are the financial sectors SC and ‘Funds, trusts, and other financial vehicles’ (FT) at a resolution δ = 0 . 132 . This occurs because 86.8% of FT’ s input comes from SC, corresponding to A I ( SC , FT ) = 0 . 132 the smallest positiv e dissimilarity in the network N I . The strong influence of SC o ver FT is expected since FT is comprised of entities or ganized to pool securities coming from the SC sector . 
The next merging when increasing the resolution occurs at $\delta = 0.174$ between oil and gas extraction OG and petroleum and coal products PC, since 82.6% of PC's input comes from OG, tantamount to $A_I(\text{OG}, \text{PC}) = 0.174$, mainly as crude oil for refining. The following three mergings correspond to sequential additions to the OG-PC cluster of the utilities UT, ‘Water transportation’ (WT), and ‘Air transportation’ (AT) sectors at resolutions $\delta = 0.428$, $\delta = 0.482$, and $\delta = 0.507$, respectively. These mergings occur because 57.2% of UT's input comes from OG in the form of natural gas for both distribution and fuel for the generation of electricity, while for the transportation sectors WT and AT, 51.8% and 49.3% of the respective inputs come from PC as the provision of liquid fuel.

Fig. 28. (a) Unilateral dendrogram. Output of the unilateral clustering method $\mathcal{H}^{\text{U}}$ when applied to the network $N_I$. Four clusters formed at resolutions $\delta^{\text{U}}_1 = 0.775$, $\delta^{\text{U}}_2 = 0.831$, $\delta^{\text{U}}_3 = 0.854$, and $\delta^{\text{U}}_4 = 0.883$ are highlighted in blue, red, orange, and green, respectively. (b) Highlighted clusters. Directed edges between sectors imply unidirectional influence between them at the corresponding resolution. Cycles are not required for the formation of clusters due to the definition of unilateral clustering $\mathcal{H}^{\text{U}}$.

Unilateral clusters tend to form around sectors of intense output. This observation is analogous to the formation of clusters around populous states, hence with intense population movement, that we observed in Section XII-A4.
Indeed, if for each sector we evaluate the commodity intermediate value in dollars, i.e., the total output not destined to final uses, the professional service MP sector achieves the maximum followed by, in decreasing order, the sectors RA, OG, FR, AS, and ‘Chemical products’ (CH). These top sectors are composed of massively demanded services like professional, support, real estate, and financial services plus the core activities of two important industries, namely oil and gas and chemical products. Of these top six sectors, five are contained in the four clusters highlighted in Fig. 28-(b), with every cluster containing at least one of these sectors and the cluster formed at resolution $\delta^{\text{U}}_1 = 0.775$ (blue) containing two, FR and RA.

These sectors of intense output have influence, either directly or indirectly, over most of the sectors in their same cluster. E.g., in the cluster formed at resolution $\delta^{\text{U}}_2 = 0.831$ (red) in Fig. 28-(b) there is a directed edge from MP to every other sector in the cluster. This occurs because MP provides professional and technical services that represent, in decreasing order, 33.8%, 20.3%, 19.8%, 17.8%, and 16.9% of the input to the sectors of management of companies MC, ‘Motion picture and sound recording industries’ (PS), ‘Computer systems design and related services’ (CS), ‘Publishing industries’ (PU), and ‘Accommodation’ (AC), respectively. Consequently, in the unilateral clustering we can observe the MP sector merging with MC at resolution $\delta = 0.662$ followed by a sequential merging of the remaining singleton clusters, i.e., PS at $\delta = 0.797$, CS at $\delta = 0.802$, PU at $\delta = 0.822$, and finally AC at resolution $\delta^{\text{U}}_2 = 0.831$.

Fig. 29. Directed single linkage quasi-clustering method applied to a portion of the sectors of the economy. Quasi-partitions shown for resolutions $\delta^*_1 = 0.884$, $\delta^*_2 = 0.886$, $\delta^*_3 = 0.894$, and $\delta^*_4 = 0.899$, which are 0.001 smaller than the first four merging resolutions in the dendrogram component $D^*_I$ of the quasi-dendrogram $\tilde{D}^*_I$. The edges define a partial order among the blocks of every quasi-partition.

As another example consider the cluster formed at resolution $\delta^{\text{U}}_4 = 0.883$ (green) containing the influential sector CH. Its influence over four different industries, namely plastics, apparel, paper, and wood, is represented by the four directed branches leaving from CH in Fig. 28-(b). The sector CH first merges with ‘Plastics and rubber products’ (PL) at resolution $\delta = 0.531$ because 46.9% of PL's input comes from CH as materials needed for the handling and manufacturing of plastics. The textile mills TE sector then merges at resolution $\delta = 0.622$ because 37.8% of TE's input comes from CH as dyes and other chemical products for fabric manufacturing. At resolution $\delta = 0.804$ the previously formed cluster composed of the forestry FO and wood products WO sectors joins the CH-PL-TE cluster due to the dependence of FO on CH for the provision of chemicals for soil treatment and pest control. At resolution $\delta = 0.822$, the apparel sector AP joins the main cluster due to its natural dependence on the fabrics generated by TE. Indeed, 17.8% of AP's input comes from TE. In a similar way, at resolution $\delta = 0.867$, ‘Furniture and related products’ (FU) joins the cluster due to the influence from the WO sector. Finally, at resolution $\delta^{\text{U}}_4 = 0.883$, the previously clustered paper industry comprised of the sectors ‘Paper products’ (PA) and ‘Printing and related support activities’ (PR) joins the main cluster due to the intense utilization of chemical products in the paper manufacturing process.
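The merging order just described can be reproduced on a toy sub-network. The sketch below is only an illustration under stated assumptions: it keeps four sectors (CH, PL, TE, AP), uses the three dissimilarities quoted in the text, and sets every directed dissimilarity that the text does not quote to a placeholder value of 1.0, which is not data from the network $N_I$. It also swaps the matrix-power formula for a Floyd-Warshall-style relaxation on the minimum-symmetrized matrix, which is expected to give the same minimum chain costs. The names `quoted`, `PLACEHOLDER`, and `unilateral` are hypothetical.

```python
sectors = ["CH", "PL", "TE", "AP"]

# Directed dissimilarities quoted in the text: A_I(source, destination).
quoted = {("CH", "PL"): 0.531, ("CH", "TE"): 0.622, ("TE", "AP"): 0.822}

# Assumption: every directed pair not quoted in the text gets this placeholder.
PLACEHOLDER = 1.0

A = [[0.0 if a == b else quoted.get((a, b), PLACEHOLDER) for b in sectors]
     for a in sectors]


def unilateral(A):
    """Unilateral ultrametric: symmetrize to the minimum, then take the
    smallest max-cost over undirected chains (min-max relaxation)."""
    n = len(A)
    U = [[min(A[i][j], A[j][i]) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                U[i][j] = min(U[i][j], max(U[i][k], U[k][j]))
    return U


U = unilateral(A)

# Distinct resolutions at which some pair of sectors first shares a cluster.
merge_resolutions = sorted({U[i][j] for i in range(len(A)) for j in range(i)})
print(merge_resolutions)
```

In this reduced network CH and PL merge first, TE joins next, and AP joins last through TE, which mirrors the order in the text. The intermediate mergings involving FO, WO, FU, PA, and PR are absent because those sectors are left out of the sketch.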
5) Directed single linkage quasi-clustering $\tilde{\mathcal{H}}^*$: The outcome of applying the directed single linkage quasi-clustering method $\tilde{\mathcal{H}}^*$ with output quasi-ultrametrics defined in (7) to the network $N_I$ is computed with the algorithmic formula in (134). In Fig. 29 we present four quasi-partitions of the output quasi-dendrogram $\tilde{D}^*_I = (D^*_I, E^*_I)$ focusing on ten economic sectors. We limit the view of the quasi-partitions, which were computed for the whole network, to ten sectors to facilitate the interpretation. These ten sectors are the first to cluster in the dendrogram component $D^*_I$ of the quasi-dendrogram $\tilde{D}^*_I$. To see this, recall that from Proposition 11 we have that $D^*_I = \mathcal{H}^{\text{NR}}(N_I)$, i.e., the dendrogram component $D^*_I$ coincides with the output dendrogram of applying the nonreciprocal clustering method to the network $N_I$. Hence, the ten sectors depicted in the quasi-partitions in Fig. 29 coincide with the ten leftmost sectors in the dendrogram in Fig. 26-(a). We present quasi-partitions $\tilde{D}^*_I(\delta)$ for four different resolutions $\delta^*_1 = 0.884$, $\delta^*_2 = 0.886$, $\delta^*_3 = 0.894$, and $\delta^*_4 = 0.899$. These resolutions are 0.001 smaller than the first four merging resolutions in the dendrogram component $D^*_I$ or, equivalently, in the nonreciprocal dendrogram [cf. Fig. 26-(b)].

The edge component $E^*_I$ of the quasi-dendrogram $\tilde{D}^*_I$ captures the asymmetric influence between clusters. E.g., in the quasi-partition in Fig. 29 for resolution $\delta^*_1 = 0.884$ every cluster is a singleton since the resolution is smaller than that of the first merging. However, the influence structure reveals an asymmetry in the dependence between the economic sectors. At this resolution the professional service sector MP has influence over every other sector except for the rental services RL, as depicted by the eight arrows leaving the MP sector.
No sector has influence over MP at this resolution since this would imply, except for RL, the formation of a non-singleton cluster. The influence of MP reaches primary sectors such as OG, secondary sectors such as PC, and tertiary sectors such as AS or SC. The versatility of MP's influence can be explained by the diversity of services condensed in this economic sector, e.g., civil engineering and architectural services are demanded by CO, production engineering by PC, and financial consulting by SC. For the rest of the influence pattern, we can observe an influence of CO over OG mainly due to the construction and maintenance of pipelines, which in turn influences PC due to the provision of crude oil for refining. Thus, from the transitivity (QP2) property of quasi-partitions introduced in Section IX we have an influence edge from CO to PC. The sectors CO, PC, and OG influence the support service sector AS. Moreover, the service sectors RA, SC, and FR have a totally hierarchical influence structure where SC has influence over the other two and FR has influence over RA. Since these three nodes remain as singleton clusters for the resolutions studied, the influence structure described is preserved for higher resolutions, as it should be from the influence hierarchy property of the edge set $E^*_I(\delta)$ stated in condition ($\tilde{\text{D}}$3) in the definition of quasi-dendrogram in Section IX-A.

At resolution $\delta^*_2 = 0.886$, we see that the sectors OG-PC-CO have formed a three-node cluster depicted in red that influences AS. At this resolution, the influence edge from MP to RL appears and, thus, MP gains influence over every other cluster in the quasi-partition including the three-node cluster. At resolution $\delta = 0.887$ the service sectors AS and MP join the cluster OG-PC-CO, and for $\delta^*_3 = 0.894$ we have this five-node cluster influencing the other five singleton clusters plus the mentioned hierarchical structure among SC, FR, and RA and an influence edge from WH to RL.
When we increase the resolution to $\delta^*_4 = 0.899$ we see that RL and WH have joined the main cluster that influences the other three singleton clusters. If we keep increasing the resolution, we would see at resolution $\delta = 0.900$ the sectors SC and FR joining the main cluster, which would have influence over RA, the only other cluster in the quasi-partition. Finally, at resolution $\delta = 0.909$ RA joins the main cluster and the quasi-partition contains only one block.

The influence structure between clusters at any given resolution defines a partial order. More precisely, for every resolution $\delta$, the edge set $E^*_I(\delta)$ defines a partial order between the blocks given by the partition $D^*_I(\delta)$. We can use this partial order to evaluate the relative importance of different clusters by stating that more important sectors have influence over less important ones. E.g., at resolution $\delta^*_1 = 0.884$ we have that MP is more important than every other sector except for RL, which is incomparable at this resolution. There are three totally ordered chains that have MP as the most important sector at this resolution. The first one contains five sectors which are, in decreasing order of importance, MP, CO, OG, PC, and AS. The second one is comprised of MP, SC, FR, and RA, and the last one only contains MP and WH. At resolution $\delta^*_2 = 0.886$ we observe that the three-node cluster OG-PC-CO, although it contains more nodes than any other cluster, is not the most important of the quasi-partition. Instead, the singleton cluster MP has influence over the three-node cluster and, on top of that, is comparable with every other cluster in the quasi-partition. From resolution $\delta^*_3 = 0.894$ onwards, after MP joins the red cluster, the cluster with the largest number of nodes coincides with the most important of the quasi-partition. At resolution $\delta^*_4 = 0.899$ we have a total ordering among the four clusters of the quasi-partition.
This is not true for the other three depicted quasi-partitions.

As a further illustration of the quasi-clustering method $\tilde{\mathcal{H}}^*$, we apply this method to the network $N_C = (C, A_C)$ of consolidated industrial sectors [48] of year 2011, where $|C| = 14$ (see Table II), instead of the original 61 sectors. To generate the dissimilarity function $A_C$ from the similarity data available in [48] we use (173). The outcome of applying the directed single linkage quasi-clustering method $\tilde{\mathcal{H}}^*$ with output quasi-ultrametrics defined in (7) to the network $N_C$ is computed with the algorithmic formula in (134). Of the output quasi-dendrogram $\tilde{D}^*_C = (D^*_C, E^*_C)$, in Fig. 30-(a) we show the dendrogram component $D^*_C$ and in Fig. 30-(b) we depict the quasi-partitions $\tilde{D}^*_C(\tilde{\delta}^*_i)$ for $\tilde{\delta}^*_1 = 0.787$, $\tilde{\delta}^*_2 = 0.845$, $\tilde{\delta}^*_3 = 0.868$, $\tilde{\delta}^*_4 = 0.929$, and $\tilde{\delta}^*_5 = 0.933$, corresponding to resolutions 0.001 smaller than mergings in the dendrogram $D^*_C$. The reason why we use the consolidated network $N_C$ is to facilitate the visualization of quasi-partitions that capture every sector of the economy instead of only ten particular sectors as in the previous application.

The quasi-dendrogram $\tilde{D}^*_C$ captures the asymmetric influences between clusters of industrial sectors at every resolution. E.g., at resolution $\tilde{\delta}^*_1 = 0.787$ the dendrogram $D^*_C$ in Fig. 30-(a) indicates that every industrial sector forms its own singleton cluster. However, this simplistic representation, characteristic of clustering methods, ignores the asymmetric relations between clusters at resolution $\tilde{\delta}^*_1$. These influence relations are formalized in the quasi-dendrogram $\tilde{D}^*_C$ with the introduction of the edge set $E^*_C(\delta)$ for every resolution $\delta$. In particular, for $\tilde{\delta}^*_1$ we see in Fig. 30-(b) that the sectors of ‘Finance, insurance, real estate, rental, and leasing’ (FIR) and ‘Manufacturing’ (MAN) combined have influence over the remaining 12 sectors.
More precisely, the influence of FIR is concentrated on the service and commercialization sectors of the economy whereas the influence of MAN is concentrated on primary sectors, transportation, and construction. Furthermore, note that due to the transitivity (QP2) property of quasi-partitions defined in Section IX, the influence of FIR over ‘Professional and business services’ (PRO) implies influence of FIR over every sector influenced by PRO. The influence among the remaining 11 sectors, i.e., excluding MAN, FIR, and PRO, is minimal, with the ‘Mining’ (MIN) sector influencing the ‘Utilities’ (UTI) sector. This influence is promoted by the influence of the ‘Oil and gas extraction’ (OG) subsector of MIN over the utilities sector as observed in the cluster formed at resolution $\delta^{\text{U}}_3 = 0.854$ (orange) by the unilateral clustering method [cf. Fig. 28-(b)].

At resolution $\tilde{\delta}^*_2 = 0.845$, FIR and PRO form one cluster, depicted in red, and they add an influence to the ‘Construction’ (CON) sector apart from the previously formed influences, which must persist due to the influence hierarchy property of the edge set $E^*_C(\delta)$ stated in condition ($\tilde{\text{D}}$3) in the definition of quasi-dendrogram in Section IX-A. The manufacturing sector also intensifies its influences by reaching the commercialization sectors ‘Retail trade’ (RET) and ‘Wholesale trade’ (WHO) and the service sector ‘Educational services, health care, and social assistance’ (EHS). The influence among the rest of the sectors is still scarce, with the only addition of the influence of ‘Transportation and warehousing’ (TRA) over UTI. At resolution $\tilde{\delta}^*_3 = 0.868$ we see that mining MIN and manufacturing MAN form their own cluster, depicted in green. The previously formed red cluster has influence over every other cluster in the quasi-partition, including the green one. At resolution $\tilde{\delta}^*_4 = 0.929$, the red and green clusters become one, composed of four original sectors.
Also, the influence of the transportation TRA sector over the rest is intensified with the appearance of edges to the primary sector ‘Agriculture, forestry, fishing, and hunting’ (AGR), the construction CON sector, and the commercialization sectors RET and WHO. Finally, at resolution $\tilde{\delta}^*_5 = 0.933$ there is one clear main cluster depicted in red and composed of seven sectors spanning the primary, secondary, and tertiary sectors of the economy. This main cluster influences every other singleton cluster. The only other influence in the quasi-partition $\tilde{D}^*_C(0.933)$ is the one of RET over CON. For increasing resolutions, the singleton clusters join the main red cluster until at resolution $\delta = 0.988$ the 14 sectors form one single cluster.

The influence structure at every resolution induces a partial order in the blocks of the corresponding quasi-partition. As done in previous examples, we can interpret this partial order as an ordering of relative importance of the elements within each block. E.g., we can say that at resolution $\tilde{\delta}^*_1 = 0.787$, MAN is more important than MIN, which in turn is more important than UTI, which is less important than PRO. However, PRO and MAN are not comparable at this resolution. At resolution $\tilde{\delta}^*_4 = 0.929$, after the red and green clusters have merged together at resolution $\delta = 0.869$, we depict the combined cluster as red. This representation is not arbitrary: the red color of the combined cluster is inherited from the more important of the two component clusters. The fact that the red cluster is more important than the green one can be seen from the edge from the former to the latter in the quasi-partition at resolution $\tilde{\delta}^*_3$. In this sense, the edge component $E^*_C$ of the quasi-dendrogram formally provides a hierarchical structure between clusters at a fixed resolution apart from the hierarchical structure across resolutions given by the dendrogram component $D^*_C$ of the quasi-dendrogram.
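To make the two components concrete, the following sketch reads a quasi-partition off a quasi-ultrametric at a fixed resolution. It is a minimal illustration under assumptions that are not spelled out in this section: two nodes share a block when the quasi-ultrametric is at most $\delta$ in both directions, and a directed edge joins two distinct blocks when it is at most $\delta$ in one direction. The function name `quasi_partition` and the 3-node matrix are hypothetical.

```python
def quasi_partition(U, delta):
    """Return (blocks, edges) of the quasi-partition at resolution delta.

    U is assumed to be a quasi-ultrametric given as a square list of lists.
    blocks is a list of lists of node indices; edges is a set of pairs
    (b, c) of block indices meaning that block b influences block c.
    """
    n = len(U)
    blocks = []
    block_of = {}
    for x in range(n):
        for b, members in enumerate(blocks):
            r = members[0]
            # The strong triangle inequality makes mutual reachability an
            # equivalence relation, so one representative per block suffices.
            if U[x][r] <= delta and U[r][x] <= delta:
                members.append(x)
                block_of[x] = b
                break
        else:
            block_of[x] = len(blocks)
            blocks.append([x])
    edges = {(block_of[x], block_of[y])
             for x in range(n) for y in range(n)
             if block_of[x] != block_of[y] and U[x][y] <= delta}
    return blocks, edges


# Hypothetical quasi-ultrametric on three nodes (not data from the paper).
U = [[0, 1, 2],
     [3, 0, 2],
     [3, 3, 0]]

print(quasi_partition(U, 2))
print(quasi_partition(U, 3))
```

At resolution 2 the three nodes are still singletons but are totally ordered by the edges, whereas at resolution 3 they collapse into one block with no edges. This is the same distinction the text draws between what the dendrogram shows and what the edge set adds.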
E.g., if we focus only on the dendrogram $D^*_C$ in Fig. 30-(a), the nodes MIN and MAN seem to play the same role. However, when looking at the quasi-partitions at resolutions $\tilde{\delta}^*_1$ and $\tilde{\delta}^*_2$, it follows that MAN has influence over a larger set of nodes than MIN and hence plays a more important role in the clustering for increasing resolutions. Indeed, if we delete the three nodes with the strongest influence structure, namely PRO, FIR, and MAN, and apply the quasi-clustering method $\tilde{\mathcal{H}}^*$ on the remaining 11 nodes, the first merging occurs between the mining MIN and utilities UTI sectors at $\delta = 0.960$. At this same resolution, in the original dendrogram component in Fig. 30-(a), a main cluster composed of 12 nodes, excluding only ‘Other services, except government’ (OSE) and EHS, is formed. This indicates that by removing influential sectors of the economy, the tendency of the remaining sectors to co-cluster is decreased.

Fig. 30. (a) Dendrogram component $D^*_C$ of the quasi-dendrogram $\tilde{D}^*_C = (D^*_C, E^*_C)$. Output of the directed single linkage quasi-clustering method $\tilde{\mathcal{H}}^*$ when applied to the network $N_C$. (b) Quasi-partitions. Given by the specification of the quasi-dendrogram $\tilde{D}^*_C$ at a particular resolution $\tilde{D}^*_C(\tilde{\delta}^*_k)$ for $k = 1, \ldots, 5$, with $\tilde{\delta}^*_1 = 0.787$, $\tilde{\delta}^*_2 = 0.845$, $\tilde{\delta}^*_3 = 0.868$, $\tilde{\delta}^*_4 = 0.929$, and $\tilde{\delta}^*_5 = 0.933$.

TABLE II
CODE AND DESCRIPTION OF CONSOLIDATED INDUSTRIAL SECTORS

| Code | Consolidated Industrial Sector |
|------|--------------------------------|
| AER | Arts, entertain., recreation, accomm., and food serv. |
| AGR | Agriculture, forestry, fishing, and hunting |
| CON | Construction |
| EHS | Educational services, health care, and social assistance |
| FIR | Finance, insurance, real estate, rental, and leasing |
| INF | Information |
| MAN | Manufacturing |
| MIN | Mining |
| OSE | Other services, except government |
| PRO | Professional and business services |
| RET | Retail trade |
| TRA | Transportation and warehousing |
| UTI | Utilities |
| WHO | Wholesale trade |

XIII. CONCLUSION

Continuing the line of work in [11], [13], [14], we have developed a theory for hierarchically clustering asymmetric (weighted and directed) networks. Starting from the observation that generalizing methods used to cluster metric data to asymmetric networks is not always intuitive, we identified simple reasonable properties and proceeded to characterize the space of methods that are admissible with respect to them. The properties that we have considered are the following:

(A1) Axiom of Value. In a network with two nodes, the output dendrogram consists of two singleton clusters for resolutions smaller than the maximum of the two intervening dissimilarities and a single two-node cluster for larger resolutions.

(A1’) Extended Axiom of Value. Define a canonical asymmetric network of $n$ nodes in which the two directed dissimilarities mediating between any given pair of points are the same for any pair of nodes. These two dissimilarity values are allowed to be different. The output dendrogram consists of $n$ singleton clusters for resolutions smaller than the maximum of the two intervening dissimilarities, and consists of a single $n$-node cluster for larger resolutions.

(A1”) Alternative Axiom of Value. In a network with two nodes, the output dendrogram consists of two singleton clusters for resolutions smaller than the minimum of the two intervening dissimilarities, and consists of a single two-node cluster for larger resolutions.
(A1”’) Agnostic Axiom of Value. In a network with two nodes, the output dendrogram consists of two singleton clusters for resolutions smaller than the minimum of the two intervening dissimilarities, and consists of a single two-node cluster for resolutions larger than their maximum.

(A2) Axiom of Transformation. Consider two given networks $N$ and $M$ and a dissimilarity reducing map from the nodes of $N$ to the nodes of $M$, i.e., a map such that dissimilarities between the image nodes in $M$ are smaller than or equal to the corresponding dissimilarities of the pre-image nodes in $N$. Then, the resolution at which any two nodes merge into a common cluster in the network $M$ is smaller than or equal to the resolution at which their pre-images merge in the network $N$.

(P1) Property of Influence. For any network with $n$ nodes, the output dendrogram consists of $n$ singleton clusters for resolutions smaller than the minimum loop cost of the network. The loop cost is the maximum directed dissimilarity when traversing the loop in a given direction, and the minimum loop cost is the cost of the loop of smallest cost.

(P1’) Alternative Property of Influence. For any network with $n$ nodes, the output dendrogram consists of $n$ singleton clusters for resolutions smaller than the separation of the network, defined as the smallest positive dissimilarity across all pairs of nodes.

(P2) Stability. For any two networks $N$ and $M$, the generalized Gromov-Hausdorff distance between the corresponding output dendrograms is uniformly bounded by the generalized Gromov-Hausdorff distance between the networks.

Throughout the paper we identified and described clustering methods satisfying different subsets of the above properties. Several methods were based on finding directed chains of minimum cost, where the chain cost was defined as the maximum dissimilarity encountered when traversing the given chain.
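The quantities named in (P1) and (P1’), together with the directed minimum chain cost, can be sketched in a few lines. This is an illustrative sketch only: it uses a Floyd-Warshall-style relaxation with (min, max) in place of (min, +), which is a standard way to obtain minimum max-cost chains and not necessarily the procedure used in the paper. It also computes the minimum loop cost as the smallest, over pairs of distinct nodes, of the larger of the two directed chain costs, on the assumption that a loop visits at least two distinct nodes. All function names and the 3-node matrix are hypothetical.

```python
from itertools import product


def min_chain_costs(A):
    """Directed minimum chain cost for every ordered pair of nodes.

    A[i][j] is the dissimilarity from node i to node j, with A[i][i] == 0.
    A chain costs as much as its largest dissimilarity; entry (i, j) of the
    result is the smallest such cost over all chains from i to j.
    """
    n = len(A)
    C = [row[:] for row in A]
    # product(...) varies k slowest, so k plays the role of the outer loop.
    for k, i, j in product(range(n), repeat=3):
        C[i][j] = min(C[i][j], max(C[i][k], C[k][j]))
    return C


def separation(A):
    """Smallest dissimilarity across all ordered pairs of distinct nodes."""
    n = len(A)
    return min(A[i][j] for i in range(n) for j in range(n) if i != j)


def min_loop_cost(A):
    """Smallest cost of a loop through at least two distinct nodes."""
    C = min_chain_costs(A)
    n = len(A)
    return min(max(C[i][j], C[j][i]) for i in range(n) for j in range(i + 1, n))


# Hypothetical 3-node asymmetric network (not data from the paper).
A = [[0, 1, 4],
     [3, 0, 2],
     [5, 6, 0]]

print(min_chain_costs(A))
print(separation(A), min_loop_cost(A))
```

In this example the separation is smaller than the minimum loop cost, so the threshold in (P1’) is weaker than the one in (P1): a method may start merging nodes well before any loop of influence has closed.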
The set of clustering methods that we have considered in the present paper is comprised of the following:

Reciprocal. Nodes $x$ and $x'$ are clustered together at a given resolution $\delta$ if there exists a chain linking $x$ to $x'$ such that the directed chain costs are not larger than $\delta$ in either direction.

Nonreciprocal. Nodes $x$ and $x'$ are clustered together at a given resolution $\delta$ if there exist two chains, one linking $x$ to $x'$ and the other linking $x'$ to $x$, such that both directed chain costs are not larger than $\delta$. In contrast to the reciprocal method, the chains linking $x$ to $x'$ and $x'$ to $x$ may be different.

Grafting. Grafting methods are defined by exchanging branches between the reciprocal and nonreciprocal dendrograms as dictated by an exogenous parameter $\beta$. Two grafting methods were studied. In both methods, the reciprocal dendrogram is sliced at resolution $\beta$. In the first method, the branches of resolution smaller than $\beta$ are replaced by the corresponding branches of the nonreciprocal dendrogram. In the second method, the branches of resolution smaller than $\beta$ are preserved and these branches merge either at resolution $\beta$ or at the resolution given by the nonreciprocal dendrogram, whichever is larger.

Convex combinations. Given a network $N$ and two clustering methods $\mathcal{H}^1$ and $\mathcal{H}^2$, denote by $D^1$ and $D^2$ the corresponding output dendrograms. Construct a symmetric network $M$ so that the dissimilarity between any pair $(x, x')$ is given by the convex combination of the minimum resolutions at which $x$ and $x'$ are clustered together in $D^1$ and $D^2$. Cluster the network $M$ with the single linkage method to define a valid dendrogram.

Semi-reciprocal. A semi-reciprocal chain of index $t \ge 2$ between two nodes $x$ and $x'$ is formed by concatenating directed chains of length at most $t$, called secondary chains, from $x$ to $x'$ and back.
The nodes at which secondary chains in both directions concatenate must coincide, although the chains themselves might differ. Nodes $x$ and $x'$ are clustered together at a given resolution $\delta$ if they can be linked by a semi-reciprocal chain of cost not larger than $\delta$.

Algorithmic intermediate. Generalizes the semi-reciprocal clustering methods by allowing the maximum length $t$ of secondary chains to be different in both directions.

Unilateral. Consider the cost of an undirected chain as one where the edge cost between two consecutive nodes is given by the minimum directed cost in both directions. Nodes $x$ and $x'$ are clustered together at a given resolution $\delta$ if there exists an undirected chain linking $x$ and $x'$ of cost not larger than $\delta$.

We can build a taxonomy of this paper from the perspective of axioms and properties and an intertwined taxonomy from the perspective of clustering methods, as we summarize in Table III and elaborate in the following sections.

TABLE III
SUMMARY OF METHODS AND PROPERTIES

| Property | Reciprocal | Nonreciprocal | Grafting | Convex combs. | Semi-reciprocal | Algorithmic intermediate | Unilateral |
|----------|------------|---------------|----------|---------------|-----------------|--------------------------|------------|
| (A1) Axiom of Value | x | x | x | x | x | x | |
| (A1’) Extended Axiom of Value | x | x | x | x | x | x | |
| (A1”) Alt. Axiom of Value | | | | | | | x |
| (A1”’) Agnostic Axiom of Value | x | x | x | x | x | x | x |
| (A2) Axiom of Transformation | x | x | x | x | x | x | x |
| (P1) Property of Influence | x | x | x | x | x | x | |
| (P1’) Alt. Property of Influence | x | x | x | x | x | x | x |
| (P2) Stability | x | x | | | x | x | x |

A. Taxonomy of axioms and properties

The taxonomy from the perspective of axioms and properties is encoded in the rows in Table III. For most of the paper, the axioms of value (A1) and transformation (A2) were requirements for admissibility. All of the methods enumerated above satisfy the Axiom of Transformation whereas all methods, except for unilateral clustering, satisfy the Axiom of Value.

Although seemingly weak, (A1) and (A2) are a stringent source of structure. E.g., we showed that admissibility with respect to (A1) and (A2) is equivalent to admissibility with respect to the apparently stricter conditions given by the Extended Axiom of Value (A1’) combined with (A2). Likewise, we showed that the Property of Influence (P1) is implied by (A1) and (A2). This latter fact can be interpreted as stating that the requirement of bidirectional influence in two-node networks combined with the Axiom of Transformation implies a requirement for loops of influence in all networks. Given that (A1’) and (P1) are implied by (A1) and (A2) and that all methods except for unilateral clustering satisfy (A1) and (A2), it follows that all methods other than unilateral clustering satisfy (A1’) and (P1) as well.

The Alternative Axiom of Value (A1”) is satisfied by unilateral clustering only, which is also the unique method listed above that satisfies the Alternative Property of Influence (P1’) but does not satisfy the (regular) Property of Influence. We have also proved that (P1’) is implied by (A1”) and (A2) in the same manner that (P1) is implied by (A1) and (A2). Since the Agnostic Axiom of Value (A1”’) encompasses (A1) and (A1”), all of the methods listed above satisfy (A1”’).

To study stability, we adopted the Gromov-Hausdorff distance. This adopted distance was shown to be properly defined, which therefore allows the quantification of differences between networks. Since output dendrograms are equivalent to finite ultrametric spaces, which in turn are particular cases of networks, this distance can be used to compare both the given networks and their corresponding output ultrametrics. The notion of stability of a given method that we adopted is that the distance between two outputs produced by the given hierarchical clustering method is bounded by the distance between the original networks. This means that clustering methods are non-expansive maps in the space of networks, i.e., they do not increase the distance between the given networks.
An intuitive interpretation of the stability property is that similar networks yield similar dendrograms. The Stability Property (P2) is satisfied by reciprocal, nonreciprocal, semi-reciprocal, algorithmic intermediate, and unilateral clustering methods. The grafting and convex combination families are not stable in this sense.

B. Taxonomy of methods

A classification from the perspective of methods follows from reading the columns in Table III. This taxonomy is more interesting than the one in Section XIII-A because the reciprocal, nonreciprocal, and unilateral methods not only satisfy desirable properties but have also been proved to be either extremal or unique among those methods that are admissible with respect to some subset of properties.

Indeed, reciprocal $\mathcal{H}^{\text{R}}$ and nonreciprocal $\mathcal{H}^{\text{NR}}$ clustering were shown to be extremes of the range of methods that satisfy (A1)-(A2) in that the clustering outputs of these two methods provide uniform upper and lower bounds, respectively, for the output of every other method under this axiomatic framework. These two methods also satisfy all the other desirable properties that are compatible with (A1). I.e., they satisfy the extended and agnostic axioms of value, the Property of Influence, and, implied by it, the Alternative Property of Influence. They are also stable in terms of the generalized Gromov-Hausdorff distance.

Unilateral clustering $\mathcal{H}^{\text{U}}$ is the unique method that abides by the alternative set of axioms (A1”)-(A2). In that sense it plays the dual role of reciprocal and nonreciprocal clustering when we replace the Axiom of Value (A1) with the Alternative Axiom of Value (A1”). Unilateral clustering also satisfies all the desirable properties that are compatible with (A1”). It satisfies the Agnostic Axiom of Value, the Alternative Property of Influence, and Stability.
In this paper, unilateral clustering $\mathcal{H}^{\text{U}}$ and reciprocal clustering $\mathcal{H}^{\text{R}}$ were shown to be extremal among methods that are admissible with respect to (A1”’)-(A2). Unilateral clustering yields uniformly minimal ultrametric distances, while reciprocal clustering yields uniformly maximal ultrametric distances.

We also considered families of intermediate methods that yield ultrametrics that lie between the outputs of reciprocal and nonreciprocal clustering. The first such family considered is that of grafting methods $\mathcal{H}^{\text{R}/\text{NR}}(\beta)$ and $\mathcal{H}^{\text{R}/\text{R}_{\max}}(\beta)$. They satisfy the axioms and properties that can be derived from (A1)-(A2), i.e., the Extended Axiom of Value, the Agnostic Axiom of Value, the Property of Influence, and the Alternative Property of Influence. Their dependence on a cutting parameter $\beta$ is the reason why they fail to fulfill Stability, which impairs the practicality of these methods.

Convex combination clustering methods $\mathcal{H}^{12}_{\theta}$ constitute another family of intermediate methods considered in this paper. Their admissibility is based on the result that the convex combination of two admissible methods is itself an admissible clustering method. However, although not proved here, one can show that the convex combination operation does not preserve Stability in general.

Semi-reciprocal clustering methods $\mathcal{H}^{\text{SR}(t)}$ allow the formation of cyclic influences in a way that is more restrictive than nonreciprocal clustering but more permissive than reciprocal clustering, controlled by the integer parameter $t$. Semi-reciprocal clustering methods were shown to be stable. Algorithmic intermediate clustering methods $\mathcal{H}^{t,t'}$ are a generalization of semi-reciprocal methods and share the same properties.
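The role of the parameter $t$ can be illustrated with a small sketch of the semi-reciprocal ultrametric. It follows the formula listed for semi-reciprocal clustering in Table IV, where powers are taken in the min-max dioid algebra. The helper names and the 4-node cycle network are hypothetical, chosen only so that the three values of $t$ give visibly different outputs.

```python
def dioid_product(A, B):
    """Min-max matrix product: entry (i, j) is min over k of max(A[i][k], B[k][j])."""
    n = len(A)
    return [[min(max(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def dioid_power(A, k):
    """k-th min-max power of A, for k >= 1."""
    P = [row[:] for row in A]
    for _ in range(k - 1):
        P = dioid_product(P, A)
    return P


def semi_reciprocal(A, t):
    """Semi-reciprocal ultrametric with secondary chains of at most t nodes."""
    n = len(A)
    P = dioid_power(A, t - 1)          # best directed chains with <= t - 1 hops
    S = [[max(P[i][j], P[j][i]) for j in range(n)] for i in range(n)]
    return dioid_power(S, n - 1)       # chain the symmetrized secondary chains


# Hypothetical network: a cheap directed cycle 0 -> 1 -> 2 -> 3 -> 0 (cost 1),
# with every other directed dissimilarity equal to 5.
A = [[0, 1, 5, 5],
     [5, 0, 1, 5],
     [5, 5, 0, 1],
     [1, 5, 5, 0]]

for t in (2, 3, 4):
    print(t, semi_reciprocal(A, t))
```

On this example $t = 2$ merges nothing before resolution 5, which is what reciprocal clustering would give, while $t = 4$ lets the whole cycle close at resolution 1, as nonreciprocal clustering would. The intermediate value $t = 3$ only joins opposite nodes of the cycle at resolution 1, since those are the pairs that reach each other within two hops in both directions.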
TABLE IV
SUMMARY OF ALGORITHMS

| Method | Observations | Notation | Formula |
| --- | --- | --- | --- |
| Reciprocal | | $u^{\mathrm{R}}_X$ | $\big(\max(A_X, A_X^T)\big)^{(n-1)}$ |
| Nonreciprocal | | $u^{\mathrm{NR}}_X$ | $\max\big(A_X^{(n-1)}, (A_X^T)^{(n-1)}\big)$ |
| Grafting | Reciprocal/nonreciprocal | $u^{\mathrm{R}/\mathrm{NR}}_X(\beta)$ | $u^{\mathrm{NR}}_X \circ \mathbb{I}\{u^{\mathrm{R}}_X \le \beta\} + u^{\mathrm{R}}_X \circ \mathbb{I}\{u^{\mathrm{R}}_X > \beta\}$ |
| Convex combinations | Given $\mathcal{H}^1$ and $\mathcal{H}^2$ | $u^{12}_X(\theta)$ | $\big(\theta\, u^1_X + (1-\theta)\, u^2_X\big)^{(n-1)}$ |
| Semi-reciprocal ($t$) | Secondary chains of length $t$ | $u^{\mathrm{SR}(t)}_X$ | $\big(\max\big(A_X^{(t-1)}, (A_X^T)^{(t-1)}\big)\big)^{(n-1)}$ |
| Algorithmic intermediate | Given parameters $t$ and $t'$ | $u^{t,t'}_X$ | $\big(\max\big(A_X^{(t)}, (A_X^T)^{(t')}\big)\big)^{(n-1)}$ |
| Unilateral | | $u^{\mathrm{U}}_X$ | $\big(\min(A_X, A_X^T)\big)^{(n-1)}$ |
| Directed single linkage | Quasi-clustering | $\tilde{u}^*_X$ | $A_X^{(n-1)}$ |
| Single linkage | Symmetric networks | $u^{\mathrm{SL}}_X$ | $A_X^{(n-1)}$ |

C. Algorithms and applications to real datasets

Algorithms for the application of the methods described throughout the paper were developed using the min-max dioid algebra $\mathcal{A}$ on the extended nonnegative reals. In this algebra, the regular sum is replaced by the minimization operator and the regular product by maximization. The $k$-th power of the dissimilarity matrix in this algebra was shown to contain in position $i, j$ the minimum chain cost corresponding to going from node $i$ to node $j$ in at most $k$ hops. Since chain costs played a major role in the definition of clustering methods, dioid matrix powers were presented as a natural framework for algorithmic development.

The reciprocal ultrametric was computed by first symmetrizing directed dissimilarities to their maximum and then computing increasing powers of the symmetrized dissimilarity matrix until stabilization. For the nonreciprocal case, the opposite was shown to be true, i.e., we first take successive powers of the asymmetric dissimilarity matrix until stabilization and then symmetrize the result via a maximum operation. The opposite nature of both algorithms illustrated the extremal properties of reciprocal and nonreciprocal clustering in the algorithmic domain.
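The dioid matrix power formulas in Table IV can be sketched in a few lines of NumPy. The code below is a minimal illustration rather than the implementation used for the experiments: the function names and the three-node example network are assumptions made here, and the dissimilarity matrix is assumed to have a zero diagonal so that the $k$-th power accounts for chains of at most $k$ hops.

```python
import numpy as np

def dioid_product(a, b):
    """Min-max product: result[i, j] = min over k of max(a[i, k], b[k, j])."""
    return np.min(np.maximum(a[:, :, None], b[None, :, :]), axis=1)

def dioid_power(a, k):
    """k-th min-max power of a square matrix, for k >= 1."""
    result = a.copy()
    for _ in range(k - 1):
        result = dioid_product(result, a)
    return result

def reciprocal(a):
    """Symmetrize to the maximum first, then take the (n-1)-th dioid power."""
    n = a.shape[0]
    return dioid_power(np.maximum(a, a.T), max(n - 1, 1))

def nonreciprocal(a):
    """Take the (n-1)-th dioid power first, then symmetrize to the maximum."""
    n = a.shape[0]
    p = dioid_power(a, max(n - 1, 1))
    return np.maximum(p, p.T)

def semi_reciprocal(a, t):
    """Secondary chains of at most t nodes (t >= 2), then reciprocal clustering."""
    n = a.shape[0]
    p = dioid_power(a, t - 1)
    return dioid_power(np.maximum(p, p.T), max(n - 1, 1))

def unilateral(a):
    """Symmetrize to the minimum first, then take the (n-1)-th dioid power."""
    n = a.shape[0]
    return dioid_power(np.minimum(a, a.T), max(n - 1, 1))

# Hypothetical three-node network: a cheap directed cycle 0 -> 1 -> 2 -> 0
# (cost 1 per link) with expensive reverse links (cost 5).
A_EXAMPLE = np.array([[0, 1, 5],
                      [5, 0, 1],
                      [1, 5, 0]], dtype=float)

U_R = reciprocal(A_EXAMPLE)            # off-diagonal entries equal 5
U_NR = nonreciprocal(A_EXAMPLE)        # off-diagonal entries equal 1
U_SR3 = semi_reciprocal(A_EXAMPLE, 3)  # off-diagonal entries equal 1
U_U = unilateral(A_EXAMPLE)            # off-diagonal entries equal 1
```

In this example the reciprocal method merges the nodes only at resolution 5, because no pair has cheap links in both directions, whereas the nonreciprocal method merges them at resolution 1 by closing the cycle. Swapping the order of symmetrization and powering is the only difference between the two functions.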
In a similar fashion, algorithms for the remaining clustering methods presented throughout the paper were developed in terms of finite matrix powers, thus exhibiting the computational tractability of our clustering constructions. A summary of all the algorithms presented in this paper is available in Table IV.

Clustering algorithms were applied to two real-world networks. We gained insight into the migrational preferences of individuals within the United States by clustering a network of internal migration. In addition, we applied the developed theory to a network containing information about how sectors of the U.S. economy interact to generate gross domestic product. In this way, we learned about economic sectors exhibiting pronounced interdependence and reasoned about their relation with the rest of the economy.

The clusters appearing in the reciprocal dendrogram of the migration network revealed that population movements are dominated by geographical proximity. In particular, the reciprocal dendrogram showed that the strongest bidirectional migration flows correspond to pairs of states sharing urban areas. E.g., Minnesota and Wisconsin formed a tight cluster due to the spillover of Minneapolis and Duluth's suburbs into Wisconsin, and Illinois joined with Indiana because of the southern reaches of Chicago. As we looked for clusters at coarser resolutions, a separation between larger geographical regions such as East, West, Midwest, and New England could be observed. The two exceptions to geographical proximity were Texas, which clustered with the West Coast states, and Florida, which clustered with the Northeast states. The relative isolation of New England and of the state pairs Arkansas-Oklahoma and Idaho-Utah was observed in their persistence as clusters for very high resolutions. For this particular dataset the outputs of the reciprocal and nonreciprocal dendrograms were very similar, which indicates that migrational cycles are rare.
Combining this observation with the fact that reciprocal and nonreciprocal clustering are uniform upper and lower bounds, respectively, on all methods that satisfy (A1) and (A2), it further follows that all the methods that satisfy these axioms yield similar clustering outputs for this dataset.

Unilateral clustering is the only hierarchical clustering method included here that does not satisfy these axioms. Its application to the migration network revealed regional separations more marked than the ones that appeared in the reciprocal and nonreciprocal dendrograms. For coarse resolutions, we observed a clear East-West separation along the west borders of Michigan, Ohio, Kentucky, Tennessee, and Missouri. For finer resolutions we observed clustering around the most populous states. The West clustered around California, the South around Texas, the Southeast around Florida, the Northeast and New England around New York, Appalachia around Virginia, and the Midwest around Illinois. This latter pattern is indicative of the ability of unilateral clustering to capture the unidirectional influence of the populous states on the smaller ones, as opposed to the methods that satisfy (A1)-(A2), which capture bidirectional influence. To study the influence between states, we applied the directed single linkage quasi-clustering method, revealing the dominant roles of California and Massachusetts in the population influxes into the West Coast and New England, respectively.

For the network of interactions between sectors of the U.S. economy, in contrast to the migration network, the reciprocal and nonreciprocal dendrograms uncovered different clustering structures. The reciprocal dendrogram generated distinctive clusters of sectors that have significant interactions.
These include a cluster of service sectors such as financial, professional, insurance, and support services; a cluster of extractive industries such as mining, primary metals, and oil and gas extraction; and a cluster formed by farms, forestry, food, and wood processing. As is required for the formation of clusters when reciprocal clustering is applied, sectors in these clusters use as inputs large fractions of each other's outputs. The nonreciprocal dendrogram did not output distinctive separate clusters but rather a single cluster around which sectors coalesced as the resolution coarsened. This cluster started with the sectors oil and gas, petroleum and coal products, and construction as the most tightly coupled triplet, to which support services were then added, with financial services then joining the group, and so on. This pattern indicates that considering cycles of influence yields a different understanding of interactions between sectors of the U.S. economy than what can be gleaned from the direct mutual influence required by reciprocal clustering. We further observed that allowing cycles of arbitrary length generates clusters based on rather convoluted influence structures. An intermediate picture that allows cycles of restricted length was obtained by use of the semi-reciprocal method with parameter 3. This method recognizes the importance of cycles by allowing cyclic influences involving at most three sectors in each direction but discards intricate influences created by longer cycles. Unilateral clustering yielded clusters that group around large sectors of the economy. This is akin to the agglomeration around populous states observed in the case of the migration network. We finally considered the use of the directed single linkage quasi-clustering method to understand influences between economic sectors.
This analysis revealed the dominant influence of energy, manufacturing, and financial and professional services over the rest of the economy.

D. Symmetric networks and asymmetric quasi-ultrametrics

In hierarchical clustering of asymmetric networks we output a symmetric ultrametric to summarize information about the original asymmetric structure. As a particular case, we considered the construction of symmetric ultrametrics when the original network is symmetric. As a generalization, we studied the problem of defining and constructing asymmetric ultrametrics associated with asymmetric networks.

By restricting our general results to the particular case of symmetric networks, we strengthened the uniqueness result from [11], [13], which showed that single linkage is the unique admissible clustering method on finite metric spaces under a framework determined by three axioms. In the current paper, we showed that single linkage is the unique admissible method for symmetric networks (a superset of metric spaces) in a framework determined only by two axioms, i.e., the Symmetric Axiom of Value (B1) and the Axiom of Transformation (A2), out of the three axioms considered in [11], [13].

Hierarchical clustering methods output dendrograms, which are symmetric data structures. When clustering asymmetric networks, requiring the output to be symmetric might be undesirable. In this context we defined quasi-dendrograms, a generalization of dendrograms that admits asymmetric relations, and developed a theory for quasi-clustering methods, i.e., methods that output quasi-dendrograms when applied to asymmetric networks. In this context, we revised the notion of admissibility by introducing the Directed Axiom of Value ($\tilde{\mathrm{A}}$1) and the Directed Axiom of Transformation ($\tilde{\mathrm{A}}$2). Under this framework, we showed that directed single linkage (an asymmetric version of the single linkage clustering method) is the unique admissible method.
Furthermore, we proved an equivalence between quasi-dendrograms and quasi-ultrametrics that generalizes the known equivalence between dendrograms and ultrametrics. Algorithmically, the quasi-ultrametric produced by directed single linkage can be computed by applying iterated min-max matrix power operations to the dissimilarity matrix of the network until stabilization.

Directed single linkage can be used to understand relationships that cannot be understood when performing (standard) hierarchical clustering. In particular, directed influences between clusters of a given resolution define a partial order between clusters, which permits making observations about the relative importance of different clusters. This was corroborated through the application of directed single linkage to the United States internal migration network. Regular hierarchical clustering uncovers the grouping of California with other West Coast states and the grouping of Massachusetts with other New England states. Directed single linkage shows that California is the dominant state in the West Coast whereas Massachusetts appears as the dominant state in New England. When applied to the network of interactions between sectors of the United States economy, directed single linkage revealed the prominent influence of manufacturing, finance, and professional services over the rest of the economy.

E. Future developments

In order to winnow the admissible space of methods that satisfy the axioms of value (A1) and transformation (A2), one can require additional properties to be fulfilled by these methods. The property of stability, discussed in this paper, is a first step in this direction. Further desirable properties will be considered in future work, including scale invariance, representability, and excisiveness.

Scale invariance is defined by the requirement that the formation of clusters does not depend on the scale used to measure dissimilarities.
Representability, a concept introduced in [11], is an attempt to characterize methods that are described through the specification of their effect over particular exemplar networks, thus giving rise to generative models for clustering methods. Excisiveness [11] encodes the property that clustering a previously clustered network does not generate new clusters. By further restricting the space of methods when imposing these additional properties, we aim at achieving a full characterization of the space of hierarchical clustering methods.

APPENDIX A
PROOFS IN SECTION V

Proof of Proposition 2: That $\mathcal{H}^{\mathrm{NR}}$ outputs valid ultrametrics was already argued prior to the statement of Proposition 2. The proof of admissibility is analogous to the proof of Proposition 1 and is presented for completeness. For Axiom (A1), notice that for the two-node network $\vec{\Delta}_2(\alpha, \beta)$ we have $\tilde{u}^*_{p,q}(p, q) = \alpha$ and $\tilde{u}^*_{p,q}(q, p) = \beta$ because there is only one possible chain selection. According to (67) we then have

$$u^{\mathrm{NR}}_{p,q}(p, q) = \max\big(\tilde{u}^*_{p,q}(p, q),\, \tilde{u}^*_{p,q}(q, p)\big) = \max(\alpha, \beta). \tag{174}$$

To prove that Axiom (A2) is satisfied, consider arbitrary points $x, x' \in X$ and denote by $C^*(x, x')$ one chain achieving the minimum chain cost in (7),

$$\tilde{u}^*_X(x, x') = \max_{i \mid x_i \in C^*(x, x')} A_X(x_i, x_{i+1}). \tag{175}$$

Consider the transformed chain $C_Y(\phi(x), \phi(x')) = [\phi(x) = \phi(x_0), \ldots, \phi(x_l) = \phi(x')]$ in the space $Y$. Since the map $\phi : X \to Y$ reduces dissimilarities, we have that for all links in this chain $A_Y(\phi(x_i), \phi(x_{i+1})) \le A_X(x_i, x_{i+1})$. Consequently,

$$\max_{i \mid x_i \in C_Y(\phi(x), \phi(x'))} A_Y(\phi(x_i), \phi(x_{i+1})) \le \max_{i \mid x_i \in C^*(x, x')} A_X(x_i, x_{i+1}). \tag{176}$$

Further note that the minimum chain cost $\tilde{u}^*_Y(\phi(x), \phi(x'))$ among all chains linking $\phi(x)$ to $\phi(x')$ cannot exceed the cost in the given chain $C_Y(\phi(x), \phi(x'))$.
Combining this observation with the inequality in (176), it follows that

$$\tilde{u}^*_Y(\phi(x), \phi(x')) \le \max_{i \mid x_i \in C^*(x, x')} A_X(x_i, x_{i+1}) = \tilde{u}^*_X(x, x'), \tag{177}$$

where we also used (175) to write the equality. The bound in (177) is true for an arbitrary ordered pair $(x, x')$. In particular, it is true if we reverse the order to consider the pair $(x', x)$. Consequently, we can write

$$\max\big(\tilde{u}^*_Y(\phi(x), \phi(x')),\, \tilde{u}^*_Y(\phi(x'), \phi(x))\big) \le \max\big(\tilde{u}^*_X(x, x'),\, \tilde{u}^*_X(x', x)\big), \tag{178}$$

because each maximand on the left is no larger than its corresponding maximand on the right. To complete the proof, just notice that the expressions in (178) correspond to the nonreciprocal ultrametric distances $u^{\mathrm{NR}}_Y(\phi(x), \phi(x'))$ and $u^{\mathrm{NR}}_X(x, x')$ [cf. (67)]. Thus we have that for a dissimilarity reducing map $\phi : X \to Y$ the nonreciprocal ultrametric distances satisfy $u^{\mathrm{NR}}_Y(\phi(x), \phi(x')) \le u^{\mathrm{NR}}_X(x, x')$ as required by Axiom (A2) [cf. (24)].

APPENDIX B
PROOFS IN SECTION VII

Proof of Proposition 3: The function $u^{\mathrm{R}/\mathrm{NR}}_X(\beta)$ fulfills the symmetry $u^{\mathrm{R}/\mathrm{NR}}_X(x, x'; \beta) = u^{\mathrm{R}/\mathrm{NR}}_X(x', x; \beta)$, nonnegativity, and identity $u^{\mathrm{R}/\mathrm{NR}}_X(x, x'; \beta) = 0 \Leftrightarrow x = x'$ properties because $u^{\mathrm{NR}}_X$ and $u^{\mathrm{R}}_X$ fulfill them separately. Hence, to show that $u^{\mathrm{R}/\mathrm{NR}}_X(\beta)$ is a properly defined ultrametric, we need to show that it satisfies the strong triangle inequality (12). To show this, we split the proof into two cases: $u^{\mathrm{R}}_X(x, x') \le \beta$ and $u^{\mathrm{R}}_X(x, x') > \beta$. Note that, by definition,

$$u^{\mathrm{NR}}_X(x, x') \le u^{\mathrm{R}/\mathrm{NR}}_X(x, x'; \beta) \le u^{\mathrm{R}}_X(x, x'). \tag{179}$$

Starting with the case where $u^{\mathrm{R}}_X(x, x') \le \beta$, since $u^{\mathrm{NR}}_X$ satisfies (12) we can state that

$$u^{\mathrm{R}/\mathrm{NR}}_X(x, x'; \beta) = u^{\mathrm{NR}}_X(x, x') \le \max\big(u^{\mathrm{NR}}_X(x, x''),\, u^{\mathrm{NR}}_X(x'', x')\big). \tag{180}$$

Using the lower bound inequality in (179) we can write

$$\max\big(u^{\mathrm{NR}}_X(x, x''),\, u^{\mathrm{NR}}_X(x'', x')\big) \le \max\big(u^{\mathrm{R}/\mathrm{NR}}_X(x, x''; \beta),\, u^{\mathrm{R}/\mathrm{NR}}_X(x'', x'; \beta)\big). \tag{181}$$

Combining (180) and (181), we obtain

$$u^{\mathrm{R}/\mathrm{NR}}_X(x, x'; \beta) \le \max\big(u^{\mathrm{R}/\mathrm{NR}}_X(x, x''; \beta),\, u^{\mathrm{R}/\mathrm{NR}}_X(x'', x'; \beta)\big), \tag{182}$$

which implies that $u^{\mathrm{R}/\mathrm{NR}}_X(\beta)$ fulfills the strong triangle inequality in this case. In the second case, suppose that $u^{\mathrm{R}}_X(x, x') > \beta$. From the validity of the strong triangle inequality (12) for $u^{\mathrm{R}}_X$, we can write

$$\beta < u^{\mathrm{R}/\mathrm{NR}}_X(x, x'; \beta) = u^{\mathrm{R}}_X(x, x') \le \max\big(u^{\mathrm{R}}_X(x, x''),\, u^{\mathrm{R}}_X(x'', x')\big). \tag{183}$$

This implies that at least one of $u^{\mathrm{R}}_X(x, x'')$ and $u^{\mathrm{R}}_X(x'', x')$ is greater than $\beta$. For any pair whose reciprocal distance exceeds $\beta$ we have $u^{\mathrm{R}/\mathrm{NR}}_X(\beta) = u^{\mathrm{R}}_X$, whereas a term that does not exceed $\beta$ neither attains the maximum on the right of (183) nor, after being replaced by the smaller value $u^{\mathrm{NR}}_X$, attains the maximum of the grafted values. Hence,

$$\max\big(u^{\mathrm{R}}_X(x, x''),\, u^{\mathrm{R}}_X(x'', x')\big) = \max\big(u^{\mathrm{R}/\mathrm{NR}}_X(x, x''; \beta),\, u^{\mathrm{R}/\mathrm{NR}}_X(x'', x'; \beta)\big). \tag{184}$$

By substituting (184) into (183), we can justify the same inequality as in (182) for this second case. Since the two cases studied include all possible situations, we can conclude that $u^{\mathrm{R}/\mathrm{NR}}_X(\beta)$ always satisfies the strong triangle inequality.

To show that $\mathcal{H}^{\mathrm{R}/\mathrm{NR}}(\beta)$ satisfies Axiom (A1), it suffices to see that in a two-node network $u^{\mathrm{NR}}_X$ and $u^{\mathrm{R}}_X$ coincide, meaning that we must have $u^{\mathrm{R}/\mathrm{NR}}_X(\beta) = u^{\mathrm{NR}}_X = u^{\mathrm{R}}_X$. Since $\mathcal{H}^{\mathrm{R}}$ and $\mathcal{H}^{\mathrm{NR}}$ fulfill (A1), the clustering method $\mathcal{H}^{\mathrm{R}/\mathrm{NR}}(\beta)$ must satisfy (A1) as well.

To prove (A2), consider a dissimilarity reducing map $\phi : X \to Y$ as defined in Section III and split consideration according to whether the reciprocal ultrametric satisfies $u^{\mathrm{R}}_X(x, x') \le \beta$ or $u^{\mathrm{R}}_X(x, x') > \beta$. When $u^{\mathrm{R}}_X(x, x') \le \beta$ we must have $u^{\mathrm{R}}_Y(\phi(x), \phi(x')) \le \beta$ because $\mathcal{H}^{\mathrm{R}}$ satisfies (A2) and $\phi$ is a dissimilarity reducing map.
Hence, according to the definition in (92) we must have that both $u^{\mathrm{R}/\mathrm{NR}}_X(x, x'; \beta)$ and $u^{\mathrm{R}/\mathrm{NR}}_Y(\phi(x), \phi(x'); \beta)$ coincide with the nonreciprocal ultrametric,

$$u^{\mathrm{R}/\mathrm{NR}}_X(x, x'; \beta) = u^{\mathrm{NR}}_X(x, x'), \qquad u^{\mathrm{R}/\mathrm{NR}}_Y(\phi(x), \phi(x'); \beta) = u^{\mathrm{NR}}_Y(\phi(x), \phi(x')). \tag{185}$$

Since $\mathcal{H}^{\mathrm{NR}}$ satisfies (A2), it is an immediate consequence of the equalities in (185) that

$$u^{\mathrm{R}/\mathrm{NR}}_X(x, x'; \beta) \ge u^{\mathrm{R}/\mathrm{NR}}_Y(\phi(x), \phi(x'); \beta). \tag{186}$$

This means that $\mathcal{H}^{\mathrm{R}/\mathrm{NR}}(\beta)$ satisfies Axiom (A2) when $u^{\mathrm{R}}_X(x, x') \le \beta$. In the second case, when $u^{\mathrm{R}}_X(x, x') > \beta$, the validity of (A2) for the reciprocal ultrametric $u^{\mathrm{R}}_X$ allows us to write

$$u^{\mathrm{R}/\mathrm{NR}}_X(x, x'; \beta) = u^{\mathrm{R}}_X(x, x') \ge u^{\mathrm{R}}_Y(\phi(x), \phi(x')). \tag{187}$$

Applying the fact that $u^{\mathrm{R}}_Y$ is an upper bound on $u^{\mathrm{R}/\mathrm{NR}}_Y(\beta)$ [cf. (179)], we have

$$u^{\mathrm{R}}_Y(\phi(x), \phi(x')) \ge u^{\mathrm{R}/\mathrm{NR}}_Y(\phi(x), \phi(x'); \beta). \tag{188}$$

By combining (187) and (188), we obtain an inequality analogous to (186) for the second case. This proves the fulfillment of (A2) by $\mathcal{H}^{\mathrm{R}/\mathrm{NR}}(\beta)$ in the general case.

Proof of Proposition 4: As in Proposition 3, to prove that $u^{\mathrm{R}/\mathrm{R}_{\max}}_X(\beta)$ is properly defined, it suffices to show the strong triangle inequality (12). To show this, we divide the proof into two cases: $u^{\mathrm{R}}_X(x, x') \le \beta$ and $u^{\mathrm{R}}_X(x, x') > \beta$. Note that, by definition,

$$u^{\mathrm{NR}}_X(x, x') \le u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x, x'; \beta) \le u^{\mathrm{R}}_X(x, x'). \tag{189}$$

In the case where $u^{\mathrm{R}}_X(x, x') \le \beta$, recalling the validity of the strong triangle inequality (12) for $u^{\mathrm{R}}_X$, we can assert that

$$u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x, x'; \beta) = u^{\mathrm{R}}_X(x, x') \le \min\big(\beta,\, \max\big(u^{\mathrm{R}}_X(x, x''),\, u^{\mathrm{R}}_X(x'', x')\big)\big). \tag{190}$$

Using the definition (94), one can say

$$\min\big(\beta,\, \max\big(u^{\mathrm{R}}_X(x, x''),\, u^{\mathrm{R}}_X(x'', x')\big)\big) \le \max\big(u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x, x''; \beta),\, u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x'', x'; \beta)\big). \tag{191}$$

The combination of (190) and (191) leads to

$$u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x, x'; \beta) \le \max\big(u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x, x''; \beta),\, u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x'', x'; \beta)\big), \tag{192}$$

which shows the strong triangle inequality in this first case. In the case where $u^{\mathrm{R}}_X(x, x') > \beta$, using the definition (94) and the strong triangle inequality applied to $u^{\mathrm{NR}}_X$, we get

$$u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x, x'; \beta) = \max\big(\beta,\, u^{\mathrm{NR}}_X(x, x')\big) \le \max\big(\beta,\, \max\big(u^{\mathrm{NR}}_X(x, x''),\, u^{\mathrm{NR}}_X(x'', x')\big)\big). \tag{193}$$

However, since $u^{\mathrm{R}}_X(x, x') > \beta$, from the strong triangle inequality applied to $u^{\mathrm{R}}_X$ we know that either $u^{\mathrm{R}}_X(x, x'') > \beta$ or $u^{\mathrm{R}}_X(x'', x') > \beta$. This implies that

$$\max\big(\beta,\, \max\big(u^{\mathrm{NR}}_X(x, x''),\, u^{\mathrm{NR}}_X(x'', x')\big)\big) = \max\big(u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x, x''; \beta),\, u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x'', x'; \beta)\big). \tag{194}$$

By substituting (194) into (193) we obtain a result analogous to (192) for this second case. This proves that $u^{\mathrm{R}/\mathrm{R}_{\max}}_X(\beta)$ fulfills the strong triangle inequality.

The proof that $\mathcal{H}^{\mathrm{R}/\mathrm{R}_{\max}}(\beta)$ satisfies (A1) is identical to the proof in Proposition 3 and is based on (189), so we omit it.

Finally, we divide the proof that $\mathcal{H}^{\mathrm{R}/\mathrm{R}_{\max}}(\beta)$ satisfies (A2) into the cases where $u^{\mathrm{R}}_X(x, x') \le \beta$ and $u^{\mathrm{R}}_X(x, x') > \beta$. Consider a dissimilarity reducing map $\phi : X \to Y$ as defined in Section III. In the first case, if $u^{\mathrm{R}}_X(x, x') \le \beta$ then $u^{\mathrm{R}}_Y(\phi(x), \phi(x')) \le \beta$ since $\mathcal{H}^{\mathrm{R}}$ satisfies (A2). Hence,

$$u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x, x'; \beta) = u^{\mathrm{R}}_X(x, x'), \qquad u^{\mathrm{R}/\mathrm{R}_{\max}}_Y(\phi(x), \phi(x'); \beta) = u^{\mathrm{R}}_Y(\phi(x), \phi(x')). \tag{195}$$

Since $\mathcal{H}^{\mathrm{R}}$ satisfies (A2), we can conclude that

$$u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x, x'; \beta) \ge u^{\mathrm{R}/\mathrm{R}_{\max}}_Y(\phi(x), \phi(x'); \beta), \tag{196}$$

showing the fulfillment of Axiom (A2) in this first case. In the case where $u^{\mathrm{R}}_X(x, x') > \beta$, we apply definition (94) and the fact that $\mathcal{H}^{\mathrm{NR}}$ satisfies (A2) to get

$$u^{\mathrm{R}/\mathrm{R}_{\max}}_X(x, x'; \beta) = \max\big(\beta,\, u^{\mathrm{NR}}_X(x, x')\big) \ge \max\big(\beta,\, u^{\mathrm{NR}}_Y(\phi(x), \phi(x'))\big). \tag{197}$$

However, the piecewise definition (94) implies that

$$\max\big(\beta,\, u^{\mathrm{NR}}_Y(\phi(x), \phi(x'))\big) \ge u^{\mathrm{R}/\mathrm{R}_{\max}}_Y(\phi(x), \phi(x'); \beta). \tag{198}$$

By substitution of (198) into (197), a result analogous to (196) can be shown for the second case. This proves the fulfillment of (A2) in the general case.

Proof of Proposition 5: We need to show that (1) $u^{12}_X(\theta)$ is a valid ultrametric and (2) the method $\mathcal{H}^{12}_{\theta}$ satisfies (A1) and (A2). As discussed in the paragraph preceding the statement of this proposition, $u^{12}_X(\theta)$ is the output of applying the single linkage clustering method to the symmetric network $N^{12}_{\theta}$. Hence, $u^{12}_X(\theta)$ is well defined.

To see that axiom (A1) is fulfilled, pick an arbitrary two-node network $\vec{\Delta}_2(\alpha, \beta) = (\{p, q\}, A_{p,q})$ with $A_{p,q}(p, q) = \alpha$ and $A_{p,q}(q, p) = \beta$. Since methods $\mathcal{H}^1$ and $\mathcal{H}^2$ are admissible, in particular they satisfy (A1), hence $u^1_{p,q}(p, q) = u^2_{p,q}(p, q) = \max(\alpha, \beta)$. It then follows from (95) that $A^{12}_{p,q}(p, q; \theta) = A^{12}_{p,q}(q, p; \theta) = \max(\alpha, \beta)$ for all possible values of $\theta$. Moreover, since in (96) all possible chains joining $p$ and $q$ must contain these two nodes as consecutive elements, we have that

$$u^{12}_{p,q}(p, q; \theta) = A^{12}_{p,q}(p, q; \theta) = \max(\alpha, \beta), \tag{199}$$

for all $\theta$, satisfying axiom (A1).

Fulfillment of axiom (A2) also follows from admissibility of methods $\mathcal{H}^1$ and $\mathcal{H}^2$. Suppose there are two networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$ and a dissimilarity reducing map $\phi : X \to Y$. From the fact that $\mathcal{H}^1$ and $\mathcal{H}^2$ satisfy (A2) we have

$$u^1_X(x, x') \ge u^1_Y(\phi(x), \phi(x')), \tag{200}$$

$$u^2_X(x, x') \ge u^2_Y(\phi(x), \phi(x')). \tag{201}$$

By multiplying the inequality (200) by $\theta$ and (201) by $(1 - \theta)$, and adding both inequalities, we obtain [cf. (95)]

$$A^{12}_X(x, x'; \theta) \ge A^{12}_Y(\phi(x), \phi(x'); \theta), \tag{202}$$

for all $0 \le \theta \le 1$.
This implies that the map $\phi$ is also dissimilarity reducing between the networks $(X, A^{12}_X(\theta))$ and $(Y, A^{12}_Y(\theta))$. Recall that $(X, u^{12}_X(\theta)) = \mathcal{H}(X, A^{12}_X(\theta))$ and $(Y, u^{12}_Y(\theta)) = \mathcal{H}(Y, A^{12}_Y(\theta))$ for the admissible method $\mathcal{H}$ since the networks are symmetric. Moreover, we know that $\phi$ is a dissimilarity reducing map between these two symmetric networks. Hence, from admissibility of the method $\mathcal{H}$ it follows that

$$u^{12}_X(x, x'; \theta) \ge u^{12}_Y(\phi(x), \phi(x'); \theta), \tag{203}$$

for all $\theta$, showing that axiom (A2) is satisfied by the convex combination method.

Proof of Proposition 6: We begin the proof by showing that (98) outputs a valid ultrametric. That $u^{\mathrm{SR}(t)}_X(x, x') = 0 \Leftrightarrow x = x'$ and $u^{\mathrm{SR}(t)}_X(x, x') = u^{\mathrm{SR}(t)}_X(x', x)$ are immediate for all $t \ge 2$ from the definition (98). Hence, we need to show fulfillment of the strong triangle inequality (12). For a fixed $t$, pick an arbitrary pair of nodes $x$ and $x'$ and an arbitrary intermediate node $x''$. Let us denote by $C^*(x, x'')$ and $C^*(x'', x')$ a pair of main chains that satisfy definition (98) for $u^{\mathrm{SR}(t)}_X(x, x'')$ and $u^{\mathrm{SR}(t)}_X(x'', x')$, respectively. Construct $C(x, x') = C^*(x, x'') \uplus C^*(x'', x')$ by concatenating the aforementioned minimizing chains. The chain $C(x, x')$ is a particular chain for computing $u^{\mathrm{SR}(t)}_X(x, x')$ and need not be the minimizing one. This implies that

$$u^{\mathrm{SR}(t)}_X(x, x') \le \max\big(u^{\mathrm{SR}(t)}_X(x, x''),\, u^{\mathrm{SR}(t)}_X(x'', x')\big), \tag{204}$$

proving the strong triangle inequality.

To show fulfillment of (A1), consider the network $\vec{\Delta}_2(\alpha, \beta) = (\{p, q\}, A_{p,q})$ with $A_{p,q}(p, q) = \alpha$ and $A_{p,q}(q, p) = \beta$. Note that in this situation, $A^{\mathrm{SR}(t)}_{p,q}(p, q) = \alpha$ and $A^{\mathrm{SR}(t)}_{p,q}(q, p) = \beta$ for all $t$ [cf. (97)], since the only possible chain between them contains just these two nodes.
Hence, from definition (98),

$$u^{\mathrm{SR}(t)}_{p,q}(p, q) = \max(\alpha, \beta), \tag{205}$$

for all $t$. Consequently, axiom (A1) is satisfied.

To show fulfillment of axiom (A2), consider two arbitrary networks $(X, A_X)$ and $(Y, A_Y)$ and a dissimilarity reducing map $\phi : X \to Y$ between them. Further, denote by $C^*_X(x, x') = [x = x_0, \ldots, x_l = x']$ a main chain that achieves the minimum semi-reciprocal cost in (98). Then, for a fixed $t$, we can write

$$u^{\mathrm{SR}(t)}_X(x, x') = \max_{i \mid x_i \in C^*_X(x, x')} A^{\mathrm{SR}(t)}_X(x_i, x_{i+1}). \tag{206}$$

Consider now a secondary chain $C^X_t(x_i, x_{i+1}) = [x_i = x^{(0)}, \ldots, x^{(l')} = x_{i+1}]$ between two consecutive nodes $x_i$ and $x_{i+1}$ of the minimizing chain $C^*_X(x, x')$. Further, focus on the image of this secondary chain under the map $\phi$, that is, $C^Y_t(\phi(x_i), \phi(x_{i+1})) := \phi\big(C^X_t(x_i, x_{i+1})\big) = [\phi(x_i) = \phi(x^{(0)}), \ldots, \phi(x^{(l')}) = \phi(x_{i+1})]$ in the space $Y$.

Since the map $\phi : X \to Y$ is dissimilarity reducing, $A_Y(\phi(x^{(i)}), \phi(x^{(i+1)})) \le A_X(x^{(i)}, x^{(i+1)})$ for all links in this chain. Analogously, we can bound the dissimilarities in secondary chains $C^X_t(x_{i+1}, x_i)$ from $x_{i+1}$ back to $x_i$. Thus, from (97) we can state that

$$A^{\mathrm{SR}(t)}_X(x_i, x_{i+1}) \ge A^{\mathrm{SR}(t)}_Y(\phi(x_i), \phi(x_{i+1})), \qquad A^{\mathrm{SR}(t)}_X(x_{i+1}, x_i) \ge A^{\mathrm{SR}(t)}_Y(\phi(x_{i+1}), \phi(x_i)). \tag{207}$$

Denote by $C_Y(\phi(x), \phi(x'))$ the image of the main chain $C^*_X(x, x')$ under the map $\phi$. Notice that $C_Y(\phi(x), \phi(x'))$ is a particular chain joining $\phi(x)$ and $\phi(x')$, whereas the semi-reciprocal ultrametric computes the minimum across all main chains. Therefore,

$$u^{\mathrm{SR}(t)}_Y(\phi(x), \phi(x')) \le \max_{i} A^{\mathrm{SR}(t)}_Y(\phi(x_i), \phi(x_{i+1})). \tag{208}$$

By bounding the right-hand side of (208) using (207) we can write

$$u^{\mathrm{SR}(t)}_Y(\phi(x), \phi(x')) \le \max_{i} A^{\mathrm{SR}(t)}_X(x_i, x_{i+1}). \tag{209}$$

From the combination of (209) and (206), it follows that $u^{\mathrm{SR}(t)}_Y(\phi(x), \phi(x')) \le u^{\mathrm{SR}(t)}_X(x, x')$. This proves that (A2) is satisfied.

APPENDIX C
PROOFS IN SECTION VIII

Proof of Proposition 7: By comparison with (108), in (117) we in fact compute reciprocal clustering on the network $(X, A^{(t-1)}_X)$. Furthermore, from the definition of matrix multiplication (103) in the dioid algebra $(\mathbb{R}_+ \cup \{+\infty\}, \min, \max)$, the $(t-1)$-th dioid power $A^{(t-1)}_X$ is such that its $i, j$ entry $[A^{(t-1)}_X]_{ij}$ represents the minimum infinity norm cost of a chain containing at most $t$ nodes, i.e.,

$$[A^{(t-1)}_X]_{ij} = \min_{C_t(x_i, x_j)} \; \max_{k \mid x_k \in C_t(x_i, x_j)} A_X(x_k, x_{k+1}). \tag{210}$$

Comparing (210) with (97), it is just a matter of notation to see that

$$A^{(t-1)}_X = A^{\mathrm{SR}(t)}_X. \tag{211}$$

Hence, (117) can be reinterpreted as computing reciprocal clustering on the network $(X, A^{\mathrm{SR}(t)}_X)$, which is the definition of semi-reciprocal clustering [cf. (98) and (61)].

Proof of Proposition 8: Since method $\mathcal{H}^{t,t'}$ is a generalization of $\mathcal{H}^{\mathrm{SR}(t)}$ that allows different lengths for forward and backward secondary chains, the proof is almost identical to the one of Proposition 6. The only major difference is that showing the symmetry of $u^{t,t'}_X$, i.e., $u^{t,t'}_X(x, x') = u^{t,t'}_X(x', x)$ for all $x, x' \in X$, is not immediate as in the case of $u^{\mathrm{SR}(t)}_X$. In a fashion similar to (98), we rewrite the definition of $u^{t,t'}_X$ given an arbitrary network $(X, A_X)$ in terms of minimizing chains,

$$u^{t,t'}_X(x, x') = \min_{C(x, x')} \; \max_{i \mid x_i \in C(x, x')} A^{t,t'}_X(x_i, x_{i+1}), \tag{212}$$

where the function $A^{t,t'}_X$ is defined as

$$A^{t,t'}_X(x, x') = \max\big(A^{\mathrm{SR}(t+1)}_X(x, x'),\, A^{\mathrm{SR}(t'+1)}_X(x', x)\big), \tag{213}$$

for all $x, x' \in X$. The functions $A^{\mathrm{SR}(\cdot)}_X$ in (213) are defined as in (97).
Notice that $A^{t,t'}_X$ is not symmetric in general, hence symmetry of $u^{t,t'}_X$ has to be explicitly verified. In order to do so, we use the result in the following claim.

Claim 2: Given an arbitrary network $(X, A_X)$ and a pair of nodes $x, x' \in X$ such that $u^{t,t'}_X(x, x') = \delta$, then $u^{t,t'}_X(x', x) \le \delta$.

Proof: To show Claim 2, we must show that there exists a chain $\hat{C}(x', x)$ from $x'$ back to $x$ whose cost, as measured in (212), is no greater than $\delta$. Suppose $u^{t,t'}_X(x, x') = \delta$ and let $C(x, x') = [x = x_0, x_1, \ldots, x_l = x']$ be a minimizing chain achieving the cost $\delta$ in (212). From definition (213), there must exist secondary chains in both directions between every pair of consecutive nodes $x_i, x_{i+1}$ in $C(x, x')$ with cost no greater than $\delta$. These secondary chains $C_{t+1}(x_i, x_{i+1})$ and $C_{t'+1}(x_{i+1}, x_i)$ can have at most $t + 1$ nodes in the forward direction and at most $t' + 1$ nodes in the opposite direction. Moreover, without loss of generality we may consider the secondary chains as having exactly $t + 1$ nodes in one direction and $t' + 1$ in the other if we do not require consecutive nodes to be distinct. In this way, if a minimizing secondary chain has, e.g., $t - 1$ nodes, we can think of it as having $t + 1$ nodes where the last two links are self loops with null cost.

Focus on a pair of consecutive nodes $x_i, x_{i+1}$ of the main chain $C(x, x')$. If we can construct a chain from $x_{i+1}$ back to $x_i$ with cost not greater than $\delta$, then we can concatenate these chains for pairs $x_{i+1}, x_i$ for all $i$ and obtain a chain $\hat{C}(x', x)$ from $x'$ back to $x$ of cost not higher than $\delta$, concluding the proof of Claim 2. Notice that the secondary chains $C_{t'+1}(x_{i+1}, x_i)$ and $C_{t+1}(x_i, x_{i+1})$ can be concatenated to form a loop, i.e., a chain starting and ending at the same node, $L(x_{i+1}, x_{i+1}) = C_{t'+1}(x_{i+1}, x_i) \uplus C_{t+1}(x_i, x_{i+1})$ of $t' + t + 1$ nodes and cost not larger than $\delta$.
We rename the nodes in $L(x_{i+1}, x_{i+1}) = [x_{i+1} = x_0, x_1, \ldots, x_{t'} = x_i, \ldots, x_{t'+t-1}, x_{t'+t} = x_{i+1}]$ starting at $x_{i+1}$ and following the direction of the loop. Now we are going to construct a main chain $C(x_{i+1}, x_i)$ from $x_{i+1}$ to $x_i$. We may reinterpret the loop $L(x_{i+1}, x_{i+1})$ as the concatenation of two secondary chains $[x_0, x_1, \ldots, x_t]$ and $[x_t, x_{t+1}, \ldots, x_{t+t'} = x_0]$, each of them having cost not greater than $\delta$. Thus, we may pick $x_0 = x_{i+1}$ and $x_t$ as the first two nodes of the main chain $C(x_{i+1}, x_i)$. With the same reasoning, we may link $x_t$ with $x_{2t \bmod (t+t')}$ through the secondary chains $[x_t, x_{t+1}, \ldots, x_{2t \bmod (t+t')}]$ and $[x_{2t \bmod (t+t')}, \ldots, x_{(2t+t') \bmod (t+t')} = x_t]$ with cost not exceeding $\delta$, and we may link $x_{2t \bmod (t+t')}$ with $x_{3t \bmod (t+t')}$ with cost not exceeding $\delta$, and so on. Hence, we construct the main chain

$$C(x_{i+1}, x_i) = \big[x_0,\, x_t,\, x_{2t \bmod (t+t')},\, \ldots,\, x_{(t+t'-1)t \bmod (t+t')}\big], \tag{214}$$

which, by construction, has cost not exceeding $\delta$. In order to finish the proof, we need to verify that the last node in the chain in (214) is in fact $x_{t'}$. To do so, we have to show that

$$(t + t' - 1)\,t \equiv t' \pmod{t + t'}. \tag{215}$$

This congruence is immediate when rearranging the terms on the left-hand side,

$$(t + t')(t - 1) + t' \equiv t' \pmod{t + t'}. \tag{216}$$

Consequently, using the chain in (214) we can go back from $x_{i+1}$ to $x_i$ with cost not exceeding $\delta$. Since this pair was picked arbitrarily, we may concatenate chains like the one in (214) for every value of $i$ and generate the chain $\hat{C}(x', x)$ coming back from $x'$ to $x$ with cost less than or equal to $\delta$, completing the proof of Claim 2.

From Claim 2, we know that if $u^{t,t'}_X(x, x') = \delta$ then $u^{t,t'}_X(x', x) \le \delta$.
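The index arithmetic behind (214)-(216) can be checked numerically with a short Python sketch. The function names are hypothetical and the check covers only a small range of positive integers $t$ and $t'$; it is a sanity check of the congruence, not a substitute for the argument in the proof of Claim 2.

```python
def backward_main_chain_indices(t, t_prime):
    """Loop indices visited by the chain in (214): k * t mod (t + t')
    for k = 0, 1, ..., t + t' - 1."""
    modulus = t + t_prime
    return [(k * t) % modulus for k in range(modulus)]

def last_index_is_t_prime(t, t_prime):
    """Check the congruence (215): the last index equals t' modulo t + t'."""
    indices = backward_main_chain_indices(t, t_prime)
    return indices[-1] == t_prime % (t + t_prime)

# Worked example with t = 2 and t' = 3 (loop of t + t' = 5 links):
# the visited indices are 0, 2, 4, 6 mod 5 = 1, 8 mod 5 = 3, ending at t' = 3.
EXAMPLE_INDICES = backward_main_chain_indices(2, 3)
```

For $t = 2$ and $t' = 3$ the chain visits the loop positions $0, 2, 4, 1, 3$ and therefore ends at position $t' = 3$, which is the node $x_i$ in the renamed loop.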
However, suppose that $u^{t,t'}_X(x', x) = \delta' < \delta$. Then, by applying Claim 2 to the pair $x', x \in X$, it must be that $u^{t,t'}_X(x, x') \le \delta' < \delta$, which is a contradiction since $u^{t,t'}_X(x, x') = \delta$. Thus, it cannot be that $u^{t,t'}_X(x', x) < \delta$ and, since $u^{t,t'}_X(x', x) \le \delta$, we have that $u^{t,t'}_X(x', x) = \delta$, showing symmetry of $u^{t,t'}_X$ as wanted.

APPENDIX D
PROOFS IN SECTION IX

Proof of Theorem 6: In order to show that $\Psi$ is a well-defined map, we must show that $\Psi(\tilde{D}_X)$ is a quasi-ultrametric network for every quasi-dendrogram $\tilde{D}_X$. Given an arbitrary quasi-dendrogram $\tilde{D}_X = (D_X, E_X)$, for a particular $\delta_0 \ge 0$ consider the quasi-partition $\tilde{D}_X(\delta_0)$. Consider the range of resolutions $\delta$ associated with such quasi-partition, i.e.,

$\{\delta \ge 0 \mid \tilde{D}_X(\delta) = \tilde{D}_X(\delta_0)\}$. (217)

Right continuity ($\tilde{\text{D}}$4) of $\tilde{D}_X$ ensures that the minimum of the set in (217) is well-defined and hence definition (127) is valid. To prove that $\tilde{u}_X$ in (127) is a quasi-ultrametric we need to show that it attains non-negative values as well as the identity and strong triangle inequality properties. That $\tilde{u}_X$ attains non-negative values is clear from the definition (127). The identity property is implied by the first boundary condition in ($\tilde{\text{D}}$1). Since $[x]_0 = [x]_0$ for all $x \in X$, we must have $\tilde{u}_X(x, x) = 0$. Conversely, since for all $x \ne x' \in X$, $([x]_0, [x']_0) \notin E_X(0)$ and $[x]_0 \ne [x']_0$, we must have that $\tilde{u}_X(x, x') > 0$ for $x \ne x'$ and the identity property is satisfied. To see that $\tilde{u}_X$ satisfies the strong triangle inequality in (12), consider nodes $x$, $x'$, and $x''$ such that the lowest resolution for which $[x]_\delta = [x'']_\delta$ or $([x]_\delta, [x'']_\delta) \in E_X(\delta)$ is $\delta_1$ and the lowest resolution for which $[x'']_\delta = [x']_\delta$ or $([x'']_\delta, [x']_\delta) \in E_X(\delta)$ is $\delta_2$.
Right continuity ($\tilde{\text{D}}$4) ensures that these lowest resolutions are well-defined. According to (127) we then have

$\tilde{u}_X(x, x'') = \delta_1, \quad \tilde{u}_X(x'', x') = \delta_2$. (218)

Denote by $\delta_0 := \max(\delta_1, \delta_2)$. From the equivalence hierarchy ($\tilde{\text{D}}$2) and influence hierarchy ($\tilde{\text{D}}$3) properties, it follows that $[x]_{\delta_0} = [x'']_{\delta_0}$ or $([x]_{\delta_0}, [x'']_{\delta_0}) \in E_X(\delta_0)$, and $[x'']_{\delta_0} = [x']_{\delta_0}$ or $([x'']_{\delta_0}, [x']_{\delta_0}) \in E_X(\delta_0)$. Furthermore, from transitivity (QP2) of the quasi-partition $\tilde{D}_X(\delta_0)$, it follows that $[x]_{\delta_0} = [x']_{\delta_0}$ or $([x]_{\delta_0}, [x']_{\delta_0}) \in E_X(\delta_0)$. Using the definition in (127) for $x$, $x'$ we conclude that

$\tilde{u}_X(x, x') \le \delta_0$. (219)

By definition $\delta_0 := \max(\delta_1, \delta_2)$, hence we substitute this expression in (219) and compare with (218) to obtain

$\tilde{u}_X(x, x') \le \max(\delta_1, \delta_2) = \max\big(\tilde{u}_X(x, x''), \tilde{u}_X(x'', x')\big)$. (220)

Consequently, $\tilde{u}_X$ satisfies the strong triangle inequality and is therefore a quasi-ultrametric, proving that the map $\Psi$ is well-defined.

For the converse result, we need to show that $\Upsilon$ is a well-defined map. Given a quasi-ultrametric $\tilde{u}_X$ on a node set $X$ and a resolution $\delta \ge 0$, we first define the relation

$x \rightsquigarrow_{\tilde{u}_X(\delta)} x' \iff \tilde{u}_X(x, x') \le \delta$, (221)

for all $x, x' \in X$. Notice that $\rightsquigarrow_{\tilde{u}_X(\delta)}$ is a quasi-equivalence relation as defined in Definition 2 for all $\delta \ge 0$. The reflexivity property is implied by the identity property of the quasi-ultrametric $\tilde{u}_X$ and transitivity is implied by the fact that $\tilde{u}_X$ satisfies the strong triangle inequality. Furthermore, definitions (128) and (129) are just reformulations of (122) and (123), respectively, for the special case of the quasi-equivalence defined in (221). Hence, Proposition 9 guarantees that $\Upsilon(X, \tilde{u}_X) = \tilde{D}_X(\delta) = (D_X(\delta), E_X(\delta))$ is a quasi-partition for every resolution $\delta \ge 0$.
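The construction of the quasi-partition at a given resolution can be illustrated with a short sketch. It assumes, as the conditions quoted from (128) and (129) in this proof indicate, that two nodes share a block when $\max(\tilde{u}_X(x, x'), \tilde{u}_X(x', x)) \le \delta$ and that an edge goes from one block to a different block when the minimum of $\tilde{u}_X$ over their members is at most $\delta$. The function name `quasi_partition`, the nested-list representation of $\tilde{u}_X$, and the example values are assumptions made for illustration only.

```python
def quasi_partition(u, delta):
    """Blocks and edges induced by a quasi-ultrametric u at resolution delta.

    u is assumed to be a square nested list satisfying the strong
    triangle inequality, so that mutual closeness is an equivalence.
    """
    n = len(u)
    label = [None] * n
    blocks = []
    for i in range(n):
        if label[i] is not None:
            continue
        # Nodes mutually within delta of node i form its block, cf. (128).
        members = [j for j in range(n) if max(u[i][j], u[j][i]) <= delta]
        for j in members:
            label[j] = len(blocks)
        blocks.append(members)
    # Directed edge between distinct blocks, cf. (129).
    edges = {
        (a, b)
        for a in range(len(blocks))
        for b in range(len(blocks))
        if a != b and min(u[i][j] for i in blocks[a] for j in blocks[b]) <= delta
    }
    return blocks, edges


# Hypothetical three-node quasi-ultrametric, used only as an example.
u_example = [[0, 1, 3],
             [2, 0, 3],
             [3, 3, 0]]
partitions = {d: quasi_partition(u_example, d) for d in (0, 1, 2, 3)}
```

In this example an edge from node 0 to node 1 appears at resolution 1, the two nodes merge at resolution 2, and all nodes form a single block at resolution 3, which is the nesting behavior that the remainder of the proof establishes in general.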
In order to show that $\Upsilon$ is well-defined, we need to show that these quasi-partitions are nested, i.e., that $\tilde{D}_X$ satisfies ($\tilde{\text{D}}$1)-($\tilde{\text{D}}$4). The first boundary condition in ($\tilde{\text{D}}$1) is implied by (128) and the identity property of $\tilde{u}_X$. The second boundary condition in ($\tilde{\text{D}}$1) is implied by the fact that $\tilde{u}_X$ takes finite real values on a finite domain since the node set $X$ is finite. Hence, any $\delta_0$ satisfying

$\delta_0 \ge \max_{x, x' \in X} \tilde{u}_X(x, x')$, (222)

is a valid candidate to show fulfillment of ($\tilde{\text{D}}$1).

To see that $\tilde{D}_X$ satisfies ($\tilde{\text{D}}$2), assume that for a resolution $\delta_1$ we have two nodes $x, x' \in X$ such that $x \sim_{\tilde{u}_X(\delta_1)} x'$ as in (128). It then follows that $\max\big(\tilde{u}_X(x, x'), \tilde{u}_X(x', x)\big) \le \delta_1$. Thus, if we pick any $\delta_2 > \delta_1$ it is immediate that $\max\big(\tilde{u}_X(x, x'), \tilde{u}_X(x', x)\big) \le \delta_2$, which by (128) implies that $x \sim_{\tilde{u}_X(\delta_2)} x'$.

Fulfillment of ($\tilde{\text{D}}$3) can be shown in a similar way as fulfillment of ($\tilde{\text{D}}$2). Given a scalar $\delta_1 \ge 0$ and $x, x' \in X$ such that $([x]_{\delta_1}, [x']_{\delta_1}) \in E_X(\delta_1)$, then by (129) we have that

$\min_{x_1 \in [x]_{\delta_1},\, x_2 \in [x']_{\delta_1}} \tilde{u}_X(x_1, x_2) \le \delta_1$. (223)

From property ($\tilde{\text{D}}$2), we know that for all $x \in X$, $[x]_{\delta_1} \subset [x]_{\delta_2}$ for all $\delta_2 > \delta_1$. Hence, two things might happen. Either $\max\big(\tilde{u}_X(x, x'), \tilde{u}_X(x', x)\big) \le \delta_2$, in which case $[x]_{\delta_2} = [x']_{\delta_2}$, or it might be that $[x]_{\delta_2} \ne [x']_{\delta_2}$ but

$\min_{x_1 \in [x]_{\delta_2},\, x_2 \in [x']_{\delta_2}} \tilde{u}_X(x_1, x_2) \le \delta_1 < \delta_2$, (224)

which implies that $([x]_{\delta_2}, [x']_{\delta_2}) \in E_X(\delta_2)$, satisfying ($\tilde{\text{D}}$3).

Finally, to see that $\tilde{D}_X$ satisfies the right continuity condition ($\tilde{\text{D}}$4), for each $\delta \ge 0$ such that $\tilde{D}_X(\delta) \ne (\{X\}, \emptyset)$ we may define $\epsilon(\delta)$ as any positive scalar satisfying

$0 < \epsilon(\delta) < \min_{x, x' \in X \text{ s.t. } \tilde{u}_X(x, x') > \delta} \tilde{u}_X(x, x') - \delta$, (225)

where the finiteness of $X$ ensures that $\epsilon(\delta)$ is well-defined.
Hence, (128) and (129) guarantee that $\tilde{D}_X(\delta) = \tilde{D}_X(\delta')$ for $\delta' \in [\delta, \delta + \epsilon(\delta)]$. For all other resolutions $\delta$ such that $\tilde{D}_X(\delta) = (\{X\}, \emptyset)$, right continuity is trivially satisfied since the quasi-dendrogram remains unchanged for increasing resolutions. Consequently, $\Upsilon(X, \tilde{u}_X)$ is a valid quasi-dendrogram for every quasi-ultrametric network $(X, \tilde{u}_X)$, proving that $\Upsilon$ is well-defined.

In order to conclude the proof, we need to show that $\Psi \circ \Upsilon$ and $\Upsilon \circ \Psi$ are the identities on $\tilde{\mathcal{U}}$ and $\tilde{\mathcal{D}}$, respectively. To see why the former is true, pick any quasi-ultrametric network $(X, \tilde{u}_X)$ and consider an arbitrary pair of nodes $x, x' \in X$ such that $\tilde{u}_X(x, x') = \delta_0$. Also, consider the quasi-ultrametric network $\Psi \circ \Upsilon(X, \tilde{u}_X) := (X, \tilde{u}^*_X)$. From (128) and (129), in the quasi-dendrogram $\Upsilon(X, \tilde{u}_X)$, $x$ and $x'$ belong to different classes for resolutions $\delta < \delta_0$ and there is no edge from $[x]_\delta$ to $[x']_\delta$. Moreover, at resolution $\delta = \delta_0$ either an edge appears from $[x]_{\delta_0}$ to $[x']_{\delta_0}$, or both nodes merge into one single cluster. In any case, when we apply $\Psi$ to the resulting quasi-dendrogram, we obtain $\tilde{u}^*_X(x, x') = \delta_0$. Since $x, x' \in X$ were chosen arbitrarily, we have that $\tilde{u}_X = \tilde{u}^*_X$, showing that $\Psi \circ \Upsilon$ is the identity on $\tilde{\mathcal{U}}$. A similar argument shows that $\Upsilon \circ \Psi$ is the identity on $\tilde{\mathcal{D}}$.

APPENDIX E
PROOFS IN SECTION X

Proof of Theorem 8: Suppose there exists a clustering method $\mathcal{H}$ that satisfies axioms (A1'') and (A2) but does not satisfy property (P1'). This means that there exists a network $N = (X, A_X)$ with output ultrametric $(X, u_X) = \mathcal{H}(N)$ for which

$u_X(x_1, x_2) < \mathrm{sep}(X, A_X)$, (226)

for at least one pair of nodes $x_1 \ne x_2 \in X$.
Focus on a symmetric two-node network $\vec{\Delta}_2(s, s) = (\{p, q\}, A_{p,q})$ with $A_{p,q}(p, q) = A_{p,q}(q, p) = s = \mathrm{sep}(X, A_X)$ and denote by $(\{p, q\}, u_{p,q}) = \mathcal{H}(\vec{\Delta}_2(s, s))$ the output of applying method $\mathcal{H}$ to the two-node network $\vec{\Delta}_2(s, s)$. From axiom (A1''), we must have that

$u_{p,q}(p, q) = \min\big(\mathrm{sep}(X, A_X), \mathrm{sep}(X, A_X)\big) = \mathrm{sep}(X, A_X)$. (227)

Construct the map $\phi : X \to \{p, q\}$ from the network $N$ to $\vec{\Delta}_2(s, s)$ that takes node $x_1$ to $\phi(x_1) = p$ and every other node $x \ne x_1$ to $\phi(x) = q$. No dissimilarity can be increased when applying $\phi$ since every dissimilarity is mapped either to zero or to $\mathrm{sep}(X, A_X)$, which is by definition the minimum dissimilarity in the original network (10). Hence, $\phi$ is a dissimilarity reducing map and from Axiom (A2) it follows that

$u_X(x_1, x_2) \ge u_{p,q}(\phi(x_1), \phi(x_2)) = u_{p,q}(p, q)$. (228)

By substituting (227) in (228) we contradict (226), proving that such a method $\mathcal{H}$ cannot exist.

Proof of Proposition 12: To show fulfillment of (A1''), consider the two-node network $\vec{\Delta}_2(\alpha, \beta)$ and denote by $(\{p, q\}, u^U_{p,q}) = \mathcal{H}^U(\vec{\Delta}_2(\alpha, \beta))$ the output of applying unilateral clustering to $\vec{\Delta}_2(\alpha, \beta)$. Since every chain connecting $p$ and $q$ must contain these two nodes as consecutive nodes, applying the definition in (147) yields

$u^U_{p,q}(p, q) = \min\big(A_{p,q}(p, q), A_{p,q}(q, p)\big) = \min(\alpha, \beta)$, (229)

and axiom (A1'') is thereby satisfied. Fulfillment of axiom (A2) is shown by a proof analogous to the one developed in Proposition 1. The proof only differs in the appearance of minimization operations instead of maximizations to account for the difference in the definitions of unilateral and reciprocal ultrametrics [cf. (147) and (61)].

Proof of Theorem 9: Given an arbitrary network $(X, A_X)$, denote by $\mathcal{H}$ a clustering method that fulfills axioms (A1'') and (A2) and define $\mathcal{H}(X, A_X) = (X, u_X)$.
Then, the output ultrametric $u_X$ must satisfy the inequality

$u^U_X(x, x') \le u_X(x, x') \le u^U_X(x, x')$, (230)

for every pair of nodes $x, x' \in X$.

Proof of leftmost inequality in (230): Consider the unilateral clustering equivalence relation $\sim_{U_X(\delta)}$ at resolution $\delta$ according to which $x \sim_{U_X(\delta)} x'$ if and only if $x$ and $x'$ belong to the same unilateral cluster at resolution $\delta$. That is,

$x \sim_{U_X(\delta)} x' \iff u^U_X(x, x') \le \delta$. (231)

Further, as in the proof of Theorem 4, consider the space $Z$ of equivalence classes at resolution $\delta$. That is, $Z := X \bmod \sim_{U_X(\delta)}$. Also, consider the map $\phi_\delta : X \to Z$ that maps each point of $X$ to its equivalence class. Notice that $x$ and $x'$ are mapped to the same point $z$ if and only if they belong to the same block at resolution $\delta$; consequently,

$\phi_\delta(x) = \phi_\delta(x') \iff u^U_X(x, x') \le \delta$. (232)

We define the network $N_Z = (Z, A_Z)$ by endowing $Z$ with the dissimilarity matrix $A_Z$ derived from $A_X$ in the following way:

$A_Z(z, z') = \min_{x \in \phi_\delta^{-1}(z),\, x' \in \phi_\delta^{-1}(z')} A_X(x, x')$. (233)

For further details on this construction, review the corresponding proof in Theorem 4 and see Fig. 11. Nonetheless, we stress the fact that the map $\phi_\delta$ is dissimilarity reducing for all $\delta$, i.e.,

$A_X(x, x') \ge A_Z(\phi_\delta(x), \phi_\delta(x'))$. (234)

Claim 3 The separation, as defined in (10), of the equivalence class network $N_Z$ satisfies

$\mathrm{sep}(N_Z) > \delta$. (235)

Proof: First, observe that by definition of unilateral clustering (147), we know that

$u^U_X(x, x') \le \min\big(A_X(x, x'), A_X(x', x)\big)$, (236)

since a two-node chain between nodes $x$ and $x'$ is a particular chain joining the two nodes whereas the ultrametric is calculated as the minimum over all chains. Now, assume that $\mathrm{sep}(N_Z) \le \delta$. Therefore, by (233) there exists a pair of nodes $x$ and $x'$ that belong to different equivalence classes and have $A_X(x, x') \le \delta$.
(237) However, if $x$ and $x'$ belong to different equivalence classes, they cannot be clustered at resolution $\delta$; hence,

$u^U_X(x, x') > \delta$. (238)

Inequalities (237) and (238) cannot hold simultaneously since they contradict (236). Thus, it must be the case that $\mathrm{sep}(N_Z) > \delta$.

Denote by $\mathcal{H}(Z, A_Z) = (Z, u_Z)$ the outcome of the clustering method $\mathcal{H}$ applied to the equivalence class network $N_Z$. Since $\mathrm{sep}(N_Z) > \delta$, it follows from property (P1') that for all $z, z'$ such that $z \ne z'$,

$u_Z(z, z') > \delta$. (239)

Further, recalling that $\phi_\delta$ is a dissimilarity reducing map (234), from Axiom (A2) we must have $u_X(x, x') \ge u_Z(\phi_\delta(x), \phi_\delta(x')) = u_Z(z, z')$ for some $z, z' \in Z$. This fact, combined with (239), entails that when $\phi_\delta(x) \ne \phi_\delta(x')$, i.e., when $x$ and $x'$ belong to different equivalence classes,

$u_X(x, x') \ge u_Z(\phi_\delta(x), \phi_\delta(x')) > \delta$. (240)

Notice now that according to (232), $x$ and $x'$ belonging to different equivalence classes is equivalent to $u^U_X(x, x') > \delta$. Hence, we can state that $u^U_X(x, x') > \delta$ implies $u_X(x, x') > \delta$ for any arbitrary $\delta > 0$. In set notation,

$\{(x, x') : u^U_X(x, x') > \delta\} \subseteq \{(x, x') : u_X(x, x') > \delta\}$. (241)

Since (241) is true for arbitrary $\delta > 0$, this implies that $u^U_X(x, x') \le u_X(x, x')$, proving the left inequality in (230).

Proof of rightmost inequality in (230): Consider two nodes $x$ and $x'$ with unilateral ultrametric value $u^U_X(x, x') = \delta$. Let $C^*(x, x') = [x = x_0, \ldots, x_l = x']$ be a minimizing chain in the definition (147) so that we can write

$\delta = u^U_X(x, x') = \max_{i \mid x_i \in C^*(x, x')} \min\big(A_X(x_i, x_{i+1}), A_X(x_{i+1}, x_i)\big)$. (242)

Consider the two-node network $\vec{\Delta}_2(\delta, M) = (\{p, q\}, A_{p,q})$ with $A_{p,q}(p, q) = \delta$ and $A_{p,q}(q, p) = M := \max_{x, x'} A_X(x, x')$.
Denote by $(\{p, q\}, u_{p,q}) = \mathcal{H}(\{p, q\}, A_{p,q})$ the output of the clustering method $\mathcal{H}$ applied to network $\vec{\Delta}_2(\delta, M)$. Notice that according to Axiom (A1'') we have

$u_{p,q}(p, q) = u_{p,q}(q, p) = \min(\delta, M) = \delta$, (243)

where the last equality is enforced by the definition of $M$. Focus now on each link of the minimizing chain in (242). For every successive pair of nodes $x_i$ and $x_{i+1}$, we must have

$\max\big(A_X(x_i, x_{i+1}), A_X(x_{i+1}, x_i)\big) \le M$, (244)
$\min\big(A_X(x_i, x_{i+1}), A_X(x_{i+1}, x_i)\big) \le \delta$. (245)

Expression (244) is true since $M$ is defined as the maximum dissimilarity in $A_X$. Inequality (245) is justified by (242), since $\delta$ is defined as the maximum among links of the minimum distance in both directions of the link. This observation allows the construction of dissimilarity reducing maps $\phi_i : \{p, q\} \to X$,

$\phi_i := \begin{cases} \phi_i(p) = x_i,\ \phi_i(q) = x_{i+1}, & \text{if } \hat{A}_X(x_i, x_{i+1}) = A_X(x_i, x_{i+1}), \\ \phi_i(q) = x_i,\ \phi_i(p) = x_{i+1}, & \text{otherwise.} \end{cases}$ (246)

In this way, we can map $p$ and $q$ to subsequent nodes in the chain $C^*(x, x')$ used in (242). Inequalities (244) and (245) combined with the map definition in (246) guarantee that $\phi_i$ is a dissimilarity reducing map for every $i$. Since clustering method $\mathcal{H}$ satisfies Axiom (A2), it follows that

$u_X(\phi_i(p), \phi_i(q)) \le u_{p,q}(p, q) = \delta$, for all $i$, (247)

where we used (243) for the last equality. Substituting $\phi_i(p)$ and $\phi_i(q)$ in (247) by the corresponding nodes given by the definition (246), we can write

$u_X(x_i, x_{i+1}) = u_X(x_{i+1}, x_i) \le \delta$, for all $i$, (248)

where the symmetry property of ultrametrics was used. To complete the proof we invoke the strong triangle inequality (12) and apply it to $C^*(x, x') = [x = x_0, \ldots, x_l = x']$, the minimizing chain in (242). As a consequence,

$u_X(x, x') \le \max_i u_X(x_i, x_{i+1}) \le \delta$, (249)

where (248) was used in the second inequality.
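The unilateral ultrametric that appears in (242) can be computed directly for small networks. The sketch below is one possible way to do so and is not the algorithm developed in the paper: it takes the minimum of the two directed dissimilarities on every link and then solves the resulting minimax-path problem with a Floyd-Warshall style update. The function name `unilateral_ultrametric`, the nested-list representation of $A_X$, and the example values are assumptions.

```python
def unilateral_ultrametric(A):
    """Minimax chain cost over links valued min(A[i][j], A[j][i]), cf. (242).

    A is assumed to be a square nested list with zero diagonal.
    """
    n = len(A)
    # Cost of a single link is the smaller of the two directed dissimilarities.
    u = [[min(A[i][j], A[j][i]) for j in range(n)] for i in range(n)]
    # Floyd-Warshall style relaxation: a chain through k costs the larger
    # of its two halves, and we keep the cheapest chain found so far.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


# Hypothetical two-node network with dissimilarities alpha = 4, beta = 7.
two_node = unilateral_ultrametric([[0, 4], [7, 0]])

# Hypothetical three-node network, used only as an example.
three_node = unilateral_ultrametric([[0, 1, 9], [5, 0, 2], [9, 7, 0]])
```

For the two-node example the output value between the nodes is $\min(4, 7) = 4$, in agreement with (229). The cubic running time of this sketch is adequate for illustration only.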
The proof of the right inequality in (230) is completed by substituting $\delta = u^U_X(x, x')$ [cf. (242)] into (249).

Having proved both inequalities in (230), the conclusion that the unilateral clustering method is the only one that satisfies axioms (A1'') and (A2) is immediate, completing the global proof.

Proof of Theorem 10: The leftmost inequality in (151) can be proved using the same method of proof used for the leftmost inequality in (230) within the proof of Theorem 9. The proof of the rightmost inequality in (151) is equivalent to the proof of the rightmost inequality in Theorem 4.

APPENDIX F
PROOFS IN SECTION XI

Proof of Theorem 11: Proof of nonnegativity and symmetry statements: That the distance $d_N(N_X, N_Y)$ is nonnegative follows from the absolute value in the definition of (156). The symmetry $d_N(N_X, N_Y) = d_N(N_Y, N_X)$ follows because a correspondence $R \subseteq X \times Y$ with elements $r_i = (x_i, y_i)$ results in the same associations as the correspondence $S \subseteq Y \times X$ with elements $s_i = (y_i, x_i)$. This proves the first two statements.

Proof of identity statement: In order to show the identity statement, assume that $N_X$ and $N_Y$ are isomorphic and let $\phi : X \to Y$ be a bijection realizing this isomorphism. Then, consider the particular correspondence $R_\phi = \{(x, \phi(x)),\ x \in X\}$. By construction, for all $x' \in X$ there is an element $r = (x', y) \in R_\phi$ and, since $\phi$ is surjective (indeed, bijective), for all $y' \in Y$ there is an element $s = (x, y') \in R_\phi$. Thus, $R_\phi$ is a valid correspondence between $X$ and $Y$, which, by (152), satisfies

$A_Y(y, y') = A_Y(\phi(x), \phi(x')) = A_X(x, x')$ (250)

for all $(x, y), (x', y') \in R_\phi$.
Since $R_\phi$ is a particular correspondence while in definition (156) we minimize over all possible correspondences, it must be that

$d_N(N_X, N_Y) \le \frac{1}{2} \max_{(x, y), (x', y') \in R_\phi} |A_X(x, x') - A_Y(y, y')| = 0$, (251)

where the equality follows because $A_X(x, x') - A_Y(y, y') = 0$ for all $(x, y), (x', y') \in R_\phi$ by (250). Since we already argued that $d_N(N_X, N_Y) \ge 0$, it must be that $d_N(N_X, N_Y) = 0$ when the networks $N_X \cong N_Y$ are isomorphic.

We now argue that the converse is also true, i.e., if the distance is $d_N(N_X, N_Y) = 0$ then $N_X$ and $N_Y$ are isomorphic. If $d_N(N_X, N_Y) = 0$ there is a correspondence $R_0$ such that $A_X(x, x') = A_Y(y, y')$ for all $(x, y), (x', y') \in R_0$. Define then the function $\phi : X \to Y$ that associates to $x$ any value $y$ among those that form a pair with $x$ in the correspondence $R_0$,

$\phi(x) = y_0 \in \{y \mid (x, y) \in R_0\}$. (252)

Since $R_0$ is a correspondence, the set $\{y \mid (x, y) \in R_0\}$ is nonempty, implying that (252) is defined for all $x \in X$. Moreover, since we know that $(x, \phi(x)) \in R_0$ we must have $A_X(x, x') = A_Y(\phi(x), \phi(x'))$ for all $x, x'$. From this observation it follows that the function $\phi$ must be injective. If it were not, there would be a pair of points $x \ne x'$ for which $\phi(x) = \phi(x')$. For this pair of points we can then write

$A_X(x, x') = A_Y(\phi(x), \phi(x')) = 0$, (253)

where the first equality follows from the definition of $\phi$ and the second equality from the fact that $\phi(x) = \phi(x')$ and that dissimilarity functions are such that $A_Y(y, y) = 0$. However, (253) is inconsistent with $x \ne x'$ because the dissimilarity function is $A_X(x, x') = 0$ if and only if $x = x'$. Then, $\phi(x) = \phi(x')$ if and only if $x = x'$, implying that $\phi$ is an injection.
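For very small networks, the distance in definition (156), in the form used in (251), can be evaluated by exhaustive search over correspondences, which makes the identity statement easy to probe on examples. The sketch below is an illustration only: the function name `network_distance` and the nested-list representation of the dissimilarities are assumptions, and the enumeration is exponential in $|X|\,|Y|$, so it is not a practical algorithm.

```python
from itertools import product


def network_distance(AX, AY):
    """Brute-force d_N: half the minimum, over all correspondences R,
    of the maximum of |AX[x][x2] - AY[y][y2]| over pairs in R."""
    nx, ny = len(AX), len(AY)
    pairs = list(product(range(nx), range(ny)))
    best = float("inf")
    # Enumerate every nonempty subset R of X x Y.
    for mask in range(1, 1 << len(pairs)):
        R = [p for b, p in enumerate(pairs) if (mask >> b) & 1]
        # A correspondence must cover every node of X and every node of Y.
        if {x for x, _ in R} != set(range(nx)):
            continue
        if {y for _, y in R} != set(range(ny)):
            continue
        distortion = max(
            abs(AX[x][x2] - AY[y][y2]) for (x, y) in R for (x2, y2) in R
        )
        best = min(best, 0.5 * distortion)
    return best


# Hypothetical isomorphic pair: the second network relabels the nodes.
d_iso = network_distance([[0, 1], [2, 0]], [[0, 2], [1, 0]])
# Hypothetical non-isomorphic pair.
d_diff = network_distance([[0, 1], [1, 0]], [[0, 3], [3, 0]])
```

Here `d_iso` is zero because the relabeling is a correspondence with no distortion, while `d_diff` is strictly positive.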
Likewise, define the function $\psi : Y \to X$ that associates to $y$ any value $x$ among those that form a pair with $y$ in the correspondence $R_0$,

$\psi(y) = x_0 \in \{x \mid (x, y) \in R_0\}$. (254)

Since $R_0$ is a correspondence, the set $\{x \mid (x, y) \in R_0\}$ is nonempty, implying that (254) is defined for all $y \in Y$. Since we know that $(\psi(y), y) \in R_0$, we must have $A_X(\psi(y), \psi(y')) = A_Y(y, y')$ for all $y, y'$, from where it follows that the function $\psi$ must be injective.

We have then constructed injections $\phi : X \to Y$ and $\psi : Y \to X$. The Cantor-Bernstein-Schroeder theorem [49, Chapter 2.6] applies and guarantees that there exists a bijection between $X$ and $Y$. This forces $X$ and $Y$ to have the same cardinality and, as a consequence, it forces $\phi$ and $\psi$ to be bijections. Pick the bijection $\phi$ and recall that since $(x, \phi(x)) \in R_0$ we must have $A_X(x, x') = A_Y(\phi(x), \phi(x'))$ for all $x, x'$, from where it follows that $N_X \cong N_Y$. Since we already showed that $d_N(N_X, N_Y) = 0$ when the networks $N_X \cong N_Y$ are isomorphic, the identity statement follows.

Proof of triangle inequality: To show the triangle inequality, let correspondences $R^*$ between $X$ and $Z$ and $S^*$ between $Z$ and $Y$ be the minimizing correspondences in (156) so that we can write

$d_N(N_X, N_Z) = \frac{1}{2} \max_{(x, z), (x', z') \in R^*} |A_X(x, x') - A_Z(z, z')|$,
$d_N(N_Z, N_Y) = \frac{1}{2} \max_{(z, y), (z', y') \in S^*} |A_Z(z, z') - A_Y(y, y')|$. (255)

Define now the correspondence $T$ between $X$ and $Y$ as the one induced by pairs $(x, z)$ and $(z, y)$ sharing a common point $z \in Z$,

$T := \{(x, y) \mid \exists\, z \in Z \text{ with } (x, z) \in R^*,\ (z, y) \in S^*\}$. (256)

To show that $T$ is a correspondence we have to prove that for every $x \in X$ there exists $y_0 \in Y$ such that $(x, y_0) \in T$ and that for every $y \in Y$ there exists $x_0 \in X$ such that $(x_0, y) \in T$. To see this, pick an arbitrary $x \in X$. Because $R^*$ is a correspondence there exists $z_0 \in Z$ such that $(x, z_0) \in R^*$.
Since $S^*$ is also a correspondence, there exists $y_0 \in Y$ such that $(z_0, y_0) \in S^*$. Hence, there exists $(x, y_0) \in T$ for every $x \in X$. Conversely, pick an arbitrary $y \in Y$. Since $S^*$ and $R^*$ are correspondences there exist $z_0 \in Z$ and $x_0 \in X$ such that $(z_0, y) \in S^*$ and $(x_0, z_0) \in R^*$. Thus, there exists $(x_0, y) \in T$ for every $y \in Y$. Therefore, $T$ is a correspondence.

The correspondence $T$ need not be a minimizing correspondence for the distance $d_N(N_X, N_Y)$, but since it is a valid correspondence we can write [cf. (156)]

$d_N(N_X, N_Y) \le \frac{1}{2} \max_{(x, y), (x', y') \in T} |A_X(x, x') - A_Y(y, y')|$. (257)

According to the definition of $T$ in (256), the requirement $(x, y), (x', y') \in T$ is equivalent to requiring $(x, z), (x', z') \in R^*$ and $(z, y), (z', y') \in S^*$. Further adding and subtracting $A_Z(z, z')$ from the maximand and using the triangle inequality on the absolute value yields

$d_N(N_X, N_Y) \le \frac{1}{2} \max_{\substack{(x, z), (x', z') \in R^* \\ (z, y), (z', y') \in S^*}} \Big( |A_X(x, x') - A_Z(z, z')| + |A_Z(z, z') - A_Y(y, y')| \Big)$. (258)

We can further bound (258) by maximizing each summand independently so as to write

$d_N(N_X, N_Y) \le \frac{1}{2} \max_{(x, z), (x', z') \in R^*} |A_X(x, x') - A_Z(z, z')| + \frac{1}{2} \max_{(z, y), (z', y') \in S^*} |A_Z(z, z') - A_Y(y, y')|$. (259)

Substituting the equalities in (255) for the summands on the right hand side of (259) yields the triangle inequality. Having shown the four statements in Theorem 11, the main proof concludes.

Proof of Theorem 13: In order to prove the statement for any $t \ge 2$, we first show that the difference between the costs of secondary chains is bounded, as the following claim states.

Claim 4 Given two networks $N_X = (X, A_X)$ and $N_Y = (Y, A_Y)$, let $\eta = d_N(N_X, N_Y)$ and $R$ be the associated minimizing correspondence.
Given two pairs of nodes $(x, y), (x', y') \in R$ we have

$|A^{SR(t)}_X(x, x') - A^{SR(t)}_Y(y, y')| \le 2\eta$, (260)

where $A^{SR(t)}_X$ and $A^{SR(t)}_Y$ are defined as in (97).

Proof: Let $C^*(x, x') = [x = x_0, x_1, \ldots, x_l = x']$ be a minimizing chain in the definition (97), implying that

$A^{SR(t)}_X(x, x') = \max_{i \mid x_i \in C^*(x, x')} A_X(x_i, x_{i+1})$. (261)

Construct the chain $C(y, y') = [y = y_0, y_1, \ldots, y_l = y']$ in $N_Y$ from $y$ to $y'$ such that $(x_i, y_i) \in R$ for all $i$. This chain is guaranteed to exist from the definition of correspondence. Using the definition in (97) and the inequality stated in (165), we write

$A^{SR(t)}_Y(y, y') \le \max_{i \mid y_i \in C(y, y')} A_Y(y_i, y_{i+1}) \le \max_{i \mid x_i \in C^*(x, x')} A_X(x_i, x_{i+1}) + 2\eta$. (262)

Substituting (261) in (262) we obtain

$A^{SR(t)}_Y(y, y') \le A^{SR(t)}_X(x, x') + 2\eta$. (263)

By following an analogous procedure starting with a minimizing chain in the network $N_Y$, we can show that

$A^{SR(t)}_X(x, x') \le A^{SR(t)}_Y(y, y') + 2\eta$. (264)

From (263) and (264), the desired result in (260) follows.

We use Lemma 3 to show that (260) implies

$|\bar{A}^{SR(t)}_X(x, x') - \bar{A}^{SR(t)}_Y(y, y')| \le 2\eta$, (265)

where $\bar{A}^{SR(t)}_X$ and $\bar{A}^{SR(t)}_Y$ are defined as in (99). We then compare (7) and (98) to see that

$(X, u^{SR(t)}_X) = \tilde{\mathcal{H}}^*(X, \bar{A}^{SR(t)}_X)$, (266)

and similarly for $(Y, u^{SR(t)}_Y)$. Finally, as done for the case $t = 2$, by using stability of $\tilde{\mathcal{H}}^*$ [cf. Theorem 12], the result follows.

REFERENCES
[1] G. Carlsson, F. Mémoli, A. Ribeiro, and S. Segarra. Axiomatic construction of hierarchical clustering in asymmetric networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 5219–5223, 2013.
[2] G. Carlsson, F. Mémoli, A. Ribeiro, and S. Segarra. Hierarchical clustering methods and algorithms for asymmetric networks.
In Signals, Systems and Computers, 2013 Asilomar Conference on, pages 1773–1777, Nov 2013.
[3] G. Carlsson, F. Mémoli, A. Ribeiro, and S. Segarra. Alternative axiomatic constructions for hierarchical clustering of asymmetric networks. In Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, pages 791–794, Dec 2013.
[4] G. Carlsson, F. Mémoli, A. Ribeiro, and S. Segarra. Hierarchical quasi-clustering methods for asymmetric networks. JMLR W&CP: International Conference on Machine Learning, 32(1):352–360, 2014.
[5] X. Rui and D. Wunsch-II. Survey of clustering algorithms. IEEE Trans. Neural Netw., 16(3):645–678, 2005.
[6] U. von Luxburg and S. Ben-David. Towards a statistical theory of clustering. Technical report, PASCAL workshop on clustering, London, 2005.
[7] Shai Ben-David, Ulrike von Luxburg, and Dávid Pál. A sober look at clustering stability. In Gábor Lugosi and Hans-Ulrich Simon, editors, COLT, volume 4005 of Lecture Notes in Computer Science, pages 5–19. Springer, 2006.
[8] I. Guyon, U. von Luxburg, and R. Williamson. Clustering: Science or art? Technical report, Paper presented at the NIPS 2009 Workshop Clustering: Science or Art?, 2009.
[9] M. Ackerman and S. Ben-David. Measures of clustering quality: A working set of axioms for clustering. Proceedings of Neural Information Processing Systems, 2008.
[10] R. Zadeh and S. Ben-David. A uniqueness theorem for clustering. Proceedings of Uncertainty in Artificial Intelligence, 2009.
[11] G. Carlsson and F. Mémoli. Classifying clustering schemes. Foundations of Computational Mathematics, 13(2):221–252, 2013.
[12] Jon M. Kleinberg. An impossibility theorem for clustering. In NIPS, pages 446–453, 2002.
[13] G. Carlsson and F. Mémoli. Characterization, stability and convergence of hierarchical clustering methods. Journal of Machine Learning Research, 11:1425–1470, 2010.
[14] G. Carlsson and F. Mémoli.
Multiparameter clustering methods. In Claus Weihs Hermann Locarek-Junge, editor, Classification as a Tool for Research. Proc. 11th Conference of the International Federation of Classification Societies (IFCS-09), pages 63–70, Heidelberg-Berlin, 2010. Springer-Verlag.
[15] G. N. Lance and W. T. Williams. A general theory of classificatory sorting strategies 1. Hierarchical systems. Computer Journal, 9(4):373–380, 1967.
[16] A. K. Jain and R. C. Dubes. Algorithms for clustering data. Prentice Hall Advanced Reference Series. Prentice Hall Inc., 1988.
[17] Y. Zhao and G. Karypis. Hierarchical clustering algorithms for document datasets. Data Mining and Knowledge Discovery, 10:141–168, 2005.
[18] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
[19] M. Newman and M. Girvan. Community structure in social and biological networks. Proc. Ntnl. Acad. Sci., 99(12):7821–7826, 2002.
[20] M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69, 026113, 2004.
[21] F. Chung. Spectral graph theory. American Mathematical Society, 1997.
[22] U. Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 12 2007.
[23] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In T.K. Leen, T.G. Dietterich and V. Tresp (Eds.), Advances in neural information processing systems 14, MIT Press, Cambridge, 2:849–856, 2002.
[24] F. Bach and M. Jordan. Learning spectral clustering. In S. Thrun, L. Saul and B. Schölkopf (eds.) Advances in neural information processing systems 16, MIT Press, Cambridge, pages 305–312, 2004.
[25] T. Saito and H. Yadohisa. Data analysis of asymmetric structures: advanced approaches in computational statistics. CRC Press, 2004.
[26] L. Hubert. Min and max hierarchical clustering using asymmetric similarity measures.
Psychometrika, 38(1):63–72, 1973.
[27] P.B. Slater. Hierarchical internal migration regions of France. Systems, Man and Cybernetics, IEEE Transactions on, (4):321–324, 1976.
[28] J.P. Boyd. Asymmetric clusters of internal migration regions of France. IEEE Transactions on Systems Man and Cybernetics, (2):101–104, 1980.
[29] R. E. Tarjan. An improved algorithm for hierarchical clustering using strong components. Inf. Process. Lett., 17(1):37–41, 1983.
[30] P.B. Slater. A partial hierarchical regionalization of 3140 US counties on the basis of 1965-1970 intercounty migration. Environment and Planning A, 16(4):545–550, 1984.
[31] F. Murtagh. Multidimensional clustering algorithms. Compstat Lectures, Vienna: Physika Verlag, 1985, 1, 1985.
[32] W. Pentney and M. Meila. Spectral clustering of biological sequence data. Proc. Ntnl. Conf. Artificial Intel., 2005.
[33] M. Meila and W. Pentney. Clustering by weighted cuts in directed graphs. Proceedings of the 7th SIAM International Conference on Data Mining, 2007.
[34] D. Zhou, B. Scholkopf, and T. Hofmann. Semi-supervised learning on directed graphs. Advances in Neural Information Processing Systems, 2005.
[35] E. Harzheim. Ordered sets. Springer, 2005.
[36] N. Jardine and R. Sibson. Mathematical taxonomy. John Wiley & Sons Ltd., London, 1971. Wiley Series in Probability and Mathematical Statistics.
[37] M. Gondran and M. Minoux. Graphs, dioids and semi rings: New models and algorithms. Springer, 2008.
[38] Facundo Mémoli. Metric structures on datasets: stability and classification of algorithms. In Proceedings of the 14th international conference on Computer analysis of images and patterns - Volume Part II, CAIP'11, pages 1–33, Berlin, Heidelberg, 2011. Springer-Verlag.
[39] D. Burago, Y. Burago, and S. Ivanov. A Course in Metric Geometry, volume 33 of AMS Graduate Studies in Math. American Mathematical Society, 2001.
[40] V. Vassilevska, R. Williams, and R. Yuster.
All pairs bottleneck paths and max-min matrix products in truly subcubic time. Theory of Computing, 5:173–189, 2009.
[41] R. Duan and S. Pettie. Fast algorithms for (max, min)-matrix multiplication and bottleneck shortest paths. Symposium on discrete algorithms, 2009.
[42] T. C. Hu. The maximum capacity route problem. Operations Research, 9(6):898–900, 1961.
[43] D. Müllner. Modern hierarchical, agglomerative clustering algorithms. ArXiv e-prints, September 2011.
[44] V. Gurvich and M. Vyalyi. Characterizing (quasi-) ultrametric finite spaces in terms of (directed) graphs. Discrete Applied Mathematics, 160(12):1742–1756, 2012.
[45] Facundo Mémoli and Guillermo Sapiro. A theoretical and computational framework for isometry invariant recognition of point cloud data. Found. Comput. Math., 5(3):313–347, 2005.
[46] United States Census Bureau. State-to-state migration flows. U.S. Department of Commerce, 2011.
[47] The Pew Forum on Religion & Public Life. U.S. religious landscape survey: full report. 2008. Available at http://religions.pewforum.org/reports.
[48] Bureau of Economic Analysis. Input-output accounts: the use of commodities by industries before redefinitions. U.S. Department of Commerce, 2011.
[49] A.N. Kolmogorov and S.V. Fomin. Introductory Real Analysis. Translated by R.A. Silverman, Dover Publications, New York, 1975.