A note on the triangle inequality for the Jaccard distance

Two simple proofs of the triangle inequality for the Jaccard distance in terms of nonnegative, monotone, submodular functions are given and discussed.

Authors: Sven Kosub

Sven Kosub, Department of Computer & Information Science, University of Konstanz, Box 67, D-78457 Konstanz, Germany (Sven.Kosub@uni-konstanz.de). December 9, 2016.

The Jaccard index [8] is a classical similarity measure on sets with many practical applications in information retrieval, data mining, machine learning, and other fields (cf., e.g., [7]). It measures the relative size of the overlap of two finite sets $A$ and $B$. The Jaccard index $J$ and the associated Jaccard distance $J_\delta$ are formally defined as

$$J(A,B) =_{\mathrm{def}} \frac{|A\cap B|}{|A\cup B|}, \qquad J_\delta(A,B) =_{\mathrm{def}} 1 - J(A,B) = 1 - \frac{|A\cap B|}{|A\cup B|} = \frac{|A\,\triangle\,B|}{|A\cup B|},$$

where $J(\emptyset,\emptyset) =_{\mathrm{def}} 1$. The Jaccard distance $J_\delta$ is known to fulfill all properties of a metric, most notably the triangle inequality. This fact has been observed many times, e.g., via metric transforms [12, 13, 4], embeddings in vector spaces (e.g., [15, 11, 4]), min-wise independent permutations [1], or sometimes cumbersome arithmetic [10, 3]. A very simple, elementary proof of the triangle inequality was given in [5] using an appropriate partitioning of sets. Here, we give two more simple, direct proofs of the triangle inequality. One proof uses neither set differences nor disjointness of sets. It is based only on the fundamental equation $|A\cup B| + |A\cap B| = |A| + |B|$. As such, the proof is generic and leads to (sub)modular versions of the Jaccard distance (as defined below). The second proof unfolds a subtle difference between the two possible versions.
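The following minimal Python sketch is a sanity check, not a proof. It enumerates all subsets of a small ground set and tests the triangle inequality by brute force, using exact rational arithmetic. It assumes the definitions in this note: the standard distance $J_\delta$ above and the two submodular candidates $J_{\delta,f}$ and $J^{\Delta}_{\delta,f}$ defined below, both set to $0$ when $f(A\cup B) = 0$. All function names (`powerset`, `jaccard_distance`, `jd_f`, `jd_delta_f`, `triangle_violations`, `capped_sum`) are hypothetical. The budget-capped function with budget 2 and unit weights is an arbitrary illustrative choice of a nonnegative, monotone, submodular $f$ that is not modular.

```python
from fractions import Fraction
from itertools import combinations


def powerset(ground):
    """Return all subsets of `ground` as frozensets."""
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def jaccard_distance(a, b):
    """Standard Jaccard distance |A △ B| / |A ∪ B|; 0 for A = B = ∅."""
    union = a | b
    if not union:
        return Fraction(0)
    return Fraction(len(a ^ b), len(union))


def jd_f(f, a, b):
    """Candidate J_{δ,f}(A, B) = 1 - f(A ∩ B) / f(A ∪ B); 0 if f(A ∪ B) = 0."""
    denom = f(a | b)
    if denom == 0:
        return Fraction(0)
    return 1 - Fraction(f(a & b)) / denom


def jd_delta_f(f, a, b):
    """Candidate J^Δ_{δ,f}(A, B) = (f(A △ B) - f(∅)) / f(A ∪ B); 0 if f(A ∪ B) = 0."""
    denom = f(a | b)
    if denom == 0:
        return Fraction(0)
    return Fraction(f(a ^ b) - f(frozenset())) / denom


def triangle_violations(dist, sets):
    """Return every triple (A, B, C) with dist(A, B) > dist(A, C) + dist(C, B)."""
    return [(a, b, c) for a in sets for b in sets for c in sets
            if dist(a, b) > dist(a, c) + dist(c, b)]


def capped_sum(budget, weights):
    """Hypothetical budget-capped linear cost f(A) = min{budget, sum of weights over A}."""
    return lambda s: min(budget, sum(weights[i] for i in s))


if __name__ == "__main__":
    subsets = powerset(range(4))  # all 16 subsets of X = {0, 1, 2, 3}

    # Fundamental equation |A ∪ B| + |A ∩ B| = |A| + |B| (cardinality is modular).
    assert all(len(a | b) + len(a & b) == len(a) + len(b)
               for a in subsets for b in subsets)

    # With f(A) = |A|, both candidates coincide with the standard distance.
    assert all(jd_f(len, a, b) == jd_delta_f(len, a, b) == jaccard_distance(a, b)
               for a in subsets for b in subsets)

    print(len(triangle_violations(jaccard_distance, subsets)))  # 0

    # Nonnegative, monotone, submodular, but not modular.
    f = capped_sum(2, {0: 1, 1: 1, 2: 1, 3: 1})
    print(len(triangle_violations(lambda a, b: jd_f(f, a, b), subsets)) > 0)  # True
    print(len(triangle_violations(lambda a, b: jd_delta_f(f, a, b), subsets)))  # 0
```

On this 4-element ground set, the check reports no violations for $J_\delta$. For the budget-capped function, it finds violations for $J_{\delta,f}$, for example $A = \{0,1\}$, $B = \{2,3\}$, $C = A\cup B$, as in the third remark after Theorem 3. It finds none for $J^{\Delta}_{\delta,f}$, as Theorem 4 predicts. Exhaustive enumeration visits $8^{|X|}$ triples, so it is only practical for tiny ground sets.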
The original motivation was to give a proof of the triangle inequality that is as simple as possible. Still, the link with submodular functions is interesting in itself (as also recently suggested in [6]).

Let $X$ be a finite, non-empty ground set. A set function $f : \mathcal{P}(X) \to \mathbb{R}$ is said to be submodular on $X$ if $f(A\cup B) + f(A\cap B) \le f(A) + f(B)$ for all $A, B \subseteq X$. If all inequalities are equations, then $f$ is called modular on $X$. It is known that $f$ is submodular on $X$ if and only if the following condition holds (cf., e.g., [14]):

$$f(A\cup\{x\}) - f(A) \ \ge\ f(B\cup\{x\}) - f(B) \quad \text{for all } A \subseteq B \subseteq X,\ x \in X\setminus B. \tag{1}$$

A set function $f$ is monotone if $f(A) \le f(B)$ for all $A \subseteq B \subseteq X$; $f$ is nonnegative if $f(A) \ge 0$ for all $A \subseteq X$. Each nonnegative, monotone, modular function $f$ on $X$ can be written as $f(A) = \gamma + \sum_{i\in A} c_i$, where $\gamma, c_i \ge 0$ for all $i \in X$ (cf., e.g., [14]). Examples are set cardinality or degree sum in graphs. Standard examples of nonnegative, monotone, submodular set functions are matroid rank, network flow to a sink, entropy of sets of random variables, and neighborhood size in bipartite graphs.

Let $f$ be a nonnegative, monotone, submodular set function on $X$. For sets $A, B \subseteq X$, we define two candidates for submodular Jaccard distances, $J_{\delta,f}$ and $J^{\Delta}_{\delta,f}$, as follows:

$$J_{\delta,f}(A,B) =_{\mathrm{def}} 1 - \frac{f(A\cap B)}{f(A\cup B)}, \qquad J^{\Delta}_{\delta,f}(A,B) =_{\mathrm{def}} \frac{f(A\,\triangle\,B) - f(\emptyset)}{f(A\cup B)},$$

where $J_{\delta,f}(A,B) = J^{\Delta}_{\delta,f}(A,B) =_{\mathrm{def}} 0$ if $f(A\cup B) = 0$. It is clear that $0 \le J_{\delta,f}(A,B) \le J^{\Delta}_{\delta,f}(A,B)$. The first inequality follows from monotonicity. The second follows from submodularity applied to $A\,\triangle\,B$ and $A\cap B$, whose union is $A\cup B$ and whose intersection is empty. If $f$ is modular, then $J_{\delta,f} = J^{\Delta}_{\delta,f}$. In particular, for $f(A) = |A|$ (i.e., the cardinality of the set $A \subseteq X$), we obtain the standard Jaccard distance $J_\delta = J_{\delta,f} = J^{\Delta}_{\delta,f}$.

First, we give a simple proof of the triangle inequality for $J_{\delta,f}$. Interestingly, this is only possible for modular set functions (see the third remark after Theorem 3).

Lemma 1.
Let $f$ be a nonnegative, monotone, submodular set function on $X$. Then, for all sets $A, B, C \subseteq X$, it holds that

$$f(A\cap C)\cdot f(B\cup C) + f(A\cup C)\cdot f(B\cap C) \ \le\ f(C)\cdot\big(f(A) + f(B)\big).$$

Proof. We easily obtain

$$\begin{aligned}
f(A\cap C)\cdot f(B\cup C) &\le f(A\cap C)\cdot\big(f(B) + f(C) - f(B\cap C)\big) && \text{(submodularity of } f\text{)}\\
&\le f(C)\cdot\big(f(B) - f(B\cap C) + f(A\cap C)\big) && \text{(monotonicity of } f\text{)}
\end{aligned}$$

The second step holds because its right-hand side minus its left-hand side equals $\big(f(C) - f(A\cap C)\big)\cdot\big(f(B) - f(B\cap C)\big) \ge 0$. By swapping $A$ and $B$,

$$f(A\cup C)\cdot f(B\cap C) \le f(C)\cdot\big(f(A) - f(A\cap C) + f(B\cap C)\big).$$

Overall,

$$\begin{aligned}
f(A\cap C)\cdot f(B\cup C) + f(A\cup C)\cdot f(B\cap C) &\le f(C)\cdot\big(f(B) - f(B\cap C) + f(A\cap C) + f(A) - f(A\cap C) + f(B\cap C)\big)\\
&= f(C)\cdot\big(f(B) + f(A)\big).
\end{aligned}$$

This shows the lemma.

Corollary 2. Let $f$ be a nonnegative, monotone, submodular set function on $X$. Then, for all sets $S, T \subseteq X$, it holds that $f(S\cap T)\cdot f(S\cup T) \le f(S)\cdot f(T)$.

Proof. Apply Lemma 1 to the sets $A =_{\mathrm{def}} S$, $B =_{\mathrm{def}} S$, and $C =_{\mathrm{def}} T$. This gives $2\,f(S\cap T)\cdot f(S\cup T) \le 2\,f(T)\cdot f(S)$.

Theorem 3. Let $f$ be a nonnegative, monotone, modular set function on $X$. Then, for all sets $A, B, C \subseteq X$, it holds that $J_{\delta,f}(A,B) \le J_{\delta,f}(A,C) + J_{\delta,f}(C,B)$.

Proof. Say that a set $A$ is a null set iff $f(A) = 0$. Observe that if at least one of the sets is a null set, then the inequality is satisfied. So it is enough to show the equivalent inequality

$$\frac{f(A\cap C)}{f(A\cup C)} + \frac{f(B\cap C)}{f(B\cup C)} \ \le\ 1 + \frac{f(A\cap B)}{f(A\cup B)} = \frac{f(A) + f(B)}{f(A\cup B)} \tag{2}$$

for arbitrary non-null sets $A, B, C \subseteq X$. The last equality in (2) uses the modularity of $f$.
This is seen as follows:

$$\begin{aligned}
\frac{f(A\cap C)}{f(A\cup C)} + \frac{f(B\cap C)}{f(B\cup C)}
&= \frac{f(A\cap C)\cdot f(B\cup C) + f(A\cup C)\cdot f(B\cap C)}{f(A\cup C)\cdot f(B\cup C)}\\
&\le \frac{f(C)\cdot\big(f(A) + f(B)\big)}{f(A\cup C)\cdot f(B\cup C)} && \text{(by Lemma 1)}\\
&\le \frac{f(C)\cdot\big(f(A) + f(B)\big)}{f\big((A\cup C)\cap(B\cup C)\big)\cdot f(A\cup B\cup C)} && \text{(by Corollary 2)}\\
&\le \frac{f(C)}{f\big((A\cap B)\cup C\big)}\cdot\frac{f(A) + f(B)}{f(A\cup B)} && \text{(monotonicity of } f\text{)}\\
&\le \frac{f(A) + f(B)}{f(A\cup B)} && \text{(monotonicity of } f\text{)}
\end{aligned}$$

The third step also uses $(A\cup C)\cap(B\cup C) = (A\cap B)\cup C$. This proves the theorem.

Remarks: We comment on the proof of the triangle inequality for $J_{\delta,f}$:

1. It follows from Theorem 3 that the triangle inequality is valid for the following distances:
   - the standard Jaccard distance $J_\delta$;
   - the generalized Jaccard distance, given for vectors $x, y \in \mathbb{R}^n$ with nonnegative entries by $$1 - \frac{\sum_{i=1}^n \min\{x_i, y_i\}}{\sum_{i=1}^n \max\{x_i, y_i\}}$$ (with the subcase that $x_i = \mu_A(z)$ and $y_i = \mu_B(z)$ denote multiplicities of (occurrences of) $z$ in multisets $A$ and $B$; cf. [9]);
   - the Steinhaus distance [12, 4] (i.e., for any set measures, including probability measures).

   All these results can be proven equally easily by the arguments in [5]. However, these arguments fail for modular functions satisfying $f(\emptyset) > 0$.
2. Theorem 3 is true for nonnegative, monotone, modular functions defined over distributive lattices. Lemma 1 and Corollary 2 also hold for nonnegative, monotone, submodular functions defined over distributive lattices. Notice that $J^{\Delta}_{\delta,f}$ is not defined over all distributive lattices (see also the third remark after Theorem 4).
3. In general, Theorem 3 is not true for nonnegative, monotone, submodular functions. Any set function $f$ such that $f(A) = f(B) = f(A\cup B) > f(A\cap B) \ge 0$ for non-empty, incomparable sets $A, B$ refutes $J_{\delta,f}(A,B) \le J_{\delta,f}(A, A\cup B) + J_{\delta,f}(A\cup B, B)$: the right-hand side vanishes, while the left-hand side is positive.
   Concrete examples include:
   - linear cost functions with budget restrictions, i.e., $f(A) = \min\{\beta, \sum_{i\in A} c_i\}$ for a budget $\beta \ge 0$;
   - the neighborhood size in a bipartite graph $G = (U \uplus V, E)$, i.e., $f(A) = |\Gamma(A)|$, where $A \subseteq U$ and $\Gamma(A) = \bigcup_{u\in A} \{v \in V \mid \{u,v\} \in E\}$.

Next, we give a simple proof of the triangle inequality for $J^{\Delta}_{\delta,f}$.

Theorem 4. Let $f$ be a nonnegative, monotone, submodular set function on $X$. Then, for all sets $A, B, C \subseteq X$, it holds that $J^{\Delta}_{\delta,f}(A,B) \le J^{\Delta}_{\delta,f}(A,C) + J^{\Delta}_{\delta,f}(C,B)$.

Proof. We split the set $C$ into two disjoint sets $C_0 \subseteq A\cup B$ and $C_1 \subseteq X\setminus(A\cup B)$, both possibly empty, such that $C = C_0\cup C_1$. We obtain

$$\begin{aligned}
\frac{f(A\triangle C) - f(\emptyset)}{f(A\cup C)} + \frac{f(B\triangle C) - f(\emptyset)}{f(B\cup C)}
&\ge \frac{f(A\triangle C) + f(B\triangle C) - 2f(\emptyset)}{f(A\cup B\cup C_1)} && \text{(monotonicity of } f\text{)}\\
&\ge \frac{f\big((A\triangle C)\cup(B\triangle C)\big) - f(\emptyset)}{f(A\cup B\cup C_1)} && \text{(submodularity, monotonicity of } f\text{)}\\
&\ge \frac{f\big((A\triangle B)\cup C_1\big) - f(\emptyset)}{f(A\cup B\cup C_1)} && \text{(monotonicity of } f\text{)}\\
&\ge \frac{f(A\triangle B)}{f(A\cup B)} - \frac{f(\emptyset)}{f(A\cup B\cup C_1)} && \text{(submodularity of } f\text{, Cond. (1))}\\
&\ge \frac{f(A\triangle B)}{f(A\cup B)} - \frac{f(\emptyset)}{f(A\cup B)} && \text{(monotonicity of } f\text{)}
\end{aligned}$$

The case $f(A\cup B) = 0$ is trivial. The third step uses $A\triangle B \subseteq (A\triangle C)\cup(B\triangle C)$ and $C_1 \subseteq A\triangle C$. The fourth step applies Condition (1) element by element to $C_1 \subseteq X\setminus(A\cup B)$. This gives $f((A\triangle B)\cup C_1) - f(A\triangle B) \ge f(A\cup B\cup C_1) - f(A\cup B) \ge 0$, which is combined with $f(A\triangle B) \le f(A\cup B)$. This shows the theorem.

Remarks: We comment on the proof of the triangle inequality for $J^{\Delta}_{\delta,f}$:

1. It follows once more from Theorem 4 that the standard Jaccard distance, the generalized Jaccard distance, and the Steinhaus distance satisfy the triangle inequality. Moreover, $J^{\Delta}_{\delta,f}$ is also a (pseudo)metric for, e.g., linear cost functions with budget restrictions and the neighborhood size in bipartite graphs.
2. Theorem 4 suggests that $J^{\Delta}_{\delta,f}$ is the right definition of a submodular Jaccard distance. As a consequence, one might say that the submodular Jaccard (similarity) index should be defined as the complement of the submodular Jaccard distance, i.e., $$J^{\Delta}_f(A,B) =_{\mathrm{def}} 1 - J^{\Delta}_{\delta,f}(A,B) = 1 - \frac{f(A\triangle B) - f(\emptyset)}{f(A\cup B)}.$$ Again, if $f(A) = |A|$, then we obtain the standard Jaccard index $J = J^{\Delta}_f = 1 - J_{\delta,f}$.
3. $J^{\Delta}_{\delta,f}$ might not be defined over a given distributive lattice in general. Even so, each nonnegative, monotone, submodular function $f : \mathcal{F} \to \mathbb{R}$ defined on a family $\mathcal{F} \subseteq \mathcal{P}(X)$ closed under union and intersection has a (not necessarily unique) nonnegative, monotone, submodular extension $\bar{f} : \mathcal{P}(X) \to \mathbb{R}$ on $X$ such that $\bar{f}(A) = f(A)$ for all $A \in \mathcal{F}$ (e.g., [16]). Hence, $J^{\Delta}_{\delta,\bar{f}}$ can be used instead.

Acknowledgments: I am grateful to Ulrik Brandes (Konstanz) and Julian Müller (Konstanz) for helpful discussions.

References

[1] M. S. Charikar. Similarity Estimation Techniques from Rounding Algorithms. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC'2002), pp. 380–388. ACM Press, New York, NY, 2002.
[2] M. M. Deza, E. Deza. Encyclopedia of Distances. Springer, Berlin, 2009.
[3] O. Fujita. Metrics based on average distance between sets. Japan Journal of Industrial and Applied Mathematics, 30(1):1–19, 2013.
[4] A. Gardner, J. Kanno, C. A. Duncan, R. Selmic. Measuring Distance Between Unordered Sets of Different Sizes. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'2014), pp. 137–143. IEEE, New Jersey, NJ, 2014.
[5] G. Gilbert. Distance between sets. Letters to Nature, 239(5368):174, 1972.
[6] J. Gillenwater, R. Iyer, B. Lusch, R. Kidambi, J. A. Bilmes. Submodular Hamming Metrics. In: Advances in Neural Information Processing Systems 28, pp. 3141–3149. NIPS Proceedings, December 2015.
[7] J. C. Gower. Similarity, Dissimilarity and Distance, Measures of. In: S. Kotz, C. B. Read, N. Balakrishnan, B. Vidakovic (eds.), Encyclopedia of Statistical Sciences, vol. 12, pp. 7730–7738. 2nd edition, John Wiley, New York, NY, 2008.
[8] P. Jaccard. Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Bulletin de la Société Vaudoise des Sciences Naturelles, 37(142):547–579, 1901.
[9] W. A. Kosters, J. F. J. Laros. Metrics for Mining Multisets. In: M. Bramer, F. Coenen, M. Petridis (eds.), Research and Development in Intelligent Systems XXIV, Proceedings of the Twenty-seventh SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence (AI'2007), pp. 293–303. Springer, Berlin, 2007.
[10] M. Levandowsky, D. Winter. Distance between sets. Letters to Nature, 234(5323):34–35, 1971.
[11] A. H. Lipkus. A proof of the triangle inequality for the Tanimoto distance. Journal of Mathematical Chemistry, 26:263–265, 1999.
[12] E. Marczewski, H. Steinhaus. On a certain distance of sets and the corresponding distance of functions. Colloquium Mathematicum, 6:319–327, 1958.
[13] D. A. Simovici, C. Djeraba. Mathematical Tools for Data Mining. Springer, London, 2008.
[14] A. Schrijver. Combinatorial Optimization, vol. B. Springer, Berlin, 2003.
[15] T. T. Tanimoto. An elementary mathematical theory of classification and prediction. IBM Report, November 1958.
[16] D. M. Topkis. Minimizing a submodular function on a lattice. Operations Research, 26(2):305–321, 1978.
