Characterization and exploitation of community structure in cover song networks

Characterization and Exploitation of Com m unit y Structure in Co v er Song Net w orks Joan Serr` a a,1 , Massimiliano Zanin b , P erfecto Herrera a , Xa vier Serra a a Music T e chnolo g y Gr oup, Universitat Pomp eu F abr a, R o c Bor onat 138, 080 18 Bar c elo na, Sp ain. b INNAXIS F oun dation & R ese ar ch Institute, V el´ azquez 157, 2800 2 Madrid, Sp ain. Center for Biome dic al T e chnolo gy, Polyte chnic University of Madrid, Campus Monte ganc e do, 28223 Pozuelo de Alar c´ on, Madrid, S p ain. Abstract The use of commun ity detection algorithms is explored within the f ramew ork of co v er song iden t iﬁcatio n, i.e. the auto matic detection of diﬀeren t audio renditions of the same unde rlying music al piece. Un til now, this task has b een p osed as a t ypical query-b y-example task, where one submits a query song and the system retriev es a list of p ossible mat c hes ranked by their similarit y to the query . In this work, w e prop ose a new appro a c h whic h uses song comm unities (clusters, g roups) t o pro vide more relev an t answ ers to a giv en query . Starting from the output of a state-of- the-art system, songs are embedded in a complex w eighte d net w ork whose links represen t similarit y (related m usical conten t). Comm unities inside the netw ork are then recognized as groups of co v ers and this information is used to enhance the results of the system. In particular, w e sho w that this approa c h increases b oth the coherence and the accuracy of the system. F urthermore, w e pro vide insigh t in to the in ternal organization of individual co v er song comm unities, sho wing that there is a tendency for the original song to b e cen tral within the comm unit y . W e p ostulate that t he metho ds and results presen ted here could b e relev ant to ot her query-b y-example tasks . Keywor ds: Complex net w orks, Comm unity detection, Clustering, Music Email addr esses: joan .serra j@upf.ed u (Joan Serr` a), mz anin@i nnaxi s.org (Massimiliano Zanin), pe rfect o.herr era@upf.edu (Perfecto Herr era), xavier .serr a@upf.edu (Xavier Serra) 1 Corresp o nding author . Pho ne +34 93 542 28 6 4, fax + 34 93 542 251 7. Pr eprint su bmitte d to Pattern R e c o gnition L etters August 8, 2018 retriev al, Cov er songs, O riginal song 1. In tro duction Audio co ve r song iden tiﬁcation is the task of automat ically detecting whic h songs ar e v ersions of the same underlying mus ical piece using only information extracted from their ra w audio signal (Serr` a et al., 2010). This addresses an imp ort an t problem faced b y mo dern so ciet y: the classiﬁcation and organization of digita l infor ma t io n. More concretely , it addresses the detection of near-duplicate music al do cuments (Casey et al., 200 8). Co v er song iden tiﬁcation is a c hallenging task, since cov er songs might diﬀer from their originals in sev eral music al a sp ects suc h as tim bre, temp o, song structure, main tonalit y , arrangemen t, lyrics, or langua ge of the vocals (Serr` a et al., 2010). Nev ertheless, the iden tiﬁcation of cov er song vers ions has b een a v ery activ e ar ea of study within the m usic information retriev al (MIR) comm unity ov er the last y ears (Serr` a et al., 201 0; Casey et al., 20 0 8; Do wnie, 2008) . Thanks to these eﬀorts, and to the dev elopmen t of a num- b er of sp eciﬁc to ols to extract a nd a nalyze m usical informatio n from audio (Casey et al., 2008), w e now disp ose of a v ar iet y of metrics f or the estimation of the similarit y b et w een cov er songs (Serr` a et al., 20 1 0). These metrics are commonly used to searc h for co v ers in a m usic col- lection, ranking the relev ance of eac h song to a g iven query . Indeed, co ve r song iden tiﬁcation has b een traditiona lly set up as a typical information re- triev al (IR) task of query-b y-example ( Ba eza-Y ates and Rib eiro- Neto, 1999; Manning et al., 2008), where the user submits a query (a song) and receiv es an answ er bac k (a list o f songs ranked by their relev ance to the query). In the presen t article we prop ose a no ve l approach: af t er pro cessing isolated queries through query-b y-example, syste ms may fo cus on groups of items, with the new aim of iden tifying communities of songs within a giv en m usic collection 2 . Using suc h a strategy has many intuitiv e a dv an tages. Imp ortantly , o ne should b ear in mind that these adv an tages are not sp eciﬁc for the co v er song detection task, and hold for any IR sys tem op erating through query- b y-example (Baeza-Y ates and Rib eiro-Neto, 1999; Manning et a l., 2008), in- 2 Through the ma nu sc r ipt we use the words gro up, set, communit y , or clus ter inter- changeably . 2 cluding analogous systems such as recommendation systems (Resnic k a nd V arian, 1997). First, giv en that curren t systems provide a suitable metric t o quan- tify the similarity b et w een query items, sev eral w ell-researc hed options ex- ist to exploit this information in order to detect inheren t groups of items (Xu and W unsc h I I, 2009; Jain et al., 1999; F ortunato and Castellano, 2009; Danon et al., 20 05). Second, fo cusing o n groups of items ma y help the sys- tem in r etrieving more coheren t answe rs for isola ted queries. In particular, the answ ers to any query b elonging to a give n group w ould coheren t ly con- tain the o ther songs in the group, an adv an tag e that is not guara n teed b y query-b y-example sys tems alone. T hird, m usic collections are usually or - ganized and structured on m ultiple scales. Thus we can infer and exploit these regularities to increase t he o v erall accuracy of traditional cov er song iden tiﬁcation systems . Note that the tw o previous adv antages speciﬁcally aim to achiev e higher user satisfaction a nd conﬁdence in IR systems, as they can b e p erceiv ed as rational agen ts or assistan ts. Finally , once groups of coheren t items are correctly detected, one can study these groups in order to retrieve new informa t ion, either from the individual communities o r fr o m the relations b etw een these. In this a r t icle, for automa t ically iden tifying co v er song sets (o r g roups) in a m usic collection w e emplo y a n um b er of unsup ervised grouping a lgorithms on t o p of a state-of-t he-art query-b y-example system (Serr` a et al., 2009a). W e consider clustering alg o rithms (Xu and W unsc h I I, 200 9; Jain et a l., 1999) and, in particular, comm unity detection algo rithms (F ortunato and Castellano, 2009; D anon et al., 2005). The reader ma y easily see the resem blance b e- t w een the detection of co ve r song sets and a more classical comm unity de- tection task inside a complex net w ork (Bo ccaletti et al., 2006; Costa et al., 2008). This w ay , a set of no des N ≡ { n 1 , n 2 , . . . , n N } represen ts the N recordings b eing analyzed, and the elemen t s of the N × N w eight matrix W represen t the distance (dissimilarit y) b etw een an y couple of no des. Pro- vided that the weigh ts of this matrix are assigned with the help of a suit- able cov er song dissimilarit y metric (e.g. the same one used to originally rank the answ er to a query), comm unities inside this complex net w ork will represen t sets of recordings with related mu sical con tent. Although com- plex net w orks and commun ity detection algorithms ha v e b een used in many problems in volving complex systems (Bo ccaletti et al., 2006; Costa et al., 2008), and mor e sp eciﬁcally in study ing m usical net w orks (Buld ´ u et al., 20 07; T eitelbaum et al., 200 8; Cano et al., 20 0 6), to the b est of our kno wledge they ha v e nev er b een a pplied in the contex t o f a retriev al task b efo r e. The only 3 exception is our previous w o rk (Serr` a et al., 2009b), of whic h the presen t article sho ws considerable extensions, impro ve ments , and new r esults. An alternativ e tec hnique for impro ving cov er song r etriev al was considered in (Lagrange and Serr` a, 201 0). W e now pro vide a brief o ve rview of our main contributions a nd, at the same time, outline the remaining structure of the article. W e ﬁrst build and analyze a co v er song netw ork (Sec. 2). T o this end w e apply a state-of- the- art a lgorithm for cov er song similarity to an in-house mus ic collection. W e then do an analysis of this net w ork, b o th of its top ology and of the c har- acteristics of the p ercolation pr o cess. Within this analysis we ﬁnd a strong mo dular structure, with we ll- deﬁned communitie s a nd a clustering co eﬃcien t higher t han exp ected in an equiv alen t ra ndom netw ork. This conﬁrms our in tuitiv e reasoning that co ve r songs naturally cluster in to cov er song sets. With this knowle dge we can then safely pro ceed to detect the actual sets of co v ers based on the output o f t he state-of-the-art a lgorithm (Sec. 3). F or that, sev eral clustering and commu nity detection strategies are compared. F our of these strategies are based on comm unity detection in complex net- w orks, of whic h three of them are no v el contributions. An assessme nt of the computation time of all the considered metho ds is also done. Next, we show ho w query-b y-example results can b e improv ed b y incorp o rating the infor- mation obtained through the g r oup detection stage into the system (Sec. 4). Indeed, our results show a coheren t increase in the accuracy o f the system, with particularly pro mising v alues fo r commun ity detection metho ds. This conﬁrms our intuitiv e reasoning that exploiting the r egula rities found in t he answ ers given by a query-b y-example system can lead to an ov erall accuracy increase. Finally , w e fo cus on the internal o r ganization of co ve r song sets. More concretely , a pioneering study o f the role that original songs (i.e. the ones p erfo rmed b y the original author or artist) pla y within a group of co v ers is done (Sec. 5) . T o the authors’ kno wledge, the presen t study is the ﬁrst a t - tempt done in this direction. In particular, w e show that there is a tendency for the original song to b e cen tral within the comm unit y . A short conclusions section closes the art icle ( Sec. 6 ) . 2. Co v er song net works 2.1. Building the network The ﬁrst step required b y our prop osal is to create a net w ork and to em b ed no des (songs) in to it. W e use an in-house m usic collection of 2125 4 songs comprising a v ariety of genres and st yles. This collection is an extension of the one used by Serr` a et al. (200 9 a), to whic h w e refer for further details, and consists of 523 non-ov erlapping groups of cov er songs, each group having an iden tiﬁcatory lab el whic h w e use in the ev alua t io n stages. The cardinality of these groups, i.e. the n umber of songs p er group, v aries b et w een 2 and 18, with an exp ected v a lue of 4. Links b et w een netw ork no des should represen t the cov er song relationship b et w een correspo nding mus ical pieces (the dissimilarit y b etw een their m usical con ten t). Therefore, an algorit hm to compute this dissimilarit y is needed in order to calculate the elemen ts w i,j of the matrix W for each couple of no des n i and n j . Sev eral a lternativ es for suc h dissimilarit y measures ha ve b een prop osed in the literature (Serr` a et al., 2 010). In particular, we use the Q max measure prese nted b y Serr` a et a l. (2009 a). This measure allo ws to trac k all p oten tial diﬀerences b etw een cov er songs of the same underlying m usical piece (Sec. 1). Ho w ev er, in spite of b eing one of the most promising strategies prop osed so far, its accuracy is not p erfect. This is a further motiv ation to impro v e the accuracy of the system through a p ost-pro cessing step based on co v er set detection. A brief outline of the Q max measure follow s. First, a t ime series of mu- sical descriptors is extracted for all songs. In the case of cov er songs, t onal similarit y is commonly exploited (Serr` a et al., 2010). In particular, Q max em- plo ys time series of pitch class proﬁles (PCP; G´ omez, 2006). PCP features estimate the amo unt of energy for each mus ical note of the W estern m usical scale tha t is presen t in a short analysis frame of the raw audio signal. This analysis is p erformed in a mov ing windo w, leading to a time series that is ro- bust a gainst non-tonal comp onen ts (e.g. ambie nt noise or p ercussiv e sounds), and indep enden t o f timbre and the sp eciﬁc instrumen ts used. F urthermore, PCPs are indep enden t of a m usical piece’s loudness and v olume ﬂuctuations. As co v er v ersions ma y b e play ed in diﬀeren t tonalities (e.g. to b e adapted to the c haracteristics of a pa r ticular singer or instrumen t) one has to tac kle diﬀerences in the main k ey of the song. This can b e eﬀectiv ely done through v arious strategies (Serr` a et al., 2010). F rom the ab ov e PCP time series, one forms a state space represen tation for eac h song using delay coor dinates (Kan tz and Sch reib er , 2004). These represen tatio ns are then compared on a pairwise basis thro ugh a cross recur- rence plo t (CRP), whic h is the biv ariate generalization of classical recurrence plots (Ec kmann et al., 1987; Marw an et al., 2007). Finally , the Q max measure is used to extract features that are sensitiv e t o co ve r song CRP characteris- 5 tics. This measure was deriv ed f rom a previously published R QA measure [ L max , Ec kmann et al. (1987)], but ada pted to the problem at hand by allow- ing to track curve d and p oten tially disrupted t r a ces in a CRP . Despite this adaptation, in Serr` a et al. (2009a) w e sho w ed that the Q max measure is not restricted to MIR nor to the particular application of cov er song iden tiﬁca- tion. An example of the ab o v emen tioned pro cess is sho wn in F igs. 1 and 2, whic h compar es the song “Ro ck aro und the clo c k” as p erformed by Elvis Presley v ersus a v ersion p erformed b y The Sex Pistols. Since it is not the o b- jectiv e o f this article to thoroug hly presen t the Q max measure, the intereste d reader is referred to Serr` a et al. (2009 a) for furt her details. A comprehen- siv e ov erview of co ve r song similarity measures can b e found in Serr` a et al. (2010). The symmetric measure Q max represen ts similarit y: the higher the v alue, the more similar b oth ana lyzed recordings are in terms of their tonal m usical con ten t. T o ﬁll the we ighted adja cency matrix W of t he net w ork, w e pro ceed as in Serr` a et al. (2009c) and con ve rt Q max to a dissimilarit y v alue b y taking w i,j = p | s j | Q max ( s i , s j ) , (1) where | s j | is pro p ortional to t he duration of song s j and Q max ( s i , s j ) ∈ [1 , max ( | s i | , | s j | )]. Notice that w i,j = w j,i , iﬀ s i and s j ha v e the same dura- tion. Recall that the no des of the net w ork N ≡ { n 1 , n 2 , . . . , n N } represen t the N recordings s i b eing analyzed. 2.2. A nalysis of the network The result o f the previous pro cedure o v er the av aila ble data is a w eighte d directed graph expressing co v er song relationships. This resulting net w ork is represen ted in Fig. 3. A threshold has b een applied so that only pairs of no des with w i,j ≤ 0 . 2 are draw n. Some clusters, that is, sets of co v ers, are already visible, esp ecially in the external zones o f the net w ork. In order to understand ho w the netw ork ev olves when the threshold is mo diﬁed, w e represen t six diﬀerent classical netw ork metrics as a function of the threshold ( F ig. 4). These metrics corresp ond to (Bo ccaletti et al., 2006): graph densit y , nu mber of indep enden t comp o nents, size of the strong gian t compo nen t, num b er of isolated no des, eﬃciency (Lat o ra and Marc hiori , 2001), and clustering co eﬃcien t. In the same plots, w e also displa y the v alues 6 PCP samples PCP samples 50 100 150 200 250 50 100 150 200 250 Figure 1 : CRP for the song “ Ro ck a r ound the clo ck” as per fo rmed b y E lvis Pr esley ( x axis) a nd T he Sex Pistols ( y axis). Axes repr esent time and black dots represent corres p o ndences b etw een the tonal cont ent of both songs . W e see quite long black tr a ces through the CRP , whic h are usually not straight diagonals but curved and disrupted ones, indicating similarly evolving tempo r al patterns in b oth song repres entations. for the last ﬁv e measures a s expected in random net w orks with the same n um b er of no des and links. By lo oking at the ev olution o f these metrics, w e can infer some in ter- esting kno wledge ab out the netw ork and it s inherent structure. Notice that when reducing the threshold (and therefore increasing the deleted links), the net w ork splits into a higher n umber of cluste rs than exp ected (Fig. 4 , top righ t), whic h represen ts the formatio n o f co v er song comm unities. This pro- cess b egins aro und a threshold o f 0 . 5 (see, for instance, the ev olution of the size of the strong giant comp onen t). When these comm unities are formed, they maintain a high clustering co eﬃcien t and a high tria ngular coherence (b ottom righ t gra ph of Fig. 4, b etw een 0 . 3 and 0 . 5), i.e. sub-net works o f co v- 7 PCP samples PCP samples 50 100 150 200 250 50 100 150 200 250 5 10 15 20 25 30 35 40 45 Figure 2: The Q ma trix (Serr ` a et al., 200 9 a) for the sa me pair of s ongs. This matr ix q ua n- tiﬁes the lengths of the pr eviously mentioned black t ra ces. The Q max measure co rresp onds to the maximum v alue in Q (in this example Q max = 4 6 . 6). ers tend to b e fully connected. It is also interes ting to note that the num b er of isolated no des remains low er than expected, except for high thresholds (Fig. 4, middle righ t). This suggests that most of the songs are connected to some cluster while a small group of them are diﬀeren t, with unique m usical features. W e found nearly iden tical results using a symmetric dissimilarit y matrix W ′ with w ′ i,j = w ′ j,i = ( w i,j + w j,i ) / 2. 3. Detecting groups of co vers W e assess the detection of co v er sets ( o r comm unities) b y ev aluating a n um b er of unsup ervised metho ds either based on clustering or o n complex net w orks. Three o f these are nov el approaches . Since standard implemen ta- tions of clustering algorithms do not op erate with a n asymmetric dissimilar- 8 Figure 3: Graphica l representation of the cover song netw or k when a thr eshold of 0 . 2 is applied. O riginal songs are drawn in blue, while cov ers are in black. In Sec. 5, the ro le of original songs inside e a ch communit y will b e further studied. it y measure, in this section and in the subsequen t one w e use t he symmetric dissimilarit y matrix W ′ explained ab ov e. 3.1. Metho ds K-medoids K-medoids (KM) is a classical tec hnique to gro up a set of ob- jects inside a previously kno wn n um b er of K clusters. This algorithm is a common c hoice when the computation of means is una v a ila ble (as it solely op erates o n pairwise distances) and can exhibit some adv a n tages compared to the standard K-means algorithm (Xu and W unsc h I I, 2009), in particular with noisy samples . The main dra wbac k for it s a pplication is that, as w ell as with the K-means algorithm, the K-medoids algo- rithm needs to set K , the num b er of exp ected clus ters. Ho we ve r, sev eral 9 0 . 0 0 . 5 1 . 0 1 . 5 2 . 0 2 . 5 3 . 0 3 . 5 0 . 0 0 . 2 0 . 4 0 . 6 0 . 8 1 . 0 G r a p h d e n si t y T h r e sh o l d 0 . 0 0 . 5 1 . 0 0 5 0 0 1 0 0 0 1 5 0 0 2 0 0 0 2 5 0 0 N u m b e r o f co m p o n e n t s T h r e sh o l d C o ve r s n e t w o r k R a n d o m n e t w o r k 0 . 0 0 . 5 1 . 0 1 . 5 0 . 0 0 . 2 0 . 5 0 . 7 0 . 9 S i ze o f t h e S t r o n g G i a n t C o m p o n e n t T h r e sh o l d C o ve r s n e t w o r k R a n d o m n e t w o r k 0 . 0 0 . 2 0 . 4 0 . 6 0 . 8 1 . 0 0 . 0 0 . 2 0 . 5 0 . 7 0 . 9 P r o p o r t i o n o f i so l a t e d n o d e s T h r e sh o l d C o ve r s n e t w o r k R a n d o m n e t w o r k - 0 . 5 0 . 0 0 . 5 1 . 0 1 . 5 2 . 0 2 . 5 3 . 0 3 . 5 4 . 0 0 . 0 0 . 2 0 . 4 0 . 6 0 . 8 1 . 0 E f f i ci e n cy T h r e sh o l d C o ve r s n e t w o r k R a n d o m n e t w o r k - 0 . 5 0 . 0 0 . 5 1 . 0 1 . 5 2 . 0 2 . 5 3 . 0 3 . 5 4 . 0 0 . 0 0 . 2 0 . 4 0 . 6 0 . 8 1 . 0 C l u st e r i n g co e f f i ci e n t T h r e sh o l d C o ve r s n e t w o r k R a n d o m n e t w o r k Figure 4: (Black solid lines) Evolution o f s ix metrics of the netw or k a s a function of the threshold. These metrics are, from top left to b ottom right: gr aph density , num b er o f independent comp onents, size of the strong giant comp onent, n umber of isola ted no des, eﬃciency , and clustering co eﬃcient . (Red dashed lines) Exp ected v alue in a rando m net work with the same num be r of no des and links. Near ly identical ﬁgures were obtained when consider ing a sy mmetr ic dis similarity matrix (se e text). 10 heuristics can b e used for that purp ose. W e emplo y the K-medoids im- plemen ta tion o f the tamo pack age 3 , whic h incorp ora t es sev eral heuris- tics to ac hiev e an optimal K v a lue. W e use the default parameters and try all p ossible heuristics pro vided in the implemen tat io n. Hierarc hical cluster ing F our represe ntativ e agglo merativ e hierarc hical clus- tering metho ds ha ve b een tested (Xu and W unsc h I I, 20 0 9; Jain et al., 1999): single link age (SL), complete link age ( CL), group av erage link age (UPGMA), and w eighted av erage link age (WPGMA). W e use the hclus- ter implemen ta tion 4 with the default parameters, and we try diﬀerent cluster v alidit y criteria suc h as chec king descendan ts for inconsisten t v alues, or considering the maximal or the av erage in ter-cluster cophe- netic distance. Thus , in the end, all clustering alg o rithms rely only on the deﬁnition of a distance threshold d ′ Th , whic h is set exp erimentally . Mo dularit y opt imization This metho d (MO), as well as the next three algorithms, is designed to exploit a complex net w ork collab orative ap- proac h. MO extracts the comm unit y structure from large net w orks based on the optimization o f the net work mo dularity (F ortunato and Castellano, 2009; Danon et al., 2005). In particular, w e use the metho d prop o sed in Blondel et al. (2 008) with the implemen tation b y Aynaud 5 . This metho d is rep orted to outp erform all other kno wn communit y detec- tion algorithms in t erms of computational time while still main taining a high accuracy . Prop osed metho d 1 Our ﬁrst prop osed metho d (PM1) applies a threshold to each netw ork link in order to create a n un w eigh ted netw ork where t w o no des are connected only if their w eight (dissimilarit y) is less tha n a certain v alue w ′ Th . In addition, for eac h ro w of W ′ , w e only allo w a maximu m n umber of connections, considering only the lo w est v alues of the thresholded ro w as v alid links. That is, w e o nly consider the ﬁrst r ′ Th nearest neigh b ors for eac h no de ( v alues w ′ Th and r ′ Th are set exp erimentally). Finally , each connected comp onen t is assigned to b e a gro up of cov ers. Although this is a v ery na ¨ ıv e approach, it will b e 3 http:/ /frae nkel.mit.e du/TAMO 4 http:/ /code .google.co m/p/scipy- cluster 5 http:/ /pers o.crans.or g/ ~ aynaud /comm unities/in dex.html 11 sho wn that, giv en the considered netw ork and dissimilarit y measure, it ac hiev es a high accuracy leve l at low computational costs. Prop osed metho d 2 The previous approac h could b e furt her impro v ed by reinforcing t r iangular connections in the complex netw ork b efore the last step of c hec king for connected comp onents. In other words, pro- p osed metho d 2 (PM2) tries to reduce the “ uncertain t y” generated b y triplets of no des connected by tw o edges and to reinforce coherence in a triangular sense. This idea can b e illustrated by the following example (Fig. 5) . Supp o se that three no des in the net work, e.g. n i , n j , and n k , are cov ers: the resulting subnet w ork should b e triangular, so that every no de is con- nected with the t w o remaining ones. On the ot her hand, if n i , n j , and n k are not co v ers, no edge should exist b etw een them. If couples n i , n j and n i , n k are respective ly connected (Fig. 5A), w e can induce more coherence by either deleting one of the existing edges (Fig. 5B), or by creating a connection b et w een n j and n k (i.e. forcing the existence of a triangle, Fig . 5C). This coherence can b e measured thro ugh an ob- jectiv e function f O whic h considers complete and incomplete tr ia ngles in the whole graph. W e deﬁne f O as a w eighte d diﬀerence b et w een the n umber o f complete tria ngles N ▽ and the n umber of incomplete triangles N ∨ (three v ertices connected b y only tw o links) that can b e computed from a pair of vertices : f O ( N ▽ , N ∨ ) = N ▽ − αN ∨ . The con- stan t α , whic h w eigh ts the p enalization for having incomplete triangles, is set exp erimen t ally . The implemen ta tion of this idea sequen tially analyzes eac h pair of ve r- tices n i , n j b y calculating the v alue of f O for tw o situations: (i) when an edge b etw een n i and n j is artiﬁcially created and ( ii) when suc h an edge is deleted. Then, the option whic h maximizes f O is kept a nd the adjacency matrix is up dated as necessary . The pro cess of a ssigning co v er sets is the same as in PM1. Prop osed metho d 3 The computation time of the previous metho d can b e substan tially reduced by considering for the computation of f O only those vertic es whose connections seem to b e uncertain. This is what prop osed metho d 3 (PM3) do es: if the dissimilarit y b etw een t w o songs is extremely high or lo w, this means that the co v er song iden tiﬁcation system has clearly detected a match or a mismatc h. Accordingly , w e 12 Figure 5: Example of the pr o cess of reinforcing the triangular coherence of the net work. The sub-netw ork in the left pa rt (A) ca n b e improved b y either deleting a link (B), or by adding a third link be tw een the t wo no des tha t were not orig inally co nnected (C). only consider for f O the pairs o f vertice s whose edge w eight is close to w ′ Th (a closeness margin is empirically set). 3.2. Evaluation metho d olo gy The exp erimen tal setup is an imp ortan t asp ect to b e considered when ev aluating cov er song iden tiﬁcation systems. Eac h setup is deﬁned b y dif- feren t pa rameters (Serr` a et al., 2009b): the total n um b er o f songs N , the n um b er of cov er sets N C the collection includes, the cardinality C of the co v er sets (i.e. the num b er of songs in the set), and the n um b er of added noise songs N N (i.e. songs that do not b elong to a ny co ver set, whic h are included to add diﬃcult y to the task). Because some setups can lead to wrong accuracy estimations (Serr` a et al., 2010), it is safer to consider sev- eral of them, including ﬁxed and v ariable cardinalities. In our exp erimen ts w e use t he setups summarized in T able 1. The whole net w ork a nalyzed in Sec. 2.2 corresp onds to setup 3. F or other setups w e randomly sample co v er sets from setup 3 and rep eat the exp erimen ts N T times. W e either sample co v er sets with a ﬁxed cardinality ( C = 4, the exp ected cardinality of setup 3) or without ﬁxing it (v ariable cardinality , C = ν ). F or sampled setups, the a v erage a ccuracies rep ort ed. T o quantitativ ely ev aluate cov er set (or comm unity) detection w e use to the classical F- measure with eve n w eigh ting (Baeza-Y a t es a nd Rib eiro-Neto, 1999; Manning et al., 20 08), F = 2 ¯ P ¯ R ¯ P + ¯ R , (2) 13 Setup P arameters N C C N N N N T 1.1 25 4 0 100 20 1.2 25 ν 0 h 100 i 20 1.3 25 4 100 200 20 1.4 25 ν 100 h 200 i 20 2.1 125 4 0 50 0 20 2.2 125 ν 0 h 500 i 20 2.3 125 4 400 900 20 2.4 125 ν 400 h 900 i 20 3 523 ν 0 2125 1 T able 1: Exp er imen tal setup summary . The h·i delimiters denote exp ected v alue. whic h go es from 0 (worst case) to 1 (b est case). In Eq. (2), ¯ P and ¯ R cor- resp ond to precision and recall, resp ectiv ely . F or o ur ev alua tion, we com- pute these t w o quan tities indep enden tly for all songs and a v erage af ter- w ards, i.e. unlik e o t her clustering ev aluation measures (Saho o et al., 200 6), F is not computed on a p er- cluster basis, but on a p er-song basis. This w a y , and in con trast to the t ypical clustering F-measure o r other cluster- ing ev aluatio n measures lik e Purit y , En tropy , or F -Score (Saho o et al., 2006 ; Zhao and Karypis, 200 2), we do not ha v e to blindly choose whic h cluster is the represen ta tiv e for a given cov er set. F or each song s i , w e coun t the nu mber of true po sitiv es τ + i (i.e. the num b er of a ctual co v er songs of s i estimated to b elong to the the same comm unit y as s i ), the num b er of false p ositive s τ − i (i.e. the nu mber of songs estimated to b elong to the same g roup as s i that are actually not co ve rs o f s i ) and the n um b er of false negative s υ − i (i.e. the n umber of actual co v ers of s i that are not detected to b elong to the same group as s i ). Then we deﬁne P i = τ + i τ + i + τ − i (3) and R i = τ + i τ + i + υ − i . (4) These t w o quan tities [Eqs. (3) and (4)] a r e av eraged across all N songs ( i = 1 , . . . N ) to obtain ¯ P and ¯ R , resp ectiv ely . 14 Algorithm Setup 2.1 2.2 2.3 2.4 3 KM 0.66 0.6 6 0.68 0.69 n.c. SL 0.79 0.81 0 .8 8 0.89 0.78 CL 0.81 0.8 2 0.83 0.83 0.79 UPGMA 0.82 0.83 0.8 3 0.83 0.79 WPGMA 0.83 0.8 4 0.84 0.84 0.82 MO 0.80 0.8 3 0.89 0.89 0.81 PM1 0.81 0.8 3 0.88 0.89 0.81 PM2 0.77 0.7 7 n.c. n.c. n.c . PM3 0.79 0.7 9 0.87 0.88 0.76 T able 2: Accura cy F for the co ns idered algo r ithms and setups (see T able 1 for the details on the diﬀerent setups). Due to algo r ithms’ complexity , so me results were not co mputed (denoted as n.c. ). 3.3. R esults T o assess the algorithms’ accuracy w e indep enden tly optimized all p ossi- ble parameters f or eac h algorithm. This optimization w as done in-sample by a grid searc h, t r ying to maximize F on the randomly chose n songs of setups 1.1 to 1.4. Within this optimization phase, w e saw that the deﬁnition of a threshold (either d ′ Th for clustering algorithms or w ′ Th for comm unit y detec- tion algorithms) w as, in general, the only critical parameter for all algorithms (for our prop osed metho ds w e used r ′ Th b et w een 1 a nd 3). All other param- eters turned out not to b e critical for obtaining near-opt ima l a ccuracies. Metho ds that had sp ecially broad ranges of these near-optimal accuracies w ere KM, PM2, and all considered hierarc hical clustering alg o rithms. W e rep ort the out-of- sample accuracies F for setups 2.1 to 3 in T able 2 . Ov erall, the high F v alues obtained (a b o ve 0.8 in the ma jority of the cases, some o f them nearly reac hing 0.9) indicate that the considered approache s are able to eﬀectiv ely detect gr o ups of co v er songs. This allo ws the p o ssibility to reinforce the coherence within a nsw ers a nd to enhance the answ er o f a query- based retriev al system (see Sec. 4 ). In particular, w e see that accuracies for PM1 and PM3 are comparable to t he ones ac hiev ed b y the o ther algorit hms and, in some setups, ev en b etter. W e also see tha t KM and PM2 p erform w orst. 15 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4 3 −1 0 1 2 3 Setup Time [log 10 (sec)] KM SL CL UPGMA WPGMA PM1 PM2 PM3 MO Figure 6: Average time p erfor mance for each considered setup. Alg orithms w er e run with an Intel( R) Pentium(R ) 4 CPU 2.40 GHz with 512M RAM. 3.4. Computation time In the applicatio n of these techniq ues to big real-world music collections, computational complexity is o f great imp ortance. T o qualitatively ev aluate this asp ect, w e rep ort the a v erage amount of time sp en t b y the algorithms to ach ieve a solution fo r each setup (F ig. 6). W e see that KM and PM2 a re completely inadequate for pro cessing collections with mor e than 2000 songs (e.g. setup 3). The steep rise in the time sp en t by hierarc hical clustering algorithms to ﬁnd a cluster solution for setup 3 also raises some doubts as to the usefulness of these algorithms for h uge m usic collections [ O ( N 2 log N ), Jain et al. (19 99)]. F urthermore, hierarc hical clustering algorithms, as w ell as the K M algo r it hm, tak e the full pairwise dissimilarit y matrix as input. Therefore, with a mus ic collection of, say , 10 million songs, this distance matrix migh t b e diﬃcult to ha ndle. In con trast, algo rithms based on complex netw orks sho w a b etter p erfor- mance (with the aforemen tioned exception o f PM2). More sp eciﬁcally , MO, PM1, and PM3 use lo cal informa t io n (the nearest neigh b ors of the queries), 16 while PM3 furthermore acts on a small subset of the links. It should also b e noticed that the resulting netw ork is v ery sparse, i.e. the n um b er of links is muc h low er than N 2 (Bo ccaletti et al., 2006) and, therefore, calculations on suc h graphs can b e strongly optimized b o t h in memory requiremen ts and computational costs [as demonstrated, for instance, b y Blondel et al. (2008), who ha ve applied t heir metho d to netw orks of millions of no des and links]. 4. Impro ving the accuracy through comm unity detect ion In this section we inv estigate the use of the information o btained through the detection of comm unities to increase the o v erall accuracy of a query system. 4.1. Metho d Giv en t he dissimilarit y matrix W ′ and a solution for the cluster or com- m unit y detection problem, o ne can calculate a reﬁned dissimilarit y matrix ˆ W by setting ˆ w i,j = w ′ i,j max( W ′ ) + β i,j , (5) where β i,j = 0 if s i and s j are estimated to b e in the same communit y and β i,j = c otherwise. F or ensuring songs in the same comm unit y to hav e ˆ w i,j ≤ 1 and others to hav e ˆ w i,j > 1, w e use a constan t c > 1. This reﬁned matrix ˆ W can b e used again to rank query answ ers according to cov er song similarit y and conseque ntly , when compared to the initial W ′ of the original system, to ev aluate t he accuracy increase obtained. 4.2. Evaluation metho d olo gy A common measure to ev aluate query-b y-example systems is the mean of av erage precis ions (MAP) o v er all queries (Manning et al., 2008), which w e denote as  P  . T o calculate such a measure, one av erages across eac h of the answ ers A i to queries s i , A i b eing an ascendingly ordered list according to the row s of W ′ (or ˆ W , dep ending on whic h solution w e ev aluate). More concretely , the av erage precision P i for a query song s i is calculated from the retriev ed answ er A i as P i = 1 C − 1 N − 1 X r =1 P i ( r ) I i ( r ) , (6) 17 where P i is the precision of the sorted list A i at rank r , P i ( r ) = 1 r r X l =1 I i ( l ) , (7) and I i is a relev ance function suc h that I i ( z ) = 1 if the song with ra nk z in A i is a cov er of s i , I i ( z ) = 0 otherwise. W e then deﬁne the relativ e MAP increase as ∆ = 100   D P ( ˆ W ) E  P ( W ′ )  − 1   . (8) 4.3. R esults T o a ssess the algorithms’ accuracy w e indep enden tly optimized the pa- rameters f o r eac h algorithm as explained in the previous section. How ev er, we no w try to maximize  P  instead of F . W e notice that these new thresholds can b e diﬀeren t from the ones used in Sec. 3 , therefore implying that the b est p erforming metho ds of Sec. 3 will not necessarily yield the highest incremen ts ∆. In particular, clustering and comm unity detection algorithms giving b et- ter comm unity detection and more suitable false p ositives will ac hiev e the highest incremen ts. Th us, due to the deﬁnition of ˆ W [Eq. (5)], the role o f false p o sitiv es b ecomes imp orta nt. F urthermore, due to the use of diﬀer- en t ev aluation metrics, small c hanges in the optimal parameters migh t b e necessary . T o illustrate the ab ov e reasoning regarding false p ositiv es consider the follo wing example. Supp ose the ﬁrst items o f the rank ed answ er t o a giv en query ˚ s i are A 0 i = { s j , ˚ s k , s l , s m , . . . } , where ˚ s indicates eﬀectiv e (real) mem- b ership to the same cov er song g roup. Now supp ose that clustering algo rithm CA1 selects songs ˚ s i , s j , a nd ˚ s k as b elonging to the same cluster. In addition, supp ose that clustering algorithm CA2 selects ˚ s i , ˚ s k , s l , and s m . Both clus- tering alg orithms w ould ha v e the same recall ¯ R but CA1 will ha ve a higher precision ¯ P , and therefore a higher accuracy v alue F [Eqs. (2-4)]. Then, b y Eq. (5), the reﬁned answ er fo r CA1 b ecomes A 1 i = { s j , ˚ s k , s l , s m , . . . } , the same as A 0 i . On the other hand, the r eﬁned answ er for CA2 b ecomes A 2 i = { ˚ s k , s l , s m , s j , . . . } . This implies that, when ev aluating the relativ e ac- curacy incremen t ∆ [Eqs. (6- 8)], CA2 will take a higher MAP v alue D P ( ˆ W ) E than CA1, since ˚ s k is ranke d b efore s j in A 2 i . Therefore, with regard to rel- ativ e incremen ts ∆, a nd contrastingly to a ccuracy F , CA1 will not improv e the result, while CA2 will. 18 Algorithm Setup 2.1 2.2 2.3 2.4 3 KM 2.26 2.4 0 2.06 2.29 n.c. SL 2.26 2.40 1 .1 6 2.29 2.05 CL 1.93 1.1 9 1.43 1.10 1.28 UPGMA 5.87 5.22 3.9 6 3.49 4.37 WPGMA 4.91 3.5 8 3.83 2.67 3.60 MO 6.84 5.3 7 5.14 2.94 5.54 PM1 6.15 5.7 0 4.95 3.28 5.49 PM2 5.98 4.8 5 n.c. n.c. n.c . PM3 6.05 5.1 0 3.81 2.97 4.73 T able 3 : Relative MAP increa s e ∆ for the considered setups (se e T able 1 for the deta ils on the diﬀerent setups). Due to algo r ithms’ complexity , some results were no t computed (denoted as n.c. ). W e rep o rt the o ut-of-sample a ccuracy incremen ts ∆ for setups 2.1 to 3 in T able 3. Overall, these are b et w een 3% a nd 5% for UPGMA, WPGMA, MO, and all PMs, with some of them reaching 6%. W e see tha t , in g eneral, metho ds based o n complex net w orks p erform b etter, sp ecially MO and PM1. W e also see that the inclusion of “noise songs” ( N N = 400, setups 2.3 and 2.4) aﬀects the p erformance o f nearly all algorithms (with the exception of p o orly p erfo rming ones). A further out-of- sample test w as done within the MIREX audio co v er song identiﬁcation con test. The MIR ev a lua tion exch ang e (MIREX) is an in ternational communit y-based framew ork f or the forma l ev aluation o f MIR systems a nd a lgorithms (Downie, 20 0 8). Among other tasks, MIREX allow s for an ob jectiv e assessme nt of the accuracy of diﬀerent cov er song iden tiﬁca- tion algorit hms. F o r that purp ose, pa rticipan ts can submit their alg o rithms as binary executables (i.e. as a black b o x, without disclosing an y details), and the MIREX organizers determine and publish the algor ithms’ accura- cies and run times. The underlying mus ic collections ar e nev er published or disclosed to the participants, either b efor e or af ter the con test. Therefore, participan ts cannot tune their algorithms to the mus ic collections used in the ev aluation pro cess. In the editions of 2008 and 2009 we submitted the same t w o v ersions of our system and o btained the tw o highest accuracies ac hiev ed 19 to da te 6 (Serr` a et al., 2009 c). The ﬁrst v ersion o f the system (submitted to b oth editions) corresp onded to the Q max measure alo ne, while the second v ersion (also submitted to b oth editions) comprised Q max plus PM1 7 and the dissimilarit y up date of Eq. (5). The MAP  P  ac hiev ed with the former was 0.66 while with the latt er was 0.75. This corresp o nds to a relative incremen t ∆ = 13 . 64 , which is substan tially higher t ha n t he ones ac hiev ed here with our data, most probably b ecause the setup for the MIREX task is N C = 30, C = 11, and N N = 0. Such setup might capitalize the eﬀects that communit y detection can ha v e in impro ving the accuracy . In pa r t icular, the tec hniques presen ted here hav e greater p oten tial of increasing t he ﬁnal accuracies when high cardinalities are considered. 5. The role of the original song within a cov er song commun ity F rom a m usic p erception and cognition p oin t of view, a m usical w ork o r song can b e considere d as a category (Zbik ows ki , 2002). Categories are one of the basic devices to represen t kno wledge, either b y h umans or b y mac hines (Rogers and McClelland, 2 004). Acc ording to existing empirical evidence, some authors p ostulate t hat our brain builds categories around protot yp es, whic h encapsulate the statistically most-prev alen t catego ry features, and against whic h p ot en tial category mem b ers are compared (Rosc h and Mervis, 1975). Under this view, after the listening of sev eral co v er songs, a protot yp e for the underlying m usical piece would b e abstracted b y listeners. This pro- tot yp e migh t encapsulate features lik e the presence of certain motiv es, c hord progressions, o r con trasts a mong diﬀeren t m usical elemen ts. In this scenario, new items will b e then judged in relation to the prototype, forming gr a dien ts of category mem b ership (Rosch and Mervis, 1 975). In the con text o f co v er song comm unities, w e h yp othesize that these gra- dien ts of catego r y mem b ership, in a ma jorit y of cases, might p oint to the original song, i.e. the one whic h was ﬁr stly released. In particular w e conjec- ture t ha t, in one wa y or another, all co v er songs inherit some ch ara cteristics 6 The results for 20 08 and 2009 are av ailable fro m htt p://mu sic- ir.or g/mirex/2008 and http:// music- i r.org/mirex/2009 , resp ectively . W e did not participate in the 201 0 edition b ecause the MIREX ev aluation datas et was kept the same and we did not have any new algor ithm to s ubmit. 7 W e just submitted PM1 b ecause it was the only alg orithm we had av a ilable at that time. 20 from this “or ig inal pr o tot yp e”. This feature, com bined with the fact tha t new v ersions might as w ell b e inspired b y other cov ers, leads us to infer that the original song o ccupies a c entr al po sition within a co v er song comm unit y , b eing a referen tial or “b est example” of it (Serr` a et a l., 2009b). T o ev aluate this h yp othesis we man ually c hec k for origina l v ersions in setup 3 and discard the sets that do not hav e an o r ig inal, i.e. the ones where the oldest song was no t p erformed b y the origina l artist.Here w e mak e an o v ersimpliﬁcation and assume that the most well-kno wn (or p o pular) v ersion of a song is the o riginal one. This allows us to ob jectiv ely “mark” our co ve r songs with a lab el stating if they are a ctually the original v ersion, th us av oid- ing to mak e sub jectiv e j udgmen ts ab out a song’s p opularity with regard to its co ve rs. F ollowin g this criteria, w e ﬁnd 4 26 originals out of 523 cov er sets. Through this section, we emplo y the directed w eigh ted gra ph deﬁned b y t he asymmetric matrix W (Secs. 2.1 and 2.2). Initial supporting evidence that the or ig inal song is cen tral within its comm unit y is give n b y F igs. 7 and 8. In Fig. 7, w e depict the resulting net w ork after the a pplication of a strong threshold (only using w i,j ≤ 0 . 1). W e see that communities are w ell deﬁned and also that many o f the or iginal songs are usually “the center” of their communities . In Fig. 8, t w o cum ulativ e distributions ha v e b een calculated: one fo r t he w eights o f links exiting an original song (p erformed by t he orig inal artist, blac k solid line), and one for links exiting co v ers (p erformed by the or ig inal artist or anot her one after the original recording w as made, blue dashed line). The plot of these cum ulativ e distributions indicates that original songs tend to b e connecte d to other no des through links with smaller w eigh ts, that is, low er dissimilarities. T o ev aluate the afor ementioned h yp othesis in a more formal wa y , we pro- p ose a study of the abilit y t o a utomatically detect the origina l vers ion within a comm unit y of co ve rs. T o this exten t, w e consider an “ideal” commu nity de- tection algorithm (i.e. a n algor ithm detecting co v er song comm unities with no false p ositiv es and no false negativ es) and prop ose t w o diﬀeren t meth- o ds. These metho ds are based on the structure of w eigh ts of the obtained sub-net w ork after the ideal comm unity detection algorithm has b een applied. Closeness cen trality This algorithm estimates the centralit y (Bo ccaletti et al., 2006; Barrat et al., 2004) of a no de b y calculating the mean pat h length b et w een that no de, and an y other no de in the sub-net w ork. Note that the sub-net w ork is fully connected, as no threshold has b een applied in this phase. Therefore, the shortest path is usually the direct one. 21 Figure 7: Graphical r epresentation of the cov er so ng netw or k with a thr eshold o f 0 . 1. Original songs are dr awn in blue, while covers are in black. Mathematically , let W ( k ) b e t he sub-net work con taining the k -th co v er song communit y . Then t he index l of the original (or prototype) song s ( k ) l of the k -t h communit y corresp onds to l = arg min 1 ≤ i ≤ C ( k )        C ( k ) X j =1 j 6 = i w ( k ) i,j        , (9) where C ( k ) is the cardinalit y of the k -th co ve r song commun ity . Notice that a similar metho dology is emplo y ed in the clustering con text to infer the medoid of a cluster (Xu a nd W unsc h I I, 2009; Jain et al., 1 999). MST cen trality In this second algorithm w e reinforce the role of cen tral no des. First, w e calculate the minim um spanning tree (MST) for the 22 0 . 3 7 1 . 0 0 2 . 7 2 0 . 0 0 0 0 2 0 . 0 0 0 0 5 0 . 0 0 0 1 2 0 . 0 0 0 3 4 0 . 0 0 0 9 1 0 . 0 0 2 4 8 0 . 0 0 6 7 4 0 . 0 1 8 3 2 0 . 0 4 9 7 9 0 . 1 3 5 3 4 0 . 3 6 7 8 8 1 . 0 0 0 0 0 P r o b a b i l i t y ( P k ) L i n ks w e i g h t O r i g i n a l s C o ve r s Figure 8: Cum ulative w eights dis tributions for links in the netw ork, divided b etw een links outgoing fro m an original s ong (black solid line) a nd from a cover so ng (blue dashed line) songs. sub-net w ork under analysis. After that, w e apply t he previously de- scrib ed closeness cen tra lity [Eq. (9)] to the r esulting graph. The results in T able 4 sho w the p ercen ta ge of hits and misses for the detection of original songs in dep endence of the cardinalit y of the consid- ered cov er song communit y . W e rep ort r esults for C b etw een 2 and 7 (the cardinalities for whic h our mus ic collection has a represen tativ e n um b er of comm unities N C ). The p ercen t age o f hits and misses can b e compared to the n ull hy p o thesis of rando mly selecting one song in t he comm unit y . W e observ e that, in general, accuracies are around 50% and, in some cases, they reac h v alues of 60%. An accuracy of exactly 50% is obtained with C = 2 b y b oth the n ull hypothesis and t he MST cen trality alg orithm. This is b e- cause the MST is deﬁned undirected, and there is no w a y to discriminate the original song in a sub-net work of t wo no des. As so o n as C > 2, accuracies b e- come greater than the n ull h yp othesis and statistical signiﬁcance arises. Sta- 23 Algorithm C 2 3 4 5 6 7 Closeness cen trality 59.4 ∗∗ 53.6 ∗∗ 43.1 ∗ 60.5 ∗∗ 48.0 ∗∗ 27.2 MST cen tr a lit y 50.0 52.4 ∗∗ 60.7 ∗∗ 52.6 ∗∗ 48.0 ∗∗ 63.6 ∗∗ Null h yp othesis 50.0 33.3 25.0 20.0 16.7 14.3 N C 190 82 51 38 2 5 11 T able 4: Percen tage o f hits and mis ses for the original song detectio n task depending on the ca rdinality C of the cover song communities. The ∗ and ∗∗ symbols denote statistical signiﬁcance at p < 0 . 05 and p < 0 . 01, r esp ectively . The last line shows N C , i.e. the num b er of communit ies for each car dinality . tistical signiﬁcance is assess ed with the binomial test (Kv am and Vidako vic , 2007). With this exp eriment w e sho w that the original song tends to o ccup y a cen t ral p osition within its gro up and, therefore, that a measure of cen trality can b e used to discriminate it from a group of co ve rs. The same concepts of cen trality ma y b e v a lid for a lternativ e dissimilarit y measures represen ting m usical asp ects suc h a s tim br e, rhy thm, or structure (c.f. Do wnie, 2008; Casey et al., 2008). Th us, one could think of incorp orating information from these o ther asp ects of the audio conten t in order to impro v e the a ccuracy o f the task. A more complicated, if not imp ossible, t a sk w ould b e to detect the original song in a pairwise basis. T o this exten t, works on mo deling court decisions lik e the ones from M ¨ ullens iefen and P endzic h (2009) come closer. In general, for detecting original songs, info r ma t ion coming from the audio con ten t alone ma y b e insuﬃcien t. Essen tia l t emp oral asp ects (in a historical sense) are absen t in suc h information and, for incorp orating t hem, we should gather data from cultural and editorial sources. This go es without sa ying that, probably , high accuracies are unreac hable and, more imp ortantly , that the concept of originality is a v ery particular one, placed in a sp eciﬁc cultural con text and epo c h. Indee d, the digital rev olutio n of the last y ears is beginning to question suc h a concept (Fitzpatrick, 2 0 09). 6. Conclusions In this article w e built and analyzed a m usical netw ork reﬂecting cov er song communities , where no des corresp o nded to diﬀeren t audio recordings and links b etw een them represen ted a measure of resem blance b etw een their m usical conte nt. In additio n, w e analyzed the p ossibilit y of using suc h a 24 net w ork to apply diﬀeren t comm unity detection algorithms to detect coher- en t groups of co ve r songs. Three vers ions of suc h algor it hms w ere prop osed. These algo rithms achiev ed comparable a ccuracies when compared to exist- ing state-of- t he-art metho ds, with similar or ev en faster computation times. F urthermore, w e prov ide evidence that t he know ledge acquired thro ug h com- m unit y detection is v a luable in impro ving the raw results of a query-based co v er song identiﬁcation system. Finally , w e discussed a pa rticular outcome from considering cov er song comm unities, namely the a nalysis of the role of the original song within its cov ers. W e sho w ed t ha t the orig inal song tends to o ccup y a cen tral p osition within its group and, therefore, that a mea- sure o f centralit y can b e used to discriminate original from co v er songs when the sub-net w ork of these comm unities is considered. T o the b est of authors’ kno wledge, the presen t work is the ﬁrst attempt done in this dir ection. In the light of these results, complex netw orks stand as a pro mising re- searc h line within the sp eciﬁc t ask of co v er detection; but, at the same time, the pro p osed a pproac h can b e a pplied to any query-b y-example IR system (Baeza-Y ates and Rib eiro-Neto, 1 999; Manning et al., 2008), and esp ecially to other query-b y-example MIR systems (Downie, 2008; Casey et al., 200 8). Ac knowled gmen ts The authors thank Justin Salamon fo r his review and pro ofreading. This w ork has b een supp orted by the follow ing pro jects: Classical Planet (TSI- 070100- 2009-40 7; MITYC) and DR IMS (TIN2009- 14247-C02 -01; MICINN). References References Baeza-Y ates, R ., R ib eiro-Neto, B., 1999. Mo dern infor ma t ion retriev al. ACM Press, New Y ork, USA. Barrat, A., Barth ´ elem y , M., P astor- Satorras, R., V espignani, A., 2004 . The arc hitecture o f complex w eighted net works . Pro c. of the Natio na l Academ y of Sciences 101, 374 7 . Blondel, V. D., Guillaume, J. L., Lam biotte, R., Lefeb vre, E., 2 008. F a st un- folding of comm unities in la rge netw orks. Journal of Statistical Mec hanics 10, 10008. 25 Bo ccaletti, S., Latora, V., Moreno, Y., Chav ez, M., Hw ang, D .-U., 2006. Complex netw orks: structure and dynamics. Ph ysics Rep orts 4 24 (4), 175– 308. Buld ´ u, J. M., Cano, P ., Kopp en b erger, M., Almendral, J., Bo ccaletti, S., 2007. The complex netw ork of music al ta stes. New Journal of Ph ysics 9, 172. Cano, P ., Celma, O., Kopp en b erger, M., Buld ´ u, J. M., 2006. T op ology of m usic recommendation net works. Chaos: an In terdisciplinary Journal of Nonlinear Science 16 ( 1 ), 013107. Casey , M., V eltk amp, R. C., Goto , M., Leman, M., Rho des, C., Slaney , M., 2008. Con tent-based m usic information retriev al: curren t directions and future c hallenges. Pro ceedings of the IEEE 96 (4), 668–6 96. Costa, L. d. F ., Oliv eira, O. N., T ra vieso, G., Ro drigues, F. A., Villas Boas, P . R ., Antique ira, L., Viana, M. P ., Correa da Ro c ha, L. E., 2008. Ana lyz- ing and mo deling real-w orld phenomena with complex netw orks: a surv ey of applicationsW orking man uscript, arXiv:071 1.3199v2. Av ailable online: http://arxi v.org/abs/0711.3199 . Danon, L., D ´ ıaz-Aguilera, A., Duch , J., Arenas, A., 2 005. Comparing com- m unit y structure iden tiﬁcation. Journal of Statistical Mec hanics 9, 09008. Do wnie, J. S., 20 08. The m usic info rmation retriev al ev aluatio n exc hang e (2005–20 0 7): a windo w into music infor ma t ion retriev a l researc h. Acousti- cal Science and T ec hnology 29 (4), 247–255. Ec kmann, J. P ., Kamphorst, S. O ., Ruelle, D ., 1987 . Recurrence plots of dynamical systems. Europhys ics Letters 5 , 9 73–977. Fitzpatric k, K., 2009. The digital future of autho r ship: rethinking originality . Culture Mac hine, v ol. 12, 6. F ortunato, S., Castellano, C., 2009. Comm unity structure in graphs. In: Mey- ers, R. A. (Ed.), Encyclop edia of complexit y and system science. Springer, Berlin, Germany , pp. 114 1–1163. G´ omez, E., 2006. T onal description of mus ic audio signals. Ph.D. the- sis, Univ ersitat P omp eu F a bra, Barcelona, Spain, Av a ilable online: http://mtg. upf.edu/node/472 . 26 Jain, A. K., Murty , M. N., Flynn, P . J., 1999. Data clustering: a review. A CM Computing Surv eys 31 (3), 264–32 3. Kan tz, H., Sc hreib er, T., 2004. Nonlinear time series analysis, 2nd Edition. Cam bridge Univers ity Press, Cambridge, UK. Kv am, P . H., Vidak ov ic, B., 2007 . Nonparametric statistics with applications to science and engineering. John Wiley and Sons, Hob ok en, USA. Lagrange, M., Serr` a, J., 2010. Unsup ervised accuracy impro v emen t for co ve r song detection using sp ectral connectivit y net w ork. Pro c. of the Int. So c. for Music Information Retriev al, pp. 59 5 –600. Latora, V., Marc hiori, M., 2001. Eﬃcien t b ehav ior of small-world net w orks. Ph ysical R eview Letters 87, 198 701. Manning, C. D., Pra bhak ar, R ., Sc hutze , H., 2 008. An intro duction to infor - mation retriev al. Cam bridge Univers ity Press, Cam bridge, UK. Marw an, N., Romano, M. C., Thiel, M., Kurths, J., 200 7. R ecurrence plots for the analysis of complex systems . Ph ysics R ep orts 438 (5), 237 –329. M ¨ ullens iefen, D., Pe ndzic h, M., 2009. Court decisions on m usic plagiarism and the predictiv e v alue of similarit y algorithms. Musicae Scien tiae, D is- cussion F orum 4B, 20 7 –238. Resnic k, P ., V arian, H. L., 1997. Recommender systems . Communic atio ns of the A CM 4 0 (3 ), 56–58. Rogers, T. T., McClelland, J. L., 2004. Seman tic cognition: a parallel dis- tributed pro cessing approac h. MIT Press, Cam bridge, USA. Rosc h, E., Mervis, C., 1975. F amily resem blances: studies in the internal structure of categories. Cognitiv e Psyc hology 7, 5 7 3–605. Saho o, N., Callan, J., Krishnan, R., Duncan, G., Padman, R., 200 6. Incre- men tal hierarc hical clustering of text do cumen ts. In: Pro c. of the A CM In t. Conf. on Information and Knowledge Managemen t . pp. 3 57–366. Serr` a, J., G ´ omez, E., Herrera, P ., 201 0. Audio co v er song iden tiﬁcation and similarit y: back ground, approaches , ev aluation, and b ey ond. In: Ras, 27 Z. W., Wieczork owsk a, A. A. (Eds.), Adv ances in Music Infor ma t io n Re- triev al. V ol. 16 of Studies in Computat io nal In telligence. Springer, Berlin, German y , Ch. 14, pp. 307–3 32. Serr` a, J., Serra, X., Andrzejak, R. G., 2009a. Cross recurrence quan tiﬁcation for co v er song iden tiﬁcation. New Jour na l o f Ph ysics 11, 093017. Serr` a, J., Zanin, M., Andrzejak, R. G., 2009c. Cov er song retriev al by cross recurrence quantiﬁcation and unsup ervised set detection. Music Infor ma- tion Retriev al Ev aluation eXc hange (MIREX) extended abstract. Serr` a, J., Z anin, M., Laurier, C., Sordo, M., 2009b. Unsup ervised detection of co v er song sets: accuracy improv emen t and origina l identiﬁcation. Pro c. of the In t. So c. for Music Informatio n Retriev al, pp. 225–230. T eitelbaum, T., Ba lenzuela, P ., Cano, P ., Buld ´ u, J. M., 2008. Comm unity structures and role detection in music net w orks. Chaos: an In terdisci- plinary Journal of Nonlinear Science 18 (4), 04 3105. Xu, R., W unsc h I I, D. C., 20 0 9. Clustering. IEEE Press, Piscata w ay , USA. Zbik ow ski, L. M., 2 0 02. Conceptualizing mus ic: cognitiv e structure, theory , and analysis. Oxford Univ ersit y Press, Oxford, UK. Zhao, Y., Karypis, G ., 2 0 02. Ev aluation of hierarc hical clustering algo rithms for do cumen t datasets. In: Pro c. o f the Conf. on Kno wledge Disco v ery in Data (KDD ). pp. 5 1 5–524. 28

Characterization and exploitation of community structure in cover song networks

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment