Self-Organizing Maps with Variable Input Length for Motif Discovery and Word Segmentation
Authors: Raphael C. Brito, Hansenclever F. Bassani
Center of Informatics - CIn, Universidade Federal de Pernambuco, Recife, PE, Brazil, 50.740-560
Email: {rcb7,hfb}@cin.ufpe.br

Abstract — Time Series Motif Discovery (TSMD) is defined as searching for patterns that are previously unknown and appear with a given frequency in time series. Another problem strongly related to TSMD is Word Segmentation, which has received much attention from the community that studies early language acquisition in babies and toddlers. The development of biologically plausible models for word segmentation could greatly advance this field. Therefore, in this article, we propose the Variable Input Length Map (VILMAP) for Motif Discovery and Word Segmentation. The model is based on Self-Organizing Maps and can identify Motifs of different lengths in time series. In our experiments, we show that VILMAP presents good results in finding Motifs in a standard Motif discovery dataset and can avoid catastrophic forgetting when trained with datasets of increasing input size. We also show that VILMAP achieves results similar or superior to other methods in the literature developed for the task of word segmentation.

Index Terms — self-organizing maps, variable input length, subspace clustering, motif discovery.

I. INTRODUCTION

Motifs are described in the literature as recurrent patterns, frequent tendencies, successions, forms, episodes, or frequent subsequences that occur in time series [1]. Motif Discovery methods search for previously unknown frequent patterns in a time series [2]. The task can be seen as a time series clustering problem, assuming that each cluster must group together patterns in the time series that represent the same Motif.

The problem of Word Segmentation in transcriptions of fluent speech can be seen as a Motif discovery problem in which the words are the Motifs of a time series composed of phonemes. This problem has received much attention from the community that studies early language acquisition in babies and toddlers, and it is one of the focuses of the present article. The development of biologically plausible models for word segmentation could greatly advance this field.

The Self-Organizing Map (SOM) [3] is a biologically inspired neural network, frequently applied for visualizing high-dimensional data by compressing the information while preserving the topological relations and capturing the most important characteristics of the input data. However, it finds applications in many other types of problems, such as surface reconstruction [4] and data clustering [5]. It is also worth mentioning that SOM has been applied to a variety of problems involving sensory processing, including visual [6] and auditory information [7].

The original SOM [8] defines a map composed of a set of nodes, or prototypes, that compete and cooperate to represent a certain region of the input space. The nodes are usually organized in a fixed bi-dimensional grid, and the model employs the Euclidean distance for comparing the input patterns with each node in the map, which may not be adequate for clustering high-dimensional datasets, due to the curse of dimensionality [9], or for datasets in which not all dimensions are equally relevant for the different clusters, such as in subspace clustering [10], [11] and Motif discovery.
However, new models based on SOM were developed to improve its performance in such scenarios [5], [12]. The Local Adaptive Receptive Field Dimension Selective Self-Organizing Map (LARFDSSOM), proposed in [5], is an example that presented good results for the task of subspace clustering. The model achieves this by applying different weights to each input dimension for each cluster. These characteristics enable a new range of clustering-related applications beyond subspace clustering, in which Motif Discovery is included. However, in Motif discovery there is a demand for methods that can work with samples of different sizes [13], in order to allow the discovery of Motifs with different lengths. LARFDSSOM was not developed for this case, and, to the best of our knowledge, there is no SOM-based clustering method that supports inputs of different lengths.

Therefore, the Variable Input Length Map (VILMAP) was developed to extend LARFDSSOM so that it can cluster patterns of different sizes. VILMAP takes advantage of the self-adjustable input weights of LARFDSSOM to allow input samples that vary in the number of input dimensions. Hence, when VILMAP is trained with patterns of different sizes, it generates a map with prototype nodes that have different sizes.

In our experiments, we show that VILMAP is able to find Motifs in a standard Motif discovery dataset (the GunPoint dataset [14]). Additionally, we verified that VILMAP avoids catastrophic forgetting when trained with datasets of increasing input sizes. Finally, we also show that VILMAP achieves results similar or superior to other methods in the literature developed for the task of word segmentation.

The rest of this paper is organized as follows: Section II presents the related work; Section III describes the proposed model; Section IV presents the experimental setup and the obtained results; and finally, in Section V, we present our final considerations.

II. RELATED WORK

Section II-A presents a description of Motif Discovery together with a method from the area; Section II-B explains how the Self-Organizing Map relates to the Motif Discovery problem; and finally, Section II-C presents four methods applied to word segmentation.

A. Variable Motif Discovery

A time series (TS) S can be defined as a list S = (s_1, s_2, ..., s_n) of real-valued variables, where n is the total size of the series, that is, the number of points in the series. Motifs are the frequent patterns in a time series, which are previously unknown, while the search for these patterns is called Motif Discovery [2]. Defining R as a threshold that establishes the minimum allowed similarity or the maximum allowed distance between two occurrences of a Motif, there are two major definitions for the problem of Time Series Motif Discovery (TSMD), according to Mueen (2014) [15]:

Similarity-based Definition: Given a time series and its length, the time series Motifs are the segments repeated in the order of their similarities between repeated occurrences within a range R.

Support-based Definition: Given a time series and its length, time series Motifs are the segments that have the largest number of repetitions within a range R.
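To make the similarity-based definition concrete, the following is a minimal brute-force sketch (our own illustration, not part of any method discussed in this paper): it enumerates fixed-length subsequences and reports non-overlapping pairs whose Euclidean distance falls within the range R. The function name and toy data are assumptions.

import numpy as np

def motif_pairs(series, length, R):
    """Brute-force reading of the similarity-based definition: return
    non-overlapping index pairs of fixed-length subsequences whose
    Euclidean distance lies within the range R, most similar first."""
    n = len(series) - length + 1
    windows = np.array([series[i:i + length] for i in range(n)])
    pairs = []
    for i in range(n):
        for j in range(i + length, n):  # enforce non-overlap
            d = np.linalg.norm(windows[i] - windows[j])
            if d <= R:
                pairs.append((i, j, d))
    return sorted(pairs, key=lambda p: p[2])

# toy usage: implant one Motif twice in a random-walk series
rng = np.random.default_rng(0)
s = np.cumsum(rng.normal(0.0, 0.5, 200))
motif = np.sin(np.linspace(0, 2 * np.pi, 20))
s[30:50] = motif + rng.normal(0.0, 0.05, 20)
s[140:160] = motif + rng.normal(0.0, 0.05, 20)
print(motif_pairs(s, 20, R=1.0)[:3])  # the implanted pair should lead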
The Variable-Length Motif Discovery (VLMD) [13] is a method that has been proposed to automatically find a suitable set of variable-length Motifs. VLMD iteratively separates Motifs of different sizes into groups based on similarity. Within a group, a representative Motif with a normalized minimum distance between pairs of subsequences is selected. Finally, VLMD returns a set of useful representative Motifs, which is extremely small compared to all possibilities of sliding window lengths.

The operation of VLMD consists of two steps. First, it finds a set of groups of Motifs by searching over all possible lengths of a sliding window to obtain Motifs of different lengths. If the current Motif and the previous Motif overlap, the current Motif is added to the same set as the previous Motif; otherwise, a new Motif group is created. Then, for each group of Motifs, a representative Motif is selected with a minimum normalized distance to the others in the group. Therefore, VLMD can return a small set of Motifs from a given time series. This method does not present an application in the Word Segmentation area, and we do not compare with it because [13] did not provide a detailed enough explanation to reproduce its experimental results.

B. Self-Organizing Maps for Motif Discovery

The Self-Organizing Map (SOM) (Fig. 1), proposed by [16], is a neural network that maps high-dimensional data into a smaller, usually bi-dimensional, grid of N nodes (or neurons), compressing information while preserving the topological relationships of the original data. The nodes in the grid, whose positions in the input space are c_j, j = 1..N, participate in a winner-takes-all competition to represent each input stimulus x. The node nearest to an input stimulus, c_s (the winner node), is slightly moved to approximate x. The neighbors of the winner node in the grid are adjusted as well (cooperation), on a smaller scale, to establish a topological relationship in the grid that reflects what is observed in the input space. After training, similar stimuli are clustered in the same node of the grid or in topologically near nodes.

Fig. 1. The basic structure of a SOM, where x is the input pattern, x_i is the value of the i-th input layer node, and c_ji represents the weight between the j-th node in the output layer (organization layer) and the i-th node in the input layer. In this configuration, each node in the output layer is directly connected with four neighbors on the rectangular grid.
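The competition and cooperation steps just described can be summarized in a short sketch (an illustrative implementation of the classic SOM update, not the authors' code; function names and hyperparameters are our assumptions):

import numpy as np

def som_step(weights, grid, x, lr=0.1, sigma=1.0):
    """One classic SOM step: winner-takes-all competition followed by a
    Gaussian-neighborhood (cooperative) update of all prototypes toward
    the input x. `weights` is (N, d); `grid` holds 2-D node coordinates."""
    s = np.argmin(np.linalg.norm(weights - x, axis=1))  # competition
    grid_d2 = np.sum((grid - grid[s]) ** 2, axis=1)     # grid distances
    h = np.exp(-grid_d2 / (2.0 * sigma ** 2))           # neighborhood
    weights += lr * h[:, None] * (x - weights)          # cooperation
    return s

# toy usage: organize a 3x3 map on random 2-D stimuli
rng = np.random.default_rng(1)
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
weights = rng.random((9, 2))
for x in rng.random((500, 2)):
    som_step(weights, grid, x)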
LARFDSSOM is a model based on SOM that has a time-variant structure, in which nodes are inserted into or removed from the map during the self-organization process whenever this is required to better represent the patterns in the input space. Also, in LARFDSSOM, nodes can apply different relevances to each input dimension and adjust their receptive fields as a function of the local variance observed in the input data. The operation of the map takes place in three phases: the organization phase, the convergence phase, and the clustering phase [5].

In the organization phase, nodes are dynamically inserted, removed, and adjusted in the map in order to cover, as well as possible, the regions of the input space in which the input patterns are found. When an input pattern is presented to the network, a level of activation is computed for each node in the map, and the node with the highest activation is considered the winner of the competition. This activation level is an inverse function of the distance between the center of the node and the input pattern. An activation threshold, a_t, is used for determining whether the winner node is close enough to be adjusted or whether it is necessary to insert a new node into the map. Nodes that do not cluster a significant percentage of the input patterns are periodically removed.

In the convergence phase, the nodes are also updated and removed when required, as in the organization phase; however, there is no insertion of new nodes. This phase finishes when the number of nodes in the map stops decreasing. After the convergence phase, the information stored on the map can be used for clustering new input patterns (clustering phase). All nodes with an activation equal to or greater than the threshold a_t for a specific input pattern are considered as clustering it. Finally, as in SOM, the input layer of LARFDSSOM has a fixed length; thus, it does not allow increasing or decreasing the sample size during the self-organization process.

C. Word Segmentation

Unlike the spacing provided in written text, spoken words are rarely delimited by pauses, so children must somehow learn to identify the boundaries between words as they hear sentences. Because the structure of words is significantly variable across languages, it is difficult to know how a child between 9 and 15 months of age achieves this ability. Therefore, segmentation is a key step in language acquisition, specifically in lexical development [17]. It is worth pointing out that finding the exact boundaries of all words in a sentence is not strictly necessary for children to understand what is said. They actually need to recognize the words present in the sentence (simple or compound) in the correct order, which can be done despite the occurrence of certain boundary detection errors, as observed in young children [18].

In the present work, we reproduce the experiments described in [19] and compare the results we obtained with the following four Word Segmentation algorithms:

1) DiBS: Diphone-Based Segmentation (DiBS) [20] is based on phonetic properties; it keeps in memory the frequency of two phonemes that occur together and decides whether to place a boundary between them by applying Bayes' theorem. To do this, the model makes certain assumptions: the learning algorithm knows the phonetic categories, it is able to detect utterance boundaries, it assumes phonological independence across word boundaries, it tracks the context-free distribution of the diphones, and it knows the relative frequency of word forms already learned. Therefore, it uses local statistical cues in order to determine the word boundaries.

2) TPs: Another local statistical model is based on tracking Transitional Probabilities (TPs) [21] over syllables; it posits a boundary between two syllables if their co-occurrence probability is locally lowest (relative threshold) or is lower than an absolute threshold, usually computed by taking the averaged value over syllable pairs. TPs demand a larger memory than DiBS, as the number of possible syllables encountered is much greater than the number of possible phonemes. A sketch of this cue follows.
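As a concrete reading of the TP cue, here is a minimal sketch (ours, not the exact model of [21]; the function name and toy syllables are illustrative):

from collections import Counter

def tp_boundaries(syllables, abs_threshold=None):
    """Sketch of the TP cue: P(B|A) = count(A,B) / count(A) over adjacent
    syllables; posit a boundary where the TP is a local minimum
    (relative criterion) or below an absolute threshold, if given."""
    unigrams = Counter(syllables)
    bigrams = Counter(zip(syllables, syllables[1:]))
    tps = [bigrams[(a, b)] / unigrams[a]
           for a, b in zip(syllables, syllables[1:])]
    boundaries = set()
    for i, tp in enumerate(tps):
        left_higher = i == 0 or tps[i - 1] > tp
        right_higher = i == len(tps) - 1 or tps[i + 1] > tp
        if (left_higher and right_higher) or \
           (abs_threshold is not None and tp < abs_threshold):
            boundaries.add(i + 1)  # boundary placed after syllable i
    return boundaries

# toy usage: "pretty baby pretty baby" as syllables -> boundary at index 4
sylls = ["pre", "ty", "ba", "by", "pre", "ty", "ba", "by"]
print(tp_boundaries(sylls))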
3) PUDDLE: The algorithm Phonotactics from Utterances Determine Distributional Lexical Elements (PUDDLE) [22] incrementally builds a lexicon using information about utterance limits and deduced phonotactic constraints. More precisely, each time a sequence of phonemes is found, if a correspondence with a word in the proto-lexicon is found and certain phonological constraints are respected, the chunk of phonemes is added to the proto-lexicon. The initial and final phoneme pairs are added, respectively, to lists of initial and final phoneme pairs previously encountered.

4) AGu: The Unigram Adaptor Grammar (AGu) [23] models an ideal learner, that is, a learner who has infinite memory and processes in batch, observing the whole corpus before segmenting. The structure consists of two modules: a lexical generator and an adapter. The first generates a lexicon of items that are likely to be found in the corpus, and the second assigns item frequencies. Importantly, the unigram AG assumes that lexical items are generated independently of each other and that the stochastic process is chosen so that the frequencies of the items follow a power-law distribution, as found in natural language.

In the next section, the proposed method (VILMAP) is presented, aiming at addressing the problem of word segmentation.

III. VARIABLE INPUT LENGTH MAP

VILMAP¹ is a model based on LARFDSSOM [5] that is capable of receiving stimuli with an unknown number of dimensions. The proposed model inherits some important features from LARFDSSOM: 1) the ability to adapt its structure as new patterns are presented over time; 2) the capacity of the nodes to apply different relevances to each input dimension; 3) the adaptation of the receptive fields of the nodes during the self-organizing phase of the model.

VILMAP comprises two phases: the self-organization phase (Alg. 1) and the clustering phase (Alg. 2). The convergence phase that exists in LARFDSSOM was not inherited by VILMAP because we aim to work in an online way. As in the standard SOM, the organization phase of VILMAP is composed of three steps: 1) competition, 2) adaptation, and 3) cooperation. As in LARFDSSOM, when a new input pattern is presented, a competition occurs among the nodes, and the winner is the most active node according to a radial basis function (see line 5 of Alg. 1). However, a shift operation is employed to compare the input pattern with the information stored in the prototype: the shift operation evaluates the input pattern at every possible shifted position. After this process, a winner node is obtained; to be chosen, it needs to have the highest value in one of its shifts. Then, it is verified whether its activation is below the threshold parameter a_t; when this happens, a new node is created with its center at the same position as the input pattern. On the other hand, if the node activation is greater than a_t, the adaptation and cooperation steps take place: the winner node is updated to approximate the input pattern (adaptation), as are its neighbors (cooperation). This procedure is detailed in Section III-B.

After one execution of the organizing phase, the clustering phase can be performed. In VILMAP, the clustering procedure associates the input pattern with only one cluster, represented by the node on the map that returned the highest activation. The clustering procedure is detailed in Section III-C.

¹ Available at: https://github.com/RaphaBrito/VILMAP
Algorithm 1: Self-Organization Phase
1 Initialize the parameters a_t, e_b, e_n, β, ds, N_max, D_min, D_max, and minwd;
2 Initialize the map with one node with c_j initialized at the first input pattern, δ_j ← 0 and ω_s ← 1;
3 while there are input patterns do
4   Present the input pattern x to the map;
5   Compute the activation between x and all nodes according to Eq. 1;
6   Find the winner s with the highest activation (a_s) according to Eq. 3;
7   if size(s) < size(x) then
8     updateNodeDimension(s, x), as described in Section III;
9   if a_s < a_t and N < N_max then
10    Create a new node j and set: c_j ← x and δ_j ← 0;
11    Connect j to the other nodes;
12    N ← N + 1;
13  else if a_s ≥ a_t then
14    Update the distance vectors δ_s of the winner node and of its neighbors;
15    Update the relevance vectors ω_s of the winner node and of its neighbors;
16    Update the weight vectors c_s of the winner node and of its neighbors;

Algorithm 2: Clustering Phase
1 while there are input patterns do
2   Present the input pattern x to the map;
3   Compute the activation between x and all nodes according to Eq. 1;
4   Find the winner s with the highest activation (a_s), calculated using Eq. 3;
5   if a_s ≥ a_t then
6     Assign x to the cluster with the index of the winner node s;

A. Competition and Insertion of Nodes

Each node j in VILMAP represents a cluster and is associated with three m-dimensional vectors, where m is the current number of input dimensions: 1) the center vector c_j = {c_ji, i = 1, ..., m}, which represents the prototype of cluster j in the input space; 2) the relevance vector ω_j = {ω_ji, i = 1, ..., m}, which stores the weights (varying within [0, 1]) that node j applies to each input dimension; and 3) the distance vector δ_j = {δ_ji, i = 1, ..., m}, which stores a moving average of the observed distances |x_i − c_ji(n)| between the input patterns and the node center, used only to compute the relevance vector, as in [5].

A radial basis function of a weighted distance (Eq. 1) is used to calculate the activation of a node in VILMAP. The receptive fields of the nodes are adjusted as a function of the weighted distance between the prototype and the input pattern and of the summation of the relevance vector, $\sum_{i=1}^{m} \omega_{ji}$. Lower distances and higher relevances result in a higher activation (Eq. 1). The activation equation is

$$ac(D_\omega(x, c_j), \omega_j) = \frac{\sum_{i=1}^{m} \omega_{ji}}{\sum_{i=1}^{m} \omega_{ji} + D_\omega(x, c_j) + \varepsilon} \qquad (1)$$

where $\varepsilon$ is a very small value to avoid division by zero and $D_\omega(x, c_j)$ is the weighted distance proposed in [12]:

$$D_\omega(x, c_j) = \sqrt{\sum_{i=1}^{m} \omega_{ji} (x_i - c_{ji})^2} \qquad (2)$$

When a node is created, its center c_j is initialized at the position of the last input pattern presented to the map. The relevance vector ω_j is initialized as an array of ones, and the distance vector is initialized as an array of zeros. These vectors are updated in the adaptation and cooperation steps, presented in Section III-B. As in LARFDSSOM, the winner of a competition, s(x), is the node that presents the highest activation value for the input pattern, as defined in Eq. 3:

$$s(x) = \arg\max_j \left[ ac(D_\omega(x, c_j), \omega_j) \right] \qquad (3)$$
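To make Eqs. 1-3 concrete, the following is a minimal Python sketch of the activation, the weighted distance, and the winner selection for the equal-length case (the variable-length comparisons are sketched after Fig. 3 below). This is our own illustration, not the reference implementation; function names and the epsilon value are assumptions.

import numpy as np

EPS = 1e-12  # the small value epsilon in Eq. 1 (an assumed magnitude)

def weighted_distance(x, c, w):
    """Eq. 2: relevance-weighted Euclidean distance between an input
    pattern x and a node center c, with relevances w in [0, 1]."""
    return np.sqrt(np.sum(w * (x - c) ** 2))

def activation(x, c, w):
    """Eq. 1: radial basis activation; small weighted distances and
    large summed relevances push the activation toward 1."""
    s = np.sum(w)
    return s / (s + weighted_distance(x, c, w) + EPS)

def winner(x, centers, relevances):
    """Eq. 3: the winner s(x) is the node with the highest activation."""
    acts = [activation(x, c, w) for c, w in zip(centers, relevances)]
    return int(np.argmax(acts)), max(acts)

# toy usage with two 4-dimensional nodes and uniform relevances
centers = [np.array([0., 0., 1., 1.]), np.array([1., 1., 0., 0.])]
relevances = [np.ones(4), np.ones(4)]
print(winner(np.array([0., 0., 1., 0.9]), centers, relevances))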
VILMAP has an activation threshold a_t. If the winner node's activation is below a_t, a new node is created at the same position as the input pattern, and the winner node is not modified. Otherwise, when the winner node's activation is above a_t, the winner node and its neighbors are updated, as described in the next section and shown in Alg. 1. Therefore, the activation threshold a_t affects the final number of nodes in the map.

Since VILMAP can receive inputs of different sizes, it is necessary to define a new procedure for computing the distance (Eq. 2) between the input patterns and the nodes in the map. Three cases arise, sketched in code after this list:

1) regular comparison: if the node and the input pattern have the same length, the distance is computed straightforwardly, as in LARFDSSOM (Eq. 2), with the node and the input pattern completely aligned.

2) sliding window comparison: if the winner node is longer than the input pattern, the information stored in the node is compared with the input pattern at each possible alignment, as in a sliding window approach, shifting one position at a time. The activation is computed for each position according to Eq. 3, and the highest activation value is returned at the end of the process. This is illustrated in Fig. 2, in which two displacements are possible when the length of the pattern is 4 and that of the node is 5; thus, two activations are computed. In this example, the node achieves a greater activation with displacement B.

Fig. 2. Example of how the activation is calculated in a binary dataset for an input pattern smaller than the node.

3) truncated comparison: if the node is shorter than the input pattern, the activation is calculated only on the initial part of the input pattern, and the final part of the stimulus is disregarded, as illustrated in Fig. 3. In this case, if a node that has undergone this procedure becomes the winner and has an activation above a_t, its dimension is updated: all vectors of the winner node grow to the same size as the input pattern. Then, the new elements of the vector c_j are initialized with the corresponding values of the input pattern x, the new values of the vector δ_j are initialized with 0.0, and the new elements of the vector ω_j are initialized with an intermediary value of 0.5.

Fig. 3. Example illustrating the node vectors' dimension update: a length-4 winner node grows to match a length-5 input pattern, and the new relevance entry starts at 0.50.
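The three comparison cases and the dimension-growth step can be sketched as follows (again an illustration under the assumptions of the previous sketch; best_activation and grow_node are our own hypothetical names):

import numpy as np

EPS = 1e-12

def activation(x, c, w):
    """Eq. 1 over two aligned, equal-length slices (see previous sketch)."""
    s = np.sum(w)
    return s / (s + np.sqrt(np.sum(w * (x - c) ** 2)) + EPS)

def best_activation(x, node_c, node_w):
    """Variable-length activation: regular comparison when lengths match,
    sliding-window comparison when the node is longer than the pattern
    (the best shift wins), truncated comparison when the node is shorter
    (the tail of the pattern is ignored)."""
    n, m = len(node_c), len(x)
    if n == m:                                   # 1) regular comparison
        return activation(x, node_c, node_w)
    if n > m:                                    # 2) sliding window
        return max(activation(x, node_c[k:k + m], node_w[k:k + m])
                   for k in range(n - m + 1))
    return activation(x[:n], node_c, node_w)     # 3) truncated

def grow_node(node_c, node_w, node_d, x):
    """Dimension update after a truncated win: all node vectors grow to
    the input length; new center entries copy the corresponding input
    values, new distances start at 0.0 and new relevances at 0.5."""
    extra = len(x) - len(node_c)
    node_c = np.concatenate([node_c, x[-extra:]])
    node_d = np.concatenate([node_d, np.zeros(extra)])
    node_w = np.concatenate([node_w, np.full(extra, 0.5)])
    return node_c, node_w, node_d

# toy usage mirroring Fig. 3: a length-3 node meets a length-4 pattern
c, w, d = np.array([0., 1., 1.]), np.ones(3), np.zeros(3)
x = np.array([0., 1., 1., 0.])
print(best_activation(x, c, w))  # truncated comparison
c, w, d = grow_node(c, w, d, x)  # node grows to length 4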
Then, the new elements of the vector c j are initialized with the same values of the input pattern; the new positions of vector δ j are initialized with 0.0; and the new elements of the vector ω j are initialized with an intermediary value (0.5 assuming data normalized in 0-1 interval). Fig. 3 also illustrates such dimension update procedure. Finally , the aligned vectors are updated as in LARFDSSOM. C. Clustering with VILMAP After the organization phase, the information stored in each node of the VILMAP can be used to cluster the test input patterns. Algorithm 2 presents the clustering procedure. Each node in the map defines one cluster and all test patterns for which a certain node is the winner are clustered together . In the clustering phase, the activ ation of the nodes is calculated in the same way as described in Section III-B. D. Setting P ar ameters for the Model Parameter adjustment is performed as in LARFDSSOM. T able I shows the ranges used for each parameter . The most important parameter for VILMAP is a t because it directly influences the number of nodes that will be created. The smaller the a t , the fewer nodes will be inserted in the map, since with the high threshold the nodes will recognize fewer input patterns, causing more nodes created on the map. T ABLE I P A R A M ET E R R A N GE S F O R V I L M AP Parameters max min Activ ation threshold ( a t ) 0.70 0.999 Relev ance rate β 0.001 0.5 W inner learning rate ( e b ) 0.0001 0.01 Neighbors learning rate ( e n ) 0.002 1.0 × e b Connection threshold ( minw d ) 0.001 0.5 Relev ance smoothness ( ds ) 0.01 0.1 I V . E X P E R I M E N T S Three experiments were performed in order to ev aluate the proposed approach: the first experiment (Section IV -A) aims to verify the performance of VILMAP in a standard TSMD problem, the GunPoint dataset [14]; the second experiment (Section IV -B) ev aluates the ability of the model in learning sequences of phonemes of increasing size without degrading its performance; Finally , the third experiment (Section IV -C) compares the performance of the model with the other methods in the literature dev eloped for W ord Segmentation. (a) Procedure A (b) Procedure B Fig. 4. Training and test procedures designed to verify that VILMAP is able to avoid catastrophic forgetting ev en after being trained with dif ferent dimensions. A. Experiment 1 GunPoint is a dataset that in volv es one female actress and one male actor moving they are to form a gun point gesture. Some characteristics of the Gun-Point dataset are: train size = 50; test size = 150; size of each input pattern: 150 points; and number of classes = 2. The TSMD problem is to identify the two motifs of male and female actors. The main purpose of this experiment is to show that the VILMAP can solve problems of fixed size in the literature, even being created to receive input patterns with different dimensions. In this experiment the parameters were adjusted by trial and error as follo ws: a t = 0 . 702 , e b = 0 . 060 , e n = 0 . 247 , β = 0 . 092 , ds = 0 . 070 , N max = 10000 and minw d = 0 . 223 . W ith these parameters, we were able to adjust the map to form only two clusters, one for each Motif. Fig. 5 displays a graphical comparison between the Motifs of the Gun-Point dataset found by VILMAP and the Motifs expected in each class with their averages and their standard deviations. In Fig. 5(a) the av erage of samples from the class is presented with the associated standard deviation (STD), while Fig. 
Fig. 5 displays a graphical comparison between the Motifs of the GunPoint dataset found by VILMAP and the Motifs expected in each class, with their averages and standard deviations. Fig. 5(a) presents the average of the samples from the first class with the associated standard deviation (STD), while Fig. 5(b) presents the average and STD for the second class. This result shows that, with only one pass through the data, the model was able to find both Motifs correctly, but with a certain displacement in the first one. For the second Motif, the prototype lies within the range of the standard deviation and very close to the average.

Fig. 5. Graphical comparison of the Motifs found (blue) with the mean and standard deviation (green) of each class of the GunPoint time series: (a) class 1; (b) class 2.

B. Experiment 2

The second dataset is composed of a set of 130 text sentences from the transcripts of the TIMIT Acoustic-Phonetic Continuous Speech Corpus [24], which we extracted from the Natural Language Toolkit (NLTK) [25]. The input file was translated into a sequence of phonemes using the tool provided by CMU [26], and afterward each phoneme was translated into a 12-dimensional feature vector. The network was trained with the sequences of phoneme features obtained with this procedure (True dataset). To compute the F-measure, precision, and recall, we created a False dataset consisting of phonemes present in the dataset arranged in random sequences that do not occur in the True dataset. The False dataset has the same quantity of patterns as the original input file.

In this experiment, two test cycles were generated to answer the following question: is VILMAP capable of avoiding catastrophic forgetting when trained with input patterns with an increasing number of dimensions? In the procedure illustrated in Fig. 4(a), one training cycle with input patterns of a certain size is followed by a test with the same size, starting with the representation of two phonemes (24 dimensions) and increasing by 12 dimensions up to 72 dimensions (6 phonemes). Fig. 4(b) illustrates the test procedure performed after the last training cycle with each of the five input sizes.

In order to achieve good results in this experiment, 100 sets of parameter values were sampled from the ranges in Table I according to Latin Hypercube Sampling (LHS) [27], and we recorded the best results achieved among the generated parameter sets. LHS is employed to ensure complete coverage of the range of each parameter: the interval of each parameter is divided into 100 intervals of equal probability, and a single value is randomly selected from each interval.
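A minimal sketch of the LHS procedure just described, over the Table I ranges (our own illustration; reading e_n as a fraction of e_b is an assumption, and the sketch omits the map training that would consume each parameter set):

import numpy as np

def latin_hypercube(ranges, n_samples=100, seed=0):
    """Latin Hypercube Sampling: split each parameter's range into
    n_samples equal-probability intervals, draw one value per interval,
    then shuffle the intervals independently per parameter."""
    rng = np.random.default_rng(seed)
    names = list(ranges)
    cols = []
    for name in names:
        lo, hi = ranges[name]
        u = (np.arange(n_samples) + rng.random(n_samples)) / n_samples
        rng.shuffle(u)  # decouple this parameter from the others
        cols.append(lo + u * (hi - lo))
    return [dict(zip(names, row)) for row in np.stack(cols, axis=1)]

# ranges from Table I; e_n expressed as a fraction of e_b (assumption)
ranges = {"a_t": (0.70, 0.999), "beta": (0.001, 0.5),
          "e_b": (0.0001, 0.01), "e_n_frac": (0.002, 1.0),
          "minwd": (0.001, 0.5), "ds": (0.01, 0.1)}
params = latin_hypercube(ranges)
for p in params:
    p["e_n"] = p.pop("e_n_frac") * p["e_b"]
print(params[0])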
From the results displayed in Fig. 6, we can see that in Procedure A the model achieved an F-measure of about 70%, a precision of 55%, and a recall of 100%. By analyzing Fig. 6(b), we can see that the model did not degrade its performance on lower-dimensional datasets after being trained with datasets with a greater number of dimensions. It is worth noting that the performance on the dataset with the lowest dimensions actually increased after training with the other datasets. This improvement occurs because nodes of size 24 continue to be updated and to recognize more specific stimuli, due to the decrease of their relevances and the respective adjustment of the receptive fields.

Fig. 6. Precision, F-measure, and recall as a function of the number of dimensions (24 to 72) for (a) Procedure A and (b) Procedure B.

C. Experiment 3

The third dataset was the Brent-Siskind corpus [28]. This corpus is the largest of the CHILDES repository [29] and contains the orthographic transcription of more than 100 hours of recordings of 16 English-speaking mothers with children who were between 9 and 15 months old at the time of recording. In this experiment, we apply VILMAP to segmenting words in transcriptions of fluent speech.

Considering that the proposed model does not identify word boundaries precisely, but is capable of recognizing a word when it is presented at its inputs, and to make it possible to compare the proposed method with the methods presented in [19], we separated non-word from word input patterns. The non-words are caused by the displacements performed on the input data when they are presented to the network, since eventually only part of a word is available at the inputs. We present the original words from the database to the network to count the true positives and the false negatives. Finally, as in the previous experiment, we can compute precision, recall, and F-measure. The parameter sampling procedure described in the previous experiment was also employed here.

The results obtained with the proposed method, in comparison with the results presented in [19], are shown in Table II. From this table, we can see that VILMAP obtained a relatively good F-measure, losing only to the AGu algorithm; a very good precision, surpassing all other algorithms; and an intermediary recall. We consider this a promising result, since AGu models a learner with infinite memory and a batch process, while our method represents an online learner that passes through the data only once.

TABLE II
RESULTS OF THE WORD SEGMENTATION EXPERIMENT

Algorithm   F-measure   Precision   Recall
VILMAP      0.750       0.856       0.667
PUDDLE      0.706       0.682       0.733
DiBS        0.236       0.234       0.240
AGu         0.782       0.787       0.777
TPs         0.468       0.432       0.512

V. CONCLUSION AND FUTURE WORK

This work presented a variable input length self-organizing map, VILMAP, for the problems of Motif discovery and word segmentation. The preliminary evaluation presented in Experiment 1 shows that VILMAP can be applied to simple TSMD tasks; however, a more detailed evaluation on different datasets and conditions is still required. The results of Experiment 2 showed that the proposed model seems able to avoid catastrophic forgetting when the number of dimensions of the input patterns increases over time. This is an interesting and peculiar characteristic of the proposed model, not found in other SOM-based models in the literature. Moreover, the results of Experiment 3 show that VILMAP is a promising candidate for the task of Word Segmentation.

As future work, we intend to explore the capacity of VILMAP to deal with inputs of variable size in order to build a recurrent growing self-organizing map for learning expressions with an increasing level of complexity. In this regard, the fact that VILMAP deals well with an increasing number of dimensions (up to 72) without degrading its performance on previously learned data is a motivating achievement on this path.

ACKNOWLEDGMENT

The authors would like to thank the Brazilian Coordination for the Improvement of Higher Level Personnel (CAPES) for supporting this work.
REFERENCES

[1] S. Torkamani and V. Lohweg, "Survey on time series motif discovery," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 7, no. 2, 2017.
[2] J. Lin, E. Keogh, S. Lonardi, and P. Patel, "Finding motifs in time series," pp. 53-68, Oct. 2002.
[3] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, no. 1-3, pp. 1-6, Nov. 1998. [Online]. Available: http://linkinghub.elsevier.com/retrieve/pii/S0925231298000307
[4] R. L. M. E. do Rego, A. F. R. Araujo, and F. B. de Lima Neto, "Growing self-organizing maps for surface reconstruction from unstructured point clouds," in 2007 International Joint Conference on Neural Networks, Aug. 2007, pp. 1900-1905.
[5] H. F. Bassani and A. F. R. Araujo, "Dimension selective self-organizing maps with time-varying structure for subspace and projected clustering," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 3, pp. 458-471, Mar. 2015. [Online]. Available: http://ieeexplore.ieee.org/document/6803941/
[6] R. Miikkulainen, J. A. Bednar, Y. Choe, and J. Sirosh, Computational Maps in the Visual Cortex. Springer, Jan. 2005, vol. 1.
[7] J. Kangas, "Phoneme recognition using time-dependent versions of self-organizing maps," in [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing, Apr. 1991, pp. 101-104, vol. 1.
[8] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, vol. 43, no. 1, pp. 59-69, 1982.
[9] E. Keogh and A. Mueen, "Curse of dimensionality," in Encyclopedia of Machine Learning. Springer, 2011, pp. 257-258.
[10] L. Parsons, E. Haque, and H. Liu, "Subspace clustering for high dimensional data: A review," SIGKDD Explor. Newsl., vol. 6, no. 1, pp. 90-105, Jun. 2004. [Online]. Available: http://doi.acm.org/10.1145/1007730.1007731
[11] H.-P. Kriegel, P. Kröger, and A. Zimek, "Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering," ACM Trans. Knowl. Discov. Data, vol. 3, no. 1, pp. 1:1-1:58, Mar. 2009. [Online]. Available: http://doi.acm.org/10.1145/1497577.1497578
[12] H. F. Bassani and A. F. Araújo, "Dimension selective self-organizing maps for clustering high dimensional data," in Neural Networks (IJCNN), The 2012 International Joint Conference on. IEEE, 2012, pp. 1-8.
[13] P. Nunthanid, V. Niennattrakul, and C. A. Ratanamahatana, "Discovery of variable length time series motif," in Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2011 8th International Conference on. IEEE, 2011, pp. 472-475.
[14] Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, and G. Batista, "The UCR time series classification archive," Jul. 2015.
[15] A. Mueen, "Time series motif discovery: dimensions and applications," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 4, no. 2, pp. 152-159, 2014. [Online]. Available: http://dx.doi.org/10.1002/widm.1119
[16] T. Kohonen, "The self-organizing map," Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480, 1990.
[17] J. R. Saffran, E. L. Newport, and R. N. Aslin, "Word segmentation: The role of distributional cues," Journal of Memory and Language, vol. 35, no. 4, pp. 606-621, 1996.
[18] J. Correa and J. E. Dockrell, "Unconventional word segmentation in Brazilian children's early text production," Reading and Writing, vol. 20, no. 8, pp. 815-831, Nov. 2007.
[19] E. Larsen, A. Cristia, and E. Dupoux, "Relating unsupervised word segmentation to reported vocabulary acquisition," Jun. 2017. [Online]. Available: osf.io/wa6tq
[20] R. Daland and J. B. Pierrehumbert, "Learning diphone-based segmentation," Cognitive Science, vol. 35, no. 1, pp. 119-155, 2011.
[21] J. R. Saffran, R. N. Aslin, and E. L. Newport, "Statistical learning by 8-month-old infants," Science, vol. 274, no. 5294, pp. 1926-1928, 1996.
[22] P. Monaghan and M. H. Christiansen, "Words in puddles of sound: Modelling psycholinguistic effects in speech segmentation," Journal of Child Language, vol. 37, no. 3, pp. 545-564, 2010.
[23] M. Johnson, T. L. Griffiths, and S. Goldwater, "Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models," in Advances in Neural Information Processing Systems, 2007, pp. 641-648.
[24] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, "TIMIT acoustic-phonetic continuous speech corpus," Linguistic Data Consortium, Philadelphia, 1993.
[25] S. Bird and E. Loper, "NLTK: The natural language toolkit," in Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions. Association for Computational Linguistics, 2004, p. 31. [Online]. Available: http://www.nltk.org
[26] CMU, "The Carnegie Mellon University Pronouncing Dictionary: a machine-readable pronunciation dictionary for North American English," online, 2011.
[27] J. C. Helton, F. Davis, and J. D. Johnson, "A comparison of uncertainty and sensitivity analysis results obtained with random and Latin hypercube sampling," Reliability Engineering & System Safety, vol. 89, no. 3, pp. 305-330, 2005.
[28] M. R. Brent and J. M. Siskind, "The role of exposure to isolated words in early vocabulary development," Cognition, vol. 81, 2001, pp. 31-44. [Online]. Available: https://childes.talkbank.org/access/Eng-NA/Brent.html
[29] B. MacWhinney, The CHILDES Project: Tools for Analyzing Talk. Psychology Press, 2000, vol. 2.