Time-Varying Interaction Estimation Using Ensemble Methods

Directed information (DI) is a useful tool to explore time-directed interactions in multivariate data. However, as originally formulated DI is not well suited to interactions that change over time. In previous work, adaptive directed information was …

Authors: Brandon Oselio, Amir Sadeghian

TIME-VARYING INTERACTION ESTIMATION USING ENSEMBLE METHODS

Brandon Oselio, Alfred Hero∗
University of Michigan, EECS Department, 1301 Beal Ave, Ann Arbor, MI 48109

Amir Sadeghian, Silvio Savarese
Stanford University, Computer Science Department, 353 Serra Mall, Stanford, CA 94305

ABSTRACT

Directed information (DI) is a useful tool to explore time-directed interactions in multivariate data. However, as originally formulated, DI is not well suited to interactions that change over time. In previous work, adaptive directed information (ADI) was introduced to accommodate non-stationarity while still preserving the utility of DI to discover complex dependencies between entities. There are many design decisions and parameters that are crucial to the effectiveness of ADI. Here, we apply ideas from ensemble learning in order to alleviate this issue, allowing for a more robust estimator for exploratory data analysis. We apply these techniques to interaction estimation in a crowded scene, utilizing the Stanford drone dataset as an example.

Index Terms: directed information, adaptive directed information, temporal modeling, data exploration, interaction mining

1. INTRODUCTION

The study of interactions among entities of interest encompasses a broad array of applications and is crucial to understanding complex processes. Often, we are interested in the directionality over time of these relationships. Examples include social influence estimation [18], [19], [21], entity interaction in video [5], and biological recording analysis, such as EEG [6], [20]. These interactions can also be used to summarize highly complex data topology, allow analysts to obtain a qualitative snapshot of the temporal interactions of the data, and make better informed decisions based on these simplified representations.

One tool that allows for the extraction of interactions is called directed information (DI).
Originally created to analyze an information-theoretic channel with feedback, DI has been used in many contexts to estimate directed relationships between entities, including genetic data and social data. One deficiency of directed information is its inflexibility with respect to time-varying distributions [17], [18]. Adaptive directed information (ADI) was developed as an extension of directed information to better track changes in relationships over time.

In this paper, we address some of the issues associated with using ADI. Specifically, ADI requires a choice of filter and corresponding filter parameters, and the quality of the resulting interaction estimate is not generally robust to these choices. In addition, simple filters may have difficulty adapting both to abrupt changes in interaction and to slowly time-varying systems. What is desired is an estimate that both smooths over time and adapts quickly to abrupt changes in interactivity.

In this paper, a form of ensemble learning is used to improve interaction estimation with ADI. Specifically, following [9], [23], we generate a filter that is a convex combination of simpler filters with different parameter specifications and whose weights are dependent on the data. In order to address the possibility of abrupt changes in the system, a growing ensemble of estimators is used to account for these changes in interactivity.

The proposed ADI estimator is applied to interaction estimation in a crowded scene, utilizing video from the Stanford drone dataset [22]. Utilizing a dynamic covariance model, the ADI is estimated and used to uncover interesting phenomena in specific scenes across the Stanford campus.

The paper is organized as follows: Sec. 2 discusses related work. Sec. 3 introduces the mathematical concepts of DI and ADI, and introduces our ensemble estimator. Sec. 4 introduces the dynamic covariance model used to estimate ADI. Sec. 5 discusses the results on the Stanford Drone Dataset. Finally, Sec. 6 concludes the paper.

∗ We acknowledge the support of USAF grant FA8650-15-D-1845 and US Army Research Office grant W911NF-15-1-0479.

2. RELATED WORK

Directed information has been studied in the context of theory and applications. Estimators for DI have been proposed for the case of a finite or countably infinite feature space [11], [13], [20]. Most, if not all, estimators use the stationary Markov assumption, including plug-in estimators [17], [18]. Directed information has been used in many contexts, including EEG analysis [6], neural spike trains [20], and social influence analysis [17], [18].

Changepoint detection methods [2] are one approach to tracking time-varying data, and parametric as well as non-parametric methods exist. However, with few exceptions, e.g., [3], these methods are mostly univariate and often require a parametric model or use simple moment-based statistics that do not capture dependency.

Other methods of influence estimation have been studied, particularly in the context of i.i.d. observations; examples include glasso [8] and hub discovery-type methods [10]. In addition, semi-parametric extensions of these models have been created for non-Gaussian data [12]. The family of directed information measures, and in particular ADI, is concerned with directionality in time and with more complicated time-varying signals. In this paper, we assume a parametric multivariate Gaussian model, which is appropriate for the particular dataset.

The ensemble method used stems from prediction with multiple experts, a popular problem in machine learning [4], [9], [23]. Here, we use these techniques for smoothing.

3. ADI AND ENSEMBLE ESTIMATION

3.1. Definition of DI and ADI

We begin with some notation. We assume that we have entities $1, 2, \ldots, N$, each with features $X^i_{1:T} = \{X^i_1, X^i_2, \ldots, X^i_T\}$.
In this paper, $X^i_t \in \mathbb{R}^d$. Directed information between $X^i$ and $X^j$ is defined as follows:

$$\mathrm{DI}(X^i_{1:T} \to X^j_{1:T}) = \sum_{t=1}^{T} I(X^i_{1:t}; X^j_t \mid X^j_{1:t-1}), \tag{1}$$

where $I(X; Y \mid Z)$ is the Shannon conditional mutual information. Many interesting conservation properties have been derived for directed information, including a close connection to the standard Shannon mutual information; these will not be repeated here, but the reader is referred to [1], [15], [16].

When considering the asymptotic behavior of DI for stationary processes, one defines the directed information rate:

$$\mathrm{DI}(X^i \to X^j) = \lim_{T \to \infty} \frac{1}{T}\, \mathrm{DI}(X^i_{1:T} \to X^j_{1:T}).$$

If we assume that the entities form a $k$-Markov process, then $I(X^i_{1:t}; X^j_t \mid X^j_{1:t-1}) = I(X^i_{1:t}; X^j_t \mid X^j_{t-k:t-1})$.

When stationarity cannot be assumed, the traditional definition of DI is inapplicable. However, the instantaneous DI summand of (1) retains valuable information about the temporal interactivity of the entities $i$ and $j$. In [18], we proposed to adaptively estimate this quantity using adaptive directed information (ADI), which is defined as follows:

$$\mathrm{ADI}(X^i_{1:T} \to X^j_{1:T}) = \sum_{t=1}^{T} g(t, T)\, I(X^i_{1:t}; X^j_t \mid X^j_{1:t-1}),$$

where $g(t, T)$ is a user-defined taper function. In past work [18], the focus has been on the exponential filter $g(t, T) = \alpha (1 - \alpha)^{T - t}$, so that ADI obeys the recursive update:

$$\mathrm{ADI}^{i \to j}_{1:t} = \alpha\, I(X^i_{1:t}; X^j_t \mid X^j_{1:t-1}) + (1 - \alpha)\, \mathrm{ADI}^{i \to j}_{1:t-1},$$

where $\mathrm{ADI}^{i \to j}_{1:t} = \mathrm{ADI}(X^i_{1:t} \to X^j_{1:t})$. However, the parameter $\alpha$ of the exponential filter must be tuned according to the specific application. The goal of this paper is to improve the robustness of ADI when the underlying state is unknown and rapidly changing.
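As an illustration of the exponential filter, the following minimal Python sketch runs the recursive update on a sequence of conditional mutual information estimates and compares it with the direct weighted sum. It assumes the taper $\alpha (1 - \alpha)^{T - t}$ implied by the recursion and an ADI value of zero before the first sample; the function names `adi_recursive` and `adi_direct` are hypothetical and do not come from the original work.

```python
def adi_recursive(cmi_estimates, alpha):
    """Apply ADI_t = alpha * I_t + (1 - alpha) * ADI_{t-1} to a sequence."""
    adi = []
    prev = 0.0  # assumption: ADI starts at zero before the first sample
    for i_t in cmi_estimates:
        prev = alpha * i_t + (1.0 - alpha) * prev
        adi.append(prev)
    return adi


def adi_direct(cmi_estimates, alpha):
    """Same final value via the taper g(t, T) = alpha * (1 - alpha)**(T - t)."""
    T = len(cmi_estimates)
    return sum(alpha * (1.0 - alpha) ** (T - t) * i_t
               for t, i_t in enumerate(cmi_estimates, start=1))


# Hypothetical input: the summand jumps from 0 to 1 at the third sample.
values = [0.0, 0.0, 1.0, 1.0, 1.0]
trace = adi_recursive(values, alpha=0.5)
final = adi_direct(values, alpha=0.5)
```

A larger $\alpha$ follows the jump more quickly but smooths less, which is the tuning trade-off that motivates combining several filters.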
To improve robustness, an ensemble filter is defined:

$$g^*(t, T) = \frac{\sum_{i=1}^{n_t} w_{i,t}\, g_i(t, T; t_0)}{\sum_{i=1}^{n_t} w_{i,t}}, \tag{2}$$

where $g_i(t, T; t_0)$ are “base filters” with different parameter specifications. Implicitly, the weights $w_{i,t}$ are allowed to depend on past data. Further, the number of base filters included in the ensemble ($n_t$) is allowed to grow with $t$, and the filter functions are causal, i.e., $g(t, T; t_0) = 0$ for $t < t_0$.

3.2. Expanding Fixed Shares of Estimation

We apply an ensemble method, originally introduced in [23], that is based on the simple fixed shares algorithm [9]. A set of base filter functions $\mathcal{G} = \{g_1, \ldots, g_k\}$ is defined, along with a parameter $\tau$ which defines the rate at which new filters are introduced into the ensemble. At each time $t$, an estimate $\hat{I}(X^i_{1:t}; X^j_t \mid X^j_{1:t-1})$ is obtained and used both to update the weights $w_i$ and to update the ADI estimate. The weights $w_i$ are updated in a similar manner to [23]:

$$v_{i,t} = w_{i,t-1}\, e^{-\gamma (y_{i,t} - i_t)^2}, \qquad w_{i,t} = (1 - \beta)\, v_{i,t} + \frac{\beta}{n_t} \sum_{j=1}^{n_t} v_{j,t},$$

where $\beta \in [0, 1]$ and $\gamma > 0$ are user-defined hyperparameters. Theorem 3.1 provides a bound for the MSE, assuming that $I(X^i_{1:t}; X^j_t \mid X^j_{1:t-1})$ is piecewise constant and that the estimate has i.i.d. noise with bounded variance. For convenience, we use the abbreviation $i_t = I(X^i_{1:t}; X^j_t \mid X^j_{1:t-1})$, and similarly $\hat{i}_t = \hat{I}(X^i_{1:t}; X^j_t \mid X^j_{1:t-1})$.

Theorem 3.1. Let $\hat{i}_t = i_t + \epsilon_t$, where $\epsilon_t$ is independent with mean 0 and variance $\sigma_t^2$, and $i_t$ is piecewise constant with $m$ transitions. Then the MSE of the ADI ensemble estimator is bounded by:

$$E\left[\sum_{t=1}^{T} (\mathrm{ADI}(t) - i_t)^2\right] \le \frac{m}{\gamma} \ln n_t - \frac{1}{\gamma} \ln\left(\beta^m (1 - \beta)^{T - m}\right) + \frac{\gamma}{8} T + m \sigma_*^2 \ln(Te), \tag{3}$$

where $\sigma_*^2 = \max_t \sigma_t^2$.

The proof of Theorem 3.1 is given in Appendix A.
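The weight update can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: $y_{i,t}$ is taken to be the output of base filter $i$ at time $t$, the target passed to the update is whatever estimate of $i_t$ is available, and the step that adds new filters at the rate set by $\tau$ is omitted. The function names are hypothetical.

```python
import math


def fixed_share_update(weights, filter_outputs, target, gamma, beta):
    """One weight update: exponential reweighting followed by a shared mix."""
    # v_i = w_i * exp(-gamma * (y_i - target)^2): penalise filters far from the target.
    v = [w * math.exp(-gamma * (y - target) ** 2)
         for w, y in zip(weights, filter_outputs)]
    # Every filter receives an equal share beta / n of the total reweighted mass.
    share = beta * sum(v) / len(v)
    return [(1.0 - beta) * v_i + share for v_i in v]


def ensemble_estimate(weights, filter_outputs):
    """Weighted average of the base-filter outputs, normalised by the weight sum."""
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, filter_outputs)) / total


# Hypothetical example: two base filters, the first closer to the target.
weights = fixed_share_update([1.0, 1.0], [0.4, 0.9], target=0.5, gamma=1.0, beta=0.01)
estimate = ensemble_estimate(weights, [0.4, 0.9])
```

The sharing term keeps every weight strictly positive when $\beta > 0$, so a filter that performed poorly in the past can regain influence after an abrupt change.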
4. SPATIAL INTERACTION ESTIMATION IN A SCENE

We illustrate ADI by applying it to discover salient time-varying interactions among actors in a scene. Here, the components $n = 1, \ldots, N$ are actors moving around in space. For each sampled frame $t$ and actor $i$, define the position vector $X^i_t = [x^i_t, y^i_t]$ on the plane.

4.1. Dynamic Covariance Model

We propose a dynamic Gaussian model, following the model in [7]. Assume that the combined feature matrix is distributed as:

$$X \sim \mathcal{N}(m_t, \Sigma_t), \tag{4}$$

where $m_t$ is a mean vector and $\Sigma_t$ is a covariance matrix. We assume that $m_t$ and $\Sigma_t$ are slowly varying, and further use a kernel estimate of these quantities:

$$\hat{m}_t = \frac{1}{\sum_{i=1}^{T} K_h(i - t)} \sum_{i=1}^{T} K_h(i - t)\, X_i, \tag{5}$$

$$\hat{\Sigma}_t = \frac{1}{\sum_{i=1}^{T} K_h(i - t)} \sum_{i=1}^{T} K_h(i - t)\, (X_i - \hat{m}_i)(X_i - \hat{m}_i)^T, \tag{6}$$

where $K_h(t)$ is a kernel function. The conditional mutual information is a function of the covariance matrices under a Markovian Gaussian random process:

$$\hat{I}\left(X^i_{1:t}; X^j_t \mid X^j_{t-1}, X^{[N] \setminus \{i,j\}}_{t-1}\right) = \frac{1}{2} \log \frac{\left| \hat{\Sigma}_{X^j_t \mid X^j_{t-1},\, X^{[N] \setminus \{i,j\}}_{t-1}} \right|}{\left| \hat{\Sigma}_{X^j_t \mid X^j_{t-1},\, X^i_{t-1},\, X^{[N] \setminus \{i,j\}}_{t-1}} \right|}.$$

5. APPLICATION TO STANFORD DRONE DATASET

In this section, the proposed ensemble ADI estimator is applied to the Stanford Drone Dataset [22], which is a collection of 60 annotated videos across 8 scenes shot on the Stanford campus. These annotations allow for tracking the movement of pedestrians, cars, bicyclists and other moving actors in the scene. The estimated locations of actors are smoothed by a moving mean estimator in order to reduce artifacts introduced by the discretization of the annotations. The smoothed locations for each actor in the scene are then used to calculate the ADI.

For the analysis, an RBF kernel was used in (5) with parameter $h = 5$, and the ADI ensemble parameters were set to $\tau = 10$, $\beta = 0.01$, $\gamma = 1$, and $\mathcal{G} = \{\exp(0.1), \exp(0.2), \mathrm{unif}\}$. After calculating ADI, only interactions where the actors were within a certain distance of each other (in this case, 100 pixels) were considered.

Fig. 1: Stanford Drone Dataset example (ADI plotted against frame for actors 5 and 25, with traces for ADI 25 to 5, ADI 5 to 25, and AMI). Here, we capture an interaction of two people meeting in the scene. The video frames and corresponding ADI show them coming towards each other and interacting briefly, with actor 25 even walking in the other direction to continue the conversation, and then resuming their original paths. The title of the plot gives the pair of labeled actors in “video0” of the bookstore scene, and the line labeled i to j represents $\mathrm{ADI}^{i \to j}$.

5.1. Interaction Example between Pedestrians

Fig. 1 shows one example of ADI and the corresponding interaction between two pedestrians. The pedestrians labeled 5 and 25 stop to chat briefly, with 25 actually reversing course for a short time, from frame 1280 to 1300, to continue the conversation. The estimated ADI is able to identify this interaction, and to identify that there is more influence from 5 to 25 than vice versa over this small window. This is compared with an adaptive version of mutual information:

$$\mathrm{AMI}(X^i_{1:T}, X^j_{1:T}) = \sum_{t=1}^{T} g^*(t, T)\, \hat{I}(X^i_t; X^j_t \mid X^i_{1:t-1}, X^j_{1:t-1}),$$

where the ensemble method outlined for ADI is applied to the estimated summand $\hat{I}(X^i_t; X^j_t \mid X^i_{1:t-1}, X^j_{1:t-1})$.

5.2. Visualization of Interactions based on ADI

We can use ADI as a tool to cluster and visualize many interactions in the dataset. First, the ADI for all interactions between actors in the bookstore scene from the Stanford Drone dataset across 5 different videos is collected, totaling $m = 539$ interactions.
Using symmetrized ADI, $\mathrm{ADI}^{i,j} = \mathrm{ADI}^{i \to j} + \mathrm{ADI}^{j \to i}$, the maximal cross correlation between each pair of interactions is found, and this correlation is used as an affinity measure $a_{k,l}$, with the corresponding affinity matrix $A = [a_{k,l}]_{k,l=1,\ldots,m}$. Note that $a_{k,l} = a_{l,k}$, and so $A$ is symmetric. $A$ can then be used to apply a number of visualization and clustering techniques. Here, we use the t-SNE dimension reduction and visualization method [14], transforming $A$ to a distance matrix $D = [d_{i,j}]_{i,j=1,\ldots,m}$, where $d_{i,j} = \sqrt{2(1 - a_{i,j})}$, and applying the method to this matrix. Fig. 2 shows the results. The colors correspond to different types of interactions, such as between pedestrians, or between a pedestrian and a bike, etc.

Fig. 2: t-SNE plot of interactions based on ADI. The highlighted cluster of pedestrian interactions is characterized by low levels of interaction over a long period of time combined with spikes of activity.

The visualization shows small clusterings of interactions. An example is circled in black, with representative traces shown in Fig. 3. More generally, we see that the pedestrian-biker interactions mostly cluster in the bottom-left portion of the plot, while the biker-biker and pedestrian-pedestrian interactions are less cohesive as a group, implying heterogeneity among these types of interactions. The small highlighted cluster of pedestrian interactions, for example, is characterized by long periods of low ADI combined with abrupt spikes. These are observed to correspond to pedestrians walking slowly in the same direction or standing still, along with occasional changes in velocity or direction.
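The affinity and distance construction described in this section can be sketched as follows. The sketch assumes that the maximal cross correlation is the largest normalized cross-correlation over all lags between two mean-centered traces, which is one plausible reading and not necessarily the exact definition used for the reported results; the function names are hypothetical.

```python
import math


def max_cross_correlation(a, b):
    """Largest normalised cross-correlation between two traces over all lags."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    a0 = [x - mean_a for x in a]
    b0 = [x - mean_b for x in b]
    norm = math.sqrt(sum(x * x for x in a0) * sum(x * x for x in b0))
    if norm == 0.0:
        return 0.0  # assumption: a constant trace is treated as uncorrelated
    best = -1.0
    for lag in range(-(len(b0) - 1), len(a0)):
        # Sum over the indices where a0[i] and b0[i - lag] both exist.
        lo = max(0, lag)
        hi = min(len(a0), len(b0) + lag)
        total = sum(a0[i] * b0[i - lag] for i in range(lo, hi))
        best = max(best, total / norm)
    return best


def affinity_to_distance(affinity):
    """Element-wise d = sqrt(2 * (1 - a)), clipped at zero against rounding error."""
    return [[math.sqrt(max(0.0, 2.0 * (1.0 - a))) for a in row] for row in affinity]


# Hypothetical symmetrised ADI traces for three interactions.
traces = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
A = [[max_cross_correlation(p, q) for q in traces] for p in traces]
D = affinity_to_distance(A)
```

The resulting matrix `D` could then be passed to any t-SNE implementation that accepts precomputed pairwise distances.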
Fig. 3: Representative ADI traces from the highlighted cluster (four panels of ADI versus frame, titled “0: 2, 73”, “0: 5, 128”, “1: 308, 309”, and “3: 180, 181”). The majority of these interactions are pedestrians that are moving slowly together or standing still in close proximity, with abrupt direction and velocity changes. The titles on the plots represent the origin video and pair of labeled actors in the dataset, and the line labeled i to j represents $\mathrm{ADI}^{i \to j}$.

5.3. Relationship between ADI and Velocity

In this section we study the relationship between the velocity profile and ADI profile of particular types of interactions. For each interaction and each actor $i$, the instantaneous velocity vector $\mathbf{v}^i_t = [v^i_{t,x}, v^i_{t,y}]$ is calculated, along with the corresponding instantaneous magnitude $v^i_t = \|\mathbf{v}^i_t\|$. Further, the instantaneous velocity angle between two actors $i$ and $j$ is calculated:

$$\theta^{i,j}_t = \arccos\left( \frac{\mathbf{v}^i_t \cdot \mathbf{v}^j_t}{v^i_t\, v^j_t} \right).$$

Using the relative velocity angle, we can look for two specific types of interactions and how their ADI profiles differ: those with high angle, where the two actors are approaching from opposite directions, and those with low angle, where the two actors are moving in the same direction. Fig. 4 shows four representative interactions, two with low velocity angles and two with high velocity angles.
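The velocity angle computation can be illustrated with a short Python sketch. It assumes velocities are obtained by finite differences of consecutive positions with a unit frame spacing, and it returns `None` when either actor is stationary because the angle is then undefined; both choices, like the function names, are assumptions made for illustration.

```python
import math


def velocities(positions, dt=1.0):
    """Finite-difference velocity vectors from a list of (x, y) positions."""
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]


def velocity_angle(v_i, v_j):
    """Angle in radians between two velocity vectors, or None if undefined."""
    speed_i = math.hypot(v_i[0], v_i[1])
    speed_j = math.hypot(v_j[0], v_j[1])
    if speed_i == 0.0 or speed_j == 0.0:
        return None  # assumption: no angle is reported for a stationary actor
    cos_theta = (v_i[0] * v_j[0] + v_i[1] * v_j[1]) / (speed_i * speed_j)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


# Hypothetical tracks: two actors walking towards each other along the x-axis.
track_i = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
track_j = [(10.0, 0.0), (9.0, 0.0), (8.0, 0.0)]
angles = [velocity_angle(a, b)
          for a, b in zip(velocities(track_i), velocities(track_j))]
```

In this example every angle is close to $\pi$, the high-angle case in which the actors approach from opposite directions.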
Fig. 4: Representative profiles of low and high velocity angle interactions (four panels plotting velocity magnitude, velocity angle, and symmetrized ADI against frame, titled “High Velocity, Low Angle”, “Low Velocity, Low Angle”, “High Angle (1)”, and “High Angle (2)”). The top row shows two low-angle interactions, one with high total velocity. The high total velocity interaction has a relatively constant symmetrized ADI profile, while the low total velocity interaction has an ADI profile close to 0. The high angle interactions have more variable ADI profiles relative to their magnitude, and tend to be sensitive to changes in total velocity.

In general, interactions with high total velocity, defined as $v^i_t + v^j_t$, and low velocity angle see a stable and non-zero symmetrized ADI. In the low total velocity setting, the ADI is normally much smaller than its high velocity counterpart. Two examples of low-angle interactions are shown in the top row of Fig. 4. In the high angle case, ADI is less constant, and in many cases responds more to changes in total velocity, as shown in the bottom row of Fig. 4.

5.4. Average ADI between Different Types of Actors

Fig. 5 shows a graph of the average ADI between types of actors in the bookstore scene from the Stanford drone dataset across 5 different videos. Skaters tend to have the lowest average ADI with other groups, followed by pedestrians, with bikers and carts having the largest interaction magnitudes. Interestingly, pedestrians influence bikers and carts more than the two groups influence pedestrians on average, possibly signifying that bikers and carts are more cautious and thus are more affected by pedestrians in the vicinity. As seen in Fig. 4, the velocity magnitudes in interactions can play a role, specifically that the magnitudes of velocity and ADI are positively correlated. With bikers being among the fastest moving actors in this graph, it makes sense that they have some of the largest interaction magnitudes.

Fig. 5: Average ADI between types of actors (Pedestrian, Biker, Skater, Cart) in the bookstore scene for the Stanford drone dataset. Bikers have the largest levels of interaction, while skaters have the least.

6. CONCLUSION

In this paper, we introduced an ADI estimator that utilizes an ensemble technique in order to make ADI more robust to user-specified parameters. The estimator is applicable to real-world scenarios where directed information evolves as a function of time. We illustrated the power of the ensemble ADI estimator to detect latent interactions in a video using the Stanford drone dataset. In the future, ADI can be used as a data summarization and exploration tool or as a component in a larger system.

A. PROOF OF THEOREM 3.1

To aid in the proof, we prove two propositions and restate Theorem 2 in [23] as Lemma A.1. Since we assume that $i_t$ is piecewise constant with $m$ changes, we can define $[t_1, t_2, \ldots, t_m]$ as the (unknown) transition points, where $t_1 = 1$. We further define the “oracle mean estimator” $u^*(t)$:

$$u^*(t) = \sum_{k=1}^{m} \mathbb{1}(t_k \le t < t_{k+1})\, \frac{1}{t - t_k + 1} \sum_{s=t_k}^{t} i_s,$$

where $t_{m+1} = T + 1$. The proof is based on the following result found in [23], restated in terms of ADI:

Lemma A.1. The tracking regret of the ensemble ADI estimator in comparison with $u^*(t)$, defined as:

$$R(u^*(T)) = \sum_{t=1}^{T} (\mathrm{ADI}(t) - i_t)^2 - \sum_{t=1}^{T} (u^*(t) - i_t)^2,$$

is at most

$$R(u^*(T)) \le \frac{m}{\gamma} \ln n_t - \frac{1}{\gamma} \ln\left(\beta^m (1 - \beta)^{T - m}\right) + \frac{\gamma}{8} T. \tag{7}$$

Proposition A.2.

$$E\left[(u^*(t) - i_t)^2\right] \le \sigma_t^2 + \frac{1}{t - t_k + 1}\, \sigma_*^2. \tag{8}$$

Proof. Writing $i_t = \theta_t + \epsilon_t$, with $\theta_t$ constant between transitions, we have

\begin{align*}
E\left[(u^*(t) - i_t)^2\right] &= E\left[\left(\frac{1}{t - t_k + 1} \sum_{i=t_k}^{t} (\theta_i + \epsilon_i) - (\theta_t + \epsilon_t)\right)^2\right] \\
&= E\left[\left(\frac{1}{t - t_k + 1} \sum_{i=t_k}^{t-1} \epsilon_i - \frac{t - t_k}{t - t_k + 1}\, \epsilon_t\right)^2\right] \\
&= \frac{1}{(t - t_k + 1)^2} \left(\sum_{i=t_k}^{t-1} \sigma_i^2 + (t - t_k)^2 \sigma_t^2\right) \\
&\le \frac{t - t_k}{(t - t_k + 1)^2}\, \sigma_*^2 + \frac{(t - t_k)^2}{(t - t_k + 1)^2}\, \sigma_t^2 \\
&\le \frac{\sigma_*^2}{t - t_k + 1} + \sigma_t^2.
\end{align*}

The second equality uses $\theta_i = \theta_t$ within a segment, and the third uses the independence of the noise terms.

Proposition A.3.
$$E\left[(\mathrm{ADI}(t) - i_t)^2\right] \ge E\left[(\mathrm{ADI}(t) - \theta_t)^2\right] + \sigma_t^2. \tag{9}$$

Proof. We first decompose the left side using the definition of $i_t$:

$$(\mathrm{ADI}(t) - i_t)^2 = (\mathrm{ADI}(t) - \theta_t)^2 + 2 \epsilon_t (\mathrm{ADI}(t) - \theta_t) + \epsilon_t^2.$$

The result follows from taking the expectation of both sides, along with the following observation:

\begin{align}
E\left[2 \epsilon_t (\mathrm{ADI}(t) - \theta_t)\right] &= E\left[2 \epsilon_t \sum_{j=1}^{n_t} w_{j,t-1} \sum_{i=1}^{t} g_j(i, T; t_0)\, i_i\right] \tag{10} \\
&= E\left[2 \sum_{j=1}^{n_t} w_{j,t-1}\, g_j(t, T; t_0)\, i_t\, \epsilon_t\right] \tag{11} \\
&= 2 \sigma_t^2 \sum_{j=1}^{n_t} w_{j,t-1}\, g_j(t, T; t_0) \ge 0, \tag{12}
\end{align}

where the last inequality is due to the fact that $w_{j,t-1}$ and $g_j(i, T; t_0)$ are non-negative, $\forall i, j, t$.

Using the definition of $R(u^*(T))$ and Props. A.3, A.2, we obtain:

$$\sum_{t=1}^{T} \left(E\left[(\mathrm{ADI}(t) - \theta_t)^2\right] + \sigma_t^2\right) - \sum_{t=1}^{T} \left(\sigma_t^2 + \sum_{k=1}^{m} \frac{1}{t - t_k + 1}\, \mathbb{1}(t_k \le t < t_{k+1})\, \sigma_*^2\right) \le R(u^*(T)).$$

Finally, note the following inequality:

$$\sum_{k=1}^{m} \sum_{t=1}^{T} \frac{1}{t - t_k + 1}\, \mathbb{1}(t_k \le t < t_{k+1}) \le m \ln(Te).$$

Combining this with Lemma A.1, and rearranging terms, achieves the desired bound.

References

[1] P.-O. Amblard and O. J. Michel, “On directed information theory and Granger causality graphs,” Journal of Computational Neuroscience, vol. 30, no. 1, pp. 7–16, 2011.
[2] S. Aminikhanghahi and D. J. Cook, “A survey of methods for time series change point detection,” Knowledge and Information Systems, vol. 51, no. 2, pp. 339–367, 2017.
[3] T. Banerjee, H. Firouzi, and A. O. Hero, “Quickest detection for changes in maximal kNN coherence of random matrices,” IEEE Transactions on Signal Processing, vol. 66, no. 17, pp. 4490–4503, 2018.
[4] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006.
[5] X. Chen, A. Hero, and S. Savarese, “Shrinkage optimized directed information using pictorial structures for action recognition,” arXiv preprint arXiv:1404.3312, 2014.
[6] X. Chen, Z. Syed, and A. Hero, “EEG spatial decoding and classification with logit shrinkage regularized directed information assessment (L-SODA),” arXiv preprint arXiv:1404.0404, 2014.
[7] Z. Chen and C. Leng, “Dynamic covariance models,” Journal of the American Statistical Association, vol. 111, no. 515, pp. 1196–1207, 2016.
[8] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical lasso,” Biostatistics, vol. 9, no. 3, pp. 432–441, 2008.
[9] M. Herbster and M. K. Warmuth, “Tracking the best expert,” Machine Learning, vol. 32, no. 2, pp. 151–178, 1998.
[10] A. Hero and B. Rajaratnam, “Hub discovery in partial correlation graphs,” IEEE Transactions on Information Theory, vol. 58, no. 9, pp. 6064–6078, 2012.
[11] J. Jiao, H. H. Permuter, L. Zhao, Y.-H. Kim, and T. Weissman, “Universal estimation of directed information,” IEEE Transactions on Information Theory, vol. 59, no. 10, pp. 6220–6242, 2013.
[12] H. Liu, J. Lafferty, and L. Wasserman, “The nonparanormal: Semiparametric estimation of high dimensional undirected graphs,” Journal of Machine Learning Research, vol. 10, no. Oct, pp. 2295–2328, 2009.
[13] Y. Liu and S. Aviyente, “Directed information measure for quantifying the information flow in the brain,” in Engineering in Medicine and Biology Society, 2009. EMBC 2009. Annual International Conference of the IEEE, IEEE, 2009, pp. 2188–2191.
[14] L. v. d. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.
[15] J. Massey, “Causality, feedback and directed information,” in Proc. Int. Symp. Inf. Theory Applic. (ISITA-90), Citeseer, 1990, pp. 303–305.
[16] J. L. Massey and P. C. Massey, “Conservation of mutual and directed information,” in Information Theory, 2005. ISIT 2005. Proceedings. International Symposium on, IEEE, 2005, pp. 157–158.
[17] B. Oselio and A. O. Hero III, “Dynamic directed influence networks: A study of campaigns on Twitter,” in Social, Cultural, and Behavioral Modeling, 9th International Conference, SBP-BRiMS 2016, Washington, DC, USA, June 28 - July 1, 2016, Proceedings, K. S. Xu, D. Reitter, D. Lee, and N. Osgood, Eds., ser. Lecture Notes in Computer Science, vol. 9708, Springer, 2016, pp. 152–161, ISBN: 978-3-319-39930-0. DOI: 10.1007/978-3-319-39931-7_15. [Online]. Available: https://doi.org/10.1007/978-3-319-39931-7_15.
[18] B. Oselio and A. O. Hero III, “Dynamic reconstruction of influence graphs with adaptive directed information,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017, IEEE, 2017, pp. 5935–5939, ISBN: 978-1-5090-4117-6. DOI: 10.1109/ICASSP.2017.7953295. [Online]. Available: https://doi.org/10.1109/ICASSP.2017.7953295.
[19] B. Oselio, S. Liu, and A. Hero, “Multi-layer relevance networks,” in 19th IEEE International Workshop on Signal Processing Advances in Wireless Communications, SPAWC 2018, Kalamata, Greece, June 25-28, 2018, IEEE, 2018, pp. 1–5, ISBN: 978-1-5386-3512-4. DOI: 10.1109/SPAWC.2018.8446016. [Online]. Available: https://doi.org/10.1109/SPAWC.2018.8446016.
[20] C. J. Quinn, T. P. Coleman, N. Kiyavash, and N. G. Hatsopoulos, “Estimating the directed information to infer causal relationships in ensemble neural spike train recordings,” Journal of Computational Neuroscience, vol. 30, no. 1, pp. 17–44, 2011.
[21] C. J. Quinn, N. Kiyavash, and T. P. Coleman, “Directed information graphs,” IEEE Transactions on Information Theory, vol. 61, no. 12, pp. 6887–6909, 2015.
[22] A. Robicquet, A. Sadeghian, A. Alahi, and S. Savarese, “Learning social etiquette: Human trajectory understanding in crowded scenes,” in European Conference on Computer Vision, Springer, 2016, pp. 549–565.
[23] C. R. Shalizi, A. Z. Jacobs, K. L. Klinkner, and A. Clauset, “Adapting to non-stationarity with growing expert ensembles,” arXiv preprint arXiv:1103.0949, 2011.
