Non-parametric Bayesian Learning with Deep Learning Structure and Its Applications in Wireless Networks


Authors: Erte Pan, Zhu Han

Abstract—In this paper, we present an infinite hierarchical non-parametric Bayesian model to extract the hidden factors over observed data, where the number of hidden factors for each layer is unknown and can be potentially infinite. Moreover, the number of layers can also be infinite. Previous non-parametric Bayesian methods assume binary values for both hidden factors and weights. In contrast, we construct a model structure that allows continuous values for the hidden factors and weights, which makes the model suitable for more applications. We use the Metropolis-Hastings method to infer the model structure. The performance of the algorithm is then evaluated by experiments. Simulation results show that the model fits the underlying structure of the simulated data.

Index Terms—non-parametric Bayesian learning, deep learning, Indian Buffet Process, Metropolis-Hastings algorithm

I. INTRODUCTION

Statistical models have been applied to classification and prediction problems in machine learning and data analysis [1]. Some statistical methods hypothesize mathematical models, controlled by certain parameters, to fit the latent structure of observed data [2]. The observed data are assumed to be generated by complex structures that have hierarchical layers and hidden causes [3]. One key challenge in modeling the data structure in this way is thus determining the numbers of layers and hidden variables. However, it is often impractical to choose any fixed number for the model structure when making the hypothesis. Therefore, we need flexible non-parametric models that make fewer assumptions and can represent an unlimited amount of latent structure.
A hierarchical non-parametric Bayesian model assumes an unspecified number of latent variables and produces rich probabilistic structures by constructing cascading layers. Hence, it is considered a powerful technique for coping with this challenge. In [4], a two-layer non-parametric Bayesian model was proposed with both hidden factors and linking weights being binary. The model accommodates a potentially infinite number of hidden factors and performs well in inferring stroke localizations. The work in [5] built a deep cascading graphical model that permits the number of hidden layers to be infinite; this technique has been used to infer the structures of images. However, the proposed model only infers the priors of the number of hidden factors in each layer, and ignores the influence of the factor values in each layer on the posterior distributions of the factor numbers. In [6], the authors developed a hierarchical model based on the Beta process for convolutional factor analysis and deep learning. In that linear model, the connecting weights and hidden factors are real values rather than binary ones, and the model has been used in multi-level analysis of image-processing data sets. In [7], another approach was taken to build the prior distribution for non-parametric Bayesian factor regression: Kingman's Coalescent is chosen as the prior and achieves good results in gene-expression data analysis. In [8], the Indian Buffet Process (IBP) was introduced into factor analysis, enabling the model to handle the infinite case; in addition, the method allows real-valued weights and factors. In the application realm, non-parametric Bayesian models have been explored to solve various classification and clustering problems.

E. Pan (epan@uh.edu) and Z. Han (zhan2@uh.edu) are with the Department of Electrical and Computer Engineering, University of Houston, Houston, Texas, USA, 77004.
In [9], deep belief networks were applied to unlabeled auditory data and achieved good performance in an unsupervised classification task. Although non-parametric Bayesian techniques have advanced recently, challenges remain in constructing real-valued non-linear models in which the numbers of both hidden layers and hidden factors are infinite.

In this paper, we investigate a non-parametric Bayesian graphical model with infinitely many hierarchical hidden layers and an infinite number of hidden factors in each layer. Our main contributions include: the proposed structure is infinite both latently and hierarchically; the linking weights are extended from binary values to real values; the proposed model is constructed in a non-linear fashion, unlike the work in [6]; and the employment of the Metropolis-Hastings algorithm enables alternating updates of the values of the hidden factors layer by layer, making the inference procedure recursive. Phantom data are simulated according to our infinite generative model, and the inference algorithm is then applied to the simulated data to extract the data structure. As stated before, when considering wireless security settings, the applications we mainly focus on are clustering problems. Therefore, our main interest lies in the number of hidden factors, which indicates the number of clusters at different hierarchical levels. The simulation results show that this greedy algorithm accomplishes the objective of discovering the number of hidden factors accurately.

This paper is organized as follows. In Section II, the non-parametric Bayesian generative model is introduced to generate the data. The inference algorithm is given in Section III. Simulation results are presented in Section IV. In Section V, we draw conclusions and give insightful discussions.

Fig. 1: Proposed infinite generative model
II. GENERATIVE MODEL

The objective is to construct a hierarchical-Bayesian-framework-based generative model that allows both infinite layers and infinite components in each layer. To better explain the proposed model, we describe the finite generative model first; the infinite model can then be obtained by extending the number of hidden factors and the number of layers to infinity.

A. Finite Generative Model

The finite generative model captures the causal effects among the factors between layers, as in [4] and [10]. Here we construct a model with one observation layer and two hidden layers. Define the matrix $X = [x_1, x_2, \ldots, x_T]$ as the data set of $T$ data points, with each $x_t$ being a vector of $N$ dimensions. Accordingly, define the matrix $Y^1 = [y^1_1, y^1_2, \ldots, y^1_T]$ as the hidden factors of the first hidden layer, with each $y^1_t$ being a vector of $K_1$ dimensions. Similarly, we define the $K_2 \times T$ matrix $Y^2$ as the hidden factors of the second hidden layer. To express the dependency between two successive layers, we use the $N \times K_1$ weight matrix $W^1$ and the $K_1 \times K_2$ weight matrix $W^2$, respectively. For instance, if there exists a connection between $Y^1_{k,t}$ and $X_{n,t}$, meaning the hidden cause $Y^1_{k,t}$ influences the generation of the data component $X_{n,t}$, then $W^1_{n,k} \neq 0$ and $W^1_{n,k} \in \mathbb{R}$; otherwise, $W^1_{n,k} = 0$. The remaining hidden vectors $\{y^i\}$ and weight matrices $\{W^i\}$ are defined in the same way. Fig. 1 illustrates the proposed infinite generative model structure for a particular instance $t \in \{1, 2, \ldots, T\}$. Note that the weight matrices remain the same through all instances $\{1, 2, \ldots, T\}$, while the data between two instances are generated independently. Within one particular instance $t$, the data vector $x_t$ is generated as follows. First, the hidden vector $y^i_t$ of the topmost layer is generated according to the Gaussian distribution $\mathcal{N}(0, \sigma^2_{y,i})$.
Then the weight matrix $W^i$ is generated according to $W^i = Z^i \odot G^i$, where the matrices $Z^i$ and $G^i$ are of the same size as $W^i$ and $\odot$ denotes the element-wise product. We assume each column of $Z^i$, $Z^i_{\cdot,r}$, is generated independently as $Z^i_{\cdot,r} \sim \mathrm{Bernoulli}(p_r)$, and we further impose the prior $p_r \sim \mathrm{Beta}(\alpha^i_0/K_i, 1)$. It will be demonstrated later that this strategy of constructing $Z^i$ results in the Indian Buffet Process as $K_i$, the number of variables of $y^i_t$, approaches infinity [11]. For the matrix $G^i$, each column is generated by a Gaussian distribution $G^i_{\cdot,r} \sim \mathcal{N}(0, \sigma^2_r)$, with the variance following the inverse-Gamma prior $\sigma^2_r \sim \mathrm{InverseGamma}(\alpha_2, \beta_2)$. The matrix $Z^i$ imposes the selection of variables between layers, while the matrix $G^i$ indicates how much influence a variable receives from its higher-level variables, or ancestors. Having obtained the hidden vector $y^i_t$ and the weight matrix $W^i$, the variables of $y^{i-1}_t$ are conditionally independently generated given $y^i_t$ and $W^i$, and we assume they follow the Gaussian distribution $Y^{i-1}_{k,t} \sim \mathcal{N}(0, \sigma^2_{y,i-1})$, where the parameter is specified by $\sigma_{y,i-1} = \left| \sum_{j=1}^{K_i} W^i_{k,j} Y^i_{j,t} \right|$. It can be verified that each element of the weight matrix $W^i$ follows the distribution

$$P(W^i_{k,r} \mid p_r, \sigma^2_r) = |\mathrm{sgn}(W^i_{k,r})| \, p_r \, \mathcal{N}(W^i_{k,r}; 0, \sigma^2_r) + (1 - p_r) \, \delta_0(W^i_{k,r}), \qquad (1)$$

where $\mathrm{sgn}$ is the sign function and $\delta_0$ is a delta function at 0. The lower layers are constructed in the same fashion. The generation of variables from the first hidden layer $y^1_t$ to the observed layer $x_t$ is similar to the procedure above, except for the parameterization: we assume $Z^1_{\cdot,k} \sim \mathrm{Bernoulli}(p_k)$ and $p_k \sim \mathrm{Beta}(\alpha^1_0/K, 1)$ for the matrix $Z^1$; for the matrix $G^1$, $G^1_{\cdot,k} \sim \mathcal{N}(0, \sigma^2_k)$ and $\sigma^2_k \sim \mathrm{InverseGamma}(\alpha_1, \beta_1)$.
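To make the construction concrete, the column-wise sampling of a weight matrix $W = Z \odot G$ described above can be sketched as follows. This is a minimal NumPy sketch; the function name, default hyper-parameters, and random seed are illustrative choices of ours, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_weight_matrix(n_rows, n_cols, alpha0=3.0, a=2.0, b=1.0):
    """Sample W = Z * G column by column, following the priors of Sec. II-A:
    p_r ~ Beta(alpha0/K, 1),  Z[:, r] ~ Bernoulli(p_r),
    sigma2_r ~ InverseGamma(a, b),  G[:, r] ~ N(0, sigma2_r)."""
    K = n_cols
    W = np.zeros((n_rows, n_cols))
    for r in range(n_cols):
        p_r = rng.beta(alpha0 / K, 1.0)                  # sparsity of column r
        z = rng.random(n_rows) < p_r                     # which edges exist (Z column)
        sigma2_r = 1.0 / rng.gamma(a, 1.0 / b)           # InverseGamma(a, b) draw
        g = rng.normal(0.0, np.sqrt(sigma2_r), n_rows)   # edge strengths (G column)
        W[:, r] = z * g                                  # element-wise product Z * G
    return W

W1 = sample_weight_matrix(n_rows=16, n_cols=5)           # e.g. N = 16, K_1 = 5
```

Zero entries of a column correspond to absent edges; an all-zero column is a hidden factor that no variable selects, which is exactly the case the inference algorithm later uses when proposing to delete a factor.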
Hence, the distribution of the observed data vector can be expressed as $X_{n,t} \sim \mathcal{N}(0, \sigma^2_{x_n})$ with $\sigma_{x_n} = \left| \sum_{j=1}^{K} W^1_{n,j} Y^1_{j,t} \right|$, where

$$P(W^1_{n,k} \mid p_k, \sigma^2_k) = |\mathrm{sgn}(W^1_{n,k})| \, p_k \, \mathcal{N}(W^1_{n,k}; 0, \sigma^2_k) + (1 - p_k) \, \delta_0(W^1_{n,k}). \qquad (2)$$

This generative model can be employed in many applications, since we are able to extract not only the features from data points but also higher-level hyper-features from the extracted features. Instances can be found in applications such as human face recognition, where the input data are images of human faces, the first level of features consists of curves and edges, and the second level of features consists of organs such as eyes and nose [12]. Moreover, we allow one variable to have more than one hidden cause (unlike the Infinite Gaussian Mixture Model), which makes the model more robust. In addition, we assume real-valued weight matrices instead of binary ones, which brings our model closer to practice, since different hidden causes are reasonably weighted.

B. Infinite Generative Model

Having established the finite generative model, the case of an infinite number of layers can be expressed by the recursive equations

$$\sigma_{y^i_j} = \left| \sum_l W^{i+1}_{j,l} Y^{i+1}_{l,t} \right|, \qquad (3)$$

$$Y^i_{j,t} \sim \mathcal{N}(0, \sigma^2_{y^i_j}). \qquad (4)$$

Moreover, an infinite number of components in each layer can be obtained by taking the limit $K_i \to \infty$. We will demonstrate that the distributions on the selection matrices correspond to the IBP. For example, for our assumptions on $Z^1$, we have

$$P(Z^1 \mid p) = \prod_{k=1}^{K} \prod_{n=1}^{N} P(Z^1_{n,k} \mid p_k) = \prod_{k=1}^{K} p_k^{m_k} (1 - p_k)^{N - m_k}, \qquad (5)$$

where $m_k = \sum_{n=1}^{N} Z^1_{n,k}$ is the number of data components that select the hidden factor $Y^1_{k,\cdot}$, and $Z^1$ is an $N \times K$ matrix. Since we place a prior distribution $\mathrm{Beta}(\alpha^1_0/K, 1)$ on $p_k$, we can integrate out the parameter $p$ to obtain

$$P(Z^1) = \prod_{k=1}^{K} \frac{\frac{\alpha^1_0}{K} \, \Gamma\!\left(m_k + \frac{\alpha^1_0}{K}\right) \Gamma(N - m_k + 1)}{\Gamma\!\left(N + 1 + \frac{\alpha^1_0}{K}\right)}. \qquad (6)$$
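The top-down generation implied by the recursions (3) and (4) can be sketched for a single instance $t$. The function name and the convention that the weight matrices are passed as a list ordered from the topmost layer downward are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_instance(weights, sigma_top=1.0):
    """Generate one data vector x_t by propagating top-layer factors downward,
    per Eqs. (3)-(4): at each layer, sigma_j = |sum_l W_{j,l} y_l| and then
    y_j ~ N(0, sigma_j^2).  `weights` lists the matrices from top to bottom,
    e.g. [W^2, W^1]; this ordering is our convention, not the paper's."""
    y = rng.normal(0.0, sigma_top, weights[0].shape[1])  # topmost hidden factors
    for W in weights:
        sigma = np.abs(W @ y)          # per-component std from the layer above
        y = rng.normal(0.0, sigma)     # each component drawn with its own scale
    return y                           # bottom layer = observed vector x_t

# Two hidden layers: W^2 is K_1 x K_2 = 4 x 3, W^1 is N x K_1 = 16 x 4.
x_t = generate_instance([np.ones((4, 3)), np.ones((16, 4))])
```

Note that components whose incoming weights are all zero receive $\sigma = 0$ and are generated deterministically as zero, matching the role of the selection matrix $Z$.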
By defining the equivalence class of the matrix $Z^1$ [11], we can find the distribution of $Z^1$ as $K \to \infty$:

$$P(Z^1) = \frac{(\alpha^1_0)^{K_h}}{\prod_{n=1}^{N} K^n_1!} \, e^{-\alpha^1_0 H_N} \prod_{k=1}^{K_h} \frac{(N - m_k)! \, (m_k - 1)!}{N!}, \qquad (7)$$

where $K^n_1$ is the number of first-hidden-layer factors selected by the $n$-th variable of the data point $X_{\cdot,t}$, $H_N = \sum_{j=1}^{N} \frac{1}{j}$ is the harmonic number, and $K_h$ is the number of first-hidden-layer factors selecting $h$ components of the data point. This distribution corresponds to a stochastic process, the IBP [11], which is the analogue of $N$ customers selecting dishes at an Indian buffet restaurant. The restaurant offers the customers (the variables of a data point) an infinite array of dishes, which correspond to the infinite components of the first-hidden-layer factors. The first customer tries $\mathrm{Poisson}(\alpha^1_0)$ dishes. The succeeding customers select dishes one by one: the $i$-th customer first selects each previously selected dish with probability $m_{-i,k}/i$, where $m_{-i,k}$ is the number of customers other than the $i$-th who have chosen the $k$-th dish, and then tries $\mathrm{Poisson}(\alpha^1_0/i)$ new dishes.

III. INFERENCE ALGORITHM

Having constructed the infinite generative model, the goal is to infer the number of hidden layers, as well as the number of hidden factors in each hidden layer, based on Bayesian inference. The task is done once we obtain the inference of $\{W^1, W^2, \ldots, Y^1, Y^2, \ldots\}$ given the observed data $X$. However, direct estimation of $P(W^1, W^2, \ldots, Y^1, Y^2, \ldots \mid X)$ is intractable. Inspired by [13], we perform the inference one layer at a time. That is, we first initialize the weight matrices $\{W^1, W^2, \ldots\}$ as well as the hidden layers $\{Y^1, Y^2, \ldots\}$. Then we fix the values of $\{W^2, \ldots\}$ and $\{Y^2, \ldots\}$, so that the prior distribution of $Y^1$ is known and can be expressed in terms of $\{W^2, \ldots\}$ and $\{Y^2, \ldots\}$.
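The customers-and-dishes description above translates directly into a simulation of one IBP draw. This is a sketch; the function name is ours, and the resulting number of columns (dishes) varies from run to run.

```python
import numpy as np

rng = np.random.default_rng(2)

def indian_buffet_process(n_customers, alpha):
    """Simulate one draw of the binary selection matrix Z from the IBP [11].
    Customer 1 tries Poisson(alpha) dishes; customer i takes each existing
    dish k with probability m_k / i (m_k = how many earlier customers took it),
    then tries Poisson(alpha / i) brand-new dishes."""
    counts = []                 # m_k for each dish sampled so far
    rows = []
    for i in range(1, n_customers + 1):
        row = [int(rng.random() < m / i) for m in counts]   # revisit old dishes
        for k, taken in enumerate(row):
            counts[k] += taken
        n_new = rng.poisson(alpha / i)                      # try new dishes
        row.extend([1] * n_new)
        counts.extend([1] * n_new)
        rows.append(row)
    Z = np.zeros((n_customers, len(counts)), dtype=int)     # pad rows to full width
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = indian_buffet_process(n_customers=10, alpha=3.0)
```

The expected total number of dishes grows as $\alpha H_N$, consistent with the $e^{-\alpha^1_0 H_N}$ term in (7).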
Based on this scheme, we use the Metropolis-Hastings algorithm as an approximate method to infer the first hidden layer $\{Y^1, W^1\}$. After inferring the first hidden layer, we use the matrix $Y^1$ as the input data points and perform Bayesian inference at the second hidden layer, and so forth. Since the prior is changed during the inference of the second hidden layer, we need to re-infer the first hidden layer using the updated upper hidden layers. We iteratively perform inference at each layer until the values of $\{W^1, W^2, \ldots, Y^1, Y^2, \ldots\}$ converge. It was proved in [14] that this layer-wise inference strategy is efficient. Different from [14], the Metropolis-Hastings algorithm is applied to perform the inference, instead of the contrastive divergence method.

The Metropolis-Hastings algorithm was first introduced in the classic 1953 paper by Metropolis, Rosenbluth, et al. and has been extensively applied to statistical problems. It defines a Markov chain that allows the change of dimensionality between different states of the model. The new state is generated from the previous state by first generating a candidate state from a specified proposal distribution. A decision is then made whether to accept the candidate state, based on its probability density relative to that of the previous state with respect to the desired invariant distribution. If the candidate state is adopted, it becomes the next state of the Markov chain; otherwise, the state of the model stays the same. To better explain the inference algorithm, we specialize the problem to one-hidden-layer inference; the generalized infinite case can be derived in a similar fashion. In our problem setting, let $\eta$ represent the values of $\{W^1, Y^1, K_1\}$, where $W^1$ is the weight matrix connecting the $N \times T$ data matrix $X$ and the $K_1 \times T$ hidden factor matrix $Y^1$, and $K_1$ is the dimension of the hidden factor $y^1_t$.
Then the change between different states of the model is accepted with probability

$$A(\eta^*, \eta) = \min\left(1, \; \frac{P(X, \eta^*)}{P(X, \eta)} \, \frac{Q(\eta \mid \eta^*)}{Q(\eta^* \mid \eta)}\right), \qquad (8)$$

where $\eta^*$ is the proposed new value, $\eta$ is the current value, and $Q(\eta^* \mid \eta)$ is the probability of proposing $\eta^*$ given $\eta$. The term $P(X, \eta)$ can be further expressed as

$$P(X, \eta) = P(X \mid W^1, Y^1) \, P(Y^1 \mid K_1) \, P(W^1 \mid K_1) \, P(K_1). \qquad (9)$$

The change of dimensionality is carried out as follows. Iteratively pick a hidden factor with corresponding column $k$ of $W^1$ and check the number of linked edges $m_k$. If $m_k = 0$, remove this hidden factor together with the corresponding column of $W^1$ and decrease $K_1$. Otherwise, propose a new hidden factor with no linked edges and sample the new values of $Y^1$ by (4). This newly proposed state is accepted with probability $A(\eta^*, \eta)$. The probability of adding a new hidden factor is approximated by $K_{1+}/K_1$, while the probability of generating the new $Y^1$ is specified by its normal distribution; $Q(\eta^* \mid \eta)$ is obtained by multiplying these two probabilities. To return to the previous configuration, we can delete any hidden factor with the same values as the proposed new row of $Y^1$; the probability of choosing such a hidden factor is approximated by $1/(K_1 + 1)$. Therefore, we have

$$\frac{Q(\eta \mid \eta^*)}{Q(\eta^* \mid \eta)} = \frac{1/(K_1 + 1)}{\frac{K_{1+}}{K_1} \prod_t \frac{1}{\sqrt{2\pi}\,\sigma_{y_k}} e^{-\frac{y_{k,t}^2}{2\sigma_{y_k}^2}}}, \qquad (10)$$

$$\frac{P(X, \eta^*)}{P(X, \eta)} = \frac{\prod_t \frac{1}{\sqrt{2\pi}\,\sigma_{y_k}} e^{-\frac{y_{k,t}^2}{2\sigma_{y_k}^2}} \, P(W^1 \mid K_1 + 1) \, P(K_1 + 1)}{P(W^1 \mid K_1) \, P(K_1)}, \qquad (11)$$

where $\frac{P(W^1 \mid K_1 + 1)}{P(W^1 \mid K_1)}$ is simply the probability of generating a new column of $Z^1$ with all-zero values, specified by (2), and $\frac{P(K_1 + 1)}{P(K_1)}$ can be computed from the Poisson distributions serving as the priors of the IBP. As a result, we have

$$A(\eta^*, \eta) = \min\left(1, \; \frac{\frac{1}{K_1 + 1} \, P(W^1 \mid K_1 + 1) \, P(K_1 + 1)}{\frac{K_{1+}}{K_1} \, P(W^1 \mid K_1) \, P(K_1)}\right). \qquad (12)$$
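The accept/reject step of Eq. (12) can be sketched in isolation. All probability arguments below are assumed to be computed elsewhere (from (2) and the IBP prior); the function name and the way $K_{1+}/K_1$ is passed in are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def accept_add_factor(K1, K1_plus_ratio, p_w_new, p_w_old, p_k_new, p_k_old):
    """Acceptance of the add-a-factor proposal, Eq. (12):
    A = min(1, [1/(K1+1)] P(W|K1+1) P(K1+1) / ([K1+/K1] P(W|K1) P(K1))).
    `K1_plus_ratio` stands for K_1+/K_1; the P(.) values are placeholders
    assumed to be evaluated by the caller."""
    ratio = ((1.0 / (K1 + 1)) * p_w_new * p_k_new) / (
        K1_plus_ratio * p_w_old * p_k_old)
    A = min(1.0, ratio)
    return rng.random() < A, A      # (accepted?, acceptance probability)

# With equal P(.) terms and K1 = 3, the ratio is (1/4) / (1/6) = 1.5, so A = 1.
accepted, A = accept_add_factor(K1=3, K1_plus_ratio=1.0 / 6.0,
                                p_w_new=1.0, p_w_old=1.0,
                                p_k_new=1.0, p_k_old=1.0)
```

The delete proposal of Eq. (13) differs only in using $P(W^1 \mid K_1 - 1)\,P(K_1 - 1)$ in the numerator.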
Similarly, the proposal to delete a hidden factor with no linked edges is accepted with probability

$$A(\eta^*, \eta) = \min\left(1, \; \frac{\frac{1}{K_1 + 1} \, P(W^1 \mid K_1 - 1) \, P(K_1 - 1)}{\frac{K_{1+}}{K_1} \, P(W^1 \mid K_1) \, P(K_1)}\right). \qquad (13)$$

To complete the algorithm, we need to sample $W^1$ and $Y^1$. Using Gibbs sampling, we infer each variable of the two matrices in turn from the distributions $P(W^1_{n,k} \mid X, W^1_{-n,k}, Y^1)$ and $P(Y^1_{k,t} \mid X, Y^1_{-k,t}, W^1)$, where $W^1_{-n,k}$ denotes all values of $W^1$ except $W^1_{n,k}$ and $Y^1_{-k,t}$ denotes all values of $Y^1$ except $Y^1_{k,t}$. From the construction of our generative model and Bayes' rule, we have

$$P(W^1_{n,k} \mid X, W^1_{-n,k}, Y^1) \propto P(X \mid W^1_{n,k}, W^1_{-n,k}, Y^1) \cdot P(W^1_{n,k} \mid W^1_{-n,k}), \qquad (14)$$

where $P(X \mid W^1_{n,k}, W^1_{-n,k}, Y^1)$ is specified by the Gaussian likelihood we choose, and the term $P(W^1_{n,k} \mid W^1_{-n,k})$ can be obtained by integrating out the associated priors:

$$P(W^1_{n,k} \mid W^1_{-n,k}) = P(W^1_{n,k} \mid \vec{W}^1_{-n,k}) = \int_{\vec{\theta} \in S} P(W^1_{n,k} \mid \vec{\theta}) \, P(\vec{\theta} \mid \vec{W}^1_{-n,k}) \, \mathrm{d}\vec{\theta}, \quad \vec{\theta} = (\sigma^2_k, p_k), \;\; S = \{\sigma^2_k \in \mathbb{R}^+, \, p_k \in [0, 1]\}, \qquad (15)$$

where $P(W^1_{n,k} \mid \vec{\theta})$ is specified by (2) and $\vec{W}^1_{-n,k}$ denotes all values of the $k$-th column of $W^1$ except $W^1_{n,k}$. Since the columns of $W^1$ are generated independently, we compute $P(W^1_{n,k} \mid \vec{W}^1_{-n,k})$ instead of $P(W^1_{n,k} \mid W^1_{-n,k})$. Using Bayes' rule again, $P(\vec{\theta} \mid \vec{W}^1_{-n,k})$ can be computed by

$$P(\vec{\theta} \mid \vec{W}^1_{-n,k}) \propto P(\vec{W}^1_{-n,k} \mid \sigma^2_k, p_k) \cdot P(\sigma^2_k, p_k). \qquad (16)$$

The distribution $P(\vec{W}^1_{-n,k} \mid \sigma^2_k, p_k)$ can be computed by evaluating each element of the $k$-th column of $W^1$ except $W^1_{n,k}$ according to (2). The distribution $P(\sigma^2_k, p_k)$ is the product of $P(\sigma^2_k)$ and $P(p_k)$, which are specified by their priors defined in the generative model.
Similarly, we obtain the expression for $P(Y^1_{k,t} \mid X, Y^1_{-k,t}, W^1)$:

$$P(Y^1_{k,t} \mid X, Y^1_{-k,t}, W^1) \propto P(X \mid Y^1_{k,t}, Y^1_{-k,t}, W^1) \cdot P(Y^1_{k,t} \mid Y^1_{-k,t}), \qquad (17)$$

where $P(X \mid Y^1_{k,t}, Y^1_{-k,t}, W^1)$ is specified by the Gaussian likelihood we choose. Since each element of $Y^1_{\cdot,t}$ is generated independently, $P(Y^1_{k,t} \mid Y^1_{-k,t})$ can be computed from its prior, as in (4). The inference at the remaining hidden layers is similar to the procedure used to infer the first hidden layer. We summarize our inference procedure in Algorithm 1.

Algorithm 1: MH steps for inferring first-layer hidden factors
for r = 1, 2, ..., number of iterations do
    for i = 1, 2, ..., N do
        iteratively select column k of W
        if m_{-i,k} > 0 then
            propose adding a new hidden factor, accepted with probability specified by (12)
        else
            propose deleting this hidden factor, accepted with probability specified by (13)
    for k = 1, 2, ..., K do
        sample W_{i,k} according to (14)
    for each element of Y do
        sample Y_{k,t} according to (17)

IV. SIMULATION RESULTS AND DISCUSSIONS FOR WIRELESS APPLICATIONS

We analyze the performance of the proposed modified Metropolis-Hastings algorithm for inferring the true number of hidden factors in the first hidden layer. First, we fix the dimension of the observed data points, $N = 16$, and vary the number of hidden factors, $K$, from 3 to 10. For each integer value of $K$, we generate a dataset containing $T = 200$ data instances using the proposed generative model. Within one instance, $Y^1$ is sampled according to its Gaussian prior. Then the weight matrix $W^1$ is drawn from its distribution specified by (2). Finally, the data point $X$ is generated from the Gaussian distribution whose parameters are expressed in terms of $Y^1$ and $W^1$. The remaining model parameters are fixed at $\alpha^1_0 = 3$ for the Beta distribution, and $\alpha_1 = 2$ and $\beta_1 = 1$ for the inverse-Gamma distribution.
The modified Metropolis-Hastings algorithm is initialized with three choices of $K$: $K = 2$, $K = 10$, or a random positive integer between 3 and 10, and then runs for 200 iterations. Each dataset is estimated 10 times by the inference procedure described previously. We record the expectation of the estimated number of hidden factors and its variance as the result.

Fig. 2: Inferring the number of first-layer hidden factors using the Metropolis-Hastings algorithm. Each curve shows the mean and variance of the expected value of the dimensionality K.

The results are plotted in Fig. 2. The modified Metropolis-Hastings algorithm is influenced by the initialization. When initializing $K = 10$, which is much greater than the dimension of the underlying model, the inferred $K$ values are generally much larger than the true values. Likewise, when initializing $K$ randomly, the results show corresponding randomness. Another observation is that the MH method tends to over-estimate the number of hidden factors. This is because the proposal to add one hidden factor is preferentially accepted: according to (10), the numerator is usually larger than the denominator, since the denominator is a product of probability terms. Hence, the adding proposal is more likely to be accepted.

The proposed model can be utilized in unsupervised, non-parametric clustering problems in wireless networks. The estimated number of hidden factors solves one key challenge of clustering, namely the determination of the number of clusters. In a wireless security setting, the proposed model is a suitable solution for identifying attack devices in a communication system [15]. In the field of data analysis in wireless networks, the proposed model can serve as a feature extraction approach [16]. Moreover, the proposed model can contribute to the location estimation task in wireless networks [17].
Many other wireless networking applications can be explored using the proposed framework.

V. CONCLUSIONS

In this paper, we developed a deep hierarchical non-parametric Bayesian model to represent the underlying structure of observed data. Correspondingly, we proposed a modified Metropolis-Hastings algorithm to recover the number of hidden factors. Our simulation results on the hidden layer show that the algorithm discovers the model structure with some estimation error. However, as shown in the results, our approach is capable of inferring hidden structures of growing dimension, owing to the advantage of the non-parametric Bayesian technique. This indicates that the non-parametric Bayesian approach can be a suitable method for discovering complex structures.

REFERENCES

[1] Y. W. Teh and M. I. Jordan, "Hierarchical Bayesian nonparametric models with applications," in Bayesian Nonparametrics: Principles and Practice, N. Hjort, C. Holmes, P. Müller, and S. Walker, Eds. Cambridge University Press, 2010.
[2] R. Thibaux and M. I. Jordan, "Hierarchical beta processes and the Indian buffet process," in Practical Nonparametric and Semiparametric Bayesian Statistics, Tech. Rep., 2007.
[3] C. E. Rasmussen, "The infinite Gaussian mixture model," in Advances in Neural Information Processing Systems 12. MIT Press, 2000, pp. 554–560.
[4] F. Wood, "A non-parametric Bayesian method for inferring hidden causes," in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI). AUAI Press, 2006, pp. 536–543.
[5] R. P. Adams, H. M. Wallach, and Z. Ghahramani, "Learning the structure of deep sparse graphical models," in 13th International Conference on Artificial Intelligence and Statistics, Chia Laguna, Sardinia, Italy, May 2010.
[6] B. Chen, G. Polatkan, G. Sapiro, L. Carin, and D. B.
Dunson, "The hierarchical beta process for convolutional factor analysis and deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML-11), L. Getoor and T. Scheffer, Eds. New York, NY, USA: ACM, 2011, pp. 361–368.
[7] P. Rai and H. Daumé III, "The infinite hierarchical factor regression model," in Proceedings of the Conference on Neural Information Processing Systems (NIPS), Vancouver, Canada, 2008.
[8] D. Knowles and Z. Ghahramani, "Infinite sparse factor analysis and infinite independent components analysis," in Independent Component Analysis and Signal Separation, ser. Lecture Notes in Computer Science, M. Davies, C. James, S. Abdallah, and M. Plumbley, Eds. Springer Berlin Heidelberg, 2007, vol. 4666, pp. 381–388.
[9] H. Lee, P. T. Pham, Y. Largman, and A. Y. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems, Vancouver, Canada, December 2009.
[10] N. T. Nguyen, X. Liu, and R. Zheng, "A nonparametric Bayesian approach for opportunistic data transfer in cellular networks," in Proceedings of the 7th International Conference on Wireless Algorithms, Systems, and Applications (WASA), Yellow Mountain, China, August 2012, pp. 88–99.
[11] T. L. Griffiths and Z. Ghahramani, "The Indian buffet process: An introduction and review," J. Mach. Learn. Res., vol. 12, pp. 1185–1224, July 2011.
[12] L. Ma, C. Wang, B. Xiao, and W. Zhou, "Sparse representation for face recognition based on discriminative low-rank dictionary learning," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, Providence, RI, June 2012, pp. 2586–2593.
[13] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527–1554, July 2006.
[14] M. A. Carreira-Perpiñán and G. Hinton, "On contrastive divergence learning," R. G. Cowell and Z. Ghahramani, Eds. Society for Artificial Intelligence and Statistics, 2005, pp. 33–40. (Available electronically at http://www.gatsby.ucl.ac.uk/aistats/.)
[15] N. Nguyen, R. Zheng, and Z. Han, "On identifying primary user emulation attacks in cognitive radio systems using nonparametric Bayesian classification," Signal Processing, IEEE Transactions on, vol. 60, no. 3, pp. 1432–1445, March 2012.
[16] S. Chinchali and S. Tandon, "Location estimation in wireless networks: A Bayesian approach," (online report at http://cs229.stanford.edu/projects2012.html).
[17] D. Madigan, W.-H. Ju, P. Krishnan, A. S. Krishnakumar, and I. Zorych, "Location estimation in wireless networks: a Bayesian approach," Statistica Sinica, vol. 16, no. 2, pp. 495–522, 2006.
