Joint Bayesian Parameter and Model Order Estimation for Low-Rank Probability Mass Tensors

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Obtaining a reliable estimate of the joint probability mass function (PMF) of a set of random variables from observed data is a significant objective in statistical signal processing and machine learning. Modelling the joint PMF as a tensor that admits a low-rank canonical polyadic decomposition (CPD) has enabled the development of efficient PMF estimation algorithms. However, these algorithms require the rank (model order) of the tensor to be specified beforehand. In real-world applications, the true rank is unknown. An appropriate rank is therefore usually selected from a candidate set, either by observing validation errors or by computing likelihood-based information criteria, a procedure that can be costly in computation time or hardware resources, or can result in mismatched models that degrade accuracy. This paper presents a novel Bayesian framework for estimating the low-rank components of a joint PMF tensor while simultaneously inferring its rank from the observed data. We specify a Bayesian PMF estimation model and employ appropriate prior distributions for the model parameters, allowing the rank to be inferred without cross-validation. We then derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of the model parameters. Numerical experiments involving both synthetic data and real classification and item-recommendation data illustrate the advantages of our VI-based method in terms of estimation accuracy, automatic rank detection, and computational efficiency.


💡 Research Summary

The paper addresses the problem of estimating a joint probability mass function (PMF) of multiple discrete random variables when the PMF is represented as a low‑rank tensor that admits a canonical polyadic decomposition (CPD). Existing CPD‑based PMF estimation methods assume that the tensor rank (model order) is known a priori; in practice the true rank is unknown and must be selected by cross‑validation or information‑theoretic criteria such as AIC, BIC, or DNML. These procedures are computationally expensive and risk model misspecification when the candidate rank set does not contain the true rank.

To overcome these limitations, the authors propose a fully Bayesian framework that simultaneously estimates the CPD factors (the loading vector λ and factor matrices {Aⁿ}) and infers the rank R directly from the observed data. The key ideas are:

  1. Probabilistic Interpretation of CPD – The CPD of a non‑negative tensor can be interpreted as a naïve Bayes model with a single latent categorical variable H that takes R possible states. The loading vector λ corresponds to the prior probabilities Pr(H=r), and each column Aⁿ(:,r) represents the conditional PMF p(Xₙ|H=r). This interpretation naturally imposes simplex constraints (non‑negativity and sum‑to‑one) on λ and the columns of Aⁿ.

  2. Sparsity‑Promoting Dirichlet Prior – A Dirichlet prior with concentration parameter α is placed on λ. Small α encourages sparsity, causing many λᵣ to shrink toward zero during inference. This mechanism enables automatic pruning of irrelevant components, effectively determining the rank without any ad‑hoc threshold.

  3. Variational Inference (VI) for Posterior Approximation – Instead of sampling‑based MCMC, the authors derive a deterministic variational lower bound (ELBO) and obtain closed‑form updates for the variational distributions of λ, Aⁿ, and the latent one‑hot variables Z. The updates retain the Dirichlet form, making the algorithm computationally efficient and guaranteeing monotonic increase of the ELBO.

  4. Automatic Pruning Rule – By analyzing the expected value of λ under the variational posterior, the authors derive an explicit pruning threshold that depends on α and the number of observations T. Components with expected λᵣ below this threshold are removed, and the effective rank is reduced accordingly.

  5. Algorithmic Flow – The procedure iterates: (i) initialize λ and {Aⁿ} (e.g., randomly or from empirical histograms), (ii) E‑step: compute responsibilities (posterior of Z) using current factor estimates, (iii) M‑step: update Dirichlet parameters of λ and each Aⁿ using the responsibilities, (iv) prune components whose expected λᵣ falls below the threshold, (v) repeat until the ELBO converges.
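The flow above can be sketched in simplified form. The following toy example uses hypothetical dimensions and data; its M-step substitutes posterior-mean point estimates for the paper's full variational Dirichlet updates, and its pruning threshold is a stand-in for the explicit rule derived in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy problem (hypothetical sizes): T observations of N discrete variables.
dims, T, R0, alpha = [4, 3, 5], 500, 8, 0.05
data = np.column_stack([rng.integers(0, I, size=T) for I in dims])

# (i) Initialize lambda and simplex-constrained factor matrices.
lam = np.full(R0, 1.0 / R0)
A = [rng.dirichlet(np.ones(I), size=R0).T for I in dims]

for it in range(50):
    # (ii) E-step: responsibilities r_{tr} proportional to lam_r * prod_n A_n[x_tn, r].
    resp = np.tile(lam, (T, 1))
    for n in range(len(dims)):
        resp *= A[n][data[:, n], :]
    resp /= resp.sum(axis=1, keepdims=True)

    # (iii) M-step: update lambda and factors from expected counts
    # (a point-estimate simplification of the Dirichlet-posterior updates).
    counts = resp.sum(axis=0)
    lam = (alpha + counts) / (alpha + counts).sum()
    for n, I in enumerate(dims):
        An = np.zeros((I, lam.size))
        for i in range(I):
            An[i] = resp[data[:, n] == i].sum(axis=0)
        A[n] = (An + 1e-12) / (An + 1e-12).sum(axis=0, keepdims=True)

    # (iv) Prune components with negligible expected weight (placeholder
    # threshold; the paper derives one from alpha and T).
    keep = lam > 1e-3
    if not keep.all():
        lam = lam[keep] / lam[keep].sum()
        A = [An[:, keep] for An in A]
        resp = resp[:, keep]
```

The surviving number of columns in `lam` plays the role of the automatically detected rank; in the paper the loop terminates on ELBO convergence rather than after a fixed iteration count.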

The authors evaluate the method on both synthetic and real datasets:

  • Synthetic Experiments – They vary sample size, true rank, and outage probability (fraction of missing entries). The proposed VI‑based Bayesian estimator accurately recovers the true rank in >95 % of trials, achieves lower mean‑squared error and KL divergence compared with maximum‑likelihood CPD estimators that assume a fixed rank, and remains robust to the choice of α over a wide range.

  • Real‑World Applications – Experiments on recommendation data (e.g., MovieLens) and classification benchmarks from the UCI repository demonstrate that the automatically inferred rank yields better predictive performance (higher NDCG for recommendation, higher F1 for classification) than models tuned by cross‑validation. Moreover, the VI algorithm is roughly an order of magnitude faster than MCMC‑based Bayesian tensor decompositions and about ten times faster than repeated ML training for each candidate rank.

The paper also provides an extensive appendix detailing the derivation of the ELBO, the pruning threshold, and sensitivity analyses for the hyperparameter α. It discusses the relationship to non‑parametric Bayesian approaches (e.g., Dirichlet‑process mixtures) and explains why a finite Dirichlet prior is preferable for PMF estimation because it respects the probability simplex constraints.

In summary, the contribution of the work is threefold: (1) a principled Bayesian model that embeds the rank as a random variable via a sparsity‑inducing Dirichlet prior, (2) a deterministic variational inference scheme with closed‑form updates that scales to large datasets, and (3) empirical evidence that automatic rank detection improves both estimation accuracy and computational efficiency in practical PMF‑based tasks such as recommendation and classification. The authors suggest future extensions to handle time‑varying tensors, more complex missing‑data mechanisms, and hybridization with deep learning architectures.
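To make the naïve Bayes interpretation and the sparsity-inducing Dirichlet prior concrete, the following toy sketch (illustrative dimensions, not taken from the paper) draws a loading vector from a small-α Dirichlet and evaluates the resulting CPD as a joint PMF:

```python
import numpy as np

rng = np.random.default_rng(0)
dims, R = [5, 4, 6], 10          # hypothetical variable alphabets and max rank

# Sparsity-promoting Dirichlet prior: small alpha pushes most lambda_r toward 0.
lam = rng.dirichlet(np.full(R, 0.05))
effective_rank = int((lam > 1e-3).sum())   # illustrative mass threshold

# Factor matrices with simplex-constrained columns: A[n][:, r] = p(X_n = i | H = r).
A = [rng.dirichlet(np.ones(I), size=R).T for I in dims]

def joint_pmf(idx, lam, A):
    """Naive-Bayes view of the CPD: sum_r lam_r * prod_n A_n[i_n, r]."""
    p = lam.copy()
    for n, i in enumerate(idx):
        p = p * A[n][i, :]
    return p.sum()

# A valid PMF: entries are non-negative and the full tensor sums to one.
total = sum(joint_pmf(idx, lam, A) for idx in np.ndindex(*dims))
```

Because λ and every factor column live on the probability simplex, the tensor sums to one by construction, which is the structural property the finite Dirichlet prior preserves.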

