DinTucker: Scaling up Gaussian process models on multidimensional arrays with billions of elements

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Infinite Tucker Decomposition (InfTucker) and random function prior models, as nonparametric Bayesian models on infinite exchangeable arrays, are more powerful models than widely-used multilinear factorization methods including Tucker and PARAFAC decomposition, (partly) due to their capability of modeling nonlinear relationships between array elements. Despite their great predictive performance and sound theoretical foundations, they cannot handle massive data due to a prohibitively high training time. To overcome this limitation, we present Distributed Infinite Tucker (DINTUCKER), a large-scale nonlinear tensor decomposition algorithm on MAPREDUCE. While maintaining the predictive accuracy of InfTucker, it is scalable on massive data. DINTUCKER is based on a new hierarchical Bayesian model that enables local training of InfTucker on subarrays and information integration from all local training results. We use distributed stochastic gradient descent, coupled with variational inference, to train this model. We apply DINTUCKER to multidimensional arrays with billions of elements from applications in the “Read the Web” project (Carlson et al., 2010) and in information security and compare it with the state-of-the-art large-scale tensor decomposition method, GigaTensor. On both datasets, DINTUCKER achieves significantly higher prediction accuracy with less computational time.


💡 Research Summary

The paper introduces DIN‑TUCKER, a distributed algorithm that brings the expressive power of infinite Tucker decomposition (InfTucker) and related random‑function‑prior models to massive multidimensional arrays containing billions of entries. InfTucker models tensors as samples from a single global Gaussian process (GP) defined over latent factor matrices for each mode. While this formulation captures nonlinear relationships and yields superior predictive performance on moderate‑size data, it requires constructing and inverting a Kronecker‑product covariance matrix whose side length equals the product of the mode dimensions, so the matrix grows exponentially with the number of modes. Consequently, InfTucker cannot be applied to truly large‑scale data that exceed a single machine’s memory.

To overcome this bottleneck, the authors propose a hierarchical Bayesian architecture. The original tensor Y is partitioned into N sub‑tensors {Y₁,…,Y_N}, each of which is modeled by an independent local GP with its own set of latent factors ˜Uₙ = {˜U⁽¹⁾ₙ,…,˜U⁽ᴷ⁾ₙ}. The local factors are tied to a set of global factors U = {U⁽¹⁾,…,U⁽ᴷ⁾} through a Gaussian prior p(˜Uₙ|U) = ∏ₖ 𝒩(vec(˜U⁽ᵏ⁾ₙ); vec(U⁽ᵏ⁾), λI). The variance λ controls how closely each local factor should resemble the global one. This construction reduces the size of each covariance matrix to that of a sub‑tensor, dramatically lowering memory and computational demands while still allowing information sharing across sub‑tensors via the global parameters.
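The tying prior above can be sketched in a few lines: each local factor matrix is the corresponding global one plus isotropic Gaussian noise of variance λ. This is a minimal illustration with assumed toy sizes (K = 3 modes, 5 rows per mode, rank 3), not the paper's implementation:

```python
import numpy as np

def sample_local_factors(global_U, lam, rng):
    """Draw local factors for one sub-tensor from the Gaussian tying prior:
    vec(local_U[k]) ~ N(vec(global_U[k]), lam * I), independently per mode."""
    return [U + rng.normal(scale=np.sqrt(lam), size=U.shape) for U in global_U]

# Toy setting with assumed sizes: K = 3 modes, each factor matrix 5 x 3.
rng = np.random.default_rng(0)
global_U = [rng.standard_normal((5, 3)) for _ in range(3)]
local_U = sample_local_factors(global_U, lam=0.1, rng=rng)
```

A small λ pins the local factors tightly to the global ones; a large λ lets each sub‑tensor fit its own idiosyncrasies.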

Training proceeds with a variational EM scheme adapted to the hierarchical model. For binary observations, the authors employ data augmentation: each binary entry yᵢ is expressed as an integral over a latent Gaussian variable zᵢ, enabling a tractable variational distribution q(z) and q(M) (where M denotes the latent real‑valued tensor). The E‑step updates these variational factors using coordinate descent, mirroring the updates in the original InfTucker paper. In the M‑step, stochastic gradient descent (SGD) is used to maximize the expected complete‑data log‑likelihood with respect to the latent factors.
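The probit augmentation mentioned above admits a closed‑form E‑step update: each augmented variable z is a unit‑variance Gaussian truncated to the half‑line consistent with the observed binary label, and its posterior mean has the standard truncated‑normal form. The sketch below shows that update only; the function names are illustrative and `m` stands for the current latent mean of an entry:

```python
import math

def std_normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def augmented_z_mean(m, y):
    """E[z] under the variational factor q(z): z ~ N(m, 1) truncated to
    z > 0 when y = 1, and to z <= 0 when y = 0 (standard probit update)."""
    if y == 1:
        return m + std_normal_pdf(m) / std_normal_cdf(m)
    return m - std_normal_pdf(m) / (1.0 - std_normal_cdf(m))
```

For example, at m = 0 the update gives E[z] = ±√(2/π) depending on the label, pushing the latent value toward the observed side of zero.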

The M‑step itself is split into two Map‑Reduce phases. In the MAP phase, each mapper processes a mini‑batch of sub‑tensors, computes the gradient of a local objective that combines the variational expectations, the Gaussian prior term (‖U – ˜Uₙ‖²), and the GP log‑likelihood, and updates its local factors ˜Uₙ via a simple SGD step. In the REDUCE phase, all updated ˜Uₙ are aggregated to produce the new global factors U by averaging (U⁽ᵏ⁾ = (1/N) Σₙ ˜U⁽ᵏ⁾ₙ). This two‑stage procedure maps naturally onto Hadoop’s Map‑Reduce programming model, allowing the algorithm to scale almost linearly with the number of compute nodes.
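The two phases above can be sketched as plain functions: the mapper takes one SGD step on the local objective (log‑likelihood gradient plus the Gaussian tying penalty), and the reducer averages the local factors. This is a single‑machine sketch of the logic only; `grads`, `lam`, and `eta` are assumed inputs, not names from the paper:

```python
import numpy as np

def map_step(local_U, global_U, grads, lam, eta):
    """Mapper: one SGD ascent step on the local objective. `grads` holds
    the gradient of the variational GP log-likelihood w.r.t. each local
    factor (hypothetical input); -(Un - U) / lam is the gradient of the
    Gaussian tying penalty -||Un - U||^2 / (2 * lam)."""
    return [Un + eta * (g - (Un - U) / lam)
            for Un, U, g in zip(local_U, global_U, grads)]

def reduce_step(all_local_U):
    """Reducer: the new global factors are the mode-wise average of the
    updated local factors, U[k] = (1/N) * sum_n local_U[n][k]."""
    K = len(all_local_U[0])
    return [np.mean([loc[k] for loc in all_local_U], axis=0) for k in range(K)]
```

In the actual Hadoop deployment the mappers run in parallel over sub‑tensor mini‑batches and the reducer sees only the emitted local factors, which is what makes the averaging step cheap to distribute.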

Complexity analysis shows that while InfTucker’s time cost is O(∑ₖ mₖ³ + m·∏ₖ mₖ) (with mₖ the size of mode k and m the total number of entries), DIN‑TUCKER’s cost per iteration is O(∑ₖ (mₖ′)³) where mₖ′ is the dimension of a sub‑tensor mode, typically far smaller than mₖ. Memory usage follows the same reduction, making the method feasible for tensors with billions of elements.
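A back‑of‑envelope calculation makes the reduction concrete. Taking only the dominant Σₖ mₖ³ covariance‑inversion term and hypothetical sizes (modes of length 10,000 split into sub‑tensors with modes of length 100 — illustrative numbers, not from the paper):

```python
def kernel_inversion_cost(mode_sizes):
    """Dominant O(sum_k m_k^3) cost of building/inverting the per-mode
    covariance matrices."""
    return sum(mk ** 3 for mk in mode_sizes)

# Hypothetical sizes: a 10,000^3 tensor vs. 100^3 sub-tensors.
full = kernel_inversion_cost([10_000] * 3)  # full-tensor cost
sub = kernel_inversion_cost([100] * 3)      # cost per sub-tensor
ratio = full // sub                         # each local fit is far cheaper
```

Here each local GP fit is a factor of (10,000/100)³ = 10⁶ cheaper than the global one, and the sub‑tensor fits parallelize across mappers.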

Empirical evaluation is performed on two real‑world datasets. The first is a knowledge‑base from the “Read the Web” project, containing billions of subject‑predicate‑object triples. The second is a massive user‑access log from a large enterprise, used for security‑threat detection. Both datasets are binary (presence/absence) and highly sparse. The authors compare DIN‑TUCKER against GigaTensor, the state‑of‑the‑art distributed PARAFAC implementation. Results indicate that DIN‑TUCKER achieves a 10–15 % absolute improvement in AUC (area under the ROC curve) while requiring roughly half the wall‑clock time on the same Hadoop cluster. Moreover, scaling experiments varying the number of mapper nodes from 10 to 200 demonstrate near‑linear speed‑up, confirming the algorithm’s distributed efficiency.

The paper also discusses practical considerations. The choice of λ influences the trade‑off between local flexibility and global consistency; the authors set it via cross‑validation. Sub‑tensor partitioning is performed uniformly, but the authors acknowledge that more sophisticated strategies (e.g., based on sparsity patterns) could further improve performance. Finally, they note that while the current work focuses on binary data, the framework readily extends to continuous and count data by swapping the probit likelihood with Gaussian or Poisson models.

In summary, DIN‑TUCKER represents the first successful deployment of a GP‑based nonlinear tensor factorization within a Map‑Reduce environment. By introducing a hierarchical Bayesian decomposition, leveraging variational inference, and employing distributed SGD, it preserves the expressive power of InfTucker while achieving scalability to tensors with billions of entries. The work opens the door for large‑scale, non‑linear multi‑relational learning in domains such as knowledge‑graph completion, recommender systems, and security analytics, and suggests several promising avenues for future research, including adaptive sub‑tensor partitioning, automated hyper‑parameter tuning, and theoretical analysis of convergence guarantees in the distributed variational setting.

