Trace Norm Regularized Tensor Classification and Its Online Learning Approaches
In this paper we propose an algorithm to classify tensor data. Our methodology is built on recent studies of matrix classification with a trace-norm-constrained weight matrix and of the tensor trace norm. As in the matrix case, tensor classification is formulated as a convex optimization problem that can be solved with the off-the-shelf accelerated proximal gradient (APG) method. However, unlike the matrix case, there is no analytic solution for updating the weight tensor via the proximal gradient. To tackle this problem, the Douglas-Rachford splitting technique and the alternating direction method of multipliers (ADM) used in tensor completion are adapted to update the weight tensor. Furthermore, motivated by the demands of real applications, we also propose online learning approaches. Experiments demonstrate the efficiency of the proposed methods.
💡 Research Summary
The paper addresses the problem of classifying high‑order tensor data by extending the trace‑norm (nuclear‑norm) regularization that has proven successful in matrix‑based classification. Given an N‑way input tensor X ∈ ℝ^{I₁×…×I_N}, a weight tensor W of the same size, and a scalar bias b, the prediction model is linear: f(X;W,b)=⟨W,X⟩+b. The learning objective combines a convex loss (the authors use the squared loss) with a trace‑norm penalty on W:
min_{W,b} Σ_{t=1}^s ℓ(y_t,⟨W,X_t⟩+b) + λ‖W‖_* .
Here the tensor trace norm is defined as the sum of the nuclear norms of all mode‑unfoldings, i.e., ‖W‖_* = Σ_{i=1}^N ‖W^{(i)}‖_*. This regularizer promotes low‑rank structure simultaneously across all modes, controlling model complexity and improving generalization.
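The model and regularizer above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; the helper names (`unfold`, `tensor_trace_norm`, `predict`, `objective`) are my own.

```python
import numpy as np

def unfold(T, mode):
    """Mode-i unfolding: move axis `mode` to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tensor_trace_norm(W):
    """Tensor trace norm: sum of nuclear norms of all mode unfoldings."""
    return sum(np.linalg.norm(unfold(W, i), ord='nuc') for i in range(W.ndim))

def predict(X, W, b):
    """Linear prediction f(X; W, b) = <W, X> + b."""
    return np.sum(W * X) + b

def objective(Xs, ys, W, b, lam):
    """Squared loss over the s training pairs plus the trace-norm penalty."""
    loss = sum((y - predict(X, W, b)) ** 2 for X, y in zip(Xs, ys))
    return loss + lam * tensor_trace_norm(W)
```

For a rank-one tensor every unfolding is rank one, so the penalty reduces to N times a single singular value, which makes the low-rank-across-all-modes effect easy to check by hand.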
To solve the convex problem, the authors adopt an Accelerated Proximal Gradient (APG) scheme. Because the loss is smooth, its gradient with respect to W is Lipschitz continuous; Lemma 3.1 provides an explicit Lipschitz constant L = 2 Σ_{m=1}^N I_m Σ_{t=1}^s ‖X_t‖_F², allowing a fixed step size t_k = 1/L without line‑search. The APG iteration consists of a gradient step followed by a proximal step that solves
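Under the stated Lipschitz constant, the APG loop can be sketched as below. The proximal subproblem is left as a callback, since (as discussed next) it has no closed form for tensors; treating the bias b with a plain gradient step, and the FISTA-style momentum sequence, are my assumptions about details the summary does not spell out.

```python
import numpy as np

def lipschitz_constant(Xs):
    """Lemma 3.1: L = 2 * (sum_m I_m) * (sum_t ||X_t||_F^2)."""
    shape = Xs[0].shape
    return 2.0 * sum(shape) * sum(np.sum(X ** 2) for X in Xs)

def apg(Xs, ys, prox, lam, n_iter=100):
    """APG with fixed step 1/L; `prox(P, tau)` must solve
    argmin_W (1/(2*tau)) * ||W - P||_F^2 + tau-weighted trace norm."""
    L = lipschitz_constant(Xs)
    W = np.zeros(Xs[0].shape)
    Z = W.copy()                       # extrapolated point
    b, t = 0.0, 1.0
    for _ in range(n_iter):
        r = [np.sum(Z * X) + b - y for X, y in zip(Xs, ys)]
        grad_W = 2.0 * sum(ri * X for ri, X in zip(r, Xs))
        W_new = prox(Z - grad_W / L, lam / L)   # gradient + proximal step
        b -= 2.0 * sum(r) / L                   # plain gradient step on b
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Z = W_new + ((t - 1.0) / t_new) * (W_new - W)  # momentum extrapolation
        W, t = W_new, t_new
    return W, b
```

With λ = 0 the proximal map is the identity, which gives a quick way to sanity-check that the fixed step 1/L drives the squared loss to zero.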
W_k = argmin_W (L/2)‖W − (Z_{k-1} − (1/L) ∇_W f(Z_{k-1}, b))‖_F² + λ‖W‖_* .
In the matrix case this proximal problem has a closed‑form solution via singular‑value decomposition (SVD) and soft‑thresholding. For tensors of order three or higher, however, no analytic solution exists because the trace norm couples multiple unfolding constraints.
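For reference, the matrix-case closed form is singular-value thresholding: SVD the matrix, shrink the singular values by the threshold, and reassemble. A minimal sketch (the helper name `svt` is mine):

```python
import numpy as np

def svt(M, tau):
    """Proximal map of tau * ||.||_* for a matrix:
    soft-threshold the singular values at level tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

For a tensor, applying this map to each mode-unfolding separately gives N answers that generally disagree, which is exactly why the coupled proximal problem needs an iterative splitting scheme.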
The paper therefore proposes two algorithmic strategies to compute the proximal operator:

- Douglas‑Rachford (DR) splitting – The proximal problem is rewritten as the sum of two convex functions f(W) = (L/2)‖W − P‖_F² and g(W) = λ‖W‖_*. The DR iteration alternates between the proximal maps of f (a simple averaging operation) and g (which requires, for each mode i, an SVD of the unfolding W^{(i)}, soft‑thresholding of its singular values, and refolding). Convergence follows from existing DR theory for convex functions.
- Alternating Direction Method of Multipliers (ADM) – Auxiliary tensors Y_i (i = 1, …, N) are introduced to duplicate W for each mode, enforcing Y_i = W through an augmented Lagrangian with multipliers U_i and penalty β. Each iteration performs

  W^{new} = (L P + β Σ_i Y_i + Σ_i U_i) / (L + β N),
  Y_i^{new} = refold_i( U S_{λ/β}(D) Vᵀ ), where U D Vᵀ is the SVD of the mode‑i unfolding of W^{new} − U_i/β and S_{λ/β} soft‑thresholds the singular values at level λ/β,
  U_i^{new} = U_i + β (Y_i^{new} − W^{new}).
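The ADM updates above can be sketched end to end as follows. This is an illustrative reconstruction under standard ADMM sign conventions, not the paper's code; the penalty default β = 1 and the iteration count are arbitrary choices of mine.

```python
import numpy as np

def unfold(T, mode):
    """Mode-i unfolding of a tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def refold(M, mode, shape):
    """Inverse of unfold: rebuild the tensor of the given shape."""
    full = (shape[mode],) + tuple(s for i, s in enumerate(shape) if i != mode)
    return np.moveaxis(M.reshape(full), 0, mode)

def svt(M, tau):
    """Soft-threshold the singular values of a matrix at level tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def prox_adm(P, lam, L, beta=1.0, n_iter=50):
    """ADM solver for argmin_W (L/2)||W - P||_F^2 + lam * sum_i ||W^{(i)}||_*.
    One auxiliary tensor Y_i and one multiplier U_i per mode."""
    N = P.ndim
    W = P.copy()
    Ys = [P.copy() for _ in range(N)]
    Us = [np.zeros_like(P) for _ in range(N)]
    for _ in range(n_iter):
        # W-update: unconstrained quadratic, closed form
        W = (L * P + beta * sum(Ys) + sum(Us)) / (L + beta * N)
        for i in range(N):
            # Y_i-update: singular-value thresholding on the mode-i unfolding
            Ys[i] = refold(svt(unfold(W - Us[i] / beta, i), lam / beta), i, P.shape)
            # dual ascent on the constraint Y_i = W
            Us[i] = Us[i] + beta * (Ys[i] - W)
    return W
```

With λ = 0 the solver should return P itself, and for λ > 0 the output should have a strictly lower penalized objective than P, which gives two quick consistency checks.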