Multilinear Biased Discriminant Analysis: A Novel Method for Facial Action Unit Representation
In this paper, a novel and efficient method for representing facial action units by encoding an image sequence as a fourth-order tensor is presented. A multilinear tensor-based extension of the biased discriminant analysis (BDA) algorithm, called multilinear biased discriminant analysis (MBDA), is first proposed. Then, we apply the MBDA and two-dimensional BDA (2DBDA) algorithms, as dimensionality reduction techniques, to the Gabor representations and the geometric features of the input image sequence, respectively. The proposed scheme can deal with the asymmetry between positive and negative samples as well as with the curse of dimensionality. Extensive experiments on the Cohn-Kanade database show the superiority of the proposed method for representing the subtle changes and the temporal information involved in the formation of facial expressions. As an accurate tool, this representation can be applied to many areas, such as recognition of spontaneous and deliberate facial expressions, multimodal/multimedia human-computer interaction, and lie-detection efforts.
💡 Research Summary
The paper introduces a novel framework for representing and recognizing facial Action Units (AUs) by modeling an image sequence as a fourth‑order tensor and applying a multilinear extension of Biased Discriminant Analysis (BDA), termed Multilinear Biased Discriminant Analysis (MBDA). Traditional dimensionality‑reduction techniques such as 2‑D BDA or Multilinear Discriminant Analysis (MDA) treat positive and negative samples symmetrically, which is suboptimal when the negative class actually comprises many unknown subclasses. MBDA addresses this by formulating an asymmetric objective that minimizes the scatter of positive samples while maximizing the distance of all negative samples from the positive centroid.
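The asymmetric criterion can be made concrete with a minimal numpy sketch of plain (vector-space) BDA: positive within-class scatter is measured around the positive centroid, negative scatter is measured around that *same* centroid, and the projection maximizes the generalized Rayleigh quotient between the two. The function name, the ridge-style regularization, and all parameters here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def bda_projection(X_pos, X_neg, d, eps=1e-3):
    """Sketch of the biased (asymmetric) discriminant criterion.

    X_pos: (n_pos, D) positive samples; X_neg: (n_neg, D) negative samples.
    Returns a (D, d) projection that shrinks positive scatter while pushing
    negatives away from the positive centroid (negatives are never clustered).
    """
    D = X_pos.shape[1]
    mu = X_pos.mean(axis=0)                     # positive-class centroid
    Sp = (X_pos - mu).T @ (X_pos - mu)          # positive scatter around mu
    Sn = (X_neg - mu).T @ (X_neg - mu)          # negatives vs. positive centroid
    Sp = Sp + eps * np.eye(D)                   # regularize against singularity
    # maximize tr(W^T Sn W) / tr(W^T Sp W): top eigenvectors of Sp^{-1} Sn
    vals, vecs = np.linalg.eig(np.linalg.solve(Sp, Sn))
    order = np.argsort(-vals.real)
    return vecs.real[:, order[:d]]
```

Because only the positive class is assumed coherent, the heterogeneous negative "everything else" class never has to form a compact cluster, which is exactly the asymmetry the summary describes.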
The authors first encode each facial video (from neutral to peak expression) into a fourth‑order tensor. Two complementary feature streams are extracted: (1) appearance features obtained by applying 16 Gabor filters to each frame and subtracting the neutral‑expression response, producing “difference Gabor responses” that are robust to illumination changes; (2) geometric features derived from a 13‑point Candide‑3 grid placed on the first frame and tracked across the sequence using a pyramidal optical‑flow algorithm, yielding per‑frame point displacements. In the resulting tensor, two dimensions index spatial pixel locations, the temporal dimension encodes frame order, and the fourth (channel) dimension holds the Gabor filter responses.
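The appearance stream can be sketched as follows: convolve each frame with a small Gabor bank, stack the responses into a (time × height × width × filter) tensor, and subtract the first (neutral) frame's response per filter. The kernel parameters, the FFT-based circular convolution, and the function names are all simplifying assumptions; the paper's actual 16-filter bank and boundary handling are not specified here.

```python
import numpy as np

def gabor_kernel(size, theta, freq=0.25, sigma=2.0):
    """Real part of a Gabor filter (hypothetical parameters)."""
    y, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def sequence_tensor(frames, kernels):
    """Encode a sequence as a 4th-order tensor (T, H, W, filters), with the
    neutral (first) frame's response subtracted: 'difference Gabor responses'."""
    def respond(img, k):
        # circular FFT convolution, same output size (numpy only, for brevity)
        K = np.zeros_like(img)
        s = k.shape[0]
        K[:s, :s] = k
        return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(K)))
    resp = np.stack([[respond(f, k) for k in kernels] for f in frames])  # T,F,H,W
    resp = resp - resp[0]                       # subtract neutral-frame response
    return np.transpose(resp, (0, 2, 3, 1))     # reorder to T,H,W,F
```

Subtracting the neutral response cancels per-pixel illumination bias shared by all frames, which is the robustness property the summary attributes to the difference responses.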
MBDA reduces the tensor’s dimensionality by iteratively solving four coupled j‑mode optimization problems (j = 1…4). For each mode the tensor is unfolded, and generalized within‑class and between‑class scatter matrices are constructed based on the asymmetric criterion. The resulting generalized eigenvalue problem yields a projection matrix for that mode; the process repeats until convergence, producing four projection matrices that jointly compress the tensor in all directions. Regularization and weighting of negative samples are incorporated to avoid singularities and to control the influence of the heterogeneous negative class.
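The alternating j-mode procedure can be sketched in numpy: for each mode, project the tensors along all *other* modes with the current estimates, unfold along the active mode, build the asymmetric scatter matrices, and keep the top generalized eigenvectors. This is a simplified illustration under assumed interfaces (tensor lists in, one projection matrix per mode out), not the paper's exact update rule or convergence test.

```python
import numpy as np

def unfold(T, mode):
    """Matricize a tensor along `mode` (mode fibers become rows)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    """Multiply tensor T by matrix M (d_new x d_mode) along `mode`."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def mbda(pos, neg, dims, iters=3, eps=1e-3):
    """Alternating j-mode sketch of an MBDA-style optimization.
    pos/neg: lists of 4th-order tensors; dims: target size per mode."""
    shape = pos[0].shape
    U = [np.eye(s)[:, :d] for s, d in zip(shape, dims)]   # truncated-identity init
    for _ in range(iters):
        for j in range(4):
            def proj(T):                       # project every mode except j
                for m in range(4):
                    if m != j:
                        T = mode_multiply(T, U[m].T, m)
                return unfold(T, j)
            mu = sum(proj(T) for T in pos) / len(pos)     # positive mean, mode j
            Sp = sum((proj(T) - mu) @ (proj(T) - mu).T for T in pos)
            Sn = sum((proj(T) - mu) @ (proj(T) - mu).T for T in neg)
            Sp = Sp + eps * np.eye(Sp.shape[0])           # avoid singularity
            vals, vecs = np.linalg.eig(np.linalg.solve(Sp, Sn))
            U[j] = vecs.real[:, np.argsort(-vals.real)[:dims[j]]]
    return U
```

Each mode's problem is a small generalized eigenvalue problem on a scatter matrix the size of that single mode, which is why the multilinear formulation sidesteps the curse of dimensionality that a fully vectorized BDA would face.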
After projection, the reduced‑dimensional tensor is vectorized and fed to a Support Vector Machine with a Gaussian kernel. Experiments were conducted on the Cohn‑Kanade database (490 sequences from 97 subjects). For upper‑face AUs (e.g., brow raisers) and lower‑face AUs (e.g., mouth stretch), 300 sequences were used for training and the remainder for testing, ensuring subject independence. The authors compare MBDA against three baselines that use the same Gabor and geometric features: (i) 2‑D BDA, (ii) MDA applied directly to the fourth‑order tensor, and (iii) a two‑stage approach in which 2‑D BDA is applied to each frame’s Gabor response, followed by a conventional BDA on the concatenated vectors.
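The final classification stage reduces to four mode products, a ravel, and an RBF-kernel SVM. The sketch below assumes projection matrices `U` as produced by an MBDA-style routine and uses scikit-learn's `SVC` for the Gaussian-kernel SVM; the function name and the `gamma="scale"` default are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC

def classify_tensors(train, labels, test, U):
    """Project each 4th-order tensor with the learned per-mode matrices U,
    vectorize, and train a Gaussian (RBF) kernel SVM. Sketch only."""
    def features(T):
        for m, Um in enumerate(U):             # four tensor-matrix products
            T = np.moveaxis(np.tensordot(Um.T, T, axes=(1, m)), 0, m)
        return T.ravel()                       # vectorize the reduced tensor
    Xtr = np.stack([features(T) for T in train])
    Xte = np.stack([features(T) for T in test])
    clf = SVC(kernel="rbf", gamma="scale").fit(Xtr, labels)
    return clf.predict(Xte)
```

Since the four matrices are learned offline, test-time cost is dominated by these small tensor-matrix multiplications, consistent with the near-real-time figures reported below.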
Results show that MBDA achieves average recognition rates of 89.2 % for upper‑face AUs and 96.4 % for lower‑face AUs, with false‑alarm rates of 6.7 % and 2.1 % respectively. These figures surpass the baselines, which suffer either from loss of temporal structure (2‑D BDA + BDA) or from symmetric treatment of the negative class (MDA). The authors also note that, after the four projection matrices are learned offline, testing requires only four tensor‑matrix multiplications, allowing the entire pipeline (including Gabor filtering and optical‑flow tracking) to run in under two seconds on modest hardware, making real‑time deployment feasible.
The paper’s contributions are threefold: (1) the formulation of a biased, asymmetric discriminant criterion in a multilinear tensor setting (MBDA); (2) a hybrid feature representation that combines illumination‑robust Gabor differences with precise geometric motion cues; and (3) a practical system that delivers state‑of‑the‑art AU recognition performance while remaining computationally tractable for real‑time applications. The authors discuss limitations such as the high computational cost of eigen‑decomposition on large tensors during training and suggest future work involving GPU acceleration, incorporation of additional modalities (color, depth), and nonlinear kernel extensions to further enhance robustness and discriminative power. Overall, the study demonstrates that respecting the asymmetry between positive and negative samples and preserving the multi‑dimensional structure of facial video data lead to significant gains in AU representation and recognition.