FOODER: Real-time Facial Authentication and Expression Recognition

Reading time: 6 minutes

📝 Abstract

Out-of-distribution (OOD) detection is essential for the safe deployment of neural networks, as it enables the identification of samples outside the training domain. We present FOODER, a real-time, privacy-preserving radar-based framework that integrates OOD-based facial authentication with facial expression recognition. FOODER operates using low-cost frequency-modulated continuous-wave (FMCW) radar and exploits both range-Doppler and micro range-Doppler representations. The authentication module employs a multi-encoder multi-decoder architecture with Body Part (BP) and Intermediate Linear Encoder-Decoder (ILED) components to classify a single enrolled individual as in-distribution while detecting all other faces as OOD. Upon successful authentication, an expression recognition module is activated. Concatenated radar representations are processed by a ResNet block to distinguish between dynamic and static facial expressions. Based on this categorization, two specialized MobileViT networks are used to classify dynamic expressions (smile, shock) and static expressions (neutral, anger). This hierarchical design enables robust facial authentication and fine-grained expression recognition while preserving user privacy by relying exclusively on radar data. Experiments conducted on a dataset collected with a 60 GHz short-range FMCW radar demonstrate that FOODER achieves an AUROC of 94.13% and an FPR95 of 18.12% for authentication, along with an average expression recognition accuracy of 94.70%. FOODER outperforms state-of-the-art OOD detection methods and several transformer-based architectures while operating efficiently in real time.

📄 Content

Facial Expression Recognition (FER) and facial authentication are two fundamental tasks within human-centered computing, enabling systems to interpret emotional states and verify individual identities, respectively. FER focuses on identifying emotional cues from facial expressions [1], playing a key role in enhancing Human-Computer Interaction (HCI) [2], virtual reality experiences [3], and digital entertainment applications [4], [5]. Meanwhile, facial authentication systems aim to verify user identity, supporting applications in security, device access, and digital identity management.

As a critical and popular computer vision application, facial authentication is now widely integrated into everyday technologies. For instance, many smartphones employ facial authentication via front-facing cameras to unlock screens, ensuring only authorized users can access the device. Numerous research efforts [6], [7], [8] and practical software solutions have utilized RGB image-based sensors for this purpose. Although these systems have achieved impressive performance, they inherently rely on visual data, raising significant privacy concerns. Despite the success of vision-based methods in both FER [9], [10], [11] and facial authentication, they remain vulnerable to environmental factors such as lighting variations, occlusion, and pose changes. Furthermore, capturing and storing visual information raises concerns around user privacy and data security, especially as computer vision technologies become more pervasive across industries like autonomous driving, healthcare, and retail.

Radar sensors have gained increasing popularity due to their resilience under challenging environmental conditions such as fog, smoke, and poor lighting, and their inherent ability to preserve user privacy. These properties make radar an attractive sensing modality for a wide range of applications, including human presence detection, people counting, gesture recognition, and even vital sign monitoring [12], [13], [14], [15]. Short-range radar systems, in particular, are becoming indispensable tools in both academic research and industrial solutions, especially for indoor environments. Facial authentication, a binary classification task, naturally fits within the framework of One-vs-All (OvA) problems. However, given the practically infinite number of "all" class members (i.e., individuals who are not the target identity), facial authentication inherently becomes an out-of-distribution (OOD) detection problem. Traditional classification models, designed to recognize only a closed set of known classes, struggle with unknown identities. OOD detection reframes the facial authentication challenge as a binary classification task: distinguishing in-distribution (ID) samples, belonging to the enrolled user, from OOD samples, representing all other individuals. OOD detection [16], [17], [18], [19], [20] is critical for safely deploying machine learning models, especially in safety-critical applications such as autonomous driving, healthcare, robotics, and biometric authentication. Without effective OOD handling, neural networks are prone to making overconfident and potentially catastrophic predictions on unfamiliar inputs. In a typical OOD detection setting, a detector assigns a confidence score to each sample. A threshold is determined on a validation set containing both ID and OOD examples (e.g., ensuring a 95% true positive rate for ID samples). During testing, if a sample's score exceeds the threshold, it is classified as ID; otherwise, it is rejected as OOD.
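The thresholding protocol described above can be sketched in a few lines. This is a generic illustration of score-based OOD detection, not code from the paper; the convention that higher scores mean "more in-distribution" is an assumption, and the 95% target mirrors the FPR95-style setup mentioned in the text.

```python
import numpy as np

def fit_threshold(id_scores, tpr=0.95):
    """Pick the score threshold that retains `tpr` of ID validation samples.

    Assumes higher scores indicate in-distribution samples, so the
    threshold is the (1 - tpr) quantile of the ID scores: 95% of ID
    samples score at or above it.
    """
    return np.quantile(np.asarray(id_scores), 1.0 - tpr)

def classify(scores, threshold):
    """True where a sample is accepted as in-distribution (ID)."""
    return np.asarray(scores) >= threshold
```

With the threshold fixed on validation data, test-time samples scoring below it are rejected as OOD, which is exactly the accept/reject decision the paragraph describes.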

This work proposes a radar-based framework that unifies facial authentication and facial expression recognition within a single, privacy-preserving system. By leveraging low-cost frequency-modulated continuous-wave (FMCW) radar and advanced deep learning architectures, we ensure both secure identity verification and rich emotional understanding without relying on visual imagery. Our system uses radar range-Doppler images (RDIs) and micro range-Doppler images (micro-RDIs) as inputs.
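As a rough illustration of how an RDI input is formed from raw FMCW data, the standard recipe is a range FFT along fast time followed by a Doppler FFT along slow time. The windowing choices and processing order below are a common convention, not details taken from the paper, and micro-RDI extraction is omitted.

```python
import numpy as np

def range_doppler_image(frame):
    """Compute a magnitude range-Doppler image from one FMCW radar frame.

    `frame` has shape (num_chirps, samples_per_chirp): slow time along
    axis 0, fast time along axis 1. Hanning windows are an assumed but
    typical choice to suppress spectral leakage.
    """
    chirps, samples = frame.shape
    # Range FFT along fast time (one FFT per chirp).
    win_fast = np.hanning(samples)
    range_fft = np.fft.fft(frame * win_fast, axis=1)
    # Doppler FFT along slow time (one FFT per range bin),
    # shifted so zero Doppler sits in the center row.
    win_slow = np.hanning(chirps)[:, None]
    doppler_fft = np.fft.fftshift(np.fft.fft(range_fft * win_slow, axis=0),
                                  axes=0)
    return np.abs(doppler_fft)  # shape (num_chirps, samples_per_chirp)
```

Each such RDI (and its micro-RDI counterpart) then serves as an image-like input to the downstream networks.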

For facial authentication, we adopt a reconstruction-based OOD detection strategy. Specifically, we utilize a multi-encoder, multi-decoder architecture. The system assigns reconstruction error-based scores to each sample to determine whether it belongs to the ID class (scores below a predefined threshold) or to the OOD class (scores above the threshold).
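The reconstruction-error scoring rule reduces to a small amount of code. The sketch below uses a generic `reconstruct` callable as a stand-in for the trained multi-encoder, multi-decoder network, which is an assumption for illustration; only the scoring and decision logic follow the text.

```python
import numpy as np

def reconstruction_scores(x, reconstruct):
    """Per-sample mean-squared reconstruction error.

    `reconstruct` maps a batch of inputs to their reconstructions
    (a placeholder here for the trained autoencoder-style network).
    """
    x = np.asarray(x, dtype=float)
    x_hat = reconstruct(x)
    return ((x - x_hat) ** 2).reshape(len(x), -1).mean(axis=1)

def is_enrolled_user(x, reconstruct, threshold):
    """ID (enrolled user) when the error stays below the threshold."""
    return reconstruction_scores(x, reconstruct) < threshold
```

The intuition is that a network trained only on the enrolled user reconstructs that user's radar signatures well (low error, ID) and reconstructs everyone else poorly (high error, OOD).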

The facial expression recognition module is activated if the detected sample belongs to the ID class. First, RDIs and micro-RDIs are concatenated and passed through a ResNet block to classify the expression as dynamic or static. Based on this classification, two specialized MobileViT networks are employed: one trained to distinguish dynamic expressions (smile and shock) and the other trained for static expressions (anger and neutral). This two-stage pipeline ensures accurate recognition of the user's emotional state while maintaining real-time performance and preserving user privacy.
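The two-stage dispatch amounts to a simple control flow. In the sketch below, `gate`, `dynamic_clf`, and `static_clf` are placeholder callables standing in for the trained ResNet block and the two MobileViT networks; only the routing logic and the class labels come from the text.

```python
def recognize_expression(features, gate, dynamic_clf, static_clf):
    """Two-stage expression recognition on concatenated radar features.

    `gate` decides "dynamic" vs. "static"; the matching specialized
    classifier then produces the fine-grained label. Expressions follow
    the paper's four classes: smile, shock, neutral, anger.
    """
    if gate(features) == "dynamic":
        return dynamic_clf(features)  # "smile" or "shock"
    return static_clf(features)       # "neutral" or "anger"
```

Splitting the problem this way lets each MobileViT specialize on a two-class task instead of one network solving a harder four-class problem.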

This content is AI-processed based on ArXiv data.
