Embedded vs. Situated: An Evaluation of AR Facial Training Feedback
While augmented reality (AR) research demonstrates benefits of embedded visualizations for gross motor training, their applicability to facial exercises remains under-explored. Providing effective real-time feedback for facial muscle training presents unique design challenges, given the complexity of facial musculature. We developed three AR feedback approaches varying in spatial relationship to the user: situated (screen-fixed), proxy-embedded (on a mannequin), and fully embedded (overlaid on the user’s face). In a within-subjects study (N=24), we measured exercise accuracy, cognitive load, and user preference during facial training tasks. The embedded feedback reduced cognitive load and received higher preference ratings, while the situated feedback enabled more precise corrections and higher accuracy. Qualitative analysis revealed a key design tension: embedded feedback improved experience but created self-consciousness and interpretive difficulty. We distill these insights into design considerations addressing the trade-offs for facial training systems, with implications for rehabilitation, performance training, and motor skill acquisition.
💡 Research Summary
This paper investigates how the spatial placement of augmented‑reality (AR) feedback influences performance, cognitive load, and user preference during facial muscle training. While AR‑based embedded visualizations have been shown to improve gross‑motor learning, their applicability to the fine‑grained movements of the face has not been studied. The authors therefore designed three distinct feedback modalities that differ in their “WHERE” relationship to the user’s face:
- Situated (BarChart) – a traditional screen‑anchored bar chart placed at the bottom of the display, showing proportional activation levels for each target muscle.
- Proxy‑Embedded (Mannequin) – a 3D avatar (a plain‑textured mannequin) that matches the user’s facial geometry; muscle activation is visualized on the avatar using colored bars.
- Fully‑Embedded (ARSelfie) – a direct overlay on the user’s own face, achieved by leveraging Google ARCore’s AugmentedFaces API, which provides a dense 468‑point 3D face mesh. Specific landmark clusters are mapped to facial muscles, and real‑time displacement of these clusters is converted into activation scores that are rendered as color‑coded overlays directly on the skin.
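The landmark-cluster-to-muscle idea described above can be sketched as a small function that measures how far a cluster of face-mesh vertices has moved from the neutral expression. Note that the vertex indices and cluster names below are purely illustrative assumptions; the paper's surgeon-validated mapping is not reproduced here.

```python
import numpy as np

# Hypothetical clusters: mesh vertex indices assumed to move with each muscle.
# (Illustrative indices only -- not the paper's validated mapping.)
MUSCLE_CLUSTERS = {
    "frontalis": [10, 67, 109, 338],     # brow-region vertices (assumed)
    "zygomaticus": [61, 291, 405, 181],  # cheek/mouth-corner vertices (assumed)
}

def cluster_displacement(mesh_now, mesh_neutral, vertex_ids):
    """Mean Euclidean displacement of a landmark cluster vs. the neutral mesh.

    mesh_now, mesh_neutral: (468, 3) arrays of 3D face-mesh vertex positions,
    such as those provided per frame by a dense face-tracking API.
    """
    now = mesh_now[vertex_ids]
    neutral = mesh_neutral[vertex_ids]
    return float(np.linalg.norm(now - neutral, axis=1).mean())
```

In a live system this would run per frame, feeding the displacement of each cluster into the activation-score computation.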
A fourth condition, Baseline, shows only the mirrored video feed without any visual augmentation. The system was implemented as a mobile Unity application running on a front‑facing smartphone camera. During a calibration phase, each participant recorded a neutral expression and a maximal expression for each target muscle; the Euclidean distance between these two states defined a personalized activation range.
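The calibration step described above amounts to normalizing each raw cluster displacement into a personalized [0, 1] activation. A minimal sketch, assuming the per-muscle neutral and maximal displacements recorded during calibration:

```python
def activation_score(displacement, neutral_disp, max_disp):
    """Map a raw cluster displacement into a personalized [0, 1] activation.

    neutral_disp / max_disp come from the calibration phase: the cluster
    displacement recorded at rest and at maximal voluntary contraction.
    """
    rng = max_disp - neutral_disp
    if rng <= 0:  # degenerate calibration (no measurable range); treat as inactive
        return 0.0
    score = (displacement - neutral_disp) / rng
    return min(max(score, 0.0), 1.0)  # clamp to the calibrated range
```

The clamping ensures that slight tracking noise below the neutral baseline, or movement beyond the calibrated maximum, still yields a well-defined score for the color-coded overlays.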
The study employed a within‑subjects design with 24 participants (gender‑balanced university students). Each participant performed three standardized facial exercises (eyebrow lift, wide smile, and lip stretch) under all four conditions in a counter‑balanced order. The authors measured:
- Exercise accuracy – proportion of muscles activated correctly per repetition and the number of correctly executed repetitions.
- Cognitive and task load – using the NASA‑TLX questionnaire, which captures dimensions such as mental demand, physical demand, and perceived performance.
- User preference – a 7‑point Likert scale assessing overall satisfaction and willingness to reuse the system.
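The accuracy measure in the list above can be sketched as a simple per-repetition score. The 0.5 activation threshold here is an assumption for illustration; the paper's exact scoring criterion may differ.

```python
def repetition_accuracy(activations, targets, threshold=0.5):
    """Fraction of target muscles whose activation crossed the threshold
    during one repetition of an exercise.

    activations: dict mapping muscle name -> [0, 1] activation score.
    targets: the muscles the exercise requires (assumed non-empty).
    """
    hits = sum(1 for m in targets if activations.get(m, 0.0) >= threshold)
    return hits / len(targets)
```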
Quantitative Findings
- Accuracy was highest for the Situated BarChart condition; participants could more precisely identify which muscle needed correction because the numeric bar representation was unambiguous, even though it required a visual shift away from the face.
- Cognitive load was lowest for the Fully‑Embedded ARSelfie condition; the direct co‑location of feedback with the body reduced the perception‑action gap, making the task feel more “natural.” The Mannequin condition fell in the middle, offering some reduction in load without the full intimacy of on‑face overlays.
- Preference ratings favored the Fully‑Embedded condition, followed by the Mannequin; the BarChart received the lowest preference despite its accuracy advantage.
Qualitative Insights
Participants reported a tension between the immersive benefit of on‑face feedback and feelings of self‑consciousness. Some found the colored overlays “fun” and “intuitive,” while others described them as “embarrassing” or “distracting” when trying to maintain a natural expression. The BarChart was praised for its clarity (“I could see exactly how much each muscle was activated”), yet participants noted the need to constantly shift gaze between their face and the chart, increasing mental effort.
Design Guidelines Derived
- Prioritize Accuracy for Clinical/Rehabilitation – Use Situated or Proxy‑Embedded visualizations where precise muscle activation is critical; provide clear numeric cues.
- Prioritize Experience for Entertainment or Skill‑Motivation – Deploy Fully‑Embedded overlays to minimize cognitive load and increase engagement.
- Adjust Visual Intensity – Offer sliders for transparency, color saturation, or size so users can modulate how prominent the on‑face feedback is, mitigating self‑consciousness.
- Stage‑Based Transition – Introduce novices to the system with a Proxy‑Embedded mannequin to reduce “mirror anxiety,” then transition to Fully‑Embedded overlays as proficiency grows.
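The “Adjust Visual Intensity” guideline can be sketched as a small color-mapping function with user-adjustable knobs. The green-to-red scheme, parameter names, and default values below are assumptions for illustration, not the paper's implementation.

```python
def overlay_color(activation, opacity=0.6, saturation=1.0):
    """RGBA for an on-face overlay: green (low) -> red (high) activation.

    opacity and saturation are the user-adjustable sliders suggested by the
    'Adjust Visual Intensity' guideline; lowering either makes the overlay
    less prominent, mitigating self-consciousness.
    """
    activation = min(max(activation, 0.0), 1.0)
    r = activation * saturation
    g = (1.0 - activation) * saturation
    return (r, g, 0.0, opacity)
```

A self-conscious user might, for example, drop opacity to 0.2 so the overlay reads as a faint tint rather than a bold mask.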
Implications
The work demonstrates that the same AR feedback paradigms successful for limb training can be adapted to facial exercises, provided that designers account for the unique social and perceptual dynamics of the face. By combining mobile ARCore facial tracking with a simple landmark‑to‑muscle mapping validated by a facial‑nerve surgeon, the authors show that real‑time muscle activation estimation is feasible without external sensors. The findings are relevant for a range of applications: facial‑paralysis rehabilitation, actor or public‑speaker training, and even consumer wellness apps that aim to improve expressive capabilities.
In summary, the paper contributes (1) three concrete AR visualizations for fine‑grained facial training, (2) empirical evidence that embedded feedback improves user experience but may sacrifice precision, and (3) actionable design guidelines that balance accuracy, interpretability, and comfort for future facial‑training AR systems.