VRSL: Exploring the Comprehensibility of 360-Degree Camera Feeds for Sign Language Communication in Virtual Reality
This study explores integrating sign language into virtual reality (VR) by examining the comprehensibility and user experience of viewing American Sign Language (ASL) videos captured with body-mounted 360-degree cameras. Ten participants identified ASL signs from videos recorded at three body-mounted positions: head, shoulder, and chest. Results showed the shoulder-mounted camera achieved the highest accuracy (85%), though differences between positions were not statistically significant. Participants reported that peripheral distortion in the 360-degree videos reduced clarity. Despite these challenges, the overall comprehension rate of 83.3% demonstrates the potential of video-based ASL communication in VR. Feedback emphasized the need to refine camera angles, reduce distortion, and explore alternative mounting positions. Participants preferred signing over text-based communication in VR, underscoring the value of developing this approach to enhance accessibility and collaboration for Deaf and Hard of Hearing (DHH) users in virtual environments.
💡 Research Summary
This paper investigates the feasibility of delivering American Sign Language (ASL) within virtual reality (VR) by using body‑mounted 360‑degree cameras to record signers and presenting the footage to Deaf and Hard‑of‑Hearing (DHH) participants. The authors address three research questions: (1) Can 360‑degree video feeds accurately capture and convey ASL in VR? (2) Which mounting position (head, shoulder, chest) yields the clearest view of the signs? (3) How does this method compare with existing VR communication tools?
A within‑subjects experiment was conducted with ten DHH participants (six female, four male, ages 18‑55). A single professional signer performed a set of ASL items—including five minimal‑pair words, two non‑minimal‑pair words, and three short sentences—for each camera position, resulting in 30 trials per participant. The videos were captured using a Ricoh Theta V 360‑degree camera, mounted on the signer's head, shoulder, or chest via adjustable straps, then converted to equirectangular format and displayed in a Unity‑based VR application on a Meta Quest headset. Participants viewed each clip, selected the correct meaning from multiple‑choice options, completed a NASA‑TLX workload questionnaire after each condition, and provided qualitative feedback at the end. The entire session lasted about 20–30 minutes.
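As background for how such footage is displayed: a VR renderer (e.g., a Unity skybox or textured sphere) samples the equirectangular frame by converting each view ray into texture coordinates. The paper does not include implementation code; the minimal Python/NumPy sketch below illustrates that mapping, with a hypothetical helper name not taken from the paper:

```python
import numpy as np

def ray_to_equirect_uv(direction):
    """Map a unit view ray to (u, v) texture coordinates in an
    equirectangular 360-degree frame, as a VR skybox shader would.

    Hypothetical helper for illustration; not from the paper.
    """
    x, y, z = direction / np.linalg.norm(direction)
    lon = np.arctan2(x, z)                   # longitude in (-pi, pi]
    lat = np.arcsin(np.clip(y, -1.0, 1.0))   # latitude in [-pi/2, pi/2]
    u = lon / (2 * np.pi) + 0.5              # horizontal texture coord [0, 1]
    v = 0.5 - lat / np.pi                    # vertical texture coord [0, 1]
    return u, v

# Example: a ray pointing straight ahead lands at the frame's center.
print(ray_to_equirect_uv(np.array([0.0, 0.0, 1.0])))  # -> (0.5, 0.5)
```

This geometry also explains the distortion participants reported: regions far from the camera's optical center are stretched nonlinearly across the equirectangular frame.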
Accuracy results showed the shoulder‑mounted camera achieved the highest mean recognition rate at 85 %, followed by head (≈80 %) and chest (≈75 %). Statistical analysis indicated that differences among positions were not significant (p > 0.05), likely due to the small sample size and limited statistical power. Overall comprehension across all conditions was 83.3 %. NASA‑TLX scores suggested the shoulder position imposed the lowest perceived workload, while the head position sometimes obscured the non‑dominant hand and the chest position suffered from pronounced peripheral distortion. Qualitative comments highlighted that 360‑degree peripheral distortion, especially on the opposite side of the body and for facial expressions, reduced clarity. Participants uniformly expressed a preference for sign‑based communication over text chat, emphasizing the naturalness and expressiveness of ASL in VR.
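The summary does not name the statistical test used. For a within‑subjects design with three conditions and ten participants, a non‑parametric Friedman test is one plausible choice; the sketch below uses placeholder accuracy values, not the study's raw data:

```python
from scipy.stats import friedmanchisquare

# Hypothetical per-participant accuracy (proportion correct) for each
# camera position; ten participants, within-subjects design. These are
# placeholder values, not the paper's data.
head     = [0.8, 0.7, 0.9, 0.8, 0.8, 0.7, 0.9, 0.8, 0.8, 0.8]
shoulder = [0.9, 0.8, 0.9, 0.8, 0.9, 0.8, 0.9, 0.8, 0.9, 0.8]
chest    = [0.7, 0.7, 0.8, 0.8, 0.7, 0.8, 0.7, 0.8, 0.7, 0.8]

# Friedman test: non-parametric repeated-measures comparison across the
# three conditions (suited to small samples and bounded accuracy scores).
stat, p = friedmanchisquare(head, shoulder, chest)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.3f}")
```

With n = 10, such a test has limited power to detect a roughly 10-point spread in accuracy, which is consistent with the non-significant result the authors report.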
The study contributes a novel baseline for video‑based sign language delivery in VR, sidestepping the heavy computational demands of hand‑tracking avatars and the limitations of captioning for DHH‑to‑DHH interaction. However, several limitations are evident: the experiment used pre‑recorded clips rather than live streaming, involved only one signer, and did not systematically vary lighting, background complexity, or signer physique. The modest participant pool restricts generalizability, and the sparse reporting of workload data hampers a full understanding of cognitive demands.
Future work should (1) enlarge the participant cohort and include multiple signers to improve statistical robustness; (2) develop a real‑time streaming pipeline and measure latency, bandwidth, and frame‑rate impacts; (3) apply distortion‑correction algorithms or higher‑resolution fisheye lenses to mitigate peripheral warping (a geometric sketch of one such correction follows below); (4) explore hybrid mounting configurations (e.g., combined shoulder‑and‑chest rigs) or wearable necklaces for more ergonomic deployment; and (5) integrate multimodal interfaces that combine sign video, text, and optional audio for mixed‑ability groups. By addressing these avenues, the approach could substantially enhance social presence, collaboration efficiency, and accessibility for DHH users in immersive virtual environments.
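Regarding item (3): one common software‑side correction is to resample a rectilinear (perspective) view from the equirectangular frame around a region of interest such as the signer's hands, which removes the peripheral warping participants described. A minimal NumPy sketch, with assumed parameter names and defaults rather than the paper's pipeline:

```python
import numpy as np

def equirect_to_perspective(frame, fov_deg=60, yaw_deg=0, pitch_deg=0,
                            out_w=640, out_h=480):
    """Resample a perspective (rectilinear) view from an equirectangular
    360-degree frame, straightening warping around a chosen direction.

    Illustrative sketch only; names and defaults are assumptions.
    """
    h, w = frame.shape[:2]
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)  # pinhole focal length

    # Build a unit view ray for every output pixel in camera coordinates.
    xs = np.arange(out_w) - out_w / 2
    ys = np.arange(out_h) - out_h / 2
    xv, yv = np.meshgrid(xs, ys)
    dirs = np.stack([xv, -yv, np.full_like(xv, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays toward the requested yaw/pitch.
    cy, sy = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
    cp, sp = np.cos(np.radians(pitch_deg)), np.sin(np.radians(pitch_deg))
    x, y, z = dirs[..., 0], dirs[..., 1], dirs[..., 2]
    y, z = y * cp - z * sp, y * sp + z * cp   # pitch about the x-axis
    x, z = x * cy + z * sy, -x * sy + z * cy  # yaw about the y-axis

    # Convert rays to equirectangular pixel coordinates and sample.
    lon = np.arctan2(x, z)
    lat = np.arcsin(np.clip(y, -1.0, 1.0))
    u = ((lon / (2 * np.pi) + 0.5) * (w - 1)).astype(int)
    v = ((0.5 - lat / np.pi) * (h - 1)).astype(int)
    return frame[v, u]
```

In a live pipeline this resampling would run per frame (typically on the GPU with bilinear filtering); the nearest‑neighbor CPU version above is only meant to show the geometry.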