Cs.MM

All posts under category "Cs.MM"

3 posts total
Sorted by date
QoE-Driven Coupled Uplink and Downlink Rate Adaptation for 360-Degree Video Live Streaming

360-degree video provides an immersive viewing experience and has been widely used in many areas. A 360-degree video live streaming system involves capture, compression, and both uplink (camera to video server) and downlink (video server to user) transmission. However, few studies have investigated such complex systems jointly, especially rate adaptation for the coupled uplink and downlink in 360-degree video streaming under limited bandwidth. In this letter, we propose a quality of experience (QoE)-driven 360-degree video live streaming system in which a video server performs rate adaptation based on the uplink and downlink bandwidths and on each user's real-time field of view (FOV). We formulate the rate adaptation as a nonlinear integer programming problem and propose an algorithm that combines the Karush-Kuhn-Tucker (KKT) conditions with the branch-and-bound method to solve it. The numerical results show that the proposed optimization model improves users' QoE significantly in comparison with baseline schemes.
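As a rough illustration of the kind of discrete rate selection the abstract describes, the sketch below picks one bitrate level per user to maximise a concave utility under a single bandwidth budget, using a plain branch-and-bound search. Everything here is an assumption made for illustration: the names `best_rates`, `weights`, `levels`, and `budget`, the logarithmic utility, and the single shared budget are not taken from the paper. The pruning bound is a simple optimistic estimate, not the KKT-based relaxation the authors use.

```python
import math

def best_rates(weights, levels, budget):
    """Pick one rate per user from `levels` to maximise sum(w * log(1 + r))
    subject to sum(r) <= budget. Returns (best_value, best_choice).
    Hypothetical toy model, not the formulation from the paper."""
    levels = sorted(levels)
    n = len(weights)
    # Optimistic value of users i..n-1 if each received the top level,
    # ignoring the budget; this is a valid upper bound for pruning.
    suffix = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + weights[i] * math.log1p(levels[-1])
    best = [-math.inf, None]

    def search(i, used, value, chosen):
        if i == n:
            if value > best[0]:
                best[0], best[1] = value, list(chosen)
            return
        # Bound step: prune if even the optimistic completion cannot
        # improve on the best solution found so far.
        if value + suffix[i] <= best[0]:
            return
        # Branch step: try the highest levels first.
        for r in reversed(levels):
            # Skip choices that leave too little budget for the
            # remaining users to receive at least the lowest level.
            if used + r + (n - i - 1) * levels[0] > budget:
                continue
            chosen.append(r)
            search(i + 1, used + r, value + weights[i] * math.log1p(r), chosen)
            chosen.pop()

    search(0, 0.0, 0.0, [])
    return best[0], best[1]

value, choice = best_rates(weights=[1.0, 1.0], levels=[1, 2, 4], budget=5)
print(choice, round(value, 3))
```

If no assignment fits the budget, the function returns `(-inf, None)`. A tighter bound, such as one derived from a continuous relaxation, would prune more of the search tree; the loose bound is used here only to keep the sketch short.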

paper research
Virtual Reality Renderings of World Maps: Comparing 3D Exocentric Globes, Flat Maps, Egocentric 3D Globes, and Curved Maps

This paper explores different ways to render world-wide geographic maps in virtual reality (VR). We compare (a) a 3D exocentric globe, where the user's viewpoint is outside the globe; (b) a flat map (rendered to a plane in VR); (c) an egocentric 3D globe, with the viewpoint inside the globe; and (d) a curved map, created by projecting the map onto a section of a sphere which curves around the user. In all four visualisations the geographic centre can be smoothly adjusted with a standard handheld VR controller, and the user, through a head-tracked headset, can physically move around the visualisation. For distance comparison, the exocentric globe is more accurate than the egocentric globe and the flat map. For area comparison, more time is required with the exocentric and egocentric globes than with the flat and curved maps. For direction estimation, the exocentric globe is more accurate and faster than the other visual presentations. Our study participants had a weak preference for the exocentric globe. Generally, the curved map had benefits over the flat map. In almost all cases the egocentric globe was found to be the least effective visualisation. Overall, our results provide support for the use of exocentric globes for geographic visualisation in mixed-reality.

paper research
Indian EmoSpeech Command Dataset: A Real-World Dataset for Emotion-Based Speech Recognition

Speech emotion analysis is an important task that enables several application use cases. The non-verbal sounds within speech utterances also play a pivotal role in emotion analysis. With the widespread use of smartphones, it has become viable to analyze speech commands captured by their microphones for emotion understanding, using on-device machine learning models. The non-verbal information includes the environmental background sounds, which describe the type of surroundings, the current situation, and the activities being performed. In this work, we consider both verbal sounds (speech commands) and non-verbal sounds (background noises) within an utterance for emotion analysis in real-life scenarios. We create an indigenous dataset for this task, namely the Indian EmoSpeech Command Dataset. It contains keywords with diverse emotions and background sounds, presented to explore new challenges in audio analysis. We compare exhaustively against various baseline models for emotion analysis on speech commands across several performance metrics. We demonstrate a significant average gain of 3.3% in top-one score over a subset of the speech command dataset for keyword spotting.

paper research
