SPHERICAL CLUSTERING OF USERS NAVIGATING 360° CONTENT

Silvia Rossi⋆, Francesca De Simone†, Pascal Frossard‡, Laura Toni⋆

⋆Department of Electronic & Electrical Engineering, UCL, London (UK)
†DIS, Centrum Wiskunde & Informatica, The Netherlands
‡LTS4, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
E-mails: {s.rossi, l.toni}@ucl.ac.uk, F.De.Simone@cwi.nl, pascal.frossard@epfl.ch

ABSTRACT

In Virtual Reality (VR) applications, understanding how users explore the visual content is important in order to optimize content creation and distribution, develop user-centric services, or even to detect disorders in medical applications. In this paper, we propose a graph-based method to identify clusters of users who are attending the same portion of spherical content, within one frame or a series of frames. With respect to other clustering methods, the proposed solution takes into account the spherical geometry of the content and correctly identifies clusters that group viewers who actually display the same portion of spherical content. Results, carried out by using a set of publicly available VR user navigation patterns, show that the proposed method identifies more meaningful clusters, i.e., clusters of users who are consistently attending the same portion of spherical content, with respect to other methods.

Index Terms — Virtual Reality, 360° video, user behaviour analysis, data clustering

1. INTRODUCTION

Virtual Reality (VR) systems are expected to become widespread in the near future, with applications in a variety of fields, ranging from entertainment to e-healthcare. These systems involve omnidirectional (i.e., 360°) videos, which are visual signals defined on a virtual sphere, depicting the 360° surrounding scene.
The viewer, virtually positioned at the centre of the sphere, can navigate the scene with three Degrees-of-Freedom (3-DoF), i.e., yaw, pitch and roll, by rotating their head and changing their viewing direction. This interactive navigation is typically enabled by a head-mounted display (HMD), which renders at each instant in time only the portion of the spherical content attended by the user, i.e., the viewport. Understanding how users explore VR content is important in order to optimize content creation [1] and distribution [2–6], develop user-centric services [7, 8], and even for medical applications that use VR to study psychiatric disorders [9]. In the last few years, many studies have appeared collecting and analysing the navigation patterns of users watching VR content [6, 8, 10–16]. Most studies build content-dependent saliency maps as the main outcome of their analysis, which compute the most probable region of the sphere attended by the viewers, based on their head or eye movements [6, 10, 17–19]. Some studies also provide additional quantitative analysis based on metrics such as the average angular velocity, frequency of fixation, and mean exploration angles [8, 13]. Models to predict future saliency maps have also been proposed [20–22]. Nevertheless, none of these studies performs clustering of the navigation patterns, i.e., none provides quantitative data indicating how many groups of users consistently share the same behaviour over time, by attending a significantly overlapping portion of the 360° content. This information can be useful in order to improve the accuracy and robustness of algorithms predicting users' navigation patterns. A proper clustering could also be useful to refine user-centric distribution strategies, where for example different groups of users might be served with high-quality content in the different portions of the sphere that will be more likely attended by the viewers.
To the best of our knowledge, studies identifying clusters for omnidirectional content delivery have appeared only recently [23, 24]. User clustering is employed to identify the number of Regions of Interest (RoIs) over time and to perform long-term prediction, associating to each user the future trajectory of the cluster that the user belongs to. In [23], the viewport center, i.e., the viewing direction of each user at each instant in time, is considered as a point on the equirectangular planar representation of the spherical content. These points on the plane are then clustered based on their Euclidean distance, which unfortunately ignores the actual spherical geometry of the navigation domain. Conversely, in [24] each user navigation pattern is modelled as independent trajectories in roll, pitch, and yaw angles, and spectral clustering is then applied. While it is efficient in discovering general trends of users' navigation, this clustering methodology might fail to identify clusters that are consistent in terms of actual overlap between viewports displayed by different users: users in the same cluster do not necessarily consume the same portion of content. At the same time, this consistency needs to be guaranteed for clustering methods to be used for prediction purposes or for implementing accurate user-based delivery strategies. The goal of this paper is to propose a novel clustering strategy able to detect meaningful clusters in the spherical domain. We consider as a meaningful cluster a set of users attending the same portion of spherical content at a given time instant or over a series of frames. This implies that the overlap between the viewports of all users in a cluster must be substantial. With this goal in mind, first we define a metric to quantify the geometric overlap between two viewports on the sphere (Section 2).
Then, we use this metric to build a graph whose nodes are the centers of the viewports associated to different users. Two nodes are connected only if the two corresponding viewports have a significant overlap (Section 3). Finally, we propose a clustering method based on the Bron-Kerbosch (BK) algorithm [25] to identify clusters that are cliques, i.e., sub-graphs of fully inter-connected nodes (Section 3). Results demonstrate the consistency of the proposed clustering method in identifying clusters where the overlap between the portions of the spherical surface corresponding to different viewports is higher than in state-of-the-art clustering (Section 4). In summary, the main contribution of this paper is to propose a clustering algorithm that i) considers the spherical geometry of the data, ii) identifies clusters in which there is a consistent and significant geometric overlap between the portions of spherical surface corresponding to viewports attended by different users (by imposing that clusters are cliques), and iii) can be applied to a single frame or to a series of frames. This is a useful new tool to improve the accuracy of user navigation prediction algorithms and user-dependent VR content delivery strategies, such as those proposed in [23, 24].

2. GEODESIC DISTANCE AS PROXY OF VIEWPORT OVERLAP

Our goal is to identify clusters of users who are displaying the same portion of spherical content within a frame or over a series of consecutive frames. We derive a similarity metric that reliably quantifies how similar the portions attended by two users are. More specifically, each user attends a portion of the spherical surface. This is the projection on the spherical surface of a plane tangent to the sphere (i.e., the viewport) at the point that identifies the user's viewing direction (the center of the viewport)¹.
The overlap between the viewports attended by two users at an instant in time is a clear indicator of how similar the users are with respect to their displayed viewports. For example, an overlap equal to the area of the viewport corresponds to two users attending exactly the same portion of visual content. The geometric overlap could be analytically computed, knowing the rotation associated with each user's head position (i.e., roll, pitch, and yaw) and the horizontal and vertical fields of view that define the viewport. However, this is non-trivial. Thus, we propose the simple and straightforward solution of using the geodesic distance between two viewport centres as a proxy for the viewport overlap. By geodesic distance we denote the length of the shortest arc connecting the viewport centers on the sphere. Such a distance is clearly an approximation of the actual area overlap: it does not account for the three degrees of freedom of the user's head rotation, which define the exact viewport. As a result, viewports whose centers have the same geodesic distance could correspond to a different viewport overlap (example in Figure 1). Nevertheless, the smaller the distance between viewport centers, the smaller the approximation error of the geodesic distance. As an example, Figure 2 shows the pairwise geodesic distance (in blue) and the pairwise area overlap (in red) between the viewport attended by one user and those of 58 other users, for a frame of a video sequence extracted from the public dataset proposed in [13]. The correlation between the two metrics is evident: if the overlap is high, the geodesic distance between the two viewport centres is low. In particular, a viewport area overlap larger than 75% of the viewport area corresponds to a geodesic distance smaller than 3π/4.
We are therefore interested in identifying a threshold value below which the geodesic distance is a robust proxy of the viewport overlap. To empirically define this threshold, we built the Receiver Operating Characteristic (ROC) curve as follows. We assume that two users are attending the same portion of content if their viewports overlap by at least O_th of the total viewport area. We then define a threshold value G_th for the geodesic distance, such that users are neighbours if their geodesic distance is below the threshold. Anytime users are neighbours but their overlap is less than O_th, we experience a false positive. Conversely, a true positive is experienced if users that are neighbours also experience an overlap equal to or higher than O_th. Equipped with these definitions, we can compute the ROC curve by considering all the videos and user navigation patterns included in the dataset described in [13]. Figure 3 shows the curve obtained in our scenario with O_th = 80%. On the x axis of the ROC curve there is the False Positive Rate (FPR), i.e., the probability of a wrong classification over the number of actual negative events. This rate should be as small as possible.

¹Without loss of generality, we consider a scenario in which the viewports of all users have the same horizontal and vertical field of view.

[Fig. 1. Viewports (in green and blue) with π/10 centre distance: (a) viewports are aligned, with an overlap of 87%; (b) one viewport is rotated by π/2, resulting in an overlap of 58%.]

[Fig. 2. Comparison between pairwise geodesic distance and viewport overlap in one frame of the video Rollercoaster from [13].]
On the contrary, the True Positive Rate (TPR) on the y axis represents the probability to correctly classify an event. The best value of geodesic distance is π/10, since it corresponds to a TPR value equal to 1, which in our application means a sure identification of viewports with an overlap of at least 80% based on the geodesic distance between their centers. Therefore, in the following we assume G_th = π/10 as a suitable threshold to robustly approximate the area overlap between two viewports by means of the geodesic distance between their centers.

[Fig. 3. ROC curve to evaluate the optimal G_th, considering all videos in the dataset [13] and O_th = 80%.]

3. CLIQUE-BASED CLUSTERING ALGORITHM

We now describe the proposed clustering algorithm, aimed at identifying clusters of users having a common viewport overlap. We model the evolution of users' viewports over a time window T, i.e., a series of consecutive frames, as a set of graphs {G_t}, t = 1, ..., T. Each unweighted and undirected graph G_t = {V, E_t, W_t} represents the set of users'² viewports at a particular instant t, where V and E_t denote the node and edge sets of G_t. Each node in V corresponds to a user interacting with the 360° content. Each edge in E_t connects neighbouring nodes, where two nodes are neighbours if the geodesic distance between the viewport centers associated to the users represented by the nodes is lower than G_th, as defined in Section 2. The binary matrix W_t is the adjacency matrix of G_t, with w_t(i,j) = 1 if the geodesic distance between the viewport centres of users i and j at time t is below the threshold. More formally:

    w_t(i,j) = { 1,  if g(i,j) ≤ G_th
               { 0,  otherwise                        (1)

where g(i,j) is the geodesic distance between the viewport centres of users i and j, and G_th is the threshold value discussed in Section 2. Note that the clique-based clustering algorithm that we present in the following takes binary adjacency matrices as input; hence, W_t is binary. Looking at the graphs over time {G_t}, t = 1, ..., T, we are interested in clustering users based on their trajectories within a time window T. Similarly to other trajectory clustering methods [26], we derive an affinity matrix A that will be the input to our clustering algorithm, with

    a(i,j) = I_D( Σ_{t=1}^{T} w_t(i,j) )              (2)

where I_D(x) = 1 if x ≥ D and 0 otherwise. This means that in the final graph two nodes, representing two users, are neighbours, i.e., connected by an edge, only if the corresponding viewports have a significant overlap in at least D instants over T. In the case of D = T, we obtain a(i,j) = I_T( Σ_t w_t(i,j) ) = Π_t w_t(i,j), and the users' viewport centers need to be always at a distance below the threshold G_th. This condition is however too constraining, therefore we allow D < T. The goal of our clustering algorithm is to identify groups of users that are consistently attending the same portion of the spherical surface. To ensure that all users belonging to a cluster are attending the same area, they all need to be pairwise neighbours (i.e., a(i,j) = 1 for all pairs of users i and j in the cluster). Therefore, we propose a clique-based clustering.

²Without loss of generality, we assume that the set of users does not change over time. This covers also cases in which users' devices are not synchronized in the acquisition time, as users' positions are usually interpolated to create a synchronized dataset.
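As a compact sketch of Eqs. (1) and (2) (naming and array shapes are our own; pairwise geodesic distances are assumed to be precomputed as a T×N×N array):

```python
import numpy as np

def adjacency_matrices(geodesic, g_th):
    """Eq. (1): one binary adjacency matrix W_t per frame.
    `geodesic` holds pairwise geodesic distances, shape (T, N, N)."""
    w = (geodesic <= g_th).astype(int)
    for t in range(w.shape[0]):
        np.fill_diagonal(w[t], 0)  # a user is not its own neighbour
    return w

def affinity_matrix(w, d):
    """Eq. (2): users are neighbours in the final graph only if they
    were neighbours in at least `d` of the T frames."""
    return (w.sum(axis=0) >= d).astype(int)
```

With d equal to the number of frames, this reduces to the (too constraining) requirement that the two users be neighbours in every frame of the window.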
In graph theory, a set of nodes all connected to each other is called a clique. A clique perfectly matches the definition of cluster needed in our application, which identifies a set of users all having significant pairwise viewport overlaps, and thus attending a common portion of video. We consider the Bron-Kerbosch (BK) algorithm [25] to find all maximal cliques present in our graph (i.e., the most populated sub-graphs forming cliques). However, maximal cliques identified by the BK algorithm can intersect, i.e., one user can belong to more than one clique. Conversely, we are interested in identifying disjoint sets³. Hence, our clustering method consists of iterations of BK instances, as depicted in Figure 4. We initialize the clustering method by evaluating the affinity matrix from Eq. (2). Then, we perform the following steps (Algorithm 1):

1. Maximal cliques in the graph are detected by the BK algorithm.
2. Among the resulting cliques, only the most populated one (i.e., the one with the highest cardinality) is kept as a cluster.
3. A new affinity matrix is built, eliminating the entries corresponding to the elements of the cluster identified in Step 2.

These three steps are repeated until all nodes are assigned to clusters. It is worth mentioning that this iterative selection does not guarantee optimal clusters (i.e., clusters with maximal joint overlap among the viewports of users belonging to a cluster).

³Clusters should be disjoint for most content-delivery applications. For example, if clusters are used for prediction, each user must belong only to one cluster.

[Fig. 4. Graphical example of the proposed clique clustering.]

Algorithm 1: Clique-Based Clustering
    Input: {G_t}, t = 1, ..., T; D
    Output: K, Q = [Q_1, ..., Q_K]
    Init: i = 1, A^(1) = I_D(Σ_t W_t), Q = [{∅}, ..., {∅}]
    repeat
        C = [C_1, ..., C_L] ← BK(A^(i))
        l* = arg max_l |C_l|
        Q_i = C_{l*}
        A^(i+1) = A^(i) with rows and columns of C_{l*} removed
        i ← i + 1
    until A^(i) is empty
    K = i − 1
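The iterative procedure above can be sketched in pure Python (our own naming; a plain Bron-Kerbosch enumeration without pivoting, which is sufficient for illustration):

```python
def bron_kerbosch(R, P, X, adj, cliques):
    """Classic Bron-Kerbosch enumeration of all maximal cliques [25].
    R: current clique; P: candidate nodes; X: excluded nodes."""
    if not P and not X:
        cliques.append(sorted(R))
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, cliques)
        P.remove(v)
        X.add(v)

def clique_clustering(A):
    """Iterative BK clustering (a sketch of Algorithm 1): repeatedly
    keep the most populated maximal clique and drop its nodes.
    `A` is a binary, symmetric affinity matrix (list of lists)."""
    n = len(A)
    adj = {i: {j for j in range(n) if A[i][j]} for i in range(n)}
    remaining = set(range(n))
    clusters = []
    while remaining:
        cliques = []
        bron_kerbosch(set(), set(remaining), set(),
                      {v: adj[v] & remaining for v in remaining}, cliques)
        best = max(cliques, key=len)  # most populated maximal clique
        clusters.append(best)
        remaining -= set(best)        # keep clusters disjoint
    return clusters
```

For a 5-node graph with a triangle {0, 1, 2} and an edge {3, 4}, this returns the disjoint clusters [0, 1, 2] and [3, 4]; isolated nodes end up as singleton clusters.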
However, i) it imposes viewport overlap among users within a cluster, and ii) it identifies highly populated clusters, which can be translated into reliable trajectories/behaviours shared among users.

4. EXPERIMENTAL RESULTS

The proposed clustering algorithm is compared to state-of-the-art solutions, namely the Louvain method [27], K-means clustering [28], and the clustering of VR trajectories proposed in [24] (labelled "SC"). We use the geodesic distance between viewport centers as the distance metric in all algorithms. Moreover, in the K-means clustering, the number of clusters K is imposed as the value achieved by the Louvain method (labelled "K-means 1"), as well as the K value obtained from our proposed clustering (labelled "K-means 2"). The proposed implementations have been made publicly available⁴. We test these algorithms on two 1-minute long video sequences (Rollercoaster and Timelapse), which have been watched by 59 users whose navigation paths are publicly available [13]. Rollercoaster has one main RoI (i.e., the rail), while in Timelapse there are many fast-moving objects (e.g., buildings, people) along the equator line.

⁴https://github.com/LASP-UCL/spherical-clustering-in-VR-content.

Table 1. Clustering analysis of users in three selected frames from Rollercoaster and Timelapse. Mean overlap is computed over clusters with at least three users; in brackets, the percentage of covered population.

ROLLERCOASTER
                            Louvain          Clique Clustering  K-means 1        K-means 2
Fr. ~30s  K                 10               15                 10               15
          Mean overlap      38.90% (84.75%)  62.50% (76.30%)    53.95% (93.20%)  48.10% (94.90%)
          Main cl. overlap  26.70% (44.10%)  58.60% (30.50%)    48.30% (19%)     0% (20.70%)
Fr. ~40s  K                 8                15                 8                15
          Mean overlap      35.60% (89.83%)  65.75% (76.30%)    44.38% (100%)    47.65% (84.75%)
          Main cl. overlap  24.20% (45.80%)  58.33% (35.60%)    0% (30.50%)      0% (15.25%)
Fr. ~50s  K                 8                12                 8                12
          Mean overlap      48.20% (89.80%)  65.70% (86.45%)    43.50% (98.30%)  55.30% (96.60%)
          Main cl. overlap  46.40% (30.50%)  59.90% (57.70%)    0% (22.40%)      0% (15.25%)

TIMELAPSE
                            Louvain          Clique Clustering  K-means 1        K-means 2
Fr. ~30s  K                 13               24                 13               24
          Mean overlap      46% (89.70%)     72.35% (56.90%)    45.90% (96.50%)  51.50% (50%)
          Main cl. overlap  32.90% (20.70%)  69% (12.10%)       15% (19%)        23.50% (13.80%)
Fr. ~40s  K                 18               27                 18               27
          Mean overlap      47.65% (75.90%)  72.95% (77.60%)    60.27% (96.55%)  65.90% (84.50%)
          Main cl. overlap  51.80% (20.70%)  63.70% (17.24%)    47.50% (20.70%)  33.60% (8.60%)
Fr. ~50s  K                 18               29                 18               29
          Mean overlap      49.12% (77.60%)  71.40% (51.70%)    48.36% (87.90%)  55.90% (55.17%)
          Main cl. overlap  30.60% (22.40%)  70.80% (25.90%)    37% (24.15%)     62.71% (17.24%)

Frame-based Clustering. First, we consider frame-based clustering, in which users are identified by their viewport centers at one given frame. Table 1 reports results in terms of the number of clusters (K), the mean viewport overlap computed within each cluster composed of at least three users, and the viewport overlap within the most populated cluster, which we refer to as the main cluster. The viewport overlap within a cluster is the joint overlap across all users' viewports in the cluster. The mean overlap is computed by averaging the viewport overlap of all clusters with at least three users identified at a given frame. In Table 1, we also provide the percentage of users covered by clusters. The proposed algorithm always ensures the highest viewport overlap (on average always over 50%) with respect to the other methods. This is due to the implicit constraint that is imposed by the clique-based detection of the clusters. This constraint leads to the identification of clusters that are populated and yet meaningful (i.e., with large viewport overlap among users).
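The joint-overlap metric can be estimated numerically. The sketch below (our own naming and defaults) approximates each viewport by a spherical cap around its center, a simplification, since actual viewports are rectangular and roll-dependent, and uses Monte-Carlo sampling to measure which fraction of a single viewport's area is covered by the intersection of all viewports in a cluster.

```python
import numpy as np

def joint_overlap(centers, radius=np.pi / 4, n=200_000, seed=0):
    """Monte-Carlo estimate of the joint viewport overlap within a
    cluster, with each viewport approximated by a spherical cap of
    angular `radius` (pi/4 assumed here, roughly a 90-degree FOV).
    `centers` is an (N, 3) array of unit vectors; returns
    intersection area / single-cap area, in [0, 1]."""
    rng = np.random.default_rng(seed)
    p = rng.normal(size=(n, 3))
    p /= np.linalg.norm(p, axis=1, keepdims=True)  # uniform on sphere
    angles = np.arccos(np.clip(p @ np.asarray(centers).T, -1.0, 1.0))
    inside = angles <= radius  # (n, N): sample falls in user's cap
    return inside.all(axis=1).mean() / inside[:, 0].mean()
```

Identical viewing directions give an overlap of 1, while caps that do not intersect (e.g., antipodal centers) give 0.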
For example, in Rollercoaster at frame 40 s, our algorithm identifies a main cluster grouping 35% of the population with a viewport overlap of 58.33%. This is much higher than the overlap of 24.20% (0%) in the main cluster identified by the Louvain (K-means) method. Beyond the accuracy, another important parameter is the percentage of the population that is covered by clusters with a significant number of users. These clusters are the most useful ones to allow predictions. For instance, in Timelapse at frame 50 s, our method identifies a large number of clusters (29), which also includes single-user clusters. Nevertheless, half of the population (51.70%) belongs to clusters with more than 3 users, with a high value of joint overlap (71.40%).

Trajectory-based Clustering. Second, we test the proposed algorithm over a time window with T = 3 s and D = 1.8 s. In this case, we compare the proposed solution with the algorithm SC [24]. The algorithm SC is applied to trajectories spanning the entire video, as in [24], as well as to consecutive time windows of 3 s. We also consider the case of SC in which the number of clusters K is not evaluated from the affinity matrix but is imposed as the K obtained from our solution. We label this clustering "SC - K given". Figure 5 shows results in terms of overlap among viewports clustered together in both Rollercoaster (a) and Timelapse (b). In more detail, all users are clustered over consecutive time windows of T seconds each. Then, for each frame, the viewport overlap among all users within one cluster is evaluated and averaged across clusters. The mean overlap (solid line) and the variance (shaded area) are finally depicted in the figure. Moreover, the mean value of joint overlap in clusters with more than three users across the entire video is shown in the legend. Our solution outperforms SC in terms of mean overlap but also in terms of variance.
The latter shows the stability of our clustering method, which ensures for each cluster a consistent overlap over time. Finally, the performance gain is significant also in terms of overlap in the most populated clusters (values provided in the legend).

[Fig. 5. Mean and variance of the joint overlap across clusters over time, for (a) Rollercoaster and (b) Timelapse (T = 3 s). In the legend, the mean value of joint viewport overlap of clusters with more than three users across the entire video: Rollercoaster - Clique clustering (66.96%), SC - T = 3 s (8.20%), SC - entire video (24.81%), SC - K given (48.05%); Timelapse - Clique clustering (72.22%), SC - T = 3 s (3.24%), SC - entire video (2.45%), SC - K given (26.43%).]

5. CONCLUSIONS

In this paper, we proposed a novel graph-based clustering strategy able to detect meaningful clusters, i.e., groups of users consuming the same portion of a virtual reality spherical content. First, we derived a geodesic distance threshold value to reflect the similarity among users, and then we built a clique-based clustering based on this metric. Results on a set of publicly available VR user navigation patterns show that the proposed method identifies more meaningful clusters with respect to other state-of-the-art clustering methods. The associated code has been made publicly available for future comparisons. Future works will focus on the application of our method in the framework of adaptive streaming of VR videos and on the prediction of user navigation patterns.

6. REFERENCES

[1] A. Serrano, V. Sitzmann, J. Ruiz-Borau, G. Wetzstein, D. Gutierrez, and B. Masia, "Movie editing and cognitive event segmentation in virtual reality video," vol. 36, 2017.
[2] S. Rossi and L.
Toni, "Navigation-Aware Adaptive Streaming Strategies for Omnidirectional Video," in IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), 2017.
[3] X. Corbillon, G. Simon, A. Devlic, and J. Chakareski, "Viewport-adaptive navigable 360-degree video delivery," in IEEE International Conference on Communications (ICC), 2017.
[4] S. Petrangeli, V. Swaminathan, M. Hosseini, and F. De Turck, "An HTTP/2-Based Adaptive Streaming Framework for 360 Virtual Reality Videos," in Proceedings of the 2017 ACM on Multimedia Conference, 2017.
[5] C.-L. Fan, J. Lee, W.-C. Lo, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu, "Fixation Prediction for 360 Video Streaming in Head-Mounted Virtual Reality," in Proceedings of the 27th Workshop on Network and Operating Systems Support for Digital Audio and Video. ACM, 2017.
[6] M. Yu, H. Lakshman, and B. Girod, "A Framework to Evaluate Omnidirectional Video Coding Schemes," in IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2015.
[7] M. Broeck, F. Kawsar, and J. Schöning, "It's all around you: Exploring 360-degree video viewing experiences on mobile devices," in Proceedings of the 2017 ACM on Multimedia Conference (MM), 2017.
[8] V. Sitzmann, A. Serrano, A. Pavel, M. Agrawala, D. Gutierrez, B. Masia, and G. Wetzstein, "Saliency in VR: How Do People Explore Virtual Environments?," vol. 24, no. 4, April 2018.
[9] K. Srivastava, R. C. Das, and S. Chaudhury, "Virtual reality applications in mental health: Challenges and perspectives," Industrial Psychiatry Journal, vol. 23, 2014.
[10] E. Upenik and T. Ebrahimi, "A simple method to obtain visual attention data in head mounted virtual reality," in IEEE International Conference on Multimedia & Expo Workshops (ICMEW), July 2017.
[11] B. Hu, I. Johnson-Bey, M. Sharma, and E. Niebur, "Head movements during visual exploration of natural images in virtual reality," in IEEE 51st Annual Conference on Information Sciences and Systems (CISS), 2017.
[12] C. Wu, Z. Tan, Z. Wang, and S. Yang, "A Dataset for Exploring User Behaviors in VR Spherical Video Streaming," in Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys), 2017.
[13] X. Corbillon, F. De Simone, and G. Simon, "360-degree video head movement dataset," in Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys), 2017.
[14] W.-C. Lo, C.-L. Fan, J. Lee, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu, "360 Video Viewing Dataset in Head-Mounted Virtual Reality," in Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys), 2017.
[15] S. Fremerey, A. Singla, K. Meseberg, and A. Raake, "AVtrack360: an open dataset and software recording people's head rotations watching 360 videos on an HMD," in Proceedings of the 9th ACM Multimedia Systems Conference (MMSys), 2018.
[16] A. Singla, S. Fremerey, W. Robitza, P. Lebreton, and A. Raake, "Comparison of Subjective Quality Evaluation for HEVC Encoded Omnidirectional Videos at Different Bit-rates for UHD and FHD Resolution," in Proceedings of the ACM Conference on Multimedia (MM) Thematic Workshops, 2017.
[17] A. Duchowski and G. Marmitt, Modeling Visual Attention in VR: Measuring the Accuracy of Predicted Scanpaths, Ph.D. thesis, 2002.
[18] E. J. David, J. Gutiérrez, A. Coutrot, M. Da Silva, and P. Le Callet, "A Dataset of Head and Eye Movements for 360° Videos," in Proceedings of the 9th ACM Multimedia Systems Conference (MMSys), 2018.
[19] Y. Rai, P. Le Callet, and P. Guillotel, "Which saliency weighting for omnidirectional image quality assessment?," in International Conference on Quality of Multimedia Experience (QoMEX), May 2017.
[20] I. Bogdanova, A. Bur, and H. Hügli, "Visual Attention on the Sphere," IEEE Transactions on Image Processing, vol. 17, no. 11, Nov. 2008.
[21] I. Bogdanova, A. Bur, H. Hügli, and P. Farine, "Dynamic visual attention on the sphere," Computer Vision and Image Understanding, vol. 114, no. 1, 2010.
[22] M. Xu, Y. Song, J. Wang, M. Qiao, L. Huo, and Z. Wang, "Modeling Attention in Panoramic Video: A Deep Reinforcement Learning Approach," arXiv preprint arXiv:1710.10755, 2017.
[23] L. Xie, X. Zhang, and Z. Guo, "CLS: A Cross-user Learning based System for Improving QoE in 360-degree Video Adaptive Streaming," in ACM Multimedia Conference (MM), 2018.
[24] S. Petrangeli, G. Simon, and V. Swaminathan, "Trajectory-Based Viewport Prediction for 360-Degree Virtual Reality Videos," in IEEE Conference on Artificial Intelligence and Virtual Reality (AIVR), 2018.
[25] C. Bron and J. Kerbosch, "Algorithm 457: finding all cliques of an undirected graph," Communications of the ACM, vol. 16, no. 9, 1973.
[26] S. Atev, G. Miller, and N. P. Papanikolopoulos, "Clustering of vehicle trajectories," IEEE Transactions on Intelligent Transportation Systems, vol. 11, 2010.
[27] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, no. 10, 2008.
[28] J. A. Hartigan and M. A. Wong, "Algorithm AS 136: A k-means clustering algorithm," Journal of the Royal Statistical Society, vol. 28, 1979.