Dataspace: A Reconfigurable Hybrid Reality Environment for Collaborative Information Analysis
Dataspace
Dataspace (Fig. [fig:datacenter]) is a room-sized collaborative environment where multiple researchers can interact naturally with rich and complex data at scale (D1). This environment re-imagines the conference room as a dynamic physical environment that adapts to users, applications, and data.
The physical components constituting Dataspace are:
- 15 OLED 4K resolution displays, which can be reconfigured in space by 15 7-DOF Kuka robotic arms mounted to the ceiling in a circular pattern with a radius of 2.5m (D2). Due to weight and wiring constraints, the screens do not natively possess touchscreen capability; however, this functionality is emulated using the torque sensors located in each robot joint (with a sensitivity of ±2cm). While each screen can be moved and rotated independently in space, Figure 1 shows some of the most common configurations (D3). Dataspace possesses both software and hardware safety protocols preventing screens from colliding with each other and with people in the room.
- A central, smooth-surfaced ceramic table that can be raised or lowered based on application requirements. Thanks to two HD projectors, blended 2K resolution visual output can be projected onto the table. Although its surface is passive, table gestures performed by users can be detected through the Dataspace perception system.
- A central cowling mounted to the ceiling, holding the two projectors mentioned above and eight crossfiring Kinect v2 depth sensors, used by the Dataspace perception system to track objects and people and to identify gestures in the area between the table and the screens.
- A spatial audio system consisting of a speaker mounted on each robotic arm, plus five speakers and two subwoofers in the cowling. The system also includes four phased-array microphones, which can detect the azimuth of the strongest speech signal and interpret it through IBM Speech-to-text and Conversation services.
- A set of ten Microsoft Hololens augmented reality headsets, which seamlessly integrate with the other Dataspace components for interaction, audio services and graphical rendering. These AR headsets can be worn by users interested in interacting with 3D information, often displayed atop the central table (D3).
- An optional, varying set of mobile devices such as laptops, tablets, smartphones and virtual reality headsets (D5), all of which may be used as extensions of Dataspace screens, or as means for users to virtually join the session.
Dataspace implements an API-based modular software architecture (Fig. 2) for handling its subsystems: motion control (coordinating the 15 robotic arms), perception (detecting and tracking people and objects), display (controlling screen content and table projections) and audio (speech interpretation and audio output). Each subsystem runs on a dedicated server machine and communicates with the centralized software controller Merlin through a publish/subscribe (MQTT) protocol. We leverage ROS (Robot Operating System) for controlling the robotic arms and IBM Cloud services for speech interpretation, while we developed a custom algorithm for merging the depth information from multiple Kinect devices into a single point cloud. We chose to adopt web-based rendering for our screens through Electron, promoting high flexibility for application development (e.g. HTML, WebGL) and support for external devices with browsing capabilities (e.g. laptops and smartphones), so that the same content and interactions can be easily transferred across devices. Applications can access the Merlin API through a dedicated Node.js package and be automatically deployed to Dataspace as Docker containers. Alternatively, custom applications can be built by leveraging a RESTful version of the Merlin API.
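As a concrete illustration of this event-driven design, the sketch below shows a minimal in-memory publish/subscribe dispatcher in TypeScript. The topic name and payload shape are hypothetical examples, and the real system routes events through an MQTT broker rather than in-process dispatch.

```typescript
// Minimal in-memory sketch of Merlin-style topic dispatch.
// Topic names and payload shapes are illustrative assumptions,
// not the actual Merlin API.

type Handler = (payload: unknown) => void;

class EventBus {
  private subscribers = new Map<string, Handler[]>();

  // Subscribe a view or device to a topic, e.g. "perception/gesture".
  subscribe(topic: string, handler: Handler): void {
    const list = this.subscribers.get(topic) ?? [];
    list.push(handler);
    this.subscribers.set(topic, list);
  }

  // Broadcast an event to every subscriber of the topic; each
  // view decides independently how to handle it.
  publish(topic: string, payload: unknown): number {
    const list = this.subscribers.get(topic) ?? [];
    for (const handler of list) handler(payload);
    return list.length; // number of views notified
  }
}
```

In this sketch, a Dataspace view would subscribe to the topics it cares about, while a subsystem such as perception would publish gesture events as they are detected.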
Spatial Awareness
In a complex environment mixing people with a dynamic configuration of heterogeneous devices, it is important to maintain a stateful representation of each entity in order to guarantee proper interaction with the content. Dataspace holds a virtual 3D model of its physical environment (screens, table, robotic arms, and cowling), which is updated by the robotic arm controller every time a new screen configuration is applied. The point clouds generated by the eight crossfiring Kinect v2 sensors are combined to produce a single depth representation of the environment, which we apply to the corresponding reference frame in the virtual 3D model. This custom perception system allows us not only to detect and track people and objects in space, but also to determine their relative and absolute position with respect to the table and the screens. For instance, we can easily detect if a person is pointing at a particular screen or a specific location on the table, and then forward this information to the system’s application layer, where it is transformed into a user interaction. Similarly, knowing that a user is physically close to or far away from specific content can be fundamental in determining which action the system should perform (e.g. bringing a screen or moving the content closer to the user, or performing an operation on the nearest screen, especially in response to voice commands). On top of the environment’s virtual 3D model and the perception system, we include a third layer represented by the SLAM spatial mapping built at run-time by the AR devices. Whenever a Hololens is powered on, it begins constructing a spatial mesh of the surrounding environment, which is then expanded and updated as the device moves through the space. The headset attempts to match its version of the environment to a reference (precomputed) mesh anchor.
When a match is found and the device recognizes its position in the environment, the AR application adjusts its coordinate frame accordingly. By combining information from the virtual 3D model and the perception system, each headset can identify which screen the user is looking at or where people or content are located in the room (Fig. 3). This is also useful for positioning virtual content inside the workspace (e.g. constraining the position of content to accessible areas) and for proper lighting and occlusion computation (e.g. virtual objects are not rendered if positioned behind other objects). Finally, relying on updates from the virtual 3D model is far more reliable than using the built-in Hololens spatial mapping, which takes too much time to recompute its mesh when the environment is physically reconfigured.
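The multi-Kinect fusion step described above, combining per-sensor point clouds into a single depth representation, can be sketched as follows, assuming each sensor's extrinsic pose (rotation and translation into the shared world frame of the virtual 3D model) is known from calibration. The data layout and function names are illustrative, not the actual Dataspace implementation.

```typescript
// Sketch of merging per-sensor point clouds into one world-frame
// cloud. Each Kinect's extrinsic pose is assumed known from
// calibration; the row-major 3x3 rotation layout is illustrative.

type Vec3 = [number, number, number];

// One sensor's extrinsics: 3x3 rotation (row-major) plus translation.
interface Pose { r: number[]; t: Vec3 }

function toWorld(p: Vec3, pose: Pose): Vec3 {
  const [x, y, z] = p;
  return [
    pose.r[0] * x + pose.r[1] * y + pose.r[2] * z + pose.t[0],
    pose.r[3] * x + pose.r[4] * y + pose.r[5] * z + pose.t[1],
    pose.r[6] * x + pose.r[7] * y + pose.r[8] * z + pose.t[2],
  ];
}

// Concatenate all sensor clouds after transforming each point
// into the shared world frame of the virtual 3D model.
function mergeClouds(clouds: { pose: Pose; points: Vec3[] }[]): Vec3[] {
  const merged: Vec3[] = [];
  for (const { pose, points } of clouds)
    for (const p of points) merged.push(toWorld(p, pose));
  return merged;
}
```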
Interacting with the Environment
Interaction events registered by each of Dataspace’s subsystems are collected by the Merlin software middleware and broadcast to all views and connected devices, which can independently decide how to handle them (D4).
Thanks to our emulation of touchscreen functionality through the robotic arm torque sensors, users can perform single-touch screen operations such as click, drag, scroll, and zoom (Fig. 4a), which are then forwarded to the individual views as JavaScript events. Users can also physically interact with the system by manually moving and rotating screens in space (Fig. 4b), a function made possible by the robotic arms adapting to force applied by the user (“compliance mode”). This mode of interaction has proven effective for applications based on slicing operations (e.g. MRI data, multi-video time analysis).
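A heavily simplified sketch of the torque-based touch emulation might look as follows: when the unexplained (external) torque residual on a joint exceeds a threshold, a contact event is emitted. The threshold value, joint model and event shape are assumptions for illustration, not the actual implementation.

```typescript
// Simplified contact detection from joint torque residuals.
// The threshold and event shape are hypothetical.

interface ContactEvent { joint: number; magnitude: number }

function detectContact(
  residuals: number[],   // unexplained torque per joint (Nm)
  threshold = 0.5,       // assumed contact threshold (Nm)
): ContactEvent | null {
  let joint = -1, magnitude = 0;
  residuals.forEach((r, i) => {
    if (Math.abs(r) > magnitude) { magnitude = Math.abs(r); joint = i; }
  });
  // Report the most strongly loaded joint, if above threshold.
  return magnitude > threshold ? { joint, magnitude } : null;
}
```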
Thanks to its perception system, Dataspace can identify and respond to gestures performed in mid-air or on the table. Raising both hands in front of a screen will clear that screen’s content, while raising one hand is interpreted as a request to speak, thus orienting content and lighting towards the requester (Fig. 4d, D2). Similarly, hands and objects on the table are tracked based on horizontal position and height, allowing for the use of props and for the selection, panning, zooming, rotation and rescaling of views and projected shapes (Fig. 4c). Table interactions are complemented by a rotational input device (Microsoft Surface Dial) whose position is tracked in space. This device is useful for creating radial menus and interactions with content distributed over a circular shape.
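The table manipulations above (panning, zooming and rotating views from two tracked points) follow standard two-point gesture arithmetic, sketched here under the assumption that the perception system reports a pair of contact points per frame; names are illustrative.

```typescript
// Pan/zoom/rotate deltas from two tracked points across frames:
// midpoint translation pans, distance ratio zooms, angle change
// rotates. Point sources (hands, props) are an assumption.

type Pt = { x: number; y: number };

interface GestureDelta { pan: Pt; zoom: number; rotate: number }

function twoPointDelta(a0: Pt, b0: Pt, a1: Pt, b1: Pt): GestureDelta {
  const mid = (a: Pt, b: Pt): Pt => ({ x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 });
  const dist = (a: Pt, b: Pt) => Math.hypot(b.x - a.x, b.y - a.y);
  const angle = (a: Pt, b: Pt) => Math.atan2(b.y - a.y, b.x - a.x);
  const m0 = mid(a0, b0), m1 = mid(a1, b1);
  return {
    pan: { x: m1.x - m0.x, y: m1.y - m0.y },       // midpoint translation
    zoom: dist(a1, b1) / dist(a0, b0),             // scale factor
    rotate: angle(a1, b1) - angle(a0, b0),         // radians
  };
}
```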
If a user wants to physically interact with a view that is currently distant from their position, Dataspace can move that view, or even the physical screen, closer to the user. However, in most cases it is simpler to leverage the environment’s microphone array, which, combined with the perception system, enables understanding the source and directionality of speech. We use the Watson Assistant service in combination with speech-to-text and text-to-speech to provide agent-based voice interaction. While dialogue is application-dependent, we use the “Merlin” attention keyword for cross-application vocal commands such as moving robotic arms and/or the content of views. The system’s multiple speakers can also generate 3D sounds inside the environment, for instance in proximity to a particular screen or person.
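The attention-keyword mechanism can be sketched as a simple transcript filter: an utterance is only treated as a command when it begins with the “Merlin” keyword. The command grammar shown is a hypothetical example, not the actual Dataspace vocabulary.

```typescript
// Attention-keyword routing for speech transcripts. The split into
// action and arguments is an illustrative grammar.

interface Command { action: string; args: string[] }

function parseUtterance(transcript: string): Command | null {
  const words = transcript.trim().toLowerCase().split(/\s+/);
  // Ignore speech that does not start with the attention keyword.
  if (words[0] !== "merlin" || words.length < 2) return null;
  return { action: words[1], args: words.slice(2) };
}
```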
The AR headset we currently use, the Microsoft Hololens, allows us to provide additional interactions, responding to user gaze, hand gestures (airtap, bloom), head motion and voice (we plan on integrating eye tracking in the near future as well). Though they are mainly used to interact with the 3D AR content visualized at the center of the room, headsets are an integral part of the Dataspace system (D3). For instance, utilizing audio input through the headset’s microphone makes it trivial to identify who is speaking (and their position in space), in addition to providing better audio quality. In combination with overlay graphics that provide visual feedback, gaze enables users to select screens and other objects in the environment (Fig. 4e). For instance, gazing at a screen while performing a long airtap selects the contents of that screen, which can then be swapped with the contents of another screen. Finally, although this is still an experimental feature, AR headsets can also be used to overlay information atop 2D data already displayed on a screen. As a separate consideration, we note that the smooth surface of the table and the dark color of the screens do not represent an ideal tracking environment for the Hololens. When movement on the table is highly dynamic, there is a chance that AR content may drift in space. To handle these concerns, we have adopted a design solution that involves displaying a trackable crown (often used as a radial menu) around the table content, as well as graphical cues on the screens to facilitate visual alignment.
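Gaze-based screen selection can be approximated by picking the screen whose center deviates least from the user's gaze ray, as sketched below. Representing screens as center points is a simplification of a full mesh-intersection test, and all names are illustrative.

```typescript
// Pick the screen whose center is angularly closest to the gaze ray.
// Screens-as-points is a simplification of mesh intersection.

type V = [number, number, number];

const sub = (a: V, b: V): V => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: V, b: V) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const norm = (a: V) => Math.sqrt(dot(a, a));

function selectScreen(head: V, gaze: V, centers: V[]): number {
  let best = -1, bestCos = -Infinity;
  centers.forEach((c, i) => {
    const d = sub(c, head);
    // Cosine of the angle between gaze and the direction to screen i.
    const cos = dot(d, gaze) / (norm(d) * norm(gaze));
    if (cos > bestCos) { bestCos = cos; best = i; }
  });
  return best; // index of the gazed-at screen
}
```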
Since content in Dataspace is delivered through web-based technologies, it is relatively straightforward to integrate with devices such as personal laptops, tablets and smartphones (D5). Tasks such as typing and coding are easier to perform through standard devices (e.g. a keyboard), and it is fundamental to provide such devices as a complement to native Dataspace interactions. For instance, if a view on a Dataspace screen requires textual input, the user can decide to provide that input by voice, use a virtual touch keyboard on the screen, or move the view to their personal laptop, type with their keyboard, and then send the view back to the Dataspace screen. For applications requiring higher frequency input, it is also useful to make gesture-based interactions available in close proximity to the user (and independent of their position in the room), providing a good use case for devices such as smartphones and tablets.
Two drawbacks typically associated with immersive environments are price and scalability. In particular, these technologies often require large spaces and a dedicated construction process, and their cost is usually a significant limiting factor in the number of deployed instances. While in this paper we present Dataspace as a complete system and research environment, its modular design allows for the deployment of any combination of its subsystems. The screens, robotic arms, table, perception system and augmented reality integration can all be independently removed, in which case the system falls back on the remaining interfaces and interaction methods. We have already constructed a second Dataspace system which does not make use of the robotic arms, and can envision versions of the environment with different numbers of screens, based on user need.
An interactive virtual reality rendering of Dataspace makes the system available to users in remote locations (D1, D2). To create this rendering, we combine the 3D virtual model updated by the robotic arm controller, the perception system, and AR tracking information to recreate the current physical state of a Dataspace installation. This can be observed in real time, from a first-person perspective, by a remote user in VR. This VR extension shows the current configuration of the robotic arms, the content displayed on the table and on each screen, and the estimated position of each team member within the space (Fig. 5). The VR user can perform both touch (emulated using the VR hand controllers, as shown in Fig. 4f) and voice interactions as though they were present in the room. By pulling a virtual screen down with the VR hand controller, for example, the user moves both the virtual and the actual screen down. Additional examples of uses for the Dataspace VR extension are described in section Applications.
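The screen-pulling interaction described above can be sketched as mapping a vertical VR controller drag to a clamped height target for the corresponding robotic arm. The travel limits and message shape are assumptions for illustration, not the actual Dataspace motion-control interface.

```typescript
// Map a vertical VR controller drag to an arm height target,
// clamped to an assumed safe travel range.

interface ArmCommand { screenId: number; targetHeight: number }

function dragToArmCommand(
  screenId: number,
  currentHeight: number,            // metres
  dragDeltaY: number,               // metres, from the VR controller
  limits = { min: 0.8, max: 2.4 },  // assumed safe travel range
): ArmCommand {
  const h = Math.min(limits.max, Math.max(limits.min, currentHeight + dragDeltaY));
  return { screenId, targetHeight: h };
}
```

Both the virtual screen and the physical arm would then be driven toward the same target, keeping the VR rendering and the room in sync.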
Discussion
Hybrid Analytics
The design of collaborative, immersive environments for data exploration has historically been based on balancing complex technological trade-offs among hardware complexity, image quality, resolution, field of view, depth rendering, visual acuity, perception issues (e.g. ghosting), and cost. CAVE was originally conceived as a small environment for 3D scientific exploration, applied to domains such as biology, fluid dynamics, architecture and geospatial data. CAVE2 and Reality Deck introduced multi-user data exploration, outlining a space large enough to promote user movement. The former introduced simultaneous rendering of 2D and 3D scientific content, whereas the latter has focused on ultra-high-resolution immersive graphics, foregoing stereoscopic 3D and adopting depth cues (e.g. motion parallax) instead. A technology comparison is presented in Table [table:comparison] for reference.
With its capability to simultaneously render 2D and 3D content, Dataspace represents a second implementation of the concept of a hybrid reality environment, after CAVE2. At the same time, the pixel density of Dataspace is comparable to that of Reality Deck, but without Reality Deck’s almost 360° horizontal FOV coverage. This technological difference makes Dataspace less applicable to immersive scientific exploration (i.e. our environment cannot completely surround the user with visuals according to the original criteria proposed by Cruz et al.). However, Dataspace provides “outside-in” 3D stereoscopic rendering combined with high-resolution 2D visual analytic capabilities, enabling what we call hybrid analytics. Whereas previous environments have mostly dealt with visual exploration of scientific data, the unique characteristics of Dataspace make it a good candidate for flexible focus-and-context analysis of multiple types of information at the same time, as demonstrated by the applications presented in the previous section. In particular, Dataspace bridges complementary visualization environments and allows users to seamlessly switch between them, providing wider support for the different perceptual and interaction tasks that characterize visual information analysis.
Towards the Meeting Room of the Future
While in this work we presented Dataspace as a research environment, it is our goal to continue exploring how its components can be better combined. We briefly discussed how the possibility of reconfiguring screens in space applies differently across application scenarios. However, we would like to further explore how screens can be dynamically reconfigured in space within the same application and how physical interaction (i.e. moving screens by hand) can be better exploited. Similarly, we hope to further explore the concept of egalitarian access to the data, a key design factor differentiating Dataspace from other immersive environments. In particular, how will gestures (e.g. raising your hand) and interfaces (e.g. the puck) be used in taking control, sharing and redirecting content within a group of people? In terms of collaboration, the possibility of decoupling visualizations and interactions and separately providing them to different users or groups of users represents an interesting challenge, requiring further study of how people will interact with each other and combine their individual results while using the same application in the same physical space. Finally, it is our mission to continue exploring how AR and VR technologies can come into play in these contexts, eventually becoming integral to collaborative environments.
Conclusion
In this paper we introduced a new hybrid environment called Dataspace, which aims at exploring new types of interaction in collaborative environments. In particular, Dataspace focuses on a seamless integration of different types of technology, offering a hybrid approach to the delivery of immersive analytics. Our discussion focused on the integration of physical workspaces with AR and VR technologies, which proved to be a fundamental extension for handling specific types of data and mitigating system scalability issues. We also demonstrated, through four real-world applications, how Dataspace can be used in very different domains and adapted to different user requirements, and we examined the advantages and trade-offs of the system compared with existing technologies. We believe this research will be helpful in developing better collaborative workspaces.
Immersive environments have gradually become standard for visualizing and analyzing large or complex datasets that would otherwise be cumbersome, if not impossible, to explore through smaller-scale computing devices. However, this type of workspace often has limitations in terms of interaction, flexibility, cost and scalability. In this paper we introduce a novel immersive environment called Dataspace, which features a new combination of heterogeneous technologies and methods of interaction towards creating a better team workspace. Dataspace provides 15 high-resolution displays that can be dynamically reconfigured in space through robotic arms, a central table where information can be projected, and a unique integration with augmented reality (AR) and virtual reality (VR) headsets and other mobile devices. In particular, we contribute novel interaction methodologies to couple the physical environment with AR and VR technologies, enabling visualization of complex types of data and mitigating the scalability issues of existing immersive environments. We demonstrate through four use cases how this environment can be effectively used across different domains and reconfigured based on user requirements. Finally, we compare Dataspace with existing technologies, summarizing the trade-offs that should be considered when attempting to build better collaborative workspaces for the future.