Predicting human performance in interaction tasks allows designers and developers to understand the expected performance of a target interface without actually testing it with real users. In this work, we present a deep neural net to model and predict human performance in performing a sequence of UI tasks. In particular, we focus on a dominant class of tasks: target selection from a vertical list or menu. We experimented with our deep neural net using a public dataset collected in a desktop laboratory environment and a dataset collected from hundreds of touchscreen smartphone users via crowdsourcing. Our model significantly outperformed previous methods on these datasets. Importantly, as a deep model, our method can easily incorporate additional UI attributes, such as visual appearance and content semantics, without changing the model architecture. By understanding how a deep learning model learns from human behaviors, our approach can serve as a vehicle for discovering new patterns about human behaviors to advance analytical modeling.
Models for predicting human performance in interaction tasks have long been pursued in the field of human-computer interaction [2-8, 11, 16]. In addition to the scientific value of understanding human behaviors, creating these models has practical value in user interface design and development. A predictive model allows a developer or designer to understand the expected performance of an interface without having to test it with real users, which can be expensive and time-consuming.
Several predictive models of human performance have been devised, including Fitts' law [8] and Hick's law [11], which are rooted in information theory and experimental psychology. However, these models each capture a single aspect of human performance in isolation, e.g., motor control or decision making, and are thus limited in modeling human performance in realistic interaction tasks where multiple factors interplay. Recent work (e.g., [2]) has attempted to develop compound models that combine models such as Fitts' law. While these methods have made great progress in predicting time performance in more realistic tasks, such analytical models are not easily extensible to accommodate new factors that might come into play.
In this work, we depart from existing analytical approaches to performance modeling by using a data-driven approach based on recent advances in deep learning [15]. Deep learning has proven successful in many domains, such as computer vision [15] and natural language processing [1]. It relieves the need for careful feature engineering and for extensive domain knowledge in creating a predictive model, and it can capture patterns that manifest only in the data and are difficult to articulate in an analytical form.
In particular, we devise a predictive model (see Figure 1) of interaction performance based on a deep recurrent neural net architecture using Long Short-Term Memory (LSTM) [12]. The unique architecture of our LSTM-based model allows us to naturally capture a variety of factors that come into play in UI tasks, including not only what human users are perceiving and performing at the moment but also what they have learned in the past about an interaction task. To scope our work, we focus on a common task on desktops and smartphones in which users select a target item from a vertical menu or list, e.g., choosing a song to play, a person to contact, selecting an application in the start menu, or simply activating a command in a dropdown menu. Because users often need to perform these selection tasks repeatedly over time, we investigate our approach in the context of a sequence of selection tasks.
We design a novel hierarchical deep architecture for menu performance modeling. In our architecture, one recurrent neural net encodes the UI attributes and task at each target item selection. This allows us to represent menus of varying length and to easily incorporate additional UI attributes such as visual appearance and semantics. A second recurrent net then captures learning effects, a major component of human performance. The entire model is trained end-to-end using stochastic gradient descent. The model outperforms existing analytical methods in various settings for predicting selection time. Importantly, it is easily extensible to accommodate new UI features and human factors involved in an interaction task, as illustrated by the sketch below.
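To make the hierarchy concrete, below is a minimal PyTorch sketch of this kind of two-level recurrent architecture. It is our illustration under stated assumptions, not the authors' implementation: all dimension sizes, the feature encoding, and the use of the final inner-LSTM state as the per-trial encoding are illustrative choices.

```python
# Hypothetical sketch of a hierarchical (two-level) LSTM for menu
# performance prediction; dimensions and structure are assumptions.
import torch
import torch.nn as nn

class MenuPerformanceModel(nn.Module):
    def __init__(self, item_feat_dim=32, item_hidden=64, trial_hidden=128):
        super().__init__()
        # Inner LSTM: encodes the items of one menu (variable length)
        # into a fixed-size representation of the current trial.
        self.item_encoder = nn.LSTM(item_feat_dim, item_hidden, batch_first=True)
        # Outer LSTM: consumes one trial encoding per selection and
        # carries state across trials, capturing learning effects.
        self.trial_rnn = nn.LSTM(item_hidden, trial_hidden, batch_first=True)
        # Regression head: predicts a selection time for each trial.
        self.head = nn.Linear(trial_hidden, 1)

    def forward(self, item_feats):
        # item_feats: (batch, n_trials, n_items, item_feat_dim)
        b, t, n, d = item_feats.shape
        # Encode every trial's menu independently with the inner LSTM.
        _, (h_items, _) = self.item_encoder(item_feats.view(b * t, n, d))
        trial_enc = h_items[-1].view(b, t, -1)       # one vector per trial
        # Run the outer LSTM over the sequence of trials.
        trial_states, _ = self.trial_rnn(trial_enc)
        return self.head(trial_states).squeeze(-1)   # predicted time per trial

model = MenuPerformanceModel()
pred = model(torch.randn(2, 10, 8, 32))  # 2 users, 10 trials, 8-item menus
```

Such a model could then be trained end-to-end with a regression loss against observed selection times, e.g., `nn.MSELoss` optimized with `torch.optim.SGD`, matching the stochastic-gradient-descent training described above.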
As with any machine learning model, and especially a deep architecture, the general challenge is that it is difficult to gain insight into what the model actually learns. We analyze how our model learns to mimic human behaviors. We show that our model "remembers" and "forgets" like a human: the model gains expertise on a visited item from past trials, and that expertise fades away over time if the user does not access the item for a while. Prior work models expertise as frequency counts [2, 7], which do not account for the "forgetting" effect in human behavior. We also discuss how this "memory effect" is affected by different menu organizations. We believe these analyses improve our understanding of how a deep learning model learns from human behaviors, which in turn can inspire analytical modeling.
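As a hypothetical illustration of this distinction (our own sketch, not code from this paper or from [2, 7]), compare a pure frequency count with a recency-weighted score whose past contributions decay, which is qualitatively the "forgetting" behavior described above:

```python
# Illustrative contrast between frequency-count expertise (never fades)
# and a decaying, recency-weighted score; decay rate is an assumption.
import math

def frequency_expertise(history, item):
    # Prior analytical models: expertise grows with each access, never fades.
    return sum(1 for h in history if h == item)

def decaying_expertise(history, item, decay=0.1):
    # "Forgetting": each past access contributes less the longer ago it
    # happened (t indexes trials in order; the most recent trial is last).
    n = len(history)
    return sum(math.exp(-decay * (n - t))
               for t, h in enumerate(history) if h == item)

history = ["a", "b", "a", "c", "c", "c"]
print(frequency_expertise(history, "a"))  # 2, regardless of when "a" was used
print(decaying_expertise(history, "a"))   # ~1.22, lower because "a" is stale
print(decaying_expertise(history, "c"))   # higher: "c" was accessed recently
```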
Extensive work has been conducted on modeling human behaviors and predicting human performance in interaction tasks. For example, Fitts' law [8] predicts the time needed for an expert human to acquire a visual target. Similarly, Hick's law [11] is a well-known model that describes the time an expert human requires to choose among a given number of options.
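For reference, in their commonly used forms (the Shannon formulation of Fitts' law and the standard form of Hick's law, as stated in the HCI literature generally rather than in this paper specifically):

$$ MT = a + b \log_2\!\left(\frac{D}{W} + 1\right), \qquad RT = b \log_2(n + 1), $$

where $MT$ is the movement time to a target at distance $D$ and of width $W$, $RT$ is the time to decide among $n$ equally probable options, and $a$, $b$ are empirically fitted constants (fitted separately for each law).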
While each of these previous methods is remarkably robust for modeling the specific aspect of human behavior it focuses on, they are limited in modeling realistic interaction tasks. For example, Fitts' law was originally proposed for the limited setting of a one-dimensional target with no distractors. Although prior work has extended Fitts' law in several directions, these extensions still address target acquisition in isolation rather than as part of a realistic, multi-factor interaction task.