Visual Summary of Value-level Feature Attribution in Prediction Classes with Recurrent Neural Networks
Conclusion

As deep learning is pervasively used in decision-making tasks for multidimensional sequential data analysis, it is essential to understand the features and temporal patterns that contribute to predictions. In this work, we present ViSFA, the first visual analytics system that scalably summarizes value-level feature attribution with recurrent neural attention networks. We test ViSFA on two real-world datasets, each with two RNN models. The case study results demonstrate that ViSFA can 1) help distill contributing patterns for RNN models with different prediction performances, 2) reveal gradual changes in the RNN model learning process, and 3) help users effectively reason about value-level feature attribution in different application domains; the visual summaries of temporal patterns in feature attribution also provide guidelines for future decision-making. We hope our work will motivate further research in developing domain-user-oriented analysis systems with deep learning.

Design Considerations

We make a few observations about RNN models before visualizing feature attribution with them. First, if an RNN model is successfully trained, the instances within a prediction class are expected to share common characteristics that the model can capture. In other words, if the instances within a prediction class share no common pattern, it is impractical to learn a convergent or high-performance RNN model. Second, characteristics differ across prediction classes. If the commonalities of one class cannot distinguish it from another, model training cannot succeed. Third, the attention weight of an event reflects the importance of that event in making this distinction. However, because state-of-the-art RNN models trained on real-world datasets can hardly achieve 100% accuracy, the learned attention weights are often not completely accurate. For example, LSTMVis observed interpretable patterns but also significant noise when studying RNN models. Likewise, AttentionHeatmap revealed that the filtered events contain noise that does not follow the major pattern, no matter what attention range the user selects.
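To make the third observation concrete, the following is a minimal, hypothetical sketch (not ViSFA's actual implementation) of how attention weights assign per-event importance: raw alignment scores over a sequence are softmax-normalized, so each weight reflects the relative contribution of one event to the prediction.

```python
import numpy as np

def attention_importance(scores):
    """Softmax-normalize raw alignment scores into attention weights.

    Each weight reflects the relative importance the model assigns to
    an event (time step) when forming its prediction.
    """
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())  # shift by max for numerical stability
    return exp / exp.sum()

# A hypothetical sequence of four events; the third event dominates.
weights = attention_importance([0.1, 0.2, 2.5, 0.3])
```

Because the weights sum to one, they can be compared across events within a sequence, but, as noted above, they inherit the noise of an imperfectly trained model.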

Based on the above observations, we derive the following design goals (DGs):

DG1: Facilitate attribution analysis for tensors composed of time, instance, feature, and feature-value dimensions. Given such multi-dimensional tensors, how do we synthesize meaningful visualizations for interpreting a model's prediction? Knowing how entire classes share common patterns is not enough, because these patterns do not necessarily contribute to model formation, i.e., to distinguishing different classes. Therefore, the system should help distill the complex data to find the contributing subset and visualize the patterns within it.
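As a hypothetical sketch of this distillation step (the array names, shapes, and the top-quartile threshold are illustrative assumptions, not ViSFA's design): given a value tensor indexed by instance, time, and feature, one can mask out events with low attention and summarize only the contributing subset.

```python
import numpy as np

# Hypothetical data: feature values of shape (instance, time, feature),
# plus per-event attention weights of shape (instance, time).
rng = np.random.default_rng(0)
values = rng.random((100, 20, 5))     # 100 instances, 20 steps, 5 features
attention = rng.random((100, 20))

# Distill the contributing subset: keep only events whose attention
# weight falls in the top quartile, masking the rest out as NaN.
threshold = np.quantile(attention, 0.75)
mask = attention >= threshold          # (instance, time) boolean mask
contributing = np.where(mask[..., None], values, np.nan)

# Summarize the subset per feature over instances and time, ignoring
# the masked-out (non-contributing) events.
summary = np.nanmean(contributing, axis=(0, 1))  # one value per feature
```

The resulting per-feature summary describes only the high-attention events, which is the subset a visualization would then render.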

DG2: Highlight major patterns across the instances within each prediction class. As mentioned earlier, significant noise is present when studying deep learning models, and the learned attention weights are noisy too. The visualization should remove or minimize the influence of noise and highlight the major patterns in the data. In addition, the visualization should highlight the common patterns shared among the instances in a prediction class, as these common patterns are the key for users to gain insights into each class.
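One simple, noise-robust way to surface a class's shared pattern, offered here as an illustrative sketch rather than ViSFA's method, is to aggregate per-event attention across instances with a median, which suppresses the minority of noisy instances better than a mean would:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical attention matrix for one class: (instance, time).
# Most instances share an attention peak at step 10; a few noisy
# instances do not follow that pattern at all.
clean = np.exp(-0.5 * ((np.arange(20) - 10) / 2.0) ** 2)
attention = np.tile(clean, (50, 1)) + 0.05 * rng.standard_normal((50, 20))
attention[:5] = rng.random((5, 20))   # 10% of instances are pure noise

# The median across instances is robust to those outliers, so the
# shared peak at step 10 stands out as the class's major pattern.
profile = np.median(attention, axis=0)
```

A visualization built on such a robust aggregate highlights what most instances have in common instead of letting outliers distort the summary.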

DG3: Contrast differences between prediction classes. The visualization design should facilitate easy and fair comparison between different classes. For example, visual comparison between imbalanced classes can suffer from inequity if the visualization results are affected by class sizes. The design should guarantee that the visualized pattern is a true reflection of the class it represents rather than an artifact of class size.
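A minimal sketch of class-size-fair comparison (assuming, for illustration, that each class is summarized by a distribution of attended feature values): density-normalized histograms integrate to one regardless of how many instances each class contains, so the comparison reflects distribution shape rather than class size.

```python
import numpy as np

def class_profile(values, bins):
    """Density-normalized histogram: the area under the profile is 1
    regardless of class size, enabling fair cross-class comparison."""
    hist, _ = np.histogram(values, bins=bins, density=True)
    return hist

rng = np.random.default_rng(2)
big   = rng.normal(0.0, 1.0, size=10_000)  # large class
small = rng.normal(0.0, 1.0, size=100)     # small class, same distribution
bins = np.linspace(-4, 4, 33)

p_big = class_profile(big, bins)
p_small = class_profile(small, bins)
# Both profiles integrate to ~1, so differences between them reflect
# distribution shape, not the 100x difference in class size.
```

Without this normalization, the larger class would dominate any overlaid comparison purely by volume.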

DG4: Be able to scale to large datasets. Because state-of-the-art models form predictions with millions of weights optimized over millions of data instances, explaining predictions for single data instances often misses the bigger picture. Understanding how entire classes contribute to a model is important for trusting the model's predictions and deciphering what it has learned. Therefore, the design should build representations of entire classes regardless of class size.
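The following is a hypothetical sketch of such a class-level representation (the aggregation scheme is an illustrative assumption): collapsing any number of instances into one fixed-size vector by taking an attention-weighted average over time, then a mean over instances. The output shape depends only on the feature dimension, never on class size.

```python
import numpy as np

def class_representation(values, attention):
    """Collapse an arbitrary number of instances into one fixed-size
    class representation.

    values    -- (instance, time, feature) feature values
    attention -- (instance, time) attention weights
    Returns a (feature,) vector whose size is independent of class size.
    """
    # Normalize attention per instance so each instance contributes
    # a weighted average over its own time steps.
    w = attention / attention.sum(axis=1, keepdims=True)
    per_instance = np.einsum('it,itf->if', w, values)  # (instance, feature)
    return per_instance.mean(axis=0)                   # (feature,)

rng = np.random.default_rng(3)
small_class = class_representation(rng.random((10, 20, 5)),
                                   rng.random((10, 20)))
large_class = class_representation(rng.random((5000, 20, 5)),
                                   rng.random((5000, 20)))
# Both representations have identical shape despite a 500x size gap.
```

Because the representation size is constant, downstream visual encodings stay legible no matter how many instances a class holds.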

DG5: Be generic across applications. RNNs have become widely used across different domains because of their generality. Even vanilla RNN models can be adapted to multiple disciplines, such as forecasting stock prices in finance and predicting purchasing intent in customer analysis. It is challenging but meaningful to build a domain-independent visualization system. For users from different domains, we should provide easy-to-interpret interaction and visualization designs.