Interpretable Machine Learning for Privacy-Preserving Pervasive Systems

Reading time: 5 minutes

📝 Abstract

Our everyday interactions with pervasive systems generate traces that capture various aspects of human behavior and enable machine learning algorithms to extract latent information about users. In this paper, we propose a machine learning interpretability framework that enables users to understand how these generated traces violate their privacy.


📄 Content

With the emergence of connected devices (e.g., smartphones and smart meters), pervasive systems generate growing amounts of digital traces as users go about their everyday activities. These traces are crucial for service providers to understand their customers, increase the degree of personalization, and enhance the quality of their services. For instance, personal digital traces stemming from public transit smartcards help transportation providers understand the commuting patterns of users; the usage statistics of home appliances can be used to improve energy efficiency; on-street cameras provide police officers with new ways of investigating crimes; content generated through mobile devices and wearables (e.g., posts on online social media or GPS running routes on specialized websites such as those for fitness) can be used to provide tailored content to individuals; and bank transaction logs can be used to spot unusual activity in accounts.
However, sharing these digital traces generated by pervasive systems with service providers might raise concerns with regard to user privacy, as the processing and analysis of these traces can surface latent information about user behaviors. Using machine learning techniques, third parties such as advertisers can identify a single individual from inadequately aggregated datasets shared by service providers either publicly or privately. The common use of ad libraries integrated directly into applications and websites further allows advertisers to collect the same raw traces as the service providers and infer personal information about users, which can infringe on the users' privacy. In the case of location tracking libraries, these traces might reveal information about the significant places routinely visited by users, making it possible to infer a wide range of personal information, including the user's place of residence and work and their future locations.
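To make this concrete, the sketch below shows the kind of inference an adversary could run on raw location traces: clustering GPS fixes to recover significant places. This is one common approach rather than the paper's specific method; the sample coordinates, the ~100 m radius, and the use of scikit-learn's DBSCAN are all illustrative assumptions.

```python
# Minimal sketch: inferring "significant places" from raw GPS traces.
# Data and parameters are hypothetical; uses scikit-learn's DBSCAN.
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical GPS fixes (latitude, longitude) collected by a tracking library.
points = np.array([
    [51.5246, -0.1340],  # repeated fixes near a workplace
    [51.5248, -0.1339],
    [51.5247, -0.1341],
    [51.4613, -0.1157],  # repeated fixes near a home
    [51.4612, -0.1156],
    [51.4614, -0.1158],
    [51.5033, -0.1196],  # one-off visit (noise)
])

# Cluster fixes within ~100 m of each other; the haversine metric
# expects coordinates in radians and returns angular distances.
earth_radius_m = 6_371_000
db = DBSCAN(eps=100 / earth_radius_m, min_samples=3, metric="haversine")
labels = db.fit_predict(np.radians(points))

# Each non-noise cluster centroid is a candidate significant place
# (e.g., home or work), from which routines and habits can be inferred.
for label in set(labels) - {-1}:
    centroid = points[labels == label].mean(axis=0)
    print(f"significant place {label}: lat={centroid[0]:.4f}, lon={centroid[1]:.4f}")
```

Even this simple pipeline recovers home and work locations from a handful of fixes, which illustrates why users need intelligible explanations of what their traces reveal.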
The main focus of the existing work has been on the performance and interpretability of the techniques used to infer personal information and identify users from their digital traces (e.g., Kosinski et al. [1]). In particular, there has been considerable interest in improving the intelligibility of machine learning models for various audiences, mainly by giving effective and intelligible explanations of the inference task and model to the user [2, 3]. As a result, the explanations must provide an intelligible representation to the users of what the model knows and how it knows it. With the rise of adversarial and linking attacks on machine learning models, these explanations are important to guarantee the fairness and accountability of the models to the users [4]. Some works have studied the privacy impact of specific models and proposed methods to improve their interpretability in terms of privacy by allowing the user to adapt the learning and inference algorithms according to their own privacy preferences [5]. However, there has been limited work on how these inference techniques may infringe on user privacy through the personal information they expose. In addition to legal requirements [6], the need for interpretability through effective explanations of the learning and inference process leading to certain predictions is twofold: (i) it helps users understand why their privacy has been violated, and (ii) it enables users to trust the model's predictions and recommendations and to take the necessary actions to protect their privacy in the future.

In this paper, we discuss the challenges related to the design of an interpretability framework with the goal of supporting the interpretation of machine learning techniques that are adopted to infringe the privacy of individuals through personal data inference and user identification. Our contributions are threefold. First, we state the interpretability and privacy requirements of an effective interpretability framework for privacy-preserving pervasive systems before detailing the functionalities of its components, with a focus on feature selection methods, as they are crucial when it comes to presenting explanations to users. Second, we present a case study in which we detail a prototype framework that relies on machine learning classifiers with the goal of identifying users from samples of their personal digital traces (a minimal sketch of such a pipeline is given below). Finally, we present the open challenges in this area, discussing a potential…

This content was AI-processed from arXiv data.
