Interpretable Machine Learning for Privacy-Preserving Pervasive Systems
📝 Abstract
Our everyday interactions with pervasive systems generate traces that capture various aspects of human behavior and enable machine learning algorithms to extract latent information about users. In this paper, we propose a machine learning interpretability framework that enables users to understand how these generated traces violate their privacy.
📄 Content
With the emergence of connected devices (e.g., smartphones and smart meters), pervasive systems generate growing amounts of digital traces as users go about their everyday activities. These traces are crucial for service providers to understand their customers, increase the degree of personalization, and enhance the quality of their services. For instance, personal digital traces stemming from public transit smartcards help transportation providers understand the commuting patterns of users; the usage statistics of home appliances can be used to improve energy efficiency; on-street cameras provide police officers with new ways of investigating crimes; content generated through mobile and wearable devices (e.g., posts in online social media or GPS running routes in specialized websites such as those for fitness) can be used to provide tailored content to individuals; and bank transaction logs can be used to spot unusual activity in accounts.
However, sharing these digital traces generated by pervasive systems with service providers might raise concerns with regard to user privacy, as the processing and analysis of these traces can surface latent information about user behaviors. Using machine learning techniques, third parties such as advertisers can identify a single individual from inadequately aggregated datasets shared by service providers either publicly or privately. The common use of ad libraries integrated directly into applications and websites further allows advertisers to collect the same raw traces as the service providers and infer personal information about users, which can infringe on the users' privacy.
the users’ privacy. In the case of location tracking libraries, these traces might reveal information
Benjamin Baron
University College London
Mirco Musolesi
University College London
and The Alan Turing Institute
about the significant places routinely visited by users, which allows to infer a wide range of per-
sonal information, including the user’s place of residence and work and their future locations.
The main focus of existing work has been on the performance and interpretability of the techniques used to infer personal information and identify users from their digital traces (e.g., Kosinski et al. [1]). In particular, there has been considerable interest in improving the intelligibility of machine learning models for various audiences, mainly by giving effective and intelligible explanations of the inference task and model to the user [2], [3]. Such explanations must provide users with an intelligible representation of what the model knows and how it knows it. With the rise of adversarial and linking attacks on machine learning models, these explanations are important for guaranteeing the fairness and accountability of the models to their users [4]. Some studies have examined the privacy impact of specific models and proposed methods to improve their interpretability in terms of privacy by allowing users to adapt the learning and inference algorithms according to their own privacy preferences [5]. However, there has been limited work on how these inference techniques may infringe on user privacy through the personal information they expose. In addition to legal requirements [6], the need for interpretability through effective explanations of the learning and inference process leading to certain predictions is twofold: (i) explanations help users understand why their privacy has been violated, and (ii) they enable users to trust the model's predictions and recommendations so that they can take the necessary actions to protect their privacy in the future.
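To make these two roles of explanations concrete, here is a minimal sketch (our own illustration, not the authors' framework; the weights, bias, and feature names are hypothetical) of how a linear classifier's prediction can be decomposed into per-feature contributions that could be shown to a user alongside the inferred attribute:

```python
import math

# Hypothetical model inferring a sensitive attribute from trace features.
weights = {"night_activity": 1.4, "gps_home_visits": 2.1, "app_usage": 0.3}
bias = -1.0

def predict_with_explanation(sample):
    """Return the predicted probability and each feature's contribution."""
    contributions = {f: weights[f] * sample.get(f, 0.0) for f in weights}
    score = bias + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-score))
    # Rank features by absolute contribution: this ranking is the
    # explanation a user would see ("why was this inferred about me?").
    ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    return prob, ranked

prob, ranked = predict_with_explanation(
    {"night_activity": 1.0, "gps_home_visits": 1.0, "app_usage": 0.5}
)
```

For linear models this decomposition is exact; for non-linear models, post-hoc techniques such as LIME or SHAP approximate the same kind of per-feature attribution.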
In this paper, we discuss the challenges related to the design of an interpretability framework whose goal is to support the interpretation of machine learning techniques used to infringe on the privacy of individuals through personal data inference and user identification. Our contributions are threefold. First, we state the interpretability and privacy requirements of an effective interpretability framework for privacy-preserving pervasive systems before detailing the functionalities of its components, with a focus on feature selection methods, as they are crucial when it comes to presenting explanations to users. Second, we present a case study detailing a prototype framework that relies on machine learning classifiers to identify users from samples of their personal digital traces. Finally, we present the open challenges in this area, discussing a potential