📝 Original Info
- Title: A general cipher for individual data anonymization
- ArXiv ID: 1712.02557
- Date: 2017-12-08
- Authors: Nicolas Ruiz (OECD)
📝 Abstract
Over the years, the literature on individual data anonymization has burgeoned in many directions. Borrowing from several areas of other sciences, the current diversity of concepts, models and tools available contributes to understanding and fostering individual data dissemination in a privacy-preserving way, as well as unleashing new sources of information for the benefits of society at large. However, such diversity doesn't come without some difficulties. Currently, the task of selecting the optimal analytical environment to conduct anonymization is complicated by the multitude of available choices. Based on recent contributions from the literature and inspired by cryptography, this paper proposes the first cipher for data anonymization. The functioning of this cipher shows that, in fact, every anonymization method can be viewed as a general form of rank swapping with unconstrained permutation structures. Beyond all the currently existing methods that it can mimic, this cipher offers a new way to practice data anonymization, notably by performing anonymization in an ex ante way, instead of being engaged in several ex post evaluations and iterations to reach the protection and information properties sought after. Moreover, the properties of this cipher point to some previously unknown general insights into the task of data anonymization considered at a general level of functioning. Finally, and to make the cipher operational, this paper proposes the introduction of permutation menus in data anonymization, where recently developed universal measures of disclosure risk and information loss are used ex ante for the calibration of permutation keys. To justify the relevance of their uses, a theoretical characterization of these measures is also proposed.
📄 Full Content
A general cipher for individual data anonymization
Nicolas Ruiz¹
OECD
Abstract
Over the years, the literature on individual data anonymization has burgeoned in many
directions. Borrowing from several areas of other sciences, the current diversity of concepts,
models and tools available contributes to understanding and fostering individual data
dissemination in a privacy-preserving way, as well as unleashing new sources of information
for the benefits of society at large. However, such diversity doesn’t come without some
difficulties. Currently, the task of selecting the optimal analytical environment to conduct
anonymization is complicated by the multitude of available choices. Based on recent
contributions from the literature and inspired by cryptography, this paper proposes the first
cipher for data anonymization. The functioning of this cipher shows that, in fact, every
anonymization method can be viewed as a general form of rank swapping with unconstrained
permutation structures. Beyond all the currently existing methods that it can mimic, this
cipher offers a new way to practice data anonymization, notably by performing
anonymization in an ex-ante way, instead of being engaged in several ex-post evaluations and
iterations to reach the protection and information properties sought after. Moreover, the
properties of this cipher point to some previously unknown general insights into the task of
data anonymization considered at a general level of functioning. Finally, and to make the
cipher operational, this paper proposes the introduction of permutation menus in data
anonymization, where recently developed universal measures of disclosure risk and
information loss are used ex-ante for the calibration of permutation keys. To justify the
relevance of their uses, a theoretical characterization of these measures is also proposed.
Keywords: privacy-preserving data publishing, statistical disclosure control, permutation
paradigm, permutation matrices, rank swapping, power means, cipher
1. Introduction and contributions of this paper
Data on individual subjects are increasingly gathered and exchanged. By their nature, they
provide a rich amount of information that can inform statistical and policy analysis in a meaningful
way. However, because of the legal obligations surrounding these data, this wealth of information is
often not fully exploited: access is restricted to protect the confidentiality of respondents and to
avoid privacy threats. In fact, such requirements shape the dissemination policy of individual data at
national and international levels. The issue is how to ensure a level of data protection sufficient to
meet releasers’ legal and ethical requirements, while still offering users a reasonable level of
information. Over the last decade, micro data have gone from being the preserve of
National Statistical Offices and government departments to being a vital tool for a wide range of
analysts trying to understand both social and economic phenomena. This has raised a new range of
questions and pressing needs about the privacy/information trade-off and the quest for best practices
that are both useful to users and respectful of respondents’ privacy.
Statistical disclosure control (SDC) research has a rich history of addressing those issues by
providing the analytical apparatus through which the privacy/information trade-off can be assessed
and implemented. SDC consists of the set of tools that can enhance the level of confidentiality of any
data while preserving, to a lesser or greater extent, its level of information (see [1] for an
authoritative survey). Over the years, it has burgeoned in many directions. In particular, techniques
applicable to micro data, which are the focus of this paper, offer a wide variety of tools to protect the
confidentiality of respondents while maximizing the information content of the data released, for the
benefit of society at large. Such diversity is undoubtedly useful but has one major
drawback: a lack of agreement and clarity on the appropriate choice of tools in a given context and, as
a consequence, the lack of a comprehensive view (or at best an incomplete one) of the relative
performance of the techniques available. The practical scope of current micro data masking methods
is not fully exploited precisely because there is no overarching framework. Each method generally
carries its own analytical environment, underlying approach and definitions of privacy and
information.
¹ Contact: nicolas.ruiz@oecd.org. OECD, 2 rue André Pascal, 75016, Paris, France. Tel.: +33145241433
A step toward the resolution of this limitation has been recently proposed ([2], [3]), by
establishing that any micro data masking method can be viewed as functionally equivalent to a
permutation of the original data, plus eventual
…(Full text truncated)…
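The permutation view referred to above ([2], [3]) can be illustrated with a small sketch. The text is truncated at this point, so what follows is an assumption about how such an equivalence is usually made concrete, not the author's construction: for a single attribute, each record is given the original value whose rank equals the rank of that record's masked value, and whatever remains is a residual. The function names (`ranks`, `permutation_key`, `apply_key`) and the toy data are hypothetical.

```python
def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by record order."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # sorted() is stable
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def permutation_key(original, masked):
    """Index list p such that record i receives original[p[i]].

    Assumption for this sketch: the key is recovered by matching the rank of
    each masked value with the original value holding the same rank.
    """
    by_rank = sorted(range(len(original)), key=lambda i: original[i])
    return [by_rank[r] for r in ranks(masked)]


def apply_key(original, key):
    """Permute the original values according to the key."""
    return [original[j] for j in key]


# Hypothetical attribute and a hypothetical noise-masked release of it.
original = [10, 20, 30, 40, 50]
masked = [12.0, 35.0, 28.0, 41.0, 39.0]

key = permutation_key(original, masked)
permuted = apply_key(original, key)
# What is left once the permutation has been accounted for.
residual = [m - p for m, p in zip(masked, permuted)]

print(key)       # [0, 2, 1, 4, 3]
print(permuted)  # [10, 30, 20, 50, 40]
print(residual)  # [2.0, 5.0, 8.0, -9.0, -1.0]
```

In this toy case the masked attribute has exactly the same rank order as `permuted`, which is a pure rearrangement of the original values. The key `[0, 2, 1, 4, 3]` therefore plays the role of a permutation key in the sense suggested by the abstract, while the residual carries no additional rank information.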