A general cipher for individual data anonymization

📝 Original Info

  • Title: A general cipher for individual data anonymization
  • ArXiv ID: 1712.02557
  • Date: 2017-12-08
  • Authors: Nicolas Ruiz (OECD)

📝 Abstract

Over the years, the literature on individual data anonymization has burgeoned in many directions. Borrowing from several areas of other sciences, the current diversity of concepts, models and tools available contributes to understanding and fostering individual data dissemination in a privacy-preserving way, as well as unleashing new sources of information for the benefits of society at large. However, such diversity doesn't come without some difficulties. Currently, the task of selecting the optimal analytical environment to conduct anonymization is complicated by the multitude of available choices. Based on recent contributions from the literature and inspired by cryptography, this paper proposes the first cipher for data anonymization. The functioning of this cipher shows that, in fact, every anonymization method can be viewed as a general form of rank swapping with unconstrained permutation structures. Beyond all the currently existing methods that it can mimic, this cipher offers a new way to practice data anonymization, notably by performing anonymization in an ex ante way, instead of being engaged in several ex post evaluations and iterations to reach the protection and information properties sought after. Moreover, the properties of this cipher point to some previously unknown general insights into the task of data anonymization considered at a general level of functioning. Finally, and to make the cipher operational, this paper proposes the introduction of permutation menus in data anonymization, where recently developed universal measures of disclosure risk and information loss are used ex ante for the calibration of permutation keys. To justify the relevance of their uses, a theoretical characterization of these measures is also proposed.

📄 Full Content

A general cipher for individual data anonymization

Nicolas Ruiz¹, OECD

Abstract

Over the years, the literature on individual data anonymization has burgeoned in many directions. Borrowing from several areas of other sciences, the current diversity of concepts, models and tools available contributes to understanding and fostering individual data dissemination in a privacy-preserving way, as well as unleashing new sources of information for the benefits of society at large. However, such diversity doesn’t come without some difficulties. Currently, the task of selecting the optimal analytical environment to conduct anonymization is complicated by the multitude of available choices. Based on recent contributions from the literature and inspired by cryptography, this paper proposes the first cipher for data anonymization. The functioning of this cipher shows that, in fact, every anonymization method can be viewed as a general form of rank swapping with unconstrained permutation structures. Beyond all the currently existing methods that it can mimic, this cipher offers a new way to practice data anonymization, notably by performing anonymization in an ex-ante way, instead of being engaged in several ex-post evaluations and iterations to reach the protection and information properties sought after. Moreover, the properties of this cipher point to some previously unknown general insights into the task of data anonymization considered at a general level of functioning. Finally, and to make the cipher operational, this paper proposes the introduction of permutation menus in data anonymization, where recently developed universal measures of disclosure risk and information loss are used ex-ante for the calibration of permutation keys. To justify the relevance of their uses, a theoretical characterization of these measures is also proposed.

Keywords: privacy-preserving data publishing, statistical disclosure control, permutation paradigm, permutation matrices, rank swapping, power means, cipher

1. Introduction and contributions of this paper

Data on individual subjects are increasingly gathered and exchanged. By their nature, they provide a rich amount of information that can inform statistical and policy analysis in a meaningful way. However, due to the legal obligations surrounding these data, this wealth of information is often not fully exploited, in order to protect the confidentiality of respondents and to avoid privacy threats. In fact, such requirements shape the dissemination policy of individual data at national and international levels. The issue is how to ensure a sufficient level of data protection to meet releasers’ concerns in terms of legal and ethical requirements, while still offering users a reasonable level of information.

Over the last decade, the role of micro data has changed from being the preserve of National Statistical Offices and government departments to being a vital tool for a wide range of analysts trying to understand both social and economic phenomena. This has raised a new range of questions and pressing needs about the privacy/information trade-off and the quest for best practices that are both useful to users and respectful of respondents’ privacy.

Statistical disclosure control (SDC) research has a rich history of addressing those issues by providing the analytical apparatus through which the privacy/information trade-off can be assessed and implemented. SDC consists of the set of tools that can enhance the level of confidentiality of any data while preserving, to a lesser or greater extent, its level of information (see [1] for an authoritative survey). Over the years, it has burgeoned in many directions. In particular, techniques applicable to micro data, which are the focus of this paper, offer a wide variety of tools to protect the confidentiality of respondents while maximizing the information content of the data released, for the benefit of society at large.

Such diversity is undoubtedly useful but has one major drawback: a lack of agreement and clarity on the appropriate choice of tools in a given context and, as a consequence, the lack of a comprehensive view (or at best an incomplete one) of the relative performance of the available techniques. The practical scope of current micro data masking methods is not fully exploited precisely because there is no overarching framework. Each method generally carries its own analytical environment, underlying approach and definitions of privacy and information.

¹ Contact: nicolas.ruiz@oecd.org. OECD, 2 rue André Pascal, 75016 Paris, France. Tel.: +33145241433

A step toward the resolution of this limitation has been recently proposed ([2], [3]), by establishing that any micro data masking method can be viewed as functionally equivalent to a permutation of the original data, plus eventual

…(Full text truncated)…
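The permutation view mentioned just before the truncation can be illustrated with a small sketch. It assumes a rank-based reverse mapping: for one attribute, the original values are rearranged so that they follow the rank order of the masked values. The result is a permutation of the original data, and what remains is a residual that leaves the ranks unchanged. The function names (`ranks`, `reverse_map`), the toy data and the interpretation of the residual term are assumptions made for illustration; they are not taken from the paper.

```python
def ranks(values):
    """Return the 0-based rank of each entry (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def reverse_map(original, masked):
    """Rearrange the original values so they follow the ranks of the masked values.

    Assumption: this rank-based construction stands in for the
    "permutation of the original data" described in the text.
    """
    sorted_original = sorted(original)
    return [sorted_original[r] for r in ranks(masked)]


# Hypothetical toy attribute and a hypothetical masked release of it.
original = [12.0, 35.0, 20.0, 51.0, 27.0]
masked = [15.5, 28.0, 31.2, 44.9, 19.3]

# Permuted version of the original values, same rank order as `masked`.
permuted = reverse_map(original, masked)

# Residual between the masked release and the permuted original values.
residual = [m - p for m, p in zip(masked, permuted)]

print("permuted:", permuted)
print("residual:", residual)
```

In this sketch the masking method itself never appears: only the original and masked columns are needed, which is what allows very different methods to be compared through the permutation they induce.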
