Preserving Individual Privacy in Serial Data Publishing

Notice: This research summary and analysis were automatically generated using AI technology; for absolute accuracy, please refer to the original arXiv source.

While previous works on privacy-preserving serial data publishing consider the scenario where sensitive values may persist over multiple data releases, we find that no previous work provides sufficient protection for sensitive values that can change over time, which is arguably the more common case. In this work we study the privacy guarantee for such transient sensitive values, which we call the global guarantee. We formally define the problem of achieving this guarantee and derive some theoretical properties for it. We show that the sizes of the anonymized groups used in the data anonymization are a key factor in protecting individual privacy in serial publication. We propose two anonymization strategies, one aimed at minimizing the average group size and the other at minimizing the maximum group size. Finally, we conduct experiments on a medical dataset to show that our method is highly efficient and also produces published data of very high utility.


💡 Research Summary

The paper addresses a gap in the literature on privacy‑preserving serial data publishing: existing work assumes that sensitive values are persistent across releases, while in many real‑world scenarios (e.g., medical records) the sensitive attribute can change over time. The authors introduce the notion of a “global guarantee,” which requires that for any individual the probability of being linked to a particular sensitive value in at least one of the published releases does not exceed 1/ℓ. This is distinguished from the traditional “localized guarantee” that bounds the linkage probability within each individual release. Through a concrete example with two releases, the authors demonstrate that satisfying the localized guarantee does not necessarily satisfy the global guarantee, because the combination of possible worlds across releases can increase the overall linkage probability.
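The gap between the two guarantees can be seen with a toy calculation. This sketch uses a simplifying independence assumption across releases (the paper's actual analysis counts possible worlds, which is more involved): each release on its own respects the localized bound 1/ℓ, yet the chance of being linked in at least one of k releases exceeds it.

```python
# Toy illustration (independence assumption across releases; the paper's
# possible-world counting is more involved, but the direction is the same).
l, k = 2, 2
per_release = 1 / l                       # each release meets the localized bound 1/l
at_least_once = 1 - (1 - per_release)**k  # linkage in at least one of the k releases
print(per_release, at_least_once)         # 0.5 0.75 -- the global bound 1/l is violated
```

Even though every individual release looks safe in isolation, an adversary who observes all releases can narrow down the possibilities, which is exactly why the localized guarantee is insufficient.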

The core technical contribution is a formal analysis showing that the size of anonymized groups (AGs) is the decisive factor for achieving the global guarantee. The paper derives a general formula for the breach probability p(o, s, k) based on counting possible worlds, and proves that group sizes must be larger than those required for simple ℓ‑diversity. Specifically, the authors prove that to guarantee the global bound, each group must have a size that grows with the number of releases, and they provide bounds relating group size, the privacy parameter ℓ, and the number of releases k.
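Under the same simplifying independence assumption as above (not the paper's exact possible-world formula for p(o, s, k)), one can sketch how the required group size grows with the number of releases: demanding that the combined linkage probability stay below 1/ℓ forces group sizes well beyond the plain ℓ-diversity requirement. The function name and derivation below are illustrative assumptions.

```python
import math

def min_group_size(l: int, k: int) -> int:
    """Hypothetical sketch: smallest group size m such that, if each of k
    independent releases links an individual to the sensitive value with
    probability 1/m, the chance of linkage in at least one release stays
    below 1/l.  From 1 - (1 - 1/m)**k <= 1/l we get
    m >= 1 / (1 - (1 - 1/l)**(1/k))."""
    return math.ceil(1 / (1 - (1 - 1 / l) ** (1 / k)))

# For a single release this reduces to the plain l-diversity group size,
# and the requirement grows as more releases are published.
print(min_group_size(2, 1), min_group_size(2, 3))  # 2 5
```

This matches the paper's qualitative finding: the group size needed for the global guarantee increases with the number of releases k, for a fixed privacy parameter ℓ.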

Building on these insights, two optimization strategies are proposed: (1) MinAvg, which minimizes the average group size across all releases while satisfying the global guarantee, and (2) MinMax, which minimizes the maximum group size to avoid extreme information loss. Both strategies formulate the anonymization problem as a constrained optimization problem, where the constraints enforce the derived group‑size bounds and the objective functions capture the respective average or maximum size metric. The algorithms iteratively adjust QID generalization levels to form groups that meet the constraints with minimal utility loss.
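The two objectives can be made concrete with a small partitioning sketch. This is not the paper's algorithm (which adjusts QID generalization levels); it is a hypothetical helper showing the combinatorial core: given n records and a minimum group size m from the derived bounds, using as many groups as possible minimizes the average group size, and spreading records evenly over those groups minimizes the maximum group size.

```python
def partition_sizes(n: int, m: int) -> list[int]:
    """Hypothetical sketch: split n records into the largest number of
    groups whose sizes are all >= m (minimizing the average size n/g),
    with records spread as evenly as possible (minimizing the max size)."""
    g = n // m                      # most groups possible with min size m
    if g == 0:
        return [n]                  # too few records: a single group
    base, extra = divmod(n, g)      # even spread: 'extra' groups get one more
    return [base + 1] * extra + [base] * (g - extra)

print(partition_sizes(10, 3))  # [4, 3, 3]
```

Once the number of groups g is fixed, the average size n/g is determined, so MinAvg reduces to maximizing g; MinMax then favors the balanced split, since any uneven redistribution can only raise the largest group.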

The experimental evaluation uses a real medical dataset containing patient diagnoses. The authors vary ℓ (2, 3, 5) and the number of releases (3, 5, 7) and compare their methods against baseline serial k‑anonymity approaches. Metrics include execution time, average and maximum group sizes, and query accuracy measured by aggregate query error. Results show that the proposed methods achieve the global guarantee with substantially smaller average group sizes (20‑35 % reduction) and lower maximum group sizes (over 30 % reduction) compared to baselines. Consequently, the utility of the published data improves, as evidenced by a 10‑15 % reduction in query error. The algorithms scale linearly with dataset size, indicating suitability for near‑real‑time publishing scenarios.

In summary, the paper makes three key contributions: (1) it formalizes the global privacy guarantee for transient sensitive values in serial publishing, (2) it provides theoretical bounds linking group size to the guarantee, and (3) it offers practical anonymization algorithms that balance privacy and utility. The work opens avenues for future research on multi‑attribute sensitive values, adaptive group re‑formation over time, and integration with machine‑learning‑driven data analysis pipelines.

