Privacy by design in big data: An overview of privacy enhancing technologies in the era of big data analytics
The extensive collection and processing of personal information in big data analytics has given rise to serious privacy concerns, related to wide-scale electronic surveillance, profiling, and disclosure of private data. To reap the benefits of analytics without invading the individuals’ private sphere, it is essential to draw the limits of big data processing and integrate data protection safeguards in the analytics value chain. ENISA, with the current report, supports this approach and the position that the challenges of technology (for big data) should be addressed by the opportunities of technology (for privacy). We first explain the need to shift from “big data versus privacy” to “big data with privacy”. In this respect, the concept of privacy by design is key to identifying the privacy requirements early in the big data analytics value chain and to subsequently implementing the necessary technical and organizational measures. After an analysis of the proposed privacy by design strategies in the different phases of the big data value chain, we review privacy enhancing technologies of special interest for the current and future big data landscape. In particular, we discuss anonymization, the “traditional” analytics technique, the emerging area of encrypted search and privacy preserving computations, granular access control mechanisms, policy enforcement and accountability, as well as data provenance issues. Moreover, new transparency and access tools in big data are explored, together with techniques for user empowerment and control. Achieving “big data with privacy” is no easy task and a lot of research and implementation is still needed. Yet, it remains a possible task, as long as all the involved stakeholders take the necessary steps to integrate privacy and data protection safeguards in the heart of big data, by design and by default.
💡 Research Summary
The paper addresses the growing privacy concerns that arise from the massive collection and processing of personal data in big‑data analytics. It argues that the debate should move from a binary “big data versus privacy” stance to a collaborative “big data with privacy” paradigm, and positions privacy‑by‑design (PbD) as the cornerstone for achieving this shift. PbD requires that privacy requirements be identified early in the big‑data value chain—covering data acquisition, storage, preprocessing, analysis, sharing, and disposal—and that appropriate technical and organisational measures be embedded throughout.
The authors first map the value‑chain phases to specific PbD strategies. During data acquisition, they advocate minimal data collection, purpose limitation, and the use of anonymisation techniques such as k‑anonymity, l‑diversity, t‑closeness, and, more robustly, differential privacy to bound re‑identification risk. In the storage and preprocessing stage, they highlight searchable symmetric encryption (SSE) and encrypted indexing, which allow keyword queries over ciphertext without exposing raw data. For the analysis phase, the paper reviews privacy‑preserving computation methods, including homomorphic encryption (both partially and fully homomorphic schemes), secure multi‑party computation (SMC), and trusted execution environments (TEEs). While these technologies enable computation on encrypted data, the authors note current performance and scalability limitations.
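To make two of the anonymisation techniques above concrete, the sketch below (not from the report) shows a minimal k‑anonymity check over quasi‑identifier columns and an ε‑differentially private count released via the Laplace mechanism. The function names and the toy dataset are illustrative assumptions, not part of the original work.

```python
import math
import random
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values
    appears in at least k records (the k-anonymity condition)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon):
    """A counting query has sensitivity 1, so adding Laplace(1/epsilon)
    noise makes the released count epsilon-differentially private."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Toy dataset with generalised quasi-identifiers (zip prefix, age band).
people = [
    {"zip": "101**", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "101**", "age_band": "30-39", "diagnosis": "cold"},
    {"zip": "102**", "age_band": "40-49", "diagnosis": "flu"},
    {"zip": "102**", "age_band": "40-49", "diagnosis": "flu"},
]
print(is_k_anonymous(people, ["zip", "age_band"], 2))  # True
print(dp_count(people, lambda r: r["diagnosis"] == "flu", 1.0))
```

Note the trade-off the paper alludes to: a smaller ε gives stronger privacy but a noisier count, which is why differential privacy "bounds" rather than eliminates re‑identification risk.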
Access control is treated as a granular, attribute‑based problem. The paper discusses attribute‑based access control (ABAC) and attribute‑based encryption (ABE) as flexible mechanisms that can enforce dynamic policies based on user, data, and context attributes. To ensure accountability, it proposes real‑time policy enforcement engines coupled with immutable audit logs, enabling organisations to demonstrate compliance with regulations such as GDPR and CCPA. Data provenance mechanisms—tracking the lineage and transformations of datasets—are presented as essential for maintaining data integrity and transparency.
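As a rough illustration of the accountability mechanisms above, the sketch below combines a toy attribute‑based access decision (matching subject, resource, and context attributes against a policy) with a hash‑chained append‑only audit log whose integrity can be verified after the fact. The policy schema and class names are assumptions for this example; real ABAC deployments use richer policy languages such as XACML.

```python
import hashlib
import json

def abac_allows(policy, subject, resource, context):
    """Grant access only if every attribute constraint in the
    policy matches the corresponding request attribute."""
    request = {"subject": subject, "resource": resource, "context": context}
    return all(
        request[part].get(attr) in allowed
        for part, constraints in policy.items()
        for attr, allowed in constraints.items()
    )

class AuditLog:
    """Append-only log: each entry embeds the hash of its
    predecessor, so tampering with any entry breaks the chain."""
    def __init__(self):
        self.entries = []

    def append(self, event):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev, "hash": digest})

    def verify(self):
        prev = "0" * 64
        for e in self.entries:
            body = json.dumps(e["event"], sort_keys=True)
            expected = hashlib.sha256((prev + body).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

policy = {
    "subject": {"role": {"analyst", "dpo"}},
    "resource": {"sensitivity": {"pseudonymised"}},
    "context": {"purpose": {"research"}},
}
print(abac_allows(policy,
                  {"role": "analyst"},
                  {"sensitivity": "pseudonymised"},
                  {"purpose": "research"}))  # True
```

Logging every access decision into such a chain is one simple way an organisation could later demonstrate, as the paper suggests, that its enforcement engine behaved as claimed.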
Beyond technical safeguards, the authors explore transparency and user‑empowerment tools. Privacy dashboards, consent‑management platforms, and data‑access request interfaces give data subjects visibility into how their information is used and the ability to revoke or modify consent. These tools not only satisfy legal obligations but also foster trust between users and data controllers.
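The consent‑management idea above can be sketched as a small event‑sourced registry in which the latest grant or revocation per subject and purpose wins, and a dashboard view surfaces the subject's current state. This is a minimal sketch under assumed names; production consent platforms additionally record legal basis, consent text versions, and timestamps for audit.

```python
from datetime import datetime, timezone

class ConsentRegistry:
    """Tracks per-subject, per-purpose consent as an event log;
    the most recent grant or revocation determines the state."""
    def __init__(self):
        self._events = []  # (timestamp, subject_id, purpose, granted)

    def grant(self, subject_id, purpose):
        self._events.append((datetime.now(timezone.utc), subject_id, purpose, True))

    def revoke(self, subject_id, purpose):
        self._events.append((datetime.now(timezone.utc), subject_id, purpose, False))

    def is_permitted(self, subject_id, purpose):
        # Replay events in order; the last matching event wins.
        state = False
        for _, s, p, granted in self._events:
            if s == subject_id and p == purpose:
                state = granted
        return state

    def dashboard(self, subject_id):
        """What a privacy dashboard would surface: the subject's
        current consent state for every purpose seen so far."""
        purposes = {p for _, s, p, _ in self._events if s == subject_id}
        return {p: self.is_permitted(subject_id, p) for p in purposes}
```

Keeping the full event history, rather than only the current state, is what lets a controller show *when* consent was granted or revoked, supporting both the revocation ability and the trust-building role the paper highlights.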
The paper concludes by acknowledging that many privacy‑enhancing technologies (PETs) are still in early‑stage development, facing challenges related to computational overhead, usability, and lack of standardisation. It calls for coordinated research, standard‑setting, and cross‑sector collaboration among academia, industry, and regulators to embed privacy and data‑protection safeguards “by design and by default” into the heart of big‑data systems. Only through such concerted effort can the vision of “big data with privacy” become a practical reality.