Why developers cannot embed privacy into software systems? An empirical investigation
Pervasive use of software applications continues to challenge user privacy when users interact with software systems. Even though privacy practices such as Privacy by Design (PbD) provide clear instructions for software developers to embed privacy into software designs, those practices have yet to become common among software developers. The difficulty of developing privacy-preserving software systems highlights the importance of investigating software developers and the problems they face when they are asked to embed privacy into application designs. Software developers are the community who can put practices such as PbD into action. Therefore, identifying the problems they face when embedding privacy into software applications, and providing solutions to those problems, is important to enable the development of privacy-preserving software systems. This study investigates 36 software developers in a software design task with instructions to embed privacy, in order to identify the problems they face. We derive recommendation guidelines that address these problems and enable the development of privacy-preserving software systems.
💡 Research Summary
The paper tackles a paradox: despite widespread advocacy of Privacy‑by‑Design (PbD) principles, software developers rarely embed privacy into their systems in practice. To uncover why, the authors conducted an empirical study with thirty‑six professional developers who were asked to design two realistic applications—a web‑based e‑commerce platform and a mobile health‑tracking app—under explicit instructions to incorporate privacy safeguards. Participants documented their design artifacts (architectural diagrams, data‑flow models, technology choices) and then took part in semi‑structured interviews that probed the difficulties they encountered, the reasoning behind their decisions, and the resources they consulted.
Qualitative coding of interview transcripts revealed four dominant problem domains. First, ambiguous privacy requirements: developers were given high‑level concepts such as “data minimisation” and “purpose limitation” without concrete operational definitions, making it hard to decide which data items needed protection, at what granularity, and for how long. Second, lack of tooling support: mainstream IDEs, version‑control systems, and CI/CD pipelines do not provide automated data‑flow tracing, privacy impact checks, or built‑in libraries for anonymisation and consent management. Consequently, developers had to manually search for cryptographic primitives and evaluate performance trade‑offs, a time‑consuming process. Third, organizational and project constraints: tight deadlines, limited budgets, and the absence of a designated privacy officer caused privacy tasks to be deprioritised or postponed. The study observed that many teams lacked formal processes for escalating privacy‑related conflicts. Fourth, skill and knowledge gaps: while most participants possessed solid security knowledge, they often conflated security with privacy, assuming that implementing encryption alone would satisfy privacy obligations. They were less familiar with legal nuances, consent lifecycle management, and the principle of purpose‑driven data handling.
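The conflation of security with privacy described above can be made concrete with a small, hypothetical sketch (not from the paper): encrypting stored records protects them from attackers, but data minimisation—collecting only the fields the declared purpose requires—is a separate, privacy-specific decision that encryption alone does not address. The field names and purpose table below are illustrative assumptions.

```python
# Hypothetical illustration: data minimisation is orthogonal to encryption.
# A newsletter signup only needs an email address; collecting (and even
# encrypting) extra fields still violates the minimisation principle.

FULL_PROFILE = {
    "email": "alice@example.com",
    "date_of_birth": "1990-01-01",    # not needed to send a newsletter
    "home_address": "12 Example St",  # not needed to send a newsletter
}

# Fields each declared processing purpose actually requires (illustrative).
PURPOSE_FIELDS = {"newsletter": {"email"}}

def minimise(record: dict, purpose: str) -> dict:
    """Keep only the fields needed for the given processing purpose."""
    allowed = PURPOSE_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}

stored = minimise(FULL_PROFILE, "newsletter")
assert stored == {"email": "alice@example.com"}
```

The point of the sketch is that the privacy question ("should this field be collected at all?") is answered before any security control is applied.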
Based on these findings, the authors propose seven actionable guidelines aimed at both the technical and managerial layers of software development:

1. Targeted privacy education and training that bridges legal requirements with concrete engineering practices.
2. A privacy‑design checklist that forces developers to ask, at each stage, who the data subject is, what data is collected, why it is needed, and how consent is obtained and recorded.
3. A catalogue of privacy design patterns (e.g., data pseudonymisation, on‑device processing, consent‑driven APIs) accompanied by reusable code snippets.
4. Integration of automated privacy analysis tools into CI pipelines, such as static data‑flow analysers that flag undisclosed personal identifiers and verify that encryption is applied consistently.
5. Appointment of a privacy champion or officer for each project to own requirement clarification and conflict resolution.
6. Mapping guides that align major regulations (GDPR, CCPA, etc.) with specific technical controls, helping developers translate legal clauses into concrete implementation steps.
7. Continuous privacy testing, including periodic Privacy Impact Assessments (PIAs) and simulated breach drills, to catch regressions early.
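As an example of the kind of reusable snippet a pattern catalogue might contain, here is a minimal, hypothetical sketch of keyed pseudonymisation in Python. The function name and key handling are illustrative assumptions, not artifacts from the paper; it uses the standard-library `hmac` and `hashlib` modules.

```python
import hmac
import hashlib

def pseudonymise(identifier: str, secret_key: bytes) -> str:
    # Replace a direct identifier (e.g. an email address) with a stable,
    # keyed pseudonym so records can be linked internally without storing
    # the raw identifier. HMAC-SHA256 is deterministic for a given key,
    # but the mapping cannot be recomputed or reversed without that key.
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

key = b"example-key-kept-in-a-secrets-manager"  # illustrative only

p1 = pseudonymise("alice@example.com", key)
p2 = pseudonymise("alice@example.com", key)
p3 = pseudonymise("bob@example.com", key)

assert p1 == p2  # stable: same user maps to the same pseudonym
assert p1 != p3  # distinct users remain distinguishable
```

A deterministic keyed pseudonym (unlike a plain hash) resists dictionary attacks as long as the key stays outside the data store, which is why the key here is flagged as belonging in a secrets manager.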
The paper argues that these measures can shift privacy from a “post‑hoc checklist” to an integral part of the design mindset. By providing clear, actionable artefacts—checklists, pattern libraries, and automated validation steps—developers can reduce cognitive load, make more informed trade‑offs, and avoid the costly re‑engineering that typically occurs when privacy is addressed only after a product is built. Moreover, institutional changes such as assigning a privacy owner and embedding privacy metrics into project governance are essential to overcome the organisational inertia that currently pushes privacy concerns to the back‑burner.
In sum, the study delivers a rigorous, evidence‑based diagnosis of why developers struggle to embed privacy, and it offers a concrete, multi‑level roadmap for researchers, tool vendors, and software organisations seeking to operationalise Privacy‑by‑Design in real‑world development pipelines.