Analyzing developer discussions on EU and US privacy legislation compliance in GitHub repositories


Context: Privacy legislation has impacted the way software systems are developed, prompting practitioners to update their implementations. Specifically, the EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) have forced the community to focus on users’ data privacy. Despite the vast amount of data on developer issues available in GitHub repositories, there is a lack of empirical evidence on the issues developers of Open Source Software discuss to comply with privacy legislation.

Method: In this work, we examine such discussions by mining and analyzing 32,820 issues from GitHub repositories. We partially analyzed the dataset automatically to identify law user rights and principles indicated, and manually analyzed a sample of 1,186 issues based on the type of concern addressed.

Results: We devised 24 discussion categories placed in six clusters: features/bugs, consent-related, documentation, data storing/sharing, adaptability, and general compliance. Our results show that developers mainly focus on specific user rights from the legislation (right to erasure, right to opt-out, right to access), addressing other rights less frequently, while most discussions concern user consent, user rights functionality, bugs and cookies management.

Conclusion: The created taxonomy can help practitioners understand which issues are discussed for law compliance, so that they ensure they address them first in their systems. In addition, the educational community can reshape curricula to better educate future engineers on the privacy law concerns raised, and the research community can identify gaps and areas for improvement to support and accelerate data privacy law compliance.


💡 Research Summary

This paper, titled “Analyzing developer discussions on EU and US privacy legislation compliance in GitHub repositories,” presents a large-scale empirical study investigating how open-source software developers discuss and address compliance with major privacy regulations like the EU’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) within GitHub repositories.

The study was guided by three research questions (RQs): RQ1 explores the basic characteristics of law-relevant issues and compares them to non-law issues; RQ2 identifies which specific user rights and principles from privacy legislation are mentioned in discussions; and RQ3 uncovers the concrete concerns developers raise when dealing with compliance.

The methodology involved a multi-step, hybrid analytical process. The researchers first mined GitHub issues from April 2016 to June 2024 using keywords related to GDPR, CCPA, the California Privacy Rights Act (CPRA), and the Data Protection Act (DPA), resulting in an initial dataset of 32,820 issues from 13,227 repositories. This dataset was partially analyzed using automated techniques to filter for issues referencing specific legal user rights and principles. Subsequently, a representative sample of 1,186 issues underwent detailed manual thematic analysis. The researchers examined not only the issue descriptions and comments but also, when necessary, linked commits and source code to understand the full context of the discussions.
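
The keyword-based mining step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Issue` record, the exact keyword list, and the day-level boundaries of the April 2016 to June 2024 window are assumptions made for illustration, and the issues are assumed to have been fetched already.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumed keyword list: the summary names the laws but not the exact search terms.
KEYWORDS = [
    "GDPR", "General Data Protection Regulation",
    "CCPA", "California Consumer Privacy Act",
    "CPRA", "California Privacy Rights Act",
    "Data Protection Act",
]
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

# Assumed day-level bounds for the "April 2016 to June 2024" window.
WINDOW_START = date(2016, 4, 1)
WINDOW_END = date(2024, 6, 30)


@dataclass
class Issue:
    """Hypothetical record for an already-downloaded GitHub issue."""
    repo: str
    title: str
    body: str
    created: date


def is_law_relevant(issue: Issue) -> bool:
    """True if the issue falls in the window and mentions any keyword."""
    in_window = WINDOW_START <= issue.created <= WINDOW_END
    text = f"{issue.title}\n{issue.body}"
    return in_window and KEYWORD_RE.search(text) is not None


def filter_issues(issues: list[Issue]) -> list[Issue]:
    """Keep only the issues that look law-relevant."""
    return [i for i in issues if is_law_relevant(i)]
```

Plain keyword matching of this kind will pick up false positives (for example, an issue that only mentions a law in passing), which is one reason a manual analysis step, as in the study, remains useful.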

Key findings are as follows:

  • For RQ1, the analysis revealed that the creation of law-relevant issues spiked around the enforcement dates of major laws like GDPR and CCPA. Furthermore, these law-relevant issues triggered more comments and discussion compared to non-law-relevant issues, indicating their complexity and importance.
  • For RQ2, the study found a significant imbalance in developers’ focus. A limited set of user rights dominated the discussions: the Right to Erasure, the Right to Opt-out, and the Right to Access were the most frequently addressed. Other rights stipulated by the laws received considerably less attention in the issue threads.
  • For RQ3, which forms the core contribution, the manual analysis led to the creation of a comprehensive taxonomy of developer concerns. This taxonomy consists of 24 specific categories, organized into six higher-level clusters:
    1. Features/Bugs: Implementation of new privacy-related features and fixing bugs in existing compliance functionality.
    2. Consent-related: All aspects of obtaining, managing, withdrawing, and designing user interfaces for consent.
    3. Documentation: Updating privacy policies, terms of service, and other legal documentation.
    4. Data Storing/Sharing: Concerns about data minimization, encryption, secure deletion (shredding), and data sharing with third parties.
    5. Adaptability: Making systems configurable to adapt to different laws or regions (e.g., via feature flags, internationalization).
    6. General Compliance: Broader discussions about interpreting the law, checking compliance status, and seeking general advice.
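
As a rough illustration of the automated step behind RQ2 (identifying which user rights an issue mentions), the following sketch tallies mentions of the three most frequently discussed rights. The regular expressions and function names are hypothetical; the summary does not specify how the authors matched rights and principles.

```python
import re
from collections import Counter

# Hypothetical patterns for the three rights named in the findings.
RIGHT_PATTERNS = {
    "right to erasure": re.compile(
        r"right to (erasure|be forgotten)", re.IGNORECASE
    ),
    "right to opt-out": re.compile(r"opt[\s-]?out", re.IGNORECASE),
    "right to access": re.compile(r"right (to|of) access", re.IGNORECASE),
}


def rights_mentioned(text: str) -> set[str]:
    """Return the names of all rights whose pattern occurs in the text."""
    return {name for name, pat in RIGHT_PATTERNS.items() if pat.search(text)}


def tally_rights(texts: list[str]) -> Counter:
    """Count, per right, how many texts mention it at least once."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(rights_mentioned(text))
    return counts
```

Each issue contributes at most one count per right, so the tally reflects how many issues touch a right rather than how often the phrase is repeated within a single thread.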

The paper concludes that the developed taxonomy provides a valuable map of the practical landscape of privacy law compliance in OSS development. It can help practitioners prioritize issues, guide educators in reshaping curricula to include real-world compliance challenges, and offer researchers a foundation for building automated tools to support and accelerate the compliance process. The authors also acknowledge limitations, such as the inherent focus on discussed issues rather than a direct audit of code compliance and potential biases in the manual sampling process.

