Motivated by the challenges of implementing policy-based data access control (PBAC) under multiple simultaneously applicable compliance frameworks, we present Parajudica, an open, modular, and extensible RDF/SPARQL-based rule system for evaluating context-dependent data compliance status. We demonstrate the utility of this resource and accompanying metamodel through application to existing legal frameworks and industry standards, offering insights for comparative framework analysis. Applications include compliance policy enforcement, compliance monitoring, data discovery, and risk assessment.
Determining the permissibility of a data processing activity is an inherently complex task that often requires reasoning across multiple, overlapping policies and regulations. Such determinations are rarely dictated by the data alone; rather, they depend on contextual factors such as the roles of individuals involved, the purposes and methods of collection, jurisdictional constraints, the simultaneous availability of related data, the strength of organizational controls, assumptions about adversarial capabilities, and temporal considerations.
Determining the compliance status of the data itself is often an important first step, as access policies typically depend on these classifications [1,2,3]. This step is itself complicated: the same dataset containing patient information may be classified as Protected Health Information (PHI) under the U.S. Health Insurance Portability and Accountability Act (HIPAA) [4] when processed by a covered entity, but as special category personal data under the General Data Protection Regulation (GDPR) [5]. While data considered de-identified under either of HIPAA’s regulatory standards for de-identification (safe harbor or expert determination) statutorily falls outside of HIPAA’s scope, it may remain controlled as anything from anonymized data to special category personal data under the GDPR, depending on the exact details. In short, compliance status is context-dependent.
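To make the context dependence concrete, the following is a minimal Python sketch, not part of Parajudica itself. The `Context` fields, the label strings, and both rule functions are hypothetical simplifications introduced only for illustration; they do not encode the actual legal tests of HIPAA or the GDPR.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Context:
    """Hypothetical processing context; all fields are illustrative assumptions."""
    handled_by_covered_entity: bool
    subjects_in_eu: bool            # crude stand-in for GDPR applicability
    contains_health_data: bool
    deidentified_safe_harbor: bool


def classify_hipaa(ctx: Context) -> str:
    # Toy rule: de-identified data is treated as outside HIPAA's scope;
    # otherwise health data handled by a covered entity is labeled PHI.
    if ctx.deidentified_safe_harbor:
        return "out-of-scope"
    if ctx.handled_by_covered_entity and ctx.contains_health_data:
        return "PHI"
    return "out-of-scope"


def classify_gdpr(ctx: Context) -> str:
    # Toy rule: HIPAA-style de-identification alone does not change the
    # GDPR label here, since that status depends on further details.
    if not ctx.subjects_in_eu:
        return "out-of-scope"
    if ctx.contains_health_data:
        return "special-category-personal-data"
    return "personal-data"


def classify(ctx: Context) -> dict:
    # Each framework is evaluated independently; the labels are kept side by side.
    return {"HIPAA": classify_hipaa(ctx), "GDPR": classify_gdpr(ctx)}


base = Context(
    handled_by_covered_entity=True,
    subjects_in_eu=True,
    contains_health_data=True,
    deidentified_safe_harbor=False,
)
deid = replace(base, deidentified_safe_harbor=True)

print(classify(base))
print(classify(deid))
```

In this toy setting, changing a single contextual flag moves the dataset out of one framework's scope while leaving its label under the other unchanged.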
Structured data further complicates this task. A dataset containing only diagnosis, gender, and age information with a randomly assigned record identifier might be considered suitable for public release; yet if other datasets reuse the same identifier while adding additional demographic details, the released information may not be considered de-identified. The availability of identifiers, therefore, must be considered, as their presence or availability may drastically change the data compliance status.
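The identifier-reuse scenario above can be sketched as a reachability question: which attributes become available once datasets sharing an identifier are joined? The dataset names, column names, and the dictionary layout below are hypothetical and chosen only to mirror the example in the text.

```python
from collections import deque


def joinable_closure(datasets, start):
    """Return the names of all datasets reachable from `start` by
    repeatedly joining on any shared identifier column."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for name, ds in datasets.items():
            shared = datasets[current]["identifiers"] & ds["identifiers"]
            if name not in seen and shared:
                seen.add(name)
                queue.append(name)
    return seen


def reachable_attributes(datasets, start):
    """Union of attributes across every dataset joinable with `start`."""
    attrs = set()
    for name in joinable_closure(datasets, start):
        attrs |= datasets[name]["attributes"]
    return attrs


# Hypothetical catalog: "release" and "registry" reuse the same record identifier.
datasets = {
    "release": {"identifiers": {"record_id"},
                "attributes": {"diagnosis", "gender", "age"}},
    "registry": {"identifiers": {"record_id"},
                 "attributes": {"zip_code", "birth_date"}},
    "billing": {"identifiers": {"account_no"},
                "attributes": {"invoice_total"}},
}

print(sorted(joinable_closure(datasets, "release")))
print(sorted(reachable_attributes(datasets, "release")))
```

Under these assumptions, the released table is only as de-identified as the union of attributes reachable through its identifier, which is why identifier availability has to enter the assessment.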
Further complicating matters, an organization’s internal compliance rules often depend on classifications made under other frameworks. For example, a policy might state that data classified as PHI under HIPAA requires encryption and audit logging, that personal data under GDPR is prohibited from cross-border transfer, or that certain combinations of domain-specific fields may be identifiable in other contexts. Such rules create interdependencies among frameworks, whereby the resulting compliance status goes beyond the disjoint union of framework determinations.
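As a rough illustration of such interdependence, the sketch below derives internal obligations from labels assigned by other frameworks. The function name, label strings, and obligation names are hypothetical; the two rules simply restate the example policy from the paragraph above.

```python
def derive_obligations(labels):
    """Map per-framework labels (framework -> label) to a set of
    hypothetical internal obligations."""
    obligations = set()
    # Internal rule keyed on a HIPAA classification.
    if labels.get("HIPAA") == "PHI":
        obligations |= {"encrypt", "audit-log"}
    # Internal rule keyed on a GDPR classification.
    if labels.get("GDPR") in {"personal-data", "special-category-personal-data"}:
        obligations.add("no-cross-border-transfer")
    return obligations


both = derive_obligations({"HIPAA": "PHI", "GDPR": "personal-data"})
hipaa_only = derive_obligations({"HIPAA": "PHI"})

print(sorted(both))
print(sorted(hipaa_only))
```

The internal framework's output cannot be computed until the other frameworks have produced their labels, which is the dependency the text describes.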
Terminology. Before presenting our approach, we define three central terms used throughout this paper. We use compliance framework to mean organizational rules for interpreting regulatory standards (e.g., GDPR, HIPAA) and classifying data. A governance scope represents organizational boundaries within which compliance is evaluated. The term metamodel reflects that our approach provides structures and semantics from which specific compliance scenarios are instantiated.
Approach. To address these challenges, we formulate compliance assessment of structured data as a computational problem: propagating semantic annotations through data structures and across relationships. Data containers (databases, tables, columns) are organized hierarchically, and compliance labels propagate according to framework-specific rules: inward to contained elements, outward to containers, among peers, or across joinable relationships. Parajudica is implemented in RDF, adding to a rich tradition of established use in adjacent areas of privacy vocabularies (DPV [6]), policy languages (ODRL [7,8]), and data governance initiatives [9].
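The propagation idea can be sketched in a few lines of Python. This is a simplified stand-in rather than the RDF/SPARQL rule system itself: it covers only the inward and outward directions (peer and join-based propagation are omitted), and the container names, the `ORG` framework, and the rule table are hypothetical.

```python
def propagate(parent, labels, rules):
    """Propagate (framework, label) tags through a container hierarchy
    until no new tag can be added (a fixpoint).

    parent: dict mapping each child container to its parent container
    labels: dict mapping a container to its initial set of tags
    rules:  dict mapping a tag to a set of directions, "inward" and/or "outward"
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    result = {node: set(tags) for node, tags in labels.items()}
    changed = True
    while changed:
        changed = False
        for node in list(result):
            for tag in list(result[node]):
                directions = rules.get(tag, set())
                targets = []
                if "inward" in directions:
                    targets.extend(children.get(node, []))
                if "outward" in directions and node in parent:
                    targets.append(parent[node])
                for target in targets:
                    if tag not in result.setdefault(target, set()):
                        result[target].add(tag)
                        changed = True
    return result


# Hypothetical hierarchy: database -> table -> columns.
parent = {
    "patients": "clinic_db",
    "patients.name": "patients",
    "patients.diagnosis": "patients",
}
labels = {
    "patients.diagnosis": {("HIPAA", "PHI")},
    "clinic_db": {("ORG", "restricted")},
}
rules = {
    ("HIPAA", "PHI"): {"outward"},      # a column label taints its containers
    ("ORG", "restricted"): {"inward"},  # a database label applies to its contents
}

result = propagate(parent, labels, rules)
for node in sorted(result):
    print(node, sorted(result[node]))
```

Keeping the direction in a per-tag rule table, rather than hard-coding it, reflects the point that propagation behavior is framework-specific.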
Our approach is complementary to normative reasoning systems that resolve conflicts by imposing superiority relations on rules. In defeasible deontic logic (DDL), for example, the goal is to determine whether a normative statement, such as “it is obligatory to encrypt patient data”, holds after resolving conflicts and exceptions within a single normative system. Our problem is different. We observe that multiple internally coherent compliance interpretations (e.g., of HIPAA and GDPR) may each classify the same dataset differently. These interpretations coexist without a shared authority to adjudicate between them.
Consider two authorities with incompatible labeling schemes. If authority A classifies a field as $L_1$ and authority B classifies the same field as $L_2$, there is no legitimate meta-rule determining which classification prevails. Any resolution ($L_1$, $L_2$, both, or neither) would be rejected by at least one authority. Defeasible reasoning assumes such a hierarchy of defeat can be defined; multi-framework compliance does not provide one. Our metamodel instead retains both classifications ($L_1$ under A, $L_2$ under B) as parallel outputs rather than treating them as a logical inconsistency to be resolved. This is practically useful: a company operating under multiple jurisdictions may need to evaluate whether a proposed de-identification strategy satisfies eac