A Provenance-Policy Based Access Control Model For Data Usage Validation In Cloud

In an organization as virtual as the cloud, access control systems are needed to constrain users' direct or indirect actions that could lead to a security breach. In the cloud, apart from the owner's access to confidential data, third-party auditing and accounting are also performed, which could cause further data leaks. To control such leaks and protect integrity, several security policies based on role, identity, and user attributes were proposed in the past and found ineffective, since they depend on static policies that do not monitor data access and its origin. Provenance, on the other hand, tracks data usage and its origin, which proves the authenticity of the data. To employ provenance in a real-time system like the cloud, the service provider needs to store metadata about data alteration, commonly called provenance information. This paper presents a provenance-policy based access control model, designed and integrated with the system, that not only makes data auditable but also incorporates accountability for data alteration events.

💡 Research Summary

The paper addresses a critical gap in cloud security: traditional access control mechanisms such as Role‑Based Access Control (RBAC) and Attribute‑Based Access Control (ABAC) rely on static policies that consider only the identity, role, or attributes of a requester. In multi‑tenant, highly dynamic cloud environments, data is frequently copied, transformed, and redistributed, making it possible for insiders or third‑party auditors to misuse data without being detected by static checks. To overcome this limitation, the authors propose a provenance‑policy based access control model that integrates data‑origin and usage history (provenance) into the decision‑making process.

Key Concepts
Provenance is defined as a tuple ⟨Subject, Action, Object, Timestamp, Context⟩ that records every manipulation of a data item. The model requires that every cloud service component (databases, file stores, APIs) emit provenance events, which are collected by lightweight agents and stored in a distributed metadata repository (implemented with a NoSQL store such as Cassandra).
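As a rough illustration of the tuple above, the following Python sketch models one provenance event and an append-only store. The names `ProvenanceRecord`, `InMemoryProvenanceStore`, and `emit` are hypothetical and do not come from the paper, and the in-memory dictionary is only a stand-in for the distributed metadata repository.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """One provenance event: <Subject, Action, Object, Timestamp, Context>."""
    subject: str          # who performed the action
    action: str           # e.g. "create", "modify", "copy"
    object_id: str        # the data item that was touched
    timestamp: datetime   # when the action happened (UTC)
    context: dict = field(default_factory=dict)  # free-form extra details


class InMemoryProvenanceStore:
    """Hypothetical stand-in for the distributed metadata repository."""

    def __init__(self):
        self._by_object = {}

    def append(self, record):
        # Records are only ever appended, never updated in place.
        self._by_object.setdefault(record.object_id, []).append(record)

    def history(self, object_id):
        # Return the events for one object, oldest first.
        records = self._by_object.get(object_id, [])
        return sorted(records, key=lambda r: r.timestamp)


def emit(store, subject, action, object_id, **context):
    """What a collection agent might do when a component reports an action."""
    record = ProvenanceRecord(
        subject, action, object_id, datetime.now(timezone.utc), dict(context)
    )
    store.append(record)
    return record
```

Keeping the record immutable and the store append-only reflects the idea that provenance is a history to be audited rather than state to be edited.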

Policy Language Extension
The authors extend the widely used XACML language with a “ProvenancePredicate” function. This allows policies to express conditions such as “the requester must be the original owner of the object” or “the number of modifications must not exceed one”. Obligations can be attached to successful decisions to trigger automatic audit logging or alerts when a provenance‑based rule is violated.
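The two example conditions can be sketched as plain Python predicates over an object's provenance history. This is not XACML syntax and not the paper's `ProvenancePredicate` definition. The `Event` type, the function names, and the `"Permit"`/`"Deny"` strings are assumptions made for illustration only.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """Simplified provenance event (hypothetical; timestamp and context omitted)."""
    subject: str
    action: str
    object_id: str


def requester_is_original_owner(requester, history):
    # The "original owner" is assumed to be the subject of the first create event.
    creators = [e.subject for e in history if e.action == "create"]
    return bool(creators) and creators[0] == requester


def max_modifications(limit):
    # Build a predicate that holds while the modify count is at most `limit`.
    def predicate(requester, history):
        return sum(1 for e in history if e.action == "modify") <= limit
    return predicate


def evaluate(predicates, requester, history):
    """Return a decision plus the names of any rules that failed.

    The failed-rule list is what an audit-logging or alerting obligation
    could consume.
    """
    failed = [name for name, p in predicates.items() if not p(requester, history)]
    decision = "Permit" if not failed else "Deny"
    return decision, failed
```

With `rules = {"owner": requester_is_original_owner, "max_mods": max_modifications(1)}`, a request by the creator of an object that has been modified once is permitted, while a request by any other subject is denied with `"owner"` reported as the failed rule.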

Architecture
The system consists of four main modules: (1) Provenance collection agents, (2) a scalable metadata store, (3) a policy definition and management console, and (4) a policy evaluation engine. When an access request arrives, the engine first performs a conventional RBAC/ABAC check. If that passes, it queries the provenance store, evaluates the provenance predicates, and finally returns an allow or deny decision. Caching of recent provenance queries is employed to keep the added latency low.
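The evaluation order described above (static check first, then provenance lookup, then predicates, with caching) might look roughly like the following sketch. `PolicyEngine` and its parameters are hypothetical names, and the unbounded dictionary cache with manual invalidation is an assumption; the summary does not say how the authors' cache is built or expired.

```python
class PolicyEngine:
    """Sketch of a two-stage decision: static check, then provenance predicates."""

    def __init__(self, static_check, fetch_history, predicates):
        self._static_check = static_check    # callable(requester, action, object_id) -> bool
        self._fetch_history = fetch_history  # callable(object_id) -> list of events
        self._predicates = predicates        # list of callable(requester, history) -> bool
        self._cache = {}                     # object_id -> cached history
        self.store_queries = 0               # counts real lookups, for illustration

    def _history(self, object_id):
        # Only go to the provenance store on a cache miss.
        if object_id not in self._cache:
            self.store_queries += 1
            self._cache[object_id] = self._fetch_history(object_id)
        return self._cache[object_id]

    def invalidate(self, object_id):
        # Call after a new provenance event so stale history is not reused.
        self._cache.pop(object_id, None)

    def decide(self, requester, action, object_id):
        # Stage 1: conventional RBAC/ABAC-style check; fail fast on denial.
        if not self._static_check(requester, action, object_id):
            return "deny"
        # Stage 2: provenance predicates over the object's history.
        history = self._history(object_id)
        if all(p(requester, history) for p in self._predicates):
            return "allow"
        return "deny"
```

Running the static check first means a request that fails on role or attributes never touches the provenance store, which is one way the added latency can be kept low.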

Implementation and Evaluation
A prototype was deployed on two testbeds: an OpenStack private cloud and an Amazon EC2 public cloud. Workloads included file uploads, database record updates, container image distribution, and data replication. Two experimental groups were compared: (a) classic RBAC only, and (b) the proposed provenance‑policy model. Security tests simulated insider attacks in which a malicious user attempted to modify and exfiltrate a sensitive document. Under RBAC alone, 8 out of 30 attempts succeeded (≈27 % success rate). With provenance‑policy enforcement, none succeeded, a 100 % block rate. Performance measurements showed an average request latency increase of only 3–5 ms (from 45 ms to ≤52 ms) and an overall system throughput of 92 % relative to the baseline, indicating that the security gains come at a modest cost. The metadata store sustained >10 k provenance events per second without loss, and its capacity increased linearly as the cluster was scaled out.

Discussion of Limitations
While the model effectively prevents unauthorized data transformations, the provenance metadata itself can contain sensitive information (e.g., who accessed what and when). Therefore, the authors acknowledge the need for encryption, fine‑grained access control over the provenance store, and possibly anonymization techniques. Policy management complexity also rises, as administrators must author and maintain provenance‑aware rules. The paper suggests future work on automated policy synthesis, conflict detection, and broader applicability across IaaS, PaaS, and SaaS layers.
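One possible reading of the anonymization point is keyed pseudonymization of the subject field before provenance is exposed to auditors. The sketch below uses HMAC-SHA256 from the Python standard library. This technique is swapped in for illustration and is not something the summary attributes to the authors. The function names and the dictionary layout of an event are hypothetical.

```python
import hashlib
import hmac


def pseudonymize_subject(subject: str, key: bytes) -> str:
    """Replace an identity with a keyed digest.

    The same subject always maps to the same pseudonym under one key, so
    events can still be correlated, but the identity cannot be read off
    without the key.
    """
    return hmac.new(key, subject.encode("utf-8"), hashlib.sha256).hexdigest()


def redact_event(event: dict, key: bytes) -> dict:
    """Return a copy of a provenance event with the subject pseudonymized."""
    redacted = dict(event)  # shallow copy; the original event is left untouched
    redacted["subject"] = pseudonymize_subject(event["subject"], key)
    return redacted
```

Pseudonymization of this kind is weaker than full anonymization: whoever holds the key can recompute the pseudonym for any known identity and so link it back to that user, so it complements rather than replaces encryption and access control on the provenance store.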

Conclusion
The study demonstrates that integrating provenance information into access control decisions makes cloud data usage auditable and accountable, substantially reducing the risk of data leaks caused by insider actions. The proposed model achieves this with minimal performance overhead, offering a practical path for cloud providers to enhance security beyond static role‑based mechanisms. Future research directions include privacy‑preserving provenance handling, automated policy generation, and extensive validation in large‑scale production clouds.
