PASTA: A Scalable Framework for Multi-Policy AI Compliance Evaluation

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

AI compliance is becoming increasingly critical as AI systems grow more powerful and pervasive. Yet the rapid expansion of AI policies creates substantial burdens for resource-constrained practitioners lacking policy expertise. Existing approaches typically address one policy at a time, making multi-policy compliance costly. We present PASTA, a scalable compliance tool integrating four innovations: (1) a comprehensive model-card format supporting descriptive inputs across development stages; (2) a policy normalization scheme; (3) an efficient LLM-powered pairwise evaluation engine with cost-saving strategies; and (4) an interface delivering interpretable evaluations via compliance heatmaps and actionable recommendations. Expert evaluation shows PASTA's judgments align closely with those of human experts (Spearman ρ ≥ 0.626). The system evaluates five major policies in under two minutes at a cost of roughly $3. A user study (N = 12) confirms practitioners found the outputs easy to understand and actionable. Together, these components form a novel framework for scalable, automated AI governance.


💡 Research Summary

PASTA (Policy Aggregator & Scanner for Trustworthy AI) addresses the growing challenge of evaluating AI systems against multiple, rapidly evolving regulations across jurisdictions. Existing tools either focus on a single policy or require extensive legal expertise and costly, time‑consuming processes, making them unsuitable for resource‑constrained practitioners. PASTA introduces a four‑component framework that enables scalable, cost‑effective, and interpretable multi‑policy compliance assessment.

  1. Unified Model‑Card Input – The authors extend the conventional model‑card concept with eight structured sections (system purpose, data sources, training methodology, deployment context, risk assessment, stakeholders, ethical considerations, and legal references). This schema balances comprehensiveness with ease of completion, allowing typical users to fill it out in roughly 28 minutes.
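As a rough illustration, the eight sections could be captured in a simple structured schema. This is a minimal sketch: the field names and the `to_prompt` serialization are our assumptions, not the paper's exact format.

```python
from dataclasses import dataclass, fields

@dataclass
class ModelCard:
    """Hypothetical schema mirroring PASTA's eight model-card sections."""
    system_purpose: str
    data_sources: str
    training_methodology: str
    deployment_context: str
    risk_assessment: str
    stakeholders: str
    ethical_considerations: str
    legal_references: str

    def to_prompt(self) -> str:
        # Serialize the card into one flat text block that an
        # LLM evaluation prompt could embed directly.
        return "\n".join(
            f"## {f.name.replace('_', ' ').title()}\n{getattr(self, f.name)}"
            for f in fields(self)
        )
```

A fixed schema like this is what makes the later pairwise evaluation tractable: every card exposes the same sections, so the same prompt template works for every system being assessed.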

  2. Policy Normalization Pipeline – Over 930 AI‑related legislative initiatives from 70 jurisdictions are harvested, parsed into article‑level units, and then clustered by semantic similarity. Legal terminology is mapped to a unified vocabulary (e.g., “data minimisation,” “high‑risk AI”). The output is a tabular representation where each row corresponds to a concise, 300‑400 token “policy chunk.” This uniform format lets large language models (LLMs) process heterogeneous regulations consistently.
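The chunking step above can be sketched in a few lines. Here whitespace-delimited words stand in for LLM tokens, and the 400-word cap approximates the paper's 300-400 token budget; both simplifications are ours.

```python
def chunk_policy(text: str, max_tokens: int = 400) -> list[str]:
    """Split an article-level policy text into consecutive chunks of at
    most max_tokens whitespace-delimited words (a rough stand-in for the
    300-400 LLM-token policy chunks described in the paper)."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

In practice a real tokenizer would be used instead of `str.split`, and chunk boundaries would respect article and paragraph breaks so that a clause is never split mid-sentence.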

  3. Cost‑Saving LLM Evaluation Engine – Two complementary strategies reduce token usage and monetary cost. Policy Chunking breaks long statutes into manageable pieces, while Irrelevancy Mapping filters out chunks whose semantic similarity to the model‑card falls below a cosine‑similarity threshold (≈0.25). Consequently, only about 35 % of the original chunks trigger LLM calls, cutting both latency and expense. The evaluation uses a GPT‑4‑style prompt that returns two scores per chunk: a violation likelihood (0‑5) and a relevance rating (0‑5).
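The irrelevancy-mapping filter can be illustrated as follows. A bag-of-words cosine similarity stands in for the paper's embedding model, but the thresholding logic (drop chunks below ≈0.25 so they never reach the LLM) is the same idea.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def relevant_chunks(card_text: str,
                    chunks: list[str],
                    threshold: float = 0.25) -> list[str]:
    """Keep only policy chunks whose similarity to the model card meets
    the threshold; only these would trigger a (paid) LLM call."""
    card_vec = Counter(card_text.lower().split())
    return [
        c for c in chunks
        if cosine(card_vec, Counter(c.lower().split())) >= threshold
    ]
```

Filtering before calling the LLM is what yields the reported savings: if only ~35 % of chunks pass the threshold, token spend and latency drop roughly in proportion.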

  4. Interpretable Reporting Interface – Results are visualised as a heatmap where colour intensity indicates potential non‑compliance for each policy and clause. Clicking a cell reveals a concise summary, the rationale for the score, and concrete remediation suggestions. An automatically generated priority checklist further assists users in addressing the most critical gaps.
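The priority checklist could be derived from the per-chunk scores like this. The weighting (violation likelihood times relevance, both on the 0-5 scales described above) is our assumption; the paper does not specify its ranking formula.

```python
def priority_checklist(results: list[dict]) -> list[dict]:
    """Order evaluated policy chunks so the most urgent compliance gaps
    come first, weighting violation likelihood by relevance.
    Each result dict carries 'violation' and 'relevance' scores (0-5)."""
    return sorted(
        results,
        key=lambda r: r["violation"] * r["relevance"],
        reverse=True,
    )
```

A multiplicative weighting ensures that a likely violation of an irrelevant clause does not outrank a moderate violation of a highly relevant one.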

Technical Evaluation – The authors conducted two studies. In the expert alignment experiment, five major policies (EU AI Act, AIDA, GDPR, CCPA, Colorado AI Act) were evaluated by eight domain experts. PASTA’s automated scores correlated with expert labels with Spearman ρ = 0.6264 for violation scores and ρ = 0.7611 for relevance scores. Mean absolute error was under 0.42 points, and 87 %–94 % of predictions fell within one point of the expert rating.
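The alignment metrics above can be reproduced from raw score pairs in plain Python (a minimal sketch; the authors presumably used a standard statistics package such as SciPy):

```python
def rank(xs: list[float]) -> list[float]:
    """Assign average ranks (ties share the mean of their positions)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a: list[float], b: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = rank(a), rank(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

def within_one(a: list[float], b: list[float]) -> float:
    """Fraction of predictions within one point of the expert rating."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)
```

Applied to the paired automated and expert scores per chunk, these functions yield the reported ρ values and the 87 %–94 % within-one-point agreement.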

User Study – Twelve AI practitioners (developers, project managers, UI designers) completed the model‑card and reviewed the generated compliance report. Average model‑card completion time was 28.4 minutes (SD = 6.7). Report interpretation took 6.8 minutes (SD = 1.9). Participants rated report readability 4.73/5 and expressed high confidence in identifying policy‑specific risks (4.73/5). Qualitative feedback highlighted the value of early‑stage risk awareness and the actionable nature of the checklist.

Limitations – The normalization step may lose nuanced legal context, especially for conditional or exception clauses, potentially affecting accuracy. The system currently relies on a proprietary GPT‑4‑style model; cost and performance could vary with model updates or alternative LLMs. Evaluation covered only five policies, leaving domain‑specific regulations (e.g., healthcare, finance) untested. The user study’s small sample size limits generalisability across organizations of different sizes and sectors.

Significance and Future Work – PASTA demonstrates that a well‑engineered combination of policy normalization and efficient LLM prompting can deliver multi‑policy compliance checks at a cost of roughly $3 and a runtime under two minutes. This makes systematic compliance feasible for small teams and individual developers who lack legal expertise. Future directions include automating the normalization pipeline with semantic graphs, extending the framework to cover specialized regulatory domains, experimenting with open‑source LLMs to further reduce cost, and conducting large‑scale field deployments to validate scalability and impact.

In sum, PASTA offers a novel, practical solution that bridges the gap between the explosion of AI regulations and the limited resources of many AI practitioners, moving the field toward more responsible and trustworthy AI deployment.

