Taint Analysis for Graph APIs Focusing on Broken Access Control
We present the first systematic approach to static and dynamic taint analysis for Graph APIs focusing on broken access control. The approach comprises the following. We taint nodes of the Graph API if they represent data requiring specific privileges in order to be retrieved or manipulated, and identify API calls which are related to sources and sinks. Then, we statically analyze whether a tainted information flow between API source and sink calls occurs. To this end, we model the API calls using graph transformation rules. We subsequently use Critical Pair Analysis to automatically analyze potential dependencies between rules representing source calls and rules representing sink calls. We distinguish direct from indirect tainted information flow and argue under which conditions the Critical Pair Analysis is able to detect not only direct, but also indirect tainted flow. The static taint analysis (i) identifies flows that need to be further reviewed, since tainted nodes may be created by an API call and used or manipulated by another API call later without having the necessary privileges, and (ii) can be used to systematically design dynamic security tests for broken access control. The dynamic taint analysis checks if potential broken access control risks detected during the static taint analysis really occur. We apply the approach to a part of the GitHub GraphQL API. The application illustrates that our analysis supports the detection of two types of broken access control systematically: the case where users of the API may not be able to access or manipulate information, although they should be able to do so; and the case where users (or attackers) of the API may be able to access/manipulate information that they should not.
💡 Research Summary
The paper introduces a comprehensive framework for detecting broken access control (BAC) vulnerabilities in Graph APIs, with a focus on GraphQL‑based services. The authors combine static taint analysis, graph transformation modeling, Critical Pair Analysis (CPA), and dynamic validation to provide a sound and systematic approach.
First, the data model treats objects as nodes and relationships as edges. Nodes that represent data requiring specific privileges are marked with a taint label. API operations are expressed as graph transformation rules: each rule consists of a left‑hand side (LHS) pattern, a right‑hand side (RHS) pattern, and a preserved kernel K. The LHS identifies the subgraph that must exist for the operation to apply; the RHS describes the resulting graph after the operation. By classifying rules that create tainted nodes as “sources” and those that consume or modify tainted nodes as “sinks,” the authors obtain a formal representation of information flow within the API.
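The rule structure described above can be illustrated with a minimal sketch. It is a deliberate simplification and not the paper's formalization: graphs are reduced to sets of node type names, the kernel K is taken as the intersection of LHS and RHS, and the `Rule` class, the `TAINTED_TYPES` set, and the example rules are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: node types that require specific privileges (hypothetical choice).
TAINTED_TYPES = frozenset({"Issue"})


@dataclass(frozen=True)
class Rule:
    """Simplified transformation rule over sets of node types (no edges, no morphisms)."""
    name: str
    lhs: frozenset  # node types that must exist before the API call
    rhs: frozenset  # node types present after the API call

    @property
    def kernel(self):
        # Preserved part K: everything that appears on both sides.
        return self.lhs & self.rhs

    @property
    def created(self):
        return self.rhs - self.lhs

    @property
    def deleted(self):
        return self.lhs - self.rhs


def is_source(rule, tainted=TAINTED_TYPES):
    """A source rule creates at least one tainted node type."""
    return bool(rule.created & tainted)


def is_sink(rule, tainted=TAINTED_TYPES):
    """A sink rule requires (uses, modifies, or deletes) a tainted node type."""
    return bool(rule.lhs & tainted)


# Hypothetical rules for illustration only.
create_issue = Rule("createIssue",
                    frozenset({"User", "Repository"}),
                    frozenset({"User", "Repository", "Issue"}))
update_issue = Rule("updateIssue",
                    frozenset({"User", "Issue"}),
                    frozenset({"User", "Issue"}))
```

In this sketch `createIssue` is classified as a source because it creates an `Issue` node, while `updateIssue` is a sink because its LHS requires one.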
The static analysis leverages CPA, a technique from graph transformation theory that automatically discovers conflicts and dependencies between pairs of rules. A conflict occurs when the application of one rule disables another; a dependency occurs when one rule’s application enables another. The paper proves (Theorem 3.13) that CPA can detect not only direct taint flows (a source rule immediately followed by a sink rule) but also a class of indirect flows, provided the intermediate dependency chain consists solely of write rules that preserve the taint label. This gives the static analysis a formal guarantee of soundness for a useful subset of indirect flows.
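The idea of direct and indirect dependencies can be sketched at the level of node types. This is only an approximation of what CPA does: real critical pair analysis computes minimal overlapping graphs between rules, whereas the sketch below merely checks whether one rule creates a node type that another requires. It also does not check the taint-preservation condition on intermediate rules. The `RULES` table and both function names are hypothetical.

```python
from itertools import permutations

# Hypothetical rules: name -> (required node types, created node types).
RULES = {
    "createIssue": ({"User", "Repository"}, {"Issue"}),
    "addComment": ({"User", "Issue"}, {"Comment"}),
    "updateComment": ({"User", "Comment"}, set()),
}


def produce_use_dependencies(rules):
    """Ordered pairs (first, second) where `first` creates a type that `second` requires."""
    deps = set()
    for first, second in permutations(rules, 2):
        _, created = rules[first]
        required, _ = rules[second]
        if created & required:
            deps.add((first, second))
    return deps


def reachable(deps, start):
    """All rules reachable from `start` through a chain of dependencies."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for first, second in deps:
            if first == current and second not in seen:
                seen.add(second)
                stack.append(second)
    return seen


deps = produce_use_dependencies(RULES)
indirect = reachable(deps, "createIssue")
```

Here `createIssue` to `addComment` is a direct dependency, while `createIssue` to `updateComment` only appears through the chain, which corresponds to the indirect case.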
To address the imprecision of static analysis, the authors introduce a dynamic phase. The static phase outputs candidate source‑sink pairs; the dynamic phase derives concrete GraphQL queries and mutations that exercise these pairs against a live server. By observing whether the taint propagates at runtime, the dynamic analysis validates (or refutes) the potential BAC scenarios identified statically. This two‑step process filters out false positives among the static candidates. It cannot recover flows that the static phase missed, so false negatives remain possible.
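One way to picture the hand-off from static candidates to dynamic checks is the sketch below. It does not contact any server and is not the authors' tooling: the `AccessCheck` class, the policy table, the role names, and the verdict labels are assumptions made for illustration. The observed outcome would come from actually executing the calls, which is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessCheck:
    """One dynamic check derived from a static source-sink candidate (hypothetical)."""
    source_call: str
    sink_call: str
    role: str
    expected_allowed: bool  # what the access control policy prescribes


def build_checks(pairs, policy):
    """Cross candidate pairs with roles; `policy` maps (sink_call, role) -> bool.

    Assumption: a missing policy entry means the call should be denied.
    """
    roles = sorted({role for (_, role) in policy})
    return [
        AccessCheck(src, snk, role, policy.get((snk, role), False))
        for src, snk in pairs
        for role in roles
    ]


def verdict(check, observed_allowed):
    """Compare the policy's expectation with the behavior observed at runtime."""
    if check.expected_allowed and not observed_allowed:
        return "too restrictive"
    if not check.expected_allowed and observed_allowed:
        return "too permissive"
    return "conforms"


# Hypothetical candidate pair and policy.
pairs = [("createIssue", "updateIssue")]
policy = {("updateIssue", "owner"): True, ("updateIssue", "anonymous"): False}
checks = build_checks(pairs, policy)
```

The two non-conforming verdicts mirror the two kinds of broken access control named in the abstract: users who are denied although they should have access, and users who gain access they should not have.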
A novel contribution is the notion of “role coverage.” In role‑based access control (RBAC), policies define which roles may perform which operations. The authors formalize a metric that measures whether the generated test suite exercises all roles defined in the policy, thereby providing a quantitative assessment of test completeness.
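The summary does not give the exact definition of the metric. A plausible reading, stated here as an assumption, is the fraction of policy roles that appear in at least one test. The function name and the role names below are hypothetical.

```python
def role_coverage(policy_roles, tested_roles):
    """Fraction of policy roles exercised by at least one test (assumed definition)."""
    policy_roles = set(policy_roles)
    if not policy_roles:
        # Assumption: an empty policy is trivially covered.
        return 1.0
    covered = policy_roles & set(tested_roles)
    return len(covered) / len(policy_roles)


# Hypothetical roles: two of the three policy roles appear in the test suite.
coverage = role_coverage({"owner", "collaborator", "anonymous"},
                         ["owner", "anonymous"])
```

Under this reading, a value of 1.0 means every role in the policy is exercised by the suite. Roles used in tests but absent from the policy do not raise the score.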
The methodology is applied to a subset of the GitHub GraphQL API. The schema includes types such as User, Repository, Issue, and Project, together with queries and mutations for creating, updating, and deleting these entities. The static analysis identifies twelve potential source‑sink pairs; seven are flagged as high‑risk based on the policy. Dynamic testing confirms two real‑world BAC issues: (1) an over‑restrictive case where a user lacking ownership cannot update a repository, revealing a policy that is too strict, and (2) an under‑restrictive case where a user can read a private issue field without proper permission, exposing a policy gap. Both cases correspond to known GitHub security discussions (issues 110618, 106598, 85661).
Key strengths of the work include:
- Formal modeling of Graph API operations via double‑pushout graph transformation, enabling precise reasoning about data creation and consumption.
- Automatic detection of rule dependencies and conflicts through CPA, providing a sound basis for static taint flow detection.
- Integration of static and dynamic analyses to achieve both scalability (static) and precision (dynamic).
- Introduction of role coverage as a metric for test completeness in RBAC contexts.
Limitations are acknowledged: the static analysis may miss flows that involve complex conditional logic or non‑write intermediate rules; CPA’s computational cost grows with the number of rules; and the current implementation focuses on GraphQL, requiring adaptation for other graph query languages (e.g., Gremlin, Cypher).
Future work suggested includes optimizing CPA (e.g., parallel processing), extending the framework to multi‑role and attribute‑based policies, handling concurrency‑related race conditions, and applying the approach to other graph databases such as Neo4j.
In summary, the paper delivers the first systematic, formally grounded approach that unifies static graph‑transformation‑based taint analysis with dynamic security testing for Graph APIs. Its successful application to a real‑world, widely used API demonstrates both practical relevance and the potential to become a standard methodology for securing graph‑centric web and mobile services.