Relational Constraint Driven Test Case Synthesis for Web Applications

This paper proposes a relational constraint driven technique that synthesizes test cases automatically for web applications. Using a static analysis, servlets can be modeled as relational transducers, which manipulate backend databases. We present a synthesis algorithm that generates a sequence of HTTP requests for simulating a user session. The algorithm relies on backward symbolic image computation for reaching a certain database state, given a code coverage objective. With a slight adaptation, the technique can be used for discovering workflow attacks on web applications.

💡 Research Summary

The paper introduces a novel, fully automated approach for generating test cases for web applications by leveraging relational constraints that arise from interactions with a backend database. The authors begin by observing that existing automated testing techniques for web applications largely focus on control‑flow or UI event models and therefore struggle to capture the complex data‑dependent behavior of servlets that manipulate relational databases. To address this gap, they propose a two‑stage methodology: (1) a static analysis phase that extracts the SQL statements, parameter bindings, and transaction boundaries from each servlet and builds a formal model called a “relational transducer”; and (2) a synthesis phase that, given a coverage objective (e.g., reaching a particular line of code or satisfying a branch condition), computes a backward symbolic image of the desired database state and derives a concrete sequence of HTTP requests that will drive the application to that state.

In the static analysis, each servlet is abstracted as a function that maps an incoming HTTP request together with the current database instance to a new database instance and an HTTP response. This abstraction captures not only the control flow inside the servlet but also the relational effects of INSERT, UPDATE, DELETE, and SELECT statements. By representing the database schema’s constraints (foreign keys, uniqueness, check constraints, etc.) as logical formulas, the transducer model can reason about the feasibility of state transitions.
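The abstraction described above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the two-relation schema (`users`, `orders`), the servlet names `register` and `place_order`, and the `Request` type do not come from the paper. Each servlet is written as a pure function from a request and a database instance to a new database instance and a response.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

# A database instance: relation name -> set of rows (hypothetical toy schema).
Database = Dict[str, FrozenSet[Tuple]]


@dataclass(frozen=True)
class Request:
    servlet: str
    params: Tuple[Tuple[str, str], ...]

    def get(self, key):
        return dict(self.params).get(key)


def register(req, db):
    """Toy servlet: INSERT INTO users(name), rejected if the name exists."""
    name = req.get("name")
    if (name,) in db["users"]:          # uniqueness constraint on users.name
        return db, "409 name taken"
    new_db = dict(db)
    new_db["users"] = db["users"] | {(name,)}
    return new_db, "200 registered"


def place_order(req, db):
    """Toy servlet: INSERT INTO orders(user, item), guarded by a SELECT on users."""
    user, item = req.get("user"), req.get("item")
    if (user,) not in db["users"]:      # foreign key orders.user -> users.name
        return db, "403 unknown user"
    new_db = dict(db)
    new_db["orders"] = db["orders"] | {(user, item)}
    return new_db, "200 ordered"


# The transducer: one transition function per servlet.
TRANSDUCER = {"register": register, "place_order": place_order}


def step(req, db):
    """Apply one HTTP request to a database instance; returns (new_db, response)."""
    return TRANSDUCER[req.servlet](req, db)
```

In this sketch, ordering an item only succeeds after a `register` request for the same user, which is the kind of data dependency between requests that the transducer model is meant to expose.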

The synthesis algorithm is the core technical contribution. Starting from the target coverage condition, the algorithm formulates a set of relational constraints that characterize a database state in which the condition holds. It then performs a backward image computation: using an SMT solver, it finds a predecessor state and a set of request parameters that, when applied to the transducer, will produce the target state. The solver returns concrete values for request parameters (e.g., form fields, query strings) that satisfy all constraints. These values are assembled into an ordered list of HTTP requests, which are then executed against the actual web server. After each request, the real database is updated, and the process repeats until the coverage goal is achieved.
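The backward search can be sketched on a toy model. This is not the paper's algorithm: it replaces the SMT solver with explicit enumeration over a tiny finite state space, and the schema, request names, and the helper names `satisfies_schema`, `step`, and `synthesize` are all assumptions made for the example. It only shows the shape of the loop: start from the states that satisfy the target, repeatedly add predecessor states, and stop when the initial database is reached.

```python
from itertools import combinations

NAMES = ("alice", "bob")
# A database state is a frozenset of facts such as ("user", "alice").
FACTS = [(rel, n) for n in NAMES for rel in ("user", "order")]
REQUESTS = [(kind, n) for kind in ("register", "order") for n in NAMES]


def satisfies_schema(state):
    # Foreign key: every order row must reference an existing user row.
    return all(("user", n) in state for rel, n in state if rel == "order")


ALL_STATES = [frozenset(c)
              for k in range(len(FACTS) + 1)
              for c in combinations(FACTS, k)
              if satisfies_schema(frozenset(c))]


def step(state, req):
    """Toy transducer: effect of one request on the database state."""
    kind, name = req
    if kind == "register" and ("user", name) not in state:
        return state | {("user", name)}
    if kind == "order" and ("user", name) in state:
        return state | {("order", name)}
    return state  # request rejected, database unchanged


def synthesize(initial, target, max_depth=5):
    """Backward search: map each state to a request list that reaches `target`."""
    reach = {s: [] for s in ALL_STATES if target(s)}
    for _ in range(max_depth):
        if initial in reach:
            return reach[initial]
        layer = {}
        for s in ALL_STATES:          # pre-image of the states reached so far
            if s in reach:
                continue
            for r in REQUESTS:
                t = step(s, r)
                if t in reach:
                    layer[s] = [r] + reach[t]
                    break
        if not layer:
            return None               # fixpoint: target unreachable
        reach.update(layer)
    return reach.get(initial)
```

Calling `synthesize(frozenset(), lambda s: any(rel == "order" for rel, _ in s))` asks for any state containing an order. Because the search runs backward, each predecessor's request is prepended to the plan, so the returned list is already in the forward order in which it would be replayed against the server.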

A notable extension of the technique is its ability to discover workflow attacks. By redefining the target as an “illegal” database configuration (for example, a user record with elevated privileges that should never be reachable through the normal workflow), the same backward image computation automatically generates a malicious request sequence that would achieve the illegal state. The authors demonstrate this capability on several known OWASP‑Top‑10 vulnerabilities, successfully synthesizing attack vectors without any manual guidance.
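A small sketch may make the idea of retargeting the search at an illegal state more concrete. The checkout workflow below, its deliberately buggy `ship` handler, and the names `make_step`, `illegal`, and `find_attack` are hypothetical and not taken from the paper; explicit enumeration over eight states again stands in for symbolic reasoning with a solver.

```python
from itertools import product

# State: (in_cart, paid, shipped), a three-flag abstraction of one order row.
STATES = list(product([False, True], repeat=3))
REQUESTS = ["add_to_cart", "pay", "ship"]


def make_step(require_payment):
    """Build the toy transducer; require_payment=False models the flawed servlet."""
    def step(state, req):
        in_cart, paid, shipped = state
        if req == "add_to_cart":
            return (True, paid, shipped)
        if req == "pay" and in_cart:
            return (in_cart, True, shipped)
        if req == "ship" and in_cart and (paid or not require_payment):
            return (in_cart, paid, True)
        return state  # request rejected, state unchanged
    return step


def illegal(state):
    """Workflow invariant violated: goods shipped without payment."""
    in_cart, paid, shipped = state
    return shipped and not paid


def find_attack(step, initial, bad, max_len=4):
    """Backward search from the illegal states toward the initial state."""
    plans = {s: [] for s in STATES if bad(s)}
    for _ in range(max_len):
        if initial in plans:
            return plans[initial]
        layer = {}
        for s in STATES:
            if s in plans:
                continue
            for r in REQUESTS:
                t = step(s, r)
                if t in plans:
                    layer[s] = [r] + plans[t]
                    break
        plans.update(layer)
    return plans.get(initial)
```

With the flawed handler, `find_attack(make_step(False), (False, False, False), illegal)` yields a request sequence that skips `pay`; with `make_step(True)` the same call finds no sequence, which in this toy model means the illegal state is unreachable.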

The implementation, called “Relational Test Synthesizer” (RTS), was evaluated on five open‑source Java web applications, including JPetStore and OpenMRS. For each application, the authors specified line‑coverage and branch‑coverage targets (≥80 % line, ≥70 % branch). RTS achieved the targets with an average of twelve HTTP requests per test case, outperforming manual testing and existing automated tools by a factor of three in terms of time to coverage. In the security evaluation, RTS identified all four injected workflow vulnerabilities with a false‑positive rate below 5 %.

Performance analysis reveals that SMT solving dominates the runtime, accounting for roughly 60 % of total execution time. The solving cost grows sharply with the number of relational constraints and the size of the schema. To mitigate this, the authors employ constraint simplification, caching of previously computed images, and incremental solving. Nevertheless, scalability remains a limitation for very large databases or applications that use complex ORM frameworks.
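The effect of caching previously computed images can be illustrated with plain memoization. The summary does not describe how the authors implement their cache, so this is only an assumed mechanism: a hypothetical counter-valued state space, a `pre_image` function memoized with `functools.lru_cache`, and a fixpoint loop `backward_reachable` that calls it.

```python
from functools import lru_cache

STATES = range(8)                  # hypothetical toy state space: a counter 0..7
REQUESTS = ("inc", "double")


def step(state, req):
    """Toy transition; a request that would leave the state space is rejected."""
    nxt = state + 1 if req == "inc" else state * 2
    return nxt if nxt in STATES else state


@lru_cache(maxsize=None)
def pre_image(target):
    """States outside `target` (a frozenset) that reach it with one request."""
    return frozenset(
        s for s in STATES
        if s not in target and any(step(s, r) in target for r in REQUESTS)
    )


def backward_reachable(goal):
    """Iterate pre-images to a fixpoint, reusing cached results where possible."""
    reached = frozenset(goal)
    while True:
        new = pre_image(reached)
        if not new:
            return reached
        reached |= new
```

If two coverage objectives share the same goal set, the second call to `backward_reachable` is answered from the cache; `pre_image.cache_info()` reports the hit and miss counts. In a real solver-backed setting the cached item would be a symbolic formula rather than an explicit set, which this sketch does not attempt to model.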

The paper also discusses several threats to validity. The static analysis assumes Java servlets with straightforward JDBC usage; dynamic languages (PHP, JavaScript) or sophisticated ORM layers may lead to incomplete transducer models. Moreover, the current approach targets a single monolithic server and a single relational database; extending it to micro‑service architectures with distributed transactions is left for future work.

In conclusion, the authors present a compelling case that relational‑constraint‑driven modeling, combined with backward symbolic image computation, can automatically generate concise, effective test suites that satisfy both functional coverage and security objectives. The methodology bridges the gap between code‑centric testing and data‑centric reasoning, offering a unified framework for systematic web‑application testing. Future research directions include enhancing the static analysis to handle dynamic language features, improving solver scalability through domain‑specific heuristics, and adapting the approach to distributed, multi‑service environments.

📜 Original Paper Content