Benchmarking Symbolic Execution Using Constraint Problems -- Initial Results
int x0, x1, x2, x3, x4, x5;
// declare variables symbolic
klee_make_symbolic(&x0, sizeof(x0), "x0");
klee_make_symbolic(&x1, sizeof(x1), "x1");
...
klee_make_symbolic(&x5, sizeof(x5), "x5");
// enforce variable domains
klee_assume(x0 >= 0 && x0 <= 1);
klee_assume(x1 >= 0 && x1 <= 1);
...
klee_assume(x5 >= 0 && x5 <= 1);
if ((x0==0 && x1==0 && x2==0) ||
(x0==0 && x1==1 && x2==0)) exit(0);
if ((x3==0 && x4==0 && x5==0) ||
(x3==0 && x4==1 && x5==0)) exit(0);
assert(0); // CSP is satisfiable
if ((x0==0 & x1==0 & x2==0) |
(x0==0 & x1==1 & x2==0) |
(x3==0 & x4==0 & x5==0) |
(x3==0 & x4==1 & x5==0)) exit(0);
klee_assume(!((x0==0 && x1==0 && x2==0) ||
(x0==0 && x1==1 && x2==0) ||
(x3==0 && x4==0 && x5==0) ||
(x3==0 && x4==1 && x5==0)));
if (y0==dist(x0,x1)); else exit(0);
if (y1==dist(x1,x2)); else exit(0);
if ((y0==dist(x0,x1) && y1==dist(x1,x2)));
else exit(0);
klee_assume(x0!=x1 & x0!=x2 & x0!=x3);
if (x0!=x1 && x0!=x2 && x0!=x3 &&
y0==dist(x0,x1) && y1==dist(x1,x2))
assert(0);
Transforming Combinatorial Problems to C
The idea behind turning a CSP into a C program is as follows. Solving a combinatorial problem can be viewed as two components: first, an oracle that guesses a solution; and second, a checker that verifies the guess. Symbolic execution takes the place of the oracle by finding a solution which the oracle could have returned, and we transform the CSP into a C program that functions as the checker.
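The guess-and-check view can be sketched in plain C, without any symbolic execution tool. In the minimal sketch below, the names `is_conflict`, `checker` and `oracle_by_enumeration` are hypothetical, the constraints are the two negative-table constraints of Figure 2 (a), and exhaustive enumeration merely stands in for the oracle; a symbolic executor would replace the enumeration with constraint solving.

```c
#include <assert.h>

#define NVARS 6

/* One negative table, shared by both constraints: the triples (0,0,0) and
   (0,1,0) are forbidden. */
static int is_conflict(const int *t) {
    return (t[0] == 0 && t[1] == 0 && t[2] == 0) ||
           (t[0] == 0 && t[1] == 1 && t[2] == 0);
}

/* Checker: returns 1 iff x satisfies the constraints on (x0,x1,x2) and on
   (x3,x4,x5). Precondition: every x[i] lies in the domain {0, 1}. */
int checker(const int x[NVARS]) {
    for (int i = 0; i < NVARS; i++)
        assert(x[i] == 0 || x[i] == 1);
    return !is_conflict(&x[0]) && !is_conflict(&x[3]);
}

/* Stand-in oracle: tries all 2^NVARS assignments and returns how many the
   checker accepts. If first_solution is non-null, the first accepted
   assignment is copied into it. */
int oracle_by_enumeration(int first_solution[NVARS]) {
    int count = 0;
    for (unsigned bits = 0; bits < (1u << NVARS); bits++) {
        int x[NVARS];
        for (int i = 0; i < NVARS; i++)
            x[i] = (int)((bits >> i) & 1u);
        if (!checker(x))
            continue;
        if (count == 0 && first_solution != 0)
            for (int i = 0; i < NVARS; i++)
                first_solution[i] = x[i];
        count++;
    }
    return count;
}
```

Each triple has 6 of its 8 binary values allowed, so the enumeration accepts 36 assignments; any one of them is a solution that the oracle could have returned.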
Our approach is to transform a combinatorial problem, a CSP $`P`$, into
a C program to be tested on program analysis tools as follows: (i) the
finite domain variables of the CSP correspond to integer variables in
the program (C variables are treated as symbolic); (ii) CSP variable
domains are converted to assume statements (see below); and (iii) the
constraint relations are encoded into conditional statements in the
program or as assume statements. The encoding of the CSP to C ensures
that when the CSP $`P`$ is satisfiable, (symbolic) execution of the
program is able to reach a distinguished program point, at which the
values of the C variables are a solution to CSP $`P`$. To test the
analysis tools, the distinguished program point is mapped to an
assertion failure, i.e., assert(0). Conversely, when CSP $`P`$ is
unsatisfiable, the assert cannot be reached on any path.
From a single CSP $`P`$, we propose several transformations of $`P`$ into C programs. The transformations produce different programs so as to exercise the tools in different ways (as we show in the results). We employ two general approaches for encoding the constraints in $`P`$:
- if approach: if statements are created, whose condition captures the values of variables satisfying the constraint. The execution can terminate with an exit(0) statement in their else branches. If it is possible that execution takes the then branches, then there will be satisfying values for the variables of the encoded constraint. In symbolic execution, the condition is simply added into the current set of constraints, i.e., the path condition.
- assume approach: the constraint solver of the analysis tool is used directly: the constraints of the CSP $`P`$ are translated into an argument to assume statements (klee_assume of KLEE or __llbmc_assume of LLBMC). When an assume statement is executed in symbolic execution, the constraint is simply added into the path condition and is not intended to be executed as C, whereas the if approach is C code.
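The two approaches can be compared on concrete inputs with a small sketch. Here `ASSUME` is a hypothetical stand-in for klee_assume or __llbmc_assume (it is not part of either tool): when run concretely, a path whose assumption is false is simply abandoned. The functions return 1 where a generated program would reach assert(0) and 0 where it would stop early; the constraint is one negative-table constraint from Figure 2 (a).

```c
#include <assert.h>

/* Hypothetical stand-in for klee_assume / __llbmc_assume when the program is
   run on concrete inputs: a path whose assumption is false is abandoned. */
#define ASSUME(cond) do { if (!(cond)) return 0; } while (0)

/* if approach, in the style of Figure 1 (a). Returns 1 iff execution
   reaches the distinguished point (assert(0) in a generated program). */
int reaches_point_if(int x0, int x1, int x2) {
    if ((x0 == 0 && x1 == 0 && x2 == 0) ||
        (x0 == 0 && x1 == 1 && x2 == 0))
        return 0;   /* exit(0) in a generated program */
    return 1;
}

/* assume approach: the negated table is passed to the assume statement. */
int reaches_point_assume(int x0, int x1, int x2) {
    ASSUME(!((x0 == 0 && x1 == 0 && x2 == 0) ||
             (x0 == 0 && x1 == 1 && x2 == 0)));
    return 1;
}

/* Checks on all 8 binary inputs that both encodings agree, and returns how
   many inputs reach the distinguished point. */
int count_reaching_inputs(void) {
    int reaching = 0;
    for (int x0 = 0; x0 <= 1; x0++)
        for (int x1 = 0; x1 <= 1; x1++)
            for (int x2 = 0; x2 <= 1; x2++) {
                int r = reaches_point_if(x0, x1, x2);
                assert(r == reaches_point_assume(x0, x1, x2));
                reaching += r;
            }
    return reaching;
}
```

Both encodings accept the same 6 of the 8 inputs; they differ only in whether the constraint reaches the tool as branching C code or as a direct addition to the path condition.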
When all constraints have been translated, we place an assert(0) at
the end of the program (the distinguished program point). The
assert(0) triggers an assertion failure, so if the CSP $`P`$ is
satisfiable (no contradiction is found), the program terminates with an
assertion failure. The generated test case from symbolic execution,
e.g., KLEE or Tracer-X, is a solution for $`P`$. If the CSP $`P`$ is
unsatisfiable, assertion failure will not be possible and will be
reported as a failure.
Table [table:features] (to the left of the
double vertical line) shows how different features are combined to
obtain different transformed (versions) C programs for a CSP $`P`$.
Overall we have designed 12 extensional transformations for extensional
CSPs and 10 intensional transformations for intensional CSPs.¹ Each
transformation version (different row in Table
[table:features]) employs different
C constructs on the same underlying CSP $`P`$. The transformations are
designed to be correct by construction. There is insufficient space to
formalize the transformations; instead, we illustrate them by example. Figure
1 (a) shows how the negative
tables in Figure 2 (a) are transformed into C under
the Version 1 transformation. The green
coloured code in Figure
1(a) shows details of the
transformation common to all versions of a particular CSP: C variable
declaration, KLEE symbolic variable construction, constraining variable
domain, and distinguished assertion program point. CSP variables are
symbolic in KLEE and domains are encoded with klee_assume (similarly
for LLBMC). Reaching the assert means the CSP is satisfiable.
Similarly, Figures
1 (b), and (c) are the results
of Extensional Versions 5 and 8 on the same constraint. Figure
1 (d) and (e) are from
Intensional Versions 1 and 2 on Figure
2 (b), and Figure
1 (f) is from Intensional
Version 9 on Figure 2 (c). Figure
1 (g) is from Intensional
Version 3 combining constraints in Figures
2 (b) and (c) as a single problem.
Table [table:features] shows how the
transformations in each column (Construct, Operator, Grouped) are
combined to give a particular version. The Construct column gives
the construct used in the C program: the if approach or the assume approach. Figures
1 (a) and (b) are the
transformation results of the if approach on the constraint example in
Figure 2 (a), while Figure
1 (c) shows the result of the
assume approach, which uses an assume statement (example is for KLEE
using klee_assume, whereas when testing on LLBMC, __llbmc_assume is
used).
The Operator column gives the logic operators used in the
conditions: C logical operators (&& or ||) or C bitwise operators (& or |),
which are used to combine the expressions making up the conditions. The
difference between the logical and bitwise operators is that the logical
operators follow C-style short-circuiting of conditions, i.e., the
condition is broken up into a cascading conditional branching structure
with only an atomic condition guarding every branch. Atomic conditions
do not contain logical or bitwise operators. Figure
1 (b) and (f) show the usage of
bitwise operators in the transformation, while Figure
1 (a), (c), (e) and (g) show the
usage of logical operators. For Intensional Transformations, when there
is no grouping of conditions, no operator is needed (denoted by NOP in
Versions 1 and 6 in Table
[table:features] and illustrated in
Figure 1 (d)). The CSP may have several
constraints defined in the same way but on different variables, e.g., in
Figure 2 (a), there is a single constraint
group with two individual constraints.
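The difference between the two operator families can be observed in ordinary C. The sketch below is illustrative only: the counter `evaluated` and the helpers `atom`, `tuple_logical`, `tuple_cascade`, `tuple_bitwise` and `atoms_evaluated` are hypothetical names, and the counter only records how many atomic conditions plain C evaluates. How a particular tool forks on these conditions is tool-specific and is not modelled here.

```c
#include <assert.h>

static int evaluated;   /* number of atomic conditions evaluated so far */

/* One atomic condition: a single comparison without logical or bitwise
   operators. */
static int atom(int lhs, int rhs) {
    evaluated++;
    return lhs == rhs;
}

/* Logical operators: && short-circuits, so later atoms may be skipped. */
int tuple_logical(int x0, int x1, int x2) {
    return atom(x0, 0) && atom(x1, 0) && atom(x2, 0);
}

/* The same condition as an explicit cascade, one atom guarding each branch. */
int tuple_cascade(int x0, int x1, int x2) {
    if (atom(x0, 0))
        if (atom(x1, 0))
            if (atom(x2, 0))
                return 1;
    return 0;
}

/* Bitwise operators: every operand is evaluated, giving a single condition. */
int tuple_bitwise(int x0, int x1, int x2) {
    return atom(x0, 0) & atom(x1, 0) & atom(x2, 0);
}

/* Runs f on the given inputs and returns how many atoms it evaluated. */
int atoms_evaluated(int (*f)(int, int, int), int x0, int x1, int x2) {
    assert(f != 0);
    evaluated = 0;
    (void)f(x0, x1, x2);
    return evaluated;
}
```

On the input (1, 0, 0), the logical and cascade versions stop after the first atom, whereas the bitwise version evaluates all three; all three versions return the same truth value on every input.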
The Grouped column shows how translation is grouped by constraints:
the translation is per constraint group (yes); per individual
constraints defined in the group (no); or the entire CSP $`P`$ is
grouped together as a single condition (all). For example, Figure
1 (a) and (d) show the no
case, Figure 1 (g) shows the all case given
the constraints in Figures
2 (b) and (c), and the rest of
Figure 1 shows the yes case. Figure
1 (g) simply places the
assert(0) statement in the then body of the if statement. We
remark that, as the benchmarks are for testing path explosion and
constraint solving, the transformations produce straight-line code.
It is straightforward to produce more compact forms with loops,
but that only makes symbolic execution more complex.
CSP Background
One formulation of constraint problems is the Constraint Satisfaction Problem (CSP). A CSP $`P`$ is a pair $`(X, C)`$ where $`X`$ is a set of $`n`$ variables $`\{x_1, \ldots, x_n\}`$ and $`C`$ is a set of $`e`$ constraints $`\{c_1, \ldots, c_e\}`$. In this paper, we focus on finite discrete combinatorial problems. In a finite domain CSP, each variable $`x \in X`$ takes values from its domain $`D(x)`$, which is a finite set of values. Each $`c \in C`$ has two components: a scope ($`scp(c)`$), which is an ordered subset of variables of $`X`$; and a relation over the scope ($`rel(c)`$). Given $`scp(c) = \{x_{i_1}, \ldots, x_{i_r}\}`$, $`rel(c) \subseteq \prod_{j=1}^{r} D(x_{i_j})`$ is the set of satisfying tuples (combinations of values) for the variables in $`scp(c)`$. A constraint is satisfied if the valuation of the variables in its scope, with values taken from the variables' domains, matches at least one of the tuples in its relation. A solution to $`P`$ is a valuation for $`X`$ such that every constraint is satisfied. $`P`$ is satisfiable iff a solution exists. If no solution exists then $`P`$ is unsatisfiable.
<group>
<extension>
<list> %0 %1 %2 </list>
<conflicts> (0,0,0) (0,1,0)</conflicts>
</extension>
<args> x[0] x[1] x[2] </args>
<args> x[3] x[4] x[5] </args>
</group>
<group>
<intension> eq(%0,dist(%1,%2)) </intension>
<args> y[0] x[0] x[1] </args>
<args> y[1] x[1] x[2] </args>
</group>
<allDifferent> x[0] x[1] x[2] </allDifferent>
| Type | Version | Construct | Operator | Grouped | Benchmarks (#lines, #variables) | | |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | | | | | Sat-Aim100 | Sat-Aim200 | Dubois |
| Extensional | 1 | if | logical | no | (853, 100) | (2080, 200) | (933, 130) |
| Extensional | 2 | if | logical | yes | (352, 100) | (657, 200) | (432, 130) |
| Extensional | 3 | if | logical | all | (316, 100) | (616, 200) | (412, 130) |
| Extensional | 4 | if | bitwise | no | (1040, 100) | (2080, 200) | (933, 130) |
| Extensional | 5 | if | bitwise | yes | (352, 100) | (657, 200) | (432, 130) |
| Extensional | 6 | if | bitwise | all | (316, 100) | (616, 200) | (412, 130) |
| Extensional | 7 | assume | logical | no | (782, 100) | (1552, 200) | (759, 130) |
| Extensional | 8 | assume | logical | yes | (531, 100) | (1033, 200) | (676, 130) |
| Extensional | 9 | assume | logical | all | (516, 100) | (1016, 200) | (672, 130) |
| Extensional | 10 | assume | bitwise | no | (782, 100) | (1552, 200) | (759, 130) |
| Extensional | 11 | assume | bitwise | yes | (531, 100) | (1033, 200) | (676, 130) |
| Extensional | 12 | assume | bitwise | all | (782, 100) | (1552, 200) | (759, 130) |
| | | | | | AllInterval | CostasArray | HayStacks |
| Intensional | 1 | if | NOP | no | (549, 74) | (718, 98) | (3302, 173) |
| Intensional | 2 | if | logical | yes | (259, 74) | (361, 98) | (555, 173) |
| Intensional | 3 | if | logical | all | (239, 74) | (310, 98) | (535, 173) |
| Intensional | 4 | if | bitwise | yes | (252, 74) | (354, 98) | (548, 173) |
| Intensional | 5 | if | bitwise | all | (238, 74) | (309, 98) | (534, 173) |
| Intensional | 6 | assume | NOP | no | (2727, 74) | (1189, 98) | (2231, 173) |
| Intensional | 7 | assume | logical | yes | (242, 74) | (323, 98) | (538, 173) |
| Intensional | 8 | assume | logical | all | (238, 74) | (309, 98) | (534, 173) |
| Intensional | 9 | assume | bitwise | yes | (242, 74) | (323, 98) | (538, 173) |
| Intensional | 10 | assume | bitwise | all | (238, 74) | (309, 98) | (534, 173) |
A constraint can be defined in two ways: (i) intensional; or (ii) extensional (also known as table constraint). An intensional constraint is one where the relation of the constraint is defined implicitly. This is a common form of constraints supported in constraint solvers, e.g., a linear arithmetic inequality is implicitly understood in the usual arithmetic fashion over either real numbers or integers. Constraints can also be defined extensionally—the relation is defined as a (positive) table giving the satisfying tuples of constraint $`c`$ (respectively, also a “negative table” defining the set of “negative tuples”, namely, the tuples not satisfying the constraint). Consider the constraint $`x + y = 1`$ where the domain of the variables is binary. The equation above is already in the intensional form of the constraint. The positive extensional form is given by the tuples $`\{\langle 1,0\rangle, \langle 0,1\rangle\}`$ (respectively, the negative extensional form is $`\{\langle 0,0\rangle, \langle 1,1\rangle\}`$) for the variable sequence $`\langle x,y \rangle`$.
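The relationship between the two forms can be made concrete by enumerating the domain. The sketch below, with the hypothetical names `Tuple`, `Tables`, `holds` and `build_tables`, derives both extensional tables of $`x + y = 1`$ over binary variables from its intensional form.

```c
#include <assert.h>

#define MAX_TUPLES 4   /* |{0,1} x {0,1}| */

typedef struct { int x, y; } Tuple;

typedef struct {
    Tuple positive[MAX_TUPLES]; int n_positive;   /* satisfying tuples */
    Tuple negative[MAX_TUPLES]; int n_negative;   /* violating tuples  */
} Tables;

/* Intensional form of the constraint. */
static int holds(int x, int y) {
    return x + y == 1;
}

/* Enumerates the binary domain in the order (0,0), (0,1), (1,0), (1,1) and
   sorts each tuple into the positive or the negative table. */
Tables build_tables(void) {
    Tables t = { .n_positive = 0, .n_negative = 0 };
    for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++) {
            Tuple tup = { x, y };
            if (holds(x, y)) {
                assert(t.n_positive < MAX_TUPLES);
                t.positive[t.n_positive++] = tup;
            } else {
                assert(t.n_negative < MAX_TUPLES);
                t.negative[t.n_negative++] = tup;
            }
        }
    return t;
}
```

The positive table contains $`\langle 0,1\rangle`$ and $`\langle 1,0\rangle`$ and the negative table contains $`\langle 0,0\rangle`$ and $`\langle 1,1\rangle`$, matching the two extensional forms given above up to the order of the tuples.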
As our benchmarks use the XCSP3 format, we briefly describe XCSP3. Figure 2 illustrates three kinds of constraints in XCSP3 format. XCSP3 is an XML format for defining CSP instances which can be used with many constraint solvers. It is also used in the XCSP3 competition for constraint solvers. We use XCSP3 problem instances in our experiments. The details of the XCSP3 format are beyond the scope of the paper, so examples in XCSP3 are meant to be illustrative only.
We consider CSPs where the constraints are in: (i) intensional form, and
(ii) extensional form (see Figure
2 (a) which defines a negative
table with two constraints on $`\langle x0, x1, x2 \rangle`$ and
$`\langle x3, x4, x5 \rangle`$ using the same table definition). Some
examples of intensional constraints are: eq(%0,dist(%1,%2))
representing $`x = |y - z|`$ for a given substitution of $`x`$, $`y`$,
and $`z`$ (see Figure
2 (b)); and the alldifferent
constraint (see Figure
2 (c)), which constrains the
variables $`\langle x0, x1, x2 \rangle`$ to all take different values.
We remark that alldifferent is considered a global constraint but in
this paper, what matters is the intensional versus extensional
distinction of the constraint.
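For illustration, the two intensional constraints above can be written as plain C predicates. This is a sketch under stated assumptions: the source does not show how `dist` is defined in the generated programs, so the definition below (absolute difference via `abs`) is an assumption, and `eq_dist` and `all_different` are hypothetical helper names. The subtraction is assumed not to overflow, which holds for small finite domains.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed definition of dist: the absolute difference |a - b|. */
int dist(int a, int b) {
    return abs(a - b);
}

/* eq(%0,dist(%1,%2)): holds iff x = |y - z|. */
int eq_dist(int x, int y, int z) {
    return x == dist(y, z);
}

/* alldifferent: holds iff the n values in v are pairwise distinct. */
int all_different(const int *v, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (v[i] == v[j])
                return 0;
    return 1;
}
```

With these predicates, the first args line of Figure 2 (b) corresponds to the check `eq_dist(y0, x0, x1)`, and Figure 2 (c) to `all_different` applied to the array holding x0, x1 and x2.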
¹ Intensional versions do not need operators in every case (shown as NOP in the table). For example, Extensional Versions 1 and 4 both correspond to Intensional Version 1. This explains the difference in the number of transformations for extensional and intensional problems.