Benchmarking Symbolic Execution Using Constraint Problems -- Initial Results

(a) Extensional Version 1
int x0, x1, x2, x3, x4, x5;
// declare variables symbolic
klee_make_symbolic(&x0, sizeof(x0), "x0");
klee_make_symbolic(&x1, sizeof(x1), "x1");
  ...
klee_make_symbolic(&x5, sizeof(x5), "x5");
// enforce variable domains
klee_assume(x0 >= 0 && x0 <= 1);
klee_assume(x1 >= 0 && x1 <= 1);
  ...
klee_assume(x5 >= 0 && x5 <= 1);
if ((x0==0 && x1==0 && x2==0) ||
    (x0==0 && x1==1 && x2==0)) exit(0);
if ((x3==0 && x4==0 && x5==0) ||
    (x3==0 && x4==1 && x5==0)) exit(0);
assert(0); // CSP is satisfiable

(b) Extensional Version 5
if ((x0==0 & x1==0 & x2==0) |
    (x0==0 & x1==1 & x2==0) |
    (x3==0 & x4==0 & x5==0) |
    (x3==0 & x4==1 & x5==0)) exit(0);

(c) Extensional Version 8
klee_assume(!((x0==0 && x1==0 && x2==0) ||
              (x0==0 && x1==1 && x2==0) ||
              (x3==0 && x4==0 && x5==0) ||
              (x3==0 && x4==1 && x5==0)));

(d) Intensional Version 1
if (y0==dist(x0,x1)); else exit(0);
if (y1==dist(x1,x2)); else exit(0);

(e) Intensional Version 2
if ((y0==dist(x0,x1) && y1==dist(x1,x2)));
  else exit(0);

(f) Intensional Version 9
klee_assume(x0!=x1 & x0!=x2 & x0!=x3);

(g) Intensional Version 3
if (x0!=x1 && x0!=x2 && x0!=x3 &&
    y0==dist(x0,x1) && y1==dist(x1,x2))
    assert(0);

Figure 1: Example transformed constraints, with version numbers from Table [table:features]. Panel (a), Extensional Version 1, is the full transformation; the other panels show only the constraint.

Transforming Combinatorial Problems to C

The idea behind turning a CSP into a C program is as follows. Solving a combinatorial problem can be imagined as two components: firstly an oracle guesser for a solution; and secondly a checker for the solution. We can think of symbolic execution as finding a solution which the oracle could have returned and we transform the CSP into a C program that functions as the checker.
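The oracle and checker split can be illustrated in plain C without any analysis tool. The following is a minimal sketch, not the transformation used in the paper: exhaustive enumeration stands in for the oracle (the role symbolic execution plays), and the function names `in_conflict`, `check` and `guess_and_check` are hypothetical. The constraints are the two negative-table constraints of Figure 2 (a) over binary domains.

```c
#include <assert.h>
#include <stdbool.h>

#define NVARS 6

/* A triple is in conflict if it matches a tuple of the negative table. */
bool in_conflict(int a, int b, int c) {
    return (a == 0 && b == 0 && c == 0) ||
           (a == 0 && b == 1 && c == 0);
}

/* Checker: true iff the valuation violates neither constraint. */
bool check(const int x[NVARS]) {
    return !in_conflict(x[0], x[1], x[2]) &&
           !in_conflict(x[3], x[4], x[5]);
}

/* Stand-in for the oracle: try all 2^6 binary valuations in order.
   On success, copies the first solution found into out. */
bool guess_and_check(int out[NVARS]) {
    assert(out != 0);
    for (unsigned bits = 0; bits < (1u << NVARS); ++bits) {
        int x[NVARS];
        for (int i = 0; i < NVARS; ++i)
            x[i] = (int)((bits >> i) & 1u);
        if (check(x)) {
            for (int i = 0; i < NVARS; ++i)
                out[i] = x[i];
            return true;
        }
    }
    return false; /* no valuation passes the checker: unsatisfiable */
}
```

In the transformed programs, the enumeration loop disappears: the variables are symbolic and the tool searches for a valuation that passes the checker.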

Our approach is to transform a combinatorial problem, a CSP $`P`$, into a C program to be tested on program analysis tools as follows: (i) the finite domain variables of the CSP correspond to integer variables in the program (C variables are treated as symbolic); (ii) CSP variable domains are converted to assume statements (see below); and (iii) the constraint relations are encoded into conditional statements in the program or as assume statements. The encoding of the CSP to C ensures that when the CSP $`P`$ is satisfiable, (symbolic) execution of the program is able to reach a distinguished program point, at which the values of the C variables are a solution to CSP $`P`$. To test the analysis tools, the distinguished program point is mapped to an assertion failure, i.e., assert(0). Conversely, when CSP $`P`$ is unsatisfiable, the assert cannot be reached along any path, i.e., no path succeeds.

From a single CSP $`P`$, we propose several transformations of $`P`$ into C programs. The purpose of various transformations which result in different programs is to exercise the tools in different ways (as we show in the results). We employ two general approaches for encoding the constraints in $`P`$:

  1. if approach: if statements are created whose conditions capture the values of variables satisfying the constraint. Execution can terminate with an exit(0) statement in their else branches. If execution can take the then branches, then there are satisfying values for the variables of the encoded constraint. In symbolic execution, the condition is simply added to the current set of constraints, i.e., the path condition.

  2. assume approach: the constraint solver of the analysis tool is used directly: the constraints of the CSP $`P`$ are translated into an argument to assume statements (klee_assume of KLEE or __llbmc_assume of LLBMC). When an assume statement is executed in symbolic execution, the constraint is simply added to the path condition and is not intended to be executed as C, whereas the if approach is C code.

When all constraints have been translated, we place an assert(0) at the end of the program (the distinguished program point). The assert(0) triggers an assertion failure, so if the CSP $`P`$ is satisfiable (no contradiction is found), the program terminates with an assertion failure. The generated test case from symbolic execution, e.g., KLEE or Tracer-X, is a solution for $`P`$. If the CSP $`P`$ is unsatisfiable, assertion failure will not be possible and will be reported as a failure.

Table [table:features] (the Type, Version, Construct, Operator and Grouped columns) shows how different features are combined to obtain different transformed C programs (versions) for a CSP $`P`$. Overall, we have designed 12 transformations for extensional CSPs and 10 transformations for intensional CSPs (see footnote 1). Each transformation version (a row in Table [table:features]) employs different C constructs on the same underlying CSP $`P`$. The transformations are designed to be correct by construction. There is insufficient space to formalize the transformations, so we present them by example. Figure 1 (a) shows how the negative tables in Figure 2 (a) are transformed into C under the Extensional Version 1 transformation. The code in Figure 1 (a) other than the two if statements shows the details of the transformation common to all versions of a particular CSP: C variable declaration, KLEE symbolic variable construction, constraining of the variable domains, and the distinguished assertion program point. CSP variables are symbolic in KLEE and domains are encoded with klee_assume (similarly for LLBMC). Reaching the assert means the CSP is satisfiable. Similarly, Figures 1 (b) and (c) are the results of Extensional Versions 5 and 8 on the same constraint. Figures 1 (d) and (e) are from Intensional Versions 1 and 2 on Figure 2 (b), and Figure 1 (f) is from Intensional Version 9 on Figure 2 (c). Figure 1 (g) is from Intensional Version 3, combining the constraints in Figures 2 (b) and (c) as a single problem.

Table [table:features] shows how the transformations in each column (Construct, Operator, Grouped) are combined to give a particular version. The Construct column gives the construct used in the C program: the if approach or the assume approach. Figures 1 (a) and (b) are the transformation results of the if approach on the constraint example in Figure 2 (a), while Figure 1 (c) shows the result of the assume approach, which uses an assume statement (the example is for KLEE using klee_assume; when testing on LLBMC, __llbmc_assume is used).

The Operator column gives the operators used to combine the expressions making up the conditions: C logical operators (&& or ||) or C bitwise operators (& or |). The difference is that the logical operators are subject to C-style short-circuiting of conditions, i.e., the condition is broken up into a cascading conditional branching structure with only an atomic condition guarding every branch. Atomic conditions do not contain logical or bitwise operators. Figures 1 (b) and (f) show the usage of bitwise operators in the transformation, while Figures 1 (a), (c), (e) and (g) show the usage of logical operators. For intensional transformations, when there is no grouping of conditions, no operator is needed (denoted by NOP in Versions 1 and 6 in Table [table:features]), as illustrated in Figure 1 (d). The CSP may have several constraints defined in the same way but on different variables, e.g., in Figure 2 (a), there is a single constraint group with two individual constraints.
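The effect of short-circuiting can be observed in ordinary C. The sketch below is illustrative only: the counter `evaluations` and the helper `eq` are hypothetical instrumentation, not part of the benchmark programs. It evaluates the first conflict tuple of Figure 2 (a) in both forms. The two forms agree on the truth value because each comparison yields 0 or 1, but the logical form stops at the first false atomic condition while the bitwise form evaluates all three.

```c
#include <assert.h>

/* Hypothetical instrumentation: counts evaluated atomic conditions. */
int evaluations = 0;

int eq(int a, int b) {
    ++evaluations;
    return a == b;
}

/* Logical form: && short-circuits, giving a cascade of branches. */
int conflict_logical(int x0, int x1, int x2) {
    return eq(x0, 0) && eq(x1, 0) && eq(x2, 0);
}

/* Bitwise form: & always evaluates every operand. */
int conflict_bitwise(int x0, int x1, int x2) {
    return eq(x0, 0) & eq(x1, 0) & eq(x2, 0);
}

/* Both forms agree on all eight valuations over the binary domain. */
void check_equivalence(void) {
    for (int bits = 0; bits < 8; ++bits) {
        int x0 = bits & 1, x1 = (bits >> 1) & 1, x2 = (bits >> 2) & 1;
        assert(conflict_logical(x0, x1, x2) ==
               conflict_bitwise(x0, x1, x2));
    }
}
```

With x0 = 1, the logical form evaluates one atomic condition and the bitwise form evaluates three. How many branches a given analysis tool actually sees may also depend on how the program is compiled.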

The Grouped column shows how translation is grouped by constraints: the translation is per constraint group (yes); per individual constraint defined in the group (no); or the entire CSP $`P`$ is grouped together as a single condition (all). For example, Figures 1 (a) and (d) show the no case, Figure 1 (g) shows the all case given the constraints in Figures 2 (b) and (c), and the remaining panels of Figure 1 show the yes case. Figure 1 (g) simply places the assert(0) statement in the then body of the if statement. We remark that, as the benchmarks are for testing path explosion and constraint solving, the transformations produce straight-line code. It is straightforward to make more compact forms with loops as well, but that only makes symbolic execution more complex.

CSP Background

One formulation of constraint problems is the Constraint Satisfaction Problem (CSP). A CSP $`P`$ is a pair $`(X, C)`$ where $`X`$ is a set of $`n`$ variables $`\{x_1, \ldots, x_n\}`$ and $`C`$ is a set of $`e`$ constraints $`\{c_1, \ldots, c_e\}`$. In this paper, we focus on finite discrete combinatorial problems. In a finite domain CSP, each variable $`x \in X`$ takes values from its domain $`D(x)`$, which is a finite set of values. Each $`c \in C`$ has two components: a scope $`scp(c)`$, which is an ordered subset of variables of $`X`$; and a relation over the scope, $`rel(c)`$. Given $`scp(c) = \{x_{i_1}, \ldots, x_{i_r}\}`$, $`rel(c) \subseteq \prod_{j=1}^{r} D(x_{i_j})`$ is the set of satisfying tuples (combinations of values) for the variables in $`scp(c)`$. A constraint is satisfied if there is a valuation for its variables, taking values from the variables' domains, that matches at least one of the tuples in its relation. A solution to $`P`$ is a valuation for $`X`$ such that every constraint is satisfied. $`P`$ is satisfiable iff a solution exists; otherwise $`P`$ is unsatisfiable.
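As a worked instance of these definitions, the extensional example of Figure 2 (a) can be written out as follows. This assumes binary domains, as enforced in Figure 1 (a), and reads the negative table as the complement of the relation; the constraint names $`c_1`$ and $`c_2`$ are chosen here for illustration.

```math
\begin{aligned}
X &= \{x_0, \ldots, x_5\}, \qquad D(x_i) = \{0, 1\} \ \text{for } 0 \le i \le 5,\\
scp(c_1) &= \{x_0, x_1, x_2\}, \qquad scp(c_2) = \{x_3, x_4, x_5\},\\
rel(c_1) &= rel(c_2) = \{0, 1\}^3 \setminus \{\langle 0,0,0\rangle, \langle 0,1,0\rangle\}.
\end{aligned}
```

Each relation therefore contains 6 of the 8 possible tuples. The valuation $`\langle 1,0,0,1,0,0 \rangle`$ for $`\langle x_0, \ldots, x_5 \rangle`$ satisfies both constraints, so this small CSP is satisfiable.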

(a) Extensional constraint group (negative table)
<group>
<extension>
<list> %0 %1 %2 </list>
<conflicts> (0,0,0) (0,1,0)</conflicts>
</extension>
<args> x[0] x[1] x[2] </args>
<args> x[3] x[4] x[5] </args>
</group>

(b) Intensional constraint group
<group>
<intension> eq(%0,dist(%1,%2)) </intension>
<args> y[0] x[0] x[1] </args>
<args> y[1] x[1] x[2] </args>
</group>

(c) allDifferent constraint
<allDifferent> x[0] x[1] x[2] </allDifferent>

Figure 2: Example CSP Constraints in XCSP3 Format
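Table [table:features]: Transformation versions and benchmark sizes. Benchmark columns give (#lines, #variables).

| Type | Version | Construct | Operator | Grouped | Sat-Aim100 | Sat-Aim200 | Dubois |
|---|---|---|---|---|---|---|---|
| Extensional | 1 | if | logical | no | (853, 100) | (2080, 200) | (933, 130) |
| Extensional | 2 | if | logical | yes | (352, 100) | (657, 200) | (432, 130) |
| Extensional | 3 | if | logical | all | (316, 100) | (616, 200) | (412, 130) |
| Extensional | 4 | if | bitwise | no | (1040, 100) | (2080, 200) | (933, 130) |
| Extensional | 5 | if | bitwise | yes | (352, 100) | (657, 200) | (432, 130) |
| Extensional | 6 | if | bitwise | all | (316, 100) | (616, 200) | (412, 130) |
| Extensional | 7 | assume | logical | no | (782, 100) | (1552, 200) | (759, 130) |
| Extensional | 8 | assume | logical | yes | (531, 100) | (1033, 200) | (676, 130) |
| Extensional | 9 | assume | logical | all | (516, 100) | (1016, 200) | (672, 130) |
| Extensional | 10 | assume | bitwise | no | (782, 100) | (1552, 200) | (759, 130) |
| Extensional | 11 | assume | bitwise | yes | (531, 100) | (1033, 200) | (676, 130) |
| Extensional | 12 | assume | bitwise | all | (782, 100) | (1552, 200) | (759, 130) |

| Type | Version | Construct | Operator | Grouped | AllInterval | CostasArray | HayStacks |
|---|---|---|---|---|---|---|---|
| Intensional | 1 | if | NOP | no | (549, 74) | (718, 98) | (3302, 173) |
| Intensional | 2 | if | logical | yes | (259, 74) | (361, 98) | (555, 173) |
| Intensional | 3 | if | logical | all | (239, 74) | (310, 98) | (535, 173) |
| Intensional | 4 | if | bitwise | yes | (252, 74) | (354, 98) | (548, 173) |
| Intensional | 5 | if | bitwise | all | (238, 74) | (309, 98) | (534, 173) |
| Intensional | 6 | assume | NOP | no | (2727, 74) | (1189, 98) | (2231, 173) |
| Intensional | 7 | assume | logical | yes | (242, 74) | (323, 98) | (538, 173) |
| Intensional | 8 | assume | logical | all | (238, 74) | (309, 98) | (534, 173) |
| Intensional | 9 | assume | bitwise | yes | (242, 74) | (323, 98) | (538, 173) |
| Intensional | 10 | assume | bitwise | all | (238, 74) | (309, 98) | (534, 173) |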

A constraint can be defined in two ways: (i) intensional; or (ii) extensional (also known as table constraint). An intensional constraint is one where the relation of the constraint is defined implicitly. This is a common form of constraints supported in constraint solvers, e.g., a linear arithmetic inequality is implicitly understood in the usual arithmetic fashion over either real numbers or integers. Constraints can also be defined extensionally—the relation is defined as a (positive) table giving the satisfying tuples of constraint $`c`$ (respectively, also a “negative table” defining the set of “negative tuples”, namely, the tuples not satisfying the constraint). Consider the constraint $`x + y = 1`$ where the domain of the variables is binary. The equation above is already in the intensional form of the constraint. The positive extensional form is given by the tuples $`\{\langle 1,0\rangle, \langle 0,1\rangle\}`$ (respectively, the negative extensional form is $`\{\langle 0,0\rangle, \langle 1,1\rangle\}`$) for the variable sequence $`\langle x,y \rangle`$.
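The relationship between the two forms can be made concrete with a small C sketch that derives both tables from the intensional definition $`x + y = 1`$ by enumerating the binary domain. The type `Tuple` and the functions `intensional` and `build_tables` are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int x, y; } Tuple;

/* Intensional form: the relation is given implicitly by a predicate. */
bool intensional(int x, int y) {
    return x + y == 1;
}

/* Extensional forms: enumerate the binary domain and split the tuples
   into a positive table (satisfying) and a negative table (violating).
   Each table needs capacity 4. Returns the number of positive tuples
   and stores the number of negative tuples in *n_neg. */
int build_tables(Tuple pos[4], Tuple neg[4], int *n_neg) {
    assert(n_neg != 0);
    int n_pos = 0;
    *n_neg = 0;
    for (int x = 0; x <= 1; ++x) {
        for (int y = 0; y <= 1; ++y) {
            Tuple t = { x, y };
            if (intensional(x, y))
                pos[n_pos++] = t;
            else
                neg[(*n_neg)++] = t;
        }
    }
    return n_pos;
}
```

For this constraint the enumeration fills the positive table with the tuples (0,1) and (1,0) and the negative table with (0,0) and (1,1), the same sets as given above.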

As our benchmarks use the XCSP3 format, we briefly describe it. Figure 2 illustrates three kinds of constraints in XCSP3 format. XCSP3 is an XML format for defining CSP instances which can be used with many constraint solvers. It is also used in the XCSP3 competition for constraint solvers. We use XCSP3 problem instances in our experiments. The details of the XCSP3 format are beyond the scope of the paper, so examples in XCSP3 are meant to be illustrative only.

We consider CSPs where the constraints are in: (i) intensional form, and (ii) extensional form (see Figure 2 (a) which defines a negative table with two constraints on $`\langle x0, x1, x2 \rangle`$ and $`\langle x3, x4, x5 \rangle`$ using the same table definition). Some examples of intensional constraints are: eq(%0,dist(%1,%2)) representing $`x = |y - z|`$ for a given substitution of $`x`$, $`y`$, and $`z`$ (see Figure 2 (b)); and the alldifferent constraint (see Figure 2 (c)), which constrains the variables $`\langle x0, x1, x2 \rangle`$ to all take different values. We remark that alldifferent is considered a global constraint but in this paper, what matters is the intensional versus extensional distinction of the constraint.
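The two intensional constraints can be expressed as plain C helpers. This is a minimal sketch under stated assumptions: `dist` is taken to be the absolute difference described above, `all_different` is written as pairwise disequality, and `check_intensional` is a hypothetical checker combining Figures 2 (b) and (c). None of these definitions is taken from the generated benchmark programs.

```c
#include <assert.h>
#include <stdbool.h>

/* dist(a, b) = |a - b|. Assumes small finite domains, so the
   subtraction cannot overflow. */
int dist(int a, int b) {
    return a > b ? a - b : b - a;
}

/* alldifferent as pairwise disequality over n values. */
bool all_different(const int *v, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (v[i] == v[j])
                return false;
    return true;
}

/* Hypothetical checker for the constraints of Figures 2 (b) and (c):
   y[0] = |x[0] - x[1]|, y[1] = |x[1] - x[2]|, and x all different. */
bool check_intensional(const int x[3], const int y[2]) {
    return all_different(x, 3) &&
           y[0] == dist(x[0], x[1]) &&
           y[1] == dist(x[1], x[2]);
}
```

For example, x = (0, 2, 1) with y = (2, 1) passes the checker, while changing y[1] to 0 fails it.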


  1. Intensional versions do not need operators in every case (shown as NOP in the table). For example, both Extensional Versions 1 and 4 correspond to Intensional Version 1. This explains the difference in the number of transformations for extensional and intensional problems.