Interface Building for Software by Modular Three-Valued Abstraction Refinement

Verification of software systems is a very hard problem due to the large size of the program state space. Traditional techniques (like model checking) do not scale, since they include the whole state space by inlining the library function code. Curr…

Authors: Pritam Roy

Verification of software systems is a very hard problem due to the large size of the program state space. Most software programs contain library functions, and such functions are examples of open systems. The verification of such open systems faces two main problems. First, to verify a given program one needs to inline the library function code, which increases the space complexity of the verification algorithms; current formal techniques like model checking cannot handle the large state space generated by the program variables. The alternative is to verify the library functions a priori, so that there is no need to inline them. For this purpose, a small piece of code containing a sequence of library function calls (called a client) is usually written; the client invokes the library functions to close the open system. The second problem is that the library functions cannot be verified completely in the absence of an exhaustive set of client programs, yet most verification approaches simply plug in a client to close the open system.
Current research [9,1,3] avoids these two problems by applying modular verification techniques that build a small call-sequence graph, called an interface, representing the union of all client programs. The interface contains all possible call sequences that lead the library to error (illegal) states, as well as all possible call sequences that avoid the error states. Hence the interface constrains the use of the library functions from outside, and a user can distinguish legal call sequences from illegal ones simply by looking at the interface. Interfaces have two immediate benefits. First, they are light-weight representations of the libraries, and the implementation of the library functions can be replaced by the interface. Second, interfaces can be constructed without the help of any client program. The interface should be safe, i.e., all illegal call sequences (which lead the library to error states) are present in the interface, and permissive, i.e., all legal call sequences are present in the interface.
However, there are some challenges in building succinct interfaces. The interface size can become exponential in the number of variables. Symbolic representations and abstraction techniques partition the state space into a small number of regions, where every region represents one node of the interface graph. Several works apply these abstraction and symbolic techniques to obtain a small but safe and permissive interface. Alur et al. [1] use Angluin's learning algorithm L* to create an interface: the algorithm learns the interface language by asking membership and equivalence queries to a teacher (here, the program). The generated interface is safe and minimal, but not permissive. To handle large case studies predicate abstraction is used; however, the user needs to provide the predicates, and there is no automatic abstraction refinement. The algorithm returns a minimal-size interface if it does not time out, but experimental results show that timeouts occur even on small examples. The CEGAR approach of Henzinger et al. [9] creates a safe and permissive interface, but its size can become large depending on the chosen counterexamples. The direct approach of Beyer et al. [3] also creates a safe and permissive interface; since it does not use abstraction, the interface can become very large.
Unlike the related work, our work can also be applied to unstructured, non-object-oriented (C-style) functions. In an object-oriented framework every class variable is accessible to every class method and thus acts as a global variable for the methods. Instead, we assume that each function may contain several local variables in addition to the global variables.
Hence we have a more general platform for computing interfaces. Each of these functions can also contain several sequential updates of variables and calls to other functions, including recursive calls to itself. However, we compute the interface only over the functions accessible at the user level.
Our algorithm has three stages. In the first stage, every C library function is parsed by CIL (C Intermediate Language) [11] and converted into the input language of TICC [4], whose syntax is similar to a guarded-update language. We have implemented the next two stages in TICC, a symbolic tool based on Multi-valued Decision Diagrams (MDDs) [10]. The second stage computes the transition summary of each function. This modular algorithm handles each function separately, including the local variables within its scope. However, the space complexity of the function summaries becomes a bottleneck for big functions that may contain a large number of guarded updates. Hence we employ three-valued abstraction refinement in addition to symbolic techniques: abstraction keeps the summaries small, whereas successive refinement of the abstract states fine-tunes the abstraction to obtain safety and permissiveness. In the last stage, an interface graph is built from the abstract set of states. The following example shows the different stages of building a symbolic safe and permissive interface.
Example 1 (Motivating Example). Figure 1(a) defines a stack data type stackT and two functions push and pop. The data type stackT has an array of integers el of size MAX and an integer top pointing to the top of the stack. The function pop returns an error when the stack is empty, i.e., top is zero. The function push returns an error if top is equal to MAX; otherwise it copies the input value sd into el[top] and then increments top. Figure 1(b) shows how the C code is converted into guarded-update rules in the next stage.
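As an illustration of the guarded-update view of Example 1, the following Python sketch models push and pop as lists of (guard, update) pairs over a flattened state with variables err, top, el0, el1, el2 (MAX = 3). It is a hypothetical encoding, not the TICC input produced by the translator; in particular, it omits the location variable s and treats each call as a single rule firing.

```python
MAX = 3  # capacity from Example 1; el is flattened into el0, el1, el2 (hypothetical encoding)

def push_rules(sd):
    """Hypothetical (guard, update) rules for push(sd); error states are a sink."""
    def do_push(st):
        nxt = dict(st)
        nxt[f"el{st['top']}"] = sd   # el[top] := sd
        nxt["top"] = st["top"] + 1   # then top := top + 1
        return nxt
    return [
        (lambda st: st["err"] == 0 and st["top"] == MAX, lambda st: {**st, "err": 1}),
        (lambda st: st["err"] == 0 and st["top"] < MAX, do_push),
    ]

def pop_rules():
    """Hypothetical rules for pop: error on an empty stack, otherwise decrement top."""
    return [
        (lambda st: st["err"] == 0 and st["top"] == 0, lambda st: {**st, "err": 1}),
        (lambda st: st["err"] == 0 and st["top"] > 0, lambda st: {**st, "top": st["top"] - 1}),
    ]

def call(st, rules):
    """Fire the first enabled rule; if none is enabled (e.g. err == 1), keep the state."""
    for guard, update in rules:
        if guard(st):
            return update(st)
    return st

INIT = {"err": 0, "top": 0, "el0": 0, "el1": 0, "el2": 0}

st = INIT
for value in (7, 8, 9):
    st = call(st, push_rules(value))
print(st)                                # full stack, err still 0
print(call(st, push_rules(1))["err"])    # pushing onto a full stack sets err
```

Because every update returns a fresh dictionary, INIT is never mutated, so the same initial configuration can be reused for different call sequences.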
The global variable err denotes an error in the library; the library goes to an error state when err is set to 1. Figure 1(c) shows the interface graph obtained from the set of rules. The initial state of the interface graph is state 1, where the stack is empty. A call to pop from the initial state moves the library into the ERROR state. Similarly, calling push from state 3 leads to an error because the stack is full. The interface thus encodes many legal as well as illegal sequences of stack function calls; without it, checking each of them would require a set of client programs.
Finally, we discuss the applications of a safe and permissive interface graph. First, any given client program can be checked immediately against the interface graph to determine whether its function call sequence leads the library to an error state. Second, the interface can provide an offline test suite for a set of functions. Often the source of the library is unavailable; however, one can create a model program from the available documentation of the functions, and the interface graph obtained from the model program can be used to test the implementation under test (IUT).
In this section we provide preliminary definitions and background. The library reaches an error set $E \subseteq S_G$ when the global variable err is set to 1. Moreover, the error set is a sink set of the library. The initial configuration of the library is given by a set $I \subseteq S_G$. Each function $f \in F_G$ also contains a set of local variables $V^f_L$. The scope of any local variable $v \in V^f_L$ is the function $f$. There is a special local variable $s \in V^f_L$ that corresponds to the location in the function relative to its first location. For a function $f$, the set of all variables is $V_f = V^f_L \cup V_G$, and the function state space $S_f$ is defined with respect to the valuations of $V_f$. We note that each global set $s_G \in S_G$ is a non-empty subset $s_G \subseteq S_f$ of the function state space.
The initial local state set $I^f_L \subseteq S_f$ denotes the entry point of the function $f$. The set of all variables of the library Lib is denoted by $V$ and is given by $V = V_G \cup \bigcup_{f \in F_G} V^f_L$. The total state space $S$ is defined with respect to the valuations of all variables $V$.
Each function $f \in F$ contains some number $k$ of guarded-update rules. For the $i$-th rule, its condition part $i.guard \subseteq S_f$ is a set of function states, and its assignment part $i.update \subseteq S_f \times S_f$ is a set of transitions. For a set $X \subseteq S_f$, $i.update(X) \subseteq S_f$ denotes the set of next states of $X$ under the $i$-th update. The conditional transition of rule $i$ is given as $i.trans := \{(s, t) \in S_f \times S_f \mid s \in i.guard,\ t \in i.update(\{s\})\}$. The transition relation $Trans_f \subseteq S_f \times S_f$ is the union of the rules of the function $f$, i.e., $Trans_f := \bigcup_{i=1}^{k} i.trans$. We use $Trans_f(t) \subseteq S_f$ to denote the successor set of a state $t \in S_f$. For a binary relation $\bowtie \in \{=, \le, \ge\}$ and a state space $S$, the set $S|_{v \bowtie a}$ denotes the set of states in which the value of the variable $v$ is related to the value $a$ by $\bowtie$. For a set $X \subseteq S_f$, we define $support(X) \subseteq V_f$ as the set of variables whose value change can change membership in $X$. Formally, $support(X) = \{v \in V_f \mid \exists s, s' \in S_f.\ s =_v s' \wedge s \in X \wedge s' \notin X\}$, where $s =_v s'$ means that $s$ and $s'$ agree on all variables except $v$.
An interface graph is an input-enabled interface automaton. Given a library $Lib = (F_G, V_G, E, I)$ and the global state space $S_G$, we define the interface graph (or call-sequence graph) as $IG = (N, T, T_e, In, Er)$, where:
- the nodes $N \subseteq 2^{2^{S_G}}$ correspond to sets of states;
- the set $In \subseteq N$ denotes the initial nodes, corresponding to $I$;
- the set $Er \subseteq N$ denotes the error nodes, corresponding to $E$;
- the set $T \subseteq N \times F_G \times (N \setminus Er)$ denotes the good transitions;
- the set $T_e \subseteq N \times F_G \times Er$ denotes the erroneous transitions.
For a library $L = (F_G, V_G)$, a function $f \in F_G$ and a function state space $S_f$, an abstraction $R \subseteq 2^{S_f} \setminus \{\emptyset\}$ is defined such that each abstract state (or region) $r \in R$ is a non-empty subset $r \subseteq S_f$ of concrete states. We require $\bigcup R = S_f$.
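The definitions of $Trans_f$, $S|_{v \bowtie a}$ and $support(X)$ can be checked on a small explicit-state model. The sketch below assumes a hypothetical toy function over two 1-bit variables a and b with two rules (if a == 0 then b is set to 0, otherwise b is set to 1); states are tuples (a, b), and updates are deterministic functions rather than general relations.

```python
from itertools import product

VARS = ("a", "b")                                  # hypothetical 1-bit variables
S_f = set(product((0, 1), repeat=len(VARS)))       # states are tuples (a, b)

# Each rule is (i.guard, i.update): a set of states and a successor function.
RULES = [
    ({s for s in S_f if s[0] == 0}, lambda s: (s[0], 0)),   # a == 0 -> b' = 0
    ({s for s in S_f if s[0] != 0}, lambda s: (s[0], 1)),   # a != 0 -> b' = 1
]

def rule_trans(guard, update):
    """i.trans = {(s, t) | s in i.guard, t in i.update({s})}."""
    return {(s, update(s)) for s in guard}

# Trans_f is the union of the per-rule transition sets.
TRANS_F = set().union(*(rule_trans(g, u) for g, u in RULES))

def successors(t):
    """Trans_f(t): the successor set of state t."""
    return {b for (a, b) in TRANS_F if a == t}

def restrict(states, var, rel, value):
    """S|_{v rel a}: states whose value of var is related to value by rel."""
    i = VARS.index(var)
    return {s for s in states if rel(s[i], value)}

def support(X):
    """Variables v such that some s in X and some s' not in X differ only in v."""
    sup = set()
    for s in X:
        for i, v in enumerate(VARS):
            if any(s[:i] + (val,) + s[i + 1:] not in X for val in (0, 1)):
                sup.add(v)
    return sup

A0 = restrict(S_f, "a", lambda x, y: x == y, 0)    # S_f|_{a = 0}
print(successors((1, 0)), support(A0), support(S_f))
```

Here $support(S_f|_{a=0}) = \{a\}$: only a change of a can move a state in or out of that set, while the whole space has empty support.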
For subsets $T \subseteq S_f$ and $U \subseteq R$, we write: $U{\downarrow} = \bigcup_{u \in U} u$, $\quad T{\uparrow}^m_R = \{r \in R \mid r \cap T \neq \emptyset\}$, $\quad T{\uparrow}^M_R = \{r \in R \mid r \subseteq T\}$. Thus, for a set $U \subseteq R$ of abstract states, $U{\downarrow}$ is the corresponding set of concrete states. For a set $T \subseteq S_f$ of concrete states, $T{\uparrow}^m_R$ and $T{\uparrow}^M_R$ are the sets of abstract states that constitute over- and under-approximations of $T$, respectively. We say that the abstraction $R$ of a state space $S_f$ is precise for a set $T \subseteq S_f$ if $T{\uparrow}^m_R = T{\uparrow}^M_R$.
We express our algorithms for solving reachability on the function state space in µ-calculus notation [8]. Consider a function $\gamma : 2^{S_f} \to 2^{S_f}$, monotone when $2^{S_f}$ is considered as a lattice with the usual subset ordering. We denote by $\mu Z.\gamma(Z)$ (resp. $\nu Z.\gamma(Z)$) the least (resp. greatest) fix-point of $\gamma$, that is, the least (resp. greatest) set $Z \subseteq S_f$ such that $Z = \gamma(Z)$. As is well known, since $S_f$ is finite, these fix-points can be computed via Picard iteration: $\mu Z.\gamma(Z) = \lim_{n \to \infty} \gamma^n(\emptyset)$ and $\nu Z.\gamma(Z) = \lim_{n \to \infty} \gamma^n(S_f)$.
For a library function $f$ and a function state space $S_f$, we define the one-step predecessor operator $Pre_{f,1} : 2^{S_f} \to 2^{S_f}$ as follows, for all $Y \subseteq S_f$: $Pre_{f,1}(Y) = \{s \in S_f \mid Trans_f(s) \cap Y \neq \emptyset\}$. We define the multi-step predecessor operator $Pre_{f,*} : 2^{S_f} \to 2^{S_f}$ as follows, for all $Y \subseteq S_f$: $Pre_{f,*}(Y) = \mu Z.(Y \cup Pre_{f,1}(Z))$. Intuitively, the set $Pre_{f,*}(Y)$ consists of the states of $S_f$ from which one can reach $Y$ by applying zero or more transitions within the function $f$, i.e., by applying its rules one after another.
For the abstract state space $R$, we introduce abstract versions of $Pre_{f,1}$. As multiple concrete states may correspond to the same abstract state, we cannot compute, on the abstract state space, a precise analogue of $Pre_{f,1}$. We define two abstract operators: the may operator $Pre^{f,R}_m : 2^R \to 2^R$, which constitutes an over-approximation of $Pre_{f,1}$, and the must operator $Pre^{f,R}_M : 2^R \to 2^R$, which constitutes an under-approximation of $Pre_{f,1}$ [6]. We let, for $U \subseteq R$: $Pre^{f,R}_m(U) = (Pre_{f,1}(U{\downarrow})){\uparrow}^m_R$ and $Pre^{f,R}_M(U) = (Pre_{f,1}(U{\downarrow})){\uparrow}^M_R$. The fact that $Pre^{f,R}_m$ and $Pre^{f,R}_M$ are over- and under-approximations of the predecessor operator is made precise by the following observation: for all $U \subseteq R$ we have $Pre^{f,R}_M(U){\downarrow} \subseteq Pre_{f,1}(U{\downarrow}) \subseteq Pre^{f,R}_m(U){\downarrow}$.
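To make the abstraction and predecessor operators concrete, here is an explicit-set sketch, assuming states are small integers and regions are frozensets. It uses the standard one-step forms $(Pre_{f,1}(U{\downarrow})){\uparrow}^m_R$ and $(Pre_{f,1}(U{\downarrow})){\uparrow}^M_R$ for the may and must operators, computes $Pre_{f,*}$ by Picard iteration from the empty set, and checks the over/under-approximation sandwich on a hypothetical four-state transition relation. A symbolic tool such as TICC would represent these sets as MDDs instead of Python sets.

```python
def down(U):
    """U↓: the concrete states covered by a set of regions."""
    return set().union(*U)

def up_may(T, R):
    """T↑m_R: regions that intersect T (over-approximation of T)."""
    return {r for r in R if r & T}

def up_must(T, R):
    """T↑M_R: regions contained in T (under-approximation of T)."""
    return {r for r in R if r <= T}

def pre1(trans, Y):
    """Pre_{f,1}(Y): states with at least one successor in Y."""
    return {s for (s, t) in trans if t in Y}

def lfp(gamma):
    """µZ.γ(Z) by Picard iteration from the empty set; γ must be monotone."""
    Z = set()
    while True:
        nxt = gamma(Z)
        if nxt == Z:
            return Z
        Z = nxt

def pre_star(trans, Y):
    """Pre_{f,*}(Y) = µZ.(Y ∪ Pre_{f,1}(Z))."""
    return lfp(lambda Z: set(Y) | pre1(trans, Z))

def pre_may(trans, U, R):
    return up_may(pre1(trans, down(U)), R)

def pre_must(trans, U, R):
    return up_must(pre1(trans, down(U)), R)

# Hypothetical four-state function and a two-region abstraction.
TRANS = {(0, 1), (1, 2), (2, 2), (3, 3)}
R = {frozenset({0, 1}), frozenset({2, 3})}
U = {frozenset({2, 3})}

print(pre_star(TRANS, {2}))
print(down(pre_must(TRANS, U, R)) <= pre1(TRANS, down(U)) <= down(pre_may(TRANS, U, R)))
```

In this toy, the region {0, 1} is only partially inside the concrete predecessor set {1, 2, 3}, so it appears in the may result but not in the must result; refining that region is exactly what an abstraction-refinement loop would do next.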
For an integer $k \ge 1$ and function state space $S_f$, we recursively define the $k$-step post operator $Post_{f,k} : 2^{S_f} \to 2^{S_f}$ as follows, for all $X \subseteq S_f$: $Post_{f,1}(X) = \{t \in S_f \mid \exists s \in X.\ (s, t) \in Trans_f\}$ and $Post_{f,k+1}(X) = Post_{f,1}(Post_{f,k}(X))$. For an abstract state space $R \subseteq 2^{S_f}$, we define the abstract post operator $Post^{f,R}_m : 2^R \to 2^R$ as follows, for all $X \subseteq R$: $Post^{f,R}_m(X) = (Post_{f,k}(I^f_L \cap X{\downarrow})){\uparrow}^m_R$, where $k$ is the smallest integer satisfying $Post_{f,k+1}(I^f_L \cap X{\downarrow}) = \emptyset$. Intuitively, the condition implies that no new states are added in the $(k+1)$-th iteration; hence the values at the point where $f$ returns can be obtained by applying $Post_{f,k}$ to the subset of $X{\downarrow}$ corresponding to the function's initial state set $I^f_L$.
In this section we discuss our procedure for converting C functions into the "sociable interface automata" [5] format. This format consists of guarded-update rules and is the input format of our symbolic tool TICC. In our work the front-end and back-end are separate; hence one only needs a different front-end to parse functions from another language (like Java or C++) into TICC input models. The next stages of the algorithm can then reuse our tool TICC to build interface graphs. The C functions are fed into the CIL [11] tool, which parses C source code and returns the control flow graph. The control flow graph has blocks as nodes and conditions as transitions. We convert the control flow graph of each function into a set of guarded-update rules: conditions become guards and assignments become updates. The special local variable s records the location of the current block. For a variable v, the primed variable v' denotes v in the next sequential step. When the translator encounters a critical error condition (e.g., a call to exit(1)) in the control flow graph, the global variable err is set to 1 in the translated library.
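The translation of a control flow graph into guarded-update rules can be sketched as follows. The CFG encoding (a dictionary of "branch", "assign", "exit" and "return" nodes keyed by location) and the helper names are hypothetical, not CIL's API or TICC's syntax. Each rule is guarded by the location s, a call to exit(1) sets err to 1 and jumps to a dedicated exit location, and every guard also requires err == 0 so that error states behave as a sink.

```python
EXIT = -1   # hypothetical location reached after exit(1)

def _rule(loc, cond, assigns, target):
    """Rule guarded by s == loc (and err == 0, so error states are a sink)."""
    def guard(st):
        return st["s"] == loc and st["err"] == 0 and cond(st)
    def update(st):
        nxt = dict(st)
        for var, expr in assigns.items():
            nxt[var] = expr(st)          # right-hand sides read the pre-state (v')
        nxt["s"] = target
        return nxt
    return guard, update

def translate(cfg):
    """Turn a hypothetical CFG encoding into a list of (guard, update) rules."""
    rules = []
    for loc, node in cfg.items():
        kind = node[0]
        if kind == "branch":             # ("branch", cond, then_loc, else_loc)
            _, cond, then_loc, else_loc = node
            rules.append(_rule(loc, cond, {}, then_loc))
            rules.append(_rule(loc, lambda st, c=cond: not c(st), {}, else_loc))
        elif kind == "assign":           # ("assign", var, expr, next_loc)
            _, var, expr, next_loc = node
            rules.append(_rule(loc, lambda st: True, {var: expr}, next_loc))
        elif kind == "exit":             # ("exit", code): critical error sets err
            rules.append(_rule(loc, lambda st: True, {"err": lambda st: 1}, EXIT))
        # ("return",) produces no rule: the function has terminated.
    return rules

def run(rules, st):
    """Fire enabled rules until none is enabled (CFG rules are mutually exclusive)."""
    while True:
        enabled = [update for guard, update in rules if guard(st)]
        if not enabled:
            return st
        st = enabled[0](st)

# if (a == 0) {b = 0;} else {b = 1;}
IF_ELSE = {
    0: ("branch", lambda st: st["a"] == 0, 1, 2),
    1: ("assign", "b", lambda st: 0, 3),
    2: ("assign", "b", lambda st: 1, 3),
    3: ("return",),
}
# if (a == 0) exit(1);
EXIT_ON_ZERO = {
    0: ("branch", lambda st: st["a"] == 0, 1, 2),
    1: ("exit", 1),
    2: ("return",),
}

print(run(translate(IF_ELSE), {"s": 0, "err": 0, "a": 1, "b": 0}))
```

The default argument `c=cond` in the negated branch guard binds the condition at translation time; without it, every else-rule would close over the last condition seen in the loop.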
- Control Flow Structures: C source such as "if (a == 0) {b = 0;} else {b = 1;}" is converted into the following rules: Switch and loop structures (e.g., while, for) are handled similarly.
- Variables and Data Structures: Currently the algorithm supports unsigned integers with a small number of bits (e.g., 4). Fixed-size arrays and structures are flattened during translation; in the Integer Stack example, Figure 1(b) shows how an array of size 3 is translated into 3 integer variables, and the structure elements are flattened as well. Currently our translation does not directly handle pointers and recursive data types. However, pointers can be manually translated into integers if the control flow of the function does not depend on the value at the pointed-to location.
- Function Calls: Currently, to compute the abstract transition for a function f, we inline all intermediate function calls inside the body of f. In the guarded-update rule semantics, the rules of the called functions are explicitly added to the rules of f, and an explicit stack data structure is added to store the return address and the context variables. This technique applies both to one function calling another and to non-tail-recursive calls. Tail-recursive calls can be converted into loops and do not need the stack. In the Appendix, we show a complete translation of a recursive C function.
In this section we assume that the C functions have already been parsed by CIL and translated into a software library module $Lib = (F_G, V_G, E, I)$. We describe the basic algorithms for abstraction refinement and for building the interface from a given library Lib, and we also describe some implementation-specific optimizations.
Modular Verification: Each function is considered separately in AbsRef (Algorithm 2).
Since the interface graph is an input-enabled interface automaton, every abstract state can be checked separately for reachability of the error set within one function call. The algorithm starts with the initial abstraction R, and the set of significant variables $V_{abs}$ is obtained from the support sets of the abstract states. The local abstraction $R_f$ and the global abstraction $R_G$ are initialized to R. The must abstraction of the transition relation is computed with respect to $R_f$, and we compute the must predecessor $S_M$ of the error set E. The set $S_M$ determines the states of the function that eventually reach the error set E. The set $S^f_M$ is the subset of $S_M$ corresponding to the initial states of the function. The one-step concrete pre-image $S_1$ of $S_M{\downarrow}$ checks whether any new states can be added to $S_M{\downarrow}$: if $S_1 \setminus S_M{\downarrow}$ is non-empty, the local abstraction $R_f$ is refined and the loop continues; otherwise, the global abstraction $R_G$ is refined with respect to $S^f_M$. The local and global refinements are described below. The algorithm terminates when each abstract state either can reach E or cannot reach E in one function call.
Algorithm 2 AbsRef(R, f, E) (surviving steps): local refinement: abstraction $R_f$ is refined for all valuations of v; global refinement: $R_G := (R_G \setminus \{r\}) \cup \{r_1, r_2\}$, where $r_1 := r \cap S^f_M$ and $r_2 := r \setminus S^f_M$.
Automatic Refinement: To refine the local abstraction $R_f$, the algorithm finds a variable $v \in V_f$ that is not in $V_{abs}$ but is in the support set of $S_1 \setminus S_M{\downarrow}$. The variable is added to the significant set $V_{abs}$, and a new abstraction $R_f$ is obtained with respect to the different valuations of v. The global abstraction $R_G$ is refined after the local abstraction reaches a fix-point, i.e., when no new states can be added to $S_M$. Each abstract state $r \in R_G$ that has a non-empty intersection with both $S^f_M$ and $\neg S^f_M$ is split into two states $r_1$ and $r_2$.
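A compact explicit-state sketch of AbsRef is given below under simplifying assumptions: the local abstraction is the partition by the valuations of the significant variables $V_{abs}$, the must predecessor is computed as $\mu Z.(E{\uparrow}^M \cup (Pre_{f,1}(Z{\downarrow})){\uparrow}^M)$, and the model is a hypothetical capacity-2 variant of the stack's pop and push (globals err and top, local location s) in which push fails when top is 2. On this toy it reproduces the behaviour described in Example 2: pop leaves the global abstraction unchanged, while push splits the region err = 0, top > 0.

```python
from itertools import product

# Hypothetical toy model (not the paper's TICC model): globals err and top
# (top in 0..2, a capacity-2 stack) plus the local location variable s.
VARS = ("err", "top", "s")
GLOBALS = ("err", "top")
DOMAIN = {"err": (0, 1), "top": (0, 1, 2), "s": (0, 1)}
ALL = set(product(*(DOMAIN[v] for v in VARS)))   # concrete states (err, top, s)
E = {st for st in ALL if st[0] == 1}             # error set: err = 1
INIT_LOCAL = {st for st in ALL if st[2] == 0}    # I^f_L: s = 0 at function entry

def guard_upd(cond, upd):
    """Rule enabled only at entry (s = 0) of a non-error state; it moves s to 1."""
    return (lambda x: x["s"] == 0 and x["err"] == 0 and cond(x),
            lambda x: {**x, **upd(x), "s": 1})

POP = [guard_upd(lambda x: x["top"] == 0, lambda x: {"err": 1}),
       guard_upd(lambda x: x["top"] > 0, lambda x: {"top": x["top"] - 1})]
PUSH = [guard_upd(lambda x: x["top"] == 2, lambda x: {"err": 1}),
        guard_upd(lambda x: x["top"] < 2, lambda x: {"top": x["top"] + 1})]

def trans(rules):
    """Trans_f as a set of (state, successor) pairs."""
    T = set()
    for st in ALL:
        x = dict(zip(VARS, st))
        for guard, update in rules:
            if guard(x):
                nxt = update(x)
                T.add((st, tuple(nxt[v] for v in VARS)))
    return T

def pre1(T, Y):
    return {a for (a, b) in T if b in Y}

def support(X):
    sup = set()
    for st in X:
        for i, v in enumerate(VARS):
            if any(st[:i] + (val,) + st[i + 1:] not in X for val in DOMAIN[v]):
                sup.add(v)
    return sup

def regions(vabs):
    """Local abstraction R_f: one region per valuation of the significant variables."""
    idx = [i for i, v in enumerate(VARS) if v in vabs]
    groups = {}
    for st in ALL:
        groups.setdefault(tuple(st[i] for i in idx), set()).add(st)
    return [frozenset(g) for g in groups.values()]

def must_pre_star(T, R, target):
    """Abstract must predecessor: µZ. target↑M ∪ (Pre_1(Z↓))↑M."""
    Z = {r for r in R if r <= target}
    while True:
        P = pre1(T, set().union(*Z))
        nxt = Z | {r for r in R if r <= P}
        if nxt == Z:
            return Z
        Z = nxt

def abs_ref(rules, global_regions):
    T = trans(rules)
    vabs = set(GLOBALS)                       # support of the initial global regions
    while True:
        S_M = set().union(*must_pre_star(T, regions(vabs), E))
        new = pre1(T, S_M) - S_M              # S_1 \ S_M↓
        if not new:
            break                             # local fix-point: no new states
        extra = [v for v in VARS if v in support(new) and v not in vabs]
        if not extra:
            break                             # defensive guard
        vabs.add(extra[0])                    # refine R_f by all valuations of v
    gidx = [VARS.index(v) for v in GLOBALS]
    G_M = {tuple(st[i] for i in gidx) for st in S_M & INIT_LOCAL}   # S^f_M, projected
    refined = []
    for r in global_regions:                  # split r into r ∩ S^f_M and r \ S^f_M
        refined += [p for p in (r & G_M, r - G_M) if p]
    return vabs, refined

# Initial global abstraction over (err, top): r0 error, r1 empty, r2 non-empty.
R_G = [frozenset({(1, 0), (1, 1), (1, 2)}), frozenset({(0, 0)}), frozenset({(0, 1), (0, 2)})]

for name, rules in (("pop", POP), ("push", PUSH)):
    print(name, abs_ref(rules, R_G)[1])
```

The defensive branch should never fire: if $S_1 \setminus S_M{\downarrow}$ were a union of $V_{abs}$-classes, those classes would already lie inside the must fix-point, so some variable in its support must be missing from $V_{abs}$.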
Algorithm 3 BuildInterface
Input: Abstraction R, a set of functions F, a library Lib = (F_G, V_G, E, I)
Output: Interface graph IG = (N, T, T_e, In, Er)
1. Q, N, T, T_e, In, Er := ∅
2. append(Q, I); append(N, I ∪ E); append(In, I); append(Er, E)
3. while Q is non-empty do
4.   curr := removeFirst(Q)
5.   for each f ∈ F do
6.     next := $Post^{f,R}_m$(curr)
7.     if (not member(N, next)) then append(Q, next); append(N, next) endif
8.     if (next ⊆ E) then T_e := T_e ∪ {(curr, f, Er)} else T := T ∪ {(curr, f, next)} endif
9.   end for
10. end while
Building Interface: Algorithm 3 computes the interface graph from the abstraction R. The algorithm maintains a list Q. The procedure append(Q, X) adds each element x ∈ X at the end of Q, the procedure member(Q, x) checks whether x is a member of Q, and the procedure removeFirst(Q) removes the first element of Q and returns it. The algorithm computes the next symbolic state of each element of Q by applying the $Post^{f,R}_m$ operator. If the next state of curr is part of the error set E, an error edge is added from curr to the error node Er; otherwise, a new good edge (curr, f, next) is added, and next is appended to Q if it has not been seen before. The algorithm terminates when Q is empty.
Example 2. To illustrate the algorithms defined above, let us revisit the Integer Stack example (Figure 1). We assume that the guarded-update rules (Figure 1(b)) are converted into a library model with the set of functions {pop, push}, and we denote the state space by S. Figure 2 illustrates the run of the Explore algorithm (Algorithm 1). The initial abstract states $r_0$, $r_1$ and $r_2$ partition the state space S into three regions (Figure 2(a)): $r_0 = S|_{err=1}$ corresponds to the error states, $r_1 = S|_{err=0, top=0}$ corresponds to the initial non-error states, and $r_2 = S|_{err=0, top>0}$ corresponds to the non-initial non-error states. When AbsRef (Algorithm 2) is invoked for the pop function, the significant variables are $V_{abs} := \{err, top\}$.
In the first iteration, the must predecessor $S_M$ of the error state $r_0$ fails to add any new states. However, the one-step concrete predecessor of $S_M{\downarrow}$ returns a set $S_1$ corresponding to $S|_{pop.s=0, top=0, err=0}$, where pop.s is the local variable s of the function pop. The support set of $S_1 \setminus S_M{\downarrow}$ contains a new variable pop.s, which is in $V_f$ but not in $V_{abs}$. The local refinement of $R_f$ adds the different valuations of the local variable pop.s (Figure 2(b)); the second digit of each abstract state denotes the value of pop.s in that state. In the next iteration the must predecessor $S_M$ becomes $\{r_{10}, r_{00}, r_{01}\}$, and no new concrete states can be added by the one-step predecessor of $S_M{\downarrow}$. Hence the local abstraction $R_f$ is not refined further. The local refinement of Figure 2(b) is not returned, as the locally added variable pop.s is not visible outside the scope of the function pop. The global set that leads to the error set is given by $S^f_M$, the subset of $S_M$ corresponding to the local initial states $I^f_L$ of the pop function, i.e., $S|_{pop.s=0}$. Hence the final global abstraction $R_G$ for the pop function is obtained by refining the initial global abstraction R of the function with respect to $S^f_M$ and its complement. Here the algorithm returns with an unchanged global abstraction.
Similarly, for the push function the local variable push.s is included in the local abstraction. Although no new global variable is added in this refinement, the global abstract state $r_2$ is refined with respect to the set of states (where top is 2 and err is 0) that reach the error states in one push call. The final global abstraction is shown in Figure 2(c). The BuildInterface algorithm (Algorithm 3) starts with the initial state $r_1$ and adds edges to the graph (Figure 1(c)) until every node has been explored with respect to all functions.
The interface generated by the Explore algorithm is safe and permissive by construction.
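The BuildInterface loop (Algorithm 3) can be sketched on explicit sets as follows. The sketch assumes a hypothetical final global abstraction for a capacity-2 stack over (err, top) and replaces $Post^{f,R}_m$ by whole-function summaries pop and push on global states followed by the may-lifting to regions. Unlike the listing, it does not enqueue error successors, since E is a sink.

```python
from collections import deque

# Hypothetical final global abstraction for a capacity-2 stack over (err, top).
R0 = frozenset({(1, 0), (1, 1), (1, 2)})   # error states
R1 = frozenset({(0, 0)})                   # empty stack (initial)
R2A = frozenset({(0, 1)})                  # one element
R2B = frozenset({(0, 2)})                  # full: one push away from error
R = [R0, R1, R2A, R2B]
ERROR = R0
INIT = {(0, 0)}

def pop(g):
    """Hypothetical summary of pop on a global state (err, top); errors are a sink."""
    err, top = g
    if err:
        return g
    return (1, top) if top == 0 else (0, top - 1)

def push(g):
    """Hypothetical summary of push; pushing onto a full stack (top == 2) sets err."""
    err, top = g
    if err:
        return g
    return (1, top) if top == 2 else (0, top + 1)

FUNCS = {"pop": pop, "push": push}

def post_may(f, node):
    """May-abstraction of the function summary applied to node↓."""
    concrete = {FUNCS[f](g) for r in node for g in r}
    return frozenset(r for r in R if r & concrete)

def build_interface():
    start = frozenset(r for r in R if r & INIT)
    err_node = frozenset(r for r in R if r <= ERROR)
    queue, nodes = deque([start]), {start, err_node}
    T, Te = set(), set()
    while queue:
        curr = queue.popleft()
        for f in FUNCS:
            nxt = post_may(f, curr)
            if all(r <= ERROR for r in nxt):
                Te.add((curr, f, err_node))      # error edge
            else:
                T.add((curr, f, nxt))            # good edge
                if nxt not in nodes:
                    nodes.add(nxt)
                    queue.append(nxt)
    return start, nodes, T, Te

start, nodes, T, Te = build_interface()
print(len(nodes), len(T), len(Te))
```

On this toy the result has three non-error nodes (empty, one element, full) plus the error node, with error edges for pop on the empty stack and push on the full stack, matching the shape described for Figure 1(c).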
Safety is ensured by the AbsRef algorithm, and permissiveness by the BuildInterface algorithm. The final abstraction R obtained after calling AbsRef for each function $f \in F$ distinguishes error-reaching regions from non-reaching ones. In the BuildInterface algorithm, each function f is applied in each state of the graph obtained from the abstraction R, and hence all behaviors are captured in the interface graph.
Theorem 1. Explore (Algorithm 1) returns a safe and permissive interface.
Approximate Abstract Function Summary and Predecessors: For practical purposes, we do not compute the abstract predecessor operators on the monolithic transition relations. As in [7], Equation 4 holds for the approximate operators. The transition relation of a function $f \in F_G$ is represented as a number k of guarded-update rules. For an abstraction $R \subseteq 2^{S_f}$, the must and may abstractions of rule $i \in \{1, \ldots, k\}$ can be given as follows: For all $j \in \{m+, M-\}$ and $X \subseteq R$, the approximate transition relation, one-step predecessor operator and multi-step predecessor operator can be given respectively as: For a disjunctive transition relation, the approximate may predecessor operator is precise; however, the approximate must predecessor is an under-approximation of the precise one.
Theorem 2. For each $f \in F$, $R \subseteq 2^{S_f}$, and $X \subseteq R$, we have
Incremental Building of Interface: Algorithm 1 can be used for incremental addition of functions, as we may not need to create the interface for all functions at first. The algorithm returns the refined interface for the included functions only, and the created interface can be reused when more functions from the library are added.
Rule Partition for Functions: Another optimization is to partition the rule set of each function with respect to the abstraction in order to reduce splitting. Computing the must abstraction of each individual rule can yield a large under-approximation and may therefore require more splitting.
Example 3.
In the presence of If-Then-Else or Switch constructs in the source code, we may encounter the following rules after translation. The abstraction R is defined with respect to the different valuations of the variable indata. If we consider each rule separately and apply the must abstraction, we miss the fact that the final value of indata will be 0 regardless of the initial value of hd: the must predecessor of $S|_{indata=0}$ is $\emptyset$ for both rules, since the must abstractions of the guards are empty. However, if we combine the two rules by taking the union of their sets, the must predecessor of $S|_{indata=0}$ is S for the combined rule, and no further splitting is needed. The heuristic for rule-set partitioning is obtained from the abstraction itself: if a function f has k rules, then the i-th and j-th rules can be grouped together for an abstraction R if $i.guard{\uparrow}^m_R = j.guard{\uparrow}^m_R$.
In this section we present results of some case studies and compare them with the related work. There is a data stream with a header of length $2^h$ and data of length $2^d$, where $h \le d$. The program uses d bits to represent the pointer and 1 bit for the error. The boolean variable isHeader is 1 in the header and 0 otherwise. There are four functions in the program: FirstHeader and FirstData set the pointer to the first header and first data location, respectively; Next moves the pointer within the header or data in a cyclic way; and Write results in an error when the pointer points into the header section. Our algorithm produces the interface shown in Figure 3(a), where state 1 represents that the pointer is in the data part and state 2 represents that the pointer is in the header part. The learning algorithm provides the minimal graph, but it is the slowest of the three approaches. Our algorithm yields the same number of non-error regions as the learning algorithm; however, we cannot compare running times because the tools run on different platforms.
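The rule-partition heuristic of Example 3 can be reproduced on a small model. Since the translated rules are not shown in the text, the sketch assumes two hypothetical rules, hd == 0 → indata' = 0 and hd ≠ 0 → indata' = 0, with 1-bit variables hd and indata and an abstraction by the value of indata. Per-rule must predecessors of $S|_{indata=0}$ come out empty, whereas grouping rules with equal $guard{\uparrow}^m_R$ yields every region.

```python
from itertools import product

STATES = set(product((0, 1), repeat=2))   # (hd, indata), hypothetical 1-bit variables
R = [frozenset(s for s in STATES if s[1] == v) for v in (0, 1)]  # abstraction by indata

# Hypothetical translated If-Then-Else: both branches set indata to 0.
RULES = [
    (lambda s: s[0] == 0, lambda s: (s[0], 0)),
    (lambda s: s[0] != 0, lambda s: (s[0], 0)),
]

def pre(rules, X):
    """One-step predecessors of X under the union of the given rules."""
    return {s for s in STATES for g, u in rules if g(s) and u(s) in X}

def must(rules, X):
    """Must predecessor: regions entirely inside the concrete predecessor set."""
    P = pre(rules, X)
    return {r for r in R if r <= P}

def up_may(T):
    return frozenset(r for r in R if r & T)

def partition(rules):
    """Group rules whose guards have the same may-abstraction guard↑m_R."""
    groups = {}
    for g, u in rules:
        key = up_may({s for s in STATES if g(s)})
        groups.setdefault(key, []).append((g, u))
    return list(groups.values())

X = {s for s in STATES if s[1] == 0}     # S|_{indata = 0}
per_rule = set().union(*(must([rule], X) for rule in RULES))
grouped = set().union(*(must(group, X) for group in partition(RULES)))
print(len(per_rule), len(grouped))
```

Both guards intersect both indata regions, so they share the same may-abstraction and fall into one group; the grouped must predecessor then covers the whole space without splitting on hd.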
In this section, we show how a safe and permissive interface can be useful in the verification and testing of software programs, and briefly describe the modifications needed to make the interface compatible with these settings. Suppose we have computed an interface graph for a set of functions. Given a client program that uses those functions, one can immediately check the client against the interface graph: the idea is to simulate the actions of the client program on the interface graph and check whether the library error state (state "ERROR") is reached. For example, a client consisting of the single call modify(b) on a BitArrayManipulator b can be simulated on the interface graph (Figure 3(b)), and we see that the ERROR state is reached from the initial state (state 1). There can be infinitely many clients of those functions, and each of them can be model-checked once the interface is computed.
In the model-based testing paradigm, an implementation under test (IUT) is checked against a given model program (a specification of the IUT). Our algorithm can build an interface graph from the definitions of the functions given in the model program, and we can create a C regression test suite from the interface generated from the libraries. However, the function calls must be extended with argument values to create a test bench for the IUT. For example, Figure 1(a) can be generated from the model program in Figure 1(c). Given a linked-list implementation of a finite-size integer stack, we can create an offline test suite from the interface graph. Testing the implementation against the test suite checks whether the interface goes to the error state if and only if the implementation does. If there is a discrepancy between the behavior of the interface graph and that of the code, the implementation needs further checking.
In this section we conclude with a summary of the work and possible future directions. We have provided a new algorithm for interface synthesis with a local-global abstraction refinement framework. This framework can dramatically reduce the state space of interface generation by hiding local variables inside each function. The abstract summarization of the functions provides scalability, and the modular analysis handles each function separately. In our generalized setting, any C-style set of functions can be handled. The results show that our algorithm provides a safe, permissive and sufficiently minimal (i.e., comparable to the learning algorithms) interface from a set of functions. We have provided approximate abstract predecessor operators to handle the state space inside the functions. The interface synthesis is incremental: one can add new functions to the interface, which may lead to refinements corresponding to those functions. The interface can be used to verify clients immediately and as an offline test suite for a new, untested implementation. However, the translation engine is very basic and some parts are done manually. In the future we would like to cover more aspects of C source code (e.g., pointers and recursive data types) so that we can handle bigger case studies. We would also like to explore how shape analysis algorithms can be used to translate complex data types, and to include CIL inside the tool TICC so that it can parse C functions and represent the rules directly in MDD format. Finally, we would like to implement the back-end using a combination of MDDs and SMT solvers so that state-space problems can be handled better.
