Tarmo: A Framework for Parallelized Bounded Model Checking
Authors: Siert Wieringa (Helsinki University of Technology TKK), Matti Niemenmaa (Helsinki University of Technology TKK), Keijo Heljanko (Helsinki University of Technology TKK)
L. Brim and J. van de Pol (Eds.): 8th International Workshop on Parallel and Distributed Methods in verifiCation 2009 (PDMC'09), EPTCS 14, 2009, pp. 62–76, doi:10.4204/EPTCS.14.5

Siert Wieringa (Siert.Wieringa@tkk.fi), Matti Niemenmaa (Matti.Niemenmaa@tkk.fi), Keijo Heljanko (Keijo.Heljanko@tkk.fi)
Helsinki University of Technology TKK, Department of Information and Computer Science, P.O. Box 5400, FI-02015 TKK, FINLAND

Abstract: This paper investigates approaches to parallelizing Bounded Model Checking (BMC) for shared memory environments as well as for clusters of workstations. We present a generic framework for parallelized BMC named Tarmo. Our framework can be used with any incremental SAT encoding for BMC, but for the results in this paper we use only the current state-of-the-art encoding for full PLTL [4]. Using this encoding allows us to check both safety and liveness properties, contrary to an earlier work on distributing BMC that is limited to safety properties only.

Despite our focus on BMC after it has been translated to SAT, existing distributed SAT solvers are not well suited for our application. This is because solving a BMC problem is not solving a set of independent SAT instances but rather involves solving multiple related SAT instances, encoded incrementally, where the satisfiability of each instance corresponds to the existence of a counterexample of a specific length. Our framework includes a generic architecture for a shared clause database that allows easy clause sharing between SAT solver threads solving various such instances.

We present extensive experimental results obtained with multiple variants of our Tarmo implementation.
Our shared memory variants have a significantly better performance than conventional single-threaded approaches, a result that many users can benefit from as multi-core and multi-processor technology is widely available. Furthermore, we demonstrate that our framework can be deployed in a typical cluster of workstations, where several multi-core machines are connected by a network.

1 Introduction

Bounded Model Checking (BMC) is a symbolic model checking technique [3, 4] which attempts to leverage the existence of efficient solvers for the propositional satisfiability problem (SAT), so-called SAT solvers (e.g. [15, 8]). SAT is the problem of finding a truth assignment to the Boolean variables of a propositional logic formula in such a way that the formula evaluates to true, or determining that no such assignment exists. This classifies the formula as respectively satisfiable or unsatisfiable.

The main idea behind BMC is to encode a system model M, a property φ and an integer k called the bound into a propositional logic formula in such a way that it is satisfiable iff there exists an execution of length k of system M which violates the property φ. Such an execution is called a counterexample. A conventional scheme for BMC is to have a SAT solver test the existence of a counterexample of length k, and if its existence is disproven (i.e. the solver returns "unsatisfiable") k is increased, after which the test is repeated. A typical instance of this process is to start with k = 0 and on every iteration increment k by one. The process ends whenever a counterexample is found or the available time or memory resources run out. We will call this approach CONV, for conventional. Notice that BMC in this basic form, to which we limit ourselves in this paper, is an incomplete method, as it cannot prove a property φ correct for all possible executions of system M.
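The CONV loop can be sketched in a few lines. This is a minimal illustration only: `encode` and `sat` stand in for a hypothetical BMC-to-CNF encoder and a SAT oracle, neither of which is part of the paper's actual tooling.

```python
# Sketch of the conventional BMC scheme CONV, assuming hypothetical helpers:
# encode(k) builds the formula for bound k, sat(formula) decides satisfiability.
def conv(encode, sat, max_bound):
    for k in range(max_bound + 1):
        if sat(encode(k)):
            return k          # counterexample of length k found
    return None               # resources/bound exhausted, property not disproven

# Toy stand-ins: pretend the shortest counterexample has length 3.
assert conv(lambda k: k, lambda f: f >= 3, 10) == 3
# If no bound yields a satisfiable formula, CONV never terminates with a result.
assert conv(lambda k: k, lambda f: False, 5) is None
```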
For a survey of complete BMC methods see Section 7 of [4].

Although SAT is an NP-complete problem, current state-of-the-art SAT solvers can solve many instances of SAT efficiently. Conventional SAT solvers are based on the DPLL framework [7], which requires the input formula to be in conjunctive normal form (CNF). A propositional logic formula is in this form if it is a conjunction of clauses. A clause is a disjunction of literals. A literal is an atomic proposition, i.e. either a Boolean variable x_i or its negation ¬x_i. Note that a clause is satisfied by a truth assignment in which any one of its literals is assigned the value true, and a CNF formula is satisfied if all of its clauses are satisfied. For the remainder of this paper, whenever we speak of a formula we mean an instance of SAT in CNF. Note that such a formula can be represented as a set of clauses.

A SAT solver based on the DPLL framework repeatedly selects an unassigned variable as the branching variable, which it assigns to either true or false. After this the solver searches for a satisfying assignment in the reduced search space. If no such assignment exists the procedure backtracks and assigns the branching variable the opposite value. The default SAT solver used by Tarmo is MiniSAT 2.0 without the simplifier [8], but it can easily be replaced with any other conflict driven SAT solver which supports incremental SAT. A conflict driven SAT solver derives, or learns, new clauses as it is working its way through the problem's search space. These learned clauses can be seen as additional lemmas that help the solver to avoid parts of the search space that contain no solutions. In a typical SAT solver the clauses of the input formula are kept in the problem clause database, whereas the learned clauses are in the learned clause database.
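The branch-and-backtrack core of DPLL can be illustrated with a deliberately naive sketch. This is not Tarmo's solver: MiniSAT and other conflict driven solvers add unit propagation, clause learning and restarts on top of this skeleton.

```python
# Minimal DPLL-style search over CNF formulas represented as lists of
# frozensets of integer literals, where -v denotes the negation of variable v.
def dpll(clauses, assignment):
    """Return a satisfying set of literals extending `assignment`, or None."""
    simplified = []
    for clause in clauses:
        if any(lit in assignment for lit in clause):
            continue                      # clause already satisfied
        reduced = frozenset(l for l in clause if -l not in assignment)
        if not reduced:
            return None                   # every literal falsified: conflict
        simplified.append(reduced)
    if not simplified:
        return assignment                 # all clauses satisfied
    lit = next(iter(simplified[0]))       # pick a branching literal
    for choice in (lit, -lit):            # try one polarity, backtrack to other
        result = dpll(simplified, assignment | {choice})
        if result is not None:
            return result
    return None

# (x1 ∨ x2) ∧ (¬x1 ∨ x2) ∧ (¬x2 ∨ x3): any model must make x2 and x3 true.
formula = [frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2, 3})]
model = dpll(formula, set())
assert model is not None and 2 in model and 3 in model
```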
1.1 Incremental SAT

In a number of applications, including BMC, SAT solvers are used to solve a set of formulas that share a large number of clauses. If we were to solve these independently, each solving process may make the same inferences, expressed as learned clauses, about the common subset of the formulas. To avoid this repeated effort it would be desirable to reuse learned clauses between the consecutively executed solving processes, which is what an incremental SAT solver is good for.

Example 1.1 Assume that we wish to sequentially solve the formulas ⟨F_1, F_2, ..., F_n⟩ for which F_i = ⋃_{j=1}^{i} P_j, i.e. each formula F_i equals the union of the previous formula F_{i−1} and a new set of clauses P_i. Exploiting the incrementality of the sequence to reuse learned clauses is easy in this case: We can simply place the clauses F_1 in the solver, solve, report the result for F_1, add P_2 to the solver, solve, report the result for F_2, add P_3 and so on. All learned clauses remain logical consequences of the problem clauses throughout this sequence, so all learned clauses can be reused in consecutive runs.

Unfortunately, for most applications, including ours, it does not hold that each formula is a superset of the preceding formula as in Example 1.1. If we want to solve two consecutive formulas we may not only need to add clauses to the solver, we may also need to remove some. However, if we remove clauses from the problem clause database the clauses in the learned clause database may no longer be implied by the problem clauses. The concept of assumptions was first introduced in [9] and it offers a way around this problem. Only a simple modification to a standard SAT solver is required: the addition of the possibility to solve the formula in the problem clause database under a set of assumptions. An assumption is simply a variable assignment.
We will show next why this is sufficient.

Example 1.2 Assume again that we wish to sequentially solve the formulas ⟨F_1, F_2, ..., F_n⟩, but now each F_i = Q_i ∪ ⋃_{j=1}^{i} P_j, i.e. each formula F_i now contains a subset of clauses Q_i that is contained only in F_i. Let {x_1, x_2, ..., x_n} be a set of free variables, i.e. a set of variables that do not occur in any clause in any of the formulas in the sequence. Let Q′_i = {C_j ∨ x_i | C_j ∈ Q_i}. Note that if x_i is assigned the value false then formula Q′_i becomes equivalent to Q_i. If, however, x_i is assigned the value true, then formula Q′_i becomes equivalent to true. As x_i occurs only in the clauses of Q′_i and its negation ¬x_i does not occur in any clause, the solver may freely choose to assign x_i the value true unless we force it otherwise, which we may do by means of an assumption.

We proceed in almost the same way as in Example 1.1: simply place the clauses P_1 and Q′_1 in the solver, solve under the assumption x_1 = false, report the result for F_1, add P_2 and Q′_2 to the solver, solve under the assumption x_2 = false, report the result for F_2, add P_3 and Q′_3 and so on. As we never actually remove a clause from the problem clause database, we do not affect the consistency of the learned clause database.

We use the BMC encoding of [12, 4] to generate the SAT instances. For the remainder of this paper we will represent an encoded BMC instance as a sequence of formulas ⟨F^1_{Mφ}, F^2_{Mφ}, ..., F^n_{Mφ}⟩ for which F^i_{Mφ} ⊆ F^k_{Mφ} for any k > i. Furthermore, there exists a corresponding sequence of variables ⟨x_1, x_2, ..., x_n⟩ such that F^i_{Mφ} ∧ ¬x_i is satisfiable iff there exists a counterexample of length i against property φ in model M.
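The effect of activation variables and assumptions from Example 1.2 can be checked concretely with a brute-force satisfiability test. The `satisfiable` helper below is purely illustrative (a real incremental solver would be used in practice), but the clause arithmetic is exactly the one from the example.

```python
from itertools import product

def satisfiable(clauses, n_vars, assumptions=()):
    """Brute-force check: clauses are lists of int literals (-v negates v);
    assumptions are literals that every candidate assignment must satisfy."""
    for bits in product([False, True], repeat=n_vars):
        assign = {v + 1: bits[v] for v in range(n_vars)}
        if any(assign[abs(l)] != (l > 0) for l in assumptions):
            continue                      # contradicts an assumption
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

# P_1 = {(v1)} and Q_1 = {(¬v1)}; guard Q_1 with a fresh activation
# variable x = v2, giving Q'_1 = {(¬v1 ∨ v2)}.
# Solving under the assumption v2 = false makes Q'_1 behave exactly like Q_1:
assert not satisfiable([[1], [-1, 2]], 2, assumptions=[-2])   # F_1 unsatisfiable
# Dropping the assumption lets the solver pick v2 = true, which "deletes"
# Q'_1 without ever removing it from the problem clause database:
assert satisfiable([[1], [-1, 2]], 2)
```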
Corollary 1.1 If F^i_{Mφ} |= C_j then for any k > i it holds that F^k_{Mφ} |= C_j.

From experiments in the early stages of this project we found that it is not uncommon for the separate SAT instances in a formula sequence to take several minutes to solve, while the whole sequence could have been solved using an incremental SAT solver in less than one minute. The use of incremental SAT is thus crucial for performance when solving BMC instances, which makes general purpose distributed SAT solvers unsuitable for solving them. In this paper we present approaches to parallelizing the solving of BMC instances while maintaining the efficiency of incremental SAT. One of our main contributions is the introduction of a generic architecture for a shared clause database which allows sharing clauses between incremental SAT solver threads, allowing solvers to easily pick only those clauses from the database that are implied by their own problem clauses, while requiring only a small amount of bookkeeping. We demonstrate the feasibility of our design in environments where multiple solver threads can access shared memory, as well as for environments where solver threads communicate through a network. Contrary to the approach presented for distributed bounded model checking of safety properties in [1], the correctness of our clause sharing mechanism does not depend on the chosen encoding of BMC instances into incremental SAT. Our framework can thus always benefit from future improvements in such encodings. We chose to use the current state-of-the-art encoding presented in [4], which allows us to check safety as well as liveness properties, thus removing an important limitation of the earlier work mentioned.

2 Multithreaded BMC

Our multithreaded environment is one where multiple solver threads S = {s_0, s_1, ..., s_n} are run on a single shared memory system. All the solver threads attempt to find a counterexample against property φ in model M, but they are not necessarily looking for counterexamples of the same length. This means that in each solver thread s_i the problem clause database contains exactly the clauses in F^{sbnd(s_i)}_{Mφ} for some bound sbnd(s_i), the solver bound. Furthermore, let minbnd(S) = min{sbnd(s_i) | s_i ∈ S} and maxbnd(S) = max{sbnd(s_i) | s_i ∈ S} be respectively the smallest and the largest solver bound amongst any of the solver threads in S.

Let LD_{s_i} be the learned clause database of solver thread s_i. By definition each clause in the learned clause database is implied by the clauses in the problem clause database, so for each C_j ∈ LD_{s_i} it holds that F^{sbnd(s_i)}_{Mφ} |= C_j. The shared clause database is a data structure accessible by each solver thread for the purpose of sharing learned clauses between solver threads.

2.1 Approaches

In our framework solver thread s_i attempts to solve the formula F^{sbnd(s_i)}_{Mφ}. Two solver threads s_i, s_j ∈ S may have the same solver bound, i.e. it may hold that sbnd(s_i) = sbnd(s_j), in which case both solver threads are solving the same formula. A related approach, in which no two threads are ever searching for a counterexample of the same length, is presented in [1] for the checking of safety properties. The restriction that no two threads must be solving the exact same formula may seem like it can only have positive effects, but this is not the case. The reason is the lack of robustness of a SAT solving process. Modern SAT solvers usually use some randomization, and due to this randomization the run time of a SAT solver may vary greatly over multiple runs of the same solver on the same formula when a different random seed is used.
Recent work on distributed SAT solving [13, 14] has confirmed that this can be exploited to achieve remarkable reductions in the expected run times by simply running the same randomized SAT solver on the same formula multiple times in parallel with different random seeds until one of them finishes. By sharing clauses amongst these solver threads those results can be further improved. The authors of [11] use a similar method for distributed SAT solving where they also consider using different search strategies in different threads (e.g. different solver parameter settings or even completely different SAT solvers).

A simple analogue to the described distribution methods for SAT that fits our framework is to make each solver thread independently act just like the conventional single-threaded approach CONV that we described earlier. We will call this approach MULTICONV. An approach similar to the one proposed in [1], in which each solver that has finished starts to search for a counterexample of the smallest length that no thread has started searching for (i.e. maxbnd(S) + 1), we call MULTIBOUND. In that approach the cores individually no longer follow the same scheme as CONV.

2.2 Clause bound

For a clause C_j let the clause bound cbnd(C_j) be a number such that F^{cbnd(C_j)}_{Mφ} |= C_j. We use this clause bound for sharing learned clauses between solver threads. The clause bound can be used to ensure that a solver thread s_i only receives those shared clauses that are implied by the clauses in its problem clause database, as this holds at least for all clauses C_j for which cbnd(C_j) ≤ sbnd(s_i). To allow clause sharing whenever possible we would like cbnd(C_j) to always be the minimal bound at which C_j is implied by the problem clauses, but this is hard to calculate and not required for correctness.
In fact, a safe approximation for the clause bound of any clause that is either in the problem clause database of solver thread s_i, or learned by that thread, would be sbnd(s_i). In our implementation we calculate a clause bound for each clause only once, after which it is stored with the clause. With all clauses C_j in the problem clause database we store cbnd(C_j) = min{k | C_j ∈ F^k_{Mφ}}, i.e. the first bound at which the clause appeared in the set of clauses. Note that a learned clause is always derived from a number of other clauses. For a learned clause C_j derived from the set of clauses P, we store cbnd(C_j) = max{cbnd(C_k) | C_k ∈ P}, i.e. the maximum clause bound stored with any of the clauses in P. Finding the maximum clause bound of all clauses in the typically small set P takes only a negligible amount of time.

2.3 Shared clause database organization

The shared clause database is organized as a set of queues {Q_0, Q_1, ..., Q_{maxbnd(S)}}. As the number of queues depends on maxbnd(S), a new queue must be created whenever maxbnd(S) increases. This means that whenever a solver thread s_i ∈ S starts to solve the problem for a bound that no other solver had reached up to that point, it has to create a new queue in the shared clause database. Each clause C_j ∈ LD_{s_i} that solver thread s_i wants to enter into the shared clause database should be pushed into queue Q_{cbnd(C_j)}. Note that this is the queue corresponding to clause C_j's clause bound. Each clause C_j in queue Q_k has a clause index q(Q_k, C_j). The first clause to be pushed into an empty queue gets clause index 1, and every clause pushed into a non-empty queue gets the number of its predecessor incremented by 1.
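The clause-bound bookkeeping of Subsection 2.2, which determines the target queue Q_{cbnd(C_j)} for each shared clause, can be sketched as follows. The `Clause` record and helper names are illustrative only, not Tarmo's actual data structures.

```python
# Hypothetical clause record: each clause carries the bound it was stored with.
class Clause:
    def __init__(self, literals, cbnd):
        self.literals = literals
        self.cbnd = cbnd     # some k such that F^k implies this clause

def cbnd_of_learned(antecedents):
    """Clause bound of a learned clause: the maximum clause bound stored
    with any of the clauses it was derived from (the set P in the text)."""
    return max(c.cbnd for c in antecedents)

# Problem clauses first appearing at bounds 3 and 7:
c1 = Clause([-1, 2], cbnd=3)
c2 = Clause([1, 4], cbnd=7)
# A clause learned from {c1, c2} is implied from bound 7 onwards,
# so only solver threads with sbnd >= 7 may safely import it.
learned = Clause([2, 4], cbnd_of_learned([c1, c2]))
assert learned.cbnd == 7
```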
Furthermore, we define p(Q_k, s_i) as the highest clause index amongst the clauses in Q_k that solver thread s_i knows about. If solver s_i has never read from nor written to queue Q_k then p(Q_k, s_i) = 0.

Each queue can be locked separately. Furthermore, there exists one readers-writer lock L for the whole shared clause database. A readers-writer lock can be acquired by multiple threads at the same time for reading, or exclusively by one thread for writing. If a thread wants to add a queue to the shared clause database it must acquire the lock L for writing. Threads that want to lock a separate queue for any type of access must first acquire lock L for reading. This mechanism is required because existing queues may be relocated in memory when a new queue is added to the database.

Example 2.1 Assume an environment in which two simultaneously working solver threads S = {s_0, s_1} exist, let sbnd(s_0) = 21 and sbnd(s_1) = 22. A possible state of the shared clause database in this environment is the one depicted in Fig. 1. The pointers p(Q_20, s_0) and p(Q_20, s_1) indicate that both solver threads have seen all clauses in queue Q_20. Solver thread s_0 has also seen all clauses from queue Q_21, but as its solver bound is smaller than 22 it is not allowed to synchronize with queue Q_22, so it knows none of the clauses in there. One may also observe that as solver thread s_1 has not seen the clauses 3–5 in queue Q_21 they must have been put there by solver thread s_0.

Figure 1: Shared clause database example (queues Q_20, Q_21 and Q_22 with the per-thread read pointers p)

2.4 Synchronizing with the shared clause database

As explained in Subsection 2.2, all clauses C_j for which cbnd(C_j) ≤ sbnd(s_i) are implied by the problem clauses in solver thread s_i, which means that s_i can safely introduce all clauses from the queues Q_k for k ≤ sbnd(s_i) into its own learned clause database. As it does this it only has to read clauses it has not read before, so it can start reading from the clause with clause index p(Q_k, s_i) + 1. A clause C_j can be removed from the queue Q_k by the last solver thread s_i ∈ S that reads it, i.e. when s_i finds after reading that for all s_m ∈ S it holds that p(Q_k, s_m) ≥ q(Q_k, C_j).

If a solver thread s_i wishes to insert a set of clauses into queue Q_k it must first lock that queue, then read all the clauses C_j from it for which q(Q_k, C_j) > p(Q_k, s_i). Only after this may it write the new clauses to the queue, and finally it may proceed to unlock it. It is necessary that s_i reads unread clauses from Q_k before writing anything to it, as otherwise the queue ends up in a state where clauses not known by s_i precede clauses known by s_i. In such a state we would no longer be able to use the clause index mechanism to identify which clauses in the queue the solver does not yet know.

Each solver thread s_i ∈ S has a local clause stack LS_{s_i} ⊆ LD_{s_i} that contains all clauses learned by s_i that have not yet been placed in the shared clause database. The clauses in stack LS_{s_i} can be moved to the shared clause database at regular intervals. As we have to read clauses from the database before writing to it, these points form the synchronization points of solver thread s_i with the shared clause database. The pseudocode for the synchronization procedure is stated in Algorithm 2.1. We chose to execute this synchronization at every restart (see e.g. [15]), as restarts happen regularly but only after learning a substantial amount of new clauses, and because they are good points for introducing new learned clauses, as all assignments of branching variables are undone.

Algorithm 2.1 Synchronizing solver thread s_i with the shared clause database.
1. lock readers-writer lock L for reading
2. for all Q_k such that k ≤ sbnd(s_i)
3.   lock queue Q_k
4.   Read clauses {C_j | C_j ∈ Q_k, q(Q_k, C_j) > p(Q_k, s_i)} from the database
5.   Push clauses {C_j | C_j ∈ LS_{s_i}, cbnd(C_j) = k} into Q_k
6.   newmin := min{p(Q_k, s_m) | s_m ∈ S}
7.   Remove all clauses {C_j | C_j ∈ Q_k, q(Q_k, C_j) ≤ newmin} from the database
8.   unlock queue Q_k
9. end for
10. unlock readers-writer lock L
11. LS_{s_i} := ∅

As an optimization to this basic scheme our implementation pushes clauses C_j for which cbnd(C_j) < minbnd(S) into Q_{minbnd(S)} instead of into Q_{cbnd(C_j)}. This means that no clauses are pushed into queues corresponding to bounds that are no longer being solved by any solver thread. As a result the queues Q_k for k < minbnd(S) will eventually become empty, after which they may be completely discarded.

2.5 Benchmarks

We obtained the benchmark set used in [4], to which we will refer as LMCS06, and the benchmark suites L2S, TIP and Intel from the set of benchmarks used for the Hardware Model Checking Competition in 2007 (HWMCC07) [5]. Each of the benchmarks represents a model M and property φ, which can serve as input to, for example, the model checker NuSMV [6]. This model checker includes an implementation of the encoding presented in [4]. Unfortunately NuSMV is linked to an incremental SAT solver directly (e.g. MiniSAT), and thus the actual encoding of a benchmark into clauses that are fed to that solver does not become visible to its users.
We use a modified version of NuSMV version 2.4.3 which streams the sequence of formulas encoding a benchmark into a file rather than attempting to solve those formulas with its linked-in SAT solver. For benchmarks from HWMCC07 for which it was known beforehand that the shortest existing counterexample was of length k, a formula sequence of length k + 11 was generated, i.e. the largest formula represented in the file corresponds to the existence of a counterexample of length k + 10. For all other benchmarks the sequence was generated up to length 501, i.e. the largest formula represented in the file corresponds to the existence of a counterexample of length 500. As no suitable file format existed for these incremental SAT problems we defined our own format, called iCNF¹. All of the obtained benchmarks were translated into a sequence of formulas as described. iCNF is Tarmo's input file format, so in the remainder of this paper whenever we speak of a benchmark we mean these translations.

We consider a benchmark solved when a formula in the sequence is found satisfiable, which corresponds to the existence of a counterexample, or when all formulas in the sequence are found unsatisfiable, which corresponds to the nonexistence of a counterexample of length at most 500. We removed all benchmarks from our benchmark set that can be solved within 10 seconds by the single-threaded CONV approach. The resulting set contains 134 benchmarks.

2.6 Experiments

In this subsection we present experimental results with different approaches to exploiting multi-core environments for BMC. All results in this subsection were obtained using a single workstation from the set of 20 workstations found in our department's cluster. Each workstation is equipped with two Intel Xeon 5130 (2 GHz) Dual Core processors and 16 GB of RAM.
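The exact iCNF syntax is documented at the URL given in the footnote; as a rough, self-contained illustration we assume a DIMACS-like layout here (a `p inccnf` header, clause lines terminated by 0, and `a ... 0` lines requesting a solve under assumptions). Treat this layout, and the `read_incremental` helper, as assumptions for the sketch, not as Tarmo's actual parser.

```python
# Minimal reader for an incremental CNF file under the assumed layout above.
def read_incremental(lines):
    clauses, calls = [], []
    for line in lines:
        tok = line.split()
        if not tok or tok[0] in ("c", "p"):
            continue                          # comments and the header
        if tok[0] == "a":                     # solve call: snapshot + assumptions
            assumptions = [int(t) for t in tok[1:-1]]
            calls.append((len(clauses), assumptions))
        else:
            clauses.append([int(t) for t in tok[:-1]])
    return clauses, calls

example = [
    "p inccnf",
    "1 2 0",      # a clause of P_1
    "-1 3 0",     # a clause of Q'_1, guarded by activation variable 3
    "a -3 0",     # solve F_1: assume the activation literal false
    "2 4 0",      # a clause of Q'_2, guarded by activation variable 4
    "a -4 0",     # solve F_2
]
clauses, calls = read_incremental(example)
assert len(clauses) == 3
assert calls == [(2, [-3]), (3, [-4])]
```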
Figures 2 and 3 are "cactus plots": such plots are traditionally used by the organizers of the SAT competitions [2] for comparing SAT solvers. In a cactus plot, time is on the vertical axis and the number of instances solved is on the horizontal axis. From Fig. 2 one can, for example, see that for 97 benchmarks in the set the run time of CONV is under twenty minutes, and that for 105 benchmarks the run time of CONV is under one hour.

The execution of the single-threaded CONV obviously required the use of only a single core of one of our workstations, but, as will become clear later, it is important to note that care was taken to keep the other three available cores in that same workstation idle. The results presented for CONV are the run times of a single execution, but CONV was executed in total four times for each benchmark. 4xCONV is an artificial variant that reports the fastest of those four results for each benchmark. This is meant to illustrate how the run time of a SAT solver varies per run due to the random choices it makes, and how this can be exploited to achieve reductions in the expected run time, as can be clearly seen from Fig. 2.

Unfortunately, if we execute the four independent runs of CONV in parallel on the same four core workstation the results are not as positive. This is because the cores slow each other down as they share resources like the memory bus and parts of the cache. The negative result can be clearly seen in the scatter plot presented in Fig. 4 as well as in the cactus plot presented in Fig. 2. From that cactus plot it can be seen how the result of this naive parallelization, which we will refer to as MULTICONV-SIMPLE, is even slower than the single-threaded variant CONV for many of the simpler benchmarks.

¹ For a detailed description, and tools for handling iCNF files, please check http://www.tcs.hut.fi/~swiering/icnf/
Figure 2: Cactus plot showing the effects of multithreading (CONV, 4xCONV, MULTICONV-SIMPLE, MULTICONV-FULL).

Figure 3: Cactus plot showing the improved multithreaded variants (MULTICONV-FULL, MULTICONV-ADAPTIVE, MULTICONV-TARMO, MULTIBOUND-TARMO).

Figure 4: Scatter plot illustrating the artificial variant 4xCONV.

Figure 5: Scatter plot illustrating the effect of clause sharing.

Figure 6: Scatter plot comparing MULTICONV with MULTIBOUND.

However, MULTICONV-SIMPLE does manage to solve a couple of benchmarks that CONV could not solve within an hour. Fortunately we can extend MULTICONV-SIMPLE with clause sharing to improve its performance. MULTICONV-FULL is a version which implements shared clause database synchronizations by every solver thread as described in Subsection 2.4. Although one can see from the cactus plot presented in Fig. 2 that the average performance improves after adding clause sharing, the scatter plot in Fig. 5 shows that sharing clauses sometimes harms performance. This was not unexpected, as too many learned clauses are not beneficial to any SAT solver. In fact, to reduce the negative effects of large learned clause databases SAT solvers occasionally delete learned clauses. In distributed SAT solvers various ways of limiting the number of shared clauses can be found.
A common approach, found for example in [11], is to share only clauses whose length is shorter than some constant. This crude approach is justified by the observation that shorter clauses represent stronger constraints. We have tried several such constants in our distributed BMC framework, but we achieved better average results with the variant MULTICONV-ADAPTIVE, which uses an adaptive heuristic to limit clause sharing. It shares only clauses whose length is smaller than or equal to the continuously recalculated average length of all clauses it ever learned. The performance improvement can be clearly seen in Fig. 3.

In all of our MULTICONV variants presented so far the search space is pruned differently on each core only because of the effect of the randomization used by the SAT solvers. To force a more diversified search we can use different search parameters in different threads. One of MiniSAT's search parameters is the polarity mode, which can be either negative or positive. The default is negative, meaning that for every branching variable MiniSAT tries to assign the value false first. In either mode, MiniSAT consistently selects the same value first for each branching variable, which seems to be surprisingly effective [16]. The default polarity mode negative works best in practice for "industrial" SAT instances, which is solely caused by the way people tend to encode their problems.

We obtained the best results in our four-threaded environment with a variant we call MULTICONV-TARMO. It is the same as MULTICONV-ADAPTIVE except that in one of the four solver threads we use the polarity mode positive. This further diversifies the search, which causes a clear improvement of the performance, as can be seen from Fig. 3.
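The adaptive sharing heuristic used by MULTICONV-ADAPTIVE can be sketched in a few lines. The class and method names are illustrative, not Tarmo's actual API, and whether the just-learned clause counts toward the average before the decision is an assumption of this sketch.

```python
# Sketch of the adaptive clause-sharing filter: a thread shares a learned
# clause only if its length is at most the running average length of all
# clauses it has ever learned.
class AdaptiveShareFilter:
    def __init__(self):
        self.total_length = 0
        self.num_learned = 0

    def observe_and_decide(self, clause):
        """Record a newly learned clause; return True if it should be shared."""
        self.total_length += len(clause)
        self.num_learned += 1
        average = self.total_length / self.num_learned
        return len(clause) <= average

f = AdaptiveShareFilter()
assert f.observe_and_decide([1, -2])           # average 2.0, length 2: shared
assert not f.observe_and_decide([3, 4, -5, 6]) # average 3.0, length 4: withheld
assert f.observe_and_decide([7])               # average ~2.33, length 1: shared
```

Unlike a fixed length cutoff, this filter adapts to the instance: on problems that produce long learned clauses the threshold rises, so some sharing always continues.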
Using polarity mode positive in two of the four solver threads performed less well for our benchmarks.

We have also tested the MULTIBOUND approach. Just as for MULTICONV we tested variants using full clause sharing, using our adaptive clause sharing heuristic, and with one solver using the opposite polarity mode setting. In the cactus plot presented in Fig. 3 only this last variant, called MULTIBOUND-TARMO, is plotted. One can see that this version performs on average quite similarly to the equivalent MULTICONV variant. Surprisingly enough, the average performance of each MULTIBOUND variant was similar to that of the equivalent MULTICONV variant. This similar average performance is especially interesting since the performance for individual benchmarks is very different, as can be seen from the scatter plot presented in Fig. 6. It thus seems that MULTICONV and MULTIBOUND are both useful, but complementary, approaches.

3 BMC for workstation clusters

Now that we have demonstrated the significant speed-ups that we can obtain using our multithreaded variants of Tarmo, we will discuss approaches which distribute runs of Tarmo over several multithreaded workstations. A distributed SAT solver for a similar environment is presented in [17]. The workstations in our department's computing cluster that were already mentioned in Subsection 2.5 are all connected by 1 gigabit Ethernet connections through a cluster switch.

Our environment can be defined as a set T = {D, S_0, S_1, ..., S_n} in which D refers to the single-threaded Database Interface Process (DIP), and each S_i is a worker, which is simply a set of solver threads on a single multi-core workstation as defined in Section 2. Each multithreaded environment S_i uses one of our multithreaded Tarmo variants to find a counterexample against property φ in model M.
The DIP is a process which stores the global shared clause database and provides an interface to it for the solver threads. It does not manipulate the database by itself. For the remainder of this section let Q_k^i refer to queue Q_k in the local shared clause database of worker S_i, and Q_k^D refer to Q_k in the global shared clause database stored in the DIP. Furthermore, let L_i be the readers-writer lock for the local shared clause database of worker S_i.

3.1 Global shared clause database organization

The global shared clause database is a data structure which is almost identical to the shared clause database found in each worker process. The difference is that it is accessed by the workers, rather than by their individual solver threads. For each queue-worker pair (Q_k^D, S_i) the clause database stores p(Q_k^D, S_i), which is the highest clause index of the clauses in Q_k^D which worker S_i knows about.

Only one worker can access the global shared clause database at the same time because the DIP is single-threaded. This simplifies the design as well as preventing possible network congestion due to multiple workers accessing the database simultaneously.

3.2 Global database synchronization

Whenever a worker wishes to share clauses with other workers, one of its threads performs a synchronization with the global shared clause database through the DIP. This synchronizes the worker's local shared clause database with the global shared clause database. Recall from Subsection 2.3 that we have for each thread s_m ∈ S_i and queue Q_k^i a clause index p(Q_k^i, s_m). The local database of each worker S_i is extended with p(Q_k^i, D) for each queue Q_k^i, where p(Q_k^i, D) is defined as the highest clause index amongst all clauses in Q_k^i that are known to the DIP.
The synchronization process begins with a worker S_i sending a message to the DIP, informing it that it is prepared for a synchronization. The DIP gathers for all Q_k^D the clauses {C_j | C_j ∈ Q_k^D, q(Q_k^D, C_j) > p(Q_k^D, S_i)} and places all of them in a buffer. The whole buffer is then sent to worker S_i at once. When the worker has received the clause buffer from the DIP it starts a synchronization procedure, which is described in Algorithm 3.1. As with local synchronizations, care must be taken to ensure that writing new clauses to a queue always follows a lock and a read, in order to prevent unknown clauses preceding known clauses in the queue.

Algorithm 3.1 Synchronizing worker S_i with the global shared clause database.
1.  Let R be the set of clauses received from D
2.  B := ∅
3.  lock readers-writer lock L_i for reading
4.  for all Q_k^i such that k ≤ maxbnd(S_i)
5.      lock queue Q_k^i
6.      Read clauses {C_j | C_j ∈ Q_k^i, q(Q_k^i, C_j) > p(Q_k^i, D)} and append them to B
7.      Push clauses {C_j | C_j ∈ R, cbnd(C_j) = k} into Q_k^i
8.      newmin := min({p(Q_k^i, s_m) | s_m ∈ S_i} ∪ {p(Q_k^i, D)})
9.      Remove all clauses {C_j | C_j ∈ Q_k^i, q(Q_k^i, C_j) ≤ newmin}
10.     unlock queue Q_k^i
11. end for
12. unlock readers-writer lock L_i
13. Send B to D

Upon receiving the worker's learned clauses after the local synchronization has taken place, the DIP can write them to the global shared clause database. The process is completed and the DIP awaits another request.

3.3 Experiments

We have tried several approaches to distributing Tarmo over more than one workstation. Our best multithreaded variants turned out to be very robust. Simply running the same multithreaded variant multiple times with different seeds in parallel on several workstations and reporting the result when the first one finishes hardly decreases the expected run time.
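The per-queue steps of Algorithm 3.1 can be sketched in executable form as follows. This is an illustrative data-manipulation sketch only, with locking elided for brevity; the `Queue` class, the `p` map of per-reader high-water marks, and the `cbnd` map are our own stand-in names, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Queue:
    """One clause queue: (index, clause) pairs plus, per reader
    (solver thread or 'DIP'), the highest clause index it has seen."""
    items: list = field(default_factory=list)
    p: dict = field(default_factory=dict)
    next_index: int = 0

    def push(self, clause):
        self.items.append((self.next_index, clause))
        self.next_index += 1

def synchronize(queues, maxbnd, received, cbnd):
    """Merge clauses 'received' (R) from the DIP into the local queues
    and return the buffer B of local clauses unknown to the DIP."""
    buffer_out = []                       # B := empty (step 2)
    for k, q in queues.items():
        if k > maxbnd:                    # step 4: only k <= maxbnd(S_i)
            continue
        # Step 6: read clauses the DIP has not yet seen, append to B.
        buffer_out += [c for idx, c in q.items if idx > q.p["DIP"]]
        # Step 7: push received clauses whose clause bound equals k.
        for c in received:
            if cbnd[c] == k:
                q.push(c)
        # Steps 8-9: drop clauses every reader has already consumed.
        newmin = min(q.p.values())
        q.items = [(idx, c) for idx, c in q.items if idx > newmin]
    return buffer_out                     # step 13: B is sent to D
```

A real worker would additionally hold the readers-writer lock L_i around the loop and each queue's own lock around steps 5 to 10, exactly as the algorithm prescribes.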
From the experiments in Subsection 2.6 we concluded that our MULTICONV-TARMO and MULTIBOUND-TARMO variants both have good average performance but are complementary. This observation inspired us to a simple distribution over two workstations where the two different approaches are each run on a single workstation. In this way we obtain a result for each benchmark in exactly the amount of time it takes for the faster of the two to finish.

We have named this variant MULTICONVxMULTIBOUND. It was calculated from the earlier single workstation results rather than actually executed on two workstations in parallel. In this case that should, however, not make any difference to the result, as two workstations can function completely independently, at least assuming that they both already have the input file stored locally before starting the run. From Fig. 7 an improvement in the number of instances solved within an hour can be seen. When one takes another look at Fig. 6 in Section 2.6 one realizes that for many individual benchmarks the speed-up is significant, as the achieved performance is the best of the two variants plotted there.

The cactus plot also shows the variant DISTRIBUTED. This is a truly distributed program that uses MPI version 2.0 [10] for communication between workstations. To obtain each result for that variant we used three workstations in total: one running MULTICONV, one running MULTIBOUND, and one running the DIP. The single-threaded DIP was run on a single workstation in which the other three available processor cores were kept idle for the purpose of obtaining these results.
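Deriving the MULTICONVxMULTIBOUND numbers from the two single-workstation runs amounts to taking, per benchmark, the faster of the two results. A sketch, with made-up benchmark names and an assumed one-hour timeout (`None` marks an unsolved instance):

```python
def virtual_portfolio(times_a, times_b, timeout=3600.0):
    """Per-benchmark minimum of two result sets, as when two variants
    run independently on two workstations and the first result wins.
    A value of None means the variant timed out on that benchmark."""
    merged = {}
    for bench in set(times_a) | set(times_b):
        t_a = times_a.get(bench)
        t_b = times_b.get(bench)
        t_a = timeout if t_a is None else t_a
        t_b = timeout if t_b is None else t_b
        best = min(t_a, t_b)
        merged[bench] = best if best < timeout else None
    return merged
```

This is valid precisely because the two workstations share nothing during the run, which is the independence assumption stated above.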
In a practical setting one will most likely not want to reserve an entire workstation for the single-threaded DIP, but as the DIP's computational load is not very high, relaxing that restriction should not cause a significant performance decrease. It may even be a good choice in practice to run the DIP on the cluster's front-end, which in a typical cluster setup is a single workstation through which all communication with machines outside the cluster takes place.

Note that in variant DISTRIBUTED we use the global shared clause database stored in the DIP to share clauses between a workstation running MULTICONV-TARMO and a workstation running MULTIBOUND-TARMO. Our clause database design ensures that this does not cause any complications. After testing several approaches we chose to have a worker initiate a synchronization with the global shared clause database whenever one of its solver threads increases its solver bound, i.e. every time a solver thread finds a formula unsatisfiable. From Fig. 7 it can be seen that this simple global clause sharing setup improves the average performance.

This performance can probably be improved further by introducing a clever heuristic for limiting the number of clauses shared, as we did for the multithreaded approaches. We chose not to further investigate such variants in this paper. The performance increase obtained is mainly due to using two complementary multithreaded approaches. As those are very robust approaches, the performance of this distributed version of Tarmo will not scale beyond two workstations. One could try to define more multithreaded approaches with good average performance to obtain more complementary approaches that can be run in parallel, but this is unlikely to scale much further.

This distributed framework with its generic shared clause database architecture will be very useful to our future work.
We plan to investigate approaches that use search space splitting amongst the workstations, in order to allow our system to scale to larger numbers of workstations. A possible way of doing this would be to split the formulas using guiding paths [18].

4 Conclusion

In this paper we have presented the Tarmo framework for bounded model checking using multi-core workstations as well as clusters of them. One novel feature of our framework for distributed BMC is that it allows using any encoding of BMC instances into incremental SAT. In our experiments we use the encoding presented in [4], which means that we are able to check safety as well as liveness properties with all variants of Tarmo discussed in this paper.

An important contribution found in this work is our generic architecture for a shared clause database for multiple incremental SAT solver threads working on parts of the same incremental SAT encoding of a BMC instance. Together with our definitions for clause bound and solver bound, it allows the sharing of clauses while requiring very little bookkeeping to make sure that solver threads only obtain those clauses that are actually implied by their set of problem clauses. It has been demonstrated how the architecture can be employed for solver threads operating in shared-memory environments as well as for solver threads that communicate through a network using MPI.

Figure 7: Performance of the multiple workstation Tarmo variants (cactus plot of instances solved against time for MULTICONV-TARMO, MULTICONVxMULTIBOUND and DISTRIBUTED).

Our multi-core variants of Tarmo obtained good speed-ups over the conventional single-threaded approach. This is an important result as multi-core hardware is now widely available, and thus many BMC users can benefit from this.
Furthermore, the two multi-core variants presented as MULTICONV-TARMO and MULTIBOUND-TARMO turned out to be complementary approaches which both have good average performance.

We exploited these complementary variants in a setting which uses multiple workstations. We obtained a speed-up over the single workstation versions, but, possibly more interestingly, showed the feasibility of clause sharing between workstations using our shared clause database architecture. This will be a very useful result for future distributed versions of Tarmo or even other distributed BMC approaches. To improve the rate at which the performance scales with the number of workstations used, such future versions may, for example, split the search space into multiple disjoint parts. Such techniques are easy to implement within our framework, as our shared clause database architecture allows clause sharing between any solver threads that are working on parts of the same incremental SAT problem, regardless of the solving strategy they use.

Our Tarmo implementation is available at: http://www.tcs.hut.fi/~swiering/tarmo/.

Acknowledgements

This work was financially supported by the Academy of Finland (projects 126860, 128050), Technology Industries of Finland Centennial Foundation, Jenny and Antti Wihuri Foundation, and the Helsinki Graduate School in Computer Science and Engineering (Hecse).

References

[1] Erika Ábrahám, Tobias Schubert, Bernd Becker, Martin Fränzle & Christian Herde (2006): Parallel SAT Solving in Bounded Model Checking. In: Luboš Brim, Boudewijn R. Haverkort, Martin Leucker & Jaco van de Pol, editors: FMICS/PDMC, Lecture Notes in Computer Science 4346. Springer, pp. 301–315.

[2] Daniel Le Berre, Olivier Roussel & Laurent Simon (organizers in 2009): The SAT competitions. Available at http://www.satcompetition.org.
[3] Armin Biere, Alessandro Cimatti, Edmund M. Clarke & Yunshan Zhu (1999): Symbolic Model Checking without BDDs. In: Rance Cleaveland, editor: TACAS, Lecture Notes in Computer Science 1579. Springer, pp. 193–207.

[4] Armin Biere, Keijo Heljanko, Tommi Junttila, Timo Latvala & Viktor Schuppan (2006): Linear Encodings of Bounded LTL Model Checking. Logical Methods in Computer Science 2(5:5), pp. 1–64.

[5] Armin Biere & Toni Jussila: Hardware Model Checking Competition 2007 (HWMCC07). Available at http://fmv.jku.at/hwmcc07/. Organized as a satellite event to CAV 2007, Berlin, Germany, July 3–7, 2007.

[6] Alessandro Cimatti, Edmund M. Clarke, Enrico Giunchiglia, Fausto Giunchiglia, Marco Pistore, Marco Roveri, Roberto Sebastiani & Armando Tacchella (2002): NuSMV 2: An OpenSource Tool for Symbolic Model Checking. In: Ed Brinksma & Kim Guldstrand Larsen, editors: CAV, Lecture Notes in Computer Science 2404. Springer, pp. 359–364.

[7] Martin Davis, George Logemann & Donald W. Loveland (1962): A machine program for theorem-proving. Commun. ACM 5(7), pp. 394–397.

[8] Niklas Eén & Niklas Sörensson (2003): An Extensible SAT-solver. In: Enrico Giunchiglia & Armando Tacchella, editors: SAT, Lecture Notes in Computer Science 2919. Springer, pp. 502–518.

[9] Niklas Eén & Niklas Sörensson (2003): Temporal induction by incremental SAT solving. Electronic Notes in Theoretical Computer Science 89(4), pp. 543–560.

[10] William Gropp, Ewing Lusk & Rajeev Thakur (1999): Using MPI-2: Advanced Features of the Message-Passing Interface. MIT Press, Cambridge, MA.

[11] Youssef Hamadi, Said Jabbour & Lakhdar Sais (2009): ManySAT: A Parallel SAT Solver. JSAT Special Issue on Parallel SAT Solving 6, pp. 245–262.

[12] Keijo Heljanko, Tommi Junttila & Timo Latvala (2005): Incremental and Complete Bounded Model Checking for Full PLTL.
In: Kousha Etessami & Sriram K. Rajamani, editors: CAV, Lecture Notes in Computer Science 3576. Springer, pp. 98–111.

[13] Antti E. J. Hyvärinen, Tommi Junttila & Ilkka Niemelä (2009): Incorporating Clause Learning in Grid-Based Randomized SAT Solving. JSAT Special Issue on Parallel SAT Solving 6, pp. 223–244.

[14] Antti Eero Johannes Hyvärinen, Tommi Junttila & Ilkka Niemelä (2008): Strategies for Solving SAT in Grids by Randomized Search. In: Serge Autexier, John Campbell, Julio Rubio, Volker Sorge, Masakazu Suzuki & Freek Wiedijk, editors: AISC/MKM/Calculemus, Lecture Notes in Computer Science 5144. Springer, pp. 125–140.

[15] Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang & Sharad Malik (2001): Chaff: Engineering an Efficient SAT Solver. In: DAC. ACM, pp. 530–535.

[16] Dimosthenis Mpekas, Michiel van Vlaardingen & Siert Wieringa (2006): The first steps to a hybrid SAT solver. Student report, Delft University of Technology, Faculty of EWI.

[17] Tobias Schubert, Matthew Lewis & Bernd Becker (2009): PaMiraXT: Parallel SAT Solving with Threads and Message Passing. JSAT Special Issue on Parallel SAT Solving 6, pp. 203–222.

[18] Hantao Zhang, Maria Paola Bonacina & Jieh Hsiang (1996): PSATO: A distributed propositional prover and its application to quasigroup problems. J. Symb. Comput. 21(4-6), pp. 543–560.