Integrity-Enhancing Replica Coordination for Byzantine Fault Tolerant Systems

Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University
2121 Euclid Ave, Cleveland, OH 44115
wenbing@ieee.org

October 24, 2018

Abstract

Strong replica consistency is often achieved by writing deterministic applications, or by using a variety of mechanisms to render replicas deterministic. There exists a large body of work on how to render replicas deterministic under the benign fault model. However, when replicas can be subject to malicious faults, most of the previous work is no longer effective. Furthermore, the determinism of the replicas is often considered harmful from the security perspective, and for many applications, integrity strongly depends on the randomness of some of their internal operations. This calls for new approaches towards achieving replica consistency while preserving replica randomness. In this paper, we present two such approaches. One is based on Byzantine agreement and the other on threshold coin-tossing. Each approach has its strengths and weaknesses. We compare the performance of the two approaches and outline their respective best-use scenarios.

Keywords: Replica Consistency, Byzantine Fault Tolerance, Middleware, Threshold Signature, Coin-Tossing

∗ This work is sponsored by Cleveland State University through a Faculty Research Development award.

1 Introduction

Strong replica consistency is an essential property for replication-based fault tolerant distributed systems. It can be achieved via a number of different techniques. In this paper, we investigate the challenges in achieving integrity-preserving strong replica consistency and present our solutions for state-machine based Byzantine fault tolerant systems [3].
While it is widely known that strong replica consistency can also be achieved through the systematic-checkpointing technique [12] for nondeterministic applications in the benign fault model, that technique is generally regarded as too expensive and it is not suitable for Byzantine fault tolerance.

The state-machine based approach is one of the fundamental techniques in building fault tolerant systems [16]. In this approach, replicas are assumed to be either deterministic or rendered deterministic. There has been a large body of work on how to render replicas deterministic in the presence of replica nondeterminism under the benign fault model (e.g., [3, 4, 12, 15]). However, when the replicas can be subject to Byzantine faults, which is the case for many Internet-based systems, most of the previous work is no longer effective. Furthermore, the determinism (or rendered determinism) of the replicas is often considered harmful from the security perspective (e.g., with replication, an adversary can compromise any of the replicas to obtain confidential information [6]), and for many applications, integrity strongly depends on the randomness of some of their internal operations. For example, random numbers are used for unique identifier generation in transactional systems and for shuffling cards in online poker games. If the randomness is taken away by a deterministic algorithm to ensure replica consistency, the identifiers or the hands of cards can be made predictable, which can easily lead to exploits [19, 21]. This calls for new approaches towards achieving strong replica consistency while preserving the randomness of each replica's operations.

In this paper, we present two alternative approaches towards our goal.
The first one is based on Byzantine agreement [3] (referred to as the BA-algorithm in this paper) and the other on a threshold coin-tossing scheme [2] (referred to as the CT-algorithm). Both approaches rely on a collective determination for decisions involving randomness, and the determination is based on the contributions made by a set of replicas (at least one of which must be correct), to avoid the problems mentioned above. They differ mainly in how the collective determination is carried out. In the BA-algorithm, the replicas first reach a Byzantine agreement on the set of contributions from replicas, and then apply a deterministic algorithm (for all practical purposes, the bitwise exclusive-or operation [21]) to compute the final random value. The CT-algorithm uses the threshold coin-tossing scheme introduced in [2] to derive the final random value, without the need for a Byzantine agreement step. Even though the CT-algorithm saves on communication cost, it incurs significant computation overhead due to the CPU-intensive exponentiation calculations. Consequently, as we will show in Section 7, the BA-algorithm performs best in a Local-Area Network (LAN) environment, whereas the CT-algorithm is more appropriate for a Wide-Area Network (WAN) environment where message passing is expensive.

Furthermore, to ensure the freshness of the random numbers generated, the replicas using the BA-algorithm should have access to high-entropy sources (which is relatively easy to satisfy), and the replicas using the CT-algorithm should be able to refresh their key shares periodically. For the latter, we envisage that a proactive threshold signature scheme could be used [1, 7, 8]. However, the discussion of proactive threshold signature techniques is out of the scope of this paper.
To summarize, we make the following research contributions in this paper:

• We point out the danger and pitfalls of controlling replica randomness for the purpose of ensuring replica consistency. Removing randomness from replica operations (when it is needed) could seriously compromise the system integrity.
• We propose the use of collective determination of random numbers contributed by replicas, as a practical way to reconcile the requirement of strong replica consistency and the preservation of replica randomness.
• We present a light-weight, Byzantine agreement based algorithm to carry out the collective determination. The BA-algorithm only introduces two additional communication steps because the Byzantine agreement for the collective determination of random numbers can be integrated into that for message total ordering, as needed by state-machine replication. The BA-algorithm is particularly suited for Byzantine fault tolerant systems operating in a LAN environment, or where replicas are connected by high-speed low-latency networks.
• We further present an algorithm that uses the threshold coin-tossing scheme [2] as an alternative method for collective determination of random numbers. The coin-tossing scheme is introduced in [2] as an instrumental mechanism for a group of replicas to reach Byzantine agreement in asynchronous systems. To the best of our knowledge, our work is the first to show its usefulness in helping to ensure strong replica consistency without compromising the system integrity.
• We conduct extensive experiments, in both a LAN testbed and an emulated WAN environment, to thoroughly characterize the performance of the two approaches.
2 Byzantine Fault Tolerance

In this section, we introduce the system model for our work, and the practical Byzantine fault tolerance algorithm (BFT algorithm, for short) developed by Castro and Liskov [3] as necessary background information.

Byzantine fault tolerance refers to the capability of a system to tolerate Byzantine faults. It can be achieved by replicating the server and by ensuring that all server replicas reach an agreement on the total ordering of clients' requests despite the existence of Byzantine faulty replicas and clients. Such an agreement is often referred to as Byzantine agreement [11]. In recent years, a number of efficient Byzantine agreement algorithms [3, 9, 20] have been proposed. In this work, we focus on the BFT algorithm and use the same system model as that in [3].

The BFT algorithm operates in an asynchronous distributed environment. The safety property of the algorithm, i.e., all correct replicas agree on the total ordering of requests, is ensured without any assumption of synchrony. However, to guarantee liveness, i.e., for the algorithm to make progress towards the Byzantine agreement, certain synchrony is needed. Basically, it is assumed that the message transmission and processing delay has an asymptotic upper bound. This bound is dynamically explored in the algorithm in that each time a view change occurs, the timeout for the new view is doubled.

The BFT algorithm is executed by a set of $3f + 1$ replicas to tolerate up to $f$ Byzantine faulty replicas. One of the replicas is designated as the primary while the rest are backups. Each replica is assigned a unique id $i$, where $i$ varies from $0$ to $3f$. For view $v$, the replica whose id $i$ satisfies $i = v \bmod (3f + 1)$ serves as the primary. The view starts from 0. For each view change, the view number is increased by one and a new primary is selected.
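As a small illustration of the group size, primary selection, and timeout rules just described, consider the following Python sketch. The function names and the `initial_timeout` parameter are hypothetical and are not taken from the BFT implementation in [3]. The sketch also assumes that the view number advances by exactly one per view change starting from view 0, so that the timeout in view $v$ is the initial timeout doubled $v$ times.

```python
def group_size(f: int) -> int:
    """Number of replicas needed to tolerate f Byzantine faulty replicas."""
    return 3 * f + 1


def primary_for_view(view: int, f: int) -> int:
    """Id of the replica that serves as the primary in the given view."""
    return view % group_size(f)


def timeout_for_view(initial_timeout: float, view: int) -> float:
    """Timeout in the given view, doubled once per view change.

    Assumption: one view change per view number, starting from view 0.
    """
    return initial_timeout * (2 ** view)
```

With $f = 1$ there are four replicas, so under this rule view 5 would be led by replica 1.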
The normal operation of the BFT algorithm involves three phases. During the pre-prepare phase, the primary multicasts to all backups a pre-prepare message containing the client's request, the current view, and a sequence number assigned to the request. A backup verifies the request and the ordering information. If the backup accepts the pre-prepare message, it multicasts a prepare message containing the ordering information and the digest of the request being ordered. This starts the prepare phase. A replica waits until it has collected $2f$ matching prepare messages from different replicas, and the pre-prepare message, before it multicasts a commit message to the other replicas, which starts the commit phase. The commit phase ends when a replica has collected $2f + 1$ matching commit messages from different replicas (possibly including the one it sent, or would have sent, itself). At this point, the request message has been totally ordered and it is ready to be delivered to the server application once all previous requests have been delivered.

All messages exchanged among the replicas, and those between the replicas and the clients, are protected by an authenticator [3] (for multicast messages), or by a message authentication code (MAC) (for point-to-point communications). An authenticator is formed by a number of MACs, one for each target of the multicast. We assume that the replicas and the clients each have a public/private key pair, and that the public keys are known to everyone. These keys are used to generate the symmetric keys needed to produce and verify authenticators and MACs. To ensure freshness, the symmetric keys are periodically refreshed by the mechanism described in [3]. We assume that the adversaries have limited computing power so that they cannot break the security mechanisms described above.
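The structure of an authenticator, one MAC per multicast target, can be sketched as follows. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for the MAC (the paper does not name the MAC function at this point), and the dictionary of per-target symmetric keys, as well as the function names, are hypothetical.

```python
import hashlib
import hmac
from typing import Dict


def make_authenticator(message: bytes, keys: Dict[int, bytes]) -> Dict[int, bytes]:
    """Build one MAC per multicast target, indexed by the target's replica id."""
    return {
        target: hmac.new(key, message, hashlib.sha256).digest()
        for target, key in keys.items()
    }


def verify_authenticator(message: bytes, authenticator: Dict[int, bytes],
                         my_id: int, my_key: bytes) -> bool:
    """Check only the authenticator entry addressed to this replica."""
    tag = authenticator.get(my_id)
    if tag is None:
        return False
    expected = hmac.new(my_key, message, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(tag, expected)
```

Each receiver verifies a single MAC, while the sender computes one MAC per target. This is consistent with generation and verification being accounted for as separate operations in the cost analysis of Section 7.1.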
Furthermore, we assume that a faulty replica cannot transmit the confidential state, such as the random numbers collectively determined, to its colluding clients in real time. This can be achieved by using an application-level gateway, or a privacy firewall as described by Yin et al. [20], to filter out illegal replies. A compromised replica may, however, replace the high-entropy source from which it retrieves random numbers with a deterministic algorithm, and convey such an algorithm via out-of-band or covert channels to its colluding clients.

3 Pitfalls in Controlling Replica Randomness

In this section, we analyze a few well-known approaches that could be used to ensure replica consistency in the presence of replica randomness. We show that they are not robust against Byzantine faulty replicas and clients.

Replicas that use a pseudo-random number generator can easily be rendered deterministic by ensuring that they use the same seed value to initialize the generator. One might attempt to use the sequence number assigned to the request as the seed. Even though this approach is perhaps the most economical way to render replicas deterministic (since no extra communication step is needed and no extra information has to be included in the control messages for total ordering of requests), it virtually takes the randomness away from the fault tolerant system. In the presence of Byzantine clients, the vulnerability can be exploited to compromise the integrity of the system. For example, a Byzantine faulty client in an online poker game can simply try out different integer values as the seed to the pseudo-random generator (if the generator is known to the client) to guess the dealer's hand, checking each guess against the cards it has been dealt. The client can then place its bets accordingly and gain an unfair advantage.
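The seed-guessing attack described above can be sketched in a few lines of Python. The card encoding, the hand size, the candidate seed range, and the use of Python's `random.Random` as the pseudo-random generator are all assumptions made for illustration; they do not describe any particular poker server.

```python
import random
from typing import Iterable, List, Optional, Tuple


def deal(seed: int, hand_size: int = 5) -> Tuple[List[int], List[int]]:
    """Shuffle a 52-card deck with a seeded PRNG; return (client hand, dealer hand)."""
    deck = list(range(52))
    random.Random(seed).shuffle(deck)
    return deck[:hand_size], deck[hand_size:2 * hand_size]


def recover_seed(client_hand: List[int], candidates: Iterable[int]) -> Optional[int]:
    """Try candidate seeds until one reproduces the hand the client was dealt."""
    for seed in candidates:
        if deal(seed, len(client_hand))[0] == client_hand:
            return seed
    return None
```

A client that was dealt `deal(n)[0]` for some small sequence number `n` can call `recover_seed(hand, range(10_000))` and then read the dealer's cards from `deal(seed)[1]`. The search is cheap because the space of plausible sequence numbers is tiny compared with the space of possible shuffles.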
A seemingly more robust approach is to use the timestamp as the seed to the pseudo-random number generator. As shown in [19, 21], the use of a timestamp does not offer more robustness to the system because it can also be guessed by Byzantine faulty clients. Furthermore, the use of a timestamp imposes serious challenges in asynchronous distributed systems because of the requirement that all replicas must use the same timestamp to seed the pseudo-random number generator. In [3], a mechanism is proposed to handle this problem by asking the primary to piggyback its timestamp, to be used by backups as well, with the pre-prepare message. However, the issue is that the backups have very limited ways of verifying the proposed timestamp (other than that the timestamp must be monotonically increasing) without resorting to strong synchrony assumptions (such as bounds on processing and message passing).

The only option remaining seems to be the use of a truly random number to seed the pseudo-random number generator (or to obtain random numbers entirely from a high-entropy source). We note that the elegant mechanism described in [3] cannot be used in this case because backups have no means to verify whether the number proposed by the primary is taken from a high-entropy source, or is generated according to a deterministic algorithm. If the latter is the case, the Byzantine faulty primary could continue colluding with Byzantine faulty clients without being detected.

Therefore, we believe the most effective way of countering such threats is to collectively determine the random number, based on the contributions from a set of replicas, so that Byzantine faulty replicas cannot influence the final outcome. The set size depends on the algorithm used, as we will show in the next two sections, but it must be greater than the number of faulty replicas tolerated ($f$) by the system.
Figure 1: Normal operation of the BA-algorithm. (Diagram labels: Replica 0 to Replica 3; pre-prepare, pp-update, prepare, and commit messages; Pre-Prepare Phase, Prepare Phase, Commit Phase; Entropy Extraction and Entropy Combination steps.)

4 The BA-Algorithm

The normal operation of the BA-algorithm is illustrated in Figure 1. As can be seen, the collective-determination mechanism is seamlessly integrated into the original BFT algorithm. On ordering a request, the primary determines the order of the request (i.e., assigns a sequence number to the request), and queries the application for the type of operation associated with the request. If the operation involves a random number as input, the primary activates the mechanism for the BA-algorithm. The primary then obtains its share of the random number by extracting it from its own entropy source, and piggybacks the share with the pre-prepare message multicast to all backups. The pre-prepare message has the form $\langle \text{PRE-PREPARE}, v, n, d, R_p \rangle_{\alpha_p}$, where $v$ is the view number, $n$ is the sequence number assigned to the request, $d$ is the digest of the request, $R_p$ is the random number generated by the primary, and $\alpha_p$ is the authenticator for the message.

On receiving the pre-prepare message, a backup performs the usual chores, such as the verification of the authenticator, before it accepts the message. It also checks whether the request will indeed trigger a randomized operation, to prevent a faulty primary from putting unnecessary load on correct replicas (which could lead to a denial-of-service attack).
If the pre-prepare message is acceptable, the replica creates a pre-prepare certificate for storing the relevant information, generates a share of the random number from its entropy source, and multicasts to all replicas a pp-update message of the form $\langle \text{PP-UPDATE}, v, n, i, R_i, d \rangle_{\alpha_i}$, where $i$ is the sending replica's identifier and $R_i$ is the random number contributed by replica $i$.

When the primary has collected $2f$ pp-update messages, it combines the random numbers received according to a deterministic algorithm (referred to as the entropy combination step in Figure 1), and builds a pp-update message with slightly different content than those sent by backups. In the pp-update message sent by the primary, the $R_i$ component is replaced by a set $S_R$ of $2f + 1$ tuples containing the random numbers contributed by replicas (possibly including its own share). Each tuple contains a contributed random number together with the identifier of the replica that contributed it. The replica identifier is included in the tuple to ease the verification of the set at backups.

On receiving a pp-update message, a backup accepts the message and stores it in its data structure provided that the message has a correct authenticator, it is in view $v$, and the backup has accepted a pre-prepare message to order the request with digest $d$ and sequence number $n$. A backup proceeds to the entropy combination step only if (1) it has accepted a pp-update message from the primary, and (2) it has accepted the $2f$ pp-update messages sent by the replicas referenced in the set $S_R$. The backup requests a retransmission from the primary for any missing pp-update message.

After the entropy combination step is completed, a backup multicasts a prepare message of the form $\langle \text{PREPARE}, v, n, i, d' \rangle_{\alpha_i}$, where $d'$ is the digest of the request concatenated with the combined random number.
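The entropy combination step and the computation of $d'$ can be sketched as follows, assuming the bitwise exclusive-or combination that this paper suggests for practical purposes. The function names, the representation of the agreed set as a dictionary from replica id to share, and the use of SHA-256 for the digest are assumptions made for illustration. The sketch also reads $d'$ as a digest computed over the request bytes followed by the combined random number, which is one possible interpretation of the description above.

```python
import hashlib
from typing import Dict


def combine_shares(shares: Dict[int, bytes], f: int) -> bytes:
    """XOR together the 2f + 1 shares of the agreed set, keyed by replica id."""
    if len(shares) != 2 * f + 1:
        raise ValueError("expected exactly 2f + 1 shares")
    lengths = {len(share) for share in shares.values()}
    if len(lengths) != 1:
        raise ValueError("all shares must have the same length")
    combined = bytes(lengths.pop())  # all-zero starting value
    for share in shares.values():
        combined = bytes(a ^ b for a, b in zip(combined, share))
    return combined


def prepare_digest(request: bytes, combined: bytes) -> bytes:
    """Digest d' over the request concatenated with the combined random number."""
    return hashlib.sha256(request + combined).digest()
```

Because exclusive-or is commutative and associative, every correct replica that holds the same agreed set computes the same combined value regardless of the order in which the pp-update messages arrived.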
When a replica has completed the entropy combination step, and it has collected $2f$ valid prepare messages from different replicas (possibly including the message it sent, or would have sent, itself), it multicasts to all replicas a commit message of the form $\langle \text{COMMIT}, v, n, i, d' \rangle_{\alpha_i}$. When a replica receives $2f + 1$ valid commit messages, it decides on the sequence number and the collectively determined random number. At the time of delivery to the application, both the request and the random number are passed to the application.

In Figure 1, the durations of the entropy extraction and combination steps have been intentionally exaggerated for clarity. In practice, the entropy combination can be achieved by applying a bitwise exclusive-or operation on the set of random numbers collected, which is very fast. The cost of entropy extraction depends on the scheme used. Some schemes, such as the TrueRand method [10], allow very prompt entropy extraction. TrueRand works by gathering the underlying randomness from a computer by measuring the drift between the system clock and the interrupt-generation rate on the processor.

Figure 2: Normal operation of the CT-algorithm. (Diagram labels: Replica 0 to Replica 3; pre-prepare, prepare, and commit messages; Pre-Prepare Phase, Prepare Phase, Commit Phase; share generation and shares combination steps.)

5 The CT-Algorithm

The normal operation of the CT-algorithm is shown in Figure 2. The CT-algorithm is the same as the BFT algorithm in the first two phases (i.e., the pre-prepare and prepare phases). The commit phase is modified by incorporating threshold coin-tossing operations.
Most existing $(k, l)$ threshold signature schemes [1, 5, 7, 8, 14] can be used for the CT-algorithm, where $k$ is the threshold number of signature shares needed to produce the group signature, and $l = 3f + 1$ is the total number of players (i.e., replicas in our case) participating in the threshold signing. In most $(k, l)$ threshold signature schemes, a correct group signature can be derived by combining shares from $k = t + 1$ players, where $t = f$ is the maximum number of corrupted players tolerated. Some schemes, such as the RSA-based scheme in [14], allow the flexibility of using up to $k = l - t$ as the minimum number of shares required to produce the group signature. Since $l = 3f + 1$ in our work, $k$ can be set as high as $2f + 1$. This property offers additional protection against Byzantine faulty replicas [14].

At the beginning of the commit phase, each replica generates its share of the threshold signature by signing $d \| n$ using its private key share, where $d$ is the digest of the request message and $n$ is the sequence number assigned to the request. This operation is referred to as the share-generation step in Figure 2. The signature share is piggybacked with the commit message, in the form $\langle \text{COMMIT}, v, n, i, d, T(d \| n, i) \rangle_{\alpha_i}$, where $T(d \| n, i)$ is replica $i$'s share of the threshold signature.

When a replica has collected $2f + 1$ valid commit messages from different replicas, it executes the shares-combination step by combining $k$ threshold signature shares piggybacked with the commit messages. After the shares have been combined into a group signature, the signature is mapped into a random number, first by hashing the group signature with a secure hash function (e.g., SHA1), and then by taking as many of the most significant bits of the hash as the type of number needed requires (e.g., 32 bits).
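The threshold bounds and the final mapping from group signature to random number can be sketched as follows. The group signature is treated here as an opaque byte string, since producing it requires a threshold signature library that this sketch does not model. The function names are hypothetical, and SHA-1 is used only because it is the example hash named in the text.

```python
import hashlib
from typing import Tuple


def threshold_range(f: int) -> Tuple[int, int]:
    """Smallest and largest usable threshold k for l = 3f + 1 players and t = f."""
    l, t = 3 * f + 1, f
    return t + 1, l - t


def signature_to_random(group_signature: bytes, num_bits: int = 32) -> int:
    """Hash the group signature with SHA-1 and keep the num_bits most significant bits."""
    digest = hashlib.sha1(group_signature).digest()  # 160-bit digest
    total_bits = 8 * len(digest)
    if not 0 < num_bits <= total_bits:
        raise ValueError("num_bits must be between 1 and 160")
    return int.from_bytes(digest, "big") >> (total_bits - num_bits)
```

For $f = 1$, `threshold_range(1)` gives $k$ between 2 and 3, which corresponds to the two threshold values exercised in the measurements of Section 7. Since every correct replica obtains the same group signature, the mapping yields the same number at every correct replica.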
The random number is delivered together with the request to the application once all previous requests have been delivered.

6 Informal Proof of Correctness

In this section, we provide an informal argument for the correctness of our two algorithms. The correctness criteria for the algorithms are:

C1 All correct replicas deliver the same random number to the application together with the associated request, and
C2 The random number is secure (i.e., it is truly random) in the presence of up to $f$ Byzantine faulty replicas.

We first argue for the BA-algorithm. C1 is guaranteed by the use of the Byzantine agreement algorithm. C2 is ensured by the collection of $2f + 1$ shares contributed by different replicas, and by a sound entropy combination algorithm (e.g., the bitwise exclusive-or operation applied to the set to produce the combined random number). By collecting $2f + 1$ contributions, it is guaranteed that at least $f + 1$ of them are from correct replicas, so faulty replicas cannot completely control the set.¹ The entropy combination algorithm ensures that the combined random number is secure as long as at least one share is secure. The bitwise exclusive-or operation could be used to combine the set and it is provably secure for this purpose [21]. Therefore, the BA-algorithm satisfies both C1 and C2.

Next we argue for the CT-algorithm. C1 is guaranteed by the following facts: (1) The same message ($d \| n$) is signed by all correct replicas, according to the CT-algorithm. (2) The threshold signature algorithm guarantees the production of the same group signature by combining $k$ shares. Different replicas could obtain different sets of $k$ shares and yet they all lead to the same group signature. (3) The same secure hash function is used to hash the group signature. C2 is guaranteed by the threshold signature algorithm.
For the threshold signature algorithm used in our implementation, security is proven in the random oracle model [14]. Therefore, the CT-algorithm is correct as well. This completes our proof.

¹ Using $f + 1$ shares is all that is needed for this purpose. However, collecting more shares is more robust in cases where some correct replicas use low-entropy sources. This is analogous to the benefit of Shoup's threshold signature scheme [14].

7 Performance Characterization

The BA-algorithm and the CT-algorithm have been implemented and incorporated into a Java-based BFT framework. The framework was developed in house and is ported from the C++ based BFT framework of Castro and Liskov [3]. Due to space limitations, the details of the framework implementation are omitted. The CT-algorithm uses Shoup's threshold signature scheme [14], implemented by Steve Weis and made available at SourceForge [18].

The development and test platform consists of a group of Dell SC440 servers, each equipped with a 2.8 GHz Pentium D processor and 1 GB of RAM, running SuSE 10.2 Linux. The nodes are connected via a 100 Mbps LAN. As we noted earlier, the WAN experiments are emulated by introducing artificial delays in communication, without injecting message loss.

To characterize the cost of the two algorithms, we use an echo application with fixed 1 KB-long requests and replies. The server is replicated at four nodes, and hence, $f = 1$ in all our measurements. Up to 12 concurrent clients are launched across the remaining nodes (at most one client per node). Each client issues consecutive requests without any think time. For the CT-algorithm, we vary a number of parameters, including the threshold value and the key length. We also experiment with certain optimizations. For all measurements, the end-to-end latency is measured at the client and the throughput is measured at the replicas. The Java System.nanoTime() API is used for all timing-related measurements.

Table 1: Execution time for basic cryptographic operations involved with our algorithms. The data shown for CT signing is for a single share. An entry CTk-b denotes the CT-algorithm with threshold k and a b-bit key.

Operation Type    Signing/Generation    Verification/Combination
MAC               24.1 µs               237.3 µs
Authenticator     80.2 µs               892.0 µs
CT2-64            2.2 ms                4.6 ms
CT2-128           7.1 ms                12.8 ms
CT2-256           31.7 ms               58.5 ms
CT2-512           179.1 ms              338.2 ms
CT2-1024          1191.7 ms             1381.4 ms
CT3-64            2.2 ms                5.6 ms
CT3-128           7.1 ms                18.5 ms
CT3-256           31.7 ms               80.0 ms
CT3-512           179.1 ms              449.7 ms
CT3-1024          1191.7 ms             2292.1 ms

7.1 Cost of Cryptographic Operations

We first report the mean execution latency of the basic cryptographic operations involved in the BA-algorithm and the CT-algorithm because such information helps in understanding the behaviors we observe. The latency cost is obtained when running a single client and 4 server replicas in the LAN testbed. The results are summarized in Table 1. As can be seen, the threshold signature operations are quite expensive, and it is impractical to use a key as large as 1024 bits.

Without any optimization (and without faults), an end-to-end remote call from a client to the replicated server using the original BFT algorithm involves a total of 4 authenticator generation operations ($A_g$), 5 authenticator verification operations ($A_v$) (a replica does not need to verify the message sent by itself), 1 MAC generation operation ($M_g$), and 2 MAC verification operations ($M_v$) on the critical execution path (i.e., $A_g + A_v$ for request sending and receiving, $A_g + A_v$ for the pre-prepare phase, $A_g + A_v$ for the prepare phase, $A_g + 2A_v$ for the commit phase, and $M_g + 2M_v$ for the reply sending and receiving). The BA-algorithm introduces two additional communication steps and $2A_g$ and $3A_v$ on the critical path.
The CT-algorithm does not require any additional communication step, but introduces 1 threshold signing operation ($T_s$) and 1 operation for threshold share verification and combination ($T_v$). From this analysis, the minimum end-to-end latency achievable using the BA-algorithm is $L_{BA}^{\min} = 6A_g + 8A_v + M_g + 2M_v$ (a replica can proceed to the next step as soon as it receives 1 valid prepare message from another replica in the prepare phase, and 2 valid commit messages from other replicas in the commit phase, and the client can proceed to deliver the reply as soon as it has received 2 consistent replies). Similarly, the minimum latency using the CT-algorithm is $L_{CT}^{\min} = 4A_g + 5A_v + M_g + 2M_v + T_s + T_v$. Based on the values given in Table 1, $L_{BA}^{\min} = 8.1$ ms and $L_{CT}^{\min} = 12.1$ ms for $k = 2$ and a 64-bit key. The minimum overhead incurred by the BA-algorithm is $2A_g + 3A_v = 2.8$ ms and that incurred by the CT-algorithm is $T_s + T_v = 6.8$ ms for $k = 2$ and a 64-bit key.

7.2 LAN Experimental Results

Figure 3 shows the summary of the experimental results obtained in the LAN testbed. The end-to-end latency (plotted in log-scale) measured at a single client under various configurations is shown in Figure 3(a). As a reference, the latency for the BFT system without the additional mechanisms described in this paper is shown as "Base". In the figure, the result for the BA-algorithm is shown as "BA", and the results for the CT-algorithm with different parameter settings are labeled CT#-i, where # is the $k$ value and i is the key length. As can be seen, the CT-algorithm incurs significant overhead unless a very short key is used. Furthermore, the observed end-to-end latency results are in line with the analysis provided in the previous subsection.
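The critical-path arithmetic of Section 7.1 can be checked directly against Table 1. The short Python sketch below does so; the constant and function names are hypothetical, and the only inputs are the mean costs reported in the table.

```python
# Mean costs from Table 1, in microseconds.
A_G = 80.2    # authenticator generation
A_V = 892.0   # authenticator verification
M_G = 24.1    # MAC generation
M_V = 237.3   # MAC verification


def base_latency_us() -> float:
    """Critical-path cost of the original BFT algorithm: 4 A_g + 5 A_v + M_g + 2 M_v."""
    return 4 * A_G + 5 * A_V + M_G + 2 * M_V


def ba_latency_us() -> float:
    """BA-algorithm: base cost plus 2 A_g + 3 A_v for the two extra steps."""
    return base_latency_us() + 2 * A_G + 3 * A_V


def ct_latency_us(t_s_us: float, t_v_us: float) -> float:
    """CT-algorithm: base cost plus one share signing (T_s) and one combination (T_v)."""
    return base_latency_us() + t_s_us + t_v_us


if __name__ == "__main__":
    print(f"BA:     {ba_latency_us() / 1000:.1f} ms")                   # BA:     8.1 ms
    print(f"CT2-64: {ct_latency_us(2200.0, 4600.0) / 1000:.1f} ms")     # CT2-64: 12.1 ms
```

The two printed values reproduce the 8.1 ms and 12.1 ms figures quoted above for $k = 2$ and a 64-bit key. Other rows of Table 1 can be substituted for $T_s$ and $T_v$ to estimate the corresponding lower bounds under the same model.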
Figure 3: LAN measurement results. (a) End-to-end latency (ms, log scale) under various configurations, numbered 1: Base, 2: BA, 3: CT2-64, 4: CT2-128, 5: CT2-256, 6: CT2-512, 7: CT2-1024, 8: CT3-64, 9: CT3-128, 10: CT3-256, 11: CT3-512, 12: CT3-1024. (b) The system throughput (calls/s) in the presence of different numbers of concurrent clients. (c) End-to-end latency (ms) as a function of the load on the system (throughput, calls/s). Panels (b) and (c) plot Base, BA, and CT2-64 and CT3-64 with and without batching.

The throughput measurement results shown in Figure 3(b) are consistent with those in the end-to-end latency measurements. The results labeled "no batching" are obtained for the original CT-algorithm described in Section 5, i.e., one coin-tossing operation (threshold share signing, and combination and verification of $k$ shares) is used for every request. Those labeled "with batching" are measured when the requests are batched (for total ordering, they all share the same sequence number [3]) and only one coin-tossing operation is used for the entire batch of requests. As can be seen from Figure 3(b), the gain in throughput is significant with the batching optimization. However, if sharing the same random number among several requests is a concern, this optimization must be disabled.

For the BA-algorithm, the communication steps for reaching a Byzantine agreement on the set of random numbers are automatically batched together with those for request total ordering.
Batching the Byzantine agreement for a set of random numbers does not seem to introduce any vulnerability. The additional optimization of performing one set of entropy extraction and combination per batch of requests does not have any noticeable performance benefit. Therefore, we advise against using this further optimization in practice because of possible security concerns.

Figure 3(c) shows the end-to-end latency as a function of the load on the system in the presence of concurrent clients. We use the system throughput as a metric for the system load because it better reflects the actual load on the system than the number of clients does. It also makes comparison with the results of the WAN experiments easier. As can be seen, for the CT-algorithm without the batching optimization, the latency increases very sharply with the load, due to the CPU-intensive threshold signature computations.

The results for the CT-algorithm with keys larger than 64 bits are omitted from Figure 3(b) and (c) to avoid clutter. In these configurations, the throughput is significantly lower and the end-to-end latency is much higher than those of the BA-algorithm, especially when the load is high.

7.3 WAN Experimental Results

The experimental results obtained in an emulated WAN environment are shown in Figure 4. The observed metrics and the parameters used are identical to those in the LAN experiments. As can be seen in Figure 4(a), the end-to-end latency as perceived by a single client is similar for the BA-algorithm and the CT-algorithm with a key size of up to 256 bits (for either $k = 2$ or $k = 3$). This is easily understood because the end-to-end latency is dominated by the communication delays, as indicated by the end-to-end latency for the base system included in the figure.

Figure 4(b) shows part of the measurement results on system throughput under different numbers of concurrent clients.
To avoid clutter, only the results for $k = 2$ and key sizes of up to 256 bits are shown. The throughput for the base system is included as a reference. As can be seen, when batching for the coin-tossing operation is enabled, the CT-algorithm with short-to-medium sized keys outperforms the BA-algorithm. When batching is disabled, however, the BA-algorithm performs better unless a very small key is used for the CT-algorithm. The end-to-end latency results shown in Figure 4(c) confirm the trend.

[Figure 4: Emulated WAN measurement results. (a) End-to-end latency (ms, log scale) under various configurations: 1: Base, 2: BA, 3: CT2-64, 4: CT2-128, 5: CT2-256, 6: CT2-512, 7: CT2-1024, 8: CT3-64, 9: CT3-128, 10: CT3-256, 11: CT3-512, 12: CT3-1024. (b) System throughput (calls/s) in the presence of different numbers of concurrent clients. (c) End-to-end latency (ms) as a function of the load on the system (throughput, calls/s). Series in (b) and (c): Base, BA, CT2-64, CT2-128 and CT2-256, each with and without batching.]

8 Related Work

How to ensure strong replica consistency in the presence of replica nondeterminism has been of research interest for a long time, especially for fault tolerant systems using the benign fault model [3, 4, 12, 15].
However, while the importance of using good random numbers has long been recognized in building secure systems [19], we have yet to see substantial research work on how to preserve the randomized operations necessary to ensure system integrity in a fault tolerant system. For the type of systems where the use of random numbers is crucial to their service integrity, the benign fault model is obviously inadequate and the Byzantine fault model must be employed if fault tolerance is required.

In recent years, significant progress has been made towards building practical Byzantine fault tolerant systems, as shown in a series of seminal papers such as [3, 4, 9, 20]. This makes it possible to address the problem of reconciling the requirement of strong replica consistency with the preservation of each replica’s randomness for real-world applications that require both high availability and a high degree of security. We believe the work presented in this paper is an important step towards solving this challenging problem.

We should note that some forms of replica nondeterminism (in particular, replica nondeterminism related to timestamp operations) have been studied in the context of Byzantine fault tolerant systems [3, 4]. However, we have argued in previous sections that the existing approach is vulnerable to the presence of colluding Byzantine faulty replicas and clients.

The main idea of this work, i.e., collective determination of random values based on the contributions made by the replicas, is borrowed from the design principles for secure communication protocols [17]. However, the application of this principle to solving the strong replica consistency problem is novel.

The CT-algorithm is inspired by the work of Cachin, Kursawe and Shoup [2], in particular, the idea of exploiting threshold signature techniques for agreement.
However, we have adapted this idea to solve a totally different problem, i.e., it is used towards reaching integrity-preserving strong replica consistency. Furthermore, we carefully studied what to sign for each request so that the final random number obtained is not vulnerable to attacks.

9 Conclusion and Future Work

In this paper, we presented our work on reconciling the requirement of strong replica consistency with the desire to maintain each replica’s individual randomness. Based on the central idea of collective determination of the random values needed by the applications for their service integrity, we designed and implemented two algorithms. The first one, the BA-algorithm, is based on reaching a Byzantine agreement on a set of random number shares provided by $2f + 1$ replicas. The second one, the CT-algorithm, is based on threshold signature techniques. We thoroughly characterized the performance of the two algorithms in both a LAN testbed and an emulated WAN environment. We show that the BA-algorithm outperforms the CT-algorithm in most cases, the exception being WAN operations under relatively light load. Furthermore, the overhead incurred by the BA-algorithm with respect to the base BFT system is relatively small, making it feasible for practical use.

Future research work will focus on the threshold key share refreshment issue for the CT-algorithm. To ensure long-term robustness of the system, the key shares must be proactively refreshed periodically. Otherwise, the random numbers generated this way may age over time, which may open the door for attacks. The threshold signature algorithm used in this work [14] does not have a built-in mechanism for key share refreshment. We will explore other threshold signature algorithms that offer this capability [1, 7, 8].

References

[1] A. Boldyreva.
    Efficient threshold signatures, multisignatures and blind signatures based on the Gap-Diffie-Hellman-Group signature scheme. Lecture Notes in Computer Science, 2567:31–46, Springer-Verlag, 2003.
[2] C. Cachin, K. Kursawe, and V. Shoup. Random oracles in Constantinople: Practical asynchronous Byzantine agreement using cryptography. Journal of Cryptology, 18:219–246, 2005.
[3] M. Castro and B. Liskov. Practical Byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4):398–461, November 2002.
[4] M. Castro, R. Rodrigues, and B. Liskov. BASE: Using abstraction to improve fault tolerance. ACM Transactions on Computer Systems, 21(3):236–269, August 2003.
[5] Y. Desmedt. Threshold cryptography. European Transactions on Telecommunications, 5(4):449–457, 1994.
[6] Y. Deswarte, L. Blain, and J.-C. Fabre. Intrusion tolerance in distributed computing systems. Proceedings of the IEEE Symposium on Research in Security and Privacy, pages 110–121, Oakland, CA, May 1991.
[7] Y. Frankel, P. Gemmal, P. MacKenzie, and M. Yung. Proactive RSA. Proceedings of the 17th Annual International Cryptology Conference (Crypto ’97), Santa Barbara, CA, August 1997.
[8] R. Gennaro, S. Jarecki, H. Krawczyk, and T. Rabin. Robust threshold DSS signatures. Proceedings of the International Conference on the Theory and Application of Cryptographic Techniques, Saragossa, Spain, May 12-16, 1996.
[9] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: Speculative Byzantine fault tolerance. In Proceedings of the 21st ACM Symposium on Operating Systems Principles, WA, 2007.
[10] J. Lacy, D. Mitchell, and W. Schell. CryptoLib: Cryptography in software. Proceedings of the 4th USENIX Security Symposium, pages 1–17, 1993.
[11] L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3):382–401, July 1982.
[12] D. Powell. Delta-4: A Generic Architecture for Dependable Distributed Computing. Springer-Verlag, 1991.
[13] T. Rabin. A simplified approach to threshold and proactive RSA. Proceedings of the 18th Annual International Cryptology Conference (Crypto ’98), Santa Barbara, CA, August 1998.
[14] V. Shoup. Practical threshold signatures. Lecture Notes in Computer Science, 1097:207–220, Springer, Berlin, 2000.
[15] J. Slember and P. Narasimhan. Living with nondeterminism in replicated middleware applications. In Proceedings of the ACM/IFIP/USENIX 7th International Middleware Conference, pages 81–100, Melbourne, Australia, 2006.
[16] F. Schneider. Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Computing Surveys, 22(4):299–319, 1990.
[17] A. Tanenbaum. Computer Networks, 4th Edition. Prentice Hall, 2003.
[18] ThreshSig: Java threshold signatures. Available at http://threshsig.sourceforge.net/
[19] J. Viega and G. McGraw. Building Secure Software. Addison-Wesley, 2002.
[20] J. Yin, J.-P. Martin, A. Venkataramani, L. Alvisi, and M. Dahlin. Separating agreement from execution for Byzantine fault tolerant services. In Proceedings of the ACM Symposium on Operating Systems Principles, pages 253–267, Bolton Landing, NY, USA, 2003.
[21] A. Young and M. Yung. Malicious Cryptography: Exposing Cryptovirology. Wiley, 2004.
