Using In-Memory Encrypted Databases on the Cloud

Storing data in the cloud poses a number of privacy issues. A way to handle them is supporting data replication and distribution on the cloud via a local, centrally synchronized storage. In this paper we propose to use an in-memory RDBMS with row-lev…

Authors: Francesco Pagano, Davide Pagano

Francesco Pagano
Department of Information Technology
Università degli Studi di Milano
Milano, Italy
francesco.pagano@unimi.it

Davide Pagano
School of Engineering
Politecnico di Milano
Milano, Italy
davide1.pagano@mail.polimi.it

Abstract: Storing data in the cloud poses a number of privacy issues. A way to handle them is supporting data replication and distribution on the cloud via a local, centrally synchronized storage. In this paper we propose to use an in-memory RDBMS with row-level data encryption for granting and revoking access rights to distributed data. This type of solution is rarely adopted in conventional RDBMSs because it requires several complex steps. In this paper we focus on implementation and benchmarking of a test system, which shows that our simple yet effective solution overcomes most of the problems.

Keywords: cloud; database; encryption; data sharing

I. INTRODUCTION

Storing sensitive data in the cloud may lead to security faults when the data resides on untrusted servers. To solve this issue, a distributed approach was presented in [1], where agents share confidential data in a secure manner using simple grant-and-revoke permissions on shared data. The additional step was the implementation of a distributed DBMS with row-level encryption capabilities to enable strong access control on records, allowing revocation of rights. This solution is not frequent in the literature because of its inherent slowness. In this paper we present a real implementation of such software and describe how we solved the performance problems.
We first describe a schematic model that we introduced in a previous paper (Section II); then, after a survey of cryptography in databases (Section III), granularity in database-level encryption (Section IV) and in-memory databases (Section V), we describe our solution (Section VI), which we implemented and benchmarked (Section VII).

II. THE DISTRIBUTED ARCHITECTURE

A. The model

Hereon, we will use the term dossier to indicate a set of correlated information. Our data model is informally represented in Fig. 1. To simplify the discussion, we introduce the following assumptions:
• Each dossier has only one owner;
• Only the dossier's owner can change it.
These assumptions permit the use of an elementary cascade synchronization in which the owner submits the changes to the receivers.

Figure 1. The model

In the model, each node represents a local, single-user application/database dedicated to an individual user (u_n). The node stores only the dossiers that u_n owns. Shared dossiers (in this example, d_1) are replicated on each node. When a node modifies a shared dossier, it must synchronize with the other nodes that hold a copy of it.

Figure 2. The distributed architecture

Our solution (Fig. 2) consists of two parts: a trusted client agent and a remote untrusted Synchronizer on the cloud. The client maintains local data storage where:
• The dossiers that she owns are (or at least can be) stored as plain text;
• The others, instead, are encrypted, each using a different key.

978-1-4577-1186-2/11/$26.00 ©2011 IEEE

The Synchronizer stores the keys to decrypt the shared dossiers owned by the local client and the modified dossiers to synchronize. No information is in clear form: dossiers are encrypted using the keys, which, in turn, are encrypted using the receivers' public keys.
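The two-layer scheme just described (a dossier encrypted with its own key, and that key encrypted with the receiver's public key) can be illustrated with a minimal Java sketch. The class name DossierEnvelope, the choice of AES-GCM for the dossier and RSA-OAEP for the key, and the key sizes are assumptions made for illustration; the paper only states that a symmetric key and the receivers' public keys are used.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;

/** Hypothetical sketch: a dossier sealed with its own AES key, and that key sealed for one receiver. */
final class DossierEnvelope {
    final byte[] iv;          // AES-GCM nonce, not secret
    final byte[] cipherText;  // encrypted dossier, safe to hand to the untrusted Synchronizer
    final byte[] wrappedKey;  // dossier key encrypted with the receiver's public key

    private DossierEnvelope(byte[] iv, byte[] cipherText, byte[] wrappedKey) {
        this.iv = iv;
        this.cipherText = cipherText;
        this.wrappedKey = wrappedKey;
    }

    /** Owner side: fresh symmetric key per dossier, then wrap that key for the receiver. */
    static DossierEnvelope seal(String dossier, PublicKey receiver) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            SecretKey dossierKey = generator.generateKey();

            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE, dossierKey, new GCMParameterSpec(128, iv));
            byte[] cipherText = aes.doFinal(dossier.getBytes(StandardCharsets.UTF_8));

            Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            rsa.init(Cipher.ENCRYPT_MODE, receiver);
            return new DossierEnvelope(iv, cipherText, rsa.doFinal(dossierKey.getEncoded()));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Receiver side: unwrap the dossier key with the private key, then decrypt the dossier. */
    String open(PrivateKey receiver) {
        try {
            Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            rsa.init(Cipher.DECRYPT_MODE, receiver);
            SecretKey dossierKey = new SecretKeySpec(rsa.doFinal(wrappedKey), "AES");

            Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
            aes.init(Cipher.DECRYPT_MODE, dossierKey, new GCMParameterSpec(128, iv));
            return new String(aes.doFinal(cipherText), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stand-in for a receiver's registered key pair (2048-bit RSA is an assumption). */
    static KeyPair newReceiverKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair receiver = newReceiverKeyPair();
        DossierEnvelope envelope = seal("dossier d1", receiver.getPublic());
        System.out.println(envelope.open(receiver.getPrivate()));
    }
}
```

In this sketch the untrusted side only ever sees cipherText and wrappedKey. Sharing with several receivers would wrap the same dossier key once per public key, and revoking a receiver amounts to no longer handing out a wrapped key.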
When another client needs to decrypt a dossier, she must connect to the Synchronizer and obtain the corresponding decryption key. The data and the keys are stored in two separate entities and therefore no one can access information without the collaboration of the other party.

III. CRYPTOGRAPHY IN DATABASES

Confidentiality, integrity and availability are the main properties of database protection. Confidentiality has been defined by the International Organization for Standardization (ISO) in ISO-17799 (footnote 1) as "ensuring that information is accessible only to those authorized to have access"; data integrity assures that no one can modify the information without leaving a trace; availability provides access to data by authorized users in a reasonable time. Over the years, many ACPs (Access Control Policies) have been defined, based on the database model (relational or object) and on the policy control (i.e., DAC, Discretionary Access Control; RBAC, Role-Based Access Control; MAC, Mandatory Access Control). Traditionally, ACPs are based on the assumption that the DBA (DataBase Administrator) is trusted, but this is not assured in outsourced data centers and in the cloud, where the platform-as-a-service (PaaS) provider is external to the data owner. A solution to this problem is that the DBMS treats only raw data, encrypted in such a way that the DBA (or another intruder) cannot read the information. There are three main categories of database encryption [4]:
• Storage-level encryption
• Database-level encryption
• Application-level encryption

A. Storage-level encryption (SLE)

Data is encrypted either at the file level (NAS/DAS) or at the block level (SAN) [5]. A short while ago, Toshiba released a hardware implementation of SLE, a family of hard drives called Self-Encrypting Disk.
The system is based on the Opal specifications of the Trusted Computing Group, supports native AES-256 encryption and can automatically delete its contents if not used by the rightful owner. This encryption is not selective; it encrypts an entire medium or portions of it. It prevents theft of storage but it is unsuitable for preventing unauthorized access by an honest-but-curious system administrator. On the other hand, it is entirely transparent to the system, so it needs no database modification.

B. Database-level encryption (DLE)

DLE secures data as it is written to and read from a database. The encryption is applied to the database at various granularities, such as database, tables, columns (most frequently), and rows. It can also be tied to logical conditions that select the affected data. The cons are:
• DLE is not transparent to the application as SLE is, so it involves some modifications to the indexed encrypted data and to stored procedures and triggers;
• The system is slowed down by the encryption overhead;
• Usually, it is not a defense against curious DBAs.

Footnote 1: ISO/IEC 17799, Jan 4, 2009

C. Application-level encryption (ALE)

In this case, data is encrypted/decrypted by the application that generates it. Plain-text data is made available only at client side, while data sent over the network is encrypted. This scheme usually involves returning larger result sets to the client, which are then filtered at client side once decrypted. To accomplish this, applications need to be modified and the network traffic increases.

IV. GRANULARITY IN DATABASE-LEVEL ENCRYPTION

The most common solution for data protection is DLE, which can have different types of granularity:
• database
• tables
• columns
• rows

A. Database

In this case, the whole database is encrypted using only one key, as if it was a single file.
The cons of this encryption are:
• It doesn't allow defining different privileges on each table;
• The schema definition becomes particularly complex;
• The system performance suffers considerable degradation (an improvement can be achieved with appropriate caching);
• Its effectiveness is closely linked to the degree of confidence with which the master key is kept.
For these reasons, the database granularity solution is seldom used.

B. Tables

A specific key encrypts each table separately. Performance is better than in the previous solution, but still very far from that of a clear-text database, because encrypting an existing table can be very slow. The definition (and enforcement) of integrity constraints, foreign keys and indexes is very complex.

C. Columns

All the data in a column (or set of columns) of a table is encrypted with the same key. This is the solution adopted by most DBMS suppliers, as it allows encrypting only sensitive data. However, it requires building ad-hoc indexes customized for the expected queries (again, at the expense of performance). With this approach, it is also not possible to define access privileges on "horizontal" portions of a table such as row sets (e.g., allowing access only to rows with id > 100), as it is awkward to encrypt rows with different keys depending on the user. This type of mechanism usually relies on third-party applications, or otherwise it is delegated to instruments such as triggers or stored procedures.

D. Rows

Each single row in a table is encrypted using a different key. The main advantage of this technique is the capability to define access control on a subset of data (rows) of a table based on the distribution of decryption keys.
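As a minimal illustration of access control through key distribution, the following Java sketch encrypts every row under its own key and lets a reader decrypt only the rows whose keys it has been given. The class RowLevelTable, its method names and the use of AES-GCM with the nonce stored in front of the ciphertext are assumptions made for this example; it is not the authors' implementation and it ignores indexing, constraints and key management.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of row-level encryption: one key per row, access granted by handing out keys. */
final class RowLevelTable {
    private static final int IV_LEN = 12;
    private final Map<Integer, byte[]> encryptedRows = new LinkedHashMap<>(); // what an untrusted store would see
    private final Map<Integer, SecretKey> rowKeys = new HashMap<>();          // kept by the data owner

    /** Encrypts the row under a fresh key and stores nonce + ciphertext. */
    void insert(int rowId, String row) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            SecretKey key = generator.generateKey();
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] cipherText = aes.doFinal(row.getBytes(StandardCharsets.UTF_8));
            byte[] stored = new byte[IV_LEN + cipherText.length];
            System.arraycopy(iv, 0, stored, 0, IV_LEN);
            System.arraycopy(cipherText, 0, stored, IV_LEN, cipherText.length);
            encryptedRows.put(rowId, stored);
            rowKeys.put(rowId, key);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The owner grants access to a subset of rows by releasing only their keys. */
    Map<Integer, SecretKey> grant(Set<Integer> rowIds) {
        Map<Integer, SecretKey> granted = new HashMap<>();
        for (Integer id : rowIds) {
            SecretKey key = rowKeys.get(id);
            if (key != null) {
                granted.put(id, key);
            }
        }
        return granted;
    }

    /** A reader decrypts the rows it holds keys for; every other row stays opaque. */
    Map<Integer, String> readable(Map<Integer, SecretKey> keys) {
        Map<Integer, String> rows = new LinkedHashMap<>();
        try {
            for (Map.Entry<Integer, byte[]> entry : encryptedRows.entrySet()) {
                SecretKey key = keys.get(entry.getKey());
                if (key == null) {
                    continue; // no key, no access
                }
                byte[] stored = entry.getValue();
                Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
                aes.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, stored, 0, IV_LEN));
                byte[] plain = aes.doFinal(stored, IV_LEN, stored.length - IV_LEN);
                rows.put(entry.getKey(), new String(plain, StandardCharsets.UTF_8));
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        RowLevelTable students = new RowLevelTable();
        students.insert(12, "Alice,Databases");
        students.insert(31, "Bob,Networks");
        System.out.println(students.readable(students.grant(Set.of(12))).keySet());
    }
}
```

The sketch also makes the cost visible: the owner has to keep and distribute one key per row, which is the key-management burden that row-level schemes must address.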
Let's assume that we have a table that includes the data of all students in a university and we want to grant the secretary's office of each course access only to the data of students enrolled in that course. If we are using database-level or table-level encryption, we would have to create a view for each course and grant the rights to the corresponding secretary's office, with the problems outlined above (also, data stays readable by the DBAs). Using column-level encryption, the permissions must be specified at the field level and, unless appropriate indexes or cumbersome procedures are implemented (which may also expose the data to inference or statistical attacks), it would be impossible to make the information instantly accessible to authorized users. Using row-level encryption, instead, it is possible to make available to the authorized user the keys (or the key) that can be used to decrypt only the allowed rows. This technique, besides ensuring a better management of access permissions, prevents any kind of statistical analysis on the table.

In a normal RDBMS, however, this technique has significant disadvantages in terms of performance and functionality: querying would be possible only through the construction of appropriate indexes for each column of the table (with a considerable waste of resources both in terms of time and space), while constraints and foreign keys would be almost unusable. Another major issue concerns the management of keys: row-level encryption could potentially lead to the generation and maintenance (and/or distribution) of a key for each row of each table encrypted with this method. To solve (or reduce) the problem, we can use some techniques of key management, such as:
• Broadcast (or Group) encryption [13]: rows are divided into equivalence classes, based on recipients. Every class is encrypted using an asymmetric algorithm where the encryption key is made in such a way that each recipient can decrypt the information using only its own private key. Both the public and the private keys are generated by a trusted entity.
• Identity Based Encryption [11]: it binds the encryption key to the identity of the recipient. Each recipient generates by itself a key pair used to encrypt/decrypt information.
• Attribute Based Encryption [12]: it binds the encryption key to an attribute (a group) of recipients. Each recipient receives from a trusted entity the private key used to decrypt, while the encryption key is calculated by the sender.
However, these techniques are complex and therefore conventional RDBMSs don't use encryption at the row level.

V. IN-MEMORY DATABASES

"An in-memory database (IMDB; also known as main memory database system or MMDB and as real-time database or RTDB) is a database management system that primarily relies on main memory for computer data storage." (footnote 2) It is interesting to note that, while a conventional database system stores data on disk but caches it into memory for access, in an IMDB the data resides permanently in the main physical memory and there is a backup copy on disk [14]. "In-memory databases have recently become an intriguing topic for the database industry. With the mainstream availability of 64-bit servers with many gigabytes of memory a completely RAM based database solution is a tempting prospect to a much wider audience." (footnote 3) IMDBs are intended either for personal use (because they are comparatively small w.r.t. traditional databases), or for performance-critical systems (for their very low response time and very high throughput).
They use main memory structures, so they need no translation from disk form to memory form and no caching, and they perform better than traditional DBMSs with Solid State Disks.

Usually, volatile memory-based IMDBs support the three ACID properties of atomicity, consistency and isolation, but lack support for the durability property. To add it when non-volatile random access memory (NVRAM) is not available, IMDBs use a combination of transaction logging and primary database check-pointing to the system's hard disk: they log changes from committed transactions to a physical medium and, periodically, update a disk image of the database. Since updates have to be written to disk, write operations are heavier than read-only ones. Logging policies vary from product to product: some leave to the application the choice of when to write to file, others perform checkpoints at regular intervals of time or after a certain amount of data has been entered/edited.

TABLE I. IMDBS PROS AND CONS

Pros                      Cons
Fast transactions         Complexity of durability
No translation
High reliability
Multi-user concurrency

Obviously, the limitation of this type of database is related to the amount of RAM on the computer hosting the db. But given their nature, IMDBs are well suited to be distributed and replicated across multiple nodes to increase capacity and performance. The proposed solution works around this limitation: instead of having a single central database containing the whole data, we preferred to give one database to each client application. This database contains only owned data, while external data will be added (or removed) via the Synchronizer, based on access permissions. To minimize cryptography overhead, we encrypt only rows "received" from other nodes, while rows owned by the local node are stored in clear form.

Footnote 2: http://en.wikipedia.org/wiki/In-memory_database
Footnote 3: http://www.remote-dba.net/t_in_memory_cohesion_ssd.htm

Well-known open solutions of IMDB are Apache Derby, HyperSQL (HSQLDB) and SQLite. For our implementation, we chose to use the open source HyperSQL rel. 2.0.

A. HyperSQL

HyperSQL (footnote 4) is a pure Java RDBMS. Its strength is, besides its lightness (about 1.3 MB for version 2.0), the capability to run either as a server instance or as a module internal to an application (in-process). A database started "in-process" has the advantage of speed, but it is dedicated only to the containing application (no other application can query the database). For our purposes, we chose server mode. In this way, the database engine runs inside a JVM and will start one or more "in-process" databases, listening for requests from processes on the local machine or on remote computers. For interactions between clients and database server, we can use three different protocols:
• HSQL Server: the fastest and most used. It implements a proprietary communication protocol;
• HTTP Server: it is used when access to the server is limited to HTTP only. It consists of a web server that allows JDBC clients to connect over HTTP;
• HTTP Servlet: like the HTTP Server, but it is used when access to the database is managed by a servlet container or by an application server (e.g. Tomcat). It is limited to using a single database.
There are different types of databases (called catalogs) that can be created with HyperSQL, which differ in the methodology adopted for data storage:
• res: this type of catalog provides for the storage of data in small JAR or ZIP files;
• mem: data is stored completely in the machine's RAM, so there is no persistence of information outside of the application life cycle in the JVM;
• file: data is stored in files residing in the file system of the machine.
In our work we used the last type of catalog. A file catalog can use up to six files on the file system for its operations. The name of these files consists of the name of the database plus a dot suffix. Assuming we have a database called "db_test", the files will be:
• db_test.properties: containing the basic settings of the DB;
• db_test.log: used to periodically save data from the database, to prevent data loss in case of a crash;
• db_test.script: containing the table definitions and other components of the DB, plus the data of non-cached tables;
• db_test.data: containing the actual data of cached tables. It may be absent in some catalogs;
• db_test.backup: containing the compressed backup of the last ".data" file. It may be absent in some catalogs;
• db_test.lobs: used for storing BLOB or CLOB fields.
Besides these files, HyperSQL can connect to CSV files.

Footnote 4: www.hsqldb.org

A client application can connect to a HyperSQL server using the JDBC driver (.NET and ODBC drivers are "in late stages of development"), specifying the type of database to access (file, mem or res). HyperSQL implements the SQL standard both for temporary tables and for persistent ones. Temporary tables (TEMP) are not stored on the file system and their life cycle is limited to the duration of the connection (i.e. of the Connection object).
The visibility of data in a TEMP table is limited to the context of the connection used to populate it. With regard to persistent tables, instead, HyperSQL provides three different types of tables, according to the method used to store the data:
• MEMORY: it is the default option when a table is created without specifying the type. Memory table data is kept entirely in memory, while any change to its structure or contents is recorded in the .log and .script files. These two files are read at the opening of the database to load data into memory. All changes are saved when closing the database. These processes can take a long time in the case of tables larger than 10 MB.
• CACHED: when this type of table is chosen, only part of the data (and related indexes) is stored in memory, thus allowing the use of large tables at the expense of performance.
• TEXT: the data is stored in formatted files such as .csv.
In our work, we use MEMORY tables.

The Loader and the Serializer are the main parts of HyperSQL that we analyzed and modified. They are the mechanisms that load the data from the text files at opening and save it back to them at closing.

1) Loader

We suppose that the client connects to the DBMS using instructions like:

    import java.sql.Connection;
    import java.sql.DriverManager;

    Class.forName("org.hsqldb.jdbcDriver");
    Connection c = DriverManager.getConnection(
        "jdbc:hsqldb:file:myDB", "SA", "");

Since a catalog of file type is used, the static method newSession() of class org.hsqldb.DatabaseManager is called. Its task is to open the database or to connect to it (if it is already open). org.hsqldb.Database is the class that represents the instance of the database in memory, so it is the root of all data structures designed to contain the information of the database.
Once the database is loaded into memory, two fundamental classes are used for the parsing of text files: org.hsqldb.ParserCommand (for management of sessions and statements) and org.hsqldb.Scanner (for the recognition of individual SQL tokens). The class responsible for maintaining the database (related to the session) is org.hsqldb.SessionData, whose main attributes are:

    private final Database database;
    private final Session session;
    PersistentStoreCollectionSession persistentStoreCollection;

PersistentStore is the data structure that contains all rows in a database table. Specifically, it is an interface implemented by different classes depending on the type of table represented: in our case we use MEMORY tables, so the affected class is org.hsqldb.persist.RowStoreAVLMemory.

When the Database object is created, particularly at the invocation of method reopen(), the class org.hsqldb.persist.Logger, which represents the interface for I/O to and from the text files of the database, is instantiated. The starter method of the Logger class is openPersistence(), which opens the specified database (if the database is new, the related text files are created). The class org.hsqldb.persist.Log is instantiated after verifying the integrity of the .properties file. Our focus is on method open() of this class, which checks the status of the Database (whether it was closed properly, whether it was modified, and so on) and then instantiates the class org.hsqldb.scriptio.ScriptReaderText to read the .script file using the method readAll(Session s). The class org.hsqldb.rowio.RowInputTextLog is used to read a single line of the database, and the object that represents a row in the database is the Row object.
Two methods of class ScriptReaderText are invoked:
• readDDL(): reads the DDL statements and initializes a RowInputTextLog for each line read from the .script file;
• readExistingData(): extracts the values of each single line, initializes the row and adds it to the PersistentStore.
Because of the database file structure, we need to look for INSERT statements to find the rows of a table. When one of these statements is encountered, it is managed by the method processStatement(Session s) of the ScriptReaderText class. For each field in the row, it checks whether it is a primary key and determines the data type; then the value of the field is read by the method readData(DataType t) of the RowInputTextLog class.

2) Serializer

The serializer is the module responsible for saving the modified data into the .script and .log files. Changes are initially written to the .log file and moved to the .script file when a shutdown command is issued. Each database table is represented by an instance of class org.hsqldb.Table, comprising: data structures for the management of content, methods for creating a new table, and operations to insert/select rows. When inserting a new row, the method insertSingleRow() of the Table class is invoked; the first step is to create a new Row object for caching data in memory, which is done by the method getNewCachedObject(Session s, Object[] data) of the PersistentStore class. Memory-type tables are kept in a balanced tree structure (AVL) implemented in the class org.hsqldb.persist.RowStoreAVLMemory. Once a node (i.e. the row being inserted) is built and added to the AVL (this operation involves several checks on the contents of the fields and on integrity constraints), HyperSQL writes the row into the buffer and then transfers it to the text file (data is written to the .log file until shutdown of the database). To perform this task, the Logger class uses the method writeInsertStatement(Session s, Table t, Object[] data), and the method writeInsertStatement() of the Log class. Writing to the file is done using the class org.hsqldb.scriptio.ScriptWriterBase (more precisely, in the case of memory-type tables, the ScriptWriterText subclass). The method writeRow(Session s, Table t, Object[] data) of the ScriptWriterText class writes data to a text buffer and, at the end of the procedure, transfers it to the file. The buffer (which is only a byte[]) is encapsulated in the class RowOutputBase (more precisely, in the case of memory-type tables, the RowOutputTextLog subclass), which extends HsqlByteArrayOutputStream and provides methods to transform any type of data for serializing it into the buffer. Once writing to the buffer is completed, the method writeRowOutToFile() of the ScriptWriterText class is used, which calls the method write(byte[] b) of the class OutputStream to write into the output stream of the .log file. When shutting down the database, method writeScript() of the Log class is invoked with the following tasks: creating a temporary file for writing the .script file, loading each element of the database into memory and writing it to the file by executing the flush() of the OutputStream connected to the file.

VI. IMPLEMENTED SOLUTION

A. Client side

On the client side, using IMDBs, we have only two interactions between each local agent and the Synchronizer.

Figure 3. Client's state diagram

Note that the client, after the first communication with the Synchronizer, can run offline. We modified the classes included in the file hsqldb.jar to handle the encryption. The basic idea was to manage encryption in the .log and .script text files. The rows that are owned by the local client are stored in clear text, while the shared rows "granted" by other owners are stored encrypted. The values contained in tables are stored in the form of SQL inserts:

    INSERT INTO table_name(field_1, field_2, …, field_n) VALUES(value_1, value_2, …, value_n)

Earlier, to obtain access control granularity at the field level, we encrypted field by field. This way, the text contained in the database file was of the form:

    INSERT INTO table_name(field_1, field_2, …, field_n) VALUES(pk, encrypted_value_2, …, encrypted_value_n)

The primary key pk must be in clear text, since it is used to retrieve the decrypting keys from the central Synchronizer. We dropped this idea because it requires changing the I/O code for each possible database type and because an attacker may obtain some information such as table, primary key and number of rows.

The current solution is to encrypt the whole row with the AES symmetric algorithm. The encryption overhead is lower than in the previous solution and all information is hidden from curious eyes. To relate the encrypted row (stored locally) to the decrypting key (stored in the remote Synchronizer), we use a new key (id_pending_row). The encrypted row is prefixed by a clear-text header containing the id_pending_row delimited by "$" and "@".
The encrypted value is then stored in a hexadecimal representation, so a generic row is of the form:

    $27@5DAAAED5DA06A8014BFF305A93C957D

1) Load time

At load time, the .script file will contain clear-text and encrypted rows, e.g.:

    INSERT INTO students(id,name) VALUES(12,'Alice');
    INSERT INTO students(id,name) VALUES(31,'Bob');
    $27@5F3C25EE5738DAAAED5DA06A80F305A93C95A
    $45@5DA67ADA06AAED580FA914BF3C953057D387F
    INSERT INTO students(id,name) VALUES(23,'Carol');

The class whose task is reading the file and loading the appropriate data in memory is ScriptReaderText.

Figure 4. ScriptReaderText class' UML

The readLoggedStatement method parses each line of text in the .log or .script files and forwards the result to the processStatement method, which loads data into memory. We changed the readLoggedStatement method to perform a preprocessing step: if it finds a record header (enclosed between $ and @) in the text line, it extracts the id_pending_row_received. Using this id, the client requests the related decoding key from the central Synchronizer, uses it to decrypt the entire text line, and then proceeds with normal HyperSQL management. If the decoding key is unavailable, the text line is temporarily discarded (it is not deleted if the key was not received because of a communication problem with the Synchronizer).

2) Save time

The class ScriptWriterText manages the write operations in the .log and .script files. The affected methods are writeRow and writeRowOutToFile. The former deals with building the string that will be written into the text file (INSERT INTO ....), which corresponds to the in-memory data. A Table instance contains the information about the table structure (table name, field names, types of data, constraints, etc.). The values of the fields are in an array of Object.
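The "$id@HEX" line format and the load-time preprocessing step can be sketched in Java as follows. This is an illustration under stated assumptions, not the modified HyperSQL code: the class EncryptedScriptLine and its methods are hypothetical, AES-GCM with a 12-byte nonce stored at the start of the hexadecimal payload is an assumed mode (the paper only says AES), and the keyLookup function stands in for the request to the Synchronizer.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Optional;
import java.util.function.IntFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the "$id@HEX" script line: encode at save time, preprocess at load time. */
final class EncryptedScriptLine {
    private static final Pattern HEADER = Pattern.compile("^\\$(\\d+)@([0-9A-Fa-f]+)$");
    private static final int IV_LEN = 12;

    /** Save side: encrypt the whole SQL line and prefix it with the clear-text row id. */
    static String encode(int idPendingRow, byte[] aesKey, String sqlLine) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(aesKey, "AES"), new GCMParameterSpec(128, iv));
            byte[] cipherText = aes.doFinal(sqlLine.getBytes(StandardCharsets.UTF_8));
            StringBuilder line = new StringBuilder("$").append(idPendingRow).append('@');
            for (byte b : iv) line.append(String.format("%02X", b & 0xff));
            for (byte b : cipherText) line.append(String.format("%02X", b & 0xff));
            return line.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Load side: returns the SQL text to pass on to the normal loader,
     * or empty when the decoding key is unavailable and the line must be skipped for now.
     */
    static Optional<String> preprocess(String line, IntFunction<byte[]> keyLookup) {
        Matcher header = HEADER.matcher(line);
        if (!header.matches()) {
            return Optional.of(line); // clear-text row owned by the local client
        }
        byte[] key = keyLookup.apply(Integer.parseInt(header.group(1)));
        if (key == null) {
            return Optional.empty(); // skip in memory only; the line stays in the file
        }
        byte[] raw = fromHex(header.group(2));
        if (raw.length < IV_LEN) {
            throw new IllegalArgumentException("truncated encrypted row");
        }
        try {
            Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
            aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, raw, 0, IV_LEN));
            return Optional.of(new String(aes.doFinal(raw, IV_LEN, raw.length - IV_LEN), StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero demo key, for illustration only
        String line = encode(27, key, "INSERT INTO students(id,name) VALUES(31,'Bob')");
        System.out.println(preprocess(line, id -> key).orElse("(skipped)"));
    }
}
```

Clear-text lines pass through unchanged, so the rest of the loader needs no knowledge of encryption. A line whose key cannot be obtained is only skipped in memory, which matches the requirement that it is not deleted from the file.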
The SQL insert is written in a text buffer that is stored in the .script file by the method writeRowOutToFile. Because each table has an id_pending_row_received column, we modified the writeRow method to check whether the row is owned locally or shared by another user. In the latter case (id_pending_row_received not null), the custom writeRowOutToFileCrypto method is used instead of the original writeRowOutToFile method. The writeRowOutToFileCrypto method uses the parameter id_pending_row_received to obtain from the Synchronizer the related symmetric encryption key, which is needed to encrypt the whole buffer. The result is a hexadecimal sequence, which is prefixed by the header containing the id_pending_row_received.

Figure 5. ScriptWriterText class' UML

3) Changes

We can alter the original HyperSQL in three ways:
• subclass the original classes and override the affected methods
• change the original code directly
• use code injection by aspect programming
The first is the cleanest method, but it implies collaborating with the support team of HyperSQL to implement some interfaces containing new methods, so that new features can be added in subclasses. Lacking this collaboration, we had to resort to the second way. The third, which is less invasive and more maintainable, requires having the AspectJ compiler (or equivalent) in the client library. The changes regarded the classes ScriptReaderText and ScriptWriterText only (without changing the classes that use them). These are the classes that deal with I/O to and from the .script file. They are very mature and stable, so we think that a simple substitution of their .class files is sufficient to alter hsqldb.jar.

B. Server side

When a data owner adds or updates a row in the local database, it needs to propagate it to all the related users through a central Synchronizer acting as a mailbox at server side, in the cloud.
The Synchronizer uses a simple database with the following tables:
• Users: containing, among others, the id and public key of each user;
• Pending Rows: containing the rows that are added/modified in the local owner's database, until they are delivered to destination. A unique row_id is automatically assigned to each pending row. Additional information includes: submission date, sender and receiver. The changed row is stored in encrypted form in the field encrypted_row;
• Decrypting keys: containing the keys that are used to decrypt the pending rows. Additional information includes: sender, receiver, expiry date, id_row.

At change time, the owner (client side) must:
• serialize the row;
• generate a symmetric key to encrypt it;
• encrypt the row;
• encrypt the key using the public keys of the receivers;
• send the encrypted row and the decoding keys to the receivers.

Because we store the serialized row, we need not worry about column data types. The Synchronizer uses RMI to expose its services to clients. The services are grouped in three interfaces:
• KeyInterface, with methods related to encryption keys: depositKey, deleteDecriptingKey, getDecriptingKeyByIdPendingRow, getPublicKeyByUser;
• SynInterface, with methods for sharing the rows: sendRow, getPendingRowForUser, getAllUsers, resendRow;
• RegistrationInterface, to register and manage users: registerUser, SelectUserById, selectUserByIdAndPassword.

VII. PERFORMANCES

In contrast to the usual row-level encryption, which needs encryption/decryption at every data access, our solution uses these heavy operations only when communicating with the Synchronizer, with a clear advantage, especially in the case of rarely modified databases.

A. Read operations

The system uses decryption only at start time, when records are loaded from the disk into the main memory.
Each row is decrypted either never (if it is owned by the local node) or just once (if it is owned by a remote node), so this is optimal for read operations. Each decryption implies an access to the remote Synchronizer to download the related decrypting key and, possibly, the modified row.

B. Write operations

Write operations occur when a record is inserted or updated in the database, with no overhead until the client, when online, explicitly synchronizes data with the central server. At that moment, for each modified record, the client needs to:
• generate a new (symmetric) key
• encrypt the record
• dispatch the encrypted data and the decrypting key to the remote Synchronizer

C. Benchmark

The test application we wrote uses our modified HyperSQL driver and interacts with the other clients through our Synchronizer. It performs these distinct activities:
• Creation of database and sample tables
• Population of tables with sample values
• Sharing of a portion of data with another user
• Receipt of shared dossiers from other users
• Opening of the newly created (and populated) database

The application receives three parameters:
• Number of dossiers
• Number of clients involved in sharing
• Percentage of shared dossiers

To minimize communication delay, the central Synchronizer and the clients ran on the same computer. For testing purposes, it was sufficient to use only two clients (to enable data sharing). The application was compared with an equivalent one having the following differences:
• It uses the unmodified HyperSQL driver
• It doesn't share data with other clients
• When populating the database, it creates the same number of dossiers as the previous application; after benchmarking, however, it adds the number of shared dossiers, resulting in the same final number of dossiers.
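The owner-side steps listed for change time and for write operations (serialize the row, generate a symmetric key, encrypt the row, encrypt the key with each receiver's public key) amount to a hybrid-encryption envelope. The following Java sketch illustrates them under stated assumptions: the class RowSharing, its methods and the Envelope type are hypothetical names, the row is assumed to be serialized as text, and AES-GCM and RSA-OAEP are chosen only for illustration, since the text does not name the algorithms. The actual dispatch to the Synchronizer via RMI is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of the owner-side steps; names and algorithms are assumptions. */
final class RowSharing {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";

    /** What would be sent to the Synchronizer: one encrypted row, one wrapped key per receiver. */
    static final class Envelope {
        final byte[] iv;
        final byte[] encryptedRow;
        final Map<String, byte[]> wrappedKeys;   // receiver id -> row key encrypted for that receiver

        Envelope(byte[] iv, byte[] encryptedRow, Map<String, byte[]> wrappedKeys) {
            this.iv = iv;
            this.encryptedRow = encryptedRow;
            this.wrappedKeys = wrappedKeys;
        }
    }

    /** Owner side: encrypt the serialized row once, then wrap the row key for every receiver. */
    static Envelope seal(String serializedRow, Map<String, PublicKey> receivers) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            SecretKey rowKey = generator.generateKey();      // fresh symmetric key per modified row

            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.ENCRYPT_MODE, rowKey, new GCMParameterSpec(128, iv));
            byte[] encryptedRow = aes.doFinal(serializedRow.getBytes(StandardCharsets.UTF_8));

            Map<String, byte[]> wrappedKeys = new LinkedHashMap<>();
            for (Map.Entry<String, PublicKey> receiver : receivers.entrySet()) {
                Cipher rsa = Cipher.getInstance(RSA);
                rsa.init(Cipher.ENCRYPT_MODE, receiver.getValue());
                wrappedKeys.put(receiver.getKey(), rsa.doFinal(rowKey.getEncoded()));
            }
            return new Envelope(iv, encryptedRow, wrappedKeys);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot seal row", e);
        }
    }

    /** Receiver side: unwrap the row key with the private key, then decrypt the row. */
    static String open(Envelope envelope, String receiverId, PrivateKey privateKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.DECRYPT_MODE, privateKey);
            byte[] rowKey = rsa.doFinal(envelope.wrappedKeys.get(receiverId));
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(rowKey, "AES"),
                    new GCMParameterSpec(128, envelope.iv));
            return new String(aes.doFinal(envelope.encryptedRow), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot open row", e);
        }
    }

    /** Demo helper: creates a key pair such as the one whose public half a user would register. */
    static KeyPair newReceiverKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot create key pair", e);
        }
    }
}
```

The row is encrypted only once regardless of the number of receivers; only the short symmetric key is encrypted per receiver, which keeps the asymmetric work small and is consistent with the remark that the per-record cost is one symmetric encryption at write time.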
We benchmarked the system using single-table dossiers of about 200 bytes, in two batteries of tests: the first with 20%, and the second with 40% of shared dossiers, which numbered from 1,000 to 500,000. The results are represented by the graphs in Fig. 6-8. It is worth noting that the overhead percentage of the modified solution rapidly decreases (with 100,000 dossiers it is around 10%), both in the first battery of tests (Fig. 6) and in the second (Fig. 7). In the tests, the total delay (load + create + populate + receive) is linear in the number of dossiers and is limited, even with a huge number of dossiers (Fig. 8). Local results can be slightly altered by external events that cannot be prevented (e.g., the garbage collector).

D. Results

The delay of the system is tightly bound to the communication effort with the central Synchronizer. Computing overhead is limited to just one encryption per record at write time and no more than one decryption per record at read time. Since we use symmetric encryption, these operations are very fast. The benchmark demonstrates that the delay is substantially concentrated in database opening, while the subsequent use does not involve additional delays compared to the unmodified version.

Figure 6. Overhead when 20% of dossiers are shared (x-axis: number of dossiers, 0 to 500,000; y-axis: overhead, 0% to 120%; series: diff create + receive (perc), diff load)

Figure 7. Overhead when 40% of dossiers are shared (x-axis: number of dossiers, 0 to 500,000; y-axis: overhead, 0% to 120%; series: diff create + receive (perc), diff load)

Figure 8. Total delay

VIII. CONCLUSION

In this paper, using IMDBs, we presented a simple solution to row-level encryption of databases.
It can be used in the cloud to manage very granular access rights in a highly distributed database. This allows for stronger confidence in the privacy of shared sensitive data. An interesting field of application is the use in (business) cooperative environments, e.g. professional networks. In these environments, privacy is a priority, but low computing resources don't allow the use of slow and complex algorithms. IMDBs and our smart encryption, instead, achieve the goal in a more effective way.

IX. FUTURE WORK

We want to test the system with a large population of users in the cloud. We are working to reduce the number of communications between the local nodes and the Synchronizer using a form of group encryption. We are going to compare the complexity of the naïve solution with the group encryption effort, to evaluate which parameters affect the performance of the two alternatives.

ACKNOWLEDGMENT

We wish to thank Ernesto Damiani, professor at Università di Milano, for helpful comments. Marco Di Paola contributed within his graduation dissertation to the design and implementation of this application.

REFERENCES

[1] E. Damiani and F. Pagano, “Handling confidential data on the untrusted cloud: an agent-based approach,” Cloud Computing 2010, pp. 61-67
[2] S. De Capitani di Vimercati, S. Foresti and P. Samarati, “Recent Advances in Access Control,” in Handbook of database security: applications and trends, M. Gertz and S. Jajodia, Springer, 2008, pp. 1-26
[3] R. Sandhu, E. Coyne, H. Feinstein, and C. Youman, “Role-Based Access Control Models,” IEEE Computer, Vol. 29, Num. 2, February 1996, pp. 38-47
[4] L. Bouganim and Y. Guo, “Database encryption,” in Encyclopedia of Cryptography and Security, Springer, 2010, 2nd Edition
[5] E. Damiani, S. De Capitani Vimercati, S. Jajodia, S. Paraboschi, and P. Samarati, “Balancing confidentiality and efficiency in untrusted relational dbms,” Proceedings of the 10th ACM conference on Computer and communications security, ACM, 2003, pp. 93-102
[6] U. Mattsson, “Database Encryption-How to Balance Security with Performance,” ITtoolbox Database, July 2005
[7] E. Shmueli, R. Vaisenberg, Y. Elovici, C. Glezer, “Database encryption: an overview of contemporary challenges and design considerations,” SIGMOD Record 38 (3), 2009, pp. 29-34
[8] E. Peterson, “An Overview of Cryptographic Systems and Encrypting Database Data,” Knol. 2008, http://knol.google.com/k/erich-peterson/an-overview-of-cryptographic-systems/226vto3111gt2/2.
[9] H. Garcia-Molina, K. Salem, “Main Memory Database Systems: An Overview,” IEEE Trans. Knowl. Data Eng. 4 (6), 1992, pp. 509-516
[10] E. Damiani, S. De Capitani di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati, “Key management for multi-user encrypted databases,” StorageSS, 2005, pp. 74-83
[11] D. Boneh and M. Hamburg, “Generalized Identity Based and Broadcast Encryption Schemes,” ASIACRYPT, 2008, pp. 455-470
[12] V. Goyal, A. Jain, O. Pandey and A. Sahai, “Bounded Ciphertext Policy Attribute Based Encryption,” ICALP, 2008, pp. 579-591
[13] A. Fiat and M. Naor, “Broadcast Encryption,” CRYPTO, 1993, pp. 480-491
[14] http://it.toolbox.com/wiki/index.php/In-Memory_Database
