Vcache: Caching Dynamic Documents

The traditional web caching is currently limited to static documents only. A page generated on the fly from a server side script may have different contents on different accesses and hence cannot be cached. A number of proposals for attacking the pro…

Authors: Vipul Goyal, Sugata Sanyal, Dharma P. Agrawal

Vipul Goyal (1), Department of Computer Science & Engg, Institute of Technology, Banaras Hindu University, Varanasi-221005, India, vipulg@cpan.org
Sugata Sanyal, School of Technology & Computer Science, Tata Institute of Fundamental Research, Mumbai-400005, India, sanyal@tifr.res.in
Dharma P. Agrawal, Center for Distributed and Mobile Computing, ECECS, University of Cincinnati, Cincinnati, OH 45221-0030, USA, dpa@ececs.uc.edu

(1) Author to whom all communication should be directed

Abstract: The traditional web caching is currently limited to static documents only. A page generated on the fly from a server side script may have different contents on different accesses and hence cannot be cached. A number of proposals for attacking the problem have emerged based on the observation that different instances of a dynamic document are usually quite similar in most cases, i.e. they have a lot of common HTML code. In this paper, we first review these related techniques and show their inadequacy for practical use (see section 2). We then present a general and fully automatic technique called Vcache based on the decomposition of dynamic documents into a hierarchy of templates and bindings. The technique is designed keeping in mind languages like Perl and C etc. that generate the documents using low-level print-like statements. These languages together account for the largest number of dynamic documents on the web.

I. INTRODUCTION

The World Wide Web is comprised of static documents and dynamic documents. One central aspect of the development of the WWW during the last decade is the increasing use of dynamic documents [1, 16]. More and more HTML documents are dynamically generated on the fly using a server side script, which are commonly written in Perl, C, Php, Asp and Jsp etc.
Traditional caching fails for dynamic documents, since every instance is generated on the fly by a server-side script and generally cannot be assumed to be the same as the previous instance. Because a dynamic document instance must be downloaded in its entirety for each request, the network bandwidth consumed and, more importantly, the response time are substantial.

The motivation for our caching scheme is the observation that different instances of a dynamic document may differ only slightly in content and usually contain a number of sections of common HTML code. The central idea of Vcache is to decompose the dynamic document into a hierarchy of templates and bindings. The templates can then be cached at the client side, while the bindings are supplied by the server separately for each instance of that dynamic document. This decomposition of dynamic documents is automatic. It is done by software running at the server, using a branch flow statistics technique. We call this server software the fragmentor from here on.

II. RELATED WORKS

The techniques labelled "dynamic document caching" can be broadly classified into three categories: server based, proxy based and client based. Though a number of server-based and proxy-based techniques are available (e.g. [11, 6, 9, 5, 14]), not many are client based. The technique we propose here is a client-based technique, as are e.g. [7, 12].

Delta encoding [10] is based on the observation that most dynamically constructed documents have many fragments in common with earlier versions. Instead of transferring the complete document, a delta is computed representing the changes compared to some common base. Using a cache proxy, the full document is regenerated near the client.
A drawback of delta encoding, in addition to requiring specialized proxies, is that it necessitates protocols for the management of past versions. Such intrusions can obviously limit widespread use. Furthermore, it does not help with repetitions within a single document.

In the <bigwig> system [12], dynamic HTML documents are composed of higher-order templates that are plugged together to construct complete documents. A service transmits not the full HTML document but instead a compact JavaScript recipe for a client-side construction of the document, based on a static collection of fragments that can be cached by the browser in the usual manner. This technique exploits the template mechanism in the language and hence cannot be extended to other languages without an inbuilt template mechanism, like Perl, C and Jsp.

HPP [7] is an HTML extension that allows an explicit separation between the static and dynamic parts of a dynamically generated document. The static parts of a document are collected in a template file, while the dynamic parameters are in a separate binding file. The template file can contain simple instructions, akin to embedded scripting languages such as ASP, PHP, or JSP, specifying how to assemble the complete document. In the HPP system, the document construction must be explicitly programmed and each document must be manually divided into templates and bindings. The HPP system would require every dynamic document already present on the web to be reprogrammed in order to benefit from caching. Future documents would also have to be programmed keeping in mind the HPP caching system and document construction at the client side.

Our solution does not suffer from these drawbacks. The division of dynamic documents into templates and bindings is fully automatic and applies to existing documents as well, without demanding any changes to them.
The programmer need not be aware of the document division/construction or the HTML extension. Vcache requires extra functionality from clients and the server, as in HPP. The client functionality can be in the form of either cache proxies or browser plug-ins (for document construction using templates and bindings). The server software should be modified to include the fragmentor. No extension to the HTTP protocol, however, is required.

III. THE CACHING SCHEME

3.1 Templates and Bindings

First we define templates and bindings.

Definition 1 (Template) A template is a cacheable regular HTML file having gaps or discontinuities in it. Apart from the regular HTML code, it may contain the following new tags- 1) 2) and

Definition 2 (Binding) A binding is a non-cacheable section of code enclosed between and tags. The tag specifies the template to which the binding belongs. The enclosed code may contain the following apart from the regular HTML code- 1) and tags 2) and tags 3) and tags where n is a positive integer not equal to zero. 4) Another binding

These definitions and the meaning of the new tags will be clear from the definition of the Plug operator (see next subsection).

3.2 Operators required at the client side

As described before, our solution requires extra functionality from the client, as in HPP. This can be in the form of cache proxies, Java applets or browser plug-ins. The client is required to support a set of operators, which are defined below.

Definition 3 (The GenerateList operator) The GenerateList operator takes a binding as a parameter and returns a list of URLs of templates. These are the templates that are referred to in that binding and hence will be required for document construction. This operator is essentially a recursive parser, due to the possibility of another binding in a binding (see definition 2).
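
A minimal Python sketch of the GenerateList idea follows. It is an illustration under stated assumptions, not the Vcache implementation: the tag name `binding`, its `template` attribute and the `fill` tag are hypothetical stand-ins, since the concrete tag syntax is not given here. The sketch walks a binding, including bindings nested inside it, and collects each referenced template URL once.

```python
from html.parser import HTMLParser
from typing import List


class _TemplateCollector(HTMLParser):
    """Collects template URLs from (possibly nested) binding tags.

    Assumption: a binding is written as <binding template="URL">...</binding>.
    This syntax is hypothetical and only serves the illustration.
    """

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        # The parser reports every start tag, so nested bindings are
        # visited as well, which covers the "binding in a binding" case.
        if tag == "binding":
            url = dict(attrs).get("template")
            if url and url not in self.urls:
                self.urls.append(url)


def generate_list(binding: str) -> List[str]:
    """Return the template URLs referred to in a binding, without repeats."""
    collector = _TemplateCollector()
    collector.feed(binding)
    collector.close()
    return collector.urls


example_binding = (
    '<binding template="/t/page.tpl">'
    "<fill>Hi</fill>"
    '<fill><binding template="/t/row.tpl"><fill>1</fill></binding>'
    '<binding template="/t/row.tpl"><fill>2</fill></binding></fill>'
    "</binding>"
)
print(generate_list(example_binding))
```

The list keeps first-seen order, so the outermost template comes first and a template used by several nested bindings is requested only once.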
Definition 4 (The FetchList operator) The FetchList operator accepts the template URL list generated by the GenerateList operator. It then checks the browser cache and, in case a subset of these templates is not already present in the cache, it fetches and saves that subset to the cache.

Definition 5 (The Plug operator) The Plug operator accepts a binding and a template as parameters and returns regular HTML code constructed from them. The Plug operation proceeds according to the following simple rules:
1) Every tag in the template is replaced by a HTML code section enclosed between and tags in the binding. This replacing is sequential.
2) The and tags and the enclosed code in the template are replaced by the HTML code generated by plugging the inside code separately with each run of the loop specified in the binding. The nth run of the loop is specified by the code enclosed inside and tags in the binding, and all the runs together are enclosed between and tags (see Figure 1).
3) If a binding encloses more binding(s) (which may again enclose more binding(s) and so on) within the and tags, these tags and the binding(s) inside are replaced by the code generated after applying the Plug operator recursively to that binding and the template specified in the tag.
The illustration given (see Figure 1) should make the Plug operator clear.

3.3 The actual caching mechanism

The fragmentor produces a set of templates from the dynamic document at the server (see next subsection). These templates are analogous to static documents and can be cached by the client. For every access to the dynamic document, a binding is generated at the server and is passed to the client. The binding is specific to that particular access and hence is non-cacheable. The client then fetches the un-cached templates, if any (using the GenerateList and FetchList operators). The binding and its template (as in tag .
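
The FetchList and Plug operators defined above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: a binding is modelled as an already parsed object rather than as markup, the gap marker `<gap/>` is a hypothetical stand-in for the real template tag, and only rules 1 and 3 of the Plug operator are covered (the loop rule is left out). The names `Binding`, `fetch_list` and `plug` are ours.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

GAP = "<gap/>"  # hypothetical gap marker, not the actual Vcache tag


@dataclass
class Binding:
    """A parsed binding: the template it belongs to and its gap contents."""
    template_url: str
    fills: List[Union[str, "Binding"]] = field(default_factory=list)


def fetch_list(urls: List[str], cache: Dict[str, str],
               fetch: Callable[[str], str]) -> List[str]:
    """Fetch only the templates missing from the cache; return what was fetched."""
    missing = [url for url in urls if url not in cache]
    for url in missing:
        cache[url] = fetch(url)
    return missing


def plug(binding: Binding, cache: Dict[str, str]) -> str:
    """Fill the gaps of the binding's template sequentially (rule 1),
    plugging nested bindings recursively (rule 3)."""
    parts = cache[binding.template_url].split(GAP)
    if len(parts) - 1 != len(binding.fills):
        raise ValueError("binding does not match the gaps in its template")
    out = [parts[0]]
    for fill, tail in zip(binding.fills, parts[1:]):
        out.append(plug(fill, cache) if isinstance(fill, Binding) else fill)
        out.append(tail)
    return "".join(out)


# Usage with an in-memory stand-in for the server's template store.
server = {
    "/t/page.tpl": "<html><h1><gap/></h1><ul><gap/></ul></html>",
    "/t/item.tpl": "<li><gap/></li>",
}
cache: Dict[str, str] = {}
first_fetch = fetch_list(["/t/page.tpl", "/t/item.tpl"], cache, server.__getitem__)
second_fetch = fetch_list(["/t/page.tpl", "/t/item.tpl"], cache, server.__getitem__)
page = plug(Binding("/t/page.tpl", ["News", Binding("/t/item.tpl", ["first"])]), cache)
print(page)
```

On the second request nothing is fetched, so only the binding travels over the network; this is the saving the scheme aims for.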
On encountering a branching decision (if statement etc.), a tag is inserted in the template and separate templates are constructed recursively for every possible branch at that point. This approach is illustrated in Figure 2. However, even for moderately sized programs, this brute force approach of template generation may produce a large number of templates. The number of templates produced is equal to the number of branch flow possibilities of the program. This places an unnecessary burden on the system, as some of the templates are rarely used (e.g. error/exception handling branches) and some others may be too small in size. We therefore take a variant of this approach to optimize the system.

The fragmentor (after being installed on the server) first gathers the branch flow statistics of the dynamic documents on the server by analysing the client requests and the execution of server scripts (dynamic documents) to provide the response. After collecting meaningful statistics (say for n runs of the script), the fragmentor generates one template each for all the dominating branch sequences/subsequences, instead of generating a separate template for every possible branch. Hence, effectively, templates for rarely taken branches are not generated at all, while the templates for a popular branching sequence get merged into a single template (see Figure 3). Also, the fragmentor does not generate templates whose size is below a lower limit (say 50 bytes) or duplicate templates (inter-document optimization).

We are currently working on a detailed design for the fragmentor using various optimizations and on implementing it for the Perl language. The implementation is yet to be finished; hence the formal performance metrics for our technique are currently unavailable.
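
The selection step described above can be sketched in Python as follows. This is a simplified illustration, not the fragmentor's design: it assumes each observed run is recorded as a pair of the branch sequence taken and the static template text for that sequence, it treats a sequence as "dominating" when its share of runs reaches a hypothetical threshold `min_share`, and it does not attempt the merging of subsequences. The 50-byte lower limit is the example value mentioned in the text.

```python
from collections import Counter
from typing import Dict, List, Tuple

BranchSeq = Tuple[str, ...]


def select_templates(runs: List[Tuple[BranchSeq, str]],
                     min_share: float = 0.2,
                     min_bytes: int = 50) -> Dict[BranchSeq, str]:
    """Pick the templates worth generating from recorded runs.

    runs      -- (branch sequence, template text) per observed execution
    min_share -- assumed cut-off for a "dominating" sequence (hypothetical)
    min_bytes -- lower size limit for a template
    """
    counts = Counter(seq for seq, _ in runs)
    # Assumption: the static text is the same for every run of a sequence.
    text_for = {seq: text for seq, text in runs}
    selected: Dict[BranchSeq, str] = {}
    seen_texts = set()
    for seq, count in counts.most_common():
        text = text_for[seq]
        if count / len(runs) < min_share:
            continue  # rarely taken branch sequence, e.g. error handling
        if len(text.encode("utf-8")) < min_bytes:
            continue  # template too small to be worth caching
        if text in seen_texts:
            continue  # duplicate template
        seen_texts.add(text)
        selected[seq] = text
    return selected


# Ten recorded runs: one popular path, one rare error path, one tiny template.
runs = (
    [(("login_ok", "has_items"), "<ul>" + "x" * 60 + "</ul>")] * 8
    + [(("login_fail",), "<p>" + "e" * 60 + "</p>")]
    + [(("login_ok", "empty"), "<p/>")]
)
chosen = select_templates(runs)
print(list(chosen))
```

With these inputs only the popular path survives: the error path falls below the share threshold and the tiny template below the size limit.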
With a well-written, optimizing fragmentor, however, the performance can be expected to be on par with techniques like <bigwig> and HPP, while being general and automatic at the same time.

IV. CONCLUSION

We have proposed a caching technique, which we call Vcache, to extend the existing client-side caching mechanism to cover dynamic documents. Our approach requires changes to the server and the client software. No change to the HTTP protocol is required. Our approach is general and fully automatic and can be used for dynamic documents designed in languages like Perl, C, Jsp etc. The template hierarchy is generated using the real run-time data of the server scripts. With our technique, the programmer need not be aware of the caching issues, since the decomposition of the dynamic documents into templates and bindings is automatic.

V. REFERENCES

[1] Paul Barford, Azer Bestavros, Adam Bradley, and Mark Crovella. Changes in web client access patterns: Characteristics and caching implications. World Wide Web Journal, 2(1–2):15–28, January 1999. Kluwer.
[2] Greg Barish and Katia Obraczka. World Wide Web caching: Trends and techniques. IEEE Communications Magazine Internet Technology Series, May 2000.
[3] Charles Brooks, Murray S. Mazer, Scott Meeks, and Jim Miller. Application-specific proxy servers as HTTP stream transducers. In Proceedings of the Fourth International WWW Conference, December 1995.
[4] Claus Brabrand, Anders Møller, and Michael I. Schwartzbach. The <bigwig> project. ACM Transactions on Internet Technology, 2002.
[5] Pei Cao, Jin Zhang, and Kevin Beach. Active cache: Caching dynamic contents on the Web. In Proceedings of the 1998 Middleware conference, 1998.
[6] Jim Challenger, Paul Dantzig, and Arun Iyengar. A scalable system for consistently caching dynamic web data. In Proceedings of the 18th Annual Joint Conference of the IEEE Computer and Communications Societies, March 1999.
[7] Fred Douglis, Antonio Haro, and Michael Rabinovich. HPP: HTML macropreprocessing to support dynamic document caching. In Proceedings of the 1997 Usenix Symposium on Internet Technologies and Systems (USITS-97), December 1997.
[8] J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext transfer protocol, HTTP/1.1.
[9] Arun Iyengar and Jim Challenger. Improving web server performance by caching dynamic data. In USENIX Symposium on Internet Technologies and Systems, December 1997.
[10] Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander Krishnamurthy. Potential benefits of delta encoding and data compression for HTTP. In SIGCOMM, pages 181–194, 1997.
[11] Karthick Rajamani and Alan Cox. A simple and effective caching scheme for dynamic content. Technical report, CS Dept., Rice University, September 2000.
[12] Claus Brabrand, Anders Møller, Steffan Olesen, and Michael I. Schwartzbach. Language-Based Caching of Dynamically Generated HTML. To appear in World Wide Web Journal. Available from http://www.brics.dk/bigwig/publications/caching.pdf
[13] Anders Sandholm and Michael I. Schwartzbach. A type system for dynamic Web documents. In Principles of Programming Languages (POPL'00). ACM, 2000.
[14] Ben Smith, Anurag Acharya, Tao Yang, and Huican Zhu. Exploiting result equivalence in caching dynamic web content. In USENIX Symposium on Internet Technologies and Systems, 1999.
[15] Mark Tsimelzon, Bill Weihl, and Larry Jacobs. ESI language specification 1.0. http://www.edge-delivery.org/language_spec_1-0.html, 2001.
[16] Alec Wolman. Characterizing web workloads to improve performance, July 1999. University of Washington.
[17] Jia Wang. A survey of web caching schemes for the Internet. ACM Computer Communication Review, 29(5):36–46, October 1999.
