Malleable Coding: Compressed Palimpsests

Lav R. Varshney, Graduate Student Member, IEEE, Julius Kusuma, Member, IEEE, and Vivek K Goyal, Senior Member, IEEE

Abstract: A malleable coding scheme considers not only compression efficiency but also the ease of alteration, thus encouraging some form of recycling of an old compressed version in the formation of a new one. Malleability cost is the difficulty of synchronizing compressed versions, and malleable codes are of particular interest when representing information and modifying the representation are both expensive. We examine the trade-off between compression efficiency and malleability cost under a malleability metric defined with respect to a string edit distance. This problem introduces a metric topology to the compressed domain. We characterize the achievable rates and malleability as the solution of a subgraph isomorphism problem. This can be used to argue that allowing conditional entropy of the edited message given the original message to grow linearly with block length creates an exponential increase in code length.

Index Terms: data compression, distributed databases, concurrency control, Gray codes, subgraph isomorphism

This work was supported in part by the NSF Graduate Research Fellowship, Grant CCR-0325774, and Grant CCF-0729069. L. R. Varshney is with the Department of Electrical Engineering and Computer Science, the Laboratory for Information and Decision Systems, and the Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: lrv@mit.edu). J. Kusuma is with Schlumberger Technology Corporation, Sugar Land, TX 77478 USA (e-mail: kusuma@alum.mit.edu). V. K. Goyal is with the Department of Electrical Engineering and Computer Science and the Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: vgoyal@mit.edu).
August 22, 2018 DRAFT

I. INTRODUCTION

The source coding theorem for block codes is obtained by calculating the number of typical source sequences and generating a set of labels to enumerate them. Asymptotically almost surely (a.a.s.), only typical sequences will occur, so it is sufficient that the set of labels be as large as the set of typical sequences; this yields the achievable entropy bound. As Shannon comments,¹ "The high probability group is coded in an arbitrary one-to-one way into this set," and so in this sense there is no notion of topology of typical sequences. If one is concerned with zero error rather than a.a.s. negligible error, the source coding theorem for variable-length codes also yields the entropy as an achievable lower bound. In this setting, the mapping from source sequences to labels is not allowed to be quite as arbitrary; however, as long as an optimizing set of code lengths is correctly matched to source letters, there are still some arbitrary choices in an optimal construction [2].

In contrast to these well-known settings, we investigate the mapping from the source to its compressed representation, motivated by the following problem. Suppose that after compressing a source X_1^n, it is modified to become Y_1^n according to a memoryless editing process p_{Y|X}. A malleable coding scheme preserves some portion of the codeword of X_1^n and modifies the remainder into a new codeword from which Y_1^n may be decoded reliably. There are several ways to define how one preserves some portion of the codeword of X_1^n. Here we concentrate on a malleability cost defined by a normalized edit distance in the compressed domain. This is motivated by systems where the old codeword is stored in a rewritable medium; cost is incurred when a symbol has to be changed in value, regardless of the location.
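This cost model can be made concrete in code. The sketch below is illustrative only; it assumes (our assumption, for the sake of a runnable example) that the "extended Hamming distance" used later in the paper counts positionwise symbol mismatches plus any difference in length:

```python
def extended_hamming(a: str, b: str) -> int:
    """Symbol-change cost between two stored strings: positionwise
    mismatches over the overlapping prefix, plus the symbols present
    in only one of the strings (an assumed extension of Hamming
    distance to unequal lengths)."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))

print(extended_hamming("0101", "0001"))    # → 1 (one rewritten symbol)
print(extended_hamming("0101", "010110"))  # → 2 (two appended symbols)
```

Under this cost, rewriting a symbol in place and appending a new symbol are equally expensive, which matches the rewritable-medium motivation above: cost depends on how many symbols change, not on where they sit.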
Recalling the ancient practice of scraping and overwriting parchment [3], we call the storage medium a compressed palimpsest and the characterization of the trade-offs the palimpsest problem. A companion paper [4] focuses on a distinct problem with a similar motivation. There, we fix a part of the old codeword to be recycled in creating a codeword for Y_1^n. Without loss of generality, the fixed portion can be taken to be the beginning of the codeword, so the new codeword is a fixed prefix followed by a new suffix. This formulation is suitable for applications in which the update information (new suffix) must be transmitted through a communication channel. If the locations of the changed symbols were to be arbitrary, one would need to assign a cost to the indexing of the locations.

The main result for the palimpsest problem is a graphical characterization of achievable rates and number of editing operations. The result involves the solution to the error-tolerant attributed subgraph isomorphism problem [5], which is essentially a graph embedding problem. Although graph functionals such as independence number [6] and chromatic number [7]² often arise in the solution of information theory problems, this seems to be the first time that the subgraph isomorphism problem has arisen. Moreover, this seems to be the first treatment of the source code as a mapping between metric spaces.

Several of the results we obtain are pessimistic. Unless the old source and the new source are very strongly correlated, a large rate penalty must be paid in order to have minimal malleability cost. Similarly, a large malleability cost must be incurred if the rates are required to be close to entropy.

¹ From [1], with emphasis added.

Outline and Preview: The remainder of the paper is organized as follows.
In Section II, we present a few toy examples of coding methods that exhibit a large range of possible trade-offs. Section III provides additional motivation and context for our work. Section IV then provides a formal problem statement, and constructive coding techniques paralleling those previewed in Section II are developed precisely in Section V. In Section VI, graph embedding techniques are used to specify achievable rate–malleability points. In particular, Section VI-A deals with Hamming distance as the editing cost and proposes a construction using Gray codes. Lower bounds and constructive examples using letter-by-letter encoding and decoding are given. This graph embedding approach is generalized in Section VI-C to include other edit distances via generalized minimal change codes. While the above delay-free encoding and decoding gives optimal results for a few special cases, we consider a more general coding approach in Section VII, considering both variable-length and block codes. In the latter case, we show that the topology of typical sequences plays an important role in our problem. Using graph-theoretic ideas, we give an achievability result in Theorem 2. Further, in Theorem 3 we argue that a linear reduction in malleability comes at exponential cost in compression efficiency, consistent with the examples given in Section II. This theorem is proved for "stationary editing distributions," though we believe it to be true for general distributions. In Theorem 4, we give an upper bound on malleability cost using the Lipschitz constant of the source code mapping for general distributions. Section VIII provides some final observations on the trade-off between malleability cost and compression efficiency, gives some conclusions, and discusses future work.
² The chromatic number of a graph can be related to its genus (which is defined by the topological embedding of the graph into closed, oriented surfaces [8], [9]); however, our interest is in metric graph embedding rather than topological graph embedding.

Fig. 1 (axes: compressed size vs. editing cost). Qualitative representation of the four simple techniques of Section II. For ease of representation, it is assumed that H(X) = H(Y). The relative orderings of points are based on H(Z) ≪ H(X); this reflects the natural case where the editing operation is of low complexity relative to the original string.

II. SIMPLE EXAMPLES

To motivate this exposition prior to defining all quantities precisely, we begin by giving four examples of how one can trade off between compression efficiency and malleability. Let X, Y, and Z be binary variables with entropies H(X), H(Y), and H(Z), respectively. Suppose that the original observation is a word X_1^n. After compressing X_1^n, the original source is modified by adding a binary sequence Z_1^n with Hamming weight np to obtain a new word Y_1^n = X_1^n ⊕ Z_1^n. Suppose the storage alphabet is also binary and that the cost of synchronization is measured with the extended Hamming distance. Unlike many source coding problems where only the cardinality of the set of codewords is used, here the alphabet itself is used to measure malleability cost; an abstract set of indices is not appropriate. How might the code for X_1^n and the update mechanism to allow representation of Y_1^n be designed? The four possibilities below are summarized in Fig. 1.

a) No compression: We store n bits for X_1^n.
Hence synchronizing to the new version only requires changing the same number of bits in the code as were changed from X_1^n to Y_1^n; the cost is the Hamming weight of Z_1^n, namely np.

b) Fully compress X_1^n and Y_1^n: We apply Shannon-type compression, storing only nH(X) bits for X_1^n. It seems, however, that a large portion of this old codeword will have to be changed (perhaps about half the bits) to become a representation for Y_1^n. Compression efficiency is obtained at the cost of malleability.

c) Fully compress X_1^n and an increment: Another coding strategy is to compress the change Z_1^n separately and append it to the original compression of X_1^n. The new compression then has length n(H(X) + H(Z)) ≥ nH(Y) bits. The extended Hamming malleability cost is nH(Z) bits.

d) Completely favor malleability over compression: Interestingly, there is a method that dramatically trades compression efficiency for malleability.³ The source X_1^n is encoded with 2^{nH(X)} bits, using an indicator function to denote which of its typical sequences was observed. The same strategy is used to encode Y_1^n, using 2^{nH(Y)} bits. Then synchronization requires changing only two bits when X_1^n and Y_1^n are different.

Our purpose is to study the limits of this interesting trade-off between compression efficiency and malleability. We will do so using formalized performance metrics after a bit more background.

III. BACKGROUND

Our study of malleable compression is motivated by information storage systems that store documents which are updated often. In such systems, the storage costs include not only the average length of the coded signal, but also the costs in updating.
We describe these systems and also discuss an information storage system in synthetic biology, where the editing costs are much more significant and restrictive than in optical or magnetic systems.

A. Version Management

Consider the installation of a security patch to an operating system, the update of a text document after proofreading, the storage of a computer file backup system after a day's work, or a second email that corrects the location of a seminar yet also reproduces the entire seminar abstract. In all of these settings and numerous others, separate data streams may be generated, but the contents differ only slightly [10], [11], [12], [13]. Moreover, in these applications, old versions of the stream need not be preserved. Particularly for devices such as mobile telephones, where memory size and energy are severely constrained, but for any storage system, it is advisable to reduce the space taken by data and also to reduce the energy required to insert, delete, and modify stored data. In certain applications, in-place reconstruction is desired [12], necessitating the use of instantaneous source codes.

³ Due to Robert G. Gallager.

Recursive estimation and control also require temporarily storing state estimates and updating them at each time step. Thus such problems also suggest themselves as application areas for malleable coding. Note that the application of malleable codes would determine how information storage is carried out, not what information is stored and what information is dissipated [14].

In the scenarios discussed, new versions will be correlated with old versions, not independent as assumed in previous studies of write-efficient memories [15], [16].
That is, we envision scenarios which involve updating "Archimedes of Siracusa" with "Archimedes of Syracuse" (Levenshtein distance 2) rather than updating with "Jesus of Nazareth" (Levenshtein distance 15), though the results will apply to the entire gamut of scenarios.

There is also another difference between the problems we formulate and prior work on write-efficient memories. In write-efficient memories, the encoder can look to see what is already stored in the memory before deciding the codeword for the update. An information pattern even more extensive than for write-efficient memories was discussed in [17]. We require the code to be determined before the encoding process is carried out. Such an information pattern would arise naturally in remote file synchronization [13]. Once the codeword of the new version is determined (without access to the realized compressed old version), there may be settings where the differences between the two must be determined in a distributed fashion. For a good malleable code, the old and new codewords will be strongly correlated. Thus, protocols for distributed reconciliation of correlated strings may be used [18], [19], [13].

B. Genetic Coding

With recent advances in biotechnology [20], the storage of artificial messages in DNA strings seems like a real possibility, rather than just a laboratory pipe dream [21]. Thus the storage of messages in the DNA of living organisms as a long-lasting, high-density data storage medium provides another motivating application for malleable coding. Note that although minimum change codes, as we will develop for the palimpsest problem, have been suggested as an explanation for the genetic code through the optimization approach to biology [22], here we are concerned with synthetic biology. As in magnetic or optical storage, and perhaps more so, it is desirable to compress information for storage.
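The Levenshtein distances quoted in Section III-A can be checked with the standard dynamic-programming recurrence for edit distance; a minimal sketch, with insertions, deletions, and substitutions each at unit cost:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to transform string a into string b."""
    # prev[j] holds the edit distance between the current prefix of a
    # and b[:j]; only two rows of the DP table are kept.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[len(b)]

print(levenshtein("Archimedes of Siracusa", "Archimedes of Syracuse"))  # → 2
print(levenshtein("Archimedes of Siracusa", "Jesus of Nazareth"))  # the text reports 15
```

Two substitutions (i → y and a → e) suffice for the first pair, which is why such an update is cheap under edit-distance-based malleability.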
For a palimpsest system, one would use site-directed mutagenesis [23, Ch. 7] to perform editing of stored codewords, whereas for the formulation of malleable coding in [4], molecular biology cloning techniques using restriction enzymes, oligonucleotide synthesis or polymerase chain reaction (PCR), and ligation [23, Ch. 3] would be used. In site-directed mutagenesis, when multiple changes cannot be made using a single primer, the cost of a single insertion, deletion, or substitution is approximately the same and is additive with respect to the number of edits. Using restriction enzyme methods with oligonucleotide synthesis, however, the cost is related to the length of the new segment that must be synthesized to replace the old segment. Thus the biotechnical editing costs correspond exactly to the costs defined in the present paper and in [4]. Unlike magnetic or optical storage, insertion and deletion are natural operations in DNA information storage, thereby allowing variable-length codes to be easily edited. Incidentally, insertion and deletion are also possible in neural information storage through modification of neuronal arbors [24].

IV. PROBLEM STATEMENT

After a few requisite definitions, we will provide a formal statement of the palimpsest problem, which takes editing costs as well as rate costs into account.

The symbols of the storage medium are drawn from the finite alphabet V. Note that unlike most source coding problems, the alphabet itself will be used, not just the cardinality of sequences drawn from this alphabet. Also, it is natural to measure all rates in numbers of symbols from V. This is analogous to using base-|V| logarithms in place of base-2 logarithms, and all logarithms should be interpreted as such. We require the notion of an edit distance [25] on V*, the set of all finite sequences of elements of V.
Definition 1: An edit distance, d(·,·), is a function from V* × V* to [0, ∞), defined by a set of edit operations. The edit operations are a symmetric relation on V* × V*. The edit distance between a ∈ V* and b ∈ V* is 0 if a = b and is the minimum number of edit operations needed to transform a into b otherwise.

An example of an edit distance is the Levenshtein distance, which is constructed from insertion, deletion, and substitution operations. It can be noted that (V*, d) is a metric space (see Appendix A).

Now we can formally define our coding problem. We define the variable-length and block coding versions together, drawing distinctions only where necessary. Symbols are reused so as to conserve notation. It should be clear from context whether we are discussing variable-length or block coding.

Let {(X_i, Y_i)}_{i=1}^∞ be a sequence of independent drawings of a pair of random variables (X, Y), X ∈ W, Y ∈ W, where W is a finite set and p_{X,Y}(x, y) = Pr[X = x, Y = y]. The marginal distributions are

p_X(x) = Σ_{y ∈ W} p_{X,Y}(x, y)   and   p_Y(y) = Σ_{x ∈ W} p_{X,Y}(x, y).

When the random variable is clear from context, we write p_X(x) as p(x) and so on.

A modification channel

p_{Y|X}(y|x) = p(x, y) / p(x)

relates the two marginal distributions. If the joint distribution is such that the marginals are equal, the modification channel is said to perform stationary editing.

Variable-length Codes: A variable-length encoder with block length n is a mapping f_E : W^n → V*, and the corresponding decoder with block length n is f_D : V* → W^n. The encoder and decoder define a variable-length palimpsest code. The encoder and decoder pair is required to be instantaneous, in the sense that the encoding may be parsed as a succession of codewords. A (variable-length) encoder-decoder with block length n is applied as follows.
Let (A, B) = (f_E(X_1^n), f_E(Y_1^n)), inducing random variables A and B that are drawn from the alphabet V*. Also let (X̂_1^n, Ŷ_1^n) = (f_D(A), f_D(B)).

Block Codes: A block encoder for X with parameters (n, K) is a mapping f_E^(X) : W^n → V^{nK}, and a block encoder for Y with parameters (n, L) is a mapping f_E^(Y) : W^n → V^{nL}. Given these encoders, a common decoder with parameter n is f_D : V* → W^n. The encoders and decoder define a block palimpsest code. Since there is a common decoder, the two codes should be in the same format. A (block) encoder-decoder with parameters (n, K, L) is applied as follows. Let (A, B) = (f_E^(X)(X_1^n), f_E^(Y)(Y_1^n)), inducing random variables A ∈ V^{nK} and B ∈ V^{nL}. The mappings are depicted in Fig. 2. Also let (X̂_1^n, Ŷ_1^n) = (f_D(A), f_D(B)).

Fig. 2. Distributions in representation space induced by distributions in source space.

For both variable-length and block coding, we can define the error rate as Δ = max(Δ_X, Δ_Y), where Δ_X = Pr[X_1^n ≠ X̂_1^n] and Δ_Y = Pr[Y_1^n ≠ Ŷ_1^n]. Natural (and completely conventional) performance indices for the code are the per-letter average lengths of the codewords

K = (1/n) E[ℓ(A)]   and   L = (1/n) E[ℓ(B)],

where ℓ(·) denotes the length of a sequence in V*. (In the block coding case, A has a fixed length of nK letters from the alphabet V, so there is no contradiction in using the previously-defined symbol K. Similarly for L.)

The final performance measure captures our novel concern with the cost of changing the compressed version. The malleability cost is the expected per-source-letter edit distance between the codes:

M = (1/n) E[d(A, B)].

Fig. 3.
Commutative diagram for the palimpsest problem.

Definition 2: Given a source p(X, Y) and an edit distance d, a triple (K₀, L₀, M₀) is said to be achievable for the variable-length palimpsest problem if, for arbitrary ε > 0, there exists (for n sufficiently large) a variable-length palimpsest code with error rate Δ = 0, average codeword lengths K ≤ K₀ + ε, L ≤ L₀ + ε, and malleability M ≤ M₀ + ε.

Definition 3: Given a source p(X, Y) and an edit distance d, a triple (K₀, L₀, M₀) is said to be achievable for the block palimpsest problem if, for arbitrary ε > 0, there exists (for n sufficiently large) a block palimpsest code with error rate Δ < ε, average codeword lengths K ≤ K₀ + ε, L ≤ L₀ + ε, and malleability M ≤ M₀ + ε.

For the variable-length palimpsest problem, the set of achievable rate–malleability triples is denoted by P_V; for the block version, the corresponding set is denoted by P_B. It will be our purpose to characterize P_V and P_B as much as possible. It follows from the definition that P_V and P_B are closed subsets of R³ and have the property that if (K₀, L₀, M₀) ∈ P, then (K₀ + ε₁, L₀ + ε₂, M₀ + ε₃) ∈ P for any ε_i ≥ 0, i = 1, 2, 3. Consequently, P_V and P_B are completely defined by their lower boundaries, which too are closed.

Both versions of the palimpsest problem can be viewed using the diagram in Fig. 3. Given p(X, Y), and thus p(X), p(Y), and p(Y|X), the malleability constraint defines what is achievable in terms of p(A, B), with the additional constraints that there must be maps between X_1^n and A, and between Y_1^n and B, which allow for lossless or near-lossless compression. An alternative formulation as the mapping between two metric spaces W^n and V* is also possible.
V. CONSTRUCTIVE PALIMPSEST EXAMPLES

Having formulated the palimpsest problem in Section IV, we present some examples of what can be achieved. These examples revisit Section II. New examples given in Section VI will inspire general statements.

A. Source Coding with No Compression

The simplest compression scheme is one that simply copies the source sequences to the storage medium. This is only possible when W = V. When W ≠ V, zero-error coding without compression is possible with block lengths larger than 1, as in converting hexadecimal digits to binary digits or vice versa. The flexibility in such a mapping can be exploited. If the shortest possible blocking is used and l is the least common multiple of |V| and |W|, then there are l! valid mappings. For the moment, we ignore the gains to be had by exploiting this flexibility and focus on the W = V case, with block length n = 1. Taking A = X and B = Y, it follows immediately that K = 1 and L = 1.⁴ It also follows that the malleability cost is M = E[d(X, Y)]. If we take the edit distance to be the Hamming distance, then M = Pr[X ≠ Y]. Thus the triple (K, L, M) = (1, 1, Pr[X ≠ Y]) is achievable by no compression for any source distribution p(X, Y) under Hamming edit distance.

B. Ignore Malleability

Consider what happens when the malleability parameter is ignored and the rates for the variable-length encoder are optimized. We will improve rate performance and hopefully not worsen malleability too much. If the updating process p_{Y|X} is stationary, then a common instantaneous code may be used to asymptotically achieve K = H(X) and L = H(Y). Picking a single code for different sources has been well-studied in the source coding literature, starting with [26].
If a single source code is used for a collection of distributions, the rate loss over the entropy lower bound is termed the redundancy [27]. As shown by Gilbert, if Huffman or Shannon codes are used, this redundancy is the relative entropy between the source and the random variable used to design the code. Restricting to such instantaneous codes, if the palimpsest code is designed for either p(x) or for p(y), the incurred redundancies are the relative entropies

D(p_X ∥ p_Y) = Σ_{x ∈ W} p_X(x) log [p_X(x)/p_Y(x)]   or   D(p_Y ∥ p_X) = Σ_{y ∈ W} p_Y(y) log [p_Y(y)/p_X(y)],

respectively. These lead to horizontal and vertical portions of a lower bound for P_V in the K–L plane.

⁴ Remember that rates K and L are measured in letters from V, not in bits.

An intermediary portion of this lower bound, between the vertical and horizontal portions, is determined by finding a random variable Z that is between X and Y and designing a code for it. We want to choose some "tilted" distribution, p_Z, on the geodesic between the two distributions p_X and p_Y. If p_Z = (1/2)p_X + (1/2)p_Y, then D(p_X ∥ p_Z) + D(p_Y ∥ p_Z) is called the capacitory discrimination [28]. The rate loss in the balanced rate loss case, D(p_X ∥ p_Z) when D(p_X ∥ p_Z) = D(p_Y ∥ p_Z), has a closed-form expression [29]. The distribution p_Z used to achieve it is halfway (in the asymmetric sense of Y after X) along the geodesic that connects the two distributions. The distance along the geodesic may be parameterized by

t = D(p_Y ∥ p_X) / [D(p_Y ∥ p_X) + D(p_X ∥ p_Y)].

The resulting rate loss for Z_t is

D(p_X ∥ p_{Z_t}) = R(p_X, p_Y) + log μ(t),

where R(p_X, p_Y) is defined through

1/R(p_X, p_Y) = 1/D(p_X ∥ p_Y) + 1/D(p_Y ∥ p_X),

and

μ(t) = Σ_{x ∈ W} p_X^{1−t}(x) p_Y^t(x).

Notice that due to the asymmetry of the relative entropy, this is different from the Chernoff information.
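The closed-form rate loss above can be checked numerically. The sketch below assumes p_{Z_t} is the normalized tilted distribution p_X^{1−t}(x) p_Y^t(x)/μ(t), and uses a pair of hypothetical toy distributions:

```python
import math

def D(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

pX = [0.5, 0.3, 0.2]
pY = [0.2, 0.3, 0.5]

# Geodesic parameter t = D(pY||pX) / (D(pY||pX) + D(pX||pY)).
t = D(pY, pX) / (D(pY, pX) + D(pX, pY))

# Tilted distribution p_{Z_t}(x) = pX(x)^(1-t) * pY(x)^t / mu(t).
tilted = [px ** (1 - t) * py ** t for px, py in zip(pX, pY)]
mu = sum(tilted)                      # normalizer mu(t)
pZt = [w / mu for w in tilted]

# Harmonic-mean quantity R: 1/R = 1/D(pX||pY) + 1/D(pY||pX).
R = 1.0 / (1.0 / D(pX, pY) + 1.0 / D(pY, pX))

# Direct rate loss versus the closed form R + log2(mu(t)).
print(D(pX, pZt), R + math.log2(mu))
```

With this choice of t, one can also verify numerically that D(p_X ∥ p_{Z_t}) = D(p_Y ∥ p_{Z_t}), i.e., the rate loss is balanced between the two versions, as the text asserts.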
In general, the connecting portion between the horizontal and vertical parts of the lower bound is curved below the time-sharing line, determined by the relative entropies D(p_X ∥ p_Z) and D(p_Y ∥ p_Z) for a Z that is along the geodesic connecting the two distributions. Fig. 4 shows an example of this achievable lower bound.

If the restriction to instantaneous codes is removed, then there are several kinds of universal source codes that achieve the K ≥ H(X) and L ≥ H(Y) bounds simultaneously [27], [30]; however, instantaneous codes are required by the palimpsest problem statement. These results say nothing about M; they only deal with K and L. To say something about M, one can show that the average starting overlap is rather small [31]. Since optimal source codes produce equiprobable outputs [32], one might hope that computing M is a matter of measuring the expected edit distance between two random equiprobable sequences [33], but optimizing the dependence between these two sequences is actually the problem to be solved.

Fig. 4. K–L region achievable using instantaneous codes for sources related through non-stationary editing. The marked point is where the rate loss for both versions is balanced. The diagonal line segment shows the suboptimal strategy of time-sharing. (Marked values: H(X) and H(X) + D(p_X ∥ p_Y) on the K axis; H(Y) and H(Y) + D(p_Y ∥ p_X) on the L axis.)

C. Source Coding with Incremental Compression

One might compress the original source using an optimal source code, thereby achieving the K ≥ H(X) lower bound with equality. Then one may produce an optimal source code for the innovation separately, with rate H(Y|X). Thus the new version would be represented by concatenating the two pieces, with L = H(X) + H(Y|X) = H(X, Y).
Under extended Hamming edit distance, the difference between the original source code and the new version, which has a new piece concatenated, would be M = H(Y|X). Separate compression of the innovation has the advantage that X_1^n can be recovered from B; however, this was not a requirement in the problem formulation and is thus wasteful. Such a coding scheme is useful in differential encoding for version management systems where all versions should be recoverable. Results would basically follow from the chain rule of entropy [34] or from successive refinability for a lossy version of the problem [35].

D. Source Coding with Pulse-Position Modulation

Another coding strategy is to significantly back off from achieving good rate performance so as to achieve very good malleability. In particular, we describe a compression scheme that requires only 2 substitution edits for any modification to the source, and so the value of M achieved goes to 0 asymptotically.

We represent any of the possible |W|^n sequences that can occur as X_1^n or Y_1^n by a pulse-position modulation scheme. In particular, we use only two letters from V, which we call 0 and 1 without loss of generality. The codebook is the set of binary sequences of length |W|^n with Hamming weight 1. Each possible source sequence is assigned to a distinct codebook entry, thus making Δ = 0. Now modifying any sequence to any other sequence entails changing a single 0 to a 1 and a single 1 to a 0. Computing the performance criteria, we get that K = L = (1/n)|W|^n, and so we are paying an exponential rate penalty over simply enumerating the source sequences. The payoff is that M = (2/n) Pr[X_1^n ≠ Y_1^n]. This is true universally, even if X and Y are independent. Note that if a.a.s.
no error is desired, then only typical source sequences need to have codewords assigned to them, and so K = L = (1/n) 2^{n max(H(X), H(Y))} (where H(X) and H(Y) are here in bits) has the same effect on M. Pulse-position modulation is also a possible scheme for achieving channel capacity per unit cost [36], where an exponential spectral-efficiency penalty is paid in order to have very low power.

VI. SOURCE CODING WITH GRAPH EMBEDDING

Before constructing an example, let us develop some lower bounds for arbitrary sources p(X, Y). From the source coding theorems, it follows that K ≥ H(X) and L ≥ H(Y). We observe that since distinct codewords must have an edit distance of at least one, we can lower bound M by assuming that distance 1 is achieved for all codewords. Then the edit distance is simply the probability of error for uncoded transmission of p(x) through the channel p(y|x), since each error gives edit distance 1 and each correct reception gives edit distance 0. Thus for n = 1,

M ≥ Σ_{x ∈ W} Σ_{y ∈ W : y ≠ x} p(x, y).

For larger n, the bound is similarly derived to be

M ≥ (1/n) Σ_{x_1^n ∈ W^n} Σ_{y_1^n ∈ W^n : y_1^n ≠ x_1^n} p(x_1^n, y_1^n).   (1)

A weaker, simplified version of the bound is M ≥ 1/n. As will be evident in the sequel, this weaker bound is related to Lipschitz constants for the mapping from the source space to the representation space.

A. Graph Embedding using Gray Codes

Now we construct an example that simultaneously achieves the rate lower bounds and the malleability lower bound (1). Consider a memoryless source p(x) with alphabet W = {k, K, G, g, j, J, C, }, such that each letter is drawn equiprobably.⁵ Then the original version of the source has entropy 3 bits.

⁵ The scholar of linguistics and coding theory will notice the relevance of the order in which the alphabet is written [37].
Consider the relationship between X and Y given by a noisy typewriter channel, with channel transition matrix
$$
p(y|x) = \begin{pmatrix}
1/2 & 1/4 & 0 & 0 & 0 & 0 & 0 & 1/4 \\
1/4 & 1/2 & 1/4 & 0 & 0 & 0 & 0 & 0 \\
0 & 1/4 & 1/2 & 1/4 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/4 & 1/2 & 1/4 & 0 & 0 & 0 \\
0 & 0 & 0 & 1/4 & 1/2 & 1/4 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/4 & 1/2 & 1/4 & 0 \\
0 & 0 & 0 & 0 & 0 & 1/4 & 1/2 & 1/4 \\
1/4 & 0 & 0 & 0 & 0 & 0 & 1/4 & 1/2
\end{pmatrix}. \qquad (2)
$$
Evidently, the bound on M is 1/2 for n = 1, found by performing the summation in (1). Moreover, the marginal distribution of y is also equiprobable over the alphabet W, which gives the entropy bound on L as 3 bits. Take V to be {0, 1}. Now we develop a binary encoding scheme whose performance coincides with the established lower bounds, using graph embedding methods. We can draw a graph where the vertices are the symbols and the edges are labeled with the associated transition probabilities; the weighted directed edges are combined into weighted undirected edges in some suitable way. The result is a weighted adjacency graph, a weighted version of the adjacency graphs in [6], [7], shown in Fig. 5. Suppose that the edit distance is the Hamming distance. Now we try to embed this adjacency graph into a hypercube of a given size. Since we want the average code length to be small, we first consider the hypercube of size 3. The adjacency graph is exactly embeddable into the hypercube, as shown in Fig. 6. If it were not exactly embeddable, some of the low-weight edges might have to be broken. As an alternative to determining the edge weights from the transition matrix, they may be determined through a joint typicality measure (as in the message graph in [38] and in Section VII). After we complete the embedding into the hypercube, we use the binary reflected Gray code (see, e.g., [39] for a description) to assign codewords through correspondence. The hypercube labeled with the binary reflected Gray code is shown in Fig. 7.
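This construction can be checked mechanically: assign the 3-bit binary reflected Gray code to the 8 symbols in cyclic order (one valid assignment consistent with Figs. 6 and 7; the helper names are ours), and verify that every edge of the adjacency graph, i.e., every pair of cyclically adjacent symbols, maps to a pair of codewords at Hamming distance 1:

```python
def gray(n):
    """Binary reflected Gray code as an ordered list of n-bit strings."""
    if n == 0:
        return ['']
    prev = gray(n - 1)
    return ['0' + c for c in prev] + ['1' + c for c in reversed(prev)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

codewords = gray(3)   # cyclically adjacent entries differ in exactly one bit
W = 8
# Edges of the noisy-typewriter adjacency graph: cyclic neighbors.
edges = [(x, (x + 1) % W) for x in range(W)]
assert all(hamming(codewords[u], codewords[v]) == 1 for u, v in edges)

# Malleability: each symbol has prior 1/8 and moves to a given cyclic
# neighbor with probability 1/4 in each direction; every such move costs
# one substitution under this labeling.
M = sum(2 * (1 / W) * 0.25 * hamming(codewords[u], codewords[v])
        for u, v in edges)
print(M)  # 0.5
```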
Clearly the error rate for this scheme is ∆ = 0, since the code is lossless. Since all codewords are of length 3, clearly K = L = 3. To compute M, notice that any source symbol is perturbed to one of its neighbors with probability 1/2. Further notice that the Hamming distance between neighbors in the hypercube is 1. Thus M = 1/2.

Fig. 5. Weighted adjacency graph for noisy typewriter channel (2).

Fig. 6. Weighted adjacency graph for noisy typewriter channel embedded in the 3-dimensional hypercube. Thick lines represent edges that are used in the embedding. Dotted lines represent edges in the hypercube that are unused in the embedding.

Fig. 7. Hypercube graph labeled with the binary reflected Gray code.

We have seen that this encoding scheme achieves the entropy bounds H(X) and H(Y). It also achieves the n = 1 lower bound for M and is thus optimal for n = 1. We can drive M down further by increasing the block length. As shown in the following proposition, if a graph is embeddable in another graph and we take Cartesian products of each with itself, then the resulting graphs obey the same embedding relationship.

Definition 4: Consider two graphs G and H with vertices V(G) and V(H) and edges E(G) and E(H), respectively. Then G is said to be embeddable into H if H has a subgraph isomorphic to G. That is, there is an injective map φ: V(G) → V(H) such that (u, v) ∈ E(G) implies (φ(u), φ(v)) ∈ E(H). This is denoted as G ⊑ H.

Definition 5: Consider two graphs G₁ and G₂ with vertices V(G₁) and V(G₂) and edges E(G₁) and E(G₂), respectively.
Then the Cartesian product of G₁ and G₂, denoted G₁ × G₂, is a graph with vertex set V(G₁) × V(G₂), and for vertices u = (u₁, u₂) and v = (v₁, v₂), (u, v) ∈ E(G₁ × G₂) when (u₁ = v₁ and (u₂, v₂) ∈ E(G₂)) or (u₂ = v₂ and (u₁, v₁) ∈ E(G₁)).

Proposition 1: If G₁ ⊑ H₁ and G₂ ⊑ H₂, then G₁ × G₂ ⊑ H₁ × H₂. A special case is that G ⊑ H implies G × G ⊑ H × H.
Proof: See Appendix B.

Corollary 1: Let Gⁿ denote the n-fold Cartesian product of G and Hⁿ the n-fold Cartesian product of H. If G ⊑ H, then Gⁿ ⊑ Hⁿ for n = 1, 2, ....
Proof: By induction.

Returning to our example, since the embedding relation is true for n = 1, it is also true for n = 2, ..., so we can embed n-fold Cartesian products of the adjacency graph into n-fold Cartesian products of the hypercube. Such a scheme would achieve rates of K = 3 bits and L = 3 bits. It would also achieve M of (1/n) Pr[X_1^n ≠ Y_1^n], since the Cartesian product of the adjacency graph represents exactly edit costs of 1. For each n, this matches the lower bound given in (1), and is thus optimal. Furthermore, asymptotically in n, the triple (K, L, M) = (3, 3, 0) is achievable.

One may observe that embeddability into a graph where graph distance corresponds to edit distance seems to be sufficient to guarantee good performance; we will explore this in detail in the sequel. But first, we present a similar but more challenging situation as a contrast to the "best of all worlds" performance we have just seen. With the source alphabet, representation alphabet, and distribution of X remaining the same, let us suppose that the relationship between X and Y is given by
$$
p(y|x) = \begin{pmatrix}
2/5 & 1/5 & 1/20 & 1/5 & 0 & 0 & 0 & 3/20 \\
1/5 & 3/5 & 0 & 0 & 0 & 0 & 1/5 & 0 \\
1/20 & 0 & 3/5 & 0 & 0 & 7/20 & 0 & 0 \\
1/5 & 0 & 0 & 3/5 & 1/5 & 0 & 0 & 0 \\
0 & 0 & 0 & 1/5 & 3/5 & 0 & 0 & 1/5 \\
0 & 0 & 7/20 & 0 & 0 & 3/5 & 0 & 1/20 \\
0 & 1/5 & 0 & 0 & 0 & 0 & 3/5 & 1/5 \\
3/20 & 0 & 0 & 0 & 1/5 & 1/20 & 1/5 & 2/5
\end{pmatrix}. \qquad (3)
$$
One can verify that, like (2), this is a stationary editing process. Thus, the rate bounds are unchanged at K ≥ 3 and L ≥ 3. Also, evaluation of (1) yields the bound M ≥ 9/20 for block size n = 1. We will presently see that the three lower bounds cannot be achieved simultaneously, and we will determine the best values of (K, L, M) for n = 1.

The weighted adjacency graph corresponding to the new editing process is depicted in Fig. 8. Continuing to use the Hamming edit distance, to achieve K = 3, L = 3, and the M lower bound simultaneously would require the embeddability of the graph of Fig. 8 into the hypercube of size 3. Such an embedding is clearly not possible, since two nodes of the adjacency graph have degree 4, whereas the maximum degree of the hypercube is 3. To achieve the least increase in M above the lower bound (1), we must advantageously choose edges in the adjacency graph to break so as to create embeddability. (As we will see later, choosing the optimal set of edges to break involves solving the error-tolerant subgraph isomorphism problem.) In this example, the two nodes of degree 4 must each have at least one edge broken. Picking the lowest-weight edges (the two with weight 1/10) is clearly the best choice, as the resulting graph can be embedded in the hypercube, and the cost of the edits k ↔ G and ↔ J is increased by the least possible amount (from 1 to 2). Each of the broken edges has probability (1/10)·(1/8), so M is increased above the previously computed minimum by 1/40. Thus we achieve (K, L, M) = (3, 3, 19/40).

We may alternatively aim for lower M at the expense of K and L. To determine whether the lower bound (1) can be achieved with K = L = 4, we need to check whether the weighted adjacency graph of Fig. 8 can be embedded in the hypercube of size 4. Fig. 9 shows that this embedding is possible, with the code given in Table I.
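The embedding of Table I can be verified directly: every pair of symbols connected in the adjacency graph of (3) receives codewords at Hamming distance 1 in the 4-cube, so the achieved malleability is exactly the n = 1 lower bound. A sketch (we write `s8` as a stand-in for the eighth alphabet letter, whose glyph does not render in this copy):

```python
from fractions import Fraction as F

# Transition matrix (3); symbol order k, K, G, g, j, J, C, s8.
P = [
    [F(2,5), F(1,5), F(1,20), F(1,5), 0, 0, 0, F(3,20)],
    [F(1,5), F(3,5), 0, 0, 0, 0, F(1,5), 0],
    [F(1,20), 0, F(3,5), 0, 0, F(7,20), 0, 0],
    [F(1,5), 0, 0, F(3,5), F(1,5), 0, 0, 0],
    [0, 0, 0, F(1,5), F(3,5), 0, 0, F(1,5)],
    [0, 0, F(7,20), 0, 0, F(3,5), 0, F(1,20)],
    [0, F(1,5), 0, 0, 0, 0, F(3,5), F(1,5)],
    [F(3,20), 0, 0, 0, F(1,5), F(1,20), F(1,5), F(2,5)],
]
code = ['0000', '0100', '1000', '0010', '0011', '1001', '0101', '0001']  # Table I

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Every supported transition x -> y with x != y maps to Hamming distance 1 ...
assert all(hamming(code[x], code[y]) == 1
           for x in range(8) for y in range(8) if x != y and P[x][y] > 0)

# ... so M equals Pr[X != Y], the value of the lower bound (1).
M = sum(F(1,8) * P[x][y] for x in range(8) for y in range(8) if x != y)
print(M)  # 9/20
```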
Thus one can achieve (K, L, M) = (4, 4, 9/20).

Fig. 8. Weighted adjacency graph for stationary editing process (3).

Fig. 9. Weighted adjacency graph for editing process (3) embedded in the 4-dimensional hypercube. Black lines represent edges that are used in the embedding. Gray lines represent edges in the hypercube that are unused in the embedding.

TABLE I
CODE FOR THE 4-DIMENSIONAL HYPERCUBE EMBEDDING SHOWN IN FIG. 9

k 0000
K 0100
G 1000
g 0010
j 0011
J 1001
C 0101
  0001

B. Extension to Non-equiprobable Sources

The fact that both versions in the previous example were equiprobable, and thus incompressible, might cast doubt on its gravity. Here we consider another example where the sources are not equiprobable. We will make use of variable-length lossless source codes and the Levenshtein distance as the edit distance. The basic edit operations are substitution, insertion, and deletion, as opposed to the Hamming distance, where substitution is the only edit operation. Similar to the hypercube graph for the Hamming distance, we can create a Levenshtein edit distance graph. The Levenshtein graph of binary strings up to length 3 is shown in Fig. 10.

Fig. 10. Levenshtein edit distance graph for {0,1} ∪ {0,1}² ∪ {0,1}³.

Consider a memoryless source with alphabet W = { k, K, G, g }, with probabilities shown in Table II. Also in Table II, we find a Huffman code for the source, which is the best variable-length lossless source code [2].

TABLE II
HUFFMAN CODE FOR 4-ARY SOURCE

x ∈ W | p(x) | f_Huffman(x) | ℓ_Huffman(x) | p(x) ℓ(x)
k     | 1/2  | 0            | 1            | 1/2
K     | 1/4  | 10           | 2            | 1/2
G     | 1/8  | 110          | 3            | 3/8
g     | 1/8  | 111          | 3            | 3/8
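The code of Table II is what the standard Huffman construction produces for the dyadic pmf (1/2, 1/4, 1/8, 1/8). A minimal sketch using a heap (the function name is ours; tie-breaking may permute codewords among equiprobable symbols, but the lengths are determined):

```python
import heapq
from fractions import Fraction as F

def huffman_lengths(probs):
    """Return codeword lengths of a binary Huffman code for `probs`."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:        # each merge adds one bit to its members
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [F(1,2), F(1,4), F(1,8), F(1,8)]           # k, K, G, g
lengths = huffman_lengths(probs)
print(lengths)                                      # [1, 2, 3, 3]
print(sum(p * l for p, l in zip(probs, lengths)))   # 7/4
```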
Since the marginal distribution p(x) is dyadic, it is at the center of a code attraction region of the binary Huffman code and achieves the entropy lower bound exactly [40]:
$$
K = \sum_{x \in W} p(x)\,\ell(x) = 1.75 = H(X) = -\sum_{x \in W} p(x) \log_2 p(x).
$$
Now consider a channel that is like the noisy typewriter channel, with channel transition matrix
$$
p(y|x) = \begin{pmatrix}
3/4 & 1/4 & 0 & 0 \\
1/2 & 3/8 & 1/8 & 0 \\
0 & 1/4 & 1/2 & 1/4 \\
0 & 0 & 1/4 & 3/4
\end{pmatrix}. \qquad (4)
$$
Evidently the editing is stationary, so the same Huffman code is optimal for both X and Y. Constructing the adjacency graph yields Fig. 11. This graph can be embedded (with matched vertex labels) in the Levenshtein graph using the Huffman assignment that we had developed, as shown in Fig. 12.

Fig. 11. Adjacency graph for noisy typewriter-like channel.

Fig. 12. Adjacency graph for noisy typewriter-like channel embedded in the Levenshtein graph.

Evaluating the malleability lower bound (1) for n = 1 in this case gives
$$
M \geq \sum_{x \in W} \sum_{y \in W : \, y \neq x} p(x, y) = \frac{3}{8}.
$$
With the code that we have used, we can achieve the triple (K, L, M) = (7/4, 7/4, 3/8), which meets the n = 1 lower bounds tightly, so it is optimal in the compression and malleability senses. As before, we can consider Cartesian products to reduce M; however, things are a bit more complicated since the Levenshtein graph does not grow as a Cartesian product.

C. Minimal Change Codes

As seen in the previous subsection, Gray codes and related minimal change code constructions seem to play a role in achieving good palimpsest performance. We review minimal change codes and some of their previous uses in communication theory, pointing out connections to our problem.
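As a numerical check of the preceding example, the achieved malleability can be computed directly from the Huffman assignment of Table II and a textbook Levenshtein distance; every supported off-diagonal transition in (4) turns out to cost exactly one edit (a sketch; helper names are ours):

```python
from fractions import Fraction as F
from functools import lru_cache

@lru_cache(maxsize=None)
def lev(a, b):
    """Textbook Levenshtein (substitution/insertion/deletion) distance."""
    if not a:
        return len(b)
    if not b:
        return len(a)
    return min(lev(a[1:], b) + 1,
               lev(a, b[1:]) + 1,
               lev(a[1:], b[1:]) + (a[0] != b[0]))

p_x = {'k': F(1,2), 'K': F(1,4), 'G': F(1,8), 'g': F(1,8)}
code = {'k': '0', 'K': '10', 'G': '110', 'g': '111'}       # Table II
p_cond = {                                                  # matrix (4)
    'k': {'k': F(3,4), 'K': F(1,4)},
    'K': {'k': F(1,2), 'K': F(3,8), 'G': F(1,8)},
    'G': {'K': F(1,4), 'G': F(1,2), 'g': F(1,4)},
    'g': {'G': F(1,4), 'g': F(3,4)},
}

# Every supported x -> y with x != y is one edit apart in the codebook.
assert all(lev(code[x], code[y]) == 1
           for x, row in p_cond.items() for y in row if y != x)

M = sum(p_x[x] * p * lev(code[x], code[y])
        for x, row in p_cond.items() for y, p in row.items())
print(M)  # 3/8
```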
We use minimal change codes to expand our treatment in the previous parts from using just Hamming or Levenshtein distances to include general edit distances.

Definition 6: Let G be a connected graph. The path metric d_G associated with the graph G is the integer-valued metric on the vertices of G defined by setting d_G(u, v) equal to the length of the shortest path in G joining u and v.

Proposition 2: For any edit distance d: V* × V* → [0, ∞), there exists a graph G with vertex set V* such that its path metric d_G = d.
Proof: Construct a graph on vertex set V* by adding an edge for any pair of vertices A, B ∈ V* such that d(A, B) = 1.

Definition 7: An ordered codebook (A_i), i = 1, 2, ..., |(A_i)|, A_i ∈ V*, is a minimal change code with respect to edit distance d if it is a Hamiltonian path in a subgraph of the graph on V* associated with d.

Our definition of minimal change codes is a generalization of Gray codes, which are Hamiltonian paths through the hypercube associated with Hamming distance [41]. Other minimal change codes include Hamiltonian paths through the Levenshtein graph (Fig. 10), the de Bruijn graph, or the graph induced by Dobrushin's distance functions for insertion/deletion channels [42]. There are countless other edit distances, with numerous minimal change codes corresponding to each.

Minimal change codes have been used previously in the architecture design of parallel computers and in switching theory, among other places. Of particular interest to us, however, is their use in joint source-channel coding (JSCC) [43]. There are related problems in signal constellation labeling [39], [44], [45], in the genotype-to-phenotype mapping problem mentioned previously [22], [9], and in the problem of labeling books for ease of browsing [46].
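In the sense of Definition 7, the binary reflected Gray code is a minimal change code for the Hamming distance: its ordered codebook is a Hamiltonian path in the hypercube. A quick sketch verifying this for small lengths:

```python
def gray_code(n):
    """Ordered codebook of the n-bit binary reflected Gray code."""
    if n == 0:
        return ['']
    prev = gray_code(n - 1)
    return ['0' + c for c in prev] + ['1' + c for c in reversed(prev)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

for n in range(1, 9):
    cb = gray_code(n)
    # The codebook visits every vertex of the n-cube exactly once ...
    assert len(set(cb)) == 2 ** n
    # ... moving along one hypercube edge per step: a Hamiltonian path.
    assert all(hamming(cb[i], cb[i + 1]) == 1 for i in range(len(cb) - 1))
print("BRGC is a Hamiltonian path in the hypercube for n = 1..8")
```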
There are also several theories of cognition based on preserving similarity relations from a source space in a representation space, though minimal change codes do not seem to be used explicitly [47].

Consider JSCC with source alphabet X, channel input alphabet A, channel output alphabet B, and source reconstruction alphabet Y. Then the injective mapping between X and A is the index assignment for JSCC. The mapping between A and B is given by the noisy channel, a transition probability assignment p(b|a). The surjective mapping between B and Y is the inverse index assignment operation. The goal in selecting index assignments is to minimize the distortion between the X–Y spaces when there are errors between the A–B spaces. Informally using terminology from genetics, the source spaces X and Y are cast as phenotype, whereas the index spaces A and B are cast as genotype. Then index assignment aims to have small mutations in genotype result in small changes in phenotype.

In the palimpsest problem,⁶ the injective and surjective mappings between X and A as well as B and Y are basically the same as in the joint source-channel coding problem. The distinction between the two problems is that for malleable coding, there is a transition probability assignment between X and Y, rather than between A and B. One goal is to minimize the distance between words in the A and B spaces for perturbations in the X and Y spaces. Using the genetics analogy, index assignments are desired so that small changes in phenotype result in small changes in genotype. One might even call malleable coding a problem in joint channel source coding. Considering that index assignment for JSCC, signal constellation labeling, and the palimpsest problem are so similar, it is not surprising that Gray codes come up in all of them [43], [39].

⁶To make the correspondence more precise, let X = Y = W and A = B = V.
All are essentially problems of embedding: performing a transformation on objects of one type to produce objects of a new type such that the distance between the transformed objects approximates the distance between the original objects [25].

VII. GENERAL CHARACTERIZATIONS

We have seen that there may be a trade-off between the various parameters (K, L, M) and have found several easily achieved points. Our interest now turns to obtaining more detailed characterizations of P_V and P_B, the sets of achievable rate–malleability triples.

A. Variable-length Coding

We begin with the characterization of P_V, which is a problem in zero-error information theory [48], [49]. Our results are expressed in terms of the solution to an error-tolerant attributed subgraph isomorphism problem [5], which we first describe in general.

1) Error-Tolerant Attributed Subgraph Isomorphism Problem: A vertex-attributed graph is a three-tuple G = (V, E, μ), where V is the set of vertices, E ⊆ V × V is the set of edges, and μ: V → V* is a function assigning labels to vertices. The set of labels is denoted V*. The definition of embedding for attributed graphs has a slightly stronger requirement than for unattributed graphs, Def. 4.

Definition 8: Consider two vertex-attributed graphs G = (V(G), E(G), μ_G) and H = (V(H), E(H), μ_H). Then G is said to be embeddable into H if H has a subgraph isomorphic to G. That is, there is an injective map φ: V(G) → V(H) such that μ_G(v) = μ_H(φ(v)) for all v ∈ V(G) and such that (u, v) ∈ E(G) implies (φ(u), φ(v)) ∈ E(H). This is denoted as G ⊑ H.

Several graph editing operations may be defined, such as substituting a vertex label, deleting a vertex, deleting an edge, and inserting an edge.
These four operations are powerful enough to transform any attributed graph into a subgraph of any other attributed graph. The edited graph is denoted through the operator E(·) corresponding to the sequence of graph edit operations E = (e₁, ..., e_k). There is a cost associated with each sequence of graph edit operations, C(E).

Definition 9: Given two graphs G and H, an error-correcting attributed subgraph isomorphism ψ from G to H is the composition of two operations ψ = (E, φ_E), where
• E is a sequence of graph edit operations such that there exists an E(G) that satisfies E(G) ⊑ H;
• φ_E is an embedding of E(G) into H.

Definition 10: The subgraph distance ρ(G, H) is the cost of the minimum-cost error-correcting attributed subgraph isomorphism ψ from G to H. Note that in general, ρ(G, H) ≠ ρ(H, G).

Remark 1: It should be noted that the subgraph isomorphism problem is NP-complete [50], and therefore the error-tolerant subgraph isomorphism problem is in the class NP and is generally harder than the exact subgraph isomorphism problem [5].

2) Closeness Vitality: The subgraph isomorphism cost structure for the palimpsest problem is based on a graph-theoretic quantity called the closeness vitality [51]. Vitality measures determine the importance of particular edges and vertices in a graph.

Definition 11: Let 𝒢 be the set of all graphs G = (V, E), and let f: 𝒢 → ℝ be any real-valued function on 𝒢. A vitality index v(G, x) is the difference of the values of f on G and on G without element x; it satisfies v(G, x) = f(G) − f(G − x).

A particular vitality index is the closeness vitality, defined in terms of the Wiener index [52], which is simply the sum of all pairwise distances.

Definition 12: The Wiener index f_W(G) of a graph G is the sum of the distances of all vertex pairs:
$$
f_W(G) = \sum_{v \in V} \sum_{w \in V} d(v, w).
$$
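The Wiener index of Definition 12 is computable with one breadth-first search per vertex; a small sketch (graph as an adjacency dict; names are ours). Removing an edge can only lengthen shortest paths, which is why the vitality of an edge with respect to f_W is the natural cost of an edge deletion:

```python
from collections import deque

def wiener_index(adj):
    """Sum of shortest-path distances over all ordered vertex pairs."""
    total = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:                      # BFS from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total

# 4-cycle: f_W = 16; deleting one edge leaves a 4-path with f_W = 20,
# so the vitality of that edge with respect to f_W is 16 - 20 = -4.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wiener_index(cycle), wiener_index(path))  # 16 20
```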
Definition 13: The closeness vitality cv(G, x) is the vitality index with respect to the Wiener index: cv(G, x) = f_W(G) − f_W(G − x).

In addition to its application in the palimpsest problem, the closeness vitality also determines traffic-related costs in all-to-all routing networks. Finding the distance matrix to compute the Wiener index involves solving the all-pairs shortest path problem. Finding the distance matrix of a modified graph from the distance matrix of the original graph involves solving the dynamic all-pairs shortest path problem [53], [54].

3) P_V Characterization: For our purposes, we are concerned with the error-tolerant embedding of an attributed, weighted source adjacency graph into the graph induced by a V*-space edit distance. As such, edge deletion will be the only graph editing operation that is required. Error-tolerant embedding problems in pattern recognition and machine vision often have simple cost functions [5], [55]; our cost function is determined by the closeness vitality and is not so simple.

To characterize P_V, let us first consider the delay-free case, n = 1. A source p(X, Y) and an edit distance d(·,·) are given. It is known [2] that Huffman coding provides the minimal-redundancy instantaneous code and achieves expected performance H(X) ≤ K ≤ H(X) + 1. Similarly, a Huffman code for Y would yield H(Y) ≤ L ≤ H(Y) + 1. The rate loss for using an incorrect Huffman code is essentially as given in Fig. 4. Suppose that we require that the rate lower bound is met, i.e., we must use a Huffman code for some Z that is on the geodesic between X and Y. This code will satisfy the Kraft inequality [56]. Note that for a given Z, there are several Huffman codes: those arising from different labelings of the code tree and also perhaps from different trees [57].
Let us denote the set of all Huffman codes for Z as H_Z. Since K and L are fixed by the choice of Z, all that remains is to determine the set of achievable M. Let G be the graph induced by the edit distance d(·,·), and d_G its path metric. The graph G is intrinsically labeled. Let A be the weighted adjacency graph of the source p(X, Y), with vertices W, edges E(A) a subset of W × W, and labels given by a Huffman code. That is, A = (W, E(A), f_E) for some f_E ∈ H_Z. There is a path semimetric, d_A, associated with the graph A (since the adjacency graph is weighted, it might not satisfy the triangle inequality). As may be surmised from Section V, the basic problem is to solve the error-tolerant subgraph isomorphism problem of embedding A into G.

In general for n = 1, the malleability cost under edit distance d_G when using the source code f_E is
$$
M = \sum_{x \in W} \sum_{y \in W} p(x, y)\, d_G(f_E(x), f_E(y)).
$$
The smallest malleability possible is when A = (W, E(A), f_E) is a subgraph of G, and then
$$
M_{\min} = \sum_{x \in W} \sum_{y \in W} p(x, y)\, d_A(x, y) = \sum_{x \in W} \sum_{y \in W} p(x, y)\, d_G(f_E(x), f_E(y)) = \Pr[X \neq Y],
$$
which is simply the expected Wiener index: M_min = E[f_W(A)] = Pr[X ≠ Y].

If edges in A need to be broken in order to make it a subgraph of G, then M increases as a result. The cost of graph editing operations in the error-tolerant embedding problem should reflect the effect on M. If an edge ē is removed from the graph A, the resulting graph is called A − ē; it induces its own path semimetric d_{A−ē}.
Thus the cost of removing an edge ē from the graph A is given by the following expression as a function of the associated removal operation e:
$$
C(e) = \sum_{x \in W} \sum_{y \in W} p(x, y) \left[ d_{A - \bar{e}}(f_E(x), f_E(y)) - d_A(f_E(x), f_E(y)) \right],
$$
which is the negative expected closeness vitality, C(e) = −E[cv(A, ē)]. If E is a sequence of edge removals Ē, then
$$
C(E) = \sum_{x \in W} \sum_{y \in W} p(x, y) \left[ d_{A - \bar{E}}(f_E(x), f_E(y)) - d_A(f_E(x), f_E(y)) \right],
$$
which is C(E) = −E[cv(A, Ē)]. As seen, the cost function is quite different from those of standard error-tolerant embedding problems [5], [55], since it depends not only on which edge is broken, but also on the remainder of the graph.

Putting things together, we see that P_V contains any point
$$
(K, L, M) = \Big( H(X) + D(p_X \| p_Z) + 1,\; H(Y) + D(p_Y \| p_Z) + 1,\; M_{\min} + \min_{f_E \in H_Z} \rho(A, G) \Big).
$$
The previous analysis assumed n = 1. We may increase the block length and improve performance.

Theorem 1: Consider a source p(X, Y) with associated (unlabeled) weighted adjacency graph A and an edit distance d with associated graph G. For any n, let P_V^(ach) be the set of triples (K, L, M) that are computed, by allowing an arbitrary choice of the memoryless random variable p(Z_1^n), as follows:
$$
K = H(X) + D(p_X \| p_Z) + \tfrac{1}{n}, \qquad
L = H(Y) + D(p_Y \| p_Z) + \tfrac{1}{n},
$$
$$
M = \tfrac{1}{n} \Pr[X_1^n \neq Y_1^n] + \tfrac{1}{n} \min_{f_E \in H_{Z_1^n}} \rho\big(A = (W^n, E(A), f_E), G\big).
$$
Then the set of triples P_V^(ach) ⊆ P_V is achievable instantaneously.

Proof: A non-degenerate random variable Z_1^n is fixed. There is a family of instantaneous lossless codes (with ∆ = 0) that corresponds to this random variable, denoted {(f_E, f_D)} = H_{Z_1^n}, through the McMillan sum. By the results in [26], any of these codes achieves rates K ≤ H(X) + D(p_X ‖ p_Z) + 1/n and L ≤ H(Y) + D(p_Y ‖ p_Z) + 1/n.
Moreover, by the graph embedding construction, a code (f_E, f_D) achieves
$$
M = \tfrac{1}{n} \Pr[X_1^n \neq Y_1^n] + \tfrac{1}{n} \rho\big(A = (W^n, E(A), f_E), G\big).
$$
Since all codes in H_{Z_1^n} have the same rate performance, a code in the family that minimizes ρ may be chosen.

The above theorem states that error-tolerant subgraph isomorphism implies achievable malleability. The choice of the auxiliary random variable Z is open to optimization. If minimal rates are desired, then p_Z should be on the geodesic connecting p_X and p_Y. If Z is not on the geodesic, then there is some rate loss, but perhaps there can be some malleability gains. Note that when p(y|x) is a stationary editing process, there is the possibility of the simple lower bounds being tight to this achievable region.

Corollary 2: Consider a source as given above in Theorem 1. If p(y|x) is stationary, p(x) = p(y) is |V|-adic, and there is a Huffman-labeled A for p(x) = p(y) that is an isometric subgraph of G, then the block length n lower bound (H(X), H(Y), (1/n) Pr[X_1^n ≠ Y_1^n]) is tight to this achievable region for every n, and in particular to (H(X), H(Y), 0) for large n.

B. Block Coding

Now we turn our attention to the block-coding palimpsest problem. For P_B, we use a joint typicality graph rather than the weighted adjacency graph used for P_V. Additionally, we focus on binary block codes under Hamming edit distance, so we are concerned only with hypercubes rather than general edit distance graphs. We can use graph-theoretic ideas to formally state an achievability result for the block coding palimpsest problem. As shown in the constructive examples above, there are schemes for which an improvement in M may be achieved by increasing L. However, the resulting compression of Y_1^n is not unique, and thus is not optimal.
We wish to expurgate the redundant representations of Y_1^n as efficiently as possible, with the aid of a graph. However, in doing so, we also have to consider the representations and how they are related to one another. First we review some standard typicality arguments (from [58]) and then define a graph from typical sets.

Definition 14: The strongly typical set T^n_{[X]δ} with respect to p(x) is
$$
T^n_{[X]\delta} = \left\{ x_1^n \in X^n : \sum_x \left| \tfrac{1}{n} N(x; x_1^n) - p(x) \right| \leq \delta \right\},
$$
where N(x; x_1^n) is the number of occurrences of x in x_1^n and δ > 0.

Definition 15: The strongly jointly typical set T^n_{[XY]δ} with respect to p(x, y) is
$$
T^n_{[XY]\delta} = \left\{ (x_1^n, y_1^n) \in X^n \times Y^n : \sum_x \sum_y \left| \tfrac{1}{n} N(x, y; x_1^n, y_1^n) - p(x, y) \right| \leq \delta \right\}.
$$

Definition 16: For any x_1^n ∈ T^n_{[X]δ}, define a strongly conditionally typical set
$$
T^n_{[Y|X]\delta}(x_1^n) = \left\{ y_1^n \in T^n_{[Y]\delta} : (x_1^n, y_1^n) \in T^n_{[XY]\delta} \right\}.
$$

Definition 17: Let the connected strongly typical set be
$$
S^n_{[X]\delta} = \left\{ x_1^n \in T^n_{[X]\delta} : T^n_{[Y|X]\delta}(x_1^n) \text{ is nonempty} \right\}.
$$

Now that we have definitions of typical sets, we put forth some lemmas.

Lemma 1 (Strong AEP): Let η be a small positive number such that η → 0 as δ → 0. Then for sufficiently large n, |T^n_{[X]δ}| ≤ 2^{n(H(X)+η)}.
Proof: See [58, Theorem 5.2].

Lemma 2 (Strong JAEP): Let λ be a small positive number such that λ → 0 as δ → 0. Then for sufficiently large n, Pr[(X_1^n, Y_1^n) ∈ T^n_{[XY]δ}] > 1 − δ and
$$
(1 - \delta) 2^{n(H(X,Y) - \lambda)} \leq |T^n_{[XY]\delta}| \leq 2^{n(H(X,Y) + \lambda)}.
$$
Proof: See [58, Theorem 5.8].

Lemma 3: If δ(n) satisfies the following conditions, then Lemma 2 remains valid: δ(n) → 0 and √n · δ(n) → ∞ as n → ∞.
Proof: See [59, (2.9) on p. 34].

Lemma 4: If |T^n_{[Y|X]δ}(x_1^n)| ≥ 1, then
$$
2^{n(H(Y|X) - \nu)} \leq |T^n_{[Y|X]\delta}(x_1^n)| \leq 2^{n(H(Y|X) + \nu)},
$$
where ν → 0 as n → ∞ and δ → 0.
Proof: See [58, Theorem 5.9].

Lemma 5:
$$
(1 - \delta) 2^{n(H(X) - \psi)} \leq |S^n_{[X]\delta}| \leq 2^{n(H(X) + \psi)},
$$
where ψ → 0 as n → ∞ and δ → 0. Also, for any δ > 0, Pr[X_1^n ∈ S^n_{[X]δ}] > 1 − δ for n sufficiently large.
Proof: See [58, Corollary 5.11], Lemma 1, and [58, Proposition 5.12].

For the bivariate distribution p(x, y), define a square matrix called the strong joint typicality matrix A^n_{[XY]} as follows. There is one row (and column) for each sequence in S^n_{[X]δ} ∪ S^n_{[Y]δ}. The entry with row corresponding to x_1^n and column corresponding to y_1^n receives a one if (x_1^n, y_1^n) is strongly jointly typical, and zero otherwise.

1) Stationary Editing: Now let us restrict ourselves to the class of bivariate distributions with equal marginals, P = { p(x, y) : p(x) = p(y) }, which is the class of distributions with stationary editing. In this class, we avoid the mismatch redundancy and also reduce the number of performance parameters from 3 to 2. After this restriction, it is clear that the x-typical set and the y-typical set coincide. Moreover, H(X) = H(Y) and H(Y|X) = H(X|Y) = H(X) − I(X;Y). Thus it follows that asymptotically, A^n_{[XY]} will be a square matrix with an approximately equal number of ones in all columns and in all rows. Think of A^n_{[XY]} as the adjacency matrix of a graph, where the vertices are sequences and edges connect sequences that are jointly typical with one another.

Proposition 3: Take A^n_{[XY]} for some source in P as the adjacency matrix of a graph G_n. The number of vertices in the graph will satisfy
$$
(1 - \delta) 2^{n(H(X) - \psi)} \leq |V(G_n)| \leq 2^{n(H(X) + \psi)},
$$
where ψ → 0 as n → ∞ and δ → 0. The degree of each vertex, deg v, will concentrate as
$$
2^{n(H(Y|X) - \nu)} \leq \deg v \leq 2^{n(H(Y|X) + \nu)},
$$
where ν → 0 as n → ∞ and δ → 0.
Proof: Follows from the previous lemmas.

Having established that the basic topology of the strongly typical set is asymptotically a 2^{nH(Y|X)}-regular graph on 2^{nH(X)} vertices, we return to the coding problem. Using graph embedding ideas yields a theorem on block palimpsest achievability.

Theorem 2: For a source p(x, y) ∈ P and the Hamming edit distance, a triple (K, K, M = M_min) is achievable if G_n ⊑ H_{nK}.
Proof: To achieve M_min, we need to assign binary codewords to each of the 2^{nH(X)} vertices such that the Hamming distance between the codeword of a vertex and the codewords of any of its neighbors is 1. Using the binary reflected Gray code of length nK and the hypercube that it induces, the construction reduces to finding an embedding of G_n into the hypercube of size nK, denoted H_{nK}. Thus a sufficient condition for block-code achievability, while requiring M = M_min, is G_n ⊑ H_{nK}.

Using this result, we argue that a linear increase in malleability comes at exponential cost in code length. By a simple counting argument, we present a condition for embeddability.

Theorem 3: For a source p(x, y) ∈ P and the Hamming edit distance, asymptotically, if G_n ⊑ H_{nK} then
$$
nK \geq \max\left( nH(X), \, 2^{nH(Y|X)} \right). \qquad (5)
$$
Proof: The hypercube H_{nK} is an nK-regular graph with 2^{nK} vertices. As a minimal condition for embeddability, the number of vertices in the hypercube must be greater than or equal to the number of vertices in the graph to be embedded, i.e., 2^{nK} ≥ 2^{n(H(X)+ψ)}, so nK ≥ n(H(X) + ψ). As another minimal condition for embeddability, the degree of the hypercube must be greater than or equal to the maximal degree of the graph to be embedded, so nK ≥ 2^{n(H(Y|X)+ν)}. Combining the two conditions and letting ψ → 0 and ν → 0 as n → ∞ yields the desired result.

This theorem is one of our main results.
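To see the force of (5) numerically: with, say, H(X) = 3 bits and H(Y|X) = 1 bit (illustrative values, not from a particular source in the paper), the smallest per-symbol rate K allowed by the bound is max(H(X), 2^{nH(Y|X)}/n), which grows without bound in n:

```python
def min_rate_for_minimal_malleability(n, H_X, H_cond):
    """Smallest K allowed by the counting bound (5):
    nK >= max(n*H(X), 2^{n*H(Y|X)}), i.e. K >= max(H(X), 2^{n*H(Y|X)}/n)."""
    return max(H_X, 2 ** (n * H_cond) / n)

H_X, H_cond = 3.0, 1.0   # hypothetical source parameters
for n in (1, 2, 4, 8, 16):
    print(n, min_rate_for_minimal_malleability(n, H_X, H_cond))
# K stays at H(X) = 3 only while 2^n / n <= 3; beyond that it blows up
# exponentially, reaching K >= 4096 bits per source symbol at n = 16.
```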
It should be noted that even if we allowed some asymptotically small slack in breaking some edges to perform the embedding, i.e., if we solved an error-tolerant subgraph isomorphism problem with error tolerance $\xi$, this would not help, since we would need to break a constant fraction of edges in $G_n$ to reduce the maximal degree. In particular, since each of the roughly $2^{nH(X)}$ vertices in $G_n$ asymptotically has the same degree, reducing the maximal degree even by one would require breaking at least one edge per vertex, so $\xi$ must be on the order of $2^{nH(X)}$. Clearly $\xi \nrightarrow 0$ as $n \to \infty$.

This result can be interpreted as follows. When using binary codes that achieve the minimal malleability parameter, the length of the code must be greater than $\max\{nH(X),\, 2^{nH(Y|X)}\}$. If $2^{nH(Y|X)}$ is much greater than $nH(X)$, i.e., the two versions are not particularly well correlated, this implies that achieving minimal malleability requires a significant length expansion of the codewords over the entropy bound. Taking this to an extreme, suppose that $X$ and $Y$ are independent. Then $2^{nH(Y|X)} = 2^{nH(X)}$, and an exponential expansion is required, just as in the universal PPM scheme of Section V-D.

If we want to understand the embeddability requirements further, we would need to understand the topology of $G_n$ further. Just knowing that it is asymptotically regular does not seem to be enough. Several properties that are equivalent to exact hypercube embeddability are given in [60], [61].^7 Of course we can break some small fraction $\xi$ of edges in the graph to satisfy the embeddability conditions as long as $\xi \to 0$ as $n \to \infty$. If we no longer require that $nM$ be the minimal possible, then we are back to the same kind of error-tolerant subgraph isomorphism formulation given for variable-length coding in the previous section.
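The severity of the bound in Theorem 3 is easy to see numerically. For a doubly symmetric binary source, $H(X) = 1$ and $H(Y|X) = h_2(\epsilon)$ for crossover probability $\epsilon$; the values of $\epsilon$ and $n$ in the sketch below are illustrative assumptions.

```python
import math

def h2(q):
    """Binary entropy function, in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

n = 100
for eps in (0.01, 0.11, 0.25):
    # Theorem 3 lower bound on code length nK for the doubly symmetric
    # binary source: max(n*H(X), 2^(n*H(Y|X))) with H(X) = 1.
    bound = max(n * 1.0, 2 ** (n * h2(eps)))
    print(f"eps={eps}: nH(X)={n}, nH(Y|X)={n * h2(eps):.1f}, bound={bound:.3g}")
```

Even at $\epsilon = 0.01$ the conditional-entropy term $2^{nH(Y|X)}$ already exceeds $nH(X)$ at this block length, and by $\epsilon = 0.25$ the required code length dwarfs the entropy bound.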
The only change in the characterization of the achievable region is that rather than restricting the encoder to be the Huffman code of an auxiliary random variable $Z_1^n$, here one would need to test the error-tolerant subgraph isomorphism functional over all permutations of labelings.

*2) General Editing:* If we remove our restriction of $p(x,y) \in \mathcal{P}$, then we can create $A^n_{[XY]}$ as before. While the resulting graph would not be asymptotically regular, the basic result on paying an exponential rate penalty will still hold.

The space $S_n \triangleq S^n_{[X]\delta} \cup S^n_{[Y]\delta}$ with the corresponding path metric $d_{A_n}$ induced by $A^n_{[XY]}$ is a metric space. Hypercubes with their natural path metric, $d_G$, are also metric spaces. Rather than requiring absolutely minimal $nM$, it can be noted that $M$ is asymptotically zero when the Lipschitz constant associated with the mapping between the source space and the representation space has nice properties in $n$.

*Definition 18:* A mapping $f$ from the metric space $(S_n, d_{A_n})$ to the metric space $(V_{nK}, d_{G_n})$ is called Lipschitz continuous if
$$d_{G_n}(f(x_1), f(x_2)) \le C\, d_{A_n}(x_1, x_2)$$

Footnote 7: There are several characterizations of hypercube-embeddable graphs in the metric theory of graphs [60], [61]. For a bipartite connected graph $G = (V, E)$ the following statements are equivalent:
- $G$ can be isometrically embedded into a hypercube.
- For each edge $(a,b)$ of $G$, the set $G(a,b) = \{x \in V \mid d_G(x,a) < d_G(x,b)\}$ is convex, where a subset $U \subseteq V$ is convex if it is closed under taking shortest paths.
- $G$ is an $\ell_1$ graph, i.e., the path metric $d_G$ is isometrically embeddable in the space $\ell_1$.
- The path metric $d_G$ satisfies the pentagonal inequality:
$$d_G(v_1,v_2) + d_G(v_1,v_3) + d_G(v_2,v_3) + d_G(v_4,v_5) - \sum_{\substack{h=1,2,3 \\ k=4,5}} d_G(v_h, v_k) \le 0$$
for all nodes $v_1, \ldots, v_5 \in V$.
- The distance matrix of $G$ has exactly one positive eigenvalue.

Further, a graph is said to be distance regular if there exist integers $b_m, c_m$ ($m > 0$) such that for any two nodes $i, j \in V(G)$ at distance $d_G(i,j) = m$ there are exactly $c_m$ nodes at distance 1 from $i$ and distance $m-1$ from $j$, and there are $b_m$ nodes at distance 1 from $i$ and distance $m+1$ from $j$. The distance-regular graphs that are hypercube embeddable are completely classified: the hypercubes, the even circuits, and the double-odd graphs.

for some constant $C$ and for all $x_1, x_2 \in S_n$. The smallest such $C$ is the Lipschitz constant:
$$\mathrm{Lip}[f] = \sup_{x_1 \ne x_2 \in S_n} \frac{d_{G_n}(f(x_1), f(x_2))}{d_{A_n}(x_1, x_2)}.$$
The Lipschitz constant is also called the dilation of an embedding, since it is the maximum amount that any edge in $G_n$ is stretched as it is replaced by a path in $H_{nK}$ [62], [63]. A related quantity is the Lipschitz constant of the inverse mapping, called the contraction:
$$\mathrm{Lip}[f^{-1}] = \sup_{x_1 \ne x_2 \in S_n} \frac{d_{A_n}(x_1, x_2)}{d_{G_n}(f(x_1), f(x_2))}.$$
The product of the dilation and contraction, $\mathrm{Lip}[f]\,\mathrm{Lip}[f^{-1}]$, is called the distortion. Another property of metric embeddings is the expansion, which is the ratio of the sizes of the two finite metric spaces:
$$\mathrm{expan}[f] = \frac{|V_{nK}|}{|S_n|}.$$
We can bound the malleability $M$, for a coding scheme that only represents sequences in $S_n$, as follows.

*Theorem 4:* Let the Lipschitz constant be as defined.
Then for a coding scheme $f_E$ that only represents sequences in $S_n = S^n_{[X]\delta} \cup S^n_{[Y]\delta}$, we have
$$M \le \frac{\mathrm{Lip}[f_E]}{n}\left(1 + \delta\, \mathrm{diam}(G_n)\right).$$

*Proof:*
$$\begin{aligned}
M &= \frac{1}{n} \sum_{x_1^n \in S_n} \sum_{y_1^n \in S_n} p(x_1^n, y_1^n)\, d_{G_n}\!\left(f_E(x_1^n), f_E(y_1^n)\right) \\
&\overset{(a)}{\le} \frac{1}{n} \sum_{x_1^n \in S_n} \sum_{y_1^n \in S_n} p(x_1^n, y_1^n)\, \mathrm{Lip}[f_E]\, d_{A_n}(x_1^n, y_1^n) \\
&= \frac{\mathrm{Lip}[f_E]}{n} \sum_{x_1^n \in S_n} \sum_{y_1^n \in S_n} p(x_1^n, y_1^n)\, d_{A_n}(x_1^n, y_1^n) \\
&= \frac{\mathrm{Lip}[f_E]}{n} \Biggl[ \sum_{\substack{x_1^n \in S_n,\, y_1^n \in S_n \\ (x_1^n, y_1^n) \in T^n_{[XY]\delta}}} p(x_1^n, y_1^n)\, d_{A_n}(x_1^n, y_1^n) + \sum_{\substack{x_1^n \in S_n,\, y_1^n \in S_n \\ (x_1^n, y_1^n) \notin T^n_{[XY]\delta}}} p(x_1^n, y_1^n)\, d_{A_n}(x_1^n, y_1^n) \Biggr] \\
&\overset{(b)}{=} \frac{\mathrm{Lip}[f_E]}{n} \Biggl[ \Pr\!\left[ (x_1^n, y_1^n) \in T^n_{[XY]\delta} \right] + \sum_{\substack{x_1^n \in S_n,\, y_1^n \in S_n \\ (x_1^n, y_1^n) \notin T^n_{[XY]\delta}}} p(x_1^n, y_1^n)\, d_{A_n}(x_1^n, y_1^n) \Biggr] \\
&\overset{(c)}{\le} \frac{\mathrm{Lip}[f_E]}{n} \Biggl[ 1 + \delta \max_{\substack{x_1^n \in S_n,\, y_1^n \in S_n \\ (x_1^n, y_1^n) \notin T^n_{[XY]\delta}}} d_{A_n}(x_1^n, y_1^n) \Biggr] \\
&= \frac{\mathrm{Lip}[f_E]}{n} \left\{ 1 + \delta\, \mathrm{diam}(G_n) \right\},
\end{aligned}$$
where step (a) is by definition of the Lipschitz constant; step (b) follows from the definition of graph distance (jointly typical pairs are adjacent in $G_n$, so $d_{A_n} = 1$ for each such pair) and the consistency of strong typicality [58, Theorem 5.7]; and step (c) follows from bounding $\Pr[(x_1^n, y_1^n) \in T^n_{[XY]\delta}]$ by 1 and from Lemma 2. Note that the $\delta$ bound used in step (c) for the probability of sequence pairs that are both marginally typical but not jointly typical is actually the probability of all non-jointly-typical pairs and is therefore loose.

Computing Lipschitz constants is usually difficult or impossible. There are, however, methods from theoretical computer science for bounding Lipschitz constants (or dilation) for embeddings [62], [63].
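Although computing Lipschitz constants is hard in general, the quantities of Definition 18 (dilation, contraction, distortion, expansion) can be evaluated by brute force on small examples. The sketch below uses an assumed toy embedding of a 5-cycle into the hypercube $H_3$ standing in for the typicality graph and the representation space.

```python
from itertools import combinations

def bfs_dist(adj, s):
    """Single-source shortest-path distances in an unweighted graph."""
    dist, frontier = {s: 0}, [s]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist

# Guest: a 5-cycle (playing the role of G_n); host: the hypercube H_3.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
cube = {v: [v ^ (1 << b) for b in range(3)] for v in range(8)}

# An assumed toy embedding f; the 5-cycle is odd, so f cannot be isometric.
f = {0: 0b000, 1: 0b001, 2: 0b011, 3: 0b111, 4: 0b110}

dC = {u: bfs_dist(cycle, u) for u in cycle}   # path metric of the guest
dH = {u: bfs_dist(cube, u) for u in cube}     # path metric of the host

pairs = list(combinations(cycle, 2))
dilation = max(dH[f[u]][f[v]] / dC[u][v] for u, v in pairs)     # Lip[f]
contraction = max(dC[u][v] / dH[f[u]][f[v]] for u, v in pairs)  # Lip[f^-1]
distortion = dilation * contraction
expansion = len(cube) / len(cycle)            # expan[f]
print(dilation, contraction, distortion, expansion)  # 2.0 1.0 2.0 1.6
```

Here the one edge of the cycle stretched to a path of length 2 in the hypercube forces dilation 2, while no pair of vertices is brought closer together, so the contraction is 1.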
For a "host" graph $H$ and a "guest" graph $G$, a basic counting argument, reminiscent of Theorem 3, shows that the dilation of any embedding of $G$ into $H$ must satisfy
$$\mathrm{Lip}[f_E] \ge \frac{\log(d_G - 1)}{\log(d_H - 1)},$$
where $d_G$ and $d_H$ are the respective maximum degrees [62, Prop. 1.5.2]. When the guest graph is the joint typicality graph $G_n$ and the host graph is the hypercube $H_{nK}$, this implies that
$$\mathrm{Lip}[f_E] \ge \left\lceil \frac{\log\!\left(2^{n(H(Y|X)+\nu)} - 1\right)}{\log(nK - 1)} \right\rceil \approx \frac{nH(Y|X)}{\log nK}.$$
Another typical result arises when it is fixed that both graphs have $m$ vertices (the expansion is 1). The Lipschitz constant bound is in terms of the bisection width $W_G$ and the recursive edge-bisector function $R_H(\cdot)$: the dilation $\mathrm{Lip}[f_E]$ of any embedding $f_E$ of $G$ into $H$ must satisfy
$$\mathrm{Lip}[f_E] \ge \frac{1}{\log d_H}\, \log\!\left(\frac{W_G}{d_G\, R_H(m)} - 1\right).$$
The bisection width of a graph is the size of the smallest cut-set that breaks the graph into two subgraphs of equal size (to within rounding) [62, Prop. 2.3.6]. Using such a result is difficult since the bisection width of the joint typicality graph is not known. For the case when $H = H_{\lceil nH(X) \rceil}$, a simplified version reduces to
$$\mathrm{Lip}[f_E] \ge \frac{\lceil \log m \rceil}{\mathrm{diam}(G_n)} \approx \frac{nH(X)}{\mathrm{diam}(G_n)},$$
where $\mathrm{diam}(G_n)$ is the same graph diameter we had seen before [63]. If the dilation is to be no greater than 2, the PPM scheme we have described previously may be reinterpreted in a graph embedding framework and seen to achieve $\mathrm{Lip}[f_E] \le 2$, but the price is exponential expansion, $\mathrm{expan}[f_E] = \frac{1}{m} 2^{m-1}$.

Returning to Theorem 4, as noted in Lemma 3, $\delta$ can be taken as $\delta(n) = n^{-\frac{1}{2}+\omega}$ for some fixed $\omega > 0$. The diameter of the hypercube $H_{nK}$ is clearly $nK$. Combining this with the contraction provides a bound on the diameter of $G_n$:
$$\mathrm{diam}(G_n) \le \mathrm{Lip}[f_E^{-1}]\, nK.$$
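The fact that the hypercube $H_k$ has diameter $k$ is easy to verify computationally for small $k$; the sketch below does so by breadth-first search (a toy check, exploiting vertex transitivity so that a single BFS suffices).

```python
from collections import deque

def hypercube_diameter(k):
    """Eccentricity of vertex 0 in H_k via BFS; by symmetry this equals
    the diameter of the hypercube."""
    adj = {v: [v ^ (1 << b) for b in range(k)] for v in range(1 << k)}
    dist, q = {0: 0}, deque([0])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

# The farthest vertex from 00...0 is 11...1, at Hamming distance k.
assert all(hypercube_diameter(k) == k for k in range(1, 9))
```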
Thus one can further bound the expression in Theorem 4 as
$$M \le \frac{\mathrm{Lip}[f_E]}{n}\left(1 + \delta\, \mathrm{diam}(G_n)\right) \le \frac{\mathrm{Lip}[f_E]}{n}\left(1 + n^{-\frac{1}{2}+\omega}\, \mathrm{Lip}[f_E^{-1}]\, nK \right) = \frac{\mathrm{Lip}[f_E]}{n} + \frac{K\, \mathrm{Lip}[f_E]\, \mathrm{Lip}[f_E^{-1}]}{n^{1/2 - \omega}}.$$
This yields the following proposition.

*Proposition 4:* The malleability $M$ is asymptotically bounded above by
$$\limsup_{n \to \infty} M \le \frac{\mathrm{Lip}[f_E]}{n} + \frac{K\, \mathrm{Lip}[f_E]\, \mathrm{Lip}[f_E^{-1}]}{n^{1/2 - \omega}}$$
for any fixed $\omega > 0$.

The quantity $M$ is essentially bounded by $n^{-1/2} K\, \mathrm{Lip}[f_E]\, \mathrm{Lip}[f_E^{-1}]$, since the second term should dominate the first. An alternate expression for $K$ is $K = \frac{1}{n}\log_2 \mathrm{expan}[f_E] + H(X)$, which is fixed. If $\mathrm{Lip}[f_E]\, \mathrm{Lip}[f_E^{-1}]$ is $o(\sqrt{n})$ and $\mathrm{Lip}[f_E]$ is $o(n)$ for the sequence of encoders $f_E$, then $M$ will go to zero asymptotically in $n$.

Due to the bounding methods that were used, it is not at all clear whether this Lipschitz bound on malleability is tight, and one might suspect that it is not. A slightly different branch of theoretical computer science deals with bounding the distortion of mappings [64], [65]; however, it is not clear how to apply these results to the palimpsest problem.

VIII. DISCUSSION AND CONCLUSIONS

We have formulated an information theoretic problem motivated by applications in information storage where a compressed stored document must often be updated and there are costs associated with writing on the storage medium. That there are always editing costs in overwriting rewritable media is a fundamental fact of thermodynamics and follows from Landauer's principle [66]: since discarding information results in a dissipation of energy, overwriting causes an inextricable loss of energy.
Both the compressed palimpsest problem considered here and a distinct problem with a similar motivation presented in a companion paper [4] exhibit a fundamental trade-off between compression efficiency and the costs incurred when synchronizing between two versions of a source code. The palimpsest problem is concerned with random access editing, where changing nearby or greatly separated symbols in the compressed representation has the same cost. The "cut-and-paste" formulation of [4] is concerned with editing large subsequences, as would be appropriate when there is a cost associated with communicating the positions of edits.

The basic result is that unless the two versions of the source are either very strongly correlated or have a deterministically common part, if rates close to entropy are required for both sources, then a large malleability cost will have to be paid. Similarly, if small malleability is required, a very large rate penalty will be paid. There is a fundamental trade-off between the quantities.

For our compressed palimpsest problem, we found that if minimal malleability costs are desired, then a rate penalty that is exponential in the conditional entropy of the editing process must be paid. That is, unless the two versions of the source are very strongly correlated (conditional entropy logarithmic in block length), rate exponentially larger than entropy is needed. A universal scheme for minimal malleability is given by a pulse position modulation method. Thus, if we require malleability $M = O(1/n)$, then rates $K$ and $L$ must be $\Omega(\frac{1}{n} 2^n)$.

One may be tempted to cast the block palimpsest problem in terms of error-correcting codes, where the quality metric is the block Hamming distance. The Hamming distance does not care how two letters differ; it only cares whether they are different. In a sense, it is an $\ell_\infty$ distance.
This gives rise to error-correcting codes that try to maximize the minimum distance between two codewords in the codebook. In malleable coding, we care not just about whether a modified codeword is inside or outside the minimum distance decoding region for the original codeword, but how far, basically treating the space with a symbol edit distance, which may be $\ell_1$.

APPENDIX A
$(V^*, d)$ IS A FINITE METRIC SPACE

A metric must satisfy non-negativity, equality, symmetry, and the triangle inequality. These properties are verified for any edit distance with edit operation $R$ as follows.

- Non-negativity: follows since the edit distance is a counting measure.
- Equality: follows by definition, since the distance is zero if and only if $a = b$.
- Symmetry: If $d(a,b) = n$, then it follows there is a sequence of $n-1$ intermediate strings, $a_1, a_2, \ldots, a_{n-1}$, which along with $a_0 = a$ and $a_n = b$ satisfy $(a_i, a_{i+1}) \in R$. Since $R$ is a symmetric relation, it follows that $(a_{i+1}, a_i)$ is also in $R$, and so there is a backwards sequence $a_n, a_{n-1}, \ldots, a_0$. Hence if $d(a,b) = n$ then $d(b,a) = n$ also, and so $d(a,b) = d(b,a)$ for all $a, b$.
- Triangle inequality: Suppose $d(a,b) + d(b,c) < d(a,c)$. Performing the editing operations that achieve $d(a,b)$ followed by those that achieve $d(b,c)$ transforms $a$ into $c$ via $b$ in $d(a,b) + d(b,c)$ steps, so $a$ can be edited into $c$ in fewer than $d(a,c)$ steps. This contradicts the minimality of $d(a,c)$; hence $d(a,b) + d(b,c) \ge d(a,c)$.

APPENDIX B
PROOF OF PROPOSITION 1

Since $G_1 \subseteq H_1$, $V(G_1) \subseteq V(H_1)$. Since $G_2 \subseteq H_2$, $V(G_2) \subseteq V(H_2)$. Then by elementary set operations,
$$V(G_1 \times G_2) = V(G_1) \times V(G_2) \subseteq V(H_1 \times H_2) = V(H_1) \times V(H_2).$$
Since $G_1 \subseteq H_1$, $E(G_1) \subseteq E(H_1)$. Since $G_2 \subseteq H_2$, $E(G_2) \subseteq E(H_2)$. Consider an edge $(u, v) \in E(G_1 \times G_2)$. By definition of the Cartesian product, it satisfies ($u_1 = v_1$ and $(u_2, v_2) \in E(G_2)$) or ($u_2 = v_2$ and $(u_1, v_1) \in E(G_1)$); but since $E(G_1) \subseteq E(H_1)$ and $E(G_2) \subseteq E(H_2)$, it also satisfies ($u_1 = v_1$ and $(u_2, v_2) \in E(H_2)$) or ($u_2 = v_2$ and $(u_1, v_1) \in E(H_1)$). Therefore $E(G_1 \times G_2) \subseteq E(H_1 \times H_2)$.

Since $V(G_1 \times G_2) \subseteq V(H_1 \times H_2)$ and $E(G_1 \times G_2) \subseteq E(H_1 \times H_2)$, $G_1 \times G_2 \subseteq H_1 \times H_2$.

ACKNOWLEDGMENTS

The second author thanks Vahid Tarokh for introducing him to storage area networks. The authors also thank Robert G. Gallager and Sanjoy K. Mitter for useful exchanges; Sekhar Tatikonda for discussions on mappings between source and representation spaces; and Renuka K. Sastry for assistance with genetics.

REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379-423, 623-656, July/Oct. 1948.
[2] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proc. IRE, vol. 40, no. 9, pp. 1098-1101, Sept. 1952.
[3] R. Netz and W. Noel, The Archimedes Codex. Philadelphia, PA: Da Capo Press, 2007.
[4] J. Kusuma, L. R. Varshney, and V. K. Goyal, "Malleable coding: A cut-and-paste method," IEEE Trans. Inf. Theory, 2008, in preparation.
[5] B. T. Messmer and H. Bunke, "A new algorithm for error-tolerant subgraph isomorphism detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 5, pp. 493-504, May 1998.
[6] C. E. Shannon, "The zero error capacity of a noisy channel," IRE Trans. Inf. Theory, vol. IT-2, no. 3, pp. 8-19, Sept. 1956.
[7] H. S. Witsenhausen, "The zero-error side information problem and chromatic numbers," IEEE Trans. Inf. Theory, vol. IT-22, no. 5, pp. 592-593, Sept. 1976.
[8] J. L. Gross, Topological Graph Theory. New York: John Wiley & Sons, 1987.
[9] T. Tlusty, "A model for the emergence of the genetic code as a transition in a noisy information channel," J. Theor. Biol., vol. 249, no. 2, pp. 331-342, Nov. 2007.
[10] D. R. Bobbarjung, S. Jagannathan, and C. Dubnicki, "Improving duplicate elimination in storage systems," ACM Trans. Storage, vol. 2, no. 4, pp. 424-448, Nov. 2006.
[11] C. Policroniades and I. Pratt, "Alternatives for detecting redundancy in storage systems data," in Proc. 2004 USENIX Ann. Tech. Conf., Boston, June 2004, pp. 73-86.
[12] R. Burns, L. Stockmeyer, and D. D. E. Long, "In-place reconstruction of version differences," IEEE Trans. Knowl. Data Eng., vol. 15, no. 4, pp. 973-984, July-Aug. 2003.
[13] T. Suel and N. Memon, "Algorithms for delta compression and remote file synchronization," in Lossless Compression Handbook, K. Sayood, Ed. Elsevier, 2003, pp. 269-290.
[14] S. K. Mitter and N. J. Newton, "Information and entropy flow in the Kalman-Bucy filter," J. Stat. Phys., vol. 118, no. 1-2, pp. 145-176, Jan. 2005.
[15] R. Ahlswede and Z. Zhang, "Coding for write-efficient memory," Inf. Comput., vol. 83, no. 1, pp. 80-97, Oct. 1989.
[16] ——, "On multiuser write-efficient memories," IEEE Trans. Inf. Theory, vol. 40, no. 3, pp. 674-686, May 1994.
[17] S. Ramprasad, N. R. Shanbhag, and I. N. Hajj, "Information-theoretic bounds on average signal transition activity," IEEE Trans. VLSI Syst., vol. 7, no. 3, pp. 359-368, Sept. 1999.
[18] A. Orlitsky, "Interactive communication of balanced distributions and of correlated files," SIAM J. Discrete Math., vol. 6, no. 4, pp. 548-564, Nov. 1993.
[19] Y. Minsky, A. Trachtenberg, and R. Zippel, "Set reconciliation with nearly optimal communication complexity," IEEE Trans. Inf. Theory, vol. 49, no. 9, pp. 2213-2218, Sept. 2003.
[20] M. S. Garfinkel, D. Endy, G. L. Epstein, and R. M.
Friedman, "Synthetic genomics: Options for governance," Oct. 2007. [Online]. Available: http://hdl.handle.net/1721.1/39141
[21] P. C. Wong, K.-K. Wong, and H. Foote, "Organic data memory using the DNA approach," Commun. ACM, vol. 46, no. 1, pp. 95-98, Jan. 2003.
[22] R. Swanson, "A unifying concept for the amino acid code," Bull. Math. Biol., vol. 46, no. 2, pp. 187-203, Mar. 1984.
[23] S. B. Primrose, R. M. Twyman, and R. W. Old, Principles of Gene Manipulation, 6th ed. Oxford: Blackwell Science, 2001.
[24] L. R. Varshney, P. J. Sjöström, and D. B. Chklovskii, "Optimal information storage in noisy synapses under resource constraints," Neuron, vol. 52, no. 3, pp. 409-423, Nov. 2006.
[25] G. Cormode, "Sequence distance embeddings," Ph.D. dissertation, University of Warwick, Jan. 2003.
[26] E. N. Gilbert, "Codes based on inaccurate source probabilities," IEEE Trans. Inf. Theory, vol. IT-17, no. 3, pp. 304-314, May 1971.
[27] L. D. Davisson, "Universal noiseless coding," IEEE Trans. Inf. Theory, vol. IT-19, no. 6, pp. 783-795, Nov. 1973.
[28] F. Topsøe, "Some inequalities for information divergence and related measures of discrimination," IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1602-1609, July 2000.
[29] S. Sinanović and D. H. Johnson, "Toward a theory of information processing," Signal Process., vol. 87, no. 6, pp. 1326-1344, June 2007.
[30] L. D. Davisson, R. J. McEliece, M. B. Pursley, and M. S. Wallace, "Efficient universal noiseless source codes," IEEE Trans. Inf. Theory, vol. IT-27, no. 3, pp. 269-279, May 1981.
[31] P. Gács and J. Körner, "Common information is far less than mutual information," Probl. Control Inf. Theory, vol. 2, no. 2, pp. 149-162, 1973.
[32] K. Visweswariah, S. R. Kulkarni, and S. Verdú, "Source codes as random number generators," IEEE Trans. Inf. Theory, vol. 44, no. 2, pp. 462-471, Mar. 1998.
[33] F. Fu and S. Shen, "On the expectation and variance of Hamming distance between two i.i.d. random vectors," Acta Math. Appl. Sin., vol. 13, no. 3, pp. 243-250, July 1997.
[34] J. Körner, "A property of conditional entropy," Stud. Sci. Math. Hung., vol. 6, pp. 355-359, 1971.
[35] W. H. R. Equitz and T. M. Cover, "Successive refinement of information," IEEE Trans. Inf. Theory, vol. 37, no. 2, pp. 269-275, Mar. 1991.
[36] S. Verdú, "On channel capacity per unit cost," IEEE Trans. Inf. Theory, vol. 36, no. 5, pp. 1019-1030, Sept. 1990.
[37] L. R. Varshney and V. K. Goyal, "Ordered and disordered source coding," in Proc. Inf. Theory Appl. Inaugural Workshop, La Jolla, California, Feb. 2006.
[38] S. S. Pradhan, S. Choi, and K. Ramchandran, "A graph-based framework for transmission of correlated sources over multiple-access channels," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4583-4604, Dec. 2007.
[39] E. Agrell, J. Lassing, E. G. Ström, and T. Ottosson, "On the optimality of the binary reflected Gray code," IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3170-3182, Dec. 2004.
[40] G. Longo and G. Galasso, "An application of informational divergence to Huffman codes," IEEE Trans. Inf. Theory, vol. IT-28, no. 1, pp. 36-43, Jan. 1982.
[41] E. N. Gilbert, "Gray codes and paths on the n-cube," Bell Syst. Tech. J., vol. 37, no. 3, pp. 815-826, May 1958.
[42] R. L. Dobrushin, "Shannon's theorems for channels with synchronization errors," Probl. Inf. Transm., vol. 3, no. 4, pp. 11-26, Oct.-Dec. 1967.
[43] K. Zeger and A. Gersho, "Pseudo-Gray coding," IEEE Trans. Commun., vol. 38, no. 12, pp. 2147-2158, Dec. 1990.
[44] G. Caire, G. Taricco, and E. Biglieri, "Bit-interleaved coded modulation," IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 927-946, May 1998.
[45] V. Sethuraman and B.
Hajek, "Comments on 'Bit-interleaved coded modulation'," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1795-1797, Apr. 2006.
[46] R. M. Losee, Jr., "A Gray code based ordering for documents on shelves: Classification for browsing and retrieval," J. Am. Soc. Inform. Sci., vol. 43, no. 4, pp. 312-322, May 1992.
[47] S. Edelman, Representation and Recognition in Vision. Cambridge: MIT Press, 1999.
[48] N. Alon and A. Orlitsky, "Source coding and graph entropies," IEEE Trans. Inf. Theory, vol. 42, no. 5, pp. 1329-1339, Sept. 1996.
[49] J. Körner and A. Orlitsky, "Zero-error information theory," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2207-2229, Oct. 1998.
[50] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: W. H. Freeman, 1979.
[51] D. Koschützki, K. A. Lehmann, L. Peeters, S. Richter, D. Tenfelde-Podehl, and O. Zlotowski, "Centrality indices," in Network Analysis: Methodological Foundations, U. Brandes and T. Erlebach, Eds. Berlin: Springer, 2005, pp. 16-61.
[52] H. Wiener, "Structural determination of paraffin boiling points," J. Am. Chem. Soc., vol. 69, no. 1, pp. 17-20, Jan. 1947.
[53] G. Ausiello, G. F. Italiano, A. M. Spaccamela, and U. Nanni, "Incremental algorithms for minimal length paths," J. Algorithms, vol. 12, no. 4, pp. 615-638, Dec. 1991.
[54] C. Demetrescu and G. F. Italiano, "A new approach to dynamic all pairs shortest paths," J. ACM, vol. 51, no. 6, pp. 968-992, Nov. 2004.
[55] H. Bunke, "Error correcting graph matching: On the influence of the underlying cost function," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 9, pp. 917-922, Sept. 1999.
[56] L. G. Kraft, Jr., "A device for quantizing, grouping, and coding amplitude-modulated pulses," Master's thesis, Massachusetts Institute of Technology, 1949.
[57] R.
Ahlswede, "Identification entropy," in General Theory of Information Transfer and Combinatorics, ser. Lecture Notes in Computer Science, R. Ahlswede, L. Bäumer, N. Cai, H. Aydinian, V. Blinovsky, C. Deppe, and H. Mashurian, Eds. Berlin: Springer, 2006, vol. 4123, pp. 595-613.
[58] R. W. Yeung, A First Course in Information Theory. New York: Kluwer Academic/Plenum Publishers, 2002.
[59] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 3rd ed. Budapest: Akadémiai Kiadó, 1997.
[60] M. M. Deza and M. Laurent, Geometry of Cuts and Metrics. Berlin: Springer, 1997.
[61] M. Deza, V. Grishukhin, and M. Shtogrin, Scale-Isometric Polytopal Graphs in Hypercubes and Cubic Lattices. London: Imperial College Press, 2004.
[62] A. L. Rosenberg and L. S. Heath, Graph Separators, with Applications. New York: Kluwer Academic/Plenum Publishers, 2001.
[63] M. Livingston and Q. F. Stout, "Embeddings in hypercubes," Math. Comput. Model., vol. 11, pp. 222-227, 1988.
[64] Y. Rabinovich and R. Raz, "Lower bounds on the distortion of embedding finite metric spaces in graphs," Discrete Comput. Geom., vol. 19, no. 1, pp. 79-94, Jan. 1998.
[65] N. Linial, E. London, and Y. Rabinovich, "The geometry of graphs and some of its algorithmic applications," Combinatorica, vol. 15, no. 2, pp. 215-245, June 1995.
[66] C. H. Bennett, P. Gács, M. Li, P. M. B. Vitányi, and W. H. Zurek, "Information distance," IEEE Trans. Inf. Theory, vol. 44, no. 4, pp. 1407-1423, July 1998.