On the Information Loss in Memoryless Systems: The Multivariate Case
Authors: Bernhard C. Geiger, Gernot Kubin
Signal Processing and Speech Communication Laboratory, Graz University of Technology, Austria
{geiger,gernot.kubin}@tugraz.at

Abstract—In this work we give a concise definition of information loss from a system-theoretic point of view. Based on this definition, we analyze the information loss in memoryless input-output systems subject to a continuous-valued input. For a certain class of multiple-input, multiple-output systems the information loss is quantified. An interpretation of this loss is accompanied by upper bounds which are simple to evaluate. Finally, a class of systems is identified for which the information loss is necessarily infinite. Quantizers and limiters are shown to belong to this class.

I. INTRODUCTION

In the XXXI Shannon Lecture, Han argued that information theory links information-theoretic quantities, such as entropy and mutual information, to operational quantities, such as source, channel, capacity, and error probability [1]. In this work we try to make a new link to an operational quantity not mentioned by Han: information loss.

Information can be lost, on the one hand, in erasures or due to the superposition of noise, as known from communication theory. Dating back to Shannon [2], this loss is linked to the conditional entropy of the input given the output, at least in discrete-amplitude, memoryless settings. On the other hand, as stated by the data processing inequality (DPI, [3]), information can be lost in deterministic, noiseless systems. It is this kind of loss that we treat in this work, and we will show that it makes sense to link it to the same information-theoretic quantity.

The information loss in input-output systems is very sparsely covered in the literature. Aside from the DPI for discrete random variables (RVs) and static systems, some results are available for jointly stationary stochastic processes [4]. Yet, all these results just state that information is lost, without quantifying this loss. Only in [5] is the information lost by collapsing states of a discrete-valued stochastic process quantified, namely as the difference between the entropy rates at the input and the output of the memoryless system.

Conversely, energy loss in input-output systems has been analyzed in depth, leading to meaningful definitions of transfer functions and notions of passivity, stability, and losslessness. Essentially, it is our aim to develop a system theory not from an energetic, but from an information-theoretic point of view. So far we have analyzed the information loss of discrete-valued stationary stochastic processes in finite-dimensional dynamical input-output systems [6], where we proposed an upper bound on the information loss and identified a class of information-preserving systems (the information-theoretic counterpart to lossless systems). In [7] the information loss of continuous RVs in memoryless systems was quantified and bounded in a preliminary way. In this work, extending [7], we analyze the information loss for static multiple-input, multiple-output systems subject to a continuous input RV. Unlike in our previous work, we permit functions which lose an infinite amount of information and present the according conditions.
Aside from that, we provide a link between information loss and differential entropy, a quantity which is not invariant under changes of variables. The next steps towards an information-centered system theory are the analysis of discrete-time dynamical systems with continuous-valued stationary input processes and a treatment of information loss in multirate systems.

In the remainder of this paper we give a mathematically concise definition of information loss (Section II). After restricting the class of systems in Section III, in Section IV we provide exact results for the information loss together with simple bounds, and establish a link to differential entropies. Finally, in Section V we show under which conditions the information loss becomes infinite. This manuscript is an extended version of a paper submitted to a conference.

II. A DEFINITION OF INFORMATION LOSS

When talking about the information loss induced by processing of signals, it is of prime importance to accompany this discussion by a well-based definition of information loss, going beyond, but without lacking, intuition. Further, the definition shall also allow generalizations to stochastic processes and dynamical systems without contradicting previous statements. We try to meet both objectives with the following

Definition 1. Let $X$ be an RV¹ on the sample space $\mathcal{X}$, and let $Y$ be obtained by transforming $X$. We define the information loss induced by this transform as

$$L(X \to Y) = \sup_{\mathcal{P}} \left( I(\hat{X};X) - I(\hat{X};Y) \right) \qquad (1)$$

where the supremum is over all partitions $\mathcal{P}$ of $\mathcal{X}$, and where $\hat{X}$ is obtained by quantizing $X$ according to the partition $\mathcal{P}$ (see Fig. 1).

¹Note that $X$ and all other involved RVs need not be scalar-valued.

Fig. 1. Model for computing the information loss of a memoryless input-output system $g$. $Q$ is a quantizer with partition $\mathcal{P}$; $\hat{X} = Q(X)$ and $Y = g(X)$ enter the mutual informations $I(\hat{X};X)$ and $I(\hat{X};Y)$.

This definition is motivated by the data processing inequality (cf. [3]), which states that the expression under the supremum is always non-negative: information loss is the worst-case reduction of information about $\hat{X}$ induced by transforming $X$. We now try to shed a little more light on Definition 1 with the following

Theorem 1. The information loss of Definition 1 is given by

$$L(X \to Y) = \lim_{\hat{X} \to X} \left( I(\hat{X};X) - I(\hat{X};Y) \right) \qquad (2)$$
$$= H(X|Y). \qquad (3)$$

Proof: We start by noticing that

$$I(\hat{X};X) - I(\hat{X};Y) = H(\hat{X}|Y) \leq H(X|Y) \qquad (4)$$

by the definition of mutual information and since both $\hat{X}$ and $Y$ are functions of $X$. The inequality in (4) is due to data processing ($\hat{X}$ is a function of $X$). We now show that equality can be achieved in the supremum over all partitions. To this end, observe that among all partitions of the sample space of $X$ there is a sequence $\{\mathcal{P}_n\}$ of increasingly fine partitions² such that

$$\lim_{n\to\infty} \hat{X}_n = X \qquad (5)$$

where $\hat{X}_n$ is the quantization of $X$ induced by partition $\mathcal{P}_n$. By the axioms of entropy (e.g., [8, Ch. 14]), $H(\hat{X}_n|Y)$ is an increasing sequence in $n$ with limit $H(X|Y)$. Thus, this limit represents the supremum in Definition 1, which proves (3). Note further that each converging sequence $\hat{X} \to X$ contains a converging subsequence $\hat{X}_n \to X$ satisfying (5), where $\hat{X}_{n+1}$ is obtained by refining the partition inducing $\hat{X}_n$.
Therefore,

$$\lim_{\hat{X}\to X} H(\hat{X}|Y) = \lim_{n\to\infty} H(\hat{X}_n|Y) = H(X|Y) \qquad (6)$$

which completes the proof.

²i.e., $\mathcal{P}_{n+1}$ is a refinement of $\mathcal{P}_n$.

This theorem shows that the supremum in Definition 1 is achieved for $\hat{X} \equiv X$, i.e., when we compute the difference between the self-information of the input and the information the output of the system contains about its input. This difference was shown to be identical to the conditional entropy of the input given the output – the quantity which is also used for quantifying the information loss due to noise or erasures (in the discrete-valued, memoryless case). In addition, the theorem suggests a natural way to measure the information loss by measuring mutual informations, as depicted in Fig. 1. As we will see later (cf. Theorem 3), the considered partition does not have to be infinitely fine; indeed, a comparably coarse partition can deliver the correct result.

III. PROBLEM STATEMENT

Let $X = [X_1, X_2, \ldots, X_N]$ be an $N$-dimensional RV with a probability measure $P_X$ absolutely continuous w.r.t. the Lebesgue measure $\mu$ ($P_X \ll \mu$). We require $P_X$ to be concentrated on $\mathcal{X} \subseteq \mathbb{R}^N$. This RV, which possesses a unique probability density function (PDF) $f_X$, is the input to the following multivariate, vector-valued function:

Definition 2. Let $g: \mathcal{X} \to \mathcal{Y}$, $\mathcal{X}, \mathcal{Y} \subseteq \mathbb{R}^N$, be a surjective, Borel-measurable function defined in a piecewise manner:

$$g(x) = \begin{cases} g_1(x), & \text{if } x \in \mathcal{X}_1 \\ g_2(x), & \text{if } x \in \mathcal{X}_2 \\ \vdots & \end{cases} \qquad (7)$$

where $x = [x_1, x_2, \ldots, x_N]$ and each $g_i: \mathcal{X}_i \to \mathcal{Y}_i$ is bijective³. Furthermore, let the Jacobian matrix $J_g(\cdot)$ exist on the closures of the $\mathcal{X}_i$. In addition, we require the Jacobian determinant $|\det J_g(\cdot)|$ to be non-zero $P_X$-almost everywhere.

³In the univariate case, i.e., for $N = 1$, this is equivalent to requiring that $g$ is piecewise strictly monotone.

In accordance with previous work [7], the $\mathcal{X}_i$ are disjoint sets of positive $P_X$-measure which unite to $\mathcal{X}$, i.e., $\bigcup_i \mathcal{X}_i = \mathcal{X}$ and $\mathcal{X}_i \cap \mathcal{X}_j = \emptyset$ for $i \neq j$. Clearly, the $\mathcal{Y}_i$ also unite to $\mathcal{Y}$, but need not be disjoint. This definition ensures that the preimage $g^{-1}[y]$ of each element $y \in \mathcal{Y}$ is a countable set. Using the method of transformation [8, pp. 244] one obtains the PDF of the $N$-dimensional output RV $Y = [Y_1, Y_2, \ldots, Y_N]$ as

$$f_Y(y) = \sum_{x_i \in g^{-1}[y]} \frac{f_X(x_i)}{|\det J_g(x_i)|} \qquad (8)$$

where the sum is over all elements of the preimage. Note that since $Y$ possesses a density, the corresponding probability measure $P_Y$ is also absolutely continuous w.r.t. the Lebesgue measure.

IV. MAIN RESULTS

We now state our main results:

Theorem 2. The information loss induced by a function $g$ satisfying Definition 2 is given as

$$H(X|Y) = \int_{\mathcal{X}} f_X(x) \log \frac{\sum_{x_i \in g^{-1}[g(x)]} \frac{f_X(x_i)}{|\det J_g(x_i)|}}{\frac{f_X(x)}{|\det J_g(x)|}} \, dx. \qquad (9)$$

The proof of this theorem can be found in the Appendix and, in a modified version for univariate functions, in [7]. Note that for univariate functions the Jacobian determinant is replaced by the derivative of the function.

Corollary 1. The information loss induced by a function $g$ satisfying Definition 2 is given as

$$H(X|Y) = h(X) - h(Y) + E\{\log |\det J_g(X)|\}. \qquad (10)$$

Proof: The proof is obtained by recognizing the PDF of $Y$ inside the logarithm in (9) and by splitting the logarithm.
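To make Theorem 2 concrete before the examples of Section VI, the following sketch evaluates (9) by Monte Carlo for the univariate square-law device $Y = X^2$ (which reappears in Example 2). This is an illustration added here, not code from the paper; the mean `mu`, sample size, and seed are arbitrary choices. Since both preimage elements $\pm x$ share the derivative magnitude $|2x|$, the integrand of (9) collapses to $\log\left((f_X(x) + f_X(-x))/f_X(x)\right)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_square_law(mu, n=1_000_000):
    """Monte Carlo evaluation of (9) for Y = X^2 with X ~ N(mu, 1).

    The preimage of g(x) = x^2 is {x, -x}, and |g'| = |2x| is equal at both
    points, so the integrand of (9) reduces to log2((f(x) + f(-x)) / f(x)).
    """
    f = lambda t: np.exp(-(t - mu) ** 2 / 2) / np.sqrt(2 * np.pi)
    x = mu + rng.standard_normal(n)
    return np.mean(np.log2((f(x) + f(-x)) / f(x)))  # loss in bits

print(loss_square_law(0.0))  # exactly 1 bit: even-symmetric PDF (cf. Example 2)
print(loss_square_law(1.0))  # about 0.51 bits: the sign is partly predictable from Y
```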
This result is particularly interesting because it provides a link between information loss and differential entropies, already anticipated in [8, pp. 660]. There, it was claimed that

$$h(Y) \leq h(X) + E\{\log |\det J_g(X)|\} \qquad (11)$$

where equality holds iff $g$ is bijective. While (11) is actually another version of the DPI, Corollary 1 quantifies how much information is lost by processing. In addition, a very similar expression, denoted as folding entropy, has been presented in [9], although in a completely different setting analyzing the entropy production of autonomous dynamical systems.

We now introduce a discrete RV $W$ which depends on the set $\mathcal{X}_i$ from which $X$ was taken. In other words, for all $i$ we have $W = w_i$ iff $x \in \mathcal{X}_i$. One can interpret this RV as being generated by a vector quantization of $X$ with the partition $\mathcal{P} = \{\mathcal{X}_i\}$. With this new RV we can state

Theorem 3. The information loss is identical to the uncertainty about the set $\mathcal{X}_i$ from which the input was taken, i.e.,

$$H(X|Y) = H(W|Y). \qquad (12)$$

The proof follows closely the proof provided in [7] and is thus omitted. However, this equivalence suggests a way of measuring information loss by means of proper quantization: since $H(W|Y) = I(W;X) - I(W;Y)$, the loss can be determined by measuring mutual informations, which in this case are always finite (or, at least, bounded by $H(W)$). In contrast, the mutual informations in (2) of Theorem 1 diverge to infinity; that expression was used in [7] for the information loss, highlighting the fact that both the self-information of $X$ and the information transfer from $X$ to $Y$ are infinite.

The interpretation derived from Theorem 3 now allows us to provide upper bounds on the information loss:

Theorem 4. The information loss is upper bounded by

$$H(X|Y) \leq \int_{\mathcal{Y}} f_Y(y) \log |g^{-1}[y]| \, dy \qquad (13)$$
$$\leq \log \left( \sum_i \int_{\mathcal{Y}_i} f_Y(y) \, dy \right) \qquad (14)$$
$$\leq \max_{y} \log |g^{-1}[y]|. \qquad (15)$$

Proof: We give here only a sketch of the proof: the first inequality results from bounding $H(W|Y=y)$ by the entropy of a uniform distribution on the preimage of $y$. Jensen's inequality yields the second line of the theorem. The coarsest bound is obtained by replacing the cardinality of the preimage by its maximal value.

In this theorem, we bounded the information loss given a certain output by the cardinality of the preimage. While the first bound accounts for the fact that the cardinality may depend on the output itself, the last bound incorporates the maximum cardinality only. In cases where the function from Definition 2 is defined not on a countable but on a finite number of subdomains, this finite number can act as an upper bound (cf. [7]).

Another straightforward upper bound, independent of the bounds in Theorem 4, is obtained from Theorem 3 by removing conditioning:

$$H(X|Y) \leq H(W) = -\sum_i p_i \log p_i \qquad (16)$$

where $p_i = P_X(\mathcal{X}_i) = \int_{\mathcal{X}_i} f_X(x) \, dx$. It has to be noted, though, that depending on the function $g$ all these bounds can be infinite while the information loss remains finite.

A further implication of introducing the discrete RV $W$ is that it allows us to investigate reconstructing the input from the output. Currently, a Fano-type inequality bounding the reconstruction error by the information loss is under development. In addition, new upper bounds on the information loss, related to the reconstruction error of optimal (in the maximum a posteriori sense) and of simpler, sub-optimal estimators, are being analyzed.
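Theorem 3 and the bound (16) can likewise be illustrated numerically. In the sketch below (an added illustration under assumed parameters, not material from the paper), $Y = X^2$ with $X \sim \mathcal{N}(1,1)$, so $W = \operatorname{sgn}(X)$ indicates the subdomain; the posterior $P(W = +1 \mid Y = y) = f_X(\sqrt{y})/(f_X(\sqrt{y}) + f_X(-\sqrt{y}))$ gives $H(W|Y)$ directly, and $H(W)$ bounds it from above.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n = 1.0, 1_000_000
f = lambda t: np.exp(-(t - mu) ** 2 / 2) / np.sqrt(2 * np.pi)
h2 = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)  # binary entropy, p in (0,1)

x = mu + rng.standard_normal(n)
r = np.abs(x)                   # given Y = X^2, the preimage of Y is {+r, -r}
p_pos = f(r) / (f(r) + f(-r))   # posterior probability of W = sgn(X) = +1

print(h2(p_pos).mean())         # H(W|Y) ~ 0.51 bits = H(X|Y) by Theorem 3
print(h2((x > 0).mean()))       # H(W)   ~ 0.63 bits, the upper bound (16)
```

The estimate of $H(W|Y)$ agrees with the Monte Carlo evaluation of (9) given after Corollary 1, as Theorem 3 predicts.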
V. FUNCTIONS WITH INFINITE INFORMATION LOSS

We now drop the requirement of local bijectivity in Definition 2 to analyze a wider class of surjective, Borel-measurable functions $g: \mathcal{X} \to \mathcal{Y}$. We keep the requirement that $P_X \ll \mu$ and thus that $X$ possesses a density $f_X$ (positive on $\mathcal{X}$ and zero elsewhere). We maintain

Theorem 5. Let $g: \mathcal{X} \to \mathcal{Y}$ be a Borel-measurable function and let the continuous RV $X$ be the input to this function. If there exists a set $B \subseteq \mathcal{Y}$ of positive $P_Y$-measure such that the preimage $g^{-1}[y]$ is uncountable for every $y \in B$, then the information loss is infinite.

Proof: We notice that since $B \subseteq \mathcal{Y}$,

$$H(X|Y) = \int_{\mathcal{Y}} H(X|Y=y) \, dP_Y(y) \qquad (17)$$
$$\geq \int_{B} H(X|Y=y) \, dP_Y(y) \qquad (18)$$

where the integrals are now written as Lebesgue integrals, since $P_Y$ does not necessarily possess a density. Since on $B$ the preimage of every element is uncountable, we obtain with [4] and the references therein $H(X|Y=y) = \infty$ for all $y \in B$, and, thus, $H(X|Y) = \infty$.

Note that the requirement of $B$ being a set of positive $P_Y$-measure cannot be dropped, as Example 4 in Section VI illustrates. We immediately obtain the following

Corollary 2. Let $g: \mathcal{X} \to \mathcal{Y}$ be a Borel-measurable function and let the continuous RV $X$ be the input to this function. If the probability measure of the output, $P_Y$, possesses a non-vanishing discrete component, the information loss is infinite.

Proof: According to the Lebesgue-Radon-Nikodym theorem [10, pp. 121], every measure can be decomposed into a component absolutely continuous w.r.t. $\mu$ and a component singular to $\mu$. The latter part can further be decomposed into a singular continuous and a discrete part, where the discrete part places positive $P_Y$-mass on points. Let $y^*$ be such a point, i.e., $P_Y(y^*) > 0$. As an immediate consequence, $P_X(g^{-1}[y^*]) > 0$, which is only possible if $g^{-1}[y^*]$ is uncountable ($P_X \ll \mu$).

This result is also in accordance with intuition, as the analysis of a simple quantizer shows: while the entropy of the input RV is infinite ($I(\hat{X};X) \to \infty$ for $\hat{X} \to X$; cf. [8, pp. 654]), the quantized output can contain only a finite amount of information ($I(\hat{X};Y) \to H(Y) < \infty$). In addition, the preimage of each possible output value $y$ is a set of positive $P_X$-measure. The loss, as a consequence, is infinite.

While for the quantizer the preimage of each possible output value is a set of positive measure, there certainly are functions for which some outputs have a countable preimage and some a preimage of positive measure. An example of such a system is the limiter [8, Ex. 5-4]. For such systems it can be shown that both the information loss $L(X \to Y) = H(X|Y)$ and the information transfer $I(X;Y)$ are infinite.

Finally, there exist functions $g$ for which the preimages of all output values $y$ are null sets, but which still fulfill the conditions of Theorem 5. Functions which project $X$ onto a lower-dimensional subspace of $\mathbb{R}^N$ fall into that category.
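The quantizer discussion above can be made tangible with a small numerical sketch (again an added illustration; the 2-bit quantizer, uniform input, and sample size are assumptions). Refining the partition $\mathcal{P}_n$ shows $H(\hat{X}_n|Y)$ growing without bound, in line with Theorem 5 and Corollary 2:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.random(1_000_000)     # X uniform on [0,1)
y = np.floor(4 * x) / 4       # 2-bit uniform quantizer g

def entropy(labels):
    """Empirical entropy in bits of a discrete sample."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

for bits in range(2, 12, 2):
    xn = np.floor(2**bits * x)  # quantization of X with 2**bits aligned cells
    # Y is a function of X_n here, so H(X_n|Y) = H(X_n) - H(Y);
    # the conditional entropy grows by ~2 bits per refinement step
    print(bits, entropy(xn) - entropy(y))
```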
VI. EXAMPLES

In this section we illustrate our theoretical results with the help of examples. The logarithm is taken to base 2 unless otherwise noted.

A. Example 1: A two-dimensional transform with finite information loss

Let $X$ be uniformly distributed on the square $\mathcal{X} = [-a,a] \times [-a,a]$. Equivalently, the two constituent RVs $X_1$ and $X_2$ are independent and uniformly distributed on $[-a,a]$. In other words, while $f_X(x) = \frac{1}{4a^2}$ for all $x \in \mathcal{X}$, we have $f_{X_i}(x_i) = \frac{1}{2a}$ for $x_i \in [-a,a]$ and $i = 1,2$. We consider a function $g$ performing the mapping

$$Y_1 = X_1 \qquad (19)$$
$$Y_2 = |X_1 - X_2|. \qquad (20)$$

The corresponding Jacobian matrix is the triangular matrix

$$J_g(x) = \begin{bmatrix} 1 & 0 \\ \operatorname{sgn}(x_1 - x_2) & \operatorname{sgn}(x_2 - x_1) \end{bmatrix} \qquad (21)$$

where $\operatorname{sgn}(\cdot)$ is the sign function. From this it immediately follows that the magnitude of the Jacobian determinant is unity for all possible values of $X$, i.e., $|\det J_g(x)| = 1$ for all $x \in \mathcal{X}$. The subsets of $\mathcal{X}$ on which the partitioned functions $g_i$ are bijective are not intervals in this case; they are the triangular halves of the square induced by $x_1 = x_2$ (see Fig. 2):

$$\mathcal{X}_1 = \{[x_1,x_2] \in \mathcal{X}: x_1 > x_2\} \qquad (22)$$
$$\mathcal{X}_2 = \{[x_1,x_2] \in \mathcal{X}: x_1 \leq x_2\}. \qquad (23)$$

Fig. 2. Subdomains of Example 1. The partitioned functions $g_i$ restricted to the domain of either color are bijective. Furthermore, the overall function $g$ is bijective in areas with light shading.

The preimage of $g(x)$ is, in any case,

$$\{[x_1, x_2], [x_1, 2x_1 - x_2]\} \cap \mathcal{X}. \qquad (24)$$

The transform $g$ is bijective whenever $[x_1, 2x_1 - x_2] \notin \mathcal{X}$, i.e., if $|2x_1 - x_2| > a$. With the PDF of $X$ and of its components we obtain for the information loss

$$H(X|Y) = \int_{-a}^{a} \int_{-a}^{a} \frac{1}{4a^2} \log \frac{\frac{1}{2a} + f_{X_2}(2x_1 - x_2)}{\frac{1}{2a}} \, dx_1 \, dx_2 \qquad (25)$$

where the integrand is non-zero only if $-a \leq 2x_1 - x_2 \leq a$ (numerator and denominator cancel otherwise; no loss occurs in the bijective domain of the function). As a consequence,

$$H(X|Y) = \int_{-a}^{a} \int_{\frac{x_2-a}{2}}^{\frac{x_2+a}{2}} \frac{\log 2}{4a^2} \, dx_1 \, dx_2 \qquad (26)$$
$$= \int_{-a}^{a} \frac{1}{4a} \, dx_2 = \frac{1}{2}. \qquad (27)$$

The information loss is identical to half a bit. This is intuitive when looking at Fig. 2, where it can be seen that any information loss occurs only on one half of the domain $\mathcal{X}$ (shaded in stronger colors). By destroying the sign information, in this area the information loss is equal to one bit.

B. Example 2: Squaring a Gaussian RV

Let $X$ be a zero-mean Gaussian RV with unit variance and differential entropy $h(X) = \frac{1}{2}\ln(2\pi e)$ measured in nats. We consider the square of this RV, $Y = g(X) = X^2$, to illustrate the connection between information loss and differential entropy. The square of a Gaussian RV, $Y$, is $\chi^2$-distributed with one degree of freedom. Thus, the differential entropy of $Y$ is given by [11]

$$h(Y) = \frac{1}{2} + \ln\left(2\Gamma\left(\tfrac{1}{2}\right)\right) + \frac{1}{2}\psi\left(\tfrac{1}{2}\right) \qquad (28)$$
$$= \frac{1}{2} + \frac{1}{2}\ln\pi - \frac{\gamma}{2} \qquad (29)$$

where $\Gamma(\cdot)$ and $\psi(\cdot)$ are the gamma and digamma functions [12, Ch. 6] and $\gamma$ is the Euler-Mascheroni constant [12, pp. 3]. With some calculus we obtain for the expected value of the derivative (taking the place of the Jacobian determinant in the univariate case)

$$E\{\ln|2X|\} = \frac{1}{2}\ln 2 - \frac{\gamma}{2}. \qquad (30)$$

Subtracting differential entropies and adding the expected value of the derivative yields the information loss

$$H(X|Y) = h(X) - h(Y) + E\{\ln|2X|\} \qquad (31)$$
$$= \frac{1}{2}\ln(2\pi e) - \frac{1}{2} - \frac{1}{2}\ln\pi + \frac{1}{2}\ln 2 \qquad (32)$$
$$= \ln 2 \qquad (33)$$

again measured in nats.
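The chain (31)–(33) admits a quick numerical cross-check via Corollary 1; the following sketch (an added illustration, with sample size and seed chosen arbitrarily) uses the closed forms for $h(X)$ and $h(Y)$ and estimates (30) by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(3)
gamma = 0.5772156649015329                    # Euler-Mascheroni constant

h_X = 0.5 * np.log(2 * np.pi * np.e)          # h(X) of N(0,1) in nats
h_Y = 0.5 + 0.5 * np.log(np.pi) - gamma / 2   # h(Y) from (29)

x = rng.standard_normal(1_000_000)
e_term = np.mean(np.log(np.abs(2 * x)))       # Monte Carlo for (30)

print(h_X - h_Y + e_term, "vs", np.log(2))    # Corollary 1: loss = ln 2 nats
```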
Changing the base of the logarithm to 2, we obtain an information loss of one bit. This is in perfect accordance with a previous result showing that the information loss of a square-law device is equal to one bit if the PDF of the input has even symmetry [7].

C. Example 3: Exponential RV and infinite bounds

In this example we consider an exponential input with PDF

$$f_X(x) = \lambda e^{-\lambda x} \qquad (34)$$

and a piecewise linear function

$$g(x) = x - \frac{\lfloor \lambda x \rfloor}{\lambda}. \qquad (35)$$

The PDF and the function are depicted in Fig. 3.

Fig. 3. PDF $f_X$ and piecewise linear function $g$ of Example 3.

We obviously have $\mathcal{X} = [0,\infty)$ and $\mathcal{Y} = [0, \frac{1}{\lambda})$, while $g$ partitions $\mathcal{X}$ into a countable number of intervals of length $\frac{1}{\lambda}$. In other words,

$$\mathcal{X}_k = \left[\frac{k-1}{\lambda}, \frac{k}{\lambda}\right) \qquad (36)$$

and $g(\mathcal{X}_k) = \mathcal{Y}$ for all $k = 1, 2, \ldots$ From this it follows that for every $y \in \mathcal{Y}$ the preimage contains an element from each subdomain $\mathcal{X}_k$; thus, the bounds from Theorem 4 all evaluate to $H(X|Y) \leq \infty$. However, the other bound, $H(X|Y) \leq H(W)$, is tight in this case: with

$$p_k = P_X(\mathcal{X}_k) = \int_{\mathcal{X}_k} f_X(x) \, dx = (1 - e^{-1})\,e^{-k+1} \qquad (37)$$

we obtain $H(W) = -\log(1-e^{-1}) + \frac{e^{-1}}{1-e^{-1}}\log e \approx 1.50$ bits. The same result is obtained by a direct evaluation of Theorem 2.
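Both the value of $H(W)$ and the tightness claim can be verified numerically. The sketch below is an added illustration (assuming $\lambda = 1$; seed and sample size arbitrary): it evaluates $H(W)$ from the geometric masses (37), and the loss via Theorem 2, using that $|g'| = 1$ and that summing $f_X$ over the preimage of $y$ gives the closed form $f_Y(y) = \lambda e^{-\lambda y}/(1 - e^{-1})$ on $[0, \frac{1}{\lambda})$:

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 1.0
x = rng.exponential(1 / lam, 1_000_000)
y = x - np.floor(lam * x) / lam                  # sawtooth g of (35)

q = np.exp(-1.0)                                 # p_k = (1 - q) q^(k-1), cf. (37)
H_W = -np.log2(1 - q) + (q / (1 - q)) * np.log2(np.e)
print("H(W)   =", H_W)                           # ~ 1.50 bits

# Direct evaluation of Theorem 2: |g'| = 1, and the preimage sum in (9)
# equals f_Y(y) = lam * exp(-lam * y) / (1 - q) for y in [0, 1/lam)
f_x = lam * np.exp(-lam * x)
f_y = lam * np.exp(-lam * y) / (1 - q)
print("H(X|Y) =", np.mean(np.log2(f_y / f_x)))   # matches H(W): the bound is tight
```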
D. Example 4: An almost invertible transform with zero information loss

As a next example, consider a two-dimensional RV $X$ which places probability mass uniformly on the unit disc, i.e.,

$$f_X(x) = \begin{cases} \frac{1}{\pi}, & \text{if } \|x\| \leq 1 \\ 0, & \text{else} \end{cases} \qquad (38)$$

where $\|\cdot\|$ is the Euclidean norm. Thus, $\mathcal{X} = \{x \in \mathbb{R}^2: \|x\| \leq 1\}$. The Cartesian coordinates $x$ are now transformed to polar coordinates in a special way, namely

$$y_1 = \begin{cases} \|x\|, & \text{if } \|x\| < 1 \\ 0, & \text{else} \end{cases} \qquad (39)$$
$$y_2 = \begin{cases} \arctan\left(\frac{x_2}{x_1}\right) + \pi(1 - \operatorname{sgn}(x_1)), & \text{if } 0 < \|x\| < 1 \\ 0, & \text{else.} \end{cases} \qquad (40)$$

This mapping, together with the domains of $X$ and $Y$, is illustrated in Fig. 4 (left and upper right diagrams). As a direct consequence we have $\mathcal{Y} = \left((0,1) \times [0, 2\pi)\right) \cup \{(0,0)\}$.

Fig. 4. Mapping of domains in Examples 4 and 5. The solid red circle in the left diagram and the red dot in the upper right diagram correspond to each other, illustrating the mapping of an uncountable $P_X$-null set to a point. The lightly shaded areas are mapped bijectively in Example 4. In Example 5, the disc in the left diagram is mapped to the solid red line in the lower right diagram.

Observe that not only the point $x = (0,0)$ is mapped to the point $y = (0,0)$, but that also the unit circle $S = \{x: \|x\| = 1\}$ is mapped to $y = (0,0)$. As a consequence, the preimage of $(0,0)$ under $g$ is uncountable. However, since a circle in $\mathbb{R}^2$ is a Lebesgue null set and thus $P_X(S) = 0$, also $P_Y(\{(0,0)\}) = 0$, and the conditions of Theorem 5 are not met. Indeed, since $H(X|Y=y) = 0$ $P_Y$-almost everywhere, it can be shown that $H(X|Y) = 0$.

E. Example 5: A mapping to a subspace of lower dimensionality

Consider again a uniform distribution on the unit disc, as used in Example 4. Now, however, let $g$ be such that only the radius is computed while the angle is lost, i.e.,

$$y_1 = \|x\| \qquad (41)$$
$$y_2 = 0. \qquad (42)$$

Note that here only the origin $(0,0)$ is mapped bijectively, while for all other $y \in \mathcal{Y} = [0,1] \times \{0\}$ the preimage under $g$ is uncountable (a circle around the origin with radius $y_1$). Indeed, in this particular example the probability measure $P_Y$ is not discrete, but singular continuous: each point has zero $P_Y$-measure (circles are Lebesgue null sets), but $P_Y$ is not absolutely continuous w.r.t. the two-dimensional Lebesgue measure $\mu$. Clearly, $\mu(\mathcal{Y}) = 0$ while $P_Y(\mathcal{Y}) = 1$. Since the preimage is uncountable on a set of positive $P_Y$-measure, we have $H(X|Y) = \infty$.

F. Example 6: Another two-dimensional transform with finite information loss

Finally, consider a uniform distribution on a triangle defined by

$$\mathcal{X} = \{x \in \mathbb{R}^2: x_1 \in [m-a, m+a],\ x_2 \in [-m-a, -x_1]\} \qquad (43)$$

where $0 \leq m \leq a$ and $a > 0$. Thus, the PDF of $X$ is given as $f_X(x) = \frac{1}{2a^2}$ if $x \in \mathcal{X}$ and zero elsewhere (see Fig. 5). The function $g$ takes the magnitude of each coordinate, i.e., $y_i = |x_i|$, $i = 1, 2$. We now derive the information loss as a function of $m$.

Fig. 5. Subdomains of Example 6. The functions $g_i$ restricted to a domain of either color are bijective. Furthermore, the overall function $g$ is bijective in areas with light shading.

First, we can identify three subsets of $\mathcal{X}$ which are mapped bijectively by restricting $g$ to these sets, namely $\mathcal{X}_1 = \{x \in \mathcal{X}: x_1 \leq 0, x_2 \geq 0\}$, $\mathcal{X}_2 = \{x \in \mathcal{X}: x_1 \leq 0, x_2 < 0\}$, and $\mathcal{X}_3 = \{x \in \mathcal{X}: x_1 > 0, x_2 < 0\}$. Furthermore, for $m > 0$ a part of $\mathcal{X}_3$ is mapped bijectively by $g$ itself (lighter shading in Fig. 5). The probability mass contained in this subset $\mathcal{X}_b$ can be shown to equal $P_b = P_X(\mathcal{X}_b) = \frac{m^2}{a^2}$. For all other possible input values $x$ the preimage of $g(x)$ has exactly two elements: one of them is located in $\mathcal{X}_2$, the other either in $\mathcal{X}_1$ or in $\mathcal{X}_3 \setminus \mathcal{X}_b$. Due to the uniformity of $X$, and since the Jacobian determinant is identical to unity for all $x \in \mathcal{X}$, both of these preimages are equally likely. Thus, on $\mathcal{X} \setminus \mathcal{X}_b$ the information loss is identical to one bit. In other words,

$$H(X|Y=y) = 1 \qquad (44)$$

for all $y \in g(\mathcal{X} \setminus \mathcal{X}_b)$. We therefore obtain with $P_b = \frac{m^2}{a^2}$ an information loss equal to $H(X|Y) = 1 - \frac{m^2}{a^2}$.

From the probability masses contained in the sets $\mathcal{X}_1$, $\mathcal{X}_2$, and $\mathcal{X}_3$ we can compute an upper bound on the information loss:

$$H(W) = \frac{m^2}{2a^2} + \frac{3}{2} - \log\frac{a^2 - m^2}{a^2} + \frac{m}{a}\log\frac{a-m}{a+m}. \qquad (45)$$

Evaluating the bounds of Theorem 4 yields

$$H(X|Y) \leq 1 - \frac{m^2}{a^2} \leq \log\left(2 - \frac{m^2}{a^2}\right) \leq 1 \qquad (46)$$

which for $m = 0$ all reduce to one bit. In particular, it can be seen that in this case the smallest bound of Theorem 4 is exact.

Fig. 6. Information loss $H(X|Y)$ of Example 6 for $a = 2$, together with the upper bound from Theorem 4 and the upper bound $H(W)$ from (16).

The exact information loss, together with the second-smallest bound from Theorem 4 and with the bound from $H(W)$, is shown in Fig. 6. As can be seen, the closer the parameter $m$ approaches $a$, the smaller the information loss gets. Conversely, for $m = 0$ the information loss is exactly one bit. Moreover, it turns out that the bound from $H(W)$ is rather loose in this case.
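Example 6 also lends itself to a direct Monte Carlo check of $H(X|Y) = 1 - \frac{m^2}{a^2}$. The sketch below is an added illustration (assuming $a = 2$, $m = 1$; seed and sample size arbitrary): it samples the triangle (43), counts the preimage points of $g(x) = (|x_1|, |x_2|)$ among the four sign combinations, and uses that, with a uniform density and unit Jacobian determinant, $H(X|Y = y) = \log_2 |g^{-1}[y]|$:

```python
import numpy as np

rng = np.random.default_rng(5)
a, m, n = 2.0, 1.0, 1_000_000

# Sample uniformly on the triangle X of (43)
x1 = rng.uniform(m - a, m + a, n)
x2 = rng.uniform(-m - a, -x1)        # upper limit -x1 varies per sample

def in_X(u, v):
    return (u >= m - a) & (u <= m + a) & (v >= -m - a) & (v <= -u)

# Count the preimage points of g(x) = (|x1|, |x2|) lying in X
count = sum(in_X(s1 * np.abs(x1), s2 * np.abs(x2)).astype(int)
            for s1 in (1, -1) for s2 in (1, -1))

# Uniform density and |det Jg| = 1 make all preimage points equally likely
loss = np.mean(np.log2(count))       # E[log2 |g^{-1}[y]|]
print(loss, "vs closed form", 1 - m**2 / a**2)
```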
VII. CONCLUSION

In this work, we proposed a mathematically concise definition of information loss for the purpose of establishing a system theory from an information-theoretic point of view. For a certain class of multivariate, vector-valued functions and continuous input variables this information loss was quantified, and the result is accompanied by convenient upper bounds. We further showed a connection between information loss and the differential entropies of the input and output variables. Finally, a class of systems has been identified for which the information loss is infinite. Vector quantizers and limiters belong to that class, but also functions which project the input space onto a space of lower dimensionality.

APPENDIX
PROOF OF THEOREM 2

For the proof we use (2) of Theorem 1, where we take the limit of a sequence of increasingly fine partitions $\mathcal{P}_n = \{\hat{\mathcal{X}}_k^{(n)}\}$ satisfying (5). For a given $n$ we write the resulting mutual information $I(\hat{X}_n;X)$ as

$$I(\hat{X}_n;X) = E\left\{ D\left(f_{X|\hat{X}_n}(\cdot,\hat{x}) \,\big\|\, f_X(\cdot)\right) \right\} \qquad (47)$$

where $D(\cdot\|\cdot)$ denotes the Kullback-Leibler divergence and the expectation is w.r.t. $\hat{X}_n$. Note that for each possible outcome $\hat{x}_k$ of $\hat{X}_n$ the conditional probability measure $P_{X|\hat{x}_k}$ is absolutely continuous w.r.t. the Lebesgue measure (cf. Section V). It thus possesses a density

$$f_{X|\hat{X}_n}(x,\hat{x}_k) = \begin{cases} \frac{f_X(x)}{p(\hat{x}_k)}, & \text{if } x \in \hat{\mathcal{X}}_k^{(n)} \\ 0, & \text{else} \end{cases} \qquad (48)$$

where $p(\hat{x}_k) = P_X(\hat{\mathcal{X}}_k^{(n)})$. With the definition of the Kullback-Leibler divergence [3, Lemma 5.2.3] and [8, Thm. 5-1] we can write the difference of mutual informations in Theorem 1 as

$$I(\hat{X}_n;X) - I(\hat{X}_n;Y) = \sum_k p(\hat{x}_k) \int_{\hat{\mathcal{X}}_k^{(n)}} \frac{f_X(x)}{p(\hat{x}_k)} \log\left( \frac{f_{X|\hat{X}_n}(x,\hat{x}_k)\, f_Y(g(x))}{f_{Y|\hat{X}_n}(g(x),\hat{x}_k)\, f_X(x)} \right) dx. \qquad (49)$$

Rewriting with the indicator function

$$I_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{else} \end{cases} \qquad (50)$$

this yields

$$I(\hat{X}_n;X) - I(\hat{X}_n;Y) = \int_{\mathcal{X}} f_X(x) \sum_k I_{\hat{\mathcal{X}}_k^{(n)}}(x) \log\left( \frac{f_{X|\hat{X}_n}(x,\hat{x}_k)\, f_Y(g(x))}{f_{Y|\hat{X}_n}(g(x),\hat{x}_k)\, f_X(x)} \right) dx.$$

We can now exploit the relationship (8) for the conditional PDF of $Y$ given $\hat{X}_n$, and with (48) we realize that the function under the integral is monotonically increasing in $n$: indeed, for finer partitions it is less likely that any element of the preimage $g^{-1}[g(x)]$ other than $x$ lies in $\hat{\mathcal{X}}_k^{(n)}$; thus $f_{Y|\hat{X}_n}(g(x),\hat{x}_k)$ converges to $\frac{f_{X|\hat{X}_n}(x,\hat{x}_k)}{|\det J_g(x)|}$. This holds for all $k$; thus, invoking the monotone convergence theorem [10, pp. 21] and cancelling the conditional PDFs eliminates the dependence on $k$ and the sum over indicator functions ($\bigcup_k \hat{\mathcal{X}}_k^{(n)} = \mathcal{X}$). Substituting the PDF of $Y$ with (8) completes the proof.

REFERENCES

[1] T. S. Han, "Musing upon information theory," XXXI Shannon Lecture, 2010, presented at IEEE Int. Symp. on Information Theory (ISIT).
[2] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, Oct. 1948.
[3] R. M. Gray, Entropy and Information Theory. New York, NY: Springer, 1990.
[4] M. S. Pinsker, Information and Information Stability of Random Variables and Processes. San Francisco: Holden-Day, 1964.
[5] S. Watanabe and C. T. Abraham, "Loss and recovery of information by coarse observation of stochastic chain," Information and Control, vol. 3, no. 3, pp. 248–278, Sep. 1960.
[6] B. C. Geiger and G. Kubin, "Some results on the information loss in dynamical systems," in Proc. IEEE Int. Symp. Wireless Communication Systems (ISWCS), Aachen, Nov. 2011, accepted; preprint available: arXiv:1106.2404 [cs.IT].
[7] B. C. Geiger, C. Feldbauer, and G. Kubin, "Information loss in static nonlinearities," in Proc. IEEE Int. Symp. Wireless Communication Systems (ISWCS), Aachen, Nov. 2011, accepted; preprint available: arXiv:1102.4794 [cs.IT].
[8] A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes, 4th ed. New York, NY: McGraw-Hill, 2002.
[9] D. Ruelle, "Positivity of entropy production in nonequilibrium statistical mechanics," J. Stat. Phys., vol. 85, pp. 1–23, 1996.
[10] W. Rudin, Real and Complex Analysis, 3rd ed. New York, NY: McGraw-Hill, 1987.
[11] A. C. Verdugo Lazo and P. N. Rathie, "On the entropy of continuous probability distributions," IEEE Transactions on Information Theory, vol. IT-24, pp. 120–122, 1978.
[12] M. Abramowitz and I. A. Stegun, Eds., Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th ed. Dover Publications, 1972.