Unequal Error Protection: An Information Theoretic Perspective

An information theoretic framework for unequal error protection is developed in terms of exponential error bounds. The fundamental difference between bit-wise and message-wise unequal error protection (UEP) is demonstrated for fixed-length block codes on DMCs without feedback. The effect of feedback is investigated via variable-length block codes. It is shown that feedback results in a significant improvement in both bit-wise and message-wise UEP (except the single-message case for missed detection). The distinction between the false-alarm and missed-detection formulations of message-wise UEP is also considered. All results presented are at rates close to capacity.

Authors: Shashi Borade, Baris Nakiboglu, Lizhong Zheng

EECS, Massachusetts Institute of Technology, {spb, nakib, lizhong}@mit.edu

I. INTRODUCTION

The classical theoretical framework for communication [35] assumes that all information is equally important. In this framework, the communication system aims to provide uniform error protection to all messages: any particular message being mistaken for any other is viewed as equally costly. With such uniformity assumptions, the reliability of a communication scheme is measured by either the average or the worst-case probability of error over all possible messages to be transmitted. In the information theory literature, a communication scheme is said to be reliable if this error probability can be made small. Communication schemes designed within this framework turn out to be optimal for sending any source over any channel, provided that long enough codes can be employed. This homogeneous view of information motivates the universal interface of "bits" between any source and any channel [35], and is often viewed as Shannon's most significant contribution.
In many communication scenarios, such as wireless networks, interactive systems, and control applications, uniformly good error protection becomes a luxury; providing such protection to the entire information might be wasteful, if not infeasible. Instead, it is more efficient to protect a crucial part of the information better than the rest. For example,
• In a wireless network, control signals like channel state, power control, and scheduling information are often more important than the payload data, and should be protected more carefully. Thus, even though the final objective is delivering the payload data, the physical layer should provide better protection to such protocol information. Similarly, for the Internet, packet headers are more important for delivering the packet and need better protection to ensure that the actual data gets through.
• Another example is the transmission of a multiple-resolution source code. The coarse resolution needs better protection than the fine resolution, so that the user at least obtains some crude reconstruction after bad noise realizations.
• Controlling unstable plants over a noisy communication link [33] and compressing unstable sources [34] provide more examples where different parts of the information need different reliability.
In contrast with the classical homogeneous view, these examples demonstrate the heterogeneous nature of information. Furthermore, the practical need for unequal error protection (UEP) due to this heterogeneity is the reason why we need to go beyond conventional content-blind information processing.

This research is supported by the DARPA ITMANET project and AFOSR grant FA9550-06-0156. An initial part of this paper was submitted to the IEEE International Symposium on Information Theory, 2008.

Consider a message set M = {1, 2, 3, ..., 2^k} for a block code.
Note that members of this set, i.e. "messages", can also be represented by length-k strings of information bits, b = [b_1, b_2, ..., b_k]. A block code is composed of an encoder, which maps the message M ∈ M into channel inputs, and a decoder, which maps channel outputs to the decoded message M̂ ∈ M. The error event for a block code is {M̂ ≠ M}. In most information theory texts, when an error occurs, the entire bit sequence b is rejected. That is, errors in decoding the message and errors in decoding the information bits are treated similarly. We avoid this, and try to determine what can be achieved by analyzing the errors of different subsets of bits separately. In the existing formulations of unequal error protection codes [38] in coding theory, the information bits are partitioned into subsets, and decoding errors in different subsets of bits are viewed as different kinds of errors. For example, one might want to provide better protection to one subset of bits by ensuring that errors in those bits are less probable than in the other bits. We call such problems "bit-wise UEP". The previous examples of packet headers, multiple-resolution codes, etc. belong to this category of UEP. However, in some situations, instead of bits, one might want to provide better protection to a subset of messages. For example, one might consider embedding a special message in a normal k-bit code, i.e., transmitting one of 2^k + 1 messages, where the extra message has a special meaning and requires a smaller error probability. Note that the error event for the special message is not associated with an error in any particular bit or set of bits. Instead, it corresponds to a particular bit-sequence (i.e. message) being decoded as some other bit-sequence. Borrowing from hypothesis testing, we can define two kinds of errors corresponding to a special message.
• Missed detection of a message i occurs when the transmitted message M is i and the decoded message M̂ is some other message j ≠ i. Consider a special message indicating some system emergency which is too costly to miss. Clearly, such special messages demand a small missed-detection probability. The missed-detection probability of a message is simply its conditional error probability given its transmission.
• False alarm of a message i occurs when the transmitted message M is some other message j ≠ i and the decoded message M̂ is i. Consider the reboot message for a remote-controlled system such as a robot or a satellite, or the "disconnect" message to a cell phone. Its false alarm could cause unnecessary shutdowns and other system troubles. Such special messages demand a small false-alarm probability.
We call such problems "message-wise UEP". In the conventional framework, every bit is as important as every other bit, and every message is as important as every other message. In short, the conventional framework assumes that all information is "created equal". In such a framework there is no reason to distinguish between bit-wise and message-wise error probabilities, because the message-wise error probability exceeds the bit-wise error probability only by an insignificant factor in terms of exponents. However, in the UEP setting, it is necessary to differentiate between message errors and bit errors. We will see that in many situations, the error probabilities of special bits and special messages behave very differently. The main contribution of this paper is a set of results identifying the performance limits and optimal coding strategies for a variety of UEP scenarios. We focus on a few simplified notions of UEP, most with immediate practical applications, and try to illustrate the main insights for them.
One can imagine using these UEP strategies for embedding protocol information within the actual data. By eliminating a separate control channel, this can enhance the overall bandwidth and/or energy efficiency. For conceptual clarity, this article focuses exclusively on situations where the data rate is essentially equal to the channel capacity. These situations can be motivated by scenarios where the data rate is a crucial system resource that cannot be compromised. In these situations, no positive error exponent in the conventional sense can be achieved. That is, if we aim to protect the entire information uniformly well, neither bit-wise nor message-wise error probabilities can decay exponentially fast with increasing code length. We then ask the question: "can we make the error probability of a particular bit, or a particular message, decay exponentially fast with block length?" When we break away from the conventional framework and start to provide better protection against certain kinds of errors, there is no reason to restrict ourselves by assuming that those errors are erroneous decodings of some particular bits, or missed detections or false alarms associated with some particular messages. A general formulation of UEP could be an arbitrary combination of protection demands against some specific kinds of errors. In this general definition of UEP, bit-wise UEP and message-wise UEP are simply two particular ways of specifying which kinds of errors are too costly compared to others. In the following, we start by specifying the channel model and giving some basic definitions in Section II. Then in Section III we discuss bit-wise UEP and message-wise UEP for block codes without feedback. Theorem 1 shows that at data rates approaching capacity, even a single bit cannot achieve any positive error exponent.
Thus in bit-wise UEP, the data rate must back off from capacity to achieve any positive error exponent, even for a single bit. On the contrary, in message-wise UEP, positive error exponents can be achieved even at capacity. We first consider the case when there is only one special message and show, in Theorem 2, that the optimal (missed-detection) error exponent for the special message is equal to the red-alert exponent, which is defined in Section III-B. We then consider situations where an exponentially large number of messages are special and each special message demands a positive (missed-detection) error exponent. (This situation has been analyzed before in [12], and a result closely related to ours has been reported there.) Theorem 3 shows the surprising result that these special messages can achieve the same exponent as if all the other (non-special) messages were absent. In other words, a capacity-achieving code and an error-exponent-optimal code below capacity can coexist without hurting each other. These results also shed some new light on the structure of capacity-achieving codes. Insights from the block codes without feedback become useful in Section IV, where we investigate similar problems for variable-length block codes with feedback. Feedback together with variable decoding time creates some fundamental connections between bit-wise UEP and message-wise UEP. Now even for bit-wise UEP, a positive error exponent can be achieved at capacity. Theorem 5 shows that a single special bit can achieve the same exponent as a single special message, namely the red-alert exponent. As the number of special bits increases, the achievable exponent for them decays linearly with their rate, as shown in Theorem 6. Theorem 7 then generalizes this result to the case when there are multiple levels of specialty: most special, second-most special, and so on.
It uses a strategy similar to onion peeling and achieves error exponents which are successively refinable over multiple layers. For the single special message case, however, Theorem 8 shows that feedback does not improve the optimal missed-detection exponent. The case of exponentially many messages is resolved in Theorem 9. Evidently, many special messages cannot achieve an exponent higher than that of a single special message, i.e. the red-alert exponent. However, it turns out that the special messages can reach the red-alert exponent at rates below a certain threshold, as if all the other special messages were absent. Furthermore, at rates above that same threshold, the special messages reach the corresponding value of Burnashev's exponent, as if all the ordinary messages were absent. Section V then addresses message-wise UEP situations where special messages demand a small probability of false alarm instead of missed detection. It considers the case of fixed-length block codes without feedback as well as variable-length block codes with feedback. This discussion of false alarms was postponed from earlier sections to avoid confusion with the missed-detection results there. Some future directions are discussed briefly in Section VI. After discussing each theorem, we will provide a brief description of the optimal strategy, but refrain from detailed technical discussions. Proofs can be found in later sections. In Sections VII and VIII we present the proofs of the results in Section III, on block codes without feedback, and Section IV, on variable-length block codes with feedback, respectively. Lastly, in Section IX we discuss the proofs for the false-alarm results of Section V. Before going into the presentation of our work, let us give a very brief overview of the previous work on the problem in different fields.

A. Previous Work and Contribution

The simplest method of unequal error protection is to allocate different channels for different types of data. For example, many wireless systems allocate a separate "control channel", often with short codes of low rate and low spectral efficiency, to transmit control signals with high reliability. The well-known Gray code, assigning similar bit strings to nearby constellation points, can be viewed as UEP: even if there is some error in identifying the transmitted symbol, there is a good chance that some of the bits are correctly received. But clearly this approach is far from addressing the problem in any effective way. The first systematic consideration of the problem in coding theory was within the framework of linear codes. In [24], Masnick and Wolf suggested techniques which protect different parts (bits) of the message against different numbers of channel errors (channel symbol conversions). This framework has been extensively studied over the years in [22], [16], [7], [26], [21], [27], [8] and many others. The issue was later addressed within the framework of low-density parity-check (LDPC) codes as well [39], [29], [30], [32], [31], and [28]. "Priority encoded transmission" (PET) was suggested by Albanese et al. [2] as an alternative model of the problem; in this approach, guarantees are given not in terms of channel errors but in terms of packet erasures. Coding and modulation issues are addressed simultaneously in [10]. For wireless channels, [15] analyzes this problem in terms of diversity-multiplexing trade-offs. In contrast with the above-mentioned work, we pose and address the problem within the information theoretic framework. We work with the error probabilities and refrain from making assumptions about the particular block code used while proving our converse results.
This is the main difference between our approach and the prevailing approach within the coding theory community. In [3], Bassalygo et al. considered error correcting codes whose messages are composed of two groups of bits, each requiring a different level of protection against channel errors, and provided inner and outer bounds on the achievable performance in terms of Hamming distances and rates. Unlike other works within the coding theory framework, they do not make any assumption about the code. Thus their results can indeed be reinterpreted in our framework as a result for bit-wise UEP on binary symmetric channels. Some of the UEP problems have already been investigated within the framework of information theory as well. Csiszár studied message-wise UEP with many messages in [12]. Moreover, the results in [12] are not restricted to rates close to capacity, unlike ours. Message-wise UEP with a single special message was dealt with in [23] by Kudryashov, where a UEP code with a single special message is used as a subcode within a variable-delay communication scheme. The scheme proposed in [23] for the single special message case is a key building block in many of the results in Section IV. However, the optimality of the scheme was not proved in [23]. We show that it is indeed optimal. The main contribution of the current work is the proposed framework for UEP problems within information theory. In addition to the particular results presented on different problems and the contrasts demonstrated between different scenarios, we believe the proof techniques used in Subsections¹ VII-A, VIII-B.2 and VIII-D.2 are novel and promising for future work in the field.

II. CHANNEL MODEL AND NOTATION

A. DMCs and Block Codes

We consider a discrete memoryless channel (DMC) W_{Y|X}, with input alphabet X = {1, 2, ..., |X|} and output alphabet Y = {1, 2, ..., |Y|}. The conditional distribution of the output letter Y when the channel input letter X equals i ∈ X is denoted by W_{Y|X}(·|i):

Pr[Y = j | X = i] = W_{Y|X}(j|i)   ∀i ∈ X, ∀j ∈ Y.

We assume that all entries of the channel transition matrix are positive, that is, every output letter is reachable from every input letter. This assumption is a crucial one: many of the results we present in this paper change when there are zero-probability transitions. A length-n block code without feedback with message set M = {1, 2, ..., |M|} is composed of two mappings, the encoder mapping and the decoder mapping. The encoder mapping assigns a length-n codeword²

x̄ⁿ(k) ≜ (x̄_1(k), x̄_2(k), ..., x̄_n(k))   ∀k ∈ M,

where x̄_t(k) denotes the input at time t for message k. The decoder mapping M̂ assigns a message to each possible channel output sequence, i.e. M̂ : Yⁿ → M. At time zero, the transmitter is given the message M, which is chosen from M according to a uniform distribution. In the following n time units, it sends the corresponding codeword. After observing Yⁿ, the receiver decodes a message. The error probability P_e and rate R of the code are given by

P_e ≜ Pr[M̂ ≠ M]   and   R ≜ (ln |M|)/n.

¹ The key idea in Subsection VIII-B.2 is a generalization of the approach presented in [4].
² Unless mentioned otherwise, lowercase letters (e.g. x) denote a particular value of the corresponding random variable denoted by a capital letter (e.g. X).

B. Different Kinds of Errors

While discussing message-wise UEP, we consider the conditional error probability for a particular message i ∈ M,

Pr[M̂ ≠ i | M = i].

Recall that this is the same as the missed-detection probability for message i.
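To make these definitions concrete, here is a small numerical sketch (in Python with NumPy; the channel, codebook, and ML decoder are illustrative choices, not taken from the paper). It models a binary symmetric channel as a DMC W_{Y|X}, uses a length-3 repetition codebook for two messages, and computes the exact error probability P_e = Pr[M̂ ≠ M] and the rate R = (ln |M|)/n by enumerating all output sequences:

```python
import itertools
import math
import numpy as np

# Illustrative DMC: a binary symmetric channel with crossover probability 0.1.
# Row i of W is the output distribution W_{Y|X}(.|i); each row must sum to 1.
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
assert np.allclose(W.sum(axis=1), 1.0)

n = 3
codebook = {0: (0, 0, 0), 1: (1, 1, 1)}   # repetition code, |M| = 2

def seq_prob(y, x):
    """Probability of output sequence y given input codeword x (memoryless)."""
    return math.prod(W[xi, yi] for xi, yi in zip(x, y))

def ml_decode(y):
    """Maximum-likelihood decoder over the codebook."""
    return max(codebook, key=lambda m: seq_prob(y, codebook[m]))

# Exact average error probability P_e = Pr[Mhat != M] for uniform M,
# obtained by summing channel probabilities of all misdecoded outputs.
P_e = sum(seq_prob(y, codebook[m])
          for m in codebook
          for y in itertools.product(range(2), repeat=n)
          if ml_decode(y) != m) / len(codebook)

R = math.log(len(codebook)) / n           # rate in nats per channel use
print(P_e, R)
```

For this toy code, ML decoding reduces to majority voting, so P_e equals the probability of two or more bit flips, 3(0.1)²(0.9) + (0.1)³ = 0.028.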
On the other hand, when we are talking about bit-wise UEP, we consider message sets of the form M = M_1 × M_2. In such cases the message M is composed of two submessages, M = (M_1, M_2). The first submessage M_1 corresponds to the high-priority bits, while the second submessage M_2 corresponds to the low-priority bits. The uniform choice of M from M implies the uniform and independent choice of M_1 and M_2 from M_1 and M_2, respectively. The error probability of a submessage M_j is given by

Pr[M̂_j ≠ M_j],   j = 1, 2.

Note that the overall message M is decoded incorrectly when either M_1 or M_2 or both are decoded incorrectly. The goal of bit-wise UEP is to achieve the best possible Pr[M̂_1 ≠ M_1] while ensuring a reasonably small P_e = Pr[M̂ ≠ M].

C. Reliable Code Sequences

We focus on systems where reliable communication is achieved, in order to find exponentially tight bounds on the error probabilities of special parts of the information. We use the notion of code sequences to simplify our discussion. A sequence of codes indexed by their block lengths is called reliable if

lim_{n→∞} P_e(n) = 0.

For any reliable code sequence Q, the rate R_Q is given by

R_Q ≜ liminf_{n→∞} (ln |M(n)|)/n.

The (conventional) error exponent of a reliable sequence is then

E_Q ≜ liminf_{n→∞} −(ln P_e(n))/n.

Thus the number of messages in Q is³ ≐ e^{n R_Q} and their average error probability decays like e^{−n E_Q} with block length. Now we can define the error exponent E(R) in the conventional sense, which is equivalent to the ones given in [20], [36], [13], [17], [25].

³ The ≐ sign denotes equality in the exponential sense: for a sequence a(n), a(n) ≐ e^{nF} ⇔ F = liminf_{n→∞} (ln a(n))/n.

Definition 1: For any R ≤ C the error exponent E(R) is defined as

E(R) ≜ sup_{Q : R_Q ≥ R} E_Q.

As mentioned previously, we are interested in UEP when operating at capacity.
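A quick numerical illustration of these exponent definitions (a sketch under assumed parameters, not an analysis from the paper): for the sequence of odd-length repetition codes over a BSC with crossover p = 0.1, the rate tends to zero but P_e(n) decays exponentially, and the empirical exponent −(ln P_e(n))/n can be computed exactly from the binomial tail:

```python
import math

p = 0.1  # assumed BSC crossover probability

def P_e(n):
    """Exact error probability of the length-n repetition code under
    majority decoding (n odd): probability of more than n/2 bit flips."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Empirical exponents -ln P_e(n) / n for growing odd block lengths.
exponents = {n: -math.log(P_e(n)) / n for n in (5, 11, 21, 41)}

# The limiting exponent is the binomial large-deviation rate
# 0.5 * ln(1 / (4 p (1-p))), about 0.511 nats here; the finite-n
# values approach it from above as the polynomial prefactor fades.
limit = 0.5 * math.log(1 / (4 * p * (1 - p)))
print(exponents, limit)
```

This matches the definition of E_Q: P_e(n) ≐ e^{−n E_Q} with E_Q ≈ 0.511 nats for this (zero-rate) sequence, while at rates approaching C the exponent collapses to zero, as discussed next.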
We already know [36] that E(C) = 0, i.e. the overall error probability cannot decay exponentially at capacity. In the following sections, we show how certain parts of the information can still achieve a positive exponent at capacity. In doing so, we focus only on the reliable sequences whose rates are equal to C. We call such reliable code sequences capacity-achieving sequences. Throughout the text we denote the Kullback-Leibler (KL) divergence between two distributions α_X(·) and β_X(·) by D(α_X(·) ‖ β_X(·)):

D(α_X(·) ‖ β_X(·)) = Σ_{i∈X} α_X(i) ln (α_X(i)/β_X(i)).

Similarly, the conditional KL divergence between V_{Y|X}(·|·) and W_{Y|X}(·|·) under P_X(·) is given by

D(V_{Y|X}(·|X) ‖ W_{Y|X}(·|X) | P_X) = Σ_{i∈X} P_X(i) Σ_{j∈Y} V_{Y|X}(j|i) ln (V_{Y|X}(j|i)/W_{Y|X}(j|i)).

The output distribution that achieves the capacity is denoted by P*_Y, and a corresponding input distribution is denoted by P*_X.

III. UEP AT CAPACITY: BLOCK CODES WITHOUT FEEDBACK

A. Special Bit

We first address the situation where one particular bit (say the first) out of the total log_2 |M| bits is a special bit: it needs much better error protection than the overall information. The error probability of the special bit is required to decay as fast as possible while ensuring reliable communication at capacity for the overall code. The single special bit is denoted by M_1, where M_1 = {0, 1} and the overall message M is of the form M = (M_1, M_2) with M = M_1 × M_2. The optimal error exponent E_b for the special bit is then defined as follows.⁴

Definition 2: For a capacity-achieving sequence Q with message sets M(n) = M_1 × M_2(n), where M_1 = {0, 1}, the special bit error exponent is defined as

E_{b,Q} ≜ liminf_{n→∞} −(ln Pr^(n)[M̂_1 ≠ M_1])/n.

Then E_b is defined as E_b ≜ sup_Q E_{b,Q}.
Thus if Pr^(n)[M̂_1 ≠ M_1] ≐ exp(−n E_{b,Q}) for a reliable sequence Q, then E_b is the supremum of E_{b,Q} over all capacity-achieving Q's. Since E(C) = 0, it is clear that the entire information cannot achieve any positive error exponent at capacity. However, it is not clear whether a single special bit can steal a positive error exponent E_b at capacity.

Theorem 1: E_b = 0.

This implies that if we want the error probability of the messages to vanish with increasing block length, and the error probability of at least one of the bits to decay with a positive exponent, then the rate of the code sequence must be strictly smaller than the capacity. The proof of the theorem is heavy in calculations, but the main idea behind it is the "blowing-up lemma" [13]. Conventionally, this lemma is used only for strong converses of various capacity theorems. It is also worth mentioning that conventional converse techniques such as Fano's inequality are not sufficient to prove this result.

⁴ Appendix A discusses a different but equivalent type of definition and shows why it is equivalent to this one. These two types of definitions are equivalent for all the UEP exponents discussed in this paper.

Fig. 1. Splitting the output space into two distant enough clusters (M̂_1 = 1 on the left, M̂_1 = 0 on the right).

Intuitive interpretation: Let the shaded balls in Fig. 1 denote the minimal decoding regions of the messages. These decoding regions ensure reliable communication; they are essentially the typical noise balls [11] around the codewords. The decoding regions on the left of the thick line correspond to M̂_1 = 1 and those on the right to M̂_1 = 0. Each of these halves includes half of the decoding regions. Intuitively, the blowing-up lemma implies that if we try to add slight extra thickness to the left clusters in Fig. 1, they blow up to occupy almost all the output space.
This strange phenomenon in high-dimensional spaces leaves no room for the right cluster to fit. The infeasibility of adding even slight extra thickness implies a zero error exponent for the special bit.

B. Special Message

Now consider situations where one particular message (say M = 1) out of the ≐ e^{nC} total messages is a special message: it needs superior error protection. The missed-detection probability for this 'emergency' message needs to be minimized. The best missed-detection exponent E_md is defined as follows.⁵

Definition 3: For a capacity-achieving sequence Q, the missed-detection exponent is defined as

E_{md,Q} ≜ liminf_{n→∞} −(ln Pr^(n)[M̂ ≠ 1 | M = 1])/n.

Then E_md is defined as E_md ≜ sup_Q E_{md,Q}.

Compare this with the situation where we aim to protect all the messages uniformly well. If all the messages demand an equally good missed-detection exponent, then no positive exponent is achievable at capacity; this follows from the earlier discussion of E(C) = 0. The theorem below shows the improvement in this exponent if we demand it only for a single message instead of all.

Definition 4: The parameter C̃ is defined⁶ as the red-alert exponent of a channel:

C̃ ≜ max_{i∈X} D(P*_Y(·) ‖ W_{Y|X}(·|i)).

We will denote the input letter achieving the above maximum by x_r.

Theorem 2: E_md = C̃.

⁵ Note that the definition obtained by replacing Pr^(n)[M̂ ≠ 1 | M = 1] by min_j Pr^(n)[M̂ ≠ j | M = j] is equivalent to the one given above, since we are taking the supremum over Q anyway. In short, the message j with the smallest conditional error probability could always be relabeled as message 1.
⁶ The authors would like to thank Krishnan Eswaran of UC Berkeley for suggesting this name.

Recall that the Karush-Kuhn-Tucker (KKT) conditions for achieving capacity imply the following expression for the capacity [20, Theorem 4.5.1]:
C = max_{i∈X} D(W_{Y|X}(·|i) ‖ P*_Y(·)).

Note that simply switching the arguments of the KL divergence within the maximization for C gives us the expression for C̃. The capacity C represents the best possible data rate over a channel, whereas the red-alert exponent C̃ represents the best possible protection achievable for a message at capacity. It is worth mentioning here the "very noisy" channel of [20]. In that formulation [6], the KL divergence is symmetric, which implies

D(P*_Y(·) ‖ W_{Y|X}(·|i)) ≈ D(W_{Y|X}(·|i) ‖ P*_Y(·)).

Hence the red-alert exponent and the capacity become roughly equal. For a symmetric channel like the BSC, every input letter can be used as x_r. Since P*_Y is the uniform distribution for these channels, C̃ = D(P*_Y(·) ‖ W_{Y|X}(·|i)) for any input letter i. This also happens to be the sphere-packing exponent E_sp(0) of the channel [36] at rate 0.

Optimal strategy: The codewords of a capacity-achieving code are used for the ordinary messages. The codeword for the special message is a repetition sequence of the input letter x_r. The special message is decoded for all output sequences except those whose empirical distribution (type) is approximately equal to P*_Y; for those output sequences, the decoding scheme of the original capacity-achieving code is used. Indeed, Kudryashov [23] had already suggested the encoding scheme described above as a subcode for his non-block variable-delay coding scheme. However, the discussion in [23] does not make any claims about the optimality of this encoding scheme.

Intuitive interpretation: Having a large missed-detection exponent for the special message corresponds to having a large decoding region for the special message. This ensures that when M = 1, i.e. when the special message is transmitted, the probability of M̂ ≠ 1 is exponentially small. In a sense, E_md indicates how large the decoding region of the special message can be made while still fitting ≐ e^{nC} typical noise balls in the remaining space. The red region in Fig. 2 denotes such a large region. Note that the actual decoding region of the special message is much larger than in this illustration, because it consists of all output types except the ones close to P*_Y, whereas the ordinary decoding regions contain only the output types close to P*_Y.

Fig. 2. Avoiding missed detection.

The utility of this result is twofold: first, the optimality of such a simple scheme was not obvious before; second, as we will see later, protecting a single special message is a key building block for many other problems when feedback is available.

C. Many Special Messages

Now consider the case when, instead of a single special message, exponentially many of the total ≐ e^{nC} messages are special. Let M_s(n) ⊆ M(n) denote this set of special messages, M_s(n) = {1, 2, ..., ⌈e^{nr}⌉}. The best missed-detection exponent achievable simultaneously for all of the special messages is denoted by E_md(r).

Definition 5: For a capacity-achieving sequence Q, the missed-detection exponent achieved on a sequence of subsets M_s is defined as

E_{md,Q,M_s} ≜ liminf_{n→∞} −(ln max_{i∈M_s(n)} Pr^(n)[M̂ ≠ i | M = i])/n.

Then for a given r < C, E_md(r) is defined as

E_md(r) ≜ sup_{Q,M_s} E_{md,Q,M_s},

where the maximization is over M_s's such that liminf_{n→∞} (ln |M_s(n)|)/n = r. This message-wise UEP problem has already been investigated by Csiszár in his paper on joint source-channel coding [12]. His analysis allows for multiple sets of special messages, each with its own rate, and an overall rate that can be smaller than the capacity.⁷
Essentially, E_md(r) is the best value for which the missed-detection probability of every special message is ≐ exp(−n E_md(r)) or smaller. Note that if the only messages in the code are these ⌈e^{nr}⌉ special messages (instead of |M(n)| ≐ e^{nC} total messages), their best missed-detection exponent equals the classical error exponent E(r) discussed earlier.

Theorem 3: E_md(r) = E(r) ∀r ∈ [0, C).

Thus we can communicate reliably at capacity and still protect the special messages as if we were communicating only the special messages. Note that the classical error exponent E(r) is still unknown for rates below the critical rate (except at zero rate). Nonetheless, this theorem says that whatever E(r) can be achieved by ⌈e^{nr}⌉ messages when they are by themselves in the codebook can still be achieved when there are ≐ e^{nC} additional ordinary messages requiring reliable communication.

Optimal strategy: Start with an optimal codebook for ⌈e^{nr}⌉ messages which achieves the error exponent E(r). These codewords are used for the special messages. The ordinary codewords are then added using random coding. The ordinary codewords which land close to a special codeword may be discarded without any essential effect on the rate of communication. The decoder uses a two-stage decoding rule, in the first stage of which it decides whether or not a special message was sent: if the received sequence is close to one or more of the special codewords, the receiver decides that a special message was sent; otherwise it decides that an ordinary message was sent. In the second stage, the receiver employs ML decoding either among the ordinary messages or among the special messages, depending on its decision in the first stage. The overall missed-detection exponent E_md(r) is bottlenecked by the second-stage errors.
This is because the first-stage error exponent is essentially the sphere-packing exponent E_sp(r), which is never smaller than the second-stage error exponent E(r).

Intuitive interpretation: This means that we can start with a code of ⌈e^{nr}⌉ messages, where the decoding regions are large enough to provide a missed-detection exponent of E(r). Consider the balls around each codeword with sphere-packing radius (see Fig. 3(a)). For each message, the probability of going outside its ball decays exponentially with the sphere-packing exponent. Although these ⌈e^{nr}⌉ balls fill up most of the output space, there are still some cavities left between them. These small cavities can still accommodate ≐ e^{nC} typical noise balls for the ordinary messages (see Fig. 3(b)), which are much smaller than the original ⌈e^{nr}⌉ balls. This is analogous to filling sand particles into a box full of large boulders: the theorem says that the number of sand particles remains unaffected (in terms of the exponent) in spite of the large boulders.

Fig. 3. "There is always room for capacity": (a) exponent-optimal code; (b) achieving capacity.

⁷ The authors would like to thank Pulkit Grover of UC Berkeley for pointing out this closely related work [12].

D. Allowing erasures

In some situations a decoder may be allowed to declare an erasure when it is not sure about the transmitted message. These erasure events are not counted as errors and are usually followed by a retransmission using a decision-feedback protocol like hybrid ARQ. This subsection extends the earlier result for E_md(r) to the case when such erasures are allowed. In decoding with erasures, in addition to the message set M, the decoder can map the received sequence Y^n to a virtual message called "erasure". Let P_erasure denote the average erasure probability of a code.
P_erasure = Pr[M̂ = erasure].

Previously, when there were no erasures, errors were never detected. In errors-and-erasures decoding, erasures are detected errors; the remaining errors are undetected errors, and P_e denotes the undetected error probability. Thus the average and conditional (undetected) error probabilities are given by

P_e = Pr[M̂ ≠ M, M̂ ≠ erasure]  and  P_e(i) = Pr[M̂ ≠ M, M̂ ≠ erasure | M = i].

An infinite sequence Q of block codes with errors-and-erasures decoding is reliable if its average error probability and average erasure probability both vanish with n:

lim_{n→∞} P_e^(n) = 0  and  lim_{n→∞} P_erasure^(n) = 0.

If the erasure probability is small, then the average number of retransmissions needed is also small. Hence the condition of vanishingly small P_erasure^(n) ensures that the effective data rate of a decision-feedback protocol remains unchanged in spite of retransmissions. We again restrict ourselves to reliable sequences whose rate equals C. We could redefine all previous exponents for decision-feedback (df) scenarios, i.e., for reliable codes with erasure decoding. But the resulting exponents do not change with the provision of erasures of vanishing probability for the single-bit or single-message problems; i.e., decision-feedback protocols such as hybrid ARQ do not improve E_b or E_md. Thus we only discuss the decision-feedback version of E_md(r).

Definition 6: For a capacity-achieving sequence with erasures, Q, the missed-detection exponent achieved on a sequence of subsets M_s is defined as

E^df_md,Q,M_s ≜ lim inf_{n→∞} −(1/n) ln max_{i∈M_s^(n)} Pr^(n)[M̂ ≠ i, M̂ ≠ erasure | M = i].

Then for a given r < C, E^df_md(r) is defined as

E^df_md(r) ≜ sup_{Q,M_s} E^df_md,Q,M_s,

where the maximization is over M_s's such that lim inf_{n→∞} (1/n) ln |M_s^(n)| = r.
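The distinction between detected (erased) and undetected errors can be made concrete with a toy Monte Carlo sketch (our own illustration, not from the paper): a length-15 repetition code over a BSC, where the decoder declares an erasure whenever the majority-vote margin is small. The parameters n, p, the margin threshold t, and the trial count are all arbitrary choices.

```python
import random

random.seed(0)
n, p, t, trials = 15, 0.2, 3, 20000
ml_errors = undetected = erasures = 0
for _ in range(trials):
    m = random.randrange(2)  # message bit, sent n times over BSC(p)
    # count received ones: sent bit m, each use flipped with probability p
    ones = sum((random.random() < p) != (m == 1) for _ in range(n))
    decision = 1 if ones > n // 2 else 0  # ML (majority) decision; n odd, no ties
    margin = abs(2 * ones - n)            # vote margin
    wrong = decision != m
    if wrong:
        ml_errors += 1                    # errors of the plain ML decoder
    if margin <= 2 * t:
        erasures += 1                     # low-confidence block: detected error/erasure
    elif wrong:
        undetected += 1                   # error that survives the erasure rule
```

By construction the erasure rule can only convert ML errors into detected events, so the undetected error count never exceeds the plain ML error count; this is the mechanism that lets a decision-feedback protocol trade a small erasure probability for a much smaller undetected error probability.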
The next theorem shows that allowing erasures increases the missed-detection exponent for r below the critical rate, on symmetric channels.

Theorem 4: For symmetric channels, E^df_md(r) ≥ E_sp(r) for all r ∈ [0, C).

The coding strategy is similar to the no-erasure case. We first start with an erasure code for ⌈e^{nr}⌉ messages like the one in [18], and then add randomly generated ordinary codewords to it. Again a two-stage decoding is performed, where the first stage decides between the set of ordinary codewords and the set of special codewords using a threshold distance. If this first stage chooses the special codewords, the second stage applies the decoding rule of [18] among the special codewords; otherwise, the second stage uses ML decoding among the ordinary codewords. The overall missed-detection exponent E^df_md(r) is bottlenecked by the first-stage errors, because the first-stage error exponent E_sp(r) is smaller than the second-stage error exponent E_sp(r) + C − r. This is in contrast with the case without erasures.

IV. UEP AT CAPACITY: VARIABLE LENGTH BLOCK CODES WITH FEEDBACK

In the last section we analyzed bit-wise and message-wise UEP problems for fixed-length block codes (without feedback) operating at capacity. In this section we revisit the same problems for variable-length block codes with perfect feedback, operating at capacity. Before going into the discussion of the problems, let us briefly recall variable-length block codes with feedback. A variable-length block code with feedback is composed of a coding algorithm and a decoding rule. The decoding rule determines the decoding time and the message that is decoded at that time. The possible observations of the receiver can be seen as leaves of a |Y|-ary tree, as in [4]. In this tree, all nodes at depth 1 from the root denote the |Y| possible outputs at time t = 1.
All non-leaf nodes among these split into a further |Y| branches at the next time t = 2, and the branching of the non-leaf nodes continues like this ever after. Each node of depth t in this tree corresponds to a particular sequence y^t, i.e., a history of outputs up to time t. The parent of node y^t is its prefix y^{t−1}. The leaves of this tree form a prefix-free source code, because the decision to stop for decoding has to be a causal event. In other words, the event {τ = t} should be measurable in the σ-field generated by Y^t. In addition we have Pr[τ < ∞] = 1; thus the decoding time τ is a Markov stopping time with respect to the receiver's observation. The coding algorithm, on the other hand, assigns an input letter X_{t+1}(y^t; i) to each message i ∈ M at each non-leaf node y^t of this tree. The encoder stops transmission of a message when a leaf is reached, i.e., when the decoding is complete. The codes we consider are block codes in the sense that transmission of each message (packet) starts only after the transmission of the previous one ends. The error probability and rate of the code are simply given by

P_e = Pr[M̂ ≠ M]  and  R = ln|M| / E[τ].

A more thorough discussion of variable-length block codes with feedback can be found in [9] and [4]. The earlier discussion in Section II-B about different kinds of errors is still valid as is, but we need to slightly modify our discussion of reliable sequences. A reliable sequence of variable-length block codes with feedback, Q, is any countably infinite collection of codes indexed by integers such that

lim_{k→∞} P_e^(k) = 0.

In the rate and exponent definitions for reliable sequences, we replace the block length n by the expected decoding time E[τ].
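The claim that the leaves of the output tree form a prefix-free set can be checked mechanically for a simple stopping rule. The sketch below (our own sanity check, not from the paper) takes a binary output alphabet and the rule "stop at the first output 1, or at depth L", enumerates the leaves, and verifies prefix-freeness together with a Kraft sum of 1, the combinatorial counterpart of Pr[τ < ∞] = 1 for the truncated tree. The depth limit L is arbitrary.

```python
# Leaves for the stopping rule "stop at the first 1, or at depth L":
# "1", "01", "001", ..., plus the all-zeros string of length L.
L = 12
leaves = ["0" * d + "1" for d in range(L)] + ["0" * L]

# Prefix-freeness: no leaf is a prefix of a different leaf,
# i.e. no decoding time is a strict continuation of another.
prefix_free = all(
    not b.startswith(a) for a in leaves for b in leaves if a != b
)

# Kraft sum for a binary (|Y| = 2) output alphabet: should be exactly 1,
# since every infinite output path hits exactly one leaf.
kraft = sum(2.0 ** -len(leaf) for leaf in leaves)
```

Any other causal stopping rule would pass the same two checks; a rule that peeked at future outputs would produce overlapping "leaves" and fail the prefix test.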
A capacity-achieving sequence with feedback is then a reliable sequence of variable-length block codes with feedback whose rate is C. It is worth noting the importance of our assumption that all the entries of the transition probability matrix W_Y|X are positive. For any channel with a W_Y|X that has one or more zero-probability transitions, it is possible to have error-free codes operating at capacity [9]. Thus all the exponents discussed below are infinite for DMCs with one or more zero-probability transitions.

A. Special bit

Let us consider a capacity-achieving sequence Q whose message sets are of the form M^(k) = M_1 × M_2^(k), where M_1 = {0, 1}. Then the error exponent of M_1, i.e., the initial bit, is defined as follows.

Definition 7: For a capacity-achieving sequence with feedback, Q, with message sets M^(k) of the form M^(k) = M_1 × M_2^(k) where M_1 = {0, 1}, the special-bit error exponent is defined as

E^f_b,Q ≜ lim inf_{k→∞} −ln Pr^(k)[M̂_1 ≠ M_1] / E[τ^(k)].

Then E^f_b is defined as E^f_b ≜ sup_Q E^f_b,Q.

Theorem 5: E^f_b = C̃.

Recall that without feedback, even a single bit could not achieve any positive error exponent at capacity (Theorem 1). But feedback, together with variable decoding time, connects message-wise UEP and bit-wise UEP and results in a positive exponent for bit-wise UEP. The strategy described below shows how schemes for protecting a special message can be used to protect a special bit.

Optimal strategy: We use a length-(k + √k) fixed-length block code with errors-and-erasures decoding as a building block for our code. The transmitter first transmits M_1 using a short repetition code of length √k. If the tentative decision M̃_1 about M_1 is correct after this repetition code, the transmitter sends M_2 with a length-k capacity-achieving code.
If M̃_1 is incorrect after the repetition code, the transmitter sends the symbol x_r for k time units, where x_r is the input letter i maximizing D(P*_Y(·) ‖ W_Y|X(·|i)). If the output sequence in the second phase, Y^{√k+k}_{√k+1}, is not a typical sequence of P*_Y, an erasure is declared for the block, and the same message is retransmitted by repeating the same strategy afresh. Otherwise the receiver uses an ML decoder to choose M̂_2, and M̂ = (M̃_1, M̂_2). The erasure probability is vanishingly small; as a result, the undetected error probability of M_i in the fixed-length erasure code is approximately equal to the error probability of M_i in the variable-length block code. Furthermore, E[τ] is roughly (k + √k) despite the retransmissions. A decoding error for M_1 happens only when M̃_1 ≠ M_1 and the empirical distribution of the output sequence in the second phase is close to P*_Y; the latter event happens with probability ≐ e^{−C̃ E[τ]}.

B. Many special bits

We now analyze the situation where, instead of a single special bit, there are approximately E[τ] r / ln 2 special bits out of the total of approximately E[τ] C / ln 2 bits. Hence we consider capacity-achieving sequences with feedback having message sets of the form M^(k) = M_1^(k) × M_2^(k). Unlike the previous subsection, where the size of M_1^(k) was fixed, we now allow its size to vary with the index of the code. We restrict ourselves to the cases where lim inf_{k→∞} ln|M_1^(k)| / E[τ^(k)] = r. This limit gives the rate of the special bits. It is worth noting at this point that even when the rate r of the special bits is zero, the number of special bits need not be bounded, i.e., lim inf_{k→∞} |M_1^(k)| might be infinite.
The error exponent E^f_bits,Q at a given rate r of special bits is defined as follows.

Definition 8: For any capacity-achieving sequence with feedback, Q, with message sets M^(k) of the form M^(k) = M_1^(k) × M_2^(k), r_Q and E^f_bits,Q are defined as

r_Q ≜ lim inf_{k→∞} ln|M_1^(k)| / E[τ^(k)],   E^f_bits,Q ≜ lim inf_{k→∞} −ln Pr^(k)[M̂_1 ≠ M_1] / E[τ^(k)].

Then E^f_bits(r) is defined as E^f_bits(r) ≜ sup_{Q: r_Q ≥ r} E^f_bits,Q.

The next theorem shows that this exponent decays linearly with the rate r of the special bits.

Theorem 6: E^f_bits(r) = (1 − r/C) C̃.

Notice that the exponent E^f_bits(0) = C̃, i.e., it is as high as the exponent in the single-bit case, in spite of the fact that here the number of special bits can grow to infinity with E[τ]. This linear trade-off between rate and reliability is reminiscent of Burnashev's result [9].

Optimal strategy: As in the single-bit case, we use a fixed-length block code with erasures as our building block. First the transmitter sends M_1 using a capacity-achieving code of length (r/C)k. If the tentative decision M̃_1 is correct, the transmitter sends M_2 with a capacity-achieving code of length (1 − r/C)k. Otherwise the transmitter sends the channel input x_r for (1 − r/C)k time units. If the output sequence in the second phase is not typical with P*_Y, an erasure is declared and the same strategy is repeated afresh. Otherwise the receiver uses an ML decoder to decide M̂_2 and decodes the message M as M̂ = (M̃_1, M̂_2). A decoding error for M_1 happens only when an error happens in the first phase and the output sequence in the second phase is typical with P*_Y while the reject codeword is being sent. The probability of the latter event is ≐ e^{−(1−r/C)C̃k}; the factor (1 − r/C) arises because it is the relative duration of the second phase within the overall communication block.
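Theorem 6's linear trade-off is easy to evaluate numerically. The sketch below (our own illustration) computes C and the red-alert exponent C̃ = max_i D(P*_Y ‖ W(·|i)) for a BSC with crossover 0.1 — an arbitrary toy channel, for which the capacity-achieving input and hence P*_Y are uniform — and then evaluates E^f_bits(r) = (1 − r/C)C̃.

```python
import math

def kl(P, Q):
    # KL divergence D(P || Q) in nats
    return sum(pi * math.log(pi / qi) for pi, qi in zip(P, Q) if pi > 0)

p = 0.1
W = [[1 - p, p], [p, 1 - p]]      # BSC(0.1) transition rows W(.|0), W(.|1)
P_star_Y = [0.5, 0.5]             # output distribution under the uniform (optimal) input
H = -(p * math.log(p) + (1 - p) * math.log(1 - p))
C = math.log(2) - H               # capacity in nats
C_tilde = max(kl(P_star_Y, row) for row in W)  # red-alert exponent

def E_fbits(r):
    # Theorem 6: linear rate-reliability trade-off for the special bits
    return (1 - r / C) * C_tilde
```

The endpoints behave as the theorem states: the exponent equals C̃ at r = 0 and vanishes as r approaches C, interpolating linearly in between.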
As in the single-bit case, the erasure probability remains vanishingly small. Thus not only is the expected decoding time of the variable-length block code roughly equal to the block length of the fixed-length block code, but its error probabilities are also roughly equal to the corresponding error probabilities of the fixed-length block code.

C. Multiple layers of priority

We can generalize this result to the case when there are multiple levels of priority, where the most important layer contains E[τ] r_1 / ln 2 bits, the second-most important layer contains E[τ] r_2 / ln 2 bits, and so on. For an L-layer situation, the message set M^(k) is of the form M^(k) = M_1^(k) × M_2^(k) × ··· × M_L^(k). We assume without loss of generality that the order of importance of the M_i's is M_1 ≻ M_2 ≻ ··· ≻ M_L; hence P_e^{M_1} ≤ P_e^{M_2} ≤ ··· ≤ P_e^{M_L}. Then for any L-layer capacity-achieving sequence with feedback, we define the error exponent of the s-th layer as

E^f_bits,s,Q = lim inf_{k→∞} −ln Pr^(k)[M̂_s ≠ M_s] / E[τ^(k)].

The achievable error exponent region of the L-layered capacity-achieving sequences with feedback is the set of all achievable exponent vectors (E^f_bits,1,Q, E^f_bits,2,Q, ..., E^f_bits,L−1,Q). The following theorem determines that region.

Theorem 7: The achievable error exponent region of the L-layered capacity-achieving sequences with feedback, for rate vector (r_1, r_2, ..., r_{L−1}), is the set of vectors (E_1, E_2, ..., E_{L−1}) satisfying

E_i ≤ (1 − (Σ_{j=1}^{i} r_j)/C) C̃   for all i ∈ {1, 2, ..., L−1}.

Note that the least important layer cannot achieve any positive error exponent because we are communicating at capacity, i.e., E_L = 0.

Optimal strategy: The transmitter first sends the most important layer, M_1, using a capacity-achieving code of length (r_1/C)k.
If it is decoded correctly, it then sends the next layer with a capacity-achieving code of length (r_2/C)k. Otherwise it starts sending the input letter x_r, not only for (r_2/C)k time units but for all the remaining L − 2 phases as well. The same strategy is repeated for M_3, M_4, ..., M_L. Once the whole block of channel outputs Y^k is observed, the receiver checks the empirical distribution of the output in every phase except the first one. If they are all typical with P*_Y, the receiver uses the tentative decisions to decode, M̂ = (M̃_1, M̃_2, ..., M̃_L). If one or more of the output sequences are not typical with P*_Y, an erasure is declared for the whole block and transmission starts from scratch. For each layer i, the above strategy achieves an exponent as if there were only two kinds of bits (as in Theorem 6):

• bits in layer i or in more important layers (i.e., special bits);
• bits in less important layers (i.e., ordinary bits).

Hence Theorem 7 not only specifies the optimal performance when there are multiple layers, but also shows that the performance we observed in Theorem 6 is successively refinable. Figure 4 shows these simultaneously achievable exponents of Theorem 6 for a particular rate vector (r_1, r_2, ..., r_{L−1}).

Fig. 4. Successive refinability for multiple layers of priority, demonstrated on an example with six layers and Σ_{i=1}^{6} r_i = C. The achievable exponents decrease from C̃ through (1 − r_1/C)C̃, (1 − (r_1+r_2)/C)C̃, ..., down to (1 − (Σ_{i=1}^{5} r_i)/C)C̃.

Note that the most important layer can achieve an exponent close to C̃ if its rate is close to zero. As we move to layers of decreasing importance, the achievable error exponent decays gradually.
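The "two kinds of bits" reduction above means the exponent of layer i depends only on the cumulative rate of layers 1 through i. A short sketch (our own illustration, reusing the toy BSC(0.1) values; the rate vector is an arbitrary choice summing to less than C) evaluates Theorem 7's region boundary and checks that the exponents are nonincreasing across layers.

```python
import math

p = 0.1
H = -(p * math.log(p) + (1 - p) * math.log(1 - p))
C = math.log(2) - H                                   # BSC(0.1) capacity, nats
C_tilde = 0.5 * math.log(0.5 / p) + 0.5 * math.log(0.5 / (1 - p))  # red-alert exponent

rates = [0.05, 0.10, 0.10, 0.05]   # r_1..r_4, arbitrary; sum = 0.30 < C
cum = 0.0
exponents = []
for r in rates:
    cum += r
    # Theorem 7 boundary: layer i sees all layers up to i as "special bits"
    exponents.append((1 - cum / C) * C_tilde)
```

Each layer's exponent equals the Theorem 6 value at the cumulative rate of the layers above it, which is exactly the successive-refinability statement.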
D. Special message

Now consider one particular message, say the first one, which requires a small missed-detection probability. As in the no-feedback case, define E^f_md as its missed-detection exponent at capacity.

Definition 9: For any capacity-achieving sequence with feedback, Q, the missed-detection exponent is defined as

E^f_md,Q ≜ lim inf_{k→∞} −ln Pr^(k)[M̂ ≠ 1 | M = 1] / E[τ^(k)].

Then E^f_md is defined as E^f_md ≜ sup_Q E^f_md,Q.

Theorem 8: E^f_md = C̃.

Theorems 2 and 8 imply the following corollary.

Corollary 1: Feedback does not improve the missed-detection exponent of a single special message: E^f_md = E_md.

If the red-alert exponent were defined as the best protection of a special message achievable at capacity, then this result could be thought of as an analog of "feedback does not increase capacity" for the red-alert exponent. Also note that with feedback, E^f_md for the special message and E^f_b for the special bit are equal.

E. Many special messages

Now let us consider the problem where the first ⌈e^{E[τ]r}⌉ messages are special, i.e., M_s = {1, 2, ..., ⌈e^{E[τ]r}⌉}. Unlike the previous problems, we now also impose a uniform expected delay constraint, as follows.

Definition 10: For any reliable variable-length block code with feedback,

Γ ≜ max_{i∈M} E[τ | M = i] / E[τ].

A reliable sequence with feedback, Q, is a uniform-delay reliable sequence with feedback if and only if lim_{k→∞} Γ^(k) = 1. This means that the average E[τ | M = i] for every message i is essentially equal to E[τ] (if not smaller). This uniformity constraint reflects a system requirement for ensuring a robust delay performance, invariant of the transmitted message.⁸ Let us define the missed-detection exponent E^f_md(r) under this uniform delay constraint.
Definition 11: For any uniform-delay capacity-achieving sequence with feedback, Q, the missed-detection exponent achieved on a sequence of subsets M_s is defined as

E^f_md,Q,M_s ≜ lim inf_{k→∞} −ln max_{i∈M_s^(k)} Pr^(k)[M̂ ≠ i | M = i] / E[τ^(k)].

Then for a given r < C, we define

E^f_md(r) ≜ sup_{Q,M_s} E^f_md,Q,M_s,

where the maximization is over M_s's such that lim inf_{k→∞} ln|M_s^(k)| / E[τ^(k)] = r.

The following theorem shows that the special messages can achieve the minimum of the red-alert exponent and Burnashev's exponent at rate r.

Theorem 9: E^f_md(r) = min{ C̃, (1 − r/C) D_max } for all r < C, where D_max ≜ max_{i,j∈X} D(W_Y|X(·|i) ‖ W_Y|X(·|j)).

For r ∈ [0, (1 − C̃/D_max)C], each special message achieves the best missed-detection exponent C̃ for a single special message, as if the rest of the special messages were absent. For r ∈ [(1 − C̃/D_max)C, C), the special messages achieve Burnashev's exponent as if the ordinary messages were absent. The optimal strategy is based on transmitting a special bit first. This result demonstrates, yet another time, how feedback connects bit-wise UEP with message-wise UEP. In the optimal strategy for bit-wise UEP with many bits a special message was used, whereas now, in message-wise UEP with many messages, a special bit is used: the roles of bits and messages in the two optimal strategies are simply swapped.

Optimal strategy: We combine the strategy achieving C̃ for a special bit with the Yamamoto-Itoh strategy achieving Burnashev's exponent [40]. In the first phase, a special bit b is sent with a repetition code of √k symbols. This is the indicator bit for special messages: it is 1 when a special message is to be sent and 0 otherwise.

⁸ The optimal exponents in all previous problems remain unchanged irrespective of this uniform delay constraint.
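The two regimes of Theorem 9 and the crossover rate between them can be computed directly. The sketch below (our own illustration on the same arbitrary toy BSC(0.1); for this symmetric channel the optimal input, and hence P*_Y, is uniform) evaluates C, C̃, D_max, the exponent E^f_md(r), and the crossover rate r* = (1 − C̃/D_max)C at which the two branches of the minimum meet.

```python
import math

def kl(P, Q):
    # KL divergence D(P || Q) in nats
    return sum(pi * math.log(pi / qi) for pi, qi in zip(P, Q) if pi > 0)

p = 0.1
W = [[1 - p, p], [p, 1 - p]]          # BSC(0.1)
P_star_Y = [0.5, 0.5]
C = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)   # = ln 2 - H(p)
C_tilde = max(kl(P_star_Y, row) for row in W)                   # red-alert exponent
D_max = max(kl(Wi, Wj) for Wi in W for Wj in W)                 # Burnashev constant

def E_fmd(r):
    # Theorem 9: min of the red-alert exponent and the Burnashev-type line
    return min(C_tilde, (1 - r / C) * D_max)

r_star = (1 - C_tilde / D_max) * C    # crossover rate between the two regimes
```

Below r* the special messages are protected as well as a single red-alert message; above r* the Burnashev-type line takes over and decays to zero at r = C.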
If b is decoded incorrectly as b̂ = 0, the input letter x_r is sent for the remaining k time units. If it is decoded correctly as b̂ = 0, then the ordinary message is sent using a codeword from a capacity-achieving code. If the output sequence in the second phase is typical with P*_Y, the receiver uses an ML decoder to choose one of the ordinary messages; otherwise an erasure is declared for the (k + √k)-long block. If b̂ = 1, then a length-k two-phase code with errors-and-erasures decoding, like the one given by Yamamoto and Itoh in [40], is used to send the message. In the communication phase, a length-(r/C)k capacity-achieving code is used to send the message M if M ∈ M_s; if M ∉ M_s, an arbitrary codeword from the length-(r/C)k capacity-achieving code is sent. In the control phase, if M ∈ M_s and it is decoded correctly at the end of the communication phase, the accept letter x_a is sent for (1 − r/C)k time units; otherwise the reject letter x_d is sent for (1 − r/C)k time units. If the empirical distribution in the control phase is typical with W_Y|X(·|x_a), then the special message decoded at the end of the communication phase becomes the final M̂; otherwise an erasure is declared for the (k + √k)-long block. Whenever an erasure is declared for the whole block, the transmitter and receiver apply the above strategy again from scratch. This scheme is repeated until a non-erasure decoding is reached.

V. AVOIDING FALSE ALARMS

In the previous sections, while investigating message-wise UEP, we have only considered the missed-detection formulation of the problems. In this section we focus on an alternative formulation of message-wise UEP problems based on false-alarm probabilities.

A. Block Codes without Feedback

We first consider the no-feedback case. When a false alarm of a special message is a critical event, e.g.
the "reboot" instruction, the false-alarm probability Pr[M̂ = 1 | M ≠ 1] for this message should be minimized, rather than the missed-detection probability Pr[M̂ ≠ 1 | M = 1]. Using Bayes' rule and assuming uniformly chosen messages, we get

Pr[M̂ = 1 | M ≠ 1] = Pr[M̂ = 1, M ≠ 1] / Pr[M ≠ 1] = Σ_{j≠1} Pr[M̂ = 1 | M = j] / (|M| − 1).

In classical error exponent analysis [20], the error probability for a given message usually means its missed-detection probability. However, examples such as the "reboot" message necessitate this notion of false-alarm probability.

Definition 12: For a capacity-achieving sequence, Q, such that lim sup_{n→∞} Pr^(n)[M̂ ≠ 1 | M = 1] = 0, the false-alarm exponent is defined as

E_fa,Q ≜ lim inf_{n→∞} −(1/n) ln Pr^(n)[M̂ = 1 | M ≠ 1].

Then E_fa is defined as E_fa ≜ sup_Q E_fa,Q.

Thus E_fa is the best exponential decay rate of the false-alarm probability with n. Unfortunately, we do not have an exact expression for E_fa. However, the bounds given below are sufficient to demonstrate the improvement introduced by feedback and variable decoding time.

Theorem 10: E^l_fa ≤ E_fa ≤ E^u_fa, where the lower and upper bounds on the false-alarm exponent are given by

E^l_fa ≜ max_{i∈X} min_{V_Y|X : Σ_j V_Y|X(·|j) P*_X(j) = W_Y|X(·|i)} D(V_Y|X(·|X) ‖ W_Y|X(·|X) | P*_X),

E^u_fa ≜ max_{i∈X} D(W_Y|X(·|i) ‖ W_Y|X(·|X) | P*_X).

Denoting the maximizers of the optimizations for E^l_fa and E^u_fa by x_fl and x_fu, respectively,

E^l_fa = min_{V_Y|X : Σ_j V_Y|X(·|j) P*_X(j) = W_Y|X(·|x_fl)} D(V_Y|X(·|X) ‖ W_Y|X(·|X) | P*_X),
E^u_fa = D(W_Y|X(·|x_fu) ‖ W_Y|X(·|X) | P*_X).

Strategy to reach the lower bound: The codeword for the special message M = 1 is a repetition sequence of the input letter x_fl.
Its decoding region is the typical 'noise ball' around it: the output sequences whose empirical distribution is approximately equal to W_Y|X(·|x_fl). For the ordinary messages, we use a capacity-achieving codebook in which all codewords have (approximately) the same empirical distribution P*_X. Then, for any y^n whose empirical distribution is not in the typical 'noise ball' around the special codeword, the receiver performs ML decoding among the ordinary codewords. Note the contrast between this strategy for achieving E^l_fa and the optimal strategy for achieving E_md: for achieving E_md, output sequences of any type other than those close to P*_Y were decoded as the special message, whereas for achieving E_fa, only the output sequences of types close to W_Y|X(·|x_fl) are decoded as the special message.

Fig. 5. Avoiding false alarms.

Intuitive interpretation: A false-alarm exponent for the special message corresponds to having the smallest possible decoding region for the special message. This ensures that when some ordinary message is transmitted, the probability of the event {M̂ = 1} is exponentially small. We cannot make it too small, though, because when the special message is transmitted, the probability of the very same event should be almost one. Hence the decoding region of the special message should at least contain the typical noise ball around the special codeword. The blue region in Fig. 5 denotes such a region.

Note that E^l_fa is larger than the channel capacity C, due to the convexity of KL divergence:

E^l_fa = max_{i∈X} min_{V_Y|X : Σ_j V_Y|X(·|j) P*_X(j) = W_Y|X(·|i)} D(V_Y|X(·|X) ‖ W_Y|X(·|X) | P*_X)
       > max_{i∈X} min_{V_Y|X : Σ_j V_Y|X(·|j) P*_X(j) = W_Y|X(·|i)} D( Σ_k P*_X(k) V_Y|X(·|k) ‖ Σ_{k'} P*_X(k') W_Y|X(·|k') )
= max_{i∈X} D(W_Y|X(·|i) ‖ P*_Y(·)) = C,

where P*_Y denotes the output distribution corresponding to the capacity-achieving input distribution P*_X, and the last equality follows from the KKT condition for achieving capacity mentioned previously [20, Theorem 4.5.1].

Now we can compare our result for a special message with the corresponding result for the classical situation where all messages are treated equally. It turns out that if every message in a capacity-achieving code demands an equally good false-alarm exponent, then this uniform exponent cannot be larger than C. This result seems to be directly connected with the problem of identification via channels [1]. We can prove the achievability part of their capacity theorem using an extension of the achievability part of E^l_fa; perhaps a new converse of their result is also possible using such results. Furthermore, we see that reducing the demand of a false-alarm exponent to only one message, instead of all, enhances it from C to at least E^l_fa.

B. Variable Length Block Codes with Feedback

Recall that feedback does not improve the missed-detection exponent for a special message. On the contrary, the false-alarm exponent of a special message is improved when feedback is available and variable decoding time is allowed. We again restrict to uniform-delay capacity-achieving sequences with feedback, i.e., capacity-achieving sequences satisfying lim_{k→∞} Γ^(k) = 1.

Definition 13: For a uniform-delay capacity-achieving sequence with feedback, Q, such that lim sup_{k→∞} Pr^(k)[M̂ ≠ 1 | M = 1] = 0, the false-alarm exponent is defined as

E^f_fa,Q ≜ lim inf_{k→∞} −ln Pr^(k)[M̂ = 1 | M ≠ 1] / E[τ^(k)].

Then E^f_fa is defined as E^f_fa ≜ sup_Q E^f_fa,Q.

Theorem 11: E^f_fa = D_max.

Note that D_max > E^u_fa. Thus feedback strictly improves the false-alarm exponent: E^f_fa > E_fa.
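The ordering C < E^u_fa ≤ D_max implied by the discussion above, together with the KKT identity max_i D(W(·|i) ‖ P*_Y) = C, can be verified numerically on a concrete channel. The sketch below (our own illustration on an arbitrary toy BSC(0.1), for which P*_X and P*_Y are uniform by symmetry) computes all three quantities.

```python
import math

def kl(P, Q):
    # KL divergence D(P || Q) in nats
    return sum(pi * math.log(pi / qi) for pi, qi in zip(P, Q) if pi > 0)

p = 0.1
W = [[1 - p, p], [p, 1 - p]]       # BSC(0.1)
P_star_X = [0.5, 0.5]              # capacity-achieving input (symmetry)
P_star_Y = [0.5, 0.5]              # corresponding output distribution

# KKT condition at capacity: D(W(.|i) || P*_Y) = C for every used input i
C = max(kl(row, P_star_Y) for row in W)

# Upper bound on the false-alarm exponent: conditional divergence averaged over P*_X
E_u_fa = max(
    sum(P_star_X[j] * kl(W[i], W[j]) for j in range(2)) for i in range(2)
)

# Burnashev constant, achieved with feedback (Theorem 11)
D_max = max(kl(Wi, Wj) for Wi in W for Wj in W)
```

On this channel the averaging inside E^u_fa sits strictly between the capacity and D_max, illustrating both the gain from protecting a single message (above C) and the further gain from feedback (up to D_max).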
Optimal strategy: We use a strategy similar to the one employed in proving Theorem 9 in Subsection IV-E. In the first phase, a length-√k code is used to convey whether M = 1 or not, using a special bit b = 1_{M=1}.

• If b̂ = 0, a length-k capacity-achieving code with E_md = C̃ is used. If the decoded message of the length-k code is 1, an erasure is declared for the (k + √k)-long block; otherwise the decoded message of the length-k code becomes the decoded message for the whole (k + √k)-long block.
• If b̂ = 1:
  – and M = 1, the input symbol x_a is transmitted for k time units;
  – and M ≠ 1, the input symbol x_d is transmitted for k time units.

If the output sequence Y^{√k+k}_{√k+1} is typical with W_Y|X(·|x_a), then M̂ = 1; otherwise an erasure is declared for the (k + √k)-long block. Receiver and transmitter start from scratch if an erasure is declared at the end of the second phase. Note that this strategy simultaneously achieves the optimal missed-detection exponent C̃ and the optimal false-alarm exponent D_max for this special message.

VI. FUTURE DIRECTIONS

In this paper we have restricted our investigation of UEP problems to data rates that are essentially equal to the channel capacity. The scenarios we have analyzed provide a rich class of problems when we consider data rates below capacity. Most of the UEP problems have a coding-theoretic version, in which deterministic guarantees, in terms of Hamming distances, are demanded instead of probabilistic guarantees in terms of error exponents. As we have mentioned in Section I-A, coding-theoretic versions of bit-wise UEP problems have been studied extensively for the case of linear codes. But it seems that coding-theoretic versions of both message-wise UEP problems and bit-wise UEP problems for non-linear codes are scarcely investigated [3], [5].
Throughout this paper we have focused on the channel-coding component of communication. However, oftentimes the final objective is to communicate a source within some distortion constraint; indeed, the message-wise UEP problem itself first came up within this framework [12]. But the source we are trying to convey can itself be heterogeneous, in the sense that some parts of its output may demand a smaller distortion than other parts. Understanding optimal methods for communicating such sources over noisy channels presents many novel joint source-channel coding problems.

At times the final objective of communication is achieving some coordination between various agents [14]. In these scenarios the channel is used both for communicating data and for achieving coordination. A new class of problems arises when we try to work out the trade-offs between the error exponents of the coordination and of the data.

We can also actively use UEP in network protocols. For example, a relay can forward some partial information even if it cannot decode everything; this partial information could be characterized in terms of special bits as well as special messages. Another example is two-way communication, where UEP can be used for more reliable feedback and synchronization.

The information-theoretic understanding of UEP also gives rise to some network optimization problems. With UEP, the interface to the physical layer is no longer bits; instead, it is a collection of various levels of error protection. The available channel resources of reliability and rate need to be efficiently divided among these levels, which gives rise to many resource allocation problems.
BLOCK CODES WITHOUT FEEDBACK: PROOFS

In the following sections we use the standard notation for entropy, conditional entropy, and mutual information:
$$H(P_X) = \sum_{j\in\mathcal{X}} P_X(j)\ln\frac{1}{P_X(j)}$$
$$H(W_{Y|X}|P_X) = \sum_{j\in\mathcal{X},\,k\in\mathcal{Y}} P_X(j)W_{Y|X}(k|j)\ln\frac{1}{W_{Y|X}(k|j)}$$
$$I(P,W) = \sum_{j\in\mathcal{X},\,k\in\mathcal{Y}} P_X(j)W_{Y|X}(k|j)\ln\frac{W_{Y|X}(k|j)}{\sum_{i\in\mathcal{X}} W_{Y|X}(k|i)P_X(i)}.$$
In addition we denote the decoding region of a message $i\in\mathcal{M}$ by $\mathcal{G}(i)$, i.e., $\mathcal{G}(i) \triangleq \{y^n : \hat{M}(y^n)=i\}$.

A. Proof of Theorem 1

Proof: We first show that any capacity-achieving sequence $Q$ with $E_{b,Q}$ can be used to construct another capacity-achieving sequence $Q'$ with $E_{b,Q'} = \frac{E_{b,Q}}{2}$, all members of which are fixed-composition codes. Then we show that $E_{b,Q'}=0$ for any capacity-achieving sequence $Q'$ which includes only fixed-composition codes.

Consider a capacity-achieving sequence $Q$ with message sets $\mathcal{M}^{(n)} = \mathcal{M}_1\times\mathcal{M}_2^{(n)}$, where $\mathcal{M}_1=\{0,1\}$. As a result of the Markov inequality, at least $\frac{4}{5}|\mathcal{M}^{(n)}|$ of the messages in $\mathcal{M}^{(n)}$ satisfy
$$\Pr\big[\hat{M}_1\neq M_1 \,\big|\, M=i\big] \le 5\Pr\big[\hat{M}_1\neq M_1\big]. \quad (1)$$
Similarly, at least $\frac{4}{5}|\mathcal{M}^{(n)}|$ of the messages in $\mathcal{M}^{(n)}$ satisfy
$$\Pr\big[\hat{M}\neq M \,\big|\, M=i\big] \le 5\Pr\big[\hat{M}\neq M\big]. \quad (2)$$
Thus at least $\frac{3}{5}|\mathcal{M}^{(n)}|$ of the messages in $\mathcal{M}^{(n)}$ satisfy both (1) and (2). Consequently, at least $\frac{1}{10}|\mathcal{M}^{(n)}|$ messages are of the form $(0,M_2)$ and satisfy equations (1) and (2). If we group them according to their empirical distribution, at least one of the groups will have more than $\frac{|\mathcal{M}^{(n)}|}{10(n+1)^{|\mathcal{X}|}}$ messages, because the number of different empirical distributions for elements of $\mathcal{X}^n$ is less than $(n+1)^{|\mathcal{X}|}$. We keep the first $\frac{|\mathcal{M}^{(n)}|}{10(n+1)^{|\mathcal{X}|}}$ codewords of this most populous type, denote them by $\bar{x}'_A(\cdot)$, and throw away all other codewords corresponding to the messages of the form $(0,M_2)$.
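For concreteness, the three quantities above can be computed directly. A minimal Python sketch in nats (the channel matrix `W` and input distribution `p` below are illustrative choices, not taken from the paper):

```python
import math

def entropy(p):
    # H(P) = sum_j P(j) ln(1/P(j)), in nats (the paper uses natural logarithms)
    return sum(-q * math.log(q) for q in p if q > 0)

def cond_entropy(W, p):
    # H(W_{Y|X} | P_X) = sum_j P_X(j) H(W(.|j)); row x of W is W(.|x)
    return sum(px * entropy(row) for px, row in zip(p, W))

def mutual_information(p, W):
    # I(P, W) = H(P_Y) - H(W_{Y|X} | P_X), with P_Y(k) = sum_x P_X(x) W(k|x)
    py = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]
    return entropy(py) - cond_entropy(W, p)

# Illustrative binary symmetric channel with crossover 0.1 and uniform input
W = [[0.9, 0.1], [0.1, 0.9]]
p = [0.5, 0.5]
```

For this channel, `mutual_information(p, W)` is the capacity $\ln 2 - H_b(0.1)$ in nats, since the uniform input is capacity-achieving for a symmetric channel.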
We do the same for the messages of the form $M=(1,M_2)$ and denote the corresponding codewords by $\bar{x}'_B(\cdot)$. Thus we have a length-$n$ code with message set $\mathcal{M}'$ of the form $\mathcal{M}' = \mathcal{M}_1\times\mathcal{M}'_2$, where $\mathcal{M}_1=\{0,1\}$ and $|\mathcal{M}'_2| = \frac{|\mathcal{M}^{(n)}|}{10(n+1)^{|\mathcal{X}|}}$. Furthermore,
$$\Pr\big[\hat{M}'_1\neq M'_1 \,\big|\, M'=i\big] \le 5\Pr\big[\hat{M}_1\neq M_1\big], \qquad \Pr\big[\hat{M}'\neq M' \,\big|\, M'=i\big] \le 5\Pr\big[\hat{M}\neq M\big] \qquad \forall i\in\mathcal{M}'.$$

Now let us consider the following $2n$-long block code with message set $\mathcal{M}'' = \mathcal{M}_1\times\mathcal{M}''_2\times\mathcal{M}''_3$, where $\mathcal{M}''_2 = \mathcal{M}''_3 = \mathcal{M}'_2$:
• If $M''=(0,M''_2,M''_3)$ then $\bar{x}(M'') = \bar{x}'_A(M''_2)\,\bar{x}'_B(M''_3)$.
• If $M''=(1,M''_2,M''_3)$ then $\bar{x}(M'') = \bar{x}'_B(M''_2)\,\bar{x}'_A(M''_3)$.
The decoder of this new length-$2n$ code uses the decoder of the original length-$n$ code first on $y^n$ and then on $y_{n+1}^{2n}$. If the concatenation of the length-$n$ codewords corresponding to the decoded halves is a codeword for an $i\in\mathcal{M}''$, then $\hat{M}''=i$; else an arbitrary message is decoded. One can easily see that the error probability of the length-$2n$ code is less than twice the error probability of the length-$n$ code, i.e.,
$$\Pr\big[\hat{M}''\neq M'' \,\big|\, M''\big] \le 1 - \big(1-\Pr\big[\hat{M}'\neq M' \,\big|\, M'=M''_2\big]\big)\big(1-\Pr\big[\hat{M}'\neq M' \,\big|\, M'=M''_3\big]\big) \le 2\Pr\big[\hat{M}'\neq M'\big].$$
Furthermore, the bit error probability of the new code is also at most twice the bit error probability of the length-$n$ code, i.e.,
$$\Pr\big[\hat{M}''_1\neq M''_1 \,\big|\, M''_1\big] \le 1 - \big(1-\Pr\big[\hat{M}'_1\neq M'_1 \,\big|\, M'_1=M''_1\big]\big)^2 \le 2\Pr\big[\hat{M}'_1\neq M'_1\big].$$
Thus using these codes one can obtain a capacity-achieving sequence $Q'$ with $E_{b,Q'}=\frac{E_{b,Q}}{2}$, all members of which are fixed-composition codes. In the following discussion we focus on capacity-achieving sequences $Q$ that are composed of fixed-composition codes only.
We will show that $E_{b,Q}=0$ for all capacity-achieving $Q$'s with fixed-composition codes. Consequently, the discussion above implies that $E_b=0$.

We call the empirical distribution of a given output sequence $y^n$, conditioned on the codeword $\bar{x}(i)$, the conditional type of $y^n$ given the message $i$, and denote it by $V(y^n,i)$. Furthermore, we call the set of $y^n$'s whose conditional type with message $i$ is $V$ the $V$-shell of $i$, and denote it by $T_V(i)$. Similarly, we denote the set of output sequences $y^n$ with empirical distribution $U_Y$ by $T_{U_Y}$. We denote the empirical distribution of the codewords of the $n$th code of the sequence by $P_X^{(n)}$ and the corresponding output distribution by $P_Y^{(n)}$, i.e.,
$$P_Y^{(n)}(\cdot) = \sum_{i\in\mathcal{X}} W_{Y|X}(\cdot|i)\,P_X^{(n)}(i).$$
We simply use $P_X$ and $P_Y$ whenever the value of $n$ is unambiguous from the context. Furthermore, $P_Y^n(\cdot)$ stands for the probability measure on $\mathcal{Y}^n$ such that
$$P_Y^n(y^n) = \prod_{k=1}^n P_Y(y_k).$$
$S_{0,V}^{(n)}$ is the set of $y^n$'s for which $\hat{M}_1=0$ and $V(y^n,\hat{M}(y^n))=V$:
$$S_{0,V}^{(n)} \triangleq \big\{y^n : V(y^n,\hat{M}(y^n))=V \text{ and } \hat{M}(y^n)=(0,j) \text{ for some } j\in\mathcal{M}_2\big\}. \quad (3)$$
In other words, $S_{0,V}^{(n)}$ is the set of $y^n$'s such that $y^n\in T_V\big(\hat{M}(y^n)\big)$ and the decoded value of the first bit is zero. Note that since for each $y^n\in\mathcal{Y}^n$ there is a unique $\hat{M}(y^n)$, and for each $y^n\in\mathcal{Y}^n$ and message $i\in\mathcal{M}$ there is a unique $V(y^n,i)$, each $y^n$ belongs to a unique $S_{0,V}^{(n)}$ or $S_{1,V}^{(n)}$; i.e., the $S_{0,V}^{(n)}$'s and $S_{1,V}^{(n)}$'s are disjoint sets that collectively cover the set $\mathcal{Y}^n$.

Let us define the typical neighborhood $[W]$ of $W_{Y|X}$ as
$$[W] \triangleq \big\{V_{Y|X} : \big|V_{Y|X}(j|i)P_X^{(n)}(i) - W_{Y|X}(j|i)P_X^{(n)}(i)\big| \le 4\sqrt{1/n} \quad \forall i,j\big\}. \quad (4)$$
Let us denote the union of all $S_{0,V}^{(n)}$'s for typical $V$'s by
$$S_0^{(n)} = \bigcup_{V\in[W]} S_{0,V}^{(n)}.$$
We will establish the following inequality later.
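The conditional type $V(y^n,i)$ defined above is simply a per-input-letter empirical distribution. A small sketch (the codeword and output sequence below are made-up examples):

```python
from collections import Counter

def conditional_type(y, x):
    # Conditional type V(y^n, i): for each input letter a appearing in the
    # codeword x, the fraction of the positions carrying a whose output is b.
    joint = Counter(zip(x, y))   # counts of (input letter, output letter) pairs
    marg = Counter(x)            # counts of each input letter in the codeword
    return {(a, b): joint[(a, b)] / marg[a] for (a, b) in joint}

x = [0, 0, 1, 0, 1, 1]   # hypothetical codeword \bar{x}(i)
y = [0, 1, 1, 0, 1, 0]   # hypothetical received sequence y^n
V = conditional_type(y, x)
```

Two output sequences lie in the same $V$-shell $T_V(i)$ exactly when this function returns the same dictionary for both against the codeword $\bar{x}(i)$.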
Let us assume for the moment that it holds:
$$P_Y^n\big(S_0^{(n)}\big) \ge e^{n(R^{(n)}-(C+\epsilon_n))}\left(\frac{1}{2} - \frac{|\mathcal{X}||\mathcal{Y}|}{8\sqrt{n}} - P_e\right) \quad (5)$$
where $\lim_{n\to\infty}\epsilon_n=0$. As a result of the bound given in (5) and the blowing-up lemma [13, Ch. 1, Lemma 5.4], we can conclude that for any capacity-achieving sequence $Q$ there exists a sequence of $(\ell_n,\eta_n)$ pairs satisfying $\lim_{n\to\infty}\eta_n=1$ and $\lim_{n\to\infty}\frac{\ell_n}{n}=0$ such that
$$P_Y^n\big(\Gamma^{\ell_n}(S_0^{(n)})\big) \ge \eta_n$$
where $\Gamma^{\ell_n}(A)$ is the set of all $y^n$'s which differ from an element of $A$ in at most $\ell_n$ places. Clearly one can repeat the same argument for $\Gamma^{\ell_n}(S_1^{(n)})$ to get $P_Y^n\big(\Gamma^{\ell_n}(S_1^{(n)})\big)\ge\eta_n$. Consequently,
$$P_Y^n\big(\Gamma^{\ell_n}(S_0^{(n)})\cap\Gamma^{\ell_n}(S_1^{(n)})\big) = P_Y^n\big(\Gamma^{\ell_n}(S_0^{(n)})\big) + P_Y^n\big(\Gamma^{\ell_n}(S_1^{(n)})\big) - P_Y^n\big(\Gamma^{\ell_n}(S_0^{(n)})\cup\Gamma^{\ell_n}(S_1^{(n)})\big) \ge 2\eta_n - 1.$$
Note that if $y^n\in\Gamma^{\ell_n}(S_1^{(n)})$, then there exists at least one element $\tilde{y}^n\in T_{P_Y}$ which differs from $y^n$ in at most $(|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)$ places.9 Thus we can upper bound its probability by
$$y^n\in\Gamma^{\ell_n}(S_1^{(n)}) \;\Rightarrow\; P_Y^n(y^n) \le e^{-nH(P_Y) - (|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)\ln\lambda}$$
where $\lambda = \min_{i,j} W_{Y|X}(j|i)$. Thus we have
$$\big|\Gamma^{\ell_n}(S_0^{(n)})\cap\Gamma^{\ell_n}(S_1^{(n)})\big| \ge (2\eta_n-1)\,e^{nH(P_Y) + (|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)\ln\lambda}. \quad (6)$$
Note that for any $y^n\in\Gamma^{\ell_n}(S_0^{(n)})\cap\Gamma^{\ell_n}(S_1^{(n)})$ there exists a $\tilde{y}^n\in T_W(i)$ for an $i$ of the form $i=(0,M_2)$ which differs from $y^n$ in at most $(|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)$ places.10 Consequently,
$$\Pr[y^n \mid M=i] \ge e^{-nH(W_{Y|X}|P_X) + (|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)\ln\lambda}. \quad (7)$$
Since $|\mathcal{M}_2| = \frac{e^{nR^{(n)}}}{2}$, using equation (7) we can lower bound the probability of $y^n$ under the hypothesis $M_1=0$ as follows:
$$\Pr[y^n\mid M_1=0] = \sum_{j\in\mathcal{M}_2}\Pr[y^n\mid M=(0,j)]\,\Pr[M=(0,j)\mid M_1=0] \ge 2\,e^{-n(H(W_{Y|X}|P_X)+R^{(n)}) + (|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)\ln\lambda}. \quad (8)$$
Clearly the same holds for $M_1=1$ too; thus
$$\Pr[y^n\mid M_1=1] \ge 2\,e^{-n(H(W_{Y|X}|P_X)+R^{(n)}) + (|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)\ln\lambda}. \quad (9)$$
Consequently,
$$\Pr\big[\hat{M}_1\neq M_1\big] \ge \sum_{y^n}\frac{1}{2}\min\big(\Pr[y^n\mid M_1=0], \Pr[y^n\mid M_1=1]\big)$$
$$\overset{(a)}{\ge} \sum_{y^n\in\Gamma^{\ell_n}(S_0^{(n)})\cap\Gamma^{\ell_n}(S_1^{(n)})} e^{-n(H(W_{Y|X}|P_X)+R^{(n)}) + (|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)\ln\lambda}$$
$$\overset{(b)}{\ge} (2\eta_n-1)\,e^{nH(P_Y)+(|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)\ln\lambda}\,e^{-n(H(W_{Y|X}|P_X)+R^{(n)}) + (|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)\ln\lambda}$$
$$= (2\eta_n-1)\,e^{n(I(P_X,W)-R^{(n)}) + 2(|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)\ln\lambda} \quad (10)$$
where $(a)$ follows from equations (8) and (9) and $(b)$ follows from equation (6). Using Fano's inequality we get
$$I(M;Y^n) - nR^{(n)} \ge -\ln 2 - nR^{(n)}P_e^{(n)} \quad (11)$$
where $I(M;Y^n)$ is the mutual information between the message $M$ and the channel output $Y^n$. In addition we can upper bound $I(M;Y^n)$ as follows:
$$I(M;Y^n) = \sum_{i\in\mathcal{M},\,y^n\in\mathcal{Y}^n}\Pr[i,y^n]\ln\frac{\Pr[y^n|i]}{\Pr[y^n]} = \sum_{i\in\mathcal{M},\,y^n\in\mathcal{Y}^n}\Pr[i,y^n]\ln\frac{\Pr[y^n|i]}{\prod_{k=1}^n P_Y(y_k)} - \sum_{y^n\in\mathcal{Y}^n}\Pr[y^n]\ln\frac{\Pr[y^n]}{\prod_{k=1}^n P_Y(y_k)}$$
$$\overset{(a)}{\le} \sum_{i\in\mathcal{M}}\frac{1}{|\mathcal{M}|}\sum_{k=1}^n\sum_{y_k} W_{Y|X}(y_k|\bar{x}_k(i))\ln\frac{W_{Y|X}(y_k|\bar{x}_k(i))}{P_Y(y_k)} \overset{(b)}{=} nI(P_X,W) \quad (12)$$
where $P_Y(\cdot) = \sum_{j\in\mathcal{X}} W_{Y|X}(\cdot|j)P_X(j)$.

[Footnote 9: Because of the integer constraints, $T_{P_Y}$ might actually be an empty set. If so, we can make a similar argument for the $U_Y^*$ which minimizes $\sum_j |U_Y(j)-P_Y(j)|$. However, this technicality is inconsequential.]
[Footnote 10: Integer constraints here are inconsequential too.]
Step $(a)$ follows from the non-negativity of the KL divergence and step $(b)$ follows from the fact that all the codewords are of type $P_X(\cdot)$. Using equations (10), (11), and (12) we get
$$\Pr\big[\hat{M}_1\neq M_1\big] \ge (2\eta_n-1)\,e^{-\ln 2 - nR^{(n)}P_e^{(n)} + 2(|\mathcal{Y}||\mathcal{X}|n^{3/4}+\ell_n)\ln\lambda}.$$
Thus, using $\lim_{n\to\infty}P_e^{(n)}=0$, $\lim_{n\to\infty}\eta_n=1$, and $\lim_{n\to\infty}\frac{\ell_n}{n}=0$, we conclude that
$$\lim_{n\to\infty}\frac{-\ln\Pr^{(n)}[\hat{M}_1\neq M_1]}{n} = 0.$$
Now the only thing left, for proving $E_b=0$, is to establish inequality (5). One can write the error probability of the $n$th code of $Q$ as
$$P_e^{(n)} = \sum_{i\in\mathcal{M}^{(n)}}\frac{1}{|\mathcal{M}|}\sum_{y^n\in\mathcal{Y}^n}\big(1-\mathbb{1}_{\{\hat{M}(y^n)=i\}}\big)\Pr[y^n\mid M=i]$$
$$= \sum_{i\in\mathcal{M}^{(n)}} e^{-nR^{(n)}} \sum_V \sum_{y^n\in T_V(i)} \big(1-\mathbb{1}_{\{\hat{M}(y^n)=i\}}\big)\,e^{-n(D(V_{Y|X}(\cdot|X)\|W_{Y|X}(\cdot|X)|P_X)+H(V_{Y|X}|P_X))}$$
$$= \sum_V e^{-n(D(V_{Y|X}(\cdot|X)\|W_{Y|X}(\cdot|X)|P_X)+H(V_{Y|X}|P_X)+R^{(n)})} \sum_{i\in\mathcal{M}^{(n)}}\sum_{y^n\in T_V(i)}\big(1-\mathbb{1}_{\{\hat{M}(y^n)=i\}}\big)$$
$$= \sum_V e^{-n(D(V_{Y|X}(\cdot|X)\|W_{Y|X}(\cdot|X)|P_X)+H(V_{Y|X}|P_X)+R^{(n)})}\,(Q_{0,V}+Q_{1,V}) \quad (13)$$
where
$$Q_{k,V} = \sum_{\substack{i=(k,j)\\ j\in\mathcal{M}_2}}\sum_{y^n\in T_V(i)}\big(1-\mathbb{1}_{\{\hat{M}(y^n)=i\}}\big) \qquad \text{for } k=0,1.$$
Note that $Q_{k,V}$ is the sum, over the messages $i$ for which $M_1=k$, of the number of elements in $T_V(i)$ that are not decoded to message $i$. In a sense it is a measure of the contribution of the $V$-shells of different codewords to the error probability. We will use equation (13) to establish lower bounds on the $P_Y^n\big(S_{0,V}^{(n)}\big)$'s. Note that all elements of $S_{0,V}^{(n)}$ have the same probability under $P_Y^n(\cdot)$ and
$$P_Y^n\big(S_{0,V}^{(n)}\big) = \big|S_{0,V}^{(n)}\big|\,e^{-\zeta n} \qquad \text{where} \qquad \zeta = \sum_{x,y} P_X(x)V_{Y|X}(y|x)\ln\frac{1}{P_Y(y)}. \quad (14)$$
Note that
$$\zeta = \sum_{x,y} P_X(x)V_{Y|X}(y|x)\ln\frac{W_{Y|X}(y|x)}{P_Y(y)} + \sum_{x,y} P_X(x)V_{Y|X}(y|x)\ln\frac{1}{W_{Y|X}(y|x)}$$
$$= I(P_X,W_{Y|X}) + D\big(V_{Y|X}(\cdot|X)\,\big\|\,W_{Y|X}(\cdot|X)\,\big|\,P_X\big) + H(V_{Y|X}|P_X) + \sum_{x,y} P_X(x)\big(V_{Y|X}(y|x)-W_{Y|X}(y|x)\big)\ln\frac{W_{Y|X}(y|x)}{P_Y(y)}.$$
Recall that $I(P_X,W_{Y|X})\le C$ and $\min_{i,j}W_{Y|X}(i|j)=\lambda$. Thus, using the definition of $[W_{Y|X}]$ given in equation (4), we get
$$\zeta \le C + \epsilon_n + D\big(V_{Y|X}(\cdot|X)\,\big\|\,W_{Y|X}(\cdot|X)\,\big|\,P_X\big) + H(V_{Y|X}|P_X) \qquad \forall V_{Y|X}\in[W_{Y|X}] \quad (15)$$
where $\epsilon_n = \frac{|\mathcal{X}||\mathcal{Y}|}{4\sqrt{n}}\ln\frac{1}{\lambda}$. Note that
$$\big|S_{0,V}^{(n)}\big| = \big|\mathcal{M}_2^{(n)}\big|\cdot|T_V(i)| - Q_{0,V} = \frac{1}{2}|T_V(i)|\,e^{nR^{(n)}} - Q_{0,V}. \quad (16)$$
Recalling that the $S_{0,V}^{(n)}$'s are disjoint and using equations (14), (15), and (16), we get
$$P_Y^n\big(S_0^{(n)}\big) \ge \sum_{V\in[W]} P_Y^n\big(S_{0,V}^{(n)}\big) \ge \sum_{V\in[W]} e^{-n(C+\epsilon_n)}\left(\frac{1}{2}|T_V(i)|\,e^{nR^{(n)}} - Q_{0,V}\right) e^{-n(D(V_{Y|X}(\cdot|X)\|W_{Y|X}(\cdot|X)|P_X)+H(V_{Y|X}|P_X))}$$
$$\overset{(a)}{\ge} e^{n(R^{(n)}-(C+\epsilon_n))}\left(\sum_{V\in[W]}\frac{1}{2}|T_V(i)|\,e^{-n(D(V_{Y|X}(\cdot|X)\|W_{Y|X}(\cdot|X)|P_X)+H(V_{Y|X}|P_X))} - P_e\right)$$
$$= e^{n(R^{(n)}-(C+\epsilon_n))}\left(\frac{1}{2}\sum_{V\in[W]}\sum_{y^n\in T_V(i)}\Pr[y^n\mid M=i] - P_e\right) \overset{(b)}{\ge} e^{n(R^{(n)}-(C+\epsilon_n))}\left(\frac{1}{2} - \frac{|\mathcal{X}||\mathcal{Y}|}{8\sqrt{n}} - P_e\right)$$
where $(a)$ follows from equation (13) and $(b)$ follows from Chebyshev's inequality.11 •

B. Proof of Theorem 2

1) Achievability: $E_{md}\ge\tilde{C}$: Proof: For each block length $n$, the special message is sent with the length-$n$ repetition sequence $\bar{x}^n(1) = (x_r, x_r, \cdots, x_r)$, where $x_r$ is the input letter satisfying
$$D\big(P_Y^*(\cdot)\,\big\|\,W_{Y|X}(\cdot|x_r)\big) = \max_i D\big(P_Y^*(\cdot)\,\big\|\,W_{Y|X}(\cdot|i)\big).$$
The remaining $|\mathcal{M}|-1$ ordinary codewords are generated randomly and independently of each other using the capacity-achieving input distribution $P_X^*$, i.i.d. over time.
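The decomposition of $\zeta$ above is a finite algebraic identity, so it can be checked numerically. A small sketch (the input distribution, conditional type $V$, and channel $W$ are arbitrary illustrative choices):

```python
import math

def zeta_decomposition(P_X, V, W):
    # Returns (zeta, I + D + H + correction) for the identity in the text, where
    #   zeta = sum_{x,y} P_X(x) V(y|x) ln(1/P_Y(y)),  P_Y(y) = sum_x P_X(x) W(y|x).
    X = range(len(P_X)); Y = range(len(W[0]))
    P_Y = [sum(P_X[x] * W[x][y] for x in X) for y in Y]
    zeta = sum(P_X[x] * V[x][y] * math.log(1 / P_Y[y]) for x in X for y in Y)
    I = sum(P_X[x] * W[x][y] * math.log(W[x][y] / P_Y[y]) for x in X for y in Y)
    D = sum(P_X[x] * V[x][y] * math.log(V[x][y] / W[x][y]) for x in X for y in Y)
    H = sum(P_X[x] * V[x][y] * math.log(1 / V[x][y]) for x in X for y in Y)
    corr = sum(P_X[x] * (V[x][y] - W[x][y]) * math.log(W[x][y] / P_Y[y])
               for x in X for y in Y)
    return zeta, I + D + H + corr

z, rhs = zeta_decomposition([0.5, 0.5], [[0.8, 0.2], [0.3, 0.7]],
                            [[0.9, 0.1], [0.1, 0.9]])
```

The two returned values agree to floating-point precision, mirroring the chain of equalities above.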
Let us denote the empirical distribution of a particular output sequence $y^n$ by $Q(y^n)$. The receiver decodes to the special message only when the output distribution is not close to $P_Y^*$. More precisely, the set of output distributions close to $P_Y^*$, $[P_Y^*]$, and the decoding region of the special message, $\mathcal{G}(1)$, are given as follows:
$$[P_Y^*] = \big\{P_Y(\cdot) : |P_Y(i)-P_Y^*(i)| \le 4\sqrt{1/n} \quad \forall i\in\mathcal{Y}\big\}$$
$$\mathcal{G}(1) = \big\{y^n : Q(y^n)\notin[P_Y^*]\big\}.$$
Since there are at most $(n+1)^{|\mathcal{Y}|}$ different empirical output distributions for elements of $\mathcal{Y}^n$, we get
$$\Pr^{(n)}[y^n\notin\mathcal{G}(1)\mid M=1] \le (n+1)^{|\mathcal{Y}|}\,e^{-n\min_{Q_Y\in[P_Y^*]} D(Q_Y(\cdot)\|W_{Y|X}(\cdot|x_r))}.$$
Thus $\lim_{n\to\infty}\frac{-\ln\Pr^{(n)}[y^n\notin\mathcal{G}(1)\mid M=1]}{n} = D\big(P_Y^*(\cdot)\,\big\|\,W_{Y|X}(\cdot|x_r)\big) = \tilde{C}$.

Now the only thing we are left to prove is that we can have a low enough error probability for the remaining messages. To do that, we first calculate the average error probability of the following random code ensemble. Entries of the codebook, other than the ones corresponding to the special message, are generated independently using a capacity-achieving input distribution $P_X^*$. Because of the symmetry, the average error probability is the same for all $i\neq 1$ in $\mathcal{M}$. Let us calculate the error probability of the message $M=2$. Assuming that the second message was transmitted, $\Pr[y^n\in\mathcal{G}(1)\mid M=2]$ is vanishingly small. This is because the output distribution for the random ensemble of ordinary codewords is i.i.d. $P_Y^*$; Chebyshev's inequality guarantees that the probability of the output type being outside a $4\sqrt{1/n}$ ball around $P_Y^*$, i.e., outside $[P_Y^*]$, is of the order $\sqrt{1/n}$. Assuming that the second message was transmitted, $\Pr[y^n\in\cup_{i>2}\mathcal{G}(i)\mid M=2]$ is vanishingly small due to the standard random coding argument for achieving capacity [35].

[Footnote 11: The claim in $(b)$ is identical to the one in [13, Remark on page 34].]
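The first-stage test above — decode the special message exactly when the empirical output distribution leaves the $4\sqrt{1/n}$ neighborhood of $P_Y^*$ — can be sketched as follows (binary output alphabet; the sequences and $P_Y^*$ are made-up examples):

```python
import math

def decodes_to_special(y, p_star, n):
    # \hat{M} = 1 iff the empirical output distribution Q(y^n) falls outside the
    # entrywise 4*sqrt(1/n) neighborhood [P*_Y], per the decoding rule G(1) above.
    q = [y.count(s) / n for s in range(len(p_star))]   # empirical distribution
    return any(abs(qi - pi) > 4 * math.sqrt(1 / n) for qi, pi in zip(q, p_star))

n = 10000
p_star = [0.5, 0.5]                  # illustrative capacity-achieving output law
y_typical = [0] * 5000 + [1] * 5000  # looks like an ordinary codeword's output
y_repetition = [0] * n               # looks like the special repetition codeword
```

An output that matches $P_Y^*$ is handed to the ordinary-message decoder, while a strongly skewed output (as produced by the repetition codeword $\bar{x}^n(1)$) triggers $\hat{M}=1$.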
Thus, for any $P_e>0$ and all large enough $n$, the average error probability of the code ensemble is smaller than $P_e$; hence we have at least one code with that $P_e$. For that code, at least half of the codewords have an error probability less than $2P_e$. •

2) Converse: $E_{md}\le\tilde{C}$: In Section VIII-D.2 we will prove that, even with feedback and variable decoding time, the missed-detection exponent of a single special message is at most $\tilde{C}$. Thus $E_{md}\le\tilde{C}$.

C. Proof of Theorem 3

1) Achievability: $E_{md}\ge E(r)$: Proof: Special codewords: At any given block length $n$, we start with an optimum codebook (say $\mathcal{C}_{special}$) for $\lceil e^{nr}\rceil$ messages. Such an optimum codebook achieves error exponent $E(r)$ for every message in it:
$$\Pr\big[\hat{M}\neq i\mid M=i\big] \doteq e^{-nE(r)} \qquad \forall i\in\mathcal{M}_s \equiv \{1,2,\cdots,\lceil e^{nr}\rceil\}.$$
Since there are at most $(n+1)^{|\mathcal{X}|}$ different types, there is at least one type $T_{P_X}$ which has $\frac{\lceil e^{nr}\rceil}{(1+n)^{|\mathcal{X}|}}$ or more codewords. Throw away all other codewords from $\mathcal{C}_{special}$ and call the remaining fixed-composition codebook $\mathcal{C}'_{special}$. The codebook $\mathcal{C}'_{special}$ is used for transmitting the special messages.

As shown in Fig. 3(a), let the noise ball around the codeword for the special message $i$ be $B_i$. These balls need not be disjoint. Let $B$ denote the union of these balls over all special messages:
$$B = \bigcup_{i\in\mathcal{M}_s} B_i.$$
If the output sequence $y^n\in B$, the first stage of the decoder decides a special message was transmitted. The second stage then chooses the ML candidate amongst the messages in $\mathcal{M}_s$. Let us define $B_i$ precisely now:
$$B_i = \big\{y^n : V(y^n,i)\in\mathcal{W}(r+\epsilon,P_X)\big\}$$
where
$$\mathcal{W}(r+\epsilon,P_X) = \big\{V_{Y|X} : D\big(V_{Y|X}(\cdot|X)\,\big\|\,W_{Y|X}(\cdot|X)\,\big|\,P_X\big) \le E_{sp}(r+\epsilon;P_X)\big\}.$$
Recall that the sphere-packing exponent for input type $P_X$ at rate $r$, $E_{sp}(r;P_X)$, is given by
$$E_{sp}(r;P_X) = \min_{V_{Y|X}:\, I(P_X,V_{Y|X})\le r} D\big(V_{Y|X}(\cdot|X)\,\big\|\,W_{Y|X}(\cdot|X)\,\big|\,P_X\big).$$
Ordinary codewords: The ordinary codewords are generated randomly using a capacity-achieving input distribution $P_X^*$. This is the same as Shannon's construction for achieving capacity. The random coding construction provides a simple way to show that in the cavity $B^c$ (the complement of $B$) we can essentially fit enough typical noise balls to achieve capacity. This avoids the complicated task of carefully choosing the ordinary codewords and their decoding regions in the cavity $B^c$. If the output sequence $y^n\in B^c$, the first stage of the decoder decides an ordinary message was transmitted. The second stage then chooses the ML candidate from the ordinary codewords.

Error analysis: First, consider the case when a special codeword $\bar{x}^n(i)$ is transmitted. By Stein's lemma and the definition of $B_i$, the probability of $y^n\notin B_i$ has exponent $E_{sp}(r+\epsilon;P_X)$. Hence the first-stage error exponent is at least $E_{sp}(r+\epsilon;P_X)$. Assuming correct first-stage decoding, the second-stage error exponent for special messages equals $E(r)$. Hence the effective error exponent for special messages is
$$\min\{E(r),\, E_{sp}(r+\epsilon;P_X)\}.$$
Since $E(r)$ is at most the sphere-packing exponent $E_{sp}(r;P_X)$ [19], choosing arbitrarily small $\epsilon$ ensures that the missed-detection exponent of each special message equals $E(r)$.

Now consider the situation of a uniformly chosen ordinary codeword being transmitted. We have to make sure that the error probability is vanishingly small now. In this case, the output sequence distribution is i.i.d. $P_Y^*$ for the random coding ensemble. The first-stage decoding error happens when $y^n\in\bigcup B_i$.
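The sphere-packing exponent $E_{sp}(r;P_X)$ is a finite-dimensional constrained minimization, so it can be approximated by a grid search over conditional types. A sketch for a binary-input binary-output channel (the grid resolution and the channel below are illustrative choices, not from the paper):

```python
import math

def kl(v, w):
    # D(v || w) for two distributions on the same finite alphabet, in nats
    return sum(vi * math.log(vi / wi) for vi, wi in zip(v, w) if vi > 0)

def sphere_packing(r, p, W, steps=200):
    # E_sp(r; P_X) = min over V_{Y|X} with I(P_X, V) <= r of D(V || W | P_X),
    # approximated by a grid over binary-output conditional distributions V.
    best = float('inf')
    for i in range(steps + 1):
        for j in range(steps + 1):
            V = [[i / steps, 1 - i / steps], [j / steps, 1 - j / steps]]
            py = [sum(p[x] * V[x][y] for x in range(2)) for y in range(2)]
            I = sum(p[x] * V[x][y] * math.log(V[x][y] / py[y])
                    for x in range(2) for y in range(2) if V[x][y] > 0)
            if I <= r:  # feasible conditional type: rate constraint satisfied
                D = sum(p[x] * kl(V[x], W[x]) for x in range(2))
                best = min(best, D)
    return best
```

For rates at or above $I(P_X, W)$ the channel itself is feasible and the exponent is zero; below that, the grid value is strictly positive and decreases with $r$, as the sphere-packing bound predicts.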
Again by Stein's lemma, this exponent for any particular $B_i$ equals $E_o$:
$$E_o = \min_{V_{Y|X}\in\mathcal{W}(r+\epsilon,P_X)} D\big(V_{Y|X}(\cdot|X)\,\big\|\,P_Y^*(\cdot)\,\big|\,P_X\big) \overset{(a)}{=} \min_{V_{Y|X}\in\mathcal{W}(r+\epsilon,P_X)} I(P_X,V_{Y|X}) + D\big((PV)_Y(\cdot)\,\big\|\,P_Y^*(\cdot)\big)$$
$$\overset{(b)}{\ge} \min_{V_{Y|X}\in\mathcal{W}(r+\epsilon,P_X)} I(P_X,V_{Y|X}) \overset{(c)}{\ge} r+\epsilon$$
where $(PV)_Y$ in $(a)$ is given by $(PV)_Y(j) = \sum_i P_X(i)V_{Y|X}(j|i)$, $(b)$ follows from the non-negativity of the KL divergence, and $(c)$ follows from the definition of the sphere-packing exponent and $\mathcal{W}(r+\epsilon,P_X)$.

Applying the union bound over the special messages, the probability of first-stage decoding error after sending an ordinary message is at most $\doteq\exp(nr-nE_o)$. We have already shown that $E_o\ge r+\epsilon$, which ensures that the probability of first-stage decoding error for ordinary messages is at most $\doteq e^{-n\epsilon}$ for the random coding ensemble. Recall that for the random coding ensemble, the average error probability of second-stage decoding also vanishes below capacity. To summarize, we have shown these two properties of the random coding ensemble:
1) The error probability of first-stage decoding vanishes as $a^{(n)}\doteq\exp(-n\epsilon)$ with $n$ when a uniformly chosen ordinary message is transmitted.
2) The error probability of second-stage decoding (say $b^{(n)}$) vanishes with $n$ when a uniformly chosen ordinary message is transmitted.
Since the first error probability is at most $4a^{(n)}$ for some $3/4$ fraction of the codes in the random ensemble, and the second error probability is at most $4b^{(n)}$ for some $3/4$ fraction, there exists a particular code which satisfies both these properties. The overall error probability for ordinary messages is at most $4(a^{(n)}+b^{(n)})$, which vanishes with $n$. We will use this particular code for the ordinary codewords.
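The de-randomization step above is just Markov's inequality applied twice: at most a $1/4$ fraction of the codes can exceed four times either ensemble average, so at least half the codes satisfy both bounds simultaneously. A toy numerical illustration (the per-code "error probabilities" below are synthetic draws, not from any real ensemble):

```python
import random

random.seed(0)
# Each "code" in the ensemble carries two error probabilities (a_i, b_i) whose
# ensemble averages are a and b.  By Markov's inequality, at most 1/4 of the
# codes have a_i > 4a and at most 1/4 have b_i > 4b, so at least half of the
# codes satisfy both a_i <= 4a and b_i <= 4b at once.
codes = [(random.random() ** 3, random.random() ** 2) for _ in range(10000)]
a = sum(c[0] for c in codes) / len(codes)
b = sum(c[1] for c in codes) / len(codes)
good = [c for c in codes if c[0] <= 4 * a and c[1] <= 4 * b]
```

The guarantee `len(good) >= len(codes) / 2` holds for any joint distribution of the two quantities, which is exactly why a single code with both properties must exist.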
This de-randomization completes our construction of a reliable code for the ordinary messages, to be combined with the code $\mathcal{C}'_{special}$ for the special messages. •

2) Converse: $E_{md}\le E(r)$: The converse argument for this result is obvious. Removing the ordinary messages from the code can only improve the error probability of the special messages. Even then, by definition, the best missed-detection exponent for the special messages equals $E(r)$.

D. Proof of Theorem 4

Let us now address the case with erasures. In this achievability result, the first stage of decoding remains unchanged from the no-erasure case. Proof: We use essentially the same strategy as before. Let us start with a good code for $\lceil e^{nr}\rceil$ messages allowing erasure decoding. Forney showed in [18] that, for symmetric channels, an error exponent equal to $E_{sp}(r)+C-r$ is achievable while ensuring that the erasure probability vanishes with $n$. We can use that code for these $\lceil e^{nr}\rceil$ codewords. As before, for $y^n\in\bigcup_i B_i$, the first stage decides a special codeword was sent. Then the second stage applies the erasure decoding method of [18] amongst the special codewords. With this decoding rule, when a special message is transmitted, the error probability of the two-stage decoding is bottlenecked by the first stage: its error exponent $E_{sp}(r+\epsilon)$ is smaller than that of the second stage ($E_{sp}(r)+C-r$). By choosing arbitrarily small $\epsilon$, the special messages can achieve $E_{sp}(r)$ as their missed-detection exponent.

The ordinary codewords are again generated i.i.d. $P_X^*$. If the first stage decides in favor of the ordinary messages, ML decoding is implemented among the ordinary codewords. If an ordinary message was transmitted, we can ensure a vanishing error probability as before, by repeating the earlier arguments for the no-erasure case. •

VIII.
VARIABLE-LENGTH BLOCK CODES WITH FEEDBACK: PROOFS

In this section we present a more detailed discussion of bit-wise and message-wise UEP for variable-length block codes with feedback, by proving Theorems 5, 6, 7, 8, and 9. In the proofs of the converse results we need to discuss issues related to the conditional entropy of the messages given the observation of the receiver. In those discussions we use the following notation for conditional entropy and conditional mutual information:
$$H(M|Y^n) = -\sum_{i\in\mathcal{M}}\Pr[M=i|Y^n]\ln\Pr[M=i|Y^n]$$
$$I(M;Y_{n+1}|Y^n) = H(M|Y^n) - \mathbb{E}\big[H(M|Y^{n+1})\,\big|\,Y^n\big].$$
It is worth noting that this notation differs from the widely used one, which includes a further expectation over the conditioned variable: "$H(M|Y^n)$" in the conventional notation stands for our $\mathbb{E}[H(M|Y^n)]$, and "$H(M|Y^n=y^n)$" stands for our $H(M|Y^n)$.

A. Proof of Theorem 5

1) Achievability: $E_b^f\ge\tilde{C}$: This single-special-bit exponent is achieved using the missed-detection exponent of a single special message indicating a decoding error for the special bit. The decoding error for the bit goes unnoticed when this special message is not detected. This shows how feedback connects bit-wise UEP to message-wise UEP in a fundamental manner.

Proof: We will prove that $E_b^f\ge\tilde{C}$ by constructing a capacity-achieving sequence with feedback, $Q$, such that $E_{b,Q}^f=\tilde{C}$. For that, let $Q'$ be a capacity-achieving sequence such that $E_{md,Q'}=\tilde{C}$; the existence of such a $Q'$ is guaranteed as a result of Theorem 2. We first construct a two-phase fixed-length block code with feedback and erasures; then, using this, we obtain the $k$th element of $Q$. In the first phase, one of two input symbols $x_0$ and $x_1$ with distinct output distributions12 is sent for $\lceil\sqrt{k}\rceil$ time units, depending on $M_1$.
At time $\lceil\sqrt{k}\rceil$ the receiver makes a tentative decision $\tilde{M}_1$ on the message $M_1$. Using the Chernoff bound it can easily be shown that [36, Theorem 5]
$$\Pr\big[\tilde{M}_1\neq M_1\big] \le e^{-\mu\sqrt{k}} \qquad \text{where } \mu>0.$$
The actual value of $\mu$, however, is immaterial to us; we are merely interested in finding an upper bound on $\Pr[\tilde{M}_1\neq M_1]$ which goes to zero as $k$ increases. In the second phase the transmitter uses the $k$th member of $Q'$. The message in the second phase, $M'$, is determined by $M_2$ and by whether $M_1$ is decoded correctly or not at the end of the first phase:
$$\tilde{M}_1\neq M_1 \;\Rightarrow\; M'=1, \qquad \tilde{M}_1=M_1 \text{ and } M_2=i \;\Rightarrow\; M'=i+1 \quad \forall i.$$
At the end of the second phase the decoder decodes $M'$ using the decoder of $Q'$. If the decoded message is one, i.e., $\hat{M}'=1$, then the receiver declares an erasure; else $\hat{M}_1=\tilde{M}_1$ and $\hat{M}_2=\hat{M}'-1$. Note that the erasure probability of the two-phase fixed-length block code is upper bounded as
$$\Pr\big[\hat{M}'=1\big] \le \Pr\big[\tilde{M}_1\neq M_1\big] + \Pr\big[\hat{M}'=1\,\big|\,M'\neq 1\big] \le e^{-\mu\sqrt{k}} + \frac{|\mathcal{M}'^{(k)}|}{|\mathcal{M}'^{(k)}|-1}P_e'^{(k)} \quad (17)$$
where $P_e'^{(k)}$ is the error probability of the $k$th member of $Q'$. Similarly, we can upper bound the probabilities of the two error events associated with the two-phase fixed-length block code as follows:
$$\Pr\big[\hat{M}_1\neq M_1,\,\hat{M}'\neq 1\big] \le P_e'^{(k)}(1) \quad (18)$$
$$\Pr\big[\hat{M}\neq M,\,\hat{M}'\neq 1\big] \le \frac{|\mathcal{M}'^{(k)}|}{|\mathcal{M}'^{(k)}|-1}P_e'^{(k)} + P_e'^{(k)}(1) \quad (19)$$
where $P_e'^{(k)}(1)$ is the conditional error probability of the first message in the $k$th element of $Q'$. If there is an erasure, the transmitter and the receiver repeat what they have done, until they get $\hat{M}'\neq 1$.

[Footnote 12: The two input symbols $x_0$ and $x_1$ are such that $W(\cdot|x_1)\neq W(\cdot|x_0)$.]
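The first-phase bound only needs the tentative-decision error to decay with the phase length. For a binary symmetric channel the Chernoff-type decay can be seen with a majority-vote simulation (the channel, phase lengths, and decision rule below are illustrative, not from the paper):

```python
import random

def repetition_error(m, p, trials=100000, seed=2):
    # First phase sketch: send x_{M_1} for m ~ ceil(sqrt(k)) uses of a BSC(p)
    # and decide by majority vote; the error probability decays exponentially
    # in m (Chernoff bound), which is all the proof requires.
    random.seed(seed)
    errs = 0
    for _ in range(trials):
        flips = sum(random.random() < p for _ in range(m))
        errs += flips > m / 2   # majority of received symbols disagree
    return errs / trials
```

Doubling or tripling the phase length drives the estimated error down sharply, consistent with $\Pr[\tilde{M}_1\neq M_1]\le e^{-\mu\sqrt{k}}$ for some $\mu>0$ whose exact value is indeed immaterial.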
If we sum the probabilities of all the error events, including the error events in the possible repetitions, we get
$$\Pr\big[\hat{M}_1\neq M_1\big] = \frac{\Pr\big[\hat{M}_1\neq M_1,\,\hat{M}'\neq 1\big]}{1-\Pr\big[\hat{M}'=1\big]} \quad (20)$$
$$\Pr\big[\hat{M}\neq M\big] = \frac{\Pr\big[\hat{M}\neq M,\,\hat{M}'\neq 1\big]}{1-\Pr\big[\hat{M}'=1\big]} \quad (21)$$
Note that the expected decoding time of the code is
$$\mathbb{E}[\tau] = \frac{k+\lceil\sqrt{k}\rceil}{1-\Pr\big[\hat{M}'=1\big]}. \quad (22)$$
Using equations (17), (18), (19), (20), (21), and (22), one can conclude that the resulting sequence of variable-length block codes with feedback, $Q$, is reliable. Furthermore, $R_Q=C$ and $E_{b,Q}^f=\tilde{C}$. •

2) Converse: $E_b^f\le\tilde{C}$: We will use a converse result we have not proved yet, namely the converse part of Theorem 8, i.e., $E_{md}^f\le\tilde{C}$.

Proof: Consider a capacity-achieving sequence $Q$ with message set sequence $\mathcal{M}^{(k)}=\{0,1\}\times\mathcal{M}_2^{(k)}$. Using $Q$ we construct another capacity-achieving sequence $Q'$ with a special message $0$, with message set sequence $\mathcal{M}'^{(k)}=\{0\}\cup\mathcal{M}_2^{(k)}$, such that $E_{md,Q'}^f=E_{b,Q}^f$. This implies $E_b^f\le E_{md}^f$, which together with Theorem 8, $E_{md}^f\le\tilde{C}$, gives us $E_b^f\le\tilde{C}$.

Let us denote the message of $Q$ by $M$ and that of $Q'$ by $M'$. The $k$th code of $Q'$ is as follows. At time $0$ the receiver randomly chooses an $M_1$ for the $k$th element of $Q$ and sends its choice through the feedback channel to the transmitter. If the message of $Q'$ is not $0$, i.e., $M'\neq 0$, then the transmitter uses the codeword for $M=(M_1,M')$ to convey $M'$. If $M'=0$, the transmitter picks an $M_2$ with uniform distribution on $\mathcal{M}_2$ and uses the codeword for $M=(1-M_1,M_2)$ to convey that $M'=0$. The receiver decodes using the decoder of $Q$: if $\hat{M}=(M_1,i)$ then $\hat{M}'=i$; if $\hat{M}=(1-M_1,i)$ then $\hat{M}'=0$. One can easily show that the expected decoding time and the error probability of both codes are the same.
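Equation (22) is the usual geometric-repetition identity: if each attempt costs $k+\lceil\sqrt{k}\rceil$ channel uses and is erased independently with some probability, the expected total time is the per-attempt cost divided by the success probability. A quick simulation check (the erasure probability below is a made-up constant, not derived from (17)):

```python
import math, random

def expected_decoding_time(k, p_erasure, trials=200000, seed=1):
    # Each attempt takes k + ceil(sqrt(k)) channel uses and is erased with
    # probability p_erasure, independently; on an erasure both parties start
    # over.  Prediction in the style of (22): (k + ceil(sqrt(k))) / (1 - p_erasure).
    random.seed(seed)
    block = k + math.ceil(math.sqrt(k))
    total = 0
    for _ in range(trials):
        t = block
        while random.random() < p_erasure:   # repeat until no erasure
            t += block
        total += t
    return total / trials

est = expected_decoding_time(100, 0.2)
pred = (100 + 10) / (1 - 0.2)   # geometric-repetition prediction, 137.5
```

Because the erasure probability in the actual construction vanishes with $k$, the denominator in (22) tends to one and the scheme's rate and exponents are unaffected by the repetitions.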
Furthermore, the error probability of $M_1$ in $Q$ is equal to the conditional error probability of the message $M'=0$ in $Q'$; thus $E_{md,Q'}^f=E_{b,Q}^f$. •

B. Proof of Theorem 6

1) Achievability: $E_{bits}^f(r)\ge\big(1-\frac{r}{C}\big)\tilde{C}$: Proof: We construct the capacity-achieving sequence with feedback $Q$ using a capacity-achieving sequence $Q'$ satisfying $E_{md,Q'}=\tilde{C}$, as we did in the proof of Theorem 5. We know that such a sequence exists because of Theorem 8. For the $k$th member of $Q$, consider the following two-phase errors-and-erasures code. In the first phase the transmitter uses the $\lfloor rk\rfloor$th element of $Q'$ to convey $M_1$; the receiver makes a tentative decision $\tilde{M}_1$. In the second phase the transmitter uses the $\lfloor(C-r)k\rfloor$th element of $Q'$ to convey $M_2$ and whether $\tilde{M}_1=M_1$ or not, with a mapping similar to the one we had in the proof of Theorem 5:
$$\tilde{M}_1\neq M_1 \;\Rightarrow\; M'=1, \qquad \tilde{M}_1=M_1 \text{ and } M_2=i \;\Rightarrow\; M'=i+1 \quad \forall i.$$
Thus $\mathcal{M}_1^{(k)} = \mathcal{M}'^{(\lfloor rk\rfloor)}$ and $\mathcal{M}_2^{(k)}\cup\{|\mathcal{M}_2^{(k)}|+1\} = \mathcal{M}'^{(\lfloor(C-r)k\rfloor)}$. If we apply a decoding algorithm like the one we had in the proof of Theorem 5, going through essentially the same analysis as in the proof of Theorem 5, we can conclude that $Q$ is a capacity-achieving sequence with $E_{bits,Q}^f=\big(1-\frac{r}{C}\big)\tilde{C}$ and $r_Q=r$. •

2) Converse: $E_{bits}^f(r)\le\big(1-\frac{r}{C}\big)\tilde{C}$: In establishing the converse we use a technique that was used previously in [4], together with Lemma 1, which we will prove in the converse part of Theorem 8.

Proof: Consider any variable-length block code with feedback whose message set $\mathcal{M}$ is of the form $\mathcal{M}=\mathcal{M}_1\times\mathcal{M}_2$. Let $t_\delta$ be the first time instance at which some $i\in\mathcal{M}_1$ becomes more likely than $(1-\delta)$, and let $\tau_\delta=t_\delta\wedge\tau$. Recall that $\min_{i,j}W_{Y|X}(j|i)=\lambda$; consequently, the definition of $\tau_\delta$ implies that $\min_{i\in\mathcal{M}_1}\big(1-\Pr[M_1=i\mid y^{\tau_\delta}]\big)\ge\lambda\delta$.
Thus, using the Markov inequality for $P_e$, we get
$$\Pr[\tau_\delta=\tau] \le \frac{P_e}{\lambda\delta}. \quad (23)$$
We use equation (23) to bound the expected value of the entropy of the first part of the message at time $\tau_\delta$ as follows:
$$\mathbb{E}\big[H(M_1|Y^{\tau_\delta})\big] = \mathbb{E}\big[H(M_1|Y^{\tau_\delta})\mathbb{1}_{\{\tau_\delta=\tau\}}\big] + \mathbb{E}\big[H(M_1|Y^{\tau_\delta})\mathbb{1}_{\{\tau_\delta<\tau\}}\big] \le \frac{P_e}{\lambda\delta}\ln|\mathcal{M}_1| + \big(\ln 2 + \delta\ln|\mathcal{M}_1|\big) = \ln 2 + \left(\frac{P_e}{\lambda\delta}+\delta\right)\ln|\mathcal{M}_1|.$$
It has already been established in [4] that
$$\frac{\mathbb{E}\big[H(M)-H(M|Y^{\tau_\delta})\big]}{\mathbb{E}[\tau_\delta]} \le C. \quad (24)$$
Thus,
$$\mathbb{E}[\tau_\delta] \ge \frac{1}{C}\,\mathbb{E}\big[H(M) - H(M_1|Y^{\tau_\delta}) - H(M_2|M_1,Y^{\tau_\delta})\big] \ge \frac{1}{C}\left(-\ln 2 + \left(1-\frac{P_e}{\lambda\delta}-\delta\right)\ln|\mathcal{M}_1|\right). \quad (25)$$
The bound given in inequality (25) specifies the time needed for obtaining a likely candidate $\tilde{M}_1$. As was the case in [4], the remaining time is the time spent for confirmation. But unlike [4], the transmitter needs to convey also $M_2$ during this time. For each realization of $Y^{\tau_\delta}$, divide the message set into disjoint subsets $\Theta_0,\Theta_1,\ldots,\Theta_{|\mathcal{M}_2|}$ as follows:
$$\Theta_0 = \big\{l : l\in\mathcal{M},\; l=(i,j) \text{ where } i\neq\tilde{M}_1(Y^{\tau_\delta})\big\}$$
$$\Theta_j = \big\{l : l\in\mathcal{M},\; l=(\tilde{M}_1(Y^{\tau_\delta}),j)\big\} \qquad \forall j\in\{1,2,\ldots,|\mathcal{M}_2|\}$$
where $\tilde{M}_1(Y^{\tau_\delta})$ is the most likely $M_1$ given $Y^{\tau_\delta}$. Furthermore, let the auxiliary message $M'$ be the index of the set that $M$ belongs to, i.e., $M\in\Theta_{M'}$. The decoder for the auxiliary message decodes the index of the decoded message at the decoding time $\tau$, i.e., $\hat{M}'(Y^\tau)=j \Leftrightarrow \hat{M}(Y^\tau)\in\Theta_j$. With these definitions we have
$$\Pr\big[\hat{M}(Y^\tau)\neq M\,\big|\,Y^{\tau_\delta}\big] \ge \Pr\big[\hat{M}'(Y^\tau)\neq M'\,\big|\,Y^{\tau_\delta}\big]$$
$$\Pr\big[\hat{M}_1(Y^\tau)\neq M_1\,\big|\,Y^{\tau_\delta}\big] \ge \Pr\big[\hat{M}'(Y^\tau)\neq 0\,\big|\,Y^{\tau_\delta},M'=0\big]\,\Pr\big[M'=0\,\big|\,Y^{\tau_\delta}\big].$$
Now we apply Lemma 1, which will be proved in Section VIII-D.2. To ease the notation we use the following shorthand:
$$P_e^{M'}\{Y^{\tau_\delta}\} = \Pr\big[\hat{M}'(Y^\tau)\neq M'\,\big|\,Y^{\tau_\delta}\big], \quad P_e^{M'}\{0,Y^{\tau_\delta}\} = \Pr\big[\hat{M}'(Y^\tau)\neq 0\,\big|\,Y^{\tau_\delta},M'=0\big], \quad \xi(Y^{\tau_\delta}) = \Pr\big[M'=0\,\big|\,Y^{\tau_\delta}\big].$$
As a result of Lemma 1, for each realization $y^{\tau_\delta} \in \mathcal{Y}^{\tau_\delta}$ such that $\tau_\delta < \tau$, we have

$$\big(1 - \xi(Y^{\tau_\delta}) - P_e^{M'}\{Y^{\tau_\delta}\}\big) \ln \frac{1}{P_e^{M'}\{0, Y^{\tau_\delta}\}} \le \ln 2 + E[\tau - \tau_\delta \mid Y^{\tau_\delta}]\, J\!\left(\frac{H(M'|Y^{\tau_\delta}) - \ln 2 - P_e^{M'}\{Y^{\tau_\delta}\}\ln|\mathcal{M}_2|}{E[\tau - \tau_\delta \mid Y^{\tau_\delta}]}\right)$$

Multiplying both sides of the inequality by $I_{\{\tau_\delta < \tau\}}$, we get an expression that holds for all $Y^{\tau_\delta}$:

$$I_{\{\tau_\delta < \tau\}}\big(1 - \xi(Y^{\tau_\delta}) - P_e^{M'}\{Y^{\tau_\delta}\}\big) \ln \frac{1}{P_e^{M'}\{0, Y^{\tau_\delta}\}} \le I_{\{\tau_\delta < \tau\}}\left[\ln 2 + E[\tau - \tau_\delta \mid Y^{\tau_\delta}]\, J\!\left(\frac{H(M'|Y^{\tau_\delta}) - \ln 2 - P_e^{M'}\{Y^{\tau_\delta}\}\ln|\mathcal{M}_2|}{E[\tau - \tau_\delta \mid Y^{\tau_\delta}]}\right)\right] \qquad (26)$$

Now we take the expectation of both sides over $Y^{\tau_\delta}$. For the right-hand side we have

$$\text{R.H.S.} = E\left[\left(\ln 2 + E[\tau - \tau_\delta \mid Y^{\tau_\delta}]\, J\!\left(\frac{H(M'|Y^{\tau_\delta}) - \ln 2 - P_e^{M'}\{Y^{\tau_\delta}\}\ln|\mathcal{M}_2|}{E[\tau - \tau_\delta \mid Y^{\tau_\delta}]}\right)\right) I_{\{\tau_\delta < \tau\}}\right]$$
$$\le \ln 2 + E\left[E[\tau - \tau_\delta \mid Y^{\tau_\delta}]\, J\!\left(\frac{H(M'|Y^{\tau_\delta}) - \ln 2 - P_e^{M'}\{Y^{\tau_\delta}\}\ln|\mathcal{M}_2|}{E[\tau - \tau_\delta \mid Y^{\tau_\delta}]}\right) I_{\{\tau_\delta < \tau\}}\right]$$
$$\stackrel{(a)}{\le} \ln 2 + E[\tau - \tau_\delta]\, J\!\left(\frac{E\big[I_{\{\tau_\delta<\tau\}}\big(H(M'|Y^{\tau_\delta}) - \ln 2 - P_e^{M'}\{Y^{\tau_\delta}\}\ln|\mathcal{M}_2|\big)\big]}{E[\tau - \tau_\delta]}\right)$$
$$\stackrel{(b)}{\le} \ln 2 + E[\tau - \tau_\delta]\, J\!\left(\frac{E\big[I_{\{\tau_\delta<\tau\}} H(M'|Y^{\tau_\delta})\big] - \ln 2 - P_e \ln|\mathcal{M}_2|}{E[\tau - \tau_\delta]}\right) \qquad (27)$$

where $(a)$ follows from the concavity of $J(\cdot)$ and Jensen's inequality when we interpret $\frac{E[\tau-\tau_\delta|Y^{\tau_\delta}]\, I_{\{\tau_\delta<\tau\}}}{E[\tau-\tau_\delta]}$ as a probability distribution over $Y^{\tau_\delta}$, and $(b)$ follows from the fact that $J(\cdot)$ is a decreasing function. Now we lower bound $E[I_{\{\tau_\delta<\tau\}} H(M'|Y^{\tau_\delta})]$ in terms of $E[H(M|Y^{\tau_\delta})]$. Note that

$$H(M|Y^{\tau_\delta}) = H(M'|Y^{\tau_\delta}) + \Pr\big[M_1 \ne \tilde{M}_1(Y^{\tau_\delta}) \mid Y^{\tau_\delta}\big]\, H\big(M \mid M_1 \ne \tilde{M}_1(Y^{\tau_\delta}), Y^{\tau_\delta}\big) \le H(M'|Y^{\tau_\delta}) + \Pr\big[M_1 \ne \tilde{M}_1(Y^{\tau_\delta}) \mid Y^{\tau_\delta}\big] \ln|\mathcal{M}_1||\mathcal{M}_2|$$

Furthermore, for all $Y^{\tau_\delta}$ such that $\tau > \tau_\delta$, $\Pr\big[\tilde{M}_1(Y^{\tau_\delta}) \ne M_1 \mid Y^{\tau_\delta}\big] \le \delta$.
Thus

$$E\big[I_{\{\tau_\delta<\tau\}} H(M'|Y^{\tau_\delta})\big] \ge E\big[I_{\{\tau_\delta<\tau\}}\big(H(M|Y^{\tau_\delta}) - \delta\ln|\mathcal{M}_1||\mathcal{M}_2|\big)\big]$$
$$= E\big[(1 - I_{\{\tau_\delta=\tau\}})\, H(M|Y^{\tau_\delta})\big] - \delta\ln|\mathcal{M}_1||\mathcal{M}_2|$$
$$\ge E[H(M|Y^{\tau_\delta})] - \Pr[\tau_\delta = \tau]\ln|\mathcal{M}_1||\mathcal{M}_2| - \delta\ln|\mathcal{M}_1||\mathcal{M}_2|$$
$$\stackrel{(a)}{\ge} E[H(M|Y^{\tau_\delta})] - \left(\frac{P_e}{\lambda\delta} + \delta\right)\ln|\mathcal{M}_1||\mathcal{M}_2| \stackrel{(b)}{\ge} \left(1 - \frac{P_e}{\lambda\delta} - \delta\right)\ln|\mathcal{M}_1||\mathcal{M}_2| - C\,E[\tau_\delta] \qquad (28)$$

where $(a)$ follows from inequality (23) and $(b)$ follows from inequality (24). Since $J(\cdot)$ is decreasing in its argument, inserting (28) into (27) we get

$$\text{R.H.S.} \le \ln 2 + E[\tau - \tau_\delta]\, J\!\left(\frac{\left(1 - \frac{P_e}{\lambda\delta} - \delta - P_e\right)\ln|\mathcal{M}_1||\mathcal{M}_2| - E[\tau_\delta]\,C - \ln 2}{E[\tau - \tau_\delta]}\right) \qquad (29)$$

Note that for all $a > 0$, $b > 0$, $C > 0$,

$$\frac{d}{dx}\left[(b-x)\, J\!\left(\frac{a - Cx}{b - x}\right)\right]\Bigg|_{x=x_0} = -J\!\left(\frac{a - Cx_0}{b - x_0}\right) - \left(C - \frac{a - Cx_0}{b - x_0}\right) \frac{dJ(x)}{dx}\Bigg|_{x = \frac{a - Cx_0}{b - x_0}} \stackrel{(a)}{\le} -J(C)$$

where $(a)$ follows from the concavity of $J(\cdot)$. Thus the upper bound given in (29) is decreasing in $E[\tau_\delta]$. Using the lower bound on $E[\tau_\delta]$ given in (25), we get

$$\text{R.H.S.} \le \ln 2 + \left(E[\tau] - \frac{\left(1 - \delta - \frac{P_e}{\lambda\delta}\right)\ln|\mathcal{M}_1|}{C} + \frac{\ln 2}{C}\right) J\!\left(\frac{\left(1 - \frac{P_e}{\lambda\delta} - \delta - P_e\right)\ln|\mathcal{M}_2| - P_e\ln|\mathcal{M}_1| - 2\ln 2}{E[\tau] - \frac{\left(1 - \delta - \frac{P_e}{\lambda\delta}\right)\ln|\mathcal{M}_1|}{C} + \frac{\ln 2}{C}}\right) \qquad (30)$$

Now let us consider the L.H.S. obtained by taking the expectation of the inequality given in (26).
$$\text{L.H.S.} = E\left[I_{\{\tau_\delta<\tau\}}\big(1 - \xi(Y^{\tau_\delta}) - P_e^{M'}\{Y^{\tau_\delta}\}\big) \ln\frac{1}{P_e^{M'}\{0, Y^{\tau_\delta}\}}\right]$$
$$\stackrel{(a)}{\ge} E\big[I_{\{\tau_\delta<\tau\}}\big(1 - \xi(Y^{\tau_\delta}) - P_e^{M'}\{Y^{\tau_\delta}\}\big)\big] \ln\frac{E\big[I_{\{\tau_\delta<\tau\}}\big(1 - \xi(Y^{\tau_\delta}) - P_e^{M'}\{Y^{\tau_\delta}\}\big)\big]}{E\big[I_{\{\tau_\delta<\tau\}}\big(1 - \xi(Y^{\tau_\delta}) - P_e^{M'}\{Y^{\tau_\delta}\}\big)\, P_e^{M'}\{0, Y^{\tau_\delta}\}\big]}$$
$$\stackrel{(b)}{\ge} -e^{-1} + E\big[I_{\{\tau_\delta<\tau\}}\big(1 - \xi(Y^{\tau_\delta}) - P_e^{M'}\{Y^{\tau_\delta}\}\big)\big] \ln\frac{1}{E\big[I_{\{\tau_\delta<\tau\}}\big(1 - \xi(Y^{\tau_\delta}) - P_e^{M'}\{Y^{\tau_\delta}\}\big)\, P_e^{M'}\{0, Y^{\tau_\delta}\}\big]}$$
$$\ge -e^{-1} + E\big[I_{\{\tau_\delta<\tau\}}\big(1 - \xi(Y^{\tau_\delta}) - P_e^{M'}\{Y^{\tau_\delta}\}\big)\big] \ln\frac{1}{E\big[I_{\{\tau_\delta<\tau\}}\, P_e^{M'}\{0, Y^{\tau_\delta}\}\big]} \qquad (31)$$

where $(a)$ follows from the log-sum inequality and $(b)$ follows from the fact that $x\ln x \ge -e^{-1}$. Note that

$$E\big[I_{\{\tau_\delta<\tau\}}\big(1 - \xi(Y^{\tau_\delta}) - P_e^{M'}\{Y^{\tau_\delta}\}\big)\big] \ge E\big[I_{\{\tau_\delta<\tau\}}\big(1 - \xi(Y^{\tau_\delta})\big)\big] - E\big[P_e^{M'}\{Y^{\tau_\delta}\}\big] \ge E\big[I_{\{\tau_\delta<\tau\}}\big](1-\delta) - P_e \ge 1 - \frac{P_e}{\lambda\delta} - \delta - P_e \qquad (32)$$

where in the last step we used equation (23). Furthermore,

$$E\big[I_{\{\tau_\delta<\tau\}}\, P_e^{M'}\{0, Y^{\tau_\delta}\}\big] = E\big[I_{\{\tau_\delta<\tau\}} \Pr\big[\hat{M}_1 = \tilde{M}_1 \mid Y^{\tau_\delta}, \tilde{M}_1 \ne M_1\big]\big]$$
$$\le \frac{1}{\delta\lambda} E\big[I_{\{\tau_\delta<\tau\}} \Pr\big[\hat{M}_1 = \tilde{M}_1 \mid Y^{\tau_\delta}, \tilde{M}_1 \ne M_1\big] \Pr\big[\tilde{M}_1 \ne M_1 \mid Y^{\tau_\delta}\big]\big] \le \frac{P_{e\,M_1}}{\delta\lambda} \qquad (33)$$

Thus using equations (31), (32) and (33) we get

$$\text{L.H.S.} \ge -e^{-1} - \left(1 - \frac{P_e}{\lambda\delta} - \delta - P_e\right)\ln\frac{P_{e\,M_1}}{\lambda\delta} \qquad (34)$$

Using inequalities (30) and (34) and choosing $\delta = \sqrt{P_e}$, we get $E_{f\,bits,\mathcal{Q}} \le \left(1 - \frac{r_\mathcal{Q}}{C}\right) J(C)$. Since $J(C) = \tilde{C}$, this implies $E_f^{bits}(r) \le \left(1 - \frac{r}{C}\right)\tilde{C}$. •

C. Proof of Theorem 7

1) Achievability:

Proof: The proof is very similar to the achievability proof of Theorem 6. Choose a capacity-achieving sequence $\mathcal{Q}'$ such that $E_{f\,b,\mathcal{Q}'} = \tilde{C}$. The capacity-achieving sequence with feedback, $\mathcal{Q}$, uses $L$ elements of $\mathcal{Q}'$ as follows. For the $k$th element of $\mathcal{Q}$, the transmitter uses the $\lfloor k r_1 \rfloor$th element of $\mathcal{Q}'$ to send the first part of the message, $M_1$. In the remaining phases $l \ge 2$, the transmitter uses the $\lfloor k r_l \rfloor$th element of $\mathcal{Q}'$.
The special message of the code for phase $l$ is allocated to the error events of the previous phases:

$$(\tilde{M}_1, \ldots, \tilde{M}_{l-1}) \ne (M_1, \ldots, M_{l-1}) \Rightarrow M'_l = 1 \quad \forall l$$
$$(\tilde{M}_1, \ldots, \tilde{M}_{l-1}) = (M_1, \ldots, M_{l-1}) \Rightarrow M'_l = M_l + 1 \quad \forall l$$

Thus $\mathcal{M}_1^{(k)} = \mathcal{M}'(\lfloor r_1 k \rfloor)$ and, for all $l \ge 2$, $\mathcal{M}_l^{(k)} \cup \{|\mathcal{M}_l^{(k)}|+1\} = \mathcal{M}'(\lfloor r_l k \rfloor)$. If $\hat{M}'_l \ne 1$ for all $l \in \{2, 3, \ldots, L\}$, the receiver decodes all parts of the information; otherwise it declares an erasure. We skip the error analysis because it is essentially the same as that of Theorem 6. •

2) Converse:

Proof: We prove the converse of Theorem 7 by contradiction. Evidently,

$$\max\{P_{e\,M_1}, P_{e\,M_2}, \ldots, P_{e\,M_j}\} \le P_{e\,M_1,M_2,\ldots,M_j} \le P_{e\,M_1} + P_{e\,M_2} + \cdots + P_{e\,M_j} \quad \forall j \in \{1, 2, \ldots, L\}$$

Thus if there were a scheme reaching an error-exponent vector outside the region given in Theorem 7, there would be at least one $E_i$ exceeding $\left(1 - \frac{\sum_{j=1}^{i} r_j}{C}\right)\tilde{C}$. Then we can form two super-messages,

$$M'_1 = (M_1, M_2, \ldots, M_i) \quad \text{and} \quad M'_2 = (M_{i+1}, M_{i+2}, \ldots, M_L)$$

Recall that $P_{e\,M_1} \le P_{e\,M_2} \le \cdots \le P_{e\,M_L}$. Thus this new code is a capacity-achieving code whose special bits have rate $r_{\mathcal{Q}'}$ and $E_{f\,bits,\mathcal{Q}'} > E_f^{bits}(r_{\mathcal{Q}'})$. This contradicts Theorem 6, which we have already proved. Thus all achievable error-exponent regions lie in the region given in Theorem 7. •

D. Proof of Theorem 8

1) Achievability: $E_f^{md} \ge \tilde{C}$: Note that any fixed-length block code without feedback is also a variable-length block code with feedback; thus $E_f^{md} \ge E^{md}$. Using the capacity-achieving sequence employed in the achievability proof of Theorem 2, we get $E_f^{md} \ge \tilde{C}$.

2) Converse: $E_f^{md} \le \tilde{C}$: Now we prove that even with feedback and variable decoding time, the best missed-detection exponent of a single special message is less than or equal to $\tilde{C}$, i.e.
$E_f^{md} \le \tilde{C}$. Since the set of capacity-achieving sequences is a subset of the capacity-achieving sequences with feedback and variable decoding time, this also implies $E^{md} \le \tilde{C}$. Instead of directly proving the converse part of Theorem 8, we first prove the following lemma.

Lemma 1: For any variable-length block code with feedback, message set $\mathcal{M}$, initial entropy $H(M)$ and average error probability $P_e$, the conditional error probability of each message is lower bounded as follows:

$$\Pr\big[\hat{M} \ne i \mid M = i\big] \ge e^{-\frac{1}{1 - \Pr[M=i] - P_e}\left(J\left(\frac{H(M) - h(P_e) - P_e\ln(|\mathcal{M}|-1)}{E[\tau]}\right) E[\tau] + \ln 2\right)} \quad \forall i \qquad (35)$$

where $J(R)$ is given by the following optimization over probability distributions on $\mathcal{X}$:

$$J(R) = \max_{\substack{\alpha, x_1, x_2, P^1_X, P^2_X:\\ \alpha I(P^1_X, W_{Y|X}) + (1-\alpha) I(P^2_X, W_{Y|X}) \ge R}} \alpha D\big((P^1 W)_Y(\cdot) \,\big\|\, W(\cdot|x_1)\big) + (1-\alpha) D\big((P^2 W)_Y(\cdot) \,\big\|\, W(\cdot|x_2)\big) \qquad (36)$$

It is worthwhile recalling the notation introduced previously:

$$(P^i W)_Y(\cdot) = \sum_{j \in \mathcal{X}} P^i_X(j)\, W_{Y|X}(\cdot|j) \quad \text{and} \quad I(P^i_X, W_{Y|X}) = \sum_{j \in \mathcal{X},\, k \in \mathcal{Y}} P^i_X(j)\, W_{Y|X}(k|j) \ln\frac{W_{Y|X}(k|j)}{(P^i W)_Y(k)}$$

The first thing to note about Lemma 1 is that it is not restricted to the case of a uniform probability distribution on the message set $\mathcal{M}$. Furthermore, as long as $\Pr[M=i] \ll 1$, the lower bound on $\Pr[\hat{M} \ne i \mid M=i]$ depends on the a priori distribution of the messages only through its entropy $H(M)$. In equation (36), $\alpha$ is simply a time-sharing variable, which allows us to use one $(x_i, P^i_X)$ pair with low mutual information and high divergence together with another $(x_i, P^i_X)$ pair with high mutual information and low divergence. As a result of Carathéodory's theorem, time sharing between two points of the form $(x_i, P^i_X)$ is sufficient for obtaining the optimal performance, i.e.
allowing time sharing between more than two points of the form $(x_i, P^i_X)$ will not improve the value of $J(R)$. Indeed, for any $R \in [0, C]$ one can use the optimizing values of $\alpha, x_1, x_2, P^1_X$ and $P^2_X$ in a scheme like the one in Theorem 2 with time sharing, and prove that a missed-detection exponent of $J(R)$ is achievable for a reliable sequence of rate $R$. In that scheme, $\alpha$ determines how long the input letter $x_1 \in \mathcal{X}$ is used for the special message while $P^1_X$ is being used for the ordinary codewords. Furthermore, arguments very similar to those of Theorem 8 can be used to prove that no missed-detection exponent higher than $J(R)$ is achievable for reliable sequences of rate $R$. Thus $J(R)$ is the best exponent a message can get in a rate-$R$ reliable sequence. One can show that $J(R)$ is a concave function of $R$ over its support $[0, C]$. Furthermore, $J(0) = D_{max}$ and $J(C) = \tilde{C}$. Thus $J(R)$ is a concave, strictly decreasing function of $R$ for $0 \le R \le C$.

Proof (of Lemma 1): Recall that $G(i)$ is the decoding region for $M = i$, i.e. $G(i) = \{y^\tau : \hat{M}(y^\tau) = i\}$. Then, as a result of the data-processing inequality for KL divergence, we have

$$E\left[\ln\frac{\Pr[Y^\tau]}{\Pr[Y^\tau \mid M=i]}\right] \ge \Pr[G(i)] \ln\frac{\Pr[G(i)]}{\Pr[G(i)|M=i]} + \Pr[\bar{G}(i)] \ln\frac{\Pr[\bar{G}(i)]}{\Pr[\bar{G}(i)|M=i]} \ge -h\big(\Pr[G(i)]\big) + \Pr[\bar{G}(i)] \ln\frac{1}{\Pr[\bar{G}(i)|M=i]} \ge -\ln 2 + \Pr[\bar{G}(i)] \ln\frac{1}{\Pr[\bar{G}(i)|M=i]} \qquad (37)$$

where in the last step we used the fact that $h(\Pr[G(i)]) \le \ln 2$. In addition,

$$\Pr[\bar{G}(i)] \ge \Pr\big[\bar{G}(i) \mid M \ne i\big] \Pr[M \ne i] \ge \sum_{j \ne i} \Pr[G(j)|M=j] \Pr[M=j] \ge 1 - P_e - \Pr[M=i]. \qquad (38)$$

Thus using equations (37) and (38) we get

$$\Pr\big[\bar{G}(i) \mid M=i\big] \ge e^{-\frac{1}{1 - P_e - \Pr[M=i]}\left(\ln 2 + E\left[\ln\frac{\Pr[Y^\tau]}{\Pr[Y^\tau|M=i]}\right]\right)}.$$
$(39)$

Now we lower bound the error probability of the special message by upper bounding $E\big[\ln\frac{\Pr[Y^\tau]}{\Pr[Y^\tau|M=i]}\big]$; in what follows we take $i = 1$ to simplify the notation. Consider the following stochastic sequence:

$$S_n = \ln\frac{\Pr[Y^n]}{\Pr[Y^n \mid M=1]} - \sum_{t=1}^{n} E\left[\ln\frac{\Pr[Y_t|Y^{t-1}]}{\Pr[Y_t|M=1, Y^{t-1}]} \,\Big|\, Y^{t-1}\right]$$

Note that $E[S_{n+1}|Y^n] = S_n$, and since $\min_{i,j} W_{Y|X}(j|i) = \lambda$ we have $E[|S_{n+1} - S_n| \mid Y^n] \le 2\ln\frac{1}{\lambda}$. Thus $S_n$ is a martingale; furthermore, since $E[\tau] < \infty$, we can use [37, Theorem 2, p. 487] to get $E[S_\tau] = S_0 = 0$. Thus

$$E\left[\ln\frac{\Pr[Y^\tau]}{\Pr[Y^\tau|M=1]}\right] = E\left[\sum_{t=1}^{\tau} E\left[\ln\frac{\Pr[Y_t|Y^{t-1}]}{\Pr[Y_t|M=1, Y^{t-1}]} \,\Big|\, Y^{t-1}\right]\right]. \qquad (40)$$

Note that $E\big[\ln\frac{\Pr[Y_t|Y^{t-1}]}{\Pr[Y_t|M=1,Y^{t-1}]} \mid Y^{t-1}\big] = E\big[\ln\frac{\Pr[Y_t|Y^{t-1}]}{W_{Y|X}(Y_t|\bar{x}_t(1))} \mid Y^{t-1}\big]$. As a result of the definition of $J(\cdot)$ given in equation (36), we have

$$E\left[\ln\frac{\Pr[Y_t|Y^{t-1}]}{\Pr[Y_t|M=1, Y^{t-1}]} \,\Big|\, Y^{t-1}\right] \le J\big(I(X_t; Y_t \mid Y^{t-1})\big) \qquad (41)$$

where $I(X_t; Y_t \mid Y^{t-1})$ is given by$^{13}$

$$I(X_t; Y_t \mid Y^{t-1}) = E\left[\ln\frac{\Pr[X_t, Y_t|Y^{t-1}]}{\Pr[X_t|Y^{t-1}]\Pr[Y_t|Y^{t-1}]} \,\Big|\, Y^{t-1}\right]$$

Given $Y^{t-1}$, the random variables $M - X_t - Y_t$ form a Markov chain. Thus

$$I(X_t; Y_t \mid Y^{t-1}) \ge I(M; Y_t \mid Y^{t-1}). \qquad (42)$$

Since $J(\cdot)$ is a decreasing function, equations (40), (41) and (42) lead to

$$E\left[\ln\frac{\Pr[Y^\tau]}{\Pr[Y^\tau|M=1]}\right] \le E\left[\sum_{t=1}^{\tau} J\big(I(M; Y_t \mid Y^{t-1})\big)\right] \qquad (43)$$

Note that

$$E\left[\sum_{t=1}^{\tau} J\big(I(M; Y_t|Y^{t-1})\big)\right] = E\left[\tau \sum_{t=1}^{\tau} \frac{1}{\tau} J\big(I(M; Y_t|Y^{t-1})\big)\right] \stackrel{(a)}{\le} E\left[\tau\, J\!\left(\sum_{t=1}^{\tau} \frac{1}{\tau} I(M; Y_t|Y^{t-1})\right)\right] = E[\tau]\, E\left[\frac{\tau}{E[\tau]} J\!\left(\sum_{t=1}^{\tau} \frac{1}{\tau} I(M; Y_t|Y^{t-1})\right)\right] \stackrel{(b)}{\le} E[\tau]\, J\!\left(E\left[\frac{\tau}{E[\tau]} \sum_{t=1}^{\tau} \frac{1}{\tau} I(M; Y_t|Y^{t-1})\right]\right)$$
$$= E[\tau]\, J\!\left(\frac{E\left[\sum_{t=1}^{\tau} I(M; Y_t|Y^{t-1})\right]}{E[\tau]}\right) \qquad (44)$$

$^{13}$Note that, unlike the conventional definition of conditional mutual information, $I(X_t; Y_t \mid Y^{t-1})$ is not averaged over the conditioned random variable $Y^{t-1}$.

where in both $(a)$ and $(b)$ we use the concavity of the $J(\cdot)$ function together with Jensen's inequality. Thus using equations (39), (43) and (44) we get

$$\Pr\big[\hat{M} \ne i \mid M=i\big] \ge e^{-\frac{1}{1 - P_e - \Pr[M=i]}\left(J\left(\frac{E[\sum_{t=1}^{\tau} I(M;Y_t|Y^{t-1})]}{E[\tau]}\right) E[\tau] + \ln 2\right)}$$

Since $J(R)$ is decreasing in $R$, the only thing left to show is that

$$E\left[\sum_{t=1}^{\tau} I(M; Y_t \mid Y^{t-1})\right] \ge H(M) - h(P_e) - P_e\ln(|\mathcal{M}|-1) \qquad (45)$$

For that, consider the stochastic sequence

$$V_n = H(M|Y^n) + \sum_{t=1}^{n} I(M; Y_t \mid Y^{t-1}).$$

Clearly $E[V_{n+1}|Y^n] = V_n$ and $E[|V_n|] < \infty$; thus $\{V_n\}$ is a martingale. Furthermore $E[|V_{n+1} - V_n| \mid Y^n] \le K$ and $E[\tau] < \infty$; thus using a version of Doob's optional stopping theorem, [37, Theorem 2, p. 487], we get

$$V_0 = E[V_\tau] = E[H(M|Y^\tau)] + E\left[\sum_{t=1}^{\tau} I(M; Y_t \mid Y^{t-1})\right]. \qquad (46)$$

One can write Fano's inequality as follows:

$$H(M|Y^\tau) \le h\big(\Pr[\hat{M}(Y^\tau) \ne M \mid Y^\tau]\big) + \Pr\big[\hat{M}(Y^\tau) \ne M \mid Y^\tau\big]\ln(|\mathcal{M}|-1).$$

Consequently,

$$E[H(M|Y^\tau)] \le E\big[h\big(\Pr[\hat{M}(Y^\tau) \ne M \mid Y^\tau]\big)\big] + E\big[\Pr[\hat{M}(Y^\tau) \ne M \mid Y^\tau]\big]\ln(|\mathcal{M}|-1).$$

Using the concavity of the binary entropy,

$$E[H(M|Y^\tau)] \le h(P_e) + P_e\ln(|\mathcal{M}|-1). \qquad (47)$$

Using equation (46) together with equation (47), we get the desired condition given in equation (45). •

The proof above is for encoding schemes without any randomization (time sharing), but the same ideas can be used to establish the exact same result for general variable-length block codes with randomization. Now we are ready to prove the converse part of Theorem 8.
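As an aside, the optimization defining $J(R)$ in (36) is finite-dimensional once the channel is fixed, so it can be approximated by brute-force grid search. The sketch below is illustrative only and not part of the paper: it evaluates (36) for an assumed binary symmetric channel with crossover probability 0.1, restricted to binary inputs. The function names and grid resolution are my own choices; the checks rely on $J(0) = D_{max}$ and on $J$ being nonincreasing (the feasible set shrinks as $R$ grows).

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) in nats, with the 0 ln 0 = 0 convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def mutual_info(px, W):
    """I(P_X, W) for input distribution px and channel matrix W (rows W(.|x))."""
    py = px @ W
    return float(sum(px[j] * kl(W[j], py) for j in range(len(px))))

def J(R, W, grid=np.linspace(0.0, 1.0, 41)):
    """Grid approximation of the time-sharing optimization (36):
    maximize  a*D(P1W || W(.|x1)) + (1-a)*D(P2W || W(.|x2))
    over a, x1, x2, P1, P2 with a*I(P1,W) + (1-a)*I(P2,W) >= R.
    Only binary-input channels are handled, for brevity."""
    # For each candidate input distribution (1-q, q), precompute its mutual
    # information and the best divergence max_x D(PW || W(.|x)); the symbol
    # x enters the objective only, so it can be maximized out per point.
    pts = []
    for q in grid:
        px = np.array([1.0 - q, q])
        py = px @ W
        pts.append((mutual_info(px, W), max(kl(py, W[x]) for x in range(2))))
    best = 0.0
    for a in grid:
        for i1, d1 in pts:
            for i2, d2 in pts:
                if a * i1 + (1.0 - a) * i2 >= R:
                    best = max(best, a * d1 + (1.0 - a) * d2)
    return best

W = np.array([[0.9, 0.1], [0.1, 0.9]])      # BSC(0.1), an assumed example
d_max = kl(W[0], W[1])                      # D_max for this channel
C = mutual_info(np.array([0.5, 0.5]), W)    # BSC capacity (uniform input)
```

On this channel the grid recovers $J(0) = D_{max}$ exactly, since the maximizing point mass on one input letter lies on the grid.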
Proof (of the converse part of Theorem 8): In order to prove $E_f^{md} \le \tilde{C}$, first note that for the capacity-achieving sequences we consider, $\Pr[M=i] = \frac{1}{|\mathcal{M}^{(k)}|}$. Thus

$$\frac{-\ln(P_{e\,M(i)})^{(k)}}{E[\tau^{(k)}]} \le \frac{1}{1 - P_e^{(k)} - \frac{1}{|\mathcal{M}^{(k)}|}}\left(J\!\left(\frac{\ln|\mathcal{M}^{(k)}| - h(P_e^{(k)}) - P_e^{(k)}\ln(|\mathcal{M}^{(k)}|-1)}{E[\tau^{(k)}]}\right) + \frac{\ln 2}{E[\tau^{(k)}]}\right). \qquad (48)$$

Thus for any capacity-achieving sequence with feedback, $\lim_{k\to\infty} \frac{-\ln(P_{e\,M(i)})^{(k)}}{E[\tau^{(k)}]} \le J(C) = \tilde{C}$. •

E. Proof of Theorem 9

In this subsection we show how the strategy for sending a special bit can be combined with the Yamamoto-Itoh strategy when many special messages demand a missed-detection exponent. However, unlike the previous results about capacity-achieving sequences, Theorems 5, 6, 7 and 8, we will have an additional uniform-delay assumption: we restrict ourselves to uniform-delay capacity-achieving sequences.$^{14}$ Clearly, capacity-achieving sequences in general need not be uniform delay. Indeed, many messages $i \in \mathcal{M}$ can get an expected delay $E[\tau \mid M=i]$ much larger than the average delay $E[\tau]$; this in turn can decrease the error probability of those messages. The potential drawback of such codes is that their average delay is sensitive to the assumption that messages are chosen according to a uniform probability distribution: the expected decoding time $E[\tau]$ can increase considerably if the code is used in a system in which the messages are not chosen uniformly. It is worth emphasizing that all previously discussed exponents (the single-message exponent $E_f^{md}$, the single-bit exponent $E_f^{b}$, the many-bits exponent $E_f^{b}(r)$ and the achievable multi-layer exponent regions) remain unchanged whether or not this uniform-delay constraint is imposed. Thus the flexibility to provide different expected delays to different messages does not improve those exponents.
However, this is not true for message-wise UEP with exponentially many messages. Removing the uniform-delay constraint can considerably enhance the protection of special messages at rates higher than $\left(1 - \frac{\tilde{C}}{D_{max}}\right)C$. Indeed, one can make the exponent of all special messages $\tilde{C}$; the flexibility of providing more resources (decoding delay) to special messages achieves this enhancement. However, we will not discuss those cases in this article and stick to uniform-delay codes.

1) Achievability: $E_f^{md}(r) \ge \min\{\tilde{C}, (1 - \frac{r}{C}) D_{max}\}$: The optimal scheme here reverses the trick for achieving $E_f^{b}$: first a special bit tells the receiver whether the message being transmitted is a special one or not; after the decoding of this bit, the message itself is transmitted. This further emphasizes how feedback connects bit-wise and message-wise UEP when used with variable decoding time.

Proof: Like all the previous achievability results, we construct a capacity-achieving sequence $\mathcal{Q}$ with the desired asymptotic behavior. A sequence of multi-phase fixed-length errors-and-erasures codes, $\mathcal{Q}'$, is used as the building block of $\mathcal{Q}$. Let us consider the $k$th member of $\mathcal{Q}'$. In the first phase the transmitter sends one of two input symbols with distinct output distributions for $\lfloor\sqrt{k}\rfloor$ time units in order to tell whether $M \in \mathcal{M}_s^{(k)}$ or not. Let $b = I_{\{M \in \mathcal{M}_s^{(k)}\}}$. Then, as mentioned in Subsection VIII-A.1, with threshold decoding we can achieve

$$\Pr\big[\hat{b} \ne 1 \mid b=1\big] = \Pr\big[\hat{b} \ne 0 \mid b=0\big] \le e^{-\sqrt{k}\,\mu} \quad \text{where } \mu > 0. \qquad (49)$$

The actual value of $\mu$ is not important for us; we are merely interested in an upper bound vanishing with increasing $k$. In the second phase, one of two length-$k$ codes is used depending on $\hat{b}$.

• If $\hat{b} = 0$, in the second phase the transmitter uses the $k$th member of a capacity-achieving sequence $\mathcal{Q}''$ such that $E_{b,\mathcal{Q}''} = \tilde{C}$.
We know that such a sequence exists because of Theorem 2. The message $M'$ of $\mathcal{Q}''$ is determined using the following mapping:

$$M \in \mathcal{M}_s \Rightarrow M' = 1$$
$$M \notin \mathcal{M}_s \Rightarrow M' = M - |\mathcal{M}_s| + 1$$

At the end of the second phase, the receiver decodes $M'$. If $\hat{M}' = 1$, the receiver declares an erasure, $\tilde{M} = \text{erasure}$. If $\hat{M}' \ne 1$, then $\hat{M} = \tilde{M} = \hat{M}' + |\mathcal{M}_s| - 1$.

$^{14}$Recall that for any reliable variable-length block code with feedback, $\Gamma$ is defined as $\Gamma = \max_{i \in \mathcal{M}} \frac{E[\tau|M=i]}{E[\tau]}$, and uniform-delay reliable sequences are the ones that satisfy $\lim_{k\to\infty} \Gamma^{(k)}_\mathcal{Q} = 1$.

• If $\hat{b} = 1$, the transmitter uses a two-phase code with errors and erasures in the second phase, like the one described by Yamamoto and Itoh in [40]. The two phases of this code are called the communication and control phases, respectively. In the communication phase the transmitter uses the $\lceil \frac{r}{C}k \rceil$th member of a capacity-achieving sequence $\mathcal{Q}''$ with $E_{b,\mathcal{Q}''} = \tilde{C}$ to convey its message $M'$. The auxiliary message $M'$ is determined as follows:

$$M \notin \mathcal{M}_s \Rightarrow M' = 1$$
$$M \in \mathcal{M}_s \Rightarrow M' = M + 1$$

The decoded message of the $\lceil \frac{r}{C}k \rceil$th member of $\mathcal{Q}''$ is called the tentative decision of the communication phase and denoted by $\tilde{M}'$. In the control phase,

– if $\tilde{M}' = M'$, the tentative decision is confirmed by sending the accept symbol $x_a$ for $\ell(k) = k - \lceil \frac{r}{C}k \rceil$ time units;
– if $\tilde{M}' \ne M'$, the tentative decision is rejected by sending the reject symbol $x_d$ for $\ell(k) = k - \lceil \frac{r}{C}k \rceil$ time units;

where $x_a$ and $x_d$ are the maximizers in the following optimization problem:

$$D_{max} = \max_{i,j} D\big(W_{Y|X}(\cdot|i) \,\big\|\, W_{Y|X}(\cdot|j)\big) = D\big(W_{Y|X}(\cdot|x_a) \,\big\|\, W_{Y|X}(\cdot|x_d)\big)$$

If the output sequence in the last $k - \lceil \frac{r}{C}k \rceil$ time steps is typical with $W_{Y|X}(\cdot|x_a)$, then $\hat{M}' = \tilde{M}'$; otherwise an erasure is declared for $M'$.
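The accept/reject pair $(x_a, x_d)$ is just the argmax of a pairwise-divergence table, so it is easy to compute for any concrete channel. A small helper of my own, with an assumed binary symmetric channel as the example; it assumes all channel entries are strictly positive (as guaranteed here by $\lambda > 0$), so every divergence is finite:

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats; rows of the channel are assumed strictly positive."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def accept_reject_pair(W):
    """Return (D_max, x_a, x_d) with
    D_max = max_{i,j} D(W(.|i) || W(.|j)) = D(W(.|x_a) || W(.|x_d))."""
    best, xa, xd = -1.0, 0, 0
    for i in range(W.shape[0]):
        for j in range(W.shape[0]):
            if i == j:
                continue
            d = kl(W[i], W[j])
            if d > best:
                best, xa, xd = d, i, j
    return best, xa, xd

# Assumed example: binary symmetric channel with crossover 0.1.
W = np.array([[0.9, 0.1], [0.1, 0.9]])
d_max, x_a, x_d = accept_reject_pair(W)
```

For the BSC the two orderings tie, so either input can serve as the accept symbol; for asymmetric channels the pair matters.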
Note that the total probability of the $W_{Y|X}(\cdot|x_a)$-typical sequences is less than $e^{-\ell(k)(D_{max} - \delta_{\ell(k)})}$ when $\tilde{M}' \ne M'$ and more than $1 - \delta_{\ell(k)}$ when $\tilde{M}' = M'$, where $\lim_{\ell(k)\to\infty} \delta_{\ell(k)} = 0$, [13, Corollary 1.2, p. 19]. If $\hat{M}' = \text{erasure}$ or if $\hat{M}' = 1$, the receiver declares an erasure for $M$, $\tilde{M} = \text{erasure}$. If $\hat{M}' \in \{2, 3, \ldots, |\mathcal{M}_s|+1\}$, then $\hat{M} = \tilde{M} = \hat{M}' - 1$.

Now we can calculate the error and erasure probabilities of the two-phase fixed-length block code; we denote erasures by $\tilde{M} = \text{erasure}$ for each $k$. For $i \in \mathcal{M}_s$, using equation (49) and Bayes' rule we get

$$\Pr\big[\tilde{M} = \text{erasure} \mid M=i\big] \le e^{-\mu\sqrt{k}} + \big(P_{e,\mathcal{Q}'}^{(k-\ell(k))} + \delta_{\ell(k)}\big) \qquad (50)$$
$$\Pr\big[\tilde{M} \ne i, \tilde{M} \ne \text{erasure} \mid M=i\big] \le e^{-\mu\sqrt{k}}\, P_{e,\mathcal{Q}'}^{(k)}(1) + P_{e,\mathcal{Q}'}^{(k-\ell(k))}\, e^{-\ell(k)(D_{max} - \delta_{\ell(k)})}. \qquad (51)$$

For $i \notin \mathcal{M}_s$, using equation (49) and Bayes' rule we get

$$\Pr\big[\tilde{M} = \text{erasure} \mid M=i\big] \le e^{-\mu\sqrt{k}} + P_{e,\mathcal{Q}'}^{(k)} \qquad (52)$$
$$\Pr\big[\tilde{M} \ne i, \tilde{M} \ne \text{erasure} \mid M=i\big] \le e^{-\mu\sqrt{k}} + P_{e,\mathcal{Q}'}^{(k)}. \qquad (53)$$

Whenever $\tilde{M} = \text{erasure}$, the transmitter and receiver try to send the message once again from scratch, using the same strategy. Then for any $i \in \mathcal{M}$,

$$\Pr\big[\hat{M} \ne i \mid M=i\big] = \frac{\Pr[\tilde{M} \ne i, \tilde{M} \ne \text{erasure} \mid M=i]}{1 - \Pr[\tilde{M} = \text{erasure} \mid M=i]} \qquad (54)$$
$$E[\tau \mid M=i] = \frac{k + \sqrt{k}}{1 - \Pr[\tilde{M} = \text{erasure} \mid M=i]} \qquad (55)$$

Using equations (50), (51), (52), (53), (54) and (55) we conclude that $\mathcal{Q}$ is a capacity-achieving sequence such that

$$\lim_{k\to\infty} \frac{-\ln \max_{i\in\mathcal{M}_s}\Pr[\tilde{M} \ne i, \hat{M} \ne \text{erasure} \mid M=i]}{E[\tau]} = \min\left\{\tilde{C}, \left(1 - \frac{r}{C}\right)D_{max}\right\} \qquad \lim_{k\to\infty} \frac{\ln|\mathcal{M}_s^{(k)}|}{E[\tau]} = r$$

•

2) Converse: $E_f^{md}(r) \le \min\{\tilde{C}, (1-\frac{r}{C})D_{max}\}$:

Proof: Consider any uniform-delay capacity-achieving sequence $\mathcal{Q}$.
Note that by excluding all $i \notin \mathcal{M}_s^{(k)}$ we get a reliable sequence $\mathcal{Q}'$ such that

$$P_e'^{(k)} \le \Pr^{(k)}\big[\hat{M} \ne M \mid M \in \mathcal{M}_s\big] \qquad E\big[\tau'^{(k)}\big] \le \Gamma^{(k)} E\big[\tau^{(k)}\big]$$

Thus

$$\frac{-\ln \Pr^{(k)}\big[\hat{M} \ne M \mid M \in \mathcal{M}_s\big]}{E[\tau^{(k)}]} \le \frac{-\ln P_e'^{(k)}}{E[\tau'^{(k)}]}\,\Gamma^{(k)}$$

Consequently $E_f^{md}(r) \le (1-\frac{r}{C})D_{max}$. Similarly, by excluding all but one of the elements of $\mathcal{M}_s$ we can prove that $E_f^{md}(r) \le \tilde{C}$, using Theorem 8 and the uniform-delay condition. •

IX. AVOIDING FALSE ALARMS: PROOFS

A. Block Codes without Feedback: Proof of Theorem 10

1) Lower Bound: $E^{fa} \ge E^l_{fa}$:

Proof: As a result of the coding theorem [13, Ch. 2, Corollary 1.3, p. 102], we know that there exists a reliable sequence $\mathcal{Q}'$ of fixed-composition codes whose rate is $C$ and whose $n$th element's composition $P_X^{(n)}$ satisfies

$$\sum_{i \in \mathcal{X}} \big|P_X^{(n)}(i) - P_X^*(i)\big| \le 4\sqrt{1/n}.$$

We use the codewords of the $n$th element of $\mathcal{Q}'$ as the codewords of the ordinary messages in the $n$th code of $\mathcal{Q}$. For the special message we use a length-$n$ repetition sequence $\bar{x}^n(1) = (x_{fl}, x_{fl}, \cdots, x_{fl})$. The decoding region for the special message is essentially the bare minimum: we include the typical channel outputs within the decoding region of the special message to ensure a small missed-detection probability for the special message, but we exclude all other output sequences $y^n$:

$$G(1) = \left\{y^n : \sum_{i \in \mathcal{Y}} \big|Q_{(y^n)}(i) - W_{Y|X}(i|x_{fl})\big| \le 4\sqrt{1/n}\right\}$$

where $Q_{(y^n)}$ denotes the empirical distribution (type) of $y^n$. Note that this definition of $G(1)$ itself ensures that the special message is transmitted reliably whenever it is sent: $\lim_{n\to\infty} \Pr^{(n)}[\hat{M} \ne 1 \mid M=1] = 0$. The decoding region of each ordinary message $j \in \{2, 3, \ldots, M^{(n)}\}$ is the intersection of the corresponding decoding region in $\mathcal{Q}'$ with the complement of $G(1)$.
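The membership test defining $G(1)$ is just an L1 distance between the empirical output distribution and $W_{Y|X}(\cdot|x_{fl})$, compared against the shrinking threshold $4\sqrt{1/n}$. A minimal sketch of my own; the output alphabet $\{0,1\}$ and the distribution $(0.9, 0.1)$ under $x_{fl}$ are assumed example values:

```python
import numpy as np
from collections import Counter

def in_special_region(y, w_fl):
    """Membership in G(1): the L1 distance between the empirical
    distribution of y^n and W_{Y|X}(.|x_fl) is at most 4*sqrt(1/n)."""
    n = len(y)
    emp = np.zeros(len(w_fl))
    for sym, cnt in Counter(y).items():
        emp[sym] = cnt / n
    return float(np.abs(emp - np.asarray(w_fl, float)).sum()) <= 4.0 * (1.0 / n) ** 0.5

w_fl = [0.9, 0.1]                      # assumed output distribution under x_fl
y_special = [0] * 9000 + [1] * 1000    # empirical distribution exactly (0.9, 0.1)
y_other = [0] * 5000 + [1] * 5000      # empirical distribution (0.5, 0.5)
```

At $n = 10000$ the threshold is $0.04$, so the first sequence is accepted and the second, whose empirical distribution is far from $(0.9, 0.1)$, is rejected.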
Thus the fact that $\mathcal{Q}'$ is a reliable sequence implies that

$$\lim_{n\to\infty} \Pr^{(n)}\left[y^n \in \bigcup_{j \notin \{1,i\}} G(j) \,\Bigg|\, M=i\right] = 0$$

Consequently we have reliable communication for the ordinary messages as long as $\lim_{n\to\infty} \Pr^{(n)}[G(1) \mid M=j] = 0$ for all $j \ne 1$. But we prove a much stronger result, to ensure that $\Pr^{(n)}[\hat{M}=1 \mid M \ne 1]$ decays fast enough. Before doing that, note that in the second stage of the decoding, when we are choosing a message among the ordinary ones, the ML decoder can be used instead of the decoding rule of the original code; doing so only decreases the average error probability. Note that the probability of a $V$-shell of a message $i$ equals

$$\Pr^{(n)}\big[T_V(i) \mid M=i\big] = e^{-n D(V_{Y|X}(\cdot|X) \| W_{Y|X}(\cdot|X) \mid P_X^{(n)})}$$

Note also that $G(1)$ can be written as a union of $V$-shells of a message $i$ as follows:

$$G(1) = \bigcup_{V_{Y|X} \in \mathcal{V}^{(n)}} T_V(i) \quad \forall i \ne 1$$

where $\mathcal{V}^{(n)} = \big\{V_{Y|X} : \sum_j \big|\sum_k V_{Y|X}(j|k) P_X^{(n)}(k) - W_{Y|X}(j|x_{fl})\big| \le 4\sqrt{1/n}\big\}$. Note that there are at most $(1+n)^{|\mathcal{X}||\mathcal{Y}|}$ different conditional types.
Hence

$$\Pr^{(n)}\big[G(1) \mid M=i\big] \le (1+n)^{|\mathcal{X}||\mathcal{Y}|} \max_{V_{Y|X} \in \mathcal{V}^{(n)}} \Pr\big[T_V(i) \mid M=i\big]$$

Thus for all $i > 1$,

$$\lim_{n\to\infty} \frac{-\ln\Pr^{(n)}[G(1) \mid M=i]}{n} = \min_{V_{Y|X}:\, \sum_j P_X^*(j) V_{Y|X}(\cdot|j) = W_{Y|X}(\cdot|x_{fl})} D\big(V_{Y|X}(\cdot|X) \,\big\|\, W_{Y|X}(\cdot|X) \,\big|\, P_X^*\big)$$

•

2) Upper Bound: $E^{fa} \le E^u_{fa}$:

Proof: As a result of the data-processing inequality for KL divergence we have

$$\sum_{y^n \in \mathcal{Y}^n} \Pr[y^n|M=1] \ln\frac{\Pr[y^n|M=1]}{\Pr[y^n|M\ne 1]} \ge \Pr[G(1)|M=1] \ln\frac{\Pr[G(1)|M=1]}{\Pr[G(1)|M\ne 1]} + \Pr[\bar{G}(1)|M=1]\ln\frac{\Pr[\bar{G}(1)|M=1]}{\Pr[\bar{G}(1)|M\ne1]}$$
$$\ge -\ln 2 + \Pr[G(1)|M=1]\ln\frac{1}{\Pr[G(1)|M\ne1]} \qquad (56)$$

Using the convexity of the KL divergence we get

$$\sum_{y^n} \Pr[y^n|M=1]\ln\frac{\Pr[y^n|M=1]}{\Pr[y^n|M\ne1]} \le \sum_{i=2}^{|\mathcal{M}|} \frac{1}{|\mathcal{M}|-1} \sum_{y^n} \Pr[y^n|M=1]\ln\frac{\Pr[y^n|M=1]}{\Pr[y^n|M=i]}$$
$$= \sum_{i=2}^{|\mathcal{M}|} \frac{1}{|\mathcal{M}|-1} \sum_{y^n}\Pr[y^n|M=1] \sum_{k=1}^{n} \ln\frac{\Pr[y_k|M=1,y^{k-1}]}{\Pr[y_k|M=i,y^{k-1}]} = \sum_{k=1}^{n} \sum_{i=2}^{|\mathcal{M}|} \frac{1}{|\mathcal{M}|-1} D\big(W_{Y|X}(\cdot|\bar{x}_k(1)) \,\big\|\, W_{Y|X}(\cdot|\bar{x}_k(i))\big) \qquad (57)$$

where $\bar{x}_k(i)$ denotes the input letter of the codeword of message $i$ at time $k$. Let us denote the empirical distribution of the $\bar{x}_k(i)$ at time $k$ by $P_{X_k}$:

$$P_{X_k}(i) = \frac{\sum_{j\in\mathcal{M}} I_{\{\bar{x}_k(j)=i\}}}{|\mathcal{M}|} \quad \forall i \in \mathcal{X}$$

Using equations (56) and (57) we get

$$\Pr[G(1)|M\ne1] \ge e^{-\frac{1}{\Pr[G(1)|M=1]}\left(\frac{|\mathcal{M}|}{|\mathcal{M}|-1}\sum_k D(W_{Y|X}(\cdot|\bar{x}_k(1)) \| W_{Y|X}(\cdot|X_k) \mid P_{X_k}) + \ln 2\right)} \qquad (58)$$

We show below that for all capacity-achieving codes, almost all of the $k$'s have a $P_{X_k}$ which is essentially equal to $P_X^*$. For that, let us first define the set $\mathcal{P}(\epsilon)$ and $\delta(\epsilon)$:

$$\mathcal{P}(\epsilon) \triangleq \{P_X : I(P_X, W_{Y|X}) \ge C - \epsilon\} \quad \text{and} \quad \delta(\epsilon) \triangleq \max_{P_X \in \mathcal{P}(\epsilon)} \sum_i |P_X(i) - P_X^*(i)|$$

Note that $\lim_{\epsilon\to 0}\delta(\epsilon) = 0$.
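For a binary-input, binary-output channel the minimization defining the lower-bound exponent $E^l_{fa}$ has a single free parameter once the mixture constraint $\sum_j P_X^*(j) V_{Y|X}(\cdot|j) = W_{Y|X}(\cdot|x_{fl})$ is imposed, so a one-dimensional grid suffices. A rough numerical sketch of my own, not the paper's; the BSC with uniform $P_X^*$ and $x_{fl} = 0$ is an assumed example:

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats with the 0 ln 0 = 0 convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def e_fa_lower(W, p_star, x_fl, steps=2001):
    """Grid approximation of
        min_{V : sum_j p*(j) V(.|j) = W(.|x_fl)}  D(V || W | p*)
    for a binary-input, binary-output channel W."""
    target = W[x_fl]          # required mixture output distribution
    best = np.inf
    for v0 in np.linspace(0.0, 1.0, steps):
        # Solve p*(0) v0 + p*(1) v1 = target[1] for the second row.
        v1 = (target[1] - p_star[0] * v0) / p_star[1]
        if not 0.0 <= v1 <= 1.0:
            continue
        V = np.array([[1.0 - v0, v0], [1.0 - v1, v1]])
        best = min(best, sum(p_star[j] * kl(V[j], W[j]) for j in range(2)))
    return best

W = np.array([[0.9, 0.1], [0.1, 0.9]])   # BSC(0.1); its capacity-achieving P* is uniform
val = e_fa_lower(W, p_star=[0.5, 0.5], x_fl=0)
```

The constraint forces the second row of $V$ to look nothing like $W(\cdot|1)$, which is what makes the exponent strictly positive here.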
As a result of Fano's inequality we have

$$I(M; Y^n) \ge nR^{(n)}(1 - P_e) - \ln 2 \qquad (59)$$

On the other hand, using standard manipulations on mutual information we get

$$I(M; Y^n) \le \sum_{k=1}^{n} I(P_{X_k}, W_{Y|X}) \le Cn - \epsilon\sum_{k=1}^{n} I_{\{P_{X_k} \notin \mathcal{P}(\epsilon)\}} \qquad (60)$$

Using equation (60) in equation (59) we get

$$\sum_{k=1}^{n} I_{\{P_{X_k} \notin \mathcal{P}(\epsilon)\}} \le \frac{n\left(C - R^{(n)}(1-P_e) + \ln 2/n\right)}{\epsilon}$$

Let $\epsilon^{(n)} = \sqrt{C - R^{(n)}(1-P_e) + \frac{\ln 2}{n}}$; then $\lim_{n\to\infty}\epsilon^{(n)} = 0$ and

$$\sum_{k=1}^{n} I_{\{P_{X_k} \notin \mathcal{P}(\epsilon^{(n)})\}} \le n\,\epsilon^{(n)}. \qquad (61)$$

Note that for any $P_X \in \mathcal{P}(\epsilon^{(n)})$ we have

$$D\big(W_{Y|X}(\cdot|\bar{x}_k(1)) \,\big\|\, W_{Y|X}(\cdot|X_k) \,\big|\, P_X\big) \le D\big(W_{Y|X}(\cdot|\bar{x}_k(1)) \,\big\|\, W_{Y|X}(\cdot|X) \,\big|\, P_X^*\big) + \delta(\epsilon^{(n)}) D_{max} \le E^u_{fa} + \delta(\epsilon^{(n)}) D_{max} \qquad (62)$$

where

$$E^u_{fa} = \max_{i\in\mathcal{X}} D\big(W_{Y|X}(\cdot|i) \,\big\|\, W_{Y|X}(\cdot|X) \,\big|\, P_X^*\big)$$

Using equations (61) and (62),

$$\sum_k D\big(W_{Y|X}(\cdot|\bar{x}_k(1)) \,\big\|\, W_{Y|X}(\cdot|X_k) \,\big|\, P_{X_k}\big) \le n\left(E^u_{fa} + \delta(\epsilon^{(n)}) D_{max} + \epsilon^{(n)} D_{max}\right)$$

Inserting this in equation (58) we get

$$\lim_{n\to\infty}\left(\frac{-\ln\Pr^{(n)}[G(1)|M\ne1]}{n}\right) \le E^u_{fa}$$

•

B. Variable-Length Block Codes with Feedback: Proof of Theorem 11

1) Achievability: $E_f^{fa} \ge D_{max}$:

Proof: We construct a capacity-achieving sequence with feedback, $\mathcal{Q}$, using a construction like the one we used for $E_f^{md}(r)$. In fact, this scheme achieves the false-alarm exponent simultaneously with the best missed-detection exponent, $\tilde{C}$, for the special message. We use a fixed-length multi-phase errors-and-erasures code as the building block for the $k$th member of $\mathcal{Q}$. In the first phase, $b = I_{\{M=1\}}$ is conveyed using a length-$\lceil\sqrt{k}\rceil$ repetition code, as we did in Subsections VIII-A.1 and VIII-E.1. Recall that

$$\Pr\big[\hat{b} \ne 1 \mid b=1\big] = \Pr\big[\hat{b} \ne 0 \mid b=0\big] \le e^{-\mu\sqrt{k}} \quad \mu > 0 \qquad (63)$$

In the second phase, one of two length-$k$ codes is used depending on $\hat{b}$.
• If $\hat{b} = 0$, the transmitter uses the $k$th member of a capacity-achieving sequence $\mathcal{Q}'$ such that $E_{md,\mathcal{Q}'} = \tilde{C}$ to convey the message. We know that such a sequence exists because of Theorem 2. Let the message of $\mathcal{Q}$ be the message of $\mathcal{Q}'$, i.e. the auxiliary message $M' = M$. If at the end of the second phase $\hat{M}' = 1$, the receiver declares an erasure, $\tilde{M} = \text{erasure}$; otherwise $M$ is decoded, $\hat{M} = \tilde{M} = \hat{M}'$.

• If $\hat{b} = 1$, the transmitter uses a length-$k$ repetition code to convey whether $M = 1$ or not.
– If $M = 1$, $M' = 1$ and the transmitter sends the codeword $(x_a, x_a, \ldots, x_a)$.
– If $M \ne 1$, $M' = 0$ and the transmitter sends the codeword $(x_d, x_d, \ldots, x_d)$.

where $x_a$ and $x_d$ are the maximizers achieving $D_{max}$:

$$D_{max} = \max_{i,j} D\big(W_{Y|X}(\cdot|i) \,\big\|\, W_{Y|X}(\cdot|j)\big) = D\big(W_{Y|X}(\cdot|x_a) \,\big\|\, W_{Y|X}(\cdot|x_d)\big)$$

The receiver decodes $\hat{M}' = 1$ only when the output sequence is typical with $W_{Y|X}(\cdot|x_a)$. Evidently, as before, we have [13, Corollary 1.2, p. 19]

$$\Pr\big[\hat{M}' = 0 \mid M' = 1\big] \le \delta_k \qquad (64)$$
$$\Pr\big[\hat{M}' = 1 \mid M' = 0\big] \le e^{-k(D_{max}-\delta_k)} \qquad (65)$$

where $\lim_{k\to\infty}\delta_k = 0$. If $\hat{M}' = 1$ then $\hat{M} = 1$; otherwise the receiver declares an erasure for the whole block, i.e. $\tilde{M} = \text{erasure}$. Now we can calculate the error and erasure probabilities for the $(\lceil\sqrt{k}\rceil + k)$-long block code. Using equations (63), (64), (65) and Bayes' rule we get

$$\Pr\big[\tilde{M} = \text{erasure} \mid M=1\big] \le e^{-\mu\sqrt{k}} + \delta_k \qquad (66)$$
$$\Pr\big[\tilde{M} = \text{erasure} \mid M=i\big] \le e^{-\mu\sqrt{k}} + P_{e,\mathcal{Q}'}^{(k)} \quad i \ne 1 \qquad (67)$$
$$\Pr\big[\tilde{M} \in \mathcal{M}\setminus\{1\} \mid M=1\big] \le e^{-\mu\sqrt{k}}\, P_{e,\mathcal{Q}'}^{(k)}(1) \qquad (68)$$
$$\Pr\big[\tilde{M} \in \mathcal{M}\setminus\{1,i\} \mid M=i\big] \le P_{e,\mathcal{Q}'}^{(k)} \quad i \ne 1 \qquad (69)$$
$$\Pr\big[\tilde{M} = 1 \mid M=i\big] \le e^{-\mu\sqrt{k}}\, e^{-k(D_{max}-\delta_k)} \quad i \ne 1 \qquad (70)$$

Whenever $\tilde{M} = \text{erasure}$, the transmitter tries to send the message again from scratch, using the same strategy.
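The effect of this repeat-after-erasure convention, here and in (54), (55) and (71), is just geometric-series bookkeeping: if each attempt takes a fixed number of channel uses and independently ends in an erasure with probability p_erase or an undetected error with probability p_error, the number of attempts is geometric. A toy sketch of my own; the probability values are placeholders, not quantities from the paper:

```python
import math

def repeat_until_decoded(block_len, p_erase, p_error):
    """Each attempt uses block_len channel uses; it ends in an erasure
    w.p. p_erase (retry from scratch) or an undetected error w.p. p_error.
    Returns (expected decoding time, overall error probability):
        E[tau] = block_len / (1 - p_erase)   # sum of q^m over attempts, cf. (55), (71)
        P[err] = p_error  / (1 - p_erase)    # cf. (54)
    """
    return block_len / (1.0 - p_erase), p_error / (1.0 - p_erase)

k = 10_000
# One attempt = sqrt(k)-long first phase plus k-long second phase.
tau, perr = repeat_until_decoded(k + math.isqrt(k), p_erase=1e-3, p_error=1e-6)
```

Since p_erase vanishes with $k$, both the delay penalty and the error-probability inflation are asymptotically negligible, which is why the exponents are unaffected by the retransmissions.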
Consequently, all of the above error probabilities are scaled by a factor of $\frac{1}{1 - \Pr[\tilde{M}=\text{erasure}|M=i]}$ when we consider the corresponding error probabilities of the variable-decoding-time code. Furthermore,

$$E[\tau \mid M=i] = \frac{k + \sqrt{k}}{1 - \Pr[\tilde{M}=\text{erasure} \mid M=i]} \qquad (71)$$

Using equations (66), (67), (68), (69), (70) and (71) we conclude that $\mathcal{Q}$ is a capacity-achieving code with $E_{f\,md,\mathcal{Q}} = \tilde{C}$ and $E_{f\,fa,\mathcal{Q}} = D_{max}$. •

2) Converse: $E_f^{fa} \le D_{max}$:

Proof: Note that as a result of the convexity of KL divergence we have

$$E\left[\ln\frac{\Pr[Y^\tau|M=1]}{\Pr[Y^\tau|M\ne1]} \,\Big|\, M=1\right] \ge \Pr[G(1)|M=1]\ln\frac{\Pr[G(1)|M=1]}{\Pr[G(1)|M\ne1]} + \Pr[\bar{G}(1)|M=1]\ln\frac{\Pr[\bar{G}(1)|M=1]}{\Pr[\bar{G}(1)|M\ne1]} \ge -\ln 2 + \Pr[G(1)|M=1]\ln\frac{1}{\Pr[G(1)|M\ne1]} \qquad (72)$$

It has already been proved in [4] that

$$E\left[\ln\frac{\Pr[Y^\tau|M=1]}{\Pr[Y^\tau|M\ne1]} \,\Big|\, M=1\right] \le D_{max}\, E[\tau|M=1] \qquad (73)$$

Note that as a result of the definition of $\Gamma$ we have $E[\tau|M=1] \le E[\tau]\,\Gamma$. Using this together with equations (72) and (73), we get

$$\Pr[G(1)|M\ne1] \ge e^{-\frac{\ln 2 + \Gamma D_{max} E[\tau]}{\Pr[G(1)|M=1]}}$$

Thus for any uniform-delay reliable sequence $\mathcal{Q}$, we have $E_{f\,fa,\mathcal{Q}} \le D_{max}$. •

APPENDIX

A. Equivalent definitions of UEP exponents

We could have defined all the UEP exponents in this paper without using the notion of capacity-achieving sequences. As an example, in this section we define the single-bit exponent in this alternate manner and show that both definitions lead to identical results. In this alternative, $\bar{E}_b(R)$ is first defined as the best exponent for the special bit at a given data rate $R$, and then it is minimized over all $R < C$ to obtain $\bar{E}_b$.

Definition 14: For any $R \ge 0$, $\mathcal{Z}(R)$ is the set of sequences of codes $\mathcal{Q}$ with message sets $\mathcal{M}^{(n)}$ such that $|\mathcal{M}^{(n)}| \ge e^{Rn}$ and $\mathcal{M}^{(n)} = \mathcal{M}_1 \times \mathcal{M}_2^{(n)}$ where $\mathcal{M}_1 = \{0,1\}$.
Definition 15: For a sequence of codes, Q, such that lim_{n→∞} Pr^(n)[ M̂ ≠ M ] = 0, the single bit exponent E_{b,Q} equals

    E_{b,Q} ≜ liminf_{n→∞} ( −ln Pr^(n)[ M̂_1 ≠ M_1 ] ) / n.    (74)

Definition 16: Ē_b(R) and the single bit exponent Ē_b are defined as

    Ē_b(R) ≜ sup_{Q ∈ Z(R)} E_{b,Q}        Ē_b ≜ inf_{R < C} Ē_b(R).

Ē_b ≥ E_b: For any δ > 0, there exists a capacity achieving sequence Q such that E_{b,Q} = E_b and, for large enough n, R^(n) ≥ C − δ. If we replace the first n members of Q with codes whose rates are (C − δ) or higher, we get another sequence Q′ such that Q′ ∈ Z(C − δ), where E_{b,Q′} = E_b. Thus Ē_b(C − δ) ≥ E_b for all δ > 0. Consequently Ē_b ≥ E_b.

E_b ≥ Ē_b: Let us first fix an arbitrarily small δ > 0. In the table in Figure 6, row k represents a code sequence Q̄_k ∈ Z(C − 1/k) whose single bit exponent satisfies E_{b,Q̄_k} ≥ Ē_b(C − 1/k) − δ. Let Q̄_k(l) represent the length-l code in this sequence. We construct a capacity achieving sequence Q from this table by sequentially choosing elements of Q from rows 1, 2, ... as follows.

Fig. 6. Row k denotes a reliable code sequence at rate C − 1/k. Bold path shows capacity achieving sequence Q.
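Definition 15 uses a liminf because the empirical exponent −(1/n) ln Pr^(n)[M̂_1 ≠ M_1] need not converge; the liminf charges a sequence with its worst recurring exponent. A toy illustration with fabricated error probabilities that oscillate between two exponential rates:

```python
import numpy as np

# Hypothetical single-bit error probabilities: exponent 0.5 at odd n,
# exponent 0.2 at even n (values invented purely to illustrate (74)).
n = np.arange(1, 1001)
exponents = np.where(n % 2 == 0, 0.2, 0.5)
p_err = np.exp(-exponents * n)

empirical = -np.log(p_err) / n             # the sequence whose liminf is E_{b,Q}
liminf_estimate = empirical[-100:].min()   # settles at the smaller rate, 0.2
```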
• For each sequence Q̄_i, let n_i denote the smallest block length n at which
  1) the single bit error probability satisfies Pr^(n)[ M̂_1 ≠ M_1 ] ≤ e^{−n(Ē_b(C − 1/i) − 2δ)},
  2) the overall error probability satisfies Pr^(n)[ M̂ ≠ M ] ≤ 1/i,
  3) n_i ≥ n_{i−1}.
• Given the sequence n_1, n_2, ..., we choose the members of our capacity achieving code from the code table shown in Figure 6 as follows.
  – Initialize: we use the first n_2 − 1 members of Q̄_1 as the first n_2 − 1 members of the new code.
  – Iterate: for i ≥ 2, we choose the codes of block length n_i to n_{i+1} − 1 from the code sequence Q̄_i, i.e.,

        ( Q̄_i(n_i), Q̄_i(n_i + 1), ..., Q̄_i(n_{i+1} − 1) ).

Thus Q is a sampling of the code table as shown by the bold path in Figure 6. Note that this choice of Q is a capacity achieving sequence; moreover, it also achieves a single bit exponent E_{b,Q} = inf_R
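The switch points n_i are what make the diagonal sampling work: each row is entered only after it simultaneously meets the single bit and overall reliability targets, and monotonicity keeps the path moving rightward through the table. A small sketch of the selection rule (the error curves and parameters are invented for illustration, and a single rate-independent target exponent E_target stands in for Ē_b(·)):

```python
import numpy as np

def pick_switch_points(bit_err, overall_err, E_target, delta, n_max):
    """For row i (1-indexed), n_i is the smallest n >= n_{i-1} with
    bit_err(n) <= exp(-n (E_target - 2 delta)) and overall_err(n) <= 1/i,
    mirroring conditions 1)-3) above."""
    ns, prev = [], 1
    for i in range(1, len(bit_err) + 1):
        be, oe = bit_err[i - 1], overall_err[i - 1]
        for n in range(prev, n_max):
            if be(n) <= np.exp(-n * (E_target - 2 * delta)) and oe(n) <= 1 / i:
                ns.append(n)
                prev = n
                break
    return ns

# Hypothetical rows: every row has single-bit error exp(-0.3 n) and
# overall error 2/n, so row i first qualifies at block length 2i.
rows = 4
bit_err = [lambda n: np.exp(-0.3 * n)] * rows
overall_err = [lambda n: 2 / n] * rows
ns = pick_switch_points(bit_err, overall_err, E_target=0.3, delta=0.025, n_max=100)
# ns == [2, 4, 6, 8]; lengths n_i .. n_{i+1}-1 would then be drawn from row i.
```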
