Designing communication systems via iterative improvement: error correction coding with Bayes decoder and codebook optimized for source symbol error
Chai Wah Wu
IBM Research AI, IBM T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598
cwwu@us.ibm.com
April 23, 2019. Latest update: October 7, 2021

Abstract

In most error correction coding (ECC) frameworks, the typical error metric is the bit error rate (BER), which measures the number of bit errors. For this metric, the positions of the bits are not relevant to the decoding, and in many noise models, not relevant to the BER either. In many applications this is unsatisfactory, as all bits are typically not equal and have different significance. We consider the problem of bit error correction and mitigation where bits in different positions have different importance. For error correction, we look at ECC from a Bayesian perspective and introduce Bayes estimators with general loss functions to take into account the bit significance. We propose ECC schemes that optimize this error metric. As the problem is highly nonlinear, traditional ECC construction techniques are not applicable. Using exhaustive search is cost prohibitive, and thus we use iterative improvement search techniques to find good codebooks. We optimize both general codebooks and linear codes. We provide numerical experiments to show that they can be superior to classical linear block codes such as Hamming codes and decoding methods such as minimum distance decoding. For error mitigation, we study the case where ECC is not possible or not desirable, but significance-aware encoding of information is still beneficial in reducing the average error. We propose a novel number representation format suitable for emerging storage media where the noise magnitude is unknown and possibly large, and show that it has lower mean error than the traditional number format.
1 Introduction

The information bit error rate (BER) in classical error coding is based on the Hamming distance, i.e. the number of bits that differ between the symbols at the transmitter and the decoded symbols at the receiver. For this metric, the positions of the source bits where the error occurred are not significant. This is unsatisfactory, since in many scenarios all source bits are not equal and have different significance. For instance, when representing an integer in binary, the bits have different significance, with the difference increasing exponentially with position, and an error in the most significant bit is much more problematic than an error in the least significant bit. Furthermore, the relationship between the difference |i − j| of two integers i and j and the Hamming distance of their bitstring representations is nonlinear and nonmonotonic. For instance, the numerical difference between the numbers 8 and 7 is 1, but expressed as bits (1000 and 0111), their Hamming distance is 4. On the other hand, the difference between 0 and 2^k is 2^k, but the two have a Hamming distance of 1. In image compression [1], the discrete cosine transform (DCT) coefficients for lower frequencies are more important than the higher frequency coefficients, and luminance coefficients are more important than chrominance coefficients, as the human visual system exhibits low-pass behavior and is more sensitive to luminance changes. In stored-program computers, where both program data and instruction data are stored in memory, the program instruction code is more critical than program data, as an erroneous instruction can cause the machine to crash, whereas incorrect data typically leads to incorrect results but does not cause a machine to crash [2].
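The nonmonotonic relationship between integer difference and Hamming distance can be checked directly; a small sketch (plain Python, no assumptions beyond the examples in the text):

```python
def hamming(i: int, j: int) -> int:
    """Hamming distance between the binary representations of i and j."""
    return bin(i ^ j).count("1")

# 8 and 7 differ by 1 as integers, but by 4 bits (1000 vs 0111).
print(abs(8 - 7), hamming(8, 7))  # 1 4
# 0 and 8 = 2^3 differ by 8 as integers, but by only 1 bit.
print(abs(0 - 8), hamming(0, 8))  # 8 1
```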
The purpose of this paper is to consider some approaches to take into account the difference in significance of the source bits in the context of communication systems or storage systems where error correcting codes and source coding are used to combat channel and storage noise.

2 Notation and setup

For an integer in the range 0 ≤ n < 2^k, let b_k(n) denote the k-bit representation of n, e.g. b_4(9) = 1001. Let us denote the bijection between symbols s ∈ S (also called information or source symbols) and codewords c ∈ C by Φ. We assume that each symbol in S occurs with equal probability in the data stream, i.e. p(s) = 1/|S| for all s ∈ S. The standard setup [3] is shown in Fig. 1, where each symbol s is mapped to the corresponding codeword c = Φ(s) via the codebook Φ and transmitted through the channel.¹ The codewords incur errors as they move through the noisy channel; this channel can be physical space in the case of communication channels or time in the case of storage systems. At the receiver, the received noisy word c′ is decoded as D(c′) = c* ∈ C and the decoded symbol is retrieved² as s* = Φ⁻¹(c*).

¹ As we focus on ECC, we will ignore channel modulation/demodulation for now.
² The mapping Φ⁻¹ can be efficiently implemented via a hash table which is indexed by codewords and maps codewords to symbols.
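As a concrete sketch of this setup, a codebook Φ can be held as a list indexed by symbol, with Φ⁻¹ realized as a hash table (a dict) from codewords to symbols, as in footnote 2. The 4-bit symbols, 7-bit codewords and the trivial zero-padded codebook below are illustrative placeholders, not the paper's optimized codebook:

```python
k, n = 4, 7  # symbol bits, codeword bits (illustrative choices)

def b(kbits: int, m: int) -> str:
    """k-bit binary representation b_k(m), e.g. b(4, 9) = '1001'."""
    return format(m, f"0{kbits}b")

# Placeholder codebook Phi: symbol s -> codeword (b_k(s) zero-padded to n bits).
# A real codebook would come from the optimization in Section 6.
phi = [b(k, s) + "0" * (n - k) for s in range(2 ** k)]

# Phi^{-1} as a hash table indexed by codewords.
phi_inv = {c: s for s, c in enumerate(phi)}

print(b(4, 9), phi_inv[phi[9]])  # 1001 9
```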
Figure 1: Communication system setup (symbol s → [codebook Φ] → codeword c → [channel] → noisy word c′ → [decoder D] → decoded codeword c* → [Φ⁻¹] → decoded symbol s*).

3 Ideal observer, maximum likelihood and minimum distance decoding

Given a transmitted codeword c and a received word c′ (which may or may not be a codeword in C), ideal observer decoding [4] returns the codeword c* such that c* = argmax_{w ∈ C} p(w | c′), where p(w | c′) is the probability that codeword w was sent given that c′ (which is not necessarily a codeword) is received. Given our assumption of a uniform prior distribution for codewords c, this is equivalent to maximum likelihood decoding. For certain channel models (e.g. the binary symmetric channel with p < 0.5), this is equivalent to minimum distance decoding.

4 Error rate in source symbol space

To take into account the difference in significance of the bits of the source symbols, instead of BER we need an error metric that measures the difference between source symbols. Let us consider the following error rate: e_δ = lim_{N→∞} (1/N) Σ_{i=1}^{N} δ(s(i), s*(i)), where s(i) is the i-th source symbol and s*(i) is the i-th decoded symbol. The function δ measures the difference between the transmitted symbol and the decoded symbol in the source symbol space. In our example of comparing integers, we define the following two functions: δ_1 = |b_k⁻¹(s) − b_k⁻¹(s*)| and δ_2 = (b_k⁻¹(s) − b_k⁻¹(s*))², i.e. the difference (resp. squared difference) between s and s* when expressed as integers. In order to minimize this error rate, we optimize both the decoding method D and the codebook Φ. We will look at the optimization of each of these in turn.
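A minimal sketch of minimum distance decoding together with the two error functions δ₁ and δ₂; the codebook here is an arbitrary illustrative one (under the uniform-prior assumption and a BSC, minimum distance decoding coincides with ideal observer decoding):

```python
def hamming(x, y):
    """Hamming distance between two equal-length bitstrings."""
    return sum(a != b for a, b in zip(x, y))

def min_distance_decode(word, codebook):
    """Return the symbol whose codeword is closest in Hamming distance."""
    return min(range(len(codebook)), key=lambda s: hamming(word, codebook[s]))

# delta_1 and delta_2 compare symbols as integers, not as bitstrings.
delta1 = lambda s, t: abs(s - t)
delta2 = lambda s, t: (s - t) ** 2

codebook = [format(s, "07b") for s in range(16)]  # illustrative, not optimized
s_star = min_distance_decode("0000100", codebook)
print(s_star, delta1(7, 8), delta2(0, 8))  # 4 1 64
```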
5 Bayes estimator

In Bayesian estimation theory [5], the Bayes estimator c* minimizes the expected loss E(L(s*, s) | c′), where L(s*, s) is a loss function measuring the difference between the decoded source symbol s* and the transmitted source symbol s. For example, let a_i(x) denote the i-th bit of x; then one possible loss function is L(s*, s) = Σ_i α_i |a_i(s*) − a_i(s)|. If we choose α_i = 2^i, then L(s*, s) is equal to the numerical value of the bitwise XOR of s* and s. Consider the posterior distribution p(s | c′). For the additive white Gaussian noise (AWGN) channel, p(s | c′) is proportional to g_(μ,σ)(c′ − Φ(s)), where g is the multivariate Gaussian pdf with mean μ and variance σ². For a general loss function on a finite discrete symbol space, the Bayes estimator can be implemented by comparing all possible codebook candidates given the received word. For the case when s is a scalar quantity and for some special forms of the loss function, the following simple explicit forms of the Bayes estimator are well known and lead to efficient implementations:

• If the loss function is the 0-1 loss function, i.e. L(s*, s) = 0 if s* = s and 1 otherwise, then the Bayes estimator is the mode of the posterior distribution and is equivalent to the maximum a posteriori estimate or the ideal observer.
• If the loss function is the squared error loss function, i.e. L(s*, s) = (s* − s)², then the Bayes estimator is the mean of the posterior distribution.
• If the loss function is the absolute value loss function, i.e. L(s*, s) = |s* − s|, then the Bayes estimator is the median of the posterior distribution.

The Bayes risk is defined as E_π(L(s*, s)), where π is the prior distribution of s, and the Bayes estimator is the estimator that minimizes the Bayes risk.
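A sketch of the Bayes estimator for the AWGN channel under the uniform-prior assumption: posterior weights p(s | c′) ∝ exp(−‖c′ − Φ(s)‖²/(2σ²)), with the posterior mean used for squared error loss (δ₂) and the weighted median for absolute loss (δ₁). The codebook, noisy word and σ below are illustrative:

```python
import math

def posterior(received, codebook, sigma):
    """Posterior p(s | c') over symbols: uniform prior, AWGN with variance sigma^2."""
    w = [math.exp(-sum((r - float(b)) ** 2 for r, b in zip(received, c)) / (2 * sigma ** 2))
         for c in codebook]
    z = sum(w)
    return [x / z for x in w]

def bayes_decode_mean(received, codebook, sigma):
    """Bayes estimator for squared error loss: posterior mean, rounded to a symbol."""
    p = posterior(received, codebook, sigma)
    return round(sum(s * ps for s, ps in enumerate(p)))

def bayes_decode_median(received, codebook, sigma):
    """Bayes estimator for absolute value loss: posterior (weighted) median."""
    p = posterior(received, codebook, sigma)
    acc = 0.0
    for s, ps in enumerate(p):
        acc += ps
        if acc >= 0.5:
            return s

codebook = [format(s, "07b") for s in range(16)]   # illustrative codebook
noisy = [0.1, -0.2, 0.0, 0.2, 0.9, 0.1, -0.1]      # roughly the codeword of symbol 4
print(bayes_decode_mean(noisy, codebook, 0.2))     # 4
```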
Since π is the uniform distribution by our assumption, (1/N) Σ_i L(s*(i), s(i)) is an unbiased estimator of the Bayes risk. Thus the Bayes estimator is the appropriate choice in order to minimize (1/N) Σ_i L(s*(i), s(i)), which is equal to e_δ if

L(s*, s) = δ(s, s*).    (1)

This analysis shows that we should choose the loss function according to Eq. (1).

6 An iterative improvement approach to finding good codebooks

In the previous section we determined the optimal decoding scheme for a specific e_δ and a specific codebook. Next we need to find an optimal codebook Φ that minimizes e_δ further. While Shannon's channel coding theorems show the existence of an optimal codebook asymptotically, finding a good codebook for arbitrary lengths has proven to be a difficult task. If the error metric is, for instance, a Hamming distance in the codeword space, then traditional coding constructions (e.g. Hamming codes, block codes) can generate codebooks efficiently via linear algebraic techniques, although their performance is poor as the codeword length grows. In fact, even for the Hamming distance metric, a general construction of optimal codes does not exist. In our case, the error is a general error in the source symbol space, and such techniques are not applicable anymore. On the other hand, solving the problem via exhaustive search is not feasible for codebooks of large lengths. For many optimization problems where there is no gradient or the gradient is difficult to compute, gradient-based nonlinear programming algorithms are not applicable. In this case, AI-based optimization heuristics have proven to be useful for finding good and near-optimal solutions to such problems in much less time than an exhaustive search would entail. In [6] such techniques have been used to find codebooks minimizing the Hamming distance for small codebooks.
One of the contributions of this paper is to explore whether such AI-based optimization methods can be used to construct a good codebook based on the bit-dependent error metric. To this end, we use the following heuristic. We find a codebook that minimizes the objective function v = Σ_{i≠j} δ(s_i, s_j) p_c(Φ(s_i) | Φ(s_j)), where p_c(x | y) for x, y ∈ C is the probability of receiving codeword x when codeword y is transmitted. The reason for choosing v in this form is that, for a fixed symbol s, the quantity Σ_{s_i≠s} δ(s_i, s) p_c(Φ(s_i) | Φ(s)) is an estimate of the mean error between the transmitted symbol and the received symbol when s is transmitted, obtained by sampling only on the codebook, and thus v is an estimate of the mean error when the source symbols are transmitted with equal probability. For the AWGN channel with zero mean and variance σ², p_c(x | y) is proportional to e^(−d_H(x,y)/(2σ²)), where d_H is the Hamming distance. Thus we get

v = Σ_{i≠j} δ(s_i, s_j) e^(−d_H(Φ(s_i), Φ(s_j))/(2σ²)).

We use several search algorithms, including a genetic algorithm [7] and various types of hill-climbing algorithms [8], to find codebooks that minimize v. To ensure the codewords are distinct, we add a penalty term to the fitness or objective function whenever a candidate codebook satisfies d_H(Φ(s_i), Φ(s_j)) = 0 for some i ≠ j. Note that the optimal codebook depends on the error function e_δ and the channel probability p_c. For the genetic algorithm, the codebook as a whole can be represented as a binary string forming a chromosome. Our experiments found that a simple genetic algorithm with one-point crossover and swap mutation performs better than hill-climbing in minimizing v.
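A sketch of the objective v and a basic hill-climbing loop over codebooks (single bit-flip moves, with a penalty whenever two codewords coincide); the paper's genetic algorithm with one-point crossover and swap mutation would replace the inner loop, and the parameters below are illustrative:

```python
import math, random

n, k, sigma = 7, 4, 1.0
delta = lambda si, sj: (si - sj) ** 2  # delta_2 in the symbol space

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def objective(codebook):
    """v = sum_{i != j} delta(s_i, s_j) * exp(-d_H(c_i, c_j) / (2 sigma^2)),
    plus a large penalty whenever two codewords coincide."""
    v = 0.0
    m = len(codebook)
    for i in range(m):
        for j in range(m):
            if i != j:
                d = hamming(codebook[i], codebook[j])
                v += delta(i, j) * math.exp(-d / (2 * sigma ** 2))
                if d == 0:
                    v += 1e6  # penalty: codewords must be distinct
    return v

random.seed(0)
cb = [[random.randint(0, 1) for _ in range(n)] for _ in range(2 ** k)]
best = objective(cb)
for _ in range(1000):  # hill climbing with single-bit-flip moves
    i, b = random.randrange(2 ** k), random.randrange(n)
    cb[i][b] ^= 1
    cand = objective(cb)
    if cand < best:
        best = cand
    else:
        cb[i][b] ^= 1  # revert the move
print(best)
```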
7 Comparison with related work

In unequal error protection (UEP), different classes of messages are assigned different priorities and different ECC methods are applied to each class [9, 10]. While there are similarities, in the sense that data of higher importance should have more protection against communication errors, there are some differences between UEP and the current approach. In particular, in most prior UEP frameworks, the data is classified into multiple priority classes, different ECC schemes are applied independently to each class, and the errors between the different classes do not trade off against each other. On the other hand, in the current approach the bits have different significance and they all contribute to the same objective (e.g. contributing to the value of an integer), and thus their error correction should be treated holistically. To do this, we formulate an optimization problem where the objective function combines the various bits with different significances, and thus such a trade-off is possible. Furthermore, a Bayes optimal decoder is proposed here which for certain objective functions has a simple form and implementation. While ML approaches have been applied to classical ECC code design (e.g. [6]), they have not been applied to code design with different bit significance. Also, in contrast to [6], the current approach can be data-driven and extendable to use information extracted from empirical channel data (see Section 8).

8 Numerical results

In this section, we illustrate this framework by comparing the optimized codebooks and Bayes decoding with some well known linear block codes and decoding methods. To show that this error metric is different from the traditional Hamming distance metric, we show that codes optimized for this metric perform better than Hamming codes under this metric.
Note that Hamming codes are perfect codes achieving the Hamming bound and thus are optimal under the Hamming distance metric.

8.1 Block code of rate 4/7

In this case k = 4. Let the symbols be 0, 1, ..., 15, and each symbol is mapped to a length-7 binary string. We constructed an optimized codebook assuming an AWGN channel with σ = 1. We simulate the performance of this codebook and that of the Hamming (7,4) code [11]. We choose Hamming codes as a baseline as they are perfect codes and achieve the Hamming sphere-packing bound. However, as our experiments show, even though the Hamming code is optimal with respect to the Hamming distance, it is not optimal when considering other types of error metrics. We consider 3 decoding schemes:

1. Hard decoding: the received signal is quantized to a binary word and the decoded codeword is the closest codeword with respect to the Hamming distance.
2. Soft decoding: the decoded codeword is the nearest codeword with respect to the Euclidean distance.
3. Bayes decoding, as described above.

For an AWGN channel the received noisy word is c′ = c + n, and the Bayes estimator requires knowing (or estimating) the variance of the noise. With the assumption that the codewords are equiprobable, the variance of the codeword bits can easily be computed as the population variance of the bits of codebook Φ. Since the noise is uncorrelated with the codewords, the variance of the noise can be estimated by subtracting the variance of the codeword bits from the estimated variance of the received noisy bits. We consider both error metrics e_δ1 and e_δ2. As described earlier, the Bayes estimator is simply the median and the mean of the posterior distribution in the symbol space S, respectively. Furthermore, the codebook used is optimized by minimizing v using a genetic algorithm for e_δ1 (resp. e_δ2).
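The noise variance estimation step can be sketched as follows: with equiprobable codewords and noise uncorrelated with the codewords, Var(received) = Var(codeword bits) + Var(noise), so σ̂² is the difference of the two variances. The codebook and true noise level below are illustrative:

```python
import random

def population_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

codebook = [[int(b) for b in format(s, "07b")] for s in range(16)]  # illustrative
var_code = population_variance([b for c in codebook for b in c])

# Simulate received samples: bits of random codewords plus Gaussian noise.
random.seed(1)
sigma_true = 0.4
received = [b + random.gauss(0.0, sigma_true)
            for _ in range(10_000)
            for b in random.choice(codebook)]

# Noise is uncorrelated with the codeword bits, so the variances add.
sigma2_hat = population_variance(received) - var_code
print(round(sigma2_hat, 2))  # close to sigma_true**2 = 0.16
```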
In particular, for e_δ1, an optimized codebook found by the genetic algorithm is:

0 1 1 0 0 0 0
0 1 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 1
0 0 0 1 0 0 0
0 1 0 1 0 1 0
0 1 0 1 1 1 0
1 1 0 1 1 0 0
1 1 0 1 1 0 1
1 1 1 1 1 0 1
1 1 1 1 0 1 1
1 0 1 1 0 1 1
0 0 1 1 1 1 1
1 0 1 1 1 1 1
1 0 1 0 1 1 1
1 0 1 0 1 1 0

where each row is a codeword; the first codeword corresponds to source symbol 0 and the last codeword corresponds to source symbol 15. Because of the additive white noise assumption, permuting the columns of this codebook will not affect the error rate. On the other hand, permuting the rows of the codebook (which corresponds to permuting the codewords) will affect the error rate. This is in contrast to the Hamming distance error metric (for which the Hamming code is optimized), which does not change under permutation of the codewords. As the error metric measures the difference between symbols as integers, the Hamming distances of the codewords in this optimized codebook are correlated with the differences of the symbols as integers, not as bitstrings. For instance, recall that 7 and 8 differ by 1 as integers but have a Hamming distance of 4 as bitstrings. In this codebook the Hamming distance between the codewords for 7 and 8 is 1. Similarly, 8 and 0 differ by 8 as integers but have a Hamming distance of 1 as bitstrings. In the codebook, the codewords for 0 and 8 have a Hamming distance of 5. This is further illustrated in Fig. 2, where we show a plot of the Hamming distance of the codewords c_i and c_j versus |i − j|. Ideally, we want a linear relationship between the Hamming distance of the codewords for i and j and log |i − j|, i.e., numbers that are further apart numerically should have corresponding codewords with larger Hamming distance.
We see that the optimized codebook does a much better job at satisfying this requirement than the Hamming (7,4) code, whose codewords only have pairwise Hamming distances of 3, 4 or 7.

Figure 2: Hamming distance of codewords of i and j versus |i − j| for rate 4/7.

For e_δ2, an optimized codebook is:

0 0 1 0 1 1 1
0 0 1 0 0 1 1
1 0 1 0 1 1 1
0 0 1 0 1 1 0
1 0 1 0 0 1 1
1 0 1 0 0 1 0
1 0 1 1 0 1 0
1 0 0 0 0 1 0
1 1 1 0 0 0 0
1 1 0 0 0 0 1
1 1 0 1 0 0 1
1 1 0 1 0 0 0
1 1 0 1 1 0 1
1 1 0 1 1 0 0
0 1 0 1 0 0 0
0 1 0 1 1 0 0

To test the performance of the optimized codebooks and the various decoding methods, we simulated the system using 10^6 random symbols encoded with the codebook, modulated with baseband BPSK and transmitted through an AWGN channel at various signal-to-noise ratios (SNR). We estimate the variance of the noise as described above using 10^4 samples at the receiver. The results of e_δ1 and e_δ2 versus SNR are shown in Figs. 3 and 4 using the respective optimized codebook and a Bayes decoder, optimized for SNR = 0 dB. We observe the following in both figures: hard decoding is worse than soft decoding, which is worse than Bayes decoding. In all three decoding schemes, the optimized codebook performs better than the Hamming code. For e_δ1, soft decoding performs almost as well as Bayes decoding. For hard decoding, the optimized codebook can be worse than the Hamming code for large SNR. This is because the codebook is tuned for a specific noise SNR, and the performance improvement over the Hamming code decreases (and can become inferior) as the SNR deviates from the tuned SNR (0 dB in our examples). For e_δ2, using both an optimized codebook and the Bayes decoder results in an error that is about a third smaller than the Hamming (7,4) code with hard decoding.
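The simulation loop can be sketched as follows: baseband BPSK maps bit b to 2b − 1, and e_δ2 is estimated over random symbols. The codebook, the trial count, the SNR normalization (an Eb/N0-style convention) and the use of plain soft (nearest-Euclidean) decoding are all illustrative assumptions, not the paper's exact setup:

```python
import math, random

random.seed(2)
codebook = [[int(b) for b in format(s, "07b")] for s in range(16)]  # illustrative
bpsk = [[2 * b - 1 for b in c] for c in codebook]  # bit b -> 2b - 1

snr_db = 0.0
sigma = math.sqrt(1.0 / (2 * 10 ** (snr_db / 10)))  # one common SNR normalization

def soft_decode(y):
    """Nearest codeword in Euclidean distance (soft decoding)."""
    return min(range(16),
               key=lambda s: sum((yi - xi) ** 2 for yi, xi in zip(y, bpsk[s])))

total, trials = 0, 20_000
for _ in range(trials):
    s = random.randrange(16)
    y = [x + random.gauss(0.0, sigma) for x in bpsk[s]]
    total += (s - soft_decode(y)) ** 2  # delta_2 in the symbol space

print(total / trials)  # Monte Carlo estimate of e_delta2
```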
Finally, the benefits of using the optimized codebook and Bayes decoding are more significant for e_δ2 than for e_δ1.

8.2 Block code of rate 3/8

We repeated the same experiment with k = 3 and 8-bit codewords and compared the optimized codebook with the Hadamard (8,3) code. The simulation results for δ_2 are shown in Fig. 5, which shows similar trends as Fig. 4.

8.3 Block code of rate 8/12

Next, we consider a rate 8/12 code that maps 8-bit integers to 12-bit codewords and compare an optimized codebook with the modified Hamming (12,8) code.

Figure 3: e_δ1 versus SNR for the rate 4/7 optimized code compared with the Hamming (7,4) code.
Figure 4: e_δ2 versus SNR for the rate 4/7 optimized code compared with the Hamming (7,4) code.
Figure 5: e_δ2 versus SNR for the rate 3/8 optimized code compared with the Hadamard (8,3) code.

The simulation results for δ_2 are shown in Fig. 6, which again shows similar trends as Fig. 4. On the other hand, because of the slow convergence, the resulting optimized codebook obtained after a finite stopping time is far from optimal (as we will see in Section 8.4). Analogous to Fig. 2, we plot the Hamming distance of codewords of i and j versus |i − j| in Fig. 7, and again the Hamming code is clustered around a few Hamming distances.

8.4 Linear block codes

Searching the entire set of (n, k) block codes for an optimal codebook corresponds to searching the space of 2^k by n 0-1 matrices. In this section we restrict ourselves to the set of linear block codes³, which are defined by a k by n generator 0-1 matrix G. In this case, for a symbol expressed as a 1 by k row vector x of 0's and 1's, the corresponding codeword is the 1 by n row vector given by xG, where the arithmetic is done in the field F_2.
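The encoding map c = xG over F₂ can be sketched directly; the G below is the rate 4/7 nonstandard-form generator matrix reported later in this section, as parsed from the garbled text, so treat the specific entries as a best-effort reading:

```python
# Generator matrix of the optimized rate 4/7 linear code (nonstandard form),
# as parsed from the text.
G = [
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]

def encode(x, G):
    """Codeword c = x G with all arithmetic in the field F_2."""
    n = len(G[0])
    return [sum(x[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n)]

# Symbol 9 = 1001 as a row vector -> its codeword (sum of rows 1 and 4 mod 2).
print(encode([1, 0, 0, 1], G))  # [1, 1, 0, 1, 0, 1, 1]
```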
Since the space of generator matrices is much smaller than the space of block codes, the search can be done more efficiently, and the suboptimality of searching the subspace of linear codes is compensated by faster convergence, in particular when the size of the codebook is large. We find an optimized linear block code by using a genetic algorithm to search the space of generator matrices. We show in Fig. 8 the rate 4/7 linear block code optimized for δ_2. Compared with Fig. 4, we see that this code performs worse than the code optimized over all codebooks, but it still performs better than the Hamming (7,4) code over a range of SNR. The generator matrix G (in nonstandard form) is given by:

0 1 0 1 0 1 1
0 0 0 0 1 0 0
0 0 1 0 0 0 0
1 0 0 0 0 0 0

³ of which the Hamming codes are an example.

Figure 6: e_δ2 versus SNR for the rate 8/12 optimized code compared with the Hamming (12,8) code.
Figure 7: Hamming distance of codewords of i and j versus |i − j| for rate 8/12.

On the other hand, when we consider the optimized linear block code for rate 8/12 and compare it with the Hamming (12,8) code, the conclusions are different. Because of the size of the codebook, searching the codebook space is much slower, and after a similar number of generations of running the genetic algorithm, optimizing the linear code (Fig. 9) results in a better codebook than searching over the codebook space (Fig. 6).
This indicates that the codebook obtained by searching the codebook space is far from optimal. The generator matrix of the optimized linear code is given by:

0 1 1 0 1 0 0 1 0 1 0 0
1 0 0 1 0 1 1 1 1 0 1 1
1 0 0 0 0 0 1 0 1 0 1 1
1 0 0 0 0 0 0 0 1 0 1 0
1 0 0 1 0 0 1 0 1 0 0 1
1 0 0 1 0 0 1 0 1 0 1 1
0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 1 0

In Fig. 10 we also plot the Hamming distance of two codewords c_i and c_j versus |i − j| for the optimized linear rate 8/12 code. Compared with Fig. 7, this is another indication that the optimized linear code is better than the optimized codebook, due to the faster convergence and the finite stopping time.

Figure 8: e_δ2 versus SNR for the rate 4/7 optimized linear code compared with the Hamming (7,4) code.
Figure 9: e_δ2 versus SNR for the rate 8/12 optimized linear code compared with the Hamming (12,8) code.

9 Signed integers in two's complement format

So far we have considered nonnegative integers represented as bitstrings. The same approach can be applied to signed integers; of course, the optimal codebook can be very different in this case. Consider the case of representing positive and negative integers −2^(k−1) ≤ i < 2^(k−1) in k-bit two's complement format. Note that in this case the Hamming distance between 0 and −1 is k. For the case k = 4, rate 4/7, the optimized codebook for e_δ2 is:
Figure 10: Hamming distance of codewords of i and j versus |i − j| for rate 8/12.

0 0 0 1 0 1 1
0 0 0 0 1 1 1
0 0 0 1 1 1 1
1 0 0 1 1 1 1
0 1 0 1 1 1 1
1 1 0 0 1 1 1
1 1 0 1 1 1 1
1 1 1 1 1 1 1
1 1 1 0 0 0 0
0 1 1 0 0 0 0
1 0 1 0 0 0 0
0 0 1 0 0 0 0
0 0 1 1 0 0 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 0 0 1 1 0 0

where the first, 8th, 9th and last rows correspond to s = 0, s = 7, s = −8 and s = −1, respectively. The numerical results are shown in Fig. 11, which again are very similar trendwise to Fig. 4.

Figure 11: e_δ2 versus SNR where the source symbols are represented in two's complement format.

11 A new number format for computer memory

Soft errors in semiconductors can cause the data in memory devices to randomly undergo bit flipping errors [12]. This problem will be of more concern in emerging technologies such as memristive memories [13], quantum memories [14], and chemical and biological memories [15], where the probability of a soft error can be unpredictable and large. Furthermore, when data is stored in memory for an extended amount of time, the error rate will also increase. Error correction circuitry can be added to memory chips, but this adds additional cost, energy and latency. In most numerical computing applications, the tolerance to error is low, and thus memory components have very low error rates. In this section, we consider applications such as deep learning where such errors are tolerated, as the input is noisy and imprecise. In addition, significant noise is added in stochastic rounding schemes to improve the performance of low precision deep learning [16] and ODE solvers [17]. Similarly, in stochastic computing [18, 19], the data is encoded in stochastic pulses. In these applications, memory elements with very high error rates can be tolerated, and the goal of this section is to find encoding schemes for such memory elements where the mean error is minimized.

Define N_k as the set of integers {0, 1, ..., 2^k − 1}. We consider the problem of storing numbers in N_k in k bits of storage. Let us denote the set of k-bit binary strings as S_k. The storage encoding corresponds to a bijection σ from N_k to S_k, i.e. there are (2^k)! different encodings σ possible.
The canonical encoding, which we denote as σ*, is defined as mapping each member of N_k to its representation in binary, i.e. σ*(0) = 00···00, σ*(1) = 00···01, etc., and is equal to b_k defined in Section 2. With this canonical bijection, we can use the notation N_k and S_k interchangeably, depending on context. Let us assume that each bit in a k-bit memory unit can flip independently with probability p ∈ [0, 1]. The value of p is unknown, but follows a distribution X_p. Consider a number x ∈ N_k drawn from a probability distribution X with support in N_k and stored in the memory unit as a bitstring σ(x). Due to the error in the memory unit, the bitstring σ(x) will change to a bitstring s ∈ S_k, where each bit changes parity with probability p, and its corresponding number σ⁻¹(s) will be a random variable Y (that depends on p). Since p is unknown, we will not apply any processing to σ⁻¹(s) to recover x. We assume that X and X_p are independent. Define E_{x,p} = E_Y(d(x, Y) | X = x, X_p = p) and E_p = E_{X,Y}(d(X, Y) | X_p = p). The mean error is then given by E = E_{X_p}(E_p) = ∫₀¹ p_{X_p}(q) E_q dq. In the sequel we assume X to be the uniform distribution on N_k and X_p to be the uniform distribution on [0, 1]. First let us consider the case d(x, y) = ‖x − y‖². Numerical experiments indicate that when p ≤ 0.5, the canonical encoding σ* is minimal among all encodings. For instance, for the case k = 3, we show in Fig. 12 the error E_p for the canonical encoding and the error E_p for the optimal encoding (among all 8! = 40320 permutations) at each p. It shows that for p > 0.5, the optimal encoding can have substantially lower E_p than the canonical encoding σ*. Since p is unknown, we want to find a single encoding σ such that E (i.e. E_{X_p}(E_p)) is minimal.
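A sketch estimating E_p by Monte Carlo for a given encoding σ (here represented as a permutation of N_k), with d(x, y) = (x − y)²; the values of k, p and the trial count are illustrative:

```python
import random

k = 3

def estimate_Ep(sigma, p, trials=50_000, seed=3):
    """Monte Carlo estimate of E_p = E[(x - Y)^2] for encoding sigma,
    with x uniform on N_k and each stored bit flipping independently with prob p."""
    rng = random.Random(seed)
    inv = {v: u for u, v in enumerate(sigma)}   # sigma^{-1}
    total = 0.0
    for _ in range(trials):
        x = rng.randrange(2 ** k)
        s = sigma[x]
        for b in range(k):                      # flip each bit with probability p
            if rng.random() < p:
                s ^= 1 << b
        total += (x - inv[s]) ** 2
    return total / trials

canonical = list(range(2 ** k))                 # sigma*(n) = n
print(estimate_Ep(canonical, 0.9))             # large p: canonical does poorly
```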
This figure suggests that if X_p has support in [0, 1/2], then the canonical encoding has minimal mean error. However, if p can range over all of [0, 1], then there are other encodings for which E is lower than for the canonical encoding. Based on these numerical experiments, we conjecture the following:

Conjecture 11.1 If p ≤ 0.5, the canonical encoding has minimal error among all encodings. There exists an encoding with minimal error among all encodings for all p ≥ 0.5.

The minimal encoding for p ≥ 0.5 conjectured to exist in Conjecture 11.1 is difficult to compute, as it needs to be optimal for all p ≥ 0.5. Even if we restrict attention to a small set of values of p, for large k the minimal encoding is hard to find due to combinatorial explosion. One of the purposes of this section is to introduce an encoding that is easy to define and implement and that appears to be near optimal. In particular, consider the encoding σ_c : N_k → N_k, where σ_c(2n) = n and σ_c(2n + 1) = M − n, with M = 2^k − 1. Tables 1 and 2 show the formulas for sums and products of numbers encoded in this format. These formulas show that addition, subtraction and multiplication can be performed on fixed-point nonnegative numbers encoded in this format via relatively straightforward modification of standard arithmetic circuits. This means that an arithmetic logic unit (ALU) for this number format should have similar hardware complexity to that for the traditional number format.

Fig. 12 shows that the error E_p for σ_c is close to optimal for k = 3 and p ≥ 0.5.

Figure 12: Canonical vs best encoding vs σ_c (k = 3) for d(x, y) = ‖x − y‖².

Figure 13: Canonical vs best among 10^6 encodings vs σ_c (k = 4) for d(x, y) = ‖x − y‖².

Fig. 13 shows for k = 4 and each p the minimal E_p among 10^6 random
bijective encodings. In this case, this minimal E_p is indistinguishable from the error for σ_c for p ≥ 0.5.

Table 1: Sum and difference formulas for σ_c.

                  σ_c(a + b)                   σ_c(a − b)
a, b even         σ_c(a) + σ_c(b)              σ_c(a) − σ_c(b)
a even, b odd     σ_c(b) − σ_c(a)              σ_c(a + 1) + σ_c(b + 1)
a odd, b even     σ_c(a) − σ_c(b)              σ_c(a) + σ_c(b)
a, b odd          2M + 1 − σ_c(a) − σ_c(b)     σ_c(b) − σ_c(a)

Table 2: Product formulas for the encoding σ_c.

                  σ_c(ab)
a, b even         2 σ_c(a) σ_c(b)
a even, b odd     σ_c(a)(2M − 2σ_c(b) + 1)
a, b odd          (2M + 1)(σ_c(a) + σ_c(b) − M) − 2 σ_c(a) σ_c(b)

Figure 14 shows that the mean error E is smaller for σ_c than for σ∗ for various values of k, and the two appear asymptotically to differ by a multiplicative constant ≈ 1.579. This indicates that using σ_c rather than the canonical encoding σ∗ will result in a lower average error when the bit-flipping probability p is unknown and can range from 0 to 1. The corresponding figures for d(x, y) = ‖x − y‖ are shown in Figs. 15 and 16. Numerical results show similar behavior when N_k is the set of integers −2^{k−1}, ···, 0, 1, ···, 2^{k−1} − 1 in k-bit 2's complement format.

12 Concluding remarks

We proposed a framework for error correction coding that takes into account the difference in bit significance in the source symbols by using an appropriate error metric and minimizing it using a Bayes decoder and an optimized codebook derived from iterative improvement search techniques. We showed that the Bayes decoder performs better than standard soft and hard minimum distance decoding and that the optimized codebook performs better than classical linear block codes such as Hamming codes. The error metric based on the difference |i − j| is similar to assigning an exponential weight 2^d to the d-th bit.
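The sum, difference and product identities for σ_c in Tables 1 and 2 can be checked exhaustively for small k, restricting to operands whose sum, difference or product stays in N_k. The sketch below does this for k = 4; in the (a even, b odd) difference case, the identity that verifies is σ_c(a − b) = σ_c(a + 1) + σ_c(b + 1):

```python
def sigma_c(x, M):
    """The encoding sigma_c: sigma_c(2n) = n, sigma_c(2n+1) = M - n."""
    return x // 2 if x % 2 == 0 else M - (x - 1) // 2

k = 4
M = 2 ** k - 1
S = lambda x: sigma_c(x, M)

# sigma_c is a bijection on N_k = {0, ..., 2^k - 1}
assert sorted(S(x) for x in range(2 ** k)) == list(range(2 ** k))

for a in range(2 ** k):
    for b in range(2 ** k):
        if a % 2 == 0 and b % 2 == 0:
            if a + b <= M: assert S(a + b) == S(a) + S(b)
            if a - b >= 0: assert S(a - b) == S(a) - S(b)
            if a * b <= M: assert S(a * b) == 2 * S(a) * S(b)
        elif a % 2 == 0 and b % 2 == 1:
            if a + b <= M: assert S(a + b) == S(b) - S(a)
            if a - b >= 0: assert S(a - b) == S(a + 1) + S(b + 1)
            if a * b <= M: assert S(a * b) == S(a) * (2 * M - 2 * S(b) + 1)
        elif a % 2 == 1 and b % 2 == 0:
            if a + b <= M: assert S(a + b) == S(a) - S(b)
            if a - b >= 0: assert S(a - b) == S(a) + S(b)
        else:  # a, b both odd
            if a + b <= M: assert S(a + b) == 2 * M + 1 - S(a) - S(b)
            if a - b >= 0: assert S(a - b) == S(b) - S(a)
            if a * b <= M: assert S(a * b) == (2 * M + 1) * (S(a) + S(b) - M) - 2 * S(a) * S(b)
print("all identities verified for k =", k)
```

Since each formula involves only σ_c values of the operands plus fixed constants, an ALU for this format can reuse standard adder/multiplier circuits with a small amount of case logic on the low-order (parity) bits, consistent with the hardware-complexity remark above.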
The same approach can be applied to other ways of assigning significance to the various bits in the source bit stream by defining δ appropriately. In addition, even though we have been discussing the bits as independent in terms of their significance, the loss function L described above is more general in the sense that it compares two different source symbols as a whole, not just bitwise, and thus can take into account the correlation among bits.

Figure 14: Mean error E for the canonical encoding vs σ_c for d(x, y) = ‖x − y‖².

Figure 15: Canonical vs best encoding vs σ_c (k = 3) for d(x, y) = ‖x − y‖.

Figure 16: Mean error E for the canonical encoding vs σ_c for d(x, y) = ‖x − y‖.

Furthermore, we have considered the AWGN channel model in this paper, but the Bayes estimator can be defined and the codebook can be optimized for other channel models (such as the BSC) as well. In such cases, deep neural networks can be used to estimate the unknown channel and noise characteristics [20]. For instance, a data-driven approach can be used to estimate the probabilities p_c(Φ(s_i) | Φ(s_j)) in the definition of v. In the examples above, the symbol space S has size 2^k. This is not a necessary requirement; the optimized codebook can be built to map any number m of symbols to n-bit codewords, resulting in a rate log_2(m)/n code, thus providing a more flexible tradeoff between rate and distortion. The cases of a nonuniform prior distribution for the source symbols or the addition of source (entropy) coding result in a more complicated source symbol posterior distribution, Bayes estimator and function v used in the heuristic. In addition, the function v can be amended to optimize for multiple noise SNRs. Furthermore, noise models and parameters can be estimated by sending intermittent probe bits through the channel [21].
Finally, different codebooks can be used (and the codebook choice communicated between the transmitter and the receiver) based on the (estimated) noise models and/or parameters. In order to use a single codebook minimizing a specific objective function v, we assume the same semantics for the data that is being transmitted. This is a reasonable assumption, especially for the data transfer of large scale data sets where nearly all of the data are of the same type. For instance, in the transmission of speech or other multimedia information, various header information is sent to allow the receiver to know what type of data (e.g. DCT coefficients, ASCII, etc.) are being sent, and different codebooks can be used depending on the type of data being sent. These topics will be discussed in a future paper. Finally, for memory storage where the bit flipping probability can be large and unknown, a novel number encoding format is proposed that has a smaller expected error than the traditional encoding format.

References

[1] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard. Van Nostrand Reinhold, 1993.

[2] G. Stefanakis, V. Nagarajan, and M. Cintra, "Understanding the effects of data corruption on application behavior based on data characteristics," in International Conference on Computer Safety, Reliability, and Security, pp. 151–165, Springer, 2014.

[3] S. Haykin and M. Moher, Communication Systems. Wiley, 5th ed., 2009.

[4] S. Roman, Introduction to Coding and Information Theory. Springer, 1997.

[5] R. W. Keener, Theoretical Statistics: Topics for a Core Course. Springer, 2010.

[6] W. Haas and S. K. Houghten, "A comparison of evolutionary algorithms for finding optimal error-correcting codes," in CI '07 Proceedings of the Third IASTED International Conference on Computational Intelligence, pp. 64–70, 2007.

[7] D. E.
Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.

[8] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 3rd ed., 2010.

[9] S. Borade, B. Nakiboğlu, and L. Zheng, "Unequal error protection: An information-theoretic perspective," IEEE Transactions on Information Theory, vol. 55, pp. 5511–5539, Dec. 2009.

[10] M. Trott, "Unequal error protection codes: Theory and practice," in Proc. IEEE Information Theory Workshop, 1996.

[11] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 1991.

[12] C. Slayman, "Soft error trends and mitigation techniques in memory devices," in 2011 Proceedings - Annual Reliability and Maintainability Symposium, pp. 1–5, 2011.

[13] P. Pouyan, E. Amat, and A. Rubio, "Reliability challenges in design of memristive memories," in 2014 5th European Workshop on CMOS Variability (VARI), pp. 1–6, 2014.

[14] J.-L. Le Gouët and S. Moiseev, "Quantum memory," Journal of Physics B: Atomic, Molecular and Optical Physics, vol. 45, no. 12, p. 120201, 2012.

[15] L. Ceze, J. Nivala, and K. Strauss, "Molecular digital data storage using DNA," vol. 20, pp. 456–466, 2019.

[16] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep learning with limited numerical precision," in Proceedings of the 32nd International Conference on Machine Learning (F. Bach and D. Blei, eds.), vol. 37 of Proceedings of Machine Learning Research, (Lille, France), pp. 1737–1746, PMLR, Jul 2015.

[17] M. Hopkins, M. Mikaitis, D. R. Lester, and S. Furber, "Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ordinary differential equations," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 378, no. 2166, article 20190052, 2020.

[18] A. Alaghi and J. P.
Hayes, "Survey of stochastic computing," ACM Transactions on Embedded Computing Systems, vol. 12, no. 2s, pp. 1–19, 2013.

[19] Y. Liu, S. Liu, Y. Wang, F. Lombardi, and J. Han, "A survey of stochastic computing neural networks for machine learning applications," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–16, 2020.

[20] H. Ye, G. Y. Li, and B.-H. Juang, "Power of deep learning for channel estimation and signal detection in OFDM systems," IEEE Wireless Communications Letters, vol. 7, no. 1, pp. 114–117, 2018.

[21] N.-H. Ahn, T.-G. Chang, and H. Kim, "A systematic method of probing channel characteristics of home power line communication network," in Digest of Technical Papers, International Conference on Consumer Electronics, 2002.