Fast Maximum-Likelihood Decoding of the Golden Code

Mohanned O. Sinnokrot and John R. Barry
School of ECE, Georgia Institute of Technology, Atlanta, GA 30332 USA, barry@ece.gatech.edu

Abstract: The golden code is a full-rate full-diversity space-time code for two transmit antennas that has a maximal coding gain. Because each codeword conveys four information symbols from an $M$-ary quadrature-amplitude modulation alphabet, the complexity of an exhaustive search decoder is proportional to $M^4$. In this paper we present a new fast algorithm for maximum-likelihood decoding of the golden code that has a worst-case complexity of only $O(2M^{2.5})$. We also present an efficient implementation of the fast decoder that exhibits a low average complexity. Finally, in contrast to the overlaid Alamouti codes, which lose their fast decodability property on time-varying channels, we show that the golden code is fast decodable on both quasistatic and rapid time-varying channels.

Index Terms: space-time coding, time-varying channels.

I. INTRODUCTION

A wireless communication system having multiple antennas at both the transmitter and receiver is capable of achieving higher data rates and greater robustness to fading than a single-antenna system. These gains can be achieved in part through the use of space-time coding [1, 2]. The golden code is a space-time code for two transmit and two or more receive antennas that was proposed independently in [3] and [4]. The golden code has many advantages: it is a full-rate code (since it transmits four complex information symbols in two signaling intervals); it is a full-diversity code; it has a maximal coding gain; and in terms of the SNR required to achieve a target error probability, it performs better than all previously reported full-rate codes with two transmit antennas.
Furthermore, the coding gain of the golden code is independent of the alphabet size, which has two benefits: first, it ensures that the golden code achieves the full diversity-multiplexing frontier of Zheng and Tse [5] (see also [6]); and second, from a practical standpoint it makes the golden code compatible with adaptive modulation. For these reasons, the golden code has been incorporated into the 802.16e WiMAX standard [7].

The golden code comes in three variations: the Belfiore-Rekaya-Viterbo golden code [3], the Dayal-Varanasi golden code [4], and the WiMAX golden code [7]. These variations are isomorphic, in the sense that one can be transformed into another by multiplying on the left and right by unitary matrices [4]. Since the determinant is invariant to such transformations, all three variations have identical rate, diversity, and coding gain. Furthermore, we will show that they all have the same decoding complexity. For the sake of concreteness, this paper focuses on decoding of the Dayal-Varanasi golden code [4]. Nevertheless, we emphasize that the results presented here are applicable to all three variations [3][4][7].

The golden code applied to a system with two receive antennas leads to an effective four-input four-output MIMO channel that maps each block of four $M$-ary information symbols to a corresponding vector of four complex-valued received samples [8]. An exhaustive search maximum-likelihood (ML) decoder would consider each of the $M^4$ possible input vectors in turn and choose as its decision the one that best represents the channel output in a minimum-Euclidean-distance sense. Therefore, the complexity of such an ML decoder is proportional to $M^4$. Significant reductions in average complexity are possible by adopting a tree-based ML decoder such as a sphere decoder.
However, for certain alphabets, the worst-case complexity of a sphere decoder for an arbitrary channel is no better than that of an exhaustive search. For this reason, it has been widely reported that the worst-case decoding complexity of the golden code grows with the fourth power of the signal constellation size [9, 10, 11, 12, 13].

The worst-case complexity depends dramatically on the alphabet type. In particular, when an arbitrary $M$-ary alphabet is replaced by the practically important special case of a QAM alphabet, the worst-case complexity of a tree-based sphere decoder for an arbitrary four-input MIMO channel drops from $O(M^4)$ to $O(M^3)$. To see why, consider the problem of finding the “best” leaf node stemming from a particular node at the third level of the four-level tree. While one could in principle exhaustively search all $M$ possibilities in turn, the problem can be solved efficiently using a single instance of a decision device, or slicer. And for certain alphabets, like QAM, the complexity of a slicer does not grow with the size of the alphabet; rather, the worst-case complexity of a QAM slicer is $O(1)$. Specifically, a QAM slicer can be implemented as a pair of PAM slicers, with each requiring a single multiply, a single rounding operation, a single addition, and a single hard-limiting operation, none of which depends on $M$.

The perception that maximum-likelihood decoding of the golden code has high complexity has had two effects: First, it has motivated a search for suboptimal decoders for the golden code with reduced complexity and near-ML performance [14][15][16][17]. Second, it has motivated a search for alternatives to the golden code with similar performance but whose maximum-likelihood decoder has lower complexity [9][10][18].
In particular, two families of space-time codes based on the Alamouti code were proposed in [9][10] and [18]; we will refer to these codes as overlaid Alamouti codes, because these rate-two codes can be viewed as the sum of a conventional rate-one Alamouti code with a second modified rate-one Alamouti code. The overlaid Alamouti codes share the same full rate, full diversity, and nonvanishing determinant properties of the golden code, albeit with a small coding gain penalty of a fraction of a dB. Moreover, on quasistatic channels, the overlaid Alamouti codes of [9][10] and [18] admit a maximum-likelihood decoder whose worst-case complexity is only $O(M^2)$ for $M$-ary QAM alphabets.

In this paper, we describe a new maximum-likelihood decoding algorithm for the golden code with $M$-ary QAM whose worst-case complexity is $O(2M^{2.5})$. We also present an efficient implementation that has low average complexity. The proposed decoder exploits a key property of the effective channel induced by the golden code, namely that the inner product between the first and second columns is real. Furthermore, we will show that the golden code retains its fast-decodability property on time-varying channels. In contrast, we will see that the overlaid Alamouti codes of [9][10] and [18] lose their fast-decoding property on time-varying channels. Reduced complexity decoding for time-varying channels is particularly important in mobile applications, where user mobility can lead to rapid channel variations.

The remainder of the paper is organized as follows. In Section II, we review the golden code. In Section III, we introduce a new fast maximum-likelihood decoder for the golden code and show that the fast decodability is retained on time-varying channels. In Section IV, we show that the overlaid Alamouti codes are not fast decodable on time-varying channels.
In Section V, we compare the average complexity of the proposed detector to a conventional golden code detector. We conclude the paper in Section VI.

II. THE GOLDEN CODE

The golden code transmits four complex information symbols $\{x_1, x_2, x_3, x_4\}$ over two symbol periods from two antennas, so that the rate of the space-time code is two symbols per signaling interval. The transmitted codeword can be expressed as a $2 \times 2$ matrix:

$$C = \begin{bmatrix} c_1[1] & c_2[1] \\ c_1[2] & c_2[2] \end{bmatrix}, \tag{1}$$

where $c_i[k]$ denotes the symbol transmitted from antenna $i \in \{1, 2\}$ at time $k \in \{1, 2\}$. In particular, the Dayal-Varanasi golden code encodes one pair of information symbols $a = [x_1, x_2]^T$ onto the main diagonal of $C$, and it encodes a second pair of information symbols $b = [x_3, x_4]^T$ onto the off-diagonal. Specifically, the Dayal-Varanasi golden code is [4]:

$$C = \begin{bmatrix} \tilde{a}_1 & 0 \\ 0 & \tilde{a}_2 \end{bmatrix} + \varphi \begin{bmatrix} 0 & \tilde{b}_1 \\ \tilde{b}_2 & 0 \end{bmatrix}, \tag{2}$$

where:

$$\tilde{a} = \mathbf{M}a, \quad \tilde{b} = \mathbf{M}b, \quad \mathbf{M} = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix}, \quad \theta = \tfrac{1}{2}\tan^{-1}(2), \quad \text{and} \quad \varphi = e^{j\pi/4}. \tag{3}$$

As can be seen from (2), the golden code can be viewed as the sum of two rate-one diagonal algebraic space-time (DAST) codes [19].

Our model for the received signal $y_j[k]$ at receive antenna $j$ at time $k$ is given by:

$$y_j[k] = \sum_{i=1}^{2} c_i[k]\, h_{i,j}[k] + n_j[k], \tag{4}$$

where $n_j[k]$ is the complex additive white Gaussian noise at receive antenna $j$ at time $k$, and $h_{i,j}[k]$ is the channel coefficient between the $i$-th transmit antenna and $j$-th receive antenna at time $k$. For quasistatic fading, $h_{i,j}[k] = h_{i,j}$ is independent of time $k$.

III. A FAST ML DECODER FOR THE GOLDEN CODE

We begin by describing the effective channel matrix induced by the golden code. We then establish a key property of this matrix, and describe a maximum-likelihood decoder that exploits the key property to reduce complexity.

A. The Effective Channel Matrix and its Key Property

Substituting the definition of the golden code from (2) and (3) into (4), the vector of samples $y = [y_1[1], y_1[2], y_2[1], y_2[2]]^T$ received at a receiver with two antennas at the two time instances can be written as the output of an effective four-input four-output channel:

$$y = Hx + n, \tag{5}$$

where $x = [x_1, \ldots, x_4]^T$ is the vector of information symbols, $n = [n_1[1], \ldots, n_2[2]]^T$ is the noise, and where $H = \bar{H}\Psi$ is the effective channel matrix:

$$H = \underbrace{\begin{bmatrix} h_{1,1}[1] & 0 & \varphi h_{2,1}[1] & 0 \\ 0 & h_{2,1}[2] & 0 & \varphi h_{1,1}[2] \\ h_{1,2}[1] & 0 & \varphi h_{2,2}[1] & 0 \\ 0 & h_{2,2}[2] & 0 & \varphi h_{1,2}[2] \end{bmatrix}}_{\bar{H}} \underbrace{\begin{bmatrix} c & s & 0 & 0 \\ -s & c & 0 & 0 \\ 0 & 0 & c & s \\ 0 & 0 & -s & c \end{bmatrix}}_{\Psi}, \tag{6}$$

where $c = \cos(\theta)$, $s = \sin(\theta)$, $\varphi = e^{j\pi/4}$, and $\theta = \tfrac{1}{2}\tan^{-1}(2)$.

The structure of the golden code induces special properties in this effective matrix that we exploit to reduce decoding complexity. The following theorem relates these special properties to the orthogonal-triangular (QR) decomposition $H = QR$, which results from an application of the Gram-Schmidt procedure to the columns of $H = [h_1, \ldots, h_4]$, where $Q = [q_1, \ldots, q_4]$ is unitary and $R$ is upper triangular with nonnegative real diagonal elements, so that the entry of $R$ in row $i$ and column $j$ is $r_{i,j} = q_i^* h_j$.

Theorem 1. (The Key Property): The $R$ matrix in a QR decomposition $H = QR$ of the effective channel (6) has the form

$$R = \begin{bmatrix} A & B \\ O & D \end{bmatrix}, \tag{7}$$

where both of the upper triangular matrices $A$ and $D$ are entirely real.

Proof: See Appendix A.

A few remarks:

• By construction, $A = \begin{bmatrix} r_{1,1} & r_{1,2} \\ 0 & r_{2,2} \end{bmatrix}$ and $D = \begin{bmatrix} r_{3,3} & r_{3,4} \\ 0 & r_{4,4} \end{bmatrix}$ are triangular with real diagonal entries, so the key property is essentially the fact that both $r_{1,2}$ and $r_{3,4}$ are real.
• To demonstrate that $r_{1,2} = h_1^* h_2 / \|h_1\|$ is real, it is sufficient to show that the inner product between the first two columns of the effective channel matrix is real, a fact which is easily verified by direct computation:

$$h_1^* h_2 = \cos(\theta)\sin(\theta)\left( |h_{1,1}[1]|^2 - |h_{2,1}[2]|^2 + |h_{1,2}[1]|^2 - |h_{2,2}[2]|^2 \right) = \frac{1}{\sqrt{5}}\left( |h_{1,1}[1]|^2 + |h_{1,2}[1]|^2 - |h_{2,1}[2]|^2 - |h_{2,2}[2]|^2 \right). \tag{8}$$

• The theorem applies regardless of whether the channel is quasistatic or time-varying.

• The submatrix $B$ is not mentioned because all four of its entries are generally complex, regardless of whether the channel is quasistatic or time-varying.

• The fact that $r_{1,2}$ is real enables the decoder in the next section to reduce both the worst-case decoding complexity and the average decoding complexity. In contrast, the fact that $r_{3,4}$ is real enables only a reduction in average complexity. It has no impact on the worst-case complexity.

B. A Fast ML Decoder for the Golden Code

If we define $z_1 = (z_1, z_2)^T$ and $z_2 = (z_3, z_4)^T$ as the upper and lower halves of $z = Q^* y$, then the maximum-likelihood decision minimizes the cost function

$$P(x) = \|y - Hx\|^2 = \|z - Rx\|^2 = \|z_1 - Aa - Bb\|^2 + \|z_2 - Db\|^2. \tag{9}$$

The last equality follows from (7). Therefore, the ML decisions $\hat{a}$ and $\hat{b}$ can be found recursively using:

$$\hat{b} = \arg\min_{b \in \mathcal{A}^2} \left\{ \|z_1 - A a^*(b) - Bb\|^2 + \|z_2 - Db\|^2 \right\}, \tag{10}$$

$$\hat{a} = a^*(\hat{b}), \tag{11}$$

where

$$a^*(b) = \arg\min_{a \in \mathcal{A}^2} \|z_1 - Aa - Bb\|^2. \tag{12}$$

The function $a^*(b)$ in (12) can be viewed as producing the best $a$ for a given $b$. With this interpretation, the optimization in (10) can be viewed as that of finding the best $b$ when $a$ is optimized.

The optimization in (12) is equivalent to ML detection for a channel $A$ with two QAM inputs and an output of:

$$v = z_1 - Bb. \tag{13}$$

It can thus be solved by a sphere detector applied to a two-level tree.
As discussed in the introduction, with two QAM inputs and without any constraints on $A$, its worst-case complexity would be $O(M)$. But the golden code induces the special property that $A$ is entirely real, which enables (12) to determine the real components of $a$ independently from its imaginary components. Specifically, the fact that $A$ is real enables us to rewrite (12) as:

$$a^*(b) = \arg\min_{a \in \mathcal{A}^2} \left\{ \|v^R - A a^R\|^2 + \|v^I - A a^I\|^2 \right\} \tag{14}$$

$$= \arg\min_{a^R \in (\mathcal{A}^R)^2} \left\{ \|v^R - A a^R\|^2 \right\} + j \cdot \arg\min_{a^I \in (\mathcal{A}^I)^2} \left\{ \|v^I - A a^I\|^2 \right\}. \tag{15}$$

(Throughout the paper we use superscripts $R$ and $I$ to denote the real and imaginary components, respectively, so that $v^R = \mathrm{Re}\{v\}$ and $a^I = \mathrm{Im}\{a\}$.) Thus, the optimization in (12) decomposes into the two parallel optimizations of (15). Since each optimization in (15) is equivalent to ML detection for a real channel with two $\sqrt{M}$-PAM inputs, each has a worst-case complexity of $O(\sqrt{M})$. Thus, the overall complexity of (15) is $O(2\sqrt{M})$.

We have thus shown how an ML decoder can be implemented for the golden code with a worst-case complexity of $O(2M^{2.5})$: as described in (10), the ML decision can be found by stepping through each of the $M^2$ candidate values for $b$, and for each implementing the $O(2\sqrt{M})$ optimization of (15). A more efficient implementation is described in the next section.

C. A Faster ML Decoder with Low Average Complexity

The ML decoder of the previous section is fast, in the sense that it has a worst-case complexity of $O(2M^{2.5})$, but its average complexity is unnecessarily high. In this section we describe an efficient implementation of an ML decoder for the golden code that has both low average complexity and a worst-case complexity of $O(2M^{2.5})$.

A conventional sphere decoder for the golden code is based on a four-level tree, with a different $x_i$ associated with each level. In contrast, as illustrated in Fig. 1, we propose a four-level tree that associates $b^R = (x_3^R, x_4^R)$ with the first level, $b^I = (x_3^I, x_4^I)$ with the second level, $a^R = (x_1^R, x_2^R)$ with the third level, and $a^I = (x_1^I, x_2^I)$ with the fourth level. This new tree is a direct result of the fact that $A$ and $D$ are real (Theorem 1), which allows us to rewrite the ML cost function from (9) as

$$P(x) = \|v^I - A a^I\|^2 + \|v^R - A a^R\|^2 + \|z_2^I - D b^I\|^2 + \|z_2^R - D b^R\|^2. \tag{16}$$

Thus, as illustrated in Fig. 1, (16) shows that the total cost of a leaf node $x$ decomposes into the sum of four branch metrics, one per tree stage: reading (16) from right to left, the terms are the branch metrics of the first ($b^R$), second ($b^I$), third ($a^R$), and fourth ($a^I$) stages. Here $P_i$ denotes the branch metric for a branch at the $(4 - i)$-th stage of the tree.

Besides inducing a new tree structure, the fact that $D$ is real also leads to a significant reduction in the complexity of Schnorr-Euchner sorting for the first two stages of the tree. Specifically, the fact that $D$ is real leads to a second-stage branch metric, $\|z_2^I - D b^I\|^2$, that is independent of the starting node ($b^R$). Therefore, we can perform a single sort for the symbol pair ($b^R$) emanating from the root, and simultaneously a single sort for the symbol pair ($b^I$) emanating from its children. In contrast, if $D$ were complex, we would require one sorting operation for the root node, and then a separate sorting operation for each visited child. In the worst case, when every child is visited, there would be a total of $M + 1$ sorting operations required for the first two tree stages. Thus, the fact that $D$ is real enables us to reduce the number of sorts from as many as $M + 1$ to only two. The computational savings is significant, especially for large alphabets.

The pseudocode of an efficient implementation of the proposed maximum-likelihood golden code detector is shown in Fig. 2. The first five lines represent initializations. In particular, the first two lines are a QR decomposition of the effective channel matrix in (6) and the computation of $z$ in (9).
The squared sphere radius $\hat{P}$, which represents the smallest cost (16) encountered so far, is initialized to infinity to ensure ML decoding (line 3). Sorting or Schnorr-Euchner enumeration is used for faster convergence. Only two sorting operations (line 4 and line 5) are required. In the pseudocode, the complex QAM alphabet $\mathcal{A}$ is represented by an ordered list, so that $\mathcal{A}(k)$ indexes the $k$-th symbol in the list.

We next describe the remainder of the algorithm, which can be interpreted as a two-level complex sphere decoder to choose the symbol pair $b = (x_3, x_4)^T$, followed by an independent pair of two-level real sphere decoders that separately decode $a^R = (x_1^R, x_2^R)^T$ and $a^I = (x_1^I, x_2^I)^T$.

The two-level complex sphere decoder incorporates two common optimizations: radius update (line 33) and pruning (line 7, line 9). While these optimizations do not affect the worst-case complexity, they affect the average complexity significantly. The first level of the complex sphere decoder considers candidate pairs $b^R$ in ascending order of their branch metric $\|z_2^R - D b^R\|^2$ (line 6). The second level of the complex sphere decoder considers candidate pairs $b^I$ in ascending order of their branch metric $\|z_2^I - D b^I\|^2$ (line 8). After forming $b$ (line 10), the decoder removes the interference caused by $b$ and forms the two intermediate variables $v_1$ and $v_2$ of (13), which are functions of the symbols $x_1$ and $x_2$ only (line 11 and line 12).

Following the two-level complex sphere decoder and interference cancellation, the decoder decides on the symbol pairs $a^R$ and $a^I$ separately using an independent pair of two-level real sphere decoders. The function list is used to implement Schnorr-Euchner sorting for the final two stages of the tree; it returns a list of candidate symbols drawn from the $\sqrt{M}$-ary PAM alphabet $\mathcal{A}^R$, sorted in ascending order of distance to the input argument.
As described in [20], it can be implemented efficiently using a table lookup, requiring only a single rounding operation and a single addition. After initializing the sphere radius for decoding $a^R = (x_1^R, x_2^R)^T$ (line 13) and forming the sorted list of best candidate symbols (line 14), the real sphere decoder chooses the symbol $x_2^R$ that has the lowest branch metric (line 16). The interference from the symbol $x_2^R$ is then subtracted (line 18) and a decision is made on the symbol $x_1^R$ using the PAM slicer $Q(\cdot)$ (line 19); the slicer function $Q(x)$ returns the symbol from the PAM alphabet $\mathcal{A}^R$ that is closest to $x$. The branch-metric sum for the current candidate symbol pair $a^R$ is computed in line 20, and a radius update occurs if it is less than the smallest value found so far (line 21). Similar to the complex sphere decoder, the real sphere decoder includes pruning and sphere radius update (line 17 and line 21).

Decoding the symbol pair $a^I$ follows identically to the decoding of the symbol pair $a^R$ and is shown in line 24 through line 31. We emphasize that $a^I$ is decoded separately from $a^R$. The overall cost $P$ for the current candidate symbol vector is updated in line 32. Radius update and best candidate vector update occur if the current cost $P$ is less than the previous smallest cost $\hat{P}$ (line 33).

The algorithm could be embellished to further reduce average complexity. For example, instead of sorting the entire alphabet as in line 4 and line 5, a sort-as-you-go approach would be more efficient. Furthermore, for faster convergence of the search algorithm, the QR decomposition in line 1 could permute the columns of $H$. For the sake of clarity of exposition, however, we have chosen not to include such refinements in Fig. 2. Such refinements, which have no effect on the worst-case complexity, are well-known in the literature and their application to the pseudocode of Fig. 2 is straightforward.
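The worst-case argument of Section III-B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pseudocode of Fig. 2: it omits sorting, pruning, and radius updates, and simply steps through all $M^2$ candidates for $b$ as in (10), solving (15) with a PAM slicer for each. The function names (`pam_slice`, `real_pair_ml`, `fast_ml_decode`) are hypothetical, the PAM alphabet is assumed to be the odd integers $\{\pm 1, \pm 3, \ldots\}$, and the only property of $H$ that is used is that the block $A$ of (7) is real.

```python
import itertools
import numpy as np

def pam_slice(u, L):
    """Nearest point to u in the assumed L-ary PAM alphabet {-(L-1), ..., -1, 1, ..., L-1}."""
    q = 2.0 * np.round((u + 1.0) / 2.0) - 1.0      # nearest odd integer
    return float(np.clip(q, -(L - 1), L - 1))      # hard limit to the alphabet

def real_pair_ml(A, v, L):
    """Minimise ||v - A a||^2 over a in PAM^2 for a real upper-triangular 2x2 A."""
    best_cost, best_a = np.inf, None
    for a2 in range(-(L - 1), L, 2):               # sqrt(M) candidates for the second symbol
        a1 = pam_slice((v[0] - A[0, 1] * a2) / A[0, 0], L)   # O(1) slicer for the first
        cost = (v[1] - A[1, 1] * a2) ** 2 + (v[0] - A[0, 0] * a1 - A[0, 1] * a2) ** 2
        if cost < best_cost:
            best_cost, best_a = cost, (a1, float(a2))
    return np.array(best_a), best_cost

def fast_ml_decode(H, y, L):
    """ML decision for y = H x + n with L^2-QAM symbols, assuming the block A of (7) is real."""
    Q, R = np.linalg.qr(H)
    d = np.diag(R) / np.abs(np.diag(R))            # unit-modulus phases of the diagonal
    Q, R = Q * d, d.conj()[:, None] * R            # make diag(R) real and positive
    z = Q.conj().T @ y
    A, B, D = R[:2, :2], R[:2, 2:], R[2:, 2:]
    if not np.allclose(A.imag, 0.0, atol=1e-9):
        raise ValueError("A is not real; the key property does not hold for this H")
    A = A.real
    pam = range(-(L - 1), L, 2)
    qam = [complex(re, im) for re in pam for im in pam]
    best_cost, best_x = np.inf, None
    for b in itertools.product(qam, repeat=2):     # M^2 candidates for b, as in (10)
        b = np.array(b)
        cost_b = np.sum(np.abs(z[2:] - D @ b) ** 2)
        v = z[:2] - B @ b                          # interference cancellation, (13)
        aR, cR = real_pair_ml(A, v.real, L)        # the two parallel problems of (15)
        aI, cI = real_pair_ml(A, v.imag, L)
        if cost_b + cR + cI < best_cost:
            best_cost = cost_b + cR + cI
            best_x = np.concatenate([aR + 1j * aI, b])
    return best_x
```

For 16-QAM one would call `fast_ml_decode(H, y, L=4)`; the returned vector is ordered as $[x_1, x_2, x_3, x_4]$. The inner work per candidate $b$ is $2\sqrt{M}$ slicer calls, which is where the $O(2M^{2.5})$ count comes from.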
Regarding the possibility of permuting the columns of $H$ for speeding convergence, we note that fast decoding is possible only for the following permutations: $\{1, 2, 3, 4\}$, $\{1, 2, 4, 3\}$, $\{2, 1, 3, 4\}$, $\{2, 1, 4, 3\}$, $\{3, 4, 1, 2\}$, $\{3, 4, 2, 1\}$, $\{4, 3, 1, 2\}$ and $\{4, 3, 2, 1\}$. These are the only permutations for which the key property of Theorem 1 will still hold. The same restriction on the allowable permutations for reduced complexity decoding also applies to the overlaid Alamouti codes.

We remark that a quasistatic channel does not offer any additional reduction in decoding complexity, as compared to a time-varying channel. This is a direct result of the fact that the entries of $B$ in (7) are generally complex, regardless of whether the channel is quasistatic or time-varying.

D. Golden Code Variations

In this section we argue that the proposed fast ML decoder, although presented in the context of the Dayal-Varanasi version of the golden code [4], is equally applicable to the Belfiore-Rekaya-Viterbo [3] and WiMAX [7] versions of the golden code.

Substituting the definition of the Belfiore-Rekaya-Viterbo golden code from [3, eqn. (9)] into (4), the vector of samples received at a receiver with two antennas will again be given by (5), but with a new effective channel matrix of the form:

$H$ = [sparse $4 \times 4$ matrix whose nonzero entries are channel coefficients $h_{i,j}[k]$ or $i\,h_{i,j}[k]$] $\cdot\ \mathrm{diag}(c - si,\ s + ci,\ c - si,\ s + ci)\ \cdot$ [real block-diagonal $4 \times 4$ matrix with entries $\pm c$ and $\pm s$], (17)

where $c = \cos(\theta)$, $s = \sin(\theta)$, and $\theta = \tfrac{1}{2}\tan^{-1}(2)$. The information symbols $[b, a, d, c]$ in [3] have been relabeled as $[x_1, x_2, x_3, x_4]$ in (5). Since $c - si$ and $s + ci$ have unity magnitude, we can transform this effective matrix into the one in (6) simply by rotating the channel coefficients $h_{i,j}[k]$. These rotations have no impact on complexity.
In particular, the real coefficients in the $R$ matrix will remain real, even after rotation. Therefore, the proposed fast ML decoder is applicable.

Similarly, substituting the definition of the WiMAX golden code from [7, Sect. 8.4.8.3.3] (matrix C) into (4) will again yield (5), but with a new effective channel matrix of the form:

$H$ = [sparse $4 \times 4$ matrix whose nonzero entries are $h_{i,j}[k]$, $-h_{i,j}[k]$, or $-i\,h_{i,j}[k]$] $\cdot$ [real block-diagonal $4 \times 4$ matrix with entries $\pm c$ and $\pm s$]. (18)

The information symbol vector $[S_i, iS_{i+1}, S_{i+2}, -S_{i+3}]$ in [7] has been relabeled to $[x_1, x_2, x_3, x_4]$ in (5). Just as before, this effective matrix differs from that in (6) only by the rotated channel coefficients. Therefore, the proposed fast ML decoder is again applicable to this version of the golden code.

E. Extension to Other Alphabets using Coordinate Interleaving

The proposed fast decoder exploits the property of QAM that its real and imaginary components may be decoded separately. When the real and imaginary components are not separately decodable, as is the case for PSK, hexagonal, and cross-QAM alphabets, the real symbol pair $a^R$ cannot be decoded separately from the imaginary symbol pair $a^I$. This leads to a worst-case decoding complexity of $O(M^3)$, the same as a conventional ML decoder.

The reduced decoding complexity of the golden code can be extended to arbitrary alphabets by interleaving the coordinates prior to encoding [21]. This so-called interleaved golden code allows the receiver to separate the decoding of the real and imaginary components of $x_1$ from the real and imaginary components of $x_2$. In other words, after cancelling the interference from symbols $x_3$ and $x_4$, the symbol $x_1$ is separately decodable from $x_2$. This key property leads to a worst-case decoding complexity of $O(2M^{2.5})$, regardless of the alphabet type, and regardless of whether the channel is time varying. The interleaved golden code maintains the same coding gain and good performance of the golden code while extending its fast decodability from QAM to arbitrary alphabets.
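To make the complexity orders quoted in this section concrete, the following worked arithmetic evaluates them for 64-QAM, the alphabet used in Section V. The figures are leading-order worst-case counts only (constants other than the factor of 2 are ignored), so they should be read as an illustration of scaling rather than as measured node counts.

```latex
\begin{align*}
M^4 &= 64^4 = 16{,}777{,}216
  && \text{(exhaustive search)} \\
M^3 &= 64^3 = 262{,}144
  && \text{(conventional sphere decoder with an $O(1)$ QAM slicer)} \\
2M^{2.5} &= 2 \cdot 64^2 \cdot \sqrt{64} = 2 \cdot 4096 \cdot 8 = 65{,}536
  && \text{(proposed decoder)}
\end{align*}
```

The ratio between the last two lines is $M^3 / (2M^{2.5}) = \sqrt{M}/2$, which equals 4 for 64-QAM and grows with the alphabet size.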
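IV. ML DECODING OF THE OVERLAID ALAMOUTI CODE

In this section we show that the overlaid Alamouti code of [18] loses its fast decodability when the channel varies with time. The same results hold for the overlaid Alamouti code of [10]; see [12] and the references therein.

The overlaid Alamouti space-time code of [18] is:

$$C = \frac{1}{\sqrt{2}} \begin{bmatrix} x_1 & x_2 \\ -x_2^* & x_1^* \end{bmatrix} + \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} u_1 & u_2 \\ -u_2^* & u_1^* \end{bmatrix}, \tag{19}$$

where

$$\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} \varphi_1 & \varphi_2 \\ -\varphi_2^* & \varphi_1^* \end{bmatrix} \begin{bmatrix} x_3 \\ x_4 \end{bmatrix}, \quad \varphi_1 = \frac{1}{\sqrt{7}}(1 + j), \quad \text{and} \quad \varphi_2 = \frac{1}{\sqrt{7}}(1 + 2j). \tag{20}$$

Substituting the definition of the overlaid Alamouti code of [18] from (19) and (20) into (4), the vector of samples received at a receiver with two antennas at the two time instances can be written as:

$$y = \begin{bmatrix} y_1[1] \\ y_1[2]^* \\ y_2[1] \\ y_2[2]^* \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} h_{1,1}[1] & h_{2,1}[1] & \varphi_1 h_{1,1}[1] - \varphi_2^* h_{2,1}[1] & \varphi_2 h_{1,1}[1] + \varphi_1^* h_{2,1}[1] \\ h_{2,1}^*[2] & -h_{1,1}^*[2] & -(\varphi_2^* h_{1,1}^*[2] + \varphi_1 h_{2,1}^*[2]) & \varphi_1^* h_{1,1}^*[2] - \varphi_2 h_{2,1}^*[2] \\ h_{1,2}[1] & h_{2,2}[1] & \varphi_1 h_{1,2}[1] - \varphi_2^* h_{2,2}[1] & \varphi_2 h_{1,2}[1] + \varphi_1^* h_{2,2}[1] \\ h_{2,2}^*[2] & -h_{1,2}^*[2] & -(\varphi_2^* h_{1,2}^*[2] + \varphi_1 h_{2,2}^*[2]) & \varphi_1^* h_{1,2}^*[2] - \varphi_2 h_{2,2}^*[2] \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} + \begin{bmatrix} n_1[1] \\ n_1[2]^* \\ n_2[1] \\ n_2[2]^* \end{bmatrix} = Hx + n. \tag{21}$$

From (21) we can see that a quasistatic channel causes the first column of $H$ to be orthogonal to the second, and the third column to be orthogonal to the fourth. The implication on a QR decomposition $H = QR$ is that $r_{1,2} = r_{3,4} = 0$. The receiver can exploit the fact that $r_{1,2} = 0$ to reduce the worst-case decoding complexity as follows: the receiver chooses a pair $(x_3, x_4)$ and subtracts their contribution. Then, for every such pair, the receiver decides on the remaining symbols $x_1$ and $x_2$ separately. The separate decoding of $x_1$ and $x_2$ is possible because $r_{1,2} = 0$.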
For a square $M$-ary QAM alphabet, the worst-case decoding complexity is $O(4M^2)$, where $M^2$ comes from the fact that there are $M^2$ ways to choose the symbol pair $(x_3, x_4)$ and $4\,O(1)$ comes from the fact that the real and imaginary components of $x_1$ and $x_2$ are separately decodable and each has a decoding complexity of $O(1)$.

Under time-varying fading, however, the orthogonality of the columns of $H$ is lost, and the $R$ matrix does not have any zero upper entries. In fact, all of the entries above the diagonal are complex in general. Therefore, ML detection has a worst-case decoding complexity of $O(M^3)$. This result is not surprising when one considers that the Alamouti code cannot be decoded in a low-complexity manner when the channel varies with time, and the overlaid Alamouti codes are the sum of two Alamouti block codes.

V. NUMERICAL RESULTS

In Fig. 3 we compare the average complexity of the proposed fast ML decoder to a conventional ML decoder. The channel was modeled using (4) with quasistatic i.i.d. Rayleigh fading, with constant channel coefficients within each codeword block, and independent complex normal coefficients from block to block. The alphabet was 64-QAM. The fast ML decoder was implemented following the pseudocode of Fig. 2.
The conventional ML decoder was implemented using a four-level complex sphere decoder with Schnorr-Euchner enumeration for fast convergence. Results are shown for two cases of channel matrix column ordering: no ordering and BLAST reordering. As can be seen from Fig. 3, with no column ordering, the proposed fast ML decoder has about 45% lower average complexity than a conventional ML decoder. With BLAST ordering, the proposed ML decoder is about 30% less complex than a conventional decoder.

The average complexity is quantified by the average number of nodes visited while searching the tree; this measure of complexity has the advantages of being simple and reasonably insensitive to the implementation details of the algorithm. Other complexity measures such as floating point operations (FLOPs) are quite sensitive to the implementation of the algorithm and can significantly exaggerate the performance of one algorithm compared to another. For example, the use of a look-up table for symbol sorting would significantly reduce the FLOP count, but it would not affect the convergence time or the average node count.

A significant drawback of the average node count is that it does not capture the complexity of the column ordering and Schnorr-Euchner symbol sorting. Therefore, beyond the advantages shown in Fig. 3, the proposed algorithm has three additional advantages that are not reflected in Fig. 3, namely:

• the proposed algorithm reduces the number of Schnorr-Euchner sort operations for the first two stages to only two, compared with a conventional decoder that can require as many as $M + 1$. The resulting complexity reduction can be significant, since sorting is an expensive operation.

• the proposed algorithm can avoid the high complexity of BLAST ordering without a high performance penalty.

• decoding of the symbol pairs $a^R$ and $a^I$ can be done in parallel, reducing decoding latency.

VI. CONCLUSIONS

We have proposed a maximum-likelihood decoding algorithm for the golden code and its variants with $M$-ary QAM whose worst-case decoding complexity is $O(2M^{2.5})$. For large alphabets, this represents a significant reduction in complexity compared to the worst-case of $O(M^3)$ for a conventional detector. We have presented an efficient implementation that was shown to significantly outperform a conventional detector in terms of average complexity. Finally, unlike the alternatives to the golden code based on overlaying two Alamouti codes, which lose their reduced complexity decoding on time-varying channels, we have shown that the golden code is fast decodable on both quasistatic and rapid time-varying channels.

APPENDIX A: PROOF OF THE KEY PROPERTY (THEOREM 1)

Recall from (6) that the effective channel matrix is $H = \bar{H}\Psi$. We will use a QR decomposition of $\bar{H}$, namely $\bar{H} = \bar{Q}\bar{R}$, to construct a QR decomposition of $H$, namely $H = QR$. Inspection of (6) reveals that $\bar{h}_1^* \bar{h}_2 = \bar{h}_1^* \bar{h}_4 = \bar{h}_2^* \bar{h}_3 = \bar{h}_3^* \bar{h}_4 = 0$, which implies that the subspace spanned by the first and third columns of $\bar{H}$ is orthogonal to the subspace spanned by the second and fourth columns. This fact implies that $\bar{r}_{1,2} = \bar{r}_{1,4} = \bar{r}_{2,3} = \bar{r}_{3,4} = 0$, so that:

$$H = \bar{H}\Psi = \bar{Q} \begin{bmatrix} \bar{r}_{1,1} & 0 & \bar{r}_{1,3} & 0 \\ 0 & \bar{r}_{2,2} & 0 & \bar{r}_{2,4} \\ 0 & 0 & \bar{r}_{3,3} & 0 \\ 0 & 0 & 0 & \bar{r}_{4,4} \end{bmatrix} \begin{bmatrix} c & s & 0 & 0 \\ -s & c & 0 & 0 \\ 0 & 0 & c & s \\ 0 & 0 & -s & c \end{bmatrix} \tag{22}$$

$$= \bar{Q} G, \tag{23}$$

where

$$G = \begin{bmatrix} X & Y \\ O & Z \end{bmatrix}, \quad X = \begin{bmatrix} c\bar{r}_{1,1} & s\bar{r}_{1,1} \\ -s\bar{r}_{2,2} & c\bar{r}_{2,2} \end{bmatrix}, \quad Y = \begin{bmatrix} c\bar{r}_{1,3} & s\bar{r}_{1,3} \\ -s\bar{r}_{2,4} & c\bar{r}_{2,4} \end{bmatrix}, \quad \text{and} \quad Z = \begin{bmatrix} c\bar{r}_{3,3} & s\bar{r}_{3,3} \\ -s\bar{r}_{4,4} & c\bar{r}_{4,4} \end{bmatrix}. \tag{24}$$

Observe that the submatrices $X$ and $Z$ are entirely real, since $c$ and $s$ and $\{\bar{r}_{i,i}\}$ are all real. Therefore, we can transform $G$ into an upper triangular matrix $R = WG$ via the purely real Givens rotation matrix:

$$W = \begin{bmatrix} W_1 & O \\ O & W_2 \end{bmatrix}, \tag{25}$$

where

$$W_1 = \frac{1}{\sqrt{(c\bar{r}_{1,1})^2 + (s\bar{r}_{2,2})^2}} \begin{bmatrix} c\bar{r}_{1,1} & -s\bar{r}_{2,2} \\ s\bar{r}_{2,2} & c\bar{r}_{1,1} \end{bmatrix} \quad \text{and} \quad W_2 = \frac{1}{\sqrt{(c\bar{r}_{3,3})^2 + (s\bar{r}_{4,4})^2}} \begin{bmatrix} c\bar{r}_{3,3} & -s\bar{r}_{4,4} \\ s\bar{r}_{4,4} & c\bar{r}_{3,3} \end{bmatrix}.$$

Substituting $G = W^T R$ into (23) yields the desired QR decomposition $H = QR$, where $Q = \bar{Q} W^T$ and

$$R = WG = \begin{bmatrix} W_1 & O \\ O & W_2 \end{bmatrix} \begin{bmatrix} X & Y \\ O & Z \end{bmatrix} \tag{26}$$

$$= \begin{bmatrix} A & B \\ O & D \end{bmatrix}. \tag{27}$$

Since $W_1$, $W_2$, $X$ and $Z$ are all real, it follows that both $A = W_1 X$ and $D = W_2 Z$ are real. And by construction of $W_1$ and $W_2$, both $A$ and $D$ are upper triangular.
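The key property proved in this appendix can also be checked numerically. The sketch below builds the effective channel from (4) for a random time-varying channel and inspects the QR factors. It rests on assumptions that are not spelled out in the text above: the array layout `h[i, j, k]` (transmit antenna, receive antenna, time, all 0-based), the ordering of the received vector as $[y_1[1], y_1[2], y_2[1], y_2[2]]^T$, and the placement of $\varphi\tilde{b}_1$ on antenna 2 at time 1 and $\varphi\tilde{b}_2$ on antenna 1 at time 2. The function names are hypothetical.

```python
import numpy as np

def golden_effective_channel(h):
    """Effective 4x4 channel H = Hbar @ Psi; h[i, j, k] is the assumed coefficient layout."""
    theta = 0.5 * np.arctan(2.0)
    c, s = np.cos(theta), np.sin(theta)
    phi = np.exp(1j * np.pi / 4)
    Hbar = np.zeros((4, 4), dtype=complex)
    for j in range(2):                              # receive antenna j
        Hbar[2 * j, 0] = h[0, j, 0]                 # a~_1: antenna 1, time 1
        Hbar[2 * j, 2] = phi * h[1, j, 0]           # phi*b~_1: antenna 2, time 1 (assumed)
        Hbar[2 * j + 1, 1] = h[1, j, 1]             # a~_2: antenna 2, time 2
        Hbar[2 * j + 1, 3] = phi * h[0, j, 1]       # phi*b~_2: antenna 1, time 2 (assumed)
    M = np.array([[c, s], [-s, c]])                 # rotation applied to each symbol pair
    Psi = np.kron(np.eye(2), M)                     # block-diagonal: rotates a and b separately
    return Hbar @ Psi

def qr_positive(H):
    """QR decomposition normalised so that diag(R) is real and positive."""
    Q, R = np.linalg.qr(H)
    d = np.diag(R) / np.abs(np.diag(R))
    return Q * d, d.conj()[:, None] * R

rng = np.random.default_rng(1)
h = rng.normal(size=(2, 2, 2)) + 1j * rng.normal(size=(2, 2, 2))   # time-varying channel
H = golden_effective_channel(h)
Q, R = qr_positive(H)
print("Im r12 =", R[0, 1].imag, " Im r34 =", R[2, 3].imag)        # both numerically zero
print("B =", R[:2, 2:])                                            # generally complex
```

Because `h` is drawn independently for the two time indices, the check exercises the time-varying case; setting `h[:, :, 1] = h[:, :, 0]` gives the quasistatic case.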
REFERENCES

[1] S. M. Alamouti, “A Simple Transmit Diversity Technique for Wireless Communications,” IEEE J. Sel. Areas Commun., vol. 16, pp. 1451-1458, Oct. 1998.
[2] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-Time Codes for High Data Rate Wireless Communication: Performance Criterion and Code Construction,” IEEE Trans. Inf. Theory, vol. 44, pp. 744-765, Mar. 1998.
[3] J.-C. Belfiore, G. Rekaya, and E. Viterbo, “The Golden Code: A 2 × 2 Full Rate Space-Time Code with Non Vanishing Determinants,” IEEE Trans. Inf. Theory, vol. 51, no. 4, April 2005.
[4] P. Dayal and M. K. Varanasi, “An Optimal Two Transmit Antenna Space-Time Code And Its Stacked Extensions,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4348-4355, Dec. 2005.
[5] L. Zheng and D. Tse, “Diversity and Multiplexing: A Fundamental Tradeoff in Multiple Antenna Channels,” IEEE Trans. Inf. Theory, vol. 49, no. 4, pp. 1073-1096, May 2003.
[6] H. Yao and G. W. Wornell, “Achieving the Full MIMO Diversity-Multiplexing Frontier with Rotation-Based Space-Time Codes,” in Proceedings Allerton Conf. Commun., Cont., and Computing, (Illinois), October 2003.
[7] IEEE 802.16e-2005: IEEE Standard for Local and Metropolitan Area Networks - Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems - Amendment 2: Physical Layer and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands, Feb. 2006.
[8] B. Cerato, G. Masera, and E. Viterbo, “A VLSI Decoder for the Golden Code,” 13th IEEE International Conference on Electronics, Circuits and Systems, ICECS '06, pp. 549-552, Dec. 10-13, 2006.
[9] J. Paredes, A. B. Gershman, and M. G. Alkhanari, “A 2 × 2 Space-Time Code with Non-vanishing Determinants and Fast Maximum Likelihood Decoding,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2007, Honolulu, pp. 877-880, April 2007.
[10] S. Sezginer and H. Sari, “A Full-Rate Full-Diversity 2 × 2 Space-Time Code for Mobile WiMAX Systems,” in Proc. ICSPC, Dubai, Nov. 2007.
[11] S. Sezginer and H. Sari, “Full-Rate Full-Diversity 2 × 2 Space-Time Codes for Reduced Decoding Complexity,” IEEE Communications Letters, vol. 11, no. 12, pp. 1-3, Dec. 2007.
[12] E. Biglieri, Y. Hong, and E. Viterbo, “On Fast Decodable Space-Time Block Codes,” submitted to IEEE Trans. Inf. Theory, available online at arXiv.
[13] B. Muquet, S. Sezginer, and H. Sari, “MIMO Techniques in Mobile WiMAX Systems - Present and Future,” WiMAX Trends, July 2007.
[14] S. D. Howard, S. Sirianunpiboon, and A. R. Calderbank, “Fast Decoding of the Golden Code by Diophantine Approximation,” IEEE Information Theory Workshop, Lake Tahoe, California, pp. 590-594, Sept. 2-6, 2007.
[15] L. Zhang, B. Li, T. Yuan, X. Zhang, and D. Yang, “Golden Code with Low Complexity Sphere Decoder,” 18th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC'07), pp. 1-5, Athens, Sept. 3-7, 2007.
[16] M. Sarkiss, J.-C. Belfiore, and Y.-W. Yi, “Performance Comparison of Different Golden Code Detectors,” 18th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC'07), pp. 1-5, Athens, Sept. 3-7, 2007.
[17] S. Sirianunpiboon, A. R. Calderbank and S. D. Howard, “Fast Essentially Maximum Likelihood Decoding of the Golden Code,” submitted to IEEE Trans. Inf. Theory, 2008.
[18] O.
Tirk konen and R. Kashaev , “Combined Inform ation and Performance Optimization of Li near MIMO Modulations,” in P r oc IEEE In t. Symp. Inf. Theory (ISIT 2002) , Lausanne, Switzerland, p. 76, Jun e 2002. [19] M. O. Damen, K. Abed-Meraim, and J.-C. Belfiore, “Diagonal Algebraic Space-Time Block Codes,” IEEE T rans. on Inf. Theo ry , vol. 48, no. 3, pp. 628- 636, Mar . 2002. [20] A. W iesel, X. Mestre, A. Pages and J. R. Fonollo sa, “Eff icient Implementatio n of Sphere Demodulatio n,” 4th IEEE W orkshop on Sign al Pr ocessing Advances in W ireless Communications, (S P A WC 03), Rome, Ita ly , pp. 36- 40, June 15-18, 200 3. [21] M. O. Sinnokrot and J. R. Barry , “Modified Golden Codes for Fast Decodin g on T ime-V arying Channe ls,” The 1 1th International Symposium on W ir eless Personal Multimedia Communications (WPMC) , Lapland, Finland, September 8-1 1, 2008. 18 F IGURES b R a I P  = k z   R – D b R k  P  = k v R – A a R k  P  = k v I – A a I k  x ROOT Fig. 1. The structure of the proposed detection tree and its branch metrics. The cost function for the leaf node is the sum of the branch metrics, P ( x ) = P  + P  + P  + P  . b I a R P  = k z   I – D b I k  19 Fig. 2. Pseudocode of a fast ML decoder for the g olden code. 1. [ Q , R ]= QR decomposition ( H ) 2. z = Q * y 3. P ˆ = ∞ 4. [ P  , Π  ] = sort a ∈ A ( ( z  R – r  a R – r  a I )  + ( z  R – r  a I )  ) 5. [ P  , Π  ] = sort a ∈ A ( ( z  I – r  a R – r  a I )  + ( z  I – r  a I )  ) 6. for k from 1 to M % step through each b R in order 7. if P  ( k ) > P ˆ , break , end 8. for l from 1 to M % step through each b I in order 9. if P  ( l ) + P  ( k ) > P ˆ , break , end 10. [ x  R x  I x  R x  I ] = [ A ( Π  ( k )) R A ( Π  ( l )) R A ( Π  ( k )) I A ( Π  ( l )) I ] 11 . v  = z  – r  x  – r  x  12. v  = z  – r  x  – r  x  13. P ˆ  = P ˆ  = ∞ 14. X = list ( v  R / r  ) % ordered list of decisions for x  R 15. 
for m from 1 to % step through each x  R in order 16. P  = ( v  R – r  X ( m ))  17. if P  > P ˆ  , break , end 18. u  R = v  R – r  X ( m ) 19. q = Q ( u  R / r  ) 20. P  = ( u  R – r  q )  + P  21. if P  < P ˆ  , x  R = q , x  R = X ( m ), P ˆ  = P  , end 22. end 23. X = list ( v  I / r  ) % ordered list of decisions for x  I 24. for n from 1 to % step through each x  I in order 25. P  = ( v  I – r  X ( n ))  26. if P  > P ˆ  , break , end 27. u  I = v  I – r  X ( n ) 28. q = Q ( u  I / r  ) 29. P  = ( u  I – r  q )  + P  30. if P  < P ˆ  , x  I = q , x  I = X ( n ), P ˆ  = P  , end 31. end 32. P = P  ˆ + P  ˆ + P  ( l ) + P  ( k ) 33. if P < P ˆ , x ˆ = [ x  , x  , x  , x  ], P ˆ = P , end 34. end 35. end M M 20 SNR PER BIT (dB) COMPLEXITY (A VERAGE NODE COUNT) Fig. 3. Decoding complexity versus SNR for golden code with 64-QAM. CONVENTIONAL ML FA S T M L 14 16 18 20 22 24 26 10 20 30 40 50 60 70 N O O R D E R I N G B L A S T O R D E R I N G N O O R D E R I N G B L A S T O R D E R I N G
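The construction in Appendix A can be checked numerically. The NumPy sketch below is an illustration under stated assumptions rather than the authors' implementation: `toy_channel` and `golden_qr` are hypothetical names, the toy channel is only a stand-in for (6) that reproduces its orthogonality structure (columns 1 and 3 orthogonal to columns 2 and 4), and the angle passed in the usage note, θ = ½ arctan 2, is an assumption not stated in this section. The argument itself uses only the fact that c and s are real.

```python
import numpy as np

def toy_channel(rng):
    """Hypothetical stand-in for the matrix in (6): columns 1 and 3 occupy
    rows 0-1, columns 2 and 4 occupy rows 2-3, so the two pairs are orthogonal."""
    H_bar = np.zeros((4, 4), dtype=complex)
    H_bar[:2, [0, 2]] = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    H_bar[2:, [1, 3]] = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    return H_bar

def golden_qr(H_bar, theta):
    """Build H = Q R from a QR decomposition of H_bar via real Givens rotations."""
    c, s = np.cos(theta), np.sin(theta)
    Psi = np.kron(np.eye(2), np.array([[c, s], [-s, c]]))  # block-diagonal rotation

    Q_bar, R_bar = np.linalg.qr(H_bar)
    # Normalise phases so that the diagonal of R_bar is real and positive.
    d = np.diag(R_bar)
    phase = d / np.abs(d)
    Q_bar = Q_bar * phase                      # scale columns of Q_bar
    R_bar = np.conj(phase)[:, None] * R_bar    # scale rows of R_bar

    G = R_bar @ Psi
    r = np.diag(R_bar).real

    def givens(ra, rb):
        # Real rotation that zeroes the (2,1) entry of a block [[c*ra, .], [-s*rb, .]].
        n = np.hypot(c * ra, s * rb)
        return np.array([[c * ra, -s * rb], [s * rb, c * ra]]) / n

    W = np.zeros((4, 4))
    W[:2, :2] = givens(r[0], r[1])   # W_1
    W[2:, 2:] = givens(r[2], r[3])   # W_2
    return Q_bar @ W.T, W @ G, H_bar @ Psi     # Q, R, H
```

Calling `golden_qr(toy_channel(np.random.default_rng(0)), 0.5 * np.arctan(2.0))` returns `Q`, `R`, and `H`. One can then confirm that `Q @ R` reproduces `H`, that `R` is upper triangular, and that the diagonal blocks `R[:2, :2]` and `R[2:, 2:]` (the matrices A and D) have zero imaginary part up to rounding.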
