VLSI Design of a 3-bit Constant-Modulus Precoder for Massive MU-MIMO

Oscar Castañeda¹, Sven Jacobsson¹,²,³, Giuseppe Durisi², Tom Goldstein⁴, and Christoph Studer¹
¹Cornell University, Ithaca, NY; oc66@cornell.edu; studer@cornell.edu; web: http://vip.ece.cornell.edu
²Ericsson Research, Gothenburg, Sweden; sven.jacobsson@ericsson.com
³Chalmers University of Technology, Gothenburg, Sweden; durisi@chalmers.se
⁴University of Maryland, College Park, MD; tomg@cs.umd.edu

Abstract: Fifth-generation (5G) cellular systems will build on massive multi-user (MU) multiple-input multiple-output (MIMO) technology to attain high spectral efficiency. However, having hundreds of antennas and radio-frequency (RF) chains at the base station (BS) entails prohibitively high hardware costs and power consumption. This paper proposes a novel nonlinear precoding algorithm for the massive MU-MIMO downlink in which each RF chain contains an 8-phase (3-bit) constant-modulus transmitter, enabling the use of low-cost and power-efficient analog hardware. We present a high-throughput VLSI architecture and show implementation results on a Xilinx Virtex-7 FPGA. Compared to a recently reported nonlinear precoder for BS designs that use two 1-bit digital-to-analog converters per RF chain, our design enables up to 3.75 dB transmit power reduction at no more than a 2.7× increase in FPGA resources.

I. INTRODUCTION

Fifth-generation (5G) cellular communication systems are widely expected to rely on massive multi-user (MU) multiple-input multiple-output (MIMO) technology to achieve significant improvements in spectral efficiency compared to existing small-scale MIMO systems [2]–[4]. Massive MU-MIMO equips the base station (BS) with hundreds of antennas and radio-frequency (RF) chains, enabling one to simultaneously serve tens of user equipments (UEs) in the same time-frequency resource via fine-grained beamforming.
Unfortunately, scaling conventional multi-antenna BS architectures (that use high-precision RF chains) to BSs with hundreds of antenna elements entails a significant increase in system costs and circuit power consumption. Hence, to make massive MU-MIMO systems inexpensive and power-efficient, novel BS architectures and suitable baseband-processing algorithms are necessary.

Footnote: The precoding algorithm and architecture proposed in this paper build upon the ones proposed in [1] for C2PO; in contrast to these results, the present paper uses a modified architecture with a more sophisticated projection unit. A MATLAB simulator for the precoder proposed in this paper is available on GitHub: https://github.com/quantizedmassivemimo/3bit_CM_precoding. OC and CS were supported in part by Xilinx Inc. and by the US NSF under grants ECCS-1408006, CCF-1535897, CAREER CCF-1652065, and CNS-1717559. The work of SJ and GD was supported in part by the Swedish Foundation for Strategic Research under grant ID14-0022, and by the Swedish Governmental Agency for Innovation Systems (VINNOVA) within the competence center ChaseOn. SJ's research visit at Cornell was sponsored in part by Cornell's College of Engineering. TG was supported by the US NSF under grant CCF-1535902 and by the US ONR under grant N00014-15-1-2676.

Low-Precision BS Architectures: The use of low-precision digital-to-analog converters (DACs) at the BS in the massive MU-MIMO downlink enables significant reductions in terms of system costs and circuit power consumption. The key challenge with such low-precision BS architectures is to maintain high spectral efficiency, which requires sophisticated baseband-processing algorithms.
While linear precoders, e.g., maximal-ratio transmission (MRT) and zero-forcing (ZF), followed by quantization exhibit low complexity [5]–[7], sophisticated nonlinear precoders can achieve superior performance, especially for the extreme case of using only a pair of 1-bit DACs per RF chain [8]–[13]. Recently, reference [1] presented VLSI designs of nonlinear precoders for systems that use a pair of 1-bit DACs per RF chain, which demonstrates that nonlinear precoding is feasible in practice for such low-precision BS architectures.

The use of 1-bit DACs at the BS ensures that the precoded signal has constant modulus (CM), i.e., the precoded signal's amplitude is equal on all antennas and constant over time, which enables the use of low-cost and power-efficient analog circuitry, such as nonlinear power amplifiers. Recently, nonlinear precoders for 8-phase (3-bit) CM transmitters, i.e., the setup considered in this work, were proposed in [14], [15]. It remains, however, an open question whether the algorithms proposed in [14], [15] can be implemented efficiently in hardware.

Contributions: This paper develops a novel nonlinear precoding algorithm in which each RF chain contains an 8-phase (3-bit) CM transmitter that enables efficient analog circuitry while surpassing the error-rate performance of systems that use a pair of 1-bit DACs (i.e., 4 phases) per RF chain. We propose a nonconvex algorithm to solve the associated 8-phase (3-bit) CM precoding problem in an efficient manner, and we develop a VLSI architecture that uses a fast matrix-vector multiplication engine based on Cannon's algorithm [16]. We show Xilinx Virtex-7 FPGA implementation results and provide a comparison with the C2PO precoder proposed in [1].

II. SYSTEM MODEL AND CM PRECODING

A. System Model

We consider the single-cell, narrowband massive MU-MIMO downlink system shown in Fig. 1.
Here, the BS, which is equipped with $B$ antennas, serves $U \le B$ single-antenna UEs. The narrowband downlink channel is modeled by $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$, where $\mathbf{y} = [y_1, \ldots, y_U]^T \in \mathbb{C}^U$ contains the received signals at all UEs, $\mathbf{H} \in \mathbb{C}^{U \times B}$ is the channel matrix (which we assume is known to the BS), $\mathbf{n} \in \mathbb{C}^U$ models i.i.d. circularly-symmetric complex Gaussian noise with variance $N_0$ per complex entry, and $\mathbf{x} \in \mathcal{X}^B$ is the so-called precoded vector, where $\mathcal{X}$ is the transmit alphabet. In this work, we require that $\mathcal{X}$ has finite cardinality and that the entries of $\mathcal{X}$ have CM. Specifically, the CM alphabet is $\mathcal{X} = \{\exp(j 2\pi p / P) \mid p = 0, \ldots, P-1\}$, where $P$ denotes the number of phases and $\log_2(P)$ the number of bits per RF chain. The CM constraint ensures that $\|\mathbf{x}\|_2^2 = B$.

[Fig. 1. Overview of the considered massive MU-MIMO downlink system with CM DACs. Left: $B$-antenna massive MU-MIMO BS containing a CM precoder that mitigates multi-user interference and quantization artifacts in the CM DACs; right: $U$ single-antenna UEs.]

B. Constant-Modulus (CM) Precoding

The precoder at the BS maps the symbol vector $\mathbf{s} = [s_1, \ldots, s_U]^T$ into the precoded vector $\mathbf{x} \in \mathcal{X}^B$. Here, $s_u \in \mathcal{O}$ is the constellation point intended for the $u$th UE ($u = 1, \ldots, U$), where $\mathcal{O}$ is the constellation set (e.g., 16-QAM). We assume that each UE $u = 1, \ldots, U$ rescales its received signal $y_u$ by a factor $\beta_u \in \mathbb{C}$ to compute an estimate $\hat{s}_u = \beta_u y_u$ of the transmitted symbol $s_u$. Nonlinear precoders that minimize the mean-squared error (MSE) between the transmitted and the estimated symbols solve the following optimal precoding problem (OPP) [1]:

$$\{\hat{\mathbf{x}}, \hat{\beta}\} = \arg\min_{\mathbf{x} \in \mathcal{X}^B,\ \beta \in \mathbb{C}} \|\mathbf{s} - \beta \mathbf{H}\mathbf{x}\|_2^2 + |\beta|^2 U N_0. \tag{OPP}$$

Here, we assume that $\beta = \beta_u$ for $u = 1, \ldots, U$; as shown in [9], the UEs are able to accurately learn $\hat{\beta}$.
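To make the notation concrete, the following Python sketch builds the CM alphabet for $P = 8$, checks the norm property $\|\mathbf{x}\|_2^2 = B$, and evaluates the (OPP) objective for a given candidate. It is only an illustration: the function names (`cm_alphabet`, `opp_objective`, and so on) and the example vector are hypothetical and do not come from the paper or its MATLAB simulator.

```python
import cmath
import math


def cm_alphabet(num_phases):
    """Return the CM alphabet {exp(j*2*pi*p/P) : p = 0, ..., P-1}."""
    return [cmath.exp(2j * math.pi * p / num_phases) for p in range(num_phases)]


def squared_norm(vec):
    """Squared Euclidean norm of a complex vector given as a list."""
    return sum(abs(v) ** 2 for v in vec)


def mat_vec(H, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(h * xb for h, xb in zip(row, x)) for row in H]


def opp_objective(s, H, x, beta, noise_var):
    """Evaluate ||s - beta*H*x||^2 + |beta|^2 * U * N0 from (OPP)."""
    num_ues = len(s)
    Hx = mat_vec(H, x)
    residual = [su - beta * hx for su, hx in zip(s, Hx)]
    return squared_norm(residual) + abs(beta) ** 2 * num_ues * noise_var


# 8 phases correspond to log2(8) = 3 bits per RF chain.
alphabet = cm_alphabet(8)
bits_per_chain = int(math.log2(len(alphabet)))

# Any vector with entries from the alphabet has squared norm B (here B = 32).
# The phase pattern below is arbitrary and chosen only for illustration.
x = [alphabet[p % 8] for p in range(32)]
```

Because every alphabet entry has unit modulus, the transmit power does not depend on which phases the precoder selects; only the phases carry information.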
For systems that use a pair of 1-bit DACs per RF chain ($P = 4$ phases), methods that solve (OPP) approximately using convex [8], [9] and nonconvex [1] relaxation have been proposed recently. In what follows, we present a novel precoder specifically designed for CM transmitters with 3 bits per RF chain ($P = 8$ phases), which enables significant error-rate performance improvements compared to systems with 2 bits per RF chain ($P = 4$ phases), without requiring complex RF circuitry.

III. C3PO: CONSTANT-MODULUS 3-BIT PRECODING

A. Relaxing the Problem (OPP)

To find an approximate solution to (OPP) via methods that can be implemented efficiently, we perform the following approximations. First, we let $N_0 \to 0$, i.e., we assume that the system operates in the high-SNR regime. Then, we use the following approximation [1, Eq. (2)]:

$$\min_{\mathbf{x} \in \mathcal{X}^B} \min_{\beta \in \mathbb{C}} \|\mathbf{s} - \beta \mathbf{H}\mathbf{x}\|_2^2 \approx \min_{\mathbf{x} \in \mathcal{X}^B} \min_{\alpha \in \mathbb{C}} \|\alpha \mathbf{s} - \mathbf{H}\mathbf{x}\|_2^2.$$

[Fig. 2. Left: 8-phase CM alphabet (convex polytope in blue); right: projection regions A–F and separating lines $\ell_1$–$\ell_6$ within the first quadrant of the 8-phase CM alphabet.]

These two approximations result in the following problem:

$$\{\hat{\mathbf{x}}, \hat{\alpha}\} = \arg\min_{\mathbf{x} \in \mathcal{X}^B,\ \alpha \in \mathbb{C}} \|\alpha \mathbf{s} - \mathbf{H}\mathbf{x}\|_2^2. \tag{OPP$^*$}$$

We next compute $\hat{\alpha}$ by minimizing the objective function of (OPP$^*$), which results in $\hat{\alpha} = \mathbf{s}^H \mathbf{H}\mathbf{x} / \|\mathbf{s}\|_2^2$. Substituting $\hat{\alpha}$ in (OPP$^*$) yields $\|\hat{\alpha}(\mathbf{x})\mathbf{s} - \mathbf{H}\mathbf{x}\|_2^2 = \|\mathbf{A}\mathbf{x}\|_2^2$ with $\mathbf{A} = \mathbf{Q}\mathbf{H}$ and $\mathbf{Q} = \mathbf{I}_U - \mathbf{s}\mathbf{s}^H / \|\mathbf{s}\|_2^2$. Hence, we can simplify (OPP$^*$) as

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathcal{X}^B} \tfrac{1}{2} \|\mathbf{A}\mathbf{x}\|_2^2. \tag{OPP$^{**}$}$$

The factor $1/2$ does not affect the solution of (OPP$^{**}$). We now replace the finite-phase constraint $\mathbf{x} \in \mathcal{X}^B$ by the convex polytope surrounding the points $\mathcal{X} = \{x_p\}_{p=1}^{P}$ given by

$$\mathcal{B} = \Big\{ \textstyle\sum_{p=1}^{P} \alpha_p x_p \;\Big|\; (\alpha_p \ge 0,\ \forall p) \wedge \textstyle\sum_{p=1}^{P} \alpha_p = 1 \Big\}.$$

For 3-bit CM precoding, the boundary of the convex polytope $\mathcal{B}$ is a regular octagon (see Fig. 2).
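The step from (OPP$^*$) to (OPP$^{**}$) can be spelled out as follows. This is a sketch of the standard least-squares argument for a fixed $\mathbf{x}$, written out here for convenience; the intermediate function $J(\alpha)$ is notation introduced only for this derivation.

```latex
\begin{align*}
J(\alpha) &= \|\alpha\mathbf{s} - \mathbf{H}\mathbf{x}\|_2^2, \\
\frac{\partial J}{\partial \alpha^*} &= \mathbf{s}^H(\alpha\mathbf{s} - \mathbf{H}\mathbf{x}) = 0
  \;\Longrightarrow\;
  \hat{\alpha}(\mathbf{x}) = \frac{\mathbf{s}^H\mathbf{H}\mathbf{x}}{\|\mathbf{s}\|_2^2}, \\
\hat{\alpha}(\mathbf{x})\,\mathbf{s} - \mathbf{H}\mathbf{x}
  &= \frac{\mathbf{s}\mathbf{s}^H}{\|\mathbf{s}\|_2^2}\,\mathbf{H}\mathbf{x} - \mathbf{H}\mathbf{x}
   = -\Big(\mathbf{I}_U - \frac{\mathbf{s}\mathbf{s}^H}{\|\mathbf{s}\|_2^2}\Big)\mathbf{H}\mathbf{x}
   = -\mathbf{Q}\mathbf{H}\mathbf{x} = -\mathbf{A}\mathbf{x}, \\
\|\hat{\alpha}(\mathbf{x})\,\mathbf{s} - \mathbf{H}\mathbf{x}\|_2^2 &= \|\mathbf{A}\mathbf{x}\|_2^2 .
\end{align*}
```

Note that $\mathbf{Q}$ is the orthogonal projector onto the complement of $\mathbf{s}$, so $\mathbf{Q}^H\mathbf{Q} = \mathbf{Q}$ and therefore $\mathbf{A}^H\mathbf{A} = \mathbf{H}^H\mathbf{Q}\mathbf{H}$.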
Unfortunately, solving (OPP$^{**}$) over the relaxed set $\mathbf{x} \in \mathcal{B}^B$ yields the all-zeros vector. We therefore attempt to solve the following modified problem via forward-backward splitting (FBS) [17]–[19]:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathcal{B}^B} \tfrac{1}{2}\|\mathbf{A}\mathbf{x}\|_2^2 - \tfrac{\delta}{2}\|\mathbf{x}\|_2^2, \tag{1}$$

where the concave regularizer $-\tfrac{\delta}{2}\|\mathbf{x}\|_2^2$ with $\delta > 0$ forces the solution $\hat{\mathbf{x}}$ to lie at the boundary of the convex polytope $\mathcal{B}^B$. As the problem in (1) is nonconvex, FBS is not guaranteed to converge to an optimal solution. Nevertheless, the proposed algorithm exhibits good empirical performance (see Sec. IV).

B. The C3PO Algorithm

FBS is an efficient numerical method to solve convex optimization problems whose objective function can be decomposed as $f(\mathbf{x}) + g(\mathbf{x})$, where the function $f$ is smooth and convex, and the function $g$ is convex but not necessarily smooth or bounded. FBS consists of the following iteration [17], [18]:

$$\mathbf{x}^{(t+1)} = \mathrm{prox}_g\big(\mathbf{z}^{(t+1)}; \tau^{(t)}\big) \quad \text{with} \quad \mathbf{z}^{(t+1)} = \mathbf{x}^{(t)} - \tau^{(t)} \nabla f(\mathbf{x}^{(t)})$$

for $t = 1, 2, \ldots, t_{\max}$ or until convergence. Here, the sequence $\{\tau^{(t)} > 0\}$ contains suitably chosen step-size parameters, $\nabla f(\mathbf{x})$ is the gradient of the smooth function $f$, and the so-called proximal operator for the function $g$ is defined by [20]

$$\mathrm{prox}_g(\mathbf{z}; \tau) = \arg\min_{\mathbf{x} \in \mathbb{C}^B} \Big\{ \tau g(\mathbf{x}) + \tfrac{1}{2}\|\mathbf{x} - \mathbf{z}\|_2^2 \Big\}.$$

To approximately solve (1) using FBS, we set $f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{A}\mathbf{x}\|_2^2$ and $g(\mathbf{x}) = \chi(\mathbf{x} \in \mathcal{B}^B) - \tfrac{\delta}{2}\|\mathbf{x}\|_2^2$, where $\chi$ is a characteristic function that is zero if $\mathbf{x} \in \mathcal{B}^B$ and infinity otherwise. For these choices, the gradient is given by $\nabla f(\mathbf{x}) = \mathbf{A}^H\mathbf{A}\mathbf{x}$ and the proximal operator is detailed in Sec. III-C. Furthermore, we use a constant step size $\tau = \tau^{(t)}$. The resulting algorithm is as follows:

Algorithm 1 (C3PO). Initialize $\mathbf{x}^{(1)} = \mathbf{H}^H\mathbf{s}$ and fix the parameters $\delta$ and $\tau$ so that $\tau\delta < 1$. Then, for every iteration $t = 1, 2, \ldots, t_{\max}$ compute:

$$\mathbf{z}^{(t+1)} = \mathbf{x}^{(t)} - \tau \mathbf{A}^H\mathbf{A}\mathbf{x}^{(t)} \tag{2}$$
$$\mathbf{x}^{(t+1)} = \mathrm{prox}_g(\mathbf{z}^{(t+1)}; \tau). \tag{3}$$

The $\mathrm{prox}_g$ operator is applied element-wise to $\mathbf{z}^{(t+1)}$ and detailed in Sec. III-C. In the last iteration $t_{\max}$, the output $\mathbf{x}^{(t_{\max}+1)}$ is quantized to the 3-bit CM alphabet $\mathcal{X}^B$.

The most costly operation of C3PO is the matrix-vector product in step (2), which we compute as

$$\mathbf{A}^H\mathbf{A} = \mathbf{H}^H\mathbf{H} - \mathbf{v}\mathbf{v}^H = \overline{\mathbf{H}}^{\Upsilon}\,\overline{\mathbf{H}},$$

where $\mathbf{v} = \mathbf{H}^H\mathbf{s} / \|\mathbf{s}\|_2$ is a normalized version of the MRT vector; the augmented matrices $\overline{\mathbf{H}} = [\mathbf{H}; \mathbf{v}^H]$ and $\overline{\mathbf{H}}^{\Upsilon} = [\mathbf{H}^H, -\mathbf{v}]$ are of dimension $(U+1) \times B$ and $B \times (U+1)$, respectively. Then, step (2) is rewritten as follows:

$$\mathbf{z}^{(t+1)} = \mathbf{x}^{(t)} - \tau\, \overline{\mathbf{H}}^{\Upsilon}\,\overline{\mathbf{H}}\,\mathbf{x}^{(t)}. \tag{4}$$

C. Proximal Operator for 3-Bit CM Precoding

The proximal operator in (3) reduces to $\mathrm{prox}_g(\mathbf{z}; \tau) = \mathrm{proj}\big(\tfrac{1}{1 - \tau\delta}\mathbf{z}\big)$, where $\mathrm{proj}(\cdot)$ projects each element of the argument to the closest point in the polytope $\mathcal{B}$. For 3-bit CM precoding, the polytope is a regular octagon. Projecting a scalar $z \in \mathbb{C}$ onto an octagon is nontrivial, so we focus on the first quadrant of the complex plane (see Fig. 2). If $z$ is inside the octagon (in region A), then it remains there; if $z$ is in the regions B, C, or D, then it will be mapped to $j$, $\tfrac{1}{\sqrt{2}}(1 + j)$, or $1$, respectively; if $z$ is in the regions E or F, then it will be mapped to the closest point on the lines $\ell_1$ or $\ell_2$, respectively. To determine in which of the six regions A–F the argument is located, we use the equations for the lines that separate them:

$$\ell_1: \Im(z) = (1 - \sqrt{2})\,\Re(z) + 1,$$
$$\ell_3: \Im(z) = \tfrac{1}{\sqrt{2} - 1}\,\Re(z) + 1,$$
$$\ell_4: \Im(z) = \tfrac{1}{\sqrt{2} - 1}\,\Re(z) - 1.$$

The equations for the lines $\ell_2$, $\ell_5$, and $\ell_6$ are identical to the ones of $\ell_1$, $\ell_4$, and $\ell_3$, but with $\Im(z)$ and $\Re(z)$ exchanged. Using these equations, we can project $z$ onto the set $\mathcal{B}$.

IV. VLSI ARCHITECTURE AND IMPLEMENTATION RESULTS

A. Architecture Overview

The proposed VLSI architecture is shown in Fig. 3 and builds upon the one of C2PO in [1], which was designed for 2-bit CM precoding.
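Since the projection of Sec. III-C is what distinguishes C3PO from C2PO, a software reference for Algorithm 1 may be a useful companion to the hardware description. The following Python sketch is illustrative only. It is not the authors' MATLAB simulator, all function names are hypothetical, and the values of $\tau$ and $\delta$ in the usage lines are arbitrary placeholders (the text only requires $\tau\delta < 1$). The projection also takes a simpler route than the hardware: instead of the region tests with $\ell_1$–$\ell_6$, it checks whether the point is inside the octagon and otherwise picks the nearest point over all eight edges.

```python
import cmath
import math

P = 8  # number of phases (3 bits)
VERTICES = [cmath.exp(2j * math.pi * p / P) for p in range(P)]  # CCW order


def project_octagon(z):
    """Project a complex scalar onto the regular octagon spanned by VERTICES."""
    edges = [(VERTICES[k], VERTICES[(k + 1) % P]) for k in range(P)]
    # Inside test: for a CCW polygon, z lies to the left of every edge.
    if all((b - a).real * (z - a).imag - (b - a).imag * (z - a).real >= 0
           for a, b in edges):
        return z
    best = None
    for a, b in edges:
        e = b - a
        # Orthogonal projection onto the edge line, clamped to the segment.
        t = ((z - a) * e.conjugate()).real / abs(e) ** 2
        cand = a + min(1.0, max(0.0, t)) * e
        if best is None or abs(z - cand) < abs(z - best):
            best = cand
    return best


def prox(z, tau, delta):
    """Proximal operator of Sec. III-C: proj(z / (1 - tau*delta))."""
    return project_octagon(z / (1.0 - tau * delta))


def quantize_to_alphabet(z):
    """Map z to the alphabet point with the nearest phase."""
    index = round(cmath.phase(z) / (2 * math.pi / P)) % P
    return VERTICES[index]


def c3po(H, s, tau, delta, t_max):
    """Sketch of Algorithm 1 using the augmented-matrix form of step (4)."""
    U, B = len(H), len(H[0])
    norm_s = math.sqrt(sum(abs(su) ** 2 for su in s))
    mrt = [sum(H[u][b].conjugate() * s[u] for u in range(U)) for b in range(B)]
    v = [m / norm_s for m in mrt]                       # normalized MRT vector
    H_bar = [row[:] for row in H] + [[vb.conjugate() for vb in v]]  # (U+1) x B
    x = mrt[:]                                          # x^(1) = H^H s
    for _ in range(t_max):
        # First product: w = H_bar (tau * x).
        w = [sum(H_bar[r][b] * tau * x[b] for b in range(B)) for r in range(U + 1)]
        # Second product: z = x - [H^H, -v] w.
        z = [x[b] - sum(H[u][b].conjugate() * w[u] for u in range(U)) + v[b] * w[U]
             for b in range(B)]
        x = [prox(zb, tau, delta) for zb in z]
    return [quantize_to_alphabet(xb) for xb in x]


# Toy usage with arbitrary (hypothetical) channel, BPSK symbols, and parameters.
H_example = [[1 + 0.5j, 0.3 - 0.2j, -0.7 + 0.1j, 0.2 + 0.9j],
             [0.4 - 0.6j, -0.8 + 0.3j, 0.5 + 0.5j, 0.1 - 0.4j]]
s_example = [1 + 0j, -1 + 0j]
x_hat = c3po(H_example, s_example, tau=0.05, delta=1.0, t_max=9)
```

The edge search costs a few more operations than the region-based comparisons, but it gives the same result for a convex polygon and is easier to check by hand.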
As in [1], we assume that $B$ is a multiple of $U$, so the architecture consists of $B/U$ linear arrays, each containing $U + 1$ processing elements (PEs). Each linear array operates on a $(U + 1) \times U$ sub-matrix of $\overline{\mathbf{H}}$ and on a $U$-dimensional sub-vector of $\mathbf{x}^{(t)}$. The architecture computes step (2) simplified as in (4) via two separate matrix-vector products using Cannon's algorithm [16]. We first compute $\mathbf{w} = \overline{\mathbf{H}}(\tau \mathbf{x}^{(t)})$ by cyclically exchanging the entries of $\tau \mathbf{x}^{(t)}$ between the PEs of the same array. We then compute $\mathbf{z}^{(t+1)} = \mathbf{x}^{(t)} - \overline{\mathbf{H}}^{\Upsilon}\mathbf{w}$ by cyclically exchanging the accumulated results of the PEs within the same array. Finally, the vector $\mathbf{z}^{(t+1)}$ is fed to a projection unit implementing step (3), thus completing one C3PO iteration. The proposed architecture requires $2U + \log_2(B/U) + 9$ clock cycles for one C3PO iteration. See [1] for more architecture details.

[Fig. 3. High-level block diagram of the VLSI architecture for C3PO. We use $B/U$ linear arrays, each consisting of $U + 1$ processing elements (PEs).]

Each PE is equipped with (i) an $\tilde{\mathbf{h}}_u$ memory storing the $u$th row of the corresponding sub-matrix taken from $\overline{\mathbf{H}}$; (ii) a complex-valued multiply-accumulate (MAC) unit; and (iii) a projection unit. See [1] for details on (i) and (ii); part (iii), the projection unit, is more complicated than that of C2PO. Specifically, this unit maps the entries of $\mathbf{z}^{(t+1)}$ to the first quadrant of the complex plane and performs comparisons based on the line equations $\ell_1$–$\ell_6$ (see Sec. III-C) in order to project $\mathbf{z}^{(t+1)}$ onto $\mathcal{B}^B$.

B. Fixed-Point Parameters

The entries of $\mathbf{x}^{(t)}$ use 14-bit signed values with 8 fraction bits. The entries of $\tau \mathbf{x}^{(t)}$ use 14-bit signed values with 13 fraction bits. The entries of $\overline{\mathbf{H}}$ use 11-bit signed values with 8 fraction bits and are stored in look-up tables (LUTs) used as distributed RAM.
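The three word formats listed so far can be read as two's-complement fixed-point numbers. The Python sketch below shows the representable range each format implies and how a real value would be mapped to it. Several things here are assumptions rather than statements from the paper: the helper names are hypothetical, the formats are taken to apply to the real and imaginary parts separately, and the rounding mode (round half up) and saturation behavior are chosen for illustration because the text does not specify them.

```python
import math


def value_range(total_bits, frac_bits):
    """Smallest and largest value of a signed fixed-point format."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return lowest / scale, highest / scale


def to_fixed_point(value, total_bits, frac_bits):
    """Quantize a real value (assumed: round half up, then saturate)."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = math.floor(value * scale + 0.5)   # integer code before saturation
    code = max(lowest, min(highest, code))   # clip to the representable codes
    return code / scale


# (total bits, fraction bits) as stated in Sec. IV-B.
FORMATS = {
    "x": (14, 8),       # entries of x^(t)
    "tau_x": (14, 13),  # entries of tau * x^(t)
    "H_bar": (11, 8),   # entries of the augmented channel matrix
}

RANGES = {name: value_range(*fmt) for name, fmt in FORMATS.items()}
```

Under these assumptions, the $\tau\mathbf{x}^{(t)}$ format spends almost all of its bits on the fraction, which leaves a range just short of $\pm 1$, whereas the $\mathbf{x}^{(t)}$ format keeps five integer bits and covers roughly $\pm 32$.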
The complex-valued MAC units use 18-bit signed values with 15 fraction bits when computing $\mathbf{w}$; 11 fraction bits are used when calculating $\mathbf{z}^{(t+1)}$. The adder tree uses 21 bits with 15 fraction bits. The projection unit represents the constants (e.g., $1 - \sqrt{2}$ and its reciprocal) using signed values with 4–5 bits, so no multipliers are used in the operations related to lines $\ell_1$–$\ell_6$. A total of 30 adders and subtractors are used within each projection unit; these components operate on signed numbers with 7 fraction bits; the total bit-width varies between 14 and 15 bits, depending on the quantity.

C. Error-Rate Performance

Fig. 4(a) and Fig. 4(b) show uncoded bit-error rate (BER) as a function of the normalized transmit power $\varrho = B/N_0$ for different precoding algorithms and $U = 16$ UEs. Fig. 4(a) shows the BER for $B = 32$ BS antennas and BPSK; Fig. 4(b), for $B = 256$ BS antennas and 16-QAM. The simulation results are for 10,000 Monte-Carlo trials and i.i.d. Rayleigh fading channels. Both C2PO and C3PO run with $t_{\max} = 9$. For reference, we show the BERs with 3-bit CM MRT-quantized (MRT-Q) and ZF-quantized (ZF-Q) precoding, as well as the BERs with MRT ("Inf. prec. MRT") and ZF precoding ("Inf. prec. ZF") with infinite-precision DACs. We see from Fig. 4(a) and Fig. 4(b) that the nonlinear precoders (C2PO and C3PO) significantly outperform MRT-Q and ZF-Q at high normalized transmit power $\varrho$. Furthermore, compared to C2PO, we note that C3PO enables a 3.75 dB gain (in terms of $\varrho$) at 1% uncoded BER for $B = 32$ and BPSK, and 1.75 dB for $B = 256$ and 16-QAM.

[Fig. 4. Panels: (a) $B = 32$, $U = 16$, and BPSK; (b) $B = 256$, $U = 16$, and 16-QAM; (c) performance/complexity tradeoff. Panels (a) and (b) plot uncoded bit-error rate (BER) versus normalized transmit power $\varrho$ [dB] for Inf. prec. ZF, Inf. prec. MRT, 3-bit CM ZF-Q, 3-bit CM MRT-Q, 2-bit C2PO, and 3-bit C3PO. Panel (c) plots throughput [Msymbols/s] versus the minimum normalized transmit power $\varrho$ [dB] that achieves 1% BER for MRT-Q, C2PO, and C3PO with $B = 32, 64, 128, 256$.
Caption: Subfigures (a) and (b): uncoded bit-error rate (BER) of various precoders versus normalized transmit power $\varrho$. Markers show fixed-point performance. Subfigure (c): performance/complexity tradeoffs for C2PO [1] and C3PO; the numbers next to the curves indicate $t_{\max}$. The vertical lines show the performance of infinite-precision ZF precoding. C3PO outperforms C2PO in terms of uncoded BER with an increase in implementation complexity.]

TABLE I
XILINX VIRTEX-7 XC7VX690T FPGA IMPLEMENTATION RESULTS FOR MRT-Q [1], C2PO [1], AND THE PROPOSED C3PO FOR U = 16 UES

Algorithm                 B     Slices   LUTs     Flip-flops  DSP48  Clock [MHz]  Latency(a) [cycles]  Throughput(a) [Msymbols/s]  Power(b) [W]
2-bit CM MRT-Q [1]        32    2543     7842     5711        0      412          18                   366                         0.79
2-bit CM MRT-Q [1]        64    5097     15617    11419       0      410          18                   365                         1.25
2-bit CM MRT-Q [1]        128   9444     32476    21902       0      388          18                   345                         1.84
2-bit CM MRT-Q [1]        256   17630    64446    42764       0      359          18                   319                         3.16
2-bit C2PO [1]            32    3375     10817    5677        136    222          39                   91                          1.04
2-bit C2PO [1]            64    6519     21920    12461       272    206          40                   82                          1.70
2-bit C2PO [1]            128   12690    43710    26083       544    208          41                   81                          3.17
2-bit C2PO [1]            256   24748    85323    53409       1088   193          42                   74                          5.80
3-bit C3PO (this work)    32    8765     29034    11611       136    202          42                   77                          1.76
3-bit C3PO (this work)    64    16823    56799    24357       272    175          43                   65                          2.89
3-bit C3PO (this work)    128   33303    113948   49893       544    174          44                   63                          5.48
3-bit C3PO (this work)    256   65451    224420   101026      1088   157          45                   56                          10.12

(a) The minimum latency and maximum throughput are measured for one algorithm iteration.
(b) Statistical power estimation at maximum clock frequency and 1.0 V supply voltage.
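The latency column of Table I can be cross-checked against the cycle count $2U + \log_2(B/U) + 9$ from Sec. IV-A. The short Python sketch below does this for the C3PO rows. The throughput estimate $U \cdot f_{\text{clk}} / \text{latency}$ used in it is an inference from the table values, not a formula stated in the paper, and the function names are hypothetical.

```python
def c3po_latency_cycles(num_ues, num_antennas):
    """Clock cycles per C3PO iteration: 2U + log2(B/U) + 9 (Sec. IV-A)."""
    ratio = num_antennas // num_ues
    log2_ratio = ratio.bit_length() - 1   # exact log2 for a power-of-two ratio
    return 2 * num_ues + log2_ratio + 9


def estimated_throughput(num_ues, clock_mhz, latency_cycles):
    """Assumed relation: U symbols are produced every `latency_cycles` cycles."""
    return num_ues * clock_mhz / latency_cycles


U = 16
# C3PO clock frequencies [MHz] and throughputs [Msymbols/s] from Table I.
C3PO_CLOCK_MHZ = {32: 202, 64: 175, 128: 174, 256: 157}
C3PO_THROUGHPUT = {32: 77, 64: 65, 128: 63, 256: 56}

latencies = {B: c3po_latency_cycles(U, B) for B in C3PO_CLOCK_MHZ}
estimates = {B: estimated_throughput(U, C3PO_CLOCK_MHZ[B], latencies[B])
             for B in C3PO_CLOCK_MHZ}
```

With $U = 16$ the formula reproduces the tabulated C3PO latencies, and the assumed throughput relation lands within one Msymbol/s of each tabulated value, which suggests the table reports throughput for a single iteration in exactly this sense.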
Finally, we note that the implementation loss of our hardware designs (shown with blue markers) is negligible, i.e., less than 0.15 dB at 1% uncoded BER.

D. FPGA Implementation Results and Comparison

Table I shows FPGA implementation results for 2-bit CM MRT-Q [1], C2PO [1], and C3PO. All designs were developed in Verilog and implemented using the Xilinx Vivado Design Suite for a Xilinx Virtex-7 XC7VX690T FPGA. The designs support $U = 16$ UEs and were implemented for $B \in \{32, 64, 128, 256\}$. Table I reveals that the resources of all designs increase roughly linearly with $B$. MRT-Q achieves the highest throughput thanks to its simplicity, which comes at the cost of poor uncoded BER performance. C2PO uses ∼1.4× more LUTs than MRT-Q and has a higher latency and a longer critical path. Compared to C2PO, C3PO consumes ∼2.6× the number of slices and LUTs, ∼2× the number of flip-flops, and the same number of DSP48s. This difference is caused by the 3-bit CM projection unit, which also increases the latency with its pipeline registers. However, C3PO can significantly outperform C2PO in terms of BER (cf. Fig. 4(a) and Fig. 4(b)).

E. Performance/Complexity Tradeoffs

Fig. 4(c) shows the performance/complexity tradeoffs of C2PO and C3PO: the performance is represented by the minimum normalized transmit power $\varrho$ that is required to achieve 1% uncoded BER for BPSK; the complexity, by the throughput. The tradeoffs show systems with BPSK, $U = 16$ UEs, and $B \in \{32, 64, 128, 256\}$ BS antennas. As a reference, the minimum transmit power required for infinite-precision ZF precoding to achieve 1% uncoded BER is shown as a vertical line. We see from Fig. 4(c) that, while C2PO is able to achieve higher throughput than C3PO, C3PO requires lower transmit power to achieve 1% uncoded BER.
This difference increases for small array sizes: for a system with $B = 32$, 4 iterations of C3PO achieve 1% uncoded BER at $\varrho = 8$ dB, while C2PO is unable to achieve 1% uncoded BER at this value of $\varrho$.

V. CONCLUSIONS

We have proposed a nonlinear precoder for 8-phase (3-bit) CM transmission, C3PO, which builds upon the 4-phase C2PO precoder [1]. By using a different projection unit and no more than 2.7× higher FPGA resources, C3PO achieves up to 3.75 dB transmit power reduction, and thus, low uncoded BERs in scenarios for which C2PO exhibits poor error-rate performance.

REFERENCES

[1] O. Castañeda, S. Jacobsson, G. Durisi, M. Coldrey, T. Goldstein, and C. Studer, "1-bit massive MU-MIMO precoding in VLSI," IEEE J. Emerging Sel. Topics Circuits Syst., vol. 7, no. 4, pp. 508–522, Dec. 2017.
[2] F. Rusek, D. Persson, B. Kiong, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, "Scaling up MIMO: Opportunities and challenges with very large arrays," IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40–60, Jan. 2013.
[3] E. G. Larsson, F. Tufvesson, O. Edfors, and T. L. Marzetta, "Massive MIMO for next generation wireless systems," IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
[4] L. Lu, G. Ye Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, "An overview of massive MIMO: Benefits and challenges," IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 742–758, Oct. 2014.
[5] A. Mezghani, R. Ghiat, and J. A. Nossek, "Transmit processing with low resolution D/A-converters," in Proc. IEEE Int. Conf. Electron., Circuits, Syst. (ICECS), Yasmine Hammamet, Tunisia, Dec. 2009, pp. 683–686.
[6] A. K. Saxena, I. Fijalkow, and A. L. Swindlehurst, "Analysis of one-bit quantized precoding for the multiuser massive MIMO downlink," IEEE Trans. Signal Process., vol. 65, no. 17, pp. 4624–4634, Sep. 2017.
[7] Y. Li, C. Tao, A. L. Swindlehurst, A. Mezghani, and L. Liu, "Downlink achievable rate analysis in massive MIMO systems with one-bit DACs," IEEE Commun. Lett., vol. 21, no. 7, pp. 1669–1672, Jul. 2017.
[8] S. Jacobsson, G. Durisi, M. Coldrey, T. Goldstein, and C. Studer, "Quantized precoding for massive MU-MIMO," IEEE Trans. Commun., vol. 65, no. 11, pp. 4670–4684, Nov. 2017.
[9] S. Jacobsson, G. Durisi, M. Coldrey, T. Goldstein, and C. Studer, "Nonlinear 1-bit precoding for massive MU-MIMO with higher-order modulation," in Proc. Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 2016, pp. 763–767.
[10] H. Jedda, J. A. Nossek, and A. Mezghani, "Minimum BER precoding in 1-bit massive MIMO systems," in IEEE Sensor Array and Multichannel Sig. Proc. Workshop (SAM), Rio de Janeiro, Brazil, Jul. 2016.
[11] O. Tirkkonen and C. Studer, "Subset-codebook precoding for 1-bit massive multiuser MIMO," in Conf. on Info. Sciences and Systems (CISS), Baltimore, MD, Mar. 2017.
[12] A. L. Swindlehurst, A. K. Saxena, A. Mezghani, and I. Fijalkow, "Minimum probability-of-error perturbation precoding for the one-bit massive MIMO downlink," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), New Orleans, LA, USA, Mar. 2017, pp. 6483–6487.
[13] O. Castañeda, C. Studer, and T. Goldstein, "POKEMON: A non-linear beamforming algorithm for 1-bit massive MIMO," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), New Orleans, LA, USA, Mar. 2017, pp. 3464–3468.
[14] A. Noll, H. Jedda, and J. A. Nossek, "PSK precoding in multi-user MISO systems," in Proc. Int. ITG Workshop on Smart Antennas (WSA), Berlin, Germany, Mar. 2017, pp. 57–63.
[15] S. Jacobsson, O. Castañeda, C. Jeon, G. Durisi, and C. Studer, "Nonlinear phase-quantized constant-envelope precoding for massive MU-MIMO-OFDM," Oct. 2017. [Online]. Available: https://arxiv.org/abs/1710.06825
[16] L. Cannon, "A cellular computer to implement the Kalman filter algorithm," Ph.D. dissertation, Montana State University, USA, 1969.
[17] T. Goldstein, C. Studer, and R. G. Baraniuk, "A field guide to forward-backward splitting with a FASTA implementation," Nov. 2014. [Online]. Available: http://arxiv.org/abs/1411.3406
[18] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, Jan. 2009.
[19] T. Goldstein and S. Setzer, "High-order methods for basis pursuit," UCLA CAM Report, pp. 10–41, 2010.
[20] N. Parikh and S. Boyd, "Proximal algorithms," Foundations and Trends in Optimization, vol. 1, no. 3, pp. 127–239, Jan. 2014.
