On the Optimal Integer-Forcing Precoding: A Geometric Perspective and a Polynomial-Time Algorithm
Junren Qin, Fan Jiang, Tao Yang, Shanxiang Lyu, Rongke Liu, Senior Member, IEEE, Shi Jin, Fellow, IEEE

Abstract

The joint optimization of the integer matrix $\mathbf{A}$ and the power scaling matrix $\mathbf{D}$ is central to achieving the capacity-approaching performance of Integer-Forcing (IF) precoding. This problem, however, is known to be NP-hard, presenting a fundamental computational bottleneck. In this paper, we reveal that the solution space of this problem admits an intrinsic geometric structure: it can be partitioned into a finite number of conical regions, each associated with a distinct full-rank integer matrix $\mathbf{A}$. Leveraging this decomposition, we transform the NP-hard problem into a search over these regions and propose the Multi-Cone Nested Stochastic Pattern Search (MCN-SPS) algorithm. Our main theoretical result is that MCN-SPS finds a near-optimal solution with a computational complexity of $O\!\left(K^4 \log K \log_2(r_0)\right)$, which is polynomial in the number of users $K$. Numerical simulations corroborate the theoretical analysis and demonstrate the algorithm's efficacy.

Index Terms

Overloaded MIMO, integer-forcing precoding, nonlinear optimization, lattice basis reduction, hybrid optimizer.

I. INTRODUCTION

Benefiting from creating spatial diversity and performing spatial multiplexing, multiple-input multiple-output (MIMO) technology demonstrates powerful performance and has become a core technology of modern communication [1].
Developed through massive MIMO (m-MIMO) [2] in the fifth generation (5G) and extremely large-scale MIMO (XL-MIMO) [3] in the sixth generation (6G), the reliability of communication is enhanced by the channel hardening effect [4], which arises when the number of antennas at the base station (BS), $N$, is larger than the number of single-antenna user equipments (UEs), $K$. Towards the massive communication and ubiquitous connectivity applications in 6G, the demand for increased connectivity and user density has become a key performance indicator, and the increasing trend of these parameters is expected to continue in the near future [5].

This work was supported in part by the Mobile Information Networks National Science and Technology Major Project under Grant No. 2025ZD1305200, and in part by the Low-Altitude Airspace Strategic Program Portfolio under Grant Z25306105. Junren Qin is with the School of Electronics Information Engineering, Beihang University, Beijing 100191, China, the Pengcheng Laboratory (PCL), Shenzhen 518066, China, and the Shenzhen Institute of Beihang University, Shenzhen, China (e-mail: junrenqin@buaa.edu.cn). Fan Jiang is with the Pengcheng Laboratory (PCL), Shenzhen 518066, China (e-mail: jiangf02@pcl.ac.cn). Shanxiang Lyu is with the College of Cyber Security, Jinan University, Guangzhou 510632, China (e-mail: lsx07@jnu.edu.cn). Tao Yang is with the School of Electronics Information Engineering, Beihang University, Beijing 100191, China (e-mail: tyang@buaa.edu.cn). Rongke Liu is with the Shenzhen Institute of Beihang University, Shenzhen, China, and also with the School of Electronics Information Engineering, Beihang University, Beijing 100191, China (e-mail: rongke_liu@buaa.edu.cn). Shi Jin is with the School of Information Science and Engineering, Southeast University, Nanjing 210096, China (e-mail: jinshi@seu.edu.cn). Corresponding authors: Rongke Liu, Tao Yang, Fan Jiang.
This trend has drawn attention to the overloaded MIMO system, in which $K \ge N$ and the occurrence of the channel hardening effect is impeded by the limited spatial degrees of freedom. In the downlink, messages are delivered from the BS to the UEs, so all UEs' messages are broadcast over the full bandwidth. The capacity region of the MIMO broadcast channel (BC) can be established from the capacity region of the MIMO multiple-access channel via uplink-downlink duality, and so far it can be achieved by dirty-paper coding (DPC) [6], [7]. Unfortunately, owing to its high implementation complexity, DPC is widely regarded as a theoretical tool. To decrease the complexity at the cost of some performance, Tomlinson-Harashima precoding (THP) [8] moves decision feedback equalization (DFE) to the transmitter side. Furthermore, vector-perturbation (VP) precoding [9], [10] regards the interference cancellation process of DFE as the solution of a closest vector problem (CVP) in lattice theory [11]. To minimize the transmission power, VP employs an accurate sphere decoder (SD) combined with the Lenstra-Lenstra-Lovász (LLL) algorithm [12] and the sphere decoding algorithm [13]. These methods are all inspired by interference cancellation, and hence cannot be directly applied to overloaded MIMO. Since a large number of UEs share fewer BS antennas, the multi-user interference (MUI) cannot be eliminated completely, which undermines the very idea of interference cancellation. Meanwhile, the lack of spatial dimensions in overloaded MIMO decreases the accuracy of successive interference cancellation (SIC) in THP and weakens the effectiveness of the LLL algorithm in VP. These lead to weak performance for THP and high complexity for VP. On the other hand, linear precoding mitigates MUI by multiplying the signal or message by a precoding matrix derived from the channel matrix, and has become a large class of methods due to its ease of implementation.
The popular methods in this category are zero-forcing (ZF) precoding and regularized ZF (RZF) precoding [14]–[16]. For the precoding matrix, ZF employs the inverse (or pseudo-inverse) of the channel matrix, while RZF constructs a regularized matrix via minimum mean square error (MMSE) filtering. When $N \gg K$, as in m-MIMO or XL-MIMO, both ZF and RZF can approach the channel capacity [1], [17], but the situation is quite the opposite in overloaded MIMO. Regarding ZF, since the channel matrix is full column-rank, the product of the channel matrix and its pseudo-inverse yields the least-norm, least-squares solution, and the gap between this product and the identity matrix grows as the ratio $K/N$ increases. Meanwhile, the growth of the condition number of the pseudo-inverse of the channel matrix under overloaded MIMO amplifies the additive noise contained in the precoder outputs. Although these flaws are attenuated in RZF by the MMSE filter, the effects caused by overloaded MIMO are still non-negligible, and a solution adapted to overloaded MIMO is required. Recently, building on lattice-reduction-aided (LRA) precoding [18] and compute-and-forward (CF) [19], integer-forcing (IF) precoding [20] has provided an effective method to restrain MUI without a significant increase in complexity. Rather than separating users' codewords by equalization and beamforming, the IF precoder applies an inverse linear transform so that an integer-linear combination (ILC) corresponds to the desired message of each user [20]. For this reason, the IF precoder can universally achieve the MIMO capacity to within a constant gap [21]. Regarding overloaded MIMO, the ILC provides a degree of freedom to offset the effects caused by the dimensional flaw, which makes IF precoding exhibit good adaptability to overloaded MIMO. In IF precoding, the performance depends on the selection of a proper linear precoding matrix $\mathbf{P}$.
To achieve this optimization for general signal-to-noise ratio (SNR), [20] shows that the optimal selection of $\mathbf{P}$ is to choose a proper (unimodular) integer-valued matrix $\mathbf{A}$ and a beamforming matrix $\mathbf{D}$ such that $\mathbf{H}\mathbf{P} \approx c\,\mathbf{D}\mathbf{A}$, with equality in the limit SNR $\to \infty$. As a result, research on IF precoder design mainly focuses on the joint optimization of $\mathbf{A}$ and $\mathbf{D}$. For the special case $K = 2$, Silva et al. [20] provide the optimal pair $(\mathbf{A}, \mathbf{D})$ at high SNR. For the general case, He et al. [22] propose an iterative algorithm based on uplink-downlink duality to compute the optimal $(\mathbf{A}, \mathbf{D})$ accurately, and Qiu et al. [23] obtain the optimal $(\mathbf{A}, \mathbf{D})$ via the Particle Swarm Optimization (PSO) algorithm [24]. Considering the requirements of low complexity and sub-optimal optimization in the general case, Venturelli et al. [25] propose a computation-constraint-relaxed algorithm that relaxes the integer-valued full-rank matrix $\mathbf{A}$ to a complex-valued upper-unitriangular matrix $\mathbf{A}$. Nevertheless, the aforementioned methods face three critical tradeoffs:
• Optimality vs. Feasibility: Iterative algorithms based on uplink-downlink duality may yield power allocations incompatible with practical constraints such as a common shaping lattice [22].
• Global vs. Local Search: Heuristic methods like Particle Swarm Optimization (PSO) are prone to getting trapped in local optima and incur high computational costs [23], [26].
• Accuracy vs. Complexity: Relaxation-based methods simplify the problem by constraining $\mathbf{A}$ to a relaxed form (e.g., complex upper-unitriangular), but this often leads to performance loss [25].
Thus, designing an algorithm that efficiently navigates this joint optimization to strike a superior complexity-accuracy tradeoff remains an open and significant challenge. The contributions of this article, along with the highlights, are summarized as follows.
• Geometric Problem Reformulation: We establish a generalized optimization model that reveals the underlying geometric structure of the solution space. By treating $\mathbf{A}$ as a function of $\mathbf{D}$, we map the problem to a search within a $(K-1)$-dimensional space $\Omega$. We prove that $\Omega$ can be partitioned into a finite number of conical regions, each corresponding to a distinct, fixed integer matrix $\mathbf{A}$. A key insight is that any point within such a region can be uniquely represented by the direction of a ray from the origin, which drastically simplifies the subsequent search. This decomposition transforms the continuous joint optimization into a structured search over discrete regions.
• Polynomial-Time Algorithm Design: MCN-SPS adopts a stochastic, nested search strategy that iteratively explores candidate conical regions. At each iteration, it constructs a hypersphere centered at the current point and generates multiple sampling points via randomly directed rays. Each sampling point is projected back to the constraint set $\Omega$ and then fed into an efficient alternating optimization (AO) subroutine to obtain a local optimum. By comparing the sum rates of these local optima with that of the current point, MCN-SPS either shifts its search center to a better solution or contracts the search radius, thereby systematically refining its exploration. Crucially, within the AO subroutine, we develop a direction-guided method for $\mathbf{D}$-optimization that employs a contraction mapping in the Hilbert metric space, ensuring fast convergence. By discretizing the continuous search space into these structured regions and leveraging efficient local optimization, MCN-SPS achieves both low time complexity and high accuracy in solving the $(\mathbf{A}, \mathbf{D})$ optimization problem.
• Theoretical and Numerical Validation: We rigorously analyze the computational complexity of MCN-SPS, proving it to be $O\!\left(K^4 \log K \log_2(r_0)\right)$ with the LLL algorithm, which is polynomial in $K$.
Extensive simulations corroborate this analysis, demonstrating that MCN-SPS achieves a lower computational cost than the PSO-based benchmark. Moreover, it delivers superior sum-rate performance, outperforming all compared methods, especially in overloaded MIMO scenarios.

Notation: Matrices and column vectors are denoted by uppercase and lowercase boldface letters, respectively. For a matrix $\mathbf{X}$, we denote its $i$-th column vector as $\mathbf{x}_i$ and its $i$-th row vector as $\mathbf{x}_i^\top$. Sets are represented by uppercase calligraphic letters. For any $x > 0$, let $\log^+(x) \triangleq \max\{0, \log(x)\}$. For matrices $\mathbf{A}$ and $\mathbf{B}$, we denote the ordinary matrix product as $\mathbf{A}\mathbf{B}$ and the Hadamard product as $\mathbf{A} \circ \mathbf{B}$. For two vectors $\mathbf{x}$ and $\mathbf{y}$, we define the element-wise division $\mathbf{x} \oslash \mathbf{y} = [x_1/y_1, \cdots, x_n/y_n]^\top$. For symbol computations, $\otimes$ denotes modulo-$p$ multiplication. The operation $\mathrm{diag}(\cdot)$ applied to a matrix returns the vector of its main diagonal elements.

II. PRELIMINARIES

A. System Model

Consider a single-cell system consisting of a BS with $N$ antennas and $K$ single-antenna UEs, where the BS broadcasts signals to the UEs and each UE recovers its corresponding message from the signal received from the BS. In each broadcast, with $k \in \{1, \cdots, K\}$ and $n \in \{1, \cdots, N\}$, the coded-modulated signal matrix $\mathbf{X} = [\mathbf{x}_1, \cdots, \mathbf{x}_N]$, where $\mathbf{x}_n = [x_{1,n}, \cdots, x_{K,n}]^\top$, is processed by the signal-level precoding matrix $\mathbf{P}$ under the power constraint
$$\mathrm{Tr}(\mathbf{P}\mathbf{P}^\top) \le \rho, \quad (1)$$
where $\rho$ represents the transmit power at the BS. The precoded signal $\mathbf{s}[n]$ sent from the $N$ antennas of the BS is given by
$$\mathbf{s}[n] = \mathbf{P}[n]\,\mathbf{x}[n], \quad (2)$$
where $\mathbf{x}[n] = \mathbf{x}_n = [x_{1,n}, \cdots, x_{K,n}]^\top$ denotes the $K$ signals at the $n$-th instant (or sub-carrier) of the block.
During transmission over the channel, each $\mathbf{s}[n]$ is affected by the channel coefficient vector $\mathbf{h}_i^\top[n]$ and the additive white Gaussian noise (AWGN) $z_i[n]$ with zero mean and unit variance $\sigma_z^2 = 1$, where $i$ is the index of the UE. For a real-valued model, the baseband equivalent received signal of the $i$-th UE, $y_i[n]$, is calculated by
$$y_i[n] = \mathbf{h}_i^\top[n]\,\mathbf{s}[n] + z_i[n]. \quad (3)$$
With the channel coefficient matrix $\mathbf{H}_{K \times N}[n] = [\mathbf{h}_1[n], \cdots, \mathbf{h}_K[n]]^\top$, the received signals of the $K$ UEs are
$$\mathbf{y}[n] = \mathbf{H}[n]\,\mathbf{s}[n] + \mathbf{z}[n], \quad (4)$$
where $\mathbf{z}[n]$ denotes the AWGN vector. The transmit signal-to-noise ratio (SNR) is given by $\rho$. In this paper, we present a real-valued model; the complex-valued model with $\check{\mathbf{H}} \in \mathbb{C}^{K \times N}$ can be represented by a real-valued model of doubled dimension, i.e.,
$$\begin{bmatrix} \Re\{\mathbf{y}\} \\ \Im\{\mathbf{y}\} \end{bmatrix} = \begin{bmatrix} \Re\{\check{\mathbf{H}}\} & -\Im\{\check{\mathbf{H}}\} \\ \Im\{\check{\mathbf{H}}\} & \Re\{\check{\mathbf{H}}\} \end{bmatrix} \begin{bmatrix} \Re\{\mathbf{s}\} \\ \Im\{\mathbf{s}\} \end{bmatrix} + \begin{bmatrix} \Re\{\mathbf{z}\} \\ \Im\{\mathbf{z}\} \end{bmatrix}, \quad (5)$$
as treated in [19], [20]. In practice, the channel matrix $\mathbf{H}$ has to be estimated at the receiver for retrieving the transmitted data symbol vector, and imperfect channel estimates arise in any practical estimation scheme [27]. In this paper, we consider the conventional channel estimators based on minimum mean-square error (MMSE) [28] and maximum likelihood (ML) [29]. In MMSE channel estimation, the channel state information (CSI) with channel estimation error is modeled as [28]
$$\mathbf{H} = \hat{\mathbf{H}} + \mathbf{E}, \quad (6)$$
where $\hat{\mathbf{H}}$ and $\mathbf{E} \sim \mathcal{N}(\mathbf{0}, \sigma_e^2 \mathbf{I})$ denote the estimated channel and its estimation error, respectively. Under the principle of minimum observation-error distance, $\hat{\mathbf{H}}$ is independent of $\mathbf{E}$. In ML channel estimation, the channel estimate is given as [29]
$$\hat{\mathbf{H}} = \mathbf{H} + \mathbf{E}, \quad (7)$$
where the true channel $\mathbf{H} \sim \mathcal{N}(\mathbf{0}, \sigma_h^2 \mathbf{I})$ is independent of $\mathbf{E} \sim \mathcal{N}(\mathbf{0}, \sigma_e^2 \mathbf{I})$.
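The doubled-dimension real model in Eq. (5) is easy to check numerically. The following minimal NumPy sketch (with an arbitrary small channel, not one from the paper) verifies that the stacked real system reproduces the complex product $\check{\mathbf{H}}\mathbf{s}$:

```python
import numpy as np

def real_equivalent(H_c):
    """Real-valued representation of a complex channel matrix, Eq. (5)."""
    return np.block([[H_c.real, -H_c.imag],
                     [H_c.imag,  H_c.real]])

rng = np.random.default_rng(0)
K, N = 3, 2
H_c = rng.normal(size=(K, N)) + 1j * rng.normal(size=(K, N))
s_c = rng.normal(size=N) + 1j * rng.normal(size=N)

# Stack real and imaginary parts of the signal vector.
s_r = np.concatenate([s_c.real, s_c.imag])
y_r = real_equivalent(H_c) @ s_r

# The real model reproduces the complex product H s exactly.
y_c = H_c @ s_c
assert np.allclose(y_r, np.concatenate([y_c.real, y_c.imag]))
```

The noise term of Eq. (5) is omitted here since it stacks in the same way.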
In the ML case, $\sigma_e^2$ can be calculated by [27]
$$\sigma_e^2 = \frac{K \sigma_z^2}{N_{\mathrm{train}} E_{\mathrm{train}}}, \quad (8)$$
where $N_{\mathrm{train}}$ and $E_{\mathrm{train}}$ respectively denote the length of the training sequence and the power of the training symbols.

B. Integer-Forcing (IF) Precoding

Differing from conventional linear precoding, IF precoding employs the users' codewords to create an integer-valued effective channel matrix. By applying an inverse linear transform so that the ILC corresponds to the desired message of each user, the IF precoder can universally achieve the MIMO capacity to within a constant gap [20], [21]. Following [22], [23], the flowchart of the IF precoding system is depicted in Fig. 1. The whole procedure consists of the transmitter, the receivers, and the joint optimization.

Fig. 1. Block diagram of the IF precoding architecture [22], [23].

1) Transmitter: Before transmission, a full-rank integer-valued matrix $\mathbf{A} \in \mathbb{Z}^{K \times K}$ and a positive diagonal matrix $\mathbf{D} = \mathrm{diag}(d_1, \cdots, d_K)$ are selected by the joint optimization. Moreover, $\mathbf{D}$ satisfies
$$\det(\mathbf{D}) = 1. \quad (9)$$
At the BS, the $K$ modulo-$p$ UE messages $\mathbf{c}_1, \mathbf{c}_2, \cdots, \mathbf{c}_K \in \mathbb{Z}_p$ are premultiplied by a matrix $\mathbf{Q}^{-1}$ satisfying $\mathbf{Q}^{-1} \otimes \mathbf{Q} = \mathbf{I}$, where
$$\mathbf{Q} \equiv \lfloor \mathbf{A} \rceil \pmod{p} \quad (10)$$
over $\mathbb{Z}_p$. The outputs $\bar{\mathbf{c}}_k = \mathbf{Q}^{-1} \otimes \mathbf{c}_k$, $k \in \{1, \cdots, K\}$, are mapped into $K$ codeword-level precoded (CLP) $p$-PAM symbols $\mathbf{x}_1, \cdots, \mathbf{x}_K$ by
$$\mathbf{x}_k = \frac{1}{\gamma}\left(\bar{\mathbf{c}}_k - \frac{p-1}{2}[1, \cdots, 1]^\top\right) \in \frac{1}{\gamma}\left\{\frac{1-p}{2}, \cdots, \frac{p-1}{2}\right\}^K. \quad (11)$$
Then, a precoder $\mathbf{P}$ turns these CLP streams into the signal-level precoded (SLP) streams $\mathbf{s}_1, \cdots, \mathbf{s}_K$ via Eq. (2), which are then transmitted over the $N$ antennas of the BS.
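The premultiplication by $\mathbf{Q}^{-1}$ in Eq. (10) requires a matrix inverse over $\mathbb{Z}_p$. The sketch below is an illustration, not the paper's encoder: it assumes $p$ is prime (so every nonzero pivot is invertible via Python's three-argument `pow`), uses a hypothetical small $\mathbf{A}$, and checks that the modulo-$p$ round trip recovers the message:

```python
import numpy as np

def inv_mod_p(Q, p):
    """Inverse of an integer matrix over Z_p via Gauss-Jordan elimination.

    Assumes p is prime, so pow(x, -1, p) gives the modular inverse."""
    K = Q.shape[0]
    M = np.concatenate([Q % p, np.eye(K, dtype=np.int64)], axis=1)
    for col in range(K):
        # Find a pivot row whose entry is nonzero mod p.
        piv = next(r for r in range(col, K) if M[r, col] % p != 0)
        M[[col, piv]] = M[[piv, col]]
        inv = pow(int(M[col, col]), -1, p)   # needs Python >= 3.8
        M[col] = (M[col] * inv) % p
        for r in range(K):
            if r != col:
                M[r] = (M[r] - M[r, col] * M[col]) % p
    return M[:, K:]

p = 5
A = np.array([[2, 1, 0],                 # a hypothetical full-rank integer A
              [1, 3, 1],
              [0, 1, 1]], dtype=np.int64)
Q = A % p                                # Eq. (10); A here is already integer
Q_inv = inv_mod_p(Q, p)
assert np.array_equal((Q_inv @ Q) % p, np.eye(3, dtype=np.int64))

c = np.array([4, 0, 3])                  # a modulo-p message vector
c_bar = (Q_inv @ c) % p                  # premultiplied message
assert np.array_equal((Q @ c_bar) % p, c % p)   # the ILCs recover c
```

The last assertion is the transmitter-side counterpart of the receiver computation in Eq. (16): applying the rows of $\mathbf{Q}$ to $\bar{\mathbf{c}}$ undoes the premultiplication modulo $p$.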
According to [20], similar to ZF and RZF, $\mathbf{P}$ can be obtained by
$$\mathbf{P} = \frac{1}{\eta}\mathbf{H}^\top\left(\mathbf{H}\mathbf{H}^\top\right)^{-1}\mathbf{D}\mathbf{A} \quad (12)$$
for diagonally-scaled exact IF (DIF), and
$$\mathbf{P} = \frac{1}{\eta}\mathbf{H}^\top\left(\frac{K}{\rho}\mathbf{I} + \mathbf{H}\mathbf{H}^\top\right)^{-1}\mathbf{D}\mathbf{A} \quad (13)$$
for regularized IF (RIF). In Eq. (12) and Eq. (13), each entry $d_k$, $k \in \{1, \cdots, K\}$, denotes the power allocated to the $k$-th CLP stream, and $\eta$ is a scalar ensuring that the power constraint in Eq. (1) is met. For the full column-rank channel matrix $\mathbf{H}$ in overloaded MIMO, RIF is suitable since $\mathbf{H}\mathbf{H}^\top$ is not a full-rank matrix but $\frac{K}{\rho}\mathbf{I} + \mathbf{H}\mathbf{H}^\top$ is.

2) Receivers: According to Eq. (11), the received signal of UE $k$ in Eq. (3) can be written as
$$y_k = \mathbf{h}_k^\top\mathbf{s}_k + z_k = \mathbf{h}_k^\top\mathbf{P}\mathbf{x}_k + z_k = \frac{1}{\gamma}\tilde{\mathbf{h}}_k^\top\bar{\mathbf{c}}_k + z_k - \underbrace{\frac{p-1}{2\gamma}\sum_{l=1}^{K}\tilde{h}_{k,l}}_{\text{not related to the signal}}, \quad (14)$$
where $\tilde{\mathbf{h}}_k^\top = \mathbf{h}_k^\top\mathbf{P} = [\tilde{h}_{k,1}, \cdots, \tilde{h}_{k,K}]$. Since $\bar{\mathbf{c}}_k$ is an integer-valued vector, $\tilde{\mathbf{h}}_k^\top\bar{\mathbf{c}}_k$ can be regarded as the $k$-th entry of a lattice point with basis $[\tilde{\mathbf{h}}_1, \cdots, \tilde{\mathbf{h}}_K]^\top$, and $y_k$ as a dither on the $k$-th lattice point, which can be compensated easily.

At the receivers, receiver $k$ computes the ILC of $\bar{\mathbf{c}}_k$, written as
$$\hat{s}_{k,n} = \mathbf{a}_k^\top \otimes \bar{\mathbf{c}}_k, \quad k \in \{1, \cdots, K\},\ n \in \{1, \cdots, N\}, \quad (15)$$
based on the noisy observation of the lattice points in Eq. (14). According to Eq. (10), Eq. (15) can be rewritten as
$$\hat{s}_{k,n} = \mathbf{a}_k^\top \otimes \bar{\mathbf{c}}_k = \mathbf{a}_k^\top \otimes \mathbf{A}^{-1} \otimes \mathbf{c}_k = \hat{\mathbf{c}}_k. \quad (16)$$
Then, the computed stream $\hat{\mathbf{s}}_k$ of UE $k$ is converted into the estimated messages $\hat{\mathbf{c}}_1, \cdots, \hat{\mathbf{c}}_K$.

3) Joint Optimization: The achievable rate of IF precoding is given by
$$R_i(\mathbf{A}) \triangleq \log_2^+\!\left(\frac{1}{\mathbf{a}_i^\top\!\left(\mathbf{I} - \frac{\rho}{\rho\|\tilde{\mathbf{h}}_i\|^2 + 1}\tilde{\mathbf{h}}_i\tilde{\mathbf{h}}_i^\top\right)\mathbf{a}_i}\right), \quad (17)$$
where $\tilde{\mathbf{h}}_i^\top = \mathbf{h}_i^\top\mathbf{P}$ and the function $\log^+(\cdot)$ ensures that the rate remains non-negative. In the high-SNR regime [23], IF precoding is optimal and achieves a sum rate given by
$$R(\mathbf{A}, \mathbf{D}) = \frac{1}{2}\sum_{i=1}^K \log_2^+\!\left(\frac{d_i^2\,\rho}{\mathrm{Tr}\left(\mathbf{A}^\top\mathbf{D}^\top\mathbf{M}\mathbf{D}\mathbf{A}\right)}\right) = \frac{K}{2}\log_2^+\!\left(\frac{\rho\,(\det(\mathbf{D}))^{\frac{2}{K}}}{\mathrm{Tr}\left(\mathbf{A}^\top\mathbf{D}^\top\mathbf{M}\mathbf{D}\mathbf{A}\right)}\right), \quad (18)$$
where
$$\mathbf{M} = \left(\mathbf{H}\mathbf{H}^\top\right)^{-1} \quad (19)$$
in DIF precoding, and
$$\mathbf{M} = \left(\frac{K}{\rho}\mathbf{I} + \mathbf{H}\mathbf{H}^\top\right)^{-1} \quad (20)$$
in RIF precoding. The target of the $(\mathbf{A}, \mathbf{D})$ joint optimization is to select a pair $(\mathbf{A}, \mathbf{D})$ that maximizes Eq. (18). Under the constraint in Eq. (9) and the monotonicity of the $\log(\cdot)$ function, this problem can be expressed as
$$\arg\min_{\mathbf{A}, \mathbf{D}} \mathrm{Tr}\left(\mathbf{A}^\top\mathbf{D}^\top\mathbf{M}\mathbf{D}\mathbf{A}\right), \quad (21)$$
subject to $\mathbf{A} \in \mathbb{Z}^{K \times K}$, $\mathrm{rank}(\mathbf{A}) = K$, $\mathbf{D} \in \mathbb{R}_+^{K \times K}$ diagonal, $|\det(\mathbf{D})| = 1$. For a fixed $\mathbf{D}$, the optimization of $\mathbf{A}$ is equivalent to minimizing
$$\mathrm{Tr}\left(\mathbf{A}^\top\mathbf{D}^\top\mathbf{M}\mathbf{D}\mathbf{A}\right) = \sum_{l=1}^K \left\|\mathbf{M}^{\frac{1}{2}}\mathbf{D}\mathbf{a}_l\right\|^2, \quad (22)$$
where $\mathbf{a}_l$ denotes the $l$-th column of $\mathbf{A}$. Regarding $\mathbf{M}^{\frac{1}{2}}\mathbf{D}$ as a lattice basis, minimizing Eq. (22) amounts to solving the well-known shortest independent vector problem (SIVP) [30] in lattice theory.

C. Lattice Basis Reduction

This paper uses some concepts from lattices to arrive at a simpler description of the algorithm and a more elegant analysis. Here we review some basic concepts of lattices and lattice reduction. An $M$-dimensional lattice in $\mathbb{R}^M$ is defined as
$$\Lambda = \left\{\mathbf{G}\mathbf{z} \mid \mathbf{z} \in \mathbb{Z}^M\right\}, \quad (23)$$
where the full-rank matrix $\mathbf{G} \in \mathbb{R}^{M \times M}$ represents the generator matrix of $\Lambda$. In a lattice $\Lambda$, the SIVP is one of the critical problems describing the structure of the lattice, and is defined in Definition 1.

Definition 1 (SIVP [12]). Given a lattice $\Lambda \subset \mathbb{R}^M$, find $M$ linearly independent vectors $\mathbf{v}_1, \cdots, \mathbf{v}_M \in \Lambda$ such that
$$\|\mathbf{v}_m\| \le \lambda_M(\Lambda), \quad m \in \{1, \cdots, M\}. \quad (24)$$

In Definition 1, $\lambda_M(\Lambda)$ represents the $M$-th successive minimum, which is given by Definition 2.

Definition 2 (Successive Minima [12]). For a lattice $\Lambda \subset \mathbb{R}^M$ and $\forall i \in \{1, \cdots, \dim(\Lambda)\}$, the $i$-th minimum $\lambda_i(\Lambda)$ is defined as the minimum of $\max_{1 \le j \le i} \|\mathbf{v}_j\|$ over all $i$ linearly independent lattice vectors $\mathbf{v}_1, \cdots, \mathbf{v}_i \in \Lambda$.

To solve the SIVP, many methods [13], [31]–[33] have been proposed. In this paper, we focus on the conventional LLL algorithm [31] due to its polynomial complexity.
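To make the quantities of Eqs. (13) and (18)-(22) concrete, the sketch below builds an RIF precoder for an arbitrary random channel. The dimensions, the seed, and the trivial choice $\mathbf{A} = \mathbf{I}$, $\mathbf{d} = \mathbf{1}$ are illustrative assumptions, not an optimized design; the checks cover the power constraint (1) and the identity between the trace objective and the sum of lattice-vector norms in Eq. (22):

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, rho = 4, 3, 10.0                 # overloaded MIMO: K > N
H = rng.normal(size=(K, N))

A = np.eye(K)                          # a trivial full-rank integer matrix
d = np.ones(K)                         # det(D) = 1
D = np.diag(d)

# RIF precoder, Eq. (13); eta enforces the power constraint, Eq. (1).
M = np.linalg.inv((K / rho) * np.eye(K) + H @ H.T)    # Eq. (20)
P_raw = H.T @ M @ D @ A
eta = np.sqrt(np.trace(P_raw @ P_raw.T) / rho)
P = P_raw / eta
assert np.isclose(np.trace(P @ P.T), rho)

# Objective of Eq. (21) equals the sum of lattice-vector norms, Eq. (22).
obj = np.trace(A.T @ D.T @ M @ D @ A)
T = np.linalg.cholesky(M).T            # one valid square root: M = T^T T
obj_cols = sum(np.linalg.norm(T @ D @ A[:, l]) ** 2 for l in range(K))
assert np.isclose(obj, obj_cols)

# High-SNR sum rate, Eq. (18), with det(D) = 1.
rate = (K / 2) * max(0.0, np.log2(rho / obj))
assert obj > 0 and rate >= 0
```

Any factorization $\mathbf{M} = \mathbf{T}^\top\mathbf{T}$ works for the column-norm form; the Cholesky factor is used here purely for convenience.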
The corresponding criterion of the LLL algorithm, referred to as an LLL-reduced basis, is defined in Definition 3.

Definition 3 (LLL reduced [32]). Let $\mathbf{R}$ be the R-factor of a QR decomposition of $\mathbf{G}$, with elements $r_{i,j}$. A basis $\mathbf{G}$ is called LLL reduced if the following two conditions hold:
i) $|r_{i,j}/r_{i,i}| \le \frac{1}{2}$, $1 \le i < j \le N$ (size-reduction condition);
ii) $\delta\,\|\pi^\perp_{\mathcal{G}}(\mathbf{g}_i)\|^2 \le \|\pi^\perp_{\mathcal{G}}(\mathbf{g}_{i+1})\|^2$, $1 \le i \le N-1$ (Lovász condition).

In Definition 3, $\delta \in \left(\frac{1}{4}, 1\right]$ is called the Lovász constant, and the Lovász condition can be equivalently rewritten as
$$\|\mathbf{g}^*_{i+1}\|^2 \ge \left(\delta - \mu_{i+1,i}^2\right)\|\mathbf{g}^*_i\|^2. \quad (25)$$
If $\mathbf{G}$ is LLL reduced, it satisfies [31]
$$\|\mathbf{g}_i\| \le \beta^{N-1}\lambda_i(\Lambda_{\mathbf{G}}), \quad i \in \{1, \cdots, N\}, \quad (26)$$
in which
$$\beta = \frac{1}{\sqrt{\delta - \frac{1}{4}}} \in \left[\frac{2}{\sqrt{3}}, +\infty\right). \quad (27)$$
In the LLL algorithm, summarized in Alg. 1, operations i and ii can be regarded as two transforming matrices $\mathbf{T}^{(\mathrm{i})}_{k,i}$ and $\mathbf{T}^{(\mathrm{ii})}_i$, respectively, satisfying
$$\mathbf{T}^{(\mathrm{i})}_{k,i} = \begin{bmatrix} 1 & -\mu_{k,i} \\ 0 & 1 \end{bmatrix}, \quad \mathbf{T}^{(\mathrm{ii})}_i = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad (28)$$
where $\mathbf{T}^{(\mathrm{i})}_{k,i}$ acts on the vector pair $[\mathbf{r}_k, \mathbf{r}_i]$ and $\mathbf{T}^{(\mathrm{ii})}_i$ on $[\mathbf{r}_{i-1}, \mathbf{r}_i]$. During the processing of the LLL algorithm, the transforming matrix between the original and LLL-reduced bases is a unimodular matrix, derived by multiplying all $\mathbf{T}^{(\mathrm{i})}_{k,i}$ and $\mathbf{T}^{(\mathrm{ii})}_i$ successively in their order of occurrence [34].

D. The Contraction Mapping Approach to Perron-Frobenius Theory

In this paper, we employ Perron-Frobenius theory in the complete metric space under the Hilbert metric to guide the algorithm design. Here we review some background on the contraction mapping approach [35]–[37] and nonlinear Perron-Frobenius theory [38].
Algorithm 1 The LLL algorithm
Input: lattice basis matrix $\mathbf{G}$, Lovász constant $\delta$
Output: LLL-reduced basis matrix $\mathbf{G}'$, transforming unimodular matrix $\mathbf{A}$
1: $[\mathbf{Q}, \mathbf{R}] \leftarrow \mathrm{qr}(\mathbf{G})$ ▷ QR decomposition of $\mathbf{G}$
2: $[M, N] \leftarrow \mathrm{size}(\mathbf{G})$
3: $i \leftarrow 2$
4: while $i \le N$ do
5:   for $k = i-1 : -1 : 1$ do
6:     $\mu_{k,i} \leftarrow \lfloor r_{k,i}/r_{k,k} \rceil$ ▷ $r_{i,j}$ is the $(i,j)$ element of $\mathbf{R}$
7:     if $\mu_{k,i} \ne 0$ then ▷ Operation i
8:       $\mathbf{r}_i \leftarrow \mathbf{r}_i - \mu_{k,i}\mathbf{r}_k$ ▷ $\mathbf{r}_i$ is the $i$-th column of $\mathbf{R}$
9:     end if
10:   end for
11:   if $\delta |r_{i-1,i-1}|^2 > |r_{i,i}|^2 + |r_{i-1,i}|^2$ then
12:     $\alpha \leftarrow r_{i-1,i}/\sqrt{r_{i-1,i}^2 + r_{i,i}^2}$
13:     $\beta \leftarrow r_{i,i}/\sqrt{r_{i-1,i}^2 + r_{i,i}^2}$
14:     Swap $\mathbf{r}_{i-1}$ and $\mathbf{r}_i$ ▷ Operation ii
15:     $\mathbf{V} \leftarrow \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}$
16:     $\begin{bmatrix} \mathbf{r}^\top_{i-1} \\ \mathbf{r}^\top_i \end{bmatrix} \leftarrow \mathbf{V}\begin{bmatrix} \mathbf{r}^\top_{i-1} \\ \mathbf{r}^\top_i \end{bmatrix}$ ▷ $\mathbf{r}^\top_i$ is the $i$-th row of $\mathbf{R}$
17:     $[\mathbf{q}_{i-1}, \mathbf{q}_i] \leftarrow [\mathbf{q}_{i-1}, \mathbf{q}_i]\mathbf{V}^\top$ ▷ $\mathbf{q}_i$ is the $i$-th column of $\mathbf{Q}$
18:     $i \leftarrow \max(i-1, 2)$
19:   else
20:     $i \leftarrow i + 1$
21:   end if
22: end while
23: $\mathbf{G}' \leftarrow \mathbf{Q}\mathbf{R}$
24: $\mathbf{A} \leftarrow \lfloor \mathbf{G}^{-1}\mathbf{G}' \rceil$ ▷ $\lfloor\cdot\rceil$ rounds all elements

In a complete metric space $(\mathcal{X}, d)$, a linear transformation $\mathbf{T}$ is a contraction when it satisfies
$$\exists \gamma \in (0, 1),\ d(\mathbf{T}\mathbf{x}, \mathbf{T}\mathbf{y}) \le \gamma\, d(\mathbf{x}, \mathbf{y}),\ \forall \mathbf{x}, \mathbf{y} \in \mathcal{X}. \quad (29)$$
For a contraction $\mathbf{T}$, there exists an $\mathbf{x}_0 \in \mathcal{X}$ such that $\mathbf{T}^q\mathbf{x} \mapsto \mathbf{x}_0$ for all $\mathbf{x} \in \mathcal{X}$, where $\mathbf{x}_0$ is referred to as the fixed point of $\mathbf{T}$ in $(\mathcal{X}, d)$. In terms of direction, the well-known Perron theorem [36] states that if $\mathbf{T}$ is a positive linear transformation on $\mathbb{R}^N$, then there exist $\mathbf{x}_0, \mathbf{x} > 0$ such that (s.t.), as $q \to \infty$,
$$\frac{\mathbf{T}^q\mathbf{x}}{\|\mathbf{T}^q\mathbf{x}\|} \to \frac{\mathbf{x}_0}{\|\mathbf{x}_0\|}. \quad (30)$$
In [35], [39], a Hilbert metric $d_H(\cdot)$ in non-Euclidean geometry is defined as
$$d_H(\mathbf{x}, \mathbf{y}) = \log\frac{\max(\mathbf{x} \oslash \mathbf{y})}{\min(\mathbf{x} \oslash \mathbf{y})}. \quad (31)$$
It has the following characteristics [37]:
i) $d_H(\mathbf{x}, \mathbf{y}) \ge 0$, and $d_H(\mathbf{x}, \mathbf{y}) = 0$ if and only if $\mathbf{x} = c\,\mathbf{y}$, $c > 0$;
ii) $d_H(\mathbf{x}, \mathbf{y}) = d_H(\mathbf{y}, \mathbf{x})$;
iii) $d_H(\mathbf{x}, \mathbf{z}) \le d_H(\mathbf{x}, \mathbf{y}) + d_H(\mathbf{y}, \mathbf{z})$;
iv) $\forall a, b > 0$, $d_H(a\mathbf{x}, b\mathbf{y}) = d_H(\mathbf{x}, \mathbf{y})$.
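Alg. 1's pseudocode can be made runnable. The sketch below implements the same size-reduction and swap logic, but recomputes a Gram-Schmidt orthogonalization at each step instead of maintaining the QR factors in place as Alg. 1 does; it is a didactic version, not the complexity-optimized routine analyzed later in the paper:

```python
import numpy as np

def lll_reduce(G, delta=0.75):
    """Plain LLL on the columns of G (the logic of Alg. 1).

    Returns (G_red, A) with G_red = G @ A and A unimodular. Gram-Schmidt
    data is recomputed per step for clarity, not for speed."""
    B = np.array(G, dtype=float)
    n = B.shape[1]
    A = np.eye(n, dtype=np.int64)

    def gso(B):
        """Gram-Schmidt orthogonalization with coefficients mu."""
        Bs = B.copy()
        mu = np.zeros((n, n))
        for i in range(n):
            for j in range(i):
                mu[i, j] = (B[:, i] @ Bs[:, j]) / (Bs[:, j] @ Bs[:, j])
                Bs[:, i] -= mu[i, j] * Bs[:, j]
        return Bs, mu

    i = 1
    while i < n:
        for k in range(i - 1, -1, -1):       # size reduction (Operation i)
            _, mu = gso(B)
            q = round(mu[i, k])
            if q != 0:
                B[:, i] -= q * B[:, k]
                A[:, i] -= q * A[:, k]
        Bs, mu = gso(B)
        lhs = delta * (Bs[:, i - 1] @ Bs[:, i - 1])
        rhs = Bs[:, i] @ Bs[:, i] + mu[i, i - 1] ** 2 * (Bs[:, i - 1] @ Bs[:, i - 1])
        if lhs > rhs:                        # Lovász condition fails: swap (Operation ii)
            B[:, [i - 1, i]] = B[:, [i, i - 1]]
            A[:, [i - 1, i]] = A[:, [i, i - 1]]
            i = max(i - 1, 1)
        else:
            i += 1
    return B, A

G = np.array([[1.0, 10.0],
              [0.0, 1.0]])
G_red, A = lll_reduce(G)
assert np.allclose(G_red, G @ A)             # same lattice, reduced basis
assert round(abs(np.linalg.det(A))) == 1     # A is unimodular
assert np.allclose(G_red, [[1.0, 0.0], [0.0, 1.0]])
```

For this toy basis the single size-reduction step $\mathbf{g}_2 \leftarrow \mathbf{g}_2 - 10\mathbf{g}_1$ already yields a reduced basis, so the unimodular record is $\mathbf{A} = \begin{bmatrix} 1 & -10 \\ 0 & 1 \end{bmatrix}$.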
Regarding the Hilbert metric, the Perron theorem still works, in the form of the Birkhoff-Hopf theorem, denoted by
$$d_H(\mathbf{T}\mathbf{x}, \mathbf{T}\mathbf{y}) \le \gamma\, d_H(\mathbf{x}, \mathbf{y}),\ \forall \mathbf{x}, \mathbf{y} \in \mathcal{X}, \quad (32)$$
where such a $\gamma \in (0, 1)$ exists when $\mathbf{T}$ is a positive linear operator. The contraction ratio $\kappa(\mathbf{T})$ of a linear operator $\mathbf{T}$ is defined in [38] as
$$\kappa(\mathbf{T}) \triangleq \inf\{\gamma \ge 0 \mid d_H(\mathbf{T}\mathbf{x}, \mathbf{T}\mathbf{y}) \le \gamma\, d_H(\mathbf{x}, \mathbf{y}),\ \forall \mathbf{x}, \mathbf{y} \in \mathcal{X}\}. \quad (33)$$
In [35], it is proved that
$$\kappa(\mathbf{T}) = \tanh\left(\frac{\Delta(\mathbf{T})}{4}\right), \quad (34)$$
where the projective diameter $\Delta(\mathbf{T})$ of $\mathbf{T}$ is defined by
$$\Delta(\mathbf{T}) \triangleq \sup\{d_H(\mathbf{T}\mathbf{x}, \mathbf{T}\mathbf{y}) \mid \mathbf{x}, \mathbf{y} \in \mathcal{X}\}. \quad (35)$$

III. GEOMETRIC STRUCTURE OF THE SOLUTION SPACE

Since $\mathbf{D} = \mathrm{diag}(d_1, \cdots, d_K)$ is a positive diagonal matrix with $\det(\mathbf{D}) = 1$, the set of all possible $\mathbf{D}$ is the positive diagonal subgroup $\mathrm{SLD}^+_K(\mathbb{R})$ of the special linear group. To express $\mathbf{D}$ simply, we define its main diagonal as a vector $\mathbf{d}$ by
$$\mathbf{d} = \mathbf{D}\mathbf{1}_K = [d_1, \cdots, d_K]^\top, \quad (36)$$
where $\mathbf{1}_K = [1, \cdots, 1]^\top$. Since the elements $\{d_k\}_{k=1}^K$ are all positive, following the constraint in Eq. (9), $\mathbf{d}$ satisfies $\prod_{k=1}^K d_k = 1$, or equivalently $\sum_{k=1}^K \log(d_k) = 0$. Regarding $\{d_k\}_{k=1}^K$ as the positive-region coordinates of a $K$-dimensional space $\mathcal{D}$, the set of all possible $\mathbf{d}$ for Eq. (21) is a $(K-1)$-dimensional subspace $\Omega$ represented by
$$\Omega = \left\{\mathbf{d}\ \middle|\ \prod_{k=1}^K d_k = 1,\ d_k > 0,\ k \in \{1, \cdots, K\}\right\}. \quad (37)$$
In this section, we elaborate a generalized optimization model for IF precoding based on the geometric features of $\mathbf{D}$.

A. General Model of Joint Optimization

According to Eq. (22), we can regard $\mathbf{A}$ as a function of $\mathbf{d}$, written $\mathbf{A}(\mathbf{d})$. In terms of $\mathbf{D}$, the expression in Eq. (21) can be rewritten as
$$\mathrm{Tr}\left(\mathbf{A}(\mathbf{d})^\top\mathbf{D}^\top\mathbf{M}\mathbf{D}\mathbf{A}(\mathbf{d})\right) = \mathbf{d}^\top\left(\mathbf{A}(\mathbf{d})\mathbf{A}(\mathbf{d})^\top \circ \mathbf{M}\right)\mathbf{d}. \quad (38)$$
Thus, the problem in Eq. (21) is equivalent to
$$\arg\min_{\mathbf{d}} \mathbf{d}^\top\left(\mathbf{A}(\mathbf{d})\mathbf{A}(\mathbf{d})^\top \circ \mathbf{M}\right)\mathbf{d}, \quad (39)$$
subject to $\mathbf{d} \in \Omega$. Considering the interference coupling matrix $\mathbf{G} = \mathbf{A}\mathbf{A}^\top \circ \mathbf{M}$, since $\mathbf{M}$ is known, $\mathbf{G}$ is also a function of $\mathbf{d}$, denoted as $\mathbf{G}(\mathbf{d})$.
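The trace identity in Eq. (38), which underpins the reformulation in Eq. (39), can be verified numerically; a minimal sketch with arbitrary random $K$, $\mathbf{M}$, $\mathbf{A}$, and $\mathbf{d} \in \Omega$ (none taken from the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4
B = rng.normal(size=(K, K))
M = B @ B.T + np.eye(K)                  # an arbitrary symmetric positive definite M
A = rng.integers(-3, 4, size=(K, K))     # an arbitrary integer matrix

d = rng.uniform(0.5, 2.0, size=K)
d /= np.prod(d) ** (1 / K)               # enforce prod(d_k) = 1, i.e. d in Omega
D = np.diag(d)

lhs = np.trace(A.T @ D.T @ M @ D @ A)    # left-hand side of Eq. (38)
rhs = d @ ((A @ A.T) * M) @ d            # d^T (A A^T o M) d; "*" is the Hadamard product
assert np.isclose(lhs, rhs)
```

The identity holds for any diagonal $\mathbf{D}$ and any $\mathbf{M}$, which is why the joint objective collapses to a quadratic form in $\mathbf{d}$ alone.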
The Lagrange function of Eq. (39) is represented as
$$\mathcal{L}(\mathbf{d}, \alpha) = \mathbf{d}^\top\mathbf{G}(\mathbf{d})\mathbf{d} + \alpha\sum_{i=1}^K \log(d_i). \quad (40)$$
Letting $\nabla\mathcal{L}(\mathbf{d}, \alpha) = \frac{\partial\mathcal{L}(\mathbf{d}, \alpha)}{\partial\mathbf{d}} = \mathbf{0}_K$, where $\mathbf{0}_K = [0, \cdots, 0]^\top$, a system of equations for the optimum of Eq. (40) can be established as
$$\begin{cases} 2\mathbf{G}(\mathbf{d})\mathbf{d} + \mathbf{d}^\top\frac{\partial\mathbf{G}(\mathbf{d})}{\partial\mathbf{d}}\mathbf{d} + \alpha\left(\mathbf{1}_K \oslash \mathbf{d}\right) = \mathbf{0}_K; \\ \sum_{i=1}^K \log(d_i) = 0. \end{cases} \quad (41)$$
To solve Eq. (41), it is important to know the mappings $\mathbf{d} \mapsto \mathbf{A}$ and $\mathbf{A} \mapsto \mathbf{d}$. From Eq. (22), $\mathbf{d} \mapsto \mathbf{A}$ is related to solving the SIVP, which can be written as
$$\mathbf{A} = \left(\mathbf{M}^{\frac{1}{2}}\mathrm{diag}(\mathbf{d})\right)^{-1}\Psi\!\left(\mathbf{M}^{\frac{1}{2}}\mathrm{diag}(\mathbf{d})\right), \quad (42)$$
where the function $\Psi(\cdot)$ represents the process of solving the SIVP. For a constant $\mathbf{M}$, $\mathbf{d}$ maps to $\mathbf{D}$ one-to-one and affects $\mathbf{A}$ by scaling the basis vectors of the generator matrix $\mathbf{M}^{\frac{1}{2}}$. Regarding the continuity of $\mathbf{d} \mapsto \mathbf{A}$: since the SIVP is NP-hard [30] and the solving algorithms (LLL, SD, etc.) are iterative, it is hard to formulate $\mathbf{A} \mapsto \mathbf{d}$ directly, but some structure is easily obtained for the $\mathbf{A}$ derived from a vector and its neighborhood in $\Omega$. To describe the similarity between two matrices, we apply the spectral norm $\|\cdot\|_2$ [40], defined as
$$\|\mathbf{D}\|_2 \triangleq \sup_{\mathbf{x} \ne \mathbf{0}_K} \frac{\|\mathbf{D}\mathbf{x}\|_2}{\|\mathbf{x}\|_2}. \quad (43)$$
$\|\cdot\|_2$ is a matrix norm compatible with the vector 2-norm. Then, we have a theorem about two similar matrices.

Theorem 1. Consider a non-singular generator matrix $\mathbf{G}_A$ of a lattice $\Lambda_A$ and a positive diagonal matrix $\mathbf{D}_A$ with the constraint $\det(\mathbf{D}_A) = 1$; the product $\mathbf{G}_B = \mathbf{G}_A\mathbf{D}_A$ is the generator matrix of a lattice $\Lambda_B$, and the solution sets of $\Psi(\mathbf{G}_A)$ and $\Psi(\mathbf{G}_B)$ are denoted as $\mathcal{S}_A$ and $\mathcal{S}_B$, respectively. When $\mathbf{D}_A$ is similar to $\mathbf{I}$, i.e., for a minimal value $\epsilon > 0$, $\|\mathbf{D}_A - \mathbf{I}\|_2$ satisfies
$$\|\mathbf{D}_A - \mathbf{I}\|_2 < \epsilon \quad (44)$$
and
$$|\lambda_K(\Lambda_A) - \lambda_K(\Lambda_B)| < \epsilon, \quad (45)$$
then we have
$$\forall \mathbf{R}_B = \mathbf{G}_B\mathbf{U} \in \mathcal{S}_B,\ \exists \mathbf{R}_A = \mathbf{G}_A\mathbf{V} \in \mathcal{S}_A,\ \mathrm{s.t.}\ \mathbf{U} = \mathbf{V}. \quad (46)$$

Proof. See Appendix A.

According to Theorem 1, when $\mathbf{d}$ moves at a minimal scale in $\Omega$, $\mathbf{A}$ may remain the same before and after the move, provided the successive minima remain continuous.
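The region behavior suggested by Theorem 1 can be observed directly in two dimensions. The sketch below brute-forces the SIVP of Eq. (42) over a small coefficient range, an illustrative shortcut that assumes the optimal coefficients are small, with an arbitrary hypothetical basis; it shows $\mathbf{A}(\mathbf{d})$ staying constant under a small move of $\mathbf{d}$ within $\Omega$ but changing under a large one:

```python
import numpy as np
from itertools import product

def sivp_2d(G):
    """Brute-force 2-D SIVP: integer coefficients of the two shortest
    independent lattice vectors, i.e. the A of Eq. (42)."""
    cands = sorted((z for z in product(range(-4, 5), repeat=2) if z != (0, 0)),
                   key=lambda z: np.linalg.norm(G @ z))
    z1 = cands[0]
    z2 = next(z for z in cands if z1[0] * z[1] - z1[1] * z[0] != 0)
    return np.array([z1, z2]).T

M_half = np.array([[1.0, 0.45],
                   [0.0, 1.20]])          # an arbitrary 2-D M^(1/2)

def A_of(d):                              # the mapping d -> A of Eq. (42)
    return sivp_2d(M_half @ np.diag(d))

# A small move of d inside Omega keeps A fixed (same conical region) ...
assert np.array_equal(A_of([1.0, 1.0]), A_of([1.05, 1 / 1.05]))
# ... while a large move lands in a region with a different A.
assert not np.array_equal(A_of([1.0, 1.0]), A_of([3.0, 1 / 3.0]))
```

Both test points satisfy $d_1 d_2 = 1$, so the comparison really is between nearby and distant points of $\Omega$, matching the partition-into-cones picture developed below.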
In general, it is hard to delimit when $\mathbf{A}$ shifts as $\mathbf{d}$ changes, so we examine a sample of the LLL algorithm on a 2-dimensional basis matrix $\mathbf{M}'_2 = \mathbf{M}_2^{\frac{1}{2}}$. We introduce this sample through the size-reduction condition and the Lovász condition.

• Size-reduction condition: Consider the QR decomposition
$$\mathbf{M}'_2 = \mathbf{Q}_2\mathbf{R}_2 = \mathbf{Q}_2\begin{bmatrix} r_{1,1} & r_{1,2} \\ 0 & r_{2,2} \end{bmatrix}. \quad (47)$$
According to Alg. 1, the size-reduction coefficient is $\mu = \lfloor r_{1,2}/r_{1,1} \rceil$ when $\mathbf{D}_2 = \mathbf{I}_2$. For a changed $\mathbf{D}_2 = \mathrm{diag}(d_1, d_2)$, the changed size-reduction coefficient $\mu'$ is given by
$$\mu' = \left\lfloor \frac{d_2 r_{1,2}}{d_1 r_{1,1}} \right\rceil\Bigg|_{d_1 = 1/d_2} = \left\lfloor \frac{d_2^2\, r_{1,2}}{r_{1,1}} \right\rceil. \quad (48)$$
When $\left\lfloor (d_2^2 - 1)\,r_{1,2}/r_{1,1} \right\rceil = 0$, we have $\mu' = \mu$, so that $\mathbf{A}_2$ remains unchanged.

• Lovász condition: According to Eq. (25), the swapping between the two vectors in $\mathbf{M}'_2$ is related to $\mu$. Since $\mu$ changes under different $\mathbf{D}_2$ in size reduction, the number of iterations of the algorithm becomes unpredictable in practice, which triggers sudden changes in $\Psi(\mathbf{M}'_2\mathbf{D}_2)$ and leads to a different $\mathbf{A}_2$.

Based on the above, we learn that the mapping in Eq. (42) is non-injective, which reveals that the mapping $\mathbf{A} \mapsto \mathbf{d}$ is one-to-many. This suggests that each fixed $\mathbf{A}$ corresponds to a region $\mathcal{P}(\mathbf{A})$ of $\Omega$, and these regions form $\Omega$ without overlap. We can solve Eq. (41) by dividing $\Omega$ into multiple regions represented by different $\mathbf{A}$. Based on Theorem 1, we have Theorem 2 for its computational complexity.

Theorem 2. Solving Eq. (41) is at least NP-hard.

Proof. Based on the non-injectivity of the mapping in Eq. (42), solving Eq. (41) is equivalent to solving four successive subproblems:
SP1: Without the range constraint of a region $\mathcal{P}(\mathbf{A}_i)$, how to derive the optimal $\mathbf{D}$ for that fixed $\mathbf{A}_i$?
SP2: With the range constraint of a region $\mathcal{P}(\mathbf{A}_i)$, is that optimal $\mathbf{D}$ within $\mathcal{P}(\mathbf{A}_i)$?
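The size-reduction boundary described above can be seen numerically. A minimal sketch with a hypothetical 2x2 R-factor (the values of $r_{1,1}$ and $r_{1,2}$ are arbitrary, chosen only to place a boundary between $d_2 = 1.1$ and $d_2 = 2$):

```python
r11, r12 = 1.0, 0.35     # hypothetical entries of R_2 in Eq. (47)

def mu_prime(d2):
    """Size-reduction coefficient of the scaled basis, Eq. (48), d1 = 1/d2."""
    return round(d2 ** 2 * r12 / r11)

# Near d2 = 1 the coefficient (and hence A_2) is unchanged ...
assert mu_prime(1.0) == mu_prime(1.1) == 0
# ... but a larger scaling crosses a region boundary and changes it.
assert mu_prime(2.0) == 1
```

The jump of the rounded coefficient at $d_2^2 r_{1,2}/r_{1,1} = 1/2$ is exactly where one conical region of $\Omega$ ends and the next begins.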
SP3: A region $\mathcal{P}(\mathbf{A}_i)$ whose optimal $\mathbf{D}$ lies within $\mathcal{P}(\mathbf{A}_i)$ is referred to as a candidate $\mathcal{P}$; how to select all candidate $\mathcal{P}$ from all $\mathcal{P}(\mathbf{A})$?
SP4: How to select the optimal $\mathcal{P}$ from all candidate $\mathcal{P}$?

Considering SP2, an effective method is to calculate the $\mathbf{A}$ induced by the derived optimal $\mathbf{D}$ and compare it with the original $\mathbf{A}_i$. This method is equivalent to solving the SIVP, which is NP-hard [30]. Assuming the solutions of SP1, SP3, and SP4 are all polynomial, the whole computational complexity can be derived as
$$T_{\mathrm{whole}} = T_{\mathrm{SP1}} \oplus T_{\mathrm{SP2}} \oplus T_{\mathrm{SP3}} \oplus T_{\mathrm{SP4}} = \mathrm{P} \oplus \text{NP-hard} \oplus \mathrm{P} \oplus \mathrm{P} = \text{NP-hard}. \quad (49)$$
Thus, solving Eq. (41) is at least NP-hard, and the theorem is proved.

Remark 1. The same unproven statement as in Theorem 2 is mentioned in [20], [22], [25]. Based on Theorem 2, a closed-form solution of Eq. (41) is impossible to obtain unless P = NP, which leads us to pursue an approximate solution that balances complexity and accuracy. Among the four subproblems in Theorem 2, SP1's solution is polynomial, as discussed in Sect. III-B. Meanwhile, solving SP4 given SP3's solution is equivalent to solving a sorting problem, which is polynomial. The SIVP solver for SP2 is well studied in [32], [33], [41]. Thus, the difficulty of solving Eq. (41) lies in solving SP3 and SP4.

B. Model on Fixed A

For a region represented by a fixed $\mathbf{A}$, the mapping $\mathbf{d} \mapsto \mathbf{A}$ can be ignored since $\mathbf{A}$ is regarded as a constant integer-valued matrix. In this situation, the problem of finding an optimal $\mathbf{d}$ is denoted by
$$\arg\min_{\mathbf{d}} \mathbf{d}^\top\left(\mathbf{A}\mathbf{A}^\top \circ \mathbf{M}\right)\mathbf{d}, \quad (50)$$
subject to $\mathbf{d} \in \Omega$, where the interference coupling matrix $\mathbf{G} = \mathbf{A}\mathbf{A}^\top \circ \mathbf{M}$ is constant. The following lemma addresses the convexity of Eq. (50).

Lemma 1. Given a fixed $\mathbf{A}$, Eq. (50) is a convex problem if any of the following conditions is satisfied: i) $N \ge K$ in DIF; ii) any $\mathbf{H}_{K \times N}$ in RIF.

Proof. Since $\mathbf{M}$ differs between DIF and RIF, we elaborate the two situations respectively.
• DIF: According to Eq. (19), M = (HH⊤)^{-1}. By the properties of the Gram matrix, HH⊤ is positive semidefinite, and AA⊤ is positive definite since A is full-rank. When N ≥ K, HH⊤ is almost surely positive definite, so (HH⊤)^{-1} exists and is positive definite as well. By the Schur Product Theorem, AA⊤ ∘ M is then positive definite, which shows that the objective function in DIF is convex.

• RIF: Using the eigendecomposition HH⊤ = QΛQ⊤, we have

  (K/ρ)I + HH⊤ = Q( (K/ρ)I + Λ )Q⊤.   (51)

Since K/ρ > 0, (K/ρ)I + HH⊤ is positive definite. As in DIF, M = ((K/ρ)I + HH⊤)^{-1} is positive definite, and AA⊤ ∘ M is positive definite as well. Thus the objective function in RIF is convex.

Combining the two cases, d⊤(AA⊤ ∘ M)d is convex when N ≥ K in DIF, and for any H_{K×N} in RIF. The lemma is proved.

To solve Eq. (50), we form the Lagrangian

  L_fix(d, α) = d⊤Gd + α Σ_{i=1}^{K} log(d_i),   (52)

and solve ∇L_fix(d, α) = 0_K, yielding

  2Gd + α(1_K ⊘ d) = 0_K;   Σ_{i=1}^{K} log(d_i) = 0.   (53)

Premultiplying the first equation by d⊤, and using d⊤0_K = 0 and d⊤(1_K ⊘ d) = K, Eq. (53) is equivalent to

  d⊤Gd = −αK/2;   Σ_{i=1}^{K} log(d_i) = 0.   (54)

Since G is positive definite by Lemma 1, the quadratic form remains positive, indicating that α is negative. Thus d⊤Gd = |α|K/2 can be regarded as a hyperellipsoid in D, obtained from d⊤Gd = 1 by scaling with |α|K/2. Furthermore, for a decomposition G = T⊤T, the equation d⊤Gd = |α|K/2 can be reformulated as

  d⊤Gd = |α|K/2  ⇒  ∥Td∥² = |α|K/2.   (55)

In Eq. (50) we want to minimize d⊤Gd over Ω. According to Eq. (55), the minimum value is |α|K/2, which is a scaling of d⊤Gd = 1. The problem in Eq. (50) therefore amounts to finding the smallest scaling at which the hypersphere ∥Td∥² = 1, inflated from the origin of D, reaches the transformed constraint set — equivalently, to finding the minimum-norm vector Td with d on the constraint surface.
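The identity d⊤Gd = ∥Td∥² behind this reformulation, and the minimization over the constraint surface ∏ d_i = 1, can be checked numerically. The following pure-Python sketch uses a hypothetical 2×2 positive definite G (the values are illustrative, not taken from the paper) and a closed-form minimum that exists only in this K = 2 case:

```python
import math

# Hypothetical 2x2 PD interference coupling matrix G = [[a, b], [b, c]].
a, b, c = 4.0, 1.0, 9.0
assert a > 0 and a * c - b * b > 0   # positive definiteness

# Cholesky-style factor T with G = T^T T, so d^T G d == ||T d||^2.
t11, t12 = math.sqrt(a), b / math.sqrt(a)
t22 = math.sqrt(c - b * b / a)

def quad(d1, d2):                    # d^T G d
    return a * d1 * d1 + 2 * b * d1 * d2 + c * d2 * d2

def norm_Td(d1, d2):                 # ||T d||^2
    y1 = t11 * d1 + t12 * d2
    y2 = t22 * d2
    return y1 * y1 + y2 * y2

assert abs(quad(1.3, 0.7) - norm_Td(1.3, 0.7)) < 1e-9

# Parameterize the constraint d1*d2 = 1 by d = (e^s, e^{-s}) and grid-search s.
best = min(quad(math.exp(s / 1000.0), math.exp(-s / 1000.0))
           for s in range(-2000, 2001))

# For K = 2 the analytic minimum on the constraint is 2*sqrt(a*c) + 2*b.
analytic = 2 * math.sqrt(a * c) + 2 * b
print(best, analytic)                # both approximately 14.0
```

For K = 2 with d = (e^s, e^{−s}), the objective a·e^{2s} + 2b + c·e^{−2s} is minimized at e^{4s} = c/a with value 2√(ac) + 2b, which the grid search recovers; this matches the claim that only the direction on the constraint surface matters.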
Thus, a generalized optimization model for fixed A is established, represented by

  ∥Td∥² = |α|K/2;   Σ_{i=1}^{K} log(d_i) = 0.   (56)

Considering the eigendecomposition G = T⊤T = QΛQ⊤ and the LDL⊤ factorization OGO⊤ = OT⊤TO⊤ = LΣL⊤ [25], Fig. 2 depicts some 3-dimensional samples of Eq. (56), in which SNR = 15 dB and

  H = [ 0.7826  0.6097  0.7154 ; 1.8776  1.8774  0.1899 ; 1.8507  0.0694  0.3630 ].

[Fig. 2: 3-dimensional samples of the generalized optimization model for fixed A. Panels: (a) T = Λ^{1/2}Q⊤ from the SVD; (b) T = Σ^{1/2}L⊤O from the LDL factorization; (c) T = Σ^{1/2} as in [25]. Each panel marks the origin, the shortest vector in Ω, the shortest vector in the transformed region, and the extremum for optimization.]

From the observations of Fig. 2(a) and (b), the shape of the region transformed by T (colored red) is changed compared to the region Ω (colored blue), which causes the extremum for optimization (reconstructed from the shortest vector in the red region) to be offset from the shortest vector in Ω. The deformation of the transformed region has two causes:

i) The rotation caused by the unitary matrix changes the spatial location of the region, so that part of the scaling exerted by the diagonal matrix turns into a folding effect.

ii) The scaling exerted by the diagonal matrix makes the bulge of Ω tilt along a certain direction.

Meanwhile, under the LDL factorization with L⊤O = I, the generalized model in Eq. (56) reduces to the method proposed in [25], which is depicted in Fig. 2(c). From that figure, we observe that the assumption L⊤O = I ignores the rotation caused by the unitary matrix, which produces an obvious deviation from the practical optimum.

Remark 2. Since the problem in Eq. (50) is equivalent to finding the minimum vector Td on the surface of the hypersphere ∥Td∥² = 1, the points inside that hypersphere can also be regarded as alternatives when selecting d.
It reflects that, from a geometric perspective, the constraint ∏_{i=1}^{K} d_i ≤ 1 is equivalent to ∏_{i=1}^{K} d_i = 1 and yields the same solution. Owing to the equivalence between ∏_{i=1}^{K} d_i ≤ 1 and Σ_{i=1}^{K} log(d_i) ≤ 0, the constraint in Eq. (50) can be converted into a convex one in the geometric sense, and, combined with Lemma 1, the problem in Eq. (50) is convex.

Summarizing the above, we have Theorem 3 on the structure of the solution set of Eq. (41).

Theorem 3. The solution set S of Eq. (41) is a finite nonempty set in D.

Proof. See Appendix B.

Remark 3. If we regard Ω as a set of regions P(A), then searching for the optimal extremum over Ω can be interpreted as a search over the elements of S. According to Theorem 3, S is a proper subset of the real space R^K, which implies that the search space can be reduced, and the time complexity lowered, when S is known either completely or in part. If S is fully known, the optimal extremum can be selected from the extrema corresponding to all P(A) associated with S; if S is only partially known, a sub-optimal extremum can be obtained.

IV. OPTIMAL IF PRECODER DESIGN

To obtain the optimal IF precoder from the proposed generalized model, we propose a method, referred to as Multi-Cone Nested Stochastic Pattern Search (MCN-SPS), that optimizes (A, D) adaptively to the channel matrix H and the SNR. In this section, we introduce the optimization for fixed A (solving SP1), the alternating optimization of (A, D) (solving SP3), and the stochastic pattern search.

A. Optimization for Fixed A

To solve Eq. (56), we observe from Fig. 2 that the shape of Ω forms a cone in the holopositive (all-positive) region of D. We have the following lemma for this hypothesis.

Lemma 2. Any ray starting from the origin toward D's holopositive region D+ has one and only one intersection with Ω.
For the space B transformed from D by B, this property also holds for Ω₂ = { B(1_K ⊘ d) | ∏_{i=1}^{K} d_i = 1, d_i > 0, i ∈ {1, ..., K} }, where B is an arbitrary non-negative matrix.

Proof. We prove Lemma 2 by contradiction. First, assume

  ∃ r = [r_1, ..., r_{K−1}, β]⊤ ∈ D+  s.t.  r ∉ Ω and Br ∉ Ω₂.   (57)

Since d_i > 0 for i ∈ {1, ..., K}, it is impossible to find an r in the holopositive region of D with β < 0. Thus r must belong to Ω, which contradicts the assumption. In other words, any ray toward the holopositive region of D must intersect Ω, and likewise any such ray in B must intersect Ω₂. Regarding the uniqueness of the intersection, since the lemma contains two propositions, we prove them respectively.

• Ω: Assume a ray has two intersections with Ω, i.e.,

  ∃ d_1 = [d_{1,1}, ..., d_{1,K}]⊤ ∈ Ω, d_2 = η d_1 ∈ Ω, η ≠ 1,  s.t.  ∏_{i=1}^{K} d_{1,i} = 1, ∏_{i=1}^{K} d_{2,i} = 1.   (58)

However, since

  ∏_{i=1}^{K} d_{2,i} = η^K ∏_{i=1}^{K} d_{1,i} = η^K ≠ 1,   (59)

this contradicts Eq. (58); any number of intersections greater than one is ruled out by the same argument. Thus there is exactly one intersection between any ray toward the holopositive region of D and Ω.

• Ω₂: For any d ∈ Ω, ∏_{i=1}^{K} d_i = 1 implies

  ∏_{i=1}^{K} (1/d_i) = 1,   (60)

so the set

  Ω₃ = { 1_K ⊘ d | ∏_{i=1}^{K} d_i = 1, d_i > 0, i ∈ {1, ..., K} }   (61)

corresponds one-to-one with Ω. Similarly, assuming

  ∃ x_1, x_2 = η x_1 ∈ Ω₂, η ≠ 1,  s.t.  ∏_{i=1}^{K} [B^{-1}x_1]_i = 1, ∏_{i=1}^{K} [B^{-1}x_2]_i = 1,   (62)

we reach a contradiction, since

  ∏_{i=1}^{K} [B^{-1}x_2]_i = η^K ∏_{i=1}^{K} [B^{-1}x_1]_i = η^K ≠ 1.   (63)

By the one-to-one correspondence, there is exactly one intersection between any ray toward the holopositive region of B and Ω₂.

Combining the above, the lemma is proved.
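The content of Lemma 2 is easy to verify numerically: along a ray t·v with v in the holopositive region, the product ∏(t v_i) = t^K ∏ v_i is strictly increasing in t > 0, so it crosses 1 exactly once. A minimal pure-Python sketch (the direction values are hypothetical):

```python
import math

# For a ray t*v (t > 0) with direction v in the holopositive region, the
# constraint prod(t*v_i) = t^K * prod(v_i) = 1 has exactly one positive root:
#     t* = (prod(v_i))^(-1/K),
# since t^K * prod(v_i) is strictly increasing in t > 0.
def omega_intersection(v):
    assert all(vi > 0 for vi in v), "direction must point into the holopositive region"
    K = len(v)
    t = math.prod(v) ** (-1.0 / K)
    return [t * vi for vi in v]

d = omega_intersection([0.5, 2.0, 4.0])   # hypothetical direction
print(d, math.prod(d))                     # the intersection lies on prod(d_i) = 1
```

Scaling the direction vector (i.e., picking another point on the same ray) returns the same intersection point, which matches the uniqueness claim of the lemma.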
Lemma 2 reveals that each d ∈ Ω corresponds to a ray toward the holopositive region of D, which suggests that locating a d requires only the direction of d rather than its length. Based on this, obtaining the optimal d only requires the proportions among the entries d_i. According to Eq. (53), we have

  2Gd + α(1_K ⊘ d) = 0_K;  Σ_{i=1}^{K} log(d_i) = 0
  ⇒  d_1 g_1⊤d = −α/2, ..., d_K g_K⊤d = −α/2;  Σ_{i=1}^{K} log(d_i) = 0,   (64)

where g_i, i ∈ {1, ..., K}, denotes the i-th row of G. From Eq. (64), we find d_1 g_1⊤d = ... = d_K g_K⊤d = −α/2, or equivalently,

  d_i / d_j = (g_j⊤d) / (g_i⊤d),  i ≠ j.   (65)

By constructing the auxiliary matrix

  S = [ d_1/d_1  ...  d_1/d_K ; ... ; d_K/d_1  ...  d_K/d_K ],   (66)

the proportion of each d_j can be calculated as

  d_j / Σ_{i=1}^{K} d_i = ( S_{:,j}⊤ 1_K )^{-1} = ( d_1/d_j + ... + d_K/d_j )^{-1}
  = ( g_j⊤d/g_1⊤d + ... + g_j⊤d/g_K⊤d )^{-1} = (1/(g_j⊤d)) ( Σ_{i=1}^{K} 1/(g_i⊤d) )^{-1}.   (67)

Since ( Σ_{i=1}^{K} 1/(g_i⊤d) )^{-1} is a common constant determined by G, we derive

  d_opt ∝ [ 1/(g_j⊤d) ]_{j ∈ {1, ..., K}}.   (68)

Thus, regarding only the direction of d, the equation system for optimizing d is given by

  Gd ∝ 1_K ⊘ d;  ∏_{i=1}^{K} d_i = 1.   (69)

Eq. (69) can be recognized as a special case of the matrix balancing problem. We have the following theorem for this equivalence.

Theorem 4. The constraint ∏_{i=1}^{K} d_i = 1 can be expressed by the scaling function

  Γ(x) = ϑ(x) x,  ϑ(x) ≜ 1 / ∏_{i=1}^{K} x_i^{1/K},  x = [x_1, ..., x_K]⊤.   (70)

Let G be an M-matrix [42]. The fixed-A problem is equivalent to a unilateral matrix balancing problem with row-sum constraints: we seek a diagonal matrix C such that the matrix B = ϑ(G^{-1}C·1_K) CG^{-1}C is row stochastic [43], i.e., it satisfies

  B·1_K = 1_K.   (71)

Proof. Due to the constraint ∏_{i=1}^{K} d_i = 1, Eq. (69) can be merged into

  d = Γ( G^{-1}(1_K ⊘ d) ) = ϑ( G^{-1}(1_K ⊘ d) ) G^{-1}(1_K ⊘ d).   (72)

The above equation can be reformulated as

  d = ϑ( G^{-1}(1_K ⊘ d) ) G^{-1}(1_K ⊘ d)
  ⇒ D·1_K = ϑ( G^{-1}D^{-1}·1_K ) G^{-1}D^{-1}·1_K
  ⇒ ϑ( G^{-1}D^{-1}·1_K ) D^{-1}G^{-1}D^{-1}·1_K = 1_K.   (73)

Letting C = D^{-1}, we have

  ϑ( G^{-1}C·1_K ) CG^{-1}C·1_K = 1_K.   (74)

Since G is an M-matrix and positive definite by Lemma 1, G^{-1} must be a non-negative matrix according to [42]. With B = ϑ(G^{-1}C·1_K) CG^{-1}C, Eq. (74) becomes B·1_K = 1_K. By the non-negativity of G^{-1} and D, B is a non-negative matrix, and hence a row stochastic matrix by the definition in [43].

With Theorem 4, we learn that the optimization of D for fixed A can be regarded as a unilateral matrix balancing problem with row-sum constraints. Inspired by the Sinkhorn iteration [44], the update is formulated as

  d^{(t+1)} = Γ( G^{-1}( 1_K ⊘ d^{(t)} ) ),   (75)

where Γ(x) = x / ∏_{i=1}^{K} x_i^{1/K} is the normalization enforcing the constraint ∏_{i=1}^{K} d_i = 1, and t denotes the iteration index. The convergence of Eq. (75) is proved in the next theorem.

Theorem 5. For p-PAM symbols over a Rayleigh channel, if AA⊤ is non-negative, then Eq. (75) converges in direction, i.e.,

  d^{(t)} ∝ Γ( G^{-1}( 1_K ⊘ d^{(t)} ) )  as t → ∞.   (76)

This conclusion holds in RIF, and in DIF when N ≥ K.

Proof. See Appendix C.

Besides convergence, Theorem 4 and Theorem 5 reveal some features of Eq. (75):

i) When G is an M-matrix, its inverse G^{-1} is a non-negative matrix, ensuring that the iterate d^{(t)} at the t-th iteration belongs to Ω without any modification.
Algorithm 2 Reciprocal Approximation (RA)
Input: channel coefficient matrix H, SNR ρ, integer coefficient matrix A, tolerance ϵ
Output: optimal power allocation matrix D
1: [N, K] ← size(H)
2: M ← ((K/ρ)I + HH⊤)^{-1}
3: G ← (AA⊤) ∘ M
4: D ← Init(G)  ▷ initialize the diagonal matrix
5: ang ← +∞
6: while ang > ϵ do
7:   d_old ← D·1_K
8:   d_new ← Γ( G^{-1}(1_K ⊘ d_old) )
9:   D ← diag(d_new)
10:  ang ← log( max(d_new ⊘ d_old) / min(d_new ⊘ d_old) )
11: end while

ii) When G is not an M-matrix, according to Perron–Frobenius theory with the Hilbert metric, the contraction factor γ enlarges and, in the final iterations, stays at 1 rather than exceeding 1, which pins the ray between two near-optimal directions. This feature ensures the effectiveness of the optimization.

The optimization under fixed A is summarized in Algorithm 2, referred to as Reciprocal Approximation (RA). In this algorithm, Eq. (75) is applied iteratively to map the reciprocal space into D, continuously contracting the solution toward a fixed point within D. The process completes when the Hilbert metric between successive iterates becomes smaller than a given threshold, and the optimal power allocation matrix D is returned.

Regarding Eq. (75), based on Theorem 4, several conditions ensure that the non-negative matrix requirement of the matrix balancing problem is met:

i) Channel hardening: when N ≫ K, according to random matrix theory [45], the Gram matrix HH⊤ can be regarded as a diagonal matrix. With the positive definiteness proved in Lemma 1, the main-diagonal elements of M = [m_{i,j}]_{i,j ∈ {1,...,K}} are positive and the others are zero:

  m_{i,j} > 0 if i = j;  m_{i,j} = 0 if i ≠ j.   (77)

Then, since AA⊤ is positive semidefinite, we have

  g_{i,j} = a′_{i,j} m_{i,j} if i = j;  g_{i,j} = 0 if i ≠ j,   (78)

where a′_{i,j} denotes the (i, j) element of AA⊤ and g_{i,j} the (i, j) element of G. Eq. (78) reflects that G is an M-matrix [42], whose main-diagonal elements are non-negative and whose off-diagonal elements are non-positive. Combining the properties of M-matrices [42] with G's positive definiteness proved in Lemma 1, we learn that G^{-1} is a positive transformation conforming to the well-known Perron Theorem [36]. We have the following theorem for the approximate solution of Eq. (69) when M is a diagonal matrix.

Theorem 6. When M is a diagonal matrix, a solution of Eq. (69) is given by

  D_opt = L^{1/2} / det(L)^{1/(2K)},   (79)

where

  L = diag( (AA⊤ ∘ M)^{-1} · 1_K ).   (80)

Proof. Since M is diagonal, it follows Eq. (77). Substituting Eq. (78) into Eq. (69), we have

  diag(d_1 g_{1,1}, ..., d_K g_{K,K}) · 1_K ∝ diag(d_1^{-1}, ..., d_K^{-1}) · 1_K;  ∏_{i=1}^{K} d_i = 1.   (81)

Solving Eq. (81) yields

  d_1 = ( ∏_{i=1}^{K} g_{i,i}^{1/(2K)} ) g_{1,1}^{-1/2}, ..., d_K = ( ∏_{i=1}^{K} g_{i,i}^{1/(2K)} ) g_{K,K}^{-1/2}
  ⇒ D_opt = diag(G^{-1}·1_K)^{1/2} / det( diag(G^{-1}·1_K) )^{1/(2K)}.   (82)

Since G = AA⊤ ∘ M, the theorem is proved.

ii) DIF with the LLL algorithm: when N ≥ K, DIF employs M = (HH⊤)^{-1}, which is an M-matrix according to Appendix C. Since H is Rayleigh distributed, M is not sparse, which causes a period-2 oscillation in the iteration. According to Lemma 3, the transformation matrix that reduces the basis M^{1/2}D under the size reduction condition is non-negative. Since the swapping matrix under the Lovász condition is non-negative as well, the unimodular matrix A output by the LLL algorithm in this situation is non-negative, which meets the condition of Theorem 5 that AA⊤ be non-negative.

Lemma 3. In the LLL algorithm, the size reduction coefficients µ_{i,j} remain non-positive throughout the iterations whenever the input basis B = [b_1, ..., b_K] is such that B⊤B is a dense M-matrix.

Proof. We prove this lemma by induction.
Since B⊤B is a dense M-matrix, its off-diagonal elements are non-positive, which means

  b_i⊤b_j ≤ 0,  i ≠ j.   (83)

[Fig. 3: 3-dimensional samples of the convergence of Eq. (75) versus SNR, with panels (a) SNR = 0 dB, (b) SNR = 5 dB, (c) SNR = 10 dB, (d) SNR = 15 dB.]

After the Gram–Schmidt orthogonalization step of the LLL algorithm, we obtain the orthogonal vectors b*_1, ..., b*_K with b*_1 = b_1. For i = 2, µ_{2,1} is given by

  µ_{2,1} = b_2⊤b*_1 / ( (b*_1)⊤b*_1 ) = b_2⊤b_1 / ( b_1⊤b_1 ) ≤ 0.   (84)

Assuming µ_{p,q} ≤ 0 for all q < p < i, then for i > j we have

  b_i⊤b*_j = b_i⊤b_j − Σ_{p=1}^{j−1} µ_{j,p} b_i⊤b*_p.   (85)

Since µ_{j,p} ≤ 0 and b_i⊤b*_p ≤ 0, we have Σ_{p=1}^{j−1} µ_{j,p} b_i⊤b*_p ≥ 0, so b_i⊤b*_j ≤ b_i⊤b_j ≤ 0. Thus,

  µ_{i,j} = b_i⊤b*_j / ( (b*_j)⊤b*_j ) ≤ 0.   (86)

The lemma is proved.

iii) RIF with the LLL algorithm at high SNR: when the SNR is high enough, (K/ρ)I is small enough to be neglected, so that

  M = ( (K/ρ)I + HH⊤ )^{-1} → (HH⊤)^{-1}  as ρ → +∞.   (87)

Algorithm 3 Alternating Optimization (AO)
Input: initial power allocation matrix D_init, channel matrix H, SNR ρ
Output: sub-optimal power allocation matrix D, sub-optimal integer matrix A
1: A_tmp ← I
2: M ← ((K/ρ)I + HH⊤)^{-1}
3: obtain A from D_init via Eq. (42)
4: iter ← 0
5: while A_tmp ≠ A do
6:   D ← RA(H, A, ρ)  ▷ solve for D with fixed A
7:   A_tmp ← A
8:   obtain A from D via Eq. (42)
9:   iter ← iter + 1
10:  if iter > max_iter then
11:    break
12:  end if
13: end while

For the same reason as in case ii), the unimodular matrix A output by the LLL algorithm in this situation is non-negative, which meets the condition of Theorem 5. To explore how the SNR affects convergence, we present a 3-dimensional sample with

  H = [ 0.5472  0.1643  0.4058 ; 1.2850  0.6481  1.1394 ; 0.5685  1.1173  0.7982 ].

The observations in Fig. 3 reveal that the point distribution produced by Eq. (75) shrinks as the SNR grows, and that Eq. (75) mostly converges in direction when SNR ≥ 5 dB.

B. Alternating Optimization for (A, D)

Having addressed the optimization of D with a fixed A, we now consider the joint optimization of (A, D). While an optimal d_opt can be found for a given A, it is important to note that not every such d_opt lies in the corresponding region P(A). A standard approach to check whether d_opt ∈ P(A) is to compute the integer matrix A′ associated with d_opt via an SIVP solver and test whether A′ matches the original A. A key challenge in the joint optimization arises when the current d_opt ∉ P(A): determining the next starting point of the search. We tackle this challenge using an alternating optimization (AO) framework. The algorithm is designed with a greedy, proximity-based strategy: the integer matrix A obtained in one iteration directly initializes the next. Each AO iteration executes two consecutive steps: (S1) given D, update A optimally via the SIVP solver in Eq. (42); (S2) given this A, update D optimally via Alg. 2. This alternating process converges to a stationary point (A, D).

[Fig. 4: Flowchart of the MCN-SPS method: random-directional ray construction and hypersphere surface search, multi-cone optimization via the AO algorithm, sum-rate comparison, origin shifting, and radius contraction until output.]

Furthermore, because each step is optimal conditioned on the other variable, the achievable rate R satisfies the following non-decreasing property at iteration t:

  R( A^{(t)}, D^{(t)} ) ≤ R( A^{(t)}, D_opt^{(t)} ) ≤ R( A_opt^{(t)}, D_opt^{(t)} ) = R( A^{(t+1)}, D^{(t+1)} ).   (88)

The iteration terminates when

  A^{(t)} = ( M^{1/2}D^{(t)} )^{-1} Ψ( M^{1/2}D^{(t)} ).   (89)

The overall procedure is summarized in Alg. 3.

C. Multi-Cone Nested Stochastic Pattern Search (MCN-SPS)

If Alg. 3 consistently converged to the unique optimal point d_opt ∈ P(A) within Ω, the optimal pair (A, D) could be readily obtained.
However, as indicated by Property 3, this condition is not always met, since multiple disjoint regions may contain such optimal points. To ensure convergence to a globally optimal pair (A, D) among these candidate solutions, we propose a method termed the Multi-Cone Nested Stochastic Pattern Search (MCN-SPS). Fig. 4 illustrates the flowchart of the proposed method, which consists of three core processes: search preparation, search and optimization execution, and result integration and adjustment. These processes are described in detail below.

i) Search preparation: Centered at a point d^{(t)} ∈ Ω in the t-th iteration, a hypersphere B(d^{(t)}, r) is constructed with a preset radius r. Starting from d^{(t)}, Q rays are emitted in random directions, crossing the surface of B(d^{(t)}, r) at d̂^{(t)}_1, ..., d̂^{(t)}_Q. The i-th intersection can be calculated as

  d̂^{(t)}_i = d^{(t)} + r v̂_i,   (90)

where v̂_i denotes the unit vector in the direction of the i-th ray. After the displacement r v̂_i, some d̂^{(t)}_i may leave the holopositive region of D. To keep d_i > 0 for i ∈ {1, ..., K}, escaped d̂^{(t)}_i are discarded, and fresh random rays are drawn in a loop until all vectors lie within the holopositive region of D.

ii) Search and optimization execution: After the search space is constructed, we correct the intersections d̂^{(t)}_1, ..., d̂^{(t)}_Q so that they satisfy the constraint via Γ(·). These corrected nodes are then fed into Alg. 3 in parallel to obtain nearby sub-optimal nodes in Ω satisfying Eq. (89). These nodes, denoted ô^{(t)}_i, i ∈ {1, ..., Q}, are nodes near (or coinciding with) d^{(t)} that satisfy d_opt ∈ P(A). To reduce the comparison complexity, coinciding nodes are merged into a single node.
iii) Result integration and adjustment: In this step, we compare the sum rate of d^{(t)} with those of its neighbors ô^{(t)}_i, i ∈ {1, ..., Q}, and the new center for the next iteration is set according to two cases: a) if some ô^{(t)}_i attains the maximum sum rate, the optimal node is probably located in the vicinity of that node, so it becomes the origin of the next iteration without further adjustment; b) if d^{(t)} attains the maximum sum rate compared to all ô^{(t)}_i, the optimal node likely lies in a region closer to d^{(t)}. In this case the radius r is too large to search the nearer neighborhood of d^{(t)}, so we contract the radius to r/2 for the next iteration. The iteration completes when the radius falls below a predefined threshold ϵ. The overall procedure is summarized in Alg. 4.

Throughout the entire method, the comparisons are restricted to nodes satisfying d_opt ∈ P(A), which form a finite nonempty discrete set in D according to Property 3; the search space of the proposed method is therefore smaller than the whole continuous space employed by conventional heuristic algorithms. Thus the proposed method achieves lower complexity, as demonstrated in the following section.

D. Joint Optimization under Imperfect CSI

According to [20, Theorem 1], DIF precoding achieves the maximum spatial multiplexing gain if and only if

  HP = DA.   (91)

Considering the channel estimation in Eq. (6) and Eq. (7), this principle becomes

  (Ĥ ± E)P = DA.   (92)

We can derive the estimation-error-aware precoder

  P = (Ĥ ± E)† DA,   (93)

where '†' denotes the pseudo-inverse.

Algorithm 4 MCN-SPS method
Input: initial power allocation matrix D_init, channel matrix H, SNR ρ, initial step size r_0
Output: optimal power allocation matrix D, optimal integer matrix A
1: [N, K] ← size(H)
2: D ← AO(H, D_init, ρ)
3: r ← r_0
4: obtain A from D via Eq. (42)
5: A_tmp ← A
6: while r > ϵ do
7:   D ← AO(H, D, ρ)
8:   D_tmp ← D
9:   for i = 1 : max_router do  ▷ parallel processing
10:    v ← randn(1, K)
11:    v ← v / ∥v∥
12:    D̂_i ← diag( diag(D) + r v )
13:    Ô_i ← AO(H, D̂_i, ρ)
14:    obtain Â_i from Ô_i via Eq. (42)
15:    if R(Â_i, Ô_i) > R(A, D) then
16:      D_tmp ← Ô_i
17:      A_tmp ← Â_i
18:    end if
19:  end for
20:  if D_tmp == D then
21:    r ← r/2
22:  end if
23:  D ← D_tmp
24:  A ← A_tmp
25: end while

However, since E ∼ N(0, σ_e²I) is random while σ_e² is known, we can instead estimate a practical precoder under the known σ_e² via the MMSE principle, defined as

  min_P E∥HP − DA∥_F²  ⇒  min_P E[ Tr( (HP − DA)⊤(HP − DA) ) ].   (94)

Under MMSE and ML channel estimation, we elaborate the precoder designs as follows.

i) MMSE: Since the only random quantity in MMSE channel estimation is E ∼ N(0, σ_e²I), the target function under the MMSE principle is

  min_P E_E ∥HP − DA∥_F².   (95)

We then have the following theorem for the estimation-error-aware precoders in DIF and RIF.

Theorem 7. Under MMSE channel estimation with N ≥ K and channel estimation error E ∼ N(0, σ_e²I), the estimation-error-aware precoder in DIF is

  P^MMSE_DIF = (1/η) Ĥ⊤ ( ĤĤ⊤ + Kσ_e²I )^{-1} DA.   (96)

In RIF, the precoder is shifted to

  P^MMSE_RIF = (1/η) Ĥ⊤ ( ĤĤ⊤ + Kσ_e²I + (K/ρ)I )^{-1} DA.   (97)

Proof. See Appendix D.

ii) ML: Since H ∼ N(0, σ_h²I) is independent of E ∼ N(0_K, σ_e²I), according to [46] we have the conditional distribution

  H | Ĥ ∼ N( βĤ, σ_ξ²I ),   (98)

where

  β = σ_h² / (σ_h² + σ_e²),  σ_ξ² = σ_h²σ_e² / (σ_h² + σ_e²).   (99)

Writing H = βĤ + Ξ with Ξ | Ĥ ∼ N(0_K, σ_ξ²I) in ML channel estimation, the target function under the MMSE principle is

  min_P E_{Ξ|Ĥ} ∥HP − DA∥_F².   (100)

The optimized estimation-error-aware precoder for ML channel estimation is given in the next theorem.
Theorem 8. Under ML channel estimation with N ≥ K, channel estimation error E ∼ N(0, σ_e²I), and true channel H ∼ N(0, σ_h²I), the estimation-error-aware precoder in DIF is

  P^ML_DIF = (1/η) Ĥ⊤ ( (σ_h²/(σ_h² + σ_e²)) ĤĤ⊤ + Kσ_e²I )^{-1} DA.   (101)

In RIF, the precoder is shifted to

  P^ML_RIF = (1/η) Ĥ⊤ ( (σ_h²/(σ_h² + σ_e²)) ĤĤ⊤ + Kσ_e²I + (K(σ_h² + σ_e²)/(ρσ_h²)) I )^{-1} DA.   (102)

Proof. See Appendix E.

Remark 4. Inspired by [20], we can easily adapt Eq. (21) and Eq. (22) by regarding P as P = (1/η)Ĥ⊤MDA when channel estimation error exists. Thus, to accommodate imperfect CSI, the input M of the MCN-SPS method should be shifted according to Theorem 7 and Theorem 8, as listed in Tab. I.

V. THEORETICAL ANALYSIS

Having introduced the whole method, in this section we analyze it theoretically in terms of complexity and accuracy.

TABLE I: Inputs M of the MCN-SPS method with estimation error

  DIF, MMSE estimation:  M^MMSE_DIF = ( ĤĤ⊤ + Kσ_e²I )^{-1}
  DIF, ML estimation:    M^ML_DIF   = ( (σ_h²/(σ_h² + σ_e²)) ĤĤ⊤ + Kσ_e²I )^{-1}
  RIF, MMSE estimation:  M^MMSE_RIF = ( ĤĤ⊤ + Kσ_e²I + (K/ρ)I )^{-1}
  RIF, ML estimation:    M^ML_RIF   = ( (σ_h²/(σ_h² + σ_e²)) ĤĤ⊤ + Kσ_e²I + (K(σ_h² + σ_e²)/(ρσ_h²)) I )^{-1}

A. Computational Complexity Analysis

In the proposed method, the overall procedure of Alg. 4 incorporates an SIVP solver Ψ(·) whose time complexity is denoted O(T_Ψ). To analyze the time complexity of the MCN-SPS method, note that it invokes Alg. 3, which in turn calls Alg. 2 in a nested manner; we therefore evaluate the total time complexity layer by layer. Throughout this section, the matrix inversion used to obtain M costs O(K³) on average, and obtaining A costs O(T_Ψ).

i) MCN-SPS method (Alg. 4): The method consists of initialization and iterative steps:

  T_MCN-SPS = T_init,4 + T_iter,4.   (103)

In the initialization step, each assignment costs O(1), except obtaining the initial A with O(T_Ψ) and the initial D with O(T_AO). The total initialization complexity is

  T_init,4 = O(T_Ψ) + O(T_AO).   (104)

Regarding the iterative steps, since the radius is halved each time, the number of shrinkage iterations approaches log₂(r_0). We denote the number of origin-shifting iterations by O(T_os), which can be regarded as a constant. Inside the iterations, since the operations for each random-directional ray execute in parallel, we ignore the iteration count of the parallel processing. Evaluating Eq. (18) involves one matrix inversion for obtaining M. Thus the total iterative complexity is

  T_iter,4 = [ O(T_AO) + O(T_Ψ) + O(K³) ] [ log₂(r_0) + O(T_os) ].   (105)

ii) AO (Alg. 3): Likewise, the AO algorithm consists of initialization and iterative steps:

  T_AO = T_init,3 + T_iter,3.   (106)

In the initialization step, each assignment costs O(1), plus one operation each for obtaining M and A. Thus,

  T_init,3 = O(K³) + O(T_Ψ).   (107)

Regarding the iterative steps, the total iteration count is bounded by the preset constant max_iter. Inside each iteration, one D update and one A update are performed alternately, so T_iter,3 satisfies

  T_iter,3 = T_3 · O(T_RA + T_Ψ) ≤ max_iter × O(T_RA + T_Ψ).   (108)

iii) RA (Alg. 2): The initialization step of Alg. 2 contains one operation for obtaining M and one for initializing D, where initializing D involves a matrix inversion of complexity O(K³) on average. Thus,

  T_init,2 = 2·O(K³).   (109)

For the iterative steps, the following theorem bounds the iteration count.

Theorem 9. Given a tolerance ϵ, an interference coupling matrix G and an initialization d^{(0)}, the iteration count t_RA of Alg. 2 satisfies

  t_RA ≤ [ log ϵ − log d_H( G^{-1}(1_K ⊘ d^{(0)}), d^{(0)} ) ] / log tanh( Δ(G^{-1})/4 ),   (110)

where

  Δ(G^{-1}) = log max_{i,j,p,q} ( g′_{p,i} g′_{q,j} ) / ( g′_{p,j} g′_{q,i} ),   (111)

and g′_{i,j}, i, j ∈ {1, ..., K}, denotes the (i, j) element of G^{-1}.

Proof. See Appendix F.

Regarding the iterative steps, each iteration consists of one matrix inversion of complexity O(K³) to apply Eq. (75), together with further operations of complexity O(K). Combining with Theorem 9, the total complexity of the RA algorithm is

  T_RA = 2·O(K³) + t_RA [ O(K³) + 4·O(K) ].   (112)

Combining the above, and since T_3 and T_os can be regarded as constants, the complexity of the proposed method follows from

  T_MCN-SPS = O(T_AO + T_Ψ + K³ + 1)(log₂(r_0) + T_os)
            = O(T_3(T_RA + T_Ψ) + T_Ψ + K³ + 1)(log₂(r_0) + T_os)
            = O( T_3(T_Ψ + t_RA K³)(log₂(r_0) + T_os) )
            = O( (T_Ψ + t_RA K³) log₂(r_0) ).   (113)

Considering the LLL algorithm for Ψ(·) with complexity O(K⁴ log K) [12], the proposed method has complexity O(K⁴ log K log₂(r_0)). When K grows so large that K ≫ r_0, the factor O(log₂(r_0)) can be ignored, and the complexity of the MCN-SPS method becomes O(K⁴ log K).

TABLE II: Theoretical time complexity in different situations

  Method                    | Average complexity | Average complexity with LLL | K ≫ r_0 with LLL
  MCN-SPS method            | O(log₂(r_0) T_Ψ)   | O(K⁴ log K log₂(r_0))       | O(K⁴ log K)
  Venturelli's method [25]  | O(T_Ψ)             | O(K⁴ log K)                 | O(K⁴ log K)
  PSO-based method [23]     | O(K² T_Ψ)          | O(K⁶ log K)                 | O(K⁶ log K)

In Tab. II, we compare the MCN-SPS method, Venturelli's method [25] and the PSO-based method [23] in three cases: average complexity, average complexity with the LLL algorithm, and K ≫ r_0 with the LLL algorithm. The comparison reveals that the MCN-SPS method has lower complexity than the PSO-based method [23] in all three cases, and approaches the complexity of Venturelli's method when K ≫ r_0.

VI. NUMERICAL VALIDATION OF THEORETICAL FINDINGS

In this section, we evaluate the effectiveness of the MCN-SPS method in terms of feasibility, complexity and performance. The simulation setups are summarized as follows.

Metrics: We employ the sum rate and the runtime as the metrics of communication performance and complexity, respectively, obtained by averaging over 1000 Monte-Carlo experiments. The sum rate in each experiment is calculated by Eq. (18). For the feasibility verification, we employ the Hilbert metric between the current and previous iterates; when the process has converged to a fixed point, this metric is zero since input and output coincide. In addition, we define the occupancy of the MIMO system by

  Occupancy = (K/N) × 100%,   (114)

where the situation Occupancy > 100% is referred to as overloaded MIMO.

Benchmarks and settings: In the comparisons, we adopt the PSO-based method [23], the method of [25], IF precoding without D optimization, and RZF as benchmarks, referred to as "PSO", "Venturelli", "Equal Power" and "RZF", respectively. For the SIVP solution, we employ the LLL algorithm in Alg. 1. In PSO, the maximum iteration count and the population size are set to 300 and 50, respectively, and the velocity lies within [−10^{-1}, 10^{-1}]. For Equal Power and RZF, the power allocation matrix D is set to I. For the MCN-SPS method, we employ the RIF scheme, and set max_iter in Alg. 3 to 100 and r_0 in Alg. 4 to 10.
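As a concrete reference for the feasibility metric, the Hilbert (projective) metric between successive iterates, used as the stopping rule of Alg. 2 and in the convergence plots, can be sketched in a few lines of pure Python (illustrative only; the vector values are hypothetical):

```python
import math

# Hilbert projective metric between positive vectors, matching the update
# ang = log( max(d_new / d_old) / min(d_new / d_old) ) in Alg. 2, line 10.
def hilbert_metric(x, y):
    ratios = [xi / yi for xi, yi in zip(x, y)]
    return math.log(max(ratios) / min(ratios))

x = [1.0, 2.0, 4.0]                        # hypothetical iterate
print(hilbert_metric(x, [2.0, 4.0, 8.0]))  # 0.0: the metric is scale-invariant, so an
                                           # iterate converged in direction registers
                                           # as a fixed point
print(hilbert_metric(x, [1.0, 1.0, 1.0]))  # positive for non-proportional vectors
```

The scale invariance is exactly why this metric suits an iteration that, per Theorem 5, converges in direction rather than in absolute length.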
We calculate the upper bound of the MIMO system by
$$R_{ub} = \frac{1}{2} \log \det\left(\mathbf{I} + \frac{\rho}{K} \mathbf{H} \mathbf{W} \mathbf{H}^\top\right), \quad (115)$$
where $\mathbf{W}$ denotes the $K \times K$ diagonal matrix produced by the water-filling algorithm [47].

[Fig. 5. Trajectory across iterations for the sample $\mathbf{H}$ at: (a) SNR = 0 dB, (b) SNR = 5 dB, and (c) SNR = 10 dB. The trend beside each trajectory depicts the Hilbert metric across iterations.]

[Fig. 6. Sum-rate comparisons at different occupancy levels (100% and 150%) and $K$ (16 to 256) with SNR: (a) 25 dB, (b) 30 dB.]

A. Convergence verification

To verify convergence, Fig. 5 illustrates the trajectory of the moment vector $\mathbf{d}$ in $\Omega$ under a sample channel matrix $\mathbf{H}$, along with the corresponding trend of the Hilbert metric across iterations, for different SNR values. At SNR = 0 dB, a period-2 oscillation is clearly observed in Fig. 5(a): $\mathbf{d}^{(t)}$ jumps between only two positions in $\Omega$, while the Hilbert distance $d_H(\mathbf{d}^{(t+1)}, \mathbf{d}^{(t)})$ remains constant. Geometrically, the cavity in this case appears as a cylinder. In Fig. 5(b), a similar oscillatory behavior persists, but the two locations gradually converge toward each other; this yields a cylindro-conical cavity shape and is reflected in the Hilbert metric, which decreases yet converges to a non-zero value. As the SNR increases further, as shown in Fig. 5(c), the cavity takes on a tapered conical form, indicating that the iteration eventually converges; correspondingly, $d_H(\mathbf{d}^{(t+1)}, \mathbf{d}^{(t)})$ converges to zero. Overall, these observations support the correctness of Lemma 3.

[Fig. 7. Sum-rate comparisons at different SNR with: (a) $N = 8$, $K = 16$, (c) $N = K = 16$; the corresponding runtimes from $K = 2$ to 32 are shown in (b) and (d).]

B.
Performance comparison

To address the massive communication and ubiquitous connectivity scenarios envisioned for 6G, we conduct a series of comparative experiments to validate the effectiveness and advantages of MCN-SPS in large-scale MIMO systems. Fig. 6 presents the results under different occupancy levels (100% and 150%) and SNR values ranging from 15 dB to 30 dB. Since the PSO method becomes inapplicable when $K \ge 32$, we compare the sum-rate performance of MCN-SPS, Venturelli's method [25], and RZF for various numbers of user antennas $K$ (from 16 to 256). The following observations can be drawn from Fig. 6:

i) Across all considered values of $K$ and occupancy levels, MCN-SPS achieves a higher sum rate than all benchmark methods. Moreover, the sum-rate gain of MCN-SPS increases as the occupancy level rises, highlighting its applicability to large-scale MIMO systems and its robustness in overloaded MIMO scenarios.

[Fig. 8. Sum-rate comparisons for the robust precoder designed in Theorem 8 under different: (a) SNR and occupancy, (b) SNR and $\sigma_e^2$, (c) $K$ and $\sigma_e^2$.]

ii) As $K$ grows, the performance gap between MCN-SPS and RZF stabilizes, with their curves becoming nearly parallel. In contrast, the performance of Venturelli's method gradually falls below that of RZF, leading to an intersection between their curves. This indicates that a jointly optimized $(\mathbf{A}, \mathbf{D})$ pair is crucial for the effectiveness of IF precoding in large-scale MIMO.

iii) With increasing SNR, the performance gap between MCN-SPS and RZF widens further, while the intersection point between Venturelli's method and RZF shifts toward smaller $K$. These trends suggest that relaxation-based joint optimization methods for IF precoding exhibit reduced applicability in systems with many users or lower SNR. Such limitations are not observed with MCN-SPS, demonstrating the strong applicability of the MCN-SPS-based IF precoder.
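The water-filling allocation behind the upper bound $R_{ub}$ of Eq. (115) can be sketched numerically. The snippet below is a generic bisection-based water-filling; the function name and the power budget are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def water_filling(gains, total_power):
    """Maximize sum_i log(1 + g_i * p_i) subject to sum_i p_i = total_power
    and p_i >= 0, by bisecting on the common water level mu, where the
    optimal allocation has the form p_i = max(mu - 1/g_i, 0)."""
    g = np.asarray(gains, dtype=float)
    lo, hi = 0.0, total_power + 1.0 / g.min()   # bracket for the water level
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - 1.0 / g, 0.0).sum() > total_power:
            hi = mu                             # water level too high
        else:
            lo = mu                             # water level too low
    return np.maximum(0.5 * (lo + hi) - 1.0 / g, 0.0)

# Illustration: pour a power budget over unequal channel gains; strong
# channels receive more power, and very weak channels may get none.
p = water_filling([1.0, 0.25, 4.0], total_power=5.0)
```

With the squared singular values of $\mathbf{H}$ (scaled by $\rho/K$) as the gains, the resulting diagonal allocation plays the role of $\mathbf{W}$ in Eq. (115); the exact budget normalization is as defined in [47].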
Regarding the performance at different SNRs, we compare the sum rate and the runtime of the proposed method and the benchmarks for $N = 8$, $K = 16$ and for $N = K = 16$. The results are depicted in Fig. 7, and the following observations are obtained.

i) In the overloaded MIMO case depicted in Fig. 7(a), RZF fails to work due to the non-eliminated multi-user interference (MUI), whereas the IF-based methods (MCN-SPS, PSO, Venturelli, Equal Power) still work, which reveals the effectiveness of RIF in reducing MUI.

ii) Compared to PSO, Venturelli, Equal Power, and RZF, the proposed method achieves a higher sum rate at all SNR $\ge$ 10 dB in Fig. 7(a) and (c), with gains of about 1 dB when $N = K = 16$ and 2 dB when $N = 8$, $K = 16$ over the PSO method. Meanwhile, the proposed method runs in nearly half the time of the PSO method.

iii) The gap between the MIMO upper bound and the proposed method shrinks as the occupancy increases, which reveals the applicability of the MCN-SPS method to overloaded MIMO.

iv) From Fig. 7(b) and Fig. 7(d), MCN-SPS nearly halves the runtime compared to PSO, which reflects the lower complexity of the MCN-SPS method.

Finally, we evaluate the effectiveness of the robust precoder designed in Theorem 8, which explicitly accounts for channel estimation errors. The results are presented in Fig. 8, which compares the proposed robust design with a conventional ("naive") approach under different SNRs, occupancy levels (Fig. 8(a)), and estimation-error variances $\sigma_e^2$ (Fig. 8(b)). In the naive scheme, the precoder in Eq. (13) and the inputs in Eq. (20) are implemented directly using the estimated channel $\hat{\mathbf{H}}$, without incorporating the error statistics. The following observations can be made.

i) Across all subfigures in Fig. 8, the robust precoder of Theorem 8 achieves a higher sum rate than its naive counterpart, confirming its effectiveness under imperfect CSI for varying SNR, occupancy, and $\sigma_e^2$.
ii) As shown in Fig. 8(a) and (b), the sum rate of MCN-SPS increases with SNR, while it decreases with higher occupancy or larger $\sigma_e^2$.

iii) For a fixed $\sigma_e^2 = 0.02$, the performance gap between the robust and naive schemes widens as SNR increases, yet narrows when the occupancy level rises. While this gap remains approximately constant as $\sigma_e^2$ increases across the SNR range, it is more pronounced at lower SNR values and larger numbers of users $K$.

VII. CONCLUSION

In this paper, a geometry-based generalized optimization model for the $(\mathbf{A}, \mathbf{D})$ optimization has been proposed. Through this generalized model, we find that the solution of the $(\mathbf{A}, \mathbf{D})$ optimization problem can be regarded as independent searches over several $\mathbf{A}$-represented regions partitioning the whole domain of $\mathbf{D}$, where each region corresponds to part of a cone whose points can be represented by the direction of a ray emitted from the origin into the holopositive region of $\mathbf{D}$. Based on this insight, a mixed optimization method, referred to as the Multi-Cone Nested Stochastic Pattern Search (MCN-SPS) method, has been proposed. The theoretical complexity analysis shows that the MCN-SPS method achieves a lower complexity of $O(K^4 \log K \log_2(r_0))$ with the LLL algorithm, which reduces to $O(K^4 \log K)$ when $K \gg r_0$. The simulations show that the proposed method features a lower complexity than the PSO-based method in [23] and attains a higher sum rate than all benchmarks.

APPENDIX A
PROOF OF THEOREM 1

We prove Theorem 1 by contradiction. First, we assume that the original proposition does not hold, i.e., that when $\|\mathbf{D}_A - \mathbf{I}\|_2 < \epsilon$ for an arbitrarily small $\epsilon > 0$,
$$\forall \mathbf{R}_B = \mathbf{G}_B \mathbf{U} \in S_B, \; \forall \mathbf{R}_A = \mathbf{G}_A \mathbf{V} \in S_A, \quad \mathbf{U} \neq \mathbf{V}.$$
(116)

Since $\mathbf{R}_A$ and $\mathbf{R}_B$ are the SIVP solutions of $\mathbf{G}_A$ and $\mathbf{G}_B$, respectively, their basis vectors satisfy, by Definition 2,
$$\|\mathbf{R}_A \mathbf{e}_i\|_2 = \|\mathbf{G}_A \mathbf{V} \mathbf{e}_i\|_2 \le \lambda_K(\Lambda_A), \quad (117)$$
$$\|\mathbf{R}_B \mathbf{e}_i\|_2 = \|\mathbf{G}_B \mathbf{U} \mathbf{e}_i\|_2 \le \lambda_K(\Lambda_B), \quad i \in \{1,\dots,K\}, \quad (118)$$
where $\mathbf{e}_i$ denotes the $i$-th column of the identity matrix $\mathbf{I}$. Since $\mathbf{U}$ is a unimodular matrix, $\mathbf{G}_A \mathbf{U}$ is also a generator matrix of $\Lambda_A$. Due to $\mathbf{G}_B = \mathbf{G}_A \mathbf{D}_A$, we have
$$\mathbf{G}_B \mathbf{U} \mathbf{e}_i - \mathbf{G}_A \mathbf{U} \mathbf{e}_i = \mathbf{G}_A \mathbf{D}_A \mathbf{U} \mathbf{e}_i - \mathbf{G}_A \mathbf{U} \mathbf{e}_i = \mathbf{G}_A (\mathbf{D}_A \mathbf{U} \mathbf{e}_i - \mathbf{U} \mathbf{e}_i). \quad (119)$$
Since $\|\cdot\|_2$ is a matrix norm compatible with the vector 2-norm, we have
$$\|\mathbf{G}_B \mathbf{U} \mathbf{e}_i - \mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2 = \|\mathbf{G}_A (\mathbf{D}_A \mathbf{U} \mathbf{e}_i - \mathbf{U} \mathbf{e}_i)\|_2 \le \|\mathbf{G}_A\|_2 \|\mathbf{D}_A \mathbf{U} \mathbf{e}_i - \mathbf{U} \mathbf{e}_i\|_2. \quad (120)$$
Because $\mathbf{G}_A$ is known, $\|\mathbf{G}_A\|_2$ can be regarded as a known value. Meanwhile, since $\|\mathbf{D}_A - \mathbf{I}\|_2 < \epsilon$, we have
$$\|\mathbf{D}_A \mathbf{U} \mathbf{e}_i - \mathbf{U} \mathbf{e}_i\|_2 = \|(\mathbf{D}_A - \mathbf{I}) \mathbf{U} \mathbf{e}_i\|_2 \le \|\mathbf{D}_A - \mathbf{I}\|_2 \|\mathbf{U} \mathbf{e}_i\|_2 < \epsilon \|\mathbf{U} \mathbf{e}_i\|_2. \quad (121)$$
Regarding $\|\mathbf{U} \mathbf{e}_i\|_2$ as a constant, $\|\mathbf{D}_A \mathbf{U} \mathbf{e}_i - \mathbf{U} \mathbf{e}_i\|_2$ is bounded and approaches zero as $\epsilon$ shrinks, i.e.,
$$\forall \epsilon_2 > 0, \; \exists \epsilon > 0, \text{ s.t. } \|\mathbf{D}_A - \mathbf{I}\|_2 < \epsilon \Rightarrow \|\mathbf{D}_A \mathbf{U} \mathbf{e}_i - \mathbf{U} \mathbf{e}_i\|_2 < \epsilon_2. \quad (122)$$
Since $\epsilon_2 > 0$ can be arbitrarily small, we may take $\epsilon_2 = \epsilon / \|\mathbf{G}_A\|_2$, and we have
$$\|\mathbf{G}_B \mathbf{U} \mathbf{e}_i - \mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2 \le \|\mathbf{G}_A\|_2 \|\mathbf{D}_A \mathbf{U} \mathbf{e}_i - \mathbf{U} \mathbf{e}_i\|_2 < \|\mathbf{G}_A\|_2 \times \frac{\epsilon}{\|\mathbf{G}_A\|_2} = \epsilon. \quad (123)$$
According to the triangle inequality and Eq. (45), we can derive
$$\|\mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2 = \|\mathbf{G}_B \mathbf{U} \mathbf{e}_i - (\mathbf{G}_B \mathbf{U} \mathbf{e}_i - \mathbf{G}_A \mathbf{U} \mathbf{e}_i)\|_2 \le \|\mathbf{G}_B \mathbf{U} \mathbf{e}_i\|_2 + \|\mathbf{G}_B \mathbf{U} \mathbf{e}_i - \mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2 < \lambda_K(\Lambda_B) + \epsilon < \lambda_K(\Lambda_A) + 2\epsilon. \quad (124)$$
There are two cases for the relationship between $\|\mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2$ and $\lambda_K(\Lambda_A)$: either $\|\mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2 > \lambda_K(\Lambda_A)$ or $\|\mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2 \le \lambda_K(\Lambda_A)$. If $\|\mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2 > \lambda_K(\Lambda_A)$ for some $\mathbf{e}_i$, we may choose
$$\epsilon \in \left(0, \frac{\|\mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2 - \lambda_K(\Lambda_A)}{2}\right], \quad (125)$$
so that
$$\lambda_K(\Lambda_A) + 2\epsilon \in \big(\lambda_K(\Lambda_A), \|\mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2\big]. \quad (126)$$
However, $\lambda_K(\Lambda_A) + 2\epsilon \le \|\mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2$ contradicts Eq. (124); hence $\|\mathbf{G}_A \mathbf{U} \mathbf{e}_i\|_2 \le \lambda_K(\Lambda_A)$ holds for all $\epsilon > 0$.
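The chain of bounds in Eqs. (120)-(123) rests on the compatibility of the spectral norm with the vector 2-norm, $\|\mathbf{G}\mathbf{v}\|_2 \le \|\mathbf{G}\|_2 \|\mathbf{v}\|_2$. A quick numeric sanity check of this inequality (random matrices, purely illustrative, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    G = rng.standard_normal((4, 4))
    v = rng.standard_normal(4)
    # ||G||_2 (ord=2 for a matrix) is the largest singular value of G;
    # compatibility with the vector 2-norm gives ||G v|| <= ||G||_2 ||v||.
    assert np.linalg.norm(G @ v) <= np.linalg.norm(G, 2) * np.linalg.norm(v) + 1e-9
```

Equality is approached when $\mathbf{v}$ aligns with the right singular vector of the largest singular value, which is why $\|\mathbf{G}_A\|_2$ is the tightest constant available in Eq. (120).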
According to Definition 2, $\mathbf{G}_A \mathbf{U}$ is then one of the SIVP solutions of $\Lambda_A$. Hence there exists a unimodular matrix $\mathbf{U}$ such that $\mathbf{V} = \mathbf{U}$, which contradicts Eq. (116). Thus, the original proposition in Theorem 1 is true, and Theorem 1 is proved.

APPENDIX B
PROOF OF THEOREM 3

The characterization of $S$ consists of boundedness, discreteness, finiteness, and nonemptiness, which we elaborate in turn.

i) Boundedness: Since $\mathbf{M}$ is a PDM under the condition in Lemma 1, there must exist a constant $\phi > 0$ such that
$$\mathbf{v}^\top \mathbf{M} \mathbf{v} \ge \phi \|\mathbf{v}\|^2, \quad \forall \mathbf{v} \in \mathbb{R}^K. \quad (127)$$
Considering the target function in Eq. (21), we have
$$\mathrm{Tr}\big(\mathbf{A}^\top \mathbf{D}^\top \mathbf{M} \mathbf{D} \mathbf{A}\big) \ge \phi \sum_{i=1}^K \|\mathbf{D} \mathbf{a}_i\|^2, \quad (128)$$
where $\mathbf{a}_i$ denotes the $i$-th column of $\mathbf{A}$. From the constraint $\prod_{i=1}^K d_i = 1$ we can derive the following: 1) when some $d_i \to 0$, there must be some $d_j \to \infty$ to maintain the constraint; 2) when $d_j \to \infty$ and the $j$-th entry of a corresponding column $\mathbf{a}_i$ is nonzero, we have $\|\mathbf{D} \mathbf{a}_i\| \to \infty$ and $\mathrm{Tr}(\mathbf{A}^\top \mathbf{D}^\top \mathbf{M} \mathbf{D} \mathbf{A}) \to \infty$, which cannot correspond to a minimum of the target function. Thus, there exists a constant $C$ such that $d_i \in [m, M]$ with $m > 0$ and $M < \infty$ whenever $\mathrm{Tr}(\mathbf{A}^\top \mathbf{D}^\top \mathbf{M} \mathbf{D} \mathbf{A}) \le C$, which reveals the boundedness of $S$.

ii) Discreteness: We prove this property by contradiction. First, assume there exists a sequence of pairwise-distinct points $\{(\mathbf{A}_n, \mathbf{d}_n)\} \subset S$ satisfying
$$(\mathbf{A}_n, \mathbf{d}_n) \to (\mathbf{A}^*, \mathbf{d}^*). \quad (129)$$
Since $\mathbf{A}_n$ is an integer-valued matrix, we must have $\mathbf{A}_n = \mathbf{A}^*$ once $n$ is large enough, so that $\mathbf{d}_n$ is the solution of Eq. (50). According to Lemma 1 and Remark 2, Eq. (50) can be regarded as a convex problem whose solution, denoted $\mathbf{d}^*_A$, is unique [48]. Based on Eq. (42) and Remark 2, any $(\mathbf{A}, \mathbf{d}) \in S$ requires this consistency, which reveals that $(\mathbf{A}^*, \mathbf{d}^*_A) \in S$. Thus, when $n$ is large enough, $(\mathbf{A}_n, \mathbf{d}_n)$ becomes a constant sequence, which contradicts the assumption that its points are pairwise distinct.
Hence no such sequence of pairwise-distinct points can exist in $S$, and $S$ is discrete.

iii) Finiteness: Since $S$ is a bounded discrete set in a finite-dimensional Euclidean space, $S$ is compact. According to [49, Theorem 2.4.1], $S$ is a finite set.

iv) Nonemptiness: We prove this property by contradiction. First, assume that none of the regions in $\Omega$ attains the extremum. Since the regions in $\Omega$ fill the whole $\mathbb{R}_+^{K-1}$, the problem in Eq. (50) would then have no solution for any $\mathbf{A}$, which contradicts Lemma 1. Thus, there must be at least one region attaining the extremum, and $S$ is nonempty.

APPENDIX C
PROOF OF THEOREM 5

Before proving Theorem 5, we state a lemma about the behavior of the Hilbert metric under the reciprocal map $\mathbf{1}_K \oslash \mathbf{d}$.

Lemma 4. In the complete metric space with the Hilbert metric, we have
$$\forall \mathbf{d}_x, \mathbf{d}_y \in \mathcal{D}, \quad d_H(\mathbf{1}_K \oslash \mathbf{d}_x, \mathbf{1}_K \oslash \mathbf{d}_y) = d_H(\mathbf{d}_x, \mathbf{d}_y). \quad (130)$$

Proof. For any $\mathbf{d}_x = [d_{x,1}, \dots, d_{x,K}]^\top, \mathbf{d}_y = [d_{y,1}, \dots, d_{y,K}]^\top \in \mathcal{D}$, by the order-reversing property of the reciprocal, we have
$$d_H(\mathbf{1}_K \oslash \mathbf{d}_x, \mathbf{1}_K \oslash \mathbf{d}_y) = \log \frac{\max_{i=1}^K \frac{d_{y,i}}{d_{x,i}}}{\min_{i=1}^K \frac{d_{y,i}}{d_{x,i}}} = \log \frac{1 / \min_{i=1}^K \frac{d_{x,i}}{d_{y,i}}}{1 / \max_{i=1}^K \frac{d_{x,i}}{d_{y,i}}} = \log \frac{\max_{i=1}^K \frac{d_{x,i}}{d_{y,i}}}{\min_{i=1}^K \frac{d_{x,i}}{d_{y,i}}} = d_H(\mathbf{d}_x, \mathbf{d}_y). \quad (131)$$
Thus, the lemma is proved.

We now prove Theorem 5. For $p$-PAM symbols in a Rayleigh channel with domain $[0, +\infty)$, the channel matrix $\mathbf{H}$ remains non-negative. Because $\mathbf{M}$ differs between DIF and RIF, we analyze the two cases separately.

DIF: When $N \ge K$, $\mathbf{H}$ is a non-negative matrix with full row rank, which means
$$\big[\mathbf{H} \mathbf{H}^\top\big]_{i,j} \ge 0, \quad i,j \in \{1,\dots,K\}. \quad (132)$$
Meanwhile, Lemma 1 reveals that $\mathbf{H} \mathbf{H}^\top$ is a PDM when $N \ge K$. Based on the properties of M-matrices [42], a matrix is a positive-definite M-matrix if and only if its inverse is a positive-definite non-negative matrix.
So we learn that $\mathbf{M} = (\mathbf{H} \mathbf{H}^\top)^{-1}$ is an M-matrix whose elements can be represented by
$$m_{i,j} = \big[(\mathbf{H} \mathbf{H}^\top)^{-1}\big]_{i,j} = \begin{cases} \text{non-negative}, & i = j; \\ \text{non-positive}, & i \neq j. \end{cases} \quad (133)$$
RIF: In RIF, since $K/\rho \ge 0$, with $K/\rho = 0$ occurring only when $\rho \to +\infty$, the matrix $\frac{K}{\rho} \mathbf{I}$ is non-negative. Combining this with Eq. (132), it follows that
$$\Big[\frac{K}{\rho} \mathbf{I} + \mathbf{H} \mathbf{H}^\top\Big]_{i,j} = \Big[\frac{K}{\rho} \mathbf{I}\Big]_{i,j} + \big[\mathbf{H} \mathbf{H}^\top\big]_{i,j} \ge 0, \quad (134)$$
where $i,j \in \{1,\dots,K\}$. By the same reasoning as in DIF, $\mathbf{M} = \big(\frac{K}{\rho} \mathbf{I} + \mathbf{H} \mathbf{H}^\top\big)^{-1}$ is an M-matrix whose elements can be represented by
$$m_{i,j} = \Big[\big(\tfrac{K}{\rho} \mathbf{I} + \mathbf{H} \mathbf{H}^\top\big)^{-1}\Big]_{i,j} = \begin{cases} \text{non-negative}, & i = j; \\ \text{non-positive}, & i \neq j. \end{cases} \quad (135)$$
Since $\mathbf{A} \mathbf{A}^\top$ is non-negative in Theorem 5, we have
$$[\mathbf{G}]_{i,j} = \big[\mathbf{A} \mathbf{A}^\top \circ \mathbf{M}\big]_{i,j} = \big[\mathbf{A} \mathbf{A}^\top\big]_{i,j} \times m_{i,j} = \begin{cases} \text{non-negative} \times \text{non-negative}, & i = j; \\ \text{non-negative} \times \text{non-positive}, & i \neq j, \end{cases} = \begin{cases} \text{non-negative}, & i = j; \\ \text{non-positive}, & i \neq j. \end{cases} \quad (136)$$
According to Lemma 1, $\mathbf{G}$ is a PDM, and $\mathbf{G}^{-1}$ is non-negative due to the properties of M-matrices [42]. Thus, $\mathbf{G}^{-1}$ can be regarded as a positive transformation. Based on the Perron-Frobenius theory for directions under the Hilbert metric, we have
$$\exists \gamma \in (0, 1), \quad d_H\big(\mathbf{G}^{-1} \mathbf{d}_x, \mathbf{G}^{-1} \mathbf{d}_y\big) \le \gamma\, d_H(\mathbf{d}_x, \mathbf{d}_y) \quad (137)$$
for all $\mathbf{d}_x, \mathbf{d}_y \in \Omega$. Define the function $g(\mathbf{d}) \triangleq \Gamma\big(\mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d})\big)$, in which $\Gamma(\mathbf{x})$ denotes a scaling operation on $\mathbf{x}$. Combining Lemma 4, we have
$$d_H(g(\mathbf{d}_x), g(\mathbf{d}_y)) = d_H\big(\Gamma(\mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}_x)), \Gamma(\mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}_y))\big) = d_H\big(\mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}_x), \mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}_y)\big) \le \gamma\, d_H(\mathbf{1}_K \oslash \mathbf{d}_x, \mathbf{1}_K \oslash \mathbf{d}_y) = \gamma\, d_H(\mathbf{d}_x, \mathbf{d}_y). \quad (138)$$
The inequality $d_H(g(\mathbf{d}_x), g(\mathbf{d}_y)) \le \gamma\, d_H(\mathbf{d}_x, \mathbf{d}_y)$ shows that $g(\mathbf{d})$ is a contraction within $\Omega$, so the iteration in Eq. (75) converges to a fixed direction satisfying Eq. (76). The theorem is proved.

APPENDIX D
PROOF OF THEOREM 7

The proof of Theorem 7 consists of the derivations for DIF and RIF, introduced as follows.

i) DIF: Since $\mathbf{E} \sim \mathcal{N}(\mathbf{0}_K, \sigma_e^2 \mathbf{I})$, we can derive $\mathbb{E}_{\mathbf{E}}[\mathbf{E}] = \mathbf{0}_K$ and $\mathbb{E}_{\mathbf{E}}[\mathbf{E}^\top \mathbf{E}] = K \sigma_e^2 \mathbf{I}$. Combining Eq.
(6) and the independence between $\mathbf{E}$ and $\hat{\mathbf{H}}$, we have
$$\mathbb{E}_{\mathbf{E}} \|\mathbf{H} \mathbf{P} - \mathbf{D} \mathbf{A}\|_F^2 = \mathbb{E}_{\mathbf{E}} \|\mathbf{F} + \mathbf{E} \mathbf{P}\|_F^2 = \|\mathbf{F}\|_F^2 + \mathbb{E}_{\mathbf{E}} \|\mathbf{E} \mathbf{P}\|_F^2 + 2 \mathrm{Tr}\big(\mathbf{F}^\top \mathbb{E}_{\mathbf{E}}[\mathbf{E}] \mathbf{P}\big) = \|\mathbf{F}\|_F^2 + \mathrm{Tr}\big(\mathbf{P}^\top \mathbb{E}_{\mathbf{E}}[\mathbf{E}^\top \mathbf{E}] \mathbf{P}\big) = \mathrm{Tr}\big(\mathbf{F}^\top \mathbf{F}\big) + K \sigma_e^2 \mathrm{Tr}\big(\mathbf{P}^\top \mathbf{P}\big), \quad (139)$$
where $\mathbf{F} = \hat{\mathbf{H}} \mathbf{P} - \mathbf{D} \mathbf{A}$. Letting $g(\mathbf{P}) = \mathrm{Tr}(\mathbf{F}^\top \mathbf{F}) + K \sigma_e^2 \mathrm{Tr}(\mathbf{P}^\top \mathbf{P})$, we can derive the optimal precoder from
$$\frac{\partial g(\mathbf{P})}{\partial \mathbf{P}} = \hat{\mathbf{H}}^\top \big(\hat{\mathbf{H}} \mathbf{P} - \mathbf{D} \mathbf{A}\big) + K \sigma_e^2 \mathbf{P} = \mathbf{0}_K \;\Rightarrow\; \big(\hat{\mathbf{H}}^\top \hat{\mathbf{H}} + K \sigma_e^2 \mathbf{I}\big) \mathbf{P} = \hat{\mathbf{H}}^\top \mathbf{D} \mathbf{A} \;\Rightarrow\; \mathbf{P} = \big(\hat{\mathbf{H}}^\top \hat{\mathbf{H}} + K \sigma_e^2 \mathbf{I}\big)^{-1} \hat{\mathbf{H}}^\top \mathbf{D} \mathbf{A}. \quad (140)$$
According to the celebrated matrix inversion lemma [50], we have
$$\mathbf{P} = \hat{\mathbf{H}}^\top \big(\hat{\mathbf{H}} \hat{\mathbf{H}}^\top + K \sigma_e^2 \mathbf{I}\big)^{-1} \mathbf{D} \mathbf{A}. \quad (141)$$
Considering the power constraint, the optimal estimation-error-aware precoder in DIF is
$$\mathbf{P}^{\mathrm{MMSE}}_{\mathrm{DIF}} = \frac{1}{\eta} \hat{\mathbf{H}}^\top \big(\hat{\mathbf{H}} \hat{\mathbf{H}}^\top + K \sigma_e^2 \mathbf{I}\big)^{-1} \mathbf{D} \mathbf{A}. \quad (142)$$
ii) RIF: Introducing the regularization parameter $\lambda$ for the constraint in Eq. (1), the target function in RIF is
$$\min_{\mathbf{P}} \; \mathbb{E}_{\mathbf{E}} \|\mathbf{H} \mathbf{P} - \mathbf{D} \mathbf{A}\|_F^2 + \lambda \big(\mathrm{Tr}(\mathbf{P}^\top \mathbf{P}) - \rho\big). \quad (143)$$
Following the same process as in DIF, we have
$$g(\mathbf{P}) = \mathrm{Tr}\big(\mathbf{F}^\top \mathbf{F}\big) + \big(\lambda + K \sigma_e^2\big) \mathrm{Tr}\big(\mathbf{P}^\top \mathbf{P}\big) - \lambda \rho. \quad (144)$$
By solving $\frac{\partial g(\mathbf{P})}{\partial \mathbf{P}} = \mathbf{0}$, the optimal precoder is derived as
$$\mathbf{P} = \hat{\mathbf{H}}^\top \big(\hat{\mathbf{H}} \hat{\mathbf{H}}^\top + (K \sigma_e^2 + \lambda) \mathbf{I}\big)^{-1} \mathbf{D} \mathbf{A}. \quad (145)$$
Due to the uplink-downlink duality of IF precoding [22] and the equal power distribution assumption [20], $\lambda$ can be selected as $\lambda = K/\rho$, and the optimal estimation-error-aware RIF precoder under the power constraint is
$$\mathbf{P}^{\mathrm{MMSE}}_{\mathrm{RIF}} = \frac{1}{\eta} \hat{\mathbf{H}}^\top \Big(\hat{\mathbf{H}} \hat{\mathbf{H}}^\top + \big(K \sigma_e^2 + \tfrac{K}{\rho}\big) \mathbf{I}\Big)^{-1} \mathbf{D} \mathbf{A}. \quad (146)$$

APPENDIX E
PROOF OF THEOREM 8

We prove Theorem 8 for DIF and RIF, introduced respectively as follows.

i) DIF: According to the model $\mathbf{H} = \beta \hat{\mathbf{H}} + \mathbf{\Xi}$ with $\mathbf{\Xi} \mid \hat{\mathbf{H}} \sim \mathcal{N}(\mathbf{0}_K, \sigma_\xi^2 \mathbf{I})$, we can derive $\mathbb{E}_{\mathbf{\Xi} \mid \hat{\mathbf{H}}}[\mathbf{\Xi}] = \mathbf{0}_K$ and $\mathbb{E}_{\mathbf{\Xi} \mid \hat{\mathbf{H}}}[\mathbf{\Xi}^\top \mathbf{\Xi}] = K \sigma_\xi^2 \mathbf{I}$. Thus, we have
$$\mathbb{E}_{\mathbf{\Xi} \mid \hat{\mathbf{H}}} \|\mathbf{H} \mathbf{P} - \mathbf{D} \mathbf{A}\|_F^2 = \mathbb{E}_{\mathbf{\Xi} \mid \hat{\mathbf{H}}} \|\mathbf{F} + \mathbf{\Xi} \mathbf{P}\|_F^2 = \|\mathbf{F}\|_F^2 + \mathbb{E}_{\mathbf{\Xi} \mid \hat{\mathbf{H}}} \|\mathbf{\Xi} \mathbf{P}\|_F^2 + 2 \mathrm{Tr}\big(\mathbf{F}^\top \mathbb{E}_{\mathbf{\Xi} \mid \hat{\mathbf{H}}}[\mathbf{\Xi}] \mathbf{P}\big) = \|\mathbf{F}\|_F^2 + \mathrm{Tr}\big(\mathbf{P}^\top \mathbb{E}_{\mathbf{\Xi} \mid \hat{\mathbf{H}}}[\mathbf{\Xi}^\top \mathbf{\Xi}] \mathbf{P}\big) = \mathrm{Tr}\big(\mathbf{F}^\top \mathbf{F}\big) + K \sigma_\xi^2 \mathrm{Tr}\big(\mathbf{P}^\top \mathbf{P}\big), \quad (147)$$
where $\mathbf{F} = \beta \hat{\mathbf{H}} \mathbf{P} - \mathbf{D} \mathbf{A}$.
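The step from Eq. (140) to Eq. (141) (and later from Eq. (148) to Eq. (149)) uses the matrix inversion lemma in its "push-through" form, $(\hat{\mathbf{H}}^\top \hat{\mathbf{H}} + c\mathbf{I})^{-1} \hat{\mathbf{H}}^\top = \hat{\mathbf{H}}^\top (\hat{\mathbf{H}} \hat{\mathbf{H}}^\top + c\mathbf{I})^{-1}$. A quick numeric verification on a random matrix (the dimensions and the constant $c$ are illustrative; here $c$ plays the role of $K\sigma_e^2$):

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, c = 6, 4, 0.7                   # illustrative sizes; c ~ K * sigma_e^2
H = rng.standard_normal((K, N))       # stand-in for the estimated channel

# Left-hand side: (H^T H + c I)^{-1} H^T  (an N x K matrix).
lhs = np.linalg.solve(H.T @ H + c * np.eye(N), H.T)
# Right-hand side: H^T (H H^T + c I)^{-1}  (also N x K).
rhs = H.T @ np.linalg.inv(H @ H.T + c * np.eye(K))
assert np.allclose(lhs, rhs)          # push-through identity holds
```

The right-hand form is the one kept in Eqs. (141)-(142): it inverts a $K \times K$ Gram matrix whose size is independent of the antenna count $N$.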
Assuming $g(\mathbf{P}) = \mathrm{Tr}(\mathbf{F}^\top \mathbf{F}) + K \sigma_\xi^2 \mathrm{Tr}(\mathbf{P}^\top \mathbf{P})$, the optimal precoder $\mathbf{P}$ can be calculated from
$$\frac{\partial g(\mathbf{P})}{\partial \mathbf{P}} = \beta \hat{\mathbf{H}}^\top \big(\beta \hat{\mathbf{H}} \mathbf{P} - \mathbf{D} \mathbf{A}\big) + K \sigma_\xi^2 \mathbf{P} = \mathbf{0}_K \;\Rightarrow\; \big(\beta^2 \hat{\mathbf{H}}^\top \hat{\mathbf{H}} + K \sigma_\xi^2 \mathbf{I}\big) \mathbf{P} = \beta \hat{\mathbf{H}}^\top \mathbf{D} \mathbf{A} \;\Rightarrow\; \mathbf{P} = \big(\beta^2 \hat{\mathbf{H}}^\top \hat{\mathbf{H}} + K \sigma_\xi^2 \mathbf{I}\big)^{-1} \beta \hat{\mathbf{H}}^\top \mathbf{D} \mathbf{A}. \quad (148)$$
According to the matrix inversion lemma [50], we have
$$\mathbf{P} = \beta \hat{\mathbf{H}}^\top \big(\beta^2 \hat{\mathbf{H}} \hat{\mathbf{H}}^\top + K \sigma_\xi^2 \mathbf{I}\big)^{-1} \mathbf{D} \mathbf{A} = \hat{\mathbf{H}}^\top \Big(\beta \hat{\mathbf{H}} \hat{\mathbf{H}}^\top + \frac{K \sigma_\xi^2}{\beta} \mathbf{I}\Big)^{-1} \mathbf{D} \mathbf{A}. \quad (149)$$
Substituting Eq. (99) into Eq. (149), we can derive
$$\mathbf{P} = \hat{\mathbf{H}}^\top \Big(\frac{\sigma_h^2}{\sigma_h^2 + \sigma_e^2} \hat{\mathbf{H}} \hat{\mathbf{H}}^\top + K \frac{\sigma_h^2 \sigma_e^2}{\sigma_h^2 + \sigma_e^2} \cdot \frac{\sigma_h^2 + \sigma_e^2}{\sigma_h^2} \mathbf{I}\Big)^{-1} \mathbf{D} \mathbf{A} = \hat{\mathbf{H}}^\top \Big(\frac{\sigma_h^2}{\sigma_h^2 + \sigma_e^2} \hat{\mathbf{H}} \hat{\mathbf{H}}^\top + K \sigma_e^2 \mathbf{I}\Big)^{-1} \mathbf{D} \mathbf{A}. \quad (150)$$
Considering the power constraint, the optimal estimation-error-aware precoder in DIF is
$$\mathbf{P}^{\mathrm{ML}}_{\mathrm{DIF}} = \frac{1}{\eta} \hat{\mathbf{H}}^\top \Big(\frac{\sigma_h^2}{\sigma_h^2 + \sigma_e^2} \hat{\mathbf{H}} \hat{\mathbf{H}}^\top + K \sigma_e^2 \mathbf{I}\Big)^{-1} \mathbf{D} \mathbf{A}. \quad (151)$$
ii) RIF: Introducing the regularization parameter $\lambda$ for the constraint in Eq. (1), the target function in RIF is
$$\min_{\mathbf{P}} \; \mathbb{E}_{\mathbf{\Xi} \mid \hat{\mathbf{H}}} \|\mathbf{H} \mathbf{P} - \mathbf{D} \mathbf{A}\|_F^2 + \lambda \big(\mathrm{Tr}(\mathbf{P}^\top \mathbf{P}) - \rho\big). \quad (152)$$
Following the same process as in DIF, we have
$$g(\mathbf{P}) = \mathrm{Tr}\big(\mathbf{F}^\top \mathbf{F}\big) + \big(\lambda + K \sigma_\xi^2\big) \mathrm{Tr}\big(\mathbf{P}^\top \mathbf{P}\big) - \lambda \rho. \quad (153)$$
By solving $\frac{\partial g(\mathbf{P})}{\partial \mathbf{P}} = \mathbf{0}$, the optimal precoder is derived as
$$\mathbf{P} = \hat{\mathbf{H}}^\top \Big(\beta \hat{\mathbf{H}} \hat{\mathbf{H}}^\top + \frac{K \sigma_\xi^2 + \lambda}{\beta} \mathbf{I}\Big)^{-1} \mathbf{D} \mathbf{A} = \hat{\mathbf{H}}^\top \Big(\frac{\sigma_h^2}{\sigma_h^2 + \sigma_e^2} \hat{\mathbf{H}} \hat{\mathbf{H}}^\top + \Big(K \sigma_e^2 + \frac{\lambda (\sigma_h^2 + \sigma_e^2)}{\sigma_h^2}\Big) \mathbf{I}\Big)^{-1} \mathbf{D} \mathbf{A}. \quad (154)$$
Due to the uplink-downlink duality of IF precoding [22] and the equal power distribution assumption [20], $\lambda$ can be selected as $\lambda = K/\rho$, and the optimal estimation-error-aware RIF precoder under the power constraint is
$$\mathbf{P}^{\mathrm{ML}}_{\mathrm{RIF}} = \frac{1}{\eta} \hat{\mathbf{H}}^\top \Big(\frac{\sigma_h^2}{\sigma_h^2 + \sigma_e^2} \hat{\mathbf{H}} \hat{\mathbf{H}}^\top + \Big(K \sigma_e^2 + \frac{K (\sigma_h^2 + \sigma_e^2)}{\rho \sigma_h^2}\Big) \mathbf{I}\Big)^{-1} \mathbf{D} \mathbf{A}. \quad (155)$$

APPENDIX F
PROOF OF THEOREM 9

Following the process in Appendix C, when Alg. 2 converges we have
$$d_H(g(\mathbf{d}_x), g(\mathbf{d}_y)) \le \gamma\, d_H(\mathbf{d}_x, \mathbf{d}_y).$$
(156)

Thus, we have the inference chain
$$d_H\big(\mathbf{d}^{(t+1)}, \mathbf{d}^{(t)}\big) = d_H\big(\Gamma(\mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}^{(t)})), \Gamma(\mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}^{(t-1)}))\big) = d_H\big(\mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}^{(t)}), \mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}^{(t-1)})\big) \le \gamma\, d_H\big(\mathbf{1}_K \oslash \mathbf{d}^{(t)}, \mathbf{1}_K \oslash \mathbf{d}^{(t-1)}\big) = \gamma\, d_H\big(\mathbf{d}^{(t)}, \mathbf{d}^{(t-1)}\big) \le \gamma^2 d_H\big(\mathbf{d}^{(t-1)}, \mathbf{d}^{(t-2)}\big) \le \cdots \le \gamma^t d_H\big(\mathbf{d}^{(1)}, \mathbf{d}^{(0)}\big) = \gamma^t d_H\big(\mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}^{(0)}), \mathbf{d}^{(0)}\big). \quad (157)$$
Alg. 2 terminates when $d_H(\mathbf{d}^{(t+1)}, \mathbf{d}^{(t)}) \le \epsilon$. Setting $d_H(\mathbf{d}^{(t+1)}, \mathbf{d}^{(t)}) = \epsilon$, we can derive
$$t_{RA} \log(\gamma) \ge \log\left(\frac{d_H(\mathbf{d}^{(t+1)}, \mathbf{d}^{(t)})}{d_H(\mathbf{d}^{(1)}, \mathbf{d}^{(0)})}\right) = \log \epsilon - \log d_H\big(\mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}^{(0)}), \mathbf{d}^{(0)}\big). \quad (158)$$
Since $\gamma \in (0, 1)$, $\log(\gamma)$ lies in $(-\infty, 0)$, so
$$t_{RA} \le \frac{\log \epsilon - \log d_H\big(\mathbf{G}^{-1}(\mathbf{1}_K \oslash \mathbf{d}^{(0)}), \mathbf{d}^{(0)}\big)}{\log(\gamma)}. \quad (159)$$
Regarding the bound on $\gamma$, we employ the contraction ratio in the Birkhoff-Hopf theorem, which can be calculated by
$$\kappa(\mathbf{G}^{-1}) = \tanh\left(\frac{\Delta(\mathbf{G}^{-1})}{4}\right). \quad (160)$$
In Eq. (160), the projective diameter $\Delta(\mathbf{G}^{-1})$ can be calculated following [38] by
$$\Delta(\mathbf{G}^{-1}) = \max_{1 \le i,j \le K} d_H\big(\mathbf{G}^{-1} \mathbf{e}_i, \mathbf{G}^{-1} \mathbf{e}_j\big) = \log \max_{i,j,p,q}\left(\frac{g'_{p,i}\, g'_{q,j}}{g'_{p,j}\, g'_{q,i}}\right), \quad (161)$$
where $\mathbf{e}_i$, $i \in \{1,\dots,K\}$, denotes the $i$-th column of $\mathbf{I}$, and $g'_{i,j}$, $i,j \in \{1,\dots,K\}$, the $(i,j)$ element of $\mathbf{G}^{-1}$. Combining Eq. (159) and Eq. (161), we obtain Eq. (110) and Eq. (111). The theorem is proved.

REFERENCES

[1] J. R. Hampton, Introduction to MIMO Communications. Cambridge University Press, 2013.
[2] B. Hassibi and B. M. Hochwald, "How much training is needed in multiple-antenna wireless links?" IEEE Trans. Inf. Theory, vol. 49, no. 4, pp. 951–963, 2003.
[3] Z. Wang, J. Zhang, H. Du, W. E. I. Sha, B. Ai, D. Niyato, and M. Debbah, "Extremely large-scale MIMO: fundamentals, challenges, solutions, and future directions," IEEE Wirel. Commun., vol. 31, no. 3, pp. 117–124, 2024.
[4] B. M. Hochwald, T. L. Marzetta, and V. Tarokh, "Multiple-antenna channel hardening and its implications for rate feedback and scheduling," IEEE Trans. Inf. Theory, vol.
50, no. 9, pp. 1893–1909, 2004.
[5] N. Rajatheva, I. Atzeni, E. Björnson, A. Bourdoux, S. Buzzi, J.-B. Doré, S. Erkucuk, M. Fuentes, K. Guan, Y. Hu, X. Huang, J. Hulkkonen, J. M. Jornet, M. Katz, R. Nilsson, E. Panayirci, K. Rabie, N. Rajapaksha, M. J. Salehi, H. Sarieddeen, S. Shahabuddin, T. Svensson, O. Tervo, A. Tölli, Q. Wu, and W. Xu, "White paper on broadband connectivity in 6G," 6G Research Visions, vol. 10. [Online]. Available: https://par.nsf.gov/biblio/10223732
[6] M. H. M. Costa, "Writing on dirty paper," IEEE Trans. Inf. Theory, vol. 29, no. 3, pp. 439–441, 1983.
[7] W. Yu and J. M. Cioffi, "Sum capacity of Gaussian vector broadcast channels," IEEE Trans. Inf. Theory, vol. 50, no. 9, pp. 1875–1892, 2004.
[8] L. Sun and M. R. McKay, "Tomlinson-Harashima precoding for multiuser MIMO systems with quantized CSI feedback and user scheduling," IEEE Trans. Signal Process., vol. 62, no. 16, pp. 4077–4090, 2014.
[9] B. M. Hochwald, C. B. Peel, and A. L. Swindlehurst, "A vector-perturbation technique for near-capacity multiantenna multiuser communication-part II: perturbation," IEEE Trans. Commun., vol. 53, no. 3, pp. 537–544, 2005.
[10] Y. Avner, B. M. Zaidel, and S. Shamai (Shitz), "On vector perturbation precoding for the MIMO Gaussian broadcast channel," IEEE Trans. Inf. Theory, vol. 61, no. 11, pp. 5999–6027, 2015.
[11] D. Micciancio and S. Goldwasser, Complexity of Lattice Problems: A Cryptographic Perspective, 1st ed., ser. The Springer International Series in Engineering and Computer Science, no. 671. Springer, 2002.
[12] P. Q. Nguyen and B. Vallée, The LLL Algorithm. Springer, 2010.
[13] H. Vikalo and B. Hassibi, "On the sphere-decoding algorithm II. Generalizations, second-order statistics, and applications to communications," IEEE Trans. Signal Process., vol. 53, no. 8, pp. 2819–2834, 2005.
[14] J. Zhang, J. Zhang, E. Björnson, and B.
Ai, "Local partial zero-forcing combining for cell-free massive MIMO systems," IEEE Trans. Commun., vol. 69, no. 12, pp. 8459–8473, 2021.
[15] L. Chen, Z. Wang, Y. Du, Y. Chen, and F. R. Yu, "Generalized transceiver beamforming for DFRC with MIMO radar and MU-MIMO communication," IEEE J. Sel. Areas Commun., vol. 40, no. 6, pp. 1795–1808, 2022.
[16] Z. Wang, L. Liang, S. Lyu, Y. Xia, Y. Huang, and D. W. K. Ng, "Efficient statistical linear precoding for downlink massive MIMO systems," IEEE Trans. Wirel. Commun., vol. 23, no. 10, pp. 14805–14818, 2024.
[17] M. A. M. Albreem, A. H. A. Habbash, A. M. Abu-Hudrouss, and S. S. Ikki, "Overview of precoding techniques for massive MIMO," IEEE Access, vol. 9, pp. 60764–60801, 2021.
[18] S. Stern and R. F. H. Fischer, "Advanced factorization strategies for lattice-reduction-aided preequalization," in Proc. IEEE Int. Symp. Information Theory (ISIT), Barcelona, Spain, Jul. 2016, pp. 1471–1475.
[19] S. Hong and G. Caire, "Reverse compute and forward: A low-complexity architecture for downlink distributed antenna systems," in Proc. IEEE Int. Symp. Information Theory (ISIT), Cambridge, MA, USA, Jul. 2012, pp. 1147–1151.
[20] D. Silva, G. F. Pivaro, G. Fraidenraich, and B. Aazhang, "On integer-forcing precoding for the Gaussian MIMO broadcast channel," IEEE Trans. Wirel. Commun., vol. 16, no. 7, pp. 4476–4488, 2017.
[21] O. Ordentlich and U. Erez, "Precoded integer-forcing universally achieves the MIMO capacity to within a constant gap," IEEE Trans. Inf. Theory, vol. 61, no. 1, pp. 323–340, 2015.
[22] W. He, B. Nazer, and S. Shamai (Shitz), "Uplink-downlink duality for integer-forcing," IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1992–2011, 2018.
[23] X. Qiu, T. Yang, and J. Thompson, "On lattice-based broadcasting for massive-user MIMO: practical algorithms and optimization," IEEE Trans.
Wirel. Commun., vol. 23, no. 11, pp. 16544–16558, 2024.
[24] "Major advances in particle swarm optimization: Theory, analysis, and application," Swarm Evol. Comput., vol. 63, p. 100868, 2021 (withdrawn).
[25] R. B. Venturelli and D. Silva, "Optimization of integer-forcing precoding for multi-user MIMO downlink," IEEE Wirel. Commun. Lett., vol. 9, no. 11, pp. 1860–1864, 2020.
[26] I. C. Trelea, "The particle swarm optimization algorithm: convergence analysis and parameter selection," Inf. Process. Lett., vol. 85, no. 6, pp. 317–325, 2003.
[27] C. Wang, E. K. S. Au, R. D. Murch, W. H. Mow, R. S. Cheng, and V. K. N. Lau, "On the performance of the MIMO zero-forcing receiver in the presence of channel estimation error," IEEE Trans. Wirel. Commun., vol. 6, no. 3, pp. 805–810, 2007.
[28] A. D. Dabbagh and D. J. Love, "Multiple antenna MMSE based downlink precoding with quantized feedback or channel mismatch," IEEE Trans. Commun., vol. 56, no. 11, pp. 1859–1868, 2008.
[29] F. Jiang, C. Li, and Z. Gong, "Accurate analytical BER performance for ZF receivers under imperfect channel in low-SNR region for large receiving antennas," IEEE Signal Process. Lett., vol. 25, no. 8, pp. 1246–1250, 2018.
[30] T. Liu, "On basing search SIVP on NP-hardness," in Theory of Cryptography (TCC 2018), Panaji, India, Nov. 2018, Proceedings, Part I, ser. Lecture Notes in Computer Science, A. Beimel and S. Dziembowski, Eds., vol. 11239. Springer, 2018, pp. 98–119.
[31] A. K. Lenstra, H. W. Lenstra, and L. Lovász, "Factoring polynomials with rational coefficients," Mathematische Annalen, vol. 261, pp. 515–534, 1982.
[32] S. Lyu and C. Ling, "Boosted KZ and LLL algorithms," IEEE Trans. Signal Process., vol. 65, no. 18, pp. 4784–4796, 2017.
[33] J. Wen, L. Li, X. Tang, and W. H. Mow, "An efficient optimal algorithm for the successive minima problem," IEEE Trans. Commun., vol. 67, no. 2, pp.
1424–1436, 2019.
[34] D. Wübben, D. Seethaler, J. Jaldén, and G. Matz, "Lattice reduction," IEEE Signal Process. Mag., vol. 28, no. 3, pp. 70–91, 2011.
[35] G. Birkhoff, "Extensions of Jentzsch's theorem," Transactions of the American Mathematical Society, vol. 85, no. 1, pp. 219–227, 1957.
[36] E. Kohlberg and J. W. Pratt, "The contraction mapping approach to the Perron-Frobenius theory: Why Hilbert's metric?" Math. Oper. Res., vol. 7, no. 2, pp. 198–210, 1982.
[37] J. E. Carroll, "Birkhoff's contraction coefficient," Linear Algebra and its Applications, vol. 389, pp. 227–234, 2004.
[38] B. Lemmens and R. Nussbaum, Nonlinear Perron-Frobenius Theory. Cambridge University Press, 2012, vol. 189.
[39] E. Hopf, "An inequality for positive linear integral operators," Journal of Mathematics and Mechanics, vol. 12, no. 5, pp. 683–692, 1963.
[40] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 2012.
[41] J. Li and P. Q. Nguyen, "A complete analysis of the BKZ lattice reduction algorithm," J. Cryptol., vol. 38, no. 1, p. 12, 2025.
[42] A. Berman and R. J. Plemmons, Nonnegative Matrices in the Mathematical Sciences. SIAM, 1994.
[43] A. H. Sayed, "Adaptation, learning, and optimization over networks," Found. Trends Mach. Learn., vol. 7, no. 4-5, pp. 311–801, 2014.
[44] J. M. Altschuler, J. Weed, and P. Rigollet, "Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration," in Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 2017, pp. 1964–1974.
[45] T. Tao, Topics in Random Matrix Theory. American Mathematical Society, 2012, vol. 132.
[46] C. Li, F. Jiang, C. Meng, and Z.
Gong, "A new turbo equalizer conditioned on estimated channel for MIMO MMSE receiver," IEEE Commun. Lett., vol. 21, no. 4, pp. 957–960, 2017.
[47] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005.
[48] S. P. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[49] W. Rudin, Principles of Mathematical Analysis, 3rd ed., 1976.
[50] J. Zhan, B. Nazer, U. Erez, and M. Gastpar, "Integer-forcing linear receivers," IEEE Trans. Inf. Theory, vol. 60, no. 12, pp. 7661–7685, 2014.