Near-field Beam Training under Multi-path Channels: A Hybrid Learning-and-Optimization Approach

Jiapeng Li, Changsheng You, Guoliang Cheng, Haobin Sun, Chao Zhou, and Linglong Dai

Abstract: For extremely large-scale arrays (XL-arrays), the discrete Fourier transform (DFT) codebook, conventionally used in the far field, has recently been employed for near-field beam training. However, most existing methods rely on the line-of-sight (LoS) dominant channel assumption, and thus may suffer degraded communication performance when applied to general multi-path scenarios due to the more complex received signal power pattern at the user. To address this issue, we propose in this paper a new hybrid learning-and-optimization-based beam training method that first leverages deep learning (DL) to obtain coarse channel parameter estimates, and then refines them via a model-based optimization algorithm, hence achieving high-accuracy estimation with low computational complexity. Specifically, in the first stage, a tailored U-Net architecture is developed to learn the non-linear mapping from the received power pattern to coarse estimates of the angles and ranges of multi-path components. In particular, the inherent permutation ambiguity in multi-path parameter matching is effectively resolved by a permutation invariant training (PIT) strategy, while the unknown number of paths is estimated based on defined path existence logits. In the second stage, we further propose an efficient particle swarm optimization method to refine the angular and range parameters within a confined search region; meanwhile, a Gerchberg-Saxton algorithm is used to retrieve multi-path channel gains from the received power pattern. Last, numerical results demonstrate that the proposed hybrid design significantly outperforms various benchmarks in terms of parameter estimation accuracy and achievable rate, yet with low computational complexity.
Index Terms: Beam training, near-field multi-path channel, deep learning, particle swarm optimization.

J. Li, C. You, G. Cheng, H. Sun, and C. Zhou are with the Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen 518055, China; G. Cheng is also with the Frontier Research Center, Peng Cheng Laboratory, Shenzhen 518055, China; L. Dai is with the Department of Electronic Engineering and the State Key Laboratory of Space Network and Communications, Tsinghua University, Beijing 100084, China (e-mail: {lijiapeng2023, chenggl2024, sunhb2023, zhouchao2024}@mail.sustech.edu.cn; youcs@sustech.edu.cn; daill@tsinghua.edu.cn). (Corresponding author: Changsheng You.)

I. INTRODUCTION
Extremely large-scale arrays (XL-arrays) have emerged as a promising technology to provide significantly high achievable rates for emerging applications in sixth-generation (6G) wireless networks, such as holographic communications, extended reality, and digital twins [1], [2]. With the dramatic increase in the number of antennas, users are more likely to fall into the near-field region, where radio propagation is characterized by spherical rather than planar wavefronts [3], [4]. This introduces a new degree of freedom in the range domain that unlocks location-aware beam focusing, allowing energy to be concentrated at specific locations instead of specific directions [5], [6]. To achieve the benefits of beam focusing, acquiring accurate channel state information (CSI) is indispensable, which is considerably more difficult in near-field systems than in their far-field counterparts, since both the angle and range parameters need to be estimated. Generally speaking, CSI acquisition methods can be categorized into two main paradigms, namely, explicit channel estimation [6], [7] and implicit beam training [4], [8], [9].
In this paper, we focus on beam training in near-field systems, which relies on predefined codebooks and is particularly important in high-frequency bands for establishing a reliable link before efficient data transmission.

A. Related Works
Existing near-field beam training methods can be broadly categorized into two types, namely, power-pattern-based and deep learning (DL)-based methods.

1) Power-pattern-based Near-field Beam Training: Power-pattern-based methods aim to infer key near-field channel parameters from structural features of the received power pattern at the user after beam sweeping, without explicitly reconstructing the full channel matrix. Among others, a straightforward approach is a two-dimensional (2D) exhaustive search over both angle and range in a polar-domain codebook [3], which, however, incurs prohibitively high beam training overhead. To tackle this issue, a representative line of works exploited the structural regularity of the received power pattern under beam sweeping. For example, a two-phase near-field beam training method was proposed in [4], which first estimates the user angle through angle-domain beam sweeping based on a far-field-oriented discrete Fourier transform (DFT) codebook, and then estimates the user range in a second stage with a polar-domain codebook. The required beam training overhead was further reduced in [8] by jointly estimating the angle and range under the line-of-sight (LoS) path assumption based on a defined angular support, whose middle angle and width are used to infer the user angle and range, respectively. Furthermore, a low-complexity method was proposed in [10], which efficiently obtains the near-field user range by analytically resolving the beam pattern generated by the DFT codebook. Besides the above methods using the DFT codebook, other works reduced the beam training overhead by adopting different beam-design strategies.
For instance, the authors in [5] proposed a coarse-to-fine procedure, which first generates wide beams and then gradually resolves finer-grained user angle and range parameters with narrow beams, thereby achieving low-overhead beam training. Likewise, the authors in [11] designed a multi-resolution near-field codebook, where wide and narrow beams are jointly designed to realize hierarchical search directly over the angle-range domain. In addition, multi-beam training methods were proposed in [9], [12] to simultaneously form multiple focusing beams; they first identify several candidate user locations by multi-beam sweeping and then determine the true user location with a few single-beam pilots, thereby reducing the beam training overhead.

2) DL-based Near-field Beam Training: In contrast to the above power-pattern-based methods, DL-based near-field beam training treats the inference from received signal powers to channel parameters as a data-driven learning problem. The main motivation is that the received power pattern depends on the angle, range, and channel gain parameters through a highly nonlinear relationship, for which an accurate and tractable analytical characterization is generally difficult to obtain in practical scenarios. By leveraging the strong function-approximation capability of deep neural networks, DL-based methods can effectively learn this intricate mapping from labeled beam-training data. Most existing works focused on LoS-only or LoS-dominant channels and formulated near-field beam training as a codeword classification problem, which can be efficiently solved via supervised learning. Specifically, the authors in [13] first proposed a DL-based near-field beam training method, where the received signal powers of far-field wide beams were exploited to separately estimate the angle and range indices of the optimal near-field codeword.
This method was further improved in [14], where a deep neural network (DNN) was trained over a near-field codebook with explicit angle-range information, such that the angle and range could be jointly estimated rather than inferred separately. Different from these codeword-classification-based designs, the beamforming vector was directly learned in [15] with the achievable rate as the learning objective, thereby extending codebook-based beam training to codebook-free beamformer generation. More recently, the authors in [16] further integrated beam training and beam tracking into a unified DL framework, where a two-stage convolutional neural network (CNN) was used for coarse-to-fine near-field beam training and a long short-term memory (LSTM) network was employed to exploit temporal correlation for mobility tracking.

However, the above power-pattern-based and DL-based near-field beam training methods mostly have fundamental limitations when applied to general multi-path scenarios, which can be summarized as follows.
• (Reliance on the LoS-dominant channel assumption) Existing near-field beam training methods rely heavily on the LoS or LoS-dominant channel assumption, which may not hold in practical multi-path scenarios. In particular, strong non-line-of-sight (NLoS) paths may distort the received power pattern relative to the LoS channel case, e.g., by introducing spurious power fluctuations, and thus degrade beam training accuracy.
• (Unknown number of dominant channel paths) Most existing DL-based methods assume prior knowledge of the channel sparsity level (i.e., the number of paths). In practical scenarios, however, the number of scatterers is generally unknown and time-varying. A mismatch in the number of paths may result in either over-fitting (estimating noise as paths) or under-fitting (missing dominant paths), both of which degrade the estimation performance.
• (Permutation ambiguity in estimated multi-path parameters) The physical multi-path channel is generally order-independent, whereas conventional neural networks typically generate order-sensitive outputs, so that the estimation performance may be biased by the specific ordering of paths. This introduces permutation ambiguity [17], where the indices of estimated paths may not match those of the ground-truth paths. For instance, if the path parameters are accurately estimated but listed in a different order, standard loss functions may incorrectly penalize this as a severe error. Such permutation ambiguity also confuses optimizers, causing training instability and potential convergence failure.

B. Contributions
To address the above issues, we study efficient beam training design for near-field systems under general multi-path channels. Specifically, we consider a practical scenario where the base station (BS) performs beam sweeping based on a DFT codebook. While traditionally utilized in the far field, DFT-based sweeping has recently attracted growing attention for near-field beam training, as it facilitates a field-agnostic beam training framework that reconciles the disparate characteristics of both propagation regimes. Under this setup, we aim to estimate key multi-path channel parameters from the received signal powers, including the angles, ranges, and complex-valued gains of the user and scatterers. Our main contributions are summarized as follows.
• First, we propose a new hybrid learning-and-optimization method for near-field beam training under general multi-path channels, which first leverages DL to obtain coarse channel parameter estimates and then refines them via a model-based optimization algorithm, hence achieving high-accuracy estimation with low computational complexity.
Note that, unlike existing methods tailored for LoS-only or LoS-dominant scenarios, the proposed framework accounts for the intricate nonlinear coupling and signal superposition among multiple paths in the received power pattern, which render existing near-field beam training methods suboptimal or even inapplicable.
• Second, we propose efficient methods for the two stages. Specifically, for the first stage, we design a customized U-Net to learn the highly complex and nonlinear mapping from the received power pattern to coarse multi-path position estimates, where a permutation invariant training (PIT) strategy is adopted to resolve the inherent permutation ambiguity in multi-path estimation, and path existence logits are introduced to infer the unknown number of dominant paths. Based on the statistical error characteristics of the coarse estimates, we construct a confined search region for the key channel parameters. Subsequently, in the second stage, a customized particle swarm optimization (PSO) algorithm is developed to refine the angle-range parameters within this confined search region, while a Gerchberg-Saxton (GS) algorithm is employed to solve the corresponding phase retrieval problem and recover the complex-valued channel gains.
• Last, extensive numerical results are provided to verify the effectiveness of the proposed method for near-field multi-path beam training. It is shown that the proposed method can attain an achievable rate close to its upper bound based on perfect CSI under various simulation settings, while requiring low computational complexity. Moreover, compared with existing LoS-based methods, which suffer from severe model mismatch in multi-path scenarios, the proposed method achieves substantially improved estimation accuracy and achievable rate.

Organization: The remainder of this paper is organized as follows. Section II introduces the system model and problem formulation for near-field multi-path beam training. Section III presents the main limitations of two existing benchmark schemes and summarizes the proposed hybrid learning-and-optimization method. The two proposed stages are detailed in Sections IV and V, respectively. Numerical results are presented in Section VI, and conclusions are drawn in Section VII.

II. SYSTEM MODEL AND PROBLEM FORMULATION
We consider downlink beam training for a narrow-band XL-array system as shown in Fig. 1, where a BS equipped with an $N$-antenna uniform linear array (ULA) serves a single-antenna user.¹ Without loss of generality, the ULA is placed along the $y$-axis, where the $n$-th antenna is located at $(0, \delta_n d)$, with $\delta_n = \frac{2n-N-1}{2}$, $n \in \mathcal{N} \triangleq \{1, 2, \dots, N\}$, and $d = \frac{\lambda}{2}$ denoting the inter-antenna spacing.

[Fig. 1: XL-array along the $y$-axis with antenna $n$ at $(0, \delta_n d)$; near-field user at $(r_0\sqrt{1-\theta_0^2}, r_0\theta_0)$ and near-field scatterers at $(r_\ell\sqrt{1-\theta_\ell^2}, r_\ell\theta_\ell)$, with $\theta_\ell = \cos\phi_\ell$.]
Fig. 1. A narrow-band XL-array communication system.

¹ The proposed beam training method can be directly applied to the multi-user case by estimating channels at individual users.

A. Near-field Channel Model
The user is assumed to be located in the radiative near-field region of the BS. Based on the spherical wavefront propagation model, the near-field multi-path channel from the BS to the user can be modeled as [4]
$$\mathbf{h}_{\text{near}}^H = g_0\,\mathbf{b}^H(\theta_0, r_0) + \sum_{\ell=1}^{L} g_\ell\,\mathbf{b}^H(\theta_\ell, r_\ell), \tag{1}$$
which consists of one LoS path (indexed by $\ell = 0$) and $L$ NLoS paths ($\ell \geq 1$). Herein, $g_\ell$, $\theta_\ell \triangleq \cos\phi_\ell$, and $r_\ell$ denote the complex-valued channel gain, spatial angle, and range of path $\ell$, respectively. In addition, $\mathbf{b}(\theta_\ell, r_\ell) \in \mathbb{C}^{N\times 1}$, $\ell \in \{0, 1, \dots, L\}$, denotes the near-field steering vector, modeled as [4], [8]
$$\mathbf{b}(\theta_\ell, r_\ell) = \left[e^{-\jmath\frac{2\pi}{\lambda}\left(r_\ell^{(1)} - r_\ell\right)}, \dots, e^{-\jmath\frac{2\pi}{\lambda}\left(r_\ell^{(N)} - r_\ell\right)}\right]^T, \tag{2}$$
where $r_\ell^{(n)} = \sqrt{r_\ell^2 + (\delta_n d)^2 - 2 r_\ell \theta_\ell \delta_n d}$ denotes the distance between the $n$-th antenna and the user/scatterer.

B. Signal Model
We consider a practical two-phase transmission protocol. Specifically, in the first phase, the BS employs a predefined codebook for downlink beam training. Based on the received signal powers, the user estimates key channel parameters and feeds them back to the BS. Then, in the second phase, the BS designs its transmit beamforming for downlink data transmission based on the estimated CSI.

1) Phase 1: Downlink beam training: For downlink beam training, we consider the DFT codebook, which is widely used in far-field beam training due to its orthogonality and low implementation complexity. Recently, the DFT codebook was also shown to be effective for near-field beam training to jointly estimate the angle and range parameters [4], [8], [10], requiring much lower beam training overhead than the polar-domain codebook. As such, the DFT codebook can be regarded as a universal codebook applicable to both near-field and far-field beam training. Let $\mathbf{V}_{\text{DFT}} = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_N] \in \mathbb{C}^{N\times N}$ denote the DFT codebook used for beam training, which consists of $N$ orthogonal beamforming vectors that uniformly sample the spatial angle domain. The $n$-th vector is given by
$$\mathbf{v}_n = \mathbf{a}(\varphi_n) \triangleq \frac{1}{\sqrt{N}}\left[1, e^{-\jmath\pi\varphi_n}, \dots, e^{-\jmath\pi(N-1)\varphi_n}\right]^T, \quad \forall n \in \mathcal{N}, \tag{3}$$
where $\varphi_n = \frac{2n-N-1}{N}$, $n \in \mathcal{N}$, denotes the sampled spatial angle of the $n$-th codeword. Let $s \in \mathbb{C}$ denote the transmitted pilot signal with transmit power $P_t$.
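As a minimal sketch of (2) and (3), assuming a hypothetical 64-antenna ULA at a 30 GHz carrier ($\lambda = 0.01$ m) and half-wavelength spacing, the following stdlib-only Python builds the near-field steering vector and the DFT codebook. The helper names (`antenna_offsets`, `near_field_steering`, `dft_codebook`, `inner`) are illustrative assumptions, not taken from the paper.

```python
import cmath
import math


def antenna_offsets(N):
    """delta_n = (2n - N - 1) / 2 for n = 1, ..., N (antenna n sits at (0, delta_n * d))."""
    return [(2 * n - N - 1) / 2 for n in range(1, N + 1)]


def near_field_steering(theta, r, N, lam):
    """Near-field steering vector b(theta, r) of (2), assuming d = lam / 2."""
    d = lam / 2
    b = []
    for delta in antenna_offsets(N):
        # exact distance r^(n) between antenna n and the point at (theta, r)
        r_n = math.sqrt(r ** 2 + (delta * d) ** 2 - 2 * r * theta * delta * d)
        b.append(cmath.exp(-2j * math.pi / lam * (r_n - r)))
    return b


def dft_codebook(N):
    """Codewords v_n = a(varphi_n) of (3), with varphi_n = (2n - N - 1) / N."""
    V = []
    for n in range(1, N + 1):
        varphi = (2 * n - N - 1) / N
        V.append([cmath.exp(-1j * math.pi * k * varphi) / math.sqrt(N) for k in range(N)])
    return V


def inner(u, v):
    """Hermitian inner product u^H v of two equal-length complex lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))
```

Two properties are easy to check with these helpers: the codewords are orthonormal, since codewords $n$ and $m$ differ by a phase ramp of $2\pi(n-m)/N$ per antenna; and, as $r$ grows, each entry of $\mathbf{b}(\theta, r)$ approaches the far-field phase $e^{\jmath\pi\theta\delta_n}$, because $r^{(n)} - r \approx -\theta\delta_n d$.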
Then, the received signal at the user given the BS beamforming vector $\mathbf{v}_n$ can be expressed as
$$y(\mathbf{v}_n) = \mathbf{h}_{\text{near}}^H \mathbf{v}_n s + z, \quad \forall n \in \mathcal{N}, \tag{4}$$
where $z \sim \mathcal{CN}(0, \sigma^2)$ is the additive white Gaussian noise (AWGN) at the user, with $\sigma^2$ denoting the noise power. As such, the received signal power under the beam sweeping of the DFT codebook is given by
$$p_n = \left|\mathbf{h}_{\text{near}}^H \mathbf{v}_n s + z\right|^2, \quad \forall n \in \mathcal{N}. \tag{5}$$
Based on the received power pattern $\mathbf{p}_{\text{DFT}} \triangleq [p_1, \dots, p_N]^T \in \mathbb{R}^{N\times 1}$, the key multi-path channel parameters (or, equivalently, the channel) are estimated, including the angles and ranges of the user and scatterers, $\bar{\boldsymbol{\theta}} = [\bar{\theta}_0, \dots, \bar{\theta}_L]^T$ and $\bar{\mathbf{r}} = [\bar{r}_0, \dots, \bar{r}_L]^T$, respectively, as well as the corresponding complex-valued channel gains $\bar{\mathbf{g}} = [\bar{g}_0, \dots, \bar{g}_L]^T$. For brevity, the overall estimated parameters are represented as $\bar{\boldsymbol{\eta}} = [\bar{\boldsymbol{\theta}}^T, \bar{\mathbf{r}}^T]^T$ and $\bar{\mathbf{g}}$.

2) Phase 2: Data transmission: Given the estimated $\bar{\boldsymbol{\eta}}$ and $\bar{\mathbf{g}}$, the BS-user channel can be reconstructed as
$$\bar{\mathbf{h}}_{\text{near}}^H(\bar{\boldsymbol{\eta}}, \bar{\mathbf{g}}) = \bar{g}_0\,\mathbf{b}^H(\bar{\theta}_0, \bar{r}_0) + \sum_{\ell=1}^{L} \bar{g}_\ell\,\mathbf{b}^H(\bar{\theta}_\ell, \bar{r}_\ell). \tag{6}$$
Based on $\bar{\mathbf{h}}_{\text{near}}(\bar{\boldsymbol{\eta}}, \bar{\mathbf{g}})$, the optimal beamforming vector for the estimated channel, denoted by $\mathbf{w}(\bar{\boldsymbol{\eta}}, \bar{\mathbf{g}})$, is given by maximum-ratio transmission (MRT), i.e., $\mathbf{w}(\bar{\boldsymbol{\eta}}, \bar{\mathbf{g}}) = \bar{\mathbf{h}}_{\text{near}}(\bar{\boldsymbol{\eta}}, \bar{\mathbf{g}}) / \|\bar{\mathbf{h}}_{\text{near}}(\bar{\boldsymbol{\eta}}, \bar{\mathbf{g}})\|_2$, for which the corresponding achievable rate at the user in bits per second per Hertz (bps/Hz) is
$$R = \log_2\left(1 + \frac{P_t\left|\mathbf{h}_{\text{near}}^H \mathbf{w}(\bar{\boldsymbol{\eta}}, \bar{\mathbf{g}})\right|^2}{\sigma^2}\right). \tag{7}$$

C. Problem Formulation
The aim of near-field multi-path beam training is to estimate key multi-path channel parameters from the received power pattern, which are then used to design the BS transmit beamforming for maximizing the user achievable rate.² Specifically, according to (1) and (5), there exists an intrinsic mapping between $\{\boldsymbol{\eta}, \mathbf{g}\}$ and $\mathbf{p}_{\text{DFT}}$.
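The forward mapping from $\{\boldsymbol{\eta}, \mathbf{g}\}$ to $\mathbf{p}_{\text{DFT}}$ noted above, together with the MRT rate (7), can be sketched as follows. This is an illustrative stdlib-only implementation under assumptions not stated in the text: half-wavelength spacing with a user-supplied $\lambda$, a zero-phase pilot $s = \sqrt{P_t}$, paths passed as hypothetical `(theta, r, g)` tuples, and hypothetical function names.

```python
import cmath
import math
import random


def steering(theta, r, N, lam):
    """Near-field steering vector b(theta, r) of (2), antennas at delta_n * d, d = lam / 2."""
    d = lam / 2
    b = []
    for n in range(1, N + 1):
        y = (2 * n - N - 1) / 2 * d  # y-coordinate delta_n * d of antenna n
        r_n = math.sqrt(r * r + y * y - 2 * r * theta * y)
        b.append(cmath.exp(-2j * math.pi / lam * (r_n - r)))
    return b


def channel(paths, N, lam):
    """Column vector h_near such that h^H = sum_l g_l b^H(theta_l, r_l), cf. (1)."""
    h = [0j] * N
    for theta, r, g in paths:
        for i, b_i in enumerate(steering(theta, r, N, lam)):
            h[i] += g.conjugate() * b_i  # h = sum_l conj(g_l) b_l
    return h


def received_powers(h, Pt, sigma2, rng=None):
    """Power pattern p_n = |h^H v_n s + z|^2 of (5) over the DFT codebook (3)."""
    N = len(h)
    s = math.sqrt(Pt)  # assumed pilot: power Pt, zero phase
    p = []
    for n in range(1, N + 1):
        varphi = (2 * n - N - 1) / N
        hv = sum(h_k.conjugate() * cmath.exp(-1j * math.pi * k * varphi)
                 for k, h_k in enumerate(h)) / math.sqrt(N)  # h^H v_n
        z = 0j
        if rng is not None:  # circularly symmetric CN(0, sigma2) noise sample
            sd = math.sqrt(sigma2 / 2)
            z = complex(rng.gauss(0, sd), rng.gauss(0, sd))
        p.append(abs(hv * s + z) ** 2)
    return p


def mrt_rate(h_true, h_est, Pt, sigma2):
    """Rate (7) when the BS applies MRT w = h_est / ||h_est||_2 over the true channel."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in h_est))
    hw = sum(a.conjugate() * b for a, b in zip(h_true, h_est)) / norm  # h^H w
    return math.log2(1 + Pt * abs(hw) ** 2 / sigma2)
```

Because the DFT codebook is unitary, the noiseless powers sum to $P_t\|\mathbf{h}_{\text{near}}\|_2^2$, and by the Cauchy-Schwarz inequality MRT with a mismatched estimate can never exceed the perfect-CSI rate $\log_2(1 + P_t\|\mathbf{h}_{\text{near}}\|_2^2/\sigma^2)$; both facts make convenient sanity checks for a simulator.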
Mathematically, the near-field multi-path beam training problem can be expressed as
$$\{\bar{\boldsymbol{\eta}}, \bar{\mathbf{g}}\} \Leftarrow \mathcal{F}(\mathbf{p}_{\text{DFT}}), \tag{8}$$
where $\mathcal{F}(\cdot)$ denotes the mapping function, to be designed, from the received power pattern to the channel parameters. Note that for the near-field multi-path channel, it is generally challenging to obtain $\mathcal{F}(\cdot)$ in closed form, due to the highly non-linear and complex relationship between the received power pattern and the multi-path channel parameters.

III. BENCHMARK SCHEMES AND PROPOSED FRAMEWORK
In this section, we introduce two types of existing benchmark schemes for near-field beam training and point out their main limitations in the considered multi-path scenario. Then, to overcome these limitations, we propose a new hybrid learning-and-optimization method.

A. Power-pattern-based Methods
Power-pattern-based methods aim to estimate key channel parameters by analyzing structural features (e.g., the angular support) of the received power pattern, and have recently been exploited for near-field single-path scenarios (see, e.g., [4], [8], [10]). However, the estimation accuracy of these methods may be severely degraded in scenarios with strong multi-path components, for two reasons. First, the angular supports of different paths may overlap in the received power pattern, where the path contributions can add coherently or cancel destructively, depending on the phases of individual paths. Second, the random complex-valued path gains associated with NLoS components further perturb the received power pattern, making angle and range estimation more challenging. Therefore, power-pattern-based methods are generally ineffective in near-field multi-path channels.

Example 1 (Signal superposition issue). For illustration, we compare in Fig. 2 the received power patterns of two individual paths and of their superposition under three different cases.
² The beamforming strategy adopted here shares a similar framework with the 5G NR Type II codebook mechanism [18]. Both approaches achieve linear beam combination through explicit amplitude and phase coefficients, thereby providing high-precision feedback of multi-path parameters for effective beamforming.

[Fig. 2: received power versus spatial angle for Cases 1-3.]
Fig. 2. Power pattern comparison. The left column shows the separated components defined by $\mathbf{h}_i^H = g_i\mathbf{b}^H(\theta_i, r_i)$ for $i = 1, 2$, while the right column shows the superposed channel $\mathbf{h}^H = \mathbf{h}_1^H + \mathbf{h}_2^H$. Red text highlights parameter differences compared to Case 1, demonstrating the high sensitivity of signal superposition to slight parameter variations.

Specifically, for Case 1 and Case 2, the spatial angles and complex-valued channel gains of the two paths are identical, while the ranges of the second path differ slightly ($r_2 = 8$ meters (m) in Case 1 versus $r_2 = 9$ m in Case 2). It is observed that the individual power patterns of the two cases are similar, whereas the power patterns of their superposed signals are quite different, since a small range variation can lead to significantly different near-field steering vectors and hence distinct received power patterns. Next, we compare Case 1 and Case 3, where the physical positions (both angle and range) of the two paths are identical, while the phases of their channel gains differ, i.e., $g_2 = e^{\jmath 0.5\pi}$ in Case 1 versus $g_2 = e^{\jmath 0.2\pi}$ in Case 3. One can observe that even with such a mild phase variation, the power patterns of the superposed signals differ significantly, owing to the role of phase in multi-path signal superposition (i.e., in-phase addition or out-of-phase cancellation).
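The phase effect in the comparison of Case 1 and Case 3 can be reproduced qualitatively with the following sketch, which computes noiseless DFT power patterns. The angles, the first path's range and gain, $N = 128$, and $\lambda = 0.01$ m are hypothetical choices, not the values behind Fig. 2; only $r_2 = 8$ m and the two phases of $g_2$ follow the text. Rotating the phase of $g_2$ leaves path 2's own pattern unchanged, but alters the cross term $2\,\mathrm{Re}\{\cdot\}$ of the superposition.

```python
import cmath
import math


def steering(theta, r, N, lam):
    """Near-field steering vector b(theta, r) of (2), half-wavelength spacing."""
    d = lam / 2
    out = []
    for n in range(1, N + 1):
        y = (2 * n - N - 1) / 2 * d  # antenna position delta_n * d
        r_n = math.sqrt(r * r + y * y - 2 * r * theta * y)
        out.append(cmath.exp(-2j * math.pi / lam * (r_n - r)))
    return out


def row_channel(paths, N, lam):
    """Entries of the row vector h^H = sum_l g_l b^H(theta_l, r_l), cf. (1)."""
    hH = [0j] * N
    for theta, r, g in paths:
        for i, b_i in enumerate(steering(theta, r, N, lam)):
            hH[i] += g * b_i.conjugate()
    return hH


def dft_powers(hH):
    """Noiseless powers |h^H v_n|^2 for all DFT codewords v_n of (3), unit pilot."""
    N = len(hH)
    powers = []
    for n in range(1, N + 1):
        varphi = (2 * n - N - 1) / N
        y = sum(x * cmath.exp(-1j * math.pi * k * varphi) for k, x in enumerate(hH))
        powers.append(abs(y / math.sqrt(N)) ** 2)
    return powers


# Hypothetical parameters loosely mimicking Case 1 versus Case 3 of Example 1.
N, lam = 128, 0.01
path1 = (0.05, 5.0, 1.0)                              # first path, unchanged
path2_case1 = (0.06, 8.0, cmath.exp(0.5j * math.pi))  # g_2 = e^{j0.5pi}
path2_case3 = (0.06, 8.0, cmath.exp(0.2j * math.pi))  # g_2 = e^{j0.2pi}

# Path 2 alone: a pure phase rotation of g_2 leaves |h^H v_n|^2 unchanged.
alone_1 = dft_powers(row_channel([path2_case1], N, lam))
alone_3 = dft_powers(row_channel([path2_case3], N, lam))
ind_diff = max(abs(a - b) for a, b in zip(alone_1, alone_3))

# Superposed channel: the same phase change reshapes the pattern via the cross term.
sup_case1 = dft_powers(row_channel([path1, path2_case1], N, lam))
sup_case3 = dft_powers(row_channel([path1, path2_case3], N, lam))
sup_diff = max(abs(a - b) for a, b in zip(sup_case1, sup_case3))
print(f"max change, path 2 alone: {ind_diff:.2e}; superposed: {sup_diff:.2e}")
```

The first printed value is at floating-point round-off level, whereas the second is not, which mirrors the observation drawn from Fig. 2.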
Consequently, relying on theoretical analysis of the power pattern to estimate channel parameters is highly challenging (if possible at all) in multi-path scenarios, as slightly different channel parameters can yield significantly distinct received power patterns.

[Fig. 3: received power pattern $\mathbf{p}_{\text{DFT}}$ → Stage 1: learning-based coarse estimation (customized U-Net model) → coarse position estimate $\tilde{\boldsymbol{\eta}}$ and path existence logits $\tilde{\mathbf{q}}$ → Stage 2: optimization-based parameter refinement (PSO with confined search region) → refined parameters $\bar{\boldsymbol{\eta}}, \bar{\mathbf{g}}$.]
Fig. 3. The framework of the proposed hybrid learning-and-optimization method.

B. DL-based Methods
Alternatively, DL-based methods have also been proposed for near-field beam training [13]–[15], which leverage the powerful non-linear mapping capability of DL to infer channel parameters from the received powers. Although DL-based methods can achieve fast inference in near-field beam training under LoS channels [13], [15], they may not provide accurate beam training in practical near-field multi-path scenarios, for the following reasons.
• Unknown number of paths: First, existing DL-based methods for multi-path beam training have mostly assumed that the number of paths is known a priori, which, however, may not be available in practice.
• Permutation ambiguity: Second, conventional DL methods typically require a deterministic matching relationship between the output channel paths and the ground-truth channel paths during model training. However, due to the unordered nature of multi-path channels, there is no predefined matching relationship, resulting in a permutation ambiguity issue in estimating multi-path parameters.
• Accuracy limitation: Third, existing DL-based methods usually employ a classification framework to select the best beam codeword from a predefined near-field codebook [13], which inevitably introduces quantization errors in estimating continuous-valued channel parameters, thus limiting estimation accuracy.

C. Proposed Hybrid Framework
To address the above issues, we propose in this paper a new hybrid learning-and-optimization framework for near-field multi-path beam training, which synergistically combines the efficiency of data-driven DL with the high precision of model-based optimization, thereby achieving accurate estimation of key channel parameters with low complexity. Specifically, as illustrated in Fig. 3, our key idea is to first estimate coarse multi-path channel parameters by using DL techniques and then refine them by using optimization techniques. The main procedures are summarized below.
• Stage 1 (Learning-based coarse estimation): In this stage, we design an efficient U-Net model for coarse channel parameter estimation, which uses a PIT strategy to address the permutation ambiguity issue, estimates path existence logits to tackle the unknown number of paths, and employs a customized regression loss to reduce codebook quantization errors. Specifically, the model takes the DFT power pattern $\mathbf{p}_{\text{DFT}}$ as input and outputs coarse estimates of the multi-path angles and ranges $\tilde{\boldsymbol{\theta}}, \tilde{\mathbf{r}}$ (i.e., $\tilde{\boldsymbol{\eta}}$), as well as the corresponding path existence logits $\tilde{\mathbf{q}} = [\tilde{q}_0, \tilde{q}_1, \dots, \tilde{q}_L]^T$, which indicate the existence probability of each path. Moreover, the statistical characteristics of the estimation errors are obtained during model validation, based on which a confined parameter search region $\mathcal{B}$ around the coarse estimate $\tilde{\boldsymbol{\eta}}$ is constructed for the subsequent optimization.
• Stage 2 (Optimization-based parameter refinement): Given the coarse estimate $\tilde{\boldsymbol{\eta}}$ and the confined search region $\mathcal{B}$, an optimization-based method is employed to obtain a high-precision estimate of the multi-path position parameters, denoted by $\bar{\boldsymbol{\eta}}$, by minimizing a defined discrepancy between the reconstructed and received power patterns based on the maximum likelihood principle. This non-convex problem is efficiently solved by a customized PSO algorithm, which effectively avoids poor local minima within the confined search region, achieving a high-quality solution with low complexity.

The details of the proposed hybrid method are presented in the next two sections.

IV. STAGE 1 FOR LEARNING-BASED COARSE ESTIMATION
This stage aims to leverage the powerful feature extraction capability of DL to estimate coarse multi-path position parameters from the received power pattern $\mathbf{p}_{\text{DFT}}$. It serves as a crucial preliminary step that significantly narrows down the search space for the subsequent optimization-based refinement. To achieve high-quality feature extraction, we employ a customized U-Net architecture as the backbone of our neural network model. Note that the U-Net architecture was originally proposed for biomedical image segmentation [19]. Its encoder-decoder structure with skip connections enables effective multi-scale feature learning, making it applicable to various tasks, such as channel estimation and signal separation [20], [21]. For the considered near-field multi-path beam training problem, the U-Net architecture is well suited for the following reasons. First, the U-Net encoder progressively extracts features at different scales, enabling the network to capture both coarse dominant information (e.g., the power peak of the LoS path) and detailed spatial patterns (e.g., the power of superposed multi-path signals).
Next, skip connections concatenate features from corresponding encoder and decoder layers, so that the features extracted by the encoder layers are directly transferred to the corresponding decoder layers. Compared with traditional CNNs, the U-Net architecture better preserves spatial information during feature extraction by fusing features from different scales, hence achieving more accurate channel parameter estimation than CNNs (to be numerically shown in Fig. 7). Moreover, unlike existing DL-based methods that rely on prior knowledge of the number of channel paths, our proposed model does not require this knowledge, since it estimates the existence probability of each potential path. In the following, we introduce the detailed model training method.

A. Data Preprocessing
To train the proposed U-Net model, we first generate a large dataset of labeled samples. Specifically, the dataset consists of $D$ samples, each generated by simulating near-field multi-path channels based on the system model in Section II. For each sample, we randomly draw the number of NLoS paths $L$ up to a maximum $L_{\max}$, randomly generate the user and scatterer positions $\{\theta_\ell, r_\ell\}_{\ell=0}^{L}$ within the near-field region, and then compute the corresponding received power pattern $\mathbf{p}_{\text{DFT}}$. Besides, to eliminate the scale disparity between the spatial angle (unitless) and the range (in meters), the angle and range parameters are standardized to zero mean and unit variance before being fed into the neural network for training.
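The label construction of this subsection (the standardization just described, its inverse at inference, and the zero padding to $L_{\max}$ with existence indicators $c_\ell$ described next) can be sketched as follows. The position sampler, its angle and range intervals, and all function names are hypothetical; the statistics are assumed to be computed over existing paths only, and at inference the flags would come from the estimated existence logits instead (thresholding them is not shown).

```python
import math
import random


def sample_positions(rng, L_max, theta_range=(-1.0, 1.0), r_range=(3.0, 20.0)):
    """Hypothetical sampler: one LoS path plus L ~ U{0, ..., L_max} NLoS paths."""
    L = rng.randint(0, L_max)
    return [(rng.uniform(*theta_range), rng.uniform(*r_range)) for _ in range(L + 1)]


def fit_standardizer(dataset):
    """Mean and standard deviation of angles and of ranges over all existing paths."""
    def mean_std(values):
        mu = sum(values) / len(values)
        return mu, math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    thetas = [t for sample in dataset for t, _ in sample]
    ranges = [r for sample in dataset for _, r in sample]
    return mean_std(thetas), mean_std(ranges)


def encode(sample, stats, L_max):
    """Standardize (theta_l, r_l), then zero-pad to L_max + 1 rows with flags c_l."""
    (mu_t, sd_t), (mu_r, sd_r) = stats
    rows = [((t - mu_t) / sd_t, (r - mu_r) / sd_r, 1) for t, r in sample]
    rows += [(0.0, 0.0, 0)] * (L_max + 1 - len(rows))  # padded, non-existing paths
    return rows


def decode(rows, stats):
    """Map rows flagged as existing back to (spatial angle, range in meters)."""
    (mu_t, sd_t), (mu_r, sd_r) = stats
    return [(t * sd_t + mu_t, r * sd_r + mu_r) for t, r, c in rows if c == 1]
```

Usage: fit the statistics once on the training set, apply `encode` to every sample to obtain fixed-size labels $\{\theta_\ell, r_\ell, c_\ell\}_{\ell=0}^{L_{\max}}$, and apply `decode` to network outputs before handing them to the optimization stage.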
Such standardization ensures balanced contributions from the angle and range parameters to the loss function, preventing bias caused by magnitude differences. During the online inference phase, the estimated standardized angle and range are transformed back to spatial angle and physical range values for the subsequent optimization stage. Moreover, to handle the varying number of paths across different samples, we adopt a padding strategy where the angle and range parameters are padded with zeros up to the maximum number of paths $L_{\max}$. Consequently, the ground-truth labels are constructed as $\{\theta_\ell, r_\ell, c_\ell\}_{\ell=0}^{L_{\max}}$, where $c_\ell$ denotes the path existence indicator, with $c_\ell = 1$ indicating that path $\ell$ exists and $c_\ell = 0$ otherwise.

[Fig. 4: 1D U-Net taking the received power pattern $\mathbf{p}_{\text{DFT}}$ as input; DoubleConv (Conv-BN-ReLU, twice) encoder blocks with max pooling halve the feature length from $N$ at each level, followed by a bottleneck and UpConv decoder blocks with skip connections; regressors output $(\tilde{\theta}_\ell, \tilde{r}_\ell, \tilde{q}_\ell)$ for each path.]
Fig. 4. U-Net architecture and training methodology for coarse estimation.

B. Model Architecture
We adopt a 1D U-Net-based model as the core backbone to learn the mapping from the received power pattern $\mathbf{p}_{\text{DFT}}$ (input) to the key channel parameters (output), including the polar coordinates of the user and scatterers $\tilde{\boldsymbol{\theta}}, \tilde{\mathbf{r}}$, as well as the path existence logits $\tilde{\mathbf{q}} = [\tilde{q}_0, \dots, \tilde{q}_L]^T$. The overall architecture of the proposed U-Net model is illustrated in Fig. 4. Specifically, the model comprises an encoder for feature extraction, a bottleneck for latent representation, and a decoder for parameter reconstruction.
The encoder stacks DoubleConv modules that progressively downsample the input received power pattern along the spatial dimension $N$ into higher-level semantic features, condensing the multi-path information into a global latent representation within the bottleneck. The symmetric decoder employs transposed convolutions for upsampling and leverages skip connections to fuse multi-scale features from the encoder, thereby preserving high-resolution spatial information. Finally, a fully connected layer maps the restored features to the polar coordinates of the paths and their existence logits. The fundamental DoubleConv block consists of two sequential operations, each comprising a 1D convolution (Conv), batch normalization (BN), and a rectified linear unit (ReLU) activation. Note that the U-Net model is employed solely to estimate the angle and range parameters, while the complex-valued channel gains are reconstructed from the estimated angle-range parameters (see Section V).

C. U-Net Training Method

For the proposed U-Net training method, we first design a loss function that addresses three key issues in near-field multi-path beam training, namely, the unknown number of paths, permutation ambiguity, and limited accuracy. To this end, the loss function is composed of three terms: a regression loss for the position of the LoS path, a permutation-invariant regression loss for the positions of the NLoS paths, and a binary cross-entropy loss for path existence classification. These components are detailed as follows.

• Regression loss for the LoS path: In practical scenarios, the LoS path is easy to identify, as it has the strongest received power among all paths. The regression loss for the LoS path is the Euclidean distance between the estimated and ground-truth user positions, given by
$$\mathcal{L}_{\mathrm{LoS}} = \sqrt{(\tilde{\theta}_0 - \theta_0)^2 + (\tilde{r}_0 - r_0)^2}. \tag{9}$$
• Permutation-invariant regression loss for NLoS paths: Note that the received power pattern $\mathbf{p}_{\mathrm{DFT}}$ contains a LoS path and multiple NLoS paths, where the order of the NLoS scatterers is generally unknown. This implies that even if the estimated angle and range parameters are accurate, the correspondence between the estimated and ground-truth scatterer positions is not unique, which results in a permutation ambiguity when training the neural network. To tackle this issue, we propose a PIT-based method to design a permutation-invariant regression loss for the NLoS scatterers [17], which considers all possible label permutations and selects the one with the minimum matching cost. To this end, we construct a cost matrix $\mathbf{C} \in \mathbb{R}^{L_{\max} \times L_{\max}}$, whose elements are the squared errors between the estimated positions $(\tilde{\theta}_\ell, \tilde{r}_\ell)$ and the ground-truth positions $(\theta_m, r_m)$, given by
$$C_{\ell,m} = (\tilde{\theta}_\ell - \theta_m)^2 + (\tilde{r}_\ell - r_m)^2, \quad \forall \ell, m \in \mathcal{L}_{\max}, \tag{10}$$
where $\mathcal{L}_{\max} \triangleq \{1, 2, \ldots, L_{\max}\}$. Let $x_{\ell,m} \in \{0, 1\}$ indicate whether estimated scatterer $\ell$ is matched with ground-truth scatterer $m$. Then, the problem of finding the permutation with the minimum matching cost can be formulated as the following assignment problem
$$\text{(P3)}:\ \min_{\{x_{\ell,m}\}}\ \sum_{\ell=1}^{L_{\max}} \sum_{m=1}^{L_{\max}} x_{\ell,m} C_{\ell,m}$$
$$\text{s.t.}\ \sum_{\ell=1}^{L_{\max}} x_{\ell,m} = 1,\ \forall m \in \mathcal{L}_{\max}, \tag{11a}$$
$$\sum_{m=1}^{L_{\max}} x_{\ell,m} = 1,\ \forall \ell \in \mathcal{L}_{\max}, \tag{11b}$$
$$x_{\ell,m} \in \{0, 1\},\ \forall \ell, m \in \mathcal{L}_{\max}, \tag{11c}$$
where constraints (11a) and (11b) ensure that each estimated scatterer is matched with exactly one ground-truth scatterer and vice versa. Problem (P3) is a classic linear sum assignment problem in combinatorial optimization, which can be solved in polynomial time by the Hungarian algorithm without explicitly evaluating all permutations [17].
For the sake of brevity, the details of the Hungarian algorithm are omitted. We denote the optimal solution to Problem (P3) by $\{x^{*}_{\ell,m}\}$. The optimal permutation function can then be constructed from the optimal assignment as
$$\pi(\ell) = m,\ \text{if } x^{*}_{\ell,m} = 1. \tag{12}$$
Based on the optimal permutation $\pi$ in (12), the permutation-invariant regression loss for the positions of the NLoS paths is defined as the sum of the Euclidean distances between the estimated and optimally matched ground-truth path positions, given by
$$\mathcal{L}_{\mathrm{Reg}} = \sum_{\ell=1}^{L_{\max}} \sqrt{(\tilde{\theta}_\ell - \theta_{\pi(\ell)})^2 + (\tilde{r}_\ell - r_{\pi(\ell)})^2}. \tag{13}$$
• Binary cross-entropy loss for path existence classification: To estimate the number of dominant NLoS paths, we introduce a binary cross-entropy loss for each candidate NLoS path.³ This loss measures the discrepancy between the estimated path existence logits $\tilde{\mathbf{q}}$ and the ground-truth path existence indicators $\mathbf{c} = [c_0, c_1, \ldots, c_{L_{\max}}]^T$, given by
$$\mathcal{L}_{\mathrm{Cls}} = -\sum_{\ell=1}^{L_{\max}} \Big[ c_{\pi(\ell)} \log\big(\sigma(\tilde{q}_\ell)\big) + \big(1 - c_{\pi(\ell)}\big) \log\big(1 - \sigma(\tilde{q}_\ell)\big) \Big], \tag{14}$$
where $\sigma(x) = 1/(1 + e^{-x})$ denotes the sigmoid activation function, which maps the logit $\tilde{q}_\ell$ output by the neural network to a probability between 0 and 1. Minimizing this loss encourages the U-Net model to output high logits for present paths and low logits for absent ones.

Based on the above, the model is trained to minimize the total loss, defined as the weighted sum of the LoS regression loss, the permutation-invariant NLoS regression loss, and the classification loss:
$$\mathcal{L}_{\mathrm{total}} = \alpha_1 \mathcal{L}_{\mathrm{LoS}} + \alpha_2 \mathcal{L}_{\mathrm{Reg}} + \alpha_3 \mathcal{L}_{\mathrm{Cls}}, \tag{15}$$
where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are weighting coefficients that balance the contributions of the loss components. The effectiveness of this loss function will be shown in Section VI.

³ Due to the unknown number of dominant NLoS paths, the position regression losses in (9) and (13) alone are insufficient to identify valid channel components. The path existence loss is therefore essential for explicitly estimating the presence probability of each path, which enables accurate determination of the number of dominant multi-path components.

D. Online Inference Procedure

After training, the U-Net model can be employed to infer the coarse multi-path position parameters from the received power pattern $\mathbf{p}_{\mathrm{DFT}}$. Specifically, given the input $\mathbf{p}_{\mathrm{DFT}}$, the model outputs coarse estimates of the angles and ranges, $\tilde{\boldsymbol{\theta}}$ and $\tilde{\mathbf{r}}$, as well as the path existence logits $\tilde{\mathbf{q}}$. To determine the number of dominant NLoS paths $\tilde{L}$, we apply a thresholding operation to the estimated path existence logits as follows
$$\tilde{L} = \sum_{\ell=1}^{L_{\max}} \mathbb{I}\big(\sigma(\tilde{q}_\ell) \ge \lambda_{\mathrm{th}}\big), \tag{16}$$
where $\mathbb{I}(\cdot)$ is the indicator function that returns 1 if the condition is true and 0 otherwise, and $\lambda_{\mathrm{th}}$ is a predefined threshold. Let $k_1, k_2, \ldots, k_{\tilde{L}}$ denote the indices of the $\tilde{L}$ detected paths, i.e., $\sigma(\tilde{q}_{k_i}) \ge \lambda_{\mathrm{th}}$, $i \in \tilde{\mathcal{L}} \triangleq \{1, \ldots, \tilde{L}\}$. Thus, the coarse angle and range estimates of the detected paths are given by $\tilde{\boldsymbol{\theta}} = [\tilde{\theta}_{k_1}, \ldots, \tilde{\theta}_{k_{\tilde{L}}}]^T$ and $\tilde{\mathbf{r}} = [\tilde{r}_{k_1}, \ldots, \tilde{r}_{k_{\tilde{L}}}]^T$, respectively.

E. 3σ-criterion Boundary

Besides the per-path coarse position estimate $\tilde{\boldsymbol{\eta}} = [\tilde{\boldsymbol{\theta}}^T, \tilde{\mathbf{r}}^T]^T$, we can also obtain statistical characteristics of the estimation error from the trained model. Specifically, the mean and standard deviation of the estimation error for each path can be obtained during U-Net model validation [22], which are given by
$$\mu_{r_\ell} = \mathbb{E}(\tilde{r}_\ell - r_\ell),\quad \sigma_{r_\ell} = \sqrt{\mathbb{V}(\tilde{r}_\ell - r_\ell)},\quad \mu_{\theta_\ell} = \mathbb{E}(\tilde{\theta}_\ell - \theta_\ell),\quad \sigma_{\theta_\ell} = \sqrt{\mathbb{V}(\tilde{\theta}_\ell - \theta_\ell)},\quad \ell \in \tilde{\mathcal{L}}, \tag{17}$$
where $\mathbb{E}(\cdot)$ and $\mathbb{V}(\cdot)$ denote the expectation and variance of the estimation errors of $\tilde{r}_\ell$ and $\tilde{\theta}_\ell$, respectively.
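As a hedged illustration of the training objective (9)-(15) and the path-count rule (16), the sketch below evaluates the loss for one sample with the optimal NLoS permutation. It is not the authors' code: it replaces the Hungarian algorithm with exhaustive permutation search (both return a minimum-cost assignment of (P3), but the search is only practical for small $L_{\max}$), treats the logits as belonging to the $L_{\max}$ NLoS slots, and assumes $\lambda_{\mathrm{th}} = 0.5$ and unit weights $\alpha_i$ (as in Table I). All function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Pos = Tuple[float, float]  # (theta, r)


def cost_matrix(est: Sequence[Pos], gt: Sequence[Pos]) -> List[List[float]]:
    """C[l][m] of (10): squared error between estimate l and ground truth m."""
    return [[(te - tg) ** 2 + (re - rg) ** 2 for tg, rg in gt] for te, re in est]


def best_permutation(C: List[List[float]]) -> Tuple[int, ...]:
    """Solve (P3) by exhaustive search (fine for small L_max). A Hungarian
    solver such as scipy.optimize.linear_sum_assignment scales polynomially."""
    n = len(C)
    return min(itertools.permutations(range(n)),
               key=lambda p: sum(C[l][p[l]] for l in range(n)))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigma(x)); note log(1 - sigma(x)) = log_sigmoid(-x)."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def total_loss(est_los: Pos, gt_los: Pos, est_nlos: Sequence[Pos], gt_nlos: Sequence[Pos],
               logits: Sequence[float], c_nlos: Sequence[int],
               alphas=(1.0, 1.0, 1.0)) -> Tuple[float, Tuple[int, ...]]:
    """Weighted loss (15) for one sample; NLoS lists are zero-padded to L_max."""
    l_los = math.dist(est_los, gt_los)                               # (9)
    C = cost_matrix(est_nlos, gt_nlos)
    pi = best_permutation(C)                                         # (12)
    l_reg = sum(math.sqrt(C[l][pi[l]]) for l in range(len(C)))       # (13)
    l_cls = -sum(c_nlos[pi[l]] * log_sigmoid(q) + (1 - c_nlos[pi[l]]) * log_sigmoid(-q)
                 for l, q in enumerate(logits))                      # (14)
    a1, a2, a3 = alphas
    return a1 * l_los + a2 * l_reg + a3 * l_cls, pi


def count_paths(logits: Sequence[float], threshold: float = 0.5) -> int:
    """Rule (16): number of NLoS slots with sigma(q) >= lambda_th."""
    return sum(1 for q in logits if math.exp(log_sigmoid(q)) >= threshold)


est = [(0.21, 20.5), (-0.1, 12.2), (0.0, 0.0)]   # predicted NLoS slots (last one padded)
gt = [(-0.1, 12.0), (0.2, 20.0), (0.0, 0.0)]     # ground truth in a different order
loss, pi = total_loss((0.0, 10.0), (0.0, 10.0), est, gt, [3.0, 3.0, -3.0], [1, 1, 0])
print(pi)                                # (1, 0, 2)
print(count_paths([3.0, 3.0, -3.0]))     # 2
```

As in (10)-(13), the assignment is chosen on squared errors while the regression loss sums unsquared distances; in a training framework the same logic would be applied per batch, with the permutation treated as a constant during back-propagation.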
Let $r^{\mathrm{lb}}_\ell$, $r^{\mathrm{ub}}_\ell$, $\theta^{\mathrm{lb}}_\ell$, and $\theta^{\mathrm{ub}}_\ell$ denote the lower and upper bounds of the search region for the range and angle of path $\ell$, respectively, which are defined as
$$\begin{aligned} r^{\mathrm{lb}}_\ell &= \max\big(\tilde{r}_\ell - \mu_{r_\ell} - 3\sigma_{r_\ell},\ Z_{\mathrm{Fres}}\big), & r^{\mathrm{ub}}_\ell &= \min\big(\tilde{r}_\ell - \mu_{r_\ell} + 3\sigma_{r_\ell},\ Z_{\mathrm{Rayl}}\big),\\ \theta^{\mathrm{lb}}_\ell &= \max\big(\tilde{\theta}_\ell - \mu_{\theta_\ell} - 3\sigma_{\theta_\ell},\ -1\big), & \theta^{\mathrm{ub}}_\ell &= \min\big(\tilde{\theta}_\ell - \mu_{\theta_\ell} + 3\sigma_{\theta_\ell},\ 1\big), \end{aligned} \quad \ell \in \tilde{\mathcal{L}}, \tag{18}$$
where $Z_{\mathrm{Fres}} = 0.5\sqrt{D^3/\lambda}$ and $Z_{\mathrm{Rayl}} = 2D^2/\lambda$ denote the Fresnel and Rayleigh distances, which serve as the lower and upper bounds of the near-field region, respectively, with $D = (N-1)d$ denoting the array aperture. The search region for the angle and range of each path then follows from the 3σ rule [22], which defines a region containing the ground-truth parameters with high probability:
$$\mathcal{B} = \big\{(\theta_\ell, r_\ell) \mid \theta^{\mathrm{lb}}_\ell \le \theta_\ell \le \theta^{\mathrm{ub}}_\ell,\ r^{\mathrm{lb}}_\ell \le r_\ell \le r^{\mathrm{ub}}_\ell,\ \forall \ell \in \tilde{\mathcal{L}}\big\}. \tag{19}$$
We shall show in Section VI that the 3σ-criterion effectively confines the search region around the coarse estimate $\tilde{\boldsymbol{\eta}}$, while ensuring that the ground-truth parameters lie within this region with high probability.

V. STAGE 2 FOR PSO-BASED PARAMETER REFINEMENT

For Stage 2, based on the coarse parameter estimate $\tilde{\boldsymbol{\eta}}$ from Stage 1, we propose a customized PSO method to refine the multi-path channel parameters for improved estimation accuracy. Different from conventional PSO methods that randomly initialize particles over the full search space, our method initializes particles around the coarse estimate $\tilde{\boldsymbol{\eta}}$ within the confined search region $\mathcal{B}$ obtained in Stage 1 (i.e., (19)), hence significantly reducing the search space and accelerating convergence.

A. Problem Formulation

To refine the multi-path channel parameters, we formulate an optimization problem that minimizes the discrepancy between the actual received power pattern $\mathbf{p}_{\mathrm{DFT}}$ and the power pattern reconstructed from the near-field multi-path channel model parameterized by $\boldsymbol{\eta}$ and $\mathbf{g}$. Specifically, let $\mathbf{B}(\boldsymbol{\eta}) \in \mathbb{C}^{N \times (\tilde{L}+1)}$ denote the reconstructed steering matrix based on the near-field multi-path channel model, which is given by
$$\mathbf{B}(\boldsymbol{\eta}) \triangleq \big[\mathbf{b}(\theta_0, r_0), \mathbf{b}(\theta_1, r_1), \ldots, \mathbf{b}(\theta_{\tilde{L}}, r_{\tilde{L}})\big]. \tag{20}$$
Thus, the near-field multi-path channel in (1) can be re-expressed as a function of the position parameters $\boldsymbol{\eta}$ and the channel gains $\mathbf{g}$, i.e., $\mathbf{h}_{\mathrm{near}} = \mathbf{B}(\boldsymbol{\eta})\mathbf{g}$. Let $\boldsymbol{\Phi}(\boldsymbol{\eta}) \triangleq \mathbf{V}_{\mathrm{DFT}}^H \mathbf{B}(\boldsymbol{\eta})$ denote the effective steering matrix. Then, given the confined search region $\mathcal{B}$ around the coarse estimate $\tilde{\boldsymbol{\eta}}$ from Stage 1, the Stage-2 problem can be formulated as
$$\text{(P4)}:\ \min_{\boldsymbol{\eta}, \mathbf{g}}\ \big\| \mathbf{p}_{\mathrm{DFT}} - |\boldsymbol{\Phi}(\boldsymbol{\eta})\mathbf{g}|^2 \big\|_2^2 \quad \text{s.t.}\ \boldsymbol{\eta} \in \mathcal{B}. \tag{21}$$
Problem (P4) is non-convex and generally challenging to solve, due to 1) the modulus operation in the objective function and 2) the non-linear coupling between $\boldsymbol{\eta}$ and $\mathbf{g}$ in the objective function. Moreover, the search dimension of Problem (P4) is $4(\tilde{L}+1)$, which makes it computationally prohibitive for conventional optimization methods when $\tilde{L}$ is large. To tackle these difficulties, we propose an inner-outer iterative optimization method to solve Problem (P4) efficiently, where the inner layer estimates the channel gains $\mathbf{g}$ given the position parameters $\boldsymbol{\eta}$, while the outer layer optimizes $\boldsymbol{\eta}$ using a customized PSO method.

Inner problem: Given any fixed position parameters $\boldsymbol{\eta}$, the inner-layer channel-gain optimization problem is given by
$$\text{(P5)}:\ \min_{\mathbf{g}}\ \big\| \mathbf{p}_{\mathrm{DFT}} - |\boldsymbol{\Phi}(\boldsymbol{\eta})\mathbf{g}|^2 \big\|_2^2. \tag{22}$$
We denote by $\bar{\mathbf{g}}$ the optimized solution to Problem (P5) and define $G(\boldsymbol{\eta}) \triangleq \| \mathbf{p}_{\mathrm{DFT}} - |\boldsymbol{\Phi}(\boldsymbol{\eta})\bar{\mathbf{g}}|^2 \|_2^2$, which is the optimized objective value of Problem (P5) for any feasible $\boldsymbol{\eta}$.

Outer problem: Based on the inner problem, the outer-layer position-parameter optimization problem can be expressed as
$$\text{(P6)}:\ \min_{\boldsymbol{\eta}}\ G(\boldsymbol{\eta}) \quad \text{s.t.}\ \boldsymbol{\eta} \in \mathcal{B}.$$

B. Inner-layer for Channel Gain Estimation

Note that given any fixed position parameters $\boldsymbol{\eta}$, Problem (P5) is still non-convex and difficult to solve, since the objective function depends only on the received power, while the phase information is missing. To solve this phase retrieval problem, we propose a GS-based channel gain estimation method [23], an alternating projection method that iteratively projects the channel gain onto two domains until convergence, namely, a signal power domain defined by $\mathbf{p}_{\mathrm{DFT}}$ and a steering matrix domain defined by $\boldsymbol{\Phi}(\boldsymbol{\eta})$. Specifically, starting from a properly designed initialization, the GS algorithm iteratively refines the estimate of $\mathbf{g}$ by alternating between the following two steps.

• Projection onto signal power domain: In the $i$-th iteration, given the current estimate of $\mathbf{g}$, denoted by $\mathbf{g}^{(i)}$, we first obtain the noiseless estimated signal $\tilde{\mathbf{y}}^{(i)} = \boldsymbol{\Phi}(\boldsymbol{\eta})\mathbf{g}^{(i)}$. Then, we project $\tilde{\mathbf{y}}^{(i)}$ onto the signal power domain to enforce the power constraint, i.e., we replace the power of $\tilde{\mathbf{y}}^{(i)}$ with the received power $\mathbf{p}_{\mathrm{DFT}}$ while retaining the phase of $\tilde{\mathbf{y}}^{(i)}$. This yields the refined estimated signal
$$\hat{\mathbf{y}}^{(i)} = \sqrt{\mathbf{p}_{\mathrm{DFT}}} \odot \frac{\tilde{\mathbf{y}}^{(i)}}{|\tilde{\mathbf{y}}^{(i)}|}, \tag{23}$$
where $\odot$ denotes element-wise multiplication, and the square root, modulus, and division are also taken element-wise.
• Projection onto steering matrix domain: Next, we project the estimated signal $\hat{\mathbf{y}}^{(i)}$ back onto the steering matrix domain to update the channel gain $\mathbf{g}^{(i+1)}$.
This is achieved by solving the least-squares problem
$$\mathbf{g}^{(i+1)} = \arg\min_{\mathbf{g}}\ \big\| \hat{\mathbf{y}}^{(i)} - \boldsymbol{\Phi}(\boldsymbol{\eta})\mathbf{g} \big\|_2^2, \tag{24}$$
which admits the closed-form solution
$$\mathbf{g}^{(i+1)} = \boldsymbol{\Phi}^{\dagger}(\boldsymbol{\eta})\hat{\mathbf{y}}^{(i)}, \tag{25}$$
where $\boldsymbol{\Phi}^{\dagger}(\boldsymbol{\eta}) = \big(\boldsymbol{\Phi}^H(\boldsymbol{\eta})\boldsymbol{\Phi}(\boldsymbol{\eta})\big)^{-1}\boldsymbol{\Phi}^H(\boldsymbol{\eta})$ denotes the pseudo-inverse of $\boldsymbol{\Phi}(\boldsymbol{\eta})$.

Remark 1 (Initialization of GS algorithm). Due to the non-convexity of Problem (P5), the initialization of $\mathbf{g}$ can significantly affect the convergence and estimation accuracy of the GS algorithm. To address this issue, we design a dedicated spectral initialization method [24] that provides a high-quality initial estimate of $\mathbf{g}$ by exploiting the statistical structure of the received power pattern. Specifically, let $\boldsymbol{\phi}_n^H$ denote the $n$-th row of the effective steering matrix, such that $\boldsymbol{\Phi}(\boldsymbol{\eta}) = [\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_N]^H$ and the $n$-th entry of $\boldsymbol{\Phi}(\boldsymbol{\eta})\mathbf{g}$ is $\boldsymbol{\phi}_n^H\mathbf{g}$. We first construct a positive semi-definite correlation matrix $\mathbf{M}$ as a weighted sum of outer products of the effective steering vectors, where the weights are given by the received power pattern $\mathbf{p}_{\mathrm{DFT}} = [p_1, \ldots, p_N]^T$, i.e., $\mathbf{M} = \frac{1}{N}\sum_{n=1}^{N} p_n \boldsymbol{\phi}_n \boldsymbol{\phi}_n^H$. According to the principle of spectral initialization, the principal eigenvector of $\mathbf{M}$ is aligned with the direction of the true channel gain vector. To obtain this direction, we perform the eigenvalue decomposition (EVD) $\mathbf{M} = \mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^H$, where $\mathbf{E} = [\mathbf{e}_0, \mathbf{e}_1, \ldots, \mathbf{e}_{\tilde{L}}]$ contains the eigenvectors and $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_0, \lambda_1, \ldots, \lambda_{\tilde{L}})$ contains the eigenvalues sorted in descending order, i.e., $\lambda_0 \ge \lambda_1 \ge \ldots \ge \lambda_{\tilde{L}}$. The eigenvector $\mathbf{e}_0$ associated with the largest eigenvalue $\lambda_0$ captures the spatial structure (relative phases and amplitudes) of the channel gain vector. However, since $\mathbf{e}_0$ is unit-norm, i.e., $\|\mathbf{e}_0\|_2 = 1$, it lacks absolute scale information.
To recover the absolute strength, we apply a power scaling that matches the total power of the reconstructed pattern to that of the received power pattern, given by
$$\beta = \sqrt{\frac{\sum_{n=1}^{N} p_n}{\sum_{n=1}^{N} |\boldsymbol{\phi}_n^H \mathbf{e}_0|^2}}.$$
Finally, the GS algorithm is initialized as $\mathbf{g}^{(0)} = \beta\mathbf{e}_0$. Given this initialization, which ensures linear convergence with high probability [24], the GS algorithm refines the estimate of $\mathbf{g}$ via the alternating projections in (23) and (25) until convergence, yielding the estimated channel gains $\bar{\mathbf{g}}$.

C. Outer-layer for Position Parameters Estimation

The outer Problem (P6) is generally challenging to solve due to the non-convexity of the effective steering matrix $\boldsymbol{\Phi}(\boldsymbol{\eta})$ w.r.t. $\boldsymbol{\eta}$. Traditional gradient-based methods may easily get stuck in local optima within such a complex search space. To address this issue, we employ a gradient-free heuristic PSO algorithm to obtain a high-quality solution to Problem (P6), as PSO has been shown to achieve strong global search capability and robustness in tackling non-convex estimation problems [25]. Specifically, for the proposed PSO-based algorithm, we first initialize $M_{\mathrm{PSO}}$ particles. Let $\mathcal{P}^{(0)}_{\mathrm{PSO}} \triangleq \{\boldsymbol{\eta}^{(0)}_1, \ldots, \boldsymbol{\eta}^{(0)}_{M_{\mathrm{PSO}}}\}$ denote the candidate parameter set of the multi-path channel, where $\boldsymbol{\eta}^{(0)}_1$ is set to the coarse estimate from Stage 1, i.e., $\boldsymbol{\eta}^{(0)}_1 = \tilde{\boldsymbol{\eta}}$, while the remaining particles $\{\boldsymbol{\eta}^{(0)}_m\}_{m=2}^{M_{\mathrm{PSO}}}$ are randomly initialized within the search region $\mathcal{B}$ to ensure diversity in the initial population. Moreover, the associated velocities $\mathcal{U}^{(0)}_{\mathrm{PSO}} \triangleq \{\mathbf{u}^{(0)}_1, \ldots, \mathbf{u}^{(0)}_{M_{\mathrm{PSO}}}\}$, i.e., the update vectors of the particles, are initialized to zero, i.e., $\mathbf{u}^{(0)}_m = \mathbf{0}$, $\forall m \in \{1, \ldots, M_{\mathrm{PSO}}\}$.
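The inner layer of Subsection V-B, whose output $\bar{\mathbf{g}}$ the fitness function in the outer layer relies on, can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: $\boldsymbol{\Phi}(\boldsymbol{\eta})$ is assumed to be available as an $N \times (\tilde{L}+1)$ complex array with full column rank whose $n$-th row is $\boldsymbol{\phi}_n^H$ (so the $n$-th entry of $\boldsymbol{\Phi}\mathbf{g}$ is $\boldsymbol{\phi}_n^H\mathbf{g}$), and the stopping rule (`tol`, `max_iter`) is a hypothetical choice, since the text only says "until convergence".

```python
import numpy as np


def spectral_init(Phi: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Remark 1: beta * e0, with e0 the principal eigenvector of
    M = (1/N) sum_n p_n phi_n phi_n^H, where row n of Phi is phi_n^H."""
    N = Phi.shape[0]
    M = Phi.conj().T @ (p[:, None] * Phi) / N     # Hermitian PSD, (L+1) x (L+1)
    _, eigvecs = np.linalg.eigh(M)                # eigenvalues in ascending order
    e0 = eigvecs[:, -1]                           # unit-norm principal eigenvector
    beta = np.sqrt(p.sum() / np.sum(np.abs(Phi @ e0) ** 2))  # total-power matching
    return beta * e0


def amplitude_error(Phi: np.ndarray, p: np.ndarray, g: np.ndarray) -> float:
    """|| |Phi g| - sqrt(p) ||_2, the misfit that GS iterations never increase."""
    return float(np.linalg.norm(np.abs(Phi @ g) - np.sqrt(p)))


def gs_gain_estimate(Phi: np.ndarray, p: np.ndarray,
                     max_iter: int = 500, tol: float = 1e-10) -> np.ndarray:
    """Gerchberg-Saxton style alternating projections (23)-(25)."""
    g = spectral_init(Phi, p)
    Phi_pinv = np.linalg.pinv(Phi)                # Phi^dagger in (25)
    sqrt_p = np.sqrt(p)
    for _ in range(max_iter):
        y_tilde = Phi @ g                                 # noiseless signal for current g
        y_hat = sqrt_p * np.exp(1j * np.angle(y_tilde))   # (23): keep phase, impose power
        g_next = Phi_pinv @ y_hat                         # (24)-(25): least-squares step
        converged = np.linalg.norm(g_next - g) <= tol * np.linalg.norm(g)
        g = g_next
        if converged:
            break
    return g


# Toy check with a random complex Phi (not a DFT-projected near-field steering matrix).
rng = np.random.default_rng(0)
Phi = (rng.standard_normal((64, 3)) + 1j * rng.standard_normal((64, 3))) / np.sqrt(2)
g_true = np.array([1.0, 0.4 - 0.3j, 0.2j])
p = np.abs(Phi @ g_true) ** 2
g0 = spectral_init(Phi, p)
g_hat = gs_gain_estimate(Phi, p)
```

Because only powers are observed, $\mathbf{g}$ can be recovered at most up to a common phase rotation, so the sketch is better checked through the reconstructed pattern $|\boldsymbol{\Phi}\mathbf{g}|^2$ than by comparing $\mathbf{g}$ directly. Each iteration cannot increase `amplitude_error`, since both steps are nearest-point projections (onto the magnitude set and onto the range of $\boldsymbol{\Phi}$).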
Then, we define a fitness function $f_{\mathrm{fit}}(\cdot)$ that guides the optimization process by evaluating how well the power pattern constructed from a candidate parameter vector $\boldsymbol{\eta}$ matches the received power pattern. Mathematically, $f_{\mathrm{fit}}(\cdot)$ is defined as
$$f_{\mathrm{fit}}(\boldsymbol{\eta}) = \big\| \mathbf{p}_{\mathrm{DFT}} - |\boldsymbol{\Phi}(\boldsymbol{\eta})\bar{\mathbf{g}}|^2 \big\|_2^2 + \zeta J(\boldsymbol{\eta}), \tag{26}$$
where $\zeta > 0$ is the penalty weight and $J(\boldsymbol{\eta})$ is a boundary penalty that penalizes candidate parameters violating the bounds of the search region $\mathcal{B}$, given by
$$J(\boldsymbol{\eta}) = \sum_{\ell=0}^{\tilde{L}} \Big[ f_p\big(r_\ell, r^{\mathrm{ub}}_\ell, r^{\mathrm{lb}}_\ell\big) + f_p\big(\theta_\ell, \theta^{\mathrm{ub}}_\ell, \theta^{\mathrm{lb}}_\ell\big) \Big]. \tag{27}$$
Herein, $f_p(x, x^{\mathrm{ub}}, x^{\mathrm{lb}})$ denotes the normalized quadratic penalty function, defined as
$$f_p(x, x^{\mathrm{ub}}, x^{\mathrm{lb}}) = \frac{(x - x^{\mathrm{ub}})_+^2 + (x^{\mathrm{lb}} - x)_+^2}{(x^{\mathrm{ub}} - x^{\mathrm{lb}})^2}, \tag{28}$$
where $(y)_+ \triangleq \max\{0, y\}$ denotes the positive part, so that a penalty is incurred only when a bound is violated. Then, the velocity and candidate parameters of each particle are updated based on both its individual best position and the global best position. Specifically, the individual best position $\boldsymbol{\eta}^{(t)}_{m,\mathrm{pbest}}$ and the global best position $\boldsymbol{\eta}^{(t)}_{\mathrm{gbest}}$ at iteration $t$ are determined as
$$\begin{aligned} \boldsymbol{\eta}^{(t)}_{m,\mathrm{pbest}} &= \arg\min_{\tau \in \{0, \ldots, t\}} f_{\mathrm{fit}}\big(\boldsymbol{\eta}^{(\tau)}_m\big),\ \forall m \in \{1, \ldots, M_{\mathrm{PSO}}\},\\ \boldsymbol{\eta}^{(t)}_{\mathrm{gbest}} &= \arg\min_{m \in \{1, \ldots, M_{\mathrm{PSO}}\}} f_{\mathrm{fit}}\big(\boldsymbol{\eta}^{(t)}_{m,\mathrm{pbest}}\big). \end{aligned} \tag{29}$$
Thus, the velocity and position of the $m$-th particle at iteration $t$ are updated as follows
$$\begin{aligned} \mathbf{u}^{(t+1)}_m &= \rho\,\mathbf{u}^{(t)}_m + \varrho_1 \tau_1 \big(\boldsymbol{\eta}^{(t)}_{m,\mathrm{pbest}} - \boldsymbol{\eta}^{(t)}_m\big) + \varrho_2 \tau_2 \big(\boldsymbol{\eta}^{(t)}_{\mathrm{gbest}} - \boldsymbol{\eta}^{(t)}_m\big),\\ \boldsymbol{\eta}^{(t+1)}_m &= \boldsymbol{\eta}^{(t)}_m + \mathbf{u}^{(t+1)}_m, \end{aligned} \tag{30}$$
where $\rho$ denotes the inertia weight, while $\varrho_1$ and $\varrho_2$ represent the learning factors for the cognitive and social components, respectively. The terms $\tau_1, \tau_2 \sim \mathcal{U}[0, 1]$ are random coefficients that introduce stochastic exploration.
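The outer layer can be sketched compactly in NumPy: the 3σ box of (18), the penalty (27)-(28), the initialization described above, and the updates (29)-(30), with defaults taken from Table I ($M_{\mathrm{PSO}} = 50$, $T_{\mathrm{PSO}} = 100$, $\rho = 0.7$, $\varrho_1 = \varrho_2 = 1.5$, $\zeta = 100$). This is a hedged sketch, not the authors' code: `objective` is a hypothetical stand-in for the data-fit term of (26), which in the actual method would run the GS inner layer for every evaluation. It also draws one scalar $\tau_1, \tau_2$ per particle per iteration, following the notation of (30), although per-dimension draws are also common. The range limits 7.2 m and 325.1 m in the toy usage approximate $Z_{\mathrm{Fres}}$ and $Z_{\mathrm{Rayl}}$ under an assumed half-wavelength spacing.

```python
import numpy as np


def search_box(coarse, mu, sigma, lo_limit, hi_limit, k=3.0):
    """Bounds of (18): bias-corrected coarse estimate +/- k*sigma, clipped to the
    physical limits (e.g. [-1, 1] for angles, [Z_Fres, Z_Rayl] for ranges)."""
    center = coarse - mu
    return (np.maximum(center - k * sigma, lo_limit),
            np.minimum(center + k * sigma, hi_limit))


def boundary_penalty(eta, lb, ub):
    """J(eta) of (27)-(28): normalized squared bound violation, zero inside the box."""
    violation = np.maximum(eta - ub, 0.0) ** 2 + np.maximum(lb - eta, 0.0) ** 2
    return float(np.sum(violation / (ub - lb) ** 2))  # assumes ub > lb elementwise


def pso_refine(objective, coarse, lb, ub, n_particles=50, n_iters=100,
               rho=0.7, c1=1.5, c2=1.5, zeta=100.0, seed=0):
    """Outer-layer PSO following (26), (29), and (30); defaults mirror Table I."""
    rng = np.random.default_rng(seed)

    def fitness(eta):
        # (26): caller-supplied data-fit term plus the boundary penalty
        return objective(eta) + zeta * boundary_penalty(eta, lb, ub)

    pos = rng.uniform(lb, ub, size=(n_particles, coarse.size))  # random particles in B
    pos[0] = coarse                          # first particle is the Stage-1 estimate
    vel = np.zeros_like(pos)                 # zero initial velocities
    pbest = pos.copy()
    pbest_val = np.array([fitness(x) for x in pos])
    best = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[best].copy(), pbest_val[best]
    for _ in range(n_iters):
        tau1 = rng.uniform(size=(n_particles, 1))  # one scalar per particle, as in (30)
        tau2 = rng.uniform(size=(n_particles, 1))
        vel = rho * vel + c1 * tau1 * (pbest - pos) + c2 * tau2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([fitness(x) for x in pos])
        improved = vals < pbest_val          # update individual bests, (29)
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        best = int(np.argmin(pbest_val))     # update global best, (29)
        if pbest_val[best] < gbest_val:
            gbest, gbest_val = pbest[best].copy(), pbest_val[best]
    return gbest, gbest_val


# Toy usage: one path, eta = [theta, r]; the quadratic objective is a stand-in.
coarse = np.array([0.10, 14.0])
lb, ub = search_box(coarse, mu=np.zeros(2), sigma=np.array([0.02, 0.8]),
                    lo_limit=np.array([-1.0, 7.2]), hi_limit=np.array([1.0, 325.1]))
eta_bar, best_val = pso_refine(
    lambda eta: (eta[0] - 0.12) ** 2 + ((eta[1] - 15.0) / 10.0) ** 2, coarse, lb, ub)
```

Since the global-best fitness is only replaced when it strictly improves, the returned value can never exceed the fitness of the Stage-1 particle, which mirrors the monotonicity argument in Remark 2.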
Following the above procedure, after $T_{\mathrm{PSO}}$ iterations, a high-quality solution to Problem (P4) is obtained, which is denoted by $\bar{\boldsymbol{\eta}} = \boldsymbol{\eta}^{(T_{\mathrm{PSO}})}_{\mathrm{gbest}}$.

Remark 2 (Algorithm convergence and computational complexity). First, the convergence of the proposed hybrid learning-and-optimization method is determined by the parameter refinement in Stage 2. Since the objective value attained by the global best position is non-increasing over the PSO iterations and is lower-bounded by zero, the convergence of this objective value is guaranteed.

Next, consider the computational complexity of the proposed method. For U-Net inference in Stage 1, the computational complexity is proportional to the size of the network (i.e., the number of layers, the feature sizes, and the spatial dimensions of each layer). Thus, the computational complexity of Stage 1 is on the order of $C_{\mathrm{DL}} = \mathcal{O}\big(\sum_{i=1}^{5} D_i D_{i-1} \frac{N}{2^{i-1}} K\big)$, where $K$ is the kernel size of the convolutional layers in the DoubleConv block. For typical U-Net-based architectures, the inference time associated with $C_{\mathrm{DL}}$ is generally on the order of microseconds [15], [26], which is negligible compared with iterative optimization. For Stage 2, the computational complexity of the PSO refinement is dominated by the GS iterations of the inner layer for channel gain estimation and by the evaluation of the fitness function (26) in the outer layer for each particle. Specifically, the computational complexity of the projection in (23) is on the order of $\mathcal{O}(N)$, dominated by the element-wise operations on $\sqrt{\mathbf{p}_{\mathrm{DFT}}}$ and $\tilde{\mathbf{y}}^{(i)}$. Meanwhile, the computational complexity of the projection in (24) is on the order of $\mathcal{O}(N\tilde{L})$, dominated by the matrix-vector multiplication between $\boldsymbol{\Phi}^{\dagger}(\boldsymbol{\eta})$ and $\hat{\mathbf{y}}^{(i)}$. As such, the computational complexity of the GS algorithm is on the order of $C_{\mathrm{GS}} = \mathcal{O}\big(I_{\mathrm{GS}}(N + N\tilde{L})\big)$, where $I_{\mathrm{GS}}$ is the number of GS iterations required for convergence.
The outer-layer PSO optimization requires evaluating the fitness function for each particle in each iteration, which has a computational complexity of $\mathcal{O}(N\tilde{L})$ due to the matrix-vector multiplication between $\boldsymbol{\Phi}(\boldsymbol{\eta})$ and $\bar{\mathbf{g}}$ in (26). Thus, the computational complexity of the PSO refinement is on the order of $C_{\mathrm{PSO}} = \mathcal{O}\big(M_{\mathrm{PSO}} T_{\mathrm{PSO}} (I_{\mathrm{GS}}(N + N\tilde{L}) + N\tilde{L})\big)$. Based on the above, the total computational complexity of the proposed hybrid learning-and-optimization beam training algorithm can be expressed as $C_{\mathrm{Total}} = \mathcal{O}\big(M_{\mathrm{PSO}} T_{\mathrm{PSO}} N (I_{\mathrm{GS}}(1 + \tilde{L}) + \tilde{L})\big)$. Note that the coarse estimation in Stage 1 significantly narrows the feasible search region, which allows a much smaller number of particles and iterations to achieve satisfactory convergence compared with a global search.

VI. NUMERICAL RESULTS

In this section, we present numerical results to demonstrate the effectiveness of the proposed hybrid learning-and-optimization beam training method.

A. System Setup and Benchmark Schemes

Unless otherwise specified, the system parameters are set as follows. We consider an XL-array with $N = 256$ antennas operating at a frequency of 30 GHz; thus, the wavelength is $\lambda = 0.01$ m. The channel is generated based on the near-field multi-path channel model in (1), where the number of NLoS paths is drawn uniformly from $L \in \{2, 3, 4\}$. The spatial angles $\{\theta_\ell\}$ and ranges $\{r_\ell\}$ are independently and uniformly distributed, with $\theta_\ell \sim \mathcal{U}(-0.5, 0.5)$ and $r_\ell \sim \mathcal{U}(8, 38)$ m. The complex-valued channel gains $g_0$ and $g_\ell$ are modeled as
$$g_0 = \sqrt{\frac{\kappa}{\kappa+1}}\,\frac{\lambda}{4\pi r_0}\, e^{-\jmath\frac{2\pi}{\lambda} r_0}, \qquad g_\ell \sim \mathcal{CN}(0, \sigma_\ell^2),\ \ell \in \mathcal{L},$$
where $\sigma_\ell = \sqrt{\frac{1}{L(\kappa+1)}}\,\frac{\lambda}{4\pi r_0}$ and $\kappa$ denotes the Rician factor, which varies from 0 dB to 30 dB [8], [27]. The reference received signal-to-noise ratio (SNR) is defined as $\mathrm{SNR} = P_t \|\mathbf{h}_{\mathrm{near}}\|^2/\sigma^2$, where the transmit power is $P_t = 10$ dBm and the noise power is $\sigma^2 = -80$ dBm. The dataset contains 500000 samples, of which 80% are used for training, 10% for validation, and 10% for testing. All evaluations are conducted using Python 3.9 and PyTorch 2.2 on a computer equipped with an Intel Xeon Platinum 8336C CPU @ 2.30 GHz and an NVIDIA RTX 4090 graphics processing unit (GPU) with 24 GB of memory. Other parameters are summarized in Table I.

Table I: Simulation hyperparameter configuration

Stage   | Hyperparameter                                  | Value
Stage 1 | Feature dimensions, $\{D_i\}_{i=1}^{5}$          | [64, 128, 256, 512, 1024]
Stage 1 | Weighting coefficients, $\{\alpha_i\}_{i=1}^{3}$ | [1, 1, 1]
Stage 1 | Initial learning rate                           | $10^{-3}$
Stage 1 | Batch size                                      | 256
Stage 1 | Number of epochs                                | 1000
Stage 1 | Convolution kernel size                         | 3
Stage 1 | Optimizer                                       | Adam
Stage 2 | Number of particles, $M_{\mathrm{PSO}}$          | 50
Stage 2 | Number of iterations, $T_{\mathrm{PSO}}$         | 100
Stage 2 | Inertia weight, $\rho$                          | 0.7
Stage 2 | Individual learning factor, $\varrho_1$         | 1.5
Stage 2 | Global learning factor, $\varrho_2$             | 1.5
Stage 2 | Penalty weight, $\zeta$                         | 100

For performance comparison, we consider the following benchmark schemes:
• Perfect-CSI-based beamforming: The beamforming vector is designed based on perfect CSI, which serves as a performance upper bound.
• LoS-based scheme: This scheme assumes that only a strong LoS path exists and infers the channel parameters from the received power pattern; it first estimates the user angle based on a DFT codebook and then estimates the user range with a polar-domain codebook [4].
• CNN-based scheme: This scheme adopts the conventional CNN-based near-field beam training method proposed in [13], where the CNN is trained to directly map the received power pattern to the polar-domain codebook.
• Far-field beam training: This scheme performs beam training based on the DFT codebook by selecting the beams with the largest received powers, where the number of selected beams is set to the number of paths, which is assumed to be known a priori.
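To put the near-field limits in (18) and the complexity orders of Remark 2 into numbers for this setup, the sketch below plugs in $N = 256$, $\lambda = 0.01$ m, and the Table I values. It assumes half-wavelength spacing $d = \lambda/2$ (the spacing $d$ is defined in Section II, outside this excerpt), a single input channel $D_0 = 1$, $\tilde{L} = 3$, and a hypothetical $I_{\mathrm{GS}} = 20$; the counts are the expressions inside $\mathcal{O}(\cdot)$ with constants dropped, not runtimes.

```python
import math


def near_field_limits(n_antennas, wavelength, spacing):
    """Fresnel and Rayleigh distances that bound r in (18)."""
    aperture = (n_antennas - 1) * spacing              # D = (N - 1) d
    z_fresnel = 0.5 * math.sqrt(aperture ** 3 / wavelength)
    z_rayleigh = 2 * aperture ** 2 / wavelength
    return z_fresnel, z_rayleigh


def unet_ops(dims, n, kernel):
    """Count inside C_DL = O(sum_i D_i D_{i-1} (N / 2^(i-1)) K), dims = [D_0, ..., D_5]."""
    return sum(dims[i] * dims[i - 1] * (n // 2 ** (i - 1)) * kernel
               for i in range(1, len(dims)))


def hybrid_ops(m_pso, t_pso, n, i_gs, l_tilde):
    """Count inside C_Total = O(M_PSO T_PSO N (I_GS (1 + L) + L))."""
    return m_pso * t_pso * n * (i_gs * (1 + l_tilde) + l_tilde)


wavelength = 3e8 / 30e9                                          # 0.01 m at 30 GHz
z_f, z_r = near_field_limits(256, wavelength, wavelength / 2)    # d = lambda/2 (assumed)
c_dl = unet_ops([1, 64, 128, 256, 512, 1024], n=256, kernel=3)   # D_0 = 1 (assumed)
c_total = hybrid_ops(50, 100, 256, i_gs=20, l_tilde=3)           # I_GS = 20 (hypothetical)
print(f"Z_Fres = {z_f:.2f} m, Z_Rayl = {z_r:.1f} m")  # Z_Fres = 7.20 m, Z_Rayl = 325.1 m
print(f"C_DL ~ {c_dl:,}, C_Total ~ {c_total:,}")      # C_DL ~ 47,235,072, C_Total ~ 106,240,000
```

Under the half-wavelength assumption, $Z_{\mathrm{Fres}} \approx 7.2$ m and $Z_{\mathrm{Rayl}} \approx 325$ m, so the simulated ranges $r_\ell \sim \mathcal{U}(8, 38)$ m lie inside the near-field region assumed by (18).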
Besides, we also consider the following variants of the proposed method to evaluate the effectiveness of each stage:
• Coarse estimation of Stage 1: The coarse estimates $\tilde{\theta}_\ell$ and $\tilde{r}_\ell$ obtained from Stage 1 are used to reconstruct the near-field multi-path channel for beamforming design, where the channel gains are estimated via the GS method.
• PSO with full space: This scheme applies PSO-based parameter estimation over the full parameter space without prior information.
• Proposed method (1σ boundary): This scheme applies the proposed method with a search boundary defined by the 1σ-criterion, i.e., (18) with $3\sigma$ replaced by $\sigma$, which is more restrictive than the 3σ-criterion used in the proposed method.

B. Convergence Performance Analysis

Fig. 5(a) illustrates the convergence behavior of the training and testing losses of the proposed U-Net model over epochs, showing the three loss components defined in (15). All loss curves exhibit a sharp descent during the initial training phase before stabilizing, indicating that the model effectively learns features from the complex near-field multi-path power pattern. Furthermore, the testing loss curves closely follow the training loss curves with minimal divergence, demonstrating strong generalization and robustness against overfitting.

Fig. 5(b) illustrates the convergence performance of the proposed PSO-based parameter refinement algorithm by plotting the fitness function value versus the number of iterations. The proposed method starts from a significantly lower fitness value than the conventional PSO method with full space. This advantage results from the high-quality coarse estimates of Stage 1, which enable the construction of a confined search region for effective particle initialization.
Consequently, the proposed method exhibits superior convergence, reaching a stationary point after about 40 iterations. In contrast, the conventional PSO method with full space converges significantly more slowly, requiring a substantially larger number of iterations to approach a satisfactory solution. On the other hand, while the proposed method with the 1σ-criterion boundary converges fastest, it stalls at a suboptimal fitness value. This indicates that an overly restrictive search boundary may fail to cover the true parameters, thereby preventing the algorithm from locating the global optimum.

Fig. 5(c) compares the average convergence runtime of the different beam training schemes versus SNR, measured until convergence. The PSO with full space scheme incurs the longest runtime (around $10^2$ s) among all methods, owing to its exhaustive search over the entire high-dimensional parameter space without any prior information to guide the optimization process. In contrast, the coarse estimation exhibits the shortest runtime (around $10^{-2}$ s), as its complexity is limited to a single forward inference pass of the U-Net. Crucially, the proposed method maintains a low runtime (around $10^0$ s), significantly shorter than that of the conventional PSO benchmark. This efficiency stems from the learning-based coarse estimation, which effectively confines the search region and thereby allows the optimization-based parameter refinement to converge with significantly fewer iterations.

C. Parameter Estimation Comparison

Fig. 6(a) demonstrates the parameter estimation accuracy by plotting the estimated coordinates of the LoS and NLoS paths in the Cartesian coordinate system.
Herein, the LoS and NLoS paths are close in the angular domain, which induces severe signal superposition and makes it highly challenging to distinguish individual components based on the received power pattern alone. The proposed coarse estimation method successfully disentangles these paths and obtains coarse estimates of their positions, which demonstrates the capability of the proposed U-Net model to handle the strong ambiguity caused by signal superposition. Building on this, the subsequent PSO-based refinement stage of the proposed method achieves high localization precision, with the refined estimates closely aligned with the true positions.

[Fig. 5. Convergence behavior and runtime analysis of the proposed method. (a) Train loss and test loss for the three components of the loss function in (15) versus epochs. (b) Fitness function value of (26) versus iterations, with markers at iterations 12, 40, and 894. (c) Average convergence runtime (s) versus SNR (dB).]

[Fig. 6. Key parameter estimation accuracy, reconstructed power pattern comparison, and path detection accuracy versus SNR. (a) Estimation results for position parameters (x-axis and y-axis in m). (b) Comparison of reconstructed power patterns: normalized power versus spatial angle for perfect CSI against coarse estimation and against the proposed method. (c) Path detection accuracy (%) versus SNR (dB).]
Furthermore, a closer observation reveals the effectiveness of the 3σ-criterion for search space construction: the 3σ-criterion boundary successfully encompasses the true parameters, ensuring that the global optimum lies within the confined search region of the subsequent optimization stage. In contrast, the overly restrictive search region defined by the 1σ-criterion excludes the true path location, which drives the PSO algorithm to a biased estimate on the region boundary and thus prevents it from accurately recovering the channel parameters.

Fig. 6(b) compares the normalized power patterns reconstructed by different estimation schemes against the ideal power pattern obtained with perfect CSI. The power pattern reconstructed from the coarse estimation captures the overall profile of the ideal pattern, indicating that the U-Net model effectively resolves the number and locations of the dominant paths. However, the reconstructed multi-path amplitudes exhibit noticeable deviations due to residual errors in the neural network output. In contrast, the power pattern reconstructed by the proposed method closely matches that under perfect CSI, indicating accurate estimation of the multi-path components.

Fig. 6(c) evaluates the accuracy of the proposed method in estimating the number of paths versus SNR. The estimation accuracy improves significantly as the SNR increases. Specifically, in the low-SNR regime (e.g., -10 dB), detecting multiple NLoS paths is challenging due to strong noise, resulting in lower accuracy, particularly for scenarios with more paths (e.g., 5 paths). As the SNR increases, the accuracy improves rapidly and approaches 100% when the SNR exceeds 10 dB.
Moreover, without the classification loss in (14), the model always predicts the maximum number of paths. Its path detection accuracy therefore stays constant at the proportion of samples in the dataset that contain the maximum number of paths (i.e., around 30%). This result validates the effectiveness of the proposed path existence estimation loss function in (14).

D. Estimation Error Comparison

Let the normalized MSE (NMSE) be defined as $\mathrm{NMSE} = \mathbb{E}\left[\|\mathbf{h}_{\mathrm{near}} - \bar{\mathbf{h}}_{\mathrm{near}}\|^2 / \|\mathbf{h}_{\mathrm{near}}\|^2\right]$, where $\bar{\mathbf{h}}_{\mathrm{near}}$ denotes the channel reconstructed from the estimated parameters. Fig. 7(a) evaluates the NMSE of the reconstructed channel versus SNR for the different schemes. It is observed that the coarse estimation of Stage 1 and the CNN-based scheme suffer from a pronounced error floor, saturating at a relatively high NMSE (around -5 dB and -2 dB). This saturation stems from the inherent resolution limitations of classification-based models, which prevent them from achieving high-precision continuous parameter recovery. In contrast, the NMSE of the proposed method decreases as the SNR increases, reaching a minimum of approximately −27 dB in the high-SNR regime. Note that the PSO with full space scheme achieves slightly better estimation precision than the proposed method. However, this difference has a negligible effect on the achievable rate when the SNR exceeds 5 dB, as shown in Fig. 8(a).

[Fig. 7. Estimation errors of channel, angle, and range versus SNR. (a) NMSE versus SNR. (b) Angle estimation RMSE versus SNR. (c) Range estimation RMSE versus SNR.]

[Fig. 8. Achievable rate versus SNR and number of antennas. (a) Achievable rate versus SNR. (b) Achievable rate versus number of antennas. (c) Achievable rate versus SNR under the 3GPP TR 38.901 UMa scenario.]

Figs. 7(b) and 7(c) compare the root MSE (RMSE) of angle and range estimation versus SNR for different beam training schemes. The RMSEs of angle and range are defined as $\mathrm{RMSE}_{\theta} = \sqrt{\mathbb{E}\left[\|\boldsymbol{\theta} - \bar{\boldsymbol{\theta}}\|^2\right]}$ and $\mathrm{RMSE}_{r} = \sqrt{\mathbb{E}\left[\|\mathbf{r} - \bar{\mathbf{r}}\|^2\right]}$, respectively. It is observed that both the coarse estimation of Stage 1 and the CNN-based scheme saturate at a noticeable error floor even at high SNR. In contrast, the angle and range RMSEs of the proposed method decrease monotonically with increasing SNR, effectively overcoming the error floor observed in the purely learning-based benchmarks. This validates the effectiveness of the optimization-based refinement in correcting the residual errors from the coarse estimation stage. Moreover, the proposed method with the 1σ-criterion boundary saturates at a suboptimal level compared with the 3σ-criterion boundary. This indicates that an overly restrictive boundary may exclude the true channel parameters, which prevents the optimization algorithm from converging to the global optimum.

E. Achievable Rate Comparison

Fig. 8(a) evaluates the achievable rate obtained with the estimated channel versus SNR for different beam training schemes. First, the benchmark schemes that neglect NLoS components or spherical wavefronts (namely, the LoS-based scheme and far-field beam training) suffer severe performance degradation due to significant channel model mismatch. Second, compared with the CNN-based scheme, the coarse estimation of Stage 1 achieves a substantial performance gain across the entire SNR range.
This improvement is attributed to the superior feature extraction capability of the U-Net architecture. Its encoder-decoder structure with skip connections enables effective fusion of multi-scale features, which preserves fine-grained spatial details of the multi-path channel that are often lost in the conventional CNN-based scheme. Third, the proposed method with the 1σ-criterion boundary achieves a noticeable performance improvement over the coarse estimation of Stage 1 but eventually saturates at a suboptimal level. This indicates that the search boundary defined by the 1σ-criterion is overly restrictive and may fail to include the true parameters, which prevents the optimization algorithm from reaching the global optimum. Finally, the PSO with full space scheme achieves a rate that closely approaches the perfect-CSI-based beamforming upper bound, but it requires a significantly larger number of iterations to ensure convergence. In comparison, the proposed method achieves similar near-optimal performance with low complexity.

Fig. 8(b) further investigates the effect of the number of antennas on the achievable rate. The path ranges are fixed at 70 m, which ensures that the user and scatterers remain within the near-field region as the array aperture expands. As expected, the achievable rates of all near-field schemes improve with the number of antennas, benefiting from the increased array gain and spatial resolution. In contrast, the far-field beam training method degrades as the number of antennas increases because the more pronounced near-field effect aggravates its model mismatch. Moreover, the proposed method outperforms the other methods across different numbers of antennas and maintains a negligible performance gap relative to the upper bound.
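The metrics behind Figs. 7 and 8 can be computed as in the following sketch. NMSE and RMSE follow the definitions in Section D, with Monte Carlo trials stored as rows. The achievable-rate function is an assumption, since the rate expression is not restated in this section. It uses single-stream matched-filter beamforming, w = h̄/‖h̄‖, where the SNR is the ratio of transmit power to noise power. Passing the true channel as the estimate therefore gives the perfect-CSI upper bound. All function names are hypothetical.

```python
import numpy as np


def nmse_db(h_true, h_est):
    """NMSE = E[||h - h_est||^2 / ||h||^2] in dB; rows are Monte Carlo trials."""
    h_true, h_est = np.atleast_2d(h_true), np.atleast_2d(h_est)
    err = np.sum(np.abs(h_true - h_est) ** 2, axis=-1)
    ref = np.sum(np.abs(h_true) ** 2, axis=-1)
    return 10 * np.log10(np.mean(err / ref))


def rmse(x_true, x_est):
    """RMSE = sqrt(E[||x - x_est||^2]); rows are trials, columns are paths."""
    diff = np.atleast_2d(np.asarray(x_true) - np.asarray(x_est))
    return np.sqrt(np.mean(np.sum(np.abs(diff) ** 2, axis=-1)))


def achievable_rate(h_true, h_est, snr_db):
    """log2(1 + SNR * |w^H h|^2) with matched-filter beam w = h_est/||h_est||."""
    w = h_est / np.linalg.norm(h_est)
    gain = np.abs(np.vdot(w, h_true)) ** 2   # np.vdot conjugates w
    return np.log2(1 + 10 ** (snr_db / 10) * gain)


# Hypothetical usage with a random channel and a noisy estimate of it.
rng = np.random.default_rng(1)
N = 256
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
noise = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
h_hat = h + 0.1 * noise          # estimation error about 20 dB below the channel
print(round(nmse_db(h, h_hat), 1))
print(achievable_rate(h, h, 10.0), achievable_rate(h, h_hat, 10.0))
```

By the Cauchy–Schwarz inequality, the rate obtained with the true channel upper-bounds the rate obtained with any estimate under this beamformer. This is why the perfect-CSI curve can serve as the reference in Fig. 8.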
To further validate the effectiveness of the proposed scheme under more practical propagation conditions, we additionally compare achievable rates under a 3GPP-based simulation setup [28]. Specifically, QuaDRiGa is employed to generate the channel dataset according to the 3GPP TR 38.901 UMa scenario at 24 GHz [29]–[31]. The corresponding simulation results are shown in Fig. 8(c). It is observed that the proposed method still achieves rate performance close to its upper bound based on perfect CSI and significantly outperforms the benchmark schemes. Even in the low-SNR regime (e.g., -10 dB), the proposed hybrid method achieves a noticeable performance gain over the benchmark schemes, which demonstrates its robustness in practical near-field environments with complex scattering. Compared with the results in Fig. 8(a), the gap between the proposed method and its perfect-CSI upper bound is slightly larger under the 3GPP-based setup. This is due to the discrepancy between the idealized near-field channel model and the more complex practical channel.

VII. CONCLUSION

In this paper, we studied near-field beam training under multi-path channel conditions. To overcome the limitations of conventional DFT-based methods, we proposed a two-stage hybrid learning-and-optimization method for near-field multi-path beam training. In particular, a U-Net model was employed in the first stage to provide coarse estimates of the key multi-path parameters. A PSO-based optimization method was then applied in the second stage to achieve high-precision refinement within a confined search region. Lastly, numerical results verified that the proposed hybrid method significantly improves estimation accuracy compared with existing schemes.

REFERENCES

[1] H. Lu, Y. Zeng, C. You, Y. Han, J. Zhang, Z. Wang, Z. Dong, S. Jin, C.-X. Wang, T. Jiang, X. You, and R. Zhang, “A tutorial on near-field XL-MIMO communications toward 6G,” IEEE Commun. Surv. Tut., vol. 26, no. 4, pp. 2213–2257, 2024.
[2] J. Cong, C. You, J. Li, L. Chen, B. Zheng, Y. Liu, W. Wu, Y. Gong, S. Jin, and R. Zhang, “Near-field integrated sensing and communication: Opportunities and challenges,” IEEE Wireless Commun., vol. 31, no. 6, pp. 162–169, Dec. 2024.
[3] C. You, Y. Cai, Y. Liu, M. Di Renzo, T. M. Duman, A. Yener, and A. Lee Swindlehurst, “Next generation advanced transceiver technologies for 6G and beyond,” IEEE J. Sel. Areas Commun., vol. 43, no. 3, pp. 582–627, Mar. 2025.
[4] Y. Zhang, X. Wu, and C. You, “Fast near-field beam training for extremely large-scale array,” IEEE Wireless Commun. Lett., vol. 11, no. 12, pp. 2625–2629, Dec. 2022.
[5] C. Wu, C. You, Y. Liu, L. Chen, and S. Shi, “Two-stage hierarchical beam training for near-field communications,” IEEE Trans. Veh. Technol., vol. 73, no. 2, pp. 2032–2044, Feb. 2024.
[6] Y. Pan, C. Pan, S. Jin, and J. Wang, “RIS-aided near-field localization and channel estimation for the terahertz system,” IEEE J. Sel. Topics Signal Process., vol. 17, no. 4, pp. 878–892, Jul. 2023.
[7] M. Cui and L. Dai, “Channel estimation for extremely large-scale MIMO: Far-field or near-field?” IEEE Trans. Commun., vol. 70, no. 4, pp. 2663–2677, Apr. 2022.
[8] X. Wu, C. You, J. Li, and Y. Zhang, “Near-field beam training: Joint angle and range estimation with DFT codebook,” IEEE Trans. Wireless Commun., vol. 23, no. 9, pp. 11890–11903, Sep. 2024.
[9] Z. Wang, R. Kiran, J. Nair, C.-H. Chen, T.-H. Chou, S. Tsai, and R. Zhang, “Sparsity-aware near-field beam training via multi-beam combination,” arXiv preprint arXiv:2505.08267, 2025.
[10] Z. Wang, R. Kiran, S. Tsai, and R. Zhang, “Low-complexity near-field beam training with DFT codebook based on beam pattern analysis,” arXiv preprint arXiv:2503.21954, 2025.
[11] Y. Lu, Z. Zhang, and L. Dai, “Hierarchical beam training for extremely large-scale MIMO: From far-field to near-field,” IEEE Trans. Commun., vol. 72, no. 4, pp. 2247–2259, Apr. 2024.
[12] C. Zhou, C. You, Z. Huang, S. Shi, Y. Gong, C.-B. Chae, and K. Huang, “Multi-beam training for near-field communications in high-frequency bands: A sparse array perspective,” IEEE Trans. Wireless Commun., vol. 25, pp. 6937–6953, 2026.
[13] W. Liu, H. Ren, C. Pan, and J. Wang, “Deep learning based beam training for extremely large-scale massive MIMO in near-field domain,” IEEE Commun. Lett., vol. 27, no. 1, pp. 170–174, Jan. 2023.
[14] G. Jiang and C. Qi, “Near-field beam training based on deep learning for extremely large-scale MIMO,” IEEE Commun. Lett., vol. 27, no. 8, pp. 2063–2067, Aug. 2023.
[15] J. Nie, Y. Cui, Z. Yang, W. Yuan, and X. Jing, “Near-field beam training for extremely large-scale MIMO based on deep learning,” IEEE Trans. Mobile Comput., vol. 24, no. 1, pp. 352–362, Jan. 2025.
[16] Y. Wang, C. Qi, W. He, and A. Nallanathan, “Near-field beam training and tracking with deep learning for extremely large-scale MIMO,” IEEE Trans. Veh. Technol., vol. 74, no. 12, pp. 19783–19788, Dec. 2025.
[17] D. Yu, M. Kolbæk, Z.-H. Tan, and J. Jensen, “Permutation invariant training of deep models for speaker-independent multi-talker speech separation,” in Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), Jun. 2017, pp. 241–245.
[18] B. Ning, H. Yin, S. Liu, H. Deng, S. Yang, Y. Zhang, W. Mei, D. Gesbert, J. Park, R. W. Heath, and E. Björnson, “Precoding matrix indicator in the 5G NR protocol: A tutorial on 3GPP beamforming codebooks,” IEEE Commun. Surv. Tut., vol. 28, pp. 4581–4623, 2026.
[19] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), 2015, pp. 234–241.
[20] W. Xie, J. Xiao, P. Zhu, C. Yu, and L. Yang, “Deep compressed sensing-based cascaded channel estimation for RIS-aided communication systems,” IEEE Wireless Commun. Lett., vol. 11, no. 4, pp. 846–850, Apr. 2022.
[21] Z. Fu, S. Mukherjee, M. T. Lanagan, P. Mitra, T. Chawla, and R. M. Narayanan, “Transfer learning and double U-Net empowered wave propagation model in complex indoor environments,” IEEE Trans. Antennas Propag., vol. 73, no. 7, pp. 4814–4828, Jul. 2025.
[22] C. Zhou, C. You, S. Gong, B. Lyu, B. Zheng, and Y. Gong, “Channel estimation for XL-IRS assisted wireless systems with double-sided visibility regions,” in Proc. 16th Int. Conf. Wireless Commun. Signal Process. (WCSP), 2024, pp. 456–461.
[23] M. Cui, Q. Zeng, and K. Huang, “Towards atomic MIMO receivers,” IEEE J. Sel. Areas Commun., vol. 43, no. 3, pp. 659–673, Mar. 2025.
[24] P. Netrapalli, P. Jain, and S. Sanghavi, “Phase retrieval using alternating minimization,” IEEE Trans. Signal Process., vol. 63, no. 18, pp. 4814–4826, Sep. 2015.
[25] L. Yao, C. You, C. Zhou, B. Zheng, and W. Mei, “Position optimization for two-layer movable antenna systems,” IEEE Wireless Commun. Lett., vol. 15, pp. 1270–1274, 2026.
[26] M. Q. Khan, A. Gaber, M. Parvini, P. Schulz, and G. Fettweis, “A low-complexity machine learning design for mmWave beam prediction,” IEEE Wireless Commun. Lett., vol. 13, no. 6, pp. 1551–1555, Jun. 2024.
[27] Z. Wu and L. Dai, “Multiple access for near-field communications: SDMA or LDMA?” IEEE J. Sel. Areas Commun., vol. 41, no. 6, pp. 1918–1935, Jun. 2023.
[28] Y. Lu and L. Dai, “Near-field channel estimation in mixed LoS/NLoS environments for extremely large-scale MIMO systems,” IEEE Trans. Commun., vol. 71, no. 6, pp. 3694–3707, Jun. 2023.
[29] H. Xu, J. Zhang, P. Tang, H. Xing, H. Miao, N. Zhang, J. Li, J. Wu, W. Yang, Z. Zhang, W. Jiang, Z. He, A. Haghighat, Q. Wang, and G. Liu, “Near-field propagation and spatial non-stationarity channel model for 6–24 GHz (FR3) extremely large-scale MIMO: Adopted by 3GPP for 6G,” IEEE J. Sel. Areas Commun., vol. 44, pp. 3201–3218, 2026.
[30] S. Jaeckel, L. Raschkowski, K. Börner, and L. Thiele, “QuaDRiGa: A 3-D multi-cell channel model with time evolution for enabling virtual field trials,” IEEE Trans. Antennas Propag., vol. 62, no. 6, pp. 3242–3256, Jun. 2014.
[31] 3rd Generation Partnership Project (3GPP), “Study on channel model for frequencies from 0.5 to 100 GHz,” 3GPP, Technical Report TR 38.901 V19.2.0, Jan. 2026, Release 19.
