CSI-tuples-based 3D Channel Fingerprints Construction Assisted by MultiModal Learning

Chenjie Xie, Graduate Student Member, IEEE, Li You, Senior Member, IEEE, Ruirong Chen, Gaoning He, Xiqi Gao, Fellow, IEEE

Abstract—Low-altitude communications can promote the integration of aerial and terrestrial wireless resources, expand network coverage, and enhance transmission quality, thereby empowering the development of sixth-generation (6G) mobile communications. As an enabler for low-altitude transmission, 3D channel fingerprints (3D-CF), also referred to as the 3D radio map or 3D channel knowledge map, are expected to enhance the understanding of communication environments and assist in the acquisition of channel state information (CSI), thereby avoiding repeated estimations and reducing computational complexity. In this paper, we propose a modularized multimodal framework to construct 3D-CF. Specifically, we first establish the 3D-CF model as a collection of CSI-tuples based on Rician fading channels, with each tuple comprising the low-altitude vehicle's (LAV) position and its corresponding statistical CSI. Considering the heterogeneous structures of different prior data, we formulate the 3D-CF construction problem as a multimodal regression task, where the target channel information in each CSI-tuple is estimated directly from the corresponding LAV position, together with communication measurements and geographic environment maps. Accordingly, we propose a high-efficiency multimodal framework that includes a correlation-based multimodal fusion (Corr-MMF) module, a multimodal representation (MMR) module, and a CSI regression (CSI-R) module.
Numerical results show that our proposed framework can efficiently construct 3D-CF and achieves at least 27.5% higher accuracy than state-of-the-art algorithms under different communication scenarios, demonstrating its competitive performance and excellent generalization ability. We also analyze the computational complexity and illustrate the framework's advantage in inference time.

Index Terms—3D channel fingerprints, multimodal framework, CSI-tuples, low-altitude communication.

Part of this work was accepted by IEEE WCNC 2026 [1]. Chenjie Xie, Li You, and Xiqi Gao are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China, and also with the Purple Mountain Laboratories, Nanjing 211111, China (e-mail: cjxie@seu.edu.cn, lyou@seu.edu.cn, xqgao@seu.edu.cn). Ruirong Chen and Gaoning He are with Huawei Technologies Co., Ltd., Shenzhen 518129, China (e-mail: ruirongchen@huawei.com, hegaoning@huawei.com).

I. INTRODUCTION

As sixth-generation (6G) mobile communications continue to evolve, massive demands for low-altitude applications have arisen across transportation, agriculture, and emergency services, catalyzing the vigorous development of low-altitude communications [2]–[6]. Currently, low-altitude networks strive to provide real-time communication and navigation services for low-altitude vehicles (LAVs). They facilitate interoperability and coordination with terrestrial networks, deliver communication support for specific regions, and enhance both network coverage and transmission quality [2], [6]. In the foreseeable future, low-altitude communications will further promote the synergistic integration of aerial and terrestrial resources, enabling a new paradigm for the development of mobile communications.
However, acquiring high-quality channel state information (CSI) in low-altitude communication systems remains an unresolved challenge, and it significantly affects the performance of low-altitude wireless transmission. Because scatterers are spatially non-uniformly distributed [7], [8], the radio environments in low-altitude scenarios become more complicated. The signal propagation characteristics also vary significantly across altitudes, which makes acquiring accurate CSI considerably challenging. In addition, LAVs in low-altitude airspace exhibit high mobility, constrained power consumption, limited payload, and restricted computing power [7], [9], making real-time and efficient CSI acquisition even more difficult.

Fortunately, channel fingerprints (CF), also referred to as the channel knowledge map (CKM) [10] or the radio environment map (REM) [11], have emerged as a novel approach to address this challenge. By definition, CF is a site-specific database that stores user-location-related CSI. In terrestrial communication systems, 2D-CF has demonstrated its effectiveness in assisting the acquisition of channel information. It supports resource management [12], beam selection [13], wireless positioning [14], and other applications without repetitive estimations. Similarly, in low-altitude communication systems, 3D-CF can directly provide the corresponding CSI based on the positions or trajectories of LAVs, thereby reducing pilot overhead, conserving wireless resources, and improving transmission efficiency. The introduction of 3D-CF is therefore expected to bring a new perspective to the development of low-altitude communications.

Several studies have begun to explore approaches for high-efficiency 3D-CF construction.
For example, the authors in [15] detected the number of radiation sources based on the path loss (PL) model and constructed a 3D spectrum map accordingly. Nevertheless, PL varies considerably across altitudes, so adopting such a one-size-fits-all PL model inevitably leads to substantial errors in 3D-CF construction. To address the limitations of model-based approaches, data-driven methods have been extensively investigated. For instance, the authors in [16] leveraged LAV-based measurements to evaluate interpolation algorithms, including nearest neighbor (NN), linear, inverse distance weighting (IDW), and ordinary Kriging, and validated their feasibility for 3D-CF reconstruction. Gaussian process regression (GPR), sparse Bayesian learning, and compressed sensing (CS) were also developed in [17]–[21] to reduce the required number of samples. Additionally, t-singular value decomposition (t-SVD), fiber sampling tensor decomposition (FSTD), block-term tensor decomposition (BTD), and other tensor-based algorithms were adopted in [22]–[25] to further enhance computational efficiency by leveraging the smoothness prior of measurements. However, these purely data-driven methods are entirely environment-blind. In practice, wireless channels are profoundly influenced by the geographic environment through a complex interplay of signal reflection, diffraction, and scattering mechanisms [26], [27]. This influence is especially strong in low-altitude communication systems, where line-of-sight (LOS) paths are predominant. To enable geographic environment-assisted 3D-CF construction, researchers have turned to machine learning (ML) for feasible solutions. For instance, the authors in [28] designed two deep neural networks (DNNs) to jointly reconstruct the 3D REM and its communication environment.
Moreover, generative artificial intelligence (GenAI) based on generative adversarial networks (GANs) or diffusion models (DMs) has been adopted for high-quality 3D-CF generation from sparse spatial observations, in both radiation-aware and radiation-unaware scenarios [29]–[31]. However, these computer vision-based GenAI algorithms must model 3D-CF as images. This requires discretizing the target region so that all LAVs in the same grid cell (pixel) share identical CSI (pixel values) [32], [33], an assumption that introduces several critical limitations. Firstly, uniform discretization reduces the flexibility of 3D-CF. Unlike in terrestrial networks, LAV trajectories in low-altitude airspace are highly driven by communication demand [34]. Indiscriminate gridding therefore causes a mismatch between 3D-CF resolution and communication demands, and it also wastes computational resources. Secondly, intra-grid CSI sharing induces significant errors in 3D-CF applications. In particular, when a building lies within a grid cell, the PL and channel shadowing for LAVs on opposite sides of it may differ substantially and cannot be represented by the same CSI. Thirdly, the grid-based 3D model causes a sharp increase in data volume, because the number of grid cells grows cubically with the per-axis resolution. The resulting computational complexity makes it unsuitable for practical deployment.

Multimodal learning (MML) equips models with capabilities such as understanding, reasoning, and learning across diverse data modalities, and it integrates these modalities for predictive tasks. By overcoming the image-only processing constraint, MML offers a promising avenue to resolve the aforementioned issues of GenAI. Recent works [35]–[45] have demonstrated the substantial application potential of MML-based CSI prediction in wireless communications.
3D-CF construction in low-altitude systems involves more complex data modalities, including geographic environment maps, sparse communication measurements, and LAV coordinates. For this task in particular, MML is expected to process and integrate the underlying information across these modalities, thereby offering novel solutions for 3D-CF construction and enhancing environmental awareness in low-altitude communication systems.

Motivated by the above discussions, we investigate MML-based 3D-CF construction for low-altitude communication systems. Specifically, we encapsulate each LAV's coordinates and its corresponding channel information into a CSI-tuple. This bypasses the discretization operation and directly fits the mapping between tuple elements to minimize the 3D-CF errors. Notably, the CSI stored in 3D-CF can be flexibly defined according to practical transmission requirements, such as the received signal strength (RSS), coverage, LOS probability, or even the channel covariance matrix and power angle spectrum (PAS). We then fully exploit prior knowledge to assist the construction of 3D-CF, including geographic environment maps, sparse communication measurements, and LAV positions. Because their heterogeneous data structures cannot be processed by a single-type network, we regard them as multimodal data and transform the 3D-CF construction problem into a multimodal regression task, which leads to a modularized 3D-CF multimodal framework. The main contributions of this paper can be summarized as follows:

• Based on the ground-to-LAV Rician fading channel model, we propose a 3D-CF model that is better suited to low-altitude communication systems. Specifically, 3D-CF is conceptualized as a collection of CSI-tuples, with each tuple comprising the LAV position and its corresponding channel information.
This model enables adaptive adjustment of CSI-tuples according to practical communication demands, thus making flexible 3D-CF construction possible.
• Given the heterogeneous structures of different prior data, we formulate the 3D-CF construction problem as a multimodal regression task. In this task, the target channel information in each CSI-tuple is estimated directly from its corresponding LAV location, geographic environment maps, and measurement data.
• Based on the structural characteristics and internal relations of prior data, we propose a highly efficient modularized multimodal framework for 3D-CF construction. During the data processing stage, we design a correlation-based multimodal fusion (Corr-MMF) module and a multimodal representation (MMR) module based on the relationship between communication environments and measurement data. These modules extract and learn features of the CSI distribution in the horizontal and vertical directions, respectively. In the CSI estimation phase, we align these different features via embedding operations and design the channel state information regression (CSI-R) module, which estimates CSI using LAV positions as conditional inputs. This recovers the CSI-tuples and completes the 3D-CF construction.
• We present numerical results showing that the proposed modularized multimodal framework achieves at least 27.5% higher accuracy than state-of-the-art algorithms in 3D-CF construction. Experimental results also demonstrate its competitive performance in terms of generalization capability and computational complexity.

The rest of this paper is organized as follows. In Section II, we establish the ground-to-LAV channel model and the 3D-CF model in low-altitude communication scenarios, and formulate the construction problem accordingly.
Section III elaborates on the structure of our proposed multimodal framework. Numerical results are presented in Section IV. Finally, we conclude the paper in Section V.

Notations: $\jmath = \sqrt{-1}$ denotes the imaginary unit. $\mathbf{a}^T$ represents the transpose of vector $\mathbf{a}$, and $\|\mathbf{A}\|_F$ is the Frobenius norm of matrix $\mathbf{A}$. $\mathbb{C}^{M\times N\times K}$ denotes the $M\times N\times K$-dimensional complex tensor space, and $\mathbb{R}^3$ represents the three-dimensional real space. $\mathbb{E}\{\cdot\}$ denotes the expectation operation. $\mathcal{CN}(\mathbf{a}, \mathbf{B})$ represents the complex Gaussian distribution with mean $\mathbf{a}$ and covariance $\mathbf{B}$. The notation $\triangleq$ is used for definitions.

II. SYSTEM MODEL

In this section, we introduce the channel model for ground-to-LAV links in low-altitude airspace. We then establish a CSI-tuples-based 3D-CF model, which is highly flexible and does not rely on spatial gridding. Based on the characteristics of 3D-CF, we formulate the construction problem as a multimodal regression task assisted by prior data.

A. Channel Model

As illustrated in Fig. 1, we consider a ground-to-LAV system in low-altitude airspace. The base station (BS), positioned at $(x, y, 0)$, is equipped with a uniform linear array (ULA) comprising $N_{BS}$ antenna elements [46]. For simplicity, each LAV in the target region employs a single antenna and moves at a constant velocity during the time interval of interest. To better characterize the ground-to-LAV links in target areas, we assume that each channel includes a line-of-sight (LOS) path and several reflected paths, all of which contribute to the received signal of a specific LAV [47].
By adopting the correlated Rician fading channel, the downlink (DL) channel between the BS and the $m$-th LAV over the $n$-th symbol can be modeled as [47], [48]

$$\mathbf{h}_m[n] = \sqrt{\beta_m}\left(\bar{\mathbf{h}}_m[n] + \tilde{\mathbf{h}}_m[n]\right), \tag{1}$$

where $\beta_m$ represents the large-scale channel fading coefficient, and $\bar{\mathbf{h}}_m[n]$ and $\tilde{\mathbf{h}}_m[n]$ denote the LOS component and the NLOS component, respectively. For the LOS component $\bar{\mathbf{h}}_m[n]$, define $K$ as the Rician factor. We then have [48]–[51]

$$\bar{\mathbf{h}}_m[n] = \sqrt{\frac{K}{K+1}}\,\boldsymbol{\alpha}(\phi_{m,0}, \theta_{m,0})\, e^{\jmath\left(2\pi\nu_m\xi_{m,0}T_s n + \varphi_{m,0}\right)}, \tag{2}$$

where $\nu_m$ is the Doppler shift, $T_s$ is the system sampling duration, $\varphi_{m,0}$ is the phase shift of the LOS component, and $\xi_{m,0} \triangleq \mathbf{v}_m\mathbf{k}_{m,0}^T$. Here, $\mathbf{v}_m$ is the unit velocity vector with elevation angle $\phi_{m,v}$ and azimuth angle $\theta_{m,v}$, which is given by

$$\mathbf{v}_m = \left[\cos(\theta_{m,v})\sin(\phi_{m,v}),\ \sin(\theta_{m,v})\sin(\phi_{m,v}),\ \cos(\phi_{m,v})\right]. \tag{3}$$

Fig. 1: A typical ground-to-LAV communication scenario, where all possible channel components include one LOS path and several reflected paths. Its propagation geometry takes the BS as the origin.

$\mathbf{k}_{m,0}$ is the unit wave vector with elevation angle $\phi_{m,0}$ and azimuth angle $\theta_{m,0}$, which is given by [52]

$$\mathbf{k}_{m,0} = \left[\cos(\theta_{m,0})\sin(\phi_{m,0}),\ \sin(\theta_{m,0})\sin(\phi_{m,0}),\ \cos(\phi_{m,0})\right]. \tag{4}$$

$\boldsymbol{\alpha}(\phi_{m,0}, \theta_{m,0})$ is the steering vector with elevation angle $\phi_{m,0}$ and azimuth angle $\theta_{m,0}$, which can be expressed as [53]

$$\boldsymbol{\alpha}(\phi_{m,0}, \theta_{m,0}) = \left[1,\ e^{\jmath 2\pi\frac{d_m}{\lambda_c}\zeta_{m,0}},\ \ldots,\ e^{\jmath 2\pi(N_{BS}-1)\frac{d_m}{\lambda_c}\zeta_{m,0}}\right], \tag{5}$$

where $d_m$ is the inter-antenna spacing, $\lambda_c$ is the wavelength, and $\zeta_{m,0} \triangleq \cos(\theta_{m,0})\sin(\phi_{m,0})$. For the NLOS component $\tilde{\mathbf{h}}_m[n]$, define $L$ as the number of NLOS paths. We then have [49], [54], [55]

$$\tilde{\mathbf{h}}_m[n] = \sqrt{\frac{1}{K+1}}\sum_{l=1}^{L}\frac{\boldsymbol{\alpha}(\phi_{m,l}, \theta_{m,l})}{\sqrt{L}}\, e^{\jmath\left(2\pi\nu_m\xi_{m,l}T_s n + \varphi_{m,l}\right)}, \tag{6}$$

where $\xi_{m,l} \triangleq \mathbf{v}_m\mathbf{k}_{m,l}^T$, with $\mathbf{v}_m$ and $\mathbf{k}_{m,l}$ defined analogously to (3) and (4), respectively. Assume that $\{\phi_{m,l}\}_{l=1}^{L}$, $\{\theta_{m,l}\}_{l=1}^{L}$, and $\{\varphi_{m,l}\}_{l=1}^{L}$ are independent random variables. By the central limit theorem [56], as $L$ tends to infinity, $\tilde{\mathbf{h}}_m[n]$ approaches a zero-mean complex Gaussian random process, i.e., $\tilde{\mathbf{h}}_m[n] \sim \mathcal{CN}(\mathbf{0}, \boldsymbol{\Lambda}_m)$. Here, $\boldsymbol{\Lambda}_m$ represents the positive semi-definite spatial covariance matrix of the NLOS components for the $m$-th LAV [47]–[49], [54], [55]. Based on (2) and (6), the channel between the BS and the $m$-th LAV over the $n$-th symbol can be expressed as $\mathbf{h}_m[n] \sim \mathcal{CN}(\mathbf{H}, \mathbf{R})$ [47]–[49], with mean $\mathbf{H} = \sqrt{\beta_m}\,\bar{\mathbf{h}}_m[n]$ and covariance $\mathbf{R} = \beta_m\boldsymbol{\Lambda}_m$. Under this channel model, the RSS can be written as

$$g_m = P_{BS}\,\|\mathbf{h}_m[n]\|_F^2, \tag{7}$$

where $P_{BS}$ denotes the transmit power.

B. 3D-CF Model and Problem Formulation

Based on the channel model presented in Section II-A, we next introduce the 3D-CF model for LAVs in low-altitude airspace and then formulate the construction problem.

1) 3D-CF Model: The traditional 3D-CF model typically partitions the target area into grids, where all receivers within the same grid cell share identical channel information, thereby converting CF into an image [32], [33]. In contrast, we model 3D-CF as a collection of CSI-tuples $\{(\mathbf{X}, \Omega)\}$, where $\mathbf{X}$ represents the LAV coordinate array and $\Omega$ denotes its associated channel information.
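The NumPy sketch below illustrates how a single CSI-tuple could be populated. It draws one channel realization from (1)–(6) for a given LAV position, evaluates the RSS in (7), and stores the pair $(\mathbf{X}_m, g_m)$. It rests on assumptions that the text does not state. The BS sits at the origin, as in the geometry of Fig. 1, and the LOS wave vector points from the BS to the LAV. We also take $d_m/\lambda_c = 0.5$, treat $\phi$ as measured from the z-axis (as implied by (3) and (4)), and draw the NLOS angles and phases uniformly at random. The values of $K$, $\beta_m$, $L$, $\nu_m$, $T_s$, and $P_{BS}$ are hypothetical placeholders, and the helper names are ours.

```python
import numpy as np


def unit_vector(phi, theta):
    """Direction vector as in (3)/(4); phi is measured from the z-axis
    (cos(phi) is the z-component), theta is the azimuth. Scalars or arrays."""
    return np.array([np.cos(theta) * np.sin(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(phi)])


def steering_vector(phi, theta, n_bs, d_over_lambda=0.5):
    """ULA steering vector of (5) with d_m / lambda_c = d_over_lambda (assumed)."""
    zeta = np.cos(theta) * np.sin(phi)
    return np.exp(1j * 2 * np.pi * d_over_lambda * zeta * np.arange(n_bs))


def rician_channel(pos, v_dir, n, *, n_bs=8, K=10.0, beta=1e-6, L=64,
                   nu=100.0, Ts=1e-6, los_phase=0.0, rng=None):
    """Draw h_m[n] from (1); returns (h, sqrt(beta) * LOS part).

    Assumptions: BS at the origin, LOS wave vector along BS->LAV, NLOS
    angles/phases i.i.d. uniform; all default values are placeholders.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = np.linalg.norm(pos)
    phi0, theta0 = np.arccos(pos[2] / r), np.arctan2(pos[1], pos[0])
    xi0 = v_dir @ unit_vector(phi0, theta0)              # xi_{m,0} = v_m k_{m,0}^T
    h_los = (np.sqrt(K / (K + 1)) * steering_vector(phi0, theta0, n_bs)
             * np.exp(1j * (2 * np.pi * nu * xi0 * Ts * n + los_phase)))  # (2)

    phis = rng.uniform(0.0, np.pi, L)                     # NLOS zenith angles
    thetas = rng.uniform(-np.pi, np.pi, L)                # NLOS azimuth angles
    phases = rng.uniform(0.0, 2 * np.pi, L)               # NLOS phase shifts
    xis = unit_vector(phis, thetas).T @ v_dir             # xi_{m,l}, shape (L,)
    A = steering_vector(phis[:, None], thetas[:, None], n_bs)       # (L, n_bs)
    rot = np.exp(1j * (2 * np.pi * nu * xis * Ts * n + phases))     # (L,)
    h_nlos = np.sqrt(1 / (K + 1)) * (A * rot[:, None]).sum(axis=0) / np.sqrt(L)  # (6)

    return np.sqrt(beta) * (h_los + h_nlos), np.sqrt(beta) * h_los


def rss(h, p_bs=1.0):
    """RSS of (7): g_m = P_BS * ||h_m[n]||^2."""
    return p_bs * float(np.vdot(h, h).real)


# A small collection of CSI-tuples (X_m, g_m) for hypothetical LAV positions.
rng = np.random.default_rng(0)
v = unit_vector(np.pi / 2, 0.0)                           # LAV flying along +x
positions = [np.array([100.0, 50.0, 80.0]), np.array([-40.0, 120.0, 150.0])]
csi_tuples = [(p, rss(rician_channel(p, v, n=0, rng=rng)[0])) for p in positions]
```

Each steering-vector entry has unit magnitude, so $\|\boldsymbol{\alpha}\|^2 = N_{BS}$. The NLOS cross terms also average out over the random phases. As a result, the sample mean of $g_m$ in this sketch approaches $P_{BS}\beta_m N_{BS}$, which serves as a convenient sanity check.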
On the one hand, for any LAV in low-altitude airspace, we can always locate its corresponding CSI-tuple in 3D-CF and obtain accurate channel information, thereby reducing errors induced by spatial gridding. On the other hand, the collection of CSI-tuples can be dynamically adjusted or reconstructed according to terminal density, low-altitude traffic load, and spatial utilization. This maximizes 3D-CF flexibility while minimizing unnecessary computational overhead in practical applications. Following the channel model in Section II-A, we define the RSS in (7) as the target channel information stored in 3D-CF, i.e., $\Omega = g_m$. Therefore, the collection of CSI-tuples is ultimately expressed as

$$\mathcal{G} = \left\{\left(\mathbf{X}_m, \Psi(\mathbf{X}_m)\right) \,\middle|\, \Psi: \mathbf{X}_m \in \mathbb{R}^3 \to g_m\right\}, \tag{8}$$

which is our proposed 3D-CF model. Note that we define $\Omega$ as the RSS solely to simplify the exposition of the model, task, and methodology. In practice, $\Omega$ can be defined as other channel information to suit practical communication requirements and diverse applications, such as PL, delay, Doppler shift, the channel covariance matrix, or the optimal beam indices.

2) Problem Formulation: Under the definition in (8), constructing 3D-CF is equivalent to finding the function $\Psi$, that is, the mapping from an LAV location to its corresponding RSS. However, directly fitting $\Psi$ is extremely challenging because there is no distinct correlation between the LAV location and its RSS. Consequently, we need prior information to facilitate the construction of $\Psi$. On the one hand, the low-altitude geographic environment can be viewed as an image and conveniently captured by tools such as RGB cameras, and it exerts a significant influence on the distribution of RSS.
In terrestrial networks, geographic information has been proven to be effective and crucial in assisting the reconstruction of 2D-CF [33], [57]–[59]. In low-altitude airspace, environmental effects on RSS are more pronounced because LOS paths dominate ground-to-LAV channels. Therefore, the low-altitude geographic environment, denoted as $\mathbf{E}$, is an essential piece of prior information for constructing $\Psi$. On the other hand, measurable CF sampling data near the ground, denoted as tensor $\mathbf{G}_{gro}$, can also serve as prior information. These data partly reflect the signal propagation characteristics and reveal the underlying relationship between terminal locations and RSS. Based on $\mathbf{E}$ and $\mathbf{G}_{gro}$, the mapping $\Psi$ can be rewritten as

$$\Psi: (\mathbf{X}_m, \mathbf{E}, \mathbf{G}_{gro}) \to g_m,\quad \mathbf{X}_m \in \mathbb{R}^3. \tag{9}$$

In practice, it is challenging to derive a feasible analytical solution with traditional interpolation methods. Hence, we employ a deep neural network $\Psi'$ to fit $\Psi$ in (9). The network $\Psi'$ takes three modal variables as inputs. The first is the low-altitude communication environment $\mathbf{E}$, which encompasses both the horizontal and vertical information of all buildings, vegetation, and other structures in the target low-altitude airspace. The second is the CF sampling data $\mathbf{G}_{gro}$, which indicate the signal propagation characteristics, and the third is the LAV position $\mathbf{X}_m$. Therefore, we formulate the problem of fitting $\Psi$ by the network $\Psi'$, i.e., the construction of 3D-CF, as the multimodal regression task

$$\arg\min_{\Theta}\ \mathbb{E}\left\{\left\|\Psi'\left[(\mathbf{X}_m, \mathbf{E}, \mathbf{G}_{gro}); \Theta\right] - g_m\right\|_F^2\right\} \tag{10}$$

$$\text{s.t.}\quad g_m = \Psi(\mathbf{X}_m, \mathbf{E}, \mathbf{G}_{gro}), \tag{10a}$$

$$\mathbf{X}_m \in \mathbb{R}^3, \tag{10b}$$

where $\Theta$ denotes the trainable parameters of the network $\Psi'$.

III. MULTIMODAL FRAMEWORK FOR 3D-CF CONSTRUCTION

As analyzed in Section II-B, the mutual relationships among $\mathbf{X}_m$, $\mathbf{E}$, and $\mathbf{G}_{gro}$ reveal the underlying patterns of the spatial RSS distribution. In this section, we therefore develop a modularized multimodal framework to fit the mapping $\Psi$ and construct 3D-CF. As shown in Fig. 2, our proposed 3D-CF multimodal framework includes three essential modules: the Corr-MMF module, the MMR module, and the CSI-R module. The first two modules are designed to extract features of the CSI distribution and geographic environments in the horizontal and vertical directions, respectively. The third module is responsible for spatial CSI prediction and 3D-CF reconstruction based on these features. We introduce each module in turn.

Fig. 2: Diagram of the proposed 3D-CF MultiModal framework (panels: 1. Data Modalities (Inputs): CF measurements, LAV positions, and geographic environment maps with horizontal and vertical information; 2. Corr-MMF Module; 3. MMR Module; 4. CSI-R Module, whose CSI output forms the 3D-CF). This scheme includes three essential modules: the Corr-MMF module, the MMR module, and the CSI-R module. The first two modules are designed to extract features of the CSI distribution and geographic environments in the horizontal and vertical directions, respectively, and the third module is responsible for spatial CSI prediction and 3D-CF reconstruction.

A. The Correlation-based MultiModal Fusion (Corr-MMF) Module

In conventional multimodal learning tasks, data from two or more media often exhibit strong correlations. Because their structural characteristics are heterogeneous, they must be encoded and combined in a unified manner, a process known as MultiModal Fusion (MMF) [60]. By definition, MMF extracts and integrates features from two or more media to perform subsequent regression or classification [61]. It leverages the correlation and complementarity among different data to keep critical features, remove redundant ones, and integrate the information into a stable multimodal representation [62].

In our 3D-CF construction task, the available CF measurements $\mathbf{G}_{gro}$ near the ground exhibit a strong correlation with the horizontal geographic environment information $\mathbf{E}_h$ [63]–[65]. On the one hand, the RSS values in $\mathbf{G}_{gro}$ reveal the likely distribution of buildings, vegetation, and other structures in the target area [66]. On the other hand, $\mathbf{E}_h$ indicates the potential reflection, diffraction, scattering, and obstruction during signal propagation, and thus affects the distribution of RSS [67]. Consequently, exploring and fusing the correlated characteristics of $\mathbf{G}_{gro}$ and $\mathbf{E}_h$ is crucial for the multimodal framework to learn 3D-CF patterns in the horizontal direction, which motivates our design of the Corr-MMF module. In particular, the low-dimensional feature representations extracted and fused by the Corr-MMF module must satisfy the following two criteria:

Criterion 1: The low-dimensional feature representations should maximally preserve the critical information inherent to both $\mathbf{G}_{gro}$ and $\mathbf{E}_h$;

Criterion 2: The low-dimensional feature representations should effectively preserve the correlated information between $\mathbf{G}_{gro}$ and $\mathbf{E}_h$.

1) Network Design for Criterion 1: A workable structure for Criterion 1 is a feature extractor (encoder) that extracts the key information of both $\mathbf{G}_{gro}$ and $\mathbf{E}_h$. As shown in Fig. 3(a), the encoder includes two stages: feature extraction and feature fusion.
During the feature extraction stage, two sub-encoders, $E_1$ and $E_2$, extract features of $\mathbf{G}_{gro}$ and $\mathbf{E}_h$, respectively, eliminating data redundancy and reducing dimensionality. Specifically, each of $E_1$ and $E_2$ consists of several downsampling convolutional layers, each followed by a rectified linear unit (ReLU). To speed up convergence and enhance the generalization performance of the network, we apply batch normalization (BN) after each convolution and its activation, as reflected in (12) and (13).

Furthermore, we observe that the RSS of a specific LAV in 3D-CF exhibits a strong correlation with the channel information and geographic environment in its vicinity [68]. Consequently, we design a terminal attention mechanism (TAM) that lets $E_1$ and $E_2$ focus on data blocks closer to the LAV. Specifically, we construct two Gaussian masks $\mathbf{M}_G$ and $\mathbf{M}_E$ with the same dimensions as $\mathbf{G}_{gro}$ and $\mathbf{E}_h$, respectively. For the $(m,n)$-th element of $\mathbf{M}_G$ and $\mathbf{M}_E$, we have

$$\mathbf{M}_G(m,n) = \mathbf{M}_E(m,n) = e^{-\frac{d_{m,n}^2}{2\sigma^2}}, \tag{11}$$

where $d_{m,n}$ denotes the distance between the $(m,n)$-th element and the LAV position, and $\sigma^2$ is the variance of the Gaussian distribution for $\mathbf{M}_G$ and $\mathbf{M}_E$. The masks $\mathbf{M}_G$ and $\mathbf{M}_E$ in TAM assign different weights to $\mathbf{G}_{gro}$ and $\mathbf{E}_h$ at different spatial positions, which improves feature extraction. Moreover, the Gaussian-distributed weights are continuous and differentiable, so they support backpropagation through each sub-encoder. Overall, the outputs of the sub-encoders $E_1$ and $E_2$ can be written as

$$\mathbf{O}_{E_1} = \mathrm{BN}\left(\mathrm{ReLU}\left(\mathrm{Conv}(\mathbf{M}_G \odot \mathbf{G}_{gro})\right)\right) \in \mathbb{C}^{h\times w\times c}, \tag{12}$$

$$\mathbf{O}_{E_2} = \mathrm{BN}\left(\mathrm{ReLU}\left(\mathrm{Conv}(\mathbf{M}_E \odot \mathbf{E}_h)\right)\right) \in \mathbb{C}^{h\times w\times c}, \tag{13}$$

where $\odot$ represents the Hadamard product, and $h$, $w$, and $c$ are the height, width, and number of channels of $\mathbf{O}_{E_1}$ and $\mathbf{O}_{E_2}$, respectively.

Fig. 3: Diagram of the Corr-MMF module and its CAM. (a) Diagram of the proposed Corr-MMF module. The network includes a two-stage encoder, a latent space, and a virtual decoder. The terminal attention mechanism (TAM) and channel attention mechanism (CAM) are also introduced in this two-stage encoder. (b) Diagram of the channel attention mechanism (CAM).

During the feature fusion stage, we adopt an add layer to integrate $\mathbf{O}_{E_1}$ and $\mathbf{O}_{E_2}$. The added feature $\mathbf{O}_F = \mathbf{O}_{E_1} + \mathbf{O}_{E_2} \in \mathbb{C}^{h\times w\times c}$ is then fed into a sub-encoder $E_3$, which further extracts critical information and completes the feature fusion. Note that the add layer does not increase the number of channels in $\mathbf{O}_F$. Instead, each channel now superimposes the features of both modalities and therefore carries more information. Consequently, the subsequent sub-encoder $E_3$ must be able to adaptively evaluate the importance of each channel in order to identify and emphasize crucial features. To this end, we introduce a channel attention mechanism (CAM) that assigns different weights to different channels [69], as depicted in Fig. 3(b). Specifically, CAM applies average pooling and max pooling to obtain the global statistics of each channel in the mixed features $\mathbf{O}_F$, denoted as $\mathrm{avgpool}(\mathbf{O}_F) \in \mathbb{C}^{1\times 1\times c}$ and $\mathrm{maxpool}(\mathbf{O}_F) \in \mathbb{C}^{1\times 1\times c}$, respectively. Shared convolutional layers with a kernel size of 1 then convert $\mathrm{avgpool}(\mathbf{O}_F)$ and $\mathrm{maxpool}(\mathbf{O}_F)$ into two sets of preliminary weight vectors.
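The following NumPy sketch covers the TAM masking in (11)–(13) and the CAM pooling and shared-convolution steps just described, including the sigmoid combination that (14) formalizes. It is a minimal illustration, not the authors' implementation. We assume that $d_{m,n}$ is the horizontal distance from the centre of grid cell $(m,n)$ to the LAV's ground projection, and we use real-valued feature maps. A single $c\times c$ matrix `W` stands in for the shared kernel-size-1 convolution, with the bias omitted; on a $1\times1\times c$ tensor such a convolution reduces to a matrix product. The function names and the grid spacing `cell` are hypothetical.

```python
import numpy as np


def terminal_mask(shape, lav_xy, sigma, cell=1.0):
    """Gaussian TAM mask of (11) on an (H, W) grid with spacing `cell` (metres).

    Assumption: d_{m,n} is the horizontal distance from the centre of cell
    (m, n) to the LAV's ground projection lav_xy = (x, y).
    """
    rows, cols = np.indices(shape)
    cx, cy = (cols + 0.5) * cell, (rows + 0.5) * cell
    d2 = (cx - lav_xy[0]) ** 2 + (cy - lav_xy[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(O_F, W):
    """CAM of (14) for a real feature map O_F of shape (h, w, c).

    W (c x c) plays the role of the shared kernel-size-1 convolution (bias
    omitted); using one shared layer is an assumption of this sketch.
    Returns V_c and M_C * O_F, the attention-weighted input to Conv in (15).
    """
    avg = O_F.mean(axis=(0, 1))               # avgpool(O_F), shape (c,)
    mx = O_F.max(axis=(0, 1))                 # maxpool(O_F), shape (c,)
    V_c = sigmoid(W @ avg + W @ mx)           # channel attention vector, (c,)
    M_C = np.broadcast_to(V_c, O_F.shape)     # broadcast to (h, w, c)
    return V_c, M_C * O_F


rng = np.random.default_rng(0)
G_gro = rng.random((32, 32))                  # hypothetical near-ground RSS map
M_G = terminal_mask(G_gro.shape, lav_xy=(10.5, 20.5), sigma=5.0)
masked = M_G * G_gro                          # Hadamard product, input to Conv in (12)
O_F = rng.random((8, 8, 4))                   # stand-in for O_E1 + O_E2
V_c, weighted = channel_attention(O_F, rng.standard_normal((4, 4)))
```

A larger `sigma` flattens the mask and spreads attention over more of the map. With `W` set to zero, every channel receives the neutral weight 0.5.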
The sum of these two weight vectors is then normalized via the Sigmoid function to generate the final channel attention vector $\mathbf{V}_c \in \mathbb{C}^{1\times 1\times c}$, which is given by

$$\mathbf{V}_c = \mathrm{Sigmoid}\left\{\mathrm{Conv}\left[\mathrm{avgpool}(\mathbf{O}_F)\right] + \mathrm{Conv}\left[\mathrm{maxpool}(\mathbf{O}_F)\right]\right\}. \tag{14}$$

By broadcasting $\mathbf{V}_c$ and multiplying it with $\mathbf{O}_F$, the sub-encoder $E_3$ focuses more on the weighted crucial features, which improves overall performance. Assuming that the sub-encoder $E_3$ has the same structure as $E_1$ and $E_2$, its output can be expressed as

$$\mathbf{O}_{E_3} = \mathrm{BN}\left(\mathrm{ReLU}\left(\mathrm{Conv}(\mathbf{M}_C \odot \mathbf{O}_F)\right)\right) \in \mathcal{Z}, \tag{15}$$

where $\mathbf{M}_C \in \mathbb{C}^{h\times w\times c}$ is the channel attention tensor obtained by broadcasting $\mathbf{V}_c$ to the dimensions of $\mathbf{O}_F$. As shown in Fig. 3(a), $\mathbf{O}_{E_3}$ represents the data manifold in the latent space $\mathcal{Z}$, which is precisely the output of the Corr-MMF module. For ease of understanding, we denote $\mathbf{O}_{E_3}$ as $\mathbf{O}_{\mathrm{CorrMMF}}$.

On top of this two-stage encoder, we adopt a virtual decoder $D$ to further ensure the maximal preservation of key features required by Criterion 1. The decoder $D$ is termed "virtual" because the Corr-MMF module itself only requires the output $\mathbf{O}_{\mathrm{CorrMMF}}$, while $D$ serves solely as optimization feedback during training.

2) Correlation Evaluation for Criterion 2: To satisfy Criterion 2, we need additional constraints that make the Corr-MMF module effectively preserve the correlated information between $\mathbf{G}_{gro}$ and $\mathbf{E}_h$. For ease of derivation and analysis, denote the two-stage encoder in Fig. 3(a) as the function $f(\cdot)$, where $f(\mathbf{Z}) = \mathbf{O}_{\mathrm{CorrMMF}}$ for the two-view input $\mathbf{Z} = (\mathbf{G}_{gro}, \mathbf{E}_h)$. We can then obtain the key features of $\mathbf{G}_{gro}$ and $\mathbf{E}_h$ separately by setting $\mathbf{E}_h = \mathbf{0}$ and $\mathbf{G}_{gro} = \mathbf{0}$, respectively, denoted as $f(\mathbf{Z}_{E=0})$ and $f(\mathbf{Z}_{G=0})$.
Let $f_i(\mathbf{Z}_{E=0})$ and $f_i(\mathbf{Z}_{G=0})$ denote the $i$-th elements of $f(\mathbf{Z}_{E=0})$ and $f(\mathbf{Z}_{G=0})$, respectively. Using the adjusted cosine similarity, the correlation can be expressed as
$$\mathrm{corr}\big[f(\mathbf{Z}_{E=0}), f(\mathbf{Z}_{G=0})\big] = \frac{\sum_{i=1}^{N} \Delta_{g,i}\,\Delta_{e,i}}{\sqrt{\sum_{i=1}^{N} \Delta_{g,i}^2}\,\sqrt{\sum_{i=1}^{N} \Delta_{e,i}^2}}, \tag{16}$$
where $\Delta_{g,i} = f_i(\mathbf{Z}_{E=0}) - \bar{f}(\mathbf{Z}_{E=0})$, $\Delta_{e,i} = f_i(\mathbf{Z}_{G=0}) - \bar{f}(\mathbf{Z}_{G=0})$, and $\bar{f}(\mathbf{Z}_{E=0})$ and $\bar{f}(\mathbf{Z}_{G=0})$ are the mean values of the key features from $\mathbf{G}_{\mathrm{gro}}$ and $\mathbf{E}_h$, respectively. Maximizing (16) enables the Corr-MMF module to preserve the correlation between the features extracted from $\mathbf{G}_{\mathrm{gro}}$ and $\mathbf{E}_h$, which fulfills the requirement of Criterion 2.

3) Objective Function for the Corr-MMF Module: Based on the network in 1) and the correlation evaluation in 2), we develop a matching objective function to train the Corr-MMF module so that it satisfies Criteria 1 and 2 jointly. Denote the network in Fig. 3(a) as $\mathcal{F}$, with trainable parameters $\vartheta$; note that $\mathcal{F}$ is formed by cascading the encoder $f(\cdot)$ with the virtual decoder $D$. First, we introduce a fusion-reconstruction loss $\mathcal{L}_{\mathrm{fusion}}(\vartheta)$ to ensure that the fused features $f(\mathbf{Z})$ can be restored to the original two-view input $\mathbf{Z} = (\mathbf{G}_{\mathrm{gro}}, \mathbf{E}_h)$, which is given by
$$\mathcal{L}_{\mathrm{fusion}}(\vartheta) = \big\|\mathcal{F}(\mathbf{Z};\vartheta) - \mathbf{Z}\big\|_F. \tag{17}$$
Next, a correlation loss $\mathcal{L}_{\mathrm{corr}}(\vartheta)$ is adopted to ensure that the features extracted and fused by the Corr-MMF module preserve the correlation between $\mathbf{G}_{\mathrm{gro}}$ and $\mathbf{E}_h$. According to (16), $\mathcal{L}_{\mathrm{corr}}(\vartheta)$ is designed as
$$\mathcal{L}_{\mathrm{corr}}(\vartheta) = 1 - \mathrm{corr}\big[f(\mathbf{Z}_{E=0}), f(\mathbf{Z}_{G=0})\big], \tag{18}$$
where $\mathcal{L}_{\mathrm{corr}}(\vartheta) \in [0, 2]$ because the correlation in (16) lies in $[-1, 1]$. Additionally, we introduce a cross-reconstruction loss $\mathcal{L}_{\mathrm{cross}}(\vartheta)$ to assist model training, given by
$$\mathcal{L}_{\mathrm{cross}}(\vartheta) = \big\|\mathcal{F}(\mathbf{Z}_{G=0};\vartheta) - \mathbf{G}_{\mathrm{gro}}\big\|_F + \big\|\mathcal{F}(\mathbf{Z}_{E=0};\vartheta) - \mathbf{E}_h\big\|_F. \tag{19}$$
The cross-reconstruction loss $\mathcal{L}_{\mathrm{cross}}(\vartheta)$ carries a physical meaning in practice: owing to the relationship between $\mathbf{E}_h$ and $\mathbf{G}_{\mathrm{gro}}$, it is theoretically possible to recover $\mathbf{E}_h$ from $\mathbf{G}_{\mathrm{gro}}$ and vice versa [33]. This process emphasizes not only the reconstruction of $\mathbf{E}_h$ and $\mathbf{G}_{\mathrm{gro}}$ but also their correlation, further reinforcing both $\mathcal{L}_{\mathrm{corr}}(\vartheta)$ and $\mathcal{L}_{\mathrm{fusion}}(\vartheta)$. Taking $\mathcal{L}_{\mathrm{fusion}}(\vartheta)$, $\mathcal{L}_{\mathrm{corr}}(\vartheta)$, and $\mathcal{L}_{\mathrm{cross}}(\vartheta)$ into consideration, we present the objective function for the Corr-MMF module as follows:
$$\mathcal{L}_{\mathrm{obj}}(\vartheta) = \mathcal{L}_{\mathrm{fusion}}(\vartheta) + \mathcal{L}_{\mathrm{cross}}(\vartheta) + \lambda\,\mathcal{L}_{\mathrm{corr}}(\vartheta), \tag{20}$$
where $\lambda$ controls the proportion among the different losses. When $\lambda$ approaches 0, the Corr-MMF module disregards the cross-modal correlation, violating Criterion 2. Conversely, when $\lambda$ tends to infinity, the module neglects the preservation of key features and the reconstruction of the distinct modalities, violating Criterion 1. Therefore, $\lambda$ must be selected carefully to balance the two criteria in the Corr-MMF module.

B. The MultiModal Representation (MMR) Module

In conventional multimodal learning tasks, raw multimedia data cannot be directly processed by machines, so a unified description and processing of the data is necessary, known as MultiModal Representation (MMR) [60], [61]. By definition, MMR refers to the process of representing information from several media in tensor or vector form [60]. While similar to MMF, it places greater emphasis on the uniformity of data representations than on the feature fusion among different modalities.
In our 3D-CF construction task, the horizontal geographic information $\mathbf{E}_h$ and the CF measurements $\mathbf{G}_{\mathrm{gro}}$ have already been fused by the Corr-MMF module ($f(\mathbf{Z}) = \mathbf{O}_{\mathrm{CorrMMF}}$); hence, we need to further process the vertical geographic information $\mathbf{E}_v$ so that it aligns with the data representation of $\mathbf{O}_{\mathrm{CorrMMF}}$, which motivates our design of the MMR module. Specifically, the MMR module takes $\mathbf{E}_v$ as input and employs an auto-encoder as its core architecture, which comprises an input layer, a downsampling layer, a latent space, an upsampling layer, and an output layer, as shown in Fig. 4(a). The input layer is primarily used for data reshaping, transforming $\mathbf{E}_v$ into an easily processable tensor $\mathbf{O}_{E_v} \in \mathbb{C}^{h\times w\times c}$. The subsequent downsampling and upsampling layers are similar to the encoder $E_3$ and the virtual decoder $D$ in Fig. 3(a), respectively. This structural similarity ensures the uniformity between $\mathbf{O}_{\mathrm{CorrMMF}}$ and $\mathbf{O}_{E_v}$, which is the core of the MMR module. Finally, the output layer, mirroring the input layer, is appended to reconstruct the data back to $\mathbf{E}_v$.

(a) Diagram of the MMR module. It employs an auto-encoder as the core architecture, including an input layer, a downsampling layer, a latent space, an upsampling layer, and an output layer. The spatial attention mechanism (SAM) is introduced as well. (b) Diagram of the spatial attention mechanism (SAM). Fig. 4: Diagram of the MMR module and its SAM.
Additionally, through a careful analysis of the data structure of $\mathbf{O}_{E_v}$, we observe that the pixels in the feature matrix of each channel reflect specific characteristics of the spatial communication environment at different positions. Therefore, the MMR module is expected to adaptively evaluate the importance of each pixel, thus highlighting critical environmental features at key locations. To this end, we introduce a spatial attention mechanism (SAM) that assigns different weights to different pixels in each channel of $\mathbf{O}_{E_v}$ [69], as depicted in Fig. 4(b). Specifically, SAM applies max pooling and average pooling along the channel dimension, i.e., over the channel fiber at each pixel of $\mathbf{O}_{E_v}$, to obtain two feature maps, denoted as $\mathrm{maxpool}(\mathbf{O}_{E_v}) \in \mathbb{C}^{h\times w\times 1}$ and $\mathrm{avgpool}(\mathbf{O}_{E_v}) \in \mathbb{C}^{h\times w\times 1}$, respectively. We then construct the integrated statistical information by concatenating $\mathrm{maxpool}(\mathbf{O}_{E_v})$ and $\mathrm{avgpool}(\mathbf{O}_{E_v})$ and feed it into a convolutional layer to derive the preliminary weight matrix. Afterwards, the Sigmoid function is applied for normalization to acquire the final spatial attention matrix $\mathbf{V}_s \in \mathbb{C}^{h\times w\times 1}$, which is given by
$$\mathbf{V}_s = \mathrm{Sigmoid}\big\{\mathrm{Conv}\big[\mathrm{Concat}\big(\mathrm{avgpool}(\mathbf{O}_{E_v}), \mathrm{maxpool}(\mathbf{O}_{E_v})\big)\big]\big\}. \tag{21}$$
By broadcasting $\mathbf{V}_s$ and multiplying it with $\mathbf{O}_{E_v}$, the downsampling process focuses more on the weighted crucial environmental features at key locations, thereby improving the performance of the MMR module. Overall, the output of the MMR module, namely the low-dimensional representation of $\mathbf{E}_v$, can be expressed as
$$\mathbf{O}_{\mathrm{MMR}} = \mathrm{BN}\big(\mathrm{ReLU}\big(\mathrm{Conv}(\mathbf{M}_S \odot \mathbf{O}_{E_v})\big)\big), \tag{22}$$
where $\mathbf{M}_S \in \mathbb{C}^{h\times w\times c}$ is the spatial attention tensor obtained by broadcasting $\mathbf{V}_s$, with dimensions identical to those of $\mathbf{O}_{E_v}$.

C. The Channel State Information Regression (CSI-R) Module

In the 3D-CF construction task, $(\mathbf{E}_h, \mathbf{G}_{\mathrm{gro}})$ and $\mathbf{E}_v$ have been transformed into the low-dimensional representations $\mathbf{O}_{\mathrm{CorrMMF}}$ and $\mathbf{O}_{\mathrm{MMR}}$, respectively, which share identical data structures. Here, $\mathbf{O}_{\mathrm{CorrMMF}}$ embodies the 3D-CF patterns in the horizontal direction, and $\mathbf{O}_{\mathrm{MMR}}$ embodies the environment patterns in the vertical direction. Based on this, we next design the CSI-R module to predict the channel information at a specific LAV location $X$ in low-altitude airspace, thereby enabling flexible 3D-CF construction free from grid constraints.

Fig. 5: Diagram of the proposed CSI-R module. The network includes a feature embedding layer and a fully connected regression layer.

As illustrated in Fig. 5, the CSI-R module takes $\mathbf{O}_{\mathrm{CorrMMF}}$ and $\mathbf{O}_{\mathrm{MMR}}$ as inputs and consists of a feature embedding layer and a fully connected (FC) regression layer. In the feature embedding layer, we first partition the feature maps of $\mathbf{O}_{\mathrm{CorrMMF}}$ and $\mathbf{O}_{\mathrm{MMR}}$ into several equally sized patches using a sliding window. A convolutional layer whose kernel size matches the patch size then generates a sequence of embedding vectors. Flattening these sequences yields a one-dimensional embedding output, which is fed into the subsequent FC regression layer. The feature embedding process captures elementwise similarities in the feature maps of $\mathbf{O}_{\mathrm{CorrMMF}}$ and $\mathbf{O}_{\mathrm{MMR}}$, respectively, placing elements with higher similarity closer in the embedding space, thus accelerating convergence and enhancing accuracy. The FC regression layer consists of several dense layers, each followed by a dropout layer to prevent severe overfitting.
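The patch-based feature embedding described above can be sketched in a few lines of pure Python. This is a minimal illustration under assumptions that the text does not fix: the feature maps are real-valued nested lists of shape $h\times w\times c$, the sliding window moves with a stride equal to the patch size $p$ (so a convolution with kernel size $p$ reduces to a linear projection of each flattened patch), the same projection weights are shared by $\mathbf{O}_{\mathrm{CorrMMF}}$ and $\mathbf{O}_{\mathrm{MMR}}$, and the two flattened embeddings are simply concatenated. The names `patch_embed`, `flatten`, `o_corrmmf`, and `o_mmr` and all numeric values are hypothetical.

```python
def patch_embed(fmap, weights, bias, p):
    """Split an h x w x c feature map into non-overlapping p x p patches and
    project each flattened patch to a d-dimensional vector.

    Equivalent to a convolution with kernel size p and (assumed) stride p.
    weights: d rows, each of length p*p*c; bias: length d.
    """
    h, w = len(fmap), len(fmap[0])
    if h % p or w % p:
        raise ValueError("height and width must be multiples of the patch size")
    tokens = []
    for i0 in range(0, h, p):
        for j0 in range(0, w, p):
            # Flatten one patch in (row, column, channel) order.
            patch = [v for i in range(i0, i0 + p)
                     for j in range(j0, j0 + p)
                     for v in fmap[i][j]]
            tokens.append([sum(wk * x for wk, x in zip(row, patch)) + b
                           for row, b in zip(weights, bias)])
    return tokens  # (h/p) * (w/p) embedding vectors, each of length d

def flatten(tokens):
    """Concatenate a sequence of embedding vectors into one 1D list."""
    return [v for t in tokens for v in t]

h, w, c, p, d = 4, 4, 2, 2, 3  # hypothetical sizes
o_corrmmf = [[[0.1 * (i + j + k) for k in range(c)] for j in range(w)] for i in range(h)]
o_mmr = [[[0.05 * (i * j + k) for k in range(c)] for j in range(w)] for i in range(h)]
weights = [[0.1 * ((r + s) % 5) for s in range(p * p * c)] for r in range(d)]
bias = [0.0] * d

embedding = (flatten(patch_embed(o_corrmmf, weights, bias, p))
             + flatten(patch_embed(o_mmr, weights, bias, p)))
print(len(embedding))  # 2 maps x 4 patches x 3 dims = 24
```

With stride $p$, each embedding vector depends on exactly one patch; an overlapping sliding window would instead use a stride smaller than $p$ and produce more tokens.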
During the regression process, the 3D coordinates of the LAV are vectorized and incorporated as conditional information, prompting the prediction of location-specific channel characteristics for 3D-CF construction. In summary, the mapping relationship $\Psi$ can be approximated through the cascade of the Corr-MMF, MMR, and CSI-R modules, thereby enabling the construction of the CSI-tuples in (8) and the configuration of 3D-CF according to practical communication requirements. This offers new insights for low-altitude communications.

IV. NUMERICAL RESULTS

In this section, numerical results are provided to evaluate the performance of the proposed 3D-CF multimodal framework. First, we introduce the generation of our Sionna-based datasets and detail the experimental setup. Then, we explore the impact of $\lambda$ on the accuracy of 3D-CF construction. Next, employing Kriging interpolation, GPR, GAN, and FL as benchmarks, we compare the 3D-CF construction performance across different scenarios, demonstrating the prediction accuracy and generalization capability of the proposed framework. Finally, we analyze the computational complexity.

A. Datasets

All datasets are generated by NVIDIA Sionna, an open-source library for research on wireless communication systems [70]. Specifically, we first download the geographic environment maps of Nanjing, China, from OpenStreetMap (OSM) [71], a collaborative mapping project maintained by a global community of volunteers who continuously update and validate geographic information. These maps are then imported into Blender to construct communication scenarios, which are subsequently transferred to Sionna for dataset generation via ray tracing (RT).
Note that all selected areas represent typical urban macro-cell or micro-cell scenarios with varying building shapes, quantities, and distributions, which ensures strong data diversity to support model training and to evaluate generalization capability. In our experiments, the BS is randomly deployed in the target region with $N_{\mathrm{BS}} = 64$ antenna elements, and the LAV flight altitudes are confined to 25–80 m. The CF measurement data $\mathbf{G}_{\mathrm{gro}}$ are uniformly sampled at a height of 1.5 m with a resolution of 1 m. There is an inherent resolution-accuracy-complexity trade-off: denser near-ground sampling improves the accuracy of 3D-CF construction but increases the computational complexity. Hence, the resolution of $\mathbf{G}_{\mathrm{gro}}$ should be determined based on practical limitations in real-world scenarios. Using the wireless communication simulation interface of Sionna, we conduct ray tracing to obtain all possible LOS and NLOS paths between the BS and the LAVs, establish the channel model in accordance with (1), compute the RSS according to (7), and construct the 3D-CF by (8). To further enhance the stability of network training, we adopt a min-max linear normalization to scale the raw RSS in the 3D-CF into $[0, 1]$, which is given by
$$g'_m = \max\left\{\frac{g_m - g_{\mathrm{thr}}}{(g_m)_{\max} - g_{\mathrm{thr}}},\; 0\right\}, \tag{23}$$
where $g_{\mathrm{thr}}$ is the RSS threshold, since signals below $g_{\mathrm{thr}}$ are practically undetectable by LAVs in real-world scenarios [59], and $(g_m)_{\max}$ is the maximum RSS. Additional parameters used to generate the datasets are listed in Table I.

TABLE I: Parameters used to generate datasets in the platform of Sionna.

| Parameter | Value |
|---|---|
| Communication scenario | Urban macro-cell / micro-cell |
| Size of the target region | 256 × 256 |
| Number of antenna elements for BS | 64 |
| BS height | 25 m |
| LAV flight altitudes | 25–80 m |
| Sampling height of $\mathbf{G}_{\mathrm{gro}}$ | 1.5 m |
| Carrier frequency | 3.5 GHz |
| Maximum number of paths | 5 |
| RSS threshold | −147 dB |
| Transmit power | 23 dBm |

B. Experiment Setup

In correspondence with the 3D-CF multimodal framework in Section III, we configure all hyper-parameters of the three modules as listed in Table II. Specifically, for the Corr-MMF module, the batch size is set to 128 and the number of epochs to 60, with the learning rate set to 0.0001 for the first 35 epochs and then linearly decaying to zero. For the MMR module, the number of training epochs is set to 60, and the learning rate is initialized at 0.001 for the first 35 epochs and subsequently reduced linearly to zero. The training of the CSI-R module requires 15 epochs with a reduced batch size of 32; its learning rate is set to 0.0001 for the first 10 epochs, followed by a linear decay to zero. All simulations are implemented in TensorFlow on a computer equipped with an Intel(R) Core(TM) i7-12700 CPU and a GeForce RTX 4090 GPU.

To evaluate the performance of the proposed multimodal framework in 3D-CF construction, we employ the mean absolute error (MAE) and root mean square error (RMSE) as metrics, which can be expressed as
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\tilde{g}'_m - g'_m\right|, \tag{24}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\tilde{g}'_m - g'_m\right)^2}, \tag{25}$$
where $\tilde{g}'_m$ is the predicted RSS in the CSI-tuples of the 3D-CF. Since MAE assigns equal weights to all errors, it intuitively reflects the overall 3D-CF construction performance. Conversely, RMSE is more sensitive to outliers, facilitating the detection of extreme prediction deviations in our proposed framework. Combining these two metrics enables a more comprehensive evaluation of the experimental results.

TABLE II: Hyper-parameters of the Corr-MMF, MMR, and CSI-R modules in the 3D-CF multimodal framework.

| Parameter | Corr-MMF | MMR | CSI-R |
|---|---|---|---|
| Epochs | 60 | 50 | 15 |
| Epochs before LR decay | 35 | 35 | 10 |
| Learning rate | 0.005 | 0.001 | 0.0001 |
| Batch size | 128 | 128 | 32 |
| Optimizer | Adam | Adam | Adam |

C. Influence of $\lambda$ in the Corr-MMF Module

In the Corr-MMF module, $\lambda$ influences the network performance by controlling the proportions among the different losses. Therefore, in this subsection, we examine its impact on the 3D-CF construction error and analyze its effect on the convergence behavior of the Corr-MMF module.

Fig. 6: MAE and RMSE performance of the 3D-CF multimodal framework under different $\lambda$ (MAE, RMSE, and captured correlation plotted for $\lambda \in \{0, 0.5, 1, 1.5, 5, 10\}$).

As shown in Fig. 6, the proposed multimodal framework achieves the minimal 3D-CF construction error at $\lambda = 1$, with MAE = 0.029 and RMSE = 0.060, capturing approximately 89.6% of the correlation between $\mathbf{E}_h$ and $\mathbf{G}_{\mathrm{gro}}$. As $\lambda$ decreases from 1, the weight of $\mathcal{L}_{\mathrm{corr}}(\vartheta)$ is reduced and the 3D-CF construction error increases progressively. Notably, at $\lambda = 0$, MAE and RMSE deteriorate by 24.1% and 15%, respectively, and the captured correlation between $\mathbf{E}_h$ and $\mathbf{G}_{\mathrm{gro}}$ drops to 19.8%. In practice, an excessively small $\lambda$ prevents the Corr-MMF module from extracting correlated features between $\mathbf{E}_h$ and $\mathbf{G}_{\mathrm{gro}}$, thereby disabling feature fusion and degrading the construction performance of the proposed framework. Similarly, as $\lambda$ increases beyond 1, the weight of $\mathcal{L}_{\mathrm{corr}}(\vartheta)$ rises and the 3D-CF construction performance declines. Especially at $\lambda = 10$, where $\lambda\mathcal{L}_{\mathrm{corr}}(\vartheta)$ effectively dominates the other losses, MAE and RMSE deteriorate by 34.4% and 13.3%, respectively.
Here, an excessively large $\lambda$ unreasonably overemphasizes the correlation between $\mathbf{E}_h$ and $\mathbf{G}_{\mathrm{gro}}$ (capturing as much as 99.9%), completely suppressing their distinctive features and leading to a collapse of the Corr-MMF module. To conclude, the choice of $\lambda$ must strike an effective balance between distinctive feature extraction and correlated feature fusion, thus optimizing the performance of the Corr-MMF module and achieving accurate 3D-CF construction.

Fig. 7: The convergence behavior of the validation loss for the Corr-MMF module under different $\lambda$ (validation MSE versus training epochs 0–60).

Fig. 7 presents the convergence behavior of the validation loss for the Corr-MMF module under different $\lambda$. A basic observation is that a larger $\lambda$ corresponds to a greater $\lambda\mathcal{L}_{\mathrm{corr}}(\vartheta)$ and, consequently, to higher values of the objective function $\mathcal{L}_{\mathrm{obj}}(\vartheta)$. Nevertheless, the validation loss is lower for $\lambda = 1$ than for $\lambda = 0.5$, which indicates that $\lambda = 1$ optimizes the network performance to the greatest extent. This result is consistent with the conclusion drawn from Fig. 6. Furthermore, a smaller $\lambda$ leads to smoother loss curves and more stable network training, whereas a larger $\lambda$ results in greater fluctuations. This is because a smaller $\lambda$ keeps the magnitudes of the different losses balanced, enabling steadier execution of the gradient descent algorithm. Conversely, a larger $\lambda$ makes $\mathcal{L}_{\mathrm{corr}}(\vartheta)$ more dominant, rendering the network significantly more susceptible to its oscillations and hence causing fluctuations across epochs. To conclude, variations in $\lambda$ substantially impact the stability of network training.

D. Evaluation of the 3D-CF Construction Performance

In this subsection, we compare the performance of the proposed 3D-CF multimodal framework with four benchmarks and evaluate its generalization capability under different scenarios.

1) Benchmarks: To evaluate the performance of our proposed framework, Kriging interpolation [16], GPR [17], GAN [30], and FL [72] are adopted as benchmarks.
• Kriging interpolation [16]: Kriging is a classical interpolation method for constructing 3D-CF. Based on the prior sampling data, it estimates the RSS at arbitrary locations by incorporating distances and modeling the 3D spatial correlation through different variograms. The Kriging interpolation method does not require a grid-based model for CF, thus making flexible 3D-CF construction possible.
• GPR [17]: Gaussian process regression is a widely used statistical nonparametric model for 3D-CF construction. It can construct the optimal approximator of the RSS distribution by designing specific kernel functions based on signal propagation characteristics. This GPR-based method does not require a grid-based model either, which also makes 3D-CF construction more flexible.
• GAN [30]: Generative adversarial network is a significant deep generative model employed for 3D-CF construction. Unlike the Kriging-based and GPR-based methods, GAN perceives the communication environment and uses it as conditional information to generate 3D-CF. However, it requires a grid-based model that represents CF as a multi-channel image, which necessitates a uniform partitioning of the target area and assigns the same RSS to all LAVs within the same grid cell, resulting in lower accuracy and reduced flexibility.
• FL [72]: Federated learning represents a state-of-the-art framework to recover 3D-CF. Embedded with deep neural networks, it enables the collaborative utilization of data from multiple LAVs to establish the global 3D-CF. Since the regression-based FL framework requires no special assumptions about the CF model, it balances both accuracy and flexibility in 3D-CF construction.

The sampling rate for Kriging interpolation and GPR is set to 5%, while the training, validation, and test datasets are identical for all other ML-based methods.

(a) 3D-CF for scenario 1. (b) 3D-CF for scenario 2. (c) 3D-CF for scenario 3. (d) 3D-CF for scenario 4. Fig. 8: Illustrations of 3D-CF under four randomly selected communication scenarios. The RSS distributions at four horizontal planes, specifically at heights of 10 m, 20 m, 30 m, and 40 m, are illustrated along the Z axis.

TABLE III: Comparison of the 3D-CF multimodal framework with state-of-the-art approaches under different communication scenarios regarding the RMSE and MAE performance.

| Method | Scenario 1 RMSE | Scenario 1 MAE | Scenario 2 RMSE | Scenario 2 MAE | Scenario 3 RMSE | Scenario 3 MAE | Scenario 4 RMSE | Scenario 4 MAE |
|---|---|---|---|---|---|---|---|---|
| Kriging [16] | 0.522 | 0.467 | 0.348 | 0.287 | 0.340 | 0.289 | 0.459 | 0.384 |
| GPR [17] | 0.343 | 0.223 | 0.291 | 0.201 | 0.304 | 0.161 | 0.325 | 0.194 |
| GAN [30] | 0.234 | 0.195 | 0.208 | 0.156 | 0.269 | 0.221 | 0.266 | 0.223 |
| Federated Learning [72] | 0.088 | 0.067 | 0.041 | 0.034 | 0.035 | 0.029 | 0.056 | 0.039 |
| Multimodal framework (ours) | 0.069 ↓ | 0.044 ↓ | 0.026 ↓ | 0.019 ↓ | 0.024 ↓ | 0.021 ↓ | 0.045 ↓ | 0.027 ↓ |

2) Comparison to Different Benchmarks: Table III presents the comparison of 3D-CF construction performance between the proposed multimodal framework and the four baselines. First, in Scenario 1, compared with non-AI methods such as Kriging interpolation [16] and GPR [17], the 3D-CF multimodal framework reduces the RMSE by factors of 7.5 and 4.9, respectively, and the MAE by factors of 10.6 and 5.1, respectively, significantly enhancing the accuracy of 3D-CF construction. Fundamentally, the Kriging-based method and GPR merely fit the data itself without exploring the impact of communication environments on channel characteristics. Particularly in low-altitude airspace, where the physical environments become more complex, pure data fitting is no longer sufficient to accurately predict the distribution of channel information. Secondly, compared with the classical generative model GAN [30], the proposed framework achieves 3.4-fold and 4.4-fold advantages in RMSE and MAE, respectively. Note that in low-altitude airspace, the physical environment comprises different data modalities in the horizontal and vertical dimensions, and the LAV coordinates become three-element arrays. Since the GAN in [30] is limited to processing single-modal data, its 3D-CF construction accuracy is inevitably compromised. Moreover, the GAN needs to model 3D-CF as images, resulting in a rigid construction that cannot adapt its density non-uniformly to practical communication requirements. Our proposed multimodal framework addresses these limitations, thereby significantly reducing 3D-CF construction errors and enhancing flexibility. Finally, compared with the state-of-the-art method, the proposed framework still outperforms the FL-based approach [72], whose RMSE and MAE are 27.5% and 52.2% higher than ours, respectively. This stems from the feature extraction and feature fusion for diverse data modalities via the Corr-MMF and MMR modules, which enable the final regression network to comprehensively learn the mapping relationship between the LAV's positions and its RSS.
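For reference, the metrics in (24) and (25) and the ratios quoted in this comparison can be reproduced with a few lines of Python. The sketch below is ours, not the authors' evaluation code; it takes the Scenario 1 column of Table III and reports each baseline's error relative to the multimodal framework, both as a factor (baseline/ours) and as a relative increase ((baseline − ours)/ours). Under this reading the printed values match the factors of 7.5, 4.9, 10.6, 5.1, 3.4, and 4.4 and the percentages of 27.5% and 52.2% quoted above, up to the last reported digit (the text appears to truncate rather than round, e.g., 7.57 versus 7.5).

```python
import math

def mae(pred, true):
    """Mean absolute error, as in eq. (24)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def rmse(pred, true):
    """Root mean square error, as in eq. (25)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def relative_increase(baseline, ours):
    """How much larger a baseline's error is than ours, as a fraction of ours."""
    return (baseline - ours) / ours

# Scenario 1 column of Table III: (RMSE, MAE).
scenario1 = {
    "Kriging": (0.522, 0.467),
    "GPR": (0.343, 0.223),
    "GAN": (0.234, 0.195),
    "FL": (0.088, 0.067),
}
ours = (0.069, 0.044)

for name, (r, m) in scenario1.items():
    print(f"{name:8s} RMSE x{r / ours[0]:.2f} (+{100 * relative_increase(r, ours[0]):.1f}%)  "
          f"MAE x{m / ours[1]:.2f} (+{100 * relative_increase(m, ours[1]):.1f}%)")
```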
To conclude, owing to the CSI-tuples-based model and the modular design, the proposed multimodal framework exhibits superior performance and high flexibility in 3D-CF construction.

3) Comparison Under Different Scenarios: Fig. 8 presents the 3D-CF across four randomly selected communication scenarios with different urban structures or building densities. For illustrative purposes only, the RSS distribution across four horizontal planes at 10 m, 20 m, 30 m, and 40 m is shown along the Z axis. Table III provides the comparison of RMSE and MAE performance across these scenarios. It can be observed that, regardless of the scenario, our proposed multimodal framework always achieves 3D-CF construction with smaller errors than the baselines, demonstrating its generalization ability with RMSE and MAE standard deviations of 0.0012 and 0.0014, respectively.

E. Ablation Experiments

In this subsection, we conduct two ablation experiments to demonstrate the contributions of the different input modalities and of the attention mechanisms, respectively, to the proposed 3D-CF multimodal framework.

1) Input Modality: Table IV presents the contribution of each input modality to the proposed framework. First, when $\mathbf{E}_v$ is missing, the model fails to learn the vertical signal-propagation characteristics, leading to a drastic performance deterioration. This outcome indicates that 3D-CF construction must account for the three-dimensional nature of the low-altitude environment. If existing 2D-CF construction methods, which only consider the horizontal building distribution, are directly applied to low-altitude scenarios, severe model mismatch will arise.
Secondly, when both $\mathbf{E}_h$ and $\mathbf{G}_{\mathrm{gro}}$ are missing, the model cannot learn the horizontal characteristics of the CSI distribution, which also leads to significant degradation in model performance. However, when only one of them is missing, the RMSE of the 3D-CF constructed by the proposed architecture degrades by only 4.4%. In practice, both horizontal environmental information and near-ground measurements reflect the signal-propagation characteristics in the horizontal direction, so the absence of either one alone does not cause model collapse.

TABLE IV: Contribution of each input modality to the 3D-CF multimodal framework ($\mathbf{E}_h$, $\mathbf{E}_v$: geographic environments $\mathbf{E}$; $\mathbf{G}_{\mathrm{gro}}$: measurements; ✓ used, ✗ removed).

| $\mathbf{E}_h$ | $\mathbf{E}_v$ | $\mathbf{G}_{\mathrm{gro}}$ | RMSE | MAE |
|---|---|---|---|---|
| ✓ | ✓ | ✓ | 0.045 | 0.027 |
| ✓ | ✓ | ✗ | 0.047 | 0.028 |
| ✗ | ✓ | ✓ | 0.047 | 0.029 |
| ✓ | ✗ | ✓ | 0.687 | 0.675 |
| ✗ | ✗ | ✓ | 0.700 | 0.692 |

In summary, the ablation study on input modalities demonstrates the effectiveness and necessity of each type of prior information, thereby validating the rationale of our proposed 3D-CF multimodal framework.

2) Attention Mechanisms: Table V presents the impact of the three attention mechanisms, namely TAM, CAM, and SAM, on 3D-CF construction. First, the model performance deteriorates severely when TAM is removed, whereas the degradation is less pronounced when CAM or SAM is removed. In principle, TAM operates directly on the inputs, primarily strengthening the near-ground measurement data near the LAV projection. In contrast, CAM and SAM function internally within the network, emphasizing key features through self-learning. Once TAM is absent, the data itself lacks essential weighting, leading to severe performance degradation regardless of the internal network design.
Secondly, the performance decline in the absence of CAM is less severe than that when SAM is missing. Since both CAM and TAM operate within the Corr-MMF module, even if CAM is removed, TAM still enables this module to achieve relatively effective training.

TABLE V: Ablation study on different attention mechanisms for 3D-CF construction (✓ used, ✗ removed).

| CAM (Corr-MMF) | TAM (Corr-MMF) | SAM (MMR) | RMSE | MAE |
|---|---|---|---|---|
| ✓ | ✓ | ✓ | 0.045 | 0.027 |
| ✗ | ✓ | ✓ | 0.065 | 0.052 |
| ✓ | ✗ | ✓ | 0.546 | 0.455 |
| ✓ | ✓ | ✗ | 0.290 | 0.282 |

In summary, the ablation study on attention mechanisms demonstrates that TAM, CAM, and SAM all play indispensable roles in the proposed multimodal framework, enabling accurate and efficient 3D-CF construction.

F. Complexity Comparison

In this subsection, we analyze the complexity of the proposed 3D-CF multimodal framework versus the benchmarks and present comparative results of their inference times.

Fig. 9: Comparison of the 3D-CF inference time between the proposed multimodal framework and state-of-the-art approaches (inference time in seconds: Kriging 42.73, GPR 18.17, GAN 2.81, FL 5.18, proposed scheme 3.06).

Define $N$ as the amount of training data. For the Kriging-based method, we must solve the Kriging system of equations of size $N \times N$ to complete the construction of 3D-CF, whose complexity remains $\mathcal{O}(N^3)$ even when employing the Cholesky decomposition [73]. For the GPR-based method, the complexity of the covariance matrix inversion is $\mathcal{O}(N^3)$, and the complexity of the marginal likelihood approximation in the process of solving the posterior probability is $\mathcal{O}(N)$. Consequently, the overall complexity of GPR-based 3D-CF construction is $\mathcal{O}(N^3)$ [74].
For the GAN-based approach, the complexity can be expressed as $\sum_{\ell=1}^{L_{\mathrm{GAN}}} \mathcal{O}\left(N C_{\ell-1} H_\ell W_\ell C_\ell K_\ell^2\right)$, where $L_{\mathrm{GAN}}$ is the number of convolution layers, $K_\ell$ is the size of the convolution kernel, and $H_\ell$, $W_\ell$, and $C_\ell$ are the height, width, and number of channels of the $\ell$-th layer's output, respectively. For the FL-based approach, which is mainly composed of multilayer perceptrons, the complexity is given by $\sum_{p=1}^{P_{\mathrm{FL}}} \mathcal{O}\left(N d_p h_p\right)$, where $P_{\mathrm{FL}}$ is the number of hidden layers, and $d_p$ and $h_p$ are the input dimension and the number of neurons of the $p$-th hidden layer, respectively. The complexity of our proposed 3D-CF multimodal framework is the sum of the complexities of its three modules, i.e., $\sum_{\ell=1}^{L_{\mathrm{CorrMMF}}+L_{\mathrm{MMR}}} \mathcal{O}\left(N C_{\ell-1} H_\ell W_\ell C_\ell K_\ell^2\right) + \sum_{p=1}^{P_{\mathrm{CSIR}}} \mathcal{O}\left(N d_p h_p\right)$, where $L_{\mathrm{CorrMMF}}$ and $L_{\mathrm{MMR}}$ denote the numbers of convolution layers in the Corr-MMF and MMR modules, $P_{\mathrm{CSIR}}$ denotes the number of hidden layers in the CSI-R module, and the first term covers both the Corr-MMF module and the MMR module owing to their similar network structures.

Fig. 9 further presents the inference time of 3D-CF construction for the different methods. As observed, the proposed multimodal framework reduces the inference time substantially compared with the Kriging-based and GPR-based methods (3.06 s versus 42.73 s and 18.17 s, respectively). Compared with the GAN-based approach, it achieves lower construction errors and greater flexibility at a comparable time complexity. Relative to the state-of-the-art FL algorithm, the proposed scheme reduces the inference time by about 41%, from 5.18 s to 3.06 s. It should be emphasized that the proposed 3D-CF framework is designed to be trained, run for inference, and deployed at the BS side. Given the abundant computational resources available at the BS, the framework is practical to deploy.

To conclude, our multimodal framework exhibits competitive advantages in computational complexity, indicating great potential for practical applications.

V. CONCLUSION

In this paper, we proposed a modularized multimodal framework to construct 3D-CF for low-altitude communications. Firstly, we established the 3D-CF model based on ground-to-LAV Rician fading channels, defined as a collection of CSI-tuples, each composed of the LAV's positions and its corresponding channel information. Owing to the heterogeneous structures of different prior data, such as LAV coordinates, geographic environment maps, and sampling data, we transformed the 3D-CF construction problem into a multimodal regression task and accordingly proposed a high-efficiency modularized multimodal framework, in which the Corr-MMF module and the MMR module were designed to extract features of the CSI distribution in the horizontal and vertical directions, and the CSI-R module was developed to estimate the target CSI and reconstruct the 3D-CF. Numerical results demonstrated the competitive performance and generalization ability of the proposed 3D-CF multimodal framework, which attains an accuracy improvement of at least 27.5% over the benchmarks under different communication scenarios. We also analyzed the computational complexity and illustrated the framework's superiority in terms of inference time. In the future, it will be interesting to quantify the impact of measurement data resolution on the accuracy of the constructed 3D-CF and on its computational complexity. Furthermore, considering air-to-air channels and dynamic environmental factors also constitutes a highly promising research direction for low-altitude 3D-CF techniques.

REFERENCES

[1] C. Xie, L. You, R. Chen, G. He, and X. Gao, “MML-based 3D channel fingerprints construction for low-altitude communications,” in Proc. IEEE WCNC 2026, Kuala Lumpur, Malaysia, Apr. 2026, pp. 1–6.
[2] H. Zhang, Z. Han, G. C. Alexandropoulos, and N. H.
Tran, “Special issue on aerial access networks for 6G,” Journal of Communications and Networks, vol. 24, no. 2, pp. 121–124, Apr. 2022.
[3] C. Xu, X. Liao, J. Tan, H. Ye, and H. Lu, “Recent research progress of unmanned aerial vehicle regulation policies and technologies in urban low altitude,” IEEE Access, vol. 8, pp. 74 175–74 194, Apr. 2020.
[4] N. Hossein Motlagh, T. Taleb, and O. Arouk, “Low-altitude unmanned aerial vehicles-based internet of things services: Comprehensive survey and future perspectives,” IEEE Internet Things J., vol. 3, no. 6, pp. 899–922, Dec. 2016.
[5] H. Kang, J. Joung, J. Kim, J. Kang, and Y. S. Cho, “Protect your sky: A survey of counter unmanned aerial vehicle systems,” IEEE Access, vol. 8, pp. 168 671–168 710, Sep. 2020.
[6] X. Ye, Y. Mao, X. Yu, S. Sun, L. Fu, and J. Xu, “Integrated sensing and communications for low-altitude economy: A deep reinforcement learning approach,” IEEE Trans. Wireless Commun., pp. 1–1, 2025.
[7] S. Shao, W. Zhu, and Y. Li, “Radar detection of low-slow-small UAVs in complex environments,” in Proc. IEEE ITAIC 2022, vol. 10, Chongqing, China, Jun. 2022, pp. 1153–1157.
[8] J. Li, L. Yang, W. Hao, I. Ahmad, H. Liu, F. Shu, and D. Niyato, “Multi-layer transmitting RIS-aided receiver for collaborative jamming and anti-jamming networks,” IEEE Trans. Wireless Commun., vol. 24, no. 8, pp. 6518–6534, Aug. 2025.
[9] J. Li, C. Zhou, J. Liu, M. Sheng, N. Zhao, and Y. Su, “Reinforcement learning-based resource allocation for coverage continuity in high dynamic UAV communication networks,” IEEE Trans. Wireless Commun., vol. 23, no. 2, pp. 848–860, Feb. 2024.
[10] Y. Zeng and X. Xu, “Toward environment-aware 6G communications via channel knowledge map,” IEEE Wireless Commun., vol. 28, no. 3, pp. 84–91, Jun. 2021.
[11] H. B. Yilmaz, T. Tugcu, F. Alagöz, and S.
Bayhan, “Radio environment map as enabler for practical cognitive radio networks,” IEEE Commun. Mag., vol. 51, no. 12, pp. 162–169, Dec. 2013.
[12] H. Che, L. You, J. Wang, Z. Jin, C. Xie, and X. Gao, “Channel charting-assisted non-orthogonal pilot allocation for uplink XL-MIMO transmission,” Chin. J. Electron., 2025 (early access).
[13] D. Wu, Y. Zeng, S. Jin, and R. Zhang, “Environment-aware hybrid beamforming by leveraging channel knowledge map,” IEEE Trans. Wireless Commun., vol. 23, no. 5, pp. 4990–5005, May 2024.
[14] L. Zhao, Z. Fei, X. Wang, J. Huang, Y. Li, and Y. Zhang, “IMNet: Interference-aware channel knowledge map construction and localization,” IEEE Wireless Commun. Lett., vol. 14, no. 3, pp. 856–860, Mar. 2025.
[15] J. Wang, Z. Lin, Q. Zhu, Q. Wu, T. Lan, Y. Zhao, Y. Bai, and W. Zhong, “3D spectrum mapping and reconstruction under multi-radiation source scenarios,” China Commun., vol. 23, no. 2, pp. 20–34, Mar. 2024.
[16] A. Ivanov, K. Tonchev, V. Poulkov, A. Manolova, and A. Vlahov, “Interpolation accuracy evaluation for 3D radio environment maps construction,” in Proc. IEEE Int. Symp. Wireless Pers. Multimedia Commun. (WPMC), Tampa, FL, USA, Nov. 2023, pp. 1–7.
[17] X. Chen, X. Zhong, Z. Zhang, L. Dai, and S. Zhou, “High-efficiency urban 3D radio map estimation based on sparse measurements,” IEEE Trans. Veh. Technol., vol. 74, no. 10, pp. 16 488–16 493, Oct. 2025.
[18] J. Wang, Q. Zhu, Z. Lin, J. Chen, G. Ding, Q. Wu, G. Gu, and Q. Gao, “Sparse Bayesian learning-based hierarchical construction for 3D radio environment maps incorporating channel shadowing,” IEEE Trans. Wireless Commun., vol. 23, no. 10, pp. 14 560–14 574, Oct. 2024.
[19] F. Shen, G. Ding, Q. Wu, and Z. Wang, “Compressed wideband spectrum mapping in 3D spectrum-heterogeneous environment,” IEEE Trans. Veh. Technol., vol. 72, no. 4, pp. 4875–4886, Apr. 2023.
[20] F.
Shen, Z. Wang, G. Ding, K. Li, and Q. Wu, “3D compressed spectrum mapping with sampling locations optimization in spectrum-heterogeneous environment,” IEEE Trans. Wireless Commun., vol. 21, no. 1, pp. 326–338, Jan. 2022.
[21] Q. Wu, F. Shen, Z. Wang, and G. Ding, “3D spectrum mapping based on ROI-driven UAV deployment,” IEEE Network, vol. 34, no. 5, pp. 24–31, Oct. 2020.
[22] K. Yin, S. Fang, F. Chu, and Y. Fan, “Compressed tensor completion: Approach for UAV-aided 3-D radio map construction,” IEEE Internet Things J., vol. 11, no. 24, pp. 40 516–40 531, Dec. 2024.
[23] H. Sun and J. Chen, “Energy-modified leverage sampling for radio map construction via matrix completion,” IEEE Signal Process. Lett., vol. 31, pp. 1780–1784, Jun. 2024.
[24] C. Li, Z. Dou, and Y. Lin, “Fast 3-D radio map reconstruction via cross tensor approximation,” IEEE Internet Things J., vol. 11, no. 24, pp. 40 619–40 633, Dec. 2024.
[25] H. Sun and J. Chen, “Integrated interpolation and block-term tensor decomposition for spectrum map construction,” IEEE Trans. Signal Process., vol. 72, pp. 3896–3911, Aug. 2024.
[26] W. Liu and J. Chen, “UAV-aided radio map construction exploiting environment semantics,” IEEE Trans. Wireless Commun., vol. 22, no. 9, pp. 6341–6355, Sep. 2023.
[27] B. Li and J. Chen, “Radio map-assisted approach for interference-aware predictive UAV communications,” IEEE Trans. Wireless Commun., vol. 23, no. 11, pp. 16 725–16 741, Nov. 2024.
[28] P. Zeng and J. Chen, “UAV-aided joint radio map and 3D environment reconstruction using deep learning approaches,” in Proc. IEEE ICC 2022, Seoul, Republic of Korea, May 2022, pp. 5341–5346.
[29] S. Zhang, S. Jiang, W. Lin, Z. Fang, K. Liu, H. Zhang, and K. Chen, “Generative AI on SpectrumNet: An open benchmark of multiband 3-D radio maps,” IEEE Trans. Cognit. Commun. Networking, vol. 11, no. 2, pp. 886–901, Apr. 2025.
[30] T. Hu, Y.
Huang, J. Chen, Q. Wu, and Z. Gong, “3D radio map reconstruction based on generative adversarial networks under constrained aircraft trajectories,” IEEE Trans. Veh. Technol., vol. 72, no. 6, pp. 8250–8255, Jun. 2023.
[31] X. Wang, Q. Zhang, N. Cheng, J. Chen, Z. Zhang, Z. Li, S. Cui, and X. Shen, “RadioDiff-3D: A 3D×3D radio map dataset and generative diffusion based benchmark for 6G environment-aware communication,” IEEE Trans. Netw. Sci. Eng., pp. 1–18, 2025.
[32] C. Xie, L. You, Z. Jin, J. Tang, X. Gao, and X.-G. Xia, “CF-CGN: Channel fingerprints extrapolation for multi-band massive MIMO transmission based on cycle-consistent generative networks,” IEEE J. Sel. Areas Commun., vol. 43, no. 11, pp. 3722–3736, Nov. 2025.
[33] Z. Jin, L. You, J. Wang, X.-G. Xia, and X. Gao, “An I2I inpainting approach for efficient channel knowledge map construction,” IEEE Trans. Wireless Commun., vol. 24, no. 2, pp. 1415–1429, Feb. 2025.
[34] T. Wu, J. Liu, J. Liu, Z. Huang, H. Wu, C. Zhang, B. Bai, and G. Zhang, “A novel AI-based framework for AoI-optimal trajectory planning in UAV-assisted wireless sensor networks,” IEEE Trans. Wireless Commun., vol. 21, no. 4, pp. 2462–2475, Apr. 2022.
[35] S. Kim, S. Jeong, J. Wu, B. Shim, and M. Z. Win, “Large multimodal model-based environment-aware channel estimation,” IEEE J. Sel. Areas Commun., vol. 43, no. 12, pp. 4059–4075, Dec. 2025.
[36] H. Shimomura, Y. Koda, T. Kanda, K. Yamamoto, T. Nishio, and A. Taya, “Vision-aided frame-capture-based CSI recomposition for WiFi sensing: A multimodal approach,” in Proc. IEEE CCNC 2023, Las Vegas, NV, USA, Jan. 2023, pp. 913–914.
[37] Z. Xin, Y. Liu, J. Xing, J. Huang, J. Bian, Z. Bai, and C. Wang, “Multimodal fusion-based channel prediction and characterization for mmWave UAV A2G communications,” IEEE Trans. Commun., vol. 74, pp. 5089–5104, Feb. 2026.
[38] G. Charan, T. Osman, A.
Hredzak, N. Thawdar, and A. Alkhateeb, “Vision-position multi-modal beam prediction using real millimeter wave datasets,” in Proc. IEEE WCNC 2022, Austin, TX, USA, Apr. 2022, pp. 2727–2731.
[39] F. Jiang, Y. Peng, L. Dong, K. Wang, K. Yang, C. Pan, D. Niyato, and O. A. Dobre, “Large language model enhanced multi-agent systems for 6G communications,” IEEE Wireless Commun., vol. 31, no. 6, pp. 48–55, Dec. 2024.
[40] H. Kim, T. Roh, and B. Shim, “Multi-modal sensing-aided beam management for 6G communication systems,” in Proc. IEEE VTC2024-Fall, Washington, DC, USA, Oct. 2024, pp. 1–5.
[41] Y. Ahn, J. Kim, S. Kim, K. Shim, J. Kim, S. Kim, and B. Shim, “Toward intelligent millimeter and terahertz communication for 6G: Computer vision-aided beamforming,” IEEE Wireless Commun., vol. 30, no. 5, pp. 179–186, Oct. 2023.
[42] Y. Yang, F. Gao, C. Xing, J. An, and A. Alkhateeb, “Deep multimodal learning: Merging sensory data for massive MIMO channel prediction,” IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 1885–1898, Jul. 2021.
[43] Z. Jin, L. You, D. W. K. Ng, X.-G. Xia, and X. Gao, “Near-field channel estimation for XL-MIMO: A deep generative model guided by side information,” IEEE Trans. Cognit. Commun. Networking, vol. 12, pp. 628–643, May 2025.
[44] Z. Wen, G. Li, Z. Liu, Y. Li, and S. Han, “A multi-modal learning framework for MIMO channel information acquisition,” in Proc. IEEE ICCC 2024, Chengdu, China, Dec. 2024, pp. 2393–2399.
[45] F. Jiang, L. Dong, Y. Peng, K. Wang, K. Yang, C. Pan, and X. You, “Large AI model empowered multimodal semantic communications,” IEEE Commun. Mag., vol. 63, no. 1, pp. 76–82, Jan. 2025.
[46] J. Li, L. Yang, C. You, I. Ahmad, P. S. Bithas, M. Di Renzo, and D. Niyato, “Absorptive RIS-assisted near-field covert communication with fluid antenna systems,” IEEE J. Sel. Areas Commun., vol. 44, pp. 2052–2070, Dec. 2025.
[47] J. Li, Q. Pan, Z.
Wan, P. Zhu, D. Wang, M. Lou, J. Jin, F. Liu, and X. You, “Low altitude 3-D coverage performance analysis of cell-free RAN for 6G systems,” IEEE Trans. Veh. Technol., vol. 72, no. 12, pp. 16 163–16 176, Dec. 2023.
[48] F. Jiang and A. L. Swindlehurst, “Optimization of UAV heading for the ground-to-air uplink,” IEEE J. Sel. Areas Commun., vol. 30, no. 5, pp. 993–1005, Jun. 2012.
[49] H. Li, L. Ding, Y. Wang, and Z. Wang, “Air-to-ground channel modeling and performance analysis for cellular-connected UAV swarm,” IEEE Commun. Lett., vol. 27, no. 8, pp. 2172–2176, Aug. 2023.
[50] J. Tang, X. Gao, L. You, D. Shi, J. Yang, X.-G. Xia, X. Zhao, and P. Jiang, “Massive MIMO-OFDM channel acquisition with time-frequency phase-shifted pilots,” IEEE Trans. Commun., vol. 73, no. 6, pp. 4520–4535, Jun. 2025.
[51] Y. Zhu, L. You, Q. Kong, G. Seco-Granados, and X. Gao, “Robust precoding for massive MIMO LEO satellite localization systems,” IEEE Trans. Veh. Technol., vol. 74, no. 2, pp. 3434–3438, Feb. 2025.
[52] M. Qian, L. You, X.-G. Xia, and X. Gao, “On the spectral efficiency of multi-user holographic MIMO uplink transmission,” IEEE Trans. Wireless Commun., vol. 23, no. 10, pp. 15 421–15 434, Oct. 2024.
[53] Y. Ye, L. You, J. Wang, H. Xu, K.-K. Wong, and X. Gao, “Fluid antenna-assisted MIMO transmission exploiting statistical CSI,” IEEE Commun. Lett., vol. 28, no. 1, pp. 223–227, Jan. 2024.
[54] H. Li, Y. Wang, C. Sun, and Z. Wang, “User-centric cell-free massive MIMO for IoT in highly dynamic environments,” IEEE Internet Things J., vol. 11, no. 5, pp. 8658–8675, Mar. 2024.
[55] H. Chang, J. Bian, C.-X. Wang, Z. Bai, W. Zhou, and E.-H. M. Aggoune, “A 3D non-stationary wideband GBSM for low-altitude UAV-to-ground V2V MIMO channels,” IEEE Access, vol. 7, pp. 70 719–70 732, May 2019.
[56] M. Cardone, A. Dytso, and C. Rush, “Entropic central limit theorem for order statistics,” IEEE Trans.
Inf. Theory, vol. 69, no. 4, pp. 2193–2205, Apr. 2023.
[57] K. Suto, S. Bannai, K. Sato, K. Inage, K. Adachi, and T. Fujii, “Image-driven spatial interpolation with deep learning for radio map construction,” IEEE Wireless Commun. Lett., vol. 10, no. 6, pp. 1222–1226, Jun. 2021.
[58] G. Chen, Y. Liu, T. Zhang, J. Zhang, X. Guo, and J. Yang, “A graph neural network based radio map construction method for urban environment,” IEEE Commun. Lett., vol. 27, no. 5, pp. 1327–1331, May 2023.
[59] R. Levie, Ç. Yapar, G. Kutyniok, and G. Caire, “RadioUNet: Fast radio map estimation with convolutional neural networks,” IEEE Trans. Wireless Commun., vol. 20, no. 6, pp. 4001–4015, Jun. 2021.
[60] D. Ramachandram and G. W. Taylor, “Deep multimodal learning: A survey on recent advances and trends,” IEEE Signal Process. Mag., vol. 34, no. 6, pp. 96–108, Nov. 2017.
[61] S. Jabeen, X. Li, M. S. Amin, O. E. F. Bourahla, S. Li, and A. Jabbar, “A review on methods and applications in multimodal deep learning,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 19, pp. 1–41, Feb. 2022.
[62] G. Joshi, R. Walambe, and K. Kotecha, “A review on explainability in multimodal deep neural nets,” IEEE Access, vol. 9, pp. 59 800–59 821, Mar. 2021.
[63] C. Yin, Z. Xiao, X. Cao, X. Xi, P. Yang, and D. Wu, “Offline and online search: UAV multiobjective path planning under dynamic urban environment,” IEEE Internet Things J., vol. 5, no. 2, pp. 546–558, Apr. 2018.
[64] H. Lei, Y. Yan, J. Liu, Q. Han, and Z. Li, “Hierarchical multi-UAV path planning for urban low altitude environments,” IEEE Access, vol. 12, pp. 162 109–162 121, Oct. 2024.
[65] D. Wu, Y. Qiu, Y. Zeng, and F. Wen, “Environment-aware channel estimation via integrating channel knowledge map and dynamic sensing information,” IEEE Wireless Commun. Lett., vol. 13, no. 12, pp. 3608–3612, Dec. 2024.
[66] J. Song, R. He, Z. Zhang, M.
Yang, B. Ai, H. Zhang, and R. Chen, “3D environment reconstruction based on ISAC channels,” in Proc. IEEE Int. Conf. Ubiquitous Commun. (Ucom), Xi’an, China, Jul. 2024, pp. 487–491.
[67] W. Liu and J. Chen, “UAV-aided radio map construction exploiting environment semantics,” IEEE Trans. Wireless Commun., vol. 22, no. 9, pp. 6341–6355, Sep. 2023.
[68] Z. Cui, C. Briso-Rodríguez, K. Guan, İ. Güvenç, and Z. Zhong, “Wideband air-to-ground channel characterization for multiple propagation environments,” IEEE Antennas Wirel. Propag. Lett., vol. 19, no. 9, pp. 1634–1638, Sep. 2020.
[69] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Munich, Germany, Sep. 2018, pp. 3–19.
[70] J. Hoydis, F. A. Aoudia, S. Cammerer, M. Nimier-David, N. Binder, G. Marcus, and A. Keller, “Sionna RT: Differentiable ray tracing for radio propagation modeling,” in Proc. IEEE Globecom Workshops (GC Wkshps), Kuala Lumpur, Malaysia, Dec. 2023, pp. 317–321.
[71] M. Haklay and P. Weber, “OpenStreetMap: User-generated street maps,” IEEE Pervasive Comput., vol. 7, no. 4, pp. 12–18, Dec. 2008.
[72] Q. Gong, F. Wu, D. Yang, L. Xiao, and Z. Liu, “3D radio map reconstruction and trajectory optimization for cellular-connected UAVs,” J. Commun. Inf. Networks, vol. 8, no. 4, pp. 357–368, Dec. 2023.
[73] K. Sato and T. Fujii, “Kriging-based interference power constraint: Integrated design of the radio environment map and transmission power,” IEEE Trans. Cognit. Commun. Networking, vol. 3, no. 1, pp. 13–25, Mar. 2017.
[74] P. Zhen, B. Zhang, Y.-Q. Xu, Z. Chen, H. Wang, and D. Guo, “Radio environment map construction based on Gaussian process with positional uncertainty,” IEEE Wireless Commun. Lett., vol. 11, no. 8, pp. 1639–1643, Aug. 2022.

CSI-tuples-based 3D Channel Fingerprints Construction Assisted by MultiModal Learning

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment