Preserving Vertical Structure in 3D-to-2D Projection for Permafrost Thaw Mapping

Justin McMillen ¹, Robert Van Alphen ², Taha Sadeghi Chorsi ², Jason Shabaga ³, Mel Rodgers ², Rocco Malservisi ², Timothy Dixon ², Yasin Yilmaz ¹

Abstract

Forecasting permafrost thaw from aerial lidar requires projecting 3D point cloud features onto 2D prediction grids, yet naive aggregation methods destroy the vertical structure critical in forest environments where ground, understory, and canopy carry distinct information about subsurface conditions. We propose a projection decoder with learned height embeddings that enable height-dependent feature transformations, allowing the network to differentiate ground-level signals from canopy returns. Combined with stratified sampling that ensures all forest strata remain represented, our approach preserves the vertical information critical for predicting subsurface conditions. Our approach pairs this decoder with a Point Transformer V3 encoder to predict dense thaw depth maps from drone-collected lidar over boreal forest in interior Alaska. Experiments demonstrate that z-stratified projection outperforms standard averaging-based methods, particularly in areas with complex vertical vegetation structure. Our method enables scalable, high-resolution monitoring of permafrost degradation from readily deployable UAV platforms.

1. Introduction

Regions around the globe are facing environmental impacts from climate warming, including ecosystem shifts and infrastructure hazards. The circumpolar north has experienced widespread permafrost thaw throughout the last 50 years. Permafrost, defined as soil, rock, or organic material that remains frozen for at least two consecutive years (Lewkowicz et al., 2025), stores substantial organic carbon.
This thawing can create a feedback loop, releasing more greenhouse gases that intensify warming.

¹ Electrical Engineering Department, University of South Florida, Tampa, FL, USA. ² Geoscience Department, University of South Florida, Tampa, FL, USA. ³ XXX Department, University of Colorado Boulder, Boulder, CO, USA. Correspondence to: Yasin Yilmaz <yasiny@usf.edu>, Timothy Dixon <thd@usf.edu>.

Preprint. March 18, 2026.

The boreal forest of Alaska's interior is one such place experiencing warming (Intergovernmental Panel on Climate Change (IPCC), 2023). This region's discontinuous permafrost is controlled by interactions between climate, ecology, and hydrology. Boreal forests provide thick organic soils and canopies that allow permafrost to remain frozen at mean annual air temperatures above 0°C (Bonan & Shugart, 1989; Jorgenson et al., 2010; Zhu et al., 2019). Disturbances such as warming temperatures or wildfires increase the likelihood of thaw initiation (Camill, 1999; Yoshikawa et al., 2002; Jorgenson et al., 2022), which can induce ground subsidence (thermokarst) that develops into sinkhole-like features. This change in topography can expand laterally, converting boreal forest into wetlands, releasing greenhouse gases, and destabilizing infrastructure (Osterkamp et al., 2000; Haughton, 2018; Van der Sluijs et al., 2018; Dearborn et al., 2021).

There is a growing need for models that can detect and predict thermokarst-induced surface deformation (Turetsky et al., 2019). While traditional field measurements remain accurate, they are spatially limited (Bartsch et al., 2023; GTN-P, 2015). Airborne lidar offers high-resolution 3D coverage of terrain and vegetation (Reutebuch et al., 2003), but its output is an unordered 3D point cloud, whereas the desired product is a georeferenced 2D map suitable for analysis and decision-making.
Existing 3D-to-2D projection methods aggregate point features within fixed spatial bins, an approach suited to autonomous driving but problematic in forests (Li et al., 2023; Hu et al., 2022). Vertical structure carries predictive signal about subsurface conditions, yet standard aggregation oversamples dense canopy while underrepresenting sparse but informative ground returns (Fisher et al., 2016; Kropp et al., 2020; Meng et al., 2010; Campbell et al., 2018).

We propose a projection decoder with learned height embeddings that preserve vertical structure during the 3D-to-2D transformation. Our method applies farthest point sampling in the z-dimension to ensure all forest strata are represented regardless of point density, then augments the selected points with learned z-embeddings that enable height-dependent feature transformations. Combined with multi-scale feature fusion, our approach produces high-resolution thaw depth maps from single-date lidar. We summarize our contributions as follows:

• A learned height embedding that enables the network to apply height-dependent feature transformations during 3D-to-2D projection, allowing differentiation between ground-level and canopy-derived information.
• A multi-scale late fusion decoder that independently projects features from each encoder stage, capturing both fine detail and global context.
• Experimental analysis showing that explicit height encoding is the critical component for accurate thaw prediction, outperforming both naive aggregation and histogram baselines in both regression and classification formulations.

2. Related Work

2.1. Permafrost Monitoring and Forecasting

The Circumpolar Active Layer Monitoring (CALM) project has conducted in-situ monitoring of permafrost for more than three decades (Brown et al., 2000).
The project employs temperature monitoring, mechanical probing, and vertical displacement measurements to estimate active layer thickness (ALT), the thickness of the layer that undergoes seasonal freeze-thaw. Regional and global monitoring efforts expand on these ground-based measures with remotely sensed data (Jorgenson & Grosse, 2016; Kokelj et al., 2023). Lidar, radar, electromagnetic, and spectral imagery have all added to the tools with which permafrost and its related environmental expressions can be studied (Pastick et al., 2013; Li et al., 2015; Chorsi et al., 2024; Lu & Han, 2025). Unoccupied aerial vehicles (UAVs) have also established a role in permafrost and active layer studies due to their capacity to produce high-resolution digital surface models through photogrammetry (Gaffey & Bhardwaj, 2020). UAV-lidar, like photogrammetry, can produce point clouds and digital surface models, but differs in its ability to penetrate vegetation and record ground returns (Gao et al., 2024; Renette et al., 2024). UAV-lidar can therefore produce full vertical profiles from the ground surface up into the canopy, yielding a product with higher information density. To fully utilize such large datasets, machine learning models have become standard tools. These models have been used to extrapolate permafrost extent, detect thaw-induced landslides, and predict active layer thickness (Pastick et al., 2013; Li et al., 2017; Lou et al., 2023).

2.2. 3D Computer Vision

Point cloud deep learning has evolved from per-point processing to transformer architectures capable of handling large-scale outdoor lidar. PointNet introduced learning directly on unstructured point sets using shared MLPs and symmetric pooling for permutation invariance (Qi et al., 2016). PointNet++ extended this through hierarchical set abstraction layers with multi-scale grouping to handle non-uniform densities (Qi et al., 2017).
The Point Transformer series introduced attention mechanisms for capturing long-range dependencies. Point Transformer applied vector attention within local neighborhoods (Zhao et al., 2021). Point Transformer V2 unified multi-head and vector attention with partition-based pooling (Wu et al., 2022), and Point Transformer V3 prioritized scalability by replacing k-NN searches with serialized neighbor mapping and Flash Attention (Wu et al., 2024). These efficiency gains make PTv3 suitable as an encoder backbone for processing large drone-collected point clouds.

2.3. 3D-to-2D Feature Projection

The autonomous driving domain has developed bird's-eye-view (BEV) projection methods for dense prediction (Li et al., 2023). VoxelNet introduced voxel feature encoding to transform points within spatial bins (Zhou & Tuzel, 2017), while PointPillars improved efficiency by organizing points into vertical pillars and using PointNets to learn per-pillar features that are scattered to pseudo-images (Lang et al., 2019). Camera-based approaches such as Lift-Splat-Shoot (Philion & Fidler, 2020) and BEVFormer (Li et al., 2022) lift image features to 3D before aggregating to BEV grids. BEVFusion (Liu et al., 2024) unified lidar and camera modalities in a shared BEV space.

These BEV methods were designed primarily for object detection rather than continuous dense regression, and typically employ fixed spatial binning that allows dense regions to dominate aggregated features. Our approach instead queries k-nearest neighbors in XY for each output pixel, then applies farthest point sampling in Z to ensure vertical coverage. We then augment the selected points with learned height embeddings, which enable height-dependent feature transformations.

3. Data Collection

We chose a small field site operated by the U.S. Army Corps of Engineers' Cold Regions Research and Engineering Lab.
This site, Farmer's Loop 2 (FL2), located in Fairbanks, Alaska, is one of two thermokarst study areas on Farmer's Loop Road. This site was chosen as it contains several thermokarst indicators. Figure 1 shows both the site's location in Alaska and a bird's-eye view of the data collection site. It displays thermokarst-related microtopographic shifts, changing vegetation cover, and hydrologic conditions indicative of rapidly thawing permafrost. Specifically, vegetation here is moving to a predominantly woody vegetation cover, which can be indicative of warming soils, alongside recent measurements showing a deepening active layer.

Figure 1. Orthophoto field map of the Farmer's Loop 2 field site. The extent of the orthophoto matches the extent of the lidar data. Inset: location of Fairbanks, AK, where our field site is located.

We collected data at FL2 at the end of freeze in May 2024 and at the end of summer thaw in August 2024 to capture the ground-truth thaw map. An md-Lidar1000HR UAV equipped with a small-form-factor dual lidar-camera system was used. The UAV is pictured in Figure 2. The UAV-lidar is a combination of a Velodyne Puck VLP-16 lidar and a SONY IMX264 camera. Images are captured simultaneously with the lidar data. The VLP-16 lidar outputs a near-infrared laser pulse (903 nm wavelength) at a repetition rate of approximately 290,000 pulses per second, with a scan angle of 15° on either side of nadir and a 360° horizontal field of view.

To enable ground-truth comparison to the lidar point cloud, we deployed check points throughout the field site. Check point locations were selected to optimize for visibility to the UAV-lidar and GNSS satellites. A Trimble R10 was used as the base station, with a Trimble R12 rover used to survey each check point.
Additionally, the Trimble R10 acted as a GNSS base station for post-processing UAV trajectories.

The raw lidar data from May and August are converted to elevation maps using the LP360 software by GeoCue. The difference between the two elevation maps provides the ground truth elevation change. Our objective is to forecast the elevation change by using the lidar point cloud from May as input. Samples of point cloud, ground truth map, and forecasted map can be seen in Figure 5.

Figure 2. md-Lidar1000HR UAV with lidar and RGB camera.

4. Methodology

4.1. Point Cloud Creation

The LP360 software offers a simple pipeline that steps through each stage of lidar point cloud creation. Importantly, UAV trajectories are corrected based on post-processed base station location and updated GNSS satellite orbits. Finally, images collected in-flight are matched across the point cloud, resulting in a final point cloud with RGB color information. The final point cloud was checked for any misalignment in geolocation with ground control points. The processed point cloud was exported alongside an interpolated digital terrain model (DTM) based on the classified ground points of the point cloud. For each time step this gives us a fully colorized and classified point cloud as well as an interpolated DTM for further use.

Given the colored point cloud $P = \{(p_i, f_i)\}_{i=1}^{N}$, where $p_i \in \mathbb{R}^3$ denotes spatial coordinates and $f_i \in \mathbb{R}^4$ denotes per-point attributes (RGB, intensity), our goal is to predict a dense thaw depth map $Y \in \mathbb{R}^{H \times W}$. We frame this as a 3D-to-2D projection problem analogous to bird's-eye-view perception in autonomous driving, but targeting continuous regression rather than object detection.

4.2. Encoder

We adopt Point Transformer V3 (PTv3) (Wu et al., 2024) as our point cloud encoder.
PTv3 processes 7-dimensional input (XYZ coordinates, RGB, and intensity) through a hierarchical architecture, producing multi-scale point features at four stages with channel dimensions {64, 128, 256, 512}. Each stage captures structure at a different spatial granularity: early stages preserve fine-grained local detail while later stages encode broader context. We leverage all four scales in our decoder, treating the encoder as a fixed architectural component and focusing our contribution on the projection mechanism. The entire architecture is shown graphically in Fig. 3.

Figure 3. Model architecture overview. A point cloud with per-point features (XYZ, RGB, intensity) is processed by a Point Transformer V3 encoder, which produces multi-scale point features at four hierarchical stages. Each stage is independently projected to a 2D feature map via our height-aware projection mechanism with learned z-embeddings (see Fig. 4), preserving vertical forest structure during the 3D-to-2D transformation. The resulting feature maps are concatenated and fused through 1 × 1 convolutions. A lightweight convolutional head produces the final per-pixel thaw depth prediction, supporting both regression and classification formulations.

4.3. Multi-Scale Z-Stratified Projection Decoder

The core challenge in projecting 3D point features to a 2D grid lies in preserving height-dependent information. In forest environments, features from ground, understory, and canopy carry distinct signals about subsurface conditions, yet naive aggregation conflates them. We propose a decoder that (1) ensures all vertical strata are represented through stratified sampling, and (2) learns height-dependent feature transformations via explicit z-embeddings.

The decoder is multi-scale: it processes each encoder stage independently before fusing their 2D representations.
This late-fusion strategy allows fine scales to capture local geometric detail while coarse scales provide global context. The overall 3D-to-2D projection module is shown graphically in Fig. 4.

For each encoder stage $s \in \{1, 2, 3, 4\}$, we first project point features to a common dimension $D = 128$ via a learned linear layer. We then define a regular grid of $H \times W = 64 \times 64$ query locations $Q = \{q_j\}_{j=1}^{HW}$ in normalized $[-1, 1]^2$ space, where each query represents a cell center for feature aggregation.

For each query $q_j$, we identify $M = 2k$ candidate points nearest in XY-distance, where $k = 32$ is the target neighborhood size. Rather than selecting the $k$ nearest points, we apply farthest point sampling (FPS) over the $M$ candidates in the z-dimension to select $k$ points. FPS iteratively builds a subset by always selecting the point maximally distant from all previously selected points; applied to z-coordinates, this produces a set that spans the full vertical extent of the local point distribution. This ensures all forest strata remain represented regardless of point density.

Selected points are augmented with positional context. For each selected point $(p_i, f_i)$ in query $q_j$'s neighborhood, we embed the z-coordinate $z_i$ of $p_i$ through a learnable MLP $\psi : \mathbb{R} \to \mathbb{R}^D$ and add it to the projected features $\phi(f_i)$:

$$\tilde{f}_i = \phi(f_i) + \psi(z_i) \tag{1}$$

where $\phi$ is the stage-specific linear projection. This embedding allows the network to learn height-dependent feature transformations.

We aggregate neighborhood features by exploiting vertical ordering. The $k$ selected points are sorted by z-coordinate (ground to canopy), and their features are concatenated into a single vector encoding the full vertical profile:

$$v_j = \left[\, w_1 \tilde{f}_{\pi(1)};\; w_2 \tilde{f}_{\pi(2)};\; \ldots;\; w_k \tilde{f}_{\pi(k)} \,\right] \tag{2}$$

where $\pi$ is the z-sorting permutation and $w_i$ are spatially decaying weights based on XY-distance:

$$w_i = \exp\!\left(-\lambda \cdot \max\!\left(0,\; \lVert q_j - p_i^{xy} \rVert - \tau\right)\right) \tag{3}$$

with threshold $\tau = 0.1$ and falloff $\lambda = 10$. Points within the threshold receive unit weight; beyond it, influence decays exponentially. This prevents extremely far points from contributing excessively in sparse areas.

The concatenated profile $v_j \in \mathbb{R}^{kD}$ is projected back to $\mathbb{R}^D$ via a two-layer MLP with LayerNorm and GELU activation, yielding the aggregated feature $g_j$ for query location $j$.

Each scale produces an independent feature map $G^{(s)} = [g_j] \in \mathbb{R}^{H \times W \times D}$. We concatenate these along the channel dimension and fuse via 1 × 1 convolutions:

$$Y = \mathrm{Conv}_{1 \times 1}\!\left(\left[\, G^{(1)};\; G^{(2)};\; G^{(3)};\; G^{(4)} \,\right]\right) \tag{4}$$

with intermediate GroupNorm and GELU activations, progressively reducing from $4D$ channels to the final output dimension (1 for regression, $C$ for classification).

Figure 4. Height-aware projection mechanism. (a) Input point cloud with vertical forest structure (ground, understory, canopy) above a query grid cell $q_j$. (b) From XY-nearest candidates ($M = 2k$), farthest point sampling in the z-dimension selects $k$ points that span the full vertical extent. (c) Selected points are sorted by height and augmented with learned z-embeddings that enable height-dependent feature transformations. (d) The concatenated profile vector ($k \times D$) is projected through an MLP to produce a single aggregated feature. (e) This process repeats for all query locations, yielding an $H \times W \times D$ feature map.

5. Experiments

We evaluate our method on a dataset of aerial lidar collected over permafrost terrain in Fairbanks, Alaska. The dataset comprises temporally paired acquisitions: the May acquisition serves as model input, while the August acquisition provides ground truth after seasonal thaw.
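The neighbor selection and weighting of Section 4.3 can be illustrated with a minimal pure-Python sketch. It assumes the FPS seed is the lowest candidate (the seed is not specified in the text) and uses hypothetical helper names (`fps_in_z`, `xy_decay_weight`, `select_profile`). The learned projections and the MLP are omitted, so the sketch only shows which points are kept and how they are weighted under Eqs. (2) and (3); it is not the authors' implementation.

```python
import math

def fps_in_z(z_values, k):
    """Greedy farthest point sampling on 1D heights; returns k indices."""
    n = len(z_values)
    k = min(k, n)
    if k == 0:
        return []
    # Assumption: seed with the lowest point (the seed choice is not stated).
    seed = min(range(n), key=lambda i: z_values[i])
    chosen = [seed]
    # gap[i] = distance in z from point i to its nearest chosen point
    gap = [abs(z - z_values[seed]) for z in z_values]
    while len(chosen) < k:
        remaining = [i for i in range(n) if i not in chosen]
        nxt = max(remaining, key=lambda i: gap[i])
        chosen.append(nxt)
        gap = [min(g, abs(z - z_values[nxt])) for g, z in zip(gap, z_values)]
    return chosen

def xy_decay_weight(query_xy, point_xy, tau=0.1, lam=10.0):
    """Eq. (3): unit weight within tau, exponential decay beyond it."""
    d = math.dist(query_xy, point_xy)
    return math.exp(-lam * max(0.0, d - tau))

def select_profile(query_xy, points, k):
    """points: list of (x, y, z). Returns [(z, weight)] sorted ground to canopy."""
    # M = 2k candidates nearest to the query in XY
    by_xy = sorted(points, key=lambda p: math.dist(query_xy, p[:2]))
    candidates = by_xy[: 2 * k]
    picked = fps_in_z([p[2] for p in candidates], k)
    profile = sorted((candidates[i] for i in picked), key=lambda p: p[2])
    return [(p[2], xy_decay_weight(query_xy, p[:2])) for p in profile]
```

With one ground return under three canopy returns and k = 2, the selection keeps the ground point and the highest canopy point, whereas taking the two XY-nearest points would keep the ground point and the nearest canopy return only.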
Ground truth thaw depth is derived from interpolated elevation change between the two acquisitions (see Section 3). The dataset and code will be made publicly available. The continuous thaw depth raster is partitioned into 64 × 64 spatial tiles. Tiles containing data voids or errors (change magnitudes greater than 1 m) are excluded, yielding 781 valid samples. From these valid samples, every fifth tile was designated as an evaluation tile, leaving 624 tiles for training and 157 for evaluation.

5.1. Preprocessing

For each tile, we extract all points from the May acquisition whose XY coordinates fall within the tile boundary. Spatial X and Y coordinates are normalized to [−1, 1] (tile-local), while Z is normalized to [0, 1] globally across the dataset to preserve relative elevation information. Per-point features (RGB, intensity) are standardized to zero mean and unit variance.

Ground truth thaw depths are transformed to approximate a standard normal distribution. We clip values below the 1st and above the 99th percentile to remove outliers before normalization. Predictions are denormalized for evaluation.

5.2. Implementation Details

We implement our model in PyTorch using Point Transformer V3 as the encoder backbone. We use [2, 2, 2, 2] layers in each stage. The decoder uses hidden dimension D = 128, k = 32 neighbors, a candidate multiplier of 2 (M = 2k candidates), XY falloff threshold τ = 0.1, and falloff rate λ = 10. More discussion on hyperparameter selection can be found in the Appendix. Output grid resolution is 64 × 64.

PTv3 is trained from scratch on our dataset. We use the AdamW optimizer with learning rate 1 × 10⁻⁵ and a polynomial learning-rate scheduler. We start training with a 2-epoch warm-up that linearly ramps from 10% to the full learning rate. The model is trained for a total of 100 epochs. We use batch size 1 with gradient accumulation over 2 steps, yielding an effective batch size of 2.
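As a rough illustration of the schedule just described, the sketch below computes a per-epoch learning rate with a 2-epoch linear warm-up from 10% of the base rate followed by polynomial decay. The function name `lr_at_epoch`, the per-epoch granularity, the decay power of 1.0, and decay to zero at the final epoch are all assumptions; the text only states that a polynomial scheduler is used.

```python
def lr_at_epoch(epoch, base_lr=1e-5, warmup_epochs=2, total_epochs=100,
                warmup_start=0.1, power=1.0):
    """Learning rate at a given epoch (hypothetical helper).

    Assumptions not stated in the text: updates happen once per epoch,
    the polynomial power is 1.0, and the rate decays to zero.
    """
    if epoch < warmup_epochs:
        # Linear ramp from warmup_start * base_lr up to base_lr
        frac = epoch / warmup_epochs
        return base_lr * (warmup_start + (1.0 - warmup_start) * frac)
    # Polynomial decay over the remaining epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * (1.0 - progress) ** power
```

Under these assumptions the rate starts at 1 × 10⁻⁶, reaches the full 1 × 10⁻⁵ at epoch 2, and then decays toward zero by epoch 100.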
Using farthest point sampling (FPS), we cap the number of points per tile at 60,000. Mixed precision (FP16) training and Flash Attention are also used to improve training speed and reduce memory usage. Data augmentation includes random 90° rotations and random jitter in the XYZ coordinates. We evaluate after each epoch and retain the checkpoint with the lowest validation loss.

5.3. Evaluation Metrics

Regression and classification models are trained with mean squared error (MSE) and cross-entropy with inverse square weights, respectively. We report task-specific metrics on the held-out test set.

Regression. For continuous thaw depth prediction, we report root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). All error metrics are computed in original units (centimeters) after denormalization.

Classification. While continuous predictions offer fine-grained detail, operational deployment often benefits from discrete severity categories. We evaluate a seven-class formulation spanning High Heave to High Thaw, with boundaries and class distributions shown in Table 1.

Table 1. Class distribution for the dataset after discretizing the continuous ground truth labels.

Class               Boundary (cm)     Percentage
High Heave          > 1.6             24.8%
Medium-high Heave   (1.0, 1.6]        16.2%
Medium-low Heave    (0.5, 1.0]        15.9%
Low Heave           (0.2, 0.5]         8.5%
No Change           [−0.2, 0.2]       11.4%
Low Thaw            [−1.0, −0.2)      16.1%
High Thaw           < −1.0             7.1%

To measure classification performance, we report both class-specific and mean Intersection over Union (mIoU). Since the classes form an ordinal scale from thaw to heave, we additionally report metrics that account for class ordering.
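The discretization in Table 1 can be sketched as a simple threshold cascade. This is an illustrative reading, not the authors' code: it assumes positive values denote heave and negative values thaw, and that each interior interval is closed at the bound farther from zero, with No Change closed on both sides. The name `severity_class` and the 0-to-6 index order (High Heave first) are hypothetical.

```python
CLASS_NAMES = [
    "High Heave", "Medium-high Heave", "Medium-low Heave", "Low Heave",
    "No Change", "Low Thaw", "High Thaw",
]

def severity_class(change_cm):
    """Map an elevation change in cm to a class index 0..6 per Table 1.

    Assumption: boundary values fall in the class whose interval is
    closed at that bound (e.g. exactly 1.6 cm is Medium-high Heave).
    """
    if change_cm > 1.6:
        return 0  # High Heave
    if change_cm > 1.0:
        return 1  # Medium-high Heave: (1.0, 1.6]
    if change_cm > 0.5:
        return 2  # Medium-low Heave: (0.5, 1.0]
    if change_cm > 0.2:
        return 3  # Low Heave: (0.2, 0.5]
    if change_cm >= -0.2:
        return 4  # No Change: [-0.2, 0.2]
    if change_cm >= -1.0:
        return 5  # Low Thaw: [-1.0, -0.2)
    return 6      # High Thaw: < -1.0
```

Because the indices follow the ordinal scale, the absolute difference between two class indices is the ordinal distance that ordering-aware metrics penalize.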
Quadratic Weighted Kappa (QWK) measures agreement between predictions and ground truth while penalizing disagreements quadratically by ordinal distance:

$$\mathrm{QWK} = 1 - \frac{\sum_{i,j} w_{ij} O_{ij}}{\sum_{i,j} w_{ij} E_{ij}}, \qquad w_{ij} = \frac{(i - j)^2}{(K - 1)^2} \tag{5}$$

where $O_{ij}$ is the observed confusion matrix, $E_{ij}$ is the expected confusion matrix under random agreement, $K$ is the number of classes, and $w_{ij}$ weights disagreements by squared class distance. QWK ranges from −1 (systematic disagreement) to 1 (perfect agreement), with 0 indicating chance-level performance.

Mean Absolute Error in Class Units (MAECU) directly measures average ordinal displacement:

$$\mathrm{MAECU} = \frac{1}{N} \sum_{n=1}^{N} \lvert \hat{y}_n - y_n \rvert \tag{6}$$

where $\hat{y}_n$ and $y_n$ are predicted and true class indices, respectively. Unlike mIoU, both metrics capture whether misclassifications fall to neighboring classes (minor error) or opposite ends of the scale (severe error).

5.4. Baselines

To evaluate the contribution of our proposed decoder, we compare it against a multi-scale mean-pooling baseline that shares the same PTv3 encoder and CNN refinement architecture. For each encoder stage $i$, point features $f_j \in \mathbb{R}^{D_i}$ are first projected to a common dimension via a learned linear layer. Points are then binned to a 2D grid based on their XY coordinates, with features aggregated via mean pooling:

$$g_{xy} = \frac{1}{\lvert \mathcal{N}_{xy} \rvert} \sum_{j \in \mathcal{N}_{xy}} \tilde{f}_j \tag{7}$$

where $\mathcal{N}_{xy}$ denotes the set of points falling within grid cell $(x, y)$, and $\tilde{f}_j$ are features projected to the common dimension. The grids from all encoder stages are concatenated channel-wise and passed through the same convolutional refinement network used by our method.

This baseline represents a straightforward application of PTv3 to dense prediction. It utilizes multi-scale fusion and identical output processing, differing only in the 3D-to-2D projection mechanism.
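The binning and averaging of Eq. (7) can be sketched in a few lines of plain Python. The function name `mean_pool_to_grid`, the assumption that XY is already normalized to [−1, 1], and the choice to leave empty cells at zero are illustrative and not taken from the text.

```python
def mean_pool_to_grid(points_xy, features, grid_size):
    """Average per-point feature vectors within each cell of a square grid.

    Assumptions: XY coordinates lie in [-1, 1]; empty cells stay at zero.
    Returns a grid_size x grid_size nested list of feature vectors.
    """
    dim = len(features[0])
    sums = [[[0.0] * dim for _ in range(grid_size)] for _ in range(grid_size)]
    counts = [[0] * grid_size for _ in range(grid_size)]
    for (x, y), feat in zip(points_xy, features):
        # Map [-1, 1] to a cell index, clamping the upper edge into the last cell
        col = min(int((x + 1.0) / 2.0 * grid_size), grid_size - 1)
        row = min(int((y + 1.0) / 2.0 * grid_size), grid_size - 1)
        counts[row][col] += 1
        for c in range(dim):
            sums[row][col][c] += feat[c]
    for row in range(grid_size):
        for col in range(grid_size):
            if counts[row][col] > 0:
                sums[row][col] = [v / counts[row][col] for v in sums[row][col]]
    return sums
```

For example, a cell holding one ground return with feature value 1.0 and three canopy returns with value 0.0 pools to 0.25, so the cell average is set mostly by whichever stratum contributes the most points.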
Crucially, mean pooling across all heights causes ground-level information to be diluted by canopy returns, which dominate point density in forested scenes. Our z-stratified decoder addresses this by explicitly preserving vertical structure during projection.

To isolate whether categorical vegetation structure alone predicts thaw depth, we evaluate a histogram baseline that uses no learned point features. For each grid cell, we compute the proportion of points belonging to each LP360 classification category (ground, low vegetation, medium vegetation, high vegetation) along with log point density. This 6-channel feature map (5 class proportions + density) is processed by a lightweight CNN with three convolutional layers. This baseline tests whether the predictive signal is simply "what vegetation types are present" versus learned geometric and radiometric features.

5.5. Results

We evaluate our approach under both regression and classification formulations. Regression directly predicts continuous thaw depth, while classification assigns pixels to discrete severity categories.

Regression. Table 2 presents regression performance on the held-out evaluation set. The histogram approach performs worst (R² = 0.656), confirming that categorical vegetation proportions alone poorly predict thaw depth. Mean-pooling improves on this (R² = 0.705) by leveraging learned point features, but conflates canopy and ground-level information. Our decoder substantially outperforms both, reducing RMSE by more than 70% relative to mean pooling (0.515 → 0.121 cm, a 76.5% reduction) and achieving R² = 0.984. These gains demonstrate that height-dependent feature encoding is vital for accurate thaw prediction.

Classification. Table 3 reports classification performance.
The histogram baseline achieves only 23.3 mIoU, demonstrating that categorical vegetation proportions alone are insufficient for thaw prediction despite vegetation structure's known correlation with permafrost dynamics. The mean-pooling baseline improves substantially to 57.5 mIoU, indicating that learned point features capture information beyond vegetation categories. Our method achieves an 8.6-point improvement in mIoU over the mean-pooling baseline, with the largest gains at the ordinal extremes: +18.6 points for High Heave (C1) and +27.9 points for High Thaw (C7). Accurate discrimination at the extremes is particularly important for infrastructure monitoring, where confusing severe thaw with heave could lead to opposite intervention decisions.

Table 2. Regression results (RMSE and MAE in cm).

Method         RMSE ↓   MAE ↓   R² ↑
Histogram      0.556    0.460   0.656
Mean-pooling   0.515    0.265   0.705
Ours           0.121    0.089   0.984

The ordinal metrics reveal more about how the models misclassify. QWK improves from 0.52 (histogram) to 0.75 (mean-pooling) to 0.97 (ours), indicating that when our model does misclassify, predictions remain closer to the true severity on the thaw-heave spectrum. This is corroborated by the reduction in MAECU from 0.53 (mean-pooling) to 0.19 class units. This shows that explicitly augmenting features by height enables the network to better distinguish ground surface conditions.

5.6. Ablation Study

We ablate key decoder components in Table 4, showing both classification and regression results.

Removing the learned z-embedding produces the largest performance drop (−7.4 points in mIoU), indicating that height-dependent feature modulation is the most critical component of our decoder.
Without explicit elevation encoding, the model cannot distinguish features originating from canopy versus ground returns during aggregation, even when the projection strategy preserves their spatial separation.

Replacing our stratified profile sampling with mean pooling reduces mIoU by 3.8 points. Mean pooling allows dense canopy returns to dominate the aggregated features, diluting the sparse but informative ground-level points that carry the strongest signal for thaw prediction.

Taking only the closest k neighbors, rather than using FPS to maximize the z-profile of each pixel, produces a drop in mIoU of 0.8 points. This drop suggests that spreading out sampled points increases the information available to the decoder. Without it, sampled points may be extremely close together, and points that represent the same object intuitively carry redundant information. Conversely, raising the candidate multiplier to 4 (the M = 4 row of Table 4) reduces mIoU by 3.2 points. A larger candidate pool allows FPS to select points that are too far away in XY to contribute meaningful local information.

Finally, using only the final encoder stage (S4) decreases mIoU by 4.6 points. Earlier stages capture fine-grained local geometry at higher point density, while later stages provide broader semantic context. Fusing across scales allows the decoder to leverage both detailed surface structure and coarse vegetation patterns.

6. Discussion

Our results demonstrate that learned height encoding is essential for accurate dense prediction in forested lidar scenes. The reduction in RMSE of more than 70% and near-perfect R² of 0.984 (Table 2) indicate that our decoder successfully preserves ground-level information beneath dense canopy coverage, enabling larger gains at the extremes of the thaw-heave spectrum where accurate prediction matters most for downstream decision-making.

The histogram baseline's poor performance (23.3 mIoU) confirms that categorical vegetation structure alone is insufficient despite vegetation's established relationship with permafrost thermal dynamics. The 34-point gap between histogram and mean-pooling baselines indicates that PTv3 captures meaningful representations beyond categorical structure.

The ablation study shows that learned z-embeddings contribute more to performance than the sampling strategy itself (−7.4 vs. −0.8 mIoU). This suggests the decoder's effectiveness lies not only in including points from multiple heights, but in learning how to weight features based on their vertical origin, essentially functioning as a learned vertical attention mechanism.

From an application perspective, forecasting sub-centimeter thaw depth from single-timepoint winter lidar enables proactive infrastructure management by identifying at-risk areas before thaw occurs. The strong performance on extreme classes is the most valuable in this context. Distinguishing severe thaw from heave represents the difference between subsidence risk requiring intervention and frost-driven uplift that may self-correct.

This work is not without limitations. Most significantly, our evaluation is constrained to a single geographic site in interior Alaska, comprising 781 spatial tiles. While the held-out evaluation set was spatially disjoint from training data, the model has not been tested on different forest types, permafrost regimes, or climatic conditions. The extent to which learned representations transfer across sites remains an open question.

Beyond permafrost monitoring, our approach addresses a general challenge in 3D scene understanding: projecting volumetric features to 2D when informative content is occluded by dominant structures. Applications include understory estimation in forestry, road surface prediction beneath vegetation, and ground-level inference in urban scenes.
Future work should prioritize cross-site validation and multi-temporal sequences to learn seasonal dynamics. Interpreting the learned z-embeddings' relationship to vegetation strata could yield insights for both remote sensing and permafrost modeling.

Table 3. Classification results. Per-class IoU and mean IoU (%). QWK = Quadratic Weighted Kappa; MAECU = Mean Absolute Error in Class Units.

Method        C1    C2    C3    C4    C5    C6    C7    mIoU ↑  QWK ↑  MAECU ↓
Histogram     50.2  21.8  14.3  11.9  11.1  29.0  24.6  23.3    0.52   1.27
Mean-pooling  70.8  67.7  65.1  42.8  54.7  63.5  37.9  57.5    0.75   0.53
Ours          89.4  67.5  68.0  43.3  60.1  67.6  65.8  66.1    0.97   0.19

Figure 5. Qualitative comparison. Each row: input point cloud, regression ground truth, regression prediction, classification ground truth, classification prediction.

Table 4. Ablation study on decoder components.

Variant                        mIoU (%)  RMSE (cm)
Full model                     66.1      0.121
Projection strategy
  Mean pooling (no profile)    62.3      0.297
  Closest-k (no FPS in z)      65.3      0.151
  M = 4 candidates             62.9      0.272
Feature encoding
  w/o z-embedding              58.7      0.448
Multi-scale fusion
  No multi-scale (S4 only)     61.5      0.192

7. Conclusion

We presented a projection decoder with learned height embedding for dense prediction from aerial lidar in forested environments. By augmenting point features with explicit z-encodings, our approach enables the model to learn height-dependent feature transformations that differentiate ground-level signals from canopy returns. Combined with stratified sampling that ensures all forest strata remain represented, the vertical information critical for predicting subsurface conditions is preserved. This enables thaw depth prediction from single-date winter point clouds, achieving a 70% reduction in RMSE and near-perfect ordinal agreement (QWK = 0.97) on our permafrost monitoring task.
The ablation analysis reveals that learned z-embeddings contribute more to performance than the projection strategy itself, suggesting that the decoder's effectiveness lies in learning height-dependent feature transformations rather than just sampling across strata. This finding has implications beyond permafrost: any dense prediction task where vertically distributed 3D structure must be collapsed to 2D (understory estimation in forestry, road surface prediction beneath vegetation, ground-level inference in occluded urban scenes) may benefit from explicit elevation encoding during projection.

Impact Statement

We present a projection decoder with learned height embeddings that enables accurate dense predictions from aerial lidar in forested environments, achieving sub-centimeter permafrost thaw estimation from single-date acquisitions. Our approach addresses the general challenge in 3D scene understanding of inferring ground-level properties beneath occluding structures, with immediate applications in Arctic infrastructure monitoring and climate change mitigation.

References

Bartsch, A., Strozzi, T., and Nitze, I. Permafrost monitoring from space. Surveys in Geophysics, pp. 1579–1613, 2023. ISSN 44. doi: 10.1007/s10712-023-09770-3.

Bonan, G. B. and Shugart, H. H. Environmental factors and ecological processes in boreal forests. Annu. Rev. Ecol. Syst, 20:1–28, 1989. URL www.annualreviews.org.

Brown, J., Hinkel, K. M., and Nelson, F. E. The circumpolar active layer monitoring (calm) program: Research designs and initial results 1. Polar Geography, 24(3):166–258, 2000. doi: 10.1080/10889370009377698. URL https://doi.org/10.1080/10889370009377698.

Camill, P. Patterns of boreal permafrost peatland vegetation across environmental gradients sensitive to climate warming. Canadian Journal of Botany, 77:721–733, 1999. ISSN 00084026.
doi: 10.1139/B99-008.

Campbell, M. J., Dennison, P. E., Hudak, A. T., y M. Parham, L., and Butler, B. W. Quantifying understory vegetation density using small-footprint airborne lidar. Remote Sensing of Environment, 215:330–342, 2018. ISSN 0034-4257. doi: https://doi.org/10.1016/j.rse.2018.06.023. URL https://www.sciencedirect.com/science/article/pii/S0034425718303018.

Chorsi, T. S., Meyer, F. J., and Dixon, T. H. Toward long-term monitoring of regional permafrost thaw with satellite interferometric synthetic aperture radar. The Cryosphere, 18:3723–3740, 2024. ISSN 19940416. URL https://link.gale.com/apps/doc/A805497895/AONE?u=tamp44898&sid=bookmark-AONE&xid=6842e4aa.

Dearborn, K. D., Wallace, C. A., Patankar, R., and Baltzer, J. L. Permafrost thaw in boreal peatlands is rapidly altering forest community composition. Journal of Ecology, 109:1452–1467, 3 2021. ISSN 1365-2745. doi: 10.1111/1365-2745.13569. URL https://onlinelibrary.wiley.com/doi/full/10.1111/1365-2745.13569.

Fisher, J., Estop Aragonés, C., Thierry, A., Charman, D., Wolfe, S., Hartley, I., Murton, J., Williams, M., and Phoenix, G. The influence of vegetation and soil characteristics on active-layer thickness of permafrost soils in boreal forest. Global change biology, 22, 02 2016. doi: 10.1111/gcb.13248.

Gaffey, C. and Bhardwaj, A. Applications of unmanned aerial vehicles in cryosphere: Latest advances and prospects. Remote Sensing, 12:948, 2020. doi: https://doi.org/10.3390/rs12060948.

Gao, K., Li, G., Chen, D., Su, A., Cao, Y., Li, C., Wu, G., Du, Q., Lin, J., Wang, X., Huang, S., Tang, L., and Jia, H. Pavement damage characteristics in the permafrost regions based on uav images and airborne lidar data. Cold Regions Science and Technology, 228:104313, 12 2024. ISSN 0165-232X. doi: 10.1016/J.COLDREGIONS.2024.104313. URL https://www.sciencedirect.com/science/article/pii/S0165232X24001940.

GTN-P. Global Terrestrial Network for Permafrost metadata for permafrost boreholes (TSP) and active layer monitoring (CALM) sites, 2015. URL https://doi.org/10.1594/PANGAEA.842821. Supplement to: Biskaborn, Boris K; Lanckman, Jean-Pierre; Lantuit, Hugues; Elger, Kirsten; Streletskiy, Dmitry A; Cable, William L; Romanovsky, Vladimir E (2015): The new database of the Global Terrestrial Network for Permafrost (GTN-P). Earth System Science Data, 7(2), 245-259, https://doi.org/10.5194/essd-7-245-2015.

Haughton, E. Permafrost thaw-induced forest to wetland conversion: potential impacts on snowmelt and basin runoff in northwestern Canada. PhD thesis, Wilfrid Laurier University, 2018. URL https://scholars.wlu.ca/etd.

Hu, J. S. K., Kuai, T., and Waslander, S. L. Point density-aware voxels for lidar 3d object detection, 2022. URL https://arxiv.org/abs/2203.05662.

Intergovernmental Panel on Climate Change (IPCC). Climate Change 2021: The Physical Science Basis. Working Group I Contribution to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, 2023. doi: 10.1017/9781009157896. URL https://www.cambridge.org/core/product/415F29233B8BD19FB55F65E3DC67272B.

Jorgenson, M. T. and Grosse, G. Remote sensing of landscape change in permafrost regions. Permafrost and Periglacial Processes, 27(4):324–338, 2016. doi: https://doi.org/10.1002/ppp.1914.
URL https://onlinelibrary.wiley.com/doi/abs/10.1002/ppp.1914.

Jorgenson, M. T., Romanovsky, V., Harden, J., Shur, Y., O'donnell, J., Schuur, E. A. G., Kanevskiy, M., Marchenko, S., Jorgenson, M. T., Romanovsky, V., Marchenko, S., Shur, Y., Kanevskiy, M., and Schuur, E. A. G. Resilience and vulnerability of permafrost to climate change. Canadian Journal of Forest Research, 40:1219–1236, 2010. doi: 10.1139/X.

Jorgenson, M. T., Kanevskiy, M., Roland, C., Hill, K., Schirokauer, D., Stehn, S., Schroeder, B., and Shur, Y. Repeated permafrost formation and degradation in boreal peatland ecosystems in relation to climate extremes, fire, ecological shifts, and a geomorphic legacy. Atmosphere, 13:1170, 7 2022. ISSN 20734433. doi: 10.3390/atmos13081170.

Kokelj, S. V., Gingras-Hill, T., Daly, S. V., Morse, P. D., Wolfe, S. A., Rudy, A. C., van der Sluijs, J., Weiss, N., Brendan O'Neill, H., Baltzer, J. L., Lantz, T. C., Gibson, C., Cazon, D., Fraser, R. H., Froese, D. G., Giff, G., Klengenberg, C., Lamoureux, S. F., Quinton, W. L., Turetsky, M. R., Chiasson, A., Ferguson, C., Newton, M., Pope, M., Paul, J. A., Wilson, M. A., and Young, J. M. The northwest territories thermokarst mapping collective: a northern-driven mapping collaborative toward understanding the effects of permafrost thaw. Arctic Science, 9(4):886–918, 2023. doi: 10.1139/as-2023-0009. URL https://doi.org/10.1139/as-2023-0009.

Kropp, H., Loranty, M. c. M., Natali, S. M., Kholodov, A. L., Rocha, A. V., Myers Smith, I., Abbot, Benjamin W and Abermann, J., Blanc-Betes, E., Blok, D., Blume-Werry, G., Boike, J., Breen, A. L., Cahoon, S. M. P., Christiansen, C. T., Douglas, T. A., Epstein, H. a. E., Frost, G. V., Goeckede, M., Høye, T. T., Mamet, S. e. D., O'Donnell, J. A., Olefeldt, D., Phoenix, G. K., Salmon, V. G., Sannel, A. B. K., Smith, S. L., Sonnentag, Oliver and Vaughn, L. S., Williams, M. h., Elberling, B., Gough, Laura and Hjort, J., Lafleur, P. M., Euskirchen, E. S., Heijmans, M. u. M., Humphreys, E. R., Iwata, H., Jones, B. M., Jorgenson, M. T., Grünberg, I., Kim, Y., Laundre, J., Mauritz, M., Michelsen, A., Schaepman Strub, G., Tape, Ken D and Ueyama, M., Lee, Bang-Yong and Langley, K., and Lund, M. Shallow soils are warmer under trees and tall shrubs across arctic and boreal ecosystems. Environmental Research Letters, 16(1):015001, dec 2020. doi: 10.1088/1748-9326/abc994. URL https://doi.org/10.1088/1748-9326/abc994.

Lang, A. H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. Pointpillars: Fast encoders for object detection from point clouds, 2019. URL https://arxiv.org/abs/1812.05784.

Lewkowicz, A., O'Neill, H., Wolfe, S., Roy-Léveillée, P., V.E., R., Hoeve, E., Gruber, S., Brooks, H., Rudy, A., Koenig, C., Brown, N., and Bonnaventure, P. Glossary of permafrost science and engineering. Technical report, Canadian Permafrost Association, 2025. URL http://doi.org/10.3138/cpa-gpse.

Li, A., Tan, X., Wu, W., Liu, H., and Zhu, J. Predicting active-layer soil thickness using topographic variables at a small watershed scale. PLoS ONE, 12, 9 2017. ISSN 19326203. doi: 10.1371/JOURNAL.PONE.0183742.

Li, H., Sima, C., Dai, J., Wang, W., Lu, L., Wang, H., Zeng, J., Li, Z., Yang, J., Deng, H., Tian, H., Xie, E., Xie, J., Chen, L., Li, T., Li, Y., Gao, Y., Jia, X., Liu, S., Shi, J., Lin, D., and Qiao, Y. Delving into the devils of bird's-eye-view perception: A review, evaluation and recipe, 2023. URL .

Li, Z., Zhao, R., Hu, J., Wen, L., Feng, G., Zhang, Z., and Wang, Q. Insar analysis of surface deformation over permafrost to estimate active layer thickness based on one-dimensional heat transfer model of soils open. Scientific Reports, 5:15542, 10 2015. doi: 10.1038/srep15542. URL www.nature.com/scientificreports/.
Li, Z., Wang, W., Li, H., Xie, E., Sima, C., Lu, T., Yu, Q., and Dai, J. Bevformer: Learning bird's-eye-view representation from multi-camera images via spatiotemporal transformers, 2022. URL abs/2203.17270.

Liu, Z., Tang, H., Amini, A., Yang, X., Mao, H., Rus, D., and Han, S. Bevfusion: Multi-task multi-sensor fusion with unified bird's-eye view representation, 2024. URL https://arxiv.org/abs/2205.13542.

Lou, P., Wu, T., Chen, J., Fu, B., Zhu, X., Chen, J., Wu, X., Yang, S., Li, R., Lin, X., Shang, C., Wen, A., Wang, D., La, Y., and Ma, X. Recognition of thaw slumps based on machine learning and uavs: A case study in the qilian mountains, northeastern qinghai-tibet plateau. International Journal of Applied Earth Observation and Geoinformation, 116:103163, 2 2023. ISSN 1569-8432. doi: 10.1016/J.JAG.2022.103163.

Lu, P. and Han, J. Remote sensing for identification and mapping of thermokarst landforms: A review. Permafrost and Periglacial Processes, 36:329–342, 6 2025. ISSN 10991530. doi: 10.1002/PPP.2275.

Meng, X., Currit, N., and Zhao, K. Ground filtering algorithms for airborne lidar data: A review of critical issues. Remote Sensing, 2(3):833–860, 2010. ISSN 2072-4292. doi: 10.3390/rs2030833. URL https://www.mdpi.com/2072-4292/2/3/833.

Osterkamp, T. E., Viereck, L., Shur, Y., Jorgenson, M. T., Racine, C., Doyle, A., and Boone, R. D. Observations of thermokarst and its impact on boreal forests in alaska, u.s.a. Arctic, Antarctic, and Alpine Research, 32:303–315, 8 2000. ISSN 1523-0430. doi: 10.1080/15230430.2000.12003368. URL https://www.tandfonline.com/doi/abs/10.1080/15230430.2000.12003368.

Pastick, N. J., Jorgenson, M. T., Wylie, B. K., Minsley, B. J., Ji, L., Walvoord, M. A., Smith, B. D., Abraham, J. D., and Rose, J. R. Extending airborne electromagnetic surveys for regional active layer and permafrost mapping with remote sensing and ancillary data, yukon flats ecoregion, central alaska. Permafrost and Periglacial Processes, 24:184–199, 4 2013. ISSN 10456740. doi: 10.1002/PPP.1775.

Philion, J. and Fidler, S. Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d, 2020. URL 05711.

Qi, C. R., Su, H., Mo, K., and Guibas, L. J. Pointnet: Deep learning on point sets for 3d classification and segmentation. arXiv preprint arXiv:1612.00593, 2016.

Qi, C. R., Yi, L., Su, H., and Guibas, L. J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413, 2017.

Renette, C., Olvmo, M., Thorsson, S., Holmer, B., and Reese, H. Multitemporal uav lidar detects seasonal heave and subsidence on palsas. The Cryosphere, 18:5465–5480, 11 2024. ISSN 1994-0424. doi: 10.5194/tc-18-5465-2024. URL https://tc.copernicus.org/articles/18/5465/2024/.

Reutebuch, S. E., McGaughey, R. J., Andersen, H.-E., and Carson, W. W. Accuracy of a high-resolution lidar terrain model under a conifer forest canopy. Canadian Journal of Remote Sensing, 29(5):527–535, 2003. doi: 10.5589/m03-022. URL https://doi.org/10.5589/m03-022.

Turetsky, M. R., Abbott, B. W., Jones, M. C., Anthony, K. W., Olefeldt, D., Schuur, E. A. G., Koven, C., McGuire, A. D., Grosse, G., Kuhry, P., Hugelius, G., Lawrence, D. M., Gibson, C., and Sannel, A. B. K. Permafrost collapse is accelerating carbon release. Nature, pp. 32–24, 2019. ISSN 569. doi: 10.1038/d41586-019-01313-4.

Van der Sluijs, J., Kokelj, S. V., Fraser, R. H., Tunnicliffe, J., and Lacelle, D. Permafrost terrain dynamics and infrastructure impacts revealed by uav photogrammetry and thermal imaging. Remote Sensing, 10(11), 2018. ISSN 2072-4292. doi: 10.3390/rs10111734. URL https://www.mdpi.com/2072-4292/10/11/1734.
Wu, X., Lao, Y., Jiang, L., Liu, X., and Zhao, H. Point transformer v2: Grouped vector attention and partition-based pooling. In NeurIPS, 2022.

Wu, X., Jiang, L., Wang, P.-S., Liu, Z., Liu, X., Qiao, Y., Ouyang, W., He, T., and Zhao, H. Point transformer v3: Simpler, faster, stronger. In CVPR, 2024.

Yoshikawa, K., Bolton, W. R., Romanovsky, V. E., Fukuda, M., and Hinzman, L. D. Impacts of wildfire on the permafrost in the boreal forests of interior alaska. Journal of Geophysical Research: Atmospheres, 107:FFR 4–1, 1 2002. ISSN 2156-2202. doi: 10.1029/2001JD000438. URL https://onlinelibrary.wiley.com/doi/full/10.1029/2001JD000438.

Zhao, H., Jiang, L., Jia, J., Torr, P., and Koltun, V. Point transformer, 2021. URL abs/2012.09164.

Zhou, Y. and Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection, 2017. URL https://arxiv.org/abs/1711.06396.

Zhu, D., Ciais, P., Krinner, G., Maignan, F., Puig, A. J., and Hugelius, G. Controls of soil organic matter on soil thermal dynamics in the northern high latitudes. Nature Communications 2019 10:1, 10:1–9, 7 2019. ISSN 2041-1723. doi: 10.1038/s41467-019-11103-1. URL https://www.nature.com/articles/s41467-019-11103-1.

A. Dataset Details

A.1. Ground Truth Raster Properties

The ground truth elevation change raster was derived from temporally paired UAV-lidar acquisitions (May to August 2024), computed as (August elevation − May elevation). Negative values indicate thaw (ground subsidence), while positive values indicate heave (frost uplift or vegetation growth). Table 5 summarizes the raster properties.

Table 5. Ground truth raster properties.
Property                      Value
Coordinate Reference System   EPSG:32606 (UTM Zone 6N)
Spatial Resolution            0.10 m × 0.10 m
Raster Dimensions             2261 × 2307 pixels
Geographic Extent             226.1 m × 230.7 m
Total Area                    5.22 ha
Valid Data Coverage           67.0%
Valid Value Range             [−2.33, +5.70] cm

A.2. Elevation Change Statistics

Table 6 presents comprehensive statistics for the ground truth elevation change distribution. The positive mean (+0.99 cm) and median (+0.72 cm) indicate the site is heave-dominated overall, with localized thaw features corresponding to active thermokarst (including a central pond). The positive skewness (+0.96) reflects the asymmetric distribution with a longer tail toward extreme heave values, including vegetation growth in a grassy strip along the eastern edge.

Table 6. Ground truth elevation change statistics. Negative values indicate thaw (subsidence); positive values indicate heave (uplift).

Statistic             Value (cm)    Statistic            Value
Minimum (max thaw)    −2.33         Skewness             +0.96
Maximum (max heave)   +5.70         Kurtosis             0.45
Mean                  +0.99         Coef. of Variation   164.6%
Median                +0.72         Valid Pixels         3,495,868
Std. Dev.             1.62

A.3. Thaw vs. Heave Distribution

Table 7. Overall distribution of thaw and heave pixels.

Category                      Pixels      Percentage
Thaw (negative, subsidence)   808,730     23.1%
No Change (±0.2 cm)           396,961     11.4%
Heave (positive, uplift)      2,290,177   65.5%

A.4. Spatial Autocorrelation

We quantify spatial autocorrelation using Moran's I statistic, which measures the degree to which values at nearby locations are more similar (positive autocorrelation) or more dissimilar (negative autocorrelation) than expected under spatial randomness.
Moran's I is defined as:

\[
I = \frac{N}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2} \tag{8}
\]

where N is the number of spatial units (pixels), x_i is the value at location i, x̄ is the global mean, and w_ij is the spatial weight between locations i and j (typically 1 for adjacent pixels and 0 otherwise). The statistic ranges from −1 (perfect dispersion) through 0 (spatial randomness) to +1 (perfect clustering).

The ground truth exhibits strong positive spatial autocorrelation (Moran's I = 0.99), indicating that thaw and heave patterns form spatially coherent clusters rather than random noise. This high value confirms that elevation change is governed by spatially continuous physical processes (e.g., thermokarst expansion, frost heave) rather than pixel-independent noise. From a modeling perspective, this supports the use of convolutional operations in the output head, as neighboring predictions are highly correlated and spatial context provides meaningful information.

B. Class Distribution Analysis

B.1. Per-Class Statistics

Table 8 provides detailed statistics for each severity class. The dataset exhibits moderate class imbalance: High Thaw (C7) contains only 7.1% of pixels, while High Heave (C1) contains 24.8%, a 3.5× imbalance ratio. This reflects the physical reality that severe thermokarst subsidence is spatially localized (e.g., the pond feature), while heave dominates the surrounding stable permafrost and vegetated areas.

Table 8. Per-class statistics for the 7-class formulation. Boundaries defined on elevation change (negative = thaw, positive = heave). Imbalance ratio computed as (max class count) / (class count).

Class                 Boundary (cm)   Count     %      Weight   Imbalance
C1: High Heave        > +1.6          868,553   24.8   1.00     1.0×
C2: Med-High Heave    (+1.0, +1.6]    566,837   16.2   1.53     1.5×
C3: Med-Low Heave     (+0.5, +1.0]    556,133   15.9   1.56     1.6×
C4: Low Heave         (+0.2, +0.5]    298,654   8.5    2.91     2.9×
C5: No Change         [−0.2, +0.2]    396,961   11.4   2.19     2.2×
C6: Low Thaw          [−1.0, −0.2)    561,384   16.1   1.55     1.5×
C7: High Thaw         < −1.0          247,346   7.1    3.51     3.5×

The heave-dominated distribution (65.5% of pixels in C1–C4) reflects the Farmer's Loop site characteristics: predominantly stable permafrost with seasonal frost heave, a grassy strip exhibiting vegetation growth between acquisitions (+5.70 cm maximum), and localized thermokarst features including a central pond (−2.33 cm maximum subsidence). The 3.5× imbalance for High Thaw (C7) motivates our use of inverse-frequency class weighting and highlights the challenge of detecting rare but critical subsidence events for infrastructure monitoring.

C. Implementation Details

C.1. Full Hyperparameter Configuration

The complete set of hyperparameters is shown in Table 9.

Table 9. Complete hyperparameter configuration.

Component               Parameter                 Value
Encoder (PTv3)          Stages                    4
                        Layers per stage          [2, 2, 2, 2]
                        Channel dimensions        [64, 128, 256, 512]
                        Input features            7 (XYZ, RGB, intensity)
Decoder                 Hidden dimension D        128
                        Neighbors k               32
                        Candidate multiplier M    2
                        Output grid size          64 × 64
                        XY falloff threshold τ    0.1
                        XY falloff rate λ         10
Training                Optimizer                 AdamW
                        Learning rate             1 × 10^−5
                        LR scheduler              Polynomial decay
                        Warmup epochs             2
                        Total epochs              100
                        Batch size                1
                        Gradient accumulation     2 steps
                        Max points per tile       60,000
                        Precision                 Mixed (FP16)
Augmentation            Rotation                  Random 90° multiples
                        Jitter                    XYZ coordinate noise
Loss (Classification)   Type                      Cross-entropy
                        Class weights             Inverse frequency
Loss (Regression)       Type                      MSE

C.2. XY Distance Weighting Parameters

The z-stratified projection decoder uses a threshold-based exponential weighting scheme to control how points contribute to each grid cell based on their XY distance. For each grid cell query location, the weight assigned to a neighboring point at XY distance d is:

\[
w(d) = \exp\left(-\lambda \cdot \max(0,\, d - \tau)\right) \tag{9}
\]

where τ is the distance threshold below which points receive full weight, and λ controls the decay rate beyond the threshold. We select τ = 0.1 and λ = 10 based on empirical analysis of our dataset's point distribution. Figure 6 shows the relationship between these parameters and point coverage.

Threshold Selection (τ = 0.1). The left panel shows the percentage of k-nearest neighbors (k = 64) that fall within a given threshold distance. We plot k = 64 because, when M = 2, FPS samples from the 64 nearest points to generate the z-profile for the decoder. At τ = 0.1, approximately 94.8% of selected neighbors receive full weight (w = 1). This ensures that the vast majority of points used in aggregation contribute without attenuation, while still allowing the weighting scheme to downweight the small fraction of points that may be spatially misaligned due to variance in point cloud density. A smaller threshold would unnecessarily penalize well-positioned points, while a larger threshold would effectively disable the weighting mechanism entirely.

Falloff Selection (λ = 10). The right panel illustrates weight decay curves for various λ values. With λ = 10, points at distance d = 0.2 (twice the threshold) receive weight w ≈ 0.37, and points at d = 0.3 receive w ≈ 0.14. This provides a smooth transition that maintains spatial locality without introducing discontinuities. Lower values (e.g., λ = 1) would allow distant points to contribute nearly equally, potentially blurring spatial boundaries. Higher values (e.g., λ = 20) would create sharper cutoffs that approach hard thresholding, which we found less stable during training.

The combination of τ = 0.1 and λ = 10 thus implements a "soft locality" constraint: points within approximately one grid cell spacing contribute fully, while more distant points are progressively downweighted rather than excluded entirely.

Figure 6. XY distance weighting analysis. Left: Fraction of k-nearest neighbors within threshold distance. Right: Weight falloff curves for different λ values at τ = 0.1.

C.3. Neighborhood Size Selection

We use k = 32 neighbors per grid cell, informed by both empirical analysis and established conventions in point cloud deep learning (Qi et al., 2017; Zhao et al., 2021). As described in Section C.4, we first retrieve M · k = 64 candidate points (the nearest neighbors in XY), then select k = 32 from this pool. Crucially, farthest point sampling (FPS) in the z-dimension will preferentially select points near the spatial boundary of the candidate pool, since XY distance is not taken into consideration. This means the effective spatial extent of the final k points is largely determined by the M · k candidate pool, not by k alone.

Figure 7 shows the average maximum distance among k-nearest neighbors as a function of k, computed across sample tiles from our dataset. The dashed line indicates the grid cell spacing in normalized coordinates (2/64 ≈ 0.031). With M = 2, our 64-point candidate pool extends well beyond the cell boundary, and FPS will reliably select points from this outer region to capture vertical extremes. This ensures complete spatial coverage while the XY weighting scheme (Section C.2) appropriately downweights these more distant points during feature aggregation.

Figure 7. Spatial extent of k-nearest neighbors. The y-axis shows the average maximum distance among the k neighbors selected for each grid cell. The red dashed line indicates the grid cell spacing; values below this line indicate insufficient coverage.

C.4. Candidate Pool Multiplier

The hybrid sampling strategy first identifies M · k candidate points (the M · k nearest neighbors in XY), then selects k points from this pool using a combination of closest-k and farthest point sampling (FPS) in the z-dimension. The multiplier M controls the size of this candidate pool.

Larger M provides more candidates for FPS to select from, potentially enabling better vertical diversity. However, expanding the candidate pool necessarily includes points at greater XY distances, which may be spatially misaligned with the target grid cell. This creates a tradeoff between vertical sampling diversity and horizontal spatial fidelity.

At M = 1, the candidate pool equals the final selection size (M · k = k), reducing the method to simple k-nearest neighbors. This eliminates the z-stratified sampling that motivates our decoder design.

We evaluated M ∈ {2, 4} and found that M = 2 achieves 66.1 mIoU compared to 62.9 mIoU for M = 4. This 3.2-point improvement demonstrates that the candidate pool should be kept spatially tight. With M = 4 (128 candidates for k = 32), the outer candidates lie at XY distances where even the exponential downweighting cannot fully compensate for spatial misalignment. The resulting feature aggregation conflates information from neighboring grid cells, degrading prediction accuracy. In contrast, M = 2 (64 candidates) restricts the pool to a compact spatial neighborhood while still providing sufficient candidates for FPS to achieve meaningful vertical diversification.

C.5. Data Normalization

• Spatial coordinates: X, Y normalized to [−1, 1] per-tile; Z normalized to [0, 1] globally across the dataset to preserve relative elevation information.
• Per-point features: RGB and intensity standardized to zero mean, unit variance using dataset-wide statistics.
• Ground truth: Normalized to zero mean and unit variance for training. Predictions are denormalized for evaluation.

D. Computational Cost

Table 10. Computational requirements. Inference time measured on an NVIDIA RTX 4090M Laptop GPU.

Method                 Parameters   GPU Memory   Inference
Histogram Baseline     0.2M         0.02 GB      0.76 ms/tile
Mean-pooling + PTv3    39.4M        4.3 GB       119 ms/tile
Ours (Z-Stratified)    41.1M        4.6 GB       133 ms/tile

Our decoder adds 1.7M parameters (+4.3%) and 14 ms inference time (+11.8%) compared to mean-pooling, while achieving an 8.6-point mIoU improvement and a 70% RMSE reduction, which is a favorable accuracy-efficiency tradeoff.

E. Broader Applicability

While developed for permafrost monitoring, our z-stratified projection addresses a general challenge: preserving vertically distributed information when projecting 3D data to 2D beneath occluding structures. Potential applications include:

• Forestry: Understory biomass estimation from canopy-penetrating lidar
• Urban mapping: Ground-level inference beneath building overhangs
• Autonomous driving: Road surface estimation under vegetation
• Archaeology: Subsurface feature detection through forest cover

F. Ground Truth Visualization and Analysis

This section presents visualizations of the ground truth elevation change data to provide insight into the dataset characteristics and support reproducibility.

F.1. Value Distribution

Figure 8 shows the distribution of elevation change values across the study site. The histogram (a) reveals a right-skewed distribution centered around +0.7 cm, with the majority of pixels falling in the heave regime (positive values). The vertical dashed lines indicate the class boundaries used for the 7-class classification formulation. The distribution exhibits a clear peak in the low heave range (0 to +1.0 cm) and a longer tail toward extreme heave values reaching +5.7 cm, corresponding to vegetation growth in grassy areas.
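The class boundaries referenced above are listed in Table 8, and the following is a minimal sketch of how a single elevation-change value could be mapped to a severity class and how the inverse-frequency weights could be derived from the per-class pixel counts. The function names `severity_class` and `inverse_frequency_weights` are hypothetical and introduced only for illustration. Treating missing raster values as NaN and computing the weight as (max class count) / (class count) are assumptions; the latter is the formula Table 8 gives for the imbalance ratio, and it happens to match the Weight column to two decimals.

```python
import math

# Per-class pixel counts transcribed from Table 8 (C1 = High Heave ... C7 = High Thaw).
CLASS_COUNTS = {
    1: 868_553, 2: 566_837, 3: 556_133, 4: 298_654,
    5: 396_961, 6: 561_384, 7: 247_346,
}


def severity_class(dz_cm):
    """Map an elevation change in cm to a class index 1..7 (Table 8 boundaries).

    Returns None for NaN, which this sketch assumes marks invalid pixels.
    """
    if math.isnan(dz_cm):
        return None
    if dz_cm > 1.6:
        return 1  # High Heave: > +1.6
    if dz_cm > 1.0:
        return 2  # Med-High Heave: (+1.0, +1.6]
    if dz_cm > 0.5:
        return 3  # Med-Low Heave: (+0.5, +1.0]
    if dz_cm > 0.2:
        return 4  # Low Heave: (+0.2, +0.5]
    if dz_cm >= -0.2:
        return 5  # No Change: [-0.2, +0.2]
    if dz_cm >= -1.0:
        return 6  # Low Thaw: [-1.0, -0.2)
    return 7      # High Thaw: < -1.0


def inverse_frequency_weights(counts):
    """Assumed weighting: largest class count divided by each class count."""
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}


if __name__ == "__main__":
    for cls, weight in sorted(inverse_frequency_weights(CLASS_COUNTS).items()):
        print(f"C{cls}: weight {weight:.2f}")
```

Under these assumptions the rare High Thaw class (C7) receives the largest weight and the most common class (C1) receives a weight of 1.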
The box plots (b) illustrate the within-class value distrib utions, ordered from High Thaw (C7, leftmost) to High Hea ve (C1, rightmost). The thaw classes (C6–C7) sho w tight distributions corresponding to the localized thermokarst feature, while High Heav e (C1) exhibits greater spread due to varying ve getation gro wth rates. The median values progress monotonically across classes, conﬁrming that the ordinal class structure captures a meaningful physical gradient from subsidence to uplift. 16 Preser ving V ertical Structure in 3D-to-2D Pr ojection f or Permafr ost Thaw Mapping F igure 8. Ground truth value distrib ution. (a) Histogram of elev ation change v alues with class boundaries (dashed red lines). Negati ve values indicate tha w (subsidence); positi ve v alues indicate heave (uplift). (b) Box plots showing the value distrib ution within each class, ordered from High Thaw (C7) to High Hea ve (C1). F .2. Spatial Distribution Figure 9 presents the spatial distribution of ground truth values across the Farmer’ s Loop study site. The continuous map (a) rev eals distinct spatial patterns: a localized tha w feature (thermokarst with central pond) appears in w arm colors (red/orange) in the south-central region, while heav e dominates the surrounding areas (cool colors), with the most extreme heav e along a grassy strip on the eastern edge where vegetation gro wth between May and August contributes to positi ve ele vation change. The classiﬁed map (b) discretizes these patterns into the 7-class formulation. The High Thaw class (C7, dark red) forms coherent spatial clusters corresponding to activ e thermokarst features, while the High Heav e class (C1, dark blue) dominates the v egetated eastern strip. The strong spatial coherence (Moran’ s I = 0.99) is visually e vident, with similar classes clustering together rather than appearing randomly distributed. F igure 9. Spatial distribution of ground truth ele v ation change. 
(a) Continuous values in centimeters, with warm colors indicating thaw (subsidence) and cool colors indicating heave (uplift). (b) 7-class discretization showing the spatial extent of each severity category. Coordinates are in UTM Zone 6N (EPSG:32606).

F.3. Class Imbalance Analysis

Figure 10 quantifies the class distribution and resulting imbalance. The bar chart (a) shows that the dataset is heave-dominated, with 65.5% of pixels falling in heave classes (C1–C4). High Heave (C1) is the most prevalent class at 24.8%, while High Thaw (C7) is the least common at 7.1%.

The imbalance ratios (b) reveal moderate class imbalance with a maximum ratio of 3.5× between the largest (C1) and smallest (C7) classes. This imbalance reflects the physical reality that severe thermokarst subsidence is spatially localized, making accurate detection of High Thaw the most challenging and arguably most important task for infrastructure monitoring. The inverse-frequency weights shown in Table 8 ensure that the rare High Thaw class (C7) receives proportionally higher loss contributions during training.

Figure 10. Class distribution analysis. (a) Percentage of pixels in each class, showing the heave-dominated nature of the dataset. (b) Imbalance ratios computed as (max class count) / (class count), with the dashed line indicating balanced classes. High Thaw (C7) is 3.5× underrepresented relative to High Heave (C1).

G. Qualitative Results

Figure 11 presents qualitative results on three representative evaluation tiles, showing both regression and classification outputs alongside ground truth. Across all examples, the model demonstrates robust performance in capturing the dominant spatial patterns of permafrost dynamics, with classification predictions that closely match ground truth class boundaries.
The regression outputs provide additional granularity for continuous monitoring applications.

Figure 11. Qualitative results on three evaluation tiles. Each row shows (left to right): input point cloud with RGB coloring, regression ground truth, regression prediction, classification ground truth, and classification prediction. Regression colormaps show thaw (red/negative) to heave (blue/positive) in centimeters. Classification legend indicates severity from High Thaw (C7, red) through No Change (C5, cream) to High Heave (C1, dark blue). The model accurately captures spatial patterns of thaw and heave across diverse tile conditions.
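As a small check on the imbalance figures in Section F.3, the sketch below computes the imbalance ratio exactly as defined in the Figure 10 caption, (max class count) / (class count), and one possible form of inverse-frequency weighting. Only the two class shares reported in the text (C1 at 24.8%, C7 at 7.1%) are used. The function names and the choice to rescale the weights to an average of 1.0 are assumptions made for illustration; the weights actually used are those in Table 8.

```python
def imbalance_ratios(class_share):
    """Return (largest share) / (share) per class; the largest class gets 1.0."""
    largest = max(class_share.values())
    return {name: largest / share for name, share in class_share.items()}


def inverse_frequency_weights(class_share):
    """Weights proportional to 1/share, rescaled to average 1.0.

    The rescaling is an assumed convention, not taken from the paper.
    """
    raw = {name: 1.0 / share for name, share in class_share.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {name: weight / mean_raw for name, weight in raw.items()}


# Class shares in percent, as reported in Section F.3 (other classes omitted).
reported_share = {"C1": 24.8, "C7": 7.1}

ratios = imbalance_ratios(reported_share)
print(round(ratios["C7"], 1))  # -> 3.5

weights = inverse_frequency_weights(reported_share)
print(weights["C7"] > weights["C1"])  # -> True
```

The ratio of 24.8 to 7.1 reproduces the reported 3.5×, and under any inverse-frequency scheme the rarer High Thaw class receives the larger weight, which is the behavior the text attributes to Table 8.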
