PiCSRL: Physics-Informed Contextual Spectral Reinforcement Learning
Authors: Mitra Nasr Azadani, Syed Usama Imtiaz, Nasrin Alamdari
PiCSRL: Physics-Informed Contextual Spectral Reinforcement Learning

Mitra Nasr Azadani, Syed Usama Imtiaz, and Nasrin Alamdari
Department of Civil and Environmental Engineering, Florida State University, Tallahassee, FL, USA
{si22j, mn22, nalamdari}@fsu.edu

Abstract: High-dimensional low-sample-size (HDLSS) datasets constrain reliable environmental model development, where labeled data remain sparse. Reinforcement learning (RL)-based adaptive sensing methods can learn optimal sampling policies, yet their application is severely limited in HDLSS contexts. In this work, we present PiCSRL (Physics-Informed Contextual Spectral Reinforcement Learning), where embeddings are designed using domain knowledge and passed directly into the RL state representation for improved adaptive sensing. We developed an uncertainty-aware belief model that encodes physics-informed features to improve prediction. As a representative example, we evaluated our approach on a cyanobacterial gene concentration adaptive sampling task using NASA PACE hyperspectral imagery over Lake Erie. PiCSRL achieves optimal station selection (RMSE = 0.153, 98.4% bloom detection rate), outperforming the random (RMSE = 0.296) and UCB (RMSE = 0.178) baselines. Our ablation experiments demonstrate that physics-informed features improve test generalization (test R² = 0.52, +0.11 over raw bands) in semi-supervised learning. In addition, our scalability test shows that PiCSRL scales effectively to large networks (50 stations, >2M combinations) with significant improvements over baselines (p = 0.002). We posit PiCSRL as a sample-efficient adaptive sensing method across Earth observation domains for improved observation-to-target mapping.

Index Terms: Adaptive sensing, reinforcement learning, physics-informed machine learning, high-dimensional data, hyperspectral remote sensing

Copyright 2026 IEEE. Published in the 2026 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2026), scheduled for 9-14 August 2026 in Washington, D.C. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.

I. INTRODUCTION

Reinforcement learning (RL)-based adaptive sensing (ADS) methods [1] learn when and how to sense within an action space (e.g., sampling rate, sensor activation decision) via optimization that maximizes information gain and minimizes resource use. Yet this learning is severely constrained in high-dimensional low-sample-size (HDLSS) [2] contexts, in particular environmental monitoring. In the realm of Earth observation (EO), remotely sensed (RS) data have driven a paradigm shift to complement ground-truth data [3], from near-surface sensors to NASA's newly launched Plankton, Aerosol, Cloud, and Ocean Ecosystem (PACE) mission, which provides hundreds of contiguous spectral bands with increasingly high-resolution data and rich spatial details. However, this unprecedented capability for environmental monitoring (e.g., water quality [4, 5], precision agriculture) requires learning from labeled examples, i.e., ground-truth measurements, which remain sparse and limited. When the feature dimension approaches or exceeds the sample size, the mathematical foundations of learning change fundamentally: covariance estimates become unreliable, and models fit noise rather than the true signal.
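The point about unreliable covariance estimates can be illustrated numerically. The following is a minimal sketch, not part of the authors' pipeline: it draws synthetic Gaussian data (an assumption; no PACE data are involved) with n = 98 samples, the number of labeled training station-days reported in the experiments, and compares d = 286 raw features with 10 reduced features. The function name `sample_covariance_rank` is hypothetical.

```python
import numpy as np


def sample_covariance_rank(n_samples: int, n_features: int, seed: int = 0) -> int:
    """Rank of the sample covariance matrix of synthetic Gaussian data."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    # rowvar=False: rows are samples, columns are features -> (d, d) matrix.
    cov = np.cov(X, rowvar=False)
    return int(np.linalg.matrix_rank(cov))


# HDLSS regime: more features than samples, so the covariance is singular.
rank_hdlss = sample_covariance_rank(n_samples=98, n_features=286)

# Reduced regime: ten features, same number of samples.
rank_reduced = sample_covariance_rank(n_samples=98, n_features=10)

print(f"d=286: rank {rank_hdlss} of 286")
print(f"d=10:  rank {rank_reduced} of 10")
```

After mean-centering, n samples can support a covariance rank of at most n − 1, so with d > n the estimate is always singular and cannot be inverted without regularization or dimensionality reduction.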
ADS dynamically allocates monitoring resources based on the belief state to improve detection efficiency [6]. When the action space is small, simple heuristic methods may suffice, and optimal solutions can be derived via exhaustive search. However, as the action space grows, the number of candidate combinations grows combinatorially, and finding the optimal one quickly becomes intractable. Current ADS methods in environmental monitoring have explored Gaussian process bandits [7], information-directed sampling [8], and spatial utility optimization [9]. HDLSS constraints are often addressed through regularization techniques that reduce model complexity, as well as dimensionality reduction and data augmentation [10]. These challenges, a growing action space combined with HDLSS data for adaptive sensing, are further exacerbated by the sequential nature of decision-making in the water quality domain, where constraints imposed by catchment hydrology [11] are coupled with complex, nonlinear bacterial ecological behavior [12, 13]. Moreover, in spectral sensing applications where physical laws govern observations [14], this creates a fundamental representation problem.

We propose PiCSRL: Physics-Informed Contextual Spectral Reinforcement Learning, where embeddings are designed using domain knowledge and passed directly into the RL state representation for improved adaptive sensing. As a representative example, we validate our approach on sampling-policy learning for cyanobacterial gene concentration prediction using NASA PACE hyperspectral imagery over Lake Erie, USA. Our contributions are as follows:

1) We posit that PiCSRL bridges HDLSS constraints with an improved representation mechanism for models utilizing hyperspectral data.
2) We introduce PiCSRL, the first RL framework using hyperspectral sensing under HDLSS constraints for sample-efficient policy learning.
3) We demonstrate direct hyperspectral-to-cyanobacterial-gene predictive modeling; to our knowledge, this is the first RL-based sequential decision policy for algal blooms learned from hyperspectral imagery for improved lake sampling.

II. METHODS

A. Problem Formulation

Let $x \in \mathbb{R}^d$ denote sensor observations with $d$ features and $y \in \mathbb{R}$ the target quantity at each location. The agent has access to $n$ labeled training samples where $d > n$, the HDLSS condition. The task is to learn a policy $\pi$ that sequentially selects $K$ locations from $N$ candidates to maximize information about the spatial field subject to budget constraints.

B. Semi-Supervised Learning with Physics-Informed Features

Let $\phi : \mathbb{R}^d \to \mathbb{R}^{d'}$ denote a physics-informed transformation where $d' \ll d$. For our spectral observations, $\phi$ computes ten indices derived from established spectroscopic relationships (Table I):

$$x_i^{\text{phys}} = \phi(r_i) = [I_1(r_i), I_2(r_i), \ldots, I_{10}(r_i)]^T \tag{1}$$

where $r_i \in \mathbb{R}^{286}$ is the raw hyperspectral reflectance and each index $I_j$ encodes a specific physical mechanism. We employ a teacher-student semi-supervised learning (SSL) pipeline to leverage unlabeled observations, with dimensionality reduced through physics-informed features. We first train a ridge regression model on the labeled dataset using the physics-informed features:

$$\hat{y}_i = w^T x_i^{\text{phys}} + b, \quad x_i^{\text{phys}} \in \mathbb{R}^{10} \tag{2}$$

The regularization parameter $\alpha = 1.0$ was selected via 5-fold cross-validation to prevent overfitting on the small labeled set. Ridge regression serves as a robust teacher due to its closed-form solution, resistance to multicollinearity, and well-calibrated predictions under regularization.
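To make the transformation $\phi$ in Eq. (1) concrete, the sketch below computes the band-difference and band-ratio indices whose formulas are listed in Table I. It is an illustration under stated assumptions rather than the authors' implementation: the function name `physics_indices`, the dictionary keyed by wavelength in nm, and the example reflectance values are all hypothetical, and FAI is left out because the source defers its formula to [15].

```python
from typing import Dict, List


def physics_indices(rho: Dict[int, float]) -> List[float]:
    """Map band reflectances (keyed by wavelength in nm) to Table I indices.

    FAI is intentionally omitted: the source gives no formula for it here.
    """
    return [
        rho[681] - rho[665],                            # CI
        (rho[709] - rho[665]) / (rho[709] + rho[665]),  # NDCI
        rho[709] - (rho[681] + rho[753]) / 2.0,         # MCI
        rho[620] / rho[665],                            # PC
        rho[680] / rho[665],                            # ChlRed
        rho[443] / rho[555],                            # BG
        rho[555] / rho[665],                            # GR
        rho[865] / rho[665],                            # NIR
        (rho[665] - rho[620]) / (rho[665] + rho[620]),  # NDI
    ]


# Hypothetical reflectances for a single pixel (illustration only, not PACE data).
example_pixel = {
    443: 0.020, 555: 0.040, 620: 0.030, 665: 0.020, 680: 0.025,
    681: 0.025, 709: 0.030, 753: 0.015, 865: 0.010,
}

features = physics_indices(example_pixel)
print(len(features), "indices computed")
```

The resulting low-dimensional vector, extended with FAI, is the kind of input that the ridge teacher in Eq. (2) would consume in place of the 286 raw bands.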
The trained teacher model generates pseudo-labels for all unlabeled pixels from their physics-informed features:

$$\tilde{y}_u = f_{\text{teacher}}(\phi(r_u)), \quad \forall r_u \in \mathcal{D}_{\text{unlabeled}} \tag{3}$$

Pseudo-labels are clipped to the empirical range $[\min(y_{\text{train}}), \max(y_{\text{train}})]$ to enforce physical plausibility. We adopt a multi-layer perceptron (MLP) as the student model and train it on the combined labeled and pseudo-labeled datasets, totaling 60,215 samples. The architecture consists of two hidden layers (64 and 32 neurons) with batch normalization after each layer, ReLU activation, and dropout ($p = 0.3$) for regularization. We use a weighted loss function that anchors predictions to ground-truth measurements while leveraging the broader spectral distribution captured in unlabeled data:

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} w_i (y_i - \hat{y}_i)^2 \tag{4}$$

where $w_i = 10$ for labeled samples and $w_i = 1$ for pseudo-labeled samples. This 10:1 weighting ratio was chosen to balance trust in ground-truth measurements against teacher predictions.

TABLE I
PHYSICS-BASED SPECTRAL INDICES

| Index | Formulation | Physical Mechanism |
|---|---|---|
| CI | $\rho_{681} - \rho_{665}$ | Chlorophyll fluorescence |
| NDCI | $(\rho_{709} - \rho_{665}) / (\rho_{709} + \rho_{665})$ | Red-edge response |
| MCI | $\rho_{709} - (\rho_{681} + \rho_{753}) / 2$ | Maximum chlorophyll |
| FAI | See [15] | Floating algae index |
| PC | $\rho_{620} / \rho_{665}$ | Phycocyanin absorption |
| ChlRed | $\rho_{680} / \rho_{665}$ | Chlorophyll/red ratio |
| BG | $\rho_{443} / \rho_{555}$ | Blue/green ratio |
| GR | $\rho_{555} / \rho_{665}$ | Green/red ratio |
| NIR | $\rho_{865} / \rho_{665}$ | NIR/red ratio |
| NDI | $(\rho_{665} - \rho_{620}) / (\rho_{665} + \rho_{620})$ | Normalized difference |

Representative indices shown; full set includes ten features. $\rho_\lambda$ denotes reflectance at wavelength $\lambda$ nm. Formulations derive from established bio-optical relationships [4, 5].

C. Reinforcement Learning in Reduced State Space

Our belief model employs a shallow neural network for point prediction, together with a bootstrap ensemble for uncertainty estimation.
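As a rough illustration of how a bootstrap ensemble yields both a point prediction and an uncertainty, the sketch below resamples training indices with replacement and summarizes member predictions by their mean and their spread. It assumes the member predictions are already available as plain lists; training the shallow networks is out of scope, and the helper names `bootstrap_indices` and `ensemble_belief` are hypothetical. The spread uses the population standard deviation (division by the ensemble size), which is one common choice.

```python
import random
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Draw one bootstrap resample (with replacement) of training-set indices."""
    return rng.choices(range(n_samples), k=n_samples)


def ensemble_belief(
    member_preds: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Summarize an ensemble; member_preds[m][i] is member m's prediction at location i.

    Returns the per-location mean and the per-location population standard
    deviation, the latter serving as a disagreement-based uncertainty.
    """
    per_location = list(zip(*member_preds))
    mu = [fmean(p) for p in per_location]
    sigma = [pstdev(p) for p in per_location]
    return mu, sigma


# Three hypothetical ensemble members predicting at three candidate locations.
preds = [
    [1.0, 2.0, 0.5],
    [1.0, 4.0, 0.7],
    [1.0, 3.0, 0.6],
]
mu, sigma = ensemble_belief(preds)
resample = bootstrap_indices(98, random.Random(0))
print(mu, sigma)
```

Locations where the members agree receive zero spread, while locations where they disagree receive a larger value, which is what lets a sampling policy prioritize poorly constrained sites.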
Each network $f_m$ maps transformed features $z = \phi(x)$ to predictions, with the ensemble providing uncertainty through prediction disagreement:

$$\mu(z) = \frac{1}{M} \sum_{m=1}^{M} f_m(z), \qquad \sigma(z) = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \left( f_m(z) - \mu(z) \right)^2} \tag{5}$$

where $M$ denotes the ensemble size. The network architecture is kept shallow to prevent overfitting in HDLSS conditions. The state at decision step $t$ comprises predicted values $\mu \in \mathbb{R}^N$ and uncertainties $\sigma \in \mathbb{R}^N$ for candidate locations, along with a binary mask indicating previously visited sites. State dimensionality scales with the number of candidate locations $N$, independent of the raw observation dimension $d$. Actions correspond to selecting an unvisited candidate location. The reward function balances information acquisition, uncertainty reduction, and spatial coverage:

$$r_t = \alpha \cdot r_{\text{info}}(a_t) + \beta \cdot r_{\text{uncert}}(a_t) + \gamma \cdot r_{\text{spatial}}(a_t) \tag{6}$$

where $r_{\text{info}}$ is the negative absolute prediction error, $r_{\text{uncert}}$ is the epistemic uncertainty at the selected location, and $r_{\text{spatial}}$ is the minimum distance to previously selected stations; the component weights are determined through sensitivity analysis. Policy learning is performed using a deep Q-learning framework trained with uniform experience sampling [16]. Training proceeds through simulated episodes constructed from the belief model predictions, requiring no additional real-world samples beyond those used for belief model construction. This simulation-based training helps the model capture the physics-informed state representation.

III. EXPERIMENTS

A. Experimental Setup

We employ hyperspectral imagery from NASA's PACE Ocean Color Instrument (OCI) eight-day composite products paired with ground-truth measurements (cyanobacteria gene abundance) from western Lake Erie. Our training data comprise 98 station-days from 2024; testing uses 92 station-days from 2025. The eight monitoring stations span the western basin, and adaptive selection is performed under a fixed sampling budget of three stations per sampling event. For rigorous analysis, we implemented baselines that include random selection, spatially stratified sampling, greedy intensity-based selection, and an upper confidence bound (UCB) strategy based on belief mean and uncertainty [7].

Fig. 1. Physics-Informed Contextual Spectral Reinforcement Learning (PiCSRL) framework. Physics-informed bio-optical indices and sparse in-situ observations are integrated through a semi-supervised learning framework and an uncertainty-aware belief model. These uncertainty-aware predictions are then used in a reduced RL state representation to guide adaptive station selection under sampling constraints.

B. Representation Learning for HDLSS

We first analyzed the effect of physics-informed features on generalization using a semi-supervised learning approach. Our results demonstrate that raw spectral features achieve a higher training fit but substantially lower test generalization (test $R^2 = 0.41$) compared to physics-informed indices (test $R^2 = 0.52$). When in-sample performance exceeds out-of-sample performance in this way, it indicates that the model has learned spurious correlations specific to the training data rather than generalizable relationships. Our semi-supervised approach yields a modest but consistent improvement over the supervised baseline. The ensemble model achieves a test $R^2 = 0.517$ compared to the teacher's $R^2 = 0.516$, with the 613× expansion in training data (from 98 to 60,215 samples) providing marginal gains in generalization. The limited improvement suggests that hand-crafted physics-based indices already encode the bloom-relevant spectral signal effectively, with minimal additional information accessible through pseudo-label augmentation. This supports our hypothesis that physics-informed dimensionality reduction is the primary mechanism addressing HDLSS challenges, while SSL provides an incremental benefit by expanding coverage of the spectral feature space.

TABLE II
GENERALIZATION PERFORMANCE BY FEATURE REPRESENTATION

| Features | Dimension | Train $R^2$ | Test $R^2$ |
|---|---|---|---|
| Physics-Informed Indices | 10 | 0.47 | 0.52 |
| Raw Spectral Bands | 117 | 0.52 | 0.41 |
| Combined | 127 | 0.54 | 0.49 |

C. Adaptive Strategy Performance

We performed an adaptive sensing experiment with eight candidate stations and a selection budget of three, using exhaustive enumeration to identify the optimal station combination and to verify the learned policies directly. PiCSRL (Fig. 2) consistently selects the station subset with minimum reconstruction error, matching the exhaustive optimum (RMSE = 0.1527 ± 0.006) and thereby attaining optimal performance. In contrast, heuristic baselines exhibit substantially higher reconstruction error, including Greedy-Spatial (RMSE = 0.2098) and Greedy-Risk (RMSE = 0.1982), while random selection performs worst (RMSE = 0.2958).

Fig. 2. Adaptive sampling performance for selecting $K = 3$ stations from $N = 8$ candidates. Left: Lake-wide reconstruction error (RMSE; mean ± 1σ over 500 episodes), with the optimal exhaustive baseline shown as a dashed line. Right: Bloom detection rate (%), with the 95% operational target and the perfect (100%) reference indicated.

Uncertainty-aware (UCB) selection improves on the purely heuristic algorithms, reducing reconstruction error (RMSE = 0.178 ± 0.011) and increasing bloom detection accuracy. However, the proposed PiCSRL framework achieves the best overall performance, simultaneously minimizing reconstruction error and maximizing detection rate (98.4%).
In addition, the computational comparison further favors the proposed approach: PiCSRL inference requires only a forward pass through a trained deep Q-network, while exhaustive search must evaluate every candidate station combination, which becomes infeasible for large-scale networks. This efficiency favors deploying RL-based adaptive sensing at large spatial scales while preserving near-optimal performance.

D. Scalability Analysis

To evaluate model performance for large-scale deployment, we constructed a 50-station adaptive sensing scenario in which spatially distributed virtual stations yield over two million possible selection combinations. PiCSRL (Fig. 3) maintains its performance advantage at this scale, achieving the highest bloom detection rate (88.5%) and the largest cumulative reward (6.97). In comparison, Greedy-Risk achieves 84.3% detection, while uncertainty-based UCB selection reaches 81.3%. Random selection performs worst, with only a 9.3% detection rate, reflecting the difficulty of the task at scale. Our statistical test confirms that the improvement achieved by PiCSRL over Greedy-Risk is significant (p = 0.002), while the UCB baseline also differs significantly from the reference (p = 0.042). These results further substantiate that the proposed approach scales effectively to large candidate sets.

IV. DISCUSSION

Our physics-informed RL provides mechanistic representations that overcome HDLSS data constraints without explicit reliance on regularization. We note that, in the ablation experiments, higher-dimensional features achieve better training performance but do not perform well on the test dataset. This is evidence that the model overfitted and learned spurious correlations.
In addition, hyperspectral sensors such as PACE have contiguous narrow wavelength bands; while this enhances environmental modeling by capturing precise spectral signatures, it also induces multicollinearity when target features span multiple bands. The pretrained embeddings encode physics-based representations for the downstream task, so the RL agent does not need to rediscover them from raw data. This substantially reduces complexity, which leads to faster learning and better decisions. In our experiments, we compare against a UCB baseline ($\mu + \beta\sigma$) that combines ensemble predictions with uncertainty bonuses. DQN consistently outperforms this heuristic with statistical significance. Unlike Gaussian process methods, which would require cubic computational scaling and suffer from kernel degradation in high dimensions, our bootstrap ensemble provides computationally efficient uncertainty estimates while operating in the physics-informed feature space. This architectural choice, learning policies in a reduced belief space rather than the raw observation space, enables RL to succeed in HDLSS conditions where traditional kernel-based methods face fundamental limitations.

Fig. 3. Performance comparison of adaptive sampling strategies. Left: Detection accuracy for comparison methods and the proposed PiCSRL method. Right: Corresponding cumulative reward achieved by each strategy. PiCSRL attains the highest detection accuracy and cumulative reward, with statistically significant improvement over baseline methods ($p = 0.002$).

Limitations and Future Work: Our current approach requires domain experts to specify physics-informed features, and it is limited to aquatic monitoring. Future work involves multi-objective extensions using more datasets and comparing RL performance.

V. CONCLUSION

In this work, we presented a framework based on an uncertainty-aware, multi-objective RL model.
As a representative example, we formulated monitoring as a sequential decision problem for adaptive environmental monitoring, which establishes physics-informed representations in high-dimensional, low-sample-size (HDLSS) contexts. For cyanobacterial gene prediction, our model uses domain-informed hyperspectral bands that enable sample-efficient learning over large regions. Our experimental analysis demonstrates that RL consistently identifies near-optimal sampling locations and outperforms various competing methods. Because environmental systems are highly interconnected and require sequential decision-making for long-term benefits, our work lays the foundation for leveraging RS-based data to move beyond sparse and static observation strategies.

VI. ACKNOWLEDGMENT

Fig. 1 was created with BioRender [17].

REFERENCES

[1] D. Pecioski, V. Gavriloski, S. Domazetovska, and A. Ignjatovska, "An overview of reinforcement learning techniques," in Proc. 12th Mediterranean Conf. Embedded Computing (MECO), Budva, Montenegro, 2023, pp. 1–4.
[2] C. Chadebec, E. Thibeau-Sutre, N. Burgos, and S. Allassonnière, "Data augmentation in high dimensional low sample size setting using a geometry-based variational autoencoder," IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 3, pp. 2879–2896, Mar. 2023.
[3] A. Nuriddinov, E. Ahmadisharaf, and M. R. Alizadeh, "High Resolution Flood Extent Detection Using Deep Learning with Random Forest Derived Training Labels," arXiv preprint, 2026. Available: https://arxiv.org/abs/2603.22518
[4] S. U. Imtiaz, M. Nasr Azadani, and N. Alamdari, "SimCLR-enabled wide and deep learning for cyanobacterial bloom prediction from NASA's PACE hyperspectral mission," IEEE Geosci. Remote Sens. Lett., vol. 22, pp. 1–5, 2025, Art. no. 1504905.
[5] S. H. Rabby, X. Sun, A. M. I. Hafiz, Z. Yan, S. U. Imtiaz, M. Nasr Azadani, M. Pakdehi, A. S. Moumouni, E. Ahmadisharaf, and N. Alamdari, "Application of machine learning methods in water quality modeling," in Machine Learning and Artificial Intelligence in Toxicology and Environmental Health, Z. Lin and W.-C. Chou, Eds. Academic Press, 2026, pp. 271–309.
[6] P. F. Lermusiaux, "Adaptive modeling, adaptive data assimilation and adaptive sampling," Physica D, vol. 230, pp. 172–196, 2007.
[7] N. Srinivas et al., "Gaussian process optimization in the bandit setting: No regret and experimental design," in Proc. ICML, pp. 1015–1022, 2010.
[8] D. Russo and B. Van Roy, "Learning to optimize via information-directed sampling," Oper. Res., vol. 66, no. 1, pp. 230–252, 2018.
[9] A. Krause et al., "Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies," J. Mach. Learn. Res., vol. 9, pp. 235–284, 2008.
[10] D. Tuia et al., "Domain adaptation for the classification of remote sensing data," IEEE Geosci. Remote Sens. Mag., vol. 4, no. 2, pp. 41–57, 2016.
[11] M. Nasr Azadani, S. U. Imtiaz, and N. Alamdari, "Role of impoundment and irrigation in intensive agriculture watersheds," J. Hydrol., vol. 662, pt. C, 2025, Art. no. 134075.
[12] M. A. Salou, S. U. Imtiaz, M. Nasr Azadani, and N. Alamdari, "Near real-time and next-day prediction for Escherichia coli (E. coli) concentrations in highly urbanized watersheds," Water Res., vol. 290, 2026, Art. no. 125030.
[13] N. Alamdari, Z. Yan, M. Nasr Azadani, and S. U. Imtiaz, "Algal blooms," in Data-Driven Earth Observation for Disaster Management, X. Huang, S. Wang, K. Kalogeropoulos, and A. Tsatsaris, Eds. Elsevier, 2026, pp. 183–205.
[14] S. U. Imtiaz, M. Nasr Azadani, and N. Alamdari, "SpecTM: Spectral Targeted Masking for Trustworthy Foundation Models," arXiv preprint, 2026. Available: https://arxiv.org/abs/2603.22097
[15] C. Hu, "A novel ocean color index to detect floating algae in the global oceans," Remote Sens. Environ., vol. 113, no. 10, pp. 2118–2129, 2009.
[16] Z. Wang et al., "Dueling network architectures for deep reinforcement learning," in Proc. ICML, pp. 1995–2003, 2016.
[17] Created in BioRender. Imtiaz, S. U. (2026) https://BioRender.com/q746gxk