Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control


Authors: Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
E-mail: malri039@uottawa.ca

Abstract—Stock markets exhibit regime-dependent behavior where prediction models optimized for stable conditions often fail during volatile periods. Existing approaches typically treat all market states uniformly or require manual regime labeling, which is expensive and quickly becomes stale as market dynamics evolve. This paper introduces an adaptive prediction framework that identifies deviations from normal market conditions and routes data through specialized prediction pathways. The architecture consists of three components: (1) an autoencoder trained on normal market conditions that identifies anomalous regimes through reconstruction error, (2) dual node transformer networks specialized for stable and event-driven market conditions respectively, and (3) a Soft Actor-Critic reinforcement learning controller that adaptively tunes the regime detection threshold and pathway blending weights based on prediction performance feedback. The reinforcement learning component enables the system to learn adaptive regime boundaries, defining anomalies as market states where standard prediction approaches fail. Experiments on 20 S&P 500 stocks spanning 1982 to 2025 demonstrate that the proposed framework achieves 0.68% MAPE for one-day predictions without the reinforcement controller and 0.59% MAPE with the full adaptive system, compared to 0.80% for the baseline integrated node transformer. Directional accuracy reaches 72% with the complete framework.
The system maintains robust performance during high-volatility periods, with MAPE below 0.85% when baseline models exceed 1.5%. Ablation studies confirm that each component contributes meaningfully: autoencoder routing accounts for 36% relative MAPE degradation upon removal, followed by the SAC controller at 15% and the dual-path architecture at 7%.

Index Terms—Stock price forecasting, autoencoder, regime detection, node transformer, reinforcement learning, Soft Actor-Critic, adaptive systems, deep learning.

I. INTRODUCTION

Financial markets operate across distinct regimes characterized by different statistical properties, volatility levels, and correlation structures [1]. During stable periods, price movements follow relatively predictable patterns driven by fundamental factors and gradual information incorporation. Crisis periods, earnings announcements, and macroeconomic shocks induce abrupt shifts in market behavior where historical patterns provide limited guidance. Models trained on aggregate historical data often perform well on average but degrade under volatile or event-driven conditions, where robust prediction is especially important.

Prior work on stock prediction has often treated market conditions as homogeneous. Graph neural networks capture cross-sectional dependencies [2], transformers model temporal dynamics [3], and sentiment analysis incorporates qualitative signals [4]. Our previous work demonstrated that combining node transformer architectures with BERT (Bidirectional Encoder Representations from Transformers) sentiment analysis achieves 0.80% mean absolute percentage error (MAPE) and 65% directional accuracy (DA) on S&P 500 stocks [5]. Yet this integrated model applies the same processing regardless of market conditions, leaving potential gains from regime-aware specialization unexploited.

The challenge of regime detection compounds prediction difficulties.
Traditional approaches rely on hidden Markov models [1] or threshold rules on volatility indicators [6], both requiring manual specification of regime definitions. Supervised classifiers demand labeled training data identifying which historical periods constitute crises or anomalies. Such labels are subjective, backward-looking, and fail to generalize as market structure evolves. A system that automatically discovers regime boundaries from prediction performance itself would avoid these limitations.

This paper introduces an adaptive framework addressing both challenges. An autoencoder trained on normal market data learns to reconstruct typical price patterns; high reconstruction error indicates departure from learned normality. This weakly supervised anomaly score gates data flow through dual node transformer pathways: one optimized for stable conditions, another incorporating event-specific features for turbulent periods. A Soft Actor-Critic (SAC) reinforcement learning controller observes prediction outcomes and adjusts the autoencoder threshold and pathway blending weights to maximize forecasting accuracy. The SAC component adapts the anomaly-routing threshold by discovering which threshold settings improve downstream predictions.

The contributions of this work are:

1) An autoencoder-based regime detection mechanism that identifies market state shifts using weakly supervised anomaly detection trained on historically stable market periods. The autoencoder learns a compressed representation of normal market dynamics; deviations from this representation trigger event-aware processing.

2) A dual node transformer architecture with specialized pathways for stable and volatile market conditions. The event pathway incorporates additional features including volatility regime indicators, sentiment spikes, and event characterization signals.
3) A Soft Actor-Critic reinforcement learning controller that adaptively tunes the regime detection threshold and pathway blending based on realized prediction performance. This enables the system to learn adaptive regime definitions from prediction outcomes rather than relying on fully hand-labeled regime annotations.

4) Experimental validation demonstrating a 26% MAPE reduction over the baseline integrated node transformer (0.59% vs 0.80%) and a 7 percentage point improvement in directional accuracy (72% vs 65%).

Section II reviews related work on regime detection, adaptive prediction, and reinforcement learning for financial applications. Section III presents the proposed architecture. Section IV reports experimental results, and Section V discusses findings, limitations, and implications.

II. LITERATURE REVIEW

A. Regime Detection in Financial Markets

Market regime identification has a long history in econometrics and quantitative finance. Hamilton [1] introduced Markov-switching models that probabilistically transition between states with distinct statistical properties. These models estimate regime-specific parameters (means, variances, transition probabilities) via maximum likelihood, enabling classification of historical periods into regimes. Extensions incorporate time-varying transition probabilities [7] and multivariate dependencies.

Threshold models offer an alternative where regime switches occur when observable variables cross specified boundaries. The Self-Exciting Threshold Autoregressive (SETAR) model [8] switches dynamics based on lagged values of the series itself. In finance, volatility indices such as VIX (CBOE Volatility Index) commonly serve as regime indicators, with thresholds separating low, medium, and high volatility states.
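As a concrete illustration of this style of threshold rule, the short sketch below labels days by training-period VIX terciles (the same tercile scheme this paper later uses for the event pathway). The function name and toy values are illustrative, not from the paper.

```python
import numpy as np

def vix_regimes(vix_train, vix_live):
    """Label each observation low/medium/high by training-period VIX terciles.

    Thresholds are estimated on the training window only, so the labels
    applied to later data carry no look-ahead information.
    """
    lo, hi = np.percentile(vix_train, [100 / 3, 200 / 3])  # tercile boundaries
    return np.select([vix_live < lo, vix_live < hi], ["low", "medium"],
                     default="high")

# Toy example with made-up VIX values.
train = np.array([12.0, 14.0, 15.0, 16.0, 18.0, 22.0])
live = np.array([13.0, 16.0, 35.0])
print(vix_regimes(train, live))
```

A Markov-switching or SETAR model would infer such boundaries from the series dynamics instead of fixing them from an exogenous indicator.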
Machine learning approaches to regime detection include clustering methods that partition historical periods based on feature similarity [9], hidden Markov models with neural network emission distributions, and change-point detection algorithms [10]. These methods generally require either explicit labels or assumptions about the number and nature of regimes. The framework proposed in this paper does not entirely avoid such assumptions, as the primary routing is binary and the event pathway conditions on three VIX-based volatility levels. This design nonetheless requires fewer structural commitments than methods that must specify the number, boundaries, and statistical properties of multiple regime states. The autoencoder learns to distinguish normal from anomalous market conditions through reconstruction error without requiring explicit regime definitions, and the SAC controller continuously adapts the routing threshold based on prediction feedback rather than relying on fixed, manually chosen boundaries. The regime structure is therefore partially discovered from data rather than imposed entirely by the modeler.

B. Autoencoders for Anomaly Detection

Autoencoders learn compressed representations by reconstructing inputs through an information bottleneck [11]. The encoder maps inputs to a lower-dimensional latent space, and the decoder reconstructs the original input from this representation. When trained on normal data, autoencoders reconstruct typical patterns with low error; anomalous inputs that deviate from the training distribution yield higher reconstruction error, providing an anomaly score.

Variational autoencoders (VAEs) extend this framework by imposing distributional constraints on the latent space [12]. The VAE objective combines reconstruction loss with a regularization term encouraging the latent distribution to match a prior (typically standard Gaussian).
This probabilistic formulation enables principled uncertainty quantification and generation of novel samples.

In financial applications, autoencoders have been applied to fraud detection [13], anomaly identification in trading patterns [14], and feature extraction for downstream prediction tasks [15]. Liu et al. [16] employed autoencoder-based feature extraction combined with bidirectional LSTM (Long Short-Term Memory) for stock price prediction, reporting improved performance from the learned representations.

C. Graph Neural Networks for Stock Prediction

Graph neural networks (GNNs) model relational structure among entities through message passing over graph topology [17]. In stock prediction, nodes represent individual securities while edges capture relationships including sectoral affiliation, supply chain connections, or return correlations. Chen et al. [2] proposed a graph convolutional feature-based CNN combining graph convolutions with dual convolutional networks for market-level and stock-level features. Wang et al. [18] introduced multi-graph architectures defining both static (sector) and dynamic (correlation) graphs, achieving 5.11% error reduction over LSTM baselines on Chinese market indices.

The node transformer architecture [19] extends transformers to graph-structured data through attention mechanisms that respect graph topology. Unlike standard graph neural networks with fixed message-passing schemes, node transformers learn contextualized representations through adaptive attention over graph neighborhoods.

D. Reinforcement Learning in Finance

Reinforcement learning (RL) optimizes sequential decision-making through interaction with an environment, learning policies that maximize cumulative reward [20]. Financial applications include portfolio optimization [21], order execution [22], and trading strategy development [23]. Deep RL algorithms combine neural network function approximation with RL principles.
Deep Q-Networks (DQN) learn action-value functions for discrete action spaces [24], while policy gradient methods directly optimize policies for continuous action spaces. Actor-critic algorithms unify both approaches by combining value estimation (critic) with policy optimization (actor) for improved stability and sample efficiency.

Soft Actor-Critic (SAC) [25] incorporates entropy regularization into the actor-critic framework, encouraging exploration while maintaining policy stability. By adding policy entropy to the reward, the maximum entropy objective prevents premature convergence to deterministic policies, and SAC performs well across continuous control tasks with delayed, noisy reward signals [25], [26].

Here, SAC serves not as a trading agent but as a meta-controller that learns to configure the prediction system itself. The controller adjusts the autoencoder threshold and pathway blending weights based on observed prediction performance, effectively learning what regime definitions optimize downstream forecasting accuracy.

E. Research Positioning

Prior work has addressed regime detection and stock prediction as separate problems. Regime-switching models identify market states but do not adapt prediction methods accordingly [1], [27], [28], while stock prediction models treat all conditions uniformly or rely on hand-crafted regime indicators [2], [29]. Our framework integrates these elements by pairing unsupervised regime detection (autoencoder) with specialized prediction pathways (dual node transformers) and a SAC controller that learns optimal regime definitions from prediction outcomes. This closed-loop architecture enables the system to discover useful regime boundaries rather than imposing them a priori.

III. METHODOLOGY

A. System Overview

Figure 1 presents the complete system architecture.
Raw market data flows through feature engineering to produce technical indicators and normalized price features. The autoencoder processes these features, producing a reconstruction error score that quantifies deviation from normal market patterns and serves as the basis for regime classification. Based on this score and a learned threshold, the router directs data to either the normal or event node transformer pathway depending on the detected regime. Both pathways produce predictions that are blended according to learned weights. The final prediction is evaluated, and the SAC controller uses this feedback to adjust the autoencoder threshold and blending parameters for subsequent iterations.

Fig. 1. System architecture overview. Market features x_t enter the autoencoder, which produces reconstruction error e_t. The router directs data to the normal or event node transformer pathway based on whether e_t exceeds the learned threshold τ. Each pathway produces a prediction (y^N_{t+h}, y^E_{t+h}), and adaptive blending combines them into the final forecast ŷ_{t+h}. The SAC controller observes evaluation metrics (RMSE, DA) and adjusts both τ and α to optimize forecasting accuracy.

The term regime in this framework operates at two distinct levels. At the primary level, the autoencoder performs a binary classification of each trading day as either normal or anomalous based on whether its reconstruction error exceeds a learned threshold τ. This binary decision determines routing: days classified as normal are processed by the normal node transformer, while anomalous days are directed to the event node transformer.
A binary primary classification is chosen rather than a multi-class scheme for both practical and theoretical reasons. The autoencoder's reconstruction error is a scalar anomaly score that naturally lends itself to thresholding rather than clustering into multiple categories, and the fundamental distinction in anomaly detection is between in-distribution and out-of-distribution inputs. Attempting to subdivide anomalous states at the routing stage would require assumptions about the number and nature of anomaly categories that the unsupervised autoencoder is not designed to make; instead, that finer-grained characterization is deferred to the event pathway itself.

Within the event pathway, a secondary level of regime characterization captures the heterogeneity of anomalous periods. The event context vector c_t provides the event node transformer with descriptive features including a VIX-based volatility classification into three levels (low, medium, or high, determined by training-period terciles), sentiment spike indicators, earnings event proximity, and cross-asset stress measures. This secondary characterization does not constitute a separate routing mechanism; rather, it conditions the event transformer's internal representations by supplying information about the nature of the detected anomaly. An earnings-driven disruption during a period of otherwise low market volatility produces different price dynamics than a systemic sell-off during an already elevated volatility regime, and the context vector enables the transformer to learn these distinctions from the data. The normal transformer does not receive this additional context because it processes in-distribution samples where market dynamics follow the stable patterns learned during the autoencoder's training phase, making regime-specific conditioning unnecessary.

B. Feature Engineering

Input features follow established methodologies for financial time series.
For each stock i at time t, the raw feature vector comprises:

x^{raw}_{i,t} = [O_t, H_t, L_t, C_t, V_t]    (1)

where O_t, H_t, L_t, C_t denote open, high, low, and closing prices, and V_t is trading volume (collectively referred to as OHLCV data). Technical indicators include simple moving averages (SMA) at 5, 10, and 20-day windows, exponential moving averages (EMA) at matching windows, 14-day Relative Strength Index (RSI), Moving Average Convergence Divergence (MACD) with standard (12, 26, 9) parameters, daily returns, log returns, and 20-day rolling volatility. Figure 2 illustrates this pipeline.

Fig. 2. Feature engineering pipeline. Raw OHLCV data is processed through technical indicator computations (SMA, EMA, RSI, MACD, volatility). All features undergo expanding-window z-score normalization to prevent look-ahead bias, producing prediction features x_{i,t} ∈ R^17 and router-specific features x^{router}_{i,t} ∈ R^6.

In addition to prediction features, router-specific features capture regime-relevant signals:

x^{router}_{i,t} = [σ^{(5)}_t, σ^{(20)}_t, ΔVIX_t, ρ^Δ_t, |S_t|, ν^{post}_t]    (2)

where σ^{(k)}_t is k-day rolling volatility, ΔVIX_t is the VIX percentage change, ρ^Δ_t is the change in average pairwise correlation among stocks, |S_t| is absolute sentiment magnitude, and ν^{post}_t is post velocity (the count of X posts mentioning crisis-related keywords within the trading day). Sentiment enters the router as an absolute value because the router's function is anomaly detection rather than directional prediction.
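The indicator set described above can be sketched in pandas as follows; the column names and helper function are illustrative assumptions, not the authors' code.

```python
import numpy as np
import pandas as pd

def technical_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the Section III-B indicator set from an OHLCV frame."""
    out = pd.DataFrame(index=df.index)
    close = df["Close"]
    for w in (5, 10, 20):
        out[f"sma_{w}"] = close.rolling(w).mean()
        out[f"ema_{w}"] = close.ewm(span=w, adjust=False).mean()
    # 14-day RSI via smoothed average gains and losses.
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False).mean()
    out["rsi_14"] = 100 - 100 / (1 + gain / loss)
    # MACD with standard (12, 26, 9) parameters.
    ema12 = close.ewm(span=12, adjust=False).mean()
    ema26 = close.ewm(span=26, adjust=False).mean()
    out["macd"] = ema12 - ema26
    out["macd_signal"] = out["macd"].ewm(span=9, adjust=False).mean()
    out["ret"] = close.pct_change()
    out["log_ret"] = np.log(close).diff()
    out["vol_20"] = out["ret"].rolling(20).std()
    return out
```

Note that rolling and EMA features are undefined for the first few observations of each series, which is one reason imputation and normalization (below) are handled carefully.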
For the purpose of identifying unusual market conditions, the magnitude of sentiment deviation is the relevant signal: both strongly negative sentiment (indicating panic) and strongly positive sentiment (indicating euphoria or speculative excess) represent departures from typical market behavior. The signed sentiment score S_t is retained in the full prediction feature vector x_{i,t} that reaches the node transformers, so directional information contributes to the price forecasts themselves even though the routing decision depends only on sentiment intensity. These six router features are chosen to capture both gradual shifts (rolling volatility, correlation changes) and abrupt events (VIX spikes, sentiment surges, social media clustering).

Missing values in price data (due to trading halts or data gaps) are handled through temporally aware imputation. For training data, short gaps (1-2 trading days) use linear interpolation between surrounding known values. For validation and test data, only forward-filling from the most recent observed value is applied to ensure no future information leaks into predictions. Technical indicators (SMA, EMA, RSI, MACD) are computed only after imputation, using the forward-filled values.

Normalization uses expanding-window z-scores to prevent look-ahead bias. During training, each feature is standardized using the mean and standard deviation computed over all available data from the start of the training period up to time t:

\tilde{x}_{i,t} = (x_{i,t} − μ_{1:t}) / σ_{1:t}    (3)

where μ_{1:t} and σ_{1:t} are the cumulative mean and standard deviation from the first training observation through time t. This expanding window ensures that normalization at each time step uses only past information. During validation and testing, normalization statistics are fixed at the full training-period values (μ_{1:T_train} and σ_{1:T_train}), ensuring that no information from the evaluation period influences standardization.
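A minimal sketch of the Eq. (3) expanding-window normalization, assuming pandas DataFrames of features; the function and variable names are illustrative.

```python
import pandas as pd

def expanding_zscore(train: pd.DataFrame, eval_: pd.DataFrame):
    """Expanding-window z-scores as in Eq. (3).

    Each training row t is standardized with statistics over rows 1..t only;
    evaluation rows reuse the statistics frozen over the full training period.
    """
    mu = train.expanding().mean()
    sigma = train.expanding().std()          # sample std; first row is NaN
    train_z = (train - mu) / sigma
    # Statistics frozen at the end of training for validation/test data.
    eval_z = (eval_ - train.mean()) / train.std()
    return train_z, eval_z
```

Because the expanding standard deviation is undefined at the very first observation, the initial row (or a short burn-in window) is typically dropped before training.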
C. Autoencoder for Regime Detection

The autoencoder learns a compressed representation of normal market dynamics. It is trained exclusively on data from stable market periods, defined during training as days where VIX remains below the 75th percentile of its training-period distribution. Figure 3 presents the detailed architecture.

Fig. 3. Autoencoder architecture for regime detection. The encoder compresses the input feature vector through two hidden layers (64, 32 units) to a latent representation z_t of dimension d_z = 32. The decoder reconstructs the input through symmetric layers. Reconstruction error e_t serves as the anomaly score for regime classification.

1) Architecture and Training: The encoder maps the concatenated feature vector to a latent representation through two hidden layers:

z_t = f_enc(x_t) = ReLU(W_2 · ReLU(W_1 x_t + b_1) + b_2)    (4)

where x_t ∈ R^{d_in} is the input feature vector, z_t ∈ R^{d_z} is the latent representation with d_z = 32, and W_1 ∈ R^{64×d_in}, W_2 ∈ R^{32×64} are weight matrices. The decoder reconstructs the input through a symmetric architecture:

\hat{x}_t = f_dec(z_t) = W_4 · ReLU(W_3 z_t + b_3) + b_4    (5)

where W_3 ∈ R^{64×32} and W_4 ∈ R^{d_in×64}. The autoencoder is trained to minimize reconstruction loss over the stable-period data:

L_AE = (1/T) Σ_{t=1}^{T} ||x_t − \hat{x}_t||²_2    (6)

Training uses the Adam optimizer with learning rate 10^{-3} and batch size 64, for a maximum of 20 epochs with early stopping based on validation reconstruction loss.

2) Anomaly Score and Routing: At inference time, the reconstruction error serves as an anomaly score e_t = ||x_t − \hat{x}_t||²_2. Data points with e_t exceeding threshold τ are classified as anomalous and routed to the event pathway, while those below τ proceed through the normal pathway.
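The Eq. (4)-(6) architecture and a single training step can be sketched in PyTorch as follows. Layer sizes follow the text; the class name, placeholder batch, and training loop are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RegimeAutoencoder(nn.Module):
    """Sketch of the Eq. (4)-(5) autoencoder with the paper's layer sizes."""

    def __init__(self, d_in: int, d_z: int = 32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(d_in, 64), nn.ReLU(),
            nn.Linear(64, d_z), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.Linear(d_z, 64), nn.ReLU(),
            nn.Linear(64, d_in),  # final layer is linear, as in Eq. (5)
        )

    def forward(self, x):
        return self.dec(self.enc(x))

    def anomaly_score(self, x):
        # e_t = ||x_t - x_hat_t||_2^2, the per-sample reconstruction error.
        with torch.no_grad():
            return ((x - self(x)) ** 2).sum(dim=-1)

# One training step on stable-period features (Adam, lr 1e-3, batch 64).
model = RegimeAutoencoder(d_in=17)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randn(64, 17)  # placeholder for normalized feature vectors
loss = ((batch - model(batch)) ** 2).sum(dim=-1).mean()  # Eq. (6)
opt.zero_grad(); loss.backward(); opt.step()
```

After training, the 95th percentile of `anomaly_score` over the training set provides the initial routing threshold τ described below.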
The threshold τ is initialized at the 95th percentile of training-set reconstruction errors and subsequently adjusted by the SAC controller.

D. Dual Node Transformer Architecture

Two node transformer networks process data depending on regime classification. Both follow the same base architectural design (6 layers, 8 attention heads, 512 model dimension) but maintain independent weights trained on different data subsets, and the event pathway accepts a larger input due to additional context features. Figure 4 illustrates the dual pathway structure.

Fig. 4. Dual node transformer architecture. The router directs data based on reconstruction error. The normal pathway processes typical market conditions with base features x_{i,t}. The event pathway augments inputs with event context features c_t. Both pathways follow the same architectural design (layer count, attention heads, model dimension) but maintain independently trained weights and differ in input dimensionality, as the event pathway accepts additional context features. Outputs are blended with adaptive weight α.

1) Normal Node Transformer: The normal pathway processes typical market conditions using the node transformer architecture [19], which extends standard transformers to graph-structured data by incorporating relational inductive biases into the attention mechanism. The stock market is represented as a graph G = (V, E) with N = 20 stock nodes and a fully-connected edge set.
While graph neural networks are often applied to larger graphs, the N = 20 design balances cross-sectional breadth against temporal depth (252-day sequences per stock), and ablation results confirm that the graph structure contributes a 7% MAPE improvement (Table VII), indicating that cross-sectional dependencies carry predictive value even at this scale.

Each stock i receives a learned embedding s_i ∈ R^{d_s} that captures persistent stock-specific characteristics such as sector behavior and volatility profile. The input representation for stock i at time t concatenates the normalized feature vector with temporal encoding and the stock embedding:

h^{(0)}_{i,t} = [x_{i,t} ∥ TE(t) ∥ s_i] ∈ R^{d_in}    (7)

Temporal encoding follows Vaswani et al. [3], using sinusoidal positional encodings where TE(t, 2k) = sin(t / 10000^{2k/d}) and TE(t, 2k+1) = cos(t / 10000^{2k/d}) for dimension index k and model dimension d = 512. This encoding allows the model to distinguish trading days and capture periodic patterns at multiple frequencies.

Edge weights in the graph are initialized from sector relationships and return correlations computed strictly on training data (1982-2010):

e^{(0)}_{ij} = 0.5 · δ_sector(i, j) + 0.5 · max(0, ρ^{train}_{ij})    (8)

where δ_sector(i, j) = 1 if stocks i and j share the same sector classification and 0 otherwise, and ρ^{train}_{ij} is the Pearson correlation of daily returns computed over the training period only, preventing any leakage from validation or test data. During training, edge weights are refined through a learnable function e^{(ℓ+1)}_{ij} = σ(w^T_e [h^{(ℓ)}_i ∥ h^{(ℓ)}_j] + b_e), where σ is the sigmoid function and h^{(ℓ)}_i is the node representation at layer ℓ. This allows the model to discover relationship patterns not captured by initial sector and correlation priors. Figure 5 illustrates the resulting graph structure with representative edge weights.
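The Eq. (8) initialization can be sketched as follows; `init_edge_weights` and its inputs are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def init_edge_weights(returns: np.ndarray, sectors: list) -> np.ndarray:
    """Eq. (8): blend a sector-membership prior with clipped correlations.

    `returns` is a (T, N) array of training-period daily returns and
    `sectors` holds the sector label of each of the N stocks.
    """
    rho = np.corrcoef(returns, rowvar=False)        # Pearson rho over training only
    codes = np.unique(sectors, return_inverse=True)[1]
    same_sector = (codes[:, None] == codes[None, :]).astype(float)  # delta_sector
    return 0.5 * same_sector + 0.5 * np.maximum(0.0, rho)

# Toy example: three stocks, two sectors, one trading year of returns.
rng = np.random.default_rng(0)
rets = rng.normal(size=(252, 3))
E0 = init_edge_weights(rets, ["Tech", "Tech", "Energy"])
```

Negative correlations are clipped to zero, so an initial edge weight lies in [0, 1] and a same-sector pair always starts at 0.5 or above before the learnable refinement takes over.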
At each layer, the node transformer applies multi-head self-attention with causal masking to jointly process all stocks across the temporal dimension. The input representations are projected into queries, keys, and values through learned linear transformations Q = XW_Q, K = XW_K, V = XW_V, and the attention output is computed as:

A = softmax(QK^T / √d_k + M + E) V    (9)

where d_k = 64 is the key dimension, M is the causal mask with M_ab = −∞ if a < b and M_ab = 0 otherwise, and E ∈ R^{N×N} is the learned edge weight matrix. The additive graph bias allows content-based attention (via QK^T) and structural priors (via E) to jointly determine how information flows between stocks at each layer, while the causal mask ensures that predictions at time t use only information from times up to and including t. The architecture uses H = 8 attention heads, each operating in 64 dimensions, yielding a total model dimension of d_model = 512.

Fig. 5. Graph representation of stock relationships (representative subset of 11 stocks shown for clarity; full graph contains all 20 stocks). Nodes represent individual stocks, colored by sector (Technology, Financial, Healthcare, Energy, Consumer). Solid edges indicate same-sector connections with higher learned weights (annotated values show correlation-based initialization from training data). Dashed edges represent weaker cross-sector correlations that are learned during training.

Each transformer layer follows the standard pre-norm residual pattern. The multi-head attention output is added to the input through a residual connection, followed by layer normalization. The normalized output then passes through a position-wise feed-forward network consisting of two linear transformations with a ReLU activation, expanding the representation to d_ff = 2048 dimensions before projecting back to 512.
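A single-head sketch of Eq. (9), omitting the multi-head projections and the learned refinement of E; all names are illustrative.

```python
import numpy as np

def biased_causal_attention(Q, K, V, E):
    """Scaled dot-product attention with a causal mask M and additive bias E.

    Shapes: Q, K are (L, d_k), V is (L, d_v), E is (L, L). Each position a
    may attend only to positions b <= a (M_ab = -inf for a < b).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k) + E
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)  # future positions
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable row-wise softmax.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Because the first row has only its own position unmasked, its output reduces to the first value vector, which is a quick sanity check on the mask orientation.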
A second residual connection and layer normalization follow the feed-forward block. Dropout at rate 0.1 is applied after both the attention and feed-forward sublayers. The architecture stacks 6 such layers, with the output of the final layer fed into a prediction head consisting of a linear projection from the model dimension to a single scalar price prediction per stock. Figure 6 illustrates the detailed layer structure.

Fig. 6. Single transformer layer architecture. Input X^{(ℓ)} passes through multi-head self-attention (8 heads), a residual connection with layer normalization, a feed-forward network (512 → 2048 → 512), and another residual connection with normalization, with dropout 0.1 after each sublayer. The architecture stacks 6 such layers.

2) Event Node Transformer: The event pathway augments the base architecture with additional inputs capturing regime-specific information. The input to the event transformer concatenates the standard feature vector with an event context vector, x^{event}_{i,t} = [x_{i,t} ∥ c_t], where c_t ∈ R^{d_c} with d_c = 12. This vector comprises four groups of features. A learned regime embedding r_t ∈ R^4 maps the current VIX regime (low, medium, or high, determined by training-period VIX terciles) through a trainable embedding layer. A sentiment spike component s_t ∈ R^2 encodes a binary flag and scaled magnitude when daily sentiment exceeds two standard deviations of training-period sentiment. An event characterization component a_t ∈ R^4 captures proximity to scheduled earnings announcements (days-to-announcement, normalized), historical earnings surprise magnitude for the stock, a binary earnings-window indicator, and sector-average surprise.
Finally, a cross-asset stress vector ē^{cross}_t ∈ R^2 encodes the mean and standard deviation of reconstruction error across all 20 stocks at time t, distinguishing systemic anomalies (high mean error) from idiosyncratic ones (high variance). The full context vector is the concatenation c_t = [r_t ∥ s_t ∥ a_t ∥ ē^{cross}_t].

Architecturally, the event transformer differs from the normal pathway in two respects beyond its independently trained weights. First, the input projection layer is wider: while the normal transformer's first linear layer maps from d_in dimensions (the concatenation of market features, temporal encoding, and stock embedding), the event transformer maps from d_in + d_c dimensions to accommodate the appended context vector. This wider projection maps back to the shared model dimension of d_model = 512 before entering the first transformer layer, so all subsequent layers (the 6 transformer blocks, feed-forward networks, and prediction head) operate at the same dimensionality as the normal pathway. Second, the event pathway includes a trainable embedding layer that maps the discrete VIX regime label (one of three categories) to the continuous regime embedding r_t ∈ R^4. This embedding layer is an additional learnable component with no counterpart in the normal pathway, adding 3 × 4 = 12 trainable parameters for the three regime categories. The remaining context features (s_t, a_t, ē^{cross}_t) are continuous values that enter the context vector directly without additional learned transformations.

3) Pathway Blending: Rather than hard routing, predictions from both pathways are blended with an adaptive weight:

ŷ_{i,t+h} = α_t · y^{normal}_{i,t+h} + (1 − α_t) · y^{event}_{i,t+h}    (10)

The blending coefficient α_t ∈ [0, 1] is determined by the SAC controller based on the current market state. During high-confidence normal periods, α_t approaches 1; during clear anomalies, it approaches 0.
Intermediate values enable smooth transitions and hedge against misclassification.

E. Soft Actor-Critic Controller

The SAC controller learns to configure the prediction system by adjusting the autoencoder threshold τ and blending weight α based on observed prediction performance. Although these are only two scalar parameters, the optimization landscape is non-trivial: the reward signal is delayed (prediction errors are observed only after the threshold decision), noisy (financial returns are inherently stochastic), and non-stationary (optimal thresholds shift as market regimes evolve). SAC is well suited to this setting because its entropy regularization prevents premature convergence to fixed threshold values, and its off-policy learning with experience replay enables sample-efficient adaptation from sparse, delayed feedback. Simpler alternatives such as grid search or bandit methods typically assume stationary reward distributions [20] and cannot adapt continuously to shifting regime dynamics. Figure 7 presents the actor-critic network architecture.

Fig. 7. Soft Actor-Critic network architecture. The actor network maps the state s_t = [e_t, ē, σ_t, RMSE_{t−1}, DA_{t−1}, α_{t−1}, τ_{t−1}] through two FC(256) + ReLU layers to a Gaussian policy (µ_φ, log σ_φ) over actions a_t = [∆τ, ∆α]. Twin critic networks Q_{θ1}(s, a) and Q_{θ2}(s, a), each with two FC(256) + ReLU hidden layers, estimate Q-values; the minimum min(Q_{θ1}, Q_{θ2}) is used to prevent overestimation.

1) Markov Decision Process Formulation: The control problem is formulated as a Markov Decision Process (MDP).
The state s_t comprises:

s_t = [e_t, ē_{t−k:t}, σ_t, RMSE_{t−1}, DA_{t−1}, α_{t−1}, τ_{t−1}]    (11)

including the current reconstruction error, recent error history over the past k = 5 trading days (one week), volatility, previous prediction metrics, and current parameter settings. The action space consists of continuous adjustments a_t = [∆τ, ∆α] ∈ [−0.1, 0.1]^2 to the threshold and blending weight, clipped to maintain τ ∈ [e_min, e_max] and α ∈ [0, 1]. The reward signal combines prediction accuracy and stability:

r_t = −RMSE_t − λ_dir · (1 − DA_t) − λ_stable · |∆τ|    (12)

where λ_dir = 0.5 weights directional accuracy and λ_stable = 0.1 penalizes threshold instability to prevent oscillation.

2) SAC Algorithm: SAC maximizes the entropy-regularized objective:

J(π) = Σ_{t=0}^{T} E[r_t + α_ent · H(π(·|s_t))]    (13)

where H is the policy entropy and α_ent is the temperature parameter controlling exploration. The actor network π_φ(a|s) outputs a Gaussian distribution over actions, π_φ(a|s) = N(µ_φ(s), σ_φ(s)^2), while two critic networks Q_{θ1}, Q_{θ2} estimate action values. To prevent overestimation, the minimum of both critics is used:

Q(s, a) = min(Q_{θ1}(s, a), Q_{θ2}(s, a))    (14)

All networks are feed-forward with two hidden layers of 256 units each. Training uses the Adam optimizer with learning rate 3 × 10^−4, soft target updates with τ_soft = 0.005, and replay buffer size 10^5.

3) Training Protocol: The SAC controller is trained after the autoencoder and node transformers are pre-trained. Training begins by initializing τ at the 95th percentile of training reconstruction errors and α = 0.5. At each step, the controller computes predictions using the current τ and α, evaluates them against actual outcomes to obtain the reward signal, updates the SAC networks from collected transitions, and applies the learned action adjustments to both parameters.
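The reward of Eq. (12) and the bounded action update can be sketched as follows; λ_dir and λ_stable are the values stated above, while the remaining numbers and function names are illustrative:

```python
# Sketch of the SAC controller's reward (Eq. 12) and bounded action handling.
# lambda_dir and lambda_stable come from the text; other values are illustrative.

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def apply_action(tau, alpha, d_tau, d_alpha, e_min, e_max):
    """Actions are adjustments bounded to [-0.1, 0.1]^2, then the updated
    parameters are clipped to their valid ranges."""
    d_tau = clip(d_tau, -0.1, 0.1)
    d_alpha = clip(d_alpha, -0.1, 0.1)
    return clip(tau + d_tau, e_min, e_max), clip(alpha + d_alpha, 0.0, 1.0)

def reward(rmse, da, d_tau, lambda_dir=0.5, lambda_stable=0.1):
    """Eq. (12): accuracy term minus directional and stability penalties."""
    return -rmse - lambda_dir * (1.0 - da) - lambda_stable * abs(d_tau)

# d_tau=0.25 exceeds the action bound and is clipped to 0.1 before applying.
tau, alpha = apply_action(tau=0.9, alpha=0.5, d_tau=0.25, d_alpha=-0.05,
                          e_min=0.2, e_max=1.5)
r = reward(rmse=0.8, da=0.7, d_tau=0.1)
```

The stability penalty on |∆τ| is what discourages the oscillating-threshold behavior mentioned above, at the cost of slower regime adaptation.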
This loop runs for 50 epochs with 1000 steps per epoch. The temperature α_ent is automatically tuned to the target entropy −dim(a) following Haarnoja et al. [25].

F. Training Pipeline

Figure 8 illustrates the complete multi-stage training pipeline.

Fig. 8. Multi-stage training pipeline. Stage 1 trains the autoencoder (20 epochs) on stable market data. Stage 2 trains both node transformers (60 epochs) on their respective data subsets. Stage 3 trains the SAC controller (50 epochs) with frozen prediction components to learn τ and α. Stage 4 performs end-to-end fine-tuning (20 epochs) with all weights unfrozen.

The complete training pipeline proceeds in four stages. In Stage 1 (20 epochs), the autoencoder is trained on stable-period data where VIX falls below the 75th percentile of its training-period distribution. Stage 2 (60 epochs) trains both node transformers: the normal pathway on data with low reconstruction error (below the 95th percentile), and the event pathway on high-error data augmented with context features. In Stage 3 (50 epochs), autoencoder and node transformer weights are frozen while the SAC controller learns to optimize the threshold and blending parameters. Finally, Stage 4 (20 epochs) unfreezes all components for end-to-end fine-tuning with reduced learning rates. The architecture is modular by design: Stages 1 and 2 produce a fully functional prediction system in which the routing threshold and blending weight remain at their initialization values (τ at the 95th percentile of training-set reconstruction errors, α = 0.5). Stage 3 adds adaptive control on top of this static configuration, allowing the experimental evaluation to quantify the marginal contribution of the SAC controller by comparing the system with and without it. At inference time, all weights—including the SAC policy network—are frozen.
The policy produces state-dependent routing decisions through its fixed learned mapping, with no gradient updates or reward computation during the test period.

G. Loss Functions

The prediction networks minimize a composite loss:

L = λ_1 L_MSE + λ_2 L_DIR + λ_3 L_REG    (15)

where L_MSE = (1/N) Σ_{i,t,h} (y_{i,t+h} − ŷ_{i,t+h})^2 is the mean squared error between predicted and actual prices. The directional loss L_DIR is a binary cross-entropy term that explicitly rewards correct prediction of price movement direction, since minimizing magnitude error alone does not guarantee directional accuracy:

L_DIR = −(1/N) Σ_{i,t,h} [d_{i,t,h} log p_{i,t,h} + (1 − d_{i,t,h}) log(1 − p_{i,t,h})]    (16)

where d_{i,t,h} = I(y_{i,t+h} > y_{i,t}) is the true direction indicator and p_{i,t,h} is the predicted probability of a price increase. The regularization term L_REG = ∥θ∥²₂ applies L2 weight decay over all trainable parameters θ, penalizing large weight magnitudes to prevent overfitting. Loss weights are λ_1 = 1.0, λ_2 = 0.5, λ_3 = 10^−4. Table I summarizes all model hyperparameters across the three components.

TABLE I
MODEL HYPERPARAMETERS

Component             Parameter                       Value
Autoencoder           Hidden layers                   [64, 32]
                      Latent dimension                32
                      Learning rate                   10^−3
                      Training epochs                 20
Node Transformer      Layers                          6
                      Attention heads                 8
                      Model dimension                 512
                      FFN dimension                   2048
                      Dropout                         0.1
                      Learning rate                   10^−4
                      Input sequence length           252 days
SAC Controller        Hidden layers                   [256, 256]
                      Learning rate                   3 × 10^−4
                      Soft update τ                   0.005
                      Replay buffer                   10^5
                      Training epochs                 50
Stage 4 Fine-tuning   AE learning rate                10^−4
                      Node Transformer learning rate  10^−5
                      SAC learning rate               3 × 10^−5

IV. EXPERIMENTS AND RESULTS

A. Dataset and Experimental Setup

The dataset comprises two complementary data streams for 20 S&P 500 stocks spanning January 1982 to March 2025.
The Financial Market Data (FMD) stream consists of daily OHLCV (open, high, low, close, volume) price data sourced from Yahoo Finance, providing adjusted close prices that account for stock splits and dividends. Each trading day produces a five-dimensional price vector per stock alongside the trading volume, from which 11 additional technical indicators are derived (SMA, EMA, RSI, MACD, returns, log returns, and rolling volatility) as described in Section III. The sentiment stream draws on two datasets. The first is the Market Sentiment Evaluation (MSE) dataset [30], a publicly available corpus of finance-related social media messages annotated by financial experts with sentiment scores in [−1, +1], which serves as ground truth for fine-tuning the BERT sentiment classifier. The second is the Comprehensive Stock Sentiment (CSS) dataset, which was introduced in [5] and was constructed using the X (formerly Twitter) API through systematic searches for posts mentioning the 20 stock tickers, yielding approximately 4.2 million posts covering January 2007 to March 2025. The fine-tuned BERT model is applied to the CSS corpus to generate daily sentiment scores for each stock, which are then aggregated and fed into the prediction framework as additional input features.

Table II lists the complete stock universe. Stocks were selected to span nine distinct sectors, ensuring that the graph structure captures both intra-sector and cross-sector dependencies. The selection also prioritizes variation in market capitalization, trading volume, and volatility characteristics to evaluate robustness across different stock profiles. For companies with IPO dates after 1982 (e.g., Salesforce incorporated 1999, Netflix 2002, Visa 2008), data begins at their first available trading date, and these stocks are included in training only from their listing date onward.
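A few of the derived indicators named above (SMA, simple and log returns, rolling volatility) can be sketched as follows; window sizes and the toy price series are my own choices, as the text does not specify them here:

```python
# Sketch of a subset of the 11 derived technical indicators. Windows and
# prices are illustrative placeholders, not the paper's configuration.
import math
import statistics

def sma(prices, window):
    """Simple moving average over the trailing `window` prices."""
    return [statistics.fmean(prices[i - window + 1:i + 1])
            for i in range(window - 1, len(prices))]

def returns(prices):
    """Simple one-day percentage returns."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def log_returns(prices):
    """One-day log returns."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def rolling_vol(rets, window):
    """Rolling sample standard deviation of returns."""
    return [statistics.stdev(rets[i - window + 1:i + 1])
            for i in range(window - 1, len(rets))]

prices = [100.0, 102.0, 101.0, 103.0, 104.0]
r = returns(prices)            # 4 daily returns
lr = log_returns(prices)       # 4 daily log returns
s = sma(prices, window=3)      # 3 trailing averages
v = rolling_vol(r, window=3)   # 2 rolling volatility values
```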
TABLE II
STOCK UNIVERSE: 20 S&P 500 CONSTITUENTS ACROSS 9 SECTORS

Sector              Stock (Ticker)              Data Start
Technology          Apple (AAPL)                1982
                    Microsoft (MSFT)            1986
                    Salesforce (CRM)            1999
Financial Services  JPMorgan Chase (JPM)        1982
                    Visa (V)                    2008
Healthcare          Johnson & Johnson (JNJ)     1982
                    UnitedHealth Group (UNH)    1984
                    Pfizer (PFE)                1982
Retail              Walmart (WMT)               1982
                    Home Depot (HD)             1982
Energy              ExxonMobil (XOM)            1982
                    Chevron (CVX)               1982
Consumer Goods      Procter & Gamble (PG)       1982
                    Coca-Cola (KO)              1982
                    Nike (NKE)                  1982
                    McDonald's (MCD)            1982
Entertainment       Netflix (NFLX)              2002
Telecommunications  Verizon (VZ)                1982
Industrials         Boeing (BA)                 1982
                    Caterpillar (CAT)           1982

Temporal splits maintain strict chronological separation to prevent any leakage of future information into training. The training set spans January 1982 to December 2010 (approximately 70% of the temporal range), encompassing multiple market cycles including the 1987 crash, the dot-com bubble and its collapse, and the 2008 financial crisis. The validation set covers January 2011 to December 2016 (approximately 15%), a period of relatively steady recovery used for hyperparameter tuning and early stopping. The test set spans January 2017 to March 2025 (approximately 15%), including the 2018 correction, the 2020 COVID crash, and the 2022 market decline, which provide rigorous evaluation under diverse volatility conditions. Since the X platform (formerly Twitter) was founded in 2006, sentiment data covers January 2007 to March 2025. For the 1982–2006 portion of training, sentiment features are set to zero, meaning the model learns to operate with and without sentiment depending on the data period. During the validation and test periods, full sentiment coverage is available.

B. Evaluation Metrics

Model performance is assessed using five complementary metrics.
The primary metric is Mean Absolute Percentage Error (MAPE), defined as:

MAPE = (100/N) Σ_{i=1}^{N} |(y_i − ŷ_i) / y_i|    (17)

MAPE provides an intuitive interpretation as percentage deviation from actual prices. As a complementary measure, Root Mean Squared Error (RMSE) penalizes large errors more heavily due to the squaring operation and is computed in normalized price units, where each stock's prices are z-scored individually to enable fair cross-stock aggregation:

RMSE = sqrt((1/N) Σ_{i=1}^{N} (y_i − ŷ_i)^2)    (18)

Directional Accuracy (DA) measures the proportion of correctly predicted price movement directions, which is particularly relevant for trading applications where the sign of the predicted move often matters more than its magnitude:

DA = (100/N) Σ_{i=1}^{N} I(sign(ŷ_{i,t+h} − y_{i,t}) = sign(y_{i,t+h} − y_{i,t}))    (19)

Theil's U statistic provides a scale-independent benchmark by comparing forecast error to that of a naive random walk that predicts tomorrow's price as today's price:

U = sqrt(Σ_t (y_{t+1} − ŷ_{t+1})^2) / sqrt(Σ_t (y_{t+1} − y_t)^2)    (20)

Values of U < 1 indicate that the model outperforms the naive baseline, making this metric particularly informative for long time series where absolute price level changes can affect percentage-based measures. Finally, the Confidence Tracking Rate (CTR) captures the proportion of predictions where model confidence (measured as inverse prediction variance across the dual pathway outputs) agrees with actual accuracy:

CTR = (1/(NT)) Σ_{i,t} I((conf_{i,t} > c̄) = (|ŷ_{i,t+h} − y_{i,t+h}| < ε̄))    (21)

where conf_{i,t} is the inverse prediction variance from the two pathways, c̄ is the median confidence, and ε̄ is the median absolute error across all predictions. CTR indicates whether the model "knows when it knows," a property valuable for risk-sensitive downstream applications.
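The metrics of Eqs. (17)–(20) can be sketched directly; the toy series below is illustrative, and the zero-move handling in the directional-accuracy helper is my own simplification:

```python
# Sketch of MAPE, RMSE, directional accuracy, and Theil's U (Eqs. 17-20),
# applied to toy numbers rather than the paper's data.
import math

def mape(y, y_hat):
    return 100.0 / len(y) * sum(abs((a - p) / a) for a, p in zip(y, y_hat))

def rmse(y, y_hat):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))

def directional_accuracy(y_prev, y_true, y_pred):
    """Share of steps where predicted and actual moves have the same sign."""
    hits = sum((p - q) * (a - q) > 0
               for q, a, p in zip(y_prev, y_true, y_pred))
    return 100.0 * hits / len(y_true)

def theils_u(y, y_hat):
    """Forecast error relative to a naive random walk; U < 1 beats the walk."""
    num = math.sqrt(sum((a - p) ** 2 for a, p in zip(y[1:], y_hat[1:])))
    den = math.sqrt(sum((a - b) ** 2 for a, b in zip(y[1:], y[:-1])))
    return num / den

y_true = [100.0, 102.0, 101.0, 103.0]
y_pred = [100.0, 101.5, 101.5, 102.5]
```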
C. Baseline Models

Baselines span statistical methods (ARIMA, VAR, MS-VAR [31]), classical machine learning (Random Forest, SVR, XGBoost), deep learning (LSTM, Simple Transformer), multimodal and regime-switching approaches (BERT Sentiment + LSTM, HMM-LSTM), recent time-series transformers (TimesNet [32], PatchTST [33], iTransformer [34]), and the Integrated NodeFormer-BERT model from prior work [5].

To ensure fair comparison, all baselines share the same experimental conditions wherever the model class permits. Every model uses identical temporal splits (training: 1982–2010, validation: 2011–2016, test: 2017–2025), the same expanding-window z-score normalization described in Section III, and the same missing-data imputation strategy. Models capable of multivariate input (Random Forest, SVR, XGBoost, LSTM, Simple Transformer, BERT Sentiment + LSTM, TimesNet, PatchTST, iTransformer, HMM-LSTM, Integrated NodeFormer-BERT) receive the same 17-dimensional feature vector comprising OHLCV data and 11 derived technical indicators. Daily sentiment scores produced by the fine-tuned BERT classifier are appended to the feature set for all multivariate models, so that any advantage from sentiment information is available to baselines as well as to the proposed framework. ARIMA operates on the univariate closing price series for each stock independently, VAR jointly models the closing prices of all 20 stocks, and MS-VAR extends the VAR specification with Markov-switching regime dynamics over the same joint price series, since these statistical methods are not designed to incorporate arbitrary exogenous feature vectors. The target variable for all models is identical: the closing price at horizon h ∈ {1, 5, 20} trading days ahead.
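The expanding-window z-score normalization referenced above can be sketched as follows; the minimum-window guard and variable names are my own assumptions, since the text only specifies that statistics must use no future data:

```python
# Sketch of expanding-window z-score normalization: each observation is scaled
# using only statistics computed from data up to and including that point, so
# no future information leaks into the transform. min_window is illustrative.
import statistics

def expanding_zscore(series, min_window=3):
    out = []
    for i in range(len(series)):
        history = series[:i + 1]          # past and present only
        if len(history) < min_window:
            out.append(0.0)               # not enough history yet
            continue
        mu = statistics.fmean(history)
        sd = statistics.pstdev(history)
        out.append((series[i] - mu) / sd if sd > 0 else 0.0)
    return out

z = expanding_zscore([10.0, 11.0, 12.0, 13.0, 9.0])
```

Because each value is normalized against its own trailing window, the same function can be applied identically to every model's input features without breaking the chronological split.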
The Simple Transformer baseline uses the same encoder architecture as the proposed node transformer (6 layers, 8 attention heads, 512-dimensional representations) but processes each stock's time series independently without graph structure or inter-stock attention, isolating the contribution of graph-based relational modeling. The BERT Sentiment + LSTM baseline combines the same BERT-derived sentiment scores with a two-layer LSTM through concatenation-based fusion, testing whether the attention-based integration in the proposed architecture provides meaningful improvement over straightforward feature combination. The Integrated NodeFormer-BERT model reproduces our prior work [5] with its published hyperparameters, serving as the primary single-pathway baseline against which architectural additions are measured.

PatchTST [33] segments each stock's multivariate input time series into overlapping patches and applies a transformer encoder with self-attention over the patch sequence, capturing local temporal patterns within patches and long-range dependencies across them. Its channel-independent design processes each feature dimension separately before aggregating predictions, which limits its ability to model cross-feature interactions. iTransformer [34] inverts the conventional transformer architecture by applying self-attention across the variate (feature) dimension rather than the temporal dimension, enabling it to capture dependencies among price, volume, technical indicators, and sentiment features at each time step. TimesNet [32] extends temporal modeling by transforming one-dimensional time series into two-dimensional tensors based on learned multi-periodicity structure, applying inception-based convolution blocks to capture both intra-period and inter-period variation.
All three recent time-series transformers process each stock's feature set independently without graph structure or cross-stock attention, isolating the contribution of relational modeling in the proposed framework. The Markov-Switching VAR (MS-VAR) [31] extends the VAR baseline with K = 3 latent regime states governed by a first-order Markov chain, allowing the intercepts and error covariance to vary across regimes while the autoregressive coefficients remain regime-invariant (MSIH specification). Regime transitions are inferred through maximum likelihood estimation of the full joint model, in contrast to our autoencoder-based approach, which detects anomalies from reconstruction error without specifying the number or parametric structure of regimes a priori. The HMM-LSTM baseline combines a Hidden Markov Model with K = 3 states for regime detection with three regime-specific two-layer LSTMs, each trained on data assigned to its corresponding regime by the Viterbi decoder. At inference, the HMM identifies the most likely current regime and routes the input to the corresponding LSTM, producing a regime-conditional forecast. This architecture provides the most direct comparison to our framework: it replaces the autoencoder with an HMM for regime detection, the node transformers with LSTMs for prediction, and omits adaptive control entirely, using fixed routing with no blending across pathways. Each baseline underwent hyperparameter tuning via grid search on validation data, with the search ranges and selected values reported in Table X (Appendix).

D. Main Results

Results are reported for two variants of the proposed framework. The full model (AE-NodeFormer + SAC) includes all components and completes all four training stages. The ablated variant (AE-NodeFormer, no SAC) retains the autoencoder and dual node transformers but removes the reinforcement learning controller entirely, skipping Stage 3 of the training pipeline.
In this variant, the routing threshold is fixed at τ = e_95, the 95th percentile of training-set reconstruction errors, which is the same initialization used by the full model before SAC adaptation begins. This percentile is a standard choice in anomaly detection, classifying the top 5% of reconstruction errors as anomalous. The blending weight is held constant at α = 0.5, assigning equal contribution to both pathways regardless of market conditions. Comparing the two variants isolates the contribution of adaptive parameter tuning from the architectural benefits of autoencoder routing and dual-pathway specialization. Table III presents 1-day ahead closing price prediction results across all baselines and proposed variants.

TABLE III
1-DAY AHEAD CLOSING PRICE PREDICTION RESULTS. BEST RESULTS IN BOLD.

Model                       MAPE    RMSE   DA    Theil's U   CTR
ARIMA [35]                  1.20%   1.35   55%   0.98        51%
VAR [36]                    1.10%   1.30   56%   0.95        52%
MS-VAR [31]                 1.02%   1.22   57%   0.90        53%
Random Forest [37]          1.10%   1.25   57%   0.92        53%
SVR [38]                    1.20%   1.40   54%   1.02        50%
XGBoost [39]                1.00%   1.15   59%   0.85        55%
LSTM [40]                   1.00%   1.20   58%   0.88        54%
Simple Transformer [3]      0.90%   1.10   61%   0.80        57%
BERT Sent. + LSTM [4]       0.90%   1.05   62%   0.78        58%
HMM-LSTM [1]                0.87%   1.02   64%   0.76        60%
TimesNet [32]               0.85%   1.00   63%   0.75        59%
PatchTST [33]               0.83%   0.98   64%   0.74        59%
iTransformer [34]           0.82%   0.97   64%   0.73        61%
Integrated NF-BERT [5]      0.80%   0.95   65%   0.72        62%
AE-NodeFormer (no SAC)      0.68%   0.88   69%   0.68        64%
AE-NodeFormer + SAC         0.59%   0.82   72%   0.64        67%

The proposed AE-NodeFormer + SAC achieves 0.59% MAPE, representing a 26% relative improvement over the Integrated NodeFormer-BERT baseline (0.80%) and a 28% improvement over iTransformer (0.82%), the strongest recent time-series transformer. Directional accuracy reaches 72%, a 7 percentage point gain over the graph-based baseline.
Among regime-switching approaches, HMM-LSTM achieves 0.87% MAPE, outperforming the basic LSTM (1.00%) by 13% through regime-specific specialization, yet still trailing the proposed model by 32%, indicating that the combination of autoencoder-based anomaly detection, graph-aware dual pathways, and adaptive control provides substantially greater benefit than parametric regime detection with independent LSTMs. The recent time-series transformers (TimesNet 0.85%, PatchTST 0.83%, iTransformer 0.82%) cluster near the Integrated NodeFormer-BERT (0.80%), confirming that the prior single-pathway architecture was already competitive with current state-of-the-art forecasting models and that the improvements in the present work stem from the regime-aware architectural innovations rather than from a weak baseline. All pairwise improvements of the proposed model over iTransformer, PatchTST, and HMM-LSTM are statistically significant (Diebold-Mariano test [41], p < 0.001 in each case).

To contextualize the directional accuracy, we computed a naive long-only baseline: predicting "up" for every day. Over the 2017–2025 test period, this naive strategy achieves 54% DA on average across the 20 stocks, reflecting the slight upward drift in equity markets. The 72% DA of the proposed model thus represents an 18 percentage point improvement over this trivial baseline, confirming that the model captures predictive signal beyond simple market drift.

To assess generalization across forecasting horizons, Table IV and Table V present 5-day and 20-day ahead closing price results.
TABLE IV
5-DAY AHEAD CLOSING PRICE PREDICTION RESULTS

Model                       MAPE    RMSE   DA    Theil's U   CTR
ARIMA [35]                  2.05%   2.30   51%   1.05        47%
VAR [36]                    1.88%   2.10   52%   1.00        48%
MS-VAR [31]                 1.70%   1.90   53%   0.95        49%
Random Forest [37]          1.92%   2.15   52%   1.02        48%
SVR [38]                    2.10%   2.35   50%   1.08        46%
XGBoost [39]                1.68%   1.88   54%   0.92        50%
LSTM [40]                   1.65%   1.85   54%   0.93        50%
Simple Transformer [3]      1.50%   1.68   56%   0.85        53%
BERT Sent. + LSTM [4]       1.48%   1.65   57%   0.83        54%
HMM-LSTM [1]                1.45%   1.60   59%   0.82        55%
TimesNet [32]               1.40%   1.55   58%   0.81        56%
PatchTST [33]               1.38%   1.52   59%   0.79        55%
iTransformer [34]           1.35%   1.50   59%   0.80        57%
Integrated NF-BERT [5]      1.30%   1.45   61%   0.78        58%
AE-NodeFormer (no SAC)      1.15%   1.32   64%   0.74        60%
AE-NodeFormer + SAC         1.05%   1.25   67%   0.70        63%

TABLE V
20-DAY AHEAD CLOSING PRICE PREDICTION RESULTS

Model                       MAPE    RMSE   DA    Theil's U   CTR
ARIMA [35]                  3.10%   3.45   48%   1.12        44%
VAR [36]                    2.85%   3.20   49%   1.05        45%
MS-VAR [31]                 2.55%   2.85   50%   0.98        46%
Random Forest [37]          2.90%   3.25   49%   1.08        45%
SVR [38]                    3.20%   3.55   47%   1.15        43%
XGBoost [39]                2.60%   2.90   51%   0.98        47%
LSTM [40]                   2.50%   2.80   51%   0.96        47%
Simple Transformer [3]      2.25%   2.52   53%   0.90        50%
BERT Sent. + LSTM [4]       2.20%   2.45   54%   0.88        51%
HMM-LSTM [1]                2.12%   2.35   55%   0.87        52%
TimesNet [32]               2.08%   2.30   56%   0.85        52%
PatchTST [33]               2.05%   2.28   55%   0.84        53%
iTransformer [34]           2.00%   2.22   56%   0.86        53%
Integrated NF-BERT [5]      1.90%   2.15   57%   0.85        54%
AE-NodeFormer (no SAC)      1.70%   2.00   60%   0.82        56%
AE-NodeFormer + SAC         1.55%   1.85   63%   0.78        59%

Performance improvements persist across all prediction horizons. At the 5-day horizon, the proposed model achieves 1.05% MAPE compared to 1.30% for the Integrated NodeFormer-BERT and 1.35% for iTransformer, maintaining a 19% and 22% relative advantage respectively.
At 20 days, these gaps widen further: the proposed model reaches 1.55% MAPE versus 1.90% for the graph-based baseline and 2.00% for iTransformer, reflecting the increasing value of regime-aware routing as the prediction horizon extends and structural regime shifts become more consequential. Several statistical and classical ML baselines (ARIMA, VAR, Random Forest, SVR) produce Theil's U values exceeding 1.0 at the 20-day horizon, indicating that they underperform the naive random walk at longer horizons—a well-known limitation of models without explicit temporal or regime-adaptive structure. In contrast, all transformer-based and regime-switching models maintain Theil's U below 1.0 across all horizons. Directional accuracy for the proposed model declines from 72% at 1 day to 63% at 20 days, a more gradual degradation than iTransformer (64% to 56%) or HMM-LSTM (64% to 55%), suggesting that the combination of autoencoder routing and SAC adaptation captures structural signals that remain informative beyond short-term momentum.

E. Per-Stock Results

To examine cross-sectional variation, Table VI presents 1-day ahead closing price results for all 20 stocks in the universe, grouped by sector.
TABLE VI
PER-STOCK 1-DAY AHEAD CLOSING PRICE RESULTS (AE-NODEFORMER + SAC)

Sector              Stock   MAPE    RMSE   DA    Theil's U
Technology          AAPL    0.62%   0.88   69%   0.68
                    MSFT    0.50%   0.72   74%   0.60
                    CRM     0.63%   0.89   70%   0.67
Financial Services  JPM     0.70%   1.02   67%   0.72
                    V       0.48%   0.69   76%   0.58
Healthcare          JNJ     0.44%   0.64   77%   0.55
                    UNH     0.52%   0.74   73%   0.61
                    PFE     0.58%   0.80   72%   0.64
Retail              WMT     0.42%   0.62   78%   0.54
                    HD      0.48%   0.68   75%   0.59
Energy              XOM     1.10%   1.38   63%   0.82
                    CVX     0.95%   1.22   64%   0.79
Consumer Goods      PG      0.43%   0.62   77%   0.55
                    KO      0.44%   0.65   76%   0.56
                    NKE     0.56%   0.73   72%   0.63
                    MCD     0.45%   0.65   77%   0.56
Entertainment       NFLX    0.75%   1.05   66%   0.75
Telecommunications  VZ      0.47%   0.67   75%   0.58
Industrials         BA      0.68%   0.94   68%   0.71
                    CAT     0.60%   0.81   71%   0.66
Mean                        0.59%   0.82   72%   0.64

Individual stock performance spans from 0.42% MAPE (WMT) to 1.10% (XOM), with 16 of 20 stocks falling below 0.70%. The error distribution aligns with established differences in equity predictability. Defensive stocks with stable revenue profiles—WMT (0.42%), PG (0.43%), JNJ (0.44%), KO (0.44%), and MCD (0.45%)—cluster at the low end regardless of sector classification, suggesting that the predictability advantage stems from fundamental business stability rather than broad sectoral factors. Energy stocks occupy the highest error positions, with both XOM (1.10%) and CVX (0.95%) exhibiting MAPE values roughly double the universe mean, consistent with the dominant influence of exogenous commodity price movements that the autoencoder's feature-based reconstruction cannot fully anticipate. Within sectors, meaningful variation persists: in healthcare, JNJ (0.44%) substantially outperforms PFE (0.58%), plausibly reflecting Pfizer's heightened pipeline-driven volatility during the test period; in technology, MSFT (0.50%) outperforms both AAPL (0.62%) and CRM (0.63%), consistent with differences in revenue diversification and product-cycle exposure.
Financial services exhibit a similar spread, where Visa's stable payment-processing model yields considerably lower error (0.48%) than JPMorgan's sensitivity to interest rate and credit dynamics (0.70%). Although the sample of one to four stocks per sector does not support formal statistical claims about sectoral predictability, the consistency of observed patterns—both energy stocks at the top of the error distribution, five defensive consumer and healthcare names clustered near the bottom—suggests that stock-level characteristics such as earnings stability, commodity exposure, and idiosyncratic volatility interact with the regime detection mechanism in interpretable ways. Theil's U remains below 1.0 for all 20 stocks without exception, confirming that the model outperforms the naive random-walk baseline across the full predictability spectrum. Directional accuracy ranges from 63% (XOM) to 78% (WMT), with every stock exceeding the 54% naive long-only baseline reported in the main results, indicating that the regime-aware routing mechanism provides meaningful predictive signal even for the most volatile equities in the universe.

F. Ablation Study

To quantify the contribution of each architectural component, Table VII reports results from systematically removing one component at a time, with all other components held constant or adapted to the reduced architecture as described below. The No SAC configuration removes the reinforcement learning controller entirely, skipping Stage 3 of the training pipeline. The autoencoder and dual node transformers retain their Stage 1 and Stage 2 trained weights. The routing threshold is fixed at τ = e_95, the 95th percentile of training-set reconstruction errors, and the blending weight is held at α = 0.5, assigning equal contribution to both pathways regardless of market conditions.
This variant isolates the benefit of adaptive parameter tuning from the architectural contributions of regime-aware routing and pathway specialization. The No Dual Paths configuration replaces the two specialized node transformers with a single node transformer that processes all data regardless of regime classification. The single pathway retains the same architectural hyperparameters as each individual pathway in the full model (6 layers, 8 attention heads, 512 model dimension), so that any performance difference reflects the architectural choice of pathway specialization rather than a difference in model capacity. The autoencoder is retained, and its reconstruction error e_t together with a binary regime indicator (determined by threshold τ) are concatenated to the single pathway's input feature vector, providing regime context without architectural separation. The SAC controller remains active but operates over a reduced action space: it adjusts only τ to optimize the anomaly detection threshold, since the blending weight α is undefined when a single pathway produces the output. This variant quantifies the value of allocating independent representational capacity to normal and anomalous conditions, as opposed to conditioning a shared pathway on a regime signal. The No AE configuration removes the autoencoder, which cascades into removing dual-pathway routing and the SAC controller, since both depend on the reconstruction error that the autoencoder produces. The resulting system is a single node transformer processing the standard feature set augmented with BERT sentiment scores, architecturally equivalent to the Integrated NodeFormer-BERT baseline from prior work [5]. This configuration serves as the reference point from which the incremental contributions of regime-aware routing, pathway specialization, and adaptive control are jointly measured.
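The static routing configuration shared by the No SAC ablation (and by the full model at initialization) can be sketched as follows; the percentile helper and the stand-in error values are illustrative, not the paper's implementation:

```python
# Sketch of the static routing parameters used by the No SAC variant:
# tau fixed at the 95th percentile of training reconstruction errors,
# alpha fixed at 0.5. Error values below are illustrative stand-ins.

def percentile(values, q):
    """Nearest-rank percentile on a sorted copy (q in [0, 100])."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(q / 100.0 * (len(ordered) - 1))))
    return ordered[rank]

train_errors = [0.1 + 0.01 * i for i in range(100)]  # stand-in recon errors
tau = percentile(train_errors, 95)                   # fixed threshold e_95
alpha = 0.5                                          # equal pathway blend

def route(recon_error, tau=tau):
    """Binary regime flag: True means the sample is treated as anomalous."""
    return recon_error > tau
```

With this fixed configuration roughly the top 5% of training-like reconstruction errors route to the event pathway, which is exactly the static behavior the SAC controller later learns to adapt.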
TABLE VII
ABLATION STUDY: 1-DAY MAPE

Configuration                           MAPE    ∆ vs Full
Full Model (AE + Dual NF + SAC)         0.59%   –
No SAC (AE + Dual NF)                   0.68%   +15.3%
No Dual Paths (AE + Single NF + SAC)    0.63%   +6.8%
No AE (Single NF + BERT, baseline)      0.80%   +35.6%

Autoencoder routing contributes most substantially: its removal increases MAPE by 35.6% relative to the full model. This large degradation reflects the fact that the autoencoder provides the foundational regime signal upon which both pathway routing and adaptive control depend; its removal eliminates the entire regime-aware processing chain. The SAC controller contributes the next largest improvement at 15.3%, confirming that adaptive tuning of τ and α based on prediction feedback materially outperforms static initialization, particularly during regime transitions where the optimal threshold and blending weight shift over time. The dual-pathway architecture contributes a smaller but meaningful 6.8% improvement, indicating that allocating independent weights to normal and anomalous conditions yields better representations than conditioning a single pathway on a binary regime indicator, even when both configurations receive the same regime information from the autoencoder. All components provide statistically significant gains (p < 0.01 via paired t-tests across stock-day predictions).

G. Volatility Regime Analysis

To evaluate regime-specific performance, Table VIII disaggregates MAPE by VIX regime.

TABLE VIII
1-DAY MAPE BY VOLATILITY REGIME

Model                         Low VIX   Medium VIX   High VIX
iTransformer                  0.72%     0.92%        1.42%
HMM-LSTM                      0.78%     0.95%        1.35%
Integrated NodeFormer-BERT    0.70%     0.90%        1.50%
AE-NodeFormer (no SAC)        0.60%     0.75%        1.10%
AE-NodeFormer + SAC           0.52%     0.65%        0.85%

The regime-specific results reveal an instructive pattern.
During low-VIX periods, iTransformer (0.72%) slightly outperforms HMM-LSTM (0.78%), as the superior representational capacity of the transformer architecture dominates when market dynamics are stable and regime detection adds limited value. During high-VIX periods, this relationship reverses: HMM-LSTM (1.35%) outperforms iTransformer (1.42%) because its regime-specific LSTMs adapt to volatile conditions even though its overall architecture is less expressive. The proposed model outperforms both across all VIX levels, maintaining MAPE at 0.85% during high-volatility periods where iTransformer reaches 1.42% and the Integrated NodeFormer-BERT baseline reaches 1.50%. The 40% relative improvement over iTransformer in high-VIX conditions, compared to 28% overall, confirms that regime-aware processing is most valuable in precisely the conditions where accurate forecasts matter most for risk management.

H. SAC Controller Behavior

Because the threshold τ and blending weight α vary across the test period (Figure 9), it is important to specify precisely what occurs at inference time and to address the resulting implications for baseline comparability. After Stage 4 training completes, all model weights are frozen, including the SAC actor and critic networks. During testing, the actor network operates as a fixed deterministic function: given the current state s_t, it outputs [Δτ, Δα] through a single forward pass with no gradient computation, no reward evaluation, and no parameter update. The threshold and blending weight evolve across the test period because the inputs to this fixed function change (reconstruction errors rise during volatile markets, recent prediction metrics shift), not because the policy itself is modified.
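The frozen-policy behavior described above can be sketched as a fixed function of the state. The two-layer network, its random weights, the five-element state layout, and the ±0.05 clipping of the action deltas are illustrative assumptions made here, not the paper's actual actor architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen actor: a fixed two-layer network mapping the state
# s_t = [e_t, mean recent error, volatility, RMSE_{t-1}, DA_{t-1}]
# to bounded adjustments [delta_tau, delta_alpha].
W1, b1 = rng.normal(scale=0.5, size=(8, 5)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(2)

def frozen_policy(state):
    """Deterministic forward pass: no gradients, no updates."""
    h = np.tanh(W1 @ state + b1)
    return 0.05 * np.tanh(W2 @ h + b2)  # deltas bounded to +/-0.05

def step(tau, alpha, state):
    # Outputs vary across the test period only because the
    # state varies, never because the weights change.
    d_tau, d_alpha = frozen_policy(state)
    tau = float(np.clip(tau + d_tau, 0.1, 1.0))
    alpha = float(np.clip(alpha + d_alpha, 0.0, 1.0))
    return tau, alpha

calm_state = np.array([0.02, 0.02, 0.10, 0.5, 0.7])
stressed_state = np.array([0.40, 0.35, 0.45, 1.2, 0.5])
tau_c, _ = step(0.70, 0.5, calm_state)
tau_s, _ = step(0.70, 0.5, stressed_state)
```

Calling `frozen_policy` twice on the same state returns identical outputs, which is the sense in which the controller performs state-dependent adaptation without online learning.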
In this respect, the SAC policy at inference time is functionally equivalent to any other feedforward neural network applied to streaming data: its parameters are static, but its outputs depend on input features that vary over time.

This state-dependent inference behavior is not unique to the proposed framework. The HMM-LSTM baseline performs analogous adaptive routing at test time: at each step, the HMM's forward algorithm computes regime posterior probabilities using fixed transition and emission parameters learned during training, and these probabilities determine which regime-specific LSTM produces the forecast. The MS-VAR baseline similarly infers time-varying regime probabilities through its fixed Markov-switching parameters, adjusting intercepts and error covariance accordingly. In all three cases (the proposed SAC policy, the HMM, and the Markov-switching model), a frozen statistical or neural model produces time-varying routing decisions from fixed parameters applied to changing inputs. The proposed framework thus does not enjoy an online learning advantage relative to the regime-switching baselines; rather, the three approaches represent alternative designs for the same underlying capability of state-dependent inference-time adaptation.

A separate question is whether state-dependent routing confers an advantage over the purely static baselines (ARIMA, Random Forest, LSTM, Simple Transformer, and others) that apply fixed parameters uniformly across all market conditions. It does, and this advantage is by design: the central thesis of this work is that regime-aware processing improves prediction quality. Crucially, the ablation study (Table VII) demonstrates that this advantage does not depend on the SAC controller. The AE-NodeFormer variant without SAC uses entirely static routing parameters (τ = e_95, α = 0.5) and achieves 0.68% MAPE, which already outperforms every baseline including iTransformer (0.82%) and the Integrated NodeFormer-BERT (0.80%). The architectural contributions of autoencoder-based regime detection and dual-pathway specialization account for the majority of the improvement, with the SAC controller providing an additional 15.3% relative gain through its state-dependent refinement of routing decisions. The comparison between the static No SAC variant and the baselines is therefore on equal footing with respect to inference-time adaptation.

Regarding the state features themselves, the previous-day prediction metrics (RMSE_{t-1}, DA_{t-1}) included in the SAC state vector are computed by comparing the model's day-(t-1) forecast with the realized closing price, which is publicly available at market open on day t. This introduces no information leakage: any practitioner would know whether yesterday's prediction was accurate before making today's forecast. The remaining state features (current reconstruction error e_t, recent error history ē_{t-k:t}, and market volatility σ_t) are computed entirely from the model's own outputs and observable market data, with no access to future prices.

Figure 9 illustrates the threshold trajectory produced by the frozen policy. During stable periods, the policy maps low reconstruction errors to higher thresholds, routing most data through the normal pathway. When volatility increases and reconstruction errors rise, the same fixed policy maps these elevated states to lower thresholds, directing more data to the event pathway. The blending weight α follows a complementary pattern, reducing normal pathway contribution during detected anomalies.

Fig. 9. Threshold τ produced by the frozen SAC policy over the test period.
Dips correspond to volatile periods (COVID crash around index 35, 2022 market decline around index 75) where elevated reconstruction errors cause the fixed policy to output lower threshold values, routing more data through the event pathway. The dashed red line shows the static threshold (τ = e_95) used in the No SAC ablation variant. All policy weights are frozen after training; the trajectory reflects state-dependent outputs from a fixed function, not online learning.

The threshold trajectory reveals interpretable behavior. During the pre-COVID period (2017-2019), the frozen policy maps the prevailing low reconstruction errors to a relatively high threshold (τ ≈ 0.70-0.75), routing the majority of data through the normal pathway. As the COVID crash unfolds in early 2020, the spike in reconstruction errors causes the policy to output large negative Δτ adjustments, dropping the threshold sharply to approximately 0.35 and activating the event pathway for most stocks. Recovery is gradual: as reconstruction errors slowly normalize, the policy produces small positive adjustments that return the threshold to pre-crisis levels over several months rather than snapping back, reflecting the state-to-action mapping learned during training between persistently elevated errors and cautious threshold recovery. A similar but less severe pattern occurs during the 2022 market decline. Importantly, the policy's mapping was learned entirely from training-period data; no crisis labels, test-period supervision, or weight updates inform the threshold trajectory shown in the figure. The frozen policy generalizes its learned associations between reconstruction error patterns and routing decisions to market events it has never encountered.

I. Statistical Significance

To confirm that the observed improvements are not attributable to chance, Table IX presents paired t-test results comparing daily squared errors.
TABLE IX
STATISTICAL SIGNIFICANCE (n = 1,580 TEST DAYS)

Comparison                          t-statistic   p-value    Cohen's d
AE-NF+SAC vs Integrated NF-BERT     -5.82         < 0.0001   0.46
AE-NF+SAC vs AE-NF (no SAC)         -3.45         0.0006     0.27
AE-NF+SAC vs LSTM                   -7.21         < 0.0001   0.57

All comparisons achieve p < 0.001, with effect sizes (Cohen's d) ranging from 0.27 to 0.57, indicating medium practical significance. The largest effect size (0.57) appears in the comparison against LSTM, which lacks both graph structure and regime awareness. The smallest effect size (0.27) is between the full model and the non-adaptive AE-NodeFormer variant, consistent with the SAC controller's contribution being a refinement over an already strong base architecture rather than a wholesale improvement.

V. DISCUSSION

A. Interpretation of Results

Regime-aware prediction with adaptive control outperforms homogeneous approaches across all metrics. The 26% MAPE improvement over the baseline integrated model (0.59% vs 0.80%) reflects gains from three sources: regime detection (autoencoder), specialized processing (dual node transformers), and adaptive parameter tuning (SAC controller).

Without requiring explicit per-sample anomaly labels, the autoencoder identifies market states that deviate from normal patterns. High reconstruction errors coincide with periods of elevated volatility, earnings announcements, and macroeconomic shocks, allowing the system to detect anomalies as deviations from its learned representation of typical behavior. The SAC controller learns to adjust the detection threshold based on prediction outcomes, maintaining a higher threshold during stable periods to prevent unnecessary routing to the event pathway and lowering it during genuine regime shifts to engage event-aware processing. This adaptive behavior emerges from the reward signal rather than hand-crafted rules.
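One way such a prediction-performance reward could be realized is sketched below. The specific weighting of percentage error and directional accuracy, and the small penalty on pathway switching, are illustrative assumptions introduced here, not the paper's exact reward design.

```python
import numpy as np

def reward(y_true, y_pred, prev_route, cur_route,
           w_err=1.0, w_dir=0.5, w_switch=0.05):
    """Reward the controller for accurate, direction-correct
    forecasts of returns; lightly penalize thrashing between
    pathways. All weights here are illustrative.
    """
    ape = abs(y_pred - y_true) / abs(y_true)          # percentage error
    direction_ok = float(np.sign(y_pred) == np.sign(y_true))
    switch_cost = w_switch * float(prev_route != cur_route)
    return -w_err * ape + w_dir * direction_ok - switch_cost

# An accurate, direction-correct forecast earns a higher reward
# than an inaccurate, direction-wrong one.
good = reward(y_true=1.0, y_pred=1.01, prev_route=0, cur_route=0)
bad = reward(y_true=1.0, y_pred=-0.5, prev_route=0, cur_route=1)
```

A signal of this shape pushes the policy toward higher thresholds when routing to the event pathway yields no error reduction, and toward lower thresholds when it does, without any hand-crafted regime rule.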
The dual-path architecture enables specialization: the normal pathway develops representations optimized for stable conditions where fundamental factors dominate, while the event pathway incorporates additional context (sentiment spikes, volatility regimes, event characterization) that proves informative during turbulent periods but might add noise during normal conditions.

B. Economic Interpretation

While the performance improvements carry practical implications, they should be interpreted cautiously. A 7 percentage point improvement in directional accuracy (72% vs 65%) carries practical value for trading decisions, and the 43% relative improvement during high-volatility periods is particularly valuable since these are precisely the conditions where accurate forecasts matter most for risk management. Transaction costs, market impact, and execution constraints would, however, reduce realized gains from any trading strategy based on these predictions. The economic significance analysis in prior work [5] demonstrated that even the baseline model's predictions, when combined with simple trading rules, generate returns exceeding buy-and-hold benchmarks before costs. The improvements documented here should amplify these returns proportionally, though the gap would narrow after incorporating realistic trading frictions.

C. Limitations

The most consequential limitation concerns data selection. The 20 stocks used in this study are drawn from the current S&P 500 universe, which introduces survivorship bias: companies that failed, were acquired, or were delisted between 1982 and 2025 are absent from the dataset. Because the selected stocks are disproportionately successful over the evaluation period, predictability estimates may be inflated relative to what a real-time investor would experience when choosing from the full market.
Performance on a point-in-time universe constructed from historical index constituents could differ, and the reported results should therefore be interpreted as evidence of architectural capability rather than guaranteed trading performance.

Several data-related concerns compound this issue. Edge weights in the graph are initialized from correlations computed over the training period (1982-2010), but market correlations are non-stationary, and relationships that held during this window may weaken or reverse by the test period (2017-2025). The learnable edge refinement mechanism partially compensates by adapting weights during training, though fundamental correlation regime shifts could still affect generalization. Sentiment data from X (formerly Twitter) is available only from 2007 onward; for the 1982-2006 portion of training, sentiment features are set to zero. This means the sentiment modality effectively "turns on" partway through training, potentially limiting the model's ability to learn robust sentiment-price relationships from the earlier decades.

On the modeling side, the reinforcement learning controller requires careful tuning of reward weights, temperature, and network architecture, and suboptimal configurations could lead to unstable threshold behavior or poor convergence. The system also learns regime boundaries from its own prediction performance, which risks feedback loops where poor initial predictions lead to suboptimal threshold learning; the staged training protocol mitigates this by pre-training each component before SAC optimization, but the circularity is not fully eliminated. The three-component architecture increases computational requirements, with training time approximately 33% longer than the baseline model, which may constrain real-time deployment.
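The correlation-based edge initialization discussed in the limitations above can be sketched as follows. The use of daily log returns, the pruning of weak correlations, and the 0.3 cutoff are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def init_edge_weights(prices, min_abs_corr=0.3):
    """Initialize graph edge weights from training-window return
    correlations. prices: (T, N) array of daily closes for N stocks.

    Weak correlations are pruned to zero and self-edges removed;
    these weights would later be refined by a learnable edge
    mechanism, which is what partially offsets non-stationarity.
    """
    log_returns = np.diff(np.log(prices), axis=0)  # (T-1, N)
    corr = np.corrcoef(log_returns, rowvar=False)  # (N, N)
    weights = np.where(np.abs(corr) >= min_abs_corr, corr, 0.0)
    np.fill_diagonal(weights, 0.0)                 # no self-loops
    return weights

# Toy example: two tightly coupled random walks and one
# independent one.
rng = np.random.default_rng(42)
base = rng.normal(size=500)
steps = np.column_stack([base,
                         base + 0.1 * rng.normal(size=500),
                         rng.normal(size=500)]) * 0.01
prices = 100.0 * np.exp(np.cumsum(steps, axis=0))
W = init_edge_weights(prices)
```

Because the correlation matrix is estimated once over the training window, any weight computed this way encodes 1982-2010 relationships; nothing in the initialization itself tracks the 2017-2025 test-period structure.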
Although the SAC policy weights are frozen at inference time, the state-dependent routing decisions produced by the fixed policy constitute a form of adaptive processing that purely static baselines (ARIMA, Random Forest, LSTM) do not possess. The regime-switching baselines (HMM-LSTM, MS-VAR) share this property, and the ablation study confirms that the static No SAC variant already outperforms all baselines, but the additional gain from state-dependent threshold tuning should be interpreted as an architectural advantage of the framework rather than an improvement attributable solely to prediction accuracy.

From an economic standpoint, the analysis excludes transaction costs, market impact, and execution constraints. For strategies involving frequent rebalancing, these frictions would reduce realized returns. Backtesting is performed on the same 20-stock universe used for model development, leaving external validity on unseen stocks untested. Finally, sentiment data was collected via the X (formerly Twitter) API, which has undergone significant access policy changes, making exact replication of the sentiment component challenging under current API limitations.

D. Future Directions

Several extensions could address the limitations identified above. Expanding the stock universe to include historical index constituents would mitigate survivorship bias. Automated SAC hyperparameter tuning via meta-learning could reduce configuration sensitivity. More efficient architectures would enable real-time deployment, and strictly online regime detection without training-period VIX percentiles would eliminate any residual look-ahead.

Beyond addressing limitations, incorporating multiple autoencoders for different anomaly types (e.g., separating liquidity crises from earnings shocks) could refine regime classification.
Extending the framework to portfolio optimization, where regime-aware allocation could improve risk-adjusted returns, represents a natural application. Transfer learning to adapt the system to new markets or asset classes without full retraining would broaden practical applicability.

VI. CONCLUSION

This paper introduced an adaptive framework for stock price prediction that automatically detects market regimes and adjusts processing accordingly. The architecture combines an autoencoder for regime detection, dual node transformer networks specialized for stable and volatile conditions, and a Soft Actor-Critic reinforcement learning controller that learns adaptive regime thresholds from prediction performance.

Experiments on 20 S&P 500 stocks spanning 1982-2025 demonstrate substantial improvements over prior approaches: the complete system achieves 0.59% MAPE for one-day predictions compared to 0.80% for the baseline integrated node transformer, while directional accuracy reaches 72%, a 7 percentage point improvement. These gains persist across prediction horizons and are most pronounced during volatile periods where the baseline struggles.

The key conceptual contribution is the adaptive learning of regime boundaries. Rather than relying solely on fixed hand-crafted anomaly definitions, the system discovers useful boundaries by optimizing downstream prediction accuracy. This approach avoids the staleness problem of hand-crafted regime rules and the labeling burden of supervised regime classification.

Future work should validate the framework on broader universes, develop more efficient implementations for real-time deployment, and explore extensions to portfolio optimization and risk management applications.

APPENDIX

Table X reports the hyperparameter search space explored for each baseline model.
All searches were conducted via grid search on the validation set (2011-2016), with the configuration yielding the lowest validation MAPE selected for test evaluation.

For ARIMA, optimal orders vary across stocks because each stock exhibits different autocorrelation and partial autocorrelation structure; the most common selections were (p, d, q) = (2, 1, 2) and (1, 1, 1). XGBoost uses early stopping with a patience of 50 rounds on validation RMSE, so the effective number of estimators is often lower than the specified maximum. The LSTM and Simple Transformer sequence lengths are fixed at 252 trading days (one calendar year) to match the proposed model's input window, ensuring that differences in performance reflect architectural capacity rather than information asymmetry. The MS-VAR uses the MSIH specification (Markov-switching intercept and heteroscedasticity), which allows both the intercept and the error variance to switch across regimes while keeping the autoregressive coefficients regime-invariant; this provides sufficient flexibility to capture volatility regime changes without overfitting the transition dynamics. For the HMM-LSTM, regime assignments are determined by the Viterbi path on the training set, and each regime-specific LSTM is trained only on data segments assigned to its corresponding state. PatchTST and iTransformer use the channel-independent and inverted-attention configurations recommended in their respective original publications, with sequence lengths fixed at 252 to match the other deep learning baselines.

REFERENCES

[1] J. D. Hamilton, "A new approach to the economic analysis of nonstationary time series and the business cycle," Econometrica, vol. 57, no. 2, pp. 357-384, 1989.
[2] W. Chen, M. Jiang, W.-G. Zhang, and Z. Chen, "A novel graph convolutional feature based convolutional neural network for stock trend prediction," Information Sciences, vol. 556, pp. 67-94, 2021.
[3] A.
Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, vol. 30, 2017, pp. 5998-6008.

TABLE X
HYPERPARAMETER SEARCH RANGES AND SELECTED VALUES FOR BASELINE MODELS

Model                  Hyperparameter             Search Range                  Selected
ARIMA                  AR order (p)               {0, 1, 2, 3, 4, 5}            Per stock (AIC)
                       Differencing (d)           {0, 1, 2}                     Per stock (AIC)
                       MA order (q)               {0, 1, 2, 3, 4, 5}            Per stock (AIC)
                       Selection criterion        AIC, BIC                      AIC
VAR                    Lag order                  {1, 2, 3, ..., 10}            3
                       Trend                      {none, constant, both}        constant
                       Selection criterion        AIC, BIC                      BIC
Random Forest          Number of estimators       {100, 200, 500}               200
                       Maximum depth              {5, 10, 15, 20, None}         15
                       Min samples split          {2, 5, 10}                    5
                       Min samples leaf           {1, 2, 4}                     2
                       Max features               {sqrt(d), log2(d), 0.5}       sqrt(d)
SVR                    Kernel                     RBF (fixed)                   RBF
                       Regularization (C)         {0.1, 1, 10, 100}             10
                       Kernel width (γ)           {scale, 0.01, 0.1}            scale
                       Epsilon (ε)                {0.01, 0.05, 0.1}             0.05
XGBoost                Number of estimators       {100, 500, 1000}              500
                       Maximum depth              {3, 5, 7, 10}                 7
                       Learning rate              {0.01, 0.05, 0.1}             0.05
                       Subsample ratio            {0.7, 0.8, 0.9, 1.0}          0.8
                       Column sample ratio        {0.7, 0.8, 0.9, 1.0}          0.8
                       Min child weight           {1, 3, 5}                     3
                       L1 regularization (α)      {0, 0.01, 0.1}                0.01
                       L2 regularization (λ)      {1, 1.5, 2}                   1.5
LSTM                   Hidden dimension           {64, 128, 256, 512}           256
                       Number of layers           {1, 2, 3}                     2
                       Dropout                    {0.1, 0.2, 0.3}               0.2
                       Learning rate              {10^-4, 5x10^-4, 10^-3}       5x10^-4
                       Batch size                 {32, 64}                      64
                       Sequence length            252 (fixed)                   252
Simple Transformer     Layers                     {4, 6, 8}                     6
                       Attention heads            {4, 8}                        8
                       Model dimension            {256, 512}                    512
                       FFN dimension              {1024, 2048}                  2048
                       Dropout                    {0.1, 0.2}                    0.1
                       Learning rate              {10^-4, 5x10^-4, 10^-3}       10^-4
BERT Sentiment + LSTM  LSTM hidden dimension      {128, 256, 512}               256
                       LSTM layers                {1, 2}                        2
                       Sentiment fusion           {concat., gated}              concatenation
                       Dropout                    {0.1, 0.2, 0.3}               0.2
                       Learning rate              {10^-4, 5x10^-4}              10^-4
MS-VAR                 Number of regimes (K)      {2, 3, 4}                     3
                       Lag order                  {1, 2, 3, ..., 10}            3
                       Switching specification    {MSI, MSM, MSIH}              MSIH
                       EM convergence tolerance   {10^-6, 10^-8}                10^-8
HMM-LSTM               HMM states (K)             {2, 3, 4}                     3
                       HMM covariance             {diag., full}                 diagonal
                       LSTM hidden dimension      {128, 256, 512}               256
                       LSTM layers                {1, 2, 3}                     2
                       Dropout                    {0.1, 0.2, 0.3}               0.2
                       Learning rate              {10^-4, 5x10^-4, 10^-3}       5x10^-4
                       Sequence length            252 (fixed)                   252
TimesNet               Layers                     {2, 3, 4}                     3
                       Model dimension            {32, 64, 128}                 64
                       Top-k periods              {3, 5, 7}                     5
                       FFN dimension              {64, 128, 256}                128
                       Dropout                    {0.1, 0.2, 0.3}               0.2
                       Learning rate              {10^-4, 5x10^-4, 10^-3}       10^-4
                       Sequence length            252 (fixed)                   252
PatchTST               Patch length               {12, 16, 24}                  16
                       Stride                     {8, 12, 16}                   8
                       Layers                     {3, 4, 6}                     4
                       Attention heads            {4, 8}                        8
                       Model dimension            {128, 256, 512}               256
                       Dropout                    {0.1, 0.2, 0.3}               0.2
                       Learning rate              {10^-4, 5x10^-4, 10^-3}       10^-4
iTransformer           Layers                     {3, 4, 6}                     4
                       Attention heads            {4, 8}                        8
                       Model dimension            {128, 256, 512}               256
                       FFN dimension              {256, 512, 1024}              512
                       Dropout                    {0.1, 0.2}                    0.1
                       Learning rate              {10^-4, 5x10^-4, 10^-3}       10^-4
Integrated NF-BERT     Architecture and hyperparameters fixed per [5].

[4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, 2019, pp. 4171-4186.
[5] M. A. Al Ridhawi, M. Haj Ali, and H.
Al Osman, "Stock market prediction using node transformer architecture integrated with BERT sentiment analysis," submitted to IEEE Access, 2026, under review.
[6] Z. Liu, J. Liu, Q. Zeng, and L. Wu, "VIX and stock market volatility predictability: A new approach," Finance Research Letters, vol. 48, p. 102887, 2022.
[7] F. X. Diebold, J.-H. Lee, and G. C. Weinbach, "Regime switching with time-varying transition probabilities," Business Cycles: Durations, Dynamics, and Forecasting, pp. 144-165, 1994.
[8] H. Tong, Non-linear Time Series: A Dynamical System Approach. Oxford University Press, 1990.
[9] P. Nystrup, H. Madsen, and E. Lindström, "Regime-based versus static asset allocation: Letting the data speak," The Journal of Portfolio Management, vol. 44, no. 1, pp. 103-115, 2017.
[10] S. Aminikhanghahi and D. J. Cook, "A survey of methods for time series change point detection," Knowledge and Information Systems, vol. 51, no. 2, pp. 339-367, 2017.
[11] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[12] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," in International Conference on Learning Representations, 2014.
[13] A. Pumsirirat and L. Yan, "Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine," International Journal of Advanced Computer Science and Applications, vol. 9, no. 1, pp. 18-25, 2018.
[14] M. Ahmed, A. N. Mahmood, and J. Hu, "A survey of network anomaly detection techniques," Journal of Network and Computer Applications, vol. 60, pp. 19-31, 2016.
[15] W. Bao, J. Yue, and Y. Rao, "A deep learning framework for financial time series using stacked autoencoders and long-short term memory," PLOS ONE, vol. 12, no. 7, p. e0180944, 2017.
[16] M. Liu, H. Sheng, N. Zhang et al.
, "A new deep network model for stock price prediction," in International Conference on Machine Learning for Cyber Security, 2022, pp. 413-426.
[17] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, "A comprehensive survey on graph neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4-24, 2020.
[18] C. Wang, H. Liang, B. Wang, X. Cui, and Y. Xu, "MG-Conv: A spatiotemporal multi-graph convolutional neural network for stock market index trend prediction," Computers and Electrical Engineering, vol. 103, p. 108285, 2022.
[19] Q. Wu, W. Zhao, Z. Li, D. P. Wipf, and J. Yan, "NodeFormer: A scalable graph structure learning transformer for node classification," in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 27387-27401.
[20] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018.
[21] Z. Jiang, D. Xu, and J. Liang, "A deep reinforcement learning framework for the financial portfolio management problem," arXiv preprint arXiv:1706.10059, 2017.
[22] Z. Ning, P. Dong, X. Wang, X. Hu, L. Guo, B. Hu, R. Y. K. Kwok, and V. C. M. Leung, "A double deep Q-learning model for energy-efficient edge scheduling," IEEE Transactions on Services Computing, vol. 14, no. 5, pp. 1555-1566, 2021.
[23] Y. Deng, F. Bao, Y. Kong, Z. Ren, and Q. Dai, "Deep direct reinforcement learning for financial signal representation and trading," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 653-664, 2016.
[24] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[25] T. Haarnoja, A. Zhou, P. Abbeel, and S.
Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," in International Conference on Machine Learning. PMLR, 2018, pp. 1861-1870.
[26] T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, and S. Levine, "Soft actor-critic algorithms and applications," arXiv preprint arXiv:1812.05905, 2019.
[27] A. Ang and G. Bekaert, "International asset allocation with regime shifts," The Review of Financial Studies, vol. 15, no. 4, pp. 1137-1187, 2002.
[28] M. Guidolin and A. Timmermann, "Asset allocation under multivariate regime switching," Journal of Economic Dynamics and Control, vol. 31, no. 11, pp. 3503-3544, 2007.
[29] T. Fischer and C. Krauss, "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, vol. 270, no. 2, pp. 654-669, 2018.
[30] K. Cortis, A. Freitas, T. Daudert, M. Huerlimann, M. Zarrouk, S. Handschuh, and B. Davis, "SemEval-2017 task 5: Fine-grained sentiment analysis on financial microblogs and news," in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017, pp. 519-535.
[31] H.-M. Krolzig, Markov-Switching Vector Autoregressions: Modelling, Statistical Inference, and Application to Business Cycle Analysis. Berlin: Springer-Verlag, 1997.
[32] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, "TimesNet: Temporal 2D-variation modeling for general time series analysis," in Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023.
[33] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, "A time series is worth 64 words: Long-term forecasting with transformers," in Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023.
[34] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M.
Long, "iTransformer: Inverted transformers are effective for time series forecasting," in Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024.
[35] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day, 1976.
[36] C. A. Sims, "Macroeconomics and reality," Econometrica, vol. 48, no. 1, pp. 1-48, 1980.
[37] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[38] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.
[39] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785-794.
[40] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[41] F. X. Diebold and R. S. Mariano, "Comparing predictive accuracy," Journal of Business & Economic Statistics, vol. 13, no. 3, pp. 253-263, 1995.

Mohammad Al Ridhawi received the B.A.Sc. degree in computer engineering and the M.Sc. degree in digital transformation and innovation (machine learning) from the University of Ottawa, Ottawa, Canada, in 2019 and 2021, respectively. He is currently pursuing the Ph.D. degree in electrical and computer engineering at the University of Ottawa, where he also serves as a Part-Time Engineering Professor. He has industry experience as a Senior Data Scientist and Senior Machine Learning Engineer, building production ML systems in financial and environmental domains. His research interests include deep learning, graph neural networks, natural language processing, financial time series analysis, and reinforcement learning.

Mahtab Haj Ali received the M.Sc. degree in digital transformation and innovation from the University of Ottawa, Ottawa, Canada, in 2021. She is currently pursuing the Ph.D.
degree in electrical and computer engineering at the University of Ottawa, with a research focus on time series forecasting and deep learning models. She works as an AI Research Engineer at the National Research Council of Canada, where she builds and evaluates large language models (LLMs) and develops AI-driven solutions for real-world industrial applications. Her work includes large-scale time series analysis, advanced feature engineering, and the application of LLMs in production environments. Her research interests include deep learning for time series analysis, deep neural networks, and applied artificial intelligence.

Hussein Al Osman received the B.A.Sc., M.A.Sc., and Ph.D. degrees from the University of Ottawa, Ottawa, Canada. He is an Associate Professor and Associate Director in the School of Electrical Engineering and Computer Science at the University of Ottawa, where he leads the Multimedia Processing and Interaction Group. His research focuses on affective computing, multimodal affect estimation, human-computer interaction, serious gaming, and multimedia systems. He has produced over 50 peer-reviewed research articles, two patents, and several technology transfers to industry.
