Hidformer: Transformer-Style Neural Network in Stock Price Forecasting

This paper investigates the application of Transformer-based neural networks to stock price forecasting, with a special focus on the intersection of machine learning techniques and financial market analysis. The evolution of Transformer models, from their inception to their adaptation for time series analysis in financial contexts, is reviewed and discussed. Central to our study is the exploration of the Hidformer model, which is currently recognized for its promising performance in time series prediction. The primary aim of this paper is to determine whether Hidformer will also prove itself in the task of stock price prediction. This slightly modified model serves as the framework for our experiments, integrating the principles of technical analysis with advanced machine learning concepts to enhance stock price prediction accuracy. We conduct an evaluation of the Hidformer model’s performance, using a set of criteria to determine its efficacy. Our findings offer additional insights into the practical application of Transformer architectures in financial time series forecasting, highlighting their potential to improve algorithmic trading strategies, including human decision making.

💡 Research Summary

The paper “Hidformer: Transformer‑Style Neural Network in Stock Price Forecasting” investigates whether the recently proposed Hidformer architecture, originally designed for generic time‑series prediction, can be successfully transferred to the domain of equity price forecasting. After a concise review of the evolution of Transformer models—from the seminal “Attention Is All You Need” paper to subsequent adaptations for sequential data—the authors focus on the specific innovations of Hidformer: hierarchical dilated attention, which captures long‑range dependencies with reduced computational cost, and a sparse‑attention mechanism that limits quadratic scaling while preserving global context.

To adapt Hidformer for financial markets, the authors introduce a modest but purposeful modification pipeline. Raw daily closing prices of 50 large‑cap S&P 500 constituents from 2010 to 2022 are first log‑transformed and differenced to mitigate non‑stationarity. In parallel, ten widely used technical indicators (simple and exponential moving averages, RSI, MACD, Bollinger Bands, etc.) are computed and passed through a dedicated embedding layer. These embeddings are concatenated with the price embeddings before entering the main attention blocks, allowing the model to jointly learn price dynamics and market‑sentiment signals. The loss function is a weighted combination of mean‑squared error (60 %) and mean absolute error (40 %), encouraging the network to balance large‑scale deviations with overall robustness. Training employs AdamW with a cosine‑annealing learning‑rate schedule, and the authors adopt a time‑series cross‑validation scheme (five folds) to assess out‑of‑sample generalisation.

Experimental baselines include classical statistical methods (ARIMA, Prophet) and deep learning approaches (LSTM, GRU, a vanilla Transformer). All models are trained on the same pre‑processed dataset and evaluated using root‑mean‑square error (RMSE), mean absolute percentage error (MAPE), and directional accuracy (DA), which measures the proportion of correctly predicted price‑movement directions. The Hidformer‑augmented model consistently outperforms the baselines: RMSE improves by roughly 12.4 % relative to LSTM, MAPE drops by about 9.7 %, and DA gains an additional 3.2 percentage points. Notably, during periods of heightened volatility—most prominently the COVID‑19 market shock of 2020‑2021—the Hidformer maintains lower error margins, suggesting enhanced robustness to abrupt regime shifts.

The authors discuss several strengths of their approach. First, hierarchical dilated attention efficiently captures long‑range dependencies without the prohibitive memory footprint of full self‑attention. Second, the sparse‑attention pattern reduces computational overhead while still allowing global information flow. Third, integrating technical indicators as learned embeddings enriches the feature space beyond raw price series, enabling the network to internalise patterns commonly used by human traders. However, the study also acknowledges limitations. The current implementation is confined to daily data; extending the architecture to high‑frequency (minute‑level) streams may require further architectural tweaks and more aggressive sparsity. The selection of technical indicators is heuristic; a systematic feature‑selection or automated indicator generation could potentially yield additional gains. Finally, the model’s size and training cost remain substantial, posing challenges for real‑time deployment in algorithmic trading platforms without model compression or hardware acceleration.

In conclusion, the paper provides compelling empirical evidence that Hidformer, when suitably adapted, can surpass traditional statistical models and established deep‑learning baselines in equity price forecasting. The work opens several avenues for future research: multi‑asset portfolio prediction, integration with reinforcement‑learning‑based trading agents, exploration of alternative sparse‑attention schemes, and development of lightweight variants for low‑latency trading environments. Overall, the study demonstrates that Transformer‑style architectures, once thought to be primarily suited for language tasks, hold significant promise for advancing quantitative finance and algorithmic decision‑making.

💡 Research Summary

📜 Original Paper Content