GroupSHAP-Guided Integration of Financial News Keywords and Technical Indicators for Stock Price Prediction

Recent advances in finance-specific language models such as FinBERT have enabled the quantification of public sentiment into index-based measures, yet compressing diverse linguistic signals into single metrics overlooks contextual nuances and limits interpretability. To address this limitation, explainable AI techniques, particularly SHAP (SHapley Additive Explanations), have been employed to identify influential features. However, SHAP’s computational cost grows exponentially with the number of input features, making it impractical for large-scale text-based financial data. This study introduces a GRU-based forecasting framework enhanced with GroupSHAP, which quantifies contributions of semantically related keyword groups rather than individual tokens, substantially reducing computational burden while preserving interpretability. We employed FinBERT to embed news articles from 2015 to 2024, clustered them into coherent semantic groups, and applied GroupSHAP to measure each group’s contribution to stock price movements. The resulting group-level SHAP variables across multiple topics were used as input features for the prediction model. Empirical results from one-day-ahead forecasting of the S&P 500 index throughout 2024 demonstrate that our approach achieves a 32.2% reduction in MAE and a 40.5% reduction in RMSE compared with benchmark models without the GroupSHAP mechanism. This research presents the first application of GroupSHAP in news-driven financial forecasting, showing that grouped sentiment representations simultaneously enhance interpretability and predictive performance.

💡 Research Summary

The paper tackles a persistent challenge in news‑driven financial forecasting: how to capture the rich, multi‑topic information contained in large volumes of textual data while keeping the model interpretable and computationally tractable. Finance‑specific language models such as FinBERT make it possible to embed news articles into high‑dimensional semantic vectors, but most existing pipelines either collapse these embeddings into a single sentiment score or treat each token as a separate feature when applying SHAP (SHapley Additive Explanations). The token‑level approach quickly becomes infeasible because exact SHAP computation scales exponentially with the number of features, and the resulting token‑level importance scores are difficult to interpret in a financial context where investors care about broader themes (e.g., monetary policy, earnings outlook) rather than isolated words.

To address these issues, the authors propose a two‑stage framework that integrates GroupSHAP with a GRU‑based time‑series predictor. First, they collect a corpus of 1.2 million news articles covering the period 2015‑2024. Each article is processed with FinBERT, producing 768‑dimensional contextual embeddings. Using TF‑IDF weighting and a clustering algorithm (K‑means with silhouette‑based selection of the optimal number of clusters), the authors group semantically related keywords into roughly 30‑40 coherent topics such as “central‑bank policy,” “corporate earnings,” “macroeconomic indicators,” and “market sentiment.” This grouping step dramatically reduces the dimensionality of the textual feature space while preserving the thematic structure of the news.
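
The silhouette-based choice of the number of clusters can be illustrated with a minimal, standard-library sketch. Everything here is an assumption for illustration: the 2-D toy points stand in for the 768-dimensional FinBERT embeddings, the helper names (`farthest_point_init`, `kmeans`, `silhouette`, `select_k`) are hypothetical, and a deterministic farthest-point initialisation is swapped in for whatever initialisation the authors used.

```python
from math import dist

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=50):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; treated as worst case in this sketch
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def select_k(points, k_values):
    """Score each candidate k and return the one with the highest silhouette."""
    scored = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scored, key=scored.get), scored

# Toy data: three well-separated blobs standing in for keyword embeddings.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
best_k, scores = select_k(points, range(2, 6))
print(best_k, round(scores[best_k], 3))
```

On real embeddings a library implementation with several random restarts would be the more usual choice; the sketch only shows how the silhouette score arbitrates between candidate cluster counts.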

Next, GroupSHAP is applied to the clustered keyword groups rather than individual tokens. GroupSHAP treats each group as a single player and estimates its contribution to the model’s output as the weighted average of its marginal contributions, that is, the change in output when the group is added, over coalitions of the remaining groups. Because the number of groups is far smaller than the number of tokens, the computational burden becomes manageable, and the resulting importance scores are directly interpretable as the influence of high‑level news themes.
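
To make the group-level attribution concrete, the following is a minimal sketch of exact Shapley values in which each player is a whole keyword group. It is not the authors' implementation: the function `group_shapley`, the group names, and the toy value function `toy_value` are all hypothetical, and exact enumeration is only practical for a handful of groups (with the 30 to 40 groups described above, a sampling approximation would presumably be needed).

```python
from itertools import combinations
from math import factorial

def group_shapley(value_fn, groups):
    """Exact Shapley values where each player is a whole feature group.

    value_fn: callable taking a frozenset of group names and returning the
              model output when only those groups are "present".
    groups:   list of group names (hypothetical labels).
    """
    n = len(groups)
    phi = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions S of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of adding group g to coalition S
                total += weight * (value_fn(s | {g}) - value_fn(s))
        phi[g] = total
    return phi

# Toy value function (an assumption, not the paper's GRU): additive effects
# plus one interaction between two groups.
EFFECTS = {"fed_policy": 2.0, "earnings": 1.0, "employment": 0.5}

def toy_value(present):
    out = sum(EFFECTS[g] for g in present)
    if "fed_policy" in present and "employment" in present:
        out += 1.0  # interaction term, split evenly between the two groups
    return out

phi = group_shapley(toy_value, list(EFFECTS))
print(phi)

# Number of coalitions evaluated per player: 2**(n-1), so grouping matters.
print(2 ** (len(EFFECTS) - 1))
```

The attributions sum to the difference between the output with all groups present and the output with none, which is the efficiency property that makes Shapley-style explanations additive.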

The group‑level SHAP values are then combined with a set of traditional technical indicators (moving averages, RSI, MACD, volume, etc.) to form the input vector for a GRU network. The GRU architecture consists of two layers with 64 hidden units each, followed by a dense output layer that predicts the one‑day‑ahead S&P 500 closing price. Training uses data up to the end of 2022, validation on 2023, and testing on the full year of 2024. Baseline models include (i) a classic ARIMA, (ii) an LSTM network using the same technical indicators, (iii) a FinBERT‑only model with token‑level SHAP, and (iv) a GRU that uses only technical indicators.
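
As an illustration of how such an input vector might be assembled, the sketch below computes two of the named indicators and concatenates them with per-day group-level SHAP values. The window lengths, the simple-average RSI variant (sometimes called Cutler's RSI, rather than Wilder's smoothed version), the helper names, and all numbers are assumptions; the summary does not state which settings the authors used.

```python
def sma(prices, window):
    """Simple moving average; None until enough history exists."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out

def rsi(prices, window):
    """RSI using simple averages of gains and losses over the window."""
    out = [None] * len(prices)
    for i in range(window, len(prices)):
        changes = [prices[j] - prices[j - 1] for j in range(i - window + 1, i + 1)]
        gain = sum(c for c in changes if c > 0) / window
        loss = sum(-c for c in changes if c < 0) / window
        if loss == 0:
            out[i] = 100.0  # convention when there are no losses in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + gain / loss)
    return out

def build_features(prices, group_shap, sma_window=3, rsi_window=3):
    """One row per day: [SMA, RSI, group SHAP values...]; warm-up days are skipped."""
    s = sma(prices, sma_window)
    r = rsi(prices, rsi_window)
    rows = []
    for i in range(len(prices)):
        if s[i] is None or r[i] is None:
            continue
        rows.append([s[i], r[i]] + list(group_shap[i]))
    return rows

# Made-up closing prices and made-up SHAP values for two hypothetical groups.
prices = [10, 11, 12, 11, 13, 14]
group_shap = [[0.1, -0.2], [0.0, 0.3], [0.2, 0.1], [-0.1, 0.0], [0.4, -0.3], [0.2, 0.2]]
rows = build_features(prices, group_shap)
print(len(rows), len(rows[0]))
```

Each row would then form one time step of the sequence fed to the recurrent network.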

Empirical results show that the proposed GroupSHAP‑augmented GRU reduces Mean Absolute Error (MAE) by 32.2 % and Root Mean Squared Error (RMSE) by 40.5 % relative to benchmark models that lack the GroupSHAP mechanism. The performance gain is especially pronounced during high‑volatility periods, where the importance of certain news groups spikes. For example, the “Federal Reserve policy” and “U.S. employment data” groups consistently receive high positive SHAP contributions during market rallies, while the “negative earnings surprise” group contributes negatively during downturns. This behavior is consistent with the model leveraging economically meaningful signals rather than merely fitting noise.
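
For readers who want to see how such percentage reductions are computed, here is a short sketch of the two metrics and the relative improvement. The series below are invented for illustration and are unrelated to the paper's data or results; the function names are hypothetical.

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error; penalises large misses more heavily than MAE."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pct_reduction(baseline_err, model_err):
    """Relative error reduction of the model versus the baseline, in percent."""
    return 100.0 * (baseline_err - model_err) / baseline_err

# Invented numbers, for illustration only.
y_true = [100.0, 102.0, 101.0, 105.0]
baseline_pred = [104.0, 98.0, 105.0, 101.0]
model_pred = [101.0, 100.0, 102.0, 105.0]

mae_gain = pct_reduction(mae(y_true, baseline_pred), mae(y_true, model_pred))
rmse_gain = pct_reduction(rmse(y_true, baseline_pred), rmse(y_true, model_pred))
print(round(mae_gain, 1), round(rmse_gain, 1))
```

A larger reduction in RMSE than in MAE, as reported above, typically indicates that the improvement is concentrated in the largest errors.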

The authors also discuss interpretability benefits. While token‑level SHAP can generate thousands of importance scores, making it hard for practitioners to extract actionable insights, the group‑level approach yields a concise set of theme‑level explanations that align with how analysts think about market drivers. Consequently, investors can directly see which macro‑ or micro‑economic narratives are pushing the index up or down on a given day.

Limitations are acknowledged. The clustering step introduces a hyper‑parameter (the number of groups) that can affect both performance and interpretability; an automated, possibly Bayesian, selection method would be desirable. FinBERT is English‑centric, so the framework may not generalize well to non‑English news sources without additional multilingual models. Finally, GroupSHAP provides an approximation of true Shapley values, and the approximation error has not been quantified in the study.

Future work is outlined along three dimensions: (1) extending the pipeline to additional data sources such as Twitter, Reddit, and earnings call transcripts; (2) incorporating multilingual language models (e.g., XLM‑R) to capture global news flows; and (3) developing an online learning version in which GroupSHAP values are updated in real time as new articles arrive, enabling immediate integration into high‑frequency trading or risk‑management systems.

In summary, the paper delivers the first application of GroupSHAP to financial news‑driven forecasting, demonstrating that grouping semantically related keywords preserves essential information, reduces computational cost, and yields interpretable, high‑impact features. By feeding these group‑level sentiment representations into a GRU network alongside conventional technical indicators, the authors achieve substantial improvements in one‑day‑ahead S&P 500 prediction accuracy, thereby advancing the state of the art in explainable AI for quantitative finance.

📜 Original Paper Content