A Deep Learning Algorithm Based on CNN-LSTM Framework for Predicting Cancer Drug Sales Volume

This study explores the application potential of a deep learning model based on the CNN-LSTM framework in forecasting the sales volume of cancer drugs, with a focus on modeling complex time series data. As advancements in medical technology and cancer treatment continue, the demand for oncology medications is steadily increasing. Accurate forecasting of cancer drug sales plays a critical role in optimizing production planning, supply chain management, and healthcare policy formulation. The dataset used in this research comprises quarterly sales records of a specific cancer drug in Egypt from 2015 to 2024, including multidimensional information such as date, drug type, pharmaceutical company, price, sales volume, effectiveness, and drug classification. To improve prediction accuracy, a hybrid deep learning model combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks is employed. The CNN component is responsible for extracting local temporal features from the sales data, while the LSTM component captures long-term dependencies and trends. Model performance is evaluated using two widely adopted metrics: Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). The results demonstrate that the CNN-LSTM model performs well on the test set, achieving an MSE of 1.150 and an RMSE of 1.072, indicating its effectiveness in handling nonlinear and volatile sales data. This research provides theoretical and technical support for data-driven decision-making in pharmaceutical marketing and healthcare resource planning.

💡 Research Summary

The paper investigates whether a hybrid Convolutional Neural Network–Long Short‑Term Memory (CNN‑LSTM) model can forecast the sales volume of a cancer drug. Using ten years of quarterly data (2015‑2024) from Egypt, the authors compile a multivariate time series that includes date, drug type, manufacturer, price, sales volume, efficacy score, and drug classification. Missing values are filled by linear interpolation, categorical variables are one‑hot encoded, and continuous variables are standardized (Z‑score). The data are split chronologically into training (70 %), validation (15 %), and test (15 %) sets so that no future observations leak into training.
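The preprocessing steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the synthetic `sales` series, and the choice to compute the Z‑score statistics on the training portion only are assumptions (the summary does not say which subset the scaling statistics come from).

```python
from statistics import mean, pstdev


def interpolate_linear(values):
    """Fill None gaps by linear interpolation between the nearest observed neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:      # leading gap: copy the first observed value
            filled[i] = values[right]
        elif right is None:   # trailing gap: copy the last observed value
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def chronological_split(series, train_frac=0.70, val_frac=0.15):
    """Split in time order (no shuffling) into train, validation and test parts."""
    n = len(series)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return series[:train_end], series[train_end:val_end], series[val_end:]


def fit_zscore(train):
    """Return mean and standard deviation estimated from the training part only."""
    mu, sigma = mean(train), pstdev(train)
    if sigma == 0:
        raise ValueError("constant series cannot be standardized")
    return mu, sigma


def apply_zscore(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


# Synthetic example: 40 quarters (10 years) with two missing observations.
sales = [100.0 + 2.5 * q for q in range(40)]
sales[5] = None
sales[17] = None

filled = interpolate_linear(sales)
train, val, test = chronological_split(filled)
mu, sigma = fit_zscore(train)
train_z, val_z, test_z = (apply_zscore(part, mu, sigma) for part in (train, val, test))
print(len(train_z), len(val_z), len(test_z))
```

With 40 quarterly observations, a 70/15/15 split leaves 28 points for training and 6 each for validation and testing.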

The model architecture begins with a Conv1D layer (64 filters, kernel size 3, ReLU activation), followed by max‑pooling and dropout; this stage extracts local temporal patterns such as seasonal spikes or promotional effects. The resulting feature maps feed into an LSTM layer (128 units) that captures long‑term dependencies and non‑linear trends. A final dense layer outputs a single scalar prediction of sales volume. The network is trained with the Adam optimizer (learning rate 0.001) and mean‑squared‑error loss, using early stopping and dropout (0.3) to mitigate over‑fitting.
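To make the layer stack concrete, the sketch below traces output shapes and trainable‑parameter counts through the layers named above. The filter count, kernel size, and LSTM width come from the summary; the input window of 8 quarters, the 10 input features, the pool size of 2, and stride‑1 convolution without padding are assumptions chosen only for illustration.

```python
def conv1d_layer(steps, channels, filters=64, kernel_size=3):
    """Shape and parameter count of a 1-D convolution (stride 1, no padding)."""
    out_steps = steps - kernel_size + 1
    if out_steps < 1:
        raise ValueError("input window is shorter than the kernel")
    params = kernel_size * channels * filters + filters  # weights + biases
    return (out_steps, filters), params


def max_pool1d(steps, channels, pool_size=2):
    """Max-pooling shortens the time axis and has no trainable parameters."""
    return (steps // pool_size, channels), 0


def lstm_layer(channels, units=128):
    """Four gates, each with input weights, recurrent weights and a bias vector."""
    params = 4 * (units * (channels + units) + units)
    return (units,), params


def dense_layer(inputs, outputs=1):
    """Fully connected output layer producing the scalar forecast."""
    return (outputs,), inputs * outputs + outputs


def trace_cnn_lstm(window=8, features=10):
    """Return (layer name, output shape, parameter count) for each layer in order."""
    layers = []
    shape, params = conv1d_layer(window, features)
    layers.append(("Conv1D(64, 3) + ReLU", shape, params))
    shape, params = max_pool1d(*shape)
    layers.append(("MaxPooling1D(2)", shape, params))
    layers.append(("Dropout(0.3)", shape, 0))  # dropout leaves the shape unchanged
    shape, params = lstm_layer(shape[1])
    layers.append(("LSTM(128)", shape, params))
    shape, params = dense_layer(shape[0])
    layers.append(("Dense(1)", shape, params))
    return layers


layers = trace_cnn_lstm()
for name, shape, params in layers:
    print(f"{name:<22} output shape {shape!s:<10} parameters {params}")
print("total parameters:", sum(params for _, _, params in layers))
```

Under these assumed input dimensions, almost all trainable parameters sit in the LSTM layer; the convolutional front end mainly shortens the sequence and widens it to 64 channels before the recurrent stage.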

For benchmarking, the authors compare the CNN‑LSTM against several baselines: a plain LSTM, a Gated Recurrent Unit (GRU), a Seasonal ARIMA (SARIMA) model, and an XGBoost regression model. On the held‑out test set, the CNN‑LSTM achieves a Mean Squared Error (MSE) of 1.150 and a Root Mean Squared Error (RMSE) of 1.072. Relative to the plain LSTM (MSE 1.423, RMSE 1.193), this is a reduction of roughly 19 % in MSE (about 10 % in RMSE); relative to SARIMA (MSE 2.087, RMSE 1.445), the reduction is roughly 45 % in MSE (about 26 % in RMSE). The improvement is especially pronounced during periods of abrupt demand changes, such as the fourth quarter of 2022, where the hybrid model maintains stable predictions while the baselines deviate.
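The two metrics and the relative gaps between models can be checked with a few lines of Python. The sketch below is illustrative: the helper names are hypothetical, and only the MSE values quoted above are taken from the summary.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Test-set MSE values as quoted in the summary.
reported_mse = {"CNN-LSTM": 1.150, "LSTM": 1.423, "SARIMA": 2.087}

for name, value in reported_mse.items():
    print(f"{name:<9} MSE {value:.3f}  RMSE {math.sqrt(value):.3f}")

for baseline in ("LSTM", "SARIMA"):
    mse_gain = relative_reduction(reported_mse[baseline], reported_mse["CNN-LSTM"])
    rmse_gain = relative_reduction(
        math.sqrt(reported_mse[baseline]), math.sqrt(reported_mse["CNN-LSTM"])
    )
    print(f"vs {baseline:<7} MSE reduction {mse_gain:.1%}, RMSE reduction {rmse_gain:.1%}")
```

Because RMSE is the square root of MSE, the two metrics always rank the models in the same order, but a given percentage reduction in MSE corresponds to a smaller percentage reduction in RMSE.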

The authors attribute the superior performance to the two‑stage feature extraction: CNN efficiently captures short‑range, localized fluctuations, and LSTM integrates these cues into a coherent long‑range representation. However, they acknowledge limitations: the dataset is confined to a single drug in one country, which restricts external validity, and exogenous factors like policy shifts or competitor launches are not explicitly modeled.

In conclusion, the study demonstrates that a CNN‑LSTM framework can effectively handle the nonlinear, volatile nature of pharmaceutical sales data, providing a more accurate forecasting tool for production planning, supply‑chain management, and health‑policy decision‑making. Future work is proposed to expand the dataset across multiple drugs and regions, incorporate external macro‑economic and regulatory variables, and explore reinforcement‑learning‑based inventory optimization that leverages the improved demand forecasts.

📜 Original Paper Content