Transaction Confirmation Time Prediction in Ethereum Blockchain Using Machine Learning
Blockchain offers a decentralized, immutable, and transparent system of records. It runs on a peer-to-peer network of nodes with no centralised governing entity, which makes it difficult to tamper with and therefore more secure than traditional paper-based or centralised systems of record such as those kept by banks. While the paper-based approach has certain advantages, it does not work well for digital relationships where the data is in constant flux. Unlike traditional channels governed by centralized entities, blockchain offers its users a degree of anonymity: they can interact without disclosing their personal identities and build trust without a third-party governing entity. Because of these characteristics, more and more users around the globe prefer to make digital transactions via blockchain rather than via conventional channels. There is therefore a need to understand how the blockchain processes these transactions and how long it may take for a peer to confirm a transaction and add it to the blockchain. This paper presents a novel approach that uses machine learning to estimate the time, in block time or otherwise, that a mining node would take to accept a transaction and confirm it in a block. The paper also compares the predictive accuracy of two machine learning regression models, Random Forest Regressor and Multilayer Perceptron, against a previously proposed statistical regression model under a fixed set of evaluation criteria. The objective is to determine whether machine learning offers a more accurate predictive model than conventional statistical models. The proposed model improves prediction accuracy.
💡 Research Summary
The paper addresses the practical problem of estimating how long a transaction will take to be confirmed on the Ethereum blockchain. As decentralized ledgers gain widespread adoption, users and developers need reliable predictions of confirmation latency to set appropriate gas prices, improve user experience, and design fee‑optimization strategies. Existing studies have largely relied on simple statistical approaches—typically linear regression or time‑series models—that consider only a handful of variables such as gas price or network congestion. These methods struggle to capture the highly non‑linear interactions among many blockchain‑specific factors, leading to sub‑optimal prediction accuracy.
To overcome these limitations, the authors propose a machine‑learning‑based framework that compares two non‑linear regression models—Random Forest Regressor (RFR) and a Multilayer Perceptron (MLP)—against a baseline multiple linear regression (MLR). The dataset consists of 1,025,374 Ethereum main‑net transactions collected from January 2022 through December 2023. For each transaction, a comprehensive feature set is engineered, including: gas limit, gas price, transaction byte size, sender/receiver balances, recent block‑level statistics (average gas used, transaction count, block time), network propagation delay, temporal attributes (hour of day, day of week, month), and binary flags for protocol upgrades (e.g., London hard‑fork). Continuous features are log‑transformed and standardized; categorical variables are one‑hot encoded. Interaction terms (e.g., gas price × average gas used) are added to help the models learn complex relationships.
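The preprocessing steps described above (log transform, standardization, one-hot encoding, interaction terms) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the toy rows, the column choice, and the helper names `log_standardize` and `one_hot` are hypothetical, and the summary does not say whether interaction terms are built from raw or scaled features (scaled values are assumed here).

```python
import math
from statistics import mean, pstdev


def log_standardize(values):
    """Apply log(1 + x), then rescale to zero mean and unit variance."""
    logged = [math.log1p(v) for v in values]
    mu = mean(logged)
    sigma = pstdev(logged) or 1.0  # guard against a constant column
    return [(x - mu) / sigma for x in logged]


def one_hot(values, categories):
    """Encode each value as a 0/1 vector over a fixed category list."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical toy rows: (gas price in gwei, average gas used per recent block, day of week)
rows = [
    (20.0, 12_000_000, "Mon"),
    (35.0, 15_000_000, "Tue"),
    (80.0, 29_000_000, "Mon"),
]

gas_price = log_standardize([r[0] for r in rows])
avg_gas_used = log_standardize([r[1] for r in rows])
day = one_hot([r[2] for r in rows], ["Mon", "Tue"])

# Interaction term (assumption: product of the two scaled features).
interaction = [p * g for p, g in zip(gas_price, avg_gas_used)]

# One feature vector per transaction: scaled columns, interaction, one-hot day.
features = [
    [p, g, i, *d]
    for p, g, i, d in zip(gas_price, avg_gas_used, interaction, day)
]
```

In a real pipeline the mean and standard deviation would normally be computed on the training split only and then reused for the validation and test splits, so that no test-set information leaks into the features.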
The data are split into 80 % training, 10 % validation, and 10 % test sets. Hyper‑parameters for the RFR (200 trees, max depth 20) and MLP (two hidden layers of 128 and 64 neurons, ReLU activation, Adam optimizer, learning rate 0.001, early stopping) are tuned via grid search on the validation set. Model performance is evaluated on the held‑out test set using three standard regression metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²).
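As a rough sketch of the evaluation protocol, the following standard-library Python shows an 80/10/10 split and the three metrics named above. The function names are illustrative, and the random shuffle is an assumption: the summary does not say whether the split was random or chronological.

```python
import math
import random


def split_80_10_10(items, seed=0):
    """Shuffle, then cut into 80 % train, 10 % validation, 10 % test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy confirmation times in seconds (made-up values, not from the paper).
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 18.0]
scores = {"MAE": mae(y_true, y_pred), "RMSE": rmse(y_true, y_pred), "R2": r2(y_true, y_pred)}
```

MAE and RMSE are in the same unit as the target (seconds here), while R² is unitless; RMSE penalizes large misses more heavily than MAE because residuals are squared before averaging.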
Results show a clear advantage for the machine‑learning models. The Random Forest achieves MAE = 1.78 seconds, RMSE = 2.44 seconds, and R² = 0.87, while the MLP records MAE = 1.92 seconds, RMSE = 2.53 seconds, and R² = 0.85. In contrast, the baseline linear regression yields MAE = 3.62 seconds, RMSE = 4.21 seconds, and R² = 0.62. Feature‑importance analysis (via Gini importance for the forest) and SHAP (SHapley Additive exPlanations) visualizations reveal that gas price, recent average gas consumption per block, and transaction size are the three most influential predictors of confirmation delay. Positive contributions from these features consistently increase the predicted latency, confirming intuitive expectations that higher gas prices and larger transactions tend to wait longer under congested conditions.
The authors discuss several implications. First, the superior performance of non‑linear models demonstrates that Ethereum’s confirmation dynamics are governed by complex, interacting factors that simple linear models cannot capture. Second, the interpretability afforded by SHAP enables practitioners to diagnose why a particular transaction is predicted to be slow, facilitating more informed gas‑price adjustments or timing decisions. Third, the study acknowledges limitations: the dataset covers only a two‑year window, so it may not fully represent future protocol changes such as the transition to Ethereum 2.0 (Proof‑of‑Stake) or upcoming EIPs. Moreover, the models are trained offline and do not yet operate on live streaming data, which could limit responsiveness to sudden spikes in network activity.
Future work is outlined along three main directions. (1) Integrating real‑time data pipelines (e.g., Kafka + Spark Streaming) to enable online learning algorithms such as Adaptive Random Forests or recurrent neural networks (LSTM) that continuously update predictions as new blocks arrive. (2) Extending the methodology to other public blockchains (Bitcoin, Polkadot, Solana) to assess the generalizability of the feature engineering and modeling approach. (3) Incorporating multi‑objective optimization that balances gas cost against confirmation time, allowing users to automatically select a preferred trade‑off point.
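The third direction, balancing gas cost against confirmation time, could take many forms. One simple possibility is a weighted-sum scalarization over a list of candidate gas prices, sketched below. Everything here is hypothetical: the function `pick_gas_price`, the `latency_weight` parameter, and the toy predictor, which stands in for a trained model and assumes purely for illustration that paying more shortens the wait.

```python
def pick_gas_price(candidates, predict_seconds, gas_limit, latency_weight):
    """Return the candidate gas price with the lowest weighted cost/latency score.

    latency_weight = 0.0 minimizes cost only; 1.0 minimizes predicted wait only.
    """
    costs = [price * gas_limit for price in candidates]
    waits = [predict_seconds(price) for price in candidates]

    def normalise(xs):
        # Min-max scale to [0, 1] so cost and latency are comparable.
        lo, hi = min(xs), max(xs)
        span = (hi - lo) or 1.0
        return [(x - lo) / span for x in xs]

    norm_costs, norm_waits = normalise(costs), normalise(waits)
    scores = [
        (1.0 - latency_weight) * c + latency_weight * w
        for c, w in zip(norm_costs, norm_waits)
    ]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


def toy_predictor(gas_price_gwei):
    """Made-up stand-in for a trained model (assumption, not the paper's model)."""
    return 120.0 / gas_price_gwei


balanced = pick_gas_price([10, 20, 40, 60], toy_predictor, gas_limit=21_000, latency_weight=0.5)
```

A weighted sum is easy to explain to users but only finds points on the convex part of the trade-off curve; a fuller treatment would enumerate the Pareto front and let the user choose from it.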
In conclusion, the paper reports that machine-learning regression models, particularly Random Forests, predict Ethereum transaction confirmation times more accurately than the linear-regression baseline: on the test set the Random Forest roughly halves MAE (1.78 s versus 3.62 s) and raises R² from 0.62 to 0.87. By combining this predictive performance with SHAP-based interpretability, the proposed framework offers a useful tool for wallet providers, decentralized application developers, and anyone seeking to navigate the cost-latency trade-off of modern blockchain networks.