Towards Real-time Customer Experience Prediction for Telecommunication Operators
Telecommunications operators' (telcos') traditional sources of income, voice and SMS, are shrinking due to customers using over-the-top (OTT) applications such as WhatsApp or Viber. In this challenging environment it is critical for telcos to maintain or grow their market share by providing users with as good an experience as possible on their network. But the task of extracting customer insights from the vast amounts of data collected by telcos is growing in complexity and scale every day. How can we measure and predict the quality of a user’s experience on a telco network in real-time? That is the problem that we address in this paper. We present an approach to capture, in (near) real-time, the mobile customer experience in order to assess which conditions lead the user to place a call to a telco’s customer care center. To this end, we follow a supervised learning approach for prediction and train our ‘Restricted Random Forest’ model using, as a proxy for bad experience, the observed customer transactions in the telco data feed before the user places a call to a customer care center. We evaluate our approach using a rich dataset provided by a major African telecommunications company and a novel big data architecture for both the training and scoring of predictive models. Our empirical study shows our solution to be effective at predicting user experience by inferring whether a customer will place a call based on their current context. These promising results open new possibilities for improved customer service, which will help telcos to reduce churn rates and improve customer experience, both factors that directly impact their revenue growth.
💡 Research Summary
The paper addresses a pressing challenge for mobile network operators: how to monitor and improve customer experience in real time as traditional voice and SMS revenues decline due to the rise of over‑the‑top (OTT) services. The authors propose a data‑driven solution that predicts whether a subscriber will call the operator’s customer‑care center in the near future, using only the network‑level measurements that the operator already collects.
Two primary data sources are combined: (1) “Data Feeds” – hourly aggregates of anonymized mobile‑Internet usage, including download volume, packet retransmission percentage, the active application, device model, location and time; and (2) a log of customer‑care calls containing timestamps, duration and agent information. The study uses a five‑day slice (8‑12 August 2014) from a major African operator, comprising 816 million data‑feed records, 1.9 million unique users, 63 594 callers and 107 459 total calls. Exploratory analysis shows that high retransmission rates and heavy download traffic correlate strongly with call volume, and that certain apps (notably Apple Maps) trigger spikes in complaints.
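One way to picture how the two sources could be combined into training labels is sketched below. It marks a data-feed record as positive when the same user places a care call shortly afterwards. The tuple layouts, the function name `label_records` and the one-hour `horizon` are assumptions made for illustration; the summary does not state how the authors defined the window.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta


def label_records(records, calls, horizon=timedelta(hours=1)):
    """Label a record 1 if its user called care within `horizon` after it.

    records: iterable of (user_id, seen_at, features)  -- hypothetical layout
    calls:   iterable of (user_id, called_at)          -- hypothetical layout
    """
    # Index call timestamps per user, sorted so we can binary-search them.
    calls_by_user = defaultdict(list)
    for user_id, called_at in calls:
        calls_by_user[user_id].append(called_at)
    for times in calls_by_user.values():
        times.sort()

    labelled = []
    for user_id, seen_at, features in records:
        times = calls_by_user.get(user_id, [])
        i = bisect_left(times, seen_at)  # first call at or after the record
        positive = i < len(times) and times[i] - seen_at <= horizon
        labelled.append((user_id, seen_at, features, int(positive)))
    return labelled


# Toy data, not taken from the paper's dataset.
records = [
    ("u1", datetime(2014, 8, 8, 10, 0), {"retrans_pct": 8.0}),
    ("u1", datetime(2014, 8, 8, 14, 0), {"retrans_pct": 1.0}),
    ("u2", datetime(2014, 8, 8, 10, 0), {"retrans_pct": 2.0}),
]
calls = [("u1", datetime(2014, 8, 8, 10, 30))]
labels = [row[3] for row in label_records(records, calls)]
```

With 63 594 callers among 1.9 million users, positives would be rare under any such labelling, so class imbalance is something a real pipeline would need to handle.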
The prediction task is cast as a binary classification problem: given a user’s current context vector, predict whether the user will place a care‑center call shortly. To meet the operator’s demand for interpretability, the authors design a “Restricted Random Forest” (RRF). Unlike a classic random forest, each decision tree in the RRF is trained on a pre‑selected subset of features, making the contribution of each feature transparent. Trees are combined by majority voting. This design sacrifices a modest amount of raw predictive power for the ability to explain why a particular prediction was made—crucial for operational teams that need to pinpoint network faults or problematic applications.
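The idea of restricting each tree to a pre-selected feature subset and combining trees by majority vote can be sketched as follows. This is a simplified illustration, not the authors' implementation: depth-1 trees (stumps) stand in for full decision trees, and the feature names, subsets and toy data are all assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Most common 0/1 label; ties go to 1."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(rows, labels, allowed):
    """Fit a depth-1 tree that may only split on the features in `allowed`."""
    best = None  # (weighted impurity, feature, threshold, left label, right label)
    for f in allowed:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority class
        fallback = majority(labels)
        return lambda row: fallback
    _, f, t, lo, hi = best
    return lambda row: lo if row[f] <= t else hi


def fit_restricted_forest(rows, labels, feature_subsets):
    """Train one tree per pre-selected feature subset."""
    return [(subset, fit_stump(rows, labels, subset)) for subset in feature_subsets]


def predict_with_votes(forest, row):
    """Majority vote, plus each subset's vote so the decision can be explained."""
    votes = {subset: tree(row) for subset, tree in forest}
    label = Counter(votes.values()).most_common(1)[0][0]
    return label, votes


# Toy context vectors; 1 means the user later called the care center.
rows = [
    {"retrans_pct": 1.0, "download_mb": 5.0, "hour": 9},
    {"retrans_pct": 2.0, "download_mb": 50.0, "hour": 13},
    {"retrans_pct": 8.0, "download_mb": 60.0, "hour": 10},
    {"retrans_pct": 9.0, "download_mb": 80.0, "hour": 20},
    {"retrans_pct": 0.5, "download_mb": 10.0, "hour": 22},
    {"retrans_pct": 7.0, "download_mb": 70.0, "hour": 8},
]
labels = [0, 0, 1, 1, 0, 1]
subsets = [("retrans_pct",), ("download_mb",), ("hour",)]
forest = fit_restricted_forest(rows, labels, subsets)
label, votes = predict_with_votes(
    forest, {"retrans_pct": 8.5, "download_mb": 75.0, "hour": 12}
)
```

Because every tree is tied to a known subset, the `votes` mapping shows directly which feature groups pushed the prediction towards a call. An odd number of trees avoids ties in the binary vote.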
The system architecture follows a Lambda‑style model with a batch layer for offline model training and a speed layer for online scoring. Historical data are partitioned by day, allowing parallel training on a Hadoop/Spark cluster; newer partitions receive higher weight to keep models up‑to‑date. The speed layer ingests high‑velocity network streams (terabytes per hour) and scores each incoming transaction against the latest RRF model, producing near‑real‑time alerts.
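The interplay between daily partitions, recency weighting and online scoring might look roughly like the sketch below. The summary does not say how the weights are computed or how per-day models are combined, so the exponential decay, the `half_life_days` parameter, the weighted averaging and the alert `threshold` are all assumptions, and the per-day models are hypothetical stand-ins for trained classifiers.

```python
from datetime import date


def recency_weights(days, half_life_days=2.0):
    """Normalised weights that halve every `half_life_days` going back in time."""
    newest = max(days)
    raw = {d: 0.5 ** ((newest - d).days / half_life_days) for d in days}
    total = sum(raw.values())
    return {d: w / total for d, w in raw.items()}


def blended_score(models_by_day, weights, transaction):
    """Weighted average of the per-day models' call probabilities."""
    return sum(weights[d] * model(transaction) for d, model in models_by_day.items())


def speed_layer(stream, models_by_day, weights, threshold=0.5):
    """Score each incoming transaction and yield those that cross the threshold."""
    for tx in stream:
        p = blended_score(models_by_day, weights, tx)
        if p >= threshold:
            yield tx["user_id"], p


# Toy batch-layer output: one hypothetical model per daily partition.
days = [date(2014, 8, d) for d in range(8, 13)]
models_by_day = {
    d: (lambda tx, cutoff=3.0 + i: 1.0 if tx["retrans_pct"] > cutoff else 0.0)
    for i, d in enumerate(days)
}
weights = recency_weights(days)

# Toy speed-layer input; a real stream would arrive continuously.
stream = [
    {"user_id": "u1", "retrans_pct": 9.0},
    {"user_id": "u2", "retrans_pct": 1.0},
]
alerts = list(speed_layer(stream, models_by_day, weights))
```

Keeping one model per day means the batch layer can retrain partitions independently and in parallel, while the speed layer only needs the current set of models and weights to score a transaction.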
Evaluation shows the RRF achieving Precision ≈ 0.82, Recall ≈ 0.79 and F1 ≈ 0.80, outperforming single decision trees and logistic regression by more than 10 % in F1 score. Feature importance analysis reveals a hierarchy: application type → retransmission rate → download volume, with Apple Maps usage combined with retransmission rates above 5 % increasing the probability of a care‑center call by a factor of three. The authors argue that, if deployed, the solution would allow the operator to detect network degradations early, issue proactive notifications, or re‑configure problematic cells, thereby reducing call volume and improving churn‑related metrics.
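As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded values quoted above; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_f1 = f1_score(0.82, 0.79)
print(f"{reported_f1:.3f}")  # prints 0.805
```

The result agrees with the reported F1 of about 0.80, so the three metrics are mutually consistent.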
In summary, the paper contributes (1) a novel, interpretable machine‑learning framework for real‑time customer‑experience prediction using existing telco data, (2) empirical evidence that network‑level metrics can serve as reliable proxies for user dissatisfaction, and (3) a scalable big‑data pipeline that bridges offline model training with online inference at telco scale. Future work is suggested on multi‑class complaint categorization, integration with automated remediation (e.g., chat‑bots), and validation across different markets and technology generations.