Customer Churn in Mobile Markets A Comparison of Techniques

International Business Research ; V ol. 8, No. 6; 2015 ISSN 1913-9004 E-ISSN 1913-9012 Published by Canadian Center of Science and Education 224 Custo m er Churn in Mobile M arkets: A Co mparison of T echniques Mohammed Hassou na 1 , Ali T arhini 2 , T ariq Elyas 3 & Moha mmad Saeed AbouTrab 2 1 Computing and Infor mation Systems Depart ment, U niversit y of Greenwich, United Kin gdom 2 Department of Infor mation Systems, Brunel Uni versity London, Mid dlesex, U nited Kingdom 3 Faculty of Arts and Humanities, King Abdulaziz U niversity, Saudi A rabia Correspondence: Ali T arhini, Department of Inf ormatio n Systems , Brunel Universit y Londo n, Midd lesex, UK . E-mail: ali.tarhini@bru nel. ac .uk Received: A pril 16 , 201 5 Accepted : May 15, 201 5 Online Published : Ma y 25, 2 015 doi:10.5539 /ibr .v8n6p 22 4 URL: http://dx.do i.org/10.5539/ ibr .v8n6p 22 4 Abstract The h igh increase in the num ber of co mpanies competing in mature markets m akes cus tomer retention an important factor for any company to survive. Thus, many methodologies (e.g., data mining and statistics) have been prop osed to analyse and stud y customer rete ntion. The validity of suc h methods is not yet proved though. This paper tries to fill this gap by empiricall y comparing two techniq ues: Customer c hurn - dec ision tree a nd logistic regression models. T he pap er proves the supe riorit y o f dec ision tree techniq ue and stresses the need s for more advanced methods to ch urn modelling. Keywords: custo mer relationship management, customer ch urn, data mining, mobile market 1. Introduction Information technolog y (I T) has beco me a vital and inte gral part of every busine ss plan (T ar hini et al., 2 014a, b, c; Abbasi et al., 201 5; Alenezi et al, 2015; E l-Masri et al., 2015; Masa ’ deh et al., 2015 a, b, c; T arhini et al., 2015a , b, c) . Recently , Customer Relationship Ma nagement ( CRM) has got lot of attentions for nowadays companies where cu stomer rete ntion is c onsidered to be its main factor to be investigated as it focuses on developing and controlling lo yal, profitable a nd lasting relatio ns with customers. Developing successful retention techni q ues is important for b usinesses i n general, and for mobile operator s in particular since t hey are l osing 20% to 40% o f their c ustomers each year ( Jonathan, Jangh yuk, & Lawrence, 20 01; Ahn, H an , & Lee, 20 06; Seo, Ranganatha n, & Babad, 2008; Ordenes et al., 20 14; Hu, Shu, & Qiao, 2014, T arhini et al., 201 5d, e; Orozco et al., 2015 ). Attracting ne w customers costs a lot in ad vertising, ed ucating, creating new accounts. Such costs do not exist in the case o f retai ning existing custo mers. As a result, keepi ng an existing custo mer is five times cheaper than one attracting a new one (McIlro y & B arnett, 200 0). Improving c ustomer retentio n contribute s in red ucing churn rate from 20% to 10% annuall y saved about £ 25 millio n to the mobile operator Orange (A ydin & Ö zer, 2005). Investigate customer churn has been a s ubject for m any methods and tech niques including statistical methods and data mining. The super iority of u sing data mining is well establi shed to investigate custo mer ch urn compared with traditio nal m arket re search surveys (Shaw et al., 2001; Oliv eira, 2012; Huang, Kechadi , & Buckley , 201 2). Ma rket resear ch surveys based o n runnin g questionnaires or interviews voluntarily suff ers from a high cost, limited access to the custo mer pop ulation and d ata sel f -repo rting. Contrary , data m ining provides knowledge of whole customer populations based on analysis of t heir current and historical data. Data mining increasingly beco mes a main technique in c ustomer retention to predict future custo mers ’ attitude and detect patterns within historical data (W ei & Chiu, 2002; Han, L u, & Leung, 20 12; Liu & Fan, 2014) . T wo o f the most co mmon d ata mining tech niques are re gression anal ysis and decision tree (Hung, Y en , & W ang, 2006). First, logistic regression predicts the occurrence p rob ability o f customer churn by formulating a set of equations, input field values, factors a ffecting c ustomer churn a nd the output field ( A hn et al., 2006; Burez & V an den Poel, 2007). Second, decision trees, the most pop ular type of p redictive model ( B urez & V an d en P oel, 2007; Nie et al., 201 1), are used to solve classification pro ble ms where the instances ar e classified i nto one of two classes (i.e., positives and negatives). However, there i s little in the li terature that ( a) Provid es an inclu sive view of ho w decision tree and logistic regressio n models can be used to analyse and u nderstand customer churn in the mobile teleco mmunication market; and (b ) explores the p ros and cons of these techni ques. www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 225 Addressing these i ssues, we clar ify ho w to develop decision trees and logistic regressio n models using rea l data sets provided b y a UK mobile op erator . Experimental results a nd discussion ar e pr esented to pro vide understandings into the complexity of the churn prob lem. W e discuss the pros and cons of tech niques in customer retention analysis and suggest directions for future research. T his paper is structured as follows. Section 2 revises decisio n tree and logistic r egression modelling techniq ues. Perfor mance evaluation metrics are described in Section 3. Sectio n 4 intro duces modelling e xperiments results. Sectio n 5 eval uates and co mpares the performance of the developed models. Lastly , sec tion 6 concludes t his paper and points to future work. 2. Churn M odelling T echniq ues 2.1 Log istic Regr ession Logistic regression is a data mining technique used to predict occurrence probability of custom er churn (e.g., Burez & V an den Poel, 2007; Ahn et al., 2 006; Migué is et al. , 2012 ; Faris, Al -Shboul, & Ghatas heh, 20 14; Glady , Baesens, & C roux, 2009; Farquad, Ravi, & Raju, 2014). Logistic regression is based on a mathematically-orie nted app roach to anal yse the a ffecting of variables on the others. Prediction is made b y forming a set of eq uations connecting input values (i.e., affecting c usto mer chur n) with t he output fi eld (probability of churn). The equations (1), (2) and (3) give the mathematical for mulas for a logistic regression model (Miner , Nisbet, & Eld er , 2009).  󰇛            󰇜   󰇛  󰇜  (1)  󰇛  󰇜   󰇛    󰇜 (2)                      (3) Where: • y is the target variable for ea ch individual j (custo mer in churn modelling), y is a binary class label ( 0 or 1) ; •   is a constant; •   is the weight given to the specific variable   associated with each customer j (j =1,…. m) ; •        are the pred ictor variables for each custo mer j, from which y is to b e predicted . Customer data sets are analysed to form the r egression equa tions. An e valuation p rocess fo r each customer in the data set is then per formed. A customer is ca n be at risk o f c hurn if the p -value for the custo mer is greater than a predefined value (e.g. 0. 5), Regression anal ysis needs to b e processed in caution si nce it may give misleadin g results when carrying out causality a nd impact assessment (Cook & W eisberg, 1982; Amin et al., 2014). Multicollinearity res ulted from strong correlations between independent variables is a concerning fa ctor in logistic regression models as well (Miner et al., 2 009; Hui, Li, & Zongfan g, 201 3). The existence of strong multicollinear ity leads to incor rect conclusions ab out relation ships between independent a nd dependent variables since it i nflates the varia nces of the p arameter estimates and gives wrong m agnitude of the regression coefficient. Under certain circumstances, logistic regressio n can b e use d to approximate a nd represen t nonlinear s ystems i n spite o f it is a l inear appr oach (Cook & W ei sberg, 1982; Ren et al., 201 4). However , logisti c r egression needs to b e tested and evaluated for the increasing co mplexity o f the mobile teleco mmunication market (P aul & Cadman, 2002; Su, S hao , & Y e, 2 012; Su, 2014). 2.2 Decision T r ee Decision trees, the mos t p opular pr edictive models, is a tree graph presenting t he variables ’ relatio nships (B urez & V an den Poel, 200 7). Used to so lve c lassification and predictio n pr oblems, decision tree models ar e represented and eval uated in a top -do wn way . The two phases to develop decision trees are tree building and tree pruning. Starting from th e root node representing a feat ure to b e classified, d ecision tree i s b uilt. Selecting a f eature can be do ne b y e valuating i ts information ga in rat io. T he lo wer level nodes are t hen co nstructed in similar way to t he divide and co nquer strategy . Improvi ng pred ictive accurac y a nd r educing co mplexity , p runing proce ss is app lied on decision trees to produce a smaller tree and guara ntee a better generalisation by re moving branches contain i ng the larg est estimated er ror rate (Au, Chan, & Y ao , 200 3). The d ecision about a given ca se regardi ng to which of t he two classes it belo ngs is thus made b y moving fr o m the root node to all leaves. Thou gh there a re many algorit hms for building decision tree, C AR T , C5.0 and CHA ID are those m ost used (B in, Peiji, & Juan, 2007) . www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 226 Decision trees have several advantages. First, they are easy to visualise and un derstand (Bakır et al., 2009) . Second, no p rior assumptions abo ut the data are needed since it is a no npara metric ap proach (Fried l & Brod ley , 1997; Rahman et al., 201 2). Third, decision trees ca n process nu merical and categorical data. On t he o ther hand, decision tr ees suffer fro m so me disadva ntages. First, its perfor mance is affected b y co mplex interactio ns amon g variables and attribute s. Second, co mplex decision trees ar e very hard to be visualised and interpreted. Third, it suff ers from the lac k of robustness and o ver-sensitivity t o trainin g data sets (Burez & V an d en Po el, 2 007; Hassouna & Arzoky , 2011; De, Subbiah, & V egi, 20 14). 3. Data Mining Performance Eva luation Metrics The cr itical issue in using different churn modelling methods relate to: (a) Eff icientl y a ssessing t he perfor mance of these methods; a nd ( b), benchmarking and comparing the relati ve perfor mance a mong competing models. This section discusses t hose issues. 3.1 Classification Accuracy The confusion matrix is a tool that can b e used to measure the p erformance of binary cla ssification mode ( also called co ntingency table). A co nfusion matrix is a visual representation of information ab out actual and pred icted classifications p roduced by a classificatio n model (Mi ner et al., 2009, p. 2 92). T able 1 depicts a confusion matrix for a binary classifier . T able 1. A confusion matrix for a binary classifier Predicted classe s Actual classes Class=Y es/+/ Churn Class=No/-/No-c hurn Class=Y es/+/ Churn TP (true positive) FN (false neg ative) Class=No/-/ No- churn FP (false positive) TN (true negative ) Diff erent accurac y metrics result ed fr om the confusion matrix are classification accurac y , se nsitivity a nd specificity . Classification accurac y (CA) is the percenta ge o f the o bservations t hat were correctly classified, which can be calculated from the matrix usin g equation (4) (V u k & Curk, 2006 ): CA=           ( 4) Classification accuracy is an a mbiguous indicator particularl y in t he case of extreme data. T o explain, a base contains 9,9 90 churn custo mers and 10 non -churn ones. If a model succeeds to p redict that all 1 0,000 customers are at risk of churn, the acc uracy of classificatio n will be 99.9%. The high accuracy rate mistakenly indicates t hat the model is ver y ac curate in predicting customer churn because the m odel does not detect an y non -churn customers. Predicting churn cases in a correct way is always more important than predicting no n-churn cases as the cost of mis-pred icting churn is higher tha n that of mis -pred icting non-churn. 3.2 Sensitivity and Specificity Overcoming some of the weakness of accuracy metric, sensitivity a nd speci ficity are used. Sensitivity is the proportion of actual po sitives that are correctl y identified. Spec ificity is the prop ortion of actual negatives th at are correctly identified. The equations are t hus: Sensitivity=     (5) Specificity=     ( 6) Mobile o perators prefer m odels with high se nsitivity rather than models with h igh specificit y b ecause the cost associated with the i ncorrect classification of churners i s higher than t he cost associated w ith the incorrect classification of a no n-churner . A compro mise bet ween high sensitivity co mbined with reasonable specificity www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 227 should b e al ways made so mobile op erators can effectively mana ge their marketing b udget to achieve high customer retention. 3.3 Receiver Op erating Characteristic Curve (ROC) The Receiver Operati ng Char acteristic (ROC) curve is a d epiction of the r elations between the tr ue positive rate (i.e., benefits) and fal se p ositive rate (i.e., co sts), dra wn on x and y axi s i n l leaner scale (Kar ahoca, Karahoca & A ydin, 2007) . ROC represen ts the relations b etween the churners ratio co rrectly predicted as churners, and non-churners r atio wrongly predicted as churners. T he ROC provides relative compro mises bet ween benefits and costs. T he ROC c urve con sists of po ints correspo nding to prediction r esults. Figure 1 pr esents an example of a ROC curve. Figure 1. An example of RO C curve The best performance model i s when the ROC curve p asses through or close to ( 0, 1). T he model sensitivity a nd specificity will then b e 1 00% (i.e., no false negatives and no false p ositive r espectively). Some models such as logistic regression p roduce a score rather than p roducing b inary class dec isions (i.e. c hurn or non -c hurn). T o produce a b inary classifier i n this case, thresholds are used. If the classifi er result i s greater than a threshold, the classification class is a chur n. Otherwise, it is non -churn. Figure 1 dep icts ROC curve for a random predictor represented by t he diagonal li ne that divides t he RO C space into t wo parts represent s. RO C curve s pass ing near this line correspond to random guessing classifiers (e. g. classifying by tossing a coin). Generally , models with ROC curve s passing the top left part of the ROC curve perfor m better . The area under the ROC c urve (i.e., called AUC) is used as a per formance metric (Karahoca et al. , 2007). T he AUC value ranges from 0.0 to 1. 0. Models perform better when having greater AUC. Mo reover, models with AUC value greater than 0.5 perfor m better than random models because t he area under the ROC dia gonal line is 0.5. 3.4 Lift Chart Evaluating model perfor mance and d etermining thresholds, the Lift chart is similar to the ROC curve by yielding high tr ue positive cases. If the total number s o f t he clas sification insta nces are u nknown a nd the true p ositive TP rate cannot be computed, the ROC curve is not ap propriate and a Lift chart sho uld be use d for measuring t he accuracy of models ( V uk & Curk, 20 06) . In custo mer c hurn, t he Lift chart gathers custo mers into deciles based on their churn p redicted probability . Each decile shows the predicatio n performance o f the model. An instance o f lift chart for a c ustomer chur n model is shown in Figure 2 . T he top two dec iles arrest about 5 0% of churners and the to p five deciles assert about 90% o f www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 228 churners. Lift charts ha ve become pre ferred per formance evaluation metric used a mong marketers since it groups customers based on their relative probab ility to c hurn. Keeping the marketing cost at a minimum level along with maintaining high cu stomer retention rate can be achieved by targe ting cu stomers in the top deciles rather than targeting all of them. Figure 2. A li ft chart of customer churn model 4. Modelling Experi ments and Result s T o test those methods, churn analysis and p rediction experiments were used. The ubiquitous data min in g methodology CRISP -DM was adopted to investigate custo mer churn in the telec ommunications sec tor (Chapman et al. , 2000; Rudin & W a gstaff, 2014 ). T he CRISP -DM methodolo gy gives co mprehensive instructions and pro cedures for applying data mining algorithms to solve real- world prob lems. The CRISP-DM model consists of six phases: (1) Busin ess understanding; (2) data understanding; (3) data preparation; (4) modelling; (5) evaluation; a nd (6) deployment. Understanding business involves specifying its ob jectives and correlating them with data mining applications. Understanding data entails data co llection, familiarization, exploratio n, d escription and quality veri fication. Data preparation transfor ming r aw data into a suitable for mat in the sake o f app lying data mining algorit hms includes data cleaning, trans formation and r eduction. Modelling selects and applies suitable data mining a lgorithms and identifies their par ameters for addressing business pr oblem. T he decision abo ut adopting data mining models should be taken based on the evaluation results f rom a business o bjective perspective. Deployment is th e last phase where the discovered kno wledge from the data mining process should be organized and presented in a format that the business can use. 4.1 Data Sets De scription This section highlights the proce sses involved in t he data understanding phase. T wo sets of data were used to show ho w traditional a nalysis techniq ues may be used to explo re churn. T he data sets were obtained from a U K mobile telecom munication op erator d ata warehouse. T he analysis is based on t wo datasets of 15 ,519 and 19 ,919 customers resp ectively conta ining 17 variables. The d ependent variable ( output variable) simply identifies whether the customer c hurns or not. The predictor variables (input variables) are i n T ab le 2. These variab les are categorised into five groups: Demographics, cost, features/market ing, services usa ges an d customer services. A brief descrip tion of the predictor variables is provided by T able 2. Both data sets contained 5 0% of custo mers who ch urned and 50 % who sta yed with the operato r . The data sets also contain custo mers on 12 - and 18-month contracts. The data sets reflect the customer population features under stud y . The d ata sets cover customers ’ data from March to Dece mber, at which the c ustomers took the decision of chur n. The data were collected from Month 4 to Month 9 for the ne w or upgraded 1 2 -month c usto mers. In addition, the data were collected from Mon th 9 to Month 1 5 for those on 18 month contracts. The time of the d ata sets used in this study is illustrated in Fi gure 3. www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 229 Figure 3. T ime window of data sets T able 2. Desc ription of data sets ’ variables Category V ariable Name Description Demog raphics Lifestage _Segment Subscribers ’ age stage and gender Gender Subscribers ’ gender Post_Code Post code in whi ch subscribe rs live Cost Package_Cost Cost of the package of serv ices chosen by subscriber s Contract_L ength Number of mon ths of the contrac t T enure Number of mon ths with the prese nt mobile oper ator Features/Market ing T ariff The package of services chosen by subscriber s Device_De sc Handset model a nd manufacture r sales_channel The first channel where the relationship with the customer was established Usage Leve l Q2_bytes Data usages in t he second quarter Q3_bytes Data usages in t he third quarter Q2_voice V oice usages in the second quarter Q3_voice V oice usages in the third quar ter Customer Se rvices No_of_Repairs Number of time s handset has bee n in for repair in a 12-month period Prob_Handset Known issues w ith existing ha ndset No_of_Complain ts Number of custome r complaints re garding billing in a 12 month pe riod 4.2 Data Preparation The data preparation phase produces the final d ata sets fo r models. T he main goal of data prep aration is to enhance data quality a nd improve data anal ysis perfor mance. Data prepar ation needs to be ca rried o ut in a more iterative manner until a co nclusive outco me is reached. In this study , data preparation processes include: • Imputation of the missing values; • Discretisation of numerical variables ; • Transformation fro m one set of discrete values to a nother ; • Feature selection of the most informative variables ; • New variable derivation. Imputation p rocess involves replacing missing values with co mplete in formation based on an estimate fro m completed values. Creating new variables from the data is based on discreti sation and transfor mation. T wo new variables were formed to mea sure voice a nd cha nge i n data usage. T he per formance res ults o f t he exper iments show t hat models with selected features outperfor m t hose with full feature sets. In spite of the fact that the data sets used in this stud y have few variables, a feature selection pro cess performed involved three steps: www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 2 30 1) Feature screening by deletin g unreliable variables. Two variables (i.e., proble m_handset and no_of_complaint) were re moved from the feat ure list because they had ver y little variation. 2) Feature ranking b y sorting t he rest o f variables b ased on their importance a nd correlation with t he dependant variable. 3) Feature selection b y choosing the most i mportant variables and re moving others. 4.3 Modellin g In this pha se, lo gistic r egression and decision tree techniq ues were applied and their parameters were calibrated to o ptimise values. IBM SP SS Mod eller ( formerly Clementine) was used to build and experi ment with those techniques. 4.4 Logistic Reg r ession Analys is A model can be marked ‘ goo d ’ when having a 25 % higher classification accurac y rate than t he propo rtional by chance acc uracy rate (Costea & Ekl und, 20 03). T he logistic regression analysis was sta rted by computi ng t he proportional by cha nce accurac y rate based on calc ulating the pro portion of cases for each group (c hurn and non-churn). T he propor tions of the no n-ch urn and the churn gro ups are p resented in T able 3. The proportional by chance acc uracy rate was then co mputed b y squaring and summarising the proportion of cases in each group. The logistic regressio n acc urac y rates are 25 % higher than the propo rtional by cha nce accurac y rate. As a r es ult, the classification acc uracy criteria are satisfied and the logistic r egression model perform s better than a rando m guess. A relatio n between the dep endent variable (i.e., pro bability of churn) and combination o f independent variables is veri fied. The indep endent variables have been added to the analysis and the statistic al significance of the model chi -square at step 1 ha s bee n checked. The probabilities of the models ’ chi-squares for both data sets were less th an or equal to the level of significance of 0.05. The null hypothesis was accor dingly rejected . T he existence of a relations hip between the independen t variables and the dep endent variable was thus supported . T able 3. The proportional by chance acc uracy rate Data set Proportion of ch urner Proportion of non-churner Proportional by chance Logistic regr ession model accuracy rate Dataset 1 0.5277 0.47220 0.5176 0.681 Dataset 2 0.499 0.5033 0.5023 0.642 Multicollinearity is a potential issue in logistic regres sion th at is d etected b y examining t he standard erro rs of the B co ef ficients. An erro r larger than 2.0 indicates numerical p roblems (Miner et al., 2009 ). No ne of the independent variables in t his anal ysis had a stand ard error lar ger than 2.0. As a resul t, multicollinearity was not a concern in this study . W e present the result s o f the first data set as the a nalysis gave very similar results for t wo data sets. Figure 4 sho ws the buil t mode l for logistic regre ssion. T he s tream presents data operations perfor med on t he ra w data. Each oper ation is represented b y an icon or node. The n odes are linked together in a strea m representing the flow of data through eac h op eration. T he classificatio n table for the best performing l ogistic regre ssion m odel based on the classificatio n c ut-o ff of 0.5 and 95% confidenc e intervals (CI for exp (B)) is shown in T ab le 4. T he classification table register s correct and not c orrect estimations. T able 4. The logistic regression clas sification table Observe d Predicted Flag Correct Per centage No Y es The Last step Flag No 5183 1294 80.0 Y es 2175 2226 50.6 Overall Pe rcentage 68.1 www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 231 T able 5 lists the B parameter, w ald statistics, degree o f freedom, significance levels and odd ratio . The W ald statistics and th e co rresponding signi ficance level test the statistical si gnificance of ea ch coefficient (B). If the W ald statistic is significant ( i.e. less than 0.05), the variable is sig nificant. Exp (B) values in T ab le 5 are the results of testing the risk of customer churn. Exp (B ) is the predicted change in odd s for a unit i ncrease in the corresponding indep endent variable. If odds ratio is less than 1, a decr ease happ ens in odds. Other wise, a n increase happens in od ds. If Odds ratio is 1.0, the independent variable d oes not affect the dependant variable. Figure 4. Data stream for the logistic regre ssion model T able 5. Logistic regre ssion analysis results of t he first step B S.E. W ald df Sig. Exp(B) Step 1 a CNTR_L ENGTH .084 .009 79.717 1 .000 1.087 Gender .050 .050 .972 1 .324 1.051 no_of_repairs -.220 .072 9.282 1 .002 .803 Cost -.002 .002 1.343 1 .247 .998 billing_querie s -.057 .062 .830 1 .362 .945 T enure -.026 .003 97.216 1 .000 .974 Coverage -.289 .052 30.309 1 .000 .749 chang_data_usage -.159 .048 10.781 1 .001 .853 chang_voice_us age -.943 .081 133.980 1 .000 .390 bundle_avr_usag e -.106 .069 2.358 1 .125 .900 Constant .853 .247 1 1.946 1 .001 2.347 All o f the variables with hig her significance level ( more than 0.0 5) are eliminated fro m the m odel. Then, the logistic regressio n model is applied pr ogressively throu gh several iterations to elimin ate all non-significant variables. T able 6 shows all the signi ficant variables, whic h were included in the final logi stic model. T able 6. Logistic regre ssion analysis results of t he last step B S.E. W ald df Sig. Exp(B) Step 6 cntr_length .085 .009 85.360 1 .000 1.089 no_of_repairs -.217 .072 9.088 1 .003 .805 tenure -.027 .003 105.587 1 .000 .974 Coverage -.292 .052 31.086 1 .000 .747 chang_data_usage -.162 .048 1 1.193 1 .001 .850 chang_voice_us age -.973 .079 152.903 1 .000 .378 Constant .735 .217 1 1.461 1 .001 2.085 www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 232 4.5 Decision T r ee Analysis Customer churn behaviour was inv estigated by creati ng a d ecision tree m odel. Three decision tree algorithms (CAR T , C5.0 and CHAID) were utilised to evaluate their performance. T he accuracy rates of t he d ecision tree were 68.57 % with respect to CAR T , 68.83% with respect to the CHAD a nd 70 .25% with respect to the C5. The results of C5 are disc ussed i n more d etail as it has the best pr edictive per for mance. The decision tree co nstructed using C5 model is shown in Figure 7 . The most important independent varia bles are cntr_length, chang_voice_usage and tenure. Ranking independen t variables based on their importance has be en sho wn to be consistent with the logi stic regression results. Figure 5. Data strea m for the decision tree model 5. Model Eva luation and Discussion From a data-anal ysis perspective, t he models presented i n the last sectio n were chosen based on the quality of their o utputs. In the evaluat ion p hase, a perfor mance co mparison of the di ff erent modell ing techniques i s car ried out. A decision o n which modellin g technique to be a dopted for deploying the final custo mer churn model should be taken at the end of this phase. AUC, RO C curve, to p -decile and overall accuracy ar e used here to compare the performance of the modelling techniq ues. Based on the evaluation pr ocesses, t he limit at ions of d ata m ining in customer churn anal ysis are discussed. Figure 6 depicts t he RO C curve for three decision tree s and one logistic regression models. T able 7 presents the three evaluatio n metrics adopted in this stud y . Fig ure 6 and T able 7 p roves tha t decision trees and C5 outper for m the logistic r egression m odels. T hese results ar e consistent with some pr ior literature (e.g., Bakır et al., 2009) , having in mind that achieving fair co mparison is difficult b ecause of the diff erence in the data sets use d. T his study sug gests decision tree analysis as a potentially valuab le to ol for churn prediction based on the evaluation results and the data sets u sed. T able 7 shows that C5 model outperforms all other models, including the Logistic Re g. (2) model devel op ed by a data analytics team working for the mobile operator . C5 m odel in Figure 7 shows a lift value of 1.598 at 30 percentile. T o explain, C5 model ca n capture about 16% o f the churners in the top 30% of a sorted list of churned customers. www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 233 Figure 6. The ROC curve of d ecision trees and logistic regression models T able 7. Mo del evaluation metrics Model Lift (top 30%) Area under curve Overall accur acy C5 1.598 0.763 70.25 CHAD 1.584 0.710 68.83 CAR T 1.409 0.603 68.57 Logistic Reg. 1.218 0.723 68.1 Logistic Reg.(2) 1.4 - 67.5 Figure 7. The C5 decision tree model www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 234 6. Limitations of Da ta Mining in Customer Retention Analysis The results the d ata mining models are reported in this paper prove that prediction accuracy can b e im proved further b y fine tuning model par ameters. Specifically , the experimental res ults show that the pred iction accuracy of the models develop ed has i mproved ac curacy by 1 -3% compared to that achieved b y the mobile o perator . Although data mining is an advanced custo mer churns analysis too ls, the following poten tial limitations to its u se exist: 1) There is no direct v alue explanation of these p atterns. Skilled data mining professionals are needed to provide such an expla nation (Friedl & Bro dley, 1997). 2) Although data mining links between variables and customer behaviours, it fails to identify the causal relationships bet ween variable s (Cook & Weisberg, 1982). D ecision trees can be used in some cases to infer causal relationships. 3) Data mining models have a relatively short expir y life. The mobile m arket faces ne w t echnologies on a daily bases. As a result, histor ical data b ecome less valuable for predictions. 4) Churn relates to co mplex interactions within populatio n. Examining all the factors a ffecting c ustomer ch urn simultaneously and j ointly by building a model is not a pplicable. 5) The level of the a nalysis in t he data mining models d ecreases the ability to captu re the heterogeneity o f the customers. Ne w tech niques are thus required to suppo rt custo mer churn anal ysis to bypass the cons o f the data mining techniques. T able 8 lists the limitations of using data mining tools in customer churn analysis and describ es the general requirements to o vercome the m. Agent -Based Mo delling and Si mulation (ABMS) technique is a mong t he techniques t hat ha ve the potential to advance the custo mer churn anal ysis and o vercome so me of the li mitations of data m ining (P aul & Cadman, 2 002 ). ABMS is an emerging means of si mulating b ehaviour and examining behavioural consequences. ABMS is a computational model for simulating the actions an d interactions of autonomous individuals i n a network, with a view to assessing their e ffects on t he s ys te m as a whole. I n o utline, agents represent custo mers and agent relationship s represent pr ocesses of agent interaction . T able 8. D ata mining limitations in custo mer churn analysis No. Description of the limitation How to ove rcome it 1 Explanation of the v alue and the sig nificance of t he patterns and relationships necessitate skilled data mining profe ssionals Provide an easy and intuitive way to explain the value and the significance of re lationships betw een variable s 2 Low e fficiency of identify ing causal rel ationship Enhance the a bility of identify ing causal re lationships 3 Short expiry date and the depe ndency on data quality Offe r to ols for long-term planning, and minimise the dependency on data quality 4 Neglecting custome rs ’ interactions Include custome r interaction i n customer churn analysis 5 Cannot captures t he heteroge neity of custome rs Offe r tools that take ac count of the hete rogeneity of customers 7. Conclusion Many mobile operator s have used data mining techniques such as regre ssion and decision trees to overcome churn p roble ms. The e xperimental results cond ucted in this stud y s how that the decisio n t ree model outperfor ms all the logistic regression mod els exa mined, including the model develop ed by a d ata analytics team working in the mobile o perator . The decision tree model produces a lift value of 1.598 at 30 percentile compared with 1.4 at the same percentile p roduced by the d ata analytic s tea m. For this pro bl e m and for similar data sets, therefore, the conclusion is that d ecision trees are a preferable technique for investigating custo mer ch urn. Mo re reflectively , although data mining provides ver y useful i nsight into c ustomer churn, limitations ar e app arent in relatio n to significance, ca usality , d ata e volution, model co mplexity an d aggregation. In ord er to overcome t hese li mitations new ways of understanding and exploring data may b e helpful - Agent Based Mo delling Simulation (ABMS) provides one such aven ue for more complex i nvestigations o n customer retention. www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 235 References Abbasi, M. S., T arhini, A., Hassouna, M., & Shah, F . (2015). Soc ial, organizational, demograp hy and individuals ’ technology accep tance behaviour: A co nceptual model. European Scien tific Jou rnal, 1 1 (9), 48 -76. http://eujournal.org/index.php/esj /article/view/5279 Ahn, J . H., Han, S. P. , & Lee, Y . S. (2 006). Customer churn anal ysis: Churn deter minants and mediation e ffects of par tial d efection in the Ko rean mobile teleco mmunications ser vice indu stry . T elecomm unication s Po licy , 30 (10), 552-568. http://dx.do i.org/10.1016/j.telpol.2006.09.006 Alenezi, H., T arhini, A., & Masa ’ deh, R. (2015). Investigating the Stra tegic Relat ionship betw een Infor mation Quality and E-Government Benefits: A Literature Re view . International Review of Social Science s and Humanities , 9 (1 ), 33 - 50 . http://www .irssh.co m/volumes/vol_9_no_ 1_april_2015 Amin, A., Khan, C., Ali, I. , & Anwar, S. (2014). Custo mer churn prediction i n telecommunication indus try: With and without counter-example. Natur e-Inspired Computation and Ma chine Learning (pp. 206 -218). Springer . http://dx.doi.org/10.1007/9 78-3- 319 - 13650 -9_19 A ydin, S., & Ö zer , G. (2005). The anal ysis o f antece dents of custo mer lo yalty in t he Turkish mobile telecommunication market. Eur op ean J ournal of Marketing , 39 ( 7), 910-925. http://dx.doi.org/10.1 108/0309056051 0601833 Bakır , B., Batmaz, İ., Güntürkün, F ., İpekçi, İ., Köksal, G. , & Ö zd emirel, N. (2009). Defect cause modeling with decision tree and regressio n analysis. Relation , 24 (1), 1- 4. Bin, L., Peiji, S., & J uan, L. (20 07) . Customer chu rn prediction ba sed on the decision tree in p ersona l handyphone system service . In 2007 Internatio nal Conference on Ser vice Syste ms and Service Management (pp. 1-5). IEEE. http://dx.doi.org/10.1 109/ICSSSM.2007.428 0145 Burez, J., & V an den P oel, D. (2007). CRM at a p ay -TV co mpany: Using analytical models to r educe c ustomer attrition b y targeted marketing for subscrip tion services. Expert Systems with Applications, 32 (2) , 277-288 . http://dx.doi.org/10.1016/j.eswa.2005 .1 1.037 Chapman, P ., Clinton, J., Kerber, R., Khabaza, T ., Reinartz, T ., Shearer, C. , & Wirth, R. (2000). CRISP -DM 1. 0 Step- by -step data mining guide. Cook, R. D., & W eisberg, S. (1982). Criticism a nd influence anal ysis in regression. Sociological Meth odology , 13 (3), 313-361. http://dx.doi.org /10.230 7/270724 Costea, A., & Ekl und, T . (2003). A two-level a ppr oach to making cla ss predictions, system scien ces . P roceedings of the 36th An nual Hawaii IE EE International Conference ( pp. 1 - 9) . Hawaii. De, A., Subb iah, S., & V e gi, S. R. ( 2014) . Metho ds for managing telecommunica tion service and devices thereof . Google Patents. El -Masri, M., & T ar hini, A. (20 15). A design scien ce approach to gamify educ ation: F r om games to pla tforms . T wenty-Third Europea n Conference on Infor mation Systems (ECIS) (pp. 26 - 38 ). Mü nster , Ger many . Faris, H., Al-S hboul, B. , & Ghatas heh, N. (2014 ). A gen etic pro gramming based frame work for c hurn pr ediction in telec ommunicatio n industry . Computational Collective Intelligence , T echn ologies and App lications (pp. 353 -362). Springer . http://d x.doi.org/10.1007/978 -3- 319 -1 1289- 3_36 Farquad, M. , Ravi, V ., & Raju , S. B. (2014). Churn p rediction u sing co mprehensible s upport vector machine: An analytical CRM application. A pplied Soft Compu ting, 19 (1), 31-40. http://dx.doi.org/10.1016/j.asoc.201 4.01.031 Friedl, M. A., & B rodley , C. E. (1997). Decision tree classification of la nd co ver from remotely sensed data. Remote Sen sing of Environment, 61 (3), 399-409. http://dx.do i.org/10.1016/S0034 -4257(97)00049 -7 Glady , N., Baesens, B ., & Croux, C. (2009 ). Modeling chur n using cus tomer lifeti me value. European Journal of Operationa l Resear ch, 197 (1), 402-411 . http://dx.doi.org/10.1016 /j.ejor .2008.06.027 Han, S. H., Lu, S. X., & Leung, S. C. (201 2). Segm entatio n of telecom customers based on customer value b y decision tree model. Exp ert Systems with Application s, 39 (4), 3964-3973. http://dx.doi.org/10.1016/j.eswa.2011.09.034 Hassouna, M. , & Arzoky , M. (2011). Agent based modelling and simulation: T oward a new mo del o f cu stomer r etention in the mobile market. Proceedings o f the 2011 Summer Computer Simulatio n Co nference ( p p. 30 -35). Society for Modelin g & Simulation I nternational. www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 236 Hu, C., Shu, H., & Qiao, X. ( 2014) . Customer segmen tation model r esear ch based on o r ga nizational customer life cycle in telecom ope rator . 2014 International Conference on Education T echnology and Social Science , Atlantis Press. http://d x.doi.org/10.2991/icetss -14.2014.8 8 Huang, B., Kec hadi, M. T ., & Buckley , B. (2 012). Customer ch urn predictio n in teleco mmunications. Exp ert Systems with Applications, 3 9 (1), 141 4-1425. http://dx.doi.o r g/10.1016 /j.esw a.2011.08.024 Hui, L., Li, S., & Zongfang, Z. (201 3). T he model and em pirical research of application scoring b ased on d ata mining methods. Pr ocedia Comp uter Science, 17 , 91 1-918. http://dx.doi.org/10.1016/j.procs.201 3.05.1 16 Hung, S. Y ., Y en, D. C. , & W ang, H. Y . (2006). Applying data m ining to telec om c hurn management. Expert Systems with Applications, 31 (3), 515 -524. http://dx.doi.o rg/10.1016/j.eswa.2005.09.080 Karahoca, A., Karahoca, D., & A ydin, N. (2007). GSM churn managemen t using an adaptive neu r o- fuzzy infer ence system . The 2007 International Conference o n Intelligent Pervasive Computing, 200 7 , IPC (p p. 323 -326). IEEE. V alencia, Spain. http://dx.doi.org/10.1109/IPC.2007.1 19 Jonathan, L., J anghyuk, L., & Lawrence, F . (2001). T he impact of s witching co sts on the c ustomer satisfaction-loyalty link: Mobile phone service in France. Journal of Service s Ma rketing, 15 (1 ), 3 5-48. http://dx.doi.org/10.1 108/08876040110381463 Liu, D. S., & Fan, S. J. (2014). A mod ified decisio n tree algor ithm based on genetic al gorithm for mobile user classification prob lem. The Scientific W o rld Journal , (1) . http://dx.doi.org/10.1 155/2014/468324 Masa ’ deh, R., T ay eh, M., Al-Jar rah, I. M., & T arhini, A. (2015 ). Accounting vs. market -based measures of firm performance related to information technology inv estments. International Review of Social Scien ce s and Humanities , 9 (1 ), 129-145. http://www .irss h.com/volumes/vol_ 9_no_1_april_20 15 . Masa ’ deh, R., T arhini, A., A l -Dmour , R. H., & Obeidat, B. Y . (2015). Strategic IT -bu siness alignment as managers ’ explorative and explo itative strateg ies. European Scientific Journal, 11 (7), 437- 457 . http://eujournal.org/index.php/esj /article/view/5334 Masa ’ deh, R., Obeidat, B. Y ., Al -Dmour , R. H., & T ar hini, A. (2015 ). Kno wledge manage ment strategie s as intermediary variables betwee n IT -business strategic a lignment and firm p erforma nce. Eur opean Scientific Journal, 1 1 (7), 344- 368 . http://eujournal.org/index.php/esj/article/view/53 McIlroy , A., & Barnett, S. ( 2000). Building c ustomer relationships: Do d iscount cards work? Managing S ervice Quality , 1 0 (6), 347 -355. http://dx.doi.org/10.11 08/0960452001 0351491 Migué is, V . L., V an den P oel, D., Ca manho, A. S. , & Falcã o e Cunha, J. (2012). Mo deling partial custo mer chur n: On the v alue of first product-categor y purchase sequences. Expert Systems with Applications, 39 (12) , 1 1250-1 1256. http://dx.doi.o rg/10.1016/j.eswa.2012.03.073 Nie, G., Rowe, W ., Zhang, L., Tian, Y ., & Shi, Y . (2011). Credit card churn forecasting b y logistic regre ssion and decision tree. Expert Systems with Appli cations, 38 (1 1), 15273-152 85. http://dx.doi.org/10.1016/j.eswa.2011.06.028 Miner , G ., Nisbet, R., & Elder, IV . J. (2009). Handbook of statistical an alysis and d ata minin g applications . Academic Press. Oliveira, V . L. M. (201 2) . An alytical custo mer r elationship manag ement in r etailing supported by d ata mining technique s . Universidade do Porto . Ordenes, F . V ., Theodoulidis, B ., Burton, J., Gruber, T . , & Zaki, M. (2014). Analy zing customer experience feedback using text minin g: A linguistics-based approach. Jou rnal of Service Research , 10 (1), 242- 256 . http://dx.doi.org/10.1 177/1094670514 524625 Orozco, J., T arhini, A., Masa ’ deh, R., & T arhini, T . (2015). A framework o f IS/busine s s alig nment management practices to im prove the d esign o f IT Governance architectures. Internationa l Journal of Business and Managemen t, 10 (4), 1 -12. http://dx.doi.org/10.5539/ijbm.v10n4p1 Orozco, J., El-Masri, M., & T arhini, A., T arhini, T . (2015 ). The Impact of IS-Business A lignment Prac tices on Organizational Choice of IS-Busi ness Alignment Strategies . The 19th Pacific Asia Confer ence on Information S ystems (P ACIS 2015 ), Singapore, 6 -9 July 2015. Paul, T ., & Cadman, R. (200 2). Agent -based modeling of c ustomer behavior in the teleco ms and media marke ts. Info, 4 (1) , 56-63. http://dx.doi.org/10.1 108/1463669021042664 0 Rahman, A., Kumar , S., Fazal, S., & Siddiqui, M. A. (2012). Assessment of land use/land co ver change i n the www . ccsenet.org/ibr International Business Research V ol. 8 , No. 6; 2015 237 north-west district o f Del hi usin g remote sensi ng and GI S techniques. Journal o f the Ind ian Society o f Re mote Sensing, 40 (4), 689 -697. http://dx.doi.org/10.1007/s12524 -011-0165-4 Ren, J., Ning, Z., Kir kness, C. S., Asche, C. V . , & W ang, H. ( 2014). Risk of u sing lo gistic regression to ill ustrate exposure-response relationship of infectious diseases. BMC Infectiou s Diseases, 14 (1), 540. http://dx.doi.org/10.1 186/1471 - 2334 - 14 - 540 Rudin, C., & W agstaff, K. L. (20 14). Machine learning for science a nd society . Ma chine Learnin g, 95 (1), 1-9. http://dx.doi.org/10.1007/s10994 - 013 - 5425 -9 Seo, D., Ranganathan, C., & Babad, Y . (2008). T wo-level model of customer retention in the US mobile telecommunications service market . T e lecommu nication s Policy , 32 (3 ), 18 2-196. http://dx.doi.org/10.1016/j.telpo l.2007.09.004 Shaw , M. J., Subramaniam, C., T an, G. W . & W elge, M. E . (2001). Knowledge management and data mining for marketing. Decision S upport Systems, 31 (1), 127 -137. http://dx.doi.org/10.1016/S0167 -9236( 00)00123 -8 Su , Q. (2014 ). Research of i nstrument to m easure customer perceived service quality o f mobile data services. Internationa l Journal of Networking and V irtual Or ganisations, 14 (1), 176 -196. http://dx.doi.org/10.1504/IJ NVO.2014.065101 Su, Q., Shao, P ., & Y e, Q. (2012). T he analysis on th e det er min ants of mobile VIP customer churn: A logistic regression approach. Internationa l Journal of Services T echnolo gy a nd Management, 18 (1), 61-74. http://dx.doi.org/10.1504/IJ STM.2012.049016 T arhini, A., Ammar , H., T arhini, T. , & Masa ’ deh, R. (201 5). Anal ysis of t he critical success factors for enterprise resource planning imple mentation fro m stake holders ’ persp ective: A systematic revie w . Internationa l Business Research , 8 (4), 25-40. http://dx.doi.org/10.5539 /ibr .v8n4p25 Ta rhini, A., T eo, T ., & T arhini, T . (2015 ). A cross-cultural validit y of the E-learning Accept ance Measure (ElAM) in Lebanon and England: A co nfirmatory factor analys is . Education and Information T ec hnologies . http://dx.doi.org/10.1007/s10639 - 015 - 9381 -9 T arhini, A., Fa kih, M., Arzoky , M. , & T ar hini, T . (2015 ). Designing guideli nes to disco ver causes of dela ys in construction pro jects: The case of Lebanon. I nternationa l Business Resear ch, 8 (6). T arhini, A ., Hasso una, M., Abbasi, M. S. , & Orozco, J. (2015 ). T owards th e ac ceptance of RSS to support learning: An e mpirical study to validate the technology acceptance model in Lebanon. Ele ctr onic Jo urnal of e-Learning , 13 (1), 30- 41 . Retrieved fro m http://www .ej el.org/volume13/issue1 T arhini, A., Scott, M., Shar ma, K. S., & Abbasi, M. S. (2015e). Diff erence s in inte ntion to use educat ional RSS feeds between Lebanese and British S tudents: A multi ‑ group analysis based o n the technology ac ceptance model . Electr onic Journ al of e-Lea rning, 13 (1 ), 14-29. Retrieved fro m http://www .ejel. org/volume13/issue1 T arhini, A., Hone, K., & Li u, X. (20 14a). A cross -cultural exa mination of the impact o f social, organizational and individual factors on technology acceptance bet ween British and Lebanese university stu dents. British Journal o f Educational T ech nology . http://dx.doi.org/10.1 1 1 1/b jet.12169 T arhini, A., Hone, K., & Liu, X. (2014b). T he effects o f ind ividual di f ferences o n e -learning users ’ be haviour i n developing co untries: A structural equatio n model. Comp uters in Human Behavior , 41 , 153-163. http://dx.doi.org/10.1016/j.chb.2 014.09.020 T arhini, A., Ho ne, K., & Liu, X. (2014c). Meas uring the moderating effect of gender and age on e-lear ning acceptance in England: A structural equation modellin g approach for an exten ded technology acceptance model. Journal o f Educationa l Computing Resea r ch, 51 (2), 163 -184. http://dx.doi.org/10.2190/EC.5 1.2.b V uk, M., & Curk, T . ( 2006). ROC curve, lift chart and calib ration plot. Metodolo ˇski Zvezki, 3 (1 ), 89-108. W ei, C. P. , & Chiu, I. (2002) . Turning teleco mmunications call details to churn prediction: A data mining approach. Expert Systems with App lications, 23 ( 2), 103-11 2. http://dx.doi.org/10.1016/S0957 -4174 (02)00030-1 Copyrights Copyright for this article is r etained by the author(s), with first publication rights granted to the journal. This is an open-acces s article distributed u nder the terms and condition s of t he Creative Co mmons Attribution license (http://creativeco mmons.org/licenses/by/3.0/).

Customer Churn in Mobile Markets A Comparison of Techniques

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment