Inclusive Ranking of Indian States via Bayesian Bradley-Terry Model

Arshi Rizvi
Department of Mathematics, Indian Institute of Technology Delhi
maz248174@maths.iitd.ac.in

Rahul Singh
Department of Mathematics, Indian Institute of Technology Delhi
sirahul@iitd.ac.in

Abstract

Evaluating the performance of different administrative regions within a country is crucial for its development and policy formulation. Performance evaluations are mostly based on health, education, per-capita income, awareness, family planning and so on. Ranking regions, and not only evaluating them, is also a crucial step, and various methods have been proposed to date. We aim to provide a ranking system for Indian states that uses a Bayesian approach via the well-known Bradley-Terry model for paired comparisons. The ranking method uses indicators from the NFHS-5 dataset with the per-capita incomes of the states/UTs as prior information, leading to a holistic ranking that not only includes human development factors but also takes into account the economic background of the states. We also carry out the Markov chain Monte Carlo diagnostics required for the reliability of the estimated merits of these states. These merits provide a ranking of the states/UTs and can further be utilised to make informed policy decisions.

Keywords: Bayesian Inference, Bradley-Terry Model, Markov Chain Monte Carlo, Paired Comparison, Ranking.

1 Introduction

India consists of 28 States and 8 Union Territories, each governed separately. The development of the country depends on the development of its states and union territories. Since independence, India has faced, and tried to resolve, regional disparities. States such as Kerala, Maharashtra, Goa, and Tamil Nadu are on track for rapid economic development, whereas states such as Bihar, Uttar Pradesh, and Jharkhand still struggle to meet even basic human needs.
There are various reasons for this inequality, some of which are mentioned in Yadav (2023), including historical and geographical factors, migration, and political instability. From time to time, various indices and measures are released to help states compare their performance across fields such as health, employment, education services, and awareness. Various surveys focusing on these factors are also conducted. One such survey is the National Family Health Survey (NFHS-5), which highlights important aspects of family well-being in the states and UTs. Several studies have been conducted using the NFHS data, including a review of the NFHS data in addressing India's maternal health situation by Raj and Gupta (2022), a study of inter-state disparities in female educational attainment in India by Yadav (2026), and an evaluation of states based on women's empowerment through an empowerment index by Vignitha et al. (2024). These studies aim to help the government and policymakers make well-informed decisions towards sustainable development.

To help states improve their performance, their position in a holistic ranking must first be identified. NITI Aayog, the Government of India's top policy think tank, promotes economic development and cooperative federalism by actively involving States in policymaking. It also promotes competitive federalism by driving better performance among States and UTs through transparent, sector-wise rankings. These rankings, based on objective indicators, motivate healthy competition and encourage both states and districts to improve outcomes. Rankings are thus an integral first step towards development. Oliver (2010) and Purtle et al. (2019) used rankings based on population health to measure state performance and make internal policy decisions.
There are several methods of ranking the regions/states of a country. One is the index method, which aggregates various indicators into one score. For example, HDI uses the geometric mean to calculate the score and faces limitations when one of the variables becomes close to 0. Other methods use PCA, which fails to model non-linear relationships between the indicator variables. Motivated by these limitations, we propose a ranking based on a paired comparison model, namely the Bradley-Terry model. The model is constructed using comparisons between states on different indicator variables in the NFHS data. The Bradley-Terry model assigns merits to a set of states/UTs based on pairwise comparisons of these states, where the logit of the probability of state $s_1$ beating state $s_2$ is given by the difference of their merits. Assuming the merit parameters are random, the model is extended to use prior information about the states via a Bayesian approach. The Bayesian method allows us to work with the distribution of the merit parameters and has major advantages, as discussed in Wainer (2023), which uses the Bayesian Bradley-Terry approach to compare multiple machine learning algorithms on multiple datasets. One advantage is that it provides the probability of one algorithm being better than another (uncertainty in ranking), unlike maximum likelihood estimates (MLE). Caron and Doucet (2012) discussed efficient Bayesian inference for generalised Bradley-Terry models and applied the Bayesian method in a setting where the MLE cannot be found for the original dataset because some items lose all of their comparisons. Seymour et al. (2022) used the Bayesian approach to develop the Bayesian spatial Bradley-Terry (BSBT) model to identify the deprived regions of Tanzania, where the network representation of the country is used as prior information.
In a country like India, economic development is one of the major factors contributing to the growth of states in terms of better living (Raj et al. (2024)). Using per-capita income as prior information for the states should therefore lead to a reliable ranking system.

The main contributions of the paper are as follows:

1. To provide a ranking system based on comparisons across various indicator variables that focus on key aspects of holistic state development, with prior information on the economic disparity between states/UTs.
2. To model the prior covariance matrix as a function of the economic proximity of the states/UTs.
3. To analyse the convergence of the Markov chain obtained via Markov chain Monte Carlo sampling, so that the final estimates of the merit parameters are reliable and lead to a stable ranking.

2 Datasets

2.1 NFHS Dataset

The dataset is collected from the National Family Health Survey (NFHS) conducted in 2019-21 (NFHS-5). The NFHS is one of the world's largest household surveys. It is widely used to track demographic and health indicators across Indian states and districts and was conducted under the Indian Ministry of Health and Family Welfare (MoHFW). It provides information on population, nutrition, health, and family planning for Indian states and union territories (as of March 2017). The information was collected from 636,699 households, 724,115 women, and 101,839 men. The survey includes 131 questions (indicator variables) falling into different groups, including population and household profile, marriage and fertility, infant and child mortality rates, current use of family planning methods, quality of family planning services, maternal and child health, treatment of childhood diseases, nutritional status of adults, women's empowerment, and gender-based violence.
India has several demographic and health databases, but the NFHS is unique in that it provides not only individual-level records but also extensive information on health correlates, including socioeconomic status, access to healthcare, and risk factors.

2.2 Per-Capita Income Dataset

A dataset containing the per-capita income of the states/UTs for the same period (2020-2021) as the NFHS is also used. Per-capita income is an important factor affecting the quality of life in developing countries such as India. Several articles, such as Thorn (1968) and Hordofa (2023), emphasise the use of per-capita income as a measure of economic development. It is not the single best measure for this purpose, but it is often used to compare states in terms of economic, and sometimes human, development. The dataset used in the study is taken from the NITI Aayog website and includes data for only 33 states and UTs, excluding Dadra and Nagar Haveli and Daman and Diu, Lakshadweep, and Ladakh. The entries for these UTs are also removed from the NFHS dataset for the analysis.

Figure 1: Per-capita incomes of states in different income level zones.

3 Modelling Framework

3.1 The standard Bradley-Terry model

The Bradley-Terry model was first introduced by Zermelo (1929) and later popularised by Bradley and Terry (1952). Consider a set of $M$ states or union territories, each assigned a merit parameter, $(\mu_1, \mu_2, \ldots, \mu_M)$. Suppose there are $K$ indicator variables on which the performances of the $M$ states/UTs are analysed. Each pair of states is compared $K$ times, so the total number of comparisons is $K M (M - 1)/2$. Let $X_{ij}$ be a random variable defined as the number of times state $i$ outperforms state $j$. Then $X_{ij} \sim \mathrm{Bin}(K, \pi_{ij})$, where the $X_{ij}$ are assumed to be independent. The term $\pi_{ij}$ denotes the probability that $i$ is preferred over $j$ and is given by the Bradley-Terry model

\[ \pi_{ij} = \frac{\exp(\mu_i)}{\exp(\mu_i) + \exp(\mu_j)}, \tag{1} \]

where $i \neq j$, $1 \leq i, j \leq M$. Since for any $c \in \mathbb{R}$

\[ \pi_{ij} = \frac{\exp(\mu_i + c)}{\exp(\mu_i + c) + \exp(\mu_j + c)} = \frac{\exp(\mu_i)}{\exp(\mu_i) + \exp(\mu_j)}, \]

an identifiability constraint is required to estimate the parameters uniquely. A popular choice is $\sum_{i=1}^{M} \mu_i = 0$. Let $x = (x_{ij})_{1 \leq i, j \leq M}$, where $x_{ij}$ is the number of times $i$ wins over $j$. The likelihood function for the model is then given by

\[ \ell(x \mid \mu) = \prod_{i=1}^{M} \prod_{j < i} \binom{K}{x_{ij}} \pi_{ij}^{x_{ij}} (1 - \pi_{ij})^{K - x_{ij}}, \tag{2} \]

where $\mu = (\mu_1, \mu_2, \ldots, \mu_M)$. To compute the model parameters, maximum likelihood estimates can be obtained using iterative algorithms (Zermelo (1929), Hunter (2004)). Instead of obtaining point estimates, our interest is in obtaining a distribution for the merit parameters using some prior information.

3.2 The Bayesian framework

In the Bayesian framework, the unknown merit parameters $(\mu_1, \ldots, \mu_M)$ are assumed to be random and dependent on each other, so that they share some information prior to their estimation. This shared information may arise from spatial proximity (Seymour et al. (2022)), economic similarity, common climatic or territorial characteristics, or other historical or contextual relationships among the entities.
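The win probability in eq. (1) and the likelihood in eq. (2) can be illustrated with a minimal Python sketch. The function names `bt_win_prob` and `bt_log_likelihood` are hypothetical, and the win matrix is assumed to store $x_{ij}$ in entry `[i, j]` with ties already resolved, so that $x_{ij} + x_{ji} = K$. The log-likelihood is evaluated on the log scale for numerical stability.

```python
import math
import numpy as np

def bt_win_prob(mu):
    """Matrix P with P[i, j] = exp(mu_i) / (exp(mu_i) + exp(mu_j)), as in eq. (1)."""
    mu = np.asarray(mu, dtype=float)
    diff = mu[:, None] - mu[None, :]
    return 1.0 / (1.0 + np.exp(-diff))  # logistic function of mu_i - mu_j

def bt_log_likelihood(mu, wins, K):
    """Logarithm of eq. (2): binomial log-likelihood summed over pairs j < i."""
    mu = np.asarray(mu, dtype=float)
    wins = np.asarray(wins, dtype=float)
    i, j = np.tril_indices(len(mu), k=-1)  # index pairs with j < i
    x = wins[i, j]
    diff = mu[i] - mu[j]
    log_pi = -np.logaddexp(0.0, -diff)           # log pi_ij
    log_one_minus_pi = -np.logaddexp(0.0, diff)  # log (1 - pi_ij)
    log_binom = np.array([
        math.lgamma(K + 1) - math.lgamma(v + 1) - math.lgamma(K - v + 1)
        for v in x
    ])
    return float(np.sum(log_binom + x * log_pi + (K - x) * log_one_minus_pi))
```

Adding the same constant to every merit leaves `bt_log_likelihood` unchanged, which is the non-identifiability that the sum-to-zero constraint removes.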
To model the prior distribution of these parameters in a way that provides flexibility in modelling their joint relationship, we use a zero-mean multivariate Gaussian distribution conditional on $\sum_{i=1}^{M} \mu_i = 0$, or $\mathbf{1}^T \mu = 0$, where $\mathbf{1}$ is a vector of ones. So, we have

\[ (\mu \mid \mathbf{1}^T \mu = 0) \sim N_M\left(0,\; \Sigma - \Sigma \mathbf{1} (\mathbf{1}^T \Sigma \mathbf{1})^{-1} \mathbf{1}^T \Sigma\right), \tag{3} \]

where $\Sigma$ can be modelled using any kernel function. Because we want to model the covariance between the $M$ states/UTs, the Gaussian distribution is a natural choice for the prior. The conditional distribution thus obtained is a degenerate distribution on the space of merit parameters, $\mathbb{R}^{33}$, so the constrained covariance matrix $\Sigma - \Sigma \mathbf{1} (\mathbf{1}^T \Sigma \mathbf{1})^{-1} \mathbf{1}^T \Sigma$ (denoted by $C$) is not invertible. We will use an approximation method later, wherever the inverse is required.

3.2.1 Modelling covariance

We design the covariance matrix $\Sigma$ in a way that assigns higher covariance to economically similar states, such as Telangana and Karnataka, than to states with large economic differences, like Bihar and Sikkim. The economic difference can be measured using the per-capita income of the regions. Per-capita income is the average income earned by a person in a particular region (state, union territory or country) and is one way to measure the economic condition of a region (Roy et al. (2019)). Kernel functions, such as the squared exponential, Matérn, and rational quadratic, can be used with the distance between regions $i$ and $j$ taken as the difference in per-capita incomes of the corresponding regions. Using these kernel functions, the covariance between $i$ and $j$ is defined as

\[ \mathrm{cov}(\mu_i, \mu_j) = \Sigma_{ij} = \alpha^2 \exp\left(-\frac{d_{ij}^2}{l^2}\right), \tag{4} \]

\[ \mathrm{cov}(\mu_i, \mu_j) = \Sigma_{ij} = \alpha^2 \left(1 + \frac{d_{ij}^2}{2 s^2 l^2}\right)^{-s}, \tag{5} \]

where $\alpha^2$ is the variance hyperparameter, $l$ is the characteristic length scale, and $s$ is a positive-valued scale-mixture parameter. Eq. (4) and eq. (5) are defined using the squared exponential and rational quadratic kernel functions, respectively.

The distance $d_{ij}$ is defined using the per-capita incomes $p_i$ of the states or UTs. If taken as $d_{ij} = p_i - p_j$, it gives a nearly diagonal covariance matrix. Additionally, this definition overlooks the fact that the same value of $d_{ij}$ represents a greater level of economic disparity for lower-income states than for higher-income states. For example, the difference of around Rs. 11,000 between the per-capita incomes of Rs. 1,27,550 in Maharashtra and Rs. 1,16,229 in Mizoram is less significant than the similar difference between Rs. 62,944 in Chhattisgarh and Rs. 74,489 in Meghalaya (roughly 10% against 18% in relative terms). To capture the relative income levels, we consider log per-capita incomes and $d_{ij} = \log p_i - \log p_j$. This definition also resolves the issue of the nearly diagonal covariance matrix. The covariance matrix can be obtained using any of the kernel functions with the above definition of $d_{ij}$. The hyperparameters $\alpha$, $l$ and $s$ can be tuned or assumed random, in which case another prior distribution can be placed on them. The covariance matrix obtained, irrespective of the kernel function used, may fail to be positive definite due to numerical approximation. Tuning the hyperparameters can solve this issue.

3.3 Model Fitting

Given the observed comparison data $x$ for each pair of regions, we want to obtain the distribution of the model parameters. We assume $\alpha$ to be random and place a prior on it, so that the posterior distribution is a function of the merit parameters $\mu = (\mu_1, \ldots, \mu_M)$ and the variance hyperparameter $\alpha^2$ of the covariance function. Using Bayes' theorem, the posterior distribution is given by

\[ p(\mu, \alpha^2 \mid x) \propto \ell(x \mid \mu)\, p(\mu \mid \mathbf{1}^T \mu = 0, \alpha^2)\, p(\alpha^2), \tag{6} \]

where the first term on the right-hand side is the Bradley-Terry likelihood in eq. (2), the second term is the Gaussian prior on the merit parameters $\mu$ in eq. (3), and the last term is an independent prior on the variance hyperparameter $\alpha^2$. The hierarchical structure of the model allows for better inferences. The posterior distribution cannot be obtained explicitly, as no closed form of the posterior exists for the chosen prior (the Gaussian distribution is not a conjugate prior for the binomial distribution). So, we use sampling techniques to obtain samples from the distribution $p(\mu, \alpha^2 \mid x)$. Since the distribution is high-dimensional, Markov chain Monte Carlo sampling methods are used to estimate the model parameters $\mu$ and $\alpha^2$.

3.3.1 MCMC sampling

We want to obtain posterior samples of the parameter $\mu$ and the variance hyperparameter $\alpha^2$ as a way to approximate the posterior distribution $p(\mu, \alpha^2 \mid x)$ in eq. (6). The expectation of the posterior distribution can be used to assign values to the merit parameters. MCMC sampling produces correlated samples, and the mean of these posterior samples is used to estimate the expectation of the target distribution. The number of samples depends on the stopping criteria.

The prior on the merit parameters $\mu$ is Gaussian. For the hyperparameter $\alpha^2$, an inverse-gamma prior distribution can be taken, which is conjugate to the normal distribution. The conjugacy allows us to update $\alpha^2$ via Gibbs sampling. For updating the merit parameters $\mu$, a modified random walk method can be used in which the next sample is given by a pCN (preconditioned Crank-Nicolson) update, which is well suited to higher dimensions and Gaussian priors (see Cotter et al. (2013)). Pezzetti et al. (2024) use the pCN algorithm for wide Bayesian neural networks and investigate it for MCMC sampling in high dimensions.
The acceptance probability reduces to the ratio of likelihoods, because the proposal is reversible with respect to the Gaussian prior (Cotter et al. (2013)).

Proposition 1. Let $\mu \in \mathbb{R}^M$ satisfy the linear constraint $\mathbf{1}^T \mu = 0$. Assume a Gaussian prior distribution on $\mu$, restricted to this subspace, and let the likelihood $\ell(x \mid \mu)$ define the negative log-likelihood $\psi(\mu) = -\log \ell(x \mid \mu)$. Then the preconditioned Crank-Nicolson (pCN) algorithm, combined with a Metropolis-Hastings accept-reject step, generates a Markov chain $(\mu_t)_{t \geq 0}$ whose invariant distribution has density

\[ p(\mu \mid x) \propto \exp(-\psi(\mu))\, p(\mu \mid \mathbf{1}^T \mu = 0). \]

The Markov chain thus obtained is ergodic and converges to the stationary distribution $p(\mu \mid x)$. Moreover, the speed of convergence of the pCN algorithm is robust with respect to the dimension $M$.

In simple terms, the above proposition states that the pCN algorithm converges when the likelihood and prior satisfy the given conditions. The proof of convergence is given in Hairer et al. (2014), where it is also shown that the algorithm is robust with respect to the dimension of the samples, making it suitable for vectors of dimension as high as 33.

If we assume an inverse-gamma prior for the variance hyperparameter $\alpha^2$ of the covariance function, then its full conditional distribution has a closed form and can be sampled in a Gibbs update:

\[ (\alpha^2 \mid \mu) \sim \text{Inv-Gamma}\left(\chi + M/2,\; \omega + \mu^T C^{-1} \mu\right), \tag{7} \]

where $\chi$ and $\omega$ are the shape and scale parameters, respectively, of the inverse-gamma distribution, and $C = \Sigma - \Sigma \mathbf{1} (\mathbf{1}^T \Sigma \mathbf{1})^{-1} \mathbf{1}^T \Sigma$ is the constrained prior covariance matrix (eq. (3)). Because $C$ is singular, its inverse is replaced by an approximation, as noted in section 3.2.
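The prior covariance construction and the Gibbs update can be sketched in Python as follows. Several choices here are assumptions of this sketch, not statements of the paper's implementation:

- The function names are hypothetical.
- $\Sigma$ is built with $\alpha^2 = 1$, so that the prior covariance is $\alpha^2 C$.
- The inverse of the singular matrix $C$ is approximated by a Moore-Penrose pseudo-inverse.
- The update uses the textbook conjugate form for a rank-$r$ Gaussian, with shape $\chi + r/2$ and scale $\omega + \tfrac{1}{2}\mu^T C^{+} \mu$. These constants may differ from the parameterisation behind eq. (7).

```python
import numpy as np

def sq_exp_cov(income, length_scale, alpha2=1.0):
    """Squared exponential kernel of eq. (4) with d_ij = log p_i - log p_j."""
    logp = np.log(np.asarray(income, dtype=float))
    d = logp[:, None] - logp[None, :]
    return alpha2 * np.exp(-(d ** 2) / length_scale ** 2)

def constrain_to_sum_zero(sigma):
    """C = Sigma - Sigma 1 (1^T Sigma 1)^{-1} 1^T Sigma, the covariance in eq. (3)."""
    ones = np.ones(sigma.shape[0])
    s1 = sigma @ ones
    return sigma - np.outer(s1, s1) / (ones @ s1)

def sample_alpha2(mu, C, chi, omega, rng):
    """Draw alpha^2 from an inverse-gamma full conditional.

    Assumption: textbook conjugate update for mu ~ N(0, alpha^2 C) with a
    rank-deficient C, using the pseudo-inverse of C.
    """
    C_pinv = np.linalg.pinv(C)
    rank = np.linalg.matrix_rank(C)
    shape = chi + 0.5 * rank
    scale = omega + 0.5 * (mu @ C_pinv @ mu)
    # If G ~ Gamma(shape, 1), then scale / G ~ Inv-Gamma(shape, scale).
    return scale / rng.gamma(shape)
```

By construction, every row of the matrix returned by `constrain_to_sum_zero` sums to zero, so draws from $N(0, \alpha^2 C)$ satisfy the constraint $\mathbf{1}^T \mu = 0$.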
Algorithm 1 MCMC for model parameters
Require: Initial model parameters $\mu^{(0)}$ and $\alpha^{2(0)}$, tuning parameter $\beta \in (0, 1)$, data $x$ through the win matrix, prior covariance matrix $C$, hyperparameters $(\omega, \chi)$.
1: for $t = 1, 2, \ldots$ do
2:   (Gibbs step) Sample the variance parameter $\alpha^{2(t)}$ from its inverse-gamma full conditional, conditioned on $\mu^{(t-1)}$.
3:   (pCN proposal) Propose $\mu^{*} = \sqrt{1 - \beta^2}\, \mu^{(t-1)} + \beta \xi$, with $\xi \sim N(0, \alpha^{2(t)} C)$.
4:   (Likelihood evaluation) Compute the log-likelihood at $\mu^{*}$.
5:   (Metropolis-Hastings step) Set $\mu^{(t)} = \mu^{*}$ with probability $\min\left(1, \exp\left(\log \ell(x \mid \mu^{*}) - \log \ell(x \mid \mu^{(t-1)})\right)\right)$; otherwise set $\mu^{(t)} = \mu^{(t-1)}$.
6: end for (with a stopping criterion).
7: return posterior samples $\{\mu^{(t)}\}$ (and $\{\alpha^{2(t)}\}$ if sampled).

Running the algorithm for a finite number of iterations introduces sampling error, which needs to be assessed and on which stopping rules can be based.

3.3.2 Stopping rules/MCMC diagnostics

Although the pCN algorithm converges in theory as the number of samples tends to infinity, the chain can only be run for a finite number of samples, which introduces a sampling error between the estimated value and the true value of a quantity. This unknown Monte Carlo error must be estimated to assess the quality of the estimate. Here, we are interested in the expectation of the posterior distribution, estimated by the average of the samples.

Let $\{\mu_t\}_{t=1}^{N}$ be an ergodic Markov chain with stationary distribution $p$ on $\mathbb{R}^M$, with mean $E_p[\mu_t]$ and covariance $\Sigma_\mu = \mathrm{Cov}(\mu_t)$. Let $L_k$ be the auto-covariance at lag $k$, $L_k = \mathrm{Cov}(\mu_t, \mu_{t+k})$. Then the asymptotic or long-run covariance matrix is given by

\[ L = \Sigma_\mu + \sum_{k=1}^{\infty} (L_k + L_k^T), \tag{8} \]

so that, for i.i.d. samples, $L = \Sigma_\mu$. The correlation in the chain reduces the information carried by the samples.
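Returning to Algorithm 1, its loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the reported results:

- The names `bt_log_lik` and `pcn_sampler` are hypothetical.
- $C$ is factored by an eigendecomposition because it is singular.
- The Gibbs step uses the textbook conjugate inverse-gamma update with a pseudo-inverse of $C$, with shape $\chi + r/2$ and scale $\omega + \tfrac{1}{2}\mu^T C^{+} \mu$.
- The binomial coefficient is dropped from the log-likelihood because it cancels in the acceptance ratio.

```python
import numpy as np

def bt_log_lik(mu, wins, K):
    """Bradley-Terry log-likelihood up to an additive constant (pairs j < i)."""
    i, j = np.tril_indices(len(mu), k=-1)
    x = wins[i, j]
    diff = mu[i] - mu[j]
    return float(np.sum(-x * np.logaddexp(0.0, -diff)
                        - (K - x) * np.logaddexp(0.0, diff)))

def pcn_sampler(log_lik, C, mu0, beta, n_iter, chi=2.0, omega=1.0, seed=0):
    """Gibbs step for alpha^2 followed by a pCN proposal and an MH accept-reject step."""
    rng = np.random.default_rng(seed)
    M = len(mu0)
    # Factor C = A A^T through its eigendecomposition (C is rank deficient).
    evals, evecs = np.linalg.eigh(C)
    keep = evals > 1e-10 * evals.max()
    evals = np.where(keep, evals, 0.0)
    A = evecs * np.sqrt(evals)
    C_pinv = (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T
    rank = int(keep.sum())

    mu = np.array(mu0, dtype=float)
    ll = log_lik(mu)
    samples = np.empty((n_iter, M))
    alpha2s = np.empty(n_iter)
    accepted = 0
    for t in range(n_iter):
        # Gibbs step: inverse-gamma draw (assumed textbook conjugate form).
        scale = omega + 0.5 * (mu @ C_pinv @ mu)
        alpha2 = scale / rng.gamma(chi + 0.5 * rank)
        # pCN proposal: sqrt(1 - beta^2) * mu + beta * xi, xi ~ N(0, alpha2 * C).
        xi = np.sqrt(alpha2) * (A @ rng.standard_normal(M))
        prop = np.sqrt(1.0 - beta ** 2) * mu + beta * xi
        ll_prop = log_lik(prop)
        # Accept with probability min(1, exp(ll_prop - ll)).
        if np.log(rng.uniform()) < ll_prop - ll:
            mu, ll = prop, ll_prop
            accepted += 1
        samples[t] = mu
        alpha2s[t] = alpha2
    return samples, alpha2s, accepted / n_iter
```

A call such as `pcn_sampler(lambda m: bt_log_lik(m, wins, K), C, np.zeros(M), beta=0.009, n_iter=...)` returns the merit samples, the $\alpha^2$ samples and the empirical acceptance rate. If `mu0` sums to zero, every sample does too, since the proposals stay in the range of $C$.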
A metric called effective sample size (ESS) is therefore used to quantify the independent information carried by the correlated samples. The connection between ESS, variance and information follows from the central limit theorem. Suppose $\{\mu_t\}_{t \geq 1}$ are samples generated by algorithm 1. Properties of the MCMC average $\bar{\mu}_N = \sum_{t=1}^{N} \mu_t / N$ are well documented, e.g., Geyer (2011). For example, if the Markov chain is Harris ergodic with invariant distribution $p$, and $E_p[\|\mu_t\|^2] < \infty$, then $\bar{\mu}_N \to E_p[\mu_t]$ almost surely as $N \to \infty$. Moreover, if there exists $\delta > 0$ such that $E_p[\|\mu_t\|^{2+\delta}] < \infty$ and the Markov chain is polynomially ergodic of sufficiently high order, then

\[ \sqrt{N}\left(\bar{\mu}_N - E_p[\mu_t]\right) \xrightarrow{d} N(0, L), \]

where $L$ is the asymptotic covariance matrix defined in eq. (8). For full-rank $\Sigma_\mu$ and $L$, the ESS is defined in Vats et al. (2019) as

\[ \mathrm{ESS} = N \left[\frac{\det(\Sigma_\mu)}{\det(L)}\right]^{1/M}. \tag{9} \]

In our experiments, this definition degenerates because near-deterministic coordinates, arising from constrained priors or hierarchical shrinkage, result in singular or ill-conditioned covariance estimates. Numerical approximation could be another reason. When $\Sigma_\mu$ is a low-rank matrix, say $r = \mathrm{rank}(\Sigma_\mu) < M$, then $\det(\Sigma_\mu) = 0$ and the ESS becomes undefined. In that case, let $\Sigma_\mu = U D U^T$, where $D = \mathrm{diag}(e_1, \ldots, e_r, 0, \ldots, 0)$. If $U_r$ collects all the eigenvectors corresponding to positive eigenvalues, then define $S = \mathrm{Span}(U_r)$ and denote $Q^S = U_r^T Q U_r$ for a symmetric matrix $Q$. The multivariate ESS is defined in Mukherjee (2021) as

\[ \mathrm{ESS} = N \left[\frac{\mathrm{pdet}(\Sigma_\mu^S)}{\mathrm{pdet}(L^S)}\right]^{1/r}, \tag{10} \]

where $\mathrm{pdet}(Q) = \prod_{e_i(Q) > 0} e_i(Q)$ is the pseudo-determinant. To calculate the ESS, the quantities in the numerator and denominator must first be estimated.
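Given estimates of $\Sigma_\mu$ and $L$, the projected ESS can be evaluated as in the following Python sketch. The function name `multi_ess` and the relative eigenvalue threshold `tol` are assumptions of this illustration. The sketch also assumes that the projected matrices are positive definite on the retained subspace, so their pseudo-determinants equal ordinary determinants, which are computed on the log scale. The exponent is one over the retained rank.

```python
import numpy as np

def multi_ess(n, sigma, L, tol=1e-10):
    """Multivariate ESS on the subspace spanned by the leading eigenvectors of sigma.

    n     : number of MCMC samples
    sigma : (M, M) estimate of the stationary covariance
    L     : (M, M) estimate of the asymptotic (long-run) covariance
    tol   : relative threshold below which eigenvalues are treated as zero
    """
    evals, U = np.linalg.eigh(sigma)
    keep = evals > tol * evals.max()
    U_r = U[:, keep]                 # eigenvectors with non-negligible eigenvalues
    r = U_r.shape[1]
    sigma_S = U_r.T @ sigma @ U_r    # projection Q^S = U_r^T Q U_r
    L_S = U_r.T @ L @ U_r
    # Log-determinants avoid overflow or underflow for larger r.
    _, logdet_sigma = np.linalg.slogdet(sigma_S)
    _, logdet_L = np.linalg.slogdet(L_S)
    return n * np.exp((logdet_sigma - logdet_L) / r)
```

For uncorrelated samples, where `L` equals `sigma`, the function returns `n`. Doubling `L` halves the result, matching the intuition that positive autocorrelation reduces the effective number of samples.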
The covariance matrix $\Sigma_\mu$ is estimated by the sample covariance matrix

\[ \hat{\Sigma}_\mu = \frac{1}{N-1} \sum_{t=1}^{N} (\mu_t - \bar{\mu}_N)(\mu_t - \bar{\mu}_N)^T. \]

The asymptotic covariance $L$ can be estimated using the multivariate spectral variance estimator

\[ \hat{L} = \hat{\Sigma}_\mu + \sum_{k=1}^{b-1} w_k \left(\hat{L}_k + \hat{L}_k^T\right), \qquad \hat{L}_k = (N-k)^{-1} \sum_{t=1}^{N-k} (\mu_t - \bar{\mu}_N)(\mu_{t+k} - \bar{\mu}_N)^T, \]

where $k \geq 1$ and $b$ is the batch size or bandwidth after which there is no significant autocorrelation in the chain. It is chosen such that $b = b_N$ with $b_N \to \infty$ and $b/N \to 0$. The lag window $w_k$ assigns weights to the lags. In practice, the eigenvalues may not be exactly equal to 0 but only very close to 0, so the rank $r$ is estimated by $\hat{r}$ by thresholding the eigenvalues, and $U_{\hat{r}}$ is obtained by retaining the eigenvectors whose eigenvalues are greater than the threshold. Hence $\hat{\Sigma}_\mu^{\hat{S}} = U_{\hat{r}}^T \hat{\Sigma}_\mu U_{\hat{r}}$ and $\hat{L}^{\hat{S}} = U_{\hat{r}}^T \hat{L} U_{\hat{r}}$. For $N$ dependent samples obtained from algorithm 1, we calculate $\hat{\Sigma}_\mu$ and $\hat{L}$. If $\hat{S}$ is the span of the eigenvectors whose eigenvalues are greater than or equal to the threshold and $\hat{r} = \dim(\hat{S}) \geq 1$, then eq. (10) is estimated by

\[ \widehat{\mathrm{ESS}} = N \left[\frac{\mathrm{pdet}(\hat{\Sigma}_\mu^{\hat{S}})}{\mathrm{pdet}(\hat{L}^{\hat{S}})}\right]^{1/\hat{r}}. \tag{11} \]

Another diagnostic is the acceptance rate. For the pCN algorithm, as mentioned above, the acceptance ratio is given by the ratio of likelihoods. The acceptance rate for a multivariate chain is expected to be in the range of 20% to 30% (Cotter et al. (2013)). A high acceptance rate can indicate that the chain is taking very small steps, remains stuck in one region, and is not mixing properly. Although a high ESS is the primary goal, we also aim for an acceptance rate of at least 20% by tuning the hyperparameters and the step size and by running the chain for longer.

Traceplots for the merit parameters $\mu = (\mu_1, \ldots, \mu_M)$ help in deciding the burn-in period and in checking that the chain explores the posterior distribution properly. They are not a strong or sufficient diagnostic tool for MCMC on their own, but they help us understand the general behaviour of the Markov chain, such as whether the chain is stuck or shows an upward or downward trend. These signs help to identify problems in the MCMC sampling algorithm. Another frequently used plot is the autocorrelation function against lag, which shows the autocorrelation in the chain at different lags. For a well-mixing chain, the autocorrelation decreases as the lag increases. We can also observe the swaps in the positions of states in the ranking at each iteration using a ranking distance such as the Kendall tau distance.

4 Implementation and Results

The implementation is done in Python in two parts: first, the NFHS data is used to construct the win matrix, and second, per-capita income is used to generate a covariance matrix for the prior distribution. The NFHS data are contained in an Excel sheet, with columns providing data on various indicators for each state. In total, there are 131 indicator variables for each state. Some indicators are removed because most states/UTs have null values for them. Among the states/UTs, Chandigarh has many missing entries. The analysis is therefore conducted in two ways: by removing indicators with null values (which yields a smaller set of indicators) and by removing Chandigarh, so that the remaining states/UTs can be analysed using more indicators. Considering all the states and removing the columns with missing entries, we have 116 columns from which the win matrix is obtained. This win matrix is then used to calculate the likelihood in the MCMC algorithm.
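The construction of the win matrix from the indicator table can be sketched as follows. The function names are hypothetical, and the sketch makes two assumptions the text does not settle: every indicator has already been oriented so that a larger value means better performance (for example, by negating mortality rates), and a tie on an indicator counts as a win for neither state.

```python
import numpy as np

def drop_incomplete_indicators(values):
    """Remove indicator columns that have a missing (NaN) entry for any state."""
    values = np.asarray(values, dtype=float)
    return values[:, ~np.isnan(values).any(axis=0)]

def win_matrix(values):
    """wins[i, j] = number of indicators on which state i strictly exceeds state j.

    values : (M states, K indicators) array, larger = better (assumption).
    """
    values = np.asarray(values, dtype=float)
    # Compare every pair of states on every indicator via broadcasting.
    return (values[:, None, :] > values[None, :, :]).sum(axis=2)
```

With 33 states and 116 retained indicators this yields a 33 x 33 integer matrix with a zero diagonal. If ties occur, `wins[i, j] + wins[j, i]` falls below the number of indicators for that pair, and the binomial model would need a tie-handling rule.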
The covariance matrix for the prior distribution can be obtained using any of the kernel functions (squared exponential, Matérn, rational quadratic, exponential, etc.), with the distance defined in section 3.2.1. We set the length-scale hyperparameter $l = 0.09$ for the squared exponential kernel and the inverse-gamma hyperparameters $\chi$ and $\omega$ to 2 and 1, respectively. The step size $\beta$ is set to 0.009. Different values of $\chi$ and $\omega$ make the samples more or less correlated, depending on whether the sampled values of $\alpha^2$ are high or low. The length-scale parameter $l$ is chosen so that the quadratic form in eq. (7) becomes stable and fluctuates around a particular value. Decreasing $\beta$ leads to a sharp increase in the acceptance rate, signifying less exploration of the distribution by the chain. For instance, with the other parameter values unchanged, $\beta = 0.004$ gives a 60% acceptance rate with a multivariate ESS (defined in eq. (11)) of around 19347, whereas $\beta = 0.009$ gives an acceptance rate of around 28% with a multivariate ESS of around 20002. The traceplots for the parameters $\mu$ and $\alpha^2$ and for the quadratic form are given in the supplementary material. After inspecting the traceplots for all the parameters $\mu = (\mu_1, \ldots, \mu_M)$ and $\alpha^2$, we run the algorithm for 30,00,000 iterations and discard the first 10,00,000 as burn-in.

Figure 2: Results when 116 indicators are included. (a) Heatmap for values of merit parameters (state-wise heat map of India). (b) Histogram for $\alpha^2$ parameters (posterior density).
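Discarding the burn-in samples and ranking the states by posterior mean can be sketched in Python as follows. The function names are hypothetical. The second helper estimates the posterior probability that one state has a higher merit than another, which is the kind of ranking uncertainty that a Bayesian fit makes available.

```python
import numpy as np

def rank_by_posterior_mean(samples, burn_in, names):
    """Return (name, posterior mean merit) pairs sorted from highest to lowest merit."""
    kept = np.asarray(samples, dtype=float)[burn_in:]  # drop burn-in iterations
    post_mean = kept.mean(axis=0)
    order = np.argsort(-post_mean, kind="stable")
    return [(names[i], float(post_mean[i])) for i in order]

def prob_outranks(samples, burn_in, i, j):
    """Fraction of retained samples in which state i has a larger merit than state j."""
    kept = np.asarray(samples, dtype=float)[burn_in:]
    return float(np.mean(kept[:, i] > kept[:, j]))
```

When two posterior means are close, `prob_outranks` will be near 0.5, signalling that the corresponding rank positions could easily swap.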
Figure 3: Change in ranking shows the effect of likelihood on posterior based on NFHS data. The plot compares the income rank and the NFHS-based rank of each state. States on the horizontal axis: Goa, Sikkim, Delhi, Chandigarh, Gujarat, Haryana, Karnataka, Andaman & Nicobar, Tamil Nadu, Telangana, Puducherry, Kerala, Himachal Pradesh, Uttarakhand, Maharashtra, Mizoram, Punjab, Andhra Pradesh, Arunachal Pradesh, Tripura, Chhattisgarh, Rajasthan, Odisha, Nagaland, Jammu & Kashmir, Assam, West Bengal, Meghalaya, Madhya Pradesh, Jharkhand, Manipur, Uttar Pradesh, Bihar.

Next, we remove Chandigarh from the dataset to obtain the estimates, thereby increasing the number of indicators and the number of comparisons between states. The total number of indicators from the NFHS dataset is now 125.

Figure 4: Histogram for variance hyperparameter $\alpha^2$ (when Chandigarh is removed).

Figure 5: Change in ranking shows the effect of likelihood on posterior based on NFHS data without Chandigarh. The plot compares the income rank and the NFHS-based rank for the same states as in Figure 3, with Chandigarh omitted.

No significant differences are observed in the rankings or plots after including the additional indicators (by removing Chandigarh). This may be because the contribution of the original 116 indicator variables dominates that of the additional 9 variables in determining the final ranking.

We now include only states with lower income levels and obtain the ranking based on the initial 116 indicator variables. Taking $\beta = 0.009$ gives an acceptance rate of approximately 60%, so we increase the step size to 0.02, which gives a 27% acceptance rate with an ESS of 33458 (keeping the values of all the hyperparameters the same as before).

Figure 6: Results when only low-level income states are included. (a) Heatmap for values of merit parameters. (b) Histogram for variance hyperparameter $\alpha^2$.

Figure 7: Change in ranking shows the effect of likelihood on posterior based on NFHS data for low-level incomes. The plot compares the income rank and the NFHS-based rank. States on the horizontal axis: Tripura, Chhattisgarh, Rajasthan, Odisha, Nagaland, Jammu & Kashmir, Assam, West Bengal, Meghalaya, Madhya Pradesh, Jharkhand, Manipur, Uttar Pradesh, Bihar.

If we compare this ranking with the one that includes all 33 states, we observe major swaps in the positions of states such as Uttar Pradesh and Nagaland. Since some states with higher income levels are removed, the covariance structure changes. The likelihood also changes, since the comparisons involving the higher-income states are removed. As the differences between the values of the merit parameters are very small and the ranking depends only on the ordering of the posterior means, even small shifts in the posterior distribution can lead to noticeable swaps in rank positions. Specifically, the gaps between the estimates of the middle and lower ranked states in the overall ranking are very small compared with those of the other states, which makes swaps in the positions of these states likely when only lower-income states are analysed. Another possible reason is that states which performed well when compared with high-income states now show a decline in performance when they are compared only with low-income states.
This gives a major policy-making insight as well.

Next, we perform the same kind of analysis on the states that do not have exceptionally high incomes (that is, excluding Goa, Delhi, Sikkim, and Chandigarh; see Section 2.2). Keeping the same values of all parameters, we set $\beta = 0.01$ as a trade-off between acceptance rate and ESS. The acceptance rate is 27% with an ESS of 21327.

[Figure 8 plots: (a) "State-wise Heat Map of India", colour bar labelled Indicator Value: heatmap for values of merit parameters; (b) "Posterior Density (Histogram)", axes $\alpha^2$ and Density: histogram for the variance hyperparameter $\alpha^2$.]

Figure 8: When outliers (extremely high-level income states) are removed.

[Figure 9 plot: "Comparison of Income Rank and NFHS-Based Rank", showing the rank (0 to 30) of each state/UT other than the four high-income outliers; series: Income Rank and NFHS-based Rank.]

Figure 9: Change in ranking shows the effect of the likelihood on the posterior based on NFHS data without outliers.

[Figure 10 plots: (a) "State-wise Heat Map of India", colour bar labelled Indicator Value: heatmap for values of merit parameters; (b) "Posterior Density (Histogram)", axes $\alpha^2$ and Density: histogram for the variance hyperparameter $\alpha^2$.]

Figure 10: When only middle-level income states are included.

The relative positions of most states remain unchanged after excluding high-income states. Only minor swaps are observed among adjacent states such as Assam and Uttar Pradesh, Meghalaya and Arunachal Pradesh, and Mizoram and Karnataka. These changes might be because of small shifts in posterior estimates after removing extreme high-income states.
Importantly, no substantial rank reversals are observed, indicating that the overall ordinal structure of the rankings is robust and not driven by outliers such as Goa, Delhi, Chandigarh, and Sikkim.

Lastly, we perform the analysis on the states with middle-level incomes. With the same parameter values and $\beta = 0.009$, the acceptance rate becomes nearly 78%, so we increase the step size to $\beta = 0.04$ to obtain an acceptance rate of 27% with an ESS of 26312.

[Figure 11 plot: "Comparison of Income Rank and NFHS-Based Rank", showing the rank (0 to 14) of each middle-income state/UT; series: Income Rank and NFHS-based Rank.]

Figure 11: Change in ranking shows the effect of the likelihood on the posterior based on NFHS data for middle-level income states.

A major change is observed in the position of Telangana. Its rank improves when the analysis is restricted to middle-income states compared to the overall ranking. However, its position remains unchanged when both low- and middle-income states are included. This suggests that the inclusion of low-income states influences its relative position, likely because it may not have performed as strongly as some low-income states on certain NFHS indicators.
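The step-size tuning reported in the subset analyses (a small $\beta$ giving a high acceptance rate, a larger $\beta$ bringing it down towards 27%) can be illustrated with a minimal sketch of a pCN sampler for a Bradley-Terry likelihood. The sketch is not the implementation used in this paper: it assumes a zero-mean Gaussian prior with identity covariance in place of the income-based prior, keeps the variance hyperparameter fixed, and uses a synthetic win matrix. The names `neg_log_lik` and `pcn` are hypothetical.

```python
import numpy as np


def neg_log_lik(mu, wins, K):
    """Bradley-Terry negative log-likelihood, up to an additive constant.

    wins[i, j] is the number of times item i beat item j out of K comparisons.
    """
    iu, ju = np.triu_indices(len(mu), k=1)
    log_norm = np.logaddexp(mu[iu], mu[ju])  # log(exp(mu_i) + exp(mu_j))
    return np.sum(K * log_norm - wins[iu, ju] * mu[iu] - (K - wins[iu, ju]) * mu[ju])


def pcn(wins, K, chol_C, beta, n_iter, rng):
    """pCN sampler for a N(0, C) prior; chol_C is a Cholesky factor of C."""
    M = wins.shape[0]
    mu = np.zeros(M)
    psi = neg_log_lik(mu, wins, K)
    samples = np.empty((n_iter, M))
    accepted = 0
    for t in range(n_iter):
        xi = chol_C @ rng.standard_normal(M)            # draw from the prior
        prop = np.sqrt(1.0 - beta**2) * mu + beta * xi  # pCN proposal
        psi_prop = neg_log_lik(prop, wins, K)
        # Accept with probability min(1, exp(psi(mu) - psi(prop))).
        if np.log(rng.uniform()) < psi - psi_prop:
            mu, psi = prop, psi_prop
            accepted += 1
        samples[t] = mu
    return samples, accepted / n_iter


# Synthetic data (assumption): M items, K comparisons per pair.
rng = np.random.default_rng(0)
M, K = 8, 50
true_mu = rng.normal(0.0, 0.5, M)
p = 1.0 / (1.0 + np.exp(-(true_mu[:, None] - true_mu[None, :])))
iu, ju = np.triu_indices(M, k=1)
wins = np.zeros((M, M))
wins[iu, ju] = rng.binomial(K, p[iu, ju])
wins[ju, iu] = K - wins[iu, ju]

chol_C = np.eye(M)  # identity prior covariance (placeholder assumption)
rates = {
    beta: pcn(wins, K, chol_C, beta, 2000, np.random.default_rng(1))[1]
    for beta in (0.01, 0.05, 0.3)
}
for beta, rate in rates.items():
    print(f"beta={beta}: acceptance rate {rate:.2f}")
```

Under these assumptions, the smallest step size accepts most proposals but moves slowly, while the largest accepts few; an intermediate value balances acceptance rate against ESS, which is the trade-off referred to in the analyses above.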
We have also implemented the Bradley-Terry model on the NFHS dataset to obtain a ranking using point estimates given by the following iteration scheme (Newman (2023)):

$$\pi_i' = \frac{\sum_{j=1}^{M} x_{ij}\,\pi_j / (\pi_i + \pi_j)}{\sum_{j=1}^{M} x_{ji} / (\pi_i + \pi_j)}$$

[Figure 12 plot: "Comparison of ranks obtained by Bradley-Terry and Bayesian Bradley-Terry models", showing the rank (0 to 30) of each state/UT; series: Bradley-Terry and Bayesian Bradley-Terry.]

Figure 12: Rankings obtained via the Bradley-Terry and Bayesian Bradley-Terry models, respectively.

A few major changes in state positions are observed when comparing the classical Bradley-Terry ranking with the Bayesian ranking. The states at both extremes of the ranking remain the same under both approaches. Only minor shifts are seen among states in the middle of the distribution. This suggests that the Bayesian ranking is largely driven by the likelihood constructed from the NFHS indicators, with the prior (based on per capita income) having limited influence on the overall ranking.

5 Simulation

We would sample the merit parameters from the prior distribution, assuming different values of the length scales, and use them to simulate the win matrix, taking the total number of comparisons to be 100. The win matrix is then used to estimate the parameters using our model with the same values of parameters, and an estimate is also obtained by the classical Bradley-Terry model. We could compare the respective estimates of merit parameters with their true values.

6 Discussion

We know that data collection is a tedious process and requires time, resources, and costs.
In such a situation, the only resort is to make the best use of available data. In this paper, we have devised a way to use the NFHS data to obtain a ranking along with the prior information of the states' economic condition via per-capita income, which becomes possible through the Bradley-Terry model via the Bayesian route. The Bradley-Terry model is used with the Bayesian approach to obtain the ranking of 33 Indian states/UTs. The sampling becomes easier in dimensions as high as 33 by using a preconditioned Crank-Nicolson (pCN) algorithm, which is effective for high-dimensional problems. We analysed the multivariate samples using various MCMC diagnostic tools, such as trace plots, effective sample size, and acceptance ratio, and observed the stability of the ranking to provide reliable estimates of the merit parameters. The various diagnostics showed that the chain has converged, thus making the posterior estimates reliable for estimating the merit parameters. The ranking thus obtained gives policy makers an opportunity to analyse the states' condition on a scale based on important factors such as family health, awareness, and education, and can be effectively used to suggest policies and measures.

Though we have used per-capita income as prior information, the ranking is mainly influenced by the NFHS indicators, which is evident after comparing the NFHS-based ranking with rankings based solely on per-capita income and on the classical Bradley-Terry model. The rankings obtained by taking subsets of states (by categorising them into low-level, middle-level, and high-level incomes) provide insights into the stability of the positions of some of the states, while some states show major swaps in their positions. Moreover, the ranking method can be reproduced on similar datasets dealing with much larger and more varied indicators and can easily be updated when new datasets become available.
Apart from this, rankings specific to a particular domain, such as the health sector, public education, or the prevalence of diseases, can be obtained, which can help the administration take necessary actions in that domain. Priors based on different information, such as geographical, historical, or political information, can also be used for ranking states/UTs, depending on the availability of data. The ranking method used here assumes equal weightage for all the indicator variables, but that may not be appropriate in practical scenarios; for example, the use of tobacco/alcohol in a region should not contribute equally with factors related to education. Giving proper weightage to different indicators may lead to a more realistic ranking system. In addition, data for subregions can also be used to determine the position/performance of a region by making suitable modifications to the current ranking method.

Acknowledgments

The work of Rahul Singh was partially supported by the New Faculty Seed Grant at the Indian Institute of Technology Delhi, India.

References

Bradley, R. A. and Terry, M. E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345.

Caron, F. and Doucet, A. (2012). Efficient Bayesian inference for generalized Bradley–Terry models. Journal of Computational and Graphical Statistics, 21(1):174–196.

Cotter, S. L., Roberts, G. O., Stuart, A. M., and White, D. (2013). MCMC methods for functions: modifying old algorithms to make them faster. Statistical Science, pages 424–446.

Geyer, C. J. (2011). Introduction to Markov chain Monte Carlo. Handbook of Markov Chain Monte Carlo, 20116022(45):22.

Hairer, M., Stuart, A. M., and Vollmer, S. J. (2014). Spectral gaps for a Metropolis–Hastings algorithm in infinite dimensions.

Hordofa, D. F. (2023).
The moderating effect of income inequality on the relationship between economic growth and political economy, human capital, innovation, and saving channels in Ethiopia. Discover Global Society, 1(1):21.

Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models. The Annals of Statistics, 32(1):384–406.

Mukherjee, S. (2021). mvess-z: A robust multivariate effective sample size for degenerate and quantized Markov chains. Bulletin of Computer and Data Sciences, 2(1):3–15.

Newman, M. E. (2023). Efficient computation of rankings from pairwise comparisons. Journal of Machine Learning Research, 24(238):1–25.

Oliver, T. R. (2010). Population health rankings as policy indicators and performance measures. Preventing Chronic Disease, 7(5):A101.

Pezzetti, L., Favaro, S., and Peluchetti, S. (2024). Function-space MCMC for Bayesian wide neural networks. arXiv preprint arXiv:2408.14325.

Purtle, J., Peters, R., Kolker, J., and Diez Roux, A. V. (2019). Uses of population health rankings in local policy contexts: A multisite case study. Medical Care Research and Review, 76(4):478–496.

Raj, J., Gupta, V., and Shrawan, A. (2024). Economic growth and human development in India–are states converging? Indian Public Policy Review, 5(3 (May-Jun)):94–137.

Raj, P. and Gupta, N. (2022). A review of the National Family Health Survey data in addressing India's maternal health situation. Public Health Reviews, 43:1604825.

Roy, S., Sen, C., and Sanyal, R. (2019). An empirical inquiry into per capita convergence of Indian states. Global Journal of Emerging Market Economies, 11(3):232–247.

Seymour, R. G., Sirl, D., Preston, S. P., Dryden, I. L., Ellis, M. J., Perrat, B., and Goulding, J. (2022). The Bayesian spatial Bradley–Terry model: Urban deprivation modelling in Tanzania. Journal of the Royal Statistical Society Series C: Applied Statistics, 71(2):288–308.

Thorn, R. S. (1968).
Per capita income as a measure of economic development. Zeitschrift für Nationalökonomie/Journal of Economics, (H. 2):206–216.

Vats, D., Flegal, J. M., and Jones, G. L. (2019). Multivariate output analysis for Markov chain Monte Carlo. Biometrika, 106(2):321–337.

Vignitha, B., Debnath, A., Sai, T. K., Charag, S., PVS, A., Sai, T., et al. (2024). Women's empowerment in India: State-wise insights from the National Family Health Survey 5. Cureus, 16(7).

Wainer, J. (2023). A Bayesian Bradley-Terry model to compare multiple ML algorithms on multiple data sets. Journal of Machine Learning Research, 24(341):1–34.

Yadav, B. L. (2026). Inter-state disparities in female educational attainment in India: A composite index based analysis using NFHS-V data. Journal of Education, Society and Behavioural Science, 39(1):37–46.

Yadav, S. (2023). The problem of regional disparities: An overview in Indian context. Journal of Humanities and Education Development. Doi, 10.

Zermelo, E. (1929). Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift, 29(1):436–460.

Appendix

To prove Proposition 1, we will use the following result on the convergence of the pCN algorithm.

Lemma 1. If we assume that the target density on $\mathbb{R}^M$ is of the form

$$p(d\mu) \propto N(d\mu;\, 0, C) \cdot \exp(-\psi(\mu)),$$

where $\psi : \mathbb{R}^M \to \mathbb{R}$ satisfies the following conditions:

1. There exist constants $p > 0$ and $B > 0$ such that for all $\mu \in \mathbb{R}^M$, $0 \le \psi(\mu) \le B\left(1 + \|\mu\|^p\right)$.

2. For every $r > 0$ there exists a constant $B(r) > 0$ such that for all $\mu, v \in \mathbb{R}^M$ with $\max\{\|\mu\|, \|v\|\} < r$, $|\psi(\mu) - \psi(v)| \le B(r)\,\|\mu - v\|$.

Then, for a fixed $0 < \beta \le 1/2$, the pCN algorithm applied to $p(d\mu)$ has a unique invariant measure $p(d\mu)$, and the corresponding sample average satisfies a strong law of large numbers and a CLT for every initial condition (Hairer et al. (2014)).
For the proof of Lemma 1, one can refer to Hairer et al. (2014). Precisely, the function $\psi : \mathbb{R}^M \to \mathbb{R}$ needs to be bounded below, have at most polynomial growth, and be locally Lipschitz continuous (Cotter et al. (2013)). Now, we show that the function $\psi$ satisfies the assumptions in Lemma 1.

Proof of Proposition 1. The function $\psi(\mu)$ is the negative log-likelihood and satisfies the required conditions. Ignoring constants independent of $\mu$, the Bradley–Terry negative log-likelihood is

$$\Phi(\mu) = \sum_{i<j} \left[ K \log\left(e^{\mu_i} + e^{\mu_j}\right) - x_{ij}\,\mu_i - (K - x_{ij})\,\mu_j \right].$$

Each summand equals $-x_{ij} \log p_{ij} - (K - x_{ij}) \log(1 - p_{ij})$ with $p_{ij} = e^{\mu_i}/(e^{\mu_i} + e^{\mu_j})$, so $\Phi(\mu) \ge 0$; the omitted constants do not matter, since adding a constant to the function $\psi$ does not change the posterior. Hence $\psi(\mu)$ is non-negative. Now,

$$\log\left(e^{\mu_i} + e^{\mu_j}\right) \le \log\left(2 e^{\max(\mu_i, \mu_j)}\right) = \log 2 + \max(\mu_i, \mu_j).$$

Therefore, with $c$ denoting a generic constant,

$$K \log\left(e^{\mu_i} + e^{\mu_j}\right) \le K \max(\mu_i, \mu_j) + c.$$

Using $\max(\mu_i, \mu_j) \le |\mu_i| + |\mu_j|$, we get

$$K \log\left(e^{\mu_i} + e^{\mu_j}\right) \le K|\mu_i| + K|\mu_j| + c.$$

Since $0 \le x_{ij} \le K$, we have $|x_{ij}\,\mu_i| \le K|\mu_i|$. Hence every term in $\psi(\mu)$ is bounded by constants times $|\mu_i|$ and $|\mu_j|$, and therefore

$$\Phi(\mu) \le c \sum_{i=1}^{M} |\mu_i| + c.$$

Using Cauchy–Schwarz,

$$\sum_{i=1}^{M} |\mu_i| = \langle |\mu|, \mathbf{1} \rangle \le \|\mu\|_2 \, \|\mathbf{1}\|_2.$$

Since $\|\mathbf{1}\|_2 = \sqrt{M}$,

$$\sum_{i=1}^{M} |\mu_i| \le \sqrt{M}\, \|\mu\|_2.$$

Therefore,

$$\Phi(\mu) \le B\left(1 + \|\mu\|_2\right).$$

This shows polynomial growth (with $p = 1$). Now, consider the derivative:

$$\frac{\partial}{\partial \mu_i} \log\left(e^{\mu_i} + e^{\mu_j}\right) = \frac{e^{\mu_i}}{e^{\mu_i} + e^{\mu_j}} \in (0, 1).$$

Hence, each partial derivative of $\psi$ is bounded on bounded sets:

$$\left| \frac{\partial \Phi}{\partial \mu_i} \right| \le B(r) \quad \text{whenever } \|\mu\|_2 < r.$$

Thus, for $\|\mu\|_2, \|v\|_2 < r$,

$$|\psi(\mu) - \psi(v)| \le B(r)\, \|\mu - v\|_2,$$

which proves local Lipschitz continuity.
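The local Lipschitz step can be made explicit with a short worked computation. As a sketch, assume (as the form of the likelihood suggests) that every pair is compared $K$ times, and write $x_{ji} = K - x_{ij}$ for $j < i$; this notation is introduced here for illustration only. Differentiating $\Phi$ term by term then gives

$$
\begin{aligned}
\frac{\partial \Phi}{\partial \mu_i}
  &= \sum_{j \neq i} \left( K\, \frac{e^{\mu_i}}{e^{\mu_i} + e^{\mu_j}} - x_{ij} \right),
  &\left| \frac{\partial \Phi}{\partial \mu_i} \right| &\le (M - 1)K, \\
\|\nabla \Phi(\mu)\|_2 &\le \sqrt{M}\,(M - 1)K,
  &|\Phi(\mu) - \Phi(v)| &\le \sqrt{M}\,(M - 1)K\, \|\mu - v\|_2 .
\end{aligned}
$$

The first bound holds because each summand lies in $(-K, K)$, and the last follows from the mean value inequality. Under these assumptions the bound does not depend on $r$, so $B(r)$ in condition 2 of Lemma 1 may be taken to be the constant $\sqrt{M}\,(M - 1)K$.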
