Selection and Collider Restriction Bias Due to Predictor Availability in Prognostic Models


📝 Original Info

  • Title: Selection and Collider Restriction Bias Due to Predictor Availability in Prognostic Models
  • ArXiv ID: 2602.17255
  • Date: 2026-02-19
  • Authors: Not available (the source does not list the authors; the original paper should be checked)

📝 Abstract

This methodological note investigates and discusses possible selection and collider restriction bias due to predictor availability in prognostic models.

💡 Deep Analysis

📄 Full Content

The inflation in the number of published prediction models over recent decades has generated an extensive methodological literature on common shortcomings in prognostic model development and guidelines for their design, validation, and reporting [2,4-11]. In broad terms, building a prediction model should include development of the model itself [6], its external validation and calibration [1,7], and, ideally, a prospective clinical and economic impact study [8]. This well-defined framework, intended to ensure the methodological soundness of prediction model assessment, implicitly assumes that required predictors are routinely available at the point of care [2,6], an assumption that, despite its importance, has received little explicit attention in the methodological literature on prediction models [11].

More generally, when a prognostic score is developed or validated using retrospective data, inclusion is restricted, by construction, to patients with recorded predictors. This restriction is not inherently problematic: if measurement occurs independently of determinants of outcome risk, the analysed sample may still yield unbiased estimates. However, when predictor measurement depends on underlying disease severity or related care processes, restriction selects individuals based on a variable influenced by determinants of the outcome. This process is analogous to protopathic bias, a form of reverse causality in which early manifestations of disease prompt intervention before formal diagnosis [12]. This situation, illustrated in panel A of Figure 1, corresponds to classical selection bias: underlying disease severity (U) influences both the outcome and the likelihood of predictor measurement, so that restricting analysis to individuals with recorded predictors effectively selects on severity.
where both underlying disease severity and its proxy influence measurement of P2, making its availability a collider.
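
The classical selection scenario described above (severity U driving both the outcome and the chance that a predictor is recorded) can be sketched with a small simulation. This is a minimal illustration, not the authors' analysis: the function name `simulate_selection`, the logistic coefficients, and the standard-normal scale for U are all assumptions chosen only to make the mechanism visible.

```python
import math
import random


def simulate_selection(n=100_000, seed=0):
    """Return (event rate in everyone, event rate in patients with a recorded predictor).

    All coefficients are hypothetical; they are not taken from the paper.
    """
    rng = random.Random(seed)
    n_measured = 0
    events_all = 0
    events_measured = 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # latent disease severity U (assumed standard normal)
        # Outcome risk increases with severity.
        p_outcome = 1.0 / (1.0 + math.exp(-(-2.0 + 1.0 * u)))
        # The chance that the predictor is recorded also increases with severity.
        p_measured = 1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * u)))
        outcome = rng.random() < p_outcome
        measured = rng.random() < p_measured
        events_all += outcome
        if measured:
            n_measured += 1
            events_measured += outcome
    return events_all / n, events_measured / n_measured


overall_rate, measured_rate = simulate_selection()
print(f"event rate, all patients:       {overall_rate:.3f}")
print(f"event rate, recorded predictor: {measured_rate:.3f}")
```

Under these assumptions the restricted sample has a higher event rate than the full population, which is what "selecting on severity" means in practice: the analysed patients are a higher-risk subset rather than a random sample.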

A concrete example illustrating how the assumption of unbiased predictor availability may not hold in practice is provided by the Kidney Failure Risk Equation (KFRE) [15]. The KFRE is a prognostic model developed to predict progression to kidney failure in patients with chronic kidney disease stages 3-5 (CKD 3-5), defined by persistent reduction in kidney function or markers of kidney damage. It estimates an individual’s risk of kidney failure at 2 or 5 years and is intended to inform risk stratification and referral decisions in patients with chronic kidney disease. The commonly used four-variable version relies on age, sex, estimated glomerular filtration rate (eGFR), and urine albumin-to-creatinine ratio (uACR), a measure of albuminuria.

Although the KFRE has been the subject of numerous validation studies reporting strong predictive performance [16][17][18][19], its uptake in routine clinical practice remains limited [19]. This limited use may be partly explained by constraints in routine data availability [20], with eGFR or uACR not being systematically recorded in community-based care for patients with chronic kidney disease stages 3-5. In the UK, albuminuria testing among patients with chronic kidney disease stages 3-5 remains uncommon in primary care: fewer than 25% undergo uACR testing within one year overall, rising to about 37% among those formally registered with chronic kidney disease, which indicates substantially higher testing conditional on chronic kidney disease recognition [21]. More recent national audits report annual testing in around 30% of patients with chronic kidney disease stages 3-5 [22]. Similar patterns have been reported in the US, where albuminuria testing remains uncommon among adults at risk for chronic kidney disease: ACR is recorded in around 17% of these patients, and testing is associated with a higher prevalence of chronic kidney disease treatment [23]. More generally, a recent systematic review and meta-analysis of 59 studies across 24 countries, including over 3 million patients with chronic kidney disease, showed that while 81.3% of patients received eGFR monitoring, only 47.4% underwent albuminuria testing [24].

The example of the KFRE suggests alternative scenarios, illustrated in panels B and C of Figure 1. When the situation reduces to classical selection bias, prognostic model development may still yield coefficients representative of the underlying higher-risk population. By contrast, conditioning on a collider distorts associations between all baseline predictors (not only P1 and P2) and the outcome [13].

In the context of the KFRE, declining eGFR prompts uACR testing, while perceived overall kidney failure risk, reflected by symptoms and comorbidities such as diabetes, independently influences the same decision. This double dependence of predictor availability on both eGFR and the perceived risk of the outcome characterises collider restriction bias [25].
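
The double dependence described above can be illustrated with a toy simulation in which two quantities that are independent in the whole population become correlated once the sample is restricted to tested patients. This is a hedged sketch under stated assumptions: `simulate_collider`, the standard-normal variables, the additive testing rule, and the threshold of 1.0 are hypothetical and are not taken from the KFRE or from the paper.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_collider(n=50_000, seed=1):
    """Return (correlation in all patients, correlation in tested patients)."""
    rng = random.Random(seed)
    egfr_all, risk_all, egfr_tested, risk_tested = [], [], [], []
    for _ in range(n):
        egfr_decline = rng.gauss(0.0, 1.0)    # P1: higher means worse kidney function
        perceived_risk = rng.gauss(0.0, 1.0)  # drawn independently of P1 by construction
        # Hypothetical testing rule: either driver, plus noise, can trigger a uACR test.
        tested = egfr_decline + perceived_risk + rng.gauss(0.0, 1.0) > 1.0
        egfr_all.append(egfr_decline)
        risk_all.append(perceived_risk)
        if tested:
            egfr_tested.append(egfr_decline)
            risk_tested.append(perceived_risk)
    return pearson(egfr_all, risk_all), pearson(egfr_tested, risk_tested)


r_all, r_tested = simulate_collider()
print(f"correlation, all patients:    {r_all:+.3f}")
print(f"correlation, tested patients: {r_tested:+.3f}")
```

In this sketch the correlation is close to zero in the full population and clearly negative among tested patients: among those tested, a patient with little eGFR decline must have been tested for another reason, so perceived risk tends to be higher. Associations estimated only in the tested subset therefore need not reflect those in the target population.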

Reference

This content is AI-processed based on open access ArXiv data.
