Repeating an imperfect biomarker test based on an initial result can introduce bias and influence misclassification risk. For example, in some blood donation settings, blood donors' hemoglobin is remeasured when the initial measurement falls below a minimum threshold for donor eligibility. This paper explores methods that use data resulting from processes with conditionally repeated biomarker measurement to decompose the variation in observed measurements of a continuous biomarker into population variability and variability arising from the measurement procedure. We present two frequentist approaches with analytical solutions, but these approaches perform poorly in a dataset of conditionally repeated blood donor hemoglobin measurements where normality assumptions are not met. We then develop a Bayesian hierarchical framework that allows for different distributional assumptions, which we apply to the blood donor hemoglobin dataset. Using a Bayesian hierarchical model that assumes normally distributed population hemoglobin and heavy-tailed $t$-distributed measurement variation, we found that the total measurement variation accounted for 22\% of the total variance among females and 25\% among males, with population standard deviations of $1.07\, \rm g/dL$ for female donors and $1.28\, \rm g/dL$ for male donors. Our Bayesian framework can use data resulting from any clinical process with conditionally repeated biomarker measurements to estimate individuals' misclassification risk after one or more noisy continuous measurements and inform evidence-based conditional retesting decision rules.
Biomarker levels, such as blood pressure, blood glucose, cholesterol, C-reactive protein, and hemoglobin, play a prominent role in modern medicine. Diagnosis and treatment decisions often involve dichotomizing a continuous biomarker to classify an individual as positive for a condition (e.g., diagnose diabetes based on hemoglobin A1C) or as indicated for an intervention (e.g., transfuse red cells based on hemoglobin). When using imperfect tests, repeating a biomarker measurement can reduce measurement uncertainty and lower the risk of misclassification (false positives or false negatives). Because measurements close to a decision threshold are more likely to produce misclassifications, clinicians often observe an initial measurement before deciding whether to collect an additional measurement.
Repeating critical values in clinical chemistry laboratories is also common, but its added value is uncertain [1,2,3]. Moreover, the specific re-testing strategy (when a measurement is repeated and how measurements inform further decisions) may lead to a “sequential testing bias” similar to that described for clinical trials [4,5]. This paper focuses on the measurement of hemoglobin (Hb) prior to blood donation. Low Hb in blood donors can indicate anemia, which can develop from donation-associated iron deficiency [6,7]. Thus, as recommended by the WHO [6], most countries screen donors to ensure that Hb levels exceed a minimum threshold before blood donation; the threshold often differs for male and female donors. Failing the pre-donation Hb test is the single most common reason for on-site deferral of blood donation [8].
Low Hb deferrals protect donor health by preventing the exacerbation of iron deficiency and anemia. However, deferrals lead to the loss of a potential donation, waste blood establishment resources, and are inconvenient for donors who traveled to a donation center. Low Hb deferrals are also donor dissatisfiers, reducing the likelihood of return for future donations [9,10,11].
Pre-donation Hb is usually measured in a fingerstick capillary sample using a point-of-care device.
Prior work has found substantial variation in fingerstick Hb measurements. Fingerstick samples have more pre-analytical “drop-to-drop” variation than venous blood draws [12,13,14], leading to limited sensitivity and specificity when used to diagnose anemia [15]. Therefore, many low Hb deferrals likely result from erroneously low Hb measurements and may be unnecessary [16]. Several blood establishments report repeating a fingerstick Hb measurement that falls below the threshold for donation [17].
Wasteful “false positive” low Hb deferrals must be balanced against “false negatives,” in which a donor is classified as having sufficient Hb because of an erroneously high Hb measurement. The risk of false negatives must be minimized to avoid removing iron-containing blood from donors with insufficiently recovered Hb or with iron deficiency anemia from another cause. The risk of false positives and false negatives depends on both the measurement uncertainty distribution and the distribution of Hb levels in the blood donor population.
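The dependence of misclassification risk on both distributions can be illustrated with a minimal sketch. It assumes, purely for illustration, that both the population Hb and the measurement error are normally distributed, which is a simplification and not the heavy-tailed model this paper ultimately uses. The population standard deviation and the 22\% measurement share are the female-donor estimates quoted in the abstract; the population mean `MU` and the eligibility threshold `THRESHOLD` are hypothetical values, and `prob_true_below` is a name introduced here.

```python
from statistics import NormalDist

# Female-donor estimates quoted in the abstract.
SD_POP = 1.07          # population SD of true Hb (g/dL)
MEAS_FRACTION = 0.22   # share of total variance due to measurement

# Hypothetical values, not taken from the paper.
MU = 13.5              # assumed population mean of true Hb (g/dL)
THRESHOLD = 12.5       # assumed eligibility threshold (g/dL)

# Measurement SD implied by the variance share: var_meas / var_total = 0.22.
SD_MEAS = SD_POP * (MEAS_FRACTION / (1.0 - MEAS_FRACTION)) ** 0.5


def prob_true_below(x_obs, threshold, mu, sd_pop, sd_meas):
    """P(true level < threshold | one observed value), normal-normal model."""
    # Weight on the observation; the remainder shrinks toward the population mean.
    w = sd_pop**2 / (sd_pop**2 + sd_meas**2)
    post_mean = mu + w * (x_obs - mu)
    post_sd = (w * sd_meas**2) ** 0.5  # posterior SD of the true level
    return NormalDist(post_mean, post_sd).cdf(threshold)


if __name__ == "__main__":
    for x in (12.0, 12.4, 12.6, 13.5):
        p = prob_true_below(x, THRESHOLD, MU, SD_POP, SD_MEAS)
        print(f"observed {x:.1f} g/dL -> P(true Hb < {THRESHOLD}) = {p:.3f}")
```

Under these assumed values, an observation slightly below the threshold is pulled toward the population mean, so the probability that the true level is below the threshold can be less than one half. The same observation would carry a different risk in a population with a lower mean or with a noisier device.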
Two questions arise: when is it sensible to repeat a capillary Hb measurement, and how should the repeated measurements be interpreted? From the blood service perspective, it is tempting to stop as soon as a measurement is above the threshold and to use the maximum of all measurements. Chung et al. (2017) [18] showed that such a testing strategy may bias the recorded Hb levels, and Pothast et al. (2025) [19] showed that it skews the distribution of recorded Hb levels.
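A small simulation can make the direction of this bias concrete. The sketch below is not the analysis of [18] or [19]; it assumes normally distributed true Hb and measurement error, allows at most one retest, and uses hypothetical values for the population mean, measurement standard deviation, and threshold. The function name `simulate_retest_bias` is introduced here for illustration.

```python
import random


def simulate_retest_bias(n=20000, mu=13.5, sd_pop=1.07, sd_meas=0.57,
                         threshold=12.5, seed=0):
    """Mean error (recorded - true) for a single measurement versus a
    'retest once if below threshold, record the maximum' rule.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    single_err = 0.0
    max_rule_err = 0.0
    for _ in range(n):
        true_hb = rng.gauss(mu, sd_pop)
        first = true_hb + rng.gauss(0.0, sd_meas)
        recorded = first
        if first < threshold:
            # Conditional repeat: only donors who initially fail are retested.
            second = true_hb + rng.gauss(0.0, sd_meas)
            recorded = max(first, second)
        single_err += first - true_hb
        max_rule_err += recorded - true_hb
    return single_err / n, max_rule_err / n


if __name__ == "__main__":
    single_bias, max_rule_bias = simulate_retest_bias()
    print(f"mean error, single measurement: {single_bias:+.3f} g/dL")
    print(f"mean error, max-of-two rule:    {max_rule_bias:+.3f} g/dL")
```

Because the recorded value under the maximum rule is never lower than the first measurement, its mean error is shifted upward relative to a single measurement, whose error averages close to zero. The size of the shift depends on the assumed parameters.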
Quantifying the sources of variation can inform whether an Hb measurement is potentially misclassified and whether a repeat measurement is warranted.
In this paper, we investigate several methods to determine the measurement variation from datasets in which repeated measurements are conditionally observed, and we apply these methods to quantify measurement variability in blood donor fingerstick Hb measurements. In section 2 we provide background information and mathematical notation for the problem of conditionally repeated measurements. In section 3 we describe the dataset at our disposal. Then in section 4 we derive two frequentist methods to decompose the sources of variation under normality assumptions. After observing that these assumptions are not met in our data and studying how this can affect our estimates, we turn to Bayesian methods in section 5, where we model other distributions explicitly and show how the Hb measurements in our data can best be represented. Finally, in section 6 we discuss our results and how they can aid in interpreting repeated (Hb) measurements and in other clinical applications.
We assume that the total variation of a biomarker level measured across a population of individuals is coming from two sources: (1) the variation in the population of the “true” level (the “between persons” variation) and (2) t