When to repeat a biomarker test? Decomposing sources of variation from conditionally repeated measurements

Repeating an imperfect biomarker test based on an initial result can introduce bias and influence misclassification risk. For example, in some blood donation settings, blood donors' hemoglobin is remeasured when the initial measurement falls below a …

Authors: Supun Manathunga, Mart P. Janssen, Yu Luo

Supun Manathunga¹, Mart P. Janssen², Yu Luo³, W. Alton Russell¹,⁴, and Mart Pothast²

¹ Experimental Medicine, McGill University, Montreal, Canada
² Transfusion Technology Assessment, Sanquin Research, Amsterdam, Netherlands
³ Department of Mathematics, King's College, London, United Kingdom
⁴ Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada

February 17, 2026

Abstract

Repeating an imperfect biomarker test based on an initial result can introduce bias and influence misclassification risk. For example, in some blood donation settings, blood donors' hemoglobin is remeasured when the initial measurement falls below a minimum threshold for donor eligibility. This paper explores methods that use data resulting from processes with conditionally repeated biomarker measurement to decompose the variation in observed measurements of a continuous biomarker into population variability and variability arising from the measurement procedure. We present two frequentist approaches with analytical solutions, but these approaches perform poorly in a dataset of conditionally repeated blood donor hemoglobin measurements where normality assumptions are not met. We then develop a Bayesian hierarchical framework that allows for different distributional assumptions, which we apply to the blood donor hemoglobin dataset. Using a Bayesian hierarchical model that assumes normally distributed population hemoglobin and heavy-tailed $t$-distributed measurement variation, we estimate that measurement variation is responsible for 22% of the total variance for females and 25% for males in point-of-care hemoglobin measures, with population standard deviations of 1.07 g/dL for female donors and 1.28 g/dL for male donors.
Our Bayesian framework can use data resulting from any clinical process with conditionally repeated biomarker measurement to estimate individuals' misclassification risk after one or more noisy continuous measurements and inform evidence-based conditional retesting decision rules.

Keywords: Measurement variation, Repeated measurements, Bayesian modelling, hierarchical models, Blood donation

1 Introduction

Biomarker levels, such as blood pressure, blood glucose, cholesterol, C-reactive protein, and hemoglobin, play a prominent role in modern medicine. Diagnosis and treatment decisions often involve dichotomizing a continuous biomarker to classify an individual as positive for a condition (e.g., diagnose diabetes based on hemoglobin A1C) or as indicated for an intervention (e.g., transfuse red cells based on hemoglobin). When using imperfect tests, repeating a biomarker measurement can reduce measurement uncertainty and lower the risk of misclassification (false positives or false negatives). Because measurements close to a decision threshold are more likely to produce misclassifications, clinicians often observe an initial measurement before deciding whether to collect an additional measurement. Repeating critical values in clinical chemistry laboratories is also common, but its added value is uncertain [1, 2, 3]. However, the specific retesting strategy (when a measurement is repeated and how measurements inform further decisions) may lead to a "sequential testing bias" similar to what is described for clinical trials [4, 5].

This paper focuses on the measurement of hemoglobin (Hb) prior to blood donation. Low Hb in blood donors can indicate anemia, which can develop from donation-associated iron deficiency [6, 7].
Thus, as recommended by the WHO [6], most countries screen donors to ensure that Hb levels exceed a minimum threshold before blood donation, often different for male and female donors. Failing the pre-donation Hb test is the single most common reason for on-site deferral of blood donation [8]. Low Hb deferrals protect donor health by preventing the exacerbation of iron deficiency and anemia. However, deferrals lead to the loss of a potential donation, waste blood establishment resources, and are inconvenient for donors who traveled to a donation center. Low Hb deferrals are also donor dissatisfiers, reducing the likelihood of return for future donations [9, 10, 11].

Pre-donation Hb is usually measured in a fingerstick capillary sample using a point-of-care device. Prior work has found substantial variation in fingerstick Hb measurements. Fingerstick samples have more pre-analytical "drop-to-drop" variation than venous blood draws [12, 13, 14], leading to limited sensitivity and specificity when used to diagnose anemia [15]. Therefore, many low Hb deferrals likely result from erroneously low Hb measurements and may be unnecessary [16]. Several blood establishments have reported repeating an Hb fingerstick measurement that is below the threshold for donation [17]. Wasteful "false positive" low hemoglobin deferrals must be balanced against "false negatives," when a donor is classified as having sufficient Hb due to an erroneously high Hb measurement. The risk of false negatives must be minimized to avoid removing iron-containing blood from donors with insufficiently recovered Hb or iron deficiency anemia from another cause. The risk of false positives and negatives depends on both the measurement uncertainty distribution and the distribution of Hb levels in blood donor populations.

The questions that arise are: when is it sensible to repeat a capillary Hb measurement?
And how should we interpret these repeated measurements? From the blood service perspective, it is tempting to stop when the measurement is above the threshold, using the maximum of all measurements. Chung et al. (2017) [18] showed that such a testing strategy may lead to biases in the recorded Hb levels, and Pothast et al. (2025) [19] showed that this strategy skews the distribution of recorded Hb levels. Quantifying the sources of variation can inform whether an Hb measurement is potentially misclassified and whether a repeat measurement is appropriate.

In this paper, we investigate several methods to determine the measurement variation from datasets in which repeated measurements are conditionally observed, and we apply these methods to quantify measurement variability in blood donor fingerstick Hb measurements. In section 2 we provide background information and mathematical notation for the problem of conditionally repeated measurements. In section 3 we describe the dataset at our disposal. Then in section 4 we derive two frequentist methods to decompose the sources of variation under normality assumptions. After observing that this assumption is not met in our data and studying how this can affect our estimates, we resort to Bayesian methods in section 5, where we model other distributions explicitly and show how Hb measurements in our data can be best represented. Finally, in section 6 we discuss our results and how they can aid in interpreting repeated (Hb) measurements and in other clinical applications.
2 Background

2.1 Notation and terminology

We assume that the total variation of a biomarker level measured across a population of individuals comes from two sources: (1) the variation of the "true" level in the population (the "between persons" variation) and (2) the variation coming from the measurement, defined as the variation between repeated measurements of the same individual at a single point in time. Another source of variation is the variation of the "true" level within an individual over time, but here we consider that to be part of the population variation [20].

We define the true biomarker level for an individual $i$ as $T_i$, ignoring temporal variability within individuals. $T_i$ is drawn from the population distribution with mean $\mu$ and noise $\epsilon_{\text{pop}}$, i.e., $T_i = \mu + \epsilon_{\text{pop}}$. A biomarker measurement $x_i$ with noise $\epsilon_{\text{meas}}$ can be written as:

$$x_i = T_i + \epsilon_{\text{meas}} = \mu + \epsilon_{\text{pop}} + \epsilon_{\text{meas}} \tag{1}$$

Assuming independent $\epsilon_{\text{pop}}$ and $\epsilon_{\text{meas}}$, the distribution of biomarker measurements is the convolution of population variability and measurement error:

$$f_X(x) = (f_{\text{pop}} * g_{\text{meas}})(x) = \int f_{\text{pop}}(t)\, g_{\text{meas}}(x - t)\, dt \tag{2}$$

where $f_{\text{pop}}(\cdot)$ is the probability density function of biomarker levels in the population and $g_{\text{meas}}(\cdot)$ is the probability density function of the error of a single measurement.

2.2 Non-conditionally repeated measurements

The measurement variability in a noisy test can be estimated when repeated measurements are available for the same individuals. For an individual $i$ whose true biomarker level is $T_i$, biomarker measurement $j \in \{1, \ldots, J\}$ can be written as:

$$x_{i,j} = T_i + \epsilon_{i,j}, \tag{3}$$

where $\epsilon_{i,j}$ denotes random measurement error. With two measurements per individual, we have:

$$x_{i,1} = T_i + \epsilon_{i,1}, \qquad x_{i,2} = T_i + \epsilon_{i,2}, \tag{4}$$

and the difference between the two measurements is:

$$\Delta_i = x_{i,1} - x_{i,2} = \epsilon_{i,1} - \epsilon_{i,2} \tag{5}$$

If we assume that $\epsilon_{i,j}$ has mean 0 and variance $\sigma^2_{\text{meas}}$, and is independent of $T_i$ and $j$, then

$$\operatorname{Var}(\Delta_i) = \operatorname{Var}(\epsilon_{i,1}) + \operatorname{Var}(\epsilon_{i,2}) = 2\sigma^2_{\text{meas}}, \qquad \sigma^2_{\text{meas}} = \operatorname{Var}(\Delta_i)/2. \tag{6}$$

Thus, by estimating the variance of the difference between two repeated measurements, one can estimate $\sigma^2_{\text{meas}}$, the variance of the measurement procedure.

2.3 Conditionally repeated measurements

We now consider the case when the decision to take a second measurement depends on the result of a first measurement; for example, when a second fingerstick Hb is only recorded if a blood donor's first fingerstick Hb falls below the donor eligibility threshold. Figure 2 shows an example with simulated data. In such settings, pairs of measurements are conditionally observed and eq. (6) no longer provides an unbiased estimate of the measurement variance $\sigma^2_{\text{meas}}$.

Let $c$ denote a threshold below which the first measurement is repeated. If we only observe $\Delta_i = x_{i,1} - x_{i,2}$ when $x_{i,1} < c$, this induces selection on the measurement error $\epsilon_{i,1}$, while $\epsilon_{i,2}$ remains unbiased. Naively applying eq. (6) will result in:

$$\hat{\sigma}^2_{\text{meas}} = \frac{\operatorname{Var}(\Delta_i \mid x_{i,1} < c)}{2} = \frac{\operatorname{Var}(\epsilon_{i,1} \mid x_{i,1} < c) + \sigma^2_{\text{meas}}}{2}.$$

Because conditional retesting reduces the observed variability in $x_{i,1}$, $\operatorname{Var}(\Delta_i \mid x_{i,1} < c) < 2\sigma^2_{\text{meas}}$ and $\hat{\sigma}^2_{\text{meas}}$ will underestimate $\sigma^2_{\text{meas}}$.

When the population biomarker levels and measurement noise are independent and normally distributed, this bias can be expressed explicitly. Let

$$T_i \sim N(\mu, \sigma^2_{\text{pop}}), \tag{7}$$

$$\epsilon_{i,j} \sim N(0, \sigma^2_{\text{meas}}). \tag{8}$$

The marginal distribution of a biomarker measurement $x_{i,j}$ is then normal with mean $\mu$ and total variance:

$$\sigma^2_{\text{total}} = \sigma^2_{\text{pop}} + \sigma^2_{\text{meas}}. \tag{9}$$

Let

$$\alpha = \frac{c - \mu}{\sigma_{\text{total}}} \quad \text{and} \quad \lambda = \frac{\phi(\alpha)}{\Phi(\alpha)}, \tag{10}$$

where $\phi$ and $\Phi$ denote the standard normal density and cumulative distribution functions, respectively. Then, from Johnson (1994) [21],

$$\operatorname{Var}(\epsilon_{i,1} \mid x_{i,1} < c) = \sigma^2_{\text{meas}} \left[ 1 - \frac{\sigma^2_{\text{meas}}}{\sigma^2_{\text{total}}} \left( \alpha\lambda + \lambda^2 \right) \right], \tag{11}$$

and therefore

$$\operatorname{Var}(\Delta_i \mid x_{i,1} < c) = 2\sigma^2_{\text{meas}} - \frac{\sigma^4_{\text{meas}}}{\sigma^2_{\text{total}}} \left( \alpha\lambda + \lambda^2 \right). \tag{12}$$

Therefore, the naive estimator $\hat{\sigma}^2_{\text{meas}} = \operatorname{Var}(\Delta_i \mid x_{i,1} < c)/2$ underestimates $\sigma^2_{\text{meas}}$ by $\frac{\sigma^4_{\text{meas}}}{2\sigma^2_{\text{total}}} \left( \alpha\lambda + \lambda^2 \right)$. The magnitude of the bias depends on the threshold $c$ through $\alpha$. Thus, normality assumptions enable explicit correction for this truncation via properties of the conditional normal distribution. We compare bias curves obtained from this theoretical result to simulated data in fig. B1.

3 Blood donor Hb data

The following sections assess methods using pre-donation fingerstick Hb screening data from Vitalant, one of the largest blood collectors in the United States. Our dataset includes visits between January 2017 and October 2022. The full dataset contains 2,582,402 unique donors and 9,099,136 donation visits, of which 6,528,084 had a recorded pre-donation fingerstick Hb measurement. We restricted the analysis to first visits only, for any type of intended donation, resulting in 1,863,159 visits from unique donors. We further selected only same-day repeated measurements to isolate measurement variability from longer-term biological variation. The data were stratified by sex, with donation eligibility thresholds of 13 g/dL for males and 12.5 g/dL for females. Among males, 18,173 of 849,469 (2.1%) initial Hb measurements were below the eligibility threshold, prompting a second measurement for 17,195 visits (94.6%). Among females, 123,379 of 1,013,690 (12.2%) initial Hb measurements fell below the threshold, prompting a second measurement for 114,840 visits (93.1%). A flow chart of the data selection process is provided in fig. A1.
The distribution of initial Hb measurements and the relationship between initial and repeat measurements are shown in fig. 1.

4 Frequentist approaches

In this section, we derive and evaluate two frequentist methods for estimating the measurement and population variance from conditionally repeated measurements, correcting for the bias induced by naively applying eq. (6). Both methods assume $T_i$ and $\epsilon_{i,j}$ are normally distributed. Therefore, repeated measurements follow a multivariate normal distribution with $J$ dimensions, where $J$ is the number of repeated measurements per individual. When $J = 2$, fully observed measurement pairs $(x_{i,1}, x_{i,2})$ follow a bivariate normal distribution with mean $(\mu, \mu)$ and covariance matrix

$$\operatorname{Cov}(x_{i,1}, x_{i,2}) = \begin{pmatrix} \sigma^2_{\text{total}} & \rho\,\sigma^2_{\text{total}} \\ \rho\,\sigma^2_{\text{total}} & \sigma^2_{\text{total}} \end{pmatrix}. \tag{13}$$

The total variance is given by eq. (9) and the correlation coefficient between $x_{i,1}$ and $x_{i,2}$ is

$$\rho = \frac{\sigma^2_{\text{pop}}}{\sigma^2_{\text{total}}}. \tag{14}$$

Note that estimating $\sigma^2_{\text{pop}}$ and $\sigma^2_{\text{meas}}$ is equivalent to estimating $\rho$ and $\sigma^2_{\text{total}}$. Because $x_{i,1}$ is observed for all $N$ individuals, we can estimate the mean true biomarker level $\mu$ and total variance $\sigma^2_{\text{total}}$ without bias as:

$$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_{i,1} \tag{15}$$

$$\hat{\sigma}^2_{\text{total}} = \frac{\sum_{i=1}^{N} (x_{i,1} - \hat{\mu})^2}{N - 1}. \tag{16}$$

4.1 Conditional expectation method

When the second measurement is only observed when $x_1 < c$, the conditional means of paired measurements are shifted downward relative to $\mu$ according to the truncation factor $\lambda$ from eq. (10):

$$E[x_1 \mid x_1 < c] = \mu - \sigma_{\text{total}}\,\lambda, \tag{17}$$

$$E[x_2 \mid x_1 < c] = \mu - \rho\,\sigma_{\text{total}}\,\lambda. \tag{18}$$

The correlation coefficient $\rho$ is identifiable from the difference between the two conditional means:

$$\rho = 1 - \frac{E[x_2 \mid x_1 < c] - E[x_1 \mid x_1 < c]}{\sigma_{\text{total}}\,\lambda}. \tag{19}$$

In practice, $\hat{\rho}_{\text{CE}}$ is obtained from eq. (19) by replacing the conditional expectations with the sample means of the truncated paired observations, and $\mu$ and $\sigma_{\text{total}}$ (which enter through $\lambda$) with the sample estimates from eqs. (15) and (16).
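The conditional expectation estimator of eq. (19) can be sketched in a few lines. The following is a minimal illustration under the normality assumptions of this section, not the authors' implementation: the function name `conditional_expectation_estimate`, the convention of marking unobserved second measurements as NaN, and the simulation parameter values are all assumptions chosen for the example.

```python
import math
import numpy as np


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conditional_expectation_estimate(x1, x2, c):
    """Return (rho, var_pop, var_meas) from conditionally repeated data.

    x1: first measurements for all N individuals.
    x2: second measurements, NaN where no repeat was taken.
    c:  retesting cutoff (repeat taken when x1 < c).
    """
    mu_hat = x1.mean()                      # eq. (15)
    sigma_total = x1.std(ddof=1)            # square root of eq. (16)
    alpha = (c - mu_hat) / sigma_total      # eq. (10)
    lam = norm_pdf(alpha) / norm_cdf(alpha)
    paired = ~np.isnan(x2)
    gap = x2[paired].mean() - x1[paired].mean()
    rho = 1.0 - gap / (sigma_total * lam)   # eq. (19)
    var_total = sigma_total ** 2
    # eq. (14) and eq. (9): split the total variance using rho
    return rho, rho * var_total, (1.0 - rho) * var_total


# Illustrative simulated data (hypothetical parameter values)
rng = np.random.default_rng(1)
n, mu, sd_pop, sd_meas, c = 50_000, 15.0, 1.0, 0.8, 13.0
true_hb = rng.normal(mu, sd_pop, n)
x1 = true_hb + rng.normal(0.0, sd_meas, n)
x2 = np.where(x1 < c, true_hb + rng.normal(0.0, sd_meas, n), np.nan)

rho_ce, var_pop_ce, var_meas_ce = conditional_expectation_estimate(x1, x2, c)
print(f"rho={rho_ce:.3f} var_pop={var_pop_ce:.3f} var_meas={var_meas_ce:.3f}")
```

Because the estimator uses the cutoff `c` explicitly through $\alpha$ and $\lambda$, it presumes that every first measurement below `c` is repeated, which is the assumption built into the simulated data here.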
Table 1: Comparison of true and estimated variance parameters across methods for simulated data, with estimates reported as mean ± SD of 1000 simulated datasets with N = 10000.

| Method | True $\sigma^2_{\text{pop}}$ | Est. $\sigma^2_{\text{pop}}$ ± SD | True $\sigma^2_{\text{meas}}$ | Est. $\sigma^2_{\text{meas}}$ ± SD |
|---|---|---|---|---|
| Conditional expectation | 1.00 | 1.00 ± 0.04 | 0.64 | 0.64 ± 0.03 |
| Maximum likelihood | 1.00 | 1.00 ± 0.02 | 0.64 | 0.64 ± 0.02 |

4.2 Maximum likelihood estimation

The correlation coefficient $\rho$ can also be estimated through maximum likelihood. Assuming a bivariate normal distribution, the log-likelihood of the data under truncation is

$$L(\mu, \sigma_{\text{total}}, \rho) = \sum_{i=1}^{n} \log f_{2D}(x_{i,1}, x_{i,2} \mid \mu, \sigma_{\text{total}}, \rho) + n \log P(x_{i,1} < c), \tag{20}$$

where $f_{2D}$ represents the density function of the bivariate normal distribution and $P(x_1 < c)$ can be evaluated using the cumulative distribution function of the univariate normal distribution. Setting the derivative with respect to $\rho$ equal to 0 gives:

$$\frac{dL}{d\rho} = \rho^3 - (1 + \rho^2)\, \frac{1}{N} \sum_i x'_{i,1}\, x'_{i,2} + \rho \left( \frac{1}{N} \sum_i \left( x'^{\,2}_{i,1} + x'^{\,2}_{i,2} \right) - 1 \right) = 0, \tag{21}$$

where $x'_{i,j} = \frac{x_{i,j} - \mu}{\sigma_{\text{total}}}$ and $N$ is the number of paired observations. In practice, $\hat{\rho}_{\text{MLE}}$ is obtained by setting $\mu = \hat{\mu}$ and $\sigma_{\text{total}} = \hat{\sigma}_{\text{total}}$ from eqs. (15) and (16). Note that the cutoff $c$ does not appear in $\frac{dL}{d\rho}$. Therefore, unlike $\hat{\rho}_{\text{CE}}$, estimating $\rho$ using maximum likelihood does not require knowledge of the retesting cutoff.

4.3 Frequentist approaches in simulated data

We simulated Hb measurements with conditional retesting under normality assumptions. Figure 2 shows simulated data with population mean $\mu = 15$ g/dL, population standard deviation $\sigma_{\text{pop}} = 1$ g/dL, measurement standard deviation $\sigma_{\text{meas}} = 0.8$ g/dL, and a retesting cutoff $c = 13$ g/dL. For these parameters, the correlation between the initial and the repeated measurement, if every measurement were unconditionally repeated, is $\rho = \frac{1^2}{1^2 + 0.8^2} \approx 0.61$.
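The simulation setup above and the cubic score equation (21) can be combined into a short sketch. This is an illustration under the stated normality assumptions rather than the authors' code: the function name `rho_mle`, the use of `numpy.roots` to solve the cubic, and the rule of picking the admissible root with the highest average log-likelihood are choices made for this example.

```python
import numpy as np


def rho_mle(x1_paired, x2_paired, mu, sigma_total):
    """Solve the cubic of eq. (21) for rho using paired observations only."""
    z1 = (x1_paired - mu) / sigma_total
    z2 = (x2_paired - mu) / sigma_total
    s12 = np.mean(z1 * z2)
    ssq = np.mean(z1 ** 2 + z2 ** 2)
    # eq. (21) rearranged: rho^3 - s12*rho^2 + (ssq - 1)*rho - s12 = 0
    roots = np.roots([1.0, -s12, ssq - 1.0, -s12])
    real = roots[np.abs(roots.imag) < 1e-9].real
    candidates = real[(real > -1.0) & (real < 1.0)]

    def avg_loglik(r):
        # Average bivariate normal log-density, dropping terms free of rho
        return -0.5 * np.log(1.0 - r * r) - (ssq - 2.0 * r * s12) / (2.0 * (1.0 - r * r))

    return max(candidates, key=avg_loglik)


# Simulated data with the parameter values quoted in the text
rng = np.random.default_rng(2)
n, mu, sd_pop, sd_meas, c = 50_000, 15.0, 1.0, 0.8, 13.0
true_hb = rng.normal(mu, sd_pop, n)
x1 = true_hb + rng.normal(0.0, sd_meas, n)
retest = x1 < c                                   # repeat only below the cutoff
x2 = true_hb[retest] + rng.normal(0.0, sd_meas, retest.sum())

mu_hat = x1.mean()                                # eq. (15), all first measurements
sigma_total_hat = x1.std(ddof=1)                  # eq. (16)
rho_hat = rho_mle(x1[retest], x2, mu_hat, sigma_total_hat)
var_meas_mle = (1.0 - rho_hat) * sigma_total_hat ** 2

# Naive estimator of eq. (6) applied to the truncated pairs, for comparison
naive_var_meas = np.var(x1[retest] - x2, ddof=1) / 2.0
print(f"rho_hat={rho_hat:.3f} var_meas_mle={var_meas_mle:.3f} naive={naive_var_meas:.3f}")
```

In this setting the naive estimate falls below the true $\sigma^2_{\text{meas}} = 0.64$, as predicted by eq. (12), while the likelihood-based estimate does not use the cutoff at all once the pairs have been selected.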
We estimate $\hat{\rho}$ from the conditionally repeated measurement data by the conditional expectation method (eq. (19)) and the maximum likelihood method (eq. (21)) on 1000 simulated datasets with N = 10000 initial measurements. Both methods successfully recover the true $\rho$ (table 1). Similar results were found using various parameters for $\mu$, $\sigma_{\text{pop}}$, $\sigma_{\text{meas}}$, and $c$ (data not shown).

Conditionally repeated measurements with additional dependencies

In the previous simulation, the only condition for repeating a measurement was the initial measurement being less than the cutoff value, so

$$p = \begin{cases} 1, & \text{if } x_1 < c \\ 0, & \text{otherwise} \end{cases} \tag{22}$$

where $p$ is the probability of observing a repeat measurement. But more complex conditional retesting processes are possible. For example, initial values that are closer to the cutoff might be more likely to be repeated. To assess our methods under such conditions, we simulated data in which an individual was conditionally retested with the following probability:

$$p = \begin{cases} e^{-r(c - x_1)}, & \text{if } x_1 < c \\ 0, & \text{otherwise.} \end{cases} \tag{23}$$

A larger rate parameter $r$ means that individuals with an initial measurement far from the threshold are less likely to be retested, as visualized in fig. B2. We observed substantial bias in $\hat{\rho}_{\text{CE}}$ for larger values of $r$, but $\hat{\rho}_{\text{MLE}}$ remained unbiased across simulations (fig. 3). This result is expected since $\hat{\rho}_{\text{CE}}$ depends on the cutoff $c$, but $\hat{\rho}_{\text{MLE}}$ does not. Thus, our simulations show that $\hat{\rho}_{\text{MLE}}$ is more robust to the specific conditions governing which individuals are retested.

4.4 Frequentist approaches in real data

Using the full dataset described in section 3, we estimated the measurement error variance and population variance using the conditional expectation and the maximum likelihood methods. Uncertainty was estimated using 1,000 bootstrap samples.
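A nonparametric bootstrap of this kind can be sketched as follows. The text does not state how the bootstrap samples were drawn, so resampling individuals (visits) with replacement is an assumption here, as are the function names. To keep the sketch short, the `estimator` passed in is the naive estimator of eq. (6); any function with the same signature, such as a conditional expectation or maximum likelihood estimator, could be substituted. The data are simulated, and only 200 replicates are used in the demo to keep it fast.

```python
import numpy as np


def naive_var_meas(x1, x2):
    """Naive estimator of eq. (6): half the variance of paired differences."""
    paired = ~np.isnan(x2)
    return np.var(x1[paired] - x2[paired], ddof=1) / 2.0


def bootstrap(estimator, x1, x2, n_boot=1000, seed=0):
    """Resample individuals with replacement and re-apply `estimator`.

    Returns the mean and standard deviation of the bootstrap estimates.
    """
    rng = np.random.default_rng(seed)
    n = len(x1)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # one bootstrap sample of individuals
        stats[b] = estimator(x1[idx], x2[idx])
    return stats.mean(), stats.std(ddof=1)


# Simulated stand-in for conditionally repeated data (hypothetical values)
rng = np.random.default_rng(3)
n, c = 20_000, 13.0
true_hb = rng.normal(15.0, 1.0, n)
x1 = true_hb + rng.normal(0.0, 0.8, n)
x2 = np.where(x1 < c, true_hb + rng.normal(0.0, 0.8, n), np.nan)

point = naive_var_meas(x1, x2)
boot_mean, boot_sd = bootstrap(naive_var_meas, x1, x2, n_boot=200)
print(f"estimate={point:.3f} bootstrap mean={boot_mean:.3f} SD={boot_sd:.3f}")
```

Resampling whole individuals keeps each pair $(x_{i,1}, x_{i,2})$ together, so the conditional observation pattern of the original data is preserved in every replicate.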
The estimates obtained using the conditional expectation method and the maximum likelihood method are summarized in table 2, and the distributions of the estimates by each method are depicted in fig. 4. We observed a significant difference in the estimates of the population variance and the measurement error variance across the two methods, and across males and females for each method. The maximum likelihood method should not be susceptible to additional conditional dependencies (as studied in section 4.3), but what is remarkable is that the measurement error variance is significantly different for males ($\hat{\sigma}^2_{\text{meas}} = 0.61 \pm 0.03$ (g/dL)²) and females ($\hat{\sigma}^2_{\text{meas}} = 0.34 \pm 0.01$ (g/dL)²). This leads us to believe that there are additional systematic uncertainties for this method, which we investigate below.

4.5 Limitations of frequentist approaches

The conditional expectation method and the maximum likelihood method both rely on the assumption that measurement errors are normally distributed. In the presence of outliers (heavy tails), the variance decomposition and the correlation relationship in eq. (14) no longer hold, and the resulting estimates can be biased.

Table 2: Estimated population variance and measurement error variance, reported as mean ± SD, by method and sex.

| Method | Sex | $\hat{\sigma}^2_{\text{pop}}$ | $\hat{\sigma}^2_{\text{meas}}$ |
|---|---|---|---|
| Conditional expectation | Female | 1.05 ± 0.01 | 0.38 ± 0.01 |
| Conditional expectation | Male | 1.57 ± 0.01 | 0.52 ± 0.02 |
| Maximum likelihood | Female | 1.10 ± 0.01 | 0.34 ± 0.01 |
| Maximum likelihood | Male | 1.48 ± 0.02 | 0.61 ± 0.03 |

To study the effect of outliers, we simulated conditionally repeated Hb levels, similar to section 4.3, but with a $t$-distributed measurement error with different degrees of freedom $df$. The scale parameter of the $t$ distribution is denoted by $s$; the corresponding variance then equals $s^2 \frac{df}{df - 2}$ for $df > 2$.
Therefore, the measurement uncertainty $\hat{\sigma}_{\text{meas}}$ estimated with the frequentist approaches above should be divided by $\sqrt{\frac{df}{df - 2}}$ to compare with $s$. The difference between the estimated $\hat{\sigma}_{\text{meas}}$ using the maximum likelihood method (the conditional expectation method behaves similarly) and the simulated value is shown in fig. 5. For large $df$ the scenario matches the normal-error case and bias is negligible. However, for $df < 10$ both methods overestimate $\sigma_{\text{meas}}$ by more than 5%.

Extending the closed-form estimates for $\rho$ and $\sigma_{\text{meas}}$ to non-normal distributional assumptions is challenging. Instead, section 5 proposes a Bayesian hierarchical modelling framework to decompose measurement and population variability from conditionally repeated measurements under flexible distributional assumptions.

5 Bayesian approach

Bayesian hierarchical models have gained popularity in cases when repeated measurements are conditionally dependent. In such settings, hierarchical formulations allow the decomposition of variability into measurement-level noise, unit-level heterogeneity, and higher-order contextual dependence. There is a rich literature demonstrating the use of Bayesian hierarchical approaches to capture complex dependence structures. For example, Gustafson (2003) [22] provided one of the foundational treatments of Bayesian hierarchical modelling for measurement error and misclassification, demonstrating how hierarchical structures can explicitly represent uncertainty in both exposure assessment and outcome processes. Greenland (2005) [23] incorporated Bayesian hierarchical modelling within a multiple-bias framework to simultaneously adjust for several key sources of bias, including exposure misclassification, selection bias, and confounding, in an observational study of childhood leukemia. Similarly, Luo et al. (2018) [24] employed a hierarchical Bayesian framework to estimate the prevalence of attention-deficit/hyperactivity disorder (ADHD), accounting fully for the uncertainty, variability, and spatial dependence of the estimate. Therefore, in this section, we apply the hierarchical modelling framework to decompose the variation arising from conditionally repeated measurements of a continuous biomarker.

5.1 Bayesian model structure

We model biomarker measurements using a two-level measurement error framework. As before, an individual $i$ has an unobserved true biomarker level $T_i$ at the time of a visit. We assume that

$$T_i \sim f_{\text{pop}}(\cdot \mid \theta_{\text{pop}}), \tag{24}$$

where $f_{\text{pop}}$ denotes the population distribution of the true biomarker level with parameters $\theta_{\text{pop}}$. An observed measurement $x_{i,j}$ is a noisy observation of $T_i$, affected by measurement error with parameters $\theta_{\text{meas}}$:

$$x_{i,j} \mid T_i \sim g_{\text{meas}}(\cdot \mid T_i, \theta_{\text{meas}}). \tag{25}$$

A pair of measurements on the same individual, $x_{i,1}$ and $x_{i,2}$, have the same underlying true level $T_i$, so any within-pair variability is attributable to the measurement process, as shown graphically in fig. 6. This allows for identification of the measurement error distribution.

The following sections consider hierarchical models with four sets of distributional assumptions for the latent true biomarker level $T_i$ and the measurement error process $\epsilon_{\text{meas}}$ (shown in detail in fig. B3):

a) Model a: Normal true biomarker level with normal measurement error
b) Model b: Normal true biomarker level with Student-$t$ measurement error
c) Model c: Normal true biomarker level with mixture of normal measurement error
d) Model d: Skew-normal true biomarker level with Student-$t$ measurement error

5.2 Model estimation in simulated data

We used simulations to assess whether our hierarchical modelling framework could correctly estimate underlying parameters when the distribution of the data generating process is known.
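The two-level structure of eqs. (24) and (25) can be made concrete with a forward simulation. The sketch below covers only the normal population distribution with either normal error (as in Model a) or Student-$t$ error (as in Model b), combined with retesting below a cutoff. The function name `simulate_visits` and all parameter values are hypothetical and are not the values of table C1.

```python
import numpy as np


def simulate_visits(n, mu, sd_pop, scale, df=None, c=13.0, seed=0):
    """Draw true levels and conditionally repeated measurements.

    df=None uses normal measurement error with SD `scale`;
    a finite df uses Student-t error with scale parameter `scale`.
    The second measurement is NaN unless the first one is below `c`.
    """
    rng = np.random.default_rng(seed)

    def noise(size):
        if df is None:
            return rng.normal(0.0, scale, size)
        return scale * rng.standard_t(df, size)

    true_level = rng.normal(mu, sd_pop, n)            # eq. (24)
    x1 = true_level + noise(n)                        # eq. (25), first measurement
    x2 = np.full(n, np.nan)
    retest = x1 < c
    x2[retest] = true_level[retest] + noise(int(retest.sum()))
    return true_level, x1, x2


# Illustrative draw with heavy-tailed measurement error
true_level, x1, x2 = simulate_visits(10_000, mu=15.0, sd_pop=1.0,
                                     scale=0.5, df=4.0, c=13.0)
observed_pairs = ~np.isnan(x2)
# Within-pair differences cancel the shared true level T_i
within_pair_diff = x1[observed_pairs] - x2[observed_pairs]
print(f"{observed_pairs.sum()} of {len(x1)} visits have a repeat measurement")
```

Because both measurements in a pair share the same `true_level`, the within-pair differences carry information about the measurement error alone, which is what makes the error distribution identifiable from the paired visits.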
First, we simulated four datasets corresponding to the distributions of models a-d with parameter values given in table C1. Then, we fit the corresponding Bayesian model to each simulated dataset using Markov chain Monte Carlo (MCMC). Models used weakly informative priors for top-level parameters such that the prior predictive distribution of observed Hb measurements lay within physiologically plausible ranges, with most of the mass between 12 and 18 g/dL (table C2) [25]. We computed posterior summaries for all parameters. Across simulated datasets, the posterior distributions concentrate around the true values used to generate the data. In particular, the true parameter values fall within the 95% credible intervals for the estimated parameters, indicating that the proposed likelihood and prior specification can recover the data-generating parameters under the study design. Posterior means and 95% credible intervals are reported in table C3.

5.3 Model selection in simulated data

Next, we assessed whether a model selection procedure would correctly identify the model that corresponds to the data-generating process underlying our four simulated datasets. Our model selection process used 5-fold cross-validation to compare candidate models and selected the model with the largest marginal log pointwise predictive density (marginal LPPD). Following standard $K$-fold cross-validation, the data are partitioned into $K$ disjoint folds. For each fold, the model is fit to the remaining $K - 1$ folds and evaluated on the held-out data. For a Bayesian model with parameters $\theta$, predictive performance on a validation set is summarized by the LPPD, which integrates over posterior uncertainty in the parameters [26].
For a validation fold containing $n$ observations $\{x_i\}_{i=1}^{n}$, the LPPD is defined as

$$\text{LPPD} = \sum_{i=1}^{n} \log \left( \int p(x_i \mid \theta)\, p_{\text{post}}(\theta)\, d\theta \right), \tag{26}$$

where $p_{\text{post}}(\theta)$ denotes the posterior distribution of $\theta$ obtained from the training data.

Our generative model includes individual-specific latent true Hb values that are not shared across training and validation folds. Consequently, predictive evaluation must integrate over these latent variables rather than conditioning on their posterior values from the training fit. We therefore compute a marginal predictive density for the validation data by integrating out the latent true Hb values, yielding predictions that are unconditional on any specific latent state and appropriate for new individuals. For a validation set with $n$ observations, the marginal LPPD (mLPPD) is given by

$$\text{mLPPD} = \sum_{i=1}^{n} \log \left( \iint p(x_i \mid T_i, \theta)\, p(T_i \mid \theta)\, p_{\text{post}}(\theta)\, dT_i\, d\theta \right) \tag{27}$$

where $\theta = (\theta_{\text{pop}}, \theta_{\text{meas}})$. We approximate the outer integral using $S$ posterior draws $\{\theta^{(s)}\}_{s=1}^{S}$ obtained from the training-set fit. For each fixed $\theta^{(s)}$, the inner integral over the latent true Hb $T_i$ is approximated using $R$ Monte Carlo draws $\{T_i^{(s,r)}\}_{r=1}^{R} \sim p(T_i \mid \theta^{(s)})$. The resulting computed marginal LPPD (cLPPD) estimator is

$$\text{cLPPD} = \sum_{i=1}^{n} \log \left( \frac{1}{S} \sum_{s=1}^{S} \hat{p}\left( x_i \mid \theta^{(s)} \right) \right), \tag{28}$$

where

$$\hat{p}\left( x_i \mid \theta^{(s)} \right) = \frac{1}{R} \sum_{r=1}^{R} p\left( x_i \mid T_i^{(s,r)}, \theta^{(s)} \right).$$

For each true model, we computed 5-fold CV cLPPD across all fitted models and compared their total scores (table C4). As shown in fig. 7, the model corresponding to the true data generation process has the highest cLPPD for three of four datasets and is within a negligible margin of the best score for the fourth, indicating that cLPPD can reliably recover the correct generative model across simulated settings. When competing models are close, the differences in cLPPD are small and the ranking is effectively indistinguishable, suggesting that the models are practically equivalent for prediction in those scenarios.

5.4 Model selection in real data

We now apply our model selection procedure to real data (section 3). To reduce computation time, we analyzed a random sample of 10,000 male and 10,000 female donors from the filtered dataset. We computed 5-fold cLPPD for candidate models a-d; per-fold values are summarized in table C5. The skew-t model (Model d) achieved the largest mean cLPPD, but its advantage over the normal-t model (Model b) was small (mean difference = 2.35, SE = 4.26), and the difference relative to the mixture model (Model c) was similarly modest (mean difference = 7.59, SE = 6.95) (table 3). In contrast, the normal-normal model (Model a) performed substantially worse (mean difference = 40.83, SE = 7.97), indicating that heavier-tailed or skewed measurement error is needed. Because Model d adds an extra skewness parameter with only marginal gains, we selected the normal-t specification (Model b) as the final model for parsimony.

Table 3: Paired (foldwise) differences between Model d and other models. Positive values indicate better performance for Model d.

| Comparison | Mean (cLPPD other − cLPPD Model d) | SD(diff) | SE(diff) |
|---|---|---|---|
| Model b − Model d | −2.35 | 9.52 | 4.26 |
| Model c − Model d | −7.59 | 15.54 | 6.95 |
| Model a − Model d | −40.83 | 17.82 | 7.97 |

We fitted this final hierarchical measurement-error model (Model b) using a random sample of 100,000 donation visits, drawing four chains of 2000 warm-up and 2000 sampling iterations (8000 post-warm-up draws in total). The prior specification for the model is the same as for the model used in the cross-validation process (table C2).

5.5 Posterior inference in real data

Trace plots of the MCMC indicated good convergence (fig. B4).
Posterior estimates showed clear sex-specific differences in the population mean and variability of true Hb (table 4). The posterior mean for the true underlying Hb was 15.74 g/dL for males and 13.82 g/dL for females, each with narrow 95% posterior intervals. The estimated population variance was higher in males (1.63 (g/dL)², 95% CrI: 1.60–1.67) than in females (1.13 (g/dL)², 95% CrI: 1.12–1.15). Measurement error scale parameters $s$ were similar across sexes (posterior mean 0.36 g/dL), with small degrees of freedom for both sexes. Variation due to the measurement is 22% of the total variance in females and 25% in males.

Table 4: Posterior summary statistics for model parameters, split by sex for each statistic.

| Parameter | Mean (Male) | Mean (Female) | SD (Male) | SD (Female) | Q2.5% (Male) | Q2.5% (Female) | Q97.5% (Male) | Q97.5% (Female) |
|---|---|---|---|---|---|---|---|---|
| $\mu$ | 15.74 | 13.82 | 0.01 | 0.01 | 15.73 | 13.81 | 15.75 | 13.82 |
| $\sigma^2_{\text{pop}}$ | 1.63 | 1.13 | 0.02 | 0.02 | 1.60 | 1.12 | 1.67 | 1.15 |
| $s$ | 0.36 | 0.36 | 0.01 | 0.01 | 0.34 | 0.35 | 0.38 | 0.37 |
| $df$ | 2.60 | 3.28 | 0.09 | 0.10 | 2.45 | 3.12 | 2.76 | 3.46 |

Practical implications

Using posterior draws from the fitted measurement-error model, we can compute the posterior distribution of a donor's true latent Hb ($T_i$) conditional on one or two observed fingerstick measurements and estimate the posterior probability that the true Hb exceeds the eligibility threshold. For example, an initial measurement of 12.8 g/dL yields a 47.3% posterior probability that the true Hb exceeds the threshold of 13 g/dL. When a second measurement of 12.4 g/dL is observed, this probability decreases to 17.4%, while a second measurement of 13.2 g/dL increases it to 59.1%. The corresponding posterior density shifts for these scenarios are shown in fig. 8.

We determined the misclassification rate due to the measurement by comparing the latent true Hb ($T_i$) of the posterior samples with simulated measurements from the model ($x_i$). False deferrals are donors who are truly eligible ($T_i \geq c$) but have a measurement $x_i < c$. Conversely, false bleeds have $T_i < c$ but $x_i \geq c$. We determined the percentage of false deferrals and false bleeds for a strategy that is based on a single measurement and for a strategy with a repeated measurement only if the prior measurement is below the threshold (see table 5). As may be expected, the number of false deferrals is reduced by repeating low measurements, at the cost of an increase in false bleeds.

Table 5: Misclassification due to the measurement uncertainty as determined from the model posterior true Hb ($T_i$) and sampled measurements ($x_i$). The threshold $c$ is 12.5 g/dL for females and 13 g/dL for males. FD = false deferral; FB = false bleed; PPV = positive predictive value; NPV = negative predictive value.

| Sex | Strategy | FD (%) | FB (%) | 1 − PPV (%) | 1 − NPV (%) |
|---|---|---|---|---|---|
| Male | Single | 0.9 | 0.4 | 48 | 0.4 |
| Male | Repeat | 0.2 | 0.6 | 15 | 0.6 |
| Female | Single | 3.3 | 2.5 | 30 | 2.8 |
| Female | Repeat | 0.7 | 3.7 | 10 | 4.0 |
Applying this framework to the blood donor Hb dataset, we found superior out-of-sample prediction using a heavy-tailed distribution for the measurement error, suggesting that the normality assumptions of our frequentist approaches made them inappropriate for this application.

Routine repeat testing of Hb measurements below the threshold for donation is intended to reduce false deferrals caused by measurement error. However, in the presence of significant measurement uncertainty this practice may have unwanted consequences: it may increase the chance that donors with genuinely low Hb are accepted, and it is not clear whether this practice is the optimal way to reduce the number of deferrals. Developing an evidence-based testing strategy requires separating measurement variability from the variability between individuals. The blood donation datasets currently available to us, such as those from Vitalant (US), present a methodological challenge, as second measurements are observed only after an initial low result. This conditional sampling violates the assumptions of standard repeat-measurement analyses that rely on unconditionally observed pairs. To address this, we developed methods that explicitly account for this selection mechanism in estimating measurement error variance.

Under the assumption that all sources of variation follow normal distributions, the repeated measurements can be represented as draws from a bivariate normal distribution. We showed that this allows us to obtain an unbiased estimate of the correlation coefficient, which, together with the total variance, can be used to decompose the variation into the part present in the population and the part arising from the measurement. However, if the normality assumptions are not met, these estimates may be biased. Indeed, we found that applying these approaches yielded different measurement error variances for males and females, which is unexpected.
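The decomposition described above can be illustrated with a minimal sketch. It assumes the additive model $x = T + e$ with independent, normally distributed true value $T$ and measurement error $e$, under which the correlation between two measurements of the same person equals $\sigma^2_{\mathrm{pop}} / (\sigma^2_{\mathrm{pop}} + \sigma^2_{\mathrm{meas}})$. The function name `decompose_variance` and the simulated parameter values are hypothetical and chosen only for illustration. The simulation uses unconditionally observed pairs, so it does not reproduce the corrections for conditional repeat testing developed in this paper.

```python
import numpy as np

def decompose_variance(total_var, rho):
    """Split total variance into population and measurement parts.

    Assumes x = T + e with independent T and e, so that
    corr(x1, x2) = var_pop / (var_pop + var_meas).
    """
    var_pop = rho * total_var
    var_meas = (1.0 - rho) * total_var
    return var_pop, var_meas

# Illustrative check on simulated pairs observed for every individual
rng = np.random.default_rng(0)
n = 200_000
sigma_pop, sigma_meas = 1.2, 0.6   # hypothetical values in g/dL
T = rng.normal(15.0, sigma_pop, n)            # latent true values
x1 = T + rng.normal(0.0, sigma_meas, n)       # first measurement
x2 = T + rng.normal(0.0, sigma_meas, n)       # repeat measurement

total_var = 0.5 * (x1.var(ddof=1) + x2.var(ddof=1))
rho = np.corrcoef(x1, x2)[0, 1]
var_pop_hat, var_meas_hat = decompose_variance(total_var, rho)
print(f"population variance ~ {var_pop_hat:.3f}, measurement variance ~ {var_meas_hat:.3f}")
```

If the second measurement were kept only when the first falls below a threshold, the naive sample correlation and total variance used here would no longer be appropriate inputs, which is the selection problem the methods in this paper are designed to handle.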
To relax the distributional assumptions, we constructed a hierarchical Bayesian model that allows specific distributions to be chosen for both the population and the measurement error. Of the four model classes evaluated, the model offering the best balance of parsimony and fit assumed a normal distribution for the population and a $t$-distribution for measurement error. Applying this model to the Hb data yielded a similar scale parameter, $s = 0.36$ g/dL, for both males and females. We found that population variability is smaller in females ($\hat{\sigma}_{\mathrm{pop}} = 1.07$ g/dL) than in males ($\hat{\sigma}_{\mathrm{pop}} = 1.28$ g/dL). Population means were also lower in females than in males ($\hat{\mu}_{\mathrm{pop}} = 13.82$ g/dL vs $\hat{\mu}_{\mathrm{pop}} = 15.74$ g/dL), consistent with known sex differences in Hb levels. Note that the ratio $\sigma_{\mathrm{pop}} / \mu_{\mathrm{pop}}$ is quite similar between males (8.2%) and females (7.8%). The reduced population variance in female donors may also reflect a selection effect, as individuals with very low Hb are less likely to present for donation and thus may be underrepresented in the dataset. Such a mechanism could induce skewness in the female Hb distribution. A model with skewness did show a marginally better fit, though not significantly (section 5.3). It would be worthwhile to explore whether such a skewed distribution is appropriate.

We restricted our analysis to settings where repeated biomarkers are measured in quick succession (e.g., at a single blood donation visit) and within-person changes in biomarker levels over time may be ignored. It is straightforward to extend our Bayesian hierarchical framework to estimate within-person fluctuation as a third source of variation, which is relevant in settings where repeated measurements occur on different days. Our blood donor dataset includes repeated Hb measurements on different days without intervening donations, but these constitute only approximately 0.2% of all repeated measurements.
Robust estimation would require more than 100,000 samples, rendering MCMC inference computationally infeasible. Future work could investigate alternative strategies to address the within-individual variability.

Our Bayesian method has other limitations. First, computational scalability is limited. It is well known that MCMC becomes computationally prohibitive for very large datasets, which restricts our ability to use the full set of available measurements. Our approach used many unrepeated measurements to inform the population distribution, which is inefficient when a majority of the individuals were tested once and do not contribute directly to the estimation of measurement error. Future work could explore approximate Bayesian methods or variational inference to improve scalability, as well as targeted sub-sampling schemes that retain efficiency while reducing computational burden. Second, our analysis assumes that measurement error has a mean of 0 and is independent of the latent biomarker level. The first assumption implies that the level obtained by averaging infinitely many repeated measurements is the true level, ignoring the possibility of systematic overestimation or underestimation. The assumption of independent error ignores the possibility that a measurement procedure is less reliable for some biomarker levels than others. Indeed, a comparison of three point-of-care Hb devices to a "gold standard" venous Hb measurement found evidence of systematic underestimation and proportional bias [15]. Third, while our Bayesian model can estimate the misclassification risk based on one or two biomarker measurements, it only considers sex and biomarker measures at a single point in time. Future work could use additional data about the individual to refine predictions.
In the case of blood donor Hb levels, considering variables like donation history, past biomarkers, and weight would likely reduce the uncertainty in posterior predictions.

The estimated measurement error and population Hb distribution from this study can be used for a probabilistic assessment of donor eligibility. By propagating measurement uncertainty through the model, one or two fingerstick measurements are converted into posterior probabilities that quantify confidence about a donor's true Hb relative to the threshold (fig. 8). Our model also clarifies the value of repeat testing by indicating when a second measurement can meaningfully shift the eligibility probability versus when it adds little information, especially for borderline values near the cutoff.

In summary, this study provides a framework for disentangling population-level variation from measurement error in biomarker measurements under a selective retesting protocol. By explicitly modelling various distributional assumptions through a hierarchical Bayesian formulation, we obtained robust estimates of measurement and population variability in Hb measurements in blood donors in the US. These results can be used to inform donor eligibility in a data-driven and evidence-based manner. Using our framework, it is possible to determine the posterior probabilities of a donor's true Hb, which may be used to evaluate whether a measurement should be repeated and how to interpret the repeated measurements.

Author contributions

SM conducted data analysis and methodological development and wrote the manuscript. MPJ supported the project by coordinating collaborations and contributed to the research plan. YL provided statistical expertise and contributed to writing the manuscript. WAR supervised the project, provided access to the data and contributed to the manuscript.
MP also supervised the project, supported data analysis and contributed to the writing of the manuscript. WAR and MP are co-senior authors who contributed equally to this work. MPJ is deceased.

Ethics approval

This study was approved by the McGill University Research Ethics Board (reference number 22-05-018).

Acknowledgments

The authors thank the blood donors whose data enabled this study. The authors also thank collaborators Ralph Vassallo and Marjorie Bravo from Vitalant Medical Affairs and Brian Custer and Zhanna Kaidarova from Vitalant Research Institute for sharing data and providing feedback on our study.

Data and Code Availability

The analysis code is available at https://github.com/ssm123ssm/Hb-variability---code.git. The repository includes scripts that simulate data and apply the methods described in the manuscript. Blood donor data were analyzed under a data sharing agreement and ethics approval that do not permit sharing of individual-level data.

Financial disclosure

This research was funded in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) [funding reference number RGPIN-2023-04160, PI W. Alton Russell].

Conflict of interest

The authors declare no potential conflicts of interest.

References

[1] Aijun Niu, Xianxia Yan, Lin Wang, Yan Min, and Chengjin Hu. Utility and necessity of repeat testing of critical values in the clinical chemistry laboratory. PLoS ONE, 8:e80663, 11 2013.

[2] Neda Soleimani, Amir Azadi, Mohammad Javad Esmaeili, Fatemeh Ghodsi, Reza Ghahramani, Azadeh Hafezi, Tayebeh Hosseyni, Arezoo Arabzadeh, Samira Khajeh, Mahsa Farhadi, and Sahand Mohammadzadeh. Termination of repeat testing in chemical laboratories based on practice guidelines: Examining the effect of rule-based repeat testing in a transplantation center. Journal of Analytical Methods in Chemistry, 2021:1–7, 5 2021.
[3] Su-Chieh Pamela Sun, JuanDavid Garcia, and Joshua A Hayden. Repeating critical hematology and coagulation values wastes resources, lengthens turnaround time, and delays clinical action. American Journal of Clinical Pathology, 149:247–252, 2 2018.

[4] Elena Kulinskaya, Richard Huggins, and Samson Henry Dogo. Sequential biases in accumulating evidence. Research Synthesis Methods, 7:294–305, 9 2016.

[5] John Whitehead. On the bias of maximum likelihood estimation following a sequential test. Biometrika, 73:573–581, 1986.

[6] World Health Organization. Blood Donor Selection: Guidelines on Assessing Donor Suitability for Blood Donation. World Health Organization, 2012.

[7] Femmeke J. Prinsze, Rosa de Groot, Tiffany C. Timmer, Saurabh Zalpuri, and Katja van den Hurk. Donation-induced iron depletion is significantly associated with low hemoglobin at subsequent donations. Transfusion, 61:3344–3352, 12 2021.

[8] I Veldhuizen and E Wagenmans. Domaine survey on donor management in Europe, pages 148–149. 2010.

[9] Marloes L.C. Spekman, Theo G. van Tilburg, and Eva Maria Merz. Do deferred donors continue their donations? a large-scale register study on whole blood donor return in the netherlands. Transfusion, 59:3657–3665, 12 2019.

[10] Brian Custer, Karen S. Schlumpf, David Wright, Toby L. Simon, Susan Wilkinson, and Paul M. Ness. Donor return after temporary deferral. Transfusion, 51:1188–1196, 6 2011.

[11] Adrian Bruhin, Lorenz Goette, Simon Haenni, Lingqing Jiang, Alexander Markovic, Adrian Roethlisberger, Regula Buchli, and Beat M. Frey. The sting of rejection: Deferring blood donors due to low hemoglobin values reduces future returns. Transfusion Medicine and Hemotherapy, 47:119–128, 2020.

[12] Meaghan M. Bond and Rebecca R. Richards-Kortum. Drop-to-drop variation in the cellular components of fingerprick blood: Implications for point-of-care diagnostic development. American Journal of Clinical Pathology, 144:885–894, 12 2015.

[13] Laura S. Hackl, Crystal D. Karakochuk, Dora Inés Mazariegos, Kidola Jeremiah, Omar Obeid, Nirmal Ravi, Desalegn A. Ayana, Veronica Varela, Silvia Alayón, Omar Dary, and Denish Moorthy. Assessing accuracy and precision of hemoglobin determination in venous, capillary pool, and single-drop capillary blood specimens using three different hemocue® hb models: The multicountry hemoglobin measurement (heme) study. Journal of Nutrition, 154:2326–2334, 7 2024.

[14] David W. Killilea, Frans A. Kuypers, Sandra K. Larkin, and Kathleen Schultz. Blood draw site and analytic device influence hemoglobin measurements. PLoS ONE, 17, 11 2022.

[15] Steven Bell, Michael Sweeting, Anna Ramond, Ryan Chung, Stephen Kaptoge, Matthew Walker, Thomas Bolton, Jennifer Sambrook, Carmel Moore, Amy McMahon, Sarah Fahle, Donna Cullen, Susan Mehenny, Angela M. Wood, Jane Armitage, Willem H. Ouwehand, Gail Miflin, David J. Roberts, John Danesh, and Emanuele Di Angelantonio. Comparison of four methods to measure haemoglobin concentrations in whole blood donors (compare): A diagnostic accuracy study. Transfusion Medicine, 31:94–103, 4 2021.

[16] Mart P. Janssen. Why the majority of on-site repeat donor deferrals are completely unwarranted.... Transfusion, 62:2068–2075, 10 2022.

[17] Saurabh Zalpuri, Bas Romeijn, Elias Allara, Mindy Goldman, Hany Kamel, Jed Gorlin, Ralph Vassallo, Yves Grégoire, Naoko Goto, Peter Flanagan, Joanna Speedy, Andreas Buser, Jose Mauro Kutner, Karin Magnussen, Johanna Castrén, Liz Culler, Harry Sussmann, Femmeke J. Prinsze, Kevin Belanger, Veerle Compernolle, Pierre Tiberghien, Jose Manuel Cardenas, Manish J. Gandhi, Kamille A. West, Cheuk-Kwong Lee, Sian James, Deanne Wells, Laurie J. Sutor, Silvano Wendel, Matthew Coleman, Axel Seltsam, Kimberly Roden, Whitney R. Steele, Milos Bohonek, Ramir Alcantara, Emanuele Di Angelantonio, and Katja van den Hurk. Variations in hemoglobin measurement and eligibility criteria across blood donation services are associated with differing low-hemoglobin deferral rates: a best collaborative study. Transfusion, 60:544–552, 3 2020.

[18] Ryan K. Chung, Angela M. Wood, and Michael J. Sweeting. Biases incurred from nonrandom repeat testing of haemoglobin levels in blood donors: Selective testing and its implications. Biometrical Journal, 61:454–466, 3 2019.

[19] Mart Pothast, Katja van den Hurk, and Mart P. Janssen. Modeling the effect of conditionally repeating hemoglobin measurements prior to blood donation. Transfusion, 65:1395–1399, 8 2025.

[20] Federica Braga and Mauro Panteghini. Generation of data on within-subject biological variation in laboratory medicine: An update. Critical Reviews in Clinical Laboratory Sciences, 53:313–325, 9 2016.

[21] Norman Lloyd Johnson, Samuel Kotz, and N. Balakrishnan. Continuous univariate distributions. Wiley, 1994.

[22] Paul Gustafson. Measurement error and misclassification in statistics and epidemiology: impacts and Bayesian adjustments. Chapman and Hall/CRC, 2003.

[23] Sander Greenland. Multiple-bias modelling for analysis of observational data. Journal of the Royal Statistical Society Series A: Statistics in Society, 168(2):267–306, 2005.

[24] Yu Luo, David A Stephens, and David L Buckeridge. Estimating prevalence using indirect information and bayesian evidence synthesis. Canadian Journal of Statistics, 46(4):673–689, 2018.

[25] HK Walker, WD Hall, and JW Hurst, editors. Clinical Methods: The History, Physical, and Laboratory Examinations, Chapter 151. Butterworths, 3 edition, 1990.

[26] Aki Vehtari, Andrew Gelman, and Jonah Gabry. Practical bayesian model evaluation using leave-one-out cross-validation and waic. Statistics and Computing, 27:1413–1432, 9 2017.

A Blood donor data selection

Our applied example uses data from Vitalant, a large blood operator in the United States. The full dataset contains hemoglobin measurements from donor visits between January 1, 2017 and October 31, 2022 and comprises 2,582,402 unique donors with 9,099,136 visits recorded in the database, of which 6,528,084 had a pre-donation fingerstick Hb measurement. The majority of the visits were intended for whole blood donation, comprising 68% of the total. Double red-cell donation visits accounted for 10% of visits and plasma and platelet donation visits for 12%, while the remaining 10% of visits did not result in a successful donation for various reasons. The sex-specific hemoglobin threshold for donation eligibility was 13 g/dL for males and 12.5 g/dL for females. If the initial pre-donation fingerstick Hb fell below this threshold, a second fingerstick Hb measurement was performed and recorded in the database. The majority of visits (90%) with an Hb value below the threshold for the first test underwent a second test the same day. The data selection flow chart is shown in fig. A1.

B Additional figures

C Additional tables

Table C1: Data-generating parameter values used in the simulation study. Latent parameters specify the distribution of the true latent quantity; measurement parameters govern observation noise. Vectors give sex-specific values in the order (male, female). For model c, $\sigma_{\mathrm{meas},1}$ and $\sigma_{\mathrm{meas},2}$ are the mixture component scales and $\pi$ is the mixture weight; for models b and d, df is the Student-$t$ degrees of freedom.

Model     Latent distribution parameters                                                                              Measurement parameters
Model a   $\mu = (14.8, 13.8)$; $\sigma_{\mathrm{pop}} = (0.55, 0.60)$                                                 $\sigma_{\mathrm{meas}} = (0.55, 0.55)$
Model b   $\mu = (14.8, 13.8)$; $\sigma_{\mathrm{pop}} = (0.55, 0.60)$                                                 $\sigma_{\mathrm{meas}} = (0.55, 0.55)$; df $= (5, 5)$
Model c   $\mu = (14.8, 13.8)$; $\sigma_{\mathrm{pop}} = (0.55, 0.60)$                                                 $\sigma_{\mathrm{meas},1} = (0.45, 0.45)$; $\sigma_{\mathrm{meas},2} = (2.0, 2.2)$; $\pi = 0.80$
Model d   $\mu_{\mathrm{loc}} = (14.8, 13.8)$; $\mu_{\mathrm{scale}} = (0.55, 0.60)$; $\mu_{\mathrm{skew}} = (5, -5)$  $\sigma_{\mathrm{meas}} = (0.55, 0.55)$; df $= (5, 5)$

Table C2: Priors used for real-data models (sex-specific parameters). Parameter bounds (caps) follow the model definitions: $\sigma_{\mathrm{pop}}, \sigma_{\mathrm{meas}} \in [0.2, 20]$; df $\in [2, 30]$; for the mixture model $\sigma_{\mathrm{meas},1}, \sigma_{\mathrm{meas},2} \in [0.2, 2]$ and $\pi \in [0, 1]$; for the skew model $\mu_{\mathrm{skew}} \in [-5, 5]$, $\sigma_{\mathrm{meas}} \in [0.2, 2]$, and df $\in [2, 30]$.

Model                     Priors
Model a (Normal–normal)   $\mu \sim \mathcal{N}(15, 2)$; $\sigma_{\mathrm{pop}} \sim \mathcal{N}(0, 2)$; $\sigma_{\mathrm{meas}} \sim \mathcal{N}(0, 2)$
Model b (Normal–t)        $\mu \sim \mathcal{N}(15, 2)$; $\sigma_{\mathrm{pop}} \sim \mathcal{N}(0, 2)$; $\sigma_{\mathrm{meas}} \sim \mathcal{N}(0, 2)$; df $\sim \mathrm{Gamma}(2, 0.1)$
Model c (Mixture)         $\mu \sim \mathcal{N}(15, 2)$; $\sigma_{\mathrm{pop}} \sim \mathcal{N}(0, 2)$; $\sigma_{\mathrm{meas},1} \sim \mathcal{N}(0, 2)$; $\sigma_{\mathrm{meas},2} \sim \mathcal{N}(2, 2)$; $\pi \sim \mathrm{Beta}(2, 2)$
Model d (Skew–t)          $\mu_{\mathrm{loc}} \sim \mathcal{N}(15, 2)$; $\mu_{\mathrm{scale}} \sim \mathcal{N}(1, 2)$; $\mu_{\mathrm{skew}} \sim \mathcal{N}(0, 2)$; $\sigma_{\mathrm{meas}} \sim \mathcal{N}(1, 2)$; df $\sim \mathrm{Gamma}(2, 0.1)$

Figure 1: (Top) Distribution of the first Hb measurement of males and females at all blood donor visits. (Bottom) Scatterplot of the first (X1) and the repeat (X2) Hb measurement among the subset of donors with two measurements at the same visit.

Figure 2: Scatter plot of the initial and repeat measurement of 1000 simulated conditionally repeated measurements. Density plots show the marginal distribution of $x_1$ and $x_2$ when pairs are observed for all individuals (green) and when $x_2$ is only observed when $x_1$ falls below $c = 13$ g/dL (orange).

Figure 3: Difference between estimated $\hat{\sigma}_{\mathrm{meas}}$ and simulated $\sigma_{\mathrm{meas}} = 0.8$ g/dL. Shown is the mean with 95% CI of 200 repeats of simulated datasets with recheck probability parameter $r$ from eq. (23).

Figure 4: Violin plots depicting the distribution of bootstrapped estimates for population variance and measurement error variance using the conditional expectation method and maximum likelihood method, stratified by sex.

Figure 5: Percent difference between the estimated measurement error variance and the true variance (accounting for degrees of freedom) for the conditional expectation and maximum likelihood methods under truncation with $t$-distributed measurement error. Points show mean bias across 100 simulations with 95% confidence intervals, plotted against the degrees of freedom (df) on a log scale.

Figure 6: Graphic representation of the two-level hierarchical model used in the Bayesian framework.

Figure 7: Delta cLPPD heatmap from 5-fold CV on synthetic datasets. Each tile shows the total cLPPD for a fitted model minus the best cLPPD within the same true-model dataset (higher is better, 0 indicates the winner). Negative values indicate worse predictive performance relative to the best model for that dataset.

Figure 8: Prior, likelihood, and posterior for three scenarios (columns A–C) with a 13 g/dL cutoff and initial Hb of 12.8 g/dL. Scenario A uses one measurement; B adds 12.4 g/dL; C adds 13.2 g/dL. Red dashed lines mark the cutoff, black dashed lines the observed measurements.

Figure A1: Flow chart of selecting donation visits.

Figure B1: Bias under conditional repeat measurements across parameter sets. Lines show the theoretical bias as a function of truncation severity, and shaded bands show bootstrap 95% confidence intervals for the numerical bias from simulations; stronger truncation yields larger downward bias.

Figure B2: Simulating additional dependencies for repeating a measurement. Each colored line corresponds to the probability of performing a repeat measurement based on the initial measurement.

Figure B3: Distributional assumptions tested.

Figure B4: Trace plots of the model for the global parameters for males and females. The plots show the parameter estimates of the four chains across the post-warm-up iterations.

Table C3: Parameter recovery summary for data simulation under each generation process (posterior mean and 95% credible interval).

Model     Parameter                        Mean      2.5%      97.5%
Model a   $\mu_1$                          14.825    14.780    14.870
          $\mu_2$                          13.764    13.716    13.812
          $\sigma_{\mathrm{meas},1}$       0.542     0.517     0.569
          $\sigma_{\mathrm{meas},2}$       0.557     0.530     0.587
          $\sigma_{\mathrm{pop},1}$        0.601     0.561     0.642
          $\sigma_{\mathrm{pop},2}$        0.624     0.581     0.669
Model b   df$_1$                           5.711     4.234     7.617
          df$_2$                           6.038     4.445     7.771
          $\mu_1$                          14.783    14.736    14.832
          $\mu_2$                          13.802    13.753    13.851
          $\sigma_{\mathrm{meas},1}$       0.542     0.497     0.585
          $\sigma_{\mathrm{meas},2}$       0.564     0.518     0.606
          $\sigma_{\mathrm{pop},1}$        0.566     0.519     0.614
          $\sigma_{\mathrm{pop},2}$        0.606     0.557     0.655
Model c   $\pi$                            0.739     0.471     0.838
          $\mu_1$                          14.821    14.774    14.869
          $\mu_2$                          13.829    13.777    13.880
          $\sigma_{\mathrm{meas},1,1}$     0.416     0.276     0.487
          $\sigma_{\mathrm{meas},1,2}$     0.719     0.426     1.565
          $\sigma_{\mathrm{meas},2,1}$     1.869     1.340     2.214
          $\sigma_{\mathrm{meas},2,2}$     1.732     0.264     2.433
          $\sigma_{\mathrm{pop},1}$        0.545     0.481     0.597
          $\sigma_{\mathrm{pop},2}$        0.615     0.551     0.669
Model d   df$_1$                           5.611     4.290     7.301
          df$_2$                           5.949     4.512     7.612
          $\mu_{\mathrm{loc},1}$           14.864    14.773    14.973
          $\mu_{\mathrm{loc},2}$           13.839    13.728    13.923
          $\mu_{\mathrm{scale},1}$         0.479     0.375     0.576
          $\mu_{\mathrm{scale},2}$         0.658     0.559     0.750
          $\mu_{\mathrm{skew},1}$          4.849     1.247     11.590
          $\mu_{\mathrm{skew},2}$          -5.638    -11.713   -2.122
          $\sigma_{\mathrm{meas},1}$       0.568     0.524     0.609
          $\sigma_{\mathrm{meas},2}$       0.580     0.538     0.622

Table C4: Total cLPPD by true model and fitted model (5-fold CV).

True model   Fitted model   Total cLPPD
Model a      Model a        -4012.0
             Model b        -4031.9
             Model c        -4013.2
             Model d        -4032.9
Model b      Model a        -4402.6
             Model b        -4336.3
             Model c        -4339.1
             Model d        -4340.4
Model c      Model a        -5346.9
             Model b        -4910.0
             Model c        -4889.4
             Model d        -4910.8
Model d      Model a        -4173.6
             Model b        -4037.0
             Model c        -4045.7
             Model d        -4037.3

Table C5: Per-fold computed LPPD (cLPPD) for the four candidate models. Each row lists the per-fold cLPPD (folds 1–5) followed by the mean and standard error (SE) across the five folds. Larger (less negative) cLPPD indicates better predictive performance. The maximum cLPPD in each column is marked with an asterisk.

Model                     Fold 1       Fold 2       Fold 3       Fold 4       Fold 5       Mean         SE
Model a (Normal–normal)   -11298.27    -11335.27    -11338.68    -11229.12    -11122.52    -11264.77    40.66
Model b (Normal–t)        -11251.73    -11313.39    -11309.15    -11151.97*   -11105.21    -11226.29    42.00
Model c (Mixture)         -11243.57    -11310.46    -11305.82*   -11199.73    -11098.05    -11231.53    39.19
Model d (Skew–t)          -11242.39*   -11306.80*   -11310.03    -11164.87    -11095.61*   -11223.94*   41.58
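
The paired comparison against Model d reported in table 3 can be recomputed from the per-fold values in table C5. The sketch below is one way to do so, assuming the SD is the sample standard deviation of the five per-fold differences and the SE is that SD divided by the square root of the number of folds. The helper name `paired_summary` is hypothetical and is not taken from the analysis code. Under these assumptions the output should agree with table 3 up to rounding.

```python
import math
import statistics

# Per-fold cLPPD values copied from Table C5 (folds 1-5)
clppd = {
    "a": [-11298.27, -11335.27, -11338.68, -11229.12, -11122.52],
    "b": [-11251.73, -11313.39, -11309.15, -11151.97, -11105.21],
    "c": [-11243.57, -11310.46, -11305.82, -11199.73, -11098.05],
    "d": [-11242.39, -11306.80, -11310.03, -11164.87, -11095.61],
}

def paired_summary(other, reference):
    """Mean, SD and SE of per-fold differences (other - reference)."""
    diffs = [o - r for o, r in zip(other, reference)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)           # sample SD (n - 1 denominator)
    se = sd / math.sqrt(len(diffs))        # standard error across folds
    return mean, sd, se

# Compare each alternative model with Model d, fold by fold
summary = {m: paired_summary(clppd[m], clppd["d"]) for m in ("b", "c", "a")}
for model, (mean, sd, se) in summary.items():
    print(f"Model {model} - Model d: mean={mean:.2f}, SD={sd:.2f}, SE={se:.2f}")
```

Pairing the differences by fold, rather than comparing the two across-fold means with their separate SEs, removes the fold-to-fold variation shared by all models, which is why the SE of the difference is much smaller than the per-model SEs in table C5.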
