Sample size and power determination for assessing overall SNP effects in joint modeling of longitudinal and time-to-event data


📝 Original Info

  • Title: Sample size and power determination for assessing overall SNP effects in joint modeling of longitudinal and time-to-event data
  • ArXiv ID: 2602.15247
  • Date: 2026-02-16
  • Authors: Not provided. (The source does not name the authors, so they are recorded as "author unknown.")

📝 Abstract

Longitudinal biomarkers are frequently collected in clinical studies due to their strong association with time-to-event outcomes. While considerable progress has been made in methods for jointly modeling longitudinal and survival data, comparatively little attention has been paid to statistical design considerations, particularly sample size and power calculations, in genetic studies. Yet, appropriate sample size estimation is essential for ensuring adequate power and valid inference. Genetic variants may influence event risk through both direct effects and indirect effects mediated by longitudinal biomarkers. In this paper, we derive a closed-form sample size formula for testing the overall effect of a single nucleotide polymorphism within a joint modeling framework. Simulation studies demonstrate that the proposed formula yields accurate and robust performance in finite samples. We illustrate the practical utility of our method using data from the Diabetes Control and Complications Trial.

📄 Full Content

Diabetes, characterized by chronically elevated blood glucose levels, is among the most prevalent and rapidly increasing chronic diseases worldwide. In 2021, an estimated 38.4 million Americans, approximately 11.6% of the U.S. population, were living with diabetes (Centers for Disease Control and Prevention, 2021).

Globally, the number of adults with diabetes is projected to reach 693 million by 2045, a more than 50% increase from 2017 levels (Cho et al., 2018). As a leading cause of death, diabetes is associated with a wide range of long-term complications, including microvascular conditions (e.g., nephropathy, retinopathy, and neuropathy) and macrovascular diseases (e.g., cardiovascular disease and stroke) (Papatheodorou et al., 2018). These complications contribute substantially to increased mortality, blindness, kidney failure, and reduced quality of life (Morrish et al., 2001). Type 1 diabetes, also known as insulin-dependent diabetes mellitus, results from the pancreas’s inability to produce sufficient insulin. Individuals with type 1 diabetes require lifelong insulin therapy to maintain normoglycemia (Chiang et al., 2014). Although its precise etiology remains unclear, it is widely believed to involve complex interactions between genetic and environmental factors (Cole and Florez, 2020).

Because poor glycemic control is a major risk factor for diabetes-related complications (Knuiman et al., 1986), a key scientific question is whether genetic variants influence the timing of these complications through glycemic pathways, typically measured via hemoglobin A1c (HbA1c), or through alternative mechanisms. Addressing this question requires a study design with adequate statistical power, as sample size directly impacts the precision of parameter estimates and the ability to detect meaningful genetic effects. However, practical constraints often limit data collection, making rigorous sample size estimation a critical component of study design.

Several statistical approaches have been developed for sample size estimation in survival analysis. Schoenfeld (1983) introduced a widely used formula based on the Cox proportional hazards model for comparing two randomized groups. Other contributions include log-rank test-based approaches (Freedman, 1982; Lakatos, 1988) and methods for non-binary covariates under exponential survival assumptions (Zhen and Murphy, 1994). Hsieh and Lavori (2000) extended Schoenfeld’s formula to accommodate continuous and categorical covariates without assuming an exponential survival distribution. More recently, Chen et al. (2011) proposed sample size formulas for the association between longitudinal biomarkers and survival outcomes, as well as for the overall treatment effect. Wang et al. (2014) developed a sample size formula that incorporates time-dependent covariates in proportional hazards models.
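For orientation, the classical event-count calculation behind these references can be sketched in a few lines. This is not the formula derived in this paper. It is the standard form usually attributed to Schoenfeld (1983) for a binary group indicator, with the generalization to an arbitrary covariate variance commonly associated with Hsieh and Lavori (2000). The function names, the two-sided test, and the default values of `alpha` and `power` are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, covariate_variance, alpha=0.05, power=0.8):
    """Approximate number of events needed to detect a given hazard ratio.

    Uses D = (z_{1-alpha/2} + z_{power})^2 / (variance * log(HR)^2),
    assuming a two-sided test on the log hazard ratio in a Cox model.
    """
    z = NormalDist()
    z_alpha = -z.inv_cdf(alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (
        covariate_variance * math.log(hazard_ratio) ** 2
    )


def schoenfeld_events(hazard_ratio, allocation=0.5, alpha=0.05, power=0.8):
    """Two-group special case: a binary covariate has variance p * (1 - p)."""
    variance = allocation * (1 - allocation)
    return required_events(hazard_ratio, variance, alpha, power)


def total_sample_size(events, event_probability):
    """Convert required events to subjects, given the overall event probability."""
    return math.ceil(events / event_probability)


if __name__ == "__main__":
    d = schoenfeld_events(hazard_ratio=2.0)
    print(f"events needed: {d:.1f}")
    print(f"subjects needed at 30% event probability: {total_sample_size(d, 0.3)}")
```

The event count depends on the covariate only through its variance, which is why unbalanced groups or rare genotypes inflate the required sample size.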

In genetic studies, the exposure variable is often a single nucleotide polymorphism (SNP), typically coded as a dosage variable with three levels (e.g., 0, 1, or 2 copies of a risk allele). Depending on the minor allele frequency, genotype distribution may be highly unbalanced, potentially reducing statistical efficiency and affecting the validity of asymptotic approximations. Moreover, unlike treatment effects, which are generally assumed to be directional (beneficial or harmful), genetic effects can increase or decrease the risk or timing of an event, necessitating two-sided hypothesis testing.
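The effect of an unbalanced genotype distribution can be illustrated with a minimal sketch. It assumes Hardy-Weinberg equilibrium and additive (0/1/2) dosage coding, and it plugs the resulting dosage variance into the classical variance-based event-count formula rather than the joint-model formula derived in this paper. The function names are hypothetical, and the default threshold of 5e-8 is an assumed conventional genome-wide significance level, not a value taken from the text.

```python
import math
from statistics import NormalDist


def genotype_frequencies(maf):
    """Frequencies of 0, 1, and 2 copies of the minor allele under Hardy-Weinberg equilibrium."""
    return ((1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2)


def dosage_variance(maf):
    """Variance of the additive dosage; equals 2 * maf * (1 - maf) under HWE."""
    freqs = genotype_frequencies(maf)
    mean = sum(copies * f for copies, f in enumerate(freqs))
    return sum((copies - mean) ** 2 * f for copies, f in enumerate(freqs))


def events_for_snp(hazard_ratio, maf, alpha=5e-8, power=0.8):
    """Approximate events needed for a two-sided test of a per-allele hazard ratio."""
    z = NormalDist()
    z_alpha = -z.inv_cdf(alpha / 2)  # two-sided: the effect may go either way
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (
        dosage_variance(maf) * math.log(hazard_ratio) ** 2
    )


if __name__ == "__main__":
    for maf in (0.05, 0.2, 0.5):
        events = events_for_snp(hazard_ratio=1.5, maf=maf)
        print(f"MAF={maf:.2f}: about {math.ceil(events)} events")
```

Lower minor allele frequencies shrink the dosage variance, so the same per-allele hazard ratio requires more events to detect.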

In this paper, we derive a closed-form sample size formula for testing the overall SNP effect within a joint modeling framework for longitudinal and survival data. We also provide an interactive Shiny app for determining sample size and power, available at https://krisyuanbian.shinyapps.io/PowerSNP/. Our method extends the approach developed by Chen et al. (2011) to accommodate non-binary covariates such as SNPs. We illustrate the utility of our approach using data from the Diabetes Control and Complications Trial (DCCT; The DCCT Research Group, 1986, 1990), a multicenter randomized controlled clinical trial initiated in 1983 that enrolled 1,441 individuals with type 1 diabetes. Participants were randomized to receive either intensive insulin therapy or conventional treatment and were followed for an average of 6.5 years. The study collected rich longitudinal data on glycemic control (e.g., quarterly HbA1c measurements) and detailed time-to-event data on diabetic complications such as retinopathy. A genome-wide association study (GWAS) by Paterson et al. (2010) identified several SNPs associated with these complications in both treatment arms. Our analysis suggests that, for either treatment arm, the DCCT sample size is insufficient to achieve adequate power at conventional GWAS significance levels.

The remainder of this paper is organized as follows. Section 2 introduces the joint modeling framework and presents our new sample size formula for assessing overall SNP effects. Section 3 evaluate

This content is AI-processed based on open access ArXiv data.
