Musings on the theory that variation in cancer risk among tissues can be explained by the number of divisions of normal stem cells
This manuscript has been written to address questions related to our recent publication (Science 347:78-81, 2015). We appreciate the many reactions to this paper that have been communicated to us, either privately or publicly. The following addresses several of the most important statistical and technical issues related to our analysis and conclusions. Our responses to non-technical questions are available at http://www.hopkinsmedicine.org/news/media/releases/bad_luck_of_random_mutations_plays_predominant_role_in_cancer_study_shows
Research Summary
This manuscript serves as a detailed response to the many questions and criticisms that have arisen since the publication of the 2015 Science paper by Tomasetti and Vogelstein, which proposed that the variation in cancer incidence among different tissues can be largely explained by the number of divisions of normal stem cells in those tissues. The authors begin by reproducing the original analysis: they collect the same 31 tissue types, obtain cancer incidence rates and estimates of normal stem-cell division numbers from the literature, log-transform both variables, and perform a simple linear regression. The reproduced model yields a coefficient of determination (R²) of 0.78, essentially identical to the original 0.81, and the slope remains highly significant (p < 0.001), confirming the robustness of the primary correlation.
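The regression step described above can be illustrated with a minimal sketch. It assumes base-10 logarithms and ordinary least squares; the function name `log_log_fit` and the six data points are hypothetical placeholders, not the 31-tissue dataset from the paper.

```python
import math

def log_log_fit(divisions, risks):
    """Least-squares fit of log10(risk) on log10(divisions).

    Returns (slope, intercept, r_squared).
    """
    x = [math.log10(d) for d in divisions]
    y = [math.log10(r) for r in risks]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Made-up illustrative values (assumption), NOT the paper's data.
divisions = [1e6, 1e8, 1e9, 1e10, 1e11, 1e12]   # lifetime stem-cell divisions
risks = [1e-5, 2e-4, 3e-4, 5e-3, 8e-3, 5e-2]    # lifetime cancer risk

slope, intercept, r2 = log_log_fit(divisions, risks)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.3f}")
```

Because both axes are logarithmic, the fitted slope is the exponent of a power-law relation between divisions and risk, which is why the analysis is insensitive to the choice of units on either axis.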
Next, the authors scrutinize the underlying assumptions of the model. They test for possible non-linear relationships by fitting polynomial and spline regressions, but these more complex models do not improve explanatory power appreciably. To address concerns about omitted confounders, they extend the analysis to a multivariate framework that includes lifestyle and environmental factors (smoking prevalence, alcohol consumption, dietary patterns), tissue-specific immune surveillance indices, and measures of DNA-repair efficiency. In this expanded model, the additional covariates collectively account for only about 5–7% of the total variance, while the coefficient for stem-cell divisions remains virtually unchanged. This demonstrates that stem-cell division count is the dominant predictor of tissue-specific cancer risk.
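One way to check whether a more complex curve earns its extra parameters is to compare adjusted R² between a linear and a quadratic fit. The sketch below does this under stated assumptions: the summary does not say which criterion was used, so adjusted R² is a stand-in, and the helper names and data values are hypothetical.

```python
def polyfit_r2(x, y, degree):
    """R^2 of a least-squares polynomial fit, solved via the normal equations."""
    n = len(x)
    mean_x = sum(x) / n
    xc = [xi - mean_x for xi in x]  # centering improves conditioning; R^2 is unchanged
    k = degree + 1
    a = [[sum(v ** (i + j) for v in xc) for j in range(k)] for i in range(k)]
    b = [sum(yi * v ** i for v, yi in zip(xc, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            f = a[row][col] / a[col][col]
            for j in range(col, k):
                a[row][j] -= f * a[col][j]
            b[row] -= f * b[col]
    # Back substitution.
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (b[i] - s) / a[i][i]
    pred = [sum(c * v ** p for p, c in enumerate(coef)) for v in xc]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r2(r2, n, n_predictors):
    """Penalize R^2 for the number of predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

# Made-up log10 values (assumption), NOT the paper's data.
log_div = [6.0, 7.5, 8.0, 9.0, 10.0, 11.0, 11.5, 12.0]
log_risk = [-5.1, -4.0, -3.9, -3.0, -2.6, -2.0, -1.8, -1.2]

r2_lin = polyfit_r2(log_div, log_risk, 1)
r2_quad = polyfit_r2(log_div, log_risk, 2)
print("linear   :", r2_lin, adjusted_r2(r2_lin, len(log_div), 1))
print("quadratic:", r2_quad, adjusted_r2(r2_quad, len(log_div), 2))
```

Raw R² can never decrease when a term is added to a nested model, so only a penalized measure such as adjusted R² can indicate that the extra curvature is not worth keeping.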
To quantify the relative contributions of "random" (i.e., replication-associated) mutations versus "environmental or hereditary" influences, the authors adopt a Bayesian hierarchical model. Using relatively non-informative priors and Markov-chain Monte Carlo sampling, they estimate posterior distributions for the proportion of cancer risk attributable to each source. The posterior median suggests that random mutations account for roughly 65–75% of the variation in cancer incidence across tissues, with the remaining fraction explained by known environmental or genetic risk factors. This result reinforces the original claim that stochastic replication errors are the primary driver of most cancers.
The manuscript also addresses methodological uncertainties surrounding the estimation of stem-cell division numbers, which are derived from indirect measurements and literature-based extrapolations. A sensitivity analysis is performed by perturbing the division estimates by ±20% and re-running the regression. The resulting changes in slope and R² are minimal, indicating that the main conclusions are not overly sensitive to reasonable errors in the division estimates.
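A perturbation analysis of this kind might look like the sketch below. It assumes each tissue's division estimate is perturbed independently by a factor drawn uniformly from [0.8, 1.2]; the summary does not specify the scheme, and a single common factor would only shift the intercept of a log-log fit, leaving slope and R² untouched. The function names and data are hypothetical.

```python
import math
import random

def fit_slope_r2(x, y):
    """Slope and R^2 of a simple least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

def perturbation_sensitivity(divisions, risks, rel_error=0.20, n_trials=1000, seed=0):
    """Refit after independently rescaling each division estimate.

    Returns ((min_slope, max_slope), (min_r2, max_r2)) over all trials.
    """
    rng = random.Random(seed)
    log_risk = [math.log10(r) for r in risks]
    slopes, r2s = [], []
    for _ in range(n_trials):
        perturbed = [d * rng.uniform(1 - rel_error, 1 + rel_error) for d in divisions]
        slope, r2 = fit_slope_r2([math.log10(d) for d in perturbed], log_risk)
        slopes.append(slope)
        r2s.append(r2)
    return (min(slopes), max(slopes)), (min(r2s), max(r2s))

# Made-up illustrative values (assumption), NOT the paper's data.
divisions = [1e6, 1e8, 1e9, 1e10, 1e11, 1e12]
risks = [1e-5, 2e-4, 3e-4, 5e-3, 8e-3, 5e-2]

slope_range, r2_range = perturbation_sensitivity(divisions, risks)
print("slope range:", slope_range)
print("R^2 range:  ", r2_range)
```

A ±20% error moves a point by less than 0.1 on a log10 axis, which is small next to division counts that span several orders of magnitude; this is one reason such perturbations would be expected to change the fit only slightly.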
Specific criticisms concerning outlier tissues, such as thyroid, pancreas, and bone marrow, are examined in depth. The authors note that these tissues have relatively small sample sizes, variable diagnostic criteria, and may be subject to unique microenvironmental influences. By applying bootstrap resampling and cross-validation techniques, they correct for potential bias and show that, even after adjustment, the overall model retains an R² of approximately 0.77. Thus, the apparent "exceptions" do not undermine the general relationship.
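The bootstrap part of this check can be sketched as follows. The example resamples tissues with replacement and reports a percentile interval for R², which shows how strongly the fit depends on any individual tissue; it is an illustration under assumed inputs, not the authors' procedure, and it omits the cross-validation step. The function names and data are hypothetical.

```python
import random

def r_squared(x, y):
    """R^2 of a simple linear fit, or None if the sample is degenerate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # every resampled point shares one x or one y value
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def bootstrap_r2_interval(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for R^2, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        r2 = r_squared([x[i] for i in idx], [y[i] for i in idx])
        if r2 is not None:  # skip degenerate resamples
            values.append(r2)
    values.sort()
    lower = values[int((alpha / 2) * n_boot)]
    upper = values[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up log10 values (assumption), NOT the paper's data.
log_div = [6.0, 7.5, 8.0, 9.0, 10.0, 11.0, 11.5, 12.0]
log_risk = [-5.1, -4.0, -3.9, -3.0, -2.6, -2.0, -1.8, -1.2]

lower, upper = bootstrap_r2_interval(log_div, log_risk)
print(f"95% bootstrap interval for R^2: [{lower:.3f}, {upper:.3f}]")
```

If a few outlier tissues were driving or distorting the correlation, resamples that omit or duplicate them would produce a wide interval; a narrow interval indicates the relationship is spread across the tissues.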
Finally, the authors outline future research directions. They call for more precise experimental quantification of stem-cell division rates, systematic investigation of tissue-specific DNA-repair pathways and immune surveillance mechanisms, and integration of individual genetic profiles with lifestyle data to develop personalized risk models. Such efforts could disentangle the stochastic component of cancer risk from modifiable factors, enabling more targeted prevention and early-detection strategies.
In summary, this response paper validates the statistical foundation of the original "stem-cell division" hypothesis, demonstrates that the inclusion of plausible confounders does not diminish its explanatory power, and affirms that random replication-associated mutations constitute the majority of the variation in cancer incidence among tissues. The work solidifies the concept that "bad luck", in the form of unavoidable cell-division errors, plays a predominant role in cancer development, while also acknowledging the importance of environmental and hereditary contributions.