A Bayesian Gamma-power-mixture survival regression model: predicting the recurrence of prostate cancer post-prostatectomy

A Ba y esian Gamma-p o w er-mixture surviv al regression mo del: predicting the recurrence of prostate cancer p ost-prostatectom y ∗ T omm y W alk er Mac k a y 1 , Mingtong Xu 2 , Shahrokh F. Shariat 3 , and Roger Sew ell 4,5 1 Mathematical Institute, Univ ersity of Oxford; but at the time this w ork was done at T rinity College, Cam bridge, CB2 1TQ, United Kingdom 2 Medical Researc h Council Lab oratory of Molecular Biology , Cambridge, CB2 0QH, United Kingdom 3 Departmen t of Urology , Medical Univ ersity of Vienna, Austria 4 T rinity College, Cam bridge, CB2 1TQ, United Kingdom 5 roger.sewell@can tab.net Marc h 27, 2026 Ac knowledgemen ts: T o Mont y Barlo w of Cambridge Consultants for his assistance with data access, and to Lalita Ramakrishnan for motiv ating the dev elopment of the mo del and organising funding for T ommy W alker Mack ay and Mingtong Xu. Abstract W e constructed a Bay esian hierarc hical mo del for predicting the probabilit y distribution of surviv al from a disease based on a n umber of observed explanatory v ariables. This mo del considers an unkno wn n umber of comp eting mo des of death, each of which has a Gamma-p o wer distribution of surviv al time, and each of whic h is activ e with probabilit y giv en by a logistic linear regression of explanatory v ariables with unknown coeﬃcients. Cho osing suitable priors we applied this mo del to predicting recurrence time of prostate cancer following radical prostatectomy based on preop erativ e markers, clinical data, and op erativ e ﬁndings, using a dataset previously collected b y Shariat et al. W e estimated the apparent Shannon information (ASI) in predictions based on the mo del on unseen patients using a v ariety of subsets of the a v ailable data. Results: In all the subsets examined the ASI was positive with posterior probability greater than 0.975 . Using only age and results of pre-op erative blo o d tests (PSA and biomarkers) we achiev ed 0.232 (0.180 to 0.290) nats ASI (0.335 (0.260 to 0.419) bits) (p osterior mean and equitailed 95% p osterior conﬁdence in terv als). This is more than double the mean p osterior ASI previously achiev ed on the same dataset b y a subset of the current authors using a log-sk ew-Student-mixture mo del, and is greater than that previous v alue with p osterior probability greater than 0.99 . Additionally using pre- or p ost-operative Gleason grades, op erativ e ﬁndings, clinical stage, and presence or absence of extraprostatic extension or seminal vesicle in v asion did not increase the ASI extracted. Ho wev er removing the blo od-based biomark ers and replacing them with either pre-op erativ e Gleason grades or ﬁndings a v ailable from MRI scanning greatly reduced the av ailable ASI to resp ectiv ely 0.077 (0.038 to 0.120) and 0.088 (0.045 to 0.132) nats (b oth less than the v alues using blo o d-based biomarkers with p osterior probability greater than 0.995). A greedy approach to selection of the best biomark ers ga ve TGF β 1 , VCAM1, IL6sR, and uP A in descending order of imp ortance from those examined. W e stress that these ﬁndings apply only in a set of patients who hav e all undergone radical prostatectom y , and that the ﬁndings in patien ts taking a watc hful waiting approach might b e very diﬀerent. ∗ First v ersion deposited in arxiv; RFS version 1.23.1.1 . 1 Con ten ts 1 In tro duction 2 2 Bac kground 3 2.1 Assessmen t of the results of a prediction metho d . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Previous mo delling of prostate cancer relapse time . . . . . . . . . . . . . . . . . . . . . 3 3 The mo del 4 3.1 Com bination of diﬀerent mo des of death . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3.2 Mo del of a single mo de of death . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3.3 The distribution of p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.4 Priors on the parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.4.1 General form of the priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.4.2 Sp eciﬁc v alues of the hyperparameters and the resulting priors . . . . . . . . . . 6 3.5 MCMC metho dology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.6 Sampling metho ds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.7 Estimation of the ASI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4 The dataset 12 5 Results 12 6 Discussion 18 1 In tro duction Prostate cancer is the most commonly diagnosed cancer and the second leading cause of cancer death in men in the United States[ 1 ]. Up to 30% of patients exp erience at least bio c hemical recurrence follo wing initial therapy with curativ e inten t. It is b elieved that iden tifying this subset of prostate cancer patien ts at the time of initial surgery would allow selection of a subset of patients who should receive additional therapies in the mon ths following initial prostatectom y , thus a v oiding the additional cost, incon v enience, and morbidity associated with giving all patients suc h additional treatment. F urther, it may be p ossible to use such likelihoo d of early recurrence when choosing patient groups to enter clinical trials for nov el therapies, shortening the duration of such trials by facilitating selection of those patients most lik ely to experience early recurrence on the basis of biomarker data. V arious authors hav e attempted to use com binations of biomarkers and clinical data to predict which patients are at highest risk of recurrence. In particular, Shariat et al[ 2 ] collected a dataset of 423 patients with prostate cancer and analysed it using standard Co x prop ortional hazards metho ds[ 3 ] and Harrell-b o otstrapped concordance[ 4 ]. Sew ell et al[ 5 ] pro duced a Bay esian mo del based on log-sk ew-Student mixtures, and assessed it using apparent Shannon information[ 6 ], illustrating wh y this is a more reliable metho d of assessing predictions by an algorithm than Harrell-b o otstrapped concordance. How ev er, despite b eing the ﬁrst prediction metho d to produce statistically signiﬁcantly p ositiv e ASI from suc h a dataset, the amount of ASI obtained b y these authors was small (mean 0.109 nats (sligh tly less than one-sixth of a bit) b etter than a simple exp onen tial deca y prediction that is the same for all patients). 2 No w, ASI cannot exceed the true Shannon information (TSI) av ailable in the data ab out the time of relapse[ 6 ]. How ev er it is p ossible for the ASI extracted by a bad algorithm to b e less than the TSI av ail- able, or even to b e negative (i.e. more misleading than the reference prediction), and corresp ondingly p ossible for a b etter algorithm to extract more ASI. Bay esian metho ds extract ASI equal to the TSI giv en the mo delling and prior assumptions made – but if one mo del (or set of priors) reﬂects reality b etter than another, b oth TSI and ASI given that model ma y increase ov er those given the less accurate mo del. In this presen t pap er w e seek to improv e on the mo del accuracy ac hieved in [ 5 ], using a completely diﬀeren t, Gamma-p o wer-mixture based surviv al regression mo del, to which we again apply Ba yesian inference. In section 2 we review the background of mo delling eﬀorts, assessment of results, and previous attempts to predict relapse of prostate cancer. In section 3 we present the curren t mo del. In section 4 w e describ e the dataset used, and give our results in detail in section 5 b efore discussion and conclusion in section 6 . 2 Bac kground 2.1 Assessmen t of the results of a prediction metho d Man y pap ers, including [ 2 ], hav e follow ed Harrell[ 4 ] and relied on concordance of predictions with the order of relapse of pairs of patien ts whose relapse order could b e determined as a wa y of v alidating their predictions. Sewell show ed in app endix B of [ 5 ] how Harrell’s b o otstrapping can give “82% accuracy” scores to algorithms that giv e completely random outputs unrelated to truth; and concordance itself already relies on a small subset of the pairs of patients, often only 7% of the a v ailable pairs, for whom it is p ossible to determine the order of relapse (b ecause for many pairs one will b e censored before the other relapses). In consequence w e need a b etter metho d of predicting the quality of a prediction of time of relapse. In [ 6 ] the Apparen t Shannon Information (ASI) in a prediction ab out the time of relapse is put forward as an appropriate metho d, applying equally well to b oth censored and uncensored patien ts; an appropriate metho d of estimating it is also given. Giv en a reference probability distribution (such as an exp onen tial deca y common to all patien ts), the ASI scores a predicted probability distribution of relapse time by measuring the a verage logarithm of the factor by which that predicted probabilit y density exceeds the reference probability density at the actual time of relapse of an unseen patien t (and similarly for the ratio of the predicted probabilit y of non-relapse b y the censoring time to the reference probabilit y of non-relapse by the censoring time). If on av erage that factor is 2 then the ASI is one bit; if it is a factor of e then the ASI is one nat (so that 1 nat = 1.44 bits, and 1 bit = 0.69 nats). (Since b oth predicted and reference densities hav e to in tegrate to one, one cannot just increase the density everywhere.) On the other hand if the prediction is misleading and on av erage puts the probability density elsewhere than at the true time of relapse, so that the predicted densit y at the true relapse time is b elo w that of the reference densit y , then the ASI will b e negative. W e should note that any prediction metho d that gives purely p oin t predictions of relapse time, as opp osed to probability distributions on relapse time, will yield −∞ on ASI, with the sole exception that the time of relapse of ev ery unseen patient is exactly at the predicted time. 2.2 Previous mo delling of prostate cancer relapse time V arious authors hav e used nomograms (e.g. [ 7 ], [ 8 ]), Cox prop ortional hazards mo dels (e.g. [ 2 ]), and Ba yesian hierarchical models (e.g. [ 5 ]); only the last has shown statistically signiﬁcantly positive ASI resulting, although a v ariant of a Co x prop ortional hazards mo del where the hazard rate is assumed not to v ary with time could also do so in principle[ 5 ]. 3 Ho wev er, the ASI yielded by the log-skew-Studen t-mixture mo del in [ 5 ] is not even quite one sixth of a bit – hardly something to write home ab out. There are sev eral p otential con tributory reasons for this: 1. The explanatory v ariables in the dataset may not contain m uch more information ab out the time of relapse ev en given an accurate mo del; 2. The mo del may not accurately represent reality; 3. The priors used may not accurately represent reality; 4. In principle an arbitrary algorithm may giv e low er ASI than the av ailable TSI given the mo del and priors – ho wev er Bay esian mo dels av oid this, so this is not applicable in the case of [ 5 ]. 3 The mo del W e no w describe the Gamma-pow er-mixture model used in this paper; the w a y this model will be used is describ ed in app endices 1 and 2 of [ 9 ]; large parts of the mo del are similar to the surviv al mo del used in that pap er for tu b erculous meningitis, though the mo del of that paper did not allo w for any explanatory v ariables. W e adopt the Bay esian paradigm and construct a generative hierarchical Bay esian mo del as follo ws. W e supp ose that there exist an unknown n umber J of diﬀerent mec hanisms causing death, and that eac h suc h mode of death has a diﬀerent lifetime distribution. Indeed throughout this description w e will use “death” to mean any type of ev ent following which our interest in the patien t lapses, and in particular in this pap er including as an application example the even t that there is bio c hemical relapse of prostate cancer. 3.1 Com bination of diﬀerent mo des of death First let us consider a single patient i . Let x denote a lifetime, i.e. the time until a patient dies. Let j ∈ { 1 , 2 , ..., J } denote a particular mec hanism (or mo de) of death. Let x j denote the time at which mo de j would kill the patien t; we set x j = ∞ to denote the p ossibility that that mo de would nev er hav e killed the patien t. Then the patien t’s time of death is given by x = min 1 ≤ j ≤ J x j . In particular x = ∞ denotes the situation that the patient never dies (unlikely as this is). 3.2 Mo del of a single mo de of death W e now drop the subscripts j , but assume that this subsection will b e rep eated J times with the subscript j s added to ev ery random v ariable, with each of the rep etitions b eing indep enden t as far as the mo del is concerned b efore being conditioned on observ ed data. Similarly we elide the v ariable i indexing patients. When later w e w ant to refer to the complete set of J v alues of e.g. p , we will use b old face, e.g. p = ( p 1 , p 2 , ..., p J ) or p i = ( p i, 1 , p i, 2 , ..., p i,J ) for a v ariable such as p which depends on b oth patient and mo de of death. Th us w e will set P ( x | p, k , m, r ), i.e. P ( x j | p j , k j , m j , r j ), to b e such that with probability p , x k is Gamma 4 distributed with parameters m ′ = m and r ′ = mr k , and otherwise x = ∞ . Thus we hav e P ( x | p, k , m, r ) =          p | k | ( mr k ) m Γ( m ) x mk − 1 e − m ( rx ) k (0 < x < ∞ ) 1 − p ( x = ∞ ) 0 ( x ≤ 0) . Here p ∈ [0 , 1], k ∈ R \ { 0 } , m, r > 0; the explanatory v ariables will enter the model through the parameter p . Note that we hav e here a distribution which has b oth a discrete and a contin uous part, so that P ( x | ... ) is used as notation b oth for a probability and for a probabilit y densit y: in other words, we hav e a con tinuous distribution for ﬁnite p ositive x , giv en by a density function, whose interpretation is that its integral from x 1 to x 2 is the probability that x 1 < x < x 2 ; but as we hav e a non-zero probabilit y 1 − p that x = ∞ , the integral from 0 (inclusive) to ∞ (exclusiv e) of the densit y giv en by the ﬁrst line of the ab o v e formula for P ( x | p, k , m, r ) must be p . On the other hand we hav e a discrete distribution for x = ∞ , and 1 − p is a probability , not a density . By w ay of v ery approximate intuition: p is the probabilit y that a particular mo de of death would kill the patient at a ﬁnite time; r is the recipro cal of the o verall timescale to deaths of those patients who die; m go verns how v ariable those times of death are – the smaller m is, the more v ariable are the times of death; and the sign of k plays a part in determining whether the hazard rate for this mo de of death is increasing or decreasing, while the magnitude of k gov erns how abruptly the spread of death time is cut oﬀ in the less spread out direction. Sp eciﬁcally , k = 0 makes no sense, as then we would hav e x k = 1 for all x , and an in v alid distribution would result (so it should not b e a surprise that the prior on k is bimo dal with zero density at zero). 3.3 The distribution of p In order to allow the explanatory v ariables to aﬀect the surviv al distributions, w e no w supp ose that eac h patient i has a vector c i = ( c i, 1 , c i, 2 , ..., c i,V ) of explanatory v ariables whose v alues are known. W e then set P ( p i,j | c , β ) = 1 1 + e l i,j where l i,j = V X v =1 c i,v β v ,j , where β v ,j is the co eﬃcient of v ariable c v for mo de of death j , to b e determined by inference from the dataset. 3.4 Priors on the parameters W e sp ecify the priors on the parameters in tw o stages. First, we sp ecify their general form, and second w e choose sp eciﬁc v alues for the hyperparameters that then sp ecify a unique prior. 3.4.1 General form of the priors The total n umber J of mo des is itself to b e considered a random v ariable, on which w e put the prior P ( J | α J ) = (1 − α J ) α J − 1 J for J ∈ N ∗ = { 1 , 2 , ... } and for some ﬁxed α J ∈ [0 , 1) . 5 The prior for the parameters m, r, k , β of eac h mode of death are tak en to b e indep enden t, and as follows. W e take the prior on r to b e Gamma, with parameters m r , r r > 0, so that P ( r | m r , r r ) = r m r r Γ( m r ) r m r − 1 e − r r r . W e take the prior on each of the parameters k and m to b e the conjugate prior on each with resp ect to this parameterisation. Thus for p ositiv e real parameters a m , b m w e hav e P ( m | a m , b m ) ∝ m b m m e − ( a m + b m ) m Γ( m ) b m . Similarly for parameters N k ∈ N , a k ∈ R + and b k = ( b 1 , b 2 , ..., b N k ) , c k = ( c 1 , c 2 , ..., c N k ) ∈ R N k + w e ha ve P ( k | a k , b k , c k ) ∝ | k | a k N k Y n =1 b kc n n e − c n b k n . In the case of β v ,j w e set P ( β v ,j | γ ) = e γ β v,j (1 + e γ β v,j ) 2 . 3.4.2 Sp eciﬁc v alues of the hyperparameters and the resulting priors The priors were chosen to be uninformativ e and very wide, with the exception of the prior on β v ,j , for whic h setting it to o wide turned out to hav e the undesirable eﬀect of implying a high probability that eac h mode of death would either b e alw ays activ e for all patients or alwa ys inactive. Speciﬁcally we to ok the following v alues: The sp eciﬁc parameter v alues chosen were as follows: a J = 0 . 8 γ = 1 (where we assume that all explanatory v ariables ha ve b een shifted to hav e mean zero then scaled to v ariance one ov er the database, with the exception of a single constant v ariable equal to one for all patien ts) a m = b m = 1 m r = 0 . 5 r r = 30 da ys N k = 2 a k = 1 b k = (0 . 2 , 0 . 2) c k = (0 . 5 , 0 . 5) . These result in the following depicted distributions for J, β v ,j , m j , r j , k j , and hence for the depicted samples from the distributions for surviv al probability and hazard rate against time as well as the mean and 2.5% and 97.5% cen tiles for the last tw o: see Figures 1 to 9 . 6 Figure 1: Prior on J , the n umber of diﬀerent mo des of death. Figure 2: Prior on β v ,j , the co eﬃcien t of c i,v for mo de j of death. Figure 3: Prior on k . 7 Figure 4: Prior on m . Figure 5: Prior on r . Figure 6: Samples from resulting prior on surviv al probabilit y against time. Here the v ariables c i,v ha ve b een taken from diﬀeren t random patien ts in the dataset for eac h curv e, while the β v ,j and other v ariables used are random samples from the prior. 8 Figure 7: Samples from resulting prior on hazard rate against time. Here the v ariables c i,v ha ve b een tak en from diﬀerent random patien ts in the dataset for ea c h curve, while the β v ,j and other v ariables used are random samples from the prior. Figure 8: Mean and 2.5% and 97.5% centiles of prior on surviv al probability against time. Figure 9: Mean and 2.5% and 97.5% centiles of prior on hazard rate against time. 9 3.5 MCMC metho dology W e introduce additional v ariables j i for eac h patient i which indicate whether the time of death w as censored (v alue 0) or was caused by a particular mo de j of death (v alue j  = 0 unknown). W e also in tro duce v ariables x i,j of unkno wn v alues giving for each patient the time of death that would hav e resulted from mo de j if no other mo des had killed the patient ﬁrst. These v ariables take the sp eciﬁc v alue x i,j = ∞ if mo de j w ould in fact not hav e killed patient i at any ﬁnite time. W e initialise the parameters J, β , m , r , k from the prior and initialise the additional v ariables j and x randomly to any set compatible with those and the observed v ariables ˆ x . These v ariables then form θ 1 = ( J, β , m , r , k , j , x ), the ﬁrst of a sequence of samples ( θ n ) n =1 ,... to b e drawn. 3.6 Sampling metho ds A thorough review of the metho ds underlying all of the following is av ailable in either [ 10 ] or in [ 11 ] except where otherwise indicated. The key point is that if w e resample each v ariable b y a metho d that satisﬁes detailed balance, and given other w eak conditions whic h are here fulﬁlled, F eller’s theorem [ 10 ] then guarantees the the sequence of samples ( θ n ) will even tually conv erge to a sequence of samples from the desired distribution P ( θ | ˆ x ). The samples in this sequence will not b e indep enden t of each other, though the conditional distribution of θ n 1 giv en θ n 0 will also con verge to P ( θ | ˆ x ) as n 1 → ∞ with n 0 ﬁxed, i.e. to indep endence. Sampling from the p osterior was done by the MCMC technique of Gibbs sampling, i.e. sampling from the follo wing distributions palindromically: 1. P ( k | ˆ x , x , J, j , m , r , β ). This distribution has tw o parts ( k j > 0 and k j < 0), eac h of which is log- conca ve. W e therefore ﬁrst resample the sign of each k j using the Metrop olis-Hastings algorithm [ 10 ], then use adaptive rejection sampling [ 12 ] to resample the magnitude of k j giv en its sign, then resample the sign again to main tain detailed balance. 2. P ( m | ˆ x , x , J, j , r , k , β ). This distribution is log conca ve, so we may use adaptiv e rejection [ 12 ] sampling to sample from it. 3. P ( r | ˆ x , x , J, j , m , k , β ). F or each j , this distribution is in general a product of a Gamma distribution on r j and a m uc h narrow er Gamma distribution on r k j j . W e therefore sample from the Gamma rel- ev an t to the latter [ 11 ], using this as a prop osal distribution for the Metrop olis-Hastings algorithm [ 10 ], resulting in the Hastings ratio coming from the Gamma on r j . 4. P ( j | ˆ x , J, m , r , k , β ) then P ( x | ˆ x , J, j , m , r , k , β ). The ﬁrst of these is a discrete distribution which is trivial to sample from, and the second reduces to a truncated Gamma distribution. T o sample from the latter we divide into tw o cases: if the shap e parameter is ≥ 1 the distribution is log- conca ve and w e can use adaptive rejection sampling [ 12 ]; otherwise we use Metropolis-Hastings [ 10 ] with either an exp onen tial or a Gamma prop osal distribution, dep ending which is estimated to b e likely to b e quick er given the other parameters. 5. P ( J | ˆ x , x , j , m , r , k , β ) (where only v alues of j unused in j are allo wed to b e remov ed) follo wed, if J has increased, b y sampling the new elements of m , r , k , p from the prior distributions on these v ariables. Resampling of J uses a discrete conditional distribution, and is done using a prop osal to either increase or decrease J by 1, and applying the appropriate Hastings ratio [ 10 ] to reject the prop osal in such a wa y as to achiev e detailed balance. 6. P ( β | ˆ x , x , J, j , m , r , k ). In this case the vector ( β v , 1 ) v =1 ,...,V is indep enden t of the β v ,j for j  = 1, so we resample each suc h v ector β = ( β v ,j ) v =1 ,...,V separately for j = 1 , ..., J . The conditional 10 distribution w e need to sample from has the form P ( β | X ) ∝ G Y g =1 1 1 + e X g, : β for some matrix X of size G × V with X g , : denoting row g of the matrix. The following resampling metho d is the result of trying a range of alternativ es, and c ho osing that which maximises mobilit y . T o maintain numerical stability and ensure go o d resampling mobility , we ﬁrst change the co ordi- nate system b y rescaling X , replacing it with Y = X C − 1 where C , the Cholesky decomp osition of the symmetrised version of X , is an upp er triangular matrix suc h that C ′ C = X ′ X ; this has the eﬀect of changing the scale of the columns of X to b e one, i.e. of making the eigen v alues of Y ′ Y to b e one (and indeed Y ′ Y to b e the identit y). W e then apply a rotation matrix Q to Y on the right, replacing Y with YQ ; here Q is randomly c hosen from the uniform distribution on all rotation matrices b y taking it to b e the rotational (orthogonal) comp onen t of the QR-decomp osition of a matrix all of whose entries are drawn from indep enden t unit Gaussians. W e apply the corresp onding transformations to the curren t v alue of β , so that the distribution of the transformed β using Y corresp onds to that of the original β using X . W e then apply one step of Newton’s metho d, starting from the current p oin t, using the deriv ativ e and Hessian H of the log density , to estimate the mo de of the conditional distribution from which w e wish to sample; w e start our prop osal by moving to ˆ β , half w ay to that estimated mo de from the curren t v alue of β . (W e also add 10 − 4 I times the largest singular v alue of Y to H b efore in verting it to av oid awkw ard singularities arising.) W e complete our prop osal distribution by prop osing a p oin t that is Student distributed with shap e m = 2 and v ariance − H / 2 ab out ˆ β . W e then apply the Hastings ratio (see [ 10 ]) to deter- mine whether to accept newly prop osed v alue of β or to reissue the previous one, b efore ﬁnally transforming the result bac k to the original co ordinate system. W e apply simulated annealing[ 10 ] o ver the ﬁrst 1000 MCMC samples, then contin ue at co olness one for a further 7000 samples, discarding the ﬁrst 2050 of the resulting 8000 samples. W e anneal from a distribution P 0 ( x ) at co olness zero to the mo delled distribution P ( x | θ ) at co olness one, setting the distribution at co olness t to b e P t ( x | θ ) = P 0 ( x ) 1 − t P ( x | θ ) t , where t is the co olness and P 0 ( x ), indep enden t for each patient and mo de of death, is an equiprobable mixture of a zero-cen tred Cauch y of width 30 days truncated b elo w at zero and a p oin t distribution at inﬁnit y . W e set the co olness t b y requiring log  t 1 − t  to b e linearly spaced ov er the 1000 samples for whic h sim ulated annealing is in force. While this suﬃces to deﬁne the annealing scheme, the details of the mo diﬁed resampling are non-trivial. Con vergence was chec ked by using synthetic data and ensuring that the truth for surviv al and hazard rate lies within the equitailed 95% p osterior conﬁdence interv al at least 95% of the time. 3.7 Estimation of the ASI F or each subset of the explanatory v ariables considered, training was done on a randomly selected half of the patients, and ASI estimated on the other half, then swapping the roles of the tw o halv es, exactly as in the 2/2 scenario of [ 6 ]. Ha ving measured the individual samples of ASI on eac h patient, getting 11 Figure 10: Samples of the ASI measured on individual patients using all the av ailable explanatory v ari- ables (dark blue histogram), with mean p osterior skew-Studen t-mixture (green curve), mean p osterior ASI (green dashed line) and equitailed 95% p osterior conﬁdence interv al for the mean ASI (red dashed lines). a histogram such as that of Figure 10 , we used a sk ew-Student mo del identical to that used in [ 6 ] to estimate the mean ASI; it ga ve us a set of p osterior samples of the mean ASI such as that shown in Figure 11 along with its Gaussian ﬁt according to the empirical mean and v ariance of the sample set. F or statistical comparison of mean ASI from one subset of the explanatory v ariables with that from another, the probability of one b eing greater than the other was estimated on the basis of 850 samples of mean ASI from each. Where one of the comparands was from previously published w ork whose samples are no longer av ailable, we b oth applied Gaussian assumptions (reasonable on the basis e.g. of Figure 11 ), and also separately determined b ounds that could b e prov en from the published mean and cen tiles. 4 The dataset The dataset has previously b een describ ed in [ 2 ]. Brieﬂy , pre-op erativ e biomarkers PSA, free PSA, TGF β 1 , IL6sR, IL6, VCAM1, VEGF, endoglin, P AI1, uP A, and uP AR were measured in p eripheral blo od on 423 patients with biopsy-prov en lo calised prostate cancer, all of whom underwen t radical prostatectom y and bilateral lymphadenectom y . Also recorded w ere the ma jor and minor Gleason grades from pre-operative biopsy and from op erativ e samples, age, and whether or not there was seminal v esicle in v asion, extraprostatic extension, tumour at the surgical margins, or metastases in lymph no des. P atients were considered to ha ve relapsed when PSA rose ab o ve 0.2 µ g/litre and remained so on a conﬁrmation reading. P atients w ere follow ed for an av erage of 38.9 months, by which time they had either relapsed or were considered censored when follo w-up ceased. 5 Results The ASI obtained in predictions of relapse time (or of non-relapse at censoring time) was as shown in T able 1 for a v ariet y of subsets of the explanatory v ariables. That obtained using the full set of explanatory v ariables w as approximately double that obtained in [ 5 ] using the same set of v ariables, and w as greater with probabilit y at least 0.859 without distributional assumptions or at least 0.975 using the 12 Figure 11: Posterior samples of the mean ASI tak en from p osterior samples of the skew-Studen t mixture inferred from the histogram of ﬁgure 10 , with the Gaussian of mean and v ariance corresp onding to the empirical v alues from these samples. Gaussian assumption supp orted e.g. by Figure 11 . The v alues obtained using an y subset that included the biomark ers w ere greater than those using only age, PSA, MRI-discov erable v ariables, pre-op Gleason grades, and surgically disco verable v ariables with probability greater than 0.99 (T able 2 ). The calculation of the ASI is illustrated in tuitively in ﬁgures 12 and 13 . Where a red (resp. green) blob is ab o ve the magenta reference prior the patient con tributes a p ositiv e sample of the ASI; where b elo w, a negativ e sample; b oth con tribute to the histogram of ﬁgure 10 . T aking a greedy approach to determine which particular biomarkers would provide how muc h informa- tion ga ve the incremental ﬁndings in T able 3 , showing that TGF β 1 pro vides most ASI ab out time of relapse, follo wed by VCAM1, IL6sR, and uP A, which together with age and PSA giv e ab out 88% of the a v ailable ASI. 13 Figure 12: Illustration of the predictions made by the mo del using all the p oten tial explanatory v ariables. The magenta curve is the density function of the reference prior for measuring the ASI. The blue curves are predictions of the probability density of time of relapse for individual patients, unseen during training, who did in fact relapse; note that they in tegrate to ≤ 1 with the remaining probability b eing a p oin t mass at ∞ . The red blobs indicate the time of actual relapse. The sample of the ASI from an individual suc h patient is the logarithm of (the height of the red blob divided b y the height of the magenta line at the same time p oin t). 14 Figure 13: Illustration of the predictions made by the mo del using all the p oten tial explanatory v ariables. The magen ta curv e is the decum ulative probability function of the reference prior for measuring the ASI. The blue curv es are predictions of the probability density of time of relapse for individual patients, unseen during training, who did not in fact relapse b efore b eing censored. The green blobs indicate the time of actual censoring. The sample of the ASI from an individual such patient is the logarithm of (the height of the green blob divided b y the height of the magenta line at the same time p oin t). 15 (nats) V ariables 2.5% ASI mean ASI 97.5% ASI Prev 0.024 0.109 0.195 A 0.015 0.047 0.078 A G 0.038 0.077 0.120 AM 0.045 0.088 0.132 AMS 0.074 0.124 0.178 AMGS 0.042 0.105 0.158 AB 0.180 0.232 0.290 ABM 0.171 0.224 0.279 ABGM 0.154 0.207 0.267 ABGMS 0.135 0.197 0.256 T able 1: ASI obtained from each subset of the v ariables: A = age, PSA. G = pre-op Gleason grades. M = extraprostatic extension, seminal v esicle inv asion, clinical stage (MRI discov erable v ariables). B = biomark ers in p eripheral blo od (free PSA, TGF-h1, sIL-6R, IL-6, VCAM-1, VEGF, endoglin, P AI-1, uP A, and uP AR). S = tumour presence in op erativ e margin, or in lymph no des, and Gleason grades in surgically remo ved tissue (surgically discov erable v ariables). Prev = result obtained in [ 5 ] using all v ariables. 16 V ariables B V ariables A Prev A A G AM AMS AMGS AB ABM ABGM ABGMS Prev 0.500 0.908 0.747 0.669 0.384 0.526 0.009 0.014 0.029 0.050 A 0.092 0.500 0.129 0.078 0.009 0.047 0.000 0.000 0.000 0.000 A G 0.253 0.871 0.500 0.370 0.089 0.213 0.000 0.000 0.000 0.001 AM 0.331 0.922 0.630 0.500 0.159 0.311 0.000 0.000 0.000 0.003 AMS 0.616 0.991 0.911 0.841 0.500 0.676 0.003 0.006 0.015 0.039 AMGS 0.474 0.953 0.787 0.689 0.324 0.500 0.001 0.001 0.004 0.014 AB 0.991 1.000 1.000 1.000 0.997 0.999 0.500 0.583 0.743 0.797 ABM 0.986 1.000 1.000 1.000 0.994 0.999 0.417 0.500 0.674 0.737 ABGM 0.971 1.000 1.000 1.000 0.985 0.996 0.257 0.326 0.500 0.582 ABGMS 0.950 1.000 0.999 0.997 0.961 0.986 0.203 0.263 0.418 0.500 T able 2: Probability that ASI from v ariable subset A is greater than that from v ariable subset B. A = age, PSA. G = pre-op Gleason grades. M = extraprostatic extension, seminal v esicle inv asion, clinical stage (MRI discov erable v ariables). B = biomark ers in p eripheral blo od (free PSA, TGF-h1, sIL-6R, IL-6, VCAM-1, VEGF, endoglin, P AI-1, uP A, and uP AR). S = tumour presence in op erativ e margin, or in lymph no des, and Gleason grades in surgically remo ved tissue (surgically discov erable v ariables). Prev = result obtained in [ 5 ] using all v ariables. (nats) V ariables 2.5% ASI mean ASI 97.5% ASI Age and PSA only 0.015 0.047 0.078 + also TGF β 1 0.071 0.117 0.163 + also V CAM1 0.129 0.181 0.234 + also IL6sR 0.149 0.203 0.257 + also uP A 0.151 0.203 0.257 + rest 0.180 0.232 0.290 T able 3: ASI pro vided by a greedy incremental approach to adding biomarkers to the array of explana- tory v ariables. 17 6 Discussion This mo del is as far as we know only the second to rep ort statistically signiﬁcantly p ositiv e ASI ab out relapse time after prostatectomy in patien ts with prostate cancer. W e b eliev e it is a signiﬁcant adv ance that approximately double the ASI has b een obtained than using the previous log-skew-Studen t mo del. The authors are aw are of other as yet unpublished applications of this mo del to other situations where predictions of time of surviv al from potential explanatory v ariables are of interest, where also statistically signiﬁcan tly p ositiv e ASI is obtained. W e b eliev e that many other future applications could exist. A t least one of the authors was very surprised to ﬁnd that the information (ASI) pro vided b y this arra y of biomarkers ab out future relapse time was so m uch greater than that provided by pre- and p ost-op Gleason grades, MRI-discov erable v ariables, and surgical ﬁndings combined. Nonetheless, this b ecomes p erhaps slightly less surprising in consideration of the recent commercial announcemen t in the media[ 13 ] that measurement of an array of ov er one h undred biomarkers in blo od and urine can accurately determine the presence or absence of prostate cancer – though at the time of writing there are no publicly av ailable details on exactly what is mean t by “accuracy” in this announcemen t or exactly what biomark ers are b eing measured. Of course, while information on relapse time after prostatectomy is in teresting, it w ould b e even more in teresting to see how muc h information is av ailable about the future time course of possible prostate cancer that has not been treated and is instead b eing managed by watc hful waiting. Datasets on such patien ts exist (e.g.[ 14 ]), but as far as we know do not con tain information on biomark ers. Nonetheless it would ev en b e interesting to know whether in the watc hful waiting situation prostate biopsy and Gleason grading would provide more ASI ab out time course than it do es in a p opulation who ha ve all had radical prostatectom y . Author Con tributions : T ommy W alk er Mack ay wrote the new parts of the mo delling soft ware and designed the resampling and annealing schemes; Mingtong Xu wrote the softw are for mo delling the distribution of individual ASI samples and inferring their mean; Roger Sewell prop osed the mo del, executed the analysis, and wrote the pap er; and Shahrokh Shariat collected the original dataset. F unding : T omm y W alker Mac k ay and Mingtong Xu thank T rinity College Cambridge (W oo ds fund) for supp ort. Mingtong Xu was also supp orted by a W ellcome T rust Principal Research F ello wship (223103/Z/21/Z) and National Institutes of Health gran t (5 R01 AI054503) to Lalita Ramakrishnan. Op en Access : F or the purpose of open access, the author has applied a CC BY public cop yright license to any Author Accepted Manuscript version arising from this submission. This w ork is licensed under a Creativ e Commons Attribution 4.0 International License. 18 References [1] A. Jemal, R. Siegel, E. W ard, T. Murray , J. Xu, and M. Thun, “Cancer statistics,” CA: A c anc er journal for clinicians , vol. 57, pp. 43–66, 2007. [2] S. Shariat et al. , “Impro ved prediction of disease relapse after radical prostatectom y through a panel of preop erativ e blo od-based biomark ers,” Clinic al Canc er R ese ar ch , vol. 14, no. 12, pp. 3785–3791, 2008. [3] D. Cox, “Regression mo dels and life tables,” Journal of the R oyal Statistic al So ciety Series B , v ol. 34, no. 2, pp. 187–220, 1972. [4] F. Harrell, K. Lee, and D. Mark, “Multiv ariable prognostic models: Issues in dev eloping modeols, ev aluating assumptions and adequacy , and measuring and reducing errors,” Statistics in Me dicine , v ol. 15, pp. 361–387, 1996. [5] R. F. Sewell, E. J. Crow e, and S. F. Shariat, “Biomarkers can predict time of recurrence of prostate cancer with strictly p ositiv e apparent shannon information against an exp onen tial attrition prior.” https://arxiv.org/pdf/2404.17857.pdf , 2010. [6] R. F. Sewell, “Assessment of the quality of a prediction.” pdf , 2010. [7] M. Kattan, J. Eastham, A. Stapleton, and T. Wheeler, “A preop erativ e nomogram for disease recurrence follo wing radical prostatectomy for prostate cancer,” Journal of the National Canc er Institute , v ol. 90, pp. 766–771, 1998. [8] M. Graefen, P . Karakiewicz, I. Cagiannos, et al. , “In ternational v alidation of a preop erative nomo- gram for prostate cancer recurrence after radical prostatectomy ,” Journal of Clinic al Onc olo gy , v ol. 20, pp. 3206–3212, 2002. [9] L. Whit worth et al. , “A Ba yesian analysis of the asso ciation b etw een Leukotriene A4 Hydrolase genot yp e and surviv al in tub erculous meningitis,” eLife , vol. 10, 2021. [10] Radford M. Neal, “Probabilistic inference using Marko v Chain Monte Carlo Metho ds,” 1993 , T ech- nical Rep ort CRG-TR-93-1, Department of Computer Science, Universit y of T oron to. [11] John Dagpunar, Principles of R andom V ariate Gener ation . Clarendon Press (Oxford), 1988. [12] W.R. Gilks, “Adaptiv e Rejection Sampling.” https://en.wikipedia.org/wiki/Rejection_ sampling#Adaptive_rejection_sampling . Retrieved on 14.12.2019. [13] C. T. Ev ans, “PM with Ev an Da vies.” https://www.bbc.co.uk/sounds/play/m00282hv . First transmitted on 17.2.2025; starting at 19 min utes in; retrieved on 18.2.2025. [14] F.C. Hamdy and others for the ProtecT study group, “10-year outcomes after monitoring, surgery , or radiotherapy for lo calized prostate cancer,” New England Journal of Me dicine , vol. 375, no. 15, pp. 1415–1424, 2016. 19

A Bayesian Gamma-power-mixture survival regression model: predicting the recurrence of prostate cancer post-prostatectomy

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment