Generalizations related to hypothesis testing with the Posterior distribution of the Likelihood Ratio

Authors: I. Smith, A. Ferrari

I. SMITH, Laboratoire des Sciences du Climat et de l'Environnement, IPSL-CNRS, France; Université de Nice Sophia-Antipolis, CNRS, Observatoire de la Côte d'Azur, France.
A. FERRARI, Université de Nice Sophia-Antipolis, CNRS, Observatoire de la Côte d'Azur, France.

Abstract. The Posterior distribution of the Likelihood Ratio (PLR) was proposed by Dempster in 1974 for significance testing in the simple vs composite hypotheses case. In this hypotheses test case, classical frequentist and Bayesian hypotheses tests are irreconcilable, as emphasized by Lindley's paradox, by Berger & Sellke in 1987, and by many others. However, Dempster shows that the PLR (with inner threshold 1) is equal to the frequentist p-value in the simple Gaussian case. In 1997, Aitkin extended this result by adding a nuisance parameter and showing its asymptotic validity under more general distributions. Here we extend the reconciliation between the PLR and a frequentist p-value to a finite sample, through a framework analogous to that of Stein's theorem, in which a credible (Bayesian) domain is equal to a confidence (frequentist) domain. This general reconciliation result only concerns simple vs composite hypotheses testing. The measures proposed by Aitkin in 2010 and Evans in 1997 have interesting properties and extend Dempster's PLR, but only by adding a nuisance parameter. Here we propose two extensions of the PLR concept to the general composite vs composite hypotheses test. The first extension can be defined for improper priors as soon as the posterior is proper. The second extension appears from a new Bayesian-type Neyman-Pearson lemma and emphasizes, from a Bayesian perspective, the role of the LR as a discrepancy variable for hypothesis testing.

1. Introduction

1.1. Classical hypotheses test methodologies.
Simple versus composite hypothesis testing is a general statistical issue in parametric modeling. It consists, for a given observed dataset x, of choosing among the hypotheses

H_0: θ = θ_0    vs    H_1: θ ∈ Θ_1    (1)

where the distribution of x is characterized by the underlying unknown parameter θ. Under the alternative hypothesis H_1, θ takes a value different from the point θ_0, and the uncertainty of θ is described by a prior probability density function π_1(θ) which is positive only for θ ∈ Θ_1. We assume that the data model p(x|θ) has the same expression under H_0 and H_1. To choose among H_0 and H_1, a test statistic T(x) (such as the Generalized Likelihood Ratio) is generally compared to a threshold ζ, and one decides to choose H_0 if T(x) is greater than ζ. If H_1 is chosen whereas the true underlying θ was equal to θ_0, a type I error is made in the decision. Under the classical Neyman paradigm (see Neyman and Pearson (1933); Neyman (1977)), the threshold ζ is chosen so that the probability of the type I error lies under (or is equal to) some fixed level α, typically a 5% error rate. Instead of inverting this function, a p-value can be defined in order to serve as the test statistic to be directly compared to the 5% level (Lehmann and Romano (2005)):

p_val(T(x_0)) = Pr(T(x_0) < T(x) | θ_0)    (2)

where x_0 is the observed dataset and x the variable of integration. Note that with this notation, H_0 is rejected when p_val(T(x_0)) is greater than some threshold.

E-mail addresses: zazoo@mac.com, andre.ferrari@unice.fr.
Key words and phrases: hypothesis testing, PLR, p-value, likelihood ratio, frequentist and Bayesian reconciliation, Lindley's paradox, invariance, Neyman-Pearson lemma. Part of this work was published in Smith and Ferrari (2014).
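As a concrete illustration, the p-value of equation (2) can be estimated by simulating replicate datasets under H_0. The sketch below assumes a hypothetical Gaussian mean test with known variance; the dataset, sample size and statistic T are illustrative choices, and T is oriented as in the text (large T favours H_0, so H_0 is rejected when the p-value is large):

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, sigma, n = 0.0, 1.0, 25

# Hypothetical observed dataset, simulated here with a shifted mean.
x0 = rng.normal(1.0, sigma, n)

def T(x):
    # Statistic oriented as in the text: large T favours H0.
    return -abs(x.mean() - theta0)

# Monte Carlo estimate of p_val(T(x0)) = Pr(T(x0) < T(x) | theta0):
# simulate replicate datasets under H0 and count how often T(x) exceeds T(x0).
m = 100_000
reps = rng.normal(theta0, sigma, (m, n))
T_reps = -np.abs(reps.mean(axis=1) - theta0)
p_val = float(np.mean(T(x0) < T_reps))

# With this convention H0 is rejected when the p-value is large,
# i.e. compared to 1 - alpha rather than alpha.
print(round(p_val, 3))
```

Note the reversed orientation with respect to the usual convention: the quantity printed here is large, not small, when the data are far from θ_0.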
On the Bayesian side, the test statistic classically used (Robert (2007)) is the Bayes Factor (BF), defined by

BF(x) = p(x|θ_0) / ∫ dθ p(x|θ) π_1(θ)

Making a binary decision consists of choosing H_0 if BF(x) is greater than some threshold, and the choice of the threshold is made in general by a direct interpretation of the BF. Jeffreys's scale, for example, states that if the observed BF is between 10 and 100 there is strong evidence in favor of H_0. The mere posterior probability Pr(H_i|x) of a hypothesis may also be considered by itself. A practical issue with the BF in the simple vs composite hypotheses test is that it is defined up to a multiplicative constant if the prior π_1 is improper¹, even though the posterior distribution is proper. Partial BFs account for this issue by somehow using part of the data to update the prior into a proper posterior, and then using this posterior as the prior for the rest of the data. The most simply defined Partial BF is the Fractional BF (FBF) proposed by O'Hagan (1995). A related and more fundamental issue is Lindley's paradox, initially studied by Jeffreys (1961) and called a paradox by Lindley (1957), which shows among other things that, when testing a simple vs a composite hypothesis, the null hypothesis H_0 is too highly favoured against H_1 for a natural diffuse prior under Θ_1. More precisely, in the test of the mean of a Gaussian likelihood for example, the p-value associated with |x̄| defines the uniformly most powerful test, which is a very strong optimality property, even according to at least part of the Bayesian community. However, for a fixed prior and a dataset x that adjusts so that the associated classical p-value remains fixed (so that the evidence for H_0 shall not change), Pr(H_0|x)/Pr(H_1|x) tends to infinity as the sample size increases.
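The behaviour described by Lindley's paradox can be checked numerically. The following sketch uses hypothetical numbers, takes π_1 as N(θ_0, τ²) on Θ_1 and Pr(H_0) = 0.5, keeps the classical p-value fixed at about 0.05 while the sample size grows, and shows the BF, and hence Pr(H_0|x), drifting toward H_0:

```python
import math

theta0, sigma, tau = 0.0, 1.0, 1.0
z = 1.96                                   # fixed z-score: classical p-value ~ 0.05

def normal_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

posteriors = []
for n in (10, 1_000, 100_000):
    xbar = theta0 + z * sigma / math.sqrt(n)     # data tuned so the p-value stays fixed
    # BF = p(xbar | theta0) / marginal of xbar under H1, where xbar ~ N(theta, sigma^2/n)
    # and theta ~ N(theta0, tau^2) under H1, so the marginal is N(theta0, tau^2 + sigma^2/n).
    bf = normal_pdf(xbar, theta0, sigma**2 / n) / normal_pdf(xbar, theta0, tau**2 + sigma**2 / n)
    posteriors.append(bf / (1 + bf))             # Pr(H0 | x) under equal prior odds
    print(n, round(bf, 2), round(bf / (1 + bf), 3))
```

Despite the fixed p-value, the posterior probability of H_0 grows with n toward 1, which is the paradox in its classical form.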
This issue, intensively discussed and developed (see Tsao (2006) for a quite recent study), is consensually considered a real trouble by a quite large part of the community. Unlike the BF, other tests like the FBF or the Bernardo (2011) test do not suffer from this problem in Lindley's frame. Other ideas have been developed which prevent Lindley's frame from occurring, avoiding troubles for the BF. Berger and Delampady (1987) for example argue that testing a simple hypothesis is an unreasonable question. Some other references will be given in section 2.1. Among many frequentist and Bayesian p-values (several are listed by Robins et al. (2000)), the next most classical Bayesian-type hypotheses test statistic is the posterior predictive p-value, highlighted by Meng (1994). Unlike the BF, which only integrates over the parameter space Θ, the posterior predictive p-value integrates over the data space X, like frequentist p-values. But unlike the frequentist p-value, which integrates under the frequentist likelihood p(x|θ_0), it integrates under the predictive likelihood p(x_pred|x_0) = ∫ dθ p(x_pred|θ) π(θ|x_0), where x_0 is the observed dataset. In a frequentist p-value only a statistic (i.e. a function of x only) can define the domain of integration. On the contrary, in the posterior predictive p-value, a discrepancy variable (a function of both x and θ) can be used to define the domain of integration. Note that the choice of the discrepancy variable to use there remains an issue. Although a bit less classical, the approach of Evans (1997) needs to be introduced because the tool and some of its properties are interesting and closely related to the ones derived in this paper.
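A posterior predictive p-value with a discrepancy variable depending on both x and θ can be sketched as follows. The setup is a hypothetical Gaussian-mean model with a flat prior, and the discrepancy D(x̄, θ) = (x̄ - θ)² is just one possible choice:

```python
import numpy as np

rng = np.random.default_rng(11)
sigma, n = 1.0, 25
x0 = rng.normal(0.4, sigma, n)                 # hypothetical observed dataset
xbar0 = x0.mean()
s = sigma / np.sqrt(n)

# Posterior of theta under a flat prior: N(xbar0, sigma^2/n).
thetas = rng.normal(xbar0, s, 50_000)

# For each posterior draw, simulate a predictive replicate of the sample mean.
xbar_rep = rng.normal(thetas, s)

def discrepancy(xbar, theta):
    # A discrepancy variable: a function of both the data and the parameter.
    return (xbar - theta) ** 2

# Posterior predictive p-value: probability that the replicated discrepancy
# is at least as large as the observed one.
ppp = float(np.mean(discrepancy(xbar_rep, thetas) >= discrepancy(xbar0, thetas)))
print(round(ppp, 3))
```

With this particular discrepancy the p-value concentrates around 0.5 whatever the data, illustrating the conservatism of posterior predictive p-values discussed in the literature cited above.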
In the simple vs composite test case presented up to now, the tool proposed by Evans (1997) and the ones studied in this paper are even mathematically equal. But the tool proposed by Evans (1997) is defined to test more generally H_0: Ψ(θ) = ψ_0 for a parameter of interest ψ = Ψ(θ). The test statistic consists of measuring the Observed Relative Surprise (ORS) related to the hypotheses by computing:

ORS(x) = Pr( π_Ψ(Ψ(θ)|x) / π_Ψ(Ψ(θ)) ≥ π_Ψ(ψ_0|x) / π_Ψ(ψ_0) | x )    (3)

The relative belief ratio of ψ, defined by RB(ψ) = π_Ψ(ψ|x) / π_Ψ(ψ), measures the change in belief in ψ being the true value of Ψ(θ) from a priori to a posteriori. So if RB(ψ_0) > 1 we have evidence in favor of H_0. Relative belief ratios are discussed in Baskurt and Evans (2013), where RB(ψ_0) is presented as the evidence for or against H_0 and (3) is presented as a measure of the reliability of this evidence. This leads to a possible resolution of Lindley's paradox, as the relative belief ratio can be large and the ORS small without contradiction. See Example 4 of Baskurt and Evans (2013) and note that Evans (1997) shows that the ORS converges to the classical p-value as the prior becomes more diffuse in this example.

¹π_1 is called improper if its integral over Θ_1 is infinite, which occurs if π_1 is constant over an unbounded domain, for example.

1.2. Posterior distribution of the Likelihood Ratio (PLR).

Let's focus again on the simple vs composite hypothesis test. Contrary to the posterior predictive p-value, the Posterior distribution of the Likelihood Ratio (PLR) does not integrate over unobserved data, but only integrates over Θ. It still conditions upon the only observed variable, namely x_0, like the BF, but on a domain defined from a divergence variable, like the posterior predictive p-value.
This statistic, proposed by Dempster (1973), is defined by

PLR(x, ζ) = Pr( LR(x, θ) ≤ ζ | x )    (4)

where LR(x, ·) is the Likelihood Ratio

LR(x, θ) = p(x|θ_0) / p(x|θ),    θ ∈ Θ_1

Since θ is random, the deterministic function LR(x, ·) evaluated at the random variable θ becomes naturally random, with some posterior distribution characterized by its cumulative distribution, the PLR. As emphasized by Birnbaum (1962), Dempster (1973) and Royall (1997), the threshold ζ which compares the original likelihoods under H_0 and under H_1 is directly interpretable and can be chosen the same way an error level α is chosen in the Neyman-Pearson paradigm. "PLR(x, 1) = 0.1" for example reads "The probability that the likelihood of θ_1 is greater than the likelihood of θ_0 is 0.1." The PLR can therefore be used for a binary decision, by fixing ζ and deciding to reject H_0 if PLR(x, ζ) is greater than, say, 0.9. One can check whether the binary decision is sensitive to the choice of both thresholds by making the test for several thresholds and seeing if the decision differs. In the extreme case, note that due to the convenient definition of the PLR, one can simply display PLR(x, ζ) as a function of ζ to get a broad view. The range of ζ over which PLR(x, ζ) grows, typically from 0.2 to 0.8, indicates whether the decision for H_0 or H_1 is clear or not. As soon as the posterior can be sampled, these computations and graphs are very easy to produce, as will be explained later. The PLR was first proposed by Dempster (1973, 1997), then studied especially by Aitkin (1997) and Aitkin (2010), but also used and analyzed by Aitkin et al. (2005, 2009). As mentioned in the previous subsection, it turns out that the PLR is also closely related to the ORS proposed by Evans (1997), which generalizes the PLR.
The PLR is also closely related to the e-value associated with the Full Bayesian Significance Test (FBST) of Pereira and Stern (1999), slightly revisited by Borges and Stern (2007), which then somehow generalizes the PLR by adding a reference distribution on θ, and by systematically dealing with the case where the null hypothesis domain Θ_0 has a dimension less than Θ_1 but is not necessarily restricted to the point Θ_0 = {θ_0}. We do not list the results found by these different analyses, apart from some specifically mentioned ones. The PLR turns out to be a natural Bayesian measure of evidence for the studied hypotheses since it involves only the posterior distribution of θ (no integral over X) and the likelihood, claimed by Birnbaum (1962), Royall (1997) and others to be the only tool that can measure evidence. Unlike the BF, the PLR is well defined for an improper prior as soon as the posterior is proper, and is not subject to Lindley's paradox. It is also invariant under any isomorphic transformation of the X space and any transformation of the Θ space, as a consequence of being a mere function of the likelihood. These last properties were emphasized, for example, for the e-value associated with the FBST. The PLR is also a natural alternative to the BF in different regards. To start with, the PLR first compares (it compares p(x|θ_0) and p(x|θ)) and then integrates, whereas the BF first integrates and then compares (it compares p(x|θ_0) and ∫ dθ p(x|θ) π(θ)). Second, Newton and Raftery (1994) and many others show that if the prior under H_1 is proper, the BF is simply the posterior mean of the LR, i.e. the mean of the distribution described by the PLR². However, a point estimate is in general not given alone but accompanied by an uncertainty indicator.
Smith and Ferrari (2010) show that the posterior mean of the LR raised to some power is equal to the FBF introduced previously; the mean of the PLR is given by the BF and its variance is easily related to the FBF. However, Smith (2010) shows that the Generalized Likelihood Ratio bounds the support (the values of ζ for which PLR(x, ζ) > 0) of the PLR and that at this lower bound the PLR in general starts with an infinite derivative. In addition to this theoretical result, numerical examples also indicate that the posterior density function of the LR is in general highly asymmetric. Therefore, the BF (a point estimate of the LR) or any standard centered credible interval does not appear to be a relevant inference about the LR seen as a random variable. Instead, in the same way the BF is to be thresholded, the actual information about LR(x, θ) which seems to be relevant, and invariant under the transformation LR(x, θ) ↦ (LR(x, θ))⁻¹, is its cumulative posterior distribution, which is precisely the PLR. In practice, the PLR can be straightforwardly computed as soon as the posterior distribution π(θ|x) is sampled. Just obtain from a Markov Chain Monte Carlo (MCMC) algorithm an almost i.i.d. chain {θ[1], ..., θ[m]} from the posterior distribution π(θ|x) and compute LR(x, θ[i]) for each sample. The resulting histogram sketches the posterior density of the LR, and the plot of the empirical cumulative distribution of the LR chain sketches the PLR as a function of ζ. The PLR has been realistically and thoroughly applied by Smith (2010) to the detection of extra-solar planets from images acquired with the dedicated instrument SPHERE mounted on the Very Large Telescope. At that time, only very finely simulated images were available.
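The recipe just described can be sketched in a few lines. For a self-contained example the MCMC step is replaced by direct sampling, assuming a hypothetical Gaussian likelihood with known variance and a flat prior, so that the posterior of θ is N(x̄, σ²/n):

```python
import numpy as np

rng = np.random.default_rng(42)
theta0, sigma, n = 0.0, 1.0, 30
x = rng.normal(0.4, sigma, n)              # hypothetical observed data
xbar = x.mean()

def log_lik(theta):
    # Gaussian log-likelihood, up to an additive constant.
    return -0.5 * np.sum((x - theta) ** 2) / sigma**2

# Stand-in for an MCMC chain: with a flat prior the posterior of theta
# is N(xbar, sigma^2/n), so we sample it directly.
thetas = rng.normal(xbar, sigma / np.sqrt(n), 20_000)

# LR(x, theta[i]) = p(x|theta0) / p(x|theta[i]) for each draw of the chain.
lr = np.exp(log_lik(theta0) - np.array([log_lik(t) for t in thetas]))

def plr(zeta):
    # Empirical CDF of the LR chain, i.e. the PLR at threshold zeta.
    return float(np.mean(lr <= zeta))

for zeta in (0.1, 1.0, 10.0):
    print(zeta, round(plr(zeta), 3))
```

A histogram of `lr` sketches the posterior density of the LR; note that, as stated above, the values of `lr` are bounded below by the Generalized Likelihood Ratio exp(log_lik(θ_0) - log_lik(x̄)).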
The PLR has been applied to two simulated datasets, one in which no extra-solar planet is present (dataset simulated under H_0) and the other in which an extra-solar planet is present (H_1 dataset). Although the extra-solar planet is very dark (10⁶ times less bright than the star it surrounds) and close to the star (angular distance in the sky of 0.2 arcseconds, i.e. 6·10⁻⁵ degrees), and although only 2 × 20 images were used, thanks to the quality of the optical instruments and of the statistical model the detection and non-detection were evident, with PLR(x, 0.1) = 0.0 for the dataset under H_0 and PLR(x, 0.1) = 0.94 for the dataset under H_1. As studied by Smith (2010), the statistical model and the consecutive method are very satisfying compared to classical methods.

²Alternatively, note that if we had defined the BF and LR with the alternative hypothesis in the numerator of these fractions, the BF would have been the prior mean of the LR.

1.3. Problems addressed here.

Despite its potential interest, the PLR has not been extensively studied up to now. This paper aims at contributing to this investigative work with some new results. In the simple vs composite hypotheses test case, it turns out that the PLR plays a strong role in understanding the possible reconciliation between frequentist and Bayesian hypothesis testing. The PLR with inner threshold ζ = 1 is simply equal to some frequentist p-value for some "likelihood - prior - hypotheses" combinations. Dempster (1973) and Aitkin (1997) first noticed and highlighted this equivalence when testing the mean of a Gaussian likelihood with a uniform prior. In section 2, we extend the conditions of this equivalence result under a frame analogous to the one used to reconcile confidence and credible domains.
Subsection 2.1 synthesizes the long quest for reconciliation between frequentist and Bayesian hypotheses tests, subsection 2.2 proves and discusses the reconciliation reached between the PLR and some frequentist p-value in such an invariant frame, subsection 2.3 gives examples and perspectives, and subsection 2.4 discusses the connection between this reconciliation result and the one obtained between (frequentist) confidence domains and (Bayesian) credible regions. Aitkin (1997) and Aitkin (2010) extended the PLR definition to a hypotheses test frame identical to the one presented at the end of subsection 1.1, namely H_0: Ψ(θ) = ψ_0, also considered by Evans (1997) and others. However, the PLR has not yet been generalized to the general composite vs composite hypotheses test. The generalization is somehow unnatural for a frequentist p-value because for a simple hypothesis H_0: θ = θ_0 the p-value is a frequentist probability conditioned on the fixed parameter θ_0 (see equation 2), whereas a conditional probability cannot be defined on a composite set Θ_0 if no probability distribution over Θ is used. By contrast, the PLR is reciprocally a probability conditioned upon the observed dataset x_0, and x_0 naturally remains fixed under a composite hypothesis. Therefore, the transition from a simple to a composite null hypothesis does not raise immediate obstacles for the PLR. However, a joint measure on the parameter spaces of both hypotheses is still required. Section 3 proposes and motivates two generalizations of the PLR. The mathematical expressions of the two extensions are simply given and rephrased in subsection 3.1. The first extension in particular enables the use of improper priors as soon as the posterior is proper.
It can therefore be used in subsection 3.2 for the detection of precipitation change, where an almost improper prior is to be used but leads to a proper posterior. On the other side, the second extension, made of two symmetrical probabilities, appears in a Bayesian version of the Neyman-Pearson lemma. As detailed in subsection 3.3, the two joint measures associated with no specific discrepancy variable lead through the lemma to the discrepancy variable LR(x_0, θ). A concluding discussion is proposed in section 4. The appendices essentially present the proofs of the mathematical results.

2. Equivalence between the PLR and a frequentist p-value

2.1. Previous tentative reconciliations of frequentist and Bayesian tests.

As introduced in section 1.1, Lindley's paradox presents a frame where Pr(H_0|x) (often thought of as being the Bayesian measure of evidence) may be expected to be equal to the frequentist p-value, but happens not to be. Also, the BF is not satisfying in the frame "point null hypothesis H_0 and diffuse prior π_1". This highlights the need for other Bayesian-type hypotheses tests, but also raises more generally the question of reconciliation between frequentist and Bayesian hypotheses tests. The conditions upon which frequentist (Neyman (1977)) and Bayesian (Jeffreys (1961)) answers agree are always of interest in order to understand the interpretation of the procedures and the limits of the two paradigms, somehow defined by what they are not. A first approach to see when frequentist and Bayesian hypotheses tests could be unified consists of analyzing, for different hypotheses, likelihoods and priors, when the classical p-value and Pr(H_0|x) are equal. These two concepts are to be compared because they both seem to handle only H_0 and in very simple ways, one from the frequentist and the other from the Bayesian perspective³.
It turns out that unlike for a composite null hypothesis (e.g. Casella and Berger (1987)), for a point null hypothesis Lindley's paradox Pr(H_0|x) > p_val(x) always seems to hold. Berger and Sellke (1987) in particular show that among very broad classes of priors, Pr(H_0|x) > p_val(x) always holds for Pr(H_0) = 0.5. Also see the extensive list of references included there. Oh and DasGupta (1999) follow this analysis by studying the effect of the choice of Pr(H_0). Another approach consists of modifying the standard frequentist procedure and/or the standard Bayesian hypotheses test procedure, while still relying on the p-value and Pr(H_0|x), to see if they can then be made equivalent. Berger and Delampady (1987) for example study "precise" (concentrated) but not exactly "point" hypotheses; Berger et al. (1994) use frequentist p-values computed from a likelihood conditioned upon a set in which the observed dataset lies, not on the dataset itself, and define a non-decision domain in the BF test procedure. Sellke et al. (2001) advocate calibrating (rescaling) the frequentist p-value to relate this new statistic to other test statistics. As already mentioned in section 1.1, one can also try to unify the p-value with Bayesian-type statistics fully different from the BF, to see when frequentist and Bayesian-type hypotheses tests can be made equivalent. In particular, when Dempster (1973) proposed to use the PLR, he also mentioned that when testing the mean of a normal distribution, the PLR is equal to the classical frequentist p-value when computed for a uniform prior and with inner parameter ζ = 1. This fundamental result was again emphasized by Aitkin (1997) and Dempster (1997).

³Note however that H_1 is implicitly taken into account through the marginal distribution of x in Pr(H_0|x).
Aitkin (1997) asymptotically extended this result to any regular distribution, making use of the asymptotic convergence of a regular distribution towards a normal distribution: for any regular continuous distribution and a smooth prior, the PLR with ζ = 1 tends asymptotically to the classical p-value. Also, with a nuisance parameter η, and still calling θ the tested parameter, he defines the LR by LR(x, θ, η) = p(x|θ_0, η) / p(x|θ, η), in which case, under the same conditions as in the previous case, the PLR is equal to a p-value. For a normal distribution, when testing the mean and considering the variance as a nuisance parameter, the result is also true for a finite sample.

2.2. New reconciliation result.

The sets of conditions found by Dempster (1973) and Aitkin (1997) under which the PLR (with ζ = 1) is equal to a p-value are directly related to the test of the mean of a normal distribution under a uniform prior. The next subsection generalizes this exact finite-sample result under the frame of statistical invariance. As will be discussed at the end of the section, although the technical conditions derived here may be relaxed, it may be difficult to find, at least within the current statistical frame, a fundamentally more general set of conditions for an equality between the PLR and a p-value to hold. As presented in current classical textbooks in Bayesian statistics (Berger (1985), Robert (2007)), invariance in statistics arises from the invariant Haar measure defined on some topological group. Throughout this subsection and the related appendices, we will use the notions and results synthesized by Nachbin (1965) and Eaton (1989). The tools necessary to understand the result are introduced in Appendix 1. In this frame, the PLR (given by an integral over the parameter space Θ) can be reexpressed as an integral over the sample space X, equal to a p-value for ζ = 1.
In this subsection, x and θ denote random variables or variables of integration according to the context. First, for clarity, we give the equivalence between the PLR and a frequentist integral under the assumption that the sample space X, the parameter space Θ and the transformation group G are isomorphic.

Theorem 1. Call P_Θ = {p(·|θ), θ ∈ Θ} a family of probability densities with respect to the Lebesgue measure on X, and call G a group acting on X. Assume that P_Θ is invariant under the action of the group G on X and denote by ḡθ the induced action of the element g ∈ G on the element θ ∈ Θ. Call H_r and H_l respectively a right and a left Haar measure of G and assume that
(1) G, X and Θ are isomorphic.
(2) The prior measure Π_r is the measure induced by H_r on Θ.
(3) The measure induced by H_l on X is absolutely continuous with respect to the Lebesgue measure. Call π_l the corresponding density.
(4) The marginal density of x is finite, so that the posterior measure Π_r^x on Θ, classically defined by equation (23), defines the posterior probability Pr(·|x_0).
Then, the PLR defined by equation (4) can be reexpressed for any ζ > 0 as the frequentist integral:

PLR(x_0, ζ) = Pr( p(x_0|θ_0) / π_l(x_0) ≤ ζ p(x|θ_0) / π_l(x) | θ_0 )    (5)

where x_0 ∈ X is the observed data and θ_0 ∈ Θ the parameter value under the null hypothesis.

A more general theorem (Theorem 2), derived in a frame which avoids the Lebesgue assumption and may involve more technical conditions, is proved in Appendix 2. Theorem 1 is a consequence of Theorem 2 and its proof is given in Appendix 3. The assumption that G and X are isomorphic is easily relaxed by replacing the sample space by the space of a sufficient statistic.
Recall that if X is a random variable whose probability distribution is parametrized by θ, S(X) is called a sufficient statistic for θ if the probability distribution of X conditioned upon the random variable S(X) does not depend on θ. Note that according to the Darmois (1935) theorem, among families of probability distributions whose domains do not vary with the parameter being estimated, only in exponential families is there a sufficient statistic whose dimension remains bounded as the sample size increases. The statement of Theorem 2 is simply extended by replacing X by a sufficient statistic S(X) in the assumptions and by replacing, in the frequentist integral, the probability density of X by that of S(X):

Corollary 1. Call P_Θ = {p(·|θ), θ ∈ Θ} a family of probability densities with respect to any measure on X. Call S(X), for X ∈ X, a sufficient statistic for θ and P_{S,Θ} = {p_S(·|θ), θ ∈ Θ} the family of probability densities of S(X) with respect to the Lebesgue measure on S(X). Call G a group acting on S(X). Assume that P_{S,Θ} is invariant under the action of the group G on S(X) and denote by ḡθ the induced action of the element g ∈ G on the element θ ∈ Θ. Call H_r and H_l respectively any right and left Haar measures of G. Assume that
(1) G, S(X) and Θ are isomorphic.
(2) The prior measure Π_r is the measure induced by H_r on Θ.
(3) The measure induced by H_l on S(X) is absolutely continuous with respect to the Lebesgue measure. Call π_l the corresponding density.
(4) The marginal density of x is finite, so that the posterior measure Π_r^x on Θ, classically defined by equation (23), defines the posterior probability Pr(·|x_0).
Then, the PLR defined by equation (4) can be reexpressed, with x_0 ∈ X, θ_0 ∈ Θ and ζ > 0, as the frequentist integral:

PLR(x_0, ζ) = Pr( p_S(S(x_0)|θ_0) / π_l(S(x_0)) ≤ ζ p_S(S(x)|θ_0) / π_l(S(x)) | θ_0 )    (6)

where x_0 ∈ X is the observed data and θ_0 ∈ Θ the parameter value under the null hypothesis.

The proof follows the proof of Theorem 1 in Appendix 3. By evaluating the result at ζ = 1, the PLR with ζ = 1 is easily and finally shown to be equal to a frequentist p-value, where the test statistic is a weighted marginal likelihood of the sufficient statistic S(x).

Corollary 2. Under the assumptions of Corollary 1, the PLR with inner threshold ζ = 1 is equal to a p-value:

PLR(x_0, 1) = p_val(T(x_0))    (7)

with the test statistic

T(x) = p_S(S(x)|θ_0) / π_l(S(x))    (8)

Corollary 2 can be reexpressed as the fact that under the invariance assumptions, rejecting H_0 when PLR(x_0, 1) > p is equivalent to rejecting H_0 when p_val(T(x_0)) > p, where the p-value is based on the idea of rejecting H_0 when T(x_0), defined in equation (8) (the observed weighted likelihood under H_0), is not large enough.

2.3. Examples and perspective.

Dempster (1973) has shown that the PLR is equal to the classical p-value associated with the test statistic T(x) = |x̄ - θ_0| when testing the mean of a normal family for X with a uniform prior on Θ. Corollary 2 extends this result since the normal family is one of the distributions invariant under translation when testing the location parameter, the uniform prior (i.e. the Lebesgue measure) is the measure induced by the right Haar measure associated with translation, and the test statistic T(·) is a monotone function of p_S(S(·)|θ_0) π_l(S(·))⁻¹, since the translation (sum) is commutative, so that ∆(g) = 1 for all g ∈ G and hence π_l is constant. The result proved here concerns all distributions invariant under some group transformation, under the assumptions that there exists a sufficient statistic and that the sets G, S(X) and Θ are isomorphic. Assume for example that the likelihood p_S has the typical form p_S(S(x)|θ) = θ⁻¹ f(S(x) θ⁻¹). The likelihood is invariant under the scale transformation g(S(x)) = α × S(x) and the actions on S(X) and Θ are identical. Note that U f(U) with U = S(X) θ⁻¹ is a pivotal quantity, meaning that its distribution does not depend on θ. The induced prior measure is classically given by Π_r(dθ) ∝ θ⁻¹ dθ. Since the multiplication transformation is commutative, the modulus ∆ is uniformly equal to 1, so that the test statistic that appears in the p-value (Corollary 2) is simply T(x) = S(x) θ_0⁻¹ f(S(x) θ_0⁻¹), where θ_0 is the value of the parameter under H_0. For a more general insight into the relationship between Haar invariance and Fisher's pivotal theory, see Eaton and Sudderth (1999). Theorem 2 assumes that G, X and Θ are isomorphic. This assumption is relaxed in Corollaries 1 and 2, where the sample X is replaced by a sufficient statistic S(X): G, S(X) and Θ are assumed to be isomorphic. This trick is one of the two classical dimensionality-reduction techniques concerning Haar measures applied to statistical problems, and it somehow restricts the likelihood to belong to the exponential family, from the Darmois theorem. The second trick consists schematically in replacing S(X) by the orbit of G associated with the observed dataset, O_{x_0} = {g x_0 | g ∈ G} ⊂ X.
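Corollary 2 can be checked by simulation in the Gaussian location case discussed above. The sketch below (hypothetical data; flat right-Haar prior, so that the posterior of θ is N(x̄, σ²/n)) estimates PLR(x_0, 1) from posterior draws and, independently, the p-value of equation (2) with the weighted-likelihood statistic T of equation (8); in this translation-invariant case π_l is constant and T is a monotone function of -|x̄ - θ_0|, and the two probabilities agree up to Monte Carlo error:

```python
import numpy as np
from math import sqrt

rng = np.random.default_rng(7)
theta0, sigma, n = 0.0, 1.0, 50
x = rng.normal(0.3, sigma, n)              # hypothetical observed dataset
xbar = x.mean()
s = sigma / sqrt(n)

# PLR(x0, 1) = Pr(LR(x, theta) <= 1 | x); for the Gaussian location model,
# LR <= 1 reduces to |theta - xbar| <= |theta0 - xbar|.
thetas = rng.normal(xbar, s, 200_000)
plr_at_1 = float(np.mean(np.abs(thetas - xbar) <= abs(theta0 - xbar)))

# p-value of equation (2) with T(x) monotone in -|xbar - theta0|:
# the event T(x0) < T(x) is |xbar_rep - theta0| < |xbar_obs - theta0|.
xbar_rep = rng.normal(theta0, s, 200_000)
p_val = float(np.mean(np.abs(xbar_rep - theta0) < abs(xbar - theta0)))

print(round(plr_at_1, 3), round(p_val, 3))
```

The two printed numbers match to within sampling noise, which is the content of equation (7) in this frame.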
However, the whole set of assumptions that would be involved is more technical (see for example the general assumptions made by Zidek (1969) or Eaton and Sudderth (2002)) and is not investigated here.

2.4. Connection to other Bayesian and frequentist reconciliations.

The result, which concerns hypothesis testing, may be related to the different approaches used to somehow reconcile frequentist and Bayesian point estimation, and confidence intervals especially. Group invariance applied to invariant inference is the classical frame of such unifications. The Fisherian pivotal theory (Fisher (1956)) is an important contribution mainly to the "frequentist" side, and the right Haar measure to the "Bayesian" side. The reconciliation of the two approaches started with Fraser (1961) and has been deeply studied since then, by Zidek (1969) for example. The most general stage of unification is reached by Eaton and Sudderth (1999). They make explicit the central hypothesis of the Fisherian pivotal theory and show, under quite standard invariance assumptions, that this hypothesis leads to a procedure identical to the Bayesian invariant procedure using the prior induced by the right Haar measure. Note that they also show (and in a more general manner in Eaton and Sudderth (2002)) that for a Bayesian invariant inference to be admissible (in the sense that there exists no invariant inference whose mean quadratic error is lower for all $\theta$), it has to be obtained from the right Haar prior.

More concretely, the question related to reconciled probability domains is: "Under what assumptions does the following equality hold?"

\[
\Pr\bigl( \theta \in R(x) \bigm| x \bigr) = \Pr\bigl( \theta \in R(x) \bigm| \theta \bigr) \tag{9}
\]

i.e.

\[
\int_{\{\theta \in R(x)\}} \pi(\theta \mid x)\, d\theta = \int_{\{x \mid \theta \in R(x)\}} p(x \mid \theta)\, dx
\]

For the equality to hold, each probability needs to be a constant.
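Equality (9) can be illustrated numerically in the simplest translation-invariant case. Below is a Monte Carlo sketch on a toy Gaussian setup of our own: with $x \sim N(\theta, 1)$, a flat prior (posterior $N(x, 1)$), and the invariant domain $R(x) = [x - c, x + c]$, both sides of (9) equal $\Pr(|Z| \le c)$ for $Z \sim N(0, 1)$.

```python
import random

random.seed(5)

# Monte Carlo check (toy Gaussian location case) of equality (9) for the
# invariant domain R(x) = [x - c, x + c].
c, n = 1.0, 200_000
x0, theta0 = 0.3, -1.2  # arbitrary fixed values; neither probability depends on them

# Bayesian side: Pr(theta in R(x0) | x0) with theta ~ N(x0, 1)
bayes = sum(abs(random.gauss(x0, 1) - x0) <= c for _ in range(n)) / n
# frequentist side: Pr(theta0 in R(x) | theta0) with x ~ N(theta0, 1)
freq = sum(abs(random.gauss(theta0, 1) - theta0) <= c for _ in range(n)) / n

print(round(bayes, 2), round(freq, 2))  # both close to Pr(|Z| <= 1) ~ 0.683
```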
After Fraser (1961)'s initial work, Stein (1965) sketched the first conditions of what would later be called Stein's theorem for invariant domains. The part common to the different "Stein's theorems" is the following:

If a domain $R(x) \subset \Theta$ satisfies $\bar{g} R(x) = R(g x)$ with $\bar{g} R(x) = \{ \bar{g} \theta \mid \theta \in R(x) \}$, then under [some invariance assumptions],

\[
\Pr\bigl( \theta \in R(x) \bigm| x \bigr) = c \quad \forall x \in \mathcal{X} \quad \text{(Bayesian probability)}
\]

and

\[
\Pr\bigl( \theta \in R(x) \bigm| \theta \bigr) = c \quad \forall \theta \in \Theta \quad \text{(frequentist probability)}
\]

One of the simplest sets of assumptions found since Stein (1965) is that of Chang and Villegas (1986). It is relatively close to the one used for our results, presented in Section 2.2.

Our result, mainly contained in Theorem 1, is not a consequence of Stein's theorem, because the domain $R(x) \subset \Theta$ is not invariant in our case. $R(x)$ would be invariant only if $\theta_0$ were invariant under the transformation group $G$, i.e. if $\bar{g} \theta_0 = \theta_0$ for all $\bar{g}$ (this is equivalent to assuming that $H_0$ is invariant under $G$). But in Theorem 2, stated and proved in Appendix 2 and used in Appendix 3 to prove Theorem 1, $\phi_\theta$ is assumed to be one-to-one for all $\theta \in \Theta$, which implies that $\bar{g} \theta_0 = \theta_0$ is equivalent to $\bar{g} = e$ (the identity function). So the domain $R(x) \subset \Theta$ is not invariant in our case, and Stein's theorem does not imply the reconciliation result presented in Section 2.2. Theorem 1 does not answer the previous question, but rather relaxes the form of the domain and accepts a procedure that varies according to the observed dataset $x_0$ and the value $\theta_0$ of the parameter under $H_0$.
It answers the question: "Under what assumptions and for what domains $R$ and $C$ does the following equality hold?"

\[
\int_{R(x_0, \theta_0) \subset \Theta} \pi(\theta \mid x_0)\, d\theta = \int_{C(x_0, \theta_0) \subset \mathcal{X}} p(x \mid \theta_0)\, dx \tag{10}
\]

The domains found take the form

\[
R(x_0, \theta_0) = \{ \theta \mid p(x_0 \mid \theta_0) \le p(x_0 \mid \theta) \}
\qquad
C(x_0, \theta_0) = \{ x \mid p(x_0 \mid \theta_0) f(x_0) \le p(x \mid \theta_0) f(x) \}
\]

where $f(x)$ is some weighting function, actually given by the inverse of the left prior induced by the underlying group.

3. PLR for composite vs composite hypotheses testing

Up to this section, the PLR has been defined only in the simple ($H_0 : \theta = \theta_0$) vs composite case, i.e. according to Dempster (1973)'s first definition. For the more general hypothesis $H_0 : \Psi(\theta) = \psi_0$ presented at the end of Section 1.1, Dempster's approach has been generalized by Aitkin (1997), with a modification presented by Aitkin (2010). Namely, Aitkin (2010) proposes to compute

\[
\Pr\Bigl( p(x \mid \theta) < p\bigl( x \bigm| (\Psi, \Lambda)^{-1}(\psi_0, \Lambda(\theta)) \bigr) \Bigm| x \Bigr)
\]

and details and illustrates some advantages of the method. In the case $\Psi(\theta) = \theta$, it corresponds to Dempster's definition (see page 42 of Aitkin (2010)). The approach of Evans (1997) also carries interesting properties. In particular, a variety of optimality properties for inferences based on relative belief ratios are established in Evans et al. (2006), Evans and Shakhatreh (2008) and Evans and Jang (2011), which include optimal testing properties based on establishing a kind of Bayesian version of the Neyman-Pearson lemma. However, the hypotheses test case on which they rely is not broad enough for many cases.

The purpose of this section is to extend the definition of the PLR to the classical composite vs composite hypotheses test. Suppose the data models related to the two hypotheses belong to the same parametric family $P_\Theta = \{ p(\cdot \mid \theta),\ \theta \in \Theta \}$.
This assumption can actually be realized for any hypotheses test of parametric models by merging the tested parametric families into a so-called super-model. A composite vs composite hypotheses test consists in choosing between

\[
H_0 : \theta \in \Theta_0 \qquad H_1 : \theta \in \Theta_1 \tag{11}
\]

for any domains $\Theta_0$ and $\Theta_1$. We denote by $\Pi_0(\cdot)$ and $\Pi_1(\cdot)$ the prior distributions over $\Theta_0$ and $\Theta_1$. In this section we propose two extensions of Dempster's approach for this test case. The first extension can be used when the prior under one hypothesis is improper but both posteriors are proper. The second extension, made of two symmetrical probabilities, is the statistic suggested by a new Bayesian-type Neyman-Pearson lemma, which also indicates that the LR is a central discrepancy variable.

3.1. Extensions of the PLR.

In the simple $\Theta_0 = \{\theta_0\}$ vs composite hypotheses test, the PLR was primarily defined as

\[
\mathrm{PLR}(x, \zeta) = \int_{\{ \theta_1 \mid p(x \mid \theta_1) < \zeta\, p(x \mid \theta_0) \}} \Pi_1(d\theta_1 \mid x)
\]

In the composite vs composite hypotheses test, a first interesting extension of this concept consists in defining the following statistic:

\[
\mathrm{PLR}_{01}(x, \zeta) = \int_{\{ (\theta_0, \theta_1) \mid p(x \mid \theta_0) < \zeta\, p(x \mid \theta_1) \}} \Pi_0(d\theta_0 \mid x)\, \Pi_1(d\theta_1 \mid x) \tag{12}
\]

It is well defined as soon as the posterior distributions are both proper. Since only $x$ is known, the event $p(x \mid \theta_0) < \zeta\, p(x \mid \theta_1)$ can be measured only by integrating over all $\theta_0 \in \Theta_0$ and all $\theta_1 \in \Theta_1$. Here we decide to measure it according to the posterior distribution of $\theta_0$ times the posterior distribution of $\theta_1$, which is perfectly allowed.
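Equation (12) is directly amenable to Monte Carlo evaluation: draw $\theta_0$ from the posterior under $H_0$, $\theta_1$ from the posterior under $H_1$, and count how often $p(x \mid \theta_0) < \zeta\, p(x \mid \theta_1)$. Here is a minimal sketch under a toy Gaussian model of our own (not an example from the paper), with flat improper priors restricted to each domain, so that both posteriors are proper truncated normals:

```python
import math
import random

random.seed(1)
x = 0.8  # observed datum; model x | mu ~ N(mu, 1)

def loglik(mu):
    return -0.5 * (x - mu) ** 2  # log-likelihood up to a constant

def posterior_draw(lower, upper):
    """Rejection sampling from N(x, 1) truncated to (lower, upper]:
    the posterior under a flat improper prior restricted to the domain."""
    while True:
        mu = random.gauss(x, 1.0)
        if lower < mu <= upper:
            return mu

def plr01(zeta, n=20_000):
    """Monte Carlo estimate of PLR01(x, zeta) from equation (12)."""
    log_zeta = math.log(zeta)
    hits = 0
    for _ in range(n):
        th0 = posterior_draw(-math.inf, 0.0)  # H0: mu <= 0
        th1 = posterior_draw(0.0, math.inf)   # H1: mu > 0
        if loglik(th0) < log_zeta + loglik(th1):
            hits += 1
    return hits / n

v = plr01(zeta=1.0)
print(round(v, 2))
```

Only log-likelihoods up to a common constant are needed, since the constant cancels in the comparison.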
A second interesting extension of the simple PLR consists in defining the two following symmetrical statistics:

\[
\mathrm{PLR}_0(x, \zeta) = \int_{\{ (\theta_0, \theta_1) \mid p(x \mid \theta_1) < \zeta\, p(x \mid \theta_0) \}} \Pi_0(d\theta_0 \mid x)\, \Pi_1(d\theta_1) \tag{13}
\]

\[
\mathrm{PLR}_1(x, \zeta) = \int_{\{ (\theta_0, \theta_1) \mid p(x \mid \theta_0) < \zeta\, p(x \mid \theta_1) \}} \Pi_1(d\theta_1 \mid x)\, \Pi_0(d\theta_0) \tag{14}
\]

In the simple vs composite test, note that only $\mathrm{PLR}_{01}(x, \zeta)$ and $\mathrm{PLR}_1(x, \zeta)$ are equal to the PLR as defined by Dempster (1973), and they can thus be considered as extensions of the PLR. However, given the symmetry of the two hypotheses in a composite vs composite test, the notation $\mathrm{PLR}_0(x, \zeta)$ will also be necessary in the sequel. Each quantity has its own definition, interpretation, properties and field of use. We do not push the interpretation far here, and rather focus on unquestionable properties and results.

$\mathrm{PLR}_{01}(x, \zeta)$ is the only extension of the two which allows for using improper priors. It will be illustrated in the next subsection on a practical precipitation change test, which requires the use of a prior too smooth for the other extension to be used. On the other side, the statistic $\mathrm{PLR}_1(x, 1)$ is the expectation over the prior under $H_0$ of the posterior probability under $H_1$ that the likelihood of $\theta_0$ is less than the likelihood of $\theta_1$, and reciprocally:

\[
\mathrm{PLR}_1(x, \zeta) = E_0\bigl[ \Pr\nolimits_1\bigl( p(x \mid \theta_0) < \zeta\, p(x \mid \theta_1) \bigm| x \bigr) \bigr]
\]

$\mathrm{PLR}_0$ and $\mathrm{PLR}_1$ will appear as statistics emerging from a more general frame through a Bayesian-type Neyman-Pearson lemma. Extending the interpretation of the new PLRs in terms of joint probabilities requires the definition of a measure over $\Theta_0 \times \Theta_1$ given $x$ and one of the two hypotheses. Such a measure seems to make sense in terms of both mathematics and interpretation, but the issue needs to be deepened.

Remark 1.
If all subsets defined on the sets $\Theta_0 \times \mathcal{X} \mid H_0$ and $\Theta_1 \mid H_0$ are independent, then the joint measure $\Pi_{01,0}$ defined over $\Theta_0 \times \Theta_1 \times \mathcal{X} \mid H_0$ is equal to

\[
\Pi_{01,0}(d\theta_0, d\theta_1 \mid x) = \Pi_0(d\theta_0 \mid x)\, \Pi_1(d\theta_1)
\]

for infinitesimal subsets around any $(\theta_0, \theta_1) \in \Theta_0 \times \Theta_1$. The same holds when exchanging the roles of $H_0$ and $H_1$, and leads to the measure $\Pi_{01,1}$:

\[
\Pi_{01,1}(d\theta_0, d\theta_1 \mid x) = \Pi_0(d\theta_0)\, \Pi_1(d\theta_1 \mid x)
\]

The proof of the remark is given in Appendix 4. So if we assume that the joint measures exist and that the priors and posteriors are all proper, then the composite PLRs defined in equations (13) and (14) are probability measures.

3.2. Example: detection of a change in precipitation in Switzerland.

Let us illustrate $\mathrm{PLR}_{01}$ defined in equation (12). Although the change in temperature during the 20th century is evident at a world scale and in some areas, a potential change in precipitation remains under study. As a simple case, let us consider a single weather station in Switzerland and test whether the statistical properties of the rain frequency have changed. As recalled for example by Aksoy (2000), daily precipitation amounts are well described by a gamma distribution, characterized by a shape parameter $a$ and a rate parameter $b$. Assume the daily rainfalls $x_1$ fallen during the five first autumns of the 20th century are i.i.d. with parameters $a_1$ and $b_1$, as well as $x_2$ during the five last autumns, with parameters $a_2$ and $b_2$.

Figure 1. Log-likelihood of the dataset under $H_0$ (left figure) and the dataset under $H_1$ (center and right figures). Some frequentist estimates of the parameters (circles) are superimposed on the true parameter values (diamonds).
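The gamma model above translates into a simple log-likelihood helper. The sketch below is our own illustration with made-up amounts (not the Swiss station records): under $H_0$ one shared pair $(a, b)$ models both samples, while under $H_1$ each sample has its own pair.

```python
import math

def gamma_loglik(data, a, b):
    """Log-likelihood of i.i.d. Gamma(shape=a, rate=b) observations."""
    n = len(data)
    return (n * (a * math.log(b) - math.lgamma(a))
            + (a - 1) * sum(math.log(v) for v in data)
            - b * sum(data))

# hypothetical daily amounts (mm), standing in for the real records
x1 = [2.1, 0.4, 1.3, 5.0, 0.9]
x2 = [1.8, 0.6, 2.2, 4.1, 1.1]

loglik_h0 = gamma_loglik(x1 + x2, a=1.2, b=0.7)                      # one shared (a, b)
loglik_h1 = gamma_loglik(x1, 1.2, 0.7) + gamma_loglik(x2, 1.4, 0.8)  # one pair per period
print(round(loglik_h0, 2), round(loglik_h1, 2))
```

The i.i.d. structure makes the log-likelihood additive across samples, which is what allows the $H_1$ likelihood to factor into the two periods.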
The detection of a statistical change consists in testing whether the sets of parameters are equal or not:

\[
H_0 : (a_1, b_1) = (a_2, b_2) \qquad H_1 : (a_1, b_1) \ne (a_2, b_2) \tag{15}
\]

Note that the dimension of $\Theta_0 = \mathbb{R}_+^{*\,2}$ is less than the dimension of $\Theta_1 = \mathbb{R}_+^{*\,4}$, so that for a regular prior under $H_1$, $\Pr_1(\theta \in \Theta_0) = 0$. Borges and Stern (2007) are particularly interested in the behavior of the e-value of the FBST in such cases. Here it simply means that there is one prior $\pi(a, b)$ under $H_0$ and the product of two priors $\pi(a_1, b_1) \times \pi(a_2, b_2)$ under $H_1$, to be combined respectively with the likelihood $p(x_1, x_2 \mid a, b)$ under $H_0$ and the likelihood $p(x_1 \mid a_1, b_1) \times p(x_2 \mid a_2, b_2)$ under $H_1$.

To enable simple simulations of the posterior distributions under both hypotheses, the conjugate prior of the gamma distribution (see the compendium by Fink (1997)) developed by Miller (1980) is used for $\pi$, with hyperparameters that may vary without affecting the final results much. The impact of the prior on the PLR is easy to see from the PLR display, as will be explained very shortly. In practice, the prior $\pi$ is almost improper, so that only $\mathrm{PLR}_{01}$ defined in equation (12) can be used.

First, simulations roughly corresponding to the observed rainfall are performed. One dataset is simulated under $H_0$ and another is simulated under some reasonably similar alternative $H_1$. The two simulated datasets are characterized by their likelihoods, displayed in Figure 1. The posterior distribution of each couple $(a, b)$, $(a_1, b_1)$ and $(a_2, b_2)$ is separately sampled by an MCMC multivariate slice sampling algorithm (Radford (2003)) implemented in the R package "SamplerCompare", kindly written and provided by Thompson (2012). The PLR is simply computed by ordering the LRs obtained for all possible combinations of parameters and counting the fraction which is less than some threshold $\zeta$, chosen according to the level of evidence wanted in favor of $H_0$ or $H_1$. In practice, the PLR is displayed as a function of $\zeta$ by simply displaying the empirical cumulative distribution of the LRs. This leads to Figure 2.

Figure 2. PLR obtained from the dataset simulated under $H_0$ (left) and the dataset simulated under $H_1$ (right). In practice these are simply the empirical cumulative distributions of the $\mathrm{LR}(x, \theta^{[i]})$ chains. For the $H_0$ dataset the PLR clearly and correctly accepts $H_0$, and reciprocally for the $H_1$ dataset the PLR clearly and correctly rejects $H_0$: notice the difference of the x-axis scales between the two simulation cases.

It can be read, for example, that for the dataset under $H_0$, $\mathrm{PLR}_{01}(x, 0.1) = 0.08$, which means that there is an almost null probability that the likelihood under $H_1$ is more than 10 times greater than the likelihood under $H_0$, so that $H_0$ is (correctly) clearly accepted. Alternatively, for the $H_1$ dataset, $\mathrm{PLR}_{01}(x, 0.1) = 1.00$, meaning that there is a probability one that the likelihood under $H_1$ is more than 10 times greater than the likelihood under $H_0$, so that $H_0$ is (correctly) clearly rejected. Note that since the GLR indicates the lower bound of the support of the PLR, and since the slope of the PLR is infinite there if the likelihood function is smooth enough at its maximum (see Section 1.2), the exact expression of the prior only affects the way $\mathrm{PLR}_{01}(x, \zeta)$ increases as $\zeta$ departs from $\mathrm{GLR}(x)$. Here, for example, the choice of the hyperparameters (within a domain considered as reasonable) does not change the conclusion that would be drawn from the PLR displayed in Figure 2.

Switching to the true dataset $x$, the PLR is obtained following the same procedure as with the simulated datasets. $\mathrm{PLR}_{01}(x, \zeta)$ is displayed in Figure 3.
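The computation just described — form the likelihood ratios for all combinations of sampled parameters and read off their empirical cumulative distribution — can be sketched as follows. The chains here are stand-in toy numbers of our own, not the actual slice-sampling output:

```python
import math
import random

random.seed(2)
# stand-ins for the MCMC output: one chain of posterior log-likelihoods
# log p(x | theta[i]) per hypothesis
loglik0 = [random.gauss(-100.0, 1.0) for _ in range(500)]
loglik1 = [random.gauss(-98.0, 1.0) for _ in range(500)]

def plr01_curve(zeta):
    """Fraction of chain pairs (i, j) with p(x|theta0_i) < zeta * p(x|theta1_j):
    the empirical CDF of the pairwise likelihood ratios, evaluated at zeta."""
    log_zeta = math.log(zeta)
    hits = sum(1 for a in loglik0 for b in loglik1 if a < log_zeta + b)
    return hits / (len(loglik0) * len(loglik1))

print(round(plr01_curve(0.1), 2), round(plr01_curve(1.0), 2))
```

Plotting `plr01_curve` against $\zeta$ reproduces the kind of display described for Figure 2: a nondecreasing curve whose value at a chosen $\zeta$ is read off directly.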
The graph is, by construction of the simulations, very similar to the one obtained with the data simulated under $H_0$. Now $\mathrm{PLR}_{01}(x, 0.1) = 0.10$, and $H_0$ clearly cannot be rejected, so that no change in the 20th century precipitation in Switzerland is detected, which is not surprising to climatologists.

Figure 3. PLR obtained from daily autumnal precipitation observed at a weather station in Switzerland, from 1900-1905 for the first part of the dataset and 1995-2000 for the second part of the dataset. $H_0$ cannot be rejected, so that no change in precipitation is detected.

3.3. Bayesian-type Neyman-Pearson lemma.

In the choice of a hypothesis, instead of considering the subset

\[
R^*(x) = \{ (\theta_0, \theta_1) \mid p(x \mid \theta_0) < \zeta\, p(x \mid \theta_1) \} \tag{16}
\]

one might consider any subset $\mathcal{R}(x) \subset \Theta_0 \times \Theta_1$, possibly depending on $x$. Such a subset could involve a discrepancy variable $D : \mathcal{X} \times \Theta \mapsto \mathbb{R}$, as in the predictive p-value highlighted by Meng (1994), and take the form "$D(x, \theta_0) < \zeta\, D(x, \theta_1)$". The discrepancy variable that appears in the PLR is $\mathrm{LR}(x, \theta)$.

$R^*(x)$, defined from the LR test, is interesting for hypotheses testing because this set is a somehow classical hypothesis rejection set. It is not a fully classical rejection set, because it is defined on the parameter space rather than on the observation space, but its characterization is optimal in the frequentist setting. $R^*(x)$ is the set, depending on the dataset $x$, of all fixed $(\theta_0, \theta_1) \in \Theta_0 \times \Theta_1$ such that the likelihood of $\theta_0$ is less than the likelihood of $\theta_1$, which reasonably leads to rejecting $H_0$ for this element $(\theta_0, \theta_1)$. In the same way, one can replace the LR test by any test, i.e. consider any subset $\mathcal{R}(x) \subset \Theta_0 \times \Theta_1$ such that for $(\theta_0, \theta_1) \in \mathcal{R}(x)$, $H_0$ would be decided to be rejected.
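A discrete toy computation (our own construction, not from the paper) hints at why the LR-based subset $R^*(x)$ is special. By Bayes' theorem, the ratio of the two natural weights of a point $(\theta_0, \theta_1)$ — $w_1 = \Pi_1(d\theta_1 \mid x)\, \Pi_0(d\theta_0)$ over $w_0 = \Pi_0(d\theta_0 \mid x)\, \Pi_1(d\theta_1)$ — is proportional to the likelihood ratio $p(x \mid \theta_1) / p(x \mid \theta_0)$. Keeping the cells with the largest $w_1 / w_0$ is then a fractional-knapsack-type rule: no competitor set with a smaller or equal $w_0$-mass can have a larger $w_1$-mass.

```python
import random

random.seed(3)

# Cells stand in for points (theta0, theta1); each carries a pair of positive
# weights (w0, w1). The threshold rule keeps cells with w1 > t * w0.
cells = [(random.random(), random.random()) for _ in range(200)]
t = 1.0  # threshold on the ratio w1 / w0, playing the role of zeta

r_star = [c for c in cells if c[1] > t * c[0]]
pfa_star = sum(w0 for w0, _ in r_star)
pd_star = sum(w1 for _, w1 in r_star)

# Random competitor sets with comparable w0-mass on average.
p_inc = pfa_star / sum(w0 for w0, _ in cells)
checked = 0
for _ in range(1000):
    s = [c for c in cells if random.random() < p_inc]
    pfa_s = sum(w0 for w0, _ in s)
    pd_s = sum(w1 for _, w1 in s)
    if pfa_s <= pfa_star:
        assert pd_s <= pd_star  # the threshold rule is never beaten
        checked += 1
print(checked > 0)
```

The inequality holds exactly: cells outside the threshold set satisfy $w_1 \le t\, w_0$ and cells inside satisfy $w_1 > t\, w_0$, so swapping any cells in or out can only trade $w_1$-mass at a worse rate.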
With such a phrasing, it may appear natural that the frequentist Neyman-Pearson lemma can be derived in a reciprocal, somehow Bayesian, frame. Note that the Neyman-Pearson lemma can be expressed, as the proposition here will be, symmetrically in the two hypotheses. The symmetry is only broken when adopting the Neyman paradigm, which fixes a level for the PFA and deduces the corresponding $\zeta$ (see Section 1.1). To rederive a Neyman-Pearson lemma, one would define the reciprocal notions of "Probability of False Alarm" and "Probability of good Detection":

\[
\mathrm{PFA}_B(\mathcal{R}, x) = \int_{\mathcal{R}(x)} \Pi_0(d\theta_0 \mid x)\, \Pi_1(d\theta_1) \tag{17}
\]

\[
\mathrm{PD}_B(\mathcal{R}, x) = \int_{\mathcal{R}(x)} \Pi_1(d\theta_1 \mid x)\, \Pi_0(d\theta_0) \tag{18}
\]

These quantities would also define probability measures if the joint measures exist and if the priors and posteriors are all proper. Note that these measures can be related to a joint measure with no conditioning on the hypothesis: for any set $\mathcal{R}(x) \subset \Theta_0 \times \Theta_1$, possibly depending on $x$,

\[
\Pr(\mathcal{R} \mid x) = \Pr(H_0) \Pr(\mathcal{R} \mid x, H_0) + \Pr(H_1) \Pr(\mathcal{R} \mid x, H_1)
\]

with

\[
\Pr(\mathcal{R} \mid x, H_i) = \int_{\mathcal{R}} \Pi_{01,i}(d\theta_0, d\theta_1 \mid H_i, x) = \int_{\mathcal{R}} \Pi_i(d\theta_i \mid x)\, \Pi_j(d\theta_j)
\]

so

\[
\Pr(\mathcal{R} \mid x) = \Pr(H_0) \int_{\mathcal{R}} \Pi_0(d\theta_0 \mid x)\, \Pi_1(d\theta_1) + \Pr(H_1) \int_{\mathcal{R}} \Pi_1(d\theta_1 \mid x)\, \Pi_0(d\theta_0) = \Pr(H_0)\, \mathrm{PFA}_B(\mathcal{R}, x) + \Pr(H_1)\, \mathrm{PD}_B(\mathcal{R}, x)
\]

The Bayesian-type probabilities $\mathrm{PFA}_B$ and $\mathrm{PD}_B$ add up the same way type I and type II error probabilities add up in a frequentist integral. Note also that $\mathrm{PFA}_B(\bar{\mathcal{R}}, x) = 1 - \mathrm{PFA}_B(\mathcal{R}, x)$, where $\bar{\mathcal{R}}(x)$ is the set complementary to $\mathcal{R}(x)$ in $\Theta_0 \times \Theta_1$. Following the underlying idea of the Neyman-Pearson approach, a possibility for choosing $\mathcal{R}(x)$ consists in maximizing $\mathrm{PD}_B(\mathcal{R}, x)$ over $\mathcal{R}(x)$ for a fixed $\mathrm{PFA}_B(\mathcal{R}, x)$.

Proposition 1.
The subset that maximizes $\mathrm{PD}_B(\mathcal{R}, x)$ for a fixed value of $\mathrm{PFA}_B(\mathcal{R}, x)$ is equal to the LR subset $R^*(x)$ defined in equation (16). In this case, the "Bayesian PFA and PD" are given by $\mathrm{PFA}_B(R^*, x) = 1 - \mathrm{PLR}_0(x, \zeta)$ and $\mathrm{PD}_B(R^*, x) = \mathrm{PLR}_1(x, \zeta)$. Reciprocally, the subset that maximizes $\mathrm{PFA}_B(\mathcal{R}, x)$ for a fixed value of $\mathrm{PD}_B(\mathcal{R}, x)$ is equal to $\bar{R}^*(x)$, i.e. the set which accepts $H_0$ according to the LR test. In this case, $\mathrm{PFA}_B(\bar{R}^*, x) = \mathrm{PLR}_0(x, \zeta)$ and $\mathrm{PD}_B(\bar{R}^*, x) = 1 - \mathrm{PLR}_1(x, \zeta)$.

As postdata measures (i.e. depending on the observed data), contrary to the predata frequentist PFA and PD, it is therefore informative enough to give $\mathrm{PLR}_0(x, \zeta)$ and $\mathrm{PLR}_1(x, \zeta)$ for some value $\zeta$ of interest. But this is only possible if the priors and posteriors under both hypotheses are proper. The proof of the proposition follows the proof of the Neyman-Pearson lemma restricted to deterministic tests; it is given in Appendix 5.

4. Concluding general discussion about the PLR

The PLR, introduced by Dempster (1973) in the simple vs composite hypotheses test, deserves much attention. It compares the original likelihoods $p(x \mid \theta_0)$ and $p(x \mid \theta_1)$ by computing the posterior probability that the usual LR test chooses $H_0$ or $H_1$. The PLR is simple, nicely interpretable and coupled with some deep properties. Compared to the classical Bayesian hypotheses tests, first note that unlike the BF, the PLR can be defined even for improper priors, and unlike $\Pr(H_0 \mid x)$ it does not require the delicate choice of some $\Pr(H_0)$. This is crucial in practice as well as in fundamental issues like Lindley's paradox. The PLR also turns out to be a very natural alternative to the BF in many aspects. The PLR first compares (the original likelihoods) and then integrates, whereas the BF first integrates and then compares (the marginal likelihoods).
In the simple vs composite hypotheses test, considering $\mathrm{LR}(x, \theta)$ as a random variable for a fixed $x$, the PLR is its posterior cumulative distribution (i.e. the probability of a one-sided credible interval), whereas the BF is its posterior mean (a point estimate). This credible interval vs point estimate duality between the PLR and the BF also translates into decision theory: Hwang et al. (1992) stressed that $\Pr(H_0 \mid x)$ does not measure evidence, since this is done only through the likelihood, but measures the accuracy of a test by estimating the indicator function $I_{\Theta_0}(\theta)$. Also note that, being the measure of a credible interval, the PLR is also a natural hypotheses test tool which connects postdata (i.e. conditioned upon $x$) hypotheses testing and credible interval inference. This formal equivalence was known to hold for predata inference (a rejection set is equivalent to a confidence interval) and "known" not to hold for postdata inference with the usual Bayesian tools (see Lehmann and Romano (2005) and Goutis and Casella (1997)). Tools like the PLR set up this connection.

However, when the PLR is generalized as in Section 3.1, most of these dual properties do not carry over to the composite vs composite hypotheses test. Instead, a reciprocity between the PLR and the BF exists through a Neyman-Pearson lemma perspective. The second extension of the PLR has been shown in Section 3.3 to be a somehow optimal measure, in that it measures the set that maximizes $\mathrm{PD}_B$ for a fixed $\mathrm{PFA}_B$ (a Bayesian-type version of the frequentist Neyman-Pearson lemma). Reciprocally, the BF gives a somehow optimal measure, although in the frequentist Neyman-Pearson sense, in that it maximizes the average over $\pi_1$ of $\mathrm{PD}(\theta_1)$ for a fixed PFA (the classical frequentist Neyman-Pearson lemma, but for the marginal likelihood and not the original unknown one).
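The posterior-CDF vs posterior-mean duality stated above can be checked numerically on a conjugate toy case of our own (not an example from the paper): with $x \mid \theta \sim N(\theta, 1)$, prior $\theta \sim N(0, 0.25)$ and $H_0 : \theta = 0$, the posterior mean of $\mathrm{LR}(x, \theta) = p(x \mid 0) / p(x \mid \theta)$ recovers the Bayes factor $p(x \mid 0) / m(x)$, while the posterior CDF of the same variable at $\zeta = 1$ is the PLR.

```python
import math
import random

random.seed(4)
x = 1.5  # observed datum

def lik(theta):
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)

# conjugate posterior: N(x/5, 0.2)
draws = [random.gauss(x / 5, math.sqrt(0.2)) for _ in range(100_000)]
lrs = [lik(0.0) / lik(t) for t in draws]  # LR(x, theta) for H0: theta = 0

bf_mc = sum(lrs) / len(lrs)                        # posterior MEAN of the LR
plr_at_1 = sum(lr < 1.0 for lr in lrs) / len(lrs)  # posterior CDF of the LR at zeta = 1

# exact BF01 = p(x | 0) / m(x), with m(x) the N(0, 1.25) density at x
bf_exact = lik(0.0) / (math.exp(-0.5 * x**2 / 1.25) / math.sqrt(2 * math.pi * 1.25))
print(round(bf_mc, 2), round(bf_exact, 2), round(plr_at_1, 2))
```

The same posterior sample thus yields both quantities: averaging the LRs gives the BF, counting them below a threshold gives the PLR.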
In the simple vs composite hypotheses test, the connection between the PLR (related to a credible interval) and the BF (related to a point estimate) has been underlined. Another important connection lies between frequentist and Bayesian-type hypotheses tests, namely frequentist p-values and $\Pr(H_0 \mid x)$ or the PLR. This reconciliation quest has been the subject of many debates, including Lindley's paradox in its most simple form (test of the mean of a Gaussian with a uniform prior), which has only been simply reached by the PLR of Dempster (1973). In Section 2.2 we have generalized this reconciliation result to a quite general invariant frame, close to the one used in Stein's theorem, i.e. a frame under which confidence and credible intervals are equivalent. Note that invariance is also a perspective adopted to develop and evaluate inferences, and in particular to develop new p-values, as done recently by Evans and Jang (2010) for example. For the PLR, standard simple invariance properties directly follow from the simple use of the likelihoods.

To conclude on the contribution of this paper, the equivalence between the PLR and a p-value has been proved in a general invariant frame, which nicely connects to the equivalence between confidence and credible domains. This result may contribute to a better understanding of deep and fundamental issues related to both hypotheses testing and parameter estimation, in both the frequentist and Bayesian paradigms.

References

Aitkin, M. (1997). The calibration of p-values, posterior Bayes factors and the AIC from the posterior distribution of the likelihood. Statistics and Computing, 7:253–261.

Aitkin, M. (2010). Statistical inference: an integrated Bayesian/likelihood approach. Chapman and Hall.

Aitkin, M., Boys, R. J., and Chadwick, T. (2005).
Bayesian point null hypothesis testing via the posterior likelihood ratio. Statistics and Computing, 25(3):217–230.

Aitkin, M., Liu, C. C., and Chadwick, T. (2009). Bayesian model comparison and model averaging for small-area estimation. Annals of Applied Statistics, 3(1):199–221.

Aksoy, H. (2000). Use of gamma distribution in hydrological analysis. Turk. J. Engin. Environ. Sci., 24:419–428.

Baskurt, Z. and Evans, M. (2013). Hypothesis assessment and inequalities for Bayes factors and relative belief ratios. Bayesian Analysis, 8(3):569–590.

Berger, J. and Sellke, T. (1987). Testing a point null hypothesis: the irreconcilability of P values and evidence (with discussion). Journal of the American Statistical Association, 82:112–139.

Berger, J. O. (1985). Statistical decision theory and Bayesian analysis. Springer-Verlag, 2nd edition.

Berger, J. O., Brown, L., and Wolpert, R. (1994). A unified conditional frequentist and Bayesian test for fixed and sequential simple hypothesis testing. Annals of Statistics, 22(4):1787–1807.

Berger, J. O. and Delampady, M. (1987). Testing precise hypotheses (with discussion). Statistical Science, 2(3):317–335.

Bernardo, J. (2011). Bayesian Statistics 9, chapter Integrated objective Bayesian estimation and hypothesis testing. Oxford University Press.

Birnbaum, A. (1962). On the foundation of statistical inference (with discussion). Journal of the American Statistical Association, 57(298):269–326.

Borges, W. and Stern, J. (2007). The rules of logic composition for the Bayesian epistemic e-values. Logic Journal of the IGPL, 15(5–6):401–420.

Casella, G. and Berger, R. L. (1987). Reconciling Bayesian and frequentist evidence in the one-sided testing problem. Journal of the American Statistical Association, 82(397):106–111.

Chang, T. and Villegas, C. (1986). On a theorem of Stein relating Bayesian and classical inferences in group models.
The Canadian Journal of Statistics, 14(4):289–296.

Darmois, G. (1935). Sur les lois de probabilité à estimation exhaustive. Compte-Rendu de l'Académie des Sciences de Paris, 200(1265–1266).

Dempster, A. P. (1973). The direct use of likelihood for significance testing. In Proceedings of Conference on Foundational Questions in Statistical Inference, pages 335–354, Aarhus, Denmark.

Dempster, A. P. (1997). Commentary on the paper by Murray Aitkin, and on discussion by Mervyn Stone. Statistics and Computing, 7(4):265–269.

Eaton, M. (1989). Group invariance applications in Statistics. Regional Conf. Series in Prob. and Stat.

Eaton, M. (2007). Multivariate statistics. Institute of Mathematical Statistics.

Eaton, M. and Sudderth, W. (1999). Consistency and strong inconsistency of group-invariant predictive inferences. Bernoulli, 5(5):833–854.

Eaton, M. and Sudderth, W. (2002). Group invariant inference and right Haar measure. Journal of Statistical Planning and Inference, 103(1–2):87–99.

Evans, M. (1997). Bayesian inference procedures derived via the concept of relative surprise. Communications in Statistics, 26:1125–1143.

Evans, M., Guttman, I., and Swartz, T. (2006). Optimality and computations for relative surprise inferences. Canadian Journal of Statistics, 34(1):113–129.

Evans, M. and Jang, G. (2010). Invariant p-values for model checking. Annals of Statistics, 38(1):512–525.

Evans, M. and Jang, G. (2011). Inferences from prior-based loss functions. Technical Report 1104, Dept. of Statistics, U. of Toronto.

Evans, M. and Shakhatreh, M. (2008). Optimal properties of some Bayesian inferences. Electronic Journal of Statistics, 2:1268–1280.

Fink, D. (1997). A compendium of conjugate priors. Technical report, Montana State University.

Fisher, R. A. (1973, 1st ed.: 1956). Statistical methods and scientific inference.
Oliver and Boyd, 3rd edition.

Fraser, D. A. S. (1961). The fiducial method and invariance. Biometrika, 48(3–4):261–280.

Goutis, C. and Casella, G. (1997). Relationships between post-data accuracy measures. Annals of the Institute of Statistical Mathematics, 49(4):711–726.

Hwang, J., Casella, G., Robert, C., Wells, M., and Farrell, R. (1992). Estimation of accuracy in testing. Annals of Statistics, 20(1):490–509.

Jeffreys, H. (1961). Theory of probability. Oxford University Press, 3rd edition.

Lehmann, E. L. and Romano, J. P. (2005). Testing statistical hypotheses. Springer, 3rd edition.

Lindley, D. (1957). A statistical paradox. Biometrika, 44(1–2):187–192.

Meng, X.-L. (1994). Posterior predictive p-values. Annals of Statistics, 22(3):1142–1160.

Miller, R. (1980). Bayesian analysis of the two-parameter Gamma distribution. Technometrics, 22(1):65–69.

Nachbin, L. (1965). The Haar integral. Van Nostrand.

Newton, M. and Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society Series B, 56(1):3–48.

Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36:97–131.

Neyman, J. and Pearson, E. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, 231:289–337.

Oh, H. and DasGupta, A. (1999). Comparison of the p-value and posterior probability. Journal of Statistical Planning and Inference, 76(1–2):93–107.

O'Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society, 57(1):99–138.

Pereira, C. and Stern, J. (1999). Evidence and credibility: full Bayesian significance test for precise hypotheses. Entropy, 1:104–115.

Radford, N. (2003). Slice sampling. Annals of Statistics, 31(3):705–767.

Robert, C. P. (2007). The Bayesian choice.
Springer, 2nd edition.

Robins, J., van der Vaart, A., and Ventura, V. (2000). Asymptotic distribution of p-values in composite null models. Journal of the American Statistical Association, 95(452):1143–1156.

Royall, R. (1997). Statistical evidence: a likelihood paradigm. Chapman and Hall / CRC Press.

Sellke, T., Bayarri, M. J., and Berger, J. O. (2001). Calibration of p-values for testing precise null hypotheses. American Statistician, 55(1):62–71.

Smith, I. (2010). Détection d'une source faible : modèles et méthodes statistiques. Application à la détection d'exoplanètes par imagerie directe. PhD thesis, Université de Nice Sophia-Antipolis.

Smith, I. and Ferrari, A. (2010). The posterior distribution of the likelihood ratio as a measure of evidence. In Maxent.

Smith, I. and Ferrari, A. (2014). Equivalence between the posterior distribution of the likelihood ratio and a p-value in an invariant frame. Bayesian Analysis.

Stein, C. (1965). Approximation of improper prior measures by prior probability measures. In Bernoulli, Bayes, Laplace Festschrift, pages 217–240. Springer-Verlag.

Thompson, M. (2012). R package "SamplerCompare".

Tsao, C. A. (2006). A note on Lindley's paradox. Test, 15(1):125–139.

Villegas, C. (1981). Inner statistical inference II. Annals of Statistics, 9(4):768–776.

Zidek, J. (1969). A representation of Bayesian invariant procedures in terms of Haar measure. Annals of the Institute of Statistical Mathematics, 21(1):291–308.

Appendix 1: Introduction to invariance in statistics

For a locally compact Hausdorff group $G$, $K(G)$ denotes the class of all continuous real-valued functions on $G$ that have compact support.
The left invariant Haar measure on $G$ is defined as a Radon measure $H_l$ such that for all $f \in K(G)$ and all $g_0 \in G$,
\[ \int_G f(g)\, H_l(dg) = \int_G f(g_0 g)\, H_l(dg) \]
The right invariant Haar measure $H_r$ on $G$ is defined as $H_l$ but with $g_0 g$ replaced by $g g_0$. For a given group, both Haar measures exist and are unique up to multiplicative constants.

The (right) modulus $\Delta$ of $G$ is the real positive valued function such that if $H_l$ is a left invariant Haar measure, then for all $f \in K(G)$ and all $g_0 \in G$,
\[ \int f(g g_0^{-1})\, H_l(dg) = \Delta(g_0) \int f(g)\, H_l(dg) \qquad (19) \]
From the uniqueness of the Haar measure, $\Delta$ does not depend on the choice of $H_l$; it is a continuous function such that for all $g_1, g_2 \in G$, $\Delta(g_1 g_2) = \Delta(g_1)\Delta(g_2)$, which implies $\Delta(g^{-1}) = \Delta(g)^{-1}$. Note that for a group $G$ the set of all right Haar measures is equal to the set of all left Haar measures if and only if $\Delta$ is identically equal to $1$. This occurs for example when $G$ is compact or commutative.

Concerning the Haar measures on the group $G$, the initial definitions and properties imply that if $H_l$ is a left invariant Haar measure on $G$ and $\Delta$ the modulus of $G$, then for all $f \in K(G)$
\[ \int f(g^{-1})\, H_l(dg) = \int f(g)\, \Delta(g)^{-1} H_l(dg) \qquad (20) \]
The modulus also makes it possible to relate right and left invariant Haar measures. From the last property, the measure defined by
\[ H_r(dg) = \Delta(g)^{-1} H_l(dg) \qquad (21) \]
is a right invariant Haar measure on $G$. In the same way, if $H_r$ is a right invariant Haar measure on $G$, then the measure defined by $H_l(dg) = \Delta(g)\, H_r(dg)$ is a left invariant Haar measure.

The Haar measure is applied to statistics through the concept of invariance of a data model under a group of transformations. A parametric family $\mathcal{P}_\Theta = \{ p(\cdot \mid \theta), \theta \in \Theta \}$ of densities with respect to any measure $\mu$ on $X$ is said to be invariant under the transformation group $G$ if for each $g \in G$ there exists a unique $\theta^* \in \Theta$ such that, if the distribution of $X$ has density $p(\cdot \mid \theta) \in \mathcal{P}_\Theta$, then $Y = gX$ has density $p(\cdot \mid \theta^*) \in \mathcal{P}_\Theta$. This property defines the action of $G$ on $\Theta$: $\theta^*$ may simply be denoted $\theta^* = \bar{g}\theta$, where $\{\bar{g}, g \in G\}$ defines a group.

A measure $\mu$ on $X$ is said to be relatively invariant with multiplier $\chi$ under the group $G$ if for all $f \in K(X)$ and $g \in G$
\[ \int f(x)\, \mu(dx) = \chi(g) \int f(gx)\, \mu(dx) \qquad (22) \]
If we assume that the family of densities is invariant and that the measure $\mu$ is relatively invariant, schematically we get $p(x \mid \theta) = \chi(g)\, p(gx \mid \bar{g}\theta)$ for all $x \in X$, $\theta \in \Theta$ and $g \in G$. For more about the connection between such a multiplier and the Jacobian of the transformation that leads to $gx$ from $x$, see for example Berger (1985) or Eaton (2007).

Note that Theorem 2 could be formulated differently, by defining the invariance of a probability model, but this phrasing is less common than the invariance of a family of probability densities and would have entailed a longer presentation.

To shorten the preliminaries, and without assuming any knowledge of group theory, we will not refer to group properties such as transitivity or orbits, and will simply assume that $\Theta$ and $G$ are isomorphic. More precisely, we will assume that the transformation $\phi_\theta : G \mapsto \Theta$ with $\phi_\theta(g) = \bar{g}\theta$ is one-to-one whatever $\theta \in \Theta$.

The right Haar prior on $\Theta$ is induced from the right Haar measure $H_r$ on $G$ and the action of $G$ on $\Theta$. In the frame chosen, the right Haar prior $\Pi_r^a$ is simply defined by $\Pi_r^a = H_r(\phi_a^{-1})$, with $a \in \Theta$. As shown in Villegas (1981), it turns out that the measure $\Pi_r^a$ actually does not depend on $a$.
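As a concrete illustration (not taken from the paper), consider the affine group $G = \{(a,b) : a > 0\}$ acting on $\mathbb{R}$ by $x \mapsto ax + b$, with composition $(a_1,b_1)(a_2,b_2) = (a_1 a_2,\, a_1 b_2 + b_1)$. Its left Haar measure is $H_l(da\,db) = da\,db/a^2$ and its modulus is $\Delta(a,b) = 1/a$. The Python sketch below checks the left invariance and equation (19) numerically by Riemann summation; the test function $f$, the grid bounds, and the group element $g_0$ are arbitrary choices.

```python
import math

# Sketch (not from the paper): numerical check of left invariance and of the
# modulus for the affine group G = {(a, b) : a > 0}, acting by x -> a*x + b.
# Composition: (a1, b1)(a2, b2) = (a1*a2, a1*b2 + b1).
# Left Haar measure: H_l(da db) = da db / a^2; modulus: Delta(a, b) = 1/a.
# We substitute u = ln(a), so that H_l becomes e^{-u} du db.

def f(u, b):
    # a smooth, rapidly decaying test function (stands in for f in K(G))
    return math.exp(-u * u - (b / 4.0) ** 2)

def integrate(transform):
    # Riemann sum of f(transform(g)) against H_l(dg), in (u, b) coordinates
    du, db = 0.05, 0.1
    total = 0.0
    for i in range(-160, 160):
        u = i * du
        w = math.exp(-u) * du * db  # H_l weight e^{-u} du db
        for j in range(-300, 300):
            b = j * db
            tu, tb = transform(u, b)
            total += f(tu, tb) * w
    return total

a0, b0 = 2.0, 1.0  # an arbitrary group element g0

I1 = integrate(lambda u, b: (u, b))                           # int f(g) H_l(dg)
I2 = integrate(lambda u, b: (u + math.log(a0), a0 * b + b0))  # int f(g0 g) H_l(dg)
# g g0^{-1} = (a/a0, b - a*b0/a0), i.e. (u - ln a0, b - e^u * b0/a0):
I3 = integrate(lambda u, b: (u - math.log(a0), b - math.exp(u) * b0 / a0))

print(I2 / I1)  # ~ 1      (left invariance)
print(I3 / I1)  # ~ 1/a0   (eq. (19) with Delta(g0) = 1/a0)
```

The same scheme with $H_r(da\,db) = da\,db/a$ in place of $H_l$ checks right invariance and equation (21).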
The induced prior is therefore unique for a fixed $H_r$ and noted $\Pi_r$. The relation $\Pi_r = H_r(\phi_a^{-1})$ means that for any measurable subset $A \subset \Theta$, $\Pi_r(A) = H_r(\phi_a^{-1} A)$ with $\phi_a^{-1} A = \{ \phi_a^{-1}\theta \mid \theta \in A \}$. Note that a subset $A = d\theta$ denotes an infinitesimal subset centered around $\theta$, where $\theta$ is implicit. $\Pi_r$ can be normalized into a probability measure if and only if the group $G$ is compact, and in this case we can go back to the usual notation $\Pi_r(A) = \Pr(\theta \in A)$ where the measure $\Pi_r$ is implicit in $\Pr(\cdot)$.

Finally, from the data model density $p(\cdot \mid \theta)$ and the prior $\Pi_r$, the posterior measure $\Pi_r^x$ on $\Theta$ is classically defined by
\[ \Pi_r^x(B) = \frac{\int_B p(x \mid \theta)\, \Pi_r(d\theta)}{m(x)} \quad \text{for all } B \subset \Theta \qquad (23) \]
with
\[ m(x) = \int p(x \mid \theta)\, \Pi_r(d\theta) \]
where the marginal density $m(x)$ of $x$ is always assumed to be finite, so that $\Pi_r^x$ defines a probability measure even if $\Pi_r$ does not. The posterior probability of an event is then denoted by $\Pr(\cdot \mid x)$, meaning $\Pr(\theta \in B \mid x) = \Pi_r^x(B)$.

Appendix 2: General theorem and its proof

Theorem 2. Call $\mathcal{P}_\Theta = \{ p(\cdot \mid \theta), \theta \in \Theta \}$ a family of probability densities with respect to a measure $\mu_r$ on $X$, specified later, and call $G$ a group acting on $X$. Assume that $\mathcal{P}_\Theta$ is invariant under the action of the group $G$ on $X$ and note $\bar{g}\theta$ the induced action of the element $g \in G$ on the element $\theta \in \Theta$. Call $H_r$ any right Haar measure of $G$ and define the transformations $\phi_\theta$ (for $\theta \in \Theta$) and $\phi_x$ (for $x \in X$) by
\[ \phi_\theta : \bar{G} \mapsto \Theta,\ \bar{g} \mapsto \bar{g}\theta \qquad \phi_x : G \mapsto X,\ g \mapsto gx \qquad (24) \]
Assume that
(1) $\phi_\theta$ is one-to-one for all $\theta \in \Theta$ and $\phi_x$ is one-to-one for all $x \in X$.
(2) The prior measure $\Pi_r$ on $\Theta$ is the measure induced by $H_r$ via $\phi_\theta$ and the measure $\mu_r$ on $X$ is the measure induced by $H_r$ via $\phi_x$: $\Pi_r = H_r(\phi_\theta^{-1})$ and $\mu_r = H_r(\phi_x^{-1})$.
(3) The marginal density of $x$ is finite, so that the posterior measure $\Pi_r^x$ on $\Theta$, classically defined by equation (23), defines the posterior probability $\Pr(\cdot \mid x_0)$.

Then, the PLR defined by equation (4) can be reexpressed, for any $\zeta > 0$ and any $c \in X$, as the frequentist integral
\[ \mathrm{PLR}(x_0, \zeta) = \Pr\Big( p(x_0 \mid \theta_0)\, \Delta(\phi_{x_0}^{-1} c) \le \zeta\, p(x \mid \theta_0)\, \Delta(\phi_x^{-1} c) \,\Big|\, \theta_0 \Big) \qquad (25) \]
where $\Delta$ is the modulus of the group $G$, as defined in equation (19), and in practice $x_0 \in X$ is the observed data and $\theta_0 \in \Theta$ the parameter value under the null hypothesis.

Note that, as seen in the previous appendix, the measures $\mu$ and $\Pi_r$ defined in Theorem 2 do not depend on the choice of $\theta \in \Theta$ and $x \in X$ in the functions $\phi_\theta$ and $\phi_x$. In order to clarify the proof, we write $a$ instead of $x$ and $b$ instead of $\theta$ in the following. We shall make use of the following lemma:

Lemma 1. The measures $\mu$ on $X$ and $\Pi_r$ on $\Theta$ induced above by the right Haar measure $H_r$ on $G$ are relatively invariant with multiplier $\Delta^{-1}$.

Proof.
\[
\begin{aligned}
\int f(g_0 x)\, \mu(dx) &= \int f(g_0 x)\, H_r(\phi_a^{-1}(dx)) && \text{(def. of $\mu$ in the conditions of Th. 2)} \\
&= \int f(g_0 \phi_a g)\, H_r(dg) && \text{(transformation $g = \phi_a^{-1} x$)} \\
&= \int f(g_0 g a)\, \Delta(g)^{-1} H_l(dg) && \text{(def. of $\phi_a$ and eq. (21))} \\
&= \Delta(g_0) \int f(g_0 g a)\, \Delta(g_0 g)^{-1} H_l(dg) && \text{(multiplicativity of $\Delta$)} \\
&= \Delta(g_0) \int f(g a)\, \Delta(g)^{-1} H_l(dg) && \text{($H_l$ left invariant)} \\
&= \Delta(g_0) \int f(x)\, \mu(dx) && \text{(previous computations in reverse order)}
\end{aligned}
\]
This also implies that a Haar prior induced as in Theorem 2, i.e. from a right invariant Haar measure on $G$, is relatively invariant.

Now reexpress the PLR:
\[
\begin{aligned}
\mathrm{PLR}(x_0, \zeta) &= \Pr\big( p(x_0 \mid \theta_0) \le \zeta\, p(x_0 \mid \theta) \,\big|\, x_0 \big) \\
&= \frac{1}{m(x_0)} \int_{\{\theta \,:\, p(x_0|\theta_0) \le \zeta p(x_0|\theta)\}} p(x_0 \mid \theta)\, \Pi_r(d\theta) \qquad (26) \\
&= \frac{1}{m(x_0)} \int_{\{\theta \,:\, p(x_0|\theta_0) \le \zeta p(x_0|\theta)\}} p(x_0 \mid \theta)\, H_r(\phi_b^{-1}(d\theta)) && \text{(def. of $\Pi_r$ in the conditions of Th. 2)} \\
&= \frac{1}{m(x_0)} \int_{\{g \,:\, p(x_0|\theta_0) \le \zeta p(x_0|\phi_b g)\}} p(x_0 \mid \phi_b g)\, H_r(dg) && \text{($g = \phi_b^{-1}\theta$)} \\
&= \frac{1}{m(x_0)} \int_{\{g \,:\, p(x_0|\theta_0) \le \zeta p(x_0|g b)\}} p(x_0 \mid g b)\, H_r(dg) && \text{(def. of $\phi_\theta$, eq. (24))} \\
&= \frac{1}{m(x_0)} \int_{\{g \,:\, p(x_0|\theta_0) \le \zeta p(x_0|g b)\}} p(x_0 \mid g b)\, \Delta(g)^{-1} H_l(dg) && \text{(eq. (21))} \\
&= \frac{1}{m(x_0)} \int_{\{g \,:\, p(x_0|\theta_0) \le \zeta p(x_0|g^{-1} b)\}} p(x_0 \mid g^{-1} b)\, H_l(dg) && \text{(eq. (20))}
\end{aligned}
\]
But according to Lemma 1, $\mu$ is relatively invariant with multiplier $\Delta^{-1}$. Since the density family is invariant,
\[ p(x \mid \theta) = \Delta(g)^{-1}\, p(gx \mid g\theta) \quad \text{for all } x \in X,\ \theta \in \Theta,\ g \in G \]
i.e.
\[ p(x \mid g^{-1}\theta_0) = \Delta(g)^{-1}\, p(gx \mid \theta_0) \quad \text{for all } x \in X,\ \theta_0 \in \Theta,\ g \in G \]
Then,
\[
\begin{aligned}
\mathrm{PLR}(x_0, \zeta) &= \frac{1}{m(x_0)} \int_{\{g \,:\, p(x_0|\theta_0) \le \zeta p(g x_0|b)\Delta(g)^{-1}\}} \Delta(g)^{-1} p(g x_0 \mid b)\, H_l(dg) \\
&= \frac{1}{m(x_0)} \int_{\{g \,:\, p(x_0|\theta_0) \le \zeta p(g x_0|b)\Delta(g)^{-1}\}} p(g x_0 \mid b)\, H_r(dg) && \text{(eq. (21))} \\
&= \frac{1}{m(x_0)} \int_{\{g \,:\, p(x_0|\theta_0) \le \zeta p(g x_0|b)\Delta(g)^{-1}\}} p(g x_0 \mid b)\, \mu(\phi_a(dg)) && \text{(def. of $\mu$)}
\end{aligned}
\]
It can be noticed that the expression (26) depends neither on $a \in X$ nor on $b \in \Theta$. Choose now for simplicity $a = x_0$. Then, making the transformation $x = \phi_{x_0} g = g x_0$,
\[ \mathrm{PLR}(x_0, \zeta) = \frac{1}{m(x_0)} \int_{\{x \,:\, p(x_0|\theta_0) \le \zeta p(x|b)\, \Delta(\phi_{x_0}^{-1} x)^{-1}\}} p(x \mid b)\, \mu(dx) \]
By a similar computation we get the expression of the marginal density of $X$ evaluated at $x_0$:
\[ m(x_0) = \int p(x_0 \mid \theta)\, \Pi_r(d\theta) = \int p(x \mid b)\, \mu(dx) = 1 \]
The marginal density of $X$ is constant, in the same way the frequentist risk of an invariant estimator does not depend on $\theta$.
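The identity behind Theorem 2 can be checked numerically in the simplest invariant setting, not treated here but implicit in Dempster's original Gaussian example: the location model $X \sim N(\theta, 1)$ under the translation group, for which $\Delta \equiv 1$ and the right Haar prior is flat, so that the posterior is $N(x_0, 1)$. The Python sketch below (not part of the paper; the values of $x_0$, $\theta_0$, $\zeta$ are arbitrary) compares both sides of equation (25) by Monte Carlo with the closed form $2\Phi(d) - 1$, where $d^2 = (x_0 - \theta_0)^2 + 2\ln\zeta$.

```python
import math, random

random.seed(0)

def lik(x, t):
    # N(t, 1) density
    return math.exp(-0.5 * (x - t) ** 2) / math.sqrt(2 * math.pi)

# arbitrary illustration values; chosen so that (x0-theta0)^2 + 2 ln(zeta) >= 0
theta0, x0, zeta, n = 0.0, 1.7, 0.8, 200_000

# Bayesian side of (25): theta | x0 ~ N(x0, 1) under the flat right Haar prior
post = sum(lik(x0, theta0) <= zeta * lik(x0, x0 + random.gauss(0, 1))
           for _ in range(n)) / n

# frequentist side of (25): x ~ N(theta0, 1); Delta == 1 for translations
freq = sum(lik(x0, theta0) <= zeta * lik(theta0 + random.gauss(0, 1), theta0)
           for _ in range(n)) / n

# closed form: both sides equal 2*Phi(d) - 1 = erf(d / sqrt(2))
d = math.sqrt((x0 - theta0) ** 2 + 2 * math.log(zeta))
exact = math.erf(d / math.sqrt(2))

print(post, freq, exact)  # the three values agree up to Monte Carlo error
```

Both probabilities reduce to $\Pr(|Z| \le d)$ for a standard Gaussian $Z$, which is the content of the reconciliation in this case.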
So
\[ \mathrm{PLR}(x_0, \zeta) = \int_{\{x \,:\, p(x_0|\theta_0) \le \zeta p(x|b)\, \Delta(\phi_{x_0}^{-1} x)^{-1}\}} p(x \mid b)\, \mu(dx) \]
In order to get a form closer to a p-value, we choose from now on $b = \theta_0$ and note that for any $c \in X$,
\[ \Delta(\phi_{x_0}^{-1} x) = \frac{\Delta(\phi_c^{-1} x)}{\Delta(\phi_c^{-1} x_0)} \qquad (27) \]
because if we write
\[ g = \phi_{x_0}^{-1} x, \qquad g_1 = \phi_c^{-1} x, \qquad g_2 = \phi_c^{-1} x_0 \]
then on one side $g x_0 = x$ and on the other $g_1 (g_2^{-1} x_0) = g_1 c = x$, so that $g x_0 = (g_1 g_2^{-1}) x_0$, so $\phi_{x_0} g = \phi_{x_0}(g_1 g_2^{-1})$, so $g = g_1 g_2^{-1}$ ($\phi_a$ is one-to-one), so $\Delta(g) = \Delta(g_1)/\Delta(g_2)$ (property of $\Delta$).

Finally, for any $c \in X$,
\[ \mathrm{PLR}(x_0, \zeta) = \int_{\left\{x \,:\, \frac{p(x_0|\theta_0)}{\Delta(\phi_c^{-1} x_0)} \le \zeta \frac{p(x|\theta_0)}{\Delta(\phi_c^{-1} x)}\right\}} p(x \mid \theta_0)\, \mu(dx) \qquad (28) \]
It is also interesting to note that
\[ \phi_a^{-1} b = (\phi_b^{-1} a)^{-1} \qquad (29) \]
since $g = \phi_a^{-1} b \Rightarrow g a = b \Rightarrow a = g^{-1} b \Rightarrow g^{-1} = \phi_b^{-1} a$, so that in the same way we have
\[ \mathrm{PLR}(x_0, \zeta) = \int_{\{x \,:\, p(x_0|\theta_0)\Delta(\phi_{x_0}^{-1} c) \le \zeta p(x|\theta_0)\Delta(\phi_x^{-1} c)\}} p(x \mid \theta_0)\, \mu(dx) = \Pr\Big( p(x_0 \mid \theta_0)\, \Delta(\phi_{x_0}^{-1} c) \le \zeta\, p(x \mid \theta_0)\, \Delta(\phi_x^{-1} c) \,\Big|\, \theta_0 \Big) \]

Appendix 3: Proof of Theorem 1 and corollaries 1, 2

Theorem 1 is a corollary of Theorem 2, presented and proved in the previous appendix: Theorem 2 can be reexpressed more simply by assuming that the likelihood family and the induced Haar measures are absolutely continuous with respect to the Lebesgue measure.

Proof of Theorem 1. The proof only consists in reexpressing the domains of integration, because the integrands' expressions do not depend on the decomposition of the measures over other measures ($\mu$ or Lebesgue). In fact, only the domain of integration of the p-value needs to be reexpressed: the domain of integration of the PLR is a subset of $\Theta$, not of $X$, and therefore does not depend on the density over $X$ that is used.

If we note $p_\mu(\cdot \mid \theta)$ the density with respect to the induced Haar measure $\mu_r$ and $p(\cdot \mid \theta)$ the density with respect to the Lebesgue measure, we have by definition
\[ P(dx \mid \theta) = p_\mu(x \mid \theta)\, \mu_r(dx) = p_\mu(x \mid \theta)\, \pi_r(x)\, dx \qquad \text{and} \qquad P(dx \mid \theta) = p(x \mid \theta)\, dx \]
and so
\[ p_\mu(x \mid \theta) = \frac{p(x \mid \theta)}{\pi_r(x)} \]
On the other side, the modulus $\Delta$ can also be reexpressed as a function of the induced prior densities $\pi_l(x)$ and $\pi_r(x)$. From equations (29) and (21),
\[ \Delta(\phi_x^{-1} c) = \Delta(\phi_c^{-1} x)^{-1} = \frac{H_r(d\phi_c^{-1} x)}{H_l(d\phi_c^{-1} x)} = \frac{\mu_r(dx)}{\mu_l(dx)} = \frac{\pi_r(x)}{\pi_l(x)} \]
Combining these two results we get
\[ p_\mu(x \mid \theta)\, \Delta(\phi_x^{-1} c) = \frac{p(x \mid \theta)}{\pi_l(x)} \]

Proof of Corollary 1.
\[
\begin{aligned}
\mathrm{PLR}(x_0, \zeta) &= \Pr\big( p_{X|\theta_0}(x_0) \le \zeta\, p_{X|\theta}(x_0) \,\big|\, x_0 \big) \\
&= \Pr\big( p_{X|S(X)}(x_0 \mid S(x_0))\, p_{S(X)|\theta_0}(S(x_0)) \le \zeta\, p_{X|S(X)}(x_0 \mid S(x_0))\, p_{S(X)|\theta}(S(x_0)) \,\big|\, x_0 \big)
\end{aligned}
\]
because, since $S(x)$ is a function of $x$, $p_{X|\theta}(x) = p_{X,S(X)|\theta}(x, S(x))$, and since in addition $S(X)$ is a sufficient statistic of $X$,
\[ p_{X|\theta}(x) = p_{X|S(X),\theta}(x \mid S(x))\, p_{S(X)|\theta}(S(x)) = p_{X|S(X)}(x \mid S(x))\, p_{S(X)|\theta}(S(x)) \qquad (30) \]
Simplifying the densities which do not depend on $\theta$, provided $p_{X|S(X)}(x_0 \mid S(x_0)) > 0$,
\[
\begin{aligned}
\mathrm{PLR}(x_0, \zeta) &= \Pr\big( p_{S(X)|\theta_0}(S(x_0)) \le \zeta\, p_{S(X)|\theta}(S(x_0)) \,\big|\, S(x_0) \big) \\
&= \Pr\Big( p_{S(X)|\theta_0}(S(x_0))\, \big(\pi_l(S(x_0))\big)^{-1} \le \zeta\, p_{S(X)|\theta_0}(S(x))\, \big(\pi_l(S(x))\big)^{-1} \,\Big|\, \theta_0 \Big) \qquad \text{(Th. 1)}
\end{aligned}
\]

Proof of Corollary 2. First reexpress the PLR under the conditions of Theorem 1 using a cumulative distribution function. Note $T(x)$ the statistic
\[ T(x) = p_{S(X)|\theta_0}(S(x))\, \big(\pi_l(S(x))\big)^{-1} \]
Seen as a random variable, the dataset $x$ induces the random variable $T(X)$ in the same way the statistic $S(x)$ induced $S(X)$.
Note $F_{T(X)|\theta_0}$ the cumulative distribution function of $T(X)$ under the null hypothesis:
\[ F_{T(X)|\theta_0}(\zeta) = \Pr\big( T(x) \le \zeta \,\big|\, \theta_0 \big) \]
Starting from Theorem 1, the PLR can be reexpressed as
\[ \mathrm{PLR}(x_0, \zeta) = 1 - F_{T(X)|\theta_0}\big( \zeta^{-1} T(x_0) \big) \]
In particular, for a threshold $\zeta = 1$, one can directly notice that the PLR is equal to the p-value defined for the GLR by equation (2), but now associated with the test statistic $T(x)$. Also note that the frequentist test corresponding to the PLR is then given, for any threshold $\lambda > 0$, by
\[ \text{Reject } H_0 \text{ if } p_{S(X)|\theta_0}(S(x))\, \big(\pi_l(S(x))\big)^{-1} \le \lambda \]

Appendix 4: Proof of Remark 1

If there exists a joint measure over $\Theta_0 \times \Theta_1 \times X \mid H_0$ and if the events defined on the sets $\Theta_0 \times X \mid H_0$ and $\Theta_1 \mid H_0$ are independent, then
\[ P(d\theta_0, d\theta_1, x \mid H_0) = P(d\theta_0, x \mid H_0)\, P(d\theta_1 \mid H_0) \]
This can be reformulated using the previous standard notations:
\[ P(d\theta_0, d\theta_1, x \mid H_0) = \Pi_0(d\theta_0 \mid x)\, m_0(x)\, \Pi_1(d\theta_1) \]
Then, noting $\Pi_{01,0}(\cdot \mid x) = P(\cdot \mid H_0, x)$,
\[ \Pi_{01,0}(d\theta_0, d\theta_1 \mid x) = \frac{P(d\theta_0, d\theta_1, x \mid H_0)}{\int P(d\theta_0, d\theta_1, x \mid H_0)} = \frac{m_0(x)\, \Pi_0(d\theta_0 \mid x)\, \Pi_1(d\theta_1)}{m_0(x)} = \Pi_0(d\theta_0 \mid x)\, \Pi_1(d\theta_1) \]

Appendix 5: Proof of Proposition 1

Recall that the set $R^*(x)$ defined in equation (16) is the LR set that rejects $H_0$, and upon which $\mathrm{PLR}_1$ is defined in equation (14). Call $\mathrm{PFA}_B(R^*, x)$ and $\mathrm{PD}_B(R^*, x)$ the associated integrals defined in equations (17) and (18). Call $R(x) \subset \Theta_0 \times \Theta_1$ any other set and $\mathrm{PFA}_B(R, x)$ and $\mathrm{PD}_B(R, x)$ its associated integrals. The goal is to show that $\mathrm{PFA}_B(R, x) \le \mathrm{PFA}_B(R^*, x)$ implies $\mathrm{PD}_B(R, x) \le \mathrm{PD}_B(R^*, x)$ for any test set $R$. The fact that $\mathrm{PD}_B(R, x) \le \mathrm{PD}_B(\bar{R}^*, x)$ implies $\mathrm{PFA}_B(R, x) \le \mathrm{PFA}_B(\bar{R}^*, x)$ is shown in a reciprocal way.
One can check that the following inequality holds for all $x \in X$, all $\theta_0 \in \Theta_0$ and all $\theta_1 \in \Theta_1$:
\[ \big( I_{R^*(x)}(\theta_0, \theta_1) - I_{R(x)}(\theta_0, \theta_1) \big)\, \big( p(x \mid \theta_0) - \zeta\, p(x \mid \theta_1) \big) \le 0 \]
Since the inequality is true for all $x$, $\theta_0$ and $\theta_1$, we can multiply the left-hand side by any positive term and integrate over $\Theta_0 \times \Theta_1$. This implies in particular:
\[ \int_{\Theta_0} \int_{\Theta_1} \frac{\Pi_0(d\theta_0)}{m_0(x)} \frac{\Pi_1(d\theta_1)}{m_1(x)} \big( I_{R^*(x)}(\theta_0, \theta_1) - I_{R(x)}(\theta_0, \theta_1) \big)\, \big( p(x \mid \theta_0) - \zeta\, p(x \mid \theta_1) \big) \le 0 \]
But since $\Pi_i(d\theta_i \mid x) = \Pi_i(d\theta_i)\, p(x \mid \theta_i)\, m_i(x)^{-1}$ for $i = 0, 1$, this implies
\[ \frac{1}{m_1(x)} \int_{\Theta_0} \int_{\Theta_1} \Pi_0(d\theta_0 \mid x)\, \Pi_1(d\theta_1)\, \big( I_{R^*(x)} - I_{R(x)} \big) - \frac{\zeta}{m_0(x)} \int_{\Theta_0} \int_{\Theta_1} \Pi_1(d\theta_1 \mid x)\, \Pi_0(d\theta_0)\, \big( I_{R^*(x)} - I_{R(x)} \big) \le 0 \]
where we recognize $\mathrm{PFA}_B$ and $\mathrm{PD}_B$ as defined in equations (17) and (18):
\[ \frac{\mathrm{PFA}_B(R^*(x)) - \mathrm{PFA}_B(R(x))}{m_1(x)} - \zeta\, \frac{\mathrm{PD}_B(R^*(x)) - \mathrm{PD}_B(R(x))}{m_0(x)} \le 0 \]
Therefore we finally have
\[ \zeta\, \frac{\mathrm{PD}_B(R(x)) - \mathrm{PD}_B(R^*(x))}{m_0(x)} \le \frac{\mathrm{PFA}_B(R(x)) - \mathrm{PFA}_B(R^*(x))}{m_1(x)} \]
from which we conclude the final implication
\[ \mathrm{PFA}_B(R(x)) \le \mathrm{PFA}_B(R^*(x)) \;\Rightarrow\; \mathrm{PD}_B(R(x)) \le \mathrm{PD}_B(R^*(x)) \]
The fact that $\mathrm{PD}_B(R, x) \le \mathrm{PD}_B(\bar{R}^*, x)$ implies $\mathrm{PFA}_B(R, x) \le \mathrm{PFA}_B(\bar{R}^*, x)$ is shown in a reciprocal way.
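The implication just proved can be illustrated on a toy discrete model (not from the paper): finite grids standing in for $\Theta_0$ and $\Theta_1$, uniform priors, and a Gaussian likelihood. The Python sketch below builds the LR set $R^*$, draws random competitor sets $R$, and verifies that none with $\mathrm{PFA}_B(R) \le \mathrm{PFA}_B(R^*)$ achieves $\mathrm{PD}_B(R) > \mathrm{PD}_B(R^*)$; the grid values, $x$ and $\zeta$ are arbitrary.

```python
import math, random

random.seed(1)

def lik(x, t):
    # unnormalized N(t, 1) likelihood; the constant cancels in Bayes' rule
    return math.exp(-0.5 * (x - t) ** 2)

x, zeta = 0.3, 1.2
T0 = [-0.5, 0.0, 0.5]        # hypothetical finite grid for Theta_0
T1 = [0.8, 1.2, 1.6, 2.0]    # hypothetical finite grid for Theta_1

# uniform priors Pi_0, Pi_1 on the grids; posteriors by Bayes' rule
m0 = sum(lik(x, t) for t in T0) / len(T0)
m1 = sum(lik(x, t) for t in T1) / len(T1)
post0 = {t: lik(x, t) / (len(T0) * m0) for t in T0}
post1 = {t: lik(x, t) / (len(T1) * m1) for t in T1}

def pfa(R):  # eq.-(17)-style integral of Pi_0(dtheta_0 | x) Pi_1(dtheta_1) over R
    return sum(post0[t0] / len(T1) for (t0, t1) in R)

def pd(R):   # eq.-(18)-style integral of Pi_1(dtheta_1 | x) Pi_0(dtheta_0) over R
    return sum(post1[t1] / len(T0) for (t0, t1) in R)

pairs = [(t0, t1) for t0 in T0 for t1 in T1]
Rstar = {p for p in pairs if lik(x, p[0]) <= zeta * lik(x, p[1])}  # the LR set

# no competitor with smaller-or-equal Bayesian PFA has a larger Bayesian PD
violations = 0
for _ in range(2000):
    R = {p for p in pairs if random.random() < 0.5}
    if pfa(R) <= pfa(Rstar) and pd(R) > pd(Rstar):
        violations += 1
print(violations)  # expect 0, by the implication proved above
```

This is only a sanity check of the inequality chain, not a substitute for the proof, which holds for arbitrary measurable $R(x)$.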