Integrated organic inference (IOI): A reconciliation of statistical paradigms

Russell J. Bowater
Independent researcher, Sartre 47, Acatlima, Huajuapan de León, Oaxaca, C.P. 69004, Mexico. Email address: as given on arXiv.org. Twitter profile: @naked_statist. Personal website: sites.google.com/site/bowaterfospage

Abstract: It is recognised that the Bayesian approach to inference cannot adequately cope with all the types of pre-data beliefs about population quantities of interest that are commonly held in practice. In particular, it generally encounters difficulty when there is a lack of such beliefs over some or all the parameters of a model, or within certain partitions of the parameter space concerned. To address this issue, a fairly comprehensive theory of inference is put forward called integrated organic inference that is based on a fusion of Fisherian and Bayesian reasoning. Depending on the pre-data knowledge that is held about any given model parameter, inferences are made about the parameter conditional on all other parameters using one of three methods of inference, namely organic fiducial inference, bispatial inference and Bayesian inference. The full conditional post-data densities that result from doing this are then combined using a framework that allows a joint post-data density for all the parameters to be sensibly formed without requiring these full conditional densities to be compatible. Various examples of the application of this theory are presented. Finally, the theory is defended against possible criticisms partially in terms of what was previously defined as generalised subjective probability.

Keywords: Bayesian; Bispatial inference; Fisherian; Gibbs sampler; Incompatible conditional densities; Objective and subjective probability; Organic fiducial inference; P values.

1. Introduction

The general problem of making inferences about a population on the basis of a small random sample from that population has long been of great interest to scientific researchers. This problem is often addressed by making the assumption that, in the population, the distribution of the measurements being considered is a member of a given parametric family of distributions. Although this assumption can be criticised, we will choose in this paper to examine problems of inference that are constrained by this assumption. Our justification for this is that, first, this class of problems has substantial importance in its own right, and second, resolving such problems can be viewed as a convenient first step towards tackling cases in which making such an assumption is not appropriate. Therefore, let us suppose that the data set to be analysed $x = \{x_1, x_2, \ldots, x_n\}$ was drawn from a joint density or mass function $g(x \mid \theta)$ that depends on a set of parameters $\theta = \{\theta_i : i = 1, 2, \ldots, k\}$, where each $\theta_i$ is a one-dimensional variable.

A way of classifying the nature of the problem that is encountered in trying to make inferences about the set of parameters $\theta$ is to do so on the basis of the type of knowledge that was held about these parameters before the data were observed. In this respect, it can be argued that the three most common types of pre-data opinion that, in practice, are naturally held about any given model parameter $\theta_j$ conditional on all other parameters $\theta_{-j} = \{\theta_1, \ldots, \theta_{j-1}, \theta_{j+1}, \ldots, \theta_k\}$ being known are as follows:

1) Nothing or very little is known about the parameter.
2) It is felt that the parameter may well be close to a specific value, which may for example indicate the absence of a treatment effect, or the lack of a correlation between variables, but apart from this nothing or very little is known about the parameter.
Some examples of where it would be reasonable to hold this type of pre-data opinion were given in Bowater (2019b).
3) We know enough about the parameter for our opinion about it to be satisfactorily represented by a probability density or mass function over the parameter.

For the reason just given, each of these types of pre-data opinion about the parameter $\theta_j$ will therefore be treated as corresponding to a distinct problem of inference. Nevertheless, since our pre-data opinions about each of the parameters in any given set of parameters $\theta$ may well fall into different categories among the three being considered, it may be necessary to address two or all three of these types of problem in any particular scenario.

These problems are the three problems of inference that will be of principal interest in what follows. More specifically, the aim of the present paper will be to show how these problems can be dealt with in a harmonious manner by using an approach to inference based on a fusion of Fisherian (as attributed to R. A. Fisher) and Bayesian reasoning. Of course, given the obvious incompatibilities that exist between, and to some extent even within, these two schools of reasoning, we will need to be given some liberty in how each of these approaches to inference is interpreted.

In this respect, although the theory that will be outlined is based on a type of probability that is inherently subjective, and therefore not frequentist as in the Fisherian paradigm, it is not the same type of probability that is commonly regarded as underlying subjective Bayesian theory. Instead, it is a generalised form of subjective probability that effectively allows probability distributions to be distinguished according to where they are on a scale that goes from them being virtually objective to them being extremely subjective.
This type of probability was referred to as generalised subjective probability in Bowater (2018b). Furthermore, the theory to be presented relies on various concepts that are heavily used by frequentist statisticians, e.g. sufficient and ancillary statistics, point estimators and their distributions, the classical notion of significance, and also one very important idea that during his own lifetime was chiefly advocated by Fisher himself, namely the fiducial argument. We are not suggesting, though, that the proposed methodology should be judged positively simply because it represents a compromise between competing schools of inference, rather we recommend, quite naturally, that it should be evaluated on the basis of its effectiveness in dealing with the particular inferential challenge that has been set out.

To give a little more detail, each of the three aforementioned problems of inference will be addressed using a method that is specific to the problem concerned, and although this results in the use of three methods that are of a clearly different nature, these methods are nevertheless compatible with the overall framework of inference that will be put forward. In particular, the first type of problem will be tackled using what, in Bowater (2019a), was called organic fiducial inference. On the other hand, the second problem will be addressed using what, in Bowater (2019b), was called bispatial inference. Finally, the third problem will be dealt with using Bayesian inference. The overall framework just referred to provides a way of coordinating these distinct methods of inference so that it is possible to simultaneously make inferences about all of the parameters in the model.

Let us now briefly describe the structure of the paper.
In the next five sections, we will present summaries of the fundamental concepts and methods that form the basis of the general theory in question, which will be called integrated organic inference (IOI). In particular, in the next two sections we will summarise the theory of generalised subjective probability and the overall framework of integrated organic inference. Furthermore, after clarifying in Section 2.3 the interpretation that will be adopted in this paper of the Bayesian approach to inference, concise accounts of the methods of organic fiducial inference and bispatial inference will be given in Sections 2.4 and 2.5. Various examples of the application of integrated organic inference will then be outlined in detail in Sections 3.1 to 3.5. In the final section of the paper (Section 4), a discussion of this theory of inference will be presented in the form of answers to questions that would be expected to naturally arise about the theory when it is first evaluated.

The theory will be referred to as integrated organic inference (IOI) because it integrates what are often considered to be conflicting approaches to inference into an overall framework that relies, in general, on what can be viewed as being an organic simulation algorithm. Furthermore, the type of inferences that this theory facilitates may, depending on the circumstances, be regarded as being objective or very subjective, but are nevertheless always organic, in the sense that they are intended to be only really understood by living subjects, e.g. humans, rather than primitive robots.

2. Fundamental concepts and methods

2.1. Generalised subjective probability

Overview

Under this definition of probability, a probability distribution is defined by its (cumulative) distribution function and the strength of this function relative to other distribution functions of interest.
The distribution function is defined as having the standard mathematical properties of such a function. Let us now briefly outline the notion of the strength of a distribution function and some of the concepts that underlie this notion. Further details and examples of these concepts and of the notion of strength itself can be found in Bowater (2018b).

Similarity

As in the aforementioned paper, let $S(A, B)$ denote the similarity that a given individual feels there is between his confidence (or conviction) that an event $A$ will occur and his confidence (or conviction) that an event $B$ will occur. For any three events $A$, $B$ and $C$, it is assumed that an individual is capable of deciding whether or not the orderings $S(A, B) > S(A, C)$ and $S(A, B) < S(A, C)$ are applicable. The notation $S(A, B) = S(A, C)$ is used to represent the case where neither of these orderings applies.

Reference set of events

Let $O = \{O_1, O_2, \ldots, O_m\}$ be a finite ordered set of $m$ events that are mutually exclusive and exhaustive. Also, let us assume that if $O(1)$, $O(2)$ and $O(3)$ are three subsets of the set $O$ that contain the same number of events, then the following is true:

$$S\Bigg( \bigcup_{O_j \in O(1)} O_j ,\ \bigcup_{O_j \in O(2)} O_j \Bigg) = S\Bigg( \bigcup_{O_j \in O(1)} O_j ,\ \bigcup_{O_j \in O(3)} O_j \Bigg)$$

for all possible choices of the subsets $O(1)$, $O(2)$ and $O(3)$. Under this assumption, a reference set of events $R$ can be defined as follows:

$$R = \{ R(\lambda) : \lambda \in \Lambda \} \qquad (1)$$

where $R(\lambda) = O_1 \cup O_2 \cup \cdots \cup O_{\lambda m}$ and $\Lambda = \{1/m, 2/m, \ldots, (m-1)/m\}$.

For example, it should be clear that any given individual could easily decide that the set of all the outcomes of randomly drawing a ball out of an urn containing $m$ distinctly labelled balls could be the set $O$.

Equation (1) gives the definition of a reference set of events assuming that this set is discrete. For the definition of a continuous reference set of events, see Bowater (2018b).
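As a small illustration of equation (1), the sketch below builds the discrete reference set for the urn example. It assumes that each event $O_i$ is represented by the ball label $i$, so that $R(\lambda)$ becomes the set of labels $\{1, \ldots, \lambda m\}$. The function name `reference_set` and this representation are illustrative choices only, not part of the theory.

```python
from fractions import Fraction


def reference_set(m):
    """Build the discrete reference set R = {R(lam) : lam in Lambda} of equation (1).

    Assumption for illustration: event O_i is "ball i is drawn" from an urn of
    m distinctly labelled balls, so R(lam) is the set of labels {1, ..., lam*m}.
    """
    if m < 2:
        raise ValueError("m must be at least 2 so that Lambda is non-empty")
    # Lambda = {1/m, 2/m, ..., (m-1)/m}; exact fractions avoid rounding issues.
    return {Fraction(i, m): frozenset(range(1, i + 1)) for i in range(1, m)}


R = reference_set(4)
for lam in sorted(R):
    print(lam, sorted(R[lam]))
```

With $m = 4$, for instance, $R(1/2)$ is the event that ball 1 or ball 2 is drawn.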
External strength of a distribution function

Let two continuous random variables $X$ and $Y$ of possibly different dimensions have elicited or given distribution functions $F_X(x)$ and $G_Y(y)$ respectively. Also, we will specify the set of events $F[a]$ as follows:

$$F[a] = \left\{ \{X \in \mathcal{A}\} : \int_{\mathcal{A}} f_X(x)\, dx = a \right\} \quad \text{for } a \in [0, 1]$$

where $\{X \in \mathcal{A}\}$ is the event that $X$ lies in the set $\mathcal{A}$ and $f_X(x)$ is the density function corresponding to $F_X(x)$, and we will specify the set $G[a]$ in the same way but with respect to the variable $Y$ instead of the variable $X$ and the distribution function $G_Y(y)$ instead of $F_X(x)$.

For a given discrete or continuous reference set of events $R$, we will now define the function $F_X(x)$ as being externally stronger than the function $G_Y(y)$ at the resolution $\lambda$, where $\lambda \in \Lambda$, if

$$\min_{A \in F[\lambda]} S(A, R(\lambda)) > \max_{A \in G[\lambda]} S(A, R(\lambda))$$

An interpretation that could be given to this definition is that, if a particular individual judges a function $F_X(x)$ as being externally stronger than a function $G_Y(y)$ then, relative to the reference event $R(\lambda)$, the function $F_X(x)$ could be regarded as representing his uncertainty about the variable $X$ better than $G_Y(y)$ represents his uncertainty about the variable $Y$.

A definition of the internal rather than the external strength of a distribution function, and other definitions of the external strength of a distribution function that are applicable to discrete distribution functions and to distribution functions derived by formal systems of reasoning, e.g. derived by applying the standard rules of probability, can be found in Bowater (2018b).

2.2. Overall framework of the theory

Brief outline

The general aim of the theory to be presented is to construct a joint density/mass function of all the model parameters $\theta$ that accurately represents what is known about these parameters after the data have been observed, i.e. what can be referred to as a post-data density function of these parameters. Let this density function be denoted as $p(\theta \mid x)$. To be more specific, this will be done by first determining each of the density functions in the complete set of full conditional post-data density functions of the parameters $\theta$, i.e. the set of density functions:

$$p(\theta_j \mid \theta_{-j}, x) \quad \text{for } j = 1, 2, \ldots, k \qquad (2)$$

One of the key features of the approach that will be developed is that it allows any given one of these density functions to be constructed using whichever one of the three distinct methods of inference mentioned in the Introduction is regarded as being the most appropriate for the task.

In order to remove a potentially important source of conflict between the three methods of inference being referred to, the quite natural assumption will be made that during the process of determining each of the full conditional densities in equation (2), the set of conditioning parameters $\theta_{-j}$ are always treated as being known constants. This means that usually it will not be permitted that any one of these conditional densities is determined by first constructing a joint post-data density of the parameter $\theta_j$ and some or all of the parameters in the set $\theta_{-j}$, and then conditioning this joint density on the parameters $\theta_{-j}$.

However, making the assumption that has just been made does not generally eliminate the possibility that the set of full conditional densities in equation (2) may be determined using the methods in question in a way that implies that they are not consistent with any joint density function of the parameters concerned, i.e. these conditional densities may be incompatible among themselves.
On the other hand, if the full conditional densities under discussion are indeed compatible then, since, under a mild requirement, a joint density function is uniquely defined by its full conditional densities, these densities will, in general, define a unique joint post-data density function for the parameters $\theta$, i.e. a unique density $p(\theta \mid x)$.

Addressing the issue of incompatible full conditional densities

As discussed in Bowater (2018a), to check whether full conditional densities of the overall type being considered are compatible, it may be possible to use a simple analytical method. In particular, we begin to implement this method by proposing an analytical expression for the joint density function of the set of parameters $\theta$, then we determine the full conditional density functions for this joint density, and finally we see whether these conditional densities are equivalent to the full conditional densities in equation (2). If this equivalence is achieved, then these latter conditional densities clearly must be compatible. This method has the advantage that generally, in such circumstances, it directly gives us an analytical expression for the unique joint post-data density $p(\theta \mid x)$, i.e. under a mild condition, it will be the originally proposed joint density for the parameters $\theta$.

By contrast, in situations that will undoubtedly often arise where it is not easy to establish whether or not the full conditional densities in equation (2) are compatible, let us imagine that we make the pessimistic assumption that they are in fact incompatible. Nevertheless, even though these full conditional densities could be incompatible, they could be reasonably assumed to represent the best information that is available for constructing a joint post-data density function of the parameters $\theta$, or in other words, for constructing the most suitable density $p(\theta \mid x)$.
Therefore, it would seem appropriate to try to find the joint density of the parameters $\theta$ that has full conditional densities that most closely approximate those given in equation (2).

To achieve this objective, let us focus attention on the use of a method that was advocated in a similar context in Bowater (2018a), in particular the method that simply consists in making the assumption that the joint density of the parameters $\theta$ that most closely corresponds to the full conditional densities in equation (2) is equal to the limiting density function of a Gibbs sampling algorithm (Geman and Geman 1984, Gelfand and Smith 1990) that is based on these conditional densities with some given fixed or random scanning order of the parameters in question. Under a fixed scanning order of the model parameters, we will define a single transition of this type of algorithm as being one that results from randomly drawing a value (only once) from each of the full conditional densities in equation (2) according to some given fixed ordering of these densities, replacing each time the previous value of the parameter concerned by the value that is generated. Let us clarify that it is being assumed that only the set of values for the parameters $\theta$ that are obtained on completing a transition of this kind are recorded as being a newly generated sample, i.e. the intermediate sets of parameter values that are used in the process of making such a transition do not form part of the output of the algorithm.

To measure how close the full conditional densities of the limiting density function of the general type of Gibbs sampler being presently considered are to the full conditional densities in equation (2), we can make use of a method that, in relation to its use in a similar context, was discussed in Bowater (2018a).
The reasoning that underlies this method can be easily appreciated by first assessing the practical viability of another specific procedure for verifying the compatibility of the conditional densities in equation (2). In particular, on the basis of the results in Chen and Ip (2015), it can be deduced that the conditional densities in this equation will be compatible if, under a fixed scanning order of the parameters $\theta$ that is implemented in the way that was just specified, a Gibbs sampling algorithm based on these full conditional densities satisfies the following three conditions:

A) It is positive recurrent for all possible fixed scanning orders. This condition ensures that the sampling algorithm has at least one stationary distribution for any given fixed scanning order.
B) It is irreducible and aperiodic for all possible fixed scanning orders. Together with condition A, this condition ensures that the sampling algorithm has a limiting distribution for any given fixed scanning order.
C) Given conditions A and B hold, the limiting density function of the sampling algorithm needs to be the same over all possible fixed scanning orders.

Moreover, when these conditions hold, the joint post-data density function of the parameters $\theta$ that is directly defined by the full conditional densities in equation (2) will be the unique limiting density function of these parameters referred to in condition C. The sufficiency of the conditions A to C just listed for establishing the compatibility of any given set of full conditional densities was proved for a special case in Chen and Ip (2015), which is a proof that can be easily extended to the more general case that is currently of interest.
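The fixed scanning order transition and the role of condition C can be sketched in code. The example below is a minimal illustration under stated assumptions, not an implementation from the paper. The pair of normal full conditionals, $\theta_0 \mid \theta_1 \sim N(a\theta_1, 1)$ and $\theta_1 \mid \theta_0 \sim N(b\theta_0, 1)$, is a hypothetical toy case that is compatible when $a = b$ (with $|a| < 1$) and incompatible when $a \neq b$, and all function names are invented for the illustration. The quantity compared across the two scanning orders is a post-burn-in average of $\theta_0 \theta_1$.

```python
import random


def gibbs_fixed_scan(conditionals, order, theta0, n_samples, rng):
    """Gibbs sampler with a fixed scanning order.

    conditionals[j](theta, rng) returns a draw of theta[j] given the current
    values of the other components. One transition draws once from every
    conditional, in the sequence given by `order`; only the state reached at
    the end of each transition is recorded.
    """
    theta = list(theta0)
    samples = []
    for _ in range(n_samples):
        for j in order:
            theta[j] = conditionals[j](theta, rng)
        samples.append(tuple(theta))
    return samples


def toy_conditionals(a, b):
    """Hypothetical pair: theta0 | theta1 ~ N(a*theta1, 1), theta1 | theta0 ~ N(b*theta0, 1)."""
    return [
        lambda th, rng: rng.gauss(a * th[1], 1.0),
        lambda th, rng: rng.gauss(b * th[0], 1.0),
    ]


def mean_product(samples, burn_in):
    """Average of theta0*theta1 over the samples kept after the burn-in phase."""
    kept = samples[burn_in:]
    return sum(t0 * t1 for t0, t1 in kept) / len(kept)


def order_gap(a, b, seed, n_samples=30000, burn_in=1000):
    """Absolute difference between the two scanning orders in the estimate of E[theta0*theta1]."""
    conds = toy_conditionals(a, b)
    estimates = []
    for idx, order in enumerate([(0, 1), (1, 0)]):
        rng = random.Random(seed + idx)  # independent random streams per order
        draws = gibbs_fixed_scan(conds, order, (0.0, 0.0), n_samples, rng)
        estimates.append(mean_product(draws, burn_in))
    return abs(estimates[0] - estimates[1])


gap_compatible = order_gap(0.5, 0.5, seed=1)    # a == b: compatible conditionals
gap_incompatible = order_gap(0.8, 0.2, seed=1)  # a != b: incompatible conditionals
print(gap_compatible, gap_incompatible)
```

For this linear Gaussian toy case the limiting value of $E[\theta_0 \theta_1]$ can be worked out by hand: with $a = 0.8$ and $b = 0.2$ it is roughly 0.34 under the order (0, 1) and roughly 0.85 under the order (1, 0), whereas with $a = b$ the two orders agree. The first gap should therefore be close to zero up to Monte Carlo error and the second clearly should not.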
Nevertheless, even if, with respect to the specific type of full conditional densities referred to in equation (2), we can establish that condition A and condition B are satisfied, it will usually be impossible, in practice, to determine whether condition C is satisfied. From an alternative perspective, if we assume that the full conditional densities in this equation are in fact incompatible, then if conditions A and B are satisfied, it would appear to be useful (with reference to condition C) to analyse how the limiting density function of a Gibbs sampler based on these full conditional densities varies over a reasonable number of very distinct fixed scanning orders of the sampler. If within such an analysis, the variation of this limiting density with respect to the scanning order of the parameters $\theta$ can be classified as small, negligible or undetectable, then this should give us reassurance that the full conditional densities in equation (2) are, respectively according to such classifications, close, very close or at least very close, to the full conditional densities of the limiting density of a Gibbs sampler of the type that is of main interest, i.e. a Gibbs sampler that is based on any given fixed or random scanning order of the parameters concerned.

In trying to choose the scanning order of this type of Gibbs sampler such that it has a limiting density function that corresponds to a set of full conditional densities that most accurately approximate the density functions in equation (2), a good general choice would arguably be what will be referred to as a uniform random scanning order.
Under this type of scanning order, a transition of the Gibbs sampling algorithm in question will be defined as being one that results from generating a value from one of the full conditional densities in equation (2) that is chosen at random, with the same probability of $1/k$ being given to any one of these densities being selected, and then treating the generated value as the updated value of the parameter concerned.

However, it can be easily shown that independent of whether or not the set of full conditional densities in equation (2) are compatible, the last full conditional density in this set that is sampled from in completing a given fixed scanning order will be one of the full conditional densities of the limiting density function of the type of Gibbs sampler being discussed that uses such a fixed scanning order. This therefore provides a reason for perhaps deciding, in certain applications, that the limiting density of a Gibbs sampler of the general type in question most satisfactorily corresponds to the full conditional densities in equation (2) when a given fixed rather than a uniform random scanning order of the parameters $\theta$ is used.

Conventional simulation issues

As with all Gibbs samplers it is important to verify in implementing strategies of the type just mentioned that the sampler concerned has converged to its limiting density function within the restricted number of transitions of the sampler that can be observed in practice. To do this, we can make use of standard methods for analysing the convergence of Monte Carlo Markov chains described in, for example, Gelman and Rubin (1992), Cowles and Carlin (1996) and Brooks and Roberts (1998).
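A uniform random scanning order and a basic convergence check can be sketched as follows. This is a rough illustration under stated assumptions rather than the procedure used in the paper. The two normal full conditionals are a hypothetical incompatible toy pair, the function names are invented for the example, and the diagnostic is a plain potential scale reduction factor in the style of Gelman and Rubin (1992), computed without the degrees-of-freedom correction and for one monitored component only.

```python
import random
import statistics


def gibbs_random_scan(conditionals, theta0, n_transitions, rng):
    """Gibbs sampler with a uniform random scanning order.

    Each transition picks one of the k full conditionals with probability 1/k,
    draws from it, and records the resulting state.
    """
    k = len(conditionals)
    theta = list(theta0)
    samples = []
    for _ in range(n_transitions):
        j = rng.randrange(k)
        theta[j] = conditionals[j](theta, rng)
        samples.append(tuple(theta))
    return samples


def potential_scale_reduction(chains):
    """Basic R-hat for scalar chains of equal length (no d.f. correction)."""
    n = len(chains[0])
    chain_means = [statistics.mean(c) for c in chains]
    within = statistics.mean([statistics.variance(c) for c in chains])
    between_over_n = statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


# Hypothetical incompatible conditionals:
# theta0 | theta1 ~ N(0.8*theta1, 1) and theta1 | theta0 ~ N(0.2*theta0, 1).
conditionals = [
    lambda th, rng: rng.gauss(0.8 * th[1], 1.0),
    lambda th, rng: rng.gauss(0.2 * th[0], 1.0),
]

# Several chains started from deliberately dispersed points.
starts = [(-10.0, -10.0), (10.0, 10.0), (-10.0, 10.0), (10.0, -10.0)]
burn_in = 500
chains = []
for seed, start in enumerate(starts):
    draws = gibbs_random_scan(conditionals, start, 5000, random.Random(seed))
    chains.append([th[0] for th in draws[burn_in:]])  # monitor theta0 only

r_hat = potential_scale_reduction(chains)
print(round(r_hat, 3))
```

A value of the factor close to 1 is the usual informal sign that the chains have mixed. It does not by itself show that a limiting density exists.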
However, the use of such convergence diagnostics may be considered to be slightly more important in the case of present interest in which the full conditional densities on which the Gibbs sampler is based could be incompatible, since, compared to the case where these densities are known to be compatible, there is likely to be, in practice, a little more concern that the Gibbs sampler may not actually have a limiting density function, even though in reality the genuine risk of this may still be extremely low.

A notable advantage of the general method for finding a suitable joint post-data density for the parameters $\theta$ that has just been outlined is that it can directly achieve what is often the main goal of a standard application of the Gibbs sampler, namely that of obtaining good approximations to the expected values of functions of the parameters of a model over the post-data or posterior density for these parameters that is of interest, i.e. expected values of the following type:

$$\mathrm{E}[h(\theta) \mid x] = \int_{\mathbb{R}^k} h(\theta)\, p(\theta \mid x)\, d\theta$$

where $p(\theta \mid x)$ is a given post-data density function of the parameters $\theta$, while $h(\theta)$ is any given function of these parameters. To be more specific, this kind of expected value may, of course, be approximated using the Monte Carlo estimator:

$$\frac{1}{N - b} \sum_{i = b + 1}^{N} h\big(\theta_1^{(i)}, \theta_2^{(i)}, \ldots, \theta_k^{(i)}\big)$$

where $\theta_1^{(i)}, \theta_2^{(i)}, \ldots, \theta_k^{(i)}$ is the $i$th sample of parameter values among the $N$ samples generated by the sampler in total, and $b$ is the number of initial samples that are classified as belonging to the burn-in phase of the sampler.

2.3. Bayesian inference

As was in effect done by Bayes in his famous paper (Bayes 1763), it will be assumed that Bayesian inference depends on three key concepts. First, Bayes' theorem as a purely mathematical expression.
Second, the justification of the application of this theorem to well-understood physical experiments, e.g. random spins of a wheel or random draws of a ball from an urn of balls. Finally, something which will be referred to as Bayes' analogy, which is the type of analogy that can be made between the uncertainty that surrounds the outcomes of the kind of physical experiments just mentioned to which Bayes' theorem can be very naturally applied, and the uncertainty that surrounds what are the true values of any unknown real-world quantities that are of interest.

By using this latter concept, we can justify the use of Bayesian inference in a much wider range of applications than is allowed by only using the first two concepts. However, depending on the type of application, the Bayes' analogy may be a good analogy or a poor analogy, which is something that needs to be taken into account when assessing the adequacy of any given application of the Bayesian method.

In keeping with the notation defined in the Introduction, the post-data or posterior density function of the parameter $\theta_j$ given all other model parameters $\theta_{-j}$ can be expressed according to Bayes' theorem as follows:

$$p(\theta_j \mid \theta_{-j}, x) = C_0\, g(x \mid \theta)\, p(\theta_j \mid \theta_{-j})$$

where $p(\theta_j \mid \theta_{-j})$ is the pre-data or prior density function of the parameter $\theta_j$ given the parameters $\theta_{-j}$, while $C_0$ is a normalising constant.

In this paper, we will exclude from consideration two methods of inference that are often referred to as 'objective' forms of Bayesian inference. The first of these methods consists in always specifying the prior density $p(\theta_j \mid \theta_{-j})$ as being a uniform or flat density function over all values of $\theta_j$.
This implies, though, that the Bayes' analogy must be broken due to this prior density being improper and/or due to the posterior density of any given population quantity of interest $h(\theta_j)$ conditional on the parameters $\theta_{-j}$ possessing, in general, the property of being dependent on the parameterisation of the sampling model, which of course is a very undesirable property for this posterior density to have. On the other hand, the second type of method entails specifying the prior density $p(\theta_j \mid \theta_{-j})$ such that it depends on the sampling model, i.e. allowing what is known about the parameter $\theta_j$ to depend on how we intend to collect more information about this parameter; however, doing this clearly again breaks the Bayes' analogy. A famous example of a type of prior density that is specified in this way is a prior density that is derived by applying Jeffreys' rule, see Jeffreys (1961), although many other prior densities of this kind have been proposed, see for example, Kass and Wasserman (1996).

To conclude, it can be strongly argued that, due to the Bayes' analogy being clearly broken, the application of either of the two methods of inference that have just been mentioned should not really be regarded as being an application of the Bayesian approach to inference at all.

2.4. Organic fiducial inference

We will now outline some of the key concepts that underlie the theory of organic fiducial inference. Descriptions of other important concepts on which this theory is based, along with further details about the concepts that will be outlined here and about the overall theory itself, can be found in Bowater (2019a). Throughout this section, it will be assumed that the values of the parameters in the set $\theta_{-j}$ are known.
Fiducial statistics

A fiducial statistic $Q(x)$ will be defined as being a univariate statistic of the sample $x$ that can be regarded as efficiently summarising the information that is contained in this sample about the only unknown parameter $\theta_j$, given the values of other statistics that do not provide any information about this parameter, i.e. ancillary statistics. If, in any given case, there exists a univariate sufficient statistic for $\theta_j$, then this would naturally be chosen to be the fiducial statistic for that case. In other cases, it may well make good sense to choose this statistic $Q(x)$ to be the maximum likelihood estimator of $\theta_j$.

For ease of presentation, we will assume, in what follows, that the choice of the fiducial statistic can be justified without reference to any particular ancillary statistics.

Data generating algorithm

Independent of the way in which the data set $x$ was actually generated, it will be assumed that this data set was generated by the following algorithm:

1) Generate a value $\gamma$ for a continuous one-dimensional random variable $\Gamma$, which has a density function $\pi_0(\gamma)$ that does not depend on the parameter $\theta_j$.
2) Determine a value $q(x)$ for the fiducial statistic $Q(x)$ by setting $\Gamma$ equal to $\gamma$ and $Q(x)$ equal to $q(x)$ in the following expression for the statistic $Q(x)$, which effectively should define the way in which this statistic is distributed:

$$Q(x) = \varphi(\Gamma, \theta_j) \qquad (3)$$

where the function $\varphi(\Gamma, \theta_j)$ is specified so that it satisfies the following conditions:
a) The density or mass function of $Q(x)$ that is, in effect, defined by equation (3) is equal to what it would have been if $Q(x)$ had been determined on the basis of the data set $x$.
b) The only random variable upon which $\varphi(\Gamma, \theta_j)$ depends is the variable $\Gamma$.
3) Generate the data set $x$ from its sampling density or mass function $g(x \mid \theta_1, \theta_2, \ldots, \theta_k)$ conditioned on the statistic $Q(x)$ being equal to its already generated value $q(x)$.

In the context of this algorithm, the variable $\Gamma$ is referred to as the primary random variable (primary r.v.).

Strong fiducial argument

This is the argument that the density function of the primary r.v. $\Gamma$ after the data have been observed, i.e. the post-data density function of $\Gamma$, should be equal to the pre-data density function of $\Gamma$, i.e. the density function $\pi_0(\gamma)$ as defined in step 1 of the data generating algorithm just presented.

Moderate fiducial argument

It will be assumed that this argument is only applicable if, on observing the data $x$, there exists some positive measure set of values of the primary r.v. $\Gamma$ over which the pre-data density function $\pi_0(\gamma)$ was positive, but over which the post-data density function of $\Gamma$, which will be denoted as the density function $\pi_1(\gamma)$, is necessarily zero. Under this condition, it is the argument that, over the set of values of $\Gamma$ for which the density function $\pi_1(\gamma)$ is necessarily positive, the relative height of this function should be equal to the relative height of the density function $\pi_0(\gamma)$, i.e. the heights of these two functions should be proportional.

Weak fiducial argument

This argument will be assumed to be only applicable if neither the strong nor the moderate fiducial argument is considered to be appropriate. It is the argument that, over the set of values of the primary r.v. $\Gamma$ for which the post-data density function $\pi_1(\gamma)$ is necessarily positive, the relative height of this function should be equal to the relative height of the pre-data density function $\pi_0(\gamma)$ multiplied by weights on the values of $\Gamma$ determined by a given function over the parameter $\theta_j$ that was specified before the data were observed. This latter function is called the global pre-data function of $\theta_j$. Let us now define this function.
Global pre-data (GPD) function

The global pre-data (GPD) function $\omega_G(\theta_j)$ is used to express pre-data knowledge, or a lack of such knowledge, about the only unknown parameter $\theta_j$. This function may be any given non-negative and upper bounded function of the parameter $\theta_j$. It is a function that only needs to be specified up to a proportionality constant, in the sense that, if it is multiplied by a positive constant, then the value of the constant is redundant. Unlike a Bayesian prior density, it is not controversial to use a GPD function that is not globally integrable.

A principle for defining the fiducial density $f(\theta_j \mid \theta_{-j}, x)$

Let us now consider a principle for defining the post-data density of $\theta_j$ conditional on the parameters $\theta_{-j}$, which given that it will be derived using a type of fiducial inference, will be called the fiducial density of $\theta_j$ conditional on $\theta_{-j}$, and will be denoted as the density $f(\theta_j \mid \theta_{-j}, x)$. To be able to use this principle, the following condition must be satisfied.

Condition 1

Let $G_x$ and $H_x$ be, respectively, the sets of all the values of the primary r.v. $\Gamma$ and the parameter $\theta_j$ for which the density functions of these variables must necessarily be positive in light of having observed only the value of the fiducial statistic $Q(x)$, i.e. the value $q(x)$, and not any other information in the data set $x$. To clarify, any set of values of $\Gamma$ or any set of values of $\theta_j$ that are regarded as being impossible after the statistic $Q(x)$ has been observed cannot be contained in the set $G_x$ or the set $H_x$ respectively. Given this notation, the present condition will be satisfied if, on substituting the variable $Q(x)$ in equation (3) by its observed value $q(x)$, this equation would define a bijective mapping between the set $G_x$ and the set $H_x$.
Under this condition, the full conditional fiducial density $f(\theta_j \mid \theta_{-j}, x)$ is defined by setting $Q(x)$ equal to its observed value $q(x)$ in equation (3), and then treating the value $\theta_j$ in this equation as being a realisation of the random variable $\Theta_j$, to give the expression:

$$ q(x) = \varphi(\Gamma, \Theta_j) \tag{4} $$

except that, instead of the variable $\Gamma$ necessarily having the density function $\pi_0(\gamma)$ as defined in step 1 of the data generating algorithm, it will be assumed to have the post-data density function of this variable as defined by:

$$ \pi_1(\gamma) = \begin{cases} C_1\, \omega_G(\theta_j(\gamma))\, \pi_0(\gamma) & \text{if } \gamma \in G_x \\ 0 & \text{otherwise} \end{cases} $$

where $\theta_j(\gamma)$ is the value of the variable $\Theta_j$ that maps on to the value $\gamma$ of the variable $\Gamma$ according to equation (4), the function $\omega_G(\theta_j(\gamma))$ is the GPD function of $\theta_j$ defined earlier, and $C_1$ is a normalising constant.

Notice that if, on substituting the variable $Q(x)$ by the value $q(x)$, equation (3) defines an injective mapping from the set of values $\{\gamma : \pi_0(\gamma) > 0\}$ for the variable $\Gamma$ to the space of the parameter $\theta_j$, then the GPD function $\omega_G(\theta_j)$ expresses in effect our pre-data beliefs about $\theta_j$ relative to what is implied by using the strong fiducial argument. By doing so, it determines whether the strong, moderate or weak fiducial argument is used to make inferences about $\theta_j$, and also the way in which the latter two arguments influence the inferential process.

In the case where nothing or very little was known about the parameter $\theta_j$ before the data were observed, it would generally seem reasonable to choose the GPD function of the parameter $\theta_j$ to be equal to a positive constant over the entire space of this parameter.
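As an illustration of the post-data density $\pi_1(\gamma)$ defined above, the following sketch approximates draws from a fiducial density by weighted resampling. It assumes, for illustration only, the normal-mean case with $q(x) = \bar{x}$ and $\varphi(\Gamma, \mu) = \mu + (\sigma/\sqrt{n})\,\Gamma$; the function name `fiducial_draws` and the use of resampling are choices made here, not part of the theory as stated.

```python
import math
import random


def fiducial_draws(xbar, sigma, n, gpd, n_draws, rng):
    """Approximate draws of mu under pi_1(gamma) by weighted resampling.

    gpd is the GPD function omega_G evaluated at a value of mu.
    """
    scale = sigma / math.sqrt(n)
    # Draw gamma from the pre-data density pi_0 = N(0, 1).
    gammas = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    # Invert equation (4): xbar = mu + scale * gamma  =>  mu = xbar - scale * gamma.
    mus = [xbar - scale * g for g in gammas]
    # Weight each draw by omega_G(theta_j(gamma)); pi_1 is proportional to
    # these weights multiplied by pi_0.
    weights = [gpd(m) for m in mus]
    # Resampling in proportion to the weights approximates sampling from pi_1.
    return rng.choices(mus, weights=weights, k=n_draws)


if __name__ == "__main__":
    rng = random.Random(0)
    # Constant GPD function: the strong fiducial argument, giving N(xbar, sigma^2 / n).
    strong = fiducial_draws(2.7, 3.0, 9, lambda m: 1.0, 20000, rng)
    print(sum(strong) / len(strong))
```

With a constant GPD function all weights are equal, so the resampling step changes nothing in distribution. A GPD function that is zero on part of the parameter space removes the corresponding values of $\gamma$ and leaves the relative heights of $\pi_0$ unchanged elsewhere, which corresponds to the moderate fiducial argument.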
Under the assumption that there exists an injective mapping from the space of $\Gamma$ to the space of $\theta_j$ of the type just mentioned, choosing the GPD function $\omega_G(\theta_j)$ to be a positive constant in this way implies that the post-data density $\pi_1(\gamma)$ will be equal to the pre-data density $\pi_0(\gamma)$, i.e. inferences will be made about $\theta_j$ by using the strong fiducial argument. The use of the theory of fiducial inference being presently considered in this special case is discussed to some extent in Bowater (2019a), but more extensively in Bowater (2018a), where in fact a specific version of organic fiducial inference, referred to as subjective fiducial inference, is applied to examples of this particular nature.

Other ways of defining the fiducial density $f(\theta_j \mid \theta_{-j}, x)$

In cases where the principle just described cannot be applied, i.e. when Condition 1 does not hold, we may well be able to define the fiducial density $f(\theta_j \mid \theta_{-j}, x)$ using the alternative principle for this purpose that was presented in Section 3.4 of Bowater (2019a) as Principle 2, or it may well be considered acceptable to define this fiducial density using the kind of variations on this latter principle that were discussed in Sections 7.2 and 8 of this earlier paper. The alternative principle in question, which is particularly useful in cases where the data are discrete or categorical, relies on the concept of a local pre-data (LPD) function for expressing additional information concerning the pre-data beliefs that were held about the parameter $\theta_j$ to that which is expressed by the GPD function for $\theta_j$. The concept of a LPD function is also detailed in Bowater (2019a).

2.5. Bispatial inference

The type of bispatial inference that will be incorporated into the theory being developed in the present paper will be the special form of bispatial inference that was laid out in Section 3 of Bowater (2019b).
Let us now outline the key concepts on which this type of bispatial inference is based. Further details about these concepts and a broader discussion of the specific method of inference in question can be found in Bowater (2019b). As in the previous section, the values of the parameters in the set $\theta_{-j}$ will be assumed to be known.

Scenario of interest

This scenario is characterised by there having been a substantial degree of belief before the data were observed that the only unknown parameter $\theta_j$ lay in a narrow interval $[\theta_{j0}, \theta_{j1}]$, but if, on the other hand, $\theta_j$ had been conditioned not to lie in this interval, then there would have been no or very little pre-data knowledge about $\theta_j$ over all of its allowable values outside of the interval in question. Among the three common types of pre-data opinion we may hold about the parameter $\theta_j$ that were highlighted in the Introduction, this scenario is clearly consistent with holding the second type of opinion.

Test statistics

In the context of bispatial inference, a test statistic $T(x)$, which will also be denoted simply by the value $t$, is specified such that it satisfies two criteria. First, this statistic must fit within the broad definition of a fiducial statistic that was given in the previous section. This could mean that a particular choice of the statistic $T(x)$ can only be justified with reference to given ancillary statistics; however, similar to how we proceeded in the previous section, we will assume here, for ease of presentation, that this is not the case. The second criterion is that if $F(t \mid \theta_j)$ is the cumulative distribution function of the unobserved test statistic $T(X)$ evaluated at its observed value $t$ given a value for the parameter $\theta_j$, i.e. $F(t \mid \theta_j) = P(T(X) \le t \mid \theta_j)$, and if $F'(t \mid \theta_j)$ is equal to the probability $P(T(X) \ge t \mid \theta_j)$, then it is necessary that, over the set of allowable values for $\theta_j$, the probabilities $F(t \mid \theta_j)$ and $1 - F'(t \mid \theta_j)$ strictly decrease as $\theta_j$ increases.

Parameter and sampling space hypotheses

Under this definition of a test statistic $T(x)$, if the condition:

$$ F(t \mid \theta_j = \theta_{j0}) \le F'(t \mid \theta_j = \theta_{j1}) \tag{5} $$

holds, where the values $\theta_{j0}$ and $\theta_{j1}$ are as defined at the start of this section, then the parameter space hypothesis $H_P$ and the sampling space hypothesis $H_S$ will be defined as:

$$ H_P : \theta_j \ge \theta_{j0} \tag{6} $$

$$ H_S : \rho(T(X^*) \le t) \le F(t \mid \theta_j = \theta_{j0}) \tag{7} $$

where $X^*$ is an as-yet-unobserved second sample of values drawn from the sampling density of interest, i.e. the density $g(x \mid \theta)$, that is the same size as the observed (first) sample $x$, i.e. it consists of $n$ observations, and where $\rho(A)$ is the unknown population proportion of times that condition $A$ is satisfied. On the other hand, if the condition in equation (5) does not hold, then the hypotheses in question will be defined as:

$$ H_P : \theta_j \le \theta_{j1} \tag{8} $$

$$ H_S : \rho(T(X^*) \ge t) \le F'(t \mid \theta_j = \theta_{j1}) \tag{9} $$

Given the way that the test statistic $T(x)$ was just defined, it can be easily appreciated that the hypotheses $H_P$ and $H_S$ in equations (6) and (7) are equivalent, and also that these hypotheses as defined in equations (8) and (9) are equivalent. In addition, observe that the probabilities $F(t \mid \theta_j = \theta_{j0})$ and $F'(t \mid \theta_j = \theta_{j1})$ that appear in the definitions of the hypotheses $H_S$ in equations (7) and (9) would be the standard one-sided P values that would be calculated on the basis of the data set $x$ if the null hypotheses were regarded as being the hypotheses $H_P$ that correspond to the two hypotheses $H_S$ in question.
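The choice between the hypothesis pairs (6)-(7) and (8)-(9) can be sketched in code. The example assumes, for illustration only, that the test statistic is the mean of $n$ normal observations with known standard deviation, so that $F(t \mid \theta_j) = \Phi((t - \theta_j)\sqrt{n}/\sigma)$; the function names and the returned dictionary layout are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bispatial_hypotheses(xbar, sigma, n, theta0, theta1):
    """Select H_P and H_S according to condition (5) for a normal mean."""
    se = sigma / math.sqrt(n)
    # F(t | theta_j = theta_j0) = P(T(X) <= t | theta_j0)
    f_lower = norm_cdf((xbar - theta0) / se)
    # F'(t | theta_j = theta_j1) = P(T(X) >= t | theta_j1)
    f_upper = 1.0 - norm_cdf((xbar - theta1) / se)
    if f_lower <= f_upper:
        # Condition (5) holds: equations (6) and (7).
        return {"H_P": f"theta_j >= {theta0}", "H_S_event": "T(X*) <= t", "p_value": f_lower}
    # Condition (5) fails: equations (8) and (9).
    return {"H_P": f"theta_j <= {theta1}", "H_S_event": "T(X*) >= t", "p_value": f_upper}


if __name__ == "__main__":
    print(bispatial_hypotheses(xbar=2.7, sigma=3.0, n=9, theta0=-0.2, theta1=0.2))
```

The returned `p_value` is the one-sided P value that bounds the population proportion in the selected hypothesis $H_S$.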
Inferential process

It will be assumed that inferences are made about the parameter $\theta_j$ by means of the following three-step process:

Step 1: Assessment of the likeliness of the hypothesis $H_P$ being true using only pre-data knowledge about the parameter $\theta_j$, with special attention being given to evaluating the likeliness of the hypothesis that $\theta_j$ lies in the interval $[\theta_{j0}, \theta_{j1}]$, which is a hypothesis that is always included in the hypothesis $H_P$. It is not necessary that this assessment is expressed in terms of a formal measure of uncertainty, e.g. a probability does not need to be assigned to the hypothesis $H_P$.

Step 2: Assessment of the likeliness of the hypothesis $H_S$ being true after the data $x$ have been observed, leading to the assignment of a probability to this hypothesis, which will be denoted as the probability $\kappa$. In carrying out this assessment, all relevant factors ought to be taken into account including, in particular: (a) the size of the one-sided P value that appears in the definition of the hypothesis $H_S$, i.e. the value $F(t \mid \theta_j = \theta_{j0})$ or the value $F'(t \mid \theta_j = \theta_{j1})$, (b) the assessment made in Step 1, and (c) the known equivalency between the hypotheses $H_P$ and $H_S$.

Step 3: Conclusion about the probability of the hypothesis $H_P$ being true having taken into account the data $x$. This is directly implied by the assessment made in Step 2 due to the equivalence of the hypotheses $H_P$ and $H_S$.

In combination with organic fiducial inference

It was described in Bowater (2019b) how the type of bispatial inference under discussion can be extended from allowing us to simply determine a post-data probability for the hypothesis $H_P$ being true, i.e. the probability $\kappa$, to allowing us to determine an entire post-data density function for the parameter $\theta_j$.
As was the case in this earlier paper, we will again favour doing this in an indirect way by combining bispatial inference as has just been detailed with organic fiducial inference as was summarised in Section 2.4. In particular, the method that we will choose to adopt to achieve the goal in question will be essentially the method that was put forward in Section 4.2 of Bowater (2019b). Let us now briefly outline this method.

To begin with, in applying the method concerned, we assume that both the post-data density function of $\theta_j$ conditional on $\theta_j$ lying in the interval $[\theta_{j0}, \theta_{j1}]$, and the post-data density function of $\theta_j$ conditional on $\theta_j$ not lying in this interval are derived under the paradigm of organic fiducial inference, i.e. they are fiducial density functions, and let us therefore denote these density functions by $f(\theta_j \mid \theta_j \in [\theta_{j0}, \theta_{j1}], x)$ and $f(\theta_j \mid \theta_j \notin [\theta_{j0}, \theta_{j1}], x)$ respectively. Since it has been assumed that, under the condition that $\theta_j$ does not lie in the interval $[\theta_{j0}, \theta_{j1}]$, nothing or very little would have been known about $\theta_j$ before the data were observed, it would seem quite natural, in deriving the latter of these fiducial densities $f(\theta_j \mid \theta_j \notin [\theta_{j0}, \theta_{j1}], x)$, to use a GPD function for $\theta_j$ that has the following form:

$$ \omega_G(\theta_j) = \begin{cases} 0 & \text{if } \theta_j \in [\theta_{j0}, \theta_{j1}] \\ a & \text{otherwise} \end{cases} $$

where $a > 0$, which would be classed as a neutral GPD function using the terminology of Bowater (2019a).

On the basis of this GPD function, the fiducial density $f(\theta_j \mid \theta_j \notin [\theta_{j0}, \theta_{j1}], x)$ can often be derived by applying the moderate fiducial argument under the principle that was outlined in Section 2.4, i.e. Principle 1 of Bowater (2019a).
Alternatively, in accordance with what was also advocated in Bowater (2019a), this fiducial density can be more generally defined, with respect to the same GPD function for $\theta_j$, by the following expression:

$$ f(\theta_j \mid \theta_j \notin [\theta_{j0}, \theta_{j1}], x) = C_2\, f_S(\theta_j \mid x) \tag{10} $$

where $C_2$ is a normalising constant, and $f_S(\theta_j \mid x)$ is a fiducial density for $\theta_j$ derived using either Principle 1 or Principle 2 of Bowater (2019a) that would be regarded as being a suitable fiducial density for $\theta_j$ in a general scenario where it is assumed that there was no or very little pre-data knowledge about $\theta_j$ over all possible values of $\theta_j$.

To construct the fiducial density of $\theta_j$ conditional on $\theta_j$ lying in the interval $[\theta_{j0}, \theta_{j1}]$, i.e. the density $f(\theta_j \mid \theta_j \in [\theta_{j0}, \theta_{j1}], x)$, the method being considered relies on quite a general type of GPD function for $\theta_j$. In particular, it is assumed that this GPD function has the following form:

$$ \omega_G(\theta_j) = \begin{cases} 1 + \nu\, h(\theta_j) & \text{if } \theta_j \in [\theta_{j0}, \theta_{j1}] \\ 0 & \text{otherwise} \end{cases} \tag{11} $$

where $\nu \ge 0$ is a given constant and $h(\theta_j)$ is a continuous unimodal density function on the interval $[\theta_{j0}, \theta_{j1}]$ that is equal to zero at the limits of this interval.

On the basis of this GPD function, the fiducial density $f(\theta_j \mid \theta_j \in [\theta_{j0}, \theta_{j1}], x)$ can often be derived by again using the principle detailed in Section 2.4 (i.e. Principle 1 of Bowater 2019a), but this time by calling upon the weak fiducial argument. Alternatively, in accordance with what was also advocated in Bowater (2019a), this fiducial density can be more generally defined, with respect to the same GPD function for $\theta_j$, in the following way:

$$ f(\theta_j \mid \theta_j \in [\theta_{j0}, \theta_{j1}], x) = C_3\, \omega_G(\theta_j)\, f_S(\theta_j \mid x) \tag{12} $$

where the fiducial density $f_S(\theta_j \mid x)$ is specified as it was immediately after equation (10), and $C_3$ is a normalising constant.
Now, if in using the method of bispatial inference outlined immediately before the current discussion, the hypothesis $H_P$, i.e. the hypothesis in equation (6) or equation (8), is assigned a sensible post-data probability $\kappa$, i.e. a probability above a very low limit that is defined in Bowater (2019b), then given the two conditional post-data densities for $\theta_j$ that have just been specified, i.e. the fiducial densities $f(\theta_j \mid \theta_j \in [\theta_{j0}, \theta_{j1}], x)$ and $f(\theta_j \mid \theta_j \notin [\theta_{j0}, \theta_{j1}], x)$, we have sufficient information to determine a valid post-data density function of $\theta_j$ over all values of $\theta_j$. Hopefully, it is fairly clear why this is the case; nevertheless, the reader is referred to Bowater (2019b) for a more detailed account of the derivation of this latter post-data density function. In the rest of this paper, we will denote this overall post-data density function of $\theta_j$ as the density $b(\theta_j \mid \theta_{-j}, x)$ to indicate that it was derived using bispatial inference.

However, there is an important final issue that needs to be resolved, which is how the value of the constant $\nu$ in equation (11) is chosen. Using the method being discussed, this constant must in fact be chosen such that the overall post-data density $b(\theta_j \mid \theta_{-j}, x)$ is made equivalent to a fiducial density function for $\theta_j$ that is based on a continuous GPD function for $\theta_j$ over all values of $\theta_j$, but except for the way in which this GPD function is specified, is based on the same assumptions as were used to derive the fiducial density $f_S(\theta_j \mid x)$. In general, a value for $\nu$ will exist that satisfies this condition and it will be a unique value.
Placing this condition on the choice of $\nu$ can be viewed as not restricting excessively the way we are allowed to express our pre-data knowledge about the parameter $\theta_j$, while it ensures that the density function $b(\theta_j \mid \theta_{-j}, x)$ possesses, in general, the usually desirable property of being continuous over all values of $\theta_j$.

Post-data opinion (PDO) curve

Observe that in using the method of inference that has just been outlined, the assessment of the likeliness of the hypothesis $H_S$ in either equation (7) or equation (9) will, in general, depend on the values of the parameters in the set $\theta_{-j}$. This of course will be partially due to the effect that the values of these parameters can have on the one-sided P value that appears in the definition of this hypothesis, i.e. their effect on the value $F(t \mid \theta_j = \theta_{j0})$ or the value $F'(t \mid \theta_j = \theta_{j1})$. As a result, to implement the method of inference under discussion within the overall framework for determining a joint post-data density of all the model parameters $\theta$ that was put forward in Section 2.2, we will generally wish to assign not just one, but various probabilities to the hypothesis $H_S$ conditional on the values of the parameters $\theta_{-j}$.

It is possible though to simplify matters greatly by assuming that the probability that is assigned to any given hypothesis $H_S$, and to also therefore its corresponding hypothesis $H_P$, i.e. the probability $\kappa$, will be the same for any fixed value of the one-sided P value that appears in the definition of the hypothesis $H_S$ no matter what values are actually taken by the parameters in the set $\theta_{-j}$. By making this assumption, which is arguably a reasonable assumption in many practical situations, the probability $\kappa$ becomes a mathematical function of the one-sided P value that appears in the definition of the hypothesis $H_S$ concerned.
As was the case in Bowater (2019b), this function will be called the post-data opinion (PDO) curve for the parameter $\theta_j$ conditional on the parameters $\theta_{-j}$.

3. Examples

We will now present various examples of the application of the overall theory that was outlined in previous sections, i.e. the theory of integrated organic inference.

3.1. Inference about a univariate normal distribution

Let us begin by considering what can be referred to as Student's problem, that is, the standard problem of making inferences about the mean $\mu$ of a normal density function, when its variance $\sigma^2$ is unknown, on the basis of a sample $x$ of size $n$, i.e. $x = \{x_1, x_2, \ldots, x_n\}$, drawn from the density function concerned.

If $\sigma^2$ was known, a sufficient statistic for $\mu$ would be the sample mean $\bar{x}$, which therefore, in applying the theory of fiducial inference outlined in Section 2.4, can naturally be assumed to be the fiducial statistic $Q(x)$ in this particular case. Based on this assumption and given a value for $\sigma^2$, equation (3) can be expressed as:

$$ \bar{x} = \varphi(\Gamma, \mu) = \mu + (\sigma/\sqrt{n})\,\Gamma \tag{13} $$

where the primary r.v. $\Gamma \sim \mathrm{N}(0, 1)$. If nothing or very little was known about $\mu$ before the data $x$ were observed, then it would be quite natural to specify the GPD function for $\mu$ as follows: $\omega_G(\mu) = a$ for $\mu \in (-\infty, \infty)$, where $a > 0$, which is indeed in keeping with how this function would be chosen using a criterion mentioned in Section 2.4. Using the principle outlined in this earlier section for deriving the fiducial density $f(\theta_j \mid \theta_{-j}, x)$, and in particular using equation (4), this would imply that the fiducial density of $\mu$ given $\sigma^2$, i.e. the density $f(\mu \mid \sigma^2, x)$, is defined by:

$$ \mu \mid \sigma^2, x \sim \mathrm{N}(\bar{x}, \sigma^2/n) \tag{14} $$

On the other hand, if $\mu$ was known, a sufficient statistic for $\sigma^2$ would be $\hat{\sigma}^2 = (1/n)\sum_{i=1}^{n}(x_i - \mu)^2$, which therefore, in applying again the theory of Section 2.4, will be assumed to be the statistic $Q(x)$ in this case. Based on this assumption and given a value for $\mu$, equation (3) can be expressed as:

$$ \hat{\sigma}^2 = \varphi(\Gamma, \sigma^2) = (\sigma^2/n)\,\Gamma \tag{15} $$

where the primary r.v. $\Gamma$ has a $\chi^2$ distribution with $n$ degrees of freedom. If there was no or very little pre-data knowledge about $\sigma^2$, it would be quite natural to specify the GPD function for $\sigma^2$ as follows:

$$ \omega_G(\sigma^2) = b \ \text{ if } \sigma^2 \ge 0 \ \text{ and zero otherwise} \tag{16} $$

where $b > 0$. Again using the principle detailed in Section 2.4 for deriving the fiducial density $f(\theta_j \mid \theta_{-j}, x)$, this would imply that the fiducial density $f(\sigma^2 \mid \mu, x)$ is defined by:

$$ \sigma^2 \mid \mu, x \sim \text{Inv-Gamma}(\alpha = n/2,\ \beta = n\hat{\sigma}^2/2) \tag{17} $$

i.e. it is an inverse gamma density function with shape parameter $\alpha$ equal to $n/2$ and scale parameter $\beta$ equal to $n\hat{\sigma}^2/2$.

It can be shown that the full conditional fiducial densities $f(\mu \mid \sigma^2, x)$ and $f(\sigma^2 \mid \mu, x)$ as they have just been specified are compatible and the joint density function of $\mu$ and $\sigma^2$ that they directly define is unique. This density function is therefore the joint fiducial density of $\mu$ and $\sigma^2$. In particular, the marginal density of $\mu$ over this joint fiducial density is given by:

$$ \mu \mid x \sim \text{Non-standardised } t_{n-1}(\bar{x},\ s/\sqrt{n}) \tag{18} $$

where $s$ is the sample standard deviation, i.e. it is a non-standardised Student $t$ density function with $n - 1$ degrees of freedom, location parameter equal to $\bar{x}$ and scaling parameter equal to $s/\sqrt{n}$ (which are settings that of course make it a very familiar member of this particular family of density functions), while the marginal density of $\sigma^2$ over the joint fiducial density of $\mu$ and $\sigma^2$ in question is given by:

$$ \sigma^2 \mid x \sim \text{Inv-Gamma}((n-1)/2,\ (n-1)s^2/2) \tag{19} $$

All the main results that have just been outlined were previously given with more explanation in Bowater (2019a), and indeed, a similar derivation of these results can be found in Bowater (2018a). By contrast, in what follows, the results that will be presented are generally original results, i.e. results not discussed in earlier papers, although various references will be made to examples that have been detailed previously.

In the scenario currently being considered, let us now turn our attention to the case where we have important pre-data knowledge about either of the parameters $\mu$ or $\sigma^2$ that can be adequately represented by a probability density function over the parameter concerned conditional on the other parameter being known. To give an example, let us assume that our pre-data opinion about $\sigma^2$ conditional on $\mu$ being known can be adequately represented by the density function of $\sigma^2$ conditional on $\mu$ that is defined by:

$$ \sigma^2 \mid \mu \sim \text{Inv-Gamma}(\alpha_0, \beta_0) \tag{20} $$

where $\alpha_0 > 0$ and $\beta_0 > 0$ are given constants.
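Before continuing with this informative prior, the compatibility claim made above for the full conditional fiducial densities (14) and (17) can be checked by simulation. The sketch below runs a Gibbs sampler with a fixed (systematic) scanning order, which is a simplification chosen here, and uses the data summary $n = 9$, $\bar{x} = 2.7$, $s^2 = 9$ that appears later in this section. The function name `gibbs_normal` is hypothetical.

```python
import math
import random


def gibbs_normal(n, xbar, s2, n_iter, burn_in, rng):
    """Gibbs sampler alternating between equations (14) and (17)."""
    mu, sigma2 = xbar, s2  # arbitrary starting values
    mus, sigma2s = [], []
    for it in range(burn_in + n_iter):
        # Equation (14): mu | sigma^2, x ~ N(xbar, sigma^2 / n).
        mu = rng.gauss(xbar, math.sqrt(sigma2 / n))
        # n * sigma_hat^2 = sum (x_i - mu)^2 = (n - 1) s^2 + n (xbar - mu)^2.
        n_sigma_hat2 = (n - 1) * s2 + n * (xbar - mu) ** 2
        # Equation (17): sigma^2 | mu, x ~ Inv-Gamma(n / 2, n sigma_hat^2 / 2),
        # drawn as the reciprocal of a gamma variate with scale 2 / (n sigma_hat^2).
        sigma2 = 1.0 / rng.gammavariate(n / 2.0, 2.0 / n_sigma_hat2)
        if it >= burn_in:
            mus.append(mu)
            sigma2s.append(sigma2)
    return mus, sigma2s


if __name__ == "__main__":
    mus, sigma2s = gibbs_normal(9, 2.7, 9.0, 20000, 500, random.Random(42))
    mean_mu = sum(mus) / len(mus)
    var_mu = sum((m - mean_mu) ** 2 for m in mus) / len(mus)
    # Equation (18) implies mean 2.7 and variance (s^2 / n)(n - 1)/(n - 3) = 4/3.
    print(mean_mu, var_mu)
```

Under equation (19) the marginal of $\sigma^2$ is Inv-Gamma(4, 36) for this data summary, which has mean $36/3 = 12$, and this gives a second quantity against which the simulated values can be compared.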
Treating this density function as a prior density function, and combining it with the likelihood function in this case, under the Bayesian paradigm, leads to a posterior density of $\sigma^2$ conditional on $\mu$ that is defined by:

$$ \sigma^2 \mid \mu, x \sim \text{Inv-Gamma}(\alpha_0 + (n/2),\ \beta_0 + (n\hat{\sigma}^2/2)) \tag{21} $$

If there was no or very little pre-data knowledge about $\mu$, then it would be quite natural to let the full conditional fiducial density $f(\mu \mid \sigma^2, x)$ defined by equation (14), and the full conditional posterior density $p(\sigma^2 \mid \mu, x)$ defined in the equation just given, form the basis for using the framework described in Section 2.2 to determine the joint post-data density of $\mu$ and $\sigma^2$, i.e. the density $p(\mu, \sigma^2 \mid x)$. In fact, by using the simple analytical method outlined in the opening part of Section 2.2, it can be easily established that these full conditional densities are compatible, and it is clear that the joint density function for $\mu$ and $\sigma^2$ that they define must be unique. This joint density function is therefore the post-data density $p(\mu, \sigma^2 \mid x)$. Furthermore, the marginal density of $\mu$ over this joint post-data density is given by:

$$ \mu \mid x \sim \text{Non-standardised } t_{2\alpha_0 + n - 1}\left( \bar{x},\ \left( \frac{2\beta_0 + (n-1)s^2}{(2\alpha_0 + n - 1)\,n} \right)^{0.5} \right) \tag{22} $$

while the marginal density of $\sigma^2$ over the joint density in question is given by:

$$ \sigma^2 \mid x \sim \text{Inv-Gamma}(\alpha_0 + ((n-1)/2),\ \beta_0 + ((n-1)/2)s^2) \tag{23} $$

To illustrate this example, Figure 1 shows some results from using the calculations just described to perform an analysis of a data set $x$ that is summarised by the values $n = 9$, $\bar{x} = 2.7$ and $s^2 = 9$.
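As a worked check of equations (22) and (23), the sketch below evaluates their parameters for the data summary just given together with the prior settings $\alpha_0 = 4$ and $\beta_0 = 64$ that are adopted in this analysis. The function name `postdata_marginals` and the dictionary keys are hypothetical.

```python
import math


def postdata_marginals(n, xbar, s2, alpha0, beta0):
    """Parameters of the marginal post-data densities in equations (22) and (23)."""
    t_df = 2 * alpha0 + n - 1
    t_scale = math.sqrt((2 * beta0 + (n - 1) * s2) / (t_df * n))
    ig_shape = alpha0 + (n - 1) / 2
    ig_scale = beta0 + (n - 1) * s2 / 2
    return {
        "t_df": t_df,
        "t_location": xbar,
        "t_scale": t_scale,
        "ig_shape": ig_shape,
        "ig_scale": ig_scale,
    }


if __name__ == "__main__":
    print(postdata_marginals(n=9, xbar=2.7, s2=9.0, alpha0=4, beta0=64))
```

For these inputs the marginal of $\mu$ has 16 degrees of freedom and scaling parameter $\sqrt{200/144} \approx 1.18$, compared with 8 degrees of freedom and scaling parameter 1 under equation (18), while the marginal of $\sigma^2$ is Inv-Gamma(8, 100).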
In particular, this figure shows a plot of the specific form of the conditional prior density $p(\sigma \mid \mu)$ as defined by equation (20) that was used in this analysis, which is represented by the short-dashed curve in Figure 1(b), a plot of the marginal post-data density $p(\mu \mid x)$ as defined by equation (22), which is represented by the long-dashed (rather than the dot-dashed) curve in Figure 1(a), and a plot of the marginal post-data density $p(\sigma \mid x)$ as given by equation (23), which is represented by the long-dashed curve in Figure 1(b). To complete the specification of the prior density $p(\sigma \mid \mu)$, the constants $\alpha_0$ and $\beta_0$ in equation (20) were set equal to 4 and 64 respectively. These settings imply that this prior density would be equal to the marginal fiducial density of $\sigma$ defined by equation (19) if this latter density was based on having observed a variance of 16 in a preliminary sample of 9 observations drawn from a population having the same unknown variance $\sigma^2$ that is currently being considered. Notice that, from a practical viewpoint, this interpretation would be genuinely useful if the mean $\mu$ of this population was not only assumed to be unknown, but was assumed not to be the same as the mean $\mu$ of present interest. On the basis of only the main data set being analysed, i.e. the data set $x$, and for comparison with the plots being considered, the solid curves in Figures 1(a) and 1(b) represent, respectively, the marginal fiducial density $f(\mu \mid x)$ as defined by equation (18) and the marginal fiducial density $f(\sigma \mid x)$ as given by equation (19).

Let us now change the state of knowledge about both the parameters $\mu$ and $\sigma^2$ before the data were observed. In particular, let us begin by imagining that we have important pre-data knowledge about the mean $\mu$ that can be adequately represented by a probability density function over $\mu$ conditional on $\sigma^2$ being known, i.e. the density $p(\mu \mid \sigma^2)$.
To give an example, let this density function be defined by:

$$ \mu \mid \sigma^2 \sim \text{Non-standardised } t_{\nu_0}(\mu_0, \sigma_0) \tag{24} $$

where $\nu_0 > 0$, $\sigma_0 > 0$ and $\mu_0$ are given constants. Treating this choice of the density $p(\mu \mid \sigma^2)$ as a prior density under the Bayesian paradigm leads to a posterior density of $\mu$ conditional on $\sigma^2$ that is defined by:

$$ p(\mu \mid \sigma^2, x) \propto \left(1 + \frac{1}{\sigma_0^2 \nu_0}(\mu - \mu_0)^2\right)^{-(\nu_0 + 1)/2} \exp\left(-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right) $$

If now we assume that there was no or very little pre-data knowledge about $\sigma^2$, then it would be quite natural to use the full conditional fiducial density $f(\sigma^2 \mid \mu, x)$ given by equation (17), and the full conditional posterior density $p(\mu \mid \sigma^2, x)$ defined by the equation just presented, as the basis for determining the joint post-data density of $\mu$ and $\sigma^2$, i.e. the density $p(\mu, \sigma^2 \mid x)$. Similar to the previous example, it can easily be shown by using once again the simple analytical method outlined in the opening part of Section 2.2 that these full conditional densities are compatible, and it is again clear that the joint density function for $\mu$ and $\sigma^2$ that they define must be unique. This joint density function, which is therefore the post-data density $p(\mu, \sigma^2 \mid x)$, can in fact be expressed, up to a normalising constant, as follows:

$$ p(\mu, \sigma^2 \mid x) \propto \left(\frac{1}{\sigma^2}\right)^{(n/2)+1} \left(1 + \frac{1}{\sigma_0^2 \nu_0}(\mu - \mu_0)^2\right)^{-(\nu_0 + 1)/2} \exp\left(-\frac{1}{2\sigma^2}\, n\hat{\sigma}^2\right) \tag{25} $$

[Figure 1: Conditional prior and marginal post-data densities of the mean $\mu$ and standard deviation $\sigma$ of a normal distribution. Panel (a): density against $\mu$. Panel (b): density against $\sigma$.]

To illustrate the use of the method being discussed, let us apply this method to the analysis of the same data set $x$ as we were concerned with in the previous example.
In particular, Figure 1 shows, along with the plots that were mentioned earlier, a plot of the specific form of the conditional prior density $p(\mu \mid \sigma^2)$ as defined by equation (24) that was used in the present analysis, which is represented by the short-dashed curve in Figure 1(a), and plots of the marginal densities of $\mu$ and $\sigma$ over the joint post-data density $p(\mu, \sigma^2 \mid x)$ given in equation (25), which are represented by the dot-dash curves in Figure 1(a) and Figure 1(b) respectively. These marginal densities of $\mu$ and $\sigma$ were obtained by numerical integration over the joint density $p(\mu, \sigma^2 \mid x)$. To complete the specification of the prior density $p(\mu \mid \sigma^2)$, the constants in equation (24) were given the settings $\nu_0 = 17$, $\mu_0 = -0.3$ and $\sigma_0 = 4/3$. These settings imply that this prior density would be equal to the marginal fiducial density of $\mu$ given by equation (18) if this latter density was based on having observed a mean of $-0.3$ and a variance of 32 in a preliminary sample of 18 observations drawn from a population having the same unknown mean $\mu$ that is currently being considered. Similar to a point made earlier, such an interpretation would be genuinely useful in a practical sense if the variance $\sigma^2$ of this population was not only assumed to be unknown, but was assumed not to be the same as the variance $\sigma^2$ of present interest.

Finally, in the case where we have important pre-data knowledge about both $\mu$ and $\sigma^2$ that can be adequately represented by full conditional probability densities over each of these parameters, i.e. the densities $p(\mu \mid \sigma^2)$ and $p(\sigma^2 \mid \mu)$, it would seem reasonable, assuming that these conditional densities are compatible, to treat these densities as being conditional prior densities, and to use exclusively the standard Bayesian approach to make inferences about $\mu$ and $\sigma^2$.
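Returning to the joint density in equation (25), the numerical integration mentioned above can be sketched as follows. The grid limits, the step sizes, the midpoint rule and the function names are all assumptions made for this illustration, and are not necessarily the scheme used to produce Figure 1.

```python
import math


def joint_unnormalised(mu, sigma2, n, xbar, s2, nu0, mu0, sigma0):
    """Right-hand side of equation (25), without a normalising constant."""
    # n * sigma_hat^2 = sum (x_i - mu)^2 = (n - 1) s^2 + n (xbar - mu)^2.
    n_sigma_hat2 = (n - 1) * s2 + n * (xbar - mu) ** 2
    prior_part = (1.0 + (mu - mu0) ** 2 / (sigma0 ** 2 * nu0)) ** (-(nu0 + 1) / 2.0)
    return sigma2 ** (-(n / 2.0 + 1.0)) * prior_part * math.exp(-n_sigma_hat2 / (2.0 * sigma2))


def marginal_mu(mu_grid, n, xbar, s2, nu0, mu0, sigma0, sigma2_max=400.0, step=0.2):
    """Marginal density of mu on an evenly spaced grid, by midpoint-rule integration."""
    n_steps = int(sigma2_max / step)
    sigma2_mid = [(k + 0.5) * step for k in range(n_steps)]
    dens = []
    for mu in mu_grid:
        total = sum(joint_unnormalised(mu, v, n, xbar, s2, nu0, mu0, sigma0) for v in sigma2_mid)
        dens.append(total * step)
    # Normalise so that the density integrates to one over the mu grid.
    d_mu = mu_grid[1] - mu_grid[0]
    norm = sum(dens) * d_mu
    return [d / norm for d in dens]


if __name__ == "__main__":
    grid = [-4.0 + 0.1 * i for i in range(121)]
    dens = marginal_mu(grid, 9, 2.7, 9.0, 17, -0.3, 4.0 / 3.0)
    print(grid[dens.index(max(dens))])  # location of the mode on the grid
```

Since both factors of the integrated density are unimodal in $\mu$, with modes at $\mu_0$ and $\bar{x}$ respectively, the mode of the resulting marginal density must lie between the prior location $\mu_0 = -0.3$ and the sample mean $\bar{x} = 2.7$.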
Since Ba y esian inference is a w ell-kno wn form of inference, no further discussion of this particular case will b e giv en here. 3.2. Alternativ e solution to Studen t’s problem In the previous section, Studen t’s problem w as tac kled b y incorp orating organic ﬁducial inference and Ba y esian inference into the framework outlined in Section 2.2, no w let us 34 consider a case in which it would seem appropriate to address the same problem b y also incorp orating bispatial inference in to this framework. In particular, let us assume that conditional on the v ariance σ 2 b eing known, the scenario of in terest of Section 2.5 would apply if the general parameter θ j w as tak en as b eing the mean µ , with the interv al [ θ j 0 , θ j 1 ] in this scenario b eing denoted now as the in terv al [ µ 1 − ε, µ 1 + ε ], where ε ≥ 0 and µ 1 are given constan ts. W e will therefore construct the p ost-data density of µ conditional on σ 2 using the t yp e of bispatial infer- ence describ ed in Section 2.5. T o do this, the test statistic T ( x ) as deﬁned in Section 2.5 will b e quite reasonably assumed to b e the sample mean ¯ x . Therefore, in the case where the mean ¯ x is greater than zero, which will b e assumed to b e the case of particular interest, the hypotheses H P and H S will b e as deﬁned in equations (8) and (9), which implies that, for the present example, they can b e more sp eciﬁcally expressed as: H P : µ ≤ µ 1 + ε H S : ρ ( X ∗ > ¯ x ) ≤ 1 − Φ(( ¯ x − µ 1 − ε ) √ n/σ ) (= J ) (26) where X ∗ is the mean of an as-y et-unobserved sample of n additional v alues dra wn from the density function g ( x | µ, σ 2 ), i.e. the normal densit y function b eing studied, and Φ( y ) is the cumulativ e density of a standard normal distribution at the v alue y . Also, it will b e assumed, quite reasonably , that the ﬁducial densit y f S ( θ j | x ), which is required b y equations (10) and (12), i.e. 
the density $f_S(\mu \mid \sigma^2, x)$ in the present case, is the fiducial density of $\mu$ given $\sigma^2$ that was defined in equation (14).

To complete the specification of the post-data density of $\mu$ given $\sigma^2$, i.e. in keeping with earlier notation, the density $b(\mu \mid \sigma^2, x)$, let us now make some more specific assumptions. In particular, let us assume that $\mu_1 = 0$ and $\varepsilon = 0.2$, and that the density function $h(\theta_j)$ that appears in equation (11), i.e. the density $h(\mu)$ in the present case, is defined by:

$$\mu \sim \text{Beta}(4, 4, -0.2, 0.2) \tag{27}$$

i.e. it is a beta density function for $\mu$ on the interval $[-0.2, 0.2]$ with both its shape parameters equal to 4. Furthermore, we will assume that the data are summarised as they were in the previous section, i.e. by $n = 9$, $\bar{x} = 2.7$ and $s^2 = 9$. Finally, the probabilities $\kappa$ that would be assigned to the hypothesis $H_S$ in equation (26) for different values of $\sigma^2$ will be assumed to be given by the PDO curve for $\mu$ conditional on $\sigma^2$ that has the formula $\kappa = J^{0.6}$, where, as indicated in equation (26), $J$ is the one-sided P value in the definition of the hypothesis $H_S$ concerned. These assumptions fully specify the post-data density $b(\mu \mid \sigma^2, x)$ according to the methodology outlined in Section 2.5.

[Figure 2: Histograms representing marginal post-data densities of the mean $\mu$ and standard deviation $\sigma$ of a normal distribution. Panel (a): density against $\mu$. Panel (b): density against $\sigma$.]

In fact, in Bowater (2019b), this full conditional density of $\mu$, precisely as this density has just been defined, and the full conditional fiducial density $f(\sigma^2 \mid \mu, x)$ given by equation (17), with the data set $x$ assumed to be as currently specified, were used as the basis for determining the joint post-data density of $\mu$ and $\sigma^2$ within the same type of framework as described in Section 2.2.
As mentioned earlier, the use of the full conditional fiducial density of $\sigma^2$ being referred to would be quite natural if it was assumed that there was no or very little pre-data knowledge about the variance $\sigma^2$. However, this assumption will not be made here. Instead, let us assume that we have important pre-data knowledge about $\sigma^2$ that is adequately represented by the density function for $\sigma^2$ conditional on $\mu$ that is defined by equation (20), with the same choices for the constants $\alpha_0$ and $\beta_0$ as were used earlier to express pre-data knowledge about $\sigma^2$ conditional on $\mu$, i.e. with $\alpha_0 = 4$ and $\beta_0 = 64$. Treating this density function as a prior density function under the Bayesian paradigm therefore leads to the posterior density of $\sigma^2$ given $\mu$, i.e. the density $p(\sigma^2 \mid \mu, x)$, being defined as it was in equation (21).

To illustrate this example, Figure 2 shows some results from running a Gibbs sampler on the basis of the full conditional post-data densities of $\mu$ and $\sigma^2$ that have just been defined, i.e. the post-data density $b(\mu \mid \sigma^2, x)$ and the posterior density $p(\sigma^2 \mid \mu, x)$, with a uniform random scanning order of the parameters $\mu$ and $\sigma^2$, as such a scanning order was defined in Section 2.2. In particular, the histograms in Figures 2(a) and 2(b) represent the distributions of the values of the mean $\mu$ and the standard deviation $\sigma$, respectively, over a single run of six million samples of these parameters generated by the Gibbs sampler after a preceding run of two thousand samples, which were classified as belonging to its burn-in phase, had been discarded. The sampling of the density $b(\mu \mid \sigma^2, x)$ was based on the Metropolis algorithm (Metropolis et al. 1953), while each value drawn from the density $p(\sigma^2 \mid \mu, x)$ was independent from the preceding iterations.
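As a small numerical aid, the one-sided P value $J$ in equation (26) and the PDO curve $\kappa = J^{0.6}$ specified above can be evaluated for a few values of $\sigma$. The sketch below uses the data summary and constants stated in the text ($n = 9$, $\bar{x} = 2.7$, $\mu_1 = 0$, $\varepsilon = 0.2$); the function names and the grid of $\sigma$ values are illustrative assumptions, and the code does not attempt to construct the density $b(\mu \mid \sigma^2, x)$ itself, which depends on equations of Section 2.5.

```python
import math

def one_sided_p_value(xbar, mu1, eps, n, sigma):
    """J in equation (26): 1 - Phi((xbar - mu1 - eps) * sqrt(n) / sigma).

    The upper tail is computed with erfc to avoid cancellation in 1 - Phi.
    """
    z = (xbar - mu1 - eps) * math.sqrt(n) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def pdo_probability(j, power=0.6):
    """kappa = J ** 0.6, the PDO curve assumed in this example."""
    return j ** power

# Data summary and constants taken from the text.
N, XBAR, MU1, EPS = 9, 2.7, 0.0, 0.2

# The sigma values below are an illustrative grid, not values from the paper.
for sigma in (2.0, 3.0, 4.0, 6.0):
    j = one_sided_p_value(XBAR, MU1, EPS, N, sigma)
    print(f"sigma={sigma:.1f}  J={j:.6f}  kappa={pdo_probability(j):.6f}")
```

Because the exponent 0.6 is less than one, $\kappa$ exceeds $J$ whenever $0 < J < 1$, so the probability assigned to $H_S$ is always larger than the P value on which it is based.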
In addition to this analysis, the Gibbs sampler was also run various times from different starting points, and a careful study of the output of these runs using appropriate diagnostics provided no evidence to suggest that the sampler does not have a limiting distribution, and showed, at the same time, that it would appear to generally converge quickly to this distribution. Furthermore, the Gibbs sampling algorithm was run separately with each of the two possible fixed scanning orders of the parameters, i.e. the one in which $\mu$ is updated first and then $\sigma^2$ is updated, and the one that has the reverse order, in accordance with how a single transition of such an algorithm was defined in Section 2.2, i.e. single transitions of the algorithm incorporated updates of both parameters. In doing this, no statistically significant difference was found between the samples of parameter values aggregated over the runs of the sampler in using each of these two scanning orders after excluding the burn-in phase of the sampler, e.g. between the two sample correlations of $\mu$ and $\sigma$, even when the runs concerned were long. Taking into account what was discussed in Section 2.2, this implies that the full conditional densities of the limiting distribution of the original Gibbs sampler, i.e. the one with a uniform random scanning order, should be, at the very least, close approximations to the full conditional densities on which the sampler is based, i.e. the post-data density $b(\mu \mid \sigma^2, x)$ and the posterior density $p(\sigma^2 \mid \mu, x)$ defined earlier.

Each of the curves overlaid on the histograms in Figures 2(a) and 2(b), which are distinguished by being plotted with short-dashed, long-dashed and solid lines, is identical to the curve plotted using the same line type in Figures 1(a) and 1(b) respectively.
By comparing the histograms in Figures 2(a) and 2(b) with the curves in question, it can be seen that the forms of the marginal post-data densities of $\mu$ and $\sigma$ that are represented by these histograms are consistent with what we would have intuitively expected given the pre-data beliefs about $\mu$ and $\sigma$ that have been taken into account. It may also be to some extent informative to compare Figures 2(a) and 2(b) with Figures 4(a) and 4(b) of Bowater (2019b), since these latter figures relate to the example from this earlier paper that was mentioned midway through the present section.

3.3. Inference about a trinomial distribution

We will now consider the problem of making inferences about the parameters $\pi = (\pi_1, \pi_2, \pi_3)'$ of a trinomial distribution, where $\pi_i$ is the proportion of times that the $i$th outcome of the three possible outcomes is generated in the long run, based on observing a sample of counts $x = (x_1, x_2, x_3)'$ from the distribution concerned, where $x_i$ is the number of times that the $i$th outcome is observed. Since of course $\pi_1 + \pi_2 + \pi_3 = 1$, this model has effectively only two parameters, which we will assume to be the proportions $\pi_1$ and $\pi_2$. To clarify, the probability of observing the sample of counts $x = (x_1, x_2, x_3)'$ is specified by the trinomial mass function in this case, i.e. the function:

$$g_0(x \mid \pi_1, \pi_2) = \begin{cases} \dfrac{n!}{x_1!\, x_2!\, x_3!}\, \pi_1^{x_1} \pi_2^{x_2} \pi_3^{x_3} & \text{if } x_1, x_2, x_3 \in \mathbb{Z}_{\geq 0} \text{ and } n = x_1 + x_2 + x_3 \\ 0 & \text{otherwise} \end{cases}$$

where the total number of counts $n$ is fixed.

In particular, let us begin by applying organic fiducial inference as outlined in Section 2.4 to make inferences about $\pi_2$ conditional on $\pi_1$ being known. In this regard, observe that if $\pi_1$ was known, sufficient statistics for $\pi_2$ would be $x_2$ and $x_2 + x_3$.
However, $x_2 + x_3$ is an ancillary complement of $x_2$, and therefore, according to the more general definition of the fiducial statistic $Q(x)$ given in Bowater (2019a), the count $x_2$ can justifiably be assumed to be the statistic $Q(x)$. Based on this assumption and given a value for $\pi_1$, equation (3) can naturally be redefined as:

$$x_2 = \varphi(\Gamma, \pi_2) = \min\Big\{ y : \Gamma < \sum_{j=0}^{y} g_1(j \mid \pi_2) \Big\} \tag{28}$$

where the primary r.v. $\Gamma$ has a uniform distribution over the interval $(0, 1)$, and the function $g_1(j \mid \pi_2)$ is given by:

$$g_1(j \mid \pi_2) = \frac{(x_2 + x_3)!}{(x_2 + x_3 - j)!\; j!} \left( \frac{\pi_2}{1 - \pi_1} \right)^{j} \left( \frac{1 - \pi_1 - \pi_2}{1 - \pi_1} \right)^{x_2 + x_3 - j}$$

in which the statistic $x_2 + x_3$ is treated as having already been generated.

Given that it will be assumed that there was no or very little pre-data knowledge about the proportion $\pi_2$, the GPD function for $\pi_2$ will be quite reasonably specified as follows: $\omega_G(\pi_2) = a$ if $0 \leq \pi_2 \leq 1 - \pi_1$ and 0 otherwise, where $a > 0$. However, since for whatever choice is made for this GPD function and whatever turns out to be the sample $x$, equation (28) will never satisfy Condition 1 of Section 2.4, the principle outlined in this earlier section for deriving the fiducial density $f(\theta_j \mid \theta_{-j}, x)$ cannot be employed in the case of interest to determine the fiducial density of $\pi_2$ given $\pi_1$, i.e. the density $f(\pi_2 \mid \pi_1, x)$. This density can instead, though, be determined by applying Principle 2 of Bowater (2019a), which, as mentioned in Section 2.4, is a principle that relies on the concept of a local pre-data (LPD) function. In particular, to make use of this principle in the present case, we need to specify an LPD function for $\pi_2$. Further details about how the principle in question is applied are given in Bowater (2019a).
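The data-generating function in equation (28) is an inverse-CDF rule for a binomial count with $x_2 + x_3$ trials and success probability $\pi_2 / (1 - \pi_1)$. A minimal sketch of this rule is given below; the function names and the example values of $\pi_1$ and $\pi_2$ are assumptions made for illustration, while the total of 8 trials matches the counts $x_2 = 2$ and $x_3 = 6$ used later in this section. The sketch only evaluates $\varphi(\Gamma, \pi_2)$ and does not implement Principle 2 of Bowater (2019a).

```python
from math import comb

def g1(j, pi2, pi1, m):
    """Binomial mass g_1(j | pi2) with m = x2 + x3 trials and
    success probability pi2 / (1 - pi1)."""
    p = pi2 / (1.0 - pi1)
    return comb(m, j) * p**j * (1.0 - p) ** (m - j)

def phi(gamma, pi2, pi1, m):
    """Equation (28): the smallest y such that gamma < sum_{j=0}^{y} g1(j | pi2)."""
    cumulative = 0.0
    for y in range(m + 1):
        cumulative += g1(y, pi2, pi1, m)
        if gamma < cumulative:
            return y
    return m  # guard against rounding when gamma is extremely close to 1

# Illustrative (hypothetical) parameter values; m = 8 corresponds to x2 + x3 = 2 + 6.
M, PI1, PI2 = 8, 0.3, 0.2
for gamma in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(gamma, phi(gamma, PI2, PI1, M))
```

For a fixed value of $\Gamma$, the generated count can only take integer values, so $\varphi(\Gamma, \pi_2)$ is a step function of $\pi_2$ rather than a continuous one.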
As also discussed in this earlier paper, the type of method being considered could be used to obtain a complete set of full conditional fiducial densities for $k$ of the population proportions of a multinomial distribution with $k + 1$ categories on the basis of a given sample from this distribution, which could then be used to determine a joint fiducial density of these $k$ proportions (or equivalently of all $k + 1$ population proportions of the distribution) using the type of framework outlined in Section 2.2 of the current paper. In relation to this issue, a detailed example was presented in Bowater (2019a) of how a joint fiducial density of the five (or equivalently four of the five) population proportions of a multinomial distribution with five categories could be obtained using such an approach.

However, in the present case, it will be assumed that, unlike the post-data density of $\pi_2$ given $\pi_1$, the post-data density of $\pi_1$ given $\pi_2$ does not belong to the class of full conditional fiducial densities under discussion. This is because, in contrast to the kind of scenario where the type of approach just mentioned is most applicable, it will be assumed that we have important pre-data knowledge about the proportion $\pi_1$, and that this pre-data knowledge can, in particular, be adequately represented by a probability density function over $\pi_1$ conditional on $\pi_2$ being known, i.e. the density $p(\pi_1 \mid \pi_2)$. To give an example, let this density function be defined by:

$$p(\pi_1 \mid \pi_2) = \begin{cases} C_4\, \pi_1^{\alpha - 1} (1 - \pi_1)^{\beta - 1} & \text{if } 0 \leq \pi_1 \leq 1 - \pi_2 \\ 0 & \text{otherwise} \end{cases} \tag{29}$$

where $\alpha > 0$ and $\beta > 0$ are given constants, and $C_4$ is a normalising constant.
Treating this choice of the density $p(\pi_1 \mid \pi_2)$ as a prior density and combining it with the likelihood function in this case, under the Bayesian paradigm, leads to a posterior density of $\pi_1$ given $\pi_2$ that is defined by:

$$p(\pi_1 \mid \pi_2, x) = \begin{cases} C_5\, \pi_1^{\alpha + x_1 - 1} (1 - \pi_1 - \pi_2)^{n - x_1 - x_2} (1 - \pi_1)^{\beta - 1} & \text{if } 0 \leq \pi_1 \leq 1 - \pi_2 \\ 0 & \text{otherwise} \end{cases}$$

where $C_5$ is a normalising constant.

To illustrate this example, Figure 3 shows some results from running a Gibbs sampler on the basis of the full conditional post-data densities of $\pi_1$ and $\pi_2$ that have just been referred to, i.e. the fiducial density $f(\pi_2 \mid \pi_1, x)$ and the posterior density (derived using Bayesian inference) $p(\pi_1 \mid \pi_2, x)$, with a uniform random scanning order of the parameters $\pi_1$ and $\pi_2$. In particular, the histograms in Figures 3(a) and 3(b) represent the distributions of the values of $\pi_1$ and $\pi_2$, respectively, over a single run of six million samples of these parameters generated by the Gibbs sampler after a preceding run of one thousand samples were discarded due to these samples being classified as belonging to its burn-in phase. The sampling of the density $p(\pi_1 \mid \pi_2, x)$ was based on the Metropolis algorithm, while the sampling of the density $f(\pi_2 \mid \pi_1, x)$ was independent from the preceding iterations.

[Figure 3: Unconditional prior density of one parameter, namely $\pi_1$, and marginal post-data densities of both parameters $\pi_1$ and $\pi_2$ of a trinomial distribution. Panel (a): density against $\pi_1$. Panel (b): density against $\pi_2$.]

Moreover, the observed counts on which the inferential process being described was based were set as follows: $x_1 = 4$, $x_2 = 2$ and $x_3 = 6$.
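To make the Metropolis update of $\pi_1$ concrete, the sketch below evaluates the unnormalised log of the posterior density $p(\pi_1 \mid \pi_2, x)$ stated above and applies a random-walk Metropolis step with $\pi_2$ held fixed. The counts are those given in the text, and $\alpha = 1.5$, $\beta = 11.5$ are the values the text assigns to the constants in equation (29). The Gaussian proposal, its scale, the fixed value of $\pi_2$, the starting point and the run length are all assumptions made for illustration; the paper does not state which proposal was used.

```python
import math
import random

ALPHA, BETA = 1.5, 11.5   # constants assigned in the text for equation (29)
X1, X2, X3 = 4, 2, 6      # observed counts from the text
N = X1 + X2 + X3

def log_post_pi1(pi1, pi2):
    """Unnormalised log of p(pi1 | pi2, x); -inf outside (0, 1 - pi2)."""
    if not 0.0 < pi1 < 1.0 - pi2:
        return -math.inf
    return ((ALPHA + X1 - 1.0) * math.log(pi1)
            + (N - X1 - X2) * math.log(1.0 - pi1 - pi2)
            + (BETA - 1.0) * math.log(1.0 - pi1))

def metropolis_step(pi1, pi2, rng, step=0.1):
    """One random-walk Metropolis update of pi1 with pi2 held fixed.

    The symmetric Gaussian proposal and its scale `step` are assumptions.
    """
    proposal = pi1 + rng.gauss(0.0, step)
    log_ratio = log_post_pi1(proposal, pi2) - log_post_pi1(pi1, pi2)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return pi1

rng = random.Random(0)
PI2_FIXED = 0.2           # hypothetical conditioning value of pi2
chain = [0.3]             # hypothetical starting point inside the support
for _ in range(20000):
    chain.append(metropolis_step(chain[-1], PI2_FIXED, rng))

kept = chain[1000:]       # discard an initial stretch as burn-in
print(sum(kept) / len(kept))
```

In the full sampler this step would alternate with draws of $\pi_2$ from the fiducial density $f(\pi_2 \mid \pi_1, x)$, which is not reproduced here because its construction relies on Bowater (2019a).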
Also, it was assumed that the LPD function for $\pi_2$ was given by:

$$\omega_L(\pi_2) = \begin{cases} b & \text{if } 0 \leq \pi_2 \leq 1 - \pi_1 \\ 0 & \text{otherwise} \end{cases}$$

where $b > 0$, which is in keeping with the choices that were made for functions of this kind in the aforementioned example in Bowater (2019a) of the use of organic fiducial inference in this type of situation. Finally, the specification of the prior density $p(\pi_1 \mid \pi_2)$ was completed by making the assignments $\alpha = 1.5$ and $\beta = 11.5$ in equation (29).

Observe that these choices for the constants $\alpha$ and $\beta$ imply that the prior density $p(\pi_1 \mid \pi_2)$ is equal to the density function of $\pi_1$ that is defined by:

$$p(\pi_1) \propto \pi_1^{0.5} (1 - \pi_1)^{10.5} \text{ if } 0 \leq \pi_1 \leq 1 \text{ and equal to 0 otherwise} \tag{30}$$

conditioned on the inequality $\pi_1 \leq 1 - \pi_2$, which clearly must always hold, but is of course a condition that can only be applied if the proportion $\pi_2$ is known. Furthermore, this latter unconditioned density $p(\pi_1)$ is equivalent to the (unconditional) posterior density of $\pi_1$ that would be formed after observing the counts $x_1 = 1$ and $x_2 + x_3 = 11$ (for which, we can see, membership of categories 2 and 3 is not distinguished) if the prior density of $\pi_1$ was the Jeffreys prior that corresponds to conducting the binomial experiment that produced these counts (see Jeffreys 1961). However, since, as mentioned in Section 2.3, posterior densities formed on the basis of prior densities that are dependent on the sampling model, such as the Jeffreys prior, are controversial, it is arguably of more interest to note that this posterior density of $\pi_1$ is a close approximation to forms of the (unconditional) fiducial density of $\pi_1$ that would be naturally constructed on the basis of the two counts in question, i.e. $x_1 = 1$ and $x_2 + x_3 = 11$, by applying the methodology in Bowater (2019a) if nothing or very little was known about the proportion $\pi_1$ before these counts were observed.
This type of approximation was discussed both in this previous paper and in Bowater (2019b).

In addition to the analysis just described, the Gibbs sampler of present interest was also run various times from different starting points, and there was no suggestion from using appropriate diagnostics that the sampler does not have a limiting distribution. Furthermore, after excluding the burn-in phase of the sampler, no statistically significant difference was found between the samples of parameter values aggregated over the runs of the sampler in using each of the two fixed scanning orders of the parameters $\pi_1$ and $\pi_2$ that are possible, with a single transition of the sampler defined in the same way as in the example outlined in the previous section, even when the runs concerned were long. Therefore, taking into account what was discussed in Section 2.2, the full conditional densities of the limiting distribution of the original random-scan Gibbs sampler should be, at the very least, close approximations to the full conditional densities on which the sampler is based, i.e. the posterior density $p(\pi_1 \mid \pi_2, x)$ and the fiducial density $f(\pi_2 \mid \pi_1, x)$ defined earlier.

The solid curves overlaid on the histograms in Figures 3(a) and 3(b) are plots of the marginal densities of the parameters $\pi_1$ and $\pi_2$, respectively, over the joint posterior density of $\pi_1$ and $\pi_2$ that would be formed after having only observed the main data of interest, i.e. the counts $x_1 = 4$, $x_2 = 2$ and $x_3 = 6$, if the joint prior density of these parameters was the Jeffreys prior for this case. It can be shown that this joint posterior density, which is in fact defined by the expression:

$$p(\pi_1, \pi_2 \mid x) = \begin{cases} C_6\, \pi_1^{x_1 - 0.5}\, \pi_2^{x_2 - 0.5}\, (1 - \pi_1 - \pi_2)^{x_3 - 0.5} & \text{if } \pi_1, \pi_2 \in [0, 1] \text{ and } \pi_1 + \pi_2 \leq 1 \\ 0 & \text{otherwise} \end{cases}$$

where $C_6$ is a normalising constant, is a close approximation to forms of the joint fiducial density of $\pi_1$ and $\pi_2$ that would be naturally constructed on the basis of these observed counts $x_1$, $x_2$ and $x_3$ by applying the methodology in Bowater (2019a) if there was no or very little pre-data knowledge about $\pi_1$ and $\pi_2$. The dashed curve overlaid on the histogram in Figure 3(a) is a plot of the density function of $\pi_1$ given in equation (30), i.e. the unconditioned prior density $p(\pi_1)$.

By comparing the locations and degrees of dispersion of the histograms in Figures 3(a) and 3(b), it can be seen that generally more precise conclusions can be drawn about the proportion $\pi_1$ than about the proportion $\pi_2$ after the counts $x_1$, $x_2$ and $x_3$ in question have been observed. On the basis of comparing these histograms with the curves overlaid on them, this can be attributed to the incorporation, under the Bayesian paradigm, of substantial prior information about $\pi_1$ into the construction of the joint post-data density of $\pi_1$ and $\pi_2$.

3.4. Inference about a linear regression model

Let us now turn our attention to the problem of making inferences about all the parameters $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\sigma^2$ of the normal linear regression model defined by:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \quad \text{with } \varepsilon \sim \text{N}(0, \sigma^2) \tag{31}$$

where $Y$ is the response variable and $x_1$, $x_2$ and $x_3$ are three covariates, on the basis of a data set $y_+ = \{ (y_i, x_{1i}, x_{2i}, x_{3i}) : i = 1, 2, \ldots, n \}$, where $y_i$ is the value of $Y$ generated by this model for the $i$th case in this data set given values $x_{1i}$, $x_{2i}$ and $x_{3i}$ of the covariates $x_1$, $x_2$ and $x_3$ respectively.
Observe that sufficient statistics for each of the parameters $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\sigma^2$ conditional on all parameters except the parameter itself being known are respectively:

$$\sum_{i=1}^{n} y_i, \quad \sum_{i=1}^{n} x_{1i} y_i, \quad \sum_{i=1}^{n} x_{2i} y_i, \quad \sum_{i=1}^{n} x_{3i} y_i \quad \text{and} \quad \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \beta_3 x_{3i})^2 \tag{32}$$

In Bowater (2018a), all except the fourth statistic here were used as fiducial statistics $Q(y_+)$ to derive, under the strong fiducial argument, a complete set of full conditional fiducial densities of the model parameters in the special case where the model in equation (31) is a quadratic regression model, i.e. where $x_2 = (x_1)^2$ and the coefficient $\beta_3$ is set to zero (hence the lack of a need for the fourth statistic). Also, it was shown in this earlier paper that, since these full conditional densities are compatible, they directly define a unique joint density for $\beta_0$, $\beta_1$, $\beta_2$ and $\sigma^2$, which is therefore a joint fiducial density for these parameters. Furthermore, it is fairly clear from this previous analysis how the particular method of inference that was employed can be extended to address the problem of making inferences about the parameters of the more general type of normal linear regression model that is defined by equation (31).

However, this specific type of method is not going to be directly applicable to the case that will be presently considered. This is because, although it will be assumed that nothing or very little was known about the parameters $\beta_0$, $\beta_2$ and $\sigma^2$ before the data were observed, by contrast it is going to be assumed that there was a substantial amount of pre-data knowledge about the parameters $\beta_1$ and $\beta_3$. Let us begin though by clarifying how the full conditional post-data densities of $\beta_0$, $\beta_2$ and $\sigma^2$ will be constructed.
With this aim in mind, notice that if the sufficient statistics for $\beta_0$ and $\beta_2$ presented in equation (32) are treated as the fiducial statistics $Q(y_+)$ in making inferences about these two parameters respectively, then given that the sampling distributions of these statistics are normal, the functions $\varphi(\Gamma, \beta_0)$ and $\varphi(\Gamma, \beta_2)$, as generally defined by equation (3), can be expressed in a similar way to how the function $\varphi(\Gamma, \mu)$ was expressed in equation (13). Also if, under the condition that $\sigma^2$ is the only unknown parameter, the sufficient statistic for $\sigma^2$ presented in equation (32) is treated as the statistic $Q(y_+)$ in making inferences about this parameter, then given that this statistic divided by $\sigma^2$ has a chi-squared sampling distribution with $n$ degrees of freedom, the function $\varphi(\Gamma, \sigma^2)$ can be expressed in a similar way to how this type of function was expressed in equation (15), where it was also denoted as $\varphi(\Gamma, \sigma^2)$ but with of course a different meaning. Furthermore, given what has been assumed, it would be quite natural to specify the GPD function for $\sigma^2$ in the same way as the GPD function for a population variance (also denoted as $\sigma^2$) was defined in equation (16), and to specify the GPD functions for $\beta_0$ and $\beta_2$ as follows: $\omega_G(\beta_i) = a$ for $\beta_i \in (-\infty, \infty)$, where $a > 0$. This leads to the full conditional fiducial densities for $\beta_0$, $\beta_2$ and $\sigma^2$ being defined as follows:

$$\beta_0 \mid \beta_{-0}, \sigma^2, y_+ \sim \text{N}\left( \frac{\sum_{i=1}^{n} y_i}{n} - \beta_1 \frac{\sum_{i=1}^{n} x_{1i}}{n} - \beta_2 \frac{\sum_{i=1}^{n} x_{2i}}{n} - \beta_3 \frac{\sum_{i=1}^{n} x_{3i}}{n},\; \frac{\sigma^2}{n} \right) \tag{33}$$

$$\beta_2 \mid \beta_{-2}, \sigma^2, y_+ \sim \text{N}\left( \frac{\sum_{i=1}^{n} x_{2i} y_i - \beta_0 \sum_{i=1}^{n} x_{2i} - \beta_1 \sum_{i=1}^{n} x_{1i} x_{2i} - \beta_3 \sum_{i=1}^{n} x_{2i} x_{3i}}{\sum_{i=1}^{n} x_{2i}^2},\; \frac{\sigma^2}{\sum_{i=1}^{n} x_{2i}^2} \right) \tag{34}$$

$$\sigma^2 \mid \beta_0, \ldots, \beta_3, y_+ \sim \text{Inv-Gamma}\left( \frac{n}{2},\; \frac{1}{2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \beta_3 x_{3i})^2 \right) \tag{35}$$

where $\beta_{-j}$ denotes the set of all the regression coefficients except $\beta_j$.
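Since the full conditional densities in equations (33), (34) and (35) are standard distributions, each of them can be sampled directly. The following sketch shows one way of doing so with the Python standard library; the function names and the toy data set are hypothetical, and the inverse-gamma draw uses the standard fact that if $G \sim \text{Gamma}(a, 1)$ then $b/G \sim \text{Inv-Gamma}(a, b)$.

```python
import random

def draw_beta0(y, x1, x2, x3, b1, b2, b3, sigma2, rng):
    """Equation (33): normal with mean equal to the average of
    y - b1*x1 - b2*x2 - b3*x3 and variance sigma2 / n."""
    n = len(y)
    mean = sum(yi - b1 * a - b2 * b - b3 * c
               for yi, a, b, c in zip(y, x1, x2, x3)) / n
    return rng.gauss(mean, (sigma2 / n) ** 0.5)

def draw_beta2(y, x1, x2, x3, b0, b1, b3, sigma2, rng):
    """Equation (34): normal centred on the least squares value of beta2
    given the other coefficients, with variance sigma2 / sum(x2^2)."""
    sxx = sum(b * b for b in x2)
    mean = sum(b * (yi - b0 - b1 * a - b3 * c)
               for yi, a, b, c in zip(y, x1, x2, x3)) / sxx
    return rng.gauss(mean, (sigma2 / sxx) ** 0.5)

def draw_sigma2(y, x1, x2, x3, b0, b1, b2, b3, rng):
    """Equation (35): Inv-Gamma(n/2, SS/2), drawn as (SS/2) / Gamma(n/2, 1)."""
    n = len(y)
    ss = sum((yi - b0 - b1 * a - b2 * b - b3 * c) ** 2
             for yi, a, b, c in zip(y, x1, x2, x3))
    return (ss / 2.0) / rng.gammavariate(n / 2.0, 1.0)

# Hypothetical toy data, for illustration only (not the data set of the paper).
x1 = [-1, 0, 1, -1, 0, 1]
x2 = [1, 1, 0, 0, -1, -1]
x3 = [0, 1, -1, 1, 0, -1]
y = [-6.5, 3.2, 4.1, -4.8, 2.9, 6.0]

rng = random.Random(1)
b0, b1, b2, b3, sigma2 = 0.0, 5.0, -2.0, 1.0, 2.25   # arbitrary starting values
b0 = draw_beta0(y, x1, x2, x3, b1, b2, b3, sigma2, rng)
b2 = draw_beta2(y, x1, x2, x3, b0, b1, b3, sigma2, rng)
sigma2 = draw_sigma2(y, x1, x2, x3, b0, b1, b2, b3, rng)
print(b0, b2, sigma2)
```

Each function conditions on the current values of all the other parameters, which is the form in which these draws would be used inside a Gibbs sampler.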
Now let us provide more details with regard to what was known about the coefficient $\beta_3$ before the data were observed. In particular, let us assume that, conditional on all other parameters in the model being known, the scenario of interest of Section 2.5 would apply if the general parameter $\theta_j$ was taken as being $\beta_3$, with the interval $[\theta_{j0}, \theta_{j1}]$ in this scenario now being specified as simply the interval $[-\delta, \delta]$, where $\delta \geq 0$. We will therefore construct the full conditional post-data density of $\beta_3$ using the type of bispatial inference outlined in Section 2.5, which implies that, from now on, this density will be denoted as $b(\beta_3 \mid \beta_{-3}, \sigma^2, y_+)$. In particular, to do this, the test statistic $T(x)$ as defined in Section 2.5, which now needs to be denoted as $T(y_+)$, will be assumed to be the least squares estimator of $\beta_3$ under the condition that all other parameters are known, i.e. the estimator:

$$\hat{\beta}_3 = \frac{\sum_{i=1}^{n} x_{3i} y_i - \beta_0 \sum_{i=1}^{n} x_{3i} - \beta_1 \sum_{i=1}^{n} x_{1i} x_{3i} - \beta_2 \sum_{i=1}^{n} x_{2i} x_{3i}}{\sum_{i=1}^{n} x_{3i}^2} \tag{36}$$

which is a reasonable assumption to make since, under this condition, it is a sufficient statistic for $\beta_3$ that satisfies the second criterion given in Section 2.5 for being the statistic $T(y_+)$. Observe that this estimator has a sampling distribution that is defined by:

$$\hat{\beta}_3 \sim \text{N}\left( \beta_3,\; \sigma^2 \Big/ \sum_{i=1}^{n} x_{3i}^2 \right)$$

Therefore, the hypotheses $H_P$ and $H_S$ defined in Section 2.5 that are applicable in the case where $\hat{\beta}_3 \leq 0$, i.e. the hypotheses in equations (6) and (7), can now be expressed as:

$$H_P : \beta_3 \geq -\delta$$
$$H_S : \rho(\widehat{B}_3^{*} < \hat{\beta}_3) \leq \Phi\left( (\hat{\beta}_3 + \delta)(1/\sigma) \sqrt{\textstyle\sum_{i=1}^{n} x_{3i}^2} \right) \;\; (= J) \tag{37}$$

where $\Phi(\cdot)$ again denotes the standard normal distribution function, while $\widehat{B}_3^{*}$ is the estimator $\hat{\beta}_3$ calculated exclusively on the basis of an as-yet-unobserved sample of $n$ additional data points $Y_+^{*} = \{ (Y_i^{*}, x_{1i}, x_{2i}, x_{3i}) : i = 1, 2, \ldots, n \}$ generated according to the regression model in equation (31), where the values of the covariates $x_1$, $x_2$ and $x_3$ are assumed to be the same as in the original sample. On the other hand, the hypotheses $H_P$ and $H_S$ that apply if $\hat{\beta}_3 > 0$, i.e. the hypotheses in equations (8) and (9), can now be expressed as:

$$H_P : \beta_3 \leq \delta$$
$$H_S : \rho(\widehat{B}_3^{*} > \hat{\beta}_3) \leq 1 - \Phi\left( (\hat{\beta}_3 - \delta)(1/\sigma) \sqrt{\textstyle\sum_{i=1}^{n} x_{3i}^2} \right) \;\; (= J) \tag{38}$$

Also, let us assume, quite reasonably, that the fiducial density $f_S(\theta_j \mid x)$ that is required by equations (10) and (12), i.e. the density $f_S(\beta_3 \mid \beta_{-3}, \sigma^2, y_+)$ in the present case, is derived on the basis of the strong fiducial argument with the fiducial statistic $Q(y_+)$ specified as being a sufficient statistic for $\beta_3$, e.g. one of the sufficient statistics for $\beta_3$ given in equations (32) and (36). Under these assumptions, the fiducial density in question is determined in a similar way to how the fiducial densities in equations (33), (34) and (35) were determined, and in particular is given by the expression:

$$\beta_3 \mid \beta_{-3}, \sigma^2, y_+ \sim \text{N}\left( \hat{\beta}_3,\; \sigma^2 \Big/ \sum_{i=1}^{n} x_{3i}^2 \right) \tag{39}$$

On the other hand, it will be assumed that we knew enough about the coefficient $\beta_1$ before the data were observed such that it is possible to adequately represent our pre-data knowledge about this coefficient by placing a probability density function over this coefficient conditional on all other parameters being known, i.e. the density $p(\beta_1 \mid \beta_{-1}, \sigma^2)$. To give an example, let this density function be defined by:

$$\beta_1 \mid \beta_{-1}, \sigma^2 \sim \text{N}(\mu_0, \sigma_0^2) \tag{40}$$

where $\mu_0$ and $\sigma_0 > 0$ are given constants. Treating this choice of the density $p(\beta_1 \mid \beta_{-1}, \sigma^2)$ as a prior density and combining it with the likelihood function in this case, under the Bayesian paradigm, leads to a full conditional posterior density of $\beta_1$, i.e.
the density $p(\beta_1 \mid \beta_{-1}, \sigma^2, y_+)$, that can be expressed as:

$$\beta_1 \mid \beta_{-1}, \sigma^2, y_+ \sim \text{N}\left( \sigma_1^2 \left[ \frac{\hat{\beta}_1 \sum_{i=1}^{n} x_{1i}^2}{\sigma^2} + \frac{\mu_0}{\sigma_0^2} \right],\; \sigma_1^2 \right)$$

where

$$\sigma_1^2 = \left( \frac{\sum_{i=1}^{n} x_{1i}^2}{\sigma^2} + \frac{1}{\sigma_0^2} \right)^{-1} \quad \text{and} \quad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_{1i} y_i - \beta_0 \sum_{i=1}^{n} x_{1i} - \beta_2 \sum_{i=1}^{n} x_{1i} x_{2i} - \beta_3 \sum_{i=1}^{n} x_{1i} x_{3i}}{\sum_{i=1}^{n} x_{1i}^2}$$

To illustrate this example, Figure 4 shows some results from running a Gibbs sampler with a uniform random scanning order of the parameters $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\sigma^2$ on the basis of the full conditional post-data densities of these parameters that have just been detailed, i.e. the fiducial densities $f(\beta_0 \mid \beta_{-0}, \sigma^2, y_+)$, $f(\beta_2 \mid \beta_{-2}, \sigma^2, y_+)$ and $f(\sigma^2 \mid \beta_0, \ldots, \beta_3, y_+)$ defined by equations (33), (34) and (35), the post-data density (derived using bispatial inference) $b(\beta_3 \mid \beta_{-3}, \sigma^2, y_+)$ and the posterior density $p(\beta_1 \mid \beta_{-1}, \sigma^2, y_+)$ defined by the equation just given. In particular, the histograms in Figures 4(a) to 4(d) represent the distributions of the values of the coefficients $\beta_1$, $\beta_2$, $\beta_3$ and the standard deviation $\sigma$, respectively, over a single run of ten million samples of all five model parameters generated by the Gibbs sampler after allowing for its burn-in phase by discarding a preceding run of five thousand samples. (For reasons of space, a histogram of the generated values of the intercept coefficient $\beta_0$ is not given.) The sampling of the density $b(\beta_3 \mid \beta_{-3}, \sigma^2, y_+)$ was based on the Metropolis algorithm, while the sampling of each of the other four full conditional post-data densities was independent from the preceding iterations.

Moreover, the values for the response variable $Y$ in the observed data set $y_+$ were a typical sample of $n = 18$ such values generated according to the regression model in equation (31) with $\beta_0 = 0$, $\beta_1 = 5$, $\beta_2 = -2$, $\beta_3 = 1$ and $\sigma = 1.5$, and with the values of the covariates $x_1$, $x_2$ and $x_3$ in this data set chosen without replacement from the 27 combinations of values for these covariates that are possible if each covariate can only take the value $-1$, 0 or 1. In particular, the way these covariate values were selected resulted in: $\sum x_{1i} = -1$, $\sum x_{2i} = 2$, $\sum x_{3i} = 1$, $\sum x_{1i} x_{2i} = 3$, $\sum x_{1i} x_{3i} = 4$ and $\sum x_{2i} x_{3i} = -3$. In addition, the specification of the posterior density $p(\beta_1 \mid \beta_{-1}, \sigma^2, y_+)$ was completed by setting the constants $\mu_0$ and $\sigma_0$, i.e. the constants that control the choice of the prior density of $\beta_1$ in equation (40), to be 4.4 and 0.6 respectively.

On the other hand, with regard to how the post-data density $b(\beta_3 \mid \beta_{-3}, \sigma^2, y_+)$ was fully determined, the constant $\delta$ was assumed to be equal to 0.1, and the probabilities $\kappa$ that would be assigned to the hypothesis $H_S$ as defined by either equation (37) or equation (38) for different values of all the model parameters except $\beta_3$ were assumed to be given by the PDO curve with the formula $\kappa = J^{0.6}$, where, as indicated in equations (37) and (38), $J$ is the one-sided P value in whichever definition of the hypothesis $H_S$ is applicable. Also, in determining the post-data density of $\beta_3$ in question, the density function $h(\theta_j)$ that appears in equation (11), i.e. the density $h(\beta_3)$ in the present case, was defined similarly to how a density function of this type was specified in Section 3.2, that is, by the expression $\beta_3 \sim \text{Beta}(4, 4, -0.1, 0.1)$, where the notation here is the same as used in equation (27).

[Figure 4: Conditional prior density of one parameter, namely $\beta_1$, and marginal post-data densities of four parameters $\beta_1$, $\beta_2$, $\beta_3$ and $\sigma$ of a normal linear regression model. Panels (a) to (d): density against $\beta_1$, $\beta_2$, $\beta_3$ and $\sigma$ respectively.]

Supplementary to this analysis, there was no suggestion from applying appropriate diagnostics to multiple runs of the Gibbs sampler from different starting points that it did not have a limiting distribution. Furthermore, the Gibbs sampling algorithm was run separately with various very distinct fixed scanning orders of the five model parameters $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\sigma^2$ in accordance with how a single transition of such an algorithm with a fixed scanning order was defined in Section 2.2. In doing this, no statistically significant difference was found between the samples of parameter values aggregated over the runs of the sampler, after excluding the burn-in phase of the sampler, in using each of the scanning orders concerned, e.g. between the various correlation matrices of the parameters and between the various distributions of each individual parameter, even when the runs in question were long. Therefore, on the grounds of what was discussed in Section 2.2, it would be reasonable to conclude that the full conditional densities of the limiting distribution of the original random-scan Gibbs sampler should be, at the very least, close approximations to the full conditional densities on which the sampler is based, i.e. the fiducial densities $f(\beta_0 \mid \beta_{-0}, \sigma^2, y_+)$, $f(\beta_2 \mid \beta_{-2}, \sigma^2, y_+)$ and $f(\sigma^2 \mid \beta_0, \ldots, \beta_3, y_+)$, the post-data density $b(\beta_3 \mid \beta_{-3}, \sigma^2, y_+)$ and the posterior density $p(\beta_1 \mid \beta_{-1}, \sigma^2, y_+)$.
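The full conditional posterior density of $\beta_1$ stated earlier in this section is a precision-weighted combination of the conditional least squares value $\hat{\beta}_1$ and the prior mean $\mu_0$. The sketch below computes its mean and variance; the defaults $\mu_0 = 4.4$ and $\sigma_0 = 0.6$ are the settings given in the text, whereas the function name and the example inputs (a value of $\hat{\beta}_1$ and a value of $\sum x_{1i}^2$, which the text does not report) are hypothetical.

```python
def beta1_posterior(beta1_hat, sum_x1_sq, sigma2, mu0=4.4, sigma0=0.6):
    """Mean and variance of the normal full conditional posterior of beta1.

    sigma_1^2 = (sum_x1_sq / sigma2 + 1 / sigma0^2)^(-1)
    mean      = sigma_1^2 * (beta1_hat * sum_x1_sq / sigma2 + mu0 / sigma0^2)
    """
    data_precision = sum_x1_sq / sigma2
    prior_precision = 1.0 / sigma0**2
    post_var = 1.0 / (data_precision + prior_precision)
    post_mean = post_var * (beta1_hat * data_precision + mu0 * prior_precision)
    return post_mean, post_var

# Hypothetical inputs: beta1_hat = 5.0, sum of x1^2 = 12, sigma = 1.5.
mean, var = beta1_posterior(beta1_hat=5.0, sum_x1_sq=12.0, sigma2=1.5**2)
print(mean, var)
```

With these inputs the posterior mean lies between the prior mean 4.4 and the value 5.0 of $\hat{\beta}_1$, and the posterior variance is smaller than both the prior variance $\sigma_0^2$ and the sampling variance $\sigma^2 / \sum x_{1i}^2$.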
The solid curves overlaid on the histograms in Figures 4(a) to 4(d) are plots of the marginal densities of the coefficients $\beta_1$, $\beta_2$, $\beta_3$ and the standard deviation $\sigma$, respectively, over the joint fiducial density of all the parameters in the model that is defined directly and uniquely by the set of compatible full conditional densities that consists of the fiducial densities $f(\beta_0 \mid \beta_{-0}, \sigma^2, y_+)$, $f(\beta_2 \mid \beta_{-2}, \sigma^2, y_+)$ and $f(\sigma^2 \mid \beta_0, \ldots, \beta_3, y_+)$ just referred to, which of course are given by equations (33), (34) and (35), the fiducial density $f_S(\beta_3 \mid \beta_{-3}, \sigma^2, y_+)$ given by equation (39), and the fiducial density for $\beta_1$ conditional on $\beta_0$, $\beta_2$, $\beta_3$ and $\sigma^2$ that results from making assumptions that are analogous to those on which the aforementioned full conditional fiducial densities for $\beta_0$, $\beta_2$ and $\beta_3$ are based. On the other hand, the dashed curve overlaid on the histogram in Figure 4(a) is a plot of the conditional prior density of $\beta_1$ given in equation (40).

By comparing the histograms in Figures 4(a) to 4(d) with the curves overlaid on them, it can be seen that the forms of the marginal post-data densities of $\beta_1$, $\beta_2$, $\beta_3$ and $\sigma$ that are represented by these histograms are consistent with what could have been intuitively expected given the pre-data beliefs about all of the model parameters that were taken into account as part of the method of inference that has been described in the present section.

3.5. Inference about a bivariate normal distribution

To give a final detailed example of the application of integrated organic inference, let us consider the problem of making inferences about all five parameters of a bivariate normal density function, i.e. the means $\mu_x$ and $\mu_y$ and the variances $\sigma_x^2$ and $\sigma_y^2$, respectively, of the two random variables concerned $X$ and $Y$, and the correlation $\tau$ of $X$ and $Y$, on the basis of a sample from this type of density function, i.e.
the sample $z = \{(x_i, y_i) : i = 1, 2, \ldots, n\}$, where $x_i$ and $y_i$ are the $i$th realisations of $X$ and $Y$ respectively.

In Bowater (2018a), as a way of addressing this problem, full conditional fiducial densities were derived either exactly or approximately for each of the parameters $\mu_x$, $\mu_y$, $\sigma_x^2$, $\sigma_y^2$ and $\tau$ by using appropriately chosen fiducial statistics under the strong fiducial argument, and then it was illustrated how, on the basis of these conditional densities, what can be regarded as a suitable joint fiducial density of these parameters can be obtained by using the Gibbs sampler within the type of framework outlined in Section 2.2 of the current paper. However, for the same kind of reason that was given in relation to the use of a similar method of inference in the previous section, this particular method is not going to be directly applicable to the case that will be presently considered. This is more specifically due to the fact that, although we will assume that nothing or very little was known about the means $\mu_x$ and $\mu_y$ before the data were observed, by contrast we are going to assume that there was a substantial amount of pre-data knowledge about the variances $\sigma_x^2$ and $\sigma_y^2$ and the correlation coefficient $\tau$. To begin with though, let us clarify how the full conditional post-data densities of $\mu_x$ and $\mu_y$ will be constructed.

In this regard, observe that sufficient statistics for the parameters $\mu_x$ and $\mu_y$, conditional on all parameters except the parameter itself being known, are:
\[
q_x = \bar{x} - \tau \left( \frac{\sigma_x}{\sigma_y} \right) \bar{y}
\qquad \text{and} \qquad
q_y = \bar{y} - \tau \left( \frac{\sigma_y}{\sigma_x} \right) \bar{x},
\]
respectively, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ are the sample means. Therefore, these two statistics $q_x$ and $q_y$ will be assumed to be the fiducial statistics $Q(z)$ that will be used in making inferences about $\mu_x$ and $\mu_y$ respectively.
Under this assumption, if $\mu_x$ is the only unknown parameter in the model, then equation (3) will now have the form $q_x = \varphi(\Gamma, \mu_x)$, and more specifically can be expressed as:
\[
\bar{x} - \tau \left( \frac{\sigma_x}{\sigma_y} \right) \bar{y}
= \mu_x - \tau \left( \frac{\sigma_x}{\sigma_y} \right) \mu_y
+ \Gamma \sqrt{\frac{\sigma_x^2 (1 - \tau^2)}{n}}
\]
where the primary r.v. $\Gamma \sim \mathrm{N}(0, 1)$. Also, given what has been assumed in relation to our pre-data knowledge about $\mu_x$, it would be quite natural to specify the GPD function for $\mu_x$ as follows: $\omega_G(\mu_x) = a$ for $\mu_x \in (-\infty, \infty)$, where $a > 0$. This implies that the full conditional fiducial density of $\mu_x$ is defined by:
\[
\mu_x \mid \mu_y, \sigma_x^2, \sigma_y^2, \tau, z \;\sim\;
\mathrm{N}\!\left( \bar{x} + \tau \left( \frac{\sigma_x}{\sigma_y} \right) (\mu_y - \bar{y}),\;
\frac{\sigma_x^2 (1 - \tau^2)}{n} \right)
\tag{41}
\]
Furthermore, due to the symmetrical nature of the bivariate normal distribution, it should be clear that, using a GPD function for $\mu_y$ of the same type as just used for $\mu_x$, the full conditional fiducial density of $\mu_y$ would be defined by:
\[
\mu_y \mid \mu_x, \sigma_x^2, \sigma_y^2, \tau, z \;\sim\;
\mathrm{N}\!\left( \bar{y} + \tau \left( \frac{\sigma_y}{\sigma_x} \right) (\mu_x - \bar{x}),\;
\frac{\sigma_y^2 (1 - \tau^2)}{n} \right)
\tag{42}
\]

With regard to what was known about the variances $\sigma_x^2$ and $\sigma_y^2$ before the data were observed, we will assume that it is possible to adequately represent such knowledge by placing a probability density function over each of these parameters conditional on all parameters except the parameter itself being known, i.e. the densities $p(\sigma_x^2 \mid \mu_x, \mu_y, \sigma_y^2, \tau)$ and $p(\sigma_y^2 \mid \mu_x, \mu_y, \sigma_x^2, \tau)$ respectively. To give an example, let these density functions for $\sigma_x^2$ and $\sigma_y^2$ be defined respectively by:
\[
\sigma_x^2 \sim \text{Inv-Gamma}(\alpha_x, \beta_x)
\qquad \text{and} \qquad
\sigma_y^2 \sim \text{Inv-Gamma}(\alpha_y, \beta_y)
\tag{43}
\]
where $\alpha_x$, $\beta_x$, $\alpha_y$ and $\beta_y$ are given positive constants.
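As a minimal sketch of how equations (41) and (42) could be sampled directly, the following Python code draws $\mu_x$ and $\mu_y$ from their normal full conditional fiducial densities. The function names and the numerical values in the usage lines are purely illustrative assumptions and do not come from the paper; `xbar` and `ybar` are taken to be the sample means.

```python
import math
import random


def sample_mu_x(xbar, ybar, mu_y, sigma_x, sigma_y, tau, n, rng):
    """Draw mu_x from the normal full conditional density in equation (41)."""
    mean = xbar + tau * (sigma_x / sigma_y) * (mu_y - ybar)
    sd = math.sqrt(sigma_x ** 2 * (1.0 - tau ** 2) / n)
    return rng.gauss(mean, sd)


def sample_mu_y(xbar, ybar, mu_x, sigma_x, sigma_y, tau, n, rng):
    """Draw mu_y from equation (42), i.e. equation (41) with x and y swapped."""
    return sample_mu_x(ybar, xbar, mu_x, sigma_y, sigma_x, tau, n, rng)


# Illustrative values only (hypothetical summary statistics and parameters).
rng = random.Random(0)
draws = [sample_mu_x(0.1, -0.2, 0.0, 1.0, 1.0, 0.3, 100, rng)
         for _ in range(20000)]
draw_mean = sum(draws) / len(draws)
```

With these illustrative inputs the conditional mean in (41) is $0.1 + 0.3 \times 0.2 = 0.16$, so the average of the draws should lie close to that value.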
Notice that, for the case being considered, the likelihood functions that would be placed over each of the parameters $\sigma_x^2$ and $\sigma_y^2$, assuming that all parameters except the parameter itself are known, are given by the expressions:
\[
L(\sigma_x^2 \mid \mu_x, \mu_y, \sigma_y^2, \tau, z)
= \left( \frac{1}{\sigma_x} \right)^{n}
\exp\!\left( -\frac{1}{2(1 - \tau^2)} \left( \frac{\sum (x_i')^2}{\sigma_x^2} \right)
+ \frac{\tau}{1 - \tau^2} \left( \frac{\sum x_i' y_i'}{\sigma_x \sigma_y} \right) \right)
\tag{44}
\]
and
\[
L(\sigma_y^2 \mid \mu_x, \mu_y, \sigma_x^2, \tau, z)
= \left( \frac{1}{\sigma_y} \right)^{n}
\exp\!\left( -\frac{1}{2(1 - \tau^2)} \left( \frac{\sum (y_i')^2}{\sigma_y^2} \right)
+ \frac{\tau}{1 - \tau^2} \left( \frac{\sum x_i' y_i'}{\sigma_x \sigma_y} \right) \right)
\tag{45}
\]
respectively, where $x_i' = x_i - \mu_x$ and $y_i' = y_i - \mu_y$. Therefore, if the choices of the densities $p(\sigma_x^2 \mid \mu_x, \mu_y, \sigma_y^2, \tau)$ and $p(\sigma_y^2 \mid \mu_x, \mu_y, \sigma_x^2, \tau)$ in equation (43) are treated as prior densities, it can easily be seen how, by combining these prior densities with the likelihood functions in equations (44) and (45) under the Bayesian paradigm, the full conditional posterior densities of $\sigma_x^2$ and $\sigma_y^2$ can be numerically computed, i.e. the posterior densities $p(\sigma_x^2 \mid \mu_x, \mu_y, \sigma_y^2, \tau, z)$ and $p(\sigma_y^2 \mid \mu_x, \mu_y, \sigma_x^2, \tau, z)$.

On the other hand, with regard to the beliefs that were held about the correlation coefficient $\tau$ before the data were observed, let us assume that, conditional on all other parameters being known, the scenario of interest of Section 2.5 would apply if the general parameter $\theta_j$ was taken as being $\tau$, with the interval $[\theta_{j0}, \theta_{j1}]$ in this scenario now being specified as the interval $[-\varepsilon, \varepsilon]$, where $\varepsilon \geq 0$. As a result, we will now discuss how the full conditional post-data density of $\tau$ will be constructed by using the type of bispatial inference outlined in Section 2.5, which implies that it will be denoted as the density $b(\tau \mid \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z)$.
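Returning to the variances, the following Python sketch shows one way the unnormalised full conditional posterior of $\sigma_x^2$ could be evaluated from equations (43) and (44), together with a single random-walk Metropolis update. It assumes the shape-scale parameterisation of the inverse-gamma density, i.e. a density proportional to $(\sigma_x^2)^{-\alpha_x - 1} \exp(-\beta_x / \sigma_x^2)$, and the function names, proposal step size and toy data are hypothetical choices made only for illustration.

```python
import math
import random


def log_lik_sigma2_x(s2x, x, y, mu_x, mu_y, s2y, tau):
    """Log of the likelihood in equation (44), as a function of sigma_x^2."""
    n = len(x)
    sx, sy = math.sqrt(s2x), math.sqrt(s2y)
    sxx = sum((xi - mu_x) ** 2 for xi in x)
    sxy = sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y))
    return (-n * math.log(sx)
            - sxx / (2.0 * (1.0 - tau ** 2) * s2x)
            + tau * sxy / ((1.0 - tau ** 2) * sx * sy))


def log_post_sigma2_x(s2x, x, y, mu_x, mu_y, s2y, tau, alpha, beta):
    """Unnormalised log posterior: inverse-gamma prior (43) plus log of (44)."""
    if s2x <= 0.0:
        return -math.inf  # variance must be positive
    log_prior = -(alpha + 1.0) * math.log(s2x) - beta / s2x
    return log_prior + log_lik_sigma2_x(s2x, x, y, mu_x, mu_y, s2y, tau)


def metropolis_step(current, log_target, step, rng):
    """One random-walk Metropolis update with a symmetric normal proposal."""
    proposal = current + rng.gauss(0.0, step)
    log_ratio = log_target(proposal) - log_target(current)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return current


# Toy data and parameter values, chosen only to exercise the functions.
x = [1.0, -1.0, 2.0, -2.0]
y = [0.5, -0.5, 1.0, -1.5]


def target(s2):
    return log_post_sigma2_x(s2, x, y, 0.0, 0.0, 1.0, 0.3, 49.5, 48.0)


rng = random.Random(1)
chain = [1.0]
for _ in range(2000):
    chain.append(metropolis_step(chain[-1], target, 0.2, rng))
```

The same functions apply to $\sigma_y^2$ after swapping the roles of the two variables, in line with the symmetry between equations (44) and (45).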
In this respect, let us begin by pointing out that since, if all parameters except $\tau$ are known, there exists no sufficient set of univariate statistics for $\tau$ that contains only one statistic that is not an ancillary statistic, it would seem reasonable to assume that the test statistic $T(z)$, as generally defined in Section 2.5, is the maximum likelihood estimator of $\tau$ given that all other parameters are known. It can be shown that this maximum likelihood estimator is the value $\hat{\tau}$ that solves the following cubic equation:
\[
-n \hat{\tau}^3
+ \left( \frac{\sum_{i=1}^{n} x_i' y_i'}{\sigma_x \sigma_y} \right) \hat{\tau}^2
+ \left( n - \frac{\sum_{i=1}^{n} (x_i')^2}{\sigma_x^2} - \frac{\sum_{i=1}^{n} (y_i')^2}{\sigma_y^2} \right) \hat{\tau}
+ \frac{\sum_{i=1}^{n} x_i' y_i'}{\sigma_x \sigma_y} = 0
\]

Now, it is well known that a maximum likelihood estimator of a parameter is usually asymptotically normally distributed with mean equal to the true value of the parameter, and variance equal to the inverse of the Fisher information with respect to that parameter. (To clarify, this is the Fisher information obtained via differentiating the logarithm of the likelihood function with respect to the parameter concerned.) For this reason, if $n$ is large, the sampling density function of the maximum likelihood estimator $\hat{\tau}$ just defined can be approximately expressed as follows:
\[
\hat{\tau} \sim \mathrm{N}(\tau, 1 / I(\tau))
\tag{46}
\]
where $I(\tau)$ is the Fisher information of the likelihood function in this example with respect to $\tau$ assuming all other parameters are known, which is in fact given by:
\[
I(\tau) = \frac{n (1 + \tau^2)}{(1 - \tau^2)^2}
\]

Using this approximation, the hypotheses $H_P$ and $H_S$ defined in Section 2.5 that are applicable in the case where $\hat{\tau} \leq 0$, i.e. the hypotheses in equations (6) and (7), can now be expressed as:
\[
H_P : \tau \geq -\varepsilon
\tag{47}
\]
\[
H_S : \rho(\widehat{T}^{*} < \hat{\tau}) \leq \Phi\!\left( (\hat{\tau} + \varepsilon) \sqrt{I(\varepsilon)} \right) \; (= J)
\tag{48}
\]
where $\widehat{T}^{*}$ is the estimator $\hat{\tau}$ calculated exclusively on the basis of an as-yet-unobserved sample of $n$ additional data points $\{(X_i^{*}, Y_i^{*}) : i = 1, 2, \ldots$
$, n\}$ drawn from the bivariate normal density function being studied, and $\Phi(\,)$ is again the standard normal distribution function. On the other hand, the hypotheses $H_P$ and $H_S$ that apply if $\hat{\tau} > 0$, i.e. the hypotheses in equations (8) and (9), can now be expressed as:
\[
H_P : \tau \leq \varepsilon
\tag{49}
\]
\[
H_S : \rho(\widehat{T}^{*} > \hat{\tau}) \leq 1 - \Phi\!\left( (\hat{\tau} - \varepsilon) \sqrt{I(\varepsilon)} \right) \; (= J)
\tag{50}
\]
We should point out that if the estimator $\hat{\tau}$ did indeed have the normal distribution given in equation (46), then it can be easily shown that this estimator would satisfy the second criterion given in Section 2.5 for being a valid test statistic $T(z)$, which would in turn imply that the hypotheses $H_P$ and $H_S$ as defined in equations (47) and (48) would be equivalent, and also that these hypotheses as defined in equations (49) and (50) would be equivalent.

To determine the fiducial density $f_S(\theta_j \mid x)$ that is required by equations (10) and (12), i.e. the density $f_S(\tau \mid \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z)$ in the present case, let us begin by assuming that the maximum likelihood estimator $\hat{\tau}$ is the fiducial statistic $Q(z)$, which is actually the choice that was made for this statistic $Q(z)$ in the aforementioned example in Bowater (2018a) when fiducial inference was used in this type of situation, i.e. in the situation where $\tau$ is the only unknown parameter. However, instead of assuming that the sampling density function of $\hat{\tau}$ is a normal density as has just been done, and as was done in the context of current interest in Bowater (2018a), let us assume that it is a transformation of $\hat{\tau}$ that is normally distributed, namely the function $\tanh^{-1}(\hat{\tau})$. The reason for doing this is that it can be shown that, under this latter assumption, a generally better approximation to the sampling density of $\hat{\tau}$ can be obtained than under the former assumption, except, that is, when $\tau$ is close to zero.
Notice that this exception is the reason why this alternative assumption was not the preferred assumption in the preceding discussion in order to derive approximate forms of the hypothesis $H_S$. More specifically, it will be assumed that the density function of $\tanh^{-1}(\hat{\tau})$ is directly specified (and the density function of $\hat{\tau}$ is therefore indirectly specified) by the expression:
\[
\tanh^{-1}(\hat{\tau}) \sim \mathrm{N}\!\left( \tanh^{-1}(\tau),\; 1 / I(\tanh^{-1} \tau) \right)
\]
where $I(\tanh^{-1} \tau)$ is the Fisher information with respect to the quantity $\tanh^{-1}(\tau)$ assuming all parameters except $\tau$ are known, which is in fact given by:
\[
I(\tanh^{-1} \tau) = n (1 + \tau^2)
\]

Allowing $\tanh^{-1}(\hat{\tau})$ to take the role of the statistic $Q(z)$, and using the approximation to the density function of this statistic $\tanh^{-1}(\hat{\tau})$ just given, we can therefore approximate equation (3) in the case where $\tau$ is the only unknown parameter as follows:
\[
\tanh^{-1}(\hat{\tau}) = \varphi(\Gamma, \tau) = \tanh^{-1}(\tau) + \frac{\Gamma}{\sqrt{n (1 + \tau^2)}}
\tag{51}
\]
where the primary r.v. $\Gamma \sim \mathrm{N}(0, 1)$. Although it can be shown that this equation does not generally satisfy Condition 1 of Section 2.4, it is the case, on the other hand, that if $\Gamma$ is generated from a standard normal density function truncated to lie in a given interval $(-v, v)$ where $v > 0$, then this condition will be satisfied for very large values of $v$ under the restriction that $n$ is not too small and $\hat{\tau}$ is not very close to $-1$ or $1$. For example, if $n = 100$ and $|\hat{\tau}| < 0.999$, then Condition 1 will be satisfied not only for small values of $v$, but even if $v$ is chosen to be as high as 36, and will be satisfied for substantially larger values of $v$ as $|\hat{\tau}|$ becomes smaller. We will therefore make use of equation (51) under the assumption that the primary r.v.
$\Gamma$ follows the truncated normal density function just mentioned with $v$ chosen to be equal to or not far below the largest possible value of $v$ that is consistent with equation (51) satisfying Condition 1. Also, since the fiducial density $f_S(\tau \mid \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z)$ needs to be derived under the assumption that, given the values of the conditioning parameters $\mu_x$, $\mu_y$, $\sigma_x^2$ and $\sigma_y^2$, there would have been no or very little pre-data knowledge about $\tau$, it will be quite naturally assumed that the GPD function of $\tau$ is specified as follows: $\omega_G(\tau) = b$ if $-1 \leq \tau \leq 1$ and $0$ otherwise, where $b > 0$.

Under the assumptions that have just been made, applying the principle outlined in Section 2.4 for deriving a fiducial density of the general type $f(\theta_j \mid \theta_{-j}, x)$, i.e. Principle 1 of Bowater (2019a), leads to an approximation to the full conditional fiducial density of $\tau$ that is given by:
\[
f_S(\tau \mid \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z) = \psi_t(\gamma) \left| \frac{d\gamma}{d\tau} \right|
\quad \text{if } \tau \in (\tau_0, \tau_1) \text{ and is zero otherwise}
\]
where $\gamma$ is the value of $\Gamma$ that solves equation (51) for the given value of $\tau$, i.e.
\[
\gamma = \left( \tanh^{-1}(\hat{\tau}) - \tanh^{-1}(\tau) \right) n^{0.5} (1 + \tau^2)^{0.5}
\]
while $\psi_t(\gamma)$ is the standard normal density function truncated to lie in the interval $(-v, v)$ evaluated at $\gamma$, and finally $(\tau_0, \tau_1)$ is the interval of values of $\tau$ that, according to equation (51), correspond to $\gamma$ lying in the interval $(-v, v)$. With the assumption having been made that the fiducial density $f_S(\tau \mid \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z)$ is approximately determined in this manner, it can be easily seen how the specification of the post-data density $b(\tau \mid \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z)$ can be completed by using the criteria of Section 2.5.
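The ingredients used for $\tau$ above can be sketched in Python as follows: the maximum likelihood estimator $\hat{\tau}$ obtained from the cubic likelihood equation, the one-sided P value $J$ of equations (48) and (50), and the approximate fiducial density $\psi_t(\gamma)\,|d\gamma/d\tau|$. This is only an illustrative sketch: the grid-plus-bisection root search, the choice of the root with the largest likelihood, and all function names are assumptions rather than the procedure used in the paper, and the construction of $b(\tau \mid \cdot)$ from the criteria of Section 2.5 is not attempted.

```python
import math


def std_normal_cdf(u):
    """Standard normal distribution function Phi(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def tau_mle(x, y, mu_x, mu_y, sigma_x, sigma_y, grid=2000):
    """Root in (-1, 1) of the cubic likelihood equation with largest likelihood."""
    n = len(x)
    sxx = sum(((xi - mu_x) / sigma_x) ** 2 for xi in x)
    syy = sum(((yi - mu_y) / sigma_y) ** 2 for yi in y)
    sxy = sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / (sigma_x * sigma_y)

    def cubic(t):
        return -n * t ** 3 + sxy * t ** 2 + (n - sxx - syy) * t + sxy

    def log_lik(t):
        return (-0.5 * n * math.log(1.0 - t * t)
                - (sxx + syy - 2.0 * t * sxy) / (2.0 * (1.0 - t * t)))

    pts = [-1.0 + 2.0 * k / grid for k in range(1, grid)]
    roots = []
    for a, b in zip(pts[:-1], pts[1:]):
        fa, fb = cubic(a), cubic(b)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            for _ in range(60):  # bisection on the bracketing interval
                m = 0.5 * (a + b)
                if cubic(a) * cubic(m) <= 0.0:
                    b = m
                else:
                    a = m
            roots.append(0.5 * (a + b))
    if not roots:
        raise ValueError("no root found on the grid inside (-1, 1)")
    return max(roots, key=log_lik)


def one_sided_p_value(tau_hat, eps, n):
    """The value J in equation (48) if tau_hat <= 0, or in (50) otherwise."""
    root_info = math.sqrt(n * (1.0 + eps ** 2)) / (1.0 - eps ** 2)
    if tau_hat <= 0.0:
        return std_normal_cdf((tau_hat + eps) * root_info)
    return 1.0 - std_normal_cdf((tau_hat - eps) * root_info)


def fiducial_density_tau(tau, tau_hat, n, v):
    """Approximate density psi_t(gamma) * |d gamma / d tau| based on (51)."""
    if not -1.0 < tau < 1.0:
        return 0.0
    scale = math.sqrt(n * (1.0 + tau * tau))
    diff = math.atanh(tau_hat) - math.atanh(tau)
    gamma = diff * scale
    if abs(gamma) >= v:
        return 0.0  # outside the interval (tau_0, tau_1)
    dgamma = -scale / (1.0 - tau * tau) + diff * n * tau / scale
    mass = std_normal_cdf(v) - std_normal_cdf(-v)
    psi = math.exp(-0.5 * gamma * gamma) / (math.sqrt(2.0 * math.pi) * mass)
    return psi * abs(dgamma)
```

Because $\gamma$ is a monotonic function of $\tau$ whenever Condition 1 holds, the density returned by `fiducial_density_tau` integrates to one over $(\tau_0, \tau_1)$, which gives a simple numerical check of any implementation.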
To illustrate this example, Figure 5 shows some results from running a Gibbs sampler with a uniform random scanning order of the parameters $\mu_x$, $\mu_y$, $\sigma_x^2$, $\sigma_y^2$ and $\tau$ on the basis of the full conditional post-data densities of these parameters that have just been detailed, i.e. the fiducial densities $f(\mu_x \mid \mu_y, \sigma_x^2, \sigma_y^2, \tau, z)$ and $f(\mu_y \mid \mu_x, \sigma_x^2, \sigma_y^2, \tau, z)$ defined by equations (41) and (42), the posterior densities (derived using Bayesian inference) $p(\sigma_x^2 \mid \mu_x, \mu_y, \sigma_y^2, \tau, z)$ and $p(\sigma_y^2 \mid \mu_x, \mu_y, \sigma_x^2, \tau, z)$ and the post-data density (derived using bispatial inference) $b(\tau \mid \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z)$. In particular, the histograms in Figures 5(a) to 5(e) represent the distributions of the values of $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\tau$, respectively, over a single run of ten million samples of these parameters generated by the Gibbs sampler after allowing for its burn-in phase by discarding a preceding run of five thousand samples. The sampling of each of the densities $p(\sigma_x^2 \mid \mu_x, \mu_y, \sigma_y^2, \tau, z)$, $p(\sigma_y^2 \mid \mu_x, \mu_y, \sigma_x^2, \tau, z)$ and $b(\tau \mid \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z)$ was based on the Metropolis algorithm, while the sampling of each of the densities $f(\mu_x \mid \mu_y, \sigma_x^2, \sigma_y^2, \tau, z)$ and $f(\mu_y \mid \mu_x, \sigma_x^2, \sigma_y^2, \tau, z)$ was independent of the preceding iterations. Moreover, the observed data set $z$ was a typical sample of $n = 100$ data points from a bivariate normal distribution with $\mu_x = 0$, $\mu_y = 0$, $\sigma_x = 1$, $\sigma_y = 1$ and $\tau = 0.3$. In addition, the specification of the posterior densities $p(\sigma_x^2 \mid \mu_x, \mu_y, \sigma_y^2, \tau, z)$ and $p(\sigma_y^2 \mid \mu_x, \mu_y, \sigma_x^2, \tau, z)$ was completed by assuming that the values of the constants $\alpha_x$, $\beta_x$, $\alpha_y$ and $\beta_y$, i.e. the constants that control the choice of the prior densities of $\sigma_x^2$ and $\sigma_y^2$ in equation (43), were set as follows: $\alpha_x = 49.5$, $\beta_x = 48$, $\alpha_y = 49.5$ and $\beta_y = 34$.
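The random-scan Gibbs sampling scheme described above can be sketched generically in Python as follows. This is a minimal sketch under stated assumptions: one iteration is taken to mean updating a single parameter chosen uniformly at random, which may differ from how a transition of the sampler is defined in the paper, and the function name, the dictionary-based state and the toy target (a standard bivariate normal with correlation 0.5, standing in for the five full conditional post-data densities) are all hypothetical.

```python
import math
import random


def random_scan_gibbs(init, conditionals, n_samples, burn_in, rng):
    """Random-scan Gibbs sampler.

    init         : dict mapping parameter names to starting values
    conditionals : dict mapping each name to a function (state, rng) -> new value
                   that draws (exactly, or via a Metropolis step) from the full
                   conditional density of that parameter
    """
    state = dict(init)
    names = list(conditionals)
    samples = []
    for iteration in range(burn_in + n_samples):
        name = rng.choice(names)  # uniform random scanning order
        state[name] = conditionals[name](state, rng)
        if iteration >= burn_in:  # discard the burn-in phase
            samples.append(dict(state))
    return samples


# Toy illustration: full conditionals of a standard bivariate normal.
rho = 0.5
cond_sd = math.sqrt(1.0 - rho ** 2)
toy_conditionals = {
    "x": lambda s, r: r.gauss(rho * s["y"], cond_sd),
    "y": lambda s, r: r.gauss(rho * s["x"], cond_sd),
}
samples = random_scan_gibbs({"x": 0.0, "y": 0.0}, toy_conditionals,
                            n_samples=40000, burn_in=1000,
                            rng=random.Random(1))
```

In the example of this section, the entries of `conditionals` would instead be samplers for $\mu_x$, $\mu_y$, $\sigma_x^2$, $\sigma_y^2$ and $\tau$, with exact draws for the two means and Metropolis updates for the other three parameters.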
On the other hand, with regard to how the post-data density $b(\tau \mid \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z)$ was fully determined, the constant $\varepsilon$ was assumed to be equal to 0.02, and the probabilities $\kappa$ that would be assigned to the hypotheses $H_S$ in equations (48) and (50) for different values of all the parameters except $\tau$ were assumed to be given by the PDO curve with, once more, the formula: $\kappa = J^{0.6}$, where, as indicated in these earlier equations, $J$ is the one-sided P value in the definition of the hypothesis $H_S$ that is applicable.

Figure 5: Conditional prior densities of two parameters, namely $\sigma_x$ and $\sigma_y$, and marginal post-data densities of all five parameters of a bivariate normal distribution. Panels: (a) $\mu_x$, (b) $\mu_y$, (c) $\sigma_x$, (d) $\sigma_y$, (e) $\tau$; the vertical axis in each panel is Density.

Also, in determining the post-data density of $\tau$ in question, the density function $h(\theta_j)$ that appears in equation (11), i.e. the density $h(\tau)$ in the present case, was defined similarly to how a density function of this type was specified in earlier examples, that is, by the expression $\tau \sim \text{Beta}(4, 4, -0.02, 0.02)$, where the notation here is again as used in equation (27).

Supplementary to this analysis, there was no suggestion from applying appropriate diagnostics to multiple runs of the Gibbs sampler from different starting points that it did not have a limiting distribution.
Furthermore, after excluding the burn-in phase of the sampler, no statistically significant difference was found between the samples of parameter values aggregated over the runs of the sampler when using various very distinct fixed scanning orders of the five model parameters $\mu_x$, $\mu_y$, $\sigma_x^2$, $\sigma_y^2$ and $\tau$, with a single transition of the sampler defined in the same way as in previous examples, even when the runs in question were long. Taking into account what was discussed in Section 2.2, we can reasonably conclude, therefore, that the full conditional densities of the limiting distribution of the original random-scan Gibbs sampler should be, at the very least, close approximations to the full conditional densities on which the sampler is based, i.e. the fiducial densities $f(\mu_x \mid \mu_y, \sigma_x^2, \sigma_y^2, \tau, z)$ and $f(\mu_y \mid \mu_x, \sigma_x^2, \sigma_y^2, \tau, z)$, the posterior densities $p(\sigma_x^2 \mid \mu_x, \mu_y, \sigma_y^2, \tau, z)$ and $p(\sigma_y^2 \mid \mu_x, \mu_y, \sigma_x^2, \tau, z)$ and the post-data density $b(\tau \mid \mu_x, \mu_y, \sigma_x^2, \sigma_y^2, z)$.

The solid curves overlaid on the histograms in Figures 5(a) and 5(c) are plots of the marginal fiducial densities of the parameters $\mu$ and $\sigma$, respectively, as defined by equations (18) and (19), that would apply if the data set of interest only consisted of the observed values of the variable $X$, i.e. $\{x_i : i = 1, 2, \ldots, 100\}$, while in Figures 5(b) and 5(d), the solid curves represent, respectively, the marginal fiducial densities of $\mu$ and $\sigma$ defined in the same way except that these densities correspond to treating the observed values of the variable $Y$ rather than the variable $X$, i.e. the set of values $\{y_i : i = 1, 2, \ldots, 100\}$, as being the data set $x$ in the equations being discussed. On the other hand, the dashed curves overlaid on the histograms in Figures 5(c) and 5(d) are plots of the conditional prior densities for $\sigma_x$ and $\sigma_y$, respectively, as defined in equation (43).
Finally, the solid curve overlaid on the histogram in Figure 5(e) is a plot of a confidence density function for the parameter $\tau$. In general, a density function of this type corresponds to a set of confidence intervals that have a varying coverage probability for the parameter concerned, see for example Efron (1993) for further clarification. More specifically, for the plot being considered, these confidence intervals for $\tau$ were constructed on the basis of summarising the data set $z$ by the sample correlation coefficient $r$, and then assuming that the Fisher transformation of this coefficient, i.e. the transformation $\tanh^{-1}(r)$, has a normal sampling distribution with mean $\tanh^{-1}(\tau)$ and variance $1/(n - 3)$, which is a standard method that is used in practice to form confidence intervals for the correlation $\tau$.

Similar to earlier examples, it can be seen from comparing the histograms in Figures 5(a) to 5(d) with the curves overlaid on them that the forms of the marginal post-data densities of $\mu_x$, $\mu_y$, $\sigma_x$ and $\sigma_y$ that are represented by these histograms are consistent with what we would have intuitively expected given the pre-data beliefs about these parameters and the correlation $\tau$ that have been taken into account. Furthermore, we can observe that the marginal post-data density for $\tau$ represented by the histogram in Figure 5(e) differs substantially from the curve overlaid on this histogram, i.e. the aforementioned type of confidence density function for $\tau$, particularly with regard to the amount of probability mass that these two density functions assign to values of $\tau$ close to zero. This arguably gives an indication of how inadequate it would be, in this example, to attempt to make inferences about the correlation $\tau$ using the standard type of confidence intervals for $\tau$ on which the overlaid curve in question is based.

3.6.
Summary of other examples

As part of the discussion of the examples that were outlined in the preceding sections, reference was made to additional examples from Bowater (2018a), Bowater (2019a) and Bowater (2019b) that fit within the inferential framework that has been put forward in the present paper. Here the opportunity will be taken to highlight examples of a similar kind from these earlier papers that have not been mentioned up to this point.

To begin with, let us remark that in Bowater (2019a), organic fiducial inference was applied to the problem of making post-data inferences about discrete probability distributions that naturally only have one unknown parameter, in particular the binomial and Poisson distributions, and as a result, a fiducial density for the parameter concerned was determined. With regard to making inferences about a binomial proportion, the application of the method of inference in question represents, of course, a special case of the type of scenario discussed in Section 3.3, i.e. the case where the population proportion $\pi_1$ in this latter example is set to zero. Furthermore, the problem of making post-data inferences about a binomial proportion was addressed in Bowater (2019b) by using the type of bispatial inference that was described in Section 2.5.

On the other hand, in Bowater (2018a), joint post-data densities for the two parameters of the Pareto, gamma and beta distributions were determined by using the type of framework that was outlined in Section 2.2 on the basis of full conditional post-data densities of the parameters concerned that were formed by applying, in effect, organic fiducial inference, i.e. all these full conditional and joint post-data densities were, in fact, fiducial densities.
In addition, the post-data density for a relative risk $\pi_t / \pi_c$ was determined in Bowater (2019b) by using the kind of framework of Section 2.2 on the basis of full conditional post-data densities for the binomial proportions $\pi_t$ and $\pi_c$ that were formed by applying the type of bispatial inference detailed in Section 2.5 in a way that meant that dependence would, in general, exist between $\pi_t$ and $\pi_c$ in the joint post-data density of these parameters. Finally, in Bowater (2018a), a method that was, in effect, organic fiducial inference was applied to the problem of making post-data inferences about the difference between the means of two normal density functions that have unknown variances on the basis of independent samples from the two density functions concerned, i.e. the Behrens-Fisher problem.

4. Defence and discussion of the theory

There now follows a discussion of the theory put forward in the present paper, i.e. integrated organic inference, arranged as a series of questions that one might expect would be naturally raised as a reaction to first reading about this theory, and immediate responses to each of these questions.

Question 1. Why not always use the Bayesian approach to inference?

As comments were already made in Section 2.3 regarding the flawed nature of two common ‘objective’ forms of Bayesian inference, let us consider the proposal of always making post-data inferences about model parameters using the standard or subjective Bayesian paradigm.

It is clearly arguable that the main difficulty with the Bayesian paradigm is in choosing a prior density function for the model parameters that adequately represents what was known about these parameters before the data were observed. According to the definition of probability being adopted in this paper, i.e.
the definition outlined in detail in Bowater (2018b) that was summarised in Section 2.1, carrying out this task in an unsatisfactory manner (which can reasonably be regarded as often being unavoidable) is formally indicated by a low ranking being attached to the external strength of the prior distribution function, under the assumption, which will be made from now onwards, that the event $R(\lambda)$ is a given outcome of a well-understood physical experiment (such as drawing a ball out of an urn of balls) and the resolution level $\lambda$ is some value in the interval $[0.05, 0.95]$. In addition, it can be argued that, if we only apply Bayesian reasoning, then this assessment of external strength should, in turn, generally result in a similar low ranking being attached to the external strength of the posterior distribution function of the parameters that is based on the prior distribution function concerned.

We can observe that it is often claimed that the choice of a prior distribution function is not such an important issue if, over a set of ‘reasonable choices’ for this distribution function, the posterior distribution function to which it corresponds is not ‘greatly affected’ by this choice. However, it is difficult for such an argument to escape the issue that has just been raised, which, in the present context, is the question of how externally strong we should regard any particular posterior distribution function that corresponds to a prior distribution function that belongs to the aforementioned set, assuming that we can apply only Bayesian reasoning.
Furthermore, in response to the claim being considered, it can be argued that if, for example, we had no or very little pre-data knowledge about the parameters of a given model, then the set of ‘reasonable choices’ for the prior density function of these parameters would need to be so diverse that the corresponding posterior density function would indeed be very greatly affected by which density function is chosen from this set.

Of course, if a prior density function can be found for a given set of parameters that is genuinely considered to be a good representation of our pre-data knowledge about these parameters, then we would naturally feel much less uneasy about the appropriateness of using the Bayesian method to make inferences about the parameters concerned. This is the reason why this method of inference is a critical component of the integrated framework for data analysis that has been described in the present paper.

A more detailed discussion of the lines of reasoning that have just been presented can be found in Bowater (2017, 2018a, 2018b). Moreover, it was also argued in detail in Bowater (2018b) and Bowater (2019a) that very high rankings may be justifiably attached to the external strengths of fiducial distribution functions derived by using the strong or moderate fiducial argument as part of the theory of organic fiducial inference that was outlined in Section 2.4, assuming that there was no or very little pre-data knowledge about the parameters concerned over their permitted range of values. Partially on the basis of this kind of reasoning, it could be argued furthermore that often, in practice, similar high rankings should be attached to the external strengths of post-data distribution functions derived using the type of bispatial inference described in Section 2.5, assuming that the scenario of interest specified in this earlier section is strictly applicable.

Question 2.
What about Lindley's criticism with regard to the incoherence of fiducial inference?

With reference to Fisher's fiducial argument, it was shown in Lindley (1958) that, if the fiducial density of a parameter $\theta$ that is formed on the basis of a data set $x$ is treated as a prior density of $\theta$ in forming, in the usual Bayesian way, a posterior density of $\theta$ on the basis of a second data set $y$, then, in general, this posterior density will not be the same as the one that would be formed by repeating the same operation but with $y$ as the first data set, and $x$ as the second data set, i.e. fiducial inference generally fails to satisfy a seemingly reasonable coherency condition.

As a reaction to this, it can be remarked that fiducial inference, whether it is Fisher's version of this type of inference, or the version outlined in the present paper, relies on pre-data knowledge, or an expression of the lack of such knowledge, being incorporated into the inferential process within the context of the observed data. Therefore, while it may be loosely acceptable, in general, to apply a blanket rule such as the strong fiducial argument without concern for the data actually observed, it is perhaps unsurprising that doing this could sometimes lead to the type of phenomenon that has just been highlighted.

Also, the act of expressing pre-data knowledge is rarely going to be a completely precise act no matter what paradigm of inference is adopted, therefore the door is always open for inconsistencies in the inferential process such as the one identified in Lindley (1958) that is under discussion. Furthermore, if indeed we are in a scenario where the coherency condition being considered is not satisfied, then at least with respect to the type of fiducial inference outlined in the present paper, i.e.
organic fiducial inference, it would be expected that good approximate adherence to this condition would usually be achieved providing that the data sets $x$ and $y$ referred to above are at least moderately sized. In other words, it can be argued that the practical consequences of the anomaly in question should generally be regarded as being quite small.

Observe that the same kind of anomaly is clearly also going to apply when post-data densities of the parameters of a given model are constructed by relying in some way on the type of bispatial inference that was described in Section 2.5. Similar arguments to those that have just been presented can be made, though, in response to the criticism being discussed with regard to this type of situation.

Finally, we ought to mention an important issue that is related to this criticism. In particular, if it is considered as being appropriate in a particular context to form a post-data density function for the parameters of a given model by incorporating organic fiducial inference, and possibly also bispatial inference, into the framework that has been detailed in the present paper, then we may ask, would it not be best to use one or both of these methods of inference to construct such a density function on the basis of a minimal part of the data set that has actually been observed, and as a next step, use this density function as a prior density in analysing the rest of the data under only the Bayesian paradigm? Although, at first sight, this strategy may appear to be a reasonable one, it has the drawback that post-data density functions constructed using organic fiducial inference on its own, or combined with bispatial inference, may well be regarded as being less adequate representations of the post-data uncertainty that is felt about the parameters concerned if they are based on a small rather than a large amount of data.
For example, even if there was very little pre-data knowledge about a given parameter of interest and the fiducial statistic $Q(x)$ is a sufficient statistic, it may be less appropriate to apply the strong fiducial argument to make inferences about this parameter if the data set is small rather than large. Also, with regard to bispatial inference, there is of course generally less chance that the one-sided P value in the hypothesis $H_S$ defined by equation (7) or (9), i.e. the value $F(t \mid \theta_j = \theta_{j0})$ or the value $F'(t \mid \theta_j = \theta_{j1})$, will be small if it is calculated on the basis of a small rather than a large data set, and as a result, more chance perhaps that the interpretation of this P value will be a little complicated. We are therefore led again to an issue that was discussed in the answer to Question 1 of this section, in particular the question of whether we can justifiably attach a very high ranking to the external strength of the prior density that forms the basis for carrying out the second step of the type of strategy being considered and, if we can only apply Bayesian reasoning in this second stage, whether we can justifiably attach a very high ranking to the external strength of the posterior density that results from the whole analysis.

Question 3. If the choice of the fiducial statistic is not obvious, how should this statistic be chosen?

The definition of a fiducial statistic $Q(x)$ was given in Section 2.4. As alluded to in this earlier section, if there is not a sufficient statistic for the unknown parameter of interest that is a natural choice for the fiducial statistic, then a fairly general choice for this latter statistic, which has a good deal of intuitive appeal, is the maximum likelihood
estimator of the parameter. Nevertheless, it would appear that more sophisticated criteria for choosing the fiducial statistic could be easily developed so that, in general, the effect of any arbitrariness in the choice of this statistic could be assured as being negligible. Such a development though will be left for future work.

Question 4. Can the results obtained from applying integrated organic inference depend on the parameterisation of the sampling model?

There are two key reasons why the parameterisation of the sampling model may possibly affect the inferences made about population quantities of interest when applying integrated organic inference. First, related to a point made in the answer to Question 2 of this section, it may be possible to achieve a more representative expression of pre-data knowledge about the parameters of a model using one parameterisation of the model rather than another. In this case, it is fairly obvious that ideally, out of all possible parameterisations of the model, the one should be chosen with regard to which the most representative expression of pre-data knowledge about the parameters can be achieved.

The second reason why inferences may be possibly affected by model parameterisation is related to the answer given to Question 3 of this section. In particular, it is that parameterisations may exist with regard to which fiducial statistics $Q(x)$ or test statistics $T(x)$ can be found that make more efficient use of the information contained in the data than those that can be found with regard to other parameterisations.
However, it would be expected that, in general, this issue would not have more than a negligible effect on post-data inferences made about quantities of interest, and where the effect of this issue is more than negligible then, in the context of what was just discussed about the choice of model parameterisation, there clearly should be a preference for those parameterisations that allow fiducial statistics and test statistics to be chosen that make the best use of the information that is in the data.

Question 5. In cases where the set of full conditional post-data densities referred to in equation (2) are incompatible, how often, in practice, could we expect them to be 'approximately compatible'?

Let us begin by clarifying that in interpreting this question it will be assumed that the full conditional densities referred to in equation (2) would be described as being 'approximately compatible' if they were incompatible, but nevertheless it was possible to find a joint density function of the parameters concerned such that these full conditional densities were closely approximated by the full conditional densities of the given joint density.

In replying to the question just raised, let us first remember that examples were discussed in Sections 3.2 to 3.5 of the present paper in which the Gibbs sampling method of Section 2.2 was applied to determine a joint post-data density of the parameters of each of the specific models of interest in these examples. Also, various other examples of this kind were outlined in Bowater (2018a, 2019a, 2019b). In all of these examples, a justification was given as to why it would be reasonable to conclude that if indeed the full conditional densities referred to in equation (2) are incompatible, then they nevertheless should be approximately compatible.
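To make the idea of compatibility a little more concrete, the following is a minimal sketch rather than a reproduction of any simulation study referred to in this paper. It assumes a toy pair of full conditional densities, $x \mid y \sim N(a y, s_1^2)$ and $y \mid x \sim N(b x, s_2^2)$, which are compatible only if $a / s_1^2 = b / s_2^2$. A fixed-scan Gibbs sampler is run under the two possible scan orders, and the sample covariance matrices of the draws are compared. A large gap between scan orders is one possible symptom of incompatibility. The function names `gibbs_cov` and `scan_order_gap`, the parameter values and the use of the covariance matrix as the summary being compared are all illustrative assumptions.

```python
import numpy as np


def gibbs_cov(a, s1, b, s2, x_first, n_sweeps=20000, burn_in=1000, seed=0):
    """Fixed-scan Gibbs sampler on the toy full conditionals
    x | y ~ N(a*y, s1^2) and y | x ~ N(b*x, s2^2).
    Returns the sample covariance matrix of the states recorded
    at the end of each sweep (after burn-in)."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    draws = np.empty((n_sweeps, 2))
    for t in range(burn_in + n_sweeps):
        if x_first:
            x = rng.normal(a * y, s1)   # update x given current y
            y = rng.normal(b * x, s2)   # then y given the new x
        else:
            y = rng.normal(b * x, s2)   # update y given current x
            x = rng.normal(a * y, s1)   # then x given the new y
        if t >= burn_in:
            draws[t - burn_in] = (x, y)
    return np.cov(draws, rowvar=False)


def scan_order_gap(a, s1, b, s2):
    """Largest absolute difference between the covariance matrices
    obtained under the two scan orders (hypothetical diagnostic)."""
    cov_xy = gibbs_cov(a, s1, b, s2, x_first=True, seed=1)
    cov_yx = gibbs_cov(a, s1, b, s2, x_first=False, seed=2)
    return float(np.max(np.abs(cov_xy - cov_yx)))


# Compatible case: conditionals of a standard bivariate normal with
# correlation rho, so that a / s1^2 == b / s2^2.
rho = 0.5
sd = np.sqrt(1.0 - rho**2)
gap_compatible = scan_order_gap(rho, sd, rho, sd)

# Incompatible case: a / s1^2 = 0.8 but b / s2^2 = 0.2.
gap_incompatible = scan_order_gap(0.8, 1.0, 0.2, 1.0)

print(f"gap (compatible):   {gap_compatible:.3f}")
print(f"gap (incompatible): {gap_incompatible:.3f}")
```

In the compatible case the two scan orders target the same joint density, so the gap should only reflect Monte Carlo error. In the incompatible case of this toy example the marginal variances happen to agree across scan orders while the covariance does not, which suggests that a comparison of marginal summaries alone could miss the problem. A small gap would not by itself establish approximate compatibility in the sense defined above, since that definition concerns how closely the given full conditional densities are matched by those of some joint density.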
However, let us take the opportunity to highlight two examples where the approximate compatibility of the full conditional densities in equation (2) appeared to be less good than what was seen to be generally the case in the examples of the type in question. First, in an example in Bowater (2018a) where organic fiducial inference was applied to the problem of making post-data inferences about all the parameters of a bivariate normal distribution, a basic simulation study showed that the full conditional densities referred to in equation (2) were clearly incompatible. It could be argued, though, that the main reason for this was likely to be the fairly unsophisticated normality assumptions that were made as part of this application of the method of inference in question in order to approximate the full conditional fiducial densities for three of the five parameters concerned, these three parameters being, in particular, the two population variances and the correlation coefficient. Second, although in an example in Bowater (2019a) where organic fiducial inference was used to make post-data inferences about all the parameters of a multinomial distribution, a justification was given as to why the full conditional densities in equation (2) should be at least approximately compatible, an additional (unreported) simulation study showed that in this example, the full conditional densities in question often may not have this desirable property if the number of trials (or in other words the number of observations) is very low and one or more of the categories over which the multinomial distribution is defined contain no observations.
Nevertheless, the problem of making inferences about the parameters of a multinomial distribution on the basis of limited data of this type when, as in the example being referred to, there is assumed to be no or very little pre-data knowledge about the parameters concerned is generally a difficult problem to solve using any paradigm of inference, see for example Berger, Bernardo and Sun (2015), and it is one that may well never have a completely satisfactory solution.

Finally, with regard to making inferences about the parameters $\theta$ of any given sampling model, it is important to bear in mind that, even if the full conditional densities referred to in equation (2) fail to be at least approximately compatible, then nevertheless, as alluded to in Section 2.2, they may well be considered as representing the best information that is available for constructing the most suitable post-data density function for the parameters concerned using the Gibbs sampling method outlined in this earlier section.

This concludes the discussion of the theory put forward in the present paper, i.e. integrated organic inference (IOI). It is hoped that it will be appreciated that this theory modifies, generalises and extends Fisherian inference, and naturally combines it with Bayesian inference in a way that constitutes a major advance on the level of sophistication of either of these two older schools of inference.

References

Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, 53, 370–418.

Berger, J. O., Bernardo, J. M. and Sun, D. (2015). Overall objective priors. Bayesian Analysis, 10, 189–221.

Bowater, R. J. (2017). A defence of subjective fiducial inference. AStA Advances in Statistical Analysis, 101, 177–197.

Bowater, R. J. (2018a). Multivariate subjective fiducial inference. arXiv.org (Cornell University), Statistics.

Bowater, R. J. (2018b). On a generalised form of subjective probability. arXiv.org (Cornell University), Statistics.

Bowater, R. J. (2019a). Organic fiducial inference. arXiv.org (Cornell University), Statistics.

Bowater, R. J. (2019b). Sharp hypotheses and bispatial inference. arXiv.org (Cornell University), Statistics.

Brooks, S. P. and Roberts, G. O. (1998). Convergence assessment techniques for Markov chain Monte Carlo. Statistics and Computing, 8, 319–335.

Chen, S-H. and Ip, E. H. (2015). Behaviour of the Gibbs sampler when conditional distributions are potentially incompatible. Journal of Statistical Computation and Simulation, 85, 3266–3275.

Cowles, M. K. and Carlin, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association, 91, 883–904.

Efron, B. (1993). Bayes and likelihood calculations from confidence intervals. Biometrika, 80, 3–26.

Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472.

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Jeffreys, H. (1961). Theory of Probability, 3rd edition, Oxford University Press, Oxford.

Kass, R. E. and Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370.

Lindley, D. V. (1958). Fiducial distributions and Bayes' theorem. Journal of the Royal Statistical Society, Series B, 20, 102–107.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.
