This paper considers the problem of finding prior distributions when a major component of the statistical model depends on a nonlinear function. Using results on how to construct uniform distributions in general metric spaces, we propose a prior distribution that is uniform in the space of functional shapes of the underlying nonlinear function and then back-transform it to obtain a prior distribution for the original model parameters. The primary application considered in this article is nonlinear regression, but the idea might be of interest beyond this case. For nonlinear regression, the priors so constructed have the advantage that they are parametrization invariant and do not violate the likelihood principle, as opposed to uniform distributions on the parameters or the Jeffreys prior, respectively. The utility of the proposed priors is demonstrated in the context of nonlinear regression modelling in clinical dose-finding trials, through a real data example and simulation. In addition, the proposed priors are used for the calculation of an optimal Bayesian design.
Deep Dive into Functional Uniform Priors for Nonlinear Modelling.
Mathematical models of the real world are typically nonlinear; examples in medical or biological applications can be found, for instance, in Lindsey (2001) or Jones et al. (2010). Setting up prior distributions in a statistical analysis of nonlinear models, however, often remains a challenge. If external, numerical or non-numerical information exists, one can try to quantify it into a probability distribution, see for example the works of O'Hagan et al. (2006), Bornkamp and Ickstadt (2009), and Neuenschwander et al. (2010). The classical approach in the absence of substantive information is the Jeffreys prior distribution (or variants), given by p(θ) ∝ det(I(θ))^{1/2}, where θ ∈ Θ ⊂ R^p is the parameter and I(θ) is the Fisher information matrix of the underlying statistical model. See Kass and Wasserman (1996), Ghosh et al. (2006, ch. 5) or Berger et al. (2009) for this approach and generalizations. A serious drawback is the fact that this prior can depend on observed covariates. In the case of nonlinear regression analysis, the prior depends on the design points and the relative allocations to these points, and thus violates the likelihood principle. Apart from the foundational issues this raises (see, e.g., O'Hagan and Forster (2004, ch. 3)), it also has undesirable practical consequences. For Bayesian optimal design calculations in nonlinear regression models, for example, the Jeffreys prior cannot be used, because it depends on the design points, which are exactly what we want to calculate in the optimal design problem. In the context of adaptive dose-finding clinical trials, patients are allocated dynamically to the available doses (see the works of Müller et al. (2006) or Dragalin et al. (2010)), so that a sequential analysis of the data will differ from the analysis combining all data when Jeffreys' rule is used. In summary, the main issue with the Jeffreys prior distribution is that it cannot be stated before data collection, which is crucial in some applications.
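To make the design dependence concrete, the following sketch (with two hypothetical designs and the exponential mean function exp(-θx) under unit error variance) evaluates the unnormalized Jeffreys prior √I(θ). Not only the unnormalized values but also the ratio of prior values at two θ points differ between the designs, so the normalized priors differ as well:

```python
import numpy as np

def jeffreys_unnormalized(theta, design):
    """Unnormalized Jeffreys prior sqrt(I(theta)) for the mean function
    exp(-theta * x) with i.i.d. Gaussian errors (sigma^2 = 1 w.l.o.g.).
    I(theta) = sum_i (d/dtheta exp(-theta x_i))^2
             = sum_i x_i^2 exp(-2 theta x_i)."""
    x = np.asarray(design, dtype=float)
    info = np.sum(x**2 * np.exp(-2.0 * theta * x))
    return np.sqrt(info)

d1 = [0.0, 2.5, 5.0, 10.0]   # hypothetical design 1
d2 = [0.0, 0.5, 1.0, 1.5]    # hypothetical design 2

# Ratios of prior values at theta = 0.5 vs theta = 2.0 differ across
# the designs, so even after normalization the priors are not the same.
r1 = jeffreys_unnormalized(0.5, d1) / jeffreys_unnormalized(2.0, d1)
r2 = jeffreys_unnormalized(0.5, d2) / jeffreys_unnormalized(2.0, d2)
print(r1, r2)
```

The designs and θ values here are illustrative choices, not taken from the paper; the point is only that the prior shape changes with the design.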
Surprisingly few proposals have been made to overcome this situation. In current practice, uniform distributions for θ on a reasonable compact subset of the parameter space are often used. This approach is, however, extremely sensitive to the chosen parametrization (which might be more or less arbitrary) and can be much more informative than one would intuitively expect.
To illustrate the point, we will use a simple example. Suppose one would like to analyse data using the exponential model exp(-θx), here with x ∈ [0, 10], which could be the mean function in a regression analysis. Assume that no historical data or practical experiences related to the problem are available.
A first pragmatic approach in this situation is to use a uniform distribution on θ values that leads to a reasonable coverage of the shapes of the underlying regression function exp(-θx); for example, the interval θ ∈ [0, 5] covers the attainable shapes almost entirely. The consequences of assuming a uniform prior on [0, 5] can be observed in Figure 1 (ii). While the prior is uniform in θ space, it places most of its probability mass on functional shapes that decrease quickly towards zero, so we end up with a very informative prior distribution in the space of functional shapes. This is highly undesirable when limited prior information regarding the shape is available. In addition, the prior depends crucially on the upper bound selected for θ, and a uniform distribution in an alternative parametrization would lead to an entirely different prior in the space of shapes. One way to overcome these problems is to use a distribution that is uniform in the space of functional shapes of the underlying nonlinear function. Such a prior is uninformative from the functional viewpoint and does not depend on the selected parametrization.
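A quick Monte Carlo check makes this informativeness visible (a sketch; the evaluation point x = 2 and the sample size are arbitrary choices, not from the paper). Under θ ~ U[0, 5], the function value exp(-2θ) falls below 0.1 with probability 1 - ln(10)/10 ≈ 0.77, far from the roughly 10% that a prior uniform in the function value on (0, 1] would assign:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 5.0, size=100_000)  # uniform prior on [0, 5]
f_at_2 = np.exp(-theta * 2.0)                # exp(-theta * x) at x = 2

# exp(-2 theta) < 0.1  iff  theta > ln(10)/2, which has prior
# probability 1 - ln(10)/10 under U[0, 5].
frac_below = np.mean(f_at_2 < 0.1)
print(frac_below)   # close to 1 - np.log(10) / 10 ≈ 0.77
```

So most of the prior mass sits on shapes that have essentially decayed to zero by x = 2, mirroring what Figure 1 (ii) shows graphically.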
In finite-dimensional situations it is standard to use distributions that are uniform in an interpretable parameter transformation when it is difficult to apply the classical default prior distributions. In the context of Dirichlet process mixture modelling, one can place a uniform distribution on the probability that two observations cluster into one group and then transfer this into a prior distribution for the precision parameter of the Dirichlet process. In the challenging problem of assigning a prior distribution to variance parameters in hierarchical models, Daniels (1999) assumes a uniform distribution on the shrinkage coefficient and then transfers this to a prior distribution for the variance parameter. In these cases the standard change-of-variables theorem can be used to derive the necessary uniform distributions. When we want to impose a uniform distribution on the space of functional shapes of an underlying regression function, however, it is not obvious how to construct such a uniform distribution. In the next section we review a methodology that allows one to construct uniform distributions on general metric spaces.
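As a small illustration of this change-of-variables device (a sketch in a one-way normal setting with known within-group variance σ²; Daniels (1999) treats the general case), a uniform prior on the shrinkage coefficient B = σ²/(σ² + τ²) implies, via the change-of-variables theorem, the density p(τ²) = σ²/(σ² + τ²)² for the between-group variance τ², with CDF t/(σ² + t). This can be verified by simulation:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 1.0                    # within-group variance, assumed known
B = rng.uniform(size=200_000)   # uniform prior on the shrinkage coefficient
tau2 = sigma2 * (1.0 - B) / B   # implied draws of the between-group variance

# Change of variables gives p(tau2) = sigma2 / (sigma2 + tau2)^2,
# the "uniform shrinkage" prior, with CDF  P(tau2 <= t) = t / (sigma2 + t).
t = 3.0
empirical = np.mean(tau2 <= t)
analytic = t / (sigma2 + t)
print(empirical, analytic)      # the two should agree closely
```

The same back-transformation logic underlies the functional uniform priors of this paper, except that there the "interpretable scale" is the infinite-dimensional space of functional shapes rather than a scalar transformation.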
In Section 2.2 we will adapt this to the nonlinear
…(Full text truncated)…