Dynamic Gauss Newton Metropolis Algorithm
Introduction
The Problem
The GNM algorithm is for sampling generalizations of probability densities of the form $`p(x)\propto e^{-||f(x)||^2/2}`$. Here $`x\in \mathbb{R}^n`$ is a "parameter" vector, $`f:\mathbb{R}^n\to\mathbb{R}^m`$ is a "model" function, and $`||f(x)||^2=\sum_{k=1}^m f_k^2(x)`$. Typically $`m>n`$. GNM is a powerful algorithm for ill-conditioned problems because it is affine invariant.
This is related to nonlinear least squares $`\min_x ||f(x)||^2`$ that arises in statistics. For example, if $`m(x)`$ predicts the outcome of an experiment with parameter $`x`$, and $`y_k`$ are measurements, then the goodness of fit is $`\sum_{k=1}^m\frac{(m_k(x)-y_k)^2}{2\sigma_k^2}`$. If $`f_k(x)=\frac{m_k(x)-y_k}{\sigma_k}`$, then this sum has the form $`||f(x)||^2/2`$.
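As a quick numerical check of this identity, here is a sketch with made-up measurements (all values are illustrative, not from any real experiment):

```python
# Made-up measurements y_k, noise levels sigma_k, and model predictions
# m_k(x) at some parameter value x (all values illustrative).
y     = [1.0, 2.1, 2.9]
sigma = [0.1, 0.1, 0.2]
m_x   = [1.05, 2.0, 3.1]

# Scaled residuals f_k(x) = (m_k(x) - y_k) / sigma_k
f = [(mk - yk) / sk for mk, yk, sk in zip(m_x, y, sigma)]

# Goodness of fit: sum_k (m_k(x) - y_k)^2 / (2 sigma_k^2) ...
chi2_half = sum((mk - yk) ** 2 / (2.0 * sk ** 2)
                for mk, yk, sk in zip(m_x, y, sigma))
# ... equals ||f(x)||^2 / 2 with the scaled residuals
half_norm = sum(fk ** 2 for fk in f) / 2.0
```

Both quantities agree, confirming that the statistical goodness of fit is exactly the $`||f(x)||^2/2`$ form that GNM samples.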
One generalization to the probability density function $`p(x)`$ is adding a Gaussian prior probability distribution on the parameters, $`\pi:\mathbb{R}^n\to\mathbb{R}`$, such that $`\pi(x)=\frac{1}{Z}e^{-(x-m)^T H(x-m)/2}`$, where $`m`$ is the mean and $`H`$ is the precision matrix, the inverse of the covariance. Another generalization is having an indicator function $`\chi:\mathbb{R}^n\to \{0,1\}`$ to limit the domain of the model function.
The user of the sampler must write Python code to evaluate $`\chi(x)`$, $`f(x)`$ and $`\nabla f(x)`$, the Jacobian. It is optional to add prior information, mean vector $`m\in\mathbb{R}^n`$ and precision matrix $`H\in\mathbb{R}^{n\times n}`$ (inverse of the covariance matrix).
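For instance, for a one-dimensional model $`f(x)=(x^2-y)/\sigma`$ restricted to $`x>0`$, the user-supplied pieces might look like the following sketch (the function names and calling conventions are illustrative, not gnm's exact interface):

```python
# Illustrative user-supplied model; names and signatures are an example,
# not gnm's actual API.
y, sigma = 1.0, 0.5

def chi(x):
    # Indicator function limiting the domain of the model function
    return 1 if x[0] > 0.0 else 0

def f(x):
    # Model function f : R^1 -> R^1
    return [(x[0] ** 2 - y) / sigma]

def jacobian(x):
    # Jacobian of f, here a 1x1 matrix
    return [[2.0 * x[0] / sigma]]

# Optional prior information: mean vector m and precision matrix H
m = [0.0]
H = [[1.0]]
```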
Installation
gnm is a Python package. The numpy package is required to run gnm. To run the examples and plot the results, you will need matplotlib. To use the acor feature, you will need the acor package.
From the standard Python distribution, you will need os, setuptools (or distutils), re, sys, copy, and json. These modules typically come with your Python installation.
The easiest way to install gnm is to use pip.

```bash
$ pip install gnm
```
To download manually, use git clone or download as a zip from https://github.com/mugurbil/gnm.

```bash
$ git clone https://github.com/mugurbil/gnm.git
```
Then you can install manually by going into the gnm directory and running setup.py.

```bash
$ python setup.py install
```
To clean the repository after installation, run the clean command of setup.py.

```bash
$ python setup.py clean
```
Examples
Quick Start
The quickstart.py file in the ~/gnm/examples directory contains a simple example. Running this file should create a plot that looks like Figure 1.

```bash
$ python quickstart.py
```
In this example, $`n=m=1`$, $`f(x)=\frac{x^2-y}{\sigma}`$, and $`f'(x)=\frac{2x}{\sigma}`$. The prior has mean $`m=0`$ and precision matrix $`H=[1]`$, giving $`\pi(x)=\frac{1}{Z}e^{-x^2/2}`$. Therefore, the probability distribution that we want to sample is:

```math
p(x)=\frac{1}{Z}\pi(x)e^{-|f(x)|^2/2}=\frac{1}{Z}e^{-x^2/2-(x^2-y)^2/(2\sigma^2)}
```
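This can be checked numerically; the sketch below assumes $`f(x)=(x^2-y)/\sigma`$, which is consistent with the misfit term in the density above (the actual values of $`y`$ and $`\sigma`$ in quickstart.py may differ):

```python
# Example values; quickstart.py may use different ones.
y, sigma = 1.0, 0.5

def log_p_unnorm(x):
    # Log of the unnormalized posterior: prior term plus misfit term
    return -x ** 2 / 2.0 - (x ** 2 - y) ** 2 / (2.0 * sigma ** 2)

def f(x):
    # Model function consistent with the misfit term above (assumed form)
    return (x ** 2 - y) / sigma

# The model-function form -x^2/2 - |f(x)|^2/2 gives the same value
x0 = 0.7
lhs = log_p_unnorm(x0)
rhs = -x0 ** 2 / 2.0 - f(x0) ** 2 / 2.0
```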
![Figure 1: Sampling results for the simple well problem.](quick.png)
Well
The problem in quick-start can be made harder by deepening the well, that is, taking $`y`$ to be bigger. This example is in the ~/well folder. The following command lists all the options for the file that you can play around with.

```bash
$ python well.py -h
```
Jtest and Simple 2D
In this example we first check the usage of Jtest, then sample a simple 2D example. There is a rotator for visualization so that we can plot any cut of the posterior probability distribution along a plane.
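A Jacobian test of this kind can be sketched generically by comparing the analytic Jacobian against central finite differences; this is only an illustration of the idea, not Jtest's actual code:

```python
# Generic finite-difference Jacobian check (a sketch of the kind of
# consistency test a Jacobian tester performs, not gnm's actual code).
def f(x):
    # Simple 2D model function f : R^2 -> R^2
    return [x[0] ** 2 + x[1], x[0] * x[1]]

def jacobian(x):
    # Analytic Jacobian of f
    return [[2.0 * x[0], 1.0],
            [x[1], x[0]]]

def check_jacobian(f, J, x, h=1e-6, tol=1e-4):
    """Return True if J(x) matches central finite differences of f at x."""
    m, n = len(f(x)), len(x)
    Jx = J(x)
    for j in range(n):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            fd = (fp[i] - fm[i]) / (2.0 * h)  # central difference
            if abs(fd - Jx[i][j]) > tol:
                return False
    return True
```

A mismatch larger than the tolerance typically indicates a bug in the analytic Jacobian.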
Exponential Time Series
Suppose we have a process that describes the relationship between time and data by $`g(t)=\sum_{i=1}^{d} w_i e^{-\lambda_i t}`$. We have measurements $`y_k = g(t_k) +\epsilon_k`$ for $`k=1,...,m`$, where the noise is independent and identically distributed with $`\epsilon_k\sim N(0,\sigma_k^2)`$. Then the parameter vector is $`x=(w_1,...,w_d,\lambda_1,...,\lambda_d)`$, with $`n=2d`$. The model function is given by $`f_k(x)=\frac{g(t_k,x)-y_k}{\sigma_k}`$ for $`k=1,...,m`$.
The experiment was done for $`n=4`$ and $`m=10`$. The sampler was run for $`10^5`$ samples, and the first $`2000`$ were discarded as burn-in. The mean of the prior was taken to be $`[4.,2.,0.5,1.]`$, and the precision matrix was chosen as $`0.5`$ times the identity. The initial guess is the mean of the prior. The data were created by taking the input as $`[1.,2.5,0.5,3.1]`$ and adding random normal noise with standard deviation $`0.1`$.
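The setup above can be sketched as follows; the measurement times $`t_k`$ are an assumption, since the text does not specify them:

```python
import math
import random

# Exponential time-series setup with the values stated above
# (d = 2 terms, n = 4 parameters, m = 10 measurements).
d, m_obs = 2, 10
x_true = [1.0, 2.5, 0.5, 3.1]  # (w_1, w_2, lambda_1, lambda_2)
sigma = 0.1
# Measurement times are assumed; the text does not give them.
t = [0.1 * (k + 1) for k in range(m_obs)]

def g(t_k, x):
    # g(t) = sum_i w_i * exp(-lambda_i * t)
    w, lam = x[:d], x[d:]
    return sum(w[i] * math.exp(-lam[i] * t_k) for i in range(d))

# Synthetic data: y_k = g(t_k) + N(0, sigma^2) noise
rng = random.Random(0)
y = [g(tk, x_true) + rng.gauss(0.0, sigma) for tk in t]

def f(x):
    # Scaled residuals f_k(x) = (g(t_k, x) - y_k) / sigma_k
    return [(g(tk, x) - yk) / sigma for tk, yk in zip(t, y)]
```

At the true parameters the residuals are just the scaled noise, so they should be of order one.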
A 2D marginal histogram of the sampled probability distribution, $`\lambda_1`$ versus $`w_1`$, is shown in Figure 2. Figure 3 presents a 1D marginal histogram of the sampled distribution for $`\lambda_1`$. This figure also displays the quadrature (theoretical) curve against the sampled probability density, where we can observe the correctness of the algorithm.
| Max Steps | Dilation Factor | Acceptance Rate | Acor Time | Effective Size | Number of Function Calls |
| --- | --- | --- | --- | --- | --- |
| 0 | - | 0.273 | 2880 | 3470 | $`1.00 \times 10^7`$ |
| 1 | Dynamic | 0.653 | 1720 | 5810 | $`1.73 \times 10^7`$ |
| 1 | 0.1 | 0.603 | 1390 | 7180 | $`1.73 \times 10^7`$ |
| 1 | 0.5 | 0.411 | 1760 | 5680 | $`1.73 \times 10^7`$ |
| 2 | 0.1 | 0.812 | 1510 | 6610 | $`2.12 \times 10^7`$ |

Table 4: Comparison of different back-off strategies for sampling the exponential time series.
In this experiment, analyzing Table 4, we see significant gains from back-off both in the acceptance rate and in the autocorrelation time; this conclusion, however, does not hold for the dynamic back-off. Since a lower autocorrelation time implies a higher effective sample size, we obtain, for instance, $`8850`$ effective samples with 1 back-off step versus $`5952`$ effective samples with no back-off steps. These translate to $`\frac{8850}{1.75*10^5}=0.0506`$ versus $`\frac{5952}{1.12*10^5}=0.0531`$ effective samples per function call, so the per-call efficiency stays roughly the same while the effective sample size grows. This number drops significantly, to $`\frac{9434}{2.66*10^5}=0.0355`$, for 3 back-off steps, and the efficiency drops with it. This suggests that using too many back-offs requires too many function calls to obtain the same number of effective samples, making it inefficient.
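The effective-samples-per-call comparison can also be recomputed directly from the Effective Size and Number of Function Calls columns of the table above:

```python
# (Effective Size, Number of Function Calls) for each table row
rows = {
    "0 steps":         (3470, 1.00e7),
    "1 step, dynamic": (5810, 1.73e7),
    "1 step, 0.1":     (7180, 1.73e7),
    "1 step, 0.5":     (5680, 1.73e7),
    "2 steps, 0.1":    (6610, 2.12e7),
}

# Effective samples obtained per function call
efficiency = {k: eff / calls for k, (eff, calls) in rows.items()}

# Most efficient strategy by this measure
best = max(efficiency, key=efficiency.get)
```

By this measure, one back-off step with a fixed dilation factor of 0.1 is the most efficient strategy in the table.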
In Figure 4, we can observe that the covariance of the chain is stable, implying that the chain is stationary. Figure 5 shows the percentage of samples accepted at each back-off step (-1 denotes rejection). This example shows that using many back-off steps does not help significantly; however, adding the first back-off step increases the acceptance rate of the algorithm substantially, by around 30%.