We consider data consisting of photon counts of diffracted x-ray radiation as a function of the angle of diffraction. The problem is to determine the positions, powers and shapes of the relevant peaks. An additional difficulty is that the power of the peaks is to be measured from a baseline which itself must be identified. Most methods of de-noising data of this kind do not explicitly take into account the modality of the final estimate. The residual-based procedure we propose uses the so-called taut string method, which minimizes the number of peaks subject to a tube constraint on the integrated data. The baseline is identified by combining the result of the taut string with an estimate of the first derivative of the baseline obtained using a weighted smoothing spline. Finally, each individual peak is expressed as the finite sum of kernels chosen from a parametric family.
1. Introduction. In the analysis of the morphology of thin films, x-ray diffraction is an indispensable tool [Birkholz (2006)]: the intensity of diffracted x-rays yields important information about the crystalline structure of the material under consideration. The experimental data are usually obtained in the form of a diffractogram: photon counts of x-ray radiation are measured as a function of the angle of diffraction 2θ.
A typical diffractogram, as shown in Figure 1, exhibits peaks as well as a slowly varying baseline. The physically relevant information is contained in the location, shape and size of the peaks and their decomposition into a sum of one or more possibly overlapping components represented by kernels. Often thin film diffractograms are analyzed using ad-hoc methods where denoising, removal of the baseline and fitting of the peaks are performed manually. Apart from being inconvenient, this often requires knowledge of possible peak positions.
In this article we suggest a new flexible automatic procedure for the analysis of thin film diffractograms. Our aim is to separate the signal of interest from the noise. More specifically, we aim at a decomposition of the form

Data = Baseline + Peaks + Noise. (1)
Our fully automatic five-step procedure (see Section 2 below) removes the baseline and determines the number, positions, powers and shapes of the relevant peaks and their components. It can be applied when little or no prior knowledge of approximate peak positions is available, as is often the case in the analysis of the morphology of thin films. Throughout all stages of the procedure, we employ the following principle: among all models we choose the simplest one which is consistent with the data. That is, “simple” models are favored over “complex” models, but the definition of “simplicity” or, equivalently, “complexity” depends on the particular problem to be solved. We use three different definitions of complexity, namely,
• the number of peaks,
• the value of $\int g^{(2)}(\theta)^2 \, d\theta$ as a measure of roughness of the function g,
• the number of components or kernels in the representation of each individual peak.
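As an illustration of the second complexity measure, the following minimal sketch approximates the integral of the squared second derivative of g on an equally spaced grid. The function name `roughness`, the finite-difference scheme and the two test curves are assumptions made for this example and are not taken from the procedure itself.

```python
import math

def roughness(g_values, step):
    """Approximate the integral of g''(theta)^2 over an equally spaced grid."""
    total = 0.0
    for i in range(1, len(g_values) - 1):
        # Central second difference as an approximation of g''(theta_i).
        second = (g_values[i - 1] - 2.0 * g_values[i] + g_values[i + 1]) / step**2
        # Riemann-sum contribution of this grid point.
        total += second**2 * step
    return total

# Hypothetical grid of diffraction angles and two candidate functions.
grid_step = 0.01
thetas = [i * grid_step for i in range(1001)]
line = [2.0 + 0.5 * t for t in thetas]                              # straight line
wiggle = [2.0 + 0.5 * t + 0.1 * math.sin(5.0 * t) for t in thetas]  # line plus oscillation

print(roughness(line, grid_step))    # numerically close to zero
print(roughness(wiggle, grid_step))  # clearly positive
```

Under this measure a straight line has essentially zero complexity, whereas any oscillation around it is penalized, which is why the measure suits a slowly varying baseline.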
More formally, we first construct an approximation or confidence region (Section 3) using special multiscale conditions for the residuals. This specifies the set of functions consistent with the data. Within this class we then choose a model with minimum complexity [cf. Davies, Kovac and Meise (2008)].
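To make the idea of a multiscale condition on the residuals concrete, here is a minimal sketch. It checks whether the normalized sums of the residuals over dyadic blocks of every length stay below a common bound. The choice of dyadic blocks, the bound sigma * sqrt(tau * log n) and the default value of `tau` are assumptions made for illustration; the conditions actually used are those specified in Section 3.

```python
import math

def multiscale_ok(residuals, sigma, tau=2.5):
    """Return True if all dyadic block sums of the residuals look like noise.

    Assumes at least two residuals and a known noise scale sigma.
    """
    n = len(residuals)
    bound = sigma * math.sqrt(tau * math.log(n))
    length = 1
    while length <= n:
        # Non-overlapping blocks of the current length.
        for start in range(0, n - length + 1, length):
            block = residuals[start:start + length]
            if abs(sum(block)) / math.sqrt(length) > bound:
                return False
        length *= 2
    return True

# Hypothetical residuals: sign-alternating versus systematically shifted.
alternating = [0.5 * (-1) ** i for i in range(64)]
shifted = [0.5] * 64

print(multiscale_ok(alternating, sigma=1.0))
print(multiscale_ok(shifted, sigma=1.0))
```

The shifted residuals are small at every single point and are rejected only on the longest block. A check on several scales can therefore detect systematic deviations that a pointwise check would miss.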
To carry out this program, we make use of recent advances in nonparametric regression and denoising techniques, in particular, the taut string method of Davies and Kovac (2001) and the weighted smoothing splines procedure of Davies and Meise (2008). The taut string method reliably identifies the local extremes of the regression function and it is used to provide initial estimates of the “Peaks” component of (1). Weighted smoothing splines are then used in conjunction with the known positions of the peaks to provide a smooth estimate of the “Baseline” component of (1). Finally, we fit sums of Pearson Type VII curves to the identified peak intervals in order to decompose the peaks into their components and estimate the physically relevant parameters. What remains is the “Noise” component of (1).
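The representation of a peak as a sum of Pearson Type VII curves can be sketched as follows. The parametrization by center, height, half width at half maximum and shape exponent is one common convention and is an assumption here, as are the function names and the numerical values; the parametrization used in the fitting step may differ.

```python
def pearson_vii(theta, center, height, half_width, shape):
    """One Pearson Type VII kernel.

    The value is `height` at `center` and `height / 2` at center +/- half_width.
    shape = 1 gives a Lorentzian; large shape values approach a Gaussian.
    """
    scaled = ((theta - center) / half_width) ** 2
    return height * (1.0 + scaled * (2.0 ** (1.0 / shape) - 1.0)) ** (-shape)

def peak_model(theta, components):
    """Sum of kernels; each component is (center, height, half_width, shape)."""
    return sum(pearson_vii(theta, *component) for component in components)

# Hypothetical peak made of two overlapping components.
components = [
    (30.0, 100.0, 0.2, 1.5),
    (30.5, 40.0, 0.3, 3.0),
]

for angle in (29.5, 30.0, 30.25, 30.5, 31.0):
    print(angle, peak_model(angle, components))
```

In this convention the number of tuples in `components` is the third complexity measure, and each tuple carries the position, power and shape of one component.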
We note that the application of the proposed method is not limited to thin film x-ray diffractograms. With little or no modification the procedure could also be applied to other types of diffractograms, for example, of powders or partly crystalline fibers of various materials. A wide range of spectroscopic methods yield data of a similar nature and require the unambiguous and automated identification of the position and width of relatively sharp peaks. Other applications could, for example, include the analysis of Raman, FTIR or NMR spectra and mass spectrometry data.
The paper is organized as follows. Section 2 gives some physical background and a description of the data sets as well as a short outline of our method. In Section 3 we introduce the statistical principles on which our procedure is based. Section 4 contains a short description of the taut string method and Section 5 introduces some modifications to accommodate heteroskedastic noise. Section 6 describes the weighted smoothing splines procedure. Section 7 is devoted to the identification of the baseline and Section 8 to the identification and decomposition of the peaks. Section 9 gives a physical interpretation of the results. Finally, Section 10 contains a short discussion of the complete procedure.
2. Diffractograms. X-ray diffraction is an important tool in various fields, including the analysis of crystalline materials, the identification of the molecular structure of proteins and, more recently, the investigation of the morphology of thin films. When thin films are prepared on glass substrates they are usually polycrystalline and may even contain different crystalline phases. The experim