Application of the Gaussian mixture model in pulsar astronomy -- pulsar classification and candidates ranking for \textit{Fermi} 2FGL catalog

Machine learning, a family of algorithms that extract empirical knowledge from data, can be used to classify data, which is one of the most common tasks in observational astronomy. In this paper, we focus on Bayesian data classification algorithms using the Gaussian mixture model and show two applications in pulsar astronomy. After reviewing the Gaussian mixture model and the related Expectation-Maximization algorithm, we present a data classification method using the Neyman-Pearson test. To demonstrate the method, we apply the algorithm to two classification problems. First, it is applied to the well-known period-period derivative diagram, where we find that the pulsar distribution can be modeled with six Gaussian clusters, with two clusters for millisecond pulsars (recycled pulsars) and the rest for normal pulsars. From this distribution, we derive an empirical definition for millisecond pulsars as $\frac{\dot{P}}{10^{-17}} \leq 3.23\left(\frac{P}{100\,\textrm{ms}}\right)^{-2.34}$. The two millisecond pulsar clusters may have different evolutionary origins, since the companion stars to the pulsars in the two clusters show different chemical compositions. Four clusters are found for normal pulsars. Possible implications of these clusters are also discussed. Our second example is to calculate the likelihood of unidentified \textit{Fermi} point sources being pulsars and rank them accordingly. In the ranked point source list, the top 5% of sources contain 50% of the known pulsars, the top 50% contain 99% of the known pulsars, and no known active galaxy (the other major population) appears in the top 6%. Such a ranked list can be used to guide future follow-up observations that search for pulsars among unidentified \textit{Fermi} point sources.
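As a quick illustration of how the empirical millisecond pulsar definition quoted in the abstract might be applied to a single source, here is a minimal Python sketch. The function names `msp_boundary_pdot` and `is_millisecond_pulsar` are hypothetical and do not come from the paper, and the sketch assumes the period is supplied in seconds and the period derivative is dimensionless (s/s).

```python
def msp_boundary_pdot(period_s: float) -> float:
    """Largest period derivative (s/s) that satisfies the empirical
    MSP definition for a pulsar with the given period in seconds.

    Implements Pdot / 1e-17 <= 3.23 * (P / 100 ms)^(-2.34).
    Assumption: the period is given in seconds, so 100 ms = 0.1 s.
    """
    if period_s <= 0:
        raise ValueError("period must be positive")
    period_in_100ms = period_s / 0.1
    return 1e-17 * 3.23 * period_in_100ms ** -2.34


def is_millisecond_pulsar(period_s: float, pdot: float) -> bool:
    """True if (P, Pdot) lies on or below the empirical MSP boundary."""
    return pdot <= msp_boundary_pdot(period_s)


if __name__ == "__main__":
    # Illustrative values only, not measurements taken from the paper.
    print(is_millisecond_pulsar(0.005, 1e-20))  # fast, low-Pdot source
    print(is_millisecond_pulsar(1.0, 1e-15))    # slow, high-Pdot source
```

Because the boundary is a power law in $P$, it appears as a straight line with slope $-2.34$ in the logarithmic period-period derivative diagram, so the same check could also be written as a linear inequality in $\log P$ and $\log \dot{P}$.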

💡 Research Summary

The paper demonstrates how a Bayesian classification framework based on Gaussian Mixture Models (GMM) and the Expectation‑Maximization (EM) algorithm can be applied to two distinct problems in pulsar astronomy. After a concise review of GMM theory, the authors describe a Neyman‑Pearson testing procedure that uses the posterior probabilities output by the mixture model to make binary decisions. The first application concerns the classic period‑period‑derivative (P‑Ṗ) diagram. By fitting a GMM to the logarithms of period (P) and its derivative (Ṗ) for a large sample of catalogued pulsars, the Bayesian Information Criterion (BIC) selects six Gaussian components as optimal. Two of these components correspond to millisecond pulsars (MSPs), while the remaining four describe normal pulsars. The two MSP clusters differ in the chemical composition of their companion stars, suggesting distinct recycling histories. From the fitted mixture the authors derive an empirical boundary for MSPs: