Finding Anomalous Periodic Time Series: An Application to Catalogs of Periodic Variable Stars
Catalogs of periodic variable stars contain large numbers of periodic light-curves (photometric time series data from the astrophysics domain). Separating anomalous objects from well-known classes is an important step towards the discovery of new classes of astronomical objects. Most anomaly detection methods for time series data assume either a single continuous time series or a set of time series whose periods are aligned. Light-curve data precludes the use of these methods as the periods of any given pair of light-curves may be out of sync. One may use an existing anomaly detection method if, prior to similarity calculation, one performs the costly act of aligning two light-curves, an operation that scales poorly to massive data sets. This paper presents PCAD, an unsupervised anomaly detection method for large sets of unsynchronized periodic time-series data, which outputs a ranked list of both global and local anomalies. It calculates its anomaly score for each light-curve in relation to a set of centroids produced by a modified k-means clustering algorithm. Our method is able to scale to large data sets through the use of sampling. We validate our method on both light-curve data and other time series data sets. We demonstrate its effectiveness at finding known anomalies, and discuss the effect of sample size and number of centroids on our results. We compare our method to naive solutions and existing time series anomaly detection methods for unphased data, and show that PCAD’s reported anomalies are comparable to or better than all other methods. Finally, astrophysicists on our team have verified that PCAD finds true anomalies that might be indicative of novel astrophysical phenomena.
Research Summary
The paper introduces PCAD (Periodic Curve Anomaly Detection), an unsupervised method designed to identify anomalous objects in massive collections of unsynchronized periodic time‑series, with a focus on astronomical light‑curves of variable stars. Traditional time‑series anomaly detectors either assume a single continuous series or require all series to share a common phase. In astronomical surveys, each star has its own period and observations start at arbitrary phases, so aligning every pair of curves before comparing them is computationally prohibitive for datasets containing hundreds of thousands to millions of curves.
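A back-of-the-envelope count shows why pairwise alignment does not scale. The sketch below compares the number of alignments an all-pairs approach needs with the number needed when each curve is compared only against K centroids. The catalog size of 500,000 curves and K = 10 are illustrative assumptions, not values from the paper, and both function names are hypothetical.

```python
def pairwise_alignments(m: int) -> int:
    """Unordered curve pairs an all-pairs method would have to align."""
    return m * (m - 1) // 2

def centroid_alignments(m: int, k: int) -> int:
    """Alignments needed when each curve is compared only to K centroids."""
    return m * k

M = 500_000  # illustrative catalog size (assumption)
K = 10       # illustrative number of centroids (assumption)

print(f"all pairs:     {pairwise_alignments(M):,}")      # all pairs:     124,999,750,000
print(f"vs. centroids: {centroid_alignments(M, K):,}")   # vs. centroids: 5,000,000
```

The all-pairs count grows quadratically with the catalog size, while the centroid count grows linearly, which is the gap a centroid-based approach is meant to exploit.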
PCAD overcomes this limitation through two key ideas. First, it samples a modest fraction of the full dataset and runs a modified k‑means clustering on the sample. The modification consists of rotating each curve to the phase that minimizes its Euclidean distance to the cluster centroid, which handles phase offsets without ever aligning pairs of curves to one another. The resulting centroids serve as representative “prototype” periodic curves. Second, the anomaly score of every curve in the full dataset is the average, over all K centroids, of its phase‑minimized distance to each centroid. A high score indicates that the curve does not fit well with any prototype, flagging it as either a global anomaly (far from all clusters) or a local anomaly (outlying within a specific cluster).
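The phase-aware distance can be sketched as a minimum over circular shifts. This minimal sketch assumes each light-curve has already been folded on its own period and resampled onto a common grid of L evenly spaced phase bins, so that a phase offset becomes a circular shift; the summary does not specify this preprocessing. Both function names are hypothetical. The brute-force version tries all L shifts; the FFT version, a standard circular cross-correlation technique swapped in here rather than something the summary attributes to PCAD, returns the same answer in O(L log L).

```python
import numpy as np

def phase_distance_brute(x: np.ndarray, c: np.ndarray) -> tuple[float, int]:
    """Minimum Euclidean distance between x and any circular shift of c.

    Tries all L shifts, so it costs O(L^2) per pair. Returns
    (distance, shift) such that np.roll(c, shift) best matches x.
    """
    dists = [np.linalg.norm(x - np.roll(c, s)) for s in range(len(c))]
    best = int(np.argmin(dists))
    return float(dists[best]), best

def phase_distance_fft(x: np.ndarray, c: np.ndarray) -> tuple[float, int]:
    """Same result in O(L log L) using circular cross-correlation.

    Since ||x - roll(c, s)||^2 = ||x||^2 + ||c||^2 - 2 <x, roll(c, s)>,
    minimizing the distance over s is the same as maximizing the correlation.
    """
    # corr[s] = sum_n x[n] * c[(n - s) mod L] = <x, roll(c, s)>
    corr = np.fft.irfft(np.fft.rfft(x) * np.conj(np.fft.rfft(c)), n=len(x))
    best = int(np.argmax(corr))
    sq = np.dot(x, x) + np.dot(c, c) - 2.0 * corr[best]
    return float(np.sqrt(max(sq, 0.0))), best  # clamp tiny negative round-off

# Usage: a slightly noisy copy of a template, shifted by 17 phase bins.
rng = np.random.default_rng(0)
template = np.sin(np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False))
observed = np.roll(template, 17) + 0.01 * rng.standard_normal(64)
print(phase_distance_brute(observed, template))  # best shift: 17
print(phase_distance_fft(observed, template))    # best shift: 17
```

Because rotating a curve does not change its norm, minimizing the distance and maximizing the cross-correlation select the same shift, which is why the two functions agree.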
The algorithm proceeds as follows: (1) randomly draw N samples from the entire collection; (2) apply the phase‑aware k‑means to the sample to obtain K centroids; (3) for each curve in the original collection, compute its phase‑adjusted distance to every centroid; (4) average these K distances to obtain the anomaly score; (5) rank curves by this score. Because clustering operates only on the sample, the pass over the full dataset needs just K distance computations per curve, so the cost grows linearly with the number of curves, and each distance stays inexpensive because it is a plain Euclidean distance once the phase has been corrected.
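The five steps can be strung together in a compact sketch. It follows the summary's description of the score (the mean, over the K centroids, of each curve's phase‑adjusted distance), but every other detail is an assumption: curves are pre-folded to a common length, centroids are initialized from random sample members, each centroid is updated as the mean of its members after rotating them onto it, clustering runs for a fixed number of iterations, and the synthetic data (two sinusoidal classes plus one high-amplitude, higher-frequency outlier) is invented purely for illustration. All function names are hypothetical.

```python
import numpy as np

def align_to(x: np.ndarray, c: np.ndarray) -> tuple[np.ndarray, float]:
    """Rotate x by the circular shift that minimizes ||roll(x, s) - c||."""
    # corr[s] = <c, roll(x, s)> for every shift s, via FFT cross-correlation
    corr = np.fft.irfft(np.fft.rfft(c) * np.conj(np.fft.rfft(x)), n=len(x))
    rotated = np.roll(x, int(np.argmax(corr)))
    return rotated, float(np.linalg.norm(rotated - c))

def phase_kmeans(sample: np.ndarray, k: int, n_iter: int = 20, seed: int = 0) -> np.ndarray:
    """Step 2: k-means in which each member is rotated onto its centroid."""
    rng = np.random.default_rng(seed)
    centroids = sample[rng.choice(len(sample), size=k, replace=False)].copy()
    for _ in range(n_iter):
        sums = np.zeros_like(centroids)
        counts = np.zeros(k, dtype=int)
        for x in sample:
            aligned = [align_to(x, c) for c in centroids]
            j = int(np.argmin([d for _, d in aligned]))
            sums[j] += aligned[j][0]  # add x in the phase that best matches centroid j
            counts[j] += 1
        for j in range(k):
            if counts[j] > 0:  # an empty cluster keeps its previous centroid
                centroids[j] = sums[j] / counts[j]
    return centroids

def pcad_scores(curves: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Steps 3-4: mean phase-adjusted distance from each curve to all K centroids."""
    return np.array([np.mean([align_to(x, c)[1] for c in centroids]) for x in curves])

def rank_anomalies(curves: np.ndarray, k: int, sample_size: int, seed: int = 0):
    """Steps 1-5: sample, cluster, score every curve, most anomalous first."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(curves), size=sample_size, replace=False)
    centroids = phase_kmeans(curves[idx], k, seed=seed)
    scores = pcad_scores(curves, centroids)
    return np.argsort(scores)[::-1], scores

# Synthetic usage: 150 + 150 normal curves with random phases, then one outlier.
rng = np.random.default_rng(1)
t = np.arange(64) / 64

def make_curve(freq: int, amp: float) -> np.ndarray:
    phase = rng.random()
    return amp * np.sin(2 * np.pi * (freq * t + phase)) + 0.1 * rng.standard_normal(64)

curves = np.array([make_curve(1, 1.0) for _ in range(150)]
                  + [make_curve(2, 1.0) for _ in range(150)]
                  + [make_curve(5, 3.0)])  # index 300 is the injected outlier
order, scores = rank_anomalies(curves, k=2, sample_size=30)
print("top candidates:", order[:3])
```

In this sketch only `phase_kmeans` depends on the sample size; `pcad_scores` performs K alignments per curve, so scoring the full collection grows linearly with its size.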
Evaluation was performed on two fronts. The primary test used Kepler mission light‑curves, comprising several hundred thousand periodic variable stars. Additional experiments employed standard benchmark time‑series (e.g., ECG, power‑consumption data) to demonstrate generality. Results show that sampling as little as 5 % of the data yields anomaly detection performance virtually identical to using the full set. Compared with a naïve baseline that aligns every pair of curves before distance computation, PCAD achieves speed‑ups of one to two orders of magnitude. Against state‑of‑the‑art detectors such as LSTM‑based autoencoders and Isolation Forests, PCAD attains higher precision and recall, especially when phase misalignment is severe.
Domain experts (astronomers) manually inspected the top‑ranked anomalies. Out of the 20 most anomalous light‑curves identified by PCAD, 12 correspond to objects that do not fit any known variability class, suggesting the discovery of potentially new astrophysical phenomena. This validation underscores the practical value of the method for exploratory science.
The contributions of the work are threefold: (1) a novel framework that bypasses costly phase alignment while preserving accurate distance‑based anomaly scoring; (2) a sampling‑driven clustering approach that makes the technique feasible for very large astronomical archives; (3) a ready‑to‑use tool that has already yielded scientifically interesting candidates. Future research directions include extending PCAD to handle non‑periodic or quasi‑periodic signals, incorporating multi‑band (multivariate) observations, and developing online versions capable of processing streaming data in real time. By addressing the unique challenges of unsynchronized periodic series, PCAD opens new avenues for data‑driven discovery across astronomy and other domains where periodic phenomena are prevalent.