AttenGW: A Lightweight Attention-Based Multi-Detector Gravitational-Wave Detection Pipeline
We present AttenGW, an attention-based multi-detector gravitational-wave detection model and accompanying software stack designed for analysis of real LIGO data. AttenGW combines a per-detector hierarchical dilated convolutional network with an attention-based aggregation module that enforces cross-detector coherence, providing an alternative to graph-based aggregation schemes used in previous work. The pipeline adopts a LIGO-style preprocessing and data-loading workflow based on GWOSC time series, with standard whitening and filtering, and is released as a documented Python/PyTorch package. We benchmark AttenGW using simulated injections to estimate sensitive volume and on real O3 data, focusing on the February 2020 segment previously used to evaluate a spatiotemporal graph ensemble. On this month of data, a single AttenGW model reduces the false-positive rate relative to a single graph-based detector by a factor of a few, and an ensemble of three AttenGW models matches the performance of the corresponding six-model ensemble. Injection studies on real LIGO noise further indicate that attention-based aggregation yields stable performance on non-Gaussian backgrounds.
💡 Research Summary
AttenGW is a lightweight, attention‑based multi‑detector gravitational‑wave (GW) detection pipeline designed for direct use on real LIGO data. The system consists of three tightly integrated components: (1) a GWOSC‑driven downloader that fetches strain data from the Hanford and Livingston observatories, applies quality‑control steps (glitch clamping, PSD estimation, whitening, and band‑pass filtering between 25 Hz and 450 Hz), and stores the cleaned segments in HDF5 format; (2) a data generator that creates training examples by injecting synthetic waveforms (BBH, BNS, NSBH) into the real noise, optionally rescaling them to a target signal‑to‑noise ratio (SNR) and employing an SNR curriculum that presents higher‑SNR examples early in training; and (3) the AttenGW model itself, implemented in PyTorch Lightning for distributed GPU training.
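The SNR‑rescaling step in component (2) can be sketched as follows. For whitened data with unit‑variance noise per sample, the optimal matched‑filter SNR of an injection is approximately the L2 norm of the whitened waveform, so rescaling to a target SNR reduces to a simple amplitude scale. The function names here are illustrative, not the package's actual API:

```python
import numpy as np

def rescale_to_target_snr(whitened_waveform, target_snr):
    """Rescale a whitened injection so its optimal SNR equals target_snr.

    Assumes whitened, unit-variance-per-sample noise, in which case the
    optimal matched-filter SNR is approximately the L2 norm of the waveform.
    """
    current_snr = np.linalg.norm(whitened_waveform)
    return whitened_waveform * (target_snr / current_snr)

def make_training_example(noise_segment, whitened_waveform, target_snr, rng=None):
    """Inject a rescaled waveform into a real-noise segment at a random offset."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = rescale_to_target_snr(whitened_waveform, target_snr)
    start = rng.integers(0, len(noise_segment) - len(scaled) + 1)
    example = noise_segment.copy()
    example[start:start + len(scaled)] += scaled
    return example
```

An SNR curriculum then amounts to sampling `target_snr` from a distribution whose lower bound decreases as training progresses, so the model sees loud, easy injections first.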
The model architecture is built in two stages. First, each detector is processed independently by a Hierarchical Dilated Convolutional Network (HDCN) that follows a WaveNet‑style stack of 33 gated dilated 1‑D convolutions. The dilation factors double at each successive block (1, 2, 4, …, 2^10) and are repeated three times, enabling the network to capture temporal dependencies from a few milliseconds up to several seconds. The HDCN maps a raw or whitened strain segment into a sequence of 16‑dimensional feature vectors of the same length as the input.
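The per‑detector HDCN described above can be sketched in PyTorch as a stack of gated dilated convolutions with residual connections. This is a minimal reconstruction from the description (kernel size, residual wiring, and the input projection are assumptions, not details from the paper):

```python
import torch
import torch.nn as nn

class GatedDilatedBlock(nn.Module):
    """One WaveNet-style gated dilated 1-D convolution with a residual skip."""

    def __init__(self, channels, dilation, kernel_size=3):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2  # "same" padding for odd kernels
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size,
                                     dilation=dilation, padding=pad)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size,
                                   dilation=dilation, padding=pad)

    def forward(self, x):
        # Gated activation: tanh(filter) * sigmoid(gate), added to the input.
        return x + torch.tanh(self.filter_conv(x)) * torch.sigmoid(self.gate_conv(x))

class HDCN(nn.Module):
    """3 repeats of 11 blocks with dilations 1, 2, 4, ..., 2**10 (33 total)."""

    def __init__(self, in_channels=1, feature_dim=16, repeats=3, max_power=10):
        super().__init__()
        self.input_proj = nn.Conv1d(in_channels, feature_dim, kernel_size=1)
        self.blocks = nn.ModuleList(
            GatedDilatedBlock(feature_dim, 2 ** p)
            for _ in range(repeats) for p in range(max_power + 1)
        )

    def forward(self, strain):         # strain: (batch, 1, T)
        h = self.input_proj(strain)
        for block in self.blocks:
            h = block(h)
        return h                        # (batch, feature_dim, T), same T as input
```

With kernel size 3 and same‑padding, each block preserves the sequence length, matching the stated property that the 16‑dimensional feature sequence has the same length as the input strain.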
Second, the two detector streams are fused through a Cross‑Attention Network (CAN). For the Hanford‑to‑Livingston direction, the Hanford feature matrix H (T × D) is projected to queries Q, while the Livingston matrix L is projected to keys K and values V. Scaled dot‑product attention is then computed across the time axis:
Attention(Q,K,V) = softmax(QKᵀ / √dₖ) V
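The attention formula above can be written out directly in NumPy. This sketch takes already‑projected Q, K, V matrices as input (the learned projection matrices from H and L are omitted):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (T_q, d_k) queries from one detector; K: (T_k, d_k) and V: (T_k, d_v)
    keys/values from the other. Every query timestep attends to every key
    timestep, so the weight matrix has shape (T_q, T_k).
    """
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V, weights
```

Each row of the weight matrix is a probability distribution over the other detector's timesteps, which is what lets the fusion adapt to instantaneous signal and noise conditions.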
The output replaces the original Hanford features, and a symmetric CAN performs the reverse, Livingston‑to‑Hanford, pass. This cross‑attention allows each timestep in one detector to attend to every timestep in the other detector, dynamically weighting contributions based on the instantaneous signal and noise conditions. After attention, the updated Hanford and Livingston sequences are concatenated, reduced with a per‑timestep max‑pool, passed through a 1 × 1 convolution, and finally a sigmoid to produce a scalar confidence score pₜ ∈ [0, 1] at each timestep.
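The post‑attention fusion head might look like the following PyTorch sketch. One plausible reading of "concatenated, then per‑timestep max‑pool" is a max over the two detector streams at each timestep and channel; the layer sizes and that interpretation are assumptions, not details from the paper:

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Fuse the two attention-updated detector streams into per-timestep scores.

    Stacks the streams along a detector axis, max-pools over that axis at
    each timestep, applies a 1x1 convolution, and squashes with a sigmoid
    to produce p_t in [0, 1].
    """

    def __init__(self, feature_dim=16):
        super().__init__()
        self.score_conv = nn.Conv1d(feature_dim, 1, kernel_size=1)

    def forward(self, hanford, livingston):  # each: (batch, feature_dim, T)
        stacked = torch.stack([hanford, livingston], dim=0)  # (2, B, D, T)
        pooled = stacked.max(dim=0).values                   # max over detectors
        return torch.sigmoid(self.score_conv(pooled)).squeeze(1)  # (B, T)
```

The sigmoid guarantees the per‑timestep scores lie in [0, 1] regardless of the scale of the attention features.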