The Turing Synthetic Radar Dataset: A dataset for pulse deinterleaving

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We present the Turing Synthetic Radar Dataset, a comprehensive dataset to serve both as a benchmark for radar pulse deinterleaving research and as an enabler of new research methods. The dataset addresses the critical problem of separating interleaved radar pulses from multiple unknown emitters for electronic warfare applications and signal intelligence. Our dataset contains a total of 6000 pulse trains over two receiver configurations, totalling almost 3 billion pulses, featuring realistic scenarios with up to 110 emitters and significant parameter space overlap. To encourage dataset adoption and establish standardised evaluation procedures, we have launched an accompanying Turing Deinterleaving Challenge, in which models must associate each pulse in an interleaved pulse train with the correct emitter by clustering, maximising metrics such as the V-measure. The Turing Synthetic Radar Dataset is one of the first publicly available, comprehensively simulated pulse train datasets intended to facilitate sophisticated model development in the electronic warfare community.

💡 Research Summary

The paper introduces the Turing Synthetic Radar Dataset (TSRD), a large‑scale, publicly available benchmark designed to advance research on radar pulse de‑interleaving—a core problem in electronic warfare (EW) and signals intelligence (SIGINT). Existing work suffers from proprietary data, limited scale, and unrealistic assumptions such as a fixed number of emitters or reliance solely on pulse‑repetition‑interval (PRI) analysis. TSRD addresses these gaps by providing 6 000 synthetic pulse‑train files that together contain nearly 3 billion pulse descriptor words (PDWs). Each PDW is a five‑dimensional vector (time‑of‑arrival, centre frequency, pulse width, angle‑of‑arrival, amplitude) reflecting the typical measurements available to a radar receiver.
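
To make the five-dimensional PDW concrete, the sketch below models one pulse as a small record and stacks several into a time-ordered feature matrix. The class name, field names, units, and sample values are illustrative assumptions; they are not taken from the dataset's files or its accompanying library.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class PDW:
    """One pulse descriptor word. Field names and units are assumptions."""
    toa_s: float     # time of arrival, seconds
    freq_hz: float   # centre frequency, hertz
    pw_s: float      # pulse width, seconds
    aoa_deg: float   # angle of arrival, degrees
    amp_db: float    # amplitude, decibels


def to_feature_matrix(pulses):
    """Return the pulses sorted by time of arrival as a list of 5-element rows."""
    return [list(astuple(p)) for p in sorted(pulses, key=lambda p: p.toa_s)]


# Two made-up pulses from different emitters, given out of time order.
pulses = [
    PDW(2.0e-3, 9.4e9, 1.0e-6, 45.0, -60.0),
    PDW(1.0e-3, 3.1e9, 5.0e-6, 120.0, -72.0),
]
matrix = to_feature_matrix(pulses)
```

Sorting by time of arrival mirrors how an interleaved pulse train would reach a receiver: pulses from all emitters mixed into one chronological stream.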

Two receiver configurations are simulated:

Stare mode – a static, broadband receiver that monitors the full 0.5–18 GHz spectrum continuously for a 10‑second dwell time.
Scan mode – a frequency‑scanning receiver that steps through 500 MHz bands across the same spectrum, dropping any pulses that fall outside the currently tuned band.
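
The difference between the two modes can be sketched as a filter over (time of arrival, frequency) pairs: stare mode keeps every pulse, while scan mode keeps only pulses inside the band tuned at their arrival time. The summary does not specify the sweep schedule, so the per-band dwell time, the low-to-high cyclic sweep order, and the non-overlapping band layout below are all assumptions made for illustration.

```python
F_MIN_HZ = 500_000_000       # lower edge of the monitored spectrum (0.5 GHz)
F_MAX_HZ = 18_000_000_000    # upper edge of the monitored spectrum (18 GHz)
BAND_HZ = 500_000_000        # width of one scan band (500 MHz)
N_BANDS = (F_MAX_HZ - F_MIN_HZ) // BAND_HZ  # 35 non-overlapping bands (assumed layout)


def tuned_band(t_s, dwell_s):
    """Band index tuned at time t_s, assuming a cyclic low-to-high sweep."""
    return int(t_s // dwell_s) % N_BANDS


def band_of(freq_hz):
    """Band index containing freq_hz; the top edge is folded into the last band."""
    return min(int((freq_hz - F_MIN_HZ) // BAND_HZ), N_BANDS - 1)


def scan_filter(pulses, dwell_s=0.01):
    """Keep (toa_s, freq_hz) pulses whose frequency lies in the band tuned on arrival.

    dwell_s is a hypothetical per-band dwell time, not a value from the paper.
    """
    return [(t, f) for t, f in pulses if band_of(f) == tuned_band(t, dwell_s)]


pulses = [
    (0.005, 0.7e9),   # band 0 while band 0 is tuned: kept
    (0.005, 9.4e9),   # band 17 while band 0 is tuned: dropped
    (0.015, 1.2e9),   # band 1 while band 1 is tuned: kept
]
kept = scan_filter(pulses)
```

Under these assumptions a scanning receiver observes only a fraction of each emitter's pulses, which is one plausible reason scan-mode trains are much shorter on average than stare-mode trains.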

Both modes generate highly interleaved pulse streams with realistic propagation effects (path loss, antenna gain, ambient noise at –100 dB) and a wide variety of transmitter behaviours (fixed‑frequency, frequency‑hopping, staggered PRI, variable pulse width, etc.). The dataset includes up to 110 emitters per test‑set pulse train, with emitter counts ranging from 0 to 77 (stare) or 0 to 87 (scan). Pulse‑train lengths vary dramatically, from a few hundred pulses to over 52 million pulses, yielding an average of ~789 k pulses per train in stare mode and ~30 k in scan mode.

A deliberate label imbalance is introduced: in some trains a single emitter may account for up to 99.7 % of the pulses, mimicking scenarios where a dominant radar masks weaker, low‑probability‑of‑intercept (LPI) emitters. Statistical analysis shows that most PDW features are largely independent, with only weak correlations (e.g., between frequency and pulse width or amplitude) arising from physical constraints. This independence forces de‑interleaving algorithms to exploit higher‑order patterns across all five dimensions rather than relying on a single dominant cue.
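
As a rough illustration of the imbalance described above, the following sketch measures what share of a train's pulses belongs to its most frequent emitter. The label list is synthetic and the function name is hypothetical; only the 99.7 % figure comes from the summary.

```python
from collections import Counter


def dominant_fraction(labels):
    """Return (emitter_id, share of pulses) for the most frequent emitter label."""
    emitter, count = Counter(labels).most_common(1)[0]
    return emitter, count / len(labels)


# Synthetic train of 1000 pulses: emitter 0 dominates, emitters 1 and 2 are rare.
labels = [0] * 997 + [1] * 2 + [2] * 1
emitter, share = dominant_fraction(labels)
```

A check like this is useful before training, because a clustering method that assigns every pulse to one cluster would label most of such a train correctly while missing the rare emitters entirely.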

To promote standardized evaluation, the authors launch the “Turing De‑interleaving Challenge.” Participants must assign each pulse to its correct emitter without prior knowledge of the number of emitters, and performance is measured primarily by the median V‑measure (the harmonic mean of homogeneity and completeness) across the test set. A Python library is provided for loading, windowing, and saving data, facilitating integration with modern machine learning pipelines.
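
The metric can be sketched in a few lines. The code below is a minimal standard-library implementation of the usual V-measure definition (homogeneity and completeness from conditional entropies, combined by their harmonic mean), followed by a median over several trains. It is not the challenge's official scoring code, and the example label sequences are invented.

```python
from collections import Counter
from math import log
from statistics import median


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """H(a | b) for two equal-length label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum(c / n * log(c / b_counts[y]) for (_, y), c in joint.items())


def v_measure(true_labels, pred_labels):
    """Harmonic mean of homogeneity and completeness."""
    h_true = entropy(true_labels)
    h_pred = entropy(pred_labels)
    # Homogeneity: each predicted cluster contains pulses from a single emitter.
    homogeneity = 1.0 if h_true == 0 else 1.0 - conditional_entropy(true_labels, pred_labels) / h_true
    # Completeness: all pulses from one emitter land in the same cluster.
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred_labels, true_labels) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Invented (true, predicted) label pairs standing in for three pulse trains.
trains = [
    ([0, 0, 1, 1], [1, 1, 0, 0]),  # perfect clustering with permuted cluster ids
    ([0, 0, 1, 1], [0, 0, 0, 0]),  # everything merged into one cluster
    ([0, 0, 1, 1], [0, 0, 1, 2]),  # one emitter split across two clusters
]
scores = [v_measure(t, p) for t, p in trains]
median_score = median(scores)
```

Because the score depends only on how pulses are grouped, cluster identifiers need not match emitter identifiers, which suits a task where the number of emitters is unknown in advance. scikit-learn's `sklearn.metrics.v_measure_score` computes the same quantity and is likely the more convenient choice in practice.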

Key contributions of the work include:

A first‑of‑its‑kind open synthetic radar PDW dataset that captures realistic scale, complexity, and physical effects.
A dual‑receiver paradigm that forces models to handle both continuous broadband monitoring and intermittent frequency‑scanning, reflecting real‑world radar receivers.
Explicit handling of label imbalance and missing pulses, encouraging development of robust, noise‑tolerant clustering methods.
Standardized metrics and a public challenge that create a common benchmark for comparing clustering‑based de‑interleaving approaches.

The authors outline several promising research directions enabled by TSRD: handling variable sequence lengths, unsupervised feature extraction to produce compact representations, hierarchical or attention‑based architectures for scalability, and real‑time streaming clustering for operational deployment. By providing both the data and a clear evaluation framework, TSRD aims to catalyze rapid progress in radar de‑interleaving, ultimately benefiting EW system design, threat assessment, and automated signal intelligence pipelines.
