Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research

📝 Original Info

  • Title: Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
  • ArXiv ID: 2602.16072
  • Date: 2026-02-17
  • Authors: Author information is not provided in the paper body (the original text does not include an author list).

📝 Abstract

Epilepsy affects over 50 million people worldwide, and one-third of patients suffer drug-resistant seizures where surgery offers the best chance of seizure freedom. Accurate localization of the epileptogenic zone (EZ) relies on intracranial EEG (iEEG). Clinical workflows, however, remain constrained by labor-intensive manual review. At the same time, existing data-driven approaches are typically developed on single-center datasets that are inconsistent in format and metadata, lack standardized benchmarks, and rarely release pathological event annotations, creating barriers to reproducibility, cross-center validation, and clinical relevance. With extensive efforts to reconcile heterogeneous iEEG formats, metadata, and recordings across publicly available sources, we present Omni-iEEG, a large-scale, pre-surgical iEEG resource comprising 302 patients and 178 hours of high-resolution recordings. The dataset includes harmonized clinical metadata such as seizure onset zones, resections, and surgical outcomes, all validated by board-certified epileptologists. In addition, Omni-iEEG provides over 36K expert-validated annotations of pathological events, enabling robust biomarker studies. Omni-iEEG serves as a bridge between machine learning and epilepsy research. It defines clinically meaningful tasks with unified evaluation metrics grounded in clinical priors, enabling systematic evaluation of models in clinically relevant settings. Beyond benchmarking, we demonstrate the potential of end-to-end modeling on long iEEG segments and highlight the transferability of representations pretrained on non-neurophysiological domains. Together, these contributions establish Omni-iEEG as a foundation for reproducible, generalizable, and clinically translatable epilepsy research. The project page with dataset and code links is available at omni-ieeg.github.io/omni-ieeg.

📄 Full Content

Epilepsy affects approximately 3.4 million people in the U.S. and nearly 50 million globally, making it one of the most common neurological disorders (Centers for Disease Control and Prevention, 2017; World Health Organization, 2023). About 30% of patients have drug-resistant epilepsy, in which seizures cannot be controlled by medication (Kwan et al., 2010). Most of these patients experience focal seizures originating in specific brain regions (Jobst & Cascino, 2018), and successful treatment relies on accurately identifying the epileptogenic zone (EZ), the brain area crucial for seizure generation. There are two primary interventions aimed at disrupting or removing the EZ: (i) implantation of electrodes for targeted electrical stimulation and (ii) surgical resection of the affected brain tissue (Clinic, 2024). However, both strategies carry significant risks, including cognitive deficits resulting from damage to functionally critical regions (e.g., eloquent cortex) (Helmstaedter & Elger, 2013). Localization of the EZ is typically guided by a combination of inpatient observation, neuroimaging, and intracranial EEG (iEEG), including identification of seizure onset zones (SOZ) and interictal spikes. Yet SOZ-based resections do not guarantee seizure freedom (Rosenow & Lüders, 2001), and non-invasive tests, such as scalp EEG, MRI, PET, and MEG, often fail to localize the EZ with sufficient precision (Jayakar et al., 2016). The current clinical standard involves iEEG studies to identify both the pathological brain regions (i.e., the EZ) and the functional anatomical areas that must be preserved to minimize cognitive side effects. However, this process relies heavily on manual review of extended iEEG recordings, which is time-consuming and subject to low inter-rater reliability (Spring et al., 2017).

Several recent studies apply machine learning to iEEG data, or use machine-learning-refined neurophysiological biomarkers, to facilitate epilepsy research; examples include network analysis (Partamian et al., 2025) and convolutional neural networks (Li et al., 2021b; Zhang et al., 2022b). However, many of these efforts have been validated only on single-institution datasets with limited cohort sizes, restricting their clinical generalizability and robustness. Although several institutions have released datasets (Fedele et al., 2017; Zhang et al., 2025a; Bernabei et al., 2023a; Gunnarsdottir et al., 2022), these differ in data format and are inconsistent in channel naming and demographic metadata. Moreover, benchmarks and evaluation metrics are not standardized across studies, limiting reproducibility and comparability. These inconsistencies hinder the ability to derive translatable insights and to establish reliable benchmarks for model performance across studies.

To address these challenges, we construct Omni-iEEG, a large-scale, standardized dataset for epilepsy research. Omni-iEEG comprises 178 hours of recordings from 302 patients at eight leading epilepsy centers, including the University of California, Los Angeles; Wayne State University; the University Hospital Zurich; the University of Pennsylvania; the University of Miami; the National Institutes of Health; and Johns Hopkins Hospital. All recordings were obtained prior to surgical resection from patients with focal epilepsy, so models can predict postsurgical outcomes from pre-operative data, mirroring the surgical planning setting. Furthermore, because the source iEEG recordings come in different formats with different metadata, board-certified epileptologists (experts) verified and harmonized all recordings into consistent iEEG formats, channel annotations, and clinical metadata. All data were fully de-identified with institutional IRB approval or public-domain release agreements.
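To make the idea of harmonizing channel annotations concrete, here is a minimal Python sketch of normalizing channel labels that arrive in different vendor styles. It is an illustration only: the function name `normalize_channel_name`, the prefixes (`POL`, `EEG`), the suffixes (`-Ref`, `-Avg`), and the target convention (uppercase electrode label plus contact number without zero padding) are all assumptions, not the scheme Omni-iEEG actually uses, which the text above does not describe.

```python
import re

# Hypothetical vendor prefixes and reference suffixes (assumptions for
# illustration; not taken from the Omni-iEEG release).
_PREFIX = re.compile(r"^(?:POL|EEG)\s+", re.IGNORECASE)
_SUFFIX = re.compile(r"-(?:REF|AVG)$", re.IGNORECASE)
# Electrode label (letters, optional prime mark) followed by a contact number.
_LABEL = re.compile(r"^([A-Za-z']+)\s*0*(\d+)$")


def normalize_channel_name(raw: str) -> str:
    """Map a raw channel label to an assumed canonical form such as 'LA1'."""
    name = raw.strip()
    name = _PREFIX.sub("", name)   # drop a leading 'POL ' or 'EEG '
    name = _SUFFIX.sub("", name)   # drop a trailing '-Ref' or '-Avg'
    name = name.strip()
    match = _LABEL.match(name)
    if match is None:
        # Not an electrode/contact pair (e.g. 'ECG'): only uppercase it.
        return name.upper()
    electrode, contact = match.groups()
    # int() removes any remaining zero padding from the contact number.
    return f"{electrode.upper()}{int(contact)}"


if __name__ == "__main__":
    for label in ["POL LA01-Ref", "EEG la 1", "RH10", "ECG"]:
        print(f"{label!r} -> {normalize_channel_name(label)!r}")
```

Under these assumed rules, differently formatted labels for the same contact collapse to a single key, which is what allows channel-level metadata such as SOZ or resection status to be joined across centers. Real recordings would likely need per-center rules reviewed by an expert, as the paper describes.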

Beyond the recordings and metadata, Omni-iEEG also releases clinically meaningful pathological biomarkers, extensively annotated by board-certified experts to support robust biomarker research. We focus on one of the most promising clinically utilized iEEG biomarkers for localizing the epileptogenic zone: high-frequency oscillations (HFOs). HFOs have garnered growing interest in both the clinical (Gotman, 2010; Zweiphenning et al., 2022; Frauscher et al., 2018b) and computational (Kuroda et al., 2021; Sciaraffa et al., 2020; Chaibi et al., 2014; Daida et al., 2025) domains. Despite their potential, the clinical utility of HFOs remains contested due to persistent challenges in distinguishing pathological from physiological HFOs (Zijlmans et al., 2012; Zweiphenning et al., 2022), as well as issues such as artifact contamination and inter-rater variability (Nariai et al., 2018; Spring et al., 2017; Zhang et al., 2025b). Specifically, Omni-iEEG releases annotations of candidate HFOs, focusing on the most widely accepted pathological definition: HFOs co-occurring with spikes (spkHFO). These annotations are conducted on machine-generated detections from multiple widely used HFO detection algorithms (Navarrete et al., 2016; Ding et al.,

Reference

This content is AI-processed based on open access ArXiv data.
