Privacy Risks in Time Series Forecasting: User- and Record-Level Membership Inference
Membership inference attacks (MIAs) aim to determine whether specific data were used to train a model. While extensively studied on classification models, their impact on time series forecasting remains largely unexplored. We address this gap by introducing two new attacks: (i) an adaptation of multivariate LiRA, a state-of-the-art MIA originally developed for classification models, to the time-series forecasting setting, and (ii) a novel end-to-end learning approach called Deep Time Series (DTS) attack. We benchmark these methods against adapted versions of other leading attacks from the classification setting. We evaluate all attacks in realistic settings on the TUH-EEG and ELD datasets, targeting two strong forecasting architectures, LSTM and the state-of-the-art N-HiTS, under both record- and user-level threat models. Our results show that forecasting models are vulnerable, with user-level attacks often achieving perfect detection. The proposed methods achieve the strongest performance in several settings, establishing new baselines for privacy risk assessment in time series forecasting. Furthermore, vulnerability increases with longer prediction horizons and smaller training populations, echoing trends observed in large language models.
💡 Research Summary
This paper addresses a largely unexplored area of privacy research: membership inference attacks (MIAs) against time‑series forecasting models. While MIAs have been extensively studied for classification tasks, their impact on models that predict future values of multivariate sequences has received little attention. The authors close this gap by proposing two novel attacks and systematically evaluating them on realistic datasets and state‑of‑the‑art forecasting architectures.
Proposed attacks
- Multi‑Signal LiRA – an adaptation of the likelihood‑ratio attack (LiRA) that aggregates several statistical signals extracted from the model’s forecasts. The signals include standard error metrics (MSE, MAE, SMAPE), structural characteristics (trend and seasonality obtained via a 2‑D DFT), and representation‑based distances (L2 distance between TS2Vec embeddings of predicted and true sequences). Shadow models are trained on data splits that mimic the target’s training distribution. For each signal the mean and variance are estimated separately for “in‑shadow” (records that were part of the shadow’s training set) and “out‑shadow” groups, and a multivariate Gaussian model is fitted. The attack score is the likelihood ratio of the observed signal vector under the in‑ versus out‑distribution. Because estimating a full covariance matrix would require many shadow models, the authors approximate Σ by a diagonal matrix, which proves sufficient in practice.
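Under the diagonal approximation, the multivariate likelihood ratio factorizes into a sum of per-signal univariate log-ratios. The sketch below illustrates that scoring rule in plain Python; the function names, the `(mean, var)` bookkeeping, and the small variance floor are our own illustration, not the paper's code:

```python
import math

def gaussian_logpdf(x, mu, var):
    # Log-density of N(mu, var); a small floor on var avoids division by zero.
    var = max(var, 1e-12)
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def lira_score(signals, in_stats, out_stats):
    """Diagonal-Gaussian likelihood-ratio score over a vector of attack signals.

    signals   : per-record signal values (e.g. MSE, MAE, SMAPE, embedding distance)
    in_stats  : (mean, var) per signal, estimated from in-shadow records
    out_stats : (mean, var) per signal, estimated from out-shadow records
    Returns the log-likelihood ratio; larger values suggest membership.
    """
    score = 0.0
    for x, (mu_in, var_in), (mu_out, var_out) in zip(signals, in_stats, out_stats):
        score += gaussian_logpdf(x, mu_in, var_in) - gaussian_logpdf(x, mu_out, var_out)
    return score
```

A record whose signals sit close to the in-shadow means and far from the out-shadow means gets a large positive score; thresholding the score yields the membership decision.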
- Deep Time Series (DTS) attack – an end‑to‑end classifier‑based approach. A small neural network is trained to discriminate between member and non‑member records using the raw input‑forecast pair as features. The network automatically learns to exploit subtle differences in loss trajectories, error patterns, and temporal dynamics without hand‑crafted signal engineering.
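To make the end-to-end idea concrete, the toy sketch below trains a single logistic unit on flattened feature vectors with 0/1 membership labels. This deliberately substitutes the simplest possible learner for the paper's small neural network; only the principle — learning the membership signal directly from the raw pair, with no hand-crafted statistics — carries over:

```python
import math

def train_dts(features, labels, lr=0.1, epochs=200):
    """Toy stand-in for the DTS attack: one logistic unit trained by SGD on
    flattened (input, forecast) feature vectors with membership labels.
    The paper uses a small neural network; this keeps only the end-to-end idea."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss w.r.t. the pre-activation z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def dts_score(w, b, x):
    # Predicted membership probability for one feature vector.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

In the real attack, the feature vector would be the concatenated input window and model forecast (and the network deep enough to pick up loss-trajectory and temporal patterns), but the training loop has the same shape.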
Both attacks are evaluated in two threat models: (i) record‑level MIAs, where the adversary must decide whether a single (X, Y) pair was in the training set, and (ii) user‑level MIAs, where the adversary receives a collection of records belonging to a single user and must infer whether that user’s data contributed to training. The user‑level setting reflects realistic privacy concerns in domains such as healthcare, where a patient’s entire longitudinal record may be exposed.
Experimental setup
- Datasets: TUH‑EEG (multivariate electroencephalogram recordings) and ELD (electricity load data). Both contain multiple variables per time step and are split into overlapping windows to create training records.
- Forecasting models: (a) a classic LSTM network and (b) N‑HiTS, a recent hierarchical interpolation architecture that achieves strong long‑horizon performance.
- Training regime: For each experiment the target model is trained on a random subset of users (I_train). Separate shadow models (K ≈ 10–20) are trained on disjoint user splits to emulate the target’s behavior.
- Variables explored: prediction horizon H (5, 10, 20 steps), number of records per user n (10, 30, 50), and model complexity.
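The shadow-model setup hinges on partitioning users into disjoint groups, one per shadow model, so that each shadow sees a training distribution like the target's without sharing users. A minimal sketch of such a split (the round-robin scheme and seed handling are our illustration; the paper only specifies that the user splits are disjoint):

```python
import random

def shadow_splits(user_ids, k, seed=0):
    """Partition users into k disjoint groups for shadow-model training
    (K ~ 10-20 shadow models in the paper's experiments)."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    # Round-robin assignment keeps the k groups near-equal in size.
    return [ids[i::k] for i in range(k)]
```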
Key findings
- Vulnerability is real – Both LSTM and N‑HiTS are susceptible to MIAs. Even with modest shadow model budgets, the attacks achieve high AUC scores.
- User‑level attacks are dramatically stronger – In the user‑level game, the attacks often reach near‑perfect detection (≥ 99% accuracy). Aggregating evidence across a user’s multiple records amplifies the signal, making it far easier to infer membership than for a single record.
- Prediction horizon matters – Longer horizons increase attack success. Extending H from 5 to 20 steps gives the model more opportunity to expose memorized patterns in its error trajectory, boosting the performance of both LiRA and DTS.
- Training data size matters – Smaller training populations (fewer users or fewer records per user) lead to higher over‑fitting and consequently higher MIA success rates. This mirrors trends observed in large language models.
- Attack performance – The proposed Multi‑Signal LiRA outperforms adapted versions of the original LiRA and RMIA that were designed for classification, especially when multiple signals are combined. The DTS attack achieves the best record‑level results, surpassing all baselines by 5–10% in AUC.
- Model architecture trade‑off – N‑HiTS, while delivering superior forecasting accuracy, exhibits slightly higher privacy leakage than LSTM, suggesting a trade‑off between predictive power and memorization.
Implications and recommendations
- Privacy‑by‑design for time‑series services: Practitioners deploying forecasting models in sensitive domains (e.g., medical monitoring, smart‑grid load forecasting, financial time‑series) should treat user‑level membership inference as a primary threat.
- Mitigation strategies: Differential privacy mechanisms, regularization techniques (early stopping, weight decay), and data augmentation can reduce memorization. Balancing user‑level contribution (e.g., limiting the number of records per user or applying per‑user sampling quotas) can also lower leakage.
- Benchmarking: The authors’ attacks, especially Multi‑Signal LiRA and DTS, should become standard benchmarks for privacy audits of forecasting models, analogous to how ImageNet‑based attacks are used for vision models.
- Future work: Extending formal privacy guarantees (e.g., DP‑SGD) to hierarchical models like N‑HiTS, exploring adaptive shadow‑model generation, and investigating defenses that specifically target the multi‑signal patterns identified in this study.
In summary, the paper convincingly demonstrates that time‑series forecasting models are not immune to membership inference attacks. The introduced attacks set new baselines for privacy risk assessment in this domain, and the extensive empirical analysis provides clear guidance for both researchers and practitioners on how to evaluate and mitigate these risks.