Bridging the High-Frequency Data Gap: A Millisecond-Resolution Network Dataset for Advancing Time Series Foundation Models



Subina Khanal 1, Seshu Tirupathi 2, Merim Dzaferagic 3, Marco Ruffini 3, Torben Bach Pedersen 1

1 Department of Computer Science, Aalborg University, Aalborg, Denmark. 2 IBM Research Europe, Dublin, Ireland. 3 Trinity College Dublin, The University of Dublin, Dublin, Ireland. Correspondence to: Subina Khanal <subinak@cs.aau.dk>.

Preprint. March 18, 2026.

Abstract

Time series foundation models (TSFMs) require diverse, real-world datasets to adapt across varying domains and temporal frequencies. However, current large-scale datasets predominantly focus on low-frequency time series with sampling intervals, i.e., time resolution, in the range of seconds to years, hindering their ability to capture the nuances of high-frequency time series data. To address this limitation, we introduce a novel dataset that captures millisecond-resolution wireless and traffic conditions from an operational 5G wireless deployment, expanding the scope of TSFMs to incorporate high-frequency data for pre-training. Further, the dataset introduces a new domain, wireless networks, thus complementing existing, more general domains like energy and finance. The dataset also provides use cases for short-term forecasting, with prediction horizons spanning from 100 milliseconds (1 step) to 9.6 seconds (96 steps). By benchmarking traditional machine learning models and TSFMs on predictive tasks using this dataset, we demonstrate that most TSFM model configurations perform poorly on this new data distribution in both zero-shot and fine-tuned settings. Our work underscores the importance of incorporating high-frequency datasets during pre-training and forecasting to enhance architectures, fine-tuning strategies, generalization, and robustness of TSFMs in real-world applications.

1. Introduction

Foundation models (FMs) have significantly enhanced machine learning (ML) by utilizing large-scale pre-training on diverse datasets, enabling them to generalize across a wide array of tasks and domains (Thakur, 2024). Recently, time series foundation models (TSFMs) have attracted more interest due to their capability to handle complex temporal tasks, with a particular focus on generalizing across varying time scales and domains, including forecasting, anomaly detection, and classification (Liang et al., 2024). However, developing effective TSFMs requires access to datasets that capture diverse real-world scenarios at varying frequencies and across different domains. The blue dots in Fig. 1 demonstrate that the existing benchmark datasets predominantly focus on low-frequency time series with sampling intervals in the range of seconds to years.

Hence, the focus of this paper is to develop and benchmark a high-frequency wireless network dataset at millisecond resolution, comparing the performance of TSFMs with shallow machine learning models, so as to enable new architectures and fine-tuning strategies that extend to high-frequency wireless network use cases and potentially provide generalizable, diverse characteristics that can improve the accuracy of TSFMs on existing datasets as well.

The main contributions of this paper and dataset are: (1) Extending the scope of pre-training and generalizability for state-of-the-art TSFMs by providing a dataset at millisecond resolution (Fig. 1). (2) Introduction of a new domain, namely wireless networks, to the existing domains of open datasets (Fig. 2).
(3) Applications with short-term forecasting, with prediction horizons spanning from 100 milliseconds (1 step) to 9.6 seconds (96 steps) (Fig. 3).

The rest of the paper is organized as follows. Related work is discussed in Section 2. Section 3 provides a detailed description of the 5G network data and its characteristics. Section 4 presents the details of the models benchmarked, including experimental evaluation and analysis. Section 5 outlines the ablation study. Finally, in Section 6, we conclude and provide directions for future research.

Figure 1. Comparison of timescales and dataset sizes for standard existing datasets used for pre-training (Table 14 in (Aksu et al., 2024)) as compared with the new benchmark. The red dot represents the new dataset that is introduced in this paper.

Figure 2. Comparison of existing domains for pre-training (Table 14 in (Aksu et al., 2024)) with the new benchmark. The red bar represents the new dataset that is introduced in this paper.

Figure 3. Comparison of prediction lengths of standard test data (Table 2 in (Aksu et al., 2024)) as compared with the new benchmark. The red bar represents the new dataset that is introduced in this paper.

2. Related Work

Time Series Foundation Models (TSFMs) have surged in recent years, with their architectures continually evolving to achieve improved performance in both zero-shot and fine-tuned scenarios. Notably, several TSFMs have garnered widespread attention within the community, including Chronos (Ansari et al., 2024), TTM (Ekambaram et al., 2024), Moirai (Woo et al., 2024), TimesFM (Das et al., 2024), and Time-MOE (Xiaoming et al., 2025). These models can be broadly categorized into two distinct classes: transformer-based and non-transformer-based architectures (Liang et al., 2024). Our work complements these developments by introducing a high-frequency, real-world dataset from a novel domain (wireless networks), which provides an additional and challenging benchmark for evaluating the robustness and adaptability of TSFMs.

Figure 4. Target variable (Downlink Bitrate; mac_dl_brate): (a) STL decomposition, (b) Rolling mean and standard deviation, (c) Residual Q-Q plot, (d) Signal-to-Noise Ratio (dB).

Table 1. Summary of STL decomposition of all datasets.

Dataset     | Trend                                               | Seasonality                                         | Residuals
Network     | Unstable, step-like shifts.                         | Weak short-term periodic patterns; hidden by noise. | Sharp spikes; bursts of noise; lots of unpredictable variation.
ETTh1       | Mostly steady with small rises and falls.           | Small, regular repeating pattern.                   | Tiny random changes.
Electricity | Remains steady throughout.                          | Strong repeating pattern.                           | Occasional bursts of noise.
Weather     | Almost flat but interrupted by sudden sharp spikes. | No seasonality.                                     | Mostly small, but with rare sudden jumps.
Traffic     | Slowly increasing trend over time.                  | Strong, regular repeating pattern.                  | Small random changes.

Transformer-based TSFMs largely follow established self-supervised (e.g., Moirai) or supervised transformer frameworks (e.g., TimeXer), which have garnered significant recognition within the field. In contrast, non-transformer-based TSFMs leverage alternative machine learning models such as Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) (e.g., TTMs). More recent efforts have also focused on enhancing diffusion-based methods (Kollovieh et al., 2023; Su et al., 2025) for modeling and generating data with different characteristics, which is crucial for generative time series forecasting. Furthermore, to address statistical heterogeneity in time series foundation model training and to ensure robust generalization, a decentralized cross-domain model fusion approach based on Federated Learning (FL) has been explored in (Chen et al., 2025).

The successful deployment of these TSFMs for accurate zero-shot forecasting relies on the development of pre-trained models that have undergone extensive training on datasets characterized by diverse patterns and resolution properties. This emphasis on data diversity is critical, as it enables TSFMs to exhibit generalizability across a wide range of scenarios and capture complex temporal dynamics with enhanced accuracy.
Notably, prior research has underscored the importance of resolution and domain diversity in pre-trained models for optimizing performance (e.g., Section 4 in (Ansari et al., 2024) for Chronos, and Section 4.9 and Fig. 3 in (Ekambaram et al., 2024) for TTM).

In practice, a range of open datasets is available for TSFMs, which collectively provide the necessary heterogeneity to ensure that these models generalize effectively to out-of-domain datasets and real-world applications. Specifically, popular datasets such as those from Monash (Godahewa et al., 2021), LIBCITY (Wang et al., 2021), and the UCI Machine Learning archive (Asuncion et al., 2007) have become foundational in pre-training TSFMs and are widely utilized for assessing model performance. These datasets not only serve as data for pre-trained models but also enable out-of-domain testing of pre-trained models when a subset of the datasets is not considered for pre-training. We position our dataset as a complementary resource to these existing open datasets, specifically targeting the gap for millisecond-level time series from communication networks for both training and out-of-domain evaluation of TSFMs. Our dataset directly addresses this need for diversity by introducing a previously underrepresented domain with very fine temporal granularity, thereby contributing to a better understanding of the generalization capabilities of TSFMs when applied to high-frequency wireless data.

This paper provides a benchmark dataset that can fill the critical gap for high-frequency data for TSFMs. In contrast to other high-frequency datasets, our network dataset provides carefully curated use cases for univariate and multivariate forecasting problems ideally suited for TSFMs, along with an initial benchmark study on this dataset.

3. Dataset

3.1. Dataset Overview

We utilize a time series dataset of 5G Radio Access Network (RAN) Performance Measurements (PMs) collected from a real-world deployment of a 5G Open Radio Access Network (O-RAN) within the OpenIreland testbed. O-RAN introduces a modular and open architecture that decomposes the traditional monolithic RAN into standardized, interoperable components (i.e., the Central Unit (CU), Distributed Unit (DU), and Radio Unit (RU)), facilitating multi-vendor deployments and software-driven control. Central to O-RAN's programmability is the near-Real-Time RAN Intelligent Controller (near-RT RIC), which enables rapid, feedback-driven network optimization.

The data was captured using software-defined radios (Ettus USRPs) configured as a base station and multiple user equipments (UEs). To simulate diverse real-world usage, the setup incorporated various mobility profiles (static, pedestrian, car, bus, and train) and generated traffic from both benign applications (web browsing, VoIP, IoT, and video streaming) and malicious activities (DDoS-Ripper, DoS-Hulk, PortScan, Slowloris). PMs were collected at the base station side and span a broad set of physical and medium access control layer features, including the Channel Quality Indicator (CQI), Modulation and Coding Scheme (MCS), Signal-to-Interference-plus-Noise Ratio (SINR), signal strength (RSSI), buffer occupancy, and packet delivery statistics. In the dataset, each UE is associated with a unique identifier, denoted as ue_ident, which serves to distinguish individual UEs across all collected traces. This identifier remains consistent for a given UE, regardless of the mobility pattern or traffic class associated with its traces. The resulting dataset enables temporal modeling of RAN dynamics under realistic operational conditions.
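To make the role of ue_ident concrete, the sketch below groups a toy PM table into one contiguous series per UE. The rows and the timestamp column name are invented for illustration; only ue_ident and the downlink-bitrate field mac_dl_brate are actual dataset fields.

```python
import pandas as pd

# Toy stand-in for the PM traces: in the real dataset each row is one
# base-station measurement tagged with the UE identifier (ue_ident).
pms = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01 00:00:00.000", "2024-01-01 00:00:00.001",
         "2024-01-01 00:00:00.000", "2024-01-01 00:00:00.001"]),
    "ue_ident": ["ue_01", "ue_01", "ue_02", "ue_02"],
    "mac_dl_brate": [1.2e7, 1.3e7, 0.8e7, 0.9e7],
})

# ue_ident is stable across traces, so grouping by it recovers one
# time-ordered series per UE for temporal modeling.
per_ue = {
    ue: g.sort_values("timestamp").set_index("timestamp")["mac_dl_brate"]
    for ue, g in pms.groupby("ue_ident")
}
print(sorted(per_ue), len(per_ue["ue_01"]))
```

Because the identifier is consistent across mobility patterns and traffic classes, the same grouping works regardless of which filtered subset of traces is loaded.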
This data context is particularly well-suited for very short-term forecasting, where the goal is to predict network states (e.g., throughput, channel quality, traffic class) over a short horizon ranging from milliseconds to a few seconds. Such forecasting enables predictive control strategies in scenarios characterized by rapid fluctuations in load, mobility, or interference (see Section 3.2 for dataset characteristics). Short-term throughput predictions enhance scheduling efficiency and application-level rate control, especially in latency-sensitive services like cloud gaming or interactive video. Forecasting CQI, for example, allows the network to proactively steer users to cells with better anticipated radio conditions, support load-aware handovers, and preemptively adjust adaptive bitrate algorithms for video streaming. Likewise, anticipating traffic class transitions supports early enforcement of QoS policies, dynamic resource allocation (e.g., in network slicing), and intrusion detection mechanisms capable of identifying malicious activity before it significantly degrades the service.

3.2. Dataset Characteristics

While Section 3.1 provides a broad overview of the 5G network dataset, our analysis and experiments are carried out on a carefully filtered subset of the data. We filter the raw data on the basis of the mobility pattern and benign traffic class; in particular, we use the static mobility pattern for the video streaming traffic class. Therefore, the results presented here represent the characteristics of the filtered dataset rather than those of the complete dataset.

The time series of the 5G network demonstrates several important characteristics. Fig. 4a shows the STL (Seasonal and Trend decomposition using Loess) of the time series, which separates the original data (labeled Org. in Fig. 4a) into distinct structural components, i.e., the trend, seasonal, and residual components. Here, the trend component reflects the underlying structure of the series; however, it appears unstable, characterized by step-like shifts rather than a smooth trajectory. The seasonal component captures only weak short-term periodic patterns, which are easily obfuscated by the stronger irregular behavior in the data. The residual component contains the remaining variability, including sharp spikes and bursts of endogenous noise that cannot be explained by trend or seasonality. Similarly, as illustrated in Fig. 4b, both the rolling mean and the rolling standard deviation change substantially over time, confirming that the process is non-stationary and heteroskedastic, i.e., the statistical properties of the data are not constant. The data exhibit extreme outlier events that are more prominent in specific time periods than randomly spread throughout the series. The autocorrelation analysis (see Section 9) reveals strong temporal persistence with slow decay, confirming the clustering of extreme events observed in the data. In Fig. 4c, the residuals deviate strongly from the reference line, particularly in the tails, indicating a heavy-tailed distribution. Finally, the signal-to-noise ratio (SNR) analysis in Fig. 4d provides a quantitative view of this instability. The SNR values highlight that the series is dominated by short-term periodic structures (high SNR in periods 2-20), while medium-term cycles exist but are weaker, and long-term seasonality is essentially absent (SNR near zero and even negative beyond period 600). Overall, the time series is mostly influenced by short-term changes, bursts of volatility, and clustered anomalies, rather than stable long-term trends.
Next, we provide a summary comparison between our 5G network dataset and other common pre-training datasets (further experimental details are presented in Appendix A.4). The pre-training datasets used for comparison are: ETTh1 (Zhou et al., 2021), an hourly subset of the Electricity Transformer Temperature (ETT) dataset, containing two years of transformer oil temperature and related power load data from two counties in China; Electricity (Wu et al., 2021), which contains the hourly electricity consumption (in kWh) of 321 clients, recorded between 2012 and 2014; Weather (Wu et al., 2021), data from 2020 in Germany, recorded every 10 minutes, with 21 indicators such as air temperature, humidity, and wind speed; and Traffic (Wu et al., 2021), a collection of hourly road occupancy rates (0-1) from sensors on San Francisco Bay Area freeways, collected by the California Department of Transportation between 2015 and 2016. Table 1 summarizes the key differences among the datasets based on their STL decomposition, highlighting that our dataset is notably different due to its unstable trend, weak seasonality, and spiky residuals. Appendix A.4 includes other data characteristics, such as temporal dependencies and statistical variability.

Table 2. Features used in the multivariate setting.

Feature | Description
CQI     | Channel Quality Indicator
MCS     | Modulation and Coding Scheme
pkt_ok  | Number of packets sent
pkt_nok | Number of packets dropped

4. Benchmark

In this section, we provide a comprehensive analysis of the benchmarked models (as explained in Section 4.1) for the considered target variable, downlink bitrate (bitrate), in the 5G network dataset. In the multivariate setting, all considered models use four input features, with descriptions provided in Table 2.
Section 4.3 provides implementation details, including the data processing pipeline, which reflects our consideration of only a subset of the data to illustrate the impact of this high-frequency dataset.

4.1. Models benchmarked

We selected three state-of-the-art tree-based ensemble models: Random Forest (RF) (Breiman, 2001), implemented using Scikit-learn; eXtreme Gradient Boosting (XGBoost, hereafter XGB) (Chen & Guestrin, 2016); and Adaptive Random Forest (ARF) (Gomes et al., 2018), implemented using the River library. As an additional online baseline, we included a simple incremental linear regression model, Online LR (OLR) (Ouhamma et al., 2021), also implemented using the River library. Similarly, we selected a non-parametric baseline, referred to as the naive forecast (Naive) (Beck et al., 2025), for a fair evaluation on high-frequency data.

Table 3. Parameters used in model training.

(a) Hyper-parameters specific to ARF.

Parameter    | Univariate | Multivariate
n_models     | 10         | 20
max_features | None       | 0.5
grace_period | 50         | 100
max_depth    | None       | 5

(b) Common parameters for all shallow models.

Parameter          | Value
Target variable    | Downlink bitrate
No. of features    | 4
Mobility pattern   | Static
Past observations  | 5
Prediction horizon | 96
Train set:Test set | 80:20

In addition, we evaluated three time series foundation models (TSFMs): TinyTimeMixer (TTM) (Ekambaram et al., 2024), Chronos (Ansari et al., 2024), and Lag-Llama (Rasul et al., 2023), each specifically designed for time series forecasting. TTM is an extremely lightweight pre-trained model with effective transfer learning capabilities, based on the lightweight TSMixer architecture. Likewise, Chronos is a language modeling framework for pre-trained probabilistic time series models.
In this work, we specifically adopted the Chronos-bolt-small variant (46M parameters) as the representative Chronos model for our experiments. Lag-Llama is a general-purpose foundation model for univariate probabilistic time series forecasting, based on a decoder-only transformer architecture that uses lags as covariates.

4.2. System specification

The experiments are carried out on a local machine with the following hardware and software specifications: Operating System: Microsoft Windows 10 Enterprise, Version 22H2; Processor: 11th Gen Intel® Core™ i7-1165G7 CPU @ 2.80 GHz with 4 cores and 8 threads; Memory: 32 GB RAM.

4.3. Implementation details

Pre-processing: During data pre-processing, we changed the time resolution of our dataset by converting the original millisecond-level observations into 100-millisecond intervals. This choice reflects practical constraints in O-RAN networks, where collecting performance measurements at every millisecond would impose excessive overhead. For shallow models, input sequences are constructed using a sliding-window approach, where past observations within a fixed window are used to predict future target values. For TSFMs, we follow the original implementation protocols described in their respective papers. The prediction horizons range from 100 milliseconds (1 step) up to 9.6 seconds (96 steps). Short-term horizons are often straightforward, as the target variable (i.e., bitrate) tends to remain stationary across very small timescales. In contrast, longer horizons provide more meaningful insights, enabling applications such as video streaming to anticipate changes in bitrate and proactively adjust parameters like encoding level. These long-horizon forecasts are valuable both for adapting Quality of Service (QoS) and for estimating the stability of the bitrate, that is, how frequently it is expected to change.

Model parameters: Table 3 summarizes the parameters used during model training.
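A minimal sketch of this pre-processing, assuming mean aggregation into 100 ms bins (the paper does not state the aggregation function) together with the sliding-window framing used for the shallow models; the series and window sizes are illustrative.

```python
import numpy as np
import pandas as pd

# Millisecond-level observations aggregated into 100 ms intervals.
ms = pd.Series(np.arange(1000, dtype=float),
               index=pd.date_range("2024-01-01", periods=1000, freq="ms"))
coarse = ms.resample("100ms").mean()
print(len(coarse))  # 1000 ms of data / 100 ms bins = 10 observations

def make_windows(values, n_past, horizon):
    """Sliding windows: n_past observations -> next `horizon` targets."""
    X, Y = [], []
    for i in range(len(values) - n_past - horizon + 1):
        X.append(values[i:i + n_past])
        Y.append(values[i + n_past:i + n_past + horizon])
    return np.array(X), np.array(Y)

# Paper settings are 5 past observations and a 96-step horizon; the
# horizon is shrunk here so the toy series yields a few windows.
X, Y = make_windows(coarse.to_numpy(), n_past=5, horizon=2)
print(X.shape, Y.shape)
```

With the paper's settings, each input row would hold 5 past 100 ms observations and each target row the next 96 values (9.6 seconds).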
For the common parameters shared across all models, offline experiments were conducted to select optimal values based on prediction accuracy, ensuring fair benchmarking conditions. Both RF and XGB used these optimized common parameters along with their respective default model-specific hyper-parameters, without additional tuning. For the ARF model, the same optimized common parameters were used, and model-specific hyper-parameter tuning was performed for the multivariate setting using random search, with parameter ranges detailed in Table 3a. The ARF configuration with the best Root Mean Square Error (RMSE) was selected for the final evaluation. Furthermore, the prediction horizon for all models is set to 96 steps, with each step representing a 100-millisecond interval, which corresponds to predicting the next 9.6 seconds (9600 ms). The dataset is divided into 80% training and 20% testing, preserving temporal order. As the data for each user are sequential and not mixed, this split naturally keeps the sequence of each user intact, preventing data leakage from future observations into training.

Model training: For RF and XGB, we utilized Scikit-learn's MultiOutputRegressor wrapper to enable direct multi-step forecasting. For TSFMs, we follow the original implementation protocols described in their respective papers. Each model was trained with three different random seeds (42, 99, and 123) to ensure reproducibility and to assess variability in performance.

Post-processing: Both Chronos and Lag-Llama are trained to predict a fixed-length horizon H from a given context window. By default, these models produce forecasts only for the final prediction window of each series and skip series that do not meet the minimum context length. This default evaluation framework differs from the shallow models, which generate forecasts for every test sample.
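The direct multi-step setup with MultiOutputRegressor can be sketched as follows, on a synthetic random-walk series with shrunken window sizes (the real configuration uses 5 past observations and a 96-step horizon):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=400))  # illustrative random walk

# Frame the series as (past window -> next `horizon` values), then fit
# one regressor per output step, i.e. direct multi-step forecasting.
n_past, horizon = 5, 3
n_win = len(series) - n_past - horizon + 1
X = np.stack([series[i:i + n_past] for i in range(n_win)])
Y = np.stack([series[i + n_past:i + n_past + horizon] for i in range(n_win)])

split = int(0.8 * len(X))  # 80:20 split, temporal order preserved
model = MultiOutputRegressor(RandomForestRegressor(n_estimators=20,
                                                   random_state=0))
model.fit(X[:split], Y[:split])
pred = model.predict(X[split:])
print(pred.shape)  # one row of `horizon` direct forecasts per test window
```

Unlike recursive forecasting, each of the H output steps gets its own fitted estimator, so errors do not compound through repeated one-step predictions.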
To ensure a consistent comparison across models, we implemented a rolling evaluation procedure for both Chronos and Lag-Llama. Specifically, starting at each timestamp t, we provide the model with all historical data available up to t and generate the next H steps. We then slide the starting point forward by one time step and repeat the prediction until the end of the series. This produces overlapping multi-step forecasts aligned with each test timestamp, allowing direct comparison with the shallow models.

Table 4. Performance metrics of benchmarked models.

Model                   | Univariate RMSE | Univariate MAE  | Multivariate RMSE | Multivariate MAE
RF                      | 0.0344 ± 0.0001 | 0.0227 ± 0.0001 | 0.0342 ± 0.0001   | 0.0226 ± 0.0001
XGB                     | 0.0354 ± 0.0001 | 0.0232 ± 0.0001 | 0.0354 ± 0.0001   | 0.0231 ± 0.0001
ARF                     | 0.0270 ± 0.0002 | 0.0189 ± 0.0001 | 0.0175 ± 0.0007   | 0.0130 ± 0.0005
Naive                   | 0.0418 ± 0.0000 | 0.0240 ± 0.0000 | 0.0418 ± 0.0000   | 0.0240 ± 0.0000
OLR                     | 0.0551 ± 0.0000 | 0.0308 ± 0.0000 | 0.0555 ± 0.0000   | 0.0310 ± 0.0000
TTM (Zero-shot)         | 0.0359 ± 0.0000 | 0.0230 ± 0.0000 | 0.0359 ± 0.0000   | 0.0230 ± 0.0000
TTM (Fine-tuning)       | 0.0371 ± 0.0015 | 0.0237 ± 0.0011 | 0.0393 ± 0.0007   | 0.0250 ± 0.0004
Chronos (Zero-shot)     | 0.0313 ± 0.0000 | 0.0185 ± 0.0000 | 0.0273 ± 0.0000   | 0.0181 ± 0.0000
Chronos (Fine-tuning)   | 0.0281 ± 0.0000 | 0.0178 ± 0.0000 | 0.0253 ± 0.0000   | 0.0176 ± 0.0000
Lag-Llama (Zero-shot)   | 0.0617 ± 0.0002 | 0.0384 ± 0.0001 | -                 | -
Lag-Llama (Fine-tuning) | 0.0474 ± 0.0039 | 0.0268 ± 0.0009 | -                 | -

Figure 5. Actual vs. predicted bitrate values in the univariate setting.

Figure 6. Actual vs. predicted bitrate values in the multivariate setting.

Table 5. Performance metrics for different fine-tuning strategies for TTM.

Fine-tuning Strategy      | RMSE   | MAE
Head-only fine-tuning     | 0.0413 | 0.0270
Adapter-based fine-tuning | 0.0522 | 0.0334

To extend Chronos, which is inherently a univariate model, to the multivariate setting, we use AutoGluon-TimeSeries (AG-TS) covariate regressors (Shchur et al., 2023). The covariate regressor is a tabular model trained on known covariates and static features to predict the target at each time step. Its predictions are subtracted from the target series, and Chronos then forecasts the residuals.
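A simplified sketch of this covariate-regressor scheme, with a linear model standing in for the tabular regressor and a persistence forecast standing in for Chronos (both stand-ins, along with the synthetic covariates, are assumptions for illustration only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, horizon = 300, 4
covs = rng.normal(size=(n, 2))  # e.g. CQI and MCS stand-ins
target = covs @ np.array([2.0, -1.0]) \
    + np.cumsum(rng.normal(scale=0.1, size=n))

# 1) Tabular regressor: known covariates -> target at each time step.
reg = LinearRegression().fit(covs[:-horizon], target[:-horizon])

# 2) Subtract its predictions; what remains is the residual series.
resid = target[:-horizon] - reg.predict(covs[:-horizon])

# 3) Forecast the residuals (persistence replaces Chronos here), then
#    add back the regressor's output on the known future covariates.
resid_fc = np.full(horizon, resid[-1])
final_fc = reg.predict(covs[-horizon:]) + resid_fc
print(final_fc.shape)
```

The decomposition lets a univariate forecaster operate on the residual series while the covariate-dependent part of the target is explained separately.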
For each rolling window , we create a future cov ariate table that matches the next H time steps immediately following the end of the current window . This table contains the values of the exoge- nous v ariables for those steps, allo wing Chronos to use both the historical target and the future covariates to generate accurate forecasts. 4.4. Results In this section, we e valuate the performance of shallo w mod- els and TSFMs in both uni variate and multiv ariate settings. T able 4 presents the performance of the benchmarked shal- low models and TSFMs, e v aluated using RMSE and Mean Absolute Error (MAE). In both settings, ARF consistently outperforms the other shallow models and TSFMs. The performance gain is consistent with the data characteristics observed in Section 3.2 ; our 5G network dataset is dom- inated by irregular spikes, step-like changes, and lack of stable seasonality . Static models such as RF or XGB strug- gle in performance because they assume that the training distribution does not change ov er time, leading to poor gen- eralization when sudden data shifts occur . While the Online LR baseline, despite updating incrementally , cannot fully capture the complex, non-linear dynamics in the data. Sim- ilarly , TSFMs performance degrades due to a shift in data distribution in the zero-shot scenario, as these pre-trained models are trained only on low-frequency data, limiting their ability to capture high-frequenc y dynamics with unpre- 8 Bridging the High-Frequency Data Gap: A Millisecond-Resolution Network Dataset for Adv ancing TSFMs T able 6. Performance metrics of benchmarked models with the increasing temporal resolution. 
T emporal Resolution Prediction Horizon ARF TTM Zero-shot TTM Fine-tuning Chronos Zero-shot RMSE MAE RMSE MAE RMSE MAE RMSE MAE 100 ms 96 0.0457 0.0262 0.0765 0.0434 0.0743 0.0421 0.0622 0.0338 200 ms 48 0.0471 0.0267 0.0870 0.0499 0.0880 0.0496 0.0740 0.0389 500 ms 20 0.0398 0.0218 0.0855 0.0490 0.0894 0.0542 0.0711 0.0372 1000 ms 10 0.0297 0.0176 0.0856 0.0500 0.0856 0.0500 0.0580 0.0326 2000 ms 5 0.0289 0.0169 0.0880 0.0527 0.0915 0.0584 0.0671 0.0354 3000 ms 4 0.0289 0.0185 0.1049 0.0618 0.1061 0.0638 0.0860 0.0443 dictable spikes and irre gular patterns. Based on the results it can be seen that e ven after fine-tuning and further hyper - parameter tuning (see Section A.5.3 ) on our dataset, the performance of TSFMs remains suboptimal, as they fail to generalize effecti vely . In contrast, ARF is designed to handle concept drift by dynamically updating its ensemble of trees as new patterns appear . This allows it to quickly adapt to data distrib ution changes and maintain predicti ve accuracy e ven in the presence of strong irregularities. While it is observed that Chronos offers a competiti ve performance in the uni v ariate setting, ARF outperforms Chronos in the multiv ariate setting. The performance of these models is more clearly reflected in Fig. 5 and Fig. 6 . W e observe that ARF follow the curve/trend of the bitrate much better than the other shallow models and TSFMs. For the purpose of visualization, we average the actual and predicted values for each test sample. 5. Ablation Study 5.1. Fine-tuning Strategies for TTM In this section, we analyze how different fine-tuning strate- gies af fects the performance of TTM. W e explore tw o dif fer- ent fine-tuning strategies: (i) Head-only fine-tuning ( Ekam- baram et al. , 2024 ), where we freeze the entire backbone and decoder and only train the final prediction head, (ii) Adapter -based fine-tuning ( Houlsby et al. 
, 2019), where we incorporate lightweight MLP adapter modules inside the mixer blocks while keeping the original TTM weights frozen. Recent work on fine-tuning TSFMs (Tomar et al.) has shown that even widely used Parameter-Efficient Fine-Tuning (PEFT) methods like Low-Rank Adaptation (LoRA) do not consistently improve the performance of TSFMs. Our findings in Table 5 align with this observation: even though both fine-tuning strategies are architecturally compatible with TTM, they perform worse than the default TTM fine-tuning approach.

5.2. Temporal Resolution

In this section, we evaluate the performance of ARF and TSFMs in a multivariate setting. We analyze the effect of increasing the temporal resolution of the data on the performance of both ARF and TSFMs, performing fine-tuning only for TTM because of its computational efficiency. The prediction horizon is fixed at 9.6 seconds for all temporal resolutions. Specifically, we evaluate these models on newly filtered data, the pedestrian mobility pattern for the video streaming traffic class, to highlight that the characteristics of our dataset differ from those of the pre-training datasets. Table 6 shows the performance of ARF and TSFMs. Notably, increasing the temporal resolution does not improve the performance of TSFMs. In contrast, ARF consistently outperforms TSFMs at each resolution, as higher temporal resolution reduces noise and improves its predictions. This indicates that TSFMs perform poorly not only because of temporal resolution (i.e., high frequency), but also due to the inherent characteristics of our data.

6. Conclusion and Future Work

We present a novel high-frequency time series dataset capturing millisecond-resolution measurements from a real-world wireless network. This dataset fills a critical gap in existing large-scale resources, which largely lack fine-grained, real-time wireless network data.
Our experiments reveal the limitations of current TSFMs and highlight the need to incorporate diverse, high-resolution datasets during pre-training to improve generalization. In future work, we will use this dataset for anomaly detection and for transfer learning across various mobility profiles.

References

Aksu, T., Woo, G., Liu, J., Liu, X., Liu, C., Savarese, S., Xiong, C., and Sahoo, D. GIFT-Eval: A benchmark for general time series forecasting model evaluation. arXiv preprint arXiv:2410.10393, 2024.

Ansari, A. F., Stella, L., Turkmen, C., Zhang, X., Mercado, P., Shen, H., Shchur, O., Rangapuram, S. S., Pineda Arango, S., Kapoor, S., Zschiegner, J., Maddix, D. C., Mahoney, M. W., Torkkola, K., Gordon Wilson, A., Bohlke-Schneider, M., and Wang, Y. Chronos: Learning the language of time series. Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URL https://openreview.net/forum?id=gerNCVqqtR.

Asuncion, A., Newman, D., et al. UCI machine learning repository. 2007. Published by Irvine, CA, USA.

Beck, N., Dovern, J., and Vogl, S. Mind the naive forecast! A rigorous evaluation of forecasting models for time series with low predictability. Applied Intelligence, 55(6):395, 2025.

Breiman, L. Random forests. Machine Learning, 45(1):5–32, 2001. doi: 10.1023/A:1010933404324. Published by Springer.

Chen, S., Long, G., Jiang, J., and Zhang, C. Federated foundation models on heterogeneous time series. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp. 15839–15847, 2025.

Chen, T. and Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794, 2016.

Das, A., Kong, W., Sen, R., and Zhou, Y.
A decoder-only foundation model for time-series forecasting. In Forty-first International Conference on Machine Learning, 2024.

Ekambaram, V., Jati, A., Dayama, P., Mukherjee, S., Nguyen, N., Gifford, W. M., Reddy, C., and Kalagnanam, J. Tiny Time Mixers (TTMs): Fast pre-trained models for enhanced zero/few-shot forecasting of multivariate time series. Advances in Neural Information Processing Systems, 37:74147–74181, 2024.

Godahewa, R., Bergmeir, C., Webb, G. I., Hyndman, R. J., and Montero-Manso, P. Monash time series forecasting archive. In Neural Information Processing Systems Track on Datasets and Benchmarks, 2021.

Gomes, H. M., Barddal, J. P., Ferreira, L. E. B., and Bifet, A. Adaptive random forests for data stream regression. In ESANN, 2018.

Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., and Gelly, S. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pp. 2790–2799. PMLR, 2019.

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al. LoRA: Low-rank adaptation of large language models. International Conference on Learning Representations, ICLR, 1(2):3, 2022.

Kollovieh, M., Ansari, A. F., Bohlke-Schneider, M., Zschiegner, J., Wang, H., and Wang, Y. B. Predict, refine, synthesize: Self-guiding diffusion models for probabilistic time series forecasting. Advances in Neural Information Processing Systems, 36:28341–28364, 2023.

Langley, P. Crafting papers on machine learning. In Langley, P. (ed.), Proceedings of the 17th International Conference on Machine Learning (ICML 2000), pp. 1207–1216, Stanford, CA, 2000. Morgan Kaufmann.

Liang, Y., Wen, H., Nie, Y., Jiang, Y., Jin, M., Song, D., Pan, S., and Wen, Q. Foundation models for time series analysis: A tutorial and survey.
In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 6555–6565, 2024.

Ouhamma, R., Maillard, O.-A., and Perchet, V. Stochastic online linear regression: the forward algorithm to replace ridge. Advances in Neural Information Processing Systems, 34:24430–24441, 2021.

Rasul, K., Ashok, A., Williams, A. R., Khorasani, A., Adamopoulos, G., Bhagwatkar, R., Biloš, M., Ghonia, H., Hassen, N., Schneider, A., et al. Lag-Llama: Towards foundation models for time series forecasting. In R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models, 2023.

Shchur, O., Turkmen, A. C., Erickson, N., Shen, H., Shirkov, A., Hu, T., and Wang, B. AutoGluon–TimeSeries: AutoML for probabilistic time series forecasting. In International Conference on Automated Machine Learning, pp. 9–1. PMLR, 2023.

Su, C., Cai, Z., Tian, Y., Chang, Z., Zheng, Z., and Song, Y. Diffusion models for time series forecasting: A survey. arXiv preprint arXiv:2507.14507, 2025.

Thakur, S. C. Foundation models for time series forecasting. International IT Journal of Research, ISSN: 3007-6706, 2(4):144–156, 2024.

Tomar, S., Tirupathi, S., Marinescu, R., Daly, E. M., and Dusparic, I. AT4TS: AutoTune for time series foundation models. Transactions on Machine Learning Research.

Wang, J., Jiang, J., Jiang, W., Li, C., and Zhao, W. X. LibCity: An open library for traffic prediction. In Proceedings of the 29th International Conference on Advances in Geographic Information Systems, SIGSPATIAL '21, pp. 145–148, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450386647.

Woo, G., Liu, C., Kumar, A., Xiong, C., Savarese, S., and Sahoo, D. Unified training of universal time series forecasting transformers.
In Proceedings of the 41st International Conference on Machine Learning, ICML '24. JMLR.org, 2024.

Wu, H., Xu, J., Wang, J., and Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems, NeurIPS, 34:22419–22430, 2021.

Xiaoming, S., Shiyu, W., Yuqi, N., Dianqi, L., Zhou, Y., Qingsong, W., and Jin, M. Time-MoE: Billion-scale time series foundation models with mixture of experts. In ICLR 2025: The Thirteenth International Conference on Learning Representations, 2025.

Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., and Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 11106–11115, 2021.

A. Appendix

A.1. Use cases

The dataset enables a wide range of learning tasks that support more adaptive Radio Access Network (RAN) behavior, particularly within O-RAN systems where short-term predictions and rapid classifications can guide near-real-time control. Its millisecond-resolution measurements, combined with detailed PHY- and MAC-layer indicators and labels for traffic type and mobility class, allow the design of regression models that forecast throughput, channel quality, and link reliability over horizons from a few milliseconds to several seconds. Such predictions can inform scheduling decisions at the DU, guide proactive MCS and power adjustments, improve rate control for latency-sensitive applications, and support mobility steering by anticipating future channel degradation for fast-moving users.
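These forecasting use cases reduce to a sliding-window regression problem: from the last L KPI samples, predict the next H. A minimal sketch of the windowing under illustrative assumptions (the trace, context length, and horizon below are stand-ins, not values prescribed by the dataset):

```python
import numpy as np

def make_windows(series: np.ndarray, context: int, horizon: int):
    """Slice a 1-D KPI trace into (context, horizon) training pairs.

    Each row of X holds `context` past samples; the matching row of y
    holds the `horizon` samples that immediately follow them.
    """
    n = len(series) - context - horizon + 1
    X = np.stack([series[i:i + context] for i in range(n)])
    y = np.stack([series[i + context:i + context + horizon] for i in range(n)])
    return X, y

# At 100 ms resolution, a 9.6 s prediction horizon is 96 steps,
# matching the longest horizon used in the paper.
trace = np.arange(1000, dtype=float)          # stand-in for a bitrate trace
X, y = make_windows(trace, context=512, horizon=96)
```

Any regressor, shallow or foundation-model-based, can then be fit on (X, y); multivariate variants simply stack additional KPI columns into each window.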
The temporal characteristics of the dataset, including irregular bursts and heavy-tailed dynamics, make it well suited for evaluating predictive approaches in environments where rapid fluctuations dominate. The dataset also supports classification tasks involving mobility identification and traffic-type recognition. Since user movement patterns such as static, pedestrian, car, bus, and train produce distinct combinations of SINR, CQI, and bitrate variability, models trained on these traces can infer mobility behavior directly from RAN KPIs. Such inferences allow the RAN Intelligent Controller (RIC) to select mobility-aware handover strategies, tune power control settings, or commit resources more efficiently. Traffic classification, which extends across benign and malicious flows, provides an additional line of evidence for service awareness and security monitoring. The dataset includes benign web, VoIP, IoT, and video traffic, as well as multiple attack types such as DDoS-Ripper, DoS-Hulk, PortScan, and Slowloris. This makes it possible to detect abnormal traffic solely from network-side performance indicators, enabling security functions that do not rely on deep packet inspection.

Beyond supervised learning, the dataset's sharp spikes, volatility clusters, and inconsistent seasonal structure create strong opportunities for anomaly detection. Deviations in CQI, SINR, buffer occupancy, packet loss, or bitrate can reveal early signs of congestion or malicious activity. Because the dataset includes both dynamic mobility patterns and diverse traffic sources, anomaly detectors built on it can be tested against conditions where network behavior changes rapidly and non-linearly. This setting mirrors real operational networks more closely than traditional low-frequency datasets and supports the design of proactive mitigation strategies within the RIC.
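One simple baseline for such detectors is a rolling z-score that flags deviations from a trailing mean. The sketch below is a generic illustration, not a method from the paper; the window size, threshold, and synthetic trace are arbitrary choices:

```python
import numpy as np

def rolling_zscore_anomalies(x: np.ndarray, window: int = 50,
                             thresh: float = 4.0) -> np.ndarray:
    """Flag samples deviating from the trailing mean by more than
    `thresh` trailing standard deviations. Returns a boolean mask."""
    flags = np.zeros(len(x), dtype=bool)
    for i in range(window, len(x)):
        hist = x[i - window:i]
        mu, sigma = hist.mean(), hist.std()
        if sigma > 0 and abs(x[i] - mu) > thresh * sigma:
            flags[i] = True
    return flags

rng = np.random.default_rng(0)
trace = rng.normal(1.0, 0.05, 2000)   # steady KPI (arbitrary units)
trace[700] = 3.0                      # injected spike
mask = rolling_zscore_anomalies(trace)
```

On the paper's data, the volatility clusters described above would call for a drift-aware threshold rather than this fixed one; the sketch only shows the basic mechanism.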
Finally, the dataset's combination of high-frequency time series, labelled mobility classes, and labelled traffic classes allows for multi-task learning and transfer learning studies. Models can be trained on one mobility class and evaluated on another, or jointly predict throughput while classifying user behavior. This supports research on generalization across heterogeneous RAN conditions and offers a realistic foundation for developing predictive, adaptive, and security-oriented control functions that operate within the O-RAN architecture.

A.2. Limitations

Our current study provides valuable insights into the performance of shallow models and TSFMs on millisecond-resolution wireless network data, and shows the need to utilize this dataset to enhance the generalizability and applicability of TSFM pre-training and fine-tuning. However, the study has certain limitations that highlight areas for future research. These include:

• The empirical benchmark results for shallow models such as XGBoost and Random Forest used only limited hyper-parameter tuning, whereas standard Hyperparameter Optimization (HPO) techniques could have been applied to further optimize their performance. Given the paper's primary focus on comparing benchmark performance between shallow models and TSFMs, any potential marginal improvements through HPO were deemed secondary to the main objective. Nonetheless, additional hyper-parameter tuning was performed for both TTM and Lag-Llama; however, ARF continued to outperform both models.

• Further, default implementations of the TSFMs were used for the zero-shot results. Feature engineering and data pre-processing strategies could potentially improve the performance of TSFMs, but this was not considered. Since the shallow models work directly on the raw data and perform reliable forecasting, the same was done for the TSFMs to make the comparison fair.
• Default fine-tuning implementations were explored for each TSFM, whereas TTM was further evaluated using distinct fine-tuning strategies. However, novel techniques such as autotuning and Low-Rank Adaptation (LoRA) (Hu et al.,

Figure 7. STL decomposition of time series: (a) Network, (b) ETTh1, (c) Electricity, (d) Weather, (e) Traffic.
Figure 8. Rolling mean and standard deviation of time series: (a) Network, (b) ETTh1, (c) Electricity, (d) Weather, (e) Traffic.

Figure 9.
Autocorrelation of time series: (a) Network, (b) ETTh1, (c) Electricity, (d) Weather, (e) Traffic.

Figure 10. Probability (QQ) plots of time series: (a) Network, (b) ETTh1, (c) Electricity, (d) Weather, (e) Traffic.

Table 7. Number of features in different datasets.

Dataset      No. of features
Network      47
ETTh1        8
Electricity  322
Weather      22
Traffic      863

Table 8. Features used in the multivariate setting.

Feature        Description
CQI            Channel Quality Indicator
MCS            Modulation and Coding Scheme
pkt_ok         Number of packets sent
pkt_nok        Number of packets dropped
id_ue          Number of UEs connected to the BS
pusch_sinr     Signal-to-interference-plus-noise ratio on the Physical Uplink Shared Channel
pucch_sinr     Signal-to-interference-plus-noise ratio on the Physical Uplink Control Channel
pusch_rssi     Signal strength on the Physical Uplink Shared Channel
pucch_rssi     Signal strength on the Physical Uplink Control Channel
pucch_samples  Number of PUCCH samples

2022) strategies were not considered, since the focus was on zero-shot and few-shot learning. Future ablation studies are proposed to investigate whether optimizing few-shot learning parameters can significantly enhance the performance of TSFMs.

A.3.
Performance Evaluation Metrics

The Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are calculated as follows:

\[
\mathrm{RMSE}(Y_t, \hat{Y}_t) = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\bigl(Y_t - \hat{Y}_t\bigr)^2}, \tag{1}
\]

\[
\mathrm{MAE}(Y_t, \hat{Y}_t) = \frac{1}{T}\sum_{t=1}^{T}\bigl|Y_t - \hat{Y}_t\bigr|, \tag{2}
\]

where Y_t and \hat{Y}_t are the actual and predicted bitrate values, and T is the total number of samples in the test data.

A.4. Data Characteristics Comparison

In this section, we compare our 5G network dataset with those used in the pre-training of TSFMs. The comparison focuses on key data characteristics, including statistical distributions, temporal dependencies, and statistical variability, as illustrated in Figs. 7, 8, 9, and 10. We compare the datasets using STL decomposition, rolling mean and standard deviation, autocorrelation (ACF), and residual QQ plots. Our 5G network data is clearly the most different; its trend shifts abruptly in

Table 9. Performance of benchmarked models using ten features.

Model                  Multivariate
                       RMSE    MAE
XGB                    0.0347  0.0234
ARF                    0.0273  0.0155
Naive                  0.0417  0.0239
TTM (Zero-shot)        0.0359  0.0230
TTM (Fine-tuning)      0.0358  0.0228
Chronos (Zero-shot)    0.0285  0.0181
Chronos (Fine-tuning)  0.0280  0.0176

Table 10. Performance metrics of benchmarked models on newly filtered data.

                 Univariate       Multivariate
Model            RMSE    MAE     RMSE    MAE
XGB              0.1440  0.1087  0.1440  0.1087
ARF              0.1728  0.1125  0.0968  0.0634
Naive            0.1309  0.0932  0.1309  0.0932
TTM (Zero-shot)  0.1279  0.0922  0.1279  0.0922

Table 11. Performance metrics of benchmarked models on newly filtered data.
Traffic Labels (Multivariate)

                  Traffic Label: YouTube            Traffic Label: Portscan
                  ARF             TTM (Zero-shot)   ARF             TTM (Zero-shot)
Mobility Pattern  RMSE    MAE     RMSE    MAE       RMSE    MAE     RMSE    MAE
Static            0.0166  0.0123  0.0359  0.0230    0.0858  0.0585  0.1192  0.0904
Pedestrian        0.0457  0.0262  0.0765  0.0434    0.0819  0.0555  0.1127  0.0860
Bus               0.0790  0.0406  0.0805  0.0481    0.0522  0.0337  0.0945  0.0712
Train             0.0564  0.0303  0.0623  0.0443    0.1684  0.1180  0.1681  0.1308
Car               0.0404  0.0217  0.0568  0.0320    0.0819  0.0555  0.1375  0.1037

steps, seasonality is weak and mostly hidden by noise, rolling statistics change suddenly, the ACF shows strong temporal persistence with slow decay, and the residual QQ plot departs strongly from normality due to sharp spikes. In contrast, the ETTh1 dataset has a mostly steady trend with mild rises and falls, small but regular seasonal cycles, stable rolling statistics, weak cyclical autocorrelation, and residuals close to normal. The Electricity dataset also remains steady in its trend but shows stronger repeating seasonal patterns, a flat rolling mean with stable variance, clear cycles in the ACF, and residuals with occasional deviations. The Weather dataset is mostly flat with rare sharp jumps, no meaningful seasonality, sudden variance spikes in the rolling statistics, weak ACF signals, and QQ plots highlighting outliers. Finally, the Traffic dataset combines a smooth upward trend with strong, consistent seasonality, a gradually increasing rolling mean with stable variance, clear seasonal autocorrelation, and residuals that follow normality fairly well. To conclude, our dataset differs from the others because its persistence comes from clustered extremes and abrupt shifts rather than smooth or cyclical structure, making it the least regular and most unpredictable series.
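The slow ACF decay versus cyclical-ACF distinction drawn above can be checked directly on any trace. A self-contained sketch of the sample autocorrelation, using synthetic series purely for illustration (no claim about the actual dataset values):

```python
import numpy as np

def acf(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelation r_k for lags 0..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
noise = rng.normal(size=5000)          # no persistence: ACF near 0 past lag 0
persistent = np.cumsum(noise)          # random walk: slow ACF decay
r_noise = acf(noise, 20)
r_walk = acf(persistent, 20)
```

A series with the persistence described for the 5G network data behaves like `persistent` (high correlation even at large lags), while the near-normal residual series behave like `noise`.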
Figure 11. Actual vs. predicted bitrate values using ten features.

Table 12. Hyper-parameter tuning of the TTM model in the multivariate setting.

Learning Rate (LR)  RMSE    MAE
0.01                0.0390  0.0249
0.001               0.0387  0.0247
0.00001             0.0359  0.0227
0.000001            0.0359  0.0229

Fine-tune Percent   RMSE    MAE
10                  0.0358  0.0227
15                  0.0365  0.0226
25                  0.0366  0.0227
30                  0.0367  0.0226

No. of Epochs       RMSE    MAE
50                  0.0359  0.0227
80                  0.0359  0.0227
100                 0.0359  0.0227

Table 13. Hyper-parameter tuning of the Lag-Llama model.

Context Length  RMSE    MAE
15              0.0350  0.0231
25              0.0324  0.0217
35              0.0327  0.0221

Batch Size      RMSE    MAE
16              0.0330  0.0227
32              0.0314  0.0218
128             0.0332  0.0221

A.5. Ablation Study

A.5.1. Number of Features

Table 7 summarizes the number of features in each dataset. Our network data contains 47 features in total, providing enough features for the multivariate setting. This ensures that our dataset is well suited for training TSFMs, similar to existing pre-training datasets.
In our initial experiments, we used a subset of four important features from our network dataset, as mentioned in Table 2.

Figure 12. Actual vs. predicted bitrate values on newly filtered data: (a) Univariate, (b) Multivariate.

We extended our analysis to include ten features in total, as shown in Table 8, and evaluated the performance of the benchmarked models in this multivariate setting. Table 9 shows that ARF outperforms all the other benchmarked models in this multivariate setting as well. The performance of these models is more clearly reflected in Fig. 11. We observe that ARF follows the curve/trend of the bitrate much better than the other benchmarked models. All models were evaluated using their default hyper-parameters.

A.5.2. Filtered Data Evaluation

In this section, we evaluate the performance of the benchmarked models on mobility patterns and traffic classes that differ from those presented in Section 3.2. The raw data is filtered based on mobility patterns and traffic generated from malicious activities. In particular, we focus on the train mobility pattern for the DoS-Hulk-C traffic class.
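All RMSE and MAE values reported in these evaluations follow Eqs. (1) and (2) from Section A.3; a direct implementation (the toy vectors below are illustrative only):

```python
import numpy as np

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Root Mean Squared Error, Eq. (1)."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Mean Absolute Error, Eq. (2)."""
    return float(np.mean(np.abs(y - y_hat)))

y = np.array([1.0, 2.0, 3.0, 4.0])      # actual bitrate (toy values)
y_hat = np.array([1.0, 2.0, 3.0, 2.0])  # predicted bitrate (toy values)
```

Because RMSE squares the errors, it penalizes the large deviations caused by traffic spikes more heavily than MAE, which is why the two metrics can rank models differently on this data.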
This analysis also demonstrates the potential of the dataset for transfer learning: by training models on one set of mobility patterns and traffic classes and evaluating them on a different set, we can assess how well knowledge learned in one context generalizes to another. Table 10 presents the performance of ARF and TTM, evaluated using RMSE and MAE in both univariate and multivariate settings. We specifically include TTM in our analysis because of its computational efficiency. For the filtered dataset, we observe that TTM outperforms ARF in the univariate setting. However, in the multivariate setting, ARF achieves better performance than the other models. Fig. 12 further illustrates how these models follow the trend of the bitrate.

We further extend our evaluation to additional filtered subsets of our data, each representing distinct combinations of mobility patterns and traffic classes. Here, we restrict our experiments to the multivariate setting. Table 11 shows that TTM consistently performs worse than ARF for most traffic labels. These results indicate the stronger generalization of ARF in the multivariate setting.

A.5.3. TSFMs Hyper-parameter Tuning

In this section, we evaluate the performance of the TTM and Lag-Llama models under different hyper-parameter settings. Table 12 summarizes the results of hyper-parameter tuning for the TTM model in the multivariate setting. It presents the performance of the TTM model under various learning rates, fine-tuning percentages, and numbers of epochs. We first evaluated different learning rates, observing that a learning rate of 0.00001 achieves the lowest errors.
Using this optimal learning rate (0.00001), we further experimented with different fine-tuning percentages, observing that fine-tuning on 10% of the training data results in a lower RMSE, while the MAE remains largely similar across all fine-tuning percentages. We also tuned the number of training epochs using this learning rate, but found that increasing the number of epochs did not significantly change either the RMSE or the MAE, indicating that performance is largely insensitive to the number of epochs beyond the default setting. TTM provides a built-in learning rate finder, which we used to determine the optimal learning rate for the main results presented in Table 4. Using the learning rate finder algorithm, we obtained an optimal learning rate of 0.0011 for the main results in Table 4, resulting in an RMSE of 0.0391 and an MAE of 0.0249. This shows that tuning the learning rate can noticeably improve the performance of the TTM model after fine-tuning. Nevertheless, ARF continues to outperform TTM.

Further, Table 13 presents the hyper-parameter tuning results for the Lag-Llama model in the univariate setting. We first evaluated different context lengths and observed that a context length of 25 performs better than a context length of 15. For the main results in Table 4, we used a context length of 5 to ensure a fair comparison with the shallow models, which operate under the same input length constraint. With this context length of 5, Lag-Llama achieves an RMSE of 0.0474 and an MAE of 0.0268. After selecting the best-performing context length (25), we then experimented with different batch sizes. As shown in the table, a batch size of 32 provides slightly better performance than the other batch sizes. Similarly, we performed additional experiments in a univariate setting with the zero-shot Chronos-base model, which has 200M parameters, whereas the main results in Table 4 use Chronos-small with 46M parameters.
Although Chronos-base achieves slightly lower errors (RMSE: 0.0294, MAE: 0.0179), the improvement over Chronos-small (RMSE: 0.0313, MAE: 0.0185) appears modest despite a more than fourfold increase in model size. Moreover, ARF continues to outperform Chronos in terms of RMSE, indicating that the performance gap is not solely due to model scale.
