Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection
Authors: Kadir-Kaan Özer, René Ebeling, Markus Enzweiler
Kadir-Kaan Özer 1,2, René Ebeling 1, and Markus Enzweiler 2

1 Mercedes-Benz AG, Germany
{kadir.oezer, rene.ebeling}@mercedes-benz.com
2 Institute for Intelligent Systems, Esslingen University of Applied Sciences, Germany
markus.enzweiler@hs-esslingen.de

Abstract. Multivariate time series anomalies often manifest as shifts in cross-channel dependencies rather than simple amplitude excursions. In autonomous driving, for instance, a steering command might be internally consistent but decouple from the resulting lateral acceleration. Residual-based detectors can miss such anomalies when flexible sequence models still reconstruct signals plausibly despite altered coordination. We introduce AxonAD, an unsupervised detector that treats multi-head attention query evolution as a short-horizon predictable process. A gradient-updated reconstruction pathway is coupled with a history-only predictor that forecasts future query vectors from past context, trained via a masked predictor-target objective against an exponential moving average (EMA) target encoder. At inference, reconstruction error is combined with a tail-aggregated query mismatch score, which measures the cosine deviation between predicted and target queries on recent timesteps. This dual approach provides sensitivity to structural dependency shifts while retaining amplitude-level detection. On proprietary in-vehicle telemetry with interval annotations and on the TSB-AD multivariate suite (17 datasets, 180 series) with threshold-free and range-aware metrics, AxonAD improves ranking quality and temporal localization over strong baselines. Ablations confirm that query prediction and combined scoring are the primary drivers of the observed gains. Code is available at https://github.com/iis-esslingen/AxonAD.
Keywords: Artificial Intelligence · Deep Learning · Machine Learning · Time Series Analysis

1 Introduction

Modern vehicles produce dense telemetry streams where different channels, from steering angle and throttle position to lateral acceleration and yaw rate, are sampled at high frequency. Faults in these systems rarely present as individual channels leaving their nominal range. Instead, the typical failure mode is a coordination break: a steering command that no longer produces the expected lateral response, or a throttle position that decouples from engine torque. Detecting such anomalies matters directly for fleet monitoring, warranty analytics, and safety validation.

Fig. 1. Query predictability in 2D query space (schematic, single head). Gray: past query trajectory. Blue dashed: predicted query. Red: EMA target query. (a) Nominal: predictor and target agree. (b) Coordination anomaly: the target query diverges from the predicted trajectory, producing large $d_q$ even when per-channel amplitudes are within normal bounds.

This setting exposes a fundamental limitation of residual-based unsupervised detectors. A flexible sequence model can accurately reconstruct each channel while missing that the joint coordination pattern across channels has changed [20, 28, 32]. Low reconstruction error does not guarantee that learned representations preserve the full dependency structure.

Attention mechanisms [30] capture relational structure through query and key matching, but are typically treated as a one-shot computation for the current input window. Under stationary nominal dynamics, the query vectors that control attention routing should evolve predictably over short horizons.
Structural anomalies can disrupt this predictability even when per-channel amplitudes remain plausible, making query mismatch a diagnostic signal complementary to reconstruction error. Figure 1 illustrates this idea.

AxonAD combines two coupled pathways. The first reconstructs the input window using bidirectional self-attention. The second is a history-only predictor that maps a time-shifted embedding stream to future multi-head query vectors, trained with a masked cosine loss against an exponential moving average (EMA) target encoder. At inference, reconstruction error and query mismatch are each robustly standardized on nominal training data and summed to produce the final anomaly score.

We evaluate on proprietary in-vehicle telemetry with interval annotations as the primary setting and on the multivariate TSB-AD suite [25, 22]. Across both, AxonAD improves threshold-free ranking and temporal localization relative to strong baselines, and ablations confirm that query prediction and score combination are the primary drivers.

Our contributions are:
– A predictive attention anomaly detector that treats query vectors as a temporally predictable signal rather than a one-shot routing decision, providing sensitivity to structural dependency shifts.
– Query mismatch as a tail-focused anomaly score that complements reconstruction residuals with a cosine distance signal in query space.
– A stable training scheme based on EMA predictor and target networks with masked supervision, avoiding direct supervision on attention maps or value outputs.

2 Related Work

2.1 Classical Unsupervised Multivariate Detection

Isolation-based methods flag anomalies as points that are easily separated under random partitioning [15, 21]. Density and neighborhood methods detect samples whose local geometry differs from the nominal distribution [7, 11].
Robust matrix decomposition approaches model data as low-rank structure plus sparse corruption [8], and clustering, histogram, and copula-based methods extend this family with alternative density surrogates [36, 16, 13, 19]. Because none of these methods capture context-dependent coupling that varies across operating modes, they have limited sensitivity to coordination-type anomalies.

2.2 Deep Sequence Models with Residual or Likelihood Scoring

Deep detectors learn nominal dynamics through reconstruction or forecasting and score anomalies by residual magnitude or likelihood deviation. Recurrent reconstruction [24] and probabilistic variants such as VAE and stochastic recurrent models [18, 26, 27, 28, 33] perform well across many benchmarks, as do lightweight spectral and convolutional variants [35]. However, residual scoring can miss anomalies that shift dependencies while leaving per-channel values plausible, particularly under nonstationarity where flexible models may still reconstruct accurately despite altered coordination [20, 28, 32].

2.3 Attention and Relation-Aware Anomaly Scoring

Attention weights encode learned relational structure [30] and have been used directly for scoring, for example by measuring association discrepancies [34] or by modeling sensor relations with graph structures [12]. Transformer backbones have also been adapted to anomaly detection through reconstruction and forecasting pipelines [20, 29, 32, 37]. AxonAD differs in that it scores the predictability of query vectors over time, capturing what the model is about to attend to, rather than scoring the attention weights themselves or the value residuals.

Fig. 2. AxonAD overview. The online reconstruction encoder computes self-attention using queries $Q_{\mathrm{rec}}$. In parallel, a history-only predictor forecasts $\hat{Q}_{\mathrm{pred}}$ and is trained to match EMA target queries $Q_{\mathrm{tgt}}$ (stop-gradient). Query mismatch on the last valid timesteps yields $d_q$, and reconstruction yields $d_{\mathrm{rec}}$. Attention divergence (KL tail) is not included in the default scoring pipeline.

2.4 Self-Supervised Predictive Objectives

Predictive self-supervised learning encourages representations to be inferable from context under masking, commonly stabilized via EMA target networks [2, 5]. Related masking objectives have been applied to time series representation learning [1, 35, 37]. Most detectors that use prediction supervise values or latent states and score residuals at inference. AxonAD instead applies predictive supervision directly in query space, making the training objective and the inference scoring signal the same cosine distance. Section 3.4 exploits this consistency.

3 Model Architecture

Figure 2 gives an overview. The model takes a fixed-length window $X \in \mathbb{R}^{T \times F}$ and produces a reconstruction $\hat{X} \in \mathbb{R}^{T \times F}$ together with two window-level signals: a reconstruction score $d_{\mathrm{rec}}$ and a query mismatch score $d_q$, combined after robust standardization into the final anomaly score.
The architecture has three components: (i) a gradient-updated reconstruction pathway based on bidirectional self-attention, (ii) a history-only predictive pathway that forecasts future multi-head query vectors from a time-shifted embedding stream, and (iii) an EMA target encoder that provides stable query supervision targets [14]. Throughout this paper, online refers to gradient-updated parameters, not streaming causality.

Notation. $T$ denotes the window length, $F$ the number of channels, and $D$ the embedding dimension. $N_h$ is the number of attention heads with head dimension $d_h = D / N_h$. $s$ is the forecast horizon, $k$ the number of tail timesteps used for query mismatch aggregation, and $m \in (0, 1)$ is the EMA momentum. We use $\tau \in \{1, \dots, T\}$ for within-window timestep indices. For a window ending at absolute time $t$, we write $X_{t-T+1:t} \in \mathbb{R}^{T \times F}$ for its $T$ rows.

Shared embedding. A linear projection with learnable positional bias maps $X$ to a shared per-timestep representation, followed by layer normalization [4] applied before attention:

$$H_{\mathrm{on}} = \mathrm{LN}(X W_e + b_e + P), \qquad P \in \mathbb{R}^{T \times D}.$$

This sequence feeds both the reconstruction self-attention and the predictive branch (after applying the history-only time shift described below).

3.1 Online Reconstruction Pathway

The online encoder forms multi-head queries, keys, and values via learned projections: $(Q_{\mathrm{rec}}, K, V) \in \mathbb{R}^{N_h \times T \times d_h}$. Standard multi-head self-attention [30] over the full within-window context produces context features $Z$, which are processed by a position-wise feedforward network and a linear output head to obtain $\hat{X}$. The reconstruction score is the mean squared $\ell_2$ error over timesteps:

$$d_{\mathrm{rec}} = \frac{1}{T} \sum_{\tau=1}^{T} \lVert \hat{x}_\tau - x_\tau \rVert_2^2. \tag{1}$$

3.2 Predictive Attention Pathway

The predictive pathway forecasts query-vector evolution using only past context, producing an anomaly signal sensitive to coordination shifts even when windows remain plausible in amplitude.

History-only shift. To prevent information leakage, we construct a time-shifted embedding stream with forecast horizon $s$:

$$\tilde{H}_\tau = \begin{cases} 0, & \tau \le s, \\ H_{\mathrm{on},\, \tau - s}, & \tau > s, \end{cases}$$

ensuring that any prediction at timestep $\tau$ depends only on embeddings available up to $\tau - s$.

Causal predictor. A causal temporal predictor $g(\cdot)$ maps the shifted sequence to predicted multi-head queries:

$$\hat{Q}_{\mathrm{pred}} = g(\tilde{H}) \in \mathbb{R}^{N_h \times T \times d_h},$$

with causality enforced so that the output at $\tau$ depends only on $\tilde{H}_{\le \tau}$. We denote the per-head, per-timestep slice by $\hat{q}^{\mathrm{pred}}_{h,\tau} = \hat{Q}_{\mathrm{pred}}[h, \tau, :] \in \mathbb{R}^{d_h}$, with the corresponding EMA target slice $q^{\mathrm{tgt}}_{h,\tau}$ defined in Section 3.3. The predictive branch forecasts queries only, not keys or values, keeping it lightweight and aligning supervision directly with the inference scoring signal.

3.3 EMA Target Encoder and Masked Training

We maintain an EMA target encoder with parameters $\theta_{\mathrm{tgt}}$ that track the online parameters $\theta_{\mathrm{on}}$:

$$\theta_{\mathrm{tgt}} \leftarrow m\, \theta_{\mathrm{tgt}} + (1 - m)\, \theta_{\mathrm{on}}, \qquad m \in (0, 1),$$

with no gradient updates to the target parameters [14]. Given the same input window $X$, the EMA encoder produces a target embedding sequence $H_{\mathrm{tgt}} \in \mathbb{R}^{T \times D}$ in the same way as $H_{\mathrm{on}}$ but using $\theta_{\mathrm{tgt}}$. Target queries are obtained via a mirrored projection:

$$Q_{\mathrm{tgt}} = \mathrm{reshape}_{N_h}\!\left(H_{\mathrm{tgt}} W^{\mathrm{tgt}}_q\right) \in \mathbb{R}^{N_h \times T \times d_h}, \tag{2}$$

where $W^{\mathrm{tgt}}_q$ is the EMA-tracked counterpart of the online query projection.

Training minimizes reconstruction error together with a masked cosine loss in query space, following a JEPA-style scheme [2]. A set of masked timesteps $\mathcal{M} \subset \{s+1, \dots, T\}$ is sampled via contiguous time-patch masking over valid timesteps (inputs remain unmasked).
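To make the history-only construction concrete, the following is a minimal NumPy sketch of the time shift, a single causal dilated convolution layer of the kind a TCN predictor stacks, and the EMA parameter update. Function names and shapes are ours, not the authors' implementation, which uses a multi-layer PyTorch TCN.

```python
import numpy as np

def history_shift(h_on, s):
    """Zero the first s steps and shift, so position tau sees h_on[tau - s].
    h_on: (T, D) embedding sequence."""
    if s == 0:
        return h_on.copy()
    shifted = np.zeros_like(h_on)
    shifted[s:] = h_on[:-s]
    return shifted

def causal_dilated_conv(x, w, dilation):
    """One causal dilated 1D convolution: output[t] depends only on
    x[t], x[t - d], x[t - 2d], ... (left zero-padding enforces causality).
    x: (T, D_in), w: (K, D_in, D_out) with taps ordered oldest-to-newest."""
    K, T = w.shape[0], x.shape[0]
    pad = np.concatenate([np.zeros(((K - 1) * dilation, x.shape[1])), x], axis=0)
    out = np.zeros((T, w.shape[2]))
    for k in range(K):
        out += pad[k * dilation : k * dilation + T] @ w[k]
    return out

def ema_update(theta_tgt, theta_on, m=0.9):
    """theta_tgt <- m * theta_tgt + (1 - m) * theta_on; no gradients flow
    into the target parameters."""
    return {name: m * theta_tgt[name] + (1 - m) * theta_on[name]
            for name in theta_tgt}
```

Stacking such convolutions with dilations (1, 2, 4, 8) and kernel size 3, as in Section 4.3, gives a receptive field over past context only, which is exactly the causality constraint required of $g(\cdot)$.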
The resulting loss is:

$$\mathcal{L}_{\mathrm{JEPA}} = \frac{1}{|\mathcal{M}|\, N_h} \sum_{\tau \in \mathcal{M}} \sum_{h=1}^{N_h} \left( 1 - \left\langle \frac{\hat{q}^{\mathrm{pred}}_{h,\tau}}{\lVert \hat{q}^{\mathrm{pred}}_{h,\tau} \rVert_2 + \varepsilon_{\cos}},\; \frac{\mathrm{stopgrad}(q^{\mathrm{tgt}}_{h,\tau})}{\lVert \mathrm{stopgrad}(q^{\mathrm{tgt}}_{h,\tau}) \rVert_2 + \varepsilon_{\cos}} \right\rangle \right). \tag{3}$$

The stop-gradient on $q^{\mathrm{tgt}}_{h,\tau}$ ensures that only the predictor is updated to match the targets, not the reverse.

3.4 Query Mismatch and Final Anomaly Score

At inference, AxonAD computes two complementary window-level signals: $d_{\mathrm{rec}}$ (Eq. (1)) and a query mismatch score $d_q$ derived from cosine deviations between predicted and EMA target queries on the tail of the window, emphasizing the most recent timesteps. The tail-aggregated query mismatch is defined as:

$$\tau_0 = \max(s + 1,\; T - k + 1), \qquad k_{\mathrm{eff}} = T - \tau_0 + 1,$$

$$d_q = \frac{1}{N_h\, k_{\mathrm{eff}}} \sum_{h=1}^{N_h} \sum_{\tau = \tau_0}^{T} \left( 1 - \left\langle \frac{\hat{q}^{\mathrm{pred}}_{h,\tau}}{\lVert \hat{q}^{\mathrm{pred}}_{h,\tau} \rVert_2 + \varepsilon_{\cos}},\; \frac{q^{\mathrm{tgt}}_{h,\tau}}{\lVert q^{\mathrm{tgt}}_{h,\tau} \rVert_2 + \varepsilon_{\cos}} \right\rangle \right), \tag{4}$$

where $\tau_0$ enforces both validity under the $s$-step history constraint and a tail focus of nominal length $k$, and $k_{\mathrm{eff}}$ normalizes by the actual number of summed timesteps.

Fig. 3. Score complementarity (schematic). Nominal windows spread along both axes but cluster near the origin in both scores simultaneously. Near-boundary amplitude and coordination anomalies are moderate on both axes, falling inside both single-axis thresholds (dotted lines) but separated by the additive $S$ (dashed diagonal).

Because $d_{\mathrm{rec}}$ and $d_q$ can have very different dynamic ranges across datasets, each component is robustly standardized using median and interquartile range (IQR) statistics computed exclusively on nominal training windows:

$$\mathrm{rz}(u) = \frac{u - \mathrm{median}(u)}{\mathrm{IQR}(u) + \varepsilon_{\mathrm{rz}}}, \qquad \mathrm{IQR}(u) = Q_{0.75}(u) - Q_{0.25}(u), \tag{5}$$

and the final anomaly score is:

$$S(X) = \mathrm{rz}(d_{\mathrm{rec}}(X)) + \mathrm{rz}(d_q(X)). \tag{6}$$

The additive form means that a single threshold on $S$ captures anomalies that elevate either component or both. Figure 3 illustrates the geometry: amplitude anomalies raise $d_{\mathrm{rec}}$ while coordination anomalies raise $d_q$, and the diagonal constant-score contour separates all anomaly types from the nominal cluster.

Training and inference consistency. The cosine distance used for masked supervision in Eq. (3) is the same metric reused at inference as $d_q$ in Eq. (4). This means the predictor is trained directly on the deployed scoring objective. An attention divergence diagnostic (KL tail) is implemented for ablation analysis only and is not part of the default scoring pipeline.

4 Experimental Setup

Protocol. We evaluate in two settings: (i) proprietary in-vehicle telemetry with interval annotations, and (ii) the TSB-AD multivariate suite (17 datasets, 180 series) under the official pipeline [22, 25].

Fig. 4. Chronological split and anomaly onset for the proprietary telemetry stream (80,000 steps, 19 channels, T = 100, stride 1; test anomaly rate 0.089, 30 intervals, median duration 108; first anomaly at index 43,410).

Training is strictly unsupervised. All parameters and robust scoring statistics are fit on nominal training windows only, with labels reserved for evaluation. Hyperparameters for all methods are selected on the official TSB-AD tuning component (20 multivariate series) and then fixed. Telemetry labels are never used for hyperparameter selection, thresholding, postprocessing, or early stopping. Early stopping uses a fixed criterion (validation reconstruction error) on an unlabeled split carved from the nominal training prefix.

Label-free transfer check.
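The inference-time scoring of Eqs. (1) and (4)–(6) can be sketched directly in NumPy; the helper names and the exact values of the ε constants are illustrative, not taken from the released code.

```python
import numpy as np

EPS_COS, EPS_RZ = 1e-8, 1e-8  # illustrative stabilizers for Eqs. (3)-(5)

def d_rec(x_hat, x):
    """Eq. (1): mean squared l2 reconstruction error over timesteps.
    x_hat, x: (T, F)."""
    return float(np.mean(np.sum((x_hat - x) ** 2, axis=-1)))

def d_query_mismatch(q_pred, q_tgt, s=1, k=10):
    """Eq. (4): tail-aggregated cosine mismatch between predicted and EMA
    target queries, each of shape (N_h, T, d_h)."""
    _, t, _ = q_pred.shape
    tau0 = max(s + 1, t - k + 1)        # 1-based start of the valid tail
    sl = slice(tau0 - 1, t)             # corresponding 0-based indices
    p, g = q_pred[:, sl], q_tgt[:, sl]
    cos = np.sum(p * g, axis=-1) / (
        (np.linalg.norm(p, axis=-1) + EPS_COS) *
        (np.linalg.norm(g, axis=-1) + EPS_COS))
    return float(np.mean(1.0 - cos))    # averages over N_h * k_eff entries

def robust_z(u, train_u):
    """Eq. (5): median/IQR standardization with statistics computed on
    nominal training windows only."""
    q25, q75 = np.quantile(train_u, [0.25, 0.75])
    return (u - np.median(train_u)) / (q75 - q25 + EPS_RZ)

def anomaly_score(dr, dq, train_dr, train_dq):
    """Eq. (6): S = rz(d_rec) + rz(d_q)."""
    return robust_z(dr, train_dr) + robust_z(dq, train_dq)
```

Note that `np.mean` over the $(N_h, k_{\mathrm{eff}})$ cosine array realizes the $1/(N_h k_{\mathrm{eff}})$ normalization of Eq. (4), and the training objective of Eq. (3) reuses the same per-head, per-timestep cosine term with masked timesteps in place of the tail.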
To verify that hyperparameters selected on TSB-AD transfer reasonably to the telemetry domain, we compare distributional similarity using z-scored summary features (scale, shape, autocorrelation, and spectral descriptors) computed on train segments. The telemetry segment is not an outlier: its leave-one-out Mahalanobis distance falls at the 45th percentile and its nearest-neighbor distance at the 55th percentile.

4.1 Datasets, Splits, and Windowing

The proprietary telemetry stream contains 80,000 timesteps with $F = 19$ continuous channels (Figure 4). Anomalies are annotated as contiguous intervals (30 total, duration 1 to 292 with median 108, affecting 1 to 4 channels with median 2) spanning the following types: flatline, drift, level shift, spike, variance jump, and correlation break. The chronological split is: train $[0, 40000)$ with an internal 20% validation holdout (train_sub $[0, 32000)$, val $[32000, 40000)$), and test $[40000, 80000)$. The first anomaly occurs at index 43,410, so both training and validation partitions are anomaly-free.

The TSB-AD multivariate suite aggregates 180 series across 17 datasets [25, 22]. We follow the official evaluator and protocol throughout.

Causality and latency. Window scoring uses no lookahead: $S(X_{t-T+1:t})$ depends only on samples up to $t$. For real-time deployment, each window score is naturally assigned to its endpoint $t$ (detection time). However, to comply with the point-wise metric computation of the TSB-AD evaluation framework, offline benchmark scores are assigned to the center of the window at $t - \lfloor (T-1)/2 \rfloor$. This sequence alignment applies boundary edge padding and effectively incorporates a lookahead of $\lfloor (T-1)/2 \rfloor$ steps solely for temporal localization evaluation. Reconstruction attention remains bidirectional within each window, while query prediction is history-only via the $s$-step shift.
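The endpoint-versus-center assignment described above can be sketched as follows; the function and argument names are ours, and the edge-padding convention is one plausible reading of the evaluator's behavior.

```python
import numpy as np

def align_scores(window_scores, series_len, T=100, mode="center"):
    """Map per-window scores (windows end at t = T-1 .. series_len-1,
    stride 1) onto per-timestep scores.
    'endpoint': assign each score to the window end (real time, no lookahead).
    'center':   shift back by floor((T-1)/2) with edge padding, matching the
                point-wise offline evaluation described in the text."""
    window_scores = np.asarray(window_scores, dtype=float)
    assert len(window_scores) == series_len - T + 1
    ends = np.arange(T - 1, series_len)
    idx = ends if mode == "endpoint" else ends - (T - 1) // 2
    out = np.empty(series_len)
    out[idx] = window_scores
    out[: idx[0]] = window_scores[0]       # edge padding at the start
    out[idx[-1] + 1 :] = window_scores[-1]  # edge padding at the end (center mode)
    return out
```

With $T = 100$ the center alignment moves each score back by 49 steps, which is exactly the $\lfloor (T-1)/2 \rfloor$ lookahead the paper attributes solely to offline localization evaluation.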
4.2 Baselines and Metrics

We compare against classical, deep reconstruction and forecasting, and Transformer-based detectors implemented in the official TSB-AD framework: Isolation Forest [21], Extended Isolation Forest [15], LSTMAD [24], OmniAnomaly [28], USAD [3], VAE variants [10, 18, 26, 27] including VASP [31], and Transformer-based baselines (TFTResidual [20], TimesNet [32], TranAD [29], AnomalyTransformer [34]). The main paper reports a representative subset, with full results in the Appendix.

We report threshold-free ranking via AUC-ROC, AUC-PR, VUS-ROC, and VUS-PR [6], and localization via PA-F1, Event-F1, Range-F1, and Affiliation-F1 using the official evaluator [22, 25]. For F1-family metrics, operating points follow the evaluator's default threshold sweep (oracle).

4.3 AxonAD Configuration

A single configuration is used across all datasets. The model applies a linear embedding with learnable positional bias ($\mathcal{N}(0, 0.02)$), prenorm multi-head self-attention, and a feedforward network of width $2D$ with ReLU. The predictive branch is a causal temporal convolutional network [17] with dilations $(1, 2, 4, 8)$, kernel size 3, and dropout 0.1. The EMA target encoder [14] is initialized from the online model and updated each step with momentum $m = 0.9$. Query supervision uses time-patch masking focused on later timesteps (mask ratio 0.5, block fraction 0.5). Training minimizes reconstruction MSE plus the cosine query prediction loss with uncertainty weighting [9], optimized via AdamW [23] (weight decay $10^{-5}$), gradient clipping at 1.0, and early stopping on validation reconstruction error. Unless stated otherwise, reported results use $T = 100$, $D = 128$, 8 attention heads, forecast horizon $s = 1$, tail length $k = 10$, learning rate $5 \times 10^{-4}$, batch size 128, and up to 50 epochs with patience 3.
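For reference, the single cross-dataset configuration of Section 4.3 can be collected in one place. The values are taken from the paper; the dataclass and field names are ours.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AxonADConfig:
    # window and model capacity
    window_T: int = 100
    d_model: int = 128           # D; feedforward width is 2 * D
    n_heads: int = 8             # N_h, so d_h = 128 / 8 = 16
    # predictive branch (causal TCN)
    forecast_horizon_s: int = 1
    tail_k: int = 10
    tcn_dilations: tuple = (1, 2, 4, 8)
    tcn_kernel: int = 3
    dropout: float = 0.1
    # EMA target encoder and masked query supervision
    ema_momentum: float = 0.9
    mask_ratio: float = 0.5
    block_fraction: float = 0.5
    # optimization (AdamW)
    lr: float = 5e-4
    weight_decay: float = 1e-5
    grad_clip: float = 1.0
    batch_size: int = 128
    max_epochs: int = 50
    patience: int = 3
```

A frozen dataclass makes the "single configuration across all datasets" claim mechanically checkable: any per-dataset tweak would require constructing a new object rather than silently mutating shared state.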
Results are averaged over four seeds $\{2024, \dots, 2027\}$. All experiments were run on a single Apple MacBook Pro (M3 Max, 32 GB unified memory) using PyTorch with Apple Silicon acceleration.

5 Results

We first report results on the proprietary telemetry stream, which is the primary applied setting, and then on the TSB-AD benchmark to assess generalization.

Table 1 reports results on the proprietary telemetry stream. AxonAD achieves the strongest threshold-free metrics by a wide margin, with AUC-PR of 0.285 versus 0.128 for the next best method (SISVAE). The gains are especially pronounced on Event-F1 (0.420 vs 0.255) and Range-F1 (0.328 vs 0.262), indicating that AxonAD not only ranks anomalies more accurately but also localizes them better in time. The large gap is consistent with the prevalence of coordination breaks in this dataset: anomalies that alter cross-channel dependencies without producing large per-channel excursions are precisely the regime where query mismatch provides the most value.

Table 1. Proprietary telemetry results (TSB-AD evaluation suite). Mean ± std over four seeds. Best per metric in bold.
Model              AUC-PR       AUC-ROC      VUS-PR       VUS-ROC      PA-F1        Event-F1     Range-F1     Affiliation-F1
LSTMAD [24]        0.082±0.004  0.651±0.009  0.083±0.004  0.624±0.009  0.533±0.014  0.255±0.015  0.139±0.006  0.723±0.003
SISVAE [18]        0.128±0.030  0.586±0.026  0.070±0.012  0.504±0.052  0.270±0.100  0.231±0.060  0.225±0.054  0.699±0.018
TFTResidual [20]   0.071±0.006  0.644±0.025  0.070±0.005  0.582±0.018  0.424±0.022  0.164±0.019  0.110±0.009  0.752±0.026
VSVAE [26]         0.100±0.005  0.617±0.031  0.065±0.005  0.535±0.027  0.214±0.048  0.188±0.004  0.262±0.037  0.730±0.012
M2N2 [1]           0.065±0.001  0.596±0.001  0.064±0.001  0.553±0.004  0.392±0.022  0.196±0.020  0.120±0.009  0.680±0.000
MAVAE [10]         0.094±0.006  0.561±0.034  0.059±0.006  0.487±0.051  0.220±0.076  0.199±0.015  0.202±0.017  0.680±0.000
VASP [31]          0.050±0.001  0.540±0.014  0.051±0.002  0.449±0.016  0.190±0.008  0.099±0.004  0.119±0.013  0.686±0.008
WVAE [27]          0.087±0.013  0.541±0.043  0.057±0.007  0.467±0.050  0.249±0.103  0.226±0.038  0.163±0.011  0.680±0.000
TimesNet [32]      0.055±0.001  0.579±0.003  0.056±0.000  0.531±0.003  0.306±0.020  0.102±0.004  0.092±0.002  0.680±0.000
IForest [21]       0.041±0.000  0.472±0.000  0.044±0.000  0.328±0.000  0.140±0.000  0.086±0.000  0.195±0.000  0.682±0.000
TranAD [29]        0.041±0.000  0.470±0.003  0.044±0.000  0.417±0.004  0.237±0.003  0.086±0.000  0.107±0.007  0.680±0.000
USAD [3]           0.040±0.001  0.470±0.010  0.044±0.001  0.371±0.011  0.122±0.005  0.087±0.001  0.152±0.033  0.682±0.001
OmniAnomaly [28]   0.041±0.000  0.459±0.000  0.043±0.000  0.338±0.000  0.150±0.000  0.086±0.000  0.126±0.000  0.680±0.000
AxonAD (ours)      0.285±0.014  0.702±0.011  0.157±0.012  0.634±0.017  0.533±0.016  0.420±0.019  0.328±0.014  0.715±0.024

Table 2. TSB-AD multivariate benchmark (17 datasets, 180 series). Mean ± std over all series. Best per metric in bold.

Model              AUC-PR       AUC-ROC      VUS-PR       VUS-ROC      PA-F1        Event-F1     Range-F1     Affiliation-F1
VASP [31]          0.339±0.319  0.762±0.195  0.401±0.338  0.809±0.185  0.669±0.318  0.520±0.361  0.400±0.260  0.849±0.123
OmniAnomaly [28]   0.372±0.341  0.744±0.250  0.424±0.354  0.777±0.240  0.627±0.354  0.528±0.367  0.432±0.292  0.841±0.126
WVAE [27]          0.354±0.331  0.747±0.248  0.413±0.349  0.778±0.248  0.576±0.388  0.502±0.383  0.365±0.280  0.838±0.137
USAD [3]           0.363±0.339  0.738±0.256  0.412±0.350  0.771±0.244  0.622±0.355  0.519±0.364  0.422±0.288  0.837±0.131
SISVAE [18]        0.323±0.290  0.759±0.234  0.372±0.315  0.786±0.227  0.551±0.367  0.470±0.355  0.369±0.278  0.824±0.129
MAVAE [10]         0.299±0.297  0.697±0.256  0.351±0.322  0.728±0.256  0.568±0.372  0.463±0.360  0.325±0.264  0.812±0.132
VSVAE [26]         0.290±0.286  0.709±0.256  0.342±0.321  0.734±0.257  0.596±0.355  0.487±0.347  0.374±0.254  0.841±0.121
M2N2 [1]           0.319±0.358  0.740±0.198  0.323±0.359  0.779±0.183  0.876±0.184  0.603±0.372  0.282±0.233  0.860±0.118
TranAD [29]        0.258±0.318  0.675±0.221  0.308±0.347  0.742±0.210  0.753±0.314  0.530±0.367  0.218±0.154  0.826±0.125
TFTResidual [20]   0.250±0.313  0.710±0.210  0.308±0.338  0.777±0.186  0.746±0.318  0.472±0.362  0.207±0.161  0.846±0.114
TimesNet [32]      0.201±0.246  0.618±0.279  0.271±0.297  0.686±0.277  0.750±0.292  0.427±0.354  0.176±0.129  0.821±0.117
IForest [21]       0.210±0.232  0.704±0.191  0.253±0.260  0.750±0.184  0.655±0.335  0.403±0.322  0.243±0.178  0.801±0.110
LSTMAD [24]        0.248±0.328  0.597±0.337  0.245±0.329  0.626±0.343  0.657±0.412  0.507±0.413  0.198±0.175  0.701±0.350
AxonAD (ours)      0.437±0.323  0.825±0.169  0.493±0.325  0.859±0.146  0.698±0.316  0.600±0.336  0.471±0.290  0.860±0.132

Table 2 shows that these gains generalize beyond telemetry.
On the TSB-AD multivariate suite, AxonAD achieves the highest mean AUC-PR (0.437), VUS-PR (0.493), and Range-F1 (0.471). M2N2 leads on PA-F1, and VASP and OmniAnomaly are competitive on Affiliation-F1, but all three rank below AxonAD on threshold-free metrics. Classical detectors achieve moderate AUC-ROC but lower AUC-PR and range-aware scores. Transformer-based detectors are competitive on subsets of series but show lower mean ranking in aggregate.

Figure 5 confirms that improvements are broadly distributed: AxonAD wins on a clear majority of the 180 series against every baseline, with all paired Wilcoxon signed-rank tests yielding $p < 10^{-4}$.

Fig. 5. Paired AUC-PR comparison on TSB-AD multivariate ($n = 180$). Left: win/loss counts. Right: median paired difference with lollipop connectors from zero. All paired Wilcoxon tests yield $p < 10^{-4}$ with entirely negative 95% bootstrap CIs (full statistics in the Appendix).

6 Ablation Studies

Table 3 reports ablations on the TSB-AD multivariate tuning subset (20 series) under the official protocol. All variants share identical preprocessing, windowing, and metric computation. Rows are grouped by the design dimension under study and discussed in that order below.

Scoring components. The base configuration (Base) achieves the strongest balanced profile across ranking and localization metrics. Removing the query branch at inference and using $S = \mathrm{rz}(d_{\mathrm{rec}})$ alone (Recon only) reduces VUS-PR by 0.055 and Event-F1 by 0.117.
Retaining both branches but replacing cosine mismatch with MSE distance in query space (Score MSE) yields a similar drop, indicating that the cosine formulation matters beyond simply combining two scores. Using the query signal alone (JEPA only, Q) reduces AUC-PR by 0.145 and AUC-ROC by 0.097 despite retaining competitive PA-F1, confirming that reconstruction is necessary for reliable ranking across all anomaly types. The cosine-based combined score therefore yields the most reliable behavior across metric families.

KL tail. Adding attention divergence on top of the default score (Score MSE+JEPA KL) yields no consistent improvement over Base on any metric. We treat the KL tail as a diagnostic signal only and exclude it from the default scoring pipeline.

EMA and masking. Removing the EMA target encoder entirely (EMA 0, i.e. $m = 0$) reduces AUC-PR by 0.024 and Event-F1 by 0.051. Moderate momentum (EMA 0.99, $m = 0.99$) incurs a similar AUC-PR penalty of 0.048, while very high momentum (EMA 0.999, $m = 0.999$) likewise degrades ranking; both extremes confirm that the default $m = 0.9$ strikes the right balance between target stability and responsiveness to online updates. Increasing the masking ratio to 0.8 (Mask 0.8) similarly reduces AUC-PR and Event-F1, indicating that overly aggressive masking makes the predictive task too hard during training.

Table 3. AxonAD ablation on the TSB-AD multivariate tuning subset (20 series). Mean ± std. Best per metric in bold. Rows are grouped to match the discussion order below.
Variant | AUC-PR | AUC-ROC | VUS-PR | VUS-ROC | PA-F1 | Event-F1 | Range-F1 | Affiliation-F1
Base | 0.558 ± 0.285 | 0.861 ± 0.137 | 0.658 ± 0.301 | 0.915 ± 0.102 | 0.855 ± 0.248 | 0.773 ± 0.262 | 0.564 ± 0.263 | 0.904 ± 0.123
Recon only | 0.511 ± 0.330 | 0.820 ± 0.218 | 0.603 ± 0.366 | 0.858 ± 0.223 | 0.728 ± 0.325 | 0.656 ± 0.322 | 0.541 ± 0.303 | 0.856 ± 0.142
Score MSE | 0.513 ± 0.327 | 0.828 ± 0.204 | 0.604 ± 0.365 | 0.868 ± 0.208 | 0.730 ± 0.317 | 0.652 ± 0.311 | 0.534 ± 0.293 | 0.852 ± 0.139
JEPA only, Q | 0.413 ± 0.317 | 0.764 ± 0.200 | 0.533 ± 0.353 | 0.846 ± 0.177 | 0.822 ± 0.283 | 0.683 ± 0.321 | 0.396 ± 0.248 | 0.892 ± 0.114
Score MSE+JEPA KL | 0.554 ± 0.285 | 0.860 ± 0.137 | 0.655 ± 0.300 | 0.916 ± 0.100 | 0.854 ± 0.248 | 0.772 ± 0.262 | 0.560 ± 0.266 | 0.896 ± 0.127
EMA 0 | 0.534 ± 0.302 | 0.855 ± 0.143 | 0.636 ± 0.319 | 0.908 ± 0.113 | 0.818 ± 0.250 | 0.722 ± 0.266 | 0.560 ± 0.258 | 0.873 ± 0.134
EMA 0.99 | 0.510 ± 0.325 | 0.859 ± 0.137 | 0.608 ± 0.350 | 0.910 ± 0.105 | 0.824 ± 0.246 | 0.700 ± 0.269 | 0.564 ± 0.263 | 0.883 ± 0.127
EMA 0.999 | 0.527 ± 0.310 | 0.856 ± 0.146 | 0.636 ± 0.322 | 0.913 ± 0.114 | 0.804 ± 0.249 | 0.724 ± 0.269 | 0.556 ± 0.261 | 0.882 ± 0.134
Mask 0.8 | 0.533 ± 0.306 | 0.859 ± 0.142 | 0.631 ± 0.329 | 0.911 ± 0.111 | 0.808 ± 0.251 | 0.703 ± 0.271 | 0.547 ± 0.263 | 0.864 ± 0.137
Heads=4 | 0.525 ± 0.306 | 0.856 ± 0.139 | 0.630 ± 0.323 | 0.914 ± 0.101 | 0.808 ± 0.247 | 0.711 ± 0.258 | 0.531 ± 0.272 | 0.876 ± 0.128
D=64 | 0.516 ± 0.334 | 0.836 ± 0.183 | 0.613 ± 0.357 | 0.896 ± 0.147 | 0.796 ± 0.260 | 0.692 ± 0.289 | 0.553 ± 0.267 | 0.860 ± 0.142
Horizon 25 | 0.502 ± 0.328 | 0.854 ± 0.150 | 0.599 ± 0.361 | 0.907 ± 0.118 | 0.821 ± 0.250 | 0.676 ± 0.294 | 0.517 ± 0.304 | 0.881 ± 0.128
Predict keys | 0.405 ± 0.332 | 0.735 ± 0.233 | 0.495 ± 0.369 | 0.803 ± 0.227 | 0.748 ± 0.312 | 0.622 ± 0.359 | 0.462 ± 0.279 | 0.857 ± 0.126
Predict values | 0.405 ± 0.332 | 0.736 ± 0.232 | 0.496 ± 0.372 | 0.801 ± 0.228 | 0.744 ± 0.326 | 0.623 ± 0.358 | 0.439 ± 0.294 | 0.857 ± 0.126
Predict attn map, Q | 0.403 ± 0.334 | 0.754 ± 0.214 | 0.500 ± 0.378 | 0.832 ± 0.198 | 0.766 ± 0.307 | 0.630 ± 0.361 | 0.379 ± 0.251 | 0.870 ± 0.133
Predict attn map, QK | 0.388 ± 0.369 | 0.757 ± 0.228 | 0.486 ± 0.395 | 0.826 ± 0.210 | 0.709 ± 0.334 | 0.553 ± 0.365 | 0.402 ± 0.286 | 0.844 ± 0.132
Predict hidden state | 0.400 ± 0.337 | 0.742 ± 0.234 | 0.481 ± 0.371 | 0.815 ± 0.221 | 0.720 ± 0.337 | 0.604 ± 0.360 | 0.428 ± 0.260 | 0.856 ± 0.116

Fig. 6. Sensitivity to forecast horizon s (left) and tail aggregation length k (right) on the tuning subset. Both panels plot AUC-PR, VUS-PR, and Range-F1.

Capacity and horizon. Reducing the number of attention heads from 8 to 4 (Heads=4) lowers AUC-PR by 0.033, with a smaller effect on localization metrics. Halving the model dimension from 128 to 64 (D=64) reduces AUC-PR by 0.042 and AUC-ROC by 0.025. Increasing the forecast horizon to s = 25 (Horizon 25) reduces AUC-PR by 0.056, consistent with a harder prediction task introducing more score variance at inference.

Prediction target. Predicting keys (Predict keys), values (Predict values), attention maps scored with query inputs only (Predict attn map, Q), attention maps scored with both query and key inputs (Predict attn map, QK), or intermediate hidden states (Predict hidden state) is consistently inferior to predicting query vectors across all ranking and localization metrics, supporting the design choice of query prediction as the supervision and scoring target.

Parameter sensitivity. Figure 6 shows sensitivity to the forecast horizon s and the tail length k. Performance peaks at s = 1 (AUC-PR 0.545, Range-F1 0.553) and is generally lower for larger horizons, as a harder prediction task increases score variance. For tail length, threshold-free ranking is stable across k ∈ {3, 5, 10, 20} (AUC-PR in [0.524, 0.537]), while Range-F1 peaks at k = 10, suggesting that k primarily controls temporal smoothing.

Mechanistic diagnostics. To verify that query mismatch captures meaningful attention structure rather than noise, we run a descriptive analysis on the tuning subset (not used for model selection). The Spearman correlation between query deviation magnitude ∥∆Q∥ and tail KL divergence KL(A_tgt ∥ A_pred), where A_tgt and A_pred denote attention weights from EMA target and predicted queries respectively, is frequently positive (median ρ = 0.677; ρ ≥ 0.50 in 15 of 20 series). This confirms that query mismatch tracks genuine attention redistribution. Tail attention entropy is nondegenerate (range 3.18 to 4.53), ruling out collapsed attention as a confound.

The window-level correlation between reconstruction error and query mismatch is small (median ρ = 0.211). Among anomalous windows, the fraction with high query mismatch but low reconstruction error is 0.192, and the reverse is 0.095. Both regimes occur in most series, and combining components improves AUC-PR over the best single component in 8 of 20 series. This supports the coverage interpretation underlying the combined score.

Fig. 7. Runtime on the telemetry stream (80,000 samples, stride 1). Left: wall-clock time split into fitting and scoring. Right: per-window scoring latency. Classical baselines have no iterative training, so their time is entirely attributed to scoring.

Runtime. Figure 7 profiles runtime on the telemetry stream. AxonAD has the highest fitting cost (334 s) but achieves a per-window scoring latency of 0.069 ms, lower than OmniAnomaly (0.190 ms) and Isolation Forest (0.461 ms). The fitting cost reflects iterative gradient-based training and is amortized at deployment.
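The combined score pairs robustly standardized reconstruction error with the tail-aggregated cosine query mismatch. Below is a minimal sketch of that combination, assuming per-window reconstruction errors and per-timestep predicted and EMA-target query vectors are already available; the function names and the median/MAD standardizer are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def robust_standardize(s, eps=1e-8):
    """Median/MAD standardization to align heterogeneous score scales."""
    med = np.median(s)
    mad = np.median(np.abs(s - med)) + eps
    return (s - med) / mad

def tail_query_mismatch(q_pred, q_tgt, k=10):
    """Tail-aggregated cosine deviation between predicted and target queries.

    q_pred, q_tgt: arrays of shape (T, D) with per-timestep query vectors.
    Returns the mean (1 - cosine similarity) over the last k timesteps.
    """
    tail_p, tail_t = q_pred[-k:], q_tgt[-k:]
    num = np.sum(tail_p * tail_t, axis=-1)
    den = np.linalg.norm(tail_p, axis=-1) * np.linalg.norm(tail_t, axis=-1) + 1e-8
    return float(np.mean(1.0 - num / den))

def combined_score(recon_err, mismatch):
    """Sum of robust-standardized components (one value per window)."""
    return robust_standardize(np.asarray(recon_err)) + robust_standardize(np.asarray(mismatch))
```

A window then ranks as anomalous when either component deviates strongly, which matches the coverage interpretation above: reconstruction covers amplitude anomalies, query mismatch covers dependency shifts.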
For a fleet-monitoring pipeline processing windows at 10 Hz, the 0.069 ms latency leaves ample margin for real-time operation.

7 Conclusion

AxonAD detects multivariate time series anomalies by monitoring the predictability of attention query vectors. A bidirectional reconstruction pathway is coupled with a history-only predictor trained via masked EMA distillation in query space, producing a query mismatch signal that complements reconstruction residuals and responds to structural dependency shifts. Robust standardization of both components enables reliable score combination across heterogeneous datasets without label-based calibration.

On proprietary in-vehicle telemetry, where coordination breaks between steering, acceleration, and powertrain channels are the dominant fault mode, AxonAD improves AUC-PR by 2.2× over the next best baseline and Event-F1 by 1.6×. These gains transfer to the multivariate TSB-AD benchmark (17 datasets, 180 series), where AxonAD leads on threshold-free ranking and range-aware localization. Ablations establish that query prediction outperforms alternative predictive targets and that combining both scores is necessary for the best aggregate performance. The low per-window inference latency (0.069 ms) and the absence of label-based threshold tuning support integration into streaming vehicle monitoring pipelines.

Supplementary Material

S1 Additional Tables

Table S1. TSB-AD multivariate benchmark (17 datasets, 180 time series). Mean ± standard deviation over all evaluated series. Best result per metric in bold.
Model | AUC-PR | AUC-ROC | VUS-PR | VUS-ROC | PA-F1 | Event-F1 | Range-F1 | Affiliation-F1
VASP | 0.339 ± 0.319 | 0.762 ± 0.195 | 0.401 ± 0.338 | 0.809 ± 0.185 | 0.669 ± 0.318 | 0.520 ± 0.361 | 0.400 ± 0.260 | 0.849 ± 0.123
OmniAnomaly | 0.372 ± 0.341 | 0.744 ± 0.250 | 0.424 ± 0.354 | 0.777 ± 0.240 | 0.627 ± 0.354 | 0.528 ± 0.367 | 0.432 ± 0.292 | 0.841 ± 0.126
WVAE | 0.354 ± 0.331 | 0.747 ± 0.248 | 0.413 ± 0.349 | 0.778 ± 0.248 | 0.576 ± 0.388 | 0.502 ± 0.383 | 0.365 ± 0.280 | 0.838 ± 0.137
USAD | 0.363 ± 0.339 | 0.738 ± 0.256 | 0.412 ± 0.350 | 0.771 ± 0.244 | 0.622 ± 0.355 | 0.519 ± 0.364 | 0.422 ± 0.288 | 0.837 ± 0.131
SISVAE | 0.323 ± 0.290 | 0.759 ± 0.234 | 0.372 ± 0.315 | 0.786 ± 0.227 | 0.551 ± 0.367 | 0.470 ± 0.355 | 0.369 ± 0.278 | 0.824 ± 0.129
OFA | 0.300 ± 0.300 | 0.639 ± 0.289 | 0.367 ± 0.342 | 0.694 ± 0.286 | 0.675 ± 0.305 | 0.517 ± 0.372 | 0.360 ± 0.251 | 0.833 ± 0.126
CNN | 0.347 ± 0.356 | 0.770 ± 0.176 | 0.352 ± 0.359 | 0.807 ± 0.164 | 0.828 ± 0.266 | 0.643 ± 0.366 | 0.301 ± 0.240 | 0.866 ± 0.120
MAVAE | 0.299 ± 0.297 | 0.697 ± 0.256 | 0.351 ± 0.322 | 0.728 ± 0.256 | 0.568 ± 0.372 | 0.463 ± 0.360 | 0.325 ± 0.264 | 0.812 ± 0.132
VSVAE | 0.290 ± 0.286 | 0.709 ± 0.256 | 0.342 ± 0.321 | 0.734 ± 0.257 | 0.596 ± 0.355 | 0.487 ± 0.347 | 0.374 ± 0.254 | 0.841 ± 0.121
GDN | 0.272 ± 0.305 | 0.738 ± 0.193 | 0.332 ± 0.329 | 0.802 ± 0.175 | 0.756 ± 0.310 | 0.499 ± 0.364 | 0.208 ± 0.134 | 0.846 ± 0.119
M2N2 | 0.319 ± 0.358 | 0.740 ± 0.198 | 0.323 ± 0.359 | 0.779 ± 0.183 | 0.876 ± 0.184 | 0.603 ± 0.372 | 0.282 ± 0.233 | 0.860 ± 0.118
TranAD | 0.258 ± 0.318 | 0.675 ± 0.221 | 0.308 ± 0.347 | 0.742 ± 0.210 | 0.753 ± 0.314 | 0.530 ± 0.367 | 0.218 ± 0.154 | 0.826 ± 0.125
TFTResidual | 0.250 ± 0.313 | 0.710 ± 0.210 | 0.308 ± 0.338 | 0.777 ± 0.186 | 0.746 ± 0.318 | 0.472 ± 0.362 | 0.207 ± 0.161 | 0.846 ± 0.114
KMeansAD | 0.252 ± 0.267 | 0.691 ± 0.202 | 0.296 ± 0.302 | 0.732 ± 0.195 | 0.675 ± 0.309 | 0.483 ± 0.364 | 0.326 ± 0.235 | 0.819 ± 0.126
AutoEncoder | 0.294 ± 0.373 | 0.669 ± 0.212 | 0.295 ± 0.368 | 0.691 ± 0.207 | 0.597 ± 0.357 | 0.439 ± 0.411 | 0.283 ± 0.297 | 0.800 ± 0.139
PCA | 0.242 ± 0.293 | 0.676 ± 0.238 | 0.277 ± 0.307 | 0.712 ± 0.223 | 0.514 ± 0.340 | 0.370 ± 0.329 | 0.325 ± 0.263 | 0.789 ± 0.122
TimesNet | 0.201 ± 0.246 | 0.618 ± 0.279 | 0.271 ± 0.297 | 0.686 ± 0.277 | 0.750 ± 0.292 | 0.427 ± 0.354 | 0.176 ± 0.129 | 0.821 ± 0.117
FITS | 0.197 ± 0.253 | 0.611 ± 0.271 | 0.267 ± 0.300 | 0.686 ± 0.274 | 0.763 ± 0.281 | 0.422 ± 0.342 | 0.181 ± 0.131 | 0.816 ± 0.115
Donut | 0.213 ± 0.270 | 0.627 ± 0.239 | 0.262 ± 0.308 | 0.693 ± 0.237 | 0.525 ± 0.414 | 0.406 ± 0.385 | 0.180 ± 0.167 | 0.769 ± 0.200
CBLOF | 0.263 ± 0.344 | 0.664 ± 0.206 | 0.260 ± 0.341 | 0.697 ± 0.193 | 0.648 ± 0.340 | 0.448 ± 0.412 | 0.302 ± 0.283 | 0.811 ± 0.149
IForest | 0.210 ± 0.232 | 0.704 ± 0.191 | 0.253 ± 0.260 | 0.750 ± 0.184 | 0.655 ± 0.335 | 0.403 ± 0.322 | 0.243 ± 0.178 | 0.801 ± 0.110
LSTMAD | 0.248 ± 0.328 | 0.597 ± 0.337 | 0.245 ± 0.329 | 0.626 ± 0.343 | 0.657 ± 0.412 | 0.507 ± 0.413 | 0.198 ± 0.175 | 0.701 ± 0.350
RobustPCA | 0.238 ± 0.349 | 0.589 ± 0.241 | 0.238 ± 0.352 | 0.616 ± 0.235 | 0.573 ± 0.387 | 0.379 ± 0.385 | 0.332 ± 0.318 | 0.789 ± 0.135
EIF | 0.186 ± 0.218 | 0.667 ± 0.174 | 0.210 ± 0.258 | 0.708 ± 0.168 | 0.741 ± 0.291 | 0.438 ± 0.374 | 0.258 ± 0.230 | 0.812 ± 0.121
COPOD | 0.205 ± 0.292 | 0.652 ± 0.185 | 0.203 ± 0.290 | 0.686 ± 0.177 | 0.717 ± 0.320 | 0.414 ± 0.364 | 0.242 ± 0.237 | 0.799 ± 0.128
HBOS | 0.161 ± 0.196 | 0.633 ± 0.183 | 0.190 ± 0.247 | 0.672 ± 0.184 | 0.667 ± 0.331 | 0.399 ± 0.345 | 0.244 ± 0.225 | 0.796 ± 0.117
KNN | 0.133 ± 0.156 | 0.499 ± 0.218 | 0.176 ± 0.220 | 0.580 ± 0.219 | 0.681 ± 0.366 | 0.447 ± 0.391 | 0.205 ± 0.149 | 0.791 ± 0.140
LOF | 0.096 ± 0.092 | 0.534 ± 0.098 | 0.138 ± 0.192 | 0.597 ± 0.147 | 0.563 ± 0.368 | 0.325 ± 0.328 | 0.149 ± 0.143 | 0.764 ± 0.133
AnomalyTransformer | 0.068 ± 0.060 | 0.506 ± 0.053 | 0.115 ± 0.184 | 0.538 ± 0.098 | 0.658 ± 0.359 | 0.361 ± 0.363 | 0.138 ± 0.121 | 0.737 ± 0.195
AxonAD (ours) | 0.437 ± 0.323 | 0.825 ± 0.169 | 0.493 ± 0.325 | 0.859 ± 0.146 | 0.698 ± 0.316 | 0.600 ± 0.336 | 0.471 ± 0.290 | 0.860 ± 0.132

Table S2. TSB-AD multivariate benchmark protocol on the proprietary dataset. Mean ± standard deviation over 4 random seeds. Best result per metric in bold.
Model | AUC-PR | AUC-ROC | VUS-PR | VUS-ROC | PA-F1 | Event-F1 | Range-F1 | Affiliation-F1
LSTMAD | 0.082 ± 0.004 | 0.651 ± 0.009 | 0.083 ± 0.004 | 0.624 ± 0.009 | 0.533 ± 0.014 | 0.255 ± 0.015 | 0.139 ± 0.006 | 0.723 ± 0.003
SISVAE | 0.128 ± 0.030 | 0.586 ± 0.026 | 0.070 ± 0.012 | 0.504 ± 0.052 | 0.270 ± 0.100 | 0.231 ± 0.060 | 0.225 ± 0.054 | 0.699 ± 0.018
TFTResidual | 0.071 ± 0.006 | 0.644 ± 0.025 | 0.070 ± 0.005 | 0.582 ± 0.018 | 0.424 ± 0.022 | 0.164 ± 0.019 | 0.110 ± 0.009 | 0.752 ± 0.026
RobustPCA | 0.070 ± 0.000 | 0.634 ± 0.000 | 0.066 ± 0.000 | 0.570 ± 0.000 | 0.312 ± 0.000 | 0.162 ± 0.000 | 0.100 ± 0.000 | 0.680 ± 0.000
VSVAE | 0.100 ± 0.005 | 0.617 ± 0.031 | 0.065 ± 0.005 | 0.535 ± 0.027 | 0.214 ± 0.048 | 0.188 ± 0.004 | 0.262 ± 0.037 | 0.730 ± 0.012
M2N2 | 0.065 ± 0.001 | 0.596 ± 0.001 | 0.064 ± 0.001 | 0.553 ± 0.004 | 0.392 ± 0.022 | 0.196 ± 0.020 | 0.120 ± 0.009 | 0.680 ± 0.000
OFA | 0.061 ± 0.002 | 0.597 ± 0.003 | 0.060 ± 0.001 | 0.542 ± 0.001 | 0.224 ± 0.010 | 0.109 ± 0.007 | 0.122 ± 0.015 | 0.680 ± 0.000
MAVAE | 0.094 ± 0.006 | 0.561 ± 0.034 | 0.059 ± 0.006 | 0.487 ± 0.051 | 0.220 ± 0.076 | 0.199 ± 0.015 | 0.202 ± 0.017 | 0.680 ± 0.000
KNN | 0.085 ± 0.000 | 0.563 ± 0.000 | 0.059 ± 0.000 | 0.506 ± 0.000 | 0.336 ± 0.000 | 0.231 ± 0.000 | 0.080 ± 0.000 | 0.680 ± 0.000
CNN | 0.058 ± 0.002 | 0.568 ± 0.006 | 0.059 ± 0.002 | 0.524 ± 0.007 | 0.382 ± 0.033 | 0.186 ± 0.018 | 0.105 ± 0.004 | 0.681 ± 0.003
WVAE | 0.087 ± 0.013 | 0.541 ± 0.043 | 0.057 ± 0.007 | 0.467 ± 0.050 | 0.249 ± 0.103 | 0.226 ± 0.038 | 0.163 ± 0.011 | 0.680 ± 0.000
LOF | 0.055 ± 0.000 | 0.543 ± 0.000 | 0.056 ± 0.000 | 0.552 ± 0.000 | 0.561 ± 0.000 | 0.166 ± 0.000 | 0.124 ± 0.000 | 0.680 ± 0.000
TimesNet | 0.055 ± 0.001 | 0.579 ± 0.003 | 0.056 ± 0.000 | 0.531 ± 0.003 | 0.306 ± 0.020 | 0.102 ± 0.004 | 0.092 ± 0.002 | 0.680 ± 0.000
FITS | 0.050 ± 0.000 | 0.563 ± 0.000 | 0.053 ± 0.000 | 0.548 ± 0.001 | 0.451 ± 0.002 | 0.091 ± 0.001 | 0.071 ± 0.001 | 0.680 ± 0.000
GDN | 0.052 ± 0.006 | 0.547 ± 0.027 | 0.052 ± 0.004 | 0.478 ± 0.027 | 0.349 ± 0.074 | 0.115 ± 0.028 | 0.087 ± 0.005 | 0.681 ± 0.004
VASP | 0.050 ± 0.001 | 0.540 ± 0.014 | 0.051 ± 0.002 | 0.449 ± 0.016 | 0.190 ± 0.008 | 0.099 ± 0.004 | 0.119 ± 0.013 | 0.686 ± 0.008
AutoEncoder | 0.047 ± 0.003 | 0.541 ± 0.018 | 0.051 ± 0.003 | 0.495 ± 0.022 | 0.185 ± 0.026 | 0.107 ± 0.005 | 0.084 ± 0.016 | 0.680 ± 0.000
EIF | 0.049 ± 0.002 | 0.500 ± 0.006 | 0.047 ± 0.000 | 0.357 ± 0.010 | 0.224 ± 0.019 | 0.118 ± 0.014 | 0.192 ± 0.016 | 0.690 ± 0.007
AnomalyTransformer | 0.045 ± 0.003 | 0.491 ± 0.019 | 0.047 ± 0.002 | 0.454 ± 0.032 | 0.379 ± 0.031 | 0.118 ± 0.033 | 0.064 ± 0.006 | 0.591 ± 0.086
HBOS | 0.064 ± 0.000 | 0.479 ± 0.000 | 0.046 ± 0.000 | 0.349 ± 0.000 | 0.282 ± 0.000 | 0.171 ± 0.000 | 0.193 ± 0.000 | 0.701 ± 0.000
KMeansAD | 0.042 ± 0.000 | 0.495 ± 0.000 | 0.046 ± 0.000 | 0.343 ± 0.000 | 0.161 ± 0.009 | 0.087 ± 0.002 | 0.098 ± 0.007 | 0.680 ± 0.000
CBLOF | 0.041 ± 0.000 | 0.481 ± 0.000 | 0.044 ± 0.000 | 0.352 ± 0.000 | 0.155 ± 0.000 | 0.086 ± 0.000 | 0.073 ± 0.000 | 0.680 ± 0.000
IForest | 0.041 ± 0.000 | 0.472 ± 0.000 | 0.044 ± 0.000 | 0.328 ± 0.000 | 0.140 ± 0.000 | 0.086 ± 0.000 | 0.195 ± 0.000 | 0.682 ± 0.000
TranAD | 0.041 ± 0.000 | 0.470 ± 0.003 | 0.044 ± 0.000 | 0.417 ± 0.004 | 0.237 ± 0.003 | 0.086 ± 0.000 | 0.107 ± 0.007 | 0.680 ± 0.000
USAD | 0.040 ± 0.001 | 0.470 ± 0.010 | 0.044 ± 0.001 | 0.371 ± 0.011 | 0.122 ± 0.005 | 0.087 ± 0.001 | 0.152 ± 0.033 | 0.682 ± 0.001
PCA | 0.037 ± 0.000 | 0.447 ± 0.000 | 0.043 ± 0.000 | 0.377 ± 0.000 | 0.107 ± 0.000 | 0.092 ± 0.000 | 0.148 ± 0.000 | 0.684 ± 0.000
OmniAnomaly | 0.041 ± 0.000 | 0.459 ± 0.000 | 0.043 ± 0.000 | 0.338 ± 0.000 | 0.150 ± 0.000 | 0.086 ± 0.000 | 0.126 ± 0.000 | 0.680 ± 0.000
COPOD | 0.035 ± 0.000 | 0.433 ± 0.000 | 0.041 ± 0.000 | 0.368 ± 0.000 | 0.131 ± 0.000 | 0.090 ± 0.000 | 0.170 ± 0.000 | 0.710 ± 0.000
Donut | 0.036 ± 0.001 | 0.443 ± 0.020 | 0.041 ± 0.001 | 0.386 ± 0.027 | 0.091 ± 0.006 | 0.086 ± 0.000 | 0.098 ± 0.059 | 0.680 ± 0.000
AxonAD (ours) | 0.285 ± 0.014 | 0.702 ± 0.011 | 0.157 ± 0.012 | 0.634 ± 0.017 | 0.533 ± 0.016 | 0.420 ± 0.019 | 0.328 ± 0.014 | 0.715 ± 0.024

Table S3. Pairwise comparison against AxonAD on TSB-AD multivariate (17 datasets, 180 time series). For each baseline and metric we report: win-rate (wins/180), mean and median performance delta (∆), Wilcoxon p-value, and 95% CI for ∆.
Negative ∆ indicates the baseline is worse than AxonAD. Entries with a non-significant Wilcoxon test (p ≥ 0.05) are bold.

Model | AUC-PR (vs AxonAD): WR, ∆mean, ∆med, p, CI95 | AUC-ROC (vs AxonAD): WR, ∆mean, ∆med, p, CI95
AnomalyTransformer | 0.039, -0.3693, -0.3203, 1.24e-30, [-0.4124, -0.3255] | 0.033, -0.3185, -0.3520, 1.14e-29, [-0.3432, -0.2919]
AutoEncoder | 0.194, -0.1427, -0.1462, 1.93e-11, [-0.1969, -0.0882] | 0.172, -0.1554, -0.1776, 2.33e-14, [-0.1916, -0.1224]
CBLOF | 0.178, -0.1738, -0.1506, 2.19e-13, [-0.2287, -0.1185] | 0.139, -0.1610, -0.1601, 6.06e-16, [-0.1949, -0.1268]
CNN | 0.267, -0.0899, -0.0859, 1.36e-8, [-0.1388, -0.0393] | 0.228, -0.0544, -0.0555, 2.24e-8, [-0.0829, -0.0268]
COPOD | 0.194, -0.2324, -0.2315, 2.30e-14, [-0.2903, -0.1716] | 0.133, -0.1730, -0.2094, 2.19e-16, [-0.2064, -0.1398]
Donut | 0.117, -0.2237, -0.1705, 2.11e-22, [-0.2635, -0.1844] | 0.122, -0.1978, -0.1527, 6.15e-23, [-0.2329, -0.1657]
EIF | 0.189, -0.2505, -0.2159, 1.98e-17, [-0.2998, -0.2029] | 0.150, -0.1582, -0.1657, 1.57e-16, [-0.1888, -0.1277]
FITS | 0.061, -0.2398, -0.1988, 2.99e-28, [-0.2730, -0.2076] | 0.094, -0.2138, -0.1651, 5.35e-25, [-0.2496, -0.1779]
GDN | 0.156, -0.1647, -0.1067, 1.01e-20, [-0.1964, -0.1345] | 0.189, -0.0870, -0.0667, 4.46e-14, [-0.1108, -0.0643]
HBOS | 0.194, -0.2759, -0.2588, 2.24e-19, [-0.3224, -0.2293] | 0.128, -0.1916, -0.1984, 2.23e-20, [-0.2217, -0.1611]
IForest | 0.217, -0.2274, -0.1814, 5.74e-15, [-0.2780, -0.1776] | 0.200, -0.1207, -0.1122, 5.24e-15, [-0.1503, -0.0908]
KMeansAD | 0.239, -0.1853, -0.1482, 4.22e-12, [-0.2335, -0.1336] | 0.261, -0.1337, -0.1174, 3.58e-14, [-0.1646, -0.1023]
KNN | 0.128, -0.3043, -0.2596, 1.09e-25, [-0.3452, -0.2632] | 0.039, -0.3257, -0.3305, 1.04e-29, [-0.3554, -0.2940]
LOF | 0.133, -0.3406, -0.2886, 2.50e-25, [-0.3881, -0.2962] | 0.039, -0.2906, -0.3184, 7.00e-30, [-0.3126, -0.2674]
LSTMAD | 0.144, -0.1893, -0.1256, 9.95e-21, [-0.2249, -0.1550] | 0.139, -0.2274, -0.1489, 3.90e-22, [-0.2673, -0.1873]
M2N2 | 0.239, -0.1181, -0.0955, 3.82e-10, [-0.1695, -0.0663] | 0.217, -0.0851, -0.0866, 2.90e-10, [-0.1162, -0.0542]
MAVAE | 0.322, -0.1380, -0.0390, 2.07e-9, [-0.1802, -0.0936] | 0.317, -0.1273, -0.0279, 6.14e-10, [-0.1621, -0.0932]
OFA | 0.217, -0.1368, -0.0907, 1.09e-15, [-0.1715, -0.1029] | 0.206, -0.1855, -0.1181, 1.35e-16, [-0.2228, -0.1473]
OmniAnomaly | 0.328, -0.0649, -0.0190, 2.76e-5, [-0.0961, -0.0354] | 0.372, -0.0806, -0.0043, 1.94e-5, [-0.1097, -0.0518]
PCA | 0.161, -0.1949, -0.1406, 1.09e-15, [-0.2428, -0.1483] | 0.178, -0.1488, -0.0990, 2.65e-19, [-0.1793, -0.1211]
RobustPCA | 0.150, -0.1989, -0.2100, 1.05e-15, [-0.2548, -0.1407] | 0.106, -0.2354, -0.2480, 3.91e-19, [-0.2782, -0.1953]
SISVAE | 0.272, -0.1143, -0.0522, 1.73e-11, [-0.1511, -0.0781] | 0.322, -0.0659, -0.0157, 9.74e-7, [-0.0917, -0.0403]
TFTResidual | 0.133, -0.1868, -0.1252, 4.79e-21, [-0.2212, -0.1509] | 0.206, -0.1152, -0.0852, 1.54e-14, [-0.1429, -0.0886]
TimesNet | 0.117, -0.2355, -0.1916, 5.03e-27, [-0.2693, -0.2038] | 0.133, -0.2068, -0.1294, 1.99e-22, [-0.2433, -0.1702]
TranAD | 0.194, -0.1793, -0.1080, 1.24e-18, [-0.2153, -0.1433] | 0.161, -0.1498, -0.1024, 5.06e-19, [-0.1791, -0.1197]
USAD | 0.294, -0.0744, -0.0275, 2.14e-7, [-0.1044, -0.0451] | 0.294, -0.0867, -0.0084, 2.33e-8, [-0.1159, -0.0580]
VASP | 0.289, -0.0978, -0.0449, 1.83e-10, [-0.1286, -0.0676] | 0.333, -0.0626, -0.0158, 1.13e-6, [-0.0857, -0.0400]
VSVAE | 0.333, -0.1465, -0.0592, 8.43e-9, [-0.1927, -0.1017] | 0.378, -0.1159, -0.0281, 1.17e-6, [-0.1525, -0.0778]
WVAE | 0.322, -0.0829, -0.0381, 5.57e-7, [-0.1182, -0.0451] | 0.372, -0.0775, -0.0103, 8.81e-5, [-0.1086, -0.0479]

Table S4. Continuation of Table S3: VUS-PR, R-based-F1, and Affiliation-F1 vs AxonAD. VUS-ROC, PA-F1, Event-F1, and Range-F1 are not shown.
Model | VUS-PR (vs AxonAD): WR, ∆mean, ∆med, p, CI95 | R-based-F1 (vs AxonAD): WR, ∆mean, ∆med, p, CI95 | Affiliation-F1 (vs AxonAD): WR, ∆mean, ∆med, p, CI95
AnomalyTransformer | 0.039, -0.3784, -0.2998, 1.65e-30, [-0.4217, -0.3367] | 0.067, -0.3325, -0.3168, 2.57e-29, [-0.3678, -0.2964] | 0.200, -0.1228, -0.0718, 3.64e-18, [-0.1478, -0.0993]
AutoEncoder | 0.172, -0.1987, -0.1348, 1.88e-16, [-0.2422, -0.1567] | 0.189, -0.1882, -0.2248, 2.51e-13, [-0.2396, -0.1352] | 0.333, -0.0596, -0.0207, 6.96e-7, [-0.0831, -0.0354]
CBLOF | 0.161, -0.2334, -0.1601, 1.49e-19, [-0.2776, -0.1924] | 0.200, -0.1692, -0.2151, 1.54e-12, [-0.2177, -0.1185] | 0.361, -0.0487, -0.0081, 8.71e-5, [-0.0752, -0.0225]
CNN | 0.278, -0.1408, -0.0769, 9.59e-12, [-0.1805, -0.1039] | 0.217, -0.1694, -0.1844, 3.38e-13, [-0.2190, -0.1180] | 0.483, 0.0062, -0.0002, 0.676, [-0.0127, 0.0264]
COPOD | 0.150, -0.2900, -0.2262, 3.03e-22, [-0.3360, -0.2450] | 0.183, -0.2287, -0.2535, 6.43e-15, [-0.2815, -0.1741] | 0.333, -0.0609, -0.0538, 2.21e-7, [-0.0853, -0.0363]
Donut | 0.100, -0.2312, -0.1677, 2.85e-23, [-0.2700, -0.1925] | 0.100, -0.2913, -0.2873, 7.64e-27, [-0.3273, -0.2543] | 0.261, -0.0913, -0.0147, 1.59e-13, [-0.1181, -0.0681]
EIF | 0.161, -0.2836, -0.1920, 2.10e-23, [-0.3275, -0.2431] | 0.189, -0.2125, -0.2535, 3.91e-14, [-0.2640, -0.1614] | 0.367, -0.0477, -0.0163, 1.59e-5, [-0.0704, -0.0255]
FITS | 0.100, -0.2265, -0.1607, 1.92e-26, [-0.2615, -0.1927] | 0.083, -0.2898, -0.2566, 9.49e-27, [-0.3260, -0.2536] | 0.306, -0.0445, -0.0226, 1.56e-7, [-0.0606, -0.0286]
GDN | 0.161, -0.1611, -0.0866, 1.87e-19, [-0.1936, -0.1294] | 0.111, -0.2625, -0.2455, 1.79e-25, [-0.2969, -0.2273] | 0.372, -0.0145, -0.0024, 0.0621, [-0.0308, 0.0024]
HBOS | 0.128, -0.3030, -0.2395, 9.54e-25, [-0.3484, -0.2584] | 0.189, -0.2271, -0.2438, 7.61e-15, [-0.2803, -0.1770] | 0.333, -0.0637, -0.0600, 3.80e-8, [-0.0855, -0.0419]
IForest | 0.189, -0.2399, -0.2016, 8.52e-17, [-0.2862, -0.1896] | 0.217, -0.2275, -0.2268, 9.05e-18, [-0.2689, -0.1849] | 0.339, -0.0590, -0.0399, 4.68e-9, [-0.0767, -0.0401]
KMeansAD | 0.239, -0.1970, -0.1592, 2.18e-14, [-0.2403, -0.1514] | 0.289, -0.1449, -0.1326, 1.06e-8, [-0.1866, -0.0998] | 0.422, -0.0407, -0.0138, 1.06e-4, [-0.0599, -0.0196]
KNN | 0.061, -0.3168, -0.2638, 1.46e-28, [-0.3565, -0.2764] | 0.167, -0.2660, -0.2216, 6.33e-23, [-0.3040, -0.2264] | 0.300, -0.0695, -0.0435, 2.44e-9, [-0.0909, -0.0484]
LOF | 0.067, -0.3548, -0.2799, 2.79e-29, [-0.3986, -0.3115] | 0.133, -0.3220, -0.2865, 1.54e-25, [-0.3641, -0.2781] | 0.279, -0.0953, -0.0677, 9.44e-12, [-0.1179, -0.0716]
LSTMAD | 0.144, -0.2482, -0.1504, 8.26e-22, [-0.2914, -0.2049] | 0.094, -0.2726, -0.2264, 1.95e-27, [-0.3066, -0.2361] | 0.344, -0.1594, -0.0079, 6.26e-8, [-0.2043, -0.1159]
M2N2 | 0.228, -0.1702, -0.0953, 5.15e-14, [-0.2105, -0.1322] | 0.206, -0.1889, -0.2161, 3.35e-14, [-0.2381, -0.1375] | 0.439, -0.0004, -0.0011, 0.376, [-0.0183, 0.0175]
MAVAE | 0.300, -0.1425, -0.0495, 2.12e-10, [-0.1839, -0.0997] | 0.228, -0.1455, -0.0838, 1.20e-15, [-0.1805, -0.1105] | 0.322, -0.0484, -0.0026, 4.08e-8, [-0.0648, -0.0310]
OFA | 0.222, -0.1260, -0.0797, 3.25e-14, [-0.1600, -0.0930] | 0.283, -0.1111, -0.0820, 4.37e-11, [-0.1419, -0.0796] | 0.367, -0.0275, -0.0035, 7.76e-4, [-0.0429, -0.0124]
OmniAnomaly | 0.328, -0.0697, -0.0172, 6.67e-6, [-0.0998, -0.0396] | 0.361, -0.0393, -0.0153, 3.44e-3, [-0.0691, -0.0107] | 0.422, -0.0186, -0.0002, 0.0852, [-0.0358, -0.0010]
PCA | 0.161, -0.2167, -0.1472, 2.20e-18, [-0.2611, -0.1722] | 0.267, -0.1456, -0.0754, 3.45e-11, [-0.1832, -0.1076] | 0.317, -0.0712, -0.0198, 1.59e-10, [-0.0891, -0.0532]
RobustPCA | 0.128, -0.2550, -0.2011, 4.81e-22, [-0.3000, -0.2135] | 0.267, -0.1393, -0.1996, 1.13e-8, [-0.1946, -0.0826] | 0.294, -0.0709, -0.0578, 1.68e-8, [-0.0964, -0.0461]
SISVAE | 0.233, -0.1215, -0.0660, 4.74e-13, [-0.1569, -0.0870] | 0.289, -0.1016, -0.0493, 1.16e-9, [-0.1368, -0.0688] | 0.383, -0.0360, -0.0005, 1.56e-4, [-0.0529, -0.0188]
TFTResidual | 0.150, -0.1857, -0.1180, 5.43e-20, [-0.2228, -0.1480] | 0.139, -0.2636, -0.2759, 3.74e-25, [-0.2979, -0.2285] | 0.400, -0.0136, -0.0024, 0.0563, [-0.0307, 0.0041]
TimesNet | 0.100, -0.2222, -0.1716, 3.55e-26, [-0.2572, -0.1889] | 0.089, -0.2952, -0.2675, 1.66e-27, [-0.3304, -0.2586] | 0.306, -0.0390, -0.0161, 8.00e-7, [-0.0547, -0.0224]
TranAD | 0.233, -0.1848, -0.0947, 1.27e-16, [-0.2241, -0.1479] | 0.128, -0.2533, -0.2310, 5.43e-26, [-0.2868, -0.2187] | 0.383, -0.0337, -0.0035, 6.61e-4, [-0.0499, -0.0167]
USAD | 0.289, -0.0812, -0.0253, 1.98e-8, [-0.1111, -0.0518] | 0.350, -0.0490, -0.0214, 1.16e-3, [-0.0774, -0.0200] | 0.411, -0.0229, -0.0012, 2.56e-2, [-0.0397, -0.0050]
VASP | 0.256, -0.0927, -0.0427, 4.00e-10, [-0.1235, -0.0622] | 0.350, -0.0705, -0.0641, 6.53e-6, [-0.0999, -0.0422] | 0.444, -0.0115, -0.0012, 0.139, [-0.0275, 0.0058]
VSVAE | 0.339, -0.1512, -0.0672, 4.18e-9, [-0.1971, -0.1069] | 0.339, -0.0967, -0.0449, 1.17e-6, [-0.1337, -0.0611] | 0.472, -0.0189, -0.0000, 0.120, [-0.0348, -0.0029]
WVAE | 0.300, -0.0802, -0.0363, 8.24e-7, [-0.1156, -0.0426] | 0.250, -0.1062, -0.0697, 6.29e-11, [-0.1386, -0.0721] | 0.422, -0.0216, -0.0004, 4.16e-3, [-0.0386, -0.0039]
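The pairwise protocol in Tables S3 and S4 reports, for each baseline, a win-rate over the 180 series, the mean and median per-series delta against AxonAD, a Wilcoxon signed-rank p-value, and a 95% CI for the delta. The following sketch reproduces one such comparison under stated assumptions: per-series metric arrays are available for both detectors, SciPy provides the signed-rank test, and the bootstrap CI is an illustrative choice rather than necessarily the paper's exact interval procedure.

```python
import numpy as np
from scipy.stats import wilcoxon

def pairwise_vs_reference(baseline, reference, n_boot=10000, seed=0):
    """Per-series paired comparison of a baseline against a reference detector.

    baseline, reference: arrays of per-series metric values (same length).
    Returns win-rate, mean/median delta (baseline - reference), a two-sided
    Wilcoxon signed-rank p-value, and a bootstrap 95% CI for the mean delta.
    """
    delta = np.asarray(baseline) - np.asarray(reference)
    win_rate = float(np.mean(delta > 0))
    p = wilcoxon(delta).pvalue  # signed-rank test on the paired differences
    rng = np.random.default_rng(seed)
    boots = [np.mean(rng.choice(delta, size=delta.size, replace=True))
             for _ in range(n_boot)]
    ci = (float(np.percentile(boots, 2.5)), float(np.percentile(boots, 97.5)))
    return {"WR": win_rate, "d_mean": float(delta.mean()),
            "d_med": float(np.median(delta)), "p": float(p), "CI95": ci}
```

With this convention, a negative delta means the baseline trails the reference, matching the sign convention in Table S3.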