Unsupervised Symbolic Anomaly Detection
Authors: Md Maruf Hossain, Tim Katzke, Simon Klüttermann, Emmanuel Müller
Unsupervised Symbolic Anomaly Detection

Md Maruf Hossain¹ [0000-0002-3256-7701]⋆, Tim Katzke¹,² [0009-0000-0154-7735]⋆, Simon Klüttermann¹ [0000-0001-9698-4339]⋆, and Emmanuel Müller¹,² [0000-0002-5409-6875]

¹ TU Dortmund University, Dortmund, Germany
² Research Center Trustworthy Data Science and Security, UA Ruhr, Germany
mdmaruf.hossain@tu-dortmund.de, tim.katzke@tu-dortmund.de, Simon.Kluettermann@cs.tu-dortmund.de

Abstract. We propose SYRAN, an unsupervised anomaly detection method based on symbolic regression. Instead of encoding normal patterns in an opaque, high-dimensional model, our method learns an ensemble of human-readable equations that describe symbolic invariants: functions that are approximately constant on normal data. Deviations from these invariants yield anomaly scores, so that the detection logic is interpretable by construction, rather than via post-hoc explanation. Experimental results demonstrate that SYRAN is highly interpretable, providing equations that correspond to known scientific or medical relationships, and maintains strong anomaly detection performance comparable to that of state-of-the-art methods.

Keywords: Anomaly Detection · Outlier Detection · Symbolic Regression · Relation Mining

1 Introduction

Anomaly detection (AD) is an important problem in machine learning, with countless applications ranging from fraud and fault detection [8,14,33] to medicine [4] and scientific research [10,20]. In this work, we focus on the common one-class setting, where a model is trained only on (predominantly) normal samples and later assigns anomaly scores to unseen data. Numerous methods have been proposed for this setting, spanning classical statistical and distance-based approaches to modern deep neural network methods [22,25], but all ultimately aim to capture regularities in normal data and flag deviations from these patterns as anomalies.
⋆ Equal contribution.

However, despite this progress, most modern anomaly detection algorithms do not allow accessing these identified patterns. Instead, these are usually hidden away in a non-transparent model with thousands to millions of parameters. We believe this to be a significant limitation, as anomaly detection is often applied in high-risk scenarios (e.g., healthcare [4] or predictive maintenance [14]), where ensuring that an anomaly detection method works as intended is critical. Furthermore, the black-box behavior of anomaly detection models has led to fields like fairness being understudied [3], which is especially critical as anomaly detection is increasingly used, for example, to preselect candidates for hiring [17]. Finally, it should not be underestimated how such extracted patterns can be used to further understand existing behaviors. Many scientific discoveries have started out either as patterns in data [29] or as anomalies relative to those patterns [7].

Motivated by these limitations, we propose SYRAN, a method that models normality through symbolic invariants. Formally, SYRAN learns a collection of scalar functions that are approximately constant on normal data and uses their deviations as anomaly scores. Each function is represented as a symbolic expression relating input features, yielding anomaly scores that are interpretable by construction. In contrast to post-hoc explanation techniques [12], SYRAN directly produces closed-form equations that can be inspected, validated, and, if desired, incorporated into downstream analyses or decision rules.

Our approach is inspired by Emmy Noether's perspective of describing physical systems through conservation laws [21], and is conceptually related to recent work on surrogate anomaly detection, most notably DEAN [13], which learns deep neural networks that are approximately constant on normal data.
SYRAN adopts a similar invariance viewpoint but instantiates the surrogate as symbolic expressions obtained via symbolic regression [23] rather than as neural networks. This shift in representation introduces specific challenges which we tackle in this paper: a tendency toward trivial constant solutions, a combinatorial search over expression structures, and the need to balance data fit against expression complexity for interpretability.

We evaluate our method in terms of both explainability and anomaly detection performance, demonstrating unparalleled explainability with only a minor performance tradeoff on common anomaly detection datasets. Additionally, since our models are mathematical equations, they can easily be used in almost every programming language and on almost every device. To facilitate reproducibility, our code is available at github.com/KDD-OpenSource/SYRAN.

2 Related Works

2.1 Symbolic Regression

Symbolic Regression (SR) seeks to infer mathematical expressions from data, discovering both model structure and parameters simultaneously [24]. Its capacity to yield interpretable and human-readable models has driven extensive research over the past decades [18]. Notable systems, such as Eureqa [28], have demonstrated SR's ability to rediscover physical laws from experimental data. However, most existing methods operate in a supervised manner: given an input, they search for a function to approximate the output. Instead, we focus on the unsupervised case (and particularly anomaly detection) here: given an input, we search for relationships that are fulfilled in the data. This is more complicated than the normal supervised case, since we have no direct goal to optimize, there might be multiple equations, and we are susceptible to trivial solutions (consider the relationship $\sin^2(x) + \cos^2(x) = 1$).
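To make the trivial-solution problem concrete, the following sketch (our own illustration, not from the paper's code) checks the identity $\sin^2(x) + \cos^2(x) = 1$ on two synthetic samples: because it holds for any input, such an "invariant" is satisfied equally by normal and anomalous data and carries no information about the data distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=1000)        # stand-in for normal data
anomalous = rng.uniform(-10.0, 10.0, size=1000)  # stand-in for anomalies

# A trivial invariant: constant (up to rounding) for every possible input.
f = lambda x: np.sin(x) ** 2 + np.cos(x) ** 2
print(np.abs(f(normal) - 1).max())      # essentially zero on normal data
print(np.abs(f(anomalous) - 1).max())   # ...and just as zero on everything else
```

Since both deviations vanish, this function can never separate the two samples, which is exactly what the noise-contrast term introduced in Section 3.1 is designed to rule out.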
2.2 Unsupervised Anomaly Detection

Anomaly detection is the task of detecting unusual, unexpected, or exceptional samples that deviate from a normal pattern [6,25]. Thus, the main challenge of anomaly detection is to understand this normal pattern. Depending on the algorithm used, this normal pattern is encoded differently. For example, an autoencoder [27] utilizes neural networks, whereas IForest [16] employs tree ensembles. However, one fundamental limitation is that the normal pattern is encoded in a high number of complex-to-understand parameters. This, for example, hides situations where the wrong type of normal pattern is learned [12] and limits what can be understood from a trained model. And while explainability methods exist [12], these only provide limited explanations after the fact. Instead, we suggest SYRAN, an anomaly detection method that is inherently explainable, as it is built from symbolic equations derived from the data.

2.3 Symbolic Regression for Anomaly Detection

Applying symbolic regression to anomaly detection is a growing and promising research direction. For example, [19] uses symbolic regression to detect anomalous orbital trajectories. Similarly, [30] uses a symbolic approach to detect fraudulent financial transactions, and [26] finds machine failures using symbolic anomaly detection. However, these approaches are far from being general anomaly detection methods, as each of them relies heavily on the attributes of their specific application area. Furthermore, none of these methods employs unsupervised symbolic regression. Instead, they either require access to rare, costly labeled anomalies or simply predict a key metric. To the best of our knowledge, our method, SYRAN, is the first general symbolic anomaly detection method that does not require labeled data and can be applied to any application.
3 The SYRAN Model

Let the training data be denoted as $X_{\text{train}} = \{x^{(n)}\}_{n=1}^{N} \subset \mathbb{R}^d$, which we assume to consist (almost) exclusively of normal samples. At test time, the goal is to assign an anomaly score $\text{score}(x) \in \mathbb{R}$ to any $x \in \mathbb{R}^d$, such that higher scores indicate a higher degree of abnormality.

Instead of learning a black-box representation of the normal data distribution, SYRAN (SYmbolic Regression for unsupervised ANomaly detection) models normality through symbolic invariants. We call a function $f : \mathbb{R}^d \to \mathbb{R}$ an invariant if it is (approximately) constant on normal data. Without loss of generality, we fix this constant to 1 and aim for

    $f(x) \approx 1 \quad \text{for all } x \in X_{\text{train}}.$    (1)

Following the idea of learning conserved quantities as in DEAN [13], we search for such functions but instead instantiate them as human-readable symbolic expressions.

3.1 Learning a Single Symbolic Invariant

Invariance Loss on Normal Data. For a candidate invariant $f : \mathbb{R}^d \to \mathbb{R}$, we measure how well it satisfies (1) on the training data by the average absolute deviation from 1:

    $L_1(f) = \frac{1}{N} \sum_{n=1}^{N} \left| f(x^{(n)}) - 1 \right|.$    (2)

Minimizing the invariant-learning objective $L_1$ encourages $f$ to be approximately constant on $X_{\text{train}}$. However, in the symbolic setting in particular, this objective alone easily admits trivial solutions that are useless for anomaly detection.

Avoiding Trivial Solutions. The loss $L_1$ is minimized not only by meaningful invariants of the data manifold but also by functions that are globally constant, or excessively convoluted expressions that evaluate to approximately 1 everywhere. Such functions cannot distinguish between normal samples and anomalies. To discourage these trivial solutions, we introduce a noise contrast term.
We generate an auxiliary set $X_{\text{rnd}} = \{\tilde{x}^{(\ell)}\}_{\ell=1}^{N_{\text{rnd}}}$ by sampling each feature independently from a uniform distribution over its empirical range:

    $\tilde{x}^{(\ell)}_j \sim \mathcal{U}\!\left(\min_n x^{(n)}_j,\ \max_n x^{(n)}_j\right), \quad j = 1, \dots, d.$

On this random background, we define

    $L_{\text{noise}}(f) = \frac{1}{N_{\text{rnd}}} \sum_{\ell=1}^{N_{\text{rnd}}} \left| f(\tilde{x}^{(\ell)}) - 1 \right|.$    (3)

Constant functions that minimize $L_1$ also yield $L_{\text{noise}} \approx 0$. In contrast, a useful invariant should be specific to the normal data manifold and therefore typically violated on random noise. We encode this preference via a hinge-type penalty that encourages $L_{\text{noise}}$ to be at least a margin $\Delta > 0$, to prevent near-constant solutions while also avoiding that the noise term dominates the optimization.

Complexity Regularization. To obtain interpretable symbolic invariants, we additionally penalize the complexity of $f$. Let $c(f)$ denote a non-negative complexity measure of the expression (e.g., proportional to the number and expressivity of nodes in its expression tree). We define a saturating complexity penalty

    $L_c(f) = \log\!\left(1 + \log(1 + c(f))\right),$    (4)

where a hyperparameter $\gamma > 0$ controls the strength of this regularization. The double logarithm ensures diminishing returns on the complexity penalization, s.t. it does not overwhelm the data fit for moderately complex expressions.

Combined Objective. Combining these terms, the loss for a single candidate invariant $f$ is

    $L(f) = L_1(f) + \max\!\left(0,\ \Delta - L_{\text{noise}}(f)\right) + \gamma\, L_c(f).$    (5)

The first term enforces approximate constancy on normal data, the second prevents trivial global constants by contrasting them with random noise, and the third promotes compact and interpretable symbolic expressions.
3.2 Symbolic Parameterization and Optimization

We parameterize each invariant $f$ as a symbolic expression constructed from a fixed set of operators and functions, such as addition, subtraction, multiplication, division, and common nonlinearities (e.g., $\sin$, $\cos$, $\exp$). Thus, $f$ can be represented as a rooted expression tree whose leaves are input features and constants, and whose internal nodes are operators.

To minimize $L(f)$ in this non-convex, discrete search space, we employ a simple tree-based evolutionary symbolic regression algorithm [11]. The optimizer maintains a population of candidate expressions, applies mutation and crossover operators to generate new candidates, and selects those with lower loss, while aiming to maintain high diversity throughout the optimization process. In principle, any symbolic regression engine capable of evaluating a custom fitness function could be used.

3.3 Ensembling of Symbolic Invariants

A single invariant $f$ typically captures only one simple relation in the data. Real-world datasets may exhibit multiple approximate invariants, and symbolic regression is inherently stochastic. To increase robustness and expressiveness, SYRAN therefore learns an ensemble of $M$ invariants $\{f_i\}_{i=1}^{M}$ and aggregates their deviations.

Feature Bagging. To induce diversity in the ensemble, we adopt feature bagging [15]. For each ensemble member $i \in \{1, \dots, M\}$, we sample a subset of $K$ features

    $S_i \subseteq \{1, \dots, d\}, \quad |S_i| = K,$

without replacement, and restrict $f_i$ to depend only on the corresponding coordinates $x_{S_i}$. We then train $f_i$ by minimizing $L(f_i)$ as in (5), using the same training data but with inputs projected onto $S_i$. This yields a collection of simple, diverse invariants, each focusing on a different subspace of the feature space.

Calibration and Aggregation. Each invariant $f_i$ induces a raw deviation score

    $d_i(x) = \left| f_i(x_{S_i}) - 1 \right|.$
(6)

In practice, the scales of the deviations $d_i(x)$ can vary substantially across ensemble members: some invariants remain very close to 1 on normal data, while others exhibit larger and possibly heavy-tailed deviations, so a naive average of $d_i$ would be dominated by a few high-variance components. To obtain a robust aggregate, we normalize each component's deviation by its mean deviation on the training data, denoted $\bar{d}_i$, and pass the result through the logistic sigmoid function $\sigma$, yielding calibrated per-component scores $s_i(x)$. Finally, SYRAN aggregates these calibrated scores by a simple average,

    $\text{score}(x) = \frac{1}{M} \sum_{i=1}^{M} s_i(x) = \frac{1}{M} \sum_{i=1}^{M} \sigma\!\left(\frac{d_i(x)}{\bar{d}_i}\right),$    (7)

where larger values indicate stronger violations of the learned symbolic invariants and therefore a higher likelihood of being anomalous.

3.4 Algorithm Overview

The complete training and inference pipeline is summarized in Algorithm 1. During training, SYRAN samples feature subsets and learns invariants via symbolic regression. The evolutionary process iteratively refines the symbolic expressions using mutation and crossover operations over $G$ generations. At test time, each sample is evaluated by all invariants and the resulting calibrated per-component scores are averaged. Notably, due to the independence of the ensemble member functions, training and inference can be largely parallelized.

Algorithm 1 SYRAN
Input: Training data $X_{\text{train}} = \{x^{(n)}\}_{n=1}^{N}$, test data $X_{\text{test}}$, ensemble size $M$, feature bagging size $K$, noise margin $\Delta$, complexity weight $\gamma$, generations $G$
Output: Anomaly scores $\text{score}(X_{\text{test}})$
1: for $i = 1$ to $M$ do
2:   Sample feature subset $S_i \subseteq \{1, \dots
, d\}$ with $|S_i| = K$
3:   Draw random noise $X_{\text{rnd}}$ over $S_i$
4:   Initialize random symbolic expression $f_i^{(0)}$
5:   for $g = 1$ to $G$ do
6:     Evaluate $L(f_i^{(g-1)})$, restricting inputs to $x_{S_i}$
7:     Set $f_i^{(g)}$ by updating $f_i^{(g-1)}$ with mutation and crossover
8:   end for
9:   Compute mean deviation $\bar{d}_i = \frac{1}{N} \sum_{n=1}^{N} |f_i(x^{(n)}_{S_i}) - 1|$
10:  Compute test deviations $d_i(X_{\text{test}}) = |f_i(X_{\text{test}}) - 1|$
11: end for
12: $\text{score}(X_{\text{test}}) = \frac{1}{M} \sum_{i=1}^{M} \sigma\!\left(d_i(X_{\text{test}}) / \bar{d}_i\right)$

4 Rediscovering Kepler's Third Law

To illustrate SYRAN's ability to uncover meaningful symbolic invariants, we first consider a small, physics-inspired example. Kepler's third law states that for bodies orbiting the same central mass, the square of the orbital period $T$ is proportional to the cube of the semi-major axis $a$ of the orbit. After rescaling units, this relation can be written in the invariant form

    $\frac{T^2}{a^3} \approx 1,$    (8)

i.e., the quantity $T^2/a^3$ is conserved across different orbits.

We construct a two-dimensional training dataset from published orbital parameters of 13 bodies orbiting the Sun (planets and dwarf planets), each described by its orbital period $T$ and semi-major axis $a$. All 13 samples are treated as normal data and constitute $X_{\text{train}}$ for this experiment. We apply SYRAN exactly as described in Section 3 with input dimension $d = 2$ and feature subset size $K = 2$, so that each invariant may depend on both $T$ and $a$. Each equation is the result of the evolutionary algorithm evaluating $G = 30000$ different options. A modest complexity weight of $\gamma = 0.1$ and a noise margin of $\Delta = 1$ appear to work well in this example and are used throughout the rest of the paper.

As a measure of success, we inspect the ensemble of learned invariants $\{f_i\}$ and count how many are algebraically equivalent to Kepler's law, up to simple rearrangements of $T^2$ and $a^3$.
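The conservation of $T^2/a^3$ is easy to verify numerically. The sketch below uses standard textbook values (periods in years, semi-major axes in AU) for a few planets; these are not necessarily the paper's exact 13-body dataset, but they show that the Kepler invariant stays within a fraction of a percent of 1 across orbits.

```python
# Orbital period T (years) and semi-major axis a (AU) for a few planets.
# Standard textbook values, used here only to check the invariant form (8).
bodies = {
    "Mercury": (0.241, 0.387),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.881, 1.524),
    "Jupiter": (11.862, 5.203),
    "Neptune": (164.8, 30.07),
}
for name, (T, a) in bodies.items():
    print(f"{name:8s} T^2/a^3 = {T**2 / a**3:.4f}")  # ~1.00 for every body
```

Any anomalous orbit (e.g., a body not bound to the Sun, or a data-entry error) would break this ratio and thus receive a large deviation score under Eq. (6).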
In a typical run, roughly 30% of the learned expressions fall into this class. Examples of such invariants include

    $E_1(T, a) = (a/T)\,(a^2/T),\qquad E_2(T, a) = (a/T)^2\, a,\qquad E_3(T, a) = a^3/T^2.$

This demonstrates that given only a small set of "normal" observations and without any prior knowledge of the underlying law, SYRAN can automatically rediscover a compact symbolic invariant that matches a well-known physical relation. While one of the most impactful physicists of history required about 10 years to find this equation, SYRAN can do the same in about one minute of computation. It thereby provides an interpretable sanity check for our methodology and illustrates its potential for data-driven discovery of conserved quantities.

5 Experiments

We evaluate SYRAN on 19 publicly available datasets from the ADBench benchmark suite [6], covering diverse real-world domains including healthcare, biochemical, and scientific data. Restricting attention to those ADBench datasets with at most 30 features yielded 21 candidates. Among these, the fault and wdbc datasets are excluded because their optimization did not reliably complete within a 2-hour time budget, leaving 19 datasets: pima, breastw, cardio, stamps, cardiotocography, lymphography, pageblocks, glass, waveform, annthyroid, yeast, pendigits, wilt, hepatitis, wine, thyroid, wbc, vowels, vertebral.

Unless stated otherwise, we use a single set of hyperparameters for all datasets, chosen based on the experiment in Section 4: complexity weight $\gamma = 0.1$, noise margin $\Delta = 1$, and feature subset size $K = 2$. For each dataset we train an ensemble of $M = 50$ invariants. The effect of varying these hyperparameters is analyzed in Section 5.3.
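Structurally, the ensemble stage of Algorithm 1 (feature bagging, per-member calibration by $\bar{d}_i$, and the sigmoid aggregation of Eq. (7)) can be sketched as below. This is our own minimal illustration: `fit_invariant` is a hypothetical stand-in for the evolutionary symbolic-regression step, and the small epsilon added to $\bar{d}_i$ (to avoid division by zero for perfect invariants) is an implementation convenience, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_ensemble(X_train, M=50, K=2, rng=None, fit_invariant=None):
    """Train M invariants on random K-feature subsets (feature bagging) and
    record each member's mean training deviation d_bar for calibration."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = X_train.shape[1]
    members = []
    for _ in range(M):
        S = rng.choice(d, size=K, replace=False)      # feature subset S_i
        f = fit_invariant(X_train[:, S])              # symbolic-regression stand-in
        d_bar = np.mean(np.abs(f(X_train[:, S]) - 1.0)) + 1e-12
        members.append((S, f, d_bar))
    return members

def score(members, X):
    """Eq. (7): average sigmoid-calibrated deviation over ensemble members."""
    s = [sigmoid(np.abs(f(X[:, S]) - 1.0) / d_bar) for S, f, d_bar in members]
    return np.mean(s, axis=0)
```

Because each member depends only on its own feature subset, the loop over members parallelizes trivially, as noted in Section 3.4.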
5.1 Experimental Results

Fig. 1: Box plots and critical difference diagrams comparing the AUC-ROC performance of the SYRAN ensemble (a,b) and its best-performing ensemble member (a,c) against baseline anomaly detection methods. (a) AUC-ROC scores for each algorithm across all datasets considered, sorted by mean (SYRAN (max), LOF, DEAN, KNN, IForest, CBLOF, DeepSVDD, PCA, HBOS, OCSVM, COPOD, ECOD, SYRAN (mean), LODA, DAGMM, SOD); for each box, the line indicates the median and the square the mean performance. (b) CD diagram for mean AUC-ROC (average ranks: IForest 5.2, KNN 5.5, LOF 5.7, DEAN 5.7, CBLOF 6.2, PCA 6.7, DeepSVDD 7.8, OCSVM 7.9, HBOS 8.2, ECOD 8.4, COPOD 9, LODA 9.3, SYRAN 9.3, DAGMM 11, SOD 14). (c) CD diagram for max AUC-ROC (average ranks: SYRAN 2, IForest 5.8, KNN 6.1, DEAN 6.3, LOF 6.4, CBLOF 6.8, PCA 7.3, DeepSVDD 8.4, OCSVM 8.5, HBOS 8.8, ECOD 8.9, COPOD 9.4, LODA 9.8, DAGMM 12, SOD 14).

Figure 1 presents the performance of our method in relation to the results of 14 competitor algorithms, as reported in [13], spanning from classical shallow methods to modern deep learning ensembles. The boxplot in Figure 1a shows that the overall ensemble, SYRAN (mean), achieves competitive performance, while the best individual invariant in each ensemble, SYRAN (max), obtained by selecting the ensemble member with the highest AUC-ROC per dataset, attains the highest mean performance across datasets.

To assess statistical significance, we follow the standard Friedman–Wilcoxon protocol. We first apply a Friedman test [5] to detect overall differences between methods and then perform pairwise Wilcoxon tests [31] with Bonferroni–Holm correction [9] at $\alpha = 0.05$.
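For readers unfamiliar with this protocol, the following sketch runs it with SciPy. The AUC matrix here is random placeholder data, not the paper's results; only the procedure (Friedman test, then pairwise Wilcoxon signed-rank tests with Holm's step-down correction) mirrors the evaluation described above.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical AUC-ROC matrix: rows = 19 datasets, columns = 4 methods.
# Random values for illustration only.
rng = np.random.default_rng(0)
aucs = rng.uniform(0.6, 1.0, size=(19, 4))
names = ["SYRAN", "LOF", "DEAN", "IForest"]

# 1) Friedman test for any overall difference between the methods.
stat, p = friedmanchisquare(*aucs.T)
print(f"Friedman statistic = {stat:.2f}, p = {p:.3f}")

# 2) Pairwise Wilcoxon signed-rank tests of the first method against each
#    baseline, with Holm's step-down correction at alpha = 0.05.
pvals = [wilcoxon(aucs[:, 0], aucs[:, j]).pvalue for j in range(1, aucs.shape[1])]
order = np.argsort(pvals)
alpha, m = 0.05, len(pvals)
reject = [False] * m
for k, idx in enumerate(order):
    if pvals[idx] <= alpha / (m - k):   # Holm threshold for the k-th smallest p
        reject[idx] = True
    else:
        break                           # step-down: keep this and all larger p-values
for j, name in enumerate(names[1:]):
    print(f"SYRAN vs {name}: p = {pvals[j]:.3f}, reject = {reject[j]}")
```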
The resulting critical difference (CD) diagrams for the ensemble and the best-performing ensemble member are shown in Figures 1b and 1c, respectively. For the ensemble, the average rank is slightly worse than that of some baselines (e.g., LOF and DEAN), but the CD diagram indicates that these differences are not statistically significant. With its best ensemble members alone, however, SYRAN achieves the best average rank; no method significantly outperforms it, and several weaker baselines are significantly worse.

Fig. 2: Symbolic invariants learned by SYRAN on three ADBench datasets: (a) breastw, $f(X_{BCW}) = x_{cs} + |1 - x_{cs}|$; (b) vertebral, $f(X_{VC}) = \left(x_{dos}/(3\,x_{lla})\right)^2 + 0.8449$; (c) wine, $f(X_{WQ}) = 1.0759/(x_{ac} - 11.1282)$. Significant deviations of function values from 1 are indicative of anomalous behavior.

This gap highlights the potential of leveraging interpretability. Each ensemble consists of both strongly and weakly indicative invariants, but as we will show in Section 5.2, a domain expert can easily inspect and select the most plausible invariant for a given application, yielding a substantial performance gain over blind averaging. Among all methods considered, SYRAN is the only one that naturally supports such expert-in-the-loop selection based on closed-form equations. Moreover, the fact that SYRAN can outperform more complex baselines on several datasets using a single invariant over at most two features suggests that some ADBench tasks may be unsuitable for evaluating methods designed for highly complex data.
5.2 Interpreting Learned Equations

Beyond aggregate performance metrics, SYRAN provides explicit symbolic invariants that can be inspected and interpreted. In this subsection, we analyze individual invariants learned on three representative datasets from ADBench. For each dataset, we visualize in Figure 2 the behavior of the single ensemble member with the highest AUC-ROC over its feature space.

The breastw dataset [32] concerns breast cancer diagnosis. One of its features, $x_{cs}$, encodes the uniformity of cell size on a discrete scale from 1 (highly uniform) to 10 (highly non-uniform), a property plausibly associated with malignancy. On this dataset, the best invariant depends only on $x_{cs}$ and has the form $f_{BCW}(x_{cs}) = x_{cs} + |1 - x_{cs}|$, behaving in line with medical intuition.

The vertebral dataset [2] contains biomechanical measurements of the lumbar spine for detecting pathological conditions. Here, the most indicative invariant learned by SYRAN relates the degree of spondylolisthesis $x_{dos}$ and the lumbar lordosis angle $x_{lla}$ via $f_{VC}(x_{dos}, x_{lla}) = \left(x_{dos}/(3\,x_{lla})\right)^2 + 0.8449$, achieving an AUC-ROC of roughly 90% on this dataset. Although we did not find a direct medical reference for this specific formula, it suggests a plausible joint indicator worthy of further clinical investigation.

Finally, the wine dataset [1] describes chemical properties of Italian wines, with one wine type treated as normal and the others as anomalies. The best-performing learned invariant depends only on the alcohol content $x_{ac}$ and takes the form $f_W(x_{ac}) = 1.0759/(x_{ac} - 11.1282)$. Within the empirically observed range of $x_{ac}$, this expression effectively separates normal from anomalous wines, yielding an AUC-ROC of 100%.
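Because each learned model is a closed-form equation, deploying it is a one-liner in essentially any language. As an illustration, the wine invariant above can be turned directly into a deviation score; the sample alcohol values below are our own illustrative inputs, not taken from the dataset.

```python
def wine_score(x_ac: float) -> float:
    """Deviation of the learned wine invariant f_W(x_ac) = 1.0759/(x_ac - 11.1282)
    from 1; larger values suggest anomaly. Sample inputs are illustrative only."""
    f = 1.0759 / (x_ac - 11.1282)
    return abs(f - 1.0)

print(wine_score(12.2))  # near the invariant's normal regime: small deviation
print(wine_score(14.0))  # far from it: much larger deviation
```

This portability is exactly the practical upside of symbolic invariants: no model file, runtime, or framework is needed to score new samples.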
As we lack detailed documentation on which wine types are designated normal, we cannot confirm whether this matches established oenological knowledge; however, the fact that a single one-dimensional invariant perfectly solves the task raises questions about the usefulness of this benchmark for evaluating anomaly detection methods.

These three cases illustrate how SYRAN's symbolic invariants can recover established domain indicators, propose new candidate relationships between measurements, and expose datasets that are trivially separable by a single feature.

5.3 Ablation Study

Fig. 3: Mean AUC-ROC performance across benchmark datasets for SYRAN and its best ensemble member across different hyperparameterizations (complexity weight $\gamma \in \{0.001, \dots, 1\}$, feature bagging size $K \in \{2, \dots, 5\}$, noise margin $\Delta \in \{0.1, \dots, 10\}$), compared to LOF as its strongest competitor. The default configuration is highlighted in gray.

Our three hyperparameters, complexity weight $\gamma$, feature bagging size $K$, and noise margin $\Delta$, affect both interpretability and predictive performance. While we deliberately chose our benchmark hyperparameters based on the example from Section 4 to ensure fairness, it is likely that they are suboptimal. Thus, we state the performance for alternative parameterizations in Figure 3. We vary one hyperparameter at a time and leave the remaining two at their default values. The results are remarkably stable, and as anticipated, there are even noticeable improvements in performance. Overall, it seems that our choice of default hyperparameters derived from Section 4 is reasonable, but not optimal. For example, by choosing $\gamma = 0.03$, $K = 2$, $\Delta = 2$, the performance of the ensemble changes by +4.39%, and that of the best member by −0.05%, leading to substantially better benchmark results. However, we did not permit such variation of parameters, as this would be an unfair comparison, and they might not generalize.

Next to the anomaly detection performance, another important metric is the interpretability of the equations that we learn. This is mostly affected by the hyperparameter $\gamma$. Thus, we give the best equation on the breastw [32] dataset for various values of $\gamma$ in Table 1, showing that it is possible to trade between high performance and high expressivity in the equations we learn.

Table 1: Effect of the complexity weight on symbolic expressions for the breastw dataset [32]. Here, $x_i$ represents the $i$-th feature of the dataset ($x_2$ corresponds to $x_{cs}$ in Section 5.2). We state the best ensemble member equation for each $\gamma$.

  Complexity weight γ   Best Equation                                      AUC-ROC (%)
  0.001                 $1.0500^{x_3 x_6 \cdot (-0.0526) \cdot 2.2257}$    99.19
  0.01                  $x^{\cos(-1.6023\, x_4^2)}$                        97.60
  0.10                  $x_2 + |1 - x_2|$                                  97.73
  0.50                  $x_2$                                              97.73

6 Conclusion

SYRAN shows that symbolic regression can serve as a viable and interpretable approach for unsupervised anomaly detection. It learns ensembles of human-readable symbolic invariants that are approximately constant on normal data, optimized in a way that enforces constancy, penalizes trivial constants via random-noise contrast, and controls expression complexity to favor compact equations. In experiments, SYRAN rediscovers Kepler's third law and, on 19 ADBench datasets, achieves AUC-ROC performance competitive with black-box baselines while yielding closed-form anomaly scores that recover known medical indicators, suggest new relationships, and reveal potential flaws in established benchmarks. These properties make SYRAN well suited for expert-guided anomaly detection.
Future work includes scaling to higher-dimensional and temporal data and incorporating constraints or priors to improve interpretability and detection accuracy.

Acknowledgments. This research was supported by the Research Center Trustworthy Data Science and Security (https://rc-trust.ai), one of the Research Alliance centers within the University Alliance Ruhr (https://uaruhr.de).

References

1. Aeberhard, S., Forina, M.: Wine. UCI Machine Learning Repository (1992)
2. Barreto, G., Neto, A.: Vertebral Column. UCI Machine Learning Repository (2005)
3. Ding, X., Xi, R., Akoglu, L.: Outlier Detection Bias Busted: Understanding Sources of Algorithmic Bias through Data-centric Factors. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. vol. 7, pp. 384–395 (2024)
4. Fernando, T., Gammulle, H., Denman, S., Sridharan, S., Fookes, C.: Deep Learning for Medical Anomaly Detection - A Survey. ACM Computing Surveys (2022)
5. Friedman, M.: A Comparison of Alternative Tests of Significance for the Problem of m Rankings. Annals of Mathematical Statistics 11, 86–92 (1940)
6. Han, S., Hu, X., Huang, H., Jiang, M., Zhao, Y.: ADBench: Anomaly Detection Benchmark. In: Neural Information Processing Systems (NeurIPS) (2022)
7. Hanson, N.R.: The Concept of the Positron: A Philosophical Analysis. Cambridge University Press (1963)
8. Hilal, W., Gadsden, S.A., Yawney, J.: Financial Fraud: A Review of Anomaly Detection Techniques and Recent Advances. Expert Systems with Applications 193, 116429 (2022)
9. Holm, S.: A Simple Sequentially Rejective Multiple Test Procedure. Scandinavian Journal of Statistics 6(2), 65–70 (1979)
10. Kinney, D., Kempes, C.: Epistemology and Anomaly Detection in Astrobiology. Biology & Philosophy 37(4), 22 (2022)
11. Klüttermann, S.: Towards Evolutionary Optimization Using the Ising Model. ArXiv abs/2511.15377 (2025)
12.
Klüttermann, S., Balestra, C., Müller, E.: On the Efficient Explanation of Outlier Detection Ensembles Through Shapley Values. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. pp. 43–55. Springer (2024)
13. Klüttermann, S., Katzke, T., Müller, E.: Unsupervised Surrogate Anomaly Detection. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 71–88. Springer (2025)
14. Klüttermann, S., Peka, V., Doebler, P., Müller, E.: Towards Highly Efficient Anomaly Detection for Predictive Maintenance. In: International Conference on Machine Learning and Applications (ICMLA). pp. 1691–1696. IEEE (2024)
15. Lazarevic, A., Kumar, V.: Feature Bagging for Outlier Detection. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 157–166 (2005)
16. Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation Forest. In: International Conference on Data Mining (ICDM). pp. 413–422. IEEE (2008)
17. Ma, M., Han, L., Zhou, C.: Research and Application of Transformer-Based Anomaly Detection Models: A Literature Review. ArXiv abs/2402.08975 (2024)
18. Makke, N., Chawla, S.: Interpretable Scientific Discovery with Symbolic Regression: A Review. Artificial Intelligence Review 57(1) (2024)
19. Manzi, M., Vasile, M.: Orbital Anomaly Reconstruction Using Deep Symbolic Regression. In: Proceedings of the International Astronautical Congress (IAC) (2020)
20. Mikuni, V., Nachman, B., Shih, D.: Online-Compatible Unsupervised Nonresonant Anomaly Detection. Physical Review D 105(5) (2022)
21. Noether, E.: Invariante Variationsprobleme. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, pp. 235–257 (1918)
22. Olteanu, M., Rossi, F., Yger, F.: Meta-survey on Outlier and Anomaly Detection. Neurocomputing 555, 126634 (2023)
23.
Paisner, M., Cox, M.T., Perlis, D.: Symbolic Anomaly Detection and Assessment Using Growing Neural Gas. In: International Conference on Tools with Artificial Intelligence. pp. 175–181 (2013)
24. Radwan, Y.A., Kronberger, G., Winkler, S.: A Comparison of Recent Algorithms for Symbolic Regression to Genetic Programming. ArXiv abs/2406.03585 (2024)
25. Ruff, L., Kauffmann, J.R., Vandermeulen, R.A., Montavon, G., Samek, W., Kloft, M., Dietterich, T.G., Müller, K.R.: A Unifying Review of Deep and Shallow Anomaly Detection. Proceedings of the IEEE 109(5), 756–795 (2021)
26. Safikou, E., Pattipati, K.R., Bollas, G.M.: Fault Diagnosis and Prognosis With Inferential Sensors: A Hybrid Approach Integrating Symbolic Regression and Information Theory. IEEE Access (2025)
27. Sakurada, M., Yairi, T.: Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction. In: Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis. pp. 4–11 (2014)
28. Schmidt, M., Lipson, H.: Distilling Free-Form Natural Laws from Experimental Data. Science 324(5923), 81–85 (2009)
29. Tolle, K., Tansley, S., Hey, T.: The Fourth Paradigm: Data-Intensive Scientific Discovery [Point of View]. Proceedings of the IEEE 99, 1334–1337 (2011)
30. Visbeek, S., Acar, E., den Hengst, F.: Explainable Fraud Detection with Deep Symbolic Classification. In: World Conference on Explainable Artificial Intelligence. pp. 350–373. Springer (2024)
31. Wilcoxon, F.: Individual Comparisons by Ranking Methods. Biometrics Bulletin 1(6), 80–83 (1945)
32. Wolberg, W.: Breast Cancer Wisconsin (Original). UCI Machine Learning Repository (1990)
33. Zhang, M., Alvarez, R.M., Levin, I.: Election Forensics: Using Machine Learning and Synthetic Data for Possible Election Anomaly Detection. PLoS ONE 14(10) (2019)