Compliance-Aware Predictive Process Monitoring: A Neuro-Symbolic Approach


Authors: Fabrizio De Santis, Gyunam Park, Wil M. P. van der Aalst, Francesco Zanichelli

Fabrizio De Santis¹, Gyunam Park², Wil M. P. van der Aalst³, and Francesco Zanichelli¹

¹ University of Parma, Parma, Italy ({fabrizio.desantis,francesco.zanichelli}@unipr.it)
² Eindhoven University of Technology, Eindhoven, Netherlands (g.park@tue.nl)
³ RWTH Aachen University, Aachen, Germany (wvdaalst@pads.rwth-aachen.de)

Abstract. Existing approaches for predictive process monitoring are sub-symbolic, meaning that they learn correlations between descriptive features and a target feature fully based on data, e.g., predicting the surgical needs of a patient based on historical events and biometrics. However, such approaches fail to incorporate domain-specific process constraints (knowledge), e.g., surgery can only be planned if the patient was released more than a week ago, limiting adherence to compliance and providing less accurate predictions. In this paper, we present a neuro-symbolic approach for predictive process monitoring, leveraging Logic Tensor Networks (LTNs) to inject process knowledge into predictive models. The proposed approach follows a structured pipeline consisting of four key stages: 1) feature extraction; 2) rule extraction; 3) knowledge base creation; and 4) knowledge injection. Our evaluation shows that, in addition to learning the process constraints, the neuro-symbolic model also achieves better performance, demonstrating higher compliance and improved accuracy compared to baseline approaches across all compliance-aware experiments.

Keywords: Predictive process monitoring · Process mining · Neuro-symbolic AI · Deep learning · Reasoning

1 Introduction

In business process management, predictive process monitoring has emerged as a vital tool for organizations, enabling them to anticipate process outcomes and irregularities by analyzing historical event data [10].
Current approaches rely heavily on advanced neural networks like LSTMs and Transformers that excel at pattern recognition [19]. However, these methods struggle to formally incorporate constraints, domain expertise, and new business rules that may conflict with historical data (e.g., "surgery can only be planned if a patient was released more than a week ago"). This prevents purely data-driven models from capturing crucial contextual insights, highlighting the need for hybrid approaches that combine statistical learning with symbolic reasoning.

Neuro-symbolic artificial intelligence, which integrates neural networks with symbolic knowledge representation, offers a promising framework to address these challenges [4]. By embedding domain knowledge, neuro-symbolic approaches enable models to leverage both statistical patterns from data and explicit logical priors.

This paper proposes a neuro-symbolic approach for predictive process monitoring, enhancing neural networks with symbolic reasoning. First, we extract and categorize features from event logs into control-flow features, temporal features, and payload features. The feature categories define the vocabulary for the rule extraction, with feature types aligning with process rule types (knowledge). Once extracted, we formalize the rules into a structured knowledge base using first-order logic. Then, we categorize knowledge types based on their relationships to the classification problem, obtaining three types of process knowledge: class-dependent knowledge, which is further categorized into non-outcome-oriented and outcome-oriented knowledge, and class-independent knowledge. Finally, this knowledge can be injected into our neuro-symbolic model, leveraging the Logic Tensor Network (LTN) framework [3].
The knowledge is injected in three different ways in relation to the main classification task: at the input level, by expanding the feature space; at the output level, by redefining the classification model's output; and in parallel, by incorporating additional knowledge that is not strictly related to the main classification task but can influence the training.

The core idea is to leverage domain knowledge to guide the learning process, ensuring adherence to business rules while improving predictive accuracy. Evaluations on four real-life event logs using a compliance-aware test set show that our approach outperforms traditional deep learning models, such as LSTM and Transformer, and knowledge-encoding techniques, such as semantic loss, in both accuracy and compliance with domain rules, maintaining high constraint adherence even when training data contains few compliant examples.

The paper is structured as follows. In Sect. 2, we give an overview of both the predictive process monitoring and neuro-symbolic AI fields. We introduce preliminary concepts in Sect. 3. In Sect. 4, we show how rules can be mapped to the feature space, what types of rules can be extracted, and how the knowledge base created with these rules can be leveraged by a binary classifier. Finally, we assess our approach's capabilities to enhance accuracy and compliance in Sect. 5 and conclude the paper in Sect. 6.

2 Related Work

Predictive Process Monitoring (PPM) is a branch of process mining that aims to forecast the future states of ongoing business process instances. By analyzing event logs, PPM seeks to predict various aspects, such as the next activities, remaining time, or potential outcomes of processes [16, 22]. Deep learning, particularly through Transformer architectures [5], LSTM networks [19], and graph neural models [2], has significantly advanced PPM.
Although recent work has increasingly focused on integrating fairness principles into predictive process monitoring [17], little attention has been paid to the role of logical constraints and business rules, which remain essential in compliance-sensitive settings. Most existing methods remain predominantly data-driven and ignore constraints that should guide or restrict predictions. Only a small subset of studies incorporates explicit process knowledge: Vazifehdoostirani et al. [20] encode control-flow patterns as predefined input features, Di Francescomarino et al. [11] enforce LTL constraints as post-processing rules, and Mezini et al. [15] embed LTL constraints directly into model training through a differentiable loss aimed at improving suffix prediction. However, these strategies have two key limitations: (1) they concentrate mainly on control-flow constraints while paying limited attention to temporal aspects and payload data, and (2) they integrate domain knowledge only at the input or output level (except for [15]).

Neuro-symbolic AI aims to combine the strengths of symbolic reasoning and sub-symbolic learning [4]. This integration addresses fundamental limitations in both traditional AI paradigms. Approaches like δILP [8] and DeepProbLog [14] bring together logic programming paradigms and deep neural models, enabling end-to-end learning by combining neural predicates with probabilistic logic. Other methods, such as semantic loss functions [21] and constraint-aware training methods [9], embed formal rules directly into differentiable objectives. Recent advancements focus on systematic approaches for injecting domain knowledge into neural architectures. Knowledge-Enhanced Neural Networks (KENNs) [7] inject prior knowledge through residual connections, enforcing constraints while maintaining end-to-end learning.
Logic Tensor Networks (LTNs) [3] embed first-order logic into neural computational graphs through fuzzy logic semantics, offering a unified paradigm that embeds first-order logic formulas during training. This brings several benefits to the PPM domain: (1) it allows different forms of process knowledge, ranging from control-flow constraints expressed in LTL to payload-dependent business rules, as well as relations involving durations or waiting times; (2) fuzzy semantics enable soft satisfaction of constraints, making them robust to partial compliance, uncertainty, and rule conflicts often present in real process executions; and (3) learning and reasoning become intrinsically coupled, as logical formulas influence the neural optimization process instead of being applied only after model training. Our approach injects knowledge into the reasoning process through a neuro-symbolic architecture that leverages the expressive capabilities of first-order logic. This enables the formulation of interpretable and flexible logical rules. Furthermore, we extend the scope of our analysis to encompass additional aspects of the process, such as data attributes, using new rules and logical constraints that may conflict with the data.

Fig. 1: Binary classification backbone implemented as an LTN computational graph [3].

3 Preliminaries

3.1 Logic Tensor Network (LTN)

Logic Tensor Networks embed first-order logic into differentiable computation graphs, allowing neural models to jointly learn from data and from logical knowledge bases [3]. LTNs ground symbolic objects (constants, functions, predicates) into tensors, evaluate fuzzy connectives, and aggregate the satisfiability of all formulas in a knowledge base. This makes them a convenient substrate for tasks such as classification, relational reasoning, and constraint satisfaction while keeping end-to-end training differentiable.
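To make the fuzzy semantics concrete, the following is a minimal sketch of the kind of differentiable connectives an LTN-style system grounds formulas with. The particular operator choices (product t-norm, Reichenbach implication) are illustrative assumptions; the LTN framework [3] supports several configurable semantics.

```python
def f_not(u):
    """Standard fuzzy negation: NOT u = 1 - u."""
    return 1.0 - u

def f_and(u, v):
    """Product t-norm conjunction."""
    return u * v

def f_or(u, v):
    """Probabilistic-sum t-conorm disjunction."""
    return u + v - u * v

def f_implies(u, v):
    """Reichenbach implication: u -> v = 1 - u + u*v."""
    return 1.0 - u + u * v

# Truth values live in [0, 1]; crisp Boolean logic is recovered
# at the endpoints 0 and 1.
```

Because every connective is a smooth function of its arguments, gradients flow through whole formulas, which is what lets logical knowledge participate in neural training.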
We build on this general machinery by instantiating an LTN-based binary classifier that serves as the backbone for our compliance-aware extensions. Fig. 1 summarizes the resulting computational graph, and the key components are described below.

Grounded inputs. Each training sample belongs either to the positive set X⁺ or the negative set X⁻. Variables x⁺ and x⁻ range over these two sets. The grounding operator G_X maps every variable instantiation to its feature vector, i.e., G_X(x⁺) = {v₁, ..., v_{|L⁺|}}, and analogously for x⁻. Domain constants (e.g., activity labels, payload values) are grounded via G_C, whereas deterministic attributes are produced by functions grounded through G_F.

Predicate evaluation. The unary predicate A models the binary classifier Δ. Its grounding G_θ(A) is a neural network (LSTM or Transformer encoder followed by an MLP) parameterized by θ. Applying G_θ(A) to G_X(x⁺) returns a vector of truth values G_θ(A(x⁺)) with entries in [0, 1]; values close to 1 indicate samples likely to belong to the positive class. We explicitly model the complementary predicate ¬A by applying the fuzzy negation 1 − u to the outputs obtained on X⁻.

Quantifiers and satisfaction. Universal quantifiers aggregate truth values for all individuals in L⁺ or L⁻ via the pMeanError operator:

∀x⁺ A(x⁺) = 1 − ( (1/|L⁺|) Σ_{v ∈ G_X(x⁺)} (1 − G_θ(A(v)))^p )^{1/p},

with p ≥ 1 controlling how strictly deviations are penalized (higher p enforces stricter universals). Existential quantifiers rely on the standard generalized mean (pMean). A knowledge base K aggregates all grounded formulas through SatAgg (again pMeanError), and the optimization minimizes

L = 1 − SatAgg_{φ ∈ K} G_θ(φ),

thereby tuning θ so that all constraints jointly approach truth value 1.

This LTN classifier forms the reference pipeline. In Sect.
4, we introduce process-specific logical formulas and show how they extend the backbone by (i) enriching the grounded inputs, (ii) constraining predicate outputs, and (iii) adding parallel objectives that share the same satisfaction operator.

3.2 Predictive Process Monitoring

We now instantiate the LTN backbone on process data. An event log records multiple traces, each describing one process instance as an ordered sequence of events σ = ⟨e₁, ..., e_n⟩ sharing the same case identifier. Every event e = (a, c, t, attr₁, ..., attr_m) captures the executed activity a, the case c, its timestamp t, and optional payload attributes. Predictive process monitoring targets ongoing instances: a prefix l = (σ, k) collects the first k events of trace σ. Let L be the set of all prefixes extracted from the training logs, with L⁺ and L⁻ denoting the positive and negative subsets according to the selected outcome label (e.g., complication vs. no complication). The predictive model Δ : L → {+, −} decides whether a new prefix belongs to the positive class.

This setting plugs directly into the notation introduced in Sect. 3.1: prefixes act as variables x⁺ and x⁻, their feature vectors are the groundings G_X, activity names and payload values populate the constants grounded via G_C, and deterministic descriptors such as waiting times are produced by functional groundings G_F. The predicate A becomes the predictive classifier P that scores each prefix, exactly as shown in Fig. 1.

Finally, domain rules can now be written over these symbols. For instance, the constraint "if antibiotics are not administered within two hours after surgery for an elderly patient, the patient will have complications" can be expressed as

∀l ∈ L ( (WaitTime(l, Surg, ATB) > 2 ∧ Age(l) > 60) → P(l) )

where WaitTime, Age ∈ F and P is the predicate modeling the risk of complications.
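The quantifier and loss definitions above can be sketched numerically. This is a plain-Python illustration of the pMeanError universal, the pMean existential, and the resulting LTN loss, not the framework's actual tensor implementation:

```python
def forall_pmean_error(truth_values, p=2):
    """Universal quantifier via pMeanError:
    1 - ((1/|L|) * sum (1 - t)^p)^(1/p).
    Higher p penalizes strongly violated individuals more."""
    n = len(truth_values)
    return 1.0 - (sum((1.0 - t) ** p for t in truth_values) / n) ** (1.0 / p)

def exists_pmean(truth_values, p=2):
    """Existential quantifier via the generalized mean (pMean)."""
    n = len(truth_values)
    return (sum(t ** p for t in truth_values) / n) ** (1.0 / p)

def loss(formula_satisfactions, p=2):
    """L = 1 - SatAgg over the knowledge base, with SatAgg = pMeanError;
    minimizing it pushes every formula toward truth value 1."""
    return 1.0 - forall_pmean_error(formula_satisfactions, p)
```

For example, a knowledge base whose formulas are all fully satisfied yields a loss of zero, while a single badly violated formula dominates the aggregate as p grows.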
These formulas will be injected into the LTN knowledge base in Sect. 4.

4 Neuro-Symbolic AI for Predictive Process Monitoring

Our approach injects process knowledge into predictive process monitoring through the pipeline illustrated in Fig. 2. Starting from an event log of trace prefixes, we extract both features and rules, which are then used to build a knowledge base. This knowledge is finally integrated into a binary classifier enriched with process-level constraints.

Fig. 2: The pipeline followed in the approach, which consists of feature extraction, rule extraction, knowledge base creation and injection, and then the creation of the neuro-symbolic model leveraging the LTN framework.

To illustrate our approach, consider a healthcare scenario where the goal is to predict whether a patient will experience post-surgical complications. A typical case includes preparatory assessments, such as medical history review (Rev), physical examinations (Exam), laboratory tests (Lab), and antibiotic administration (ATB), followed by surgery (Surg), during which key factors such as procedure duration and intraoperative events are recorded. Postoperative steps include pain management (PAdm) and follow-up checks (PostCU). Prefixes l from past executions, together with current and historical patient data, support outcome prediction.
We use the predicate P to model the probability that, for a given prefix, the patient will experience post-surgical complications, with l⁺ and l⁻ representing positive and negative cases, respectively.

4.1 Feature Extraction

First, we extract features from event logs to capture relevant characteristics of process execution. These features define the vocabulary for both rule extraction and prediction tasks, with the feature space design aligning with the types of process rules (knowledge) we aim to incorporate. Our approach utilizes three main feature categories:

– Control-flow features capture activity sequences based on Declare [1] constraints. In our healthcare example, these represent the activities performed by the patient and the relationships between them, e.g., HasAct(l, Rev) indicates whether the activity Rev was performed for a patient.
– Temporal features capture time-based aspects based on Service Level Agreements (SLAs) [18] on activity durations, waiting times, cycle time, etc. In our healthcare example, this includes the time elapsed between surgery and antibiotic administration, e.g., WaitTime(l, Surg, ATB).
– Payload features represent data associated with events and cases, divided into:
  • Case-level payload: Data related to the entire case, including numerical, categorical, and unstructured values available at prefix l initiation. In our healthcare example, this includes patient age and pre-existing conditions (diabetes, hypertension, obesity).
  • Event-level payload: Data associated with specific events in the trace prefix l, which can only be used once the corresponding activity occurs. In our healthcare example, this includes diagnostic test results, medical reports, or medication administered during treatment.
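The three feature categories can be sketched as follows for a single prefix. The event representation (dicts with `activity` and `t` keys) and attribute names are hypothetical choices for illustration, not the paper's implementation:

```python
def has_act(prefix, activity):
    """Control-flow feature: 1.0 if the activity occurs in the prefix."""
    return 1.0 if any(e["activity"] == activity for e in prefix) else 0.0

def wait_time(prefix, src, dst):
    """Temporal feature: hours between the first src and first dst
    events (infinity if either is absent from the prefix)."""
    t_src = next((e["t"] for e in prefix if e["activity"] == src), None)
    t_dst = next((e["t"] for e in prefix if e["activity"] == dst), None)
    if t_src is None or t_dst is None:
        return float("inf")
    return t_dst - t_src

def extract_features(prefix, case_payload):
    """One feature per category: control-flow, temporal, case-level payload."""
    return {
        "HasAct_Rev": has_act(prefix, "Rev"),
        "WaitTime_Surg_ATB": wait_time(prefix, "Surg", "ATB"),
        "Age": case_payload["age"],
    }

# A toy prefix: review at t=0, surgery at t=5h, antibiotics at t=6.5h.
prefix = [{"activity": "Rev", "t": 0.0},
          {"activity": "Surg", "t": 5.0},
          {"activity": "ATB", "t": 6.5}]
features = extract_features(prefix, {"age": 72})
```

Event-level payload features would be handled analogously, becoming available only once the corresponding activity appears in the prefix.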
We leverage established process mining techniques to automatically discover and formalize rules from event logs. The types of rules that can be extracted are directly linked to the previously defined feature space, as process rules must map onto appropriate features to be leveraged effectively. We extract three types of rules:

– Control-flow rules: These rules derive from declarative mining, which identifies constraints such as sequential relationships, mutual exclusion, existence constraints, and choice relationships [13]. The resulting rules can be represented as Linear Temporal Logic (LTL) constraints. In our healthcare example, control-flow rules specify the required order of medical procedures. Using LTL templates like response, chain response, and precedence, we can formalize rules such as:
  • "If Rev occurs, Exam should occur after Rev", represented as □(Rev ⇒ ♢Exam)
  • "PostCU should immediately follow Surg", represented as □(Surg ⇒ ◯PostCU)
  • "PAdm should occur only if PostCU has occurred before", represented as (¬PAdm U PostCU) ∨ □(¬PAdm)
– Temporal rules: These rules focus on timing and duration aspects of activities and processes, capturing constraints related to the time elapsed between activities, activity durations, and overall performance. Temporal rules are extracted from SLA compliance analysis [18], producing IF-THEN statements capturing expected timing behaviors. In our healthcare example, a temporal rule might state that "if ATB happens within two hours after Surg, the likelihood of complications decreases."
– Payload rules: These rules derive from payload features and capture the data context of process events and cases. We automatically extract payload rules using statistical analysis to uncover correlations between payload attributes and outcomes [6]. Payload rules are typically expressed in IF-THEN format.
In our healthcare example, a payload rule might specify that "if a patient's oxygen saturation falls below 90% post-surgery, the risk of complications increases."

Domain expertise also plays a crucial role in supplementing automatically extracted rules. For example, knowledge that "patients with conditions such as diabetes have increased complication risks" can be manually incorporated into the knowledge base.

4.3 Knowledge Base Creation

Once extracted, we formalize the rules into a structured knowledge base. Since our goal is to integrate these rules into an LTN framework, we need to translate the various rule types into first-order logic (FOL) representations.

Translation of Rules into FOL. We translate three main types of rules into FOL:

– Control-flow rules: We follow the approach in [12] to convert LTL rules into FOL, preserving the semantics of temporal operators while allowing for gradient-based optimization. For instance, in a healthcare process example, the constraint □(Rev ⇒ ♢Exam) is translated into ∀l (HasAct(l, Rev) ∧ Next(l, Rev, Exam)).
– Temporal and payload rules: We convert IF-THEN temporal rules and payload rules into logical formulas by decomposing each rule into its antecedent and consequent. We determine the appropriate quantifier based on the semantics of each antecedent. When a rule should apply universally to all instances, we use the universal quantifier (∀); when it should hold for at least one instance, we use the existential quantifier (∃). For example, the rule "for patients l⁺ at risk of complications, if ATB is performed within two hours after Surg, the likelihood of complications decreases" translates to:

∀l⁺ (WaitTime(l⁺, Surg, ATB) ≤ 2 → ¬P(l⁺)).
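Under fuzzy semantics, such an IF-THEN formula evaluates to a truth value rather than a Boolean. The following sketch grounds the antecedent with a soft (sigmoid) comparison and uses the Reichenbach implication; both operator choices are illustrative assumptions, and a crisp comparison could be substituted:

```python
import math

def step_leq(x, threshold, slope=10.0):
    """Soft truth value for 'x <= threshold' in [0, 1] via a sigmoid.
    Large slope approximates a crisp comparison."""
    return 1.0 / (1.0 + math.exp(slope * (x - threshold)))

def rule_truth(wait_time_h, p_complication):
    """Truth value of (WaitTime(l, Surg, ATB) <= 2) -> not P(l)
    using the Reichenbach implication 1 - a + a*c."""
    ante = step_leq(wait_time_h, 2.0)
    cons = 1.0 - p_complication  # fuzzy negation of P(l)
    return 1.0 - ante + ante * cons
```

When antibiotics come late (antecedent near 0) the rule is vacuously near-true regardless of the prediction; when they come early, the rule rewards a low predicted complication probability.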
Knowledge Categorization. We categorize knowledge types based on their relationship to the classification problem:

– Class-dependent knowledge: This knowledge type relates to the classes in the classification problem and is further divided into:
  • Non-outcome-oriented knowledge: This encodes constraints related to the problem's classes but not directly to the outcome. The main classification predicate does not appear within these constraints. For example, a constraint might establish that "elderly diabetic patients l⁺ require special monitoring."
  • Outcome-oriented knowledge: This knowledge directly involves the outcome, including the classification predicate in the consequent of the IF-THEN implication. For instance, "for patients l⁺ at risk of complications, if ATB is performed within two hours after Surg, then the risk of complications decreases."
– Class-independent knowledge: This is general knowledge not tied to the class of the trace, such as control-flow constraints that do not influence the process outcome.

Fig. 3: The three ways of injecting knowledge: (A) feature expansion for class-dependent, non-outcome-oriented knowledge, (B) output refinement for class-dependent, outcome-oriented knowledge, and (C) parallel constraints for class-independent knowledge. Each injection pathway addresses a different failure mode of purely data-driven predictors: feature expansion propagates declarative facts such as "elderly diabetic patients require special monitoring" into the feature space, output refinement operationalizes outcome rules like "timely antibiotics lower complication risk" directly on the predicate outputs, and parallel constraints regularize the shared representation by penalizing process executions that violate structural rules such as "medical history review must precede physical examination."
For example, a constraint might dictate that "for patients l, Exam should be performed after Rev."

Each knowledge type (control-flow, temporal, and payload) can be classified according to these categories, guiding how they are integrated into the predictive approach.

4.4 Knowledge Injection

After creating the knowledge base, we inject this knowledge into our neuro-symbolic model. We propose three distinct injection methods based on the three knowledge categories. As shown in Fig. 3, each method injects knowledge at different points relative to the binary classification predicate P, which serves as the core component of our prediction system.

Feature Expansion. This method injects class-dependent, non-outcome-oriented knowledge by preprocessing data before it reaches the binary classification predicate P (block A in Fig. 3, highlighted in yellow). This knowledge enhances the feature space with additional information without directly referencing the classification outcome.

Since these constraints serve to expand the feature space, they are implemented as predicates modeled using functions that return a deterministic value in the interval [0, 1] when evaluated on a trace prefix. To ensure their integration within the overall reasoning process, these logical rules are also incorporated into the knowledge base, allowing them to interact with other knowledge and contribute directly to the optimization objective by influencing the loss function. We apply constraints universally to all instances using the ∀ operator, then pass the augmented representations to the classification predicate P.
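A minimal sketch of this feature-expansion step, assuming simple deterministic groundings for the rule's premise (the predicate and attribute names are illustrative):

```python
def spec_mon(age, has_diabetes):
    """Deterministic grounding of the premise 'Age(l) > 60 AND
    HasCond(l, Diabetes)'; returns a value in [0, 1]."""
    return (1.0 if age > 60 else 0.0) * (1.0 if has_diabetes else 0.0)

def expand_features(feature_vector, age, has_diabetes):
    """Concatenate the rule-derived feature onto the original vector
    before it reaches the classification predicate P."""
    return feature_vector + [spec_mon(age, has_diabetes)]

# An elderly diabetic patient gets the extra feature set to 1.0.
x = expand_features([0.3, 0.7], age=72, has_diabetes=True)  # -> [0.3, 0.7, 1.0]
```

In the full approach the same formula also lives in the knowledge base, so its satisfaction contributes to the loss rather than acting only as preprocessing.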
In our healthcare scenario, the knowledge that "elderly diabetic patients require special monitoring" can be expressed as:

Age(l) > 60 ∧ HasCond(l, Diabetes) → SpecMon(l)

The premise of this formula is used to generate a new feature indicating whether both conditions are satisfied for each patient l, where Age returns the patient's age and HasCond returns 1 if the patient has diabetes and 0 otherwise. The values produced by evaluating such logical rules are then concatenated with the original feature vector representation.

Output Refinement. This method injects class-dependent, outcome-oriented knowledge by modifying the outputs of the classification predicate (block B in Fig. 3, highlighted in red). These constraints directly influence prediction outcomes by imposing rules on the outputs P(l⁺) and P(l⁻). This configuration has the most direct impact on final predictions since it refines the classification results rather than modifying the input feature space. The constraints can adjust prediction probabilities based on domain rules that explicitly reference the outcome. In our healthcare scenario, the rule "timely antibiotic administration reduces complication risk" can directly modify the complication probability estimated by predicate P. When antibiotics are administered within two hours of surgery, the constraint decreases the truth value for positive prefixes (P(l⁺)).

Parallel Constraints. This method injects class-independent knowledge as parallel constraints that operate alongside the main classification task (block C in Fig. 3, highlighted in blue). Since this knowledge does not depend on outcomes, it applies to all prefixes l. These constraints are implemented as auxiliary predicates modeled by separate neural networks, serving as complementary objectives that must be satisfied jointly with the main classification task.
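The joint optimization of the main task and such parallel constraints can be sketched as sharing one satisfaction operator. This is an illustrative scalar version, not the actual tensor computation:

```python
def sat_agg(values, p=2):
    """SatAgg via pMeanError, as in the LTN loss."""
    n = len(values)
    return 1.0 - (sum((1.0 - v) ** p for v in values) / n) ** (1.0 / p)

def joint_loss(main_satisfactions, aux_satisfactions):
    """Main classification formulas and auxiliary (parallel) structural
    constraints share a single satisfaction aggregate: minimizing this
    loss pushes both groups of formulas toward truth value 1."""
    return 1.0 - sat_agg(main_satisfactions + aux_satisfactions)
```

Because the auxiliary predicates read the same shared representation as the classifier, their gradients act as a structural regularizer even though they never mention the class label.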
Although they do not reference the class label, their gradients regularize the shared representation by penalizing process executions that violate structural rules. This prevents the classifier from overfitting to non-compliant patterns that accidentally correlate with the outcome. To ensure the optimization of the primary predictor is not adversely influenced, only prefixes that conform to the encoded domain rules are incorporated into the parallel constraint-related loss. Prefixes that violate these rules are therefore excluded from this loss component, preventing non-conformant behavior from introducing misleading gradients during training. The truth values of these auxiliary predicates are aggregated and connected to the main satisfaction operator so that satisfying the structural rules increases the overall SatAgg score.

In our healthcare scenario, process-structural knowledge such as "medical history review must precede physical examination" and "patients should follow proper nutrition guidelines" can be encoded as

∀l ((HasAct(l, Rev) ∧ Next(l, Rev, Exam)) ∧ NutriAdeq(l))

where HasAct is a predicate modeling the probability that Rev happened in l, Next is the predicate modeling the probability that Exam follows Rev, and NutriAdeq is a predicate modeling the probability that the patient is following a proper diet. Enforcing these constraints encourages the classifier to rely on medically plausible prefixes; when such structure correlates with the outcome (e.g., well-structured care often yields fewer complications), indirect performance gains are observed.

5 Evaluation

5.1 Datasets

We evaluated our approach on an outcome prediction task using four publicly available real-life event logs.
We selected the event logs based on the presence of case-level and event-level attributes that could serve as domain knowledge to define meaningful logical rules and influence outcome prediction.

– BPIC2012: This event log pertains to the loan application process of a Dutch bank. We defined the labeling based on whether an application is accepted or not.
– BPIC2017: This is a higher-quality version of BPIC2012 with more examples and features. The labeling is based on whether an application is accepted or not.
– Traffic fines: A real-life event log of a system managing traffic fines. We defined the labeling based on whether the fine is sent for credit collection or paid in full.
– Sepsis: This event log contains sepsis cases from a hospital. We defined the outcome label based on whether the patient is admitted to the ICU or not.

5.2 Experimental Design

In our experiments⁴, we compare eight architectural variants to evaluate the impact of incorporating domain knowledge into predictive process monitoring:

1. LSTM [19], TFR [5]: Purely data-driven baselines trained with binary cross-entropy.
2. LSTM-FE, TFR-FE: LSTM and Transformer models with knowledge encoded in the feature space.
3. LSTM-SL, TFR-SL: LSTM and Transformer models trained with semantic loss [21].
4. LTN-Data-L, LTN-Data-T: LTN [3] with LSTM/Transformer backbones trained without additional domain knowledge.
5. LTN-L, LTN-T: Our proposed approach, incorporating domain-specific process knowledge through the neuro-symbolic framework (i.e., configurations A, B, and C).

⁴ The code to reproduce our experiments is available at https://github.com/FabrizioDeSantis/NeSyPPM-Compliance

The LSTM model has two layers and a hidden dimension of 128. The Transformer architecture uses a hidden size of 128, along with two attention heads.
Both models are trained using the Adam optimizer with a learning rate of 0.001 and a batch size of 32. Categorical variables are represented through 64-dimensional embeddings, while numerical features are normalized and then processed using linear projection layers. All models are trained for 100 epochs, with early stopping applied based on validation set performance to prevent overfitting. While more complex neural architectures could be considered, we intentionally used a simplified design to isolate and clearly evaluate the impact of injected knowledge constraints on prediction performance. Our approach is architecture-agnostic, with results expected to transfer to other neural network designs.

We do not compare with knowledge-enhanced methods for next-suffix prediction, as these techniques encode only control-flow information and are structurally tied to sequence continuation. Adapting them to outcome prediction would require redesigning their core modeling assumptions and would not provide a meaningful and fair baseline. Similarly, we do not consider post-hoc output refinement with hard rule enforcement, which acts as an oracle by imposing constraints after prediction rather than shaping the learned representations. In contrast, our framework integrates process knowledge directly during training, enabling the model to learn under uncertainty and noise rather than relying on external corrections.

For evaluation, we adopted an 80-20 train-test split, reserving 20% of the training data as a validation set. The test set consists of 1) rule-compliant traces that satisfy both their original outcome label and the consequent of applicable rules and 2) randomly selected examples from the original dataset. This design specifically evaluates how well models adapt to new compliance constraints that may not be represented in historical training data.
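Constraint adherence on such a test set can be measured as the proportion of satisfied constraints over all applicable ones. A hedged sketch, where each constraint is a hypothetical pair of callables (one deciding applicability from the prefix, one deciding whether the prediction complies):

```python
def compliance_rate(prefixes, predictions, constraints):
    """Proportion of satisfied constraints over applicable ones.
    A constraint applies when the prefix meets its conditions, and is
    satisfied when the model's prediction complies with it."""
    applicable = satisfied = 0
    for prefix, pred in zip(prefixes, predictions):
        for is_applicable, complies in constraints:
            if is_applicable(prefix):
                applicable += 1
                if complies(prefix, pred):
                    satisfied += 1
    return satisfied / applicable if applicable else 1.0

# Toy example: "if ATB came within 2h, predict no complication (class 0)".
prefixes = [{"wait": 1.0}, {"wait": 5.0}, {"wait": 1.5}]
preds = [0, 1, 1]
rules = [(lambda pr: pr["wait"] <= 2, lambda pr, pred: pred == 0)]
rate = compliance_rate(prefixes, preds, rules)  # -> 0.5 (1 of 2 applicable)
```

This is the style of metric used for RQ3 below; the exact rule encodings in the released code may differ.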
Note that the knowledge base associated with each event log consists of six logical rules. Our evaluation focuses on three key aspects:

1. RQ1: How does the injection of domain knowledge improve predictive performance compared to purely data-driven approaches?
2. RQ2: How do the different knowledge injection configurations in the enhanced LTN (configurations A, B, and C in Fig. 3) influence predictive performance?
3. RQ3: How well do the models adhere to domain constraints, as measured by the proportion of satisfied constraints over all applicable constraints? A constraint is considered applicable when a prefix satisfies the constraint's conditions, and it is satisfied when the model's prediction complies with the constraint.

Table 1: Performance on the compliance-aware test set comparing purely data-driven baselines (LSTM, TFR, LSTM-FE, TFR-FE) with knowledge-enhanced variants. Results show mean and standard deviation over 5 seeds.

Model      | Sepsis Acc   | Sepsis F1    | BPIC2012 Acc | BPIC2012 F1  | BPIC2017 Acc | BPIC2017 F1  | Traffic fines Acc | Traffic fines F1
LSTM       | 81.86 ± 1.57 | 71.61 ± 4.70 | 53.19 ± 2.45 | 52.49 ± 1.99 | 64.62 ± 0.76 | 64.04 ± 1.15 | 77.34 ± 1.54      | 77.27 ± 1.65
TFR        | 81.53 ± 2.31 | 73.66 ± 2.78 | 58.80 ± 1.21 | 56.15 ± 1.09 | 69.22 ± 3.11 | 68.93 ± 3.53 | 78.26 ± 1.51      | 78.23 ± 1.52
LSTM-FE    | 81.71 ± 1.60 | 70.49 ± 3.87 | 51.89 ± 0.97 | 51.80 ± 1.03 | 63.35 ± 0.54 | 62.20 ± 0.49 | 77.15 ± 0.61      | 77.05 ± 0.69
TFR-FE     | 78.61 ± 3.34 | 67.01 ± 4.09 | 62.29 ± 2.49 | 58.83 ± 2.16 | 66.41 ± 1.64 | 65.77 ± 1.97 | 75.18 ± 0.81      | 74.90 ± 0.92
LSTM-SL    | 79.90 ± 2.72 | 69.70 ± 4.59 | 53.70 ± 1.94 | 53.08 ± 1.52 | 64.98 ± 0.59 | 64.34 ± 0.63 | 77.54 ± 1.42      | 77.46 ± 1.52
TFR-SL     | 78.60 ± 1.25 | 67.66 ± 2.35 | 64.24 ± 3.27 | 60.98 ± 1.95 | 68.74 ± 1.70 | 68.28 ± 2.08 | 78.88 ± 0.25      | 78.83 ± 0.25
LTN-Data-L | 81.44 ± 2.50 | 71.31 ± 5.37 | 54.60 ± 0.89 | 53.61 ± 0.92 | 71.33 ± 0.14 | 70.85 ± 0.20 | 76.65 ± 0.66      | 76.53 ± 0.70
LTN-Data-T | 80.31 ± 1.61 | 74.54 ± 3.33 | 60.33 ± 3.81 | 57.51 ± 2.84 | 75.07 ± 0.81 | 74.21 ± 0.48 | 77.09 ± 1.12      | 76.94 ± 1.27
LTN-L      | 92.68 ± 0.75 | 91.18 ± 0.56 | 64.50 ± 1.76 | 62.59 ± 0.49 | 76.59 ± 0.14 | 75.10 ± 0.33 | 79.89 ± 0.50      | 79.89 ± 0.50
LTN-T      | 93.15 ± 1.59 | 92.06 ± 1.59 | 64.66 ± 3.44 | 61.41 ± 2.20 | 76.99 ± 0.37 | 75.40 ± 0.31 | 80.20 ± 0.25      | 80.19 ± 0.25

5.3 Experimental Results

Our evaluation demonstrates significant performance differences among the architectural variants. As shown in Tab. 1, the enhanced LTN with domain knowledge consistently outperforms both the baseline LSTM and the basic LTN models.

RQ1: Prediction Performance. LTNs enriched with domain knowledge outperform both purely data-driven counterparts and models using semantic loss across all datasets. The performance gains are more pronounced on smaller datasets: Sepsis achieves an F1 improvement of 18.4% (TFR → LTN-T), and BPIC2012 improves by 10.1% (LSTM → LTN-L). On larger datasets, such as Traffic fines, the improvements are more modest (1.96%, TFR → LTN-T). This attenuation is likely due to the abundance of training samples, which reduces the relative influence of logical rules during learning. Notably, the model achieves performance improvements even in settings where the number of compliant traces in the training set is limited (3.18% for Sepsis and 13.09% for BPIC2012). Overall, these results highlight the effectiveness of incorporating domain knowledge into predictive process monitoring, especially when data is limited.

RQ2: Knowledge Injection Impact. We analyzed the impact of different neuro-symbolic integration strategies. The results highlight architectural trade-offs (see Fig. 4):

– Output Refinement (B): Consistently delivered the strongest performance improvements, effectively leveraging new compliance constraints even when training data contained few compliant examples.
– Feature Expansion (A): Demonstrated limited effectiveness compared to output refinement, as reflected in both classification metrics and compliance scores.
– Parallel Constraints (C): Adding constraints unrelated to the main prediction task slightly improves classification performance when combined with configuration B. While such constraints typically increase computational complexity when modeled using neural networks, they are implemented as deterministic functions without additional neural models. These constraints might prove valuable in multi-task scenarios, such as simultaneously predicting the next activity.

Fig. 4: Ablation study for LTN-T of the different injection methods with 5 different seeds, on (a) Sepsis, (b) BPIC2012, (c) BPIC2017, and (d) Traffic fines. Injection techniques were first evaluated individually (i.e., feature expansion LTN_A and output refinement LTN_B), followed by an evaluation of their possible combinations, where LTN_ABC is the main configuration. The letters used to denote the configurations correspond to the injection methods in Fig. 3.
RQ3: Compliance Assessment. A key advantage of the enhanced LTN is its ability to consistently satisfy the imposed logical constraints by directly encoding them into the training process, achieving high compliance scores even in settings where compliant traces are scarce. While semantic-loss models may in some cases yield improvements over purely data-driven approaches, such gains are not consistent, as the semantic-loss functions act merely as regularizers. As shown in Fig. 5, the baseline models exhibit competitive compliance scores on the BPIC2017 log, largely due to the high prevalence of compliant traces in the training set (43.92%), which allows statistical correlations to capture valid patterns. However, purely data-driven approaches struggle to generalize when introduced to new logical constraints or when compliant patterns are statistically rare, as observed in Sepsis (4.55%) and BPIC2012 (13.09%). In contrast, our neuro-symbolic approach explicitly embeds these constraints into the learning objective. Consequently, the enhanced LTN demonstrates superior adaptability, maintaining high compliance scores even when the historical training distributions do not heavily support specific process rules.

Fig. 5: Compliance scores obtained with 5 different seeds comparing purely data-driven baselines with knowledge-enhanced variants, on (a) Sepsis, (b) BPIC2012, (c) BPIC2017, and (d) Traffic fines.
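The compliance score of RQ3 (satisfied constraints over all applicable constraints) can be computed as in the following sketch; representing each rule as a pair of predicates, one testing applicability on a prefix and one testing satisfaction on a prediction, is our illustrative assumption:

```python
def compliance_score(prefixes, predictions, rules):
    """Percentage of satisfied constraints over all applicable ones.
    A rule is a pair (applies, satisfied_by): it is applicable when the
    prefix meets its conditions, and satisfied when the model's
    prediction complies. Rule encoding is illustrative."""
    applicable = satisfied = 0
    for prefix, pred in zip(prefixes, predictions):
        for applies, satisfied_by in rules:
            if applies(prefix):
                applicable += 1
                if satisfied_by(pred):
                    satisfied += 1
    return 100.0 * satisfied / applicable if applicable else 100.0
```

For example, a hypothetical Sepsis rule "if the patient shows organ dysfunction, predict ICU admission" would count as applicable for every prefix containing that condition and as satisfied only when the model predicts admission for it.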
6 Conclusion

In this work, we presented a neuro-symbolic approach based on LTNs for predictive process monitoring, leveraging business rules and logical constraints to enhance predictive performance and compliance adherence in compliance-aware settings. Experimental results showed that our method can effectively leverage injected constraints. Our approach demonstrated the ability to build a reliable predictive model, thanks to the injected knowledge, even when the training set lacks compliant traces. These findings show the potential of neuro-symbolic learning in predictive process monitoring, especially in scenarios where domain knowledge and logical consistency are critical. In future work, we plan to formalize an explicit representation defining specific templates to automatically connect features with rules. Moreover, we aim to evaluate our neuro-symbolic approach on a case study where domain knowledge plays a critical role, such as in healthcare. Finally, we plan to extend the approach to more process-aware tasks, such as predictions related to next events and time-related aspects.

Acknowledgements. F. De Santis is supported by the Italian Ministry of University and Research (MUR) under the National Recovery and Resilience Plan (NRRP), Mission 4, Component 1, Investment 4.1, CUP D91I23000080006, funded by the European Union - NextGenerationEU.

References

1. van der Aalst, W.M.P., Pesic, M., Schonenberg, H.: Declarative workflows: Balancing between flexibility and support. Comput. Sci. Res. Dev. 23(2), 99–113 (2009). https://doi.org/10.1007/S00450-009-0057-9
2. Amiri Elyasi, K., van der Aa, H., Stuckenschmidt, H.: PGTNet: A process graph transformer network for remaining time prediction of business process instances. In: International Conference on Advanced Information Systems Engineering. pp. 124–140. Springer (2024)
3.
Badreddine, S., d'Avila Garcez, A.S., Serafini, L., Spranger, M.: Logic tensor networks. Artif. Intell. 303, 103649 (2022). https://doi.org/10.1016/J.ARTINT.2021.103649
4. Bhuyan, B.P., Ramdane-Cherif, A., Tomar, R., Singh, T.P.: Neuro-symbolic artificial intelligence: a survey. Neural Comput. Appl. 36(21), 12809–12844 (2024). https://doi.org/10.1007/S00521-024-09960-Z
5. Bukhsh, Z.A., Saeed, A., Dijkman, R.M.: ProcessTransformer: Predictive business process monitoring with transformer network. CoRR abs/2104.00721 (2021)
6. Cafaro, M., Epicoco, I., Pulimeno, M.: Data mining: Mining frequent patterns, association rules, and correlations. In: Encyclopedia of Bioinformatics and Computational Biology - Volume 1, pp. 358–366 (2019). https://doi.org/10.1016/B978-0-12-809633-8.20472-X
7. Daniele, A., Serafini, L.: Knowledge enhanced neural networks. In: Nayak, A.C., Sharma, A. (eds.) PRICAI 2019: Trends in Artificial Intelligence - 16th Pacific Rim International Conference on Artificial Intelligence, Cuvu, Yanuca Island, Fiji, August 26-30, 2019, Proceedings, Part I. Lecture Notes in Computer Science, vol. 11670, pp. 542–554. Springer (2019). https://doi.org/10.1007/978-3-030-29908-8_43
8. Evans, R., Grefenstette, E.: Learning explanatory rules from noisy data. J. Artif. Intell. Res. 61, 1–64 (2018). https://doi.org/10.1613/JAIR.5714
9. Fischer, M., Balunovic, M., Drachsler-Cohen, D., Gehr, T., Zhang, C., Vechev, M.T.: DL2: Training and querying neural networks with logic. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 1931–1941. PMLR (2019)
10. Francescomarino, C.D., Ghidini, C.: Predictive process monitoring. In: van der Aalst, W.M.P., Carmona, J. (eds.)
Process Mining Handbook, Lecture Notes in Business Information Processing, vol. 448, pp. 320–346. Springer (2022). https://doi.org/10.1007/978-3-031-08848-3_10
11. Francescomarino, C.D., Ghidini, C., Maggi, F.M., Petrucci, G., Yeshchenko, A.: An eye into the future: Leveraging a-priori knowledge in predictive business process monitoring. In: Carmona, J., Engels, G., Kumar, A. (eds.) Business Process Management - 15th International Conference, BPM 2017, Barcelona, Spain, September 10-15, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10445, pp. 252–268. Springer (2017). https://doi.org/10.1007/978-3-319-65000-5_15
12. Giacomo, G.D., Vardi, M.Y.: Linear temporal logic and linear dynamic logic on finite traces. In: Rossi, F. (ed.) IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013. pp. 854–860. IJCAI/AAAI (2013)
13. Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declarative process models. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM. pp. 192–199. IEEE (2011). https://doi.org/10.1109/CIDM.2011.5949297
14. Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., Raedt, L.D.: Neural probabilistic logic programming in DeepProbLog. Artif. Intell. 298, 103504 (2021). https://doi.org/10.1016/J.ARTINT.2021.103504
15. Mezini, A., Umili, E., Donadello, I., Maggi, F.M., Mancanelli, M., Patrizi, F.: Neuro-symbolic predictive process monitoring. CoRR abs/2509.00834 (2025). https://doi.org/10.48550/ARXIV.2509.00834
16. Oyamada, R.S., Tavares, G.M., Junior, S.B., Ceravolo, P.: Enhancing predictive process monitoring with time-related feature engineering. In: International Conference on Advanced Information Systems Engineering. pp. 71–86. Springer (2024)
17.
Peeperkorn, J., De Vos, S.: Achieving group fairness through independence in predictive process monitoring. In: International Conference on Advanced Information Systems Engineering. pp. 185–203. Springer (2025)
18. Taghiabadi, E.R., Fahland, D., van Dongen, B.F., van der Aalst, W.M.P.: Diagnostic information for compliance checking of temporal compliance requirements. In: Advanced Information Systems Engineering - 25th International Conference, CAiSE 2013, Valencia, Spain, June 17-21, 2013. Proceedings. Lecture Notes in Computer Science, vol. 7908, pp. 304–320. Springer (2013). https://doi.org/10.1007/978-3-642-38709-8_20
19. Tax, N., Verenich, I., Rosa, M.L., Dumas, M.: Predictive business process monitoring with LSTM neural networks. In: Dubois, E., Pohl, K. (eds.) Advanced Information Systems Engineering - 29th International Conference, CAiSE 2017, Essen, Germany, June 12-16, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10253, pp. 477–492. Springer (2017). https://doi.org/10.1007/978-3-319-59536-8_30
20. Vazifehdoostirani, M., Genga, L., Dijkman, R.M.: Encoding high-level control-flow construct information for process outcome prediction. In: Burattin, A., Polyvyanyy, A., Weber, B. (eds.) 4th International Conference on Process Mining, ICPM 2022, Bolzano, Italy, October 23-28, 2022. pp. 48–55. IEEE (2022). https://doi.org/10.1109/ICPM57379.2022.9980737
21. Xu, J., Zhang, Z., Friedman, T., Liang, Y., den Broeck, G.V.: A semantic loss function for deep learning with symbolic knowledge. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 5498–5507. PMLR (2018)
22. Zhou, W., Polyvyanyy, A., Bailey, J.: Process model forecasting using deep temporal learning.
In: International Conference on Advanced Information Systems Engineering. pp. 294–312. Springer (2025)
