Modelling common cause failures of large digital I&C systems with coloured Petri nets


This study addresses the representation of Common Cause Failures (CCF) in large digital systems. The system under study is representative of a nuclear plant control system. CCF are modelled with the generalized Atwood model, which can represent independent failures, non-lethal CCF that affect only some system elements, and CCF that are lethal to all elements. The Atwood model was modified to “direct” non-lethal CCF onto certain parts of the system and to take into account the different possible origins of CCF. Maintenance and repairs are included, so the model is dynamic. The main evaluation results are probabilistic; the indicator considered is the probability of failure on demand (PFD). The PFD estimator that takes all failures into account is compared with the estimator that takes only detected failures into account.

💡 Research Summary

The paper presents a comprehensive methodology for modeling and evaluating Common Cause Failures (CCF) in large digital Instrumentation & Control (I&C) systems, using a nuclear power plant control system as a representative case study. The authors adopt the generalized Atwood model, which distinguishes three failure modes: independent failures, non‑lethal (partial) CCF that affect only a subset of components, and lethal (global) CCF that incapacitate the entire system. Recognizing that in real‑world digital I&C architectures non‑lethal CCF often target specific functional blocks (e.g., a particular sensor channel or a logic module), the authors extend the Atwood formulation to “direct” non‑lethal Digital CCF (DCC) onto predefined parts of the system. This extension allows the model to capture the spatial concentration of partial failures, a capability lacking in traditional Atwood implementations.
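The three failure modes and the "directed" non-lethal shocks can be illustrated with a minimal sketch. It assumes a binomial-failure-rate reading of the Atwood model, in which events compete as exponential processes; the function name `draw_failure_event`, the parameters `lam`, `mu`, `omega` and `p`, and the example subsystems are hypothetical and are not taken from the paper.

```python
import random

def draw_failure_event(groups, lam, mu, omega, p, rng):
    """Draw the next failure event of a directed Atwood-style shock model.

    groups: dict mapping a subsystem name to its list of component names
            (assumption: each component belongs to exactly one subsystem)
    lam:    independent failure rate per component
    mu:     dict mapping a subsystem name to its non-lethal shock rate
    omega:  lethal shock rate
    p:      probability that a component hit by a non-lethal shock fails
    """
    components = [c for members in groups.values() for c in members]
    # Competing exponential events: the next event is of a given kind with
    # probability proportional to that kind's total rate.
    kinds = ["independent"] + [("non_lethal", g) for g in groups] + ["lethal"]
    weights = [lam * len(components)] + [mu[g] for g in groups] + [omega]
    kind = rng.choices(kinds, weights=weights, k=1)[0]
    if kind == "independent":
        return "independent", {rng.choice(components)}
    if kind == "lethal":
        return "lethal", set(components)
    # Directed non-lethal shock: only the targeted subsystem is exposed,
    # and each of its components fails independently with probability p.
    _, group = kind
    failed = {c for c in groups[group] if rng.random() < p}
    return "non_lethal", failed

# Hypothetical two-subsystem layout and illustrative rates.
groups = {"sensors": ["S1", "S2", "S3"], "logic": ["L1", "L2"]}
event = draw_failure_event(
    groups, lam=1e-4, mu={"sensors": 1e-5, "logic": 0.0},
    omega=1e-6, p=0.5, rng=random.Random(0),
)
```

Setting a subsystem's entry in `mu` to zero switches off non-lethal shocks for that part of the system, which is one simple way to express the directing of partial failures.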

To implement the extended model, the authors employ Coloured Petri Nets (CPN), a formalism that combines the graphical clarity of Petri nets with the expressive power of token colours (data attributes). Each I&C component is represented by a place that can hold tokens indicating its current state: normal, failed, or under repair. Independent failures are modeled as stochastic transitions with a rate λ for each component. Non‑lethal DCC events are introduced as transitions that fire with probability β and generate tokens of a specific colour; these tokens are routed only to the places corresponding to the targeted subsystem, thereby limiting the impact of the failure. Lethal DCC events are modeled as transitions with probability γ that simultaneously affect all component places, driving the whole system into a failed state.
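The colour-based routing described above can be imitated in plain Python. This is only a sketch of the idea of a guard on token colour, not a CPN tool and not the authors' net; the class `ComponentToken`, the function `fire_ccf` and the subsystem names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class ComponentToken:
    name: str
    subsystem: str      # the token's "colour", used for routing
    state: str = "ok"   # "ok", "failed" or "under_repair"

def fire_ccf(marking, target: Optional[str]):
    """Fire one common-cause transition on a marking (a list of tokens).

    target names the subsystem hit by a non-lethal CCF; None stands for a
    lethal CCF, whose guard accepts every token.
    """
    def guard(tok):
        return tok.state == "ok" and (target is None or tok.subsystem == target)
    # Tokens that pass the guard change state; all others are left untouched.
    return [replace(tok, state="failed") if guard(tok) else tok for tok in marking]

marking = [
    ComponentToken("S1", "sensors"),
    ComponentToken("S2", "sensors"),
    ComponentToken("L1", "logic"),
]
after_partial = fire_ccf(marking, "sensors")   # only sensor tokens fail
after_lethal = fire_ccf(marking, None)         # every token fails
```

Keeping the subsystem in the token rather than in the net structure means one transition definition serves every targeted subsystem, which is the usual motivation for coloured tokens.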

Maintenance and repair processes are explicitly modeled as additional transitions and timed delays. Periodic inspections, fault detection, and corrective actions each have their own stochastic timing distributions (exponential, normal, etc.), allowing the net to evolve dynamically over long operational periods. This dynamic treatment goes beyond static reliability models, which cannot easily represent time‑varying maintenance policies or the interaction between failure detection and repair.

The primary performance metric evaluated is the Probability of Failure on Demand (PFD), a standard indicator for safety‑instrumented systems. Two PFD estimators are derived from the CPN simulations: (1) a “total‑failure” PFD that includes all failure modes (independent, non‑lethal DCC, lethal DCC) and (2) a “detected‑failure” PFD that accounts only for failures that are identified by the system’s diagnostic or external monitoring functions. Simulation results reveal that when non‑lethal DCC is confined to specific subsystems, the total‑failure PFD remains relatively low, reflecting the limited scope of partial failures. However, the detected‑failure PFD can differ substantially from it, highlighting the critical role of fault‑detection coverage and maintenance response in overall system reliability.
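The difference between the two estimators can be sketched for a single periodically tested component. The sketch assumes that PFD is estimated as time-averaged unavailability, that a failure stays hidden until the next periodic test, and that repair takes a fixed time; the function `simulate_pfd` and all numerical values are hypothetical and do not come from the paper.

```python
import math
import random

def simulate_pfd(lam, test_interval, repair_time, horizon, rng):
    """Return (pfd_total, pfd_detected) for one periodically tested component."""
    t = 0.0
    down_total = 0.0      # time spent failed, detected or not
    down_detected = 0.0   # time spent failed after the failure was revealed
    while t < horizon:
        fail_at = t + rng.expovariate(lam)
        if fail_at >= horizon:
            break
        # The failure stays hidden until the next periodic test.
        detect_at = (math.floor(fail_at / test_interval) + 1) * test_interval
        restored_at = detect_at + repair_time
        end = min(restored_at, horizon)
        down_total += end - fail_at
        down_detected += end - min(detect_at, horizon)
        t = restored_at
    return down_total / horizon, down_detected / horizon

pfd_total, pfd_detected = simulate_pfd(
    lam=1e-3, test_interval=100.0, repair_time=10.0,
    horizon=1e6, rng=random.Random(42),
)
```

In this toy setting the detected-only estimator is lower by construction, because the hidden downtime between a failure and the next test is excluded from it.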

A sensitivity analysis on the Atwood parameters β (non‑lethal CCF proportion) and γ (lethal CCF proportion) further clarifies their impact on PFD. Increasing β raises the frequency of partial failures, but because these failures are localized, the overall system availability degrades only modestly. In contrast, a rise in γ leads to a sharp increase in PFD, as a single lethal DCC event disables the entire control system. These findings provide quantitative guidance for design decisions such as redundancy allocation, functional segregation, and independent power supplies, which aim to reduce γ or mitigate its consequences.
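The contrast between the two parameters can be reproduced qualitatively with a toy static formula. This is not the paper's dynamic CPN model: it assumes `n` redundant components that must all fail for the system to fail, and it reads β and γ as per-demand probabilities of a non-lethal and a lethal shock condition. The function `toy_pfd` and the values of `q`, `p` and `n` are hypothetical.

```python
def toy_pfd(q, beta, gamma, p, n):
    """Toy mixture PFD for n redundant components (all must fail).

    q:     probability that one component is independently unavailable
    beta:  probability that a non-lethal shock condition is present
    gamma: probability that a lethal shock condition is present
    p:     probability that a non-lethal shock fails a given component
    """
    indep_all = q ** n                 # all n fail independently
    shocked = q + (1.0 - q) * p        # one component fails, shock present
    shock_all = shocked ** n           # all n fail under a non-lethal shock
    return gamma + (1.0 - gamma) * (beta * shock_all + (1.0 - beta) * indep_all)

# One-at-a-time sweeps around an illustrative baseline.
base = dict(q=0.01, p=0.1, n=3)
sweep_beta = [(b, toy_pfd(beta=b, gamma=0.0, **base)) for b in (0.0, 0.05, 0.1)]
sweep_gamma = [(g, toy_pfd(beta=0.0, gamma=g, **base)) for g in (0.0, 0.05, 0.1)]
```

Under these assumptions γ enters the result linearly, while β is multiplied by the probability that a partial shock defeats every redundant component, so the same increase in γ moves the PFD far more than an equal increase in β.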

In summary, the study demonstrates that a CPN‑based dynamic model, enriched with a directed non‑lethal DCC extension of the Atwood model, can capture the interplay of independent failures, partial common‑cause events, lethal common‑cause events, and maintenance activities in large digital I&C systems. The approach yields actionable insights into how fault‑detection capabilities and maintenance strategies influence the PFD, thereby supporting more informed safety‑case arguments for nuclear power plants and other high‑integrity digital control environments. Future work is suggested to focus on empirical calibration of model parameters, integration with real‑time monitoring data, and extension to multi‑plant or networked control architectures.
