New categories of Safe Faults in a processor-based Embedded System

The identification of safe faults (i.e., faults which are guaranteed not to produce any failure) in an electronic system is a crucial step in analyzing its dependability and developing its test plan. Unfortunately, safe fault identification is poorly supported by available EDA tools, and thus remains an open problem. The growing complexity of modern systems used in safety-critical applications makes safe faults even harder to identify. In this article, we identify some classes of safe faults within an embedded system based on a pipelined processor. A new method for automating safe fault identification is also proposed. The safe faults belonging to each class are identified using Automatic Test Pattern Generation (ATPG) techniques. The proposed methodology is applied to a sample system built around the OpenRisc1200 open source processor.

💡 Research Summary

The paper addresses the problem of identifying “safe faults” – faults that are guaranteed not to cause a system failure – in processor‑based embedded systems, a task that is poorly supported by current EDA tools. The authors focus on a pipelined processor architecture and propose a new classification scheme for safe faults together with an automated identification method based on Automatic Test Pattern Generation (ATPG).

First, the authors analyze the internal structure of a typical five‑stage pipeline (instruction fetch, decode, execute, memory, write‑back) as implemented in the OpenRisc1200 open‑source processor. They examine the register file, ALU, control logic, pipeline registers, interrupt handling, and peripheral interfaces, and compare possible fault types with traditional fault models such as stuck‑at, transition, and bridging faults. From this analysis they derive four distinct categories of safe faults:

1. Structurally Ignored Faults – faults that occur on signals or logic that are never exercised in the normal pipeline flow (e.g., control lines used only during pipeline flush).
2. Conditionally Inert Faults – faults that become active only under specific software or hardware conditions that are unlikely or impossible in the target application (e.g., a fault in the decode logic for an instruction that the system never executes).
3. Temporally Ignored Faults – faults that appear only for a single clock cycle or during a transient pipeline stage but are masked by pipeline bubbles, re‑execution, or other timing‑related mechanisms.
4. Redundant Path Faults – faults that affect one of several redundant data paths; the system continues to operate correctly because an alternative path provides the same functionality.
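
To make the idea of a safe fault concrete, the sketch below exhaustively searches a toy three-input circuit for an input pattern that exposes a stuck-at fault. It is only an illustration: the circuit, the net names `t1`, `t2`, `y`, and the helper `is_safe` are hypothetical and do not come from the paper, and a real ATPG tool replaces the brute-force loop with a far more efficient search.

```python
from itertools import product


def circuit(a, b, c, fault=None):
    """Toy netlist: y = (a AND b) OR (a AND b AND c).

    `fault` is an optional (net_name, stuck_value) pair. The net names are
    hypothetical and exist only for this illustration.
    """
    def net(name, value):
        # Force the net to its stuck value when the fault sits on it.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    t1 = net("t1", a & b)        # main path
    t2 = net("t2", a & b & c)    # redundant path, logically covered by t1
    return net("y", t1 | t2)


def is_safe(fault, allowed=lambda a, b, c: True):
    """Return True if no allowed input pattern makes the fault visible at y."""
    for a, b, c in product((0, 1), repeat=3):
        if allowed(a, b, c) and circuit(a, b, c) != circuit(a, b, c, fault):
            return False  # a detecting pattern exists, so the fault is not safe
    return True


# Stuck-at-0 on the redundant path never changes y: a redundancy-style safe fault.
print(is_safe(("t2", 0)))
# Stuck-at-0 on the main path is detectable in general ...
print(is_safe(("t1", 0)))
# ... but becomes safe if the application is assumed to always drive c = 1,
# which is analogous to a conditionally inert fault.
print(is_safe(("t1", 0), allowed=lambda a, b, c: c == 1))
```

The `allowed` predicate plays the role of a functional constraint: the same fault can be detectable in the unconstrained circuit and safe once the operating conditions of the target application are taken into account.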

To automatically detect faults belonging to each category, the authors extend a conventional ATPG flow with a “Safe‑Mode” analysis. The process consists of: (a) fault injection, where the RTL model is automatically modified to insert faults that reflect the four categories; (b) ATPG execution, which generates test vectors aimed at detecting the injected faults; (c) safety verification, where a fault is classified as safe if the generated test vectors fail to detect it while the system’s functional behavior remains correct; and (d) conditional validation, where simulation is used to confirm that condition‑dependent faults never meet their activation criteria in realistic workloads.
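
The decision logic of steps (c) and (d) can be sketched as a small classifier. This is a minimal sketch under stated assumptions, not the authors' implementation: the record type `FaultResult`, its field names, and the "pending validation" outcome are hypothetical and only mirror the wording of the summary above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaultResult:
    """Hypothetical per-fault record collected from the flow."""
    name: str
    atpg_found_test: bool                        # step (b): did ATPG produce a detecting pattern?
    behavior_correct: bool                       # step (c): functional behavior unchanged?
    condition_dependent: bool = False            # does the fault need a specific condition to activate?
    activation_observed: Optional[bool] = None   # step (d): condition seen in workload simulation?


def classify(r: FaultResult) -> str:
    # Step (c): a detected fault, or one that alters behavior, cannot be safe.
    if r.atpg_found_test or not r.behavior_correct:
        return "not safe"
    # Step (d): condition-dependent faults also need workload evidence.
    if r.condition_dependent:
        if r.activation_observed is None:
            return "pending validation"
        return "not safe" if r.activation_observed else "safe"
    return "safe"


results = [
    FaultResult("flush_ctrl/sa0", atpg_found_test=False, behavior_correct=True),
    FaultResult("alu_out[3]/sa1", atpg_found_test=True, behavior_correct=False),
    FaultResult("dec_unused_op/sa0", False, True,
                condition_dependent=True, activation_observed=False),
]
for r in results:
    print(r.name, "->", classify(r))
```

Keeping an explicit "pending validation" state is a design choice of this sketch: it prevents a condition-dependent fault from being counted as safe before the simulation of step (d) has actually been run.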

The methodology is applied to a prototype system built around the OpenRisc1200 core, including its five‑stage pipeline, a 32‑bit register file, and basic peripherals such as UART and GPIO. Approximately 10,000 fault instances are injected across the design. The ATPG‑based analysis identifies about 18 % of the injected faults as safe, distributed as follows: roughly 45 % are structurally ignored, 30 % are redundant‑path, 15 % are conditionally inert, and 10 % are temporally ignored. The automated flow completes the entire evaluation in a few hours on a multi‑core workstation, representing a speed‑up of more than 30× compared with manual inspection.
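
The reported percentages can be turned into rough absolute counts. The short calculation below assumes exactly 10,000 injected faults and exactly 18 % safe faults; since the summary gives both figures only approximately, the resulting counts are estimates rather than reported results.

```python
total_faults = 10_000   # "approximately 10,000" in the summary
safe_percent = 18       # "about 18 %" of injected faults

# Share of each category among the safe faults, in percent.
shares = {
    "structurally ignored": 45,
    "redundant path": 30,
    "conditionally inert": 15,
    "temporally ignored": 10,
}

safe_total = total_faults * safe_percent // 100
counts = {name: safe_total * pct // 100 for name, pct in shares.items()}

print("safe faults:", safe_total)   # 1800
for name, n in counts.items():
    print(f"{name}: {n}")           # 810, 540, 270, 180
```

Under these assumptions roughly 1,800 of the injected faults are safe, which leaves about 8,200 faults that the test plan still has to cover.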

The authors argue that early identification of safe faults reduces test effort, shrinks verification budgets, and provides a clearer safety case for safety‑critical applications. By knowing which faults can be safely ignored, test engineers can focus resources on detecting only those faults that truly threaten system integrity.

Finally, the paper outlines future work: extending the classification and ATPG‑based detection to multi‑core and heterogeneous processor architectures; integrating the approach with real‑time operating system (RTOS) behavior to assess software‑level safety implications; and coupling safe‑fault identification with fault‑recovery mechanisms (e.g., checkpoint/restart, dynamic reconfiguration) to build a comprehensive safety‑management framework.

In summary, this work introduces a novel taxonomy for safe faults in a pipelined embedded processor, demonstrates how ATPG can be repurposed to automatically isolate such faults, and validates the approach on a realistic open‑source processor platform, thereby offering a practical path toward more efficient and reliable safety‑critical system design.

📜 Original Paper Content