Security Metrics in Industrial Control Systems
Risk is the best known and perhaps the best studied example within a much broader class of cyber security metrics. However, risk is not the only possible cyber security metric. Other metrics, such as resilience, exist and could be very valuable to defenders of ICSs. Metrics are often defined as measurable properties of a system that quantify the degree to which the system's objectives are achieved. They can provide cyber defenders of an ICS with critical insights into the system, and they are generally obtained by analyzing relevant attributes of that system. In terms of cyber security metrics, ICSs tend to have unique features: in many cases, these systems are older technologies that were designed for functionality rather than security. They are also extremely diverse systems with different requirements and objectives. Therefore, metrics for ICSs must be tailored to a diverse group of systems that have many features and perform many different functions. In this chapter, we first outline the general theory of performance metrics and highlight examples from the cyber security domain and from ICSs in particular. We then focus on a class of metrics that differs from the one considered in earlier chapters: instead of risk, here we consider metrics of resilience. Resilience is defined by the National Academy of Sciences (2012) as the ability to prepare and plan for, absorb, recover from, or more successfully adapt to actual or potential adverse events. This chapter presents two approaches for generating metrics based on the concept of resilience: a matrix-based approach and a network-based approach. Finally, we discuss the benefits and drawbacks of the different methods and present a process and tips intended to aid in devising effective metrics.
Research Summary
This chapter addresses the design and application of security metrics specifically for Industrial Control Systems (ICS), emphasizing the need to move beyond traditional risk‑centric measures toward resilience‑based metrics. The authors begin by outlining the general concept of performance metrics in cybersecurity, noting that risk—defined as the product of threat, vulnerability, asset value, and potential loss—has been the dominant focus of research and practice. While risk metrics are valuable for identifying and prioritizing threats, they do not capture how well an ICS can continue operating after an adverse event, a shortcoming that becomes critical given the unique characteristics of many control environments.
ICS assets often consist of legacy hardware and software that were originally engineered for functionality and real‑time performance rather than security. They are highly heterogeneous, employ a variety of proprietary protocols, and frequently lack built‑in security controls. Consequently, a one‑size‑fits‑all risk metric is insufficient; defenders need a metric that reflects the system’s ability to prepare for, absorb, recover from, and adapt to disruptions. The authors adopt the National Academy of Sciences (2012) definition of resilience, which partitions the concept into four sequential capabilities: preparation, absorption, recovery, and adaptation. Each capability can be expressed through measurable attributes such as detection latency, mean time to repair, redundancy levels, and adaptive reconfiguration speed.
Two concrete methodologies for constructing resilience metrics are presented.
Matrix‑Based Approach
- Structure: Rows represent representative threat scenarios (e.g., malware infection, physical sabotage, supply‑chain compromise). Columns correspond to the four resilience functions (prepare, absorb, recover, adapt).
- Scoring: Each cell receives a quantitative score derived from expert elicitation, historical incident data, or structured questionnaires. The score reflects the effectiveness of a given function against a specific scenario.
- Usage: The completed matrix offers a balanced view of where the system is strong or weak, supports gap analysis, and is easily communicated to management and regulators. Its main advantage is simplicity and visual clarity, but it relies heavily on subjective judgments and may overlook complex interdependencies.
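The matrix structure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the 0 to 5 scoring scale, the individual scores, the threshold, and the helper names `function_averages` and `gaps` are assumptions made for the example and do not come from the chapter.

```python
# Columns of the matrix: the four resilience functions.
FUNCTIONS = ["prepare", "absorb", "recover", "adapt"]

# Rows of the matrix: threat scenarios with hypothetical scores
# (assumed scale: 0 = no capability, 5 = strong capability).
scores = {
    "malware infection":       [4, 3, 2, 1],
    "physical sabotage":       [2, 2, 3, 1],
    "supply-chain compromise": [1, 2, 2, 2],
}

def function_averages(matrix):
    """Average score per resilience function (column mean)."""
    n = len(matrix)
    return {
        name: sum(row[i] for row in matrix.values()) / n
        for i, name in enumerate(FUNCTIONS)
    }

def gaps(matrix, threshold):
    """List (scenario, function, score) cells scoring below the threshold."""
    return [
        (scenario, FUNCTIONS[i], score)
        for scenario, row in matrix.items()
        for i, score in enumerate(row)
        if score < threshold
    ]

averages = function_averages(scores)
weak_cells = gaps(scores, threshold=2)

for scenario, function, score in weak_cells:
    print(f"gap: {scenario} / {function} scored {score}")
```

With these made-up numbers, the column means point to "adapt" as the weakest function overall, and the gap list names the specific scenario and function pairs to review first. The scores remain subjective inputs, so the output is only as reliable as the elicitation behind it.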
Network‑Based Approach
- Modeling: System components (PLCs, RTUs, HMIs, sensors, communication links) are modeled as nodes; their interactions form edges.
- Attributes: Nodes are assigned weights for criticality, vulnerability score, and recovery time. Edges receive weights for link strength, latency, and protection mechanisms.
- Metric Calculation: Graph‑theoretic measures—such as network efficiency, average shortest‑path recovery time, and robustness indices—are computed to produce an overall resilience score. This method captures cascading effects, inter‑component dependencies, and the propagation of failures across the control network. Its strengths lie in analytical depth and the ability to simulate “what‑if” attack scenarios; however, it demands extensive data collection, sophisticated modeling tools, and careful calibration of weights.
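As a rough illustration of the network-based calculation, the sketch below computes one such graph measure, global network efficiency (the mean of inverse shortest-path lengths over node pairs), and the drop in efficiency when a single component is removed. The topology, the component names, and the choice to ignore node and edge weights are simplifying assumptions for the example. Normalizing by the node pairs of the intact network, so that scores before and after a removal are comparable, is also an assumption rather than the chapter's stated method.

```python
from collections import deque

# Hypothetical control network given as undirected links.
EDGES = [
    ("HMI", "PLC1"), ("HMI", "PLC2"), ("PLC1", "RTU1"),
    ("PLC2", "RTU1"), ("PLC2", "SENSOR1"), ("PLC2", "SENSOR2"),
]

def nodes_of(edges):
    """All component names that appear in the link list."""
    return sorted({node for edge in edges for node in edge})

def build_graph(edges, removed=()):
    """Adjacency sets, skipping every link that touches a removed node."""
    graph = {}
    for a, b in edges:
        if a in removed or b in removed:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def hop_distances(graph, source):
    """Breadth-first search: hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def global_efficiency(edges, removed=()):
    """Mean of 1/d over ordered node pairs of the intact network.

    Pairs that are unreachable or involve a removed node contribute 0.
    """
    n = len(nodes_of(edges))
    graph = build_graph(edges, removed)
    total = 0.0
    for source in graph:
        for target, d in hop_distances(graph, source).items():
            if target != source:
                total += 1.0 / d
    return total / (n * (n - 1))

baseline = global_efficiency(EDGES)
# Efficiency lost when each component is taken out, one at a time.
impact = {
    node: baseline - global_efficiency(EDGES, removed={node})
    for node in nodes_of(EDGES)
}
most_critical = max(impact, key=impact.get)
print(f"baseline efficiency: {baseline:.3f}, most critical node: {most_critical}")
```

Removing one node at a time is the simplest kind of "what-if" scenario. A fuller model would replace hop counts with weighted path costs, such as latency or recovery time, and would simulate multi-node or cascading failures.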
The chapter proceeds to compare the two approaches. The matrix method excels in ease of implementation, stakeholder alignment, and rapid gap identification, making it suitable for organizations with limited resources or those seeking a high‑level dashboard. The network method, by contrast, provides a granular, systemic view that is essential for large‑scale, highly interconnected plants where indirect effects can be catastrophic. The authors caution that each method carries trade‑offs: subjectivity versus data intensity, static snapshot versus dynamic simulation, and ease of reporting versus analytical rigor.
To guide practitioners, a five‑step process for developing effective security metrics is proposed:
- Define Objectives – Align security metrics with business goals, regulatory requirements, and operational priorities.
- Collect Data – Gather logs, configuration inventories, vulnerability scan results, incident reports, and performance statistics.
- Select Modeling Technique – Choose matrix, network, or a hybrid based on system complexity, data availability, and stakeholder needs.
- Validate – Conduct pilot studies, scenario‑based simulations, and expert reviews to ensure the metric reflects real‑world behavior.
- Iterate and Refine – Incorporate feedback from operations, update data sources, and adjust weighting schemes on a regular cadence.
Practical tips include: (a) integrating metrics into visual dashboards that link directly to decision‑making processes; (b) running regular tabletop exercises and live‑fire drills to test whether the metrics predict actual recovery performance; (c) assigning clear ownership for metric maintenance and establishing periodic review meetings; (d) mapping metrics to existing standards such as IEC 62443, NIST CSF, or ISO 27001 to satisfy compliance while driving improvement; and (e) leveraging automation for data ingestion, metric calculation, and report generation to reduce manual effort and increase timeliness.
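Tip (b) can be made concrete with a small check of predicted against observed recovery performance. The sketch below assumes a metric that yields a predicted recovery time per drill. The drill records are invented for illustration, and the use of Pearson correlation and mean absolute error is one possible choice, not a method prescribed by the chapter.

```python
from math import sqrt

# Hypothetical drill records:
# (recovery hours predicted by the metric, recovery hours observed in the drill).
drills = [(2.0, 2.5), (4.0, 3.5), (6.0, 7.0), (8.0, 7.5), (10.0, 11.0)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def mean_abs_error(xs, ys):
    """Average absolute gap between prediction and observation."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

predicted = [p for p, _ in drills]
observed = [o for _, o in drills]

r = pearson(predicted, observed)
mae = mean_abs_error(predicted, observed)
print(f"correlation: {r:.2f}, mean absolute error: {mae:.2f} hours")
```

A high correlation with a small error suggests the metric tracks real recovery behavior. A weak correlation is a signal to revisit the weights or data sources during the next refinement cycle.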
In conclusion, while risk‑based metrics remain valuable for threat identification and prioritization, they fall short of addressing the continuity challenges inherent in modern industrial environments. Resilience‑oriented metrics, built through either matrix or network methodologies, provide a more comprehensive picture of an ICS’s ability to withstand and recover from adverse events. By quantifying preparation, absorption, recovery, and adaptation, these metrics enable operators to make informed investment decisions, prioritize hardening activities, and ultimately sustain safe and reliable plant operations. The authors advocate for a blended approach—using the matrix for high‑level governance and the network model for deep technical analysis—to achieve a robust, actionable security measurement framework tailored to the diverse and evolving landscape of industrial control systems.