Engineering Safety in Machine Learning
Machine learning algorithms are increasingly influencing our decisions and interacting with us in all parts of our daily lives. Therefore, just like for power plants, highways, and myriad other engineered sociotechnical systems, we must consider the safety of systems involving machine learning. In this paper, we first discuss the definition of safety in terms of risk, epistemic uncertainty, and the harm incurred by unwanted outcomes. Then we examine dimensions, such as the choice of cost function and the appropriateness of minimizing the empirical average training cost, along which certain real-world applications may not be completely amenable to the foundational principle of modern statistical machine learning: empirical risk minimization. In particular, we note an emerging dichotomy of applications: ones in which safety is important and risk minimization is not the complete story (we name these Type A applications), and ones in which safety is not so critical and risk minimization is sufficient (we name these Type B applications). Finally, we discuss how four different strategies for achieving safety in engineering (inherently safe design, safety reserves, safe fail, and procedural safeguards) can be mapped to the machine learning context through interpretability and causality of predictive models, objectives beyond expected prediction accuracy, human involvement for labeling difficult or rare examples, and user experience design of software.
💡 Research Summary
The paper “Engineering Safety in Machine Learning” argues that, as machine‑learning (ML) algorithms become pervasive in everyday decision‑making, the discipline of safety engineering must be brought to bear on ML systems. Safety is formally defined as the minimization of both risk—the expected cost of harmful outcomes when the probability distribution of those outcomes is known—and epistemic uncertainty—the lack of knowledge about the true distribution, especially in rare or unexpected regimes. This dual definition distinguishes safety from the traditional statistical‑ML objective of minimizing average loss (empirical risk minimization, ERM).
The authors critique ERM and its extension, structural risk minimization (SRM), on three grounds. First, ERM assumes i.i.d. training data drawn from the true joint distribution of inputs and labels; in many safety‑critical contexts this assumption fails because of dataset shift, sampling bias, or insufficient coverage of low‑probability regions. Second, the loss functions used in ERM are abstract measures of prediction error and do not directly encode human‑centric costs such as loss of life, loss of liberty, or societal inequity. Third, ERM’s reliance on laws of large numbers guarantees convergence only asymptotically, whereas deployed systems operate on a finite, often small, set of test cases where the empirical risk can differ dramatically from the true risk. These gaps are especially problematic when the stakes are high.
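The third critique, that empirical risk on a small sample can differ sharply from true risk, can be made concrete with a toy sketch. The task, sample size, and noise rate below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D task: the true label is 1 when x > 0, with 10% label noise,
# so no classifier can achieve true risk below 0.1.
def true_risk(threshold, n=1_000_000):
    x = rng.normal(size=n)
    y = (x > 0).astype(int)
    y ^= rng.random(n) < 0.1  # flip 10% of labels
    return float(np.mean((x > threshold).astype(int) != y))

# ERM: choose the threshold with the lowest average 0-1 loss on a
# small training sample of 30 points.
x_train = rng.normal(size=30)
y_train = (x_train > 0).astype(int)
y_train ^= rng.random(30) < 0.1

def empirical_risk(threshold):
    return float(np.mean((x_train > threshold).astype(int) != y_train))

candidates = np.linspace(-2, 2, 201)
best = min(candidates, key=empirical_risk)

# On 30 points the minimized empirical risk typically sits well below the
# irreducible true risk of ~0.1 — the finite-sample gap described above.
print(empirical_risk(best), true_risk(best))
```

The gap between the two printed numbers is exactly what the laws-of-large-numbers guarantees cannot rule out at deployment scale.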
To clarify where safety considerations are essential, the paper introduces a dichotomy of applications. Type A applications are those where model predictions influence consequential decisions—medical diagnosis, loan approval, criminal sentencing—where harms are severe, data are scarce or biased, and epistemic uncertainty is high. Type B applications involve low‑consequence, high‑volume tasks—video compression, news ranking, speech transcription—where large datasets and extensive A/B testing reduce uncertainty, making risk minimization sufficient.
The core contribution is mapping four classic engineering safety strategies to the ML context:
- Inherently Safe Design – Exclude hazards by design. In ML this means preferring models that are interpretable and causally grounded, and deliberately omitting non‑causal or spurious features. Regularization or constraints beyond SRM are required to enforce interpretability or causal structure, even at some cost to predictive accuracy, thereby reducing epistemic uncertainty.
- Safety Reserves – Build margins (safety factors) into the system. In ML this translates to robust optimization, distributional worst‑case (min‑max) training, or augmenting loss functions with safety multipliers that penalize high‑cost errors more heavily. Such reserves protect against unseen distribution shifts and rare events.
- Safe Fail – Design graceful degradation pathways. When a model’s confidence is low or an out‑of‑distribution input is detected, the system should defer to a human operator or switch to a conservative fallback. Human‑in‑the‑loop labeling of rare or ambiguous cases, active learning, and anomaly detection are concrete mechanisms.
- Procedural Safeguards – Implement operational policies, monitoring, and user‑experience controls. Continuous performance auditing, logging, real‑time risk dashboards, and UI designs that surface uncertainty to end‑users help catch failures early and guide corrective action.
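The safety-reserves idea of penalizing high-cost errors more heavily can be sketched as a cost-weighted loss. The factor of 5 and the choice of false negatives as the costly error type are assumptions for illustration, not values from the paper:

```python
import numpy as np

def weighted_loss(y_true, y_pred, safety_factor=5.0):
    """Average loss with a safety multiplier on the harmful error type.

    False negatives (e.g. a missed diagnosis) are assumed to be the
    high-cost errors; safety_factor plays the role of an engineering
    safety margin.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    false_neg = (y_true == 1) & (y_pred == 0)  # costly misses
    false_pos = (y_true == 0) & (y_pred == 1)  # ordinary false alarms
    return float(np.mean(safety_factor * false_neg + 1.0 * false_pos))

# One missed positive dominates the average even with one false alarm:
print(weighted_loss([1, 0, 0, 1], [0, 0, 1, 1]))  # (5 + 0 + 1 + 0) / 4 = 1.5
```

Training against such a loss trades average accuracy for protection against the errors society actually cares about, which is the point of a safety reserve.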
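The safe-fail strategy of deferring low-confidence cases to a human can likewise be sketched as a reject option. This assumes the model exposes a calibrated probability for the positive class; the 0.8 threshold is an illustrative choice, not taken from the paper:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    label: Optional[int]   # None means the case is deferred to a human
    confidence: float

def safe_predict(proba_positive: float, threshold: float = 0.8) -> Decision:
    # Confidence of the argmax prediction for a binary classifier.
    confidence = max(proba_positive, 1.0 - proba_positive)
    if confidence < threshold:
        return Decision(label=None, confidence=confidence)  # safe fail: defer
    return Decision(label=int(proba_positive >= 0.5), confidence=confidence)

print(safe_predict(0.95))  # confident enough to automate
print(safe_predict(0.55))  # routed to a human reviewer
```

The deferred cases are exactly the rare or ambiguous examples the paper suggests handing to human labelers, which also supplies fresh training data where epistemic uncertainty is highest.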
The paper’s contributions are: (i) a precise, decision‑theoretic definition of safety for ML; (ii) a theoretical analysis showing why ERM alone cannot guarantee safety in high‑risk domains; (iii) the Type A/Type B taxonomy that clarifies when additional safety engineering is required; and (iv) a concrete mapping of engineering safety strategies to ML methodology, providing a research agenda for safety‑aware model design, robust training, human‑in‑the‑loop systems, and operational safeguards. Future work is suggested on quantifying safety margins, automating causal feature selection, and building real‑time safety monitoring frameworks, all aimed at ensuring that increasingly powerful ML systems can be deployed responsibly in society.