Title: Enhancing Decision-Making in Windows PE Malware Classification During Dataset Shifts with Uncertainty Estimation
ArXiv ID: 2512.18495
Date: 2025-12-20
Authors: Rahul Yumlembam, Biju Issac, Seibu Mary Jacob
📝 Abstract
Artificial intelligence techniques have achieved strong performance in classifying Windows Portable Executable (PE) malware, but their reliability often degrades under dataset shifts, leading to misclassifications with severe security consequences. To address this, we enhance an existing LightGBM (LGBM) malware detector by integrating Neural Networks (NN), PriorNet, and Neural Network Ensembles, evaluated across three benchmark datasets: EMBER, BODMAS, and UCSB. The UCSB dataset, composed mainly of packed malware, introduces a substantial distributional shift relative to EMBER and BODMAS, making it a challenging testbed for robustness. We study uncertainty-aware decision strategies, including probability thresholding, PriorNet, ensemble-derived estimates, and Inductive Conformal Evaluation (ICE). Our main contribution is the use of ensemble-based uncertainty estimates as Non-Conformity Measures within ICE, combined with a novel threshold optimisation method. On the UCSB dataset, where the shift is most severe, the state-of-the-art probability-based ICE (SOTA) yields an incorrect acceptance rate (IA%) of 22.8%. In contrast, our method reduces this to 16%, a relative reduction of about 30%, while maintaining competitive correct acceptance rates (CA%). These results demonstrate that integrating ensemble-based uncertainty with conformal prediction provides a more reliable safeguard against misclassifications under extreme dataset shifts, particularly in the presence of packed malware, thereby offering practical benefits for real-world security operations.
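To make the core idea concrete, here is a minimal sketch of inductive conformal prediction with an ensemble-derived non-conformity measure. The NCM shown (one minus the mean probability the ensemble members assign to a candidate label), the fixed threshold, and all function names are illustrative assumptions for exposition, not the authors' exact formulation or their threshold-optimisation method.

```python
import numpy as np

def ensemble_ncm(member_probs, label):
    """Non-conformity of one sample w.r.t. a candidate label: high when the
    ensemble assigns it low probability. member_probs: (n_members, n_classes)."""
    return 1.0 - member_probs[:, label].mean()

def calibrate(cal_probs, cal_y):
    """Non-conformity scores of held-out calibration samples at their true labels.
    cal_probs: (n_cal, n_members, n_classes); cal_y: (n_cal,) integer labels."""
    return np.array([ensemble_ncm(p, y) for p, y in zip(cal_probs, cal_y)])

def icp_p_value(cal_scores, test_score):
    """Inductive conformal p-value: fraction of calibration scores at least as
    non-conforming as the test sample (with the usual +1 correction)."""
    return (np.sum(cal_scores >= test_score) + 1.0) / (len(cal_scores) + 1.0)

def decide(cal_scores, test_probs, threshold=0.1):
    """Accept the ensemble's prediction only if its p-value clears the threshold;
    otherwise reject (abstain) and defer the sample to an analyst."""
    pred = int(test_probs.mean(axis=0).argmax())
    p_val = icp_p_value(cal_scores, ensemble_ncm(test_probs, pred))
    return pred if p_val >= threshold else None  # None = rejected
```

Raising the p-value threshold rejects more predictions, trading correct acceptances (CA%) against incorrect acceptances (IA%); choosing that trade-off well is what the paper's threshold optimisation addresses.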
💡 Deep Analysis
📄 Full Content
Enhancing Decision-Making in Windows PE Malware Classification During Dataset Shifts with Uncertainty Estimation
Rahul Yumlembam, Biju Issac, and Seibu Mary Jacob
Rahul Yumlembam and Biju Issac are with the Department of Computer and Information Sciences, Northumbria University, Newcastle, UK (email: r.yumlembam@northumbria.ac.uk, bijuissac@northumbria.ac.uk). Corresponding author: Biju Issac.
Seibu Mary Jacob is with the School of Computing, Engineering & Digital Technologies, Teesside University, Middlesbrough, UK (email: s.jacob@tees.ac.uk).
Index Terms—Conformal Prediction, Windows PE Malware, Machine Learning, Deep Learning, Uncertainty Estimation
I. INTRODUCTION
Artificial Intelligence (AI) and machine learning are pivotal in enhancing the detection, prevention, and response mechanisms against Windows Portable Executable (PE) malware in cybersecurity. The Portable Executable (PE) format serves as the standard file format for executable programs within Microsoft’s 32-bit and 64-bit Windows operating systems [1]. PE files encompass various formats, including .exe files, dynamic link libraries (.dlls), BAT/Batch files (.bat), control panel applications (.cpl), kernel modules (.srv), device drivers (.sys), and numerous others, with .exe files being the most prevalent among them. Recent studies illustrate that machine learning and deep learning models, characterized by various representations of PE files, have achieved high accuracy in
malware detection. However, high accuracy does not always translate to reliability, especially in new or ambiguous cases.
Typically, classifiers predict the class of an instance based on the highest probability, neglecting the confidence or uncertainty of this prediction. For instance, if a model predicts a PE file as benign with a 60% probability and as malware with a 40% probability, it classifies the file as benign despite the narrow margin, indicating low confidence in the prediction.
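A minimal sketch of the probability-thresholding baseline described above (the cut-off value and function name are illustrative): predictions whose top-class probability falls below the threshold are abstained on rather than acted upon.

```python
import numpy as np

def threshold_decision(probs, tau=0.8):
    """Return the predicted class only if its probability reaches tau;
    otherwise abstain (None) so the sample can be escalated to an analyst.
    probs: class probabilities for one sample, e.g. [P(benign), P(malware)]."""
    pred = int(np.argmax(probs))
    return pred if probs[pred] >= tau else None

print(threshold_decision(np.array([0.60, 0.40])))  # None: 60% is below the cut-off
print(threshold_decision(np.array([0.97, 0.03])))  # 0: confident prediction accepted
```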
Such borderline predictions pose risks in critical applications like cybersecurity, prompting the need for strategies to assess each prediction’s certainty. For example, a malicious PE file disguised as a Windows update utility, while actually embedding a trojan downloader, may be classified as benign with low confidence. Accepting this misclassification allows the file to bypass security layers and trigger further payload downloads, leading to severe security breaches in enterprise networks. There are two broad categories that induce uncertainty in the classifier’s prediction: (1) Data uncertainty (Aleatoric Uncertainty), caused by measurement errors or inherent system randomness, and (2) Model uncertainty (Epistemic Uncertainty), resulting from inadequate knowledge or model limitations [25]. Addressing model uncertainty is feasible through improvements in architecture, learning processes, or training data, while data uncertainty is irreducible [2].
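These two quantities can be estimated from an ensemble via the standard entropy decomposition, in which total predictive entropy splits into expected per-member entropy (aleatoric) plus the mutual information between prediction and model (epistemic). The sketch below assumes per-member softmax outputs and is illustrative rather than the paper's exact estimator.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy along the class axis; eps guards against log(0)."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def decompose_uncertainty(member_probs):
    """member_probs: (n_members, n_classes) softmax outputs for one sample.
    Returns (total, aleatoric, epistemic) under H[mean p] = E[H[p]] + MI,
    where the mutual information term captures ensemble disagreement."""
    mean_p = member_probs.mean(axis=0)
    total = entropy(mean_p)                    # predictive entropy
    aleatoric = entropy(member_probs).mean()   # expected per-member entropy
    epistemic = total - aleatoric              # mutual information (>= 0)
    return total, aleatoric, epistemic
```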
Consider a scenario where a security operations centre deploys a machine learning model trained on historical malware samples. An attacker develops a slightly modified ransomware variant that incorporates new obfuscation techniques not seen during training. The model may classify the sample as benign with 55% probability and as malware with 45% probability, reflecting low confidence. In such borderline cases, relying solely on the h