Packed Malware Detection Using Grayscale Binary-to-Image Representations

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original ArXiv source.

Detecting packed executables is a critical step in malware analysis, as packing obscures the original code and complicates static inspection. This study evaluates both classical feature-based methods and deep learning approaches that transform binary executables into visual representations (specifically, grayscale byte plots) and employ convolutional neural networks (CNNs) for automated classification of packed and non-packed binaries. A diverse dataset of benign and malicious Portable Executable (PE) files, packed with various commercial and open-source packers, was curated to capture a broad spectrum of packing transformations and obfuscation techniques. Classical models using handcrafted Gabor jet features achieved reasonable discrimination at moderate computational cost. In contrast, CNNs based on VGG16 and DenseNet121 significantly outperformed them, achieving high detection performance with well-balanced precision, recall, and F1-scores. DenseNet121 demonstrated slightly higher precision and a lower false-positive rate, whereas VGG16 achieved marginally higher recall, indicating complementary strengths for practical deployment. Evaluation against unknown packers confirmed robust generalization, demonstrating that grayscale byte-plot representations combined with deep learning provide a practical and reliable approach for early detection of packed malware, enhancing malware analysis pipelines and supporting automated antivirus inspection.


💡 Research Summary

The paper addresses the problem of detecting packed Windows Portable Executable (PE) files, a crucial preprocessing step in malware analysis because packing obscures the original code and hampers static inspection. The authors propose a visual‑based static analysis pipeline that converts each binary into a grayscale “byte‑plot” image, then applies both classical feature‑based classifiers and deep convolutional neural networks (CNNs) to discriminate packed from non‑packed samples.

Dataset construction – A balanced corpus of roughly 10,000 PE files (5,000 benign, 5,000 malicious) was assembled. Each sample was processed with a variety of commercial and open‑source packers (e.g., UPX, Themida, ASPack, MPRESS) to generate a diverse set of packed binaries. An additional “unknown‑packer” test set comprising five recent packers not seen during training was created to assess generalization.

Binary‑to‑image conversion – The raw byte stream is reshaped into a 2‑D matrix of fixed width; excess bytes are padded with zeros or truncated, yielding a consistent image size. This representation preserves structural cues such as section size anomalies, entropy spikes, and compression patterns, which manifest as distinctive textures and frequency changes in the resulting grayscale image.
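A minimal sketch of this conversion, assuming a fixed row width and zero-padding of the final row (the width value here is illustrative, not the paper's exact parameter):

```python
import math

def bytes_to_byteplot(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape a raw byte stream into a fixed-width 2-D grayscale matrix.

    Each byte (0-255) becomes one pixel intensity; bytes beyond the last
    full row are zero-padded so every row has the same length.
    """
    height = math.ceil(len(data) / width)
    padded = data.ljust(height * width, b"\x00")  # pad tail with zero bytes
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]

# Example: a toy "binary" of 400 bytes mapped to a 32-pixel-wide plot
img = bytes_to_byteplot(b"MZ\x90\x00" * 100, width=32)  # 13 rows of 32 pixels
```

In practice the resulting matrix would then be resized to the CNN's fixed input resolution; very large binaries can instead be truncated, as noted above.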

Classical pipeline – Gabor filters are applied to the byte‑plot to extract multi‑scale, multi‑orientation jet features. These handcrafted descriptors are fed to eight traditional machine‑learning algorithms (K‑NN, Logistic Regression, Random Forest, SVM, MLP, XGBoost, etc.). The Gabor‑based approach achieves respectable performance (≈85 % accuracy) with low computational overhead, demonstrating that visual texture alone carries discriminative information about packing.
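A pure-Python sketch of a Gabor jet extractor in this spirit (the kernel size, scales, and orientation count are assumptions for illustration; a real pipeline would use an optimized filtering library):

```python
import math

def gabor_kernel(ksize=9, sigma=2.0, theta=0.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel (illustrative parameter defaults)."""
    half = ksize // 2
    kern = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # rotated coords
            yr = -x * math.sin(theta) + y * math.cos(theta)
            row.append(math.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
                       * math.cos(2 * math.pi * xr / lambd + psi))
        kern.append(row)
    return kern

def gabor_jet(image, scales=(2.0, 4.0), orientations=4):
    """Mean absolute filter response per (scale, orientation) pair."""
    feats = []
    for sigma in scales:
        for o in range(orientations):
            kern = gabor_kernel(sigma=sigma, theta=o * math.pi / orientations)
            half = len(kern) // 2
            total, count = 0.0, 0
            for r in range(half, len(image) - half):      # valid convolution
                for c in range(half, len(image[0]) - half):
                    s = sum(kern[i][j] * image[r - half + i][c - half + j]
                            for i in range(len(kern)) for j in range(len(kern)))
                    total += abs(s)
                    count += 1
            feats.append(total / max(count, 1))
    return feats
```

The resulting fixed-length feature vector is what would be fed to the traditional classifiers listed above.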

Deep learning pipeline – Transfer learning is employed using two well‑known ImageNet‑pretrained CNNs: VGG16 and DenseNet‑121. The first five convolutional blocks are frozen; a global average‑pooling layer followed by two fully‑connected layers constitutes the classifier head. Training uses the Adam optimizer (lr = 1e‑4), batch size = 32, and 30 epochs, with data augmentation (rotations, flips, Gaussian noise) to mitigate over‑fitting.

  • VGG16: Precision = 94.8 %, Recall = 96.2 %, F1 ≈ 95.5 %, overall accuracy ≈ 95.4 %.
  • DenseNet‑121: Precision = 96.5 %, Recall = 94.9 %, F1 ≈ 95.7 %, overall accuracy ≈ 96.1 %.
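As a sanity check, the reported F1 values follow directly from the stated precision and recall via the harmonic mean:

```python
def f1(precision: float, recall: float) -> float:
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1(0.948, 0.962)  # VGG16        -> ~0.955
f1(0.965, 0.949)  # DenseNet-121 -> ~0.957
```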

Both CNNs outperform the classical models by 8–12 percentage points and exhibit balanced class‑wise performance. DenseNet‑121 yields a slightly lower false‑positive rate, while VGG16 attains marginally higher recall, suggesting complementary deployment options.

Generalization to unseen packers – When evaluated on the “unknown‑packer” set, both CNNs retain F1 scores above 90 %, confirming that the visual patterns learned are not tied to specific packer signatures but rather to generic compression/entropy characteristics.

Practical implications – Early detection of packed binaries enables security pipelines to route only truly packed samples to unpacking and dynamic analysis stages, reducing unnecessary computational load, lowering false‑positive alerts, and improving overall triage efficiency.
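The routing logic described above can be sketched as a simple gate on the classifier's packed-probability (the threshold and stage names are hypothetical, not taken from the paper):

```python
def triage(packed_score: float, threshold: float = 0.5) -> str:
    """Route a sample based on the CNN's packed-probability.

    Only samples classified as packed proceed to the expensive
    unpacking/dynamic-analysis stages; the rest go straight to
    ordinary static analysis.
    """
    if packed_score >= threshold:
        return "unpacking_and_dynamic_analysis"
    return "static_analysis"
```

In a deployment, the threshold would be tuned against the precision/recall trade-off noted above (e.g., DenseNet-121's lower false-positive rate when analyst time is the bottleneck).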

Limitations and future work – Image resolution selection heavily influences memory consumption; very large binaries may require down‑sampling or tiled processing. Adversarial packing techniques that deliberately inject random bytes or manipulate the byte‑plot could degrade detection. The authors suggest extending the approach with multi‑channel inputs (e.g., combining raw byte‑plot with entropy heatmaps), multi‑scale CNN architectures, and hybrid models that fuse static visual cues with dynamic behavioral features.
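One of the mitigations mentioned above, down-sampling a large byte-plot, can be sketched as simple average pooling (a sketch only; the paper does not specify its resizing method):

```python
def downsample(img: list[list[int]], factor: int = 2) -> list[list[float]]:
    """Average-pool a byte-plot matrix by `factor` in each dimension,
    discarding any trailing rows/columns that do not fill a full block."""
    h = len(img) // factor * factor
    w = len(img[0]) // factor * factor
    return [[sum(img[r + i][c + j]
                 for i in range(factor) for j in range(factor)) / factor**2
             for c in range(0, w, factor)]
            for r in range(0, h, factor)]

downsample([[0, 2], [4, 6]])  # -> [[3.0]]
```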

In summary, the study demonstrates that grayscale byte‑plot representations, when coupled with transfer‑learning CNNs, provide a robust, scalable, and accurate method for early packed‑malware detection, outperforming traditional handcrafted‑feature classifiers and showing strong resilience against previously unseen packing schemes.

