Adaptive Hybrid Optimizer-based Framework for Lumpy Skin Disease Identification
š Original Paper Info
- Title: Adaptive Hybrid Optimizer based Framework for Lumpy Skin Disease Identification- ArXiv ID: 2601.01807
- Date: 2026-01-05
- Authors: Ubaidullah, Muhammad Abid Hussain, Mohsin Raza Jafri, Rozi Khan, Moid Sandhu, Abd Ullah Khan, Hyundong Shin
š Abstract
Lumpy Skin Disease (LSD) is a contagious viral infection that significantly deteriorates livestock health, thereby posing a serious threat to the global economy and food security. Owing to its rapid spread characteristics, early and precise identification is crucial to prevent outbreaks and ensure timely intervention. In this paper, we propose a hybrid deep learning-based approach called LUMPNet for the early detection of LSD. LUMPNet utilizes image data to detect and classify skin nodules -- the primary indicator of LSD. To this end, LUMPNet uses YOLOv11, EfficientNet-based CNN classifier with compound scaling, and a novel adaptive hybrid optimizer. More precisely, LUMPNet detects and localizes LSD skin nodules and lesions on cattle images. It exploits EfficientNet to classify the localized cattle images into LSD-affected or healthy categories. To stabilize and accelerate the training of YOLOv11 and EfficientNet hybrid model, a novel adaptive hybrid optimizer is proposed and utilized. We evaluate LUMPNet at various stages of LSD using a publicly available dataset. Results indicate that the proposed scheme achieves 99% LSD detection training accuracy, and outperforms existing schemes. The model also achieves validation accuracy of 98%. Moreover, for further evaluation, we conduct a case study using an optimized EfficientNet-B0 model trained with the AdamW optimizer, and compare its performance with LUMPNet. The results show that LUMPNet achieves superior performance.š” Summary & Analysis
1. **Key Contribution 1: Livestock Health Monitoring System** A new methodology combining deep learning and machine learning allows for real-time monitoring of the health status of cattle and pigs. This is akin to a doctor continuously checking on a patient's condition to respond quickly to any changes.-
Key Contribution 2: Accurate Disease Detection
Various deep learning models are utilized to identify Lumpy Skin Disease (LSD) in its early stages, enabling farmers to take preventive measures and manage animal health more effectively. This is similar to diagnosing a virus before it spreads on a large scale and treating it promptly. -
Key Contribution 3: Data-Driven Decision Making
Evaluating the performance of different models helps develop optimized disease control and management strategies. It’s like a farmer accurately assessing livestock health status to allocate resources efficiently.
š Full Paper Content (ArXiv Source)
Deep learning, Image data, Lumpy skin disease, Precision livestock farming
Introduction
Precision livestock farming utilizes automated structures to monitor livestock, enhancing their production, reproduction, health, welfare, and environmental impact . It plays a vital role in the early detection of diseases and contamination in animals. Specifically, pores and skin parasitic infections pose a chief financial undertaking within the livestock enterprise. These infections cause discomfort, impairing the animalās capacity to relax and eat well. Moreover, those infections also compromise the skinās innate defense mechanisms, which in turn increase susceptibility to bacterial infections. Certain ectoparasites, which include blood-feeding flies and precise tick species, are economically significant as they transmit diverse illnesses. The pores and skin lesions attributable to those parasites are specifically harmful.
| Disease | Skin Nodules Description | Fatality Rate | Early Detection Indicators | Other Metrics | Nodules Detection Importance |
|---|---|---|---|---|---|
| Lumpy Skin Disease (LSD) | Severe nodules on the skin, mainly on the neck and back | High | Fever, swelling, reduced milk yield | Requires early diagnosis to prevent spread | Very high |
| Bovine Tuberculosis (TB) | Granulomas (nodules) may appear on skin or internal organs | Low | Coughing, weight loss, and fever | Skin nodules often internal, may require advanced tests | Moderate |
| Poxvirus Infections | Nodular lesions, usually around the head or genital area | Low to Moderate | Skin lesions, scabs, fever | Can affect other animals; highly contagious | Moderate |
| Bluetongue | Nodules around the head, ears, and neck | Moderate | Swelling, drooling, fever, lameness | Often seasonal, transmitted by vectors | High |
| Contagious Ecthyma (Orf) | Scab-like lesions around mouth, udder, and teats | Low | Lesions around the mouth or udder | Highly infectious, zoonotic | High |
| Mycotic Dermatitis | Nodules and crusts formed due to fungal infection | Low | Scaly skin, itching, hair loss | Can spread through contaminated environments | Moderate |
| Schmallenberg Virus (SBV) | Skin nodules with fever and swelling of joints | Low (usually) | Fever, swelling, reduced appetite | Vector-borne, affects pregnant cattle | High |
| Bovine Parapoxvirus | Nodules on the udder, teats, and skin | Low | Lesions on udder and teats | Zoonotic, easily spread in unsanitary conditions | High |
Livestock, consisting of farm animals, cows, and buffalo, are vulnerable to a viral infection referred to as Lumpy Skin Disease (LSD) Ā . This disease is transmitted by means of blood-feeding insects like ticks, flies, and mosquitoes. In addition to causing fever and skin nodules, the virus can be fatal, especially for animals that have never been exposed to it before. Control measures include vaccination and culling infected animals. LSD results in considerable financial losses, affecting species such as cows and water buffalo. The disease is caused by the LSD virus, which belongs to the Capripox genus within the Poxiviridae family. The virus persists in lesions or scabs for prolonged intervals, making these skin lesions the primary source of transmission. Cattle farms around the world experience high infection rates, with over 7,500 deaths connected to LSD in the past 5 years, out of 190,000 reported cases as in Figure. 1. LSD critically affects the economic system, leading to reduced milk production and animal losses. The Table Ā [tab_diseases] summarizes the diverse diseases and their signs related to livestock. There are several cattle diseases associated with skin nodules, which can be classified based on key characteristics such as the description of nodules, fatality rate, early detection indicators, and the importance of nodule detection. LSD stands out because of its high fatality rate and the extreme nature of the skin nodules, which can be commonly observed on the neck and lower back. Early detection of LSD is crucial to save from its rapid unfold, with signs such as fever, swelling, and reduced milk yield serving as key indicators. While different sicknesses, which include Bovine Tuberculosis, Bluetongue, and Contagious Ecthyma, additionally contain skin nodules, LSD requires mainly urgent interest because of its significant impact on farm animalsā health and productivity. Adamu et al. conducted an experimental analysis of LSDāaffected cattle in Ethiopia. They confirmed LSDV as the primary cause of the outbreak and identified nodules and fever as the main symptoms of the disease.
Recent advancements in artificial intelligence (AI) have also made it possible to perform the early detection of various livestock diseases. In this direction, various researchers have reported their work in the literature related to preciosn livestock farming. Chong et al. propose a method for estimating sheep live weight using LiteHRNet, a Lightweight High-Resolution Network, with RGB-D images. The design of efficient network heads was guided by Class Activation Mapping (CAM), ensuring visual interpretability and applicability in real-world environments. The technique was tested on a challenging dataset together with 726 RGB-D pictures of sheep, with weights starting from 19.5 to 94 kg. The consequences of the comparative experiments show that the lightweight Convolutional Neural Network (CNN) model trained on RGB-D pictures yields an acceptable weight estimation overall performance. It successfully achieves a Mean Average Percentage Error (MAPE) of 14.605%, with only 1.06 M parameters. Gouda eta l. predicted LSD using multiple ML models under multiclass balance. They apply under-sampling and oversampling to address the class imbalance. Random forest along with random oversampling achieved 82% accuracy.
style="width:7.5cm;height:6cm" />
Qiao et al. introduce a deep learning framework designed to monitor and classify dairy cow behaviors. This framework combines the Convolutional 3D (C3D) network with Convolutional Long Short-Term Memory (ConvLSTM). The system classifies five key behaviors: feeding, exploring, grooming, walking, and standing. The model initially extracts 3D convolutional features from the video frames using the C3D network. In the second phase, the scheme captures spatio-temporal information using ConvLSTM. This information is then passed through a softmax layer for classification. Using a 30-frame video length, the proposed method achieved classification accuracies of 90.32% and 86.67% on datasets for calves and cows, respectively, outperforming state-of-the-art models such as BiLSTM, Inception-V3, SimpleRNN, LSTM, and C3D.
style="width:19cm;height:14cm" />
In another study, Qiao et al. provide an in-depth analysis of key techniques in precision livestock farming, focusing on methods for cattle identification, live weight estimation, and body condition evaluation. They review multiple applicable studies and present them in a comprehensive and organized manner. Based on the findings and evaluation of latest improvements, the authors are expecting that the adoption of high-precision, non-contact, and automatic technologies may be critical for the destiny of precision cattle farming. They additionally emphasize the significance of integrating emerging 3D model reconstruction and deep learning methods into these fields of study.
Wang et al. develop a system for monitoring the respiratory rate in group-housed pigs. The proposed system identifies video clips where the pigs are in a resting state. It applies an object detection algorithm to automatically detect each animal and select the region of interest (ROI) without requiring manual input. They estimate the respiratory rate by analyzing time-varying features from the ROI. They collect video footage of a group of five pigs wearing an ECG belt around their abdomens to provide gold-standard respiratory rate measurements. The comparison of the estimates from the AI-based method showed a high level of accuracy, with a mean absolute error of 2.38 breaths per minute. Saqib et al. employ the MobileNetV2 model and the RMSprop optimizer in order to devise a novel deep learning scheme for detecting LSD. They test their scheme on both healthy and infected cattle datasets, achieving 95% accuracy. They outperform existing benchmarks by 5ā10%. The image dataset includes 464 images of healthy cows and 329 images of LSD-affected cows.
Xiao et al. propose a method for identifying individual cows using top-view images. They capture a top-down image of the cow, followed by the application of an enhanced Mask R-CNN model to segment the image and extract features related to the cowās back shape. They employ the Fisher method to select the optimal subset of features. They use a support vector machine (SVM) classifier for cow identification. Compared to the traditional Mask R-CNN model, the improved Mask R-CNN model achieves precision, recall, F1 score, and average precision values of 98.21%, 96.48%, 97.34%, and 97.39%, respectively. Mahmud et al. summarize the details of recent advancements in the application of deep learning in precision cattle farming. They specifically emphasize health monitoring and identification. Out of 56 selected studies, 58% focused on cattle identification, while the remaining studies concentrated on health monitoring. The analysis revealed the use of 20 different deep learning models, with CNNs being the most commonly utilized, followed by LSTM, Mask-RCNN, and Faster-RCNN models.
| Stage | Symptoms |
|---|---|
| Incubation Stage | - No visible symptoms, virus replicates. |
| - Usually lasts 2-4 weeks after infection. | |
| Early Stage | - Fever. |
| - Loss of appetite. | |
| - Depression. | |
| - Mild respiratory distress. | |
| - Lesions start to appear. | |
| Intermediate Stage | - Development of large lesions on the skin. |
| - Lesions range from 2-5 cm in size. | |
| - Lesions start on the neck, head, and limbs. | |
| Progression Stage | - Lesions ulcerate and secrete fluid. |
| - Increased body temperature continues. | |
| - Secondary bacterial infections may occur. | |
| - Lameness due to lesions. |
Stages and Symptoms of LSD
| Ref. (Year) | Approach | Model Performance | Dataset | Evaluation Methods | Challenges |
|---|---|---|---|---|---|
| Girma et al. (2021) | CNN for feature extraction and SVM for classification in LSD detection. | 95.7% accuracy with SVM for LSD classification. | 1,740 image dataset for LSD detection. | Evaluated with SVM, Random Forest with SVM achieving 95.7% accuracy. | Noise removal techniques needed for better image quality and region identification. |
| Neha et al. (2022) | Random Forest for LSD prediction in specific geographic locations. | 97% accuracy with Random Forest for LSD prediction. | 18,603 instances and 16 features from a dataset. | Performance comparison with Random Forest (97% accuracy). | Image noise and the challenges of accurate segmentation for LSD detection. |
| Kumar et al. (2024) | Automated LSD detection using deep learning with transfer learning | VGG16 and MobileNetV2 achieved accuracies of 96.07% and 96.39% | Publicly available datasets of healthy and LSD-affected cattle | Performance evaluation using accuracy, sensitivity, specificity, precision | Challenges in dataset balancing and the need for high sensitivity. |
| Saqib et al. (2024) | Deep learning approach using MobileNetV2 and RMSprop optimizer. | 95% accuracy with MobileNetV2 for LSD detection. | 464 healthy and 329 LSD-affected cattle images. | 95% accuracy on healthy and LSD-affected cattle images dataset. | Partitioning dataset to balance the classes and ensuring diversity in images. |
| Asad etal. (2025) | ViT classifier with preprocessing (resizing, normalization, augmentation) | 98% accuracy | 8,000 cattle images | Evaluated using accuracy, precision, recall, and F1 score. | Ensuring generalization to field conditions |
| Proposed: LUMPNet (2025) | YOLOv11 + EfficientNet + AWDR for LSD early detection and monitoring. | 99% accuracy in early detection of LSD. | 1024 image dataset, 324 for LSD detection | Multiple-stage performance assessment with 99% accuracy. | Handling early-stage lesion variability, ensuring precise region localization |
LSD is characterized by fever, swollen superficial lymph nodes, and numerous nodules ranging from 2 to 5 centimeters in diameter on the skin and mucosal membranes Ā . Infected cattle also suffer from lameness and swelling in their limbs. The virus often causes lasting skin damage in affected animals. The disease leads to poor growth, infertility, abortion, chronic weakness, decreased milk production, and even death. Fever typically appears about a week after the viral infection, with the initial fever possibly exceeding 41°C and lasting for up to a week. During this period, all superficial lymph nodes become enlarged. The distinctive nodules appear 7 to 19 days after infection, accompanied by mucopurulent discharge from the nose and eyes. These nodules affect deeper layers of skin, such as the subcutis and even the muscle, impacting both the dermis and epidermis. The lesions may be well-defined or merge together and can appear anywhere on the body, though they are most commonly found on the head, neck, udder, scrotum, vulva, and perineum. Skin lesions may either resolve quickly or persist as painful lumps. In some cases, the lesions may become more severe, forming deep ulcers that may be pus-filled and surrounded by granulation tissue. Initially, the nodules discharge serum and appear gray to white upon cutting, eventually developing a central necrotic core after approximately two weeks. The virus is more easily transmitted through the nodules on mucous membranes such as the mouth, nose, eyes, rectum, udder, and genitalia. Table 1 shows the progression of LSD, detailing symptoms across stages from fever and lethargy in the early phase to the development of painful skin lesions. A growing body of research is leveraging ML to predict the occurrence of infectious diseases in animal populations. Specifically, ML methods have been developed to predict the onset of LSD infections. Researchers use ML techniques to analyze image data from infected animals, with these methods proving effective in predicting LSD and other viral diseases in test datasets Ā . Ananda et al. proposes using deep CNN feature extractors i.e. VGGā16, VGGā19, Inceptionāv3, combined with machineālearning classifiers for automated LSD detection, reporting improved feature extraction and better performance versus classical methods.
Neha et al. tested the performance of the Random Forest model for detecting LSD in various cattle. They also compared the performance of other machine learning techniques. Raj et al. proposed a hybrid model based on VGG-19 and ResNet-50 deep learning architectures to accurately detect LSD conditions. They used principal component analysis to decrease the dimensionality of the feature vector. Liang et al. utilized machine learning techniques to predict outbreaks of African Swine Fever by analyzing global bioclimatic data. While other methods achieved an accuracy of 80.4%, the Random Forest technique proved more accurate than the SVM when applied to a dataset of key climate variables, with an accuracy of 76.02%. Niu et al. employed machine learning methods to predict outbreaks of Peste des Petits Ruminants using bioclimatic variables and altitude data. Their modelās precision ranged from 47% to 96%, with the Random Forest method showing superior performance when applied to data from three countries not included in the training dataset. Figure. 2 summarizes the role of our proposed LUMPNet in Precision livestock farming and early disease detection.
Novelty and contributions
A number of deep learning (DL)-based models have been proposed in the literature for early detection of LSD-affected cattle . However, most of these models do not generalize across diverse environments and data scarcity, leading to suboptimal performance. Additionally, the traditional feature extraction methods adopted by the recent literature fail to effectively classify LSD cases, particularly in the early stages when symptoms, such as skin nodules, are less pronounced. Moreover, most of these models take a long converge time and are not scalable, making them inefficient to be deployed in real-world livestock monitoring systems .
To address these issues, we propose a unified, fastāconverging, and scalable deepālearning pipeline approach called LUMPNet, which integrates advanced object detection, classification, and intelligent optimization techniques to enhance the accuracy of early-stage LSD detection by focusing on the visual appearance of skin nodules. The modular design of LUMPNet enables it to support multi-domain adaptability, enabling re-training for new object categories or environmental conditions with minimal overhead. The novelty and cotributions are as follows.
-
LUMPNet combines YOLOv11-based detection with an EfficientNet-based CNN classifier for accurate localization and precise classification of the affected animals. The EfficientNet-based CNN classifier integrates compound scaling in depth, width, and resolution to enhance accuracy.
-
An Adaptive Hybrid optimizer (AWDR) is proposed, which combines the variance smoothing capability of RMSProp with the decoupled weight-decay regularization of AdamW through a dynamic blending coefficient. The optimizer enhances gradient stability, accelerates convergence, mitigates overfitting, and improves generalization on small or imbalanced datasets.
-
Extensive evaluations are conducted to demonstrate that LUMPNet outperforms state-of-the-art baselines in terms of precision, recall, F1-score, and inference latency by around.
Table [tab:comparison1] summarizes the comparison of LUMPNet with the existing approaches. The rest of the paper is structured as follows. Section 2 presents the proposed model and methodology. Section 3 discusses the case study in which we compare the previous models with the EfficientNet baseline model. Section 4 explains the results, and Section 5 concludes the paper.
Methodology
LUMPNet detects LSD in cattle by integrating deep learning models i.e. YOLOv11 and EfficientNet, with novel AWDR optimizer. More precisely, the model uses a multifaceted approach encompassing object detection and semantic segmentation techniques. The process begins by identifying the cattle in the dataset . Both manual and automated segmentation is performed using the segment anything (SAM) package . This results in separating the cattleās image from the colored background. Next, the main phase starts, where the symptoms of LSD, specifically skin nodules and lesions, are identified using YOLOv11 . The LSD-affected cattle is identified by finding the skin nodules or the affected skin. For classification, the model utilizes EfficientNet . Figure [flowchart] illustrates the system architecture, which comprises the following stages: dataset preparation, training a YOLOv11 model, segmentation, training an LUMPNet model for LSD classification, and evaluation.
In this paper, we use image dataset publicly available on Kaggle . The dataset contains 1024 cattle images, in which both healthy and LSD affected cattle images exist. In these images, the LSD affected cattle have a mild to extreme appearance of skin nodules. The appearance of nodules characterizes the multiple stages of LSD. Among these images, there are 324 images of cattle with lumpy skin and an additional 700 images of healthy cattle. A key challenge was to develop an effective partitioning strategy that would support accurate model training and evaluation. To this end, 500 images of healthy cattle were allocated for training, while the remaining 200 were designated for testing. Similarly, 224 images of LSD-affected cattle were selected for training, and 100 images were reserved for testing.
| Feature Name | Description | Type |
|---|---|---|
| Nodule Count | Total number of visible nodules detected on the skin surface | Quantitative |
| Nodule Size | Average area or diameter of individual nodules | Quantitative |
| Nodule Shape | Circularity, aspect ratio, and irregularity of nodules | Morphological |
| Edge Density | Density of edges (sharp transitions) around nodules | Structural |
| YOLOv11 Region Proposals | Bounding box features: position, size, confidence score | Learned Feature |
| EfficientNet Embeddings | Deep learned features extracted from intermediate convolutional layers | Learned Feature |
Image Pre-processing
During image pre-processing, we perform both manual and automatic segmentation. Moreover, we mask the background color of the images with blue to maintain consistency, enhance the focus of the cattleās skin, and ultimately improve the accuracy of the model. This leads to the accurate detection of skin-related features. For automatic segmentation, we utilize SAM, which provides more accurate region detection. We improve model prediction by visualizing masks, points and bounding boxes through the utility functions. For object detection, we use ultralytics, NumPy, OpenCV, PyTorch and Matplotlib. Moreover, we integrate Roboflow with the system architecture and model training pipelines in order to achieve automated data management and efficient model training. We perform manual segmentation on the dataset to ensure higher quality and more precise data. This segmentation is important to effectively training the model and accurately evaluating its performance.
Feature Extraction
The feature extraction stage operates on the cattle region obtained after pre-processing/segmentation and produces three complementary cues that are fed to the learning models as explained in Table [tab:lsd_features]. Specifically, we compute (1) Edges of cattle: gradient-based edge maps to emphasize nodule boundaries and local shape irregularities; (2) RGB color space: the native RGB representation with per-channel statistics that preserve appearance and pigmentation differences associated with lesions; and (3) Binary mask creation: a foreground mask isolating the cattle silhouette to suppress background clutter and constrain subsequent analysis. Formally, for an input image $`x`$ and its segmentation $`\hat{M}`$, we derive $`E=\text{edge}(x\odot\hat{M})`$, $`C=x_{\text{RGB}}`$, and $`B=\mathbb{1}(\hat{M}>0)`$, and form a composite feature tensor $`F=[E,\,C,\,B]`$. During training, $`F`$ is paired with image-level labels (healthy vs.Ā LSD-affected) to supervise the models; at test time, the same features guide prediction. The above-mentioned features provide complementary geometric and photometric evidence that improves nodule localization.
Implementation of YOLOv11
YOLOv11 is a CNN-based framework for real-time object classification and detection . It introduces architectural enhancements that include Cross Stage Partial with kernel size 2 (C3k2) blocks for efficient feature reuse and Spatial Pyramid Pooling-Fast (SPPF) for multi-scale context. It strengthens spatial focus using Convolutional block with Parallel Spatial Attention (C2PSA). These characteristics enhance performance, particularly for detecting subtle LSD skin nodules in cattle. We implement pretrained weights via transfer learning and generate detection plots for qualitative evaluation. YOLOv11 partitions an input image into multiple grid cells. Each cell predicts bounding boxes with associated confidence scores. Figure 5 illustrates the complete architectural design of the YOLOv11 algorithm.
style="width:82.0%" />
During bounding box regression, the regression model predicts these parameters via the Distribution Focal Loss (DFL) mechanism, permitting localization of LSD lesions on livestock pores and skin. YOLOv11 replaces C2f with the C3k2 block, so it provides proficient implementation of the Cross Stage Partial (CSP) Bottleneck. This architectural innovation helps gradient float via deep network layers, stopping vanishing gradient issues at the same time as keeping faster processing speeds. Residual connections permit the community to study complicated capabilities vital for subtle lesion detection. Intersection Over Union (IoU) quantifies the overlap amongst anticipated and floor fact bounding bins. It is mathematically described as
\begin{equation}
\text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}} = \frac{|B_{\text{pred}} \cap B_{\text{gt}}|}{|B_{\text{pred}} \cup B_{\text{gt}}|} .
\end{equation}
YOLOv11 utilizes Complete IoU (CIoU) loss that promises high-precision lesion localization.
Loss function formulation
The loss function of YOLOv11 combines Complete Intersection over Union (CIoU) loss, Distribution Focal Loss (DFL), and Binary Cross-Entropy (BCE) loss. The total loss is expressed as a weighted sum
\begin{equation}
\mathcal{L}_{\text{total}} = \lambda_{\text{box}} \mathcal{L}_{\text{CIoU}} + \lambda_{\text{dfl}} \mathcal{L}_{\text{DFL}} + \lambda_{\text{cls}} \mathcal{L}_{\text{BCE}} ,
\end{equation}
where $`\lambda_{\text{box}} = 7.5`$, $`\lambda_{\text{dfl}} = 0.5`$, and $`\lambda_{\text{cls}} = 1.5`$ are the default weighting factors that balance the relative importance of each loss component. The CIoU loss incorporates three geometric factors: overlap area, center point distance, and aspect ratio consistency. It is formulated as ,
\begin{equation}
\mathcal{L}_{\text{CIoU}} = 1 - \text{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v ,
\end{equation}
where $`\rho^2(b, b^{gt})`$ represents the Euclidean distance between the center points of the predicted box $`b`$ and ground truth box $`b^{gt}`$, $`c`$ denotes the diagonal length of the smallest enclosing box covering both boxes, and $`v`$ measures the consistency of aspect ratios. Distribution Focal Loss models bounding box boundaries as probability distributions over discrete bins. It is formulated as,
\begin{equation}
\begin{split}
\mathcal{L}_{\mathrm{DFL}}(P, y) \;=\; &\; -\sum_{i=0}^{n-1} \Big[ \; (y_i^{+} - y)\,\log\big(p_i\big) \\
&\qquad\qquad\qquad\qquad\; +\; (y - y_i^{-})\,\log\big(p_{i+1}\big) \; \Big] ,
\end{split}
\end{equation}
where $`y`$ represents the ground truth coordinate, $`y_i^-`$ and $`y_i^+`$ are the floor and ceiling integer bins surrounding $`y`$, and $`p_i`$ denotes the predicted probability for bin $`i`$. DFL also captures uncertainty in bounding box localization, particularly beneficial for detecting LSD lesions with ambiguous or blurry boundaries. Binary Cross-Entropy (BCE) loss optimizes class predictions for each detected object. For the binary LSD classification task (healthy vs. infected), the BCE loss is computed as
\begin{equation}
\mathcal{L}_{\text{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{p}_i) + (1 - y_i) \log(1 - \hat{p}_i) \right] ,
\end{equation}
where $`N`$ represents the total number of predictions, $`y_i \in \{0, 1\}`$ denotes the ground truth label, and $`\hat{p}_i`$ is the predicted probability for class presence.
style="width:100.0%" />
As demonstrated in Figure. 6, we apply $`3\times3`$ convolutions with stride $`(2,2)`$ to downsample while increasing feature depth; each block uses batch normalization and SiLU activations to stabilize training and enable nonlinear pattern learning. Channel widths scale from 32 to 64, 128, 256, 512, and 1024, capturing progressively deeper features for robust detection of LSD skin nodules.
Architecture of YOLOv11
Our model of YOLOv11 contains a Spatial Pyramid Pooling-Fast (SPPF) block. This block aggregates features from several receptive field sizes through sequential max-pooling operations, mathematically expressed as,
\begin{equation}
\begin{split}
\mathrm{SPPF}(x) \;=\; &\;\mathrm{Conv}_{1\times 1}\Big(
\mathrm{Concat}\Big[\, x,\; \mathcal{M}_k(x),\\
&\qquad\; \mathcal{M}_k\big(\mathcal{M}_k(x)\big),\;
\mathcal{M}_k\big(\mathcal{M}_k(\mathcal{M}_k(x))\big) \,\Big] \Big) ,
\end{split}
\end{equation}
where $`\mathcal{M}_k`$ represents max-pooling with kernel size $`k`$ (typically $`k=5`$) and stride 1, and $`\text{Concat}`$ denotes channel-wise concatenation. The C2PSA block permits the network to selectively cognizance on critical spatial areas in the feature maps. It additionally computes attention weights primarily based on spatial correlations
\begin{equation}
\mathbf{A}_{\text{spatial}} = \sigma\left( \text{Conv}_{7 \times 7}\left( \text{Concat}[\text{MaxPool}(\mathbf{F}), \text{AvgPool}(\mathbf{F})] \right) \right),
\end{equation}
where $`\mathbf{F}`$ represents the input feature map, $`\sigma`$ denotes the sigmoid activation function, and the attention-refined features are obtained through
\begin{equation}
\mathbf{F}_{\text{refined}} = \mathbf{A}_{\text{spatial}} \odot \mathbf{F} ,
\end{equation}
This attention mechanism allows YOLOv11 to concentrate computational resources on potentially lesion-containing regions while suppressing irrelevant background features, dramatically improving detection accuracy for small or partially occluded LSD nodules, a common challenge in field conditions where cattle have varying coat colors, lighting conditions, and occlusions from environmental factors. The model architecture continues to grow deeper with more convolutional blocks and feature fusion operations through the neck component, enabling the extraction of both low-level and high-level features. Lower-level features extracted in earlier layers focus on fine details, including edges, texture patterns, color variations, and subtle skin surface changes characteristic of early-stage LSD lesions.
During the detection phase, the neck component employs Path Aggregation Network (PANet) principles with upsampling and concatenation techniques to merge features from different network layers. The neck processes features at multiple scales: P3 (high-resolution, $`80 \times 80`$), P4 (medium-resolution, $`40 \times 40`$), and P5 (low-resolution, $`20 \times 20`$). Feature fusion is achieved through
\begin{equation}
\mathbf{F}_{\text{fused}}^{(i)} = \text{C3k2}\left( \text{Concat}\left[ \mathbf{F}_{\text{backbone}}^{(i)}, \text{Upsample}(\mathbf{F}_{\text{fused}}^{(i+1)}) \right] \right)
\end{equation}
for the top-down pathway, and
\begin{equation}
\mathbf{F}_{\text{out}}^{(i)} = \text{C3k2}\left( \text{Concat}\left[ \mathbf{F}_{\text{fused}}^{(i)}, \text{Downsample}(\mathbf{F}_{\text{out}}^{(i-1)}) \right] \right)
\end{equation}
for the bottom-up pathway. This bidirectional feature fusion enables the model to generate multi-scale predictions that can detect both small isolated nodules on the neck and shoulders and larger lesion complexes on the body trunk and limbs.
The final detection head generates predictions at three different scales, outputting for each anchor point: (1) class probabilities $`p_{\text{cls}} \in [0,1]^C`$ for $`C`$ classes, (2) bounding box coordinates $`(x, y, w, h)`$ modeled as probability distributions through DFL, and (3) objectness scores $`p_{\text{obj}} \in [0,1]`$ indicating detection confidence. The detection head employs decoupled prediction branches where classification and localization are processed through separate convolutional pathways, reducing the task conflict and improving overall performance. For each prediction scale, the head uses small $`1 \times 1`$ convolutions to ensure precise predictions at each spatial location,
\begin{equation}
\begin{aligned}
p_{\text{cls}} &= \sigma(\text{Conv}_{1 \times 1}^{\text{cls}}(\mathbf{F}_{\text{out}})) \\
\mathbf{d}_{\text{box}} &= \text{Softmax}(\text{Conv}_{1 \times 1}^{\text{box}}(\mathbf{F}_{\text{out}})) \\
p_{\text{obj}} &= \sigma(\text{Conv}_{1 \times 1}^{\text{obj}}(\mathbf{F}_{\text{out}})) ,
\end{aligned}
\end{equation}
where $`\mathbf{d}_{\text{box}}`$ represents the distribution over discrete coordinate bins for DFL.
Overall, the YOLOv11 model processes images of size $`640 \times 640 \times 3`$. The spatial dimensions progressively reduce due to strided convolutions and pooling operations: $`640 \times 640 \rightarrow 320 \times 320 \rightarrow 160 \times 160 \rightarrow 80 \times 80 \rightarrow 40 \times 40 \rightarrow 20 \times 20`$, while the depth of feature maps increases systematically: $`3 \rightarrow 32 \rightarrow 64 \rightarrow 128 \rightarrow 256 \rightarrow 512 \rightarrow 1024`$ channels. During the stages, the model captures complex hierarchical features and combines multi-scale feature maps from three resolution levels through the neckās feature fusion operations. These fused results are processed by the detection head to generate accurate predictions for detecting LSD in cattle.
Adaptive Hybrid Optimizer: AWDR
We devise an AWDR optimizer that combines the weight decay regularization capabilities of AdamW with the adaptive variance smoothing nature of RMSprop. The proposed fusion model not only enables stable gradient updates during the early stages of training, moreover, preserves better generalization in later epochs. The optimizer implements a dynamic transition schedule. In this scheduling mechanism, time-dependent coefficient $`\beta(t)`$ controls the gradual shift between RMSprop-like behavior towards AdamW-dominant behavior. The dynamic transition is formulated as,
\begin{equation}
\beta(t) = \beta_{0} \cdot \left(1 - \frac{t}{T}\right),
\end{equation}
where $`t`$ denotes the current epoch, $`T`$ is the maximum number of epochs, and $`\beta_{0}`$ is an initialization factor. The AWDR update rule is then expressed as,
\begin{equation}
g_{t} = \beta(t) \cdot RMSprop(g_{t}) + \left(1 - \beta(t)\right) \cdot AdamW(g_{t}).
\end{equation}
This optimizer results in more stable convergence and reduced overfitting. Therefore, RMSprop contributes stronger gradient smoothing during early high-variance updates, while AdamW progressively regulates weight decay in the later optimization stages. It also provides three main advantages over traditional singular optimizer, i.e.,
-
Improved training stability across long training cycles
-
Faster convergence with reduced gradient oscillations
-
Better generalization performance due to AdamW-based weight decay enforcement
Consequently, AWDR significantly enhances the learning capability of our proposed hybrid architecture for Lumpy Skin Disease detection, enabling higher robustness and improved classification accuracy. Figure. 7 explains the architecture of the proposed optimizer.
style="width:10cm;height:13cm" />
EfficientNet-based Classification
EfficientNet is used as the main component for LSD detection in our model as demonstrated in Figure. 8. It is a CNN architecture equipped with the concept of uniformly scaling a model, called compound scaling, a crucial advancement in deep learning. The compound scaling strategy is used to solve the trade-off among accuracy, efficiency, and size during CNN scaling. In this strategy, a compound coefficient, denoted as $`\phi`$, is applied to scale the networkās width, depth, and resolution in a balanced and systematic manner, as given below
\begin{equation}
\begin{aligned}
\text{depth:} \quad & d = \alpha \phi \\
\text{width:} \quad & w = \beta \phi \\
\text{resolution:} \quad & r = \gamma \phi \\
\text{s.t.} \quad & \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2 \\
& \alpha \geq 1 \\
& \beta \geq 1 \\
& \gamma \geq 1 .
\end{aligned}
\end{equation}
Here, $`\alpha`$, $`\beta`$, and $`\gamma`$ are constants that are determined through a small grid search. Conceptually, $`\phi`$ is a coefficient particular to the consumer to manipulate the general scaling of the model, indicating how many extra assets should be allotted for scaling. Meanwhile, $`\alpha`$, $`\beta`$, and $`\gamma`$ determine how those properties are disbursed throughout the networkās width, depth, and resolution, respectively. This model approaches anomalies in cattle pictures with the use of transfer learning, which allows the model to quickly adapt to the primary features of the dataset. Therefore, it serves as a pretrained version for anomaly detection. Moving in advance with anomaly detection of the use of EfficientNet, the proposed model involves a couple of key steps. These consist of pre-processing responsibilities, which comprise of resizing images to the preferred input proportions for the model and augmenting the dataset to develop model generalization.
style="width:18cm;height:9cm" />
style="width:7cm;height:5cm" />
style="width:8cm;height:6cm" />
style="width:8cm;height:6cm" />
The EfficientNet model is selected as the base structure for transfer
learning to know, making use of pretrained weights. We alter the modelās
parameters on the training dataset to adapt it for LSD detection. To
compare the model, we make predictions on the test dataset.
FiguresĀ 9
andĀ 10 show the results of image
classification using EfficientNet. EfficientNet model begins with a
Conv2dNormActivation layer that performs a convolution using a
64-filter, 3$`\times`$3 kernel with stride adjustments, followed
by batch normalization and the SiLU activation function. The model also
uses MBConv blocks to perform efficient feature extraction. These blocks
reduce the spatial resolution while increasing the depth.
The model utilizes an Adaptive Average Pooling layer, which aggregates the spatial features into a single vector. This pooled representation encapsulates the most relevant information from the input image and is ready to be passed into the final classification layers. The classifier part of the model consists of a Dropout layer, which helps prevent overfitting by randomly deactivating certain neurons during training, and a Linear layer that produces the final output. This output layer generates the classification results, which can be adjusted to handle specific tasks like distinguishing between LSD-affected cattle and healthy cattle. The entire EfficientNet model consists of multiple trainable parameters, making it capable of being fine-tuned during training for a given task, with no fixed or frozen layers.
EfficientNet begins with standard convolutional operations to extract spatial features from the input data. At each spatial location $`(i, j)`$, the output is obtained by applying a filter over a local neighborhood of the input feature map. The operation involves element-wise multiplication between the filter weights and the input region, followed by summation and the addition of a bias term
\begin{equation}
y_{i,j} = \sum_{m=-k}^{k} \sum_{n=-k}^{k} x_{i+m,j+n} \cdot w_{m,n} + b .
\end{equation}
Here, $`x_{i+m,j+n}`$ represents the input at a neighboring position, $`w_{m,n}`$ are the kernel weights, $`b`$ is the bias term, and $`y_{i,j}`$ is the resulting output at that position. The filter weights are learned during training to capture meaningful patterns in the data. To stabilize and speed up training, batch normalization is applied after convolutional operations. It normalizes the input features to have zero mean and unit variance, then scales and shifts the result using learnable parameters. The normalized output is given by
\begin{equation}
z_{i,j} = \gamma \left( \frac{x_{i,j} - \mu}{\sigma} \right) + \beta .
\end{equation}
In this equation, $`\mu`$ and $`\sigma`$ are the mean and standard deviation of the input features, while $`\gamma`$ and $`\beta`$ are learnable scaling and shifting parameters.
EfficientNet uses the SiLU activation function that is defined as
\begin{equation}
\sigma(x) = x \cdot \frac{1}{1 + e^{-x}}.
\end{equation}
The core building blocks of EfficientNet are MBConv layers, which are based on MobileNetV2ās inverted residuals and depthwise separable convolutions. First, a depthwise convolution is applied independently to each channel
\begin{equation}
y_{i,j,c} = \sum_{m=-k}^{k} \sum_{n=-k}^{k} x_{i+m,j+n,c} \cdot w_{m,n,c} .
\end{equation}
This operation focuses on spatial filtering within each channel. A pointwise convolution combines the outputs as follows:
\begin{equation}
y_{i,j} = \sum_{c} x_{i,j,c} \cdot w_{c} .
\end{equation}
At the end of the network, a fully connected (linear) layer maps the high-level features to class scores for classification tasks. For each output class $`k`$, the score is computed as,
\begin{equation}
y_k = \sum_i x_i \cdot w_{i,k} + b_k .
\end{equation}
Here, $`x_i`$ are the input features, $`w_{i,k}`$ are the weights associated with class $`k`$, and $`b_k`$ is the corresponding bias. The final predicted class is the one with the highest score
\begin{equation}
\hat{y} = \arg\max(y_1, y_2, \dots, y_K) .
\end{equation}
Case study
We conducted a case study in which we implemented an optimized EfficientNetāB0 model trained with the AdamW optimizer to compare its performance against our proposed LUMPNet. To evaluate the efficacy of the proposed image classification pipeline for the early detection of LSD in cattle, we analyze the learning dynamics of the EfficientNetāB0 based model over 50 training epochs. Figure. 11 - 16 illustrates the dataset details, and progression of training and validation accuracy, revealing several critical insights into the modelās optimization and generalization behaviors. Both the training and validation accuracy metrics exhibit a robust positive correlation with the number of epochs, confirming that the model is effectively learning discriminative features from the augmented dataset. A noteworthy characteristic of the training trajectory is the persistent superiority of validation accuracy over training accuracy. This phenomenon is consistent with the use of a heavy augmentation regimen, including random geometric transformations and color jitter, during the training phase. These augmentations introduce significant regularization by artificially increasing the difficulty of the training task, whereas the validation set remains relatively “clean,” undergoing only resizing and normalization. Consequently, the model performs better on the simpler validation distribution than on the highly perturbed training data, a strong indicator that the model is not overfitting.
style="width:7.5cm;height:5.5cm" />
style="width:7.5cm;height:5.5cm" />
The volatility observed in the accuracy curves, characterized by sharp stochastic fluctuations (e.g., near epochs 8, 14, and 26), can be attributed to the aggressive learning rate modulation governed by the OneCycleLR scheduler. This scheduler, in conjunction with the AdamW optimizer, facilitates rapid exploration of the loss landscape, preventing the model from settling into sharp local minima early in the training process. By the final epoch, the validation accuracy surpasses 85%, effectively outperforming the training accuracy which stabilizes near 80%.
The EfficientNet baseline attains 83.77% accuracy with an F1āscore of 78.72% on the validation split. The model yields an MCC of 0.6144 and a best observed accuracy of 85.71%, indicating moderate-to-strong discriminative performance. Inference latency is around 837.7 ms per image ($`\sim`$1.2 FPS), which limits realātime use and suggests optimization. We conclude that the EfficientNet baseline delivers high efficacy on the LSD dataset, producing validation accuracies in the midā90s after stabilization. We treat this pipeline as a reliable baseline for comparing against our proposed LUMPNet hybrid system and we continue to refine it toward a deployable earlyāwarning tool for LSD detection in field settings.
Results and Discussion
The proposed model is tested on a high-performance machine equipped with 64 GB RAM, an NVIDIA RTX 4090 GPU, and a 13th Gen Intel Core i9 processor running Windows 11. For reproducibility and collaborative experiments, Google Colab is also utilized, offering seamless access to widely-used libraries, support for both GPU and TPU acceleration, and smooth integration with Google Drive for efficient data storage and sharing. The model is tested using the Lumpy skin image dataset, which is publicly available . We use Compute Unified Device Architecture (CUDA) to enhance the computational efficiency of the model. It also increases the speed of the model-training process. Table [tab:1] compares the performance of the proposed techniques with the previous models .
| Model | Accuracy |
|---|---|
| Ensemble Method | 92% |
| Extreme learning machine (ELM) | 90% |
| DenseNet121 | 89% |
| CNN-Model | 84% |
| RMSProp+MobileNetV2 | 95% |
| LUMPNet | 99% |
Comparison with similar studies.
| Metric | Value |
|---|---|
| Total Samples | 1250 |
| Training Samples | 1000 |
| Testing Samples | 250 |
| Healthy Samples | 625 |
| Infected Samples | 625 |
| Class Balance Ratio | 1:1.00 |
Dataset Summary
Precision is defined as the ratio of true positive predictions ($`TP`$) to the total number of positive predictions ($`TP + FP`$), where $`FP`$ denotes false positives. In other words, precision evaluates the proportion of correct positive predictions made by the model.
\begin{equation}
\text{Precision} = \frac{TP}{TP + FP} .
\end{equation}
Here, $`TP`$ stands for true positives, and $`FP`$ represents false positives. On the other hand, Recall quantifies the modelās ability to correctly identify true positives ($`TP`$) from all actual positive instances ($`TP + FN`$), where $`FN`$ represents false negatives.
\begin{equation}
\text{Recall} = \frac{TP}{TP + FN}
\end{equation}
Average Precision (AP) is calculated as the weighted mean of precision values across different recall thresholds. Alternatively, it can be interpreted as the proportion of true positives ($`TP`$) out of all detections ($`TP + FP`$) made by the model.
\begin{equation}
\text{Average Precision} = \frac{TP}{TP + FP}
\end{equation}
Table [tab:1] compares the performance of various ML and DL models with the proposed deep learning technique for LSD detection and classification. The comparison is based on five key performance metrics, i.e., Area Under the Curve (AUC), Classification Accuracy (CA), F1-score, Precision, and Recall. The K-Nearest Neighbors (KNN) model exhibits good performance across all metrics, achieving an AUC of 0.983, a classification accuracy of 0.961, and an F1-score of 0.961. Similarly, the Decision Tree model achieves an AUC of 0.962, with an identical accuracy, F1-score, precision, and recall of 0.962. The Random Forest model outperforms KNN and Decision Tree models, demonstrating an AUC of 0.995 and accuracy of 0.977, indicating its effectiveness in LSD detection. However, Support Vector Machine (SVM) significantly underperforms, with the lowest AUC (0.482), classification accuracy (0.432), and F1-score (0.379), suggesting its inefficiency in handling the dataset used for LSD classification. This could be attributed to the complexity of feature extraction and classification in LSD cases, where DL-based feature extraction may be more effective. The Neural Network model achieves an AUC of 0.992, an accuracy of 0.962, and an F1-score of 0.962, showing improved robustness compared to conventional ML models. NaĆÆve Bayes, on the other hand, performs slightly worse, with an AUC of 0.945 and accuracy of 0.883, which may be due to its assumption of feature independence that does not align well with LSD features. The Adaboost model performs competitively, with an AUC of 0.972 and an accuracy of 0.972, demonstrating its ability to boost weak classifiersā performance. However, despite its good performance, it does not outperform the proposed model.
| Metric | Value |
|---|---|
| Test Accuracy (%) | 78.57 |
| F1-Score Macro (%) | 72.36 |
| Precision Macro (%) | 77.25 |
| Recall Macro (%) | 70.68 |
| Average Inference Time (ms) | 814.67 |
| Frames Per Second (FPS) | 1.2 |
| Model Parameters | 4,887,332 |
| Number of Classes | 2 |
| Total Test Samples | 160 |
Case study - Model Evaluation: Test dataset
This improved performance may be attributed to DLās ability to extract rich, hierarchical information from photos of cattle impacted by LSD. The results presented in the table clearly demonstrate that our DL-based techniques perform better than traditional ML models for early LSD detection. The robustness of the suggested method in successfully detecting and categorizing LSD cases is implied by its high accuracy (99%) and exceptional AUC, F1-score, precision, and recall.
While the recall for healthy cattle is 1.0, indicating the modelās ability to catch all non-inflamed instances without any false negatives, the recall for inflamed livestock is 0.98, indicating that the model successfully recognizes 0.98% of inflamed cases, but it misses some. The modelās overall best performance is highlighted by the F1 score, which strikes a compromise between precision and recall, being 1.0 for healthy cattle and 0.99 for inflamed farm animals.
The Precision-Confidence Curve showed a value of 0.978, reflecting the modelās ability to precisely identify cattle with minimal false positives. Meanwhile, the Recall-Confidence Curve scored 0.93, demonstrating that the modelās predictions become increasingly reliable with higher confidence levels. The overall accuracy of the model is 99%, meaning that nearly all predictions were correct, and both the macro and weighted averages for precision, recall, and F1 score are also 0.99, showing consistently high performance across both classes, even with a slight class imbalance. The LUMPNet model proves to be highly effective in detecting LSD, with exceptional precision and recall, particularly in ensuring that when it predicts a cattle as infected, it is almost always correct, making it a reliable tool for this task. Fig 17 illustrates the training and validation accuracy of the LUMPNet model over 20 epochs in the task of detecting LSD in cattle. Both training and validation accuracy improve rapidly within the first few epochs, with training accuracy stabilizing around 99% and validation accuracy fluctuating slightly around 97-98%.
Conclusion and Future Work
The proposed work on detecting LSD in cattle through image data and AI demonstrates a comprehensive and effective workflow that integrates a novel AWDR optimizer with two major ML techniques. This work utilizes YOLOv11 for cattle detection and EfficientNet for accurately identifying LSD symptoms, particularly skin nodules, which are early indicators of the disease. The results show that the LUMPNet model performs exceptionally well, achieving 99% accuracy, high precision, recall, and F1 scores, confirming its reliability in detecting infected cattle. The systematic approach from dataset preparation to evaluation proves to be transformative for early disease detection, offering significant benefits to livestock management by enabling timely intervention and improving animal health outcomes. This research paves the way for a robust AI-driven solution in veterinary disease management, ensuring more efficient and accurate detection of diseases like LSD in cattle.
LUMPNet model introduces a hybrid deep learning framework that integrates AWDR optimizer and YOLOv11 with the SAM to achieve both object detection and fine-grained segmentation of LSD-affected cattle. Unlike standard YOLOv11, which detects objects using bounding boxes alone, this approach enhances precision by segmenting lesion areas after detection, making it particularly effective for medical image analysis. Additionally, the use of a Dynamic Filtering Layer (DFL) refines the detection outputs by focusing on high-confidence regions, a novel technique that significantly improves accuracy in overlapping or subtle visual cases. The model also leverages enhanced multi-scale feature fusion using custom convolutional layers, C2f blocks, and Spatial Pyramid Pooling (SPPF), which are tailored to detect lesions of case study. On the classification side, EfficientNet is fine-tuned through transfer learning on a domain-specific dataset. The use of compound scaling ensures a balanced trade-off between accuracy and efficiency, allowing for effective lesion classification even with limited training data. A key innovation of the research lies in the seamless integration of detection and classification.
Future work will focus on expanding the dataset across diverse cattle breeds and environments to improve model generalisability. Additionally, integrating temporal analysis for disease progression tracking and developing a mobile application for real-time, field-based detection will be explored. Further enhancements to the segmentation accuracy using transformer-based models and cross-domain validation with other livestock diseases are also planned to broaden the systemās applicability in veterinary diagnostics.
Data availability statement
The datasets generated and/or analysed in the current work are publicly available in the kaggleās repository āLumpy Skin Images Datasetā accessible at https://www.kaggle.com/datasets/warcoder/lumpy-skin-images-dataset .
š ė ¼ė¬ø ģź°ģė£ (Figures)


















