Quantization-Aware Neuromorphic Architecture for Skin Disease Classification on Resource-Constrained Devices
On-device skin lesion analysis is constrained by the compute and energy cost of conventional CNN inference and by the need to update models as new patient data become available. Neuromorphic processors provide event-driven sparse computation and support on-chip incremental learning, yet deployment is often hindered by CNN-to-SNN conversion failures, including non-spike-compatible operators and accuracy degradation under class imbalance. We propose QANA, a quantization-aware CNN backbone embedded in an end-to-end pipeline engineered for conversion-stable neuromorphic execution. QANA replaces conversion-fragile components with spike-compatible transformations by bounding intermediate activations and aligning normalization with low-bit quantization, reducing conversion-induced distortion that disproportionately impacts rare classes. Efficiency is achieved through Ghost-based feature generation under tight FLOP budgets, while spatially-aware efficient channel attention and squeeze-and-excitation recalibrate channels without heavy global operators that are difficult to map to spiking cores. The resulting quantized projection head produces SNN-ready logits and enables incremental updates on edge hardware without full retraining or data offloading. On HAM10000, QANA achieves 91.6% Top-1 accuracy and 91.0% macro F1, improving the strongest converted SNN baseline by 3.5 percentage points in Top-1 accuracy (a 4.0% relative gain) and by 12.0 points in macro F1 (a 15.2% relative gain). On a clinical dataset, QANA achieves 90.8% Top-1 accuracy and 81.7% macro F1, improving the strongest converted SNN baseline by 3.2 points in Top-1 accuracy (a 3.7% relative gain) and by 3.6 points in macro F1 (a 4.6% relative gain). When deployed on BrainChip Akida, QANA runs in 1.5 ms per image with 1.7 mJ per image, corresponding to 94.6% lower latency and 99.0% lower energy than its GPU-based CNN implementation.
💡 Research Summary
The paper introduces QANA (Quantization‑Aware Neuromorphic Architecture), a complete end‑to‑end system for on‑device skin lesion classification that bridges the gap between high‑accuracy convolutional neural networks (CNNs) and low‑power spiking neural networks (SNNs) on resource‑constrained neuromorphic hardware. The authors identify two major obstacles in deploying CNN‑based dermatology models on neuromorphic chips: (1) conversion‑induced accuracy loss caused by non‑spike‑compatible operators such as BatchNorm and global pooling, and (2) severe class imbalance that amplifies errors on rare lesion types after quantization and conversion.
To address these issues, QANA is built around a quantization‑aware CNN backbone that is explicitly designed to be conversion‑stable. The backbone consists of four stages of Ghost modules, which generate a compact set of primary features via a 1×1 pointwise convolution and then cheaply expand them with depthwise operations to produce “ghost” channels. This design yields a high representational capacity while keeping FLOPs under tight limits (≈120 M). Each stage incorporates a Spatially‑Aware Efficient Channel Attention (SA‑ECA) block and a lightweight Squeeze‑and‑Excitation (SE) module, providing channel‑wise recalibration without resorting to global pooling, which is difficult to map onto spiking cores.
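The Ghost-module idea described above can be sketched in NumPy. This is a simplified illustration of a single module with an expansion ratio of 2 (one cheap depthwise map per primary channel); the function names and shapes are our assumptions, not the paper's actual code:

```python
import numpy as np

def pointwise_conv(x, w):
    # x: (H, W, C_in), w: (C_in, C_out); a 1x1 convolution is a per-pixel matmul
    return x @ w

def depthwise_conv3x3(x, k):
    # x: (H, W, C), k: (3, 3, C); zero padding keeps the spatial size
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W, :] * k[i, j, :]
    return out

def ghost_module(x, w_primary, k_cheap):
    # Primary features come from the (relatively) expensive 1x1 conv;
    # "ghost" features are generated cheaply by a depthwise 3x3 over them.
    primary = np.maximum(pointwise_conv(x, w_primary), 0)   # ReLU
    ghost = np.maximum(depthwise_conv3x3(primary, k_cheap), 0)
    return np.concatenate([primary, ghost], axis=-1)
```

The FLOP saving comes from the depthwise step: each ghost channel costs only 9 multiply-adds per pixel, versus `C_in` multiply-adds per pixel for a full pointwise channel.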
Activation functions are bounded with ReLU6, constraining outputs to the 0‑6 range. This bounded range aligns naturally with 8‑bit uniform integer quantization, reduces calibration sensitivity, and simplifies the subsequent spike‑encoding step. BatchNorm parameters are re‑parameterized into integer scale and offset terms so that the entire graph can be expressed in integer arithmetic before conversion.
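The interplay between ReLU6 and 8-bit quantization, and the BatchNorm folding, can be made concrete with a small sketch. This is a generic illustration of both techniques under our own simplifying assumptions (per-tensor affine quantization over the fixed [0, 6] range), not QANA's exact calibration procedure:

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    # Fold BN into the preceding linear layer so the graph needs no
    # separate normalization op: y = gamma * (Wx + b - mean) / std + beta
    std = np.sqrt(var + eps)
    w_f = w * (gamma / std)[:, None]           # per-output-channel rescale
    b_f = (b - mean) * (gamma / std) + beta
    return w_f, b_f

def relu6(x):
    # Bounded activation: outputs always lie in [0, 6]
    return np.clip(x, 0.0, 6.0)

def quantize_uint8(x, lo=0.0, hi=6.0):
    # Because ReLU6 fixes the activation range, a single static scale
    # maps it onto the 256 uint8 levels with no data-dependent calibration.
    scale = (hi - lo) / 255.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale

def dequantize(q, scale, lo=0.0):
    return q.astype(np.float32) * scale + lo
```

The worst-case round-trip error of this scheme is half a quantization step (3/255 ≈ 0.012), independent of the input distribution, which is why a bounded activation simplifies calibration.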
The data‑preprocessing pipeline first applies a deterministic Grad‑CAM‑based cropping to focus on the lesion region, resizing all images to 64 × 64 × 3 to meet the on‑chip memory budget of the BrainChip Akida processor. Standard augmentations (horizontal/vertical flips, luminance and contrast jitter) are applied with fixed random seeds for reproducibility. To mitigate the extreme class imbalance of the HAM10000 benchmark and a proprietary clinical dataset, the authors employ SMOTE not on raw pixels but on the high‑dimensional embeddings produced by the frozen backbone. After standardization and PCA whitening, k‑nearest‑neighbor (k = 5) relationships are used to select "safe" minority anchors. Synthetic embeddings are generated by convex interpolation between an anchor and a neighbor, plus a small perturbation drawn from the anchor's local principal components. The synthetic vectors are inverse‑whitened back to the original feature space and injected directly into the classifier head during training, leaving the backbone untouched. This strategy enriches minority‑class representations without introducing visual artifacts.
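The embedding-space SMOTE step can be sketched as follows. This is a simplified version under our own assumptions: it standardizes rather than PCA-whitens, and it omits the local principal-component perturbation, keeping only the k-nearest-neighbor convex interpolation that is the core of the technique:

```python
import numpy as np

def embedding_smote(emb, n_synthetic, k=5, seed=None):
    # emb: (N, D) minority-class embeddings from the frozen backbone.
    # Standardize, interpolate between kNN pairs, then invert the transform.
    rng = np.random.default_rng(seed)
    mu, sigma = emb.mean(axis=0), emb.std(axis=0) + 1e-8
    z = (emb - mu) / sigma                       # simplified whitening
    # Pairwise distances -> k nearest neighbours for each anchor
    d = np.linalg.norm(z[:, None] - z[None, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]       # column 0 is the point itself
    synth = []
    for _ in range(n_synthetic):
        a = rng.integers(len(z))                 # random anchor
        b = nn[a, rng.integers(k)]               # random neighbour of the anchor
        lam = rng.random()                       # convex interpolation weight
        synth.append(z[a] + lam * (z[b] - z[a]))
    return np.array(synth) * sigma + mu          # inverse transform
```

Because the interpolation is convex and the (inverse) standardization is affine, every synthetic embedding stays within the per-dimension range of the real minority embeddings, so no out-of-distribution outliers are injected into the classifier head.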
The quantized projection head maps the 4 × 4 × C feature map (C ≈ 256) to class logits using an 8‑bit integer linear layer. Because all preceding layers already satisfy integer constraints, the BrainChip CNN2SNN conversion toolkit can generate a deterministic spiking graph with no floating‑point operations. Spike thresholds and integration windows are calibrated on‑chip using a small validation set, ensuring that the spiking dynamics faithfully reproduce the original CNN logits.
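An 8-bit integer linear head of this kind can be illustrated with a generic sketch: int8 symmetric weights, uint8 activations, int32 accumulation, and logits recovered by multiplying the two scales. The quantization scheme here is a standard one chosen for illustration, not necessarily the exact scheme used by the CNN2SNN toolkit:

```python
import numpy as np

def quantize_weights(w, num_bits=8):
    # Symmetric per-tensor quantization of float weights to int8.
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def int_linear_head(x_q, x_scale, w_q, w_scale):
    # x_q: uint8 activations (e.g. the flattened 4x4xC feature map),
    # w_q: int8 weights. Integer matmul with int32 accumulation; the
    # float logits are recovered by rescaling with the product of scales.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T
    return acc.astype(np.float32) * (x_scale * w_scale)
```

The key property is that the only floating-point operation is the final per-logit rescale, which a conversion toolkit can fold into the spike-threshold calibration.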
A key contribution is the incremental learning protocol. When new patient data become available, only the projection head (the read‑out layer) is fine‑tuned for a few epochs using a mixture of real and synthetic embeddings. The backbone remains frozen, allowing rapid on‑chip adaptation without full retraining or off‑device data transfer, which is crucial for privacy‑sensitive medical applications.
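The head-only update can be sketched as plain gradient descent on a softmax read-out over frozen embeddings. This is a minimal illustration of the protocol's structure (the paper's on-chip optimizer and hyperparameters are not specified here, so the learning rule below is our assumption):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def finetune_head(W, emb, labels, lr=0.1, epochs=5):
    # Update only the read-out weights W (n_classes, D) on a mixture of
    # real and synthetic embeddings; the backbone that produced `emb`
    # stays frozen, so no full retraining or data offloading is needed.
    n, n_classes = len(emb), W.shape[0]
    Y = np.eye(n_classes)[labels]                  # one-hot targets
    for _ in range(epochs):
        P = softmax(emb @ W.T)                     # (n, n_classes) probabilities
        grad = (P - Y).T @ emb / n                 # cross-entropy gradient
        W = W - lr * grad
    return W
```

Because only `W` changes, an update touches a single integer layer on the chip, which is what makes the few-epoch, on-device adaptation cheap.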
Experimental results demonstrate that QANA achieves 91.6 % Top‑1 accuracy and 91.0 % macro F1 on the public HAM10000 benchmark, surpassing the strongest converted SNN baseline by 3.5 percentage points (4 % relative gain) in Top‑1 and 12 points (15.2 % relative gain) in macro F1. On a separate clinical dataset, QANA reaches 90.8 % Top‑1 and 81.7 % macro F1, again improving the best SNN baseline by 3.2 points (3.7 % relative) and 3.6 points (4.6 % relative), respectively.
When deployed on the BrainChip Akida neuromorphic processor, QANA processes an image in 1.5 ms while consuming only 1.7 mJ per inference. Compared with the same model executed on an NVIDIA RTX 3090 GPU, this represents a 94.6 % reduction in latency and a 99.0 % reduction in energy consumption. The model contains fewer than 1 M parameters and occupies less than 2 MB of on‑chip memory, confirming its suitability for edge devices with severe resource constraints.
In summary, the paper makes five intertwined contributions: (1) a quantization‑aware, spike‑compatible CNN backbone that eliminates conversion‑fragile components; (2) Ghost‑based feature generation that maximizes efficiency under tight FLOP budgets; (3) lightweight channel attention mechanisms that avoid global operators; (4) embedding‑space SMOTE for robust class‑balance handling; and (5) a practical on‑chip incremental learning workflow that updates only the read‑out layer. By jointly addressing accuracy, efficiency, and adaptability, QANA sets a new benchmark for neuromorphic dermatology inference and opens the door to privacy‑preserving, real‑time skin cancer screening on portable hardware.