Hybrid Classical-Quantum Transfer Learning with Noisy Quantum Circuits

D. Martín-Pérez¹, F. Rodríguez-Díaz¹, D. Gutiérrez-Avilés², A. Troncoso¹, F. Martínez-Álvarez¹

¹ Data Science and Big Data Lab, Pablo de Olavide University, Seville, Spain
² Department of Computer Science, University of Seville, Seville, Spain
{dmarper2, froddia, atrolor, fmaralv}@upo.es, dgutierrez3@us.es

Date: March 19, 2026.

Abstract. Quantum transfer learning combines pretrained classical deep learning models with quantum circuits to reuse expressive feature representations while limiting the number of trainable parameters. In this work, we introduce a family of compact quantum transfer learning architectures that attach variational quantum classifiers to frozen convolutional backbones for image classification. We instantiate and evaluate several classical-quantum hybrid models implemented in PennyLane and Qiskit, and systematically compare them with a classical transfer-learning baseline across heterogeneous image datasets. To ensure a realistic assessment, we evaluate all approaches under both ideal simulation and noisy emulation using noise models calibrated from IBM quantum hardware specifications, as well as on real IBM quantum hardware. Experimental results show that the proposed quantum transfer learning architectures achieve competitive and, in several cases, superior accuracy while consistently reducing training time and energy consumption relative to the classical baseline. Among the evaluated approaches, PennyLane-based implementations provide the most favorable trade-off between accuracy and computational efficiency, suggesting that hybrid quantum transfer learning can offer practical benefits in realistic NISQ-era settings when feature extraction remains classical.

Keywords: quantum computing, transfer learning, hybrid neural networks, variational circuits.

1. Introduction

The rapid advancement of artificial intelligence (AI) and machine learning (ML) over the past decade has significantly transformed numerous sectors, including healthcare, industry, finance, and environmental management. One of the key elements behind these advances has been deep learning [22], particularly convolutional neural networks (CNNs), which have demonstrated exceptional capabilities in image recognition, natural language processing (NLP), and complex decision-making tasks. Despite their success, traditional deep learning models often suffer from critical limitations, such as the need for large annotated datasets, high computational demands, and substantial energy consumption during the training and deployment phases [40]. These constraints have motivated the exploration of alternative learning paradigms that can improve efficiency and sustainability without sacrificing predictive performance.

Transfer learning has emerged as an effective strategy to address some of these limitations. By reusing knowledge from models trained on large-scale datasets, transfer learning reduces both the data requirements and the computational cost of adapting models to new tasks [29]. Classical-classical (CC) transfer learning, in which pretrained neural networks are fine-tuned on smaller target datasets, has therefore become a standard approach across a wide range of applications. However, even in this setting, fine-tuning large architectures with millions of
parameters remains computationally expensive and energy-intensive, particularly when applied repeatedly across tasks [9].

In parallel with these developments, quantum machine learning (QML) [36] has attracted increasing attention as a potential paradigm for enhancing machine learning workflows. By exploiting quantum-mechanical phenomena such as superposition and entanglement, quantum computing promises computational advantages for specific problem classes. At present, however, quantum hardware operates in the noisy intermediate-scale quantum (NISQ) [30] regime, characterized by limited qubit counts, short coherence times, and restricted circuit depths. These constraints significantly limit the direct applicability of fully quantum algorithms to real-world machine learning problems [5].

To overcome these limitations, hybrid classical-quantum (CQ) approaches have been proposed. Within this context, quantum transfer learning (QTL), and in particular CQ transfer learning, has emerged as a promising direction. In QTL, pretrained classical neural networks act as fixed feature extractors, while compact quantum circuits are employed as trainable classification or regression heads. This architectural separation enables quantum models to operate on low-dimensional representations that retain the most informative features, improving parameter efficiency and mitigating current hardware constraints.

QTL remains comparatively underexplored from a systematic benchmarking perspective, particularly with respect to direct comparisons against well-established CC transfer-learning baselines under realistic experimental conditions. In this work, we address this gap by conducting a controlled, systematic evaluation of CC and CQ transfer learning pipelines, assessing their performance, efficiency, and robustness across heterogeneous image classification tasks from medical, biological, industrial, and general-vision domains. Our analysis exploits widely used pretrained CNN backbones and focuses on isolating the impact of the quantum component, comparing predictive accuracy, training time, and energy consumption across classical and hybrid models. In addition, we explicitly account for realistic quantum noise by incorporating hardware-calibrated noise models, thereby enabling a more accurate assessment of near-term quantum machine learning performance.

In summary, our work makes the following contributions to QTL:
(1) We introduce a family of new QTL architectures that take features from pretrained convolutional backbones and feed them into compact variational quantum classifiers, instantiated in both PennyLane and Qiskit.
(2) We design and apply a fair evaluation protocol that addresses classical feature extractors, spans heterogeneous image datasets, and incorporates realistic noise models calibrated on IBM hardware to compare ideal simulation with noisy emulation.
(3) We present a head-to-head comparison of several quantum-enhanced variants and a classical transfer learning baseline, reporting accuracy across multiple backbones and datasets.
(4) We analyze the effect of realistic noise on model performance and identify the need for error mitigation strategies in near-term devices.
(5) We release implementations and experimental configurations to enable reproducible benchmarking of hybrid CQ systems.

The remainder of the paper is structured as follows.
Section 2 reviews related work on transfer learning and hybrid quantum classifiers. Section 3 presents our methodology, detailing the proposed QTL architectures in PennyLane and Qiskit. Section 4 presents the experimental results, comparing the quantum variants with the classical baseline under ideal simulation and realistic noise conditions. Finally, Section 5 concludes with a discussion of implications and future directions.

2. Related Work

In this section, we review prior work on QTL, with particular emphasis on hybrid CQ architectures, application domains, and experimental evaluation practices under NISQ constraints.

A first line of research explores QTL for image-based classification tasks, particularly in medical and scientific domains. Bali et al. [3] introduce QuantumNet, a hybrid framework combining classical feature extractors with variational quantum classifiers for diabetic retinopathy detection. Mir et al. [24] similarly combine pretrained convolutional backbones with variational quantum classifiers, evaluating their approach across multiple quantum software frameworks. Mogalapalli et al. [26] study QTL for diverse image classification tasks, showing that the suitability of classical backbones depends strongly on dataset characteristics and training configurations. Azevedo et al. [2] further validate the applicability of QTL in medical imaging through experiments on real quantum hardware, demonstrating robustness beyond ideal simulation.

Beyond medical imaging, QTL has also been applied to general vision and industrial datasets. Kumsetty et al. [21] introduce the TrashBox dataset and evaluate hybrid CQ transfer learning models for waste classification, reporting improvements in both predictive performance and training efficiency. Otgonbaatar et al. [28] address the challenges of high-dimensional embeddings and limited qubit availability by combining pretrained VGG16 features with multiqubit quantum layers, showing improved generalization through strongly entangling circuits. Kati et al. [17] extend QTL to deepfake detection by integrating it with transformer-based vision architectures, demonstrating the compatibility of hybrid quantum models with modern attention-driven feature extractors.

A complementary body of work investigates QTL across non-visual modalities, highlighting its applicability beyond image classification. Qi et al. [31] propose a CQ transfer learning approach for spoken command recognition using pretrained audio representations. Wang et al. [47] apply QTL to synthetic speech detection by fine-tuning variational quantum circuits on top of large pretrained speech models. Koike et al. [20] extend QTL to wireless sensing to mitigate domain shift in human activity monitoring using 60-GHz Wi-Fi signals. In the NLP domain, Buonaiuto et al. [6] explore hybrid CQ classifiers for linguistic acceptability judgments, achieving results comparable to state-of-the-art classical models.

QTL has also been explored in sequential and time-series modeling tasks. Wang et al. [46] propose a layer-enhanced quantum LSTM combined with transfer learning for remaining useful life prediction in fuel cells, demonstrating improved predictive stability under limited data regimes. In a related but distinct setting, Han et al.
[14] introduce a quantum-inspired transfer learning approach for control optimization in wind turbines, achieving substantial runtime reductions compared to conventional optimization techniques.

Several works focus on architectural strategies to adapt quantum models to NISQ-era constraints. Kim et al. [19] integrate pretrained CNNs with quantum convolutional neural networks to reduce circuit depth while maintaining classification accuracy. Yogaraj et al. [48] propose post-variational classical QTL to alleviate optimization challenges in variational quantum circuits, improving robustness across multiple backbones and datasets. Khatun et al. [18] further investigate QTL architectures with an emphasis on adversarial robustness, demonstrating improved resilience compared to both classical and fully quantum alternatives.

From a broader perspective, several studies explore transfer learning in quantum or quantum-inspired settings beyond standard supervised classification. Zen et al. [50] use transfer learning to improve the scalability of neural-network quantum states in many-body physics, while Vermeire et al. [45] apply quantum-informed transfer learning to chemical property prediction by combining quantum-calculated and experimental data, achieving improved accuracy and generalization in data-limited regimes.

Despite these advances, existing QTL studies typically evaluate a limited number of architectures or datasets, often rely on idealized quantum simulations, and rarely report computational cost or energy consumption as first-class evaluation metrics. Moreover, comparisons across different quantum software frameworks are frequently conducted in isolation, making it difficult to draw general conclusions about practical performance trade-offs in realistic NISQ settings. In contrast, our work provides a controlled and systematic benchmarking of classical-classical and CQ transfer learning pipelines across multiple datasets, pretrained backbones, quantum architectures, and software frameworks. By incorporating realistic, hardware-calibrated noise models and explicitly measuring training time and energy consumption, we aim to assess QTL not only in terms of predictive accuracy but also from the perspective of practical deployment and sustainable computing.

3. Methodology

This work presents a comparative analysis of five image classification architectures, obtained by combining multiple pretrained classical backbones with three families of transfer-learning heads: a classical baseline and two quantum-hybrid variants, each of the latter evaluated in ideal and noisy configurations. Our objective is not to demonstrate quantum advantage, but to assess whether compact quantum heads can act as effective and efficient alternatives to classical classifiers under realistic NISQ constraints. To this end, we evaluate the impact of realistic quantum noise on hybrid model performance through controlled experiments, ensuring a fair comparison across all architectures.

Classical baselines employ pretrained convolutional networks as frozen backbones, with only their final classifier layers fine-tuned using cross-entropy loss and ImageNet normalization. These models establish the reference accuracy and energy profile without any quantum layer.
Qiskit-based quantum hybrids feed the same backbones into a SamplerQNN constructed with four qubits and a depth-three variational circuit, employing Hadamard initialization, angle encoding via RY gates, and brick-wall CNOT entanglement patterns alternating between even-odd qubit pairs [25]. Each variational layer applies parameterized RY rotations followed by entangling operations, with outputs interpreted through parity-based binary string classification. Optional shot sampling ($N_{\mathrm{shots}} = 1024$) and depolarizing noise (single-qubit: $p_{1q} = 0.001$, two-qubit: $p_{2q} = 0.01$) emulate realistic NISQ hardware [30].

PennyLane-based quantum hybrids project backbone features to a four-qubit quantum layer implemented with AngleEmbedding followed by BasicEntanglerLayers using a circular CNOT topology [4]. The circuit consists of three variational layers, each containing circular entanglement (qubits 0 → 1 → 2 → 3 → 0) and parameterized RY rotations, with Pauli-Z expectation values on all qubits replacing the classical fully connected head. The noisy variant applies realistic IBM noise channels (amplitude and phase damping) after each gate operation using the default.mixed device, with parameters calibrated from IBM Heron r2 specifications (T1 = 250 µs, T2 = 150 µs).

Across all scenarios, we freeze the convolutional parameters, standardize optimizer settings (Adam with lr = $10^{-3}$, StepLR scheduler with $\gamma = 0.9$), and log losses, accuracies, and confusion matrices to enable direct comparisons of classical and quantum-enhanced transfer learning.

3.1. Classical-Classical Transfer Learning. Our classical baseline employs transfer learning [29, 49, 51] using CNNs pretrained on ImageNet-1k [8, 37]. We evaluate several pretrained architectures that vary in design philosophy, parameter efficiency, and feature extraction capabilities [12].

In the training protocol, all backbone weights remain frozen, whereas only the final classification layer is optimized [49]. The classifier implements a simple linear transformation that maps extracted features to class logits, as defined in Eq. (1):

(1)    $f_{\mathrm{class}}(x) = W^{\top}\phi(x) + b$

where $\phi(x)$ represents the frozen backbone features and $W \in \mathbb{R}^{d \times C}$ defines the trainable weight matrix, with $d$ being the feature dimension and $C$ the number of classes.

Algorithm 1: Classical Transfer Learning Training
Input: Dataset D, Pretrained Model M_pre, Classes C
Output: Trained Classical Model M
1   Initialize M ← M_pre (e.g., ResNet18)
2   foreach parameter p ∈ M_backbone do
3       p.requires_grad ← False            // Freeze backbone
4   end
5   d_in ← GetInputFeatures(M_classifier)
6   M_classifier ← Linear(d_in, C)         // Replace head
7   while epoch ≤ max_epochs do
8       foreach batch (x, y) ∈ D_train do
9           f ← M_backbone(x)              // Extract features
10          logits ← M_classifier(f)
11          L ← CrossEntropy(logits, y)
12          Update M_classifier via gradient descent
13      end
14  end
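For concreteness, a minimal PyTorch sketch of Algorithm 1 is given below, using the hyperparameters of Section 4.3 (Adam, lr = 1e-3, StepLR with step 3 and γ = 0.9, batch size 16). The binary class count and the loop structure are illustrative assumptions rather than the exact experimental script.

```python
import torch
import torch.nn as nn
from torchvision import models

# Classical baseline (Algorithm 1): frozen ResNet18 backbone, trainable linear head.
model = models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                         # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)       # replace head (C = 2, trainable)

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=3, gamma=0.9)
loss_fn = nn.CrossEntropyLoss()

def train_epoch(loader):
    for x, y in loader:          # x: (B, 3, 224, 224) ImageNet-normalized images
        logits = model(x)
        loss = loss_fn(logits, y)
        opt.zero_grad()
        loss.backward()          # gradients flow only into the linear head
        opt.step()
    sched.step()
```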
3.2. Classical-Quantum PennyLane (Ideal). The PennyLane Standard architecture integrates a parameterized quantum circuit as the classification layer while retaining a frozen classical backbone for feature extraction [23, 26]. The hybrid pipeline transforms input images through the sequence given in Eq. (2):

(2)    $x \xrightarrow{\phi_{\mathrm{CNN}}} f \xrightarrow{W_{\mathrm{pre}}} z \xrightarrow{\tanh(\cdot)\cdot\pi/2} \theta \xrightarrow{\mathrm{QC}} \langle \hat{O}_i \rangle \xrightarrow{W_{\mathrm{post}}} y$

Classical features $f \in \mathbb{R}^d$ from the frozen backbone undergo a linear projection via $W_{\mathrm{pre}} \in \mathbb{R}^{n \times d}$ (with $d$ the backbone feature dimension) to match the quantum-layer dimensionality of $n = 4$ qubits. The hyperbolic tangent activation scaled by $\pi/2$ ensures that the angle-encoded inputs $\theta \in [-\pi/2, \pi/2]^n$ remain within valid rotation ranges. The parameterized quantum circuit $\mathrm{QC}(\theta, \phi)$ processes these encoded features, producing expectation values $\langle \hat{O}_i \rangle$ of Pauli-Z observables on all qubits. Finally, a post-quantum linear layer $W_{\mathrm{post}} \in \mathbb{R}^{C \times n}$ maps these quantum measurements to class logits $y$.

We employ a variational quantum circuit with $n = 4$ qubits and depth $D = 3$ [25, 15]. We select these numbers because the noise is lower than with more qubits while achieving the same or higher accuracy: we tested 1 to 16 qubits, and 4 qubits proved the best option. The circuit is implemented using PennyLane's AngleEmbedding and BasicEntanglerLayers templates. Classical features are first encoded into quantum states via single-qubit RY rotations [39], mapping the input data $\theta$ into quantum-state amplitudes via Eq. (3):

(3)    $\hat{U}_{\mathrm{enc}}(\theta) = \prod_{i=0}^{n-1} R_Y^{(i)}(\theta_i)$

The subsequent variational layers combine entanglement and parameterized transformations. Each of the $D = 3$ layers implements a circular CNOT pattern, creating ring-topology connectivity via Eq. (4):

(4)    $\hat{U}_{\mathrm{ent}} = \mathrm{CNOT}_{0,1} \cdot \mathrm{CNOT}_{1,2} \cdot \mathrm{CNOT}_{2,3} \cdot \mathrm{CNOT}_{3,0}$

This circular arrangement ensures maximal qubit connectivity [41], where the last qubit wraps around to entangle with the first, forming a complete ring. Following entanglement, trainable RY gates apply parameterized rotations [10] using Eq. (5):

(5)    $\hat{U}_{\mathrm{rot}}^{(l)}(\phi_l) = \prod_{i=0}^{n-1} R_Y^{(i)}(\phi_{l,i})$

where $\phi_l = (\phi_{l,0}, \phi_{l,1}, \phi_{l,2}, \phi_{l,3})$ represents the trainable parameters for layer $l$. The complete circuit combines these operations sequentially, as shown in Eq. (6):

(6)    $\hat{U}_{\mathrm{circuit}}(\theta, \{\phi_l\}) = \prod_{l=0}^{D-1} \left[ \hat{U}_{\mathrm{rot}}^{(l)}(\phi_l) \cdot \hat{U}_{\mathrm{ent}} \right] \cdot \hat{U}_{\mathrm{enc}}(\theta)$

Measurement extracts outputs as expectation values of the Pauli-Z operator on all qubits using Eq. (7):

(7)    $\langle \hat{O}_i \rangle = \langle \psi | \hat{Z}_i | \psi \rangle, \quad i = 0, 1, 2, 3$

yielding a 4-dimensional feature vector for the post-quantum classifier. The total number of trainable quantum parameters amounts to $n \times D = 4 \times 3 = 12$.

Circuit simulation employs PennyLane's default.qubit device [4], performing noiseless statevector evolution. Gradient computation uses the parameter-shift rule [25], given by Eq. (8):

(8)    $\frac{\partial \langle \hat{O} \rangle}{\partial \phi_i} = \frac{1}{2}\left[ \langle \hat{O} \rangle_{\phi_i + \pi/2} - \langle \hat{O} \rangle_{\phi_i - \pi/2} \right]$

This analytical approach enables exact gradient evaluation, facilitating efficient variational parameter optimization through standard backpropagation.

Algorithm 2: Classical-Quantum PennyLane (Ideal)
Input: Features f, Qubits n, Layers D
Output: Expectation Values ⟨Z⟩
1   Initialize device default.qubit
2   Define circuit U(θ, φ):
3       AngleEmbedding(θ)                  // Encode data
4       for l ← 1 to D do
5           BasicEntanglerLayers(φ_l)      // Circular CNOTs + rotations
6       end
7       return [⟨Z_0⟩, ..., ⟨Z_{n−1}⟩]
8   Forward Pass:
9       x_proj ← W_pre · f                 // Reduce dimension
10      θ ← tanh(x_proj) · π/2             // Scale to [−π/2, π/2]
11      q_out ← Execute U(θ, φ)
12      logits ← W_post · q_out

Figure 1. PennyLane ideal quantum circuit: four qubits prepared in $|0\rangle$, RY angle encoding $R_y(x_i)$, three layers of ring-CNOT entanglement with trainable rotations $R_y(\phi_{l,i})$, and Pauli-Z measurements $\langle Z_i \rangle$ on all qubits.
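The following minimal PennyLane/PyTorch sketch illustrates the quantum head of Eqs. (2)-(7) and Algorithm 2. The template names (AngleEmbedding, BasicEntanglerLayers) are those cited in the text; the wrapper class and its attribute names are our own illustrative choices, and the per-sample loop is a simplification rather than the exact experimental code.

```python
import math
import torch
import torch.nn as nn
import pennylane as qml

n_qubits, depth = 4, 3
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch", diff_method="parameter-shift")
def circuit(theta, weights):
    qml.AngleEmbedding(theta, wires=range(n_qubits), rotation="Y")   # Eq. (3)
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits),
                             rotation=qml.RY)                        # Eqs. (4)-(6)
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]     # Eq. (7)

class QuantumHead(nn.Module):
    """Linear projection -> angle scaling -> VQC -> linear classifier (Eq. 2)."""
    def __init__(self, in_features, n_classes):
        super().__init__()
        self.pre = nn.Linear(in_features, n_qubits)                 # W_pre
        self.q_weights = nn.Parameter(0.01 * torch.randn(depth, n_qubits))
        self.post = nn.Linear(n_qubits, n_classes)                  # W_post

    def forward(self, f):
        theta = torch.tanh(self.pre(f)) * math.pi / 2   # scale to [-pi/2, pi/2]
        q_out = torch.stack([torch.stack(circuit(t, self.q_weights))
                             for t in theta]).float()   # one circuit per sample
        return self.post(q_out)
```

The quantum head contributes only depth × n_qubits = 12 trainable circuit parameters, matching the count stated above.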
3.3. Classical-Quantum PennyLane (Noisy). To simulate realistic quantum hardware operating in the noisy intermediate-scale quantum era [30], we implement noise models based on calibration data from IBM Quantum devices (see Section 4.3.5 for hardware specifications). The PennyLane Noisy architecture employs a density-matrix simulator (default.mixed) to model decoherence and gate errors [4].

Noise Implementation. We compute damping probabilities using exponential-decay formulas derived from the measured coherence times. For single-qubit gates ($t^{(1q)}_{\mathrm{gate}} = 32$ ns), Eqs. (9)-(10):

(9)    $\gamma_{1q} = 1 - \exp\!\left(-\frac{t^{(1q)}_{\mathrm{gate}}}{T_1}\right) \approx 1.28 \times 10^{-4}$

(10)   $\lambda_{1q} = 1 - \exp\!\left(-\frac{t^{(1q)}_{\mathrm{gate}}}{T_2}\right) - \gamma_{1q} \approx 8.53 \times 10^{-5}$

Two-qubit CZ gates ($t^{(2q)}_{\mathrm{gate}} = 68$ ns) experience elevated error rates due to their longer execution durations, Eqs. (11)-(12):

(11)   $\gamma_{2q} = 1 - \exp\!\left(-\frac{t^{(2q)}_{\mathrm{gate}}}{T_1}\right) \approx 2.72 \times 10^{-4}$

(12)   $\lambda_{2q} = 1 - \exp\!\left(-\frac{t^{(2q)}_{\mathrm{gate}}}{T_2}\right) - \gamma_{2q} \approx 1.81 \times 10^{-4}$

These probabilities feed into Kraus operator representations for amplitude damping $\mathcal{E}_{\mathrm{AD}}(\gamma)$ and phase damping $\mathcal{E}_{\mathrm{PD}}(\lambda)$. The noisy circuit maintains a topology identical to PennyLane Standard (circular CNOT entanglement, depth $D = 3$), but inserts these noise channels after every quantum operation to physically model the relaxation and dephasing inherent to superconducting transmon qubits.

Algorithm 3: Classical-Quantum PennyLane (Noisy)
Input: Features f, Noise Params Γ_1q, Λ_1q, Γ_2q, Λ_2q
1   Initialize device default.mixed
2   Define noisy circuit U_noisy(θ, φ):
3       AngleEmbedding(θ)
4       Apply E_AD(Γ_1q) ∘ E_PD(Λ_1q) on all qubits
5       for l ← 1 to D do
6           foreach CNOT(q_i, q_j) in entangler do
7               Apply CNOT(q_i, q_j)
8               Apply E_AD(Γ_2q) ∘ E_PD(Λ_2q) on q_i, q_j
9           end
10          foreach rotation R_Y on q_i do
11              Apply R_Y(φ_{l,i})
12              Apply E_AD(Γ_1q) ∘ E_PD(Λ_1q) on q_i
13          end
14      end
15      return [⟨Z_0⟩, ..., ⟨Z_{n−1}⟩]
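A minimal sketch of how such channels can be interleaved in PennyLane is shown below, assuming the damping probabilities of Eqs. (9)-(12); the helper name noisy_layer-style structure is illustrative, not the exact experimental implementation.

```python
import pennylane as qml

n_qubits, depth = 4, 3
GAMMA_1Q, LAMBDA_1Q = 1.28e-4, 8.53e-5    # Eqs. (9)-(10)
GAMMA_2Q, LAMBDA_2Q = 2.72e-4, 1.81e-4    # Eqs. (11)-(12)

dev = qml.device("default.mixed", wires=n_qubits)   # density-matrix simulator

def damp(wire, gamma, lam):
    # Amplitude damping (T1 relaxation) followed by phase damping (T2 dephasing)
    qml.AmplitudeDamping(gamma, wires=wire)
    qml.PhaseDamping(lam, wires=wire)

@qml.qnode(dev)
def noisy_circuit(theta, weights):
    for i in range(n_qubits):                        # angle encoding + 1q noise
        qml.RY(theta[i], wires=i)
        damp(i, GAMMA_1Q, LAMBDA_1Q)
    for l in range(depth):
        for i in range(n_qubits):                    # ring entanglement + 2q noise
            j = (i + 1) % n_qubits
            qml.CNOT(wires=[i, j])
            damp(i, GAMMA_2Q, LAMBDA_2Q)
            damp(j, GAMMA_2Q, LAMBDA_2Q)
        for i in range(n_qubits):                    # trainable rotations + 1q noise
            qml.RY(weights[l][i], wires=i)
            damp(i, GAMMA_1Q, LAMBDA_1Q)
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]
```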
3.4. Classical-Quantum Qiskit (Ideal). The Qiskit-based architecture is implemented using the Qiskit Machine Learning framework [34]. We distinguish between the ideal and noisy configurations by selecting the appropriate Qiskit primitives for the neural-network backend. In the ideal setting, we employ Qiskit's EstimatorQNN to compute exact expectation values of the observable. This choice eliminates stochastic sampling noise, yielding a clean gradient signal for evaluating the theoretical upper bound of the model's expressive capacity. Conversely, for noisy emulation we use SamplerQNN. This shift is methodologically grounded in the requirement to simulate realistic hardware behavior: while the Estimator provides analytical means, the Sampler reconstructs the quasi-probability distribution from finite counts of bitstrings (shots). This allows the model to incorporate readout errors and shot-noise fluctuations, which are intrinsic to classification tasks in the NISQ era [30]. By using SamplerQNN in the noisy regime, we ensure that the classification head operates on the same measurement statistics that would be obtained from a physical IBM Quantum processor.

The circuit architecture employs a brick-wall entangling pattern combined with Hadamard initialization. Hadamard gates first create equal superposition states across all qubits, Eq. (13):

(13)   $|\psi_0\rangle = \bigotimes_{i=0}^{n-1} H^{(i)} |0\rangle = \frac{1}{\sqrt{2^n}} \sum_{j=0}^{2^n - 1} |j\rangle$

where each Hadamard gate $H^{(i)}$ prepares qubit $i$ in uniform superposition, creating an equal probability distribution across all $2^n$ computational basis states. Classical data embedding follows through RY rotations [39], Eq. (14):

(14)   $\hat{U}_{\mathrm{enc}}(\theta) = \prod_{i=0}^{n-1} R_Y^{(i)}(\theta_i)$

where each rotation $R_Y^{(i)}(\theta_i) = \exp(-i\theta_i Y/2)$ encodes a classical feature $\theta_i$ through a rotation around the Y axis.

Variational layers employ parameterized brick-wall entangling blocks [25]. For each layer $l \in \{0, 1, 2\}$, the transformation combines alternating CNOT patterns with trainable rotations, Eq. (15):

(15)   $\hat{U}_{\mathrm{var}}^{(l)} = \hat{U}_{\mathrm{rot}}(\phi_l) \cdot \hat{U}_{\mathrm{ent}}^{\mathrm{odd}} \cdot \hat{U}_{\mathrm{ent}}^{\mathrm{even}}$

The even CNOT layer entangles adjacent qubit pairs (0, 1) and (2, 3), Eq. (16):

(16)   $\hat{U}_{\mathrm{ent}}^{\mathrm{even}} = \mathrm{CNOT}_{0,1} \cdot \mathrm{CNOT}_{2,3}$

while the odd layer targets the intermediate pair (1, 2), Eq. (17):

(17)   $\hat{U}_{\mathrm{ent}}^{\mathrm{odd}} = \mathrm{CNOT}_{1,2}$

This alternating pattern ensures efficient entanglement distribution [41], thereby establishing connectivity between all qubit pairs within the layered structure. Trainable rotations following entanglement apply parameterized RY gates, Eq. (18):

(18)   $\hat{U}_{\mathrm{rot}}(\phi_l) = \prod_{i=0}^{n-1} R_Y^{(i)}(\phi_{l,i})$

The complete circuit combines initialization, encoding, and variational layers as specified in Eq. (19):

(19)   $\hat{U}_{\mathrm{Qiskit}}(\theta, \{\phi_l\}) = \prod_{l=0}^{D-1} \hat{U}_{\mathrm{var}}^{(l)}(\phi_l) \cdot \hat{U}_{\mathrm{enc}}(\theta) \cdot \prod_{i=0}^{n-1} H^{(i)}$

with the total number of trainable parameters reaching $n \times D = 4 \times 3 = 12$.

Figure 2. Qiskit quantum circuit employing a brick-wall entangling pattern (4 qubits, depth 3): Hadamard initialization, $R_y(x_i)$ angle encoding, and three variational layers of brick-wall CNOTs with trainable rotations $R_y(\phi_{l,i})$.

All $n$ qubits undergo measurement in the computational basis, producing binary strings $b \in \{0,1\}^n$. The interpretation function maps binary strings to class labels by computing the Hamming weight, Eq. (20):

(20)   $f_{\mathrm{interpret}}(b) = \mathrm{HammingWeight}(b) \bmod C$

where $\mathrm{HammingWeight}(b) = \sum_{i=0}^{n-1} b_i$ counts the number of ones in the binary string and $C$ denotes the number of classes.
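The sketch below builds the brick-wall ansatz of Eqs. (13)-(19) and wires it into a SamplerQNN with the Hamming-weight interpret function of Eq. (20). It is a minimal illustration assuming $C = 2$ and the Qiskit ≥ 1.0 / qiskit-machine-learning 0.8 APIs listed in Table 2; parameter-vector names are our own.

```python
from qiskit import QuantumCircuit
from qiskit.circuit import ParameterVector
from qiskit_machine_learning.neural_networks import SamplerQNN

n_qubits, depth, n_classes = 4, 3, 2

theta = ParameterVector("x", n_qubits)            # encoded input features
phi = ParameterVector("phi", n_qubits * depth)    # trainable circuit weights

qc = QuantumCircuit(n_qubits)
qc.h(range(n_qubits))                             # Eq. (13): uniform superposition
for i in range(n_qubits):
    qc.ry(theta[i], i)                            # Eq. (14): angle encoding
for l in range(depth):                            # Eqs. (15)-(18): brick-wall layers
    qc.cx(0, 1); qc.cx(2, 3)                      # even pairs
    qc.cx(1, 2)                                   # odd pair
    for i in range(n_qubits):
        qc.ry(phi[l * n_qubits + i], i)
qc.measure_all()                                  # computational-basis readout

def interpret(bitstring_int):
    # Eq. (20): Hamming weight of the measured bitstring, modulo the class count
    return bin(bitstring_int).count("1") % n_classes

qnn = SamplerQNN(
    circuit=qc,
    input_params=list(theta),
    weight_params=list(phi),
    interpret=interpret,
    output_shape=n_classes,
)
```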
The SamplerQNN outputs probability distributions over classes by aggregating the binary strings that map to identical labels, Eq. (21):

(21)   $P(c) = \sum_{b : f(b) = c} |\langle b | \psi(\theta, \phi) \rangle|^2$

The standard noiseless version employs Qiskit's Sampler primitive with exact statevector simulation [33], yielding deterministic outputs without shot noise or decoherence.

Algorithm 4: Classical-Quantum Qiskit (Ideal)
Input: Features f, Qubits n = 4, Depth D = 3
1   Initialize EstimatorQNN with gradient support
2   Define circuit QC:
3       Apply Hadamard to all q
4       Apply R_Y(θ)                       // Feature map
5       for l ← 1 to D do
6           Apply brick-wall CNOTs
7           Apply R_Y(φ_l)                 // Variational
8       end
9   Define observables O = {ZIII, IZII, ...}
10  Forward Pass:
11      θ ← tanh(W_in · f) · π/2
12      exp_vals ← Estimator.run(QC, θ, O)
13      return exp_vals

3.5. Classical-Quantum Qiskit (Noisy). The Qiskit Noisy architecture employs the Qiskit Aer simulator [32], incorporating depolarizing noise channels and finite-shot measurements.

Justification for noise model selection. To ensure a fair comparison between frameworks, both the PennyLane and Qiskit implementations use identical underlying noise parameters derived from Table 4. While PennyLane implements amplitude and phase damping channels (physically accurate for superconducting transmon qubits), Qiskit employs thermal relaxation combined with depolarizing noise. Both approaches are calibrated to achieve equivalent circuit fidelity for depth-3, 4-qubit circuits, as expressed in Eq. (22):

(22)   $F_{\mathrm{target}} \approx 0.94\text{–}0.96$

This high fidelity range reflects the significantly improved performance of Heron r2 processors.

Qiskit Noise Calibration. Qiskit implements thermal relaxation errors using the same $T_1$ and $T_2$ values from Table 4, combined with depolarizing channels calibrated to match Heron r2 gate error rates, Eqs. (23)-(24):

(23)   $p_{1q} = 0.0002 \quad (0.02\%\ \text{single-qubit error rate})$

(24)   $p_{2q} = 0.005 \quad (0.5\%\ \text{two-qubit error rate})$

These values are identical to the gate error rates used in PennyLane, ensuring equivalent noise intensity across frameworks.

Fidelity equivalence verification. With unified noise parameters based on the Heron r2 specifications, both frameworks aim for equivalent circuit fidelity. For a depth-3, 4-qubit circuit (approximately 28 gates in total), we estimate via Eq. (25):

(25)   $F_{\mathrm{PL}} \approx F_{\mathrm{QK}} \approx 0.94\text{–}0.96$

This higher fidelity range reflects the improved coherence and gate quality of Heron processors compared to earlier generations. The equivalence ensures that observed performance differences reflect framework-specific factors rather than noise-intensity discrepancies.

Shot Noise and Sampling. Unlike the density-matrix simulation used in PennyLane, Qiskit Noisy employs finite-shot sampling ($N_{\mathrm{shots}} = 1024$). This introduces statistical noise independent of gate errors, as given by Eq. (26):

(26)   $\sigma_{\mathrm{shot}}[\hat{P}(b)] \approx \frac{1}{\sqrt{N_{\mathrm{shots}}}} \approx 3.1\%$

This sampling variance adds to the depolarizing noise, creating a two-fold stochasticity (gate errors plus measurement uncertainty). We acknowledge this asymmetry as inherent to the framework architectures: PennyLane computes expectation values analytically via trace operations, whereas Qiskit samples computational-basis states, mimicking hardware behavior. This distinction allows us to compare an upper-bound performance limit (PennyLane) against a realistic deployment scenario (Qiskit).
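As an illustration, a compact Aer noise model along these lines can be assembled as follows. The parameter values mirror Table 4 and Eqs. (23)-(24); the builder function itself is our own sketch under those assumptions, not the exact experimental script.

```python
from qiskit_aer.noise import (NoiseModel, depolarizing_error,
                              thermal_relaxation_error)

T1, T2 = 250e-6, 150e-6      # Heron r2 coherence times (s), Table 4
T_1Q, T_2Q = 32e-9, 68e-9    # gate durations (s), Table 4
P_1Q, P_2Q = 0.0002, 0.005   # depolarizing rates, Eqs. (23)-(24)

def build_noise_model():
    nm = NoiseModel()
    # Single-qubit gates: thermal relaxation composed with a depolarizing channel
    err_1q = thermal_relaxation_error(T1, T2, T_1Q).compose(
        depolarizing_error(P_1Q, 1))
    nm.add_all_qubit_quantum_error(err_1q, ["ry", "h"])
    # Two-qubit gates: per-qubit relaxation in tensor product, plus depolarizing
    relax_2q = thermal_relaxation_error(T1, T2, T_2Q).tensor(
        thermal_relaxation_error(T1, T2, T_2Q))
    err_2q = relax_2q.compose(depolarizing_error(P_2Q, 2))
    nm.add_all_qubit_quantum_error(err_2q, ["cx", "cz"])
    return nm

noise_model = build_noise_model()   # pass to AerSimulator(noise_model=...)
```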
Algorithm 5: Classical-Quantum Qiskit (Noisy)
Input: Features f, Backend B, Shots N = 1024
Output: Class Probabilities
1   Build NoiseModel from backend B calibration
2   Initialize SamplerQNN with N shots and NoiseModel
3   Define interpret function I(b) = Hamming(b) mod C
4   Forward Pass:
5       θ ← ScaleFeatures(f)
6       Sample binary strings b ∼ |ψ(θ, φ)|² (N times)
7       Compute raw counts {b : count}
8       Map counts to classes using I(b)
9       P(c) ← Σ_{b : I(b)=c} count(b) / N
10      return P(c)

As a summary of the methodological setup, Figure 3 presents a schematic overview of the five evaluated architectures, highlighting the shared feature extraction stage and the classical and quantum transfer learning heads considered in this work.

Figure 3. Overview of the five evaluated architectures: classical baseline and four hybrid classical-quantum variants. (a) Classical-Classical: input image → frozen backbone → linear classifier → class logits. (b) CQ-PennyLane (Ideal): frozen backbone → linear (d → n_qubits) → VQC on default.qubit → linear (n_qubits → C) → class logits. (c) CQ-PennyLane (Noisy): as (b), with noise channels on default.mixed calibrated to IBM Heron r2. (d) CQ-Qiskit (Ideal): frozen backbone → linear (d → n_qubits) → SamplerQNN (StatevectorSampler) → linear (2 → C) → class logits. (e) CQ-Qiskit (Noisy): as (d), with a noise model on AerSampler calibrated to IBM Heron r2.

4. Results

In this section, we describe the datasets selected for the study (Section 4.1). We also detail the metric used to evaluate the models (Section 4.2). The complete experimental setup is also presented, including the pretrained models employed, the hardware and software environment, and all relevant hyperparameters; additionally, we outline the configuration of the quantum circuit (Section 4.3). Finally, we compare the obtained results in terms of both execution time and accuracy (Section 4.4).

4.1. Datasets. We evaluate four binary image datasets of heterogeneous difficulty: Hymenoptera (ants vs. bees), Brain Tumor MRI (tumor vs. no tumor), Cats vs. Dogs, and Solar Dust (defect vs. normal). All images are resized to 224 × 224 and preprocessed with ImageNet statistics. The training set is split 80% for training and 20% for validation; the held-out test set is used only for final reporting.

Table 1 provides a comparative overview of the four datasets, highlighting their diversity in scale, domain, and complexity.

Table 1. Comprehensive dataset characteristics summary.
Dataset | Train | Test | Total | Difficulty | Domain
Hymenoptera [44] | 245 | 153 | 398 | Medium | Entomology
Brain Tumor [27] | 2000 | 400 | 2400 | High | Medical Imaging
Cats vs. Dogs [7] | 3000 | 600 | 3600 | Medium-Low | General Vision
Solar Dust [11] | 240 | 60 | 300 | High | Industrial
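For reference, the preprocessing just described corresponds to a standard torchvision pipeline along the following lines; the directory path and the use of random_split for the 80/20 validation split are illustrative assumptions.

```python
import torch
from torchvision import datasets, transforms

# 224x224 resize + ImageNet normalization, as described in Section 4.1
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

full_train = datasets.ImageFolder("data/hymenoptera/train", transform=tfm)
n_train = int(0.8 * len(full_train))              # 80/20 train/validation split
train_set, val_set = torch.utils.data.random_split(
    full_train, [n_train, len(full_train) - n_train])

train_loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=16)
```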
4.2. Metrics. Our metric is test accuracy, defined in Eq. (27):

(27)   $\mathrm{Acc} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(\hat{y}_i = y_i)$

with $N$ test samples, true labels $y_i$, and predictions $\hat{y}_i$, where $\mathbb{1}(\cdot)$ denotes the indicator function. We also track training time $t_s$ (in seconds) as an efficiency measure. Unless stated otherwise, all runs reported here use 10 epochs and are executed on the CPU. Accuracy values are reported in [0, 1]. For readability, we discuss percentage points when comparing approaches.

4.3. Experimental Setup. Across all experiments, we freeze the convolutional backbone and train only the head (a classical linear layer or a quantum head). Common hyperparameters: Adam ($\eta = 10^{-3}$, batch size 16), 10 epochs. Quantum settings: 4 qubits and depth 3. For Qiskit Standard (SamplerQNN), we use 1024 shots and a depolarizing noise model ($p_{1q} = 0.0002$, $p_{2q} = 0.005$), calibrated to match Heron r2 gate error rates; the no-noise variant disables both shots and noise. PennyLane Standard uses default.qubit (noiseless). PennyLane Noisy follows realistic device-like damping/dephasing as in Section 3.3.

4.3.1. CNN Pretrained Backbones. We evaluate four backbones: ResNet-18 [16] contains 11.7M parameters with residual connections, producing 512-dimensional features. MobileNetV2 [38] employs 3.5M parameters using depthwise separable convolutions, yielding 1280-dimensional features. EfficientNet-B0 [43] integrates 5.3M parameters through compound scaling, generating 1280-dimensional features. RegNet-X-400MF [35] comprises 5.2M parameters in a NAS-designed architecture, extracting 400-dimensional features.

4.3.2. Hardware and Software Environment. All experiments were executed on a standardized computing platform, summarized in Table 2. CPU-only execution was deliberately chosen to eliminate GPU-related variability and ensure consistent timing measurements across quantum and classical components. All quantum circuit simulations were run single-threaded, whereas the classical backbones relied on PyTorch's default CPU parallelization.

Table 2. Hardware and software environment used in all experiments.
Component | Specification
CPU | AMD Ryzen 5 3600 (3.6 GHz, 6 cores)
RAM | 24 GB DDR4-3200
Operating system | Windows 10 Pro
Python | 3.13.5
PyTorch | 2.1.0+cpu
TorchVision | 0.16.0
PennyLane | 0.35
Qiskit | ≥ 1.0.0, < 2.0.0
Qiskit Machine Learning | 0.8.0
Qiskit Aer | 0.13.1

4.3.3. Training Hyperparameters. We adopt identical hyperparameters across all five architectures to isolate the impact of the quantum components; Table 3 lists the unified settings. The StepLR scheduler reduces the learning rate by a factor $\gamma = 0.9$ every 3 epochs, providing gentle annealing without aggressive decay. This conservative schedule proved effective across all datasets without requiring dataset-specific tuning.

Table 3. Unified training hyperparameters for all architectures.
Parameter | Value
Optimizer | Adam
Learning rate (η) | 1 × 10⁻³
Adam β₁ | 0.9
Adam β₂ | 0.999
Adam ε | 1 × 10⁻⁸
Weight decay | 0 (frozen backbone)
LR scheduler | StepLR
Step size | 3 epochs
Gamma (γ) | 0.9
Batch size | 16
Total epochs | 10
Gradient clipping | None (stable gradients)
Quantum-specific settings:
Number of qubits (n) | 4
Circuit depth (D) | 3
PennyLane device (clean) | default.qubit
PennyLane device (noisy) | default.mixed
PennyLane gradient | Parameter-shift rule
Qiskit shots | 1024
Qiskit gradient | SPSA (2 evaluations)

4.3.4. Quantum Circuit Configuration. Both the PennyLane and Qiskit variants employ shallow variational quantum circuits with four qubits and depth three, yielding a total of 12 trainable quantum parameters ($n \times D = 4 \times 3 = 12$). This configuration was deliberately chosen to balance expressive capacity with noise robustness, reflecting practical constraints imposed by current NISQ hardware. Deeper or wider circuits were avoided to limit decoherence and ensure stable optimization during training.
Gradient computation follows framework-specific best practices. In the PennyLane implementations, gradients are evaluated analytically using the parameter-shift rule [25], which computes exact derivatives through a fixed number of circuit evaluations per parameter. This approach yields deterministic gradients and avoids additional stochasticity beyond that induced by quantum noise in the noisy simulations.

By contrast, the Qiskit-based models rely on Simultaneous Perturbation Stochastic Approximation (SPSA) for gradient estimation [42, 13]. SPSA approximates gradients via finite differences along randomly sampled perturbation directions, requiring only two circuit evaluations per optimization step. While this introduces controlled stochastic noise into the optimization process, it is well suited to shot-based quantum execution and aligns with standard practice in variational quantum algorithms implemented in the Qiskit ecosystem.

Although these gradient computation strategies differ substantially, both are consistent with the design philosophies of their respective frameworks [4, 33]. PennyLane is architected around hardware-compatible automatic differentiation, where the parameter-shift rule provides unbiased gradients at the cost of 2n circuit evaluations [4]. Conversely, Qiskit's common integration with SPSA addresses the high-variance noise floor of current IBM hardware by using a gradient-free stochastic approximation that remains effective under strict shot counts [13, 1]. Consequently, the observed differences in optimization behavior should be interpreted as intrinsic to the underlying toolchains rather than as artifacts of ad hoc methodological choices.

4.3.5. IBM Quantum Hardware Configuration. To validate our simulation results, we conducted experiments on real IBM Quantum hardware accessed through IBM Quantum Cloud services. Both the noisy simulations and the real hardware experiments use noise parameters calibrated from IBM Heron r2 processors, ensuring consistency across evaluation settings.

Hardware Specifications. Noise parameters for simulation are based on publicly available specifications for IBM Quantum Heron r2 processors (2024-2025). These processors feature improved coherence times and gate fidelities compared to earlier Falcon-generation devices. Table 4 presents the unified calibration parameters used across both the PennyLane and Qiskit implementations.

Table 4. IBM Heron r2 processor specifications used for noise calibration and real hardware experiments.
Parameter | Value | Usage
Coherence properties:
T1 (energy relaxation) | 250 µs | PL damping
T2 (dephasing time) | 150 µs | PL dephasing
Gate timing:
Single-qubit gate time | 32 ns | PL noise calc.
Two-qubit gate time (CZ) | 68 ns | PL noise calc.
Error rates:
Single-qubit gate error (p_1q) | 0.02% | Qiskit depol.
Two-qubit gate error (p_2q) | 0.5% | Qiskit depol.
Readout error (SPAM) | 1.2% | Both

For the PennyLane noisy simulations, we compute amplitude and phase damping probabilities from the coherence times using the exponential-decay formulas of Eqs. (9)-(12). For the Qiskit noisy simulations, we apply depolarizing channels with error rates matching the gate error specifications. This unified calibration ensures that both frameworks experience the same noise intensity, enabling a fair cross-platform comparison.
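The damping probabilities quoted in Eqs. (9)-(12) follow directly from the Table 4 values; a few lines of Python reproduce them:

```python
import math

T1, T2 = 250e-6, 150e-6        # coherence times (s), Table 4
t_1q, t_2q = 32e-9, 68e-9      # gate durations (s), Table 4

def damping_probs(t_gate):
    gamma = 1 - math.exp(-t_gate / T1)          # amplitude damping, Eqs. (9)/(11)
    lam = 1 - math.exp(-t_gate / T2) - gamma    # phase damping, Eqs. (10)/(12)
    return gamma, lam

print(damping_probs(t_1q))   # approx (1.28e-4, 8.53e-5)
print(damping_probs(t_2q))   # approx (2.72e-4, 1.81e-4)
```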
Real Hardware Configuration. For the real QPU experiments, we executed on ibm_torino, a 133-qubit IBM Heron r2 processor. Table 5 summarizes the execution configuration.

Table 5. Real quantum hardware execution configuration.
Parameter | Value
Backend | ibm_torino
Processor type | IBM Heron r2
Total qubits | 133
Qubits used | 4
Circuit depth (transpiled) | ~49
Shots per circuit | 100
Transpilation level | 1

The transpilation process maps the logical 4-qubit circuit to physical qubits on the device topology, inserting SWAP gates as necessary. The resulting transpiled circuit depth of approximately 49 gates reflects the overhead introduced by hardware connectivity constraints.

Circuit Batching and Gradient Estimation. Real hardware execution employs Qiskit's SamplerV2 primitive for efficient circuit batching, aggregating all samples in a training batch into a single job submission. For gradient computation, we use Simultaneous Perturbation Stochastic Approximation (SPSA) [42], which approximates gradients using only two circuit evaluations per step, Eq. (28):

(28)   $\hat{g}_k = \frac{L(\theta + c_k \Delta_k) - L(\theta - c_k \Delta_k)}{2 c_k \Delta_k}$

where $\Delta_k \in \{-1, +1\}^P$ is a random Bernoulli perturbation vector and $c_k = 0.3$ is the perturbation magnitude. This reduces hardware evaluations from 72 (parameter-shift) to 3 per batch (1 forward + 2 SPSA).

Training Protocol Adaptations. Due to QPU access constraints (queue wait times, daily quotas, execution costs), the real hardware experiments use: 5 epochs (vs. 10 in simulation), the Hymenoptera dataset only (a moderate size balancing significance with time), 100 shots per circuit, and an SPSA learning rate $\eta = 0.1$. These adaptations reflect the practical realities of NISQ hardware and should be considered when interpreting the results in Table 6.
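A minimal sketch of the SPSA estimator of Eq. (28) is given below, assuming a loss callable over the 12 quantum-head parameters; the constant $c_k = 0.3$ follows the text, while the function name is illustrative.

```python
import numpy as np

def spsa_gradient(loss, theta, c_k=0.3, rng=np.random.default_rng()):
    """Estimate the gradient of `loss` at `theta` via Eq. (28).

    Uses two loss evaluations regardless of the parameter count P.
    """
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # Bernoulli perturbation
    l_plus = loss(theta + c_k * delta)
    l_minus = loss(theta - c_k * delta)
    return (l_plus - l_minus) / (2.0 * c_k * delta)     # elementwise division

# Usage: one SPSA update of the quantum parameters (learning rate 0.1, per text)
# theta = theta - 0.1 * spsa_gradient(batch_loss, theta)
```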
4.4. Result Discussion. We report the mean training time per run (in seconds), averaged over the four backbones for 10 epochs, as summarized in Table 7.

Table 7. Average training time (s) per dataset and architecture (10 epochs, CPU).
Dataset | CC | PennyLane | PL-Noisy | Qiskit | Qiskit-Noisy
Hymenoptera | 136.4 | 94.8 | 154.6 | 290.4 | 336.3
Brain Tumor | 665.7 | 442.2 | 728.3 | 1458.4 | 1685.4
Cats vs. Dogs | 1024.5 | 665.2 | 1102.0 | 2169.1 | 2507.2
Solar Dust | 309.5 | 240.1 | 397.2 | 544.6 | 629.6

4.4.1. Hymenoptera. Table 6 presents the per-backbone results on the Hymenoptera dataset, where the quantum heads demonstrate their largest advantage over the classical baseline.

Table 6. Hymenoptera: test accuracy and (training time in seconds) per backbone.
Backbone | CC | PennyLane | PL-Noisy | PL-Real | Qiskit | Qiskit-Noisy | Qiskit-Real
ResNet18 | 0.8589 (148.9) | 0.9744 (107.2) | 0.9615 (178.4) | 0.8421 (704) | 0.9359 (305.7) | 0.9295 (352.1) | 0.9487 (559)
MobileNetV2 | 0.8846 (136.7) | 0.9487 (80.7) | 0.9423 (142.8) | 0.8684 (685) | 0.9744 (288.2) | 0.9551 (331.6) | 0.9103 (681)
EfficientNet-B0 | 0.9231 (178.9) | 0.9487 (107.9) | 0.9359 (168.5) | 0.6579 (682) | 0.9231 (297.4) | 0.9103 (348.9) | 0.7179 (545)
RegNet-X-400MF | 0.8590 (81.0) | 0.9231 (83.4) | 0.9103 (128.7) | 0.7895 (673) | 0.9103 (270.2) | 0.8974 (312.4) | 0.8590 (511)

This dataset exhibits the largest accuracy gap between quantum-hybrid and classical approaches. PennyLane-Standard with ResNet18 achieves 0.9744, outperforming the best classical result (EfficientNet-B0, 0.9231) by over 5%. The same-backbone comparison is even more striking: PennyLane improves ResNet18 from 0.8589 to 0.9744, MobileNetV2 from 0.8846 to 0.9487, and RegNet-X-400MF from 0.8590 to 0.9231. Qiskit-Standard matches PennyLane's peak performance with MobileNetV2 (0.9744) and also surpasses the classical baseline across all backbones.

Under noise, PL-Noisy degrades by 0.6–1.3% relative to PennyLane ideal, while Qiskit-Noisy shows a wider spread of 0.6–1.9% relative to Qiskit ideal; nevertheless, every noisy variant still exceeds the best classical result. The consistent quantum advantage across all four backbones suggests that, on small datasets with moderate intra-class variability, the compact 12-parameter quantum head captures discriminative features that the frozen classical layer alone fails to capture.

Averaged over the four backbones, PennyLane-Standard is the fastest approach (94.8 s), reducing training time by 30% with respect to the classical baseline (136.4 s). PL-Noisy rises to 154.6 s (a 63% overhead over PL ideal), placing it 13% above the classical cost. Qiskit-Standard (290.4 s) and Qiskit-Noisy (336.3 s) are approximately 2.1× and 2.5× slower than the classical baseline, respectively; the noise model adds 16% on top of ideal Qiskit. The overall pattern mirrors Cats vs. Dogs in relative terms, though absolute times are roughly 7× shorter owing to the much smaller dataset (398 vs. 3600 images).

Hymenoptera is the only dataset evaluated on real IBM Quantum processors (ibm_torino, 5 epochs, 100 shots), enabling a direct comparison across the full simulation-to-hardware pipeline. Qiskit-Real with MobileNetV2 achieves 94.87%, remarkably close to its ideal simulation counterpart (97.44%) and only 0.6% below the noisy simulation (95.51%), validating the predictive value of the calibrated noise model. ResNet18 follows a similar trend (Qiskit-Real 94.87% vs. Qiskit-Noisy 92.95%), actually exceeding the noisy simulation, likely due to favourable hardware conditions during the run. However, EfficientNet-B0 suffers a sharp drop on real hardware (71.79% vs. 91.03% noisy), suggesting that its higher-dimensional feature space (1280-d) is more sensitive to the compounding effects of transpilation overhead and SPSA gradient noise on physical qubits.

PennyLane-Real shows a systematically larger accuracy gap relative to its ideal simulation: MobileNetV2 drops from 94.87% to 86.84% (−8.0%) and EfficientNet-B0 from 94.87% to 65.79% (−29.1%). This wider degradation is attributed to the transition from the analytic parameter-shift gradient used in simulation to SPSA on hardware, which introduces additional stochasticity that the PennyLane-trained weights were not exposed to during ideal optimization. Regarding training time, real hardware runs average 670–690 s for PL-Real and 511–681 s for Qiskit-Real (5 epochs), approximately 5× slower per epoch than the corresponding noisy simulations, primarily due to job submission overhead and queue latency rather than circuit execution itself. Despite these constraints, the Qiskit-Real results confirm that quantum transfer learning is practically deployable on current NISQ devices, with MobileNetV2 and ResNet18 achieving accuracy levels competitive with noisy simulation at a manageable time cost.

4.4.2. Brain Tumor. Table 8 presents the per-backbone results on the Brain Tumor dataset.
Table 8. Brain Tumor: test accuracy and (training time in seconds) per backbone.
Backbone | CC | PennyLane | PL-Noisy | Qiskit | Qiskit-Noisy
ResNet18 | 0.9725 (736.8) | 0.9500 (541.9) | 0.9425 (892.7) | 0.9850 (1510.7) | 0.9775 (1745.3)
MobileNetV2 | 0.9825 (639.3) | 0.9125 (418.1) | 0.9050 (687.4) | 0.9725 (1447.1) | 0.9600 (1672.8)
EfficientNet-B0 | 0.9725 (891.9) | 0.9625 (476.0) | 0.9525 (784.2) | 0.9600 (1515.8) | 0.9500 (1751.9)
RegNet-X-400MF | 0.9900 (394.9) | 0.9400 (332.9) | 0.9350 (548.7) | 0.9450 (1359.9) | 0.9375 (1571.4)

Unlike the previous dataset, the classical baseline dominates here. CC-RegNet-X-400MF achieves the highest overall accuracy (0.9900), and the classical head also leads on MobileNetV2 (0.9825) and EfficientNet-B0 (0.9725). The only quantum result that surpasses its classical counterpart is Qiskit-Standard with ResNet18 (0.9850 vs. 0.9725). PennyLane-Standard underperforms the classical baseline on all backbones, with the gap ranging from 1.0% on EfficientNet-B0 to 7.0% on MobileNetV2, suggesting that the statevector-based quantum head does not add value when the classical features are already highly separable. Qiskit-Standard remains closer to the classical level, likely because its shot-based sampling provides implicit regularization. Noise degrades PL-Noisy by 0.5–1.0% and Qiskit-Noisy by 0.75–1.25% relative to their ideal counterparts, the smallest drops observed across all datasets, consistent with the hypothesis that high baseline accuracy leaves limited room for noise-induced errors to change the classification outcome.

PennyLane-Standard remains the fastest variant (442.2 s on average), reducing training time by 34% compared with the classical baseline (665.7 s). PL-Noisy reaches 728.3 s (a 65% overhead over PL ideal), approximately 9% above the classical cost. Qiskit-Standard (1458.4 s) and Qiskit-Noisy (1685.4 s) are 2.2× and 2.5× slower than the classical baseline, respectively, with the noise model adding 16% on top of ideal Qiskit. Because the classical head already provides near-perfect accuracy on this task, PennyLane's computational savings do not translate into a net advantage: the classical approach achieves higher accuracy at a moderate cost, making it the preferred choice for this dataset.

4.4.3. Cats vs. Dogs. Table 9 presents the per-backbone results on the Cats vs. Dogs dataset, the largest in our evaluation.

Table 9. Cats vs. Dogs: test accuracy and (training time in seconds) per backbone.
Backbone | CC | PennyLane | PL-Noisy | Qiskit | Qiskit-Noisy
ResNet18 | 0.9160 (1157.5) | 0.9700 (832.8) | 0.9583 (1382.4) | 0.9700 (2282.8) | 0.9567 (2638.2)
MobileNetV2 | 0.9383 (979.9) | 0.9800 (627.5) | 0.9717 (1038.6) | 0.9683 (2160.5) | 0.9550 (2497.1)
EfficientNet-B0 | 0.9500 (1358.6) | 0.9617 (740.2) | 0.9500 (1224.8) | 0.9650 (2226.5) | 0.9517 (2573.8)
RegNet-X-400MF | 0.9200 (601.8) | 0.9600 (460.4) | 0.9483 (762.1) | 0.9617 (2006.7) | 0.9450 (2319.7)

Quantum-hybrid heads consistently outperform their classical counterparts on this dataset. PennyLane-Standard with MobileNetV2 achieves the highest accuracy (0.9800), surpassing the best classical result (EfficientNet-B0, 0.9500) by 3 percentage points. The gain is even more pronounced when comparing like-for-like backbones: PennyLane improves ResNet18 from 0.9160 to 0.9700, MobileNetV2 from 0.9383 to 0.9800, and RegNet-X-400MF from 0.9200 to 0.9600. Qiskit-Standard is equally competitive, matching PennyLane on ResNet18 (0.9700) and marginally surpassing it on EfficientNet-B0 (0.9650 vs. 0.9617) and RegNet-X-400MF (0.9617 vs. 0.9600).

Regarding noise resilience, PL-Noisy exhibits a moderate degradation of 0.8–1.2% compared with PennyLane ideal, while Qiskit-Noisy suffers a slightly larger drop of 1.3–1.7% relative to Qiskit ideal. Despite this, all noisy variants still surpass or match the classical baselines except Qiskit-Noisy on EfficientNet-B0, which ties with the classical result (0.9517 vs. 0.9500). The overall pattern confirms that, on a medium-scale general-vision task, the 12-parameter quantum head adds measurable discriminative capacity over the frozen classical backbone alone.

Averaged across the four backbones, PennyLane-Standard is the fastest approach (665.2 s), reducing training time by 35% with respect to the classical baseline (1024.5 s). This advantage stems from PennyLane's lightweight statevector execution, which processes a 16-sample batch in approximately 6 ms and avoids the transpilation and scheduling overhead inherent to Qiskit. PL-Noisy increases the cost to 1102.0 s (a 66% overhead over PL ideal), explained by the shift from statevector to density-matrix simulation (default.mixed), which scales memory as $O(2^{2n})$ instead of $O(2^n)$; nevertheless, it remains within 8% of the classical baseline. Qiskit-Standard (2169.1 s) and Qiskit-Noisy (2507.2 s) are approximately 2.1× and 2.4× slower than the classical baseline, respectively, owing to circuit transpilation, finite-shot sampling (N = 1024), and SPSA gradient estimation; the noise model adds a further 16% on top of ideal Qiskit. PennyLane therefore offers the best accuracy-to-cost trade-off on this dataset, achieving the highest accuracy in the shortest time.

4.4.4. Solar Dust. Table 10 presents the per-backbone results on Solar Dust, the smallest and most challenging dataset in our evaluation.

Table 10. Solar Dust: test accuracy and (training time in seconds) per backbone.
Backbone | CC | PennyLane | PL-Noisy | Qiskit | Qiskit-Noisy
ResNet18 | 0.8250 (331.6) | 0.8333 (257.8) | 0.8167 (426.4) | 0.8833 (567.4) | 0.8667 (655.9)
MobileNetV2 | 0.8750 (300.7) | 0.8667 (238.2) | 0.8500 (394.1) | 0.9000 (537.3) | 0.8750 (621.1)
EfficientNet-B0 | 0.8583 (377.9) | 0.8667 (251.4) | 0.8417 (415.9) | 0.8583 (561.5) | 0.8333 (649.1)
RegNet-X-400MF | 0.9250 (227.7) | 0.9167 (212.9) | 0.8917 (352.3) | 0.8750 (512.0) | 0.8500 (592.1)

Solar Dust yields mixed results across frameworks. The classical RegNet-X-400MF achieves the overall best accuracy (0.9250), and PennyLane-Standard comes within 0.8% of it (0.9167). However, Qiskit-Standard provides the largest backbone-level improvements: ResNet18 rises from 0.8250 to 0.8833 and MobileNetV2 from 0.8750 to 0.9000. PennyLane-Standard shows only marginal gains on the lighter backbones (ResNet18 +0.8%, EfficientNet-B0 +0.8%) and is slightly below the classical level on MobileNetV2 (−0.8%). This dataset exhibits the highest noise sensitivity: PL-Noisy and Qiskit-Noisy each degrade by 1.7–2.5% relative to their ideal counterparts, reflecting the limited statistical support that 300 samples provide
for learning robust decision boundaries in the presence of stochastic quantum noise. The pattern suggests that Qiskit's shot-based execution, which implicitly exposes the optimizer to measurement uncertainty during training, builds greater robustness to noise on small, high-difficulty datasets.

PennyLane-Standard is again the fastest approach (240.1 s on average), 22% below the classical baseline (309.5 s). PL-Noisy rises to 397.2 s (a 65% overhead over PL ideal), placing it 28% above the classical cost, the highest relative overhead among all datasets, attributable to the density-matrix simulation scaling on a small batch count. Qiskit-Standard (544.6 s) and Qiskit-Noisy (629.6 s) are 1.8× and 2.0× slower than the classical baseline, respectively; the noise model adds 16% over ideal Qiskit. Notably, the Qiskit-to-CC ratio is lower here than on the other three datasets (1.8× vs. 2.1–2.2×) because the small data volume reduces the number of circuit evaluations per epoch, partially amortizing the fixed transpilation overhead.

5. Conclusions

This work presented a comparative evaluation of CQ hybrid transfer learning under realistic NISQ constraints, addressing key methodological challenges through unified hyperparameters, calibrated noise models derived from IBM quantum hardware, and heterogeneous image classification datasets. The study focused on assessing whether compact quantum heads can act as effective and efficient alternatives to classical classifiers when integrated with frozen, pretrained convolutional backbones.

Across the evaluated tasks, quantum-hybrid architectures achieved competitive and, in several cases, superior performance relative to classical baselines, particularly on datasets with limited sample sizes or high intra-class variability. Using noise parameters calibrated to IBM Heron r2 processor specifications (T1 = 250 µs, T2 = 150 µs), both the PennyLane and Qiskit noisy variants showed a moderate degradation of 0.5–2.5% in accuracy relative to ideal simulations, with smaller datasets such as Solar Dust showing higher sensitivity. The ideal PennyLane model reached a peak accuracy of 97.44% on Hymenoptera. These results demonstrate that, with state-of-the-art quantum hardware characteristics, quantum-hybrid transfer learning models can maintain near-ideal performance even under realistic noise conditions.

Crucially, we extended our evaluation to real IBM Quantum hardware (ibm_torino), demonstrating the practical deployability of quantum transfer learning on current NISQ devices.
Crucially, we extended our evaluation to real IBM Quantum hardware (ibm_torino), demonstrating the practical deployability of quantum transfer learning on current NISQ devices. Due to the substantial computational overhead of real QPU execution, including queue wait times, limited daily quotas, and the high cost of gradient estimation, these experiments were necessarily constrained to the Hymenoptera dataset and a reduced training duration (5 epochs). Despite these limitations, the Qiskit implementation achieved 94.87% test accuracy on real hardware with MobileNetV2, only 2.6% below its ideal simulation result and 0.6% below the noisy simulation, and 85.90% with RegNet-X-400MF, roughly 5% below the corresponding noisy-simulation result, validating the predictive value of our calibrated noise models. PennyLane-Real exhibited greater degradation (up to −8% on MobileNetV2), attributable to the shift from the analytic parameter-shift rule to SPSA gradient estimation on the QPU. Notably, EfficientNet-B0 proved highly sensitive to hardware noise under both frameworks, dropping to 71.79% (Qiskit) and 65.79% (PennyLane), indicating that higher-dimensional feature spaces amplify the compounding effects of transpilation overhead and stochastic gradients on physical qubits. These findings confirm that the performance trends observed in simulation translate meaningfully to actual quantum processors, while also highlighting the critical role of backbone selection and gradient strategy in real-device deployments.
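The gap between the two gradient strategies comes down to circuit-evaluation budgets, which the framework-agnostic sketch below makes explicit. The cosine loss is a stand-in for the expectation value returned by the quantum head (one call corresponds to one circuit execution), and the 12-parameter dimension matches the quantum head evaluated in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    # Stand-in for a single-frequency expectation value such as <Z> = cos(theta)
    # after a rotation gate; each call would be one circuit job on the QPU
    return float(np.sum(np.cos(theta)))

def parameter_shift_grad(theta, shift=np.pi / 2):
    # Analytic rule for single-frequency gates: 2 evaluations per parameter,
    # i.e. 2 * len(theta) circuit jobs per gradient
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = shift
        grad[i] = (loss(theta + e) - loss(theta - e)) / 2.0
    return grad

def spsa_grad(theta, c=0.1):
    # Simultaneous perturbation: 2 evaluations per step regardless of the
    # number of parameters, at the price of a stochastic estimate
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    diff = loss(theta + c * delta) - loss(theta - c * delta)
    return diff / (2.0 * c) * delta  # 1/delta_i equals delta_i for +-1 entries

theta = rng.uniform(0, np.pi, size=12)  # 12 trainable parameters, as in our head
print("parameter-shift:", parameter_shift_grad(theta)[:3])
print("SPSA estimate:  ", spsa_grad(theta)[:3])
```

With 12 parameters, the parameter-shift rule costs 24 circuit jobs per gradient while SPSA costs 2, which is what makes SPSA attractive on queue-limited QPUs despite its stochasticity.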
From a computational perspective, the study revealed a clear trade-off between simulation fidelity and training efficiency. PennyLane-based implementations consistently offered favorable accuracy–time trade-offs, outperforming both classical baselines and Qiskit-based hybrids in training speed. In contrast, Qiskit's shot-based sampling provided a more explicit treatment of measurement uncertainty and exhibited greater stability on challenging datasets such as Solar Dust, albeit at the cost of substantially increased computational overhead. Taken together, these findings support the viability of quantum transfer learning (QTL) as a resource-efficient strategy, aligning with green AI objectives by reducing trainable parameters and limiting computational demands while maintaining competitive accuracy.

Despite these encouraging results, several limitations constrain the scope of the conclusions. Differences in noise modeling across frameworks complicate direct fidelity comparisons, and the limited number of runs prevents definitive statements regarding statistical robustness across random initializations. The real hardware experiments, while demonstrating practical viability, were limited to a single dataset and a reduced training protocol due to QPU access constraints. Moreover, the choice of gradient estimation method (parameter-shift vs. SPSA) significantly impacted real hardware performance, highlighting the need for hardware-aware optimization strategies.

Future work will extend this study along several complementary directions. Systematic evaluations across multiple random seeds are required to assess the stability of the observed performance trends. Adopting unified noise models across frameworks, including identical Pauli or Kraus channels, would enable more controlled cross-platform comparisons. In addition, broader architectural explorations that vary qubit counts, circuit depths, and entanglement patterns may help identify regimes in which hybrid quantum heads consistently provide benefits. Regarding real hardware deployment, extended training runs with error-mitigation techniques such as zero-noise extrapolation, probabilistic error cancellation, or dynamical decoupling will be essential to close the gap between noisy simulation and real QPU performance. Finally, investigating hardware-efficient gradient methods that balance estimation accuracy with circuit evaluation cost remains a critical direction for practical quantum machine learning.

Data and code availability

Code and data are available at https://github.com/Data-Science-Big-Data-Research-Lab/QTL

Acknowledgments

The authors would like to thank the Spanish Ministry of Science and Innovation for the support within the projects PID2023-146037OB-C21 and PID2023-146037OB-C22. We acknowledge Pablo de Olavide University for funding the Q-Resilience project. Finally, we acknowledge the use of IBM Quantum Credits for this work. The views expressed are those of the authors and do not reflect the official policy or position of IBM or the IBM Quantum team.