OpenDPDv2: A Unified Learning and Optimization Framework for Neural Network Digital Predistortion
Neural network (NN)-based Digital Predistortion (DPD) has demonstrated superior performance in improving signal quality in wideband radio frequency (RF) power amplifiers (PAs) employing complex modulation. However, NN DPDs usually rely on a large number of parameters for effective linearization and can significantly contribute to the energy consumption of the digital back-end in RF systems. This paper presents OpenDPDv2, an open-source, end-to-end framework that unifies PA modeling, NN-DPD learning, and deployment-oriented model optimization to reduce inference energy while preserving linearization performance. OpenDPDv2 introduces TRes-DeltaGRU, a delta-RNN DPD architecture with a lightweight temporal residual path that improves robustness under aggressive temporal sparsity, and it supports joint optimization of temporal sparsity and fixed-point quantization. On a 3.5 GHz GaN Doherty PA driven by a TM3.1a 200 MHz 256-QAM OFDM signal, the FP32 TRes-DeltaGRU model achieves ACPR of -59.9 dBc and EVM of -42.1 dB. By combining quantization with dynamic temporal sparsity, the model reduces inference energy by 4.5x while maintaining -51.8 dBc ACPR and -35.2 dB EVM at 56% temporal sparsity. Code, datasets, and documentation are publicly available at https://github.com/lab-emi/OpenDPD.
💡 Research Summary
OpenDPDv2 is an open‑source, end‑to‑end framework that unifies power‑amplifier (PA) modeling, neural‑network‑based digital predistortion (NN‑DPD) learning, and deployment‑oriented model optimization. The work addresses a critical bottleneck in modern wideband transmitters: while NN‑DPD can dramatically improve linearization for high‑order modulations such as 256‑QAM OFDM, the hundreds of parameters and full‑precision floating‑point arithmetic it requires lead to substantial digital back‑end power consumption.
The core technical contribution is the Temporal Residual‑DeltaGRU (TRes‑DeltaGRU) architecture. Traditional DeltaGRU exploits temporal redundancy by thresholding the element‑wise differences (Δ) of the input features and hidden states, converting dense matrix‑vector multiplications into sparse updates. However, in the original design the DPD output is directly coupled to the hidden‑state sparsity; aggressive thresholds therefore cause a loss of output dynamics and degrade linearization. TRes‑DeltaGRU decouples these two streams by adding a lightweight residual path built from a Temporal Convolutional Network (TCN). The TCN uses dilated 1‑D convolutions and the hardware‑friendly Hard‑Swish activation to capture a wide temporal context without increasing the parameter count. Consequently, even with a high delta threshold (56 % temporal sparsity) the residual path restores the missing dynamics, allowing the model to retain high‑quality linearization.
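The delta‑thresholding idea can be illustrated with a minimal, dependency‑free Python sketch (the function name and per‑element memory are illustrative, not the OpenDPDv2 API): an element's change only propagates when it exceeds the threshold relative to the last *propagated* value, so the corresponding column of the matrix‑vector product can be skipped.

```python
def delta_encode(x_seq, threshold):
    """Illustrative DeltaGRU-style delta thresholding.

    x_seq:     list of timesteps, each a list of feature values
    threshold: delta threshold; larger values give more temporal sparsity

    Returns the sparse delta sequence: an element is nonzero only when its
    change since the last propagated value is at least `threshold`, in which
    case the stored reference for that element is updated.
    """
    prev = [0.0] * len(x_seq[0])  # last propagated value per element
    deltas = []
    for x in x_seq:
        d = []
        for i, xi in enumerate(x):
            diff = xi - prev[i]
            if abs(diff) >= threshold:
                d.append(diff)
                prev[i] = xi  # update memory only when the delta fires
            else:
                d.append(0.0)  # zero delta: this column's MACs are skipped
        deltas.append(d)
    return deltas


# Small slowly-varying input: half the deltas fall below the threshold,
# i.e. 50% temporal sparsity in this toy example.
deltas = delta_encode([[1.0, 0.0], [1.05, 0.5], [1.5, 0.55]], threshold=0.1)
```

Raising the threshold suppresses more deltas and saves more compute, which is exactly the regime where the plain DeltaGRU loses output dynamics and the TCN residual path of TRes‑DeltaGRU compensates.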
In parallel, the framework integrates quantization‑aware training (QAT) to move from FP32 to INT8 arithmetic. During training each layer learns a scaling factor constrained to a power‑of‑two, enabling fixed‑point implementation with simple shift operations. The authors formulate a joint loss that combines the linearization error (MSE at the PA output), a sparsity regularizer (encouraging larger Δ thresholds), and a quantization error term. This joint optimization lets temporal sparsity and low‑precision quantization reinforce each other, achieving a balance between computational reduction and signal fidelity.
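A power‑of‑two scale is what makes the fixed‑point arithmetic cheap: dequantization becomes a bit shift rather than a multiply. The following sketch shows the idea for a single tensor; it is a simplified illustration, not the framework's QAT implementation (which learns the scale during training rather than deriving it from the data range as done here).

```python
import math

def quantize_pow2(x, bits=8):
    """Fake-quantize values to signed fixed point with a power-of-two scale.

    Picks the smallest scale = 2^k such that max(|x|) / scale fits in the
    signed integer range, rounds to integers, and returns the integer codes,
    the scale, and the dequantized values. Illustrative sketch only.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in x)
    # Smallest power-of-two scale covering the data range.
    k = math.ceil(math.log2(max_abs / qmax)) if max_abs > 0 else 0
    scale = 2.0 ** k            # in hardware: dequantize by shifting k bits
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in x]
    return q, scale, [qi * scale for qi in q]


q, scale, deq = quantize_pow2([0.5, -1.0, 0.25])
```

In QAT the forward pass uses the dequantized values (so quantization error flows into the loss) while gradients pass through the rounding via a straight‑through estimator; constraining `scale` to a power of two preserves the shift‑only hardware path.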
The OpenDPDv2 pipeline consists of four stages: (1) data acquisition and PA surrogate modeling, (2) frozen PA model creation, (3) temporally‑sparse DPD learning using TRes‑DeltaGRU, and (4) quantization‑aware fine‑tuning. All stages are implemented in PyTorch, and the authors release the APA_200MHz dataset, the PA surrogate, baseline models (GMP, DeltaDPD, QuantDPD), and the full training scripts, ensuring reproducibility.
Experimental validation uses a 3.5 GHz GaN Doherty PA driven by a 200 MHz TM3.1a 256‑QAM OFDM signal. The FP32 TRes‑DeltaGRU achieves an Adjacent Channel Power Ratio (ACPR) of –59.9 dBc and an Error Vector Magnitude (EVM) of –42.1 dB. When combined with INT8 quantization and a dynamic delta threshold that yields 56 % temporal sparsity, the model still delivers –51.8 dBc ACPR and –35.2 dB EVM, while reducing inference energy by a factor of 4.5.
To ground the energy claims in realistic hardware behavior, the authors employ the Gem5 architectural simulator configured for an ARM Cortex‑A53 core. They collect instruction‑mix statistics, cache‑access patterns, and memory‑traffic volumes for each variant (dense FP32, sparse FP32, quantized sparse). The results show that mixed‑precision MACs are up to 20× more energy‑efficient than FP32, and that temporal sparsity dramatically cuts memory reads/writes, which dominate total energy consumption. The paper also discusses cache‑friendly scheduling and the impact of memory bandwidth on overall efficiency.
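The interaction between the two savings can be shown with a back‑of‑envelope energy model. All per‑operation energies below are illustrative placeholders, not Gem5 measurements; the point is only that when memory traffic dominates, temporal sparsity (which skips both the MAC and the weight fetch) matters more than cheaper arithmetic alone.

```python
def inference_energy(n_macs, n_mem, mac_energy, mem_energy, sparsity=0.0):
    """Toy per-inference energy model (units arbitrary, e.g. pJ).

    A skipped (temporally sparse) update avoids both the MAC and the
    associated weight fetch, so sparsity scales the whole active cost.
    """
    active = 1.0 - sparsity
    return active * (n_macs * mac_energy + n_mem * mem_energy)


# Hypothetical numbers: memory access 10x the cost of an FP32 MAC,
# INT8 MAC far cheaper than FP32. Dense FP32 vs. sparse INT8:
dense_fp32 = inference_energy(1e6, 1e6, mac_energy=1.0, mem_energy=10.0)
sparse_int8 = inference_energy(1e6, 1e6, mac_energy=0.1, mem_energy=10.0,
                               sparsity=0.56)
```

Even with these made‑up constants, most of the reduction comes from the memory accesses avoided by sparsity, which is consistent with the paper's observation that reads/writes dominate total energy.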
In summary, OpenDPDv2 delivers a practical solution for energy‑constrained wideband transmitters by (i) introducing TRes‑DeltaGRU, a delta‑RNN with a TCN‑based residual that preserves output dynamics under aggressive sparsity, (ii) jointly optimizing temporal sparsity and quantization through a unified training objective, (iii) providing a reproducible PyTorch framework with all data and code publicly available, and (iv) validating both RF performance and realistic processor‑level energy savings. The work paves the way for deploying high‑accuracy NN‑DPD in base‑station and Wi‑Fi hardware where power budgets are tight, and it opens avenues for further research on other PA topologies, higher‑order modulations, and ASIC/FPGA co‑design.