Robust Deep Joint Source-Channel Coding for Video Transmission over Multipath Fading Channel
To address the challenges of wireless video transmission over multipath fading channels, we propose a robust deep joint source-channel coding (DeepJSCC) framework that effectively exploits temporal redundancy and incorporates robustness-oriented innovations at the modulation, coding, and decoding stages. At the modulation stage, orthogonal frequency division multiplexing (OFDM) tailored for robust video transmission is employed, decomposing wideband signals into orthogonal frequency-flat sub-channels to effectively mitigate frequency-selective fading. At the coding stage, conditional contextual coding with multi-scale Gaussian warped features is introduced to efficiently model temporal redundancy, significantly improving reconstruction quality under strict bandwidth constraints. At the decoding stage, a lightweight denoising module is integrated to simplify signal restoration and accelerate convergence, addressing the suboptimality and slow convergence typically associated with simultaneously performing channel estimation, equalization, and semantic reconstruction. Experimental results demonstrate that the proposed framework significantly outperforms state-of-the-art video DeepJSCC methods, achieving an average reconstruction quality gain of 5.13 dB under challenging multipath fading channel conditions.
💡 Research Summary
The paper addresses the pressing challenge of delivering high‑quality video over wireless links that suffer from multipath fading, a scenario increasingly relevant for emerging 6G applications such as VR, telemedicine, and real‑time conferencing. Traditional separate source‑channel coding (SSCC) suffers from the cliff effect and cannot efficiently exploit the temporal redundancy inherent in video streams. To overcome these limitations, the authors propose a robust Deep Joint Source‑Channel Coding (DeepJSCC) framework that integrates three key innovations across the modulation, coding, and decoding stages.
First, the modulation stage adopts Orthogonal Frequency Division Multiplexing (OFDM). By converting a wideband signal into M orthogonal sub‑carriers, each experiencing flat fading, OFDM mitigates frequency‑selective fading. Pilot symbols are embedded in each OFDM packet, enabling implicit channel state learning at the receiver without explicit CSI feedback. The transmitted complex symbols are subject to an average power constraint and a bandwidth budget, and the channel model includes L independent Rayleigh‑faded taps with exponential power decay, reflecting realistic multipath conditions.
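The mechanism described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the sub-carrier count, cyclic-prefix length, tap count, and decay constant below are assumed values chosen for illustration, and the frequency-domain symbols stand in for the DeepJSCC encoder's outputs. It shows why OFDM turns an L-tap frequency-selective channel into M frequency-flat sub-channels: after removing the cyclic prefix, each received tone is just the transmitted tone scaled by the channel's frequency response at that tone.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 64          # number of OFDM sub-carriers (assumed)
CP = 16         # cyclic-prefix length, must exceed channel memory (assumed)
L = 8           # number of multipath taps (assumed)
GAMMA = 4.0     # exponential power-decay constant (assumed)

# Exponentially decaying power-delay profile, normalized to unit total power.
pdp = np.exp(-np.arange(L) / GAMMA)
pdp /= pdp.sum()

# One Rayleigh-faded channel realization: independent complex Gaussian taps.
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) * np.sqrt(pdp / 2)

# QPSK-like frequency-domain symbols standing in for DeepJSCC latent outputs.
X = (rng.choice([-1, 1], M) + 1j * rng.choice([-1, 1], M)) / np.sqrt(2)

# OFDM modulation: IFFT, then prepend a cyclic prefix longer than the channel.
x = np.fft.ifft(X) * np.sqrt(M)
x_cp = np.concatenate([x[-CP:], x])

# Pass through the multipath channel (linear convolution, truncated).
y_cp = np.convolve(x_cp, h)[: CP + M]

# Receiver: drop the CP and FFT back. Thanks to the CP, the linear
# convolution acts circularly, so Y[k] = H[k] * X[k] on every tone.
Y = np.fft.fft(y_cp[CP:]) / np.sqrt(M)
H = np.fft.fft(h, M)

assert np.allclose(Y, H * X)  # each sub-channel is frequency-flat
```

In the actual system the pilot symbols occupy part of each packet so the receiver can learn `H` implicitly; the noiseless check above only demonstrates the flat-fading decomposition.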
Second, the coding stage replaces conventional residual‑based inter‑frame compression with a conditional context coding mechanism. For each interpolation frame, the two reference frames located t frames before and after it are processed through a high‑dimensional feature extractor. Multi‑scale Gaussian smoothing creates a scale‑space volume, which is then warped using a learned Scale‑Space Flow (SSF) to generate feature‑domain contexts c⁻ and c⁺. These contexts serve as conditioning inputs to the interpolation encoder, allowing the network to capture richer temporal and semantic dependencies than simple pixel‑wise subtraction. The key frame encoder remains similar to prior DeepJSCC designs but incorporates an Attention Feature (AF) module that re‑weights feature channels based on instantaneous SNR, granting a single model adaptability across a wide range of channel qualities.
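A toy numpy sketch of the scale-space idea, under simplifying assumptions: one feature channel, a fixed-radius Gaussian kernel, and nearest-neighbour sampling in place of the trilinear sampling a learned SSF would use. The third flow component selects how blurred the sampled context is, which lets the warp express motion uncertainty.

```python
import numpy as np

def gaussian_kernel(sigma, radius=3):
    """Truncated 1-D Gaussian kernel (fixed radius keeps the sketch simple)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur2d(img, sigma):
    """Separable Gaussian blur via two 1-D 'same' convolutions."""
    if sigma == 0:
        return img
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def scale_space_volume(feat, sigmas=(0.0, 0.5, 1.0, 2.0)):
    """Stack progressively blurred copies of a feature map: shape (S, H, W)."""
    return np.stack([blur2d(feat, s) for s in sigmas])

def warp_ssf(volume, flow):
    """Nearest-neighbour scale-space warp. flow[y, x] = (dy, dx, scale_idx).
    A learned SSF would use sub-pixel trilinear sampling instead."""
    S, H, W = volume.shape
    out = np.empty((H, W))
    for y in range(H):
        for x in range(W):
            dy, dx, s = flow[y, x]
            ys = np.clip(int(round(y + dy)), 0, H - 1)
            xs = np.clip(int(round(x + dx)), 0, W - 1)
            si = np.clip(int(round(s)), 0, S - 1)
            out[y, x] = volume[si, ys, xs]
    return out

rng = np.random.default_rng(1)
feat = rng.standard_normal((16, 16))   # one channel of reference features
vol = scale_space_volume(feat)         # (4, 16, 16) scale-space volume
flow = np.zeros((16, 16, 3))           # zero motion, sharpest scale
ctx = warp_ssf(vol, flow)              # context equals the reference features
assert np.allclose(ctx, feat)
```

In the paper the warped outputs c⁻ and c⁺ are computed from both reference frames and concatenated as conditioning for the interpolation encoder; here only one context is built to keep the sketch short.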
Third, the decoding stage introduces a lightweight denoising module that decouples signal restoration from semantic reconstruction. In conventional OFDM‑based DeepJSCC decoders, a single deep network must simultaneously perform channel estimation, equalization, denoising, and video reconstruction, leading to high computational load and slow convergence. The proposed denoiser leverages the known pilot symbols to clean the received composite signal (both pilots and data) before feeding the resulting latent vector into dedicated key‑frame and interpolation decoders. This modularization reduces learning complexity, accelerates convergence, and yields measurable PSNR gains.
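The paper's denoiser is a learned module, but its pilot-aided principle can be approximated classically: estimate the channel from the known pilots, then equalize the data tones before handing a cleaner latent to the semantic decoders. The sketch below is a least-squares stand-in under assumed values (64 tones, all-ones pilots, a fixed noise level), not the authors' network.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 64  # sub-carriers (assumed)

# Known pilot symbols (all-ones for simplicity) and data-tone symbols.
pilots = np.ones(M, dtype=complex)
data = (rng.choice([-1, 1], M) + 1j * rng.choice([-1, 1], M)) / np.sqrt(2)

# Per-tone channel response plus AWGN on both the pilot and data symbols.
H = rng.standard_normal(M) + 1j * rng.standard_normal(M)
noise_std = 0.05
Yp = H * pilots + noise_std * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
Yd = H * data + noise_std * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

# Least-squares channel estimate from the pilots, then one-tap equalization.
H_hat = Yp / pilots
data_hat = Yd / H_hat

# The cleaned latent is already close to the transmitted symbols; a learned
# denoiser refines this further before the key-frame/interpolation decoders.
mse = np.mean(np.abs(data_hat - data) ** 2)
assert mse < 1.0
```

The point of the modularization is exactly this split: restoration (estimate, equalize, denoise) happens in a small pilot-aware front-end, so the downstream decoders only learn semantic reconstruction.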
The authors train the system end‑to‑end using a loss that combines reconstruction error (e.g., mean‑squared error on pixel values) with regularization terms for power and bandwidth constraints. Experiments are conducted on standard video datasets transmitted over simulated multipath Rayleigh channels with varying SNR levels (0–20 dB). Ablation studies isolate the contribution of each component: OFDM alone provides an average 2.76 dB PSNR improvement, conditional context coding adds 1.8 dB, and the denoising module contributes an additional 0.57 dB. Cumulatively, the proposed framework achieves a 5.13 dB gain over state‑of‑the‑art video DeepJSCC methods such as DeepWiV and DVST, which were primarily evaluated on additive white Gaussian noise (AWGN) channels. Moreover, the denoising‑aided decoder reduces inference latency by roughly 30 % and maintains a modest parameter count (≈15 % of a comparable monolithic decoder).
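As a small illustration of the training objective's ingredients, the sketch below enforces the average power constraint by hard normalization of the complex latent and computes a pixel-MSE reconstruction loss. The paper combines MSE with regularization terms instead; the normalization-based variant here is an assumed simplification common in DeepJSCC implementations.

```python
import numpy as np

def power_normalize(z, power=1.0):
    """Scale complex latents so the average power E[|z|^2] equals `power`,
    satisfying the transmit power constraint by construction."""
    scale = np.sqrt(power * z.size / np.sum(np.abs(z) ** 2))
    return z * scale

rng = np.random.default_rng(3)

# Complex latent produced by the encoder (random stand-in here).
z = rng.standard_normal(256) + 1j * rng.standard_normal(256)
z_tx = power_normalize(z)
assert np.isclose(np.mean(np.abs(z_tx) ** 2), 1.0)

# Pixel-domain reconstruction loss between a source frame and its decode
# (the decoded frame is simulated as a lightly perturbed copy).
frame = rng.random((8, 8, 3))
recon = frame + 0.01 * rng.standard_normal((8, 8, 3))
loss = np.mean((frame - recon) ** 2)
```

The bandwidth budget enters separately through the latent's dimensionality (256 symbols here, an arbitrary choice), which fixes the channel uses per frame.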
In summary, the paper delivers a comprehensive, practically oriented DeepJSCC solution that is resilient to frequency‑selective fading, efficiently exploits temporal redundancy through conditional context coding, and simplifies the receiver architecture via a dedicated denoising front‑end. The work paves the way for real‑time, high‑fidelity video streaming in challenging wireless environments and suggests future extensions such as over‑the‑air experiments, integration with downstream machine‑vision tasks, and adaptive GOP management.