ExpandNet: A Deep Convolutional Neural Network for High Dynamic Range Expansion from Low Dynamic Range Content
High dynamic range (HDR) imaging provides the capability of handling real world lighting as opposed to the traditional low dynamic range (LDR) which struggles to accurately represent images with higher dynamic range. However, most imaging content is still available only in LDR. This paper presents a method for generating HDR content from LDR content based on deep Convolutional Neural Networks (CNNs) termed ExpandNet. ExpandNet accepts LDR images as input and generates images with an expanded range in an end-to-end fashion. The model attempts to reconstruct missing information that was lost from the original signal due to quantization, clipping, tone mapping or gamma correction. The added information is reconstructed from learned features, as the network is trained in a supervised fashion using a dataset of HDR images. The approach is fully automatic and data driven; it does not require any heuristics or human expertise. ExpandNet uses a multiscale architecture which avoids the use of upsampling layers to improve image quality. The method performs well compared to expansion/inverse tone mapping operators quantitatively on multiple metrics, even for badly exposed inputs.
💡 Research Summary
The paper addresses the practical problem that most visual content is still available only in low‑dynamic‑range (LDR) format, while modern displays and pipelines increasingly require high‑dynamic‑range (HDR) imagery. Traditional expansion/inverse tone‑mapping operators (iTMOs) are either global, applying a single function to all pixels, or local, relying on handcrafted expand maps and heuristics. These approaches often fail on severely under‑ or over‑exposed regions and require expert tuning. Recent deep‑learning methods for image‑to‑image translation, such as U‑Net‑based architectures, have shown promise but typically employ encoder‑decoder structures with up‑sampling layers that introduce checkerboard, blocking, or banding artifacts, especially in large over‑exposed areas.
To overcome these limitations, the authors propose ExpandNet, a novel convolutional neural network specifically designed for LDR‑to‑HDR expansion without any up‑sampling operations. ExpandNet consists of three parallel branches:
- Local Branch – Two convolutional layers with 3×3 kernels, stride 1, padding 1, and 64/128 feature maps respectively. Its receptive field is 5×5 pixels, enabling the network to capture fine‑grained, high‑frequency details directly at the pixel level.
- Dilation Branch – Four dilated convolutional layers (kernel 3×3, dilation 2, stride 1, padding 2), each with 64 feature maps. The effective receptive field expands to 17×17 pixels, allowing the network to learn medium‑range structures that the local branch would miss.
- Global Branch – The input image is resized to 256×256 and passed through seven stride‑2 convolutions (3×3 kernels, except the final 4×4 kernel), each producing 64 channels. This progressively reduces spatial resolution to a 1×1 feature vector that encodes image‑wide illumination and color context.
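The receptive‑field and downsampling figures quoted above can be verified with standard convolution arithmetic. The helper below is my own illustration (not code from the paper); it checks that two 3×3 convolutions give a 5×5 receptive field, four dilation‑2 convolutions give 17×17, and a stack of stride‑2 convolutions collapses a 256×256 input to 1×1:

```python
def receptive_field(layers):
    """Receptive field of stacked convs; each layer is (kernel, stride, dilation).
    The field grows by (kernel - 1) * dilation * jump, where jump is the
    product of the strides of all earlier layers."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf

def out_size(n, layers):
    """Spatial output size after stacked convs; each layer is (kernel, stride, padding)."""
    for k, s, p in layers:
        n = (n + 2 * p - k) // s + 1
    return n

# Local branch: two 3x3 stride-1 convs -> 5x5 receptive field.
assert receptive_field([(3, 1, 1)] * 2) == 5
# Dilation branch: four 3x3 convs with dilation 2 -> 17x17 receptive field.
assert receptive_field([(3, 1, 2)] * 4) == 17
# Global branch: six 3x3 stride-2 convs (padding 1) followed by a final
# 4x4 conv collapse a 256x256 input to a 1x1 feature vector.
assert out_size(256, [(3, 2, 1)] * 6 + [(4, 2, 0)]) == 1
```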
The outputs of the local and dilation branches retain the original spatial dimensions, while the global vector is broadcast to match them. All three are concatenated along the channel axis, yielding 256 feature maps, which are fused by a 1×1 convolution; a final 3×3 convolution produces the three‑channel HDR prediction. All hidden layers use the Scaled Exponential Linear Unit (SELU) activation, providing self‑normalizing behavior and removing the need for batch normalization. The loss combines an L1 reconstruction term with a cosine‑similarity term on the RGB color vectors to encourage accurate luminance and color reconstruction.
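The broadcast‑and‑concatenate fusion step can be sketched with plain numpy. Shapes follow the channel counts in the summary (128 + 64 + 64 = 256); the spatial size and random features are purely illustrative, and the 1×1 convolution is expressed as a per‑pixel linear map over channels:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 32, 32  # toy spatial size for illustration

local_feats = rng.standard_normal((128, H, W))    # local branch output
dilation_feats = rng.standard_normal((64, H, W))  # dilation branch output
global_vec = rng.standard_normal((64,))           # 1x1 global feature vector

# Broadcast the global vector across the spatial dimensions,
# then concatenate all three branches along the channel axis.
global_feats = np.broadcast_to(global_vec[:, None, None], (64, H, W))
fused = np.concatenate([local_feats, dilation_feats, global_feats], axis=0)
assert fused.shape == (256, H, W)

# A 1x1 convolution is just a linear map over channels at every pixel.
w_1x1 = rng.standard_normal((64, 256))
mixed = np.einsum('oc,chw->ohw', w_1x1, fused)
assert mixed.shape == (64, H, W)
```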
Training data are generated from a limited set of HDR images by applying random exposure adjustments, cropping, and color augmentations, producing on‑the‑fly 256×256 LDR‑HDR pairs. This augmentation strategy compensates for the scarcity of HDR ground truth. The network is trained end‑to‑end in a supervised fashion.
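A minimal sketch of how an LDR training input can be simulated from an HDR crop, under assumptions of my own (random exposure, clipping, gamma encoding, 8‑bit quantization); the paper's exact augmentation pipeline may differ in details:

```python
import numpy as np

def make_ldr(hdr, exposure, gamma=2.2):
    """Simulate an LDR capture from a linear HDR image: scale by an
    exposure factor, clip to [0, 1], gamma-encode, quantize to 8 bits."""
    ldr = np.clip(hdr * exposure, 0.0, 1.0) ** (1.0 / gamma)
    return (ldr * 255.0 + 0.5).astype(np.uint8)

rng = np.random.default_rng(0)
hdr_crop = rng.random((256, 256, 3)) * 10.0  # stand-in for a real HDR crop
exposure = 2.0 ** rng.uniform(-3, 3)         # random exposure shift in stops
ldr = make_ldr(hdr_crop, exposure)
assert ldr.shape == (256, 256, 3) and ldr.dtype == np.uint8
```

The clipping and quantization here are exactly the information losses the network is trained to invert.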
Evaluation includes PSNR, SSIM, HDR‑VDP‑2.2 and perceptually uniform (PU) encoded metrics, as well as subjective visual comparisons. Across all metrics, ExpandNet outperforms classic global and local expansion operators and recent CNN‑based methods, particularly in regions with severe clipping, where other methods exhibit blocking or banding artifacts. Visual results show smoother highlight recovery, preserved texture, and faithful color rendition without the checkerboard or haloing artifacts typical of deconvolution‑based up‑sampling.
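For reference, plain PSNR is straightforward to compute; PU‑encoded variants apply a perceptual luminance transform before the same calculation, which is omitted here for brevity (this helper is illustrative, not the paper's evaluation code):

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
slightly_noisy = np.clip(ref + rng.normal(0, 0.01, ref.shape), 0, 1)
very_noisy = np.clip(ref + rng.normal(0, 0.1, ref.shape), 0, 1)
# More distortion -> lower PSNR.
assert psnr(ref, slightly_noisy) > psnr(ref, very_noisy)
```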
The contributions of the paper are: (i) a dedicated multi‑scale CNN architecture that avoids up‑sampling and thus reduces common artifacts; (ii) an effective data‑augmentation pipeline for limited HDR datasets; (iii) a comprehensive quantitative and qualitative benchmark demonstrating state‑of‑the‑art performance; and (iv) a fully automatic, parameter‑free solution suitable for non‑expert users. Limitations include the current focus on 1080p inputs, leaving scalability to ultra‑high‑resolution content untested, and the simple broadcast of the global vector, which may not capture complex spatial variations. Future work could explore attention mechanisms or hierarchical processing to further improve global‑local integration and extend the model to higher resolutions.