3D Pseudo Stereo Visualization with GPGPU Support

This article presents a computer system for creating 3D pseudo-stereo images and videos, using hardware and software support based on General-Purpose Graphics Processing Unit (GPGPU) technology to accelerate the synthesis process. Building on the general 3D pseudo-stereo synthesis strategy previously proposed by the authors, the Compute Unified Device Architecture (CUDA) implementation covers the main stages of 3D pseudo-stereo synthesis: (i) the practical implementation study; (ii) the synthesis characteristics for obtaining images; and (iii) video in Ultra-High-Definition (UHD) 4K resolution using the Graphics Processing Unit (GPU). In 4K content tests on evaluation systems with a GPU, average accelerations of 60.6× and 6.9× are obtained for images and videos, respectively. The results are consistent with the authors' earlier forecasts for processing 4K image frames and confirm the possibility of running 3D pseudo-stereo synthesis algorithms in real time with the support of modern Graphics Processing Units/Graphics Processing Clusters (GPU/GPC).


💡 Research Summary

The paper presents a complete system for generating 3D pseudo‑stereo images and videos by exploiting General‑Purpose GPU (GPGPU) acceleration through NVIDIA’s CUDA platform. Pseudo‑stereo synthesis creates a left‑eye and a right‑eye view from a single 2D source by estimating depth, applying a disparity shift, and then adjusting colour, saturation and luminance so that the two views produce a depth cue when viewed with standard stereoscopic displays or glasses. While the algorithm is conceptually simple, its pixel‑wise operations become computationally prohibitive at Ultra‑High‑Definition (UHD) 4K resolution (3840 × 2160), especially for real‑time video (30 fps or higher).
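
The disparity-shift step described above can be sketched in plain Python for a single scanline (an illustrative sketch only: the depth map, the `max_disparity` scale, and the hole-filling rule are assumptions, not the authors' exact formulation):

```python
# Illustrative per-scanline disparity shift for pseudo-stereo synthesis.
# Each pixel is moved horizontally by an amount proportional to its
# estimated depth; "near" pixels (larger depth value here) shift more.

def shift_scanline(pixels, depth, max_disparity, direction):
    """Produce one eye's view of a scanline.

    pixels        -- list of pixel values (one scanline of the 2D source)
    depth         -- per-pixel depth estimates in [0.0, 1.0]
    max_disparity -- largest horizontal shift in pixels
    direction     -- +1 for the right-eye view, -1 for the left-eye view
    """
    width = len(pixels)
    out = [None] * width
    for x in range(width):
        disparity = int(round(depth[x] * max_disparity))
        target = x + direction * disparity
        if 0 <= target < width:
            out[target] = pixels[x]
    # Fill occlusion holes with the nearest filled left neighbour.
    for x in range(width):
        if out[x] is None:
            out[x] = out[x - 1] if x > 0 and out[x - 1] is not None else pixels[x]
    return out

scan = [10, 20, 30, 40, 50]
depth = [0.0, 0.0, 1.0, 0.0, 0.0]   # only the middle pixel is "near"
right = shift_scanline(scan, depth, 2, +1)
# The middle pixel shifts two places right, leaving a hole that is filled
# from its left neighbour.
```

Because every pixel is processed independently, this loop maps naturally onto one CUDA thread per pixel, which is what makes the GPU port worthwhile.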

The authors first revisited their previously published pseudo‑stereo pipeline and identified three major stages that dominate execution time: (1) colour‑space conversion and depth‑map estimation, (2) disparity‑based geometric warping of the left and right viewpoints, and (3) colour‑adjustment to embed the depth cue. They then mapped each stage to CUDA kernels, taking care to minimise global‑memory traffic and to maximise parallel occupancy. Key implementation details include:

  • Pinned host memory and asynchronous streams – Input and output buffers are allocated as page‑locked memory, allowing DMA transfers that overlap with kernel execution. Multiple CUDA streams enable concurrent copy‑and‑compute, effectively hiding PCI‑e latency.
  • Shared‑memory tiling and loop unrolling – Within the disparity‑warping kernel, neighbouring pixel values are cached in shared memory, reducing redundant global reads. Loop unrolling eliminates branch overhead for the per‑pixel colour‑adjustment step.
  • Thread‑block sizing and occupancy tuning – Empirical tests determined the optimal block dimensions (e.g., 32 × 8 threads) for the target GPUs, ensuring that registers and shared memory are not over‑subscribed while keeping the scheduler fully utilised.

Performance was measured on two contemporary GPUs: an NVIDIA GTX 1080 Ti (Pascal architecture) and an RTX 2080 (Turing architecture). For still 4K images, the CUDA implementation achieved an average speed‑up of 60.6× compared with a highly optimised CPU reference. For 4K video at 30 fps, the speed‑up was 6.9×, raising the effective frame rate to roughly 12 fps on the RTX 2080 – a level that approaches real‑time interactivity for many consumer applications.
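
The video figures can be sanity-checked with simple arithmetic (the implied CPU baseline below is derived from the reported numbers, not stated directly in the paper):

```python
# Derived from the reported 6.9x video speed-up and the ~12 fps GPU rate.
gpu_fps = 12.0
speedup = 6.9

cpu_fps = gpu_fps / speedup          # implied CPU baseline, ~1.74 fps
gpu_frame_ms = 1000.0 / gpu_fps      # ~83.3 ms per 4K frame on the GPU
realtime_budget_ms = 1000.0 / 30.0   # ~33.3 ms budget for 30 fps playback

print(round(cpu_fps, 2), round(gpu_frame_ms, 1), round(realtime_budget_ms, 1))
# → 1.74 83.3 33.3
```

The gap between 83 ms per frame and the 33 ms real-time budget is what motivates the multi-GPU scaling discussed next.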

Quality metrics (SSIM and PSNR) were collected for a representative set of test sequences. The GPU‑accelerated results showed negligible deviation from the CPU baseline, confirming that the aggressive parallelisation did not compromise visual fidelity. The authors also discuss the scalability of their approach: by extending the pipeline to multi‑GPU clusters (GPU/Graphics Processing Clusters, GPC), the same methodology could theoretically support 8K resolution at 60 fps, provided that inter‑node communication and workload partitioning are carefully engineered.
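
One simple workload-partitioning scheme for the multi-GPU extension mentioned above is a horizontal split of the frame into contiguous row bands, one per device (a sketch under that assumption; the paper does not specify its partitioning strategy):

```python
def partition_rows(height, n_gpus):
    """Split `height` rows into near-equal contiguous bands, one per GPU."""
    base, extra = divmod(height, n_gpus)
    bands, start = [], 0
    for i in range(n_gpus):
        rows = base + (1 if i < extra else 0)  # spread any remainder evenly
        bands.append((start, start + rows))
        start += rows
    return bands

# Example: an 8K frame (4320 rows) split across 4 GPUs.
bands = partition_rows(4320, 4)
# → [(0, 1080), (1080, 2160), (2160, 3240), (3240, 4320)]
```

Row bands keep each device's memory accesses contiguous, but disparity warping near band boundaries would still need a small halo exchange between neighbouring devices, which is exactly the inter-node communication cost the authors flag.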

In the discussion, the paper positions pseudo‑stereo synthesis as a cost‑effective alternative to true stereoscopic capture, which requires dual‑camera rigs or expensive rendering pipelines. The GPU‑accelerated solution enables on‑the‑fly conversion of existing 2D content into a stereoscopic format suitable for virtual reality (VR), augmented reality (AR), 3D broadcasting, and medical imaging. Moreover, the authors note that the CUDA‑based design is portable to other parallel programming models (e.g., OpenCL, Vulkan Compute) and can benefit from future hardware improvements such as dedicated ray‑tracing cores or tensor accelerators.

Overall, the study demonstrates that a well‑engineered CUDA implementation can bridge the gap between computationally intensive pseudo‑stereo algorithms and the real‑time performance demands of modern UHD media pipelines. The reported 60.6× image acceleration and 6.9× video acceleration validate the feasibility of deploying pseudo‑stereo conversion in production environments, while the detailed optimisation strategies provide a valuable reference for researchers and engineers working on high‑throughput image‑processing tasks.