FD-DB: Frequency-Decoupled Dual-Branch Network for Unpaired Synthetic-to-Real Domain Translation
Synthetic data provide low-cost, accurately annotated samples for geometry-sensitive vision tasks, but appearance and imaging differences between synthetic and real domains cause severe domain shift and degrade downstream performance. Unpaired synthetic-to-real translation can reduce this gap without paired supervision, yet existing methods often face a trade-off between photorealism and structural stability: unconstrained generation may introduce deformation or spurious textures, while overly rigid constraints limit adaptation to real-domain statistics. We propose FD-DB, a frequency-decoupled dual-branch model that separates appearance transfer into low-frequency interpretable editing and high-frequency residual compensation. The interpretable branch predicts physically meaningful editing parameters (white balance, exposure, contrast, saturation, blur, and grain) to build a stable low-frequency appearance base with strong content preservation. The free branch complements fine details through residual generation, and a gated fusion mechanism combines the two branches under explicit frequency constraints to limit low-frequency drift. We further adopt a two-stage training schedule that first stabilizes the editing branch and then releases the residual branch to improve optimization stability. Experiments on the YCB-V dataset show that FD-DB improves real-domain appearance consistency and significantly boosts downstream semantic segmentation performance while preserving geometric and semantic structures.
💡 Research Summary
The paper tackles the long‑standing Sim2Real gap that hampers geometry‑sensitive vision tasks such as 6‑DoF pose estimation, semantic segmentation, and object detection. While synthetic data can be generated at scale with perfect pixel‑wise annotations, the appearance mismatch (illumination, sensor noise, post‑processing) between synthetic and real domains leads to severe performance degradation when models trained on synthetic data are deployed in the wild. Unpaired image‑to‑image translation is a popular remedy, but existing methods (CycleGAN, CUT, CyCADA, etc.) struggle to balance photorealism with structural fidelity: unconstrained generators often introduce geometric distortions, spurious textures, or label‑inconsistent changes that render the synthetic annotations unusable.
FD‑DB (Frequency‑Decoupled Dual‑Branch) proposes a fundamentally different architecture that explicitly separates low‑frequency style transfer from high‑frequency detail synthesis. The generator consists of two parallel branches:
- Interpretable Editing Branch (G_edit) – a lightweight CNN predicts six global imaging parameters (white-balance gains, exposure value, contrast factor, saturation, blur sigma, and grain amplitude/scale). These parameters are mapped through sigmoid, tanh, or log-tanh functions to enforce physically plausible ranges. A differentiable chain of editing operators then applies the parameters to the input synthetic image, producing a low-frequency "base" image y_edit. Because these operations only affect global color, tone, and blur, they preserve object boundaries and geometry.
- Free Residual Branch (G_free) – a conventional encoder-decoder (UNet-style) generates a full-resolution image y_free. A Gaussian low-pass filter LP(·; σ) extracts its low-frequency component y_L, and the high-frequency residual y_H = y_free − y_L is computed.
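The editing branch's operator chain can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the exact operator definitions, parameter ranges, and the blur/grain operators (omitted here for brevity) are assumptions.

```python
import numpy as np

def squash(raw, lo, hi):
    """Map an unbounded network output into a plausible range [lo, hi]
    via a sigmoid, analogous to the branch's range-enforcing mappings."""
    return lo + (hi - lo) / (1.0 + np.exp(-raw))

def apply_edit(img, wb, exposure, contrast, saturation):
    """Differentiable chain of global editing operators on an HxWx3 image
    in [0, 1]. Blur and grain are omitted in this sketch."""
    y = img * wb                        # per-channel white-balance gains
    y = y * (2.0 ** exposure)           # exposure, in photographic stops
    y = 0.5 + contrast * (y - 0.5)      # contrast about mid-gray
    gray = y.mean(axis=-1, keepdims=True)
    y = gray + saturation * (y - gray)  # pull toward / push away from gray
    return np.clip(y, 0.0, 1.0)

# Hypothetical raw predictions from the lightweight CNN, squashed into
# an illustrative white-balance range near 1.0:
wb = squash(np.array([0.2, -0.1, 0.3]), 0.8, 1.2)
```

Note that the identity parameters (unit gains, zero exposure, unit contrast and saturation) return the input unchanged, which is why this branch cannot distort geometry: every operator is a global, pixel-wise color/tone map.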
The final translated image is obtained by gated fusion:
y_R = clip( y_edit + g ⊙ y_H ),
where g ∈ [0, 1] is a gating map that scales the high-frequency residual elementwise (⊙), and clip(·) keeps the fused result in the valid image range.
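The fusion rule above can be sketched in NumPy as follows. The separable Gaussian kernel construction and the scalar-or-per-pixel gate are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def lowpass(img, sigma=1.0):
    """Gaussian low-pass LP(.; sigma) via a separable 1-D kernel,
    applied per channel along height and width."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    out = img.astype(float)
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, out)
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, out)
    return out

def gated_fusion(y_edit, y_free, gate, sigma=1.0):
    """y_R = clip(y_edit + g * y_H), with y_H = y_free - LP(y_free; sigma).
    `gate` may be a scalar or a per-pixel map in [0, 1]."""
    y_high = y_free - lowpass(y_free, sigma)
    return np.clip(y_edit + gate * y_high, 0.0, 1.0)
```

With g = 0 the output reduces to the editing branch's base image, so the gate directly bounds how much the free branch can alter the result; since y_H is high-pass by construction, low-frequency appearance stays anchored to y_edit.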