Precise Road Surface Reconstruction from a Single Camera

Reading time: 6 minutes
...

📝 Original Info

  • Title: Precise Road Surface Reconstruction from a Single Camera (단일 카메라 기반 도로 표면 정밀 복원 기술)
  • ArXiv ID: 2512.04303
  • Date: 2025-12-03
  • Authors: Gasser Elazab, Maximilian Jansen, Michael Unterreiner, Olaf Hellwich

📝 Abstract

Accurate perception of the vehicle's 3D surroundings, including fine-scale road geometry such as bumps, slopes, and surface irregularities, is essential for safe and comfortable vehicle control. However, conventional monocular depth estimation often oversmooths these features, losing critical information for motion planning and stability. To address this, we introduce Gamma-from-Mono (GfM), a lightweight monocular geometry estimation method that resolves the projective ambiguity in single-camera reconstruction by decoupling global and local structure. GfM predicts a dominant road surface plane together with residual variations expressed by 𝛾, a dimensionless measure of vertical deviation from the plane, defined as the ratio of a point's height above it to its depth from the camera, and grounded in established planar parallax geometry. With only the camera's height above ground, this representation deterministically recovers metric depth via a closed form, avoiding full extrinsic calibration and naturally prioritizing near-road detail. Its physically interpretable formulation makes it well suited for self-supervised learning, eliminating the need for large annotated datasets. Evaluated on KITTI and the Road Surface Reconstruction Dataset (RSRD), GfM achieves state-of-the-art near-field accuracy in both depth and 𝛾 estimation while maintaining competitive global depth performance. Our lightweight 8.88M-parameter model adapts robustly across diverse camera setups and, to our knowledge, is the first self-supervised monocular approach evaluated on RSRD.

💡 Deep Analysis

Deep Dive into Precise Road Surface Reconstruction from a Single Camera.


📄 Full Content

Modern robotics and autonomous vehicles demand scalable 3D perception, but achieving it with a single camera remains challenging [25]. Monocular Geometry Estimation (MGE) reconstructs per-pixel 3D structure from a single image, enabling cost-effective 3D perception without specialized sensors [2]. (Fig. 1: qualitative results on KITTI [9] and RSRD [48], with input images in the top-left.)

Specifically, accurate 3D reconstruction of the near-road geometry is crucial for navigation in autonomous driving [17,18], supporting obstacle avoidance and motion planning [34], and it is also vital for off-road and legged robotics, where elevation and local slope guide traversability and footstep planning [23]. However, monocular depth estimation methods, which are commonly used in these domains, struggle to accurately capture road topography [49]. Textureless pavements and low-contrast surfaces often lead to oversmoothing and underestimation of slopes, causing small obstacles and surface irregularities to be missed [1,41,49]. This limitation is critical, as small height variations, such as bumps or road-level changes, can differentiate drivable regions from hazards and negatively impact vehicle dynamics and safety [18,24,29].

Metric foundation depth models [4,19,20,43,44] generalize well across domains and provide strong scene-level predictions. However, they do not explicitly target road-relative quantities such as height above the ground or local slope. Moreover, they often suffer from residual scale drift when deployed in new environments, typically requiring inference-time scale correction. In contrast, self-supervised monocular methods [5,8,12,15,30,32,39,47] attempt to resolve scale ambiguity by incorporating metric anchors such as odometry sensors or ground-plane constraints. While these strategies stabilize global scale, they remain depth-centric, leaving road geometry underconstrained.
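The inference-time scale correction mentioned above is commonly a single global factor, e.g. the median ratio between sparse metric references (LiDAR returns, odometry-derived points) and the predicted depths. A minimal sketch of this standard practice (not the paper's method; the function name and setup are illustrative):

```python
import numpy as np

def median_scale_correction(pred_depth, ref_depth, valid=None):
    """Rescale a scale-ambiguous depth map by the median ratio to
    sparse metric references. A single global factor fixes overall
    scale but cannot repair per-region drift."""
    if valid is None:
        valid = ref_depth > 0  # treat zeros as missing references
    scale = np.median(ref_depth[valid] / pred_depth[valid])
    return scale * pred_depth, scale

# Toy example: predictions off by an unknown global factor of 2.5.
rng = np.random.default_rng(0)
true_depth = rng.uniform(5.0, 50.0, size=(10, 10))
pred = true_depth / 2.5
corrected, s = median_scale_correction(pred, true_depth)
print(round(s, 3))  # 2.5
```

Note that this requires metric references at test time, which is exactly the dependency GfM's closed-form conversion from camera height is designed to avoid.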

In practice, existing monocular pipelines recover height above the road indirectly via costly postprocessing, such as elevation maps [23]. Recent top-down (BEV) approaches [49] model road surfaces explicitly but rely on discretized ground-plane grids and dense ground-truth supervision, limiting resolution and scalability in unlabeled settings. Since single-view depth is projectively ambiguous [26,42], a road-relative height-to-depth ratio 𝛾 = ℎ/𝑑 offers a complementary representation: 𝛾 is dimensionless and tied to the ground. As shown in Fig. 2, doubling the scale of an object relative to the ground leaves the apparent vertical offset unchanged. From a single image the metric configuration is indistinguishable, so depth remains scale-ambiguous, whereas in 𝛾 space both configurations take the same dimensionless value, consistently tied to the road. With a known ground plane and camera height, this 𝛾 value converts to absolute height and depth in a simple closed-form way. Accordingly, we introduce Gamma from Mono (GfM), a single-frame approach that reframes monocular geometry around the dominant road plane, mitigating projective ambiguity and reducing scale to a single camera-height parameter. Our key contributions are:

• We propose a model that directly predicts a road-relative representation, comprising a global road-plane normal and a per-pixel height-to-depth ratio (𝛾), preserving near-road detail.

• To our knowledge, this is the first single-frame, self-supervised method that directly regresses 𝛾 for road-relative geometry, enabling explicit, interpretable estimates of road topography.

• We resolve metric scale from a dimensionless prediction, converting to metric depth using only known camera height and avoiding full extrinsic calibration or test-time fitting.
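The closed-form conversion can be sketched under one common planar-parallax convention (an assumption for illustration; the paper's exact parameterization may differ). If the road plane satisfies n·X = -h_cam in the camera frame, with n the unit upward normal and h_cam the camera height, then a 3D point P = d·K⁻¹ũ has height h = n·P + h_cam above the plane, so 𝛾 = h/d gives d = h_cam / (𝛾 - n·K⁻¹ũ):

```python
import numpy as np

def gamma_to_depth(gamma, K, n, h_cam):
    """Closed-form metric depth from a per-pixel gamma map.

    Assumes the road plane n . X = -h_cam in the camera frame
    (n: unit upward normal, h_cam: camera height in metres), so
        d = h_cam / (gamma - n . K^{-1} u~).
    Pixels at or above the horizon yield non-positive values and
    should be masked in practice.
    """
    H, W = gamma.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    rays = np.linalg.inv(K) @ np.stack(
        [u.ravel(), v.ravel(), np.ones(H * W)])  # (3, H*W) pixel rays
    n_dot_ray = (n @ rays).reshape(H, W)
    return h_cam / (gamma - n_dot_ray)

# Toy check: a flat road (gamma == 0) seen by a level camera 1.5 m
# above ground; camera y-axis points down, so the up-normal is (0,-1,0).
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])
depth = gamma_to_depth(np.zeros((480, 640)), K,
                       np.array([0., -1., 0.]), h_cam=1.5)
# A pixel 100 rows below the principal point looks down at
# tan(theta) = 100/500, so the road there lies at 1.5 * 500/100 m.
print(depth[340, 320])  # 7.5
```

The single free metric quantity is h_cam, which is why only the camera's mounting height, not a full extrinsic calibration, is needed to recover metric depth.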

Most prior work in monocular geometry estimation predicts per-pixel depth [22]. By contrast, the height-to-depth ratio 𝛾 has been a key parameter for multi-view reconstruction via planar parallax [14,27,28]. For example, MonoPP [8] uses 𝛾 only as a training-time scale cue distilled from multi-frame planar-parallax constraints.

On the other hand, Yuan et al. [45] compute per-pixel 𝛾 from multi-frame homography alignment with LiDAR supervision. Beyond these, monocular methods remain depth-centric: both approaches rely on multi-view cues or explicit ground-truth depth, and neither regresses 𝛾 directly from a single image in a self-supervised setting.

Scale ambiguity. Monocular depth estimation (MDE) has advanced greatly, yet models trained only on monocular images recover depth only up to an unknown scale factor [35,42]. However, metric-scale depth is crucial for autonomous driving safety [33]. Recent foundation models such as Metric3D [44], UniDepth [19,20], MoGe [35,36], DepthAnything [42,43], and Depth-Pro [4] include metric-depth variants, yet still exhibit small but persistent scale drift in novel environments. Moreover, these methods are trained on millions of images and use large transformer backbones, making them less suitable for resource-constrained real-time deployment on limited hardware.
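The scale ambiguity itself can be checked numerically: scaling the scene structure and the camera translation by the same factor leaves every reprojected pixel unchanged, so image evidence alone (and hence any photometric or reprojection loss) cannot determine absolute scale. A quick sketch with illustrative values (not from the paper):

```python
import numpy as np

def project(K, X):
    """Pinhole projection of 3D points X (3, N) into pixel coords."""
    x = K @ X
    return x[:2] / x[2]

K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])
rng = np.random.default_rng(1)
# Random points in front of the first camera, and a 1 m forward
# translation to a second camera.
X = rng.uniform([-5., -1., 5.], [5., 1., 40.], size=(100, 3)).T
t = np.array([0., 0., 1.])[:, None]

uv_ref = project(K, X - t)  # pixels in the second camera
for s in (2.0, 10.0):
    # Joint rescaling of structure and translation: identical images.
    assert np.allclose(project(K, s * X - s * t), uv_ref)
```

This is exactly the degree of freedom GfM collapses to a single known quantity, the camera height, via the dimensionless 𝛾 representation.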

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
