Beyond a Single Light: A Large-Scale Aerial Dataset for Urban Scene Reconstruction Under Varying Illumination
Recent advances in Neural Radiance Fields and 3D Gaussian Splatting have demonstrated strong potential for large-scale UAV-based 3D reconstruction by fitting the appearance of images. However, real-world large-scale captures often span multiple acquisition sessions, and illumination inconsistencies across different times of day can lead to color artifacts, geometric inaccuracies, and inconsistent appearance. Because no existing UAV dataset systematically captures the same areas under varying illumination conditions, this challenge remains largely underexplored. To fill this gap, we introduce SkyLume, a large-scale, real-world UAV dataset specifically designed for studying illumination-robust 3D reconstruction in urban scene modeling: (1) We collect data from 10 urban regions, comprising more than 100k high-resolution UAV images (four oblique views and one nadir view), where each region is captured at three periods of the day to systematically isolate illumination changes. (2) To support precise evaluation of geometry and appearance, we provide per-scene LiDAR scans and accurate 3D ground truth for assessing depth, surface normals, and reconstruction quality under varying illumination. (3) For the inverse rendering task, we introduce the Temporal Consistency Coefficient (TCC), a metric that measures cross-time albedo stability and directly evaluates the robustness of light-material disentanglement. We aim for this resource to serve as a foundation for research and real-world evaluation in large-scale inverse rendering, geometry reconstruction, and novel view synthesis.
💡 Research Summary
The paper introduces SkyLume, a large‑scale, real‑world UAV dataset specifically designed to study the impact of illumination changes on urban‑scale 3D reconstruction, novel view synthesis, and inverse rendering. Existing aerial datasets either lack systematic multi‑temporal captures, provide insufficient ground‑truth geometry, or do not support evaluation of illumination robustness. SkyLume fills this gap by collecting over 100 k high‑resolution (≈6 K) images from ten diverse urban regions, each captured at three distinct times of day—early morning, midday, and late afternoon—along identical RTK‑guided flight paths. For every region, four oblique views and one nadir view are recorded simultaneously, ensuring dense façade coverage while maintaining consistent viewpoints across the three illumination conditions.
To guarantee metric‑accurate geometry, the authors acquire a high‑precision DJI Zenmuse L2 LiDAR scan (≈5 cm horizontal, 4 cm vertical accuracy) for each scene. The LiDAR point clouds are used as a “metric scaffold”: they are rendered into pseudo‑RGB images that participate in a joint Structure‑from‑Motion (SfM) optimization together with the RGB images. This LiDAR‑guided SfM aligns all camera poses across the three time slots to a common coordinate system, and any residual drift is corrected with a small set of manually placed ground‑control points. The aligned poses then feed a LiDAR‑guided Multi‑View Stereo (MVS) pipeline that fuses the multi‑temporal imagery with the point cloud, producing dense, high‑fidelity meshes. Special handling is applied to reflective or translucent surfaces such as water bodies, where flat planar patches are reconstructed from LiDAR measurements to avoid MVS failures.
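The "metric scaffold" idea above hinges on rendering the LiDAR point cloud into camera views so it can participate in SfM alongside the RGB images. The paper does not spell out this step in code, so the following is a minimal sketch of the core operation under standard assumptions: a pinhole camera model, world-to-camera extrinsics `R`, `t`, and per-point LiDAR intensities used as pseudo-colors. The function name and z-buffer splatting strategy are illustrative, not the authors' implementation.

```python
import numpy as np

def project_lidar_to_image(points, intensities, K, R, t, width, height):
    """Splat LiDAR points (N, 3) into a pinhole camera to form a sparse
    pseudo-intensity image plus a depth buffer -- the kind of rendering
    step a LiDAR-guided SfM pipeline needs. Illustrative sketch only.

    K: 3x3 intrinsics; R (3x3), t (3,): world-to-camera extrinsics.
    """
    cam = (R @ points.T + t.reshape(3, 1)).T        # world -> camera frame
    in_front = cam[:, 2] > 1e-6                     # keep points ahead of camera
    cam, vals = cam[in_front], intensities[in_front]
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                     # perspective divide
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    img = np.zeros((height, width), dtype=np.float32)
    depth = np.full((height, width), np.inf, dtype=np.float32)
    for ui, vi, zi, ci in zip(u[ok], v[ok], cam[ok, 2], vals[ok]):
        if zi < depth[vi, ui]:                      # z-buffer: nearest point wins
            depth[vi, ui] = zi
            img[vi, ui] = ci
    return img, depth
```

A real pipeline would additionally splat points over small neighborhoods (to densify the pseudo-image) and feed the result to the joint SfM optimization; the sketch keeps only the projection and occlusion-handling logic.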
Beyond the raw data, SkyLume provides per‑frame depth and normal maps derived both from the meshes (dense) and directly from the LiDAR point clouds (sparser but metrically accurate). Solar elevation and azimuth for each image are also supplied, enabling future work on shadow removal or physically‑based illumination modeling. All assets are released in standard COLMAP format, together with predefined training/validation/test splits to facilitate reproducible benchmarking.
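Deriving a normal map from a dense depth map, as described above, is typically done by back-projecting each pixel to a 3D point and taking the cross product of the image-space tangent vectors. The sketch below shows this standard construction; the function name is illustrative and the exact convention (normal orientation, smoothing, handling of depth discontinuities) may differ from the dataset's tooling.

```python
import numpy as np

def normals_from_depth(depth, K):
    """Estimate per-pixel surface normals (camera frame) from a metric
    depth map via finite differences. Standard technique, not the
    dataset's exact recipe.

    depth: (H, W) metric depth; K: 3x3 pinhole intrinsics.
    """
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel to a 3D point in the camera frame.
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    P = np.dstack([X, Y, depth])                  # (H, W, 3)
    dPdu = np.gradient(P, axis=1)                 # tangent along image x
    dPdv = np.gradient(P, axis=0)                 # tangent along image y
    n = np.cross(dPdu, dPdv)                      # surface normal per pixel
    n /= np.linalg.norm(n, axis=2, keepdims=True) + 1e-12
    return n
```

For a fronto-parallel plane this yields the expected constant normal; near depth discontinuities (building edges) the finite differences are unreliable and are usually masked out.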
A central contribution is the Temporal Consistency Coefficient (TCC), a novel metric that quantifies cross‑time stability. TCC‑Albedo measures the L2 distance between albedo maps rendered from the same viewpoint across the three illumination periods, directly assessing how well a method separates material reflectance from lighting. TCC‑Geometry evaluates the pairwise Chamfer distance and normal deviation between meshes reconstructed from different times, exposing geometry inconsistencies caused by shadows being misinterpreted as solid structures. By providing both photometric and geometric temporal consistency scores, TCC complements traditional image‑level metrics such as PSNR, SSIM, or LPIPS, which ignore temporal coherence.
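The two TCC components can be sketched directly from the description above: a pairwise photometric distance between same-viewpoint albedo maps, and a symmetric Chamfer distance between cross-time reconstructions. The paper defines the exact normalization and masking; the sketch below assumes a simple RMSE for the albedo term and a brute-force Chamfer distance on point samples, with illustrative function names.

```python
import numpy as np
from itertools import combinations

def tcc_albedo(albedo_maps):
    """Mean pairwise RMSE between albedo maps rendered from the same
    viewpoint at different times of day; lower = more time-invariant
    albedo. albedo_maps: list of (H, W, 3) arrays. Sketch only -- the
    paper's exact normalization may differ.
    """
    dists = [np.sqrt(np.mean((a - b) ** 2))
             for a, b in combinations(albedo_maps, 2)]
    return float(np.mean(dists))

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and
    q (M, 3), as used pairwise in TCC-Geometry. Brute force; fine for
    small clouds, use a KD-tree at scale.
    """
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())
```

In practice the geometry term would be computed on points sampled from the per-time meshes, with the normal-deviation term evaluated analogously over nearest-neighbor pairs.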
The authors benchmark several state‑of‑the‑art 3D Gaussian Splatting (3DGS) variants on SkyLume. Results reveal three key challenges: (1) illumination shifts degrade rendering quality, especially on weakly textured façades, glass, and water‑adjacent surfaces, leading to color bleeding and shadow over‑fitting; (2) geometry extraction suffers from shadow‑induced artifacts, where cast shadows are erroneously reconstructed as elevated surfaces, inflating depth error and corrupting normal estimates; (3) inverse rendering methods struggle to produce time‑invariant albedo, as residual shadows and exposure variations remain embedded in the estimated material maps, yielding high TCC‑Albedo values. These findings underscore the necessity of explicit light‑material disentanglement, dynamic illumination handling, and cross‑time regularization for city‑scale reconstruction.
In summary, SkyLume is the first real‑world aerial dataset that simultaneously offers (i) multi‑temporal, multi‑view high‑resolution imagery, (ii) precise LiDAR‑derived ground‑truth geometry, and (iii) a dedicated temporal consistency evaluation protocol. It enables rigorous assessment of illumination‑robust 3D reconstruction, novel view synthesis, and inverse rendering at city scale, and is expected to catalyze research into light‑aware neural representations, robust SfM/MVS pipelines, and real‑time digital twin generation for smart‑city applications.