Nimbus: A Unified Embodied Synthetic Data Generation Framework

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Scaling data volume and diversity is critical for generalizing embodied intelligence. While synthetic data generation offers a scalable alternative to expensive physical data acquisition, existing pipelines remain fragmented and task-specific. This isolation leads to significant engineering inefficiency and system instability, failing to support the sustained, high-throughput data generation required for foundation model training. To address these challenges, we present Nimbus, a unified synthetic data generation framework designed to integrate heterogeneous navigation and manipulation pipelines. Nimbus introduces a modular four-layer architecture featuring a decoupled execution model that separates trajectory planning, rendering, and storage into asynchronous stages. By implementing dynamic pipeline scheduling, global load balancing, distributed fault tolerance, and backend-specific rendering optimizations, the system maximizes utilization of CPU, GPU, and I/O resources. Our evaluation demonstrates that Nimbus achieves a 2–3× improvement in end-to-end throughput compared to unoptimized baselines while ensuring robust, long-term operation in large-scale distributed environments. This framework serves as the production backbone for the InternData suite, enabling seamless cross-domain data synthesis.


💡 Research Summary

Nimbus is a unified synthetic data generation framework that addresses the fragmentation and inefficiency of existing pipelines for embodied AI tasks such as navigation and manipulation. The authors identify two core problems in current practice: engineering redundancy caused by task‑specific pipelines, and system instability due to the lack of standardized scheduling, load balancing, and fault‑tolerance mechanisms. To solve these issues, Nimbus introduces a four‑layer architecture:

  1. Stage Runner Layer – This layer decouples the three major stages of data generation (trajectory planning, rendering, and storage) into independent asynchronous workers. Planning, which is CPU‑bound, runs in a pool of threads; rendering, which is GPU‑bound, uses batched, hardware‑accelerated pipelines; and storage, which is I/O‑bound, writes data asynchronously to high‑speed storage. By separating these stages, Nimbus eliminates blocking between them and achieves near‑optimal utilization of CPU, GPU, and I/O resources.

  2. Schedule Optimization Layer – Nimbus implements two levels of optimization. First, pipeline parallelism replicates a scenario across many instances, allowing multiple trajectories to be processed concurrently. Second, a global distributed optimizer monitors per‑node resource usage (CPU, GPU, memory, network) and dynamically reassigns work to balance load. A per‑worker supervisor continuously checks liveness; upon failure it automatically restarts the worker and restores state from checkpoints, providing 99.9%+ availability even during multi‑day runs.

  3. Components Layer – This abstraction unifies navigation (InternData‑N1) and manipulation (InternData‑A1, InternData‑M1) pipelines under a common interface. It encapsulates scene library construction, skill composition, domain randomization, trajectory generation, and metadata recording. Adding a new robot morphology, environment, or task template requires only the definition of assets and high‑level specifications; the underlying scheduling and backend optimizations are reused without modification.

  4. Backend Optimization Layer – Nimbus tailors performance improvements for three major rendering back‑ends. For Blender, it introduces hardware‑accelerated rasterization and batch processing, cutting per‑frame render time by roughly 30 %. For Isaac Sim, a stacked‑rendering approach reduces memory pressure while increasing frame rates. For Gaussian Splatting, kernel fusion enables near‑real‑time high‑resolution point‑cloud rendering.

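The decoupled execution model in the Stage Runner Layer can be illustrated with a minimal sketch: three asynchronous workers connected by bounded queues, so that a temporarily slow stage never blocks the others beyond the buffer capacity. All names here (`plan_trajectory`, `render_frames`, the queue sizes) are illustrative assumptions, not Nimbus's actual API; the real stages would dispatch to a CPU pool, a GPU rendering backend, and asynchronous storage rather than the stand-in functions below.

```python
import queue
import threading

SENTINEL = object()  # signals the end of the work stream

def plan_trajectory(task):   # stand-in for the CPU-bound planning stage
    return f"trajectory({task})"

def render_frames(traj):     # stand-in for the GPU-bound rendering stage
    return f"frames({traj})"

def stage(fn, inbox, outbox):
    """Generic asynchronous stage: consume from inbox, transform, forward."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            if outbox is not None:
                outbox.put(SENTINEL)  # propagate shutdown downstream
            break
        result = fn(item)
        if outbox is not None:
            outbox.put(result)

def run_pipeline(tasks):
    # Bounded queues decouple the stages: each runs at its own pace.
    q0, q1, q2 = (queue.Queue(maxsize=8) for _ in range(3))
    sink = []  # stand-in for the I/O-bound storage stage's backend
    workers = [
        threading.Thread(target=stage, args=(plan_trajectory, q0, q1)),
        threading.Thread(target=stage, args=(render_frames, q1, q2)),
        threading.Thread(target=stage, args=(sink.append, q2, None)),
    ]
    for w in workers:
        w.start()
    for t in tasks:
        q0.put(t)
    q0.put(SENTINEL)
    for w in workers:
        w.join()
    return sink

print(run_pipeline(["ep0", "ep1", "ep2"]))
```

Because each stage is a single consumer on its queue, episode ordering is preserved end to end; the paper's pipeline parallelism would additionally replicate this chain across scenario instances.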
The evaluation is conducted on a cluster with 48 GPUs and 96 CPUs. Nimbus achieves a 2–3× increase in end‑to‑end throughput compared to an unoptimized baseline, generating over 100 million frames in a 72‑hour continuous run without data loss. The global load balancer keeps resource utilization above 90%, and the supervisor‑based fault recovery maintains system uptime above 99.9%. These results demonstrate that Nimbus can reliably produce the massive, diverse datasets required for training foundation models in embodied AI, while dramatically reducing engineering effort and operational cost. The framework already powers the InternData suite and is positioned for future extensions to additional robot platforms and simulation engines.
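The supervisor-based fault recovery described above can be sketched as a heartbeat check with checkpoint restore. This is a hypothetical illustration of the mechanism, not Nimbus's implementation: the class names, the staleness threshold, and the integer checkpoint are all assumptions made for brevity.

```python
import time

class Worker:
    """Toy worker that checkpoints an integer progress counter."""
    def __init__(self, checkpoint=0):
        self.progress = checkpoint          # last durable checkpoint
        self.heartbeat = time.monotonic()
        self.alive = True

    def step(self):
        if not self.alive:                  # a crashed worker stops heartbeating
            return
        self.progress += 1
        self.heartbeat = time.monotonic()

def supervise(worker, stale_after=0.05):
    """If the worker's heartbeat is stale, replace it with a fresh worker
    restored from the last checkpoint; otherwise leave it running."""
    if time.monotonic() - worker.heartbeat > stale_after:
        return Worker(checkpoint=worker.progress)
    return worker

# Simulate a crash followed by supervisor-driven recovery.
w = Worker()
w.step(); w.step()
w.alive = False        # simulated failure: heartbeats stop
time.sleep(0.1)        # heartbeat goes stale
w = supervise(w)       # supervisor restarts from checkpoint 2
print(w.alive, w.progress)
```

In a distributed deployment the supervisor loop would run per node and poll many workers; the key property shown here is that recovery resumes from the checkpoint rather than from scratch, which is what allows multi-day runs to sustain high availability.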

