Title: AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning
ArXiv ID: 2512.17853
Date: 2025-12-19
Authors: **Ran Gong, Xiaohan Zhang, Jinghuan Shang, Maria Vittoria Minniti, Jigarkumar Patel, Valerio Pepe, Riedana Yan, Ahmet Gundogdu, Ivan Kapelyukh, Ali Abbas, Xiaoqiang Yan, Harsh Patel, Laura Herlant, Karl Schmeckpeper** (all with the Robotics and AI Institute, Boston, MA, USA)
📝 Abstract
Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. While simulation has become a promising way for scaling up data collection, the related tasks, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present AnyTask, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three AnyTask agents for generating expert demonstrations aiming to solve as many tasks as possible: 1) ViPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) ViPR-Eureka, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; 3) ViPR-RL, a hybrid planning and learning approach that jointly produces high-quality demonstrations with only sparse rewards. We train behavior cloning policies on generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com .
📄 Full Content
ANYTASK: an Automated Task and Data Generation Framework for
Advancing Sim-to-Real Policy Learning
Ran Gong1∗, Xiaohan Zhang1∗, Jinghuan Shang1∗, Maria Vittoria Minniti1∗,
Jigarkumar Patel1, Valerio Pepe1, Riedana Yan1, Ahmet Gundogdu1, Ivan Kapelyukh1, Ali Abbas1,
Xiaoqiang Yan1, Harsh Patel1, Laura Herlant1, Karl Schmeckpeper1
[Fig. 1 panel labels: "Learn AnyTask with these objects."; Automated Data Synthesis (put object in closed drawer, place strawberry in bowl, push pear to center, lift foam brick, lift banana, stack banana on can, lift peach, open drawer); Zero-shot Sim2Real Transfer]
Fig. 1: ANYTASK is a framework that automates task design and generates data for robot learning. The resulting data enables
training visuomotor policies that can be deployed directly onto a physical robot without requiring any real-world data.
Abstract— Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. While simulation has become a promising way for scaling up data collection, the related tasks, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present ANYTASK, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three ANYTASK agents for generating expert demonstrations aiming to solve as many tasks as possible: 1) VIPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) VIPR-EUREKA, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; 3) VIPR-RL, a hybrid planning and learning approach that jointly produces high-quality demonstrations with only sparse rewards. We train behavior cloning policies on generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com.

*Equal Contribution
1The authors are with the Robotics and AI Institute, Boston, MA, USA. {rgong, xzhang, jshang, mminniti, jpatel, vpepe, ryan, agundogdu, IKapelyukh, aabbas, xyan, hapatel, lherlant, kschmeckpeper}@rai-inst.com
I. INTRODUCTION
The success of deep learning fundamentally depends on access to large-scale, high-quality data [1]–[3], as demonstrated in various domains such as language modeling [4]–[7], visual understanding [8]–[14], generation [15]–[17], and multimodal applications [18]–[20]. However, collecting robot data is extremely time-consuming and costly [21], [22], as it necessitates direct physical interaction with the real world. Robot simulation, which can be scaled straightforwardly with compute [23]–[25], presents an appealing alternative for collecting large-scale datasets with minimal real-world effort [26]–[31]. While prior work has made significant progress in designing simulation systems for a wide range of tasks, the tremendous human effort required to build these systems remains a major barrier [32], [33]. This effort includes proposing tasks, selecting task-relevant object assets, designing metrics, ensuring feasibility, and generating a large quantity of high-quality demonstration data. These non-trivial components frequently limit the diversity of the generated data.
Fig. 2: Overview of ANYTASK. We first generate simulated manipulation tasks from an object database and a high-level task (i.e., task type). Then the pipeline automatically proposes task descriptions, generates the simulation code, and efficiently collects data using different agents, including VIPR, VIPR-RL, and VIPR-EUREKA, in massively parallel simulation environments. We apply online domain randomization in the simulation to ensure the diversity of the scenes and the visual observations. Finally, we train the policy on the simulated data and transfer it zero-shot to the real world.
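To make the workflow above concrete, the following is a minimal, hypothetical sketch of the data-generation loop in code. All names here (TaskSpec, collect_demonstrations, the agent callables, MIN_DEMOS) are illustrative placeholders rather than the released ANYTASK API; they only mirror the structure described in the caption, under the assumption that each agent returns candidate trajectories from parallel rollouts.

```python
# Hypothetical sketch of the AnyTask data-generation loop (Fig. 2).
# All names are illustrative placeholders, not the released AnyTask API.
from dataclasses import dataclass
from typing import Callable, List, Sequence

MIN_DEMOS = 100  # assumed minimum number of successful demos per task


@dataclass
class TaskSpec:
    description: str                      # natural-language task proposed by the LLM/VLM
    sim_code: str                         # generated simulation environment code
    is_success: Callable[[object], bool]  # programmatic success check for a trajectory


def collect_demonstrations(
    task: TaskSpec,
    agents: Sequence[Callable[[TaskSpec, int], List[object]]],  # e.g. ViPR, ViPR-RL, ViPR-Eureka
    num_envs: int = 4096,
) -> List[object]:
    """Try each data-generation agent in massively parallel envs until one
    yields enough successful trajectories for behavior cloning."""
    for agent in agents:
        trajectories = agent(task, num_envs)
        successes = [t for t in trajectories if task.is_success(t)]
        if len(successes) >= MIN_DEMOS:
            return successes
    return []  # task unsolved by all agents; skip or flag for refinement
```

A full pipeline would additionally apply online domain randomization to each rollout and feed the successful trajectories into behavior-cloning training before zero-shot deployment.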
Trained on vast internet data, foundation models demonstrate remarkable abilities in robotic downstream applications, such as scene understanding, task planning, motion synthesis, and low-level control [34]–[37]. These capabilities can also be leveraged to automate many key steps in creating robotic simulation environments, such as task design, writing simulation code, and iterative refinement. However, prior work leveraging foundation models for robot simulation either requires significant human effort for task design and demonstration collection [30], [31], [38], or struggles with sim-to-real transfer [39], [40], even though the ultimate goal of large-scale data collection is to deploy the trained system in the real world.
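As a rough illustration of the kind of foundation-model-in-the-loop refinement such automation relies on, the sketch below shows a generic generate-evaluate-revise cycle. The `generate` and `evaluate` callables are assumptions standing in for an LLM code generator and a run-in-simulation plus VLM critique step; this is not ANYTASK's actual interface.

```python
# Generic generate-evaluate-revise loop for LLM-written simulation task code.
# `generate` and `evaluate` are caller-supplied stand-ins (assumptions), e.g.
# an LLM completion call and a run-in-sim + VLM critique step.
from typing import Callable, NamedTuple


class Critique(NamedTuple):
    ok: bool   # does the rendered rollout match the task description?
    text: str  # feedback used to revise the generated code


def refine_task_code(
    description: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str], Critique],
    max_iters: int = 5,
) -> str:
    """Draft simulation code for a task, then iteratively revise it from feedback."""
    code = generate(f"Write simulation task code for: {description}")
    for _ in range(max_iters):
        critique = evaluate(code)
        if critique.ok:
            break
        code = generate(
            f"Revise the code below given this feedback.\n"
            f"Feedback: {critique.text}\nCode:\n{code}"
        )
    return code
```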
To address the aforementioned challenges, we introduce ANYTASK (Figur