Title: AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning
ArXiv ID: 2512.17853
Date: 2025-12-19
Authors: **Ran Gong, Xiaohan Zhang, Jinghuan Shang, Maria Vittoria Minniti, Jigarkumar Patel, Valerio Pepe, Riedana Yan, Ahmet Gundogdu, Ivan Kapelyukh, Ali Abbas, Xiaoqiang Yan, Harsh Patel, Laura Herlant, Karl Schmeckpeper** (all with the Robotics and AI Institute, Boston, MA, USA)
📝 Abstract
Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. While simulation has become a promising way for scaling up data collection, the related tasks, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present AnyTask, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three AnyTask agents for generating expert demonstrations aiming to solve as many tasks as possible: 1) ViPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) ViPR-Eureka, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; 3) ViPR-RL, a hybrid planning and learning approach that jointly produces high-quality demonstrations with only sparse rewards. We train behavior cloning policies on generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com .
📄 Full Content
ANYTASK: an Automated Task and Data Generation Framework for
Advancing Sim-to-Real Policy Learning
Ran Gong1∗, Xiaohan Zhang1∗, Jinghuan Shang1∗, Maria Vittoria Minniti1∗,
Jigarkumar Patel1, Valerio Pepe1, Riedana Yan1, Ahmet Gundogdu1, Ivan Kapelyukh1, Ali Abbas1,
Xiaoqiang Yan1, Harsh Patel1, Laura Herlant1, Karl Schmeckpeper1
[Fig. 1 panel labels: "Learn AnyTask with these objects."; Automated Data Synthesis (put object in closed drawer, place strawberry in bowl, push pear to center, lift foam brick, lift banana, stack banana on can, lift peach, open drawer); Zero-shot Sim2Real Transfer]
Fig. 1: ANYTASK is a framework that automates task design and generates data for robot learning. The resulting data enables
training visuomotor policies that can be deployed directly onto a physical robot without requiring any real-world data.
Abstract— Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. While simulation has become a promising way for scaling up data collection, the related tasks, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present ANYTASK, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three ANYTASK agents for generating expert demonstrations aiming to solve as many tasks as possible: 1) VIPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) VIPR-EUREKA, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; 3) VIPR-RL, a hybrid planning and learning approach that jointly produces high-quality demonstrations with only sparse rewards. We train behavior cloning policies on generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com.

*Equal Contribution
1The authors are with the Robotics and AI Institute, Boston, MA, USA. {rgong, xzhang, jshang, mminniti, jpatel, vpepe, ryan, agundogdu, IKapelyukh, aabbas, xyan, hapatel, lherlant, kschmeckpeper}@rai-inst.com
I. INTRODUCTION
The success of deep learning fundamentally depends on access to large-scale, high-quality data [1]–[3], as demonstrated in various domains such as language modeling [4]–[7], visual understanding [8]–[14], generation [15]–[17], and multimodal applications [18]–[20]. However, collecting robot data is extremely time-consuming and costly [21], [22], as it necessitates direct physical interaction with the real world. Robot simulation, which can be scaled straightforwardly with compute [23]–[25], presents an appealing alternative for collecting large-scale datasets with minimal real-world effort [26]–[31]. While prior work has made significant progress in designing simulation systems for a wide range of tasks, the tremendous human effort required to build these systems remains a major barrier [32], [33]. This effort includes proposing tasks, selecting task-relevant object assets, designing metrics, ensuring feasibility, and generating a large quantity of high-quality demonstration data. These non-trivial components frequently limit the diversity of the generated data.
Fig. 2: Overview of ANYTASK. We first generate simulated manipulation tasks from an object database and a high-level task (i.e., task type). Then the pipeline automatically proposes task descriptions, generates the simulation code, and efficiently collects data using different agents, including VIPR, VIPR-RL, and VIPR-EUREKA, in massively parallel simulation environments. We apply online domain randomization in the simulation to ensure the diversity of the scenes and the visual observations. Finally, we train the policy on the simulated data and transfer it zero-shot to the real world.
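To make the workflow above concrete, the following is a minimal, hypothetical sketch of the data-generation loop in code. All names here (TaskSpec, collect_demonstrations, the agent callables, MIN_DEMOS) are illustrative placeholders rather than the released ANYTASK API; they only mirror the structure described in the caption, under the assumption that each agent returns candidate trajectories from parallel rollouts.

```python
# Hypothetical sketch of the AnyTask data-generation loop (Fig. 2).
# All names are illustrative placeholders, not the released AnyTask API.
from dataclasses import dataclass
from typing import Callable, List, Sequence

MIN_DEMOS = 100  # assumed minimum number of successful demos per task


@dataclass
class TaskSpec:
    description: str                      # natural-language task proposed by the LLM/VLM
    sim_code: str                         # generated simulation environment code
    is_success: Callable[[object], bool]  # programmatic success check for a trajectory


def collect_demonstrations(
    task: TaskSpec,
    agents: Sequence[Callable[[TaskSpec, int], List[object]]],  # e.g. ViPR, ViPR-RL, ViPR-Eureka
    num_envs: int = 4096,
) -> List[object]:
    """Try each data-generation agent in massively parallel envs until one
    yields enough successful trajectories for behavior cloning."""
    for agent in agents:
        trajectories = agent(task, num_envs)
        successes = [t for t in trajectories if task.is_success(t)]
        if len(successes) >= MIN_DEMOS:
            return successes
    return []  # task unsolved by all agents; skip or flag for refinement
```

A full pipeline would additionally apply online domain randomization to each rollout and feed the successful trajectories into behavior-cloning training before zero-shot deployment.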
Trained on vast internet data, foundation models demonstrate remarkable abilities in robotic downstream applications, such as scene understanding, task planning, motion synthesis, and low-level control [34]–[37]. These capabilities can also be leveraged to automate many key steps in creating robotic simulation environments, such as task design, writing simulation code, and iterative refinement. However, prior work leveraging foundation models for robot simulation either requires significant human effort for task design and demonstration collection [30], [31], [38], or struggles with sim-to-real transfer [39], [40], even though the ultimate goal of large-scale data collection is to deploy the trained system in the real world.
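As a rough illustration of the kind of foundation-model-in-the-loop refinement such automation relies on, the sketch below shows a generic generate-evaluate-revise cycle. The `generate` and `evaluate` callables are assumptions standing in for an LLM code generator and a run-in-simulation plus VLM critique step; this is not ANYTASK's actual interface.

```python
# Generic generate-evaluate-revise loop for LLM-written simulation task code.
# `generate` and `evaluate` are caller-supplied stand-ins (assumptions), e.g.
# an LLM completion call and a run-in-sim + VLM critique step.
from typing import Callable, NamedTuple


class Critique(NamedTuple):
    ok: bool   # does the rendered rollout match the task description?
    text: str  # feedback used to revise the generated code


def refine_task_code(
    description: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str], Critique],
    max_iters: int = 5,
) -> str:
    """Draft simulation code for a task, then iteratively revise it from feedback."""
    code = generate(f"Write simulation task code for: {description}")
    for _ in range(max_iters):
        critique = evaluate(code)
        if critique.ok:
            break
        code = generate(
            f"Revise the code below given this feedback.\n"
            f"Feedback: {critique.text}\nCode:\n{code}"
        )
    return code
```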
To address the aforementioned challenges, we introduce ANYTASK (Figur