AgentBay: A Hybrid Interaction Sandbox for Seamless Human-AI Intervention in Agentic Systems

Reading time: 6 minutes

📝 Original Info

  • Title: AgentBay: A Hybrid Interaction Sandbox for Seamless Human-AI Intervention in Agentic Systems
  • ArXiv ID: 2512.04367
  • Date: 2025-12-04
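  • Authors: Yun Piao, Hongbo Min, Hang Su, Leilei Zhang, Lei Wang, Yue Yin, Xiao Wu, Zhejing Xu, Liwei Qu, Hang Li, Xinxin Zeng, Wei Tian, Fei Yu, Xiaowei Li, Jiayi Jiang, Tongxu Liu, Hao Tian, Yufei Que, Xiaobing Tu, Bing Suo, Yuebing Li, Xiangting Chen, Zeen Zhao, Jiaming Tang, Wei Huang, Xuguang Li, Jing Zhao, Jin Li, Jie Shen, Jinkui Ren, Xiantao Zhang (Alibaba Cloud Computing)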

📝 Abstract

The rapid advancement of Large Language Models (LLMs) is catalyzing a shift towards autonomous AI agents capable of executing complex, multi-step tasks. However, these agents remain brittle when faced with real-world exceptions, making Human-in-the-Loop (HITL) supervision essential for mission-critical applications. In this paper, we present AgentBay, a novel sandbox service designed from the ground up for hybrid interaction. AgentBay provides secure, isolated execution environments spanning Windows, Linux, Android, web browsers, and code interpreters. Its core contribution is a unified session accessible via a hybrid control interface: an AI agent can interact programmatically via mainstream interfaces (MCP, open-source SDK), while a human operator can, at any moment, seamlessly take over full manual control. This seamless intervention is enabled by the Adaptive Streaming Protocol (ASP). Unlike traditional VNC/RDP, ASP is specifically engineered for this hybrid use case, delivering an ultra-low-latency, smoother user experience that remains resilient even in weak network environments. It achieves this by dynamically blending command-based and video-based streaming, adapting its encoding strategy based on network conditions and the current controller (AI or human). Our evaluation demonstrates strong results in security, performance, and task completion rates. In a benchmark of complex tasks, the AgentBay (Agent + Human) model improved the success rate by more than 48%. Furthermore, ASP reduces bandwidth consumption by up to 50% compared to standard RDP and lowers end-to-end latency by around 5%, especially under poor network conditions. We posit that AgentBay provides a foundational primitive for building the next generation of reliable, human-supervised autonomous systems.
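
The abstract says ASP picks between command-based and video-based streaming depending on network conditions and on who currently controls the session. The source available here does not give the actual decision policy, so the following is only a minimal sketch of what such a selector could look like. Every name (`Controller`, `StreamMode`, `NetworkStats`, `choose_stream_mode`) and every threshold is a hypothetical placeholder, not part of AgentBay or ASP.

```python
from dataclasses import dataclass
from enum import Enum


class Controller(Enum):
    AI = "ai"
    HUMAN = "human"


class StreamMode(Enum):
    COMMAND = "command"  # send compact UI/drawing commands
    VIDEO = "video"      # send encoded video frames


@dataclass
class NetworkStats:
    rtt_ms: float          # measured round-trip time
    bandwidth_kbps: float  # estimated available bandwidth
    loss_rate: float       # packet loss as a fraction in [0, 1]


def choose_stream_mode(controller, net, screen_change_ratio):
    """Pick a streaming mode for the next interval.

    screen_change_ratio is the fraction of the screen that changed
    recently (0.0 to 1.0). All thresholds are illustrative assumptions.
    """
    # Assumption: an AI controller reads state programmatically and
    # does not need a smooth visual stream.
    if controller is Controller.AI:
        return StreamMode.COMMAND

    # Assumption: on a weak link, compact commands are preferred over video.
    link_is_weak = (
        net.bandwidth_kbps < 1500
        or net.loss_rate > 0.05
        or net.rtt_ms > 200
    )
    if link_is_weak:
        return StreamMode.COMMAND

    # Assumption: heavy screen motion on a healthy link favors video.
    if screen_change_ratio > 0.3:
        return StreamMode.VIDEO
    return StreamMode.COMMAND
```

A real protocol would presumably blend the two modes rather than switch between them outright, as the abstract suggests; the hard switch here only keeps the sketch short.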

📄 Full Content

AgentBay: A Hybrid Interaction Sandbox for Seamless Human-AI Intervention in Agentic Systems

Yun Piao, Hongbo Min, Hang Su, Leilei Zhang, Lei Wang, Yue Yin, Xiao Wu, Zhejing Xu, Liwei Qu, Hang Li, Xinxin Zeng, Wei Tian, Fei Yu, Xiaowei Li, Jiayi Jiang, Tongxu Liu, Hao Tian, Yufei Que, Xiaobing Tu, Bing Suo, Yuebing Li, Xiangting Chen, Zeen Zhao, Jiaming Tang, Wei Huang, Xuguang Li, Jing Zhao, Jin Li, Jie Shen, Jinkui Ren, Xiantao Zhang

Alibaba Cloud Computing

arXiv:2512.04367v1 [cs.AI] 4 Dec 2025

1 Introduction

The proliferation of Large Language Models (LLMs) has given rise to a new paradigm of autonomous agents [1]. Systems like Auto-GPT [2] and frameworks like LangChain [3] empower agents to reason, plan, and execute tasks across digital environments. However, their deployment is hindered by two key challenges: brittleness and the need for secure human-in-the-loop (HITL) collaboration.

Brittleness occurs when agents fail on unforeseen exceptions such as modal pop-ups or CAPTCHAs [21]. The collaboration need arises when agents must securely handle private data. For example, an agent might autonomously navigate a website, but upon reaching the login page, it must pause and request human intervention to securely enter credentials (username and password), as the agent itself is not provisioned with this sensitive information. In both scenarios, whether an unexpected failure or a planned intervention, a seamless, low-friction handoff to a human operator is critical.

This need for supervision highlights the importance of sandboxed environments that also support fluid human intervention. Leading agent sandboxes (e.g., E2B [7], Daytona [14]) recognize this need, and they typically address it by integrating general-purpose remote interaction protocols like VNC [8] or RDP [9] as their human intervention mechanism. While this approach provides a vital function, these protocols were not specifically designed for the rapid, low-friction handoff required in hybrid AI systems. They can introduce noticeable latency and may be less resilient under variable network conditions, which can diminish the fluidity of the human-agent collaboration.

To address this specific challenge, we present a hybrid interaction sandbox infrastructure designed from the ground up to optimize this interaction (based on the production service AgentBay). AgentBay provides a single, isolated execution sandbox that can be controlled simultaneously via a programmatic API and open-source SDK for AI agents and via a high-performance graphical streaming interface for humans. The core of our system is the Adaptive Streaming Protocol (ASP). When a human takes over, it instantly switches to an ultra-low-latency, smooth, and resilient graphical stream explicitly optimized for interaction fluency, even on variable networks.

We make the following contributions:

  • Hybrid Interaction Architecture: We present the design of a hybrid interaction sandbox system that supports secure, isolated execution environments across diverse operating systems, mobile, browser, and code.
  • Adaptive Streaming Protocol (ASP): We detail our novel streaming protocol that enables human control, specifically

…(Full text truncated)…
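
The login example from the introduction (the agent navigates, pauses at the login page, a human enters credentials, the agent resumes) can be sketched as a small control-ownership state machine. This is a toy model under stated assumptions, not the AgentBay SDK: `HybridSession`, its methods, and `run_login_flow` are hypothetical names, and control is modeled as exclusive (one controller at a time) purely to keep the handoff explicit.

```python
from enum import Enum


class Controller(Enum):
    AGENT = "agent"
    HUMAN = "human"


class HybridSession:
    """Hypothetical stand-in for one sandbox session shared by agent and human."""

    def __init__(self):
        self.controller = Controller.AGENT
        self.log = []  # ordered (actor, description) records

    def agent_act(self, action):
        # Assumption: the agent is blocked while a human holds control.
        if self.controller is not Controller.AGENT:
            raise PermissionError("agent is paused while a human is in control")
        self.log.append(("agent", action))

    def hand_to_human(self, reason):
        self.controller = Controller.HUMAN
        self.log.append(("handoff", reason))

    def human_act(self, action):
        if self.controller is not Controller.HUMAN:
            raise PermissionError("human has not taken over the session")
        self.log.append(("human", action))

    def hand_to_agent(self):
        self.controller = Controller.AGENT
        self.log.append(("handoff", "returned to agent"))


def run_login_flow():
    """Replay the login scenario and return the ordered event log."""
    session = HybridSession()
    session.agent_act("open https://example.com")
    session.agent_act("navigate to login page")
    # The agent holds no credentials, so it pauses and asks for help.
    session.hand_to_human("credentials required")
    session.human_act("enter credentials (never shown to the agent)")
    session.hand_to_agent()
    session.agent_act("continue task after login")
    return session.log
```

Calling `run_login_flow()` returns the event log in order, which makes it easy to see that the credential entry is attributed to the human and that the agent only resumes after control is handed back.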

Reference

This content is AI-processed based on ArXiv data.
