EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

📝 Original Info

  • Title: EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
  • ArXiv ID: 2512.08868
  • Date: 2025-12-09
  • Authors: Rui Min, Zile Qiao, Ze Xu, Jiawen Zhai, Wenyu Gao, Xuanzhong Chen, Haozhen Sun, Zhen Zhang, Xinyu Wang, Hong Zhou, Wenbiao Yin, Bo Zhang, Xuan Zhou, Ming Yan, Yong Jiang, Haicheng Liu, Liang Ding, Ling Zou, Yi R. Fung, Yalong Li, Pengjun Xie

📝 Abstract

Foundation agents have rapidly advanced in their ability to reason and interact with real environments, making the evaluation of their core capabilities increasingly important. While many benchmarks have been developed to assess agent performance, most concentrate on academic settings or artificially designed scenarios while overlooking the challenges that arise in real applications. To address this issue, we focus on a highly practical real-world setting, the e-commerce domain, which involves a large volume of diverse user interactions, dynamic market conditions, and tasks directly tied to real decision-making processes. To this end, we introduce EcomBench, a holistic E-commerce Benchmark designed to evaluate agent performance in realistic...

📄 Full Content

Large Language Models (LLMs) are rapidly advancing from passive knowledge retrievers to autonomous agents (Team et al., 2025; Qiu et al., 2025; Kimi, 2025; Zeng et al., 2025a; Li et al., 2025c) capable of reasoning, planning, and interacting with real-world environments. At its core, assessing the foundational capabilities of these advanced LLM agents has become essential, where a wide range of benchmarks have been proposed (Mial

…(Content truncated for length.)

Reference

This content is AI-processed based on open access ArXiv data.
