Insight Agents: An LLM-Based Multi-Agent System for Data Insights

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Today, E-commerce sellers face several key challenges, including difficulty discovering and effectively using available programs and tools, and difficulty understanding and acting on the rich data those tools produce. We therefore developed Insight Agents (IA), a conversational multi-agent data-insight system that provides E-commerce sellers with personalized data and business insights through automated information retrieval. Our hypothesis is that IA will serve as a force multiplier for sellers, driving incremental seller adoption by reducing the effort required and increasing the speed at which sellers make good business decisions. In this paper, we introduce this novel LLM-backed end-to-end agentic system, built on a plan-and-execute paradigm and designed for comprehensive coverage, high accuracy, and low latency. It features a hierarchical multi-agent structure consisting of a manager agent and two worker agents, data presentation and insight generation, for efficient information retrieval and problem-solving. We design a simple yet effective ML solution for the manager agent that combines Out-of-Domain (OOD) detection using a lightweight encoder-decoder model with agent routing through a BERT-based classifier, optimizing both accuracy and latency. Within the two worker agents, strategic planning over an API-based data model breaks queries down into granular components to generate more accurate responses, and domain knowledge is dynamically injected to enhance the insight generator. IA has been launched for Amazon sellers in the US and has achieved 90% accuracy in human evaluation, with P90 latency below 15 s.


💡 Research Summary

Insight Agents (IA) is a conversational multi‑agent platform designed to help Amazon sellers extract actionable business insights from their own data with minimal effort. The system follows a plan‑and‑execute paradigm built around a hierarchical manager‑worker architecture. The manager agent first performs out‑of‑domain (OOD) detection using a lightweight auto‑encoder that is trained on in‑domain question embeddings; an OOD threshold based on mean + λ·standard‑deviation of reconstruction error yields a precision of 96.9 % while incurring only 0.009 s latency per query. After passing the OOD gate, a fine‑tuned BERT‑small classifier routes the query to one of two specialized worker agents: the Data Presenter or the Insight Generator. This router achieves 83 % classification accuracy with an average latency of 0.31 s, substantially outperforming a comparable LLM‑based few‑shot classifier (60 % accuracy, >2 s latency).
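The mean + λ·σ gate on reconstruction error can be sketched as follows. This is a minimal illustration, not the paper's implementation: the error values and the `lam` setting are made-up stand-ins, since the trained auto-encoder and tuned λ are not reproduced here.

```python
import numpy as np

def fit_ood_threshold(in_domain_errors: np.ndarray, lam: float = 2.0) -> float:
    """Threshold = mean + lam * std of reconstruction errors on in-domain questions."""
    return float(in_domain_errors.mean() + lam * in_domain_errors.std())

def is_out_of_domain(reconstruction_error: float, threshold: float) -> bool:
    """A query whose embedding the auto-encoder reconstructs poorly is flagged OOD."""
    return reconstruction_error > threshold

# Toy reconstruction errors for in-domain training questions (illustrative values).
errors = np.array([0.10, 0.12, 0.09, 0.11, 0.10, 0.13])
tau = fit_ood_threshold(errors, lam=2.0)

print(is_out_of_domain(0.45, tau))  # large error -> True (rejected as OOD)
print(is_out_of_domain(0.11, tau))  # typical error -> False (passed to the router)
```

Queries that pass the gate would then go to the BERT-small router; only the thresholding logic is shown here.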

Both workers operate on a Retrieval‑Augmented Generation (RAG) backbone that accesses internal tabular data through a robust API‑driven data model. The Data Presenter decomposes the user request into a sequence of API calls, generates payloads via slot‑filling, executes the calls, and aggregates the results. By relying on structured API calls rather than text‑to‑SQL translation, the system reduces syntax errors and hallucinations. The Insight Generator adds a domain‑aware routing layer that classifies the request into categories such as performance analysis, benchmarking, or recommendation. For each category, a set of domain‑specific prompts, few‑shot examples, and knowledge snippets are injected dynamically, enabling the LLM (Claude‑3 Sonnet) to produce concise, data‑grounded insights.
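The Data Presenter's decompose / slot-fill / execute / aggregate loop can be sketched as below. The endpoint names (`get_sales`, `get_traffic`), slot schema, and return payloads are hypothetical, since the system's internal API surface is not public; the sketch only shows why structured calls avoid the syntax errors of text-to-SQL.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ApiCall:
    endpoint: str
    payload: dict = field(default_factory=dict)

# Hypothetical registry standing in for the internal API-driven data model.
REGISTRY: dict[str, Callable[[dict], dict]] = {
    "get_sales":   lambda p: {"units": 120,      "asin": p["asin"], "window": p["window"]},
    "get_traffic": lambda p: {"page_views": 900, "asin": p["asin"], "window": p["window"]},
}

def plan(query_slots: dict) -> list[ApiCall]:
    """Decompose the request into granular API calls; slot-filling builds each payload."""
    shared = {"asin": query_slots["asin"], "window": query_slots["window"]}
    return [ApiCall(endpoint, dict(shared)) for endpoint in query_slots["metrics"]]

def execute(calls: list[ApiCall]) -> dict:
    """Run each call against the registry and aggregate results by endpoint."""
    return {c.endpoint: REGISTRY[c.endpoint](c.payload) for c in calls}

slots = {"asin": "B000TEST01", "window": "last_30_days",
         "metrics": ["get_sales", "get_traffic"]}
print(execute(plan(slots)))
```

In the real system an LLM performs the decomposition and slot-filling; here both are hard-coded to keep the control flow visible.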

Parallel execution is employed: after routing, both worker agents are launched concurrently, and whichever finishes first can terminate the other, trading additional compute for lower end‑to‑end latency. The overall architecture also includes a guardrail post‑processor that filters out PII or policy‑violating content before returning the response to the seller.
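The racing pattern can be approximated with `concurrent.futures`. The worker bodies and sleep times below are placeholders, and a Python thread that has already started cannot be preempted, so `cancel_futures` is only a best-effort stand-in for the termination described above.

```python
import concurrent.futures as cf
import time

def data_presenter(query: str) -> str:
    time.sleep(0.05)  # simulated API-call latency
    return f"table for: {query}"

def insight_generator(query: str) -> str:
    time.sleep(0.2)   # simulated LLM-generation latency
    return f"insight for: {query}"

def race(query: str) -> str:
    """Launch both workers concurrently; return the first result, cancel the rest."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(data_presenter, query),
               pool.submit(insight_generator, query)]
    done, _pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    # Best-effort cleanup: drop queued work without waiting for the slower worker.
    pool.shutdown(wait=False, cancel_futures=True)
    return next(iter(done)).result()

print(race("weekly sales trend"))  # -> "table for: weekly sales trend"
```

This trades extra compute (both workers start) for lower end-to-end latency, matching the trade-off described in the paper.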

The authors collected 301 annotated questions (178 in‑domain, 123 out‑of‑domain) to train the OOD detector and router, augmenting the in‑domain set to 300 examples per class using LLM‑generated variations. A benchmark of 100 real seller queries was evaluated by human auditors using relevance, correctness, and completeness metrics, as well as a question‑level accuracy threshold (≥0.8 on all three). IA achieved an average relevance of 0.977, correctness of 0.958, completeness of 0.993, and a question‑level accuracy of 89.5 %. The 90th‑percentile latency (P90) was 13.56 seconds, comfortably below the target of 15 seconds.
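The question-level accuracy metric (a question counts as correct only when relevance, correctness, and completeness are all ≥ 0.8) can be computed as below. The audit scores are invented examples for illustration, not the paper's evaluation data.

```python
def question_accuracy(scores: list[dict], threshold: float = 0.8) -> float:
    """Fraction of questions where every metric meets the threshold."""
    passed = sum(all(v >= threshold for v in q.values()) for q in scores)
    return passed / len(scores)

# Made-up per-question auditor scores.
audits = [
    {"relevance": 1.0, "correctness": 0.9, "completeness": 1.0},  # pass
    {"relevance": 0.9, "correctness": 0.7, "completeness": 1.0},  # fail (correctness)
    {"relevance": 1.0, "correctness": 1.0, "completeness": 0.9},  # pass
    {"relevance": 0.6, "correctness": 0.9, "completeness": 0.8},  # fail (relevance)
]
print(question_accuracy(audits))  # -> 0.5
```

Averaging each metric independently over the benchmark gives the per-metric scores (0.977 / 0.958 / 0.993), while the all-three gate gives the stricter 89.5% question-level figure.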

In conclusion, Insight Agents demonstrates that a carefully engineered combination of lightweight ML components for gating and routing, together with LLM‑driven planning and generation, can deliver high‑quality, low‑latency data insights at scale. Limitations include the relatively lower recall of the OOD detector (potentially allowing some out‑of‑scope queries to pass) and a strong dependency on internal API stability, which may require re‑engineering when APIs evolve. Future work will explore improving OOD recall, extending routing to more fine‑grained analysis types, and integrating external data sources to broaden the scope of insights offered to sellers.

