Automating the end-to-end data science pipeline with AI agents still stalls on two gaps: generating insightful, diverse visual evidence and assembling it into a coherent, professional report. We present A2P-Vis, a two-part, multi-agent pipeline that turns raw datasets into a high-quality data-visualization report. The Data Analyzer orchestrates profiling, proposes diverse visualization directions, generates and executes plotting code, filters low-quality figures with a legibility checker, and elicits candidate insights that are automatically scored for depth, correctness, specificity, and actionability. The Presenter then orders topics, composes chart-grounded narratives from the top-ranked insights, writes justified transitions, and revises the document for clarity and consistency, yielding a coherent, publication-ready report. Together, these agents convert raw data into curated materials (charts + vetted insights) and into a readable narrative without manual glue work. We claim that by coupling a quality-assured Analyzer with a narrative Presenter, A2P-Vis operationalizes co-analysis end-to-end, improving the real-world usefulness of automated data analysis for practitioners. For the complete dataset report, please see: https://www.visagent.org/api/output/f2a3486d-2c3b-4825-98d4-5af25a819f56.
LLM-based AI agents have recently been applied to automate data-centric workflows, from early-stage pipeline construction, such as Google's Data Science Agent [1] and related LLM-based systems [2,3,9], to domain-specific applications in genomics [4], machine learning benchmarking [5], and healthcare [6]. Beyond these pipelines, recent systems such as [7,10,11] highlight the emerging role of LLMs in structuring, verifying, and contextualizing insights for visual analytics. Yet two persistent gaps remain: (i) producing diverse, evidence-rich visualizations with non-trivial insights, and (ii) assembling these materials into a coherent, professional report. We introduce A2P-Vis, a two-part, multi-agent workflow designed to close both gaps, as illustrated in Figure 1. Our design is motivated by how data scientists carefully inspect initial seed data and arrive at final insights through trial and error, picking the best among generated candidates. Our contributions are as follows:
The Data Analyzer explores possible visualization directions, generates candidate plots, filters out low-quality charts, and elicits candidate insights, scoring them to keep the best.
The Presenter sequences topics, writes chart-grounded narratives with reasoned transitions, draws conclusions, and polishes the prose into a coherent visual story of the findings.
We now describe each part of the framework in turn.
This module ingests the dataset and outputs a metadata report, a diverse set of visualization topics, executable plots, and structured insights with scores. As shown in Figure 2, the process is demonstrated on a real dataset with grounded outputs at each stage.

Sniffer. The Sniffer performs lightweight dataset profiling to establish a reliable contract for downstream agents. It inspects the raw table to extract its shape, column names, and inferred types. From these basic features and randomly sampled rows, it produces a concise metadata report that introduces the dataset, enumerates its attributes, and outlines plausible analysis themes. This report guides the Visualizer to find visualization directions and to use valid columns and sensible encodings for code generation, reducing hallucinations and preventing routine failures (e.g., empty plots, degenerate scales). It also addresses context and cost: rather than streaming the full dataset into the model, downstream components consume the compact profile, which is sufficient for planning visualizations and generating code without exposing raw records. In practice, the Sniffer's output serves as a schema contract and quality gate, improving the robustness of code generation and keeping subsequent analyses aligned to a consistent view of the data.

Visualizer. Given the metadata report, the Visualizer turns profile-level signals into reliable figures through a tightly coupled, four-step flow. First, the direction generator proposes concrete analysis targets and topics, emitting machine-readable guidance for each direction (topic, chart type, variables). Next, the code generator compiles that guidance into a directly executable script. The executor then runs the script and logs the outcome; if errors are raised, the rectifier repairs the code strictly according to the error trace, and the script is executed again.
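To make this concrete, the sketch below shows one way the profiling step and the execute-and-rectify loop could look. It assumes a pandas DataFrame input and a hypothetical `llm_rectify` callable that repairs code from an error trace; it is an illustration under those assumptions, not the paper's implementation.

```python
import traceback
import pandas as pd

def sniff(df: pd.DataFrame, n_samples: int = 5) -> dict:
    """Lightweight profile: shape, columns with inferred types, and a few
    randomly sampled rows -- the compact 'schema contract' that downstream
    agents consume instead of the raw table."""
    return {
        "shape": df.shape,
        "columns": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "sample_rows": df.sample(min(n_samples, len(df))).to_dict(orient="records"),
    }

def execute_with_rectifier(code: str, llm_rectify, max_attempts: int = 3) -> bool:
    """Run generated plotting code; on failure, hand the error trace to the
    rectifier for a repair and retry (sandboxing omitted for brevity)."""
    for _ in range(max_attempts):
        try:
            exec(code, {})       # execute the generated plotting script
            return True          # figure produced without errors
        except Exception:
            trace = traceback.format_exc()
            code = llm_rectify(code=code, error_trace=trace)  # hypothetical repair call
    return False
```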
This module takes the outputs of the Data Analyzer (the metadata report, visualization topics, and quality-gated charts with scored insights) and assembles them into a publication-ready data-visualization report. Following an "overview → sections → summary" sequence, it orders topics, drafts the introduction, writes chart-grounded sections with brief transitions, summarizes key takeaways, and finalizes the document with a revision pass. Figure 3 illustrates the workflow and the report layout.

Ranker. Given an unordered set of topics from the Data Analyzer, the Ranker determines a coherent sequence for narration by analyzing relationships among topics, such as shared variables, temporal order, or thematic similarity. It outputs an ordered list of topics that provides a logical flow for the report, serving as the backbone for the Introductor and subsequent modules; one plausible ordering heuristic is sketched below.
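The following sketch orders topics greedily by shared variables only; the temporal and thematic signals mentioned above are omitted, and the topic schema is hypothetical.

```python
def rank_topics(topics: list[dict]) -> list[dict]:
    """Greedy ordering by shared variables: seed with the most-connected
    topic, then repeatedly append the remaining topic that overlaps most
    with the current one. Topic schema (hypothetical):
    {"title": str, "variables": set[str]}."""
    if not topics:
        return []

    def overlap(a: dict, b: dict) -> int:
        return len(set(a["variables"]) & set(b["variables"]))

    remaining = list(topics)
    # Seed with the topic most connected to all others.
    current = max(remaining,
                  key=lambda t: sum(overlap(t, o) for o in remaining if o is not t))
    ordered = [current]
    remaining.remove(current)
    while remaining:
        current = max(remaining, key=lambda t: overlap(current, t))
        ordered.append(current)
        remaining.remove(current)
    return ordered
```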
Introductor. Given the ranked topic sequence and the metadata report, the Introductor drafts the opening of the report and sets the roadmap. It first presents the dataset's basic properties, such as size and attributes, and then previews the ordered topics the report will cover, for example: "The dataset at hand is a rich bibliographic collection of academic papers….We will explore…"
The summary then distills key takeaways across the sections, for example: "The dataset provides a comprehensive view of academic publishing trends in visualization and computer graphics, revealing several key patterns across topics…"

Assembler. The Assembler compiles the introduction, section content, and summary into a complete Markdown report. It applies a consistent heading hierarchy and embeds figures at their designated locations with captions. It also generates elements such as the date, headers, footers, and page numbers to standardize the report format; a minimal sketch of the assembly step follows.
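The sketch below compiles the pieces into one Markdown string; the section schema is hypothetical, and headers, footers, and page numbers (which a downstream renderer would handle) are reduced to a dated title.

```python
from datetime import date

def assemble_report(intro: str, sections: list[dict], summary: str) -> str:
    """Compile the introduction, chart-grounded sections, and summary into
    one Markdown document with a consistent heading hierarchy and embedded,
    captioned figures. Section schema (hypothetical):
    {"title": str, "figure_path": str, "caption": str, "narrative": str}."""
    parts = [
        "# Data Visualization Report",
        f"*Generated on {date.today():%B %d, %Y}*",
        "## Introduction",
        intro,
    ]
    for i, sec in enumerate(sections, start=1):
        parts.append(f"## {i}. {sec['title']}")
        parts.append(f"![Figure {i}]({sec['figure_path']})")
        parts.append(f"*Figure {i}: {sec['caption']}*")
        parts.append(sec["narrative"])
    parts += ["## Summary", summary]
    return "\n\n".join(parts)
```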
Revisor. Given the assembled Markdown report, the Revisor performs a final revision pass for clarity and consistency, yielding the publication-ready report; a minimal sketch follows.
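One way to realize the revision pass is a single LLM call over the assembled document; the `llm` callable and the prompt wording below are hypothetical.

```python
REVISION_PROMPT = """You are revising a data-visualization report.
Improve clarity and consistency: fix grammar, unify terminology and tone,
and keep all headings, figure references, and facts unchanged.
Return the full revised Markdown.

--- REPORT ---
{report}"""

def revise(report_md: str, llm) -> str:
    """Final revision pass; `llm` is a hypothetical text-generation callable."""
    return llm(REVISION_PROMPT.format(report=report_md))
```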