AnoMod: A Dataset for Anomaly Detection and Root Cause Analysis in Microservice Systems
Microservice systems (MSS) have become a predominant architectural style for cloud services, yet the community still lacks high-quality, publicly available datasets for anomaly detection (AD) and root cause analysis (RCA) in MSS. Most benchmarks emphasize performance-related faults and provide only one or two monitoring modalities, limiting research on broader failure modes and on cross-modal methods. To address these gaps, we introduce a new multimodal anomaly dataset built on two open-source microservice systems, SocialNetwork and TrainTicket. We design and inject four categories of anomalies (Ano), at the performance, service, database, and code levels, to emulate realistic failure modes. For each scenario we collect five modalities (Mod): logs, metrics, distributed traces, API responses, and code coverage reports, offering a richer, end-to-end view of system state and inter-service interactions. Reflecting these two dimensions, we name the dataset AnoMod. It enables (1) evaluation of cross-modal anomaly detection and fusion/ablation strategies, and (2) fine-grained RCA across service and code regions, supporting end-to-end troubleshooting pipelines that jointly consider detection and localization.
💡 Research Summary
The paper addresses a critical gap in AIOps research for microservice systems (MSS): the lack of high‑quality, publicly available datasets that capture a wide variety of anomaly types and multiple monitoring modalities. Existing benchmarks focus mainly on performance‑related faults and provide only one or two data streams (typically logs, metrics, or traces), which limits the development of cross‑modal anomaly detection and root‑cause analysis (RCA) techniques.
To fill this void, the authors introduce AnoMod, a multimodal anomaly dataset built on two open‑source microservice applications—SocialNetwork (21 services) and TrainTicket (41 services). They design a taxonomy of anomalies covering four levels: Performance, Service, Database, and Code. Within this taxonomy they implement 24 concrete fault scenarios (e.g., CPU contention, network packet loss, service instance crashes, service‑communication failures, database connection‑pool exhaustion, method‑return‑value manipulation, injected Java exceptions). The selection of injection targets is guided by service‑dependency analysis, ensuring that the most impact‑prone services are exercised, unlike prior work that often injects faults randomly.
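The dependency-guided target selection can be illustrated with a small sketch: rank services by fan-in (how many distinct upstream services call them), since high fan-in services are the most impact-prone injection targets. The function and service names below are illustrative, not the authors' actual analysis code.

```python
from collections import defaultdict

def rank_injection_targets(call_edges):
    """Rank services by fan-in (number of distinct upstream callers).

    call_edges: iterable of (caller, callee) pairs, e.g. extracted from
    distributed traces. Services many others depend on come first.
    """
    fan_in = defaultdict(set)
    for caller, callee in call_edges:
        if caller != callee:  # ignore self-calls
            fan_in[callee].add(caller)
    return sorted(fan_in, key=lambda s: len(fan_in[s]), reverse=True)

# Toy call graph: three SocialNetwork-style front-end services all
# depend on a hypothetical `user-service`.
edges = [
    ("home-timeline", "user-service"),
    ("compose-post", "user-service"),
    ("user-timeline", "user-service"),
    ("compose-post", "media-service"),
]
print(rank_injection_targets(edges)[0])  # prints "user-service"
```

A real pipeline would derive `call_edges` from trace span parent/child relations; the ranking criterion (fan-in) is one plausible proxy for impact, not necessarily the exact metric used in the paper.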
AnoMod captures five synchronized modalities during each experiment:
- Logs – timestamped container‑level records of execution flow and errors.
- Metrics – Prometheus time‑series covering node, application, and process indicators.
- Distributed Traces – Jaeger and SkyWalking traces that expose request propagation, latency, and call‑graph structure.
- API Responses – client‑side observations (HTTP status, latency, headers, body) generated by EvoMaster’s automated black‑box test suites, providing a direct view of user‑visible symptoms.
- Code Coverage Reports (CCR) – line/branch/function coverage collected via Gcov (C++) or JaCoCo (Java), linking runtime telemetry to the exact source code executed.
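Because all five modalities are synchronized against the same injection window, records can be labeled anomalous purely by timestamp. A minimal sketch, assuming a simple `(timestamp, modality, payload)` record shape that is illustrative rather than the dataset's actual schema:

```python
from datetime import datetime, timedelta

def label_records(records, inject_start, inject_end):
    """Tag each record as anomalous if its timestamp falls inside the
    fault-injection window [inject_start, inject_end]."""
    return [
        (ts, modality, payload, inject_start <= ts <= inject_end)
        for ts, modality, payload in records
    ]

t0 = datetime(2024, 1, 1, 12, 0, 0)
window = (t0 + timedelta(seconds=30), t0 + timedelta(seconds=90))
records = [
    (t0 + timedelta(seconds=10), "metric", {"cpu": 0.2}),
    (t0 + timedelta(seconds=45), "log", "ERROR timeout calling downstream"),
    (t0 + timedelta(seconds=60), "trace", {"span": "compose-post", "ms": 950}),
]
labeled = label_records(records, *window)
print([flag for *_, flag in labeled])  # prints [False, True, True]
```

This timestamp-based alignment is what makes cross-modal fusion and ablation studies straightforward: any subset of modalities can be joined on the shared time axis.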
The data‑collection pipeline consists of three tightly controlled phases:
- Phase 1 – Workload Generation: EvoMaster automatically derives API specifications (Swagger/OpenAPI) and produces test suites that maximize endpoint coverage. Only successful test executions are used for data capture.
- Phase 2 – Anomaly Injection: Pre‑defined anomaly configurations are stored in an "Anomaly Library." Performance, service, and database faults are injected with ChaosMesh (Kubernetes CRDs controlling pod failures, network latency, and packet loss via tc). Code‑level faults are injected with ChaosBlade's JVM sandbox, allowing dynamic bytecode instrumentation without rebuilding images. The injection starts precisely before the workload and stops immediately after, guaranteeing clean temporal boundaries.
- Phase 3 – Data Capture & Cleanup: A master script redeploys a fresh instance of the target MSS for each run, inserts additional informational logs for request‑flow tracing, executes the workload while simultaneously recording all five modalities, then terminates the anomaly, stores the data in a case‑named folder, and finally resets the environment.
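The three phases can be summarized as a single per-case orchestration loop. This is a sketch of the control flow only; every callable below is a hypothetical stand-in for the real tooling (kubectl redeploys, ChaosMesh/ChaosBlade injection, EvoMaster workload, telemetry capture), not the authors' master script.

```python
def run_case(case, deploy, inject, workload, collect, recover, reset):
    """Execute one experiment run in the order the pipeline describes.

    Injection brackets the workload so anomalous data has clean temporal
    boundaries, and recovery/reset run even if the workload fails.
    """
    deploy()                      # fresh instance of the target MSS
    inject()                      # start the fault just before the workload
    try:
        workload()                # drive traffic while all modalities record
        collect(f"data/{case}")   # store the five modalities per case
    finally:
        recover()                 # terminate the anomaly
        reset()                   # clean environment for the next run

# Verify phase ordering with stub callables that log their names.
calls = []
def mk(name):
    return lambda *args: calls.append(name)

run_case("cpu_stress", mk("deploy"), mk("inject"), mk("workload"),
         mk("collect"), mk("recover"), mk("reset"))
print(calls)  # prints ['deploy', 'inject', 'workload', 'collect', 'recover', 'reset']
```

The `try/finally` guard reflects the pipeline's guarantee that each run ends with the fault removed and the environment reset, so consecutive cases never contaminate one another.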
Table 3 in the paper quantifies the dataset: millions of log lines, hundreds of thousands of traces, thousands of distinct Prometheus metrics, hundreds of thousands of API requests, and an average code‑coverage of ~78 % across all runs. The dataset thus provides a richly correlated, end‑to‑end view of system state, from low‑level resource metrics to high‑level user outcomes.
Compared with prior MSS datasets (e.g., Nezha, DeepTraLog, Eadro, AIOps’21, RCAEval, Multi‑source, LO2), AnoMod offers a substantially larger set of anomaly types (24 vs. ≤7), more modalities (5 vs. ≤3), and two distinct applications, making it the most comprehensive publicly available benchmark for MSS AIOps to date.
The authors acknowledge limitations: only two open‑source systems are covered, and all faults are injected via chaos‑engineering tools, which may not capture the full complexity of real production incidents. Moreover, the workload is generated solely by EvoMaster’s black‑box testing. Future work will explore additional workload generators, white‑box testing, and broader system coverage.
In conclusion, AnoMod supplies the research community with a high‑fidelity, multimodal benchmark that enables (1) evaluation of cross‑modal anomaly detection and fusion/ablation strategies, and (2) fine‑grained RCA that maps symptoms back to specific code regions and forward to user‑visible impact. By releasing the dataset on Zenodo and providing the collection scripts on GitHub, the authors aim to accelerate the development of more resilient microservice architectures and advanced AIOps solutions.