Wasure: A Modular Toolkit for Comprehensive WebAssembly Benchmarking
WebAssembly (Wasm) has become a key compilation target for portable and efficient execution across diverse platforms. Benchmarking its performance, however, is a multi-dimensional challenge: it depends not only on the choice of runtime engines, but also on hardware architectures, application domains, source languages, benchmark suites, and runtime configurations. This paper introduces Wasure, a modular and extensible command-line toolkit that simplifies the execution and comparison of WebAssembly benchmarks. To complement performance evaluation, we also conducted a dynamic analysis of the benchmark suites included with Wasure. Our analysis reveals substantial differences in code coverage, control flow, and execution patterns, emphasizing the need for benchmark diversity. Wasure aims to support researchers and developers in conducting more systematic, transparent, and insightful evaluations of WebAssembly engines.
💡 Research Summary
The paper presents Wasure, a modular, extensible command‑line toolkit designed to simplify and systematize WebAssembly (Wasm) benchmarking across a wide range of dimensions: runtime engines, hardware platforms, source languages, benchmark suites, and configuration options. Implemented in Python and targeting Unix‑like systems, Wasure offers a three‑phase workflow—Preparation, Execution, and Evaluation—each encapsulated in dedicated subcommands.
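The three-phase subcommand structure described above can be sketched with Python's `argparse`. Note this is an illustrative mock-up: the subcommand and flag names (`prepare`, `run`, `eval`, `--suite`, `--repetitions`) are assumptions for exposition, not Wasure's actual CLI surface.

```python
import argparse

def build_parser():
    """Build a parser mirroring a three-phase CLI (names are illustrative)."""
    parser = argparse.ArgumentParser(prog="wasure")
    sub = parser.add_subparsers(dest="phase", required=True)

    prepare = sub.add_parser("prepare", help="install/register runtimes and suites")
    prepare.add_argument("--runtime")

    run = sub.add_parser("run", help="execute benchmarks on selected runtimes")
    run.add_argument("--suite")
    run.add_argument("--repetitions", type=int, default=1)

    evaluate = sub.add_parser("eval", help="export results and generate plots")
    evaluate.add_argument("--format", choices=["json", "csv"], default="json")
    return parser

# Example invocation of the hypothetical "run" phase:
args = build_parser().parse_args(["run", "--suite", "mibench", "--repetitions", "5"])
print(args.phase, args.suite, args.repetitions)
```

Encapsulating each phase behind its own subcommand keeps the phases independently scriptable, which matches the paper's emphasis on reproducible experiment configurations.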
During the Preparation phase, users can install or manually register Wasm runtimes (both browser‑based engines such as V8, SpiderMonkey, and JavaScriptCore, and standalone engines like wasmtime, wasmer, WasmEdge, wasm3, and Wizard) and organize benchmarks into suites (e.g., MiBench, Ostrich). The toolkit supports “sub‑runtimes,” allowing a single engine to be benchmarked under different modes (interpreter, JIT, AOT) without duplicating installation metadata.
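The sub-runtime idea can be modeled as a single engine record whose installation metadata is shared by several execution modes. The registry layout, the install path, and the per-mode flag sets below are hypothetical placeholders, not Wasure's actual schema:

```python
# Hypothetical registry: one engine entry with several execution modes
# ("sub-runtimes") sharing a single installation record.
ENGINES = {
    "wasmtime": {
        "path": "/usr/local/bin/wasmtime",      # assumed install location
        "modes": {
            "jit": [],                          # default mode: no extra flags
            "aot": ["--example-aot-flag"],      # illustrative flag set, not a real flag
        },
    },
}

def command_line(engine, mode, module):
    """Assemble the invocation for one engine/mode pair without
    duplicating the engine's installation metadata."""
    entry = ENGINES[engine]
    return [entry["path"], *entry["modes"][mode], module]

print(command_line("wasmtime", "aot", "bench.wasm"))
```

Because the path lives in one place, adding a new mode means adding one flag list rather than re-registering the engine.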
The Execution phase runs selected benchmarks on one or more runtimes, automatically measuring wall‑clock time, resident set size (RSS), and virtual memory size (VMS). It supports configurable repetitions, time‑outs, and per‑benchmark score extraction via regular expressions. A special “check” command executes two curated benchmark collections—wasm‑features and wasi‑proposals—to automatically detect which language or WASI proposals each engine supports, thereby surfacing compatibility gaps in a reproducible manner.
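A minimal sketch of the measurement loop, assuming a Unix-like host (as Wasure targets): wall-clock time via `time.perf_counter`, peak child RSS via `resource.getrusage`, and score extraction with a user-supplied regular expression. This is not Wasure's implementation, just one plausible way to collect the same metrics:

```python
import re
import resource
import subprocess
import time

def run_benchmark(cmd, timeout, score_pattern=None):
    """Run one benchmark command; return wall-clock seconds, peak child RSS,
    and an optional score parsed from stdout via a regular expression."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    wall = time.perf_counter() - start
    # ru_maxrss is the peak RSS of terminated children (KiB on Linux, bytes on macOS).
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    score = None
    if score_pattern:
        match = re.search(score_pattern, proc.stdout)
        if match:
            score = float(match.group(1))
    return {"wall_s": wall, "max_rss": peak_rss, "score": score}

# Stand-in benchmark: echo a score line and parse it back out.
result = run_benchmark(["echo", "score: 42.5"], timeout=10,
                       score_pattern=r"score:\s*([\d.]+)")
print(result)
```

Repetitions and timeout handling would wrap this function in a loop with `subprocess.TimeoutExpired` caught per run.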
After runs complete, the Evaluation phase stores results in structured JSON files, offers CSV export, and generates comparative plots. The plotting module automatically chooses between normalized visualizations (when multiple runtimes have run the same benchmark) and absolute visualizations (when a benchmark is executed by a single runtime), ensuring fair comparison without manual post‑processing.
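The normalized-versus-absolute plotting rule reduces to a small decision per benchmark. The sketch below assumes a `{benchmark: {runtime: seconds}}` result shape (an assumption about the stored JSON, not its documented format) and normalizes to the fastest runtime when more than one ran the benchmark:

```python
def normalize(results):
    """Given {benchmark: {runtime: seconds}}, report values relative to the
    fastest runtime when several runtimes ran a benchmark, and absolute
    values when only one did, mirroring the described plotting rule."""
    out = {}
    for bench, per_runtime in results.items():
        if len(per_runtime) > 1:
            best = min(per_runtime.values())
            out[bench] = {rt: t / best for rt, t in per_runtime.items()}
        else:
            out[bench] = dict(per_runtime)  # single runtime: keep absolute numbers
    return out

data = {
    "fft": {"wasmtime": 2.0, "wasmer": 3.0},  # two runtimes -> normalized
    "sha": {"wasm3": 5.5},                    # one runtime  -> absolute
}
print(normalize(data))
```

Automating this choice is what removes the manual post-processing step the summary mentions: mixed result sets stay comparable without the user deciding per plot.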
Beyond traditional performance metrics, the authors augment Wasure with a dynamic analysis component using the Wizard engine’s non‑intrusive instrumentation. By instrumenting every benchmark, they collect fine‑grained data such as function‑call frequencies, basic‑block coverage, and operation‑type distributions. The analysis reveals substantial variability: code‑coverage differences of up to 30% across engines, markedly higher branch density in interpreter‑based runtimes, and distinct optimization patterns among compilation back‑ends (single‑pass, Cranelift, LLVM). These findings underscore that raw execution time alone cannot capture the nuanced behavior of Wasm programs and that benchmark diversity is essential for a holistic evaluation.
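The metrics named above are straightforward aggregations once an instruction trace is available. As an illustration only (the trace format is an assumption, not Wizard's actual instrumentation output), the sketch below derives block coverage, branch density, and an operation-type distribution from a list of executed opcodes tagged with basic-block ids:

```python
from collections import Counter

def summarize(trace, total_blocks):
    """Compute illustrative dynamic metrics from a list of
    (opcode, basic_block_id) pairs: basic-block coverage, branch density,
    and an operation-type distribution."""
    ops = Counter(op for op, _ in trace)
    executed_blocks = {blk for _, blk in trace}
    branches = sum(n for op, n in ops.items() if op.startswith("br"))
    total = sum(ops.values())
    return {
        "coverage": len(executed_blocks) / total_blocks,
        "branch_density": branches / total,
        "op_distribution": dict(ops),
    }

# Tiny synthetic trace: 4 instructions across 2 of 4 known basic blocks.
trace = [("i32.add", 0), ("br_if", 0), ("i32.mul", 1), ("br", 1)]
print(summarize(trace, total_blocks=4))
```

Comparing such summaries across engines is what surfaces the variability the authors report, since two runs with identical wall-clock times can differ sharply in coverage and branch behavior.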
Wasure’s design emphasizes extensibility: new runtimes are added via simple JSON descriptors specifying executable paths, flags, and output parsers; new benchmarks are introduced by defining input files, argument handling, correctness checks, and score extraction rules. This low‑overhead approach enables researchers to keep pace with the rapidly evolving Wasm ecosystem, incorporate emerging standards, and share reproducible experiment configurations.
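A runtime descriptor in the spirit of those JSON files might look as follows. The field names (`name`, `path`, `flags`, `score_regex`) are illustrative guesses at what such a descriptor could contain, not Wasure's actual schema:

```python
import json

# Hypothetical descriptor: executable path, extra flags, and a regex
# used to parse the benchmark's score from its output.
descriptor_text = """
{
  "name": "wasm3",
  "path": "/usr/local/bin/wasm3",
  "flags": ["--stack-size", "65536"],
  "score_regex": "Result:\\\\s*([0-9.]+)"
}
"""

descriptor = json.loads(descriptor_text)
cmd = [descriptor["path"], *descriptor["flags"], "bench.wasm"]
print(cmd)
```

Keeping engine-specific details in declarative files like this is what makes the registration step low-overhead: supporting a new engine or flag combination requires editing data, not toolkit code.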
In summary, Wasure provides a comprehensive, reproducible, and transparent framework for Wasm performance evaluation, coupling systematic benchmark execution with deep dynamic analysis. It equips developers and researchers with the tools needed to make informed decisions about engine selection, optimization strategies, and future standard development, thereby advancing the state of empirical research in the WebAssembly domain.