Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers


Low-power microcontroller (MCU) hardware is evolving from single-core to predominantly multi-core architectures. In parallel, new embedded software building blocks are increasingly written in Rust, as the dominance of C/C++ fades in this domain. Meanwhile, small artificial neural networks (ANNs) of various kinds are increasingly used in edge-AI scenarios and thus deployed and executed directly on low-power MCUs. In this context, both incremental improvements and novel services will have to be continuously retrofitted, via embedded software executing ANNs, onto sensing/actuating systems already deployed in the field. So far, however, no embedded Rust software platform has automated the parallelization of inference computation for arbitrary TinyML models on multi-core MCUs. This paper fills that gap by introducing Ariel-ML, a novel toolkit we designed, combining a generic TinyML pipeline with an embedded Rust software platform that takes full advantage of the multi-core capabilities of various 32-bit microcontroller families (Arm Cortex-M, RISC-V, ESP32). We published the full open-source implementation, which we used to benchmark Ariel-ML on a zoo of various TinyML models. We show that Ariel-ML outperforms prior art in inference latency, as expected, and that it achieves memory footprints comparable to pre-existing toolkits based on embedded C/C++. Ariel-ML thus provides a useful basis for TinyML practitioners and resource-constrained embedded Rust developers.


💡 Research Summary

This paper introduces Ariel-ML, a novel toolkit designed to automate the parallelization of neural network inference on heterogeneous multi-core microcontrollers (MCUs) using the Rust programming language. The work addresses a critical gap in the evolving landscape of edge AI and TinyML, where low-power MCU hardware is shifting from single-core to multi-core architectures, and where Rust is increasingly replacing C/C++ for new embedded software due to its memory safety guarantees.

Ariel-ML consists of a host-side build system and a device-side model runtime. The build system integrates the IREE (Intermediate Representation Execution Environment) compiler to translate models from mainstream frameworks such as TensorFlow and PyTorch into optimized IREE modules tailored to a specific target MCU (Arm Cortex-M, RISC-V, and ESP32 families are supported). Each module contains the model weights, operator implementations compiled to machine code, and VM bytecode describing the dataflow. The module, along with metadata (e.g., the number of cores), is then co-compiled with the application code, the IREE runtime, and the Rust-based Ariel OS to produce a deployable firmware image.
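To make the host-side step concrete, the sketch below builds an `iree-compile` invocation that turns an exported model into a target-specific IREE module. This is an illustrative sketch, not Ariel-ML's actual build code: the function name `iree_compile_cmd`, the file names, and the target triple are assumptions, while the two flags shown follow IREE's public CLI.

```rust
use std::process::Command;

/// Build (but do not run) an `iree-compile` command that lowers an exported
/// model to a module for a given CPU target. The LLVM backend is selected
/// via `llvm-cpu`; the target triple picks the MCU architecture.
fn iree_compile_cmd(model: &str, out: &str, triple: &str) -> Command {
    let mut cmd = Command::new("iree-compile");
    cmd.arg(model)
        .arg("--iree-hal-target-backends=llvm-cpu")
        .arg(format!("--iree-llvmcpu-target-triple={}", triple))
        .arg("-o")
        .arg(out);
    cmd
}

fn main() {
    // Hypothetical file names and triple, for illustration only.
    let cmd = iree_compile_cmd("lenet5.mlir", "lenet5.vmfb", "thumbv7em-none-eabi");
    println!("{:?}", cmd);
}
```

In the real pipeline, a build script would then embed the resulting module, together with target metadata such as the core count, into the firmware image alongside the application and Ariel OS.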

On the device, the Ariel-ML runtime core provides Rust bindings to the C-based IREE runtime, a profiler, and a key innovation: a greedy multicore scheduler. During inference, the IREE VM interprets bytecode to generate a sequence of dispatch commands for model operators. Each operator’s computation is divided into contention-free work items. The greedy scheduler dynamically pops these work items from a queue and dispatches them to available cores, minimizing idle time and enabling parallel execution both within (intra-operator) and across (inter-operator) model layers.
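The greedy scheduling idea above can be sketched in a few lines of Rust. This is a simplified stand-in, not Ariel-ML's runtime: OS threads play the role of MCU cores, the `WorkItem` and `run_greedy` names are invented for illustration, and each work item is just a slice of numbers to sum in place of a real operator tile. The point it demonstrates is the greedy policy: each core pops the next work item from a shared queue the moment it goes idle, rather than waiting on a static partition.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// A contention-free slice of one operator's computation (illustrative).
struct WorkItem {
    data: Vec<i32>,
}

/// Greedy scheduler sketch: `cores` workers drain a shared queue of work
/// items; whichever worker finishes first simply grabs the next item.
fn run_greedy(items: Vec<WorkItem>, cores: usize) -> i32 {
    let queue: Arc<Mutex<VecDeque<WorkItem>>> = Arc::new(Mutex::new(items.into()));
    let total = Arc::new(Mutex::new(0i32));
    let mut handles = Vec::new();
    for _ in 0..cores {
        let queue = Arc::clone(&queue);
        let total = Arc::clone(&total);
        handles.push(thread::spawn(move || loop {
            // Pop greedily; the queue lock is released before computing.
            let item = match queue.lock().unwrap().pop_front() {
                Some(it) => it,
                None => break, // queue drained: this core is done
            };
            let partial: i32 = item.data.iter().sum();
            *total.lock().unwrap() += partial;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}

fn main() {
    // Eight work items of uneven "cost", dispatched across two cores.
    let items: Vec<WorkItem> = (0..8).map(|i| WorkItem { data: vec![i; 4] }).collect();
    println!("total = {}", run_greedy(items, 2));
}
```

Because idle cores pull work instead of being assigned fixed shares, uneven operator costs balance out automatically, which is what allows both intra-operator and inter-operator parallelism described above.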

The authors evaluated Ariel-ML on several popular IoT boards (Nordic nRF52840-DK, Espressif ESP32-C3-DevKit, Raspberry Pi Pico) using a quantized LeNet-5 model, comparing it against RIOT-ML (a C/uTVM-based pipeline) and a hybrid RIOT+IREE variant. The results demonstrate three key findings: (1) integrating IREE significantly reduces inference latency compared to uTVM-based approaches, thanks to its advanced optimizations and LLVM backend, though at the cost of some memory overhead; (2) despite the Rust runtime, Ariel-ML achieves memory footprints comparable to existing C/C++ toolkits, showing its efficiency; (3) on multi-core hardware such as the Raspberry Pi Pico, Ariel-ML's scheduler effectively leverages parallel resources to further reduce latency.

In summary, Ariel-ML presents the first embedded Rust software platform that automates parallel inference for arbitrary TinyML models across diverse multi-core MCUs. By combining a modern ML compiler (IREE), a safe systems language (Rust), and an efficient runtime scheduler, it provides a foundational toolkit for developers facing the dual challenges of performance and security in next-generation edge AI applications. The full implementation is released as open source.

