$\texttt{lrnnx}$: A library for Linear RNNs


Linear recurrent neural networks (LRNNs) provide a structured approach to sequence modeling that bridges classical linear dynamical systems and modern deep learning, offering both expressive power and theoretical guarantees on stability and trainability. In recent years, multiple LRNN-based architectures have been proposed, each introducing distinct parameterizations, discretization schemes, and implementation constraints. However, existing implementations are fragmented across different software frameworks, often rely on framework-specific optimizations, and in some cases require custom CUDA kernels or lack publicly available code altogether. As a result, using, comparing, or extending LRNNs requires substantial implementation effort. To address this, we introduce $\texttt{lrnnx}$, a unified software library that implements several modern LRNN architectures under a common interface. The library exposes multiple levels of control, allowing users to work directly with core components or higher-level model abstractions. $\texttt{lrnnx}$ aims to improve accessibility, reproducibility, and extensibility of LRNN research and applications. We make our code available under a permissive MIT license.


💡 Research Summary

The paper introduces lrnnx, a unified open‑source library that consolidates a wide range of modern Linear Recurrent Neural Network (LRNN) architectures under a single, PyTorch‑centric API. Linear RNNs retain the attractive O(1) per‑token inference cost of classic recurrent models while guaranteeing stability through carefully designed parameterizations and discretization schemes. Over the past few years, numerous LRNN variants—such as S4, S5, LRU, S6, S7, STREAM, and Centaurus—have been proposed, each with its own matrix parameterization (time‑invariant LTI vs. time‑varying LTV), discretization method (zero‑order hold, bilinear, Dirac, event‑driven), and software stack (PyTorch, JAX, custom CUDA kernels). This diversity has resulted in a fragmented ecosystem: researchers must switch frameworks, rewrite data pipelines, and often implement low‑level CUDA kernels to reproduce results or benchmark models fairly.

lrnnx addresses these pain points through three core design principles:

  1. Unified Interface – All layers are defined by the state‑space recurrence
    $x_k = A(k)\,x_{k-1} + B(k)\,u_k, \quad y_k = C(k)\,x_k + D(k)\,u_k$.
    The library provides two abstract base classes, LTI_LRNN and LTV_LRNN, that encapsulate time‑invariant and time‑varying parameterizations respectively. Concrete subclasses (e.g., S4, S5, LRU, S6, S7, STREAM, Centaurus) inherit from these bases, exposing a consistent constructor signature and a common set of methods (forward, step, discretize). Switching from an S5 to an S7 model, for instance, requires only a different class instantiation, leaving the surrounding training loop untouched.

  2. Multi‑Level Control – The library separates low‑level core recurrences (implemented with highly optimized einsum operations) from high‑level model components such as embedding layers, residual blocks, and language‑model heads. This hierarchy enables researchers to experiment with novel discretization schemes or custom state‑space parameterizations without rewriting the entire model, while also allowing end‑users to drop a ready‑made LRNN‑based encoder or decoder into existing pipelines.

  3. Modular Discretization & Custom CUDA Kernels – Discretization strategies are decoupled from layer definitions. Supported schemes include Zero‑Order Hold (ZOH), bilinear (Tustin), Dirac, and asynchronous event‑driven discretization. For LTV layers, the authors provide bespoke CUDA kernels inspired by the selective scan implementation in Mamba (Gu & Dao, 2024). These kernels fuse the scan operation with output projection, dramatically reducing memory overhead and eliminating per‑step CPU‑GPU synchronizations that typically plague PyTorch loops. The kernels are compatible with all supported discretizations, offering a performance edge over naïve JAX lax.scan or pure Python loops.
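The state‑space recurrence behind the unified interface can be sketched as a plain sequential scan. The sketch below is illustrative only: `lti_scan` is a hypothetical name, not the lrnnx API, and NumPy stands in for PyTorch to keep the example framework‑free.

```python
import numpy as np

def lti_scan(A, B, C, D, u):
    """Run a time-invariant linear recurrence
        x_k = A x_{k-1} + B u_k,   y_k = C x_k + D u_k
    over an input sequence u of shape (T, input_dim).
    A: (n, n), B: (n, input_dim), C: (out_dim, n), D: (out_dim, input_dim).
    Illustrative sketch only -- not the lrnnx API.
    """
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:                      # sequential "step" mode: O(1) per token
        x = A @ x + B @ u_k
        ys.append(C @ x + D @ u_k)
    return np.stack(ys)

# Tiny example: a stable 2-state system driven by a random input sequence.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.0], [0.0, 0.5]])  # diagonal, |eigenvalues| < 1 => stable
B = rng.standard_normal((2, 1))
C = rng.standard_normal((1, 2))
D = np.zeros((1, 1))
u = rng.standard_normal((16, 1))
y = lti_scan(A, B, C, D, u)
print(y.shape)  # (16, 1)
```

A time‑varying (LTV) variant would simply index per‑step matrices `A[k]`, `B[k]` inside the loop; in practice the batched matrix products are expressed with `einsum` and, for LTV layers, replaced by the fused CUDA scan described above.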
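For a diagonal continuous‑time system, the ZOH and bilinear (Tustin) schemes listed above reduce to simple closed‑form elementwise updates. The following is a minimal sketch of those two formulas (function names are illustrative, not the lrnnx API):

```python
import numpy as np

def zoh(a, b, dt):
    """Zero-order-hold discretization of the diagonal continuous-time system
    x'(t) = a * x(t) + b * u(t), assuming u is held constant over each step:
        a_d = exp(a*dt),  b_d = (exp(a*dt) - 1) / a * b
    """
    a_d = np.exp(a * dt)
    b_d = (a_d - 1.0) / a * b
    return a_d, b_d

def bilinear(a, b, dt):
    """Bilinear (Tustin) discretization of the same diagonal system:
        a_d = (1 + dt*a/2) / (1 - dt*a/2),  b_d = dt*b / (1 - dt*a/2)
    """
    denom = 1.0 - dt * a / 2.0
    a_d = (1.0 + dt * a / 2.0) / denom
    b_d = dt * b / denom
    return a_d, b_d

a = np.array([-1.0, -0.1])   # stable poles: negative real part
b = np.ones(2)
for dt in (0.01, 0.1):
    az, _ = zoh(a, b, dt)
    ab, _ = bilinear(a, b, dt)
    # both schemes map the left half-plane into the unit disc, so |a_d| < 1
    assert np.all(np.abs(az) < 1) and np.all(np.abs(ab) < 1)
```

Decoupling these functions from the layer definitions is what lets one layer be paired with any supported scheme; for small `dt` the two discretizations agree closely, diverging only as the step size grows.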

Performance Evaluation – Benchmarks were run on an NVIDIA A100 40 GB GPU using Python 3.12 and CUDA 12.9. The authors compared lrnnx implementations of LRU, S5, and Mamba against their original public repositories (JAX or PyTorch). Experiments varied batch size, sequence length (256 to 2048), and model dimension (state size fixed at 16). For each configuration, 10 warm‑up passes followed by 90 timed forward‑backward passes were executed, repeated five times; mean and standard deviation across runs were reported. Results show that lrnnx matches or slightly outperforms reference implementations in training throughput and remains competitive in autoregressive inference, with only a modest overhead at very short sequences due to PyTorch’s CPU‑side step handling.
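The warm‑up/timed‑pass protocol described above can be sketched as follows. This is a CPU‑side illustration with a stand‑in workload; the paper's actual measurements time GPU forward‑backward passes, which additionally require device synchronization before reading the clock.

```python
import statistics
import time

def benchmark(fn, n_warmup=10, n_timed=90, n_repeats=5):
    """Timing protocol: discard warm-up passes, time a block of passes,
    repeat, and report mean/std of per-pass time (seconds) across repeats.
    CPU-side sketch -- GPU timing would also need torch.cuda.synchronize().
    """
    per_pass = []
    for _ in range(n_repeats):
        for _ in range(n_warmup):        # warm-up: caches/allocator settle
            fn()
        t0 = time.perf_counter()
        for _ in range(n_timed):
            fn()
        per_pass.append((time.perf_counter() - t0) / n_timed)
    return statistics.mean(per_pass), statistics.stdev(per_pass)

# Stand-in workload; a real run would time a model's forward-backward pass
# at a given batch size and sequence length.
mean_s, std_s = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e3:.3f} ms per pass")
```

Discarding warm‑up passes avoids counting one‑time costs (kernel compilation, memory allocation), and reporting a standard deviation across repeats exposes run‑to‑run jitter.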

Limitations – The current release is PyTorch‑only, excluding JAX or TensorFlow users. Custom CUDA kernels are optimized for NVIDIA hardware, so performance on AMD or CPU‑only platforms may be lower. Integration with large‑scale distributed training ecosystems (Hugging Face Transformers, DeepSpeed, FSDP) is not provided out‑of‑the‑box, requiring manual adapter layers. The library also lacks bidirectional LRNN variants and does not cover recent nonlinear recurrent models such as xLSTM, which are orthogonal but increasingly popular.

Conclusion – By standardizing the fragmented LRNN landscape, lrnnx dramatically lowers the engineering barrier for reproducibility, cross‑model benchmarking, and rapid prototyping. Its modular architecture, support for multiple discretizations, and high‑performance CUDA kernels make it a practical tool for long‑sequence modeling tasks in audio, vision, event‑stream, and other signal‑heavy domains where Transformers face quadratic scaling issues. The authors anticipate that lrnnx will become a foundational resource for the community, encouraging broader adoption of linear recurrent approaches and fostering further research into stability‑aware sequence models.

