Parallel Discrete Event Simulation with Erlang
Discrete Event Simulation (DES) is a widely used technique in which the state of the simulator is updated by events happening at discrete points in time (hence the name). DES is used to model and analyze many kinds of systems, including computer architectures, communication networks, and street traffic. Parallel and Distributed Simulation (PADS) aims to improve the efficiency of DES by partitioning the simulation model across multiple processing elements, enabling larger and/or more detailed studies to be carried out. Interest in PADS has grown with the widespread availability of multicore processors and affordable high-performance computing clusters. However, designing parallel simulation models requires considerable expertise, so PADS techniques are not as widespread as they could be. In this paper we describe ErlangTW, a parallel simulation middleware based on the Time Warp synchronization protocol. ErlangTW is written entirely in Erlang, a concurrent, functional programming language specifically targeted at building distributed systems. We argue that writing parallel simulation models in Erlang is considerably easier than using conventional programming languages. Moreover, ErlangTW allows simulation models to be executed on single-core, multicore, and distributed computing architectures. We describe the design and prototype implementation of ErlangTW, and report some preliminary performance results on multicore and distributed architectures using the well-known PHOLD benchmark.
💡 Research Summary
The paper presents ErlangTW, a parallel discrete‑event simulation (PDES) middleware built entirely in the functional, concurrent language Erlang. The authors argue that the inherent features of Erlang (lightweight processes, asynchronous message passing, built‑in distribution, and the OTP supervision framework) make it especially suitable for implementing optimistic synchronization protocols such as Time Warp. In ErlangTW each logical process (LP) is an Erlang process that maintains its own event queue and rollback log. Events are encoded as tuples and sent as ordinary Erlang messages. When an LP receives an event whose timestamp is earlier than its local virtual time (a "straggler"), it rolls back to a saved state that precedes that timestamp and generates anti‑messages to cancel the events it sent after that point, as prescribed by the Time Warp algorithm.
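The rollback behaviour described above can be illustrated with a minimal single-process sketch. It is written in Python rather than Erlang purely for illustration and is not ErlangTW's code: the names `Event`, `LogicalProcess`, `receive`, `rollback`, and `step` are hypothetical, the model state is reduced to a counter, and anti-messages are only collected in a list instead of being sent to other LPs.

```python
import heapq
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class Event:
    timestamp: float
    payload: str = ""
    anti: bool = False  # True marks an anti-message that cancels a sent event


class LogicalProcess:
    """Hypothetical Time Warp LP; state is just a count of processed events."""

    def __init__(self):
        self.lvt = 0.0           # local virtual time
        self.state = 0           # toy model state
        self.pending = []        # min-heap of unprocessed events
        self.processed = []      # (event, state_before), in processing order
        self.sent = []           # (send_time, event) for every output event
        self.anti_messages = []  # anti-messages produced by rollbacks

    def receive(self, event):
        # A straggler (timestamp in the local past) forces a rollback first.
        if event.timestamp < self.lvt:
            self.rollback(event.timestamp)
        heapq.heappush(self.pending, event)

    def rollback(self, to_time):
        # Undo every processed event at or after to_time and requeue it.
        while self.processed and self.processed[-1][0].timestamp >= to_time:
            event, state_before = self.processed.pop()
            self.state = state_before
            heapq.heappush(self.pending, event)
        self.lvt = self.processed[-1][0].timestamp if self.processed else 0.0
        # Cancel every output that was sent at or after to_time.
        kept = []
        for send_time, out in self.sent:
            if send_time >= to_time:
                self.anti_messages.append(Event(out.timestamp, out.payload, anti=True))
            else:
                kept.append((send_time, out))
        self.sent = kept

    def step(self):
        # Optimistically process the earliest pending event, if any.
        if not self.pending:
            return False
        event = heapq.heappop(self.pending)
        self.processed.append((event, self.state))  # checkpoint before the change
        self.lvt = event.timestamp
        self.state += 1
        out = Event(event.timestamp + 1.0, "from:" + event.payload)
        self.sent.append((event.timestamp, out))
        return True


lp = LogicalProcess()
for t in (1.0, 2.0, 3.0):
    lp.receive(Event(t, "e"))
while lp.step():
    pass
lp.receive(Event(1.5, "late"))  # straggler: undoes the events at 2.0 and 3.0
```

After the straggler arrives, the LP is back at virtual time 1.0 with two anti-messages queued, and calling `step()` again reprocesses the events in timestamp order. In a real Time Warp system the anti-messages would be delivered to the receiving LPs, which may roll back in turn.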
A dedicated GVT (Global Virtual Time) manager periodically collects the minimum timestamp from all LPs, computes the global lower bound, and triggers garbage collection of rollback logs. Because GVT calculation is a potential bottleneck, the implementation allows the user to tune the GVT interval and the aggressiveness of log pruning. The whole system is organized as a hierarchy of OTP supervisors, which guarantees that failures of individual LPs or of the GVT manager are automatically detected and recovered without crashing the simulation.
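As a rough illustration of the GVT step, the sketch below computes a global lower bound and prunes a rollback log against it. It is a Python sketch under stated assumptions, not the ErlangTW implementation: `compute_gvt` and `fossil_collect` are hypothetical names, each LP is assumed to report a single minimum timestamp, messages still in transit are passed in explicitly, and a log entry is assumed to be a `(timestamp, state_after)` pair sorted by timestamp.

```python
from itertools import chain


def compute_gvt(local_minima, in_transit=()):
    """Return the smallest timestamp any LP could still be rolled back to.

    local_minima: minimum timestamp reported by each LP.
    in_transit:   timestamps of messages sent but not yet received.
    """
    return min(chain(local_minima, in_transit))


def fossil_collect(history, gvt):
    """Drop log entries that no rollback can ever need.

    Keeps the latest entry strictly before GVT (a rollback to GVT restores
    from it) together with every entry at or after GVT.
    """
    keep_from = 0
    for index, (timestamp, _state) in enumerate(history):
        if timestamp < gvt:
            keep_from = index
        else:
            break
    return history[keep_from:]


history = [(1.0, "a"), (2.0, "b"), (5.0, "c"), (7.0, "d")]
gvt = compute_gvt([5.0, 6.5, 9.0], in_transit=[4.0])
pruned = fossil_collect(history, gvt)
```

Here the in-transit message at 4.0 holds GVT below every LP's local minimum, so only the entry at 1.0 is discarded. Computing GVT less often saves coordination cost but lets logs grow, which is the trade-off behind a tunable GVT interval.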
To evaluate scalability, the authors used the standard PHOLD benchmark on three hardware configurations: a 4‑core workstation, an 8‑core workstation, and a 16‑node cluster (each node with 4 cores). Results show near‑linear speed‑up on the multicore machines when the GVT interval is chosen to keep rollback frequency low. On the cluster, performance gains taper off as network latency and bandwidth constraints increase the number of rollbacks, confirming the well‑known sensitivity of optimistic PDES to communication overhead. Nevertheless, even on the distributed platform ErlangTW achieved respectable throughput, demonstrating that Erlang’s efficient message routing and process scheduling can mitigate much of the cost.
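For readers unfamiliar with PHOLD, the following sequential Python sketch shows the general shape of the workload: a fixed population of events circulates among the LPs, and handling an event schedules exactly one new event for a randomly chosen LP at a later time. The function name `run_phold`, the exponential timestamp increment, and all parameter values are assumptions made for illustration; they are not the configuration used in the paper.

```python
import heapq
import random


def run_phold(num_lps=4, events_per_lp=2, end_time=50.0, seed=1):
    """Sequential PHOLD-style workload (illustrative parameters only).

    Assumes num_lps * events_per_lp > 0, so the queue is never empty.
    """
    rng = random.Random(seed)
    queue = []  # min-heap of (timestamp, destination LP)
    for lp in range(num_lps):
        for _ in range(events_per_lp):
            heapq.heappush(queue, (rng.expovariate(1.0), lp))

    processed = [0] * num_lps
    # Each handled event is replaced by one new event, so the population
    # stays constant; stop once the earliest event lies beyond end_time.
    while queue[0][0] <= end_time:
        timestamp, lp = heapq.heappop(queue)
        processed[lp] += 1
        target = rng.randrange(num_lps)
        heapq.heappush(queue, (timestamp + rng.expovariate(1.0), target))
    return processed, len(queue)


processed, remaining = run_phold()
```

In a parallel run the LPs are spread across cores or nodes, so the share of events sent to a remote LP largely determines how much communication, and therefore how many rollbacks, the benchmark provokes.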
Beyond raw performance, the paper emphasizes development productivity. In traditional C++‑based PDES frameworks, programmers must manually manage thread pools, lock‑free queues, and complex rollback data structures. With ErlangTW, a simulation model reduces to a set of callback functions that define how an LP processes an incoming event and generates new events. The same model code runs unchanged on a single core, a multicore machine, or a cluster, because Erlang abstracts away the details of node discovery and inter‑node communication. This dramatically lowers the learning curve for researchers who need to prototype large‑scale simulations but lack deep parallel‑programming expertise.
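The callback style can be sketched as follows. This is an illustrative Python example rather than ErlangTW's actual interface: the callback signature, the `ping_pong` model, and the `run_sequential` engine are all hypothetical. The point is that the model is a pure function from an LP's state and an incoming event to a new state and a list of outgoing events, so the engine that drives it can be replaced without touching the model.

```python
import heapq


def ping_pong(lp_id, state, timestamp, payload):
    """Hypothetical model callback for two LPs bouncing one event back and forth.

    Returns (new_state, [(target_lp, timestamp, payload), ...]).
    """
    new_state = state + 1  # count the events this LP has handled
    target = 1 - lp_id     # send the event to the other LP
    return new_state, [(target, timestamp + 1.0, payload)]


def run_sequential(handle_event, num_lps, initial_events, end_time):
    """Minimal sequential engine; a parallel engine could reuse the same callback."""
    states = [0] * num_lps
    queue = [(ts, lp, payload) for lp, ts, payload in initial_events]
    heapq.heapify(queue)
    while queue and queue[0][0] <= end_time:
        ts, lp, payload = heapq.heappop(queue)
        states[lp], outputs = handle_event(lp, states[lp], ts, payload)
        for target, out_ts, out_payload in outputs:
            heapq.heappush(queue, (out_ts, target, out_payload))
    return states


final_states = run_sequential(ping_pong, 2, [(0, 0.0, "ball")], end_time=5.0)
```

With an end time of 5.0, each LP handles three events. Because the callback never touches queues, rollback logs, or message transport, those concerns stay inside the engine.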
The authors conclude that Erlang provides a compelling platform for building PDES middleware: it offers high‑level abstractions that simplify optimistic synchronization, robust fault‑tolerance via OTP, and seamless scalability across heterogeneous hardware. Future work is outlined in three directions: (1) adaptive GVT algorithms that react to runtime load and network conditions, (2) hybrid synchronization schemes that combine optimistic and conservative techniques to reduce rollback overhead in high‑latency environments, and (3) memory‑efficient log management for simulations with billions of events. The paper also suggests applying ErlangTW to real‑world case studies such as network protocol evaluation and urban traffic modeling to further validate its practicality.