Time Warp on the Go (Updated Version)
In this paper we deal with the impact of multi- and many-core processor architectures on simulation. Despite the fact that modern CPUs have an increasingly large number of cores, most software is still unable to take advantage of them. In recent years, many tools, programming languages and general methodologies have been proposed to help build scalable applications for multi-core architectures, but those solutions are somewhat limited. Parallel and distributed simulation is an interesting application area in which efficient and scalable multi-core implementations would be desirable. In this paper we investigate the use of the Go Programming Language to implement optimistic parallel simulations based on the Time Warp mechanism. Specifically, we describe the design, implementation and evaluation of a new parallel simulator. The scalability of the simulator is studied on a modern multi-core CPU, and the effects of Hyper-Threading technology on optimistic simulation are analyzed.
💡 Research Summary
The paper investigates how the Go programming language can be employed to build an optimistic parallel discrete‑event simulator based on the Time Warp mechanism, and it evaluates the resulting system on a modern multi‑core processor with Hyper‑Threading. The authors begin by outlining the challenges faced by traditional Time Warp implementations, which are typically written in C/C++ and rely on heavyweight threads, explicit lock‑based synchronization, and manual memory management. These approaches make it difficult to exploit the large number of cores now available on commodity CPUs, and they increase code complexity, hindering rapid prototyping of new simulation models.
Go is introduced as an alternative because it provides lightweight concurrency primitives (goroutines), channel‑based message passing, automatic garbage collection, and a runtime scheduler that dynamically balances work across cores. The simulator architecture maps each logical simulation entity to a separate goroutine. Events are transmitted asynchronously via typed channels, preserving causal order by attaching timestamps. Each goroutine maintains a local event queue; when a causality violation is detected, the goroutine rolls back by restoring a previously saved state and by sending anti‑messages to cancel downstream events. State saving follows a copy‑on‑write strategy, which reduces the memory overhead compared with full snapshots.
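The straggler-detection and rollback logic described above can be sketched in a few lines of Go. The type and field names below (`Event`, `Entity`, `Handle`) are illustrative, not the paper's actual API, and state restoration is reduced to trimming an event log so the rollback mechanics stay visible:

```go
package main

import "fmt"

// Event is a timestamped message exchanged between entities; Anti
// marks an anti-message that cancels a previously sent event.
type Event struct {
	Time int
	Anti bool
}

// Entity models one logical process: a local virtual time (LVT) plus
// a log of processed events, so a straggler (an event older than the
// local clock) can trigger a rollback. A full implementation would
// also restore saved state snapshots here.
type Entity struct {
	LVT       int
	Processed []Event
}

// Handle processes one event. On a straggler it rolls back by
// discarding processed events newer than the straggler's timestamp
// and returns them so the caller can emit anti-messages downstream.
func (e *Entity) Handle(ev Event) (cancelled []Event) {
	if ev.Time < e.LVT { // causality violation detected
		keep := e.Processed[:0]
		for _, p := range e.Processed {
			if p.Time <= ev.Time {
				keep = append(keep, p)
			} else {
				cancelled = append(cancelled, p)
			}
		}
		e.Processed = keep
	}
	e.Processed = append(e.Processed, ev)
	e.LVT = ev.Time
	return cancelled
}

func main() {
	e := &Entity{}
	e.Handle(Event{Time: 1})
	e.Handle(Event{Time: 5})
	e.Handle(Event{Time: 7})
	cancelled := e.Handle(Event{Time: 3}) // straggler arrives
	fmt.Println("rolled back", len(cancelled), "events; LVT =", e.LVT)
	// prints: rolled back 2 events; LVT = 3
}
```

In the architecture the summary describes, each such entity would run in its own goroutine and receive `Event` values over a typed channel rather than through direct calls.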
A dedicated goroutine periodically computes the Global Virtual Time (GVT) using a variant of Mattern’s algorithm. GVT is the minimum timestamp among all unprocessed and in-transit events, and it serves as a safe point for reclaiming obsolete state information: no rollback can ever reach below it. The GVT computation aggregates timestamps from all entity goroutines through non‑blocking channel reads and uses Go’s select construct to avoid deadlocks. By freeing memory associated with events older than GVT, the system prevents unbounded growth of the rollback log.
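The non-blocking aggregation step can be expressed with a `select` that has a `default` case, so the GVT goroutine never blocks on an entity that has not reported recently. This is a minimal sketch of that one step; a full Mattern-style algorithm would also track in-flight messages, which is omitted here:

```go
package main

import "fmt"

// computeGVT drains the latest reported local times from per-entity
// report channels without blocking. If an entity has no fresh report,
// its last known value is reused (stored in the caller-owned slice).
func computeGVT(reports []chan int, last []int) int {
	gvt := int(^uint(0) >> 1) // max int
	for i, ch := range reports {
		select {
		case t := <-ch: // fresh report available
			last[i] = t
		default: // nothing pending: fall back to the last value
		}
		if last[i] < gvt {
			gvt = last[i]
		}
	}
	return gvt
}

func main() {
	a, b := make(chan int, 1), make(chan int, 1)
	a <- 12
	b <- 7
	last := []int{0, 0}
	fmt.Println(computeGVT([]chan int{a, b}, last)) // prints 7
}
```

Events (and saved states) with timestamps below the returned value can then be reclaimed safely, which is what bounds the rollback log.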
Performance experiments were conducted on an Intel Xeon Platinum 8280 processor (28 physical cores, 56 logical cores with Hyper‑Threading). The authors varied four key parameters: (1) the number of simulated entities, (2) the event generation rate, (3) the rollback frequency, and (4) the activation of Hyper‑Threading. Results show that when the number of goroutines matches the number of physical cores, the simulator achieves its highest throughput, confirming that Go’s scheduler can efficiently map lightweight goroutines onto hardware threads without excessive context‑switch overhead. Adding more goroutines than physical cores yields diminishing returns due to increased scheduling contention.
Hyper‑Threading provides modest benefits for low‑rollback workloads but degrades performance when rollbacks are frequent. The authors attribute this to contention for shared L3 cache and memory bandwidth, which becomes a bottleneck when many logical threads simultaneously access the rollback log and state buffers. Consequently, the optimal configuration for high‑rollback scenarios is to disable Hyper‑Threading and run one goroutine per physical core.
Another notable observation concerns Go’s garbage collector. In high‑event‑rate scenarios (exceeding one million events per second), GC pause times become noticeable and cause temporary drops in throughput. The paper demonstrates that tuning the GC target heap growth (GOGC) and, on newer runtimes, setting a soft memory limit (GOMEMLIMIT, available since Go 1.19) mitigates these pauses, but the authors caution that GC behavior must be monitored in production‑grade simulations.
The discussion section acknowledges that the current implementation is limited to a single shared‑memory node. Extending the approach to distributed memory clusters would require a network‑transparent channel layer or a message‑passing interface built on top of Go’s net package. Moreover, the authors suggest exploring adaptive checkpoint intervals, compression of saved states, and integration with Go’s generic types to reduce the memory footprint of rollback data. They also note that recent Go profiling tools (pprof, trace) can help developers identify hotspots related to synchronization and GC, facilitating further optimization.
In conclusion, the study provides strong empirical evidence that Go’s concurrency model is well‑suited for building scalable Time Warp simulators. It achieves comparable or superior performance to traditional C‑based implementations while dramatically simplifying the code base and shortening development cycles. The work highlights both the opportunities (automatic load balancing, rapid prototyping) and the challenges (Hyper‑Threading interference, garbage‑collection pauses) that practitioners must consider when deploying Go‑based optimistic simulations on modern multi‑core hardware.