Maintaining Virtual Areas on FPGAs using Strip Packing with Delays
Every year, the computing resources available on dynamically partially reconfigurable devices increase enormously. In the near future, we expect many applications to run on a single reconfigurable device. In this paper, we present a concept for multitasking on dynamically partially reconfigurable systems called virtual area management. We explain its advantages, show its challenges, and discuss possible solutions. Furthermore, we investigate one problem in more detail: packing modules with time-varying resource requests. This problem from the reconfigurable computing field gives rise to a new optimization problem not tackled before. ILP-based and heuristic approaches are compared in an experimental study, and their drawbacks and benefits are discussed.
💡 Research Summary
The paper addresses the emerging need to run many independent applications concurrently on a single dynamically partially reconfigurable (DPR) FPGA. To enable this, the authors introduce a “virtual area” (VA) management concept that logically partitions the physical fabric into isolated regions, allowing each application to be loaded, reconfigured, and executed without interfering with the others. The central technical challenge is that the resource demands of modules change over time: a module may start with a modest footprint and later request additional logic, DSP blocks, or BRAM. The authors model this as a novel variant of the strip‑packing problem, which they call “Strip Packing with Delays.” In this formulation, the FPGA’s one‑dimensional resource (e.g., logic cells) is treated as a strip of fixed width, while modules are represented by time‑varying rectangles that can be delayed (i.e., shifted later in time) to avoid conflicts.
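The formulation can be illustrated with a minimal data model. The class and function names below, and the idea of representing a module as a per-slot demand profile, are illustrative assumptions rather than the paper's notation:

```python
from dataclasses import dataclass

@dataclass
class Module:
    """A module with a time-varying resource profile.

    profile[k] is the number of resource columns (strip width units)
    the module requests in the k-th time slot after it starts.
    """
    name: str
    profile: list[int]

def demand_at(module: Module, start: int, t: int) -> int:
    """Resources the module occupies at global time t if it starts
    at slot `start`. Delaying a module simply means choosing a
    larger `start`; its profile shifts later but is not reshaped."""
    k = t - start
    if 0 <= k < len(module.profile):
        return module.profile[k]
    return 0
```

For example, a module with profile `[4, 6, 6]` started at slot 2 occupies 4 units at t = 2 and 6 units at t = 3 and t = 4, and nothing outside that window.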
The paper first presents an integer linear programming (ILP) model that captures the start time, allocated resource amount, and possible delay for each module. Constraints enforce that the total allocated resources never exceed the FPGA capacity, that modules do not overlap in the same time slot, and that delayed allocations respect the module’s execution semantics. The objective is to minimize the makespan or the peak resource usage. While the ILP yields optimal solutions for small problem instances, its size grows combinatorially with the number of modules and time slots, making it impractical for realistic workloads.
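The ILP's decision space can be made concrete without a solver library: each module gets exactly one start slot (a later start is a delay), and per-slot capacity must hold. The exhaustive search below explores that same space exactly; the capacity, horizon, and demand profiles are illustrative numbers, not instances from the paper:

```python
from itertools import product

W = 10  # strip width (FPGA resource capacity), assumed
H = 8   # scheduling horizon in time slots, assumed
profiles = {"A": [4, 6, 6], "B": [5, 5], "C": [3, 3, 3, 3]}

def makespan_if_feasible(starts):
    """Makespan of a start-slot assignment (one start per module,
    in profile-dict order), or None if the per-slot capacity W is
    ever exceeded."""
    usage = [0] * H
    end = 0
    for (name, profile), s in zip(profiles.items(), starts):
        for k, d in enumerate(profile):
            usage[s + k] += d
        end = max(end, s + len(profile))
    return end if all(u <= W for u in usage) else None

def optimal_makespan():
    """Enumerate every combination of start slots and keep the best
    feasible makespan -- the exact optimum the ILP would return."""
    ranges = [range(H - len(p) + 1) for p in profiles.values()]
    best = None
    for starts in product(*ranges):
        m = makespan_if_feasible(starts)
        if m is not None and (best is None or m < best):
            best = m
    return best
```

On this instance no module can be squeezed out: starting all three at slot 0 needs 12 units in the first slot, so at least one module must be delayed, which is exactly the trade-off the ILP optimizes. Like the ILP, this enumeration is exponential in the number of modules, which mirrors the scalability limit reported in the paper.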
To provide scalable alternatives, three heuristic algorithms are developed:
- Greedy Height‑First (GHF) – Modules are sorted by their maximum instantaneous resource demand; the algorithm places the most demanding module first and inserts delays only when necessary to fit the remaining modules.
- First‑Fit with Delay (FFD) – Modules are processed in chronological order; each module is placed in the earliest time slot where sufficient resources are available, and a delay is introduced if no immediate slot fits.
- Simulated Annealing (SA) – An initial solution generated by GHF is iteratively refined by randomly perturbing start times and delay decisions, accepting changes according to a temperature schedule that gradually reduces exploration.
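The second heuristic above can be sketched as follows. The function name, the dictionary-of-profiles input, and all numbers are assumptions for illustration, not the paper's code:

```python
def first_fit_with_delay(profiles, width, horizon):
    """First-fit-with-delay sketch: place each module at the earliest
    start slot where its whole time-varying profile fits under the
    strip width; a later start slot acts as the introduced delay."""
    usage = [0] * horizon
    placement = {}
    for name, profile in profiles.items():
        for start in range(horizon - len(profile) + 1):
            if all(usage[start + k] + d <= width
                   for k, d in enumerate(profile)):
                for k, d in enumerate(profile):   # commit the placement
                    usage[start + k] += d
                placement[name] = start
                break
        else:
            placement[name] = None  # no feasible slot within the horizon
    return placement
```

For example, with profiles `{"A": [4, 6, 6], "B": [5, 5], "C": [3, 3, 3, 3]}` and width 10, A is placed at slot 0, B is delayed to slot 3 (its demand of 5 cannot sit beside A's peak of 6), and C fits at slot 0 in the residual capacity.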
Experimental evaluation uses synthetic workloads that mimic realistic FPGA applications (e.g., video processing kernels, cryptographic accelerators) with varying numbers of modules (5–30) and diverse resource‑time profiles. Results show that the ILP solves instances up to about ten modules optimally within minutes, but for larger instances the runtime explodes to hours. The GHF and FFD heuristics produce feasible schedules in seconds, typically within 5–15 % of the ILP optimum. The SA approach improves on the greedy baselines, achieving solutions within 2–8 % of optimal while still completing in under a minute for the largest test cases. Importantly, the ability to delay module execution allows the scheduler to hide reconfiguration latency, thereby reducing overall system response time compared with a naïve “no‑delay” approach.
Beyond algorithmic contributions, the paper discusses system‑level implications of VA management. Implementing VAs requires (a) a runtime interface that guarantees data consistency across reconfiguration boundaries, (b) power and thermal monitoring to avoid hot‑spot formation when multiple VAs are active, and (c) integration with a real‑time scheduler that can enforce QoS constraints. The authors argue that the delay mechanism must be carefully bounded: excessive postponement can violate timing guarantees of latency‑sensitive tasks, while insufficient delay may lead to resource contention and failed placements. Consequently, a dynamic policy that adapts delay allowances based on current workload and QoS requirements is recommended.
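The bounded-delay requirement reduces to a simple slack computation per task. The function and its parameter names are hypothetical, chosen only to make the constraint explicit:

```python
def max_admissible_delay(release, wcet, deadline, reconfig_time):
    """Latest-start slack of a task: how long its placement can be
    postponed beyond its release without missing the deadline, given
    its worst-case execution time and the reconfiguration latency.
    A delay policy must never exceed this bound for QoS tasks."""
    slack = deadline - (release + reconfig_time + wcet)
    return max(slack, 0)
```

A task released at t = 0 with a WCET of 5 slots, a deadline at t = 12, and 2 slots of reconfiguration latency tolerates a delay of at most 5 slots; tightening the deadline to t = 6 leaves no admissible delay at all.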
In conclusion, the study proposes a comprehensive framework for multitasking on DPR‑FPGAs: virtual area isolation combined with a time‑aware strip‑packing model that accommodates dynamic resource requests. By comparing exact ILP solutions with fast heuristics, the authors provide practical guidance on trade‑offs between optimality and runtime. Future work is outlined in three directions: extending the model to multi‑resource dimensions (logic, DSP, BRAM simultaneously), incorporating dynamic power/thermal models into the placement algorithm, and integrating the VA manager with a full operating system stack for real‑time embedded applications. This research paves the way for highly efficient, flexible FPGA platforms capable of hosting a multitude of heterogeneous workloads on a single chip.