Wrangling Rogues: Managing Experimental Post-Moore Architectures


The Rogues Gallery is a new experimental testbed focused on tackling “rogue” architectures for the Post-Moore era of computing. While some of these devices have roots in the embedded and high-performance computing spaces, managing current and emerging technologies presents challenges for system administration that are not always foreseen in traditional data center environments. We present an overview of the motivations and design of the initial Rogues Gallery testbed and cover some of the unique challenges that we have seen and foresee with upcoming hardware prototypes for future post-Moore research. Specifically, we cover the networking, identity management, resource scheduling, and tool and sensor access aspects of the Rogues Gallery and the techniques we have developed to manage these new platforms.


💡 Research Summary

The paper presents the design, implementation, and operational experience of the “Rogues Gallery,” an experimental testbed created to support emerging post‑Moore computing architectures—referred to as “rogues”—such as FPGA accelerators, the near‑memory Emu Chick system, and a field‑programmable analog array (FPAA). The authors argue that building a separate data center for each new prototype is prohibitively expensive and that a shared, managed environment can accelerate research, education, and industry collaboration.

Key motivations include the imminent end of traditional transistor scaling, the proliferation of heterogeneous accelerators (neuromorphic, quantum, reversible, etc.), and the lack of systematic guidance for evaluating these devices. The Rogues Gallery, launched by Georgia Tech’s Center for Research into Novel Computing Hierarchies (CRNCH) in 2017, acquires hardware from vendors, start‑ups, and research labs and makes it available to students, faculty, and external collaborators under a controlled data‑center‑like setting.

The initial hardware inventory comprises:

  • Two Nallatech 385 PCIe cards (Intel Arria 10, 8 GiB) and a Nallatech 520N PCIe card (Intel Stratix 10, 32 GiB, 100 Gbps network ports), plus a retired Micron AC‑510 card that pairs a Xilinx UltraScale‑060 with a 4 GiB Hybrid Memory Cube.
  • An Emu Chick system, which pairs a stationary Linux host with multiple “nodelets,” each containing cache‑less Gossamer cores tightly coupled to memory and connected via a RapidIO fabric.
  • A Georgia‑Tech‑developed FPAA, a USB‑attached board that combines a 16‑bit MSP430 microcontroller with a 2‑D array of ultra‑low‑power analog/digital processing elements.

Management challenges identified are networking, identity and access control, resource scheduling, and tool/sensor integration. The authors adopt a pragmatic approach that heavily re‑uses existing data‑center software:

  • Scheduling – Slurm is the primary resource manager for conventional CPUs, GPUs, and FPGA cards. Because the Emu Chick’s OS image is immutable and cannot run a Slurm daemon, the authors employ a “soft‑schedule” model using mailing‑list coordination and a front‑end virtual machine (VM) that mediates access.
  • Containerization – Singularity containers encapsulate compilation toolchains (e.g., Emu compilers, FPGA OpenCL) across multiple Linux distributions, avoiding the need to maintain parallel VM images. Early experiments with Kata containers aim to provide limited, sandboxed execution for anonymous users.
  • Networking – Dedicated “wrangling” switches enforce strict inbound SSH access while allowing outbound traffic for data set retrieval. The network topology isolates each rogue device behind firewalls, and the central IT admin network retains unrestricted access for maintenance.
  • Identity Management – Integration with Georgia Tech’s campus authentication system (CAS) and LDAP provides group‑based POSIX permissions. For devices that cannot run LDAP agents (e.g., Emu Chick, stand‑alone FPGA dev kits), a front‑end VM acts as a proxy, exposing the hardware only through controlled SSH tunnels and cgroup isolation.
  • Tool Support – Development environments are delivered as VM images or shared file‑shares; Jupyter notebooks are used for tutorials and demos. Power‑monitoring infrastructure for embedded platforms (e.g., NVIDIA Tegra) is also incorporated, though not discussed in depth.
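The containerization approach can be sketched as a Singularity definition file that pins a vendor toolchain to one Linux distribution, so the same image runs on any host in the testbed. The base image and install steps below are hypothetical placeholders, not the paper’s actual recipes.

```
Bootstrap: docker
From: centos:7

%post
    # Base build tools; vendor toolchain install steps would follow
    # (paths and packages here are illustrative, not from the paper).
    yum -y install gcc make

%environment
    # Hypothetical toolchain location baked into the image.
    export PATH=/opt/vendor-toolchain/bin:$PATH
```

Keeping the toolchain in the image rather than on each host is what lets one definition serve multiple Linux distributions without parallel VM images.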
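The “soft‑schedule” model for the Emu Chick can be illustrated with a small advisory‑locking sketch. This is not from the paper (which coordinates access via a mailing list and a front‑end VM); it is a minimal, hypothetical example of how a front‑end VM could enforce one‑user‑at‑a‑time access to a device that cannot run a Slurm daemon. The lock path and function names are assumptions.

```python
import contextlib
import fcntl
import os

@contextlib.contextmanager
def soft_reserve(lock_path="/tmp/emu_chick.lock"):
    """Advisory, non-blocking reservation of a shared prototype.

    Sketches the 'soft-schedule' idea: the front-end VM holds an
    exclusive file lock while a user session is active, so a second
    session fails fast instead of silently sharing the device.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        try:
            # LOCK_NB makes this fail immediately if already reserved.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise RuntimeError("device is reserved by another user")
        yield  # caller runs its Emu Chick session here
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A real deployment would layer accounting and time limits on top; the point here is only that a coarse reservation needs no daemon on the device itself.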

Operational lessons include:

  1. Invest, but expect turnover – Early prototypes may be short‑lived; therefore, the testbed minimizes dedicated physical servers, relying on VMs and containers to reduce re‑deployment effort.
  2. Community is essential – Without an active user base, even technically impressive hardware will disappear. The Rogues Gallery fosters community through documentation wikis, mailing lists, workshops, and close vendor interaction.
  3. Licensing and IP protection – Managing proprietary software licenses and protecting intellectual property requires careful segregation of privileges and network isolation.
  4. Hardware‑agnostic tooling – By abstracting devices behind standard interfaces (Slurm, containers, VMs), the testbed can accommodate a wide variety of future rogues without extensive custom integration.

The authors plan an annual review of the hardware portfolio, guided by user feedback and emerging research trends, to keep the testbed up‑to‑date while keeping administrative overhead low. They also identify future work such as automated hardware discovery, more granular multi‑tenant security models, and the definition of standardized APIs for emerging domains like quantum and neuromorphic computing.

In summary, the Rogues Gallery demonstrates that a modest investment in virtualization, containerization, and existing cluster management tools can provide a flexible, secure, and community‑driven platform for exploring the diverse landscape of post‑Moore computing architectures. This approach offers a practical roadmap for other institutions seeking to lower the barrier to entry for experimental hardware while maintaining the operational rigor of a production data center.

