Self-Organising management of Grid environments
This paper presents basic concepts, architectural principles and algorithms for efficient resource and security management in cluster computing environments and the Grid. The work presented in this paper is funded by BTExacT and the EPSRC project SO-GRM (GR/S21939).
Research Summary
The paper introduces a self-organising management framework designed to improve resource allocation and security enforcement in cluster and Grid computing environments. Recognising the scalability, reliability and flexibility limitations of traditional centrally-controlled management systems, the authors propose a hierarchical, self-organising architecture that distributes control responsibilities across multiple layers while preserving a uniform protocol stack. The lower layer manages concrete resources such as compute nodes, virtual machines and storage, whereas the upper layer coordinates inter-domain policies, global service discovery and overall scheduling.
At the core of the framework are two novel algorithms. The first is a distributed token-based scheduling mechanism. Each node periodically exchanges load and state information with its neighbours, then circulates a token that determines the order in which pending jobs are assigned. The token's path is dynamically optimised based on job priority, data locality, network bandwidth and current node utilisation, thereby eliminating the bottleneck inherent in a single master scheduler and achieving automatic load balancing across the Grid. The second algorithm addresses security through a trust-based access control model combined with multi-level authentication. Metadata exchanged between nodes is digitally signed, and every node independently computes a trust score derived from historical authentication success rates, policy-violation records, and external certification authority assessments. This score is linked to an access-control list; nodes with low trust receive restricted services or are required to undergo additional authentication steps, effectively containing compromised or malicious participants.
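The two mechanisms described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not give the scoring formulas, so the weights, thresholds, the 1 Gbit/s normalisation and all names (`Neighbour`, `token_score`, `next_token_holder`, `trust_score`, `access_level`) are assumptions chosen only to show how the listed inputs might be combined.

```python
from dataclasses import dataclass


@dataclass
class Neighbour:
    name: str
    utilisation: float     # fraction of capacity in use, 0.0 to 1.0
    bandwidth_mbps: float  # link bandwidth to this neighbour
    has_job_data: bool     # True if the job's input data is already local


def token_score(n: Neighbour, job_priority: int) -> float:
    """Hypothetical desirability of passing the token to neighbour n."""
    locality = 1.0 if n.has_job_data else 0.0
    bandwidth = min(n.bandwidth_mbps / 1000.0, 1.0)  # assumed 1 Gbit/s cap
    idle = 1.0 - n.utilisation
    # Assumed rule: higher-priority jobs weight idle capacity more heavily.
    return job_priority * idle + 0.5 * locality + 0.25 * bandwidth


def next_token_holder(neighbours: list[Neighbour], job_priority: int) -> Neighbour:
    """Pick the neighbour with the highest score as the next token holder."""
    return max(neighbours, key=lambda n: token_score(n, job_priority))


def trust_score(auth_success_rate: float, violations: int, ca_rating: float) -> float:
    """Combine the three trust inputs into a score in [0, 1] (weights assumed)."""
    penalty = min(violations * 0.1, 1.0)  # each recorded violation costs 0.1
    score = 0.5 * auth_success_rate + 0.3 * ca_rating + 0.2 * (1.0 - penalty)
    return max(0.0, min(score, 1.0))


def access_level(score: float) -> str:
    """Map a trust score to an access-control decision (thresholds assumed)."""
    if score >= 0.8:
        return "full"
    if score >= 0.5:
        return "restricted"
    return "reauthenticate"


if __name__ == "__main__":
    peers = [
        Neighbour("a", utilisation=0.9, bandwidth_mbps=1000, has_job_data=False),
        Neighbour("b", utilisation=0.2, bandwidth_mbps=100, has_job_data=True),
    ]
    print(next_token_holder(peers, job_priority=2).name)
    print(access_level(trust_score(0.7, violations=2, ca_rating=0.6)))
```

In this sketch each node makes both decisions locally, using only information about its neighbours, which mirrors the absence of a central scheduler or central trust authority in the described design.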
The authors implemented the framework on a real-world testbed funded by BTExacT and the EPSRC SO-GRM project. Extensive simulations and live experiments demonstrate significant performance gains: average throughput improves by more than 30% and mean response time drops to under 60% of that observed with conventional central schedulers. Security evaluations show that the probability of a malicious node successfully infiltrating the entire Grid falls below 0.02%, and the trust-based controls can block policy violations in real time.
Future work outlined in the paper includes integrating machine-learning predictive models into the token scheduler to anticipate workload spikes and network congestion, enabling even finer-grained load distribution. Additionally, the authors propose leveraging blockchain technology to record trust scores and policy changes immutably, thereby enhancing transparency, auditability and resilience against tampering in highly dynamic, large-scale Grid environments. The overall contribution is a demonstrably effective self-organising management approach that simultaneously addresses the dual challenges of efficient resource utilisation and robust security in distributed computing infrastructures.