E-science applications may require huge amounts of data and high processing power, and Grid infrastructures are well suited to meeting these requirements. The load distribution in a Grid may vary, leading to bottlenecks and overloaded sites. We describe a hierarchical dynamic load balancing protocol for Grids. The Grid consists of clusters, and each cluster is represented by a coordinator. Each coordinator first attempts to balance the load in its cluster and, if this fails, communicates with the other coordinators to transfer or receive load. This process is repeated periodically. We analyze the correctness, performance and scalability of the proposed protocol and show from simulation results that our algorithm balances the load by decreasing the number of highly loaded nodes in a Grid environment.
Future e-science applications will require efficient processing of data [1], where storage and processors may be distributed among collaborating researchers. A computational Grid consists of heterogeneous computational resources, possibly serving different users, and provides these users with remote access to the resources [2,3,4]; it is an ideal computing environment for e-science applications [5,6,7]. The Grid has attracted researchers as an alternative to supercomputers for high performance computing. One important advantage of Grid computing is that it provides users with resources that are unavailable locally. Since a Grid environment contains a multitude of resources, utilizing them effectively improves overall system performance and decreases turn-around times for user jobs [8]. Users of the Grid submit jobs at random times, so some computers become heavily loaded while others have available processing capacity. The goal of a load balancing protocol is to transfer load from heavily loaded machines to idle computers, thereby balancing the load across the computers and increasing overall system performance. Contemporary load balancing algorithms for multiple/distributed processor environments target the efficient utilization of a single resource, and even for algorithms that target multiple resources, scalability may be difficult to achieve.
A major drawback of load balancing algorithms for Grids is their lack of scalability and the need for nodes to acquire system-wide knowledge in order to make load balancing decisions. Scalability is an important requirement for Grids such as NASA's Information Power Grid (IPG) [9]. Some algorithms take a centralized approach, while others require acquisition of global system knowledge. Scheduling over a wide area network requires transfer and location policies. Transfer policies decide when to transfer load [10], typically based on a threshold value for the load. The location policy [11] decides where to send the load based on system-wide information. Location policies can be sender-initiated [12,13,14], where heavily loaded nodes search for lightly loaded nodes; receiver-initiated [15], where lightly loaded nodes search for senders; or symmetrical, where both senders and receivers search for partners [16]. Agent-based and game-theoretic approaches have also been proposed [17,18,19,20]. Load balancing across a Grid usually involves sharing of data, as in an MPI (Message Passing Interface) scatter operation [21], [22]. MPICH-G2 is a Grid-enabled implementation of MPI that allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer [23].
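The transfer and location policies described above can be illustrated with a small sketch. It assumes a single node-level load value in [0, 1] and two hypothetical thresholds (`UPPER_THRESHOLD`, `LOWER_THRESHOLD`); the function names are illustrative and are not taken from any of the cited algorithms. Matching is done centrally and greedily here for brevity, whereas a real location policy would discover partners by probing other nodes over the network.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds (fraction of node capacity); the text only says
# transfer decisions are typically based on some threshold value.
UPPER_THRESHOLD = 0.8
LOWER_THRESHOLD = 0.3


def transfer_policy(load: float) -> str:
    """Transfer policy: decide *when* a node should send or ask for load."""
    if load > UPPER_THRESHOLD:
        return "send"
    if load < LOWER_THRESHOLD:
        return "receive"
    return "none"


def nodes_with(loads: Dict[str, float], action: str,
               most_loaded_first: bool) -> List[str]:
    """Nodes whose transfer policy returns `action`, ordered by load."""
    return sorted((n for n in loads if transfer_policy(loads[n]) == action),
                  key=lambda n: loads[n], reverse=most_loaded_first)


def sender_initiated(loads: Dict[str, float]) -> List[Tuple[str, str]]:
    """Location policy: heavily loaded nodes look for lightly loaded partners.

    Returns (initiator, partner) pairs; the most loaded sender is matched
    with the least loaded receiver first.
    """
    senders = nodes_with(loads, "send", most_loaded_first=True)
    receivers = nodes_with(loads, "receive", most_loaded_first=False)
    # zip() stops at the shorter list: unmatched senders keep their load.
    return list(zip(senders, receivers))


def receiver_initiated(loads: Dict[str, float]) -> List[Tuple[str, str]]:
    """Location policy: lightly loaded nodes look for heavily loaded partners."""
    receivers = nodes_with(loads, "receive", most_loaded_first=False)
    senders = nodes_with(loads, "send", most_loaded_first=True)
    return list(zip(receivers, senders))


if __name__ == "__main__":
    loads = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.85, "e": 0.2}
    print(sender_initiated(loads))    # [('a', 'b'), ('d', 'e')]
    print(receiver_initiated(loads))  # [('b', 'a'), ('e', 'd')]
```

A symmetrical policy would simply let both roles initiate searches; the pairing step stays the same.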
In this study, we propose a dynamic and distributed protocol for load balancing in Grids, based on our previous work [24] with major modifications and detailed test results. The protocol uses the clusters of the Grid to make load balancing decisions locally within each cluster; if this is not possible, load balancing is performed among the clusters under the control of clusterheads called coordinators. We show that the protocol is scalable and has favorable message and time complexities.
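The two-level decision described above (balance inside a cluster first, then among coordinators) might be sketched as follows. This is a centralized simulation under stated assumptions, not the protocol itself: loads are hypothetical integer units, each cluster has a single hypothetical `threshold` above which a node is overloaded, and the `Cluster`, `pair_off` and `balance` names are invented for illustration. The actual protocol exchanges messages between coordinators.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    """A cluster as seen by its coordinator (hypothetical model)."""
    name: str
    loads: Dict[str, int]  # node name -> load in hypothetical integer units
    threshold: int         # hypothetical per-node upper threshold


def excess(c: Cluster, node: str) -> int:
    return max(0, c.loads[node] - c.threshold)


def room(c: Cluster, node: str) -> int:
    return max(0, c.threshold - c.loads[node])


def pair_off(src: Cluster, dst: Cluster) -> int:
    """Move load from overloaded nodes of `src` to underloaded nodes of `dst`.

    Works inside one cluster (src is dst) and between clusters.
    Returns the number of load units moved; total load is conserved.
    """
    moved = 0
    for s in src.loads:
        for d in dst.loads:
            amount = min(excess(src, s), room(dst, d))
            if amount > 0:
                src.loads[s] -= amount
                dst.loads[d] += amount
                moved += amount
    return moved


def balance(clusters: List[Cluster]) -> List[Tuple[str, str, int]]:
    """Local balancing first, then coordinator-to-coordinator transfers."""
    # Phase 1: each coordinator balances within its own cluster.
    for c in clusters:
        pair_off(c, c)
    # Phase 2: clusters that still have excess load contact other coordinators.
    transfers = []
    for src in clusters:
        for dst in clusters:
            if dst is not src:
                moved = pair_off(src, dst)
                if moved:
                    transfers.append((src.name, dst.name, moved))
    return transfers


if __name__ == "__main__":
    c1 = Cluster("C1", {"n1": 12, "n2": 9}, threshold=10)
    c2 = Cluster("C2", {"n3": 4, "n4": 10}, threshold=10)
    print(balance([c1, c2]))   # [('C1', 'C2', 1)]
    print(c1.loads, c2.loads)  # {'n1': 10, 'n2': 10} {'n3': 5, 'n4': 10}
```

In this example the local phase moves one unit from `n1` to `n2` inside `C1`; only the remaining unit of excess has to cross cluster boundaries, which is the traffic the hierarchical design aims to minimize.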
The rest of the paper is organized as follows. Section 2 describes the proposed protocol, including the coordinator and node algorithms, together with its analysis. Section 3 details the implementation of the protocol using an example. Section 4 analyzes test results obtained in the Indiana University Grid environment, and Section 5 presents concluding remarks and discussion.
We extend the load balancing protocol of [24] for Grids to achieve a more balanced load distribution [25]. We use the same daisy architecture shown in Fig. 1, which has been shown to be more scalable for group communication than other well-known architectures [26]. In this architecture, coordinators are the interface points between the nodes and the ring, and they make load transfer decisions on behalf of the nodes in the clusters they represent. They check whether the load can be balanced locally and, if this is not possible, search for potential receivers across the Grid as in [24]. Additionally, taking advantage of the daisy architecture, a token circulates around the coordinator ring and can carry global load information to all coordinators when needed. Using this information, coordinators can distribute the load in a more balanced way and also know when to finish. These extensions are detailed in Section 2.1.
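A minimal sketch of how a circulating token could gather global load information on the coordinator ring. The token format (a map from cluster name to total cluster load), the two-round circulation, the 10% tolerance around the global mean and the `finished` termination test are all assumptions made for illustration; the actual extensions are detailed in Section 2.1.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    """Hypothetical token format: cluster name -> total load of that cluster."""
    loads: Dict[str, float] = field(default_factory=dict)


@dataclass
class Coordinator:
    name: str
    cluster_load: float
    view: Dict[str, float] = field(default_factory=dict)  # last global view seen

    def on_token(self, token: Token) -> None:
        # Write (or refresh) this cluster's entry, then copy the global view.
        token.loads[self.name] = self.cluster_load
        self.view = dict(token.loads)


def circulate(ring: List[Coordinator], rounds: int = 2) -> Token:
    """Pass the token around the ring `rounds` times.

    After the first round the token holds every cluster's load; during the
    second round every coordinator copies that complete view.
    """
    token = Token()
    for _ in range(rounds):
        for coordinator in ring:
            coordinator.on_token(token)
    return token


def role(c: Coordinator, tolerance: float = 0.1) -> str:
    """Compare the cluster load with the global mean carried by the token."""
    mean = sum(c.view.values()) / len(c.view)
    if c.cluster_load > mean * (1 + tolerance):
        return "sender"
    if c.cluster_load < mean * (1 - tolerance):
        return "receiver"
    return "balanced"


def finished(ring: List[Coordinator], tolerance: float = 0.1) -> bool:
    """Hypothetical termination test: stop once no coordinator is a sender."""
    return all(role(c, tolerance) != "sender" for c in ring)


if __name__ == "__main__":
    ring = [Coordinator("A", 30.0), Coordinator("B", 10.0), Coordinator("C", 20.0)]
    circulate(ring)
    print([role(c) for c in ring])  # ['sender', 'receiver', 'balanced']
    print(finished(ring))           # False
```

With cluster loads 30, 10 and 20, the mean carried by the token is 20, so `A` acts as a sender, `B` as a receiver and `C` stays put. Two rounds are used because a coordinator early in the ring only sees a partial view after the first pass.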
In this paper, we classify the load into LOW, MEDIUM and HIGH classes. A node is LOW loaded when it can accept load from other nodes. A node is HIGH loaded when its load is detected to be above the upper threshold, as defined in the previous protocol [24]. The main difference is in the MEDIUM load