QOS based user driven scheduler for grid environment

QOS based user driven scheduler for grid environment
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

As grids are in essence heterogeneous, dynamic, shared and distributed environments, managing these kinds of platforms efficiently is extremely complex. A promising scalable approach to deal with these intricacies is the design of self-managing of autonomic applications. Autonomic applications adapt their execution accordingly by considering knowledge about their own behaviour and environmental conditions.QoS based User Driven scheduling for grid that provides the self-optimizing ability in autonomic applications. Computational grids to provide a user to solve large scale problem by spreading a single large computation across multiple machines of physical location. QoS based User Driven scheduler for grid also provides reliability of the grid systems and increase the performance of the grid to reducing the execution time of job by applying scheduling policies defined by the user. The main aim of this paper is to distribute the computational load among the available grid nodes and to developed a QoS based scheduling algorithm for grid and making grid more reliable.Grid computing system is different from conventional distributed computing systems by its focus on large scale resource sharing, where processors and communication have significant inuence on Grid computing reliability. Reliability capabilities initiated by end users from within applications they submit to the grid for execution. Reliability of infrastructure and management services that perform essential functions necessary for grid systems to operate, such as resource allocation and scheduling.


💡 Research Summary

The paper proposes a QoS‑driven, user‑controlled scheduler for grid computing environments that explicitly incorporates reliability metrics into the resource allocation decision process. Recognizing that traditional grid schedulers focus mainly on performance indicators such as CPU speed or network bandwidth, the authors argue that autonomic applications require a richer set of quality‑of‑service (QoS) parameters, especially execution time and failure probability, which must be reflected in scheduling policies defined by the end‑user.

The authors first review related work, including Cellular Memetic Algorithms (cMAs) for batch scheduling, graph‑theoretic QoS‑aware service matching, and existing middleware such as EasyGrid AMS and the Alchemi framework. They note that most existing solutions either ignore node failures or treat them as a post‑hoc recovery step, which can lead to sub‑optimal makespan and reduced overall system reliability.

The core contribution is a scheduling algorithm that evaluates each grid node on two criteria: (1) a “success rate” derived from the inverse of the failure‑to‑repair rate (i.e., the probability that a node will successfully execute a job) and (2) the estimated execution time based on the node’s MIPS rating. The algorithm proceeds as follows:

  1. Discover all available nodes.
  2. Compute the success rate for each node.
  3. Rank nodes by descending success rate.
  4. If two or more nodes share the same success rate, compare their estimated execution times and select the node with the smallest value.
  5. Assign jobs to the selected node in a round‑robin fashion, repeating the process for subsequent jobs.

To support the reliability‑aware decision making, the paper introduces a qualitative five‑level reliability scale (High, Good, Medium, Low, Poor) that maps to quantitative availability ranges (90‑100 %, 80‑89 %, etc.). Users can specify a desired reliability level when submitting a job, and the scheduler will prioritize nodes that satisfy or exceed that level.

The authors also embed a Markov Reward Model (MRM) to analytically capture the joint behavior of performance and reliability. System states represent combinations of node health (operational, failed, under repair) and the reward function quantifies the contribution of each state to overall throughput. By solving the steady‑state equations, they obtain expected reward rates that guide the selection of nodes with favorable trade‑offs between speed and stability.

Experimental evaluation is performed on a small Alchemi‑based testbed. Two scenarios are compared: (a) a baseline scheduler that ignores reliability, and (b) the proposed reliability‑aware scheduler. Results show that incorporating the failure‑to‑repair metric reduces the average makespan by roughly 30 % and improves the overall success ratio of job completions. Visualizations of node failure‑to‑repair rates illustrate that nodes with low failure probabilities (e.g., “Grid node 8”) are preferentially selected, confirming the algorithm’s intended behavior.

The paper concludes that a user‑driven QoS scheduler, enriched with reliability information, can significantly enhance both execution speed and system dependability in grid environments. It also outlines future work, including extending the model to handle multiple QoS dimensions (cost, data locality, security), automating parameter tuning for the cMA component, and validating the approach on larger, heterogeneous infrastructures.

While the concept is promising, the manuscript suffers from several shortcomings: the success‑rate/exec‑time weighting is simplistic and may not capture complex user preferences; the cMA configuration is not described in depth, leaving questions about scalability; the experimental platform is limited in size, reducing confidence in applicability to real‑world scientific grids; and numerous typographical and formatting errors impede reproducibility. Nonetheless, the integration of reliability metrics into a user‑centric scheduling framework represents a valuable contribution to the field of autonomic grid computing.


Comments & Academic Discussion

Loading comments...

Leave a Comment