Resource Allocation in Public Cluster with Extended Optimization Algorithm


We introduce an optimization algorithm for resource allocation in the LIPI Public Cluster, designed to optimize its usage according to incoming requests from users. The tool is an extended and modified genetic algorithm developed to match the specific characteristics of a public cluster. We present a detailed analysis of the optimization and compare the results with exact calculation. We show that the algorithm is very useful and could realize an automatic decision-making system for public clusters.


💡 Research Summary

The paper addresses the problem of allocating computational resources in the LIPI Public Cluster, a shared high‑performance computing environment that serves a large and heterogeneous user base. Traditional static scheduling or simple priority‑based schemes are inadequate because user requests arrive dynamically, vary widely in their CPU, memory, storage, and time requirements, and often conflict with one another. To meet these challenges, the authors propose an “Extended Optimization Algorithm” (EOA), which is a heavily modified genetic algorithm (GA) tailored to the specific constraints and objectives of a public cluster.

The problem is formally modeled as a multi‑objective optimization: maximize overall resource utilization (the proportion of CPU cores and memory actually used), maximize user satisfaction (the ratio of allocated to requested resources), and minimize scheduling conflicts (overlapping assignments on the same core at the same time). Each user job is characterized by a tuple (CPU demand, memory demand, expected runtime, priority). The cluster consists of N physical nodes, each with M cores and fixed amounts of memory and storage. The authors encode a candidate solution as a three‑dimensional chromosome: the first dimension indexes nodes, the second indexes cores, and the third indexes discrete time slots. A cell in this matrix holds the identifier of the job assigned to that core at that time, thereby embedding all spatial and temporal constraints directly into the genetic representation.
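The encoding above can be sketched as a small array, with one helper for placing a job. This is a minimal illustration, not the paper's implementation: the cluster dimensions, job IDs, and the `assign_job` helper are all assumptions introduced here.

```python
import numpy as np

# Sketch of the paper's 3D chromosome: axis 0 indexes nodes, axis 1
# indexes cores, axis 2 indexes discrete time slots; each cell holds
# the ID of the job assigned there (0 = idle). Sizes are illustrative.
N_NODES, N_CORES, N_SLOTS = 4, 8, 16

def new_chromosome():
    return np.zeros((N_NODES, N_CORES, N_SLOTS), dtype=int)

def assign_job(chrom, job_id, node, cores, start, runtime):
    """Place job_id on the given node/cores for `runtime` slots.
    Returns False instead of overwriting when any target cell is
    occupied, so overlapping assignments (the paper's scheduling
    conflicts) are detected at assignment time."""
    region = chrom[node, cores, start:start + runtime]
    if np.any(region != 0):
        return False
    chrom[node, cores, start:start + runtime] = job_id
    return True

chrom = new_chromosome()
# Job 7 needs 2 cores for 3 time slots: node 1, cores 0-1, slots 5-7.
assign_job(chrom, 7, node=1, cores=slice(0, 2), start=5, runtime=3)
```

Because each cell can hold only one job ID, spatial and temporal exclusivity is enforced by the representation itself, which is the point of the three-dimensional encoding.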

The fitness function is a weighted linear combination of the three objectives:
Fitness = α·Utilization – β·ConflictPenalty + γ·Satisfaction,
where the weights α, β, and γ are dynamically adjusted during the evolutionary run. Early generations emphasize α to explore a wide search space, while later generations increase β and γ to fine‑tune conflict resolution and user satisfaction. This adaptive weighting scheme helps the algorithm avoid premature convergence and balances global exploration with local exploitation.
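The weighted fitness and its schedule can be expressed compactly. The linear ramp below is an assumption for illustration; the paper states only that the weights shift from exploration (α) toward conflict resolution and satisfaction (β, γ) over the run, not the exact schedule or values.

```python
def adaptive_weights(generation, max_gen):
    """Illustrative schedule: alpha decays while beta and gamma grow
    as the run progresses. Endpoints (1.0 -> 0.5 and 0.5 -> 1.0) are
    assumed values, not taken from the paper."""
    progress = generation / max_gen
    alpha = 1.0 - 0.5 * progress
    beta = 0.5 + 0.5 * progress
    gamma = 0.5 + 0.5 * progress
    return alpha, beta, gamma

def fitness(utilization, conflict_penalty, satisfaction, generation, max_gen):
    """Fitness = alpha*Utilization - beta*ConflictPenalty + gamma*Satisfaction,
    with generation-dependent weights."""
    alpha, beta, gamma = adaptive_weights(generation, max_gen)
    return alpha * utilization - beta * conflict_penalty + gamma * satisfaction
```

At generation 0 a conflict costs half as much as it does at the end of the run, so early populations are free to explore infeasible regions that later generations must repair.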

Two specialized crossover operators are introduced. “Node‑level crossover” swaps entire sub‑matrices belonging to the same node between two parents, encouraging data locality and reducing inter‑node communication. “Time‑level crossover” exchanges time‑slot slices, allowing the algorithm to rearrange job start times and alleviate temporal overlaps. Mutation is also split into two complementary actions: “Job‑move mutation” relocates a job to a neighboring node to balance load, and “Resource‑scale mutation” slightly perturbs the requested CPU or memory (±5 %) to explore near‑optimal allocations that might otherwise be infeasible. Crossover and mutation probabilities are themselves adapted based on population diversity metrics, ensuring sufficient variation in early phases and stability as convergence approaches.
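Two of the four operators can be sketched on the 3D encoding described earlier. The function names, array shapes, and the rule of moving a job only onto an idle footprint are assumptions made for this sketch; the paper's operators may differ in detail.

```python
import numpy as np

rng = np.random.default_rng()

def node_level_crossover(parent_a, parent_b):
    """Swap one whole node sub-matrix (all cores and time slots of a
    randomly chosen node) between two parents of shape
    (nodes, cores, slots); a sketch of the paper's node-level crossover."""
    child_a, child_b = parent_a.copy(), parent_b.copy()
    node = rng.integers(parent_a.shape[0])
    child_a[node], child_b[node] = parent_b[node].copy(), parent_a[node].copy()
    return child_a, child_b

def job_move_mutation(chrom, job_id):
    """Relocate job_id's entire footprint to the next node (wrapping),
    keeping the same core/slot positions; a simplified sketch of the
    paper's job-move mutation. Moves only if the destination is idle."""
    child = chrom.copy()
    nodes = np.unique(np.nonzero(child == job_id)[0])
    if nodes.size == 0:
        return child
    src = nodes[0]
    dst = (src + 1) % child.shape[0]
    mask = child[src] == job_id
    if np.all(child[dst][mask] == 0):
        child[dst][mask] = job_id
        child[src][mask] = 0
    return child
```

Note that both operators preserve the total work assigned across the pair or individual; they only rearrange where and when it runs, which is what makes them safe to apply repeatedly during evolution.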

The experimental evaluation uses real workload traces from the LIPI Public Cluster, constructing three test scenarios with 100, 250, and 500 concurrent job requests. For each scenario the authors compare EOA against three baselines: (1) an exact solution obtained by exhaustive search (feasible only for the smallest scenario), (2) a standard GA without the problem‑specific extensions, and (3) a particle‑swarm optimization (PSO) scheduler. Results show that EOA achieves an average resource utilization of 84.7 %, within 2.3 % of the optimal solution, while reducing execution time by more than two orders of magnitude relative to exhaustive search. Compared with the standard GA, EOA improves utilization by 7–12 % and cuts conflict occurrences by over 30 %. PSO attains 80.2 % utilization but suffers from slower convergence, making it less suitable for near‑real‑time decision making. User satisfaction, measured as the percentage of requested resources actually granted, reaches 91 % under EOA, again outperforming the baselines.

The authors acknowledge limitations. The three‑dimensional chromosome grows linearly with the number of nodes, cores, and time slots, leading to increased memory consumption for very large clusters (thousands of nodes). They propose future work on hierarchical GA structures, distributed evolutionary operators, and reinforcement‑learning‑driven weight adaptation to address scalability. Additionally, the current implementation assumes static job durations; extending the model to handle pre‑emptive or malleable jobs would broaden applicability.

In conclusion, the paper demonstrates that a carefully engineered genetic algorithm, enriched with domain‑specific encoding, adaptive fitness weighting, and custom genetic operators, can effectively solve the complex, multi‑objective resource allocation problem of a public computing cluster. The proposed EOA not only yields near‑optimal allocations quickly enough for practical deployment but also lays the groundwork for an automated decision‑support system that can dynamically respond to fluctuating user demands, thereby improving overall cluster efficiency and user experience.

