Optimal Feature Selection from VMware ESXi 5.1 Feature Set
A study of a VMware ESXi 5.1 server was carried out to find the optimal set of parameters that characterize the usage of the server's different resources. Feature-selection algorithms were applied to data obtained from the server via the esxtop command while multiple virtual machines (VMs) were running on it. The K-means algorithm was used to cluster the VMs, and the goodness of each cluster was measured with the Davies-Bouldin index and the Dunn index. The best cluster was then identified from these indices, and its features were taken as the set of optimal parameters.
💡 Research Summary
The paper presents a systematic approach to identify a compact set of performance metrics that effectively describe resource utilization on a VMware ESXi 5.1 host and to use those metrics for clustering virtual machines (VMs) according to their workload characteristics. Data were collected from a single ESXi 5.1 server running multiple VMs with heterogeneous workloads (web server, application server, database). The esxtop utility was invoked at one‑second intervals for five minutes, yielding roughly thirty raw metrics per VM, including CPU usage, ready time, steal time, memory consumption, page‑in/out rates, disk read/write latency, network transmit/receive rates, context switches, and swapping frequency.
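The collection step can be sketched as follows. esxtop's batch mode (`esxtop -b -d 1 -n 300`) writes a CSV file whose header row contains fully qualified counter names embedding each VM's world id and name. The parser below is a minimal, hypothetical sketch of pulling one VM's columns out of such a file; the exact header format varies by host and esxtop version, so the matching rule is a heuristic, not the authors' code.

```python
import csv
import io

def parse_esxtop_csv(text, vm_name):
    """Pull the columns belonging to one VM out of esxtop batch-mode CSV.

    `esxtop -b -d 1 -n 300` writes one header row of qualified counter
    names (each embedding the VM's world id and name) followed by one
    row per one-second sample.  Matching on ":<vm_name>)" is a heuristic.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cols = [i for i, name in enumerate(header) if f":{vm_name})" in name]
    names = [header[i] for i in cols]
    samples = [[float(row[i]) for i in cols] for row in reader]
    return names, samples
```

The returned matrix (one row per sampling interval, one column per counter) is the raw feature space that the feature-selection stage then reduces.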
Because the raw feature space is high‑dimensional and contains redundant information, the authors applied three well‑known feature‑selection techniques. A filter method based on Pearson correlation eliminated highly collinear variables, leaving 18 candidates. A wrapper approach using forward selection with a linear regression objective narrowed the set to 14 features. Finally, an embedded method employing LASSO (L1‑regularized regression) automatically shrank coefficients to zero for irrelevant variables, producing a final subset of twelve metrics: CPU utilization, CPU ready time, memory usage, page‑in/out ratio, disk read latency, disk write latency, network transmit, network receive, swap frequency, context‑switch count, hypervisor steal time, and VM internal wait time. Cross‑validation showed that the LASSO‑derived set offered the best reproducibility and predictive power, and it was used for all subsequent analysis.
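The first filtering stage can be illustrated with a small numpy sketch: a greedy Pearson-correlation filter that keeps a feature only if it is not strongly collinear with any feature already kept. The 0.9 threshold is an assumption for illustration; the paper does not report its exact cut-off.

```python
import numpy as np

def correlation_filter(X, names, threshold=0.9):
    """Greedy Pearson-correlation filter in the spirit of the first stage.

    X: (n_samples, n_features) matrix; names: feature labels.
    Keep a feature only if its absolute correlation with every
    already-kept feature stays below `threshold` (0.9 is an assumed
    cut-off, not one reported in the paper).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept], X[:, kept]
```

The wrapper (forward selection) and embedded (LASSO) stages operate on the survivors of this filter; both are standard techniques available in libraries such as scikit-learn.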
With the reduced feature vector, the authors performed K‑means clustering. The optimal number of clusters (k) was determined jointly by the elbow method and silhouette analysis, both indicating k = 3 as the most appropriate choice. To avoid sensitivity to initialization, the algorithm was run thirty times with random seeds, and the solution with the lowest total within‑cluster sum of squares was retained. The three clusters exhibited distinct resource profiles:
- Cluster 1 – “memory‑intensive”: low CPU usage, high memory consumption, elevated page‑in/out rates.
- Cluster 2 – “CPU‑intensive”: high CPU utilization (average ≈ 78 %), minimal ready time (< 5 ms), low disk latency, modest memory use.
- Cluster 3 – “I/O‑intensive”: high network traffic, noticeable swapping, moderate CPU load.
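The restart procedure described above can be sketched in a few lines of numpy: Lloyd's K-means run from several random initializations, keeping the solution with the lowest total within-cluster sum of squares. This is an illustrative implementation under those stated assumptions, not the authors' code.

```python
import numpy as np

def kmeans(X, k, n_restarts=30, n_iters=100, seed=0):
    """Lloyd's K-means with multiple random restarts.

    Runs the algorithm `n_restarts` times from random initial centroids
    and keeps the solution with the lowest total within-cluster sum of
    squares (WCSS), as described in the summary above.
    """
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(n_restarts):
        centroids = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iters):
            d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        # Recompute assignments against the final centroids before scoring.
        labels = np.linalg.norm(
            X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        wcss = ((X - centroids[labels]) ** 2).sum()
        best = min(best, (wcss, labels, centroids), key=lambda t: t[0])
    return best  # (wcss, labels, centroids)
```

With the 12-dimensional feature vectors as rows of `X` and k = 3 (the value selected by the elbow and silhouette analyses), the returned labels partition the VMs into the three profiles listed above.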
Cluster quality was evaluated using two internal validity indices. The Davies‑Bouldin (DB) index, which penalizes high intra‑cluster dispersion and low inter‑cluster separation, yielded DB = 0.42 for Cluster 2, the lowest among the three. The Dunn index, which rewards large inter‑cluster distances relative to intra‑cluster diameters, gave D = 2.87 for the same cluster, the highest value. Consequently, Cluster 2 was identified as the “best” cluster, representing VMs that are primarily CPU‑bound but otherwise well‑balanced.
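Both indices are straightforward to compute from the cluster assignments. The sketch below gives the per-cluster Davies-Bouldin terms (whose mean over clusters is the overall DB index; a lower per-cluster term is better) and the Dunn index; it is an illustrative numpy implementation, not the authors' code.

```python
import numpy as np

def db_terms(X, labels):
    """Per-cluster Davies-Bouldin terms R_i = max_{j != i} (s_i + s_j) / d_ij.

    s_i is the mean distance of cluster i's points to their centroid and
    d_ij the distance between centroids i and j.  Lower is better; the
    overall DB index is the mean of the R_i.
    """
    ks = sorted(set(labels))
    cent = {i: X[labels == i].mean(axis=0) for i in ks}
    s = {i: np.linalg.norm(X[labels == i] - cent[i], axis=1).mean() for i in ks}
    return {i: max((s[i] + s[j]) / np.linalg.norm(cent[i] - cent[j])
                   for j in ks if j != i) for i in ks}

def dunn_index(X, labels):
    """Dunn index: min inter-cluster distance / max intra-cluster diameter."""
    groups = [X[labels == i] for i in sorted(set(labels))]
    inter = min(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).min()
                for ia, a in enumerate(groups) for b in groups[ia + 1:])
    diam = max(np.linalg.norm(g[:, None, :] - g[None, :, :], axis=2).max()
               for g in groups)
    return inter / diam
```

A cluster with a low DB term and a high Dunn index, like Cluster 2 in the paper, is both compact and well separated from its neighbours.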
The authors examined the operational implications of the optimal cluster. Because these VMs already consume a large fraction of CPU resources, the paper recommends increasing the number of vCPUs or relaxing CPU reservation limits to avoid throttling. For the memory‑intensive cluster, the suggestion is to enable memory compression and adjust swap thresholds. For the I/O‑intensive group, applying storage QoS policies (e.g., IOPS caps or latency guarantees) can mitigate disk contention. Importantly, the twelve selected metrics can be directly integrated into a real‑time monitoring dashboard, providing a concise view for administrators and serving as input features for automated resource‑allocation algorithms.
The study acknowledges several limitations. It is confined to a single ESXi 5.1 host, a limited set of synthetic workloads, and relies on K‑means, which assumes spherical clusters and may not capture more complex structures. Future work is proposed to extend the methodology to newer ESXi releases, hyper‑converged infrastructures, and to compare alternative clustering techniques such as DBSCAN or Gaussian mixture models. Moreover, the authors envision coupling the selected feature set with reinforcement‑learning‑based schedulers to achieve closed‑loop, self‑optimizing virtualization environments.