Reliability of Computational Experiments on Virtualised Hardware

Reading time: 5 minutes
...

📝 Original Info

  • Title: Reliability of Computational Experiments on Virtualised Hardware
  • ArXiv ID: 1110.6288
  • Date: 2010-12-31
  • Authors: Kotthoff, L.

📝 Abstract

We present preliminary results of an investigation into the suitability of virtualised hardware -- in particular clouds -- for running computational experiments. Our main concern was that the reported CPU time would not be reliable and reproducible. The results demonstrate that while this is true in cases where many virtual machines are running on the same physical hardware, there is no inherent variation introduced by using virtualised hardware compared to non-virtualised hardware.

💡 Deep Analysis

Figure 1

📄 Full Content

Running computational experiments is a task that requires a lot of resources. Recent research in Artificial Intelligence in particular is concerned with the behaviour of a large number of problem-solving systems and algorithms on a large number of problems (Kotthoff et al., 2010; Xu et al., 2008). The purpose of these large-scale experiments is to build statistical models of the behaviour of certain systems and algorithms on certain problems, in order to predict the most efficient system for solving new problem instances.

The obvious problem is that a lot of computing resources are required to run this kind of experiment. Provisioning a large number of machines is not only expensive, but also likely to waste resources when the machines are not being used. Smaller universities and research institutions in particular are often unable to provide large-scale computing infrastructure and have to rely on support from other institutions.

The advent of publicly available cloud computing infrastructure has provided a possible solution to this problem. Instead of provisioning a large number of computers themselves, researchers can use computational resources provided by companies and pay only for what they actually use. Commercial clouds are nowadays large enough to easily handle the demand generated by large-scale computational experiments.

This raises an important question, however. How reliable and reproducible are the results of experiments run in the cloud? Are the reported CPU times more variable than on non-virtualised hardware?

While the focus of our evaluation is on computational experiments, we believe that the results are of interest in general. When a company plans the provisioning of virtual resources, the implicit assumption is that the performance of the planned resources can be predicted from the performance of the resources already provisioned. If these predictions are unreliable, too few resources could be provisioned, leading to degraded performance, or too many, leading to waste.

There has been relatively little research into the repeatability of experiments on virtualised hardware. El-Khamra et al. (2010) report large fluctuations of high-performance computing workloads on cloud infrastructure. Ostermann et al. (2010) evaluate the performance of the Amazon cloud with regard to its general suitability for scientific use. The handbook of cloud computing (Furht and Escalante, 2010) explores the issue in some of its chapters.

An experimental evaluation by Schad et al. (2010) again showed that there is large variability in performance and care must be taken when running scientific experiments. They provide an in-depth analysis of the various factors that affect performance, but only distinguish between two different virtual machine types provided by the Amazon cloud.

Our approach is more systematic and directly compares the variability of performance on virtualised and non-virtualised hardware with a real scientific workload. Our application is lifted straight from Artificial Intelligence research.

We are concerned with two major problems when running experiments. First, we want the results to be reliable in the sense that they faithfully represent the true performance of an algorithm or a system. Second, we want them to be reproducible in the sense that anybody can run the experiments again and achieve the same results we did.

We can assess the reliability of an experiment by running it several times and judging whether the results are the same within some margin of experimental error. Reproducibility is related to this notion, but more concerned with being able to reproduce the results in a different environment or at a different time. The two concepts are closely related, however: if we cannot reproduce the results of an experiment, it is also unreliable, and if the results are unreliable, there is no point in trying to reproduce them.
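As a minimal illustration of this notion of reliability, the repeated-runs check described above can be scripted: run the same command several times, record the CPU time consumed by each run, and look at the spread. The sketch below assumes a Unix-like system; the command, the number of repetitions, and the use of the coefficient of variation are illustrative choices, not details taken from the paper.

```python
import resource
import statistics
import subprocess

# Hypothetical command under test; any deterministic, CPU-bound program works.
COMMAND = ["minion", "queens.minion"]
RUNS = 30

def child_cpu_seconds() -> float:
    """User + system CPU time consumed so far by terminated child processes."""
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_utime + usage.ru_stime

cpu_times = []
for _ in range(RUNS):
    before = child_cpu_seconds()
    subprocess.run(COMMAND, check=True, capture_output=True)
    cpu_times.append(child_cpu_seconds() - before)

mean = statistics.mean(cpu_times)
stdev = statistics.stdev(cpu_times)
# The coefficient of variation gives the relative spread of the reported
# CPU times; a small value means the repeated runs agree closely.
print(f"mean = {mean:.3f} s, stdev = {stdev:.3f} s, cv = {stdev / mean:.2%}")
```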

Running experiments on virtualised hardware gives an advantage in terms of reproducibility because the environment that an experiment was run in can be packaged as a virtual machine. This not only removes possible variability in the results due to different software versions, but also makes it possible to reproduce experiments with unmaintained systems that cannot be built and would not run on contemporary operating systems.

The questions we investigate in this paper, however, are as follows.

- Is there inherently more variation in terms of CPU time on virtualised hardware than on non-virtualised hardware?

- Is the performance of virtualised hardware consistent, and are we able to combine several virtual machines into a cluster and still get consistent results? (A sketch of one such consistency check follows this list.)

- Are there differences between different clouds that use different controller software?
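For the cluster-consistency question, one way to make "consistent results" concrete is to compare the spread of repeated CPU-time measurements across hosts. The sketch below is purely illustrative: the host names and timing values are placeholders, not measurements from the paper; in practice the lists would be filled by repeating the same solver run on each machine, e.g. with the loop sketched earlier.

```python
import statistics

# Placeholder data: repeated CPU times (seconds) for the same run on each host.
# These numbers are invented for illustration only.
timings = {
    "physical-reference": [41.2, 41.3, 41.1, 41.2, 41.3],
    "cloud-vm-1":         [41.5, 41.4, 44.0, 41.6, 41.5],
    "cloud-vm-2":         [41.3, 41.3, 41.4, 41.2, 41.3],
}

# A cluster is "consistent" in this informal sense if every virtual machine
# shows a relative spread comparable to the non-virtualised reference.
for host, times in timings.items():
    cv = statistics.stdev(times) / statistics.mean(times)
    print(f"{host:20s} cv = {cv:.2%}")
```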

To evaluate the reliability of experimental results, we used the Minion constraint solver (Gent et al., 2006). We ran it on the following three problems.

- An n-queens instance that takes a couple

📸 Image Gallery

cover.png

Reference

This content is AI-processed based on open access ArXiv data.
