The proliferation of commercial cloud computing providers has generated significant interest in the scientific computing community. Much recent research has attempted to determine the benefits and drawbacks of cloud computing for scientific applications. Although clouds have many attractive features, such as virtualization, on-demand provisioning, and "pay as you go" usage-based pricing, it is not clear whether they are able to deliver the performance required for scientific applications at a reasonable price. In this paper we examine the performance and cost of clouds from the perspective of scientific workflow applications. We use three characteristic workflows to compare the performance of a commercial cloud with that of a typical HPC system, and we analyze the various costs associated with running those workflows in the cloud. We find that the performance of clouds is not unreasonable given the hardware resources provided, and that performance comparable to HPC systems can be achieved given similar resources. We also find that the cost of running workflows on a commercial cloud can be reduced by storing data in the cloud rather than transferring it from outside.
Deep Dive into Scientific Workflow Applications on Amazon EC2
The developers of scientific applications have many options when it comes to choosing a platform to run their applications. In the past these options included local workstations, clusters, supercomputers, and grids. Each of these choices offers various tradeoffs in terms of usability, performance, and cost. Recently, cloud computing has emerged as another promising solution for scientific applications and is rapidly gaining interest in the scientific community.
Many definitions of cloud computing have been proposed [4, 13, 34]. These definitions vary in the scope of what constitutes a cloud and what features a cloud provides. For the purposes of this paper we consider a cloud to be a cluster that offers virtualized computational resources, service-oriented provisioning, and a “pay as you go” usage-based pricing model. Currently there are several commercial clouds that offer these features, such as Amazon EC2 [2], GoGrid [17], and FlexiScale [12]. In addition, it is now possible to build private clouds using open-source cloud computing middleware such as Eucalyptus [27], OpenNebula [28], and Nimbus [26].
Clouds offer many technical and economic advantages over other platforms that are just beginning to be identified. They combine the customization of virtual machines, the scalability and resource sharing of grids, and the stability and economy of software as a service (SaaS). The use of virtualization in particular has been shown to provide many useful benefits for scientific applications, including user-customization of system software and services, performance isolation, checkpointing and migration, better reproducibility of scientific analyses, and enhanced support for legacy applications [11, 19].
Recently, many studies have investigated the use of clouds and virtualization for scientific applications [10, 16, 24, 32, 33, 35, 36]. These studies have primarily focused on tightly-coupled applications and common HPC benchmarks.
In this paper we study the use of cloud computing for scientific workflows. Workflows are loosely-coupled parallel applications that consist of a series of computational tasks connected by data- and control-flow dependencies. Many scientific analyses are easily expressed as workflows, and workflows are commonly used to solve problems in many disciplines [37].
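The workflow model described above is, at its core, a directed acyclic graph (DAG) of tasks executed in dependency order. The following minimal sketch (hypothetical, not the actual workflow system used in this paper) illustrates the data structure: tasks are registered with their prerequisites and then executed in a topological order.

```python
from collections import defaultdict, deque

class Workflow:
    """A minimal DAG of tasks connected by dependencies (illustrative sketch)."""

    def __init__(self):
        self.deps = defaultdict(set)  # task name -> set of prerequisite task names
        self.tasks = {}               # task name -> callable

    def add_task(self, name, fn, after=()):
        """Register a task; `after` lists tasks that must complete first."""
        self.tasks[name] = fn
        self.deps[name] |= set(after)

    def run(self):
        """Execute all tasks in topological order; return the order used."""
        indegree = {t: len(self.deps[t]) for t in self.tasks}
        ready = deque(t for t, d in indegree.items() if d == 0)
        order = []
        while ready:
            t = ready.popleft()
            self.tasks[t]()
            order.append(t)
            # Release any task whose last prerequisite just finished.
            for u in self.tasks:
                if t in self.deps[u]:
                    indegree[u] -= 1
                    if indegree[u] == 0:
                        ready.append(u)
        if len(order) != len(self.tasks):
            raise ValueError("cycle detected in workflow")
        return order
```

A fan-out/fan-in shape like this (one preprocessing task feeding several parallel tasks that merge into a final one) is typical of the Montage-style workflows studied later in the paper; a production workflow system would additionally dispatch ready tasks to remote resources in parallel.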
Clouds provide several benefits for workflow applications. These benefits include:
• Illusion of infinite resources: Unlike grids, clouds give the illusion that the available computing resources are unlimited, so users can request, and are likely to obtain, sufficient resources at any given time. Existing commercial clouds have a different workload than grids, however, and the illusion may break down for very large workflows, or if clouds become popular for scientific computing.
• Leases: In grids and clusters the user specifies the amount of time required for a computation and delegates responsibility for allocating resources to a batch scheduler. In clouds, the user directly allocates resources and schedules computations on them as needed. This model is well suited to workflows and other loosely-coupled applications because it avoids the scheduling overheads that can significantly reduce their performance.
• Elasticity: Clouds allow users to acquire and release resources on demand. This enables workflow systems to easily grow and shrink the available resource pool as the needs of the workflow change over time.

Previous work on the use of cloud computing for workflows has studied the cost and performance of clouds via simulation [8] and using an experimental cloud [18]. In this paper we extend that work using several workflows that represent different domains and different resource requirements. We use an existing commercial cloud, Amazon’s EC2 [2], in order to assess the potential of currently deployed clouds. We analyze the cost of running the experiments on EC2, and compare the EC2 performance to that of a typical HPC system, NCSA’s Abe cluster [25].
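The lease and elasticity points above can be made concrete with a toy model of an elastic resource pool under usage-based pricing. This is an illustrative sketch only; the class, its methods, and the hourly rate are hypothetical and do not correspond to any EC2 API or actual EC2 pricing.

```python
class ElasticPool:
    """Toy model of an elastic, pay-per-use resource pool (hypothetical names)."""

    def __init__(self, cost_per_node_hour=0.10):
        self.nodes = 0                          # nodes currently leased
        self.cost_per_node_hour = cost_per_node_hour
        self.node_hours = 0.0                   # accumulated usage

    def resize(self, target):
        """Grow or shrink the pool on demand, as a cloud lease allows."""
        self.nodes = max(0, target)

    def charge(self, hours):
        """Accrue usage for the current pool size; return total cost so far."""
        self.node_hours += self.nodes * hours
        return self.node_hours * self.cost_per_node_hour
```

For example, a workflow might resize the pool to ten nodes for a one-hour fan-out stage and shrink to a single node for a two-hour merge stage, paying for 12 node-hours in total rather than holding ten nodes for the full three hours, as a fixed batch allocation would require.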
The contributions of this paper are:
• an experimental study of the performance of three workflows with different I/O, memory, and CPU requirements on a commercial cloud,
• a comparison of the performance of cloud resources and typical HPC resources, and
• an analysis of the various costs associated with running workflows on a commercial cloud.

In this paper we focus on single, multi-core node performance, which provides adequate capabilities for the applications we evaluate.
In order to evaluate the usefulness of cloud computing for scientific workflows we ran three different workflow applications: an astronomy application (Montage), a seismology application (Broadband), and a bioinformatics application (Epigenomics). These three applications were chosen because they cover a wide range of application domains and a wide range of resource requirements. Table 1 shows the relative resource usage of these applications in three different categories: I/O, memory, and CPU.