Scientific Workflow Systems for 21st Century, New Bottle or New Wine?
Invited Short Paper
Yong Zhao¹, Ioan Raicu², Ian Foster²,³,⁴
¹Microsoft Corporation, Redmond, WA, USA
²Department of Computer Science, University of Chicago, Chicago, IL, USA
³Computation Institute, University of Chicago, Chicago, IL, USA
⁴Math & Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
yozha@microsoft.com, iraicu@cs.uchicago.edu, foster@mcs.anl.gov
Abstract
With advances in e-Science and the growing complexity of scientific analyses, more and more scientists and researchers rely on workflow systems for process coordination, derivation automation, provenance tracking, and bookkeeping. While workflow systems have been in use for decades, it is unclear whether scientific workflows can, or even should, build on existing workflow technologies, or whether they require fundamentally new approaches. In this paper, we analyze the status and challenges of scientific workflows, investigate both existing technologies and emerging languages, platforms, and systems, and identify the key challenges that workflow systems for e-Science must address in the 21st century.
- Introduction
Scientific workflows have become increasingly popular in modern scientific computation, as more and more scientists and researchers rely on workflow systems to conduct their daily scientific analysis and discovery. With technological advances in both scientific instrumentation and simulation, the volume of scientific data is growing exponentially each year. Such large data volumes, combined with the growing complexity of data analysis procedures and algorithms, have made traditional manual processing and exploration far less practical than modern in silico processes automated by scientific workflow systems (SWFS). While the term "workflow" means different things in different contexts, we find that SWFS are generally applied to the following aspects of scientific computation: 1) describing complex scientific procedures, 2) automating data derivation processes, 3) high-performance computing (HPC) to improve throughput and performance, and 4) provenance management and query.
Workflows are not a new concept; they have been around for decades. A number of coordination languages and systems developed in the 1980s and 1990s [1,7] share many characteristics with workflow systems: they describe individual computation components, their ports and channels, and the data and event flows between them. They also coordinate the execution of these components, often on parallel computing resources. Furthermore, business process management systems have been developed and invested in for years, and there are many mature commercial products and industry standards such as BPEL [2]. In the scientific community, too, many systems for scientific programming and computation are emerging [5,22]. Before developing yet another workflow system, a fundamental question to ask is whether we can use existing technologies, or whether we should invent new languages and systems to achieve the four aspects mentioned earlier that are essential to scientific workflow systems. In this paper, we identify the challenges to workflow development in the context of scientific computation, present an overview of some existing technologies and emerging systems, and discuss opportunities for addressing these challenges.
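The coordination model described here, in which components expose ports connected by channels, can be sketched as a small dataflow interpreter. This is an illustrative Python sketch under our own assumptions; the `Component` type and the `run_dataflow` function are hypothetical and are not taken from any of the cited systems. A component fires once every one of its input ports holds a value, and its outputs then travel along channels to downstream input ports. For clarity the sketch runs components one at a time, whereas coordination and workflow systems typically dispatch ready components to parallel resources.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Port = Tuple[str, str]  # (component name, port name)


@dataclass
class Component:
    """A computation unit with named input and output ports (hypothetical model)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    fn: Callable[..., Tuple]  # takes input values in port order, returns output values


def run_dataflow(components: List[Component],
                 channels: Dict[Port, List[Port]],
                 initial: Dict[Port, object]) -> Dict[Port, object]:
    """Fire each component once all of its input ports hold a value.

    `channels` connects a producer's output port to one or more consumer input
    ports; `initial` seeds input ports that are fed from outside the workflow.
    """
    ports: Dict[Port, object] = dict(initial)
    pending = deque(components)
    stalled = 0  # consecutive checks that made no progress
    while pending:
        comp = pending.popleft()
        if all((comp.name, p) in ports for p in comp.inputs):
            results = comp.fn(*(ports[(comp.name, p)] for p in comp.inputs))
            for port, value in zip(comp.outputs, results):
                ports[(comp.name, port)] = value
                # Data flows along each channel to the connected consumer ports.
                for consumer in channels.get((comp.name, port), []):
                    ports[consumer] = value
            stalled = 0
        else:
            pending.append(comp)
            stalled += 1
            if stalled >= len(pending):  # every waiting component checked, none ready
                raise RuntimeError("unsatisfiable inputs for: "
                                   + ", ".join(c.name for c in pending))
    return ports


# Example: split a string, upper-case the left half, then join the halves.
split = Component("split", ["text"], ["left", "right"],
                  lambda s: (s[: len(s) // 2], s[len(s) // 2:]))
upper = Component("upper", ["x"], ["y"], lambda x: (x.upper(),))
join = Component("join", ["a", "b"], ["out"], lambda a, b: (a + b,))
channels = {
    ("split", "left"): [("upper", "x")],
    ("split", "right"): [("join", "b")],
    ("upper", "y"): [("join", "a")],
}
# Listed out of order on purpose: firing depends on data, not list position.
ports = run_dataflow([join, upper, split], channels, {("split", "text"): "abcdef"})
print(ports[("join", "out")])  # ABCdef
```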
- Multi-core processor architectures
For years, software development enjoyed a free ride on performance gains as chipmakers followed Moore's Law, doubling the number of transistors packed into the same small area. Little consideration was given to code parallelization, because it was not essential for the average computer user until recently, when single-core CPU performance growth stagnated and multi-core processors reached the market in 2005.
Because of the limits on effectively increasing processor clock frequency, hardware manufacturers began to physically reorganize chips into what we call the multi-core architecture [10], which links several microprocessor cores together on the same chip. Manufacturers including Intel, AMD, IBM, and Sun have released dual-core, quad-core, eight-core, and 64-threaded processors in the past few years [13,21]. Given that 128-threaded SMP systems are a reality today [21], it is reasonable to assume that SMP systems with 1024 or more CPU cores/threads will be available in the next decade.
The new multi-core architecture will force radical changes in software design and development. We are already seeing a significant increase in research interest in concurrency, parallelism, and multi-core software development. The number of multiprocessor research papers has risen sharply since 2001, surpassing the peaks of all previous years [10].
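For workflow systems, one practical consequence is that independent tasks in a workflow can occupy separate cores. The following is a minimal Python sketch, assuming a workflow given as a hypothetical map from each task to the set of tasks it depends on (every task must appear as a key). `waves` groups tasks into sets with no mutual dependencies, and `run_parallel` runs each set concurrently on a standard `concurrent.futures` executor. The `simulate` workload is a placeholder.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Dict, List, Set


def waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; tasks within a wave do not depend on each other."""
    remaining = {task: set(parents) for task, parents in deps.items()}
    done: Set[str] = set()
    result: List[List[str]] = []
    while remaining:
        ready = sorted(t for t, parents in remaining.items() if parents <= done)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        result.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return result


def run_parallel(deps: Dict[str, Set[str]], work: Callable[[str], object],
                 executor: Executor) -> Dict[str, object]:
    """Run each wave's independent tasks concurrently on the given executor."""
    results: Dict[str, object] = {}
    for wave in waves(deps):
        # Executor.map yields results in input order, so they line up with `wave`.
        for task, value in zip(wave, executor.map(work, wave)):
            results[task] = value
    return results


def simulate(task: str) -> int:
    """Placeholder for a CPU-bound workflow step (hypothetical workload)."""
    return sum(i * i for i in range(200_000)) % 97 + len(task)


if __name__ == "__main__":
    # Hypothetical workflow: two independent simulations, then analysis, then a plot.
    deps = {"sim_a": set(), "sim_b": set(), "analyze": {"sim_a", "sim_b"}, "plot": {"analyze"}}
    print(waves(deps))  # [['sim_a', 'sim_b'], ['analyze'], ['plot']]
    # One worker process per core lets independent tasks run on separate cores.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        print(run_parallel(deps, simulate, pool))
```

Grouping tasks into waves is a simplification: a task waits for the entire previous wave rather than only its own inputs, and `work` does not receive upstream results. A production scheduler would dispatch each task as soon as its own dependencies finish. The `__main__` guard is there because `ProcessPoolExecutor` may re-import the module in worker processes, for example under the `spawn` start method.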
Con
…(Full text truncated)…