The Role of Provenance Management in Accelerating the Rate of Astronomical Research


📝 Original Info

  • Title: The Role of Provenance Management in Accelerating the Rate of Astronomical Research
  • ArXiv ID: 1005.3358
  • Date: 2010-05-20
  • Authors: Researchers from original ArXiv paper

📝 Abstract

The availability of vast quantities of data through electronic archives has transformed astronomical research. It has also enabled the creation of new products, models and simulations, often from distributed input data and models, that are themselves made electronically available. These products will only provide maximal long-term value to astronomers when accompanied by records of their provenance; that is, records of the data and processes used in the creation of such products. We use the creation of image mosaics with the Montage grid-enabled mosaic engine to emphasize the necessity of provenance management and to understand the science requirements that higher-level products impose on provenance management technologies. We describe experiments with one technology, the "Provenance Aware Service Oriented Architecture" (PASOA), that stores provenance information at each step in the computation of a mosaic. The results inform the technical specifications of provenance management systems, including the need for extensible systems built on common standards. Finally, we describe examples of provenance management technology emerging from the fields of geophysics and oceanography that have applicability to astronomy applications.


📄 Full Content

Astronomers need to understand the technical content of data sets and evaluate published claims based on them. Ideally, all data products and records from every step used to create science data sets would be archived, but the volume of data would be prohibitively high. The high-cadence surveys currently under development will exacerbate this problem; the Large Synoptic Survey Telescope alone is expected to deliver 60 PB of raw data alone in its operational lifetime. There is therefore a need to create records of how data were derived (their provenance) that contain sufficient information to enable replication of the data. A report issued by the National Academy of Sciences on the integrity of digital data includes the curation of data set provenance among its key recommendations [1].

Provenance records must meet strict specifications if they are to have value in supporting research. They must capture the algorithms, software versions, parameters, input data sets, hardware components and computing environments. The records should be standardized and captured in a permanent store that can be queried by end users. In this paper, we describe how the Montage image mosaic engine acts as a driver for the application in astronomy of provenance management methodologies now in development. Provenance management is an active field in many areas of science, and we describe work in earth sciences and oceanography that has applicability to astronomy. [2] describes provenance management in more detail.
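As a rough illustration of the fields listed above, the following minimal sketch models a single provenance record as a Python dataclass and serializes it to JSON so that it could be written to a queryable store. The class name, field names, and example values are all hypothetical; they are not taken from PASOA, Montage, or any standard.

```python
import json
import platform
from dataclasses import dataclass, field, asdict


@dataclass
class ProvenanceRecord:
    """One processing step. All field names here are illustrative assumptions."""
    algorithm: str            # name of the algorithm applied
    software_version: str     # version of the software that ran it
    parameters: dict          # parameter values passed to the step
    input_datasets: list      # identifiers of the input data sets
    # Hardware and environment default to what the running machine reports.
    hardware: str = field(default_factory=platform.machine)
    environment: str = field(default_factory=platform.platform)

    def to_json(self) -> str:
        """Serialize with sorted keys so records are easy to compare and store."""
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values only; a real record would be filled in by the pipeline.
record = ProvenanceRecord(
    algorithm="reprojection",
    software_version="placeholder-version",
    parameters={"projection": "TAN"},
    input_datasets=["image_001.fits"],
)
serialized = record.to_json()
```

A standardized serialization such as this is one simple way to make records from different steps comparable; a production system would more likely adopt a community schema than an ad hoc one.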

Montage (http://montage.ipac.caltech.edu) is a toolkit for aggregating astronomical images in Flexible Image Transport System (FITS) format into mosaics. Its scientific value derives from three features of its design:

• It uses algorithms that preserve the calibration and positional (astrometric) fidelity of the input images to deliver mosaics that meet user-specified parameters of projection, coordinates, and spatial scale. It supports all projections and coordinate systems in use in astronomy.

• It contains independent modules for analyzing the geometry of images on the sky, and for creating and managing mosaics.

• It is written in American National Standards Institute (ANSI)-compliant C, and is portable and scalable: the same engine runs on desktops, clusters, supercomputers, or clouds running common Unix-based operating systems.

There are four steps in the production of an image mosaic:

  1. Discover the geometry of the input images on the sky from the input FITS keywords and use it to calculate the geometry of the output mosaic on the sky.

  2. Re-project the input images to the spatial scale, coordinate system, World Coordinate System (WCS) projection, and image rotation.

  3. Model the background radiation in the input images to achieve common flux scales and background level across the mosaic.

  4. Co-add the re-projected, background-corrected images into a mosaic.

Each production step has been coded as an independent engine run from an executive script. Figure 1 illustrates the second through fourth steps for the simple case of generating a mosaic from three input images. In practice, as many input images as necessary can be processed in parallel, limited only by the available hardware.
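The four-step flow can be sketched as a toy executive script in which each step consumes the files produced by the previous one and logs what it read and wrote. This is only a schematic of the data flow under assumed names: the step names, file names, and the `run_step` helper are hypothetical, and no actual Montage module is invoked.

```python
def run_step(log, name, inputs, outputs):
    """Record one step's inputs and outputs, then hand the outputs onward."""
    log.append({"step": name, "inputs": list(inputs), "outputs": list(outputs)})
    return outputs


def make_mosaic(images):
    """Chain the four production steps; file names are placeholders."""
    log = []
    # Step 1: derive the output mosaic geometry from the input image headers.
    header = run_step(log, "discover_geometry", images, ["mosaic.hdr"])
    # Step 2: re-project every input image onto the output geometry.
    projected = run_step(log, "reproject", images + header,
                         [f"proj_{name}" for name in images])
    # Step 3: bring the re-projected images to a common background level.
    corrected = run_step(log, "rectify_background", projected,
                         [f"corr_{name}" for name in projected])
    # Step 4: co-add the corrected images into the final mosaic.
    mosaic = run_step(log, "coadd", corrected, ["mosaic.fits"])
    return mosaic[0], log


mosaic, log = make_mosaic(["a.fits", "b.fits", "c.fits"])
```

Even in this toy version, three input images yield six intermediate image files plus a header before the single mosaic is written, which hints at why intermediate products dominate the data volume.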

In the production steps shown in Figure 1, the files output by one step become the input to the subsequent step. That is, the reprojected images are used as input to the background rectification. This rectification itself consists of several steps that fit a model to the differences between flux levels of each image, and in turn the rectified, reprojected images are input to the co-addition engine. Thus the production of an image mosaic actually generates a volume of data that is substantially greater than the volume of the mosaic. Table 1 illustrates this result for two use cases that return 3-color mosaics from the Two Micron All Sky Survey (2MASS) images (see http://www.ipac.caltech.edu/2mass/releases/allsky/doc/explsup.html). One is a 6 deg sq mosaic of ρ Oph and the second is an All Sky mosaic. The table makes clear that the volume of intermediate products exceeds the mosaic size by factors of 30 to 50. The Infrared Processing and Analysis Center (IPAC) hosts an on-request image mosaic service (see Section 3) that delivers mosaics of user-specified regions of the sky, and it currently receives 25,000 queries per year. Were mosaics of the size of the ρ Oph mosaic processed with such frequency, the service would produce 3.8 PB of data each year. Such volumes are clearly too high to archive.
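The two figures quoted above imply an average data volume per request, which the short calculation below makes explicit. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and that the 3.8 PB total is spread evenly over the 25,000 requests; the per-request value is derived here and is not quoted from Table 1.

```python
PB = 10**15  # bytes per petabyte (decimal convention, an assumption)
GB = 10**9   # bytes per gigabyte (decimal convention, an assumption)

requests_per_year = 25_000        # query rate stated in the text
annual_volume_bytes = 3.8 * PB    # annual volume stated in the text

# Implied total data volume generated by one mosaic request.
per_request_gb = annual_volume_bytes / requests_per_year / GB
print(f"Implied volume per request: {per_request_gb:.0f} GB")
```

Under these assumptions each request generates roughly 152 GB of data, which illustrates why archiving every product is impractical and why a compact provenance record is attractive instead.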

Montage makes three assumptions and approximations that affect the quality of the mosaics:

• Reprojection involves redistributing the flux from the input pixel pattern to the output pixel pattern. Montage uses a fast, custom algorithm that approximates tangent plane projections as polynomial approximations to the pixel pattern on the sky, which can produce small distortions in the pixel pattern of the mosaic.

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
