Enabling Lock-Free Concurrent Fine-Grain Access to Massive Distributed Data: Application to Supernovae Detection

Reading time: 6 minutes
...

📝 Original Info

  • Title: Enabling Lock-Free Concurrent Fine-Grain Access to Massive Distributed Data: Application to Supernovae Detection
  • ArXiv ID: 0810.2226
  • Date: 2008-10-14
  • Authors: Bogdan Nicolae, Gabriel Antoniu, Luc Bougé

📝 Abstract

We consider the problem of efficiently managing massive data in a large-scale distributed environment. We consider data strings of size in the order of Terabytes, shared and accessed by concurrent clients. On each individual access, a segment of a string, of the order of Megabytes, is read or modified. Our goal is to provide the clients with efficient fine-grain access to the data string, as concurrently as possible, without locking the string itself. This issue is crucial in the context of applications in the fields of astronomy, databases, data mining and multimedia. We illustrate these requirements with the case of an application for searching supernovae. Our solution relies on distributed, RAM-based data storage, while leveraging a DHT-based, parallel metadata management scheme. The proposed architecture and algorithms have been validated through a software prototype and evaluated in a cluster environment.


📄 Full Content

arXiv:0810.2226v1 [cs.DC] 13 Oct 2008

Enabling Lock-Free Concurrent Fine-Grain Access to Massive Distributed Data: Application to Supernovae Detection

Bogdan Nicolae (University of Rennes 1/IRISA, Campus de Beaulieu, 35042 Rennes cedex, France, Bogdan.Nicolae@inria.fr); Gabriel Antoniu (INRIA/IRISA, Campus de Beaulieu, 35042 Rennes cedex, France, contact: Gabriel.Antoniu@inria.fr); Luc Bougé (ENS Cachan, Brittany Extension/IRISA, Campus Ker Lann, 35170 Bruz, France, Luc.Bouge@bretagne.ens-cachan.fr)

Abstract—We consider the problem of efficiently managing massive data in a large-scale distributed environment. We consider data strings of size in the order of Terabytes, shared and accessed by concurrent clients. On each individual access, a segment of a string, of the order of Megabytes, is read or modified. Our goal is to provide the clients with efficient fine-grain access to the data string, as concurrently as possible, without locking the string itself. This issue is crucial in the context of applications in the fields of astronomy, databases, data mining and multimedia. We illustrate these requirements with the case of an application for searching supernovae. Our solution relies on distributed, RAM-based data storage, while leveraging a DHT-based, parallel metadata management scheme. The proposed architecture and algorithms have been validated through a software prototype and evaluated in a cluster environment.

I. INTRODUCTION

Large scale data management is becoming increasingly important for a wide range of applications, both scientific and industrial: modeling, astronomy, biology, governmental and industrial statistics, etc. All these applications generate huge amounts of data that need to be stored, processed and eventually archived globally. In order to better illustrate these needs, this paper focuses on a real-life astronomy problem: finding supernovae (stellar explosions).
In a typical scenario, a telescope is used to take pictures of the same part of space at regular intervals, usually every month. The corresponding digital images are then compared in an attempt to find variable objects, which might be candidates for supernovae. To confirm that such objects are supernovae, considerable computational effort is necessary in order to distinguish the supernovae themselves from the other variable objects that may be present in the image: this requires analyzing the light curve and spectrum of each potential candidate. To speed up the process of finding supernovae, multiple parts of space should be analyzed concurrently: as there is no dependency between different regions of space, the analysis itself is an embarrassingly parallel problem. The difficulty lies in the massive amount of data that needs to be managed and made available to the machines providing the computational power.

Huge data size. Hundreds of GB of images from various parts of the sky may correspond to a single point in time. Since the analysis requires multiple consecutive images of the same part of the sky, the order of TB is quickly reached.

Global view. Managing independent images manually is cumbersome. Applications that search for supernovae (among others) are much easier to design if a global view of the sky is available: finding the right image at a given time simply translates into accessing the right part of the sky view for that time. Let us consider a very simple abstraction of this problem, in which the view of the sky is a very long string of bytes (blob), obtained by concatenating the images in binary form. Assuming all images have a fixed size, a specific part of the sky is accessible by providing the corresponding offset in the string. A simple transformation from two-dimensional to unidimensional coordinates is sufficient.

Efficient fine-grain access. While many images make up the global view of the sky, each of them needs to be accessed individually.
As each image is much smaller than the size of the string representing the sky, fine-grain access to substrings is crucial.

Versioning. As new images are taken by the telescope, the view of the sky needs to be updated, while the previous views of the sky still need to be accessible. It is desirable to refer to views of the sky at particular moments in time, therefore versioning is necessary.

Read-read concurrency. Comparison of images for different parts of the sky is a massively parallel problem. That is, concurrent reads of different images in a view, or concurrent reads of the same image in different views, should be efficiently processed in parallel.

Read-write concurrency. The telescope may gather and store new pictures (i.e. new versions of some part of the sky) while the analysis proceeds on the previous versions. Consequently, in our model, it is important to allow new versions of our global string to be generated and written while the earlier versions are read and analyzed: read-write concurrency is highly desirable for efficiency.

Write-write concurrency. As multiple telescopes may
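One common way to satisfy the versioning and read-write concurrency requirements above is snapshot-based versioning: each write publishes a new immutable version of the segment map, so readers of older versions never block writers. The sketch below is an assumed single-machine illustration of that idea, not the paper's distributed implementation (which uses RAM-based storage and DHT-based metadata); all names are hypothetical:

```python
# Minimal snapshot-versioning sketch (assumed design, not the paper's code).
# Published versions are immutable, so the read path needs no lock at all;
# only version publication is serialized.
import threading

SEGMENT = 4  # toy segment size in bytes

class VersionedBlob:
    def __init__(self, data: bytes):
        # versions[v] maps segment index -> bytes; version 0 is the initial view
        segs = {i // SEGMENT: data[i:i + SEGMENT]
                for i in range(0, len(data), SEGMENT)}
        self.versions = [segs]
        self.lock = threading.Lock()  # guards version publication only

    def read(self, version: int, seg: int) -> bytes:
        # Lock-free read: a published segment map is never mutated
        return self.versions[version][seg]

    def write(self, seg: int, data: bytes) -> int:
        # Copy-on-write: build a fresh map, then publish it as a new version
        with self.lock:
            new = dict(self.versions[-1])
            new[seg] = data
            self.versions.append(new)
            return len(self.versions) - 1
```

A reader holding an old version number sees a stable view of the whole string even while new versions are being written, which is exactly the read-write concurrency the application scenario calls for.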

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
