Efficient Algorithms and Data Structures for Massive Data Sets

February 23, 2026

Reading time: 7 minutes

#Computer Science #Data Structures #Data

📝 Original Info

Title: Efficient Algorithms and Data Structures for Massive Data Sets
ArXiv ID: 1005.3473
Date: 2010-05-20
Authors: Researchers from original ArXiv paper

📝 Abstract

For many algorithmic problems, traditional algorithms that optimise on the number of instructions executed prove expensive on I/Os. Novel and very different design techniques, when applied to these problems, can produce algorithms that are I/O efficient. This thesis adds to the growing chorus of such results. The computational models we use are the external memory model and the W-Stream model. On the external memory model, we obtain the following results. (1) An I/O efficient algorithm for computing minimum spanning trees of graphs that improves on the performance of the best known algorithm. (2) The first external memory version of soft heap, an approximate meldable priority queue. (3) Hard heap, the first meldable external memory priority queue that matches the amortised I/O performance of the known external memory priority queues, while allowing a meld operation at the same amortised cost. (4) I/O efficient exact, approximate and randomised algorithms for the minimum cut problem, which has not been explored before on the external memory model. (5) Some lower and upper bounds on I/Os for interval graphs. On the W-Stream model, we obtain the following results. (1) Algorithms for various tree problems and list ranking that match the performance of the best known algorithms and are easier to implement than them. (2) Pass efficient algorithms for sorting and the maximal independent set problem that improve on the best known algorithms. (3) Pass efficient algorithms for the graph problems of finding vertex-colouring, approximate single source shortest paths, maximal matching, and approximate weighted vertex cover. (4) Lower bounds on passes for list ranking and maximal matching. We propose two variants of the W-Stream model, and design algorithms for the maximal independent set, vertex-colouring, and planar graph single source shortest paths problems on those models.

💡 Deep Analysis

Deep Dive into Efficient Algorithms and Data Structures for Massive Data Sets.
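
The abstract's opening claim, that algorithms tuned for instruction count can be expensive on I/Os, can be made concrete with a small cost calculation. The sketch below is not taken from the thesis. It assumes the standard external memory model with N items, a main memory of M items, and disk blocks of B items, where one I/O moves one block. The function names and the parameter values in the usage note are hypothetical, and the sort estimate counts only read passes of a textbook multiway merge sort.

```python
def scan_io(n: int, b: int) -> int:
    """I/Os to read n items sequentially, b items per block: ceil(n / b)."""
    return -(-n // b)


def sort_io(n: int, m: int, b: int) -> int:
    """Rough I/O estimate for external merge sort (assumed textbook scheme).

    Counts ceil(n / b) block reads per pass, with the number of passes
    chosen as the smallest p such that (m // b) ** p >= ceil(n / b).
    """
    blocks = scan_io(n, b)
    if n <= m:
        return blocks  # everything fits in memory: a single read pass
    fan_in = m // b  # blocks that fit in memory at once
    if fan_in < 2:
        raise ValueError("memory must hold at least two blocks")
    passes, reach = 0, 1
    while reach < blocks:  # integer loop avoids floating-point log errors
        reach *= fan_in
        passes += 1
    return blocks * passes


def random_access_io(n: int) -> int:
    """Worst case for an algorithm that touches n items in arbitrary order:
    every access may land in a different block, so up to n I/Os."""
    return n
```

With hypothetical values N = 10^9, M = 10^6 and B = 10^3, a scan costs 10^6 I/Os and the sort estimate is 2 × 10^6 I/Os, while n unblocked random accesses can cost up to 10^9 I/Os. The gap of roughly a factor of B is what the I/O efficient designs listed in the abstract aim to close.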

📄 Full Content

On the external memory model, we obtain the following results. (1) An I/O efficient algorithm for computing minimum spanning trees of graphs that improves on the performance of the best known algorithm. (2) The first external memory version of soft heap, an approximate meldable priority queue. (3) Hard heap, the first meldable external memory priority queue that matches the amortised I/O performance of the known external memory priority queues, while allowing a meld operation at the same amortised cost. (4) I/O efficient exact, approximate and randomised algorithms for the minimum cut problem, which has not been explored before on the external memory model. (5) Some lower and upper bounds on I/Os for interval graphs.

On the W-Stream model, we obtain the following results. (1) Algorithms for various tree problems and list ranking that match the performance of the best known algorithms and are easier to implement than them. (2) Pass efficient algorithms for sorting and the maximal independent set problem that improve on the best known algorithms. (3) Pass efficient algorithms for the graph problems of finding vertex-colouring, approximate single source shortest paths, maximal matching, and approximate weighted vertex cover. (4) Lower bounds on passes for list ranking and maximal matching.

We propose two variants of the W-Stream model, and design algorithms for the maximal independent set, vertex-colouring, and planar graph single source shortest paths problems on those models.

First and foremost, I would like to thank my guide for all his support and encouragement during the course of my PhD and Masters. I am also grateful to him for always patiently listening to my ideas and doubts even when they were trivial or “silly”. Indeed, I have been inspired by his hard-working attitude, intellectual orientation and thoroughly professional outlook towards research. The research training that I have acquired while working with him will drive my efforts in the future.

I would also like to thank the members of my doctoral committee, in particular Profs. S. V. Rao, Pinaki Mitra and J. S. Sahambi, for their feedback and encouragement during the course of my PhD work. I am also grateful to the entire Computer Science faculty for their tremendous support and affection during my stay at IIT Guwahati. I would also like to thank the anonymous referees who have commented, at various fora, on the I/O efficient minimum spanning trees algorithm presented in this thesis. Their comments have been especially valuable in improving the rigour and presentation of this piece of work.

I also gratefully acknowledge MHRD, Govt of India and Philips Research, India for supporting my research at IIT Guwahati.

I take this opportunity to express my heartfelt thanks to my friends and colleagues who made my stay at IIT Guwahati an enjoyable experience. Knowing and interacting with friends such as Godfrey, Lipika, Minaxi, Mili, and Thoi has been a great experience. These friends have always been by my side whenever I have needed them.

It goes without saying that this journey would not have been possible without the tremendous support and encouragement I have got from my sister Chutti and my bhaiya Lucky, and my parents. Indeed no words can express my gratefulness towards my parents who have unconditionally and whole-heartedly supported me in all my endeavours. I am also grateful to my in-laws for their understanding, patience and tremendous support. I am also thankful to my sister-in-law Tinku for being a great friend and her understanding during difficult times.

Last but not least, I thank my husband Mani for his constant encouragement, love, support and infinite patience. Without him, this journey could not have been completed so smoothly.

Over the years, computers have been used to solve larger and larger problems. Today we have several applications of computing that often deal with massive data sets that are terabytes or even petabytes in size. Examples of such applications can be found in databases [49,76], geographic information systems [9,42], VLSI verification, computerised medical treatment, astrophysics, geophysics, constraint logic programming, computational biology, computer graphics, virtual reality, 3D simulation and modeling [6], analysis of telephone calls in a voice network [16,26,32,47], and transactions in a credit card network [17], to name a few.

In traditional algorithm design, it is assumed that the main memory is infinite in size and allows random uniform access to all its locations. This enables the designer to

…(Full text truncated)…

This content is AI-processed based on ArXiv data.
