Associative Array Model of SQL, NoSQL, and NewSQL Databases


📝 Abstract

The success of SQL, NoSQL, and NewSQL databases is a reflection of their ability to provide significant functionality and performance benefits for specific domains, such as financial transactions, internet search, and data analysis. The BigDAWG polystore seeks to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases. Associative arrays provide a common approach to the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). This work presents the SQL relational model in terms of associative arrays and identifies the key mathematical properties that are preserved within SQL. These properties include associativity, commutativity, distributivity, identities, annihilators, and inverses. Performance measurements on distributivity and associativity show the impact these properties can have on associative array operations. These results demonstrate that associative arrays could provide a mathematical model for polystores to optimize the exchange of data and the execution of queries.

📄 Content

Associative Array Model of SQL, NoSQL, and NewSQL Databases

Jeremy Kepner (1,2,3), Vijay Gadepally (1,2), Dylan Hutchison (4), Hayden Jananthan (3,5), Timothy Mattson (6), Siddharth Samsi (1), Albert Reuther (1)

(1) MIT Lincoln Laboratory, (2) MIT Computer Science & AI Laboratory, (3) MIT Mathematics Department, (4) University of Washington Computer Science Department, (5) Vanderbilt University Mathematics Department, (6) Intel Corporation

Abstract—The success of SQL, NoSQL, and NewSQL databases is a reflection of their ability to provide significant functionality and performance benefits for specific domains, such as financial transactions, internet search, and data analysis. The BigDAWG polystore seeks to provide a mechanism to allow applications to transparently achieve the benefits of diverse databases while insulating applications from the details of these databases.
Associative arrays provide a common approach to the mathematics found in different databases: sets (SQL), graphs (NoSQL), and matrices (NewSQL). This work presents the SQL relational model in terms of associative arrays and identifies the key mathematical properties that are preserved within SQL.
These properties include associativity, commutativity, distributivity, identities, annihilators, and inverses. Performance measurements on distributivity and associativity show the impact these properties can have on associative array operations. These results demonstrate that associative arrays could provide a mathematical model for polystores to optimize the exchange of data and the execution of queries.

Keywords: Associative Array Algebra; SQL; NoSQL; NewSQL; Set Theory; Graph Theory; Matrices; Linear Algebra

I. INTRODUCTION
Relational or SQL (Structured Query Language) databases [Codd 1970, Stonebraker 1976] such as PostgreSQL, MySQL, and Oracle have been the de facto interface to databases since the 1980s (see Figure 1) and are the bedrock of electronic transactions around the world. More recently, key-value stores (NoSQL databases) such as Google BigTable [Chang 2008], Apache Accumulo [Wall 2015], and MongoDB [Chodorow 2013] have been developed for representing large sparse tables to aid in the analysis of data for Internet search. As a result, the majority of the data on the Internet is now analyzed using key-value stores [DeCandia et al. 2007, Lakshman & Malik 2010, George 2011]. In response to similar performance challenges, the relational database community has developed a new class of databases (NewSQL) such as C-Store [Stonebraker 2005], H-Store [Kallman 2008], SciDB [Balazinska 2009], VoltDB [Stonebraker 2013], and Graphulo [Hutchison 2015] to support new analytics capabilities within a database. The SQL, NoSQL, and NewSQL concepts have also been blended in hybrid processing systems, such as Apache Pig [Olston 2008], Apache Spark [Zaharia 2010], and HaLoop [Bu 2010]. An effective mathematical model that encompasses the concepts of SQL, NoSQL, and NewSQL would enable their interoperability. Such a mathematical model is the primary goal of this paper.

Figure 1. Evolution of SQL, NoSQL, NewSQL, and polystore databases. Each class of database delivered new mathematics, functionality, and performance focused on new application areas.

SQL, NoSQL, and NewSQL databases are designed for specific applications, have distinct data models, and rely on different underlying mathematics (see Figure 2). Because of their differences, each database has unique strengths that are well suited for particular workloads. It is now recognized that special-purpose databases can be 100x faster for a particular application than a general-purpose database [Kepner 2014]. In addition, the availability of high performance data analysis platforms, such as the MIT SuperCloud [Reuther 2013, Prout 2015], allows high performance databases to share the same hardware platform without sacrificing performance.
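To make the contrast between data models concrete, the following is a minimal sketch, not the paper's implementation. It assumes an associative array can be modeled as a Python dict mapping (row key, column key) pairs to nonzero integers under ordinary + and ×; the paper's algebra is more general. The helper names (`from_rows`, `from_triples`, `add`, `matmul`) and the sample data are hypothetical. The sketch builds the same array from a relational-style table and from key-value triples, then checks distributivity of array multiplication over addition on one small example.

```python
from collections import defaultdict

# Assumption: an associative array is a dict {(row_key, col_key): value}
# that stores only nonzero values. All helper names are hypothetical.

def from_rows(rows, key_field):
    """Relational view: each row dict yields entries (row[key_field], column) -> value."""
    return {(row[key_field], col): val
            for row in rows
            for col, val in row.items()
            if col != key_field and val != 0}

def from_triples(triples):
    """Key-value view: (row, col, value) triples, as in a sparse table store."""
    return {(r, c): v for r, c, v in triples if v != 0}

def add(a, b):
    """Element-wise addition; entries that sum to zero are dropped."""
    out = defaultdict(int)
    for arr in (a, b):
        for key, val in arr.items():
            out[key] += val
    return {k: v for k, v in out.items() if v != 0}

def matmul(a, b):
    """Array multiplication: C[i, j] = sum over k of A[i, k] * B[k, j]."""
    b_by_row = defaultdict(list)
    for (k, j), val in b.items():
        b_by_row[k].append((j, val))
    out = defaultdict(int)
    for (i, k), val in a.items():
        for j, w in b_by_row.get(k, ()):
            out[(i, j)] += val * w
    return {k: v for k, v in out.items() if v != 0}

# The same (made-up) data expressed as table rows and as key-value triples.
rows = [{"id": "alice", "reads": 2, "writes": 1},
        {"id": "bob", "reads": 3, "writes": 0}]
triples = [("alice", "reads", 2), ("alice", "writes", 1), ("bob", "reads", 3)]

A_rel = from_rows(rows, "id")
A_kv = from_triples(triples)
same_array = (A_rel == A_kv)

# Distributivity check on this example: A(B + C) == AB + AC.
B = {("reads", "cost"): 1, ("writes", "cost"): 5}
C = {("reads", "cost"): 2, ("writes", "latency"): 4}
lhs = matmul(A_rel, add(B, C))
rhs = add(matmul(A_rel, B), matmul(A_rel, C))
distributes = (lhs == rhs)
```

Both sides of the distributivity check give the same result but do different amounts of work: the left side performs one multiplication, the right side two. That is the kind of choice a query optimizer could exploit when such a property is known to hold.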

Figure 2. Focus areas of SQL, NoSQL, NewSQL, and Polystore databases.

[Figure 1 text: timeline of database eras]
SQL Era: Relational Model [Codd 1970]
NoSQL Era: Google BigTable [Chang 2006]
NewSQL Era: NewSQL [Cattell 2010]
Polystore Era: BigDAWG Polystore [Elmore 2015]

              SQL                NoSQL            NewSQL           Future
Example       PostgreSQL         Accumulo         SciDB            BigDAWG
Application   Transactions       Search           Analysis         All
Data Model    Relational Tables  Key-Value Pairs  Sparse Matrices  Associative Arrays
Math          Set Theory         Graph Theory     Linear Algebra   Associative Algebra

[Figure 2 text: axes Consistency, Volume, Velocity, Variety, Analytics, Usability; series SQL, NoSQL, NewSQL, Polystore]

This material is based upon work supported by the National Science Foundation under Grant No. DMS-1312831. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the N
