Concept-oriented model: Modeling and processing data using functions


📝 Original Info

  • Title: Concept-oriented model: Modeling and processing data using functions
  • ArXiv ID: 1911.07225
  • Date: 2019-11-19
  • Authors: Alexandr Savinov

📝 Abstract

We describe a new logical data model, called the concept-oriented model (COM). It uses mathematical functions as first-class constructs for data representation and data processing as opposed to using exclusively sets in conventional set-oriented models. Functions and function composition are used as primary semantic units for describing data connectivity instead of relations and relation composition (join), respectively. Grouping and aggregation are also performed by using (accumulate) functions providing an alternative to group-by and reduce operations. This model was implemented in an open source data processing toolkit examples of which are used to illustrate the model and its operations. The main benefit of this model is that typical data processing tasks become simpler and more natural when using functions in comparison to adopting sets and set operations.

📄 Full Content

Concept-oriented model: Modeling and processing data using functions
Alexandr Savinov
http://conceptoriented.org
17.11.2019

ABSTRACT
We describe a new logical data model, called the concept-oriented model (COM). It uses mathematical functions as first-class constructs for data representation and data processing as opposed to using exclusively sets in conventional set-oriented models. Functions and function composition are used as primary semantic units for describing data connectivity instead of relations and relation composition (join), respectively. Grouping and aggregation are also performed by using (accumulate) functions providing an alternative to group-by and reduce operations. This model was implemented in an open source data processing toolkit examples of which are used to illustrate the model and its operations. The main benefit of this model is that typical data processing tasks become simpler and more natural when using functions in comparison to adopting sets and set operations.
KEYWORDS
Logical data models; Functional data models; Data processing
1 Introduction
1.1 Who Is to Blame?
Most of the currently existing data models, query languages and data processing frameworks including SQL and MapReduce use mathematical sets for data representation and set operations for data transformations. They describe a data processing task as a graph of operations with sets. Deriving new data means producing new sets from existing sets where sets can be implemented as relational tables, collections, key-value maps, data frames or similar structures.
However, many conventional data processing patterns describe a data processing task as deriving new properties rather than new sets, where properties can be implemented as columns, attributes, fields or similar constructs. If properties are represented via mathematical functions, then functions become the main units of data representation and transformation. Below we describe several typical tasks and show that solving them by means of set operations is a problem-solution mismatch, which makes data modeling and data processing less natural, more complex and more error-prone.

Figure 1: Example data model
Calculated attributes. Assume that there is a table with order Items characterized by Quantity and Price attributes (Fig. 1, left). The task is to compute a new attribute Amount as their arithmetic product. A solution in SQL is almost obvious:
    SELECT *, Quantity * Price AS Amount
    FROM Items                                (1)
Although this standard solution seems very natural and almost trivial, it does have one subtle flaw: the task was to compute a new attribute, while this query produces a new table. The question then is why not do exactly what has been requested and produce a new attribute? Why is it necessary to produce a new table (with a new attribute) if we actually want to attach a new attribute to the existing table? The short answer is that an operation for adding new (derived) attributes simply does not exist. We have no choice and must adopt what is available: a set operation.
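The contrast described above can be sketched in Python. This is only an illustration under stated assumptions, not the API of the toolkit mentioned in the abstract: a table is represented as a hypothetical dict of column lists, the data values are invented, and `select_with_amount` and `add_calculated_column` are made-up helper names. The first helper mimics query (1) by returning a new table; the second attaches a derived column to the existing table.

```python
# Hypothetical column store: a table is a dict mapping column names
# to equally long lists. The values are invented for illustration.
items = {
    "ProductId": [1, 2, 1],
    "Quantity": [2, 1, 5],
    "Price": [10.0, 25.0, 10.0],
}

def select_with_amount(table):
    """Set-oriented style (like query (1)): build and return a NEW table."""
    new_table = {name: list(values) for name, values in table.items()}
    new_table["Amount"] = [
        q * p for q, p in zip(table["Quantity"], table["Price"])
    ]
    return new_table

def add_calculated_column(table, name, func, inputs):
    """Function-oriented style: define a new column on the EXISTING table.

    func is applied row by row to the values of the input columns.
    """
    table[name] = [func(*args) for args in zip(*(table[c] for c in inputs))]

# A second table is created just to hold one more attribute.
result = select_with_amount(items)

# The same attribute is attached to the original table instead.
add_calculated_column(items, "Amount", lambda q, p: q * p, ["Quantity", "Price"])
```

Both paths compute the same values; the difference is only in what is produced, a copy of the whole table in the first case and a single new column in the second.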
Link attributes. Another generic data processing pattern consists in computing links (or references) between tables: given a record in one table, how can we access attributes of related records in another table? For example, assume that Price is an attribute of a second Products table (Fig. 1, right), and it does not exist as an attribute of the Items table. We have two tables, Items and Products, with attributes ProductId and Id, respectively, which relate their records. If now we want to compute the Amount for each item then the price needs to be retrieved from the second Products table. This task can be easily solved by copying the necessary attributes into a new table using the relational (left) join:
    SELECT i.*, p.Price
    FROM Items i
    JOIN Products p
    ON i.ProductId = p.Id                     (2)

[Figure 1 labels: tables Items and Products; existing columns ProductId, Quantity, Price, Id, Price; derived columns Amount, Product, TotalQ, TotalA; legend: table, existing columns, derived columns; operations: calculate, link, aggregate]
This new result table has the necessary attributes Quantity and Price copied from two source tables and hence it can be used for computing the amount. Yet, let us again compare this solution with the problem formulation. Do we really need a new table? No. Our goal was to have a possibility to access attributes of the second Products table (while computing a new attribute in the first Items table). Hence, it again can be viewed as a workaround and forced solution where a new (unnecessary) table is produced just because it is the only way to access related data in this set-oriented model.
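What it could look like to access related data without materializing a joined table can be sketched as follows. The representation is an assumption made for illustration and is not taken from the paper's toolkit: tables are dicts of column lists, the data values are invented, the link column `Product` stores a row index into Products, and `price_of` is a hypothetical helper that composes the link with the `Price` column.

```python
# Hypothetical data: tables are dicts of column lists (invented values).
items = {"ProductId": [1, 2, 1], "Quantity": [2, 1, 5]}
products = {"Id": [1, 2], "Price": [10.0, 25.0]}

# Link column: a function Items -> Products, stored as a row index
# into Products (None when no product matches).
index_by_id = {pid: row for row, pid in enumerate(products["Id"])}
items["Product"] = [index_by_id.get(pid) for pid in items["ProductId"]]

def price_of(item_row):
    """Function composition: Items -> Products -> Price."""
    product_row = items["Product"][item_row]
    return None if product_row is None else products["Price"][product_row]

# Calculated column that reads Price through the link;
# no joined table is built along the way.
items["Amount"] = [
    None if price_of(r) is None else items["Quantity"][r] * price_of(r)
    for r in range(len(items["Quantity"]))
]
```

In this sketch the link is computed once and can be reused by any later column definition, whereas the join in query (2) copies the needed attributes into a new table for each such use.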
Aggregated attributes. The next typical data processing task is data aggregation. Assume that for each product in Products, we want to compute the total number of items ordered (Fig. 1). The group-by operation provides a standard solution:

…(Full text truncated)…
