Practical language based on systems of definitions
The article proposes describing a system of tables by a set of special lists that capture the semantics of the data and reflect its completeness. It shows how parallel processing of the tables can be constructed from these descriptions. The approach might also be used to define intermediate targets for data mining and unstructured data processing.
💡 Research Summary
The paper introduces a novel approach to business data processing called Dictionary‑Driven Reports (DDR). At its core, DDR treats any two‑dimensional table as a composition of four named lists: a list of table names, a list of column names, a list of row names, and a list of table attributes. Each list is drawn from a “local universe” (e.g., the set of natural numbers, real numbers, character strings, intervals, vectors, or graphs). Elements of a list carry a special meta‑attribute called genesis that records the origin of the element (which dictionaries and universes it belongs to). This provenance information enables rigorous consistency checks: when mapping input tables to an output table, a cell in the result may be produced only from input cells whose genesis includes the result cell’s genesis. In other words, table transformations must be homomorphisms respecting the underlying dictionary structure.
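The genesis rule can be illustrated with a minimal Python sketch. It assumes that a genesis is simply a set of dictionary elements and that "includes" means set inclusion; the `Cell` class and the `may_contribute` function are hypothetical names introduced here, not the paper's notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: float
    genesis: frozenset  # assumption: genesis modelled as a set of dictionary elements


def may_contribute(source: Cell, target_genesis: frozenset) -> bool:
    """Return True if the source cell's genesis includes the target's genesis."""
    return target_genesis <= source.genesis


# A cell holding IT office expenses for the first quarter of 2007.
src = Cell(120.0, frozenset({"IT", "2007", "Q1", "office"}))

print(may_contribute(src, frozenset({"IT", "2007"})))  # True
print(may_contribute(src, frozenset({"HR", "2007"})))  # False
```

Under this reading, a yearly IT total may draw on the cell above, while an HR total may not, which is the kind of consistency check the genesis attribute is meant to support.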
The authors describe a small algebra of dictionary operations: standard set operations (union, intersection, Cartesian product, difference), lexical and numeric ordering, and several higher‑level constructs. Hierarchical dictionaries can be generated with the “
A concrete example is given for an expense‑reporting scenario. Lists such as DEPARTMENT = {IT, HR}, QUARTER = {1, 2, 3, 4}, YEAR = {2007, 2008}, PROJECT = {pr1, pr2}, EXPENSE = {office, trip}, PERSONNEL = {p10,…,p23}, and INDICATOR = {expenses, personnel, amount, max personnel, max amount} are defined. The expense table is then described by:
- names = DEPARTMENT,
- columns = INDICATOR,
- rows = PROJECT // EXPENSE // PERSONNEL,
- attributes = ADDRESS × DEPARTMENT × QUARTER × YEAR.
Because the table definition is fully declarative, the system can automatically verify that all required reports (e.g., every department for each quarter) are present, that identifiers are correctly spelled, and that the data obeys appropriate measurement‑scale constraints (e.g., amounts can be summed, multiplied by constants, divided, but not multiplied by other amounts). The paper references Suppes‑Zinnes scales to formalize which operations are meaningful for each list.
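As a rough sketch of the completeness and spelling checks, the following Python fragment enumerates the expected reports as a Cartesian product of the lists from the example and compares them with what was received. The function name `check_reports` and the tuple layout are assumptions made for illustration; ADDRESS is left out because its elements are not listed in the summary.

```python
from itertools import product

DEPARTMENT = ["IT", "HR"]
QUARTER = [1, 2, 3, 4]
YEAR = [2007, 2008]


def check_reports(received):
    """Return (missing, unknown) report keys relative to the declared lists."""
    expected = set(product(DEPARTMENT, QUARTER, YEAR))
    got = set(received)
    missing = sorted(expected - got)             # declared but not delivered
    unknown = sorted(got - expected, key=repr)   # delivered but not declared
    return missing, unknown


# Every report except HR / Q4 / 2008, plus one with a misspelled department.
received = [k for k in product(DEPARTMENT, QUARTER, YEAR) if k != ("HR", 4, 2008)]
received.append(("ITT", 1, 2007))

missing, unknown = check_reports(received)
print(missing)  # [('HR', 4, 2008)]
print(unknown)  # [('ITT', 1, 2007)]
```

The point of the sketch is that both checks fall out of the declared lists alone; no report-specific validation code is needed.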
Parallelism emerges naturally: the input cells that map to the same output cell have independent origins, so they can be processed concurrently. No explicit threading or synchronization primitives are needed in the language; the homomorphic mapping guarantees that aggregating all contributing cells (e.g., by summation) yields the correct result. The authors highlight this property as “parallel programming without using the word ‘parallel’”.
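One way to picture this is a runtime that groups input cells by their output cell and aggregates each group independently. The sketch below is an assumption about how such a runtime might look, not the paper's implementation: the cell layout, the sample amounts, and the function `aggregate_by_department_year` are hypothetical, and a standard-library thread pool stands in for whatever scheduler a real system would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input cells: (department, quarter, year, amount).
cells = [
    ("IT", 1, 2007, 100.0),
    ("IT", 2, 2007, 50.0),
    ("HR", 1, 2007, 30.0),
    ("IT", 1, 2008, 70.0),
]


def aggregate_by_department_year(cells):
    """Sum amounts per (department, year); each output cell is computed independently."""
    groups = defaultdict(list)
    for dept, _quarter, year, amount in cells:
        groups[(dept, year)].append(amount)
    keys = list(groups)
    # The groups share no data, so they can be summed concurrently without locks.
    with ThreadPoolExecutor() as pool:
        totals = list(pool.map(sum, (groups[k] for k in keys)))
    return dict(zip(keys, totals))


totals = aggregate_by_department_year(cells)
print(totals[("IT", 2007)])  # 150.0
```

The user-facing description only states which cells feed which output cell; the concurrency lives entirely in the runtime, which matches the spirit of the claim above.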
From a data‑mining perspective, the DDR model provides a systematic way to reconstruct the hidden schema of a collection of tables. By extracting the dictionaries (names, rows, columns, attributes) from observed tables, one can assess completeness (are any tables missing?), detect semantic anomalies, and define intermediate mining targets such as “find all tables that share the same row dictionary”. The authors argue that reconstructing a full, hole‑free set of connected tables is more valuable than gathering isolated fragments.
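The intermediate target “find all tables that share the same row dictionary” can be sketched in a few lines of Python. The table names and contents below are invented for illustration, and treating a row dictionary as an unordered set of row names is an assumption.

```python
from collections import defaultdict

# Hypothetical observed tables, reduced to their row and column names.
tables = {
    "expenses_IT": {"rows": ["pr1", "pr2"], "columns": ["expenses", "amount"]},
    "expenses_HR": {"rows": ["pr2", "pr1"], "columns": ["expenses", "amount"]},
    "staff": {"rows": ["p10", "p11"], "columns": ["personnel"]},
}


def group_by_row_dictionary(tables):
    """Map each distinct set of row names to the sorted names of tables using it."""
    groups = defaultdict(list)
    for name, table in tables.items():
        groups[frozenset(table["rows"])].append(name)
    return {rows: sorted(names) for rows, names in groups.items()}


groups = group_by_row_dictionary(tables)
print(groups[frozenset({"pr1", "pr2"})])  # ['expenses_HR', 'expenses_IT']
```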
The paper also proposes a rule‑based execution engine built on two reserved tables: LRRE (local rule names) and LTP‑D (rule definitions). Each rule lists input tables and output tables; when any input table changes, the corresponding rule fires, producing new tables according to the declaratively defined transformations. Because rules never modify user tables directly, the system can be restarted without loss of consistency, similar to a message‑queue approach.
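The firing behaviour can be approximated with a small Python sketch. The summary does not give the layout of the two reserved tables, so a plain Python list of rules stands in for them here; the `RuleEngine` class, its methods, and the sample rule are all hypothetical, and the sketch assumes the rules form no cycles.

```python
class RuleEngine:
    """Toy engine: storing a table fires every rule that lists it as an input."""

    def __init__(self):
        self.tables = {}
        self.rules = []  # each rule: (input names, output name, transform)

    def add_rule(self, inputs, output, transform):
        self.rules.append((tuple(inputs), output, transform))

    def put(self, name, table):
        self.tables[name] = table
        for inputs, output, transform in self.rules:
            # Fire only when the changed table is an input and all inputs exist.
            if name in inputs and all(i in self.tables for i in inputs):
                result = transform(*(self.tables[i] for i in inputs))
                self.put(output, result)  # derived table may trigger further rules


engine = RuleEngine()
engine.add_rule(["expenses"], "total", lambda t: {"amount": sum(t.values())})
engine.put("expenses", {"office": 100.0, "trip": 40.0})
print(engine.tables["total"])  # {'amount': 140.0}
```

In this sketch a rule only writes its own output table and leaves the input table untouched, which mirrors the claim that rules never modify user tables directly.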
Implementation requirements are modest: the authors suggest a prototype on Linux written in C++ or Java, without requiring a full relational database management system. The system could be extended with distributed functionality (e.g., rules for sending and receiving data across nodes) and security checks (rules that validate authorisation before allowing other rules to execute).
In summary, the DDR framework unifies table definition, semantic validation, parallel processing, and data‑mining target generation under a single declarative language based on dictionaries and homomorphic mappings. It offers a lightweight alternative to traditional DBMS‑centric designs, and is especially suited to environments where data is transient and highly structured as tables, or where rapid prototyping or educational projects are the goal. Future work could explore richer universes (graphs, tensors), a more sophisticated measurement‑scale algebra, and integration with existing data‑warehouse tools.