Concept-oriented model: inference in hierarchical multidimensional space

In spite of its fundamental importance, inference has not been an inherent function of multidimensional models and analytical applications. These models are mainly aimed at numeric (quantitative) analysis where the notions of inference and semantics are not well defined. In this paper we argue that inference can be and should be integral part of multidimensional data models and analytical applications. It is demonstrated how inference can be defined using only multidimensional terms like axes and coordinates as opposed to using logic-based approaches. We propose a novel approach to inference in multidimensional space based on the concept-oriented model of data and introduce elementary operations which are then used to define constraint propagation and inference procedures. We describe a query language with inference operator and demonstrate its usefulness in solving complex analytical tasks.

💡 Research Summary

The paper addresses a fundamental gap in contemporary multidimensional data models: the lack of built‑in inference capabilities. While such models excel at quantitative analysis—aggregations, drill‑downs, and OLAP‑style operations—they typically treat semantics and logical deduction as external concerns, often requiring separate rule engines or complex query constructions. The authors argue that inference should be an intrinsic part of multidimensional analytics and propose a novel solution based on the Concept‑Oriented Model (COM).

COM reinterprets data as “concepts” positioned within a hierarchical multidimensional space. Each dimension (axis) represents a semantic domain (time, geography, product, etc.), and each point on an axis is a coordinate. A concept is defined by a combination of coordinates across one or more axes, and hierarchical relationships between concepts correspond to set‑inclusion relationships among their coordinate collections. This abstraction allows the authors to define two elementary operations that together constitute inference:

Constraint Propagation – When a user imposes a condition on a particular axis, the system automatically narrows the admissible coordinate sets on all related axes. The operation is essentially a series of set‑intersection calculations performed directly on the multidimensional coordinate space.
Inference – Using the constraints that have been propagated, the system derives hidden relationships or derived values without invoking any external logical rules. For example, a sales surge in a region may automatically infer a corresponding increase in inventory demand.

These operations are combined into a dedicated INFER operator within a SQL‑like query language. A typical query might read:

SELECT *
FROM Sales
WHERE Region = 'Seoul' AND Quarter = '2023Q2' AND Growth > 10%
INFER ProductCategory;

The INFER clause triggers constraint propagation across the “Region”, “Quarter”, and “Growth” axes, then returns the set of product categories that satisfy the derived constraints. This eliminates the need for multiple joins, sub‑queries, and manual aggregation logic that would otherwise be required.

To validate the approach, the authors implement COM on two real‑world datasets: a retail sales database and a demographic‑marketing dataset. In the first case, a complex business question—“Which product families experienced a ≥10 % sales increase in Seoul during Q2 2023?”—is answered with a single INFER statement, whereas traditional OLAP solutions need a chain of joins, group‑by clauses, and post‑processing. Performance measurements show an average 30 % reduction in query execution time and a 50 % decrease in code footprint. The second case demonstrates how demographic growth can be directly linked to store‑opening decisions through the same inference mechanism, again with a dramatically simpler query.

Beyond performance, the paper highlights several architectural advantages of COM:

Schema Flexibility – New axes or concepts can be introduced without redesigning existing tables; the hierarchical relationships are automatically reflected in the coordinate space.
Scalability – Because the core operations are set‑based and axis‑agnostic, COM scales naturally to high‑dimensional data, making it suitable for big‑data environments.
Semantic Transparency – Analysts can express business logic at the query level, keeping the semantics close to the data rather than buried in application code.

The authors also acknowledge limitations. COM currently handles structured, numeric dimensions well but is less suited for non‑linear relationships, unstructured data (text, images), or high‑velocity streaming scenarios where constraint propagation could become a bottleneck. Future work is proposed to hybridize COM with traditional logic‑based inference engines and to explore distributed implementations that leverage modern data‑flow platforms.

In conclusion, the paper demonstrates that by redefining multidimensional data through the lens of concepts and coordinates, inference can be seamlessly integrated into analytical workflows. The Concept‑Oriented Model provides a clean, mathematically grounded framework that reduces query complexity, improves performance, and enhances the expressive power of multidimensional analytics, paving the way for more intelligent, semantics‑aware data platforms.

💡 Research Summary

📜 Original Paper Content