Discovering Knowledge using a Constraint-based Language
Discovering pattern sets or global patterns is an attractive issue for the pattern mining community, since such patterns provide useful information. By combining local patterns that satisfy a joint meaning, this approach produces higher-level patterns that are more useful to the data analyst than the usual local patterns, while reducing the number of patterns. In parallel, recent work investigating the relationships between data mining and constraint programming (CP) shows that the CP paradigm is a well-suited framework for modeling and mining such patterns in a declarative and generic way. We present a constraint-based language that enables us to define queries addressing pattern sets and global patterns. The usefulness of such a declarative approach is highlighted by several examples drawn from clustering based on associations. This language has been implemented in the CP framework.
💡 Research Summary
The paper addresses the challenge of extracting meaningful global patterns or pattern sets from the massive collections of local patterns typically generated by pattern‑mining algorithms. While local pattern mining has become a mature field, analysts often face the problem that the sheer number of patterns overwhelms them and that the patterns themselves are fragmented pieces of knowledge. Existing approaches to combine local patterns into higher‑level structures rely on ad‑hoc heuristics, post‑processing steps, or problem‑specific algorithms, which limits their generality and reusability.
In response, the authors propose a declarative, constraint‑based language that sits on top of a constraint‑programming (CP) engine. The language is built around four syntactic ingredients: constants (numeric values, items, patterns, transactions), variables (unknown patterns X₁ … X_k), operators (set operators such as ∪, ∩, \ and arithmetic operators), and function symbols. A set of built‑in functions is provided that are directly relevant to pattern mining: freq/1 (frequency of a pattern), size/1 (cardinality), cover/1 (the set of transactions covering a pattern), overlapItems/2 (number of shared items between two patterns), and overlapTransactions/2 (number of shared transactions). Users can also define their own functions by combining existing ones; examples include area(X) = freq(X) × size(X) and growth‑rate(X) = |D₂|·freq(X,D₁) / (|D₁|·freq(X,D₂)).
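The built-in and user-defined functions listed above can be illustrated with a small Python sketch. This is not the authors' CP implementation: it evaluates the functions on already-known patterns and does not search over unknown variables. The transaction database as a list of item sets, the snake_case names, the explicit dataset argument (the summary gives freq and cover with arity 1), and the choice to return infinity when a pattern is absent from D₂ are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Set

Pattern = FrozenSet[str]
Dataset = List[FrozenSet[str]]  # assumed representation: one item set per transaction


def cover(x: Pattern, d: Dataset) -> Set[int]:
    """Indices of the transactions of d that contain every item of x."""
    return {i for i, t in enumerate(d) if x <= t}


def freq(x: Pattern, d: Dataset) -> int:
    """Number of transactions of d covering x."""
    return len(cover(x, d))


def size(x: Pattern) -> int:
    """Cardinality of the pattern."""
    return len(x)


def overlap_items(x: Pattern, y: Pattern) -> int:
    """Number of items shared by two patterns."""
    return len(x & y)


def overlap_transactions(x: Pattern, y: Pattern, d: Dataset) -> int:
    """Number of transactions covered by both patterns."""
    return len(cover(x, d) & cover(y, d))


def area(x: Pattern, d: Dataset) -> int:
    """User-defined function: area(X) = freq(X) * size(X)."""
    return freq(x, d) * size(x)


def growth_rate(x: Pattern, d1: Dataset, d2: Dataset) -> float:
    """growth-rate(X) = |D2| * freq(X, D1) / (|D1| * freq(X, D2)).

    Assumes d1 is non-empty. Returning infinity when x never occurs
    in d2 is a convention chosen for this sketch.
    """
    f2 = freq(x, d2)
    if f2 == 0:
        return float("inf")
    return len(d2) * freq(x, d1) / (len(d1) * f2)


# Hypothetical toy data, invented for the example.
D1: Dataset = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
D2: Dataset = [frozenset("ab"), frozenset("c"), frozenset("cd"),
               frozenset("a"), frozenset("bd"), frozenset("d")]
X: Pattern = frozenset("ab")
Y: Pattern = frozenset("ac")

print(freq(X, D1), size(X), area(X, D1))                   # 2 2 4
print(overlap_items(X, Y), overlap_transactions(X, Y, D1))  # 1 1
print(growth_rate(X, D1, D2))                               # 3.0
```

In the language itself these functions appear inside constraints over unknown patterns, so a CP solver would propagate them during search instead of calling them on fixed values as this sketch does.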
Constraints are relations over these terms and fall into three categories: (i) numerical constraints (e.g., freq(X) ≤ 10), (ii) set constraints (e.g., X ⊆ Y), and (iii) domain‑specific constraints such as closed(X) (X is a closed pattern), coverTransactions(