Database Transposition for Constrained (Closed) Pattern Mining
Recently, several works have proposed a new way to mine patterns in databases with pathological dimensions. For example, experiments in genome biology usually produce databases with thousands of attributes (genes) but only tens of objects (experiments). In this case, mining the “transposed” database explores a smaller search space, and the Galois connection allows one to infer the closed patterns of the original database. We focus here on constrained pattern mining for such unusual databases and give a theoretical framework for database and constraint transposition. We discuss the properties of constraint transposition and examine classical constraints. We then address the problem of generating the closed patterns of the original database that satisfy the constraint, starting from those mined in the “transposed” database. Finally, we show how to generate all the patterns satisfying the constraint from the closed ones.
💡 Research Summary
The paper addresses the challenge of pattern mining in databases with a “pathological” shape: a very large number of attributes (e.g., thousands of genes) but only a few objects (e.g., tens of experiments). In such wide‑and‑shallow datasets, conventional frequent‑pattern and closed‑pattern mining algorithms face a search space that grows exponentially with the number of attributes, because every subset of attributes is a candidate pattern. The authors propose to transpose the database, swapping rows and columns, so that the resulting tall‑and‑narrow database has only a few attributes (the original objects). Its itemset search space is therefore much smaller, which makes exhaustive closed‑pattern mining feasible.
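As a minimal sketch of the transposition step, assuming the database is stored as a Python dict mapping each object to its set of attributes, the code below swaps rows and columns and compares the number of candidate itemsets, 2^(number of distinct items), before and after. The toy data and the names `transpose` and `search_space_size` are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical toy database: object (experiment) -> set of attributes (genes).
DB = {
    "o1": {"a", "b", "c", "e"},
    "o2": {"a", "c", "d"},
    "o3": {"a", "b", "c", "f"},
}


def transpose(db):
    """Swap rows and columns: each attribute becomes a row listing the objects that contain it.

    Attributes that occur in no object get no row in the result.
    """
    transposed = {}
    for obj, attrs in db.items():
        for attr in attrs:
            transposed.setdefault(attr, set()).add(obj)
    return transposed


def search_space_size(db):
    """Number of candidate itemsets: 2 ** (number of distinct items in db)."""
    items = set().union(*db.values())
    return 2 ** len(items)


tdb = transpose(DB)
print(search_space_size(DB))   # 64: six attributes in the original database
print(search_space_size(tdb))  # 8: three objects become the items of the transposed one
```

With thousands of genes and tens of experiments, the same comparison is 2^thousands against 2^tens, which is the gap the transposition exploits.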
A central theoretical tool is the Galois connection between sets of objects and sets of attributes. It consists of two maps: $f$ sends a set of objects to the set of attributes common to all of them, and $g$ sends a set of attributes to the set of objects that contain all of them. Both maps reverse inclusion, and their compositions $f \circ g$ and $g \circ f$ are closure operators. Restricted to closed sets, $f$ and $g$ are mutually inverse bijections, which guarantees a one‑to‑one correspondence between the closed itemsets of the original database and the closed object sets, i.e., the closed itemsets of the transposed database. Consequently, any closed pattern discovered in the transposed database can be mapped back to a closed pattern of the original database simply by applying $f$.
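The correspondence can be checked by brute force on a small example. The sketch below assumes the same dict-of-sets representation; `f` and `g` follow the naming used in this summary, while the toy data and the helper `powerset` are hypothetical. It enumerates every subset, keeps the fixed points of the two closure operators, and verifies that `f` maps closed object sets one‑to‑one onto closed itemsets.

```python
from itertools import combinations

# Hypothetical toy database: object -> set of attributes.
DB = {
    "o1": {"a", "b", "c", "e"},
    "o2": {"a", "c", "d"},
    "o3": {"a", "b", "c", "f"},
}
OBJECTS = frozenset(DB)
ATTRS = frozenset().union(*DB.values())


def f(objs):
    """Attributes shared by every object in objs (all attributes when objs is empty)."""
    result = ATTRS
    for o in objs:
        result = result & DB[o]
    return result


def g(attrs):
    """Objects that contain every attribute in attrs."""
    return frozenset(o for o in OBJECTS if attrs <= DB[o])


def powerset(universe):
    """All subsets of universe, as frozensets."""
    elems = sorted(universe)
    return [frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)]


# A set is closed iff it is a fixed point of the corresponding closure operator.
closed_itemsets = {X for X in powerset(ATTRS) if f(g(X)) == X}
closed_objsets = {O for O in powerset(OBJECTS) if g(f(O)) == O}

# f maps the closed object sets one-to-one onto the closed itemsets.
assert {f(O) for O in closed_objsets} == closed_itemsets
assert len(closed_objsets) == len(closed_itemsets)
print(sorted("".join(sorted(X)) for X in closed_itemsets))
# ['abc', 'abcdef', 'abce', 'abcf', 'ac', 'acd']
```

Exhaustive enumeration is only for illustration; real miners never materialize the full powerset.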
The paper then tackles the more realistic scenario in which mining is subject to constraints (e.g., minimum support, maximum size, inclusion or exclusion of specific items). The authors develop a systematic theory of constraint transposition. They distinguish monotone constraints (preserved when moving to supersets) from anti‑monotone ones (preserved when moving to subsets) and show how each class transforms when rows and columns are swapped: because the Galois maps reverse inclusion, an anti‑monotone constraint on itemsets becomes a monotone constraint on object sets, and vice versa. For instance, on closed patterns a minimum‑support constraint on itemsets, $\mathrm{freq}(X) \ge k$, becomes a minimum‑size constraint on object sets, $|O| \ge k$, whereas a maximum‑size constraint on itemsets, $|X| \le k$, becomes a maximum‑support constraint in the transposed database, $|f(O)| \le k$. By formally defining the transposed constraint $\mathcal{C}^T$, the authors ensure that mining the transposed database yields exactly those closed patterns that satisfy the original constraints once mapped back.
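The constraint behaviour can be checked by brute force. This is a minimal sketch that assumes the transposed constraint is defined as C^T(O) = C(f(O)); that definition, the toy data, and the helper names `transpose_constraint`, `min_support`, `max_size`, `is_monotone` and `is_anti_monotone` are illustrative assumptions rather than the paper's code. It confirms that both anti‑monotone itemset constraints become monotone on object sets, and that minimum support reduces to a minimum size on closed object sets.

```python
from itertools import combinations

# Hypothetical toy database: object -> set of attributes.
DB = {
    "o1": {"a", "b", "c", "e"},
    "o2": {"a", "c", "d"},
    "o3": {"a", "b", "c", "f"},
}
OBJECTS = frozenset(DB)
ATTRS = frozenset().union(*DB.values())


def f(objs):
    """Attributes shared by every object in objs."""
    result = ATTRS
    for o in objs:
        result = result & DB[o]
    return result


def g(attrs):
    """Objects that contain every attribute in attrs."""
    return frozenset(o for o in OBJECTS if attrs <= DB[o])


def powerset(universe):
    elems = sorted(universe)
    return [frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)]


def transpose_constraint(c):
    """Assumed definition: object set O satisfies C^T iff the itemset f(O) satisfies C."""
    return lambda objs: c(f(objs))


def min_support(k):
    return lambda itemset: len(g(itemset)) >= k  # freq(X) >= k


def max_size(k):
    return lambda itemset: len(itemset) <= k  # |X| <= k


def is_monotone(c, universe):
    """Brute force: if X satisfies c, so does every superset of X."""
    sets = powerset(universe)
    return all(c(Y) for X in sets if c(X) for Y in sets if X <= Y)


def is_anti_monotone(c, universe):
    """Brute force: if Y satisfies c, so does every subset of Y."""
    sets = powerset(universe)
    return all(c(X) for Y in sets if c(Y) for X in sets if X <= Y)


for c in (min_support(2), max_size(3)):
    # Anti-monotone on itemsets turns into monotone on object sets.
    assert is_anti_monotone(c, ATTRS)
    assert is_monotone(transpose_constraint(c), OBJECTS)

# On closed object sets, freq(f(O)) >= k is simply |O| >= k.
closed_objsets = [O for O in powerset(OBJECTS) if g(f(O)) == O]
assert all(transpose_constraint(min_support(2))(O) == (len(O) >= 2) for O in closed_objsets)
```

The reversal follows directly from the order‑reversing maps: growing O shrinks f(O), so a condition that holds on small itemsets holds on large object sets.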
The algorithmic framework consists of two phases. Phase 1 runs a standard closed‑pattern miner (e.g., LCM, Charm, Close) on the transposed database while enforcing the transposed constraints $\mathcal{C}^T$. This produces a set $\mathcal{F}^T$ of closed patterns $(A, O)$, where $A$ is a set of original attributes (now the “objects”) and $O$ is a set of original objects (now the “attributes”). Phase 2 applies the Galois maps: for each $(A, O) \in \mathcal{F}^T$, compute $O' = g(A)$, the original objects that contain every attribute in $A$, and $A' = f(O)$, the original attributes shared by every object in $O$. Since $A$ and $O$ are closed, $O' = O$ and $A' = A$, so the itemset $A'$ with support set $O'$ is a closed pattern of the original database, simply read in the opposite orientation. Because both components are already closed, no further closure computation is needed; at most, the original constraints are checked on the reconstructed pattern.
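The two phases can be sketched end to end for a minimum‑support constraint. In this sketch a brute‑force miner, `closed_itemsets`, stands in for LCM, Charm or Close and is only usable on tiny inputs; it and `mine_via_transposition` are hypothetical names. The code assumes `min_freq >= 1` and that every object has at least one attribute, so the transposition loses no information. Phase 1 mines the transposed database under |O| >= min_freq, and Phase 2 swaps each pair back. The result is compared with direct mining of the original database.

```python
from itertools import combinations

# Hypothetical toy database: object -> set of attributes.
# Assumes every object has at least one attribute.
DB = {
    "o1": {"a", "b", "c", "e"},
    "o2": {"a", "c", "d"},
    "o3": {"a", "b", "c", "f"},
}


def transpose(db):
    """Swap rows and columns."""
    tdb = {}
    for row_id, items in db.items():
        for item in items:
            tdb.setdefault(item, set()).add(row_id)
    return tdb


def closed_itemsets(db, constraint):
    """Brute-force closed-itemset miner (stand-in for LCM/Charm/Close on tiny inputs).

    Returns pairs (itemset, support set) whose itemset is closed and satisfies constraint.
    """
    all_items = frozenset().union(*db.values())
    found = set()
    for r in range(len(all_items) + 1):
        for combo in combinations(sorted(all_items), r):
            itemset = frozenset(combo)
            tids = frozenset(t for t, row in db.items() if itemset <= row)
            closure = all_items
            for t in tids:
                closure = closure & db[t]
            if closure == itemset and constraint(itemset, tids):
                found.add((itemset, tids))
    return found


def mine_via_transposition(db, min_freq):
    """Phase 1: mine the transposed db under C^T. Phase 2: read each pair back."""
    tdb = transpose(db)
    # On closed patterns, freq(A) >= min_freq in db becomes |O| >= min_freq in tdb.
    mined = closed_itemsets(tdb, lambda objs, attrs: len(objs) >= min_freq)
    # A closed pattern (O, f(O)) of tdb is the closed pattern (A, g(A)) of db.
    return {(attrs, objs) for objs, attrs in mined}


direct = closed_itemsets(DB, lambda attrs, objs: len(objs) >= 2)
assert mine_via_transposition(DB, 2) == direct
for attrs, objs in sorted(direct, key=lambda p: len(p[0])):
    print(sorted(attrs), sorted(objs))
```

The miner on the transposed side only enumerates subsets of the three objects, while the direct run enumerates all 64 attribute subsets. This is the saving the summary describes.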
Beyond closed patterns, the authors also describe how to generate all (possibly non‑closed) patterns that satisfy the constraints. Traditional approaches enumerate all subsets of each closed pattern and then prune duplicates, which can be costly. The paper exploits the fact that the Galois maps reverse inclusion: for any subset $X \subseteq A$ of a closed itemset $A$, the supporting objects satisfy $g(X) \supseteq g(A)$, so every sub‑pattern corresponds to a larger set of objects in the original space. By traversing the lattice of closed patterns mined in the transposed database and projecting each sub‑pattern back to the original space, the method enumerates all valid patterns without redundant checks.
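For the minimum‑support case, the step from closed to all patterns rests on one fact: an itemset's support equals that of its closure, which is the smallest closed superset and therefore the closed superset with the largest support. The sketch below is the straightforward subset‑enumeration baseline mentioned above, with duplicates merged in a dictionary. It is not the paper's lattice traversal. It assumes `CLOSED` holds all frequent closed itemsets with their supports, hard‑coded here for the toy data. The names `all_patterns_from_closed` and `brute_force_frequent` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: object -> set of attributes.
DB = {
    "o1": {"a", "b", "c", "e"},
    "o2": {"a", "c", "d"},
    "o3": {"a", "b", "c", "f"},
}

# All frequent closed itemsets of DB for min_freq = 2, with their supports
# (hard-coded here; in practice they come from the two-phase transposed mining).
CLOSED = {frozenset("abc"): 2, frozenset("ac"): 3}


def all_patterns_from_closed(closed):
    """Expand closed itemsets into all their subsets.

    The support of any itemset X equals the support of its closure, the smallest
    closed superset of X, which is also the closed superset with the largest support.
    """
    support = {}
    for itemset, supp in closed.items():
        for r in range(len(itemset) + 1):
            for combo in combinations(sorted(itemset), r):
                sub = frozenset(combo)
                support[sub] = max(support.get(sub, 0), supp)
    return support


def brute_force_frequent(db, min_freq):
    """Reference result: count every itemset's support directly in db."""
    items = sorted(frozenset().union(*db.values()))
    out = {}
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            X = frozenset(combo)
            supp = sum(1 for row in db.values() if X <= row)
            if supp >= min_freq:
                out[X] = supp
    return out


patterns = all_patterns_from_closed(CLOSED)
assert patterns == brute_force_frequent(DB, 2)
print(len(patterns))  # 8
```

The repeated subsets of `{a, c}` and `{a, b, c}` (for example the empty set, `{a}` and `{c}`) are exactly the duplicates a smarter traversal avoids generating.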
Experimental evaluation uses real‑world genomic datasets (thousands of genes, dozens of samples), text‑mining corpora, and synthetic benchmarks designed to mimic the wide‑and‑shallow shape. Metrics include runtime, memory consumption, and correctness of the retrieved pattern sets. The transposition‑based approach consistently outperforms direct mining: memory usage drops by up to 70 % and runtime improves by factors of 3–6, especially when the minimum support threshold is low. Moreover, the set of patterns obtained after back‑projection exactly matches the set produced by a naïve direct mining run, confirming the theoretical guarantees.
In conclusion, the paper delivers a robust, theoretically grounded framework for constrained closed‑pattern mining in databases where attributes vastly outnumber objects. By leveraging database transposition, the Galois connection, and a careful treatment of constraint transformation, it turns an otherwise intractable mining problem into a tractable one without sacrificing completeness or correctness. The authors suggest future work on dynamic constraint addition, multi‑database integration, and distributed implementations of the transposition strategy.