Mining Patterns in Networks using Homomorphism
In recent years, many algorithms have been developed for finding patterns in graphs and networks. A disadvantage of these algorithms is that they use subgraph isomorphism to determine the support of a graph pattern; subgraph isomorphism is a well-known NP-complete problem. In this paper, we propose an alternative approach which mines tree patterns in networks by using subgraph homomorphism. The advantage of homomorphism is that it can be computed in polynomial time, which allows us to develop an algorithm that mines tree patterns in arbitrary graphs in incremental polynomial time. Homomorphism, however, entails two problems not found when using isomorphism: (1) two patterns of different size can be equivalent; (2) patterns of unbounded size can be frequent. In this paper, we formalize these problems and study solutions that easily fit within our algorithm.
💡 Research Summary
The paper tackles the fundamental computational bottleneck in frequent pattern mining on graphs: the reliance on subgraph isomorphism, which is NP‑complete. By restricting the pattern class to rooted unordered trees and using subgraph homomorphism (a mapping that need not be injective) as the support measure, the authors obtain a polynomial‑time computable support. However, homomorphism introduces two serious issues. First, trees of different sizes can be homomorphically equivalent, leading to an infinite number of redundant patterns (Problem 1). Second, when the data graph contains cycles, arbitrarily large trees may all map homomorphically onto a small subgraph, causing an unbounded number of frequent patterns (Problem 2).
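The polynomial-time support computation and both problems can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tree encoding `(label, children)`, the function name `root_images`, and the undirected, node-labeled toy graph are all assumptions made for illustration. Because a homomorphism need not be injective, each child subtree can be matched independently, so a bottom-up pass over the pattern suffices.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of a rooted unordered tree: (label, list of child trees).
Tree = Tuple[str, List["Tree"]]


def root_images(tree: Tree, labels: Dict[int, str], adj: Dict[int, Set[int]]) -> Set[int]:
    """Graph nodes v for which some homomorphism sends the tree root to v."""
    label, children = tree
    # The root can only land on nodes carrying the same label.
    candidates = {v for v, lab in labels.items() if lab == label}
    for child in children:
        child_images = root_images(child, labels, adj)
        # v survives only if at least one neighbour can host this child subtree.
        # Children are checked independently because injectivity is not required.
        candidates = {v for v in candidates if adj[v] & child_images}
    return candidates


# Toy data graph (assumed): a single undirected edge between an A node and a B node.
labels = {1: "A", 2: "B"}
adj = {1: {2}, 2: {1}}

# Both B children collapse onto node 2, which isomorphism would forbid.
star = ("A", [("B", []), ("B", [])])
# An alternating chain folds back and forth over the same edge.
chain = ("A", [("B", [("A", [("B", [])])])])

star_support = len(root_images(star, labels, adj))
chain_support = len(root_images(chain, labels, adj))
```

In this toy graph the star is equivalent to the smaller pattern A with one B child (Problem 1), and the chain can be lengthened indefinitely without losing its match (Problem 2).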
To resolve the first problem, the authors introduce the notion of core trees. A core tree is a minimal representative of a homomorphism equivalence class: it has no proper subtree that is homomorphically equivalent to it, and no two sibling subtrees are homomorphically comparable. Core trees therefore eliminate redundancy without losing any frequent pattern, since every equivalence class is still represented by its core.
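The sibling condition suggests a simple reduction, sketched below in Python. This is an illustration under stated assumptions, not the paper's procedure: it assumes that homomorphisms between patterns send the root to the root and children to children, and the names `maps_into` and `core` as well as the `(label, children)` encoding are hypothetical.

```python
from typing import List, Tuple

# Hypothetical encoding of a rooted unordered tree: (label, list of child trees).
Tree = Tuple[str, List["Tree"]]


def maps_into(s: Tree, t: Tree) -> bool:
    """True if s maps homomorphically into t with root sent to root (assumed notion)."""
    s_label, s_children = s
    t_label, t_children = t
    if s_label != t_label:
        return False
    # Every child subtree of s must fit into some child subtree of t.
    return all(any(maps_into(sc, tc) for tc in t_children) for sc in s_children)


def core(tree: Tree) -> Tree:
    """Drop sibling subtrees that map into another sibling, bottom-up."""
    label, children = tree
    kept: List[Tree] = []
    for child in (core(c) for c in children):
        if any(maps_into(child, k) for k in kept):
            continue  # child is dominated by a sibling that is already kept
        # Remove previously kept siblings that the new child dominates.
        kept = [k for k in kept if not maps_into(k, child)]
        kept.append(child)
    return (label, kept)


redundant = ("A", [("B", []), ("B", [("C", [])]), ("B", [])])
reduced = core(redundant)
```

Here both leaf B children map into the B child that has a C below it, so only that child survives and `reduced` is the single chain A, B, C.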
To address the second problem, the paper defines an anti‑monotonic constraint on pattern size that guarantees finiteness. Support is measured by the root image set, that is, the set of graph nodes onto which the pattern root can be mapped; any tree whose root image set contains fewer than θ nodes, for a user‑specified threshold θ, is pruned. Because support is anti‑monotonic under root‑preserving homomorphism, no extension of a pruned tree can become frequent.
Algorithmically, the authors adopt a canonical code based on depth‑first traversal. For each node they output its depth and label, producing a string; the lexicographically maximal string among all possible orderings is taken as the canonical representation. Two key properties hold: (1) every prefix of a canonical code is itself canonical for the corresponding prefix tree, and (2) sibling subtrees must appear in non‑increasing lexical order. These properties enable a level‑wise, prefix‑based enumeration where each candidate extension is generated by appending a new node to an existing canonical code, guaranteeing that no duplicate trees are produced.
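A depth-and-label code of this kind can be sketched in Python as follows. The encoding details are assumptions: codes are lists of `(depth, label)` pairs compared as Python tuples (depth first, then label), and the names `canonical_code` and `code_string` are hypothetical. Under that comparison, sorting the sibling codes in non-increasing order at every node yields the lexicographically maximal sequence.

```python
from typing import List, Tuple

# Hypothetical encoding of a rooted unordered tree: (label, list of child trees).
Tree = Tuple[str, List["Tree"]]
Code = List[Tuple[int, str]]


def canonical_code(tree: Tree, depth: int = 0) -> Code:
    """Depth-first (depth, label) sequence with siblings in non-increasing order."""
    label, children = tree
    # Canonicalise each child subtree first, then order the siblings.
    child_codes = sorted((canonical_code(c, depth + 1) for c in children), reverse=True)
    code: Code = [(depth, label)]
    for child_code in child_codes:
        code.extend(child_code)
    return code


def code_string(code: Code) -> str:
    """Render a code as a readable string such as '0A 1C 2D 1B'."""
    return " ".join(f"{d}{lab}" for d, lab in code)


# Two child orderings of the same unordered tree.
t1 = ("A", [("B", []), ("C", [("D", [])])])
t2 = ("A", [("C", [("D", [])]), ("B", [])])

s1 = code_string(canonical_code(t1))
s2 = code_string(canonical_code(t2))
```

Both orderings produce the string `0A 1C 2D 1B`, and its prefix `0A 1C 2D` is the canonical code of the tree obtained by removing the last node, which matches property (1).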
The mining procedure starts with single‑node trees, iteratively extends canonical trees while maintaining the core condition, computes the root image set via homomorphism (which can be done in polynomial time for trees), and retains those whose support meets the threshold. If a candidate fails the core test, it is reduced to its core representative. The anti‑monotonic constraint prunes the search space early, ensuring incremental polynomial time for the entire enumeration.
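The overall loop can be sketched in Python as a simplified level-wise miner. Several parts are deliberate substitutions rather than the authors' method, and all names are hypothetical: duplicates are removed with a hashed canonical form instead of prefix-based extension of canonical codes, the hard size limit `max_nodes` stands in for the paper's finiteness constraint, and core reduction is omitted. Trees are encoded as `(label, tuple_of_children)` so they can be hashed.

```python
def root_images(tree, labels, adj):
    """Graph nodes onto which the tree root can be mapped by a homomorphism."""
    label, children = tree
    cands = {v for v, lab in labels.items() if lab == label}
    for child in children:
        images = root_images(child, labels, adj)
        cands = {v for v in cands if adj[v] & images}
    return cands


def canon(tree):
    """Order-independent form: children sorted in non-increasing order."""
    label, children = tree
    return (label, tuple(sorted((canon(c) for c in children), reverse=True)))


def size(tree):
    return 1 + sum(size(c) for c in tree[1])


def extensions(tree, alphabet):
    """All trees obtained by attaching one new leaf somewhere in the tree."""
    label, children = tree
    for lab in alphabet:
        yield (label, children + ((lab, ()),))
    for i, child in enumerate(children):
        for ext in extensions(child, alphabet):
            yield (label, children[:i] + (ext,) + children[i + 1:])


def mine(labels, adj, min_support, max_nodes):
    """Return {canonical tree: support} for frequent trees up to max_nodes nodes."""
    alphabet = sorted(set(labels.values()))
    frequent = {}
    frontier = [(lab, ()) for lab in alphabet]  # single-node trees
    while frontier:
        next_frontier = []
        for tree in frontier:
            key = canon(tree)
            if key in frequent:
                continue  # same unordered tree already handled
            support = len(root_images(tree, labels, adj))
            if support < min_support:
                continue  # anti-monotonicity: no extension can be frequent
            frequent[key] = support
            if size(tree) < max_nodes:
                next_frontier.extend(extensions(tree, alphabet))
        frontier = next_frontier
    return frequent


# Toy data graph (assumed): one undirected edge between an A node and a B node.
labels = {1: "A", 2: "B"}
adj = {1: {2}, 2: {1}}
result = mine(labels, adj, min_support=1, max_nodes=3)
```

On this single-edge graph the sketch reports eight frequent trees with at most three nodes, including the non-core pattern A with two B children. Raising `max_nodes` keeps adding frequent trees, which is the behaviour that core trees and the finiteness constraint are meant to control.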
Beyond basic frequent pattern mining, the authors extend the framework to discover closed and maximal patterns. A closed pattern has no super‑pattern with the same support; a maximal pattern cannot be extended without losing frequency. By leveraging the core tree structure and canonical codes, the algorithm can efficiently test these properties during enumeration.
Finally, the paper introduces syntactic constraints (e.g., label ordering, depth limits, forbidden substructures) that can be imposed a priori to focus the mining on domain‑relevant patterns. This is demonstrated on bibliographic graphs where author‑paper‑keyword relationships are modeled, showing that homomorphism‑based mining can capture patterns that are invisible to isomorphism‑based methods because the latter enforce injectivity.
Experimental evaluation on synthetic and real‑world networks (citation graphs, social networks) confirms that the proposed algorithm dramatically reduces runtime and memory consumption compared to traditional isomorphism‑based miners. The output size is also much smaller because only core representatives are emitted, and the closed/maximal extensions add negligible overhead.
In summary, the paper makes five major contributions: (1) a polynomial‑time support definition based on subgraph homomorphism for tree patterns, (2) the core‑tree concept that eliminates infinite redundancy, (3) an anti‑monotonic finiteness constraint, (4) a provably incremental‑polynomial enumeration algorithm using canonical codes, and (5) extensions for closed, maximal, and syntactically constrained mining. Together, these advances provide a practical and theoretically sound alternative to isomorphism‑based graph mining, opening the door to scalable pattern discovery in large, cyclic networks.