Clustering Co-occurrence of Maximal Frequent Patterns in Streams
One way of getting a better view of data is to use frequent patterns. In this paper, frequent patterns are subsets that occur at least a minimum number of times in a stream of itemsets. However, discovering frequent patterns in streams has always been problematic. Because streams are potentially endless, it is in principle impossible to say whether a pattern occurs often. Furthermore, the number of patterns can be huge, and a good overview of the structure of the stream is quickly lost. The proposed approach uses clustering to facilitate the analysis of the structure of the stream. A clustering on the co-occurrence of patterns gives the user an improved view of the structure of the stream. Some patterns might occur together so often that they should form a combined pattern. In this way, the patterns in the clustering will be the largest frequent patterns: maximal frequent patterns. Our approach to deciding whether patterns often occur together is based on a clustering method that applies when only the distance between pairs is known. The number of maximal frequent patterns is much smaller, and combined with clustering methods these patterns provide a good view of the structure of the stream.
💡 Research Summary
The paper tackles the long‑standing challenge of mining frequent patterns in potentially infinite data streams, where it is impossible to know the final support of any itemset and where the sheer number of patterns quickly overwhelms analysts. To address both the computational and interpretability problems, the authors propose a two‑stage framework that first extracts a compact set of maximal frequent patterns (MFPs) and then clusters these patterns based on how often they co‑occur in the stream.
Stream‑aware frequency estimation
Instead of storing the whole stream, the method uses a sliding window or an exponential decay model to give higher weight to recent transactions. Approximate counting structures such as Count‑Min Sketch keep memory usage bounded while providing an estimate of the “window‑support” for each candidate itemset. A minimum support threshold is applied to this window‑support, yielding a set of patterns that are estimated to be frequent in the recent past. Because a Count‑Min Sketch can overestimate counts but never underestimates them, this set may contain some false positives.
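The window-support idea can be illustrated with a minimal sketch. It assumes a count-based sliding window, candidate itemsets limited to a small maximum length, and a hand-rolled Count-Min Sketch; the names `CountMinSketch`, `WindowSupport`, `window_size`, and `max_len` are hypothetical and are not taken from the paper.

```python
import hashlib
from collections import deque
from itertools import combinations


class CountMinSketch:
    """Tiny Count-Min Sketch; width and depth are illustrative choices."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One hashed column per row, using a row-specific salt.
        data = repr(key).encode()
        for row in range(self.depth):
            digest = hashlib.blake2b(data, salt=bytes([row])).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, delta=1):
        for row, col in self._cells(key):
            self.rows[row][col] += delta

    def estimate(self, key):
        # The minimum over rows is an upper bound on the true count.
        return min(self.rows[row][col] for row, col in self._cells(key))


class WindowSupport:
    """Estimated support of itemsets over the last `window_size` transactions."""

    def __init__(self, window_size, max_len=2):
        self.window = deque()
        self.window_size = window_size
        self.max_len = max_len  # assumption: cap candidate size to bound work
        self.sketch = CountMinSketch()

    def _candidates(self, transaction):
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def push(self, transaction):
        # Count the new transaction, then forget the one that left the window.
        self.window.append(transaction)
        for cand in self._candidates(transaction):
            self.sketch.add(cand, 1)
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            for cand in self._candidates(expired):
                self.sketch.add(cand, -1)

    def support(self, itemset):
        return self.sketch.estimate(tuple(sorted(set(itemset))))


ws = WindowSupport(window_size=3)
for t in [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]:
    ws.push(t)
print(ws.support({"b", "c"}), ws.support({"a", "b"}))
```

A pattern would then be kept when `ws.support(pattern)` meets the chosen minimum support threshold. An exponential decay variant would multiply the counters by a decay factor instead of subtracting expired transactions.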
Deriving maximal frequent patterns
From the frequent set, the algorithm removes any pattern that is a proper subset of another frequent pattern. The remaining patterns are maximal: they cannot be extended without violating the support constraint. This step dramatically reduces the number of patterns (often to less than 10 % of the original frequent set) and ensures that each MFP represents a distinct region of the data space.
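The filtering step can be sketched as follows, using the standard definition of maximality: a frequent pattern is maximal when no proper superset of it is also frequent. The function name `maximal_patterns` and the toy input are illustrative assumptions, and the quadratic scan is chosen for clarity rather than speed.

```python
def maximal_patterns(frequent):
    """Return the frequent patterns that have no proper frequent superset."""
    # Deduplicate, then visit longer patterns first so that every
    # superset is examined before any of its subsets.
    patterns = sorted({frozenset(p) for p in frequent}, key=len, reverse=True)
    maximal = []
    for p in patterns:
        # `p < m` is the proper-subset test on frozensets.
        if not any(p < m for m in maximal):
            maximal.append(p)
    return maximal


frequent = [{"a"}, {"b"}, {"c"}, {"d"}, {"a", "b"}, {"b", "c"}]
mfps = maximal_patterns(frequent)
print(sorted(sorted(p) for p in mfps))
```

In this toy input the single items `a`, `b`, and `c` are absorbed by `{a, b}` and `{b, c}`, while `{d}` survives because nothing frequent contains it.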
Co‑occurrence distance matrix
For every pair of MFPs (P_i, P_j), the system computes a co‑occurrence ratio: the number of windows where both patterns appear together divided by the total number of windows where either appears. The distance is defined as d_{ij}=1−co‑occurrence‑ratio, yielding values in [0, 1], where 0 means the two patterns always appear together and 1 means they never do.
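A minimal sketch of the co-occurrence distance, assuming each pattern is represented by the set of window identifiers in which it appears. The names `cooccurrence_distance`, `distance_matrix`, and `occurrences` are hypothetical, and treating two never-observed patterns as maximally distant is an assumption made here to avoid division by zero.

```python
def cooccurrence_distance(windows_i, windows_j):
    """1 minus (windows containing both) / (windows containing either)."""
    either = windows_i | windows_j
    if not either:
        # Assumption: patterns that were never observed are maximally distant.
        return 1.0
    both = windows_i & windows_j
    return 1.0 - len(both) / len(either)


def distance_matrix(occurrences):
    """Pairwise distances keyed by (pattern name, pattern name)."""
    names = sorted(occurrences)
    return {
        (a, b): cooccurrence_distance(occurrences[a], occurrences[b])
        for a in names
        for b in names
    }


# Toy data: window ids in which each maximal frequent pattern was seen.
occurrences = {"P1": {1, 2, 3}, "P2": {2, 3, 4}, "P3": {7}}
dist = distance_matrix(occurrences)
print(dist[("P1", "P2")], dist[("P1", "P3")])
```

Here `P1` and `P2` share two of the four windows in which either appears, giving a distance of 0.5, while `P1` and `P3` never share a window and are at distance 1. The resulting matrix is symmetric with zeros on the diagonal, so it can be passed to any clustering method that needs only pairwise distances.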