Relationship-aware sequential pattern mining

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Relationship-aware sequential pattern mining is the problem of mining frequent patterns in sequences in which the events of a sequence are mutually related by one or more concepts from some respective hierarchical taxonomies, based on the type of the events. Additionally, the events themselves are described with a certain number of taxonomical concepts. We present RaSP, an algorithm that is able to mine relationship-aware patterns over such sequences. RaSP follows a two-stage approach. In the first stage it mines for frequent type patterns and all their occurrences within the different sequences. In the second stage it performs hierarchical mining where, for each frequent type pattern and its occurrences, it mines for more specific frequent patterns in the lower levels of the taxonomies. We test RaSP on a real-world medical application, which provided the inspiration for its development, in which we mine for frequent patterns of medical behavior in the antibiotic treatment of microbes, and show that it has very good computational performance given the complexity of the relationship-aware sequential pattern mining problem.

💡 Research Summary

The paper introduces a novel problem called Relationship-aware Sequential Pattern Mining (RSPM), which extends traditional sequential pattern mining by simultaneously considering (i) hierarchical taxonomies that describe event types and (ii) discrete, possibly multi-valued relationships between pairs of events, each also described by its own taxonomy. An event e is defined by a type t and a vector of concepts c(e) drawn from that type's taxonomies. Every pair of events (e_i, e_j) additionally carries a relationship concept vector r(e_i, e_j) drawn from the relationship taxonomy ρ_{t_i t_j} associated with the two event types. A pattern matches a sequence if the event types agree, the event and relationship concepts of the sequence are subsumed by the corresponding concepts of the pattern, and the temporal separators (transaction delimiters) are preserved. Additional constraints such as a maximum gap and a maximum projected length can be imposed.
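The matching condition can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the toy taxonomy in `PARENT`, the `Event` class, and the helper names are all assumptions, and it checks a single pattern event against a single sequence event. Relationship concept vectors could be compared with the same `vector_matches` helper.

```python
from dataclasses import dataclass

# Hypothetical toy taxonomy (child -> parent); root concepts have no entry.
PARENT = {
    "penicillin": "beta_lactam",
    "beta_lactam": "antibiotic",
    "e_coli": "gram_negative",
    "gram_negative": "microbe",
}


def subsumes(general: str, specific: str) -> bool:
    """True if `general` is `specific` itself or one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


def vector_matches(pattern_vec: tuple, data_vec: tuple) -> bool:
    """Position-wise subsumption between two concept vectors of equal length."""
    return len(pattern_vec) == len(data_vec) and all(
        subsumes(p, d) for p, d in zip(pattern_vec, data_vec)
    )


@dataclass(frozen=True)
class Event:
    type: str        # event type t
    concepts: tuple  # concept vector c(e), one entry per taxonomy of the type


def event_matches(pattern: Event, event: Event) -> bool:
    """Types must be equal and every pattern concept must subsume the event's."""
    return pattern.type == event.type and vector_matches(
        pattern.concepts, event.concepts
    )
```

A pattern event with the more general concept `beta_lactam` matches a data event labelled `penicillin`, but not the other way around.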

To solve RSPM, the authors propose RaSP, a two-stage algorithm. Stage 1 works on the type-aware representation of the database (Σ_{ta}), mining frequent type patterns using a modified GSP algorithm. Unlike standard GSP, the modification (MGSP-SCC for sequences without delimiters and MGSP-GCC for those with delimiters) records all occurrences of each candidate pattern, not just the first. Early pruning is achieved by pre-computing type-frequency vectors for both patterns and sequences, so that a sequence is rejected immediately when a pattern contains more instances of some type than the sequence does. This stage yields a set of frequent type patterns Π_{ta} together with their full occurrence sets O.
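The two ideas in Stage 1, pruning by type counts and recording every occurrence, can be sketched as follows. This is a simplified illustration for delimiter-free sequences of types, not the MGSP-SCC procedure itself. The function names are hypothetical, and the gap is assumed to be the number of events skipped between two consecutive matched positions.

```python
from collections import Counter


def can_contain(pattern_types, sequence_types) -> bool:
    """Type-frequency check: the sequence needs at least as many events of
    each type as the pattern requires."""
    need = Counter(pattern_types)
    have = Counter(sequence_types)
    return all(have[t] >= n for t, n in need.items())


def all_occurrences(pattern_types, sequence_types, max_gap=None):
    """Return every tuple of increasing positions at which the pattern's
    types appear in order, not only the first such tuple."""
    if not can_contain(pattern_types, sequence_types):
        return []  # early rejection, no position search needed

    results = []

    def extend(k, start, chosen):
        if k == len(pattern_types):
            results.append(tuple(chosen))
            return
        for i in range(start, len(sequence_types)):
            # Assumed gap definition: events skipped since the last match.
            if max_gap is not None and chosen and i - chosen[-1] - 1 > max_gap:
                break
            if sequence_types[i] == pattern_types[k]:
                extend(k + 1, i + 1, chosen + [i])

    extend(0, 0, [])
    return results


occurrences = all_occurrences(["A", "M"], ["A", "M", "A", "M"])
adjacent_only = all_occurrences(["A", "M"], ["A", "M", "A", "M"], max_gap=0)
```

Here `occurrences` holds three position tuples, `(0, 1)`, `(0, 3)` and `(2, 3)`, while `adjacent_only` drops `(0, 3)` because two events lie between the matched positions.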

Stage 2 refines each frequent type pattern in Π_{ta}. For every occurrence of such a pattern, the corresponding original sequence is transformed into a type-and-concept-aware representation (Σ_{ca}) that concatenates the event-type, event-concept, and relationship-concept arrays. Because all occurrences of the same type pattern have identical length in this representation, the refinement problem reduces to a conventional frequent pattern mining task on a set of equally long sequences. The mining is performed independently for each frequent type pattern, exploring deeper levels of the event and relationship taxonomies. Consequently, a root-level pattern such as “Antibiotic A → Microbe X” can be specialized into “Antibiotic A (generic) → Microbe X (specific strain)”, capturing richer clinical semantics.
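The reduction to equal-length rows can be illustrated with a deliberately naive sketch. It is not the authors' refinement procedure: it enumerates every generalization of each occurrence row and counts distinct supporting sequences, which is exponential in the row length. The taxonomy, the data, and the choice to count support per sequence rather than per occurrence are all assumptions.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy taxonomy (child -> parent); root concepts have no entry.
PARENT = {
    "penicillin": "beta_lactam",
    "cephalosporin": "beta_lactam",
    "beta_lactam": "antibiotic",
    "e_coli": "gram_negative",
    "klebsiella": "gram_negative",
    "gram_negative": "microbe",
}


def ancestors_or_self(concept):
    """The concept followed by its ancestors up to the taxonomy root."""
    chain = []
    node = concept
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return chain


def refine(occurrences, min_support):
    """occurrences: (sequence_id, concept_row) pairs for ONE type pattern,
    where every row has the same length. Returns each generalized row with
    the number of distinct sequences supporting it, if >= min_support."""
    supporters = defaultdict(set)
    for seq_id, row in occurrences:
        for generalized in product(*(ancestors_or_self(c) for c in row)):
            supporters[generalized].add(seq_id)
    return {p: len(s) for p, s in supporters.items() if len(s) >= min_support}


rows = [
    (1, ("penicillin", "e_coli")),
    (2, ("penicillin", "klebsiella")),
    (2, ("penicillin", "e_coli")),
    (3, ("cephalosporin", "e_coli")),
]
frequent = refine(rows, min_support=3)
```

With a support threshold of three sequences, `("beta_lactam", "e_coli")` is frequent while the more specific `("penicillin", "e_coli")` is not, which mirrors the idea of descending the taxonomies only as far as the support allows.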

The authors analyze computational complexity. Stage 1 retains the O(N·L·|C|) time of classic GSP (N = number of sequences, L = average length, |C| = number of candidates) plus linear overhead for storing occurrences. Stage 2’s cost is proportional to the sum over all frequent type patterns of the number of their occurrences times the depth of the taxonomies, i.e., O(∑_p |O_p|·d).

Empirical evaluation uses a real‑world medical dataset comprising thousands of patient records, 20 event types, and 3‑4 hierarchical levels for antibiotics and microbes. RaSP discovers over 1,200 meaningful patterns, including clinically relevant associations such as specific antibiotic‑microbe co‑occurrences and resistance patterns. Compared with a baseline GSP‑only approach, RaSP achieves a 2–3× speed‑up while delivering far richer, relationship‑aware patterns.

The paper situates its contribution among prior work on GSP, PrefixSpan, and taxonomy‑aware mining, noting that none of these handle inter‑event relationships. By modeling relationships as symmetric, multi‑valued, and hierarchical, RaSP fills a gap and opens avenues for applications in electronic health records, log analysis, and social network event streams. Future directions include extending to n‑ary relationships, online streaming scenarios, and broader domain validation.
