Recurrent Meta-Structure for Robust Similarity Measure in Heterogeneous Information Networks
Similarity measure as a fundamental task in heterogeneous information network analysis has been applied to many areas, e.g., product recommendation, clustering and Web search. Most of the existing metrics depend on the meta-path or meta-structure specified by users in advance. These metrics are thus sensitive to the pre-specified meta-path or meta-structure. In this paper, a novel similarity measure in heterogeneous information networks, called Recurrent Meta-Structure-based Similarity (RMSS), is proposed. The recurrent meta-structure as a schematic structure in heterogeneous information networks provides a unified framework to integrate all of the meta-paths and meta-structures. Therefore, RMSS is robust to the meta-paths and meta-structures. We devise an approach to automatically constructing the recurrent meta-structure. In order to formalize the semantics, the recurrent meta-structure is decomposed into several recurrent meta-paths and recurrent meta-trees, and we then define the commuting matrices of the recurrent meta-paths and meta-trees. All of the commuting matrices of the recurrent meta-paths and meta-trees are combined according to different weights. Note that the weights can be determined by two kinds of weighting strategies: local weighting strategy and global weighting strategy. As a result, RMSS is defined by virtue of the final commuting matrix. Experimental evaluations show that the existing metrics are sensitive to different meta-paths or meta-structures and that the proposed RMSS outperforms the existing metrics in terms of ranking and clustering tasks.
💡 Research Summary
The paper addresses a fundamental challenge in heterogeneous information networks (HINs): measuring similarity between objects without relying on user‑specified meta‑paths or meta‑structures. Existing similarity measures such as PathSim, PCRW, BSCSE, and SMSS require the analyst to pre‑define a meta‑path (a sequence of object types) or a meta‑structure (a more complex subgraph). The performance of these methods is highly sensitive to the chosen schema, making them difficult to use for non‑expert users and limiting their ability to capture rich, composite semantics.
To overcome this limitation, the authors propose a novel similarity measure called Recurrent Meta‑Structure‑based Similarity (RMSS). The core idea is the construction of a Recurrent Meta‑Structure (RecurMS), a schematic representation that automatically incorporates all possible meta‑paths and meta‑structures within a HIN. RecurMS is generated by traversing the network schema repeatedly, allowing object types to be revisited; this creates a “recurrent” pattern that captures every feasible composite relation without manual specification.
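The repeated traversal can be pictured as unrolling the schema layer by layer from a source type. The sketch below is only an illustration under assumptions the summary does not state: the schema is given as undirected type-level relations, unrolling stops at a fixed `depth`, and `unroll_schema` is a hypothetical helper, not the authors' construction algorithm.

```python
from collections import defaultdict


def unroll_schema(schema_edges, source, depth):
    """Unroll a network schema into layers of object types (hypothetical helper).

    schema_edges: iterable of (type_a, type_b) relations, treated as undirected.
    Returns (layers, links): layers[i] is the sorted list of types at hop i,
    and links holds (hop, from_type, to_type) connections between adjacent layers.
    """
    neighbors = defaultdict(set)
    for a, b in schema_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    layers = [[source]]
    links = []
    for hop in range(depth):
        next_layer = set()
        for t in layers[-1]:
            for n in sorted(neighbors[t]):
                # A type may reappear in later layers: this revisiting is what
                # makes the unrolled structure "recurrent".
                next_layer.add(n)
                links.append((hop, t, n))
        layers.append(sorted(next_layer))
    return layers, links


# Toy bibliographic schema: Author-Paper, Paper-Term, Paper-Venue.
schema = [("A", "P"), ("P", "T"), ("P", "V")]
layers, links = unroll_schema(schema, "A", depth=4)
print(layers)  # [['A'], ['P'], ['A', 'T', 'V'], ['P'], ['A', 'T', 'V']]
```

In this toy schema the layers repeat every two hops, which is the recurrence in miniature; the stopping rule (`depth`) is an assumption, since the summary does not say how the authors bound the construction.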
Because RecurMS couples object types so tightly (measuring similarity over the structure as a whole would drive the similarity scores to zero), the authors decompose it into two more manageable components:
- Recurrent Meta‑Paths (RMPs) – linear sequences that may contain cycles.
- Recurrent Meta‑Trees (RMTs) – tree‑like structures that also permit repeated node visits.
For each RMP and RMT the authors define a commuting matrix. For an RMP, this is the product of the adjacency matrices along the sequence; for an RMT, the matrix products along the branches are combined at the nodes where the branches meet. These matrices quantify the strength of connections induced by the corresponding schema fragment and are the building blocks of the similarity measure.
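For the path case, a commuting matrix is just a chain of matrix products. The minimal sketch below uses plain Python lists to stay dependency-free; the toy author-paper and paper-venue matrices are made up for illustration, and the tree case is not shown because the summary does not give the exact combination rule.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Swap rows and columns."""
    return [list(col) for col in zip(*X)]


def path_commuting_matrix(adjacencies):
    """Chain-multiply the adjacency matrices along a meta-path."""
    result = adjacencies[0]
    for adj in adjacencies[1:]:
        result = matmul(result, adj)
    return result


# Toy data: 2 authors x 3 papers, and 3 papers x 2 venues.
W_AP = [[1, 1, 0],
        [0, 1, 1]]
W_PV = [[1, 0],
        [1, 0],
        [0, 1]]

# A-P-A: entry (x, y) counts papers shared by authors x and y.
M_APA = path_commuting_matrix([W_AP, transpose(W_AP)])
# A-P-V-P-A: entry (x, y) counts A-P-V-P-A path instances between x and y.
M_APVPA = path_commuting_matrix([W_AP, W_PV, transpose(W_PV), transpose(W_AP)])
print(M_APA)    # [[2, 1], [1, 2]]
print(M_APVPA)  # [[4, 2], [2, 2]]
```

For real networks, sparse matrix types would be the natural choice, since adjacency matrices in HINs are mostly zeros.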
The next step is to combine the commuting matrices. Two weighting strategies are introduced:
- Local Weighting – assigns a weight to each RMP/RMT based on its sparsity (fraction of zero entries) and average transition strength. Sparse, weakly connected fragments receive lower weight, while dense, strong fragments receive higher weight.
- Global Weighting – evaluates the relative importance of each fragment in the context of the whole network, e.g., by the proportion of total paths that pass through the fragment.
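The two strategies can be contrasted with a small sketch. The scoring formulas here are hypothetical stand-ins chosen to match the descriptions above, not the paper's definitions: the local score multiplies density (one minus sparsity) by the mean non-zero entry scaled by the largest entry, and the global score is each fragment's share of all path instances.

```python
def density(M):
    """Fraction of non-zero entries, i.e. 1 - sparsity."""
    entries = [v for row in M for v in row]
    return sum(1 for v in entries if v != 0) / len(entries)


def strength(M):
    """Mean non-zero entry, scaled by the largest entry into [0, 1]."""
    nonzero = [v for row in M for v in row if v != 0]
    if not nonzero:
        return 0.0
    return (sum(nonzero) / len(nonzero)) / max(nonzero)


def normalize(raw):
    """Rescale scores so they sum to 1 (uniform if all scores are zero)."""
    total = sum(raw)
    return [r / total for r in raw] if total else [1 / len(raw)] * len(raw)


def local_weights(matrices):
    # Hypothetical local score: dense, strongly connected fragments score higher.
    return normalize([density(M) * strength(M) for M in matrices])


def global_weights(matrices):
    # Hypothetical global score: each fragment's share of all path instances.
    return normalize([sum(v for row in M for v in row) for M in matrices])


dense = [[2, 1], [1, 2]]
sparse = [[3, 0], [0, 0]]
print(local_weights([dense, sparse]))   # [0.75, 0.25]
print(global_weights([dense, sparse]))  # approximately [0.667, 0.333]
```

The sparse fragment receives less weight under the local scheme (0.25) than under the global one (about 0.33), consistent with the local strategy's stated preference for dense, strong fragments.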
Both strategies include a normalization step so that the final weighted sum remains comparable across different networks. The weighted sum of all commuting matrices forms the final commuting matrix, from which the RMSS similarity is defined.
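A minimal sketch of the combination step follows. The summary does not give the exact formula that turns the final commuting matrix into similarity scores, so the PathSim-style normalization used here, s(x, y) = 2·M[x][y] / (M[x][x] + M[y][y]), is an assumption, and the equal weights are illustrative only.

```python
def weighted_sum(matrices, weights):
    """Combine commuting matrices of equal shape into one final matrix."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * M[i][j] for M, w in zip(matrices, weights))
             for j in range(cols)] for i in range(rows)]


def pathsim_style_similarity(M):
    """Assumed normalization: s(x, y) = 2*M[x][y] / (M[x][x] + M[y][y])."""
    n = len(M)
    sim = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            denom = M[x][x] + M[y][y]
            sim[x][y] = 2 * M[x][y] / denom if denom else 0.0
    return sim


# Commuting matrices for A-P-A and A-P-V-P-A on a toy 2-author network.
M_APA = [[2, 1], [1, 2]]
M_APVPA = [[4, 2], [2, 2]]
final = weighted_sum([M_APA, M_APVPA], [0.5, 0.5])
print(final)                            # [[3.0, 1.5], [1.5, 2.0]]
print(pathsim_style_similarity(final))  # [[1.0, 0.6], [0.6, 1.0]]
```

Normalizing by the diagonal keeps self-similarity at 1 and makes scores comparable across objects with very different connection counts, which is one reason PathSim-style measures are a common choice.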
The algorithmic pipeline is:
- Input the HIN schema.
- Automatically construct RecurMS.
- Decompose RecurMS into RMPs and RMTs.
- Compute commuting matrices for each component.
- Apply either local or global weighting, normalize, and sum.
- Use the resulting similarity matrix for downstream tasks (ranking, clustering, etc.).
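The path half of the decomposition step can be sketched as a depth-first enumeration of type sequences that start and end at the source type, with revisits allowed. The hop limit, the restriction to paths (meta-trees are not covered), and the helper name `enumerate_recurrent_meta_paths` are all assumptions for illustration.

```python
from collections import defaultdict


def enumerate_recurrent_meta_paths(schema_edges, source, max_hops):
    """List type sequences that start and end at `source` (hypothetical helper).

    Types may be revisited, so sequences such as A-P-A-P-A are allowed.
    Enumeration stops after `max_hops` relations.
    """
    neighbors = defaultdict(set)
    for a, b in schema_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    found = []

    def extend(path):
        # Record the sequence whenever it has returned to the source type.
        if len(path) > 1 and path[-1] == source:
            found.append(tuple(path))
        if len(path) - 1 == max_hops:
            return
        for n in sorted(neighbors[path[-1]]):
            extend(path + [n])

    extend([source])
    return found


schema = [("A", "P"), ("P", "T"), ("P", "V")]
for meta_path in enumerate_recurrent_meta_paths(schema, "A", max_hops=4):
    print("-".join(meta_path))
# A-P-A
# A-P-A-P-A
# A-P-T-P-A
# A-P-V-P-A
```

Each enumerated sequence would then be turned into a commuting matrix by chaining the corresponding adjacency matrices.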
Experimental Evaluation
The authors evaluate RMSS on three real‑world datasets:
- A bibliographic network (Authors, Papers, Terms, Venues).
- A biological network (Genes, Tissues, GO terms, Chemical compounds, Substructures, Side effects).
- An additional heterogeneous dataset (details omitted in the excerpt).
Baseline methods include PathSim, PCRW, BSCSE, and SMSS. Evaluation metrics cover ranking quality (MAP, NDCG, Precision@k) and clustering quality (NMI, Purity, ARI). Results show that:
- RMSS consistently outperforms baselines across all datasets.
- The performance of baseline methods varies dramatically with the choice of meta‑path or meta‑structure, confirming their sensitivity.
- RMSS’s advantage is most pronounced when the task requires capturing complex, multi‑hop semantics (e.g., author‑paper‑term‑venue relationships).
- The two weighting strategies yield comparable results; local weighting is slightly more stable on sparse networks.
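To show how a similarity matrix feeds these evaluations, here is a short sketch of two of the listed metrics, Precision@k for ranking and Purity for clustering. The similarity values, relevance set, and cluster labels are made up for illustration and are not results from the paper; MAP, NDCG, NMI, and ARI are omitted for brevity.

```python
from collections import Counter


def top_k(sim, query, k):
    """Indices of the k objects most similar to `query`, excluding itself."""
    candidates = [j for j in range(len(sim[query])) if j != query]
    candidates.sort(key=lambda j: sim[query][j], reverse=True)
    return candidates[:k]


def precision_at_k(sim, query, relevant, k):
    """Fraction of the top-k results that are in the relevant set."""
    hits = sum(1 for j in top_k(sim, query, k) if j in relevant)
    return hits / k


def purity(predicted, truth):
    """Sum of majority-label counts over all clusters, divided by the number of objects."""
    clusters = {}
    for p, t in zip(predicted, truth):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in clusters.values())
    return majority / len(truth)


sim = [[1.0, 0.9, 0.2, 0.7],
       [0.9, 1.0, 0.3, 0.6],
       [0.2, 0.3, 1.0, 0.1],
       [0.7, 0.6, 0.1, 1.0]]
print(precision_at_k(sim, query=0, relevant={1, 2}, k=2))  # 0.5
print(purity([0, 0, 1, 1, 1], ["a", "a", "b", "b", "a"]))  # 0.8
```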
Contributions and Significance
- Introduction of RecurMS, an automatically generated, schema‑level structure that unifies all possible meta‑paths and meta‑structures.
- Formal decomposition of RecurMS into recurrent meta‑paths and meta‑trees, together with commuting matrix definitions.
- Two principled weighting schemes that allow flexible emphasis on more informative schema fragments.
- Extensive empirical validation demonstrating robustness and superior performance in ranking and clustering tasks.
Limitations and Future Work
While RMSS eliminates the need for manual schema selection, the size of RecurMS can become very large for networks with many object types, leading to high computational and memory costs for matrix multiplications. The authors suggest exploring matrix approximation, sampling techniques, or parallel implementations to mitigate this issue. Additionally, the current formulation focuses on similarity between objects of the same type; extending RMSS to heterogeneous similarity (different object types) and integrating it with deep learning embeddings are promising directions.
Conclusion
RMSS provides a robust, schema‑agnostic similarity measure for heterogeneous information networks. By automatically capturing the full spectrum of composite relations and intelligently weighting their contributions, it overcomes the fragility of traditional meta‑path/structure‑based methods and delivers superior performance in practical applications such as recommendation, clustering, and information retrieval.