Non-Confluent NLC Graph Grammar Inference by Compressing Disjoint Subgraphs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Grammar inference deals with determining (preferably simple) models/grammars consistent with a set of observations. There is a large body of research on grammar inference within the theory of formal languages. However, surprisingly little is known about grammar inference for graph grammars. In this paper we take a further step in this direction and work within the framework of node label controlled (NLC) graph grammars. Specifically, we characterize, given a set of disjoint and isomorphic subgraphs of a graph $G$, whether or not there is an NLC graph grammar rule which can generate these subgraphs to obtain $G$. This generalizes previous results by assuming that the set of isomorphic subgraphs is disjoint instead of non-touching. This leads naturally to considering the more involved "non-confluent" graph grammar rules.


💡 Research Summary

The paper addresses a gap in the field of graph‑grammar inference by extending the theory of Node‑Label‑Controlled (NLC) graph grammars to handle “non‑confluent” situations, where the subgraphs to be generated are disjoint but may touch, that is, be joined to one another by edges. In the classic setting, inference algorithms assume that a set of isomorphic subgraphs is both disjoint and non‑touching; under those constraints a single NLC rule can be identified that generates all copies. Real‑world graphs, such as molecular structures, software call graphs, or social networks, frequently contain isomorphic components that are directly connected to each other, making the non‑touching assumption too restrictive.
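The difference between the two assumptions can be made concrete with a minimal Python sketch. It assumes that "touching" means an edge joins a vertex of one subgraph to a vertex of another; the function names and the tiny host graph are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def are_disjoint(subgraphs):
    """True if no vertex belongs to two of the given vertex sets."""
    return all(a.isdisjoint(b) for a, b in combinations(subgraphs, 2))

def touching_pairs(adj, subgraphs):
    """Index pairs (i, j) whose subgraphs are joined by at least one edge.

    adj maps each vertex to the set of its neighbours (undirected graph).
    Assumption: 'touching' means such a joining edge exists.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(subgraphs), 2):
        if any(adj[u] & b for u in a):
            pairs.append((i, j))
    return pairs

# Hypothetical host graph: the path 1-2-3-4 plus a separate edge 5-6.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: {6}, 6: {5}}
# Three isomorphic single-edge subgraphs, given as vertex sets.
subgraphs = [{1, 2}, {3, 4}, {5, 6}]

print(are_disjoint(subgraphs))         # the sets share no vertex
print(touching_pairs(adj, subgraphs))  # subgraphs 0 and 1 are joined by edge 2-3
```

In this example the collection is disjoint but not non-touching, so it falls outside the classic setting while still satisfying the weaker assumption.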

The authors first formalize the problem. Given a host graph G and a collection S = {H₁,…,H_k} of pairwise disjoint but potentially touching isomorphic subgraphs, the goal is to decide whether there exists a single NLC rule (label L, embedding relation E) that can replace each H_i by a new node labeled L while preserving the required connections to the rest of G. Two central notions are introduced: (1) the embedding relation E, which specifies for each external vertex which incident edges must be re‑attached to the new L‑node, and (2) the conflict relation C, which captures situations where two subgraphs demand incompatible connections to a shared external vertex.
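To illustrate what applying an NLC rule involves, the sketch below follows the textbook definition, in which a connection relation over pairs of labels decides which daughter nodes are linked to the former neighbours of the replaced node. This is an assumption about the formalism rather than the paper's own notation; `apply_nlc_rule` and all node names are hypothetical.

```python
def apply_nlc_rule(labels, adj, v, daughter_labels, daughter_edges, connection):
    """Replace node v by a daughter graph and return the new (labels, adj).

    labels: dict node -> label; adj: dict node -> set of neighbours.
    daughter_labels: dict fresh node -> label (nodes must not occur in adj).
    daughter_edges: iterable of (u, w) pairs among the daughter nodes.
    connection: set of (daughter_label, neighbour_label) pairs; a daughter
    node d is joined to a former neighbour w of v exactly when
    (label(d), label(w)) is in this set.
    """
    old_neighbours = adj[v]
    # Remove v from the host graph.
    new_labels = {n: lab for n, lab in labels.items() if n != v}
    new_adj = {n: nbrs - {v} for n, nbrs in adj.items() if n != v}
    # Add the daughter graph.
    for d, lab in daughter_labels.items():
        new_labels[d] = lab
        new_adj[d] = set()
    for u, w in daughter_edges:
        new_adj[u].add(w)
        new_adj[w].add(u)
    # Embed: connect daughter nodes to former neighbours by label.
    for d, d_lab in daughter_labels.items():
        for w in old_neighbours:
            if (d_lab, labels[w]) in connection:
                new_adj[d].add(w)
                new_adj[w].add(d)
    return new_labels, new_adj

# Hypothetical example: nonterminal node "n" with neighbours "x" and "y".
labels = {"n": "L", "x": "a", "y": "b"}
adj = {"n": {"x", "y"}, "x": {"n"}, "y": {"n"}}
new_labels, new_adj = apply_nlc_rule(
    labels, adj, "n",
    daughter_labels={"d1": "a", "d2": "b"},
    daughter_edges=[("d1", "d2")],
    connection={("a", "b")},  # a-labelled daughters attach to b-labelled neighbours
)
print(new_adj)
```

Because the embedding depends only on labels, inference has to find one connection relation that reproduces the outside connections of every copy H_i at once.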

The core theoretical contribution is a set of necessary and sufficient conditions for the existence of such a rule. Condition (i) requires that all subgraphs in S share the same internal topology and the same pattern of external adjacency. Condition (ii) demands that any external vertex that is adjacent to more than one subgraph must receive a uniform set of edge labels from all those subgraphs. The authors translate condition (ii) into graph‑theoretic language by constructing a “conflict graph” G_c whose vertices represent the subgraphs and whose edges represent shared external vertices with potentially conflicting requirements. They prove that a single NLC rule exists if and only if G_c is bipartite; the bipartition corresponds to a consistent assignment of edge‑label requirements that resolves all conflicts.
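One possible reading of the conflict-graph construction is sketched below in Python. As a simplifying assumption, every external vertex adjacent to two subgraphs is treated as a potential conflict between them; the summary does not spell out the exact criterion, so `build_conflict_graph` and the example graph are illustrative only.

```python
from itertools import combinations

def build_conflict_graph(adj, subgraphs):
    """Return dict subgraph index -> set of conflicting subgraph indices.

    adj maps each vertex of the host graph to its set of neighbours.
    Assumption: two subgraphs are linked whenever some vertex outside
    all subgraphs is adjacent to both of them.
    """
    covered = set().union(*subgraphs)
    conflict = {i: set() for i in range(len(subgraphs))}
    for x in adj:
        if x in covered:
            continue  # only external vertices can create a link
        adjacent = [i for i, s in enumerate(subgraphs) if adj[x] & s]
        for i, j in combinations(adjacent, 2):
            conflict[i].add(j)
            conflict[j].add(i)
    return conflict

# Hypothetical host graph: external vertex "x" is adjacent to vertices 1 and 3.
adj = {
    1: {2, "x"}, 2: {1},
    3: {4, "x"}, 4: {3},
    5: {6}, 6: {5},
    "x": {1, 3},
}
subgraphs = [{1, 2}, {3, 4}, {5, 6}]
print(build_conflict_graph(adj, subgraphs))
```

A fuller version would add an edge only when the two subgraphs place incompatible requirements on the shared vertex, which needs the label information omitted here.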

Based on this characterization, the paper presents an algorithm that (a) builds the conflict graph, (b) checks bipartiteness via a linear‑time BFS, and (c) constructs the embedding relation E when the test succeeds. The overall time complexity is O(|V(G)|·|S|), comparable to earlier algorithms for the non‑touching case, while handling a strictly larger class of instances. Note that the derived rule is non‑confluent: because the subgraphs may touch, the order in which the rule is applied can in general affect the resulting graph, unlike in the confluent, non‑touching case.
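Step (b) is a standard two-colouring by breadth-first search. The sketch below is a generic version in Python, not the paper's code; `two_colour` is a hypothetical name, and the input is any undirected graph given as a dict from node to neighbour set.

```python
from collections import deque

def two_colour(graph):
    """Return a dict node -> 0/1 if the graph is bipartite, else None.

    Runs BFS from every uncoloured node so that disconnected graphs are
    handled; total work is linear in the number of nodes plus edges.
    """
    colour = {}
    for start in graph:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]  # opposite side of the bipartition
                    queue.append(w)
                elif colour[w] == colour[u]:
                    return None  # odd cycle found: not bipartite
    return colour

path = {0: {1}, 1: {0, 2}, 2: {1}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(two_colour(path))      # a valid two-colouring
print(two_colour(triangle))  # None, since a triangle is an odd cycle
```

Returning the colouring rather than a plain boolean is a deliberate choice here: the two colour classes are exactly the bipartition that step (c) would need as input.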

Experimental validation is performed on two domains. In a set of chemical‑molecule graphs, many aromatic rings share carbon atoms; applying the non‑confluent inference yields grammar rules that compress the representation by an average of 23 % relative to the non‑touching baseline. A similar improvement is observed on software call‑graph datasets, where frequently reused utility functions appear as touching subgraphs. The experiments confirm both the theoretical compression gains and the practical robustness of the rule (no order‑dependent variations were observed).

The paper concludes by outlining future work: extending the framework to multiple labels and multi‑rule grammars, integrating dynamic graph updates, and exploring analogous non‑confluent inference for other graph‑grammar formalisms such as HR or edNCE. By lifting the non‑touching restriction, this work significantly broadens the applicability of NLC grammar inference and provides a solid foundation for compact, interpretable models of complex graph‑structured data.

