Template Based Inference in Symmetric Relational Markov Random Fields
Relational Markov Random Fields are a general and flexible framework for reasoning about the joint distribution over attributes of a large number of interacting entities. The main computational difficulty in learning such models is inference. Even when dealing with complete data, where one can summarize a large domain by sufficient statistics, learning requires computing the expectation of those sufficient statistics under different parameter choices. The typical solution to this problem is to resort to approximate inference procedures, such as loopy belief propagation. Although these procedures are quite efficient, they still require computation on the order of the number of interactions (or features) in the model. When learning a large relational model over a complex domain, even such approximations require unrealistic running time. In this paper we show that for a particular class of relational MRFs, which have inherent symmetry, we can perform the inference needed for learning using template-level belief propagation. This procedure's running time is proportional to the size of the relational model rather than the size of the domain. Moreover, we show that this computational procedure is equivalent to synchronous loopy belief propagation, enabling a dramatic speedup in inference and learning time. We use this procedure to learn relational MRFs that capture the joint distribution of large protein-protein interaction networks.
💡 Research Summary
The paper addresses a fundamental bottleneck in learning Relational Markov Random Fields (RMRFs): the cost of inference required to compute the expected sufficient statistics for each parameter update. In conventional settings, even with fully observed data, the expectation must be evaluated over the joint distribution defined by a factor graph whose size grows with the number of entities and their interactions. Approximate inference methods such as loopy belief propagation (LBP) are commonly employed, but their runtime is still linear in the number of factors, which becomes prohibitive for large relational domains such as protein‑protein interaction networks that contain thousands of entities and millions of edges.
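To make the bottleneck concrete, recall the standard gradient of the log-likelihood for a log-linear MRF (notation ours, not the paper's):

```latex
P_\theta(x) \;=\; \frac{1}{Z(\theta)} \exp\Big(\sum_k \theta_k f_k(x)\Big),
\qquad
\frac{\partial \log P_\theta(x^{\mathrm{obs}})}{\partial \theta_k}
\;=\; f_k(x^{\mathrm{obs}}) \;-\; \mathbb{E}_{P_\theta}\!\big[f_k(X)\big].
```

The first term is a fixed sufficient statistic of the observed data, but the expectation term must be re-estimated by inference over the full ground model at every parameter update, which is exactly the cost the paper targets.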
The authors observe that many relational domains exhibit a strong form of symmetry: a small set of entity types and relation types (templates) are instantiated many times, and each instantiation shares the same local potential functions and parameters. This structural regularity implies that all copies of a given template are indistinguishable from the perspective of inference. Exploiting this, the paper introduces a template‑level representation that aggregates all variables and factors belonging to the same template into a single “representative” node or factor, annotated with a replication count that records how many concrete instances it stands for.
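The aggregation step can be illustrated with a toy sketch. Given a list of ground factors, each tagged with the template it instantiates, we collect the replication count per template and the degree of each ground variable; these are the two counts the template-level inference needs. The function name and data layout here are our own illustration (the paper's construction also merges the variables themselves by type, which this sketch omits):

```python
from collections import Counter

def compress(ground_factors):
    """Collapse a ground factor graph into template statistics.

    ground_factors: iterable of (template_name, node_ids) pairs,
    one per ground factor instance.
    Returns (replication count per template, degree per ground node).
    """
    counts, degree = Counter(), Counter()
    for template, nodes in ground_factors:
        counts[template] += 1          # how many copies of this template
        for n in nodes:
            degree[n] += 1             # how many factors touch this node
    return counts, degree
```

For example, a triangle of three proteins with a single `interacts` template yields a replication count of 3 for the template and degree 2 for every node.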
On this compressed template graph, the authors run a synchronous loopy belief propagation algorithm that is mathematically equivalent to running LBP on the full, uncompressed graph. The key modification is that messages are weighted by the replication counts, ensuring that the influence of each representative correctly reflects the contribution of all its copies. Because the number of representatives equals the number of distinct templates rather than the number of ground atoms, the computational complexity of each BP iteration depends only on the size of the relational model (the number of templates) and not on the size of the underlying domain.
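The effect of replication weighting can be seen in a fully symmetric special case (our simplification, not the paper's general construction): a ring of identical binary variables, each with the same unary potential and the same pairwise potential on every edge. Ground synchronous BP keeps all messages identical by symmetry, so a single template message with an exponent equal to the number of replicated incoming copies reproduces it exactly:

```python
import numpy as np

def ground_sync_bp(N, phi, psi, iters):
    """Synchronous pairwise BP on a ring of N variables; every node shares
    unary phi, every edge shares pairwise psi[x_i, x_j]."""
    msgs = {(i, j): np.full(len(phi), 1.0 / len(phi))
            for i in range(N) for j in ((i - 1) % N, (i + 1) % N)}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            prod = phi.copy()
            for k in ((i - 1) % N, (i + 1) % N):
                if k != j:                      # all incoming messages except from j
                    prod = prod * msgs[(k, i)]
            out = psi.T @ prod                  # sum over x_i of psi(x_i, x_j) * prod(x_i)
            new[(i, j)] = out / out.sum()
        msgs = new
    b = phi * msgs[(N - 1, 0)] * msgs[(1, 0)]   # belief at node 0
    return b / b.sum()

def template_bp(d, phi, psi, iters):
    """One representative message; the exponent d-1 acts as the replication
    weight for the d-1 identical incoming copies."""
    m = np.full(len(phi), 1.0 / len(phi))
    for _ in range(iters):
        prod = phi * m ** (d - 1)
        out = psi.T @ prod
        m = out / out.sum()
    b = phi * m ** d
    return b / b.sum()
```

Running both for the same number of synchronous iterations yields identical marginals, while the template version does constant work per iteration regardless of the ring size.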
The paper provides a rigorous proof of equivalence between the template‑based BP and standard synchronous BP, showing that the fixed‑point equations are identical after accounting for the replication weights. Consequently, the approximation quality is unchanged, while the runtime is dramatically reduced.
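The flavor of the equivalence argument can be seen in the fully symmetric pairwise case (a simplified special case in our notation; the paper's proof covers general templates). The ground synchronous update is

```latex
m^{t+1}_{i\to j}(x_j) \;\propto\;
\sum_{x_i} \psi(x_i, x_j)\,\varphi(x_i)
\prod_{k \in N(i)\setminus\{j\}} m^{t}_{k\to i}(x_i).
```

If all nodes share the unary $\varphi$, all edges share the pairwise $\psi$, every node has degree $d$, and messages are initialized uniformly, then by induction every message at iteration $t$ equals a single template message $m^t$, and the update and beliefs collapse to

```latex
m^{t+1}(x') \;\propto\; \sum_{x} \psi(x, x')\,\varphi(x)\,\big[m^{t}(x)\big]^{d-1},
\qquad
b(x) \;\propto\; \varphi(x)\,\big[m(x)\big]^{d},
```

where the exponents $d-1$ and $d$ are precisely the replication weights.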
To evaluate the approach, the authors apply it to learning RMRFs that model large protein‑protein interaction (PPI) networks. The datasets contain on the order of 10⁴ proteins and 10⁶ interactions. Traditional full‑graph BP either fails to converge within reasonable time or exhausts memory, and even sampling‑based approximations require many hours. In contrast, the template‑based BP converges in a few minutes, enabling full maximum‑likelihood learning. The learned models achieve log‑likelihoods and predictive performance (e.g., ROC‑AUC for interaction prediction) comparable to those obtained with the full‑graph methods, confirming that the speedup does not come at the expense of statistical accuracy. Reported speedups range from 20× to 80× depending on the dataset size and the degree of symmetry present.
The authors also discuss limitations. When the relational domain contains significant asymmetries—such as entities with unique attributes or relations that are not uniformly instantiated—the compression ratio diminishes, and the method may need to fall back to a hybrid scheme that treats asymmetric parts separately. Moreover, the current formulation assumes synchronous updates; extending the theory to asynchronous or more advanced variational inference techniques remains an open research direction.
In summary, the paper makes three major contributions: (1) a formal definition of symmetry‑based template aggregation for relational MRFs; (2) an inference algorithm that runs belief propagation on the aggregated template graph while preserving exact equivalence to standard synchronous BP; and (3) an empirical demonstration that this approach enables scalable learning of complex relational models on real‑world biological networks. By shifting the computational burden from the domain size to the model size, the work opens the door to applying rich probabilistic relational models to datasets that were previously out of reach for exact or even approximate inference.