Discriminative Probabilistic Models for Relational Data
In many supervised learning tasks, the entities to be labeled are related to each other in complex ways and their labels are not independent. For example, in hypertext classification, the labels of linked pages are highly correlated. A standard approach is to classify each entity independently, ignoring the correlations between them. Recently, Probabilistic Relational Models, a relational version of Bayesian networks, were used to define a joint probabilistic model for a collection of related entities. In this paper, we present an alternative framework that builds on (conditional) Markov networks and addresses two limitations of the previous approach. First, undirected models do not impose the acyclicity constraint that hinders representation of many important relational dependencies in directed models. Second, undirected models are well suited for discriminative training, where we optimize the conditional likelihood of the labels given the features, which generally improves classification accuracy. We show how to train these models effectively, and how to use approximate probabilistic inference over the learned model for collective classification of multiple related entities. We provide experimental results on a webpage classification task, showing that accuracy can be significantly improved by modeling relational dependencies.
💡 Research Summary
The paper tackles the problem of classifying entities that are not independent but are linked through rich relational structures, a situation common in many real‑world tasks such as hypertext classification, social‑network labeling, and bioinformatics. Traditional approaches either ignore these dependencies by classifying each instance independently, or they adopt Probabilistic Relational Models (PRMs), which extend Bayesian networks to relational domains. While PRMs provide a principled joint distribution, they suffer from two major drawbacks: (1) the directed acyclic graph (DAG) constraint prevents the representation of many natural cyclic dependencies that arise in relational data, and (2) PRMs are usually trained generatively, maximizing the joint likelihood of both features and labels, which is sub‑optimal when the ultimate goal is accurate label prediction.
To overcome these limitations, the authors propose a discriminative framework based on Conditional Markov Networks (CMNs), an undirected graphical model that directly models the conditional distribution P(Y | X,R) of labels Y given observed features X and relational links R. In this formulation, each entity i is associated with a label variable Y_i and a feature vector X_i, while each relational edge (i,j) is represented by a binary relation variable R_{ij}. The model defines two families of potential functions: (a) node potentials φ_i(Y_i, X_i) that capture the influence of local attributes on the label, and (b) edge potentials ψ_{ij}(Y_i, Y_j, R_{ij}) that encode relational dependencies such as label agreement across a hyperlink. The overall conditional distribution is expressed as
P(Y | X, R) = (1/Z(X, R)) exp( ∑_i φ_i(Y_i, X_i) + ∑_{(i,j)} ψ_{ij}(Y_i, Y_j, R_{ij}) ),
where Z is the partition function. Because Z is intractable for large graphs, the authors rely on approximate inference techniques.
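To make the formulation concrete, the following is a minimal sketch of this conditional distribution on a toy graph small enough to enumerate exactly, so the partition function Z can be computed by brute force (in practice it is intractable, hence the approximate inference). The potential values and names (`node_potential`, `edge_potential`) are illustrative assumptions, not taken from the paper.

```python
import itertools

LABELS = (0, 1)

def node_potential(label, feature):
    # phi_i(Y_i, X_i): reward agreement between a binary local feature and the label.
    return 2.0 if label == feature else 1.0

def edge_potential(label_i, label_j):
    # psi_ij(Y_i, Y_j): encourage linked entities to share a label.
    return 3.0 if label_i == label_j else 1.0

def score(labels, features, edges):
    # Unnormalized probability: product of all node and edge potentials.
    s = 1.0
    for i, y in enumerate(labels):
        s *= node_potential(y, features[i])
    for i, j in edges:
        s *= edge_potential(labels[i], labels[j])
    return s

def conditional(labels, features, edges):
    # P(Y | X, R) = score / Z, with Z summed over every joint labeling
    # (feasible only because this toy graph has 3 nodes).
    z = sum(score(ys, features, edges)
            for ys in itertools.product(LABELS, repeat=len(features)))
    return score(labels, features, edges) / z

features = [1, 1, 0]          # observed local features X
edges = [(0, 1), (1, 2)]      # relational links R (a small chain)
p = conditional((1, 1, 1), features, edges)
```

Note how the relational edge potentials couple the labels: the probability of a joint labeling cannot be factored into independent per-entity terms, which is exactly what collective classification exploits.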
Training proceeds by maximizing the conditional log‑likelihood of the observed labels, a truly discriminative objective. The authors employ stochastic gradient ascent with L2 regularization, and they introduce weight‑tying across identical relation types to reduce the number of free parameters. When only a subset of labels is observed, they use variational belief propagation to compute the required expectations for the gradient. This approach yields efficient parameter estimation even on graphs with thousands of nodes and edges.
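The training step above can be sketched as follows: with tied weights (one per relation type), the gradient of the conditional log-likelihood is the observed feature count minus its expectation under the model, minus the L2 penalty. The toy model below computes the expectation by exact enumeration, standing in for the approximate inference the paper uses on large graphs; the feature definitions and hyperparameters are illustrative assumptions.

```python
import itertools
import math

LABELS = (0, 1)
features = [1, 1, 0]
edges = [(0, 1), (1, 2)]
observed = (1, 1, 0)          # fully observed training labels

def feats(labels):
    # Two tied features: node/feature agreement count, edge label-agreement count.
    f_node = sum(1.0 for i, y in enumerate(labels) if y == features[i])
    f_edge = sum(1.0 for i, j in edges if labels[i] == labels[j])
    return (f_node, f_edge)

def log_score(labels, w):
    f = feats(labels)
    return w[0] * f[0] + w[1] * f[1]

def grad_cll(w, l2=0.1):
    # grad log P(observed | X, R) = f(observed) - E_w[f] - l2 * w
    all_ys = list(itertools.product(LABELS, repeat=len(features)))
    scores = [math.exp(log_score(ys, w)) for ys in all_ys]
    z = sum(scores)
    expect = [0.0, 0.0]
    for ys, s in zip(all_ys, scores):
        f = feats(ys)
        expect[0] += f[0] * s / z
        expect[1] += f[1] * s / z
    obs = feats(observed)
    return [obs[k] - expect[k] - l2 * w[k] for k in range(2)]

# One stochastic-gradient-ascent step with a fixed learning rate.
w = [0.0, 0.0]
g = grad_cll(w)
w = [w[k] + 0.5 * g[k] for k in range(2)]
```

At w = 0 the model is uniform, so the gradient is simply the gap between observed and average feature counts; repeated steps drive the model's expected counts toward the empirical ones, which is the standard maximum-entropy view of this objective.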
For prediction, the learned CMN is used in a collective classification setting. An initial labeling is obtained from a standard independent classifier (e.g., an SVM). Then loopy belief propagation (LBP) iteratively refines the label distribution by passing messages along the relational edges, allowing information to flow through cycles. The authors demonstrate that after a few iterations the marginal label probabilities converge and the overall classification accuracy improves monotonically.
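The collective-inference loop described above can be illustrated with a compact sum-product loopy BP sketch: messages m_{i→j}(y_j) are passed along relational edges (here a cycle, where BP is genuinely approximate) until the node beliefs stabilize. The graph, potentials, and iteration count are toy assumptions, not the paper's experimental setup.

```python
import math

LABELS = (0, 1)
features = [1, 1, 0]
edges = [(0, 1), (1, 2), (0, 2)]   # a 3-cycle: loopy BP is approximate here

def phi(i, y):
    # Local evidence from the independent classifier / node features.
    return 2.0 if y == features[i] else 1.0

def psi(yi, yj):
    # Relational preference for label agreement across a link.
    return 3.0 if yi == yj else 1.0

def neighbors(i):
    out = []
    for a, b in edges:
        if a == i:
            out.append(b)
        elif b == i:
            out.append(a)
    return out

# messages[(i, j)][y] = message from i to j about label y, initialized uniform.
messages = {(i, j): [1.0, 1.0]
            for a, b in edges for (i, j) in ((a, b), (b, a))}

for _ in range(20):                # synchronous sweeps; a few usually suffice
    new = {}
    for (i, j) in messages:
        m = []
        for yj in LABELS:
            total = 0.0
            for yi in LABELS:
                prod = phi(i, yi) * psi(yi, yj)
                for k in neighbors(i):
                    if k != j:
                        prod *= messages[(k, i)][yi]
                total += prod
            m.append(total)
        s = sum(m)
        new[(i, j)] = [v / s for v in m]   # normalize for numerical stability
    messages = new

def marginal(i):
    # Belief: local potential times all incoming messages, normalized.
    b = [phi(i, y) * math.prod(messages[(k, i)][y] for k in neighbors(i))
         for y in LABELS]
    s = sum(b)
    return [v / s for v in b]
```

The key behavior this sketch shows is information flowing through cycles: node 0's belief combines its own features with evidence relayed from both neighbors, which is what lets relational context override weak local evidence.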
The experimental evaluation focuses on a real‑world web‑page classification dataset (similar to WebKB). Each page is described by textual features and a hyperlink graph. The proposed CMN model is compared against (1) independent SVM and Naïve Bayes baselines, (2) a PRM implementation using a directed Bayesian network, and (3) variants of the CMN without relational potentials. Results show that incorporating relational edge potentials yields a substantial boost: overall accuracy rises from roughly 78 % (independent baselines) to about 86 % with the full CMN, and the improvement is especially pronounced for minority classes where relational context compensates for sparse local features. The PRM baseline performs only marginally better than independent classifiers, confirming that the directed acyclic restriction and generative training hinder its ability to exploit relational information.
In the discussion, the authors highlight several key insights. First, undirected models naturally accommodate cycles, making them well‑suited for many relational domains where feedback loops are intrinsic. Second, discriminative training aligns the learning objective with the downstream task, leading to better predictive performance than generative alternatives. Third, approximate inference via belief propagation, while not exact, provides a practical trade‑off between computational cost and accuracy, enabling scalable collective classification. The paper also acknowledges limitations: the quality of the approximation depends on graph structure, and learning can become expensive for extremely dense networks. Future work is suggested in the direction of more advanced variational methods, stochastic inference, and extensions to multi‑relational or heterogeneous graphs.
Overall, the contribution of the paper is threefold: (1) a principled conditional Markov network formulation for relational data that overcomes the acyclicity constraint of PRMs, (2) an effective discriminative training algorithm that directly optimizes conditional likelihood, and (3) a demonstration that collective inference on the learned model yields significant accuracy gains on a realistic hypertext classification task. The approach opens the door for applying discriminative undirected graphical models to a broad range of relational learning problems, including social‑network analysis, knowledge‑graph completion, and biomedical entity classification.