Information Geometry of Absorbing Markov-Chain and Discriminative Random Walks
Discriminative Random Walks (DRWs) are a simple yet powerful tool for semi-supervised node classification, but their theoretical foundations remain fragmentary. We revisit DRWs through the lens of information geometry, treating the family of class-specific hitting-time laws on an absorbing Markov chain as a statistical manifold. Starting from a log-linear edge-weight model, we derive closed-form expressions for the hitting-time probability mass function, its full moment hierarchy, and the observed Fisher information. The Fisher matrix of each seed node turns out to be rank-one; taking the quotient by its null space yields a low-dimensional, globally flat manifold that captures all identifiable directions of the model. Leveraging this geometry, we introduce a sensitivity score for unlabeled nodes that bounds, and in one-dimensional cases attains, the maximal first-order change in DRW betweenness under unit Fisher perturbations. The score leads to principled strategies for active label acquisition, edge re-weighting, and explanation.
💡 Research Summary
This paper provides a rigorous information‑geometric foundation for Discriminative Random Walks (DRWs), a semi‑supervised node classification method that conditions random walks on first hitting a labeled node of a target class. The authors model the graph with a log‑linear edge‑weight function A_{ij}(θ)=A_{ij}^{(0)}exp(θ^T φ_{ij}), where θ∈ℝ^p are learnable parameters and φ_{ij} are fixed edge features. From this model they construct a transition matrix P_θ = D_θ^{-1}A_θ and partition the state space for a given class y into an absorbing set A_y (the labeled nodes of class y) and a transient set S_y (all other nodes, including unlabeled nodes and nodes of other classes).
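The construction above can be made concrete in a few lines of NumPy. The sketch below is illustrative only: the toy complete graph, feature tensor, seeds, and class partition are assumptions of this example, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3                            # nodes, parameter dimension
A0 = np.ones((n, n)) - np.eye(n)       # toy base adjacency A^(0) (complete graph)
phi = rng.standard_normal((n, n, p))   # fixed edge features φ_ij (illustrative)
theta = rng.standard_normal(p)

A = A0 * np.exp(phi @ theta)           # A_ij(θ) = A_ij^(0) exp(θ^T φ_ij)
P = A / A.sum(axis=1, keepdims=True)   # P_θ = D_θ^{-1} A_θ (row-stochastic)

# Partition for class y: absorbing set A_y = {3}, transient set S_y = {0, 1, 2}.
S, Ay = [0, 1, 2], [3]
M = P[np.ix_(S, S)]                    # transient-to-transient block
R = P[np.ix_(S, Ay)]                   # transient-to-absorbing block
```

Because every transient node here has a positive edge to the absorbing node, each row of M sums to strictly less than one, which is what guarantees ρ(M) < 1 in the next step.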
With this block‑structured representation, the transient‑to‑transient submatrix M = P_{S_y S_y}(θ) is sub‑stochastic with spectral radius ρ(M)<1, guaranteeing that the fundamental matrix Z_θ = (I−M)^{-1} exists and can be expressed as a Neumann series. The first‑passage time T_y from a seed node q∈S_y to the absorbing set has a discrete probability mass function (pmf)
p_θ(t|q) = e_q^T M^{t−1} R, R = P_{S_y A_y}(θ)1_{|A_y|},
and a probability generating function f_θ(z|q) = e_q^T (I−zM)^{-1}R. Using standard matrix identities the authors obtain closed‑form expressions for all factorial moments, in particular the mean μ_θ(q)=e_q^T Z_θ 1 and the variance σ_θ^2(q)=e_q^T(2Z_θ−I)Z_θ 1−μ_θ(q)^2.
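These closed forms are easy to sanity-check numerically. The minimal sketch below uses a toy 3-state transient block (the numbers are illustrative stand-ins for P_{S_y S_y}(θ) and P_{S_y A_y}(θ), not values from the paper) and compares the moment formulas against truncated series:

```python
import numpy as np

# Toy sub-stochastic transient block M with ρ(M) < 1 (illustrative values).
M = np.array([[0.0, 0.5, 0.3],
              [0.4, 0.0, 0.3],
              [0.2, 0.4, 0.0]])
R = 1.0 - M.sum(axis=1)           # absorption column, so rows of [M | R] sum to 1

Z = np.linalg.inv(np.eye(3) - M)  # fundamental matrix Z_θ = (I - M)^{-1}
ones = np.ones(3)

def pmf(t, q):
    """p_θ(t | q) = e_q^T M^{t-1} R for t = 1, 2, ..."""
    return (np.linalg.matrix_power(M, t - 1) @ R)[q]

q = 0
mean = (Z @ ones)[q]                                 # μ_θ(q) = e_q^T Z 1
var = ((2 * Z - np.eye(3)) @ Z @ ones)[q] - mean**2  # σ_θ^2(q)

# Compare against the truncated series Σ_t t^k p_θ(t|q).
ts = np.arange(1, 400)
probs = np.array([pmf(t, q) for t in ts])
assert abs(probs.sum() - 1.0) < 1e-10
assert abs((ts * probs).sum() - mean) < 1e-8
assert abs((ts**2 * probs).sum() - (var + mean**2)) < 1e-8
```

Note that Z R = 1 here because R = (I − M)1, which is exactly why the mean reduces to e_q^T Z 1.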
Crucially, the paper derives the derivatives of M and Z_θ with respect to θ via the standard derivative-of-the-inverse identity ∂_{θ_k} Z = Z (∂_{θ_k} M) Z. Because the edge weights are log-linear, the first and second derivatives of A_θ, D_θ, and consequently of P_θ, are available in closed form as tensor contractions of the feature vectors φ_{ij}. The log-pmf ℓ_θ(t,q) = log p_θ(t|q) yields a score vector ∇_θ ℓ_θ(t,q) that can be combined over t to form the observed Fisher information matrix for a seed node q:
F_θ(q) = Σ_{t≥1} p_θ(t|q) ∇_θ ℓ_θ(t,q) ∇_θ ℓ_θ(t,q)^T.
A striking result is that for any seed node the Fisher matrix has rank one: all score vectors lie along a single direction, implying that only one linear combination of the parameters is statistically identifiable; the remaining p−1 directions span the null space of the Fisher metric. By quotienting the parameter space by this (p−1)-dimensional null space, the authors obtain a globally flat (zero-curvature) one-dimensional manifold that captures all identifiable variation of the model.
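The Fisher matrix can be probed numerically with finite differences. In the sketch below, the toy graph, features, and seeds are assumptions of this example rather than the paper's setup; the score vectors ∇_θ ℓ_θ(t,q) are assembled into F_θ(q), whose numerical rank can then be compared against the rank-one claim.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 3
A0 = np.ones((n, n)) - np.eye(n)            # toy complete-graph base weights
phi = 0.3 * rng.standard_normal((n, n, p))  # illustrative edge features
S, Ay = [0, 1, 2], [3]

def pmf_vector(theta, q, tmax=1500):
    """p_θ(t | q) for t = 1..tmax under the log-linear model."""
    A = A0 * np.exp(phi @ theta)
    P = A / A.sum(axis=1, keepdims=True)
    M = P[np.ix_(S, S)]
    R = P[np.ix_(S, Ay)].sum(axis=1)
    v = np.zeros(len(S))
    v[q] = 1.0                               # v holds e_q^T M^{t-1}
    out = np.empty(tmax)
    for t in range(tmax):
        out[t] = v @ R
        v = v @ M
    return out

theta0 = rng.standard_normal(p)
q, eps = 0, 1e-6
p0 = pmf_vector(theta0, q)
grad = np.empty((p, p0.size))
for k in range(p):                           # central differences for ∇_θ p_θ(t|q)
    e = np.zeros(p)
    e[k] = eps
    grad[k] = (pmf_vector(theta0 + e, q) - pmf_vector(theta0 - e, q)) / (2 * eps)

keep = p0 > 1e-12                            # drop the numerically negligible tail
sc = grad[:, keep] / p0[keep]                # score vectors ∇_θ ℓ_θ(t,q), one per column
F = (p0[keep] * sc) @ sc.T                   # F_θ(q) = Σ_t p ∇ℓ ∇ℓ^T
rank_F = np.linalg.matrix_rank(F, tol=1e-8)  # paper predicts 1; fd noise may inflate it
```

Regardless of rank, F is symmetric positive semidefinite by construction, and the score has mean zero (Σ_t ∇_θ p_θ(t|q) = ∇_θ 1 = 0), both of which are useful numerical sanity checks.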
On this quotient manifold the authors define a sensitivity score for any unlabeled node q:
s(q) = ‖Π_⊥ ∇_θ B_L(q, y)‖_{F^{-1}},
where B_L(q,y) is the DRW betweenness (the expected number of times a class‑y walk visits q before absorption), Π_{⊥} projects onto the orthogonal complement of the Fisher null space, and F^{-1} denotes the pseudoinverse of the Fisher matrix. This score upper‑bounds the maximal first‑order change in betweenness induced by a unit‑norm perturbation measured in Fisher distance; in the one‑dimensional identifiable case the bound is tight.
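A hedged sketch of how this score could be computed follows. The betweenness convention (a walk started at a single fixed node s), the toy graph, and the rank-one stand-in Fisher matrix F = u u^T are all assumptions of this example, not details fixed by the summary; the gradient of B is taken by central finite differences.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 3
A0 = np.ones((n, n)) - np.eye(n)            # toy complete-graph base weights
phi = 0.3 * rng.standard_normal((n, n, p))  # illustrative edge features
S, Ay, s = [0, 1, 2, 3], [4], 0             # transient set, absorbing set, start node

def Z_matrix(theta):
    A = A0 * np.exp(phi @ theta)
    P = A / A.sum(axis=1, keepdims=True)
    M = P[np.ix_(S, S)]
    return np.linalg.inv(np.eye(len(S)) - M)

def betweenness(theta, q):
    """Expected visits to q before absorption, for a walk started at s."""
    return Z_matrix(theta)[s, q]

theta0 = rng.standard_normal(p)
eps = 1e-6

def grad_B(q):                               # central-difference ∇_θ B(q)
    g = np.empty(p)
    for k in range(p):
        e = np.zeros(p)
        e[k] = eps
        g[k] = (betweenness(theta0 + e, q) - betweenness(theta0 - e, q)) / (2 * eps)
    return g

# Stand-in rank-one Fisher matrix F = u u^T, mimicking the paper's structure;
# u is an arbitrary unit vector here, not a quantity derived from the model.
u = rng.standard_normal(p)
u /= np.linalg.norm(u)
F = np.outer(u, u)
F_pinv = np.linalg.pinv(F)                   # Moore-Penrose pseudoinverse
Pi = F @ F_pinv                              # Π_⊥: projector onto range(F)

def sensitivity(q):
    g = Pi @ grad_B(q)
    return float(np.sqrt(g @ F_pinv @ g))    # s(q) = ||Π_⊥ ∇_θ B||_{F^{-1}}

sens = {q: sensitivity(q) for q in [1, 2, 3]}
```

With a rank-one F = u u^T, the score collapses to |u^T ∇_θ B(q)|, the magnitude of the betweenness gradient along the single identifiable direction, which is the one-dimensional case where the paper's bound is tight.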
The paper demonstrates three practical uses of the sensitivity score: (1) active label acquisition by selecting unlabeled nodes with high Fisher‑weighted influence, (2) edge re‑weighting that targets edges contributing most to the identifiable direction, thereby improving classification margins, and (3) interpretability through visualizing nodes with large scores as key decision points in the DRW. Simple synthetic graph experiments illustrate how the score behaves and how it can guide model refinement.
Overall, the work reframes DRWs as absorption processes on a Markov chain, treats the family of hitting‑time distributions as a statistical manifold, and leverages information geometry to expose identifiability, curvature, and sensitivity properties that were previously hidden. This provides a principled foundation for optimization, uncertainty quantification, and active learning in graph‑based semi‑supervised learning, and opens avenues for extensions to continuous‑time chains, feature‑dependent walks, and scalable numerical implementations.