Using higher-order Markov models to reveal flow-based communities in networks
Complex systems made of interacting elements are commonly abstracted as networks, in which nodes are associated with dynamic state variables whose evolution is driven by interactions mediated by the edges. Markov processes have been the prevailing paradigm for modelling such network-based dynamics, for instance in the form of random walks or other types of diffusion. Despite the success of this modelling perspective in numerous applications, it over-simplifies several real-world systems. Importantly, simple Markov models lack memory in their dynamics, an assumption that is often unrealistic in practice. Here, we explore possibilities to enrich the system description by means of second-order Markov models, exploiting empirical pathway information. We focus on the problem of community detection and show that standard network algorithms can be generalized in order to extract novel temporal information about the system under investigation. We also apply our methodology to temporal networks, where we can uncover communities shaped by the temporal correlations in the system. Finally, we discuss relations between the framework of second-order Markov processes and the recently proposed formalism of using non-backtracking matrices for community detection.
💡 Research Summary
The paper addresses a fundamental limitation of traditional first‑order Markov models (M₁) for network dynamics: they assume that the next state depends only on the current node, ignoring any memory of the recent path. In many real‑world systems, such as human mobility, web traffic, citation cascades, and email chains, the next step is strongly conditioned on the node visited just before. To capture this dependence, the authors introduce second‑order Markov models (M₂), whose state space consists of the directed edges (i→j) of the original network. A transition from state (i→j) to state (j→k) is described by a transition matrix T with entries T[(i→j), (j→k)], which can be estimated directly from empirical pathway data. The M₂ representation therefore lives on a weighted, directed line graph of the original network: a “memory network” with 2M nodes for an undirected network with M edges (each undirected edge contributes two opposite‑direction nodes), whose edges encode permissible two‑step walks.
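The estimation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `second_order_transitions` and the toy itineraries are hypothetical, and the estimator is assumed to be a plain row-normalised count of observed node triples.

```python
from collections import defaultdict


def second_order_transitions(paths):
    """Estimate T[(i, j)][(j, k)] from observed pathways.

    Each path is a sequence of node labels. Every consecutive triple
    (i, j, k) counts as one observed transition (i->j) -> (j->k).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for i, j, k in zip(path, path[1:], path[2:]):
            counts[(i, j)][(j, k)] += 1

    # Row-normalise so the outgoing weights of each memory node sum to 1.
    transitions = {}
    for src, row in counts.items():
        total = sum(row.values())
        transitions[src] = {dst: c / total for dst, c in row.items()}
    return transitions


# Hypothetical toy itineraries (not data from the paper).
paths = [
    ["A", "B", "A"],
    ["A", "B", "A"],
    ["A", "B", "C"],
    ["C", "B", "C"],
]
T2 = second_order_transitions(paths)
print(T2[("A", "B")])
print(T2[("C", "B")])
```

In this toy example a walker arriving at B from A mostly returns to A, while a walker arriving at B from C always returns to C. A first‑order model would merge these two cases into a single outgoing distribution for B.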
The central methodological contribution is the integration of M₂ into the Markov Stability framework for community detection. Markov Stability R(t,P) measures the excess probability that a random walker, started at stationarity in a community C of the partition P, is found in C again after time t, compared with the probability expected for two independent draws from the stationary distribution, summed over all communities. By building the dynamics from the transition matrix T of the memory network instead of the first‑order transition matrix, and recomputing the stationary distribution π (the left eigenvector of T with eigenvalue 1, equivalently the left null vector of the Laplacian I − T), the same quality function can be evaluated on M₂. Varying the Markov time t reveals communities at multiple scales: short times expose fine‑grained flow‑retaining groups, while long times merge them into larger modules. In the limit t→∞, optimizing stability yields a bipartition determined by the Fiedler eigenvector, as in classic spectral clustering.
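To make the quality function concrete, the sketch below evaluates a discrete‑time variant of stability, R(t) = Σ over same‑community pairs (a, b) of π_a (Tᵗ)_ab − π_a π_b, on a small hand‑made chain. The discrete‑time form, the function names, and the 4‑state matrix are assumptions chosen for illustration; the summary does not specify whether the paper uses this or a continuous‑time formulation.

```python
def stationary_distribution(T, iterations=10_000, tol=1e-12):
    """Power iteration for pi = pi T (assumes T is row-stochastic and ergodic)."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[a] * T[a][b] for a in range(n)) for b in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_stability(T, labels, t):
    """Discrete-time stability of the partition given by `labels` at time t."""
    n = len(T)
    pi = stationary_distribution(T)
    Tt = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(t):
        Tt = matmul(Tt, T)
    return sum(
        pi[a] * Tt[a][b] - pi[a] * pi[b]
        for a in range(n)
        for b in range(n)
        if labels[a] == labels[b]
    )


# Hypothetical 4-state chain: states {0, 1} and {2, 3} retain most of the flow.
T = [
    [0.45, 0.45, 0.05, 0.05],
    [0.45, 0.45, 0.05, 0.05],
    [0.05, 0.05, 0.45, 0.45],
    [0.05, 0.05, 0.45, 0.45],
]
good = markov_stability(T, [0, 0, 1, 1], t=1)
bad = markov_stability(T, [0, 1, 0, 1], t=1)
print(good, bad)
```

The partition that follows the flow‑retaining blocks scores higher than the one that cuts across them, and its score decays as t grows. The states here could equally be memory nodes (directed edges), since the function only needs a row‑stochastic matrix.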
Because M₂’s nodes are edges of the original graph, partitioning M₂ corresponds to an edge‑centric clustering of the original network. This naturally yields overlapping communities: a single original node can belong to several edge‑based groups, reflecting the multi‑membership patterns common in social systems. The authors emphasize that this edge‑based perspective is advantageous for networks where actors participate in multiple circles.
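The mapping from an edge partition back to overlapping node communities can be sketched as follows. The convention used here, that a node inherits the community of every directed edge it touches, is an assumption for illustration, as are the function name `node_memberships` and the toy labels.

```python
from collections import defaultdict


def node_memberships(edge_labels):
    """Map a partition of directed edges to overlapping node communities.

    edge_labels: dict mapping a directed edge (i, j) to a community id.
    Assumed convention: both endpoints inherit the edge's community.
    """
    memberships = defaultdict(set)
    for (i, j), community in edge_labels.items():
        memberships[i].add(community)
        memberships[j].add(community)
    return dict(memberships)


# Hypothetical edge partition of the path A - B - C.
edge_labels = {
    ("A", "B"): 0,
    ("B", "A"): 0,
    ("B", "C"): 1,
    ("C", "B"): 1,
}
members = node_memberships(edge_labels)
print(members)
```

Node B ends up in both communities because its two incident edges were assigned to different groups, which is the kind of multi‑membership the edge‑centric view allows.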
Empirical validation is performed on two datasets. First, the United States airline network: flight itineraries provide sequences of airports, from which a second‑order transition matrix is built. Compared with a standard M₁ analysis, the M₂ approach uncovers clusters that respect typical flight loops (e.g., A→B→A) and geographic regions more faithfully. Second, a temporally resolved school‑children interaction dataset (time‑stamped contacts) is transformed into edge‑sequences, yielding a memory network. Communities detected in this setting align with temporal activity patterns such as classroom sessions, lunch breaks, and after‑school activities—structures that are invisible to static, memory‑less methods.
Finally, the paper connects M₂ to the recently popular non‑backtracking (NB) matrix. The NB matrix excludes immediate backtracking steps, which improves spectral properties for sparse graphs. A non‑backtracking walk is itself a second‑order process: the step taken from j depends on the previously visited node i, because the reversal (i→j followed by j→i) is forbidden. In M₂, by contrast, the weight of such a reversal is set by the data, so it vanishes only when the reversal is never observed. The authors show that community detection using NB can be interpreted as a special case of the second‑order Markov stability formulation, positioning M₂ as a generalization that incorporates empirical memory while remaining compatible with non‑backtracking walks.
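The sense in which a non‑backtracking walk is a particular second‑order model can be illustrated with a short sketch. It builds memory‑node transitions from an undirected graph by forbidding the reversal and, as an assumption made here for illustration, choosing uniformly among the remaining neighbours. The function name and the toy graph are hypothetical.

```python
def non_backtracking_transitions(adjacency):
    """Second-order transitions of a non-backtracking walk.

    adjacency: dict mapping each node to the set of its neighbours
    (undirected graph). From memory node (i, j) the walker moves to
    (j, k) with k != i, uniformly over the allowed k (an assumption).
    """
    transitions = {}
    for i, neighbours in adjacency.items():
        for j in neighbours:
            targets = [k for k in adjacency[j] if k != i]
            if targets:
                weight = 1.0 / len(targets)
                transitions[(i, j)] = {(j, k): weight for k in targets}
            else:
                # Dead end: j has no neighbour other than i.
                transitions[(i, j)] = {}
    return transitions


# Hypothetical graph: triangle A-B-C with a pendant node D attached to B.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
NB = non_backtracking_transitions(adjacency)
print(NB[("A", "B")])
```

The resulting dictionary has the same shape as a transition matrix estimated from pathway data, except that every reversal has weight zero by construction rather than by observation. Dead ends such as (B, D) have no outgoing transition and would need separate handling before computing a stationary distribution.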
In summary, the paper makes three key contributions: (1) a practical procedure to construct second‑order transition matrices from pathway data; (2) a rigorous extension of the Markov Stability quality function to memory networks, enabling multi‑scale, flow‑based community detection that naturally produces overlapping groups; and (3) a theoretical bridge between second‑order Markov dynamics and non‑backtracking spectral methods. These advances broaden the toolbox for analyzing dynamical processes on networks, especially when temporal correlations play a crucial role.