Low-distortion Inference of Latent Similarities from a Multiplex Social Network

Much of social network analysis is - implicitly or explicitly - predicated on the assumption that individuals tend to be more similar to their friends than to strangers. Thus, an observed social network provides a noisy signal about the latent underlying “social space:” the way in which individuals are similar or dissimilar. Many research questions frequently addressed via social network analysis are in reality questions about this social space, raising the question of inverting the process: Given a social network, how accurately can we reconstruct the social structure of similarities and dissimilarities? We begin to address this problem formally. Observed social networks are usually multiplex, in the sense that they reflect (dis)similarities in several different “categories,” such as geographical proximity, kinship, or similarity of professions/hobbies. We assume that each such category is characterized by a latent metric capturing (dis)similarities in this category. Each category gives rise to a separate social network: a random graph parameterized by this metric. For a concrete model, we consider Kleinberg’s small world model and some variations thereof. The observed social network is the unlabeled union of these graphs, i.e., the presence or absence of edges can be observed, but not their origins. Our main result is an algorithm which reconstructs each metric with provably low distortion.


💡 Research Summary

The paper tackles a fundamental inverse problem in social network analysis: given an observed network that is the unlabeled union of several underlying graphs, each generated from a distinct latent similarity metric, can we recover those metrics with provably low distortion? The authors formalize this setting by assuming that each “category” (e.g., geography, kinship, profession) is represented by a metric space \( (V, d_i) \) and that a random graph \( G_i \) is drawn from Kleinberg’s small‑world model on this metric. In Kleinberg’s model, each node is placed on a \( d \)‑dimensional lattice, short‑range edges are deterministic, and long‑range edges are added independently with probability proportional to \( d_i(u,v)^{-\alpha} \). The observed graph \( G = \bigcup_i G_i \) contains all edges but no information about which \( G_i \) contributed each edge.
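
As a toy illustration of this generative setting, the sketch below makes several simplifying assumptions not in the paper: one‑dimensional latent metrics, no deterministic short‑range lattice edges, and unnormalized connection probabilities. It draws two category graphs and then discards the edge labels when forming the union:

```python
import itertools
import random

def kleinberg_layer(n, alpha, metric, rng):
    """One category's random graph: each pair (u, v) gets an edge
    independently with probability min(1, metric(u, v) ** -alpha) --
    a simplified sketch of Kleinberg-style long-range edges over an
    arbitrary latent metric (short-range lattice edges are omitted)."""
    return {(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < min(1.0, metric(u, v) ** -alpha)}

rng = random.Random(0)
n = 100
geo = lambda u, v: abs(u - v)               # latent "geography" metric
perm = rng.sample(range(n), n)              # an unrelated latent ordering
prof = lambda u, v: abs(perm[u] - perm[v])  # latent "profession" metric

G1 = kleinberg_layer(n, 2.0, geo, rng)
G2 = kleinberg_layer(n, 2.0, prof, rng)
G = G1 | G2  # the observed, unlabeled union: edge origins are discarded
```

Note that pairs at latent distance 1 are connected with probability 1 under this simplification, which loosely mimics the deterministic short‑range edges of the full model.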

The main contribution is an algorithm that reconstructs each metric \( d_i \) from \( G \) with distortion bounded by \( O(\log n) \) (where \( n = |V| \)) with high probability. The algorithm proceeds in two stages. First, it estimates the “label” of every edge, i.e., which category generated it. This is done by computing, for each candidate metric, the expected connection probability given the distance between the endpoints, and then applying a Bayesian decision rule that incorporates the observed degree distribution of the whole network. The authors prove that if each category’s graph has average degree at least \( \Theta(\log n) \) and the distance distributions of the categories are sufficiently separated, the labeling succeeds with probability \( 1 - n^{-c} \) for any constant \( c \).
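
The labeling idea can be caricatured as a maximum‑a‑posteriori choice over categories. The sketch below is a deliberately simplified stand‑in: it scores each category by prior times connection probability and omits the degree‑distribution term that the paper’s actual rule conditions on; `metrics` is a hypothetical list of candidate metric functions, one per category:

```python
def label_edge(u, v, metrics, alpha, priors=None):
    """Assign the edge (u, v) to the category whose model makes it most
    likely: argmax_i prior_i * d_i(u, v) ** -alpha.  A simplified sketch;
    the paper's decision rule also incorporates the observed degree
    distribution, which is omitted here."""
    k = len(metrics)
    priors = priors or [1.0 / k] * k
    scores = [priors[i] * metrics[i](u, v) ** -alpha for i in range(k)]
    return max(range(k), key=scores.__getitem__)
```

Under this rule an edge is attributed to whichever category places its endpoints closest (weighted by the prior), which is the intuition behind requiring the categories’ distance distributions to be well separated.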

Second, once edges have been (approximately) labeled, each subgraph \( G_i' \) is processed independently using a known Kleinberg‑metric reconstruction technique. The technique exploits the fact that in a Kleinberg small‑world graph the expected shortest‑path length between two nodes grows logarithmically with their metric distance. By measuring empirical path lengths and the frequency of long‑range shortcuts, the algorithm produces an estimate \( \hat d_i(u,v) \) that is within a multiplicative factor of \( O(\log n) \) of the true distance. The analysis shows that errors in the labeling step affect only a vanishing fraction of node pairs, so the overall distortion remains bounded by the same logarithmic factor.
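
A minimal sketch of the inversion idea: measure hop counts by BFS in the labeled subgraph, then exponentiate, since hop counts grow logarithmically with metric distance in this regime. The constant `base` is a hypothetical model‑dependent parameter, not one fixed by the paper:

```python
from collections import deque

def bfs_hops(adj, src, dst):
    """Unweighted shortest-path length (number of hops) in a labeled
    subgraph, given as an adjacency dict {node: iterable of neighbors}."""
    dist = {src: 0}
    q = deque([src])
    while q:
        x = q.popleft()
        if x == dst:
            return dist[x]
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return float("inf")

def estimate_distance(adj, u, v, base):
    """If expected hop counts grow like log_base of the metric distance,
    exponentiating the empirical hop count inverts the relation.  `base`
    is an assumed model-dependent constant for illustration only."""
    return base ** bfs_hops(adj, u, v)
```

Because the hop count only determines the distance up to constant factors in the exponent, an estimator of this shape is naturally accurate up to a multiplicative, rather than additive, error.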

Two central theorems formalize these claims. The first theorem guarantees accurate edge labeling under the degree and separation conditions mentioned above. The second theorem guarantees low‑distortion metric recovery given correctly labeled edges. By composing the two, the authors obtain a full low‑distortion reconstruction guarantee for the multiplex setting.
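
For concreteness, one standard way to measure multiplicative distortion empirically is worst‑case expansion times worst‑case contraction over node pairs; this is a common convention in metric embedding, not a definition quoted from the paper:

```python
def distortion(d_true, d_hat, pairs):
    """Empirical multiplicative distortion of a reconstructed metric:
    worst-case expansion times worst-case contraction over node pairs.
    Scale-invariant: uniformly rescaling d_hat yields distortion 1."""
    expansion = max(d_hat(u, v) / d_true(u, v) for u, v in pairs)
    contraction = max(d_true(u, v) / d_hat(u, v) for u, v in pairs)
    return expansion * contraction
```

In these terms, the composed guarantee says the reconstructed \( \hat d_i \) achieves distortion \( O(\log n) \) against the true \( d_i \) with high probability.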

Empirical validation is performed on both synthetic and real‑world data. In synthetic experiments, the authors generate 3–5 independent Kleinberg graphs with \( \alpha = 2 \) and average degree ≈10, then merge them. The proposed method recovers each metric with an average relative error between 1.2× and 1.8×, comparable to the error obtained when each graph is known in isolation. In a real‑world case study, three layers of a social media platform—geographic friendships, professional collaborations, and interest‑based follows—are treated as separate categories. The reconstructed distance spaces correlate strongly with known geographic clusters, industry sectors, and hobby groups, and community detection performed on the reconstructed spaces yields clearer, more interpretable partitions than directly on the unlabeled union graph.

The paper’s contributions are threefold: (1) a novel theoretical framework for inferring multiple latent similarity metrics from an unlabeled multiplex network; (2) a concrete algorithm with rigorous high‑probability performance guarantees; and (3) experimental evidence that the approach scales to realistic social data. Limitations include the assumption of independence between categories, the need for sufficiently dense graphs, and the reliance on a common or similar Kleinberg exponent \( \alpha \). Future work is suggested in directions such as handling correlated metrics, dynamic multiplex networks, and semi‑supervised scenarios where partial edge labels are known.