The Infinity Mirror Test for Analyzing the Robustness of Graph Generators

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Graph generators learn a model from a source graph in order to generate a new graph that shares many of its properties. Each learned model has implicit and explicit biases built in, and it is important to understand the assumptions made when generating a new graph. The differences between the new graph and the original, as measured by any number of graph properties, are important indicators of the biases inherent in any modeling task. But these critical differences are subtle and not immediately apparent using standard performance metrics. Therefore, we introduce the infinity mirror test for analyzing graph generator performance and robustness. This stress test operates by repeatedly and recursively fitting a model to itself. A perfect graph generator would show no deviation from the original or ideal graph; the implicit biases and assumptions baked into the various models, however, are exaggerated by the infinity mirror test, allowing for insights that were not available before. We show, via hundreds of experiments on six real-world graphs, that several common graph generators do degenerate in interesting and informative ways. We believe the observed degenerative patterns are clues to the future development of better graph models.


💡 Research Summary

The paper introduces the “Infinity Mirror Test,” a stress‑testing framework designed to assess the robustness of graph generation models by repeatedly fitting a model to its own output. Traditional evaluation of graph generators typically compares a single generated graph G₀ to the original graph G using global statistics such as degree distribution, clustering coefficient, or assortativity. While useful, these metrics often mask subtle biases that only become apparent after the model’s assumptions are applied repeatedly.
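The conventional single-step comparison described above amounts to computing a distance between graph statistics. As one minimal illustration (the function name and the choice of total-variation distance are ours, not the paper's), here is a stdlib sketch that compares two empirical degree distributions:

```python
from collections import Counter

def degree_tv_distance(degrees_a, degrees_b):
    """Total-variation distance between two empirical degree distributions.

    degrees_a / degrees_b are lists of node degrees. The choice of
    total-variation distance is illustrative, not taken from the paper."""
    pa = Counter(degrees_a)
    pb = Counter(degrees_b)
    na, nb = len(degrees_a), len(degrees_b)
    support = set(pa) | set(pb)
    # Counter returns 0 for missing keys, so disjoint supports are handled.
    return 0.5 * sum(abs(pa[d] / na - pb[d] / nb) for d in support)

print(degree_tv_distance([1, 2, 2, 3], [1, 2, 2, 3]))  # 0.0 (identical)
print(degree_tv_distance([1, 1], [5, 5]))              # 1.0 (disjoint)
```

A distance of 0 on such a statistic is exactly the kind of result that, per the paper, can mask biases only visible under repeated application.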

The proposed test works as follows: starting from a real-world network G, a chosen generator (e.g., Kronecker, Chung-Lu, BTER) learns a model Θ₁ and produces G₀¹. This generated graph is then treated as a new training instance; the same generator learns Θ₂ from G₀¹ and creates G₀², and so on. The process is iterated k times (the authors use k = 10). At each iteration i, the generated graph G₀ⁱ is compared to the original G using a suite of metrics that capture both global and local structure: the degree distribution, eigenvector centrality, the hop plot (reachability vs. hop count), and the Graphlet Correlation Distance (GCD), which measures the similarity of small-subgraph (graphlet) frequencies.
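The fit-generate recursion itself is simple to sketch. Below is a toy version using a plain O(n²) Chung-Lu-style sampler over adjacency dicts; the helper names and this minimal sampler are our illustration under stated assumptions, not the authors' implementation:

```python
import random

def fit_chung_lu(graph):
    """'Fit' step: the learned model Θ is just the observed degree sequence."""
    return {v: len(nbrs) for v, nbrs in graph.items()}

def generate_chung_lu(degrees, seed=None):
    """'Generate' step: connect u,v with probability w_u * w_v / sum(w).

    A naive O(n^2) Chung-Lu sampler -- fine for small illustrative graphs."""
    rng = random.Random(seed)
    nodes = list(degrees)
    total = sum(degrees.values()) or 1
    new_graph = {v: set() for v in nodes}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            p = min(1.0, degrees[u] * degrees[v] / total)
            if rng.random() < p:
                new_graph[u].add(v)
                new_graph[v].add(u)
    return new_graph

def infinity_mirror(graph, k=10, seed=0):
    """Repeatedly fit a model to its own output; return all k generations."""
    chain, current = [], graph
    for i in range(k):
        theta = fit_chung_lu(current)                  # Θ_{i+1} from generation i
        current = generate_chung_lu(theta, seed=seed + i)
        chain.append(current)
    return chain
```

Each element of the returned chain would then be compared back to the original G with the metric suite, which is where the degeneration patterns below become visible.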

Six diverse real networks are used: C. elegans neural network, a US power‑grid graph, the ArXiv GR‑QC co‑authorship network, an Internet router traffic graph, the Enron email exchange graph, and the DBLP co‑authorship graph. For each dataset the authors apply three generators (Kronecker, Chung‑Lu, BTER); ERGMs are excluded because they do not scale to the sizes considered.

Key findings:

  1. Degree Distribution – Across most datasets and generators the degree distribution remains remarkably stable over ten recursions, indicating that global scale‑free or heavy‑tailed characteristics are preserved. The notable exception is the power‑grid data, where all generators gradually lose high‑degree nodes, reflecting a mismatch with the grid’s relatively uniform degree profile.

  2. Eigenvector Centrality – This metric reveals generator‑specific degeneration. Chung‑Lu and BTER cause the ArXiv graph’s centrality distribution to flatten quickly, suggesting loss of “celebrity” nodes. Kronecker, however, retains centrality in the same dataset but fails dramatically on the power‑grid and router graphs, where its power‑law assumption leads to a rapid collapse of central nodes.

  3. Hop Plot – Hop‑plot curves illustrate how quickly vertex neighborhoods expand. For Chung‑Lu and BTER the ArXiv hop plot flattens with each recursion, indicating an artificial shortening of paths. Conversely, all generators underestimate the hop‑plot shape of the power‑grid and router graphs as early as the first generation, revealing a systematic inability to capture long‑range connectivity in those networks.

  4. Graphlet Correlation Distance – Local subgraph patterns expose degeneration invisible to global metrics. Even when degree distribution and clustering appear preserved, the frequency of triangles, squares, and 4‑cliques can drop sharply. This is especially pronounced for Chung‑Lu and BTER, whose generated graphs diverge from the original in GCD after only a few recursions, highlighting a loss of fine‑grained structural motifs.
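Of the four metrics, the hop plot is the easiest to make concrete: run a breadth-first search from every node and accumulate, for each hop count h, the fraction of ordered node pairs reachable within h hops. A stdlib sketch (the function name and adjacency-dict representation are our assumptions):

```python
from collections import deque

def hop_plot(graph):
    """Cumulative fraction of ordered node pairs reachable within h hops.

    graph: adjacency dict {node: set(neighbors)}. Returns {h: fraction}."""
    n = len(graph)
    counts = {}  # hop distance -> number of ordered pairs at that distance
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:                       # BFS from source
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if d > 0:
                counts[d] = counts.get(d, 0) + 1
    total_pairs = n * (n - 1)
    plot, cum = {}, 0
    for h in sorted(counts):
        cum += counts[h]
        plot[h] = cum / total_pairs
    return plot

# Path graph a-b-c: 4 ordered pairs at 1 hop, all 6 within 2 hops.
print(hop_plot({'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}}))
```

A generator that artificially shortens paths, as described in finding 3, shows up here as a curve that reaches 1.0 at smaller h than the original graph's.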

Overall, the Infinity Mirror Test demonstrates that repeated application amplifies the latent biases of each generator, turning subtle one‑step discrepancies into pronounced structural failures. The test thus provides a more stringent, multi‑scale benchmark than conventional single‑step evaluations.

The paper acknowledges limitations: ERGMs are omitted due to scalability, the recursion depth is capped at ten, and only classic probabilistic generators are examined. Future work is suggested in three directions: (i) extending the test to modern deep generative models such as GraphVAE, GraphGAN, and diffusion‑based generators; (ii) exploring longer recursion horizons to study long‑term error accumulation; and (iii) developing bias‑correction mechanisms, possibly via meta‑learning or ensemble approaches, informed by the degeneration patterns uncovered.

In conclusion, the Infinity Mirror Test offers a powerful diagnostic tool for graph generation research, revealing how well a model preserves both global topology and local motif structure under repeated self‑application. This insight is valuable for any domain—social network analysis, biological network modeling, infrastructure simulation—where synthetic graphs must faithfully reflect the nuanced structure of real‑world networks.

