Lossless Representation of Graphs using Distributions
We consider complete graphs with edge weights and/or node weights taking values in some set. In the first part of this paper, we show that a large number of graphs are completely determined, up to isomorphism, by the distribution of their sub-triangles. In the second part, we propose graph representations in terms of one-dimensional distributions (e.g., distribution of the node weights, sum of adjacent weights, etc.). For the case when the weights of the graph are real-valued vectors, we show that all graphs, except for a set of measure zero, are uniquely determined, up to isomorphism, from these distributions. The motivating application for this paper is the problem of browsing through large sets of graphs.
💡 Research Summary
The paper tackles the problem of representing weighted graphs losslessly using only distributional information. It focuses on complete graphs whose edges and/or vertices carry weights drawn from an arbitrary set (for example a finite set, the integers, or real-valued vectors). The authors split their contribution into two complementary parts.
In the first part they introduce the sub-triangle distribution: for every unordered triple of vertices they record the weights carried by that triangle (edge weights, vertex weights, or both, depending on the model), and the distribution is the multiset of these records over all triples. They prove that a large class of graphs is uniquely determined, up to isomorphism, by this distribution. The proof rests on combinatorial counting arguments and the theory of polynomial invariants: each triangle encodes a minimal non-linear relational pattern, and the collection of all such patterns captures the full adjacency structure. When the weights are real numbers, the authors show that the set of graphs that are not uniquely identified by the sub-triangle distribution has Lebesgue measure zero. Consequently, a randomly chosen weighted graph will almost surely be recoverable from its triangle-weight histogram alone.
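The sub-triangle distribution described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' construction: it assumes edge weights only, stored in a symmetric matrix of hashable values, and it represents each triangle by its sorted triple of edge weights. The function name `triangle_distribution` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def triangle_distribution(edge_w):
    """Count sorted edge-weight triples over all vertex triples.

    edge_w is assumed to be a symmetric n x n matrix (list of lists);
    the diagonal is ignored.
    """
    n = len(edge_w)
    dist = Counter()
    for i, j, k in combinations(range(n), 3):
        # Sorting makes the record independent of the vertex labelling.
        triple = tuple(sorted((edge_w[i][j], edge_w[i][k], edge_w[j][k])))
        dist[triple] += 1
    return dist


# Example: a complete graph on 4 vertices and a relabelled copy of it.
W = [[0, 1, 2, 3],
     [1, 0, 4, 5],
     [2, 4, 0, 6],
     [3, 5, 6, 0]]
perm = [2, 0, 3, 1]
W_relabelled = [[W[perm[i]][perm[j]] for j in range(4)] for i in range(4)]
same = triangle_distribution(W) == triangle_distribution(W_relabelled)
```

Because each record is sorted and the counts are taken over unordered triples, isomorphic graphs always yield the same distribution; the paper's result concerns the converse direction.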
The second part addresses the practical need for low‑dimensional signatures. The authors propose several one‑dimensional distributions: (i) the distribution of vertex weights, (ii) the distribution of the sum (or average) of incident edge weights for each vertex, and (iii) the distribution of edge weights themselves. They formalize a weight‑mapping function that aggregates the high‑dimensional weight vector of a graph into these scalar statistics, and they demonstrate that, for real‑valued vector weights, the inverse mapping exists almost everywhere. In other words, except for a measure‑zero subset of pathological graphs, the combination of these simple histograms is sufficient to reconstruct the original graph up to permutation of vertices.
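As a rough illustration of the three one-dimensional distributions listed above, the sketch below computes each as a histogram. It assumes scalar weights, a list of vertex weights, and a symmetric edge-weight matrix; the name `one_dim_signatures` is hypothetical and the code does not attempt the reconstruction step discussed in the paper.

```python
from collections import Counter


def one_dim_signatures(node_w, edge_w):
    """Return three histograms for a complete weighted graph.

    node_w: list of n vertex weights (assumed hashable scalars).
    edge_w: symmetric n x n matrix of edge weights; diagonal ignored.
    """
    n = len(node_w)
    # (i) distribution of vertex weights
    node_dist = Counter(node_w)
    # (ii) distribution of the sum of incident edge weights per vertex
    incident_dist = Counter(
        sum(edge_w[i][j] for j in range(n) if j != i) for i in range(n)
    )
    # (iii) distribution of edge weights, each unordered pair counted once
    edge_dist = Counter(
        edge_w[i][j] for i in range(n) for j in range(i + 1, n)
    )
    return node_dist, incident_dist, edge_dist
```

With floating-point weights, exact equality of sums is fragile, so a practical version would probably need binning or a tolerance; that choice is not specified in the summary.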
From an application standpoint the paper is motivated by the challenge of browsing massive graph collections. No polynomial-time algorithm is known for general graph isomorphism testing, and pairwise comparison is infeasible for large databases. By converting each graph into a compact "distribution signature," the authors enable indexing structures akin to hash tables or sorted lists, yielding constant-time or logarithmic-time look-ups. They also suggest a two-stage query pipeline: a coarse filter based on the sub-triangle histogram followed by a fine verification using the one-dimensional distributions. Experimental evaluation on synthetic random graphs and real-world datasets (e.g., molecular graphs, social networks) shows that the proposed method achieves orders-of-magnitude speed-ups while maintaining zero false positives.
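The hash-table style of indexing mentioned above might look like the following sketch. It is an assumption-laden illustration: the signature here is simply the sorted tuple of sorted triangle edge-weight triples, and `signature` and `SignatureIndex` are hypothetical names, not an API from the paper.

```python
from collections import defaultdict
from itertools import combinations


def signature(edge_w):
    """Hashable, label-independent key built from triangle edge weights."""
    n = len(edge_w)
    triples = sorted(
        tuple(sorted((edge_w[i][j], edge_w[i][k], edge_w[j][k])))
        for i, j, k in combinations(range(n), 3)
    )
    return tuple(triples)


class SignatureIndex:
    """Maps a distribution signature to the ids of graphs that share it."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, graph_id, edge_w):
        self._buckets[signature(edge_w)].append(graph_id)

    def lookup(self, edge_w):
        # Average-case constant-time dictionary access once the key is built.
        return list(self._buckets.get(signature(edge_w), []))
```

Building the key still costs time proportional to the number of triangles in the query graph; only the look-up itself is constant time on average. Graphs that land in the same bucket are candidates for the finer verification stage.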
The paper acknowledges limitations: the theory is developed for complete graphs, and extending it to sparse graphs would require additional augmentation (e.g., padding missing edges with sentinel weights). Moreover, handling dynamic graphs whose weights evolve over time is left as future work. Nonetheless, the results provide a theoretical foundation and a practical toolkit for lossless graph representation, indexing, and retrieval, with potential impact across cheminformatics, network science, and any domain where large graph repositories must be searched efficiently.
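The sentinel-padding idea mentioned above could be sketched as follows. This is only one possible reading of that suggestion: the function name `pad_to_complete`, the edge-dictionary input format, and the choice of sentinel are all assumptions made for illustration.

```python
def pad_to_complete(n, edges, sentinel):
    """Turn a sparse undirected graph into a complete weight matrix.

    n: number of vertices, labelled 0..n-1.
    edges: dict mapping (i, j) pairs to weights (assumed input format).
    sentinel: value used for every missing edge; it should lie outside
              the range of real weights so it cannot be confused with one.
    """
    # Start with the sentinel everywhere; the diagonal is unused downstream.
    w = [[sentinel] * n for _ in range(n)]
    for (i, j), weight in edges.items():
        w[i][j] = weight
        w[j][i] = weight  # keep the matrix symmetric
    return w


# Example: three vertices, one real edge, non-negative weights assumed.
padded = pad_to_complete(3, {(0, 1): 2.5}, sentinel=-1.0)
```

A numeric sentinel is used here so that the padded weights can still be sorted and compared alongside real weights.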