The entities of real-world networks are connected via different types of connections (i.e. layers). The task of link prediction in multiplex networks is to find missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based, automatic, general-purpose multiplex link prediction method, SimBins, is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances the prediction quality in the target layer by incorporating the effect of link overlap across layers. Applied to various datasets from different domains, SimBins proves to be robust and superior to the compared methods in the majority of the examined cases in terms of link prediction accuracy. Furthermore, it is discussed that SimBins adds only minor computational overhead to the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.
Deep Dive into SimBins: An information-theoretic approach to link prediction in real multiplex networks.
Link prediction has been an area of interest in complex networks research for over two decades [1], studying the relationships between entities (nodes) in data represented as graphs. The main goal is to reveal the underlying truth behind emerging or missing connections between node pairs of a network. Link prediction methods have a wide range of applications, from the discovery of latent and spurious interactions in biological networks (which is quite costly when performed with traditional methods) [2,3] to recommender systems [4,5] and better routing in wireless mobile networks [6]. Numerous perspectives have been adopted to attack the problem of link prediction.
Similarity-based methods measure how similar two nodes are as an indication of the likelihood of a link between them. This approach follows from the assumption that two nodes are similar if they share many common features [7]. Many node features stay hidden (or are kept hidden intentionally) in real networks. An interesting question is therefore: despite a considerable amount of network information being hidden, what fraction of the truth behind a process (e.g. link formation) can still be extracted from structural features alone? That is one of the main motivations for using structural similarity indices for link prediction. Several classifications of similarity measures have been proposed; among them, classification based on the locality of indices is of great importance. To name a few, Common Neighbors (CN) [1], Preferential Attachment (PA) [8], Adamic-Adar (AA) [9] and Resource Allocation (RA) [10] are popular local indices focusing mostly on nodes’ structural features, each with unique characteristics. Despite their simplicity, these indices are popular due to their low computational cost and reasonable prediction performance. On the other hand, global indices take features of the whole network structure into account, tolerating a higher computational cost, usually in favor of more accurate information. Take, for instance, the lengths of paths between pairs of nodes, on which the well-known Katz [11] index operates. Average Commute Time (ACT) [1] and PageRank [12] are some other notable global indices. Somewhere in between lie the quasi-local methods, such as the Local Path (LP) [13] index and Local Random Walk (LRW) [14], which inherit properties from both local and global indices: although they utilize some global network information, their computational complexity is kept comparable to that of local methods.
For more detailed information on these similarity indices (also described as unsupervised methods in the literature [15]), readers are advised to refer to [16].
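As an illustration of the indices named above, the following is a minimal Python sketch using the standard textbook definitions of CN, PA, AA and RA, together with a Katz score truncated at a maximum walk length. The function names, the toy graph, and the parameters `beta` and `max_len` are assumptions made for this example only; the truncation in particular is a simplification, since the Katz index sums over walks of all lengths.

```python
from math import log


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """CN: number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    """PA: product of the degrees of x and y."""
    return len(adj[x]) * len(adj[y])


def adamic_adar(adj, x, y):
    """AA: common neighbors weighted by 1 / log(degree).

    For x != y in a simple graph, a common neighbor has degree >= 2,
    so the logarithm is strictly positive.
    """
    return sum(1.0 / log(len(adj[z])) for z in adj[x] & adj[y])


def resource_allocation(adj, x, y):
    """RA: common neighbors weighted by 1 / degree."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


def truncated_katz(adj, x, y, beta=0.1, max_len=3):
    """Katz-style score: sum over l = 1..max_len of beta**l * (#walks of length l from x to y).

    Assumption: the sum is cut off at max_len to keep the sketch short.
    """
    walks = {x: 1}  # walks[v] = number of walks of the current length from x to v
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for v, count in walks.items():
            for w in adj[v]:
                nxt[w] = nxt.get(w, 0) + count
        walks = nxt
        score += beta ** length * walks.get(y, 0)
    return score


# Hypothetical toy graph used only for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)

cn = common_neighbors(adj, "a", "d")          # a and d share the neighbors b and c
pa = preferential_attachment(adj, "a", "d")   # degree(a) * degree(d)
aa = adamic_adar(adj, "a", "d")
ra = resource_allocation(adj, "a", "d")
katz = truncated_katz(adj, "a", "d", beta=0.1, max_len=3)
```

The local indices only touch the neighborhoods of the two end nodes, whereas the Katz-style score propagates walk counts through the graph, which reflects the cost difference between local and global indices discussed above.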
Some researchers have tackled the link prediction problem using ideas from information theory; in [17], the mutual information (MI) of common neighbors is incorporated to estimate the connection likelihood of a node pair. Moreover, the Path Entropy (PE) [18] similarity index has been proposed, which not only takes the quantity and length of paths between a pair of nodes into account, but also considers the entropy of those paths, which affects the connection likelihood of the pair.
From a coarse-grained point of view, supervised models of link prediction belong to a different class than the aforementioned unsupervised ones. They learn a set of parameters by processing the input graph and use certain models, such as feature-based prediction (HPLP [19]) and latent feature extraction (Matrix Factorization [15]). Representation learning has helped automate the whole process of link prediction, especially feature selection; one such method is node2vec [20]. Learning-based methods usually lead to better results than their similarity-based counterparts, but this does not mean that unsupervised models should be considered obsolete. On the one hand, unsupervised models provide clearer insight into the underlying characteristics of networks: take Common Neighbors (CN), for example, which reflects the high clustering property of networks [18], or the Adamic-Adar index, which is based on the size of common nodes’ neighborhoods [9]. On the other hand, unsupervised methods can require much less computational effort, which makes them suitable for online prediction without any costly training phase or feature selection process [21].
For many years, complex networks research focused on single-layer networks (simplex or monoplex), as in the work reviewed so far. The study of multi-layer (multiplex or heterogeneous) networks has gained the attention of researchers in the past few years. Refs. [22,23] provide noteworthy reviews of the history of multi-layer networks. Attempts at multi-layer link prediction are not abundant; some of them are introduced here.
Hidden geometric correlation in real multiplex networks [24] is an interesting work which depicts how multiplex networks are not just random combinations of single-layer networks. They employ these geometric correlations for trans-layer link prediction i.e. inc
…(Full text truncated)…