Background and Objective: Heterogeneous complex networks are large graphs consisting of different types of nodes and edges. Extracting knowledge from these networks is complicated, and their scale is steadily increasing, so scalable methods are required. Methods: In this paper, we introduce two distributed label propagation algorithms for heterogeneous networks, namely DHLP-1 and DHLP-2. Biological networks are one type of heterogeneous complex network. As a case study, we measured the efficiency of DHLP-1 and DHLP-2 on a biological network consisting of drugs, diseases, and targets. The task studied in this network is drug repositioning, but the algorithms are general methods that can also be applied to heterogeneous networks other than biological ones. Results: We compared the proposed algorithms with their non-distributed counterparts, MINProp and Heter-LP. The experiments showed that the proposed algorithms perform well in terms of running time and accuracy.
Complex networks are graphs with non-trivial and complicated structural features that do not occur in simple networks such as lattices or random graphs. Modeling different processes with complex networks has recently attracted the attention of the research community (Silva & Zhao, 2016).
Most real-world networks, such as social and biological networks, are modeled as heterogeneous networks. These consist of different types of nodes and edges, which makes knowledge discovery complicated and time-consuming. Compared with homogeneous networks, heterogeneous networks contain richer structural and semantic information, so mining them requires specific algorithms that differ from those designed for homogeneous networks. In addition, they grow much faster than homogeneous networks. Given this growth and the computational cost of processing such networks, algorithms and platforms that offer better performance and scalability are required.
Semi-supervised learning is one of several approaches to discovering knowledge in heterogeneous networks, and label propagation is among the well-known and successful methods in this domain (Silva & Zhao, 2016). Its strength lies in using both local and global features of the network for semi-supervised learning (Zhou, Bousquet, Lal, Weston, & Schölkopf, 2004). In label propagation, labels are first assigned to individual nodes of the network, and the label information is then repeatedly propagated to adjacent nodes. The propagation process eventually converges to a state that minimizes the objective function (Shahreza, Ghadiri, Mousavi, Varshosaz, & Green, 2017).
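As an illustration of the iterative propagation described above, the following minimal Python sketch applies the classic update of Zhou et al. (2004), f <- alpha * S * f + (1 - alpha) * y with S = D^(-1/2) W D^(-1/2), to a small homogeneous graph. It is not DHLP-1, DHLP-2, MINProp, or Heter-LP. The function name `propagate_labels`, the parameters `alpha`, `tol`, and `max_iter`, and the toy graph are assumptions made for illustration only.

```python
import math


def propagate_labels(adj, y, alpha=0.5, tol=1e-9, max_iter=1000):
    """Iterate f <- alpha * S f + (1 - alpha) * y until the change is below tol.

    adj: symmetric weight matrix as a list of lists (hypothetical toy input).
    y:   initial label scores, one per node.
    Returns the final scores and the number of iterations performed.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Symmetric normalization S = D^(-1/2) W D^(-1/2); zero weights stay zero,
    # which also avoids dividing by the degree of an isolated node.
    s = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(y)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new_f = [alpha * sum(s[i][j] * f[j] for j in range(n))
                 + (1 - alpha) * y[i] for i in range(n)]
        delta = max(abs(a - b) for a, b in zip(new_f, f))
        f = new_f
        if delta < tol:
            break
    return f, iteration


# Path graph 0 - 1 - 2 - 3 with a single labeled node (node 0).
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
y = [1.0, 0.0, 0.0, 0.0]
scores, iterations = propagate_labels(adj, y)
print(scores, iterations)
```

In this toy example the scores decrease with distance from the labeled node, which is the behavior that lets label propagation rank unlabeled nodes by their relatedness to labeled ones.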
Bulk Synchronous Parallel (BSP) (Valiant, 1990), a parallel programming model that underlies vertex-centric graph processing, was adopted by Malewicz et al. in the Pregel system. Google introduced Pregel, implemented in C/C++, for large-scale graph processing (Malewicz et al., 2010).
The computations in Pregel are carried out as a sequence of super-steps. In each super-step, every node that is involved in the computation 1) receives the values sent by adjacent nodes in the previous super-step, 2) updates its value and state, and 3) sends its updated value to its neighboring nodes, where it becomes available in the next super-step. The Apache Giraph framework is an iterative graph processing system inspired by Pregel. Giraph is an open-source platform that runs on the Hadoop distributed infrastructure and can carry out computations on billions of edges across thousands of machines (Martella & Shaposhnik). Giraph extends the original Pregel model with features such as out-of-core computation, master computation, shared aggregators, and combiners (Ching, Edunov, Kabiljo, Logothetis, & Muthukrishnan, 2015).
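The three phases of a super-step described above can be sketched with a small single-process simulation. The Python code below does not use the Pregel or Giraph APIs; it only mimics the receive, update, and send cycle using the maximum-value propagation example that is commonly used to explain the model. The function name `run_supersteps`, its parameters, and the toy graph are hypothetical.

```python
def run_supersteps(neighbors, values, max_supersteps=50):
    """Propagate the maximum value through a graph in BSP-style super-steps.

    neighbors: dict mapping each vertex to a list of adjacent vertices.
    values:    dict mapping each vertex to its initial numeric value.
    Returns the final values and the number of super-steps executed.
    """
    values = dict(values)
    # Super-step 0: every vertex sends its initial value to its neighbors.
    inbox = {v: [] for v in neighbors}
    for v, nbrs in neighbors.items():
        for u in nbrs:
            inbox[u].append(values[v])
    supersteps = 1
    # Continue while any message is in flight.
    while any(inbox.values()) and supersteps < max_supersteps:
        outbox = {v: [] for v in neighbors}
        for v, messages in inbox.items():
            if not messages:
                continue  # a vertex without messages stays inactive
            best = max(messages)            # 1) receive
            if best > values[v]:
                values[v] = best            # 2) update
                for u in neighbors[v]:      # 3) send
                    outbox[u].append(best)
        # Synchronization barrier: messages become visible only now.
        inbox = outbox
        supersteps += 1
    return values, supersteps


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
initial = {"a": 3, "b": 6, "c": 2, "d": 1}
final_values, steps = run_supersteps(graph, initial)
print(final_values, steps)
```

Swapping `inbox` for `outbox` only at the end of the loop body plays the role of the synchronization barrier: no vertex sees a message before the super-step in which it was sent has finished.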
In the present paper, because label propagation algorithms are iterative, we selected Apache Giraph as the distributed graph processing platform. It uses a vertex-centric programming model and is a good fit for iterative and scalable algorithms.
The MapReduce programming model and the Hadoop distributed processing framework are designed mainly for analyzing unstructured and tabular data (Farhangi, Ghadiri, Asadi, Nikbakht, & Pitre, 2017; Maleki, Azadani, & Ghadiri, 2016). However, they are not suitable for graph processing because of the iterative nature of graph algorithms (Martella & Shaposhnik). Moreover, experimental results have shown that iterative graph processing with BSP significantly outperforms MapReduce, especially for algorithms with many iterations and sparse communication (Kajdanowicz, Kazienko, & Indyk, 2014).
In addition to Giraph, other systems such as HaLoop, Twister, GraphLab, GraphX, and Grace have been introduced to process iterative graph algorithms (Ching et al., 2015; Martella & Shaposhnik).
Giraph has one or more of the following advantages over each of the systems mentioned above:
1) Finding the cause of a bug is faster and easier in Giraph.
2) Giraph is more memory-efficient than the other systems, so out-of-memory failures occur less frequently. Even when memory runs short, Giraph can still complete the computation thanks to its out-of-core feature.
3) Unlike some systems such as Pregel, Giraph is open-source.
4) It performs better than these systems on larger volumes of data.
5) In comparison with some other platforms, it has less network overhead.
Because many business and research institutions already use Hadoop, adopting other systems requires setting up a separate service for graph processing, whereas Giraph runs on Hadoop and needs no such service. Furthermore, Giraph is written in Java, whereas some of the other systems, such as GraphLab, are written in C/C++ and as a result