DHLP 1&2: Giraph based distributed label propagation algorithms on heterogeneous drug-related networks
Background and Objective: Heterogeneous complex networks are large graphs consisting of different types of nodes and edges. Extracting knowledge from these networks is complicated, and their scale is steadily increasing, so scalable methods are required. Methods: In this paper, we introduce two distributed label propagation algorithms for heterogeneous networks, DHLP-1 and DHLP-2. Biological networks are one type of heterogeneous complex network. As a case study, we measured the efficiency of the proposed DHLP-1 and DHLP-2 algorithms on a biological network consisting of drugs, diseases, and targets. The problem studied on this network is drug repositioning, but the algorithms can be used as general methods for heterogeneous networks other than biological ones. Results: We compared the proposed algorithms with their non-distributed counterparts, MINProp and Heter-LP. The experiments showed that the algorithms perform well in terms of running time and accuracy.
💡 Research Summary
The paper addresses the growing challenge of extracting knowledge from large heterogeneous complex networks, which consist of multiple node and edge types. Traditional label‑propagation methods, while effective on homogeneous graphs, struggle with scalability and memory consumption when applied to massive heterogeneous biomedical networks such as drug‑disease‑target graphs. To overcome these limitations, the authors introduce two distributed label‑propagation algorithms—DHLP‑1 and DHLP‑2—implemented on Apache Giraph, a Hadoop‑based graph‑processing framework that follows the Bulk‑Synchronous Parallel (BSP) model.
DHLP‑1 retains the classic label‑propagation scheme: each vertex maintains a label vector, receives label contributions from its neighbors in each super‑step, and updates its own vector by averaging. The novelty lies in parallelizing this process across a cluster, using Giraph’s message‑passing infrastructure and sparse vector representations to keep memory usage low. DHLP‑2 extends DHLP‑1 by explicitly modeling the heterogeneity of the network. It assigns a weight to each relationship type (e.g., drug‑target, disease‑target, drug‑disease) and performs propagation in a staged manner—first across one relation, then using the updated labels for the next relation. This staged, weighted approach allows the algorithm to emphasize biologically more informative edges and to mitigate the dilution of signal that can occur when all edge types are treated equally.
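The propagation step described above can be sketched in a few lines of single-machine Python. This is only an illustration of the averaging idea, not the authors' Giraph implementation: the function name `superstep`, the edge-list layout, and the `type_weight` table are assumptions made for the example, and the staged, relation-by-relation schedule attributed to DHLP-2 is reduced here to a per-edge-type weight.

```python
from collections import defaultdict


def superstep(edges, labels, type_weight=None):
    """One synchronous round: each vertex averages its neighbors' label vectors.

    edges:       list of (u, v, edge_type) undirected edges
    labels:      dict vertex -> sparse label vector (dict label -> score)
    type_weight: optional dict edge_type -> weight (hypothetical knob;
                 every edge type counts as 1.0 when omitted)
    """
    # "Message passing": collect (weight, neighbor vector) pairs per vertex.
    inbox = defaultdict(list)
    for u, v, etype in edges:
        w = 1.0 if type_weight is None else type_weight.get(etype, 1.0)
        inbox[u].append((w, labels.get(v, {})))
        inbox[v].append((w, labels.get(u, {})))

    new_labels = {}
    for vertex, current in labels.items():
        messages = inbox.get(vertex)
        if not messages:
            # Isolated vertex: nothing received, keep the old vector.
            new_labels[vertex] = dict(current)
            continue
        total_w = sum(w for w, _ in messages)
        acc = defaultdict(float)
        for w, vec in messages:
            for label, score in vec.items():
                acc[label] += w * score
        # Weighted average of the received vectors, kept sparse.
        new_labels[vertex] = {lab: s / total_w for lab, s in acc.items()}
    return new_labels


edges = [("d1", "t1", "drug-target"), ("t1", "s1", "disease-target")]
labels = {"d1": {"A": 1.0}, "t1": {}, "s1": {"B": 1.0}}
uniform = superstep(edges, labels)
weighted = superstep(edges, labels, {"drug-target": 3.0, "disease-target": 1.0})
```

With uniform weights the target `t1` ends up with equal scores for both labels; giving the drug-target relation three times the weight shifts its vector toward the drug's label. In a BSP framework the two loops would correspond roughly to the send and compute phases of one super-step.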
Both algorithms incorporate convergence monitoring at the master node by aggregating the L2 norm of label changes across the whole graph. The authors also filter out negligible messages before transmission, further reducing network traffic and improving runtime.
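As a rough illustration of these two mechanisms, the sketch below computes the global L2 norm of label changes between two rounds and drops near-zero entries from a sparse vector before it would be sent. The helper names `l2_change` and `prune` and the threshold value are assumptions for the example; the summary does not state the actual threshold or how the aggregation is wired up in Giraph.

```python
import math


def l2_change(old, new):
    """L2 norm of the difference between two label assignments.

    old, new: dict vertex -> sparse label vector (dict label -> score).
    Missing entries are treated as 0.0.
    """
    total = 0.0
    for vertex in set(old) | set(new):
        a = old.get(vertex, {})
        b = new.get(vertex, {})
        for label in set(a) | set(b):
            diff = b.get(label, 0.0) - a.get(label, 0.0)
            total += diff * diff
    return math.sqrt(total)


def prune(vector, threshold=1e-6):
    """Drop entries whose magnitude is below `threshold` (assumed value)."""
    return {k: v for k, v in vector.items() if abs(v) >= threshold}


change = l2_change({"a": {"x": 1.0}}, {"a": {"x": 0.0, "y": 1.0}})
small = prune({"x": 0.5, "y": 1e-9})
converged = change < 1e-4  # hypothetical stopping tolerance
```

In a distributed setting each worker would contribute its partial sum of squared differences and the master would take the square root and compare it with the tolerance; the pruning step trades a small approximation error for fewer and smaller messages.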
The experimental evaluation uses a real‑world biomedical network constructed from public resources such as DrugBank, the Comparative Toxicogenomics Database, and UniProt. The graph contains roughly 5,000 drugs, 3,000 diseases, and 4,000 protein targets, linked by about 30,000 heterogeneous edges. The task is drug repositioning: predicting novel drug‑disease associations. Performance is measured with Area Under the ROC Curve (AUC), Area Under the Precision‑Recall Curve (AUPR), and Top‑K hit rates, using known drug‑disease pairs as ground truth.
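For readers unfamiliar with these metrics, a minimal sketch of two of them follows. AUC is computed here through its pairwise-ranking interpretation (the probability that a known pair is scored above an unknown one, ties counted as one half). The Top-K hit rate is not defined in the summary, so the version below, the fraction of drugs with at least one known disease among their K highest-ranked predictions, is an assumption, as are the function names and toy data.

```python
def roc_auc(scores, truth):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def top_k_hit_rate(ranked, known, k=10):
    """Fraction of queries with at least one known item in their top k.

    ranked: dict query -> list of candidates, best first
    known:  dict query -> set of ground-truth candidates
    """
    if not ranked:
        raise ValueError("no queries to evaluate")
    hits = sum(
        1 for q, candidates in ranked.items()
        if set(candidates[:k]) & known.get(q, set())
    )
    return hits / len(ranked)


auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
rate = top_k_hit_rate(
    {"d1": ["s3", "s1", "s2"], "d2": ["s2", "s3", "s1"]},
    {"d1": {"s1"}, "d2": {"s1"}},
    k=2,
)
```

The quadratic pairwise loop is fine for toy inputs; on a network of the size described above, a rank-based computation or an established library routine would be the more practical choice.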
Results show that DHLP‑1 outperforms the non‑distributed MINProp algorithm by a factor of 5–7 in execution time while achieving comparable AUC (≈0.91) and Top‑10 hit rate (≈75%). DHLP‑2 surpasses Heter‑LP, the other state‑of‑the‑art non‑distributed method, delivering a 9‑fold speedup and slightly higher accuracy (AUC ≈0.93, Top‑10 hit rate ≈78%). The advantage of DHLP‑2 is most pronounced for rare diseases, where the weighted, staged propagation yields a noticeable boost in predictive power, suggesting that careful handling of edge‑type importance is crucial in heterogeneous settings.
The discussion highlights trade‑offs: DHLP‑1 is simpler and requires no prior knowledge of edge importance, making it a generic solution for any heterogeneous graph. DHLP‑2, while more complex and dependent on weight selection, offers superior accuracy, especially when certain relations are biologically more informative. Both methods assume a static graph; handling dynamic updates or streaming data would require additional mechanisms, which the authors identify as future work.
In conclusion, the study demonstrates that Giraph‑based distributed label propagation can scale to large heterogeneous biomedical networks without sacrificing predictive performance. DHLP‑1 provides a robust baseline, and DHLP‑2 showcases how incorporating heterogeneity-aware propagation strategies can further improve results. The authors propose extending the framework to even larger multi‑omics networks, integrating automatic weight learning, and supporting dynamic graph scenarios, thereby broadening the applicability of distributed label propagation beyond drug repositioning to any domain involving complex heterogeneous graphs.