Phi-Former: A Pairwise Hierarchical Approach for Compound-Protein Interactions Prediction
Drug discovery remains time-consuming, labor-intensive, and expensive, often requiring years and substantial investment per drug candidate. Predicting compound-protein interactions (CPIs) is a critical component in this process, enabling the identification of molecular interactions between drug candidates and target proteins. Recent deep learning methods have successfully modeled CPIs at the atomic level, achieving improved efficiency and accuracy over traditional energy-based approaches. However, these models do not always align with chemical realities, as molecular fragments (motifs or functional groups) typically serve as the primary units of biological recognition and binding. In this paper, we propose Phi-Former, a pairwise hierarchical interaction representation learning method that addresses this gap by incorporating the biological role of motifs in CPIs. Phi-Former represents compounds and proteins hierarchically and employs a pairwise pre-training framework to model interactions systematically across atom-atom, motif-motif, and atom-motif levels, reflecting how biological systems recognize molecular partners. We design intra-level and inter-level learning pipelines that make different interaction levels mutually beneficial. Experimental results demonstrate that Phi-Former achieves superior performance on CPI-related tasks. A case study shows that our method accurately identifies specific atoms or motifs activated in CPIs, providing interpretable model explanations. These insights may guide rational drug design and support precision medicine applications.
💡 Research Summary
Phi‑Former addresses a critical gap in compound‑protein interaction (CPI) prediction: most deep‑learning models operate solely at the atomic level, ignoring that functional groups (motifs) are the primary units of molecular recognition. The authors propose a hierarchical representation that simultaneously encodes compounds and proteins as both atom‑level graphs and motif‑level graphs. Motif graphs are constructed by breaking single torsional bonds in small molecules while preserving double, triple, and ring bonds; for proteins, the backbone is kept intact and only side‑chains are split, reflecting that most interactions occur at side‑chains. Each motif is represented by the centroid of its constituent atoms and an embedding averaged from those atoms.
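The motif aggregation step described above can be sketched as follows: each motif is summarized by the centroid of its atoms' 3D coordinates and the mean of their atom embeddings. This is a minimal illustration of that averaging scheme only; the names (`Atom`, `build_motif`) are illustrative, not from the paper's code, and the actual bond-breaking rules for extracting motifs are omitted.

```python
# Illustrative sketch (not the paper's implementation): a motif is
# represented by (a) the centroid of its constituent atoms' coordinates
# and (b) the element-wise average of their embeddings.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Atom:
    coord: Tuple[float, float, float]   # 3D position (x, y, z)
    embedding: List[float]              # learned atom feature vector

def build_motif(atoms: List[Atom]) -> Tuple[Tuple[float, ...], List[float]]:
    """Return (centroid, averaged embedding) for one motif."""
    n = len(atoms)
    centroid = tuple(sum(a.coord[i] for a in atoms) / n for i in range(3))
    dim = len(atoms[0].embedding)
    avg_emb = [sum(a.embedding[j] for a in atoms) / n for j in range(dim)]
    return centroid, avg_emb
```

In practice the atom embeddings would come from the atom-level encoder; here they are plain lists to keep the sketch self-contained.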
Both atom and motif graphs are processed by Graph Transformers, chosen for their ability to capture long‑range dependencies without over‑smoothing. Spatial Positional Encoding (SPE) converts Euclidean distances into Gaussian‑kernel features, which are added as attention bias to ensure rotation and translation invariance. The same encoder architecture is used for atoms, motifs, and a third encoder that treats motif embeddings as prior knowledge when encoding atoms, thereby enforcing a hierarchical constraint.
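A minimal sketch of the Gaussian-kernel distance encoding behind SPE: each pairwise Euclidean distance is expanded onto a set of Gaussian basis functions, yielding a feature that depends only on relative positions (hence rotation- and translation-invariant) and can be projected into an attention bias. The kernel centers and width below are illustrative choices, not the paper's hyperparameters.

```python
# Sketch of Spatial Positional Encoding (SPE) via Gaussian kernels.
# Centers/width are assumptions for illustration.
import math
from typing import List, Tuple

def euclidean(p: Tuple[float, float, float], q: Tuple[float, float, float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def gaussian_kernel_features(d: float, centers: List[float], width: float = 0.5) -> List[float]:
    """Expand a scalar distance onto K Gaussian basis functions."""
    return [math.exp(-((d - mu) ** 2) / (2 * width ** 2)) for mu in centers]

centers = [0.0, 1.0, 2.0, 3.0, 4.0]   # kernel centers (angstroms), illustrative
feat = gaussian_kernel_features(euclidean((0, 0, 0), (1.5, 0, 0)), centers)
```

Because only distances enter the encoding, rigidly rotating or translating the whole complex leaves the attention bias unchanged, which is the invariance property the paper relies on.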
The pre‑training stage employs a distance‑based self‑supervised learning (SSL) task. Inter‑molecular distances between compound and protein nodes are masked, and the model must predict them. Three loss terms are defined: (1) atomic intra‑loss (L_V) predicts masked atom‑atom distances, (2) motif intra‑loss (L_M) predicts masked motif‑motif distances, and (3) an inter‑level loss (L_V|M) predicts atom‑atom distances conditioned on the already‑known motif‑motif distances. This trio of losses forces consistency across hierarchical levels, embodying the chemical intuition that motifs constrain atomic interactions.
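The three-term objective can be sketched as a weighted sum of distance-regression losses. This is a hedged reconstruction: mean-squared error and unit weights are assumptions, and the conditioning in L_V|M is represented only by which predictions are passed in; in the actual model the conditioning happens inside the encoder that treats motif distances as prior knowledge.

```python
# Sketch (assumed MSE form) of the pre-training objective:
# L = w_v * L_V + w_m * L_M + w_vm * L_V|M
from typing import List

def mse(pred: List[float], target: List[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def pretraining_loss(atom_pred: List[float], atom_true: List[float],       # masked atom-atom distances
                     motif_pred: List[float], motif_true: List[float],     # masked motif-motif distances
                     cond_pred: List[float], cond_true: List[float],       # atom-atom distances predicted
                     w_v: float = 1.0, w_m: float = 1.0,                   # given known motif distances
                     w_vm: float = 1.0) -> float:
    l_v = mse(atom_pred, atom_true)        # atomic intra-loss L_V
    l_m = mse(motif_pred, motif_true)      # motif intra-loss L_M
    l_v_given_m = mse(cond_pred, cond_true)  # inter-level loss L_V|M
    return w_v * l_v + w_m * l_m + w_vm * l_v_given_m
```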
During fine‑tuning, the pretrained encoders generate representations that are concatenated and fed to a simple linear head to regress binding affinity (pK_a). Experiments on PDBBind‑2019 (training) and CASF‑2016 (testing) show that Phi‑Former achieves an RMSE of 1.159 and Pearson correlation of 0.846, outperforming strong baselines such as SS‑GNN, OnionNet, and a UniMol‑based pretrained model. Ablation studies confirm that the hierarchical pre‑training contributes significantly to the performance gain.
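The fine-tuning head is simple enough to sketch directly: the pretrained compound and protein representations are concatenated and passed through one linear layer to regress affinity. The weights below are placeholders; in the actual pipeline they are learned end-to-end from the pretrained encoders' outputs.

```python
# Sketch of the fine-tuning regression head: concatenation + linear layer.
# Weights/bias are illustrative placeholders, not trained values.
from typing import List

def linear_head(compound_repr: List[float], protein_repr: List[float],
                weights: List[float], bias: float) -> float:
    """Concatenate the two representations and apply a single linear layer."""
    x = compound_repr + protein_repr              # simple concatenation
    assert len(x) == len(weights), "weight dim must match concatenated input"
    return sum(w * xi for w, xi in zip(weights, x)) + bias
```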
A case study on π‑π stacking illustrates the model’s chemical consistency. An atom‑only version fails to detect the interaction, predicting an unrealistic 6 Å separation, whereas the motif‑constrained Phi‑Former correctly predicts the close aromatic stacking. Another example (Figure 1) demonstrates that the model avoids implausible C‑O hydrogen bonds by recognizing the underlying carbonyl‑pyridine motif affinity.
The paper’s contributions are threefold: (1) a novel pairwise hierarchical pre‑training framework for CPI, (2) intra‑ and inter‑level loss functions that enable mutually beneficial learning across atom and motif representations, and (3) comprehensive evaluation showing superior predictive performance and interpretability. Limitations include reliance on rule‑based motif extraction, dependence on pre‑computed 3D structures, and current focus on CPI only. Future work aims to automate motif discovery, scale to larger datasets, and extend the approach to drug‑drug and protein‑protein interaction tasks, potentially providing a unified framework for diverse molecular interaction predictions.