Disentangling multispecific antibody function with graph neural networks
Multispecific antibodies offer transformative therapeutic potential by engaging multiple epitopes simultaneously, yet their efficacy is an emergent property governed by complex molecular architectures. Rational design is often bottlenecked by the inability to predict how subtle changes in domain topology influence functional outcomes, a challenge exacerbated by the scarcity of comprehensive experimental data. Here, we introduce a computational framework to address part of this gap. First, we present a generative method for creating large-scale, realistic synthetic functional landscapes that capture non-linear interactions where biological activity depends on domain connectivity. Second, we propose a graph neural network architecture that explicitly encodes these topological constraints, distinguishing between format configurations that appear identical to sequence-only models. We demonstrate that this model, trained on synthetic landscapes, recapitulates complex functional properties and, via transfer learning, has the potential to achieve high predictive accuracy on limited biological datasets. We showcase the model’s utility by optimizing trade-offs between efficacy and toxicity in trispecific T-cell engagers and retrieving optimal common light chains. This work provides a robust benchmarking environment for disentangling the combinatorial complexity of multispecifics, accelerating the design of next-generation therapeutics.
💡 Research Summary
The paper tackles two major bottlenecks in the design of multispecific antibodies (msAbs): the emergent, non‑additive relationship between molecular architecture and biological activity, and the scarcity of comprehensive functional data for diverse formats. To address these challenges, the authors introduce a two‑stage computational framework.
First, they develop “Synapse,” a synthetic data generator that creates large‑scale, realistic functional landscapes. Domain sequences are sampled from a position‑aware statistical model derived from the Observed Antibody Space (OAS), ensuring that the amino‑acid composition mirrors real antibodies. Each domain receives an intrinsic fitness score via an extended Ehrlich function, which captures non‑linear, epistatic effects. The msAb is then represented as a graph where nodes are domains and edges encode physical connectivity. A global read‑out function aggregates weighted contributions from 1‑hop and 2‑hop neighbors, thereby modeling phenomena such as steric shielding, avidity gating, and positional effects. This pipeline yields datasets ranging from 10² to 10⁵ samples and spanning five levels of architectural complexity (monospecific to pentaspecific).
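The neighbour-weighted read-out described above can be sketched in a few lines. This is an illustrative toy, not Synapse's actual formulation: the node names, the linear weighting, and the coefficients `w1`/`w2` are assumptions made for demonstration. The key idea it captures is that each domain's contribution is modulated by its 1-hop and 2-hop topological context, so two molecules built from identical domains can score differently depending on connectivity.

```python
def readout(fitness, edges, w1=0.5, w2=0.25):
    """Aggregate a molecule-level score from per-domain intrinsic fitness.

    fitness: dict mapping domain name -> intrinsic fitness score
    edges:   list of (u, v) pairs encoding physical domain connectivity
    w1, w2:  illustrative weights for 1-hop and 2-hop neighbour contributions
    """
    adj = {n: set() for n in fitness}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for n, f in fitness.items():
        one_hop = adj[n]
        two_hop = {m for nb in one_hop for m in adj[nb]} - one_hop - {n}
        # Contextual term mimics steric shielding / positional effects:
        # a domain's effective contribution depends on what surrounds it.
        context = (w1 * sum(fitness[m] for m in one_hop)
                   + w2 * sum(fitness[m] for m in two_hop))
        total += f + context
    return total

# Same three domains, two different chain orderings (hypothetical scores).
fit = {"A": 1.0, "B": 0.5, "C": 0.2}
chain_abc = [("A", "B"), ("B", "C")]  # B in the middle
chain_acb = [("A", "C"), ("C", "B")]  # C in the middle
```

Evaluating `readout(fit, chain_abc)` and `readout(fit, chain_acb)` yields different values, which is exactly the property a sequence-only (set-based) model cannot express.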
Second, the authors train a Graph Neural Network (GNN) to predict a scalar functional value (e.g., EC₅₀) from the antibody graph. They choose a Graph Isomorphism Network (GIN), which matches the expressive power of the 1‑WL test and performs iterative message passing to embed both domain features and topological context. As a baseline, a Multi‑Layer Perceptron (MLP) receives the same domain features but treats the molecule as an unordered set, thus ignoring connectivity. Comparative experiments reveal that for simple formats (mono‑ and bispecific) both models perform similarly because there is essentially only one possible topology. However, for trispecific, tetraspecific, and pentaspecific formats, the GIN consistently outperforms the MLP, especially as the training set grows. The MLP plateaus because it cannot distinguish different arrangements of identical domains, whereas the GIN leverages adjacency information to resolve these differences. An additional MLP variant that receives a one‑hot encoding of the format can match GIN performance on the training set but lacks the ability to generalize to unseen architectures, highlighting the advantage of a truly graph‑based model.
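The GIN/MLP contrast can be made concrete with a minimal sketch. This is not the paper's trained model: the single-neuron `tanh` update stands in for the learned MLP inside each GIN layer, and the domain names and feature values are hypothetical. What the sketch does preserve is the structural point: a set-based baseline that sums node features is blind to connectivity, while the GIN update `h'_v = φ((1 + ε)·h_v + Σ_neighbours h_u)` separates two arrangements of identical domains.

```python
import math

def build_adj(nodes, edges):
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def gin_layer(h, adj, eps=0.1):
    # GIN node update: phi((1 + eps) * h_v + sum of neighbour features).
    # phi is a toy stand-in (tanh); a real GIN learns a small MLP here.
    return {v: math.tanh((1 + eps) * h[v] + sum(h[u] for u in adj[v]))
            for v in h}

def graph_embed(h, edges, layers=2):
    adj = build_adj(h.keys(), edges)
    for _ in range(layers):
        h = gin_layer(h, adj)
    return sum(h.values())  # permutation-invariant sum read-out

# Identical domain features, two different connectivities (hypothetical).
feats = {"anti_CD3": 1.0, "anti_TAA1": 0.5, "anti_TAA2": 0.2}
topo_a = [("anti_CD3", "anti_TAA1"), ("anti_TAA1", "anti_TAA2")]
topo_b = [("anti_CD3", "anti_TAA2"), ("anti_TAA2", "anti_TAA1")]

set_view = sum(feats.values())  # MLP-as-set baseline: identical for both
```

Here `graph_embed(feats, topo_a)` and `graph_embed(feats, topo_b)` differ, while the set-based `set_view` is the same number for both topologies, mirroring why the MLP plateaus on higher-order formats.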
The authors further demonstrate transfer learning. A GIN pretrained on the large synthetic dataset is fine‑tuned on a small real‑world trispecific T‑cell engager (TCE) dataset. With as few as 10–50 experimental samples, the fine‑tuned model achieves markedly lower mean‑squared error than a model trained from scratch, showing that the synthetic pretraining provides useful inductive bias for data‑sparse regimes.
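The value of warm-starting from synthetic pretraining can be illustrated with a deliberately tiny regression, far simpler than the paper's GIN but exhibiting the same mechanism: a model pretrained on a large related dataset starts closer to the real-data optimum, so a handful of fine-tuning steps on 10 experimental samples beats training from scratch. All quantities (the slopes 2.0 and 2.2, step counts, learning rate) are illustrative assumptions.

```python
import random

def sgd_fit(data, w0, epochs, lr=0.01):
    # Minimal one-parameter regression y ~ w * x, trained by SGD.
    w = w0
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

random.seed(0)
# Large synthetic landscape (true slope 2.0) vs. small "experimental"
# dataset (true slope 2.2) -- related but not identical tasks.
synthetic = [(x, 2.0 * x) for x in (random.uniform(-1, 1) for _ in range(1000))]
real      = [(x, 2.2 * x) for x in (random.uniform(-1, 1) for _ in range(10))]

w_pre     = sgd_fit(synthetic, w0=0.0, epochs=5)   # "pretraining"
w_tuned   = sgd_fit(real, w0=w_pre, epochs=3)      # warm start (fine-tuning)
w_scratch = sgd_fit(real, w0=0.0, epochs=3)        # cold start

def mse(w):
    return sum((w * x - y) ** 2 for x, y in real) / len(real)
```

With identical fine-tuning budgets, `mse(w_tuned)` comes out well below `mse(w_scratch)`: both residuals shrink by the same contraction factor, but the warm start begins near the optimum, which is the inductive-bias effect the paper reports in the 10-50-sample regime.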
Two application case studies illustrate practical utility. In the first, the model is used to optimize Dual‑Antigen Targeted T‑cell Engagers (DAT‑TCEs) that bind CD3, Ly6E, and B7‑H4. By evaluating two topological variants—one with the high‑affinity B7‑H4 domain placed proximally (safe) and another placed distally (toxic)—the GNN predicts activity distributions and guides selection of the safer configuration while preserving potency. In the second case, the framework is applied to the discovery of a common light chain across multiple heavy‑chain variants. The GNN predicts the overall functional score for each light‑chain candidate in combination with the heavy chains, enabling rapid identification of the optimal shared light chain.
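The common-light-chain search amounts to scoring every (light chain, heavy chain) pairing with the trained predictor and selecting the light chain whose aggregate score is best. The sketch below uses a toy compatibility function in place of the GNN, hypothetical candidate names and feature scores, and a conservative worst-pairing (max-min) aggregate; all of these are illustrative choices, not the paper's exact procedure.

```python
def predict(light, heavy):
    # Toy stand-in for the trained GNN: compatibility between hypothetical
    # per-chain feature scores (closer scores -> higher predicted function).
    return 1.0 - abs(light[1] - heavy[1])

# Hypothetical candidates: (name, feature score).
lights  = [("LC1", 0.4), ("LC2", 0.6), ("LC3", 0.9)]
heavies = [("HC_CD3", 0.5), ("HC_Ly6E", 0.7), ("HC_B7H4", 0.6)]

# Pick the light chain whose *worst* pairing across all heavy chains is best,
# so the shared chain works acceptably with every arm of the molecule.
best = max(lights, key=lambda lc: min(predict(lc, hc) for hc in heavies))
```

In this toy instance the max-min criterion selects `LC2`, whose score sits between all three heavy chains. Swapping in a mean aggregate instead of `min` trades robustness for average performance; which is appropriate depends on how tolerant the format is to one weak pairing.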
Overall, the paper contributes (1) a scalable synthetic data generation method that captures realistic, non‑linear structure‑function relationships; (2) a topology‑aware GNN architecture that outperforms sequence‑only baselines on complex msAb formats; and (3) a transfer‑learning strategy that leverages synthetic pretraining to achieve high predictive accuracy on limited experimental data. By explicitly modeling domain connectivity, the approach overcomes the limitations of traditional sequence‑based models and provides a versatile tool for rational msAb design. Future directions could include integrating experimental structural data (e.g., Cryo‑EM, SAXS) to enrich edge attributes, extending the framework to reinforcement‑learning‑driven architecture search, and applying the methodology to other modular biologics such as cytokine‑antibody fusions or bispecific nanobodies.