Antilope - A Lagrangian Relaxation Approach to the de novo Peptide Sequencing Problem
Peptide sequencing from mass spectrometry data is a key step in proteome research. De novo sequencing in particular, the identification of a peptide from its spectrum alone, remains a challenge even for state-of-the-art algorithmic approaches. In this paper we present Antilope, a new fast and flexible approach based on mathematical programming. It builds on the spectrum graph model and works with a variety of scoring schemes. Antilope combines Lagrangian relaxation for solving an integer linear programming formulation with an adaptation of Yen’s k shortest paths algorithm. It shows a significant improvement in running time compared to mixed integer optimization and runs at a speed comparable to other state-of-the-art tools. We also implemented a generic probabilistic scoring scheme that can be trained automatically on a dataset of annotated spectra and is independent of the mass spectrometer type. Evaluations on benchmark data show that Antilope is competitive with the popular state-of-the-art programs PepNovo and NovoHMM in terms of both run time and accuracy. Furthermore, it offers increased flexibility in the number of considered ion types. Antilope will be freely available as part of the open source proteomics library OpenMS.
💡 Research Summary
The paper introduces Antilope, a novel de novo peptide sequencing algorithm that leverages a spectrum‑graph representation together with an integer linear programming (ILP) formulation, solved efficiently via Lagrangian relaxation and a modified Yen’s k‑shortest‑paths algorithm. In the spectrum graph each MS/MS peak generates multiple nodes corresponding to different ion types (e.g., b, y, a, x). Nodes that represent mutually exclusive interpretations of the same peak are linked by undirected “complementary” edges, while directed edges encode possible amino‑acid mass differences. The sequencing task becomes the search for an s‑t path that does not contain any complementary node pair – an antisymmetric path.
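As a rough illustration of the spectrum graph described above, the following Python sketch builds nodes, directed amino-acid edges, and conflict pairs for singly charged b- and y-ion interpretations only. The function name `build_spectrum_graph`, the reduced residue table, and the tolerance are assumptions made for this example; they are not taken from Antilope or OpenMS.

```python
from itertools import combinations

# Monoisotopic residue masses in Da for a few amino acids (illustrative subset).
RESIDUES = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841, "L": 113.08406}
PROTON, WATER = 1.00728, 18.01056


def build_spectrum_graph(peaks, parent_mass, tol=0.02):
    """Hypothetical helper: return (nodes, edges, conflicts).

    peaks       -- m/z values of singly charged fragment peaks
    parent_mass -- neutral peptide mass (residues plus one water)
    nodes       -- (prefix_mass, peak_index, ion_type), sorted by prefix mass
    edges       -- (i, j, residue): nodes i and j differ by one residue mass
    conflicts   -- (i, j): both nodes interpret the same peak
    """
    residue_total = parent_mass - WATER
    nodes = [(0.0, None, "source"), (residue_total, None, "sink")]
    for idx, mz in enumerate(peaks):
        as_b = mz - PROTON                            # peak read as a b-ion
        as_y = residue_total - (mz - PROTON - WATER)  # peak read as a y-ion
        for mass, ion in ((as_b, "b"), (as_y, "y")):
            if tol < mass < residue_total - tol:
                nodes.append((mass, idx, ion))
    nodes.sort(key=lambda node: node[0])

    pairs = list(combinations(range(len(nodes)), 2))
    edges = [(i, j, aa) for i, j in pairs for aa, m in RESIDUES.items()
             if abs(nodes[j][0] - nodes[i][0] - m) <= tol]
    conflicts = [(i, j) for i, j in pairs
                 if nodes[i][1] is not None and nodes[i][1] == nodes[j][1]]
    return nodes, edges, conflicts
```

For the b1 and b2 peaks of the peptide GAS, the resulting graph contains the source-to-sink path G, A, S, while each peak's b and y interpretations end up in one conflict pair. An antisymmetric path may use at most one node from each such pair.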
The authors formulate this as an ILP: binary variables are assigned to directed edges, the objective maximizes the sum of edge scores, and constraints enforce flow conservation, unique entry/exit at the source and sink, and the antisymmetry condition (at most one node from each complementary pair may be selected). Compared with the earlier PILOT system, Antilope omits node variables and postpones exact‑mass filtering to a later stage, thereby keeping the model compact while preserving flexibility for arbitrary ion types.
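One way to write down a model of this kind is sketched below. The notation is an assumption of this sketch and may differ from the paper's: $w_e$ is the score of edge $e$, $\delta^-(v)$ and $\delta^+(v)$ are the edges entering and leaving node $v$, $s$ and $t$ are source and sink, and $C$ is the set of complementary node pairs.

```latex
\begin{align*}
\max \quad & \sum_{e \in E} w_e \, x_e \\
\text{s.t.} \quad
& \sum_{e \in \delta^+(s)} x_e = 1, \qquad \sum_{e \in \delta^-(t)} x_e = 1, \\
& \sum_{e \in \delta^-(v)} x_e = \sum_{e \in \delta^+(v)} x_e
  && \forall v \in V \setminus \{s, t\}, \\
& \sum_{e \in \delta^-(u)} x_e + \sum_{e \in \delta^-(v)} x_e \le 1
  && \forall \{u, v\} \in C, \\
& x_e \in \{0, 1\} && \forall e \in E.
\end{align*}
```

Because a node is visited exactly when one of its incoming edges is selected, the antisymmetry condition can be stated on edge variables alone, which is consistent with a model that has no node variables.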
Because solving the ILP is NP‑hard, the authors apply Lagrangian relaxation to the antisymmetry constraints. The relaxation moves these constraints into the objective as penalty terms weighted by Lagrange multipliers. The resulting Lagrangian subproblem is a pure longest‑path problem on a directed acyclic graph, which dynamic programming solves in time linear in the number of nodes and edges. The multipliers are updated iteratively with a subgradient method, yielding progressively tighter upper bounds along with feasible integer solutions.
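The relaxation loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Antilope's implementation: nodes are numbered 0 to n-1 in topological order with source 0 and sink n-1, there are no parallel edges, the sink is reachable, and the diminishing step size `step0 / (it + 1)` is just one common choice. The names `longest_path` and `lagrangian_relaxation` are hypothetical.

```python
def longest_path(n, edges, node_penalty):
    """Longest 0 -> n-1 path in a DAG; entering node v costs node_penalty[v]."""
    best = [float("-inf")] * n
    pred = [None] * n
    best[0] = 0.0
    # Sorting by tail node processes edges in topological order (u < v).
    for u, v, w in sorted(edges):
        cand = best[u] + w - node_penalty[v]
        if cand > best[v]:
            best[v], pred[v] = cand, u
    path, v = [n - 1], n - 1
    while pred[v] is not None:
        v = pred[v]
        path.append(v)
    return best[n - 1], path[::-1]


def lagrangian_relaxation(n, edges, conflicts, iterations=50, step0=1.0):
    """Subgradient loop; returns (lower bound, upper bound, best feasible path)."""
    lam = [0.0] * len(conflicts)            # one multiplier per conflict pair
    weight = {(u, v): w for u, v, w in edges}
    best_upper, best_lower, best_path = float("inf"), float("-inf"), None
    for it in range(iterations):
        penalty = [0.0] * n
        for (a, b), l in zip(conflicts, lam):
            penalty[a] += l
            penalty[b] += l
        value, path = longest_path(n, edges, penalty)
        best_upper = min(best_upper, value + sum(lam))   # Lagrangian bound
        on_path = set(path)
        # Subgradient: +1 if both nodes of a pair are used, 0 if one, -1 if none.
        grad = [(a in on_path) + (b in on_path) - 1 for a, b in conflicts]
        if all(g <= 0 for g in grad):       # antisymmetric, hence feasible
            score = sum(weight[u, v] for u, v in zip(path, path[1:]))
            if score > best_lower:
                best_lower, best_path = score, path
        if best_upper - best_lower < 1e-9:
            break
        step = step0 / (it + 1)
        lam = [max(0.0, l + step * g) for l, g in zip(lam, grad)]
    return best_lower, best_upper, best_path
```

On a toy graph where the highest-scoring path uses both nodes of a conflict pair, the multipliers grow until a conflict-free path wins the subproblem. A gap between the two bounds can remain, since the sketch does nothing beyond the subgradient iterations to close it.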
To provide not just a single best peptide but the top‑k candidates, Antilope integrates a variant of Yen’s algorithm. Starting from the optimal relaxed path, the algorithm systematically generates alternative antisymmetric paths by temporarily forbidding edges or nodes and re‑optimizing the relaxed problem. This yields a ranked list of k high‑scoring peptide sequences.
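The enumeration idea can be illustrated with a plain Yen-style deviation scheme. Note the swap: the sketch below does not re-optimize the relaxed problem as described above. It enumerates all source-to-sink paths of a DAG in order of decreasing score and simply skips those that violate a conflict pair. It assumes topologically numbered nodes, no parallel edges, and hypothetical function names.

```python
import heapq


def best_path(n, adj, start, banned_edges):
    """Longest start -> n-1 path avoiding banned edges, or None if none exists."""
    best = {start: (0.0, None)}
    for u in range(start, n):               # topological order
        if u not in best:
            continue
        for v, w in adj.get(u, []):
            if (u, v) in banned_edges:
                continue
            cand = best[u][0] + w
            if v not in best or cand > best[v][0]:
                best[v] = (cand, u)
    if n - 1 not in best:
        return None
    path, v = [], n - 1
    while v is not None:
        path.append(v)
        v = best[v][1]
    return best[n - 1][0], path[::-1]


def k_best_antisymmetric_paths(n, edges, conflicts, k):
    """Return up to k conflict-free paths as (score, path), best first."""
    adj, weight = {}, {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        weight[u, v] = w
    first = best_path(n, adj, 0, set())
    if first is None:
        return []
    heap, seen = [(-first[0], first[1])], {tuple(first[1])}
    popped, results = [], []
    while heap and len(results) < k:
        neg_score, path = heapq.heappop(heap)
        popped.append(path)
        used = set(path)
        if not any(a in used and b in used for a, b in conflicts):
            results.append((-neg_score, path))
        for i in range(len(path) - 1):      # deviate at every non-sink node
            root = path[:i + 1]
            # Forbid the next edge of every popped path sharing this root.
            banned = {(p[i], p[i + 1]) for p in popped if p[:i + 1] == root}
            spur = best_path(n, adj, path[i], banned)
            if spur is None:
                continue
            cand = root[:-1] + spur[1]
            if tuple(cand) not in seen:
                seen.add(tuple(cand))
                score = sum(weight[u, v] for u, v in zip(cand, cand[1:]))
                heapq.heappush(heap, (-score, cand))
    return results
```

Filtering after enumeration is simple but can waste work when many high-scoring paths are infeasible, which is presumably one reason to build the antisymmetry condition into the path search itself.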
Scoring is split into node and edge components. Node scores combine peak intensity with a probabilistic model of ion‑type occurrence; edge scores incorporate amino‑acid prior probabilities and intensity patterns. Crucially, the authors propose an automatically trainable probabilistic scoring scheme based on a Bayesian network that can be learned from any set of annotated spectra, making the method independent of instrument type. Users may either supply a custom network structure or let the system infer it directly from data, and the model can be extended to include ion intensity and cleavage position information similar to PepNovo.
Experimental evaluation on benchmark datasets and diverse real‑world spectra compares Antilope with PepNovo, NovoHMM, LutefiskXP, and PILOT. Runtime results show that Antilope is 5–10× faster than a direct MILP solver and comparable to PepNovo, while maintaining or slightly improving top‑1 and top‑5 identification rates. The advantage is most pronounced when non‑standard ion types (e.g., a‑ions, x‑ions, multiply charged fragments) are present, where Antilope’s flexible graph construction yields better coverage. The method also scales well with the number of ion types, demonstrating its extensibility.
In summary, Antilope advances de novo peptide sequencing by (1) formulating the problem as an ILP that naturally accommodates arbitrary ion types, (2) applying Lagrangian relaxation to obtain a subproblem that is solvable in polynomial time, (3) extending the solution to the k‑best peptides via a Yen‑based enumeration, and (4) providing a data‑driven, instrument‑agnostic scoring framework. According to the abstract, the software will be freely available as part of the open‑source OpenMS library, which should ease integration into existing proteomics pipelines and encourage further methodological development.