Life over the past four billion years has been shaped by proteins and their capacity to assemble into three-dimensional conformations. Protein sequence alignments have been the enabling technology for exploring the evolution and functional adaptation of proteins across the tree of life. Recent advances in scaling the prediction of three-dimensional protein structures from primary sequence alone have revealed that different modes of conservation and function operate at the sequence and structure levels. Optimal protein alignment approaches often ignore this difference in conservation patterns, and the functional change underlying it, which can emerge in suboptimal alignment configurations. We introduce EMERALD-UI, an open-source interactive web application designed to reveal unexplored biology by visualising stable structural conformations or protein regions hidden in the suboptimal alignment space.
Availability: EMERALD-UI is available at https://algbio.github.io/emerald-ui/.
Contact: hdrost001@dundee.ac.uk or alexandru.tomescu@helsinki.fi.
When aligning two protein sequences, a user-defined (and often heuristic) cost matrix determines which alignment configuration the respective alignment algorithm deems optimal (1)(2)(3)(4). For two divergent protein sequences, for example, this could mean that thousands of alternative alignment configurations exist, each with a lower total cost-matrix score. The core assumption of this optimal alignment approach is that, owing to well-established heuristics, the optimal score corresponds to a biologically meaningful alignment configuration, especially when dealing with similar proteins with greater than 90% sequence identity (5). Although this assumption seems reasonable, when extrapolated to millions of pairwise comparisons it can introduce a systematic optimal alignment bias, where alternative alignment configurations would have been more reasonable when biologically assessed (6). We recently introduced the open-source command line tool EMERALD, which allows users to systematically explore the suboptimal alignment space and sample alignment-safe positions from the theoretically true alignment configuration (6). Using EMERALD, users can extract individual alignment-configuration positions that are stable across all or a proportion of suboptimal alignments. However, this tool lacks a comprehensive user interface and the analytics capacity to reveal alternative biology or protein structural information hidden in suboptimal alignment configurations. EMERALD-UI solves this problem by providing an interactive web application that visualises the suboptimal alignment space between two protein sequences and maps alignment-stable positions onto the three-dimensional structure of the respective protein folds. We envision that users will be able to trial how alternative (suboptimal) alignment configurations translate into differences in protein structural or protein-protein interaction space.
Especially when applied to proteins involved in protein complexes, we foresee new biological and functional insights emerging from new (and previously unrecognised) configurations of protein complexes that arise from the systematic exploration of the suboptimal alignment space. Particularly when comparing highly divergent proteins, the suboptimal alignment space will reveal an evolutionary repertoire of structural conformations, which can also be investigated in the context of their role in biological pathways, kinetic efficiency (e.g. enzymes) or drug target predictions.
In (6), we introduced the notion of alignment-safety for pairwise protein sequence alignments: EMERALD explicitly calculates the space of alternative (suboptimal) alignments. From this space, a set of safety windows can be extracted as contiguous alignment intervals that are shared across many (or all) alternative alignments. In detail, EMERALD considers a standard directed acyclic graph (DAG) representing a dynamic-programming alignment (using Needleman-Wunsch alignment with affine gaps) (7). To also represent suboptimal alignments, this DAG incorporates all edges contained in an alignment of cost at most ∆ from the optimal one. In this graph, EMERALD computes alignment intervals (safety windows) that are common to a proportion α of all alignments in this DAG (α = 1 means common to all alignments). These safety windows are projected back onto the coordinates of the two protein sequences. In a large-scale experiment using the SwissProt database, we showed that, especially for divergent sequence pairs, safety windows can retain a high fraction of structurally stable protein residues while covering only a small fraction of the sequence (6). This result supports the idea that exploring the suboptimal alignment space reveals structurally significant conserved regions that a single optimal alignment would otherwise miss. In practice, the two EMERALD parameters ∆ and α impact alignment-safety computations as follows:
• Increasing ∆ enlarges the suboptimal alignment space. Moreover, as ∆ increases, safety windows typically become shorter and/or fewer (and overall “safe coverage” tends to drop), because more alternative alignments create more disagreement about which residue pairings are robust.
• Increasing α makes the safety criterion stricter, meaning that safety windows tend to become shorter or fewer but carry higher confidence. Decreasing α, on the other hand, makes the criterion more permissive, yielding longer or more windows that are less stable.
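The roles of ∆ and α described above can be illustrated with a minimal Python sketch. It is not EMERALD's implementation and rests on several simplifying assumptions: it uses toy match/mismatch scores and linear gap penalties instead of a substitution matrix with affine gaps, it maximises a score (so "within ∆ of the optimum" means a score of at least the best score minus ∆), and it reports individual aligned residue pairs rather than maximal contiguous safety windows. The function name `safe_pairs` and all of its parameters are hypothetical.

```python
from fractions import Fraction


def safe_pairs(a, b, delta=0, alpha=1.0, match=1, mismatch=-1, gap=-1):
    """Return aligned index pairs (i, j) that lie on at least an alpha
    fraction of all paths in the delta-suboptimal alignment DAG.

    Hypothetical helper: toy scores and linear gaps, not EMERALD's code.
    """
    n, m = len(a), len(b)
    neg = float("-inf")

    def sub(i, j):
        return match if a[i] == b[j] else mismatch

    # fwd[i][j]: best score of aligning a[:i] with b[:j]
    fwd = [[neg] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i:
                fwd[i][j] = max(fwd[i][j], fwd[i - 1][j] + gap)
            if j:
                fwd[i][j] = max(fwd[i][j], fwd[i][j - 1] + gap)
            if i and j:
                fwd[i][j] = max(fwd[i][j], fwd[i - 1][j - 1] + sub(i - 1, j - 1))

    # bwd[i][j]: best score of aligning a[i:] with b[j:]
    bwd = [[neg] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n:
                bwd[i][j] = max(bwd[i][j], bwd[i + 1][j] + gap)
            if j < m:
                bwd[i][j] = max(bwd[i][j], bwd[i][j + 1] + gap)
            if i < n and j < m:
                bwd[i][j] = max(bwd[i][j], bwd[i + 1][j + 1] + sub(i, j))

    best = fwd[n][m]

    def successors(i, j):
        """Yield end nodes of edges leaving (i, j) that lie on some
        alignment scoring at least best - delta."""
        candidates = []
        if i < n:
            candidates.append((i + 1, j, gap))
        if j < m:
            candidates.append((i, j + 1, gap))
        if i < n and j < m:
            candidates.append((i + 1, j + 1, sub(i, j)))
        for i2, j2, w in candidates:
            if fwd[i][j] + w + bwd[i2][j2] >= best - delta:
                yield i2, j2

    # Count paths source -> node and node -> sink inside the kept subgraph.
    from_src = [[0] * (m + 1) for _ in range(n + 1)]
    from_src[0][0] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            for i2, j2 in successors(i, j):
                from_src[i2][j2] += from_src[i][j]

    to_sink = [[0] * (m + 1) for _ in range(n + 1)]
    to_sink[n][m] = 1
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if (i, j) != (n, m):
                to_sink[i][j] = sum(to_sink[i2][j2] for i2, j2 in successors(i, j))

    total = from_src[n][m]  # number of alignments (paths) in the DAG
    result = []
    for i in range(n):
        for j in range(m):
            if fwd[i][j] + sub(i, j) + bwd[i + 1][j + 1] >= best - delta:
                through = from_src[i][j] * to_sink[i + 1][j + 1]
                if Fraction(through, total) >= alpha:
                    result.append((i, j))
    return result
```

Under these toy settings, `safe_pairs("AA", "A", alpha=1.0)` returns an empty list because the two optimal alignments disagree on which `A` is paired, whereas `alpha=0.5` returns both candidate pairings. Likewise, aligning `"ACGT"` with itself yields all four diagonal pairs at `delta=0` but none once `delta` is large enough to admit every edge, which mirrors the trends described in the two bullet points above.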
The new EMERALD-UI web application provides an interactive and responsive user interface for generating and exploring the suboptimal protein alignment space when comparing two protein sequences. Moreover, while EMERALD as a standalone command line tool requires user installation, EMERALD-UI runs natively in any modern web browser without manual installation. The UI takes two user-defined protein sequences as input and internally runs EM