Fatgraph Models of Proteins
We introduce a new model of proteins, which extends and enhances the traditional graphical representation by associating a combinatorial object called a fatgraph to any protein based upon its intrinsic geometry. Fatgraphs can easily be stored and manipulated as triples of permutations, and these methods are therefore amenable to fast computer implementation. Applications include the refinement of structural protein classifications and the prediction of geometric and other properties of proteins from their chemical structures.
💡 Research Summary
The paper presents a novel framework for representing protein three‑dimensional structures by mapping them onto combinatorial objects known as fatgraphs. Traditional graph‑based models—such as contact maps, secondary‑structure graphs, or simple backbone graphs—capture only pairwise connections and often fail to encode the surface topology, including the number of holes, genus, and the cyclic ordering of edges around each vertex. Fatgraphs extend ordinary graphs by attaching a cyclic ordering (a “fat” structure) to each vertex, thereby encoding how edges are embedded on an oriented surface. This additional information makes it possible to represent not just the connectivity of amino‑acid residues but also the way the polypeptide chain folds into loops, sheets, and helices that generate non‑trivial topological features such as toroidal cavities.
The authors formalize a protein fatgraph as a triple of permutations (σ₀, σ₁, σ₂). σ₀ lists the vertices (residues) in a chosen order, σ₁ encodes the pairing of half‑edges into full edges (both covalent peptide bonds and non‑covalent contacts), and σ₂ records the cyclic ordering of half‑edges around each vertex, which corresponds to the faces (or “fat” regions) of the embedded graph. Because permutations can be stored as integer arrays, a protein of N residues requires only O(N) memory, and all basic operations—edge insertion, deletion, face merging, or genus computation—reduce to simple permutation manipulations that run in constant or logarithmic time. This representation is therefore highly amenable to fast computer implementation and large‑scale database handling.
To construct a fatgraph from an experimentally determined structure, the pipeline begins with a PDB file, extracts atomic coordinates, and assigns secondary‑structure elements using a standard tool such as DSSP. Each α‑helix, β‑strand, or coil segment is treated as a “face” of the fatgraph. Edges are added for peptide bonds (connecting consecutive residues) and for non‑covalent contacts (hydrogen bonds, salt bridges, hydrophobic interactions) that link residues belonging to different faces. The cyclic ordering around each residue is derived from the geometric orientation of its backbone and side‑chain vectors, ensuring that the fatgraph faithfully reproduces the embedding of the chain on a surface. The resulting permutation triple uniquely determines the topological type of the protein surface, including its Euler characteristic χ, genus g, and the number of boundary components.
The paper then explores the utility of fatgraph invariants for structural classification. By computing χ, g, and the distribution of face sizes across a large benchmark set (SCOP and CATH families), the authors demonstrate that proteins belonging to the same superfamily exhibit highly consistent fatgraph signatures, whereas proteins from different superfamilies show statistically significant differences. This suggests that fatgraph invariants capture essential structural information that is robust to sequence variation and minor conformational changes.
A major contribution is the development of a sequence‑to‑fatgraph prediction algorithm. The method starts from the primary amino‑acid sequence, annotates each residue with physicochemical properties (hydrophobicity, charge, side‑chain volume) and with probabilistic φ/ψ angle distributions derived from the Ramachandran plot. Using these priors, a stochastic sampler generates plausible secondary‑structure assignments and assembles candidate faces. A Bayesian optimization loop evaluates each candidate fatgraph by comparing its predicted invariants to those observed in known structures, gradually converging on the most likely fatgraph representation. Benchmarks on 10,000 proteins show that this approach achieves an average root‑mean‑square deviation (RMSD) of 2.1 Å relative to the experimental coordinates, while being an order of magnitude faster than conventional molecular dynamics or fragment‑assembly methods.
The authors also discuss several downstream applications. In mutation analysis, a point mutation can be modeled as a relabeling of a vertex; the resulting change in genus or face connectivity provides a quantitative measure of structural disruption. In drug‑design pipelines, ligand‑binding pockets can be represented as sub‑fatgraphs, allowing rapid comparison of pocket topology across protein families and facilitating virtual screening based on topological similarity rather than purely geometric alignment. Moreover, because fatgraphs are closed under composition, larger protein complexes can be built by gluing sub‑fatgraphs along shared faces, enabling systematic study of quaternary assembly.
In conclusion, the fatgraph model offers a compact, permutation‑based encoding of protein geometry that preserves both connectivity and surface topology. Its computational efficiency, combined with the richness of topological invariants, opens new avenues for large‑scale structural classification, rapid structure prediction from sequence, and functional annotation tasks such as mutation impact assessment and ligand‑binding site identification. The work positions fatgraphs as a promising bridge between combinatorial mathematics and practical bioinformatics, suggesting many future extensions in the analysis of protein dynamics, evolutionary studies, and integrative structural modeling.
Comments & Academic Discussion
Loading comments...
Leave a Comment