PCEvo: Path-Consistent Molecular Representation via Virtual Evolutionary

PCEvo: Path-Consistent Molecular Representation via Virtual Evolutionary
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Molecular representation learning aims to learn vector embeddings that capture molecular structure and geometry, thereby enabling property prediction and downstream scientific applications. In many AI for science tasks, labeled data are expensive to obtain and therefore limited in availability. Under the few-shot setting, models trained with scarce supervision often learn brittle structure-property relationships, resulting in substantially higher prediction errors and reduced generalization to unseen molecules. To address this limitation, we propose PCEvo, a path-consistent representation method that learns from virtual paths through dynamic structural evolution. PCEvo enumerates multiple chemically feasible edit paths between retrieved similar molecular pairs under topological dependency constraints. It transforms the labels of the two molecules into stepwise supervision along each virtual evolutionary path. It introduces a path-consistency objective that enforces prediction invariance across alternative paths connecting the same two molecules. Comprehensive experiments on the QM9 and MoleculeNet datasets demonstrate that PCEvo substantially improves the few-shot generalization performance of baseline methods. The code is available at https://anonymous.4open.science/r/PCEvo-4BF2.


💡 Research Summary

PCEvo (Path‑Consistent Evolutionary) introduces a novel paradigm for molecular representation learning that explicitly models the incremental structural changes between similar molecules as a sequence of chemically feasible edit operations. Traditional approaches treat each molecule as an isolated static graph and train a model to directly predict properties from the final structure. This static formulation suffers from severe data inefficiency, especially in few‑shot regimes where labeled samples are scarce, leading to brittle structure–property mappings and poor generalization to unseen chemical space.

The core idea of PCEvo is to generate multiple “virtual evolutionary paths” connecting a source molecule Gₛ to a target molecule Gₜ. For each pair, the method first retrieves the top‑K nearest neighbors of Gₜ using Tanimoto similarity on extended‑connectivity fingerprints, ensuring that the two molecules share a substantial scaffold. It then solves a maximum common subgraph (MCS) problem to obtain an atom‑level alignment π and extracts a minimal set of graph edit operations S (add/remove/replace atoms, add/remove/change bonds) that transform Gₛ into Gₜ.

Because the order of applying these edits matters chemically (e.g., a bond cannot be added before its constituent atoms exist), the edits are organized into a directed acyclic dependency graph G_dep. An edge oᵢ → oⱼ indicates that operation oⱼ depends on the completion of oᵢ. Any topological sort of G_dep yields a valid edit sequence, i.e., a feasible evolutionary path τ. PCEvo randomly samples up to Pₘₐₓ distinct topological sorts per pair, providing combinatorial data augmentation: the same net property change Δy = yₜ – yₛ must be learned from many different intermediate trajectories.

Each edit operation oₜ is embedded into a continuous vector ψ(oₜ) =


Comments & Academic Discussion

Loading comments...

Leave a Comment