Circuits, Features, and Heuristics in Molecular Transformers
Reading time: 4 minutes
...
📝 Original Info
Title: Circuits, Features, and Heuristics in Molecular Transformers
ArXiv ID: 2512.09757
Date: 2025-12-10
Authors: Kristof Varadi, Mark Marosi, Peter Antal
📝 Abstract
Transformers generate valid and diverse chemical structures, but little is known about the mechanisms that enable these models to capture the rules of molecular representation. We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules to reveal the computational structure underlying their capabilities across multiple levels of abstraction. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints. Using sparse autoencoders (SAEs), we extract feature dictionaries associated with chemically relevant activation patterns. We validate our findings on downstream tasks and find that mechanistic insights can translate to predictive performance in various practical settings.
💡 Deep Analysis
📄 Full Content
Preprint. Work in progress.
CIRCUITS, FEATURES, AND HEURISTICS IN MOLECULAR TRANSFORMERS
Kristof Varadi∗, Mark Marosi, Peter Antal
Budapest University of Technology and Economics
ABSTRACT
Transformers generate valid and diverse chemical structures, but little is known about the mechanisms that enable these models to capture the rules of molecular representation. We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules to reveal the computational structure underlying their capabilities across multiple levels of abstraction. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints. Using sparse autoencoders (SAEs), we extract feature dictionaries associated with chemically relevant activation patterns. We validate our findings on downstream tasks and find that mechanistic insights can translate to predictive performance in various practical settings.
1 INTRODUCTION
Designing molecules that satisfy multiple pharmacological and physicochemical constraints is a core challenge in drug development. While many architectures directly encode chemical invariants, adaptations of sequential architectures such as transformers (Vaswani et al., 2017; Radford et al., 2019) must induce these rules purely from data. Despite the lack of inductive bias, these systems have proven effective for de novo design (Honda et al., 2019; Chithrananda et al., 2020; Bagal et al., 2022; Ross et al., 2022). Such flexibility allows transformers to model complex distributions, making them popular for generative tasks where exploring specific regions of the computationally intractable chemical space is required.
The simplicity of transformers also makes them opaque. Seminal work on molecular transformers identified patterns related to syntactic and chemical validity (Bagal et al., 2022), yet a direct understanding of the processes involved in maintaining molecular representation is lacking. Modeling structured dependencies is a subject of active investigation in the broader transformer interpretability community (Hahn, 2020; Ebrahimi et al., 2020; Weiss et al., 2021; Yao et al., 2023), and distinguishing whether molecular transformers rely solely on memorization or have induced similar generalized algorithms for aspects of chemical validity is critical for applications in the life sciences.
This work presents a mechanistic analysis of an autoregressive transformer trained on commercially available drug-like molecules. We perform experiments to locate computational units that facilitate molecular generation.

The results presented show that molecular transformers can develop a number of specialized mechanisms involved in maintaining syntactic and chemical validity during inference. Using dictionary learning, we obtain human-understandable patterns that correspond to various chemical substructures. Performance on downstream tasks suggests that sparse activations produce useful features for molecular property prediction.
1.1 CONTRIBUTIONS
We train an autoregressive transformer on millions of molecules and conduct a mechanistic analysis spanning multiple levels of abstraction. We summarize our main findings below.
∗Correspondence to kristofvaradi@edu.bme.hu
1. Ring and Branch Circuits: We identify attention heads that implement matching of SMILES grammar elements, including a multi-head circuit for ring closures and a specialized branch-balancing head, and show via targeted ablations that some of these heads are load-bearing for validity (Section 3.1; a minimal ablation sketch follows this list).
2. Valence Tracking: We find a distributed linear representation of valence capacity in the residual stream and demonstrate that interventions along this direction monotonically modulate bond-order predictions at decision points in a chemically consistent way (Section 3.2).
3. Chemical Features: Using SAEs, we extract sparse feature dictionaries that align with chemically meaningful activation patterns and develop a fragment-screening pipeline that links latents to functional groups with reduced manual inspection (Section 4; see the SAE sketch after this list).
4. Practical Applications: We show that SAE-derived features improve transformer embeddings on several property prediction tasks and are competitive with, and often complementary to, ECFP fingerprints and supervised baselines (Section 5.1). We inject a small number of SAE features into transformer activations at inference, resulting in increased structural similarity and often steering samples towards desired regions of chemical space (Section 5.2; see the steering sketch after this list).
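To make the ablation methodology in item 1 concrete, the sketch below zero-ablates a single attention head with a forward hook and compares the fraction of generated SMILES that RDKit can parse before and after. The module path (`model.blocks[LAYER].attn`), the `sample_smiles` callable, and the head layout are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: zero-ablate one attention head and measure the change in the
# fraction of generated SMILES that RDKit can parse. Module paths and the
# sampling interface are hypothetical placeholders.
import torch
from rdkit import Chem


def zero_head_hook(head: int, n_heads: int):
    """Forward hook that zeroes one head's slice of the attention output."""
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        d_head = out.shape[-1] // n_heads          # heads assumed concatenated on last dim
        out = out.clone()
        out[..., head * d_head:(head + 1) * d_head] = 0.0
        return (out, *output[1:]) if isinstance(output, tuple) else out
    return hook


@torch.no_grad()
def validity_rate(sample_smiles, n_samples: int = 256) -> float:
    """sample_smiles: callable returning one decoded SMILES string per call."""
    mols = [Chem.MolFromSmiles(sample_smiles()) for _ in range(n_samples)]
    return sum(m is not None for m in mols) / n_samples


# Usage (assuming a GPT-style `model` exposing its layers as `model.blocks`):
#   baseline = validity_rate(sample_smiles)
#   handle = model.blocks[LAYER].attn.register_forward_hook(zero_head_hook(HEAD, N_HEADS))
#   ablated = validity_rate(sample_smiles)
#   handle.remove()
#   print(f"validity: {baseline:.2%} -> {ablated:.2%}")
```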
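Item 3 relies on sparse dictionary learning over model activations. The following sketch shows the standard SAE formulation commonly used in this line of work (ReLU encoder, linear decoder, L1 sparsity penalty); the dictionary width, penalty coefficient, and training details are assumptions rather than the paper's reported configuration.

```python
# Minimal sketch of a sparse autoencoder for dictionary learning on
# residual-stream activations: ReLU encoder, linear decoder, L1 penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        f = F.relu(self.encoder(x))      # sparse feature activations
        x_hat = self.decoder(f)          # reconstruction of the activation
        return x_hat, f


def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 term that keeps few features active.
    return F.mse_loss(x_hat, x) + l1_coeff * f.abs().sum(-1).mean()


# Usage on cached activations `acts` of shape (n_tokens, d_model):
#   sae = SparseAutoencoder(d_model=acts.shape[-1], d_dict=8 * acts.shape[-1])
#   x_hat, f = sae(acts)
#   loss = sae_loss(acts, x_hat, f)
```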
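Items 2 and 4 both amount to adding a direction to the residual stream at inference, whether a valence direction or an SAE decoder column. Below is a minimal activation-addition sketch; the hook point, the scaling coefficient, and the source of the direction are illustrative assumptions rather than the paper's specific procedure.

```python
# Minimal sketch of an activation-addition intervention: add a scaled unit
# direction to a block's residual-stream output during generation.
import torch


def steering_hook(direction: torch.Tensor, alpha: float):
    unit = direction / direction.norm()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * unit.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook


# Usage (assuming a trained `sae` as in the sketch above and a GPT-style `model`):
#   direction = sae.decoder.weight[:, FEATURE_IDX].detach()     # (d_model,) decoder column
#   handle = model.blocks[LAYER].register_forward_hook(steering_hook(direction, alpha=4.0))
#   steered_smiles = [sample_smiles() for _ in range(64)]       # steered generations
#   handle.remove()
```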
2 PRELIMINARIES
2.1 BACKGROUND
Molecular Representation. Transformers require a consistent, machine-readable representation of chemical structures. We employ the Simplified Molecular-Input Line-Entry System (SMILES) (Weininger, 1988), which encodes the complete molecular topology as ASCII strings through a formal grammar. Although a num