Circuits, Features, and Heuristics in Molecular Transformers
Reading time: 4 minutes
...
📝 Original Info
Title: Circuits, Features, and Heuristics in Molecular Transformers
ArXiv ID: 2512.09757
Date: 2025-12-10
Authors: Kristof Varadi, Mark Marosi, Peter Antal
📝 Abstract
Transformers generate valid and diverse chemical structures, but little is known about the mechanisms that enable these models to capture the rules of molecular representation. We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules to reveal the computational structure underlying their capabilities across multiple levels of abstraction. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints. Using sparse autoencoders (SAEs), we extract feature dictionaries associated with chemically relevant activation patterns. We validate our findings on downstream tasks and find that mechanistic insights can translate to predictive performance in various practical settings.
💡 Deep Analysis
📄 Full Content
Preprint. Work in progress.
CIRCUITS, FEATURES, AND HEURISTICS IN MOLECULAR TRANSFORMERS
Kristof Varadi∗, Mark Marosi, Peter Antal
Budapest University of Technology and Economics
ABSTRACT
Transformers generate valid and diverse chemical structures, but little is known about the mechanisms that enable these models to capture the rules of molecular representation. We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules to reveal the computational structure underlying their capabilities across multiple levels of abstraction. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints. Using sparse autoencoders (SAEs), we extract feature dictionaries associated with chemically relevant activation patterns. We validate our findings on downstream tasks and find that mechanistic insights can translate to predictive performance in various practical settings.
1 INTRODUCTION
Designing molecules that satisfy multiple pharmacological and physicochemical constraints is a core challenge in drug development. While many architectures directly encode chemical invariants, adaptations of sequential architectures such as transformers (Vaswani et al., 2017; Radford et al., 2019) must induce these rules purely from data. Despite the lack of inductive bias, these systems have proven effective for de novo design (Honda et al., 2019; Chithrananda et al., 2020; Bagal et al., 2022; Ross et al., 2022). Such flexibility allows transformers to model complex distributions, making them popular for generative tasks where exploring specific regions of the computationally intractable chemical space is required.
The simplicity of transformers also makes them opaque. Seminal work on molecular transformers identified patterns related to syntactic and chemical validity (Bagal et al., 2022), yet a direct understanding of the processes involved in maintaining molecular representation is lacking. Modeling structured dependencies is a subject of active investigation in the broader transformer interpretability community (Hahn, 2020; Ebrahimi et al., 2020; Weiss et al., 2021; Yao et al., 2023), and distinguishing whether molecular transformers rely solely on memorization or have induced similar generalized algorithms for aspects of chemical validity is critical for applications in the life sciences.
This work presents a mechanistic analysis of an autoregressive transformer trained on commercially available drug-like molecules. We perform experiments to locate computational units that facilitate molecular generation.

The results presented show that molecular transformers can develop a number of specialized mechanisms involved in maintaining syntactic and chemical validity during inference. Using dictionary learning, we obtain human-understandable patterns that correspond to various chemical substructures. Performance on downstream tasks suggests that sparse activations produce useful features for molecular property prediction.
1.1 CONTRIBUTIONS
We train an autoregressive transformer on millions of molecules and conduct a mechanistic analysis spanning multiple levels of abstraction. We summarize our main findings below.
∗Correspondence to kristofvaradi@edu.bme.hu
1. Ring and Branch Circuits: We identify attention heads that implement matching of SMILES grammar elements, including a multi-head circuit for ring closures and a specialized branch-balancing head, and show via targeted ablations that some of these heads are load-bearing for validity (Section 3.1; a minimal ablation sketch follows this list).
2. Valence Tracking: We find a distributed linear representation of valence capacity in the residual stream and demonstrate that interventions along this direction monotonically modulate bond-order predictions at decision points in a chemically consistent way (Section 3.2).
3. Chemical Features: Using SAEs, we extract sparse feature dictionaries that align with chemically meaningful activation patterns and develop a fragment-screening pipeline that links latents to functional groups with reduced manual inspection (Section 4; see the SAE sketch after this list).
4. Practical Applications: We show that SAE-derived features improve transformer embeddings on several property prediction tasks and are competitive with, and often complementary to, ECFP fingerprints and supervised baselines (Section 5.1). We inject a small number of SAE features into transformer activations at inference, resulting in increased structural similarity and often steering samples towards desired regions of chemical space (Section 5.2; see the steering sketch after this list).
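To make the ablation methodology in item 1 concrete, the sketch below zero-ablates a single attention head with a forward hook and compares the fraction of generated SMILES that RDKit can parse before and after. The module path (`model.blocks[LAYER].attn`), the `sample_smiles` callable, and the head layout are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: zero-ablate one attention head and measure the change in the
# fraction of generated SMILES that RDKit can parse. Module paths and the
# sampling interface are hypothetical placeholders.
import torch
from rdkit import Chem


def zero_head_hook(head: int, n_heads: int):
    """Forward hook that zeroes one head's slice of the attention output."""
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        d_head = out.shape[-1] // n_heads          # heads assumed concatenated on last dim
        out = out.clone()
        out[..., head * d_head:(head + 1) * d_head] = 0.0
        return (out, *output[1:]) if isinstance(output, tuple) else out
    return hook


@torch.no_grad()
def validity_rate(sample_smiles, n_samples: int = 256) -> float:
    """sample_smiles: callable returning one decoded SMILES string per call."""
    mols = [Chem.MolFromSmiles(sample_smiles()) for _ in range(n_samples)]
    return sum(m is not None for m in mols) / n_samples


# Usage (assuming a GPT-style `model` exposing its layers as `model.blocks`):
#   baseline = validity_rate(sample_smiles)
#   handle = model.blocks[LAYER].attn.register_forward_hook(zero_head_hook(HEAD, N_HEADS))
#   ablated = validity_rate(sample_smiles)
#   handle.remove()
#   print(f"validity: {baseline:.2%} -> {ablated:.2%}")
```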
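Item 3 relies on sparse dictionary learning over model activations. The following sketch shows the standard SAE formulation commonly used in this line of work (ReLU encoder, linear decoder, L1 sparsity penalty); the dictionary width, penalty coefficient, and training details are assumptions rather than the paper's reported configuration.

```python
# Minimal sketch of a sparse autoencoder for dictionary learning on
# residual-stream activations: ReLU encoder, linear decoder, L1 penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        f = F.relu(self.encoder(x))      # sparse feature activations
        x_hat = self.decoder(f)          # reconstruction of the activation
        return x_hat, f


def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 term that keeps few features active.
    return F.mse_loss(x_hat, x) + l1_coeff * f.abs().sum(-1).mean()


# Usage on cached activations `acts` of shape (n_tokens, d_model):
#   sae = SparseAutoencoder(d_model=acts.shape[-1], d_dict=8 * acts.shape[-1])
#   x_hat, f = sae(acts)
#   loss = sae_loss(acts, x_hat, f)
```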
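Items 2 and 4 both amount to adding a direction to the residual stream at inference, whether a valence direction or an SAE decoder column. Below is a minimal activation-addition sketch; the hook point, the scaling coefficient, and the source of the direction are illustrative assumptions rather than the paper's specific procedure.

```python
# Minimal sketch of an activation-addition intervention: add a scaled unit
# direction to a block's residual-stream output during generation.
import torch


def steering_hook(direction: torch.Tensor, alpha: float):
    unit = direction / direction.norm()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * unit.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook


# Usage (assuming a trained `sae` as in the sketch above and a GPT-style `model`):
#   direction = sae.decoder.weight[:, FEATURE_IDX].detach()     # (d_model,) decoder column
#   handle = model.blocks[LAYER].register_forward_hook(steering_hook(direction, alpha=4.0))
#   steered_smiles = [sample_smiles() for _ in range(64)]       # steered generations
#   handle.remove()
```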
2 PRELIMINARIES
2.1 BACKGROUND
Molecular Representation. Transformers require a consistent, machine-readable representation of chemical structures. We employ the Simplified Molecular-Input Line-Entry System (SMILES) (Weininger, 1988), which encodes the complete molecular topology as ASCII strings through a formal grammar. Although a num