Current biological AI models lack interpretability -- their internal representations do not correspond to biological relationships that researchers can examine. Here we present CDT-II, an "AI microscope" whose attention maps are directly interpretable as regulatory structure. By mirroring the central dogma in its architecture, CDT-II ensures that each attention mechanism corresponds to a specific biological relationship: DNA self-attention for genomic relationships, RNA self-attention for gene co-regulation, and DNA-to-RNA cross-attention for transcriptional control. Using only genomic embeddings and raw per-cell expression, CDT-II enables experimental biologists to observe regulatory networks in their own data. Applied to K562 CRISPRi data, CDT-II predicts perturbation effects (per-gene mean $r = 0.84$) and recovers the GFI1B regulatory network without supervision (6.6-fold enrichment, $P = 3.5 \times 10^{-17}$). Systematic comparison against ENCODE K562 regulatory annotations reveals that cross-attention autonomously focuses on known regulatory elements -- DNase hypersensitive sites ($201\times$ enrichment), CTCF binding sites ($28\times$), and histone marks -- across all five held-out genes. Two distinct attention mechanisms independently identify an overlapping RNA processing module (80% gene overlap; RNA binding enrichment $P = 1 \times 10^{-16}$). CDT-II establishes mechanism-oriented AI as an alternative to task-oriented approaches, revealing regulatory structure rather than merely optimizing predictions.
The central dogma [1] describes cellular information flow across three molecular layers: DNA encodes genetic instructions, RNA transmits and regulates this information, and proteins execute cellular functions. Artificial intelligence has transformed our ability to model each layer individually [2-7], yet a fundamental limitation remains: current models lack interpretability [8,9].
Their internal representations do not correspond to biological entities or relationships that researchers can examine and validate. The goal of biological research, however, is not merely to predict cellular responses but to understand the mechanisms that produce them. What biology needs is not task-oriented AI optimizing predictions, but mechanism-oriented AI revealing regulatory structure. To address this limitation, we reframe biological AI as an "AI microscope" whose attention maps are directly interpretable as regulatory structure. The Central Dogma Transformer [10] implements this principle by mirroring the central dogma in its architecture: DNA self-attention captures genomic relationships, RNA self-attention reflects gene co-regulation, and DNA-to-RNA cross-attention models transcriptional control. Because each attention mechanism corresponds to a specific biological relationship, the resulting maps provide direct readouts of regulatory organization. This architectural inductive bias fundamentally changes the meaning of prediction accuracy: unlike models that optimize prediction as an end goal, CDT-II uses prediction accuracy as evidence that the model has learned correct regulatory structure. As training progresses, improving predictions and discovering biology become the same objective.
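To make this correspondence concrete, the following is a minimal PyTorch sketch of the three attention mechanisms; the single-block layout, dimensions, and head counts are illustrative assumptions, not the published CDT-II configuration.

```python
# Minimal sketch, assuming standard PyTorch multi-head attention; the block
# layout, d_model, and head count are illustrative, not CDT-II's published
# hyperparameters.
import torch
import torch.nn as nn

class CentralDogmaAttentionSketch(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # DNA self-attention: relationships among genomic sequence tokens
        self.dna_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # RNA self-attention: co-regulation among expressed genes
        self.rna_self = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # DNA-to-RNA cross-attention: transcriptional control (genes query loci)
        self.cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, dna: torch.Tensor, rna: torch.Tensor):
        dna, dna_map = self.dna_self(dna, dna, dna)
        rna, rna_map = self.rna_self(rna, rna, rna)
        rna, cross_map = self.cross(rna, dna, dna)  # query=RNA, key/value=DNA
        # Each attention map is read directly as a regulatory relationship
        return rna, {"dna": dna_map, "rna": rna_map, "dna_to_rna": cross_map}

# Toy shapes: 1 cell, 512 DNA tokens, 2000 genes, 256-dim embeddings
model = CentralDogmaAttentionSketch()
out, maps = model(torch.randn(1, 512, 256), torch.randn(1, 2000, 256))
print(maps["dna_to_rna"].shape)  # (1, 2000, 512): genes x genomic positions
```

The property the sketch preserves is directionality: RNA tokens query DNA tokens in the cross-attention, so each cross-attention weight reads as how strongly a gene's expression depends on a genomic position.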
Critically, CDT-II's success stems not only from model architecture but also from task formulation. The model receives raw per-cell expression values as input and predicts expression changes (log2 fold changes) as output, without being provided the difference or any reference to the unperturbed state. This design forces the model to learn what constitutes "change" and which genes influence which: to predict how gene B changes when locus A is perturbed, the model must internalize the regulatory relationship between A and B. By not explicitly computing differences in the input, we ensure that attention maps reflect genuinely learned regulatory structure rather than artifacts of input preprocessing.
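A minimal sketch of this input/output contract, assuming targets are log2 fold changes derived from a control mean; the pseudocount and function names are illustrative, not the paper's exact preprocessing.

```python
# Sketch of the task formulation described above, assuming PyTorch tensors.
# The pseudocount and the use of a control mean to build the label are
# illustrative choices; the contract is what matters: the input is raw
# expression, and the "change" appears exclusively in the target.
import torch

def make_training_pair(perturbed_counts: torch.Tensor,
                       control_mean: torch.Tensor):
    """perturbed_counts: raw expression of one perturbed cell, shape (n_genes,)
    control_mean: mean unperturbed expression, used ONLY to derive the target."""
    x = perturbed_counts  # model input: raw per-cell expression, no reference
    eps = 1.0             # pseudocount to stabilize the ratio
    y = torch.log2((perturbed_counts + eps) / (control_mean + eps))
    return x, y           # target: per-gene log2 fold change

x, y = make_training_pair(torch.rand(2000) * 10, torch.rand(2000) * 10)
```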
While CDT v1 required three pre-trained language models, CDT-II reduces the input requirements to genomic sequence embeddings and raw per-cell expression, enabling experimental biologists to observe regulatory networks in their own data. CDT-II is designed as an integration platform rather than a standalone model: pre-trained language models provide the best available representations of each molecular layer, while task-driven training on experimental data organizes this knowledge into interpretable regulatory maps. The DNA embedding component is deliberately modular, allowing it to be upgraded as genomic foundation models improve.
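One way to express this modularity is a narrow embedding interface, sketched below; the `GenomicEmbedder` protocol and function names are hypothetical, introduced only to illustrate the design, not CDT-II's actual API.

```python
# Hypothetical interface illustrating the modular DNA-embedding design:
# downstream layers depend only on a (n_tokens, d_model) tensor, so the
# genomic foundation model behind it can be swapped as better ones appear.
from typing import Protocol
import torch

class GenomicEmbedder(Protocol):
    def embed(self, sequence: str) -> torch.Tensor:
        """Map a genomic sequence to per-token embeddings, (n_tokens, d_model)."""
        ...

def encode_locus(embedder: GenomicEmbedder, sequence: str) -> torch.Tensor:
    # The rest of the model never sees the embedder itself, only this tensor.
    return embedder.embed(sequence)
```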
Because each attention mechanism corresponds to a specific biological relationship, CDT-II's outputs generate experimentally testable hypotheses, creating a feedback cycle in which experimental validation improves the model, and the improved model generates sharper hypotheses for the next round of experiments.
We validated CDT-II on a large-scale K562 CRISPRi screen [11,12], holding out five perturbation targets entirely, including GFI1B, a master transcriptional regulator with hundreds of experimentally characterized trans-regulatory targets [12], to test whether regulatory networks could be recovered from unsupervised attention patterns. Here we report five principal findings.
First, CDT-II achieves a validation Pearson correlation of 0.64 (per-gene mean $r = 0.84$) without requiring RNA or protein language models. Second, attention analysis recovers GFI1B's trans-regulatory network with 6.6-fold enrichment over random expectation ($P = 3.5 \times 10^{-17}$), demonstrating unsupervised discovery of a known regulatory program. Third, two distinct attention mechanisms, RNA self-attention and DNA-to-RNA cross-attention, independently identify an overlapping RNA processing module (80% gene overlap) with highly significant functional enrichment (RNA binding, $P = 1 \times 10^{-16}$), revealing hierarchical regulation across central dogma layers. Fourth, an ablation study establishes that gene set quality, determined by cross-dataset reproducibility, governs model resolution. Fifth, systematic comparison against ENCODE K562 regulatory annotations reveals that cross-attention autonomously focuses on known regulatory elements, including DNase hypersensitive sites and CTCF binding sites, across all five held-out genes. Together, these results demonstrate that CDT-II's attention maps provide biologically meaningful observations of regulatory networks.
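For context on how such enrichment statistics are typically obtained, a one-sided hypergeometric test over the top-attended genes is a standard choice, sketched below with illustrative counts; whether CDT-II's analysis uses exactly this test is an assumption, not a claim from the paper.

```python
# Sketch of an enrichment calculation for attention-derived gene sets,
# assuming a one-sided hypergeometric test; all counts below are illustrative.
from scipy.stats import hypergeom

def attention_enrichment(n_genes, n_known_targets, n_top_attended, n_hits):
    """Fold enrichment and P-value for known targets among top-attended genes.

    n_genes         -- genes in the background set
    n_known_targets -- experimentally characterized targets (e.g. of GFI1B)
    n_top_attended  -- genes ranked highest by attention weight
    n_hits          -- known targets found among the top-attended genes
    """
    expected = n_top_attended * n_known_targets / n_genes
    fold = n_hits / expected
    # P(X >= n_hits) when drawing n_top_attended genes without replacement
    p = hypergeom.sf(n_hits - 1, n_genes, n_known_targets, n_top_attended)
    return fold, p

fold, p = attention_enrichment(n_genes=2000, n_known_targets=200,
                               n_top_attended=100, n_hits=50)
print(f"{fold:.1f}-fold enrichment, P = {p:.1e}")  # 5.0-fold in this toy case
```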
CDT-II implements a two-modality architecture that follows the directional logic of the central dogma.