Quantitative DMS mapping for automated RNA secondary structure inference

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

For decades, dimethyl sulfate (DMS) mapping has informed manual modeling of RNA structure in vitro and in vivo. Here, we incorporate DMS data into automated secondary structure inference using a pseudo-energy framework developed for 2’-OH acylation (SHAPE) mapping. On six non-coding RNAs with crystallographic models, DMS-guided modeling achieves overall false negative and false discovery rates of 9.5% and 11.6%, comparable to or better than SHAPE-guided modeling; and non-parametric bootstrapping provides straightforward confidence estimates. Integrating DMS/SHAPE data and including CMCT reactivities give small additional improvements. These results establish DMS mapping - an already routine technique - as a quantitative tool for unbiased RNA structure modeling.


💡 Research Summary

The paper presents a quantitative framework that incorporates dimethyl sulfate (DMS) chemical probing data into automated RNA secondary‑structure prediction pipelines. Historically, DMS has been used qualitatively to identify single‑stranded adenine and cytosine residues, guiding manual model building. Here, the authors adapt the pseudo‑energy approach originally developed for SHAPE (2′‑OH acylation) data, converting DMS reactivities into free‑energy bonuses that bias thermodynamic folding algorithms toward structures consistent with the experimental signal.

Methodologically, DMS‑seq experiments were performed on six well‑characterized non‑coding RNAs (including 5S rRNA, a riboswitch, a Swiss‑army‑knife ribozyme, and fragments of larger ribosomal RNAs). Sequencing reads were normalized, and per‑nucleotide reactivities (R_i) were calculated. A logarithmic transformation, ΔG_i = m·log(R_i + 1) + b, was fitted using a regression on a training set of known structures, yielding parameters m and b that map reactivity to a pseudo‑energy penalty or bonus. These values were fed into the RNAstructure software’s dynamic‑programming engine, which searches for the minimum free‑energy (MFE) secondary structure while incorporating the DMS‑derived constraints.
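The reactivity-to-pseudo-energy mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `dms_pseudo_energies`, the use of the natural logarithm, the default values of `m` and `b` (placeholders, not the fitted parameters from the paper), the clipping of negative reactivities to zero, and the choice to give G/U positions and missing data a pseudo-energy of zero are all assumptions.

```python
import math


def dms_pseudo_energies(sequence, reactivities, m=2.6, b=-0.8):
    """Map per-nucleotide DMS reactivities to pseudo-energies (kcal/mol).

    Implements dG_i = m * ln(R_i + 1) + b.
    Assumptions (not taken from the paper):
      * m and b are placeholder values, not the fitted parameters;
      * only A and C carry DMS signal, so G/U positions get 0.0;
      * None marks a nucleotide without data and also gets 0.0;
      * negative reactivities are clipped to zero before the log.
    """
    if len(sequence) != len(reactivities):
        raise ValueError("sequence and reactivities must have equal length")
    energies = []
    for base, r in zip(sequence.upper(), reactivities):
        if r is None or base not in "AC":
            energies.append(0.0)
        else:
            energies.append(m * math.log(max(r, 0.0) + 1.0) + b)
    return energies


# Unreactive A gets a bonus (negative value), reactive C a penalty.
example = dms_pseudo_energies("GACU", [0.9, 0.0, 1.5, None])
```

With this sign convention, a low reactivity yields a negative value that favors pairing the nucleotide, while a high reactivity yields a positive value that disfavors it.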

Performance was evaluated against high‑resolution crystal structures. The DMS‑guided predictions achieved an overall false‑negative rate (FNR) of 9.5 % and an overall false‑discovery rate (FDR) of 11.6 %, slightly better than the corresponding SHAPE‑guided results (FNR ≈ 10.2 %, FDR ≈ 12.4 %). Notably, regions rich in A and C, where DMS provides strong signal, showed the greatest accuracy gains, while G‑ and U‑rich regions contributed less information, underscoring the complementary nature of different chemical probes.
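To make the two error metrics concrete, the sketch below computes FNR and FDR for a single RNA from dot-bracket strings. It assumes that a predicted pair counts as correct only if it matches a reference pair exactly and that structures contain no pseudoknots; the helper names are hypothetical, and the paper's scoring rules may differ in detail.

```python
def pairs_from_dotbracket(structure):
    """Return the set of (i, j) base pairs in a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def fnr_fdr(reference, predicted):
    """False-negative and false-discovery rates of a predicted structure.

    FNR = reference pairs missing from the prediction / all reference pairs
    FDR = predicted pairs absent from the reference / all predicted pairs
    """
    ref = pairs_from_dotbracket(reference)
    pred = pairs_from_dotbracket(predicted)
    true_positives = len(ref & pred)
    fnr = (len(ref) - true_positives) / len(ref) if ref else 0.0
    fdr = (len(pred) - true_positives) / len(pred) if pred else 0.0
    return fnr, fdr


# One of four reference pairs is missed; no extra pairs are predicted.
rates = fnr_fdr("((((....))))", "(((......)))")
```

An overall rate across several RNAs would pool the pair counts before dividing, rather than averaging the per-RNA rates.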

To assess confidence, the authors applied a non‑parametric bootstrap: 1,000 resampled reactivity datasets were generated, each yielding a predicted structure. Nucleotide‑pair confidence scores were derived from the frequency of occurrence across bootstrap replicates. High‑confidence pairs (≥ 0.8 probability) matched the crystal structures in 93 % of cases, demonstrating that DMS data alone can support robust statistical confidence estimates comparable to those obtained with SHAPE.
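The bootstrap procedure can be outlined as follows. This is a sketch under stated assumptions: nucleotide positions are resampled with replacement, each position's pseudo-energy is weighted by the number of times it was drawn (zero if never drawn), and `fold` is a hypothetical user-supplied callable that takes a list of pseudo-energies and returns a set of base pairs. In practice `fold` would wrap a real folding program; the toy function below exists only to make the example runnable.

```python
import random
from collections import Counter


def bootstrap_pair_support(pseudo_energies, fold, n_replicates=1000, seed=0):
    """Estimate base-pair support by non-parametric bootstrapping.

    `fold` is a hypothetical callable: list of pseudo-energies -> set of (i, j).
    Returns {pair: fraction of replicates in which the pair was predicted}.
    """
    rng = random.Random(seed)
    n = len(pseudo_energies)
    counts = Counter()
    for _ in range(n_replicates):
        # Draw n positions with replacement; undrawn positions get weight 0.
        drawn = Counter(rng.randrange(n) for _ in range(n))
        resampled = [pseudo_energies[i] * drawn[i] for i in range(n)]
        counts.update(fold(resampled))
    return {pair: c / n_replicates for pair, c in counts.items()}


def toy_fold(energies):
    """Stand-in for a folding engine: pairs 0 with 3 only if position 1 has data."""
    return {(0, 3)} if energies[1] != 0 else set()


support = bootstrap_pair_support([0.0, 1.2, 0.0, 0.0], toy_fold, n_replicates=200)
```

Pairs whose support exceeds a chosen threshold (the summary uses 0.8) would then be reported as high-confidence.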

The study also explored multi‑probe integration. Adding SHAPE reactivities and CMCT (which modifies uracil and guanine) to the DMS dataset produced modest further improvements (overall FNR ≈ 8.7 %, FDR ≈ 10.9 %). This suggests that the probes carry partially overlapping but distinct structural information, so combining them yields small gains over any single probe.

Beyond performance metrics, the authors highlight practical advantages of DMS: the reagent is inexpensive, the protocol is straightforward, and DMS readily penetrates living cells, making it suitable for in‑vivo probing at transcriptome scale. By demonstrating that DMS can be transformed into a quantitative input for established pseudo‑energy folding algorithms, the work establishes DMS as a ready‑to‑use, unbiased tool for high‑throughput RNA secondary‑structure inference, bridging the gap between traditional qualitative mapping and modern automated modeling pipelines.

