Bubble statistics and positioning in superhelically stressed DNA
We present a general framework to study the thermodynamic denaturation of double-stranded DNA under superhelical stress. We report calculations of position- and size-dependent opening probabilities for bubbles along the sequence. Our results are obtained from transfer-matrix solutions of the Zimm-Bragg model for unconstrained DNA and of a self-consistent linearization of the Benham model for superhelical DNA. The numerical efficiency of our method allows for the analysis of entire genomes and of random sequences of corresponding length ($10^6-10^9$ base pairs). We show that, at physiological conditions, opening in superhelical DNA is strongly cooperative with average bubble sizes of $10^2-10^3$ base pairs (bp), and orders of magnitude higher than in unconstrained DNA. In heterogeneous sequences, the average degree of base-pair opening is self-averaging, while bubble localization and statistics are dominated by sequence disorder. Compared to random sequences with identical GC-content, genomic DNA has a significantly increased probability to open large bubbles under superhelical stress. These bubbles are frequently located directly upstream of transcription start sites.
💡 Research Summary
The paper develops a comprehensive theoretical and computational framework for studying the thermodynamic denaturation of double‑stranded DNA under superhelical stress. Two well‑established models are combined: the Zimm‑Bragg (ZB) model, an efficient approximation of the Poland‑Scheraga description of DNA melting, and the Benham model, which incorporates the energetic cost of imposed superhelicity. In the ZB formulation each base‑pair step i is assigned a binary variable θ_i (θ_i = 1 for a closed pair, 0 for an open pair). The Hamiltonian contains nearest‑neighbor stacking free energies Δg_i and a loop‑formation penalty Δg_loop≈20 k_BT. By constructing forward and backward transfer matrices T_i and T′i, the authors obtain exact expressions for the partition function, the site‑specific closing probability ⟨θ_i⟩, and the joint probability P{i,l} that a bubble of length l starts at position i. All of these quantities are computed in O(N) time, where N is the sequence length.
The Benham model adds a global term H_TW(n_o) that depends on the total number n_o of open base pairs. This term captures the release of twist when base pairs open, the over‑twist of the remaining double‑helical regions, and the elastic cost of the residual linking number. H_TW is highly nonlinear, containing a logarithmic contribution from the torsional entropy and a quadratic term proportional to the superhelical stiffness K≈2220 k_BT/N, as well as the torsional stiffness C≈3.09 k_BT·rad⁻¹·bp⁻¹. Direct evaluation of the full Benham partition function scales as O(N²) and becomes intractable for genomic lengths.
To overcome this bottleneck the authors introduce a self‑consistent linearization. They assume that fluctuations of n_o around its mean value \bar{n}o are small, expand H_TW to first order, and define an effective field h = ∂H_TW/∂n_o|{\bar{n}_o}. The resulting effective Hamiltonian takes the form H_eff ≈ H_ZB + H_TW(\bar{n}_o) + (N − \bar{n}_o)h − h∑_iθ_i. The field h is not known a priori; it must satisfy a self‑consistency condition h = (∂H_TW/∂n_o)(h n_o(h)). For homogeneous sequences analytical approximations are derived in the low‑temperature, weak‑supercoiling limit, yielding simple closed‑form expressions for h (Eqs. 20 and 22). For heterogeneous sequences the self‑consistency equation is solved numerically using a Newton‑Raphson method combined with bisection, requiring only 10–20 evaluations of the ZB transfer‑matrix routine to reach a relative precision of 10⁻⁴. Consequently, the entire genome can be processed in O(N) time.
The authors validate the method by comparing bubble‑opening free‑energy profiles obtained from their linearized approach with those generated by the official Benham web server; the agreement is excellent, with only minor deviations attributable to different parameterizations and the neglect of finite‑size effects. They then apply the technique to the Escherichia coli genome (≈4.6 Mbp) and to synthetic random sequences of identical GC content. Under physiological conditions (T = 37 °C,
Comments & Academic Discussion
Loading comments...
Leave a Comment