The Hill coefficient is often used as a direct measure of the cooperativity of binding processes. It is an essential tool for probing properties of reactions in many biochemical systems. Here we analyze existing experimental data and demonstrate that the Hill coefficient characterizing the binding of transcription factors to their cognate sites can in fact be larger than one, the standard indication of cooperativity, even in the absence of any standard cooperative binding mechanism. By studying the problem analytically, we demonstrate that this effect arises from the disordered binding energy of the transcription factor to the DNA molecule combined with the steric interactions between different copies of the transcription factor. We show that the enhanced Hill coefficient implies a significant reduction in the number of transcription factor copies needed to occupy a cognate site and, in many cases, can explain existing estimates of transcription factor numbers in cells. The mechanism is general and should be applicable to other biological recognition processes.
Molecular recognition plays an important role in biological systems ranging from antigen-antibody identification to protein-protein binding [1]. In many cases the recognition process is driven by free-energy differences between a desired reaction and many competing undesired reactions [2][3][4][5]. One example of particular importance is that of protein-DNA interactions. Its role in understanding regulation in cells has led to a large experimental effort aimed at mapping the binding energy between transcription factors (TFs) in their specific state and different subsequences on the DNA [6]. It is known that, to a good approximation, the energy can be written as a sum of terms, each representing the binding energy of a nucleotide on the DNA to the region on the protein with which it is aligned [7,8]. Specifically, the binding energy of a nucleotide s = A, C, G, T to position j = 1, 2, ..., L (where L is the length of the protein's DNA binding domain in units of basepairs) is usually described by a 4 × L position weight matrix (PWM), $\epsilon_{s,j}$. By now, the PWM is known for many proteins and, together with knowledge of the genomic sequence, it specifies the binding energy landscape of TFs to the DNA.
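The additive PWM picture above can be illustrated with a minimal sketch. It assumes a single strand, a toy 4 × 3 matrix with made-up energies (in units of k_B T, lower meaning stronger binding), and a hypothetical helper name `binding_energies`; none of these come from the paper or from PRODORIC.

```python
import numpy as np

BASES = "ACGT"  # row order assumed for the PWM

def binding_energies(sequence, pwm):
    """Energy of every length-L window: E_i = sum_j pwm[s_(i+j), j]."""
    pwm = np.asarray(pwm, dtype=float)           # shape (4, L)
    L = pwm.shape[1]
    idx = np.array([BASES.index(b) for b in sequence])
    n_sites = len(idx) - L + 1                   # number of binding windows
    energies = np.zeros(n_sites)
    for j in range(L):
        # contribution of PWM column j to every window at once
        energies += pwm[idx[j:j + n_sites], j]
    return energies

# Toy PWM with hypothetical values; the consensus sequence is "ACG".
toy_pwm = np.array([
    [0.0, 2.0, 1.0],  # A
    [1.0, 0.0, 2.0],  # C
    [2.0, 1.0, 0.0],  # G
    [1.5, 1.5, 1.5],  # T
])
energies = binding_energies("ACGTACG", toy_pwm)
print(energies)
```

A real landscape would also scan the reverse-complement strand, which this sketch omits.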
Irrespective of the properties of the energy landscape, the activation of a cognate site (a specific location on the DNA) by a TF is usually described by a Hill curve [9]. Namely, if we consider a DNA molecule inside a container representing, say, a prokaryotic cell, the activation probability of an operator by a TF is given by

$$P_T(m) = \frac{m^n}{m^n + m_{1/2}^n}. \qquad (1)$$
Here m is the number of TFs in the cell, and at $m = m_{1/2}$ the occupation probability is one half (the conversion to concentrations is trivial). The Hill coefficient (HC), n, governs the steepness of the curve and is widely used to extract qualitative information about the regulation of genes from experimental data [10][11][12][13]. In the simplest cases, when no cooperative binding is involved, one expects n = 1. In the presence of cooperative interactions n differs from one. For example, in the case of activation by dimers one expects n = 2 if m is the number of monomers.
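As a rough illustration of how n and $m_{1/2}$ can be read off an occupation curve, the sketch below assumes the standard Hill form $P = m^n / (m^n + m_{1/2}^n)$ and uses the identity $\ln[P/(1-P)] = n \ln m - n \ln m_{1/2}$, so that n is the slope of a straight-line fit. The function names and the synthetic data (n = 2, $m_{1/2}$ = 30) are hypothetical, and this linearized fit is a stand-in rather than the fitting procedure used in the paper.

```python
import numpy as np

def hill(m, m_half, n):
    """Standard Hill curve: occupation probability for m copies."""
    m = np.asarray(m, dtype=float)
    return m**n / (m**n + m_half**n)

def fit_hill(m, p):
    """Estimate (n, m_half) from the slope and intercept of logit(p) vs log(m)."""
    slope, intercept = np.polyfit(np.log(m), np.log(p / (1.0 - p)), 1)
    m_half = np.exp(-intercept / slope)   # intercept = -n * log(m_half)
    return slope, m_half

# Synthetic, noise-free data with hypothetical parameters.
m = np.logspace(0, 3, 50)
p = hill(m, m_half=30.0, n=2.0)
n_fit, m_half_fit = fit_hill(m, p)
print(n_fit, m_half_fit)
```

With noisy data, points with P very close to 0 or 1 dominate the logit and are usually excluded or down-weighted before fitting.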
In this article we demonstrate that this simple intuitive picture for the Hill curve can fail. The failure is a direct consequence of a non-trivial combination of variations in the binding energy of TFs to different sites along the DNA and the steric repulsion between them. This leads to (a) a disorder-enhanced Hill coefficient which is larger than one even in the absence of any cooperative binding to the operator, and (b) a dramatic increase in the occupation probability of the cognate site as compared to a system with no steric interactions between the TFs or with a constant non-cognate binding energy. Importantly, we show that the results are essential for explaining the number of TFs found in cells.
The Hill curve, Eq. (1), is directly related to a statistical-mechanics formulation of the problem that uses the experimentally measured binding energy landscape of the TF to the DNA. To illustrate this we first focus on a simple case where: (i) There is no cooperativity associated with the structure of the TF or its binding properties, such that one would naively expect n = 1. (ii) The probability of the TF to be off the DNA or in a non-specific conformation on the DNA is negligible. Note that by a non-specific conformation we mean one where the TF is on the DNA but does not interact with the bases. This conformation, which typically occurs due to electrostatic interactions, exists on any location along the DNA, including the cognate site. The effects of both simplifications are discussed in the Supplementary Information (SI).

[Figure caption, beginning missing] The black solid lines are based on the freezing regime approximation, Eq. (13), while the red dashed lines are based on the non-steric approximation, Eq. (8). Filled, grey horizontal areas show the typical range of the TF's copy number in E. coli. The symbol $m_c$ in (c) marks the crossover value of the Lrp TF between the uncrowded and crowded regimes, predicted by Eqs. (27) and (30). (d) The HC, obtained by a fit of the numerical data to Eq. (1), is shown as a function of the cognate site energy for the LexA TF. The solid line is the analytical prediction, Eq. (14), while the circles represent numerical data based on real DNA and cognate site sequences. The filled circle represents the HC of a hypothetical cognate site with a perfect consensus sequence. The dashed line represents the result of the non-steric approximation, Eq. (5), which gives n = 1.

Using standard statistical mechanics it is straightforward to calculate $m_{1/2}$ and $P_T(m)$ numerically (see Methods). We use the PRODORIC database for PWMs of E. coli TFs [16], their cognate sequences and the genomic sequence of E. coli of length N = 2 × 4686077 [17]. In the next section we discuss the results of this approach and show that it can lead to rather counterintuitive Hill curves. We then give a simple theory which accounts for th