- Open Access
Stochasticity and noise-induced transition of genetic toggle switch
Journal of Uncertainty Analysis and Applications volume 2, Article number: 1 (2014)
The ability to predict and analyze the function of genetic circuits will enhance the design of autonomous, programmable, complex regulatory genetic structures. An abundance of modeling techniques has recently been developed to delineate simple genetic structures in terms of their constituents. Simple systems with characteristics of feedback inhibition, multi-stability, switching, and oscillatory expression have often been the focus. The present work is an attempt to improve existing deterministic models, which fail to account for the crucial role of noise in genetic modeling.
The objective of this work is to analyze, model, and simulate the protein populations in gene expression mechanisms by resorting to stochastic algorithms. The system involves two types of genes; the protein produced from the expression of one gene is capable of turning off the expression of the other gene. Rates of degradation of these proteins are assumed to be proportional to their concentrations. The master equation of this ‘genetic toggle switch’ is formulated using the probabilistic population balance around a particular state and by considering five mutually exclusive events. The efficacy of the present methodology is mainly attributable to the ability to derive the governing equations for the means, variances, and covariance of the random variables by the method of system-size expansion of the nonlinear master equation. A less laborious approach based on Kurtz’s limit theorems for the derivation of the stochastic characteristics is also presented for comparison. Solving the resultant ordinary differential equations governing the means, variances, and covariance simultaneously using the published data yields information concerning not only the means of the two populations of proteins but also the minimal uncertainties of the populations inherent in the expressions. It is demonstrated that systems with small populations are susceptible to large internal fluctuations (or uncertainties) in their population evolution. Large uncertainties are observed after the populations enter the proximity of the saddle node, which is likely to cause a transition of the system’s steady state from one to another. Independent Monte Carlo simulation runs clearly demonstrate the occurrence of such internal noise-induced transitions.
One of the earliest examples of a bistable genetic switch is represented in the rightward operator of bacteriophage lambda [1, 2]. The essential elements of this type of genetic switch are a pair of promoters, each of which produces a repressor protein capable of inhibiting the production of the opposing repressor. Overlaid on these essential elements are several layers of regulatory nuance. To elucidate the impacts of these essential elements in a simplified regulatory circuit, a series of synthetic toggle switches was created.
Figure 1 shows the two-state genetic toggle switch consisting of two protein repressor genes and two promoters, which was investigated by Gardner et al. Each promoter enables the production of one repressor and is inhibited by the other. They elegantly designed experiments that demonstrated switching of a toggle circuit from one steady state to another by moving the system’s parameters across the bifurcation curve to a bistable region through either thermal inactivation of Repressor A or ligand binding-induced dissociation of the Repressor B-DNA complex. In the proximity of the bifurcation point, the final steady-state protein population possesses a bimodal distribution in its green fluorescent protein (GFP) fluorescence. It does not show a sharp jump from one fluorescence level to another, as the deterministic model predicts. The authors surmise that the stochastic nature of the dynamics blurs the bifurcation point.
McAdams and Arkin’s [4, 5] Monte Carlo simulations of gene expression revealed the importance of fluctuations (also termed noise or uncertainties) in small systems. In such small systems, proteins are produced from an activated promoter in short bursts of variable numbers of proteins that occur at random time intervals. As a result, there can be large differences in the time between successive events in regulatory cascades across a cell population, which, in turn, creates both spatial and temporal heterogeneity of cell populations in biological systems. Soon after the discovery of the potential impacts of the stochasticity of genetic regulatory systems, stochastic algorithms developed by chemical physicists were introduced in analyzing gene expression (e.g., [6, 7]). The stochastic nature of a competitive expression mechanism can produce probabilistic outcomes in switching mechanisms that select between alternative regulatory paths, such as a toggle switch.
Stochastic algorithms have been developed for analyzing noise of different origins, namely internal and external noise (e.g., [8–10]). External noises are the fluctuations created in an otherwise deterministic system by the application of an external random force, whose stochastic properties are supposed to be known. A Langevin equation is commonly adopted in the analysis of dynamics caused by external noises. Internal noise arises from discrete systems where only a limited number of variables affecting the populations of the discrete entities can be included in the analysis. Small discrete systems, such as genes of small populations, often exhibit notable internal fluctuations. A master equation, derived from probabilistic population balance around a particular state of the system by taking into account all mutually exclusive events, has been adopted for this type of discrete-state, continuous-time stochastic process.
The stochasticity of gene expression is complicated by its nonlinearity. Multiple steady states, stability, and bifurcation in gene expressions (e.g., ) could mingle with the analysis of noise, or fluctuations. The efficacy of the master equation algorithm in gene expression is mainly attributable to its powerful ability to solve the nonlinear master equations through the system size expansion [9, 12]. In this approach, a suitable expansion parameter must be identified in the master equation. The expansion parameter represents the size of the fluctuations, and therefore, the magnitude of the jumps, or transitions, of the system’s state. Since the internal noises are expected to be low when the system size is large, the system size has been proposed as an expansion parameter. Master equation formulation along with the system-size expansion has indeed been applied to the analysis of noise in gene expression. It should be mentioned that the limit theorems of Kurtz [13–15] have rendered the complex procedure of system size expansion simple and highly accessible. Kurtz’s proof demonstrated that the solution of a Langevin equation approaches van Kampen’s system size expansion as the system size approaches infinity.
Kepler and Elston  examined the stochastic dynamics of the single-gene system with and without feedback and a switching system composed of two mutually repressed genes. Several assumptions were made in their simplified model: the two genes share the same operator and same degradation rate, proteins bind to the operator as dimers, and rate of dimerization is fast. Both master equation and Monte-Carlo simulation were adopted in their study. Scott et al.  adopted the master equation along with the system size expansion algorithm in the estimation of internal noise of the single-gene system that involves the mRNA formation and degradation and protein formation and degradation.
The system size expansion has several limitations in modeling the gene regulatory process. It is a good approximation to the master equation for small internal noise and large system size. Moreover, the noise should be well within the boundary of attraction . Thus, noises in oscillatory process and those away from the steady states have been a focus of several studies. Tao et al.  studied the noise far from the steady states and revealed that during the approach to equilibrium, the noise is not always reduced by the strength of the feedback. This is contrary to results seen in the equilibrium limit which show decreased noise with feedback strength. Ito and Uchida  found that the internal noise of a regulatory single-gene system grows without bound in oscillatory networks and developed an alternative method for estimating the evolution of internal noise in such systems.
Kepler and Elston’s simulation work  demonstrated that simple noisy genetic switches have rich bifurcation structures. Among them are bifurcations driven solely by changing the rate of operator fluctuations even as the underlying deterministic system remains unchanged. They found stochastic bistability where the deterministic equations predict monostability and vice versa. Ochab-Marcinek  investigated the stationary behavior of a nonlinear system, a reduced, deterministic Yildirim and Mackey  model of the gene regulatory system, and discovered the transition of a steady state induced by noise. A perturbed Gaussian white noise term was introduced in the deterministic model followed by numerical simulations. Turcotte et al.  studied noise-induced stabilization of an unstable state of a genetic switch that undergoes a variety of bifurcations in response to parameter changes. Their Monte Carlo simulations showed that near one such bifurcation, noise induces oscillations around an unstable spiral point and thus effectively stabilizes this unstable fixed point.
In addition to the master equation algorithm, Monte Carlo simulation has been adopted in simulating the dynamic behaviors in genetic regulatory systems under the influences of internal noise (e.g., [11, 23, 24]). The Monte Carlo simulation shares the same assumption, the Markov property, as the master equation, and the noise can be obtained directly from master equation’s deterministic counterpart. Moreover, the Monte Carlo simulation is capable of revealing the various characteristics of nonlinear dynamic system, such as the number of steady states, bifurcation, and internal noises.
In this expositional work, the master equations are formulated by stochastic population balance. Van Kampen’s system size expansion of the resultant nonlinear master equation gives rise to the variances of the processes. We demonstrate that implementing Kurtz’s limit theorems can efficiently achieve the same goal. Simulations are conducted based on both the master equations and the Monte Carlo procedure for three systems: bistable, monostable, and on the bifurcation curve. Finally, we demonstrate the possibility of transition induced by internal noises for a bistable system.
A genetic toggle switch with negative feedback to the genes consists of two mutually coupled genes. The transcription products of these genes are two inhibitory repressor proteins competing to shut off the production of two constitutive promoters [1, 3, 25]; the protein transcribed by a gene of one type is capable of deactivating the transcription of the other gene. A toggle switch typically has more than one possible stable steady state depending on the reaction parameters under consideration . There are a number of instances in nature where this switch-like behavior is utilized. The lysogeny/lysis switch of the bacteriophage λ virus infecting the bacterium Escherichia coli is a representative example and has been discussed in detail by Ptashne  and Ptashne and Gann .
Gardner  discussed results generated from their deterministic model of a negative feedback toggle switch. Each type of the repressor protein is involved in two types of processes. The first process corresponds to the production of the protein. The rate of protein production is proportional to the concentration of mRNA, which, in turn, is proportional to the concentration of the un-repressed gene, G. The repressor binding on the un-repressed gene is commonly assumed to be in a quasi-steady state with the repressor, R, and the repressed gene, $GR_m$, i.e.,
$$G + mR \rightleftharpoons GR_m.$$
Moreover, by assuming the total number of un-repressed genes is much larger than that of R so that G remains constant during the process, it can be shown that the rate of production of protein is proportional to $1/(1 + K R^{m})$, where K is the equilibrium constant of the above reaction and R the concentration of the repressor monomer [27–29]. The second process in the model of Gardner et al. is degradation of the protein, which is assumed to be first order.
Similar to the work of Gardner, we assume in the current work that the genes are in equilibrium with their repressed genes. The stochastic nature of a competitive expression mechanism can produce probabilistic outcomes in switching mechanisms that select between alternative regulatory paths, such as a toggle switch.
The master equation describing the stochastic nature of the toggle switch is developed through the probabilistic population balance. The formulation of the master equation given below follows what Oppenheim et al. , Gardiner , and van Kampen  established. We have previously adopted this algorithm in the analysis of disease spread .
Let the random variables N1(t) and N2(t) represent the populations of the repressor protein R1 and repressor protein R2 at time t, respectively. The random vector of the system is N(t) such that N(t) = [N1(t), N2(t)], and the realization of this random vector representing the state of the system at time t is given by n(t), where n(t) = [n1(t), n2(t)]. Moreover, the probability of the system being in state n at time t is denoted by $P_{n_1,n_2}(t)$ or P[n1(t), n2(t); t]. The following assumptions are imposed in deriving the master equation governing the transition of the system among various states.
The random vector, N(t), is Markovian, i.e., for any set of successive times, $t_1 < t_2 < \ldots < t_q$, we have $P[N(t_q) \mid N(t_1), N(t_2), \ldots, N(t_{q-1})] = P[N(t_q) \mid N(t_{q-1})]$.
The number of increments or decrements in population numbers of the classes depends only on the time interval, Δt, but not on time itself, i.e., it is temporally homogeneous, signifying that N(Δt) and [N(t + Δt) − N(t)] are identically distributed.
The probability of an individual to produce or degrade is proportional to the duration of time interval, (t, t + Δt), if the value of Δt is sufficiently small.
The probabilities of two or more transitions to take place are negligible during the time interval, (t, t + Δt), so that at most, one transition occurs during this period.
Individual proteins in the same class have the same probability of contacting the genes, and therefore, have the same probability of repressing the genes. Similarly, the individual proteins in the same class have the same probability of being degraded.
Transition intensity functions
On the basis of the assumptions given in the preceding subsection, the transition probability of each event can be written in terms of the transition intensity functions, k1, k2, α2, and α4, as follows:
The first transition-intensity function, k1, is the production probability of a type-1 repressor protein from a particular active (not repressed) gene of type-1 per unit time. Based on the assumption of temporal homogeneity, we have
where . By considering all active type-1 genes in the system, the probability that the population of the type-1 protein will increase by one is k1Ga 1, where Ga 1 denotes the number of active gene of type-1, i.e., the genes that are not repressed. Mathematically,
where f1 is the ratio of populations of active gene to total, active and repressed, genes of type-1. In writing the last line of the above statement, we assume that the total number of gene remains constant during the process of interest. Thus, the parameter, α1, is the probability that a particular active gene will transcribe and produce a type-1 protein per unit time multiplied by the total number of genes.
where K a is the equilibrium constant of the combination reaction of the active gene of type-1 and repressor, a m-mer, and m is the number of protein monomers of type-2 in the repressor. Combining the last two equations yields
The second transition intensity function, α2, is the overall consumption probability of a particular active protein of type-1 in time interval, (t, t + Δt), including its function in repressing protein type-2. Mathematically,
By considering all repressor proteins of type-1 in the system, the probability that the population of the type-1 protein will decrease by one is α2n1, or,
By analogy, the third transition intensity function, k2, is the production probability of a type-2 repressor protein from a particular active (not repressed) gene of type-2 per unit time, or,
This definition will lead to the following transition probability:
where Ga 2 denotes the number of active gene of type-2, f2 the ratio of populations of active gene to total, active and repressed, genes of type-2, or,
K b the equilibrium constant of the combination reaction of the active gene of type-2 and repressor of type-1, a M-mer, and M is the number of protein monomers of type-1 in the repressor.
Also by analogy, the fourth transition-intensity function, α4, is the consumption probability of a particular active protein of type-2 during the time interval, (t, t + Δt), or,
By considering all repressor proteins of type-2 in the system, we have,
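Collecting the definitions of this subsection, the four transition probabilities can be summarized as below. This is a sketch assembled from the stated definitions rather than a verbatim copy of the numbered equations: it assumes that the active-gene fractions take the quasi-steady-state form $f_1 = 1/(1 + K_a n_2^{m})$ and $f_2 = 1/(1 + K_b n_1^{M})$, that $\alpha_3$ is defined for type-2 genes in the same way $\alpha_1$ is defined for type-1 genes, and that $o(\Delta t)$ denotes terms vanishing faster than $\Delta t$.

```latex
\begin{aligned}
\Pr\left[n_1 \to n_1 + 1 \text{ in } (t, t+\Delta t)\right]
  &= k_1 G_{a1}\,\Delta t + o(\Delta t)
   = \frac{\alpha_1}{1 + K_a n_2^{m}}\,\Delta t + o(\Delta t), \\
\Pr\left[n_1 \to n_1 - 1 \text{ in } (t, t+\Delta t)\right]
  &= \alpha_2 n_1\,\Delta t + o(\Delta t), \\
\Pr\left[n_2 \to n_2 + 1 \text{ in } (t, t+\Delta t)\right]
  &= k_2 G_{a2}\,\Delta t + o(\Delta t)
   = \frac{\alpha_3}{1 + K_b n_1^{M}}\,\Delta t + o(\Delta t), \\
\Pr\left[n_2 \to n_2 - 1 \text{ in } (t, t+\Delta t)\right]
  &= \alpha_4 n_2\,\Delta t + o(\Delta t).
\end{aligned}
```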
It should be noted that the rates adopted in deterministic models and discussed earlier at the outset of the ‘Model Formulation’ section are used in defining the transition intensity functions above. The transition intensity functions have pivotal importance in master equation models and Monte Carlo simulations. More importantly, the adoption of deterministic rate constants in the master equation is a cornerstone in the interpretation of intrinsic (or internal) noise, following van Kampen.
Based on the transition intensity functions defined above, the master equation can be obtained by taking probability balance of the following five mutually exclusive events leading to the evolution of the state of the system:
a R1 is produced while R2 remains constant
a R1 is degraded while R2 remains constant
a R2 is produced while R1 remains constant
a R2 is degraded while R1 remains constant
both R1 and R2 remain the same.
As illustrated in Figure 2, the probabilities that these five exclusive events will lead the system to state n at arbitrary time (t + Δt) can be written as follows:
where is the conditional probability of the system transition from state n′(t) to state n(t + Δt) per unit time.
Since these five events are mutually exclusive, we have
By substituting all the transition probabilities discussed in Equations 5 through 9 into the above expression, we obtain the probability of the system at state n at arbitrary time (t + Δt) as follows:
By rearranging the above equation and taking the limit as Δt → 0, we obtain the following master equation:
For convenience, the one-step operator, E, is defined through its effect on arbitrary function f(n) as van Kampen :
The master equation is rewritten compactly in terms of the one-step operator as follows:
The solution to the equation with the step operator yields the time-dependent joint probability distribution of the populations of repressor proteins.
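For orientation, the step operator and the compact master equation can be written out as follows. This is a hedged reconstruction from the four transition probabilities described above (production rates $\alpha_1/(1 + K_a n_2^{m})$ and $\alpha_3/(1 + K_b n_1^{M})$, degradation rates $\alpha_2 n_1$ and $\alpha_4 n_2$), using van Kampen's standard one-step notation; the subscripted operators $E_1$ and $E_2$, acting on $n_1$ and $n_2$ respectively, are notation assumed here.

```latex
% One-step operators acting on an arbitrary function f
E_i^{\pm 1} f(\ldots, n_i, \ldots) = f(\ldots, n_i \pm 1, \ldots), \qquad i = 1, 2

% Master equation in terms of the one-step operators
\frac{dP_{n_1,n_2}(t)}{dt} =
    \left(E_1^{-1} - 1\right)\left[\frac{\alpha_1}{1 + K_a n_2^{m}}\,P_{n_1,n_2}(t)\right]
  + \alpha_2\left(E_1 - 1\right)\left[n_1 P_{n_1,n_2}(t)\right]
  + \left(E_2^{-1} - 1\right)\left[\frac{\alpha_3}{1 + K_b n_1^{M}}\,P_{n_1,n_2}(t)\right]
  + \alpha_4\left(E_2 - 1\right)\left[n_2 P_{n_1,n_2}(t)\right]
```

Each production term moves probability into state n from the state with one fewer protein, and each degradation term moves it in from the state with one more protein; the fifth event (no change) is accounted for by the negative parts of the four brackets.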
System-size expansion based on van Kampen’s procedure
Solving the master equation, Equation 10 or 12, yields the evolution of the joint probability distribution of the populations of the two competing repressors, $P_n(t)$. Equation 10 comprises a set of ordinary differential equations with the joint probability function, $P_n(t)$, as its unknown. Each equation in the set represents a particular outcome of n; thus, solving Equation 12 for the joint probability distribution over the exceedingly large number of all possible values of n is extremely difficult, if not impossible. In practice, however, it often suffices to determine only the expressions that govern a limited number of moments, especially the first and second moments, of the resultant population distribution. These expressions yield the means, variances, and covariances that can be correlated or compared with the experimental data.
Moreover, Equation 12 is nonlinear, which prevents the moments from being evaluated by averaging techniques or joint probability generating function techniques . This difficulty is circumvented by resorting to the system-size expansion, a rational approximation technique based on the power series expansion [9, 12, 31]. The technique gives rise to the deterministic macroscopic equations as well as the equations of fluctuations for the master equation.
To apply the system-size expansion, a suitable expansion parameter must be identified in the master equation, specifically in the transition intensity functions. The expansion parameter must govern the size of the fluctuations, and therefore, the magnitude of the jumps, or transitions. The macroscopic features are determined by the average behavior of all particles, while internal fluctuations are caused by the discrete nature of matter. Hence, we expect the fluctuations to be relatively small when the system size is large. The system size, Ω, has been proposed as an expansion parameter because it measures the relative importance of the fluctuations [9, 12, 31]. In the current genetic regulatory network, the total initial number of promoter population, or the total number of initial reactants, is chosen as Ω so that the noises estimated based on both the master equation and Monte Carlo simulations discussed below represent the standard deviations from the means.
For a linear system, fluctuations are of the order of $\Omega^{1/2}$ in a collection of Ω entities. As a result, their effect on the macroscopic properties is of the order of $\Omega^{-1/2}$ [9, 12]. In the system under consideration, therefore, we expect that the joint probability, $P_n(t)$, will have a sharp maximum around the macroscopic value, n(t) = ΩΘ(t), with a width of the order of $\Omega^{1/2}$. Here, Θ(t) is a vector whose elements, φ(t) and θ(t), are the mean numbers of the two protein populations, obtained through the solution of the macroscopic equations as will be elaborated later. To exploit these characteristics of the system, a new random vector Y(t) is defined as follows:
The equations of realizations of these expressions are given, respectively, by
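In van Kampen's standard form, and consistent with the macroscopic value n(t) = ΩΘ(t) and the width of order $\Omega^{1/2}$ stated above, the change of variables can be sketched as below. The component symbols $Y_1$, $Y_2$, $y_1$, $y_2$ and the use of φ and θ for the two elements of Θ are assumed here to follow the surrounding text.

```latex
% Random variables
N_1(t) = \Omega\,\phi(t) + \Omega^{1/2}\,Y_1(t), \qquad
N_2(t) = \Omega\,\theta(t) + \Omega^{1/2}\,Y_2(t)

% Realizations
n_1(t) = \Omega\,\phi(t) + \Omega^{1/2}\,y_1(t), \qquad
n_2(t) = \Omega\,\theta(t) + \Omega^{1/2}\,y_2(t)
```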
Accordingly, the joint probability of n1 and n2, i.e., $P_n(t)$, is now transformed into that of y1 and y2, i.e., $\Psi_y(t)$. Subsequently, the new random vector, Y, the new joint probability distribution, $\Psi_y(t)$, and the definition of the one-step operator, E, Equation 11, are substituted into Equation 12. By expanding the right-hand side of the resultant expression into a Taylor series, the master equation in terms of the new variables is obtained; see Appendix 1. All appendices to this paper can be found in the supporting materials for this Journal.
Collecting the terms of order $\Omega^{1/2}$ on the right-hand side of the expanded equation gives rise to the following expressions governing the macroscopic evolution of the system:
where the constants $\alpha_1'$, $K_a'$, $\alpha_3'$, and $K_b'$ correspond respectively to the parameters $\alpha_1$, $K_a$, $\alpha_3$, and $K_b$, normalized with Ω or a specific power of Ω so that the collected terms in the system size expansion have the same order of magnitude, i.e.,
Equations 17 and 18 are of the same forms as the macroscopic equations of Gardner .
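As a worked check, substituting $n_1 = \Omega\phi$ and $n_2 = \Omega\theta$ into the production and degradation rates gives macroscopic equations of the following form. The normalization shown is the one that makes every term the same order in Ω; it is an inference from the description above, not a verbatim copy of the paper's definitions.

```latex
\frac{d\phi}{dt}   = \frac{\alpha_1'}{1 + K_a'\,\theta^{m}} - \alpha_2\,\phi, \qquad
\frac{d\theta}{dt} = \frac{\alpha_3'}{1 + K_b'\,\phi^{M}} - \alpha_4\,\theta

% Assumed normalization
\alpha_1' = \frac{\alpha_1}{\Omega}, \quad
K_a' = K_a\,\Omega^{m}, \quad
\alpha_3' = \frac{\alpha_3}{\Omega}, \quad
K_b' = K_b\,\Omega^{M}
```

With this choice, $K_a n_2^{m} = K_a' \theta^{m}$ and $K_b n_1^{M} = K_b' \phi^{M}$, so the repression terms carry no residual dependence on Ω.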
Similarly, collecting the terms of order $\Omega^{0}$ gives rise to the following linear Fokker-Planck equation (see Appendix 1), which governs the first and the second moments associated with the fluctuations of the system:
where the two matrices A and B are
A Fokker-Planck equation is considered linear if the coefficient matrix A, the drift term, is a linear function of Y and the coefficient matrix B, the diffusion term, is constant. Note that the macroscopic trajectories, φ and θ, are functions of t only, and they can be obtained by integrating Equations 17 and 18. Thus, the coefficients of the equation governing the fluctuations, A and B in Equations 22 and 23, are independent of the fluctuations, Y. For a linear Fokker-Planck equation, the ordinary differential equations governing the means and variances of the fluctuations, Y, can be derived by taking the first and second moments of Equation 21.
Taking the first moment of Equation 21 yields the expression governing the mean of the fluctuations, Y:
Similarly, taking the second moment of Equation 21 yields the expression governing the second moment of the fluctuations, Y:
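The structure of these results can be sketched in the generic form of a linear Fokker-Planck equation. The entries of A and B below are reconstructed under two assumptions: that A is the Jacobian of the macroscopic right-hand sides, and that B collects the sum of production and degradation rates for each species, with zero off-diagonal entries because no single event changes both populations. The primed constants are the normalized parameters, and φ and θ are the macroscopic trajectories.

```latex
\frac{\partial \Psi_y}{\partial t} =
  -\sum_{i,j} A_{ij}\,\frac{\partial (y_j \Psi_y)}{\partial y_i}
  + \frac{1}{2}\sum_{i,j} B_{ij}\,\frac{\partial^2 \Psi_y}{\partial y_i\,\partial y_j}

A = \begin{pmatrix}
  -\alpha_2 & -\dfrac{\alpha_1' K_a'\, m\, \theta^{m-1}}{\left(1 + K_a' \theta^{m}\right)^2} \\[2ex]
  -\dfrac{\alpha_3' K_b'\, M\, \phi^{M-1}}{\left(1 + K_b' \phi^{M}\right)^2} & -\alpha_4
\end{pmatrix},
\qquad
B = \begin{pmatrix}
  \dfrac{\alpha_1'}{1 + K_a' \theta^{m}} + \alpha_2 \phi & 0 \\[2ex]
  0 & \dfrac{\alpha_3'}{1 + K_b' \phi^{M}} + \alpha_4 \theta
\end{pmatrix}

% Moment equations
\frac{d\langle Y \rangle}{dt} = A\,\langle Y \rangle, \qquad
\frac{d\langle Y Y^{T} \rangle}{dt} = A\,\langle Y Y^{T} \rangle + \langle Y Y^{T} \rangle A^{T} + B
```

Because the second-moment matrix is symmetric, the matrix equation contributes three independent scalar equations, which together with the two mean equations gives five ordinary differential equations.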
System size expansion based on Kurtz’s limit theorems
The approximation of the master equation discussed in the preceding section, i.e., system size expansion method, can be derived and stated compactly in a general form based on Kurtz’s limit theorems [13–15] under the condition Ω → ∞. First, the master equation, Equation 11, can be written in the following continuous state, gain-loss form :
where W(n;n + r) is the transition probability from state n to state n + r per unit time. Both n and r in Equation 29 are now treated as continuously varying vectors. The convergence of the system size expansion procedure relies on two criteria for the transition probability rate: small jumps and slow variation. Mathematically, the small-jump criterion implies that there is a small δ so that
and the slow-variation assumption means that there is a small δ so that
To satisfy these criteria, the unit jumps associated with the mutually exclusive events in the formulation of the master equation are replaced by jumps of size Ω−1, the system size or the largeness parameter. Thus, the random vector N(t) = (n1(t), n2(t)) is replaced by and time is replaced by . The resultant master equation of Equation 29 becomes
where δ(n) and δ i,j are Dirac and Kronecker delta functions, respectively. The four parameters on the right-hand side of Equation 33 are obtained from the definitions of transition intensity functions.
Kurtz’s limit theorems state that, as Ω → ∞ with an error of O(lnΩ/Ω), the statistical properties of the master equation, Equation 32, can be approximated by the following Fokker-Planck equation:
where the deterministic drift, , and diffusion coefficients, , are
The approximation of the master equation, Equation 12, can be found based on the fact that the Fokker-Planck equation, Equation 34, can be obtained by integrating the following nonlinear Langevin equation in Ito’s interpretation:
where the first term on the right-hand side of the above equation represents the deterministic, or macroscopic characteristic of the process, denotes a Gaussian white noise having the following means and covariance matrix
denotes a Gaussian white noise with a unit strength, and $C_i(\tilde{n})$ denotes the effects of interactions of the noise and the system on the random variable. The discontinuity of Gaussian white noise has been the source of evolution of several algorithms in interpreting $C_i(\tilde{n})$ during the process, and thus the conversion of a Langevin equation to its Fokker-Planck counterpart. In Ito’s algorithm, the value of $C_i(\tilde{n})$ before the arrival of white noise is used in averaging. In Stratonovich’s algorithm, the averaged value of $C_i(\tilde{n})$ during the time of noise is used in averaging, which yields an extra term in the macroscopic part of the Fokker-Planck equation. Since is never infinitely sharp and it lasts a finite time, the Ito and Stratonovich calculi are more appropriate in modeling internal and external noises, respectively.
With this Langevin representation in hand, the equations derived in the last section, i.e., Equations 17, 18, 22, and 23, can be readily obtained. Specifically, substituting Equations 37 and 38 into Equation 43 and ignoring the noise term yields Equations 17 and 18. Since the drift coefficient in a Fokker-Planck equation, matrix A in Equation 21, is the Jacobian matrix of the functions on the right-hand side of Equations 17 and 18, Equation 22 can be obtained by taking derivatives. Finally, it is obvious that the elements of the covariance matrix, Equations 39 through 42, are identical to those shown in Equation 23.
System size expansion based on Kurtz’s theorems is substantially simpler than the original procedure proposed by van Kampen . This efficiency was previously utilized by Aparicio and Solari  and Chua et al.  in their studies of stochastic population dynamics of disease transmission and chemical vapor deposition, respectively.
It should be mentioned that the system size expansion method discussed in this and the last sections suffers from several limitations. Simulation with the system-size expansion converges to the steady state within its boundary of attraction just like its deterministic counterpart, and it cannot generate noise-induced transitions, as will be discussed later in the simulation section. The system size expansion near the steady-state boundary of attraction (i.e., away from the steady state) yields noises that are not compatible with those generated near the steady states.
The genetic toggle switch model presented in the preceding section has been simulated by two approaches. The first approach relies on the solution of the governing equations for the first and second moments of the random variables derived from the master equations. The second approach resorts to the event-driven Monte Carlo algorithm.
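To make the second approach concrete, the following is a minimal sketch of an event-driven (Gillespie-type) Monte Carlo simulation built on the four transition intensities of the model. It is not the authors' code: the function names, the `params` dictionary, and the numerical values in the usage note are hypothetical, chosen only for illustration.

```python
import numpy as np

def toggle_propensities(n1, n2, alpha1, Ka, m, alpha2, alpha3, Kb, M, alpha4):
    """Rates of the four population-changing events in state (n1, n2)."""
    return np.array([
        alpha1 / (1.0 + Ka * n2**m),  # an R1 is produced
        alpha2 * n1,                  # an R1 is degraded
        alpha3 / (1.0 + Kb * n1**M),  # an R2 is produced
        alpha4 * n2,                  # an R2 is degraded
    ])

# Change in (n1, n2) caused by each of the four events above.
JUMPS = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])

def gillespie(n0, t_end, params, rng):
    """Simulate one trajectory; returns event times and visited states."""
    t = 0.0
    n = np.array(n0, dtype=np.int64)
    times, states = [t], [n.copy()]
    while True:
        rates = toggle_propensities(n[0], n[1], **params)
        total = rates.sum()
        # Waiting time to the next event is exponential with mean 1/total.
        t += rng.exponential(1.0 / total)
        if t > t_end:
            break
        # Pick which event fires, with probability proportional to its rate.
        event = rng.choice(4, p=rates / total)
        n = n + JUMPS[event]
        times.append(t)
        states.append(n.copy())
    return np.array(times), np.array(states)
```

A run with illustrative (not published) values might look like `gillespie((20, 5), 2.0, dict(alpha1=156.0, Ka=0.01, m=2, alpha2=1.0, alpha3=156.0, Kb=0.01, M=2, alpha4=1.0), np.random.default_rng(0))`. Repeating the run with different seeds gives independent realizations, from which sample means and variances can be compared with the moment equations.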
Simulation based on the master equations
To effectively analyze the impact of system parameters, the equations governing the first and second moments are converted to dimensionless forms. Following Gardner’s procedure , we introduce the following variables, with the assumption α2 = α4:
When the effective rates of synthesis of the two proteins are comparable, we have
Then Equations 24 through 28 can be transformed into the following compact forms
Equations 49, 50, and 54 through 58 can be integrated simultaneously to obtain the statistical characteristics of the dynamical processes. Equations 49 and 50 yield the means of the populations, while Equations 54 and 55 yield the means of the fluctuations, which are essentially zero due to the assumption of symmetric noises around the means, i.e., Equations 13 and 14. Equations 56 through 58 generate the variances and covariance of the two constituent populations. The integration was conducted in Matlab with ode45, an explicit Runge-Kutta (4,5) solver intended for nonstiff sets of ordinary differential equations.
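The simultaneous integration can be sketched in Python with SciPy's `solve_ivp`, whose `RK45` method is, like ode45, a Dormand-Prince pair. The sketch assumes a Gardner-type dimensionless form, du/dt = a1/(1 + v^m) − u and dv/dt = a2/(1 + u^M) − v, and a covariance equation dC/dt = AC + CAᵀ + B in which A is the Jacobian and B is the diagonal matrix of summed production and degradation rates. The names `a1`, `a2`, `lna_rhs`, and `run_lna` are hypothetical, and any scaling constants that the paper's own dimensionless equations may contain are taken as one.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lna_rhs(t, s, a1, a2, m, M):
    """Right-hand side for the means (u, v) and covariance (c11, c12, c22)."""
    u, v, c11, c12, c22 = s
    prod_u = a1 / (1.0 + v**m)
    prod_v = a2 / (1.0 + u**M)
    # Jacobian of the macroscopic right-hand sides (drift matrix A).
    A = np.array([
        [-1.0, -a1 * m * v**(m - 1) / (1.0 + v**m) ** 2],
        [-a2 * M * u**(M - 1) / (1.0 + u**M) ** 2, -1.0],
    ])
    # Diffusion matrix B: production plus degradation rate of each species.
    B = np.diag([prod_u + u, prod_v + v])
    C = np.array([[c11, c12], [c12, c22]])
    dC = A @ C + C @ A.T + B
    return [prod_u - u, prod_v - v, dC[0, 0], dC[0, 1], dC[1, 1]]

def run_lna(u0, v0, a1, a2, m=2, M=2, t_end=30.0):
    """Integrate means and covariance from a noise-free initial state."""
    s0 = [u0, v0, 0.0, 0.0, 0.0]
    return solve_ivp(lna_rhs, (0.0, t_end), s0, args=(a1, a2, m, M),
                     method="RK45", rtol=1e-8, atol=1e-10)

# Bistable parameter values with an initial state away from the saddle.
sol = run_lna(15.0, 155.0, 15.6, 15.6)
u_end, v_end, c11, c12, c22 = sol.y[:, -1]
```

Tight tolerances are used because trajectories that pass close to the unstable steady state are sensitive to small integration errors.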
As we will demonstrate later, some of the simulation results, including noise-induced transitions, depend on the parameter values and initial conditions, which, in turn, are closely related to the properties of the deterministic system, i.e., Equations 49 and 50. For a nonlinear system governed by Equations 49 and 50, the location of the parameters and in the bifurcation diagram and the initial population in the phase diagram have significant effects on the evolution of system’s state. In order to analyze the process under selected conditions, the values of the four parameters used for simulation, , , m, and M, are taken from published experimental results [3, 5, 34, 35] as well as the inference that can be drawn from the phase and bifurcation diagrams. A thorough review of the protein and mRNA reaction rates involved in the control mechanism can be found in Santallin and Mackey . The values of several of these variables can also be found in other regulatory modeling literature [7, 37–39]. As shown in Figure 3, for m = M = 2 and the traces of (u, v) by setting the right-hand sides of Equations 49 and 50 being zero yield with three interceptions. Liapunov stability analysis reveals that two of these steady states are stable, and the one in the middle is unstable, i.e., a saddle node. The bifurcation analysis, for m = M = 2, illustrates that the system has one or two stable steady states depending on the values of and (see Figure 4 and ).
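The steady-state count and the stability classification can be checked numerically. The sketch below assumes the same Gardner-type dimensionless form, du/dt = a1/(1 + v^m) − u and dv/dt = a2/(1 + u^M) − v, with `a1` and `a2` as hypothetical names for the two dimensionless synthesis parameters; which of the two takes the smaller value in the monostable example is likewise an assumption. It locates the nullcline intersections by root finding and classifies each one by the eigenvalues of the Jacobian.

```python
import numpy as np
from scipy.optimize import brentq

def steady_states(a1, a2, m=2, M=2, n_grid=4000):
    """Intersections of the nullclines u = a1/(1+v^m) and v = a2/(1+u^M)."""
    def g(u):
        v = a2 / (1.0 + u**M)          # v on the second nullcline
        return a1 / (1.0 + v**m) - u   # mismatch with the first nullcline

    grid = np.linspace(0.0, a1 + 1.0, n_grid)  # u cannot exceed a1
    vals = g(grid)
    roots = []
    for lo, hi, g_lo, g_hi in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]):
        if g_lo * g_hi < 0.0:          # sign change brackets a root
            u = brentq(g, lo, hi)
            roots.append((u, a2 / (1.0 + u**M)))
    return roots

def is_stable(u, v, a1, a2, m=2, M=2):
    """True if all Jacobian eigenvalues have negative real parts."""
    J = np.array([
        [-1.0, -a1 * m * v**(m - 1) / (1.0 + v**m) ** 2],
        [-a2 * M * u**(M - 1) / (1.0 + u**M) ** 2, -1.0],
    ])
    return bool(np.all(np.linalg.eigvals(J).real < 0.0))

bistable = steady_states(15.6, 15.6)
stability = [is_stable(u, v, 15.6, 15.6) for u, v in bistable]
monostable = steady_states(15.6, 1.2)
```

For equal parameters of 15.6 the scan returns three steady states, with the middle one unstable, in line with the description above; lowering one parameter to 1.2 leaves a single steady state.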
As marked in Figure 4, three possible sets of and are sufficient to characterize the different cases of population dynamics: monostable, bistable, and bifurcation. Thus, the following three sets of parameter values are chosen in our simulations for characterizing the dynamics in different regions:
Case A, in bistable region: = 15.6 and = 15.6,
Case B, on bifurcation curve: = 15.6 and = 4.0,
Case C, in monostable region: = 15.6 and = 1.2.
We assume m = 2 and M = 2 for all the simulations presented herein.
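The steady-state structure of the three cases can be checked numerically. The following is a minimal sketch, not the authors' code: since Equations 49 and 50 are not reproduced in this section, it assumes the dimensionless toggle-switch form du/dt = a1/(1 + v^m) − u, dv/dt = a2/(1 + u^M) − v, with the hypothetical names `a1` and `a2` standing for the two synthesis-rate parameters. It locates the intersections of the two nullclines and classifies each one from the eigenvalues of the Jacobian.

```python
import numpy as np
from scipy.optimize import brentq

def fixed_points(a1, a2, m=2, M=2, n_grid=4001):
    """Steady states of the ASSUMED model
    du/dt = a1/(1+v**m) - u,  dv/dt = a2/(1+u**M) - v.
    Returns a list of (u, v, is_stable), sorted by increasing u."""
    def g(u):
        # Substitute the v-nullcline into the u-nullcline: g(u) = 0 at a steady state.
        v = a2 / (1.0 + u**M)
        return u - a1 / (1.0 + v**m)

    # Every steady state satisfies 0 < u <= a1, so this grid brackets all of them.
    grid = np.linspace(1e-9, a1 + 1.0, n_grid)
    vals = g(grid)
    points = []
    for lo, hi, g_lo, g_hi in zip(grid[:-1], grid[1:], vals[:-1], vals[1:]):
        if g_lo * g_hi < 0.0:                      # sign change brackets a root
            u = brentq(g, lo, hi)
            v = a2 / (1.0 + u**M)
            # Jacobian is [[-1, b], [c, -1]]; its eigenvalues are -1 +/- sqrt(b*c).
            b = -a1 * m * v**(m - 1) / (1.0 + v**m) ** 2
            c = -a2 * M * u**(M - 1) / (1.0 + u**M) ** 2
            points.append((u, v, bool(np.sqrt(b * c) < 1.0)))
    return points

for label, (a1, a2) in {"A": (15.6, 15.6), "B": (15.6, 4.0), "C": (15.6, 1.2)}.items():
    print("Case", label, fixed_points(a1, a2))
```

Under this assumed form, Case A gives two stable states separated by an unstable one, and Case C gives a single stable state. Case B sits on the bifurcation curve, where a root count based on sign changes is numerically delicate and should be read with caution.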
Initial protein populations are also important to the evolution of the dynamics in several respects. It is well established in nonlinear dynamics that different initial conditions can lead to different steady states, and that the evolution of the dynamics may be altered significantly by small variations in the initial conditions. In this work, we will demonstrate that noise can induce a transition of the system from one steady state to another when the populations pass through the neighborhood of an unstable steady state (or the saddle node) of a bistable system. Moreover, for very small initial populations, the numerical equations become invalid as the protein values tend to become so small that they drive the bifurcation lines beyond the domain of application. Thus, the initial populations should be chosen in the range of tens to hundreds. In the present work, the initial populations are chosen as u(0) = 155 and v(0) = 154 for the three cases discussed above. To further illustrate the effects of the initial populations, a simulation is conducted with u(0) = 15 and v(0) = 155, with both synthesis-rate parameters equal to 15.6, for the bistable system, or Case A. The population trajectories do not pass through the neighborhood of the saddle node in this simulation and therefore pose no risk of noise-induced transition.
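The deterministic trajectory for the chosen initial populations can be sketched as follows. This is an illustration under stated assumptions rather than the authors' Matlab code: the right-hand side is the assumed dimensionless form du/dt = a1/(1 + v^m) − u, dv/dt = a2/(1 + u^M) − v (Equations 49 and 50 are not reproduced here), `a1` and `a2` are hypothetical names for the two synthesis-rate parameters, and SciPy's `solve_ivp` with the explicit `RK45` method stands in for the integrator. Only the means are integrated, because the moment equations, Equations 54 through 58, are not available in this section.

```python
import numpy as np
from scipy.integrate import solve_ivp

def toggle_rhs(t, y, a1, a2, m, M):
    """ASSUMED deterministic toggle-switch model for the mean populations (u, v)."""
    u, v = y
    return [a1 / (1.0 + v**m) - u, a2 / (1.0 + u**M) - v]

def integrate_means(u0, v0, a1, a2, m=2, M=2, t_end=50.0):
    """Integrate the assumed mean equations from (u0, v0) up to t_end."""
    sol = solve_ivp(toggle_rhs, (0.0, t_end), [u0, v0],
                    args=(a1, a2, m, M), method="RK45",
                    rtol=1e-8, atol=1e-10)
    return sol.t, sol.y

# Case A parameters with the initial populations used in the text.
t, y = integrate_means(155.0, 154.0, 15.6, 15.6)
print("final (u, v):", y[0, -1], y[1, -1])
```

With symmetric parameters the line u = v is invariant under this assumed model, so a start with u(0) slightly larger than v(0) first decays toward the saddle and then departs to the steady state with high u and low v.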
It should be mentioned that the process of interest is characterized by the transition intensity functions, k1, k2, α2, and α4, defining the probabilities of transitions of each type of population per unit time. If the fraction of population converted per unit time is taken to represent the intensity function, its significance is equivalent to the deterministic rate constant of the specific rate. In other words, from the change in the population of a particular protein type i due to the conversion of type i during the time interval, (t, t + Δt), we have
where Ω stands for the system size, i.e., the total initial population, and − R i is the population of type i protein converted per unit time by the transition. A detailed discussion of the relationship between the deterministic rate constant and the intensity function can be found in the literature.
Simulation based on Monte Carlo procedure
Linear or nonlinear dynamic processes have been simulated either deterministically or stochastically by Monte Carlo procedures. It is worth noting that a well-developed class of Monte Carlo simulation procedures essentially shares identical computational bases with the master equation algorithm presented in the preceding sections. Specifically, the assumptions of Markov property and temporal homogeneity of the random variables lead to the definitions of transition intensity functions [33, 40, 41]. As discussed in the “Model Formulation” section, probability balances of various events on the basis of these intensity functions give rise to the master equations. In the Monte Carlo simulation, the system’s state is simulated by a step-wise, random-walk scheme based on the same intensity functions.
Process systems or phenomena can be simulated by time-driven and event-driven Monte Carlo procedures. The difference between these two procedures lies in the manner of updating the time clock of the evolution of the system. The time-driven procedure advances the simulation clock by a pre-specified time increment, Δt, which is sufficiently small that at most one event will occur in this interval. The probability of an event occurring is determined by the nature and magnitudes of the transition intensity functions. In contrast, the event-driven procedure updates the simulation clock by randomly generating the waiting time, τ w , which has an exponential distribution [43, 44]; this distribution signifies that a population transition takes place completely randomly. At the end of each waiting interval, one event will occur, and the state to which the system will transfer is also determined by the nature of the transition intensity functions.
The process of interest here, i.e., genetic toggle switch, has been simulated by the event-driven procedure; it is usually computationally faster than the time-driven procedure. The simulation starts with a given initial distribution of population; the essential task is to obtain the probability distributions of the protein numbers at any subsequent times. To determine the system transition in each time step, two random numbers are generated for two different purposes. The first random number in (0, 1), i.e., r 1 , is for estimating the waiting time during which a possible transition of the system’s state will take place. The second random number in (0, 1), i.e., r 2 , is for identifying the transition type.
Let T n be the random variable representing the waiting time of the system of interest at state n prior to its next transition, which is caused by the production or consumption of a protein, and let τ w be the realization of T n . Moreover, let G n (τ w ) be the probability that no transition takes place during τ w . Thus,
This can be expressed as (see derivation in Appendix 2)
The complement of G n (τ w )
expresses the cumulative probability distribution of T n up to τ w . The probability density function of T n , i.e.,
Therefore, h n (τ w ) has the following exponential form (see Appendix 2)
Note that H n (τ w ) is the cumulative probability distribution function of T n , whereas h n (τ w ) is its probability density function.
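Because the displayed expressions are not reproduced above, the chain of definitions can be summarized as follows. This is a sketch under one assumption: the symbol λ_n is introduced here for the sum of the four transition intensities out of state n, which is the standard rate of the exponential waiting time for a temporally homogeneous Markov process.

```latex
\begin{align}
  \lambda_n &= k_1 + k_2 + \alpha_2 + \alpha_4
      \quad \text{(all evaluated at state } n\text{; notation assumed)} \\
  G_n(\tau_w) &= \Pr[T_n > \tau_w] = \exp(-\lambda_n \tau_w) \\
  H_n(\tau_w) &= 1 - G_n(\tau_w) = \Pr[T_n \le \tau_w] = 1 - \exp(-\lambda_n \tau_w) \\
  h_n(\tau_w) &= \frac{\mathrm{d}H_n(\tau_w)}{\mathrm{d}\tau_w}
      = \lambda_n \exp(-\lambda_n \tau_w)
\end{align}
```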
Equation 64 indicates that to estimate the waiting time of a protein-regulated gene expression, τ w , a sequence of exponentially distributed random numbers must be generated. The sequences of computer-generated random numbers, however, are usually uniformly distributed in the interval [0, 1]. This uniform distribution, therefore, needs to be transformed into the exponential distribution, which can be accomplished by defining a new random variable, denoted by U, whose realization, denoted by u, assumes the value of H n (τ w ) at τ w [43, 44], i.e.,
It can be verified that if the waiting time, T n , whose realization is τ w , is exponentially distributed, then the random variable, U, whose realization is u, is uniformly distributed over interval [0, 1], see Appendix 3.
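The transformation from a uniform to an exponential random number can be illustrated with a short sketch. It assumes the exponential form of the waiting-time distribution with a total intensity `total_intensity` (a hypothetical name for the sum of the transition intensities at the current state), so that setting u = 1 − exp(−λ τ) and solving for τ gives τ = −ln(1 − u)/λ. The value 4.0 below is arbitrary and chosen only for the demonstration.

```python
import numpy as np

def sample_waiting_time(total_intensity, rng, size=None):
    """Inverse-transform sampling of an exponential waiting time.

    u is uniform on [0, 1); solving u = 1 - exp(-lambda * tau) for tau gives
    tau = -ln(1 - u) / lambda. log1p(-u) evaluates ln(1 - u) accurately."""
    u = rng.random(size)
    return -np.log1p(-u) / total_intensity

rng = np.random.default_rng(0)
tau = sample_waiting_time(4.0, rng, size=100_000)
# For an exponential distribution the mean waiting time is 1 / total_intensity.
print("sample mean:", tau.mean(), "theoretical mean:", 1.0 / 4.0)
```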
Probabilities of four possible transitions
After residing in state n = (n1, n2) for a waiting time of τ w , the system will transfer to one of its adjacent states. During the process, the transition intensity functions governing the four possible transitions of protein populations from state (n1, n2) to states (n1 − 1, n2), (n1 + 1, n2), (n1, n2 − 1), and (n1, n2 + 1) are α2, k1, α4, and k2, respectively. These transitions are exact equivalents of the transitions from states (n1 + 1, n2), (n1 − 1, n2), (n1, n2 + 1) and (n1, n2 − 1) to state (n1, n2), as shown in Figure 2. These four possible transitions are mutually exclusive events. Moreover, as discussed in the last section, one and only one of the four possible transitions takes place during the waiting time determined by the random number r 1 . Thus, the probability of the system transferring from (n1, n2) to (n1 − 1, n2) is
The probability of the system transferring from state (n1, n2) to (n1 + 1, n2) is
The probability of the system transferring from state (n1, n2) to (n1, n2 − 1) is
Similarly, the probability of the system transferring from state (n1, n2) to (n1, n2 + 1) is
Since the sum of Q1 through Q4 is 1, the transition type can be identified by the randomly generated number, r 2 . Specifically, r 2 falling within the interval,
implies that the population of type-1 protein decreases by 1, see Equation 67; r 2 falling within the interval,
implies that the population of type-1 protein increases by 1; r 2 falling within the interval,
implies that the population of type-2 protein decreases by 1; r 2 falling within the interval,
implies that the population of type-2 protein increases by 1.
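The selection of the transition type from r 2 can be sketched as below. The interval boundaries of Equations 71 through 74 are not reproduced in this section, so the sketch assumes the natural construction: the unit interval is partitioned into consecutive sub-intervals of widths Q1 through Q4, in the order listed in the text, and the sub-interval containing r 2 identifies the transition. The function and variable names are illustrative.

```python
import numpy as np

# Population changes (dn1, dn2) in the order used in the text:
# type-1 decreases, type-1 increases, type-2 decreases, type-2 increases.
MOVES = [(-1, 0), (+1, 0), (0, -1), (0, +1)]

def choose_transition(intensities, r2):
    """Return the index of the transition whose interval contains r2.

    intensities: the four transition intensities at the current state.
    The probabilities Q_i are the intensities divided by their sum, and the
    interval edges are their cumulative sums (ASSUMED half-open intervals)."""
    q = np.asarray(intensities, dtype=float)
    edges = np.cumsum(q / q.sum())
    # side="right" makes each interval [lower, upper), so a transition with
    # zero intensity has an empty interval and is never selected.
    k = int(np.searchsorted(edges, r2, side="right"))
    return min(k, len(q) - 1)   # guard against round-off in the last edge

k = choose_transition([1.0, 1.0, 1.0, 1.0], 0.3)
print("transition index:", k, "population change:", MOVES[k])
```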
The event-driven Monte Carlo procedure is conducted according to Rajamani. A step-wise description of the procedure is given below.
1. Define the initial populations of the two types of proteins, and let the system size, Ω, be the sum of the two protein populations. This Ω will also be the total number of independent simulations to be conducted before taking their statistics. Start the random walk from this point.
2. Select the total length of time of each simulation, T f . For the current work, T f was chosen to be either 15 or 50 s.
3. Determine the length of the waiting time, τ w . First, generate a random number, r 1 , from a uniform distribution in [0, 1]; then, calculate τ w for the system’s current state n(t) = (n 1(t), n 2(t)) according to Equation 66.
4. Update the computer clock by letting t = t + τ w .
5. Calculate the probabilities, Q i , that the system will transfer from state n to each of the adjacent states by Equations 67 through 70. Then, generate another random number, r 2 , from a uniform distribution in [0, 1]. Determine the transition type by examining in which of the intervals given by Equations 71 through 74 r 2 is located.
6. Repeat steps 3 to 5 until the total time exceeds T f ; this terminates one replication of the simulation.
7. Repeat steps 2 to 6 Ω times, and store the resultant number of proteins of type i during the j th replication at time t, n ij (t). This yields the mean number of proteins of type i at time t as given in Equation 75.
The variance of population of type i at time t can be calculated from its definition, i.e.,
The covariance around the means between the two types of populations i and j, at time t can be calculated from its definition, i.e.,
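The whole procedure, including the ensemble statistics, can be sketched in a few lines. This is an illustration under explicit assumptions, not the authors' implementation: the intensities are taken as n1 and n2 for the losses and a1/(1 + n2^m) and a2/(1 + n1^M) for the gains, i.e., the populations are used directly in the assumed dimensionless rate laws with unit degradation rate. The variance and covariance are computed with a divisor equal to the number of replications, which may differ from the definitions in Equations 76 and 77. The demonstration uses 30 replications over a short horizon for speed, whereas the text prescribes Ω replications.

```python
import numpy as np

def run_once(n0, a1, a2, m, M, t_grid, rng):
    """One event-driven replication; returns populations sampled on t_grid."""
    n1, n2 = n0
    t, idx = 0.0, 0
    out = np.empty((len(t_grid), 2))
    while idx < len(t_grid):
        # ASSUMED intensities: [type-1 loss, type-1 gain, type-2 loss, type-2 gain]
        rates = np.array([n1, a1 / (1.0 + n2**m), n2, a2 / (1.0 + n1**M)], dtype=float)
        total = rates.sum()
        tau = -np.log1p(-rng.random()) / total          # exponential waiting time
        # The state is constant until the next event: record it on the grid.
        while idx < len(t_grid) and t_grid[idx] < t + tau:
            out[idx] = (n1, n2)
            idx += 1
        t += tau
        edges = np.cumsum(rates) / total
        k = min(int(np.searchsorted(edges, rng.random(), side="right")), 3)
        n1 += (-1, 1, 0, 0)[k]
        n2 += (0, 0, -1, 1)[k]
    return out

def ensemble_statistics(n0, a1, a2, t_grid, n_rep, m=2, M=2, seed=0):
    """Mean, variance, and covariance over n_rep independent replications."""
    rng = np.random.default_rng(seed)
    runs = np.stack([run_once(n0, a1, a2, m, M, t_grid, rng) for _ in range(n_rep)])
    mean = runs.mean(axis=0)                    # shape (n_times, 2)
    var = runs.var(axis=0)                      # divides by n_rep (assumption)
    dev = runs - mean
    cov = (dev[..., 0] * dev[..., 1]).mean(axis=0)
    return mean, var, cov

t_grid = np.linspace(0.0, 5.0, 11)
mean, var, cov = ensemble_statistics((155, 154), 15.6, 15.6, t_grid, n_rep=30)
print("mean populations at the final grid time:", mean[-1])
```

Sampling each replication on a common time grid is a design choice made here so that the replications can be averaged at identical times; the event times themselves differ from run to run.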
As mentioned at the outset of this section, both the Monte Carlo simulation and the simulation based on the master equations adopted in the current work are rooted in the identical set of transition intensity functions derived from the same set of assumptions. Thus, integrating the equations for the means and the second moments derived from the master equations, i.e., Equations 49 and 50 together with Equations 54 through 58, is expected to generate results nearly identical to those from the Monte Carlo simulations, i.e., Equations 75 through 77.
Results and discussion
The present stochastic analysis of the genetic toggle switch yields the transition probabilities of mutually exclusive events through the definitions of the transition intensity functions of protein production as well as degradation. This analysis renders it possible to formulate the nonlinear master equations of the process as well as to derive the event-driven Monte Carlo simulation. Even though the two were simulated separately, they produced closely analogous results.
The stochastic algorithms developed here allow us to analyze the stochastic nature of the two-state toggle switch quantitatively. The master equations governing the numbers of the two types of protein are formulated from stochastic population balance. The stochastic pathways of the two proteins, i.e., their means and the fluctuations around these means, have been numerically simulated independently by the algorithm derived from the master equations, as well as by an event-driven Monte Carlo algorithm. Both algorithms have given rise to essentially identical results. Moreover, these analyses make it possible to anticipate, and thus to circumvent, noise-induced transitions.
Simulation based on the master equations
Figures 5, 6, and 7 represent the temporal profiles of Cases A through C discussed earlier. The left-hand parts of these figures are expanded views of portions of the more complete simulations on the right. These simulations were conducted with m = M = 2, with the first synthesis-rate parameter fixed at 15.6, and with the same set of initial conditions, u(0) = 155 and v(0) = 154. These initial conditions correspond to a point below the separatrix in the phase diagram, see Figure 3. The value of the second synthesis-rate parameter is varied to illustrate the characteristics of three different cases of dynamics: in the bistable region, on the bifurcation curve, and in the monostable region. The standard deviation envelopes are plotted around the macroscopic trajectories.
Case A, in which the second synthesis-rate parameter is 15.6, represents a bistable system, as marked in the bifurcation diagram in Figure 4. Figure 5 presents the simulated results of this system based on the master equations. As expected, the populations eventually reach the stable steady state #2 marked in Figure 3, since the initial conditions constitute a point below the separatrix and our analysis of the vector field depicting the flow of the dynamics suggests this outcome. The protein populations decrease rapidly and stay in the proximity of the saddle node for a while before they depart for their steady states, an observation consistent with classical dynamics. During this period, the populations of the two proteins are very similar to each other. The fluctuations around the mean trajectories increase initially from zero and then decrease when the populations approach the steady states. In a stable system, the standard deviation of the number of either type-1 or type-2 proteins passes through a maximum, because the state of the system is usually well defined at the outset of the process and the uncertainties eventually decline until they vanish upon stabilization. The uncertainty in the population of the type-2 protein in Figure 5 appears to remain constant; a special computer experiment was conducted with a long simulation time to ensure that it indeed decreases over time.
The formulator is often confronted with a myriad of interacting factors related to a gene’s expression mechanisms before settling on a strategy to assess their impact. A mathematical description of this complex process usually relies on a manageable number of system variables. This lumping procedure inevitably results in a high degree of freedom and in fluctuations, or uncertainties, in the predictions of populations of discrete systems. The behavior of an individual protein molecule in a discrete system with such a high degree of freedom is thus difficult to predict even when the system is monitored experimentally. The parameters in the equations, e.g., the transition intensity functions of the master equation algorithm adopted here, are presumed to depend only on the major variables of the system and to be independent of the variables of secondary importance. Neglecting these secondary variables is, in essence, the source of internal, or system, or minimal noises that should be appropriately analyzed stochastically. Thus, the internal noises caused by the discrete nature of a system are inherent in the system, and they govern the minimum scattering expected of the random variable of interest. The experimentally observed scattering should always be larger than the predicted one induced by internal noises because of inevitable external noises attributable to experimental errors and the imprecision of measuring devices. This implies that it is not worthwhile to replicate the experiments excessively in an attempt to reduce the scattering far below what is predicted. It is interesting to note that the fluctuations reported by Gardner et al. are significantly higher than what the master equations predict. The number of cultures used in their fluorescence analysis was 40,000, and the actual number of cultures in the sample is much larger than this number. Therefore, the noise levels reported by Gardner et al. certainly involve not only internal but also external noises. External noises are the fluctuations created in an otherwise deterministic system by the application of a random force, whose stochastic properties are supposed to be known.
When they pass the saddle node, the two proteins do not have well-defined states as a deterministic model would depict. Instead, their populations are probabilistically distributed. The two proteins have not only similar populations but also similar uncertainties in their populations. In fact, as shown in the left-hand side of Figure 5, the uncertainties in their populations are of the same order of magnitude. These characteristics imply that there is a high probability that the relative sizes of the two protein populations are switched when the system approaches the unstable steady state. This switch brings the populations to the region above the separatrix in the phase diagram in Figure 3, and the vector field in that region eventually leads the process to the steady state #1 marked in the same figure. The noise-induced phase transition has been examined in detail by Nicolis and Turner, Malek Mansour et al., and Horsthemke and Lefever. Nicolis and Turner have shown that the fluctuations are enhanced at a ‘critical point’ (populations closest to the unstable steady state); the variances are of the order of Ω^(−1/2), a result consistent with that derived by van Kampen for expanding the master equation by system-size expansion. Thus, systems with low populations are more susceptible to noise-induced transitions. The noise enhancement near the instability is illustrated in Figure 5. Once the system moves away from the instability, the noises decrease and noise-induced transition becomes more difficult. Internal fluctuations do not change the local stability of the system, and the position of transition points is in no way modified by the presence of these fluctuations.
It should be mentioned that the parameter values for our simulation are carefully chosen to illustrate the possibility of noise-induced transition. Gardner et al. did not observe this possibility, probably because the populations of their system are very large and the difference between the two protein populations at the critical point is large, as discussed earlier.
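The dependence of the relative noise on system size can be made explicit with a short worked relation. It assumes that Equations 13 and 14, which are not reproduced in this section, take the standard system-size form in which a population is split into a macroscopic part of order Ω and a fluctuating part of order Ω^(1/2); the subscript-free symbols below follow the nomenclature for repressor 1.

```latex
\begin{align}
  N_1 &= \Omega\,\phi + \Omega^{1/2}\,Y_1, \qquad \langle Y_1 \rangle = 0
      \quad \text{(assumed form of Equation 13)} \\
  \langle N_1 \rangle &= \Omega\,\phi, \qquad
  \sigma_{N_1} = \Omega^{1/2}\,\sigma_{Y_1} \\
  \frac{\sigma_{N_1}}{\langle N_1 \rangle}
      &= \Omega^{-1/2}\,\frac{\sigma_{Y_1}}{\phi}
\end{align}
```

Under this assumption, the fluctuation relative to the mean shrinks as Ω^(−1/2), so a tenfold decrease in system size enlarges the relative noise by a factor of about 3.2, which is why small populations near the saddle node are the ones prone to switching.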
Case B, in which the second synthesis-rate parameter is 4.0, represents a system on the bifurcation line, as marked in the bifurcation diagram in Figure 4. Gardner gives a good exposition of the dependence of the bifurcation diagram and the phase diagram on the parameters. There are two steady states in the phase diagram, similar to Figure 3; one is stable and the other, unstable. Figure 6 presents the simulated results of this system based on the master equations. Similar to Case A, the populations eventually reach the stable steady state. The populations do not stay in the vicinity of the saddle node for as long as they do in Case A. Although the two protein populations are very close to each other and the fluctuations are of the same order as the populations during this time period, the existence of only one stable steady state guarantees the system’s final destination.
Case C, in which the second synthesis-rate parameter is 1.2, is a monostable system, as marked in the bifurcation diagram in Figure 4. Figure 7 presents the simulated results of this system based on the master equations. Similar to Case B, the populations eventually reach the stable steady state. Although the fluctuations of the two protein populations pass through maxima during the evolution of the dynamics, they eventually vanish.
Figure 8 presents the results from a simulation very similar to Case A. It uses the identical set of parameters as Case A but a different set of initial conditions: u(0) = 14 and v(0) = 154. It is a bistable system, and the initial conditions represent a point above the separatrix in the phase diagram in Figure 3. As expected, the process eventually reaches the steady state #1 shown in Figure 3. Unlike Case A, however, the dynamics do not pass through the proximity of the saddle node, and the fluctuations around the means do not permit easy switching between the two populations, see Figure 8.
Simulation based on Monte Carlo procedure
Monte Carlo simulations have yielded results essentially indistinguishable from those generated from the master equations. This is expected since the algorithms based on the event-driven Monte Carlo procedure and master equations derived in the present work are rooted in identical assumptions, i.e., the Markov property and temporal homogeneity of the random variables. These assumptions lead to the definitions of transition intensity functions that are the cornerstones of the formulation of the master equations and of the Monte Carlo procedure.
The fact that the two algorithms have yielded essentially the same results implies that both indeed define the evolution of dynamic process in a precisely equivalent way. The master-equation algorithm generates the equations governing the statistical moments of the process, which can be readily varied to cover a wide range of initial conditions, whereas the Monte Carlo procedure will require far more computational time and storage space under such circumstances.
Internal noise-induced transition was clearly observed during Monte Carlo simulation for Case A. Figure 9 shows the traces from two independent Monte Carlo simulations with parameters and initial conditions identical to those for Case A. These two independent Monte Carlo simulations result in two different steady states, which is a consequence of internal noise-induced transition. It should be mentioned that the results based on the master equation, as shown in Figure 5, represent an averaged outcome of Ω independent Monte Carlo simulations (Ω = 155 + 154 = 309 for this case), which is indeed observed in our simulation experiments. As mentioned in the last section, systems of small populations are susceptible to large internal fluctuations (or uncertainties) in the evolution of their dynamics. The evolutions of protein statistics shown in Figure 5 also illustrate the large uncertainties after the populations enter the proximity of the saddle node. In fact, the uncertainty is of the same magnitude as the mean number of particles. Internal fluctuations are inherent characteristics of discrete systems that are beyond the regulation of external means. The results on the right-hand side of Figure 9 show a clear transition in protein numbers in a particular Monte Carlo simulation. The transition takes place soon after the populations enter the proximity of the saddle node. It is caused by the fact that the populations of both proteins are low and, therefore, they are susceptible to large internal fluctuations and noise-induced transitions. Noise-induced transition has been discussed by Nicolis and Turner, Malek Mansour et al., and Horsthemke and Lefever.
As mentioned in the ‘Introduction’ section, the master equation and its system-size expansion suffer from a few limitations. One such limitation is that the algorithm is valid only for dynamics well within the boundary of attraction. For bistable dynamics starting in a region outside this boundary, such as Case A, the Monte Carlo simulation converges to two possible steady states, whereas the master equation algorithm converges to only one.
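The splitting of independent runs between the two steady states can be reproduced qualitatively with a small experiment. As before, this is a sketch under assumptions, not the authors' code: the intensities are n1 and n2 for the losses and a/(1 + n2^m) and a/(1 + n1^M) for the gains, with the hypothetical name `a` for the common synthesis-rate parameter of Case A. Each run starts at (155, 154), and its outcome is classified only by which population is larger at the final time.

```python
import numpy as np

def final_state(n1, n2, a, t_end, rng, m=2, M=2):
    """Event-driven random walk; returns (n1, n2) at time t_end."""
    t = 0.0
    while True:
        # ASSUMED intensities: type-1 loss, type-1 gain, type-2 loss, type-2 gain.
        r = (float(n1), a / (1.0 + n2**m), float(n2), a / (1.0 + n1**M))
        total = r[0] + r[1] + r[2] + r[3]
        t += -np.log1p(-rng.random()) / total   # exponential waiting time
        if t > t_end:
            return n1, n2
        x = rng.random() * total                # pick one of the four events
        if x < r[0]:
            n1 -= 1
        elif x < r[0] + r[1]:
            n1 += 1
        elif x < r[0] + r[1] + r[2]:
            n2 -= 1
        else:
            n2 += 1

rng = np.random.default_rng(1)
n_runs = 40
outcomes = [final_state(155, 154, 15.6, 30.0, rng) for _ in range(n_runs)]
high_1 = sum(1 for n1, n2 in outcomes if n1 > n2)   # ended with type-1 dominant
high_2 = sum(1 for n1, n2 in outcomes if n2 > n1)   # ended with type-2 dominant
print("type-1 dominant:", high_1, "type-2 dominant:", high_2)
```

The deterministic equations started from the same point always end with type-1 dominant, so any run counted in `high_2` is an instance of the noise-induced transition described above, under the assumed rate laws.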
Some comparisons of the three algorithms are worth mentioning. The governing equations for the system-size expansion can be derived in a straightforward manner, though the detailed derivations may be cumbersome and time consuming. It requires only a minor transformation of variable for some unstable stochastic processes, such as the diffusion process, well beyond the initial transient period. Unlike the Monte Carlo simulation, the derived moment equations can be repeatedly integrated for different sets of parameters and initial conditions. Consequently, system-size expansion has been widely adopted in the derivation of the governing equations of stochastic processes governed by internal noises.
Kurtz’s algorithm is highly compact and convenient. The implementation of the rigorous Kurtz algorithm requires knowledge about the relations among the master, Langevin, and Fokker-Planck equations. It allows direct derivation of the equations governing the moments. However, the algorithm merely describes the dynamics in the initial transient period of unstable systems for selected processes, such as the diffusion process.
The Monte Carlo method is easy to implement because it bypasses all derivations of equations. It is most efficient when the number of random variables is large and the master equation is difficult to derive. Repeated simulations have to be carried out for different sets of parameters and initial conditions. The required computational time and disk space are usually high.
The current model adopts the essential concepts of a nonlinear toggle switch model for analyzing a protein-regulated system. The master equation algorithm, along with its system-size expansion, involves the stochastic probability balance of the two types of populations. The resultant master equation yields not only the deterministic evolution of protein populations during gene expression, but also the fluctuations, or uncertainties, inherent in the prediction or measurement. Kurtz’s limit theorems significantly reduce the complex and laborious exercise of system-size expansion. In fact, they will be indispensable tools for the analysis of highly complex genetic networks.
The validity of the model is amply demonstrated by numerically calculating the evolution of population of both types and their fluctuations over time through two simulation algorithms, one based on the master equations and the other based on the event-driven Monte Carlo procedure. These two algorithms are implemented totally independently of each other but with the same set of system parameters, i.e., the transition intensity functions. Hence, it is indeed remarkable that the two algorithms have yielded essentially identical results.
Both simulation results demonstrate the possibility of noise-induced transition when the dynamics pass through the proximity of the saddle node. It happens when the protein populations are low and the noises are of the same order of magnitude as the populations. This property may have practical applications in developing gene therapy, cell cycle control, and protein sensors.
E, one-step operator; N 1 , random variable representing population of repressor 1; N 2 , random variable representing population of repressor 2; n 1 , realization of random variable, N 1 (t); n 2 , realization of random variable, N 2 (t); N, random vector, i.e., [N 1 (t), N 2 (t)]; n, realization of random vector N (t); P n , probability that the system is at state n at time t; t, time; Y, random variable denoting the fluctuations about macroscopic behavior; y, realization of random variable Y; Q, the transition probability; u, Gardner’s concentration of repressor 1; v, Gardner’s concentration of repressor 2; K1, effective reaction rate for repressor 1 formation; K2, effective reaction rate for repressor 2 formation.
α1, the rate of production of repressor 1; α2, the rate of degradation of repressor 1; α3, the rate of production of repressor 2; α4, the rate of degradation of repressor 2; , the effective rate of synthesis of repressor 1 on system size; , the effective rate of synthesis of repressor 2 on system size; ϕ, macroscopic number of repressor 1; θ, macroscopic number of repressor 2; τ w , the waiting time; λ, transition intensity function; Ψ, joint probability distribution in terms of random vector Y ; Ω, total number of repressors or system size; β, the multimerization constant of repressor 1; γ, the multimerization constant of repressor 2; Θ, the vector representing the two mean numbers of proteins in system-size expansion.
1, repressor 1; 2, repressor 2
Appendix 1: system-size expansion
The constituent populations at any time in the genetic toggle switch system can be represented by the random vector N(t) = [N1(t), N2(t)], their mean values can be taken as a deterministic vector Θ(t) = [ϕ(t), θ(t)] and their fluctuations can be taken as another random vector given by Y (t), where Y(t) = [Y1(t), Y2(t)]. As stated in Equations 13 and 14 in the text:
The realizations of these expressions are given, respectively, by
Accordingly, the joint probability of n1 and n2, P n (t), is now transformed into that of y1 and y2, i.e., Ψ y (t).
Recall that in the context of deriving the master equation, the state or dependent variable of interest is the joint probability of the population distribution, P n (t), and the realization of random variables at time t, i.e., n1 and n2, are invariant with respect to time. Consequently, the time derivatives of Equations 80 and 81 are, respectively,
For the convenience of the subsequent expansion of the master equation, Eq. 11 in the text is restated below
Without causing confusion, the subscript y of Ψ y (t) is omitted in the subsequent discussion. The two step operators acting on n1 convert n1 to n1 + 1 and n1 − 1, respectively. Similarly, Equation 80 suggests that the first of these shifts y1 to y1 + Ω^(−1/2). Therefore, the operations of the step operators in Equation 84 are equivalent to evaluating the values of the target functions at shifted points through the following Taylor series expansions, i.e.,
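Since the displayed expressions are not reproduced in this appendix, the standard relations that the text describes can be written out as a sketch. The forms below assume the usual system-size ansatz for Equations 80 and 81 and use the assumed symbol E_1 for the step operator acting on n1; the analogous relations hold for n2, θ, and y2. They are consistent with the statement that a unit increase in n1 shifts y1 by Ω^(−1/2).

```latex
\begin{align}
  n_1 &= \Omega\,\phi(t) + \Omega^{1/2}\,y_1, \qquad
  n_2 = \Omega\,\theta(t) + \Omega^{1/2}\,y_2
      \quad \text{(assumed form of Equations 80 and 81)} \\
  \frac{\mathrm{d}y_1}{\mathrm{d}t} &= -\Omega^{1/2}\,\frac{\mathrm{d}\phi}{\mathrm{d}t}
      \quad \text{(since } n_1 \text{ is held fixed)} \\
  E_1^{\pm 1} f(n_1) &= f(n_1 \pm 1)
      \;\Longrightarrow\;
  E_1^{\pm 1} = 1 \pm \Omega^{-1/2}\,\frac{\partial}{\partial y_1}
      + \frac{1}{2}\,\Omega^{-1}\,\frac{\partial^2}{\partial y_1^2} \pm \cdots
\end{align}
```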
Substituting Equations 85 through 89 into 84 yields
In order to collect the terms of the same power of Ω in the subsequent expansion, the Ω dependence of the parameters in the above equation has to be examined, and the parameters have to be converted to their Ω-independent counterparts. The definitions of α1 and α3 in the ‘Model formulation’ section suggest that they are proportional to the system size, Ω, i.e.,
where and are independent of the system size, Ω. Moreover, the definitions of equilibrium constants, K a and K b , for the gene repression, G + m R ⇆ GR m , in the beginning of the ‘Model formulation’ section suggest
where and are independent of the system size, Ω.
The first and third terms on the right-hand side of the above expression can be expanded in power of Ω through known power and binomial expansions. Specifically, for small , we have
where denotes a binomial coefficient. Lumping the terms of the same power of Ω in the above expansion gives
Substituting the above expression into the first term on the right-hand side of Equation 93 yields
The expansion of the second term on the right-hand side of Equation 93 gives
Following the same procedure, the third and fourth terms on the right-hand side of Equation 93 can be expanded into the following power series of , respectively:
This equation can be rearranged into the form of a linear Fokker-Planck equation as follows: