Remarks on the statistical study of protein-protein interaction in living cells

In this note, we focus on a model selection problem: a mono-exponential model versus a bi-exponential one. This is done in the biological context of living cells, where only small data sets are available. Classical statistics are revisited to improve existing …

Authors: Ph. Heinrich, J. Kahn, L. Héliot

Ph. Heinrich, J. Kahn, L. Héliot, and D. Trinel

Abstract. In this note, we focus on a model selection problem: a mono-exponential model versus a bi-exponential one. This is done in the biological context of living cells, where only small data sets are available. Classical statistics are revisited to improve existing results. Some unavoidable limits are also pointed out.

2010 Mathematics Subject Classification. 62F03, 92C40.
Key words and phrases. Maximum likelihood, multi-exponential model, model selection, FRET, FLIM, TCSPC.

1. Introduction

The measurement of molecular dynamic interactions and their respective proportions in living cells or tissues is a major question in biological and medical research. Förster resonance energy transfer (FRET) is one of the best known approaches to observe and quantitatively study protein-protein interactions at a subcellular level ([7]). FRET measurements can currently be performed by fluorescence lifetime imaging microscopy (FLIM for short) in living cells and tissues. This can be achieved via the time-correlated single photon counting (TCSPC) method, which provides a lifetime decay curve per site ([8]). To be interpreted, this curve is fitted by selecting the "best" (with respect to a given statistical criterion) multi-exponential model. Contrary to a mono-exponential model, a bi-exponential one witnesses interaction between two proteins. Our aim is to find, pixel by pixel, which of these models is accurate. One difficulty is that the number of observed photons per pixel is small for any statistical treatment: it has to stay small in order to preserve the living cell, and therefore cannot be increased. An attempt to deal with the problem can be found in [7]. Our aim here is to go further in this direction, pointing out some improvements and limits. Some account of statistical methods in this area can be found in [5] and [6].

1.1. Modelling fluorescence lifetimes.

It is not necessary to describe FLIM and TCSPC in detail here. We only need to understand that lifetimes are measured as differences between excitation times (pulses) and emission times of photons. Denote by $r$ the period between two consecutive pulses. Here $r$ is 12 nanoseconds, close to values taken in practice. What is actually measured is a lifetime modulo $r$, since we cannot be sure which pulse a photon comes from. It is assumed that lifetimes come from, say, $K$ species and are observed in the interval $[0, r)$ after infinitely many pulses. Under these conditions, each lifetime species $k$ ($1 \le k \le K$) admits the following probability density:

$$ f_k(t) = \frac{\alpha_k \exp(-\alpha_k t)\, \mathbb{1}_{[0,r)}(t)}{1 - \exp(-\alpha_k r)} \tag{1} $$

where $\alpha_k$ is the inverse mean lifetime of the $k$-th species. A uniform noise is added with density

$$ f_0(t) = \frac{\mathbb{1}_{[0,r)}(t)}{r}. \tag{2} $$

If $\pi_k$ denotes the proportion of the $k$-th species ($\pi_0$ refers to that of the noise), we get the probability density of the fluorescence lifetime by writing

$$ g(t) = \sum_{k=0}^{K} \pi_k f_k(t). \tag{3} $$

1.2. Modelling the photon emission.

Let $I_k$ be the mean photon number of species $k$ detected between two pulses. Assume that photon occurrences are independent. Then the total number of detected photons is Poisson distributed with intensity $T \sum_{k=0}^{K} I_k$ if observations take place during $T$ pulses. For later use, it is convenient to set

$$ I = \sum_{k=0}^{K} I_k. \tag{4} $$

Note that we have $\pi_k = I_k / I$. Since the noise intensity $I_0$ will be supposed known, it is convenient to consider the proportions $\pi'_k$ among the species $k \ge 1$ only, that is, excluding the noise $k = 0$. Thus, we have for $k \ge 1$,

$$ \pi'_k = \frac{I_k}{I - I_0} = \frac{I}{I - I_0}\, \pi_k. $$

1.3. Maximum likelihood estimation (MLE) and likelihood ratio test.

The first aim is to determine the most probable parameter $\theta := (\alpha_1, \dots, \alpha_K, I_1, \dots, I_K)$ from the observed lifetimes modulo $r$, denoted by $t_1, \dots, t_n$. The noise intensity $I_0$ is supposed known. The related log-likelihood is then

$$ L(\theta) = L(\theta; t_1, \dots, t_n) = -IT + n \log(IT) - \log(n!) + \sum_{i=1}^{n} \log(g(t_i)). \tag{5} $$

For physical reasons, in particular since lifetimes are sure to be between 30 ps and 30 ns, we may and do assume that $\theta$ lies in a compact parameter set.

Numerical optimisation of the likelihood (5) is made easier by knowing the derivatives:

$$ \frac{\partial g(t)}{\partial I_k} = \sum_{l=0}^{K} \frac{I_l}{I^2}\, [f_k(t) - f_l(t)] = \frac{f_k(t) - g(t)}{I}; $$

$$ \frac{\partial g(t)}{\partial \alpha_k} = \mathbb{1}_{[0,r)}(t)\, \frac{I_k}{I}\, \frac{e^{-\alpha_k t}}{(1 - e^{-\alpha_k r})^2} \left[ 1 - \alpha_k t + (\alpha_k t - \alpha_k r - 1)\, e^{-\alpha_k r} \right]; $$

$$ \frac{\partial L(\theta)}{\partial I_k} = -T + \frac{n}{I} + \sum_{i=1}^{n} \frac{f_k(t_i) - g(t_i)}{I\, g(t_i)} = -T + \frac{1}{I} \sum_{i=1}^{n} \frac{f_k(t_i)}{g(t_i)}; $$

$$ \frac{\partial L(\theta)}{\partial \alpha_k} = \sum_{i=1}^{n} \frac{\partial g(t_i) / \partial \alpha_k}{g(t_i)} = \frac{I_k}{I} \sum_{i=1}^{n} \frac{\partial f_k(t_i) / \partial \alpha_k}{g(t_i)}. $$

Denote by $\theta^*_K$ the most probable parameter if there are $K$ species. To decide next which of the models $K = 1$ or $K = 2$ is the most accurate, a classical statistic is the likelihood ratio

$$ D := [L(\theta^*_2) - L(\theta^*_1)]. $$

From a theoretical point of view, since we are dealing with the number of components of a mixture model, even the asymptotics under the null hypothesis are not the usual $\chi^2$ statistics. The limiting distribution can be expressed as a supremum over a Gaussian process on a subset of a four-dimensional unit sphere (in our case) endowed with the "right" covariance function ([1] and references therein). However, this process also depends on the "true" point $\theta$.
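As a minimal sketch of the quantities defined in Sections 1.1 to 1.3, the densities (1) to (3), the log-likelihood (5) and its derivative with respect to $I_k$ could be coded as below. The function names and argument layout are illustrative assumptions, not the authors' implementation; only the formulas and the value $r = 12$ ns come from the text.

```python
import math

R = 12.0  # pulse period r in nanoseconds (value quoted in the text)


def f_k(t, alpha, r=R):
    """Density (1): exponential lifetime of rate alpha observed modulo r."""
    if not 0.0 <= t < r:
        return 0.0
    return alpha * math.exp(-alpha * t) / (1.0 - math.exp(-alpha * r))


def f_0(t, r=R):
    """Density (2): uniform noise on [0, r)."""
    return 1.0 / r if 0.0 <= t < r else 0.0


def g(t, alphas, intensities, i0, r=R):
    """Mixture density (3), with proportions pi_k = I_k / I."""
    total = i0 + sum(intensities)
    mix = i0 * f_0(t, r)
    for alpha, ik in zip(alphas, intensities):
        mix += ik * f_k(t, alpha, r)
    return mix / total


def log_likelihood(times, alphas, intensities, i0, n_pulses, r=R):
    """Log-likelihood (5); math.lgamma(n + 1) equals log(n!)."""
    n = len(times)
    total = i0 + sum(intensities)
    out = -total * n_pulses + n * math.log(total * n_pulses) - math.lgamma(n + 1)
    out += sum(math.log(g(t, alphas, intensities, i0, r)) for t in times)
    return out


def dL_dI(k, times, alphas, intensities, i0, n_pulses, r=R):
    """Derivative of (5) in I_k (k = 1..K): -T + (1/I) * sum_i f_k(t_i)/g(t_i)."""
    total = i0 + sum(intensities)
    alpha = alphas[k - 1]
    ratio_sum = sum(
        f_k(t, alpha, r) / g(t, alphas, intensities, i0, r) for t in times
    )
    return -n_pulses + ratio_sum / total
```

The analytic derivative can be compared with a finite difference of `log_likelihood` before it is handed to a numerical optimiser, which is a cheap way to catch sign or index mistakes.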
Since the calculations for this limiting distribution are complicated, it is easier to simply simulate if we want to know the level of a test associated with a given threshold. Notice, on the other hand, that simulations hint that the likelihood ratio test is quite efficient for determining the number of components in a mixture with compact parameter set (see for example [4] or [3]).

2. Selection of the number of exponential species K

2.1. Comparisons.

We restricted ourselves to testing $K = 1$ versus $K = 2$. This can already be a difficult and interesting question if few observed photons are available. With the help of simulated observations, we first optimised $\theta$ by MLE for each $K$ and next tested $K = 1$ versus $K = 2$ via the likelihood ratio statistic $D$. Compared to the test given in [7], the preceding statistical test is as efficient but with about 100 times fewer observations. For the reader's convenience and for comparison, consider the table obtained in [7]:

Nbr of photons    Δχ²: 10.0   20.0   30.0   40.0   50.0   90.0    Error (%)
1000                   35.7   34.8   34.3   34.6   34.9   45      > 20
10000                  13.7   12.0   11.9   12.1   12.9   27.3    < 20
100000                  4.2    1.7    2.3    2.7    4.7   26.3    < 5
1000000                 1.7    0.0    0.0    0.0    3.3   32.7    < 2

Table 1. Frequency (%) of selection of the wrong model. It depends on the number of observations and on a $\Delta\chi^2$ criterion, which consists in comparing the $\chi^2$ statistics for $K = 1$ and $K = 2$. Simulations were performed on a mix of $1/\alpha_1 = 0.6$ ns and $1/\alpha_2 = 2.4$ ns with different proportions $\pi'_1 = 0, .077, .2, .43, 1$ and with 100 noise photons. 30 simulations per condition.

In similar simulation conditions, we have obtained the following:

Nbr of photons    Mean error rate (%)    Best threshold    Mean error rate at threshold 4
1000              12.8                   .85               20
10000             0.3                    4                 0.3
100000            0                      4                 0

Table 2. Frequency of selection of the wrong model. It depends on the number of observations and on a likelihood ratio criterion. Simulations were performed on a mix of $1/\alpha_1 = 0.6$ ns and $1/\alpha_2 = 2.4$ ns with different proportions $\pi'_1 = 0, .077, .2, .43, 1$ and with 100 noise photons. 500 simulations for each proportion and number of photons. Here, the mean error rate is the percentage of wrong model selections, averaged over the simulations. The best threshold is the threshold that gives the smallest mean error rate, found by a very crude optimisation.

Notice that the strange values $0, .077, .2, .43, 1$ of the proportion $\pi'_1$ correspond to the values $0, .25, .5, .75, 1$ of the proportion

$$ \eta_1 = \frac{\pi'_1 \alpha_1}{\pi'_1 \alpha_1 + \pi'_2 \alpha_2} $$

considered in [7]. A consequence is that we never test more short-life photons than long-life ones. Moreover, the case $\pi'_1 = .077$ is not very far away from the mono-exponential case. In particular, with 1000 photons, among which 100 are noise photons, the expected number of photons with 0.6 nanosecond lifetime is less than the number of noise photons. If we compute the error rate for 1000 photons without that case, we obtain for instance a 2.6% error rate for the likelihood ratio test at threshold 3.

2.2. Simulation scheme.

The data set generation algorithm is as follows:
1. Sample $n_k$, the number of photons for each species $k$, including noise ($k = 0$), from a Poisson distribution of parameter $T I_k$.
2. Draw $n_k$ lifetimes with distribution density $f_k$ for each species $k$.
3. Return the set of all the sampled lifetimes, regardless of $k$.

Some differences between simulation methods should be noted:
• We use a random Poissonian number of photons rather than a fixed number of photons: we take into account the "offset noise".
• Instrumental response: we neglect the .03 nanosecond long instrumental response function.
• Exact times vs channels: we did not use bins and worked as if we knew the exact detection times.
Nevertheless, these differences should have little effect and comparisons still make sense.
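The three steps of the data set generation algorithm can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the Poisson sampler (Knuth's multiplication method applied to chunks of the mean) and the inverse-CDF draw from $f_k$ are choices made here for a dependency-free example, and the text does not say how the authors sampled.

```python
import math
import random

R = 12.0  # pulse period r in nanoseconds (value quoted in the text)


def poisson(mean, rng, chunk=30.0):
    """Poisson(mean) as a sum of independent Poisson draws with small means.

    Each chunk uses Knuth's multiplication method; chunking avoids the
    underflow of exp(-mean) for large means. (Sampler chosen for this sketch.)
    """
    count = 0
    remaining = mean
    while remaining > 0:
        m = min(chunk, remaining)
        remaining -= m
        limit = math.exp(-m)
        prod = rng.random()
        while prod > limit:
            count += 1
            prod *= rng.random()
    return count


def sample_lifetime(alpha, rng, r=R):
    """Draw from density f_k by inverting its CDF on [0, r)."""
    u = rng.random()
    return -math.log(1.0 - u * (1.0 - math.exp(-alpha * r))) / alpha


def simulate(alphas, intensities, i0, n_pulses, rng, r=R):
    """Steps 1-3: Poisson counts per species, lifetimes, pooled sample."""
    times = []
    # Species k = 0: uniform noise on [0, r).
    for _ in range(poisson(n_pulses * i0, rng)):
        times.append(rng.random() * r)
    # Species k >= 1: exponential lifetimes observed modulo r.
    for alpha, ik in zip(alphas, intensities):
        for _ in range(poisson(n_pulses * ik, rng)):
            times.append(sample_lifetime(alpha, rng, r))
    rng.shuffle(times)  # the species labels are not observed
    return times


# Example: about 1000 photons, 100 of them noise, pi'_1 = 0.2.
rng = random.Random(0)
sample = simulate([1 / 0.6, 1 / 2.4], [180e-6, 720e-6], 100e-6, 1_000_000, rng)
```

Passing an explicit `random.Random` instance keeps each simulated condition reproducible, which helps when error rates are averaged over several hundred runs.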
3. Further comments

3.1. With closer lifetimes.

If we choose $1/\alpha_1 = 1$ ns and $1/\alpha_2 = 2$ ns as mean lifetimes, it is harder to select the right number of species:
• With 10000 photons and $\pi_0 / (1 - \pi_0) = .01$ as noise ratio:
  – If $\pi'_1 = \pi'_2 = .5$ or $\pi'_1 = .75$, $\pi'_2 = .25$, no wrong selection should occur,
  – If $\pi'_1 = .25$, $\pi'_2 = .75$, the error rate is about .1% when the threshold is calibrated so as to balance errors "mono towards bi" and "bi towards mono".
• With 1000 photons and $\pi_0 / (1 - \pi_0) = .01$ as noise ratio: if $\pi'_1 = \pi'_2 = .5$, the error rate is about 15% when the threshold is calibrated so as to balance errors "mono towards bi" and "bi towards mono".

If we choose close mean lifetimes such as $1/\alpha_1 = 1.4$ ns and $1/\alpha_2 = 1.6$ ns, we are too close to the "border" of the model, and about 1 million photons are required to distinguish the two components. By border, we mean a proportion close to 0, or $1/\alpha_1$ close to $1/\alpha_2$, so that identifiability problems occur with small samples. Asymptotically, when we get $n$ times closer to the border of a mixture model, we need $n^4$ times as many photons to get the same statistical efficiency, for any procedure [2].

3.2. Absolute limits.

The preceding statement about rates when we get nearer the border is a first expression of limits that cannot be broken, no matter how smart the statistical procedure. To give a small taste of what to expect, here are the best error rates when having to choose specifically between two possible sets of lifetime parameters and corresponding probability distributions $f_1$ and $f_2$, with equal a priori probabilities. In that situation, which is easier than the one studied in the article, the optimal choice is the one with the greater observed likelihood, and the error rate is $\frac{1}{2} - \frac{1}{4} \| f_1 - f_2 \|_1$.
• With 32 observed photons and a signal to noise ratio of 1/10, choose between a mono-exponential with lifetime 2.4 ns, and a bi-exponential with proportions 0.077 and 0.923 and lifetimes 0.6 and 2.4 ns: optimal error rate > 25%.
• With 32 observed photons and no noise, choose between a mono-exponential with lifetime 2.6 ns, and a bi-exponential with proportions one half and lifetimes 2.5 and 2.7 ns: optimal error rate > 49.75%.
The second case is almost as bad as a coin toss, ignoring the data.

References

[1] Jean-Marc Azaïs, Elisabeth Gassiat, and Cécile Mercadier. Asymptotic distribution and local power of the log-likelihood ratio test for mixtures: bounded and unbounded cases. Bernoulli, 12(5):775–799, 2006.
[2] Jiahua Chen. Optimal Rate of Convergence for Finite Mixture Models. Ann. Statist., 23(1):221–233, 1995.
[3] Jiahua Chen and John D. Kalbfleisch. Modified likelihood ratio test in finite mixture models with a structural parameter. Journal of Statistical Planning and Inference, 129:93–107, 2005.
[4] B. Goffinet, P. Loisel, and B. Laurent. Testing in normal mixture models when the proportions are known. Biometrika, 79:842–846, 1992.
[5] Peter Hall and Ben Sellinger. Better Estimates of Exponential Decay Parameters. J. Phys. Chem., 85:2941–2946, 1981.
[6] Michael Maus, Mircea Cotlet, Johan Hofkens, Thomas Gensch, and Frans C. De Schryver. An Experimental Comparison of the Maximum Likelihood Estimation and Nonlinear Least-Squares Fluorescence Lifetime Analysis of Single Molecules. Anal. Chem., 73:2078–2086, 2001.
[7] Corentin Spriet, Dave Trinel, Franck Ricquet, Bernard Vandenbunder, and Laurent Héliot. Enhanced FRET Contrast in Lifetime Imaging. Cytometry Part A, 73A:745–753, 2008.
[8] François Waharte, Corentin Spriet, and Laurent Héliot. Setup and Characterization of a Multiphoton FLIM Instrument for Protein-Protein Interaction Measurement in Living Cells. Cytometry Part A, 69A:299–306, 2006.

Laboratoire Paul Painlevé, UMR CNRS 8524, Université Lille 1, Cité Scientifique, 59655 Villeneuve d'Ascq Cedex, France
E-mail address: philippe.heinrich@univ-lille1.fr, jonas.kahn@math.univ-lille1.fr

Interdisciplinary Research Institute, Parc de la Haute Borne, 50 avenue de Halley BP 70478, 59658 Villeneuve d'Ascq Cedex, France
E-mail address: laurent.heliot@iri.univ-lille1.fr, dave.trinel@iri.univ-lille1.fr
