Generative Modeling with Conditional Autoencoders: Building an Integrated Cell

We present a conditional generative model to learn variation in cell and nuclear morphology and the location of subcellular structures from microscopy images. Our model generalizes to a wide range of subcellular localization and allows for a probabilistic interpretation of cell and nuclear morphology and structure localization from fluorescence images. We demonstrate the effectiveness of our approach by producing photo-realistic cell images using our generative model. The conditional nature of the model provides the ability to predict the localization of unobserved structures given cell and nuclear morphology.

Authors: Gregory R. Johnson, Rory M. Donovan-Maiye, Mary M. Maleckar

Gregory R. Johnson (gregj@alleninstitute.org), Rory M. Donovan-Maiye (rorydm@alleninstitute.org), Mary M. Maleckar (mollym@alleninstitute.org)
Allen Institute for Cell Science, 615 Westlake Ave N, Seattle, WA 98109

1. Introduction

A central biological principle is that cellular organization is strongly related to function. Location proteomics (Murphy, 2005) addresses this by aiming to determine cell state – i.e. subcellular organization – by elucidating the localization of all structures and how they change through the cell cycle and in response to perturbations, e.g., mutation. However, determining cellular organization is challenged by the multitude of different molecular complexes and organelles that comprise living cells and drive their behaviors (Kim et al., 2014). Currently, the experimental state of the art for live-cell imaging is limited to the simultaneous visualization of only a small number (2–6) of tagged molecules. Modeling approaches can address this limitation by integrating subcellular structure data from diverse imaging experiments.
Due to the number and diversity of subcellular structures, it is necessary to build models that generalize well with respect to both representation and interpretation. Image-feature-based methods have previously been employed to describe and model cell organization (Boland & Murphy, 2001; Carpenter et al., 2006; Rajaram et al., 2012). While useful for discriminative tasks, these approaches do not explicitly model the relationships between subcellular components, which limits their application to the integration of all of these structures.

Generative models are useful in this context. They capture variation in a population and encode it as a probability distribution, accounting for the relationships among structures. Fundamental work has previously demonstrated the utility of expressing subcellular structure patterns as a generative model, which can then be used as a building block for models of cell behavior (Murphy, 2005; Donovan et al., 2016).

Ongoing efforts to construct generative models of cell organization are primarily associated with the CellOrganizer project (Zhao & Murphy, 2007; Peng & Murphy, 2011). That work implements a "cytometric" approach to modeling that considers the number of objects, lengths, sizes, etc. from segmented images and/or inverse procedural modeling, which can be particularly useful both for analyzing image content and for approaching integrated cell organization. These methods support parametric modeling of many subcellular structure types and, as such, generalize well when low amounts of appropriate imaging data are available. However, these models may depend on preprocessing methods, such as segmentation, or other object-identification tasks for which a ground truth is not available.
Additionally, there may exist subcellular structures for which a parametric model does not exist or may not be appropriate, e.g., structures that vary widely in localization (diffuse proteins), or that reorganize dramatically during, e.g., mitosis or a stimulated state (such as microtubules). Thus, the presence of key structures for which current methods are not well suited motivates the need for a new approach that generalizes well to a wide range of structure localization.

Recent advances in adversarial networks (Goodfellow et al., 2014) are relevant to our problem. They have the ability to learn distributions over images, generate photo-realistic exemplars, and learn sophisticated conditional relationships; see, e.g., Generative Adversarial Networks (Goodfellow et al., 2014), Variational Autoencoders/GANs (Larsen et al., 2015), and Adversarial Autoencoders (Makhzani et al., 2015).

Leveraging these recent advances, we present a nonparametric model of cell shape and nuclear shape and location, and relate it to the variation of other subcellular components. The model is trained on data sets of 300–750 fluorescence microscopy images; it accounts for the spatial relationships among these components and their fluorescent intensities, and generalizes well to a variety of localization patterns. Using these relationships, the model allows us to predict the outcome of unobserved experiments, as well as encode complex image distributions into a low-dimensional probabilistic representation. This latent space serves as a compact coordinate system in which to explore variation. In the following sections, we present the model, a discussion of the training and conditional modeling, and initial results that demonstrate its utility. We then briefly discuss the results in context, along with current limitations of the work and future extensions.

2. Model Description

Our generative model serves several distinct but complementary purposes. At its core, it is a probabilistic model of cell and nuclear shape (specifically, of cell shape and nuclear shape and location) wedded to a probability distribution of structure localization (e.g. the localization of a certain protein) conditional on cell and nuclear shape. This model, in toto, can be used both as a classifier for images of localization pattern where the protein is unknown, and as a tool with which one can predict the localization of unobserved structures de novo.

The main components of our model are two autoencoders: one which encodes the variation in cell and nuclear shape, and another which learns the relationship between subcellular structures dependent on this encoding.

Notation

The images input and output by the model are multi-channel (see figure 2). Each image x consists of both reference channels r and a structure channel s. Here, the cell and nuclear channels together serve as reference channels, and the structure channel varies, taking on one of the following structure types: α-actinin (actin bundles), α-tubulin (microtubules), β-actin (actin filaments), desmoplakin (desmosomes), fibrillarin (nucleolus), lamin B1 (nuclear membrane), myosin IIB (actomyosin bundles), Sec61β (endoplasmic reticulum), TOM20 (mitochondria), and ZO1 (tight junctions). We denote which content is being used by superscripts: x^{r,s} indicates all channels are being used, whereas x^s indicates only the structure channel, and x^r only the reference channels. We use y to denote an index-valued categorical variable indicating which structure type is labeled in x^s. For example, y = 1 might correspond to the α-actinin channel being active, y = 2 to the α-tubulin channel, etc. While y is a scalar integer, we also use y, a one-hot vector representation of y, with a one in the yth element of y and zeros elsewhere.

Figure 1: The presented model. The top half of the diagram outlines the reference structure model; the bottom half shows the conditional model. The parallel white boxes indicate a nonlinear function. The model is a probabilistic model of cell and nuclear shape (specifically, of cell shape and nuclear shape and location) wedded to a probability distribution of structure localization (e.g. the localization of a certain protein) conditional on cell and nuclear shape. This model can be used both as a classifier for images of localization pattern where the protein is unknown, and as a tool for prediction of the localization of unobserved structures de novo. The main components are two autoencoders: one encoding the variation in cell and nuclear shape, and another which learns the relationship between subcellular structures dependent on this encoding. See Notation and Model description for details. Figure adapted from (Makhzani et al., 2015).

2.1. Model of cell and nuclear variation

We model cell shape and nuclear shape using an autoencoder to construct a latent-space representation of these reference channels. The model (figure 1, upper half) attempts to map images of reference channels to a multivariate normal distribution of moderate dimension – here we use a sixteen-dimensional distribution. The choice of a normal distribution as the prior for the latent space is in many respects one of convenience, and of small consequence to the model. The nonlinear mappings learned by the encoder and decoder are coupled to both the shape and dimensionality of the latent space distribution; the mapping and the distribution only function in tandem – see e.g. figure 4 in (Makhzani et al., 2015).
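The notation above can be made concrete with a short sketch that splits a multi-channel image into reference channels x^r and structure channel x^s, and builds the one-hot vector y from the scalar label y. The channel count, image size, and K = 10 structure classes follow the data described in this paper, but the array layout (channels-first) is our own illustrative assumption, not the authors' implementation.

```python
import numpy as np

K = 10  # number of labeled structure types (alpha-actinin ... ZO1)

def split_channels(x):
    """Split a (3, H, W) image into the reference channels x^r
    (cell membrane + nucleus) and the structure channel x^s."""
    x_r = x[:2]   # reference channels r
    x_s = x[2:]   # structure channel s
    return x_r, x_s

def one_hot(y, k=K):
    """One-hot vector representation of the scalar class label y (1-indexed)."""
    v = np.zeros(k)
    v[y - 1] = 1.0
    return v

x = np.random.rand(3, 256, 256)   # a toy x^{r,s}
x_r, x_s = split_channels(x)
y_vec = one_hot(2)                # y = 2: the alpha-tubulin channel is active
```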
The primary architecture of the model is that of an autoencoder, which itself consists of two networks: an encoder Enc^r that maps an image x to a latent space representation z via a learned deterministic function q(z^r | x^r), and a decoder Dec^r that reconstructs samples from the latent space representation using a similarly learned function g(x̂^r | z^r). We use the following notation for these mappings:

z^r = q(z^r | x^r) = Enc^r(x^r)    (1)
x̂^r = g(x̂^r | z^r) = Dec^r(z^r)    (2)

where an input image x is distinguished from a reconstructed image x̂ by the hat over the vector.

2.1.1. Encoder and Decoder

The autoencoder minimizes the pixel-wise binary cross-entropy loss between the input and the reconstructed input,

L_{x^r} = H(x̂^r, x^r)    (3)

where

H(û, u) = −(1/n) Σ_p [ u_p log û_p + (1 − u_p) log(1 − û_p) ]    (4)

and the sum is over all the pixels p in all the channels of the images u. We use this function for all images regardless of content (i.e. we use it for x^r and x^{r,s}).

2.1.2. Encoding Discriminator

In addition to minimizing the above loss function, the autoencoder's latent space – the output of Enc^r – is regularized by the use of a discriminator EncD^r, the encoding discriminator. This discriminator EncD^r attempts to distinguish between latent space embeddings that are mapped from the input data, and latent space embeddings that are drawn from the desired prior latent space distribution (which here is a sixteen-dimensional multivariate normal). In attempting to fool the discriminator, the autoencoder is forced to learn a latent space distribution q(z^r) that is similar in form to the prior distribution p(z^r) (Makhzani et al., 2015). The encoding discriminator EncD^r is trained on samples from both the embedding space z ∼ q(z^r) and from the desired prior z̃ ∼ p(z^r).
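The pixel-wise binary cross-entropy H of equation 4 translates directly into numpy. This is a sketch: the small epsilon clamp for numerical stability is our addition, not specified by the paper.

```python
import numpy as np

def bce(u_hat, u, eps=1e-7):
    """Pixel-wise binary cross-entropy H(u_hat, u) of equation 4,
    averaged over all pixels p in all channels of the image u."""
    u_hat = np.clip(u_hat, eps, 1 - eps)  # stability guard (our addition)
    return -np.mean(u * np.log(u_hat) + (1 - u) * np.log(1 - u_hat))

u = np.array([[0.0, 1.0], [1.0, 0.0]])      # a tiny "image"
u_hat = np.array([[0.1, 0.9], [0.8, 0.2]])  # its reconstruction
loss = bce(u_hat, u)
```

The same function serves for both the reference loss (equation 3) and the conditional reconstruction loss (equation 13), since the paper uses H for all images regardless of content.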
We refer to z as observed samples and z̃ as generated samples, and use the subscripts obs and gen to indicate these labels. Trained on these samples, EncD^r outputs a continuous estimate of the source distribution, v̂_{EncD^r} ∈ (0, 1). The objective function for the encoding discriminator is thus to minimize the binary cross-entropy between the true labels v and the estimated labels v̂ for generated and observed images:

L_{EncD^r} = H(v̂_{z^r,gen}, v_{z^r,gen}) + H(v̂_{z^r,obs}, v_{z^r,obs})    (5)

2.1.3. Decoding Discriminator

The final component of the autoencoder for cell and nuclear shape is an additional adversarial network DecD^r, the decoding discriminator, which operates on the output of the decoder to ensure that the decoded images are representative of the data distribution, similar to (Larsen et al., 2015). We train DecD^r on images from the data distribution, x^r_obs ∼ X^r, which we refer to as observed images, and on decoded draws from the latent space, x^r_gen ∼ Dec^r(z̃^r), which we refer to as generated images. The loss function for the decoding discriminator is then:

L_{DecD^r} = H(v̂_{x^r,gen}, v_{x^r,gen}) + H(v̂_{x^r,obs}, v_{x^r,obs})    (6)

2.2. Conditional model of structure localization

Given a trained model of cell and nuclear shape variation from the above network component, we then train a conditional model of structure localization upon the learned cell and nuclear shape model. This model (figure 1, lower half) consists of several parts, similar to those above: the core is a tandem encoder Enc^{r,s} and decoder Dec^{r,s} that encode and decode images to and from a low-dimensional latent space; in addition, a discriminative decoder EncD^s regularizes the latent space, and a discriminative decoder DecD^{r,s} ensures that the decoded images are similar to the input distribution.

2.2.1. Conditional Encoder

The encoder Enc^{r,s} is given images containing both the reference structures and the structure of protein localization, x^{r,s}, and produces three outputs:

ẑ^r, ŷ, z^s = Enc^{r,s}(x^{r,s}) = q(ẑ^r, ŷ, z^s | x^{r,s})    (7)

Here ẑ^r is the reconstructed cell and nuclear shape latent-space representation learned in Section 2.1, ŷ is an estimate of which structure channel was labeled, and z^s is a latent variable that encodes all remaining variation in image content not due to cell/nuclear shape and structure channel. Therefore z^s is learned dependent on the latent space embeddings of the reference structure, z^r.

The loss function for the reconstruction of the latent space embedding of the cell and nuclear shape is the mean squared error between the embedding z^r learned from the cell and nuclear shape autoencoder and the estimate ẑ^r of that embedding produced by the conditional portion of the model:

L_{ẑ^r} = MSE(z^r, ẑ^r) = (1/n) ‖z^r − ẑ^r‖²    (8)

The output ŷ in equation 7 is a probability distribution over structure channels, giving an estimate of the class label for the structure. In our notation, y is an integer value representing the true structure channel, taking a value 1 … K, while y is the one-hot encoding of that label, a vector of length K equal to 1 at the yth position and 0 otherwise. Similarly, ŷ is a vector of length K whose kth element represents the probability of assigning the label y = k. We use the softmax function to assign these probabilities. In general, the log-softmax function is given by

LogSoftMax(u, i) = log( e^{u_i} / Σ_j e^{u_j} )    (9)

and the loss function for ŷ is then

L_y = −LogSoftMax(ŷ, y)    (10)

The final output of the conditional encoder, z^s, can be interpreted as a variable that encodes the variation in the localization of the labeled structure independent of cell and nuclear shape.

2.2.2. Encoding Discriminator
The latent variable z^s is similarly regularized by an adversary EncD^s that enforces that the distribution of this latent variable be similar to a chosen prior p(z^s). The loss function for the adversary takes the same form as equation 5:

L_{EncD^s} = H(v̂_{z^s,gen}, v_{z^s,gen}) + H(v̂_{z^s,obs}, v_{z^s,obs})    (11)

2.2.3. Conditional Decoder

The conditional decoder Dec^{r,s} outputs the image reconstruction given the latent space embedding ẑ^r, the class estimate ŷ, and the structure channel variation z^s:

x̂^{r,s} = Dec^{r,s}(ẑ^r, ŷ, z^s) = g(x^{r,s} | ẑ^r, ŷ, z^s)    (12)

The loss function for image reconstruction takes the same form as equation 3, the binary cross-entropy between the input and reconstructed image:

L_{x^{r,s}} = H(x̂^{r,s}, x^{r,s})    (13)

2.2.4. Decoding Discriminator

As in the cell and nuclear shape model, attached to the decoder Dec^{r,s} is an adversary DecD^{r,s} intended to enforce that the reconstructed images are similar in distribution to the input images. The output of this discriminator is a vector ŷ_{DecD^{r,s}} that has |y| + 1 = K + 1 output labels, which take a value in [1, …, K, gen]. That is, ŷ_{DecD^{r,s}} has one slot for real images of each particular labeled structure channel, and one additional slot for reconstructed (i.e., generated) images of all channels. The loss function is therefore

L_{DecD^{r,s}} = −LogSoftMax(ŷ_{DecD^{r,s}}, y)    (14)

2.3. Training procedure

The training procedure occurs in two phases. We first train the model of cell and nuclear shape variation – components Enc^r, Dec^r, EncD^r, DecD^r – to convergence (algorithm 1). We then train the conditional model – components Enc^{r,s}, Dec^{r,s}, EncD^s, DecD^{r,s} (algorithm 2).

In training the model, we adopt three strategies from (Larsen et al., 2015): we limit error signals to relevant networks by propagating the gradient update from any DecD through only Dec; we update decoders with respect to adversarial discrimination of both generated and reconstructed images; and we weight the gradient updates from the discriminators with the scalars γ_Enc and γ_Dec. The parameters are therefore updated as follows:

θ_{Enc^r} ← θ_{Enc^r} + ∇_{θ_{Enc^r}} (L_{x^r} + γ_Enc L_{EncD^r})    (15)
θ_{Dec^r} ← θ_{Dec^r} + ∇_{θ_{Dec^r}} (L_{x^r} + γ_Dec L_{DecD^r})    (16)
θ_{Enc^{r,s}} ← θ_{Enc^{r,s}} + ∇_{θ_{Enc^{r,s}}} (L_{x^{r,s}} + L_{ẑ^r} + L_y + γ_Enc L_{EncD^s})    (17)
θ_{Dec^{r,s}} ← θ_{Dec^{r,s}} + ∇_{θ_{Dec^{r,s}}} (L_{x^{r,s}} + γ_Dec L_{DecD^{r,s}})    (18)

Algorithm 1: Training procedure for the reference structure model

θ_{Enc^r}, θ_{Dec^r}, θ_{EncD^r}, θ_{DecD^r} ← initialize network parameters
repeat
    X^r ← random mini-batch from reference set
    Z^r ← Enc^r(X^r)
    X̂^r ← Dec^r(Z^r)
    V̂_{EncD^r,gen} ← EncD^r(Z̃^r)
    V̂_{EncD^r,obs} ← EncD^r(Z^r)
    V̂_{DecD^r,obs} ← DecD^r(X^r)
    V̂_{DecD^r,gen} ← DecD^r(Dec^r(Z̃^r))
    L_{DecD^r} ← H(V̂_{DecD^r,obs}, V_obs) + H(V̂_{DecD^r,gen}, V_gen)
    θ_{DecD^r} ← θ_{DecD^r} + ∇_{θ_{DecD^r}} L_{DecD^r}
    L_{EncD^r} ← H(V̂_{EncD^r,gen}, V_gen) + H(V̂_{EncD^r,obs}, V_obs)
    θ_{EncD^r} ← θ_{EncD^r} + ∇_{θ_{EncD^r}} L_{EncD^r}
    L_{X̂^r} ← H(X̂^r, X^r)
    L_{EncD^r} ← H(V̂_{EncD^r,obs}, V_gen)
    L_{DecD^r} ← H(V̂_{DecD^r,gen}, V_obs) + H(DecD^r(X̂^r), V_obs)
    θ_{Enc^r} ← θ_{Enc^r} + ∇_{θ_{Enc^r}} (L_{X̂^r} + γ_Enc L_{EncD^r})
    θ_{Dec^r} ← θ_{Dec^r} + ∇_{θ_{Dec^r}} (L_{X̂^r} + γ_Dec L_{DecD^r})
until convergence

2.4. Integrative Modeling

Beyond encoding and decoding images, we are able to leverage the conditional model of structure localization given cell and nuclear shape as a tool to predict the localization of unobserved structures, p(x^s | x^r, y). In particular, we use the maximum-likelihood structure localization given the cell and nuclear channels. The procedure for predicting this localization is shown in algorithm 3.

3. Results

3.1. Data Set
For the experiments presented here, we use a collection of 2D segmented cell images generated from maximum-intensity projections of a 3D confocal microscopy data set of human induced pluripotent stem cells gene-edited to express mEGFP on proteins that localize to specific structures: α-actinin (actin bundles), α-tubulin (microtubules), β-actin (actin filaments), desmoplakin (desmosomes), fibrillarin (nucleolus), lamin B1 (nuclear membrane), myosin IIB (actomyosin bundles), Sec61β (endoplasmic reticulum), TOM20 (mitochondria), and ZO1 (tight junctions). Details of the source image collection are available via the Allen Cell Explorer at http://allencell.org. Briefly, each image consists of channels corresponding to the nuclear signal, cell membrane signal, and a labeled subcellular structure of interest (see figure 2). Individual cells were segmented, and each channel was processed by subtracting the most populous pixel intensity, zeroing out negative-valued pixels, rescaling image intensity between 0 and 1, and max-projecting the 3D image along the height dimension. The cells were aligned by the major axis of the cell shape, centered according to the center of mass of the segmented nuclear region, and flipped according to image skew. Each of the 6077 cell images was rescaled to 0.317 µm/px and padded to 256 × 256 pixels. The model took approximately 16 hours to train on one Pascal Titan X GPU.

Algorithm 2: Training procedure for the conditional relationship model

θ_{Enc^{r,s}}, θ_{Dec^{r,s}}, θ_{EncD^s}, θ_{DecD^{r,s}} ← initialize network parameters
repeat
    X^{r,s}, Y, Z^r ← random mini-batch from reference and structure set
    Ẑ^r, Ŷ, Z^s ← Enc^{r,s}(X^{r,s})
    X̂^{r,s} ← Dec^{r,s}(Ẑ^r, Ŷ, Z^s)
    V̂_{EncD^s,gen} ← EncD^s(Z̃^s)
    V̂_{EncD^s,obs} ← EncD^s(Z^s)
    Ŷ_obs ← DecD^{r,s}(X^{r,s})
    Ŷ_gen ← DecD^{r,s}(Dec^{r,s}(Ẑ^r, Ŷ, Z̃^s))
    L_{EncD^s} ← H(V̂_{EncD^s,gen}, V_gen) + H(V̂_{EncD^s,obs}, V_obs)
    θ_{EncD^s} ← θ_{EncD^s} + ∇_{θ_{EncD^s}} L_{EncD^s}
    L_{DecD^{r,s}} ← −LogSoftMax(Ŷ_obs, Y) − LogSoftMax(Ŷ_gen, Y_gen)
    θ_{DecD^{r,s}} ← θ_{DecD^{r,s}} + ∇_{θ_{DecD^{r,s}}} L_{DecD^{r,s}}
    L_{X̂^{r,s}} ← H(X̂^{r,s}, X^{r,s})
    L_Y ← −LogSoftMax(Ŷ, Y)
    L_{Ẑ^r} ← MSE(Ẑ^r, Z^r)
    L_{EncD^s} ← H(V̂_{EncD^s,obs}, V_gen)
    L_{DecD^{r,s}} ← −LogSoftMax(Ŷ_gen, Y) − LogSoftMax(DecD^{r,s}(X̂^{r,s}), Y)
    θ_{Enc^{r,s}} ← θ_{Enc^{r,s}} + ∇_{θ_{Enc^{r,s}}} (L_{X̂^{r,s}} + L_Y + L_{Ẑ^r} + γ_Enc L_{EncD^s})
    θ_{Dec^{r,s}} ← θ_{Dec^{r,s}} + ∇_{θ_{Dec^{r,s}}} (L_{X̂^{r,s}} + γ_Dec L_{DecD^{r,s}})
until convergence

Algorithm 3: Structure integration procedure

trained Enc^r and Dec^{r,s}
x^r ← reference structure image
z^r ← Enc^r(x^r)
for each structure in structures do
    y ← structure
    z^s ← argmax_{z^s} p(z^s)
    x̂^{r,s} ← Dec^{r,s}(z^r, y, z^s)
    append x̂^s to x_out
end for

Figure 2: Example images for each of the 10 labeled structures of focus in this paper. Rows correspond to observed microscopy images, used as inputs to the model, for six arbitrary cells, each with a particular fluorescently labeled structure as named, shown in yellow. The reference structures, the cell membrane and nucleus (DNA), are shown in magenta and cyan, respectively. Images have been cropped for visualization purposes. See figure S6a for the isolated observed structure channel only.

3.2. Model implementation

A summary of the model architectures is given in Section B. We based the architectures and their implementations on a combination of resources, primarily (Larsen et al., 2015; Makhzani et al., 2015; Radford et al., 2015) and Kai Arulkumaran's Autoencoders package (Arulkumaran, 2017). We found that adding white noise to the first layer of the decoder adversaries, DecD^r and DecD^{r,s}, stabilizes the relationship between the adversary and the autoencoder and improves convergence, as in (Sønderby et al., 2016) and (Salimans et al., 2016). We chose a sixteen-dimensional latent space for both Z^r and Z^s.

3.3. Training

To train the model, we used the Adam optimizer (Kingma & Ba, 2014) to perform gradient descent, with a batch size of 32 and a learning rate of 0.0002 for all model components (Enc^r, Dec^r, EncD^r, DecD^r, Enc^{r,s}, Dec^{r,s}, EncD^s, DecD^{r,s}), with γ_Enc and γ_Dec values of 10^-4 and 10^-5, respectively. The dimensionality of the latent spaces Z^r and Z^s was set to 16, and the prior distribution for both is an isotropic Gaussian. We split the data set into 95% training and 5% test (for more details see table S8), and trained the model of cell and nuclear shape for 150 epochs and the conditional model for 220 epochs. The model was implemented in Torch7 (Collobert et al., 2011) and run on an Nvidia Pascal Titan X. Further details of our implementation can be found in the software repository. The training curves for the reference and conditional models are shown in figure S3.

3.4. Experiments

We performed a variety of "experiments" exploring the utility of our model architecture. While quantitative assessment is paramount, the nature of the data makes qualitative assessment indispensable as well, and we include experiments of this type in addition to more traditional measures of performance.

3.4.1. Image reconstruction

A necessary but not sufficient condition for our model to be of use is that the images of cells reconstructed from their latent-space representations bear some resemblance to the native images. Examples of image reconstruction from the training and test sets are shown in figure S1 for our reference structures and figure S2 for the structure localization model.
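The class-probability output ŷ described in Section 2.2.1 (equations 9 and 10) reduces to a few lines of numpy. This is an illustrative sketch, not the authors' Torch7 code; the max-subtraction trick for numerical stability is our own standard choice.

```python
import numpy as np

def log_softmax(u):
    """LogSoftMax of equation 9, computed with the usual
    max-subtraction trick for numerical stability."""
    u = u - np.max(u)
    return u - np.log(np.sum(np.exp(u)))

def class_loss(logits, y):
    """L_y = -LogSoftMax(y_hat, y) of equation 10; y is 1-indexed."""
    return -log_softmax(logits)[y - 1]

logits = np.array([2.0, 1.0, 0.1])  # toy unnormalized scores for K = 3 classes
loss = class_loss(logits, 1)        # loss if the true structure label is y = 1
```

Exponentiating the log-softmax recovers a probability vector over structure channels, which is how the classifier of Section 3.4.3 assigns predicted labels.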
As seen in the figures, the model is able to recapitulate the essential localization patterns in the cells and produce accurate reconstructions in both the training and test data.

3.4.2. Latent space representation

We explored the generative capacity of our model by mapping out the variation in cell morphology due to traversal of the latent space. Since the latent spaces in our model are sixteen-dimensional and isotropic, dimensionality-reduction techniques are of little value, and we resorted to mapping 2D slices of the space. To demonstrate that this variation is smooth, we plot the first two dimensions of the latent space for cell and nuclear shape variation in figure S4. The first two dimensions of the latent space for structure variation are shown in figure S5. In both figures, the orthogonal dimensions are set to their MLE value of zero.

3.4.3. Image Classification

While classification is not our primary use case, it is a worthwhile benchmark of a well-functioning multi-class generative model. To evaluate the performance of the class-label identification of Enc^{r,s}, we compared the predicted labels and the true labels on our hold-out set. A summary of the results of our multinomial classification task is shown in table S9. As seen in the table, our model is able to accurately classify most structures, and has trouble only on the poorly sampled or underrepresented classes.

3.4.4. Integrating Cell Images

Conditional upon the cell and nuclear shape, we predict the most likely position of any particular structure via algorithm 3. Some examples of the maximum-likelihood estimate of structure localization given cell and nuclear shapes are shown in figure 3.

4. Discussion

Building models that capture relationships between the morphology and organization of cell structures is a difficult problem.
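The maximum-likelihood integration of Section 3.4.4 (algorithm 3) is simple under the isotropic Gaussian prior: argmax over z^s of p(z^s) is just the zero vector. The sketch below mirrors the algorithm's loop; `toy_decoder` is a hypothetical stand-in for the trained Dec^{r,s}, and its output shape is arbitrary, chosen only for illustration.

```python
import numpy as np

LATENT_DIM = 16  # dimensionality of z^r and z^s used in the paper
K = 10           # number of structure classes

def toy_decoder(z_r, y_vec, z_s):
    """Hypothetical stand-in for the trained Dec^{r,s}: returns a fake
    'structure channel' whose content depends on the class label."""
    rng = np.random.default_rng(int(np.argmax(y_vec)))
    return rng.random((1, 64, 64))

def integrate_cell(z_r, structures=range(1, K + 1)):
    """Algorithm 3: for each structure y, decode the maximum-likelihood
    localization; for an isotropic Gaussian prior, argmax_{z_s} p(z_s) = 0."""
    z_s = np.zeros(LATENT_DIM)  # mode of the isotropic Gaussian prior
    x_out = []
    for y in structures:
        y_vec = np.zeros(K)     # one-hot class label for structure y
        y_vec[y - 1] = 1.0
        x_out.append(toy_decoder(z_r, y_vec, z_s))
    return x_out

channels = integrate_cell(np.zeros(LATENT_DIM))  # one predicted channel per structure
```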
Figure 3: Most probable localization patterns predicted for selected cells for each structure (rows, top to bottom, structure as labeled, shown in yellow). The first 5 columns show the maximum likelihood of localization for each structure, given the cell and nuclear shape. The last column (far right) shows an experimentally observed cell with that labeled structure for comparison. As before, the reference structures, cell membrane and nucleus (DNA), are in magenta and cyan, respectively. Images have been cropped for visualization purposes. Note, for example, how fibrillarin resides within the DNA, and lamin B1 surrounds the DNA. See figure S6b for the structure channel only.

While previous research has focused on constructing application-specific parametric approaches, due to the extreme variation in localization among different structures, these approaches may not be convenient to employ for all structures under all conditions. Here, we have presented a nonparametric conditional model of structure organization that generalizes well to a wide variety of localization patterns, encodes the variation in cell structure and organization, allows for a probabilistic interpretation of the image distribution, and generates high-quality synthetic images.

Our model of cell and subcellular structure differs from previous generative models (Zhao & Murphy, 2007; Peng & Murphy, 2011; Johnson et al., 2015): we directly model the localization of fluorescent labels, rather than detected objects and their boundaries. While object segmentation can be essential in certain contexts, and helpful in others, when these approaches are not necessary it can be advantageous to omit these non-trivial intermediate steps. Our model does not constitute a "cytometric" approach (i.e. counting objects), but because we directly model the localization of signal, we drastically reduce the modeling time by minimizing the amount of segmentation and the task of evaluating this segmentation with respect to the "ground truth".

Even considering these differences, our model is compatible with existing frameworks and will allow for mixed parametric and non-parametric localization relationships, where our model can be used for predicting the localization of structures when an appropriate parametric representation may not exist.

Our model permits several straightforward extensions, including the obvious extension to modeling cells in three dimensions. Because of the flexibility of our latent-space representation, we can potentially encode information such as position in the cell cycle, or along a differentiation pathway. Given sufficient information, it would be possible to encode a representation of "structure space" to predict the localization of unobserved structures, or of "perturbation space", as in (Paolini et al., 2006), and potentially couple this with active-learning approaches (Naik et al., 2016) to build models that learn and encode the localization of diverse subcellular structures under different conditions.

Software and Data

The code for running the models used in this work is available at https://github.com/AllenCellModeling/torch_integrated_cell. The data used to train the model is available at s3://aics.integrated.cell.arxiv.paper.data.

Acknowledgements

We would like to thank Robert F. Murphy, Julie Theriot, Rick Horwitz, Graham Johnson, Forrest Collman, Sharmishtaa Seshamani, and Fuhui Long for their helpful comments, suggestions, and support in the preparation of the manuscript.
Further more, we w ould like to thank all members of the Allen Institute f or Cell Science team, who g enerated and characterized the gene-edited cell lines, dev eloped image- based assay s, and recorded the high replicate data sets suit- able f or modeling. W e par ticularl y thank Liya Ding f or segmentation data. These contributions were absolutely critical f or model dev elopment. W e w ould like to thank Paul G. Allen, f ounder of the Allen Institute for Cell Science, for his vision, encouragement and support. A uthor Contributions GRJ conceiv ed, designed and implemented all experiments. GRJ, RMD, and MMM wrote the paper . Building the Integrated Cell Ref erences Arulk umaran, Kai. A utoencoders, 2017. URL https: //github.com/Kaixhin/Autoencoders . Boland, Michael V and Mur ph y , R ober t F . A neural netw ork classifier capable of recognizing the patter ns of all major subcellular str uctures in fluorescence microscope images of hela cells. Bioinformatics , 17(12):1213–1223, 2001. Carpenter, Anne E, Jones, Thouis R, Lamprecht, Michael R, Clarke, Colin, Kang, In Han, Fr iman, Ola, Guer tin, Da vid A, Chang, Joo Han, Lindquis t, R ober t A, Moffat, Jason, Golland, Polina, and Sabatini, David M. CellPro- filer: imag e analy sis software f or identifying and quan- tifying cell phenotypes. Genome biology , 7(10):R100, 2006. Collobert, R onan, Kavukcuoglu, Kora y , and Farabet, ClÃľ- ment. T orch7: A matlab-like en vironment f or machine learning, 2011. Dono van, R or y M, T apia, Jose-Juan, Sullivan, De vin P , F aeder, James R, Mur ph y , Robert F , Dittrich, Markus, and Zuck er man, Daniel M. Unbiased rare ev ent sam- pling in spatial stochas tic sys tems biology models using a w eighted ensemble of tra jector ies. PLoS computational biology , 12(2):e1004611, 2016. Goodf ellow , Ian J, P ouget- Abadie, Jean, Mirza, Mehdi, X u, Bing, W arde-Far ley , David, Ozair, Sherjil, Cour ville, Aaron, and Bengio, Y oshua. Generativ e A dversarial Net- w orks. 
arXiv.org, June 2014.

Johnson, G R, Buck, T E, Sullivan, D P, Rohde, G K, and Murphy, R F. Joint modeling of cell and nuclear shape variation. Molecular Biology of the Cell, 26(22):4046–4056, November 2015.

Kim, Min-Sik, Pinto, Sneha M, Getnet, Derese, Nirujogi, Raja Sekhar, Manda, Srikanth S, Chaerkady, Raghothama, Madugundu, Anil K, Kelkar, Dhanashree S, Isserlin, Ruth, Jain, Shobhit, et al. A draft map of the human proteome. Nature, 509(7502):575–581, 2014.

Kingma, Diederik P and Ba, Jimmy. Adam: A Method for Stochastic Optimization. arXiv.org, December 2014.

Larsen, Anders Boesen Lindbo, Sønderby, Søren Kaae, Larochelle, Hugo, and Winther, Ole. Autoencoding beyond pixels using a learned similarity metric. arXiv.org, December 2015.

Makhzani, Alireza, Shlens, Jonathon, Jaitly, Navdeep, Goodfellow, Ian, and Frey, Brendan. Adversarial Autoencoders. arXiv.org, November 2015.

Murphy, R F. Location proteomics: a systems approach to subcellular location. Biochemical Society Transactions, 33(Pt 3):535–538, June 2005.

Naik, Armaghan W, Kangas, Joshua D, Sullivan, Devin P, and Murphy, Robert F. Active machine learning-driven experimentation to determine compound effects on protein patterns. eLife, 5:e10047, February 2016.

Paolini, Gaia V, Shapland, Richard H B, van Hoorn, Willem P, Mason, Jonathan S, and Hopkins, Andrew L. Global mapping of pharmacological space. Nature Biotechnology, 24(7):805–815, July 2006.

Peng, Tao and Murphy, Robert F. Image-derived, three-dimensional generative models of cellular organization. Cytometry Part A, 79A(5):383–391, April 2011.

Radford, Alec, Metz, Luke, and Chintala, Soumith. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv.org, November 2015.

Rajaram, Satwik, Pavie, Benjamin, Wu, Lani F, and Altschuler, Steven J. PhenoRipper: software for rapidly profiling microscopy images.
Nature Methods, 9(7):635–637, June 2012.

Salimans, Tim, Goodfellow, Ian, Zaremba, Wojciech, Cheung, Vicki, Radford, Alec, and Chen, Xi. Improved Techniques for Training GANs. arXiv.org, June 2016.

Sønderby, Casper Kaae, Caballero, Jose, Theis, Lucas, Shi, Wenzhe, and Huszár, Ferenc. Amortised MAP Inference for Image Super-resolution. arXiv.org, October 2016.

Zhao, Ting and Murphy, Robert F. Automated learning of generative models for subcellular location: Building blocks for systems biology. Cytometry Part A, 71A(12):978–990, 2007.

A. Supplementary Figures

Figure S1: Image input (rows 1 and 3) and reconstruction (rows 2 and 4) from the reference model, showing training set (top two rows) and test set (bottom two rows).

Figure S2: Image input (rows 1 and 3) and reconstruction (rows 2 and 4) from the structure model, showing training set (top two rows) and test set (bottom two rows).

Figure S3: Training curves for the training of the reference model (a) and conditional model (b).

Figure S4: (a) shows the first two dimensions of the reference latent space Z_r. (b) shows the first two dimensions of the latent space sampled at -3, -1.5, 0, 1.5 and 3 standard deviations in Z_r1 (horizontal) and Z_r2 (vertical). Images have been cropped for visualization purposes.

Figure S5: (a) shows the first two dimensions of the structure latent space Z_s. (b) shows the first two dimensions of the TOM20 latent space sampled at -3, -1.5, 0, 1.5 and 3 standard deviations in Z_s1 (horizontal) and Z_s2 (vertical). Images have been cropped for visualization purposes.
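The latent-space sweeps shown in Figures S4 and S5 can be described as holding all latent dimensions at zero while varying two of them over a fixed grid of standard deviations, then decoding each grid point. A minimal sketch of building that grid (the latent dimensionality of 16 is an assumption, not stated in the captions; the trained decoder would then render each vector as one panel):

```python
# Values, in standard deviations, swept along each of the two axes
# (-3, -1.5, 0, 1.5, 3, as in Figures S4 and S5).
SWEEP = [-3.0, -1.5, 0.0, 1.5, 3.0]

def latent_grid(n_dims, dim_a=0, dim_b=1, sweep=SWEEP):
    """Build the 5x5 grid of latent vectors for a two-dimensional sweep:
    every dimension is zero except the two being varied."""
    grid = []
    for va in sweep:        # horizontal axis, e.g. Z_r1
        for vb in sweep:    # vertical axis, e.g. Z_r2
            z = [0.0] * n_dims
            z[dim_a], z[dim_b] = va, vb
            grid.append(z)
    return grid

zs = latent_grid(n_dims=16)  # 16 latent dimensions is an assumption
# Each z in zs would be passed to the trained decoder to produce one image.
```

This yields 25 latent vectors, one per panel of the 5x5 figure grid.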
Figure S6: (a) Example structure channels for each of the 10 labeled structures in this paper and (b) predicted most probable localization patterns for selected cells from each labeled pattern. The first 5 columns show the maximum likelihood localization for the corresponding structures given the same cell and nuclear shape. The last column shows an observed cell with that labeled structure. Rows correspond to structure types. Images have been cropped for visualization purposes.

B. Model Architectures

Table S1: Architecture of Enc_r
4x4 64 conv↓, BNorm, PReLU
4x4 128 conv↓, BNorm, PReLU
4x4 256 conv↓, BNorm, PReLU
4x4 512 conv↓, BNorm, PReLU
4x4 1024 conv↓, BNorm, PReLU
4x4 1024 conv↓, BNorm, PReLU
|Z_r| FC, BNorm

Table S2: Architecture of Dec_r
1024 FC, BNorm, PReLU
4x4 1024 conv↑, BNorm, PReLU
4x4 512 conv↑, BNorm, PReLU
4x4 256 conv↑, BNorm, PReLU
4x4 128 conv↑, BNorm, PReLU
4x4 64 conv↑, BNorm, PReLU
4x4 |r| conv↑, BNorm, sigmoid

Table S3: Architecture of EncD_r and EncD_s
1024 FC, LeakyReLU
1024 FC, BNorm, LeakyReLU
512 FC, BNorm, LeakyReLU
1 FC, sigmoid

Table S4: Architecture of DecD_r
White Noise σ = 0.05
4x4 64 conv↓, BNorm, LeakyReLU
4x4 128 conv↓, BNorm, LeakyReLU
4x4 256 conv↓, BNorm, LeakyReLU
4x4 512 conv↓, BNorm, LeakyReLU
4x4 512 conv↓, BNorm, LeakyReLU
4x4 1 conv↓, sigmoid

Table S5: Architecture of Enc_{r,s}
4x4 64 conv↓, BNorm, PReLU
4x4 128 conv↓, BNorm, PReLU
4x4 256 conv↓, BNorm, PReLU
4x4 512 conv↓, BNorm, PReLU
4x4 1024 conv↓, BNorm, PReLU
4x4 1024 conv↓, BNorm, PReLU
{K FC, |Z_r| FC, |Z_s| FC}, {BNorm, BNorm, BNorm}, {Softmax, –, –}

Table S6: Architecture of Dec_{r,s}
1024 FC, BNorm, PReLU
4x4 1024 conv↑, BNorm, PReLU
4x4 512 conv↑, BNorm, PReLU
4x4 256 conv↑, BNorm, PReLU
4x4 128 conv↑, BNorm, PReLU
4x4 64 conv↑, BNorm, PReLU
4x4 |r+s| conv↑, BNorm, sigmoid

Table S7: Architecture of DecD_{r,s}
White Noise σ = 0.05
4x4 64 conv↓, BNorm, LeakyReLU
4x4 128 conv↓, BNorm, LeakyReLU
4x4 256 conv↓, BNorm, LeakyReLU
4x4 512 conv↓, BNorm, LeakyReLU
4x4 512 conv↓, BNorm, LeakyReLU
4x4 K+1 conv↓, sigmoid

C. Data

Table S8: Labeled structures and their train/test split

Labeled Structure   #total  #train  #test
α-actinin              493     462     31
α-tubulin             1043    1002     41
β-actin                542     513     29
Desmoplakin            229     219     10
Fibrillarin            988     953     35
Lamin B1               785     739     46
Myosin IIB             157     149      8
Sec61β                 835     784     51
TOM20                  771     723     48
ZO1                    234     229      5

Table S9: Labeled structure class prediction results on hold out (rows are true labels; columns are predicted labels, in the same structure order)

α-actinin     22   2   5   1   0   0   0   0   0   1
α-tubulin      0  36   3   0   0   0   0   1   1   0
β-actin        3   7  19   0   0   0   0   0   0   0
Desmoplakin    1   0   1   7   0   0   0   0   0   1
Fibrillarin    0   0   0   0  35   0   0   0   0   0
Lamin B1       0   0   0   0   0  46   0   0   0   0
Myosin IIB     2   0   0   1   0   0   0   0   1   4
Sec61β         0   1   0   0   0   0   0  50   0   0
TOM20          0   1   0   0   0   0   0   0  47   0
ZO1            1   0   0   1   0   0   2   0   0   1
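The downsampling stacks in Tables S1, S4, S5, and S7 each consist of 4x4 convolutions that repeatedly reduce spatial resolution. As a sketch of how the spatial size evolves through the six conv-down layers of Enc_r (Table S1) — stride 2 and padding 1 are assumptions, as is the 256x256 input resolution, since none of these are stated in the tables:

```python
# Output spatial size of a convolution:
#   floor((n + 2*pad - kernel) / stride) + 1
def conv_out(n, kernel=4, stride=2, pad=1):
    return (n + 2 * pad - kernel) // stride + 1

# Channel widths of the six conv-down layers in Enc_r (Table S1).
channels = [64, 128, 256, 512, 1024, 1024]

size = 256  # assumed input resolution
sizes = []
for c in channels:
    size = conv_out(size)
    sizes.append(size)

# With these assumptions, each 4x4 stride-2 conv halves the spatial
# dimension: 256 -> 128 -> 64 -> 32 -> 16 -> 8 -> 4.
print(sizes)
```

Under these assumptions the stack reduces a 256x256 image to a 4x4 feature map before the fully connected projection to |Z_r|; the upsampling stacks of Tables S2 and S6 mirror this with conv↑ layers.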
