Improving the resolution of Cryo-EM single particle analysis

We present a new 3D refinement method for Cryo-EM single particle analysis that can improve the resolution of the final electron density map. We propose to enforce both sparsity and smoothness to improve the regularity of electron densi…

Authors: Zhenwei Luo

Zhenwei Luo
Rice University, Department of Bioengineering, Houston, TX
Zl24@rice.edu

Abstract. We present a new 3D refinement method for Cryo-EM single particle analysis that can improve the resolution of the final electron density map. We propose to enforce both sparsity and smoothness to improve the regularity of the electron density map during refinement. To achieve this goal, we design a novel type of real space penalty function and incorporate it into the refinement process. We bridge the backprojection step with local kernel regression, which enables us to embed the 3D model in a reproducing kernel Hilbert space using specific kernels. We also propose a first order method to solve the resulting optimization problem and implement it efficiently with CUDA. We compare the performance of our new method with that of the traditional method on real datasets using a set of widely used metrics for Cryo-EM model validation, and demonstrate that our method outperforms the traditional method in terms of those metrics. The implementation of our method can be found at https://github.com/alncat/cryoem.

Keywords: Cryo-EM, 3D reconstruction, ill-posed inverse problem, smoothness, sparsity, reproducing kernel Hilbert space, CUDA

1 Introduction

Cryo-electron microscopy (Cryo-EM) single particle analysis is becoming an increasingly popular method to visualize molecular structures. Cryo-EM has certain advantages over traditional X-ray crystallography, as it does not require crystallization and is not plagued by the phasing problem. However, this promising technique also raises many new challenges. The central problem of Cryo-EM single particle analysis is the incompleteness of experimental observations.
More specifically, the relative orientations and translations of all particles are missing, and the class memberships of the particles in a structurally heterogeneous dataset remain unknown. Moreover, the signal to noise ratio of a Cryo-EM dataset is often very low, since the electron exposure of the sample must be strictly limited to prevent radiation damage [1], [2]. Another problem often present in Cryo-EM datasets is nonuniform angular sampling, which results in inadequate samples, or even no samples, in certain orientations [3]. Therefore, building an ultrahigh dimensional 3D model from incomplete and highly noisy data is ill-posed. To alleviate this problem, prior assumptions must be incorporated into the building process to ensure the uniqueness of the solution and the objectivity of the final model.

Two prominent features of a 3D molecular model are sparsity and smoothness. Since the electron density of a molecule occupies only a small part of the 3D volume it resides in, molecular models are often sparse; and because the atoms in a molecule are connected through chemical bonds, the electron density varies smoothly across space. The importance of the smoothness prior is widely recognized in Cryo-EM based 3D model refinement. An early attempt to enforce the smoothness of the density map applied the Wiener filter [4]. Later approaches improved upon the Wiener filter by using Bayesian statistics. Scheres assumed that the Fourier components of the density map follow a Gaussian distribution [5]. He then obtained a closed form solution of the corresponding maximum a posteriori (MAP) problem for the Fourier components of the density map, which is similar to Wiener filtering, and employed this formula to update the 3D model in the maximization step. This approach is referred to as the traditional approach in the rest of this paper.
Though sparsity is a popular prior for solving inverse problems, it is a relatively novel notion in Cryo-EM based 3D model refinement. In this paper, we propose a new approach to regularize the 3D model which encourages both sparsity and smoothness. Our approach is inspired by many successful methods recently proposed for solving ill-posed inverse problems by imposing sparse and smooth priors, such as $\ell_1$ regularization, compressed sensing and total variation [6]–[9]. It is tempting to apply these state-of-the-art sparsity learning algorithms to the Cryo-EM based 3D model reconstruction problem. To encourage the sparseness and smoothness of the reconstructed electron density map while suppressing bias, we propose a nonconcave, nonsmooth restraint that combines lasso and total variation penalties. Since the nonconcave, nonsmooth target function is difficult to optimize directly, we design a reweighted scheme that approximately optimizes the target function through a sequence of weighted $\ell_1$ regularization and total variation problems. Each weighted problem can be solved by the smoothing proximal gradient method.

As revealed later in the Methods section, the traditional approach can be viewed as applying a translation invariant, rotationally symmetric kernel to the 3D model, whereas our new approach applies a spatially varying kernel. Real 3D models often present anisotropic smoothness, i.e., electron densities remain nearly constant along the directions of chemical bonds while they decay fast in the perpendicular directions. Therefore, our new approach can adapt to the spatially varying smoothness that exists in the 3D models of macromolecules and can outperform the traditional approach for certain classes of models.
Another challenge of 3D model reconstruction in real space is its ultrahigh dimensionality, which results in prohibitive computational cost. For example, a common … 3D volume represents hundreds of millions of variables. We address this challenge by implementing a CUDA accelerated solver for our method. Our solver can obtain a moderately accurate solution within 10 seconds on a problem of size … .

Our final modification to the Cryo-EM refinement process is bridging backprojection with local kernel regression, thus paving a novel way to promote the smoothness of the 3D model. We propose using a Gaussian kernel in the backprojection step, which is associated with a widely used reproducing kernel Hilbert space, that is, a Hilbert space endowed with some notion of smoothness.

In the remainder of this paper, we first describe the underlying theory of Cryo-EM structure determination to define the statistical framework on which we later work. We then propose a new type of restraint for the density map in real space which encodes the available prior information. Next, since there is no closed form solution for the new target function, we design an iterative method to compute a 3D model that maximizes it. Using tools from real analysis, we elucidate the theoretical difference between the traditional method and our new method. We also derive the relationship between backprojection and local kernel regression and propose a new choice of kernel. Finally, we demonstrate the effectiveness of our new method by applying it to real datasets and comparing the refined results with those obtained by the traditional approach. We observe improvements in both the gold standard FSC and the model-map FSC on these datasets, which suggests that our method is able to obtain reconstructed maps with higher resolution.
2 Methods

The Cryo-EM refinement problem can be formulated as fitting a generative model whose data generation process is described below. For simplicity, we assume that the particles come from a single structure; structural heterogeneity can be incorporated into the generative model by treating the class membership of each particle as a hidden variable [5]. The images collected in Cryo-EM experiments are 2D projections of a 3D molecular structure, and the Fourier transform of an image is related to the Fourier transform of the 3D structure as follows. Let the Fourier transform of a 3D molecular structure be $V$, arranged into a vector with $N^3$ elements. Assume an $N \times N$ image $X_i$ is formed by projecting the 3D volume with the Euler angle set $\phi$ and shifting by $t = [t_x, t_y]$ from the origin. By the projection-slice theorem, the Fourier transform of the image $X_i$ can be expressed as

$$X_{ij} = e^{-2\pi i (k_x t_x + k_y t_y)/N}\, \mathrm{CTF}_{ij}\, (P^{\phi} V)_j \qquad (1)$$

where $X_{ij}$ is the $j$-th component of the Fourier transform of the image $X_i$, whose corresponding 2D index is $[k_x, k_y]$, $\mathrm{CTF}_{ij}$ is the $j$-th component of the contrast transfer function for the image $X_i$ [10], [11], and $P^{\phi}$ is the slice operator, which takes out the plane in the 3D Fourier transform $V$ that is rotated from the $xy$ plane according to the Euler angle set $\phi$. Since images acquired in experiments are contaminated by noise, we account for the uncertainty of the experimental data with a probability distribution.
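The forward model in equation (1) rests on two Fourier identities that are easy to check numerically. The sketch below is only an illustration under simplifying assumptions that are not part of the paper: the projection is taken along the z axis (so no rotation or interpolation is needed), the CTF is omitted, and the shift is an integer number of pixels applied with periodic wrap-around. The function names are hypothetical.

```python
import numpy as np

def central_slice_z(volume):
    """k_z = 0 plane of the 3D DFT of the volume."""
    return np.fft.fftn(volume)[:, :, 0]

def projection_ft_z(volume):
    """2D DFT of the projection of the volume along the z axis."""
    return np.fft.fft2(volume.sum(axis=2))

rng = np.random.default_rng(0)
n = 8
vol = rng.standard_normal((n, n, n))

# Projection-slice theorem: the two quantities agree up to round-off.
proj_ft = projection_ft_z(vol)
slice_err = np.max(np.abs(central_slice_z(vol) - proj_ft))

# Shift theorem: shifting the image by (tx, ty) multiplies its DFT by a phase.
tx, ty = 2, 3
k = np.fft.fftfreq(n) * n                       # integer frequency indices
kx, ky = np.meshgrid(k, k, indexing="ij")
phase = np.exp(-2j * np.pi * (kx * tx + ky * ty) / n)
shifted = np.roll(vol.sum(axis=2), (tx, ty), axis=(0, 1))
shift_err = np.max(np.abs(np.fft.fft2(shifted) - phase * proj_ft))
```

For a general orientation the slice is no longer aligned with the sampling grid, which is why interpolation enters the backprojection step.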
Suppose the Fourier component $X_{ij}$ follows a Gaussian distribution with the mean defined in equation (1) and variance $\sigma_{ij}^2$, and the Gaussian noise of each component is independent. The marginal probability of observing the image $X_i$ is then obtained by integrating over all possible orientations $\phi$ and translations $t = [t_x, t_y]$:

$$P(X_i \mid V) = \int_{\phi, t} \left\{ \prod_{j} \frac{1}{2\pi\sigma_{ij}^2} \exp\left( -\frac{\left| X_{ij} - e^{-2\pi i (k_x t_x + k_y t_y)/N}\, \mathrm{CTF}_{ij}\, (P^{\phi} V)_j \right|^2}{2\sigma_{ij}^2} \right) \right\} d\phi\, dt \qquad (2)$$

In the following sections, we omit the translation factor in the squared difference term of equation (2) to simplify expressions. After obtaining the marginal probability of the data, we can reconstruct the 3D model by maximizing the total log likelihood function

$$\sum_i \log P(X_i \mid V) \qquad (3)$$

However, this log likelihood function fits the model to the experimental data without applying any prior, and thus often generates overfitted models in practice. Hence, we should add appropriate priors to the log likelihood function to guarantee the feasibility of the solution and reduce overfitting during refinement.

As mentioned in the introduction, 3D molecular models are both sparse and smooth. In order to incorporate these priors into refinement, a mathematical formulation for them must be developed. On the one hand, the smoothness of a function is commonly associated with the norm of its gradient. On the other hand, sparsity refers to the number of zeros among the values of the function [12]. In the following section, we formulate different smoothness priors and reveal their difference. The two key equations illustrating the effect of the previous smoothness restraint and of our new smoothness restraint are equation (7) and equation (19), respectively.
Considering only smoothness for now, let $x$ be the 3D model. We can promote the smoothness of the 3D model by maximizing the log likelihood while restraining the gradients of the solution, namely, by maximizing the following target function

$$\sum_i \log P(X_i \mid x) - \alpha \|\nabla x\|_p^p \qquad (4)$$

where $\alpha$ and $p$ are positive parameters. We start by demonstrating that the restraint on Fourier coefficients in the traditional method closely resembles the squared norm of the gradient. When $p = 2$, the norm of the gradient can be transformed into a weighted norm of Fourier coefficients using standard results from harmonic analysis. Since the Fourier transform of the derivative of a function equals the Fourier transform of the function multiplied by the corresponding frequency component, that is,

$$\mathcal{F}\!\left(\frac{\partial x}{\partial r_d}\right)_k = i k_d V_k,$$

Plancherel's theorem gives

$$\|\nabla x\|_2^2 = \sum_k \left(k_x^2 + k_y^2 + k_z^2\right) |V_k|^2 \qquad (5)$$

where $V_k$ represents the Fourier transform of the 3D model $x$.

The traditional method proposed in [13] enforces smoothness by applying a quadratic restraint on the magnitudes of the Fourier transforms, based on the assumption that they follow a Gaussian distribution. In the traditional method, the regularization parameter $\tau$ depends only on the radius of the Fourier shell, i.e., all Fourier coefficients $V_k$ with the same radius $|k| = \sqrt{k_x^2 + k_y^2 + k_z^2}$ are regularized by the same regularization parameter. More specifically, the regularization strength $\tau(k)$ is the product of the average of the factor $\mathrm{CTF}_{ij}^2/\sigma_{ij}^2$ appearing in equation (10) over that shell and a function $T$ of the inverse of the gold standard FSC value of that shell [14], that is,

$$\tau(|k|) = \left\langle \frac{\mathrm{CTF}^2}{\sigma^2} \right\rangle_{|k|} T(\mathrm{FSC})$$

where $\mathrm{FSC}$ is the value of the FSC of the corresponding shell. In fact, $\tau(|k|) \propto 1/\mathrm{SNR}(k)$, that is, the weight is the inverse of the signal to noise ratio at that shell according to [15], [16].
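The identity in equation (5) can be verified numerically on a small periodic volume. This is a minimal sketch, assuming a periodic grid, spectral derivatives, and NumPy's unnormalized FFT convention (which introduces the $1/n^3$ factor below); none of these choices come from the paper.

```python
import numpy as np

n = 9  # an odd size avoids the ambiguous Nyquist frequency
rng = np.random.default_rng(1)
x = rng.standard_normal((n, n, n))
V = np.fft.fftn(x)

k1d = 2.0 * np.pi * np.fft.fftfreq(n)           # angular frequency per sample
kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")

# Real space: spectral derivative along each axis, then sum of squares.
grad_sq = 0.0
for k_axis in (kx, ky, kz):
    dx = np.fft.ifftn(1j * k_axis * V)
    grad_sq += np.sum(np.abs(dx) ** 2)

# Fourier space: frequency-weighted sum of squared magnitudes.
# The unnormalized FFT requires the 1/n^3 factor in Parseval's identity.
fourier_sq = np.sum((kx**2 + ky**2 + kz**2) * np.abs(V) ** 2) / n**3
```

Both quantities depend on the Fourier coefficients only through a weight that is a function of the shell radius, which is the property shared with the traditional restraint.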
It becomes evident that the traditional method enforces smoothness in a way similar to restraining the squared norm of the gradient of the 3D model $x$, since both restraints are functions of the radius of the Fourier shell. However, the traditional method improves upon the simple squared gradient norm restraint by adopting a restraint strength derived from the gold standard FSC across different shells. To study the effect of the restraint of the traditional method in more detail, we examine how it acts in the optimization process. Using gradient ascent, the penalized log likelihood function in equation (4) can be maximized by updating the Fourier transform of the 3D model as

$$V_k' = V_k + \gamma \left( \frac{\partial}{\partial V_k} \sum_i \log P(X_i \mid V) - \tau(|k|)\, V_k \right) \qquad (6)$$

where $\gamma$ is the step size. We can understand the effect of the restraint by taking the inverse Fourier transform of the term $\tau(|k|) V_k$. Let the inverse Fourier transform of $\tau(|k|)$ be $K(|r|)$, which is a radial function. By the convolution theorem, the inverse Fourier transform of $\tau(|k|) V_k$ can be written as

$$x'(r) = \int K(|r - s|)\, x(s)\, ds \qquad (7)$$

Equation (7) states that the restraint in the traditional method acts as a translation invariant, rotationally symmetric kernel $K(|r - s|)$ during the optimization process. At each step, the old solution is updated by a linear combination of the gradient of the log marginal likelihood and the old solution smoothed by the radial kernel. We thus postulate that the traditional method is biased toward 3D models with homogeneous smoothness across space. We present our new priors in the remainder of this section.
Assume $x$ is the 3D volume rearranged into a vector, that is, a grid point with index $[i, j, k]$ is mapped to the $l$-th component $x_l$ of $x$, and let $F$ be the corresponding 3D Fourier transform matrix. The Fourier coefficients of the 3D volume $x$ can then be expressed as a matrix vector product, namely, $V = Fx$. By encoding the smoothness of the model with the total variation norm, i.e., setting $p = 1$ in equation (4), and the sparsity of the model with the $\ell_1$ norm, the log likelihood with priors takes the form

$$\sum_i \log P(X_i \mid x) - \alpha \|x\|_1 - \beta \|x\|_{TV} \qquad (8)$$

where $\|\cdot\|_1$ is the $\ell_1$ norm, $\|\cdot\|_{TV}$ is the total variation norm, which is the sum of the magnitudes of the gradient at each grid point, $\|x\|_{TV} = \sum_l \|\nabla x_l\|_2$, and $\alpha$ and $\beta$ are positive parameters. The gradient $\nabla x_l$ in the total variation norm can be calculated via a discrete approximation.

Though these two priors can effectively guarantee both sparsity and smoothness of the solution, they introduce certain biases into the final solution, as described in [17]. For example, the solution obtained with $\ell_1$ regularization tends to shrink the nonzero elements. Fan discovered that a nonconcave penalty can effectively prevent the true nonzero elements from shrinking excessively while preserving sparsity [17]. Hence, we consider employing a nonconcave penalty, the log norm advocated in [18], to reduce possible biases in the solution. Our log likelihood function with nonconcave priors can be written as

$$\sum_i \log P(X_i \mid x) - \sum_l \left( \alpha \log(|x_l| + \epsilon) + \beta \log(\|\nabla x_l\|_2 + \epsilon) \right) \qquad (9)$$

where $\epsilon$ is a small positive constant introduced to guard against the singularity of the log function around zero.

The remainder of this section is devoted to an algorithm for optimizing our likelihood function with the newly proposed priors. The total penalized likelihood function consists of three different terms, which can be tackled by different algorithms.
First of all, the log likelihood of the marginal probability in equation (3) can be optimized by the expectation-maximization method [19]. This algorithm works as follows. The difference between the log likelihoods of the marginal probability can be lower bounded by the difference between the log likelihoods of the joint probability weighted by the corresponding posterior probabilities of the latent variables, i.e.,

$$\log P(X_i \mid x) - \log P(X_i \mid x^{n-1}) \geq \int_\phi P(\phi \mid X_i, x^{n-1}) \left( \log P(X_i, \phi \mid x) - \log P(X_i, \phi \mid x^{n-1}) \right) d\phi,$$

so maximizing the lower bound improves the log likelihood of the marginal probability by at least as much [20]. At the expectation step, we calculate the posterior probability of the latent variables conditioned on a given image and a model. The method to compute the posterior probability derived in [5] can be applied in the context of our method without any modification. At the $n$-th maximization step, we maximize the log likelihoods of the marginal probabilities by replacing them with their lower bounds, as follows:

$$x^{n} = \arg\max_x \sum_i \int_\phi P(\phi \mid X_i, x^{n-1}) \sum_j -\frac{\left| X_{ij} - \mathrm{CTF}_{ij} (P^{\phi} F x)_j \right|^2}{2\sigma_{ij}^2}\, d\phi - \sum_l \left( \alpha \log(|x_l| + \epsilon) + \beta \log(\|\nabla x_l\|_2 + \epsilon) \right) \qquad (10)$$

where $P(\phi \mid X_i, x^{n-1})$ is the conditional probability of the latent variables given the observation $X_i$ and the model $x^{n-1}$ from the previous iteration, and $\alpha$ and $\beta$ are the weights of the $\ell_1$ norm and the total variation norm, respectively. The 3D model at the $n$-th step is thus the maximizer of equation (10).

The nonsmoothness and nonconcavity of the newly introduced penalties pose major challenges for optimizing the corresponding objective function. To address the nonconcavity of the log function, we use a sequence of weighted concave losses to approximate the nonconcave loss, as in [18], [21], [22].
More specifically, at iteration $m+1$, let the solution obtained in the previous iteration be $x^m$. We approximate equation (10) with the weighted $\ell_1$ and total variation norms and define the solution at this iteration as the maximizer of the following equation,

$$x^{m+1} = \arg\max_x \sum_i \int_\phi P(\phi \mid X_i, x^{n-1}) \sum_j -\frac{\left| X_{ij} - \mathrm{CTF}_{ij} (P^{\phi} F x)_j \right|^2}{2\sigma_{ij}^2}\, d\phi - \sum_l \left( \frac{\alpha |x_l|}{|x_l^m| + \epsilon} + \frac{\beta \|\nabla x_l\|_2}{\|\nabla x_l^m\|_2 + \epsilon} \right) \qquad (11)$$

These approximations are the tangent lines of each logarithm in the log norm at the previous solution $x^m$. Since the log function is concave, its tangent lines are upper bounds of it [18]. Because each tangent line touches the logarithm at $x^m$ and lies above it elsewhere, lowering the upper bound also lowers the log norm, so each iteration of this approximation scheme cannot decrease the penalized likelihood. The weighted norms have a debiasing effect similar to that of the log function, since they downweight the components with large values in the $\ell_1$ or TV norms; the scheme is in the same spirit as the adaptive lasso method proposed by Zou [21].

By decomposing the nonconcave problem into a series of concave optimization problems, we can leverage existing algorithms for concave optimization, such as gradient ascent, to solve each concave subproblem. However, our target function contains nonsmooth terms, namely, the total variation (TV) norm and the $\ell_1$ norm in equation (11). These terms are nondifferentiable around zero and prevent us from applying gradient ascent directly to maximize the penalized likelihood function. We therefore use algorithms for optimizing nonsmooth functions to deal with those terms. For the total variation norm, we propose to use Nesterov smoothing to obtain an approximate gradient [23], which allows us to circumvent the nondifferentiability of the TV norm and apply gradient ascent. To derive the gradient of the Nesterov smoothed TV norm, we begin by giving the discrete form of the TV norm.
For a point $x[i, j, k]$ of a 3D model $x$, the discrete approximation of the gradient of $x$ at this point is of the form

$$\nabla x[i, j, k] = \begin{bmatrix} \nabla_1 x[i, j, k] \\ \nabla_2 x[i, j, k] \\ \nabla_3 x[i, j, k] \end{bmatrix} = \begin{bmatrix} x[i, j, k] - x[i-1, j, k] \\ x[i, j, k] - x[i, j-1, k] \\ x[i, j, k] - x[i, j, k-1] \end{bmatrix} \qquad (12)$$

Denote $D_d$ as the matrix form of the discrete differentiation operator along the $d$-th dimension for the vectorized volume $x$, namely, $D_d x$ is the gradient of $x$ along the $d$-th dimension, and let $D = [D_1^T, D_2^T, D_3^T]^T$ be the matrix formed by stacking the $D_d$ by rows. The TV norm can then be written in its dual form as

$$\|x\|_{TV} = \max_{u \in Q}\ u^T D x \qquad (13)$$

where $u = [u_1^T, u_2^T, u_3^T]^T \in Q$ is a vector formed by concatenating three vectors of the same size as $x$, and $Q$ is the set of such vectors satisfying the inequality $u_1[i, j, k]^2 + u_2[i, j, k]^2 + u_3[i, j, k]^2 \leq 1$ at every grid point. Using Nesterov smoothing with smoothing parameter $\mu$, we obtain a new functional $TV_\mu(x)$, i.e., the smoothed total variation norm, which is of the form

$$TV_\mu(x) = \max_{u \in Q}\ u^T D x - \frac{\mu}{2} \|u\|_2^2 \qquad (14)$$

[24].
The gradient of the smoothed TV norm $TV_\mu(x)$ can be written as

$$\nabla TV_\mu(x) = D^T u_\mu(x) \qquad (15)$$

where $u_\mu(x)$ is a vector of the form $[u_1^T, u_2^T, u_3^T]^T$ and, for each dimension $d \in \{1, 2, 3\}$,

$$u_d[i, j, k] = \begin{cases} \dfrac{(D_d x)[i, j, k]}{\|\nabla x[i, j, k]\|_2}, & \|\nabla x[i, j, k]\|_2 \geq \mu \\[2ex] \dfrac{(D_d x)[i, j, k]}{\mu}, & \text{otherwise} \end{cases} \qquad (16)$$

The above derivation shows that the gradient of the smoothed TV norm can be obtained by first calculating the norm of the discrete gradient of the volume $x$ at each point $[i, j, k]$, and then replacing every gradient norm $\|\nabla x[i, j, k]\|_2$ smaller than the smoothing parameter $\mu$ by $\mu$ itself. This keeps the denominator of the gradient of the TV norm in a valid range and avoids the nondifferentiability of the unsmoothed TV norm around zero.

We pause to expose the effect of the TV norm in the optimization process by examining the gradient $\nabla TV_\mu(x)$. For a grid point $[i, j, k]$, the gradient at this point is of the form

$$\nabla TV_\mu(x)[i, j, k] = \sum_{d=1}^{3} \left( u_d[i, j, k] - u_d[(i, j, k) + e_d] \right) \qquad (17)$$

where $e_d$ is a $3 \times 1$ vector with one in the $d$-th entry and zeros elsewhere. Substituting equation (16) into equation (17), we observe that the gradient of the TV norm at the grid point $[i, j, k]$ depends on the gradients of the 3D model around this point. More specifically, writing $n_\mu[i, j, k] = \max(\|\nabla x[i, j, k]\|_2, \mu)$, equation (17) can be reformulated as

$$\nabla TV_\mu(x)[i, j, k] = K(p)\, x[i, j, k] - \sum_{p'} K(p, p')\, x[p'] \qquad (18)$$

where $p = (i, j, k)$, $p'$ ranges over the neighbors $p \pm e_d$, $K(p) = \sum_{d} \left( \frac{1}{n_\mu[p]} + \frac{1}{n_\mu[p + e_d]} \right)$, $K(p, p - e_d) = \frac{1}{n_\mu[p]}$ and $K(p, p + e_d) = \frac{1}{n_\mu[p + e_d]}$. The continuous form of equation (18) can be written as

$$x'(r) = \int K(r, s)\, x(s)\, ds \qquad (19)$$

where the kernel $K(r, s)$ depends simultaneously on the location $r$ in the smoothed volume and the location $s$ in the original volume.
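Equations (12) and (14) to (17) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the CUDA solver from the paper; it assumes backward differences with periodic boundaries (via `np.roll`), and the function names are hypothetical. The analytic gradient is compared against a central finite difference of the smoothed TV value.

```python
import numpy as np

def backward_diff(x):
    """Backward differences along each axis (periodic boundary assumed)."""
    return np.stack([x - np.roll(x, 1, axis=d) for d in range(x.ndim)])

def smoothed_tv(x, mu):
    """Value of the Nesterov smoothed TV norm (a Huber function of |grad x|)."""
    g = np.sqrt(np.sum(backward_diff(x) ** 2, axis=0))
    return np.sum(np.where(g >= mu, g - 0.5 * mu, g**2 / (2.0 * mu)))

def smoothed_tv_grad(x, mu):
    """Gradient D^T u_mu(x): clamp small gradient norms to mu, then apply D^T."""
    dx = backward_diff(x)
    g = np.sqrt(np.sum(dx ** 2, axis=0))
    u = dx / np.maximum(g, mu)
    # The adjoint of a periodic backward difference is u[p] - u[p + e_d].
    return sum(u[d] - np.roll(u[d], -1, axis=d) for d in range(x.ndim))

rng = np.random.default_rng(2)
x = rng.standard_normal((6, 6, 6))
mu = 0.5
grad = smoothed_tv_grad(x, mu)

# Finite-difference check at one voxel.
idx, eps = (1, 2, 3), 1e-6
xp, xm = x.copy(), x.copy()
xp[idx] += eps
xm[idx] -= eps
numeric = (smoothed_tv(xp, mu) - smoothed_tv(xm, mu)) / (2 * eps)
```

Because the weights `1 / max(g, mu)` differ from voxel to voxel, the resulting smoothing is stronger in flat regions and weaker across sharp density changes.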
The TV norm therefore acts as a spatially varying kernel $K(r, s)$ during the optimization process, which can adapt to the heterogeneous smoothness that exists in the 3D model.

We continue by designing an optimization algorithm for the $\ell_1$ regularization part. The target function with the $\ell_1$ norm penalty can be optimized by a proximal operator, which is the soft-thresholding operator in this case [25]. Combining the above techniques, denote $f$ as the expected negative log likelihood, $f(x) = \sum_i \int_\phi P(\phi \mid X_i, x^{n-1}) \sum_j \frac{| X_{ij} - \mathrm{CTF}_{ij} (P^{\phi} F x)_j |^2}{2\sigma_{ij}^2}\, d\phi$, $w_l^m = \frac{\beta}{\|\nabla x_l^m\|_2 + \epsilon}$ as the weight in the TV norm, $v_l^m = \frac{\alpha}{|x_l^m| + \epsilon}$ as the weight in the $\ell_1$ norm, $TV_\mu^{w}$ as the smoothed TV norm weighted by $w^m$, and $x^m$ as the 3D volume at iteration $m$ of this maximization step, and let the learning rate be $\gamma$. At the $(m+1)$-th iteration, to solve the optimization problem defined in equation (11), the proximal operator that updates the $l$-th component of the 3D volume is defined as

$$x_l^{m+1} = \arg\min_{x_l}\ \frac{1}{2\gamma} \left| x_l - \left( x_l^m - \gamma \left( \nabla f(x^m)_l + \nabla TV_\mu^{w}(x^m)_l \right) \right) \right|^2 + v_l^m |x_l| \qquad (20)$$

which has the closed form solution

$$x_l^{m+1} = \mathrm{sign}(x_l') \max\left( |x_l'| - \gamma v_l^m,\ 0 \right) \qquad (21)$$

where $x_l' = x_l^m - \gamma \left( \nabla f(x^m)_l + \nabla TV_\mu^{w}(x^m)_l \right)$ is the 3D model after the gradient descent update and $\mathrm{sign}$ represents the sign function. The effect of the $\ell_1$ norm is clear here: it sets the volume to zero wherever the value is relatively small and thus guarantees the sparsity of the volume. Since the electron densities of molecules should have higher values than the background noise, we expect the background noise to be suppressed while the electron densities of molecules remain largely untouched after applying the weighted $\ell_1$ restraint. The sparsity restraint thus results in cleaner 3D models. We also consider using the implicit gradient descent method proposed in [26] to improve the stability of the optimization process.
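The proximal update of equations (20) and (21), combined with the reweighting of equation (11), can be sketched on a toy problem. The example below assumes a simple denoising loss $f(x) = \frac{1}{2}\|x - y\|^2$ in place of the Cryo-EM likelihood and leaves out the TV term; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Closed form of the l1 proximal operator, as in equation (21)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def reweighted_l1_weights(x_prev, alpha, eps):
    """Tangent-line weights alpha / (|x_prev| + eps) of the log penalty."""
    return alpha / (np.abs(x_prev) + eps)

def proximal_step(x, grad, step, weights):
    """Gradient descent on the smooth part, then weighted soft-thresholding."""
    return soft_threshold(x - step * grad, step * weights)

# Toy data: two strong "density" values and three weak "noise" values.
y = np.array([5.0, 0.05, -4.0, 0.0, 0.1])
x = y.copy()
alpha, eps, step = 0.1, 0.01, 1.0
for _ in range(20):
    weights = reweighted_l1_weights(x, alpha, eps)
    x = proximal_step(x, x - y, step, weights)   # gradient of 0.5*||x - y||^2
```

Small entries receive a large weight and are set to zero, while large entries receive a small weight and are only slightly shrunk, which is the debiasing behaviour the log penalty is meant to provide.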
Implicit gradient descent can be realized by adding a quadratic restraint which favors a new solution that is close to the solution obtained in the previous maximization step. Therefore, at the $(m+1)$-th iteration of the $n$-th maximization step, the optimal model $x$ is defined as the maximizer of the equation below,

$$x^{m+1} = \arg\max_x \sum_i \int_\phi P(\phi \mid X_i, x^{n-1}) \sum_j -\frac{\left| X_{ij} - \mathrm{CTF}_{ij} (P^{\phi} F x)_j \right|^2}{2\sigma_{ij}^2}\, d\phi - \sum_l \left( v_l^m |x_l| + w_l^m \|\nabla x_l\|_2 \right) - \frac{\eta}{2} \|x - x^{n-1}\|_2^2 \qquad (22)$$

where $x^{n-1}$ is the solution obtained at the $(n-1)$-th maximization step and $\eta$ is the weight of the implicit gradient descent restraint. Since the newly added term is quadratic and differentiable, it simply contributes a new term to the gradient descent updated model $x'$. In conclusion, at each maximization step, we use equation (21) iteratively to update the 3D volume $x$ and obtain a final solution after a certain number of iterations. In our new method, we apply the expectation-maximization method with the aforementioned modifications to reconstruct the 3D volume until the convergence criteria are met.

2.2 Local Kernel Regression

An important step in Cryo-EM refinement is backprojection. After the orientation and translation of an image are estimated, the 2D Fourier transform of the image is put back into its corresponding plane in the 3D Fourier transform of the electron density map. Backprojection is used to construct the lower bound of the log marginal likelihood of each image, as stated before, and can be formulated as an optimization problem in which the 3D volume is obtained by minimizing the total loss between slices of the 3D volume and the corresponding Fourier transforms of the 2D images, i.e.,

$$\min_V \sum_i \int_\phi P(\phi \mid X_i, x^{n-1}) \sum_j \frac{\left| X_{ij} - \mathrm{CTF}_{ij} (P^{\phi} V)_j \right|^2}{2\sigma_{ij}^2}\, d\phi \qquad (23)$$

which corresponds to the first term in equation (10).
A practical problem to be considered here is that the 3D volume is discretely sampled at integer indices, while the 2D slice corresponding to the backprojected 2D image has noninteger indices. Formally, let $[k_x, k_y] \in \mathbb{Z}^2$ be the index of a point in the 2D image, and suppose the corresponding slice of the 2D image in the 3D volume is defined by the latent parameter $\phi$, which is a set of Euler angles defining the rotation of the plane (ignoring translation). After backprojection, the index $[k_x', k_y', k_z']$ of the point in the 3D volume is of the form

$$[k_x', k_y', k_z']^T = R_\phi\, [k_x, k_y, 0]^T \qquad (24)$$

where $R_\phi$ is the rotation matrix transforming the 2D image according to the Euler angles $\phi$. In general, $[k_x', k_y', k_z'] \in \mathbb{R}^3$, while a sampling point of the 3D volume has index $[i, j, k] \in \mathbb{Z}^3$. Therefore, there is no one to one mapping between the sampling points in the backprojected 2D slice $P^{\phi} V$ and the sampling points in the 3D volume $V$. This discrepancy is resolved by nonparametric regression, which was proposed for image processing in [27]. Simply put, denote $y_i$ as the value at a certain point $r_i \in \mathbb{R}^3$, and let $y$ be the value to be predicted at the point $r \in \mathbb{Z}^3$. As in [27], [28], we can define $y$ as the minimizer of

$$\sum_i K\!\left( \frac{r_i - r}{h} \right) (y_i - y)^2 \qquad (25)$$

where $K(\cdot)$ represents a chosen kernel function and $h$ is the bandwidth of the kernel function.
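The minimizer of equation (25) is a kernel-weighted average of the observed values. The sketch below illustrates this with a Gaussian kernel on real-valued samples; the kernel normalization $\exp(-\|d\|^2 / 2h^2)$, the sample data and the function names are assumptions for illustration only (actual Fourier coefficients are complex, which does not change the weighted-average form).

```python
import numpy as np

def gaussian_kernel(offsets, h):
    """Gaussian weight for each offset vector (rows of `offsets`)."""
    return np.exp(-np.sum(offsets**2, axis=-1) / (2.0 * h**2))

def kernel_estimate(points, values, query, h):
    """Closed-form minimizer of the weighted squared loss in equation (25)."""
    w = gaussian_kernel(points - query, h)
    return np.sum(w * values) / np.sum(w)

def weighted_loss(points, values, query, h, y):
    """Kernel-weighted squared loss of a candidate value y at `query`."""
    w = gaussian_kernel(points - query, h)
    return np.sum(w * (values - y) ** 2)

rng = np.random.default_rng(4)
points = rng.uniform(-2.0, 2.0, size=(50, 3))   # non-integer sample positions
values = rng.standard_normal(50)
query = np.array([0.0, 0.0, 0.0])               # integer grid point
h = 2.0

y_hat = kernel_estimate(points, values, query, h)
loss_at_min = weighted_loss(points, values, query, h, y_hat)
```

Perturbing `y_hat` in either direction increases the weighted loss, which confirms that the weighted average is the minimizer.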
Returning to the context of Cryo-EM refinement, for an orientation $\phi$, let the corresponding point of a Fourier coefficient $X_{ij}$ of an image $X_i$ in the 3D volume $V$ be $r_j(\phi) = [k_x', k_y', k_z']$, with $[k_x', k_y', k_z']^T = R_\phi [k_x, k_y, 0]^T$. The value $V_r$ of a grid point $r \in \mathbb{Z}^3$ is the minimizer of

$$\int_\phi P(\phi \mid X_i, x^{n-1})\, K\!\left( \frac{r_j(\phi) - r}{h} \right) \frac{\left| X_{ij} - \mathrm{CTF}_{ij} V_r \right|^2}{2\sigma_{ij}^2}\, d\phi \qquad (26)$$

The total loss is of the form

$$\sum_i \int_\phi P(\phi \mid X_i, x^{n-1}) \sum_j \sum_r K\!\left( \frac{r_j(\phi) - r}{h} \right) \frac{\left| X_{ij} - \mathrm{CTF}_{ij} V_r \right|^2}{2\sigma_{ij}^2}\, d\phi \qquad (27)$$

For our new method, we use the Gaussian kernel as our kernel function, namely, $K\!\left(\frac{r}{h}\right) = \exp\!\left( -\frac{r_x^2 + r_y^2 + r_z^2}{2h^2} \right)$, and we set the bandwidth $h$ to 2. For practical reasons, the Fourier coefficient $X_{ij}$ should not be backprojected to every grid point. Let $\lfloor \cdot \rceil$ be the rounding operator; we backproject $X_{ij}$ with index $[k_x', k_y', k_z']$ to its neighboring grid points $[\lfloor k_x' \rceil + d_x, \lfloor k_y' \rceil + d_y, \lfloor k_z' \rceil + d_z]$, where the offsets $d_x, d_y, d_z$ range over a small set of integers. The original implementation of Relion uses a kernel with trilinear interpolation weights. In the formalism introduced in this paper, that kernel function is of the form

$$K(r_j(\phi) - r) = (1 - |k_x' - r_x|)(1 - |k_y' - r_y|)(1 - |k_z' - r_z|)$$

where $r_x \in \{\lfloor k_x' \rfloor, \lceil k_x' \rceil\}$, $r_y \in \{\lfloor k_y' \rfloor, \lceil k_y' \rceil\}$ and $r_z \in \{\lfloor k_z' \rfloor, \lceil k_z' \rceil\}$, and $\lfloor \cdot \rfloor$ and $\lceil \cdot \rceil$ are the floor and ceiling operators, respectively. Hence the Fourier coefficient $X_{ij}$ is backprojected to 8 neighboring grid points in the original implementation of Relion. We now take a second look at equation (27).
By grouping the loss terms with respect to $V_r$ into a single term and ignoring constant terms that do not depend on $V_r$, we can rearrange equation (27) as

$$\frac{a(r)}{2} \left| V_r - \frac{\sum_i \int_\phi P(\phi \mid X_i, x^{n-1}) \sum_j K\!\left( \frac{r_j(\phi) - r}{h} \right) \frac{\mathrm{CTF}_{ij} X_{ij}}{\sigma_{ij}^2}\, d\phi}{a(r)} \right|^2 \qquad (28)$$

where $a(r) = \sum_i \int_\phi P(\phi \mid X_i, x^{n-1}) \sum_j K\!\left( \frac{r_j(\phi) - r}{h} \right) \frac{\mathrm{CTF}_{ij}^2}{\sigma_{ij}^2}\, d\phi$. Thus our new algorithm estimates the value of the 3D volume at a grid point $[x, y, z]$ by a discrete approximation of the continuous convolution between the kernel function and the weighted experimental data $\tilde{X}_{x'y'z'}$, which is of the form

$$V_{xyz} = \frac{1}{Z} \int K\!\left( \frac{[x, y, z] - [x', y', z']}{h} \right) \tilde{X}_{x'y'z'}\, dx'\, dy'\, dz' \qquad (29)$$

where $Z$ is the normalization factor for the kernel. We can then choose a certain kernel to embed $V$ in the reproducing kernel Hilbert space (RKHS) associated with that kernel, which forces $V$ to exhibit the smoothness of the RKHS induced by the chosen kernel [29].

2.3 Gold Standard FSC

The gold standard Fourier Shell Correlation (FSC) is a method to determine the resolution of a refined model without overfitting [14]. The FSC between the Fourier components $F$ and $G$ is defined by

$$\mathrm{FSC}(k) = \frac{\sum_{|\vec{k}'| = k} F(\vec{k}')\, G^{*}(\vec{k}')}{\sqrt{\sum_{|\vec{k}'| = k} |F(\vec{k}')|^2\ \sum_{|\vec{k}'| = k} |G(\vec{k}')|^2}} \qquad (30)$$

where $|\vec{k}'|$ is the magnitude of the spatial frequency vector $\vec{k}'$. The FSC measures the similarity between two maps: the closer the FSC is to one, the more similar the two normalized maps are. Since Cryo-EM refinement is carried out in an unsupervised fashion, model quality is judged by examining the consistency of models across independent refinements. The gold standard FSC is calculated as follows. At the beginning of refinement, the data are randomly split into two subsets with the same number of particles. Two sets of models are refined independently, one for each subset. The gold standard FSC refers to the FSC between these two independent reconstructions.
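Equation (30) can be computed by binning Fourier coefficients into integer-radius shells. The following is a minimal sketch, assuming cubic maps of equal size and shells defined by rounding the frequency radius; the synthetic half maps (a Gaussian blob plus independent noise) and the function name are hypothetical and serve only as an illustration.

```python
import numpy as np

def fourier_shell_correlation(map_a, map_b):
    """FSC per integer-radius shell for two cubic maps of equal size."""
    n = map_a.shape[0]
    F = np.fft.fftn(map_a)
    G = np.fft.fftn(map_b)
    freq = np.fft.fftfreq(n) * n                       # integer frequency indices
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2 + 1
    keep = shell < n_shells                            # drop the cube corners
    idx = shell[keep]
    cross = np.bincount(idx, weights=(F * np.conj(G)).real[keep], minlength=n_shells)
    pow_f = np.bincount(idx, weights=(np.abs(F) ** 2)[keep], minlength=n_shells)
    pow_g = np.bincount(idx, weights=(np.abs(G) ** 2)[keep], minlength=n_shells)
    return cross / np.sqrt(pow_f * pow_g)

rng = np.random.default_rng(3)
n = 16
grid = np.indices((n, n, n)) - n / 2
signal = np.exp(-np.sum(grid**2, axis=0) / 8.0)        # smooth synthetic density
half1 = signal + 0.05 * rng.standard_normal((n, n, n))
half2 = signal + 0.05 * rng.standard_normal((n, n, n))

fsc_self = fourier_shell_correlation(half1, half1)
fsc_half = fourier_shell_correlation(half1, half2)
```

With independent noise in the two half maps, the curve stays near one in the shells dominated by signal and falls toward zero in the shells dominated by noise.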
The frequency at which the gold standard FSC curve passes through 0.143 is commonly taken as the estimated resolution of the reconstructions [30]. Due to the noise outside the region where the molecule resides, the FSC between two maps may underestimate the true resolution. In practice, we often use a mask to exclude content outside the molecule and obtain a masked FSC between the two masked maps. This result can also be used as a reference for the resolution estimation.

2.4 Model-Map FSC

If a high-resolution atomic model has already been determined by X-ray crystallography, another way to validate the resolution of the Cryo-EM experimental map is to compare it with the atomic model. The resolution is determined by the correlation between the atomic model map and the Cryo-EM experimental map. The first step in calculating the model-map FSC is fitting the atomic model into the experimental density map. To avoid overfitting, we often employ a fitting method with minimal parameters, such as rigid-body fitting. The model map is then constructed from the fitted atomic model by sampling on the same grid as the experimental map. The model-map fit can then be evaluated by calculating the correlation between the Fourier coefficients of the model map and the experimental map, namely the FSC as defined in Section 2.3. This kind of FSC is referred to as the model-map FSC [31]. The point where the model-map FSC falls to a certain threshold can be taken as the resolution of the experimental map.

2.5 Parameter settings

To choose good parameters for our new method, we should determine the correct scale for our parameters. A theoretically optimal scale can be obtained from the closed-form solution of our new target function. For lasso-type problems, the closed-form solution can be derived from the dual form [32]. The first step towards the dual form of our new target function is converting it to matrix form.
With equation 󰇛  󰇜 in hand, we can write our 3D recons truction pro blem in matrix form as                                                          󰇛  󰇜 where  is a vector representation of t he 3D fourier transform data with  󰇛󰇜              󰇼   󰇛󰇜 󰇼              󰇛 󰇜 for   󰇟     󰇠  󰇟  󰇠 ,  is a diagonal matrix with diagonal element  󰇛   󰇜   󰇛 󰇜 , and  is the fourier transform of t he 3D model  . We can derive the dual for m of equation 󰇛  󰇜 by simplif ying our restraint. According to [32] , substituting the restraint in equation 󰇛  󰇜 by      , the dual of equation 󰇛  󰇜 with new restraint is of the form ,   󰇛        󰇜  󰇛      󰇜  󰇛        󰇜 󰇛  󰇜 subject to            󰇛  󰇜 , where 󰇛      󰇜  is the Moore-Penrose invers e for      ,             ,  is the paramete r for   restraint and  is the dual variable of the 3D volume  . Given  , the closed form of the solut ion of  c an be written as,   󰇛      󰇜  󰇛        󰇜  󰇛  󰇜 16 In equation 󰇛  󰇜 ,    is the inverse fourier transform of the data, which is the unregularized solution.    is the dual variable of restraint, which r egula rizes    . To achie ve s parsity in the so lution  ,  nee ds to zero out certain components of    . Since the dual variable  is bounded by  , the restraint parameter  should be of the same s cale a s the average of the magnitudes of     Though our restraint is of a more complex form than     , the dual of our restraint is in the space of a combination of two domains similar to       . Detailed derivations a b out the dual space of the combination of t wo norm can be found in [33], [34] . 
T herefore, we can set the parameters of our restraint to be of the scale as the square ro ot of the average of the squares of     We denote the square root of the average of the squ ares of    as                   in remaining sections. The s cale of i mplicit gradient desce nt restraint is ea sy to set s in ce it is quadratic. Note that each quadratic da ta lo ss term in our tar get function is s caled b y 󰇛󰇜 in equation 󰇛  󰇜 . Using the heuristic that the penalty term should matc h the loss term, we can se t the re straint parameter  to be on the s cale of the average of  󰇛󰇜 . We denote the average of  󰇛󰇜 a s 󰇛󰇜        in remaining sections. Other important parameters to be set are  and  . T hey are in the denominator s of our sparsity and smoothness restraint. If they are too small com paring with the density values of the 3D volume, the weights in our weighted norms will be very f lexible and strongly depend on the magnitude of the value of each voxel in the 3D volume. Su ch kinds of restraints might not be able to effectively remove background noises and cause two indep endent refinements diverge . If they are too large comparing with the density values of the 3D volume, the restraints degrade to the o riginal   and TV norms and leads to more biased solutions. Optimal values of  and  should as sign large weights to background noises and small weights to true molecular densities. Hence, we can s et the  to the level of dens ity values correspond ing to molecular content in the 3D volume.  ca n be s et close to  . This le v el ca n b e ea sily obtained fr om the intermediate volumes generated by the refinements us i ng traditional method. This is al s o similar to choose a threshold for creating a ma sk when computing masked FSC. In conclusion, we s hould set the restraint parameters  and  to be on the same scale as                   . 
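The two reference scales discussed above, the root mean square of the unregularized solution and the average of the data weights, can be computed as in the following sketch. It is a minimal illustration under stated assumptions: the function and key names are hypothetical, and both inputs are assumed to be flattened sequences of real numbers (voxel values of the unregularized volume and per-term weights of the quadratic data loss).

```python
import math

def rms(values):
    """Square root of the average of the squares of the values."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))

def mean(values):
    """Arithmetic mean of the values."""
    values = list(values)
    return sum(values) / len(values)

def restraint_scales(unregularized_volume, data_weights):
    """Reference scales for the restraint parameters (hypothetical helper).

    - "sparsity_smoothness_scale": RMS of the unregularized volume, the
      scale suggested for the sparsity and smoothness restraint parameters.
    - "quadratic_scale": mean data weight, the scale suggested for the
      quadratic (implicit gradient descent) restraint parameter.
    """
    return {
        "sparsity_smoothness_scale": rms(unregularized_volume),
        "quadratic_scale": mean(data_weights),
    }
```

Actual parameter values would then be expressed as multiples of these scales and tuned per dataset.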
Since the corresponding restraints are invers ely weighted by quantities with two other parameters  and  , we multiply the scale                   by  or  to counter acting the effects of  and  . For zero elements, their restraint parameters are then normalized to be on the s cale of                   . We performed a grid s earch on  -galactosidas e to determine the possible ranges of parameters. We star t ed by s etting  to 0.1, which is higher than the level of density values corresponding to molecular density. We consider ed setti ng   to be   since the magnitude of gradient is often smaller than the density value of molecule. T he 17 initial guess es for  a nd  were                       ,      󰇛󰇜        . We scanned through   󰇟  󰇠                    wit h a s tep size                     . The res olutions for every  are shown in Figure 1. We can see tha t the b es t result ca n be found at                       or                        . Figure 1 Resolutions of results refined with different β for β -galactosidase For 80S r ibosome , we use d the same  ,  and  as  -galactos idase. We sca nn ed through   󰇟󰇠 with a step size 0.4                    for                       and                       . The results are shown in Figure 2 . The best result s were obtain ed a t                        and                       or                     . For influ enz a HA trimer, we set  to 0.015 and  to   . We increase d  to  󰇛󰇜        . The  is fixed at                     during search. T he resolutions of results for diff erent  are shown in Figure 3. 
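The one-dimensional scans described above follow a plain grid search, which can be sketched as below. This is an illustrative sketch only: `grid_search` and `evaluate` are hypothetical names, the scanned parameter is assumed to be expressed in units of the reference scale (for example a step of 0.4 in those units), and `evaluate` stands in for a complete refinement run that returns the resulting resolution in Å, where lower is better.

```python
def grid_search(evaluate, start, stop, step):
    """Scan a parameter from start to stop (inclusive) in fixed steps.

    evaluate(value) is assumed to return a resolution in Angstrom, so the
    smallest returned value is the best. Returns
    (best_value, best_resolution, all_results).
    """
    n_steps = int(round((stop - start) / step))
    results = []
    for i in range(n_steps + 1):
        value = start + i * step
        results.append((value, evaluate(value)))
    best_value, best_resolution = min(results, key=lambda r: r[1])
    return best_value, best_resolution, results


# Toy stand-in for a refinement run: resolution is best near 1.2 (arbitrary).
def toy_refinement(beta):
    return 4.0 + (beta - 1.2) ** 2


best_beta, best_res, scan = grid_search(toy_refinement, 0.4, 2.0, 0.4)
```

Because every evaluation is a full refinement, the scan is kept one-dimensional here, with the other parameters held fixed while one is varied.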
The best result was obtained at […]. We also reran the refinement after decreasing […] to […] while fixing the other parameters found in the best result. The resolution of this attempt decreased to […] Å. Hence, we stopped our search for the HA trimer. Similar searches were applied to the other cases, and the best parameters are reported.

Figure 2 Resolutions of results refined with different β and α for 80S ribosome.

Figure 3 Resolutions of results refined with different β for influenza HA trimer.

3 Results

We tested our method by performing 3D refinement on real datasets and comparing the refinement results with the models obtained using the traditional method on the same datasets. The 3D refinement process consists of iterations alternating between expectation and maximization. Both methods used the same settings, such as the adaptive sampling rate and the oversampling order in the expectation steps [13]. We also used the same convergence criteria for the two methods during comparison, i.e., no improvement in resolution and no changes in orientation and translation for at least two iterations [13]. The resolutions of the density maps refined by the different methods are summarized in Table 1.

We used β-galactosidase as a test case. This dataset has been extensively used in previous research [14], [35]. Since there was no ready-to-use particle stack for model building, our test began with extracting particles from the micrographs using the coordinates manually picked by Richard Henderson [35]. After a round of 2D classification, the particles belonging to the major classes were selected for model building. An initial model was generated ab initio from the 3D classification procedure. The initial model was then low-pass filtered to 50 Å and averaged using D2 symmetry.
We performed 3D refinements using the different methods and the same initial model while enforcing D2 symmetry during the refinements. The best parameters of our method are […] and […]. In the post-processing step, we created a mask from the final reconstruction obtained using all particles in the 3D refinement procedure. Using the relion_postprocess routine [16], we obtained post-processed maps from the independent maps by correcting for the MTF of the detector and sharpening with automatically estimated B-factors. To calculate the model-map FSCs and model-map correlations, we fitted the atomic coordinates of an E. coli β-galactosidase structure, 3I3E [36], into the post-processed density maps reconstructed by the different methods using rigid-body fitting in Chimera [37]. The results obtained from the different methods are presented in Figure 4 and Figure 5. The FSC curves for the density maps refined by our new method are plotted with solid lines, and the FSC curves for the density maps refined by the traditional method are plotted with dashed lines. The FSC curves between unmasked maps are colored in blue, and the FSC curves between masked maps are colored in red. The light blue line marks FSC = 0.143. As shown in Figure 4, the solid lines are above the dashed lines over almost the entire resolution range. This suggests that the density maps obtained from our new method are of higher quality than the density maps obtained by the traditional method. The improvement of the density maps obtained from our new method is further validated by the model-map FSC curves calculated by Phenix.Mtriage [38]. The density maps reconstructed by our new method have higher correlations with the rigid-body fitted model in the medium- to high-resolution shells.
We also tested our method on the 80S ribosome dataset collected by [39]. Since no particle stack was deposited for this protein, our test started from scratch. We extracted particles from the micrographs using the semi-automated selection process in Relion [40]. The particles were pruned by a round of 2D classification in which only the particles assigned to the major classes were kept. We then constructed an ab initio model for 3D refinement through 3D classification as before. The 3D refinements started from the initial model low-pass filtered to 70 Å. The best parameters of our method are […] and […]. Post-processing similar to that of the preceding case was applied to the final density maps. To assess the Cryo-EM maps determined using the different methods, we fitted the 80S crystal structure, 3U5B [41], into the post-processed maps using a simple rigid-body fit to obtain high-resolution reference atomic structures. The 40S and 60S subunits were fitted separately. The model-map FSCs were calculated between our maps and the corresponding reference structures. Using the same color scheme and line types as in the β-galactosidase figures, the gold standard FSC curves for the refinements using the different methods are reported in Figure 6. The FSC curves between density maps from our new method are above the FSC curves between density maps from the traditional method. For the model-map FSC curves shown in Figure 7, the FSC between the density map refined by our method and the rigid-body fitted model overlaps with the FSC between the density map obtained by the traditional method and the rigid-body fitted model at low to medium resolution.
However, the density map refined by our method has higher correlation with the model map in the high-resolution region, which suggests that the density map obtained by our method achieves higher resolution than the density map refined by the traditional method.

We tested our method on the influenza hemagglutinin (HA) trimer. The data was obtained from the EMPIAR deposition with accession number 10097 [42]. We generated an initial model ab initio using 3D classification. The initial model was further averaged according to C3 symmetry. The 3D refinements were performed using the initial model low-pass filtered to 40 Å while enforcing C3 symmetry. The best parameters of our method are […] and […]. To compare the results obtained from the two refinement methods, we used a high-resolution atomic structure of the HA trimer (PDB: 3WHE), determined by X-ray crystallography, as a reference [43]. The atomic model was rigidly fitted into the Cryo-EM maps using Chimera [37]. The model-map FSC curves were reported. We also compared the post-processed maps derived from the results of the different refinement methods. In the post-processing step, the final reconstruction using all particles in the 3D refinement procedure was used to generate a mask. The final density maps were created from the independent maps and corrected for the modulation transfer function (MTF) of the detector using the relion_postprocess routine. They were then sharpened by applying an automatically estimated negative B-factor. The gold standard FSC curves between density maps refined by the different methods are plotted in Figure 8 using the same color scheme and line types as before. The unmasked gold standard FSC of our new method is greatly improved over the unmasked gold standard FSC of the traditional method.
The masked gold standard FSC of our new method is also improved in most resolution shells. The density map refined by our new method has higher resolution according to the 0.143 criterion. This improvement is further confirmed by the model-map FSCs in Figure 9. The post-processed map of our new method has much higher correlation with the atomic model in the medium- to high-resolution shells.

Another test case is the structure of the protein-conducting ERAD channel Hrd1 in complex with Hrd3 [44]. The Cryo-EM data was downloaded from EMPIAR with accession number 10099. Due to the heterogeneity of the dataset, 3D classification was used to classify the particles and to generate the corresponding initial models of the different complexes for 3D refinement. The particles classified as Hrd1/Hrd3 dimer were selected and then subjected to 3D refinement. We performed 3D refinements using the different methods and the same initial model low-pass filtered to 20 Å. The best parameters of our method are […] and […]. The final density maps were created using the same approach as before. The final results were compared with the atomic models of the Hrd1 dimer (5V6P) and the Hrd3 monomer (5V7V) by calculating the model-map FSC [44]. The Hrd1 dimer and the Hrd3 monomer were fitted into the density map separately. The gold standard FSC curves between the density maps refined by the different methods are shown in Figure 10. The FSC curves between density maps refined by our new method are higher than the FSC curves between density maps refined by the traditional method in most regions, which suggests that the refinement results of our new method have higher resolution than the refinement results of the traditional method.
The model-map FSC curves in Figure 11 corroborate this conclusion, since the density map obtained by our method again has higher correlation with the rigid-body fitted model map in most resolution shells. Therefore, the density map obtained by our new method achieved higher resolution than the density map obtained by the traditional method.

The final test case is the structure of the TMEM16A calcium-activated chloride channel [45]. The Cryo-EM data was downloaded from EMPIAR with accession number 10123. We generated an initial model ab initio using 3D classification. The initial model was further averaged according to C2 symmetry. The 3D refinements were performed using the initial model low-pass filtered to 40 Å while enforcing C2 symmetry. The best parameters of our method are […] and […]. The final density maps were created using the same approach as before. The final results were compared with the atomic model 6BGI by calculating the model-map FSC [45]. The gold standard FSC curves between the density maps refined by the different methods are shown in Figure 12. The FSC curves between density maps refined by our new method are higher than the FSC curves between density maps refined by the traditional method in most regions, which suggests that the refinement results of our new method have higher resolution than the refinement results of the traditional method. Even though the atomic model was obtained by fitting against the density map refined by the traditional method, the model-map FSC curves of our method in Figure 13 are above the corresponding model-map FSC curves of the traditional method in all resolution shells.
This corroborates our conclusion that the density map obtained by our new method achieved higher resolution than the density map obtained by the traditional method.

We visually inspected the final density maps reconstructed by the different refinement methods and compared them with the corresponding atomic models. Here we present some differences between the density maps refined by the different methods, along with the structures. The first example is from the HA trimer and is located within residues 451 to 455 in chain C. The density map refined by our new method is plotted in Figure 14 (a) and colored in green. The density map refined by the traditional method is plotted in Figure 14 (b) and colored in red. Both density maps are contoured at the same level. The density map refined by the traditional method has no density at the sidechain of residue 452 and incomplete density at the sidechain of residue 453, while the density map refined by our new method presents more complete density at these residues. In addition, the sidechain of residue 454 lies outside the density map refined by the traditional method, while it is enclosed by the density map refined by our new method. These differences suggest that the density map refined by our new method has higher map-model correlation than the density map refined by the traditional method. Another example is found in residues 386 to 391 in chain A of the Hrd1/Hrd3 dimer. The density maps refined by the different methods are plotted along with the corresponding structure and colored using the same color scheme in Figure 15. The density map refined by our new method shows density for the sidechain of residue 391, while the density map refined by the traditional method has no density above this level in the same region.
In addition, the density map refined by our new method covers the sidechains of residues 389 and 390, while some atoms in those sidechains lie outside the density map refined by the traditional method. These differences serve as complementary evidence that our new method can improve the map-model correlation of the final density map. For β-galactosidase, residues 796 to 801 in chain A are taken as an example. We can also observe some improvements in the density map refined by our new method. As shown in Figure 16, the density map refined by our new method has more density for the sidechain of residue 799 than the density map refined by the traditional method. The whole sidechain of residue 796 is enclosed by the density map refined by our new method, while a few atoms lie outside the density map refined by the traditional method. The last difference is that the density map refined by our new method has more density near the sidechain of residue 800. The final example is taken from residues 105 to 120 in chain J of the 80S ribosome. As shown in Figure 17, the most noticeable improvements of the density map refined by our new method in this region are located at residue 109 and residue 114. Contoured at the same level, our density map shows clearer density for the sidechains of these residues than the density map refined by the traditional method. We can also observe that our density map has more complete density for the sidechains of residues 116, 117 and 120. Finally, we present the two post-processed density maps of TMEM16A in Figure 18. The density map of TMEM16A refined by our new method is shown in Figure 18 (a), while the density map of TMEM16A refined by the traditional method is shown in Figure 18 (b). Both maps are contoured at the same level.
We can see that the densi ty map refined by our new method exhibits more sec ond ary structure features in the middle of this de nsity map comparing the density map refined by traditional method. Bes ides, there are also more densities at the top and bottom of the densit y map refined by our new method. In all results presented in this paper, we can see that the improv ement of our new method in gold standard unmasked FSC is mo s t noticeable. Thi s phenomenon suggests that our method has supe rior denois ing effect to the 3d volume, thus producing less noisy reference model during refinement. Th e cleaner model in tur n leads to more accurate orientation and translation parameter estimation for each image in the expectation step. The se reciprocal improvements of our method result in better refinement results eventually. M oreover, though our new metho d introduced more parameters, the parameter settings of our new method which ge nerates better result than traditional method can be easily foun d in a relatively sm all range . First of all,  can be s et according to t he de nsity level of molecular content in the 3D volume. T his is of the same fas hion a s selecting the threshold for creating mask when computing masked FSC. In our test ca ses,  s a re in the range 󰇟󰇠                    ,  s are in the range 󰇟󰇠                    , and  s are in the range 󰇟  󰇠 󰇛󰇜         .  s are in the range 󰇣    󰇤  . T hese ranges provide useful guidance for future applications of our new method. Given these ranges, we can use the brute force approach – grid search to obtain the optimal parameter setting. 24 4 Conclusion In this paper, we proposed a ne w typ e of 3D refinement method for the Cryo- EM single particle analysis. Our analys i s reveals that our new method pr omotes differ ent kinds of smoothnes s comparin g with the traditional method. 
Unlike the traditional method [13], which enforces translation-invariant, rotationally symmetric smoothness on the 3D model, our new method enforces spatially varying smoothness on the 3D model. Since most structures do not exhibit rotational symmetry, the traditional method might result in large bias in the final model. In contrast, our method can adapt to different smoothness in different regions of a structure, thus greatly reducing model bias and improving the final results. Another approach to promoting smoothness that we explored in this paper is formulating the backprojection as a local kernel regression problem. This new formulation enables us to embed the 3D model in an RKHS with specific smoothness. We also introduced a new prior, sparsity, into the 3D refinement process. By setting relatively small values in the 3D model to zero, the sparsity restraint can suppress signal in the 3D model that does not belong to the molecule, which leads to better refinement results. We tested our new method on real datasets and compared the refinement results with the results obtained by the traditional method. Using criteria such as the gold standard FSC and the model-map FSC, we have shown that our new method improves the resolution of the electron density map and the correlation between the atomic model and the electron density map on all five test datasets (Table 1). We expect our method to be deployed in the Cryo-EM structure determination process and to help experimenters obtain higher resolution maps and structures.

Table 1 Resolutions of different density maps of proteins refined by different methods. Gold Standard Traditional represents the resolution of the density map refined by the traditional method. Gold Standard Our represents the resolution of the density map refined by our method. Both resolutions are measured at FSC = 0.143 using the gold standard FSC and the phase randomisation method [16].
Model Map Traditional refers to the resolution of the density map refined by the traditional method. Model Map Our refers to the resolution of the density map refined by our method. Both resolutions are obtained at FSC = 0.143 using the masked model-map FSC. All values are in Å.

Proteins           Gold Standard   Gold Standard   Model Map     Model Map
                   Traditional     Our             Traditional   Our
β-galactosidase    4.16            3.97            4.05          3.91
80S ribosome       4.08            3.93            4.04          3.89
hemagglutinin      4.19            3.77            4.06          3.72
Hrd1/Hrd3          4.80            3.55            4.70          4.25
TMEM16A            4.01            2.93            3.87          3.14

Figure 4 Gold standard unmasked and masked FSC curves of β-galactosidase calculated from two independent reconstructions for the different refinement methods. In the legend, "Unmasked" and "masked" denote the unmasked and masked FSC curves of results from our method; the two remaining entries denote the unmasked and masked FSC curves of results from the traditional method.

Figure 5 Model-map FSC curves between the post-processed density maps of β-galactosidase obtained using the different methods and the corresponding rigid-body fitted atomic model 3I3E. In the legend, "Unmasked" and "masked" denote the unmasked and masked FSC curves of results from our method; the two remaining entries denote the unmasked and masked FSC curves of results from the traditional method.

Figure 6 Gold standard unmasked and masked FSC curves of the 80S ribosome calculated between two independent reconstructions for the different methods. In the legend, "Unmasked" and "masked" denote the unmasked and masked FSC curves of results from our method; the two remaining entries denote the unmasked and masked FSC curves of results from the traditional method.
Figure 7 FSC curves between the post-processed maps refined using the different methods and the corresponding rigid-body fitted atomic model 3U5B, for which the 40S and 60S subunits are fitted separately. In the legend, "Unmasked" and "masked" denote the unmasked and masked FSC curves of results from our method; the two remaining entries denote the unmasked and masked FSC curves of results from the traditional method.

Figure 8 Gold standard unmasked and masked FSC curves of the influenza hemagglutinin trimer between two independent refinements for the different methods. In the legend, "Unmasked" and "masked" denote the unmasked and masked FSC curves of results from our method; the two remaining entries denote the unmasked and masked FSC curves of results from the traditional method.

Figure 9 FSC curves between the post-processed maps refined using the different methods and the corresponding rigid-body fitted atomic model 3WHE. In the legend, "Unmasked" and "masked" denote the unmasked and masked FSC curves of results from our method; the two remaining entries denote the unmasked and masked FSC curves of results from the traditional method.

Figure 10 Gold standard unmasked and masked FSC curves of the Hrd1/Hrd3 complex between two independent refinements for the different methods. Curves with different colors show the results of different methods. In the legend, "Unmasked" and "masked" denote the unmasked and masked FSC curves of results from our method; the two remaining entries denote the unmasked and masked FSC curves of results from the traditional method.
Figure 11 FSC curves between the post-processed maps refined using the different methods and the corresponding rigid-body fitted atomic models, for which the Hrd1 dimer and Hrd3 subunits are fitted separately. In the legend, "Unmasked" and "masked" denote the unmasked and masked FSC curves of results from our method; the two remaining entries denote the unmasked and masked FSC curves of results from the traditional method.

Figure 12 Gold standard unmasked and masked FSC curves of the TMEM16A calcium-activated chloride channel between two independent refinements for the different methods. Curves with different colors show the results of different methods. In the legend, "Unmasked" and "masked" denote the unmasked and masked FSC curves of results from our method; the two remaining entries denote the unmasked and masked FSC curves of results from the traditional method.

Figure 13 FSC curves between the post-processed maps refined using the different methods and the corresponding rigid-body fitted atomic models. In the legend, "Unmasked" and "masked" denote the unmasked and masked FSC curves of results from our method; the two remaining entries denote the unmasked and masked FSC curves of results from the traditional method.

Figure 14 (a) Density map refined by the new method for residues 451 to 455 in chain C of the HA trimer. (b) Density map refined by the traditional method for residues 451 to 455 in chain C of the HA trimer.

Figure 15 (a) Density map refined by the new method for residues 386 to 391 in chain A of the Hrd1/Hrd3 dimer.
(b) Density map refined by the traditional method for residues 386 to 391 in chain A of the Hrd1/Hrd3 dimer.
Figure 16 (a) Density map refined by the new method for residues 796 to 801 in chain A of β-galactosidase. (b) Density map refined by the traditional method for residues 796 to 801 in chain A of β-galactosidase.
Figure 17 (a) Density map refined by the new method for residues 105 to 120 in chain J of the 80S ribosome. (b) Density map refined by the traditional method for residues 105 to 120 in chain J of the 80S ribosome.
Figure 18 (a) Post-processed density map of TMEM16A refined by our method. (b) Post-processed density map of TMEM16A refined by the traditional method.
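The FSC curves in Figures 7 to 13 are reported both with and without a mask. As a rough illustration of how such a curve could be computed, the sketch below uses NumPy to correlate two cubic maps shell by shell in Fourier space. The function names `fsc`, `spherical_mask` and `masked_fsc`, the hard-edged spherical mask, and the synthetic test volume are all assumptions made for illustration; none of them is taken from the paper or its implementation, and refinement packages typically use soft-edged masks together with a correction for mask-induced correlation.

```python
import numpy as np


def fsc(map_a, map_b):
    """Fourier shell correlation of two cubic 3D maps with the same shape."""
    if map_a.ndim != 3 or map_a.shape != map_b.shape or len(set(map_a.shape)) != 1:
        raise ValueError("expected two cubic 3D maps of identical shape")
    n = map_a.shape[0]
    fa = np.fft.fftn(map_a)
    fb = np.fft.fftn(map_b)

    # Assign every Fourier voxel to an integer shell by its rounded radius.
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    n_shells = n // 2 + 1
    keep = shell < n_shells  # drop the box corners beyond the Nyquist shell

    def shell_sum(values):
        # Sum a real-valued Fourier-space quantity over each shell.
        return np.bincount(shell[keep], weights=values.ravel()[keep],
                           minlength=n_shells)

    cross = shell_sum((fa * np.conj(fb)).real)
    power_a = shell_sum(np.abs(fa) ** 2)
    power_b = shell_sum(np.abs(fb) ** 2)
    denom = np.sqrt(power_a * power_b)
    # Shells with zero power are reported as 0 instead of dividing by zero.
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)


def spherical_mask(n, radius):
    """Hard-edged spherical mask centred in an n^3 box (illustrative only)."""
    grid = np.arange(n) - (n - 1) / 2.0
    x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
    return (x**2 + y**2 + z**2 <= radius**2).astype(float)


def masked_fsc(map_a, map_b, mask):
    """FSC after multiplying both maps by the same real-space mask."""
    return fsc(map_a * mask, map_b * mask)


# Synthetic stand-in for two independently refined half maps:
# a shared "particle" inside a sphere plus independent noise in the whole box.
rng = np.random.default_rng(0)
n = 32
mask = spherical_mask(n, radius=10)
signal = mask * rng.standard_normal((n, n, n))
half1 = signal + 0.5 * rng.standard_normal((n, n, n))
half2 = signal + 0.5 * rng.standard_normal((n, n, n))

unmasked_curve = fsc(half1, half2)
masked_curve = masked_fsc(half1, half2, mask)
```

In this toy setup the mask discards noise outside the particle, so the masked curve tends to lie above the unmasked one. This is one reason both curves are usually shown side by side.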
