Representation Theoretic Patterns in Multi-Frequency Class Averaging for Three-Dimensional Cryo-Electron Microscopy

We develop in this paper a novel intrinsic classification algorithm -- multi-frequency class averaging (MFCA) -- for classifying noisy projection images obtained from three-dimensional cryo-electron microscopy (cryo-EM) by the similarity among their …

Authors: Yifeng Fan, Tingran Gao, Zhizhen Zhao

Abstract

We develop in this paper a novel intrinsic classification algorithm, multi-frequency class averaging (MFCA), for classifying noisy projection images obtained from three-dimensional cryo-electron microscopy (cryo-EM) by the similarity among their viewing directions. This new algorithm leverages multiple irreducible representations of the unitary group to introduce additional redundancy into the representation of the optimal in-plane rotational alignment, extending and outperforming the existing class averaging algorithm that uses only a single representation. The formal algebraic model and representation theoretic patterns of the proposed MFCA algorithm extend the framework of Hadani and Singer to arbitrary irreducible representations of the unitary group. We conceptually establish the consistency and stability of MFCA by inspecting the spectral properties of a generalized local parallel transport operator through the lens of Wigner $D$-matrices. We demonstrate the efficacy of the proposed algorithm with numerical experiments.

Keywords: Representation theory · Spectral theory · Differential geometry · Wigner matrices · Cryo-electron microscopy · Mathematical biology

Mathematics Subject Classification (2010): 20G05 · 33C45 · 33C55 · 55R25

1 Introduction

The past decades have witnessed an emerging and continued impact of cryo-electron microscopy (cryo-EM), the Nobel Prize winning imaging technology for determining three-dimensional structures of macromolecules, on a wide range of natural scientific fields [18, 48, 26, 40, 13, 52].
Compared with its predecessor, X-ray crystallography, whose success builds upon the potentially difficult procedure of crystallization, cryo-EM is able to image the macromolecules in their native states and produces large numbers of projection images for samples of molecules rapidly frozen in a thin layer of vitreous ice. The projection images can be thought of as tomographic projections of many copies of an identical molecule at unknown and random orientations. A major computational challenge in reconstructing the three-dimensional molecular structure from these projection images is the extremely low signal-to-noise ratio (SNR) caused by the limited allowable electron dose (so as to avoid damaging the molecule before the imaging completes). It is thus customary to improve the SNR by performing class averaging (the procedure of aligning and then averaging projection images taken along nearby viewing directions) from rotationally invariant pairwise comparisons of the projection images [53, 26], before the downstream reconstruction workflow such as angular reconstitution [65, 39, 56].

ZZ and YF acknowledge the support from Strategic Research Initiatives in the University of Illinois at Urbana-Champaign and NSF grant DMS-1854791. TG acknowledges the support from NSF DMS-1854831, an AMS-Simons Travel Grant, and partial support from DARPA D15AP00109 and NSF grant IIS-1546413.

Yifeng Fan, Department of Electrical and Computer Engineering, University of Illinois at Urbana–Champaign. E-mail: yifengf2@illinois.edu
Tingran Gao, Committee on Computational and Applied Mathematics, Department of Statistics, University of Chicago. E-mail: tingrangao@galton.uchicago.edu
Zhizhen Zhao, Department of Electrical and Computer Engineering, Coordinated Science Laboratory, University of Illinois at Urbana–Champaign. E-mail: zhizhenz@illinois.edu
In addition to its scientific value, the rich geometric structure in the cryo-EM imaging model has also inspired many mathematical and algorithmic investigations [61, 35, 59, 62, 4, 67, 75, 9, 29, 3, 30, 71, 31].

1.1 Background: The Mathematical Model of Cryo-Electron Microscopy and Class Averaging

Following [63, 38], we view the collection of projection images $\{ I_i \in \mathbb{R}^{L \times L} \mid i = 1, \dots, N \}$ as tomographic projection images of the same three-dimensional object along projection directions uniformly sampled from the two-sphere $S^2$, as it is more convenient to consider the imaging model in the molecule's own lab frame, where the molecule is fixed and observed by an electron microscope at various orientations. For simplicity, we assume the projection images are all centered, i.e., the center of mass of each clean projection image is at the center of the image. The goal is to identify and classify projection images produced from similar projection directions, hereafter referred to as viewing directions.

A point $x \in \mathrm{SO}(3)$ is identified with an orthonormal basis $(e_1, e_2, e_3)$ of $\mathbb{R}^3$, with orientation compatible with the canonical orthonormal coordinate frame of $\mathbb{R}^3$. We identify $e_3 \in S^2$ with the viewing direction and denote it by $\pi(x)$ for ease of notation. The 2D image obtained by the microscope observed at a spatial orientation $x$ is a real-valued function $I : \mathbb{R}^2 \to \mathbb{R}$, given by the X-ray transform along the viewing direction:

$$ I(s, t) = \int_{\mathbb{R}} \phi(s e_1 + t e_2 + r e_3) \, dr \quad \text{for all } (s, t) \in \mathbb{R}^2, \tag{1} $$

where $\phi : \mathbb{R}^3 \to \mathbb{R}$ is a real-valued function modeling the electromagnetic potential induced from the charges of the molecule. We assume the images $I(s, t)$ are all supported on a bounded set of $\mathbb{R}^2$ which fits into the size of the projection images.
To measure the similarity between any two projection images $I_i$ and $I_j$, obtained by the tomographic projection along viewing directions $\pi(x_i) \in S^2$ and $\pi(x_j) \in S^2$ respectively, we compute a rotationally invariant distance between $I_i$ and $I_j$ defined as

$$ d_{\mathrm{RID}}(I_i, I_j) = \min_{\theta \in [0, 2\pi)} \| I_i - R_\theta(I_j) \|_F, \tag{2} $$

where $R_\theta(I_j)$ stands for the operation of rotating image $I_j$ by an angle $\theta \in [0, 2\pi)$ in the counterclockwise orientation, and $\| \cdot \|_F$ is the matrix Frobenius norm. The optimal alignment angle between $I_i$ and $I_j$ will be denoted as

$$ \theta_{ij} = \arg\min_{\theta \in [0, 2\pi)} \| I_i - R_\theta(I_j) \|_F. \tag{3} $$

For images $I_x$ and $I_y$ obtained from viewing directions $\pi(x)$ and $\pi(y)$ for $x, y \in \mathrm{SO}(3)$ and without noise contamination, [38] models the optimal alignment angle as the transport data encoding the angle of in-plane rotation needed to align frames $x, y$ after one of them is parallel-transported to the fibre of the other using the canonical Levi-Civita connection on the unit sphere equipped with the Riemannian structure induced from the ambient space $\mathbb{R}^3$.

A rough idea for filtering out far-apart viewing directions is through thresholding the rotationally invariant distances between pairs of projection images against a preset threshold parameter $\epsilon > 0$ that should be tuned to reflect the confidence in the accuracy of the imaging process. The pairwise comparison information after thresholding can be conveniently encoded into an observation graph $G = (V, E)$, where each vertex of $G$ stands for one of the projection images, and an edge $(i, j)$ belongs to the edge set $E$ if and only if the rotationally invariant distance $d_{\mathrm{RID}}(I_i, I_j)$ is smaller than the threshold.
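The distance (2), the angle (3) and the thresholded graph can be illustrated with a minimal sketch. It assumes, purely for illustration, that each image has already been resampled on a polar grid of shape `(n_r, n_theta)`, so that an in-plane rotation becomes a cyclic shift along the angular axis and the minimization over $\theta$ reduces to a circular cross-correlation computed by FFT. The polar resampling, the restriction of $\theta$ to multiples of $2\pi / n_\theta$, the unweighted sum over the polar grid standing in for the Frobenius norm, the exclusion of self-loops, and the function names `rotationally_invariant_distance` and `observation_graph` are all assumptions of this sketch, not part of the paper.

```python
import numpy as np


def rotationally_invariant_distance(img_i, img_j):
    """Return (d_RID, theta_ij) for two polar-grid images of shape (n_r, n_theta).

    Rotating img_j counterclockwise by s angular steps is modeled as
    R_s(img_j)[r, t] = img_j[r, t - s] (indices taken modulo n_theta).
    """
    n_theta = img_i.shape[1]
    # Circular cross-correlation along the angular axis, summed over radii:
    # corr[s] = sum_{r, t} img_i[r, t] * img_j[r, t - s].
    fi = np.fft.fft(img_i, axis=1)
    fj = np.fft.fft(img_j, axis=1)
    corr = np.fft.ifft(fi * np.conj(fj), axis=1).real.sum(axis=0)
    # ||I_i - R_s(I_j)||^2 = ||I_i||^2 + ||I_j||^2 - 2 corr[s].
    sq_dist = np.sum(img_i ** 2) + np.sum(img_j ** 2) - 2.0 * corr
    s = int(np.argmin(sq_dist))
    dist = np.sqrt(max(sq_dist[s], 0.0))  # clamp tiny negative round-off
    return dist, 2.0 * np.pi * s / n_theta


def observation_graph(images, eps):
    """Boolean adjacency matrix of G and the matrix of alignment angles."""
    n = len(images)
    adj = np.zeros((n, n), dtype=bool)
    theta = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:  # self-loops are left out (a choice of this sketch)
                d, theta[i, j] = rotationally_invariant_distance(
                    images[i], images[j])
                adj[i, j] = d < eps
    return adj, theta
```

The double loop costs $O(N^2)$ image comparisons, which is only practical for small collections; it is written this way to keep the correspondence with (2), (3) and the edge set visible.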
In an ideal noiseless world, the graph $G$ is a neighborhood graph on the unit sphere $S^2$; namely, two images are connected if and only if their viewing directions $\pi(x_i)$ and $\pi(x_j)$ are close on the unit sphere, $\langle \pi(x_i), \pi(x_j) \rangle \geq 1 - h$, for $h \ll 1$. For noisy cryo-EM images, the rotationally invariant distances $d_{\mathrm{RID}}$ are affected by noise, and a $d_{\mathrm{RID}}$-based similarity measure will connect images of very different views, introducing short-cut edges on the unit sphere. The main problem here is thus to distinguish the "good" edges from the "bad" ones in the graph $G$, or, in other words, to distinguish the true neighbors from the outliers. The existence of outliers makes the classification problem non-trivial. Without excluding the outliers, averaging rotationally aligned images with small invariant distance (2) yields a poor estimate of the true signal, rendering infeasible the 3D ab initio reconstruction from denoised images. We refer interested readers to [21, 47] for more detailed statistical analysis of the rotationally invariant distance (2). The focus of this paper is to rectify the noise-contaminated empirical transport data using the spectral information of an integral operator constructed from the initial local transport data.

1.1.1 The Class Averaging Algorithm

One of the most natural ideas for performing class averaging is through the eigenvectors of the class averaging matrix constructed from the empirical transport data $\{ e^{\iota \theta_{ij}} \}_{(i,j) \in E}$ [63, 38]. We briefly recapitulate the main steps in the class averaging algorithm below. Detailed discussions and the analysis of representation theoretical patterns can be found in [63, 38]. In this section we use the notation $[N] = \{1, 2, \dots, N\}$ for $N \in \mathbb{N}$.
The algorithm begins with computing rotationally invariant distances $d_{ij}$ between all pairs of projection images $I_i$ and $I_j$, along with the corresponding optimal alignment angles $\theta_{ij}$. After that, construct an $N$-by-$N$ Hermitian matrix $H$ by

$$ H_{ij} = \begin{cases} e^{\iota \theta_{ij}} & \text{if } (i, j) \in E, \\ 0 & \text{otherwise}, \end{cases} \tag{4} $$

where the edge set $E \subset [N] \times [N]$ is obtained by thresholding the pairwise distances $\{ d_{ij} : 1 \leq i, j \leq N \}$ against a preset threshold $\epsilon > 0$, i.e.,

$$ E := \{ (i, j) \in [N] \times [N] : d_{\mathrm{RID}}(I_i, I_j) < \epsilon \}. \tag{5} $$

Set $D$ as the diagonal matrix with diagonal entries

$$ D_{ii} = \sum_{j=1}^{N} | H_{ij} |, \quad 1 \leq i \leq N, \tag{6} $$

and compute the top three eigenvectors $\psi_1, \psi_2, \psi_3 \in \mathbb{C}^N$ of the normalized Hermitian matrix $\widetilde{H} := D^{-1/2} H D^{-1/2}$. Each projection image is then associated with a point in $\mathbb{C}^3$ by means of the embedding map

$$ \Psi : \{ I_i \}_{i=1}^{N} \longrightarrow \mathbb{C}^3, \qquad I_i \longmapsto ( \psi_1(i), \psi_2(i), \psi_3(i) ), $$

where $\psi_1(i), \psi_2(i), \psi_3(i)$ denote the $i$th entries of $\psi_1, \psi_2, \psi_3$, respectively. The measure of affinity between $I_i$ and $I_j$ is then computed using the embedding map $\Psi$:

$$ A_{ij} := \frac{ | \langle \Psi(I_i), \Psi(I_j) \rangle | }{ \| \Psi(I_i) \| \, \| \Psi(I_j) \| }, \quad 1 \leq i \neq j \leq N. \tag{7} $$

Finally, the neighbors of a projection image $I_i$ are determined by thresholding the affinity measures $A_{ij}$:

$$ \text{Neighbors of } I_i := \{ I_j \mid A_{ij} > 1 - \gamma \}, $$

where $0 < \gamma < 1$ is another preset threshold parameter that controls the size of the neighborhoods.

1.2 Main Contributions

The main contributions of this paper are (1) the introduction of the multi-frequency class averaging (MFCA) algorithm to improve the viewing direction classification of cryo-EM single particle images, and (2) a complete characterization of the spectral information of a generalized local parallel transport operator underlying the geometric relation in MFCA.
Specifically, motivated by recent works [3, 32, 24, 25], which incorporate multiple representations of the pairwise comparison information into the synchronization problem, we propose in this paper a multi-frequency class averaging algorithm using the extended empirical transport data $\{ e^{\iota k \theta_{ij}} \}_{(i,j) \in E}$ for $k = 1, 2, \dots, k_{\max}$. It creates more than one copy of the class averaging matrix, one for each "frequency channel" corresponding to one irreducible representation of the group $\mathrm{SO}(2)$. Those matrices can be viewed as the discretization of the generalized local parallel transport operators $T_h^{(k)}$. A formal definition of $T_h^{(k)}$ can be found in (24). The new algorithm uses the top $2k + 1$ eigenvectors of the class averaging matrix at frequency $k$ to embed the images into $(2k + 1)$-dimensional complex space. The new frequency-$k$ affinity measure is defined as the absolute normalized cross correlation of the embedded vectors. We also propose to aggregate the affinity measures across the frequency channels to enforce the consistency of the nearest neighbor identification.

Since the performance of the algorithm depends on the properties and stability of the top eigenvectors, we perform the spectral analysis of the corresponding integral operator $T_h^{(k)}$. We show in Theorem 2 and Theorem 3 that the top eigenspace of $T_h^{(k)}$, denoted as $W^{(k)}$, is $(2k + 1)$-dimensional. In addition, we show that the top eigenvalue of $T_h^{(k)}$ decreases as $k$ increases and the top spectral gap increases as $k$ increases up to a threshold determined by the local neighborhood size. The increasing spectral gap implies the advantage of using higher frequency information for class averaging, as the numerical stability of the eigen-decomposition step in MFCA depends on the magnitude of the spectral gap.
In addition to the characterization of the dimensionality of the top eigenspace $W^{(k)}$ of $T_h^{(k)}$, we also demonstrate in Theorem 4 and Theorem 5 the existence of a canonical identification of $W^{(k)}$ with a complex $(2k + 1)$-dimensional linear space spanned by $(2k + 1)$ linearly independent entry functions in the Wigner $D$-matrix associated with the unique $(2k + 1)$-dimensional unitary irreducible representation of $\mathrm{SO}(3)$. A direct corollary of this canonical identification is the equality between the frequency-$k$ affinity measure and the viewing angle, thus generalizing the result in [38] for the affinity measure (7). These facts establish the admissibility (consistency) of the proposed MFCA algorithm. We emphasize that these theoretical results are not straightforward extensions of the techniques in [38] to the generalized localized parallel transport operator $T_h^{(k)}$. The generating-function-based approach in [38] is not easy to generalize to our setting without heavy notation and lengthy mathematical inductions. Instead, we observed that the constructions in [38] can be greatly simplified using an alternative construction by means of the Wigner $D$-matrices, which have been widely used in studies in mathematical physics concerning the irreducible representations of $\mathrm{SO}(3)$.

In the clean, noiseless scenario, the multi-frequency class averaging matrices carry identical information for exactly recovering the affinity among viewing directions of the projection images; the real advantage, as argued and demonstrated in the theoretical analysis of [32] and the experimental results of [24, 25], lies in the low-SNR regime, where utilizing higher-moment information becomes particularly beneficial even without introducing additional independent measurements for those higher moments.
Empirically, we observe that the algorithm can tolerate a higher level of noise than what is allowed according to the traditional Davis-Kahan theorem [17]. In addition, the performance of the single frequency-$k$ class averaging algorithm improves as $k$ increases up to a critical frequency index determined by the spectral gap, the magnitudes of the top eigenvalues, and the noise level. Besides the improved numerical stability due to the increased spectral gap, using higher frequency information for class averaging can also be interpreted as leveraging the additional redundancy encoded in the consistency of the "higher order moments," which is in line with our continued exploration for a "geometric harmonic retrieval" initiated in [32, 24, 25]. Moreover, in contrast with the computationally demanding SDP approach in [3] or the noise-type-dependent approximate message passing approach in [54], the proposed MFCA algorithm is easily parallelizable, as the eigen-decompositions for the class averaging matrices in each frequency channel are completely independent.

1.3 Organization of the paper

The rest of this paper is organized as follows. Section 2 introduces the MFCA algorithms; Section 3 introduces the basic mathematical set-up and notations for the spectral analysis in the remainder of this paper; Section 4 presents the main theoretical contributions; Section 5 interprets the admissibility of MFCA using the theoretical results; Section 6 discusses the noise robustness of the algorithm under two probabilistic models; Section 7 illustrates the efficacy of MFCA through some numerical experiments; Section 8 concludes and discusses potential future directions. The basics on group and representation theory and technical proofs are deferred to the Appendix.
2 Multi-Frequency Class Averaging Algorithms

Throughout our discussion involving multiple frequency channels, we will fix an integer $k_{\max} \geq 1$ for the total number of frequency channels considered. For each frequency $k = 1, \dots, k_{\max}$, we construct a separate class averaging matrix by

$$ H^{(k)}_{ij} = \begin{cases} e^{\iota k \theta_{ij}} & \text{if } (i, j) \in E, \\ 0 & \text{otherwise}. \end{cases} \tag{8} $$

2.1 Single Frequency-$k$ Affinity Measure

The Hermitian matrix $H^{(k)}$ stores the empirical transport data under the $k$th irreducible representation of $\mathrm{SO}(2)$. We then normalize each $H^{(k)}$ using the same degree matrix $D$ as in (6); note that all matrices $H^{(k)}$ share the same sparsity pattern determined by $E$. After performing the eigen-decomposition of $\widetilde{H}^{(k)} = D^{-1/2} H^{(k)} D^{-1/2}$, we keep the top $(2k + 1)$ eigenvectors $\psi^{(k)}_1, \dots, \psi^{(k)}_{2k+1} \in \mathbb{C}^N$ and define the embedding

$$ \Psi^{(k)} : \{ I_i \}_{i=1}^{N} \longrightarrow \mathbb{C}^{2k+1}, \qquad I_i \longmapsto \left( \psi^{(k)}_1(i), \dots, \psi^{(k)}_{2k+1}(i) \right). \tag{9} $$

We compute the affinity measure between $I_i$ and $I_j$ at frequency $k$ as

$$ A^{(k)}_{ij} := \frac{ \left| \left\langle \Psi^{(k)}(I_i), \Psi^{(k)}(I_j) \right\rangle \right| }{ \left\| \Psi^{(k)}(I_i) \right\| \, \left\| \Psi^{(k)}(I_j) \right\| }, \quad 1 \leq i \neq j \leq N. \tag{10} $$

Clearly, $\Psi^{(1)} = \Psi$ and $A^{(1)}_{ij} = A_{ij}$ in the traditional class averaging. We can perform a $\kappa$-nearest neighbor search using the affinity measure $A^{(k)}_{ij}$ computed from an individual frequency $k$.

The rationale behind the specific forms of (9) and (10) is the core of this paper. In a nutshell, we use a $(2k + 1)$-dimensional embedding because by Theorem 2 and Theorem 3 we expect a spectral gap occurring between the $(2k + 1)$th and $(2k + 2)$th eigenvalues of $H^{(k)}$ (counting multiplicities). The affinity measure (10) is related to the closeness of two viewing directions by the relation (37) in Theorem 5.
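The steps (8), (6), (9) and (10) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the inputs are a dense matrix `theta` of alignment angles with `theta[j, i] == -theta[i, j]` and a symmetric boolean adjacency matrix `adj` encoding $E$ (so that $H^{(k)}$ is Hermitian); a dense eigen-solver is used for clarity; and the small floor on degrees and norms is a safeguard added here. The function name `frequency_k_affinity` is hypothetical.

```python
import numpy as np


def frequency_k_affinity(theta, adj, k):
    """Frequency-k affinity matrix A^(k) of Eq. (10).

    theta : (N, N) array of alignment angles, assumed antisymmetric.
    adj   : (N, N) boolean adjacency matrix of the graph, assumed symmetric.
    k     : frequency index, with 2k + 1 <= N.
    """
    # Eq. (8): H^(k)_ij = exp(i k theta_ij) on edges, 0 elsewhere.
    H = np.where(adj, np.exp(1j * k * theta), 0.0)
    # Eq. (6): degrees; they do not depend on k since |H_ij| is 0 or 1.
    deg = np.maximum(np.abs(H).sum(axis=1), 1e-12)  # floor added in this sketch
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    H_tilde = d_inv_sqrt[:, None] * H * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the top 2k + 1.
    _, vecs = np.linalg.eigh(H_tilde)
    psi = vecs[:, -(2 * k + 1):]  # row i is the embedding of image i, Eq. (9)
    norms = np.maximum(np.linalg.norm(psi, axis=1), 1e-12)
    gram = psi @ psi.conj().T  # gram[i, j] = <Psi(I_i), Psi(I_j)>
    return np.abs(gram) / np.outer(norms, norms)
```

For large $N$ one would normally replace the dense `eigh` call with a sparse iterative solver that returns only the leading eigenpairs; that substitution is left out to keep the sketch short.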
2.2 Combining Information from Multiple Frequencies

Since each affinity measure in (10) reflects the closeness of two viewing directions, combining those scores together can enforce the consistency of the classification results at each frequency and improve the overall accuracy. We propose one way to aggregate the single frequency affinity measures as

$$ A^{\mathrm{All}}_{ij} := \prod_{k=1}^{k_{\max}} A^{(k)}_{ij}. \tag{11} $$

We choose the aggregation (11) because the affinity measure (10) is related to the viewing angle by the relation (37) in Theorem 5. In particular, comparing (37) and [38, Theorem 6] tells us that $A^{(k)}_{ij} = (A_{ij})^k$ for all $1 \leq i \neq j \leq N$. We defer more detailed discussions of the geometric relation of this algorithm to Section 4.3 and Section 5.

Remark 1. Note that (10) and (11) are not the only ways to distill and aggregate the affinity information from multiple irreducible representations. Other natural alternatives include

$$ S^{(k)}_{ij} := 2 \left( \frac{ \left| \left\langle \Psi^{(k)}(I_i), \Psi^{(k)}(I_j) \right\rangle \right| }{ \left\| \Psi^{(k)}(I_i) \right\| \, \left\| \Psi^{(k)}(I_j) \right\| } \right)^{1/k} - 1, \quad 1 \leq i \neq j \leq N, \tag{12} $$

which in the noiseless scenario satisfies $S^{(k)}_{ij} = S^{(1)}_{ij} = 2 A_{ij} - 1$ for all $k \geq 1$. Therefore, it is natural to combine all $S^{(k)}_{ij}$ by arithmetic averaging:

$$ S^{\mathrm{All}}_{ij} := \frac{1}{k_{\max}} \sum_{k=1}^{k_{\max}} S^{(k)}_{ij}. \tag{13} $$

However, our empirical experiments suggest that it is numerically much more stable to avoid taking $k$th roots for large values of $k$. We provide a brief interpretation of this phenomenon in Section 5. There can be other approaches to combine the affinity scores from multiple frequencies, such as a weighted average among different frequencies or majority voting. We will explore other ways to integrate multi-frequency information in the future.

3 Preliminaries for the Spectral Analysis of MFCA

In this section, we introduce our set-up and notations for the spectral analysis of MFCA.
For additional concepts in the relevant group and representation theory and the Wigner $D$-matrix, please refer to Appendix A.

3.1 Set-up

Throughout this paper, we view $\mathrm{SO}(3)$ as an $\mathrm{SO}(2)$-bundle over the 2-dimensional sphere $S^2$ in $\mathbb{R}^3$. For any $d \in \mathbb{N}_+$, we view $\mathbb{C}^d$ as a Hilbert space equipped with the canonical Hermitian inner product induced from the standard Euclidean inner product on $\mathbb{R}^d$. We will distinguish two different types of group actions on $\mathrm{SO}(3)$. If $g \in \mathrm{SO}(3)$, $g$ acts on elements of $\mathrm{SO}(3)$ by left multiplication, denoted as

$$ g \triangleright x := g x, \quad \forall g, x \in \mathrm{SO}(3). $$

If $w \in \mathrm{SO}(2) \subset \mathrm{SO}(3)$, unless otherwise specified, $w$ is assumed to be uniquely identified with an $\mathrm{SO}(3)$ element by

$$ w = w(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \text{for } \theta \in [0, 2\pi), \tag{14} $$

and acts on elements of $\mathrm{SO}(3)$ by right multiplication, i.e.,

$$ x \triangleleft w := x w, \quad \forall x \in \mathrm{SO}(3), \ w \in \mathrm{SO}(2). $$

When the context is clear and no confusion arises, we will also write $x \triangleleft g =: x g$, for $g, x \in \mathrm{SO}(3)$, for the right action of $\mathrm{SO}(3)$ on itself.

Following the convention of [38], we denote the transport data between $x, y \in \mathrm{SO}(3)$ by $T(x, y)$, the unique $\mathrm{SO}(2)$ element satisfying

$$ x \triangleleft T(x, y) = t_{\pi(x), \pi(y)} \, y, \tag{15} $$

where $t_{\pi(x), \pi(y)}$ is the parallel transport along the unique geodesic on $S^2$ connecting $\pi(y)$ to $\pi(x)$. The optimal alignment angle $\theta_{ij}$ computed from (3) can be used to construct an approximation of the transport data between $x_i$ and $x_j$ (the observation frames of $I_i$ and $I_j$, respectively), in the presence of measurement and discretization error, by

$$ \widetilde{T}(x_i, x_j) := e^{\iota \theta_{ij}}. \tag{16} $$

We refer to the $\widetilde{T}(x_i, x_j)$'s as the empirical transport data.
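The defining relation (15) can be made concrete with a short numerical sketch. It assumes, for illustration only, that a frame $x \in \mathrm{SO}(3)$ is stored as a $3 \times 3$ matrix whose columns are $(e_1, e_2, e_3)$, so that $\pi(x)$ is the third column and the right action is matrix multiplication, and that the parallel transport along the geodesic from $\pi(y)$ to $\pi(x)$ is realized as the rotation about the axis $\pi(y) \times \pi(x)$ (Rodrigues' formula), which requires the two viewing directions not to be antipodal. The helper names are hypothetical.

```python
import numpy as np


def parallel_transport(a, b):
    """Rotation in SO(3) carrying unit vector a to unit vector b along
    the geodesic joining them (Rodrigues' formula; a != -b is assumed)."""
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx / (1.0 + c)


def transport_data(x, y):
    """Return (T, theta) with x @ T == t @ y as in Eq. (15), where T has
    the block form w(theta) of Eq. (14)."""
    t = parallel_transport(y[:, 2], x[:, 2])  # carries pi(y) to pi(x)
    T = x.T @ t @ y
    theta = np.arctan2(T[1, 0], T[0, 0]) % (2.0 * np.pi)
    return T, theta


def random_rotation(rng):
    """Random element of SO(3) via QR factorization (illustration only)."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))  # fix the column signs
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]       # force determinant +1
    return q
```

Because $t\,y$ and $x$ share the same third column $\pi(x)$, the product `x.T @ t @ y` fixes $e_3$ and is therefore a rotation of the form (14); swapping the roles of `x` and `y` returns the transposed matrix.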
As shown in [38], $T(x, y)$ satisfies the following properties:

- (Symmetry) $T(x, y) = T(y, x)^{-1}$, for all $x, y \in \mathrm{SO}(3)$;
- (Invariance) $T(g \triangleright x, g \triangleright y) = T(x, y)$, for all $x, y \in \mathrm{SO}(3)$ and $g \in \mathrm{SO}(3)$;
- (Equivariance) $T(x \triangleleft w_1, y \triangleleft w_2) = w_1^{-1} T(x, y) w_2$, for all $x, y \in \mathrm{SO}(3)$ and $w_1, w_2 \in \mathrm{SO}(2)$.

If $\rho : \mathrm{SO}(2) \to \mathbb{C}$ is any unitary representation of $\mathrm{SO}(2)$ on $\mathbb{C}$, then the three properties above can also be cast into

- (Symmetry) $\rho(T(x, y)) = \overline{\rho(T(y, x))}$, for all $x, y \in \mathrm{SO}(3)$;
- (Invariance) $\rho(T(g \triangleright x, g \triangleright y)) = \rho(T(x, y))$, for all $x, y \in \mathrm{SO}(3)$ and $g \in \mathrm{SO}(3)$;
- (Equivariance) $\rho(T(x \triangleleft w_1, y \triangleleft w_2)) = \overline{\rho(w_1)} \, \rho(T(x, y)) \, \rho(w_2)$, for all $x, y \in \mathrm{SO}(3)$ and $w_1, w_2 \in \mathrm{SO}(2)$.

We shall only assume the symmetry to be strictly satisfied by the empirical transport data; the other properties will be assumed to hold only approximately. To simplify notations, we denote for any $k \in \mathbb{Z}$

$$ T^{(k)}(x, y) := \rho_k(T(x, y)), \quad \forall x, y \in \mathrm{SO}(3), \tag{17} $$

where $\rho_k : \mathrm{SO}(2) \to \mathbb{C}$ is the unique unitary irreducible representation of $\mathrm{SO}(2)$ with character $k \in \mathbb{Z}$. The corresponding notation for the empirical transport data is $\widetilde{T}^{(k)}(x_i, x_j)$. In any of these irreducible representations, the empirical transport data $\{ \widetilde{T}^{(k)}(x_i, x_j) \mid 1 \leq i, j \leq N \}$ approximate the ground truth transport data $\{ T^{(k)}(x_i, x_j) \mid 1 \leq i, j \leq N \}$ only when the viewing directions $\pi(x_i)$ and $\pi(x_j)$ are close to each other, in the sense that the vectors $\pi(x_i)$ and $\pi(x_j)$ belong to some small spherical cap of opening angle $\alpha \in [0, 2\pi)$.

3.2 Functions on SO(3) and Isotypic Decomposition

We will use the shorthand notation $\mathcal{H} = C(\mathrm{SO}(3))$ for the Hilbert space of smooth complex-valued functions on $\mathrm{SO}(3)$, with the standard Hermitian inner product

$$ \langle f_1, f_2 \rangle_{\mathcal{H}} = \int_{\mathrm{SO}(3)} f_1(x) \overline{f_2(x)} \, dx, \quad f_1, f_2 \in \mathcal{H}. \tag{18} $$

Here $dx$ denotes the normalized Haar measure on $\mathrm{SO}(3)$.
The left and right actions of the group elements induce corresponding actions on the Hilbert space $\mathcal{H}$ of complex-valued functions over $\mathrm{SO}(3)$:

$$ \begin{aligned} g \cdot s(x) &:= s\left( g^{-1} \triangleright x \right), && \forall s \in \mathcal{H}, \ x \in \mathrm{SO}(3), \ g \in \mathrm{SO}(3), \\ w \cdot s(x) &:= s(x \triangleleft w), && \forall s \in \mathcal{H}, \ x \in \mathrm{SO}(3), \ w \in \mathrm{SO}(2). \end{aligned} \tag{19} $$

The Hilbert space $\mathcal{H}$ can also be considered as a unitary representation of $\mathrm{SO}(2)$. Let $\rho_k : \mathrm{SO}(2) \to \mathbb{C}$ be the unique irreducible unitary representation of $\mathrm{SO}(2)$ of character $k \in \mathbb{Z}$. $\mathcal{H}$ admits an isotypic decomposition

$$ \mathcal{H} = \bigoplus_{k \in \mathbb{Z}} \mathcal{H}_k, \tag{20} $$

where

$$ \mathcal{H}_k := \{ s \in \mathcal{H} \mid s(x \triangleleft w) = \rho_k(w) s(x) \text{ for all } x \in \mathrm{SO}(3) \text{ and } w \in \mathrm{SO}(2) \}. \tag{21} $$

Note that $\mathrm{SO}(3)$ acts on $\mathcal{H}_k$ unitarily from the left by

$$ g \cdot s(x) := s\left( g^{-1} \triangleright x \right), \quad \forall g \in \mathrm{SO}(3), \ s \in \mathcal{H}_k, \ x \in \mathrm{SO}(3). $$

Each $\mathcal{H}_k$ thus admits an isotypic decomposition with respect to $\mathrm{SO}(3)$, written as

$$ \mathcal{H}_k = \bigoplus_{n \in \mathbb{N}_{\geq 0}} \mathcal{H}_{n,k}, \tag{22} $$

where $\mathcal{H}_{n,k}$ denotes the isotypic component corresponding to the unique irreducible representation of $\mathrm{SO}(3)$ of dimension $(2n + 1)$, for $n = 0, 1, \dots$. An important observation is that each $\mathcal{H}_{n,k}$ in (22) is of multiplicity 0 or 1 in $\mathcal{H}_k$:

Theorem 1 ([38, Theorem 7]) If $n < |k|$ then $\mathcal{H}_{n,k} = 0$. Otherwise, $\mathcal{H}_{n,k}$ is isomorphic to the unique irreducible representation of $\mathrm{SO}(3)$ of dimension $(2n + 1)$.

4 Main Theoretical Results

4.1 Generalized Parallel Transport Operators

The motivation for considering these isotypic decompositions is to study the top eigenspace of the generalized parallel transport operator $T^{(k)} : \mathcal{H} \to \mathcal{H}$, defined as

$$ \left( T^{(k)} s \right)(x) := \int_{\mathrm{SO}(3)} \rho_k(T(x, y)) s(y) \, dy = \int_{\mathrm{SO}(3)} T^{(k)}(x, y) s(y) \, dy, \quad \forall s \in \mathcal{H}, \ x \in \mathrm{SO}(3), \tag{23} $$

for all $k \in \mathbb{Z}$. When $k = 1$, $T^{(k)}$ reduces to the parallel transport operator $T : \mathcal{H} \to \mathcal{H}$ defined in [38, §2.3].
Similar to [38, §2.3.1], we can localize the generalized parallel transport operator $T^{(k)}$ for any $k \in \mathbb{Z}$ as

$$ \left( T_h^{(k)} s \right)(x) := \int_{B(x, \alpha)} \rho_k(T(x, y)) s(y) \, dy = \int_{B(x, \alpha)} T^{(k)}(x, y) s(y) \, dy, \quad \forall s \in \mathcal{H}, \ x \in \mathrm{SO}(3), \tag{24} $$

where $B(x, \alpha) = \{ y \in \mathrm{SO}(3) \mid \langle \pi(x), \pi(y) \rangle > \cos\alpha =: 1 - h \}$. Using the symmetry, invariance, and equivariance of the transport data (Section 3.1), we establish the following basic properties of $T^{(k)}$ for any $k \in \mathbb{Z}$:

(1) $T^{(k)}$ is self-adjoint. This can be seen from the symmetry of the transport data: for all $s, w \in \mathcal{H}$, we have

$$ \left\langle T^{(k)} s, w \right\rangle_{\mathcal{H}} = \int_{\mathrm{SO}(3)} \int_{\mathrm{SO}(3)} \rho_k(T(x, y)) s(y) \overline{w(x)} \, dy \, dx = \int_{\mathrm{SO}(3)} \int_{\mathrm{SO}(3)} s(y) \overline{\rho_k(T(y, x)) w(x)} \, dy \, dx = \left\langle s, T^{(k)} w \right\rangle_{\mathcal{H}}. $$

(2) $T^{(k)}$ commutes with the action of $\mathrm{SO}(3)$ on $\mathcal{H}$: by the invariance of the transport data, with the change of variables $z := g^{-1} \triangleright y$, we have for all $g \in \mathrm{SO}(3)$, $s \in \mathcal{H}$, and $x \in \mathrm{SO}(3)$,

$$ \begin{aligned} \left( T^{(k)} (g \cdot s) \right)(x) &= \int_{\mathrm{SO}(3)} \rho_k(T(x, y)) \, s\left( g^{-1} \triangleright y \right) dy = \int_{\mathrm{SO}(3)} \rho_k\left( T\left( g \triangleright \left( g^{-1} \triangleright x \right), g \triangleright z \right) \right) s(z) \, dz \\ &= \int_{\mathrm{SO}(3)} \rho_k\left( T\left( g^{-1} \triangleright x, z \right) \right) s(z) \, dz = \left( T^{(k)} s \right)\left( g^{-1} \triangleright x \right) = \left( g \cdot \left( T^{(k)} s \right) \right)(x). \end{aligned} $$

(3) $\bigoplus_{\ell \neq -k} \mathcal{H}_\ell \subset \ker T^{(k)}$, and $T^{(k)}$ can be viewed as an operator from $\mathcal{H}_{-k}$ to itself. This can be verified using the equivariance of the transport data. First, note that for any $s \in \mathcal{H}$ we have $T^{(k)} s \in \mathcal{H}_{-k}$, since for any $w \in \mathrm{SO}(2)$ we have

$$ \begin{aligned} w \cdot \left( T^{(k)} s \right)(x) &= \left( T^{(k)} s \right)(x \triangleleft w) = \int_{\mathrm{SO}(3)} \rho_k(T(x \triangleleft w, y)) s(y) \, dy = \int_{\mathrm{SO}(3)} \overline{\rho_k(w)} \, \rho_k(T(x, y)) s(y) \, dy \\ &= \rho_{-k}(w) \int_{\mathrm{SO}(3)} \rho_k(T(x, y)) s(y) \, dy = \rho_{-k}(w) \left( T^{(k)} s \right)(x). \end{aligned} $$

This proves that $T^{(k)}$ maps $\mathcal{H}$ into $\mathcal{H}_{-k}$, by the definition of the isotypic decomposition (21) with respect to the $\mathrm{SO}(2)$ action. The conclusion that $\bigoplus_{\ell \neq -k} \mathcal{H}_\ell \subset \ker T^{(k)}$ then follows from Schur's Lemma [11, Theorem 2.1].
The arguments above can be applied to $T_h^{(k)}$, mutatis mutandis, and thus the same properties hold for the local generalized parallel transport operator. Invoking Schur's Lemma for a second time, we know that $T_h^{(k)}$ acts on $\mathcal{H}_{n,-k}$ as a scalar, i.e.,

$$ T_h^{(k)} \Big|_{\mathcal{H}_{n,-k}} = \lambda_n^{(k)}(h) \, \mathrm{Id} \Big|_{\mathcal{H}_{n,-k}}. \tag{25} $$

The multiplicity-one theorem (Theorem 1) tells us that $\lambda_n^{(k)} = 0$ for all $0 \leq n < |k|$. In order to calculate the remaining $\lambda_n^{(k)}$'s ($n \geq |k|$) explicitly, it suffices to fix a point $x_0 \in \mathrm{SO}(3)$, pick an arbitrary function $u \in \mathcal{H}_{n,-k}$ with $u(x_0) \neq 0$, and use the relation $\lambda_n^{(k)} = \left( T_h^{(k)} u \right)(x_0) / u(x_0)$. We will defer such computations for $0 < h \ll 1$ to Section 4.2. The next subsection summarizes these properties, in preparation for the discussion on the main algebraic structure of the generalized intrinsic model in Section 4.3.

In [63, 38], it was argued that the Hermitian matrix $H$ in (4) should be understood as the discretization (under uniform random sampling on $\mathrm{SO}(3)$) of the integral operator $T_h^{(1)}$, abbreviated as $T_h$ below. Consequently, many properties of the local transport data matrix $H$ can be studied through its "continuous limit" $T_h$, especially the eigenvalues and eigenvectors, which converge to the eigenvalues and eigenfunctions of $T_h$ in an appropriate sense [44]; this perspective is common in the manifold learning literature [5, 6, 15, 62, 30]. In the class averaging setting, the integral operator $T_h$ enjoys many useful invariance and equivariance properties, which make it relatively straightforward to study its spectral data using representation theoretic tools. Hadani and Singer noticed that $T_h$ acts on the subspace $\mathcal{H}_{-1}$ of $\mathcal{H}$.
The space $\mathcal{H}_{-1}$ is also canonically identified with the linear space of sections of a complex line bundle over $\mathrm{SO}(3)$ induced by the unitary irreducible representation of $\mathrm{U}(1)$ with character $k = 1$ [34, 12, 10, 20, 50, 49]. Furthermore, $T_h$ commutes with the induced left action of $\mathrm{SO}(3)$ on $\mathcal{H}_{-1}$, which by Schur's Lemma indicates that the eigenspaces of $T_h$ coincide with the isotypic components of $\mathcal{H}_{-1}$ under the left $\mathrm{SO}(3)$ action. In particular, this mechanism can be used to show that the top eigenspace of $T_h$ is the unique isotypic component of $\mathcal{H}_{-1}$ corresponding to the unique three-dimensional unitary irreducible representation of $\mathrm{SO}(3)$ for all sufficiently small $h > 0$, and that the affinity measure $2 A_{ij} - 1$ is exactly identical with the cosine value of the viewing angle between $I_i$ and $I_j$ in the noise-free setting.

4.2 Spectral Properties of the Local Parallel Transport Operator

In this subsection we summarize the spectral properties of $T_h^{(k)}$ for $h \ll 1$ (which is the relevant regime for class averaging). Proofs for the main theorems discussed in this subsection are deferred to Appendix B. These proofs essentially follow the proof ideas of [38, Theorem 3 and Theorem 4], with technical modifications due to the complication of Jacobi polynomials: unlike the case for the Legendre polynomials involved in the analysis of single-frequency class averaging, no sharp Bernstein-type inequality is known for Jacobi polynomials arising from the Wigner $d$-matrices. We refer interested readers to discussions and conjectures in [14, 36, 45] for Bernstein-type inequalities for Jacobi polynomials.

Theorem 2 (Eigenvalues of $T_h^{(k)}$ for small $h \ll 1$) The operator $T_h^{(k)}$ has a discrete spectrum $\lambda_n^{(k)}(h)$ for all $n \in \mathbb{N}$, and $\lambda_n^{(k)} = 0$ for all $0 \leq n < |k|$. For $n \geq |k|$ and $h \in (0, 2]$, the dimension of the eigenspace of $T_h^{(k)}$ corresponding to $\lambda_n^{(k)}$ is $2n + 1$.
In addition, $\lambda^{(k)}_k$ and $\lambda^{(k)}_{k+1}$ have the following expressions:
\[ \lambda^{(k)}_k(h) = \frac{1 - (1 - h/2)^{k+1}}{k+1}, \tag{26} \]
\[ \lambda^{(k)}_{k+1}(h) = \frac{2(k+1)\left(1 - (1 - h/2)^{k+2}\right)}{k+2} - \frac{(2k+1)\left(1 - (1 - h/2)^{k+1}\right)}{k+1}. \tag{27} \]
In the regime $h \ll 1$, the eigenvalue $\lambda^{(k)}_n(h)$ ($n \ge |k|$) admits the asymptotic expansion
\[ \lambda^{(k)}_n(h) = \frac{1}{2} h - \frac{1}{8}\left(n^2 + n - k^2\right) h^2 + O(h^3). \tag{28} \]

Remark 2 When $k = 1$, Theorem 2 reduces to [38, Theorem 3].

Fig. 1: The top three eigenvalues $\lambda^{(k)}_n(h)$ of the operator $T^{(k)}_h$, for $k = 1$ (left) and $k = 2$ (right), over the interval $h \in (0, 2]$.

The proof of Theorem 2 in Appendix B.1 actually proves the stronger conclusion that each eigenvalue $\lambda^{(k)}_n(h)$ is a polynomial in $h > 0$ of degree $n + 1$ whenever $n \ge |k|$. The key step in the proof is identifying that the $(-k, -k)$ entry of the Wigner $D$-matrix, $D^n_{-k,-k}(x)$, lies in $\mathcal{H}_{n,-k}$ for $n \ge |k|$ and is an appropriate function $u$ for calculating the eigenvalues. The largest three eigenvalues for the cases $k = 1$ and $k = 2$ can be written out explicitly as
\[ \lambda^{(1)}_1(h) = \tfrac{1}{2} h - \tfrac{1}{8} h^2, \quad \lambda^{(1)}_2(h) = \tfrac{1}{2} h - \tfrac{5}{8} h^2 + \tfrac{1}{6} h^3, \quad \lambda^{(1)}_3(h) = \tfrac{1}{2} h - \tfrac{11}{8} h^2 + \tfrac{25}{24} h^3 - \tfrac{15}{64} h^4, \tag{29} \]
and
\[ \lambda^{(2)}_2(h) = \tfrac{1}{2} h - \tfrac{1}{4} h^2 + \tfrac{1}{24} h^3, \quad \lambda^{(2)}_3(h) = \tfrac{1}{2} h - h^2 + \tfrac{13}{24} h^3 - \tfrac{3}{32} h^4, \quad \lambda^{(2)}_4(h) = \tfrac{1}{2} h - 2 h^2 + \tfrac{57}{24} h^3 - \tfrac{70}{64} h^4 + \tfrac{7}{40} h^5. \tag{30} \]
Plots of $\lambda^{(k)}_{k+i}$ for $i = 0, 1, 2$ are provided in Figure 1.

Corollary 1 As $k$ increases, the eigenvalue $\lambda^{(k)}_k$ decreases and $\lim_{k \to \infty} \lambda^{(k)}_k = 0$.
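As a quick numerical sanity check (a sketch only, not part of the proofs), the closed forms (26) and (27) can be compared against the polynomials quoted in (29) and (30), and the monotone decrease stated in Corollary 1 can be observed on a grid. The function names `lam_top` and `lam_second` and the grid of $h$ values are illustrative choices made here.

```python
import numpy as np

def lam_top(k, h):
    """Closed form (26) for lambda_k^{(k)}(h)."""
    return (1.0 - (1.0 - h / 2.0) ** (k + 1)) / (k + 1)

def lam_second(k, h):
    """Closed form (27) for lambda_{k+1}^{(k)}(h)."""
    a = 1.0 - h / 2.0
    return (2.0 * (k + 1) * (1.0 - a ** (k + 2)) / (k + 2)
            - (2.0 * k + 1.0) * (1.0 - a ** (k + 1)) / (k + 1))

h = np.linspace(0.01, 2.0, 200)

# Polynomial expressions quoted in (29) and (30).
polys = {
    (1, "top"): h / 2 - h**2 / 8,
    (1, "second"): h / 2 - 5 * h**2 / 8 + h**3 / 6,
    (2, "top"): h / 2 - h**2 / 4 + h**3 / 24,
    (2, "second"): h / 2 - h**2 + 13 * h**3 / 24 - 3 * h**4 / 32,
}

# Largest pointwise gap between the closed forms and the quoted polynomials.
max_dev = max(
    np.max(np.abs((lam_top if which == "top" else lam_second)(k, h) - p))
    for (k, which), p in polys.items()
)
print("largest deviation from (29)-(30):", max_dev)

# Corollary 1: for each fixed h, lambda_k^{(k)}(h) decreases as k grows.
tops = np.array([lam_top(k, h) for k in range(1, 40)])
print("decreasing in k on the grid:", bool(np.all(np.diff(tops, axis=0) < 0)))
```

The check only covers the four eigenvalues for which both a closed form and a polynomial are stated above; the remaining polynomials in (29) and (30) are not tested here.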
Proof (Proof of Corollary 1) Based on Theorem 2, the difference between $\lambda^{(k+1)}_{k+1}$ and $\lambda^{(k)}_k$ for $k \ge 1$ is
\[
\begin{aligned}
\lambda^{(k+1)}_{k+1} - \lambda^{(k)}_k
&= \frac{1 - (1 - h/2)^{k+2}}{k+2} - \frac{1 - (1 - h/2)^{k+1}}{k+1}
= \frac{-1 + (1 - h/2)^{k+1}\left(1 + \tfrac{h}{2}(k+1)\right)}{(k+1)(k+2)} \\
&\overset{(a)}{<} \frac{-1 + (1 - h/2)^{k+1}(1 + h/2)^{k+1}}{(k+1)(k+2)}
= \frac{-1 + (1 - h^2/4)^{k+1}}{(k+1)(k+2)} < 0,
\end{aligned} \tag{31}
\]
where (a) is based on the fact that $(1 + \tfrac{h}{2})^{k+1} > 1 + (k+1)\tfrac{h}{2}$ for $h \in (0, 2]$, which follows from the binomial expansion. (At $h = 2$ the factor $(1 - h/2)^{k+1}$ vanishes, so step (a) holds with equality there, and the final strict inequality still gives the claim.) In addition, since $0 \le 1 - h/2 < 1$,
\[ \lim_{k \to \infty} \lambda^{(k)}_k(h) = \lim_{k \to \infty} \frac{1 - (1 - h/2)^{k+1}}{k+1} = 0. \]

This is an important observation for determining the maximum frequency cutoff, which will be further discussed in Section 6. It is natural to conjecture that the top eigenspace of $T^{(k)}_h$ is the $(2k+1)$-dimensional space corresponding to the eigenvalue $\lambda^{(k)}_k(h)$ for sufficiently small $h > 0$. Moreover, with
\[ \Delta_k := \operatorname*{arg\,max}_{h \in (0, 2]} \lambda^{(k)}_{k+1}(h) = \frac{1}{k+1}, \tag{32} \]
we have the following characterization of the spectral gap of $T^{(k)}_h$ in the regime $0 < h \ll 1$.

Theorem 3 For every value of $h \in (0, 2]$, the largest eigenvalue of $T^{(k)}_h$ is $\lambda^{(k)}_k(h)$. In addition, for every value of $h \in (0, \Delta_k]$, the spectral gap $G^{(k)}(h)$ between the largest and the second largest eigenvalue of $T^{(k)}_h$ is
\[ G^{(k)}(h) = \lambda^{(k)}_k - \lambda^{(k)}_{k+1} = \frac{2 - (1 - h/2)^{k+1}\left((k+1)h + 2\right)}{k+2}. \tag{33} \]

Again, when $k = 1$, Theorem 3 reduces to [38, Theorem 4]. The main technicality in the proof of Theorem 3, which is deferred to Appendix B.2, is to show that $\lambda^{(k)}_n(h) \le \lambda^{(k)}_{k+1}(h)$ for every $h \in (0, \Delta_k]$ and $n \ge k + 1$, which appears evident from Figure 1. For small $0 < h \ll \Delta_k$, the spectral gap is approximately
\[ G^{(k)}(h) \sim \frac{1 + k}{4} h^2, \tag{34} \]
which gets larger as the "angular frequency" $k \in \mathbb{N}$ increases.
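The three quantitative statements above, namely the maximizer $\Delta_k = 1/(k+1)$ in (32), the closed-form gap (33), and its small-$h$ approximation (34), can be checked numerically. The following is a minimal sketch; the grid resolution, the sample frequencies in `ks`, and the values `h_small` and `h_fixed` are arbitrary choices made for illustration.

```python
import numpy as np

def lam_second(k, h):
    """Closed form (27) for lambda_{k+1}^{(k)}(h)."""
    a = 1.0 - h / 2.0
    return (2.0 * (k + 1) * (1.0 - a ** (k + 2)) / (k + 2)
            - (2.0 * k + 1.0) * (1.0 - a ** (k + 1)) / (k + 1))

def gap(k, h):
    """Closed form (33) for the spectral gap G^{(k)}(h)."""
    return (2.0 - (1.0 - h / 2.0) ** (k + 1) * ((k + 1) * h + 2.0)) / (k + 2)

grid = np.linspace(1e-4, 2.0, 200001)
ks = [1, 2, 3, 5, 8]

# Grid maximizer of lambda_{k+1}^{(k)} compared with Delta_k = 1/(k+1) from (32).
argmax_err = [abs(grid[np.argmax(lam_second(k, grid))] - 1.0 / (k + 1)) for k in ks]

# Ratio of the exact gap (33) to the approximation (1 + k) h^2 / 4 in (34).
h_small = 1e-3
ratios = [gap(k, h_small) / ((k + 1) * h_small**2 / 4.0) for k in ks]

# Gap at one fixed small h as the frequency k grows.
h_fixed = 0.05
gaps = [gap(k, h_fixed) for k in range(1, 11)]

print("argmax errors:", argmax_err)
print("gap ratios at h = 1e-3:", ratios)
print("gaps at h = 0.05:", gaps)
```

At `h_fixed = 0.05` all frequencies $k \le 10$ satisfy $h \le \Delta_k$, so (33) applies to every entry of `gaps`.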
More generally, we have the following corollary.

Corollary 2 The spectral gap $G^{(k)}(h)$ increases as $k$ increases from $1$ to $k_{\max} = \lfloor 1/h \rfloor - 1$.

Proof (Proof of Corollary 2) We show that for any $k \ge 2$, the difference $G^{(k)}(h) - G^{(k-1)}(h)$ is always positive for any $h \in (0, \Delta_k]$. To begin with, we explicitly write out the difference as
\[
\begin{aligned}
G^{(k)}(h) - G^{(k-1)}(h)
&= \frac{2 - (1 - h/2)^{k+1}\left((k+1)h + 2\right)}{k+2} - \frac{2 - (1 - h/2)^{k}\left(kh + 2\right)}{k+1} \\
&= \frac{(1 - h/2)^{k}\left((k+1)^2 h^2 + 2kh + 4\right) - 4}{2(k+1)(k+2)} \\
&= \frac{(1 - h/2)^{k}}{2(k+1)(k+2)} \underbrace{\left((k+1)^2 h^2 + 2kh + 4 - 4(1 - h/2)^{-k}\right)}_{=:\, \xi(h)}
= \frac{(1 - h/2)^{k}}{2(k+1)(k+2)}\, \xi(h),
\end{aligned}
\]
where $\xi(h)$ is defined as a function of $h$. Since the factor in front of $\xi(h)$ is always positive for $h \in (0, \Delta_k]$, it suffices to show $\xi(h) > 0$ for any $k \ge 2$ and $h \in (0, \Delta_k]$. To this end, since clearly $\xi(0) = 0$, we can instead show that the derivative of $\xi(h)$ is positive for any $h \in (0, \Delta_k]$. That is,
\[ \frac{d\xi(h)}{dh} = 2(k+1)^2 h + 2k - \frac{2k}{(1 - h/2)^{k+1}}. \]
Again, when $h = 0$ we observe that $\frac{d\xi(h)}{dh}\big|_{h=0} = 0$. So in order to show $\frac{d\xi(h)}{dh} > 0$ for all $h \in (0, \Delta_k]$, we can instead check whether the second derivative of $\xi(h)$ is positive for any $h \in (0, \Delta_k]$. Indeed, we have
\[
\begin{aligned}
\frac{d^2\xi(h)}{dh^2}
&= 2(k+1)^2 - \frac{k(k+1)}{(1 - h/2)^{k+2}}
= \frac{k+1}{(1 - h/2)^{k+2}}\left(2(k+1)(1 - h/2)^{k+2} - k\right) \\
&\overset{(a)}{>} \frac{k+1}{(1 - h/2)^{k+2}}\left(2(k+1)\left(1 - (k+2)h/2\right) - k\right) \overset{(b)}{\ge} 0,
\end{aligned}
\]
where (a) comes from the inequality $(1 - x)^a > 1 - xa$ for any $x \in (0, 1)$ and $a > 2$, and (b) is satisfied since $2(k+1)\left(1 - (k+2)h/2\right) - k$ is linear and monotonically decreasing in $h$, with equality only when $h = \Delta_k = \frac{1}{k+1}$.
Therefore, we obtain that $\frac{d^2\xi(h)}{dh^2} > 0$ for all $h \in [0, \Delta_k]$, and it follows that $\frac{d\xi(h)}{dh} > 0$ for all $h \in (0, \Delta_k]$. We can therefore conclude that $G^{(k)}(h) - G^{(k-1)}(h) > 0$ for any $h \in (0, \Delta_k]$.

This justifies one benefit of setting $k > 1$ for class averaging, as larger spectral gaps provide more robustness to noise corruption for $k$ satisfying $k < \frac{1}{h} - 1$. A more detailed discussion of the performance of the algorithm under noise perturbation is given in Section 6. In practice, the choice of frequency cutoff depends on the neighborhood size, noise type, and noise level, and may need to be identified empirically.

4.3 The Main Algebraic Structure: Generalized Intrinsic Model

Just as the intrinsic model established in [38] equates the "extrinsic model" $S^2$ with the "intrinsic model" of the top eigenspace $W$ of $T = T^{(1)}$, we will generalize this correspondence to the setting of general complex irreducible unitary representations of $\mathrm{SO}(2)$. More specifically, we establish the correspondence between the following two generalized models:

– Generalized Extrinsic Model: For every point $x = x(\varphi, \vartheta, \psi) \in \mathrm{SO}(3)$, denote by $\delta^{(k)}_x : \mathbb{C} \to \mathbb{C}^{2k+1}$ the unique complex morphism sending $1 \in \mathbb{C}$ to the first (index-$(-k)$) column of the Wigner $D$-matrix $D^k$ (detailed in Appendix A), i.e.,
\[ D^k_{\cdot,-k}(x) = \left( D^k_{-k,-k}(x),\, D^k_{-k+1,-k}(x),\, \ldots,\, D^k_{k-1,-k}(x),\, D^k_{k,-k}(x) \right)^{\top} \in \mathbb{C}^{2k+1}. \]

– Generalized Intrinsic Model: Define $W^{(k)}$ as the top eigenspace of $T^{(k)}_h$, which by Theorems 2 and 3 is $(2k+1)$-dimensional. Set for every point $x \in \mathrm{SO}(3)$ the map
\[ \varphi^{(k)}_x = \sqrt{1/(2k+1)} \cdot \left( \mathrm{ev}_x \big|_{W^{(k)}} \right)^{*} : \mathbb{C} \to W^{(k)}, \tag{35} \]
where $\mathrm{ev}_x : \mathcal{H} \to \mathbb{C}$ is the evaluation morphism at the point $x \in \mathrm{SO}(3)$.
The main algebraic structure of the multi-frequency intrinsic classification algorithm is summarized in the following main theorem of this section.

Theorem 4 The morphism $\tau : \mathbb{C}^{2k+1} \to \mathcal{H}$ defined by
\[ \tau : \mathbb{C}^{2k+1} \longrightarrow \mathcal{H}, \qquad v \longmapsto \left( x \mapsto \sqrt{2k+1} \cdot \left(\delta^{(k)}_x\right)^{*}(v) \right) \]
is an isomorphism between $\mathbb{C}^{2k+1}$ and $W^{(k)} \subset \mathcal{H}$ (as Hermitian vector spaces). Moreover, for every $x \in \mathrm{SO}(3)$ and $k = 0, 1, \ldots$ there holds
\[ \tau \circ \delta^{(k)}_x = \varphi^{(k)}_x. \tag{36} \]

The proof of Theorem 4 is deferred to Appendix B.3. Our proof extends the arguments in the proof of [38, Theorem 5]. A key observation is that the top eigenspace $\mathcal{H}\big(\lambda^{(k)}_k(h)\big)$ coincides with the isotypic subspace $\mathcal{H}_{k,-k}$ (see Section 4.1). Furthermore, Theorem 4 reveals the correspondence between the generalized extrinsic and intrinsic models, in terms of the viewing angle information they encode. This is summarized in the following result.

Theorem 5 For every pair of frames $x, y \in \mathrm{SO}(3)$, we have
\[ \left| \left\langle \varphi^{(k)}_x(v),\, \varphi^{(k)}_y(u) \right\rangle_{W^{(k)}} \right| = \left( \frac{\langle \pi(x), \pi(y) \rangle + 1}{2} \right)^{k} \tag{37} \]
for any choice of unit-norm complex numbers $v, u \in \mathbb{C}$.

The proof of Theorem 5 is deferred to Appendix B.4.

Remark 3 When $k = 1$, Theorem 4 and Theorem 5 reduce to [38, Theorem 5] and [38, Theorem 6], respectively, up to a different scaling constant for $\tau$. The difference arises from our alternative, explicit construction of the isomorphism $\tau$ using Wigner $D$-matrices.

5 Interpretation of the Theoretical Results for Multi-Frequency Class Averaging

In this section, we interpret the MFCA algorithm stated in Section 2 using the theoretical results established in Section 4, and provide conceptual explanations for the admissibility of MFCA in the noiseless regime.
First, under the assumption that the projection images $\{ I_i \mid 1 \le i \le N \}$ are produced from orthonormal frames $\{ x_i \mid 1 \le i \le N \}$ sampled i.i.d. uniformly on $\mathrm{SO}(3)$ with respect to the normalized Haar measure, we view $\frac{1}{N} H^{(k)}$, the scaled class averaging matrix at frequency $k$ defined in Section 2, as the discretization of the local parallel transport operator $T^{(k)}_h$. We know from standard results [44, Theorem 3.1] that the eigenvalues of $\frac{1}{N} H^{(k)}$ converge to the eigenvalues of the generalized localized parallel transport operator $T^{(k)}_h$ defined in (24) as the number of samples $N$ goes to infinity, provided the opening angle $\alpha$ is sufficiently small. In particular, this implies that for large sample size $N$, the spectral gap of $\frac{1}{N} H^{(k)}$ converges to the spectral gap of $T^{(k)}_h$, which, by Theorem 2 and Theorem 3, is roughly of size $(1 + k) h^2 / 4$ for $h \ll \frac{1}{k+1}$ and occurs between the $(2k+1)$th and the $(2k+2)$th eigenvalues of $H^{(k)}$ (ranked in decreasing order). Moreover, as argued in [38, Theorem 2], the MFCA embedding $\Psi^{(k)}$ defined in (9) corresponds to the morphism (35) in the following form:
\[ \frac{\Psi^{(k)}(x_i)}{\left\| \Psi^{(k)}(x_i) \right\|} \approx \varphi^{(k)}_{x_i}(1), \quad \text{for all } x_i \in \mathrm{SO}(3), \tag{38} \]
where $\|\cdot\|$ stands for the standard norm on $\mathbb{C}^{2k+1}$. Combining (38) with Theorem 5 provides the justification for using $A^{(k)}$ to identify similar viewing angles,
\[ A^{(k)}_{ij} = \frac{\left| \left\langle \Psi^{(k)}(I_i),\, \Psi^{(k)}(I_j) \right\rangle \right|}{\left\| \Psi^{(k)}(I_i) \right\| \left\| \Psi^{(k)}(I_j) \right\|} \approx \left| \left\langle \varphi^{(k)}_{x_i}(1),\, \varphi^{(k)}_{x_j}(1) \right\rangle_{W^{(k)}} \right| = \left( \frac{\langle \pi(x_i), \pi(x_j) \rangle + 1}{2} \right)^{k}. \tag{39} \]
This relation is demonstrated in the top rows of Figures 5 and 13. In fact, Theorem 5 tells us that the affinity measure $S^{(k)}_{ij}$ defined in (12) coincides with the cosine of the angle between the two viewing directions in the noiseless regime.
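To make the noiseless relation (39) concrete, the sketch below evaluates its right-hand side for randomly drawn unit vectors standing in for the viewing directions $\pi(x_i)$, and recovers the cosine through a $k$-th root. The form `S = 2 * A**(1/k) - 1` is an assumption made here to match the statement that $S^{(k)}_{ij}$ equals the cosine in the noiseless regime (the definition (12) lies outside this section); the names `view` and `ideal_affinity` are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical viewing directions pi(x_i): random unit vectors on the sphere.
view = rng.standard_normal((50, 3))
view /= np.linalg.norm(view, axis=1, keepdims=True)
cos = np.clip(view @ view.T, -1.0, 1.0)

def ideal_affinity(cos, k):
    """Right-hand side of (39): ((<pi(x_i), pi(x_j)> + 1) / 2) ** k."""
    return ((cos + 1.0) / 2.0) ** k

medians = []
for k in (1, 5, 10):
    A = ideal_affinity(cos, k)
    S = 2.0 * A ** (1.0 / k) - 1.0   # assumed form of the k-th-root affinity
    medians.append(float(np.median(A)))
    print(k, "max |S - cos| =", float(np.max(np.abs(S - cos))),
          "median A =", medians[-1])
```

The shrinking median of `A` with growing `k` illustrates how higher frequencies concentrate the affinity on pairs with nearly identical viewing directions.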
The form of the approximation identity (39) also suggests avoiding directly taking the $k$th root of the correlation between $\Psi^{(k)}(I_i)$ and $\Psi^{(k)}(I_j)$ as in (12) and (13), since this approach loses control of the numerical relative error when $A^{(k)}_{ij}$ is close to 0. In contrast, it is advantageous to use the multiplicative forms (10) and (11), which do not worsen the relative error. The logarithm of the combined affinity $A^{\mathrm{All}}$ has the following relation with the viewing angles,
\[ \log\left( A^{\mathrm{All}}_{ij} \right) = \sum_{k=1}^{k_{\max}} \log\left( A^{(k)}_{ij} \right) \approx \frac{k_{\max}(k_{\max} + 1)}{2} \log\left( \frac{\langle \pi(x_i), \pi(x_j) \rangle + 1}{2} \right). \tag{40} \]
Using $A^{\mathrm{All}}$ or $\log\left(A^{\mathrm{All}}\right)$ makes small viewing angles much more prominent in the numerical procedures. One may well consider other linear combinations of the $A^{(k)}_{ij}$'s, which are degree-$k_{\max}$ polynomials of the (cosine of the) viewing angle. We leave these further explorations to future work.

6 Analysis under Probabilistic Models

In this section, we discuss the benefit of using $A^{(k)}$ with $k > 1$ to identify nearest neighbors when the measurement graph is perturbed by noise. To this end, we use the random rewiring model [63] for the entries of $H^{(1)}$ in Section 6.1 and extend it to incorporate small angular perturbations in Section 6.2.

We start by randomly generating $N$ orthonormal frames $x_1, x_2, \ldots, x_N$ uniformly sampled from $\mathrm{SO}(3)$ according to the Haar measure. Each frame $x_i$ can be represented by a $3 \times 3$ orthogonal matrix $R_i = [R^1_i, R^2_i, R^3_i]$ with $\det(R_i) = 1$. We identify the third column $R^3_i$ as the viewing angle $\pi(x_i)$ of the molecule. The first two columns $R^1_i$ and $R^2_i$ form an orthonormal basis for the plane in $\mathbb{R}^3$ perpendicular to the viewing angle $\pi(x_i)$. If the viewing angles of two projection images belong to a small spherical cap with opening angle $\alpha$, then we connect the two points in the graph (i.e., $(i, j) \in E$ if $\langle \pi(x_i), \pi(x_j) \rangle > \cos\alpha$). If $x_i$ and $x_j$ are two frames with the same viewing angle, $\pi(x_i) = \pi(x_j)$, then $R^1_i, R^2_i$ and $R^1_j, R^2_j$ are two orthonormal bases for the same plane and the rotation matrix $R_i^{-1} R_j$ has the following form:
\[ R_i^{-1} R_j = \begin{pmatrix} \cos\theta_{ij} & -\sin\theta_{ij} & 0 \\ \sin\theta_{ij} & \cos\theta_{ij} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{41} \]
When the viewing angles are slightly different, (41) holds approximately. The optimal in-plane rotational alignment angle then provides a good approximation to the angle $\theta_{ij}$ that "aligns" the orthonormal bases for the planes $\pi(x_i)^{\perp}$ and $\pi(x_j)^{\perp}$. Therefore, if $\langle \pi(x_i), \pi(x_j) \rangle$ is close to 1, the angle $\theta_{ij}$ is given by
\[ \theta_{ij} = \operatorname*{arg\,min}_{\theta \in [0, 2\pi)} \left\| R_i\, \rho(\theta) - R_j \right\|_F, \quad \text{with } \rho(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{42} \]
In other words, the ground truth local parallel transport data is computed by aligning the local frames within the connected neighborhood, and is determined by the entries of the matrix $R_i^{-1} R_j$:
\[
\begin{aligned}
\cos\theta_{ij} &= \frac{\left(R_i^{-1} R_j\right)_{11} + \left(R_i^{-1} R_j\right)_{22}}{\sqrt{\left( \left(R_i^{-1} R_j\right)_{11} + \left(R_i^{-1} R_j\right)_{22} \right)^2 + \left( \left(R_i^{-1} R_j\right)_{21} - \left(R_i^{-1} R_j\right)_{12} \right)^2}}, \\
\sin\theta_{ij} &= \frac{\left(R_i^{-1} R_j\right)_{21} - \left(R_i^{-1} R_j\right)_{12}}{\sqrt{\left( \left(R_i^{-1} R_j\right)_{11} + \left(R_i^{-1} R_j\right)_{22} \right)^2 + \left( \left(R_i^{-1} R_j\right)_{21} - \left(R_i^{-1} R_j\right)_{12} \right)^2}}.
\end{aligned} \tag{43}
\]

6.1 Random Rewiring Model

Starting from the clean neighborhood graph constructed above, we perturb the graph based on the following process: with probability $p$, we keep the clean edge and the associated transport data $\theta_{ij}$; and with probability $1 - p$, we remove the edge $(i, j)$ and randomly rewire $i$ or $j$ with a vertex drawn uniformly at random from the remaining vertices that are not already connected to $i$ or $j$. We assume that if the link between $i$ and $j$ is a random link, then $\theta_{ij} = \phi_{ij}$, which is uniformly distributed over $[0, 2\pi)$.
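The closed-form alignment angle in (43) can be sketched in a few lines and compared against a brute-force minimization of the Frobenius objective in (42). This is an illustration under assumptions: the helper names (`in_plane_rotation`, `alignment_angle`), the QR-based random frame, and the small tilt about the first axis are choices made here, not taken from the paper.

```python
import numpy as np

def in_plane_rotation(theta):
    """The matrix rho(theta) from (42)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def alignment_angle(R_i, R_j):
    """Angle theta_ij determined by (43), returned in [0, 2*pi)."""
    M = R_i.T @ R_j                      # equals R_i^{-1} R_j for orthogonal R_i
    c = M[0, 0] + M[1, 1]                # numerator of cos(theta_ij)
    s = M[1, 0] - M[0, 1]                # numerator of sin(theta_ij)
    return np.arctan2(s, c) % (2.0 * np.pi)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:                 # force a proper rotation
    Q[:, 0] *= -1.0
R_i = Q

# Neighboring frame: in-plane rotation by 0.7 rad plus a small tilt of the
# viewing direction (rotation by 0.05 rad about the first axis).
tilt = np.array([[1.0, 0.0, 0.0],
                 [0.0, np.cos(0.05), -np.sin(0.05)],
                 [0.0, np.sin(0.05), np.cos(0.05)]])
R_j = R_i @ in_plane_rotation(0.7) @ tilt

theta_formula = alignment_angle(R_i, R_j)

# Brute-force minimizer of ||R_i rho(theta) - R_j||_F over a grid.
thetas = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
costs = [np.linalg.norm(R_i @ in_plane_rotation(t) - R_j) for t in thetas]
theta_grid = thetas[int(np.argmin(costs))]

print(theta_formula, theta_grid)
```

The formula is undefined when both numerators in (43) vanish, which happens for antipodal viewing directions; such pairs never occur among the edges of the neighborhood graph.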
Our model assumes that the underlying graph of links between noisy data points is a small-world graph [68] on the sphere, with edges being randomly rewired with probability $1 - p$. The alignments take their correct values for true links and random values for the rewired edges. The parameter $p$ controls the signal-to-noise ratio of the graph connection, where $p = 1$ indicates the clean graph. The matrix $H^{(k)}$ is a random matrix under this model with
\[ H^{(k)}_{ij} = \begin{cases} e^{\imath k \theta_{ij}}, & \text{if } (i, j) \in E \text{ and with probability } p, \\ e^{\imath k \phi_{ij}}, & \text{if } (i, j) \notin E \text{ and with probability } (1 - p) \dfrac{\bar{D}}{N - \bar{D}}. \end{cases} \tag{44} \]
Since the expected value of the random variable $e^{\imath k \phi}$ vanishes for $\phi \sim \mathrm{Uniform}[0, 2\pi)$, the expected value of the matrix $H^{(k)}$ is
\[ \mathbb{E} H^{(k)} = p H^{(k)}_{\mathrm{clean}}, \tag{45} \]
where $H^{(k)}_{\mathrm{clean}}$ is the clean matrix that corresponds to $p = 1$, obtained in the case that all links and angles are set up correctly. At each frequency $k$, the matrix $H^{(k)}$ can be decomposed into
\[ H^{(k)} = p H^{(k)}_{\mathrm{clean}} + R^{(k)}, \tag{46} \]
where $R^{(k)}$ is a random matrix whose elements are
\[ R^{(k)}_{ij} = \begin{cases} (1 - p)\, e^{\imath k \theta_{ij}}, & \text{if } (i, j) \in E \text{ and with probability } p, \\ -p\, e^{\imath k \theta_{ij}}, & \text{if } (i, j) \in E \text{ and with probability } 1 - p, \\ e^{\imath k \phi_{ij}}, & \text{if } (i, j) \notin E \text{ and with probability } (1 - p) \dfrac{\bar{D}}{N - \bar{D}}, \end{cases} \tag{47} \]
and $\bar{D}$ is the average degree of the clean neighborhood graph. The elements of $R^{(k)}$ are independent zero-mean random variables with finite moments, since the elements of $R^{(k)}$ are bounded for $1 \le k \le k_{\max}$.

We use $\|M\|$ to denote the spectral norm of a matrix $M$. Since the underlying graph connectivity for all $R^{(k)}$ is identical and the mean and variance of $R^{(k)}_{ij}$ are identical across $k$, the quantity $\|R^{(k)}\|$ does not change over the frequency index $k$. To find an upper bound on $\|R^{(k)}\|$, we take $p = 0$, where the matrix $R^{(k)}$ represents a sparse random graph.
Since the surface area of a spherical cap with opening angle $\alpha$ is $4\pi \sin^2\frac{\alpha}{2}$ and $N$ points are uniformly distributed over the sphere, the average degree of the random graph is $N \sin^2\frac{\alpha}{2}$. Adapting [42, Theorem 2.1] to our case, we can show that $\|R^{(k)}\| \le 2\sqrt{N} \sin\frac{\alpha}{2}$ with high probability. In Figure 2, we can see that the eigenvalues of $R^{(k)}$ follow Wigner's semicircle law [69, 70].

For the following discussion, we denote $k_{\max} = \left\lfloor \frac{1}{h} - 1 \right\rfloor$. The ordered eigenvalues of $p H^{(k)}_{\mathrm{clean}}$ are $\ell^{(k)}_1 \ge \ell^{(k)}_2 \ge \cdots \ge \ell^{(k)}_N$, and the ordered eigenvalues of $H^{(k)}$ are $\tilde{\ell}^{(k)}_1 \ge \tilde{\ell}^{(k)}_2 \ge \cdots \ge \tilde{\ell}^{(k)}_N$, for $k = 1, \ldots, k_{\max}$. The spectral gap after the $(2k+1)$th eigenvalue of $p H^{(k)}_{\mathrm{clean}}$ is denoted as $\delta_k = \ell^{(k)}_{2k+1} - \ell^{(k)}_{2k+2}$. We note that $\{\ell^{(k)}_i\}_{i=1}^{2k+1} \approx pN\lambda^{(k)}_k$, $\{\ell^{(k)}_i\}_{i=2k+2}^{4k+3} \approx pN\lambda^{(k)}_{k+1}$, and $\delta_k \approx pN G^{(k)}$, since $\frac{1}{N} H^{(k)}_{\mathrm{clean}}$ is a discretization of the operator $T^{(k)}_h$.

We consider the following three scenarios for the discussion of the stability of the algorithm under noise perturbation: (1) the small noise regime ($\delta_1 \ge 2\|R^{(1)}\|$), (2) the medium noise regime ($\delta_1 < 2\|R^{(1)}\| \le \delta_{k_{\max}}$), and (3) the large noise regime ($\delta_{k_{\max}} < 2\|R^{(1)}\|$).

• Small noise regime. This noise regime was previously considered in [63] to determine the threshold probability $p_c$ for the approximation of the top three eigenvectors of $H^{(1)}$ by the top three eigenvectors of $H^{(1)}_{\mathrm{clean}}$ under the random rewiring model. According to Corollary 2, the spectral gap gets larger for higher frequency index $k$. This implies that the linear space spanned by the first $(2k+1)$ eigenvectors of $H^{(k)}$ is closer to the top eigenspace of $T^{(k)}_h$, since the approximation error is inversely proportional to the spectral gap according to the renowned Davis–Kahan theorem [17, 72].
This also explains the choice of extracting the top $(2k+1)$ eigenvectors of $H^{(k)}$ in single frequency-$k$ class averaging.

• Medium noise regime. In this situation, we can find a $\tilde{k}$ such that for all $\tilde{k} \le k \le k_{\max}$, $\delta_k > 2\|R^{(1)}\| = 2\|R^{(k)}\|$. In addition, we can show that $\ell^{(k)}_{4k+3} > \|R^{(k)}\|$ for $k = 1, \ldots, k_{\max}$. This is because we have $\lambda^{(k)}_{k+1} > G^{(k)}$ for $k \le k_{\max}$, and $\lambda^{(k)}_{k+1}$ decreases as $k$ increases according to Theorem 2 and Theorem 3. Using the same argument as in the small noise regime, we can justify the benefit of using the top $(2k+1)$ eigenvectors of $H^{(k)}$ at $k > 1$.

• Large noise regime. If we further decrease $p$, the spectral norm of $R^{(k)}$ becomes larger than the spectral gap $\delta_{k_{\max}}$. According to the Davis–Kahan theorem, it seems impossible to recover the eigenvectors if the eigenvalue perturbation is too large. However, we observe that in this situation, the subspace spanned by the top $2k+1$ eigenvectors of $H^{(k)}$ still has non-trivial correlation with the subspace spanned by the top $2k+1$ eigenvectors of $H^{(k)}_{\mathrm{clean}}$ if $\tilde{\ell}^{(k)}_{2k+1} > \|R^{(k)}\|$, in other words if $\ell^{(k)}_{2k+1} > \frac{1}{2}\|R^{(k)}\|$. This phenomenon is similar to the phase transition for eigenvalues and eigenvectors of a low-rank matrix under the additive perturbation of a Gaussian Wigner matrix in [7, Section 3.1], although our underlying clean matrices $H^{(k)}_{\mathrm{clean}}$ are full rank. It seems that the eigenvectors of the unperturbed matrix can be recovered even when the spectral gap is much smaller than that required by Davis–Kahan. In this case, the Davis–Kahan theorem is insufficient to bound the distance between the subspaces, since it does not consider the nature of the perturbation.
It is useful to use perturbation bounds that take into account the nature of the perturbation, such as the upper bound on the entry-wise deviation of the eigenvector in [23, Theorem 8]. The theorem only applies to the situation with $\ell^{(k)}_{2k+1} > \|R^{(k)}\|$. According to the theorem, both $\delta_k$ and $\ell^{(k)}_{2k+1} - \|R^{(k)}\|$ appear in the denominators of the terms in the upper bound for the entry-wise deviation of the eigenvector. As $k$ increases, the spectral gap $\delta_k$ increases, while the term $\ell^{(k)}_{2k+1} - \|R^{(k)}\|$ decreases, based on Corollaries 1 and 2 and Theorem 3. This implies that the upper bound of the deviation in [23, Theorem 8] will decrease initially as $k$ increases from 1, because the reduction in the term that contains $\delta_k$ dominates, and then it will increase when the increments in the terms containing $\ell^{(k)}_{2k+1} - \|R^{(k)}\|$ become dominant. We empirically observe that the accuracy of the affinity measure $A^{(k)}$ increases with increasing $k$ up to a critical cutoff $k_c$, as detailed in Section 7.1. We identify $k_c$ as the point where $\ell^{(k_c)}_{2k_c+2} \ge \frac{1}{2}\|R^{(k_c)}\|$ and $\ell^{(k_c+1)}_{2k_c+4} < \frac{1}{2}\|R^{(k_c)}\|$, which corresponds to when $\tilde{\ell}^{(k)}_{2k+2}$ becomes very close to $\|R^{(k)}\|$. The estimation of the top eigenspace gets less accurate when $k$ increases beyond $k_c$, which results in worse classification results using $A^{(k)}$ (see Figure 8). The eigenvector perturbation of a full-rank matrix by an additive random matrix is still an open problem, and we will provide theoretical justification for our observations in the future.

Based on the discussions above, we see the benefit of using $A^{(k)}$ for $k > 1$ to select nearest neighbors, because the underlying embedding $\Psi^{(k)}$ can be more stable than $\Psi^{(1)}$.
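The random rewiring model of Section 6.1 and the norm bound $\|R^{(k)}\| \le 2\sqrt{N}\sin\frac{\alpha}{2}$ can be explored at small scale with the following sketch. It is a simplified simulation under stated assumptions, not the implementation used for the experiments: frames are drawn via normalized Gaussian quaternions, each removed edge $(i, j)$ is always rewired from endpoint $i$, and the sizes `n`, `cos_alpha`, `k`, `p` are arbitrary illustrative values.

```python
import numpy as np

def random_rotations(n, rng):
    """Haar-distributed rotation matrices from normalized Gaussian quaternions."""
    q = rng.standard_normal((n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q.T
    R = np.empty((n, 3, 3))
    R[:, 0, 0] = 1 - 2 * (y**2 + z**2)
    R[:, 0, 1] = 2 * (x * y - z * w)
    R[:, 0, 2] = 2 * (x * z + y * w)
    R[:, 1, 0] = 2 * (x * y + z * w)
    R[:, 1, 1] = 1 - 2 * (x**2 + z**2)
    R[:, 1, 2] = 2 * (y * z - x * w)
    R[:, 2, 0] = 2 * (x * z - y * w)
    R[:, 2, 1] = 2 * (y * z + x * w)
    R[:, 2, 2] = 1 - 2 * (x**2 + y**2)
    return R

def simulate(n=300, cos_alpha=0.8, k=3, p=0.5, seed=0):
    rng = np.random.default_rng(seed)
    frames = random_rotations(n, rng)
    view = frames[:, :, 2]                               # third columns: viewing directions
    upper = np.triu(view @ view.T > cos_alpha, 1)        # clean edges (i, j) with i < j
    M = np.einsum("iab,jac->ijbc", frames, frames)       # M[i, j] = R_i^T R_j
    theta = np.arctan2(M[..., 1, 0] - M[..., 0, 1],
                       M[..., 0, 0] + M[..., 1, 1])      # alignment angles as in (43)
    H_clean = upper * np.exp(1j * k * theta)
    H_clean = H_clean + H_clean.conj().T

    taken = upper | upper.T | np.eye(n, dtype=bool)      # pairs that may not be reused
    H = np.zeros((n, n), dtype=complex)
    for i, j in zip(*np.nonzero(upper)):
        if rng.random() < p:                             # keep the clean edge
            H[i, j] = np.exp(1j * k * theta[i, j])
        else:                                            # rewire i to a fresh vertex
            l = rng.choice(np.flatnonzero(~taken[i]))
            taken[i, l] = taken[l, i] = True
            H[i, l] = np.exp(1j * k * rng.uniform(0.0, 2.0 * np.pi))
    H = H + H.conj().T                                   # Hermitian completion

    R_noise = H - p * H_clean                            # decomposition (46)
    bound = 2.0 * np.sqrt(n) * np.sqrt((1.0 - cos_alpha) / 2.0)  # 2 sqrt(N) sin(alpha/2)
    return H, float(np.linalg.norm(R_noise, 2)), bound

H, noise_norm, bound = simulate()
print("||R|| =", noise_norm, " bound =", bound)
```

Because the bound holds only with high probability and asymptotically in $N$, the printed norm should be read as an indication rather than a verification; to compare several frequencies on the same noisy graph, one would reuse the same seed for each `k`.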
Under the additive noise perturbation in (44), each embedding $\Psi^{(k)}$ is perturbed randomly, but it has non-trivial correlation with the corresponding true eigenspace when the noise is not too large. In addition, $\Psi^{(k)}(i)$ encodes the viewing direction information in terms of the degree-$k$ polynomial of the frame $x_i$, and the underlying information on $x_i$ is perturbed differently at different $k$ even though the noise is not independent. The combined score is able to identify pairs that have consistently high affinities across $k$ and filter out pairs that only have a couple of high scores across $k$.

6.2 Random Rewiring Model with Small Angular Errors

We extend the random rewiring model in Section 6.1 to incorporate small angular errors in the pairwise alignment angles for the correctly connected pairs. Specifically, we consider additive errors in the angle,
\[ \tilde{\theta}_{ij} = \theta_{ij} + \varepsilon_{ij}, \tag{48} \]
where the $\varepsilon_{ij}$ are independently drawn from a distribution $\gamma$ on the interval $[0, 2\pi)$. We also assume that $\mathbb{E}(\varepsilon_{ij}) = 0 \bmod 2\pi$. We can evaluate $c_k = \mathbb{E}\left(e^{\imath k \varepsilon}\right)$ for $\varepsilon \sim \gamma([0, 2\pi))$. The matrix $H^{(k)}$ is a random matrix under this model with
\[ H^{(k)}_{ij} = \begin{cases} e^{\imath k (\theta_{ij} + \varepsilon_{ij})}, & \text{if } (i, j) \in E \text{ and with probability } p, \\ e^{\imath k \phi_{ij}}, & \text{if } (i, j) \notin E \text{ and with probability } (1 - p) \dfrac{\bar{D}}{N - \bar{D}}. \end{cases} \tag{49} \]
Since the expected value of the random variable $e^{\imath k \phi}$ vanishes for $\phi \sim \mathrm{Uniform}[0, 2\pi)$, the expected value of the matrix $H^{(k)}$ is
\[ \mathbb{E} H^{(k)} = c_k\, p\, H^{(k)}_{\mathrm{clean}}, \tag{50} \]
where $H^{(k)}_{\mathrm{clean}}$ is the clean matrix that corresponds to $p = 1$, obtained in the case that all links and angles are set up correctly.
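The attenuation factor $c_k = \mathbb{E}(e^{\imath k \varepsilon})$ in (50) is straightforward to evaluate for a given error distribution. The sketch below takes a zero-mean von Mises distribution with concentration `kappa` as an illustrative choice of $\gamma$ and integrates on a uniform periodic grid; the function name `char_coeff`, the grid size `m`, and the value `kappa = 10` are assumptions made for this example.

```python
import numpy as np

def char_coeff(k, kappa, m=4096):
    """c_k = E[exp(i k eps)] for eps ~ von Mises(0, kappa), by periodic quadrature."""
    eps = np.linspace(-np.pi, np.pi, m, endpoint=False)
    # Unnormalized density exp(kappa * cos(eps)); the shift by -kappa avoids overflow.
    w = np.exp(kappa * (np.cos(eps) - 1.0))
    return np.sum(w * np.exp(1j * k * eps)) / np.sum(w)

kappa = 10.0
c = np.array([char_coeff(k, kappa) for k in range(0, 21)])

print("max |imag part| =", float(np.max(np.abs(c.imag))))
print("c_1, c_5, c_10, c_20 =", c.real[[1, 5, 10, 20]])
```

For a symmetric error distribution such as this one, $c_k$ is real and decays with $k$, so the signal part $c_k\, p\, H^{(k)}_{\mathrm{clean}}$ shrinks at high frequencies while the noise level does not.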
At each frequency $k$, the matrix $H^{(k)}$ can be decomposed into
\[ H^{(k)} = c_k\, p\, H^{(k)}_{\mathrm{clean}} + R^{(k)}, \tag{51} \]
where $R^{(k)}$ is a random matrix whose elements are
\[ R^{(k)}_{ij} = \begin{cases} \left( e^{\imath k \varepsilon_{ij}} - c_k p \right) e^{\imath k \theta_{ij}}, & \text{if } (i, j) \in E \text{ and with probability } p, \\ -c_k\, p\, e^{\imath k \theta_{ij}}, & \text{if } (i, j) \in E \text{ and with probability } 1 - p, \\ e^{\imath k \phi_{ij}}, & \text{if } (i, j) \notin E \text{ and with probability } (1 - p) \dfrac{\bar{D}}{N - \bar{D}}. \end{cases} \tag{52} \]
The analysis follows the steps in Section 6.1, and $\|R^{(k)}\| \le 2\sqrt{N} \sin\frac{\alpha}{2}$ with high probability. Comparing Eq. (50) with Eq. (45), we find that the main difference is that the eigenvalues of $\mathbb{E} H^{(k)}$ are scaled by $c_k$ at frequency $k$. The condition for the spectral algorithm to work is that the spectral gap $c_k\, p N G^{(k)}$ and the top eigenvalue $c_k\, p N \lambda^{(k)}_k$ are sufficiently large compared with $\|R^{(k)}\|$. For a well-concentrated distribution $\gamma$, we can first evaluate $c_k$ and then determine the critical cutoff frequency $k_c$ that satisfies the condition. With the same $p$ in the random rewiring model, $k_c$ gets smaller in the presence of additional small angular errors since $c_k < 1$. In Section 7.1, we show the performance of the algorithms on a couple of examples where the angular noise follows a von Mises distribution.

6.3 Discussions

In the previous two models, we only consider independent edge noise, i.e., the entries of $R^{(k)}$ for a fixed $k$ are independent. Across different frequencies, the entries of $R^{(k)}$ are dependent through the relations of the irreducible representations of the angles ($\theta_{ij}$, $\phi_{ij}$, and $\varepsilon_{ij}$) and the graph connectivity. We note that these are simplified models for illustrating the benefits of using $A^{(k)}$ for $k > 1$. In the application to cryo-EM 2-D image analysis, the edge perturbations are induced by the independent noise from each image.
In this case, for a fixed frequency, the entries of $R^{(k)}$ become dependent, since the edge connections and alignments are affected by the noise in each image node. Still, we observe similar benefits of using $A^{(k)}$ for $k > 1$ in the cryo-EM class averaging experiments detailed in Section 7.2. We leave the analysis of node-level noise to future work. In addition, the current analysis focuses on data points that are uniformly distributed on the manifold. For non-uniformly distributed data points, different normalization techniques introduced in diffusion maps [15] are needed to compensate for the non-uniform sampling density.

Fig. 2: Histograms of the eigenvalues of $H^{(k)}$ and $R^{(k)}$ in (44) for data generated from the random rewiring model with $N = 1000$, $p = 0.5$, and $p = 0.3$. Panels: (a) $H^{(1)}$, $p = 0.5$; (b) $H^{(4)}$, $p = 0.5$; (c) $H^{(8)}$, $p = 0.5$; (d) $H^{(20)}$, $p = 0.5$; (e) $H^{(1)}$, $p = 0.3$; (f) $H^{(4)}$, $p = 0.3$; (g) $H^{(8)}$, $p = 0.3$; (h) $H^{(20)}$, $p = 0.3$; (i) $R^{(1)}$, $p = 0.5$; (j) $R^{(20)}$, $p = 0.5$; (k) $R^{(1)}$, $p = 0.3$; (l) $R^{(20)}$, $p = 0.3$.

Fig. 3: Proportion of the estimated nearest neighbors that satisfy $\langle \pi(x_i), \pi(x_j) \rangle > 0.85$ for (a) $p = 0.5$, (b) $p = 0.3$, and (c) $p = 0.15$. The number of frames is $N = 1000$ and the number of nearest neighbors is 50.

7 Numerical Results

We conducted two sets of numerical experiments. The first set involves simulations of the probabilistic model introduced in [63]. The second set applies the proposed algorithm to noisy simulated projection images of a 3-D volume of the 70S ribosome. We point out that there is no direct way to compare the performance of classification algorithms on real microscope images, since their viewing directions are unknown.
The only way to compare classification algorithms on real data is indirectly, by evaluating the resulting 3-D reconstructions. Here we conduct only numerical experiments from which conclusions can be drawn directly for 2-D images. All experiments in this section were executed on a Linux machine with 16 Intel Xeon 2.5GHz cores and 512GB of RAM.

Fig. 4: Bar plots of the 19 largest eigenvalues of $\tilde{H}^{(k)}$ at different $k$ and $p$ values. Panel grid: columns $k = 1, 3, 5$; rows $p = 1, 0.2, 0.1, 0.08$.

7.1 Experiments with Random Rewiring Model

We generate $N = 10{,}000$ orthonormal frames $x_1, \ldots, x_N$ in $\mathbb{R}^3$, uniformly sampled from $\mathrm{SO}(3)$ with respect to the normalized Haar measure. To generate the noisy graph under the probabilistic model introduced in [63], we keep the correct edge in the neighborhood graph with probability $p$, and use the ground truth local parallel transport data $e^{\imath k \theta_{ij}}$ in (43). With probability $1 - p$, we rewire the edge such that the node $i$ is connected to a randomly selected node that is not connected with $i$. For the rewired edge, the optimal in-plane rotational alignment angle is replaced with an angle uniformly sampled from $0$ to $2\pi$.

Fig. 5: Scatter plots of $A^{(k)}_{ij}$ against $\left(\langle \pi(x_i), \pi(x_j) \rangle + 1\right)^k / 2^k$ at $p = 1, 0.2, 0.1$, and $0.08$ and $k = 1, 5$, and $10$. Panel grid: columns $k = 1, 5, 10$; rows $p = 1, 0.2, 0.1, 0.08$. The approximation (39) is considerably more robust for larger values of $k$.

In the first experiment, we use a small dataset with $N = 1000$ frames in order to visualize all eigenvalues of $H^{(k)}$. The clean geometric neighborhood is constructed by connecting points where $\langle \pi(x_i), \pi(x_j) \rangle > 0.8$ (the opening angle is $\alpha = 36.9^\circ$) to make sure that the graph is well connected. We vary $p$ and compute all the
eigenvalues of $H^{(k)}$ to illustrate the analysis in Section 6. Figure 2 shows the histograms of the eigenvalues of the matrices $H^{(k)}$ and $R^{(k)}$. We observe that the top eigenvalue of $H^{(k)}$ decreases as $k$ increases, which is consistent with Corollary 1. The upper bound for $\|R^{(k)}\|$ discussed in Section 6 is $2\sqrt{N} \sin\frac{\alpha}{2} = 20$, which is consistent with the results shown in the bottom row of Figure 2. In addition, the same figure shows that $\|R^{(k)}\|$ does not vary with the frequency index $k$ under the random rewiring model. Comparing Figure 2a with Figure 2b, we see that the spectral gap between the $(2k+1)$th and $(2k+2)$th eigenvalues increases. Increasing $k$ further, we observe that the $(2k+2)$th eigenvalue of $H^{(k)}$, i.e. $\tilde{\ell}^{(k)}_{2k+2}$, becomes very close to the right edge of the semicircle, as shown in Figure 2c. Figure 3 shows the proportion of the estimated 50 nearest neighbors for each frame that satisfy $\langle \pi(x_i), \pi(x_j) \rangle > 0.85$. The proportion reaches its maximum at $k = 9$ for $p = 0.5$ and $p = 0.3$.

Fig. 6: Scatter plots of the log multi-frequency class averaging affinity $\log A^{\mathrm{All}}_{ij}$ against $\langle \pi(x_i), \pi(x_j) \rangle$ at (a) $p = 1$, (b) $p = 0.2$, (c) $p = 0.1$, and (d) $p = 0.08$.

Fig. 7: Histograms of the angles ($x$-axis, in degrees) between the viewing directions of 10,000 simulated frames and their 50 neighboring points at (a) $p = 1$, (b) $p = 0.2$, (c) $p = 0.1$, and (d) $p = 0.08$. For $A^{\mathrm{All}}$, we use $k_{\max} = 20$.

Fig. 8: Comparing the performance of different affinities according to $A^{(k)}$, $A^{\mathrm{All}}$, $S^{\mathrm{All}}$, and $B^{(k)}$, at (a) $p = 0.2$, (b) $p = 0.1$, and (c) $p = 0.08$. We evaluate the proportion of the estimated nearest neighbors that satisfy $\langle \pi(x_i), \pi(x_j) \rangle > 0.95$.

In the second experiment, we use 10,000 frames to show the spectral properties and the performance of the MFCA algorithm for large sample size. The clean geometric neighborhood graph is constructed by connecting points where $\langle \pi(x_i), \pi(x_j) \rangle > 0.92$ (within a $23.1^\circ$ opening angle). We compute the eigenvalues and eigenvectors of the normalized Hermitian matrix $\tilde{H}^{(k)} = D^{-1/2} H^{(k)} D^{-1/2}$. Figure 4 shows the top eigenvalues of $\tilde{H}^{(k)}$. The multiplicities $2k+1, 2k+3, 2k+5, \ldots$ of the top eigenvalues are clearly demonstrated in the bar plots for $p = 1$ (the first row in Figure 4). As $p$ decreases, the top spectral gap gets smaller, and when $p = 0.1$ it is hard to identify the spectral gap for $k = 1$, whereas the top spectral gap at $k = 5$ is still noticeable. This is consistent with our expectation of improved spectral stability for larger $k$. The estimated $A^{(k)}_{ij}$'s provide good approximations to $\left(\langle \pi(x_i), \pi(x_j) \rangle + 1\right)^k / 2^k$ (see the top row of Figure 5). This approximation deteriorates as $p$ decreases. The lower left sub-figure of Figure 5 shows that the original single-frequency class averaging nearest neighbor search algorithm fails at $p = 0.08$. Figure 6 shows the scatter plots of the combined affinity against the dot products $\langle \pi(x_i), \pi(x_j) \rangle$ between the true viewing angles at varying $p$. Even at $p = 0.08$, the combined affinity $A^{\mathrm{All}}_{ij}$ is still able to identify frames of similar viewing directions.

We evaluate the performance of the proposed algorithms on the nearest neighbor search by inspecting the magnitudes of the angles between the viewing directions of frames identified as neighbors by the algorithm. We identify for each frame 50 nearest neighbors with respect to the affinity measure, and plot in Figure 7 the histogram of the angles between the viewing directions of neighboring frames for varying rewiring probabilities $p = 1, 0.2, 0.1, 0.08$.
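The evaluation procedure just described (select nearest neighbors by affinity, then inspect the angles between true viewing directions) can be sketched as follows. The helper names, the sample size, and the use of the noiseless affinity from (39) as a stand-in input are assumptions for illustration; in the experiments the affinity would come from the eigenvectors of the noisy matrices.

```python
import numpy as np

def nearest_neighbors(A, n_neighbors):
    """Indices of the n_neighbors largest affinities in each row, excluding self."""
    A = np.array(A, dtype=float)          # np.array copies, so the input is untouched
    np.fill_diagonal(A, -np.inf)
    return np.argsort(-A, axis=1)[:, :n_neighbors]

def neighbor_angles(view, nbrs):
    """Angles (degrees) between each viewing direction and those of its neighbors."""
    cos = np.einsum("id,ijd->ij", view, view[nbrs])
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def proportion_above(view, nbrs, threshold):
    """Fraction of selected neighbor pairs with <pi(x_i), pi(x_j)> > threshold."""
    cos = np.einsum("id,ijd->ij", view, view[nbrs])
    return float(np.mean(cos > threshold))

rng = np.random.default_rng(0)
view = rng.standard_normal((500, 3))
view /= np.linalg.norm(view, axis=1, keepdims=True)   # stand-in viewing directions
cos = view @ view.T

A = ((np.clip(cos, -1.0, 1.0) + 1.0) / 2.0) ** 3      # noiseless affinity (39), k = 3
nbrs = nearest_neighbors(A, 10)
angles = neighbor_angles(view, nbrs)
prop = proportion_above(view, nbrs, 0.95)

print("largest neighbor angle (deg):", float(angles.max()))
print("proportion with cosine > 0.95:", prop)
```

A histogram of `angles` plays the role of the panels in Figure 7, and `prop` is the kind of summary statistic plotted in Figures 3 and 8.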
From Figure 7, we observe that using the affinity $A^{(k)}$ in (10) at higher frequency helps improve the performance of the single-frequency class averaging nearest neighbor search algorithm, especially for the noisy graph at $p = 0.08$ (i.e., 92% of the true edges are corrupted). Moreover, combining the measures at different $k$'s according to (11) further improves the classification results, with a significant reduction of outliers at $p = 0.08$ compared to the single frequency nearest neighbor identification results.

Fig. 9: Histograms of the affinities $A^{(k)}_{ij}$ with $k = 1, \ldots, 25$ for (a) a pair of wrongly identified nearest neighbors by $A^{(1)}$, with $\langle \pi(x_i), \pi(x_j) \rangle = -0.20$, and (b) a good nearest neighbor pair identified by $A^{\mathrm{All}}$ but not by any $A^{(k)}$, with $\langle \pi(x_i), \pi(x_j) \rangle = 0.99$. The data is generated under the random rewiring model with $p = 0.08$.

Singer et al. proposed to use more than the top 3 eigenvectors from $\widetilde{H}^{(1)}$ for nearest neighbor classification in [63, Section 7]. We include it as an additional baseline for comparison here to illustrate the benefit of using the eigenvectors of $\widetilde{H}^{(k)}$ for $k > 1$. Specifically, using the top $2k+1$ eigenvectors of $\widetilde{H}^{(1)}$, we define the affinity $B^{(k)}$ as
\[
\widetilde{\Psi}^{(1)}_k(i) = \left( \psi^{(1)}_1(i), \psi^{(1)}_2(i), \ldots, \psi^{(1)}_{2k+1}(i) \right), \qquad
B^{(k)}_{ij} = \frac{\left| \langle \widetilde{\Psi}^{(1)}_k(i), \widetilde{\Psi}^{(1)}_k(j) \rangle \right|}{\| \widetilde{\Psi}^{(1)}_k(i) \| \, \| \widetilde{\Psi}^{(1)}_k(j) \|}. \tag{53}
\]
We compare the performance of the algorithms in terms of the proportion of estimated nearest neighbors that satisfy $\langle \pi(x_i), \pi(x_j) \rangle > 0.95$. Figure 8b shows that in the large noise regime, where 90% of the clean edges are randomly rewired, using $A^{(k)}$ at $k = 16$ outperforms the previous class averaging algorithm that uses only the eigenvectors from $\widetilde{H}^{(1)}$.
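The baseline affinity in (53) is a normalized absolute inner product of eigenvector rows. A minimal sketch is given below, assuming the feature vector of frame $i$ is supplied as a plain list of complex numbers; the name `baseline_affinity` and the toy vectors are hypothetical and only serve the illustration.

```python
import math


def baseline_affinity(psi_i, psi_j):
    """|<psi_i, psi_j>| / (||psi_i|| ||psi_j||) for two complex feature vectors.

    Each vector is assumed to hold the entries, at one frame, of the top
    2k+1 eigenvectors of the normalized matrix at frequency 1.
    """
    if len(psi_i) != len(psi_j):
        raise ValueError("feature vectors must have equal length")
    inner = sum(a * complex(b).conjugate() for a, b in zip(psi_i, psi_j))
    norm_i = math.sqrt(sum(abs(a) ** 2 for a in psi_i))
    norm_j = math.sqrt(sum(abs(b) ** 2 for b in psi_j))
    if norm_i == 0.0 or norm_j == 0.0:
        raise ValueError("feature vectors must be nonzero")
    return abs(inner) / (norm_i * norm_j)


if __name__ == "__main__":
    u = [1 + 0j, 1j, 0.5]
    v = [c * 1j for c in u]  # same vector up to a global phase
    w = [1j, 1 + 0j, 0.0]
    print(baseline_affinity(u, v))
    print(baseline_affinity(u, w))
```

Taking the absolute value makes the score insensitive to a global phase on either feature vector, so two frames are compared only through the direction of their feature vectors.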
As shown in Figure 8c, combining the information from different frequency channels can significantly boost the performance in finding true nearest neighbors. For $S^{\mathrm{All}}$, the proportion reaches the maximum value 0.90 at $k = 30$. For $A^{\mathrm{All}}$, the proportion reaches the maximum value 0.94 at $k = 20$. Because the higher-order terms $A^{(k)}$ get much smaller than 1 and become less informative, incorporating more $A^{(k)}$ components deteriorates the performance of the combined score $A^{\mathrm{All}}$ when $k > 20$. The combined affinity $S^{\mathrm{All}}$ is more stable at large $k$.

To understand why the combined affinities can significantly improve the classification results at $p = 0.08$, we check the values of $A^{(k)}$ for $k = 1, \ldots, 25$ for pairs of frames $x_i$ and $x_j$ that satisfy $\langle \pi(x_i), \pi(x_j) \rangle < 0.95$ but are still identified as nearest neighbors by $A^{(1)}$. We observe that although the corresponding affinities at frequency 1 are above 0.97, the $A^{(k)}_{ij}$ at other frequency indices are below 0.7 and concentrated on the interval $(0, 0.2]$ (see the example in Figure 9a). Therefore, the combined affinity $A^{\mathrm{All}}$ is very small and such a pair will be removed from the nearest neighbor list. In contrast, for a pair of true nearest neighbors that does not appear in any nearest neighbor list by $A^{(k)}$ for $k = 1, \ldots, 25$, we observe that although the affinities are lower than 0.7, all individual affinities lie between 0.2 and 0.5 (see Figure 9b). Thus the combined affinity $A^{\mathrm{All}}$ is higher for the pair in Figure 9b than for the pair in Figure 9a. In summary, $A^{\mathrm{All}}$ is able not only to reject wrongly identified nearest neighbors by $A^{(k)}$, but also to find new correct nearest neighbors that are missed by $A^{(k)}$.

In the third experiment, we incorporate the small angular perturbation into the random rewiring model according to Eq. (49).
Specifically, we assume that the distribution of the angular error follows the von Mises distribution,
\[
\gamma(\varepsilon) = \frac{e^{\kappa \cos(\varepsilon)}}{2\pi I_0(\kappa)}, \tag{54}
\]
where $I_0(\kappa)$ is the modified Bessel function of order 0. The parameter $\kappa$ controls the concentration of the distribution. For this particular distribution, $c_k = \mathbb{E}(e^{\iota k \varepsilon}) = I_k(\kappa) / I_0(\kappa)$, where $I_k(\kappa)$ is the modified Bessel function of order $k$ for $k > 0$.

Fig. 10: Comparing the performance of different affinities according to $A^{(k)}$, $A^{\mathrm{All}}$, $S^{\mathrm{All}}$, and $B^{(k)}$. (a) Distributions $\gamma(\varepsilon)$ of $\varepsilon_{ij}$ for $\kappa = 500$ and $\kappa = 64$. (b)-(d) The proportion of the estimated nearest neighbors that satisfy $\langle \pi(x_i), \pi(x_j) \rangle > 0.95$ for (b) $\kappa \to \infty$, (c) $\kappa = 500$, and (d) $\kappa = 64$.

Fig. 11: Samples of simulated projection images of the 70S ribosome. From left to right: clean projection images, and images contaminated by additive white Gaussian noise with signal to noise ratio SNR $= 0.05$, $0.01$, and $0.008$.

The clean geometric neighborhood graph is constructed by connecting points where $\langle \pi(x_i), \pi(x_j) \rangle > 0.7$ with 10,000 frames. We fix $p = 0.08$ (92% of the clean edges are randomly rewired) and vary the parameter $\kappa$ in the von Mises distribution. We show the accuracy of the 50-nearest neighbor identification in Figure 10. Figure 10a depicts the distribution of the angle $\varepsilon$ with $\kappa = 500$ and $\kappa = 64$. Figure 10b shows the results for the random rewiring model without angular perturbation, where the performance of $A^{(k)}$ is consistently better than that of $B^{(k)}$. Comparing Figure 10b with Figure 8c, we find that all approaches achieve higher accuracy in the nearest neighbor identification from a more densely connected graph. From Figures 10b-10d, we find that the performance of $B^{(k)}$ is stable under small angular perturbation.
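To make the dependence of $c_k$ on $\kappa$ concrete, the following sketch evaluates $c_k = \mathbb{E}(e^{\iota k \varepsilon})$ for the von Mises density by direct numerical integration, which avoids relying on any Bessel function library. The function name `von_mises_ck` and the number of quadrature nodes are assumptions made for this illustration. The comparison with $e^{-k^2/(2\kappa)}$ in the usage loop is the standard large-$\kappa$ Gaussian approximation and is included only as a plausibility check.

```python
import math


def von_mises_ck(k, kappa, num_nodes=4096):
    """c_k = E[exp(i*k*eps)] for eps ~ von Mises(0, kappa), i.e. I_k(kappa) / I_0(kappa).

    Periodic trapezoidal rule on [-pi, pi). The weight is rescaled by
    exp(-kappa) so that large kappa does not overflow; the rescaling cancels
    in the ratio.
    """
    numerator = 0.0
    denominator = 0.0
    for j in range(num_nodes):
        eps = -math.pi + 2.0 * math.pi * j / num_nodes
        weight = math.exp(kappa * (math.cos(eps) - 1.0))
        numerator += math.cos(k * eps) * weight  # the sine part vanishes by symmetry
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    for kappa in (500.0, 64.0):
        for k in (1, 5, 10, 20):
            approx = math.exp(-k * k / (2.0 * kappa))
            print(f"kappa={kappa:5.0f} k={k:2d} c_k={von_mises_ck(k, kappa):.4f} "
                  f"gaussian approx={approx:.4f}")
```

For fixed $k$ the coefficient $c_k$ shrinks as $\kappa$ decreases, and for fixed $\kappa$ it shrinks as $k$ grows, so stronger angular perturbation attenuates the high-frequency channels the most.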
In comparison, the performance of the single frequency affinity $A^{(k)}$ deteriorates as $\kappa$ decreases. This is because $c_k$ gets smaller as $\kappa$ decreases, and both the top spectral gap and the top eigenvalue of $\mathbb{E} H^{(k)}$ depend on $c_k$. Despite this, the combined scores still achieve higher accuracy than $A^{(k)}$ and $B^{(k)}$.

7.2 Experiments with Simulated Cryo-EM Images

In this section, we apply multi-frequency class averaging to simulated cryo-EM projection images. For each image, the goal is to identify projection images with similar viewing directions. We simulate $N = 10{,}000$ clean projection images of size $129 \times 129$ pixels from a 3-D electron density map of the 70S ribosome. The orientations for the projection images are uniformly distributed over $\mathrm{SO}(3)$. The clean images are contaminated by additive white Gaussian noise with different signal to noise ratios (SNRs). Sample images are presented in Figure 11. Here, we do not consider the effects of contrast transfer functions (CTFs) on the images.

In order to initially identify similar images and the corresponding rotational alignments, we first expand each image on a steerable basis, and denoise the images by using steerable PCA (sPCA) [73]. Then we generate the rotationally invariant features [75] from the filtered expansion coefficients to efficiently identify nearest neighbors without performing all pairwise alignments. The optimal alignment parameters are estimated between initial nearest neighbor pairs. The initial nearest neighbor list and alignment parameters are used to construct the initial graph. For clean images, the initial graph corresponds to the true neighborhood graph. For the extremely noisy images illustrated in Figure 11, the initial similarity measure is corrupted by noise, and images of totally different views can be misidentified as nearest neighbors.
Fig. 12: Bar plots of the top 20 eigenvalues at different frequency $k$ and signal to noise ratio (SNR) for simulated cryo-EM projection images. Panels are arranged by frequency ($k = 1, 3, 5$) and noise level (clean, SNR $= 0.05$, $0.01$, $0.008$).

In Figure 12, we present the spectral patterns of the top eigenvalues of $\widetilde{H}^{(k)}$ built from our initial neighborhood identification and rotational alignment. At high SNR, such as SNR $\geq 0.05$, we can clearly observe the multiplicities $2k+1, 2k+3, 2k+5, \ldots$ and the spectral gaps. As the SNR decreases, such spectral patterns deteriorate. In Figure 13, we present the scatter plots of $A^{(k)}_{ij}$ against $(\langle \pi(x_i), \pi(x_j) \rangle + 1)^k / 2^k$ at different SNRs. Similar to the synthetic dataset, the $A^{(k)}_{ij}$'s at frequency $k = 1$ fail at low SNRs, such as SNR $= 0.01$ and $0.008$, while the $A^{(k)}_{ij}$'s at frequency $k = 5, 10$ are still able to distinguish the images with similar viewing directions (i.e., $(\langle \pi(x_i), \pi(x_j) \rangle + 1)^k / 2^k \approx 1$). This result indicates that better neighborhood image identification can be attained using higher frequency $k$. Moreover, Figure 14 shows the scatter plots of the combined affinity against the dot products $\langle \pi(x_i), \pi(x_j) \rangle$ between the true viewing angles at varying SNRs. Even at SNR $= 0.01$, the combined affinity $A^{\mathrm{All}}_{ij}$ is still able to distinguish projection images that have similar views $\pi(x)$, in contrast to the approximation results in Figure 13. In Figure 15, we evaluate the results by plotting the histogram of the angles $\arccos \langle \pi(x_i), \pi(x_j) \rangle$ between the viewing directions of all identified neighboring images $I_i$ and $I_j$. At high SNR, such as SNR $= 0.05$, using single frequency information at $k = 1, 3, 5$ can achieve similar results as combining all the frequencies. At low SNRs, such as SNR $= 0.01$ and $0.008$, $A^{\mathrm{All}}$, which uses the information from all frequencies up to $k = 20$, outperforms the results obtained from using only a single frequency at $k = 1, 3, 5$.

Fig. 13: Scatter plots of $A^{(k)}_{ij}$ against $(\langle \pi(x_i), \pi(x_j) \rangle + 1)^k / 2^k$ for clean and noisy projection images with SNR $= 0.05$, $0.01$, $0.008$, at frequency $k = 1$, $5$, and $10$.

Fig. 14: Scatter plots for the log multi-frequency class averaging affinity $\log A^{\mathrm{All}}_{ij}$ against $\langle \pi(x_i), \pi(x_j) \rangle$ for (a) clean projections and noisy projection images with (b) SNR $= 0.05$, (c) SNR $= 0.01$, and (d) SNR $= 0.008$.

Fig. 15: Histogram of the angles ($x$-axis, in degrees) between the viewing directions of 10,000 simulated cryo-EM projection images and their 50 neighboring projection images at different SNRs: (a) clean projection images, (b) SNR $= 0.05$, (c) SNR $= 0.01$, (d) SNR $= 0.008$. Here we set the maximum frequency $k_{\max} = 20$.

In Figure 16, we compare the nearest neighbor classification results using affinities $A^{(k)}$, $B^{(k)}$, $A^{\mathrm{All}}$, and $S^{\mathrm{All}}$ at various frequency indices $k$ for noisy images with SNR $= 0.05$, $0.01$, and $0.008$. The latter two affinities combine $A^{(k')}$ for $k' = 1, \ldots, k$. Each image is identified with 50 nearest neighbors and we evaluate the proportion of the estimated nearest neighbors that satisfy $\langle \pi(x_i), \pi(x_j) \rangle > 0.9$. At SNR $= 0.05$, all approaches achieve high accuracy (see Figure 16a). At SNR $= 0.01$, $A^{(k)}$ is able to achieve better classification results than $B^{(k)}$ for $k$ between 4 and 32, and the proportion reaches 67.7% for $A^{(k)}$ at $k = 22$. Using $A^{\mathrm{All}}$ can improve the results further at $k = 40$, where the proportion reaches 68.3%. The improvement of $A^{(k)}$ and $A^{\mathrm{All}}$ compared with $B^{(k)}$ gets more prominent at lower SNR (see Figure 16c with SNR $= 0.008$).

Fig. 16: Comparing the performance of different affinities according to $A^{(k)}$, $A^{\mathrm{All}}$, $S^{\mathrm{All}}$, and $B^{(k)}$ for varying $k$ on noisy cryo-EM images with (a) SNR $= 0.05$, (b) SNR $= 0.01$, and (c) SNR $= 0.008$. We evaluate the proportion of the estimated nearest neighbors that satisfy $\langle \pi(x_i), \pi(x_j) \rangle > 0.9$.

Fig. 17: Histograms of the affinities $A^{(k)}_{ij}$ with $k = 1, \ldots, 40$ for (a) a pair of wrongly identified nearest neighbors by $A^{(1)}$, with $\langle \pi(x_i), \pi(x_j) \rangle = -0.55$, (b) a good nearest neighbor pair identified by $A^{\mathrm{All}}$, with $\langle \pi(x_i), \pi(x_j) \rangle = 0.99$, and (c) a wrongly identified nearest neighbor by $A^{\mathrm{All}}$, with $\langle \pi(x_i), \pi(x_j) \rangle = -0.22$. The simulated cryo-EM images are of SNR $= 0.01$.

We note that the construction of the initial graph structure relies on the evaluation of the rotationally invariant distance based on the steerable PCA expansion coefficients of the projection images [74, 73, 75]. Thus the noise model is different from the probabilistic models in Section 6, and the perturbation at each edge is induced by the noise on the corresponding two nodes. Despite the difference in the noise model, we still observe the benefit of using $A^{(k)}$ with $k > 1$. However, the improvement of the combined affinity $A^{\mathrm{All}}$ is not as impressive as in the examples shown in Figure 8 and Figure 10. Although we observe that certain misclassified nearest neighbors by $A^{(1)}$ can be corrected by $A^{\mathrm{All}}$, as shown in Figure 17a, there are still some wrong nearest neighbors that enjoy consistently high affinities across different $k$'s, as shown in Figure 17c.
8 Conclusion and Future Work

We propose in this paper a novel algorithm, referred to as multi-frequency class averaging (MFCA), for classifying noisy projection images in three-dimensional cryo-electron microscopy by the similarity among viewing directions. The new algorithm is a generalization of the eigenvector-based approach to intrinsic classification that first appeared in [63, 38]. We also extend the representation theoretic framework of [37, 38] by means of explicit constructions involving the Wigner $D$-matrices, which completely characterize the spectral information of a generalized localized parallel transport operator acting on sections of certain complex line bundles over the two-dimensional unit sphere in $\mathbb{R}^3$; these theoretical results conceptually establish the admissibility and (improved) stability of the new MFCA algorithm.

One intriguing future direction is to investigate refined and more systematic aggregations of the results obtained from each individual frequency channel. Potential candidates include (1) the harmonic-retrieval-type transformations as in multi-frequency phase synchronization [32], (2) cross-frequency invariant features such as the bispectrum [41, 8], and (3) tensor-based optimizations for multi-dimensional arrays [43, 58, 1]. The main idea is to further exploit the redundancy in the reconstructed information across different irreducible representations. A direct extension of the MFCA theoretical framework could be a refined geometric interpretation of the multi-frequency vector diffusion maps [24] in terms of aggregating invariant embeddings of the same underlying base manifold from multiple associated vector bundles of a fixed common principal bundle.
Another future direction of interest is to integrate the multi-frequency methodology into existing algorithmic approaches for tackling the heterogeneity problem in cryo-EM imaging analysis and comparative biology [2, 46, 31]. In the context of cryo-EM, this problem occurs when molecules in distinct conformations coexist in solution, and thus images collected in cryo-EM imaging from random orientations should typically be first clustered into subgroups (using e.g. the maximum likelihood classification approaches [60, 57]) before single-particle reconstruction techniques can be applied to each individual subgroup. Recent studies [16, 28, 27] even provided evidence for a continuous distribution of conformation states present in solution, which is far beyond the capability of maximum likelihood classification methods. We expect a significant performance boost and sharper theoretical results from extensions of the multi-frequency methodology to these problems.

Acknowledgements The authors thank Vera Mikyoung Hur, Jared Bronski, Shmuel Weinberger, and Shamgar Gurevich for useful discussions.

Appendix A Basics on Group and Representation Theory

A group $G$ is a set with a multiplication operation $G \times G \to G$ obeying the following axioms:
1. For any $x, y \in G$, $xy \in G$ (closure);
2. For any $x, y, z \in G$, $(xy)z = x(yz)$ (associativity);
3. There is a unique element of $G$, denoted $e$ and called the identity, for which $ex = xe = x$ for any $x \in G$;
4. For any $x \in G$ there is a corresponding element $x^{-1} \in G$, called the inverse of $x$, which satisfies $x x^{-1} = x^{-1} x = e$.
The group operation may not be commutative, i.e., $xy$ is not necessarily equal to $yx$. This is crucial for our present purposes since 3D rotations do not commute. We have a group $G$ acting on a set $X$.
This means that each $g \in G$ has the corresponding transformations based on a left (group) action $L_g : X \to X$ and a right (group) action $R_g : X \to X$. A left (group) action of $G$ on $X$ is a rule for combining elements $g \in G$ and elements $x \in X$, denoted by $g \triangleright x$. We additionally require the following three axioms.
1. $g \triangleright x \in X$ for all $x \in X$ and $g \in G$.
2. $e \triangleright x = x$ for all $x \in X$.
3. $g_2 \triangleright (g_1 \triangleright x) = (g_2 g_1) \triangleright x$ for all $x \in X$ and $g_1, g_2 \in G$.
A right (group) action of $G$ on $X$ is a rule for combining elements $g \in G$ and elements $x \in X$, denoted by $x \triangleleft g$. We additionally require the following three axioms.
1. $x \triangleleft g \in X$ for all $x \in X$ and $g \in G$.
2. $x \triangleleft e = x$ for all $x \in X$.
3. $(x \triangleleft g_1) \triangleleft g_2 = x \triangleleft (g_1 g_2)$ for all $x \in X$ and $g_1, g_2 \in G$.
The action of $G$ on $X$ extends to functions on $X$ as shown in (19). In the paper we focus on two groups, namely $\mathrm{SO}(2)$ and $\mathrm{SO}(3)$. Both are compact Lie groups and admit irreducible representations. The group $\mathrm{SO}(2)$ is commutative and thus its irreducible representations are one-dimensional, given by the complex numbers $\rho_k(w(\theta)) = e^{\iota k \theta}$ for $w \in \mathrm{SO}(2)$ with a rotational angle $\theta \in [0, 2\pi)$. The irreducible representations of $\mathrm{SO}(3)$ are given by the Wigner $D$-matrices, which will be described in the subsection below.

Appendix A.1 Wigner's $D$- and $d$-Matrices

In this section we recall the definition and relevant properties of Wigner's $D$- and $d$-matrices, which are used extensively in the paper for explicit computations related to the irreducible representations of $\mathrm{SO}(3)$.
Recall that elements of $\mathrm{SO}(3)$ are realized as rotation matrices parameterized by Euler angles $(\varphi, \vartheta, \psi) \in [0, 2\pi) \times [0, \pi] \times [0, 2\pi)$: each $x \in \mathrm{SO}(3)$ can be explicitly written as
\[
x = x(\varphi, \vartheta, \psi) =
\begin{pmatrix}
\cos\varphi \cos\psi - \sin\varphi \sin\psi \cos\vartheta & -\cos\varphi \sin\psi - \sin\varphi \cos\psi \cos\vartheta & \sin\varphi \sin\vartheta \\
\sin\varphi \cos\psi + \cos\varphi \sin\psi \cos\vartheta & -\sin\varphi \sin\psi + \cos\varphi \cos\psi \cos\vartheta & -\cos\varphi \sin\vartheta \\
\sin\psi \sin\vartheta & \cos\psi \sin\vartheta & \cos\vartheta
\end{pmatrix}. \tag{55}
\]
Note that this is equivalent to writing $x = R_3(\varphi) R_1(\vartheta) R_3(\psi)$, where, for an angle $t$,
\[
R_1(t) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos t & -\sin t \\ 0 & \sin t & \cos t \end{pmatrix}, \quad
R_2(t) = \begin{pmatrix} \cos t & 0 & \sin t \\ 0 & 1 & 0 \\ -\sin t & 0 & \cos t \end{pmatrix}, \quad
R_3(t) = \begin{pmatrix} \cos t & -\sin t & 0 \\ \sin t & \cos t & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
are the rotations about the three coordinate axes. The last column in the matrix representation (55) is exactly the viewing direction corresponding to $x \in \mathrm{SO}(3)$. For simplicity of statements, we denote the viewing direction of $x \in \mathrm{SO}(3)$ as
\[
\pi(x) = \pi(x(\varphi, \vartheta, \psi)) = (\sin\varphi \sin\vartheta, \; -\cos\varphi \sin\vartheta, \; \cos\vartheta)^\top \in \mathbb{R}^3.
\]
For each integer $\ell = 0, 1, 2, \ldots$, the Wigner $D$-matrix $\mathrm{SO}(3) \ni x \mapsto D^\ell(x) \in \mathbb{C}^{(2\ell+1) \times (2\ell+1)}$ is the unique (up to isomorphism) irreducible matrix representation of $\mathrm{SO}(3)$ of index $\ell$. For each $x \in \mathrm{SO}(3)$, $D^\ell(x)$ is a $(2\ell+1)$-by-$(2\ell+1)$ complex unitary matrix, of which the entries we denote by $D^\ell_{mn}(x)$ ($-\ell \leq m, n \leq \ell$). As group representations, we have for any $\ell = 0, 1, \ldots$ and any $x, x' \in \mathrm{SO}(3)$ the multiplicative formula
\[
D^\ell(x') \, D^\ell(x) = D^\ell(x' \triangleright x). \tag{56}
\]
The $2\ell+1$ entries in the central column of $D^\ell$, i.e., $D^\ell_{m0}$ ($-\ell \leq m \leq \ell$), give rise to the $2\ell+1$ independent spherical harmonics of degree $\ell$. More generally, the $2\ell+1$ entries in the $s$th column ($-\ell \leq s \leq \ell$) of $D^\ell$ give rise to the $2\ell+1$ independent spin-weighted spherical harmonics of degree $\ell$ and weight $s$ [20, 33]. Using the Euler angles, Wigner's $D$-matrices can be written explicitly as
\[
D^\ell_{mn}(\varphi, \vartheta, \psi) := D^\ell_{mn}(x(\varphi, \vartheta, \psi)) = e^{-\iota m \varphi} \, d^\ell_{mn}(\vartheta) \, e^{-\iota n \psi}, \quad m, n = -\ell, \ldots, \ell, \tag{57}
\]
where the matrices $d^\ell(\vartheta)$ are known as Wigner's $d$-matrices. They are real $(2\ell+1)$-by-$(2\ell+1)$ matrices with an explicit formula for the $(m, n)$th entry,
\[
d^\ell_{mn}(\vartheta) = (-1)^{\ell - n} \left[ (\ell+m)! \, (\ell-m)! \, (\ell+n)! \, (\ell-n)! \right]^{1/2} \sum_s \frac{(-1)^s \left( \cos\frac{\vartheta}{2} \right)^{m+n+2s} \left( \sin\frac{\vartheta}{2} \right)^{2\ell - m - n - 2s}}{s! \, (\ell - m - s)! \, (\ell - n - s)! \, (m + n + s)!},
\]
with the sum running over all $s \in \mathbb{Z}$ that make sense of the factorials [50, §3.3.2]. We will only need the explicit form of $d^\ell_{mn}$ for the special case $m = n = -\ell$: in this case it is straightforward to verify that the summation consists of only one term, $s = 2\ell$, and hence
\[
d^\ell_{-\ell, -\ell}(\vartheta) = \left( \cos\frac{\vartheta}{2} \right)^{2\ell} = \left( \cos^2\frac{\vartheta}{2} \right)^{\ell} = \left( \frac{1 + \cos\vartheta}{2} \right)^{\ell}. \tag{58}
\]
Alternatively, $d^\ell_{mn}$ can also be written explicitly in terms of Jacobi polynomials as (see e.g. [50, §13.1.1])
\[
d^\ell_{mn}(\vartheta) = 2^{-m} \left[ \frac{(\ell-m)! \, (\ell+m)!}{(\ell-n)! \, (\ell+n)!} \right]^{1/2} (1 - \cos\vartheta)^{\frac{m-n}{2}} (1 + \cos\vartheta)^{\frac{m+n}{2}} P^{(m-n, \, m+n)}_{\ell - m}(\cos\vartheta), \tag{59}
\]
where $\{ P^{(a,b)}_n : n = 0, 1, 2, \ldots \}$ denotes the sequence of Jacobi polynomials with parameters $a, b$ [50, §13.1.1]. This gives rise to the explicit formula for the diagonal entries of the Wigner $d$-matrices:
\[
d^\ell_{mm}(\vartheta) = 2^{-m} (1 + \cos\vartheta)^m P^{(0, 2m)}_{\ell - m}(\cos\vartheta). \tag{60}
\]
In particular, we see directly from (59) that
\[
d^\ell_{mn}(0) = \delta_{mn} P^{(0, 2m)}_{\ell - m}(1) = \delta_{mn} \cdot \binom{\ell - m}{\ell - m} = \delta_{mn}, \tag{61}
\]
where $\delta_{mn}$ is the Kronecker delta, equal to 1 if $m = n$ and 0 otherwise. If the Euler angles of $x'$ take the form $(0, 0, \psi)$, then by (56) we have
\[
D^\ell_{mn}((0, 0, \psi) \triangleright x) = \sum_{s = -\ell}^{\ell} D^\ell_{ms}(0, 0, \psi) \, D^\ell_{sn}(x) \overset{(57)}{=} \sum_{s = -\ell}^{\ell} d^\ell_{ms}(0) \, e^{-\iota s \psi} D^\ell_{sn}(x) \overset{(61)}{=} e^{-\iota m \psi} D^\ell_{mn}(x). \tag{62}
\]
We will need this relation in the proof of Theorem 2.
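The identities (58), (60), and (61) can be checked numerically for small $\ell$. The sketch below implements the factorial sum for $d^\ell_{mn}$ and the explicit binomial sum for Jacobi polynomials with first parameter 0 using only the Python standard library. The function names `wigner_d` and `jacobi_0_beta` are assumptions made for this illustration, and the check of (60) is restricted to $m \geq 0$ so that the Jacobi parameter $2m$ is nonnegative.

```python
from math import comb, cos, factorial, sin, sqrt


def wigner_d(l, m, n, theta):
    """Entry d^l_{mn}(theta) from the explicit factorial sum."""
    c, s = cos(theta / 2.0), sin(theta / 2.0)
    prefactor = (-1) ** (l - n) * sqrt(
        factorial(l + m) * factorial(l - m) * factorial(l + n) * factorial(l - n)
    )
    total = 0.0
    # Keep only the indices for which every factorial argument is nonnegative.
    for t in range(max(0, -(m + n)), min(l - m, l - n) + 1):
        total += (
            (-1) ** t
            * c ** (m + n + 2 * t)
            * s ** (2 * l - m - n - 2 * t)
            / (factorial(t) * factorial(l - m - t)
               * factorial(l - n - t) * factorial(m + n + t))
        )
    return prefactor * total


def jacobi_0_beta(degree, beta, z):
    """Jacobi polynomial P^{(0, beta)}_degree(z) via its explicit binomial sum."""
    return sum(
        comb(degree, v) * comb(degree + beta, v)
        * ((z - 1.0) / 2.0) ** v * ((z + 1.0) / 2.0) ** (degree - v)
        for v in range(degree + 1)
    )


if __name__ == "__main__":
    theta = 0.7
    for l in range(4):
        lhs = wigner_d(l, -l, -l, theta)
        rhs = ((1.0 + cos(theta)) / 2.0) ** l
        print(f"l={l}: d_(-l,-l)={lhs:.6f}, ((1+cos)/2)^l={rhs:.6f}")
```

Only the diagonal entries are compared with the Jacobi form here, since those are the entries used in the eigenvalue computation and they do not depend on the sign convention chosen for off-diagonal entries.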
Recall from [66, pp. 21–22] that Euler angles admit physical interpretations for the rotation matrix: if we denote the canonical right-handed orthonormal basis in $\mathbb{R}^3$ by $\{e_1, e_2, e_3\}$, and write $R_{e_i}(\alpha) \in \mathrm{SO}(3)$ for the rotation around axis $e_i$ ($i = 1, 2, 3$) by angle $\alpha$, then rotation by $x(\varphi, \vartheta, \psi) \in \mathrm{SO}(3)$ is equivalent to (i) rotation by angle $\varphi$ around $e_3$, (ii) rotation by angle $\vartheta$ around the new axis $e_2' = R_{e_3}(\varphi) e_2$, and (iii) rotation by angle $\psi$ around the new axis $e_3' = R_{e_3}(\varphi) R_{e_2}(\vartheta) e_3$. From this geometric interpretation, it is clear that the action of $\mathrm{SO}(2)$ on $\mathrm{SO}(3)$ considered throughout this paper only affects the Euler angle $\psi$. In other words, under the canonical identification of $\mathrm{SO}(2)$ with $\mathrm{SO}(3)$ elements of the form
\[
g = g(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \alpha \in [0, 2\pi), \tag{63}
\]
we have $x(\varphi, \vartheta, \psi) \triangleleft g(\alpha) = x(\varphi, \vartheta, \psi + \alpha)$. Together with (57), this implies
\[
D^\ell_{mn}(x(\varphi, \vartheta, \psi) \triangleleft g(\alpha)) = D^\ell_{mn}(x(\varphi, \vartheta, \psi + \alpha)) = e^{-\iota n \alpha} D^\ell_{mn}(x(\varphi, \vartheta, \psi)) = \rho_n(g^{-1}) \, D^\ell_{mn}(x(\varphi, \vartheta, \psi)), \tag{64}
\]
where again $\rho_n$ stands for the complex unitary irreducible representation of $\mathrm{SO}(2)$ of character $n$.

Appendix B Spectral Analysis of the Local Generalized Parallel Transport Operators

We prove Theorem 2 to Theorem 5 in this appendix.

Appendix B.1 Proof of Theorem 2

We begin with the isotypic decomposition (20), (22). Following (25), our strategy is to find a "good point" $x_0 \in \mathrm{SO}(3)$ and a "good function" $u \in \mathcal{H}_{n, -k}$ ($n \geq |k|$) such that $u(x_0) \neq 0$, and evaluate
\[
\lambda^{(k)}_n(h) = \frac{\left( T^{(k)}_h u \right)(x_0)}{u(x_0)}. \tag{65}
\]
To this end, pick the following basis for the Lie algebra $\mathfrak{so}(3)$:
\[
A_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
A_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad
A_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]
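Two elementary facts about these matrices can be verified mechanically: the cyclic bracket relations of the basis $A_1, A_2, A_3$, and the statement that right multiplication by $g(\alpha)$ in (63) only shifts the Euler angle $\psi$. The sketch below uses plain nested lists. The helper names are hypothetical, and `euler_zxz` assumes the z-x-z reading of the Euler-angle matrix, in which the third row is $(\sin\psi \sin\vartheta, \cos\psi \sin\vartheta, \cos\vartheta)$.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(3)) for j in range(3)]
            for i in range(3)]


def commutator(a, b):
    """Lie bracket [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]


A1 = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
A2 = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
A3 = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]


def euler_zxz(phi, theta, psi):
    """Rotation matrix x(phi, theta, psi), assuming the z-x-z convention."""
    cp, sp = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    cs, ss = math.cos(psi), math.sin(psi)
    return [
        [cp * cs - sp * ss * ct, -cp * ss - sp * cs * ct, sp * st],
        [sp * cs + cp * ss * ct, -sp * ss + cp * cs * ct, -cp * st],
        [ss * st, cs * st, ct],
    ]


def in_plane(alpha):
    """The SO(2) element g(alpha) embedded in SO(3)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    print(commutator(A1, A2) == A3, commutator(A2, A3) == A1, commutator(A3, A1) == A2)
    shifted = matmul(euler_zxz(0.3, 1.1, 0.5), in_plane(0.4))
    target = euler_zxz(0.3, 1.1, 0.9)
    print(max(abs(shifted[i][j] - target[i][j]) for i in range(3) for j in range(3)))
```

Because the third column of $g(\alpha)$ is $e_3$, the last column of the product, which is the viewing direction, is left unchanged by the in-plane rotation.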
It is straightforward to check that these elements satisfy the commutator relations
\[
[A_3, A_1] = A_2, \quad [A_2, A_3] = A_1, \quad [A_1, A_2] = A_3.
\]
We fix $x_0 = I_3$, the canonical standard orthonormal frame in $\mathbb{R}^3$. We further equip $\mathrm{SO}(3)$ with standard spherical coordinates (the Euler angles) of the form
\[
x = x(\varphi, \vartheta, \psi) = x_0 \triangleleft e^{\varphi A_3} e^{\vartheta A_2} e^{\psi A_3},
\]
where $(\varphi, \vartheta, \psi) \in (0, 2\pi) \times (0, \pi) \times (0, 2\pi)$, as in [38, §3.2.1]. The normalized Haar measure on $\mathrm{SO}(3)$ is given by the density $\frac{\sin\vartheta}{8\pi^2} \, d\varphi \, d\vartheta \, d\psi$.

Consider the subgroup $T_{A_3}$ of $\mathrm{SO}(3)$ generated by the infinitesimal element $A_3$. For every $k \in \mathbb{Z}$ and $n \in \mathbb{N}$ with $n \geq |k|$, the Hilbert space $\mathcal{H}_{n, -k}$ admits yet another isotypic decomposition with respect to the left action of $T_{A_3}$:
\[
\mathcal{H}_{n, -k} = \bigoplus_{m = -n}^{n} \mathcal{H}^m_{n, -k}, \tag{66}
\]
where $s \in \mathcal{H}^m_{n, -k}$ if and only if
\[
s\left( e^{-t A_3} \triangleright x \right) = e^{\iota m t} s(x) \quad \text{for every } x \in \mathrm{SO}(3) \text{ and } t \in \mathbb{R}. \tag{67}
\]
As pointed out in [38, §3.3.1], elements of $\mathcal{H}^m_{n, -k}$ are often referred to as (generalized) spherical functions. In the physics literature, they are also known as spin-weighted spherical functions, which are closely related with Wigner $D$-matrices [10, 20, 12, 34, 51].

We extend the computation in [38, §3] to $k > 1$ by fully leveraging properties of the Wigner $D$-matrices. In fact, we are going to fix $m = -k$ and choose the "good function" $u$ as $D^n_{-k, -k}$, the $(-k, -k)$th entry of the Wigner $D$-matrix of weight $n$, for any $n \geq |k|$: it is clear from (64) that $D^n_{-k, -k} \in \mathcal{H}_{-k}$ for any $n \geq |k|$, and from (62) we know that $D^n_{-k, -k}$ satisfies (67) with $m = -k$. Our goal is to evaluate
\[
\lambda^{(k)}_n(h) = \frac{\left( T^{(k)}_h D^n_{-k, -k} \right)(x_0)}{D^n_{-k, -k}(x_0)}. \tag{68}
\]
Now, on the one hand we have
\[
D^n_{-k, -k}(x_0) = D^n_{-k, -k}(0, 0, 0) \overset{(57)}{=} d^n_{-k, -k}(0) \overset{(61)}{=} 1. \tag{69}
\]
On the other hand, note that by the invariance and equivariance of the transport data (17) we have for any $x = x(\varphi, \vartheta, \psi) \in \mathrm{SO}(3)$
\begin{align*}
T^{(k)}(x_0, x) &= T^{(k)}\left( x_0, \, x_0 \triangleleft e^{\varphi A_3} e^{\vartheta A_2} e^{\psi A_3} \right) = T^{(k)}\left( x_0, \, e^{\varphi A_3} \triangleright x_0 \triangleleft e^{\vartheta A_2} e^{\psi A_3} \right) \\
&= T^{(k)}\left( e^{-\varphi A_3} \triangleright x_0, \, x_0 \triangleleft e^{\vartheta A_2} e^{\psi A_3} \right) = T^{(k)}\left( x_0 \triangleleft e^{-\varphi A_3}, \, x_0 \triangleleft e^{\vartheta A_2} e^{\psi A_3} \right) \\
&= e^{\iota k \varphi} \, T^{(k)}\left( x_0, \, x_0 \triangleleft e^{\vartheta A_2} \right) e^{\iota k \psi},
\end{align*}
and
\[
D^n_{-k, -k}(x) = D^n_{-k, -k}(\varphi, \vartheta, \psi) = e^{-\iota k \varphi} \, d^n_{-k, -k}(\vartheta) \, e^{-\iota k \psi} = e^{-\iota k \varphi} \, D^n_{-k, -k}\left( x_0 \triangleleft e^{\vartheta A_2} \right) e^{-\iota k \psi}.
\]
Therefore,
\begin{align*}
\left( T^{(k)}_h D^n_{-k, -k} \right)(x_0) &= \int_{B(x_0, \alpha)} T^{(k)}(x_0, x) \, D^n_{-k, -k}(x) \, dx \\
&= \int_{B(x_0, \alpha)} T^{(k)}\left( x_0, \, x_0 \triangleleft e^{\vartheta A_2} \right) D^n_{-k, -k}\left( x_0 \triangleleft e^{\vartheta A_2} \right) dx(\varphi, \vartheta, \psi) \\
&= \int_{B(x_0, \alpha)} \rho_k\left( T\left( x_0, \, x_0 \triangleleft e^{\vartheta A_2} \right) \right) D^n_{-k, -k}\left( x_0 \triangleleft e^{\vartheta A_2} \right) dx(\varphi, \vartheta, \psi) \\
&\overset{(*)}{=} \int_{B(x_0, \alpha)} D^n_{-k, -k}\left( \left( x_0 \triangleleft e^{\vartheta A_2} \right) \triangleleft T\left( x_0 \triangleleft e^{\vartheta A_2}, \, x_0 \right) \right) dx(\varphi, \vartheta, \psi) \\
&\overset{(**)}{=} \int_{B(x_0, \alpha)} D^n_{-k, -k}\left( x_0 \triangleleft e^{\vartheta A_2} \right) dx(\varphi, \vartheta, \psi),
\end{align*}
where $(*)$ used the fact that $D^n_{-k, -k} \in \mathcal{H}_{-k}$, and $(**)$ follows from the definition (15) and the geometric fact that $x_0 \triangleleft e^{\vartheta A_2}$ is exactly the parallel transport of $x_0$ along the unique geodesic connecting $\pi(x_0)$ to $\pi\left( x_0 \triangleleft e^{\vartheta A_2} \right)$:
\[
\left( x_0 \triangleleft e^{\vartheta A_2} \right) \triangleleft T\left( x_0 \triangleleft e^{\vartheta A_2}, \, x_0 \right) = t_{\pi(x_0 \triangleleft e^{\vartheta A_2}), \, \pi(x_0)} \, x_0 = x_0 \triangleleft e^{\vartheta A_2}.
\]
It follows that
\begin{align*}
\left( T^{(k)}_h D^n_{-k, -k} \right)(x_0) &= \int_{B(x_0, \alpha)} D^n_{-k, -k}\left( x_0 \triangleleft e^{\vartheta A_2} \right) dx(\varphi, \vartheta, \psi) \\
&= \frac{1}{(2\pi)^2} \int_0^{2\pi} d\varphi \int_0^{2\pi} d\psi \int_0^{\alpha} \frac{\sin\vartheta}{2} \, D^n_{-k, -k}(0, \vartheta, 0) \, d\vartheta = \int_0^{\alpha} \frac{\sin\vartheta}{2} \, d^n_{-k, -k}(\vartheta) \, d\vartheta.
\end{align*}
Since $d^n_{-k, -k} = d^n_{k, k}$ (see e.g. [50, formula (3.16)]), this further implies
\begin{align*}
\left( T^{(k)}_h D^n_{-k, -k} \right)(x_0) &= \int_0^{\alpha} \frac{\sin\vartheta}{2} \, d^n_{kk}(\vartheta) \, d\vartheta \overset{(60)}{=} 2^{-(k+1)} \int_0^{\alpha} \sin\vartheta \, (1 + \cos\vartheta)^k P^{(0, 2k)}_{n-k}(\cos\vartheta) \, d\vartheta \\
&= -2^{-(k+1)} \int_0^{\alpha} (1 + \cos\vartheta)^k P^{(0, 2k)}_{n-k}(\cos\vartheta) \, d\cos\vartheta \overset{z := \cos\vartheta}{=} 2^{-(k+1)} \int_{1-h}^{1} (1 + z)^k P^{(0, 2k)}_{n-k}(z) \, dz, \tag{70}
\end{align*}
where in the last equality we used $h = 1 - \cos\alpha$. Using the explicit form of Jacobi polynomials (see e.g. [64, Chap. IV, formula (4.2.1)])
\[
P^{(0, 2k)}_{n-k}(z) = \sum_{\nu = 0}^{n-k} \binom{n-k}{n-k-\nu} \binom{n+k}{\nu} \left( \frac{z-1}{2} \right)^{\nu} \left( \frac{z+1}{2} \right)^{n-k-\nu} = \sum_{\nu = 0}^{n-k} \binom{n-k}{\nu} \binom{n+k}{\nu} \left( \frac{z-1}{2} \right)^{\nu} \left( \frac{z+1}{2} \right)^{n-k-\nu},
\]
we have
\begin{align*}
\left( T^{(k)}_h D^n_{-k, -k} \right)(x_0) &= 2^{-(k+1)} \cdot 2^k \sum_{\nu = 0}^{n-k} \binom{n-k}{\nu} \binom{n+k}{\nu} \int_{1-h}^{1} \left( \frac{z-1}{2} \right)^{\nu} \left( \frac{z+1}{2} \right)^{n-\nu} dz \\
&\overset{z := 1 - 2w}{=} \frac{1}{2} \sum_{\nu = 0}^{n-k} (-1)^{\nu} \binom{n-k}{\nu} \binom{n+k}{\nu} \int_0^{h/2} w^{\nu} (1-w)^{n-\nu} \cdot 2 \, dw \\
&= \sum_{\nu = 0}^{n-k} (-1)^{\nu} \binom{n-k}{\nu} \binom{n+k}{\nu} \int_0^{h/2} w^{\nu} (1-w)^{n-\nu} \, dw \\
&= \sum_{\nu = 0}^{n-k} (-1)^{\nu} \binom{n-k}{\nu} \binom{n+k}{\nu} \, \mathrm{B}\left( \frac{h}{2}; \, \nu+1, \, n-\nu+1 \right),
\end{align*}
where $\mathrm{B}(x; a, b) = \int_0^x w^{a-1} (1-w)^{b-1} \, dw$ is the incomplete Beta function. It follows that for all $n \geq |k|$
\[
\lambda^{(k)}_n(h) = \frac{\left( T^{(k)}_h D^n_{-k, -k} \right)(x_0)}{D^n_{-k, -k}(x_0)} = \left( T^{(k)}_h D^n_{-k, -k} \right)(x_0) = \sum_{\nu = 0}^{n-k} (-1)^{\nu} \binom{n-k}{\nu} \binom{n+k}{\nu} \, \mathrm{B}\left( \frac{h}{2}; \, \nu+1, \, n-\nu+1 \right).
\]
From the integral form of the incomplete Beta function it is clear that $\mathrm{B}(h/2; \nu+1, n-\nu+1)$ is a polynomial of degree $(n+1)$ in $h$.
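In particular, by repeatedly applying the recursive relation
\[
\mathrm{B}(x; a+1, b) = \frac{a}{b} \, \mathrm{B}(x; a, b+1) - \frac{1}{b} \, x^a (1-x)^b,
\]
we easily obtain
\[
\lambda^{(k)}_k(h) = \mathrm{B}\left( \frac{h}{2}; 1, k+1 \right) = \frac{1 - (1 - h/2)^{k+1}}{k+1}, \tag{71}
\]
\begin{align*}
\lambda^{(k)}_{k+1}(h) &= \mathrm{B}\left( \frac{h}{2}; 1, k+2 \right) - (2k+1) \, \mathrm{B}\left( \frac{h}{2}; 2, k+1 \right) \\
&= \mathrm{B}\left( \frac{h}{2}; 1, k+2 \right) - \frac{2k+1}{k+1} \, \mathrm{B}\left( \frac{h}{2}; 1, k+2 \right) + \frac{2k+1}{k+1} \left( \frac{h}{2} \right) \left( 1 - \frac{h}{2} \right)^{k+1} \\
&= \frac{2(k+1) \left( 1 - (1 - h/2)^{k+2} \right)}{k+2} - \frac{(2k+1) \left( 1 - (1 - h/2)^{k+1} \right)}{k+1}, \tag{72}
\end{align*}
\begin{align*}
\lambda^{(k)}_{k+2}(h) &= \mathrm{B}\left( \frac{h}{2}; 1, k+3 \right) - 4(k+1) \, \mathrm{B}\left( \frac{h}{2}; 2, k+2 \right) + (k+1)(2k+1) \, \mathrm{B}\left( \frac{h}{2}; 3, k+1 \right) \\
&= \mathrm{B}\left( \frac{h}{2}; 1, k+3 \right) - 2 \, \mathrm{B}\left( \frac{h}{2}; 2, k+2 \right) - (2k+1) \left( \frac{h}{2} \right)^2 \left( 1 - \frac{h}{2} \right)^{k+1} \\
&= \frac{k}{k+2} \, \mathrm{B}\left( \frac{h}{2}; 1, k+3 \right) + \frac{2}{k+2} \left( \frac{h}{2} \right) \left( 1 - \frac{h}{2} \right)^{k+2} - (2k+1) \left( \frac{h}{2} \right)^2 \left( 1 - \frac{h}{2} \right)^{k+1} \\
&= \frac{k}{k+2} \cdot \frac{1 - (1 - h/2)^{k+3}}{k+3} + \frac{2}{k+2} \left( \frac{h}{2} \right) \left( 1 - \frac{h}{2} \right)^{k+2} - (2k+1) \left( \frac{h}{2} \right)^2 \left( 1 - \frac{h}{2} \right)^{k+1}, \tag{73}
\end{align*}
which give rise to (29) and (30). It now remains to compute a quadratic approximation for $\lambda^{(k)}_n(h)$ as $h \to 0$, for all $n \geq |k|$. This can be done by direct computation using the integral form of the incomplete Beta function: for all $n \geq |k|$,
\[
\lambda^{(k)}_n(0) = 0, \tag{74}
\]
\[
\partial_h \lambda^{(k)}_n(0) = \sum_{\nu = 0}^{n-k} (-1)^{\nu} \binom{n-k}{\nu} \binom{n+k}{\nu} \cdot \frac{1}{2} \left( \frac{h}{2} \right)^{\nu} \left( 1 - \frac{h}{2} \right)^{n-\nu} \Bigg|_{h=0} = \frac{1}{2}, \tag{75}
\]
\begin{align*}
\partial_h^2 \lambda^{(k)}_n(0) &= \sum_{\nu = 1}^{n-k} (-1)^{\nu} \binom{n-k}{\nu} \binom{n+k}{\nu} \cdot \frac{\nu}{4} \left( \frac{h}{2} \right)^{\nu - 1} \left( 1 - \frac{h}{2} \right)^{n-\nu} \Bigg|_{h=0} \\
&\quad + \sum_{\nu = 0}^{n-k} (-1)^{\nu} \binom{n-k}{\nu} \binom{n+k}{\nu} \cdot \frac{-(n-\nu)}{4} \left( \frac{h}{2} \right)^{\nu} \left( 1 - \frac{h}{2} \right)^{n-\nu-1} \Bigg|_{h=0} \\
&= -\frac{1}{4} (n-k)(n+k) - \frac{n}{4} = -\frac{1}{4} \left( n^2 + n - k^2 \right), \tag{76}
\end{align*}
and (28) follows from the Taylor expansion
\[
\lambda^{(k)}_n(h) = \lambda^{(k)}_n(0) + h \, \partial_h \lambda^{(k)}_n(0) + \frac{h^2}{2} \, \partial_h^2 \lambda^{(k)}_n(0) + O\left( h^3 \right).
\]
This completes the entire proof of Theorem 2.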
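The eigenvalue formulas obtained in this proof lend themselves to a quick numerical sanity check. The sketch below evaluates $\lambda^{(k)}_n(h)$ as the alternating sum of incomplete Beta functions, compares it with the closed forms (71) and (72), and with the quadratic approximation $h/2 - (n^2 + n - k^2) h^2 / 8$ implied by (74)-(76). All function names are hypothetical, and the incomplete Beta function is evaluated by binomial expansion, which is only meant for the small integer parameters used here.

```python
from math import comb


def incomplete_beta(x, a, b):
    """B(x; a, b) = int_0^x w^(a-1) (1-w)^(b-1) dw for positive integers a, b.

    Obtained by expanding (1-w)^(b-1) with the binomial theorem and
    integrating term by term.
    """
    return sum(comb(b - 1, j) * (-1) ** j * x ** (a + j) / (a + j) for j in range(b))


def eigenvalue(n, k, h):
    """lambda^{(k)}_n(h) as the alternating sum of incomplete Beta functions."""
    return sum(
        (-1) ** v * comb(n - k, v) * comb(n + k, v)
        * incomplete_beta(h / 2.0, v + 1, n - v + 1)
        for v in range(n - k + 1)
    )


def closed_form_top(k, h):
    """Closed form (71) for n = k."""
    return (1.0 - (1.0 - h / 2.0) ** (k + 1)) / (k + 1)


def closed_form_second(k, h):
    """Closed form (72) for n = k + 1."""
    u = 1.0 - h / 2.0
    return (2.0 * (k + 1) * (1.0 - u ** (k + 2)) / (k + 2)
            - (2 * k + 1) * (1.0 - u ** (k + 1)) / (k + 1))


def quadratic_approx(n, k, h):
    """Second-order Taylor approximation around h = 0."""
    return h / 2.0 - (n * n + n - k * k) * h * h / 8.0


if __name__ == "__main__":
    for k in range(4):
        h = 0.5
        print(k, eigenvalue(k, k, h), closed_form_top(k, h),
              eigenvalue(k + 1, k, h), closed_form_second(k, h))
```

For moderate $n$ the alternating sum is well conditioned in double precision; for large $n$ a different evaluation strategy would likely be needed because of cancellation between terms.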
$\square$

Appendix B.2 Proof of Theorem 3

Lemma 1 (1) There exists $h^{(k)}_1 \in (0, 2]$ such that $\lambda^{(k)}_n(h) \leq \lambda^{(k)}_k(h)$ for every $n \geq k+1$ and $h \in (0, h^{(k)}_1]$. (2) There exists $h^{(k)}_2 \in (0, 2]$ such that $\lambda^{(k)}_n(h) \leq \lambda^{(k)}_{k+1}(h)$ for every $n \geq k+2$ and $h \in (0, h^{(k)}_2]$.

Proof (Proof of Lemma 1) Since $\lambda^{(k)}_n(0) = 0$ for all $k \in \mathbb{Z}$ and $n \geq |k|$, we will just compare the first order derivatives $\partial_h \lambda^{(k)}_n(h)$ over an interval with 0 as the left end point. By (70), $\partial_h \lambda^{(k)}_n(h)$ admits a closed form expression in terms of Jacobi polynomials:
\[
\partial_h \lambda^{(k)}_n(h) = \frac{1}{2} \left( 1 - \frac{h}{2} \right)^k P^{(0, 2k)}_{n-k}(1 - h) \overset{h = 1 - \cos\alpha}{=} \frac{1}{2} \left( \frac{1 + \cos\alpha}{2} \right)^k P^{(0, 2k)}_{n-k}(\cos\alpha).
\]
In particular, under the change of coordinates $h = 1 - \cos\alpha$ we have
\begin{align*}
\partial_h \lambda^{(k)}_k(h) &= \frac{1}{2} \left( \frac{1 + \cos\alpha}{2} \right)^k P^{(0, 2k)}_0(\cos\alpha) = \frac{1}{2} \left( \frac{1 + \cos\alpha}{2} \right)^k, \\
\partial_h \lambda^{(k)}_{k+1}(h) &= \frac{1}{2} \left( \frac{1 + \cos\alpha}{2} \right)^k P^{(0, 2k)}_1(\cos\alpha) = \frac{1}{2} \left( \frac{1 + \cos\alpha}{2} \right)^k \left[ 1 - (k+1)(1 - \cos\alpha) \right].
\end{align*}
It is clear that $0 \leq \partial_h \lambda^{(k)}_{k+1}(h) < \partial_h \lambda^{(k)}_k(h)$ for all $h = 1 - \cos\alpha \in (0, 1/(k+1)]$, which together with $\lambda^{(k)}_k(0) = \lambda^{(k)}_{k+1}(0)$ gives rise to
\[
\lambda^{(k)}_{k+1}(h) \leq \lambda^{(k)}_k(h) \quad \text{for all } 0 < h \leq \frac{1}{k+1}. \tag{77}
\]
With (77), the proof of both (1) and (2) of Lemma 1 is reduced to only part (2) of Lemma 1. The remainder of this proof is devoted to establishing (2) of Lemma 1. By the classical result of Szegő [64, Theorem 8.21.13], there exists a fixed positive number $c > 0$ such that
\[
P^{(0, 2k)}_{n-k}(\cos\theta) = \frac{1}{\sqrt{n}} \, k(\theta) \left[ \cos(N\theta + \gamma) + (n \sin\theta)^{-1} O(1) \right], \quad \text{for all } \frac{c}{n} \leq \theta \leq \pi - \frac{c}{n}, \tag{78}
\]
where
\[
k(\theta) = \frac{1}{\sqrt{\pi \sin(\theta/2) \cos(\theta/2)} \cdot \left[ \cos(\theta/2) \right]^{2k}} = \left( \frac{2}{1 + \cos\theta} \right)^k \sqrt{\frac{2}{\pi \sin\theta}}, \quad N = n + \frac{2k+1}{2}, \quad \gamma = -\frac{\pi}{4}.
\]
In particular, b y making the O (1) term in ( 78 ) explicit, we ha ve for some absolute constant C > 0  1 + cos θ 2  k    P 0 , 2 k n − k (cos θ )    ≤ r 2 nπ · 1 √ sin θ  1 + C n sin θ  for all c n ≤ θ ≤ π − c n . (79) Note that the left hand side is precisely the absolute v alue of 2 ∂ h λ ( k ) n ( h ) = 2 ∂ h λ ( k ) n (1 − cos θ ) . W e seek an upp er bound for the right hand side of ( 79 ) that holds uniformly for all sufficiently large n . T o this end, consider the largest zero of P (0 , 2 k ) n − k ( x ) for x ∈ [ − 1 , 1] , denoted as x ∗ n − k = cos α ∗ n − k (th us α ∗ n − k is the smallest zero of the Representation Theoretic Patterns in Multi-F requency Class A veraging for 3D Cryo-EM 33 function α 7→ P (0 , 2 k ) n − k (cos α ) on α ∈ [0 , π ] ). W ell-known estimates for the extreme zero of Jacobi p olynomials (see e.g. [ 19 , §2.2]) indicates x ∗ n − k > 1 − O  1 n 2  as n → ∞ ⇒ α ∗ n − k → 0 as n → ∞ th us for any  1 > 0 there exists N 1 > 0 suc h that for all sufficiently large n ≥ N 1 w e ha ve sin α ∗ n − k ≥ (1 −  1 ) α ∗ n − k (80) since lim x → 0 (sin x ) /x = 1 . In the mean while, [ 22 , Theorem 3.1] b ounds x ∗ n − k = cos α ∗ n − k from ab ov e by x ∗ n − k <  2 k + 1 2  2 + 4 ( n − k )  n + k + 1 2  (2 n − 2 k + 1 + 2 k ) 2 = 4 n 2 + 2 n + 1 / 4 (2 n + 1) 2 = 1 − 2 n + 3 / 4 (2 n + 1) 2 . (81) Using the elemen tary inequalit y 1 − x 2 / 2 ≤ cos x for x ∈ [0 , 2] , ( 81 ) leads to 1 −  α ∗ n − k  2 2 ≤ cos α ∗ n − k = x ∗ n − k < 1 − 2 n + 3 / 4 (2 n + 1) 2 ⇒ α ∗ n − k > s 4 n + 3 / 2 (2 n + 1) 2 → 1 √ n as n → ∞ whic h further implies (1) for sufficien tly large n , α ∗ n − k ∈ [ c/n.π − c/n ] , and (2) by c ho osing n sufficien tly large w e can ensure for the same arbitrary  1 > 0 c hosen for ( 80 ), that, in addition to ( 80 ), there holds α ∗ n − k > 1 −  1 √ n . 
(82) No w consider the smallest local extremum µ ∗ n − k of the function P (0 , 2 k ) n − k (cos α ) for α ∈ [0 , π ] that is larger than α ∗ n − k , i.e., µ ∗ n − k := min n α ∈ [0 , π ] | ∂ α P (0 , 2 k ) n − k (cos α ) = 0 and α ≥ α ∗ n − k o whic h b y ( 82 ) is guaranteed to fall within [ c/n, π − c/n ] . F or an y n ≥ N 0 , by ( 79 ), ( 80 ), and ( 82 ), we hav e  1 + cos µ ∗ n − k 2  k    P (0 , 2 k ) n − k  cos µ ∗ n − k     ≤ r 2 π n · 1 p sin µ ∗ n − k  1 + C n sin µ ∗ n − k  ≤ r 2 π n · 1 p sin α ∗ n − k  1 + C n sin α ∗ n − k  ≤ r 2 π n · 1 q (1 −  1 ) α ∗ n − k  1 + C n (1 −  1 ) α ∗ n − k  < 1 (1 −  1 ) n 1 4 r 2 π  1 + C (1 −  1 ) 2 √ n  . The same inequality holds if we replace µ ∗ n − k with an y other extremum of the function α 7→ P (0 , 2 k ) n − k (cos α ) in α ∈ [ c/n, π − c/n ] . In particular, this implies that for all sufficiently large n we hav e (recalling that h = 1 − cos α ) ∂ h λ ( k ) n ( h ) = ∂ h λ ( k ) n (1 − cos α ) ≤ 1 2  1 + cos α 2  k    P (0 , 2 k ) n − k (cos α )    < 1 4 for all a ∈ [ c/n, π − c/n ] . The rest of the pro of follows easily from the proof of [ 38 , Theorem 4]: Let a 0 ∈ (0 , π ) b e such that ∂ h λ ( k ) k +1 ( h ) = ∂ h λ ( k ) k +1 (1 − cos α ) = 1 2  1 + cos α 2  k [1 − ( k + 1) (1 − cos α )] > 1 4 for all α < α 0 and sufficien tly large n ; the remaining finitely cases can be verified directly as claimed in [ 38 , §A.2.1, pp. 612]. Note that such a v alue α 0 exists b ecause when α = 0 (i.e., h = 1 ) ∂ h λ ( k ) k +1 (1) = 1 2 > 1 4 . 34 Yifeng F an, Tingran Gao, Zhizhen Zhao As argued in [ 38 , §A.2, pp. 611], we set z 0 = cos α 0 and h ( k ) 1 = h ( k ) 2 = 1 + z 0 , whic h ensures ∂ h λ ( k ) n ( z ) ≤ ∂ h λ ( k ) k +1 ( z ) for all z ∈ [ − 1 , z 0 ] , and furthermore λ ( k ) n ( h ) ≤ λ ( k ) k +1 ( h ) , for all n ≥ k + 1 . This prov es (2) of Lemma 1 and completes the entire pro of of Lemma 1 . 
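The derivative comparison behind (77) is easy to check by machine. The sketch below evaluates the Jacobi-polynomial expression for $\partial_h \lambda_n^{(k)}$ using the standard explicit sum for $P_m^{(a,b)}$ with non-negative integer parameters, and confirms the ordering of the two leading derivatives on a grid of $(0, 1/(k+1)]$. The names `jacobi`, `dlam`, and `ordering_holds` are illustrative, and the grid check is only a finite sample, not a proof.

```python
from fractions import Fraction
from math import comb


def jacobi(m, a, b, x):
    """Jacobi polynomial P_m^(a,b)(x) for non-negative integers a, b (explicit sum)."""
    x = Fraction(x)
    return sum(comb(m + a, m - s) * comb(m + b, s)
               * ((x - 1) / 2) ** s * ((x + 1) / 2) ** (m - s)
               for s in range(m + 1))


def dlam(n, k, h):
    """d/dh lambda_n^(k)(h) = (1/2) (1 - h/2)^k P_{n-k}^{(0,2k)}(1 - h)."""
    h = Fraction(h)
    return Fraction(1, 2) * (1 - h / 2) ** k * jacobi(n - k, 0, 2 * k, 1 - h)


def ordering_holds(k, steps=50):
    """Check 0 <= dlam(k+1, k, h) < dlam(k, k, h) on a uniform grid of (0, 1/(k+1)]."""
    top = Fraction(1, k + 1)
    grid = [top * i / steps for i in range(1, steps + 1)]
    return all(0 <= dlam(k + 1, k, h) < dlam(k, k, h) for h in grid)
```

Calling `dlam(n, k, 0)` returns `Fraction(1, 2)` for every admissible `n`, in agreement with (75), which is why the comparison has to be made on the derivatives away from $h = 0$.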
Lemma 2 (1) There exists $N_1^{(k)} > 0$ such that $\lambda_n^{(k)}(h) \le \lambda_k^{(k)}(h)$ for every $n \ge N_1^{(k)}$ and $h \in [h_1^{(k)}, 2]$. (2) There exists $N_2^{(k)} > 0$ such that $\lambda_n^{(k)}(h) \le \lambda_{k+1}^{(k)}(h)$ for every $n \ge N_2^{(k)}$ and $h \in [h_2^{(k)}, 1/(k+1)]$.

Proof (Proof of Lemma 2) First note, on the one hand, that the Schatten 2-norm of $T_h^{(k)}$ can be easily computed: by [55, Theorem VI.23],

$$\begin{aligned}
\left\|T_h^{(k)}\right\|_2^2 &= \int_{\mathrm{SO}(3)} \int_{\mathrm{SO}(3)} \left|T_h^{(k)}(x, y)\right|^2 dx\, dy = \int_{\mathrm{SO}(3)} \int_{B(y, \alpha)} \left|T_h^{(k)}(x, y)\right|^2 dx\, dy \\
&= \int_{\mathrm{SO}(3)} \int_{B(y, \alpha)} dx\, dy = \int_0^{\alpha} \frac{\sin\vartheta}{2}\, d\vartheta = \frac{1 - \cos\alpha}{2} = \frac{h}{2},
\end{aligned}$$

where the last equality follows from $h = 1 - \cos\alpha$. On the other hand,

$$\left\|T_h^{(k)}\right\|_2^2 = \sum_{n=k}^{\infty} (2n+1) \left(\lambda_n^{(k)}\right)^2,$$

which gives the same bound as [38, formula (A.4)]:

$$\lambda_n^{(k)}(h) \le \frac{\sqrt{h}}{\sqrt{4n+2}}.$$

Since by (71) we have

$$\lambda_k^{(k)}(h) = \frac{1 - (1 - h/2)^{k+1}}{k+1} \ge \frac{1 - \left(1 - h_1^{(k)}/2\right)^{k+1}}{k+1} \quad \text{for all } h \in [h_1^{(k)}, 2],$$

it is straightforward to verify by direct computation that there exists $N_1^{(k)} > 0$ such that $\sqrt{h}/\sqrt{4n+2} \le \lambda_k^{(k)}(h)$ for every $n \ge N_1^{(k)}$ and $h \in [h_1^{(k)}, 2]$. This proves (1) of Lemma 2. Furthermore, by (72),

$$\lambda_{k+1}^{(k)}(h) = -\frac{k}{k+1} \cdot \frac{1 - (1 - h/2)^{k+2}}{k+2} + \frac{2k+1}{k+1} \left(\frac{h}{2}\right) \left(1 - \frac{h}{2}\right)^{k+1};$$

a direct computation of the derivative of the left-hand side with respect to $h$ gives

$$\partial_h \lambda_{k+1}^{(k)}(h) = \frac{1}{2} \left[1 - (k+1) h\right] \left(1 - \frac{h}{2}\right)^{k},$$

from which it is easy to verify directly that $h \mapsto \lambda_{k+1}^{(k)}(h)$ achieves its maximum over $h \in [0, 2]$ at $h = 1/(k+1)$, and $\lambda_{k+1}^{(k)}(h) > 0$ for all $h \in (0, 1/(k+1)]$. It is then easy to verify by direct computation that there exists $N_2^{(k)}$ such that $\sqrt{h}/\sqrt{4n+2} \le \lambda_{k+1}^{(k)}(h)$ for every $n \ge N_2^{(k)}$ and $h \in [h_2^{(k)}, 1/(k+1)]$. This proves (2) of Lemma 2.
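The threshold in part (1) can be made concrete once a left endpoint is fixed. The sketch below takes a hypothetical value `h1` standing in for $h_1^{(k)}$ (the proof does not give it explicitly) and searches a finite grid of $[h_1, 2]$ for the smallest $n$ with $h/(4n+2) \le \lambda_k^{(k)}(h)^2$. It also evaluates $\lambda_n^{(k)}$ exactly, assuming the termwise-integrated form of the sum in (75) for integers $n \ge k \ge 0$, so that the Hilbert-Schmidt bound can be checked at the sampled points. The function names are illustrative, and a grid search certifies the inequality only at the grid points.

```python
from fractions import Fraction
from math import ceil, comb


def lam(n, k, h):
    """lambda_n^(k)(h) = int_0^{h/2} sum_v (-1)^v C(n-k,v) C(n+k,v) s^v (1-s)^(n-v) ds.

    ASSUMED general form (termwise integration of the sum in (75)); exact for rational h.
    """
    x = Fraction(h) / 2
    total = Fraction(0)
    for v in range(n - k + 1):
        coeff = (-1) ** v * comb(n - k, v) * comb(n + k, v)
        # int_0^x s^v (1-s)^(n-v) ds, expanding (1-s)^(n-v) with the binomial theorem
        total += coeff * sum(
            Fraction((-1) ** j * comb(n - v, j), v + j + 1) * x ** (v + j + 1)
            for j in range(n - v + 1))
    return total


def h_grid(h1, steps):
    """Uniform grid on [h1, 2] with steps + 1 rational points."""
    h1 = Fraction(h1)
    return [h1 + (2 - h1) * i / steps for i in range(steps + 1)]


def smallest_N(k, h1, steps=200):
    """Smallest n >= k with h / (4n + 2) <= lambda_k^(k)(h)^2 at every grid point.

    h1 is a hypothetical stand-in for h_1^(k) and must be positive.
    """
    worst = max(h / lam(k, k, h) ** 2 for h in h_grid(h1, steps))
    return max(k, ceil((worst - 2) / 4))   # 4n + 2 >= worst
```

With `N = smallest_N(k, h1)`, every `n >= N` satisfies `lam(n, k, h) <= lam(k, k, h)` at the sampled `h`, since $\lambda_n^{(k)}(h) \le \sqrt{h}/\sqrt{4n+2}$ and the right-hand side decreases in $n$.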
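Proof (Proof of Theorem 3) Direct computation using (71) and (72) establishes (33):

$$\begin{aligned}
G^{(k)}(h) &= \lambda_k^{(k)}(h) - \lambda_{k+1}^{(k)}(h) \\
&= \frac{1 - (1 - h/2)^{k+1}}{k+1} + \frac{(2k+1)\left(1 - (1 - h/2)^{k+1}\right)}{k+1} - \frac{2(k+1)\left(1 - (1 - h/2)^{k+2}\right)}{k+2} \\
&= \frac{2 - (1 - h/2)^{k+1} \left((k+1) h + 2\right)}{k+2}.
\end{aligned}$$

Unsurprisingly, the spectral gap depends on the "frequency channel" parameter $k \in \mathbb{N}$. The rest of the proof follows verbatim the proof of [38, Theorem 4]: By Lemma 1 and Lemma 2 we have $\lambda_n^{(k)} \le \lambda_k^{(k)}(h)$ for every $n \ge N_1^{(k)}$ and $h \in [0, 2]$, as well as $\lambda_n^{(k)} \le \lambda_{k+1}^{(k)}(h)$ for every $n \ge N_2^{(k)}$ and $h \in [0, 1/(k+1)]$. We then verify directly both $\lambda_n^{(k)} \le \lambda_k^{(k)}(h)$ over $h \in [0, 2]$ and $\lambda_n^{(k)} \le \lambda_{k+1}^{(k)}(h)$ over $h \in [0, 1/(k+2)]$ for the finitely many cases left ($k \le n \le N_1^{(k)}$ and $k+1 \le n \le N_2^{(k)}$, respectively).

Appendix B.3 Proof of Theorem 4

Our proof extends the arguments in the proof of [38, Theorem 5]. A key observation is that the top eigenspace $\mathcal{H}\left(\lambda_k^{(k)}(h)\right)$ coincides with the isotypic subspace $\mathcal{H}_{k,-k}$ (see Section 4.1). Consider the morphism $\omega := \sqrt{1/(2k+1)} \cdot \tau : \mathbb{C}^{2k+1} \to \mathcal{H}$ defined as

$$\omega(v)(x) = \left(\delta_x^{(k)}\right)^{*}(v).$$

Part 1: $\tau$ is an isomorphism between $\mathbb{C}^{2k+1}$ and $\mathcal{H}_{k,-k}$. We first show that $\operatorname{Im}(\omega) \subset \mathcal{H}_{k,-k}$, namely, for any $x \in \mathrm{SO}(3)$, $v \in \mathbb{C}^{2k+1}$, and $g \in \mathrm{SO}(2)$ there holds

$$\left(\delta^{(k)}_{x \triangleleft g}\right)^{*}(v) = \rho_k\!\left(g^{-1}\right) \left(\delta_x^{(k)}\right)^{*}(v). \tag{83}$$

To this end, note that for any $z \in \mathbb{C}$ we have

$$\begin{aligned}
\left\langle \left(\delta^{(k)}_{x \triangleleft g}\right)^{*}(v), z \right\rangle_{\mathbb{C}}
&= \left\langle v, \delta^{(k)}_{x \triangleleft g}(z) \right\rangle_{\mathbb{C}^{2k+1}} = \left\langle v, z D^{k}_{\cdot,-k}(x \triangleleft g) \right\rangle_{\mathbb{C}^{2k+1}} \\
&\overset{(64)}{=} \left\langle v, z \rho_{-k}\!\left(g^{-1}\right) D^{k}_{\cdot,-k}(x) \right\rangle_{\mathbb{C}^{2k+1}} = \left\langle v, z \rho_{k}(g) D^{k}_{\cdot,-k}(x) \right\rangle_{\mathbb{C}^{2k+1}} \\
&= \left\langle \rho_k\!\left(g^{-1}\right) v, z D^{k}_{\cdot,-k}(x) \right\rangle_{\mathbb{C}^{2k+1}} = \left\langle \rho_k\!\left(g^{-1}\right) v, \delta_x^{(k)}(z) \right\rangle_{\mathbb{C}^{2k+1}} \\
&= \left\langle \rho_k\!\left(g^{-1}\right) \left(\delta_x^{(k)}\right)^{*}(v), z \right\rangle_{\mathbb{C}},
\end{aligned}$$

which proves (83). Next, we show that $\omega$ is a morphism of $\mathrm{SO}(3)$-representations, namely, for any $x \in \mathrm{SO}(3)$, $v \in \mathbb{C}^{2k+1}$, and $g \in \mathrm{SO}(3)$ there holds

$$\left(\delta_x^{(k)}\right)^{*}\left(D^{k}(g)\, v\right) = \left(\delta^{(k)}_{g^{-1} \triangleright x}\right)^{*}(v). \tag{84}$$

To this end, again for any $z \in \mathbb{C}$,

$$\begin{aligned}
\left\langle \left(\delta_x^{(k)}\right)^{*}\left(D^{k}(g)\, v\right), z \right\rangle_{\mathbb{C}}
&= \left\langle D^{k}(g)\, v, \delta_x^{(k)}(z) \right\rangle_{\mathbb{C}^{2k+1}} = \left\langle D^{k}(g)\, v, z D^{k}_{\cdot,-k}(x) \right\rangle_{\mathbb{C}^{2k+1}} \\
&= \left\langle v, z D^{k}\!\left(g^{-1}\right) D^{k}_{\cdot,-k}(x) \right\rangle_{\mathbb{C}^{2k+1}} \overset{(56)}{=} \left\langle v, z D^{k}_{\cdot,-k}\!\left(g^{-1} \triangleright x\right) \right\rangle_{\mathbb{C}^{2k+1}} \\
&= \left\langle v, \delta^{(k)}_{g^{-1} \triangleright x}(z) \right\rangle_{\mathbb{C}^{2k+1}} = \left\langle \left(\delta^{(k)}_{g^{-1} \triangleright x}\right)^{*}(v), z \right\rangle_{\mathbb{C}},
\end{aligned}$$

which proves (84). It now follows immediately that the morphism $\omega$ maps $\mathbb{C}^{2k+1}$ isomorphically, as a unitary representation of $\mathrm{SO}(3)$, onto $\mathcal{H}_{k,-k}$, the unique isotypical component in $\mathcal{H}_{-k}$ (by (83)) of the unitary irreducible $\mathrm{SO}(3)$-representation of dimension $2k+1$. This in turn implies that $\omega$ (and thus $\tau$) is an isomorphism between Hermitian vector spaces. It remains to determine the suitable normalization constant; we show that $\operatorname{Tr}(\tau^{*} \circ \tau) = 2k+1$. Indeed,

$$\begin{aligned}
\operatorname{Tr}\left(\tau^{*} \circ \tau\right) &= (2k+1) \operatorname{Tr}\left(\omega^{*} \circ \omega\right) = (2k+1) \int_{\mathbb{C}S^{2k}} \left\langle \omega^{*} \circ \omega(v), v \right\rangle_{\mathcal{H}} dv = (2k+1) \int_{\mathbb{C}S^{2k}} \left\langle \omega(v), \omega(v) \right\rangle_{\mathcal{H}} dv \\
&= (2k+1) \int_{\mathbb{C}S^{2k}} \int_{\mathrm{SO}(3)} \left\langle \left(\delta_x^{(k)}\right)^{*}(v), \left(\delta_x^{(k)}\right)^{*}(v) \right\rangle_{\mathbb{C}} dx\, dv \\
&= (2k+1) \int_{\mathbb{C}S^{2k}} \int_{\mathrm{SO}(3)} \left\langle \left(D^{k}_{\cdot,-k}(x)\right)^{*} v, \left(D^{k}_{\cdot,-k}(x)\right)^{*} v \right\rangle_{\mathbb{C}} dx\, dv \\
&= (2k+1) \int_{\mathbb{C}S^{2k}} \int_{\mathrm{SO}(3)} 1\, dx\, dv = 2k+1,
\end{aligned}$$

where $\mathbb{C}S^{2k}$ is the $(4k+1)$-dimensional unit sphere in $\mathbb{C}^{2k+1}$, and $dv$ is the unique normalized Haar measure on $\mathbb{C}S^{2k}$.

Part 2: Proof of (36).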
By (83) we have $\left(\mathrm{ev}_x|_{W^{(k)}}\right) \circ \omega = \left(\delta_x^{(k)}\right)^{*}$, which is equivalent to $\left(\phi_x^{(k)}\right)^{*} \circ \tau = \left(\delta_x^{(k)}\right)^{*}$. The conclusion now follows from the straightforward computation as in the proof of [38, Theorem 5]:

$$\left(\phi_x^{(k)}\right)^{*} \circ \tau = \left(\delta_x^{(k)}\right)^{*} \;\Rightarrow\; \left(\phi_x^{(k)}\right)^{*} \circ \left(\tau \circ \tau^{*}\right) = \left(\delta_x^{(k)}\right)^{*} \circ \tau^{*} \;\Rightarrow\; \left(\phi_x^{(k)}\right)^{*} = \left(\delta_x^{(k)}\right)^{*} \circ \tau^{*} \;\Rightarrow\; \tau \circ \delta_x^{(k)} = \phi_x^{(k)}.$$

This completes the proof. □

Appendix B.4 Proof of Theorem 5

By Theorem 4, $\tau$ is a morphism between the Hermitian inner product spaces $\mathbb{C}^{2k+1}$ and $W^{(k)}$ and (36) holds; thus, by the same argument as in the last step of the proof of [38, Theorem 6], it suffices to prove that for any unit-norm complex numbers $v, u \in \mathbb{C}$ there holds

$$\left|\left\langle \delta_x^{(k)}(u), \delta_y^{(k)}(v) \right\rangle_{\mathbb{C}^{2k+1}}\right| = \left(\frac{\langle \pi(x), \pi(y) \rangle + 1}{2}\right)^{k}. \tag{85}$$

This boils down to the following straightforward computation:

$$\left|\left\langle \delta_x^{(k)}(u), \delta_y^{(k)}(v) \right\rangle_{\mathbb{C}^{2k+1}}\right| = \left|u \left(D^{k}_{\cdot,-k}(x)\right)^{\top} \left(D^{k}_{\cdot,-k}(y)\right)^{*} \bar{v}\right| = \left|D^{k}_{-k,-k}\!\left(x^{-1} y\right)\right| \overset{(57)}{=} \left|d^{k}_{-k,-k}\!\left(\vartheta\!\left(x^{-1} y\right)\right)\right| \overset{(58)}{=} \left(\frac{1 + \cos\vartheta\!\left(x^{-1} y\right)}{2}\right)^{k}, \tag{86}$$

where $\vartheta\!\left(x^{-1} y\right)$ is the Euler angle $\vartheta$ of $x^{-1} y \in \mathrm{SO}(3)$. Recall from (55) that $\cos\vartheta\!\left(x^{-1} y\right)$ is exactly the $(3,3)$-entry of the 3-by-3 matrix form of $x^{-1} y \in \mathrm{SO}(3)$, which is identical to the inner product of the third columns of the matrix forms of $x$ and $y$, i.e., $\cos\vartheta\!\left(x^{-1} y\right) = \langle \pi(x), \pi(y) \rangle$. Plugging this back into the rightmost term of (86) completes the proof. □

References

1. Ankele, M., Lim, L.H., Groeschel, S., Schultz, T.: Versatile, robust, and efficient tractography with constrained higher-order tensor fODFs. International Journal of Computer Assisted Radiology and Surgery 12(8), 1257–1270 (2017). DOI 10.1007/s11548-017-1593-6
2. Bajaj, C., Gao, T., He, Z., Huang, Q., Liang, Z.: SMAC: Simultaneous mapping and clustering using spectral decompositions. In: International Conference on Machine Learning, pp. 334–343 (2018)
3. Bandeira, A.S., Chen, Y., Lederman, R.R., Singer, A.: Non-unique games over compact groups and orientation estimation in cryo-EM. Inverse Problems 36(6), 064002 (2020)
4. Bandeira, A.S., Singer, A., Spielman, D.A.: A Cheeger inequality for the graph connection Laplacian. SIAM Journal on Matrix Analysis and Applications 34(4), 1611–1630 (2013)
5. Belkin, M., Niyogi, P.: Towards a theoretical foundation for Laplacian-based manifold methods. In: Learning Theory, pp. 486–500. Springer (2005)
6. Belkin, M., Niyogi, P.: Convergence of Laplacian eigenmaps. Advances in Neural Information Processing Systems 19, 129 (2007)
7. Benaych-Georges, F., Nadakuditi, R.R.: The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227(1), 494–521 (2011)
8. Bendory, T., Boumal, N., Ma, C., Zhao, Z., Singer, A.: Bispectrum inversion with application to multireference alignment. IEEE Transactions on Signal Processing 66(4), 1037–1050 (2018). DOI 10.1109/TSP.2017.2775591
9. Boumal, N., Singer, A., Absil, P.A., Blondel, V.D.: Cramér–Rao bounds for synchronization of rotations. Information and Inference 3(1), 1–39 (2014)
10. Boyle, M.: How should spin-weighted spherical functions be defined? Journal of Mathematical Physics 57(9), 092504 (2016)
11. Bröcker, T., Tom Dieck, T.: Representations of compact Lie groups, vol. 98. Springer Science & Business Media (2013)
12. Campbell, W.B.: Tensor and spinor spherical harmonics and the spin-$s$ harmonics ${}_sY_{lm}(\theta, \phi)$. Journal of Mathematical Physics 12(8), 1763–1770 (1971)
13. Chen, B., Frank, J.: Two promising future developments of cryo-EM: capturing short-lived states and mapping a continuum of states of a macromolecule. Microscopy 65(1), 69–79 (2015). DOI 10.1093/jmicro/dfv344
14. Chow, Y., Gatteschi, L., Wong, R.: A Bernstein-type inequality for the Jacobi polynomial. Proceedings of the American Mathematical Society 121(3), 703–709 (1994)
15. Coifman, R.R., Lafon, S.: Diffusion maps. Applied and Computational Harmonic Analysis 21(1), 5–30 (2006)
16. Dashti, A., Schwander, P., Langlois, R., Fung, R., Li, W., Hosseinizadeh, A., Liao, H.Y., Pallesen, J., Sharma, G., Stupina, V.A., Simon, A.E., Dinman, J.D., Frank, J., Ourmazd, A.: Trajectories of the ribosome as a Brownian nanomachine. Proceedings of the National Academy of Sciences of the United States of America 111(49), 17492–7 (2014)
17. Davis, C., Kahan, W.: The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis 7(1), 1–46 (1970). DOI 10.1137/0707001
18. Doyle, D.A., Cabral, J.M., Pfuetzner, R.A., Kuo, A., Gulbis, J.M., Cohen, S.L., Chait, B.T., MacKinnon, R.: The structure of the potassium channel: Molecular basis of K+ conduction and selectivity. Science 280(5360), 69–77 (1998). DOI 10.1126/science.280.5360.69
19. Driver, K.A., Jordaan, K.: Bounds for extreme zeros of some classical orthogonal polynomials. Journal of Approximation Theory 164(9), 1200–1204 (2012). DOI https://doi.org/10.1016/j.jat.2012.05.014
20. Eastwood, M., Tod, P.: Edth – a differential operator on the sphere. Mathematical Proceedings of the Cambridge Philosophical Society 92(2), 317–330 (1982)
21. El Karoui, N., Wu, H.T.: Graph connection Laplacian methods can be made robust to noise. Ann. Statist. 44(1), 346–372 (2016). DOI 10.1214/14-AOS1275
22. Elbert, A., Laforgia, A., Rodonò, L.G.: On the zeros of Jacobi polynomials. Acta Mathematica Hungarica 64(4), 351–359 (1994)
23. Eldridge, J., Belkin, M., Wang, Y.: Unperturbed: spectral analysis beyond Davis-Kahan. In: Algorithmic Learning Theory, pp. 321–358. PMLR (2018)
24. Fan, Y., Zhao, Z.: Cryo-electron microscopy image analysis using multi-frequency vector diffusion maps. arXiv preprint arXiv:1904.07772 (2019)
25. Fan, Y., Zhao, Z.: Multi-frequency vector diffusion maps. In: K. Chaudhuri, R. Salakhutdinov (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 1843–1852. PMLR, Long Beach, California, USA (2019)
26. Frank, J.: Three-dimensional electron microscopy of macromolecular assemblies: visualization of biological molecules in their native state. Oxford University Press (2006)
27. Frank, J.: New opportunities created by single-particle cryo-EM: The mapping of conformational space. Biochemistry 57(6), 888–888 (2018). DOI 10.1021/acs.biochem.8b00064
28. Frank, J., Ourmazd, A.: Continuous changes in structure mapped by manifold embedding of single-particle data in cryo-EM. Methods 100, 61–67 (2016). DOI 10.1016/J.YMETH.2016.02.007
29. Gao, T.: Hypoelliptic diffusion maps and their applications in automated geometric morphometrics. Ph.D. thesis, Duke University (2015)
30. Gao, T.: The diffusion geometry of fibre bundles: Horizontal diffusion maps. Applied and Computational Harmonic Analysis 50, 147–215 (2021)
31. Gao, T., Brodzki, J., Mukherjee, S.: The geometry of synchronization problems and learning group actions. Discrete & Computational Geometry (2019). DOI 10.1007/s00454-019-00100-2
32. Gao, T., Zhao, Z.: Multi-frequency phase synchronization. In: K. Chaudhuri, R. Salakhutdinov (eds.) Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, pp. 2132–2141. PMLR, Long Beach, California, USA (2019)
33. Gelfand, I.M., Minlos, R.A., Shapiro, Z.Y.: Representations of the rotation and Lorentz groups and their applications. Courier Dover Publications (2018)
34. Goldberg, J.N., MacFarlane, A.J., Newman, E.T., Rohrlich, F., Sudarshan, E.G.: Spin-s spherical harmonics and ð. Journal of Mathematical Physics 8(11), 2155–2161 (1967)
35. Gurevich, S., Hadani, R., Singer, A.: Representation theoretic patterns in three dimensional cryo-electron microscopy III – Presence of point symmetries. Preprint (2011)
36. Haagerup, U., Schlichtkrull, H.: Inequalities for Jacobi polynomials. The Ramanujan Journal 33(2), 227–246 (2014). DOI 10.1007/s11139-013-9472-4
37. Hadani, R., Singer, A.: Representation theoretic patterns in three dimensional cryo-electron microscopy I: The intrinsic reconstitution algorithm. Annals of Mathematics 174(2), 1219–1241 (2011)
38. Hadani, R., Singer, A.: Representation theoretic patterns in three-dimensional cryo-electron microscopy II – the class averaging problem. Foundations of Computational Mathematics 11(5), 589–616 (2011)
39. Heel, M.V.: Angular reconstitution: A posteriori assignment of projection directions for 3D reconstruction. Ultramicroscopy 21(2), 111–123 (1987). DOI https://doi.org/10.1016/0304-3991(87)90078-7
40. Henderson, R., McMullan, G.: Problems in obtaining perfect images by single-particle electron cryomicroscopy of biological structures in amorphous ice. Microscopy 62(1), 43–50 (2013). DOI 10.1093/jmicro/dfs094
41. Kakarala, R.: The bispectrum as a source of phase-sensitive invariants for Fourier descriptors: A group-theoretic approach. Journal of Mathematical Imaging and Vision 44(3), 341–353 (2012). DOI 10.1007/s10851-012-0330-6
42. Khorunzhy, A.: Sparse random matrices: spectral edge and statistics of rooted trees. Advances in Applied Probability 33(1), 124–140 (2001)
43. Kolda, T., Bader, B.: Tensor decompositions and applications. SIAM Review 51(3), 455–500 (2009). DOI 10.1137/07070111X
44. Koltchinskii, V., Giné, E.: Random matrix approximation of spectra of integral operators. Bernoulli 6(1), 113–167 (2000)
45. Koornwinder, T., Kostenko, A., Teschl, G.: Jacobi polynomials, Bernstein-type inequalities and dispersion estimates for the discrete Laguerre operator. Advances in Mathematics 333, 796–821 (2018)
46. Lederman, R.R., Singer, A.: A representation theory perspective on simultaneous alignment and classification. arXiv preprint (2016)
47. Lin, C.Y., Minasian, A., Qi, X.J., Wu, H.T.: Manifold learning via the principle bundle approach. Frontiers in Applied Mathematics and Statistics 4, 21 (2018). DOI 10.3389/fams.2018.00021
48. MacKinnon, R.: Potassium channels and the atomic basis of selective ion conduction (Nobel lecture). Angewandte Chemie International Edition 43(33), 4265–4277 (2004)
49. Malyarenko, A.: Invariant random fields in vector bundles and application to cosmology. Annales de l'I.H.P. Probabilités et statistiques 47(4), 1068–1095 (2011). DOI 10.1214/10-AIHP409
50. Marinucci, D., Peccati, G.: Random fields on the sphere: representation, limit theorems and cosmological applications, vol. 389. Cambridge University Press (2011)
51. Newman, E.T., Penrose, R.: Note on the Bondi-Metzner-Sachs group. Journal of Mathematical Physics 7(5), 863–870 (1966)
52. Oikonomou, C.M., Jensen, G.J.: The development of cryo-EM and how it has advanced microbiology. Nature Microbiology 2(12), 1577–1579 (2017)
53. Penczek, P.A., Zhu, J., Frank, J.: A common-lines based method for determining orientations for n > 3 particle projections simultaneously. Ultramicroscopy 63(3-4), 205–218 (1996)
54. Perry, A., Wein, A.S., Bandeira, A.S., Moitra, A.: Message-passing algorithms for synchronization problems over compact groups. Communications on Pure and Applied Mathematics (2018)
55. Reed, M., Simon, B.: Methods of modern mathematical physics. vol. 1. Functional analysis. Academic San Diego (1980)
56. Salas, D., Le Gall, A., Fiche, J.B., Valeri, A., Ke, Y., Bron, P., Bellot, G., Nollmann, M.: Angular reconstitution-based 3D reconstructions of nanomolecular structures from superresolution light-microscopy images. Proceedings of the National Academy of Sciences (2017). DOI 10.1073/pnas.1704908114
57. Scheres, S.H.: A Bayesian view on cryo-EM structure determination. Journal of Molecular Biology 415(2), 406–418 (2012). DOI 10.1016/J.JMB.2011.11.010
58. Schultz, T., Fuster, A., Ghosh, A., Deriche, R., Florack, L., Lim, L.H.: Higher-order tensors in diffusion imaging. In: C.F. Westin, A. Vilanova, B. Burgeth (eds.) Visualization and Processing of Tensors and Higher Order Descriptors for Multi-Valued Data, pp. 129–161. Springer Berlin Heidelberg, Berlin, Heidelberg (2014)
59. Shkolnisky, Y., Singer, A.: Viewing direction estimation in cryo-EM using synchronization. SIAM Journal on Imaging Sciences 5(3), 1088–1110 (2012)
60. Sigworth, F.J., Doerschuk, P.C., Carazo, J.M., Scheres, S.H.W.: An introduction to maximum-likelihood methods in cryo-EM. In: Methods in Enzymology, vol. 482, pp. 263–294. San Diego, CA (United States); Academic Press Inc. (2010). DOI 10.1016/S0076-6879(10)82011-7
61. Singer, A.: Angular synchronization by eigenvectors and semidefinite programming. Applied and Computational Harmonic Analysis 30(1), 20–36 (2011). DOI 10.1016/j.acha.2010.02.001
62. Singer, A., Wu, H.T.: Vector diffusion maps and the connection Laplacian. Communications on Pure and Applied Mathematics 65(8), 1067–1144 (2012)
63. Singer, A., Zhao, Z., Shkolnisky, Y., Hadani, R.: Viewing angle classification of cryo-electron microscopy images using eigenvectors. SIAM Journal on Imaging Sciences 4(2), 723–759 (2011)
64. Szegő, G.: Orthogonal polynomials, vol. 23. American Mathematical Soc. (1939)
65. Vainshtein, B., Goncharov, A.: Determination of the spatial orientation of arbitrarily arranged identical particles of unknown structure from their projections. In: Soviet Physics Doklady, vol. 31, p. 278 (1986)
66. Varshalovich, D.A., Moskalev, A.N., Khersonskii, V.K.: Quantum theory of angular momentum. World Scientific (1988)
67. Wang, L., Singer, A.: Exact and stable recovery of rotations for robust synchronization. Information and Inference (2013). DOI 10.1093/imaiai/iat005
68. Watts, D.J., Strogatz, S.H.: Collective dynamics of 'small-world' networks. Nature 393(6684), 440 (1998)
69. Wigner, E.P.: Characteristic vectors of bordered matrices with infinite dimensions. Ann. of Math. 62(2), 548–564 (1955)
70. Wigner, E.P.: Characteristic vectors of bordered matrices with infinite dimensions. Ann. of Math. 67(2), 325–327 (1958)
71. Ye, K., Lim, L.H.: Cohomology of cryo-electron microscopy. SIAM Journal on Applied Algebra and Geometry 1(1), 507–535 (2017). DOI 10.1137/16M1070220
72. Yu, Y., Wang, T., Samworth, R.J.: A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 102(2), 315–323 (2014)
73. Zhao, Z., Shkolnisky, Y., Singer, A.: Fast steerable principal component analysis. IEEE Transactions on Computational Imaging 2(1), 1–12 (2016)
74. Zhao, Z., Singer, A.: Fourier–Bessel rotational invariant eigenimages. JOSA A 30(5), 871–877 (2013)
75. Zhao, Z., Singer, A.: Rotationally invariant image representation for viewing direction classification in cryo-EM. Journal of Structural Biology 186(1), 153–166 (2014)
