Goodness-of-Fit Tests for Latent Class Models with Ordinal Categorical Data
Huan Qing*
School of Economics and Finance, Chongqing University of Technology, Chongqing, 400054, China
*Corresponding author. Email: qinghuan@u.nus.edu, qinghuan@cqut.edu.cn

Abstract

Ordinal categorical data are widely collected in psychology, education, and other social sciences, appearing commonly in questionnaires, assessments, and surveys. Latent class models provide a flexible framework for uncovering unobserved heterogeneity by grouping individuals into homogeneous classes based on their response patterns. A fundamental challenge in applying these models is determining the number of latent classes, which is unknown and must be inferred from data. In this paper, we propose a test statistic for this problem. The statistic centers the largest singular value of a normalized residual matrix by a simple sample-size adjustment. Under the null hypothesis that the candidate number of latent classes is correct, its upper bound converges to zero in probability; under an under-fitted alternative, the statistic itself exceeds a fixed positive constant with probability approaching one. This sharp dichotomous behavior of the test statistic yields two sequential testing algorithms that consistently estimate the true number of latent classes. Extensive experimental studies confirm the theoretical findings and demonstrate the accuracy and reliability of the proposed methods in determining the number of latent classes.

Keywords: Ordinal categorical data, latent class model, goodness-of-fit, estimation of number of latent classes

1. Introduction

Ordinal categorical data are commonly encountered in psychology, education, political science, and many other fields. In psychological surveys, respondents rate their agreement on a Likert scale, with options coded as 0, 1, 2, 3, 4 representing "strongly disagree," "disagree," "neutral," "agree," and "strongly agree." In educational assessments, student performance is often classified into ordered proficiency levels such as 0 (below basic), 1 (basic), 2 (proficient), and 3 (advanced). In political polls, individuals indicate their level of support for a policy using a similar ordered scale. Such data can be organized into an $N \times J$ response matrix $R$, where $N$ is the number of subjects (individuals), $J$ is the number of items, and each entry $R(i,j)$ records subject $i$'s response to item $j$. Responses take values in $\{0, 1, \ldots, M\}$, with 0 denoting the lowest intensity and $M$ the highest. When $M = 1$, the data are dichotomous (also known as binary); when $M \geq 2$, they are polytomous. A key feature of ordinal categorical data is that while the categories follow a natural order, the distances between them are not necessarily equal or interpretable. Any valid statistical analysis must respect this ordinal nature without imposing unwarranted assumptions about equal spacing [1].

The latent class model (LCM) provides a flexible and interpretable framework for uncovering latent population structure from such data [9]. This model is widely used in the psychological, behavioral, and social sciences [12, 27, 7, 11]. The LCM posits that the population consists of $K$ distinct latent classes (also known as groups) and that, conditional on class membership, an individual's responses to different items are independent.
This assumption accounts for the associations observed among items, as any dependence between responses arises solely from their shared latent class membership. For ordinal responses, a natural specification models each item response as a Binomial random variable with $M$ trials and a class-specific success probability. When $M = 1$, this reduces to the Bernoulli distribution commonly used for binary data [6, 32, 20]. For $M \geq 2$, the Binomial formulation directly captures the polytomous nature by allowing the expected response to increase with the success probability, thereby shifting the distribution toward higher categories. As discussed extensively in the literature on categorical data analysis [1], the Binomial distribution is a fundamental tool for modeling discrete responses, and its mean is directly linked to the underlying probability of success.

Historically, parameter estimation has relied on the Expectation-Maximization (EM) algorithm, which treats class memberships as missing data [8]. However, the EM algorithm can be computationally demanding for large datasets and is sensitive to initialization, often requiring multiple random starts to avoid local optima [5]. To address these limitations, alternative methods have been developed in recent years, including spectral clustering that exploits the low-rank structure of the data matrix [21, 22, 23, 19, 20], tensor decomposition methods that operate on low-order moments [32], and regularized estimation techniques that perform simultaneous parameter estimation and model selection [6]. Other frameworks that can model ordinal categorical data with polytomous responses exist, such as the general diagnostic model proposed by [26].

A fundamental and unresolved challenge in applying LCMs is determining the number of latent classes $K$. In real-world applications, $K$ is rarely known a priori and must be inferred from the observed response matrix $R$. Selecting too few classes can obscure meaningful heterogeneity, while selecting too many can lead to overfitting and scientifically spurious conclusions. Traditional approaches for selecting $K$ include information criteria such as the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) [2, 24], and likelihood-ratio tests, typically implemented with the EM algorithm. These methods inherit the computational burden of the EM algorithm, making them expensive when many candidate values of $K$ must be evaluated. Moreover, their theoretical properties in high-dimensional settings, where the number of items $J$ grows with the sample size $N$, remain largely unexplored. The consistency of these criteria often relies on regularity conditions that may not hold in complex latent variable models, a point examined by Keribin [16] in the context of mixture models, which include LCMs as a special case. More recent approaches address these limitations. Regularized latent class analysis, proposed by Chen et al. [6], learns the latent structure by shrinking small parameter differences toward zero, thereby revealing the underlying number of latent classes.
Their framework uses a generalized information criterion (GIC) to select both the regularization parameter and the number of classes. Spectral methods offer another direction. By exploiting the low-rank structure of the expected response matrix, the number of latent classes can be estimated by thresholding the singular values of the data matrix. For binary responses, Lyu and Gu [20] developed such a method, which is computationally efficient and theoretically grounded, but yields only a point estimate of $K$ without a formal goodness-of-fit assessment. While promising, these existing methods lack rigorous goodness-of-fit guarantees, especially for ordinal categorical data. This gap motivates the present work.

In this paper, we introduce novel goodness-of-fit testing procedures that directly address the problem of estimating $K$ in LCMs for ordinal categorical data. Our approaches transform a challenging model selection problem into a sequence of simple spectral checks, offering both computational efficiency and statistical rigor. The main contributions of this work are as follows:

• We develop a goodness-of-fit framework for determining the number of latent classes in ordinal categorical data under the latent class model. The framework introduces a test statistic constructed from a normalized residual matrix by a simple sample-size adjustment. We prove that its upper bound converges to zero in probability under the null hypothesis that the candidate number of latent classes $K_0$ equals the true $K$, while the statistic itself exceeds a fixed positive constant with probability approaching one under an under-fitted alternative. These results are obtained using matrix concentration inequalities and a perturbation analysis that controls the error from estimated parameters.

• Based on the dichotomous behavior of the test statistic, we develop two sequential testing procedures that consistently estimate the true number of latent classes under the latent class model. The consistency theorems specify the required growth rates of the thresholds relative to the sample dimensions.

• We evaluate both procedures through extensive simulations. The results show that our methods significantly outperform existing approaches in accuracy. A real-data application further demonstrates their practical value.

The remainder of this paper is organized as follows. Section 2 defines the latent class model for ordinal categorical data and formally states the problem of estimating the number of latent classes. Section 3 introduces the test statistic and establishes its asymptotic properties under both the null and under-fitted alternatives. Section 4 presents two sequential testing procedures based on these statistics and proves their consistency. Section 5 reports simulation studies and a real data application. Section 6 concludes the paper and discusses future research directions. Technical proofs are given in the Appendix.

2. Model and problem

This section introduces the latent class model for ordinal categorical data and then states the problem of estimating the number of latent classes.

2.1. Latent class model

We now give a formal definition of the latent class model.
Recall that the observed data form an $N \times J$ response matrix $R$, where $N$ is the number of subjects and $J$ is the number of items. Each entry $R(i,j)$ records the response of subject $i$ to item $j$, taking an ordinal value in the set $\{0, 1, \ldots, M\}$, with $M \geq 1$ fixed and representing the highest response category. Suppose all $N$ subjects are divided into $K$ distinct latent classes, which we denote by $\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_K$. Let $\ell(i) \in [K]$ be the class membership of subject $i$; that is, $\ell(i) = k$ precisely when $i \in \mathcal{C}_k$. The classification matrix $Z \in \{0,1\}^{N \times K}$ is defined by $Z(i,k) = \mathbb{1}\{\ell(i) = k\}$. Hence each row of $Z$ contains exactly one 1, and $Z^\top Z = \mathrm{diag}(N_1, \ldots, N_K)$, where $N_k = |\mathcal{C}_k|$ denotes the number of subjects in class $k$ for $k \in [K]$. We also set $N_{\min} = \min_{k \in [K]} N_k$ and $N_{\max} = \max_{k \in [K]} N_k$.

For each item $j$ and each latent class $k$, let $\Theta(j,k) \in [0, M]$ be the expected response of a subject in class $k$ to item $j$. These values form the item parameter matrix $\Theta \in [0, M]^{J \times K}$. Define the $N \times J$ expected response matrix $\mathcal{R} := \mathbb{E}[R]$ as $\mathcal{R} = Z\Theta^\top$. By definition, $\mathcal{R}(i,j) = \Theta(j, \ell(i))$.

We now specify the data-generating mechanism of the observed response matrix $R$. Following the framework of generalized linear models for categorical data [1], conditional on the latent structure (i.e., on $Z$), we assume that the entries of $R$ are independent and each follows a Binomial distribution. The distribution depends on $Z$ and $\Theta$ only through the expected response matrix $\mathcal{R}$. Explicitly,
\[
R(i,j) \sim \mathrm{Binomial}\left(M, \frac{\mathcal{R}(i,j)}{M}\right), \quad i \in [N],\ j \in [J],
\]
independently across all pairs $(i,j)$. Equivalently, for each $m \in \{0, 1, \ldots, M\}$, we have
\[
\mathbb{P}\big(R(i,j) = m\big) = \binom{M}{m} \left(\frac{\mathcal{R}(i,j)}{M}\right)^{m} \left(1 - \frac{\mathcal{R}(i,j)}{M}\right)^{M-m}, \quad i \in [N],\ j \in [J].
\]
It follows immediately that
\[
\mathbb{E}[R(i,j)] = \mathcal{R}(i,j), \qquad \mathrm{Var}[R(i,j)] = \mathcal{R}(i,j)\left(1 - \frac{\mathcal{R}(i,j)}{M}\right), \quad i \in [N],\ j \in [J].
\]

We now formally define the latent class model via the following definition.

Definition 1 (Latent class model for ordinal categorical data). For fixed positive integers $N, J, M, K \in \mathbb{N}$ with $M \geq 1$ and $K \geq 1$, the latent class model with $K$ latent classes, denoted LCM($K$), is a statistical model for the observed response matrix $R \in \{0, 1, \ldots, M\}^{N \times J}$ with parameters $(Z, \Theta) \in \{0,1\}^{N \times K} \times [0, M]^{J \times K}$, where $Z$ is the classification matrix induced by a partition $\{\mathcal{C}_1, \ldots, \mathcal{C}_K\}$ of $[N]$ and $\Theta$ is the item parameter matrix. The expected response matrix is $\mathcal{R} = Z\Theta^\top$. Conditional on $Z$ and $\Theta$, the entries of $R$ are independent and satisfy
\[
R(i,j) \sim \mathrm{Binomial}\left(M, \frac{\mathcal{R}(i,j)}{M}\right), \quad i \in [N],\ j \in [J].
\]

The formulation of the latent class model given in Definition 1 follows the seminal framework introduced by [9], where the population is assumed to consist of $K$ latent classes and, conditional on class membership, the responses to different items are independent. The LCM considered in this paper extends that classical model to accommodate ordinal categorical responses through a Binomial specification. When $M = 1$, the Binomial distribution reduces to the Bernoulli, and LCM($K$) becomes the classical latent class model for binary responses, which has been extensively studied in the literature [6, 30, 11, 10, 32, 20].
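To make the data-generating mechanism in Definition 1 concrete, the following is a minimal simulation sketch in Python with NumPy. The function name `simulate_lcm` and its interface are our own illustration and do not appear in the paper.

```python
import numpy as np

def simulate_lcm(N, J, K, M, delta, seed=0):
    """Draw (R, Z, Theta) from LCM(K) as in Definition 1."""
    rng = np.random.default_rng(seed)
    # Class memberships ell(i) drawn uniformly from {1, ..., K}.
    ell = rng.integers(K, size=N)
    Z = np.zeros((N, K), dtype=int)
    Z[np.arange(N), ell] = 1
    # Item parameters satisfying Assumption 1: delta <= Theta(j,k)/M <= 1 - delta.
    Theta = rng.uniform(delta * M, (1 - delta) * M, size=(J, K))
    # Expected response matrix and entrywise-independent Binomial responses.
    expected = Z @ Theta.T                 # N x J matrix with entries Theta(j, ell(i))
    R = rng.binomial(M, expected / M)      # R(i,j) ~ Binomial(M, expected(i,j)/M)
    return R, Z, Theta

R, Z, Theta = simulate_lcm(N=200, J=60, K=4, M=5, delta=0.2)
```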
2.2. Problem statement

Throughout this paper, we use $K$ to denote the true number of latent classes in the latent class model defined in Definition 1, and use $K_0$ to denote a hypothetical number of latent classes. The goal is to estimate $K$ from the observed response matrix $R$. We adopt a sequential goodness-of-fit testing framework. This approach was first introduced for stochastic block models by Bickel and Sarkar [4], who developed a recursive bipartitioning algorithm based on the limiting Tracy-Widom distribution of the largest eigenvalue of the centered adjacency matrix. Subsequent work extended this idea to test $H_0: K = K_0$ directly using various test statistics with known asymptotic null distributions [17, 13, 14, 29]. For a candidate value $K_0$, we test
\[
H_0: K = K_0 \quad \text{versus} \quad H_1: K > K_0.
\]
If $H_0$ is not rejected, we set $\hat{K} = K_0$; otherwise we increase $K_0$ and repeat. The key is to construct a test statistic $T_{K_0}$ whose behavior under the null and alternative hypotheses is sharply different. The following sections develop such a statistic and establish its asymptotic properties.

3. Test statistic

The core of our method is a test statistic whose behavior is fundamentally different when the model is correctly specified compared to when it is under-fitted. We construct this statistic in two stages. First, we consider an idealized version that uses the true, unknown parameters; this ideal version provides the theoretical foundation. We then replace the unknowns with estimates to obtain a practical, data-driven statistic. Finally, we analyze the asymptotic properties of this practical statistic under both the null and alternative hypotheses.

3.1. Ideal normalized residual matrix

To build intuition, we first construct the ideal normalized residual matrix using the true parameters, which serves as the theoretical foundation for our test statistic. Define the ideal normalized residual matrix $R^* \in \mathbb{R}^{N \times J}$ as
\[
R^*(i,j) = \frac{R(i,j) - \mathcal{R}(i,j)}{\sqrt{N\, V(i,j)}}, \qquad V(i,j) = \mathcal{R}(i,j)\left(1 - \frac{\mathcal{R}(i,j)}{M}\right). \tag{1}
\]
For this ideal normalized residual matrix $R^*$ to be well-defined and for its entries to have stable variance, the denominator must be bounded away from zero. This leads to our first assumption.

Assumption 1 (Parameter boundedness). There exists a constant $\delta \in (0, 1/2]$ such that for all $j \in [J]$ and $k \in [K]$,
\[
\delta \leq \frac{\Theta(j,k)}{M} \leq 1 - \delta.
\]

Assumption 1 guarantees that $V(i,j)$ is bounded below by a positive constant, making the normalization in Equation (1) valid. Under this assumption, it is straightforward to verify that $\mathbb{E}[R^*(i,j)] = 0$ and $\mathrm{Var}(R^*(i,j)) = 1/N$ for all $i, j$. Meanwhile, we call $\delta$ the signal-strength parameter in this paper for the following reason: $\delta$ controls the range of the expected item responses $\Theta(j,k)$ through the constraint $\delta \leq \Theta(j,k)/M \leq 1 - \delta$. When $\delta$ is small, the admissible interval $[\delta M, (1-\delta)M]$ is wide, allowing the class-specific parameters to take values near the extremes 0 or $M$. This creates substantial differences between classes, resulting in a strong signal that facilitates accurate estimation of $K$. As $\delta$ increases toward 0.5, the interval shrinks and becomes symmetric around $M/2$. The class-specific parameters are then confined to a narrow central region, reducing the between-class differences and weakening the signal.
Consequently, distinguishing between latent classes becomes more difficult, and estimating the true number of latent classes $K$ becomes increasingly challenging. Thus, smaller values of $\delta$ correspond to stronger signals and easier estimation.

We are interested in the spectral norm of this ideal residual matrix. The following lemma characterizes the asymptotic behavior of $\sigma_1(R^*)$, the largest singular value of $R^*$ (i.e., its spectral norm).

Lemma 1 (Spectral norm of ideal residual matrix). When Assumption 1 holds, for any $\epsilon > 0$, we have
\[
\lim_{N \to \infty} \mathbb{P}\left(\|R^*\| \leq 1 + \sqrt{\frac{J}{N}} + \epsilon\right) = 1.
\]

Lemma 1 implies that $\sigma_1(R^*)$ is asymptotically no larger than $1 + \sqrt{J/N}$. This motivates the definition of an ideal test statistic,
\[
T_{\mathrm{ideal}} := \sigma_1(R^*) - \left(1 + \sqrt{\frac{J}{N}}\right),
\]
which, by Lemma 1, satisfies $\mathbb{P}(T_{\mathrm{ideal}} > \epsilon) \to 0$ for any $\epsilon > 0$.

3.2. Practical test statistic

In practice, the true parameters $Z$ and $\Theta$ in the ideal test statistic $T_{\mathrm{ideal}}$ are unknown. We now construct a practical test statistic using estimated parameters and establish its asymptotic properties under both the null and alternative hypotheses. Given a candidate number of latent classes $K_0$, apply a consistent classification estimator $\mathcal{M}$ to obtain the estimated classification matrix $\hat{Z}$, the estimated item parameter matrix $\hat{\Theta}$, and the fitted expected response matrix $\hat{\mathcal{R}} = \hat{Z}\hat{\Theta}^\top$. With these estimates, we define our practical normalized residual matrix $\tilde{R} \in \mathbb{R}^{N \times J}$ as
\[
\tilde{R}(i,j) = \begin{cases} \dfrac{R(i,j) - \hat{\mathcal{R}}(i,j)}{\sqrt{N \hat{V}(i,j)}}, & \text{if } \hat{V}(i,j) > 0, \\[1ex] 0, & \text{otherwise}, \end{cases} \tag{2}
\]
where $\hat{V}(i,j) = \hat{\mathcal{R}}(i,j)\big(1 - \hat{\mathcal{R}}(i,j)/M\big)$.

To ensure that $\tilde{R}$ is a good proxy for the ideal normalized residual matrix $R^*$ under the null hypothesis, we need to control the error introduced by parameter estimation. This requires several regularity conditions on the latent class structure and the estimator.

Assumption 2 (Class balance). There exists a constant $c_0 > 0$ such that for all $k \in [K]$,
\[
c_0 \frac{N}{K} \leq N_k \leq \frac{1}{c_0} \frac{N}{K}.
\]

Assumption 2 ensures that no latent class is too small, which is necessary for obtaining uniform concentration bounds across classes, a key ingredient in bounding the estimation error of $\hat{\mathcal{R}}$.

Assumption 3 (Separation condition). There exist two constants $\zeta > 0$ and $c_1 > 0$ such that for any distinct true classes $k$ and $l$, the set $\mathcal{T}_{kl} = \{j \in [J] : |\Theta(j,k) - \Theta(j,l)| \geq \zeta/2\}$ satisfies $|\mathcal{T}_{kl}| \geq c_1 J$.

Assumption 3 guarantees that any two distinct latent classes are distinguished by at least a constant fraction of the items, with a minimum difference in expectation. This separation plays a key role both in establishing consistency of the estimator (under the null) and in creating a detectable signal under under-fitted alternatives.

Assumption 4 (Growth conditions). The number of items $J$ and the number of latent classes $K$ satisfy the following growth rates as $N \to \infty$: (i) $\frac{K^2 \log(N+J)}{N} \to 0$; (ii) $\frac{J K \log(JK)}{N} \to 0$.

Assumption 4 provides precise growth rates for the dimensions. They are needed to ensure that accumulated estimation errors vanish asymptotically.
Remark 1. Assumptions 1-3 are mild and widely adopted in the analysis of latent class models with high-dimensional data. Specifically, the boundedness condition in Assumption 1 and the class separation condition in Assumption 3 are essential for establishing the consistency of estimators in binary LCMs when both the number of subjects and the number of items increase [32]. Assumption 2, which ensures a minimum class size, is a typical requirement for obtaining uniform concentration bounds across all latent classes. In contrast, Assumption 4 provides precise growth rates on the dimensions $J$ and $K$ relative to the sample size $N$ that are necessary for the technical proofs of our main theorems, particularly to control the accumulated estimation errors.

Finally, the asymptotic analysis under the null hypothesis requires a condition on the classifier itself.

Assumption 5 (Consistency of classification estimator). When $K_0 = K$, the classification estimation method $\mathcal{M}$ satisfies
\[
\mathbb{P}\big(\hat{Z} = Z\Pi\big) \to 1,
\]
where $\Pi \in \{0,1\}^{K \times K}$ is a permutation matrix.

Assumption 5 requires a classifier that achieves exact recovery: $\hat{Z} = Z\Pi$ with probability tending to one. The spectral clustering with likelihood refinement (SOLA) proposed by Lyu and Gu [20] provides such a guarantee, but it is designed for binary responses and does not directly handle the polytomous ordinal responses in our model. To obtain a practical estimator for our setting, we introduce a spectral clustering algorithm called SC-LCM in Appendix C. Theorem 6 shows that SC-LCM is consistent for estimating the class membership matrix $Z$ in the sense that the clustering error $\mathrm{err}(\hat{Z}, Z)$ defined in Appendix C converges to zero in probability, but it does not meet the exact recovery requirement of Assumption 5.

This gap between a theoretical condition and the actual performance of an estimator is not uncommon. A similar situation occurs in the stochastic block model literature for network analysis. In developing a goodness-of-fit test for stochastic block models, Lei [17] assumed the existence of a community estimator that exactly recovers the true partition. In their numerical work, however, they used the spectral clustering algorithm studied by Lei and Rinaldo [18], which is only known to be consistent (i.e., its misclassification proportion tends to zero). Despite this discrepancy, their testing procedure performed very well in simulations. Recently, important theoretical advances suggest that exact recovery might also be achievable for SC-LCM under stronger conditions. The leave-one-out singular subspace perturbation analysis developed by Zhang and Zhou [33] provides a powerful tool for obtaining entrywise bounds on the difference between empirical and population singular vectors. Building on this, Lyu and Gu [20] proved that their SOLA algorithm achieves exact recovery for binary latent class models when the signal is sufficiently strong. This indicates that SC-LCM could potentially be shown to achieve exact recovery under analogous strengthened conditions. A rigorous proof, however, would require a detailed entrywise analysis that is beyond the scope of the present paper, whose main focus is to develop a goodness-of-fit test for the number of latent classes. We leave this meaningful theoretical question for future work.

Inspired by this precedent in network analysis and supported by recent theoretical insights, we adopt SC-LCM as the practical implementation of $\mathcal{M}$ in our sequential testing algorithms. As the numerical studies in Section 5 demonstrate, the resulting sequential testing procedures estimate the true number of latent classes with high accuracy across a wide range of settings, providing empirical support for using SC-LCM as a reliable plug-in estimator even though its exact recovery property has not been formally established here.
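Since Appendix C is not reproduced here, the following is only a plausible sketch of what a spectral plug-in estimator $\mathcal{M}$ of this kind might look like: a rank-$K_0$ SVD followed by k-means on the left singular vectors, with $\hat\Theta$ taken as class-wise item means. The function name `spectral_fit` and all implementation details are our own illustration, not the paper's SC-LCM algorithm.

```python
import numpy as np
from scipy.sparse.linalg import svds
from scipy.cluster.vq import kmeans2

def spectral_fit(R, K0, seed=0):
    """Return (Z_hat, Theta_hat, R_hat) for a candidate number of classes K0."""
    N, J = R.shape
    X = R.astype(float)
    if K0 < min(N, J):
        U, _, _ = svds(X, k=K0)                 # leading K0 left singular vectors
    else:
        U = np.linalg.svd(X, full_matrices=False)[0][:, :K0]
    # Cluster subjects in the K0-dimensional spectral embedding.
    _, labels = kmeans2(U, K0, minit='++', seed=seed)
    Z_hat = np.zeros((N, K0), dtype=int)
    Z_hat[np.arange(N), labels] = 1
    # Theta_hat(j,k): average response of estimated class k on item j.
    sizes = np.maximum(Z_hat.sum(axis=0), 1)    # guard against empty clusters
    Theta_hat = (X.T @ Z_hat) / sizes
    return Z_hat, Theta_hat, Z_hat @ Theta_hat.T
```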
With these assumptions in place, we can quantify the difference between the practical and ideal normalized residual matrices under the null hypothesis.

Lemma 2 (Perturbation control of normalized residual matrix). When $K = K_0$ and Assumptions 1-5 hold, we have
\[
\|\tilde{R} - R^*\| = o_P(1). \tag{3}
\]

Lemma 2 shows that under the null hypothesis, the empirical normalized residual matrix $\tilde{R}$ is asymptotically equivalent to its oracle counterpart $R^*$ in spectral norm. Hence, $\tilde{R}$ asymptotically inherits the behavior of $R^*$ shown in Lemma 1. This means that after correctly fitting a $K_0$-class model, the empirical residual matrix is asymptotically indistinguishable from a pure noise matrix. Based on this lemma, our test statistic for a candidate $K_0$ is defined as
\[
T_{K_0} = \sigma_1(\tilde{R}) - \left(1 + \sqrt{\frac{J}{N}}\right). \tag{4}
\]

The following theorem establishes the behavior of $T_{K_0}$ when the model is correctly specified.

Theorem 1 (Null behavior of test statistic). When $K = K_0$ and Assumptions 1-5 hold, for any $\epsilon > 0$, we have
\[
\lim_{N \to \infty} \mathbb{P}\big(T_{K_0} < \epsilon\big) = 1.
\]

Theorem 1 establishes that under the null hypothesis, $T_{K_0}$ is bounded above by any positive constant with probability tending to one. Hence, the test statistic is asymptotically negligible, providing no evidence against the null. This result alone, however, does not suffice for model selection. We must also understand the behavior of $T_{K_0}$ when the candidate number of latent classes is too small. When $K_0 < K$, the fitted expected response matrix $\hat{\mathcal{R}}$ cannot capture the full structure of the data, leaving a deterministic (i.e., non-random) signal in the residual matrix that we can detect. To quantify this signal, we introduce the final condition.

Assumption 6 (Signal-to-noise ratio). Define the constants
\[
c := \frac{\sqrt{c_0^3 c_1}}{2\sqrt{2}}, \qquad C_{\mathrm{signal}} := \frac{2 c \zeta}{\sqrt{M}}, \qquad C_{\mathrm{noise}} := \frac{5}{\delta}.
\]
There exists a constant $\eta_0 > 0$ such that for all sufficiently large $N$,
\[
\frac{C_{\mathrm{signal}} \sqrt{J}}{\sqrt{K K_0}} \geq C_{\mathrm{noise}} + 1 + 3\eta_0.
\]

Assumption 6 quantifies the strength of the signal: the left-hand side is the minimal deterministic signal (after the scaling in $\tilde{R}$), while the right-hand side aggregates the maximal possible noise and the centering term, plus a buffer. When this holds, the signal dominates the noise, guaranteeing that the test rejects $H_0$ with probability tending to one. We are now ready to state the power guarantee.

Theorem 2 (Alternative behavior of the test statistic). Under $K_0 < K$ and Assumptions 1-4 and 6, we have
\[
\lim_{N \to \infty} \mathbb{P}\big(T_{K_0} > 2\eta_0\big) = 1.
\]

Theorem 2 shows that under the alternative hypothesis $K_0 < K$, the test statistic $T_{K_0}$ exceeds a fixed positive constant $2\eta_0$ with probability tending to one. Hence, the test has power against any under-fitted model. Together with Theorem 1, we have established the fundamental dichotomy: the upper bound of $T_{K_0}$ is near zero under correct specification, but the statistic itself is large under under-fitting. This sharp dichotomous behavior of $T_{K_0}$ under correct and under-fitted specifications provides the foundation for the consistent sequential estimation of the true number of latent classes $K$.
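In code, Equations (2) and (4) translate directly into a few lines. The sketch below is our own illustration; the helper name `test_statistic` is hypothetical, and the fitted matrix $\hat{\mathcal{R}}$ can come from any plug-in estimator $\mathcal{M}$, such as the spectral sketch above.

```python
import numpy as np

def test_statistic(R, R_hat, M):
    """T_{K0} of Equation (4), built from the normalized residuals of Equation (2)."""
    N, J = R.shape
    V_hat = R_hat * (1.0 - R_hat / M)           # entrywise variance estimate V_hat(i,j)
    R_tilde = np.zeros((N, J))
    mask = V_hat > 0                            # entries with V_hat(i,j) = 0 stay zero
    R_tilde[mask] = (R - R_hat)[mask] / np.sqrt(N * V_hat[mask])
    sigma1 = np.linalg.svd(R_tilde, compute_uv=False)[0]
    return sigma1 - (1.0 + np.sqrt(J / N))      # center by 1 + sqrt(J/N)
```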
4. Algorithms

The sharp dichotomy in the behavior of $T_{K_0}$ under the null and alternative hypotheses motivates two natural sequential testing procedures for estimating $K$: the first stops at the smallest $K_0$ for which $T_{K_0}$ falls below a threshold; the second stops when the ratio of successive test statistics exceeds a diverging threshold. This section presents the two sequential algorithms and proves their estimation consistency.

4.1. GoF-LCM algorithm

Our first algorithm directly implements the dichotomous behavior of $T_{K_0}$ for estimating $K$. Algorithm 1 sequentially tests candidates $K_0 = 1, 2, \ldots, K_{\max}$. It accepts the first $K_0$ for which $T_{K_0}$ falls below a threshold $\tau_N$ that decays to zero. The maximum candidate $K_{\max}$ is chosen so that any $K_0 \leq K_{\max}$ respects the growth condition in Assumption 4(i). A convenient default is $K_{\max} = \lfloor\sqrt{N/\log(N+J)}\rfloor$. This choice ensures that, for all candidates considered, $K_0^2 \log(N+J)/N \to 0$, which is required for the technical arguments in the consistency proof.

Algorithm 1 GoF-LCM: Goodness-of-Fit Testing for Latent Class Models
Require: Observed response matrix $R \in \{0, 1, \ldots, M\}^{N \times J}$, maximum candidate number $K_{\max}$ (default $\lfloor\sqrt{N/\log(N+J)}\rfloor$), threshold sequence $\tau_N$ (default $\tau_N = N^{-1/5}$), and a classification estimator $\mathcal{M}$.
Ensure: Estimated number of latent classes $\hat{K}$.
1: Initialize $\hat{K} \leftarrow K_{\max}$
2: for $K_0 = 1, 2, \ldots, K_{\max}$ do
3:   Apply $\mathcal{M}$ to $R$ with candidate $K_0$ to obtain $\hat{Z}$, $\hat{\Theta}$, and $\hat{\mathcal{R}} = \hat{Z}\hat{\Theta}^\top$
4:   Compute $T_{K_0}$ via Equation (4)
5:   if $T_{K_0} < \tau_N$ then
6:     $\hat{K} \leftarrow K_0$
7:     break
8:   end if
9: end for
10: return $\hat{K}$

The following theorem establishes the consistency of Algorithm 1 in estimating the true number of latent classes $K$ under the latent class model.

Theorem 3 (Consistency of GoF-LCM). Let the true number of latent classes be $K$ (which may grow with $N$ subject to Assumption 4). Suppose Assumptions 1-6 hold and the threshold sequence $\{\tau_N\}_{N \geq 1}$ used in Algorithm 1 satisfies
\[
\text{(Con1)} \quad \tau_N \xrightarrow[N \to \infty]{} 0 \quad \text{and} \quad \frac{N \tau_N^2}{\max\big(JK\log(JK),\ \log(N+J)\big)} \xrightarrow[N \to \infty]{} \infty.
\]
Then the estimator $\hat{K}$ produced by Algorithm 1 satisfies
\[
\lim_{N \to \infty} \mathbb{P}\big(\hat{K} = K\big) = 1.
\]

Theorem 3 ensures that GoF-LCM correctly identifies the true number of latent classes with probability approaching 1 as the sample sizes grow. Condition (Con1) quantifies how slowly $\tau_N$ must decay. The first requirement, $\tau_N \to 0$, ensures that under the null $K_0 = K$ the test statistic $T_K$ eventually falls below the threshold with high probability (Theorem 1). The second requirement guarantees that for every under-fitted $K_0 < K$ the statistic $T_{K_0}$ exceeds $\tau_N$ with probability tending to one (Theorem 2). The sequential nature of GoF-LCM makes it computationally efficient, typically requiring only a few iterations before stopping at the true $K$.
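A compact sketch of Algorithm 1 follows. It assumes a fitting routine `fit(R, K0)` returning $(\hat Z, \hat\Theta, \hat{\mathcal{R}})$, such as the spectral sketch above, and the `test_statistic` helper from Section 3; both names are our own illustrations, not the paper's code.

```python
import numpy as np

def gof_lcm(R, M, fit, test_statistic, K_max=None, tau=None):
    """Accept the first candidate K0 whose test statistic falls below tau_N."""
    N, J = R.shape
    if K_max is None:
        K_max = int(np.sqrt(N / np.log(N + J)))   # default maximum candidate
    if tau is None:
        tau = N ** (-1 / 5)                       # default threshold tau_N = N^(-1/5)
    for K0 in range(1, K_max + 1):
        R_hat = fit(R, K0)[2]                     # fitted expected response matrix
        if test_statistic(R, R_hat, M) < tau:     # accept H0: K = K0
            return K0
    return K_max                                  # no candidate accepted

# Example usage: K_hat = gof_lcm(R, M=5, fit=spectral_fit, test_statistic=test_statistic)
```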
Remark 2 (Choice of $\tau_N$). A simple and theoretically valid default is $\tau_N = N^{-1/5}$. Under Assumption 4, we have $JK\log(JK) = o(N)$ and $\log(N+J) = o(N)$. Hence $N\tau_N^2 = N^{3/5}$ grows faster than both terms in the denominator, satisfying (Con1) provided that $JK\log(JK)$ and $\log(N+J)$ do not approach the order of $N$ too rapidly. In practice, the algorithm is robust to moderate variations of $\tau_N$ as long as it decays slowly. Other choices such as $\tau_N = (\log N)^{-1}$ are also admissible under (Con1) when $JK\log(JK)$ grows sufficiently slowly.

4.2. RGoF-LCM algorithm

In this section, we develop a ratio-based goodness-of-fit test for the latent class model. This method complements the GoF-LCM algorithm by using the ratio of successive test statistics, which often exhibits more robust finite-sample behaviour. Recall that for a candidate number of latent classes $K_0$, the test statistic $T_{K_0}$ is defined in Equation (4). For $K_0 \geq 2$, we introduce the ratio
\[
r_{K_0} := \left|\frac{T_{K_0 - 1}}{T_{K_0}}\right|, \tag{5}
\]
with the convention that $r_1$ is not defined. The absolute value handles the possibility that $T_K$ may be negative under the true model, since Theorem 1 only bounds $T_K$ from above. The following theorem characterizes the asymptotic behaviour of the ratio statistic $r_{K_0}$.

Theorem 4 (Asymptotic behaviour of the ratio statistic). Assume that Assumptions 1-6 hold and $K^3 = o(N)$. We have:
1. (Divergence at the true model.) For the true candidate $K_0 = K$, $r_K \xrightarrow{P} \infty$ as $N \to \infty$.
2. (Upper bound under under-fitting.) For every $K_0$ with $2 \leq K_0 < K$,
\[
\lim_{N \to \infty} \mathbb{P}\left(r_{K_0} > \frac{\sqrt{M}}{c_{\mathrm{low}}}\sqrt{K(K-1)}\right) = 0,
\]
where $c_{\mathrm{low}}$ is the constant from Lemma 7 (depending only on the model parameters $\delta, c_0, c_1, M, \zeta$), and the factor $\sqrt{K(K-1)}$ reflects the dependence on the true number of latent classes.

In contrast to the original test statistic $T_{K_0}$, whose upper bound converges to zero under the null (Theorem 1) while the statistic itself exceeds a fixed positive constant under under-fitting (Theorem 2), the ratio $r_{K_0}$ amplifies this difference. It diverges to infinity at the true model $K_0 = K$, but for every under-fitted $K_0 < K$ it remains bounded above by a quantity depending on $K$. Thus $r_{K_0}$ exhibits a sharp peak at the true number of latent classes, suggesting a natural sequential stopping rule: compute $r_{K_0}$ for increasing $K_0$ and stop at the first value for which the ratio exceeds a diverging threshold $\gamma_N$.

We now propose a sequential algorithm based on this idea. The algorithm first checks the candidate $K_0 = 1$ using the original test statistic with a decaying threshold $\tau_N$; if accepted, it returns $\hat{K} = 1$. Otherwise, it proceeds to compute ratios for $K_0 = 2, 3, \ldots, K_{\max}$ and stops at the first $K_0$ for which $r_{K_0} > \gamma_N$. The thresholds must satisfy the conditions in Theorem 5, given below, to ensure consistent estimation of $K$.
Algorithm 2 RGoF-LCM: Ratio-based Goodness-of-Fit for Latent Class Models
Require: Observed response matrix $R \in \{0, 1, \ldots, M\}^{N \times J}$, maximum candidate number $K_{\max}$ (default $\lfloor\sqrt{N/\log(N+J)}\rfloor$), threshold sequences $\tau_N$ (default $\tau_N = N^{-1/5}$) and $\gamma_N$ (default $\gamma_N = \log N$), and a classification estimator $\mathcal{M}$.
Ensure: Estimated number of latent classes $\hat{K}$.
1: Compute $T_1$ with $K_0 = 1$
2: if $T_1 < \tau_N$ then
3:   return $\hat{K} = 1$
4: end if
5: for $K_0 = 2, 3, \ldots, K_{\max}$ do
6:   Compute $r_{K_0}$ via Equation (5)
7:   if $r_{K_0} > \gamma_N$ then
8:     return $\hat{K} = K_0$
9:   end if
10: end for
11: return $\hat{K} = K_{\max}$

We now prove that under appropriate conditions on the threshold sequences, the RGoF-LCM algorithm consistently estimates the true number of latent classes $K$. For this theorem, we assume that $K$ is fixed (does not grow with $N$) to simplify the analysis and the choice of the threshold $\gamma_N$.

Theorem 5 (Consistency of RGoF-LCM). Let the true number of latent classes be $K$ (fixed, i.e., not growing with $N$). Assume that Assumptions 1-6 hold. Let the threshold sequence $\gamma_N$ satisfy
\[
\text{(Con2)} \quad \gamma_N \xrightarrow[N \to \infty]{} \infty \quad \text{and} \quad \frac{\gamma_N}{\sqrt{N/\log J}} \xrightarrow[N \to \infty]{} 0.
\]
Then the estimator $\hat{K}$ produced by Algorithm 2 satisfies
\[
\lim_{N \to \infty} \mathbb{P}(\hat{K} = K) = 1.
\]

Theorem 5 guarantees that RGoF-LCM consistently recovers the true number of latent classes when $K$ is fixed. Condition (Con2) ensures that the diverging threshold $\gamma_N$ grows fast enough to be exceeded by $r_K$ at the true model (Theorem 4, part 1), yet slowly enough that the under-fitted ratios $r_{K_0}$ ($K_0 < K$) stay below $\gamma_N$ with high probability (Theorem 4, part 2). The two requirements together prevent both under-estimation and over-estimation asymptotically.

Remark 3 (Why $K$ is assumed fixed). Theorem 5 assumes that the true number of latent classes $K$ does not grow with the sample size. This assumption is made for theoretical convenience: the upper bound for the under-fitted ratios $r_{K_0}$ in Theorem 4 contains a factor $\sqrt{K(K-1)}$ that depends on $K$. When $K$ is fixed, this factor is a constant, and the condition on $\gamma_N$ can be expressed in a simple form independent of $K$. If $K$ were allowed to increase with $N$, the bound would grow with $K$, and the choice of a diverging threshold $\gamma_N$ would have to depend on the growth rate of $K$ as well. Such an extension is possible in principle but would make the analysis considerably more complex. We therefore restrict ourselves to the fixed-$K$ setting. In practice, as long as $K$ is small relative to the sample size, the fixed-$K$ analysis provides reliable guidance for selecting $\gamma_N$.

Remark 4 (Choice of $\gamma_N$). A simple and theoretically valid default is $\gamma_N = \log N$. Indeed $\log N \to \infty$, and under Assumption 4 we have $J = o(N)$, hence $\log J \leq \log N$ for all sufficiently large $N$. Consequently,
\[
\frac{\log N}{\sqrt{N/\log J}} \leq \frac{\log N}{\sqrt{N/\log N}} = \frac{(\log N)^{3/2}}{\sqrt{N}} \xrightarrow[N \to \infty]{} 0,
\]
so $\gamma_N = \log N$ satisfies Condition (Con2). In practice, other slowly diverging sequences such as $\log\log N$ are also admissible as long as they meet the growth restrictions.
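The ratio-based procedure admits an equally short sketch, again assuming the hypothetical `fit` and `test_statistic` helpers from the earlier sketches; this is our illustration under those assumptions, not the paper's implementation.

```python
import numpy as np

def rgof_lcm(R, M, fit, test_statistic, K_max=None, tau=None, gamma=None):
    """Stop at the first K0 >= 2 whose ratio r_{K0} exceeds gamma_N."""
    N, J = R.shape
    if K_max is None:
        K_max = int(np.sqrt(N / np.log(N + J)))
    tau = N ** (-1 / 5) if tau is None else tau   # threshold for the K0 = 1 check
    gamma = np.log(N) if gamma is None else gamma # diverging ratio threshold
    T_prev = test_statistic(R, fit(R, 1)[2], M)   # T_1, checked against tau_N
    if T_prev < tau:
        return 1
    for K0 in range(2, K_max + 1):
        T_cur = test_statistic(R, fit(R, K0)[2], M)
        if abs(T_prev / T_cur) > gamma:           # r_{K0} = |T_{K0-1}/T_{K0}|, Eq. (5)
            return K0
        T_prev = T_cur
    return K_max
```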
5. Numerical Studies

In this section, we conduct comprehensive experimental studies to evaluate the performance of the proposed goodness-of-fit test and the two sequential estimation algorithms GoF-LCM and RGoF-LCM. Our numerical experiments are designed to empirically validate the theoretical properties established in Theorems 1 to 5. Specifically, we investigate:

1. The behavior of the test statistic $T_{K_0}$ and the ratio statistic $r_{K_0}$ under both the null hypothesis ($H_0: K = K_0$) and the alternative hypothesis ($H_1: K > K_0$).
2. The acceptance rates of GoF-LCM and RGoF-LCM under the true model and their rejection rates under under-fitted models.
3. The accuracy of GoF-LCM and RGoF-LCM in estimating the true number of latent classes $K$ under various combinations of sample size $N$, number of items $J$, signal-strength parameter $\delta$, and true number of latent classes $K$.
4. The computational efficiency of the two proposed methods as the sample size $N$ increases.
5. The sensitivity of the algorithms to their thresholds: $\tau_N$ for GoF-LCM and $\gamma_N$ for RGoF-LCM.
6. The robustness of GoF-LCM and RGoF-LCM to the number of items $J$ when $J$ grows much faster than $N$, thereby violating the growth condition $J = o(N)$ required by Assumption 4.

5.1. General simulation setup

Data are generated from the latent class model for ordinal categorical data as defined in Definition 1. The generation process follows the steps outlined below, with all parameters chosen to satisfy Assumptions 1-3 unless otherwise noted.

Class membership matrix $Z$. For a given true number of latent classes $K$, we assign each of the $N$ subjects to one of the $K$ classes independently with equal probability $1/K$. This yields a membership vector $\ell \in [K]^N$, and the classification matrix $Z \in \{0,1\}^{N \times K}$ is defined by $Z(i,k) = \mathbb{1}\{\ell(i) = k\}$. This random assignment ensures that Assumption 2 (class balance) holds with high probability for sufficiently large $N$.

Item parameter matrix $\Theta$. We generate $\Theta \in [0, M]^{J \times K}$ with controlled signal strength and class separation as follows. Let $M = 5$ be fixed throughout all experiments. For each $j \in [J]$ and $k \in [K]$, independently draw
\[
\theta_{jk} \sim \mathrm{Uniform}[\delta M, (1-\delta)M],
\]
where $\delta \in (0, 0.5]$ is the signal-strength parameter. This construction guarantees $\delta \leq \theta_{jk}/M \leq 1 - \delta$ by definition, thereby satisfying Assumption 1.

Assumption 3 requires that for any two distinct classes $k_1, k_2$, at least a fraction $c_1$ of the items exhibit a difference in expectation of at least $\zeta/2$. We set $c_1 = 0.3$ throughout and define the separation threshold as $\zeta = (1-2\delta)M/2$, which is half the length of the uniform interval. In our experiments, the number of items is at least $J = 60$ and the maximum number of classes is $K = 8$. For a fixed pair of distinct classes $k_1, k_2$, consider the indicator
\[
X_j = \mathbb{1}\big(|\theta_{j,k_1} - \theta_{j,k_2}| \geq \zeta/2\big), \quad j = 1, \ldots, J.
\]
Because $\theta_{j,k_1}$ and $\theta_{j,k_2}$ are independent and uniformly distributed, a simple geometric argument yields
\[
\mathbb{P}(X_j = 1) = \left(1 - \frac{\zeta/2}{(1-2\delta)M}\right)^2 = \left(\frac{3}{4}\right)^2 = \frac{9}{16},
\]
which is independent of $\delta$. Hence $\mathbb{E}\big[\sum_{j=1}^J X_j\big] = 9J/16$, which for $J = 60$ equals 33.75, well above the required $c_1 J = 18$. By Hoeffding's inequality, the probability that a single pair fails to meet the condition is bounded by
\[
\mathbb{P}\left(\sum_{j=1}^{J} X_j < 0.3 J\right) \leq \exp\left(-2J\left(\frac{9}{16} - 0.3\right)^2\right) = \exp(-0.1378125\, J).
\]
For $J = 60$ this bound is $\exp(-8.26875) \approx 2.57 \times 10^{-4}$. A union bound over all $\binom{K}{2}$ class pairs (with $K = 8$, $\binom{8}{2} = 28$) gives an overall failure probability of at most
\[
\binom{8}{2} \exp(-0.1378125 \times 60) \approx 0.0072.
\]
Thus, with probability exceeding 0.9928, a randomly generated $\Theta$ matrix automatically satisfies Assumption 3. For larger $J$ this probability rapidly approaches 1 (e.g., for $J = 70$ it exceeds 0.998). Given this strong theoretical guarantee, no rejection sampling or verification step is needed; all matrices are used directly in the simulations. The signal-strength parameter $\delta$ directly controls the width of the admissible interval $[\delta M, (1-\delta)M]$: larger $\delta$ reduces $\zeta$, weakening the signal, exactly as intended.
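The geometric claim $\mathbb{P}(X_j = 1) = 9/16$ is easy to verify numerically. The following quick Monte Carlo check is our own, not from the paper, and uses $M = 5$ and $\delta = 0.2$ for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
M, delta = 5, 0.2
L = (1 - 2 * delta) * M                     # length of the interval [delta*M, (1-delta)*M]
t1 = rng.uniform(delta * M, (1 - delta) * M, size=10**6)
t2 = rng.uniform(delta * M, (1 - delta) * M, size=10**6)
# zeta/2 = (1 - 2*delta)*M/4 = L/4, and P(|t1 - t2| >= L/4) = (3/4)^2 = 0.5625
print(np.mean(np.abs(t1 - t2) >= L / 4))    # prints approximately 0.5625
```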
Response matrix $R$. Given $Z$ and $\Theta$, the expected response matrix is $\mathcal{R} = Z\Theta^\top$. For each $i \in [N]$ and $j \in [J]$, the response $R(i,j)$ is independently drawn from a Binomial distribution with $M$ trials and success probability $\mathcal{R}(i,j)/M$.

Estimation and evaluation. For each simulated dataset, we apply the SC-LCM algorithm (Algorithm 3 in Appendix C) as the classification estimator $\mathcal{M}$ to obtain $\hat{Z}$ and $\hat{\Theta}$. The test statistic $T_{K_0}$ is then computed using Equation (4), and the ratio $r_{K_0}$ is computed via Equation (5). For the sequential algorithms, we use the default thresholds $\tau_N = N^{-1/5}$ for GoF-LCM and $\gamma_N = \log N$ for RGoF-LCM, unless otherwise stated. The maximum candidate number is set to $K_{\max} = \lfloor\sqrt{N/\log(N+J)}\rfloor$. All results are based on 200 independent Monte Carlo replications.

5.2. Experiment 1: behavior of $T_{K_0}$ and $r_{K_0}$ under $H_0$ and $H_1$

This experiment empirically verifies the sharp dichotomous behavior of $T_{K_0}$ (Theorems 1 and 2) and $r_{K_0}$ (Theorem 4). We fix the true number of latent classes $K = 4$, the number of items $J = 60$, and the signal-strength parameter $\delta = 0.2$. We then compute $T_{K_0}$ for $K_0 = 1, 2, 3, 4$ and the ratio statistics $r_2 = |T_1/T_2|$, $r_3 = |T_2/T_3|$, and $r_4 = |T_3/T_4|$. We vary the sample size $N \in \{200, 400, 600, 800, 1000\}$. For each $N$ and each candidate $K_0$, we compute the mean and standard deviation of $T_{K_0}$ over 200 replications.

Table 1 reports the results. Under the correctly specified model ($K_0 = K = 4$), the mean of $T_4$ is close to zero and its absolute value decreases as $N$ grows, with the standard deviation also shrinking. This confirms Theorem 1. For the under-fitted models ($K_0 = 1, 2, 3$), $T_{K_0}$ takes large positive values that increase with $N$, consistent with the fixed positive lower bound guaranteed by Theorem 2. The ratios $r_2 = |T_1/T_2|$ and $r_3 = |T_2/T_3|$ remain bounded between 1.16 and 1.21 as $N$ increases, consistent with part 2 of Theorem 4 for $K_0 < K$. In contrast, $r_4 = |T_3/T_4|$ diverges because $T_3$ grows while $|T_4|$ tends to zero, confirming part 1 of Theorem 4 for the true candidate $K_0 = K$.

Table 1: Behavior of $T_{K_0}$ and ratios $r_2, r_3, r_4$ under $H_0$ ($K_0 = 4$) and under-fitted models ($K_0 = 1, 2, 3$) for true $K = 4$. Values are mean (standard deviation) over 200 replications.

N    | K0=1 (H1)     | K0=2 (H1)     | K0=3 (H1)     | K0=4 (H0)      | r2=|T1/T2|  | r3=|T2/T3|  | r4=|T3/T4|
200  | 2.020 (0.171) | 1.706 (0.147) | 1.418 (0.149) | -0.037 (0.026) | 1.19 (0.12) | 1.21 (0.13) | 255.42 (2952.98)
400  | 2.135 (0.167) | 1.835 (0.147) | 1.551 (0.153) | -0.024 (0.018) | 1.17 (0.10) | 1.19 (0.12) | 157.51 (486.75)
600  | 2.214 (0.174) | 1.891 (0.147) | 1.620 (0.157) | -0.018 (0.014) | 1.18 (0.10) | 1.18 (0.12) | 229.62 (676.48)
800  | 2.257 (0.167) | 1.933 (0.152) | 1.662 (0.153) | -0.017 (0.011) | 1.17 (0.11) | 1.17 (0.12) | 388.22 (2547.19)
1000 | 2.278 (0.169) | 1.956 (0.150) | 1.690 (0.161) | -0.014 (0.010) | 1.17 (0.10) | 1.16 (0.12) | 450.91 (2527.84)

5.3. Experiment 2: acceptance and rejection rates

This experiment evaluates the ability of both sequential algorithms, GoF-LCM and RGoF-LCM, to correctly accept the true model and reject under-fitted models. We fix $N = 1000$, $J = 60$, $\delta = 0.2$,
and consider true numbers of latent classes $K \in \{2, 3, 4, 5, 6\}$. For each true $K$, we simulate 200 independent datasets. On each dataset, we apply both algorithms:

• For GoF-LCM, we record the stopping $K_0$ as the first candidate for which $T_{K_0} < \tau_N$ (with $\tau_N = N^{-1/5}$).
• For RGoF-LCM, we record the stopping $K_0$ as the first candidate for which $r_{K_0} > \gamma_N$ (with $\gamma_N = \log N$). If $T_1 < \tau_N$, the algorithm stops at $K_0 = 1$; otherwise, it proceeds to compute $r_{K_0}$ for $K_0 \geq 2$.

We then compute the proportion of times each algorithm stops at each candidate $K_0$ over the 200 replications. Table 2 presents the results. For each true $K$, the table shows the stopping proportions for GoF-LCM and RGoF-LCM side by side, with columns covering $K_0 = 1$ to 10 to fully capture overfitting behavior. The main findings are:

• Both algorithms exhibit high acceptance rates at the true model. For $K = 2, 3, 4, 5$, both methods stop at the true $K$ in all 200 replications. For $K = 6$, GoF-LCM stops at $K_0 = 6$ in all replications, while RGoF-LCM does so in 99.5% of the cases, with a small overfitting probability of 0.5% of stopping at $K_0 = 7$.
• The probability of stopping at an under-fitted model ($K_0 < K$) is effectively zero for both algorithms in every case.
• Overfitting ($K_0 > K$) occurs with very low probability, observed only in the $K = 6$ case for RGoF-LCM.

These results empirically confirm the consistency of both sequential algorithms.

Table 2: Proportion of times GoF-LCM and RGoF-LCM stop at each candidate $K_0$ for true $K = 2, 3, 4, 5, 6$. Columns cover $K_0 = 1$ to 10.

True K | Algorithm | K0=1  | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10
2      | GoF-LCM   | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
2      | RGoF-LCM  | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
3      | GoF-LCM   | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
3      | RGoF-LCM  | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
4      | GoF-LCM   | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
4      | RGoF-LCM  | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
5      | GoF-LCM   | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
5      | RGoF-LCM  | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
6      | GoF-LCM   | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000
6      | RGoF-LCM  | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.995 | 0.005 | 0.000 | 0.000 | 0.000

5.4. Experiment 3: accuracy in estimating $K$

We now assess the estimation accuracy of the proposed methods GoF-LCM and RGoF-LCM, and compare them with the spectral thresholding method (denoted Spec) proposed in Equation (17) of Lyu and Gu [20], which estimates $K$ by counting the singular values exceeding $2.01(\sqrt{J} + \sqrt{N})$. The parameters are $N \in \{200, 600\}$, $J \in \{60, 100\}$, $\delta \in \{0.1, 0.2, 0.3\}$, and $K \in \{1, 2, 3, 4\}$. For each combination, we generate 200 independent datasets and apply all three methods. Accuracy is defined as the proportion of replications in which the estimated $\hat{K}$ equals the true $K$. Standard errors are computed as $\sqrt{\mathrm{accuracy} \times (1 - \mathrm{accuracy})/200}$ and are shown in parentheses.

Table 3 reports the accuracy for all 48 parameter combinations. The main findings are:

• The case $K = 1$ is trivial: all methods achieve perfect accuracy across all settings.
• GoF-LCM and RGoF-LCM achieve near-perfect accuracy across almost all settings. For all combinations except one, both methods attain an accuracy of 1.000.
The only exception is $K = 4$, $N = 200$, $J = 60$, $\delta = 0.3$, where GoF-LCM yields 0.985 and RGoF-LCM yields 0.990. This demonstrates the strong consistency and reliability of the proposed sequential testing procedures, which are able to recover the true number of latent classes even under relatively weak signal conditions.

• The spectral thresholding method (Spec) performs unreliably, with accuracy ranging from perfect to zero depending on the signal strength. Its accuracy depends critically on the strength of the signal, which is controlled by $\delta$, $K$, $N$, and $J$. Recall that $\delta$ is a signal-strength parameter: smaller $\delta$ allows the class-specific parameters $\Theta(j,k)$ to take values near the extremes 0 or $M$, creating substantial differences between classes and a strong signal, while larger $\delta$ confines $\Theta(j,k)$ to a narrow interval around $M/2$, reducing class separation and weakening the signal. The data confirm this interpretation:
  – At $\delta = 0.1$ (strong signal), Spec performs excellently, with accuracy 1.000 in all but one case ($K = 4$, $N = 200$, $J = 60$, $\delta = 0.1$ yields 0.995).
  – At $\delta = 0.2$ (moderate signal), Spec remains good overall, but some degradation appears: $K = 3$, $N = 200$, $J = 60$, $\delta = 0.2$ gives 0.995, and $K = 4$, $N = 200$, $J = 60$, $\delta = 0.2$ drops to 0.520.
  – At $\delta = 0.3$ (weak signal), Spec frequently fails. For $K = 3$, accuracy ranges from 0.000 to 0.995; for $K = 4$, it is 0.000 in three of four combinations, reaching only 0.565 at the largest sample size ($N = 600$, $J = 100$).
• While Spec is simple and computationally cheap, its performance is highly sensitive to the signal strength. In weak-signal regimes ($\delta = 0.3$), it can completely fail. In contrast, GoF-LCM and RGoF-LCM adaptively test the fit of candidate models and maintain near-perfect accuracy across all scenarios, including the weakest signal considered. The slight advantage of RGoF-LCM over GoF-LCM in the hardest case ($K = 4$, $N = 200$, $J = 60$, $\delta = 0.3$) confirms its marginally improved robustness.

These results highlight the practical advantage of the proposed sequential testing framework: it delivers highly accurate and robust estimates of the number of latent classes without requiring manual tuning, whereas the simple spectral thresholding method can fail catastrophically when the signal is weak.

5.5. Experiment 4: computational efficiency under weak signal

We investigate the performance of GoF-LCM, RGoF-LCM, and Spec in a challenging weak-signal scenario with signal-strength parameter $\delta = 0.3$, number of items $J = 60$, true number of latent classes $K = 8$, and sample size $N$ ranging from 400 to 4000 in increments of 400. For each configuration, we generate 200 independent datasets and apply all three methods. The accuracy (proportion of correct estimates of $K$) and the average running time (in seconds) are recorded.

Figure 1 displays the results. Under this weak-signal scenario, RGoF-LCM demonstrates strong robustness, achieving high accuracy even at the smallest sample size and reaching perfect accuracy for almost all sample sizes. GoF-LCM, on the other hand, has low accuracy at $N = 400$ and $N = 800$, but its performance improves rapidly as $N$ increases, reaching perfect accuracy for $N \geq 1600$. In contrast, Spec fails completely in this weak-signal regime: its accuracy is almost always zero.
All three methods are computationally efficient. The running times of GoF-LCM and RGoF-LCM scale approximately linearly with $N$, and even at $N = 4000$ the average time remains below 0.6 seconds. Spec is extremely fast, with negligible running time across all sample sizes, due to its simple singular value thresholding procedure.

Table 3: Accuracy (with standard error in parentheses) of GoF-LCM, RGoF-LCM, and the spectral thresholding method (Spec) for all parameter combinations.

K | N   | J   | δ   | GoF-LCM       | RGoF-LCM      | Spec
1 | 200 | 60  | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 200 | 60  | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 200 | 60  | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 200 | 100 | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 200 | 100 | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 200 | 100 | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 600 | 60  | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 600 | 60  | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 600 | 60  | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 600 | 100 | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 600 | 100 | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
1 | 600 | 100 | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 200 | 60  | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 200 | 60  | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 200 | 60  | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 0.880 (0.023)
2 | 200 | 100 | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 200 | 100 | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 200 | 100 | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 600 | 60  | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 600 | 60  | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 600 | 60  | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 600 | 100 | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 600 | 100 | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
2 | 600 | 100 | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
3 | 200 | 60  | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
3 | 200 | 60  | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 0.995 (0.005)
3 | 200 | 60  | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 0.000 (0.000)
3 | 200 | 100 | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
3 | 200 | 100 | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
3 | 200 | 100 | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 0.440 (0.035)
3 | 600 | 60  | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
3 | 600 | 60  | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
3 | 600 | 60  | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 0.450 (0.035)
3 | 600 | 100 | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
3 | 600 | 100 | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
3 | 600 | 100 | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 0.995 (0.005)
4 | 200 | 60  | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 0.995 (0.005)
4 | 200 | 60  | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 0.520 (0.035)
4 | 200 | 60  | 0.3 | 0.985 (0.009) | 0.990 (0.007) | 0.000 (0.000)
4 | 200 | 100 | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
4 | 200 | 100 | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
4 | 200 | 100 | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 0.000 (0.000)
4 | 600 | 60  | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
4 | 600 | 60  | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 0.980 (0.010)
4 | 600 | 60  | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 0.000 (0.000)
4 | 600 | 100 | 0.1 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
4 | 600 | 100 | 0.2 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000)
4 | 600 | 100 | 0.3 | 1.000 (0.000) | 1.000 (0.000) | 0.565 (0.035)
[Figure 1: Accuracy (left) and running time (right) of GoF-LCM, RGoF-LCM, and Spec for $K = 8$, $J = 60$, $\delta = 0.3$, with varying $N$.]

These results confirm that RGoF-LCM is the method of choice in weak-signal settings, offering both high accuracy and fast computation across the entire range of sample sizes. GoF-LCM also achieves excellent accuracy once the sample size is sufficiently large, while Spec, despite its speed, is unreliable when the signal is weak.

5.6. Experiment 5: sensitivity to thresholds

This experiment investigates the sensitivity of the two algorithms to their primary tuning parameters. We fix $K = 5$, $N = 1000$, $J = 60$, and $\delta = 0.2$.

5.6.1. Sensitivity of GoF-LCM to $\tau_N = N^{-\epsilon}$

We vary the decay parameter $\epsilon$ in the threshold $\tau_N = N^{-\epsilon}$ over the set $\{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0\}$. For each $\epsilon$, we run GoF-LCM on 200 simulated datasets and record the accuracy. The left part of Table 4 shows the results. Accuracy remains perfect (1.000) for $\epsilon \leq 0.4$, then gradually declines: 0.995 at $\epsilon = 0.5$, 0.990 at $\epsilon = 0.6$, 0.970 at $\epsilon = 0.7$, 0.960 at $\epsilon = 0.8$, and finally 0.935 and 0.930 at $\epsilon = 0.9$ and 1.0, respectively. Overall, GoF-LCM is robust to the choice of $\epsilon$ as long as it does not exceed 0.5; beyond this point, performance gradually degrades. The default $\epsilon = 0.2$ lies well within the stable region and yields perfect accuracy.

5.6.2. Sensitivity of RGoF-LCM to $\gamma_N = a \log N$

We vary the multiplier $a$ in the threshold $\gamma_N = a \log N$ over the set $\{0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0\}$. For each $a$, we run RGoF-LCM on 200 simulated datasets and record the accuracy. The right part of Table 4 shows the results. Accuracy is perfect (1.000) for all $a \leq 4.5$, and only slightly lower (0.995) at $a = 5.0$. This demonstrates that RGoF-LCM is highly robust to the choice of the multiplier $a$ over a wide range. The default $a = 1$ ($\gamma_N = \log N$) lies well within the stable region.

Table 4: Sensitivity of GoF-LCM to $\epsilon$ in $\tau_N = N^{-\epsilon}$ and of RGoF-LCM to the multiplier $a$ in $\gamma_N = a \log N$.

GoF-LCM: τ_N = N^(-ε)        RGoF-LCM: γ_N = a log N
ε     Accuracy               a     Accuracy
0.1   1.000                  0.5   1.000
0.2   1.000                  1.0   1.000
0.3   1.000                  1.5   1.000
0.4   1.000                  2.0   1.000
0.5   0.995                  2.5   1.000
0.6   0.990                  3.0   1.000
0.7   0.970                  3.5   1.000
0.8   0.960                  4.0   1.000
0.9   0.935                  4.5   1.000
1.0   0.930                  5.0   0.995

5.7. Experiment 6: performance under large $J$ (violating $J = o(N)$)

The theoretical analysis in Sections 3 and 4 relies on Assumption 4, which requires $\frac{JK\log(JK)}{N} \to 0$. In particular, this assumption forces the number of items $J$ to grow more slowly than the sample size $N$. This condition is used in the proofs to control accumulated estimation errors. Experiment 6 investigates the performance of GoF-LCM and RGoF-LCM when $J$ is allowed to be substantially larger than $N$, i.e., when $J$ grows faster than $N$ and the term $\frac{JK\log(JK)}{N}$ does not tend to zero.

We fix the sample size $N = 600$, the true number of latent classes $K = 8$, and the signal-strength parameter $\delta = 0.3$ (weak signal). The number of items $J$ is varied from 200 to 2000 in steps of 200. For each value of $J$, we generate 200 independent datasets. Figure 2 reports the accuracy of both algorithms for all considered values of $J$.
5.7. Experiment 6: performance under large $J$ (violating $J = o(N)$)

The theoretical analysis in Sections 3 and 4 relies on Assumption 4, which requires $\frac{JK\log(JK)}{N} \to 0$. In particular, this assumption forces the number of items $J$ to grow more slowly than the sample size $N$. This condition is used in the proofs to control accumulated estimation errors. Experiment 6 investigates the performance of GoF-LCM and RGoF-LCM when $J$ is allowed to be substantially larger than $N$, i.e., when $J$ grows faster than $N$ and the term $\frac{JK\log(JK)}{N}$ does not tend to zero.

We fix the sample size $N = 600$, the true number of latent classes $K = 8$, and the signal strength parameter $\delta = 0.3$ (weak signal). The number of items $J$ is varied from 200 to 2000 in steps of 200. For each value of $J$, we generate 200 independent datasets. Figure 2 reports the accuracy of both algorithms for all considered values of $J$. Remarkably, for every value of $J$ (including $J = 2000$, which is more than three times $N$), both GoF-LCM and RGoF-LCM achieve almost perfect accuracy of $1.000$.

[Figure 2: Accuracy of GoF-LCM and RGoF-LCM for $K = 8$, $N = 600$, $\delta = 0.3$, with varying $J$.]

These results show that the proposed sequential testing procedures are robust to the dimension of the item space. Although Assumption 4 requires $J = o(N)$ for the theoretical derivations, the actual performance does not deteriorate even when $J$ significantly exceeds $N$. For example, at $J = 2000$ we have $\frac{JK\log(JK)}{N} \approx 258$, which is far from zero, yet the algorithms still recover the true number of classes with probability one. This suggests that the growth condition in Assumption 4 is a sufficient condition imposed by the proof technique and may not be necessary for the consistency of the algorithms. This empirical finding opens the possibility of relaxing the growth conditions in future theoretical work.

5.8. Real data example

We illustrate the proposed sequential testing procedures using a publicly available dataset from the Open-Source Psychometrics Project. The data file `randomnumber.zip` can be downloaded from https://openpsychometrics.org/_rawdata/. The survey consisted of a cognitive task (generating random numbers) followed by a standard Big Five Personality Test (BFPT). The test comprises $J = 50$ items, each rated on a six-point scale with integer values from 0 to 5. For each item, 0 indicates the lowest level of agreement and 5 the highest. The dataset contains responses from $N = 1369$ individuals, yielding a response matrix $R \in \{0, 1, \ldots, 5\}^{1369 \times 50}$.
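Both diagnostics reported below can be reproduced directly from $R$. The sketch that follows is ours and purely illustrative: it uses `KMeans` from scikit-learn as a stand-in for the unspecified classifier $\mathcal{M}$ in Algorithm 1 Step 1, so the resulting numbers will differ somewhat from those reported here.

```python
import numpy as np
from sklearn.cluster import KMeans

def test_statistic(R, K0, M):
    """T_{K0} = sigma_1(R_tilde) - (1 + sqrt(J/N)) for candidate K0."""
    N, J = R.shape
    labels = KMeans(n_clusters=K0, n_init=10, random_state=0).fit_predict(R)
    R_hat = np.empty((N, J))
    for k in range(K0):
        R_hat[labels == k] = R[labels == k].mean(axis=0)   # class-wise item means
    V_hat = np.clip(R_hat * (1 - R_hat / M), 1e-12, None)  # guard zero variances
    R_tilde = (R - R_hat) / np.sqrt(N * V_hat)
    return np.linalg.norm(R_tilde, 2) - (1 + np.sqrt(J / N))

# R would be the 1369 x 50 BFPT matrix with entries in {0, ..., 5}:
# stats = [test_statistic(R, K0, M=5) for K0 in range(1, 14)]
# ratios = [abs(stats[i - 1] / stats[i]) for i in range(1, len(stats))]
```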
[Figure 3: Test statistic $T_{K_0}$ (left) and ratio $r_{K_0} = |T_{K_0-1}/T_{K_0}|$ (right) versus the candidate number of latent classes $K_0$ for the BFPT dataset.]

The left panel of Figure 3 plots the test statistic $T_{K_0}$ for the BFPT data ($N = 1369$, $J = 50$). All values lie between 1 and 2.2, well above GoF-LCM's default threshold $\tau_N = N^{-1/5} \approx 0.24$. Consequently, GoF-LCM never encounters a $K_0$ satisfying $T_{K_0} < \tau_N$ and therefore runs to the maximum candidate $K_{\max} = \lfloor\sqrt{N/\log(N+J)}\rfloor \approx 13$, producing $\hat K = 13$ by default. Thus, GoF-LCM yields an estimate that is likely too large and provides little insight into the underlying structure.

The right panel displays the ratio $r_{K_0} = |T_{K_0-1}/T_{K_0}|$. It attains a clear peak at $K_0 = 2$ ($r_2 \approx 1.47$), while for $K_0 \ge 3$ the ratios fluctuate between 0.9 and 1.2 without further structure. This pattern indicates that the relative change in $T_{K_0}$ is largest when moving from $K_0 = 1$ to $K_0 = 2$, suggesting that a two-class model captures the dominant heterogeneity in the data. Such a profile is exactly what RGoF-LCM is designed to exploit. Theorem 5 requires the stopping threshold $\gamma_N$ to satisfy $\gamma_N \to \infty$ and $\gamma_N = o(\sqrt{N/\log J})$; any such sequence is admissible. The default $\gamma_N = \log N \approx 7.22$ exceeds $r_2$, so the formal sequential procedure does not stop at $K_0 = 2$. Nevertheless, the pronounced maximum provides valuable diagnostic information: it suggests that a two-class model offers the most appropriate description of the BFPT data, even though the default threshold is too conservative to trigger a stop.

This observation highlights a key difference between the two approaches. GoF-LCM relies on an absolute threshold and fails when model assumptions are violated. RGoF-LCM, by focusing on relative improvements, can reveal dominant structural transitions through the shape of $r_{K_0}$, even when all $T_{K_0}$ values are inflated. In Figure 3, this leads to the clear peak at $K_0 = 2$, a conclusion that would be missed by examining $T_{K_0}$ alone or by a mechanical application of the default threshold rule. From a broader perspective, these results illustrate the value of diagnostic tools sensitive to relative rather than absolute fit. GoF-LCM enjoys clean asymptotic theory under correct specification, but RGoF-LCM demonstrates greater empirical robustness in practice. For the BFPT data, the evidence supports $\hat K = 2$ as a reasonable description, a conclusion that emerges clearly from the ratio profile.

6. Conclusion

We develop a goodness-of-fit framework for determining the number of latent classes in ordinal categorical data. The framework introduces a test statistic exhibiting a sharp dichotomous behavior that leads to consistent sequential estimation. By transforming a challenging model selection problem into simple sequential tests, the proposed methods are both computationally efficient and theoretically principled. Simulation studies and real-data applications demonstrate their accuracy and reliability.

Several directions for future research emerge from this work. Relaxing the current modeling assumptions is a natural extension. The binomial response assumption could be generalized to accommodate other exponential-family distributions, such as Poisson for count data or multinomial for polytomous responses. The boundedness and separation conditions might be weakened through more refined concentration arguments. Extending the framework to richer latent structures presents another important direction. Mixed-membership formulations, such as the Grade-of-Membership (GoM) models of Woodbury et al. [28], allow individuals partial membership across multiple latent profiles. Degree-heterogeneous latent class models (Dh-LCM) of Lyu et al. [19] introduce additional individual-specific parameters to capture varying levels of activity. For these more complex models, a fundamental first step is to estimate the number of latent classes $K$, which remains largely unexplored and could build upon the sequential testing framework developed in this work, though adapting it would require a fundamental reconsideration of the residual concept. Hierarchical latent structures, where classes themselves are organized into higher-order groupings, pose a further challenge that may necessitate layered testing procedures. Finally, extending to dynamic settings, such as latent transition models for longitudinal data, would enable studying how class memberships evolve over time. Pursuing these directions promises to extend our framework to more complex data environments.

CRediT authorship contribution statement

Huan Qing is the sole author of this article.
Declaration of competing interest

The author declares no competing interests.

Data availability

Data and code will be made available on request.

Appendix A. Technical proofs

Appendix A.1. Proof of Lemma 1

Proof. For any fixed $(i,j)$, we have
\[ E[R^*(i,j)] = \frac{E[R(i,j)] - \mathcal{R}(i,j)}{\sqrt{N V(i,j)}} = 0. \]
The variance is $\mathrm{Var}(R^*(i,j)) = \frac{\mathrm{Var}(R(i,j))}{N V(i,j)}$. Since $\mathrm{Var}(R(i,j)) = V(i,j)$, we obtain $\mathrm{Var}(R^*(i,j)) = \frac{V(i,j)}{N V(i,j)} = \frac{1}{N}$.

By Assumption 1, we have $p_{ij} := \mathcal{R}(i,j)/M \in [\delta, 1-\delta]$. Then
\[ V(i,j) = \mathcal{R}(i,j)\Big(1 - \frac{\mathcal{R}(i,j)}{M}\Big) = M p_{ij}(1 - p_{ij}). \]
The function $g(p) = p(1-p)$ is continuous and concave on $[\delta, 1-\delta]$. Since $g(p)$ is symmetric about $p = 1/2$ and $\delta \le 1/2$, its minimum on $[\delta, 1-\delta]$ is attained at both endpoints, i.e., $g(p) \ge g(\delta) = \delta(1-\delta)$ for all $p \in [\delta, 1-\delta]$. Consequently, we have
\[ V(i,j) \ge M\delta(1-\delta) \quad \text{for all } i \in [N],\ j \in [J]. \]
Because $R(i,j)$ takes values in $\{0, 1, \ldots, M\}$, we have $|R(i,j) - \mathcal{R}(i,j)| \le M$. Hence, we get
\[ |R^*(i,j)| = \frac{|R(i,j) - \mathcal{R}(i,j)|}{\sqrt{N V(i,j)}} \le \frac{M}{\sqrt{N \cdot M\delta(1-\delta)}} = \sqrt{\frac{M}{N\delta(1-\delta)}}. \]
Hence, the maximum entrywise magnitude (denoted by $\sigma_*$ in matrix concentration inequalities) satisfies
\[ \sigma_* := \max_{i,j}\|R^*(i,j)\|_\infty \le \sqrt{\frac{M}{N\delta(1-\delta)}}. \]
For each row $i \in [N]$, we have $\sum_{j=1}^{J} E[(R^*(i,j))^2] = \sum_{j=1}^{J}\frac{1}{N} = \frac{J}{N}$, which gives
\[ \sigma_1 := \max_{i\in[N]}\sqrt{\sum_{j=1}^{J} E[(R^*(i,j))^2]} = \sqrt{\frac{J}{N}}. \]
For each column $j \in [J]$, we have $\sum_{i=1}^{N} E[(R^*(i,j))^2] = \sum_{i=1}^{N}\frac{1}{N} = 1$, which gives
\[ \sigma_2 := \max_{j\in[J]}\sqrt{\sum_{i=1}^{N} E[(R^*(i,j))^2]} = 1. \]
The following lemma is obtained from the statements after Corollary 3.12 of [3].

Lemma 3. Let $X$ be any $n_1 \times n_2$ random matrix with independent entries satisfying $E[X_{ij}] = 0$. Define
\[ \sigma_1 = \max_i\sqrt{\sum_j E[X_{ij}^2]}, \qquad \sigma_2 = \max_j\sqrt{\sum_i E[X_{ij}^2]}, \qquad \sigma_* = \max_{i,j}\|X_{ij}\|_\infty. \]
Then for any $0 < \eta \le \frac{1}{2}$, there exists a constant $C_\eta > 0$ such that for all $t \ge 0$,
\[ P\big(\|X\| \ge (1+\eta)(\sigma_1 + \sigma_2) + t\big) \le (n_1 + n_2)\exp\Big(-\frac{t^2}{C_\eta\sigma_*^2}\Big). \]

We apply Lemma 3 with $X = R^*$, $n_1 = N$, $n_2 = J$, and $\sigma_1, \sigma_2, \sigma_*$ as computed above. Substituting these values, for any $t \ge 0$ we obtain
\[ P\Big(\|R^*\| \ge (1+\eta)\Big(\sqrt{\frac{J}{N}} + 1\Big) + t\Big) \le (N+J)\exp\Big(-\frac{t^2\delta(1-\delta)N}{C_\eta M}\Big). \]
Let $\gamma > 2$ be a fixed constant (e.g., $\gamma = 3$). Choose $t = \sqrt{\frac{\gamma C_\eta M\log(N+J)}{\delta(1-\delta)N}}$. Substituting this $t$ into the right-hand side of the above inequality gives
\[ (N+J)\exp\Big(-\frac{\gamma C_\eta M\log(N+J)}{\delta(1-\delta)N}\cdot\frac{\delta(1-\delta)N}{C_\eta M}\Big) = (N+J)^{1-\gamma}. \]
Since $\gamma > 2$, $(N+J)^{1-\gamma} = o(1)$ as $N \to \infty$. Therefore, with probability at least $1 - o(1)$, we have
\[ \|R^*\| \le (1+\eta)\Big(\sqrt{\frac{J}{N}} + 1\Big) + \sqrt{\frac{\gamma C_\eta M\log(N+J)}{\delta(1-\delta)N}}. \]
Because the term $\sqrt{\frac{\gamma C_\eta M\log(N+J)}{\delta(1-\delta)N}} = O\big(\sqrt{\frac{\log N}{N}}\big) = o(1)$ and the parameter $\eta$ can be chosen arbitrarily small, we obtain that with probability approaching 1,
\[ \|R^*\| \le 1 + \sqrt{\frac{J}{N}} + o(1). \]
Thus, for any fixed $\epsilon > 0$, we have
\[ \lim_{N\to\infty} P\Big(\|R^*\| \le 1 + \sqrt{\frac{J}{N}} + \epsilon\Big) = 1. \]
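Lemma 1 is straightforward to probe numerically. The sketch below is ours and purely illustrative (the two-class setup and all parameter values are arbitrary): it draws binomial responses, forms the ideal normalized residual $R^*$ from the true $\mathcal{R}$ and $V$, and compares $\|R^*\|$ with $1 + \sqrt{J/N}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, M, K, delta = 2000, 100, 4, 2, 0.2

# Item parameters with entries in [delta*M, (1-delta)*M], as in Assumption 1
Theta = rng.uniform(delta * M, (1 - delta) * M, size=(J, K))
labels = rng.integers(0, K, size=N)

calR = Theta[:, labels].T                 # expected responses, N x J
R = rng.binomial(M, calR / M)             # observed binomial responses
V = calR * (1 - calR / M)                 # true entrywise variances

R_star = (R - calR) / np.sqrt(N * V)      # ideal normalized residual
print(np.linalg.norm(R_star, 2), 1 + np.sqrt(J / N))
# the two printed values should be close, with the first not exceeding
# the second by more than a vanishing margin as N grows
```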
Appendix A.2. Proof of Lemma 2

Proof. We prove this lemma via five parts.

Part 1: A high-probability event. By Lemma 4 and the fact that a finite intersection of events with probability tending to 1 still has probability tending to 1, there exists an event $F_N$ with $P(F_N) \to 1$ such that on $F_N$ the following hold simultaneously:
(i) $\hat Z = Z\Pi$;
(ii) $\sum_{k=1}^{K}\sum_{j=1}^{J}\big(\hat\Theta(j,\pi(k)) - \Theta(j,k)\big)^2 \le \frac{C_1 JK^2}{N}$;
(iii) $\max_{i\in[N], j\in[J]}\big|\hat R(i,j) - \mathcal{R}(i,j)\big| \le C_2\sqrt{\frac{K\log(JK)}{N}}$;
(iv) $\hat V(i,j) \ge v_{\min}$ for all $i, j$;
(v) $V(i,j) \ge \delta(1-\delta)M$ for all $i, j$.
(Here $V(i,j) = \mathcal{R}(i,j)(1 - \mathcal{R}(i,j)/M)$ and $\hat V(i,j) = \hat R(i,j)(1 - \hat R(i,j)/M)$.) All subsequent estimates are performed pointwise on $F_N$ and are therefore deterministic on this event.

Part 2: Decomposition of the difference matrix. Set $\Delta := \tilde R - R^* \in \mathbb{R}^{N\times J}$. On $F_N$, $\hat V(i,j) > 0$ by (iv), so the definition in Equation (2) reduces to the standard case with a positive denominator. For a fixed pair $(i,j)$, write
\[ A := R(i,j), \quad p := \mathcal{R}(i,j), \quad \hat p := \hat R(i,j), \quad v := V(i,j), \quad \hat v := \hat V(i,j). \]
Then, we have
\[ \Delta(i,j) = \frac{A - \hat p}{\sqrt{N\hat v}} - \frac{A - p}{\sqrt{Nv}} = (A - p)\Big(\frac{1}{\sqrt{N\hat v}} - \frac{1}{\sqrt{Nv}}\Big) + \frac{p - \hat p}{\sqrt{N\hat v}}. \]
Define two $N \times J$ matrices $E_1, E_2$ entrywise by
\[ E_1(i,j) := (A - p)\Big(\frac{1}{\sqrt{N\hat v}} - \frac{1}{\sqrt{Nv}}\Big), \qquad E_2(i,j) := \frac{p - \hat p}{\sqrt{N\hat v}}. \]
Thus $\Delta = E_1 + E_2$ on $F_N$.

Part 3: Spectral norm of $E_2$. Because $\hat p = \hat R(i,j) = \hat\Theta(j,\pi(\ell(i)))$ is constant on each true class, $E_2(i,j)$ depends only on the class of subject $i$ and the item $j$. Let $\mathcal{C}_k := \{i : \ell(i) = k\}$ for $k \in [K]$. On $F_N$, for $i \in \mathcal{C}_k$, we have
\[ \hat V(i,j) = \hat\Theta(j,\pi(k))\Big(1 - \frac{\hat\Theta(j,\pi(k))}{M}\Big) =: \hat v_k(j), \]
which is independent of $i \in \mathcal{C}_k$ and, by (iv), satisfies $\hat v_k(j) \ge v_{\min}$. Hence for $i \in \mathcal{C}_k$, we have
\[ E_2(i,j) = \frac{\Theta(j,k) - \hat\Theta(j,\pi(k))}{\sqrt{N\hat v_k(j)}}. \]
Construct $V \in \mathbb{R}^{J\times K}$ by $V(j,k) := \frac{\Theta(j,k) - \hat\Theta(j,\pi(k))}{\sqrt{N\hat v_k(j)}}$. Since $Z(i,k) = 1\{i\in\mathcal{C}_k\}$, we have $(ZV^\top)_{ij} = V(j,\ell(i)) = E_2(i,j)$. Thus, we get $E_2 = ZV^\top$ on $F_N$. $Z^\top Z = \mathrm{diag}(N_1,\ldots,N_K)$ gives $\|Z\| = \sqrt{\max_k N_k} =: \sqrt{N_{\max}}$. Assumption 2 gives $c_0 N/K \le N_k \le N/(c_0 K)$, so $N_{\max} \le N/(c_0 K)$ and $\|Z\| \le \sqrt{\frac{N}{c_0 K}}$. Using $\hat v_k(j) \ge v_{\min}$ gives
\[ \|V\|_F^2 = \sum_{k=1}^{K}\sum_{j=1}^{J}\frac{\big(\Theta(j,k) - \hat\Theta(j,\pi(k))\big)^2}{N\hat v_k(j)} \le \frac{1}{Nv_{\min}}\sum_{k=1}^{K}\sum_{j=1}^{J}\big(\Theta(j,k) - \hat\Theta(j,\pi(k))\big)^2. \]
On $F_N$, (ii) bounds the double sum by $C_1 JK^2/N$. Thus, we have
\[ \|V\|_F^2 \le \frac{1}{Nv_{\min}}\cdot\frac{C_1 JK^2}{N} = \frac{C_1 v_{\min}^{-1}JK^2}{N^2}, \qquad \|V\|_F \le \sqrt{C_1 v_{\min}^{-1}}\,\frac{K\sqrt{J}}{N}. \]
Submultiplicativity of the spectral norm gives
\[ \|E_2\| \le \|Z\|\,\|V\| \le \|Z\|\,\|V\|_F \le \sqrt{\frac{N}{c_0 K}}\,\sqrt{C_1 v_{\min}^{-1}}\,\frac{K\sqrt{J}}{N} = \underbrace{\sqrt{\frac{C_1}{c_0 v_{\min}}}}_{=:\,C_{E_2}}\sqrt{\frac{JK}{N}}. \]
By Assumption 4, we have $\sqrt{JK/N} = o(1)$. Consequently, for any $\varepsilon > 0$ there exists $N_0$ such that for all $N \ge N_0$, on $F_N$ we have $\|E_2\| < \varepsilon$, which gives $\|E_2\| = o(1)$ on $F_N$.

Part 4: Spectral norm of $E_1$. Set $f(x) = 1/\sqrt{x}$ for $x > 0$. We have $f'(x) = -\frac{1}{2}x^{-3/2}$ and $|f'(x)| = \frac{1}{2}x^{-3/2}$. Fix $(i,j)$. On $F_N$, $\hat v \ge v_{\min}$ by (iv). From (v), $v \ge \delta(1-\delta)M$. Because $\delta \le \frac{1}{2}$, one has $\delta(1-\delta) \ge \delta^2/4$, hence $v \ge v_{\min}$ as well. Thus $\hat v, v \in [v_{\min}, \infty)$. Apply the mean value theorem to $f$ on the interval between $\hat v$ and $v$: there exists $\xi$ strictly between $\hat v$ and $v$ such that
\[ \frac{1}{\sqrt{\hat v}} - \frac{1}{\sqrt{v}} = f'(\xi)(\hat v - v). \]
Since $\xi \ge v_{\min}$, we have
\[ \Big|\frac{1}{\sqrt{\hat v}} - \frac{1}{\sqrt{v}}\Big| = |f'(\xi)|\,|\hat v - v| \le \frac{1}{2}v_{\min}^{-3/2}|\hat v - v|. \]
Define $h(x) = x(1 - x/M)$ for $x \in [0, M]$. Then, we have $\hat v = h(\hat p)$ and $v = h(p)$. Since $|h'(x)| = |1 - 2x/M| \le 1$ for all $x \in [0, M]$, the mean value theorem yields
\[ |\hat v - v| = |h(\hat p) - h(p)| \le \max_{z\in[0,M]}|h'(z)|\,|\hat p - p| \le |\hat p - p|. \]
Thus, we have
\[ \Big|\frac{1}{\sqrt{\hat v}} - \frac{1}{\sqrt{v}}\Big| \le \frac{1}{2}v_{\min}^{-3/2}|\hat p - p|. \]
Because $R(i,j) \in \{0,1,\ldots,M\}$ and $\mathcal{R}(i,j) \in [0,M]$, we have $|A - p| \le M$. Therefore, we get
\[ |E_1(i,j)| \le |A - p|\,\frac{1}{\sqrt{N}}\Big|\frac{1}{\sqrt{\hat v}} - \frac{1}{\sqrt{v}}\Big| \le \frac{M}{2\sqrt{N}v_{\min}^{3/2}}|\hat p - p|. \]
On $F_N$, (iii) gives $\max_{i,j}|\hat p - p| \le C_2\sqrt{\frac{K\log(JK)}{N}}$. Hence, for every $(i,j)$, we have
\[ |E_1(i,j)| \le \underbrace{\frac{MC_2}{2v_{\min}^{3/2}}}_{=:\,C_{E_1}}\sqrt{\frac{K\log(JK)}{N^2}}. \]
For any matrix, $\|E_1\| \le \|E_1\|_F$ and $\|E_1\|_F = \sqrt{\sum_{i=1}^{N}\sum_{j=1}^{J}E_1(i,j)^2} \le \sqrt{NJ}\max_{i,j}|E_1(i,j)|$. Consequently, on $F_N$, we have
\[ \|E_1\| \le \sqrt{NJ}\,C_{E_1}\sqrt{\frac{K\log(JK)}{N^2}} = C_{E_1}\sqrt{\frac{JK\log(JK)}{N}}. \]
Assumption 4 directly gives $JK\log(JK)/N \to 0$, so $\sqrt{JK\log(JK)/N} = o(1)$. Thus for any $\varepsilon > 0$, there exists $N_0$ such that for all $N \ge N_0$, on $F_N$ we have $\|E_1\| < \varepsilon$.

Part 5: Conversion to $o_P(1)$. On the event $F_N$, the triangle inequality yields
\[ \|\Delta\| \le \|E_1\| + \|E_2\| \le C_{E_1}\sqrt{\frac{JK\log(JK)}{N}} + C_{E_2}\sqrt{\frac{JK}{N}}. \]
Both terms are $o(1)$. Therefore, for every $\varepsilon > 0$ there exists $N_0$ (depending only on $\varepsilon$ and the model constants) such that for all $N \ge N_0$ and every realization belonging to $F_N$, $\|\Delta\| < \varepsilon$. Recall $P(F_N) \to 1$. For an arbitrary $\varepsilon > 0$, we have
\[ P(\|\Delta\| > \varepsilon) = P(\{\|\Delta\| > \varepsilon\}\cap F_N) + P(\{\|\Delta\| > \varepsilon\}\cap F_N^c) \le P(\{\|\Delta\| > \varepsilon\}\cap F_N) + P(F_N^c). \]
From the previous results, we know that for sufficiently large $N$ the set $\{\|\Delta\| > \varepsilon\}\cap F_N$ is empty, hence its probability is 0. Thus, we have
\[ \limsup_{N\to\infty} P(\|\Delta\| > \varepsilon) \le \lim_{N\to\infty} P(F_N^c) = 0, \]
and since $\varepsilon > 0$ was arbitrary, $\|\Delta\| \xrightarrow{P} 0$, which is exactly Equation (3).

Appendix A.3. Proof of Theorem 1

Proof. For any two real matrices of identical dimensions, Weyl's inequality for singular values asserts $|\sigma_1(A) - \sigma_1(B)| \le \|A - B\|$. Applying it with $A = \tilde R$ and $B = R^*$ (both are $N \times J$ matrices) yields the deterministic bound
\[ \sigma_1(\tilde R) \le \sigma_1(R^*) + \|\tilde R - R^*\|. \]
Fix an arbitrary $\epsilon > 0$. We have
\[ \big\{\sigma_1(\tilde R) > 1 + \sqrt{J/N} + \epsilon\big\} \subseteq \big\{\sigma_1(R^*) > 1 + \sqrt{J/N} + \epsilon/2\big\} \cup \big\{\|\tilde R - R^*\| > \epsilon/2\big\}, \]
which yields
\[ P\big(\sigma_1(\tilde R) > 1 + \sqrt{J/N} + \epsilon\big) \le P\big(\sigma_1(R^*) > 1 + \sqrt{J/N} + \epsilon/2\big) + P\big(\|\tilde R - R^*\| > \epsilon/2\big). \]
By Lemmas 1 and 2, both probabilities on the right-hand side converge to zero as $N \to \infty$. Consequently, we get
\[ \lim_{N\to\infty} P\big(\sigma_1(\tilde R) > 1 + \sqrt{J/N} + \epsilon\big) = 0. \]
Recall the definition of $T_{K_0}$: $T_{K_0} = \sigma_1(\tilde R) - \big(1 + \sqrt{J/N}\big)$. So for any $\epsilon > 0$, we have $\{T_{K_0} > \epsilon\} = \{\sigma_1(\tilde R) > 1 + \sqrt{J/N} + \epsilon\}$, which immediately implies $\lim_{N\to\infty} P(T_{K_0} > \epsilon) = 0$. Hence, we have $P(T_{K_0} < \epsilon) \to 1$ as $N \to \infty$. This completes the proof of Theorem 1.
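Theorem 1 and the forthcoming Theorem 2 together predict a sharp dichotomy, which is easy to observe in simulation. The sketch below is ours and illustrative only: it uses oracle class labels in place of the classifier $\mathcal{M}$, so it isolates the behavior of the statistic itself; with the correctly specified $K$ the printed value is near zero, while merging two classes (an under-fitted candidate) inflates it.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, M, K, delta = 1500, 60, 4, 3, 0.2

Theta = rng.uniform(delta * M, (1 - delta) * M, size=(J, K))
labels = rng.integers(0, K, size=N)
R = rng.binomial(M, Theta[:, labels].T / M)

def T_stat(R, labels, M):
    """T = sigma_1(R_tilde) - (1 + sqrt(J/N)) for given class labels."""
    N, J = R.shape
    R_hat = np.empty((N, J))
    for k in np.unique(labels):
        R_hat[labels == k] = R[labels == k].mean(axis=0)
    V_hat = np.clip(R_hat * (1 - R_hat / M), 1e-12, None)
    R_tilde = (R - R_hat) / np.sqrt(N * V_hat)
    return np.linalg.norm(R_tilde, 2) - (1 + np.sqrt(J / N))

print(T_stat(R, labels, M))                        # K0 = K: close to 0
merged = np.where(labels == K - 1, K - 2, labels)  # fuse two true classes
print(T_stat(R, merged, M))                        # K0 = K-1: bounded away from 0
```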
Appendix A.4. Proof of Theorem 2

Proof of Theorem 2. We work conditionally on the output of the classifier $\mathcal{M}$. Lemma 5 holds deterministically for every possible $\hat Z$ whenever $K_0 < K$.

Step 1. Deterministic objects from Lemma 5. Because $K_0 < K$, Lemma 5 guarantees the existence of two distinct true classes $k_1, k_2 \in [K]$, subsets $S_1 \subseteq \mathcal{C}_{k_1}$, $S_2 \subseteq \mathcal{C}_{k_2}$, $S := S_1\cup S_2 \subseteq [N]$, $T \subseteq [J]$, and unit vectors $u \in \mathbb{R}^{|S|}$, $v \in \mathbb{R}^{|T|}$ with the specific forms constructed in the lemma (see its proof) such that the following deterministic properties hold:
\[ |S_1|, |S_2| \ge \frac{c_0 N}{KK_0}, \qquad |S| = |S_1| + |S_2| \le \frac{2N}{c_0 K}, \qquad |T| \ge c_1 J, \]
\[ u_i = \begin{cases} \alpha/\sqrt{|S_1|}, & i \in S_1, \\ -\beta/\sqrt{|S_2|}, & i \in S_2, \end{cases} \qquad v_j = \frac{s_j}{\sqrt{|T|}}, \quad s_j = \mathrm{sign}\big(\Theta(j,k_1) - \Theta(j,k_2)\big), \]
where $\alpha = \sqrt{|S_2|}/\sqrt{|S_1|+|S_2|}$ and $\beta = \sqrt{|S_1|}/\sqrt{|S_1|+|S_2|}$. Moreover, from the construction in Lemma 5, we have the deterministic lower bound
\[ u^\top(\mathcal{R} - \hat R)_{S,T}\,v = \frac{\gamma_0}{\sqrt{|T|}}\sum_{j\in T}|\Theta(j,k_1) - \Theta(j,k_2)| \ge \frac{c\,\zeta\sqrt{NJ}}{\sqrt{K}K_0}, \quad (A.1) \]
where $\gamma_0 = \sqrt{|S_1||S_2|/(|S_1|+|S_2|)}$. All subjects in $S$ are assigned by $\hat Z$ to the same estimated class, say $\kappa \in [K_0]$; consequently, for every $j \in T$ the fitted value $\hat R(i,j) = \hat\Theta(j,\kappa)$ is constant on $S\times\{j\}$.

Step 2. Scaling factors. For $j \in T$, define the scaling factors
\[ \gamma_j := \frac{1}{\sqrt{N\hat V(i,j)}}, \qquad i \in S, \]
where $\hat V(i,j) = \hat R(i,j)(1 - \hat R(i,j)/M)$ and the value does not depend on the particular $i \in S$. Set $D := \mathrm{diag}(\gamma_j)_{j\in T} \in \mathbb{R}^{|T|\times|T|}$. Then the submatrix of the practical normalized residual matrix satisfies
\[ \tilde R_{S,T} = (R - \hat R)_{S,T}\,D = (W + \mathbf{M})D, \]
where $W := (R - \mathcal{R})_{S,T}$ and $\mathbf{M} := (\mathcal{R} - \hat R)_{S,T}$.

Step 3. Three high-probability events. We now construct three events whose probabilities tend to 1:

• Event $E_A$ (no zero denominator). Define $E_A := \{\gamma_j > 0 \text{ for all } j \in T\}$. Fix a column $j \in T$. Recall that for any $i \in S$ we have $\hat R(i,j) = \hat\Theta(j,\kappa)$ (a constant over $S$) and $\hat V(i,j) = \hat R(i,j)(1 - \hat R(i,j)/M)$. Since the function $h(x) = x(1 - x/M)$ satisfies $h(x) = 0$ iff $x \in \{0, M\}$, we obtain
\[ \gamma_j = \frac{1}{\sqrt{N\hat V(i,j)}} > 0 \iff \hat\Theta(j,\kappa) \notin \{0, M\}. \]
Because $\hat\Theta(j,\kappa) = \frac{1}{|S|}\sum_{i\in S} R(i,j)$ (the sample mean of the $|S|$ independent responses), the event $\{\hat\Theta(j,\kappa) = 0\}$ occurs only if every $R(i,j) = 0$; likewise, $\{\hat\Theta(j,\kappa) = M\}$ occurs only if every $R(i,j) = M$. Hence $\{\gamma_j = 0\}$ (equivalently $\hat\Theta(j,\kappa) \in \{0, M\}$) is contained in the union of the two events "all responses to item $j$ on the set $S$ are 0" and "all are $M$". We now bound the probability of each of these two extreme events. Under Assumption 1, for any true class $k \in [K]$, we have $\Theta(j,k)/M \in [\delta, 1-\delta]$. Therefore, for any individual $i$ (regardless of its true class), we have
\[ P(R(i,j) = 0) = \Big(1 - \frac{\Theta(j,\ell(i))}{M}\Big)^M \le (1-\delta)^M, \qquad P(R(i,j) = M) = \Big(\frac{\Theta(j,\ell(i))}{M}\Big)^M \le (1-\delta)^M. \]
The responses $\{R(i,j) : i \in S\}$ are mutually independent. Consequently, we have
\[ P(\text{all } R(i,j) = 0,\ i \in S) = \prod_{i\in S} P(R(i,j) = 0) \le (1-\delta)^{M|S|}, \]
and likewise $P(\text{all } R(i,j) = M,\ i \in S) \le (1-\delta)^{M|S|}$. Applying the union bound gives, conditionally on the set $S$,
\[ P(\gamma_j = 0 \mid S) \le 2(1-\delta)^{M|S|}. \]
From Lemma 5, whenever $K_0 < K$, we have the deterministic lower bound $|S| \ge \frac{2c_0 N}{KK_0}$ (it holds for every realization of $\hat Z$). Thus, we have $(1-\delta)^{M|S|} \le (1-\delta)^{2c_0 MN/(KK_0)}$ almost surely.
Taking expectations and using the fact that the bound is non-random yields
\[ P(\gamma_j = 0) = E\big[P(\gamma_j = 0 \mid S)\big] \le 2(1-\delta)^{\frac{2c_0 MN}{KK_0}}. \]
Finally, applying the union bound over the columns in $T$ (note that $|T| \le J$) yields
\[ P(E_A^c) \le \sum_{j\in T} P(\gamma_j = 0) \le 2J(1-\delta)^{2c_0 MN/(KK_0)}. \quad (A.2) \]
We now prove that the right-hand side of Equation (A.2) tends to 0. Set $\varrho := -\ln(1-\delta) > 0$ (since $0 < \delta \le 1/2$). Then $(1-\delta)^x = e^{-\varrho x}$ and Equation (A.2) becomes
\[ P(E_A^c) \le 2J\exp\Big(-\varrho\cdot\frac{2c_0 MN}{KK_0}\Big). \quad (A.3) \]
Because $K_0 < K$, we have $KK_0 \le K^2$ and consequently $N/(KK_0) \ge N/K^2$. Hence, we have
\[ P(E_A^c) \le 2J\exp\Big(-\underbrace{2\varrho c_0 M}_{=:\,c'}\cdot\frac{N}{K^2}\Big). \quad (A.4) \]
By Assumption 4, for any positive constant $\epsilon$ there exists an integer $N_0(\epsilon)$ such that for all $N \ge N_0(\epsilon)$,
\[ \frac{K^2\log(N+J)}{N} \le \frac{1}{\epsilon}, \quad \text{or equivalently} \quad \frac{N}{K^2} \ge \epsilon\log(N+J). \quad (A.5) \]
We now fix a specific value of $\epsilon$: choose $\epsilon = 2/c'$. Then for all sufficiently large $N$ (say $N \ge N_0(2/c')$), we have
\[ \frac{N}{K^2} \ge \frac{2}{c'}\log(N+J). \quad (A.6) \]
Substituting Equation (A.6) into Equation (A.4) gives
\[ P(E_A^c) \le 2J\exp\Big(-c'\cdot\frac{2}{c'}\log(N+J)\Big) = 2J(N+J)^{-2} \xrightarrow[N\to\infty]{} 0, \]
which gives $P(E_A) \to 1$.

• Event $E_B$ (the estimated class mean stays away from the boundaries). Let $\kappa$ be the estimated class containing $S$. By Lemma 5, $n_\kappa := |S| \ge 2c_0 N/(KK_0)$ deterministically. For any $j \in T$, $\hat\Theta(j,\kappa) = \frac{1}{n_\kappa}\sum_{i\in S} R(i,j)$ is the average of $n_\kappa$ independent Binomial$(M, \cdot)$ variables, each bounded in $[0, M]$. By Assumption 1, $E[R(i,j)] = \Theta(j,\ell(i)) \in [\delta M, (1-\delta)M]$ for all $i \in S$; hence, $E[\hat\Theta(j,\kappa)] \in [\delta M, (1-\delta)M]$. Hoeffding's inequality gives, for $t = \delta M/2$,
\[ P\big(|\hat\Theta(j,\kappa) - E[\hat\Theta(j,\kappa)]| \ge \delta M/2\big) \le 2\exp\Big(-\frac{2n_\kappa(\delta M/2)^2}{M^2}\Big) = 2\exp\Big(-\frac{n_\kappa\delta^2}{2}\Big). \]
Since $n_\kappa \ge 2c_0 N/(KK_0) \ge 2c_0 N/K^2$ (because $K_0 < K$) and the exponential function is decreasing, the right-hand side is bounded by $2\exp(-c_0\delta^2 N/K^2)$. If the deviation is less than $\delta M/2$, then $\hat\Theta(j,\kappa) \in [\delta M/2, (1-\delta/2)M]$ because $E[\hat\Theta(j,\kappa)] \in [\delta M, (1-\delta)M]$. Define
\[ E_B := \big\{\hat\Theta(j,\kappa) \in [\delta M/2, (1-\delta/2)M] \text{ for all } j \in T\big\}. \]
The event $\{\forall j : |\hat\Theta(j,\kappa) - E[\hat\Theta(j,\kappa)]| < \delta M/2\}$ is contained in $E_B$. Therefore, we have
\[ P(E_B^c) \le P\big(\exists j : |\hat\Theta(j,\kappa) - E[\hat\Theta(j,\kappa)]| \ge \delta M/2\big) \le |T|\cdot 2\exp\Big(-\frac{c_0\delta^2 N}{K^2}\Big) \le 2J\exp\Big(-\frac{c_0\delta^2 N}{K^2}\Big). \]
By Assumption 4, we have $N/K^2 \gg \log N$ and $J = o(N)$; hence, for all sufficiently large $N$, $J \le N$, and therefore
\[ 2J\exp\Big(-\frac{c_0\delta^2 N}{K^2}\Big) \le 2N\exp\Big(-\frac{c_0\delta^2 N}{K^2}\Big) \xrightarrow[N\to\infty]{} 0. \]
Consequently, we obtain $P(E_B^c) \to 0$ and $P(E_B) \to 1$.

• Event $E_E$ (spectral norm bound for the noise matrix). Recall $X = R - \mathcal{R} \in \mathbb{R}^{N\times J}$. By Lemma 6, we have $\|X\| = O_P(\sqrt{N} + \sqrt{J})$. Assumption 4 implies $J/N \to 0$; hence, we have $\|X\| = O_P(\sqrt{N})$. More concretely, from the proof of Lemma 6, we can take the constant $C_X := 2.5\sqrt{M}$ such that $P(\|X\| \ge C_X\sqrt{N}) \to 0$, and consequently $P(\|X\| \le C_X\sqrt{N}) \to 1$. Define $E_E := \{\|X\| \le C_X\sqrt{N}\}$. Then $P(E_E) \to 1$.

Step 4. Behaviour on the intersection $E_C := E_A \cap E_B \cap E_E$. Because each of $E_A$, $E_B$, $E_E$ has probability tending to 1, $P(E_C) \to 1$.
On the event $E_C$ we have the following deterministic bounds.

(i) Lower bound for $\gamma_j$. On the event $E_A$, $\gamma_j > 0$. Since $0 \le \hat R(i,j) \le M$ always implies $\hat V(i,j) \le M/4$, we have
\[ \gamma_j = \frac{1}{\sqrt{N\hat V(i,j)}} \ge \frac{1}{\sqrt{N\cdot M/4}} = \frac{2}{\sqrt{NM}} \qquad (\forall j \in T). \]

(ii) Upper bound for $\|D\|$. On the event $E_B$, $\hat\Theta(j,\kappa) \in [\delta M/2, (1-\delta/2)M]$ for all $j \in T$. On this interval the function $h(x) = x(1 - x/M)$ satisfies $h(x) \ge h(\delta M/2) = \delta(2-\delta)M/4 \ge \delta^2 M/4$ (since $\delta \le 1/2$). Hence, we have $\hat V(i,j) \ge \delta^2 M/4$ and consequently
\[ \gamma_j \le \frac{1}{\sqrt{N\cdot\delta^2 M/4}} = \frac{2}{\delta\sqrt{NM}}, \qquad \|D\| = \max_{j\in T}\gamma_j \le \frac{2}{\delta\sqrt{NM}}. \]

(iii) Upper bound for $\|WD\|$. Because $W$ is a submatrix of $X$, $\|W\| \le \|X\|$. On the event $E_E$, $\|X\| \le C_X\sqrt{N} = 2.5\sqrt{MN}$. Thus on $E_B \cap E_E$, we have
\[ \|WD\| \le \|W\|\,\|D\| \le 2.5\sqrt{MN}\cdot\frac{2}{\delta\sqrt{NM}} = \frac{5}{\delta} = C_{\mathrm{noise}}. \]

(iv) Lower bound for $\|\mathbf{M}D\|$. As shown in the proof of Lemma 5, for each column $j \in T$ we have the deterministic identity
\[ u^\top\mathbf{M}_{:,j}\,v_j = \frac{\gamma_0}{\sqrt{|T|}}|\Theta(j,k_1) - \Theta(j,k_2)| \ge 0, \]
where $\mathbf{M}_{:,j}$ denotes the $j$-th column of $\mathbf{M}$. Using (i) and Equation (A.1) gives
\[ u^\top(\mathbf{M}D)v = \sum_{j\in T}\gamma_j(u^\top\mathbf{M}_{:,j})v_j \ge \big(\min_{j\in T}\gamma_j\big)\,u^\top\mathbf{M}v \ge \frac{2}{\sqrt{NM}}\cdot\frac{c\,\zeta\sqrt{NJ}}{\sqrt{K}K_0} = \frac{C_{\mathrm{signal}}\sqrt{J}}{\sqrt{K}K_0}. \]
Since $\|u\|_2 = \|v\|_2 = 1$, the variational characterization of the spectral norm yields
\[ \|\mathbf{M}D\| \ge |u^\top(\mathbf{M}D)v| \ge \frac{C_{\mathrm{signal}}\sqrt{J}}{\sqrt{K}K_0}. \]

Step 5. Lower bound for $\|\tilde R_{S,T}\|$ and for $\sigma_1(\tilde R)$. On the event $E_C$, the triangle inequality gives
\[ \|\tilde R_{S,T}\| = \|(\mathbf{M} + W)D\| \ge \|\mathbf{M}D\| - \|WD\| \ge \frac{C_{\mathrm{signal}}\sqrt{J}}{\sqrt{K}K_0} - C_{\mathrm{noise}}. \]
Because $\tilde R$ is an $N \times J$ matrix, $\sigma_1(\tilde R) = \|\tilde R\| \ge \|\tilde R_{S,T}\|$. Hence, on the event $E_C$, we have
\[ \sigma_1(\tilde R) \ge L_N, \qquad L_N := \frac{C_{\mathrm{signal}}\sqrt{J}}{\sqrt{K}K_0} - C_{\mathrm{noise}}. \]

Step 6. Positive constant lower bound for $T_{K_0}$. Assumption 6 asserts that for all sufficiently large $N$, $L_N \ge 1 + 3\eta_0$. Assumption 4 guarantees $\sqrt{J/N} \to 0$; therefore, there exists $N_0$ such that for all $N \ge N_0$, $\sqrt{J/N} \le \eta_0$. Now for any $N \ge N_0$ and on the event $E_C$, we have
\[ T_{K_0} = \sigma_1(\tilde R) - \Big(1 + \sqrt{\frac{J}{N}}\Big) \ge L_N - \Big(1 + \sqrt{\frac{J}{N}}\Big) \ge (1 + 3\eta_0) - (1 + \eta_0) = 2\eta_0. \]
Thus, we have
\[ P(T_{K_0} > 2\eta_0) \ge P(E_C) \to 1 \quad \text{as } N \to \infty. \]

Step 7. Implication for the testing procedure. If Algorithm 1 uses a threshold sequence $\tau_N$ with $\tau_N \to 0$ (e.g., $\tau_N = N^{-1/5}$), then for all sufficiently large $N$ we have $\tau_N < 2\eta_0$. Consequently,
\[ P(T_{K_0} > \tau_N) \ge P(T_{K_0} > 2\eta_0) \to 1. \]
Hence, under the alternative $K_0 < K$ and Assumption 6, the test statistic exceeds the threshold with probability tending to one, leading to a correct rejection of $H_0$. This completes the proof of Theorem 2.
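The exceptional probabilities driving this argument decay very quickly once $N/K^2$ is large. A quick numeric illustration of the three tail bounds (ours; all values, including the stand-in for $c_E$, are purely illustrative):

```python
import numpy as np

# Decay of 2*J*exp(-c_A*N/K^2), 2*J*exp(-c_B*N/K^2), (N+J)*exp(-c_E*N)
J, K, M, delta, c0 = 60, 4, 4, 0.2, 0.5
c_A = 2 * c0 * M * (-np.log(1 - delta))   # as defined in the proof
c_B = c0 * delta ** 2
c_E = 0.1                                  # stand-in; the proof only needs c_E > 0

for N in (2_000, 20_000, 200_000):
    print(N,
          2 * J * np.exp(-c_A * N / K**2),
          2 * J * np.exp(-c_B * N / K**2),
          (N + J) * np.exp(-c_E * N))
# the bounds are asymptotic: they become informative only once N/K^2 is large
```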
Appendix A.5. Proof of Theorem 3

Proof. The proof is organized into four parts: (I) decomposition of the error probability; (II) control of $P(G_C)$, the probability of not stopping at the true $K$; (III) control of $P(G_B)$, the probability of stopping at an under-specified $K_0$; (IV) asymptotic analysis under Assumption 4.

Part 1: Events and error decomposition. Define the events
\[ G_A := \{\hat K = K\}, \qquad G_B := \{\exists K_0 < K : T_{K_0} < \tau_N\}, \qquad G_C := \{T_K \ge \tau_N\}. \]
If neither $G_B$ nor $G_C$ occurs, then the algorithm does not stop at any $K_0 < K$ (because $T_{K_0} \ge \tau_N$ for all such $K_0$) and it does stop at $K_0 = K$ (since $T_K < \tau_N$). Hence $G_A^c \subseteq G_B \cup G_C$ and consequently
\[ P(G_A^c) \le P(G_B) + P(G_C). \quad (A.7) \]
Thus, it suffices to prove $P(G_B) \to 0$ and $P(G_C) \to 0$.

Part 2: Control of $P(G_C)$ (correct specification). When $K_0 = K$, $\tilde R$ is the practical normalized residual matrix and $R^*$ its ideal counterpart. By Weyl's inequality, we have $\sigma_1(\tilde R) \le \sigma_1(R^*) + \|\tilde R - R^*\|$, and therefore
\[ \{T_K \ge \tau_N\} = \big\{\sigma_1(\tilde R) \ge 1 + \sqrt{J/N} + \tau_N\big\} \subseteq \big\{\sigma_1(R^*) \ge 1 + \sqrt{J/N} + \tau_N/2\big\} \cup \big\{\|\tilde R - R^*\| \ge \tau_N/2\big\}. \]

Bound for $\sigma_1(R^*)$. From the proof of Lemma 1, we extract the following explicit rate. Set
\[ t_N = \sqrt{\frac{\gamma C_\eta M\log(N+J)}{\delta(1-\delta)N}}, \]
where $\gamma > 2$ is fixed and $C_\eta$ is the constant from Lemma 3. The proof of Lemma 1 shows that
\[ P\Big(\sigma_1(R^*) \ge (1+\eta)\Big(1 + \sqrt{\frac{J}{N}}\Big) + t_N\Big) \le (N+J)^{1-\gamma}. \]
Since $\gamma > 2$, the right-hand side tends to 0. Letting $\eta$ be close to zero, we get
\[ P\Big(\sigma_1(R^*) > 1 + \sqrt{\frac{J}{N}} + C_R\sqrt{\frac{\log(N+J)}{N}}\Big) \to 0. \]
Condition (Con1) implies $\frac{N\tau_N^2}{\log(N+J)} \to \infty$. Hence, $\frac{\tau_N}{2} \ge C_R\sqrt{\frac{\log(N+J)}{N}}$ holds for all sufficiently large $N$. For such $N$, we have the inclusion
\[ \big\{\sigma_1(R^*) \ge 1 + \sqrt{J/N} + \tau_N/2\big\} \subseteq \big\{\sigma_1(R^*) > 1 + \sqrt{J/N} + C_R\sqrt{\log(N+J)/N}\big\}, \]
because the left-hand threshold is larger than the right-hand threshold. Consequently, we get
\[ P\big(\sigma_1(R^*) \ge 1 + \sqrt{J/N} + \tau_N/2\big) \le P\big(\sigma_1(R^*) > 1 + \sqrt{J/N} + C_R\sqrt{\log(N+J)/N}\big) \xrightarrow[N\to\infty]{} 0. \]

Bound for $\|\tilde R - R^*\|$. Inspecting the proof of Lemma 2, there exists an event $F_N$ with $P(F_N^c) \to 0$ and a constant $C_{\mathrm{pert}} > 0$ (depending only on $\delta$, $c_0$, $M$) such that on $F_N$,
\[ \|\tilde R - R^*\| \le C_{\mathrm{pert}}\sqrt{\frac{JK\log(JK)}{N}}. \]
Condition (Con1) asserts that $\tau_N\big/\sqrt{JK\log(JK)/N} \to \infty$. Hence, there exists $N_*$ such that for all $N \ge N_*$,
\[ \frac{\tau_N}{2} > C_{\mathrm{pert}}\sqrt{\frac{JK\log(JK)}{N}}. \]
On the event $F_N$ and for $N \ge N_*$, the above inequality together with the bound on $\|\tilde R - R^*\|$ implies $\|\tilde R - R^*\| < \tau_N/2$. Therefore, we get $\{\|\tilde R - R^*\| \ge \tau_N/2\} \subseteq F_N^c$, and consequently $P(\|\tilde R - R^*\| \ge \tau_N/2) \le P(F_N^c) \to 0$.

Both probabilities in the decomposition tend to zero; hence, we have $P(G_C) \to 0$.

Part 3: Control of $P(G_B)$ (under-specification). If $K = 1$, the set $\{K_0 < K\}$ is empty and $P(G_B) = 0$ trivially. Hence, we assume $K \ge 2$. Fix an arbitrary $K_0$ with $1 \le K_0 < K$. We will derive an upper bound for $P(T_{K_0} < \tau_N)$ that does not depend on the particular $K_0$.

Step 1. Deterministic objects from Lemma 5. Because $K_0 < K$, Lemma 5 guarantees the existence of two distinct true classes $k_1, k_2 \in [K]$, subsets $S = S_1\cup S_2 \subseteq [N]$ with $S_1 \subseteq \mathcal{C}_{k_1}$, $S_2 \subseteq \mathcal{C}_{k_2}$, and a subset $T \subseteq [J]$ satisfying the deterministic bounds
\[ |S_1|, |S_2| \ge \frac{c_0 N}{KK_0}, \qquad |S| \ge \frac{2c_0 N}{KK_0}, \qquad |T| \ge c_1 J. \]
All subjects in $S$ are assigned by $\hat Z$ to the same estimated class $\kappa \in [K_0]$. Define the scaling factors $\gamma_j := 1/\sqrt{N\hat V(i,j)}$ for $i \in S$ (constant on $S$), and set $W := (R - \mathcal{R})_{S,T}$, $\mathbf{M} := (\mathcal{R} - \hat R)_{S,T}$, $D := \mathrm{diag}(\gamma_j)_{j\in T}$. Then $\tilde R_{S,T} = (W + \mathbf{M})D$.

Step 2. Events and probability bounds from Theorem 2. In the proof of Theorem 2, the following events are introduced (with the same $S$, $T$, $\kappa$):
\[ E_A := \{\gamma_j > 0\ \forall j \in T\}, \quad E_B := \{\hat\Theta(j,\kappa) \in [\delta M/2, (1-\delta/2)M]\ \forall j \in T\}, \quad E_E := \{\|R - \mathcal{R}\| \le 2.5\sqrt{MN}\}. \]
From that proof, we have the exponential bounds (valid for all sufficiently large $N$):
\[ P(E_A^c) \le 2Je^{-c_A N/(KK_0)}, \qquad P(E_B^c) \le 2Je^{-c_B N/(KK_0)}, \qquad P(E_E^c) \le (N+J)e^{-c_E N}, \quad (A.8) \]
where $c_A := 2c_0 M(-\ln(1-\delta))$, $c_B := c_0\delta^2$, and $c_E > 0$ depends only on $M$ and the constant from Lemma 3. Moreover, on the event $E_A \cap E_B \cap E_E$, the analysis in Theorem 2 yields the deterministic lower bound $T_{K_0} \ge 2\eta_0$, where $\eta_0$ is the constant from Assumption 6.

Step 3. From the event bound to a bound on $P(T_{K_0} < \tau_N)$. Condition (Con1) ensures $\tau_N < 2\eta_0$ for all large $N$. Consequently, we have
\[ E_A \cap E_B \cap E_E \subseteq \{T_{K_0} \ge 2\eta_0\} \subseteq \{T_{K_0} \ge \tau_N\}. \]
Taking complements and applying the union bound, we obtain
\[ P(T_{K_0} < \tau_N) \le P(E_A^c) + P(E_B^c) + P(E_E^c). \]

Step 4. Uniform bound independent of $K_0$. Since $K_0 \le K-1$ and $K \ge 2$, we have $KK_0 \le K(K-1) \le K^2$, which gives
\[ \frac{N}{KK_0} \ge \frac{N}{K(K-1)} \ge \frac{N}{K^2}. \]
Therefore, the exponential bounds in Equation (A.8) can be weakened to
\[ P(E_A^c) \le 2Je^{-c_A N/K^2}, \qquad P(E_B^c) \le 2Je^{-c_B N/K^2}. \]
Note that the bound for $P(E_E^c)$ in Equation (A.8) already does not involve $K_0$. Consequently, for every $K_0 < K$ and all sufficiently large $N$, we have
\[ P(T_{K_0} < \tau_N) \le 2Je^{-c_A N/K^2} + 2Je^{-c_B N/K^2} + (N+J)e^{-c_E N}. \]

Step 5. Union bound over all $K_0 < K$. Because there are at most $K-1$ candidates with $K_0 < K$, we have
\[ P(G_B) \le \sum_{K_0=1}^{K-1} P(T_{K_0} < \tau_N) \le (K-1)\Big[2Je^{-c_A N/K^2} + 2Je^{-c_B N/K^2} + (N+J)e^{-c_E N}\Big]. \]

Part 4: Asymptotic analysis under Assumption 4.
• For the term $2(K-1)Je^{-cN/K^2}$ (with $c = c_A$ or $c_B$): Assumption 4(i) states $\frac{K^2\log(N+J)}{N} \to 0$. This implies that for any fixed $m_0 > 0$, there exists $N_2$ such that for all $N \ge N_2$, $\frac{N}{K^2} \ge m_0\log(N+J)$. Choose $m_0$ such that $cm_0 > 1$. Then, for all $N \ge N_2$, we have
\[ e^{-cN/K^2} \le e^{-cm_0\log(N+J)} = (N+J)^{-cm_0} \le N^{-cm_0}. \]
Assumption 4(ii) gives $\frac{JK}{N} \to 0$. Hence, there exists $N_3$ such that for all $N \ge N_3$, $JK \le N$, which implies $(K-1)J \le KJ \le N$. Therefore, for all $N \ge \max(N_2, N_3)$, we have
\[ 2(K-1)Je^{-cN/K^2} \le 2N\cdot N^{-cm_0} = 2N^{1-cm_0} \xrightarrow[N\to\infty]{} 0, \]
since $1 - cm_0 < 0$.
• For the term $(K-1)(N+J)e^{-c_E N}$: from $K^2 = o(N)$ (a direct consequence of Assumption 4(i)), we have $K = o(N^{1/2})$, and certainly $K = o(N)$. Hence, there exists $N_4$ such that for all $N \ge N_4$, $K \le N$. Also, from $J/N \to 0$, we eventually have $J \le N$. Thus for $N \ge N_4$, we have $(K-1)(N+J) \le N\cdot 2N = 2N^2$. Because $e^{-c_E N} = o(N^{-2})$ (exponential decay dominates any polynomial), we obtain $(K-1)(N+J)e^{-c_E N} \xrightarrow[N\to\infty]{} 0$.

All components of the bound on $P(G_B)$ converge to 0; therefore $P(G_B) \to 0$.

Part 5: Conclusion. We have established $P(G_C) \to 0$ and $P(G_B) \to 0$. By the decomposition in Equation (A.7), $P(G_A^c) \to 0$, i.e., $P(\hat K = K) \to 1$. This completes the proof of Theorem 3.

Appendix A.6. Proof of Theorem 4

Proof. We prove the two statements separately. Throughout the proof, we work under the given assumptions, which allow $K$ to grow with $N$ subject to $K^3 = o(N)$.

Part 1: Divergence at the true model. Assume $K_0 = K$ is correctly specified. We first show that $T_K = o_P(1)$. From Assumption 4(ii), we have $J = o(N)$.
Since Assumption 1 holds, Lemma 9 is applicable and yields $\|R^*\| \xrightarrow{a.s.} 1$, i.e., $\|R^*\| = 1 + o_P(1)$. By Lemma 2 and Assumption 4(ii), $\|\tilde R - R^*\| = O_P\big(\sqrt{JK\log(JK)/N}\big) = o_P(1)$. Weyl's inequality then implies
\[ |\sigma_1(\tilde R) - \sigma_1(R^*)| \le \|\tilde R - R^*\| = o_P(1), \]
so $\sigma_1(\tilde R) = \sigma_1(R^*) + o_P(1) = 1 + o_P(1)$. Therefore, we get
\[ T_K = \sigma_1(\tilde R) - \Big(1 + \sqrt{\frac{J}{N}}\Big) = o_P(1) - \sqrt{\frac{J}{N}} = o_P(1), \]
because $\sqrt{J/N} \to 0$. Now consider $K_0 = K-1 < K$. Assumption 6 (assumed to hold for every $K_0 < K$) ensures that Theorem 2 applies, giving a constant $2\eta_0 > 0$ such that $P(T_{K-1} > 2\eta_0) \to 1$. Fix an arbitrary constant $M_a > 0$. $T_K = o_P(1)$ gives $P(|T_K| < 2\eta_0/M_a) \to 1$. Define two events
\[ A_N := \{T_{K-1} > 2\eta_0\}, \qquad B_N := \{|T_K| < 2\eta_0/M_a\}. \]
We have $P(A_N \cap B_N) \to 1$. On the event $A_N \cap B_N$, we get
\[ r_K = \Big|\frac{T_{K-1}}{T_K}\Big| > \frac{2\eta_0}{2\eta_0/M_a} = M_a. \]
Thus, we get $P(r_K > M_a) \ge P(A_N \cap B_N) \to 1$. Because $M_a$ is arbitrary, we conclude that $r_K \xrightarrow{P} \infty$.

Part 2: Boundedness under under-fitting. Let $c_{\mathrm{low}}$ be the constant from Lemma 7. For each $m$ with $1 \le m < K$, Lemma 7 guarantees the existence of an event $E_N^{(m)}$ such that $P(E_N^{(m)}) \to 1$ and on $E_N^{(m)}$,
\[ T_m \ge \frac{c_{\mathrm{low}}\sqrt{J}}{\sqrt{K}m}. \]
To control the probabilities of the complements, we recall the construction in the proof of Theorem 2. For a given under-fitted candidate $K_0 = m$, the proof of Theorem 2 defines three events $E_A(m)$, $E_B(m)$, and $E_E$ (the latter does not depend on $m$) such that $E_N^{(m)} = E_A(m) \cap E_B(m) \cap E_E$. From the estimates obtained there (see the bounds (A.4) and the subsequent analysis), there exist positive constants $c_A, c_B, c_E$ depending only on the model parameters $\delta$, $c_0$, $M$ (and on the constant from Lemma 3 for $c_E$) such that for all sufficiently large $N$,
\[ P(E_A^c(m)) \le 2J\exp\Big(-\frac{c_A N}{Km}\Big), \qquad P(E_B^c(m)) \le 2J\exp\Big(-\frac{c_B N}{Km}\Big), \qquad P(E_E^c) \le (N+J)e^{-c_E N}. \]
(These bounds are uniform in $m$ because the constants $c_A, c_B, c_E$ are derived from Assumptions 1–4 and do not involve $m$; the factor $K$ in the denominator appears because the class sizes scale as $N/K$ uniformly, as established in Assumption 2.) Using the union bound,
\[ P\big((E_N^{(m)})^c\big) \le P(E_A^c(m)) + P(E_B^c(m)) + P(E_E^c) \le 2J\exp\Big(-\frac{c_A N}{Km}\Big) + 2J\exp\Big(-\frac{c_B N}{Km}\Big) + (N+J)e^{-c_E N}. \]
Since $m \le K$, we have $1/(Km) \ge 1/K^2$, and therefore
\[ \exp\Big(-\frac{c_A N}{Km}\Big) \le \exp\Big(-\frac{c_A N}{K^2}\Big), \qquad \exp\Big(-\frac{c_B N}{Km}\Big) \le \exp\Big(-\frac{c_B N}{K^2}\Big). \]
Consequently, for all $m < K$, we have
\[ P\big((E_N^{(m)})^c\big) \le 4J\exp\Big(-\frac{c_{AB}N}{K^2}\Big) + (N+J)e^{-c_E N}, \]
where we set $c_{AB} := \min\{c_A, c_B\} > 0$. Now define $\tilde E_N := \bigcap_{m=1}^{K-1} E_N^{(m)}$. Applying the union bound gives
\[ P(\tilde E_N^c) = P\Big(\bigcup_{m=1}^{K-1}(E_N^{(m)})^c\Big) \le \sum_{m=1}^{K-1} P\big((E_N^{(m)})^c\big) \le (K-1)\Big[4J\exp\Big(-\frac{c_{AB}N}{K^2}\Big) + (N+J)e^{-c_E N}\Big]. \]
We show that the right-hand side tends to zero.
• First term. By Assumption 4, we get $K^2\log N \ll N$ and $\log((K-1)J) \ll \frac{c_{AB}N}{K^2}$ for sufficiently large $N$. Therefore, we have
\[ \log\Big((K-1)Je^{-c_{AB}N/K^2}\Big) \le \log\big((K-1)J\big) - \frac{c_{AB}N}{K^2} \to -\infty, \]
so $(K-1)Je^{-c_{AB}N/K^2} = o(1)$.
• Second term. Since $K \ll N^{1/2}$ and $J \ll N$ for large $N$ by Assumption 4, we have
\[ (K-1)(N+J)e^{-c_E N} \le 2N^{1.5}e^{-c_E N} = o(1), \]
as the exponential decay dominates any polynomial growth.

Thus, we have $P(\tilde E_N^c) \to 0$, i.e., $P(\tilde E_N) \to 1$.
On the event $\tilde E_N$, for every $m$ with $1 \le m < K$ we have the lower bound $T_m \ge \frac{c_{\mathrm{low}}\sqrt{J}}{\sqrt{K}m}$. In particular, for any $K_0$ with $2 \le K_0 < K$, we have
\[ T_{K_0-1} \ge \frac{c_{\mathrm{low}}\sqrt{J}}{\sqrt{K}(K_0-1)}, \qquad T_{K_0} \ge \frac{c_{\mathrm{low}}\sqrt{J}}{\sqrt{K}K_0}. \]
Lemma 8 provides the deterministic universal bound $T_m \le \sqrt{MJ}$ for every $m \ge 1$, which holds with probability 1. Thus, on $\tilde E_N$, we have
\[ r_{K_0} = \frac{T_{K_0-1}}{T_{K_0}} \le \frac{\sqrt{MJ}}{c_{\mathrm{low}}\sqrt{J}/(\sqrt{K}K_0)} = \frac{\sqrt{M}}{c_{\mathrm{low}}}\sqrt{K}K_0 \le \frac{\sqrt{M}}{c_{\mathrm{low}}}\sqrt{K}(K-1). \]
Therefore, we obtain
\[ P\Big(r_{K_0} > \frac{\sqrt{M}}{c_{\mathrm{low}}}\sqrt{K}(K-1)\Big) \le P(\tilde E_N^c) \to 0, \]
which completes the proof.

Appendix A.7. Proof of Theorem 5

Proof. We treat the two possibilities $K = 1$ and $K \ge 2$ separately. Throughout the proof, all constants may depend on the fixed number $K$ and on the model parameters $\delta, c_0, c_1, M, \zeta$, but never on $N$ or $J$.

Part 1: Case $K = 1$. Under the null hypothesis $K = 1$, Theorem 1 gives $T_1 = o_P(1)$. A closer inspection of the proof (combining Lemma 2 and Lemma 1) shows that there exist a constant $\tilde C_1 > 0$ and an event $F_N^{(1)}$ with $P(F_N^{(1)}) \to 1$ such that on $F_N^{(1)}$,
\[ |T_1| \le \tilde C_1\sqrt{\frac{JK\log(JK)}{N}}. \quad (A.9) \]
Because $K$ is fixed, the right-hand side is $O\big(\sqrt{J\log J/N}\big)$. Condition (Con1) guarantees that for all sufficiently large $N$, we have $\tau_N \ge \tilde C_1\sqrt{JK\log(JK)/N}$ on a set whose probability tends to one. Consequently, we get
\[ P(T_1 < \tau_N) \ge P(F_N^{(1)}) - o(1) \to 1. \]
Thus the algorithm returns $\hat K = 1$ with probability tending to one.

Part 2: Case $K \ge 2$. We must show two facts: (i) the algorithm does not stop at any under-fitted candidate $K_0 < K$; (ii) it does stop at the true candidate $K_0 = K$.

(i) No stop at under-fitted candidates.
• Candidate $K_0 = 1$. Because $K \ge 2$, the candidate 1 is under-fitted. Theorem 2 (applied with $K_0 = 1$) provides a constant $2\eta_0 > 0$ such that $P(T_1 > 2\eta_0) \to 1$. By Condition (Con1), we have $P(T_1 < \tau_N) \le P(T_1 \le 2\eta_0) \to 0$. Hence the algorithm does not stop at $K_0 = 1$.
• Candidates $2 \le K_0 < K$. Fix any such $K_0$. Because $K$ is fixed, the condition $K^3 = o(N)$ in Theorem 4 is trivially satisfied. Hence part (b) of Theorem 4 applies, and we obtain
\[ \lim_{N\to\infty} P\Big(r_{K_0} > \frac{\sqrt{M}}{c_{\mathrm{low}}}\sqrt{K}(K-1)\Big) = 0. \quad (A.10) \]
Since $\gamma_N \to \infty$ by Condition (Con2), for all sufficiently large $N$ we have $\gamma_N > \frac{\sqrt{M}}{c_{\mathrm{low}}}\sqrt{K}(K-1)$. Therefore,
\[ P(r_{K_0} > \gamma_N) \le P\Big(r_{K_0} > \frac{\sqrt{M}}{c_{\mathrm{low}}}\sqrt{K}(K-1)\Big) + 1\Big\{\gamma_N \le \frac{\sqrt{M}}{c_{\mathrm{low}}}\sqrt{K}(K-1)\Big\} \to 0. \]
Consequently, with probability tending to one, the ratio condition fails for every such $K_0$. Thus, the algorithm does not stop at any of them.

(ii) Stop at the true candidate $K_0 = K$. We now prove that $P(r_K > \gamma_N) \to 1$. From Lemma 7 applied with $K_0 = K-1$, there exists an event $A_N$ with $P(A_N) \to 1$ such that on $A_N$,
\[ T_{K-1} \ge \frac{c_{\mathrm{low}}\sqrt{J}}{\sqrt{K}(K-1)} > 0. \quad (A.11) \]
From the proof of Theorem 1 (specifically Lemma 2 and Weyl's inequality), there exist a constant $\tilde C_2 > 0$ and an event $B_N$ with $P(B_N) \to 1$ such that on $B_N$,
\[ |T_K| \le \tilde C_2\sqrt{\frac{JK\log(JK)}{N}}. \quad (A.12) \]
Define $\breve E_N := A_N \cap B_N$. Since both $A_N$ and $B_N$ have probability tending to one, $P(\breve E_N) \to 1$. On $\breve E_N$, combining Equations (A.11) and (A.12) yields
\[ r_K = \frac{T_{K-1}}{|T_K|} \ge \frac{c_{\mathrm{low}}\sqrt{J}/\big(\sqrt{K}(K-1)\big)}{\tilde C_2\sqrt{JK\log(JK)/N}} = \frac{c_{\mathrm{low}}}{\tilde C_2(K-1)K}\cdot\frac{\sqrt{N}}{\sqrt{\log(JK)}}. \quad (A.13) \]
Denote $\tilde C_3 := \frac{c_{\mathrm{low}}}{\tilde C_2(K-1)K} > 0$. Then on $\breve E_N$, we have
\[ r_K \ge \tilde C_3\frac{\sqrt{N}}{\sqrt{\log(JK)}}. \quad (A.14) \]
Condition (Con2) states that $\gamma_N = o\big(\sqrt{N/\log J}\big)$.
Given that $K$ is fixed, there exists in particular an integer $N_0$ such that for all $N \ge N_0$,
\[ \tilde C_3\frac{\sqrt{N}}{\sqrt{\log(JK)}} > \gamma_N. \]
Consequently, on the event $\breve E_N$ and for all $N \ge N_0$, inequality (A.14) yields $r_K > \gamma_N$. Hence, we obtain
\[ P(r_K > \gamma_N) \ge P(\breve E_N) \to 1. \]
We have shown that with probability tending to one the algorithm does not stop at any $K_0 < K$ and does stop at $K_0 = K$. Because the algorithm sequentially examines candidates in increasing order, this implies $P(\hat K = K) \to 1$. Thus the theorem holds for both $K = 1$ and $K \ge 2$.

Appendix B. Technical lemmas

Lemma 4 (Consistency of estimated parameters under the null). Under $H_0$ with the true $K = K_0$, suppose that Assumptions 1, 2, 4, and 5 hold. Let $\hat Z$, $\hat\Theta$, $\hat R$ be the estimates from Algorithm 1 Step 1. Then there exists a permutation matrix $\Pi$ such that, with probability tending to 1:
1. $\hat Z = Z\Pi$;
2. $\|\hat\Theta - \Theta\Pi\|_F = O_P\big(\sqrt{JK^2/N}\big)$;
3. $\|\hat R - \mathcal{R}\|_F = O_P\big(\sqrt{JK}\big)$;
4. $\max_{i,j}|\hat R(i,j) - \mathcal{R}(i,j)| = O_P\big(\sqrt{K\log(JK)/N}\big)$;
5. there exists a constant $v_{\min} > 0$ (specifically, $v_{\min} = \frac{\delta^2 M}{4}$) such that with high probability, $\hat V(i,j) \ge v_{\min}$ for all $i, j$.

Lemma 5 (Lower bound on structural residual under under-fitting). Assume $K_0 < K$, and Assumptions 2 and 3 hold. Then there exist a constant $c > 0$ and sets $S \subset [N]$, $T \subset [J]$, with $|S| \asymp N/K$, $|T| \ge c_1 J$, such that for any estimated classification matrix $\hat Z$ (and the corresponding $\hat\Theta$, $\hat R$) we have deterministically
\[ \big\|(\mathcal{R} - \hat R)_{S,T}\big\| \ge \frac{c\,\zeta\sqrt{NJ}}{\sqrt{K}K_0}, \quad (B.1) \]
where the constant $c = \frac{\sqrt{c_0^3 c_1}}{2\sqrt{2}}$ depends only on $c_0$ (Assumption 2) and $c_1$ (Assumption 3), and is independent of $N$, $J$, $K$, $K_0$.

Lemma 6 (Control of the residual matrix). Let $X = R - \mathcal{R} \in \mathbb{R}^{N\times J}$. We have
\[ \|X\| = O_P\big(\sqrt{N} + \sqrt{J}\big). \quad (B.2) \]
Consequently, for any submatrix $W = (R - \mathcal{R})_{S,T}$ with $S \subseteq [N]$, $T \subseteq [J]$, we have $\|W\| = O_P(\sqrt{N} + \sqrt{J})$.

Lemma 7 (Lower bound for $T_{K_0}$ under under-fitting). Assume $K_0 < K$ and that Assumptions 1–4 and 6 hold. Suppose additionally that $K^3 = o(N)$ as $N \to \infty$. Let the constants $c, \zeta, M, C_{\mathrm{noise}}, C_{\mathrm{signal}}, \eta_0$ be as defined in Theorem 2. Then there exists a constant
\[ c_{\mathrm{low}} := \frac{2\eta_0 C_{\mathrm{signal}}}{C_{\mathrm{noise}} + 1 + 3\eta_0} \]
such that for every $K_0 < K$,
\[ P\Big(T_{K_0} \ge \frac{c_{\mathrm{low}}\sqrt{J}}{\sqrt{K}K_0}\Big) \to 1 \quad \text{as } N \to \infty. \quad (B.3) \]

Lemma 8 (Deterministic upper bound for $T_{K_0}$). For any candidate number of latent classes $K_0$ (including the true one) and for every realization of the data and the estimation procedure, we have $T_{K_0} \le \sqrt{MJ}$, where $M$ is the maximum response category introduced in Definition 1. Consequently, $P(T_{K_0} \le \sqrt{MJ}) = 1$ for all $N$.

Lemma 9 (Lower bound for $\|R^*\|$ under $J = o(N)$). Suppose that Assumption 1 holds and $J = o(N)$. We have
\[ \|R^*\| = \frac{\|Y_N\|}{\sqrt{N}} \xrightarrow{a.s.} 1. \]
In particular, for any $\varepsilon > 0$, $\lim_{N\to\infty} P(\|R^*\| \ge 1 - \varepsilon) = 1$, and $\sigma_1(R^*) = \|R^*\| = 1 + o_P(1)$.

Appendix B.1. Proof of Lemma 4

Proof. Throughout, we denote $[n] = \{1, 2, \ldots, n\}$ for any positive integer $n$.

Part 1: Perfect recovery of class assignments. By Assumption 5, when the true number of latent classes is $K$ (i.e., $K_0 = K$), the classification estimation method $\mathcal{M}$ used in Algorithm 1 Step 1 is consistent. That is, there exists a permutation matrix $\Pi$ such that $P(\hat Z = Z\Pi) \to 1$ as $N \to \infty$. This proves statement 1 directly from the assumption.
Part 2: Frobenius norm error of $\hat\Theta$. We now condition on the high-probability event $E_N^{(1)} = \{\hat Z = Z\Pi\}$. By the construction of the estimator in Algorithm 1 Step 1 (which implements method $\mathcal{M}$), the item parameter matrix is estimated as
\[ \hat\Theta = R^\top\hat Z(\hat Z^\top\hat Z)^{-1} = R^\top Z\Pi(\Pi^\top Z^\top Z\Pi)^{-1} = R^\top Z(Z^\top Z)^{-1}\Pi, \]
where we used the facts that for a permutation matrix $\Pi$, $\Pi^{-1} = \Pi^\top$ and $(\Pi^\top Z^\top Z\Pi)^{-1} = \Pi^\top(Z^\top Z)^{-1}\Pi$. The true item parameter matrix $\Theta$ satisfies $\mathcal{R} = Z\Theta^\top$, where $\mathcal{R} = E[R]$. Hence,
\[ \Theta = \mathcal{R}^\top Z(Z^\top Z)^{-1}. \]
Therefore,
\[ \hat\Theta - \Theta\Pi = \big[R^\top Z(Z^\top Z)^{-1} - \mathcal{R}^\top Z(Z^\top Z)^{-1}\big]\Pi = (R - \mathcal{R})^\top Z(Z^\top Z)^{-1}\Pi. \]
Since $\Pi$ is an orthogonal matrix (permutation matrices are orthogonal), we have $\|\Pi\| = 1$ (spectral norm) and $\|A\Pi\|_F = \|A\|_F$ for any matrix $A$. Thus, we have
\[ \|\hat\Theta - \Theta\Pi\|_F = \|(R - \mathcal{R})^\top Z(Z^\top Z)^{-1}\|_F. \]
Define $W = R - \mathcal{R}$. The entries $W(i,j)$ are independent across $i$ and $j$ (conditional on the latent classes, and also unconditionally since the latent classes are fixed). Moreover, $E[W(i,j)] = 0$, and by the binomial variance formula,
\[ \mathrm{Var}(W(i,j)) = \mathcal{R}(i,j)\Big(1 - \frac{\mathcal{R}(i,j)}{M}\Big) \le \frac{M}{4}, \]
where the inequality holds because the function $h(x) = x(1 - x/M)$ attains its maximum $M/4$ at $x = M/2$, and $\mathcal{R}(i,j) \in [0, M]$. Let $D = (Z^\top Z)^{-1} = \mathrm{diag}(N_1^{-1}, N_2^{-1}, \ldots, N_K^{-1})$. Define $A = W^\top ZD^{1/2} \in \mathbb{R}^{J\times K}$, where $D^{1/2} = \mathrm{diag}(N_1^{-1/2}, N_2^{-1/2}, \ldots, N_K^{-1/2})$. Then
\[ \|W^\top ZD\|_F^2 = \|AD^{1/2}\|_F^2 = \mathrm{tr}\big(D^{1/2}A^\top AD^{1/2}\big) = \mathrm{tr}\big(A^\top AD\big) = \sum_{k=1}^{K}\frac{1}{N_k}\|A_{:k}\|_2^2, \]
where $A_{:k}$ denotes the $k$-th column of $A$. Now, observe that
\[ A_{:k} = \frac{1}{\sqrt{N_k}}\sum_{i\in\mathcal{C}_k} W(i,:)^\top, \]
where $W(i,:) = (W(i,1), \ldots, W(i,J))^\top$. For each $j \in [J]$, the $j$-th component of $A_{:k}$ is $A_{:k}(j) = \frac{1}{\sqrt{N_k}}\sum_{i\in\mathcal{C}_k} W(i,j)$. For fixed $k$ and $j$, the summands $\{W(i,j) : i \in \mathcal{C}_k\}$ are independent, zero-mean random variables, bounded by $|W(i,j)| \le M$ (since $R(i,j) \in \{0,1,\ldots,M\}$ and $\mathcal{R}(i,j) \in [0,M]$). Their variances satisfy $\mathrm{Var}(W(i,j)) \le M/4$ as noted above. We now compute the expected squared Euclidean norm of $A_{:k}$:
\[ E\big[\|A_{:k}\|_2^2\big] = \sum_{j=1}^{J} E\big[A_{:k}(j)^2\big] = \sum_{j=1}^{J}\mathrm{Var}(A_{:k}(j)) = \sum_{j=1}^{J}\frac{1}{N_k}\sum_{i\in\mathcal{C}_k}\mathrm{Var}(W(i,j)) \le \sum_{j=1}^{J}\frac{1}{N_k}\cdot N_k\cdot\frac{M}{4} = \frac{MJ}{4}, \]
where the second equality uses $E[A_{:k}(j)] = 0$ and the third uses independence across $i$. Thus, $E[\|A_{:k}\|_2^2] \le \frac{MJ}{4}$ for each $k \in [K]$. By Assumption 2, there exists $c_0 > 0$ such that $N_k \ge c_0 N/K$ for all $k$. Therefore, we have
\[ E\big[\|\hat\Theta - \Theta\Pi\|_F^2\big] = E\Big[\sum_{k=1}^{K}\frac{1}{N_k}\|A_{:k}\|_2^2\Big] = \sum_{k=1}^{K}\frac{1}{N_k}E\big[\|A_{:k}\|_2^2\big] \le \sum_{k=1}^{K}\frac{K}{c_0 N}\cdot\frac{MJ}{4} = \frac{MK^2 J}{4c_0 N}. \]
Now, by Markov's inequality, for any $\epsilon > 0$, we have
\[ P\big(\|\hat\Theta - \Theta\Pi\|_F \ge \epsilon\big) \le \frac{E\big[\|\hat\Theta - \Theta\Pi\|_F^2\big]}{\epsilon^2} \le \frac{MK^2 J}{4c_0 N\epsilon^2}. \]
For any fixed $\eta > 0$, choose $\epsilon_\eta = \sqrt{\frac{MK^2 J}{4c_0 N\eta}}$. Then, we have $P(\|\hat\Theta - \Theta\Pi\|_F \ge \epsilon_\eta) \le \eta$. Hence, by definition, we obtain
\[ \|\hat\Theta - \Theta\Pi\|_F = O_P\Big(\sqrt{\frac{JK^2}{N}}\Big). \]
This proves statement 2.
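In matrix form, this estimator is simply the class-wise sample mean. A minimal sketch (ours, with arbitrary illustrative sizes; it assumes every class is non-empty) verifying the algebraic identity $\hat\Theta = R^\top Z(Z^\top Z)^{-1}$ numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
N, J, K, M = 300, 20, 3, 4
labels = rng.integers(0, K, size=N)
Z = np.eye(K)[labels]                       # N x K membership matrix
R = rng.binomial(M, 0.5, size=(N, J)).astype(float)

# Matrix form of the estimator: Theta_hat = R^T Z (Z^T Z)^{-1}
Theta_hat = R.T @ Z @ np.linalg.inv(Z.T @ Z)

# Equivalent class-wise sample means; column k = mean response within class k
Theta_means = np.stack([R[labels == k].mean(axis=0) for k in range(K)], axis=1)
print(np.allclose(Theta_hat, Theta_means))  # True
```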
Part 3: Frobenius norm error of $\hat R$. Conditioning again on the event $E_N^{(1)}$ where $\hat Z = Z\Pi$ (which occurs with probability tending to 1), we have
\[ \hat R = \hat Z\hat\Theta^\top = Z\Pi\hat\Theta^\top = Z(\hat\Theta\Pi^\top)^\top. \]
Given that the true expected response matrix is $\mathcal{R} = Z\Theta^\top$, we have
\[ \hat R - \mathcal{R} = Z(\hat\Theta\Pi^\top)^\top - Z\Theta^\top = Z\big(\hat\Theta\Pi^\top - \Theta\big)^\top. \]
Because $\|\hat\Theta\Pi^\top - \Theta\|_F = \|\hat\Theta - \Theta\Pi\|_F$, we have
\[ \|\hat R - \mathcal{R}\|_F = \big\|Z(\hat\Theta\Pi^\top - \Theta)^\top\big\|_F \le \|Z\|\cdot\|\hat\Theta\Pi^\top - \Theta\|_F = \|Z\|\cdot\|\hat\Theta - \Theta\Pi\|_F. \]
The matrix $Z$ has orthogonal columns: $Z^\top Z = \mathrm{diag}(N_1, \ldots, N_K)$. Thus, its spectral norm is $\|Z\| = \sqrt{\lambda_{\max}(Z^\top Z)} = \sqrt{N_{\max}}$. By Assumption 2, $N_{\max} \le \frac{N}{c_0 K}$, so $\|Z\| \le \sqrt{\frac{N}{c_0 K}}$. Combining this with statement 2, we obtain
\[ \|\hat R - \mathcal{R}\|_F \le \sqrt{\frac{N}{c_0 K}}\cdot O_P\Big(\sqrt{\frac{JK^2}{N}}\Big) = O_P\Big(\sqrt{\frac{JK}{c_0}}\Big) = O_P\big(\sqrt{JK}\big). \]
This proves statement 3.

Part 4: Uniform entrywise error of $\hat R$. We begin by recalling that on the high-probability event $E_N^{(1)} = \{\hat Z = Z\Pi\}$ (which satisfies $P(E_N^{(1)}) \to 1$), we have $\hat R(i,j) = \hat\Theta(j,\pi(k))$ where $k = \ell(i)$ and $\pi$ is the permutation corresponding to $\Pi$. Moreover, by the construction of the estimator in Algorithm 1 Step 1, $\hat\Theta(j,\pi(k))$ is precisely the sample mean of the responses to item $j$ over all subjects in the true class $k$, i.e.,
\[ \hat\Theta(j,\pi(k)) = \frac{1}{N_k}\sum_{i\in\mathcal{C}_k} R(i,j), \]
where $\mathcal{C}_k = \{i : \ell(i) = k\}$ and $N_k = |\mathcal{C}_k|$. Fix an arbitrary pair $(j,k)$ with $j \in [J]$ and $k \in [K]$. Condition on the latent class assignment $\ell$ (which is fixed in our model) and on $E_N^{(1)}$. Under the data-generating mechanism of the LCM, the random variables $\{R(i,j) : i \in \mathcal{C}_k\}$ are mutually independent (by conditional independence given $\ell$, and unconditionally since $\ell$ is fixed), and each follows a binomial distribution with parameters $M$ and $\Theta(j,k)/M$. Consequently, each $R(i,j)$ is bounded within the interval $[0, M]$ and has mean $\Theta(j,k)$.

We now apply Hoeffding's inequality in its precise form. Let $Y_1, \ldots, Y_n$ be independent random variables such that $a_i \le Y_i \le b_i$ almost surely. Then for any $t > 0$, Hoeffding's inequality gives
\[ P\Big(\Big|\frac{1}{n}\sum_{i=1}^{n} Y_i - E\Big[\frac{1}{n}\sum_{i=1}^{n} Y_i\Big]\Big| \ge t\Big) \le 2\exp\Big(-\frac{2n^2 t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\Big). \]
In our setting, for the $N_k$ variables $\{R(i,j) : i \in \mathcal{C}_k\}$, we have $a_i = 0$, $b_i = M$, and $\sum_{i=1}^{N_k}(b_i - a_i)^2 = N_k M^2$. Therefore, we get
\[ P\Big(\big|\hat\Theta(j,\pi(k)) - \Theta(j,k)\big| \ge t \,\Big|\, E_N^{(1)}\Big) \le 2\exp\Big(-\frac{2N_k t^2}{M^2}\Big). \]
By Assumption 2, there exists a constant $c_0 > 0$ such that $N_k \ge c_0 N/K$ for all $k$. Hence, uniformly in $j, k$, we have
\[ P\Big(\big|\hat\Theta(j,\pi(k)) - \Theta(j,k)\big| \ge t \,\Big|\, E_N^{(1)}\Big) \le 2\exp\Big(-\frac{2c_0 Nt^2}{KM^2}\Big). \]
Now observe that on $E_N^{(1)}$, for any $i \in \mathcal{C}_k$, $\hat R(i,j) = \hat\Theta(j,\pi(k))$ and $\mathcal{R}(i,j) = \Theta(j,k)$. Therefore, we have
\[ \max_{i\in[N], j\in[J]}\big|\hat R(i,j) - \mathcal{R}(i,j)\big| = \max_{k\in[K], j\in[J]}\big|\hat\Theta(j,\pi(k)) - \Theta(j,k)\big|. \]
We control this maximum via a union bound over the $JK$ distinct pairs $(j,k)$:
\[ P\Big(\max_{i,j}\big|\hat R(i,j) - \mathcal{R}(i,j)\big| \ge t \,\Big|\, E_N^{(1)}\Big) \le \sum_{j=1}^{J}\sum_{k=1}^{K} P\Big(\big|\hat\Theta(j,\pi(k)) - \Theta(j,k)\big| \ge t \,\Big|\, E_N^{(1)}\Big) \le 2JK\exp\Big(-\frac{2c_0 Nt^2}{KM^2}\Big). \]
For any fixed $\eta > 0$, let
\[ t = M\sqrt{\frac{K}{2c_0 N}\log\Big(\frac{4JK}{\eta}\Big)}. \]
Then, we have $2JK\exp\big(-\frac{2c_0 Nt^2}{KM^2}\big) = 2JK\cdot\frac{\eta}{4JK} = \eta/2$, which gives
\[ P\Big(\max_{i,j}\big|\hat R(i,j) - \mathcal{R}(i,j)\big| \ge M\sqrt{\frac{K}{2c_0 N}\log\Big(\frac{4JK}{\eta}\Big)} \,\Big|\, E_N^{(1)}\Big) \le \eta/2. \]
Now we remove the conditioning on $E_N^{(1)}$. Using the law of total probability, for any $t > 0$, we have
\[ P\Big(\max_{i,j}\big|\hat R - \mathcal{R}\big| \ge t\Big) \le P\big((E_N^{(1)})^c\big) + P\Big(\max_{i,j}\big|\hat R - \mathcal{R}\big| \ge t \,\Big|\, E_N^{(1)}\Big). \]
Since $P((E_N^{(1)})^c) \to 0$, there exists $N_0$ such that for all $N \ge N_0$, $P((E_N^{(1)})^c) \le \eta/2$.
Then, for $N \ge N_0$, we have
\[ P\Big(\max_{i,j}\big|\hat R - \mathcal{R}\big| \ge M\sqrt{\frac{K}{2c_0 N}\log\Big(\frac{4JK}{\eta}\Big)}\Big) \le \eta, \]
which means
\[ \max_{i,j}\big|\hat R(i,j) - \mathcal{R}(i,j)\big| = O_P\Big(\sqrt{\frac{K\log(JK)}{N}}\Big). \]
This proves statement 4.

Part 5: Lower bound on $\hat V(i,j)$. Recall that $\hat V(i,j) = \hat R(i,j)(1 - \hat R(i,j)/M)$. Define the function
\[ h(x) = x\Big(1 - \frac{x}{M}\Big), \qquad x \in [0, M]. \]
Under Assumption 1, we have $\mathcal{R}(i,j) = \Theta(j,\ell(i)) \in [\delta M, (1-\delta)M]$. By statement 4, with probability tending to 1, for all $i, j$ we have $|\hat R(i,j) - \mathcal{R}(i,j)| \le \epsilon_N$, where $\epsilon_N = C\sqrt{\frac{K\log(JK)}{N}}$ for some constant $C > 0$. Under Assumption 4, $\epsilon_N \to 0$; hence, for sufficiently large $N$, we have $\epsilon_N \le \delta M/2$. Consequently,
\[ \hat R(i,j) \in [\delta M - \epsilon_N, (1-\delta)M + \epsilon_N] \subseteq \Big[\frac{\delta M}{2}, \Big(1 - \frac{\delta}{2}\Big)M\Big]. \]
Because $h$ is a concave function (since $h''(x) = -2/M < 0$), on the interval $[\delta M/2, (1-\delta/2)M]$ its minimum is attained at one of the endpoints. Compute
\[ h\Big(\frac{\delta M}{2}\Big) = \frac{\delta M}{2}\Big(1 - \frac{\delta}{2}\Big) = \frac{\delta(2-\delta)M}{4}, \qquad h\Big(\Big(1 - \frac{\delta}{2}\Big)M\Big) = \Big(1 - \frac{\delta}{2}\Big)M\cdot\frac{\delta}{2} = \frac{\delta(2-\delta)M}{4}. \]
Thus, both endpoints yield the same value. Therefore, for any $x \in [\delta M/2, (1-\delta/2)M]$, we have $h(x) \ge \frac{\delta(2-\delta)M}{4}$. Since $\delta \in (0, 1/2]$, we have $2 - \delta \ge \delta$, and hence $\frac{\delta(2-\delta)M}{4} \ge \frac{\delta^2 M}{4}$. Thus, with probability tending to 1, for all $i, j$, we have
\[ \hat V(i,j) \ge \frac{\delta^2 M}{4}. \]
We can therefore set $v_{\min} = \frac{\delta^2 M}{4}$. This proves statement 5.

Appendix B.2. Proof of Lemma 5

Proof. The argument is completely deterministic and uses only elementary counting, the pigeonhole principle, and basic linear algebra. No probabilistic statement is required until the final conclusion. The proof proceeds in twelve self-contained steps.

Step 1. A large sub-block inside each true class. For each true class $k \in [K]$ and each estimated class $\kappa \in [K_0]$, define
\[ N_{k\kappa} := \big|\{i \in \mathcal{C}_k : \hat Z(i,\kappa) = 1\}\big|. \]
Clearly, we have $\sum_{\kappa=1}^{K_0} N_{k\kappa} = N_k$. By the averaging principle (a direct consequence of the pigeonhole principle), there exists at least one estimated class $\kappa(k) \in [K_0]$ such that $N_{k,\kappa(k)} \ge \frac{N_k}{K_0}$. Assumption 2 gives $N_k \ge c_0 N/K$. Hence, we have
\[ N_{k,\kappa(k)} \ge \frac{c_0 N}{KK_0}. \]

Step 2. Two true classes mapped to the same estimated class. For each $k \in [K]$, Step 1 gives a non-empty set $A_k := \{\kappa \in [K_0] : N_{k,\kappa} \ge N_k/K_0\}$. Choose an arbitrary element $\kappa(k) \in A_k$; this defines a function $\kappa : [K] \to [K_0]$. Because $K_0 < K$, $\kappa$ cannot be injective. Hence, by the pigeonhole principle, there exist distinct $k_1, k_2 \in [K]$ with $\kappa(k_1) = \kappa(k_2) =: \kappa$. These two true classes are therefore merged, at least partially, into the same estimated class $\kappa$.

Step 3. Construction of the row index set $S$. Define
\[ S_1 := \{i \in \mathcal{C}_{k_1} : \hat Z(i,\kappa) = 1\}, \qquad S_2 := \{i \in \mathcal{C}_{k_2} : \hat Z(i,\kappa) = 1\}, \qquad S := S_1 \cup S_2. \]
From Step 1 we have the deterministic lower bounds
\[ |S_1| = N_{k_1,\kappa} \ge \frac{c_0 N}{KK_0}, \qquad |S_2| = N_{k_2,\kappa} \ge \frac{c_0 N}{KK_0}. \]
Using the upper bound in Assumption 2 ($N_k \le N/(c_0 K)$) and the fact that $S_1 \subseteq \mathcal{C}_{k_1}$, $S_2 \subseteq \mathcal{C}_{k_2}$, we obtain $|S_1| \le N_{k_1} \le \frac{N}{c_0 K}$ and $|S_2| \le N_{k_2} \le \frac{N}{c_0 K}$, and consequently $|S| = |S_1| + |S_2| \le \frac{2N}{c_0 K}$. Thus, we have
\[ \frac{2c_0 N}{KK_0} \le |S| \le \frac{2N}{c_0 K}. \]

Step 4. Construction of the column index set $T$. For the pair $(k_1, k_2)$, we invoke Assumption 3. Define
\[ T := \big\{j \in [J] : |\Theta(j,k_1) - \Theta(j,k_2)| \ge \zeta/2\big\}. \]
Assumption 3 asserts that $|T| \ge c_1 J$. Trivially, we have $|T| \le J$.
Step 5. The residual submatrix $\mathbf{M} := (\mathcal{R} - \hat R)_{S,T}$. Because every individual in $S$ is assigned to the same estimated class $\kappa$, the fitted expectation is constant on $S \times T$:
\[ \hat R(i,j) = \hat\Theta(j,\kappa) =: \hat\theta(j), \qquad \forall i \in S,\ j \in T. \]
For the true expectation, we have
\[ \mathcal{R}(i,j) = \begin{cases} \Theta(j,k_1), & i \in S_1, \\ \Theta(j,k_2), & i \in S_2. \end{cases} \]
Hence, $\mathbf{M}$ admits the block representation
\[ \mathbf{M} = \begin{bmatrix} \mathbf{M}_1 \\ \mathbf{M}_2 \end{bmatrix}, \qquad \mathbf{M}_1(i,j) = \Theta(j,k_1) - \hat\theta(j)\ (i \in S_1), \qquad \mathbf{M}_2(i,j) = \Theta(j,k_2) - \hat\theta(j)\ (i \in S_2). \]

Step 6. Construction of two unit vectors. We now exhibit specific unit vectors $u \in \mathbb{R}^{|S|}$ and $v \in \mathbb{R}^{|T|}$ that yield a large value of $|u^\top\mathbf{M}v|$, thereby providing a lower bound for $\|\mathbf{M}\|$ via the variational characterization $\|\mathbf{M}\| = \max_{\|u\|=\|v\|=1}|u^\top\mathbf{M}v|$.

Column vector $v$. For each $j \in T$, set
\[ s_j := \mathrm{sign}\big(\Theta(j,k_1) - \Theta(j,k_2)\big) \in \{-1, 1\}, \qquad v_j := \frac{s_j}{\sqrt{|T|}}. \]
Then, we have $\|v\|_2^2 = \sum_{j\in T} 1/|T| = 1$. The signs convert the absolute differences $|\Theta(j,k_1) - \Theta(j,k_2)|$ into a sum of non-negative terms.

Row vector $u$. Recall that every row of $\mathbf{M}_1$ equals $\big(\Theta(j,k_1) - \hat\theta(j)\big)_{j\in T}$ and every row of $\mathbf{M}_2$ equals $\big(\Theta(j,k_2) - \hat\theta(j)\big)_{j\in T}$. To eliminate the unknown common part $\hat\theta(j)$, we assign positive weights to rows in $S_1$ and negative weights to rows in $S_2$, with equal total mass in absolute value. Define the balancing coefficients
\[ \alpha := \frac{\sqrt{|S_2|}}{\sqrt{|S_1| + |S_2|}}, \qquad \beta := \frac{\sqrt{|S_1|}}{\sqrt{|S_1| + |S_2|}}, \]
and set
\[ u_i := \begin{cases} \alpha/\sqrt{|S_1|}, & i \in S_1, \\ -\beta/\sqrt{|S_2|}, & i \in S_2. \end{cases} \]
A direct computation gives
\[ \|u\|_2^2 = \sum_{i\in S_1}\frac{\alpha^2}{|S_1|} + \sum_{i\in S_2}\frac{\beta^2}{|S_2|} = \alpha^2 + \beta^2 = 1, \]
so $u$ is indeed a unit vector. The opposing signs and the specific choice of $\alpha, \beta$ guarantee that the total weight on $S_1$,
\[ \alpha\sqrt{|S_1|} = \sqrt{\frac{|S_1||S_2|}{|S_1| + |S_2|}}, \]
is exactly the opposite of the total weight on $S_2$, $-\beta\sqrt{|S_2|} = -\sqrt{\frac{|S_1||S_2|}{|S_1|+|S_2|}}$, which forces the cancellation of $\hat\theta(j)$ when we form $u^\top\mathbf{M}v$ in the next step.

Step 7. Exact evaluation of $u^\top\mathbf{M}v$. Expanding the bilinear form $u^\top\mathbf{M}v$ gives
\[ u^\top\mathbf{M}v = \sum_{i\in S_1}\sum_{j\in T}\frac{\alpha}{\sqrt{|S_1|}}\big(\Theta(j,k_1) - \hat\theta(j)\big)\frac{s_j}{\sqrt{|T|}} + \sum_{i\in S_2}\sum_{j\in T}\Big(-\frac{\beta}{\sqrt{|S_2|}}\Big)\big(\Theta(j,k_2) - \hat\theta(j)\big)\frac{s_j}{\sqrt{|T|}} \]
\[ = \alpha\sqrt{|S_1|}\cdot\frac{1}{\sqrt{|T|}}\sum_{j\in T}\big(\Theta(j,k_1) - \hat\theta(j)\big)s_j - \beta\sqrt{|S_2|}\cdot\frac{1}{\sqrt{|T|}}\sum_{j\in T}\big(\Theta(j,k_2) - \hat\theta(j)\big)s_j. \]
Observe that
\[ \alpha\sqrt{|S_1|} = \frac{\sqrt{|S_2|}}{\sqrt{|S_1|+|S_2|}}\sqrt{|S_1|} = \sqrt{\frac{|S_1||S_2|}{|S_1|+|S_2|}} =: \gamma, \qquad \beta\sqrt{|S_2|} = \frac{\sqrt{|S_1|}}{\sqrt{|S_1|+|S_2|}}\sqrt{|S_2|} = \gamma. \]
Therefore, we have
\[ u^\top\mathbf{M}v = \frac{\gamma}{\sqrt{|T|}}\sum_{j\in T}\Big[\big(\Theta(j,k_1) - \hat\theta(j)\big) - \big(\Theta(j,k_2) - \hat\theta(j)\big)\Big]s_j = \frac{\gamma}{\sqrt{|T|}}\sum_{j\in T}\big(\Theta(j,k_1) - \Theta(j,k_2)\big)s_j. \]
By the definition of $s_j$, we have $(\Theta(j,k_1) - \Theta(j,k_2))s_j = |\Theta(j,k_1) - \Theta(j,k_2)| \ge 0$, which gives
\[ u^\top\mathbf{M}v = \frac{\gamma}{\sqrt{|T|}}\sum_{j\in T}|\Theta(j,k_1) - \Theta(j,k_2)|. \]

Step 8. Lower bound using the separation condition. For every $j \in T$, the definition of $T$ in Step 4 guarantees $|\Theta(j,k_1) - \Theta(j,k_2)| \ge \zeta/2$. Thus, we have
\[ u^\top\mathbf{M}v \ge \frac{\gamma}{\sqrt{|T|}}\cdot|T|\cdot\frac{\zeta}{2} = \frac{\gamma\sqrt{|T|}\,\zeta}{2}. \]

Step 9. Transfer to the spectral norm. Since $\|u\|_2 = \|v\|_2 = 1$, the variational characterization of the spectral norm gives
\[ \|\mathbf{M}\| \ge |u^\top\mathbf{M}v| \ge \frac{\gamma\sqrt{|T|}\,\zeta}{2}. \]
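The cancellation in Steps 6 and 7 is easy to check numerically. The sketch below is ours, with arbitrary illustrative sizes: it builds $u$, $v$, and the block matrix $\mathbf{M}$, and compares $u^\top\mathbf{M}v$ with $\frac{\gamma}{\sqrt{|T|}}\sum_{j\in T}|\Theta(j,k_1) - \Theta(j,k_2)|$ for a random common part $\hat\theta$.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, T = 40, 25, 15              # |S_1|, |S_2|, |T|

theta1 = rng.uniform(0, 4, T)       # Theta(j, k1) over j in T
theta2 = rng.uniform(0, 4, T)       # Theta(j, k2)
theta_hat = rng.uniform(0, 4, T)    # common fitted value theta_hat(j)

# Rows in S_1 equal theta1 - theta_hat; rows in S_2 equal theta2 - theta_hat
Mmat = np.vstack([np.tile(theta1 - theta_hat, (n1, 1)),
                  np.tile(theta2 - theta_hat, (n2, 1))])

alpha, beta = np.sqrt(n2 / (n1 + n2)), np.sqrt(n1 / (n1 + n2))
u = np.concatenate([np.full(n1, alpha / np.sqrt(n1)),
                    np.full(n2, -beta / np.sqrt(n2))])
v = np.sign(theta1 - theta2) / np.sqrt(T)

gamma = np.sqrt(n1 * n2 / (n1 + n2))
lhs = u @ Mmat @ v                                     # theta_hat cancels out
rhs = gamma / np.sqrt(T) * np.abs(theta1 - theta2).sum()
print(np.isclose(lhs, rhs), lhs <= np.linalg.norm(Mmat, 2) + 1e-9)  # True True
```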
Step 10. Quantitative lower bound for $\gamma$. From the size estimates in Step 3, we have
$$|S_1|, |S_2| \geq \frac{c_0 N}{K K_0}, \qquad |S_1| + |S_2| \leq \frac{2N}{c_0 K}.$$
Consequently, we get
$$\gamma = \sqrt{\frac{|S_1||S_2|}{|S_1|+|S_2|}} \geq \sqrt{\frac{\left(c_0 N/(K K_0)\right)^2}{2N/(c_0 K)}} = \sqrt{\frac{c_0^3 N}{2 K K_0^2}}.$$

Step 11. Combining the lower bounds. By $|T| \geq c_1 J$ obtained in Step 4 and the estimate for $\gamma$, we get
$$\|\mathbf{M}\| \geq \frac{\zeta}{2}\sqrt{\frac{c_0^3 N}{2 K K_0^2}}\sqrt{c_1 J} = \frac{\zeta}{2}\sqrt{\frac{c_0^3 c_1 N J}{2 K K_0^2}} = \frac{\sqrt{c_0^3 c_1}}{2\sqrt{2}} \cdot \frac{\zeta\sqrt{NJ}}{\sqrt{K}\,K_0}.$$
Define the constant $c := \frac{\sqrt{c_0^3 c_1}}{2\sqrt{2}}$. Then we have established the deterministic inequality
$$\|(\mathcal{R} - \hat{R})_{S,T}\| \geq \frac{c\,\zeta\sqrt{NJ}}{\sqrt{K}\,K_0}.$$

Step 12. Probability statement. The construction of $S, T$ and the subsequent algebraic estimates depend only on the fixed $\hat{Z}$ and the model parameters; they hold for every possible realization of $\hat{Z}$ whenever $K_0 < K$. Hence, regardless of the distribution of $\hat{Z}$,
$$\mathbb{P}\left(\|(\mathcal{R} - \hat{R})_{S,T}\| \geq \frac{c\,\zeta\sqrt{NJ}}{\sqrt{K}\,K_0}\right) = 1.$$
In particular, the conclusion holds with probability tending to 1 (indeed with probability 1) for any classifier $\mathcal{M}$.

Appendix B.3. Proof of Lemma 6

Proof. We prove this lemma in six steps.

Step 1. Basic estimates. By the construction of $X$, we have $\mathbb{E}[X(i,j)] = 0$ for all $i \in [N]$, $j \in [J]$, and the entries $\{X(i,j)\}$ are mutually independent. Recall that $R(i,j) \in \{0,1,\ldots,M\}$ and $\mathcal{R}(i,j) \in [0,M]$; hence $|X(i,j)| \leq M$. Recall that
$$\mathrm{Var}(R(i,j)) = \mathcal{R}(i,j)\left(1 - \frac{\mathcal{R}(i,j)}{M}\right) \leq \max_{x \in [0,M]} x\left(1 - \frac{x}{M}\right) = \frac{M}{4},$$
and because $\mathbb{E}[X(i,j)] = 0$, we have $\mathbb{E}[X(i,j)^2] = \mathrm{Var}(R(i,j)) \leq M/4$.

Step 2. Exact quantities required in Lemma 3. Define
$$\sigma_1^* := \max_{i \in [N]} \sqrt{\sum_{j=1}^J \mathbb{E}[X(i,j)^2]}, \qquad \sigma_2^* := \max_{j \in [J]} \sqrt{\sum_{i=1}^N \mathbb{E}[X(i,j)^2]}, \qquad \sigma_{**} := \max_{i,j} \|X(i,j)\|_\infty.$$
From the bounds obtained in Step 1, we obtain deterministic upper bounds for these quantities:
$$\sigma_1^* \leq \sqrt{J \cdot \frac{M}{4}} = \frac{\sqrt{MJ}}{2}, \qquad \sigma_2^* \leq \sqrt{N \cdot \frac{M}{4}} = \frac{\sqrt{MN}}{2}, \qquad \sigma_{**} \leq M.$$
Without confusion, we introduce the symbols
$$\tilde{\sigma}_1 := \frac{\sqrt{MJ}}{2}, \qquad \tilde{\sigma}_2 := \frac{\sqrt{MN}}{2}, \qquad \tilde{\sigma}_* := M,$$
which are upper bounds for the true quantities: $\sigma_1^* \leq \tilde{\sigma}_1$, $\sigma_2^* \leq \tilde{\sigma}_2$, $\sigma_{**} \leq \tilde{\sigma}_*$.

Step 3. Applying Lemma 3 with an inflated threshold. Fix $\eta = \frac{1}{2}$; this value satisfies the condition $0 < \eta \leq \frac{1}{2}$ of Lemma 3. Let $C_\eta$ be the constant whose existence is guaranteed by that lemma (it depends only on $\eta$). For any $t \geq 0$, applying Lemma 3 to the matrix $X$ yields
$$\mathbb{P}\left(\|X\| \geq (1+\eta)(\sigma_1^* + \sigma_2^*) + t\right) \leq (N+J)\exp\left(-\frac{t^2}{C_\eta (\sigma_{**})^2}\right).$$
Because $\sigma_1^* \leq \tilde{\sigma}_1$ and $\sigma_2^* \leq \tilde{\sigma}_2$, we have $(1+\eta)(\tilde{\sigma}_1 + \tilde{\sigma}_2) + t \geq (1+\eta)(\sigma_1^* + \sigma_2^*) + t$. Hence the event $\{\|X\| \geq (1+\eta)(\tilde{\sigma}_1 + \tilde{\sigma}_2) + t\}$ is contained in the event $\{\|X\| \geq (1+\eta)(\sigma_1^* + \sigma_2^*) + t\}$. Thus, we have
$$\mathbb{P}\left(\|X\| \geq (1+\eta)(\tilde{\sigma}_1 + \tilde{\sigma}_2) + t\right) \leq \mathbb{P}\left(\|X\| \geq (1+\eta)(\sigma_1^* + \sigma_2^*) + t\right).$$
Moreover, $\sigma_{**} \leq \tilde{\sigma}_*$ implies $(\sigma_{**})^2 \leq \tilde{\sigma}_*^2$ and thus $\exp\left(-\frac{t^2}{C_\eta(\sigma_{**})^2}\right) \leq \exp\left(-\frac{t^2}{C_\eta \tilde{\sigma}_*^2}\right)$. Combining the above inequalities, we obtain a convenient upper bound expressed entirely in terms of the deterministic constants $\tilde{\sigma}_1, \tilde{\sigma}_2, \tilde{\sigma}_*$:
$$\mathbb{P}\left(\|X\| \geq (1+\eta)(\tilde{\sigma}_1 + \tilde{\sigma}_2) + t\right) \leq (N+J)\exp\left(-\frac{t^2}{C_\eta \tilde{\sigma}_*^2}\right).$$
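Remark (numerical illustration). To see the scale of this bound in practice, the following simulation (our own sketch, not part of the proof) draws $R(i,j) \sim \mathrm{Binomial}(M, p_{ij})$, one distribution whose variance equals $\mathcal{R}(i,j)(1 - \mathcal{R}(i,j)/M)$ as in Step 1; this distributional choice is our illustrative assumption. It then compares $\|X\|$ with the threshold $(2+\eta)(\tilde{\sigma}_1 + \tilde{\sigma}_2)$ used in Step 4 below.

```python
import numpy as np

# Monte Carlo check of ||X|| against the threshold 2.5 * (sigma1~ + sigma2~).
rng = np.random.default_rng(2)
N, J, M = 400, 100, 4
P = rng.uniform(0.2, 0.8, size=(N, J))      # success probabilities p_ij
R = rng.binomial(M, P)                       # observed responses in {0,...,M}
X = R - M * P                                # centered noise matrix

spec = np.linalg.norm(X, 2)                  # spectral norm of X
threshold = 2.5 * (np.sqrt(M * J) / 2 + np.sqrt(M * N) / 2)  # eta = 1/2
print(f"||X|| = {spec:.1f}, threshold = {threshold:.1f}")    # ||X|| well below
```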
Step 4. Choice of $t$ and an explicit tail estimate. Taking $t = \tilde{\sigma}_1 + \tilde{\sigma}_2$ gives
$$\mathbb{P}\left(\|X\| \geq (2+\eta)(\tilde{\sigma}_1 + \tilde{\sigma}_2)\right) \leq (N+J)\exp\left(-\frac{(\tilde{\sigma}_1 + \tilde{\sigma}_2)^2}{C_\eta \tilde{\sigma}_*^2}\right). \tag{B.4}$$
Since $\tilde{\sigma}_1 + \tilde{\sigma}_2 = \frac{\sqrt{M}}{2}(\sqrt{N} + \sqrt{J})$ and $\tilde{\sigma}_* = M$, we have $(\tilde{\sigma}_1 + \tilde{\sigma}_2)^2 = \frac{M}{4}(\sqrt{N} + \sqrt{J})^2 \geq \frac{M}{4}(N+J)$. Consequently, we get
$$\frac{(\tilde{\sigma}_1 + \tilde{\sigma}_2)^2}{C_\eta \tilde{\sigma}_*^2} \geq \frac{M}{4} \cdot \frac{N+J}{C_\eta M^2} = \frac{N+J}{4 C_\eta M}.$$
Define the constant $C_{\mathrm{opt}} := \frac{(2+\eta)\sqrt{M}}{2}$; for our choice $\eta = \frac{1}{2}$, $C_{\mathrm{opt}} = \frac{2.5\sqrt{M}}{2} = 1.25\sqrt{M}$. Using Equation (B.4) together with the monotonicity of the exponential function, we obtain
$$\mathbb{P}\left(\|X\| \geq C_{\mathrm{opt}}(\sqrt{N} + \sqrt{J})\right) \leq (N+J)\exp\left(-\frac{N+J}{4 C_\eta M}\right).$$

Step 5. Asymptotic $O_P$ bound. The quantity $(N+J)\exp\left(-\frac{N+J}{4C_\eta M}\right)$ tends to 0 as $N \to \infty$ (it decays exponentially in $N+J$). Hence, for every $\varepsilon > 0$ there exists an integer $N_0$ (depending on $\varepsilon$ and on the fixed constants $M, \eta, C_\eta$) such that for all $N \geq N_0$ and all $J$ (which may depend on $N$ or be fixed), we have
$$\mathbb{P}\left(\|X\| \geq C_{\mathrm{opt}}(\sqrt{N} + \sqrt{J})\right) < \varepsilon.$$
By the definition of the $O_P(\cdot)$ notation, this is exactly $\|X\| = O_P(\sqrt{N} + \sqrt{J})$, which establishes Equation (B.2).

Step 6. Extension to arbitrary submatrices. Let $S \subseteq [N]$ and $T \subseteq [J]$ be any subsets, and set $W := (R - \mathcal{R})_{S,T}$. For any matrix, the spectral norm of a submatrix never exceeds that of the full matrix, i.e., $\|W\| \leq \|X\|$. Combining this with Equation (B.2) yields $\|W\| = O_P(\sqrt{N} + \sqrt{J})$, which completes the proof.

Appendix B.4. Proof of Lemma 7

Proof. From the proof of Theorem 2 (in particular the analysis leading to the lower bound for $\sigma_1(\tilde{R})$), there exists an event $E_N^{(K_0)}$ with $\mathbb{P}(E_N^{(K_0)}) \to 1$ such that on $E_N^{(K_0)}$,
$$\sigma_1(\tilde{R}) \geq \frac{C_{\mathrm{signal}}\sqrt{J}}{\sqrt{K}\,K_0} - C_{\mathrm{noise}}.$$
Hence, on the same event, we have
$$T_{K_0} = \sigma_1(\tilde{R}) - \left(1 + \sqrt{\frac{J}{N}}\right) \geq \frac{C_{\mathrm{signal}}\sqrt{J}}{\sqrt{K}\,K_0} - C_{\mathrm{noise}} - 1 - \sqrt{\frac{J}{N}}. \tag{B.5}$$
Define $D_{N,K_0} := \frac{\sqrt{J}}{\sqrt{K}\,K_0}$. Then Assumption 6 is exactly $C_{\mathrm{signal}} D_{N,K_0} \geq C_{\mathrm{noise}} + 1 + 3\eta_0$. Consequently, we get
$$D_{N,K_0} \geq d_0 := \frac{C_{\mathrm{noise}} + 1 + 3\eta_0}{C_{\mathrm{signal}}}. \tag{B.6}$$
Now we use the additional condition $K^3 = o(N)$. Since $K_0 \leq K - 1 < K$, we have
$$\frac{\sqrt{K}\,K_0}{\sqrt{N}} \leq \frac{K^{3/2}}{\sqrt{N}} = \left(\frac{K^3}{N}\right)^{1/2} \longrightarrow 0.$$
Thus, there exists an integer $N_1$ (depending only on the constants $\eta_0$ and $d_0$) such that for all $N \geq N_1$, $\frac{K^{3/2}}{\sqrt{N}} \leq \frac{\eta_0}{d_0}$. Consequently, for every $K_0 < K$ and every $N \geq N_1$, we have
$$\frac{\sqrt{K}\,K_0}{\sqrt{N}} \leq \frac{\eta_0}{d_0}. \tag{B.7}$$
Observe that $\sqrt{J/N}$ can be expressed as
$$\sqrt{\frac{J}{N}} = \frac{\sqrt{K}\,K_0}{\sqrt{N}}\,D_{N,K_0}.$$
Combining this with Equation (B.7) yields, for all $N \geq N_1$ and on $E_N^{(K_0)}$,
$$\sqrt{\frac{J}{N}} \leq \frac{\eta_0}{d_0}\,D_{N,K_0}. \tag{B.8}$$
Substituting Equation (B.8) into Equation (B.5) gives
$$T_{K_0} \geq C_{\mathrm{signal}} D_{N,K_0} - C_{\mathrm{noise}} - 1 - \frac{\eta_0}{d_0} D_{N,K_0} = \left(C_{\mathrm{signal}} - \frac{\eta_0}{d_0}\right) D_{N,K_0} - (C_{\mathrm{noise}} + 1). \tag{B.9}$$
We now show that the right-hand side of Equation (B.9) is at least $c_{\mathrm{low}} D_{N,K_0}$. This is equivalent to
$$\left(C_{\mathrm{signal}} - \frac{\eta_0}{d_0} - c_{\mathrm{low}}\right) D_{N,K_0} \geq C_{\mathrm{noise}} + 1.$$
Because $D_{N,K_0} \geq d_0$ by Equation (B.6), it suffices to prove
$$\left(C_{\mathrm{signal}} - \frac{\eta_0}{d_0} - c_{\mathrm{low}}\right) d_0 \geq C_{\mathrm{noise}} + 1. \tag{B.10}$$
Computing the left-hand side of Equation (B.10) gives
$$\left(C_{\mathrm{signal}} - \frac{\eta_0}{d_0} - c_{\mathrm{low}}\right) d_0 = C_{\mathrm{signal}} d_0 - \eta_0 - c_{\mathrm{low}} d_0 = (C_{\mathrm{signal}} - c_{\mathrm{low}}) d_0 - \eta_0.$$
Recall the definition of $c_{\mathrm{low}}$:
$$c_{\mathrm{low}} = \frac{2\eta_0 C_{\mathrm{signal}}}{C_{\mathrm{noise}} + 1 + 3\eta_0}.$$
Using the expressions for $d_0$ and $c_{\mathrm{low}}$, we obtain
$$(C_{\mathrm{signal}} - c_{\mathrm{low}}) d_0 = \left(C_{\mathrm{signal}} - \frac{2\eta_0 C_{\mathrm{signal}}}{C_{\mathrm{noise}} + 1 + 3\eta_0}\right)\frac{C_{\mathrm{noise}} + 1 + 3\eta_0}{C_{\mathrm{signal}}} = C_{\mathrm{noise}} + 1 + 3\eta_0 - 2\eta_0 = C_{\mathrm{noise}} + 1 + \eta_0.$$
Therefore, we have
$$(C_{\mathrm{signal}} - c_{\mathrm{low}}) d_0 - \eta_0 = (C_{\mathrm{noise}} + 1 + \eta_0) - \eta_0 = C_{\mathrm{noise}} + 1,$$
which means that Equation (B.10) holds with equality. Consequently, for all $N \geq N_1$ and on $E_N^{(K_0)}$, we have $T_{K_0} \geq c_{\mathrm{low}} D_{N,K_0}$.

Finally, because $\mathbb{P}(E_N^{(K_0)}) \to 1$, we have
$$\mathbb{P}\left(T_{K_0} \geq \frac{c_{\mathrm{low}}\sqrt{J}}{\sqrt{K}\,K_0}\right) \geq \mathbb{P}\left(E_N^{(K_0)}\right) \longrightarrow 1,$$
which establishes Equation (B.3).
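Remark (symbolic check). The exact cancellation behind Equation (B.10) can be confirmed symbolically. The following sketch (our own check, using sympy) verifies that $\left(C_{\mathrm{signal}} - \eta_0/d_0 - c_{\mathrm{low}}\right)d_0$ simplifies to $C_{\mathrm{noise}} + 1$ under the stated definitions of $c_{\mathrm{low}}$ and $d_0$.

```python
import sympy as sp

# Symbolic verification that Equation (B.10) holds with equality.
C_sig, C_noi, eta0 = sp.symbols("C_signal C_noise eta_0", positive=True)
d0 = (C_noi + 1 + 3 * eta0) / C_sig
c_low = 2 * eta0 * C_sig / (C_noi + 1 + 3 * eta0)

lhs = sp.simplify((C_sig - eta0 / d0 - c_low) * d0)
assert sp.simplify(lhs - (C_noi + 1)) == 0   # (B.10) is an identity
print(lhs)                                    # C_noise + 1
```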
Appendix B.5. Proof of Lemma 8

Proof. Fix an arbitrary candidate number $K_0 \geq 1$. Let $\hat{Z} \in \{0,1\}^{N \times K_0}$ be the estimated classification matrix returned by the classifier $\mathcal{M}$ in Algorithm 1. For each estimated class $k \in [K_0]$, define its index set $\hat{\mathcal{C}}_k = \{i \in [N] : \hat{Z}(i,k) = 1\}$ and its size $n_k = |\hat{\mathcal{C}}_k|$. By construction of the classifier, we may assume $n_k > 0$ for all $k$; otherwise the estimate $\hat{\Theta}$ would be undefined.

The fitted expected response matrix is $\hat{R} = \hat{Z}\hat{\Theta}^\top$, where $\hat{\Theta} \in [0,M]^{J \times K_0}$. Consequently, for any item $j \in [J]$ and any estimated class $k$, the fitted value $\hat{R}(i,j)$ is constant over all individuals in $\hat{\mathcal{C}}_k$. Denote this common value by $\hat{\Theta}_{jk} \in [0,M]$. For each pair $(j,k)$, the quantity
$$\hat{V}_{jk} = \hat{\Theta}_{jk}\left(1 - \frac{\hat{\Theta}_{jk}}{M}\right)$$
is always non-negative. If $\hat{V}_{jk} > 0$, the corresponding entries of the normalized residual matrix $\tilde{R}$ (see Equation (2)) are well-defined and, for $i \in \hat{\mathcal{C}}_k$,
$$\tilde{R}(i,j) = \frac{R(i,j) - \hat{\Theta}_{jk}}{\sqrt{N\hat{V}_{jk}}}.$$
If $\hat{V}_{jk} = 0$, then necessarily $\hat{\Theta}_{jk} \in \{0, M\}$ and every $R(i,j)$ for $i \in \hat{\mathcal{C}}_k$ must equal $\hat{\Theta}_{jk}$ (otherwise the average could not be exactly 0 or $M$). In this case the numerator $\sum_{i \in \hat{\mathcal{C}}_k}(R(i,j) - \hat{\Theta}_{jk})^2$ is zero, and the definition in Equation (2) sets $\tilde{R}(i,j) = 0$ for all $i \in \hat{\mathcal{C}}_k$. Hence, regardless of the value of $\hat{V}_{jk}$, the contribution of the block $(\hat{\mathcal{C}}_k, j)$ to the squared Frobenius norm of $\tilde{R}$ can be bounded uniformly.

We now derive an upper bound for $\|\tilde{R}\|_F^2$. For a fixed block $(j,k)$, assuming first that $\hat{V}_{jk} > 0$, we have
$$\sum_{i \in \hat{\mathcal{C}}_k} \tilde{R}(i,j)^2 = \frac{1}{N\hat{V}_{jk}} \sum_{i \in \hat{\mathcal{C}}_k} \left(R(i,j) - \hat{\Theta}_{jk}\right)^2.$$
Because each $R(i,j)$ takes values in $\{0,1,\ldots,M\}$, we have the elementary inequality $R(i,j)^2 \leq M R(i,j)$. Summing over $i \in \hat{\mathcal{C}}_k$ yields
$$\sum_{i \in \hat{\mathcal{C}}_k} R(i,j)^2 \leq M \sum_{i \in \hat{\mathcal{C}}_k} R(i,j) = M n_k \hat{\Theta}_{jk}.$$
Hence, we get
$$\sum_{i \in \hat{\mathcal{C}}_k} \left(R(i,j) - \hat{\Theta}_{jk}\right)^2 = \sum_{i \in \hat{\mathcal{C}}_k} R(i,j)^2 - n_k\hat{\Theta}_{jk}^2 \leq M n_k \hat{\Theta}_{jk} - n_k \hat{\Theta}_{jk}^2 = n_k \hat{\Theta}_{jk}(M - \hat{\Theta}_{jk}).$$
Now observe that $\hat{\Theta}_{jk}(M - \hat{\Theta}_{jk}) = M\hat{V}_{jk}$. Therefore, we get
$$\sum_{i \in \hat{\mathcal{C}}_k} \left(R(i,j) - \hat{\Theta}_{jk}\right)^2 \leq n_k M \hat{V}_{jk}.$$
Substituting this into the expression for the block contribution gives
$$\sum_{i \in \hat{\mathcal{C}}_k} \tilde{R}(i,j)^2 \leq \frac{1}{N\hat{V}_{jk}} \cdot n_k M \hat{V}_{jk} = \frac{n_k M}{N}.$$
If $\hat{V}_{jk} = 0$, then by construction $\tilde{R}(i,j) = 0$ for all $i \in \hat{\mathcal{C}}_k$, so the left-hand side is zero, and the inequality $\sum_{i \in \hat{\mathcal{C}}_k} \tilde{R}(i,j)^2 \leq \frac{n_k M}{N}$ remains valid because the right-hand side is non-negative. Thus, for every block $(j,k)$, we have
$$\sum_{i \in \hat{\mathcal{C}}_k} \tilde{R}(i,j)^2 \leq \frac{n_k M}{N}.$$
Summing over all items $j = 1, \ldots, J$ and all estimated classes $k = 1, \ldots, K_0$ gives
$$\|\tilde{R}\|_F^2 = \sum_{k=1}^{K_0} \sum_{j=1}^{J} \sum_{i \in \hat{\mathcal{C}}_k} \tilde{R}(i,j)^2 \leq \sum_{k=1}^{K_0} \sum_{j=1}^{J} \frac{n_k M}{N} = \frac{M J}{N} \sum_{k=1}^{K_0} n_k.$$
Since the estimated classes partition the set of individuals, we have $\sum_{k=1}^{K_0} n_k = N$. Consequently, $\|\tilde{R}\|_F^2 \leq \frac{M}{N} \cdot N \cdot J = MJ$, and therefore $\|\tilde{R}\|_F \leq \sqrt{MJ}$. For any matrix, the spectral norm does not exceed the Frobenius norm, i.e., $\sigma_1(\tilde{R}) \leq \|\tilde{R}\|_F$. Hence, we have $\sigma_1(\tilde{R}) \leq \sqrt{MJ}$. Recall from Equation (4) that the test statistic is defined as $T_{K_0} = \sigma_1(\tilde{R}) - \left(1 + \sqrt{J/N}\right)$.
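Remark (numerical illustration). The construction in this proof translates directly into code. The sketch below (our own illustration) computes the normalized residual matrix of Equation (2) and the statistic $T_{K_0}$ of Equation (4), taking $\hat{\Theta}$ as the within-class average as the proof assumes, and confirms the bound $T_{K_0} \leq \sqrt{MJ}$ on toy data; the binomial data generator is an illustrative assumption.

```python
import numpy as np

# Sketch: normalized residuals (Equation (2)) and the statistic T_{K0}
# (Equation (4)), with Theta_hat the within-class average.
def test_statistic(R, z_hat, K0, M):
    N, J = R.shape
    R_tilde = np.zeros((N, J))
    for k in range(K0):
        rows = np.where(z_hat == k)[0]
        theta = R[rows].mean(axis=0)          # Theta_hat(., k), in [0, M]
        V = theta * (1.0 - theta / M)         # V_hat(., k) >= 0
        cols = np.where(V > 0)[0]             # entries with V_hat = 0 stay 0
        R_tilde[np.ix_(rows, cols)] = (
            (R[np.ix_(rows, cols)] - theta[cols]) / np.sqrt(N * V[cols])
        )
    return np.linalg.norm(R_tilde, 2) - (1.0 + np.sqrt(J / N))

rng = np.random.default_rng(3)
N, J, M, K0 = 500, 60, 4, 3
R = rng.binomial(M, rng.uniform(0.2, 0.8, size=(N, J)))  # toy ordinal responses
z_hat = rng.integers(0, K0, size=N)                       # any classification
assert test_statistic(R, z_hat, K0, M) <= np.sqrt(M * J)  # Lemma 8's bound
```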
Dropping the negative term yields the desired bound $T_{K_0} \leq \sqrt{MJ}$. This inequality holds for every possible realization of the data and of the estimation procedure, regardless of whether the candidate model is correctly specified or not. In particular, it holds with probability one for all sample sizes $N$.

Appendix B.6. Proof of Lemma 9

Proof. Recall from Equation (1) that
$$R^*(i,j) = \frac{R(i,j) - \mathcal{R}(i,j)}{\sqrt{N V(i,j)}}.$$
Assumption 1 guarantees $V(i,j) \geq M\delta(1-\delta) > 0$, hence each entry is well defined. Let $Y = \sqrt{N} R^*$, so that $Y$'s $(i,j)$-th entry is $Y_{ij} = \sqrt{N} R^*(i,j)$. Then the entries $\{Y_{ij}\}$ are independent (because the $R(i,j)$ are independent given the latent structure, and the latent structure is fixed) and satisfy
$$\mathbb{E}[Y_{ij}] = 0, \qquad \mathrm{Var}(Y_{ij}) = 1, \qquad |Y_{ij}| \leq C, \quad \text{where } C := \sqrt{\frac{M}{\delta(1-\delta)}} < \infty.$$

We first establish an almost sure lower bound. Consider the first column $Y(\cdot,1) = (Y_{11}, \ldots, Y_{N1})^\top$. The random variables $\{Y_{i1}^2\}_{i=1}^N$ are independent, bounded by $C^2$, and satisfy $\mathbb{E}[Y_{i1}^2] = 1$ (i.e., $Y_{i1}$ is standardized). For any $\varepsilon > 0$, Hoeffding's inequality yields
$$\mathbb{P}\left(\left|\frac{1}{N}\sum_{i=1}^N Y_{i1}^2 - 1\right| > \varepsilon\right) \leq 2\exp\left(-\frac{2N\varepsilon^2}{C^4}\right). \tag{B.11}$$
The right-hand side of Equation (B.11) is summable over $N$ because $\sum_{N=1}^\infty e^{-cN} < \infty$ for any $c > 0$. By the Borel–Cantelli lemma, for every fixed $\varepsilon > 0$, we have
$$\mathbb{P}\left(\left|\frac{1}{N}\sum_{i=1}^N Y_{i1}^2 - 1\right| > \varepsilon \text{ infinitely often}\right) = 0.$$
Taking a countable sequence $\varepsilon_m = 1/m$ ($m = 1, 2, \ldots$) and intersecting the corresponding almost sure events, we obtain, almost surely,
$$\lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^N Y_{i1}^2 = 1.$$
Hence, we get
$$\frac{\|Y(\cdot,1)\|_F}{\sqrt{N}} = \left(\frac{1}{N}\sum_{i=1}^N Y_{i1}^2\right)^{1/2} \xrightarrow{a.s.} 1.$$
Since $\|Y\| \geq \|Y(\cdot,1)\|_F$, we have
$$\liminf_{N \to \infty} \frac{\|Y\|}{\sqrt{N}} \geq 1 \quad \text{almost surely.} \tag{B.12}$$

Next we derive an almost sure upper bound. For $Y$, we have
$$\tilde{\tilde{\sigma}}_1 := \max_{i \in [N]} \sqrt{\sum_{j=1}^J \mathbb{E}[Y_{ij}^2]} = \sqrt{J}, \qquad \tilde{\tilde{\sigma}}_2 := \max_{j \in [J]} \sqrt{\sum_{i=1}^N \mathbb{E}[Y_{ij}^2]} = \sqrt{N}, \qquad \tilde{\tilde{\sigma}}_* := \max_{i,j} \|Y_{ij}\|_\infty \leq C.$$
Fix any $\eta \in (0, 1/2]$. Lemma 3 guarantees the existence of a constant $C_\eta > 0$ such that for every $t \geq 0$,
$$\mathbb{P}\left(\|Y\| \geq (1+\eta)(\tilde{\tilde{\sigma}}_1 + \tilde{\tilde{\sigma}}_2) + t\right) \leq (N+J)\exp\left(-\frac{t^2}{C_\eta \tilde{\tilde{\sigma}}_*^2}\right). \tag{B.13}$$
Now fix an arbitrary $\tilde{\delta} \in (0,1]$ and choose $\eta = \tilde{\delta}/2$. Because $J = o(N)$, there exists $N_0$ such that for all $N \geq N_0$,
$$\sqrt{J} \leq \frac{\tilde{\delta}}{4(1+\tilde{\delta}/2)}\sqrt{N}.$$
Consequently, we get $(1+\eta)\sqrt{J} = (1+\tilde{\delta}/2)\sqrt{J} \leq \frac{\tilde{\delta}}{4}\sqrt{N}$. Take $t = \frac{\tilde{\delta}}{4}\sqrt{N}$ in Equation (B.13). Then for all $N \geq N_0$, we have
$$(1+\eta)(\sqrt{N} + \sqrt{J}) + t \leq \left(1 + \frac{\tilde{\delta}}{2}\right)\sqrt{N} + \frac{\tilde{\delta}}{4}\sqrt{N} + \frac{\tilde{\delta}}{4}\sqrt{N} = (1+\tilde{\delta})\sqrt{N}.$$
Hence, the event $\{\|Y\| \geq (1+\tilde{\delta})\sqrt{N}\}$ is contained in the event $\{\|Y\| \geq (1+\eta)(\sqrt{N}+\sqrt{J}) + t\}$, and by Equation (B.13), we have
$$\mathbb{P}\left(\|Y\| \geq (1+\tilde{\delta})\sqrt{N}\right) \leq (N+J)\exp\left(-\frac{t^2}{C_\eta C^2}\right) = (N+J)\exp\left(-\frac{\tilde{\delta}^2 N}{16 C_\eta C^2}\right). \tag{B.14}$$
The right-hand side of Equation (B.14) is summable over $N$ because it decays exponentially in $N$. By the Borel–Cantelli lemma, the event $\{\|Y\| \geq (1+\tilde{\delta})\sqrt{N}\}$ occurs only finitely many times almost surely. Therefore, we get
$$\limsup_{N \to \infty} \frac{\|Y\|}{\sqrt{N}} \leq 1 + \tilde{\delta} \quad \text{almost surely.}$$
Since $\tilde{\delta} > 0$ was arbitrary, we can take a countable sequence $\tilde{\delta}_m \downarrow 0$ (e.g., $\tilde{\delta}_m = 1/m$) and intersect the corresponding almost sure events to obtain
$$\limsup_{N \to \infty} \frac{\|Y\|}{\sqrt{N}} \leq 1 \quad \text{almost surely.} \tag{B.15}$$
Combining Equations (B.12) and (B.15) yields $\frac{\|Y\|}{\sqrt{N}} \xrightarrow{a.s.} 1$. Because $R^* = Y/\sqrt{N}$, we have $\|R^*\| = \|Y\|/\sqrt{N} \xrightarrow{a.s.} 1$. Almost sure convergence implies convergence in probability, so for any $\varepsilon > 0$,
$$\lim_{N \to \infty} \mathbb{P}\left(\|R^*\| \geq 1 - \varepsilon\right) = 1,$$
and in particular $\sigma_1(R^*) = \|R^*\| = 1 + o_P(1)$. This completes the proof.
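Remark (numerical illustration). Lemma 9's conclusion is easy to visualize. The following sketch (our own illustration; Rademacher entries are one convenient choice of standardized bounded noise) shows $\|Y\|/\sqrt{N}$ approaching 1 as $N$ grows with $J = o(N)$.

```python
import numpy as np

# Illustration of Lemma 9: for standardized bounded independent entries and
# J = o(N), the scaled spectral norm ||Y|| / sqrt(N) approaches 1.
rng = np.random.default_rng(4)
for N in [200, 2000, 20000]:
    J = int(N ** 0.5)                             # J = o(N), e.g. J = sqrt(N)
    Y = rng.choice([-1.0, 1.0], size=(N, J))      # mean 0, variance 1, bounded
    print(N, np.linalg.norm(Y, 2) / np.sqrt(N))   # ratio decreases toward 1
```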
Appendix C. SC-LCM algorithm and its consistency

Here, we introduce SC-LCM, a simple spectral clustering algorithm for estimating the latent class membership matrix $Z$ under the latent class model for ordinal categorical data with polytomous responses. The algorithm takes the top $K$ left singular vectors of $R$ and applies $k$-means clustering to their rows. Under mild conditions, we prove that the procedure consistently recovers the true latent classes even as $N$, $J$, and $K$ all diverge (i.e., the large-scale setting).

Our analysis begins with an oracle case where the population parameters are known. Let $\mathcal{R} = \mathbb{E}[R] = Z\Theta^\top$ be the $N \times J$ expected response matrix. To simplify our theoretical analysis, we let the rank of $\Theta$ be $K$; thus $\mathcal{R}$ has a low-rank structure with rank $K$. Since $\mathcal{R}$ has rank $K$, consider its compact singular value decomposition $\mathcal{R} = U\Sigma V^\top$, where $U \in \mathbb{R}^{N \times K}$ satisfies $U^\top U = I_K$, $V \in \mathbb{R}^{J \times K}$ satisfies $V^\top V = I_K$, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_K)$ with $\sigma_1 \geq \cdots \geq \sigma_K > 0$. The following lemma shows that $U$ has exactly $K$ distinct rows and that these rows are perfectly aligned with the latent class memberships.

Lemma 10. Under the latent class model, the left singular vectors $U$ of $\mathcal{R}$ satisfy that all rows belonging to the same latent class are identical, and for any two distinct classes $k, l$,
$$\|U(i,:) - U(j,:)\|_F = \sqrt{\frac{1}{N_k} + \frac{1}{N_l}}, \qquad i \in \mathcal{C}_k,\ j \in \mathcal{C}_l.$$

By Lemma 10, applying $k$-means with $K$ clusters to the rows of $U$ recovers the classification matrix $Z$ exactly, up to a permutation of labels. In practice we only observe $R$, not $\mathcal{R}$. Let
$$R = \hat{U}\hat{\Sigma}\hat{V}^\top + \hat{U}_\perp\hat{\Sigma}_\perp\hat{V}_\perp^\top$$
be the full SVD of $R$, where $\hat{U} \in \mathbb{R}^{N \times K}$ contains the left singular vectors corresponding to the $K$ largest singular values. The practical SC-LCM algorithm is summarized in Algorithm 3.

Algorithm 3 Spectral Clustering for Latent Class Models (SC-LCM)
Require: Observed response matrix $R \in \{0,1,\ldots,M\}^{N \times J}$, number of latent classes $K$
Ensure: Estimated classification matrix $\hat{Z}$
1: Compute the top $K$ left singular vectors $\hat{U} \in \mathbb{R}^{N \times K}$ of $R$.
2: Apply the $k$-means algorithm to the rows of $\hat{U}$ with $K$ clusters to obtain $\hat{Z}$.

The intuition behind SC-LCM is that $\hat{U}$ is a perturbed version of $U$ (up to an orthogonal rotation), and by Lemma 10 the rows of $U$ are perfectly separable. Hence, under a sufficiently small perturbation, $k$-means on $\hat{U}$ will recover the true classes with high accuracy.
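A minimal implementation of Algorithm 3 might look as follows (our own sketch, assuming numpy and scikit-learn; the toy data generator draws $R(i,j) \sim \mathrm{Binomial}(M, \Theta(j,z_i)/M)$, which is an illustrative choice consistent with $\mathbb{E}[R] = Z\Theta^\top$).

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of Algorithm 3 (SC-LCM): top-K left singular vectors of R,
# then k-means on their rows.
def sc_lcm(R, K, seed=0):
    U, _, _ = np.linalg.svd(R, full_matrices=False)
    U_K = U[:, :K]                                    # hat(U): N x K
    labels = KMeans(n_clusters=K, n_init=10,
                    random_state=seed).fit_predict(U_K)
    Z_hat = np.zeros((R.shape[0], K), dtype=int)      # one-hot membership matrix
    Z_hat[np.arange(R.shape[0]), labels] = 1
    return Z_hat

# Toy usage: responses from a 2-class latent class model.
rng = np.random.default_rng(5)
M, N, J, K = 4, 300, 40, 2
Theta = rng.uniform(0.5, 3.5, size=(J, K))            # item parameters in [0, M]
z = rng.integers(0, K, size=N)                        # true class labels
R = rng.binomial(M, Theta[:, z].T / M)                # E[R(i,j)] = Theta(j, z_i)
Z_hat = sc_lcm(R, K)
```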
To establish the consistency of SC-LCM, we introduce a parameter that governs both the signal strength and the sparsity of the data. Define
$$\rho := \max_{j \in [J], k \in [K]} \Theta(j,k),$$
which is the maximum expected response across all items and latent classes. This quantity directly affects the scale of the entries of the observed matrix $R$: a larger $\rho$ leads to larger expected responses and thus a denser observation matrix, whereas a smaller $\rho$ pushes the expected responses toward zero, making the data sparser. In our asymptotic regime we allow $\rho \to 0$, which corresponds to a sparse setting where most entries of $R$ are zero. The following assumption controls the speed of this decay.

Assumption 7 (Sparsity scaling). $\rho \max(N, J) \geq M^2 \log(N+J)$.

We also define the scaled item parameter matrix $\Theta_0 := \Theta/\rho$. By construction, every entry of $\Theta_0$ lies in $[0,1]$. Let $\sigma_K(\Theta_0)$ denote the $K$-th largest singular value of $\Theta_0$. To measure the difference between $Z$ and $\hat{Z}$, we consider the clustering error used in [15]. This metric is defined as
$$\mathrm{err}(\hat{Z}, Z) = \min_{\pi \in S_K} \max_{k \in [K]} \frac{|\mathcal{C}_k \cap \hat{\mathcal{C}}_{\pi(k)}^c| + |\hat{\mathcal{C}}_{\pi(k)} \cap \mathcal{C}_k^c|}{N_k},$$
where $\hat{\mathcal{C}}_k = \{i : \hat{Z}(i,k) = 1\}$ and $S_K$ is the set of permutations of $\{1, \ldots, K\}$. The following theorem guarantees SC-LCM's estimation consistency under the LCM.

Theorem 6 (Consistency of SC-LCM). Under Assumption 7, the estimator $\hat{Z}$ produced by Algorithm 3 satisfies, with probability tending to 1 as $N \to \infty$,
$$\mathrm{err}(\hat{Z}, Z) = O\left(\frac{K^2 N_{\max} \max(N,J)\log(N+J)}{N_{\min}^2\,\rho\,\sigma_K^2(\Theta_0)}\right).$$

When $N_{\min} \asymp N/K$, $N_{\max} \asymp N/K$, and $J = o(N)$, the bound in Theorem 6 reduces to
$$\mathrm{err}(\hat{Z}, Z) = O_P\left(\frac{K^3 \log N}{\rho\,\sigma_K^2(\Theta_0)}\right).$$
When $\rho\,\sigma_K^2(\Theta_0) \gg K^3\log N$, we have $\mathrm{err}(\hat{Z}, Z) \xrightarrow{P} 0$, which shows SC-LCM's estimation consistency.
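Remark (numerical illustration). The clustering error above minimizes over label permutations the worst per-class misassignment fraction. A direct sketch of this metric (our own illustration; the brute-force loop over permutations is feasible only for small $K$) is as follows.

```python
import numpy as np
from itertools import permutations

# Sketch of the clustering error of [15] used in Theorem 6.
def clustering_error(z_hat, z_true, K):
    N_k = np.array([(z_true == k).sum() for k in range(K)])
    best = np.inf
    for perm in permutations(range(K)):          # brute force; small K only
        worst = 0.0
        for k in range(K):
            in_Ck = z_true == k
            in_Chat = z_hat == perm[k]
            miss = np.sum(in_Ck & ~in_Chat) + np.sum(in_Chat & ~in_Ck)
            worst = max(worst, miss / N_k[k])
        best = min(best, worst)
    return best

z_true = np.array([0, 0, 1, 1, 2, 2])
z_hat = np.array([2, 2, 0, 0, 1, 1])          # a pure relabeling of z_true
print(clustering_error(z_hat, z_true, K=3))   # 0.0: perfect up to permutation
```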
Appendix C.1. Proof of Lemma 10

Proof. From the compact singular value decomposition $\mathcal{R} = U\Sigma V^\top$ and the factorization $\mathcal{R} = Z\Theta^\top$, we obtain
$$U = \mathcal{R} V \Sigma^{-1} = Z\left(\Theta^\top V \Sigma^{-1}\right).$$
Setting $X_U = \Theta^\top V \Sigma^{-1} \in \mathbb{R}^{K \times K}$, we have $U = Z X_U$ and $U^\top U = I_K$. This structure, a membership matrix $Z$ post-multiplied by a square matrix $X_U$ with orthonormal columns, is exactly the one considered in Lemma 2.1 of [18] for the eigenvectors of the stochastic block model's mean matrix. Applying that lemma directly yields the desired properties of the rows of $U$ and the distance formula.

Appendix C.2. Proof of Theorem 6

Proof. Set $W := R - \mathcal{R}$. For each pair $(i,j)$ define $W_{ij} = W(i,j)\, e_i \tilde{e}_j^\top$, where $\{e_i\}$ and $\{\tilde{e}_j\}$ are the standard basis vectors in $\mathbb{R}^N$ and $\mathbb{R}^J$. The matrices $\{W_{ij}\}$ are independent, centred, and satisfy $\|W_{ij}\| \leq M$. Moreover,
$$\mathbb{E}[W(i,j)^2] = \mathrm{Var}(R(i,j)) = \mathcal{R}(i,j)\left(1 - \frac{\mathcal{R}(i,j)}{M}\right) \leq \mathcal{R}(i,j) \leq \rho.$$
Hence, we have
$$\left\|\sum_{i,j}\mathbb{E}[W_{ij}W_{ij}^\top]\right\| \leq \rho J, \qquad \left\|\sum_{i,j}\mathbb{E}[W_{ij}^\top W_{ij}]\right\| \leq \rho N.$$
Under Assumption 7, applying the matrix Bernstein inequality [25] with a sufficiently large constant $C_3$ yields
$$\mathbb{P}\left(\|W\| \geq C_3\sqrt{\rho\max(N,J)\log(N+J)}\right) \leq (N+J)^{-2} \xrightarrow{N \to \infty} 0,$$
which gives, with probability tending to 1,
$$\|W\| \leq C_3\sqrt{\rho\max(N,J)\log(N+J)}. \tag{C.1}$$
From $\mathcal{R} = Z\Theta^\top$ and $\Theta = \rho\Theta_0$, we have
$$\sigma_K(\mathcal{R}) \geq \sqrt{N_{\min}}\,\rho\,\sigma_K(\Theta_0). \tag{C.2}$$
The Davis–Kahan $\sin\Theta$ theorem [31] guarantees the existence of an orthogonal matrix $O \in \mathbb{R}^{K \times K}$ such that
$$\|\hat{U}O - U\|_F \leq \frac{2\sqrt{2K}\,\|R - \mathcal{R}\|}{\sigma_K(\mathcal{R})}.$$
Inserting Equations (C.1) and (C.2) gives
$$\|\hat{U}O - U\|_F \leq C_4\sqrt{\frac{K\max(N,J)\log(N+J)}{N_{\min}\,\rho\,\sigma_K^2(\Theta_0)}}, \tag{C.3}$$
where $C_4 := 2\sqrt{2}C_3$.

Lemma 10 shows that the rows of $U$ are constant within each true latent class: for $i \in \mathcal{C}_k$, $U(i,:) = u_k$ for some vector $u_k \in \mathbb{R}^K$. Moreover, for any two distinct classes $k, l$, Lemma 10 gives $\|u_k - u_l\|_F = \sqrt{\frac{1}{N_k} + \frac{1}{N_l}}$. Define
$$d_{kl} := \sqrt{\frac{1}{N_k} + \frac{1}{N_l}}, \qquad \Delta_{kl} := \frac{1}{\sqrt{N_k}} + \frac{1}{\sqrt{N_l}}.$$
A simple inequality gives $\Delta_{kl} \leq \sqrt{2}\,d_{kl}$ for all $k, l$. Set
$$\varsigma := \sqrt{\frac{2K N_{\max}}{N_{\min}}}\,\|\hat{U}O - U\|_F.$$
For any distinct classes $k, l$, we have
$$\frac{\sqrt{K}}{\varsigma}\,\|\hat{U}O - U\|_F\,\Delta_{kl} = \frac{\sqrt{K}}{\sqrt{\frac{2KN_{\max}}{N_{\min}}}\,\|\hat{U}O - U\|_F}\,\|\hat{U}O - U\|_F\,\Delta_{kl} = \frac{\sqrt{N_{\min}}}{\sqrt{2N_{\max}}}\,\Delta_{kl} \leq \frac{\sqrt{N_{\min}}}{\sqrt{2N_{\max}}}\,\sqrt{2}\,d_{kl} = \frac{\sqrt{N_{\min}}}{\sqrt{N_{\max}}}\,d_{kl} \leq d_{kl}.$$
Hence, by Lemma 2 of [15], we obtain $\mathrm{err}(\hat{Z}, Z) = O(\varsigma^2)$. Substituting the definition of $\varsigma$ and Equation (C.3) gives
$$\mathrm{err}(\hat{Z}, Z) = O\left(\frac{K N_{\max}}{N_{\min}} \cdot \frac{K\max(N,J)\log(N+J)}{N_{\min}\,\rho\,\sigma_K^2(\Theta_0)}\right) = O\left(\frac{K^2 N_{\max}\max(N,J)\log(N+J)}{N_{\min}^2\,\rho\,\sigma_K^2(\Theta_0)}\right).$$

References

[1] Agresti, A., 2012. Categorical data analysis. volume 792. John Wiley & Sons.
[2] Akaike, H., 1998. Information theory and an extension of the maximum likelihood principle, in: Selected papers of Hirotugu Akaike. Springer, pp. 199–213.
[3] Bandeira, A.S., van Handel, R., 2016. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Annals of Probability 44, 2479–2506.
[4] Bickel, P.J., Sarkar, P., 2016. Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society Series B: Statistical Methodology 78, 253–273.
[5] Biernacki, C., Celeux, G., Govaert, G., 2003. Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis 41, 561–575.
[6] Chen, Y., Li, X., Liu, J., Ying, Z., 2017. Regularized latent class analysis with application in cognitive diagnosis. Psychometrika 82, 660–692.
[7] Collins, L.M., Lanza, S.T., 2013. Latent class and latent transition analysis: With applications in the social, behavioral, and health sciences. John Wiley & Sons.
[8] Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 39, 1–22.
[9] Goodman, L.A., 1974. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61, 215–231.
[10] Gu, Y., Dunson, D.B., 2023. Bayesian pyramids: Identifiable multilayer discrete latent structure models for discrete data. Journal of the Royal Statistical Society Series B: Statistical Methodology 85, 399–426.
[11] Gu, Y., Xu, G., 2020. Partial identifiability of restricted latent class models. Annals of Statistics 48, 2082–2107.
[12] Hagenaars, J.A., McCutcheon, A.L., 2002. Applied latent class analysis. Cambridge University Press.
[13] Hu, J., Zhang, J., Qin, H., Yan, T., Zhu, J., 2021. Using maximum entry-wise deviation to test the goodness of fit for stochastic block models. Journal of the American Statistical Association 116, 1373–1382.
[14] Jin, J., Ke, Z.T., Luo, S., Wang, M., 2023. Optimal estimation of the number of network communities. Journal of the American Statistical Association 118, 2101–2116.
[15] Joseph, A., Yu, B., 2016. Impact of regularization on spectral clustering. Annals of Statistics 44, 1765–1791.
[16] Keribin, C., 2000. Consistent estimation of the order of mixture models. Sankhyā: The Indian Journal of Statistics, Series A, 49–66.
[17] Lei, J., 2016. A goodness-of-fit test for stochastic block models. Annals of Statistics 44, 401–424.
[18] Lei, J., Rinaldo, A., 2015. Consistency of spectral clustering in stochastic block models. Annals of Statistics 43, 215–237.
[19] Lyu, Z., Chen, L., Gu, Y., 2025. Degree-heterogeneous latent class analysis for high-dimensional discrete data. Journal of the American Statistical Association 120, 2435–2448.
[20] Lyu, Z., Gu, Y., 2025. Spectral clustering with likelihood refinement is optimal for latent class recovery. arXiv preprint arXiv:2506.07167.
[21] Qing, H., 2024a. Finding mixed memberships in categorical data. Information Sciences 676, 120785.
[22] Qing, H., 2024b. Grade of membership analysis for multi-layer ordinal categorical data. Statistica Sinica 38.
[23] Qing, H., 2025. Mixed membership estimation for categorical data with weighted responses. TEST 34, 612–659.
[24] Schwarz, G., 1978. Estimating the dimension of a model. Annals of Statistics, 461–464.
[25] Tropp, J.A., 2012. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics 12, 389–434.
[26] Von Davier, M., 2008. A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology 61, 287–307.
[27] Wang, M., Hanges, P.J., 2011. Latent class procedures: Applications to organizational research. Organizational Research Methods 14, 24–31.
[28] Woodbury, M.A., Clive, J., Garson Jr, A., 1978. Mathematical typology: a grade of membership technique for obtaining disease definition. Computers and Biomedical Research 11, 277–298.
[29] Wu, Q., Hu, J., 2024. A spectral based goodness-of-fit test for stochastic block models. Statistics & Probability Letters 209, 110104.
[30] Xu, G., Shang, Z., 2018. Identifying latent structures in restricted latent class models. Journal of the American Statistical Association 113, 1284–1295.
[31] Yu, Y., Wang, T., Samworth, R.J., 2015. A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 102, 315–323.
[32] Zeng, Z., Gu, Y., Xu, G., 2023. A tensor-EM method for large-scale latent class analysis with binary responses. Psychometrika 88, 580–612.
[33] Zhang, A.Y., Zhou, H.Y., 2024. Leave-one-out singular subspace perturbation analysis for spectral clustering. Annals of Statistics 52, 2004–2033.