Block Empirical Likelihood Inference for Longitudinal Generalized Partially Linear Single-Index Models
Generalized partially linear single-index models (GPLSIMs) provide a flexible and interpretable semiparametric framework for longitudinal outcomes by combining a low-dimensional parametric component with a nonparametric index component. For repeated …
Authors: Tianni Zhang, Yuyao Wang, Yu Lu
Tianni Zhang¹, Yuyao Wang¹, Yu Lu¹, and Mengfei Ran*¹
¹ Wisdom Lake Academy of Pharmacy, Xi'an Jiaotong-Liverpool University

Abstract

Generalized partially linear single-index models (GPLSIMs) provide a flexible and interpretable semiparametric framework for longitudinal outcomes by combining a low-dimensional parametric component with a nonparametric index component. For repeated measurements, valid inference is challenging because within-subject correlation induces nuisance parameters and variance estimation can be unstable in semiparametric settings. We propose a profile estimating-equation approach based on spline approximation of the unknown link function and construct a block empirical likelihood (BEL) for joint inference on the parametric coefficients and the single-index direction. The resulting BEL ratio statistic enjoys a Wilks-type chi-square limit, yielding likelihood-free confidence regions without explicit sandwich variance estimation. We also discuss practical implementation, including constrained optimization for the index parameter, working-correlation choices, and bootstrap-based confidence bands for the nonparametric component. Simulation studies and an application to the epilepsy longitudinal study illustrate the finite-sample performance.

Keywords: block empirical likelihood; generalized estimating equations; longitudinal data; partially linear single-index model.

1 Introduction

Longitudinal and other clustered studies collect repeated measurements on each experimental unit and arise routinely in biomedical research, economics, and the social sciences (Diggle et al., 2002). A key feature of longitudinal data is within-subject correlation, which, if neglected, can compromise efficiency and distort uncertainty quantification.
Generalized estimating equations (GEE) (Liang and Zeger, 1986) offer a widely used semiparametric framework that avoids full likelihood specification by relying on moment restrictions and a working correlation. Subsequent developments clarified how to improve efficiency and robustness under correlation misspecification; for example, the quadratic inference function was proposed by Qu et al. (2000) to provide an alternative moment construction with favorable testing properties. When the mean structure departs from a purely parametric form, semiparametric regression for clustered outcomes using GEE becomes especially attractive (Lin and Carroll, 2001), and new estimation and model selection procedures for longitudinal semiparametric modeling were developed in Fan and Li (2004). Nevertheless, Wald-type inference based on sandwich covariance estimation can be unstable when the number of clusters is moderate and the working correlation is difficult to calibrate in practice (Liang, 2008).

* Corresponding author: mengfei.ran@xjtlu.edu.cn

Meanwhile, purely linear predictors can be too rigid for modern longitudinal studies, where covariate effects may be nonlinear, heterogeneous across subjects, or driven by a few latent directions. Single-index structures address this by projecting high-dimensional covariates onto a single informative index and estimating an unknown univariate link, with the choice of smoothing level playing a crucial role; Härdle et al. (1993) studied optimal smoothing for this class of models. Partially linear single-index models further retain an explicit linear component for interpretability while capturing remaining nonlinear variation through the index link, aligning naturally with additive and other nonparametric regression ideas (Stone, 1985).
For independent data, Yu and Ruppert (2002) proposed penalized spline estimation procedures for partially linear single-index models, and Xia and Härdle (2006) developed semiparametric estimation theory that justifies their asymptotic properties. To accommodate non-Gaussian outcomes, Carroll et al. (1997) introduced the generalized partially linear single-index model (GPLSIM), which embeds the unknown link in a generalized mean structure and thus bridges generalized linear modeling with flexible regression. In repeated-measures settings, Liang et al. (2010) developed estimation and testing methods for partially linear single-index models with longitudinal data, while Bai et al. (2009) studied model-checking tools tailored to longitudinal single-index specifications. Closely related semiparametric longitudinal formulations include local polynomial mixed-effects models proposed by Wu and Zhang (2002) and polynomial spline inference for varying-coefficient models developed in Huang et al. (2004). Because longitudinal outcomes are often contaminated by outliers or heavy-tailed noise, robust alternatives have been pursued: Qin and Zhu (2008) investigated robust estimation in partial linear models with longitudinal data, and Liu and Lian (2018) studied robust procedures for varying-coefficient models in longitudinal settings.

Despite this extensive modeling literature, reliable inference for longitudinal GPLSIMs remains challenging because the unknown link function is a nuisance component whose estimation error can affect the second-order behavior of inference on finite-dimensional parameters. This difficulty is closely related to general principles for inference on parameters in semiparametric models (He and Shi, 2000), and becomes more pronounced when variable selection under correlation is also of interest.
From an implementation standpoint, generalized semiparametric fitting is often carried out using iteratively reweighted least squares, as discussed by Green (1984), and stable quasi-Newton updating strategies can be helpful for high-dimensional optimization (Nocedal, 1980). Spline sieves provide a practical approximation device for unknown smooth functions (De Boor, 2001), and a comprehensive treatment of semiparametric regression is given by Ruppert et al. (2003). In modern longitudinal studies with dense trajectories, connections to principal component methodology for functional and longitudinal data also offer useful perspective on dimension reduction and variability (Hall et al., 2006).

These challenges motivate inferential approaches that remain faithful to the estimating-equation paradigm while avoiding unstable plug-in variance calculations. Empirical likelihood provides a convenient vehicle: it treats moment restrictions as the primitive object and yields likelihood-ratio type confidence regions without specifying a full parametric likelihood. The original formulation is due to Owen (2001), and the extension to general estimating equations was formalized by Kolaczyk (1994). In semiparametric contexts, Xue and Zhu (2006) proposed EL-based inference for single-index models, whereas Xue and Lian (2016) considered EL procedures when covariables are missing.

When outcomes are correlated, EL constructions typically need to be modified so that the dependence structure is respected rather than ignored. A natural remedy is to build the empirical likelihood on blocks, using blocks as approximately independent units. In longitudinal regression, You et al. (2006) proposed a block empirical likelihood for partially linear models, and Yu et al. (2014) studied EL inference for generalized partially linear single-index models. Methodological extensions have continued to appear along several directions: robustification via robust GEE combined with EL was developed by Hu and Xu (2022), while Tan and Yan (2021) investigated penalized EL for longitudinal generalized linear models. Practical complications such as measurement error have also been addressed; for example, Zhang et al. (2022) considered EL inference for longitudinal data with covariate measurement errors. To integrate information beyond the primary sample, Sheng et al. (2022) proposed a penalized EL approach for synthesizing external aggregated information under population heterogeneity. Bayesian variants have been explored too: Ouyang and Bondell (2023) developed Bayesian EL for longitudinal data, and decorrelation ideas for stabilizing inference in high-dimensional longitudinal GLMs were proposed in Geng and Zhang (2024). Finally, EL has been pushed into modern semiparametric and high-dimensional settings with missingness, including single-index quantile regression (Wang and Liang, 2023), and a broader perspective connecting EL to functional data analysis is surveyed by Chang and McKeague (2025).

In this paper, we develop a block empirical likelihood (BEL) approach for longitudinal generalized partially linear single-index models. Our estimation strategy starts from a profile GEE formulation: for a finite-dimensional parameter $\boldsymbol{\theta} = (\boldsymbol{\beta}^\top, \boldsymbol{\phi}^\top)^\top$ (with $\boldsymbol{\alpha} = \boldsymbol{\alpha}(\boldsymbol{\phi})$ enforcing scale identifiability), we approximate the unknown link $\eta_0(\cdot)$ by a spline sieve (De Boor, 2001). Plugging the profiled link estimator into the marginal mean yields estimating functions, which are then embedded into a BEL ratio that treats each subject as a block (You et al., 2006).
This construction produces likelihood-free confidence regions for $\boldsymbol{\theta}_0$ that remain valid under mild conditions even when the working correlation is misspecified. At the technical level, a key ingredient is a profile-orthogonality property that renders the impact of estimating $\eta_0(\cdot)$ second order for inference on $\boldsymbol{\theta}_0$, enabling a Wilks-type limit for the BEL statistic in the longitudinal GPLSIM setting.

The rest of the paper is organized as follows. Section 2 introduces the longitudinal GPLSIM, the profile estimating equations, and the BEL construction. Section 3 presents a stable implementation and practical choices for spline dimension and correlation updating. Section 4 establishes large-sample properties of the proposed estimator and the Wilks-type limit for BEL. Simulation studies (Section 5) and a real-data analysis (Section 6) illustrate finite-sample performance and practical utility. Section 7 concludes with discussion and possible extensions.

2 Methodology

2.1 Longitudinal GPLSIM

For subject $i = 1, \ldots, n$, let $\{(Y_{ij}, \mathbf{x}_{ij}, \mathbf{z}_{ij}) : j = 1, \ldots, m_i\}$ denote repeated measurements, where $Y_{ij}$ is the response, $\mathbf{x}_{ij} \in \mathbb{R}^p$ enters the linear component, and $\mathbf{z}_{ij} \in \mathbb{R}^q$ enters the index component. We assume independence across subjects while allowing arbitrary within-subject correlation, which is the standard setting for GEE-type methodology (Liang and Zeger, 1986; Diggle et al., 2002).

Let $\mu_{ij} = E(Y_{ij} \mid \mathbf{x}_{ij}, \mathbf{z}_{ij})$ and let $g(\cdot)$ be a known link function. We consider the longitudinal generalized partially linear single-index model (GPLSIM)

$$ g(\mu_{ij}) = \mathbf{x}_{ij}^\top \boldsymbol{\beta}_0 + \eta_0\big(\mathbf{z}_{ij}^\top \boldsymbol{\alpha}_0\big), \qquad i = 1, \ldots, n, \quad j = 1, \ldots, m_i, \tag{1} $$

where $\boldsymbol{\beta}_0 \in \mathbb{R}^p$, $\boldsymbol{\alpha}_0 \in \mathbb{R}^q$, and $\eta_0(\cdot)$ is an unknown smooth function. This structure reduces dimensionality via the index while preserving interpretability through the linear component (Härdle et al.
, 1993; Xia and Härdle, 2006). Longitudinal estimation and testing for related partially linear single-index models have been studied in Liang et al. (2010), and our focus is to develop a likelihood-free inference procedure for (1) under within-subject dependence.

Because $(\boldsymbol{\alpha}_0, \eta_0)$ is identifiable only up to scale, we impose the standard constraint

$$ \|\boldsymbol{\alpha}_0\|_2 = 1, \qquad \alpha_{0,1} > 0. \tag{2} $$

To handle (2) seamlessly in both computation and inference, we reparameterize

$$ \boldsymbol{\alpha}(\boldsymbol{\phi}) = \Big( \sqrt{1 - \|\boldsymbol{\phi}\|_2^2},\ \boldsymbol{\phi}^\top \Big)^\top, \qquad \boldsymbol{\phi} \in \mathbb{R}^{q-1}, \quad \|\boldsymbol{\phi}\|_2 < 1, \tag{3} $$

and define the finite-dimensional parameter as $\boldsymbol{\theta} = (\boldsymbol{\beta}^\top, \boldsymbol{\phi}^\top)^\top \in \mathbb{R}^d$, $d = p + q - 1$.

2.2 Sieve Approximation

Let $u_{ij}(\boldsymbol{\theta}) = \mathbf{z}_{ij}^\top \boldsymbol{\alpha}(\boldsymbol{\phi})$ denote the single index. We approximate the unknown smooth link $\eta_0(\cdot)$ by a polynomial spline sieve,

$$ \eta(u) \approx \mathbf{B}(u)^\top \boldsymbol{\gamma}, \qquad \mathbf{B}(u) = (B_1(u), \ldots, B_K(u))^\top, \quad \boldsymbol{\gamma} \in \mathbb{R}^K, \tag{4} $$

where $\mathbf{B}(\cdot)$ is taken as cubic B-splines with quasi-uniform knots. This choice is computationally stable and flexible enough to capture nonlinear effects, while retaining a transparent bias–variance trade-off and tractable sieve theory for semiparametric inference (Huang et al., 2004; He and Shi, 2000). In particular, if $\eta_0$ is sufficiently smooth, the sieve approximation error is of order $K^{-s}$ for some $s \ge 2$, and $K = K_n$ is allowed to increase slowly with $n$ so that the approximation bias becomes asymptotically negligible.

For the $i$-th subject, define $\mathbf{Y}_i = (Y_{i1}, \ldots, Y_{im_i})^\top$, $\mathbf{X}_i = (\mathbf{x}_{i1}, \ldots, \mathbf{x}_{im_i})^\top \in \mathbb{R}^{m_i \times p}$, and the spline design matrix

$$ \mathbf{B}_i(\boldsymbol{\theta}) = \big( \mathbf{B}\{u_{i1}(\boldsymbol{\theta})\}, \ldots, \mathbf{B}\{u_{im_i}(\boldsymbol{\theta})\} \big)^\top \in \mathbb{R}^{m_i \times K}. $$
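As a small illustration of the reparameterization (3) and the index $u_{ij}(\boldsymbol{\theta})$, the following sketch maps an unconstrained-in-the-ball vector $\boldsymbol{\phi}$ to a unit-norm $\boldsymbol{\alpha}(\boldsymbol{\phi})$ with positive first coordinate and evaluates $\mathbf{z}^\top \boldsymbol{\alpha}(\boldsymbol{\phi})$. The function names `alpha_from_phi` and `single_index` are hypothetical and are not taken from the paper; the example value of $\boldsymbol{\phi}$ is chosen only so that $\boldsymbol{\alpha} = (1,1,1)^\top/\sqrt{3}$.

```python
import math


def alpha_from_phi(phi):
    """Map phi (length q-1, with ||phi||_2 < 1) to alpha(phi) as in (3).

    The first coordinate is sqrt(1 - ||phi||^2) > 0, so the result has
    unit Euclidean norm and satisfies the sign constraint in (2).
    """
    sq_norm = sum(v * v for v in phi)
    if sq_norm >= 1.0:
        raise ValueError("phi must satisfy ||phi||_2 < 1")
    return [math.sqrt(1.0 - sq_norm)] + list(phi)


def single_index(z, phi):
    """Compute u = z^T alpha(phi) for one covariate vector z of length q."""
    alpha = alpha_from_phi(phi)
    if len(z) != len(alpha):
        raise ValueError("z must have length len(phi) + 1")
    return sum(zk * ak for zk, ak in zip(z, alpha))


# Illustrative values only: phi = (1/sqrt(3), 1/sqrt(3)) gives alpha = (1,1,1)/sqrt(3).
phi_example = [1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)]
alpha_example = alpha_from_phi(phi_example)
u_example = single_index([1.0, 0.0, -1.0], phi_example)
```

Working with $\boldsymbol{\phi}$ rather than $\boldsymbol{\alpha}$ means an optimizer only has to respect the open-ball constraint $\|\boldsymbol{\phi}\|_2 < 1$, which the sketch checks explicitly.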
Then the linear predictor and mean vector can be written in the compact form

$$ \boldsymbol{\xi}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}) = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{B}_i(\boldsymbol{\theta}) \boldsymbol{\gamma}, \qquad \boldsymbol{\mu}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}) = g^{-1}\{\boldsymbol{\xi}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})\}. \tag{5} $$

In implementation, we select $K$ from a small candidate set using a deviance- or information-criterion-type rule, which empirically provides stable performance; the asymptotic theory only requires that $K$ grows slowly enough so that the sieve bias does not affect root-$n$ inference for $\boldsymbol{\theta}$.

2.3 Profile Estimating Equations

To accommodate within-subject correlation, we adopt a working covariance

$$ \mathbf{V}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}) = \mathbf{A}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})^{1/2}\, \mathbf{R}_i(\boldsymbol{\rho})\, \mathbf{A}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})^{1/2}, \tag{6} $$

where $\mathbf{A}_i = \mathrm{diag}\{v(\mu_{i1}), \ldots, v(\mu_{im_i})\}$ with $v(\cdot)$ being the variance function implied by the mean model, and $\mathbf{R}_i(\boldsymbol{\rho})$ is a working correlation matrix (e.g., independence, exchangeable, AR(1)). This parallels the generalized estimating equations (GEE) framework (Liang and Zeger, 1986; Qu et al., 2000).

Let $\dot{\mu}_{ij} = \partial \mu_{ij} / \partial \xi_{ij}$ and define $\boldsymbol{\Delta}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}) = \mathrm{diag}(\dot{\mu}_{i1}, \ldots, \dot{\mu}_{im_i})$. For fixed $(\boldsymbol{\theta}, \boldsymbol{\gamma})$, the partial derivative with respect to $\boldsymbol{\beta}$ is

$$ \frac{\partial \boldsymbol{\mu}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})}{\partial \boldsymbol{\beta}^\top} = \boldsymbol{\Delta}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})\, \mathbf{X}_i. $$

The derivative with respect to $\boldsymbol{\phi}$ depends on the index score $u_{ij}(\boldsymbol{\theta})$ and the derivative of the link function $\dot{\eta}(\cdot)$; under the spline sieve (4), $\dot{\eta}(u)$ is computed from the derivative of the spline basis functions. We denote the resulting partial Jacobian with respect to $\boldsymbol{\theta}$ by

$$ \mathbf{D}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}) = \frac{\partial \boldsymbol{\mu}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})}{\partial \boldsymbol{\theta}^\top} \in \mathbb{R}^{m_i \times d}. $$

A natural estimating equation for the joint parameter $(\boldsymbol{\theta}, \boldsymbol{\gamma})$ is the quasi-score form

$$ \sum_{i=1}^n \mathbf{D}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})^\top \mathbf{V}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})^{-1} \{\mathbf{Y}_i - \boldsymbol{\mu}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})\} = \mathbf{0}. \tag{7} $$

Directly solving (7) is feasible but treats the increasing-dimensional spline coefficient $\boldsymbol{\gamma}$ as part of the main parameter, which complicates inference for the finite-dimensional target $\boldsymbol{\theta} = (\boldsymbol{\beta}^\top, \boldsymbol{\phi}^\top)^\top$. We therefore adopt a profile approach: for each fixed $\boldsymbol{\theta}$, we estimate the nuisance $\boldsymbol{\gamma}$ and then solve a $d$-dimensional estimating equation in $\boldsymbol{\theta}$.

Specifically, define the inner (spline) estimating equation

$$ \sum_{i=1}^n \mathbf{B}_i(\boldsymbol{\theta})^\top \boldsymbol{\Delta}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})\, \mathbf{V}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})^{-1} \{\mathbf{Y}_i - \boldsymbol{\mu}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})\} = \mathbf{0}, \tag{8} $$

and let $\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta})$ denote a solution to (8). It can be obtained via IRLS because, for fixed $\boldsymbol{\theta}$, model (1) under the sieve (4) is a generalized linear model in $\boldsymbol{\gamma}$ with design $\mathbf{B}_i(\boldsymbol{\theta})$ and known offset $\mathbf{X}_i \boldsymbol{\beta}$. Define the profiled mean and covariance as

$$ \hat{\boldsymbol{\mu}}_i(\boldsymbol{\theta}) = \boldsymbol{\mu}_i(\boldsymbol{\theta}, \hat{\boldsymbol{\gamma}}(\boldsymbol{\theta})), \qquad \hat{\mathbf{V}}_i(\boldsymbol{\theta}) = \mathbf{V}_i(\boldsymbol{\theta}, \hat{\boldsymbol{\gamma}}(\boldsymbol{\theta})). $$

Then the outer estimating equation for $\boldsymbol{\theta}$ is

$$ \sum_{i=1}^n \mathbf{G}_i(\boldsymbol{\theta})^\top \hat{\mathbf{V}}_i(\boldsymbol{\theta})^{-1} \{\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i(\boldsymbol{\theta})\} = \mathbf{0}, \tag{9} $$

where $\mathbf{G}_i(\boldsymbol{\theta}) = d\hat{\boldsymbol{\mu}}_i(\boldsymbol{\theta}) / d\boldsymbol{\theta}^\top$ is the total derivative of the profiled mean with respect to $\boldsymbol{\theta}$. Note that $\mathbf{G}_i(\boldsymbol{\theta})$ captures the variation of $\hat{\boldsymbol{\mu}}_i$ both directly through $\boldsymbol{\theta}$ and indirectly through the dependence of $\hat{\boldsymbol{\gamma}}$ on $\boldsymbol{\theta}$. In practice, $\mathbf{G}_i(\boldsymbol{\theta})$ can be computed by a numerical derivative, or analytically using the implicit function relationship induced by (8).

Let $\hat{\boldsymbol{\theta}}$ be a solution to (9) and set

$$ \hat{\boldsymbol{\alpha}} = \boldsymbol{\alpha}(\hat{\boldsymbol{\phi}}), \qquad \hat{\eta}(u) = \mathbf{B}(u)^\top \hat{\boldsymbol{\gamma}}(\hat{\boldsymbol{\theta}}). $$

The profile strategy is consistent with longitudinal partially linear single-index estimation (Liang et al., 2010) while being tailored here to support empirical-likelihood inference for $\boldsymbol{\theta}$. By concentrating out $\boldsymbol{\gamma}$, the profile estimating equation effectively removes the projection of the score function onto the nuisance tangent space, facilitating valid semiparametric inference.

2.4 Block Empirical Likelihood for $\boldsymbol{\theta}$

Define the estimating function

$$ \mathbf{g}_i(\boldsymbol{\theta}) = \mathbf{G}_i(\boldsymbol{\theta})^\top \hat{\mathbf{V}}_i(\boldsymbol{\theta})^{-1} \{\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i(\boldsymbol{\theta})\} \in \mathbb{R}^d. \tag{10} $$

Then (9) is equivalent to $\sum_{i=1}^n \mathbf{g}_i(\boldsymbol{\theta}) = \mathbf{0}$. Crucially, although $\mathbf{g}_i(\boldsymbol{\theta})$ aggregates all repeated measurements for subject $i$ and incorporates within-subject dependence through $\hat{\mathbf{V}}_i(\boldsymbol{\theta})$, the sequence $\{\mathbf{g}_i(\boldsymbol{\theta})\}_{i=1}^n$ is i.i.d. across subjects under our sampling assumption. This makes block empirical likelihood (BEL) a natural inferential tool (You et al., 2006; Yu et al., 2014).

The BEL ratio for $\boldsymbol{\theta}$ is defined as

$$ R(\boldsymbol{\theta}) = \max_{\{p_i\}} \Big\{ \prod_{i=1}^n (n p_i) : p_i \ge 0,\ \sum_{i=1}^n p_i = 1,\ \sum_{i=1}^n p_i\, \mathbf{g}_i(\boldsymbol{\theta}) = \mathbf{0} \Big\}. \tag{11} $$

By the standard Lagrange-multiplier argument for empirical likelihood (Owen, 2001; Kolaczyk, 1994), the maximizer has the form

$$ p_i(\boldsymbol{\theta}) = \frac{1}{n \{1 + \boldsymbol{\lambda}(\boldsymbol{\theta})^\top \mathbf{g}_i(\boldsymbol{\theta})\}}, $$

where $\boldsymbol{\lambda}(\boldsymbol{\theta}) \in \mathbb{R}^d$ solves

$$ \sum_{i=1}^n \frac{\mathbf{g}_i(\boldsymbol{\theta})}{1 + \boldsymbol{\lambda}(\boldsymbol{\theta})^\top \mathbf{g}_i(\boldsymbol{\theta})} = \mathbf{0}. \tag{12} $$

The corresponding empirical log-likelihood ratio statistic is

$$ \ell(\boldsymbol{\theta}) = -2 \log R(\boldsymbol{\theta}) = 2 \sum_{i=1}^n \log\big\{ 1 + \boldsymbol{\lambda}(\boldsymbol{\theta})^\top \mathbf{g}_i(\boldsymbol{\theta}) \big\}. \tag{13} $$

In longitudinal semiparametric models, Wald-type inference typically requires estimating a sandwich covariance that is sensitive to smoothing, correlation misspecification, and moderate $n$. BEL instead builds confidence regions by inverting $\ell(\boldsymbol{\theta})$ and often enjoys an (asymptotic) Wilks phenomenon, i.e., $\ell(\boldsymbol{\theta}_0)$ converges to a $\chi^2$ limit without explicit variance estimation (Owen, 2001; Kolaczyk, 1994; You et al., 2006).
This “automatic studentization” is especially attractive when the nuisance function $\eta_0(\cdot)$ is estimated nonparametrically and when the working correlation is used primarily for efficiency (Qu et al., 2000).

Accordingly, a $(1-\alpha)$ BEL confidence region for $\boldsymbol{\theta}$ is

$$ C_\alpha = \big\{ \boldsymbol{\theta} : \ell(\boldsymbol{\theta}) \le \chi^2_{d,\,1-\alpha} \big\}, \tag{14} $$

where $\chi^2_{d,\,1-\alpha}$ is the $(1-\alpha)$ quantile of $\chi^2_d$. Marginal inference on a component (e.g., $\beta_k$) can be obtained by profiling $\ell(\boldsymbol{\theta})$ over the remaining parameters, analogous to profile likelihood.

2.5 Bootstrap Inference for $\eta_0(\cdot)$

BEL targets the finite-dimensional parameter $\boldsymbol{\theta}$. For $\eta_0(\cdot)$, we recommend a (cluster) bootstrap that resamples entire subjects to preserve within-subject dependence (Diggle et al., 2002). Each bootstrap sample refits (1) using the algorithm in Section 3, producing $\hat{\eta}^*(u)$ on a grid. Pointwise confidence bands can be formed by bootstrap percentiles, and simultaneous bands can be constructed from the bootstrap distribution of $\sup_{u \in \mathcal{U}} |\hat{\eta}^*(u) - \hat{\eta}(u)|$. The bootstrap complements the Wilks-type BEL inference for $\boldsymbol{\theta}$ and provides a practical uncertainty quantification for the nonparametric component.

3 Algorithm

This section summarizes a practical implementation for the profile GEE estimator and the associated block empirical likelihood (BEL).

3.1 Inputs

• Spline basis and dimension. Use a cubic B-spline basis $\mathbf{B}(u)$ on $\mathcal{U}$ with $K$ basis functions and equally spaced interior knots. In practice, we select $K$ from a small candidate set using a BIC/AIC-type criterion or cross-validation, subject to the growth conditions in Assumption 6.
• Working correlation. Choose a parametric family $\mathbf{R}_i(\boldsymbol{\rho})$ (e.g., independence, exchangeable, or AR(1)). Update $\boldsymbol{\rho}$ by method-of-moments using Pearson residuals, as in standard GEE implementations (Liang and Zeger, 1986; Qu et al., 2000).
• Initialization. Initialize $\boldsymbol{\theta}^{(0)} = (\boldsymbol{\beta}^{(0)}, \boldsymbol{\phi}^{(0)})$ by fitting a working GLM that ignores the single-index nonlinearity (or by a few iterations with independence working correlation), and normalize via (3).

3.2 Profile Algorithm for $\hat{\boldsymbol{\theta}}$ and $\hat{\eta}$

Algorithm 1 Profile fitting for longitudinal GPLSIM
1: Choose $K$ and a working correlation family $\mathbf{R}_i(\boldsymbol{\rho})$.
2: Initialize $\boldsymbol{\theta}^{(0)} = (\boldsymbol{\beta}^{(0)}, \boldsymbol{\phi}^{(0)})$ with $\|\boldsymbol{\phi}^{(0)}\|_2 < 1$; set $t = 0$.
3: repeat
4:   Inner step (update $\boldsymbol{\gamma}$). Given $\boldsymbol{\theta}^{(t)}$, solve the spline-score equation (8) for $\hat{\boldsymbol{\gamma}}^{(t+1)} = \hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}^{(t)})$ (e.g., by IRLS).
5:   Correlation update (optional). Compute Pearson residuals based on $\hat{\boldsymbol{\mu}}_i(\boldsymbol{\theta}^{(t)})$ and update $\hat{\boldsymbol{\rho}}^{(t+1)}$ within the chosen $\mathbf{R}_i(\cdot)$ family.
6:   Outer step (update $\boldsymbol{\theta}$). Form the profiled estimating equation (9) using the profiled Jacobian $\mathbf{G}_i(\boldsymbol{\theta}^{(t)}) = \partial \hat{\boldsymbol{\mu}}_i(\boldsymbol{\theta}) / \partial \boldsymbol{\theta}^\top \big|_{\boldsymbol{\theta} = \boldsymbol{\theta}^{(t)}}$ (computed via implicit differentiation of (8)), and update $\boldsymbol{\theta}^{(t+1)}$ by Newton or a damped quasi-Newton method (e.g., BFGS).
7:   $t \leftarrow t + 1$.
8: until convergence in $\boldsymbol{\theta}$ and $\boldsymbol{\gamma}$.
9: Output $\hat{\boldsymbol{\theta}} = \boldsymbol{\theta}^{(t)}$, $\hat{\boldsymbol{\alpha}} = \boldsymbol{\alpha}(\hat{\boldsymbol{\phi}})$, and $\hat{\eta}(u) = \mathbf{B}(u)^\top \hat{\boldsymbol{\gamma}}(\hat{\boldsymbol{\theta}})$.

3.3 BEL Statistic

Given a candidate $\boldsymbol{\theta}$, compute $\mathbf{g}_i(\boldsymbol{\theta})$ in (10) and solve the Lagrange-multiplier equation (12) for $\boldsymbol{\lambda}(\boldsymbol{\theta})$. A stable approach is Newton's method applied to

$$ \boldsymbol{\Psi}(\boldsymbol{\lambda}) = \sum_{i=1}^n \frac{\mathbf{g}_i(\boldsymbol{\theta})}{1 + \boldsymbol{\lambda}^\top \mathbf{g}_i(\boldsymbol{\theta})}, \qquad \frac{\partial \boldsymbol{\Psi}(\boldsymbol{\lambda})}{\partial \boldsymbol{\lambda}^\top} = -\sum_{i=1}^n \frac{\mathbf{g}_i(\boldsymbol{\theta})\, \mathbf{g}_i(\boldsymbol{\theta})^\top}{\{1 + \boldsymbol{\lambda}^\top \mathbf{g}_i(\boldsymbol{\theta})\}^2}. $$

Because feasibility requires $1 + \boldsymbol{\lambda}^\top \mathbf{g}_i(\boldsymbol{\theta}) > 0$ for all $i$, we recommend a damped Newton update with backtracking line search to preserve positivity.
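The damped Newton iteration for (12) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes the subject-level estimating functions $\mathbf{g}_i(\boldsymbol{\theta})$ as a precomputed list of length-$d$ lists, uses only the Python standard library, and the helper names `solve_linear` and `el_log_ratio` are hypothetical. The line search halves the step until all $1 + \boldsymbol{\lambda}^\top \mathbf{g}_i$ stay positive and the concave objective $\sum_i \log(1 + \boldsymbol{\lambda}^\top \mathbf{g}_i)$ does not decrease.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small d only)."""
    d = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(d):
        piv = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, d):
            f = M[r][c] / M[c][c]
            for k in range(c, d + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * d
    for c in reversed(range(d)):
        s = M[c][d] - sum(M[c][k] * x[k] for k in range(c + 1, d))
        x[c] = s / M[c][c]
    return x


def el_log_ratio(g, max_iter=50, tol=1e-10):
    """Return (ell, lam): the statistic in (13) and the multiplier solving (12).

    g is a list of n estimating-function vectors g_i(theta), each of length d.
    """
    d = len(g[0])

    def denoms(lam):
        return [1.0 + sum(l * x for l, x in zip(lam, gi)) for gi in g]

    def objective(lam):
        w = denoms(lam)
        return sum(math.log(v) for v in w) if min(w) > 0.0 else -math.inf

    lam = [0.0] * d
    for _ in range(max_iter):
        w = denoms(lam)
        psi = [sum(gi[a] / wi for gi, wi in zip(g, w)) for a in range(d)]
        if max(abs(v) for v in psi) < tol:
            break
        # J = sum_i g_i g_i^T / w_i^2 is minus the Jacobian of Psi.
        J = [[sum(gi[a] * gi[b] / wi ** 2 for gi, wi in zip(g, w))
              for b in range(d)] for a in range(d)]
        step = solve_linear(J, psi)  # Newton direction: lam + J^{-1} Psi
        t, f0 = 1.0, objective(lam)
        while t > 1e-12:
            cand = [l + t * s for l, s in zip(lam, step)]
            # Accept only feasible steps that do not decrease the objective.
            if objective(cand) >= f0 - 1e-12:
                break
            t *= 0.5
        else:
            break  # no acceptable step found; stop iterating
        lam = cand
    return 2.0 * objective(lam), lam
```

A confidence region as in (14) would then compare the returned statistic with a chi-square quantile; the quantile itself is not computed here. The sketch assumes that $\mathbf{0}$ lies inside the convex hull of the $\mathbf{g}_i(\boldsymbol{\theta})$, which is required for (12) to have a solution.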
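Once $\boldsymbol{\lambda}(\boldsymbol{\theta})$ is obtained, compute $\ell(\boldsymbol{\theta})$ from (13) and form confidence regions using (14) (and the profile statistic in Theorem 3 when targeting subvectors of $\boldsymbol{\theta}$).

4 Asymptotic Theory

In this section, we investigate the theoretical properties of the proposed procedure. First, recall the longitudinal GPLSIM in (1) and the profile estimating equations (8)–(9). Let $K = K_n$ denote the sieve dimension in (4). Write the true parameter as $\boldsymbol{\theta}_0 = (\boldsymbol{\beta}_0^\top, \boldsymbol{\phi}_0^\top)^\top \in \mathbb{R}^d$, and $\boldsymbol{\alpha}_0 = \boldsymbol{\alpha}(\boldsymbol{\phi}_0)$. For any $\boldsymbol{\theta}$, define the population (sieve) nuisance parameter $\boldsymbol{\gamma}_0(\boldsymbol{\theta})$ as a solution to the population counterpart of (8):

$$ E\Big[ \mathbf{B}_i(\boldsymbol{\theta})^\top \boldsymbol{\Delta}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})\, \mathbf{V}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})^{-1} \{\mathbf{Y}_i - \boldsymbol{\mu}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})\} \Big] = \mathbf{0}. \tag{15} $$

Define the population profiled mean and covariance

$$ \boldsymbol{\mu}_{i,0}(\boldsymbol{\theta}) = \boldsymbol{\mu}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}_0(\boldsymbol{\theta})), \qquad \mathbf{V}_{i,0}(\boldsymbol{\theta}) = \mathbf{V}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}_0(\boldsymbol{\theta})), $$

and the population subject estimating function

$$ \mathbf{g}_{i,0}(\boldsymbol{\theta}) = \mathbf{G}_{i,0}(\boldsymbol{\theta})^\top \mathbf{V}_{i,0}(\boldsymbol{\theta})^{-1} \{\mathbf{Y}_i - \boldsymbol{\mu}_{i,0}(\boldsymbol{\theta})\}, \tag{16} $$

where $\mathbf{G}_{i,0}(\boldsymbol{\theta})$ is the Jacobian of $\boldsymbol{\mu}_{i,0}(\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$. The sample estimating function $\mathbf{g}_i(\boldsymbol{\theta})$ in (10) equals $\mathbf{g}_{i,0}(\boldsymbol{\theta})$ with $\boldsymbol{\gamma}_0(\boldsymbol{\theta})$ replaced by $\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta})$ and (if applicable) $\boldsymbol{\rho}$ replaced by its update. Define the population moment map

$$ \mathbf{U}_0(\boldsymbol{\theta}) = E\{\mathbf{g}_{i,0}(\boldsymbol{\theta})\}, \qquad \mathbf{H}_0 = \frac{\partial \mathbf{U}_0(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^\top} \bigg|_{\boldsymbol{\theta} = \boldsymbol{\theta}_0}, \qquad \mathbf{S}_0 = \mathrm{Var}\{\mathbf{g}_{i,0}(\boldsymbol{\theta}_0)\}. $$

Let $\hat{\boldsymbol{\theta}}$ be any measurable root of the sample profile equation (9).

4.1 Assumptions

We next state a set of regularity conditions under which the profile estimator is root-$n$ consistent and the BEL statistic admits a chi-square (Wilks) limit.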
These assumptions are standard in longitudinal estimating-equation analysis and spline-sieve semiparametrics, but we present them explicitly to clarify how within-subject dependence and nuisance estimation are handled in our framework.

Assumption 1 (Sampling and cluster size). $\{(Y_{ij}, \mathbf{x}_{ij}, \mathbf{z}_{ij}) : j = 1, \ldots, m_i\}$ are independent across $i$. Within each subject, $\{(Y_{ij}, \mathbf{x}_{ij}, \mathbf{z}_{ij})\}_{j=1}^{m_i}$ may be arbitrarily dependent. Moreover, $\max_{1 \le i \le n} m_i \le M < \infty$ for a fixed constant $M$.

Assumption 2 (Covariates and moments). There exists a compact set $\mathcal{X} \times \mathcal{Z}$ such that $(\mathbf{x}_{ij}, \mathbf{z}_{ij}) \in \mathcal{X} \times \mathcal{Z}$ almost surely. In addition, $E\|\mathbf{g}_{i,0}(\boldsymbol{\theta}_0)\|_2^4 < \infty$.

Assumption 3 (Link and variance regularity). The inverse link $g^{-1}(\cdot)$ is twice continuously differentiable. The variance function $v(\mu)$ is continuous and bounded away from $0$ and $\infty$ on the range of $\mu_{ij}$. Moreover, $\dot{\mu}_{ij} = \partial \mu_{ij} / \partial \xi_{ij}$ is bounded away from $0$ and $\infty$ uniformly in $(i, j)$ and in a neighborhood of $(\boldsymbol{\theta}_0, \boldsymbol{\gamma}_0(\boldsymbol{\theta}_0))$.

Assumption 4 (Smoothness of $\eta_0$). The true link $\eta_0$ is $s$ times continuously differentiable on a compact interval $\mathcal{U}$, with $s \ge 2$, and $\sup_{u \in \mathcal{U}} |\eta_0^{(s)}(u)| < \infty$.

Assumption 5 (Index support and density). Let $U_{ij,0} = \mathbf{z}_{ij}^\top \boldsymbol{\alpha}_0$. Then $U_{ij,0} \in \mathcal{U}$ almost surely. The marginal density of $U_{ij,0}$ exists and is bounded away from $0$ and $\infty$ on $\mathcal{U}$.

Assumption 6 (Sieve dimension growth). The spline basis in (4) is a cubic B-spline basis with quasi-uniform knots on $\mathcal{U}$. The sieve dimension $K = K_n$ satisfies

$$ K \to \infty, \qquad \frac{K^2}{n} \to 0, \qquad \sqrt{n}\, K^{-s} \to 0. \tag{17} $$

Assumption 7 (Working covariance). For any $(\boldsymbol{\theta}, \boldsymbol{\gamma})$ in a neighborhood of $(\boldsymbol{\theta}_0, \boldsymbol{\gamma}_0(\boldsymbol{\theta}_0))$, the eigenvalues of $\mathbf{V}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})$ are uniformly bounded away from $0$ and $\infty$ over $i$.
If $\boldsymbol{\rho}$ is updated, the resulting $\hat{\boldsymbol{\rho}}$ converges in probability to a deterministic limit $\boldsymbol{\rho}^\dagger$.

Assumption 8 (Local identifiability and nonsingularity). The true parameter $\boldsymbol{\phi}_0$ satisfies $\|\boldsymbol{\phi}_0\|_2 < 1$. The population equation $\mathbf{U}_0(\boldsymbol{\theta}) = \mathbf{0}$ has a unique solution at $\boldsymbol{\theta}_0$ in a neighborhood, and the Jacobian matrix $\mathbf{H}_0$ is nonsingular.

Assumption 1 formalizes the subject-as-a-block view: arbitrary within-subject dependence is allowed, while cross-subject independence supplies the effective sample size $n$, matching the block empirical likelihood paradigm (You et al., 2006; Yu et al., 2014). Assumption 2 imposes bounded design and a finite fourth moment for the subject estimating function; this is standard for deriving both asymptotic normality and the EL quadratic expansion (Kolaczyk, 1994; Owen, 2001). Assumption 3 ensures the mean map is smooth and well behaved, which is needed for uniform Taylor expansions in the profile equations and for stability of IRLS-type fitting (Liang and Zeger, 1986; Qu et al., 2000). Assumptions 4–5 guarantee the single index $U_{ij,0}$ lives on a compact interval with well-behaved density, and that $\eta_0$ is sufficiently smooth for spline sieve approximation with bias $K^{-s}$ (Huang et al., 2004; He and Shi, 2000). Assumption 6 balances sieve bias and variance so that the nuisance estimation error is asymptotically negligible at the $\sqrt{n}$ scale; this is the key condition that enables a Wilks phenomenon for BEL despite the nonparametric component. Assumption 7 guarantees working covariance matrices remain invertible uniformly; if $\boldsymbol{\rho}$ is estimated, only convergence to a deterministic limit is needed (it need not equal the true correlation), consistent with GEE practice (Wang and Carey, 2004).
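As a worked illustration of the growth conditions in Assumption 6, consider a polynomial rate $K_n \asymp n^{a}$. This rate family is an assumption made here for illustration only, and the calculation reads the middle condition in (17) as $K^2/n \to 0$:

$$ K_n \asymp n^{a}: \qquad \frac{K^2}{n} \asymp n^{2a - 1} \to 0 \iff a < \tfrac{1}{2}, \qquad \sqrt{n}\, K^{-s} \asymp n^{1/2 - a s} \to 0 \iff a > \tfrac{1}{2s}. $$

Hence the admissible exponents are $1/(2s) < a < 1/2$. With the minimal smoothness $s = 2$ this gives $1/4 < a < 1/2$, so a choice such as $K_n \asymp n^{1/3}$ would satisfy all three conditions; for smoother links the lower bound relaxes, which is consistent with selecting $K$ from a small candidate set in practice.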
Finally, Assumption 8 is a local identifiability condition ensuring the profile estimating equation has a well-defined root and that linearization yields a valid influence representation.

4.2 Main Results

Theorem 1 (Consistency and rates). Under Assumptions 1–8, there exists a sequence of roots $\hat{\boldsymbol{\theta}}$ of (9) such that

$$ \|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0\|_2 = O_P(n^{-1/2}). $$

Moreover, with $\hat{\eta}$ defined by the spline sieve at $\hat{\boldsymbol{\theta}}$,

$$ \sup_{u \in \mathcal{U}} |\hat{\eta}(u) - \eta_0(u)| = O_P\Big( K^{-s} + \sqrt{K/n} \Big). $$

Remark 1. The first conclusion states that the finite-dimensional target $\boldsymbol{\theta}$ is estimable at the parametric rate despite the presence of an unknown link, reflecting the dimension-reduction benefit of the single-index structure and the profiling step. The second rate is the familiar spline bias–variance trade-off: $K^{-s}$ is the approximation bias controlled by smoothness in Assumption 4, while $\sqrt{K/n}$ is the stochastic term. Assumption 6 ensures $\sqrt{n} K^{-s} \to 0$ and $K/n \to 0$, which makes the nuisance estimation error asymptotically negligible for root-$n$ inference on $\boldsymbol{\theta}$.

Theorem 2 (Asymptotic normality). Under Assumptions 1–8,

$$ \sqrt{n}\, (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0) \Rightarrow N\big( \mathbf{0},\ \mathbf{H}_0^{-1} \mathbf{S}_0 (\mathbf{H}_0^{-1})^\top \big). $$

Equivalently, $\hat{\boldsymbol{\theta}}$ admits the influence representation

$$ \sqrt{n}\, (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0) = -\mathbf{H}_0^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^n \mathbf{g}_{i,0}(\boldsymbol{\theta}_0) + o_P(1). $$

Remark 2. The covariance matrix depends on the working covariance through $\mathbf{g}_{i,0}$. Thus, a well-chosen working correlation can improve efficiency, but correct specification is not required for consistency or asymptotic normality, matching the spirit of GEE. In practice, Wald inference based on estimating $\mathbf{H}_0$ and $\mathbf{S}_0$ can be sensitive in moderate samples, which motivates the BEL approach below that avoids explicit sandwich estimation.

Lemma 1 (Quadratic expansion of BEL).
Under Assumptions 1–8, let $\bar{\mathbf{g}}(\boldsymbol{\theta}_0) = n^{-1} \sum_{i=1}^n \mathbf{g}_i(\boldsymbol{\theta}_0)$ and $\mathbf{S}_n(\boldsymbol{\theta}_0) = n^{-1} \sum_{i=1}^n \mathbf{g}_i(\boldsymbol{\theta}_0)\, \mathbf{g}_i(\boldsymbol{\theta}_0)^\top$. Then

$$ \ell(\boldsymbol{\theta}_0) = n\, \bar{\mathbf{g}}(\boldsymbol{\theta}_0)^\top \mathbf{S}_n(\boldsymbol{\theta}_0)^{-1} \bar{\mathbf{g}}(\boldsymbol{\theta}_0) + o_P(1), $$

where $\ell(\boldsymbol{\theta})$ is defined in (13).

Remark 3. Lemma 1 is the technical core of Wilks-type results for empirical likelihood: it reduces the BEL statistic to a self-normalized quadratic form. Compared to i.i.d. EL, the novelty here is that $\mathbf{g}_i(\boldsymbol{\theta})$ aggregates a dependent within-subject vector and involves a profiled nonparametric estimator. Assumptions 1 and 6 ensure these extra layers only contribute $o_P(1)$ to the EL expansion.

Theorem 3 (Wilks phenomenon for BEL). Under Assumptions 1–8,

$$ \ell(\boldsymbol{\theta}_0) \Rightarrow \chi^2_d. $$

More generally, let $\boldsymbol{\theta} = (\boldsymbol{\theta}_1^\top, \boldsymbol{\theta}_2^\top)^\top$ with $\dim(\boldsymbol{\theta}_1) = r$. Define the profile BEL statistic

$$ \ell_{\mathrm{prof}}(\boldsymbol{\theta}_1) = \min_{\boldsymbol{\theta}_2} \ell(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2). $$

Then $\ell_{\mathrm{prof}}(\boldsymbol{\theta}_{1,0}) \Rightarrow \chi^2_r$.

Remark 4. Theorem 3 justifies BEL confidence regions of the form $\{\boldsymbol{\theta} : \ell(\boldsymbol{\theta}) \le \chi^2_{d,\,1-\alpha}\}$ without explicitly estimating a sandwich covariance, providing an “automatic studentization” effect familiar in EL theory (Owen, 2001; Kolaczyk, 1994). The result holds under arbitrary within-subject dependence because inference is built on blocks, and it remains valid in semiparametric settings because the nuisance estimation error is controlled by Assumption 6. The profile version yields chi-square limits for marginal inference, analogous to profile likelihood.

5 Simulation Studies

This section evaluates the finite-sample performance of the proposed profile block empirical likelihood (Profile-BEL) inference for the finite-dimensional parameter $\boldsymbol{\theta} = (\boldsymbol{\beta}^\top, \boldsymbol{\phi}^\top)^\top$, together with the spline-sieve estimator $\hat{\eta}(\cdot)$ for the unknown link.
We focus on three outcome types (Gaussian, Bernoulli, and Poisson) and systematically vary the strength of within-subject dependence to examine both estimation accuracy and inferential validity under longitudinal correlation, in the spirit of GEE benchmarking (Liang and Zeger, 1986; Diggle et al., 2002; Qu et al., 2000).

5.1 Data Generation

For each Monte Carlo replication, we generate independent subjects $i = 1, \ldots, n$ with bounded cluster size $m_i \equiv m$ (default $m = 5$) and consider two sample sizes $n \in \{100, 200\}$. Let $(p, q) = (3, 3)$ and set the true coefficients
$$\boldsymbol{\beta}_0 = (1, -1, 0.5)^{\top}, \qquad \boldsymbol{\alpha}_0 = (1, 1, 1)^{\top}/\sqrt{3}, \qquad \boldsymbol{\theta}_0 = (\boldsymbol{\beta}_0^{\top}, \boldsymbol{\phi}_0^{\top})^{\top},$$
where $\boldsymbol{\phi}_0$ is the $(q-1)$-dimensional parameter in (3) corresponding to $\boldsymbol{\alpha}_0$.

For each subject $i$ and visit $j$, generate $(\mathbf{x}_{ij}^{\top}, \mathbf{z}_{ij}^{\top})^{\top} \in \mathbb{R}^{p+q}$ from a centered Gaussian distribution with unit variances and Toeplitz correlation $\mathrm{Corr}(W_k, W_\ell) = \kappa^{|k-\ell|}$, with $\kappa \in \{0, 0.3\}$ to represent weak/moderate collinearity. We also consider a sensitivity experiment with heavier tails in Section 5.5. Define the index $u_{ij,0} = \mathbf{z}_{ij}^{\top}\boldsymbol{\alpha}_0$ and rescale it to $[0, 1]$ by
$$\tilde{u}_{ij,0} = \frac{u_{ij,0} - \min(u_{ij,0})}{\max(u_{ij,0}) - \min(u_{ij,0})}$$
within each replication. Set
$$\eta_0(u) = \sin(2\pi u), \qquad (18)$$
so that $\eta_0$ is nonlinear and smooth but not polynomial, making it informative for spline-sieve approximation (Huang et al., 2004; He and Shi, 2000).

To induce longitudinal correlation while keeping subjects independent, generate a latent Gaussian vector $\mathbf{b}_i = (b_{i1}, \ldots, b_{im})^{\top} \sim N(\mathbf{0}, \sigma_b^2\mathbf{R}_{\mathrm{AR}}(\rho))$, where $\mathbf{R}_{\mathrm{AR}}(\rho)$ is the AR(1) correlation matrix with entries $\rho^{|j-k|}$. We vary $\rho \in \{0, 0.3, 0.6\}$ and fix $\sigma_b = 0.6$ (moderate dependence).
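The design described up to this point can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `toeplitz_corr` and `simulate_design`, the seed handling, and the choice to take the minimum and maximum of the index over all subjects and visits within a replication are hypothetical choices made for the example.

```python
import numpy as np

def toeplitz_corr(dim, r):
    """Correlation matrix with entries r^{|j-k|} (Toeplitz / AR(1) form)."""
    idx = np.arange(dim)
    return r ** np.abs(idx[:, None] - idx[None, :])

def simulate_design(n=100, m=5, p=3, q=3, kappa=0.3, rho=0.3,
                    sigma_b=0.6, seed=0):
    """Covariates, rescaled index, true link values and latent effects."""
    rng = np.random.default_rng(seed)
    beta0 = np.array([1.0, -1.0, 0.5])
    alpha0 = np.ones(q) / np.sqrt(q)          # unit-norm index direction

    # (x_ij, z_ij) jointly Gaussian, unit variances, Corr = kappa^{|k-l|}
    w = rng.multivariate_normal(np.zeros(p + q),
                                toeplitz_corr(p + q, kappa), size=(n, m))
    x, z = w[..., :p], w[..., p:]

    # Single index, min-max rescaled to [0, 1] within the replication
    # (assumption: min/max pooled over all i and j)
    u = z @ alpha0
    u_tilde = (u - u.min()) / (u.max() - u.min())
    eta0 = np.sin(2.0 * np.pi * u_tilde)      # true link, eq. (18)

    # Latent within-subject effects b_i ~ N(0, sigma_b^2 R_AR(rho))
    b = sigma_b * rng.multivariate_normal(np.zeros(m),
                                          toeplitz_corr(m, rho), size=n)
    return {"x": x, "z": z, "u_tilde": u_tilde, "eta0": eta0,
            "b": b, "beta0": beta0, "alpha0": alpha0}

design = simulate_design(n=100, rho=0.6)
print(design["x"].shape, design["b"].shape)
```

The arrays are indexed as (subject, visit, covariate), so subject-level resampling or blockwise estimating functions can operate on the first axis.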
The latent effect enters the conditional linear predictor, which is a standard simulation device for correlated non-Gaussian outcomes. Let the systematic component be $\xi_{ij,0} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}_0 + \eta_0(\tilde{u}_{ij,0})$. We generate outcomes from three families:

• Gaussian: $Y_{ij} = \xi_{ij,0} + b_{ij} + \varepsilon_{ij}$ with $\varepsilon_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma_\varepsilon^2)$ and $\sigma_\varepsilon = 1$.
• Bernoulli: $Y_{ij} \mid b_{ij} \sim \mathrm{Bernoulli}(\pi_{ij})$ with $\mathrm{logit}(\pi_{ij}) = \xi_{ij,0} + b_{ij}$.
• Poisson: $Y_{ij} \mid b_{ij} \sim \mathrm{Poisson}(\lambda_{ij})$ with $\log(\lambda_{ij}) = \xi_{ij,0} + b_{ij}$.

The analysis model in Section 2 targets the marginal mean structure (1); thus, in the Bernoulli and Poisson cases the latent effect also serves as a deliberate mild misspecification to probe the robustness of estimating-equation inference.

5.2 Implementation

We compare the proposed BEL-based inference with several competing approaches that are routinely used for semiparametric longitudinal models.

• Profile-BEL. Let $\psi_i(\theta, \alpha, \eta)$ be the estimating function and define the BEL ratio $\ell(\theta) = 2\sup_{\lambda}\sum_{i=1}^{n}\log\{1 + \lambda^{\top}\psi_i(\theta, \hat{\alpha}(\theta), \hat{\eta}(\theta))\}$, where $(\hat{\alpha}(\theta), \hat{\eta}(\theta))$ are obtained by iteratively updating $\alpha$ (and $\eta$) conditional on $\theta$ until convergence. A 95% CI for a component of $\theta$ is $\{\theta_j : \ell(\theta) \le \chi^2_{1,0.95}\}$; pointwise bands for $\eta_0(\cdot)$ are built by bootstrap.
• Naive-EL. This is an independence-based EL that treats all $(i, j)$ as i.i.d. and uses observation-level estimating equations $g_{ij}(\theta)$: $\ell_{\mathrm{ind}}(\theta) = 2\sup_{\lambda}\sum_{i,j}\log\{1 + \lambda^{\top}g_{ij}(\theta)\}$. CIs are obtained as $\{\theta_j : \ell_{\mathrm{ind}}(\theta) \le \chi^2_{1,0.95}\}$, ignoring within-subject correlation in both estimation and inference.
• GEE-Wald. The point estimator $\hat{\theta}$ solves a GEE-type block equation $\sum_{i=1}^{n}\psi_i(\theta, \alpha, \eta) = 0$ under a chosen working correlation, and Wald CIs are formed as $\hat{\theta}_j \pm 1.96\sqrt{\widehat{\mathrm{Var}}(\hat{\theta})_{jj}}$, with $\widehat{\mathrm{Var}}(\hat{\theta})$ given by the plug-in sandwich estimator.
• GEE-Poly.
We approximate the nonparametric component by a low-order polynomial $\eta(t) \approx \sum_{\ell=0}^{d}c_\ell t^{\ell}$ and fit the resulting parametric model by GEE. Inference for $\theta$ is again Wald-type using the sandwich covariance under the same working-correlation options.

Unless otherwise stated, we use cubic B-splines on $\mathcal{U} = [0, 1]$ with quasi-uniform knots. The sieve dimension is selected from $K \in \{6, 8, 10, 12\}$ by a BIC-type criterion based on the working quasi-likelihood (Gaussian) or binomial/Poisson deviance (non-Gaussian). For the working correlation in (6), we fit under three choices:
$$\mathbf{R}_i = \mathbf{I} \text{ (independence)}, \qquad \mathbf{R}_i = \mathbf{R}_{\mathrm{EX}}(\rho) \text{ (exchangeable)}, \qquad \mathbf{R}_i = \mathbf{R}_{\mathrm{AR}}(\rho) \text{ (AR(1))},$$
to assess efficiency gains and sensitivity to misspecification. The default reported results use the AR(1) working correlation with $\rho$ estimated by moment methods as in standard GEE implementations.

We run $B = 200$ Monte Carlo replications for each configuration. For the $\eta_0(\cdot)$ bands, we use a bootstrap with $B^{*} = 200$ resamples per replication (resampling entire subjects to preserve dependence). All optimizations use the profile iteration in Algorithm 1; the BEL multiplier (12) is solved by Newton's method as in Section 3.

5.3 Performance Measures

We evaluate finite-sample performance from three complementary perspectives: estimation accuracy for the finite-dimensional parameters, recovery quality for the nonparametric link, and calibration/efficiency of the resulting uncertainty quantification. In what follows, $\hat{\boldsymbol{\alpha}} = \boldsymbol{\alpha}(\hat{\boldsymbol{\phi}})$ and $B$ denotes the number of Monte Carlo replications.

For each scalar component $\theta_k$ of $\boldsymbol{\theta}$, we report the empirical bias and root mean squared error (RMSE),
$$\mathrm{Bias}(\hat{\theta}_k) = \frac{1}{B}\sum_{b=1}^{B}\left(\hat{\theta}_k^{(b)} - \theta_{0,k}\right), \qquad \mathrm{RMSE}(\hat{\theta}_k) = \left\{\frac{1}{B}\sum_{b=1}^{B}\left(\hat{\theta}_k^{(b)} - \theta_{0,k}\right)^2\right\}^{1/2},$$
where $\hat{\theta}_k^{(b)}$ is the estimate from the $b$-th replication.
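As a small illustration of the two formulas above, the following sketch computes the componentwise Monte Carlo bias and RMSE from a stack of replicated estimates. The function name `bias_rmse` and the toy numbers are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np

def bias_rmse(estimates, truth):
    """Componentwise Monte Carlo bias and RMSE.

    estimates: array of shape (B, d), one row per replication.
    truth:     array of shape (d,), the true parameter vector.
    """
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    bias = err.mean(axis=0)                    # (1/B) * sum of errors
    rmse = np.sqrt((err ** 2).mean(axis=0))    # sqrt of mean squared error
    return bias, rmse

# Toy input: B = 2 replications of a 2-dimensional estimate
toy = np.array([[1.1, -0.9],
                [0.9, -1.1]])
bias, rmse = bias_rmse(toy, [1.0, -1.0])
print(bias, rmse)
```

In this toy input the errors cancel in the mean but not in the mean square, which is why bias and RMSE are reported side by side.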
To assess recovery of the index direction, we additionally report the angle error
$$\mathrm{Ang}(\hat{\boldsymbol{\alpha}}, \boldsymbol{\alpha}_0) = \arccos\left(\left|\hat{\boldsymbol{\alpha}}^{\top}\boldsymbol{\alpha}_0\right|\right),$$
which is invariant to sign changes in $\hat{\boldsymbol{\alpha}}$.

To quantify accuracy of the estimated link function, we evaluate $\hat{\eta}$ on a dense grid $\{u_\ell\}_{\ell=1}^{L}$ over $[0, 1]$ (default $L = 200$) and compute the integrated squared error
$$\mathrm{ISE}(\hat{\eta}) = \frac{1}{L}\sum_{\ell=1}^{L}\{\hat{\eta}(u_\ell) - \eta_0(u_\ell)\}^2.$$
We summarize the distribution of $\mathrm{ISE}(\hat{\eta})$ across replications by its mean and selected quantiles, which provides a stable picture of the bias–variance trade-off induced by the spline sieve.

For inferential performance, we focus on two standard metrics for nominal 95% confidence intervals: empirical coverage probability and average interval length. For each targeted scalar parameter, let $\mathrm{CI}_k^{(b)}$ be the interval produced in replication $b$. We compute
$$\mathrm{Cover} = \frac{1}{B}\sum_{b=1}^{B}\mathbf{1}\{\theta_{0,k} \in \mathrm{CI}_k^{(b)}\}, \qquad \mathrm{Len} = \frac{1}{B}\sum_{b=1}^{B}\mathrm{length}(\mathrm{CI}_k^{(b)}).$$
For the nonparametric component, we report pointwise coverage on the grid for the bootstrap bands of $\eta_0(\cdot)$, and (when included) simultaneous coverage based on the supremum deviation used to construct uniform bands.

5.4 Main Results

We summarize the simulation findings in three parts: estimation accuracy for the finite-dimensional component, inferential validity for $\boldsymbol{\theta}$, and recovery of the nonparametric link $\eta_0(\cdot)$. We additionally examine three working correlation structures to assess correlation sensitivity. The spline dimension $K$ is selected by the criterion described in Section 5.2.

Tables 1–3 report RMSE for the components of $\boldsymbol{\beta}$, the angle error for the index direction $\boldsymbol{\alpha}$, and the integrated squared error for $\hat{\eta}$. Across all three outcome families and a wide range of within-subject dependence levels, Profile-BEL delivers the most accurate recovery of the index direction and the nonparametric link.
In particular, Profile-BEL consistently attains the smallest (or near-smallest) $\mathrm{ISE}(\hat{\eta})$ and the smallest angular error $\mathrm{Ang}(\hat{\boldsymbol{\alpha}}, \boldsymbol{\alpha}_0)$, while maintaining competitive RMSEs for $\hat{\boldsymbol{\beta}}$. This advantage is most pronounced in the moderate-to-strong dependence regimes, where methods that either ignore correlation (Naive-EL) or rely on Wald-type plug-in variance formulas (GEE-Wald) tend to exhibit noticeably larger errors in $\hat{\boldsymbol{\alpha}}$ and $\hat{\eta}$.

The contrast with the polynomial approach (GEE-Poly) is particularly clear. Although GEE-Poly can yield reasonable estimates in some configurations, it is far more sensitive to misspecification of $\eta(\cdot)$: the polynomial restriction may lead to inflated errors for $\hat{\eta}$ and, consequently, degraded estimation of the index direction. In comparison, Profile-BEL avoids imposing a rigid parametric form on $\eta(\cdot)$ and updates $(\boldsymbol{\alpha}, \eta)$ iteratively, which translates into more stable and accurate recovery of the single-index component.

The nonparametric component exhibits the expected spline bias–variance trade-off. The ISE decreases as $n$ increases, and the median selected $K$ remains in a moderate range, indicating that the selection procedure avoids severe underfitting or overfitting across scenarios. In settings with stronger dependence, the ISE can increase modestly, reflecting reduced effective information per subject when observations are more redundant. Overall, $\hat{\eta}$ tracks the oscillatory shape of $\eta_0$ well, and the improvement with larger $n$ is consistent with the rate statement in Theorem 1.

Table 4 evaluates inference quality by reporting empirical coverage and average interval length for nominal 95% CIs of selected parameters under different working correlation choices.
Overall, Profile-BEL provides reliable inference with favorable length–coverage trade-offs: its intervals remain comparatively short while achieving coverage closer to the nominal level for the well-identified components (notably for $\beta_2$ and $\alpha_2$ in challenging scenarios). In contrast, GEE-Wald intervals can be noticeably wider and more sensitive to the assumed working correlation, and GEE-Poly may produce unstable inference when the polynomial restriction poorly matches the true link.

Finally, Figures 1–2 visualize the estimated link function and its pointwise bootstrap band under the Bernoulli and Gaussian settings. The Profile-BEL estimate closely tracks the true curve, and the bootstrap band provides an appropriate uncertainty envelope across the index range, offering a clear graphical confirmation of the improved $\eta(\cdot)$ recovery suggested by the ISE summaries.

Taken together, the simulation evidence supports two main conclusions. First, the proposed profile estimator yields accurate finite-dimensional estimation and reliable recovery of the index direction under a range of within-subject dependence structures. Second, the Profile-BEL approach delivers well-calibrated confidence intervals for $\boldsymbol{\theta}$ in finite samples, while the bootstrap bands provide a practical and interpretable uncertainty assessment for $\eta_0(\cdot)$.

Table 1: Gaussian case. Columns $\hat\beta_1$, $\hat\beta_2$, $\hat\beta_3$ report RMSE; Ang is $\mathrm{Ang}(\hat{\boldsymbol{\alpha}}, \boldsymbol{\alpha}_0)$; ISE is $\mathrm{ISE}(\hat\eta)$. Each metric is given under the independence (Ind.), exchangeable (Exc.) and AR(1) working correlations.

| n | ρ | Method | β̂1 Ind. | β̂1 Exc. | β̂1 AR(1) | β̂2 Ind. | β̂2 Exc. | β̂2 AR(1) | β̂3 Ind. | β̂3 Exc. | β̂3 AR(1) | Ang Ind. | Ang Exc. | Ang AR(1) | ISE Ind. | ISE Exc. | ISE AR(1) | K (med) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.0 | Profile-BEL | 0.1596 | 0.1596 | 0.1596 | 0.0562 | 0.0562 | 0.0562 | 0.0562 | 0.0562 | 0.0562 | 0.1305 | 0.1305 | 0.1305 | 0.1117 | 0.1117 | 0.1117 | 6 |
| | | Naive-EL | 0.1597 | 0.1598 | 0.1597 | 0.0568 | 0.0568 | 0.0568 | 0.0822 | 0.0822 | 0.0822 | 0.4496 | 0.4496 | 0.4496 | 0.1716 | 0.1716 | 0.1716 | 8 |
| | | GEE-Wald | 0.1598 | 0.1597 | 0.1598 | 0.0573 | 0.0573 | 0.0573 | 0.0895 | 0.0895 | 0.0895 | 0.5172 | 0.5172 | 0.5172 | 0.2309 | 0.2309 | 0.2309 | 12 |
| | | GEE-Poly | 1.6665 | 1.6649 | 1.6676 | 0.0583 | 0.0583 | 0.0584 | 0.0957 | 0.0959 | 0.0958 | 0.6184 | 0.6184 | 0.6184 | 0.3877 | 0.3879 | 0.3876 | 2 |
| 100 | 0.3 | Profile-BEL | 0.1611 | 0.1611 | 0.1611 | 0.0563 | 0.0563 | 0.0563 | 0.0573 | 0.0573 | 0.0573 | 0.1299 | 0.1299 | 0.1299 | 0.1120 | 0.1120 | 0.1120 | 6 |
| | | Naive-EL | 0.1612 | 0.1612 | 0.1612 | 0.0563 | 0.0563 | 0.0563 | 0.0859 | 0.0859 | 0.0859 | 0.4641 | 0.4641 | 0.4641 | 0.1669 | 0.1669 | 0.1668 | 8 |
| | | GEE-Wald | 0.1613 | 0.1613 | 0.1613 | 0.0573 | 0.0573 | 0.0573 | 0.0899 | 0.0899 | 0.0899 | 0.5145 | 0.5145 | 0.5145 | 0.2307 | 0.2307 | 0.2307 | 12 |
| | | GEE-Poly | 1.6754 | 1.6722 | 1.6736 | 0.0582 | 0.0575 | 0.0575 | 0.0962 | 0.0965 | 0.0962 | 0.6182 | 0.6182 | 0.6182 | 0.3886 | 0.3884 | 0.3871 | 2 |
| 100 | 0.6 | Profile-BEL | 0.1631 | 0.1631 | 0.1631 | 0.0565 | 0.0565 | 0.0565 | 0.0581 | 0.0581 | 0.0581 | 0.1282 | 0.1282 | 0.1282 | 0.1122 | 0.1122 | 0.1122 | 6 |
| | | Naive-EL | 0.1632 | 0.1632 | 0.1632 | 0.0578 | 0.0578 | 0.0578 | 0.0864 | 0.0864 | 0.0864 | 0.4799 | 0.4799 | 0.4799 | 0.1716 | 0.1716 | 0.1716 | 8 |
| | | GEE-Wald | 0.1633 | 0.1633 | 0.1633 | 0.0574 | 0.0574 | 0.0574 | 0.0909 | 0.0909 | 0.0909 | 0.5194 | 0.5194 | 0.5194 | 0.2346 | 0.2346 | 0.2346 | 12 |
| | | GEE-Poly | 1.6873 | 1.6803 | 1.6816 | 0.0584 | 0.0564 | 0.0565 | 0.0964 | 0.0966 | 0.0962 | 0.6162 | 0.6162 | 0.6162 | 0.3899 | 0.3882 | 0.3867 | 2 |
| 200 | 0.0 | Profile-BEL | 0.1499 | 0.1499 | 0.1499 | 0.0399 | 0.0399 | 0.0399 | 0.0405 | 0.0405 | 0.0405 | 0.0950 | 0.0950 | 0.0950 | 0.0810 | 0.0810 | 0.0810 | 6 |
| | | Naive-EL | 0.1496 | 0.1496 | 0.1496 | 0.0413 | 0.0414 | 0.0413 | 0.0849 | 0.0849 | 0.0849 | 0.6162 | 0.6162 | 0.6162 | 0.1625 | 0.1625 | 0.1625 | 8 |
| | | GEE-Wald | 0.1496 | 0.1496 | 0.1496 | 0.0414 | 0.0413 | 0.0414 | 0.0851 | 0.0851 | 0.0851 | 0.6119 | 0.6119 | 0.6119 | 0.2122 | 0.2122 | 0.2122 | 12 |
| | | GEE-Poly | 1.7325 | 1.7327 | 1.7331 | 0.0411 | 0.0413 | 0.0412 | 0.0848 | 0.0852 | 0.0850 | 0.6214 | 0.6214 | 0.6214 | 0.4346 | 0.4344 | 0.4346 | 2 |
| 200 | 0.3 | Profile-BEL | 0.1510 | 0.1510 | 0.1510 | 0.0400 | 0.0400 | 0.0400 | 0.0412 | 0.0412 | 0.0412 | 0.0957 | 0.0957 | 0.0957 | 0.0805 | 0.0805 | 0.0805 | 6 |
| | | Naive-EL | 0.1507 | 0.1507 | 0.1507 | 0.0416 | 0.0416 | 0.0416 | 0.0860 | 0.0860 | 0.0860 | 0.6197 | 0.6197 | 0.6197 | 0.1622 | 0.1622 | 0.1622 | 8 |
| | | GEE-Wald | 0.1507 | 0.1507 | 0.1507 | 0.0416 | 0.0416 | 0.0416 | 0.0859 | 0.0859 | 0.0859 | 0.6144 | 0.6144 | 0.6144 | 0.2122 | 0.2122 | 0.2122 | 12 |
| | | GEE-Poly | 1.7318 | 1.7278 | 1.7296 | 0.0414 | 0.0412 | 0.0411 | 0.0855 | 0.0853 | 0.0854 | 0.6217 | 0.6217 | 0.6217 | 0.4342 | 0.4322 | 0.4334 | 2 |
| 200 | 0.6 | Profile-BEL | 0.1523 | 0.1523 | 0.1523 | 0.0398 | 0.0398 | 0.0398 | 0.0419 | 0.0419 | 0.0419 | 0.0961 | 0.0961 | 0.0961 | 0.0804 | 0.0804 | 0.0804 | 6 |
| | | Naive-EL | 0.1520 | 0.1520 | 0.1519 | 0.0416 | 0.0416 | 0.0416 | 0.0865 | 0.0865 | 0.0865 | 0.6194 | 0.6093 | 0.6194 | 0.1622 | 0.1622 | 0.1622 | 8 |
| | | GEE-Wald | 0.1519 | 0.1519 | 0.1519 | 0.0415 | 0.0415 | 0.0415 | 0.0863 | 0.0863 | 0.0863 | 0.6093 | 0.6194 | 0.6093 | 0.2124 | 0.2124 | 0.2124 | 12 |
| | | GEE-Poly | 1.7304 | 1.7216 | 1.7261 | 0.0415 | 0.0405 | 0.0404 | 0.0862 | 0.0852 | 0.0857 | 0.6221 | 0.6221 | 0.6221 | 0.4333 | 0.4312 | 0.4318 | 2 |

Table 2: Bernoulli case. Column definitions as in Table 1.

| n | ρ | Method | β̂1 Ind. | β̂1 Exc. | β̂1 AR(1) | β̂2 Ind. | β̂2 Exc. | β̂2 AR(1) | β̂3 Ind. | β̂3 Exc. | β̂3 AR(1) | Ang Ind. | Ang Exc. | Ang AR(1) | ISE Ind. | ISE Exc. | ISE AR(1) | K (med) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.0 | Profile-BEL | 0.2026 | 0.2026 | 0.2026 | 0.1347 | 0.1347 | 0.1347 | 0.1840 | 0.1840 | 0.1840 | 0.3878 | 0.3878 | 0.3878 | 1.0929 | 1.0929 | 1.0929 | 6 |
| | | Naive-EL | 0.1920 | 0.1920 | 0.1920 | 0.1311 | 0.1311 | 0.1311 | 0.1803 | 0.1803 | 0.1803 | 0.5767 | 0.5767 | 0.5767 | 3.2271 | 3.2271 | 3.2271 | 8 |
| | | GEE-Wald | 0.2477 | 0.2477 | 0.2477 | 0.1359 | 0.1359 | 0.1359 | 0.2180 | 0.2180 | 0.2180 | 0.6378 | 0.6378 | 0.6378 | 1.4955 | 1.4955 | 1.4955 | 12 |
| | | GEE-Poly | 2.0976 | 2.0968 | 2.1063 | 0.1330 | 0.1333 | 0.1321 | 0.1796 | 0.1807 | 0.1774 | 0.5102 | 0.5102 | 0.5102 | 0.4949 | 0.4927 | 0.5035 | 2 |
| 100 | 0.3 | Profile-BEL | 0.1974 | 0.1974 | 0.1974 | 0.1171 | 0.1171 | 0.1171 | 0.1784 | 0.1784 | 0.1784 | 0.3101 | 0.3101 | 0.3101 | 2.7356 | 2.7356 | 2.7356 | 6 |
| | | Naive-EL | 0.1837 | 0.1837 | 0.1837 | 0.1145 | 0.1145 | 0.1145 | 0.1871 | 0.1871 | 0.1871 | 0.4964 | 0.4964 | 0.4964 | 4.7557 | 4.7557 | 4.7557 | 8 |
| | | GEE-Wald | 0.2172 | 0.2172 | 0.2172 | 0.1071 | 0.1071 | 0.1071 | 0.1685 | 0.1685 | 0.1685 | 0.6304 | 0.6304 | 0.6304 | 4.4098 | 4.4098 | 4.4098 | 12 |
| | | GEE-Poly | 1.8148 | 1.8197 | 1.8218 | 0.1102 | 0.1109 | 0.1104 | 0.1744 | 0.1752 | 0.1739 | 0.5635 | 0.5635 | 0.5635 | 0.3823 | 0.3812 | 0.3832 | 2 |
| 100 | 0.6 | Profile-BEL | 0.1597 | 0.1597 | 0.1597 | 0.1424 | 0.1424 | 0.1424 | 0.1770 | 0.1770 | 0.1770 | 0.2792 | 0.2792 | 0.2792 | 4.4703 | 4.4703 | 4.4703 | 6 |
| | | Naive-EL | 0.1503 | 0.1503 | 0.1503 | 0.1303 | 0.1303 | 0.1303 | 0.1726 | 0.1726 | 0.1726 | 0.5266 | 0.5266 | 0.5266 | 8.8542 | 8.8542 | 8.8542 | 8 |
| | | GEE-Wald | 0.1513 | 0.1513 | 0.1513 | 0.1408 | 0.1408 | 0.1408 | 0.1778 | 0.1778 | 0.1778 | 0.5774 | 0.5774 | 0.5774 | 13.6590 | 13.6590 | 13.6590 | 12 |
| | | GEE-Poly | 1.9911 | 1.9837 | 1.9885 | 0.1344 | 0.1332 | 0.1355 | 0.1697 | 0.1716 | 0.1681 | 0.4564 | 0.4564 | 0.4564 | 0.4665 | 0.4647 | 0.4698 | 2 |
| 200 | 0.0 | Profile-BEL | 0.1577 | 0.1577 | 0.1577 | 0.0820 | 0.0820 | 0.0820 | 0.0801 | 0.0801 | 0.0801 | 0.2676 | 0.2676 | 0.2676 | 0.2967 | 0.2967 | 0.2967 | 6 |
| | | Naive-EL | 0.1532 | 0.1532 | 0.1532 | 0.0825 | 0.0825 | 0.0825 | 0.1243 | 0.1243 | 0.1243 | 0.5412 | 0.5412 | 0.5412 | 0.5018 | 0.5018 | 0.5018 | 8 |
| | | GEE-Wald | 0.1546 | 0.1546 | 0.1546 | 0.0740 | 0.0740 | 0.0740 | 0.1172 | 0.1172 | 0.1172 | 0.4531 | 0.4531 | 0.4531 | 1.1278 | 1.1278 | 1.1278 | 12 |
| | | GEE-Poly | 1.6207 | 1.6224 | 1.6169 | 0.1006 | 0.1001 | 0.1003 | 0.1450 | 0.1445 | 0.1447 | 0.6309 | 0.6309 | 0.6309 | 0.3344 | 0.3351 | 0.3331 | 2 |
| 200 | 0.3 | Profile-BEL | 0.1422 | 0.1422 | 0.1422 | 0.1021 | 0.1021 | 0.1021 | 0.0792 | 0.0792 | 0.0792 | 0.2720 | 0.2720 | 0.2720 | 0.2925 | 0.2925 | 0.2925 | 6 |
| | | Naive-EL | 0.1342 | 0.1342 | 0.1342 | 0.0990 | 0.0990 | 0.0990 | 0.1061 | 0.1061 | 0.1061 | 0.4620 | 0.4620 | 0.4620 | 0.6091 | 0.6091 | 0.6091 | 8 |
| | | GEE-Wald | 0.1359 | 0.1359 | 0.1359 | 0.0910 | 0.0910 | 0.0910 | 0.1125 | 0.1125 | 0.1125 | 0.4610 | 0.4610 | 0.4610 | 0.7399 | 0.7399 | 0.7399 | 12 |
| | | GEE-Poly | 1.8000 | 1.7951 | 1.7959 | 0.1149 | 0.1154 | 0.1144 | 0.1336 | 0.1338 | 0.1333 | 0.6239 | 0.6239 | 0.6239 | 0.3722 | 0.3731 | 0.3714 | 2 |
| 200 | 0.6 | Profile-BEL | 0.1503 | 0.1503 | 0.1503 | 0.1271 | 0.1271 | 0.1271 | 0.0788 | 0.0788 | 0.0788 | 0.2412 | 0.2412 | 0.2412 | 0.2966 | 0.2966 | 0.2966 | 6 |
| | | Naive-EL | 0.1486 | 0.1486 | 0.1486 | 0.1306 | 0.1306 | 0.1306 | 0.1307 | 0.1307 | 0.1307 | 0.4894 | 0.4894 | 0.4894 | 0.7119 | 0.7119 | 0.7119 | 8 |
| | | GEE-Wald | 0.1607 | 0.1607 | 0.1607 | 0.1175 | 0.1175 | 0.1175 | 0.1309 | 0.1309 | 0.1309 | 0.5114 | 0.5114 | 0.5114 | 14.7871 | 14.7871 | 14.7871 | 12 |
| | | GEE-Poly | 1.9209 | 1.9300 | 1.9280 | 0.1411 | 0.1432 | 0.1413 | 0.1426 | 0.1433 | 0.1435 | 0.6225 | 0.6225 | 0.6225 | 0.3913 | 0.3958 | 0.3924 | 2 |

Table 3: Poisson case. Column definitions as in Table 1.

| n | ρ | Method | β̂1 Ind. | β̂1 Exc. | β̂1 AR(1) | β̂2 Ind. | β̂2 Exc. | β̂2 AR(1) | β̂3 Ind. | β̂3 Exc. | β̂3 AR(1) | Ang Ind. | Ang Exc. | Ang AR(1) | ISE Ind. | ISE Exc. | ISE AR(1) | K (med) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.0 | Profile-BEL | 0.2303 | 0.2303 | 0.2303 | 0.0669 | 0.0669 | 0.0669 | 0.0609 | 0.0609 | 0.0609 | 0.1562 | 0.1562 | 0.1562 | 0.2601 | 0.2601 | 0.2601 | 10 |
| | | Naive-EL | 0.2713 | 0.2713 | 0.2713 | 0.0741 | 0.0741 | 0.0741 | 0.0983 | 0.0983 | 0.0983 | 0.6339 | 0.6339 | 0.6339 | 0.2605 | 0.2065 | 0.2065 | 8 |
| | | GEE-Wald | 0.2663 | 0.2663 | 0.2663 | 0.0724 | 0.0724 | 0.0724 | 0.0997 | 0.0997 | 0.0997 | 0.6332 | 0.6332 | 0.6332 | 0.3068 | 0.3068 | 0.3068 | 12 |
| | | GEE-Poly | 1.3390 | 2.7788 | 1.3381 | 0.0758 | 0.5585 | 0.0757 | 0.0970 | 0.2965 | 0.0970 | 0.6276 | 0.6273 | 0.6276 | 0.5192 | 3.4769 | 0.5201 | 2 |
| 100 | 0.3 | Profile-BEL | 0.2346 | 0.2346 | 0.2346 | 0.0657 | 0.0657 | 0.0657 | 0.0641 | 0.0641 | 0.0641 | 0.1476 | 0.1476 | 0.1476 | 0.1711 | 0.1711 | 0.1711 | 10 |
| | | Naive-EL | 0.2732 | 0.2732 | 0.2732 | 0.0688 | 0.0688 | 0.0688 | 0.0999 | 0.0999 | 0.0999 | 0.6271 | 0.6271 | 0.6271 | 0.1992 | 0.1992 | 0.1992 | 8 |
| | | GEE-Wald | 0.2679 | 0.2679 | 0.2679 | 0.0680 | 0.0680 | 0.0680 | 0.0991 | 0.0991 | 0.0991 | 0.6250 | 0.6250 | 0.6250 | 0.2899 | 0.2899 | 0.2899 | 12 |
| | | GEE-Poly | 1.3427 | 1.3328 | 1.3393 | 0.0715 | 0.1002 | 0.0718 | 0.0999 | 0.0996 | 0.0996 | 0.6328 | 0.6328 | 0.6328 | 0.5177 | 0.5214 | 0.5144 | 2 |
| 100 | 0.6 | Profile-BEL | 0.2398 | 0.2398 | 0.2398 | 0.0720 | 0.0720 | 0.0720 | 0.0683 | 0.0683 | 0.0683 | 0.1523 | 0.1523 | 0.1523 | 0.2230 | 0.2230 | 0.2230 | 10 |
| | | Naive-EL | 0.2781 | 0.2781 | 0.2781 | 0.0732 | 0.0732 | 0.0732 | 0.1011 | 0.1011 | 0.1011 | 0.6352 | 0.6352 | 0.6352 | 0.1767 | 0.1767 | 0.1767 | 8 |
| | | GEE-Wald | 0.2723 | 0.2723 | 0.2723 | 0.0736 | 0.0736 | 0.0736 | 0.1014 | 0.1014 | 0.1014 | 0.6300 | 0.6300 | 0.6300 | 0.3614 | 0.3614 | 0.3614 | 12 |
| | | GEE-Poly | 1.3531 | 1.3301 | 1.3446 | 0.0751 | 0.1197 | 0.0745 | 0.1003 | 0.0994 | 0.0992 | 0.6349 | 0.6366 | 0.6349 | 0.5169 | 0.5177 | 0.5072 | 2 |
| 200 | 0.0 | Profile-BEL | 0.2327 | 0.2327 | 0.2327 | 0.0497 | 0.0497 | 0.0497 | 0.0432 | 0.0432 | 0.0432 | 0.1025 | 0.1025 | 0.1025 | 0.1662 | 0.1662 | 0.1662 | 10 |
| | | Naive-EL | 0.2733 | 0.2733 | 0.2733 | 0.0558 | 0.0558 | 0.0558 | 0.0846 | 0.0846 | 0.0846 | 0.6331 | 0.6331 | 0.6331 | 0.1806 | 0.1806 | 0.1806 | 8 |
| | | GEE-Wald | 0.2710 | 0.2710 | 0.2710 | 0.0542 | 0.0542 | 0.0542 | 0.0845 | 0.0845 | 0.0845 | 0.6335 | 0.6335 | 0.6335 | 0.2267 | 0.2267 | 0.2267 | 12 |
| | | GEE-Poly | 1.3938 | 1.4095 | 1.3951 | 0.0566 | 0.3027 | 0.0567 | 0.0844 | 0.0886 | 0.0844 | 0.6328 | 0.6324 | 0.6328 | 0.5864 | 1.1599 | 0.5865 | 2 |
| 200 | 0.3 | Profile-BEL | 0.2334 | 0.2334 | 0.2334 | 0.0503 | 0.0503 | 0.0503 | 0.0441 | 0.0441 | 0.0441 | 0.1042 | 0.1042 | 0.1042 | 0.1486 | 0.1486 | 0.1486 | 10 |
| | | Naive-EL | 0.2739 | 0.2739 | 0.2739 | 0.0557 | 0.0557 | 0.0557 | 0.0859 | 0.0859 | 0.0859 | 0.6326 | 0.6326 | 0.6326 | 0.1732 | 0.1732 | 0.1732 | 8 |
| | | GEE-Wald | 0.2709 | 0.2709 | 0.2709 | 0.0549 | 0.0549 | 0.0549 | 0.0855 | 0.0855 | 0.0855 | 0.6341 | 0.6341 | 0.6341 | 0.2526 | 0.2526 | 0.2526 | 12 |
| | | GEE-Poly | 1.4006 | 1.3947 | 1.3989 | 0.0566 | 0.0664 | 0.0561 | 0.0854 | 0.0849 | 0.0849 | 0.6326 | 0.6325 | 0.6326 | 0.5791 | 0.5889 | 0.5796 | 2 |
| 200 | 0.6 | Profile-BEL | 0.2327 | 0.2327 | 0.2327 | 0.0526 | 0.0526 | 0.0526 | 0.0458 | 0.0458 | 0.0458 | 0.1041 | 0.1041 | 0.1041 | 0.1555 | 0.1555 | 0.1555 | 10 |
| | | Naive-EL | 0.2735 | 0.2735 | 0.2735 | 0.0565 | 0.0565 | 0.0565 | 0.0862 | 0.0862 | 0.0862 | 0.6296 | 0.6296 | 0.6296 | 0.1699 | 0.1699 | 0.1699 | 8 |
| | | GEE-Wald | 0.2706 | 0.2706 | 0.2706 | 0.0559 | 0.0559 | 0.0559 | 0.0863 | 0.0863 | 0.0863 | 0.6287 | 0.6287 | 0.6287 | 0.2322 | 0.2322 | 0.2322 | 12 |
| | | GEE-Poly | 1.3965 | 1.3499 | 1.3966 | 0.0574 | 0.1158 | 0.0561 | 0.0861 | 0.0830 | 0.0849 | 0.6327 | 0.6324 | 0.6327 | 0.5862 | 0.6572 | 0.5852 | 2 |

Table 4: Empirical coverage (Cover) and average length (Len) of nominal 95% CIs for selected parameters. W-Cor denotes the working correlation.

| Case | W-Cor | Method | β1 Cover | β1 Len | β2 Cover | β2 Len | α2 Cover | α2 Len |
|---|---|---|---|---|---|---|---|---|
| Gaussian | Independence | Profile-BEL | 0.6150 | 0.2875 | 0.9900 | 0.2594 | 0.9650 | 0.4423 |
| | | Naive-EL | 0.4750 | 0.2092 | 0.9500 | 0.2219 | 0.7950 | 0.5165 |
| | | GEE-Wald | 0.5200 | 0.2394 | 0.9550 | 0.2194 | 0.9500 | 0.7150 |
| | | GEE-Poly | 0.0350 | 1.5385 | 0.9500 | 0.2237 | 0.4500 | 0.3862 |
| | Exchangeable | Profile-BEL | 0.6150 | 0.2875 | 0.9900 | 0.2594 | 0.9650 | 0.4423 |
| | | Naive-EL | 0.4750 | 0.2092 | 0.9500 | 0.2219 | 0.7950 | 0.5165 |
| | | GEE-Wald | 0.5200 | 0.2394 | 0.9550 | 0.2194 | 0.9500 | 0.7150 |
| | | GEE-Poly | 0.0250 | 1.5254 | 0.9550 | 0.2215 | 0.4500 | 0.3862 |
| | AR(1) | Profile-BEL | 0.6150 | 0.2875 | 0.9900 | 0.2594 | 0.9650 | 0.4423 |
| | | Naive-EL | 0.4750 | 0.2092 | 0.9500 | 0.2219 | 0.7950 | 0.5165 |
| | | GEE-Wald | 0.5200 | 0.2394 | 0.9550 | 0.2194 | 0.9500 | 0.7150 |
| | | GEE-Poly | 0.0250 | 1.5190 | 0.9500 | 0.2206 | 0.4500 | 0.3862 |
| Bernoulli | Independence | Profile-BEL | 0.8450 | 0.6227 | 0.9650 | 0.6259 | 0.9600 | 1.1121 |
| | | Naive-EL | 0.7800 | 0.7062 | 0.9300 | 0.5169 | 0.9400 | 1.3321 |
| | | GEE-Wald | 0.7550 | 0.8286 | 0.9200 | 0.5193 | 0.9050 | 2.1365 |
| | | GEE-Poly | 0.4950 | 3.9400 | 0.9000 | 0.5028 | 0.9600 | 0.9839 |
| | Exchangeable | Profile-BEL | 0.8450 | 0.6227 | 0.9650 | 0.6259 | 0.9600 | 1.1121 |
| | | Naive-EL | 0.7800 | 0.7062 | 0.9300 | 0.5169 | 0.9400 | 1.3321 |
| | | GEE-Wald | 0.7550 | 0.8286 | 0.9200 | 0.5193 | 0.9050 | 2.1365 |
| | | GEE-Poly | 0.4750 | 3.9284 | 0.8950 | 0.5016 | 0.9600 | 0.9839 |
| | AR(1) | Profile-BEL | 0.8450 | 0.6227 | 0.9650 | 0.6259 | 0.9600 | 1.1121 |
| | | Naive-EL | 0.7800 | 0.7062 | 0.9300 | 0.5169 | 0.9400 | 1.3321 |
| | | GEE-Wald | 0.7550 | 0.8286 | 0.9200 | 0.5193 | 0.9050 | 2.1365 |
| | | GEE-Poly | 0.4750 | 3.9329 | 0.9000 | 0.5012 | 0.9600 | 0.9839 |
| Poisson | Independence | Profile-BEL | 0.4300 | 0.2888 | 0.9250 | 0.2458 | 0.9850 | 0.6272 |
| | | Naive-EL | 0.1400 | 0.1071 | 0.4950 | 0.0778 | 0.6800 | 0.4960 |
| | | GEE-Wald | 0.2500 | 0.2448 | 0.8500 | 0.2188 | 0.9700 | 1.0678 |
| | | GEE-Poly | 0.0650 | 1.1656 | 0.8850 | 0.2370 | 0.6600 | 0.4940 |
| | Exchangeable | Profile-BEL | 0.4300 | 0.2888 | 0.9250 | 0.2458 | 0.9850 | 0.6272 |
| | | Naive-EL | 0.1400 | 0.1071 | 0.4950 | 0.0778 | 0.6800 | 0.4960 |
| | | GEE-Wald | 0.2500 | 0.2448 | 0.8500 | 0.2188 | 0.9700 | 1.0678 |
| | | GEE-Poly | 0.1285 | 1.2930 | 0.8827 | 0.3480 | 0.6648 | 0.5058 |
| | AR(1) | Profile-BEL | 0.4300 | 0.2888 | 0.9250 | 0.2458 | 0.9850 | 0.6272 |
| | | Naive-EL | 0.1400 | 0.1071 | 0.4950 | 0.0778 | 0.6800 | 0.4960 |
| | | GEE-Wald | 0.2500 | 0.2448 | 0.8500 | 0.2188 | 0.9700 | 1.0678 |
| | | GEE-Poly | 0.0450 | 1.1291 | 0.8850 | 0.2322 | 0.6600 | 0.4940 |

[Figure 1 panels: horizontal axis $u = \boldsymbol{\alpha}^{\top}\mathbf{z}$, vertical axis $\eta(u)$; curves: true $\eta(u)$, estimated $\eta(u)$, 95% bootstrap band.]

Figure 1: Representative fit of $\eta_0(u)$ and $\hat{\eta}(u)$ with 95% bootstrap bands under the Bernoulli case. (a) Top left: $n = 200$, $\rho = 0.0$; (b) Top right: $n = 200$, $\rho = 0.3$; (c) Bottom: $n = 200$, $\rho = 0.6$.

[Figure 2 panels: horizontal axis $u = \boldsymbol{\alpha}^{\top}\mathbf{z}$, vertical axis $\eta(u)$; curves: true $\eta(u)$, estimated $\eta(u)$, 95% bootstrap band.]

Figure 2: Representative fit of $\eta_0(u)$ and $\hat{\eta}(u)$ with 95% bootstrap bands under the Gaussian case. (a) Left: $n = 100$, $\rho = 0.0$; (b) Right: $n = 200$, $\rho = 0.0$.

5.5 Sensitivity Analyses

We conduct sensitivity analyses to assess how the competing procedures behave when key implementation choices are perturbed. In particular, we vary the working correlation among independence, exchangeable, and AR(1), and we examine robustness across outcome families. Throughout, the spline dimension $K$ is selected by BIC over the same candidate grid used in the main experiments, so the comparison reflects the intended data-driven implementation rather than hand tuning.

Tables 1–3 show that the point estimation conclusions are largely insensitive to the working correlation choice, in the sense that the relative ordering of methods in RMSE/ISE-type summaries remains stable across the three working correlation structures. In particular, Profile-BEL and GEE-Wald (which share the same estimating-equation backbone) yield comparable point estimates under different working correlations, while Naive-EL is essentially unchanged across correlation settings by construction. The polynomial GEE competitor exhibits the greatest variability, both in magnitude and in rank, especially for the more difficult components and for non-Gaussian responses, indicating that misspecification in its mean/working-structure interplay can translate into noticeably less stable finite-sample performance.

Table 4 summarizes inference sensitivity via empirical coverage and average CI length. Two patterns emerge. First, Profile-BEL is largely insensitive to the working correlation choice: for each outcome family, its coverages and lengths change only mildly when switching among the three working correlation structures, consistent with the blockwise construction based on estimating equations and calibration that does not hinge on correctly specifying the working correlation.
Second, Profile-BEL delivers a more favorable coverage–length trade-off for the harder parameters, most notably for $\beta_1$ and $\alpha_2$. In the Gaussian case, coverage for $\beta_1$ improves relative to Naive-EL and GEE-Wald while keeping CI length moderate, and for $\alpha_2$ Profile-BEL is close to nominal coverage with substantially shorter intervals than GEE-Wald. Similar behavior is seen under Bernoulli outcomes, where Profile-BEL maintains near-nominal coverage for $\alpha_2$ and improves calibration for $\beta_1$ without inflating lengths. Under the Poisson design, inference for $\beta_1$ is challenging for all methods, yet Profile-BEL still dominates the baselines in coverage for $\beta_1$ while remaining well-calibrated for $\beta_2$ and $\alpha_2$.

Overall, these sensitivity results reinforce the main message: Profile-BEL provides the most robust and practically useful inference across correlation specifications and outcome types, whereas the polynomial GEE alternative can be substantially less stable in calibration and/or interval length for the more difficult components.

6 Real Data Application

6.1 Data Description

We use the epil dataset from the MASS package in R, a public longitudinal epilepsy study with 1852 observations. The outcome is the seizure count recorded repeatedly for each subject across follow-up periods, which naturally motivates a Poisson-type mean model with within-subject correlation.

Let $Y_{ij}$ denote the seizure count for subject $i$ at visit/period $j$, and let $t_{ij} \in [0, 1]$ be the rescaled time index. We include a treatment indicator and baseline severity (e.g., baseline seizure frequency) as covariates, together with demographic adjustment variables (e.g., age). All continuous covariates are standardized.
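The two preprocessing steps mentioned above, rescaling time to $[0, 1]$ and standardizing continuous covariates, might look as follows. This is only a sketch: the paper does not state how $t_{ij}$ is rescaled, so min-max scaling is an assumption, and the helper names and toy values below are hypothetical rather than taken from the epilepsy data.

```python
import numpy as np

def rescale_unit(v):
    """Min-max rescale a vector to [0, 1] (assumed form of the time rescaling)."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def standardize(v):
    """Center to mean 0 and scale to unit sample standard deviation."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

# Hypothetical toy values, for illustration only
period = np.array([1, 2, 3, 4])
age = np.array([31.0, 30.0, 25.0, 36.0])

t = rescale_unit(period)      # rescaled time index in [0, 1]
age_std = standardize(age)    # standardized continuous covariate
print(t, age_std)
```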
6.2 Implementation

We fit the same semiparametric longitudinal model as in the simulation section:
$$\log\{E(Y_{ij} \mid X_{ij}, t_{ij})\} = X_{ij}^{\top}\theta + \eta(t_{ij}), \qquad (19)$$
where $\theta$ is the finite-dimensional target and $\eta(\cdot)$ is an unknown smooth time effect. We approximate $\eta(t)$ by a spline basis with dimension $K$, and select $K$ by BIC over the same candidate grid used in the simulation (e.g., $K \in \{6, 8, 10, 12\}$). The working correlation is taken as independence, AR(1), and exchangeable, matching the simulation design.

We compare the same four procedures as in Section 5.2: Profile-BEL, Naive-EL, GEE-Wald and GEE-Poly. For Profile-BEL, the point estimator updates the nuisance parameter (including $\alpha$) iteratively, and inference for $\boldsymbol{\theta}$ is obtained by profiling the BEL statistic. For the functional component, we construct a 95% bootstrap pointwise band for $\eta(\cdot)$ using resampling with $B$ bootstrap replicates.

6.3 Evaluation Criteria

In the real-data analysis, the true $\theta$ and the unknown link $\eta(\cdot)$ are not observable, so we compare methods using evaluation criteria that avoid reliance on ground truth. We therefore assess predictive performance through $K$-fold cross-validation; for count outcomes, we report the held-out Poisson deviance (equivalently, the negative log-likelihood up to an additive constant), where smaller values indicate better out-of-sample fit. For scientific interpretation and uncertainty quantification, we focus on key components of $\theta$ (e.g., treatment and baseline severity effects) and summarize each method by its 95% confidence-interval length as well as how sensitive that length is to the working-correlation specification (independence/AR(1)/exchangeable).
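For concreteness, the held-out Poisson deviance mentioned above can be computed as in the sketch below, using the standard definition $D = 2\sum\{y\log(y/\mu) - (y - \mu)\}$ with the convention $y\log(y/\mu) = 0$ when $y = 0$. The function name `poisson_deviance` and the toy inputs are hypothetical, and whether the paper reports the total or a per-observation average is not stated, so both are shown.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum{ y*log(y/mu) - (y - mu) }.

    y:  observed counts on the held-out fold.
    mu: predicted means (strictly positive) for the same observations.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Use ratio 1 where y == 0 so that y*log(ratio) evaluates to exactly 0
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

# Hypothetical held-out counts and predicted means
y_test = np.array([0, 2, 5, 1])
mu_test = np.array([0.5, 2.0, 4.0, 1.5])

total = poisson_deviance(y_test, mu_test)
per_obs = total / y_test.size   # assumption: an averaged version may be what is tabulated
print(total, per_obs)
```

In a $K$-fold scheme for longitudinal data, folds would typically be formed by subject rather than by observation, so that all visits of a subject fall in the same fold.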
Finally, to evaluate uncertainty for the nonparametric component, we compare the estimated shapes of $\eta(\cdot)$ and the widths of the associated 95% pointwise bands: bands that remain overly narrow despite poor predictive performance may indicate underestimated uncertainty, whereas uniformly very wide bands can reflect loss of efficiency.

To explicitly quantify how sensitive interval estimation is to the working-correlation choice, Table 7 reports a new stability metric, the range across correlation. For a given parameter $\theta_j$, define
$$\text{Range across corr.} = \max_{c \in \{\mathrm{ind}, \mathrm{ar1}, \mathrm{exc}\}}\{\text{CI length}(\theta_j; c)\} - \min_{c \in \{\mathrm{ind}, \mathrm{ar1}, \mathrm{exc}\}}\{\text{CI length}(\theta_j; c)\}.$$
Smaller values indicate more stable (i.e., less correlation-sensitive) uncertainty quantification across plausible working correlations.

6.4 Results

Tables 5–7 summarize the cross-validated deviance and the point estimates and CIs for selected coefficients in $\theta$. Figure 3 visualizes $\hat{\eta}(t)$: it shows the Profile-BEL estimate with its 95% bootstrap pointwise band and overlays the GEE-Poly fit for comparison.

Table 5: $K$-fold CV Poisson deviance.

| Method | Independence | AR(1) | Exchangeable |
|---|---|---|---|
| Profile-BEL | 4.5251 | 4.5113 | 5.0705 |
| GEE-Wald | 4.6351 | 4.6213 | 5.1805 |
| Naive-EL | 4.6357 | 4.6357 | 4.6357 |
| GEE-Poly | 4.6129 | 4.5849 | 4.7328 |

Across all methods, the treatment-related coefficient $\theta_{\mathrm{trt}}$ is estimated to be negative, but Profile-BEL delivers the most informative inference: under both independence and AR(1), its 95% confidence interval excludes zero while remaining substantially shorter than the corresponding intervals from GEE-Wald and GEE-Poly.
In contrast, the Wald-type intervals tend to be wider and often include zero, reflecting their heavier reliance on plug-in variance estimation in a moderate-sample correlated setting, and Naive-EL (by construction) yields essentially identical point/interval outputs regardless of the working correlation because it ignores within-subject dependence. The fact that Profile-BEL simultaneously sharpens uncertainty quantification and preserves the direction and magnitude of the estimated effect provides empirical support for the self-normalization advantage of empirical likelihood in this longitudinal semiparametric problem.

Table 6: Point estimates and 95% CIs under different working correlations.

| Coefficient | Method | Independence | AR(1) | Exchangeable |
|---|---|---|---|---|
| θ_trt | Profile-BEL | -0.163 [-0.306, -0.020] | -0.174 [-0.312, -0.037] | -0.023 [-0.219, 0.172] |
| | GEE-Wald | -0.153 [-0.488, 0.183] | -0.161 [-0.484, 0.161] | -0.005 [-0.464, 0.453] |
| | Naive-EL | -0.153 [-0.491, 0.186] | -0.153 [-0.491, 0.186] | -0.153 [-0.491, 0.186] |
| | GEE-Poly | -0.153 [-0.488, 0.183] | -0.161 [-0.479, 0.157] | 0.001 [-0.377, 0.379] |
| θ_base | Profile-BEL | 0.613 [0.585, 0.640] | 0.627 [0.599, 0.654] | 0.686 [0.641, 0.731] |
| | GEE-Wald | 0.605 [0.540, 0.670] | 0.617 [0.551, 0.682] | 0.674 [0.568, 0.780] |
| | Naive-EL | 0.605 [0.540, 0.670] | 0.605 [0.540, 0.670] | 0.605 [0.540, 0.670] |
| | GEE-Poly | 0.605 [0.540, 0.670] | 0.615 [0.550, 0.680] | 0.648 [0.551, 0.744] |
| θ_age | Profile-BEL | 0.148 [0.088, 0.209] | 0.169 [0.107, 0.230] | 0.304 [0.182, 0.426] |
| | GEE-Wald | 0.142 [0.000, 0.284] | 0.161 [0.015, 0.306] | 0.294 [0.008, 0.580] |
| | Naive-EL | 0.142 [-0.001, 0.286] | 0.142 [-0.001, 0.286] | 0.142 [-0.001, 0.286] |
| | GEE-Poly | 0.142 [0.000, 0.284] | 0.158 [0.014, 0.302] | 0.248 [-0.015, 0.512] |

Table 7: CI length and correlation-sensitivity.

| Coefficient | Method | Avg. CI length | Range across corr. |
|---|---|---|---|
| θ_trt | Profile-BEL | 0.3170 | 0.1161 |
| | GEE-Wald | 0.7441 | 0.2726 |
| | Naive-EL | 0.6765 | 0.0000 |
| | GEE-Poly | 0.6878 | 0.1204 |
| θ_base | Profile-BEL | 0.0670 | 0.0351 |
| | GEE-Wald | 0.1572 | 0.0823 |
| | Naive-EL | 0.1305 | 0.0000 |
| | GEE-Poly | 0.1507 | 0.0640 |

[Figure 3 panels: working correlation = independence, AR(1), exchangeable; horizontal axis $t$, vertical axis $\hat{\eta}(t)$; curves: Profile-BEL, 95% bootstrap band, Poly-GEE.]

Figure 3: Estimated time effect $\hat{\eta}(t)$ on the epilepsy dataset. All $\eta$ estimates are post-processed to satisfy the same identifiability constraint.

Beyond significance, the real-data analysis highlights stability and practical robustness. The cross-validated deviance favors Profile-BEL under plausible working correlations, indicating that properly accounting for dependence while profiling out the nonparametric link improves generalization rather than merely tightening intervals. Moreover, the CI-stability summaries show that Profile-BEL achieves the best length–stability trade-off: its average CI length is the smallest among competitors, and its range across correlation specifications is comparatively limited, suggesting reduced sensitivity to the working correlation choice. Taken together, the real-data findings align with the simulation evidence and reinforce the main takeaway of the paper: Profile-BEL offers a practical, dependence-aware inference strategy that produces tighter and more stable conclusions for longitudinal GPLSIMs.

7 Conclusion

We studied a generalized partially linear single-index model for longitudinal data, where the covariate effects are decomposed into a finite-dimensional linear component and an unknown smooth link along a low-dimensional index.
By combining spline-sieve profiling with estimating equations, we developed a practical inference framework that accommodates within-subject dependence while retaining a clear separation between the target parameter $\boldsymbol{\theta}$ and the nuisance function $\eta(\cdot)$. The proposed profile block empirical likelihood provides likelihood-type confidence regions for $\boldsymbol{\theta}$ without requiring explicit stabilization of sandwich variance estimators, and our asymptotic theory establishes a Wilks-type chi-square limit under mild regularity conditions.

From a methodological perspective, the main advantage of the proposed approach is its flexibility for longitudinal dependence. The inference is constructed at the subject level, which naturally respects the block structure of repeated measurements, and it remains applicable when the working correlation is only an approximation of the true dependence. The spline-based profiling step offers a convenient and computationally efficient way to estimate the unknown link function, while preserving root-$n$ inference for the finite-dimensional component. Our simulation results support these theoretical findings and suggest that the empirical-likelihood calibration can deliver stable coverage in moderate samples across a range of outcome types and correlation strengths.

Several extensions are of interest. First, the current framework assumes a bounded cluster size; it would be useful to study regimes where the number of repeated measurements grows with $n$, potentially requiring refined empirical-process arguments and alternative normalization. Second, one may incorporate more flexible dependence models, including time-varying correlation or random-effect structures, while retaining the estimating-equation foundation.
Third, the index structure can be enriched by allowing multiple indices, leading to an additive multi-index link $\sum_{\ell=1}^{L} \eta_\ell(\mathbf{z}^\top \boldsymbol{\alpha}_\ell)$ that balances interpretability and flexibility. Fourth, it is natural to consider high-dimensional linear components with structured regularization, where one can combine profiling with penalized estimating equations to enable variable selection in the presence of an unknown link function. Finally, extending the current methodology to handle irregular observation times, missingness mechanisms, and more complex measurement error structures would broaden its applicability in real longitudinal studies.

Overall, the proposed profile block empirical likelihood framework offers a principled and implementable route for inference in semiparametric longitudinal models with dimension reduction. We hope it will serve as a useful building block for more general dependence structures and richer functional components in future work.

References

Bai, Y., Fung, W. K. and Zhu, Z. Y. (2009). Penalized quadratic inference functions for single-index models with longitudinal data. Journal of Multivariate Analysis 100: 152–161.
Carroll, R. J., Fan, J., Gijbels, I. and Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association 92: 477–489.
Chang, H.-W. and McKeague, I. W. (2025). Empirical likelihood in functional data analysis. Annual Review of Statistics and Its Application 12: 425–448.
De Boor, C. (2001). A Practical Guide to Splines. Springer-Verlag New York.
Diggle, P., Heagerty, P., Liang, K.-Y. and Zeger, S. (2002). Analysis of Longitudinal Data. Oxford University Press.
Fan, J. and Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association 99: 710–723.
Geng, S. and Zhang, L. (2024). Decorrelated empirical likelihood for generalized linear models with high-dimensional longitudinal data. Statistics & Probability Letters 211: 110135.
Green, P. J. (1984). Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives. Journal of the Royal Statistical Society: Series B 46: 149–192.
Hall, P., Müller, H.-G. and Wang, J.-L. (2006). Properties of principal component methods for functional and longitudinal data analysis. The Annals of Statistics 34: 1493–1517.
Härdle, W., Hall, P. and Ichimura, H. (1993). Optimal smoothing in single-index models. The Annals of Statistics 21: 157–178.
He, X. and Shi, P. (2000). Parameters in semiparametric models. Journal of Multivariate Analysis 75.
Hu, S. and Xu, H. (2022). An efficient and robust inference method based on robust generalized estimating equations and empirical likelihood. Communications in Statistics - Theory and Methods 51: 994–1010.
Huang, J. Z., Wu, C. O. and Zhou, L. (2004). Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Statistica Sinica 14: 763–788.
Kolaczyk, E. D. (1994). Empirical likelihood in generalized linear models. Statistica Sinica: 199–218.
Liang, H. (2008). Generalized partially linear models with missing covariates. Journal of Multivariate Analysis 99: 880–895.
Liang, H., Liu, X., Li, R. and Tsai, C.-L. (2010). Estimation and testing for partially linear single-index models with longitudinal data. The Annals of Statistics 38: 3811–3836.
Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73: 13–22.
Lin, X. and Carroll, R. J. (2001). Semiparametric regression for clustered data using generalized estimating equations. Journal of the American Statistical Association 96: 1045–1056.
Liu, S. and Lian, H. (2018). Robust estimation and model identification for longitudinal data varying-coefficient model. Communications in Statistics - Theory and Methods 47: 2701–2719.
Nocedal, J. (1980). Updating quasi-Newton matrices with limited storage. Mathematics of Computation 35: 773–782.
Ouyang, J. and Bondell, H. (2023). Bayesian analysis of longitudinal data via empirical likelihood. Computational Statistics & Data Analysis 187: 107785.
Owen, A. B. (2001). Empirical Likelihood. Chapman and Hall/CRC.
Qin, G. and Zhu, Z. (2008). Robust estimation in partial linear mixed model for longitudinal data. Acta Mathematica Scientia 28: 333–347.
Qu, A., Lindsay, B. G. and Li, B. (2000). Improving generalised estimating equations using quadratic inference functions. Biometrika 87: 823–836.
Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge: Cambridge University Press.
Sheng, Y., Sun, Y., Huang, C.-Y. and Kim, M.-O. (2022). Synthesizing external aggregated information in the presence of population heterogeneity: A penalized empirical likelihood approach. Biometrics 78: 679–690.
Stone, C. J. (1985). Additive regression and other nonparametric models. The Annals of Statistics 13: 689–705.
Tan, X. and Yan, L. (2021). Penalized empirical likelihood for generalized linear models with longitudinal data. Communications in Statistics - Simulation and Computation 50: 608–623.
Wang, B.-H. and Liang, H.-Y. (2023). Empirical likelihood in single-index quantile regression with high dimensional and missing observations. Journal of Statistical Planning and Inference 226: 1–19.
Wang, Y.-G. and Carey, V. J. (2004). Unbiased estimating equations from working correlation models for irregularly timed repeated measures. Journal of the American Statistical Association 99: 845–853, doi:10.1198/016214504000001178.
Wu, H. and Zhang, J.-T. (2002). Local polynomial mixed-effects models for longitudinal data. Journal of the American Statistical Association 97: 883–897.
Xia, Y. and Härdle, W. (2006). Semiparametric estimation of partially linear single-index models. Journal of Multivariate Analysis 97: 1162–1184.
Xue, L. and Lian, H. (2016). Empirical likelihood for single-index models with responses missing at random. Science China Mathematics 59: 1187–1207.
Xue, L. and Zhu, L. (2006). Empirical likelihood for single-index models. Journal of Multivariate Analysis 97: 1295–1312.
You, J., Chen, G. and Zhou, Y. (2006). Block empirical likelihood for longitudinal partially linear regression models. Canadian Journal of Statistics 34: 79–96.
Yu, Y. and Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association 97: 1042–1054.
Yu, Z., He, B. and Chen, M. (2014). Empirical likelihood for generalized partially linear single-index models. Communications in Statistics - Theory and Methods 43: 4156–4163.
Zhang, Y., Qin, G., Zhu, Z. and Zhang, J. (2022). Empirical likelihood inference for longitudinal data with covariate measurement errors: An application to the lean study. Computational Statistics & Data Analysis 175: 107553.

A Proofs

The proofs of Section 4 rely on spline sieve approximation and uniform convergence of the profiled nuisance estimator, a linearization of the profile estimating equation, and a standard EL Lagrange-multiplier expansion. Throughout the appendix, $C$ denotes a generic positive constant that may change from line to line. All stochastic orders are with respect to $n \to \infty$.

A.1 Auxiliary Lemmas

Lemma 2 (Spline approximation). Under Assumptions 4–5, there exists a coefficient vector $\boldsymbol{\gamma}_0^* = \boldsymbol{\gamma}_0^*(K)$ such that
$$\sup_{u \in \mathcal{U}} \left| \eta_0(u) - \mathbf{B}(u)^\top \boldsymbol{\gamma}_0^* \right| \le C K^{-s}.$$

Proof. This is a standard property of polynomial spline approximation on a compact interval with quasi-uniform knots.
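Since $\eta_0$ is $s$ times continuously differentiable with bounded $s$-th derivative (Assumption 4), the spline space of dimension $K$ contains an approximant with sup-norm error $O(K^{-s})$. A detailed construction can be found in Huang et al. (2004) and He and Shi (2000).

Lemma 3 (Uniform consistency of the profiled nuisance). Under Assumptions 1–7, let $\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta})$ solve (8) and $\boldsymbol{\gamma}_0(\boldsymbol{\theta})$ solve (15). Then, uniformly over $\boldsymbol{\theta}$ in a neighborhood $\mathcal{N}$ of $\boldsymbol{\theta}_0$,
$$\| \hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}) - \boldsymbol{\gamma}_0(\boldsymbol{\theta}) \|_2 = O_P\!\left( \sqrt{\frac{K}{n}} + K^{-s} \right).$$
Consequently, uniformly over $\boldsymbol{\theta} \in \mathcal{N}$,
$$\sup_{u \in \mathcal{U}} \left| \mathbf{B}(u)^\top \hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}) - \mathbf{B}(u)^\top \boldsymbol{\gamma}_0(\boldsymbol{\theta}) \right| = O_P\!\left( \sqrt{\frac{K}{n}} + K^{-s} \right).$$

Proof. Fix a neighborhood $\mathcal{N}$ of $\boldsymbol{\theta}_0$. For every $\boldsymbol{\theta} \in \mathcal{N}$, define the sample and population inner estimating maps
$$\boldsymbol{\Phi}_n(\boldsymbol{\gamma}; \boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{B}_i(\boldsymbol{\theta})^\top \boldsymbol{\Delta}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}) \mathbf{V}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})^{-1} \{ \mathbf{Y}_i - \boldsymbol{\mu}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}) \},$$
$$\boldsymbol{\Phi}_0(\boldsymbol{\gamma}; \boldsymbol{\theta}) = E\!\left[ \mathbf{B}_i(\boldsymbol{\theta})^\top \boldsymbol{\Delta}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}) \mathbf{V}_i(\boldsymbol{\theta}, \boldsymbol{\gamma})^{-1} \{ \mathbf{Y}_i - \boldsymbol{\mu}_i(\boldsymbol{\theta}, \boldsymbol{\gamma}) \} \right].$$
By definition, $\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta})$ satisfies $\boldsymbol{\Phi}_n(\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}); \boldsymbol{\theta}) = \mathbf{0}$ and $\boldsymbol{\gamma}_0(\boldsymbol{\theta})$ satisfies $\boldsymbol{\Phi}_0(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta}) = \mathbf{0}$.

We first control the uniform stochastic fluctuation of $\boldsymbol{\Phi}_n$ around $\boldsymbol{\Phi}_0$. Under Assumptions 1, 2, 3, and 7, together with $\max_i m_i \le M$, the contributions are i.i.d. across $i$ with an integrable envelope, and the maps $(\boldsymbol{\theta}, \boldsymbol{\gamma}) \mapsto \boldsymbol{\Phi}_n(\boldsymbol{\gamma}; \boldsymbol{\theta})$ are uniformly Lipschitz in $(\boldsymbol{\theta}, \boldsymbol{\gamma})$ on $\mathcal{N} \times \Gamma_K$ for any bounded set $\Gamma_K \subset \mathbb{R}^K$ containing the relevant solutions. Moreover, because the effective dimension of the sieve component is $K$, standard empirical-process bounds for finite-dimensional sieve scores yield
$$\sup_{\boldsymbol{\theta} \in \mathcal{N}} \sup_{\boldsymbol{\gamma} \in \Gamma_K} \| \boldsymbol{\Phi}_n(\boldsymbol{\gamma}; \boldsymbol{\theta}) - \boldsymbol{\Phi}_0(\boldsymbol{\gamma}; \boldsymbol{\theta}) \|_2 = O_P\!\left( \sqrt{\frac{K}{n}} \right). \tag{20}$$

Next we establish local invertibility (uniformly in $\boldsymbol{\theta}$) of the population Jacobian with respect to $\boldsymbol{\gamma}$. Denote
$$\mathbf{J}_0(\boldsymbol{\gamma}; \boldsymbol{\theta}) = \frac{\partial}{\partial \boldsymbol{\gamma}^\top} \boldsymbol{\Phi}_0(\boldsymbol{\gamma}; \boldsymbol{\theta}).$$
By Assumptions 3 and 7, $\mathbf{J}_0(\boldsymbol{\gamma}; \boldsymbol{\theta})$ exists and is continuous in $(\boldsymbol{\gamma}, \boldsymbol{\theta})$. Furthermore, since $\boldsymbol{\Phi}_0(\boldsymbol{\gamma}; \boldsymbol{\theta})$ is a generalized least-squares normal equation in $\boldsymbol{\gamma}$, the matrix $-\mathbf{J}_0(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta})$ is positive definite and its eigenvalues are uniformly bounded away from $0$ and $\infty$ over $\boldsymbol{\theta} \in \mathcal{N}$. Consequently, there exists $c > 0$ such that
$$\sup_{\boldsymbol{\theta} \in \mathcal{N}} \left\| \mathbf{J}_0(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta})^{-1} \right\| \le c. \tag{21}$$
By the same continuity argument and (20), the sample Jacobian $\mathbf{J}_n(\boldsymbol{\gamma}; \boldsymbol{\theta}) = \partial \boldsymbol{\Phi}_n(\boldsymbol{\gamma}; \boldsymbol{\theta}) / \partial \boldsymbol{\gamma}^\top$ converges uniformly to $\mathbf{J}_0(\boldsymbol{\gamma}; \boldsymbol{\theta})$ on $\mathcal{N} \times \Gamma_K$, hence is invertible uniformly with probability tending to one.

We now relate $\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta})$ to $\boldsymbol{\gamma}_0(\boldsymbol{\theta})$. For any fixed $\boldsymbol{\theta} \in \mathcal{N}$, apply the mean value theorem to $\boldsymbol{\Phi}_n(\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}); \boldsymbol{\theta}) - \boldsymbol{\Phi}_n(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta})$: there exists $\tilde{\boldsymbol{\gamma}}(\boldsymbol{\theta})$ on the segment joining $\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta})$ and $\boldsymbol{\gamma}_0(\boldsymbol{\theta})$ such that
$$\mathbf{0} = \boldsymbol{\Phi}_n(\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}); \boldsymbol{\theta}) = \boldsymbol{\Phi}_n(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta}) + \mathbf{J}_n(\tilde{\boldsymbol{\gamma}}(\boldsymbol{\theta}); \boldsymbol{\theta}) \{ \hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}) - \boldsymbol{\gamma}_0(\boldsymbol{\theta}) \}.$$
Therefore,
$$\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}) - \boldsymbol{\gamma}_0(\boldsymbol{\theta}) = -\mathbf{J}_n(\tilde{\boldsymbol{\gamma}}(\boldsymbol{\theta}); \boldsymbol{\theta})^{-1} \boldsymbol{\Phi}_n(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta}).$$
Decompose the right-hand side as
$$\boldsymbol{\Phi}_n(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta}) = \{ \boldsymbol{\Phi}_n(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta}) - \boldsymbol{\Phi}_0(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta}) \} + \boldsymbol{\Phi}_0(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta}).$$
The second term is exactly $\mathbf{0}$ by definition of $\boldsymbol{\gamma}_0(\boldsymbol{\theta})$, hence
$$\boldsymbol{\Phi}_n(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta}) = \boldsymbol{\Phi}_n(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta}) - \boldsymbol{\Phi}_0(\boldsymbol{\gamma}_0(\boldsymbol{\theta}); \boldsymbol{\theta}).$$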
Combining this with the uniform invertibility of $\mathbf{J}_n$ and the uniform bound (20) gives, uniformly for $\boldsymbol{\theta} \in \mathcal{N}$,
$$\| \hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}) - \boldsymbol{\gamma}_0(\boldsymbol{\theta}) \|_2 = O_P\!\left( \sqrt{\frac{K}{n}} \right).$$

To incorporate the sieve approximation error, let $\boldsymbol{\gamma}_0^*(K)$ be the spline coefficient vector in Lemma 2, so that $\sup_{u \in \mathcal{U}} | \eta_0(u) - \mathbf{B}(u)^\top \boldsymbol{\gamma}_0^*(K) | \le C K^{-s}$. Under Assumptions 3 and 7, the map $\boldsymbol{\gamma} \mapsto \boldsymbol{\Phi}_0(\boldsymbol{\gamma}; \boldsymbol{\theta})$ is continuously differentiable with Jacobian uniformly nonsingular around $\boldsymbol{\gamma}_0(\boldsymbol{\theta})$. Because the population equation (15) is the sieve-restricted population score, the deviation between its root $\boldsymbol{\gamma}_0(\boldsymbol{\theta})$ and the oracle approximation $\boldsymbol{\gamma}_0^*(K)$ is controlled by the approximation error through a standard implicit-function perturbation argument, yielding an additional $O(K^{-s})$ term. Hence, uniformly on $\mathcal{N}$,
$$\| \hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}) - \boldsymbol{\gamma}_0(\boldsymbol{\theta}) \|_2 = O_P\!\left( \sqrt{\frac{K}{n}} + K^{-s} \right).$$
Finally, the stated sup-norm bound follows from boundedness of the spline basis on $\mathcal{U}$:
$$\sup_{u \in \mathcal{U}} \left| \mathbf{B}(u)^\top \hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}) - \mathbf{B}(u)^\top \boldsymbol{\gamma}_0(\boldsymbol{\theta}) \right| \le \sup_{u \in \mathcal{U}} \| \mathbf{B}(u) \|_2 \cdot \| \hat{\boldsymbol{\gamma}}(\boldsymbol{\theta}) - \boldsymbol{\gamma}_0(\boldsymbol{\theta}) \|_2,$$
which is $O_P(\sqrt{K/n} + K^{-s})$ uniformly over $\boldsymbol{\theta} \in \mathcal{N}$.

Lemma 4 (Linearization of the profile estimating equation). Under Assumptions 1–8, the sample profile map $\mathbf{U}_n(\boldsymbol{\theta}) = n^{-1} \sum_{i=1}^{n} \mathbf{g}_i(\boldsymbol{\theta})$ satisfies, uniformly for $\boldsymbol{\theta}$ in a neighborhood of $\boldsymbol{\theta}_0$,
$$\mathbf{U}_n(\boldsymbol{\theta}) = \mathbf{U}_n(\boldsymbol{\theta}_0) + \mathbf{H}_0 (\boldsymbol{\theta} - \boldsymbol{\theta}_0) + o_P(\| \boldsymbol{\theta} - \boldsymbol{\theta}_0 \|_2) + o_P(n^{-1/2}).$$
Moreover,
$$\sqrt{n}\, \mathbf{U}_n(\boldsymbol{\theta}_0) \Rightarrow N(\mathbf{0}, \mathbf{S}_0).$$

Proof. Write $\mathbf{U}_n(\boldsymbol{\theta}) = n^{-1} \sum_{i=1}^{n} \mathbf{g}_i(\boldsymbol{\theta})$ and $\mathbf{U}_{n,0}(\boldsymbol{\theta}) = n^{-1} \sum_{i=1}^{n} \mathbf{g}_{i,0}(\boldsymbol{\theta})$.
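We first show that replacing the profiled nuisance estimator $\hat{\boldsymbol{\gamma}}(\boldsymbol{\theta})$ by its population counterpart $\boldsymbol{\gamma}_0(\boldsymbol{\theta})$ only induces an asymptotically negligible error at the $n^{-1/2}$ scale. By Lemma 3 and the smoothness of the map $(\boldsymbol{\theta}, \boldsymbol{\gamma}) \mapsto \mathbf{g}_i(\boldsymbol{\theta})$ implied by Assumptions 3 and 7, there exists a neighborhood $\mathcal{N}$ of $\boldsymbol{\theta}_0$ such that
$$\sup_{\boldsymbol{\theta} \in \mathcal{N}} \| \mathbf{U}_n(\boldsymbol{\theta}) - \mathbf{U}_{n,0}(\boldsymbol{\theta}) \|_2 = O_P\!\left( \sqrt{\frac{K}{n}} + K^{-s} \right).$$
Under Assumption 6, the right-hand side is $o_P(n^{-1/2})$, hence uniformly on $\mathcal{N}$,
$$\mathbf{U}_n(\boldsymbol{\theta}) = \mathbf{U}_{n,0}(\boldsymbol{\theta}) + o_P(n^{-1/2}). \tag{22}$$

Next we linearize $\mathbf{U}_{n,0}(\boldsymbol{\theta})$ around $\boldsymbol{\theta}_0$. Let $\mathbf{H}_{i,0}(\boldsymbol{\theta}) = \partial \mathbf{g}_{i,0}(\boldsymbol{\theta}) / \partial \boldsymbol{\theta}^\top$. By Assumptions 2–7, $\mathbf{H}_{i,0}(\boldsymbol{\theta})$ exists, is continuous in $\boldsymbol{\theta}$, and is uniformly bounded on $\mathcal{N}$. Therefore, for any $\boldsymbol{\theta} \in \mathcal{N}$, a mean-value expansion yields
$$\mathbf{U}_{n,0}(\boldsymbol{\theta}) = \mathbf{U}_{n,0}(\boldsymbol{\theta}_0) + \left\{ \frac{1}{n} \sum_{i=1}^{n} \mathbf{H}_{i,0}(\tilde{\boldsymbol{\theta}}) \right\} (\boldsymbol{\theta} - \boldsymbol{\theta}_0),$$
where $\tilde{\boldsymbol{\theta}}$ lies on the line segment joining $\boldsymbol{\theta}$ and $\boldsymbol{\theta}_0$. By a uniform law of large numbers (using Assumptions 1 and 2),
$$\sup_{\boldsymbol{\theta} \in \mathcal{N}} \left\| \frac{1}{n} \sum_{i=1}^{n} \mathbf{H}_{i,0}(\boldsymbol{\theta}) - \mathbf{H}_0 \right\| = o_P(1),$$
so the derivative term can be replaced by $\mathbf{H}_0$ up to $o_P(1)$ uniformly on $\mathcal{N}$. Consequently,
$$\mathbf{U}_{n,0}(\boldsymbol{\theta}) = \mathbf{U}_{n,0}(\boldsymbol{\theta}_0) + \mathbf{H}_0 (\boldsymbol{\theta} - \boldsymbol{\theta}_0) + o_P(\| \boldsymbol{\theta} - \boldsymbol{\theta}_0 \|_2),$$
uniformly for $\boldsymbol{\theta} \in \mathcal{N}$.

Finally we establish the central limit theorem at $\boldsymbol{\theta}_0$. Since subjects are independent (Assumption 1) and $E \| \mathbf{g}_{i,0}(\boldsymbol{\theta}_0) \|_2^2 < \infty$ (Assumption 2), the multivariate Lindeberg–Feller CLT yields
$$\sqrt{n}\, \mathbf{U}_{n,0}(\boldsymbol{\theta}_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \mathbf{g}_{i,0}(\boldsymbol{\theta}_0) \Rightarrow N(\mathbf{0}, \mathbf{S}_0).$$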
Combining this CLT with (22) shows $\sqrt{n}\, \mathbf{U}_n(\boldsymbol{\theta}_0) \Rightarrow N(\mathbf{0}, \mathbf{S}_0)$, and substituting (22) into the above linearization of $\mathbf{U}_{n,0}(\boldsymbol{\theta})$ yields
$$\mathbf{U}_n(\boldsymbol{\theta}) = \mathbf{U}_n(\boldsymbol{\theta}_0) + \mathbf{H}_0 (\boldsymbol{\theta} - \boldsymbol{\theta}_0) + o_P(\| \boldsymbol{\theta} - \boldsymbol{\theta}_0 \|_2) + o_P(n^{-1/2}),$$
uniformly on $\mathcal{N}$. This completes the proof.

A.2 Proofs of Theorems

In this subsection we provide proofs for Theorems 1–3. Throughout, $\mathcal{N}$ denotes a sufficiently small open neighborhood of $\boldsymbol{\theta}_0$ on which the expansions in Lemmas 3 and 4 hold uniformly, and where Assumption 8 guarantees local uniqueness of the population root.

Proof of Theorem 1. We prove existence of a root $\hat{\boldsymbol{\theta}} \in \mathcal{N}$ of (9) with probability tending to one, the consistency $\hat{\boldsymbol{\theta}} \to \boldsymbol{\theta}_0$, and the stated rates for $\hat{\boldsymbol{\theta}}$ and $\hat{\eta}$.

Existence and consistency of a local root. Let $\mathbf{U}_n(\boldsymbol{\theta}) = n^{-1} \sum_{i=1}^{n} \mathbf{g}_i(\boldsymbol{\theta})$ be the sample profile estimating map, so that (9) is $\mathbf{U}_n(\boldsymbol{\theta}) = \mathbf{0}$, and let $\mathbf{U}_0(\boldsymbol{\theta}) = E\{ \mathbf{g}_{i,0}(\boldsymbol{\theta}) \}$ be its population counterpart. By Assumption 8, $\mathbf{U}_0(\boldsymbol{\theta})$ is continuously differentiable on $\mathcal{N}$, has a unique zero at $\boldsymbol{\theta}_0$, and $\mathbf{H}_0 = \partial \mathbf{U}_0(\boldsymbol{\theta}) / \partial \boldsymbol{\theta}^\top |_{\boldsymbol{\theta} = \boldsymbol{\theta}_0}$ is nonsingular. Thus, there exists $\varepsilon > 0$ such that on the sphere $\partial B(\boldsymbol{\theta}_0, \varepsilon)$,
$$\inf_{\| \boldsymbol{\theta} - \boldsymbol{\theta}_0 \|_2 = \varepsilon} \| \mathbf{U}_0(\boldsymbol{\theta}) \|_2 \ge c_0 > 0. \tag{23}$$
Lemma 4 implies $\sup_{\boldsymbol{\theta} \in \mathcal{N}} \| \mathbf{U}_n(\boldsymbol{\theta}) - \mathbf{U}_0(\boldsymbol{\theta}) \|_2 = o_P(1)$. Hence, with probability tending to one,
$$\sup_{\| \boldsymbol{\theta} - \boldsymbol{\theta}_0 \|_2 = \varepsilon} \| \mathbf{U}_n(\boldsymbol{\theta}) - \mathbf{U}_0(\boldsymbol{\theta}) \|_2 \le c_0 / 2,$$
which together with (23) gives
$$\inf_{\| \boldsymbol{\theta} - \boldsymbol{\theta}_0 \|_2 = \varepsilon} \| \mathbf{U}_n(\boldsymbol{\theta}) \|_2 \ge c_0 / 2.$$
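Since $\mathbf{U}_n(\boldsymbol{\theta})$ is continuous in $\boldsymbol{\theta}$ on $\mathcal{N}$ (by Assumptions 3–7 and the profile construction), a standard topological argument for vector equations (e.g., degree theory for Z-estimators) implies that $\mathbf{U}_n(\boldsymbol{\theta}) = \mathbf{0}$ admits at least one root $\hat{\boldsymbol{\theta}}$ inside $B(\boldsymbol{\theta}_0, \varepsilon)$. Therefore $\hat{\boldsymbol{\theta}} \to \boldsymbol{\theta}_0$ in probability.

Root-$n$ rate for $\hat{\boldsymbol{\theta}}$. Because $\hat{\boldsymbol{\theta}} \in \mathcal{N}$ with high probability, we may apply Lemma 4 at $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}$:
$$\mathbf{0} = \mathbf{U}_n(\hat{\boldsymbol{\theta}}) = \mathbf{U}_n(\boldsymbol{\theta}_0) + \mathbf{H}_0 (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0) + \mathbf{r}_n, \tag{24}$$
where $\| \mathbf{r}_n \|_2 = o_P(\| \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0 \|_2) + o_P(n^{-1/2})$. Lemma 4 also yields $\mathbf{U}_n(\boldsymbol{\theta}_0) = O_P(n^{-1/2})$. Left-multiplying (24) by $\mathbf{H}_0^{-1}$ and taking norms gives
$$\| \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0 \|_2 \le \| \mathbf{H}_0^{-1} \| \cdot \| \mathbf{U}_n(\boldsymbol{\theta}_0) \|_2 + \| \mathbf{H}_0^{-1} \| \cdot \| \mathbf{r}_n \|_2.$$
Since the first term is $O_P(n^{-1/2})$ and the remainder is asymptotically smaller than $\| \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0 \|_2$ plus $n^{-1/2}$, a standard contraction argument implies $\| \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0 \|_2 = O_P(n^{-1/2})$.

Uniform rate for $\hat{\eta}$. Recall $\hat{\eta}(u) = \mathbf{B}(u)^\top \hat{\boldsymbol{\gamma}}(\hat{\boldsymbol{\theta}})$. Add and subtract $\mathbf{B}(u)^\top \boldsymbol{\gamma}_0(\hat{\boldsymbol{\theta}})$ and $\mathbf{B}(u)^\top \boldsymbol{\gamma}_0(\boldsymbol{\theta}_0)$:
$$\sup_{u \in \mathcal{U}} | \hat{\eta}(u) - \eta_0(u) | \le \sup_{u} \left| \mathbf{B}(u)^\top \{ \hat{\boldsymbol{\gamma}}(\hat{\boldsymbol{\theta}}) - \boldsymbol{\gamma}_0(\hat{\boldsymbol{\theta}}) \} \right| + \sup_{u} \left| \mathbf{B}(u)^\top \{ \boldsymbol{\gamma}_0(\hat{\boldsymbol{\theta}}) - \boldsymbol{\gamma}_0(\boldsymbol{\theta}_0) \} \right| + \sup_{u} \left| \mathbf{B}(u)^\top \boldsymbol{\gamma}_0(\boldsymbol{\theta}_0) - \eta_0(u) \right|.$$
The first term is $O_P(\sqrt{K/n} + K^{-s})$ uniformly by Lemma 3 and boundedness of $\mathbf{B}(u)$ on $\mathcal{U}$. For the second term, the map $\boldsymbol{\theta} \mapsto \boldsymbol{\gamma}_0(\boldsymbol{\theta})$ is locally Lipschitz by the implicit function theorem, since $\boldsymbol{\Phi}_0(\boldsymbol{\gamma}; \boldsymbol{\theta}) = \mathbf{0}$ has a locally unique solution and its Jacobian with respect to $\boldsymbol{\gamma}$ is uniformly nonsingular (see the Jacobian argument in the proof of Lemma 3).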
Thus $\| \boldsymbol{\gamma}_0(\hat{\boldsymbol{\theta}}) - \boldsymbol{\gamma}_0(\boldsymbol{\theta}_0) \|_2 \le C \| \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0 \|_2 = O_P(n^{-1/2})$, so the second term is $O_P(n^{-1/2})$. The last term is the sieve approximation error and is $O(K^{-s})$ by Lemma 2. Combining these bounds yields
$$\sup_{u \in \mathcal{U}} | \hat{\eta}(u) - \eta_0(u) | = O_P\!\left( K^{-s} + \sqrt{\frac{K}{n}} \right),$$
which completes the proof.

Proof of Theorem 2. From (24) in the previous proof,
$$\sqrt{n} (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0) = -\mathbf{H}_0^{-1} \sqrt{n}\, \mathbf{U}_n(\boldsymbol{\theta}_0) - \mathbf{H}_0^{-1} \sqrt{n}\, \mathbf{r}_n.$$
It suffices to show $\sqrt{n}\, \mathbf{r}_n = o_P(1)$. By Lemma 4, $\| \mathbf{r}_n \|_2 = o_P(\| \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0 \|_2) + o_P(n^{-1/2})$. Using Theorem 1, $\| \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0 \|_2 = O_P(n^{-1/2})$, hence $\| \mathbf{r}_n \|_2 = o_P(n^{-1/2})$ and therefore $\sqrt{n}\, \mathbf{r}_n = o_P(1)$. Lemma 4 further gives $\sqrt{n}\, \mathbf{U}_n(\boldsymbol{\theta}_0) \Rightarrow N(\mathbf{0}, \mathbf{S}_0)$. By Slutsky's theorem,
$$\sqrt{n} (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0) \Rightarrow N\!\left( \mathbf{0},\, \mathbf{H}_0^{-1} \mathbf{S}_0 (\mathbf{H}_0^{-1})^\top \right),$$
and the influence representation follows immediately by writing $\sqrt{n}\, \mathbf{U}_n(\boldsymbol{\theta}_0) = n^{-1/2} \sum_{i=1}^{n} \mathbf{g}_{i,0}(\boldsymbol{\theta}_0) + o_P(1)$, which is part of Lemma 4.

Lemma 5 (Existence and expansion of the EL multiplier). Under Assumptions 1–8, with probability tending to one, the Lagrange multiplier equation (12) at $\boldsymbol{\theta} = \boldsymbol{\theta}_0$ admits a unique solution $\hat{\boldsymbol{\lambda}} \in \mathbb{R}^d$ such that
$$\| \hat{\boldsymbol{\lambda}} \|_2 = O_P(n^{-1/2}), \qquad \hat{\boldsymbol{\lambda}} = \mathbf{S}_n(\boldsymbol{\theta}_0)^{-1} \bar{\mathbf{g}}(\boldsymbol{\theta}_0) + o_P(n^{-1/2}),$$
where $\bar{\mathbf{g}}(\boldsymbol{\theta}_0) = n^{-1} \sum_{i=1}^{n} \mathbf{g}_i(\boldsymbol{\theta}_0)$ and $\mathbf{S}_n(\boldsymbol{\theta}_0) = n^{-1} \sum_{i=1}^{n} \mathbf{g}_i(\boldsymbol{\theta}_0) \mathbf{g}_i(\boldsymbol{\theta}_0)^\top$.

Proof. For brevity write $\mathbf{g}_i = \mathbf{g}_i(\boldsymbol{\theta}_0)$, $\bar{\mathbf{g}} = n^{-1} \sum_{i=1}^{n} \mathbf{g}_i$, and $\mathbf{S}_n = n^{-1} \sum_{i=1}^{n} \mathbf{g}_i \mathbf{g}_i^\top$. Define the dual map
$$\boldsymbol{\Psi}(\boldsymbol{\lambda}) = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbf{g}_i}{1 + \boldsymbol{\lambda}^\top \mathbf{g}_i},$$
so that (12) is equivalent to $\boldsymbol{\Psi}(\boldsymbol{\lambda}) = \mathbf{0}$.

We first record the size of the key empirical quantities.
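By Lemma 4, $\sqrt{n}\, \bar{\mathbf{g}} \Rightarrow N(\mathbf{0}, \mathbf{S}_0)$, hence $\bar{\mathbf{g}} = O_P(n^{-1/2})$. Also, $\mathbf{S}_n \to \mathbf{S}_0$ in probability with $\mathbf{S}_0$ positive definite, so $\mathbf{S}_n$ is invertible with high probability. Moreover, Assumption 2 implies $\max_{1 \le i \le n} \| \mathbf{g}_i \|_2 = O_P(1)$ because the observations are i.i.d. at the subject level and $m_i$ is bounded.

Next we establish a local expansion of $\boldsymbol{\Psi}(\boldsymbol{\lambda})$ around $\boldsymbol{\lambda} = \mathbf{0}$. For $\| \boldsymbol{\lambda} \|_2$ sufficiently small, use the identity
$$\frac{1}{1 + \boldsymbol{\lambda}^\top \mathbf{g}_i} = 1 - \boldsymbol{\lambda}^\top \mathbf{g}_i + \frac{(\boldsymbol{\lambda}^\top \mathbf{g}_i)^2}{1 + \boldsymbol{\lambda}^\top \mathbf{g}_i},$$
which yields
$$\boldsymbol{\Psi}(\boldsymbol{\lambda}) = \bar{\mathbf{g}} - \mathbf{S}_n \boldsymbol{\lambda} + \mathbf{R}_n(\boldsymbol{\lambda}), \qquad \mathbf{R}_n(\boldsymbol{\lambda}) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{g}_i \frac{(\boldsymbol{\lambda}^\top \mathbf{g}_i)^2}{1 + \boldsymbol{\lambda}^\top \mathbf{g}_i}.$$
On the event $\max_i | \boldsymbol{\lambda}^\top \mathbf{g}_i | \le 1/2$ (which holds with high probability for $\| \boldsymbol{\lambda} \|_2 \le c n^{-1/2}$ and fixed $c$),
$$\| \mathbf{R}_n(\boldsymbol{\lambda}) \|_2 \le \frac{2}{n} \sum_{i=1}^{n} \| \mathbf{g}_i \|_2 (\boldsymbol{\lambda}^\top \mathbf{g}_i)^2 \le 2 \| \boldsymbol{\lambda} \|_2^2 \cdot \frac{1}{n} \sum_{i=1}^{n} \| \mathbf{g}_i \|_2^3 = O_P(\| \boldsymbol{\lambda} \|_2^2),$$
and thus $\sup_{\| \boldsymbol{\lambda} \| \le c n^{-1/2}} \| \mathbf{R}_n(\boldsymbol{\lambda}) \|_2 = o_P(n^{-1/2})$.

Define the candidate approximation $\boldsymbol{\lambda}^* = \mathbf{S}_n^{-1} \bar{\mathbf{g}}$. Then $\| \boldsymbol{\lambda}^* \|_2 = O_P(n^{-1/2})$ and
$$\boldsymbol{\Psi}(\boldsymbol{\lambda}^*) = \bar{\mathbf{g}} - \mathbf{S}_n \boldsymbol{\lambda}^* + \mathbf{R}_n(\boldsymbol{\lambda}^*) = \mathbf{R}_n(\boldsymbol{\lambda}^*) = o_P(n^{-1/2}).$$
Also, the Jacobian of $\boldsymbol{\Psi}$ is
$$\frac{\partial \boldsymbol{\Psi}(\boldsymbol{\lambda})}{\partial \boldsymbol{\lambda}^\top} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\mathbf{g}_i \mathbf{g}_i^\top}{\{ 1 + \boldsymbol{\lambda}^\top \mathbf{g}_i \}^2},$$
which is negative definite in a neighborhood of $\mathbf{0}$ with high probability because $\mathbf{S}_n$ is positive definite and the denominators stay bounded. Therefore, by the implicit function theorem (or a Newton–Kantorovich argument), there exists a unique root $\hat{\boldsymbol{\lambda}}$ of $\boldsymbol{\Psi}(\boldsymbol{\lambda}) = \mathbf{0}$ in the ball $\{ \| \boldsymbol{\lambda} \| \le c n^{-1/2} \}$ with high probability. A mean-value expansion of $\mathbf{0} = \boldsymbol{\Psi}(\hat{\boldsymbol{\lambda}})$ around $\boldsymbol{\lambda}^*$ shows that it satisfies
$$\hat{\boldsymbol{\lambda}} - \boldsymbol{\lambda}^* = -\left\{ \frac{\partial \boldsymbol{\Psi}(\tilde{\boldsymbol{\lambda}})}{\partial \boldsymbol{\lambda}^\top} \right\}^{-1} \boldsymbol{\Psi}(\boldsymbol{\lambda}^*) = o_P(n^{-1/2}),$$
for some $\tilde{\boldsymbol{\lambda}}$ between $\hat{\boldsymbol{\lambda}}$ and $\boldsymbol{\lambda}^*$.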
This yields $\| \hat{\boldsymbol{\lambda}} \|_2 = O_P(n^{-1/2})$ and the expansion $\hat{\boldsymbol{\lambda}} = \mathbf{S}_n^{-1} \bar{\mathbf{g}} + o_P(n^{-1/2})$.

Proof of Lemma 1. Let $\mathbf{g}_i = \mathbf{g}_i(\boldsymbol{\theta}_0)$, $\bar{\mathbf{g}} = n^{-1} \sum_{i=1}^{n} \mathbf{g}_i$ and $\mathbf{S}_n = n^{-1} \sum_{i=1}^{n} \mathbf{g}_i \mathbf{g}_i^\top$. By Lemma 4, $\bar{\mathbf{g}} = O_P(n^{-1/2})$ and $\mathbf{S}_n \to \mathbf{S}_0$ in probability with $\mathbf{S}_0$ positive definite, so $\mathbf{S}_n$ is invertible with high probability.

We also need the feasibility (convex hull) condition ensuring that the EL weights exist. Since $E(\mathbf{g}_i) = \mathbf{0}$ at $\boldsymbol{\theta}_0$ and $\mathrm{Var}(\mathbf{g}_i) = \mathbf{S}_0$ is positive definite, the origin lies in the interior of the convex hull of $\{ \mathbf{g}_i \}_{i=1}^{n}$ with probability tending to one; this is a standard EL fact under nondegeneracy and i.i.d. sampling (Kolaczyk, 1994; Owen, 2001). Hence the Lagrange multiplier equation (12) admits a unique solution $\hat{\boldsymbol{\lambda}}$ with high probability.

By Lemma 5, $\hat{\boldsymbol{\lambda}} = \mathbf{S}_n^{-1} \bar{\mathbf{g}} + o_P(n^{-1/2})$ and $\| \hat{\boldsymbol{\lambda}} \|_2 = O_P(n^{-1/2})$. Moreover, $\max_{1 \le i \le n} \| \mathbf{g}_i \|_2 = O_P(1)$ by Assumption 2 and bounded cluster size, so $\max_i | \hat{\boldsymbol{\lambda}}^\top \mathbf{g}_i | = o_P(1)$, which justifies the Taylor expansion of the logarithm. Using $\log(1 + t) = t - \tfrac{1}{2} t^2 + O(t^3)$ uniformly for $|t| = o(1)$,
$$\ell(\boldsymbol{\theta}_0) = 2 \sum_{i=1}^{n} \log\{ 1 + \hat{\boldsymbol{\lambda}}^\top \mathbf{g}_i \} = 2 \sum_{i=1}^{n} \left\{ \hat{\boldsymbol{\lambda}}^\top \mathbf{g}_i - \tfrac{1}{2} (\hat{\boldsymbol{\lambda}}^\top \mathbf{g}_i)^2 \right\} + 2 \sum_{i=1}^{n} O\!\left( | \hat{\boldsymbol{\lambda}}^\top \mathbf{g}_i |^3 \right).$$
The cubic remainder is $o_P(1)$ because $\sum_i | \hat{\boldsymbol{\lambda}}^\top \mathbf{g}_i |^3 \le (\max_i | \hat{\boldsymbol{\lambda}}^\top \mathbf{g}_i |) \sum_i (\hat{\boldsymbol{\lambda}}^\top \mathbf{g}_i)^2 = o_P(1) \cdot O_P(1)$. Also, $\sum_{i=1}^{n} \hat{\boldsymbol{\lambda}}^\top \mathbf{g}_i = n \hat{\boldsymbol{\lambda}}^\top \bar{\mathbf{g}}$ and $\sum_{i=1}^{n} (\hat{\boldsymbol{\lambda}}^\top \mathbf{g}_i)^2 = n \hat{\boldsymbol{\lambda}}^\top \mathbf{S}_n \hat{\boldsymbol{\lambda}}$. Therefore,
$$\ell(\boldsymbol{\theta}_0) = 2 n \hat{\boldsymbol{\lambda}}^\top \bar{\mathbf{g}} - n \hat{\boldsymbol{\lambda}}^\top \mathbf{S}_n \hat{\boldsymbol{\lambda}} + o_P(1).$$
Substituting $\hat{\boldsymbol{\lambda}} = \mathbf{S}_n^{-1} \bar{\mathbf{g}} + o_P(n^{-1/2})$ yields
$$\ell(\boldsymbol{\theta}_0) = n \bar{\mathbf{g}}^\top \mathbf{S}_n^{-1} \bar{\mathbf{g}} + o_P(1),$$
as claimed.

Proof of Theorem 3.
By Lemma 1,
$$\ell(\boldsymbol{\theta}_0) = n \bar{\mathbf{g}}(\boldsymbol{\theta}_0)^\top \mathbf{S}_n(\boldsymbol{\theta}_0)^{-1} \bar{\mathbf{g}}(\boldsymbol{\theta}_0) + o_P(1).$$
Lemma 4 gives $\sqrt{n}\, \bar{\mathbf{g}}(\boldsymbol{\theta}_0) \Rightarrow N(\mathbf{0}, \mathbf{S}_0)$ and $\mathbf{S}_n(\boldsymbol{\theta}_0) \to \mathbf{S}_0$ in probability. Let $\mathbf{S}_0^{1/2}$ be the symmetric square root and define $\mathbf{Z}_n = \mathbf{S}_n(\boldsymbol{\theta}_0)^{-1/2} \sqrt{n}\, \bar{\mathbf{g}}(\boldsymbol{\theta}_0)$. Then $\mathbf{Z}_n \Rightarrow N(\mathbf{0}, \mathbf{I}_d)$ by Slutsky's theorem, hence
$$n \bar{\mathbf{g}}^\top \mathbf{S}_n^{-1} \bar{\mathbf{g}} = \mathbf{Z}_n^\top \mathbf{Z}_n \Rightarrow \chi^2_d,$$
which proves $\ell(\boldsymbol{\theta}_0) \Rightarrow \chi^2_d$.

For the profile statistic, partition $\boldsymbol{\theta} = (\boldsymbol{\theta}_1^\top, \boldsymbol{\theta}_2^\top)^\top$ with $\dim(\boldsymbol{\theta}_1) = r$. Near $\boldsymbol{\theta}_0$, the minimizer
$$\hat{\boldsymbol{\theta}}_2(\boldsymbol{\theta}_1) = \arg\min_{\boldsymbol{\theta}_2} \ell(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$$
exists and is unique with high probability because $\ell(\boldsymbol{\theta})$ is locally convex and twice differentiable in a neighborhood of $\boldsymbol{\theta}_0$ under Assumptions 3–8; the convexity follows from the EL dual form and nondegeneracy of $\mathbf{S}_0$, see Owen (2001) and Kolaczyk (1994).

Applying the quadratic approximation in Lemma 1 to $\ell(\boldsymbol{\theta})$ yields, uniformly locally,
$$\ell(\boldsymbol{\theta}) = n \bar{\mathbf{g}}(\boldsymbol{\theta})^\top \mathbf{S}_n(\boldsymbol{\theta})^{-1} \bar{\mathbf{g}}(\boldsymbol{\theta}) + o_P(1).$$
Using the smoothness of $\bar{\mathbf{g}}(\boldsymbol{\theta})$ and a Taylor expansion around $\boldsymbol{\theta}_0$,
$$\sqrt{n}\, \bar{\mathbf{g}}(\boldsymbol{\theta}) = \sqrt{n}\, \bar{\mathbf{g}}(\boldsymbol{\theta}_0) + \mathbf{G} \sqrt{n} (\boldsymbol{\theta} - \boldsymbol{\theta}_0) + o_P(1),$$
where $\mathbf{G} = \partial \mathbf{U}_0(\boldsymbol{\theta}) / \partial \boldsymbol{\theta}^\top |_{\boldsymbol{\theta} = \boldsymbol{\theta}_0} = \mathbf{H}_0$.

Minimizing the resulting quadratic form over $\boldsymbol{\theta}_2$ yields a reduced quadratic form in the $r$-dimensional component corresponding to $\boldsymbol{\theta}_1$, which converges to $\chi^2_r$ by the same self-normalized CLT argument as above (formally, via the Schur complement of the partitioned information matrix). Therefore $\ell_{\mathrm{prof}}(\boldsymbol{\theta}_{1,0}) \Rightarrow \chi^2_r$.
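The multiplier expansion of Lemma 5 and the quadratic approximation of Lemma 1 can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `el_log_ratio` is hypothetical, the Newton iteration with step halving is one common way to solve the dual equation $\boldsymbol{\Psi}(\boldsymbol{\lambda}) = \mathbf{0}$, and simulated standard normal vectors stand in for the subject-level scores $\mathbf{g}_i(\boldsymbol{\theta}_0)$.

```python
import numpy as np


def el_log_ratio(g, tol=1e-10, max_iter=50):
    """Return (ell, lam) for an (n, d) array g of subject-level scores.

    lam solves Psi(lam) = mean_i g_i / (1 + lam' g_i) = 0, and
    ell = 2 * sum_i log(1 + lam' g_i) is the EL ratio statistic.
    Hypothetical helper: Newton's method with step halving.
    """
    n, d = g.shape
    lam = np.zeros(d)
    for _ in range(max_iter):
        denom = 1.0 + g @ lam
        w = g / denom[:, None]          # rows g_i / (1 + lam' g_i)
        psi = w.mean(axis=0)            # dual map Psi(lam)
        if np.linalg.norm(psi) < tol:
            break
        # Jacobian of Psi: -(1/n) sum_i g_i g_i' / (1 + lam' g_i)^2
        jac = -(w.T @ w) / n
        step = np.linalg.solve(jac, -psi)   # Newton direction
        t = 1.0
        # Halve the step until every implied EL weight stays positive.
        while np.any(1.0 + g @ (lam + t * step) <= 1e-8):
            t *= 0.5
        lam = lam + t * step
    ell = 2.0 * np.sum(np.log1p(g @ lam))
    return ell, lam


# Simulated mean-zero scores (assumption: stand-in for g_i(theta_0)).
rng = np.random.default_rng(0)
n, d = 2000, 3
g = rng.standard_normal((n, d))

ell, lam = el_log_ratio(g)
g_bar = g.mean(axis=0)
S_n = g.T @ g / n
quad = n * g_bar @ np.linalg.solve(S_n, g_bar)   # n * gbar' S_n^{-1} gbar
weights = 1.0 / (n * (1.0 + g @ lam))            # implied EL weights

print(f"EL statistic: {ell:.4f}, quadratic form: {quad:.4f}")
```

In this sketch the first Newton step from $\boldsymbol{\lambda} = \mathbf{0}$ equals $\mathbf{S}_n^{-1} \bar{\mathbf{g}}$, which is the candidate $\boldsymbol{\lambda}^*$ used in the proof of Lemma 5, and the printed statistic should sit close to the quadratic form. With real data, `g` would hold one row per subject (block), which is what makes the construction a block empirical likelihood.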