Root Finding and Metamodeling for Rapid and Robust Computer Model Calibration

Yongseok Jeon, Sara Shashaani
Edward P. Fitts Department of Industrial and Systems Engineering
North Carolina State University, Raleigh, NC 27695
{yjeon, sshasha2}@ncsu.edu

Abstract

We consider the computer model calibration problem, where the goal is to find the parameters that minimize the discrepancy between the multivariate real-world and computer model outputs. We propose to solve an approximation using signed residuals that enables a root finding approach and an accelerated search. We characterize the distance of the solutions of the approximation from the solutions of the original problem for strongly convex objective functions, showing that it depends on the variability of the signed residuals across output dimensions, as well as their variance and covariance. We develop a metamodel-based root finding framework under kriging and stochastic kriging that is augmented with a sequential search space reduction. We derive three new acquisition functions for finding roots of the approximate problem, along with their derivatives usable by first-order solvers. Compared to kriging, the stochastic kriging metamodels incorporate noise in the observations, suggesting more robust solutions. We also analyze the case where a root may not exist. Our analysis of the asymptotic behavior in this context shows that, since the existence of roots in the approximation problem may not be known a priori, using the new acquisition functions will not hurt the outcome. Numerical experiments on data-driven and physics-based examples demonstrate significant computational gains over standard calibration approaches.
Keywords: sample average approximation · stochastic kriging · acquisition functions · Bayesian optimization

1 Introduction

With recent advances in computational power and data acquisition technologies, computer models have become indispensable tools for simulation [1], prediction [2], and decision support under uncertainty [3]. However, the parameters that govern the behavior of a computer model often need to be carefully estimated to ensure an accurate representation of the real-world system. This is commonly referred to as the model calibration problem. Calibration involves finding the optimal set of parameters that enables the model outputs to match the observed outputs [4]. Significant progress has been made through various frameworks, including Bayesian calibration [5, 6], ordinary least squares [7], maximum likelihood estimation [8], and surrogate-based optimization [9]. Despite these advances, several practical challenges remain. When the observed dataset is large, calibration becomes computationally demanding. Moreover, due to the inherent stochasticity of real-world systems, the calibrated parameters may overfit the observed data and perform poorly on unseen data. High-dimensional input and output spaces further complicate the problem by masking the influence of the calibration parameters. Recent efforts to mitigate these challenges include subsampling strategies [10], parallel computing approaches [11], and low-rank deep learning methods [12]. Building on this ongoing effort, the first objective of this paper is to accelerate the calibration process by means of approximations and search space reduction. We propose to solve an approximation of the original discrepancy minimization objective using a signed discrepancy metric. We show that finding the roots of this approximation, which may not be far from the true optimal calibration parameters under standard assumptions, can be more efficient.
The second objective is to extend our proposed framework to noisy observations, where incorporating uncertainty promotes more robust calibration.

1.1 Problem Statement

Let $D_n = \{(x_j, y_j)\}_{j=1}^n$ be the dataset available up to time $n$, where $x_j \in X \subset \mathbb{R}^{m_x}$ is the input variate, which could represent external factors such as environmental conditions, and $y_j \in Y \subset \mathbb{R}^{m_y}$ is the real-world system response. Throughout this work, we consider the computer model as a black-box function $h: X \times \Theta \to Y$ that maps the inputs to outputs with a parameter $\theta \in \Theta \subset \mathbb{R}^{m_\theta}$, where $\Theta$ is compact. Define the (random) residual function $R \in \mathbb{R}^{m_y}$ as $R(\theta) := Y - h(X; \theta)$ (removing $X$ and $Y$ from the input arguments for ease of exposition). Assuming that the system behavior up to time $n$ is in steady state, meaning that $\{(x_j, y_j)\}_{j=1}^n$ are independent and identically distributed samples from the joint distribution of $(X, Y)$, calibration seeks
\[
\min_{\theta \in \Theta} \bar{f}_n(\theta) := \frac{1}{m_y} \frac{1}{n} \sum_{j=1}^{n} \| r_j(\theta) \|_2^2, \tag{P}
\]
where $r_j(\theta) := y_j - h(x_j; \theta)$ denotes the $j$-th residual of $D_n$, and let $\bar{\Theta}^*_n := \arg\min_{\theta \in \Theta} \bar{f}_n(\theta)$ denote the set of optimal solutions. The objective function in (P) is also referred to as empirical risk minimization, a sample average approximation of the problem
\[
\min_{\theta \in \Theta} \bar{f}(\theta) := \frac{1}{m_y} \mathbb{E}\left[ \| R(\theta) \|_2^2 \right], \tag{SP}
\]
where the expectation is with respect to the unknown joint distribution of $(X, Y)$, and let $\bar{\Theta}^* := \arg\min_{\theta \in \Theta} \bar{f}(\theta)$ denote the set of optimal solutions. Problem (SP) is a standard stochastic optimization problem that has commonly appeared in model calibration across many studies [13, 14, 15, 16]. There are several well-established solution methods for (SP), including stochastic gradient [17], stochastic trust-region [18], and surrogate-based methods [19].
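As a concrete illustration of the empirical objective $\bar{f}_n$ in (P), the following minimal sketch evaluates it on a synthetic dataset. The computer model $h$, the parameter values, and the noise level here are assumptions chosen for illustration only, not objects from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box computer model h(x; theta) with m_x = 1, m_y = 2,
# m_theta = 2; a toy assumption for illustration.
def h(x, theta):
    return np.array([theta[0] * x, theta[1] * x + theta[0]])

# Synthetic observed dataset D_n generated around theta_true.
theta_true = np.array([1.5, -0.5])
n, m_y = 200, 2
xs = rng.uniform(-1.0, 1.0, size=n)
ys = np.stack([h(x, theta_true) + 0.05 * rng.standard_normal(m_y) for x in xs])

def f_bar_n(theta):
    """Empirical calibration objective (P): mean squared residual norm over m_y."""
    residuals = ys - np.stack([h(x, theta) for x in xs])  # r_j(theta), shape (n, m_y)
    return np.mean(np.sum(residuals**2, axis=1)) / m_y

# The objective is small at the data-generating parameter and larger elsewhere.
print(f_bar_n(theta_true), f_bar_n(theta_true + 1.0))
```

The objective at the data-generating parameter reduces to (roughly) the observation noise variance, while perturbed parameters incur a model-mismatch penalty.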
1.2 Proposed Approximation and Root Finding

Consider an approximation of (SP),
\[
\min_{\theta \in \Theta} f(\theta) := \frac{1}{m_y^2} \left( \mathbb{E}\left[ \mathbf{1}^\top R(\theta) \right] \right)^2 = \left( \mathbb{E}\left[ \frac{1}{m_y} \sum_{i=1}^{m_y} [R(\theta)]_i \right] \right)^2, \tag{SAP}
\]
where the calibration objective is instead defined by aggregating the residuals across the output dimensions into a scalar, and let $\Theta^* := \arg\min_{\theta \in \Theta} f(\theta)$ denote the set of optimal solutions. (Throughout the paper, for any vector $v \in \mathbb{R}^d$, we denote its $i$-th component by $[v]_i$.) This formulation satisfies the following relationship:
\[
\mathbb{E}\left[ \frac{1}{m_y} \| R(\theta) \|_2^2 \right] \;\geq\; \mathbb{E}\left[ \left( \frac{1}{m_y} \sum_{i=1}^{m_y} [R(\theta)]_i \right)^2 \right] \;\geq\; \left( \mathbb{E}\left[ \frac{1}{m_y} \sum_{i=1}^{m_y} [R(\theta)]_i \right] \right)^2,
\]
where the first inequality follows from the Cauchy–Schwarz inequality and the second from Jensen's inequality. Specifically, from the Cauchy–Schwarz inequality, $\left( \sum_{i=1}^{m_y} a_i b_i \right)^2 \leq \left( \sum_{i=1}^{m_y} a_i^2 \right)\left( \sum_{i=1}^{m_y} b_i^2 \right)$, and letting $a_i = 1$ and $b_i = [R(\theta)]_i$ for all $i = 1, \ldots, m_y$, we obtain $\left( \sum_{i=1}^{m_y} [R(\theta)]_i \right)^2 \leq m_y \sum_{i=1}^{m_y} [R(\theta)]_i^2$, which yields the first inequality after normalization and taking expectations. Therefore, solving the surrogate objective (SAP) amounts to minimizing a lower bound of the original problem (SP). We propose to solve this approximation rather than the original problem, which can be done more effectively with root finding, for the following reason. Define the averaged residual scalar
\[
S(\theta) := \frac{1}{m_y} \mathbf{1}^\top R(\theta) = \frac{1}{m_y} \sum_{i=1}^{m_y} [R(\theta)]_i,
\]
which aggregates the residual vector $R(\theta)$ across the output dimensions into a single signed quantity. With this, $f(\theta) = (\mathbb{E}[S(\theta)])^2 \geq 0$, and we consider
\[
\tilde{f}(\theta) := \mathbb{E}[S(\theta)], \qquad \tilde{\Theta}^* := \left\{ \theta \in \Theta \mid \tilde{f}(\theta) = 0 \right\}. \tag{RF}
\]
For a nonempty $\tilde{\Theta}^*$ (Assumption 2(a)), $f(\theta)$ is minimized at 0, and hence $\Theta^* = \tilde{\Theta}^*$; thus solving (SAP) is equivalent to solving (RF).
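The inequality chain above can be checked numerically. The sketch below draws synthetic residual samples (the distribution is an arbitrary assumption; both inequalities hold sample-wise, so they hold for any choice) and compares the three quantities.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic samples of the residual vector R(theta) at a fixed theta:
# n draws of an m_y-dimensional vector (purely illustrative).
n, m_y = 5000, 3
R = rng.normal(loc=0.3, scale=1.0, size=(n, m_y))

S = R.mean(axis=1)                             # averaged residual scalar S(theta)
sp_obj = np.mean(np.sum(R**2, axis=1)) / m_y   # (SP)-style term: E[||R||^2] / m_y
mid = np.mean(S**2)                            # E[S(theta)^2]
sap_obj = np.mean(S)**2                        # (SAP) objective: (E[S(theta)])^2

# Cauchy-Schwarz gives sp_obj >= mid; Jensen gives mid >= sap_obj.
print(sp_obj, mid, sap_obj)
```

Both gaps are strictly positive here because the components vary within each draw (spatial variability) and $S$ varies across draws (its variance), matching the decomposition derived in Section 2.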
Solving (RF) in the presence of sign information allows us to bypass large portions of the parameter space and focus only on the regions where the sign is expected to change (see Figure 1). This progressive search space reduction is illustrated in Figures 3 and 4. Under the proposed framework, one can expect a more guided sampling trace during the search, as demonstrated in Figure 6. We primarily treat the root finding formulation of (SAP) as a surrogate to the original problem (SP). Note that the approximation becomes tight in the special case of $m_y = 1$. In this case, $\mathbb{E}[(R(\theta))^2] = (\mathbb{E}[R(\theta)])^2 + \mathrm{Var}(R(\theta))$, so that the gap reduces to the variance term. In this setting, if either (a) the parameter that minimizes problem (SAP) also minimizes the variance, i.e., $\theta^* = \arg\min_{\theta \in \Theta} (\mathbb{E}[R(\theta)])^2 = \arg\min_{\theta \in \Theta} \mathrm{Var}(R(\theta))$, an outcome that arises when systems with optimal parameters stabilize and exhibit reduced variability [20, 21], or (b) the variance satisfies the relation $\mathrm{Var}(R(\theta)) \propto \frac{d}{d\theta} \mathbb{E}[R(\theta)]$, known as the state-dependent variance property, where the noise diminishes near stationary points, then solving (SAP) also solves (SP), i.e., $\arg\min_{\theta \in \Theta} (\mathbb{E}[R(\theta)])^2 = \arg\min_{\theta \in \Theta} \mathbb{E}[(R(\theta))^2]$. Beyond this special case, solving the approximation problem in general leads to a different solution. This naturally raises the question: how accurate is the resulting approximate solution? We address this question in Section 2; in summary, we show that the solution bound is characterized by the residual properties, and we discuss a stochastic treatment that further refines this accuracy.

1.3 Metamodeling and Root Finding

When solving (RF), we employ a metamodeling approach (e.g., kriging or Gaussian process regression) and solve the problem via Bayesian optimization, but with acquisition functions adjusted for the root finding scheme.
One could alternatively use other approaches for root finding, such as a stochastic derivative-free optimization (DFO) algorithm [18, 22]. An advantage of using kriging is the ability to extend, with some modifications, to stochastic kriging [23] in order to take uncertainty in the observed residuals into account. In the stochastic setting, the metamodel's output remains random even at the evaluated parameter values, allowing us to seek solutions that are optimal in expectation rather than only for a fixed data sample. This stochastic treatment also provides an opportunity to further tighten the solution accuracy bound analyzed in Section 2. We demonstrate through several examples that stochastic kriging finds better solutions to the original problem than deterministic kriging under the same computational budget. Additionally, the sample average approximation of (RF) may not have a root, in which case all aggregated residuals are either positive or negative over the parameter space, due to finite-sample variability or misspecification of the computer model. We show in Section J that the proposed root finding framework naturally retrieves the minimizer of this sample average even in the absence of a root.

1.4 Contributions and Paper Organization

Our contributions in this paper are threefold. First, we analyze the solution accuracy of the root finding approximation and characterize its error bounds and behavior compared to the original problem. We also show that a stochastic approach that accounts for the finite-sample error can further improve the solution accuracy. Second, we develop a metamodeling framework tailored to root finding, introducing new acquisition functions and a search space reduction technique, and characterize its asymptotic behavior in the absence of a root. Lastly, we extend the framework to the stochastic setting to enhance robustness against noisy observations.
We discuss how stochastic kriging can serve as the stochastic approach that further improves the solution accuracy. The paper is organized as follows. Section 2 analyzes the solution accuracy of the root finding approximation and derives error bounds with convergence analysis. Section 3 reviews kriging-based metamodeling and standard acquisition strategies for deterministic calibration. Section 4 introduces the proposed root finding framework with modified acquisition strategies and acceleration techniques, and characterizes the asymptotic behavior of these strategies when no root exists. Section 5 extends the root finding framework to stochastic kriging for robust calibration. Section 6 presents the numerical experiments, and Section 7 concludes with several remarks.

List of Notations

$m_x, m_y, m_\theta$ := dimensions of the input, output, and parameter spaces
$X, Y$ := random input and output variates
$D_n = \{(x_j, y_j)\}_{j=1}^n$ := observed dataset of size $n$
$h: X \times \Theta \to Y$ := computer model output
$\theta \in \Theta \subset \mathbb{R}^{m_\theta}$ := calibration parameter
$R(\theta) = Y - h(X; \theta) \in \mathbb{R}^{m_y}$ := residual vector
$r_j(\theta) = y_j - h(x_j; \theta)$ := $j$-th residual of $D_n$
$S(\theta) = \frac{1}{m_y} \mathbf{1}^\top R(\theta)$ := averaged residual scalar
$\bar{f}(\theta), f(\theta), \tilde{f}(\theta)$ := objectives of (SP), (SAP), and (RF)
$\bar{f}_n(\theta), f_n(\theta), \tilde{f}_n(\theta)$ := sample average approximations of (SP), (SAP), and (RF)
$\delta(\theta)$ := spatial variability; see Definition 1
$\bar{\Theta}^*, \Theta^*$ := solution sets of (SP) and (SAP)
$\Theta^*_n, \check{\Theta}^*_n$ := deterministic and stochastic approximate solution sets for $\Theta^*$
$\Theta_t = \{\theta_1, \ldots, \theta_{p+t}\}$ := set of configurations at iteration $t$, with $p$ initial design points

2 Solution Accuracy of the Surrogate Approximation

In this section, we analyze the solution accuracy of the surrogate approximation (SAP) relative to the original problem (SP).
We characterize the gap between the solutions through the structural properties of (SP), the spatial variability of the residuals, the variance of the aggregated residuals, and an estimation error from solving (SAP) with a finite sample $D_n$. This gap becomes small when the residuals are approximately uniform across output dimensions and have low variability across observations, and when the function in (SP) increases more rapidly away from its minimizer $\bar{\theta}^*$. Additionally, the gap can be further tightened by accounting for the finite-sample uncertainty. We begin by characterizing the approximation error arising from replacing $\bar{f}(\theta)$ with its minorizing function $f(\theta)$. From (SP) and (SAP), we can write
\[
\bar{f}(\theta) - f(\theta) = \mathbb{E}\left[ \tfrac{1}{m_y} \| R(\theta) \|_2^2 \right] - (\mathbb{E}[S(\theta)])^2 = \underbrace{\mathbb{E}\left[ \tfrac{1}{m_y} \| R(\theta) \|_2^2 \right] - \mathbb{E}\left[ (S(\theta))^2 \right]}_{:= \delta(\theta)} + \underbrace{\mathbb{E}\left[ (S(\theta))^2 \right] - (\mathbb{E}[S(\theta)])^2}_{= \mathrm{Var}(S(\theta))}.
\]
We refer to the first term as the spatial variability and define it as follows.

Definition 1 (Spatial Variability). The spatial variability $\delta(\theta)$ is defined as
\[
\delta(\theta) := \mathbb{E}\left[ \frac{1}{m_y} \sum_{i=1}^{m_y} ([R(\theta)]_i)^2 \right] - \mathbb{E}\left[ \left( \frac{1}{m_y} \sum_{i=1}^{m_y} [R(\theta)]_i \right)^2 \right] = \mathbb{E}\left[ \frac{1}{m_y} \sum_{i=1}^{m_y} \left( [R(\theta)]_i - \frac{1}{m_y} \sum_{j=1}^{m_y} [R(\theta)]_j \right)^2 \right].
\]

$\delta(\theta)$ quantifies the variability of the residuals across the output dimensions. This term vanishes when $m_y = 1$ and becomes small when, within each observation, the residual components are similar. This can occur in systems where the model outputs are governed by a common underlying mechanism and operate under similar conditions. The second term, $\mathrm{Var}(S(\theta))$, represents the variability of the aggregated residual across observations.
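The two forms of $\delta(\theta)$ in Definition 1 are algebraically identical (the usual variance identity, applied within each observation). A minimal numeric check with synthetic residual samples (an arbitrary illustrative distribution):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic residuals: n observations of an m_y-dimensional residual vector.
n, m_y = 4000, 4
R = rng.normal(loc=0.2, scale=1.0, size=(n, m_y))

# delta(theta) via its defining difference of expectations ...
row_mean = R.mean(axis=1)
delta_diff = np.mean(np.sum(R**2, axis=1)) / m_y - np.mean(row_mean**2)

# ... and via the equivalent within-observation variance form in Definition 1.
delta_var = np.mean(np.mean((R - row_mean[:, None])**2, axis=1))

print(delta_diff, delta_var)  # the two forms agree
```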
This term can be expressed as
\[
\mathrm{Var}(S(\theta)) = \frac{1}{m_y^2} \left( \sum_{i=1}^{m_y} \mathrm{Var}([R(\theta)]_i) + \sum_{i \neq j} \mathrm{Cov}([R(\theta)]_i, [R(\theta)]_j) \right),
\]
and is small when the residuals in each output dimension have low variability across observations, or when the residual components are negatively correlated. The conditions that make $\delta(\theta)$ and $\mathrm{Var}(S(\theta))$ small are conceptually different, so neither condition implies the other. However, both terms can be small when the residuals are similar across output dimensions and exhibit low variability across observations. Notably, large values of these terms may limit the accuracy of the proposed approximation, but this also reflects the inherent difficulty of the original problem (SP) itself. We next discuss the structural properties of (SP) that are needed to characterize the solution accuracy bound. Since $\bar{f}$ is a composite function that depends on $h$, $\theta$, and $(X, Y)$, we impose the following regularity conditions on these components.

Assumption 1 (Regularity Conditions on the Computer Model).
(a) $h(X; \theta)$ is twice continuously differentiable with respect to $\theta$ for all $\theta \in \Theta$. Accordingly, let $J_h(\theta) := \nabla_\theta h(X; \theta) \in \mathbb{R}^{m_y \times m_\theta}$ and $H_h(\theta) := \nabla^2_\theta h(X; \theta) \in \mathbb{R}^{m_y \times m_\theta \times m_\theta}$ denote the Jacobian and Hessian of $h$ with respect to $\theta$.
(b) There exists a non-negative integrable random variable $V$ with $\mathbb{E}[V] < \infty$ such that $\| J_h(\theta)^\top R(\theta) \|_2 \leq V$ for all $\theta \in \Theta$.
(c) There exists a non-negative integrable random variable $W$ with $\mathbb{E}[W] < \infty$ such that $\| R(\theta) \|_2^2 \leq W$ for all $\theta \in \Theta$.
(d) There exists a non-negative integrable random variable $U$ with $\mathbb{E}[U] < \infty$ such that $\left\| J_h(\theta)^\top J_h(\theta) - \sum_{i=1}^{m_y} [R(\theta)]_i [H_h(\theta)]_i \right\|_F \leq U$ for all $\theta \in \Theta$, where $\| \cdot \|_F$ denotes the Frobenius norm.
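The decomposition of $\mathrm{Var}(S(\theta))$ into component variances and pairwise covariances is the standard variance-of-a-mean identity. A quick numeric check with correlated synthetic residual components (an illustrative construction, not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

# Correlated synthetic residual components to exercise the covariance terms.
n, m_y = 6000, 3
A = rng.normal(size=(m_y, m_y))
R = rng.normal(size=(n, m_y)) @ A.T  # components share covariance A @ A.T

S = R.mean(axis=1)
var_S = S.var()  # Var(S(theta)) computed directly (biased estimator)

# Decomposition: (sum of variances + sum of off-diagonal covariances) / m_y^2.
C = np.cov(R, rowvar=False, bias=True)  # matching biased normalization
decomposed = (np.trace(C) + (C.sum() - np.trace(C))) / m_y**2

print(var_S, decomposed)  # identical up to floating-point rounding
```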
(e) There exists $\bar{u} > \mathbb{E}[U]$, where $U$ is as in Assumption 1(d) and $\lambda_{\min}$ denotes the smallest eigenvalue, such that $\inf_{\theta \in \Theta} \lambda_{\min}\left( \mathbb{E}[J_h(\theta)^\top J_h(\theta)] \right) \geq \bar{u}$.

As $J_h(\theta)$ and $H_h(\theta)$ are random due to their dependence on $X$, we invoke the following result to differentiate $\bar{f}(\theta)$ under the expectation.

Theorem 1 (Lebesgue's Dominated Convergence Theorem). Assume $X_n \to X$ almost surely as $n \to \infty$. Suppose there exists $Y \in L^1(P)$ such that $|X_n| \leq Y$ almost surely for every $n$. Then $X \in L^1(P)$ and $\mathbb{E}[X_n] \to \mathbb{E}[X]$ as $n \to \infty$.

Under Assumption 1(c), $R(\theta) \in L^2$, and under Assumption 1(b) and Theorem 1, the gradient of $\bar{f}(\theta)$ is
\[
\nabla_\theta \bar{f}(\theta) = \frac{1}{m_y} \nabla_\theta \mathbb{E}\left[ \| R(\theta) \|_2^2 \right] = \frac{1}{m_y} \mathbb{E}\left[ \nabla_\theta \| R(\theta) \|_2^2 \right] = -\frac{2}{m_y} \mathbb{E}\left[ J_h(\theta)^\top R(\theta) \right] \in \mathbb{R}^{m_\theta}.
\]
Similarly, under Assumption 1(d) and Theorem 1, the Hessian of $\bar{f}(\theta)$ is
\[
\nabla^2_\theta \bar{f}(\theta) = \frac{1}{m_y} \nabla^2_\theta \mathbb{E}\left[ \| R(\theta) \|_2^2 \right] = -\frac{2}{m_y} \nabla_\theta \mathbb{E}\left[ J_h(\theta)^\top R(\theta) \right] = \frac{2}{m_y} \mathbb{E}\left[ J_h(\theta)^\top J_h(\theta) - \sum_{i=1}^{m_y} [R(\theta)]_i [H_h(\theta)]_i \right] \in \mathbb{R}^{m_\theta \times m_\theta}.
\]
Then, under Assumption 1(e), for any $\theta \in \Theta$,
\[
\nabla^2_\theta \bar{f}(\theta) \succeq \frac{2}{m_y} \left( \lambda_{\min}\left( \mathbb{E}[J_h(\theta)^\top J_h(\theta)] \right) - \mathbb{E}[U] \right) I_{m_\theta} \succeq \frac{2}{m_y} (\bar{u} - \mathbb{E}[U]) I_{m_\theta} \succ 0,
\]
from which we obtain the following strong convexity condition for (SP).

Corollary 2 (Strong Convexity of the Calibration Objective). Under Assumption 1, $\bar{f}(\theta)$ is $\bar{\ell}_C$-strongly convex over $\Theta$ with $\bar{\ell}_C = \frac{2}{m_y}(\bar{u} - \mathbb{E}[U])$, that is,
\[
\bar{f}(\theta) - \bar{f}(\bar{\theta}^*) \geq \frac{\bar{\ell}_C}{2} \| \theta - \bar{\theta}^* \|^2 \quad \text{for all } \theta \in \Theta,
\]
and $\bar{\theta}^*$ is the unique minimizer of $\bar{f}(\theta)$, hence $\bar{\Theta}^* = \{\bar{\theta}^*\}$.

We then define the distance between a point $a \in \mathbb{R}^d$ and a set $\mathcal{B} \subset \mathbb{R}^d$ as $\mathrm{dist}(a, \mathcal{B}) := \min_{b \in \mathcal{B}} \| a - b \|$, so that under Corollary 2, solving (SAP) gives
\[
\mathrm{dist}(\bar{\theta}^*, \Theta^*)^2 \leq \frac{2}{\bar{\ell}_C} \min_{\theta \in \Theta^*} \left( \bar{f}(\theta) - \bar{f}(\bar{\theta}^*) \right).
\]
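The per-sample gradient identity $\nabla_\theta \| y - h(x;\theta) \|_2^2 = -2\, J_h(\theta)^\top r(\theta)$ underlying the derivation above can be verified by finite differences. The toy model below is an assumption for illustration only:

```python
import numpy as np

# Toy computer model h with m_x = 1, m_y = 2, m_theta = 2 (hypothetical).
x = 0.7

def h(theta):
    return np.array([theta[0] * x + np.sin(theta[1]), theta[1] ** 2 * x])

def J_h(theta):
    """Jacobian of h with respect to theta, shape (m_y, m_theta)."""
    return np.array([[x, np.cos(theta[1])],
                     [0.0, 2.0 * theta[1] * x]])

y = np.array([1.0, 0.5])
theta = np.array([0.3, 0.9])
r = y - h(theta)  # residual r(theta)

analytic = -2.0 * J_h(theta).T @ r  # the closed-form gradient of ||r(theta)||^2

# Central finite differences on g(theta) = ||y - h(theta)||^2.
eps = 1e-6
numeric = np.zeros(2)
for k in range(2):
    e = np.zeros(2); e[k] = eps
    numeric[k] = (np.sum((y - h(theta + e))**2) - np.sum((y - h(theta - e))**2)) / (2 * eps)

print(analytic, numeric)  # agree to finite-difference accuracy
```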
Since $\Theta^*$ is not directly accessible and can only be estimated from the finite sample $D_n$, let the sample average approximation of (SAP) be
\[
f_n(\theta) := \left( \frac{1}{m_y} \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}^\top r_j(\theta) \right)^2, \qquad \Theta^*_n := \arg\min_{\theta \in \Theta} f_n(\theta), \tag{1}
\]
where $\Theta^*_n$ denotes the set of optimal solutions. To study the behavior of $\Theta^*_n$, we next establish the uniform convergence of $f_n$ to $f$ over $\Theta$.

Proposition 3 (Uniform Law of Large Numbers). Given that $D_n$ consists of i.i.d. draws from the joint distribution of $(X, Y)$, $\Theta$ is compact, and under Assumption 1(c),
\[
\sup_{\theta \in \Theta} | f_n(\theta) - f(\theta) | \to 0 \quad \text{as } n \to \infty \text{ a.s.}
\]

The integrability condition needed for Proposition 3 [24, Pro. 8.5] is satisfied under Assumption 1(c). That is, from the Cauchy–Schwarz inequality, for all $\theta \in \Theta$,
\[
\left( \frac{1}{m_y} \mathbf{1}^\top r_j(\theta) \right)^2 \leq \frac{1}{m_y} \| r_j(\theta) \|_2^2, \qquad j = 1, \ldots, n.
\]
Since $r_j(\theta)$ is a realization of $R(\theta)$, Assumption 1(c) gives, for all $\theta \in \Theta$, $\frac{1}{m_y} \| R(\theta) \|_2^2 \leq \frac{1}{m_y} W$. Hence, the sample average terms are bounded by the integrable random variable $W / m_y$, and since $\mathbb{E}[W] < \infty$, the integrability condition holds. Note that Proposition 3 itself does not imply that solving (1) yields a solution close to $\Theta^*$. Even with $f_n(\theta)$ uniformly close to $f(\theta)$, a minimizer of $f_n$ may remain far from $\Theta^*$ if $f$ is flat near $\Theta^*$. Thus, convergence of a minimizer of $f_n$ to the solution set $\Theta^*$ further requires a growth condition near $\Theta^*$. For this, we impose the following regularity conditions.

Assumption 2 (Regularity Conditions on the Root-Finding Approximation).
(a) $\tilde{\Theta}^*$ is nonempty.
(b) There exists $\tilde{\ell}_R > 0$ such that for every $\theta_a \in \Theta$ and every $\theta_b \in \arg\min_{\theta \in \tilde{\Theta}^*} \| \theta_a - \theta \|$, $| \tilde{f}(\theta_a) - \tilde{f}(\theta_b) | \geq \tilde{\ell}_R \| \theta_a - \theta_b \|$.
(c) $\mathbb{E}\left[ \left\| \nabla_\theta (S(\theta))^2 \right\|_2^2 \right] < \infty$ for all $\theta \in \Theta$.
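The uniform convergence in Proposition 3 can be observed empirically on a toy problem where $f$ is available in closed form. The model, distributions, and constants below are assumptions chosen so that $f(\theta)$ is exact; this is a Monte Carlo illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy setting (an assumption): m_y = 1, h(x; theta) = theta * x,
# X ~ U(0.5, 1.5) so E[X] = 1, Y = theta_true * X + noise.
theta_true = 2.0
grid = np.linspace(0.0, 4.0, 201)  # a grid over a compact Theta

def f_exact(theta):
    # f(theta) = (E[Y - theta * X])^2 = ((theta_true - theta) * E[X])^2.
    return ((theta_true - theta) * 1.0) ** 2

def sup_gap(n):
    """Grid approximation of sup_theta |f_n(theta) - f(theta)| for one sample."""
    xs = rng.uniform(0.5, 1.5, size=n)
    ys = theta_true * xs + 0.2 * rng.standard_normal(n)
    f_n = (np.mean(ys) - grid * np.mean(xs)) ** 2  # f_n(theta) on the grid
    return np.max(np.abs(f_n - f_exact(grid)))

gaps = [sup_gap(n) for n in (100, 5000, 200000)]
print(gaps)  # typically shrinks as n grows
```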
(d) There exists a non-negative random variable $L$ with $\mathbb{E}[L^2] < \infty$ such that $\| \nabla_\theta (S(\theta_a))^2 - \nabla_\theta (S(\theta_b))^2 \|_2 \leq L \| \theta_a - \theta_b \|_2$ for all $\theta_a, \theta_b \in \Theta$.
(e) $\tilde{f}(\theta)$ is $\tilde{\ell}_E$-Lipschitz continuous over $\Theta$, i.e., $| \tilde{f}(\theta_a) - \tilde{f}(\theta_b) | \leq \tilde{\ell}_E \| \theta_a - \theta_b \|$ for all $\theta_a, \theta_b \in \Theta$.
(f) $\mathrm{Var}(S(\theta))$ is $\tilde{\ell}_V$-Lipschitz continuous over $\Theta$, i.e., $| \mathrm{Var}(S(\theta_a)) - \mathrm{Var}(S(\theta_b)) | \leq \tilde{\ell}_V \| \theta_a - \theta_b \|$ for all $\theta_a, \theta_b \in \Theta$.
(g) $\delta(\theta)$ is $\tilde{\ell}_\delta$-Lipschitz continuous over $\Theta$, i.e., $| \delta(\theta_a) - \delta(\theta_b) | \leq \tilde{\ell}_\delta \| \theta_a - \theta_b \|$ for all $\theta_a, \theta_b \in \Theta$.

We next establish the convergence rate of the sample average approximation.

Theorem 4 (Convergence Rate of the Sample Average Approximation). Given that $D_n$ consists of i.i.d. draws from the joint distribution of $(X, Y)$ and $\Theta$ is compact, under Assumptions 1(c), 2(a), 2(b), 2(c), and 2(d), for any $\theta^*_n \in \Theta^*_n$,
\[
\mathrm{dist}(\theta^*_n, \Theta^*) = O_p(n^{-1/2}),
\]
where $O_p(n^{-1/2})$ denotes that $n^{1/2} \, \mathrm{dist}(\theta^*_n, \Theta^*)$ is bounded in probability.

Theorem 4 requires a quadratic growth condition for $f$ near the solution set $\Theta^*$; see [24, Thm. 8.4] and [25, Thm. 5.21]. Under Assumption 2(a), we have $\Theta^* = \tilde{\Theta}^*$ and $\tilde{f}(\theta) = 0$ for all $\theta \in \tilde{\Theta}^*$. Fix any $\theta_a \in \Theta$ and let $\theta_b \in \arg\min_{\theta \in \tilde{\Theta}^*} \| \theta_a - \theta \|$. Then, by Assumption 2(b),
\[
| \tilde{f}(\theta_a) - \tilde{f}(\theta_b) | \geq \tilde{\ell}_R \| \theta_a - \theta_b \| = \tilde{\ell}_R \, \mathrm{dist}(\theta_a, \tilde{\Theta}^*).
\]
Since $\tilde{f}(\theta_b) = 0$, it follows that $| \tilde{f}(\theta_a) | \geq \tilde{\ell}_R \, \mathrm{dist}(\theta_a, \tilde{\Theta}^*)$, and therefore $f(\theta_a) = (\tilde{f}(\theta_a))^2 \geq \tilde{\ell}_R^2 \, \mathrm{dist}(\theta_a, \Theta^*)^2$, so that the quadratic growth condition for $f$ near $\Theta^*$ is satisfied. Under Assumption 1(a), $R(\theta)$ is twice continuously differentiable in $\theta$, and hence so is $S(\theta) = \frac{1}{m_y} \mathbf{1}^\top R(\theta)$. Therefore, $\nabla_\theta (S(\theta))^2 \in \mathbb{R}^{m_\theta}$ is well-defined for all $\theta \in \Theta$. Assumptions 2(c) and 2(d) further impose regularity conditions on this gradient.
We additionally impose the following assumption to derive convergence rates in expectation.

Assumption 3 (Bounded Second Moment of the Estimation Error). For any $\theta^*_n \in \Theta^*_n$, $\sup_{n \geq 1} \mathbb{E}\left[ n \, \mathrm{dist}(\theta^*_n, \Theta^*)^2 \right] < \infty$.

Corollary 5 (Sample Average Convergence Rates in Expectation). Under the assumptions of Theorem 4 and Assumption 3,
\[
\mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)] = O(n^{-1/2}), \qquad \mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)^2] = O(n^{-1}).
\]

Proof. By Assumption 3, there exists a constant $c > 0$ such that for all $n \geq 1$, $\mathbb{E}\left[ n \, \mathrm{dist}(\theta^*_n, \Theta^*)^2 \right] \leq c$. Hence, $\mathbb{E}\left[ \mathrm{dist}(\theta^*_n, \Theta^*)^2 \right] \leq c/n$, which gives $\mathbb{E}\left[ \mathrm{dist}(\theta^*_n, \Theta^*)^2 \right] = O(n^{-1})$. Also, by the Cauchy–Schwarz inequality, $\mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)] \leq \sqrt{\mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)^2]} = O(n^{-1/2})$.

We then establish the solution accuracy bound for the sample average approximation.

Theorem 6 (Solution Accuracy Bound). Given that $D_n$ consists of i.i.d. draws from the joint distribution of $(X, Y)$, $\Theta$ is compact, and under Assumptions 1 and 2,
\[
\mathrm{dist}(\bar{\theta}^*, \Theta^*_n)^2 \leq \frac{2}{\bar{\ell}_C} \left( \tilde{\ell}_E^2 \, \mathrm{dist}(\theta^*_n, \Theta^*)^2 + \sup_{\theta \in \Theta^*} \delta(\theta) + \tilde{\ell}_\delta \, \mathrm{dist}(\theta^*_n, \Theta^*) + \sup_{\theta \in \Theta^*} \mathrm{Var}(S(\theta)) + \tilde{\ell}_V \, \mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)] + \tilde{\ell}_E^2 \, \mathbb{E}\left[ \mathrm{dist}(\theta^*_n, \Theta^*)^2 \right] \right).
\]
For the proof, see Section A. We then establish the convergence rate of this bound.

Corollary 7 (Convergence Rate of the Solution Accuracy). Under the assumptions of Theorem 6 and Assumption 3,
\[
\mathrm{dist}(\bar{\theta}^*, \Theta^*_n)^2 \leq \frac{2}{\bar{\ell}_C} \sup_{\theta \in \Theta^*} \left( \delta(\theta) + \mathrm{Var}(S(\theta)) \right) + O_p(n^{-1/2}).
\]

Proof. Follows directly from Theorem 6 by applying Theorem 4 and Assumption 3.

Corollary 7 shows that the accuracy of the surrogate solution is governed by the spatial variability, the variance of the aggregated residual, and the curvature of the original objective.
In particular, solving (SAP) yields a more reliable proxy for (SP) when the variability terms are small, or when $\bar{f}$ is sufficiently strongly curved around its minimizer so that the effect of these variability terms is small. Importantly, these components are intrinsic to the calibration problem itself and cannot be reduced through (SAP). However, a better treatment of the finite-sample approximation can be obtained by accounting for sampling uncertainty, where $D_n$ is treated as random. To formalize this, we impose the following assumption.

Assumption 4 (Improved Uniform Mean Squared Error of the Stochastic Estimator). Define $\check{f}_n$ as a stochastic estimator of $f$ constructed from $D_n$, and $\check{\Theta}^*_n := \arg\min_{\theta \in \Theta} \check{f}_n(\theta)$. Then,
\[
\mathbb{E}\left[ \sup_{\theta \in \Theta} ( \check{f}_n(\theta) - f(\theta) )^2 \right] \leq \mathbb{E}\left[ \sup_{\theta \in \Theta} ( f_n(\theta) - f(\theta) )^2 \right].
\]

Assumption 4 states that the stochastic estimator $\check{f}_n$ provides better estimates of $f$ than $f_n$ over $\Theta$. In Section 5, we show that stochastic kriging motivates Assumption 4 under suitable regularity conditions. The next result gives an expected version of the solution accuracy bound.

Corollary 8 (Expected Solution Accuracy Bound). Under the assumptions of Theorem 6 and Assumption 2(b),
\[
\mathbb{E}\left[ \mathrm{dist}(\bar{\theta}^*, \Theta^*_n)^2 \right] \leq \frac{2}{\bar{\ell}_C} \left( 2 \tilde{\ell}_E^2 \, \mathbb{E}\left[ \mathrm{dist}(\theta^*_n, \Theta^*)^2 \right] + \sup_{\theta \in \Theta^*} \delta(\theta) + \sup_{\theta \in \Theta^*} \mathrm{Var}(S(\theta)) + ( \tilde{\ell}_\delta + \tilde{\ell}_V ) \left( \mathbb{E}\left[ \mathrm{dist}(\theta^*_n, \Theta^*)^2 \right] \right)^{1/2} \right).
\]

See Section B for the proof. The following result shows that an improvement in the mean squared estimation error of the stochastic estimator leads to a tighter upper bound on the expected solution accuracy.

Corollary 9 (Improved Upper Bound on Expected Solution Accuracy). Let $\overline{\mathbb{E}}[\mathrm{dist}(\bar{\theta}^*, \mathcal{A})^2]$ denote the upper bound on $\mathbb{E}[\mathrm{dist}(\bar{\theta}^*, \mathcal{A})^2]$ given in Corollary 8. Under the assumptions of Corollary 8 and Assumption 4,
\[
\overline{\mathbb{E}}\left[ \mathrm{dist}(\bar{\theta}^*, \check{\Theta}^*_n)^2 \right] \leq \overline{\mathbb{E}}\left[ \mathrm{dist}(\bar{\theta}^*, \Theta^*_n)^2 \right].
\]
For the proof, see Section C. We note that Corollary 9 does not necessarily imply a convergence rate comparable to that shown in Theorem 6. Establishing such a rate would require additional assumptions on $\check{f}_n$, analogous to those used in Theorem 4. We do not pursue that extension here, although one may expect a similar type of rate result under suitable regularity conditions. We also note that several assumptions used in this section, such as Assumption 2(b), are imposed globally over $\Theta$ rather than locally near the optimum, primarily for technical convenience. The key implication of this section is that (SAP) can become a reliable proxy for (SP), and that a stochastic treatment can further improve the solution accuracy bound.

3 Calibration with Deterministic Simulation Metamodeling

Solving the calibration problem (P) can be computationally challenging, especially when $D_n$ is large or when the evaluation of $h(\cdot)$ itself is time consuming. This motivates the use of a simulation metamodeling approach, often referred to as the response surface method, in which a surrogate model is constructed to approximate the objective function.

3.1 Kriging

Kriging is a widely used metamodeling approach that was originally developed in geostatistics for spatial interpolation, such as estimating mineral distributions in mining [26]. It has since evolved into a standard technique with applications in machine learning, uncertainty quantification, and simulation metamodeling. In the machine learning literature, kriging is more commonly referred to as Gaussian process regression (GPR) and is widely used for hyperparameter tuning, prediction, and uncertainty quantification. To ease the use of kriging for solving the calibration task in (P), we generally assume that $\bar{f}_n(\theta)$ is a realization of a Gaussian random field (GRF) over $\Theta$ [23]. Define $\Theta_t$ as the set of evaluated parameters at iteration $t$.
To initialize the procedure, we allow for an initial design consisting of $p$ points in the parameter space, so that $\Theta_t$ consists of these $p$ initial design points when $t = 0$, i.e., $\Theta_0 = \{\theta_1, \ldots, \theta_p\}$. Let $\bar{\mathbf{f}}^t_n$ represent the corresponding function values at $\Theta_t$; for solving (P), $\bar{\mathbf{f}}^t_n = [\bar{f}_n(\theta_1), \ldots, \bar{f}_n(\theta_{p+t})]^\top$. Then, the predictive distribution of kriging at an unseen configuration $\theta$ at iteration $t$ is expressed as
\[
\eta^t_n(\theta) \sim \mathcal{N}\left( \mu^t_n(\theta), (\sigma^t_n)^2(\theta) \right), \tag{2}
\]
\[
\mu^t_n(\theta) = k(\theta, \Theta_t; l) [k(\Theta_t, \Theta_t; l)]^{-1} \bar{\mathbf{f}}^t_n, \qquad (\sigma^t_n)^2(\theta) = k(\theta, \theta; l) - k(\theta, \Theta_t; l) [k(\Theta_t, \Theta_t; l)]^{-1} k(\theta, \Theta_t; l)^\top,
\]
where $\Theta_t = \{\theta_1, \ldots, \theta_{p+t}\}$. The kernel function $k(\cdot, \cdot; l)$ defines the correlation between two configurations in the parameter space. A common choice is the squared exponential kernel, also known as the radial basis function (RBF) kernel, defined by
\[
k(\theta, \theta'; l) = \exp\left( -\frac{\| \theta - \theta' \|^2}{2 l^2} \right),
\]
where $\theta, \theta' \in \mathbb{R}^{m_\theta}$ and $l > 0$ is the length-scale hyperparameter of the kernel. Given two sets of points $w \in \mathbb{R}^{m_m \times m_\theta}$ and $v \in \mathbb{R}^{m_n \times m_\theta}$, we define the kernel matrix $k(w, v; l) \in \mathbb{R}^{m_m \times m_n}$, where each entry corresponds to the kernel evaluation between a pair of points from the two sets; the $(i, j)$-th entry is $[k(w, v; l)]_{i,j} := k([w]_i, [v]_j; l)$. Accordingly, $k(\theta, \Theta_t; l) \in \mathbb{R}^{p+t}$ denotes the vector of kernel evaluations between the test point $\theta$ and the set of evaluated points $\Theta_t \in \mathbb{R}^{(p+t) \times m_\theta}$, and the kernel matrix $k(\Theta_t, \Theta_t; l) \in \mathbb{R}^{(p+t) \times (p+t)}$ collects the pairwise kernel evaluations between the evaluated configurations. The hyperparameter $l$ controls how smoothly the function varies across the parameter space. Smaller values of $l$ make the kernel decay more rapidly with distance, allowing the function to capture more localized variations.
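The kriging predictor in (2) can be sketched in a few lines of numpy. This is a minimal noise-free interpolator with an RBF kernel on toy one-dimensional data (the function, length scale, and jitter value are illustrative assumptions; the jitter is a standard numerical-stability device, not part of (2)):

```python
import numpy as np

def rbf(A, B, l=0.3):
    """Kernel matrix with entries exp(-||a - b||^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * l**2))

def krige(theta, Theta_t, f_t, l=0.3, jitter=1e-10):
    """Posterior mean and variance (2) at test points theta, given (Theta_t, f_t)."""
    K = rbf(Theta_t, Theta_t, l) + jitter * np.eye(len(Theta_t))  # stabilized K
    k_star = rbf(theta, Theta_t, l)
    mu = k_star @ np.linalg.solve(K, f_t)
    var = 1.0 - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
    return mu, np.maximum(var, 0.0)

# One-dimensional illustration: interpolate f(theta) = sin(4 * theta).
Theta_t = np.linspace(0.0, 1.0, 6)[:, None]
f_t = np.sin(4 * Theta_t[:, 0])
mu, var = krige(Theta_t, Theta_t, f_t)  # predict back at the design points
print(np.max(np.abs(mu - f_t)), np.max(var))
```

At the design points the predictor interpolates the data and the posterior variance collapses toward zero, which is the deterministic-kriging behavior that Section 5 later relaxes under stochastic kriging.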
In contrast, larger values of $l$ result in broader correlation across the parameter space, thereby enforcing a smoother, less variable structure. In practice, $l$ is chosen by maximizing the log marginal likelihood
\[
\mathcal{L}^t_n(l) = -\frac{1}{2} (\bar{\mathbf{f}}^t_n)^\top \left[ k(\Theta_t, \Theta_t; l) \right]^{-1} \bar{\mathbf{f}}^t_n - \frac{1}{2} \log \det k(\Theta_t, \Theta_t; l) - \frac{p+t}{2} \log(2\pi).
\]

3.2 Sequential Design and Acquisition Strategies

As noted, metamodeling approaches are employed to reduce the computational cost of time-consuming computer models. These surrogates are refined sequentially by selecting new parameter values in a way that exploits the surrogate's structure. This process is carried out by optimizing an acquisition function, which provides the criterion for choosing the next evaluation point. When kriging is used, this sequential process is typically referred to as Bayesian optimization, particularly in the machine learning domain [27]. Commonly used acquisition functions include the Lower Confidence Bound (LCB), Probability of Improvement (PI), and Expected Improvement (EI) [28, 29]. In the following, we review these three acquisition functions, adopting the same notation as before.

Lower Confidence Bound. The Lower Confidence Bound (LCB) is a widely used acquisition function that balances exploration and exploitation when selecting the next evaluation point. In the minimization context, LCB favors points where the predicted mean is low (to exploit) and the uncertainty is high (to explore). The LCB acquisition function is defined as
\[
\mathrm{LCB}^t_n(\theta) = \mu^t_n(\theta) - \kappa \sigma^t_n(\theta),
\]
where $\kappa > 0$ is a user-defined parameter that balances exploitation and exploration. Larger values of $\kappa$ emphasize exploration by giving more weight to uncertainty, while smaller values trigger more exploitation of the mean prediction.
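Length-scale selection by maximizing the log marginal likelihood can be sketched with a simple grid search (gradient-based optimizers are more typical in practice; the data, grid, and jitter below are illustrative assumptions):

```python
import numpy as np

def rbf_K(Theta, l, jitter=1e-8):
    """RBF kernel matrix with a small jitter for numerical stability."""
    d2 = ((Theta[:, None, :] - Theta[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * l**2)) + jitter * np.eye(len(Theta))

def log_marginal_likelihood(f_t, Theta, l):
    """The log marginal likelihood L(l) from Section 3.1."""
    K = rbf_K(Theta, l)
    _, logdet = np.linalg.slogdet(K)
    quad = f_t @ np.linalg.solve(K, f_t)
    return -0.5 * quad - 0.5 * logdet - 0.5 * len(f_t) * np.log(2 * np.pi)

rng = np.random.default_rng(5)
Theta = rng.uniform(0, 1, size=(20, 1))
f_t = np.sin(4 * Theta[:, 0])  # a smooth toy response

grid = np.array([0.02, 0.05, 0.1, 0.2, 0.5, 1.0])
lls = np.array([log_marginal_likelihood(f_t, Theta, l) for l in grid])
l_hat = grid[np.argmax(lls)]
print(l_hat)
```

For a smooth noise-free response such as this one, very small length scales are penalized: they explain the data no better but forgo the log-determinant reward of a strongly correlated kernel.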
In this setting, the acquisition function is optimized by minimizing the LCB, i.e., $\theta_{t+1} = \arg\min_{\theta \in \Theta} \mathrm{LCB}_n^t(\theta)$.

Probability of Improvement. Probability of Improvement (PI) selects the next evaluation point based on the likelihood that it yields a lower value than the best value observed so far. Define $v_n^t$ as the current best observed value among the evaluated configurations, $v_n^t := \min \bar{f}_n^t$. The PI acquisition function is defined as
\[
\mathrm{PI}_n^t(\theta) = \Phi\Big(\frac{v_n^t - \mu_n^t(\theta)}{\sigma_n^t(\theta)}\Big);
\]
for the detailed derivation, see Section D. The next evaluation point is obtained by maximizing PI, i.e., $\theta_{t+1} = \arg\max_{\theta \in \Theta} \mathrm{PI}_n^t(\theta)$.

Expected Improvement. Expected Improvement (EI) selects the next evaluation point based on the expected magnitude of improvement over the current best observed value. Define the standardized threshold $z_n^t(\theta) := (v_n^t - \mu_n^t(\theta)) / \sigma_n^t(\theta)$. The EI acquisition function is defined as
\[
\mathrm{EI}_n^t(\theta) = \big(v_n^t - \mu_n^t(\theta)\big)\,\Phi(z_n^t(\theta)) + \sigma_n^t(\theta)\,\phi(z_n^t(\theta));
\]
for the detailed derivation, see Section E. Using EI, the next evaluation point is obtained by maximizing it, i.e., $\theta_{t+1} = \arg\max_{\theta \in \Theta} \mathrm{EI}_n^t(\theta)$.

Acquisition Function Optimization. In the sequential design framework, once the acquisition function is chosen, the next evaluation point is determined by optimizing it. A common approach is a multi-start strategy, where several initial points are randomly sampled across the parameter space and then refined using an efficient local optimization solver [27]. Quasi-Newton or other nonlinear optimization methods are typically employed when derivatives can be computed efficiently. Each evaluation of the acquisition function requires computing the posterior predictive distribution at the candidate point, which involves inverting the regularized kernel matrix in (2).
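Given the posterior mean and standard deviation, the three standard criteria are one-liners; a sketch (function names ours):

```python
import numpy as np
from scipy.stats import norm

def lcb(mu, sigma, kappa=1.0):
    """Lower Confidence Bound: favors low mean and high uncertainty (minimize this)."""
    return mu - kappa * sigma

def prob_improvement(mu, sigma, v_best):
    """PI: probability that the prediction falls below the incumbent v_best."""
    return norm.cdf((v_best - mu) / sigma)

def expected_improvement(mu, sigma, v_best):
    """EI: expected amount by which the prediction improves on v_best."""
    z = (v_best - mu) / sigma
    return (v_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```

Note that at a point whose posterior mean equals the incumbent, PI is exactly 1/2 and EI reduces to $\sigma \phi(0)$, so EI still rewards uncertainty there while PI does not distinguish it.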
This matrix inversion is usually the most computationally intensive part of the sequential design procedure. Accordingly, as more observations are accumulated, acquisition function optimization becomes more computationally expensive.

4 Calibration with Root Finding Strategy

In this section, we introduce an alternative root finding framework to solve (1). Define the sample root-finding objective as
\[
\tilde{f}_n(\theta) := \frac{1}{m_y} \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}^\top r_j(\theta). \tag{3}
\]
Since $f_n(\theta) = (\tilde{f}_n(\theta))^2$, solving (1) is equivalent to finding a root of (3) whenever such a root exists. The key motivation is that aggregating residuals before squaring preserves sign information, which can be effectively leveraged to guide the calibration process. Throughout, we treat the sign of the discrepancy symmetrically, favoring neither positive nor negative values, although weighting schemes may be incorporated to favor one over the other if desired. We first establish the following assumptions.

Assumption 5 (Regularity Conditions on the Sample Root-Finding Problem). (a) There exists a $\theta_n^* \in \Theta$ such that $\tilde{f}_n(\theta_n^*) = 0$. (b) $\tilde{f}_n(\theta)$ is continuous on $\Theta$.

Theorem 10 (Bolzano's Theorem). Let $\tilde{f}: \Theta \to \mathbb{R}$ be continuous on a closed interval $[\theta_a, \theta_b] \subset \Theta$. If $\tilde{f}(\theta_a) \cdot \tilde{f}(\theta_b) < 0$, then there exists $\theta^* \in (\theta_a, \theta_b)$ such that $\tilde{f}(\theta^*) = 0$.

While Assumption 5(a) does not hold in general for finite $n$, we first impose it for clarity and later discuss, in Section J, the case where no root exists. Assumption 5(b) is typically satisfied, since most commonly used kernels, such as the RBF or Matérn kernels, are continuous, and such continuity assumptions appear frequently in computer model applications [30, 31].
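To make the aggregation in (3) concrete, here is a toy computation with made-up residuals (two replications, two output dimensions; the numbers are illustrative, not from the paper). The point is that the signed aggregate preserves the direction of the discrepancy, which the squared objective discards.

```python
import numpy as np

def f_tilde_n(residuals, m_y):
    """Sample root-finding objective (3): signed residuals are summed across
    outputs (1^T r_j), averaged over replications, and scaled by 1/m_y."""
    return residuals.sum(axis=1).mean() / m_y

# made-up residuals: n = 2 replications (rows), m_y = 2 output dimensions (columns)
r = np.array([[0.5, -0.3],
              [0.1,  0.2]])
print(f_tilde_n(r, m_y=2))    # row sums 0.2 and 0.3, mean 0.25, scaled -> 0.125
print(f_tilde_n(-r, m_y=2))   # negated residuals flip the sign -> -0.125
```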
Assumptions 5(a) and 5(b) are particularly important, since under Theorem 10 they ensure the existence of a root whenever the discrepancy function exhibits opposite signs at two distinct points. Importantly, this enables sign-guided calibration, in which we exploit the signs to progressively narrow down the search space. Figure 1 illustrates this structural advantage.

Figure 1: Left: a metamodel fitted to the signed discrepancy $\tilde{f}_n(\theta)$, where opposite signs at $\theta_a$ and $\theta_b$ guarantee the existence of a root within $[\theta_a, \theta_b]$ by continuity. Right: a metamodel fitted to the squared discrepancy $f_n(\theta)$, where additional evaluations may be needed to locate lower estimates within $[\theta_a, \theta_b]$.

Adapting this root finding strategy to the metamodeling framework, however, requires new acquisition functions, since conventional acquisition functions are designed to target regions with lower values, whereas a root finding acquisition function should target regions expected to be close to zero. In what follows, we propose root finding variants of the acquisition functions reviewed in Section 3.

4.1 Proposed Root Finding Acquisition Functions

Following similar notation as before, and to distinguish these quantities, we denote by $\tilde{\eta}_n^t(\theta) \sim \mathcal{N}(\tilde{\mu}_n^t(\theta), (\tilde{\sigma}_n^t)^2(\theta))$ the predictive distribution constructed with $\tilde{f}_n(\theta)$ from (3). Accordingly, in (2) we use $\tilde{f}_n^t = [\tilde{f}_n(\theta_1), \ldots, \tilde{f}_n(\theta_{p+t})]^\top$ instead.

Root Finding with Lower Confidence Bound. We extend the LCB acquisition function to root finding by favoring points where the predicted discrepancy is close to zero while encouraging exploration through uncertainty. Accordingly, we define the root finding LCB as
\[
\widetilde{\mathrm{LCB}}{}_n^t(\theta) = |\tilde{\mu}_n^t(\theta)| - \kappa \, \tilde{\sigma}_n^t(\theta).
\]
Since the predictive variance is always non-negative, only the mean component needs to be adjusted. As in the standard LCB, the balance between exploitation and exploration is controlled by the user-defined parameter $\kappa > 0$. The next evaluation point is selected by minimizing $\widetilde{\mathrm{LCB}}{}_n^t$, i.e., $\theta_{t+1} = \arg\min_{\theta \in \Theta} \widetilde{\mathrm{LCB}}{}_n^t(\theta)$.

Root Finding with Probability of Improvement. In the root finding setting, the improvement event is redefined to measure how much the predicted value moves closer to zero relative to the currently closest observed value. Let
\[
i_n^{t*} := \arg\min_{1 \le i \le p+t} \big|[\tilde{f}_n^t]_i\big| \qquad \text{and} \qquad \tilde{v}_n^t := [\tilde{f}_n^t]_{i_n^{t*}}
\]
denote the index and the value of the evaluated configuration whose observed discrepancy is closest to zero. The resulting root finding PI is defined as
\[
\widetilde{\mathrm{PI}}{}_n^t(\theta) = \Phi\Big(\frac{|\tilde{v}_n^t| - \tilde{\mu}_n^t(\theta)}{\tilde{\sigma}_n^t(\theta)}\Big) - \Phi\Big(\frac{-|\tilde{v}_n^t| - \tilde{\mu}_n^t(\theta)}{\tilde{\sigma}_n^t(\theta)}\Big);
\]
for the detailed derivation, see Section F.

Figure 2: Probability of improvement under the root finding (left) and minimization (right) frameworks.

Figure 2 illustrates the conceptual difference between the improvement criteria of minimization and root finding. In minimization, improvement means that the predictive distribution produces a value lower than the current best value, resulting in a one-sided threshold. Root finding, in contrast, seeks predictions that are closer to zero, resulting in a two-sided threshold. The next evaluation point is then selected by maximizing $\widetilde{\mathrm{PI}}{}_n^t$, i.e., $\theta_{t+1} = \arg\max_{\theta \in \Theta} \widetilde{\mathrm{PI}}{}_n^t(\theta)$.

Root Finding with Expected Improvement. In the root finding framework, the expected amount of improvement is redefined to quantify the expected reduction in the absolute deviation from zero.
Let
\[
\underline{z}_n^t(\theta) := \frac{-|\tilde{v}_n^t| - \tilde{\mu}_n^t(\theta)}{\tilde{\sigma}_n^t(\theta)}, \qquad
\tilde{z}_n^t(\theta) := \frac{-\tilde{\mu}_n^t(\theta)}{\tilde{\sigma}_n^t(\theta)}, \qquad
\bar{z}_n^t(\theta) := \frac{|\tilde{v}_n^t| - \tilde{\mu}_n^t(\theta)}{\tilde{\sigma}_n^t(\theta)}
\]
denote the standardized lower, central, and upper thresholds, respectively. The resulting root finding expected improvement is defined as
\[
\widetilde{\mathrm{EI}}{}_n^t(\theta) = |\tilde{v}_n^t| \big(\Phi(\bar{z}_n^t(\theta)) - \Phi(\underline{z}_n^t(\theta))\big)
+ \tilde{\mu}_n^t(\theta) \big(2\Phi(\tilde{z}_n^t(\theta)) - \Phi(\bar{z}_n^t(\theta)) - \Phi(\underline{z}_n^t(\theta))\big)
- \tilde{\sigma}_n^t(\theta) \big(2\phi(\tilde{z}_n^t(\theta)) - \phi(\bar{z}_n^t(\theta)) - \phi(\underline{z}_n^t(\theta))\big);
\]
for the detailed derivation, see Section G. The next evaluation point is then selected by maximizing $\widetilde{\mathrm{EI}}{}_n^t$, i.e., $\theta_{t+1} = \arg\max_{\theta \in \Theta} \widetilde{\mathrm{EI}}{}_n^t(\theta)$.

4.2 Accelerating Acquisition Function Optimization

Optimizing the acquisition function is the most computationally demanding step when using a metamodel. In root finding, however, this optimization step can be effectively accelerated by exploiting sign information: we can bypass regions where no sign change occurs, concentrating the optimization effort on areas more likely to contain a root. Based on this insight, we introduce a search space reduction strategy tailored to this framework. We additionally derive closed-form first-order derivatives of the proposed root finding acquisition functions (see Sections H and I) to facilitate the use of gradient-based optimizers.

Search Space Reduction. As additional evaluations are acquired, the number of sign-changing intervals (combinations of configurations) also increases. The key question is therefore which intervals to select and prioritize for subsequent evaluations. Since we assume no prior structural knowledge of the function, we focus on the most compact interval that contains a root.
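Given the posterior quantities (mean, standard deviation, and the incumbent value closest to zero), the three root-finding acquisitions can be sketched as follows; the names are ours:

```python
import numpy as np
from scipy.stats import norm

def rf_lcb(mu, sigma, kappa=1.0):
    """Root-finding LCB: small |mean| is good, high uncertainty is worth exploring."""
    return abs(mu) - kappa * sigma

def rf_pi(mu, sigma, v):
    """Root-finding PI: probability the prediction lands inside (-|v|, |v|)."""
    a = abs(v)
    return norm.cdf((a - mu) / sigma) - norm.cdf((-a - mu) / sigma)

def rf_ei(mu, sigma, v):
    """Root-finding EI: expected reduction in the absolute deviation from zero."""
    a = abs(v)
    z_lo = (-a - mu) / sigma   # lower threshold
    z_c = -mu / sigma          # central threshold
    z_hi = (a - mu) / sigma    # upper threshold
    return (a * (norm.cdf(z_hi) - norm.cdf(z_lo))
            + mu * (2 * norm.cdf(z_c) - norm.cdf(z_hi) - norm.cdf(z_lo))
            - sigma * (2 * norm.pdf(z_c) - norm.pdf(z_hi) - norm.pdf(z_lo)))
```

A quick sanity check of the two-sided nature of these criteria: both rf_pi and rf_ei are symmetric in the sign of the posterior mean, since only the distance to zero matters.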
We quantify such compactness using a hypervolume measure based on the size of each sign-changing region in the parameter space. To formalize this, we first identify candidate subregions by examining all pairs $(\theta_a, \theta_b)$ with $1 \le a < b \le p+t$ such that $[\tilde{f}_n^t]_a \cdot [\tilde{f}_n^t]_b < 0$, where each pair defines a hyperrectangular subregion
\[
R(\theta_a, \theta_b) = \prod_{l=1}^{m_\theta} \big[\min\{[\theta_a]_l, [\theta_b]_l\}, \max\{[\theta_a]_l, [\theta_b]_l\}\big].
\]
The hypervolume of each subregion is computed as
\[
V(\theta_a, \theta_b) = \Big(\prod_{l=1}^{m_\theta} \max\big\{|[\theta_a]_l - [\theta_b]_l|, \vartheta\big\}\Big) \cdot \big|[\tilde{f}_n^t]_a - [\tilde{f}_n^t]_b\big|,
\]
where $\vartheta > 0$ is a user-defined distance threshold that prevents zero-volume subregions when coordinates coincide along any axis. Figure 3 illustrates this process, and the overall algorithmic framework is presented in Algorithm 1.

Algorithm 1 Reduced Search Space (RSS) - deterministic
1: Initialization: evaluated configurations $\Theta_t = \{\theta_1, \ldots, \theta_{p+t}\} \in \mathbb{R}^{(p+t) \times m_\theta}$; corresponding signed discrepancies $\tilde{f}_n^t = [\tilde{f}_n(\theta_1), \ldots, \tilde{f}_n(\theta_{p+t})]^\top$; distance threshold $\vartheta > 0$
2: $S \leftarrow \emptyset$ to store candidate subregions
3: for each pair $(a, b)$ with $1 \le a < b \le p+t$ do
4: if $\tilde{f}_n(\theta_a) \cdot \tilde{f}_n(\theta_b) < 0$ then
5: Define subregion: $R(\theta_a, \theta_b) = \prod_{l=1}^{m_\theta} [\min\{[\theta_a]_l, [\theta_b]_l\}, \max\{[\theta_a]_l, [\theta_b]_l\}]$
6: Compute hypervolume: $V(\theta_a, \theta_b) = \big(\prod_{l=1}^{m_\theta} \max\{|[\theta_a]_l - [\theta_b]_l|, \vartheta\}\big) \cdot |[\tilde{f}_n^t]_a - [\tilde{f}_n^t]_b|$
7: Append $(R(\theta_a, \theta_b), V(\theta_a, \theta_b))$ to $S$
8: end if
9: end for
10: Select the subregion in $S$ with the smallest $V(\theta_a, \theta_b)$

4.3 Root Finding in Rootless Scenarios

Throughout, the core of this section has been motivated by the sign information available under Assumptions 5(a) and 5(b). However, Assumption 5(a) need not hold for finite $n$, and $\tilde{f}_n(\theta)$ may remain strictly positive (or negative) over $\Theta$.
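Algorithm 1 admits a direct implementation; the following is a minimal Python sketch, assuming the design points and signed discrepancies are held in NumPy arrays (variable names ours):

```python
import numpy as np

def reduced_search_space(Theta_t, f_tilde_t, dist_eps=1e-8):
    """Algorithm 1 sketch: collect sign-changing pairs and rank them by hypervolume.
    Returns a list of ((lower_corner, upper_corner), volume), smallest volume first."""
    S = []
    for a in range(len(Theta_t)):
        for b in range(a + 1, len(Theta_t)):
            if f_tilde_t[a] * f_tilde_t[b] < 0:   # opposite signs: a root lies in between
                lo = np.minimum(Theta_t[a], Theta_t[b])
                hi = np.maximum(Theta_t[a], Theta_t[b])
                vol = (np.maximum(np.abs(Theta_t[a] - Theta_t[b]), dist_eps).prod()
                       * abs(f_tilde_t[a] - f_tilde_t[b]))
                S.append(((lo, hi), vol))
    S.sort(key=lambda s: s[1])
    return S
```

The quadratic pair loop is affordable here because the number of evaluated configurations $p + t$ stays small by design.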
In practice, the functional structure is unknown in advance, so it is unclear whether one should abandon the root finding formulation and refit the surrogate to the minimization objective $\bar{f}_n(\theta)$, based on the observed residuals, before expending excessive evaluations. We refer to this as the rootless scenario, and note that RSS is also inapplicable here since it relies on sign-changing intervals. We discuss this scenario on a realized sample path of $\tilde{f}_n$. Let $(\Omega, \mathcal{F}, P)$ be the probability space governing the randomness in $D_n$, and define $\Omega^+ := \{\omega \in \Omega : \tilde{f}_n(\theta; \omega) > 0 \text{ for all } \theta \in \Theta\}$ as the event corresponding to the rootless scenario. We then establish the following result.

Figure 3: Computed hypervolumes from three evaluated configurations in a two-dimensional input space ($m_\theta = 2$), among which two pairs exhibit opposite signs. These pairs define two candidate subregions, and the subregion with the smallest hypervolume is highlighted in green.

Theorem 11 (Asymptotic Equivalence in the Rootless Scenario). For a fixed $\omega \in \Omega^+$, the optimizers of the root finding and standard acquisition functions coincide asymptotically as $t \to \infty$:
\[
\Big\| \arg\min_{\theta \in \Theta} \widetilde{\mathrm{LCB}}{}^t(\theta; \omega) - \arg\min_{\theta \in \Theta} \mathrm{LCB}^t(\theta; \omega) \Big\| \to 0, \qquad
\Big\| \arg\max_{\theta \in \Theta} \widetilde{\mathrm{PI}}{}^t(\theta; \omega) - \arg\max_{\theta \in \Theta} \mathrm{PI}^t(\theta; \omega) \Big\| \to 0,
\]
\[
\Big\| \arg\max_{\theta \in \Theta} \widetilde{\mathrm{EI}}{}^t(\theta; \omega) - \arg\max_{\theta \in \Theta} \mathrm{EI}^t(\theta; \omega) \Big\| \to 0.
\]
For the detailed assumptions and proofs, see Section J. Theorem 11 implies that the proposed acquisition functions at least asymptotically recover the minimizer of $f_n$, where the gap between $f_n$ and the original objective $\bar{f}$ is characterized in Section 2.

5 Calibration with Stochastic Simulation Metamodeling

The deterministic approach in Section 3 relies on a fixed observed dataset $D_n$. However, this finite-sample formulation has several limitations.
First, it is subject to sampling variability and noise. Second, when $n$ is large, evaluating (P) becomes computationally expensive, as each calibration iteration requires processing the full dataset. Third, this sample average approximation implicitly assumes homoskedastic noise, which may not hold in practice. Stochastic kriging addresses these limitations by explicitly modeling the heteroskedastic nature of the observations. In this framework, the stochastic input-output pairs $(X, Y)$ can be obtained either through resampling methods (e.g., the bootstrap) from $D_n$ or by fitting a generative model. In what follows, we first review standard minimization-based stochastic kriging.

5.1 Stochastic Kriging

Stochastic kriging models the mean response $\bar{f}(\theta)$ in (SP) as a realization of a GRF. Unlike the deterministic setting in Section 3, where the residuals $r_j(\theta)$ were computed from the fixed observed $D_n$, in stochastic kriging each configuration $\theta$ is evaluated with $n(\theta)$ independent replications drawn from the underlying distribution of $(X, Y)$, with $r^{(j)}(\theta) = y^{(j)} - h(x^{(j)}; \theta)$. We use separate notation for these residuals to emphasize that they arise from generated simulation draws at each $\theta$, rather than from a pre-existing fixed dataset. The replication average at $\theta$ is computed as
\[
\bar{f}_{n(\theta)}(\theta) := \frac{1}{m_y} \frac{1}{n(\theta)} \sum_{j=1}^{n(\theta)} \|r^{(j)}(\theta)\|_2^2. \tag{4}
\]
Let $\Theta_t = \{\theta_1, \ldots, \theta_{p+t}\}$ denote the set of evaluated configurations up to iteration $t$, and let $\bar{f}_n^t = [\bar{f}_{n(\theta_1)}(\theta_1), \ldots, \bar{f}_{n(\theta_{p+t})}(\theta_{p+t})]^\top$ be the vector of replication averages. These observations are modeled as
\[
\bar{f}_n^t = \bar{f}^t + \varepsilon^t, \qquad \varepsilon^t \sim \mathcal{N}\big(0, \Sigma^{(t)}\big),
\]
where $\bar{f}^t = [\bar{f}(\theta_1), \ldots, \bar{f}(\theta_{p+t})]^\top$ denotes the vector of latent mean responses and $\varepsilon^t$ represents the simulation noise.
The covariance matrix $\Sigma^{(t)} \in \mathbb{R}^{(p+t) \times (p+t)}$ characterizes the noise structure induced by finite replications. Its diagonal entries quantify the sampling variance at each design point, while the off-diagonal entries capture potential correlations arising from common random numbers or other shared sources of randomness. The $a$-th diagonal entry of $\Sigma^{(t)}$ is estimated as
\[
[\Sigma^{(t)}]_{a,a} = \frac{1}{n(\theta_a)\big(n(\theta_a) - 1\big)} \sum_{j=1}^{n(\theta_a)} \Big(\frac{1}{m_y} \|r^{(j)}(\theta_a)\|_2^2 - \bar{f}_{n(\theta_a)}(\theta_a)\Big)^2.
\]
The replication size $n(\theta)$ governs the extent to which simulation noise is reduced at each design point. Increasing $n(\theta)$ decreases the variance of the replication average and gives more accurate estimates of the mean response. However, a larger $n(\theta)$ also incurs higher computational cost, so it is important to balance replication effort against modeling accuracy. One approach for determining $n(\theta)$ is based on the integrated mean squared error (IMSE) criterion [19], in which replication sizes are allocated to minimize the expected mean squared prediction error integrated over the domain of interest. Assuming that the extrinsic covariance and intrinsic variance functions are known, the replication effort can be determined by quantifying its marginal impact on the IMSE score. Alternatively, $n(\theta)$ can be determined by balancing the exploration-exploitation trade-off between replicating at visited points and evaluating new design points [32]. Here, replication sizes are chosen by minimizing an integrated variance criterion, which compares the posterior variance reduction achieved by adding replications at existing design points against the reduction achieved by introducing new design points sampled from the posterior distribution.
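The replication average (4) and the corresponding diagonal noise estimate can be computed together from the stacked residuals at a design point; a sketch, assuming the $n(\theta)$ signed residual vectors are stacked row-wise (names ours):

```python
import numpy as np

def replication_stats(residuals, m_y):
    """Replication average (4) and its variance estimate, i.e., the a-th
    diagonal entry of Sigma^(t). residuals has shape (n_reps, m_y)."""
    per_rep = (residuals ** 2).sum(axis=1) / m_y   # (1/m_y) ||r^{(j)}||^2 per replication
    f_bar = per_rep.mean()
    n = len(per_rep)
    # sample variance of the per-replication values, divided by n:
    # 1 / (n (n-1)) * sum_j (x_j - x_bar)^2
    var_hat = ((per_rep - f_bar) ** 2).sum() / (n * (n - 1))
    return f_bar, var_hat
```

Note that var_hat is the variance of the *average*, not of a single replication, which is why the extra factor of $1/n$ appears.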
Given $\Theta_t$ and the corresponding vector of replication averages $\bar{f}_n^t$, we seek a predictor of the mean response $\bar{f}(\theta)$ at an unobserved configuration $\theta$. The predictive distribution of stochastic kriging is given by
\[
\eta_n^t(\theta) \sim \mathcal{N}\big(\mu_n^t(\theta), (\sigma_n^t)^2(\theta)\big), \tag{5}
\]
\[
\mu_n^t(\theta) = k(\theta, \Theta_t; l)\big[k(\Theta_t, \Theta_t; l) + \Sigma^{(t)}\big]^{-1} \bar{f}_n^t, \qquad
(\sigma_n^t)^2(\theta) = k(\theta, \theta; l) - k(\theta, \Theta_t; l)\big[k(\Theta_t, \Theta_t; l) + \Sigma^{(t)}\big]^{-1} k(\theta, \Theta_t; l)^\top;
\]
for the detailed derivation, see Section K. These expressions closely mirror the standard kriging predictor in (2), the difference being that the simulation noise covariance $\Sigma^{(t)}$ is explicitly incorporated. The hyperparameter $l$ is estimated by maximizing the log marginal likelihood
\[
\mathcal{L}_n^t(l) = -\tfrac{1}{2} (\bar{f}_n^t)^\top \big[k(\Theta_t, \Theta_t; l) + \Sigma^{(t)}\big]^{-1} \bar{f}_n^t - \tfrac{1}{2} \log \det\big(k(\Theta_t, \Theta_t; l) + \Sigma^{(t)}\big) - \tfrac{p+t}{2} \log(2\pi). \tag{6}
\]

5.2 Sequential Design and Acquisition Strategies

The sequential design strategies for stochastic kriging follow the same general principles as in Section 3. A conventional approach is to directly apply the standard acquisition functions of Section 3, under the assumption that the stochasticity is implicitly accounted for through $\Sigma^{(t)}$. Alternative resampling methods, such as jackknife-based methods, can also be used to refine the variance estimation [33]. Moreover, several refined acquisition functions have been proposed to better accommodate stochastic settings. One example is Augmented Expected Improvement (AEI), which modifies EI by penalizing repeated sampling at the current best configuration [34]. However, AEI relies on the assumption that the noise variance is known, which is rarely satisfied in practice [35]. Another related work is Modified Expected Improvement (MEI), which is proposed within a two-stage framework.
In its initial stage, MEI accounts only for spatial prediction uncertainty, while in the second stage, replication effort is distributed based on the estimated noise variance [36]. Furthermore, several choices have been discussed in the literature for defining the current best value used in PI and EI, such as using the predictive mean [37] or a quantile-based approach [38]. In this work, we define the current best value as the minimum predictive mean among $\Theta_t$:
\[
i_n^{t*} := \arg\min_{1 \le i \le p+t} \mu_n^t(\theta_i), \qquad v_n^t := \mu_n^t(\theta_{i_n^{t*}}). \tag{7}
\]

5.3 Root Finding Modifications

We now extend the proposed root finding framework to the stochastic setting. Here, observations are no longer noise-free, and the presence of stochasticity implies that all quantities of interest, such as the sign and the existence of a root, are probabilistic in nature. We first write the stochastic root finding observation as
\[
\tilde{f}_{n(\theta)}(\theta) := \frac{1}{m_y} \frac{1}{n(\theta)} \sum_{j=1}^{n(\theta)} \mathbf{1}^\top r^{(j)}(\theta),
\]
which mirrors the expression in (4). Given the set of design points $\Theta_t$, the vector of collected sample averages is $\tilde{f}_n^t = [\tilde{f}_{n(\theta_1)}(\theta_1), \ldots, \tilde{f}_{n(\theta_{p+t})}(\theta_{p+t})]^\top$. The $a$-th diagonal entry of the simulation-noise covariance matrix $\tilde{\Sigma}^{(t)} \in \mathbb{R}^{(p+t) \times (p+t)}$ is
\[
[\tilde{\Sigma}^{(t)}]_{a,a} = \frac{1}{n(\theta_a)\big(n(\theta_a) - 1\big)} \sum_{j=1}^{n(\theta_a)} \Big(\frac{1}{m_y} \mathbf{1}^\top r^{(j)}(\theta_a) - \tilde{f}_{n(\theta_a)}(\theta_a)\Big)^2.
\]
All remaining modeling assumptions follow the convention in (5) and (6), with $\bar{f}_n^t$ and $\Sigma^{(t)}$ replaced by $\tilde{f}_n^t$ and $\tilde{\Sigma}^{(t)}$, and we denote the root finding predictive distribution by $\tilde{\eta}_n^t(\theta) \sim \mathcal{N}\big(\tilde{\mu}_n^t(\theta), (\tilde{\sigma}_n^t)^2(\theta)\big)$. We now discuss the stochastic estimator that motivates Assumption 4. In this setting, we define the stochastic estimator in Assumption 4 as
\[
\check{f}_n(\theta) := \mathbb{E}\big[\big(\tilde{\eta}_n^t(\theta)\big)^2 \,\big|\, D_n\big] = \big(\tilde{\mu}_n^t(\theta)\big)^2 + (\tilde{\sigma}_n^t)^2(\theta).
\]
(8) Under the GRF assumption, $\tilde{\eta}_n^t(\theta) \mid D_n$ represents the posterior distribution of $\tilde{f}(\theta)$, so that
\[
\mathbb{E}\big[\big(\tilde{\eta}_n^t(\theta)\big)^2 \,\big|\, D_n\big] = \mathbb{E}\big[\big(\tilde{f}(\theta)\big)^2 \,\big|\, D_n\big] = \mathbb{E}\big[f(\theta) \,\big|\, D_n\big].
\]
As this is the posterior mean of $f(\theta)$ given $D_n$, (8) achieves the minimum mean squared error. Therefore, for each fixed $\theta \in \Theta$,
\[
\mathbb{E}\big[\big(\check{f}_n(\theta) - f(\theta)\big)^2\big] \le \mathbb{E}\big[\big(f_n(\theta) - f(\theta)\big)^2\big]. \tag{9}
\]
Although (9) does not by itself imply Assumption 4, it shows that for any $\theta \in \Theta$, $\check{f}_n$ achieves a smaller mean squared error than $f_n$. Now, given that $\Theta$ is compact, if this pointwise error improvement is sufficiently uniform across $\Theta$, or when the kernel is bounded and sufficiently smooth so that $(\check{f}_n(\theta) - f(\theta))^2$ is relatively stable across $\Theta$, then Assumption 4 becomes more plausible. Moreover, when the design points are well dispersed, so that the resulting predictive variance does not change drastically over $\Theta$, Assumption 4 is further justified. Accordingly, the refined solution is obtained by minimizing $\check{f}_n(\theta)$ over $\Theta$, where we define the current best value in root finding as
\[
i_n^{t*} := \arg\min_{1 \le i \le p+t} \big((\tilde{\mu}_n^t(\theta_i))^2 + (\tilde{\sigma}_n^t)^2(\theta_i)\big), \qquad \tilde{v}_n^t := \tilde{\mu}_n^t(\theta_{i_n^{t*}}). \tag{10}
\]

Search Space Reduction. In the stochastic setting, Theorem 10 no longer applies directly, since we can only observe noisy sample averages. Instead, we quantify the probability of root existence using the stochastic kriging posterior,
\[
P\big(I^{\pm}(\theta_a, \theta_b)\big) = \Phi\Big(\frac{-\tilde{\mu}_n^t(\theta_a)}{\tilde{\sigma}_n^t(\theta_a)}\Big) + \Phi\Big(\frac{-\tilde{\mu}_n^t(\theta_b)}{\tilde{\sigma}_n^t(\theta_b)}\Big) - 2\,\Phi\Big(\frac{-\tilde{\mu}_n^t(\theta_a)}{\tilde{\sigma}_n^t(\theta_a)}\Big)\Phi\Big(\frac{-\tilde{\mu}_n^t(\theta_b)}{\tilde{\sigma}_n^t(\theta_b)}\Big), \tag{11}
\]
where $I^{\pm}(\theta_a, \theta_b)$ denotes the event that $\tilde{f}_n(\theta_a)$ and $\tilde{f}_n(\theta_b)$ have opposite signs (see Section L). In the deterministic RSS, the search process was triggered when the evaluated points exhibited opposite signs.
However, in the stochastic setting, this triggering condition must be modified, since every pair of configurations has a non-zero probability of having opposite signs. We therefore compute the probability of root existence for all configuration pairs and narrow the search to those likely to contain a root with high probability, i.e., pairs with $P(I^{\pm}(\theta_a, \theta_b)) \ge \alpha$, where $\alpha \in [0, 1]$ is a user-defined probability threshold. Accordingly, we define the hypervolume measure as
\[
V(\theta_a, \theta_b) = \Big(\prod_{l=1}^{m_\theta} \max\big\{|[\theta_a]_l - [\theta_b]_l|, \vartheta\big\}\Big) \cdot \big(1 - P(I^{\pm}(\theta_a, \theta_b))\big),
\]
where the goal is to identify a compact region with a smaller hypervolume. Figure 4 illustrates this process, and the detailed algorithmic framework is presented in Algorithm 2.

Figure 4: RSS in a stochastic setting in a two-dimensional input space ($m_\theta = 2$). Under the probabilistic framework, the hypervolume accounts for sampling variability, resulting in a different selected subregion compared to Figure 3.

Algorithm 2 Reduced Search Space (RSS) - stochastic
1: Initialization: evaluated configurations $\Theta_t = \{\theta_1, \ldots, \theta_{p+t}\} \in \mathbb{R}^{(p+t) \times m_\theta}$; predictive means $\tilde{\mu}_n^t(\theta)$ and standard deviations $\tilde{\sigma}_n^t(\theta)$ for all $\theta \in \Theta_t$; probability threshold $\alpha \in [0, 1]$; distance threshold $\vartheta > 0$
2: $S \leftarrow \emptyset$ to store candidate subregions
3: for each pair $(a, b)$ with $1 \le a < b \le p+t$ do
4: if $P(I^{\pm}(\theta_a, \theta_b)) \ge \alpha$ then
5: Define subregion: $R(\theta_a, \theta_b) = \prod_{l=1}^{m_\theta} [\min\{[\theta_a]_l, [\theta_b]_l\}, \max\{[\theta_a]_l, [\theta_b]_l\}]$
6: Compute hypervolume: $V(\theta_a, \theta_b) = \big(\prod_{l=1}^{m_\theta} \max\{|[\theta_a]_l - [\theta_b]_l|, \vartheta\}\big) \cdot (1 - P(I^{\pm}(\theta_a, \theta_b)))$
7: Append $(R(\theta_a, \theta_b), V(\theta_a, \theta_b))$ to $S$
8: end if
9: end for
10: Select the subregion in $S$ with the smallest $V(\theta_a, \theta_b)$

Here, the triggering condition in RSS may also be used to determine the replication size $n(\theta)$.
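The probabilistic trigger (11) has a simple closed form: under independent Gaussian posterior marginals, "opposite signs" means exactly one of the two latent values is negative, i.e., $p_a(1 - p_b) + p_b(1 - p_a)$. A sketch (names ours):

```python
from scipy.stats import norm

def prob_opposite_signs(mu_a, sd_a, mu_b, sd_b):
    """P(I+-) in (11): posterior probability that the latent signed discrepancies
    at theta_a and theta_b have opposite signs, assuming independent Gaussian marginals."""
    pa = norm.cdf(-mu_a / sd_a)     # P(latent value negative at theta_a)
    pb = norm.cdf(-mu_b / sd_b)     # P(latent value negative at theta_b)
    return pa + pb - 2.0 * pa * pb  # = pa*(1-pb) + pb*(1-pa)
```

If one posterior mean sits exactly at zero, the probability is 1/2 regardless of the other point, which matches the intuition that a sign change there is a coin flip.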
Since larger replication sizes reduce the predictive variance (uncertainty) by reducing the simulation noise, replications can be allocated in two complementary ways. First, within a fixed replication budget at each iteration, additional replications may be assigned to reduce uncertainty at evaluated configurations where no candidate pair has yet satisfied the triggering threshold $\alpha$. Second, once potential pairs are identified, replications can be allocated to those configurations under a ranking and selection method [39], continuing until a desired level of statistical significance is achieved.

6 Numerical Experiments

In this section, we conduct a range of experiments to evaluate the calibration performance of the proposed root finding framework against standard minimization. We compare the three acquisition strategies reviewed in Section 3 with their root finding variants from Section 4. We set $\kappa = 1$ for both LCBs, $\alpha = 0.95$ for the stochastic RSS, and $\vartheta = 10^{-8}$. Each experiment consists of 100 independent macro replications. In each macro replication, we start from an initial design of $p = 2$ points and perform up to 10 sequential evaluations. At each design point $\theta$, we conduct 10 independent replications to estimate the discrepancy. To optimize the acquisition function, we use a multi-start strategy with L-BFGS-B initialized from 10 randomly generated starting points, with each run limited to 10 iterations. To assess calibration performance, we conduct a post-evaluation step [40] by estimating $\bar{f}$ at each current best solution using 1,000 independent replications.

6.1 2D Synthetic Example

We consider a synthetic example where the computer model has parameter $\theta \in [-3, 3]^2$.
The signed discrepancy is a modified Himmelblau function [41],
\[
\tilde{f}(\theta) := \log_2\Big(\big([\theta]_1^2 + [\theta]_2 - 3\big)^2 + \big([\theta]_1 + [\theta]_2^2 - 2\big)^2\Big) - 1,
\]
where each replication at $\theta$ is observed through
\[
R^{(i)}(\theta) = \tilde{f}(\theta) + \varepsilon^{(i)}, \qquad \varepsilon^{(i)} \sim \mathcal{N}\big(0, \sigma^2(\theta)\big), \qquad \sigma(\theta) = \sqrt{|\tilde{f}(\theta)|},
\]
so that the noise variance is heteroskedastic and increases with the magnitude of the discrepancy. Figure 11 depicts the signed discrepancy surface.

Figure 5: Calibration results of the 2D synthetic example. The objective function value at each best observed solution is obtained from 1,000 post replications and reported with 95% confidence intervals.

The experimental results are summarized in Figure 5. Across all three acquisition functions, root finding with RSS consistently achieves faster convergence than minimization. Notably, stochastic kriging tends to offer additional improvement in both minimization and root finding. However, this benefit is less pronounced in minimization, which requires more evaluations to identify a promising region: during the early stage, evaluations are mostly exploratory, so the advantage of stochastic kriging is not exploited as effectively. In contrast, root finding offers a structural advantage that allows the modeling benefit to be leveraged more effectively. This result is also consistent with the analysis in Section 2, where stochastic kriging provides tighter solution bounds by accounting for finite-sample uncertainty. To better illustrate this, Figure 6 shows the sampling trajectories from a single macro replication after 10 evaluations. For minimization, the sampling trajectory is scattered and exploratory, as there is no structural guidance. In contrast, root finding produces a more directed trajectory by leveraging sign information. In particular, within root finding, the sampling trajectories under kriging and stochastic kriging are noticeably different.
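For reference, a sketch of the synthetic test function and its heteroskedastic observation model as we read them off the formulas above (the extraction-damaged source leaves some ambiguity in the constants, so treat this as our reconstruction):

```python
import numpy as np

def f_tilde(theta):
    """Signed discrepancy of the 2D synthetic example (modified Himmelblau)."""
    t1, t2 = theta
    g = (t1 ** 2 + t2 - 3.0) ** 2 + (t1 + t2 ** 2 - 2.0) ** 2
    return np.log2(g) - 1.0

def noisy_replication(theta, rng):
    """One observation R(theta) = f_tilde(theta) + eps, eps ~ N(0, |f_tilde(theta)|)."""
    f = f_tilde(theta)
    return f + rng.normal(0.0, np.sqrt(abs(f)))
```

Averaging many replications at a fixed $\theta$ recovers the signed discrepancy, while the spread of the replications grows with its magnitude, reproducing the heteroskedasticity described above.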
In the deterministic case, the search space is progressively reduced toward a single region, as the root is expected to exist at a fixed point. In the stochastic case, however, the probability of a sign change (from (11)) is adaptively updated as new observations are acquired, so the search space can switch across different regions. This results in sampling trajectories that remain close to the zero contour while being scattered around it.

6.2 Logistic Simulation Example

In this experiment, we consider a logistic simulation that resembles a classical M/M/1 system. The service rate $\mu_{\mathrm{real}} = 4$ is assumed to be known and fixed, while the actual arrival rate $\theta_{\mathrm{real}} = 6$ is unknown. In this system, entities (e.g., customers or jobs) arrive according to a Poisson process with rate $\theta_{\mathrm{real}}$ and are served sequentially with exponentially distributed service times. We assume that only the sojourn times of entities are observable, so $\theta_{\mathrm{real}}$ can only be estimated through computer simulation by comparing the simulated output with the observed data. Such settings commonly appear in practice, where detailed system dynamics are often unavailable and parameter estimation must rely on a partially observable dataset [42, 43].

Figure 6: Sampling trajectories of the benchmark methods after 10 evaluations from a single macro replication. Dashed rectangles show the progressively reduced search space using RSS.

Moreover, even if all arrival and service events were fully observable, estimating the arrival rate would still be challenging, as most tractable closed-form equations are derived under idealized assumptions, such as infinite-horizon or steady-state conditions, which are not applicable in our finite-entity setting.
Furthermore, we do not attempt to restrict the problem to the so-called "well-behaved" setting in which the arrival rate is strictly less than the service rate (i.e., $\theta_{\mathrm{real}} < \mu_{\mathrm{real}}$). Instead, we allow the calibration task to reflect the observed system output as is, without imposing additional assumptions. For these reasons, the problem is addressed within a simulation-based calibration framework. Figure 12 illustrates the calibration setup with $m_y = 100$ observed entities. Figure 13 shows the estimated discrepancy functions along with their 95% confidence intervals. As $\theta$ increases, the signed discrepancy becomes more negative, since a higher arrival rate makes the system more congested and leads to longer average sojourn times. The variance of the estimated discrepancy also increases with $\theta$, as a higher arrival rate incurs longer waiting times, which accumulate additional variability across replications. In contrast, when the arrival rate is small relative to the service rate, most of the randomness arises solely from the arrival and service processes, so the variability is much smaller.

Figure 7: Calibration results of the M/M/1 example. The objective function value at each best observed solution is obtained from 1,000 post replications and reported with 95% confidence intervals.

In this example, the two formulations are expected to yield similar optimal solutions. Since all entities are governed by the same arrival rate $\theta$ and are sequentially dependent, changes in $\theta$ tend to affect the resulting sojourn times of all entities in the same direction. This makes the spatial variability $\delta(\theta)$ small, meaning that the averaged behavior represents the individual outputs well. Figure 7 summarizes the calibration results. The root finding framework consistently achieves faster convergence than minimization across all acquisition functions.
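Because only finitely many entities are observed, the sojourn times can be simulated directly via Lindley's recursion with no steady-state assumption; the following sketch shows one way to do so (the paper's exact simulator may differ, and the names are ours):

```python
import numpy as np

def simulate_sojourn_times(theta, mu=4.0, m_y=100, rng=None):
    """Sojourn times of the first m_y entities in an M/M/1-type system:
    Poisson arrivals with rate theta, exponential services with rate mu,
    computed via Lindley's recursion for the waiting times."""
    rng = np.random.default_rng() if rng is None else rng
    inter = rng.exponential(1.0 / theta, m_y)   # interarrival times
    serv = rng.exponential(1.0 / mu, m_y)       # service times
    wait = np.empty(m_y)
    wait[0] = 0.0                               # first entity finds an empty system
    for j in range(1, m_y):
        wait[j] = max(0.0, wait[j - 1] + serv[j - 1] - inter[j])
    return wait + serv                          # sojourn = waiting + service
```

This works equally well in the "ill-behaved" regime $\theta > \mu$ discussed above: the finite-horizon recursion simply produces growing waiting times rather than diverging.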
Similar to the previous example, given the inherent variability of the problem, the benefit of stochastic kriging is not well exploited in minimization, as seen in the LCB and PI cases. Again, this is mostly attributed to the exploratory nature of minimization in the early stage, which requires more observations to identify a promising region. Moreover, the best observed solution under minimization can be misleading in the early stage, leading to higher objective function values. This is because minimization typically has to resolve a more complex functional structure, and with few and often uninformative early observations, the metamodel is unlikely to be reliable.

6.3 Epidemic Simulation Example

In this experiment, we consider an epidemic computer model based on a stochastic SIR (Susceptible–Infected–Recovered) framework. A population of 100 individuals is divided into three compartments: susceptible (S), infected (I), and recovered (R). Initially, 10 individuals are infected, and disease transmission occurs through contacts between infected and susceptible individuals. Each infected individual interacts with up to 2 randomly selected susceptible individuals per day, and each contact results in a new infection with probability $\theta_{\text{real}} = 0.65$. Infected individuals recover independently with a daily probability of 0.7. The simulation runs for a 5-day epidemic period, after which the proportion of recovered individuals is recorded as the model output. Figure 14 depicts example temporal evolutions of this model. The objective of this experiment is to estimate the unknown infection rate $\theta \in [0, 1]$ by comparing the simulated daily recovery proportions ($m_y = 5$) against the observed trajectory. This problem setting is particularly relevant in practice, since during epidemic outbreaks, recovery records are often the only observable data [44]. Figure 15 illustrates the estimated discrepancy functions.
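The transmission mechanism described above can be sketched in a few lines. This is a hedged reconstruction of the described dynamics, not the authors' simulator; the function name `simulate_sir` and the convention that recoveries are drawn among the individuals infected at the start of each day are our assumptions.

```python
import random

def simulate_sir(theta, n=100, i0=10, contacts=2, recov_p=0.7, days=5, seed=None):
    """One stochastic SIR run: each infected individual contacts up to
    `contacts` susceptibles per day, infecting each with probability theta;
    infected individuals recover with probability recov_p per day.
    Returns the daily proportions of recovered individuals (length `days`)."""
    rng = random.Random(seed)
    S, I, R = n - i0, i0, 0
    path = []
    for _ in range(days):
        new_inf = 0
        for _ in range(I):  # contacts drawn by each currently infected individual
            trials = min(contacts, S - new_inf)  # cannot exceed remaining susceptibles
            new_inf += sum(rng.random() < theta for _ in range(max(0, trials)))
        new_rec = sum(rng.random() < recov_p for _ in range(I))  # recoveries today
        S -= new_inf
        I += new_inf - new_rec
        R += new_rec
        path.append(R / n)  # observed output: cumulative recovered proportion
    return path

recovered_props = simulate_sir(theta=0.65, seed=42)
```

One replication returns a length-5 trajectory, matching $m_y = 5$ observed daily recovery proportions.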
As the infection rate $\theta$ increases, the disease spreads more rapidly in the early stages, leading to a higher proportion of (observed) recovered individuals, so the signed discrepancy becomes more negative. Notice that the variability also increases with $\theta$, since a higher infection rate leads to more infected individuals, generating more interactions and further amplifying the stochasticity in the recovery phase. However, compared to the M/M/1 example, the overall variability remains relatively small across the parameter space. In the M/M/1 system, entities are sequentially dependent, so the effect of $\theta$ tends to accumulate throughout the queue. In contrast, in the SIR model, the same infection and recovery rules apply to all individuals. Although interactions within each replication are processed sequentially, the realized order is itself random, so the individual-level randomness tends to average out at the population level. Hence, $\mathrm{Var}(S(\theta))$ is also small in the SIR problem.

Figure 8: Calibration results of the SIR example. The objective function value at each best observed solution is obtained from 1,000 post replications and reported with 95% confidence intervals.

Figure 8 summarizes the overall experimental results. The root finding framework consistently achieves faster convergence than minimization across all acquisition functions. Compared to the M/M/1 example, due to the relatively small variability of the problem, there is no noticeable performance difference between kriging and stochastic kriging at the early stage of the root finding framework. Among the acquisition functions, PI tends to exhibit slower convergence under minimization in both the M/M/1 and SIR examples. Overall, the root finding framework under stochastic kriging with RSS achieves the best calibration performance across all three examples.
7 Conclusion

In this paper, we introduced a root finding framework for computer model calibration that leverages sign information to accelerate the calibration process. We proposed three new acquisition functions and analyzed their asymptotic behavior in rootless scenarios. Moreover, we introduced a search space reduction technique that is uniquely enabled by the sign structure of the root finding formulation. Both the theoretical analyses and numerical experiments demonstrate that the proposed framework offers a reliable and computationally efficient alternative to standard calibration approaches. We close with several remarks. First, while the root finding approach was introduced within a metamodel-based setting, the theoretical results on solution accuracy and convergence in Section 2 are broadly applicable to other derivative-free optimization methods. Second, adaptive sampling presents a promising direction for further improving computational efficiency, including not only the sequential design of new configurations but also the adaptive refinement of the replication budget in the stochastic RSS. Lastly, incorporating variance reduction techniques such as common random numbers [45] or stratified sampling [46] is also a noteworthy direction.

A Proof of Solution Accuracy Bound

This section provides the proof of Theorem 6.

Proof. Let $\theta^*_n \in \Theta^*_n$ satisfy $\mathrm{dist}(\bar\theta^*, \Theta^*_n)^2 = \|\theta^*_n - \bar\theta^*\|^2$, and let $\theta^* \in \Theta^*$ satisfy $\mathrm{dist}(\theta^*_n, \Theta^*) = \|\theta^*_n - \theta^*\|$.
Then, under Theorem 2,
\[
\mathrm{dist}(\bar\theta^*, \Theta^*_n)^2 \le \frac{2}{\bar\ell_C}\left(\bar f(\theta^*_n) - \bar f(\bar\theta^*)\right) \le \frac{2}{\bar\ell_C}\left(\bar f(\theta^*_n) - f(\theta^*)\right) \tag{12}
\]
\[
= \frac{2}{\bar\ell_C}\left( f(\theta^*_n) - f(\theta^*) + \delta(\theta^*_n) + \mathrm{Var}(S(\theta^*_n)) \right)
= \frac{2}{\bar\ell_C}\Big( \underbrace{(\mathbb{E}[S(\theta^*_n)])^2 - (\mathbb{E}[S(\theta^*)])^2}_{(A_1)} + \underbrace{\delta(\theta^*_n)}_{(A_2)} + \underbrace{\mathrm{Var}(S(\theta^*_n))}_{(A_3)} \Big),
\]
where the second inequality holds since $f(\theta) \le \bar f(\theta)$ and $f(\theta^*) \le f(\bar\theta^*)$, which gives $f(\theta^*) \le f(\bar\theta^*) \le \bar f(\bar\theta^*)$.

For $(A_1)$, under Assumption 2(e) and using $a^2 - b^2 = (a-b)^2 + 2b(a-b)$,
\[
(\mathbb{E}[S(\theta^*_n)])^2 - (\mathbb{E}[S(\theta^*)])^2 = \big(\mathbb{E}[S(\theta^*_n)] - \mathbb{E}[S(\theta^*)]\big)^2 + 2\,\mathbb{E}[S(\theta^*)]\big(\mathbb{E}[S(\theta^*_n)] - \mathbb{E}[S(\theta^*)]\big) \le \tilde\ell_E^2\,\mathrm{dist}(\theta^*_n, \Theta^*)^2 + 2\tilde\ell_E\,\big|\mathbb{E}[S(\theta^*)]\big|\,\mathrm{dist}(\theta^*_n, \Theta^*).
\]
Since $\theta^* \in \Theta^*$ and, under Assumption 2(a), $\Theta^* = \tilde\Theta^*$, it follows that $\mathbb{E}[S(\theta^*)] = 0$, and thus $(\mathbb{E}[S(\theta^*_n)])^2 \le \tilde\ell_E^2\,\mathrm{dist}(\theta^*_n, \Theta^*)^2$.

For $(A_2)$, under Assumption 2(g), $\delta(\theta^*_n) \le \delta(\theta^*) + \tilde\ell_\delta\,\mathrm{dist}(\theta^*_n, \Theta^*)$.

For $(A_3)$, applying the law of total variance on $\theta^*_n$,
\[
\mathrm{Var}(S(\theta^*_n)) = \underbrace{\mathbb{E}[\mathrm{Var}(S(\theta^*_n) \mid \theta^*_n)]}_{(B_1)} + \underbrace{\mathrm{Var}(\mathbb{E}[S(\theta^*_n) \mid \theta^*_n])}_{(B_2)}. \tag{13}
\]
We bound the two components separately. For $(B_1)$, under Assumption 2(f),
\[
-\tilde\ell_V\,\mathrm{dist}(\theta^*_n, \Theta^*) \le \mathrm{Var}(S(\theta^*_n) \mid \theta^*_n) - \mathrm{Var}(S(\theta^*)) \le \tilde\ell_V\,\mathrm{dist}(\theta^*_n, \Theta^*).
\]
Using the right-hand-side inequality and taking expectation gives
\[
\mathbb{E}[\mathrm{Var}(S(\theta^*_n) \mid \theta^*_n)] \le \mathbb{E}\big[\mathrm{Var}(S(\theta^*)) + \tilde\ell_V\,\mathrm{dist}(\theta^*_n, \Theta^*)\big].
\]
Since $\theta^* \in \Theta^*$, this further implies
\[
\mathbb{E}[\mathrm{Var}(S(\theta^*_n) \mid \theta^*_n)] \le \sup_{\theta \in \Theta^*} \mathrm{Var}(S(\theta)) + \tilde\ell_V\,\mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)].
\]
For $(B_2)$, from $\mathrm{Var}(X) \le \mathbb{E}[X^2]$ and under Assumption 2(e),
\[
\mathrm{Var}(\mathbb{E}[S(\theta^*_n) \mid \theta^*_n]) \le \mathbb{E}\big[(\mathbb{E}[S(\theta^*_n) \mid \theta^*_n])^2\big] \le \tilde\ell_E^2\,\mathbb{E}\big[\mathrm{dist}(\theta^*_n, \Theta^*)^2\big].
\]
Hence, (13) is bounded as
\[
\mathrm{Var}(S(\theta^*_n)) \le \sup_{\theta \in \Theta^*} \mathrm{Var}(S(\theta)) + \tilde\ell_V\,\mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)] + \tilde\ell_E^2\,\mathbb{E}\big[\mathrm{dist}(\theta^*_n, \Theta^*)^2\big].
\]
Accordingly, the solution bound in (12) can be written as
\[
\mathrm{dist}(\bar\theta^*, \Theta^*_n)^2 \le \frac{2}{\bar\ell_C}\Big( \tilde\ell_E^2\,\mathrm{dist}(\theta^*_n, \Theta^*)^2 + \delta(\theta^*) + \tilde\ell_\delta\,\mathrm{dist}(\theta^*_n, \Theta^*) + \sup_{\theta \in \Theta^*} \mathrm{Var}(S(\theta)) + \tilde\ell_V\,\mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)] + \tilde\ell_E^2\,\mathbb{E}\big[\mathrm{dist}(\theta^*_n, \Theta^*)^2\big] \Big).
\]
Since $\theta^* \in \Theta^*$, it follows that $\delta(\theta^*) + \sup_{\theta \in \Theta^*} \mathrm{Var}(S(\theta)) \le \sup_{\theta \in \Theta^*}\big(\delta(\theta) + \mathrm{Var}(S(\theta))\big)$. Thereby, we can conclude that
\[
\mathrm{dist}(\bar\theta^*, \Theta^*_n)^2 \le \frac{2}{\bar\ell_C}\Big( \tilde\ell_E^2\,\mathrm{dist}(\theta^*_n, \Theta^*)^2 + \sup_{\theta \in \Theta^*} \delta(\theta) + \tilde\ell_\delta\,\mathrm{dist}(\theta^*_n, \Theta^*) + \sup_{\theta \in \Theta^*} \mathrm{Var}(S(\theta)) + \tilde\ell_V\,\mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)] + \tilde\ell_E^2\,\mathbb{E}\big[\mathrm{dist}(\theta^*_n, \Theta^*)^2\big] \Big). \qquad\square
\]

B Proof of Expected Solution Accuracy Bound

This section provides the proof of Theorem 8.

Proof. Let $\theta^*_n \in \Theta^*_n$ satisfy $\mathrm{dist}(\bar\theta^*, \Theta^*_n)^2 = \|\theta^*_n - \bar\theta^*\|^2$, and let $\theta^* \in \Theta^*$ satisfy $\mathrm{dist}(\theta^*_n, \Theta^*) = \|\theta^*_n - \theta^*\|$. From Theorem 6, taking expectation gives
\[
\mathbb{E}\big[\mathrm{dist}(\bar\theta^*, \Theta^*_n)^2\big] \le \frac{2}{\bar\ell_C}\Big( 2\tilde\ell_E^2\,\mathbb{E}\big[\mathrm{dist}(\theta^*_n, \Theta^*)^2\big] + \sup_{\theta \in \Theta^*}\big(\delta(\theta) + \mathrm{Var}(S(\theta))\big) + (\tilde\ell_\delta + \tilde\ell_V)\,\mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)] \Big)
\]
\[
\le \frac{2}{\bar\ell_C}\Big( 2\tilde\ell_E^2\,\mathbb{E}\big[\mathrm{dist}(\theta^*_n, \Theta^*)^2\big] + \sup_{\theta \in \Theta^*}\big(\delta(\theta) + \mathrm{Var}(S(\theta))\big) + (\tilde\ell_\delta + \tilde\ell_V)\,\mathbb{E}\big[\mathrm{dist}(\theta^*_n, \Theta^*)^2\big]^{1/2} \Big),
\]
where the second inequality follows from the Cauchy–Schwarz inequality $\mathbb{E}[\mathrm{dist}(\theta^*_n, \Theta^*)] \le \mathbb{E}\big[\mathrm{dist}(\theta^*_n, \Theta^*)^2\big]^{1/2}$. $\square$

C Proof of Tighter Expected Accuracy Bound of Stochastic Approximation Solution

This section provides the proof of Theorem 9.

Proof. For any $\check\theta^*_n \in \check\Theta^*_n$, from $a - b \le |b - a|$,
\[
f(\check\theta^*_n) - \check f_n(\check\theta^*_n) \le \big|\check f_n(\check\theta^*_n) - f(\check\theta^*_n)\big| \le \sup_{\theta \in \Theta}\big|\check f_n(\theta) - f(\theta)\big|,
\]
so that
\[
f(\check\theta^*_n) \le \check f_n(\check\theta^*_n) + \sup_{\theta \in \Theta}\big|\check f_n(\theta) - f(\theta)\big|. \tag{14}
\]
Since $\check\theta^*_n$ minimizes $\check f_n$ over $\Theta$, $\check f_n(\check\theta^*_n) \le \check f_n(\theta^*)$, so we can further bound
\[
f(\check\theta^*_n) \le \check f_n(\theta^*) + \sup_{\theta \in \Theta}\big|\check f_n(\theta) - f(\theta)\big|.
\]
Similarly, for $\theta^* \in \Theta^*$, now from $b - a \le |b - a|$,
\[
\check f_n(\theta^*) - f(\theta^*) \le \big|\check f_n(\theta^*) - f(\theta^*)\big| \le \sup_{\theta \in \Theta}\big|\check f_n(\theta) - f(\theta)\big|,
\]
which gives
\[
\check f_n(\theta^*) \le f(\theta^*) + \sup_{\theta \in \Theta}\big|\check f_n(\theta) - f(\theta)\big|. \tag{15}
\]
From (14) and (15), we have
\[
f(\check\theta^*_n) \le f(\theta^*) + 2\sup_{\theta \in \Theta}\big|\check f_n(\theta) - f(\theta)\big|.
\]
Under Assumption 2(a), $f(\theta^*) = 0$, so $f(\check\theta^*_n) \le 2\sup_{\theta \in \Theta}\big|\check f_n(\theta) - f(\theta)\big|$. Under Assumption 2(b), for any $\theta \in \Theta$, $\mathrm{dist}(\theta, \Theta^*)^2 \le \frac{1}{\tilde\ell_R^2} f(\theta)$. For $\check\theta^*_n$, taking expectations gives
\[
\mathbb{E}\big[\mathrm{dist}(\check\theta^*_n, \Theta^*)^2\big] \le \frac{1}{\tilde\ell_R^2}\,\mathbb{E}\big[f(\check\theta^*_n)\big] \le \frac{2}{\tilde\ell_R^2}\,\mathbb{E}\Big[\sup_{\theta \in \Theta}\big|\check f_n(\theta) - f(\theta)\big|\Big].
\]
By the same argument, for $\theta^*_n$, we get
\[
\mathbb{E}\big[\mathrm{dist}(\theta^*_n, \Theta^*)^2\big] \le \frac{2}{\tilde\ell_R^2}\,\mathbb{E}\Big[\sup_{\theta \in \Theta}\big|f_n(\theta) - f(\theta)\big|\Big].
\]
By the Cauchy–Schwarz inequality,
\[
\mathbb{E}\Big[\sup_{\theta \in \Theta}\big|\check f_n(\theta) - f(\theta)\big|\Big] \le \Big(\mathbb{E}\Big[\sup_{\theta \in \Theta}\big(\check f_n(\theta) - f(\theta)\big)^2\Big]\Big)^{1/2}, \quad \text{and} \quad \mathbb{E}\Big[\sup_{\theta \in \Theta}\big|f_n(\theta) - f(\theta)\big|\Big] \le \Big(\mathbb{E}\Big[\sup_{\theta \in \Theta}\big(f_n(\theta) - f(\theta)\big)^2\Big]\Big)^{1/2}.
\]
Then, under Assumption 4,
\[
\Big(\mathbb{E}\Big[\sup_{\theta \in \Theta}\big(\check f_n(\theta) - f(\theta)\big)^2\Big]\Big)^{1/2} \le \Big(\mathbb{E}\Big[\sup_{\theta \in \Theta}\big(f_n(\theta) - f(\theta)\big)^2\Big]\Big)^{1/2},
\]
which implies that the upper bound of $\mathbb{E}\big[\|\check\theta^*_n - \theta^*\|^2\big]$ is smaller than that of $\mathbb{E}\big[\|\theta^*_n - \theta^*\|^2\big]$. Thereby, we can conclude that $\mathbb{E}\big[\mathrm{dist}(\bar\theta^*, \check\Theta^*_n)^2\big] \le \mathbb{E}\big[\mathrm{dist}(\bar\theta^*, \Theta^*_n)^2\big]$. $\square$

D Derivation of Probability of Improvement

The event of improvement at a candidate point $\theta$ is defined as the predicted value being smaller than $v^t_n$:
\[
I^t_n(\theta) := \big\{\eta^t_n(\theta) \le v^t_n\big\}.
\]
Using the reparameterization trick, we can express the predictive random variable as $\eta^t_n(\theta) = \mu^t_n(\theta) + \sigma^t_n(\theta)Z$, $Z \sim N(0,1)$.
Then, the PI acquisition function is given by
\[
\mathrm{PI}^t_n(\theta) = P\big(I^t_n(\theta)\big) = P\big(\mu^t_n(\theta) + \sigma^t_n(\theta)Z \le v^t_n\big) = \Phi\left(\frac{v^t_n - \mu^t_n(\theta)}{\sigma^t_n(\theta)}\right),
\]
where $\Phi(\cdot)$ denotes the standard normal cumulative distribution function (CDF).

E Derivation of Expected Improvement

The improvement at a candidate point $\theta$ is defined as
\[
\mathrm{EI}^t_n(\theta) = \mathbb{E}\big[\max\big(0,\, v^t_n - \eta^t_n(\theta)\big)\big].
\]
Using the reparameterization trick, we can write $\eta^t_n(\theta) = \mu^t_n(\theta) + \sigma^t_n(\theta)Z$, $Z \sim N(0,1)$. Substituting into the EI expression gives
\[
\mathrm{EI}^t_n(\theta) = \mathbb{E}\big[\max\big(0,\, v^t_n - \mu^t_n(\theta) - \sigma^t_n(\theta)Z\big)\big].
\]
Then, applying $\int_{-\infty}^{a} \phi(b)\,db = \Phi(a)$ and $\int_{-\infty}^{a} b\,\phi(b)\,db = -\phi(a)$,
\[
\mathrm{EI}^t_n(\theta) = \int_{-\infty}^{z^t_n(\theta)} \big(v^t_n - \mu^t_n(\theta) - \sigma^t_n(\theta)z\big)\phi(z)\,dz = \big(v^t_n - \mu^t_n(\theta)\big)\Phi\big(z^t_n(\theta)\big) + \sigma^t_n(\theta)\,\phi\big(z^t_n(\theta)\big),
\]
where $\phi(\cdot)$ denotes the probability density function (PDF) of the standard normal distribution.

F Derivation of Root-Finding Probability of Improvement

The improvement event at a candidate point $\theta$ is defined as the predicted value being closer to zero than the current best,
\[
\tilde I^t_n(\theta) := \big\{\,|\tilde\eta^t_n(\theta)| \le |\tilde v^t_n|\,\big\},
\]
which accounts for improvement in both sign directions symmetrically. Using the reparameterization trick, $\tilde\eta^t_n(\theta) = \tilde\mu^t_n(\theta) + \tilde\sigma^t_n(\theta)Z$, $Z \sim N(0,1)$, and we can express the closed form of the root-finding PI as
\[
\widetilde{\mathrm{PI}}^t_n(\theta) = P\big(\tilde I^t_n(\theta)\big) = P\big(-|\tilde v^t_n| \le \tilde\mu^t_n(\theta) + \tilde\sigma^t_n(\theta)Z \le |\tilde v^t_n|\big) = \Phi\left(\frac{|\tilde v^t_n| - \tilde\mu^t_n(\theta)}{\tilde\sigma^t_n(\theta)}\right) - \Phi\left(\frac{-|\tilde v^t_n| - \tilde\mu^t_n(\theta)}{\tilde\sigma^t_n(\theta)}\right).
\]

G Derivation of Root-Finding Expected Improvement

The improvement at a candidate point $\theta$ is defined as
\[
\widetilde{\mathrm{EI}}^t_n(\theta) = \mathbb{E}\big[\max\big(0,\, |\tilde v^t_n| - |\tilde\eta^t_n(\theta)|\big)\big].
\]
Then, the full expectation form is expressed as
\[
\widetilde{\mathrm{EI}}^t_n(\theta) = \int_{-\infty}^{\infty} \max\big(0,\, |\tilde v^t_n| - |\tilde\mu^t_n(\theta) + \tilde\sigma^t_n(\theta)z|\big)\phi(z)\,dz. \tag{16}
\]
Here, observe that the integrand is nonzero only when $|\tilde\mu^t_n(\theta) + \tilde\sigma^t_n(\theta)z| \le |\tilde v^t_n|$. Then, we can write (16) as
\[
\widetilde{\mathrm{EI}}^t_n(\theta) = \int_{\underline{z}^t_n(\theta)}^{\bar z^t_n(\theta)} \big(|\tilde v^t_n| - |\tilde\mu^t_n(\theta) + \tilde\sigma^t_n(\theta)z|\big)\phi(z)\,dz.
\]
Since the absolute value term changes its sign at $\tilde z^t_n(\theta)$, we divide the integral into two parts,
\[
\widetilde{\mathrm{EI}}^t_n(\theta) = \int_{\underline{z}^t_n(\theta)}^{\tilde z^t_n(\theta)} \big(|\tilde v^t_n| + \tilde\mu^t_n(\theta) + \tilde\sigma^t_n(\theta)z\big)\phi(z)\,dz + \int_{\tilde z^t_n(\theta)}^{\bar z^t_n(\theta)} \big(|\tilde v^t_n| - \tilde\mu^t_n(\theta) - \tilde\sigma^t_n(\theta)z\big)\phi(z)\,dz.
\]
Separating each term gives
\[
\widetilde{\mathrm{EI}}^t_n(\theta) = |\tilde v^t_n|\left(\int_{\underline{z}^t_n(\theta)}^{\tilde z^t_n(\theta)}\phi(z)\,dz + \int_{\tilde z^t_n(\theta)}^{\bar z^t_n(\theta)}\phi(z)\,dz\right) + \tilde\mu^t_n(\theta)\left(\int_{\underline{z}^t_n(\theta)}^{\tilde z^t_n(\theta)}\phi(z)\,dz - \int_{\tilde z^t_n(\theta)}^{\bar z^t_n(\theta)}\phi(z)\,dz\right) + \tilde\sigma^t_n(\theta)\left(\int_{\underline{z}^t_n(\theta)}^{\tilde z^t_n(\theta)} z\,\phi(z)\,dz - \int_{\tilde z^t_n(\theta)}^{\bar z^t_n(\theta)} z\,\phi(z)\,dz\right).
\]
Using $\int_a^b \phi(z)\,dz = \Phi(b) - \Phi(a)$ and $\int_a^b z\,\phi(z)\,dz = \phi(a) - \phi(b)$, we get
\[
\widetilde{\mathrm{EI}}^t_n(\theta) = |\tilde v^t_n|\big(\Phi(\bar z^t_n(\theta)) - \Phi(\underline z^t_n(\theta))\big) + \tilde\mu^t_n(\theta)\big(2\Phi(\tilde z^t_n(\theta)) - \Phi(\bar z^t_n(\theta)) - \Phi(\underline z^t_n(\theta))\big) - \tilde\sigma^t_n(\theta)\big(2\phi(\tilde z^t_n(\theta)) - \phi(\bar z^t_n(\theta)) - \phi(\underline z^t_n(\theta))\big).
\]

H Gradient Derivations for Root Finding Acquisition Functions

In this appendix, we provide analytical expressions for the first-order derivatives of the proposed root finding acquisition functions. We first present the gradients of the posterior mean and variance under both kriging and stochastic kriging models. The gradient of the posterior mean under the kriging model is given by
\[
\frac{\partial \tilde\mu^t_n(\theta)}{\partial\theta} = \frac{\partial}{\partial\theta}\Big( k(\theta, \Theta_t; l)\,[k(\Theta_t, \Theta_t; l)]^{-1}\,\tilde{\mathbf f}^t_n \Big) = \frac{\partial k(\theta, \Theta_t; l)}{\partial\theta}\,[k(\Theta_t, \Theta_t; l)]^{-1}\,\tilde{\mathbf f}^t_n.
\]
Similarly, the gradient of the posterior mean under the stochastic kriging model is
\[
\frac{\partial\tilde\mu^t_n(\theta)}{\partial\theta} = \frac{\partial k(\theta,\Theta_t;l)}{\partial\theta}\,\big[k(\Theta_t,\Theta_t;l) + \Sigma^{(t)}\big]^{-1}\,\tilde{\mathbf f}^t_n.
\]
The gradient of the posterior standard deviation under the kriging model is
\[
\frac{\partial\tilde\sigma^t_n(\theta)}{\partial\theta} = \frac{\partial}{\partial\theta}\sqrt{(\tilde\sigma^t_n)^2(\theta)} = \frac{1}{2\sqrt{(\tilde\sigma^t_n)^2(\theta)}}\cdot\frac{\partial(\tilde\sigma^t_n)^2(\theta)}{\partial\theta} = -\frac{1}{2\sqrt{(\tilde\sigma^t_n)^2(\theta)}}\left( \frac{\partial k(\theta,\Theta_t;l)}{\partial\theta}\,[k(\Theta_t,\Theta_t;l)]^{-1}\, k(\theta,\Theta_t;l)^\top + k(\theta,\Theta_t;l)\,[k(\Theta_t,\Theta_t;l)]^{-1}\,\frac{\partial k(\theta,\Theta_t;l)^\top}{\partial\theta} \right),
\]
where
\[
\frac{\partial k(\theta,\Theta_t;l)^\top}{\partial\theta} = \begin{pmatrix} -\frac{1}{l^2}(\theta - \theta_1)\,k(\theta,\theta_1;l) \\ -\frac{1}{l^2}(\theta - \theta_2)\,k(\theta,\theta_2;l) \\ \vdots \\ -\frac{1}{l^2}(\theta - \theta_{p+t})\,k(\theta,\theta_{p+t};l) \end{pmatrix}.
\]
The gradient of the posterior standard deviation under the stochastic kriging model is
\[
\frac{\partial\tilde\sigma^t_n(\theta)}{\partial\theta} = -\frac{1}{2\sqrt{(\tilde\sigma^t_n)^2(\theta)}}\left( \frac{\partial k(\theta,\Theta_t;l)}{\partial\theta}\,\big[k(\Theta_t,\Theta_t;l) + \Sigma^{(t)}\big]^{-1}\, k(\theta,\Theta_t;l)^\top + k(\theta,\Theta_t;l)\,\big[k(\Theta_t,\Theta_t;l) + \Sigma^{(t)}\big]^{-1}\,\frac{\partial k(\theta,\Theta_t;l)^\top}{\partial\theta} \right).
\]

Gradient of $\widetilde{\mathrm{LCB}}$.
\[
\frac{\partial\widetilde{\mathrm{LCB}}^t_n(\theta)}{\partial\theta} = \frac{\partial}{\partial\theta}\big(|\tilde\mu^t_n(\theta)| - \kappa\tilde\sigma^t_n(\theta)\big) = \mathrm{sign}\big(\tilde\mu^t_n(\theta)\big)\cdot\frac{\partial\tilde\mu^t_n(\theta)}{\partial\theta} - \kappa\cdot\frac{\partial\tilde\sigma^t_n(\theta)}{\partial\theta}.
\]

Gradient of $\widetilde{\mathrm{PI}}$. Using $\Phi(a) = \int_{-\infty}^{a}\phi(b)\,db$,
\[
\frac{\partial\widetilde{\mathrm{PI}}^t_n(\theta)}{\partial\theta} = \frac{\partial}{\partial\theta}\big(\Phi(\bar z^t_n(\theta)) - \Phi(\underline z^t_n(\theta))\big) = \frac{\partial}{\partial\theta}\left( \int_{-\infty}^{\bar z^t_n(\theta)}\phi(z)\,dz - \int_{-\infty}^{\underline z^t_n(\theta)}\phi(z)\,dz \right) = \frac{\partial}{\partial\theta}\int_{\underline z^t_n(\theta)}^{\bar z^t_n(\theta)}\phi(z)\,dz.
\]
Applying the Leibniz integral rule $\frac{\partial}{\partial\theta}\int_{a(\theta)}^{b(\theta)} g(z)\,dz = g(b(\theta))\frac{\partial b(\theta)}{\partial\theta} - g(a(\theta))\frac{\partial a(\theta)}{\partial\theta}$, we get
\[
\frac{\partial\widetilde{\mathrm{PI}}^t_n(\theta)}{\partial\theta} = \phi\big(\bar z^t_n(\theta)\big)\frac{\partial\bar z^t_n(\theta)}{\partial\theta} - \phi\big(\underline z^t_n(\theta)\big)\frac{\partial\underline z^t_n(\theta)}{\partial\theta},
\]
where the gradients of $\bar z^t_n(\theta)$ and $\underline z^t_n(\theta)$ are
\[
\frac{\partial\bar z^t_n(\theta)}{\partial\theta} = \frac{-\frac{\partial\tilde\mu^t_n(\theta)}{\partial\theta}\,\tilde\sigma^t_n(\theta) - \big(|\tilde v^t_n| - \tilde\mu^t_n(\theta)\big)\frac{\partial\tilde\sigma^t_n(\theta)}{\partial\theta}}{(\tilde\sigma^t_n)^2(\theta)}, \qquad
\frac{\partial\underline z^t_n(\theta)}{\partial\theta} = \frac{-\frac{\partial\tilde\mu^t_n(\theta)}{\partial\theta}\,\tilde\sigma^t_n(\theta) + \big(|\tilde v^t_n| + \tilde\mu^t_n(\theta)\big)\frac{\partial\tilde\sigma^t_n(\theta)}{\partial\theta}}{(\tilde\sigma^t_n)^2(\theta)}.
\]

Gradient of $\widetilde{\mathrm{EI}}$.
\[
\frac{\partial\widetilde{\mathrm{EI}}^t_n(\theta)}{\partial\theta} = |\tilde v^t_n|\left(\frac{\partial\bar z^t_n(\theta)}{\partial\theta}\,\phi(\bar z^t_n(\theta)) - \frac{\partial\underline z^t_n(\theta)}{\partial\theta}\,\phi(\underline z^t_n(\theta))\right) + \frac{\partial\tilde\mu^t_n(\theta)}{\partial\theta}\big(2\Phi(\tilde z^t_n(\theta)) - \Phi(\bar z^t_n(\theta)) - \Phi(\underline z^t_n(\theta))\big)
\]
\[
+ \tilde\mu^t_n(\theta)\left(2\frac{\partial\tilde z^t_n(\theta)}{\partial\theta}\,\phi(\tilde z^t_n(\theta)) - \frac{\partial\bar z^t_n(\theta)}{\partial\theta}\,\phi(\bar z^t_n(\theta)) - \frac{\partial\underline z^t_n(\theta)}{\partial\theta}\,\phi(\underline z^t_n(\theta))\right) - \frac{\partial\tilde\sigma^t_n(\theta)}{\partial\theta}\big(2\phi(\tilde z^t_n(\theta)) - \phi(\bar z^t_n(\theta)) - \phi(\underline z^t_n(\theta))\big)
\]
\[
- \tilde\sigma^t_n(\theta)\left(-2\frac{\partial\tilde z^t_n(\theta)}{\partial\theta}\cdot\frac{\tilde z^t_n(\theta)}{\sqrt{2\pi}}e^{-\frac12\tilde z^t_n(\theta)^2} + \frac{\partial\underline z^t_n(\theta)}{\partial\theta}\cdot\frac{\underline z^t_n(\theta)}{\sqrt{2\pi}}e^{-\frac12\underline z^t_n(\theta)^2} + \frac{\partial\bar z^t_n(\theta)}{\partial\theta}\cdot\frac{\bar z^t_n(\theta)}{\sqrt{2\pi}}e^{-\frac12\bar z^t_n(\theta)^2}\right).
\]
The gradient of $\tilde z^t_n(\theta)$ is given by
\[
\frac{\partial\tilde z^t_n(\theta)}{\partial\theta} = -\frac{\frac{\partial\tilde\mu^t_n(\theta)}{\partial\theta}\,\tilde\sigma^t_n(\theta) + \tilde\mu^t_n(\theta)\,\frac{\partial\tilde\sigma^t_n(\theta)}{\partial\theta}}{(\tilde\sigma^t_n)^2(\theta)}.
\]

I Validation of Analytical Gradients Using Finite Difference Estimates

In this experiment, we validate the derived analytical gradients by comparing them against central finite difference estimates with step size $10^{-5}$.

Figure 9: Comparison of analytical and numerical gradient estimates for root finding acquisition functions.

As shown in Figure 9, the analytical gradients are consistent with the finite difference estimates across all three root finding acquisition functions.
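This validation procedure can be reproduced in a few lines. The sketch below checks the $\widetilde{\mathrm{LCB}}$ gradient formula from Appendix H against central finite differences, with simple toy functions standing in for the kriging posterior mean and standard deviation; all names and the toy surrogates are our assumptions, not the authors' implementation.

```python
from math import exp, sin, cos

# Toy posterior mean/std and their analytic derivatives (stand-ins for the
# kriging posterior; chosen only to illustrate the validation procedure).
mu     = lambda t: sin(t) - 0.3
dmu    = lambda t: cos(t)
sigma  = lambda t: 0.5 + 0.2 * exp(-t * t)
dsigma = lambda t: -0.4 * t * exp(-t * t)

KAPPA = 2.0

def lcb_rf(t):
    """Root-finding LCB: |mu(t)| - kappa * sigma(t)."""
    return abs(mu(t)) - KAPPA * sigma(t)

def grad_lcb_rf(t):
    """Analytic gradient: sign(mu) * mu' - kappa * sigma' (valid for mu != 0)."""
    s = 1.0 if mu(t) > 0 else -1.0
    return s * dmu(t) - KAPPA * dsigma(t)

def central_diff(f, t, h=1e-5):
    """Central finite-difference estimate with step size h."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

# Validate at a few points away from the kink at mu(t) = 0
for t in (-1.2, 0.1, 0.9, 2.0):
    assert abs(grad_lcb_rf(t) - central_diff(lcb_rf, t)) < 1e-6
```

The check deliberately avoids points where $\tilde\mu$ crosses zero, since the absolute value makes $\widetilde{\mathrm{LCB}}$ nondifferentiable there.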
J Asymptotic Behavior of Root Finding Acquisition Functions

In this section, we analyze the rootless scenario, where the sample average root finding objective does not contain a root over $\Theta$. We show that, under suitable posterior regularity conditions, the proposed root finding acquisition functions asymptotically recover the same optimizer as their standard minimization-based counterparts. For analytical convenience, let $(\Omega, \mathcal F, P)$ be the probability space in Section 4, and fix $\omega \in \Omega_+$. The asymptotic analysis throughout this section is with respect to $t$.

Assumption 6 (Regularity Conditions for the Rootless Scenario).
(a) The sample average root finding function is strictly positive over $\Theta$, i.e., there exists $\epsilon(\omega) > 0$ such that $\tilde f_n(\theta; \omega) \ge \epsilon(\omega)$ for all $\theta \in \Theta$.
(b) Each acquisition function is globally optimized at every iteration.
(c) Under the rootless regime, the predictive mean remains strictly positive over $\Theta$ for all sufficiently large $t$, i.e., $\tilde\mu_t(\theta; \omega) > 0$ for all $\theta \in \Theta$.
(d) For all sufficiently large $t$, the compared acquisition functions have unique optimizers over $\Theta$.

Assumption 7 (Regularity Conditions for the Small-$\epsilon$ Regime).
(a) Let $\Theta^*_t \subseteq \Theta$ be a neighborhood containing the optimizer of the acquisition function. As $t \to \infty$,
\[
\sup_{\theta \in \Theta^*_t} \frac{\tilde\mu_t(\theta;\omega)}{|\tilde v_t(\omega)|} \to 0, \qquad \sup_{\theta \in \Theta^*_t} \frac{|\tilde v_t(\omega)|}{\tilde\sigma_t(\theta;\omega)} \to 0.
\]
(b) Let $\Theta^*_t \subseteq \Theta$ be a neighborhood containing the optimizer of the acquisition function. The predictive standard deviation is asymptotically constant over $\Theta^*_t$, i.e.,
\[
\sup_{\theta_a, \theta_b \in \Theta^*_t} \big|\tilde\sigma_t(\theta_a;\omega) - \tilde\sigma_t(\theta_b;\omega)\big| \to 0, \quad \text{as } t \to \infty.
\]

Assumption 6(a) formalizes the rootless regime, where the sample average root finding function is strictly positive across the parameter space.
Although the following analysis is presented under the strictly positive case, the arguments and proofs extend symmetrically to the negative case. Assumption 6(b) ensures that the acquisition function is globally optimized at each iteration; this technical condition is imposed to avoid complications arising from local optimization of the acquisition function. Assumption 6(c) is a mild condition stating that the predictive mean follows the one-sided structure of the rootless regime. Lastly, Assumption 6(d) ensures that the compared acquisition functions admit unique optimizers for all sufficiently large $t$. The conditions in Assumption 7 are invoked only in the small-$\epsilon$ analyses of PI and EI, where $\epsilon(\omega) := \inf_{\theta \in \Theta} \tilde f_n(\theta; \omega) > 0$ is small enough that the best observed value $|\tilde v_t(\omega)|$ and the predictive mean $\tilde\mu_t(\theta; \omega)$ both approach zero as $t \to \infty$. Assumption 7(a) reflects the local asymptotic behavior of the search in this regime: regions favored by improvement-based acquisition functions are expected to have predictive means smaller than the current best observed value, since otherwise the standardized improvement term would be strictly negative, yielding a small CDF value. The second relation captures the presence of regions with sufficient remaining uncertainty to explore, as the search would otherwise have already converged. Assumption 7(b) requires the predictive standard deviation to vary only mildly over the neighborhood $\Theta^*_t$ containing the optimizer, and is more likely to be satisfied at a later stage of the search, once $\Theta^*_t$ is well identified and enough evaluations have accumulated within it so that the predictive uncertainty stabilizes locally.

Root Finding Lower Confidence Bound in Rootless Scenario

Theorem 12.
Under Assumption 6, for a fixed $\omega \in \Omega_+$, the optimizers of the root finding and standard LCB acquisition functions coincide asymptotically, i.e.,
\[
\left\| \arg\min_{\theta \in \Theta} \widetilde{\mathrm{LCB}}_t(\theta;\omega) - \arg\min_{\theta \in \Theta} \mathrm{LCB}_t(\theta;\omega) \right\| \to 0, \quad \text{as } t \to \infty.
\]

Proof of Theorem 12. The standard LCB acquisition function and its root finding counterpart at iteration $t$ are defined as
\[
\mathrm{LCB}_t(\theta;\omega) = \mu_t(\theta;\omega) - \kappa\sigma_t(\theta;\omega), \qquad \widetilde{\mathrm{LCB}}_t(\theta;\omega) = |\tilde\mu_t(\theta;\omega)| - \kappa\tilde\sigma_t(\theta;\omega).
\]
Under Assumption 6(c), the predictive mean satisfies $\tilde\mu_t(\theta;\omega) > 0$ for all $\theta \in \Theta$ and all sufficiently large $t$, which implies $|\tilde\mu_t(\theta;\omega)| = \tilde\mu_t(\theta;\omega)$ for all $\theta \in \Theta$. Therefore, $\widetilde{\mathrm{LCB}}_t(\theta;\omega) = \mathrm{LCB}_t(\theta;\omega)$ for all $\theta \in \Theta$, and under Assumption 6(d), both acquisition functions have the same unique optimizer for all sufficiently large $t$. Then, under Assumption 6(b),
\[
\left\| \arg\min_{\theta\in\Theta} \widetilde{\mathrm{LCB}}_t(\theta;\omega) - \arg\min_{\theta\in\Theta} \mathrm{LCB}_t(\theta;\omega) \right\| \to 0, \quad \text{as } t \to \infty. \qquad\square
\]

Notably, although Theorem 12 is established in the asymptotic regime, this reverting behavior can generally be observed in practice even with a relatively small number of evaluations. Under Assumption 6(a), the observed evaluations remain predominantly one-sided, and the predictive mean $\tilde\mu_t(\theta;\omega)$ therefore also tends to be one-sided over $\Theta$.

Root Finding Probability of Improvement in Rootless Scenario

Theorem 13. Under Assumption 6 and Assumption 7(a), for a fixed $\omega \in \Omega_+$, the maximizers of the root finding and standard PI acquisition functions coincide asymptotically, i.e.,
\[
\left\| \arg\max_{\theta\in\Theta} \widetilde{\mathrm{PI}}_t(\theta;\omega) - \arg\max_{\theta\in\Theta} \mathrm{PI}_t(\theta;\omega) \right\| \to 0, \quad \text{as } t \to \infty.
\]

Proof of Theorem 13.
The standard PI acquisition function and its root finding counterpart at iteration $t$ are defined as
\[
\mathrm{PI}_t(\theta;\omega) = \Phi\left(\frac{v_t(\omega) - \mu_t(\theta;\omega)}{\sigma_t(\theta;\omega)}\right), \qquad \widetilde{\mathrm{PI}}_t(\theta;\omega) = \Phi\left(\frac{|\tilde v_t(\omega)| - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right) - \Phi\left(\frac{-|\tilde v_t(\omega)| - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right).
\]
We consider two distinct regimes based on the magnitude of $\epsilon(\omega)$.

Case 1: Large $\epsilon$. We show that the second term in $\widetilde{\mathrm{PI}}_t(\theta;\omega)$ is uniformly negligible over $\Theta$ as $t \to \infty$. Since $|\tilde v_t(\omega)| \ge \epsilon(\omega)$ and $\tilde\mu_t(\theta;\omega) \ge \epsilon(\omega)$ for sufficiently large $t$, we have
\[
\frac{-|\tilde v_t(\omega)| - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)} \le \frac{-2\epsilon(\omega)}{\tilde\sigma_t(\theta;\omega)}.
\]
By Mills' ratio, for $a > 0$, $\Phi(-a) \le \frac{1}{a\sqrt{2\pi}} e^{-a^2/2}$, so that
\[
\Phi\left(\frac{-|\tilde v_t(\omega)| - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right) \le \frac{1}{\sqrt{2\pi}}\,\frac{\tilde\sigma_t(\theta;\omega)}{2\epsilon(\omega)}\,\exp\left(-\frac{2\epsilon(\omega)^2}{\tilde\sigma_t(\theta;\omega)^2}\right).
\]
Since $\tilde\sigma_t(\theta;\omega) \to 0$ as $t \to \infty$, the argument $-2\epsilon(\omega)/\tilde\sigma_t(\theta;\omega) \to -\infty$, and hence the second term vanishes uniformly over $\Theta$. Therefore, the root finding acquisition function reduces to
\[
\widetilde{\mathrm{PI}}_t(\theta;\omega) \to \Phi\left(\frac{|\tilde v_t(\omega)| - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right) = \mathrm{PI}_t(\theta;\omega).
\]
Then, under Assumption 6(b) and Assumption 6(d), we obtain
\[
\left\| \arg\max_{\theta\in\Theta} \widetilde{\mathrm{PI}}_t(\theta;\omega) - \arg\max_{\theta\in\Theta} \mathrm{PI}_t(\theta;\omega) \right\| \to 0, \quad \text{as } t \to \infty.
\]

Case 2: Small $\epsilon$. We now consider the regime where the irreducible discrepancy $\epsilon(\omega) > 0$ is small. Let $\Theta^*_t \subseteq \Theta$ be a compact neighborhood containing the maximizer of the acquisition function. Under Assumption 7(a), both arguments of $\Phi$ in $\widetilde{\mathrm{PI}}_t(\theta;\omega)$ approach zero, i.e.,
\[
\frac{|\tilde v_t(\omega)| - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)} \to 0, \qquad \frac{-|\tilde v_t(\omega)| - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)} \to 0, \quad \text{as } t \to \infty.
\]
We then apply a second-order Taylor expansion $\Phi(a) = 1/2 + a\,\phi(0) + O(a^3)$ around $a = 0$, which gives, for $\theta \in \Theta^*_t$,
\[
\mathrm{PI}_t(\theta;\omega) = \frac12 + \phi(0)\cdot\frac{v_t(\omega) - \mu_t(\theta;\omega)}{\sigma_t(\theta;\omega)} + O\left(\left(\frac{v_t(\omega) - \mu_t(\theta;\omega)}{\sigma_t(\theta;\omega)}\right)^3\right)
\]
and, similarly, for $\theta \in \Theta^*_t$,
\[
\widetilde{\mathrm{PI}}_t(\theta;\omega) = 2\phi(0)\cdot\frac{|\tilde v_t(\omega)|}{\tilde\sigma_t(\theta;\omega)} + O\left(\left(\frac{|\tilde v_t(\omega)| + \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)^3\right).
\]
Under Assumption 6(c), for sufficiently large $t$ we have $|\tilde\mu_t(\theta;\omega)| = \tilde\mu_t(\theta;\omega)$ and $|\tilde v_t(\omega)| = \tilde v_t(\omega)$, so that
\[
\widetilde{\mathrm{PI}}_t(\theta;\omega) = 2\cdot\mathrm{PI}_t(\theta;\omega) - 1 + \underbrace{2\phi(0)\cdot\frac{\tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)} + O\left(\left(\frac{\tilde v_t(\omega) + \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)^3\right)}_{\tilde r_t(\theta;\omega)}.
\]
Under Assumption 7(a), $\tilde r_t(\theta;\omega) \to 0$ over $\theta \in \Theta^*_t$, and therefore $\widetilde{\mathrm{PI}}_t(\theta;\omega) = 2\cdot\mathrm{PI}_t(\theta;\omega) - 1$ as $t \to \infty$. Since $\widetilde{\mathrm{PI}}_t(\theta;\omega)$ is a positive linear transformation of $\mathrm{PI}_t(\theta;\omega)$, both share the same maximizer, and under Assumption 6(b) and Assumption 6(d),
\[
\left\| \arg\max_{\theta\in\Theta} \widetilde{\mathrm{PI}}_t(\theta;\omega) - \arg\max_{\theta\in\Theta} \mathrm{PI}_t(\theta;\omega) \right\| \to 0, \quad \text{as } t \to \infty. \qquad\square
\]

Root Finding Expected Improvement in Rootless Scenario

Theorem 14. Under Assumption 6 and Assumption 7, for a fixed $\omega \in \Omega_+$, the maximizers of the root finding and standard EI acquisition functions coincide asymptotically, i.e.,
\[
\left\| \arg\max_{\theta\in\Theta} \widetilde{\mathrm{EI}}_t(\theta;\omega) - \arg\max_{\theta\in\Theta} \mathrm{EI}_t(\theta;\omega) \right\| \to 0, \quad \text{as } t \to \infty.
\]

Proof of Theorem 14. For EI, we follow the same logical flow as Theorem 13, examining the large-$\epsilon$ and small-$\epsilon$ regimes.

Case 1: Large $\epsilon$. Recall that the closed-form expression of $\widetilde{\mathrm{EI}}_t(\theta;\omega)$ is
\[
\widetilde{\mathrm{EI}}_t(\theta;\omega) = \tilde v_t(\omega)\left[\Phi\left(\frac{\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right) - \Phi\left(\frac{-\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)\right]
\]
\[
+ \tilde\mu_t(\theta;\omega)\left[2\Phi\left(\frac{-\tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right) - \Phi\left(\frac{\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right) - \Phi\left(\frac{-\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)\right]
\]
\[
- \tilde\sigma_t(\theta;\omega)\left[2\phi\left(\frac{-\tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right) - \phi\left(\frac{\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right) - \phi\left(\frac{-\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)\right].
\]
Again using Mills' ratio, we have
\[
\Phi\left(\frac{\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right) \le \frac{1}{\sqrt{2\pi}}\,\frac{\tilde\sigma_t(\theta;\omega)}{\tilde\mu_t(\theta;\omega) - \tilde v_t(\omega)}\,\exp\left(-\frac{(\tilde\mu_t(\theta;\omega) - \tilde v_t(\omega))^2}{2\tilde\sigma_t(\theta;\omega)^2}\right), \qquad
\Phi\left(\frac{-\tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right) \le \frac{1}{\sqrt{2\pi}}\,\frac{\tilde\sigma_t(\theta;\omega)}{\tilde\mu_t(\theta;\omega)}\,\exp\left(-\frac{\tilde\mu_t(\theta;\omega)^2}{2\tilde\sigma_t(\theta;\omega)^2}\right).
\]
As $t \to \infty$ and $\tilde\sigma_t(\theta;\omega) \to 0$, whenever $\tilde\mu_t(\theta;\omega) > \tilde v_t(\omega)$ (the dominance is immediate otherwise), combining these bounds with $\tilde\mu_t(\theta;\omega) \ge \epsilon(\omega)$ shows that, asymptotically,
\[
\frac{\Phi\left(\frac{\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)}{\Phi\left(\frac{-\tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)} \ge \frac{\epsilon(\omega)}{\tilde\mu_t(\theta;\omega) - \tilde v_t(\omega)}\,\exp\left(\frac{\tilde v_t(\omega)\big(2\tilde\mu_t(\theta;\omega) - \tilde v_t(\omega)\big)}{2\tilde\sigma_t(\theta;\omega)^2}\right) \to \infty,
\]
and thus $\Phi\big((\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega))/\tilde\sigma_t(\theta;\omega)\big)$ is the dominant term among the CDF terms. For the PDF terms, we get
\[
\frac{\phi\left(\frac{\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)}{\phi\left(\frac{-\tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)} = \exp\left(\frac{\tilde v_t(\omega)\big(2\tilde\mu_t(\theta;\omega) - \tilde v_t(\omega)\big)}{2\tilde\sigma_t(\theta;\omega)^2}\right) \to \infty, \qquad
\frac{\phi\left(\frac{\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)}{\phi\left(\frac{-\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)}\right)} = \exp\left(\frac{2\tilde v_t(\omega)\,\tilde\mu_t(\theta;\omega)}{\tilde\sigma_t(\theta;\omega)^2}\right) \to \infty.
\]
Thus, among the PDF terms, only $\phi\big((\tilde v_t(\omega) - \tilde\mu_t(\theta;\omega))/\tilde\sigma_t(\theta;\omega)\big)$ gives the dominant contribution, and $\widetilde{\mathrm{EI}}_t(\theta;\omega)$ reduces in the limit to
\[
\widetilde{\mathrm{EI}}_t(\theta;\omega) \to \big(v_t(\omega) - \mu_t(\theta;\omega)\big)\cdot\Phi\left(\frac{v_t(\omega) - \mu_t(\theta;\omega)}{\sigma_t(\theta;\omega)}\right) + \sigma_t(\theta;\omega)\cdot\phi\left(\frac{v_t(\omega) - \mu_t(\theta;\omega)}{\sigma_t(\theta;\omega)}\right) = \mathrm{EI}_t(\theta;\omega).
\]
Then, under Assumption 6(b) and Assumption 6(d),
\[
\left\| \arg\max_{\theta\in\Theta} \widetilde{\mathrm{EI}}_t(\theta;\omega) - \arg\max_{\theta\in\Theta} \mathrm{EI}_t(\theta;\omega) \right\| \to 0, \quad \text{as } t \to \infty.
\]

Case 2: Small $\epsilon$. We now consider the small-$\epsilon(\omega)$ case. Let $\Theta^*_t \subseteq \Theta$ be a compact neighborhood containing the maximizer of the acquisition function.
From Assumption 7(a), we can neglect the contribution of $\tilde\mu_t(\theta;\omega)$ in the standardized arguments, i.e., $\tilde v_t(\omega) \pm \tilde\mu_t(\theta;\omega) \approx \tilde v_t(\omega)$, so we can write
\[
\widetilde{\mathrm{EI}}_t(\theta;\omega) = \tilde v_t(\omega)\left[\Phi\left(\frac{\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right) - \Phi\left(\frac{-\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right)\right] + \tilde\mu_t(\theta;\omega)\left[1 - \Phi\left(\frac{\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right) - \Phi\left(\frac{-\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right)\right] - \tilde\sigma_t(\theta;\omega)\left[2\phi(0) - \phi\left(\frac{\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right) - \phi\left(\frac{-\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right)\right]
\]
\[
= 2\tilde v_t(\omega)\cdot\Phi\left(\frac{\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right) + \tilde\mu_t(\theta;\omega)\cdot\left[1 - 2\Phi\left(\frac{\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right)\right] - 2\tilde\sigma_t(\theta;\omega)\cdot\left[\phi(0) - \phi\left(\frac{\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right)\right].
\]
Similarly, the standard EI can be expressed as
\[
\mathrm{EI}_t(\theta;\omega) = \tilde v_t(\omega)\cdot\Phi\left(\frac{\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right) + \tilde\sigma_t(\theta;\omega)\cdot\phi\left(\frac{\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right),
\]
so that
\[
\widetilde{\mathrm{EI}}_t(\theta;\omega) = 2\cdot\mathrm{EI}_t(\theta;\omega) + \tilde\mu_t(\theta;\omega)\cdot\left[1 - 2\Phi\left(\frac{\tilde v_t(\omega)}{\tilde\sigma_t(\theta;\omega)}\right)\right] - 2\tilde\sigma_t(\theta;\omega)\cdot\phi(0).
\]
The contribution of $\tilde\mu_t(\theta;\omega)\cdot\big[1 - 2\Phi\big(\tilde v_t(\omega)/\tilde\sigma_t(\theta;\omega)\big)\big]$ is negligible under Assumption 7(a) as $t \to \infty$, since $|\tilde v_t(\omega)| \gg \tilde\mu_t(\theta;\omega)$. Under Assumption 7(b), the term $-2\tilde\sigma_t(\theta;\omega)\,\phi(0)$ is asymptotically constant over $\Theta^*_t$, so that $\widetilde{\mathrm{EI}}_t(\theta;\omega)$ is a positive linear transformation of $\mathrm{EI}_t(\theta;\omega)$ in the limit. Both then share the same maximizer, and under Assumption 6(b) and Assumption 6(d),
\[
\left\| \arg\max_{\theta\in\Theta} \widetilde{\mathrm{EI}}_t(\theta;\omega) - \arg\max_{\theta\in\Theta} \mathrm{EI}_t(\theta;\omega) \right\| \to 0, \quad \text{as } t \to \infty. \qquad\square
\]

J.1 Numerical Experiment

In this section, we have analyzed the asymptotic behavior of the root finding acquisition functions under the rootless regime. As a follow-up experiment, we empirically validate the established results through a simple one-dimensional example in which the signed discrepancy remains strictly positive over the parameter space. We define a noise-free signed discrepancy as
\[
\tilde f(\theta) := \theta^2 + \epsilon, \quad \theta \in [-1, 1],
\]
where $\epsilon > 0$ denotes the irreducible discrepancy.
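A minimal sketch of this data-generating setup, assuming a uniform design grid and plain replication averaging (the metamodel fit and acquisition evaluation are omitted for brevity; all function names are our own):

```python
import random

def f_tilde(theta, eps):
    """Noise-free signed discrepancy of the rootless example."""
    return theta * theta + eps

def replications(theta, eps, n_rep=10, noise_sd=0.01, seed=0):
    """n_rep noisy observations f_tilde(theta) + N(0, noise_sd^2)."""
    rng = random.Random(seed)
    return [f_tilde(theta, eps) + rng.gauss(0.0, noise_sd) for _ in range(n_rep)]

def design(num_points):
    """Uniformly spaced design points on [-1, 1]."""
    return [-1.0 + 2.0 * i / (num_points - 1) for i in range(num_points)]

# Sample-average objective at each design point for the small-eps regime
eps = 0.1
data = {t: sum(replications(t, eps)) / 10 for t in design(11)}

# In the rootless regime every averaged observation stays strictly positive,
# matching Assumption 6(a)
assert all(v > 0 for v in data.values())
```

With `eps = 0.1` and noise standard deviation 0.01, the averaged observations stay bounded away from zero, so the fitted predictive mean is one-sided as in the analysis above.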
We consider two distinct regimes: a small-$\epsilon$ case with $\epsilon = 0.1$ and a large-$\epsilon$ case with $\epsilon = 10$. In both cases, the global optimizer is $\theta^* = 0$. For each $\theta \in \Theta$, noisy observations are generated as
\[
\tilde f^{(j)}(\theta) = \tilde f(\theta) + \varepsilon^{(j)}, \quad \varepsilon^{(j)} \sim N(0, 0.01^2),
\]
where $\varepsilon^{(j)}$ denotes the $j$-th replication. At each $\theta$, we perform 10 independent replications.

Figure 10: Acquisition function difference between the standard and root finding acquisition functions for large-$\epsilon$ and small-$\epsilon$ cases, averaged over 100 replications.

In this experimental setting, we progressively increase the number of uniformly spaced design points, fit the metamodel at each stage $t$, and evaluate both the minimization and root finding acquisition functions at the known solution $\theta^* = 0$. Then, we compute the difference of each root finding acquisition function from the asymptotic form derived in Section J, and repeat this process across 100 macro replications. Figure 10 illustrates the averaged differences as the number of design points increases. In the large-$\epsilon$ case, the difference is negligible across all three acquisition functions regardless of the number of design points. This is because a large $\epsilon$ makes the changes from the root finding modifications (e.g., the additional CDF term in $\widetilde{\mathrm{PI}}$) effectively negligible. This result is natural: when the function is far away from zero, finding the root and finding the minimum become nearly equivalent problems. In the small-$\epsilon$ case, the LCB difference vanishes immediately, as the predictive mean becomes strictly positive with few observations in this simple example. For PI, the difference vanishes as more design points are added.
For EI, however, the difference from the asymptotic relationship decays more slowly, since it is further governed by the additional term $-2\tilde{\sigma}_t(\theta;\omega)\,\phi(0)$, which stabilizes only once the predictive uncertainty itself becomes uniform within the region, i.e., with many design points. This is consistent with the discussion following Assumption 7(b), where the condition is expected to hold only at a relatively late stage of the search.

K Derivation of Stochastic Kriging Predictive Distribution

Consider a linear predictor of the form $\hat{\mu}(\theta) = w(\theta)^\top \bar{f}^t_n$, where $w(\theta) \in \mathbb{R}^{p+t}$ is a weight vector. Then the mean squared prediction error is given by

$$\mathbb{E}\left[\left(\bar{f}(\theta) - \hat{\mu}(\theta)\right)^2\right] = k(\theta, \theta; l) - 2\, w(\theta)^\top k(\theta, \Theta_t; l) + w(\theta)^\top \left[k(\Theta_t, \Theta_t; l) + \Sigma^{(t)}\right] w(\theta).$$

Minimizing this with respect to $w(\theta)$ yields the optimal weight

$$w^*(\theta) = \left[k(\Theta_t, \Theta_t; l) + \Sigma^{(t)}\right]^{-1} k(\theta, \Theta_t; l),$$

and the corresponding optimal predictor

$$\mu^t_n(\theta) = w^*(\theta)^\top \bar{f}^t_n = k(\theta, \Theta_t; l) \left[k(\Theta_t, \Theta_t; l) + \Sigma^{(t)}\right]^{-1} \bar{f}^t_n.$$

The minimum achievable prediction error is then

$$(\sigma^t_n)^2(\theta) = k(\theta, \theta; l) - k(\theta, \Theta_t; l) \left[k(\Theta_t, \Theta_t; l) + \Sigma^{(t)}\right]^{-1} k(\theta, \Theta_t; l)^\top.$$

Under the GRF assumption, the predictive distribution at $\theta$ is given by $\eta^t_n(\theta) \sim N\left(\mu^t_n(\theta), (\sigma^t_n)^2(\theta)\right)$.

L Derivation of Probability of Root Existence

Let $(\Omega, \mathcal{F}, P)$ be the probability space introduced in Section 4. The sign-change event for a pair $(\theta_a, \theta_b)$ is defined as

$$I_{\pm}(\theta_a, \theta_b) := \left\{\omega \in \Omega : \tilde{f}_n(\theta_a;\omega) \cdot \tilde{f}_n(\theta_b;\omega) < 0\right\},$$

under which, by Theorem 10, a root of $\tilde{f}_n(\theta)$ is guaranteed within $[\theta_a, \theta_b]$.
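Before computing the probability of this event, the stochastic kriging predictor and variance derived in Appendix K can be sketched in a few lines. This is a minimal illustration assuming a squared-exponential kernel with unit process variance and a diagonal intrinsic-noise matrix $\Sigma^{(t)}$; the kernel choice, length-scale, and the toy data are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def sq_exp_kernel(a, b, length=0.5):
    # squared-exponential kernel k(., .; l) between two 1-D point sets
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def sk_posterior(theta, design, fbar, noise_var, length=0.5):
    # Stochastic kriging predictive mean and variance at theta:
    #   mean   = k(theta, D; l) [k(D, D; l) + Sigma]^{-1} fbar
    #   var    = k(theta, theta; l) - k(theta, D; l) [k(D, D; l) + Sigma]^{-1} k(theta, D; l)^T
    K = sq_exp_kernel(design, design, length) + np.diag(noise_var)
    k_star = sq_exp_kernel(np.atleast_1d(theta), design, length).ravel()
    w = np.linalg.solve(K, k_star)  # optimal weights w*(theta)
    mean = float(w @ fbar)
    var = float(1.0 - k_star @ w)   # k(theta, theta; l) = 1 for this kernel
    return mean, max(var, 0.0)

# toy usage: replication averages of f(theta) = theta^2 - 0.25 on a 9-point design
rng = np.random.default_rng(0)
design = np.linspace(-1.0, 1.0, 9)
fbar = design**2 - 0.25 + rng.normal(0.0, 0.02, design.size)  # noisy averages
mean, var = sk_posterior(0.5, design, fbar, np.full(design.size, 0.02**2))
```

With the small intrinsic noise used here, the predictor nearly interpolates the replication averages, and the variance shrinks toward the noise level near design points.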
Then the probability of this event is

$$P\left(I_{\pm}(\theta_a, \theta_b)\right) = P\left(\tilde{f}_n(\theta_a) < 0,\ \tilde{f}_n(\theta_b) > 0\right) + P\left(\tilde{f}_n(\theta_a) > 0,\ \tilde{f}_n(\theta_b) < 0\right),$$

since the sign-change event $I_{\pm}(\theta_a, \theta_b)$ arises from the two opposite cases. As $\tilde{f}_n(\theta)$ is a replication average, its exact distribution is generally intractable; hence we approximate it using the stochastic kriging posterior $\tilde{\eta}^t_n(\theta) \sim N\left(\tilde{\mu}^t_n(\theta), (\tilde{\sigma}^t_n)^2(\theta)\right)$ and assume independent replications between configurations. Then we can write the closed-form expression

$$P\left(I_{\pm}(\theta_a, \theta_b)\right) \approx P\left(\tilde{\eta}^t_n(\theta_a) < 0\right) P\left(\tilde{\eta}^t_n(\theta_b) > 0\right) + P\left(\tilde{\eta}^t_n(\theta_a) > 0\right) P\left(\tilde{\eta}^t_n(\theta_b) < 0\right) = \Phi\left(-\tfrac{\tilde{\mu}^t_n(\theta_a)}{\tilde{\sigma}^t_n(\theta_a)}\right) + \Phi\left(-\tfrac{\tilde{\mu}^t_n(\theta_b)}{\tilde{\sigma}^t_n(\theta_b)}\right) - 2\,\Phi\left(-\tfrac{\tilde{\mu}^t_n(\theta_a)}{\tilde{\sigma}^t_n(\theta_a)}\right) \Phi\left(-\tfrac{\tilde{\mu}^t_n(\theta_b)}{\tilde{\sigma}^t_n(\theta_b)}\right).$$

M Figures for Numerical Experiments

This section presents the supplementary figures for Section 6.

Figure 11: Noise-free signed discrepancy surface of the modified Himmelblau function.

Figure 12: Overview of the M/M/1 queueing system calibration setup.

Figure 13: Estimated discrepancy functions for the M/M/1 calibration problem under root finding and minimization. For each $\theta$, the 95% confidence interval is constructed from 100 macro replications.

Figure 14: 100 simulated realizations of the stochastic SIR model under the described experimental setting.
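Returning to the probability of root existence derived in Appendix L: the closed form reduces to $p_a + p_b - 2\,p_a p_b$ with $p = \Phi(-\tilde{\mu}^t_n/\tilde{\sigma}^t_n)$ at each endpoint, and can be evaluated directly from the posterior summaries. The numbers below are illustrative placeholders, not values from the paper.

```python
import math

def Phi(x):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_sign_change(mu_a, sigma_a, mu_b, sigma_b):
    # P(I_pm) ~= p_a + p_b - 2 p_a p_b, where p = Phi(-mu/sigma) is the
    # posterior probability of a negative value at each endpoint
    p_a = Phi(-mu_a / sigma_a)
    p_b = Phi(-mu_b / sigma_b)
    return p_a + p_b - 2.0 * p_a * p_b

# endpoints with confidently opposite posterior signs -> probability near 1
p = prob_sign_change(mu_a=-2.0, sigma_a=0.1, mu_b=3.0, sigma_b=0.1)
print(round(p, 6))  # close to 1
```

When both posterior means share a sign with small uncertainty, both products vanish and the probability is near zero, matching the intuition behind the sign-change event.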
Figure 15: Estimated discrepancy functions for the stochastic SIR calibration task under root finding and minimization. For each $\theta$, the 95% confidence interval is constructed from 100 macro replications.