This work concerns the estimation of multidimensional nonlinear regression models using multilayer perceptrons (MLPs). The main problem with such models is that the covariance matrix of the noise has to be known in order to obtain an optimal estimator. However, we show that if we choose as cost function the logarithm of the determinant of the empirical error covariance matrix, we obtain an asymptotically optimal estimator.
Let us consider a sequence (Y_t, Z_t)_{t∈N} of i.i.d. (independent and identically distributed) random vectors, so that each couple (Y_t, Z_t) has the same law as a generic variable (Y, Z). Moreover, we assume that the model can be written Y_t = F_{W_0}(Z_t) + ε_t, where
• F_{W_0} is a function represented by an MLP with parameters (or weights) W_0;
• (ε_t) is an i.i.d. centered noise with unknown invertible covariance matrix Γ_0.
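As a concrete illustration of this model, here is a minimal simulation sketch, assuming a one-hidden-layer perceptron with tanh activations and correlated Gaussian noise; the dimensions, weight names and the particular Γ_0 below are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
p, h, d, n = 4, 5, 3, 500          # input dim, hidden units, output dim, sample size

# "True" weights W_0 of a one-hidden-layer perceptron with tanh activation.
A, a = rng.normal(size=(h, p)), rng.normal(size=h)
B, b = rng.normal(size=(d, h)), rng.normal(size=d)

def F_W0(z):
    """F_{W_0}: R^p -> R^d, the regression function of the model."""
    return B @ np.tanh(A @ z + a) + b

# Centered Gaussian noise with a non-diagonal (hence non-identity) covariance Gamma_0.
Gamma_0 = np.array([[1.0, 0.7, 0.3],
                    [0.7, 1.0, 0.5],
                    [0.3, 0.5, 1.0]])
Z = rng.normal(size=(n, p))
eps = rng.multivariate_normal(np.zeros(d), Gamma_0, size=n)
Y = np.array([F_W0(z) for z in Z]) + eps        # Y_t = F_{W_0}(Z_t) + eps_t
```

Because this Γ_0 is not the identity, the components of the noise are correlated, which is precisely the situation where the ordinary least squares criterion below loses efficiency.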
Our goal is to estimate the true parameter by minimizing an appropriate cost function. This model is called a regression model, and a popular choice for the associated cost function is the mean squared error:
$$\frac{1}{n}\sum_{t=1}^{n}\left\|Y_t - F_W(Z_t)\right\|^2,$$
where ‖·‖ denotes the Euclidean norm on R^d. Although this cost function is widely used, it is easy to show that the resulting estimator is suboptimal. Another solution is to use an approximation of the noise covariance matrix to compute the generalized least squares estimator, i.e. to minimize:
$$\frac{1}{n}\sum_{t=1}^{n}\left(Y_t - F_W(Z_t)\right)^{T}\,\Gamma^{-1}\left(Y_t - F_W(Z_t)\right),$$
where T denotes matrix transposition. Here we assume that Γ is a good approximation of the true noise covariance matrix Γ_0. However, computing a good approximation of Γ_0 takes time, and this approach leads asymptotically to the cost function proposed in this article (see for example Rynkiewicz [4]):
$$U_n(W) := \log\det\left(\frac{1}{n}\sum_{t=1}^{n}\left(Y_t - F_W(Z_t)\right)\left(Y_t - F_W(Z_t)\right)^{T}\right).$$
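To make the three criteria concrete, the following sketch evaluates the ordinary least squares cost, the generalized least squares cost for a given approximation Γ, and the log-determinant cost U_n(W); it only assumes a generic model function f(W, Z) returning the n × d matrix of outputs F_W(Z_t), and the helper names are illustrative, not from the paper.

```python
import numpy as np

def mse_cost(f, W, Y, Z):
    """Ordinary least squares: mean squared Euclidean norm of the residuals Y_t - F_W(Z_t)."""
    E = Y - f(W, Z)                      # f(W, Z) returns the n x d matrix of model outputs
    return np.mean(np.sum(E**2, axis=1))

def gls_cost(f, W, Y, Z, Gamma):
    """Generalized least squares with a fixed approximation Gamma of the noise covariance."""
    E = Y - f(W, Z)
    return np.mean(np.einsum('ti,ij,tj->t', E, np.linalg.inv(Gamma), E))

def logdet_cost(f, W, Y, Z):
    """U_n(W): log-determinant of the empirical error covariance matrix (requires n > d)."""
    E = Y - f(W, Z)
    Gamma_n = E.T @ E / len(E)
    return np.linalg.slogdet(Gamma_n)[1]
```

Minimizing logdet_cost requires no knowledge of Γ_0, which is the point of the criterion studied in this article.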
This paper is devoted to the theoretical study of U_n(W). We assume that the true architecture of the MLP is known, so that the Hessian matrix computed in the sequel is positive definite (see Fukumizu [1]).
In this framework, we study the asymptotic behavior of Ŵ_n := arg min_W U_n(W), the weights minimizing the cost function U_n(W). We show that, under simple assumptions, this estimator is asymptotically optimal in the sense that it has the same asymptotic behavior as the generalized least squares estimator using the true covariance matrix of the noise.
Numerical procedures to compute this estimator and examples of its behavior can be found in Rynkiewicz [4].
In the sequel, we write ∂F_W(X)/∂W_k (resp. ∂²F_W(X)/∂W_k∂W_l) for the d-dimensional vector of partial derivatives (resp. second-order partial derivatives) of each component of F_W(X).
Now, if Γ_n(W) is a matrix depending on the parameter vector W, we get from Magnus and Neudecker [3]
$$\frac{\partial}{\partial W_k}\log\det\left(\Gamma_n(W)\right)=\operatorname{tr}\left(\Gamma_n^{-1}(W)\,\frac{\partial \Gamma_n(W)}{\partial W_k}\right);$$
note that the matrix Γ_n(W) and its inverse are symmetric. Here Γ_n(W) is the empirical covariance matrix of the errors,
$$\Gamma_n(W)=\frac{1}{n}\sum_{t=1}^{n}\left(Y_t-F_W(Z_t)\right)\left(Y_t-F_W(Z_t)\right)^{T},$$
so that U_n(W) = log det(Γ_n(W)). Its derivative with respect to W_k is
$$\frac{\partial \Gamma_n(W)}{\partial W_k}=-\frac{1}{n}\sum_{t=1}^{n}\left[\frac{\partial F_W(Z_t)}{\partial W_k}\left(Y_t-F_W(Z_t)\right)^{T}+\left(Y_t-F_W(Z_t)\right)\frac{\partial F_W(Z_t)}{\partial W_k}^{T}\right].$$
We get, using the symmetry of Γ_n^{-1}(W) and the invariance of the trace under circular permutation,
$$\frac{\partial U_n(W)}{\partial W_k}=-\frac{2}{n}\sum_{t=1}^{n}\frac{\partial F_W(Z_t)}{\partial W_k}^{T}\,\Gamma_n^{-1}(W)\left(Y_t-F_W(Z_t)\right).$$
Now, Magnus and Neudecker [3] give an analytic form of the derivative of an inverse matrix,
$$\frac{\partial \Gamma_n^{-1}(W)}{\partial W_l}=-\,\Gamma_n^{-1}(W)\,\frac{\partial \Gamma_n(W)}{\partial W_l}\,\Gamma_n^{-1}(W),$$
so the second-order derivatives of U_n(W) are obtained by differentiating the expression above with respect to W_l.
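The gradient formula above is easy to check numerically. The sketch below does so on a deliberately simple toy model F_W(z) = Wz (so that the partial derivatives are trivial); it is only meant as a sanity check of the formula, not as an implementation of the MLP case, and all names and settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n = 3, 2, 200

# Toy model F_W(z) = W z, so that dF_W(z)/dW_{ij} = e_i * z_j.
W0 = rng.normal(size=(d, p))
Gamma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
Z = rng.normal(size=(n, p))
Y = Z @ W0.T + rng.multivariate_normal(np.zeros(d), Gamma, size=n)

def U_n(W):
    """log det of the empirical error covariance matrix."""
    E = Y - Z @ W.T
    return np.linalg.slogdet(E.T @ E / n)[1]

def grad_U_n(W):
    """Analytic gradient: -2/n * sum_t (dF/dW)^T Gamma_n^{-1} (Y_t - F_W(Z_t))."""
    E = Y - Z @ W.T
    Gamma_n_inv = np.linalg.inv(E.T @ E / n)
    return -2.0 / n * Gamma_n_inv @ E.T @ Z      # entry (i, j) is dU_n/dW_{ij}

# Central finite-difference check of the analytic gradient.
W = rng.normal(size=(d, p))
num, eps = np.zeros_like(W), 1e-6
for i in range(d):
    for j in range(p):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (U_n(Wp) - U_n(Wm)) / (2 * eps)
print(np.max(np.abs(num - grad_U_n(W))))         # should be close to zero (order 1e-8 or below)
```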
3 Asymptotic properties of Ŵ_n

First, following the same lines as Yao [5], it is easy to show that, if the noise of the model has a moment of order at least 2, the estimator is strongly consistent (i.e. Ŵ_n → W_0 almost surely). Moreover, for an MLP function, there exists a constant C such that we have the following inequalities:
So, if Z has a moment of order at least 3 (see the justification in Yao [5]), we get the following lemma:
Lemma 1 Let ∆U_n(W_0) be the gradient vector of U_n(W) at W_0, ∆U(W_0) be the gradient vector of U(W) := log det(E[(Y − F_W(Z))(Y − F_W(Z))^T]) at W_0, and HU_n(W_0) be the Hessian matrix of U_n(W) at W_0.
Finally, we define
We then get
where the component (k, l) of the matrix I_0 is:
Proof. To prove the lemma, we first remark that the component (k, l) of the matrix 4I_0 is:
and, since the trace of a product is invariant under circular permutation,

Now, for the component (k, l) of the expectation of the Hessian matrix, we remark that
From a classical argument of local asymptotic normality (see for example Yao [5]), we then deduce the following property for the estimator Ŵ_n:
If we consider the generalized least squares estimator computed with the true noise covariance matrix Γ_0, i.e. the minimizer of:
then we have
We remark that Ŵ_n has the same asymptotic behavior as the generalized least squares estimator using the true covariance matrix Γ_0 (i.e. weighting by Γ_0^{-1}), which is asymptotically optimal (see for example Ljung [2]); hence the proposed estimator is asymptotically optimal too.
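As a rough numerical illustration of this optimality claim (in the spirit of the empirical study in Rynkiewicz [4], but not reproducing it), the following sketch compares, on a small nonlinear toy model, the Monte Carlo estimation error of the ordinary least squares estimator and of the minimizer of U_n; the model F_W(z) = tanh(Wz), the sample sizes, the initialization near W_0 and the optimizer settings are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
d, p, n, n_rep = 2, 2, 400, 50
W0 = rng.normal(size=(d, p))
Gamma_0 = np.array([[1.0, 0.9], [0.9, 1.0]])       # strongly correlated noise

def F(W, Z):
    return np.tanh(Z @ W.T)                        # componentwise tanh of a linear map

def mse_cost(w, Y, Z):
    E = Y - F(w.reshape(d, p), Z)
    return np.mean(np.sum(E**2, axis=1))

def logdet_cost(w, Y, Z):
    E = Y - F(w.reshape(d, p), Z)
    return np.linalg.slogdet(E.T @ E / n)[1]

err_mse, err_logdet = [], []
for _ in range(n_rep):
    Z = rng.normal(size=(n, p))
    Y = F(W0, Z) + rng.multivariate_normal(np.zeros(d), Gamma_0, size=n)
    w_init = (W0 + 0.1 * rng.normal(size=W0.shape)).ravel()   # start near W0 to avoid local minima
    w_mse = minimize(mse_cost, w_init, args=(Y, Z), method="BFGS").x
    w_ld = minimize(logdet_cost, w_init, args=(Y, Z), method="BFGS").x
    err_mse.append(np.sum((w_mse - W0.ravel())**2))
    err_logdet.append(np.sum((w_ld - W0.ravel())**2))

print("mean squared error of OLS estimator    :", np.mean(err_mse))
print("mean squared error of log-det estimator:", np.mean(err_logdet))
```

The asymptotic theory above suggests that, with strongly correlated noise, the second figure should typically not exceed the first; the exact numbers will of course depend on the toy settings chosen here.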
In the linear multidimensional regression model, the optimal estimator has an analytic solution (see Magnus and Neudecker [3]), so it makes no sense to minimize a cost function numerically. However, for the nonlinear multidimensional regression model, the ordinary least squares estimator is suboptimal if the covariance matrix of the noise is not the identity matrix. We can overcome this difficulty by using the cost function U_n(W). The numerical computation and the empirical properties of this estimator have been studied in a previous article (see Rynkiewicz [4]). In this paper, we have given a proof of the optimality of the estimator associated with U_n(W). It is therefore a good choice for the estimation of multidimensional nonlinear regression models with multilayer perceptrons.
Finally, it is not hard to extend everything shown in this paper to stationary mixing variables, and hence to time series.