Learning activation functions from data using cubic spline interpolation


Authors: Simone Scardapane, Michele Scarpiniti, Danilo Comminiello, Aurelio Uncini

Learning activation functions from data using cubic spline interpolation

Simone Scardapane, Michele Scarpiniti, Danilo Comminiello, and Aurelio Uncini
Department of Information Engineering, Electronics and Telecommunications (DIET), "Sapienza" University of Rome, Via Eudossiana 18, 00184, Rome.
Email: {simone.scardapane, michele.scarpiniti, danilo.comminiello}@uniroma1.it; aurel@ieee.org

Abstract. Neural networks require a careful design in order to perform properly on a given task. In particular, selecting a good activation function (possibly in a data-dependent fashion) is a crucial step, which remains an open problem in the research community. Despite a large amount of investigations, most current implementations simply select one fixed function from a small set of candidates, which is not adapted during training, and is shared among all neurons throughout the different layers. However, neither of these assumptions can be supposed optimal in practice. In this paper, we present a principled way to obtain data-dependent adaptation of the activation functions, which is performed independently for each neuron. This is achieved by leveraging past and present advances in cubic spline interpolation, allowing for local adaptation of the functions around their regions of use. The resulting algorithm is relatively cheap to implement, and overfitting is counterbalanced by the inclusion of a novel damping criterion, which penalizes unwanted oscillations from a predefined shape. Preliminary experimental results validate the proposal.

Keywords: Neural network, activation function, spline interpolation

1 Introduction

Neural networks (NNs) are extremely powerful tools for approximating complex nonlinear functions [7]. The nonlinear behavior is introduced in the NN architecture by the elementwise application of a given nonlinearity, called the activation function (AF), at every layer.
Since AFs are crucial to the dynamics and computational power of NNs, the history of the two over the last decades is deeply connected [15]. As an example, the use of differentiable AFs was one of the major breakthroughs in NNs, leading directly to the back-propagation algorithm. More recently, progress on piecewise linear functions was shown to facilitate the backward flow of information when training very deep networks [4]. At the same time, it is somewhat surprising that the vast majority of NNs use only a small handful of fixed functions, hand-chosen by the practitioner before the learning process. Worse, there is no principled reason to believe that a 'good' nonlinearity should be the same across all layers of the network, or even across neurons in the same layer.

This is shown clearly in a recent work by Agostinelli et al. [1], where every neuron in a deep network was endowed with an adaptable piecewise linear function with possibly different parameters, concluding that "the standard one-activation-function-fits-all approach may be suboptimal" in current practice. Experiments in AF adaptation have a long history, but they have never met wide applicability in the field. The simplest approach is to parameterize each sigmoid function in the network by one or more 'shape' parameters to be optimized, as in the seminal 1996 paper by Chen and Chang [3] or the later work by Trentin [16]. Along a similar line, one may consider the use of polynomial AFs, wherein each coefficient of the polynomial is adapted by gradient descent [11]. Additional investigations can be found in [20,5,2,10,9]. One strong drawback of these approaches is that the parameters involved affect the AF globally, so that a change in one region of the function may be counterproductive in a different, possibly faraway, region.
Several years ago, an alternative approach was introduced by using spline interpolating functions as AFs [17,6], resulting in what was called a spline AF (SAF). Splines are an attractive choice for interpolating unknown functions, since they can be described by a small number of parameters, yet each parameter has a local effect, and only a fixed number of them is involved every time an output value is computed [18]. The original works in [17,6] had two main drawbacks that prevented a wider use of the underlying theory. First, SAFs were only investigated in an online setting, where updates are computed one sample at a time; whether an efficient implementation is possible (and feasible) also for batch (or mini-batch) settings was not shown. Secondly, the obtained SAFs had a tendency to overfit the training data, resulting in oscillatory behaviors which hindered performance. Inspired by recent successes in the field of nonlinear adaptive filtering [13,14], our aim in this paper is two-fold. On one hand, we provide a modern introduction to the use of SAFs in neural networks, with a particular emphasis on their efficient implementation in the case of batch (or mini-batch) training. Our treatment clearly shows that the major problem in their implementation, evident from the discussion above, is the design of an efficient way to regularize their control points. In this sense, as a second contribution we provide a simple (yet effective) 'damping' criterion to prevent unwanted oscillations in the testing phase, which penalizes deviations from the original points in terms of the ℓ2 norm. A restricted set of experiments shows that the resulting formulation is able to achieve a lower test error than a standard NN with fixed AFs, while at the same time learning non-trivial activations with different shapes across different neurons. The rest of the paper is organized as follows.
Section 2 presents the basic theory of SAFs for the case of a single neuron. Section 3 extends the treatment to the case of a NN with one hidden layer, by deriving the gradient equations for the SAF parameters in the internal layer. Then, Section 4 goes over the experimental results, while we conclude with some final remarks in Section 5.

2 The spline activation function

We begin our treatment of SAFs with the simplest case of a single neuron endowed with a flexible AF (see [17,13] for additional details). Given a generic input x ∈ R^D, the output of the SAF is computed as:

    s = w^T x,        (1)
    y = φ(s; q),      (2)

where w ∈ R^D (we suppose that an eventual bias term is added directly to the input vector), and the AF φ(·) is parameterized by a vector q ∈ R^Q of internal parameters, called knots. The knots are a sampling of the AF values over Q representative points spanning the overall function. In particular, we suppose the knots to be uniformly spaced, i.e. q_{i+1} = q_i + Δx for a fixed Δx ∈ R, and symmetrically spaced around the origin. Given s, the output is computed by spline interpolation over the closest knot and its P rightmost neighbors. The common choice P = 3, which we adopt in this paper, corresponds to cubic interpolation, and is generally a good trade-off between locality of the output and interpolating precision. Given the index i of the closest knot, we can define the normalized abscissa value between q_i and q_{i+1} as:

    u = s/Δx − ⌊s/Δx⌋,    (3)

where ⌊·⌋ is the floor operator. From u we can compute the normalized reference vector u = [u^P, u^{P−1}, ..., u, 1]^T, while from i we can extract the relevant control points q_i = [q_i, q_{i+1}, ..., q_{i+P}]^T. We refer to the vector q_i as the i-th span. The output (2) is then computed as:

    y = φ(s) = u^T B q_i,    (4)

where B ∈ R^{(P+1)×(P+1)} is called the spline basis matrix.
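As a sketch, Eqs. (1)-(4) can be implemented for a single neuron as below. The Catmull-Rom basis B is the one the paper adopts; the mapping from the activation s to the span index i (leftmost knot of the active four-point window, with clipping at the boundary of the sampled range) is our own plausible convention, since the paper does not spell it out:

```python
import numpy as np

# Catmull-Rom basis for cubic (P = 3) spline interpolation, as used in the paper.
B = 0.5 * np.array([[-1.,  3., -3.,  1.],
                    [ 2., -5.,  4., -1.],
                    [-1.,  0.,  1.,  0.],
                    [ 0.,  2.,  0.,  0.]])

def saf_forward(x, w, q, dx):
    """Single-neuron SAF output, Eqs. (1)-(4).

    q holds Q uniformly spaced knots, symmetric around the origin, with
    spacing dx. Returns (y, s, i, u) so a backward pass can reuse the
    span index i and the normalized abscissa u. The indexing convention
    is an assumption of ours.
    """
    s = w @ x                              # Eq. (1): linear activation
    t = s / dx
    u = t - np.floor(t)                    # Eq. (3): normalized abscissa
    # Leftmost knot of the active window; knot (Q-1)//2 sits at the origin.
    i = int(np.floor(t)) + (len(q) - 1) // 2 - 1
    i = int(np.clip(i, 0, len(q) - 4))     # stay inside the sampled range
    u_vec = np.array([u**3, u**2, u, 1.0]) # [u^P, ..., u, 1]^T for P = 3
    y = u_vec @ B @ q[i:i + 4]             # Eq. (4)
    return y, s, i, u
```

With knots sampled from a hyperbolic tangent (the initialization discussed in Section 3.2), the interpolated output closely tracks tanh on the sampled range.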
In this work, we use the Catmull-Rom (CR) spline with P = 3, whose basis is given by:

    B = (1/2) [ −1   3  −3   1
                 2  −5   4  −1
                −1   0   1   0
                 0   2   0   0 ].    (5)

Different bases give rise to alternative interpolation schemes; e.g., a spline defined by a CR basis passes through all the control points, but its second derivative is not continuous. Apart from the locality of the output, SAFs have two additional interesting properties. First, the output in (4) is extremely efficient to compute, involving only vector-matrix products of very small dimensionality. Secondly, derivatives with respect to the internal parameters are equivalently simple and can be written down in closed form. In particular, the derivative of the nonlinearity φ(s) with respect to the input s is given by:

    ∂φ(s)/∂s = φ′(s) = ∂φ(s)/∂u · ∂u/∂s = (1/Δx) u̇^T B q_i,    (6)

where:

    u̇ = ∂u/∂u = [P u^{P−1}, (P−1) u^{P−2}, ..., 1, 0]^T.    (7)

Given this, the derivative of the SAF output y with respect to w is straightforward:

    ∂φ(s)/∂w = φ′(s) · ∂s/∂w = φ′(s) x.    (8)

Similarly, for q_i we obtain:

    ∂φ(s)/∂q_i = B^T u,    (9)

while ∂φ(s)/∂q_k = 0 for any element q_k outside the current span q_i.

3 Designing networks with SAF neurons

3.1 Computing outputs and inner derivatives

Now we consider the more elaborate case of a single-hidden-layer NN, with a D-dimensional input, H neurons in the hidden layer, and O output neurons.[1] Every neuron in the network uses a SAF with possibly different adaptive control points, which are set independently during the training process. For ease of computation, we suppose that the sampling set of the splines is the same for every neuron (i.e., each neuron has Q points equispaced according to the same Δx), and we also have a single shared basis matrix B. The forward phase of the network is similar to that of a standard network.
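The closed-form derivatives (6)-(9) can be sketched directly. The function below assumes the normalized abscissa u and the four active control points are available from the forward pass; the function and argument names are ours:

```python
import numpy as np

# Catmull-Rom basis, Eq. (5).
B = 0.5 * np.array([[-1.,  3., -3.,  1.],
                    [ 2., -5.,  4., -1.],
                    [-1.,  0.,  1.,  0.],
                    [ 0.,  2.,  0.,  0.]])

def saf_gradients(x, u, q_span, dx):
    """Gradients of one SAF output for one sample, Eqs. (6)-(9).

    u is the normalized abscissa from Eq. (3), q_span the four active
    control points. Control points outside the span have zero gradient.
    """
    u_vec = np.array([u**3, u**2, u, 1.0])
    du = np.array([3 * u**2, 2 * u, 1.0, 0.0])   # Eq. (7): derivative of u_vec
    dphi_ds = (du @ B @ q_span) / dx             # Eq. (6)
    grad_w = dphi_ds * x                         # Eq. (8)
    grad_q_span = B.T @ u_vec                    # Eq. (9)
    return dphi_ds, grad_w, grad_q_span
```

A quick finite-difference check against the forward formula (4) confirms the derivative in (6).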
In particular, given the input x, we first compute the output of the i-th hidden neuron, i = 1, ..., H, as:

    h_i = φ(w_{h,i}^T x; q_{h,i}).    (10)

These are concatenated in a single vector h = [h_1, ..., h_H, 1]^T, and the i-th output of the network, i = 1, ..., O, is given by:

    f_i(x) = y_i = φ(w_{y,i}^T h; q_{y,i}).    (11)

The derivatives with respect to the parameters {w_{y,i}, q_{y,i}}, i = 1, ..., O, can be computed directly with (8)-(9), substituting x with h. By back-propagation, the derivative of the j-th output with respect to the i-th (inner) weight vector w_{h,i} is similar to that of a standard NN:

    ∂y_j/∂w_{h,i} = φ′(s_{y,j}) · φ′(s_{h,i}) · ⌊w_{y,j}⌋_i · x,    (12)

where with a slight abuse of notation we let s_{y,j} denote the activation of the j-th output (and similarly s_{h,i} that of the i-th hidden neuron), ⌊·⌋_i extracts the i-th element of its input vector, and the two φ′(·) are given by (6). For the derivative with respect to the control points of the i-th hidden neuron, denote by q_{h,i,k} the currently active span, and by u_{h,i} the corresponding reference vector. The derivative of the j-th output is then given by:

    ∂y_j/∂q_{h,i,k} = φ′(s_{y,j}) · ⌊w_{y,j}⌋_i · B^T u_{h,i}.    (13)

[1] We note that the following treatment can be extended easily to the case of a network with more than one hidden layer. However, restricting it to a single layer allows us to keep the discussion focused on the problems/advantages arising in the use of SAFs. We leave this extension to a future work.

3.2 Initialization of the control points

An important aspect that we have not discussed yet is how to properly initialize the control points. One immediate choice is to sample their values from an AF which is known to work well on the given problem, e.g. a hyperbolic tangent. In this way, the network is guaranteed to work similarly to a standard NN in the initial phase of learning.
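The tanh-based initialization just described, including the optional small Gaussian perturbation of a random subset of knots that the paper also uses, can be sketched as follows (parameter names and defaults are ours; the paper's experiments use Q knots on [−2, +2] with Δx = 0.2 and perturb about 5% of the points):

```python
import numpy as np

def init_control_points(Q=21, dx=0.2, noise_std=0.0, noise_frac=0.05, rng=None):
    """Initialize one neuron's knots by sampling a tanh nonlinearity.

    Knots are uniformly spaced with step dx and symmetric around the
    origin. If noise_std > 0, a small Gaussian perturbation is added to
    a random subset of knots (about 5% by default).
    """
    rng = np.random.default_rng() if rng is None else rng
    positions = (np.arange(Q) - (Q - 1) // 2) * dx   # symmetric sampling grid
    q = np.tanh(positions)
    if noise_std > 0:
        mask = rng.random(Q) < noise_frac            # random ~5% subset
        q[mask] += rng.normal(0.0, noise_std, size=int(mask.sum()))
    return q
```

With the defaults, the network starts out behaving like a standard tanh network, as the text requires.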
Additionally, we have found good improvements in error by adding Gaussian noise N(0, σ²) with small variance σ² to a randomly chosen subset of control points (around 5% in our experiments). This provides a good variability in the beginning, similarly to how connections are set close to (but not identically equal to) zero during initialization.

3.3 Choosing a training criterion

Suppose we are provided with a training set of N input/output pairs in the form {x_i, d_i}, i = 1, ..., N. For simplicity of notation, we denote by w the concatenation of all weight vectors {w_{h,i}} and {w_{y,i}}, and by q a similar concatenation of all control points. Training can be formulated as the minimization of the following cost function:

    J(w, q) = (1/N) Σ_{i=1}^{N} L(d_i, f(x_i)) + λ_w R_w(w) + λ_q R_q(q),    (14)

where L(·,·) is an error function, while R_w(·) and R_q(·) provide meaningful regularization on the two sets of parameters. The first two terms are well-known in the neural network literature [7], and they can be set accordingly. In particular, in our experiments we consider a squared error term L(d_i, f(x_i)) = ||d_i − f(x_i)||²₂ and ℓ2 regularization on the weights, R_w(w) = ||w||²₂. The derivatives of L(·,·) can be computed straightforwardly with the formulas presented in Section 3.1. The term R_q(q) is used to avoid overfitted solutions for the control points. In fact, its presence is the major difference with respect to previous attempts at implementing SAFs in neural networks [17], wherein overfitting was counterbalanced by choosing a large value for Δx, which in a way goes against the philosophy of spline interpolation itself. At the same time, choosing a proper form for the regularization term is non-trivial, as the term should be cheap to compute, and it should introduce just as much a priori information as needed, without hindering the training process.
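With the squared error and ℓ2 weight penalty just described, and the damping criterion the paper proposes for R_q (an ℓ2 penalty on deviations from the initial control points), the cost (14) can be sketched as below. The `forward` callback and all names are our assumptions, not the paper's interface:

```python
import numpy as np

def cost(w, q, q0, X, D, forward, lam_w=1e-3, lam_q=1e-5):
    """Training cost of Eq. (14): mean squared error, l2 weight decay,
    and the damping regularizer on the control points.

    forward(w, q, x) is assumed to return the network output f(x);
    q0 holds the (noiseless) initial control points.
    """
    err = np.mean([np.sum((d - forward(w, q, x))**2) for x, d in zip(X, D)])
    return err + lam_w * np.sum(w**2) + lam_q * np.sum((q - q0)**2)
```

Note that when q = q0 the damping term vanishes, so an untrained SAF network pays no extra penalty relative to its tanh initialization.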
Most of the literature on regularizing w cannot be used here, as the corresponding formulations do not make sense in the context of spline interpolation. As an example, simply penalizing the ℓ2 norm of q leads to functions close to the zero function, while imposing sparsity is also meaningless. For the purpose of this paper, we consider the following 'damping' criterion:

    R_q(q) = ||q − q_0||²₂,    (15)

where q_0 represents the initial values of the control points, as discussed in the previous section (without considering the additional noise). The criterion makes intuitive sense as follows: while for w we wish to penalize unwanted deviations from very small weights (which can be justified with arguments from learning theory), in the case of q we are interested in penalizing changes with respect to a 'good' function parameterized by the initial control points q_0, namely one of the standard AFs used in NN training. In fact, setting a very high value for λ_q essentially deactivates the adaptation of the control points. Clearly, other choices are possible, and in this sense this paper serves as a starting point for further investigations towards this objective. As an example, we may wish to penalize the first (or second) order derivatives of the splines in order to force a desired level of smoothness [18].

3.4 Remarks on the implementation

In order to be usable in practice, SAFs require an efficient implementation to compute outputs and derivatives concurrently for the entire training dataset or, alternatively, for a properly chosen mini-batch (in the case of stochastic optimization algorithms). To begin with, we underline that the equations for the reference vector (see (3)) do not depend on the specific neuron, and for this reason they can easily be vectorized layer-wise on most numerical algebra libraries to obtain all vectors concurrently.
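The layer-wise vectorization of Eq. (3) can be sketched as follows: span indices and reference vectors for an entire (mini-batch × layer) matrix of activations are obtained without any per-neuron loop. The indexing convention (leftmost knot of the active four-point window, clipped at the boundary) is our assumption:

```python
import numpy as np

def spans_and_reference(S, dx, Q):
    """Vectorized Eq. (3) for a whole layer and mini-batch.

    S is an (N, H) matrix of activations s for N samples and H neurons.
    Returns the span start indices (N, H) and the reference vectors
    (N, H, 4) in one shot.
    """
    T = S / dx
    U = T - np.floor(T)                              # fractional abscissas
    I = np.floor(T).astype(int) + (Q - 1) // 2 - 1   # window start per entry
    I = np.clip(I, 0, Q - 4)                         # stay inside the knot range
    U_ref = np.stack([U**3, U**2, U, np.ones_like(U)], axis=-1)
    return I, U_ref
```

The indices I and the products B q_i they select are exactly the quantities worth caching during the forward pass, as discussed next.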
Additionally, the indexes and the relative terms B q_i in (4) can be cached during the forward pass, to be reused during the computation of the derivatives. In this sense, the outputs of a layer and its derivatives can be computed by one 4×4 matrix-vector computation and three 4-dimensional inner products, which have to be repeated for every input/neuron pair. In our experience, the cost of a relatively well-optimized implementation does not exceed twice that of a standard network for medium-sized batches, where the most onerous operation is the reshaping of the gradients in (9) and (13) into a single vector of gradients relative to the global vector q.

4 Experimental results

4.1 Experimental setup

To evaluate the preliminary proposal, we consider two simple regression benchmarks for neural networks: the 'chemical' dataset (included among MATLAB's testbeds for function fitting) and the 'California Housing' dataset.[2] They have respectively 498 and 20640 examples, and 8 numerical features each. Inputs are normalized in the [−1, +1] range, while outputs are normalized in the [−0.5, +0.5] range. We compare a NN with 5 hidden neurons and tanh(·) AFs (denoted as 'Standard' in the results), and a NN with the same number of neurons and SAF nonlinearities. The weight vector w is initialized with the method described in [4]. Each SAF is initialized from a tanh(·) nonlinearity, and control points are defined in the [−2, +2] range with Δx = 0.2, which is a good compromise between locality of the SAFs and the overall number of adaptable parameters. For the first scenario, λ_q is kept to a small value of 10⁻⁵. For each experiment, a random 30% of the dataset is kept for testing, and results are averaged over 15 different splits to average out statistical effects. Error is computed with the Normalized Root Mean-Squared Error (NRMSE).

[2] http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html
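For concreteness, the error measure can be sketched as below. The paper does not spell out its normalizer; normalizing the RMSE by the standard deviation of the targets is one common convention, and it is consistent with the observation in Section 4.2 that a constant predictor scores an error of 1:

```python
import numpy as np

def nrmse(d, y):
    """Normalized root mean-squared error between targets d and
    predictions y. The choice of normalizer (target standard deviation)
    is our assumption.
    """
    return np.sqrt(np.mean((d - y)**2)) / np.std(d)
```

Under this convention a perfect predictor scores 0, while predicting the target mean (e.g. the constant zero output of a strongly underfitted tanh network on zero-mean targets) scores exactly 1.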
The optimization problems are solved using a freely available MATLAB implementation of the Polack-Ribiere variant of the nonlinear conjugate gradient optimization algorithm by C.E. Rasmussen [12].[3] The optimization process is allowed a maximum of 1500 iterations. MATLAB code for the experiments is also available on the web.[4] We briefly remark that the MATLAB library, apart from repeating the experiments presented here, is also designed to handle networks with more than a single hidden layer, and implements the ADAM algorithm [8] for stochastic training in the case of a larger dataset.

4.2 Scenario 1: strong underfitting

As a first example, we consider a scenario of strong underfitting, wherein the user has misleadingly selected a very large value of λ_w = 1, leading in turn to extremely small values for the elements of w after training. Results in terms of training and test NRMSE are provided in Table 1.

Table 1. Average results for scenario 1 (λ_w = 1), together with one standard deviation.

    Dataset     Nonlinearity   Train NRMSE    Test NRMSE
    Chemical    Standard       1.00 ± 0.00    1.00 ± 0.01
                SAF            0.29 ± 0.02    0.31 ± 0.02
    Calhousing  Standard       1.02 ± 0.00    1.01 ± 0.01
                SAF            0.56 ± 0.01    0.57 ± 0.02

Since the activations of the NN tend to be very close to 0 (where the hyperbolic tangent operates in an almost-linear regime), standard NNs have a constant zero output, leading to a NRMSE of 1. Nonetheless, SAF networks are able to reach a very satisfactory level of performance, which in the first case is almost comparable to that of a fully optimized network (see the following section). To show the reasons for this, we have plotted four representative nonlinearities after training in Fig. 1.
It is easy to see that the nonlinearities have adapted to act as 'amplifiers' for the activations in their operating regime, with mild and strong peaks around 0. Of particular interest is the fact that the resulting SAFs need not be perfectly centered around 0 (e.g. Fig. 1c), or even symmetrical around the y-axis (e.g. Fig. 1d). In fact, the splines are able to efficiently counterbalance a bad setting of the weights, with behaviors which would be very hard (or close to impossible) to obtain using standard setups with fixed, shared, mild nonlinearities.

Fig. 1. Non-trivial representative SAFs after training for scenario 1 (four panels; spline value versus activation s).

[3] http://learning.eng.cam.ac.uk/carl/code/minimize/
[4] [The URL has been hidden for the review process.]

4.3 Scenario 2: well-optimized parameters

In our second scenario, we consider a similar comparison, but we fine-tune the parameters of the two methods using a grid search, with a 3-fold cross-validation on the training data as performance measure. Both λ_w and λ_q (the latter only for the proposed algorithm) are searched in an exponential interval 2^j, with j = −10, ..., 5.

Table 2. Optimal parameters (averaged over the runs) found by the grid-search procedure for scenario 2.

    Dataset     Nonlinearity   λ_w     λ_q
    Chemical    Standard       10⁻³    —
                SAF            10⁻²    10⁻⁴
    Calhousing  Standard       10⁻⁴    —
                SAF            10⁻³    10⁻⁴

Optimal parameters found by the grid search are listed in Table 2, while results in terms of training and test NRMSE are collected in Table 3. Overall, we see that the NNs endowed with the SAF nonlinearities are able to surpass by a large margin both a standard NN and the results from the previous scenario.
The only minor drawback evidenced in Table 3 is that the SAF network shows some overfitting on the 'chemical' dataset (around 2 points of NRMSE), showing that there is still some room for improvement in terms of the optimal regularization of the splines.

Table 3. Average results for scenario 2 (fine-tuned parameters), together with one standard deviation.

    Dataset     Nonlinearity   Train NRMSE    Test NRMSE
    Chemical    Standard       0.32 ± 0.01    0.32 ± 0.02
                SAF            0.26 ± 0.01    0.28 ± 0.02
    Calhousing  Standard       0.55 ± 0.01    0.55 ± 0.01
                SAF            0.51 ± 0.02    0.51 ± 0.02

Also in this case, we plot some representative SAFs after training (taken among those which are not trivially identical to the tanh nonlinearity) in Fig. 2. As before, SAFs generally tend to provide an amplification (with a possible change of sign) of their activation around some region of operation. It is interesting to observe that, also in this case, the optimal shape need not be symmetric (e.g. Fig. 2a), and might even be far from centered around 0 (e.g. Fig. 2c). The resulting nonlinearities can also present some additional non-trivial behaviors, such as a small region of insensitivity around 0 (e.g. Fig. 2d), or a region of pre-saturation before the actual tanh saturation (e.g. Figs. 2e-2f).

Fig. 2. Non-trivial representative SAFs after training for scenario 2 (six panels; spline value versus activation s).

5 Conclusion

In this paper, we have presented a principled way to adapt the activation functions of a neural network from the training data, locally and independently for each neuron. In particular, each nonlinearity is implemented with cubic spline interpolation, whose control points are adapted in the optimization phase.
Overfitting is controlled by a novel ℓ2 regularization criterion avoiding unwanted oscillations. Albeit efficient, this criterion does constrain the shapes of the resulting functions to a certain degree; in this sense, the design of more advanced regularization terms is a promising line of research. Additionally, we plan on exploring the application of SAFs to deeper networks, where it is expected that the statistics of the neurons' activations can change significantly layer-wise [4].

References

1. Agostinelli, F., Hoffman, M., Sadowski, P., Baldi, P.: Learning activation functions to improve deep neural networks. arXiv preprint arXiv:1412.6830 (2014)
2. Chandra, P., Singh, Y.: An activation function adapting training algorithm for sigmoidal feedforward networks. Neurocomputing 61, 429–437 (2004)
3. Chen, C.T., Chang, W.D.: A feedforward neural network with function shape autotuning. Neural Networks 9(4), 627–641 (1996)
4. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Int. Conf. on Artificial Intell. and Stat., pp. 249–256 (2010)
5. Goh, S., Mandic, D.: Recurrent neural networks with trainable amplitude of activation functions. Neural Networks 16(8), 1095–1100 (2003)
6. Guarnieri, S., Piazza, F., Uncini, A.: Multilayer feedforward networks with adaptive spline activation function. IEEE Trans. Neural Netw. 10(3), 672–683 (1999)
7. Haykin, S.: Neural Networks and Learning Machines. Pearson Education, 3rd edn. (2009)
8. Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference for Learning Representations (2015), arXiv preprint arXiv:1412.6980
9. Lin, M., Chen, Q., Yan, S.: Network in network. arXiv preprint arXiv:1312.4400 (2013)
10. Ma, L., Khorasani, K.: Constructive feedforward neural networks using Hermite polynomial activation functions. IEEE Trans. Neural Netw. 16(4), 821–833 (2005)
11. Piazza, F., Uncini, A., Zenobi, M.: Artificial neural networks with adaptive polynomial activation function. In: Int. Joint Conf. on Neural Networks, vol. 2, pp. II–343. IEEE/INNS (1992)
12. Rasmussen, C.: Gaussian Processes for Machine Learning. MIT Press (2006)
13. Scarpiniti, M., Comminiello, D., Parisi, R., Uncini, A.: Nonlinear spline adaptive filtering. Signal Process. 93(4), 772–783 (2013)
14. Scarpiniti, M., Comminiello, D., Scarano, G., Parisi, R., Uncini, A.: Steady-state performance of spline adaptive filters. IEEE Trans. Signal Process. 64(4), 816–828 (2016)
15. Schmidhuber, J.: Deep learning in neural networks: An overview. Neural Networks 61, 85–117 (2015)
16. Trentin, E.: Networks with trainable amplitude of activation functions. Neural Networks 14(4), 471–493 (2001)
17. Vecci, L., Piazza, F., Uncini, A.: Learning and approximation capabilities of adaptive spline activation function neural networks. Neural Networks 11(2), 259–270 (1998)
18. Wahba, G.: Spline Models for Observational Data. SIAM (1990)
19. Wang, Y., Shen, D., Teoh, E.: Lane detection using Catmull-Rom spline. In: IEEE Int. Conf. on Intelligent Vehicles, pp. 51–57 (1998)
20. Zhang, M., Xu, S., Fulcher, J.: Neuron-adaptive higher order neural-network models for automated financial data modeling. IEEE Trans. Neural Netw. 13(1), 188–204 (2002)
