Support vector machine for functional data classification

Reading time: 5 minutes
...

📝 Original Info

  • Title: Support vector machine for functional data classification
  • ArXiv ID: 0705.0209
  • Date: 2007-05-23
  • Authors: Not listed in the source material (the original text did not include an author list)

📝 Abstract

In many applications, input data are sampled functions taking their values in infinite dimensional spaces rather than standard vectors. This fact has complex consequences on data analysis algorithms that motivate modifications of them. In fact most of the traditional data analysis tools for regression, classification and clustering have been adapted to functional inputs under the general name of Functional Data Analysis (FDA). In this paper, we investigate the use of Support Vector Machines (SVMs) for functional data analysis and we focus on the problem of curve discrimination. SVMs are large margin classifiers based on implicit non linear mappings of the considered data into high dimensional spaces thanks to kernels. We show how to define simple kernels that take into account the functional nature of the data and lead to consistent classification. Experiments conducted on real world data emphasize the benefit of taking into account some functional aspects of the problems.

📄 Full Content

In many real world applications, data should be considered as discretized functions rather than as standard vectors. In these applications, each observation corresponds to a mapping between some conditions (that might be implicit) and the observed response. A well studied example of such functional data is given by spectrometric data (see section 6.3): each spectrum is a function that maps the wavelengths of the illuminating light to the corresponding absorbances (the responses) of the studied sample. Other natural examples can be found in voice recognition (see sections 6.1 and 6.2) or in meteorological problems, and more generally in multiple time series analysis, where each observation is a complete time series.

The direct use of classical models for this type of data faces several difficulties: as the inputs are discretized functions, they are generally represented by high dimensional vectors whose coordinates are highly correlated. As a consequence, classical methods lead to ill-posed problems, both from a theoretical point of view (when working in functional spaces of infinite dimension) and from a practical one (when working with the discretized functions). The goal of Functional Data Analysis (FDA) is to exploit, in data analysis algorithms, the underlying functional nature of the data: many data analysis methods have been adapted to functions (see [29] for a comprehensive introduction to functional data analysis and a review of linear methods). While the original papers on FDA focused on linear methods such as Principal Component Analysis [10,8,9,2] and the linear model [30,16,18], non linear models have been studied extensively in recent years. This is the case, for instance, of most neural network models [14,31,32,33].

In the present paper, we adapt Support Vector Machines (SVMs, see e.g. [42,7]) to functional data classification (the paper extends results from [34,44]). We show in particular both the practical and theoretical advantages of using functional kernels, that is, kernels that take into account the functional nature of the data. From a practical point of view, these kernels make it possible to take advantage of expert knowledge about the data. From a theoretical point of view, a specific type of functional kernel allows the construction of a consistent training procedure for functional SVMs.
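As a concrete illustration of the idea, here is a minimal sketch of a functional SVM: curves sampled on a common grid are classified with a Gaussian kernel built on a quadrature approximation of the $L^2(\mu)$ distance between curves. This is an illustration by this summary under stated assumptions (toy sine/cosine data, trapezoidal quadrature, scikit-learn's precomputed-kernel SVC), not the paper's own code or kernels.

```python
# Sketch of a functional SVM: a Gaussian kernel on the quadrature-approximated
# L2(mu) distance between curves sampled on a common grid (illustrative only).
import numpy as np
from sklearn.svm import SVC

def trapezoid_weights(t):
    """Quadrature weights of the trapezoidal rule on the sampling grid t."""
    w = np.empty_like(t)
    w[0] = (t[1] - t[0]) / 2.0
    w[-1] = (t[-1] - t[-2]) / 2.0
    w[1:-1] = (t[2:] - t[:-2]) / 2.0
    return w

def functional_gaussian_kernel(A, B, t, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||_{L2}^2) for rows of A and B."""
    w = trapezoid_weights(t)
    gram = (A * w) @ B.T                 # <a_i, b_j> ~ \int a_i b_j dmu
    sq_a = np.sum(A * A * w, axis=1)     # squared L2 norms of rows of A
    sq_b = np.sum(B * B * w, axis=1)
    d2 = sq_a[:, None] - 2.0 * gram + sq_b[None, :]
    return np.exp(-gamma * d2)

# toy curves: noisy sines vs. noisy cosines on [0, 1]
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
X = np.vstack([np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((20, t.size)),
               np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal((20, t.size))])
y = np.repeat([0, 1], 20)

K = functional_gaussian_kernel(X, X, t)
clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```

For new curves, the test kernel matrix would be `functional_gaussian_kernel(X_test, X_train, t)` on the same grid.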

The paper is organized as follows: Section 2 presents functional data classification and explains why it generally leads to ill-posed problems. Section 3 provides a short introduction to SVMs and explains why their generalization to FDA can lead to particular problems. Section 4 describes several functional kernels and explains how they can be computed in practice, while Section 5 presents a consistency result for some of them. Finally, Section 6 illustrates the various approaches presented in the paper on real data sets.

To simplify the presentation, this article focuses on functional data for which each observation is described by one function from $\mathbb{R}$ to $\mathbb{R}$. Extension to the case of several real valued functions is straightforward. More formally, if $\mu$ denotes a known finite positive Borel measure on $\mathbb{R}$, an observation is an element of $L^2(\mu)$, the Hilbert space of $\mu$-square-integrable real valued functions defined on $\mathbb{R}$. In some situations, additional regularity assumptions (e.g., existence of derivatives) will be needed.

However, almost all the developments of this paper are not specific to functions and use only the Hilbert space structure of $L^2(\mu)$. We will therefore denote by $\mathcal{X}$ an arbitrary Hilbert space and by $\langle \cdot, \cdot \rangle$ the corresponding inner product. Additional assumptions on $\mathcal{X}$ will be given on a case by case basis. As stated above, the most common situation will of course be $\mathcal{X} = L^2(\mu)$ with $\langle u, v \rangle = \int uv \, d\mu$.
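In practice the curves are only observed on a finite sampling grid, so this inner product has to be approximated by numerical quadrature. A minimal sketch, assuming $\mu$ is the Lebesgue measure on $[0, 1]$ and a uniform grid:

```python
# Approximate <u, v> = \int u v dmu for curves sampled on a common uniform
# grid, using a simple rectangle rule (Lebesgue measure on [0, 1] assumed).
import numpy as np

t = np.linspace(0.0, 1.0, 200)        # common sampling grid
dt = t[1] - t[0]                      # uniform grid spacing
u = np.sin(2 * np.pi * t)
v = np.cos(2 * np.pi * t)

inner_uv = dt * np.sum(u * v)         # ~ 0: sin and cos are orthogonal in L2
norm_u = np.sqrt(dt * np.sum(u * u))  # ~ sqrt(1/2)
print(inner_uv, norm_u)
```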

It should first be noted that many data analysis algorithms can be written so as to apply, at least from a theoretical point of view, to arbitrary Hilbert spaces. This is obviously the case, for instance, for distance-based algorithms such as the k-nearest neighbor method. Indeed, this algorithm only uses the fact that distances between observations can be calculated; it can therefore be applied to Hilbert spaces using the distance induced by the inner product. This is also the case for methods directly based on inner products, such as multilayer perceptrons (see [35,36,41] for a presentation of multi-layer perceptrons with almost arbitrary input spaces, including Hilbert spaces).
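To make this concrete, the sketch below (an illustration under the same assumptions as above, not code from the paper) runs a standard k-nearest neighbor classifier with the distance induced by the $L^2(\mu)$ inner product. On a uniform grid, rescaling the sampled curves by $\sqrt{dt}$ makes plain Euclidean distance match the quadrature-approximated functional distance, so an off-the-shelf implementation applies unchanged:

```python
# k-NN on functional data with the L2-induced distance (illustrative sketch):
# Euclidean distance on sqrt(dt)-rescaled samples ~ L2 distance between curves.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
dt = t[1] - t[0]

# two classes of noisy curves differing by a phase shift
X = np.vstack([np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal((30, t.size)),
               np.sin(2 * np.pi * t + 0.5) + 0.2 * rng.standard_normal((30, t.size))])
y = np.repeat([0, 1], 30)

Xw = X * np.sqrt(dt)   # Euclidean distance on Xw matches the L2 approximation
knn = KNeighborsClassifier(n_neighbors=5).fit(Xw, y)
print("training accuracy:", knn.score(Xw, y))
```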

However, functional spaces have infinite dimension, and a naive transposition of standard algorithms introduces both theoretical and practical difficulties. In fact, some simple problems in $\mathbb{R}^d$ become ill-posed in $\mathcal{X}$ when the space has infinite dimension, even from a theoretical point of view.

Let us consider for instance the linear regression model, in which a real valued target variable $Y$ is modeled by $E(Y|X) = H(X)$, where $H$ is a continuous linear operator defined on the input space. When $X$ takes its values in $\mathbb{R}^d$ (i.e., $\mathcal{X} = \mathbb{R}^d$), $H$ can be represented by a vector $w \in \mathbb{R}^d$ such that $H(x) = \langle w, x \rangle$, and estimating $H$ amounts to estimating the $d$ coordinates of $w$, a well-posed problem. When $\mathcal{X}$ is an infinite dimensional function space, no such finite parametrization exists and the estimation problem becomes ill-posed.
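The ill-posedness is easy to exhibit numerically: smooth discretized curves have highly correlated coordinates, so the empirical covariance matrix that a least squares estimate of $w$ would have to invert is nearly singular. The sketch below (an illustration by this summary using synthetic smooth curves, not the paper's data) shows the effect, and shows how Tikhonov/ridge regularization restores a stably invertible problem:

```python
# Ill-posedness of the linear model on discretized functions: the empirical
# covariance of smooth curves is close to singular (illustrative sketch).
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 100)
n = 200

# smooth random curves built from a handful of Fourier modes
X = sum(rng.standard_normal((n, 1)) * np.sin((k + 1) * np.pi * t) / (k + 1)
        for k in range(5))

Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n
print("condition number of covariance:", np.linalg.cond(cov))  # astronomical

# Tikhonov/ridge regularization (cov + lam * I) makes the inversion stable,
# one standard way to restore well-posedness.
lam = 1e-2
print("after regularization:", np.linalg.cond(cov + lam * np.eye(t.size)))
```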

Reference

This content is AI-processed based on open access ArXiv data.
