In discriminating between objects from different classes, the more separable the classes are, the simpler and less computationally expensive the classifier can be. One therefore seeks a measure that can quickly capture this notion of separability between classes while having an intuitive interpretation of what it quantifies. A previously proposed separability measure, the separability index (SI), has been shown to capture the class separability property intuitively and well. This short note highlights the limitations of this measure and proposes a slight variation to it, combining it with another form of separability measure that captures a quantity not covered by the separability index.
In object categorization/classification one is given a dataset of objects from different classes, from which to discover a class-distinguishing pattern so as to predict the classification of new, previously unseen objects [1,7]. This is only possible if the main justifying pillar of induction systems, the dictum that "similar objects tend to cluster together", holds. The process of discovering a pattern in the dataset is further complicated by the fact that the dataset often cannot immediately be visualized to determine the class distribution, for instance because of its high dimensionality.

Discovering a method that can distil such information, without running multiple sets of computationally expensive classifiers, would be advantageous. This method should quantify how the classes are distributed with respect to each other: are there class overlaps, are there multiple modes within the classes, are there many outliers? We thus seek a simple measure that can concisely capture some of these aspects of the classes in order to gauge the complexity of the classifier to be implemented. The notion of a 'simpler classifier' relates to the complexity of the discrimination function: a simpler function, e.g. a linear one, is preferred over a more complex polynomial function, as stated by Occam's razor.

The complexity of a classifier is also determined by the number of irrelevant features in the dataset. The original input space of the dataset, defined by the number of expertly measured attributes, is often not optimal in terms of producing clearly separable, non-overlapping classes. A subset of this space can often produce substantially more separable classes, which in turn results in a simpler discriminating function. Searching for an optimal subspace can be considered an optimization problem whose criterion function is the maximization of some predefined separability measure. A recent review of and comment on this area of research is presented in [4,6].
One measure that intuitively captures class overlap, the separability index (SI), was previously introduced in [3,8] and was shown to be effective on a number of popular machine learning datasets in [3,5].
The separability index estimates the fraction of instances in a dataset whose nearest neighbour has the same label. Since this is a fraction, the index varies between 0 and 1 (equivalently, 0-100%). Another separability measure, based on class distance or margin, is the hypothesis margin (HM), introduced in [2]. It measures, for each object, the distance to its nearest neighbour of the same class (near-hit) and to its nearest neighbour of the opposing class (near-miss), and sums over these differences. The larger the near-miss distances and the smaller the near-hit distances, the larger the hypothesis margin will be. This note is concerned only with the limitations of these two measures. In the next section we show, with a simple example, the behaviour of both the SI and the HM. We highlight the advantages and disadvantages of the SI and the HM, and then propose a hybrid of the two measures. The resulting measure's pseudo-code and behaviour are presented.
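As a concrete sketch (not the authors' code), both measures can be computed with brute-force nearest-neighbour search. Formulations of the hypothesis margin differ in minor details (some include a factor of 1/2); the version below follows the description given here, summing near-miss minus near-hit distances.

```python
import numpy as np

def separability_index(X, y):
    """Fraction of points whose nearest neighbour (excluding the point
    itself) carries the same class label; ranges over [0, 1]."""
    X, y = np.asarray(X, float), np.asarray(y)
    # Pairwise Euclidean distances; self-distances masked out with inf.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nearest = d.argmin(axis=1)
    return float(np.mean(y[nearest] == y))

def hypothesis_margin(X, y):
    """Sum over all points of (near-miss distance - near-hit distance):
    near-hit is the closest same-class point, near-miss the closest
    opposite-class point. Grows without bound as classes separate."""
    X, y = np.asarray(X, float), np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    total = 0.0
    for i in range(len(X)):
        same = y == y[i]
        same[i] = False                      # exclude the point itself
        total += d[i, ~same].min() - d[i, same].min()
    return total
```

On two tight, well-separated pairs such as `X = [[0,0],[0.1,0],[5,0],[5.1,0]]` with labels `[0,0,1,1]`, the SI is already at its ceiling of 1.0, while the HM still reflects the actual inter-class distance.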
In this section the behaviour of both measures is simulated in an example where the separation of two Gaussian clusters is incrementally increased. This is taken to simulate the process of searching for an optimal feature space in a given high-dimensional dataset. Figure 1 shows two Gaussian clusters that initially overlap, with an SI of 0.54 (54%). These clusters are incrementally separated by moving one cluster's centre away from the other. Figure 2 shows the point where the SI measure reaches 1 (100%); a quadratic or cubic discriminator will certainly be enough to cleanly partition the clusters, whereas a linear classifier might not avoid misclassification. Figure 3 shows a state where the two clusters are visually far more separated than in Figure 2, and a linear function will certainly be an adequate classifier for such a class separation. Figure 4 shows the variation of the separability index with increasing cluster distance. The SI measure is informative about the separability of the clusters below full separability (SI < 1), but is no longer informative when the classes separate further, which can arise in practice. This is to be expected, since the separability index does not measure class distances per se. The hypothesis margin, on the other hand, shown in Figure 5, keeps on growing, with no informative limit on the quantity it measures beyond the fact that the class separation distance is increasing. What is required is a measure that can intuitively inform on class separability below 100%, a characteristic of the separability index, and that can continue measuring beyond 100% class separability, a characteristic of the hypothesis margin.
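The saturation behaviour described above can be reproduced with a small simulation. This is an illustrative sketch, not the original experiment: the cluster sizes, variances and centre shifts are assumed, and compact versions of both measures are re-implemented inline so the snippet is self-contained.

```python
import numpy as np

def si_and_hm(X, y):
    """Nearest-neighbour separability index and hypothesis margin."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # mask self-matches
    si = float(np.mean(y[d.argmin(axis=1)] == y))
    hm = float(sum(d[i, y != y[i]].min() - d[i, y == y[i]].min()
                   for i in range(len(X))))
    return si, hm

rng = np.random.default_rng(0)
results = {}
for shift in (0.0, 2.0, 4.0, 8.0):
    # Two 2-D unit-variance Gaussian clusters, centres `shift` apart.
    a = rng.normal((0.0, 0.0), 1.0, size=(50, 2))
    b = rng.normal((shift, 0.0), 1.0, size=(50, 2))
    X = np.vstack([a, b])
    y = np.r_[np.zeros(50), np.ones(50)]
    results[shift] = si_and_hm(X, y)
    print(f"shift={shift:4.1f}  SI={results[shift][0]:.2f}  "
          f"HM={results[shift][1]:9.1f}")
```

The printed table shows the SI climbing towards 1.0 and then staying flat once the clusters no longer overlap, while the HM keeps increasing with the centre distance, mirroring Figures 4 and 5.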
Merging the two measures will consist of two parts: the original SI part and a modified HM part. The HM is modified by only initializing it
…(Full text truncated)…