The use of entropy to measure structural diversity

Reading time: 6 minutes
...

📝 Original Info

  • Title: The use of entropy to measure structural diversity
  • ArXiv ID: 0810.3525
  • Date: 2008-10-21
  • Authors: Researchers from original ArXiv paper

📝 Abstract

In this paper entropy-based methods are compared and used to measure the structural diversity of an ensemble of 21 classifiers. Such measures are mostly applied in ecology, where species counts are used as a measure of diversity. The measures used were the Shannon entropy, Simpson, and Berger-Parker diversity indices. As the diversity indices increased, so did the accuracy of the ensemble; an ensemble dominated by classifiers with the same structure produced poor accuracy. The uncertainty rule from information theory was also used to further define diversity. Genetic algorithms were used to find the optimal ensemble by using the diversity indices as the cost function, and majority voting was used to aggregate the decisions.


📄 Full Content

There is still an immense need for robust and reliable classification of data. It has become apparent that an ensemble of classifiers performs better than a single classifier [1], [2], [3], because a committee of classifiers is better than any one of its members. However, one question that arises is how to measure the ability of such a committee to generalize. A popular way to gain confidence in the generalization ability of an ensemble is to induce diversity within it, which in turn calls for a method of measuring the diversity of the ensemble. Methods have been implemented to relate ensemble diversity to ensemble accuracy [4], [5], [6]. These methods use the outcomes of the individual classifiers of the ensemble to measure diversity [7], [8]; that is, diversity is induced by different training methods, popular ones being boosting and bagging.

This paper deals with measuring the structural diversity of an ensemble using entropy measures. Diversity is induced by varying the structural parameters of the classifiers [9]; the parameters of interest are the activation function, the number of hidden nodes, and the learning rate. This study therefore measures diversity not from the outcomes of the individual classifiers but from their individual structures. A statistical measure of variance, the Kohavi variance method, has already been used to measure the structural diversity of an ensemble [9]. This study aims to find a suitable measure of structural diversity by adopting methods from ecology, and to use the concept of uncertainty from information theory to better understand ensemble diversity. The entropy measures are thus aimed at shedding more light on how the diversity of an ensemble relates to its accuracy. This study, however, focuses on only three measures of diversity, the Shannon, Simpson, and Berger-Parker indices, to quantify the structural diversity of the classifiers.

Shannon entropy found its fame in information theory, where it is used to measure the uncertainty of states [10]. In ecology, the Shannon index is used to measure the diversity of species; in this study, however, the individual classifiers are treated as the species [11]. For example, a population of two individuals of one species and one individual of another corresponds to an ensemble of two MLPs with one set of structural parameters and one MLP with a different set.
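Under this species analogy, the three diversity indices the paper names can be computed directly from the counts of each structural "species" in the ensemble. The sketch below is an illustrative implementation, not the paper's code; note that the excerpt does not state which convention is used for the Simpson and Berger-Parker indices (e.g. D, 1 − D, or 1/D), so the reciprocal forms are an assumption here.

```python
import math
from collections import Counter

def diversity_indices(species):
    """Compute Shannon, inverse Simpson, and inverse Berger-Parker diversity
    for a list of 'species' labels (here, structural signatures of the
    classifiers in an ensemble)."""
    counts = Counter(species)
    n = len(species)
    proportions = [c / n for c in counts.values()]
    shannon = -sum(p * math.log(p) for p in proportions)  # H = -sum p_i ln p_i
    simpson = 1.0 / sum(p * p for p in proportions)       # 1 / sum p_i^2
    berger_parker = 1.0 / max(proportions)                # 1 / p_max
    return shannon, simpson, berger_parker

# An ensemble dominated by one structure (three of kind "A", one of kind "B")
# scores low on all three indices, matching the paper's low-diversity case.
print(diversity_indices(["A", "A", "A", "B"]))
```

All three indices grow as the ensemble's structural composition becomes more even, which is the property the paper relates to ensemble accuracy.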

The relationship between classification accuracy and the entropy measures is obtained with genetic algorithms, using accuracy as the cost function [9]. There are a number of aggregation schemes, such as minimum, maximum, product, average, simple majority, weighted majority, Naïve Bayes, and decision templates, to name a few [12], [13]. For this study, however, the majority vote scheme was used to aggregate the individual classifiers into a final decision. This paper includes sections on the background; Species and the Identity Structure (IDS); Rényi entropy; the Shannon entropy measure; the Simpson diversity index; the Berger-Parker index; the neural network parameters; genetic algorithms (GA); the model; the data used; results and discussion; and lastly the conclusion.
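The majority vote scheme used here can be sketched in a few lines; this is a generic implementation of simple (plurality) voting, not the paper's code, and the tie-breaking rule is an assumption since the excerpt does not specify one.

```python
from collections import Counter

def majority_vote(predictions):
    """Aggregate per-classifier class predictions by simple majority vote.

    `predictions` is a list of lists: predictions[i][j] is classifier i's
    predicted label for sample j. Ties are broken in favour of the label
    seen first (an assumption; the paper does not state its tie rule).
    """
    n_samples = len(predictions[0])
    final = []
    for j in range(n_samples):
        votes = Counter(clf_preds[j] for clf_preds in predictions)
        final.append(votes.most_common(1)[0][0])
    return final

# Three classifiers, two samples: the plurality label wins per sample.
preds = [[0, 1],
         [0, 0],
         [1, 1]]
print(majority_vote(preds))  # -> [0, 1]
```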

Shannon entropy has been used in information theory to quantify uncertainty [10]; the meaning or implication of the information itself is not dealt with in this paper. This paper nevertheless uses similar concepts: the more information one has, the more certain one becomes [10], and likewise we can postulate that the more diverse something is, the more uncertain we become about its decision or outcome. This motivates the use of Shannon entropy to quantify uncertainty. Entropy measures have been used to compute the diversity of species populations [14]; in this paper, a committee of classifiers with different parameters is treated as a committee of species.

The ensemble of classifiers was thus treated as a set of species, in the perspective of ecology, or as a population, in statistics. Before the ecological methods could be applied to indicate structural diversity, however, each classifier needed a unique identity, since the ensemble was composed of classifiers with different machine parameters such as the number of hidden nodes, the learning rate, and the type of activation function. The Identity Structure (IDS) was converted to a binary string so as to mimic a gene type unique to each classifier.

Five learning rates and three activation functions were considered, just as in [9]. The number of hidden nodes was between 7 and 21: greater than the number of attributes so that the classifiers could generalize well, and at most 21 to reduce computational cost. The learning rates considered were 0.01, 0.02, 0.03, 0.04, and 0.05, and the activation functio

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
