Divisive-agglomerative algorithm and complexity of automatic classification problems
📝 Abstract
The paper sets forth an algorithm for solving the Automatic Classification (AC) problem. In the AC problem, it is required to find one or several partitions, starting from a given pattern matrix or a dissimilarity ∕ similarity matrix.
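The AC input can be either a pattern matrix (objects × features) or a dissimilarity ∕ similarity matrix. The sketch below shows one way to turn a pattern matrix into the other two forms. It is only an illustration under stated assumptions: Euclidean distance and the transform s = 1 / (1 + d) are choices made here, not taken from the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dissimilarity_matrix(patterns: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise distances between the rows (objects) of a pattern matrix.

    Assumption: Euclidean distance; the paper does not fix a metric here.
    """
    n = len(patterns)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # math.dist (Python 3.8+) computes the Euclidean distance
            dist = math.dist(patterns[i], patterns[j])
            d[i][j] = d[j][i] = dist  # the matrix is symmetric
    return d


def similarity_from_dissimilarity(d: Matrix) -> Matrix:
    """Assumed monotone conversion s = 1 / (1 + d): identical objects get 1.0."""
    return [[1.0 / (1.0 + x) for x in row] for row in d]


if __name__ == "__main__":
    pm = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]  # 3 objects, 2 features
    print(dissimilarity_matrix(pm)[0][1])  # 5.0
```

Any other monotone decreasing transform would serve equally well for the similarity form; the one above is merely simple and keeps values in (0, 1].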
📄 Content
DIVISIVE-AGGLOMERATIVE ALGORITHM AND COMPLEXITY OF
AUTOMATIC CLASSIFICATION PROBLEMS
Alexander Rubchinsky
National Research University Higher School of Economics, Russian Federation,
International Laboratory of Decision Choice and Analysis
International University “Dubna”, Russian Federation,
Department of Applied Mathematics and Informatics,
Abstract. An algorithm for solving the Automatic Classification (AC) problem is set forth in the paper. In the AC problem, it is required to find one or several partitions, starting from a given pattern matrix or a dissimilarity ∕ similarity matrix. A three-level scheme of the algorithm is suggested. At the internal level, the frequency minimax dichotomy algorithm is described. At the intermediate level, this algorithm is applied repeatedly in alternating divisive and agglomerative stages, which yields a family of classifications. At the external level, several runs of the intermediate-level algorithm are performed; thereafter, the set of all distinct classifications is selected from all the constructed classification families. This latter set is taken as the set of all solutions of the given AC problem. In many cases, this set of solutions can be significantly contracted (sometimes to a single classification). The ratio of the cardinality of the set of solutions to the cardinality of the set of all classifications found at the external level is taken as a measure of complexity of the initial AC problem. For classifications of parliament members according to their votes, the general notion of complexity is interpreted as the consistency or rationality of the parliament's policy. For “tossing” deputies and/or whole factions, the corresponding clusters become poorly distinguished and partially entangled, which results in a relatively high complexity of their classifications. By contrast, under a consistent policy, the clusters of deputies are clearly distinguished and the complexity is low (i.e. the level of consistency, accordance and rationality in the given parliament is high). This reasoning was applied to the analysis of the activity of the 2nd, 3rd and 4th RF Dumas (the Russian parliament, 1996-2007). Classifications based on one month of votes were constructed for every month.
Calculating the average complexity for each Duma showed an almost threefold decrease in the 3rd Duma compared to the 2nd Duma, followed by a substantial increase in the 4th Duma compared to the 3rd Duma. The decrease of the suggested index was most pronounced in 2002, in the wake of the “political peculiar point”: the creation of the party “United Russia” on 01.12.2001. In 2002 the complexity was 0.096, significantly lower than in any other year of the considered 12-year period. The introduced notions suggest new meaningful interpretations of the activity of various elected bodies, including the parliaments of different countries, international organizations and the boards of large corporations.
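The external level and the complexity measure described in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: `run_intermediate` is a hypothetical stand-in for the divisive-agglomerative (intermediate) level, which is not reproduced here, and two classifications are treated as equal when they induce the same partition regardless of cluster labels. The denominator counts every classification found, repeats included; this reading is an assumption, since counting only distinct classifications would make the ratio always equal to 1.

```python
from typing import Callable, Hashable, List, Sequence

Labels = Sequence[Hashable]  # labels[i] is the cluster label of object i


def canonical(labels: Labels) -> frozenset:
    """Label-independent form of a classification: a frozenset of clusters,
    each cluster a frozenset of object indices. Relabelled copies of the
    same partition map to the same value."""
    clusters = {}
    for obj, lab in enumerate(labels):
        clusters.setdefault(lab, set()).add(obj)
    return frozenset(frozenset(c) for c in clusters.values())


def external_level(run_intermediate: Callable[[int], List[Labels]],
                   n_runs: int) -> List[Labels]:
    """Collect every classification produced by n_runs runs of the
    intermediate level. `run_intermediate(seed)` is hypothetical: it is
    assumed to return one family of classifications per run."""
    found: List[Labels] = []
    for seed in range(n_runs):
        found.extend(run_intermediate(seed))
    return found


def complexity(found: Sequence[Labels]) -> float:
    """Number of distinct classifications divided by the number of all
    classifications found (repeats included, as assumed above)."""
    if not found:
        raise ValueError("no classifications were found")
    distinct = {canonical(labels) for labels in found}
    return len(distinct) / len(found)


if __name__ == "__main__":
    found = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
    print(complexity(found))  # 2 distinct out of 4 -> 0.5
```

In the toy list, `[0, 0, 1, 1]` and `[1, 1, 0, 0]` describe the same partition, so only two distinct classifications occur among four. Monthly values obtained this way could then be averaged per Duma, for example with `statistics.mean`.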
Introduction
Experience in solving various Automatic Classification (AC) problems, both model and real ones, shows that some of them are simpler and others more complicated. In intuitively simple situations, the classifications found raise no doubt, while in more complicated situations this is not the case. The causes may differ, for instance:
the classifications are not unique;
the mere existence of classifications is not evident;
a classification is unique and intuitively clear, but it is not clear how to find it;
the search for classifications in problems of real size leads to significant computational difficulties.
Other reasons can also determine the complexity of AC problems. However, despite their practical and theoretical importance, these issues are hardly considered in the literature, apart from analyses of the computational complexity of some AC algorithms. It is precisely the absence of a general formal notion of complexity of AC problems, together with the absence of solution algorithms that cope with problems of varying complexity within a single scheme, that initiated the present investigation.
The solution of an AC problem is understood as a family of classifications that includes all reasonable (in some sense) classifications. The complexity of a problem is determined in the course of constructing this family. Generally, the subsequent choice of one or several classifications can be made by specialists in the specific domain on the basis of additional data, i.e. beyond the framework of the initial AC problem. The corresponding multi-criteria problem is not considered in the paper; only some remarks concerning possible criteria are given. Yet frequently encountered situations, in which intuitively ev