Analysis and Optimization of fastText Linear Text Classifier
📝 Abstract
The paper [1] shows that a simple linear classifier can compete with complex deep learning algorithms in text classification applications. By combining bag of words (BoW) and linear classification techniques, fastText [1] attains the same or only slightly lower accuracy than deep learning algorithms [2-9] that are orders of magnitude slower. We prove formally that fastText can be transformed into a simpler equivalent classifier that, unlike fastText, has no hidden layer. We also prove that the necessary and sufficient dimensionality of the word vector embedding space is exactly the number of document classes. These results help in constructing more optimal linear text classifiers with guaranteed maximum classification capabilities. The results are proven exactly by purely formal algebraic methods, without resorting to any empirical data.
📄 Content
Analysis and Optimization of fastText Linear Text Classifier
Vladimir Zolotov and David Kung
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
zolotov@us.ibm.com, kung@us.ibm.com
Abstract
The paper [1] shows that a simple linear classifier can compete with complex deep learning algorithms in text classification applications. By combining bag of words (BoW) and linear classification techniques, fastText [1] attains the same or only slightly lower accuracy than deep learning algorithms [2-9] that are orders of magnitude slower. We prove formally that fastText can be transformed into a simpler equivalent classifier that, unlike fastText, has no hidden layer. We also prove that the necessary and sufficient dimensionality of the word vector embedding space is exactly the number of document classes. These results help in constructing more optimal linear text classifiers with guaranteed maximum classification capabilities. The results are proven exactly by purely formal algebraic methods, without resorting to any empirical data.
Introduction
Text classification is a difficult and important problem of Computational Linguistics and Natural Language Processing. Different types of neural networks (deep, convolutional, recurrent, LSTM, neural Turing machines, etc.) are used for text classification, often with significant success.
Recently, a team of researchers (A. Joulin, E. Grave, P. Bojanowski, T. Mikolov) [1] experimentally showed that comparable results can be achieved by a simple linear classifier. Their tool fastText [1] can be trained to the accuracy achieved by more complex deep learning algorithms [2-9], but orders of magnitude faster, even without using a high-performance GPU.
The exceptional performance of fastText is not a big surprise. It is a consequence of its very simple classification algorithm and its highly professional implementation in C++. The high accuracy of the very simple fastText algorithm is a clear indicator that the text classification problem is still not understood well enough to construct really efficient nonlinear classification models.
Because of the very high complexity of nonlinear classification models, their direct analysis is too difficult. A good understanding of simple linear classification algorithms like fastText is a key to constructing good nonlinear text classifiers. That was the main motivation for analyzing the fastText classification algorithm.
On the other hand, the simplicity of fastText makes it very conducive to formal analysis. In spite of its simplicity, fastText combines several very important techniques: bag of words (BoW), representation of words as vectors in a linear space, and linear classification. Therefore, a thorough formal analysis of fastText can further our understanding of other text classification algorithms employing similar basic techniques.
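To make the model under analysis concrete, the fastText-style architecture described above can be sketched as a mean of word embeddings followed by a linear output layer. This is a minimal illustration, not the authors' implementation; the matrix names `A` and `B` and the sizes `V`, `d`, `N` are our own hypothetical notation, and the weights are random rather than trained.

```python
import numpy as np

# Hypothetical sizes: V vocabulary words, d embedding dimensions, N classes.
V, d, N = 10_000, 10, 4
rng = np.random.default_rng(0)
A = rng.normal(size=(V, d))   # word embedding matrix (the hidden layer)
B = rng.normal(size=(N, d))   # linear output (classification) layer

def classify(word_ids):
    """Average the word vectors (bag of words), then apply the output layer."""
    h = A[word_ids].mean(axis=0)   # document vector: mean of its word embeddings
    scores = B @ h                 # one score per class (softmax omitted;
                                   # it does not change the argmax)
    return int(np.argmax(scores))

doc = [3, 17, 256, 42]             # a document as a list of word indices
print(classify(doc))
```

Because every step is linear up to the final argmax, the whole pipeline is a composition of two linear maps, which is what makes the algebraic analysis in the paper tractable.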
We obtained the following main results:

- The linear hidden layer of the fastText model is not required for improving classification accuracy. We formally proved that any fastText-type classifier can be transformed into an equivalent classifier without a hidden layer.
- The sufficient number of dimensions of the vector space representing document words is equal to the number of document classes.
- Any fastText classifier recognizing N classes of documents can be algebraically transformed into an equivalent classifier with word vectors selected from an N-dimensional linear space.
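The first and third results above can be demonstrated numerically: folding the embedding matrix into the output layer yields a single matrix that assigns each word an N-dimensional vector and needs no hidden layer, yet produces identical scores. This is an illustrative sketch in our own notation (`A` for embeddings, `B` for the output layer), not code from the paper.

```python
import numpy as np

V, d, N = 10_000, 50, 4            # hypothetical sizes; note d > N here
rng = np.random.default_rng(1)
A = rng.normal(size=(V, d))        # original d-dimensional word vectors
B = rng.normal(size=(N, d))        # output layer of the two-layer model

# Collapsed model: one N-dimensional vector per word, no hidden layer.
C = A @ B.T                        # shape (V, N)

doc = [3, 17, 256, 42]
scores_two_layer = B @ A[doc].mean(axis=0)   # original fastText-style scores
scores_collapsed = C[doc].mean(axis=0)       # scores without a hidden layer

# The two models are exactly equivalent (up to floating-point rounding).
assert np.allclose(scores_two_layer, scores_collapsed)
```

The equivalence holds because averaging commutes with the linear output map, so the 50-dimensional embeddings carry no more classification power than the 4-dimensional rows of `C`.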
In the general case, the minimum dimensionality of word vectors is the number of document classes. This means it is possible to construct a text classification problem with N classes of documents such that, for word vectors of an (N-1)-dimensional space, there is no fastText-type classifier that correctly recognizes all classes. However, there exists a fastText-type classifier with N-dimensional word vectors that can perform the required classification correctly. By a simple modification of the classification algorithm, it is possible to reduce the necessary and sufficient dimensionality of the vector space of word representations by 1.

The above facts are proven using formal algebraic transformations. Therefore, these conclusions are exact and fully deterministic.

The proven theoretical facts have practical value. From them it follows that increasing the length of word vectors beyond the number of document classes cannot improve the classification accuracy of a linear BoW classifier. On the other hand, if word vectors have fewer dimensions than the number of document classes, we may fail to achieve the maximum possible accuracy. Besides, we see that adding a hidden linear layer cannot improve the accuracy of a linear BoW classifier. According to the proven facts, an LBoW text classifier guaranteeing maximum achievable accuracy has a well-defined structure: word vectors with as many coordinates as the number of document classes to be recognized, and no hidden layer.
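The intuition behind these dimensionality statements is a rank argument, sketched below in our own notation (A for the V-by-d embedding matrix, B for the N-by-d output layer; these symbols are not fixed by the excerpt):

```latex
% Scores of a linear BoW classifier for a document x (a multiset of words):
\[
  s(x) \;=\; B\,\bar{a}(x),
  \qquad
  \bar{a}(x) \;=\; \frac{1}{|x|}\sum_{w \in x} A_w .
\]
% Folding the two linear maps into one matrix C eliminates the hidden layer:
\[
  s(x) \;=\; \frac{1}{|x|}\sum_{w \in x} C_w ,
  \qquad
  C \;=\; A B^{\top} \in \mathbb{R}^{V \times N},
  \qquad
  \operatorname{rank} C \;\le\; \min(d, N).
\]
% Hence taking d > N gains nothing (the rank of C is capped at N anyway),
% while d < N confines the score vectors to a proper subspace of R^N,
% which is why maximum accuracy may then be unattainable.
```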
This content is AI-processed based on ArXiv data.