Analysis and Optimization of fastText Linear Text Classifier
📝 Abstract
The paper [1] shows that a simple linear classifier can compete with complex deep learning algorithms in text classification applications. By combining bag of words (BoW) and linear classification techniques, fastText [1] attains the same or only slightly lower accuracy than deep learning algorithms [2-9] that are orders of magnitude slower. We prove formally that fastText can be transformed into a simpler equivalent classifier that, unlike fastText, has no hidden layer. We also prove that the necessary and sufficient dimensionality of the word vector embedding space is exactly the number of document classes. These results help in constructing more optimal linear text classifiers with guaranteed maximum classification capabilities. The results are proven exactly by purely formal algebraic methods, without resorting to any empirical data.
📄 Content
Analysis and Optimization of fastText Linear Text Classifier
Vladimir Zolotov and David Kung
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
zolotov@us.ibm.com, kung@us.ibm.com
Abstract
The paper [1] shows that a simple linear classifier can compete with complex deep learning algorithms in text classification applications. By combining bag of words (BoW) and linear classification techniques, fastText [1] attains the same or only slightly lower accuracy than deep learning algorithms [2-9] that are orders of magnitude slower. We prove formally that fastText can be transformed into a simpler equivalent classifier that, unlike fastText, has no hidden layer. We also prove that the necessary and sufficient dimensionality of the word vector embedding space is exactly the number of document classes. These results help in constructing more optimal linear text classifiers with guaranteed maximum classification capabilities. The results are proven exactly by purely formal algebraic methods, without resorting to any empirical data.
Introduction
Text classification is a difficult and important problem of Computational Linguistics and Natural Language Processing. Different types of neural networks (deep, convolutional, recurrent, LSTM, neural Turing machines, etc.) are used for text classification, often with significant success.
Recently, a team of researchers (A. Joulin, E. Grave, P. Bojanowski, T. Mikolov) [1] experimentally showed that comparable results can be achieved by a simple linear classifier. Their tool fastText [1] can be trained to the accuracy achieved by more complex deep learning algorithms [2-9], but orders of magnitude faster, even without using a high-performance GPU.
The exceptional performance of fastText is not a big surprise. It is a consequence of its very simple classification algorithm and its highly professional implementation in C++. The high accuracy of the very simple fastText algorithm is a clear indicator that the text classification problem is still not understood well enough to construct really efficient nonlinear classification models.
Because of the very high complexity of nonlinear classification models, their direct analysis is too difficult. A good understanding of simple linear classification algorithms like fastText is a key to constructing good nonlinear text classifiers. That was the main motivation for analyzing the fastText classification algorithm.
On the other hand, the simplicity of fastText makes it very conducive to formal analysis. In spite of its simplicity, fastText combines several very important techniques: bag of words (BoW), representation of words as vectors in a linear space, and linear classification. Therefore, a thorough formal analysis of fastText can further our understanding of other text classification algorithms employing similar basic techniques.
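To make the model under analysis concrete, the fastText-style architecture described above can be sketched as a mean of word embeddings followed by a linear output layer. This is a minimal illustration, not the authors' implementation; the matrix names `A` and `B` and the sizes `V`, `d`, `N` are our own hypothetical notation, and the weights are random rather than trained.

```python
import numpy as np

# Hypothetical sizes: V vocabulary words, d embedding dimensions, N classes.
V, d, N = 10_000, 10, 4
rng = np.random.default_rng(0)
A = rng.normal(size=(V, d))   # word embedding matrix (the hidden layer)
B = rng.normal(size=(N, d))   # linear output (classification) layer

def classify(word_ids):
    """Average the word vectors (bag of words), then apply the output layer."""
    h = A[word_ids].mean(axis=0)   # document vector: mean of its word embeddings
    scores = B @ h                 # one score per class (softmax omitted;
                                   # it does not change the argmax)
    return int(np.argmax(scores))

doc = [3, 17, 256, 42]             # a document as a list of word indices
print(classify(doc))
```

Because every step is linear up to the final argmax, the whole pipeline is a composition of two linear maps, which is what makes the algebraic analysis in the paper tractable.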
We obtained the following main results:

- The linear hidden layer of the fastText model is not required for improving classification accuracy. We formally proved that any fastText-type classifier can be transformed into an equivalent classifier without a hidden layer.
- The sufficient number of dimensions of the vector space representing document words is equal to the number of document classes.
- Any fastText classifier recognizing N classes of documents can be algebraically transformed into an equivalent classifier with word vectors selected from an N-dimensional linear space.
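The first and third results above can be demonstrated numerically: folding the embedding matrix into the output layer yields a single matrix that assigns each word an N-dimensional vector and needs no hidden layer, yet produces identical scores. This is an illustrative sketch in our own notation (`A` for embeddings, `B` for the output layer), not code from the paper.

```python
import numpy as np

V, d, N = 10_000, 50, 4            # hypothetical sizes; note d > N here
rng = np.random.default_rng(1)
A = rng.normal(size=(V, d))        # original d-dimensional word vectors
B = rng.normal(size=(N, d))        # output layer of the two-layer model

# Collapsed model: one N-dimensional vector per word, no hidden layer.
C = A @ B.T                        # shape (V, N)

doc = [3, 17, 256, 42]
scores_two_layer = B @ A[doc].mean(axis=0)   # original fastText-style scores
scores_collapsed = C[doc].mean(axis=0)       # scores without a hidden layer

# The two models are exactly equivalent (up to floating-point rounding).
assert np.allclose(scores_two_layer, scores_collapsed)
```

The equivalence holds because averaging commutes with the linear output map, so the 50-dimensional embeddings carry no more classification power than the 4-dimensional rows of `C`.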
In the general case, the minimum dimensionality of word vectors is the number of document classes. This means it is possible to construct a text classification problem with N classes of documents such that, for word vectors of an (N-1)-dimensional space, there is no fastText-type classifier that correctly recognizes all classes. However, there exists a fastText-type classifier with N-dimensional word vectors that can perform the required classification correctly. By a simple modification of the classification algorithm, it is possible to reduce the necessary and sufficient dimensionality of the vector space of word representations by 1.

The above facts are proven using formal algebraic transformations. Therefore, these conclusions are exact and fully deterministic.

The proven theoretical facts have practical value. From them it follows that increasing the length of word vectors beyond the number of document classes cannot improve the classification accuracy of a linear BoW classifier. On the other hand, if word vectors have fewer dimensions than the number of document classes, we may fail to achieve the maximum possible accuracy. Besides, we see that adding a hidden linear layer cannot improve the accuracy of a linear BoW classifier. According to the proven facts, an LBoW text classifier guaranteeing maximum achievable accuracy has a well-defined structure: word vectors with as many coordinates as the number of document classes to be recognized, and no hidden layer.
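The intuition behind these dimensionality statements is a rank argument, sketched below in our own notation (A for the V-by-d embedding matrix, B for the N-by-d output layer; these symbols are not fixed by the excerpt):

```latex
% Scores of a linear BoW classifier for a document x (a multiset of words):
\[
  s(x) \;=\; B\,\bar{a}(x),
  \qquad
  \bar{a}(x) \;=\; \frac{1}{|x|}\sum_{w \in x} A_w .
\]
% Folding the two linear maps into one matrix C eliminates the hidden layer:
\[
  s(x) \;=\; \frac{1}{|x|}\sum_{w \in x} C_w ,
  \qquad
  C \;=\; A B^{\top} \in \mathbb{R}^{V \times N},
  \qquad
  \operatorname{rank} C \;\le\; \min(d, N).
\]
% Hence taking d > N gains nothing (the rank of C is capped at N anyway),
% while d < N confines the score vectors to a proper subspace of R^N,
% which is why maximum accuracy may then be unattainable.
```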
This content is AI-processed based on ArXiv data.