Text Classification: A Sequential Reading Approach
We propose to model the text classification process as a sequential decision process. In this process, an agent learns to classify documents into topics while reading the document's sentences sequentially, and learns to stop as soon as enough information has been read to make a decision. The proposed algorithm models text classification as a Markov Decision Process and is trained with reinforcement learning. Experiments on four classical mono-label corpora show that the proposed approach performs comparably to classical SVM approaches for large training sets, and better for small training sets. In addition, the model automatically adapts its reading process to the quantity of training information provided.
💡 Research Summary
The paper introduces a novel approach to text classification that treats the task as a sequential decision‑making problem rather than a traditional one‑shot, bag‑of‑words classification. The authors model a document as a sequence of sentences and let an agent read the sentences one by one. At each step the agent can (i) assign a label that has not yet been assigned, (ii) move to the next sentence, or (iii) stop reading and output the final set of labels. This process is formalized as a deterministic Markov Decision Process (MDP) where the state consists of the document identifier, the current sentence index, and the set of labels already assigned. The only non‑zero reward is given when the agent chooses the “stop” action; the reward equals the F1 score computed between the true label vector and the predicted label vector at termination. Consequently, the agent is incentivized to stop as early as possible while still achieving high classification accuracy.
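The dynamics described above can be sketched as a tiny environment. This is an illustrative reconstruction, not the paper's code: `step`, `f1`, and the string-valued actions are hypothetical names, and labels are represented as sets so the terminal F1 reward is easy to compute.

```python
# Minimal sketch of the MDP: a state is (sentence index, assigned labels);
# actions are "next" (read on), "stop" (terminate), or a label to assign.
# The only non-zero reward is the F1 score received on "stop".

def f1(true_labels, predicted_labels):
    """F1 between the true and predicted label sets -- the terminal reward."""
    if not predicted_labels or not true_labels:
        return 0.0
    tp = len(true_labels & predicted_labels)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_labels)
    recall = tp / len(true_labels)
    return 2 * precision * recall / (precision + recall)

def step(state, action, num_sentences, true_labels):
    """Apply one action; returns (next_state, reward, done)."""
    idx, assigned = state
    if action == "next":                       # move to the next sentence
        return (min(idx + 1, num_sentences - 1), assigned), 0.0, False
    if action == "stop":                       # terminate and collect F1
        return state, f1(true_labels, assigned), True
    # any other action assigns a not-yet-assigned label
    return (idx, assigned | {action}), 0.0, False

# Example episode: read one sentence, assign the correct label, then stop.
state = (0, frozenset())
state, r, done = step(state, "next", 5, {"sports"})
state, r, done = step(state, "sports", 5, {"sports"})
state, r, done = step(state, "stop", 5, {"sports"})
```

In the mono-label experiments the episode reduces to reading some prefix of the document, assigning one label, and stopping, which is why early stopping directly saves computation.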
To learn the optimal policy, the authors adopt a reinforcement‑learning framework with a linear function approximator for the Q‑function: Qθ(s,a)=⟨θ,Φ(s,a)⟩, where Φ(s,a) is a feature representation of the state‑action pair. Because the state space is huge (every combination of sentence position and assigned labels), exact Q‑value estimation is infeasible. Instead, they use Monte‑Carlo simulation to identify, for each visited state, the set of “good” actions (those whose rollouts lead to the minimal classification loss) and the remaining “bad” actions. These labeled state‑action pairs are then used to train a binary Support Vector Machine that predicts whether an action is good (+1) or bad (−1). The resulting SVM implicitly defines the Q‑function and therefore the policy: at each step the agent selects the action classified as good with the highest confidence.
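The rollout-based labeling step can be sketched as follows. This is a simplified illustration under stated assumptions: `env_step` is any environment transition function of the form used above, `policy` is the rollout policy, and the names `rollout_return` and `label_actions` are hypothetical. The resulting ±1 labels are what the paper feeds to the binary SVM; the SVM itself is omitted here.

```python
def rollout_return(env_step, state, first_action, policy, max_steps=20):
    """Return of one Monte-Carlo rollout that starts with `first_action`
    and then follows `policy`; only the terminal reward is non-zero."""
    state, reward, done = env_step(state, first_action)
    for _ in range(max_steps):
        if done:
            break
        state, reward, done = env_step(state, policy(state))
    return reward

def label_actions(env_step, state, actions, policy, n_rollouts=3):
    """Label as +1 the actions whose best rollout reaches the top return
    from this state, and as -1 all the others (the SVM's training signal)."""
    best = {a: max(rollout_return(env_step, state, a, policy)
                   for _ in range(n_rollouts))
            for a in actions}
    top = max(best.values())
    return {a: (1 if best[a] == top else -1) for a in actions}

# Toy deterministic environment for illustration: "stop_now" ends with
# reward 1.0, the rollout policy's "stop_late" ends with reward 0.5.
def toy_step(state, action):
    if action == "stop_now":
        return state, 1.0, True
    if action == "stop_late":
        return state, 0.5, True
    return state + 1, 0.0, False       # "read" just advances

labels = label_actions(toy_step, 0, ["stop_now", "read"],
                       policy=lambda s: "stop_late")
# labels == {"stop_now": 1, "read": -1}
```

With a deterministic environment one rollout per action would suffice; multiple rollouts matter when the rollout policy is stochastic.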
The method is evaluated on four standard single‑label corpora: 20 Newsgroups, Reuters‑21578, Ohsumed, and WebKB. Experiments vary the size of the training set (100, 500, 1 000, 5 000 documents) to test performance under data scarcity. The metrics reported are micro‑averaged F1 score and the average proportion of sentences read per document. Results show that with small training sets (≤1 000 documents) the sequential reading model outperforms a strong linear SVM baseline by 3–5 % absolute F1. With larger training sets the performance becomes comparable to the SVM, while the model reads only 30–60 % of the sentences on average. The agent automatically adapts its reading depth: when training data are abundant it tends to stop early, whereas with limited data it reads more sentences to gather sufficient evidence.
The paper’s contributions are threefold: (1) a new MDP formulation of text classification that integrates reading, labeling, and stopping actions; (2) an RL‑based learning algorithm that leverages Monte‑Carlo rollouts and SVM classification to approximate the Q‑function efficiently; (3) empirical evidence that the approach is especially beneficial in low‑resource settings and reduces computational effort in high‑resource settings.
Limitations include the reliance on sentence‑level tf‑idf vectors, which ignore richer contextual information that could be captured by recurrent or transformer‑based encoders. The reward design focuses solely on final F1, ignoring intermediate costs such as processing time or memory, which may be important in real‑world applications. Future work could incorporate deep contextual embeddings for Φ(s,a), design cost‑sensitive reward functions, and extend the framework to multi‑label scenarios with label‑dependency modeling. Overall, the paper demonstrates that treating text classification as a sequential reading problem opens a promising avenue for more efficient and adaptable document understanding systems.