Sentiment Analysis of Comments on Rohingya Movement with Support Vector Machine


The Rohingya Movement and Crisis caused a huge uproar in the political and economic state of Bangladesh. Refugee movement is a recurring event, and a large amount of opinion data remains on social media such as Facebook, with very little analysis done on it. To analyse the comments on Rohingya-related posts, we built and tuned a classifier based on the Support Vector Machine algorithm. The code is implemented in Python and uses the scikit-learn library. No dataset on Rohingya sentiment is currently available, so we used our own dataset of 2,500 positive and 2,500 negative comments. We specifically used a support vector machine with a linear kernel. We had previously run an experiment on the same dataset using the Naive Bayes algorithm, but it did not yield impressive results.


💡 Research Summary

The paper presents a case study on sentiment analysis of public opinion regarding the Rohingya refugee crisis in Bangladesh, using a Support Vector Machine (SVM) classifier built from scratch. Because no publicly available Rohingya‑related sentiment dataset exists, the authors collected and manually labeled 5,000 Facebook comments—2,500 positive (approval of refugee admission) and 2,500 negative (disapproval). The study is motivated by the need for quantitative tools that can help policymakers and researchers gauge the highly politicised sentiment surrounding the crisis.

The methodology follows a conventional NLP pipeline but is tailored to the idiosyncrasies of social‑media text. Pre‑processing steps include lower‑casing, URL and username normalization, hashtag handling (removing the ‘#’ symbol), reduction of character repetitions, emoticon mapping to explicit sentiment tokens, and stemming with the Porter algorithm. Notably, when an emoticon is detected the system bypasses the SVM and directly assigns a sentiment label, reflecting a pragmatic shortcut for obvious affective cues.
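The pre-processing steps above can be sketched as follows. The `EMOTICONS` lexicon, the placeholder tokens (`URL`, `USER`, `EMO_POS`, `EMO_NEG`), and the step ordering are illustrative assumptions, not the paper's exact rules; the Porter-stemming step (e.g. via NLTK's `PorterStemmer`) is noted but omitted to keep the sketch dependency-free.

```python
import re

# Hypothetical emoticon lexicon; the paper maps emoticons to explicit
# sentiment tokens (the exact lexicon is not given).
EMOTICONS = {":)": "EMO_POS", ":-)": "EMO_POS", ":(": "EMO_NEG", ":-(": "EMO_NEG"}

def preprocess(comment: str) -> str:
    text = comment.lower()                                # lower-casing
    text = re.sub(r"https?://\S+|www\.\S+", "URL", text)  # URL normalisation
    text = re.sub(r"@\w+", "USER", text)                  # username normalisation
    text = text.replace("#", "")                          # drop '#', keep the tag word
    for emo, token in EMOTICONS.items():                  # emoticon -> sentiment token
        text = text.replace(emo, f" {token} ")
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)           # squeeze 3+ repeated chars to 2
    return re.sub(r"\s+", " ", text).strip()
    # The paper then applies Porter stemming to the remaining tokens.

print(preprocess("Check http://example.com @user #Rohingya sooooo sad :("))
```

Note that once an emoticon is detected, the paper's system assigns the sentiment directly rather than passing the comment to the SVM; the sketch above only performs the token mapping.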

Feature extraction employs scikit‑learn’s TfidfVectorizer with the following settings: min_df = 5, max_df = 0.95, sublinear_tf = True, use_idf = True, and ngram_range = (1, 2). This configuration captures both unigrams and bigrams, allowing the model to consider limited context while keeping the feature space manageable. The classifier itself is a LinearSVC with a regularization parameter C = 0.1, chosen after a brief hyper‑parameter search. Linear kernels were preferred for their computational efficiency on high‑dimensional sparse data and to avoid over‑fitting given the modest dataset size.
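A minimal sketch of the reported configuration, assuming a toy stand-in corpus (the 5,000 labelled Facebook comments are not public). `min_df` is lowered from the paper's 5 so the toy vocabulary survives filtering; all other vectorizer and classifier settings match those reported.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus with deliberately distinct vocabularies per class.
texts = [
    "welcome refugees kindness", "kindness shelter welcome",
    "shelter refugees kindness", "welcome shelter refugees",
    "burden threat trouble", "trouble burden threat",
    "threat trouble burden", "burden trouble threat",
]
labels = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]

# Paper's reported settings, except min_df (5 in the paper) lowered for the toy data.
vectorizer = TfidfVectorizer(min_df=1, max_df=0.95, sublinear_tf=True,
                             use_idf=True, ngram_range=(1, 2))
model = make_pipeline(vectorizer, LinearSVC(C=0.1))  # linear kernel, C = 0.1
model.fit(texts, labels)

print(model.predict(["welcome refugees kindness"]))
```

Because the feature matrix is high-dimensional and sparse, `LinearSVC` (which solves the linear SVM directly) is markedly faster here than `SVC(kernel="linear")`, which is consistent with the authors' efficiency argument.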

Evaluation uses a 20 % hold‑out test set. The SVM achieves an overall accuracy of 79 %, with class‑wise precision/recall/F1 scores of 0.78/0.79/0.78 for the negative class and 0.81/0.81/0.81 for the positive class. These results are presented in a confusion matrix‑style table and visualized with bar charts. For comparison, a Naïve Bayes classifier using only unigrams was also trained on the same data, yielding a markedly lower accuracy of 67 %. The authors attribute the performance gap to the inability of the Naïve Bayes model to capture the nuanced, context‑dependent language typical of political discourse.
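The hold-out evaluation can be sketched with scikit-learn's `train_test_split` and `classification_report`; the toy corpus and class names below are illustrative stand-ins for the authors' data, not a reproduction of their results.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus (the labelled Rohingya comments are not public).
texts = ["we welcome the refugees"] * 10 + ["they are a burden on us"] * 10
labels = ["positive"] * 10 + ["negative"] * 10

# 80/20 stratified hold-out split, mirroring the paper's evaluation setup.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC(C=0.1))
model.fit(X_tr, y_tr)
preds = model.predict(X_te)

acc = accuracy_score(y_te, preds)
# Per-class precision/recall/F1, as reported in the paper's results table.
report = classification_report(y_te, preds, output_dict=True)
print(acc, report["positive"]["f1-score"])
```

On the real dataset this procedure produced the 79 % accuracy and the per-class scores quoted above; the toy corpus here is trivially separable, so its numbers carry no meaning beyond demonstrating the pipeline.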

The related‑work section surveys a broad spectrum of sentiment‑analysis literature, ranging from early English product‑review studies to recent Bengali‑language efforts involving SVM, Maximum Entropy, word embeddings, and deep learning. The authors position their contribution as the first to provide a labeled Rohingya‑specific corpus and to demonstrate that a relatively simple linear SVM can outperform a baseline Naïve Bayes approach on this domain.

In the discussion, the authors acknowledge several limitations: the dataset is relatively small, the binary labeling scheme excludes neutral or mixed sentiments, and the model ignores word order and deeper syntactic structures, which can be crucial for disambiguating politically charged statements. They also note that while non‑linear kernels (e.g., RBF) have shown superior performance on movie‑review datasets, the linear kernel sufficed for their task.

Future work is outlined as follows: (1) expanding the corpus in size and linguistic diversity (including multilingual comments), (2) incorporating contextual embeddings such as BERT or RoBERTa to capture richer semantics, (3) exploring hybrid models that combine handcrafted TF‑IDF features with deep‑learning representations, and (4) extending the label set to include neutral or multi‑class sentiment categories. The authors argue that these enhancements would make the system more robust for real‑time monitoring of public opinion, thereby providing valuable insights for governmental decision‑making and academic research on conflict‑related sentiment dynamics.

