Sentiment Analysis of Review Datasets Using Naive Bayes and K-NN Classifier


📝 Abstract

The advent of Web 2.0 has led to an increase in the amount of sentimental content available on the Web. Such content is often found on social media websites in the form of movie or product reviews, user comments, testimonials, messages in discussion forums, etc. Timely discovery of sentimental or opinionated web content has a number of advantages, the most important being monetization. Understanding the sentiments of the public towards different entities and products enables better services for contextual advertising, recommendation systems and analysis of market trends. The focus of our project is a sentiment-focused web crawling framework that facilitates the quick discovery and analysis of sentimental content in movie reviews and hotel reviews. We use statistical methods to capture elements of subjective style and sentence polarity. The paper discusses in detail two supervised machine learning algorithms, K-Nearest Neighbour (K-NN) and Naive Bayes, and compares their overall accuracy, precision and recall. For movie reviews, Naive Bayes gave far better results than K-NN, but for hotel reviews both algorithms gave lower and almost identical accuracies.

📄 Content

Sentiment Analysis of Review Datasets using Naïve Bayes’ and K-NN Classifier

Lopamudra Dey
Department of Computer Science & Engineering
Heritage Institute of Technology
Kolkata, India
Email: lopamudra.dey@heritageit.edu

Sanjay Chakraborty
Department of Computer Science & Engineering
Institute of Engineering & Management
Kolkata, India
Email: sanjay.chakraborty@iemcal.com

Anuraag Biswas
Computer Science & Engineering
Heritage Institute of Technology
Kolkata, India
Email: anuraagbiswas111@gmail.com

Beepa Bose
Computer Science & Engineering
Heritage Institute of Technology
Kolkata, India
Email: beepabose@gmail.com

Sweta Tiwari
Computer Science & Engineering
Heritage Institute of Technology
Kolkata, India
Email: sweta.tiwari604@gmail.com

Abstract: The advent of Web 2.0 has led to an increase in the amount of sentimental content available on the Web. Such content is often found on social media websites in the form of movie or product reviews, user comments, testimonials, messages in discussion forums, etc. Timely discovery of sentimental or opinionated web content has a number of advantages, the most important being monetization. Understanding the sentiments of the public towards different entities and products enables better services for contextual advertising, recommendation systems and analysis of market trends. The focus of our project is a sentiment-focused web crawling framework that facilitates the quick discovery and analysis of sentimental content in movie reviews and hotel reviews. We use statistical methods to capture elements of subjective style and sentence polarity. The paper discusses in detail two supervised machine learning algorithms, K-Nearest Neighbour (K-NN) and Naïve Bayes’, and compares their overall accuracy, precision and recall. For movie reviews, Naïve Bayes’ gave far better results than K-NN, but for hotel reviews both algorithms gave lower and almost identical accuracies.

Index Terms —Sentiment Analysis, Naïve Bayes’, K-NN, Supervised Machine Learning, Text Mining.
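The comparison described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the toy reviews, the whitespace tokenizer, the Laplace smoothing, the cosine similarity measure and the choice of k = 1 are all assumptions made here for the example, and the function names are hypothetical.

```python
import math
from collections import Counter

# Toy labelled reviews invented for this sketch (not the paper's datasets).
TRAIN = [
    ("i loved the movie", "pos"),
    ("a great and moving film", "pos"),
    ("the movie is a flop", "neg"),
    ("boring plot and bad acting", "neg"),
]
TEST = [("loved the film", "pos"), ("a flop with bad acting", "neg")]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple assumption)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_counts, word_counts, vocab = Counter(), {}, set()
    for text, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, text):
    """Pick the class with the highest log prior plus smoothed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_knn(examples, text, k=1):
    """Majority vote among the k training reviews most similar to the query."""
    query = Counter(tokenize(text))
    ranked = sorted(
        examples,
        key=lambda ex: cosine(query, Counter(tokenize(ex[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def evaluate(gold, predicted, positive="pos"):
    """Accuracy overall, precision and recall for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    return {
        "accuracy": sum(g == p for g, p in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


model = train_naive_bayes(TRAIN)
gold = [label for _, label in TEST]
nb_metrics = evaluate(gold, [predict_naive_bayes(model, t) for t, _ in TEST])
knn_metrics = evaluate(gold, [predict_knn(TRAIN, t) for t, _ in TEST])
```

On real review data the two dictionaries `nb_metrics` and `knn_metrics` would normally differ; with a toy set this small they only show how the three measures are computed.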

I. INTRODUCTION

Data mining is the process of extracting valuable information from a large set of data. Several data mining techniques (such as clustering, classification and regression) can be used for the sentiment analysis task [13][14]. Sentiment mining is an important aspect of data mining in which data is mined according to the positive or negative sense it conveys. Sentiment analysis, also known as opinion mining, refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information from source materials. Here the source materials are the opinions, reviews and comments posted on various social networking sites [1]. The sentiment found within comments, feedback or critiques provides useful indicators for many different purposes and can be categorized by polarity [2]. By polarity we mean whether a review is, overall, positive or negative. For example:

  1. Positive sentiment in a subjective sentence: “I loved the movie Mary Kom”. This sentence expresses positive sentiment about the movie Mary Kom, and we can decide that from the sentiment threshold value of the word “loved”. So the word “loved” has a positive numerical threshold value.

  2. Negative sentiment in a subjective sentence: “Phata poster nikla hero is a flop movie”. This sentence expresses negative sentiment about the movie “Phata poster nikla hero”, and we can decide that from the sentiment threshold value of the word “flop”. So the word “flop” has a negative numerical threshold value.

Sentiment analysis is of three different types: document level, sentence level and entity level. However, we are studying phrase-level sentiment analysis. Traditional text mining concentrates on the analysis of facts, whereas opinion mining deals with attitudes [3]. The main fields of research are sentiment classification, feature-based sentiment classification and opinion summarization. The use of sentiment analysis in commercial environments is now growing. This is evident in the increasing number of brand tracking and marke
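The two polarity examples given earlier in this section can be illustrated with a minimal word-score lookup. The sketch below is an assumption-laden illustration rather than the method used in the paper: the lexicon entries, their numerical values and the function name `polarity` are hypothetical, chosen only to mirror the words “loved” and “flop” from the examples.

```python
# Hypothetical sentiment values; the source does not give actual numbers.
LEXICON = {"loved": 1.0, "flop": -1.0}


def polarity(sentence):
    """Sum the per-word sentiment values and report the overall sign."""
    words = [w.strip('.,!?"“”').lower() for w in sentence.split()]
    score = sum(LEXICON.get(w, 0.0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("I loved the movie Mary Kom"))
print(polarity("Phata poster nikla hero is a flop movie"))
```

Words missing from the lexicon contribute zero, so a sentence with no scored words is reported as neutral in this sketch.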

This content is AI-processed based on ArXiv data.
