Sentiment Analysis of Review Datasets Using Naive Bayes and K-NN Classifier
📝 Abstract
The advent of Web 2.0 has led to an increase in the amount of sentimental content available on the Web. Such content is often found on social media websites in the form of movie or product reviews, user comments, testimonials, messages in discussion forums, etc. Timely discovery of sentimental or opinionated web content has a number of advantages, the most important being monetization. Understanding the sentiments of the public towards different entities and products enables better services for contextual advertising, recommendation systems and analysis of market trends. The focus of our project is a sentiment-focused web crawling framework that facilitates the quick discovery of sentimental content in movie reviews and hotel reviews, and the analysis of that content. We use statistical methods to capture elements of subjective style and sentence polarity. The paper discusses in detail two supervised machine learning algorithms, K-Nearest Neighbour (K-NN) and Naive Bayes, and compares their overall accuracy, precision and recall. For movie reviews, Naive Bayes gave far better results than K-NN, whereas for hotel reviews both algorithms gave lower and nearly identical accuracies.
📄 Content
Sentiment Analysis of Review Datasets using Naïve Bayes’ and K-NN Classifier
Lopamudra Dey
Department of Computer Science & Engineering
Heritage Institute of Technology
Kolkata, India
Email: lopamudra.dey@heritageit.edu
Sanjay Chakraborty
Department of Computer Science & Engineering
Institute of Engineering & Management
Kolkata, India
Email: sanjay.chakraborty@iemcal.com
Anuraag Biswas
Computer Science & Engineering
Heritage Institute of Technology
Kolkata, India
Email: anuraagbiswas111@gmail.com
Beepa Bose
Computer Science & Engineering
Heritage Institute of Technology
Kolkata, India
Email: beepabose@gmail.com
Sweta Tiwari
Computer Science & Engineering
Heritage Institute of Technology
Kolkata, India
Email: sweta.tiwari604@gmail.com
Abstract—The advent of Web 2.0 has led to an increase in
the amount of sentimental content available on the Web.
Such content is often found on social media websites in the
form of movie or product reviews, user comments,
testimonials, messages in discussion forums, etc. Timely
discovery of sentimental or opinionated web content has
a number of advantages, the most important being
monetization. Understanding the sentiments of the public
towards different entities and products enables better
services for contextual advertising, recommendation
systems and analysis of market trends. The focus of our
project is a sentiment-focused web crawling framework that
facilitates the quick discovery of sentimental content in
movie reviews and hotel reviews, and the analysis of that
content. We use statistical methods to capture elements of
subjective style and sentence polarity. The paper discusses
in detail two supervised machine learning algorithms,
K-Nearest Neighbour (K-NN) and Naïve Bayes’, and compares
their overall accuracy, precision and recall. For movie
reviews, Naïve Bayes’ gave far better results than K-NN,
whereas for hotel reviews both algorithms gave lower and
nearly identical accuracies.
Index Terms —Sentiment Analysis, Naïve Bayes’, K-NN, Supervised Machine Learning, Text Mining.
I. INTRODUCTION
Data mining is the process of extracting valuable
information from a large set of data. Several data mining
techniques (such as clustering, classification and
regression) can be used for the sentiment analysis task
[13][14]. Sentiment mining is an important aspect of data
mining in which useful information is extracted based on
the positive or negative sense of the collected data.
Sentiment Analysis, also known as Opinion Mining, refers to
the use of natural language processing, text analysis and
computational linguistics to identify and extract
subjective information in source materials.
Here, the source materials are the opinions, reviews and
comments posted on various social networking sites [1].
The sentiment found within comments, feedback or critiques
provides useful indicators for many different purposes and
can be categorized by polarity [2]. By polarity we mean
whether a review is, overall, positive or negative. For
example:
Positive sentiment in a subjective sentence: “I loved the movie Mary Kom”. This sentence expresses positive sentiment about the movie Mary Kom, and we can decide that from the sentiment threshold value of the word “loved”, which is a positive number.
Negative sentiment in a subjective sentence: “Phata poster nikla hero is a flop movie”. This sentence expresses negative sentiment about the movie named “Phata poster nikla hero”, and we can decide that from the sentiment threshold value of the word “flop”, which is a negative number.
Sentiment analysis is carried out at three levels: document level, sentence level and entity level. In this work, however, we study phrase-level sentiment analysis. Traditional text mining concentrates on the analysis of facts, whereas opinion mining deals with attitudes [3]. The main fields of research are sentiment classification, feature-based sentiment classification and opinion summarization. The use of sentiment analysis in commercial environments is now growing. This is evident in the increasing number of brand tracking and marke
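The two polarity examples given earlier in this section (“loved” and “flop”) can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon with made-up values of +1.0 and -1.0; the actual threshold values used in this work are not listed in this section, and the function name `sentence_polarity` is hypothetical. This is a plain word-lookup heuristic, not the Naïve Bayes’ or K-NN classifiers that the paper evaluates.

```python
# Hypothetical sentiment values for illustration only; the real
# threshold values are not given in this section.
SENTIMENT_VALUES = {"loved": 1.0, "flop": -1.0}


def sentence_polarity(sentence, lexicon=SENTIMENT_VALUES):
    """Sum the sentiment values of known words and map the total to a label."""
    # Lower-case each token and strip surrounding punctuation and quotes.
    words = (w.strip('.,!?"“”').lower() for w in sentence.split())
    # Words missing from the lexicon contribute nothing to the score.
    score = sum(lexicon.get(w, 0.0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentence_polarity("I loved the movie Mary Kom"))               # positive
print(sentence_polarity("Phata poster nikla hero is a flop movie"))  # negative
```

A sentence containing no lexicon words is labelled neutral here, which is a design choice of the sketch rather than something stated in the text.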