Context-aware Sentiment Word Identification: sentiword2vec
📝 Abstract
Traditional sentiment analysis often uses a sentiment dictionary to extract sentiment information from text and classify documents. However, the informal words and phrases emerging in user-generated content call for context-aware analysis, since such words often take on special meanings in a particular context. Because word vectors perform well at representing inter-word relations, we use sentiment word vectors to identify these special words. Building on the distributed language model word2vec, this paper presents a novel method for representing the sentiment of a word in a particular context; specifically, it identifies words with abnormal sentiment polarity in long answers. Results show that the improved model represents words with special meanings better while still representing special idiomatic patterns well. Finally, we discuss what vector representations mean in the field of sentiment, which may differ from the general object-based case.
📄 Content
Context-aware Sentiment Word Identification: sentiword2vec
Yushi Yao, Guangjian Li
Department of Information Management, Peking University, China
{yaoyushi, ligj}@pku.edu.cn
Keywords: Sentiment Analysis; Distributed Language; Context awareness; Sentiment Word Identification
1 Introduction
Unlike traditional news corpora, user-generated content is linguistically informal. Users constantly create new words and phrases, and some informal words express richer meaning. This challenges traditional sentiment analysis methods, but it also provides a more vivid corpus for exploring how people express sentiment. Exploring such new words requires making deep use of context, especially semantic meaning. Applying distributed language methods to a particular context gives insight into latent meaning and shows advantages in context awareness and analogy. In a special language context, some words take on a meaning different from their usual one. As social media develops, more online communities exhibit this cluster effect. For example, “refugee” is a positive word in discussions within a human rights union but a negative one among real estate holders.
In this paper, we use a model based on word2vec to identify the special words in a particular
context. Unlike research on short information flows such as Twitter, we use long articles,
namely answers to questions posted on social media, as our corpus. Our model provides a method
for probing the latent sentiment tendency in long social articles. After training vectors with
word2vec, we change the vectors of words with known sentiment polarity and train them again
with a controlled number of iterations. The model detects the special words in a particular
context and yields a better vector representation of them.
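The two-stage idea above (train vectors, adjust the vectors of words with known polarity, then update again for a limited number of iterations) can be illustrated with a small toy sketch. This is not the authors' procedure and does not use word2vec: a simple neighbor-averaging step stands in for the retraining, and the function names `inject_polarity` and `propagate`, the toy vectors, and the co-occurrence map are all hypothetical.

```python
def inject_polarity(vectors, seeds, strength=1.0):
    """Append a sentiment coordinate to every vector.

    Seed words (known polarity) get +strength or -strength; all other
    words start at 0. This mimics "changing the vectors of words with
    known sentiment polarity" and is an assumption of this sketch.
    """
    return {w: list(v) + [strength * seeds.get(w, 0)] for w, v in vectors.items()}


def propagate(vectors, neighbors, seeds, iterations=3, rate=0.5):
    """Stand-in for retraining with a controlled number of iterations.

    Each non-seed word moves part of the way toward the mean vector of
    its context neighbors; seed words are held fixed.
    """
    current = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        updated = {}
        for word, vec in current.items():
            context = [current[n] for n in neighbors.get(word, []) if n in current]
            if word in seeds or not context:
                updated[word] = vec
                continue
            mean = [sum(column) / len(context) for column in zip(*context)]
            updated[word] = [(1 - rate) * a + rate * b for a, b in zip(vec, mean)]
        current = updated
    return current


# Toy "pre-trained" vectors and a toy co-occurrence map for one community.
vectors = {
    "good": [1.0, 0.0],
    "bad": [0.0, 1.0],
    "refugee": [0.5, 0.5],
    "burden": [0.2, 0.8],
}
seeds = {"good": 1, "bad": -1}  # words with known polarity
neighbors = {"refugee": ["bad", "burden"], "burden": ["bad"]}

retrained = propagate(inject_polarity(vectors, seeds), neighbors, seeds, iterations=2)
sentiment_of_refugee = retrained["refugee"][-1]  # last coordinate is the sentiment axis
```

In this toy community "refugee" co-occurs with negative words, so its sentiment coordinate drifts below zero after a few iterations, while the seed words keep their assigned polarity. Limiting `iterations` plays the role of controlling how far the known polarities spread.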
2 Literature Review
Because sentiment reflects more latent information in text, the meanings that sentiment words
carry are often context-specific. This makes general sentiment feature processing
problematic. A word or phrase is commonly positive in one context and negative in
another, so considering more contextual information, such as topic or domain, is essential. In this
process, identifying new sentiment words and new meanings of existing sentiment words in context is
important. Additionally, informal words and phrases play an important role in user-generated
content (UGC), but emerging informal words usually do not appear in sentiment
dictionaries. Handling these special words is therefore a key issue.
Studies exploring sentiment features are plentiful, and many of them are based on
N-grams (Cambria, Havasi & Hussain, 2012). In other words, the construction of sentiment
features usually depends on context. However, this is not enough: the latent nature
of sentiment word meaning calls for special processing in identification.
Researchers have spent a lot of time on finding and selecting typical sentiment features.
However, this traditional way of constructing sentiment features makes it difficult to capture
latent sentiment features (Asghar, Khan, Ahmad & Kundi, 2014), especially context-aware
information.
2.1 Lexicon-based Identification
In a particular context, the sentiment polarity of some words may differ from their polarity in
a general context. In forum discussion, an unknown file is neutral, while an unknown politician is
usually negative. In a way, “politician” is regarded as negative in this context. It is difficult for ordinary
sentiment analysis to recognize this special sentiment expression, so special processing is needed
to recognize and predict the rich meaning those words carry.
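As a minimal illustration of the mismatch described here, the sketch below compares a general-purpose lexicon polarity with a context-specific score and flags words whose polarity shifts. The function name `polarity_shifts`, the threshold, and all scores are hypothetical values chosen for illustration; they are not results from the paper.

```python
def polarity_shifts(lexicon, context_scores, threshold=0.3):
    """Return words whose context polarity departs from the lexicon polarity.

    lexicon:        word -> general polarity in [-1, 1]
    context_scores: word -> polarity estimated within one community
    Words missing from the lexicon are treated as neutral (0.0).
    """
    shifts = {}
    for word, context_polarity in context_scores.items():
        general_polarity = lexicon.get(word, 0.0)
        if abs(context_polarity - general_polarity) >= threshold:
            shifts[word] = (general_polarity, context_polarity)
    return shifts


# Illustrative values only: "politician" is neutral in the general lexicon
# but negative in this hypothetical forum context.
general_lexicon = {"file": 0.0, "politician": 0.0, "excellent": 0.9}
forum_scores = {"file": 0.05, "politician": -0.6, "excellent": 0.85}

shifted = polarity_shifts(general_lexicon, forum_scores)
```

Here only "politician" is flagged, because its context score differs from the lexicon value by more than the threshold. How the context scores are obtained is left open in this sketch.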
A common method is to recognize such special lexical items first. A traditional approach starts
from a semantic lexicon such as WordNet to build a conceptual semantic dictionary such as
WordNet-Affect (Strapparava & Valitutti, 2004) or SenticNet (Cambria et al., 2012). Common
furt