Context-aware Sentiment Word Identification: sentiword2vec


📝 Abstract

Traditional sentiment analysis often uses a sentiment dictionary to extract sentiment information from text and classify documents. However, the informal words and phrases emerging in user-generated content call for context-aware analysis, since such words usually carry special meanings in a particular context. Because word vectors perform well at representing inter-word relations, we use sentiment word vectors to identify these special words. Based on the distributed language model word2vec, this paper presents a novel method for representing the sentiment of words in a particular context, specifically to identify words with abnormal sentiment polarity in long answers. Results show that the improved model better represents words with special meanings while continuing to represent special idiomatic patterns well. Finally, we discuss what vector representations mean in the field of sentiment, which may differ from general object-based settings.

📄 Content

Yushi Yao, Guangjian Li
Department of Information Management, Peking University, China
{yaoyushi, ligj}@pku.edu.cn

Keywords: Sentiment Analysis; Distributed Language; Context awareness; Sentiment Word Identification

1 Introduction

Unlike traditional news corpora, user-generated content is linguistically informal. Users constantly create new words and phrases, and some informal words carry additional meaning. This challenges traditional sentiment analysis methods, while also providing a more vivid corpus for exploring how people express sentiment. Exploring such new words requires making deep use of context, especially semantic meaning. Using distributed language methods in a particular context provides insight into latent language meaning and shows advantages in context awareness and analogy. Sometimes, in a special language context, words take on meanings different from those in a normal environment. As social media develops, more online communities show a cluster effect. For example, “refugee” is a positive word in discussions within a human rights union but a negative one among real estate holders.
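The “refugee” example can be made concrete with a small sketch. It is not the method of this paper: it assumes one embedding table per community and scores a word by its cosine similarity to a positive seed word minus its similarity to a negative one. The function names, the seed words, and the hand-made 2-D vectors are all hypothetical and only illustrate how the same word can land on opposite sides in two contexts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def polarity(word, vectors, positive_seeds, negative_seeds):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    pos = sum(cosine(vectors[word], vectors[s]) for s in positive_seeds)
    neg = sum(cosine(vectors[word], vectors[s]) for s in negative_seeds)
    return pos / len(positive_seeds) - neg / len(negative_seeds)


# Toy, hand-made vectors (an assumption for illustration only); in practice
# each table would come from embeddings trained on one community's text.
rights_forum = {"good": [1.0, 0.0], "bad": [0.0, 1.0], "refugee": [0.9, 0.2]}
property_forum = {"good": [1.0, 0.0], "bad": [0.0, 1.0], "refugee": [0.1, 0.8]}

score_rights = polarity("refugee", rights_forum, ["good"], ["bad"])
score_property = polarity("refugee", property_forum, ["good"], ["bad"])

print(f"human rights forum: {score_rights:+.2f}")
print(f"real estate forum:  {score_property:+.2f}")
```

With these toy vectors the score is positive in the first table and negative in the second, which is the kind of context-dependent polarity shift the paper sets out to detect.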

In this paper, we use a model based on word2vec to find the special words in a particular context. Unlike research on short information flows such as Twitter, we use long articles, namely answers to questions posted on social media, as the corpus. With our model, we provide a method for probing the latent sentiment tendency in long social articles. After training vectors with word2vec, we change the vectors of words with known sentiment polarity and train them again with a controlled number of iterations. The model detects the special words in a particular context and yields a better vector representation of them.

2 Literature Review

As sentiment reflects more latent information in text, the meanings that sentiment words carry are often context-specific. This makes general sentiment feature processing problematic. It is common for a word or phrase to be positive in one context and negative in another, so considering more contextual information, such as topic or domain, is essential. In this process, identifying new sentiment words and new meanings of existing sentiment words in context is important. Additionally, informal words and phrases play an important role in user-generated content (UGC), but emerging informal words usually do not appear in sentiment dictionaries, so handling these special words is a key issue. Although studies exploring sentiment features are plentiful, many of them are based on N-grams (Cambria, Havasi & Hussain, 2012). In other words, the construction of sentiment features usually shows some dependency on context. However, this is not enough: the latent nature of sentiment word meaning calls for special processing in identification.
Researchers have spent a lot of time finding and selecting typical sentiment features. However, this traditional way of constructing sentiment features makes it difficult to go further into latent sentiment features (Asghar, Khan, Ahmad & Kundi, 2014), especially context-aware information.

2.1 Lexicon-based Identification

In a particular context, the sentiment polarity of some words may differ from their polarity in a general context. In forum discussion, an unknown file is neutral, while an unknown politician is usually negative. In a way, “politician” is regarded as negative in this context. It is difficult for ordinary sentiment analysis to recognize this kind of special sentiment expression, so special processing is needed to recognize and predict the rich meaning those words contain.

Recognizing such special lexicon first is a common method. A traditional approach starts from a semantic lexicon such as WordNet to build a conceptual semantic dictionary such as WordNet-Affect (Strapparava & Valitutti, 2004) or SenticNet (Cambria et al., 2012). Common furt

This content is AI-processed based on ArXiv data.
