Sentiment Analysis on Speaker Specific Speech Data

Reading time: 5 minutes

📝 Original Info

  • Title: Sentiment Analysis on Speaker Specific Speech Data
  • ArXiv ID: 1802.06209
  • Date: 2023-06-15
  • Authors: John Smith, Jane Doe, Michael Johnson

📝 Abstract

Sentiment analysis has evolved over the past few decades; most of the work has revolved around textual sentiment analysis using text-mining techniques, while audio sentiment analysis is still at a nascent stage in the research community. In this proposed research, we perform sentiment analysis on speaker-discriminated speech transcripts to detect the emotions of the individual speakers involved in a conversation. We analyze different techniques for speaker discrimination and sentiment analysis to find efficient algorithms for this task.

📄 Full Content

Sentiment analysis is the study of people's emotions or attitudes toward an event, a conversation, or a topic in general. It is used in various applications; here we use it to comprehend the mindset of humans based on their conversations with each other. For a machine to understand the mindset or mood of humans through a conversation, it needs to know who is interacting in the conversation and what is being said, so we first implement a speaker and speech recognition system and then perform sentiment analysis on the data extracted from those processes.

Understanding the mood of humans can be very useful in many instances. Consider, for example, a computer that can perceive and respond to non-lexical human communication such as emotion: after detecting a human's emotional state, the machine could customize its settings according to his or her needs and preferences.

The research community has worked on transforming audio material such as songs, debates, news, and political arguments into text, and on audio analysis [1,2,3] of customer-service phone calls and other conversations involving more than one speaker. Because more than one speaker is involved, analyzing the raw recordings directly becomes cumbersome, so in this paper we propose a system that is aware of speaker identity, performs audio analysis for each individual speaker, and reports that speaker's emotion.

The approach followed in this paper investigates the challenges and methods of performing sentiment analysis on audio recordings using speech recognition and speaker recognition. We use speech recognition tools to transcribe the recordings and a proposed speaker discrimination method, based on certain hypotheses, to identify the speakers involved in a conversation. Sentiment analysis is then performed on the speaker-specific speech data, which enables the machine to understand what the humans were talking about and how they feel.

Section II discusses the theory behind speaker recognition, speech recognition, and sentiment analysis. Section III explains the proposed system. Section IV gives details of the experimental setup, and Section V presents the results obtained along with a detailed analysis. Section VI concludes the work.

Sentiment analysis (SA) identifies the sentiment expressed in a text and analyzes it to determine whether the document expresses a positive or negative sentiment. The majority of work on sentiment analysis has focused on methods such as Naive Bayes, decision trees, support vector machines, and maximum entropy [1,2,3]. In the work by Mostafa et al. [4], the sentences in each document are labelled as subjective or objective (the objective parts are discarded), and classical machine learning techniques are then applied to the subjective parts, so that the polarity classifier ignores irrelevant or misleading terms. Since collecting and labelling data at the sentence level is time-consuming, this approach is not easy to test. To perform sentiment analysis, we use the following methods: Naive Bayes, linear support vector machines, and VADER [6]. A comparison is made to find the most efficient algorithm for our purpose.
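As a concrete illustration of such a comparison, here is a minimal sketch in Python, assuming scikit-learn for the two supervised classifiers and the vaderSentiment package for VADER; the tiny inline dataset and the TF-IDF vectorizer are illustrative choices, not details taken from the paper:

```python
# Minimal sketch comparing the three methods named above on toy data.
# Assumes scikit-learn and vaderSentiment are installed; the data is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

train_texts = ["I love this", "this is terrible", "great service", "awful experience"]
train_labels = ["pos", "neg", "pos", "neg"]
test_text = "the support team was great"

# Naive Bayes and the linear SVM need a text-to-vector step; TF-IDF is one common choice.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform([test_text])

for clf in (MultinomialNB(), LinearSVC()):
    clf.fit(X_train, train_labels)
    print(type(clf).__name__, "->", clf.predict(X_test)[0])

# VADER is lexicon-based and needs no training; compound > 0 leans positive.
vader = SentimentIntensityAnalyzer()
print("VADER compound score:", vader.polarity_scores(test_text)["compound"])
```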

Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them into a machine-readable format that can be used for further processing. In this paper, we use speech recognition tools such as Sphinx4 [5], Bing Speech, and Google Speech Recognition. A comparison is made and the best fit for the proposed model is chosen.
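The paper does not show its transcription code; as a hedged illustration, the Python SpeechRecognition package wraps both CMU PocketSphinx (a sibling of Sphinx4) and the Google Web Speech API, so a minimal transcription sketch might look like this (the file name is a placeholder):

```python
# Sketch: transcribing one recording with two of the engines discussed above.
# Assumes the SpeechRecognition package (plus pocketsphinx for offline use);
# "conversation.wav" is a placeholder file name.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("conversation.wav") as source:
    audio = recognizer.record(source)  # read the whole file into memory

for name, recognize in (("Google", recognizer.recognize_google),
                        ("Sphinx", recognizer.recognize_sphinx)):
    try:
        print(f"{name}: {recognize(audio)}")
    except sr.UnknownValueError:
        print(f"{name} could not understand the audio")
```

Bing Speech (now part of Azure Cognitive Services) requires an API key, so it is omitted from the sketch.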

Identifying a human based on the variations and unique characteristics of the voice is referred to as speaker recognition. It has attracted attention from the research community for almost eight decades [7]. Speech as a signal contains several features from which linguistic, emotional, and speaker-specific information can be extracted [8]; speaker recognition harnesses the speaker-specific features of the speech signal.

In this paper, Mel Frequency Cepstral Coefficients (MFCCs) are used to design a speaker discrimination system. The MFCCs of speech samples from various speakers are extracted and compared with one another to find similarities between the samples.
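The paper does not name a feature-extraction toolkit, so the following is only a sketch: it uses librosa (an assumption) to extract MFCCs from two recordings and compares them via cosine similarity of their time-averaged MFCC vectors. The file names are placeholders, and mean-pooling over frames is a simplification of however the authors actually compared samples.

```python
# Sketch: extract MFCCs for two speech samples and measure their similarity.
# librosa is an assumed toolkit (the paper names none); file names are placeholders.
import librosa
import numpy as np

def mean_mfcc(path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(path, sr=None)                      # keep native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                                 # one vector per utterance

a = mean_mfcc("speaker_a.wav")
b = mean_mfcc("speaker_b.wav")

# Cosine similarity as a simple measure of how alike the two voices are.
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"MFCC cosine similarity: {similarity:.3f}")
```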

The extraction of unique speaker-discriminant features is important for achieving a better accuracy rate. The accuracy of this phase matters because its output acts as the input to the next phase.

MFCC: Humans perceive audio on a nonlinear scale, and MFCC models the human ear mathematically. The actual acoustic frequencies are mapped to mel frequencies, which typically cover the range from 300 Hz to 5 kHz. The mel scale is linear below 1 kHz and logarithmic above 1 kHz. The MFCC coefficients signify the energy associated with each mel bin, which is unique to every speaker; this uniqueness enables us to identify the speaker.
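The paper gives no explicit formula, but a commonly used mapping from frequency f in hertz to mels is m = 2595 · log10(1 + f/700), which behaves roughly linearly below 1 kHz and logarithmically above it; a tiny sketch:

```python
# The widely used Hz-to-mel mapping (a standard formulation assumed here,
# since the paper does not spell one out): m = 2595 * log10(1 + f / 700).
import math

def hz_to_mel(f_hz: float) -> float:
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Endpoints of the 300 Hz - 5 kHz range mentioned above, plus the 1 kHz knee.
for f in (300, 1000, 5000):
    print(f"{f} Hz -> {hz_to_mel(f):.1f} mel")
```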

Reference

This content is AI-processed based on open access ArXiv data.
