Personalization Effect on Emotion Recognition from Physiological Data: An Investigation of Performance on Different Setups and Classifiers
Varvara Kollia
Abstract—This paper addresses the problem of emotion recognition from physiological signals. Features are extracted and ranked based on their effect on classification accuracy. Different classifiers are compared. Inter-subject variability and the personalization effect are thoroughly investigated through trial-based and subject-based cross-validation. Finally, a personalized model is introduced that allows for enhanced emotional-state prediction based on the physiological data of subjects that exhibit a certain degree of similarity, without requiring further feedback.

Index Terms—cross-validation, feature extraction, emotion recognition, machine learning, personalization, physiological sensors, random forests
1 INTRODUCTION

Emotion recognition is a rapidly growing field today, as automation and personalization have become key elements of most human-computer interaction (HCI) systems. The problem of machine emotional intelligence is broad and multi-faceted; one of its challenges is the very fact that it is hard to define unambiguously. There is no unique definition of emotion, and there is neither a specific method nor a particular required dataset that is guaranteed to capture it. One of the most popular definitions of emotion is that of the six basic emotions by Paul Ekman [1]. The original six emotions he proposed are: anger, disgust, fear, happiness, sadness and surprise. Another very popular approach is the 2-dimensional emotion map, where each emotional state is projected onto the orthogonal axes of valence and arousal [2]. A third dimension can be added to this space with the axis of dominance; see [3] and its related references. The axis of arousal corresponds to the variation in the level of calmness or excitement towards a stimulus. Valence is a measure of the degree of happiness or sadness the subject feels, or of how attracted or repulsed they are by an external factor/event. Finally, dominance represents the level of empowerment. Representative emotional states in the 3D model of valence, arousal, and dominance can be found in [3, 4]. For most of this work, we operate in the 2D valence-arousal space.
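The 2D valence-arousal model described above can be made concrete with a small sketch. This is not from the paper: the quadrant labels and example states are common textbook placements, and the 1-9 rating scale with midpoint 5 is an assumption chosen for illustration.

```python
# Illustrative sketch (not from the paper): mapping a (valence, arousal)
# rating pair onto one of the four quadrants of the 2D emotion map.
def va_quadrant(valence, arousal, midpoint=5.0):
    """Label a (valence, arousal) pair, e.g. on a 1-9 self-assessment scale."""
    if valence >= midpoint and arousal >= midpoint:
        return "high-valence / high-arousal (e.g., excited, happy)"
    if valence >= midpoint:
        return "high-valence / low-arousal (e.g., calm, relaxed)"
    if arousal >= midpoint:
        return "low-valence / high-arousal (e.g., angry, afraid)"
    return "low-valence / low-arousal (e.g., sad, bored)"

print(va_quadrant(7.5, 8.0))  # excited/happy quadrant
print(va_quadrant(2.0, 3.0))  # sad/bored quadrant
```

Adding the dominance axis would extend this to the 3D model by a third threshold in the same fashion.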
2 METHODOLOGY OVERVIEW
2.1 Dataset
The DEAP dataset [5] is one of the few publicly available datasets that aims to recognize the emotional state of the subject from their electroencephalogram (EEG) and peripheral physiological sensors. The experiment took place in a controlled setup and consists of two parts. The first part contains online self-assessment ratings for 120 one-minute extracts of music videos, rated by 14-16 volunteers on arousal, valence and dominance. Using a selection of 40 of those pre-rated music video clips as stimuli, the EEG and other biosensor data of 32 volunteers were recorded, along with partial frontal-face recordings, and the volunteers' ratings on valence, arousal, dominance, liking and familiarity were collected.
32 EEG channels were collected from each of the subjects. In addition, peripheral physiological signals were recorded. Part of the data was preprocessed (downsampled, filtered, reordered, and the artifacts were removed) [6]. From this part, the blood volume pressure (BVP) from the plethysmograph, the galvanic skin response (GSR), the temperature (T), the respiration amplitude (RESP), two electromyograms (Zygomaticus Major EMG and Trapezius EMG), and two electrooculograms (horizontal and vertical EOG) were used. The final dataset consists of the 40-channel physiological recordings of 32 subjects for 40 videos. The self-assessment is in terms of valence, arousal, dominance and liking.
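The structure described above (32 subjects, 40 trials, 40 channels, 4 ratings) can be sketched as follows. The dict keys and array shapes follow the public preprocessed DEAP release (40 trials x 40 channels x 63 s at 128 Hz), but the record here is synthetic rather than read from disk, and the peripheral-channel index is illustrative.

```python
import numpy as np

# Sketch of indexing one subject's DEAP-style preprocessed record.
FS = 128              # sampling rate after downsampling (Hz)
N_TRIALS = 40         # videos per subject
N_CHANNELS = 40       # 32 EEG + 8 peripheral channels
N_SAMPLES = 63 * FS   # 3 s pre-trial baseline + 60 s trial

rng = np.random.default_rng(0)
subject = {
    "data": rng.standard_normal((N_TRIALS, N_CHANNELS, N_SAMPLES)),
    # one row of (valence, arousal, dominance, liking) per trial
    "labels": rng.uniform(1, 9, size=(N_TRIALS, 4)),
}

trial_eeg = subject["data"][0, :32, :]  # 32 EEG channels of the first trial
trial_gsr = subject["data"][0, 36, :]   # one peripheral channel (index is illustrative)
valence = subject["labels"][0, 0]
print(trial_eeg.shape, trial_gsr.shape, valence)
```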
2.2 Problem Setup

The typical block diagram of a classification problem is shown in Figure 1. Starting from the raw data, digital signal processing techniques [7] are usually applied to remove noise and artifacts from the data.

(The author is with Intel Corp., 2200 Mission College Blvd, Santa Clara, CA 95054. E-mail: Varvara.Kollia@intel.com.)
Fig. 1. Block diagram of our classification problem.
Our data have been preconditioned, as explained in [6]. All data were downsampled to 128 Hz, and the 60-sec trials were reordered into video order. Artifacts were removed and the data were prefiltered, as applicable.

For the temporal segmentation step, a uniform window of 1 sec was applied, with no overlap between adjacent windows.
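The segmentation step above can be sketched directly: at 128 Hz, a 60-s trial splits into 60 non-overlapping windows of 128 samples each. The function name is my own; only the window length and sampling rate come from the text.

```python
import numpy as np

# Non-overlapping 1-s segmentation of a single-channel trial.
def segment(signal, fs=128, win_sec=1.0):
    """Split a 1-D signal into consecutive non-overlapping windows."""
    win = int(fs * win_sec)
    n_windows = len(signal) // win           # drop any incomplete trailing window
    return signal[: n_windows * win].reshape(n_windows, win)

trial = np.arange(60 * 128, dtype=float)     # stand-in for one 60-s channel
windows = segment(trial)
print(windows.shape)  # (60, 128)
```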
Feature extraction is used to reduce the size (dimensionality) of the problem and to generate more representative characteristics of the problem that lead to higher classification accuracy. As the features are hand-designed, special atten
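A minimal sketch of hand-designed per-window features of the kind this step produces: each 128-sample window collapses to a short feature vector, reducing dimensionality as described above. The specific feature set here (mean, standard deviation, min, max, mean absolute first difference) is illustrative only; the paper's actual feature list is not reproduced.

```python
import numpy as np

# Reduce each 1-s window to a handful of statistical features.
def window_features(window):
    return np.array([
        window.mean(),
        window.std(),
        window.min(),
        window.max(),
        np.abs(np.diff(window)).mean(),  # average absolute first difference
    ])

rng = np.random.default_rng(1)
windows = rng.standard_normal((60, 128))   # 60 one-second windows of one channel
features = np.apply_along_axis(window_features, 1, windows)
print(features.shape)  # (60, 5): dimensionality drops from 128 to 5 per window
```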