Abstract
A new architecture for intelligent
audio emotion recognition is proposed in this paper. It fully utilizes
both prosodic and spectral features in its design. It has two main
parallel paths and can recognize six emotions. Path 1 is designed based on
an intensive analysis of different prosodic features. Significant prosodic
features are identified to differentiate emotions. Path 2 is designed
based on an analysis of spectral features. Extraction of
Mel-Frequency Cepstral Coefficient (MFCC) features is followed by
Bi-directional Principal Component Analysis (BDPCA), Linear Discriminant
Analysis (LDA) and Radial Basis Function (RBF) neural network classification.
This path consists of three parallel BDPCA + LDA + RBF sub-paths, each of
which handles two emotions. Fusion modules are also proposed for weight
assignment and decision making. The performance of the proposed
architecture is evaluated on the eNTERFACE’05 and RML databases. Simulation
results and comparisons show good performance of the proposed
recognizer.
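
To make the BDPCA stage of Path 2 more concrete, the following is a minimal
sketch of a bi-directional PCA projection applied to 2D feature matrices such
as MFCC matrices (frames by coefficients). It is an illustration under stated
assumptions, not the implementation used in this paper: the function names
`bdpca_fit` and `bdpca_transform`, the mean-centering step, the matrix sizes,
and the numbers of retained components are all hypothetical choices. The LDA
and RBF stages and the fusion modules are not shown.

```python
import numpy as np


def bdpca_fit(samples, k_row, k_col):
    """Fit row and column projections from 2D feature matrices.

    samples: array of shape (n, m, d), e.g. n utterances, each an
             m-by-d MFCC matrix (assumed layout, for illustration only).
    Returns the mean matrix, the column projection (m, k_col) and the
    row projection (d, k_row).
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    n = len(samples)
    # Row-direction scatter: average of X^T X over samples, shape (d, d).
    s_row = np.einsum("nij,nik->jk", centered, centered) / n
    # Column-direction scatter: average of X X^T over samples, shape (m, m).
    s_col = np.einsum("nij,nkj->ik", centered, centered) / n
    # eigh returns eigenvalues in ascending order, so reverse the
    # eigenvector columns to put the leading directions first.
    _, v_row = np.linalg.eigh(s_row)
    _, v_col = np.linalg.eigh(s_col)
    w_row = v_row[:, ::-1][:, :k_row]
    w_col = v_col[:, ::-1][:, :k_col]
    return mean, w_col, w_row


def bdpca_transform(x, mean, w_col, w_row):
    """Project one m-by-d matrix to a k_col-by-k_row feature matrix."""
    return w_col.T @ (x - mean) @ w_row


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 20 matrices of 12 frames by 13 coefficients.
    data = rng.normal(size=(20, 12, 13))
    mean, w_col, w_row = bdpca_fit(data, k_row=5, k_col=4)
    reduced = bdpca_transform(data[0], mean, w_col, w_row)
    print(reduced.shape)
```

In a pipeline of the kind described above, the reduced matrix would be
flattened and passed to LDA and then to an RBF classifier, once per sub-path.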