Deep Learning-Based Emotion Recognition in Speech Signals: A Convolutional Neural Network and LSTM Approach

Authors:
A. Anushya, Sabiha Begum, Savita Shiwani, Ayush Shrivastava

Addresses:
Department of Artificial Intelligence and Data Science, College of Computer Science and Engineering, University of Hail, Hail, Kingdom of Saudi Arabia; Department of Computer Science and Engineering, College of Computer Science and Engineering, University of Hail, Hail, Kingdom of Saudi Arabia; Department of Computer Science and Engineering, Poornima University, Jaipur, India; Department of Data Science Engineering, Aadhar Housing Finance Ltd. (owned by Blackstone, US), Mumbai, India.

Abstract:

This study presents a hybrid deep learning model that combines a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network for predicting emotions from voice signals. The design addresses the challenges of speech emotion recognition by exploiting the strengths of both neural architectures. The CNN serves as a feature extractor, learning abstract representations directly from raw audio waveforms. It identifies salient spectral and temporal patterns, eliminating the manual feature engineering that was a bottleneck in earlier methods. The extracted features are then passed to the LSTM network, which handles sequential data and temporal dependencies well: its gating mechanism retains contextual information, allowing it to model the dynamics of speech and the emotions they convey. The Toronto Emotional Speech Set (TESS) was used to train and evaluate the proposed model, which achieves a classification accuracy of 99.29%. This high accuracy reflects the strength of the model's architecture and its ability to generalize across the emotional states in the dataset. The findings have implications for many applications: affective computing becomes more personalized and responsive, emotion detection in speech recognition systems improves context awareness and response tailoring, and emotional understanding makes human-computer interaction more natural. The results show that CNNs and LSTMs can jointly capture the spatial and temporal structure of voice data, improving emotion recognition. The strong performance on a challenging dataset such as TESS suggests that this approach can be applied in real-world scenarios and can serve as a basis for further advances.
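The CNN-then-LSTM pipeline described above can be sketched in Keras. This is a minimal illustration, not the authors' exact architecture: the layer counts, filter sizes, and the input shape (`N_FRAMES` frames of `N_FEATURES` features per utterance) are assumptions, since the abstract does not specify hyperparameters. The seven output classes correspond to the seven emotion categories in TESS.

```python
# Hypothetical sketch of a CNN-LSTM speech emotion model; hyperparameters are
# illustrative assumptions, not the values reported in the paper.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

N_FRAMES, N_FEATURES, N_EMOTIONS = 200, 40, 7  # assumed input/output sizes

model = keras.Sequential([
    layers.Input(shape=(N_FRAMES, N_FEATURES)),
    # CNN stage: learns local spectral/temporal patterns,
    # replacing hand-crafted feature engineering
    layers.Conv1D(64, kernel_size=5, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, kernel_size=5, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    # LSTM stage: gated units retain context across the feature sequence
    layers.LSTM(128),
    layers.Dropout(0.3),
    # Softmax over the seven TESS emotion categories
    layers.Dense(N_EMOTIONS, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Forward pass on dummy data to verify the shape flow
dummy = np.random.rand(2, N_FRAMES, N_FEATURES).astype("float32")
probs = model.predict(dummy, verbose=0)
```

A real pipeline would precede this with feature extraction from the TESS audio files (e.g. framing the waveform into a fixed-length sequence of acoustic features) before feeding batches to `model.fit`.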

Keywords: Speech Emotion Prediction; Deep Learning; Keras Library; Pipeline for Certain Emotional States; Human-Computer Interaction; Affective Computing; Toronto Emotional Speech Set (TESS).

Received on: 20/04/2024, Revised on: 01/07/2024, Accepted on: 25/08/2024, Published on: 14/12/2024

AVE Trends in Intelligent Computing Systems, 2024 Vol. 1 No. 4, Pages: 198-208
