Decoding Emotions in Speech: A Deep Learning Approach Using Convolutional Neural Networks
Abstract
This work proposes an emotion identification system based on deep learning. The research advances human-computer interaction, mental health monitoring, market research, and sentiment analysis, and underscores the suitability of neural networks for these tasks. The goal is to construct a model that learns from raw speech audio. The model is built from CNNs and LSTMs: a feature extraction block applies CNNs to MFCC features, and a subsequent classification block uses LSTM units to capture long-term temporal correlations, allowing the classifier to represent the data accurately. Data preparation for training and testing the CREMA-D-based emotion prediction model is substantial: to optimize neural network performance, the pipeline includes feature-label separation, label encoding, dataset splitting, standardization, and data reshaping. To reduce complexity, data augmentation is deliberately omitted. The model is trained and evaluated on CREMA-D, a dataset of 7,442 voice recordings representing different emotions, performed by 84 male and 43 female actors aged 20 to 74. The model achieves an average validation accuracy of 86.92%. Future research may focus on developing real-time emotion identification applications and on integrating multimodal data to enhance the accuracy and robustness of emotion detection systems.
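The data-preparation steps named in the abstract (feature-label separation, encoding, dataset splitting, standardization, and reshaping) might be sketched as follows. This is a minimal illustration using synthetic placeholder features in place of real MFCCs extracted from CREMA-D; the array sizes and the six-class emotion set are assumptions for the example, not details from the paper.

```python
import numpy as np

# Hypothetical placeholder data: 100 clips, each summarized by 40 MFCC
# coefficients, with integer labels for 6 emotion classes (as in CREMA-D).
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 40))
labels = rng.integers(0, 6, size=100)

# 1. Encode the integer labels as one-hot vectors.
one_hot = np.eye(6)[labels]

# 2. Split into training and validation sets (80/20, shuffled).
idx = rng.permutation(len(features))
split = int(0.8 * len(features))
train_idx, val_idx = idx[:split], idx[split:]
X_train, X_val = features[train_idx], features[val_idx]
y_train, y_val = one_hot[train_idx], one_hot[val_idx]

# 3. Standardize using statistics from the training split only,
#    so no information leaks from the validation set.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mean) / std
X_val = (X_val - mean) / std

# 4. Reshape to (samples, timesteps, channels) as expected by
#    Conv1D/LSTM input layers.
X_train = X_train[..., np.newaxis]
X_val = X_val[..., np.newaxis]

print(X_train.shape, X_val.shape)  # (80, 40, 1) (20, 40, 1)
```

Standardizing with training-set statistics and reshaping to a channels-last 3-D tensor are the conventional choices when feeding MFCC features into a CNN-LSTM stack.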
Keywords: Automation, Speech Emotion Recognition, Convolutional Neural Networks, Long Short-term Memory networks.