Search Results
Issue Info:
  • Year: 2022
  • Volume: 11
  • Issue: 21
  • Pages: 85-98
Measures:
  • Citations: 0
  • Views: 93
  • Downloads: 19
Abstract: 

The purpose of speech emotion recognition (SER) systems is to create an emotional connection between humans and machines, since recognizing human emotions and goals helps improve human-machine interaction. Recognizing emotions from speech has challenged researchers over the past decade, but advances in artificial intelligence have eased these challenges. In this study, we take steps to improve the efficiency of such systems using deep learning methods. In the first step, three-dimensional convolutional neural networks (3D CNNs) are used to learn the spectral-temporal features of speech. In the second step, to strengthen the proposed model, we use a new pyramidal concatenated 3D CNN, a multi-scale 3D CNN architecture over the input dimensions. Finally, to learn from the spectral-temporal features extracted by the pyramidal concatenated 3D CNN, we use a temporal capsule network, so that both the spatial and the temporal relationships of the data are considered. We name the proposed structure, which is a powerful structure for spectral-temporal features, the MSID 3DCNN + Temporal Capsule. The final model was applied to a combination of the speech and song portions of the RAVDESS database. Comparing the results of the proposed model with conventional models shows the better performance of our approach. The proposed SER model achieved an accuracy of 81.77% for six emotional classes by gender.
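The multi-scale idea in this abstract — parallel 3D convolutions at several receptive-field sizes, concatenated — can be sketched as follows. This is a minimal illustration, not the paper's model: channel counts, kernel sizes, and input dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MultiScale3DCNN(nn.Module):
    """Hypothetical sketch of a pyramidal (multi-scale) 3D CNN front end:
    parallel Conv3d branches with different kernel sizes process the same
    spectral-temporal input cube and are concatenated along channels."""
    def __init__(self, in_ch=1, branch_ch=8):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv3d(in_ch, branch_ch, k, padding=k // 2), nn.ReLU())
            for k in (3, 5, 7)  # assumed pyramid of receptive fields
        ])

    def forward(self, x):  # x: (batch, channels, segments, freq, time)
        # "same" padding keeps spatial dims, so branch outputs can be concatenated
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(2, 1, 8, 16, 16)
y = MultiScale3DCNN()(x)
print(tuple(y.shape))  # (2, 24, 8, 16, 16): 3 branches x 8 channels
```

A downstream network (here, the paper's temporal capsule network) would then consume this multi-scale feature volume.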

Issue Info:
  • Year: 2020
  • Volume: 33
  • Issue: 2 (TRANSACTIONS B: Applications)
  • Pages: 285-292
Measures:
  • Citations: 0
  • Views: 236
  • Downloads: 116
Abstract: 

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on extracting features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking style, and environment. Here, an SER method is proposed based on a concatenated Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). The CNN can learn local salient features from speech signals, images, and videos, while RNNs have been used in many sequential data processing tasks to learn long-term dependencies between local features. Combining the two gives us the strengths of both networks. In the proposed method, the CNN is applied directly to a scalogram of the speech signal. Then, an attention-mechanism-based RNN model learns long-term temporal relationships among the learned features. Experiments on several datasets, including RAVDESS, SAVEE, and Emo-DB, demonstrate the effectiveness of the proposed SER method.
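The CNN-then-attention-RNN pattern described here can be sketched in a few lines. Layer sizes, the choice of GRU, and the input shape are illustrative assumptions, not the paper's values:

```python
import torch
import torch.nn as nn

class CNNAttnRNN(nn.Module):
    """Hedged sketch of the CNN -> attention-RNN pattern: a small CNN learns
    local features from a scalogram, a GRU models their temporal order, and a
    learned attention weighting pools the sequence for classification."""
    def __init__(self, n_classes=7):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)))
        self.rnn = nn.GRU(16 * 32, 64, batch_first=True)
        self.attn = nn.Linear(64, 1)   # scores each time step
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                          # x: (batch, 1, freq=64, time)
        f = self.cnn(x)                            # (batch, 16, 32, time)
        f = f.flatten(1, 2).transpose(1, 2)        # (batch, time, 16*32)
        h, _ = self.rnn(f)                         # (batch, time, 64)
        w = torch.softmax(self.attn(h), dim=1)     # attention over time steps
        return self.fc((w * h).sum(dim=1))         # attention-weighted pooling

logits = CNNAttnRNN()(torch.randn(2, 1, 64, 100))
print(tuple(logits.shape))  # (2, 7)
```

The attention weights replace naive last-step or mean pooling, letting the classifier emphasize the emotionally salient frames.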

Issue Info:
  • Year: 2023
  • Volume: 13
  • Issue: 52
  • Pages: 79-98
Measures:
  • Citations: 0
  • Views: 150
  • Downloads: 0
Abstract: 

Speech emotion recognition has attracted many researchers in recent years due to its various applications, alongside the extension of deep neural network training methods and their widespread use. In this paper, the application of convolutional and transformer networks in a new combination for recognizing speech emotions is investigated; the combination is easier to implement than existing methods and performs well. For this purpose, basic convolutional neural networks and transformers are introduced, and then, based on them, a new model combining the two is presented, in which the output of the basic convolutional network is the input of the basic transformer network. The results show that transformer neural networks perform better than the convolutional-neural-network-based method in recognizing some emotional categories. The paper also shows that combining simple neural networks can yield better performance in recognizing emotions from speech. In this regard, recognizing speech emotions with a combination of convolutional neural networks and a transformer, called the convolutional-transformer (CTF), achieved an accuracy of 80.94% on the RAVDESS dataset, while a simple convolutional neural network achieved about 72.7%. Combining simple neural networks can not only increase recognition accuracy but also reduce training time and the need for labeled training samples.
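The key wiring in this abstract — the convolutional network's output sequence becomes the transformer's input — can be sketched as below. The feature dimension (40, as if MFCC bands), model width, and pooling are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class ConvTransformer(nn.Module):
    """Minimal sketch of the convolutional-transformer (CTF) idea: a 1D
    convolution produces a feature sequence, which a transformer encoder
    then processes with self-attention before classification."""
    def __init__(self, n_classes=8, d_model=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(40, d_model, 3, padding=1), nn.ReLU())  # 40 input bands assumed
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):                 # x: (batch, 40, time)
        t = self.conv(x).transpose(1, 2)  # CNN output as sequence: (batch, time, d_model)
        return self.fc(self.encoder(t).mean(dim=1))  # mean-pool over time

out = ConvTransformer()(torch.randn(2, 40, 50))
print(tuple(out.shape))  # (2, 8)
```

The convolution supplies local feature extraction cheaply, while the transformer's self-attention captures relationships across the whole utterance.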

Issue Info:
  • Year: 2024
  • Volume: 56
  • Issue: 2
  • Pages: 213-226
Measures:
  • Citations: 0
  • Views: 10
  • Downloads: 0
Abstract: 

Recognizing emotions from speech signals is very important in different applications of human-computer interaction (HCI). In this paper, we present a novel model for speech emotion recognition (SER) based on a new multi-task parallel convolutional autoencoder (PCAE) and transformer networks. The PCAEs are proposed to generate high-level, informative, harmonic sparse features from the input. With the aid of the proposed parallel CAEs, we can extract nonlinear sparse features in an ensemble manner, improving the accuracy and generalization of the model. These PCAEs also address the loss of initial sequential information during convolution operations in SER tasks. We also propose using a transformer in parallel with the PCAEs to gather long-term dependencies between speech samples and to make use of its self-attention mechanism. Finally, we propose a multi-task loss function made up of two terms: a classification loss and an AE mapper loss. This multi-task loss tries to reduce not only the classification error but also the regression error caused by the PCAEs, which also work as mappers between the input and output Mel-frequency cepstral coefficients (MFCCs). Thus, we can both focus on finding accurate features with the PCAEs and improve the classification results. We evaluated the proposed method on the RAVDESS SER dataset in terms of accuracy, precision, recall, and F1-score. The average accuracy of the proposed model on eight emotions outperforms all the recent baselines.
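The two-term multi-task loss described above can be sketched as a classification term plus an autoencoder reconstruction (mapper) term over MFCC inputs. The weighting factor `alpha` and all tensor shapes here are assumptions; the paper may combine the terms differently:

```python
import torch
import torch.nn.functional as F

def multi_task_loss(logits, labels, reconstruction, mfcc_target, alpha=0.5):
    """Hedged sketch of a multi-task SER loss: cross-entropy for the emotion
    classes plus a mean-squared reconstruction error for the AE mapper."""
    ce = F.cross_entropy(logits, labels)                # classification error
    mse = F.mse_loss(reconstruction, mfcc_target)       # AE mapper (regression) error
    return ce + alpha * mse                             # assumed linear weighting

logits = torch.randn(4, 8)              # 8 RAVDESS emotion classes
labels = torch.randint(0, 8, (4,))
recon  = torch.randn(4, 40, 50)         # reconstructed MFCCs (assumed shape)
target = torch.randn(4, 40, 50)
loss = multi_task_loss(logits, labels, recon, target)
print(loss.item() > 0)  # True
```

Minimizing both terms jointly pushes the PCAEs toward features that are simultaneously discriminative and faithful to the input MFCCs.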

Issue Info:
  • Year: 2022
  • Volume: 10
  • Issue: 2 (38)
  • Pages: 89-101
Measures:
  • Citations: 0
  • Views: 54
  • Downloads: 22
Abstract: 

Emotional distress detection has become a hot topic of research in recent years due to concerns related to mental health and the complex nature of distress identification. One of the challenging tasks is to use non-invasive technology to understand and detect emotional distress in humans. Personalized affective cues provide a non-invasive approach that considers visual, vocal, and verbal cues to recognize the affective state. In this paper, we propose a multimodal hierarchical weighted framework to recognize emotional distress, utilizing negative emotions to detect the unapparent behavior of the person. To capture facial cues, we employ hybrid models consisting of a transfer-learned residual network and CNN models; the extracted facial-cue features are processed and fused at the decision level using a weighted approach. For audio cues, we employ two different models exploiting LSTM and CNN capabilities, fusing the results at the decision level. For textual cues, we use a BERT transformer to learn the extracted features. We propose a novel decision-level adaptive hierarchical weighted algorithm to fuse the results of the different modalities, and use it to detect the emotional distress of a person. Hence, we propose a novel algorithm for the detection of emotional distress based on visual, verbal, and vocal cues. Experiments on multiple datasets, such as FER2013, JAFFE, CK+, RAVDESS, TESS, ISEAR, the Emotion Stimulus dataset, and the DailyDialog dataset, demonstrate the effectiveness and usability of the proposed architecture. Experiments on the eNTERFACE'05 dataset for distress detection have demonstrated significant results.
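The decision-level weighted fusion described above can be illustrated with a toy example: each modality (visual, vocal, verbal) outputs a class-probability vector, and a weighted sum yields the fused decision. The fixed weights and probabilities below are hypothetical; the paper's algorithm adapts the weights hierarchically:

```python
import numpy as np

def weighted_fusion(modality_probs, weights):
    """Toy decision-level fusion: weighted sum of per-modality class
    probabilities, renormalized to a valid distribution."""
    fused = sum(w * p for w, p in zip(weights, modality_probs))
    return fused / fused.sum()

# Hypothetical per-class probabilities from three modality classifiers
visual = np.array([0.7, 0.2, 0.1])
vocal  = np.array([0.5, 0.3, 0.2])
verbal = np.array([0.6, 0.1, 0.3])

fused = weighted_fusion([visual, vocal, verbal], weights=[0.5, 0.3, 0.2])
print(fused.argmax())  # 0 — the class all three modalities favor
```

Fusing at the decision level keeps each modality's model independent, so a weak or missing modality only down-weights, rather than corrupts, the final prediction.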
