Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Journal Paper Information

Title

A Novel Multi-Task and Ensembled Optimized Parallel Convolutional Autoencoder and Transformer for Speech Emotion Recognition

Pages

  213-226

Abstract

 Recognizing emotions from speech signals is important in many human-computer interaction (HCI) applications. In this paper, we present a novel speech emotion recognition (SER) model based on new multi-task parallel convolutional autoencoders (PCAEs) and Transformer networks. The PCAEs are proposed to generate high-level, informative, harmonic sparse features from the input. With the proposed parallel CAEs, we can extract nonlinear sparse features in an ensemble manner, improving the accuracy and generalization of the model. The PCAEs also address the loss of initial sequential information during convolution operations in SER tasks. We further propose using a Transformer in parallel with the PCAEs to capture long-term dependencies between speech samples through its self-attention mechanism. Finally, we propose a multi-task loss function made up of two terms: a classification loss and an AE mapper loss. This multi-task loss aims to reduce not only the classification error but also the regression error of the PCAEs, which also act as mappers between the input and output Mel-frequency cepstral coefficients (MFCCs). Thus, the model can both focus on learning accurate features with the PCAEs and improve the classification results. We evaluated the proposed method on the RAVDESS SER dataset in terms of accuracy, precision, recall, and F1-score. The average accuracy of the proposed model on eight emotions outperforms that of all the recent baselines.
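The abstract describes the training objective only as the sum of two terms, a classification loss and an AE mapper loss. The following is a minimal sketch of such an objective in plain Python, not the authors' implementation. It assumes softmax cross-entropy for the classification term, mean squared error between the input MFCC matrix and each parallel autoencoder's output for the mapper term, averaging over the parallel autoencoders, and a hypothetical weighting factor `lam`; the paper's exact formulation may differ. The PCAE and Transformer networks themselves are not modeled here.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # MFCC matrix: (frames, coefficients)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mse(x: Matrix, x_hat: Matrix) -> float:
    """Mean squared error between an input MFCC matrix and a reconstruction."""
    if len(x) != len(x_hat) or any(len(r) != len(rh) for r, rh in zip(x, x_hat)):
        raise ValueError("input and reconstruction must have the same shape")
    diffs = [(a - b) ** 2 for row, row_hat in zip(x, x_hat) for a, b in zip(row, row_hat)]
    return sum(diffs) / len(diffs)


def multitask_loss(
    logits: Sequence[float],
    target: int,
    mfcc: Matrix,
    reconstructions: Sequence[Matrix],
    lam: float = 1.0,  # hypothetical weight on the mapper term (assumption)
) -> float:
    """Classification loss plus lam times the mean AE mapper (reconstruction) loss.

    `reconstructions` holds one output MFCC matrix per parallel autoencoder;
    averaging over these branches is an assumption, not taken from the paper.
    """
    classification = cross_entropy(logits, target)
    mapper = sum(mse(mfcc, r) for r in reconstructions) / len(reconstructions)
    return classification + lam * mapper


if __name__ == "__main__":
    mfcc = [[1.0, 2.0], [3.0, 4.0]]
    recs = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 5.0]]]
    # Uniform logits over eight emotion classes give a cross-entropy of log(8).
    print(multitask_loss([0.0] * 8, 3, mfcc, recs))
```

With uniform logits over eight classes, the classification term equals log(8); in the usage example, one branch reconstructs the MFCCs exactly and the other is off by 1 in a single cell, so the averaged mapper term is 0.125. Setting `lam` to 0 reduces the objective to plain classification, which makes the contribution of the mapper term easy to inspect.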
