Speech Emotion Recognition

98 papers with code • 14 benchmarks • 18 datasets

Speech Emotion Recognition (SER) is a task in speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to determine a speaker's emotional state, such as happiness, anger, sadness, or frustration, from acoustic cues in their speech, such as prosody, pitch, and rhythm.
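
As a rough illustration of the acoustic cues mentioned above, the sketch below computes two simple frame-level features, pitch and energy, using only the Python standard library. It is a minimal example under stated assumptions, not the method of any paper or leaderboard entry listed here: the function names (`synth_tone`, `rms_energy`, `estimate_pitch`), the 80-400 Hz search range, and the synthetic test tone are all hypothetical choices made for illustration. Real systems typically use more robust pitch trackers and far richer feature sets or learned representations.

```python
import math


def synth_tone(freq_hz, sample_rate, num_samples):
    """Generate a pure sine tone (a hypothetical stand-in for a voiced speech frame)."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(num_samples)]


def rms_energy(frame):
    """Root-mean-square energy of one frame, a rough loudness cue."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Picks the lag with the largest raw (unnormalized) autocorrelation
    within the lag range implied by [fmin, fmax]. The search range is an
    assumption chosen to roughly cover typical speaking pitch.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Raw autocorrelation at this lag; because fewer terms are summed
        # at longer lags, shorter periods are slightly favored.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 16000
    frame = synth_tone(200.0, sr, 2048)
    print("pitch (Hz):", estimate_pitch(frame, sr))
    print("RMS energy:", rms_energy(frame))
```

In a full pipeline, features like these would be computed per frame over an utterance and summarized (for example by mean and variance) before being passed to a classifier; that stage is omitted here.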

For multimodal emotion recognition, please upload your results to Multimodal Emotion Recognition on IEMOCAP.

Benchmarks

These leaderboards are used to track progress in Speech Emotion Recognition.

Dataset	Best Model
IEMOCAP	DANN
CREMA-D	ConformerXL-P
RAVDESS	VQ-MAE-S-12 (Frame) + Query2Emo
MSP-Podcast (Valence)	w2v2-L-robust-12
MSP-Podcast (Activation)	w2v2-L-robust-12
MSP-Podcast (Dominance)	w2v2-L-robust-12
ShEMO	CNN (1D)
EmoDB Dataset	VQ-MAE-S-12 (Frame) + Query2Emo
Dusha Crowd	Dusha baseline
Dusha Podcast	Dusha baseline
LSSED	PyResNet
EMODB	VGG-optiVMD
Quechua-SER	LSTM
MSP-IMPROV	emoDARTS

Libraries

Use these libraries to find Speech Emotion Recognition models and implementations

raulsteleac/Speech_Emotion_Recognit…

3 papers

alibaba-damo-academy/FunASR

2 papers

3,299

aris-ai/Audio-and-text-based-emotio…

2 papers

138

Most implemented papers

Attention Is All You Need

tensorflow/tensor2tensor • NeurIPS 2017

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.

567

Continuous control with deep reinforcement learning

ray-project/ray • 9 Sep 2015

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain.

157

Multimodal Speech Emotion Recognition and Ambiguity Resolution

Demfier/multimodal-speech-emotion-recognition • 12 Apr 2019

In this work, we adopt a feature-engineering based approach to tackle the task of speech emotion recognition.

Multimodal Speech Emotion Recognition Using Audio and Text

david-yoon/multimodal-speech-emotion • 10 Oct 2018

Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers.

Compact Graph Architecture for Speech Emotion Recognition

AmirSh15/Compact_SER • 5 Aug 2020

We propose a deep graph approach to address the task of speech emotion recognition.

AST: Audio Spectrogram Transformer

YuanGongND/ast • 5 Apr 2021

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.

Speech Emotion Recognition Using Multi-hop Attention Mechanism

raulsteleac/Speech_Emotion_Recognition • IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019

As opposed to using knowledge from both the modalities separately, we propose a framework to exploit acoustic information in tandem with lexical data.
