Here, we present the Brazilian Portuguese Speech Emotion Recognition (SER) Task. This task aims to motivate SER research in our community, mainly by discussing theoretical and practical aspects of speech emotion recognition, pre-processing and feature extraction, and machine learning models for Brazilian Portuguese.
We provide a dataset, CORAA SER version 1.0, composed of approximately 50 minutes of audio segments labeled with three classes: neutral, non-neutral female, and non-neutral male. The neutral class represents audio segments with no well-defined emotional state, while the non-neutral classes represent segments associated with one of the primary emotional states in the speaker's speech. The dataset was built from the C-ORAL-BRASIL I corpus.
The corpus consists of audio segments of informal, spontaneous Brazilian Portuguese speech. The non-neutral classes were labeled based on paralinguistic elements (laughing, crying, etc.). Participants may use pre-trained models and external data, as long as the original C-ORAL-BRASIL corpus (or its variants) is not used for model training.
In this task, participants must train their own models using acoustic audio features; a training set is provided. The participants' models will be evaluated on a test set, which will be made publicly available after the challenge.
The training audio segments are available in the data_train.zip file.
Audio files are named according to their label: <file-id>_<label>.wav. Check the baselines for examples of reading and pre-processing the training set.
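As a rough illustration of this naming convention, the Python sketch below builds (path, label) pairs from the training filenames. The data_train/ directory name and the assumption that the label is everything after the first underscore in the stem are ours, not part of the official task setup.

```python
# Illustrative sketch: index the training files by parsing labels from names.
# Assumes data_train.zip was extracted into a local "data_train/" folder and
# that <file-id> itself contains no underscore -- both assumptions.
from collections import Counter
from pathlib import Path

def load_train_index(train_dir="data_train"):
    samples = []
    for wav_path in sorted(Path(train_dir).glob("*.wav")):
        file_id, label = wav_path.stem.split("_", 1)
        samples.append((str(wav_path), label))
    return samples

if __name__ == "__main__":
    index = load_train_index()
    print(Counter(label for _, label in index))  # class distribution
```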
Test audio segments are available in the test_ser.zip file.
The ground truth and other metadata are available in the test_ser_metadata.csv file.
We present two simple baselines as examples of how to pre-process audio segments, extract features, and train models for emotion recognition.
The first baseline uses a set of prosodic audio features for emotion classification.
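To make this concrete, the snippet below is a minimal sketch of a prosodic front end using librosa (an assumed dependency); the specific features and summary statistics are illustrative choices, not the baseline's actual configuration.

```python
# Illustrative prosodic feature extraction with librosa (assumed dependency).
# The feature set (pitch/energy statistics, duration, voicing ratio) is a
# sketch, not the actual baseline configuration.
import numpy as np
import librosa

def prosodic_features(path):
    y, sr = librosa.load(path, sr=16000)
    # Pitch (F0) contour via probabilistic YIN; NaN marks unvoiced frames.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    f0 = f0[~np.isnan(f0)]
    # Short-term energy contour.
    rms = librosa.feature.rms(y=y)[0]
    # Summarize the contours into a fixed-size feature vector.
    return np.array([
        f0.mean() if f0.size else 0.0,
        f0.std() if f0.size else 0.0,
        rms.mean(),
        rms.std(),
        len(y) / sr,                # segment duration in seconds
        float(voiced_flag.mean()),  # proportion of voiced frames
    ])
```

Fixed-size vectors like these can then be fed to any standard classifier, such as an SVM or a random forest.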
In the second baseline, we use the Wav2Vec model to extract features (i.e., embeddings) from the audio segments. These features can then be used to train a speech emotion recognition classifier.
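For reference, here is a minimal sketch of this kind of embedding extraction with the HuggingFace transformers library (assumed dependency); the facebook/wav2vec2-base checkpoint and the mean-pooling step are illustrative assumptions and may differ from the baseline's configuration.

```python
# Illustrative Wav2Vec 2.0 embedding extraction with HuggingFace transformers
# (assumed dependency). The checkpoint name and mean pooling are assumptions,
# not necessarily what the baseline uses.
import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

CHECKPOINT = "facebook/wav2vec2-base"  # assumed checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(CHECKPOINT)
model = Wav2Vec2Model.from_pretrained(CHECKPOINT)
model.eval()

def wav2vec_embedding(path):
    y, sr = librosa.load(path, sr=16000)  # Wav2Vec 2.0 expects 16 kHz audio
    inputs = feature_extractor(y, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, frames, dim)
    # Mean-pool over time to obtain one fixed-size embedding per segment.
    return hidden.mean(dim=1).squeeze(0).numpy()
```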
Each participant can submit up to three models. Models will be evaluated using the Macro F1 score.
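Macro F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of its frequency. With scikit-learn, it corresponds to the following call (the label strings below are placeholders for illustration only):

```python
# Macro F1: unweighted mean of per-class F1 scores (scikit-learn assumed).
from sklearn.metrics import f1_score

# Placeholder labels purely for illustration.
y_true = ["neutral", "non-neutral-female", "neutral", "non-neutral-male"]
y_pred = ["neutral", "neutral", "neutral", "non-neutral-male"]
print(f1_score(y_true, y_pred, average="macro"))
```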
The S&ER 2022 Workshop is co-located with the 15th edition of the International Conference on the Computational Processing of Portuguese (PROPOR 2022).
Workshop website: https://sites.google.com/view/ser2022/home