A Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark.



Introduction


EmoBox is a multilingual, multi-corpus speech emotion recognition (SER) toolkit designed to streamline research in this field. It is accompanied by a curated benchmark covering both intra-corpus and cross-corpus evaluation settings.

The benchmark covers two evaluation settings:

  • For intra-corpus evaluations, we have devised a systematic approach to data partitioning across various datasets, ensuring that researchers can conduct rigorous and comparable analyses of different SER models.
  • For cross-corpus evaluations, we leverage a foundational SER model, emotion2vec, to address annotation discrepancies and to create a test set that is balanced in both speaker and emotion distribution, a feat previously unattained in SER research.
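
As a rough illustration of what a test set "balanced in speaker and emotion distribution" can look like, the sketch below down-samples a pool of utterances so that every speaker contributes the same number of utterances for every emotion. It is not the EmoBox implementation: the function name `balanced_subset` and the `(utt_id, speaker, emotion)` record layout are hypothetical, and the emotion2vec-based handling of annotation discrepancies is left out entirely.

```python
import random
from collections import defaultdict


def balanced_subset(records, seed=0):
    """Return utterance ids with equal counts per (speaker, emotion) cell.

    records: iterable of (utt_id, speaker, emotion) tuples (assumed layout).
    Speakers that do not cover every emotion are dropped, and each remaining
    cell is down-sampled to the size of the smallest one.
    """
    cells = defaultdict(list)
    for utt_id, speaker, emotion in records:
        cells[(speaker, emotion)].append(utt_id)

    emotions = sorted({emotion for _, emotion in cells})
    # Keep only speakers that have at least one utterance for every emotion.
    speakers = sorted(
        {s for s, _ in cells if all((s, e) in cells for e in emotions)}
    )
    if not speakers:
        return []

    per_cell = min(len(cells[(s, e)]) for s in speakers for e in emotions)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    chosen = []
    for s in speakers:
        for e in emotions:
            chosen.extend(rng.sample(cells[(s, e)], per_cell))
    return chosen
```

Calling `balanced_subset(records)` on a list of such tuples returns the selected utterance ids. Dropping speakers with missing emotions is one possible design choice; it trades test-set size for exact balance.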

Based on EmoBox, we present the intra-corpus SER results of 10 pre-trained speech models on 32 emotion datasets spanning 14 languages, and the cross-corpus SER results on 4 datasets with fully balanced test sets. To the best of our knowledge, this is the largest SER benchmark to date in terms of both language coverage and data scale. We hope that our toolkit and benchmark can facilitate SER research in the community.


Usage



Powerful Toolkit

Easily conduct experiments on different datasets.



Leaderboard

Track the advances in Speech Emotion Recognition research.



Paper


Please cite our paper as below if you use the EmoBox toolkit and benchmark.

@inproceedings{ma2024emobox,
    title={EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark},
    author={Ziyang Ma and Mingjie Chen and Hezhao Zhang and Zhisheng Zheng and Wenxi Chen and Xiquan Li and Jiaxin Ye and Xie Chen and Thomas Hain},
    booktitle={Proc. INTERSPEECH},
    year={2024}
}

                

Contact



Have any questions or suggestions? Feel free to reach us at the EmoBox GitHub repo.