Leaderboard for Cross-corpus Benchmark



To report your results here, please follow the instructions in the EmoBox GitHub repository.

Accuracy (%) with features from different pre-trained models in cross-corpus settings. Each column is a train->test pair: the corpus before the arrow is the training set and the corpus after it is the test set. I, M, R, and S stand for IEMOCAP, MELD, RAVDESS, and SAVEE, respectively. Bold indicates the best result for each train-test pair among the 10 pre-trained models.




Model               I->M   I->R   I->S   M->I   M->R   M->S   R->I   R->M   R->S   S->I   S->M   S->R
Whisper large v3    46.14  38.24  46.12  51.42  47.00  36.44  48.12  40.68  66.91  49.30  42.18  49.63
WavLM large         48.59  34.16  35.53  39.06  23.06  25.74  44.03  33.90  63.35  43.69  34.10  36.69
HuBERT large        44.60  15.03  39.99  44.69  38.22  43.74  36.18  25.02  56.96  42.81  31.54  31.92
WavLM base          38.25  27.80  39.40  46.30  21.38  35.75  30.78  29.58  43.24  34.00  27.40  27.38
data2vec base       43.86  40.46  32.53  42.57  24.16  24.33  22.28  20.32  27.02  29.41  31.96  34.68
HuBERT base         37.32  22.63  35.88  38.31  31.60  32.67  42.00  33.43  43.47  39.39  29.03  38.41
data2vec 2.0 base   44.96  30.52  31.40  42.35  33.32  30.42  30.77  26.80  37.31  29.12  35.29  19.24
data2vec large      44.99  39.03  36.88  40.62  29.10  31.57  26.50  26.73  32.54  26.82  24.05  16.96
wav2vec 2.0 base    29.78  18.25  28.84  22.50  31.39  35.24  27.15  23.20  33.77  31.34  29.19  21.36
data2vec 2.0 large  47.43  17.80  29.67  41.75  31.80  29.66  38.79  34.21  35.43  36.39  37.79  23.58
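
The caption says the best result for each train-test pair is highlighted, so it may help to recompute those winners directly from the numbers. The following is a minimal sketch, not part of EmoBox: the names PAIRS, SCORES, and best_per_pair are hypothetical, and the values are copied by hand from the table above.

```python
# Column order follows the leaderboard header (train->test pairs).
PAIRS = ["I->M", "I->R", "I->S", "M->I", "M->R", "M->S",
         "R->I", "R->M", "R->S", "S->I", "S->M", "S->R"]

# Accuracy (%) per model, copied from the table above (hypothetical container).
SCORES = {
    "Whisper large v3":   [46.14, 38.24, 46.12, 51.42, 47.00, 36.44, 48.12, 40.68, 66.91, 49.30, 42.18, 49.63],
    "WavLM large":        [48.59, 34.16, 35.53, 39.06, 23.06, 25.74, 44.03, 33.90, 63.35, 43.69, 34.10, 36.69],
    "HuBERT large":       [44.60, 15.03, 39.99, 44.69, 38.22, 43.74, 36.18, 25.02, 56.96, 42.81, 31.54, 31.92],
    "WavLM base":         [38.25, 27.80, 39.40, 46.30, 21.38, 35.75, 30.78, 29.58, 43.24, 34.00, 27.40, 27.38],
    "data2vec base":      [43.86, 40.46, 32.53, 42.57, 24.16, 24.33, 22.28, 20.32, 27.02, 29.41, 31.96, 34.68],
    "HuBERT base":        [37.32, 22.63, 35.88, 38.31, 31.60, 32.67, 42.00, 33.43, 43.47, 39.39, 29.03, 38.41],
    "data2vec 2.0 base":  [44.96, 30.52, 31.40, 42.35, 33.32, 30.42, 30.77, 26.80, 37.31, 29.12, 35.29, 19.24],
    "data2vec large":     [44.99, 39.03, 36.88, 40.62, 29.10, 31.57, 26.50, 26.73, 32.54, 26.82, 24.05, 16.96],
    "wav2vec 2.0 base":   [29.78, 18.25, 28.84, 22.50, 31.39, 35.24, 27.15, 23.20, 33.77, 31.34, 29.19, 21.36],
    "data2vec 2.0 large": [47.43, 17.80, 29.67, 41.75, 31.80, 29.66, 38.79, 34.21, 35.43, 36.39, 37.79, 23.58],
}


def best_per_pair(scores, pairs):
    """Return {pair: (model, accuracy)} holding the top accuracy in each column."""
    best = {}
    for col, pair in enumerate(pairs):
        # Pick the model whose accuracy in this column is highest.
        model = max(scores, key=lambda name: scores[name][col])
        best[pair] = (model, scores[model][col])
    return best


if __name__ == "__main__":
    for pair, (model, acc) in best_per_pair(SCORES, PAIRS).items():
        print(f"{pair}: {model} ({acc:.2f})")
```

On these numbers, Whisper large v3 is the top model for nine of the twelve pairs; the exceptions are I->M (WavLM large), I->R (data2vec base), and M->S (HuBERT large).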