If you would like to report your results here, please follow the instructions in the EmoBox GitHub repository.
Accuracy (%) in cross-corpus settings, using features from different pre-trained models. Each column header A->B denotes a train-test pair: the model is trained on corpus A and tested on corpus B. I, M, R, and S stand for IEMOCAP, MELD, RAVDESS, and SAVEE, respectively. Bold indicates the best result for each train-test pair among the 10 pre-trained models.
Model | I->M | I->R | I->S | M->I | M->R | M->S | R->I | R->M | R->S | S->I | S->M | S->R |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Whisper large v3 | 46.14 | 38.24 | **46.12** | **51.42** | **47.00** | 36.44 | **48.12** | **40.68** | **66.91** | **49.30** | **42.18** | **49.63** |
WavLM large | **48.59** | 34.16 | 35.53 | 39.06 | 23.06 | 25.74 | 44.03 | 33.90 | 63.35 | 43.69 | 34.10 | 36.69 |
HuBERT large | 44.60 | 15.03 | 39.99 | 44.69 | 38.22 | **43.74** | 36.18 | 25.02 | 56.96 | 42.81 | 31.54 | 31.92 |
WavLM base | 38.25 | 27.80 | 39.40 | 46.30 | 21.38 | 35.75 | 30.78 | 29.58 | 43.24 | 34.00 | 27.40 | 27.38 |
data2vec base | 43.86 | **40.46** | 32.53 | 42.57 | 24.16 | 24.33 | 22.28 | 20.32 | 27.02 | 29.41 | 31.96 | 34.68 |
HuBERT base | 37.32 | 22.63 | 35.88 | 38.31 | 31.60 | 32.67 | 42.00 | 33.43 | 43.47 | 39.39 | 29.03 | 38.41 |
data2vec 2.0 base | 44.96 | 30.52 | 31.40 | 42.35 | 33.32 | 30.42 | 30.77 | 26.80 | 37.31 | 29.12 | 35.29 | 19.24 |
data2vec large | 44.99 | 39.03 | 36.88 | 40.62 | 29.10 | 31.57 | 26.50 | 26.73 | 32.54 | 26.82 | 24.05 | 16.96 |
wav2vec 2.0 base | 29.78 | 18.25 | 28.84 | 22.50 | 31.39 | 35.24 | 27.15 | 23.20 | 33.77 | 31.34 | 29.19 | 21.36 |
data2vec 2.0 large | 47.43 | 17.80 | 29.67 | 41.75 | 31.80 | 29.66 | 38.79 | 34.21 | 35.43 | 36.39 | 37.79 | 23.58 |
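
The "best result per train-test pair" rule from the caption can be recomputed directly from the table values. The following is a minimal Python sketch, not part of the EmoBox toolkit: the names `PAIRS`, `SCORES`, and `best_per_pair` are illustrative assumptions, and the numbers are copied from the table above.

```python
# Train->test pairs, in the same order as the table columns.
PAIRS = ["I->M", "I->R", "I->S", "M->I", "M->R", "M->S",
         "R->I", "R->M", "R->S", "S->I", "S->M", "S->R"]

# Accuracy (%) copied from the table above, one row per model.
SCORES = {
    "Whisper large v3":   [46.14, 38.24, 46.12, 51.42, 47.00, 36.44, 48.12, 40.68, 66.91, 49.30, 42.18, 49.63],
    "WavLM large":        [48.59, 34.16, 35.53, 39.06, 23.06, 25.74, 44.03, 33.90, 63.35, 43.69, 34.10, 36.69],
    "HuBERT large":       [44.60, 15.03, 39.99, 44.69, 38.22, 43.74, 36.18, 25.02, 56.96, 42.81, 31.54, 31.92],
    "WavLM base":         [38.25, 27.80, 39.40, 46.30, 21.38, 35.75, 30.78, 29.58, 43.24, 34.00, 27.40, 27.38],
    "data2vec base":      [43.86, 40.46, 32.53, 42.57, 24.16, 24.33, 22.28, 20.32, 27.02, 29.41, 31.96, 34.68],
    "HuBERT base":        [37.32, 22.63, 35.88, 38.31, 31.60, 32.67, 42.00, 33.43, 43.47, 39.39, 29.03, 38.41],
    "data2vec 2.0 base":  [44.96, 30.52, 31.40, 42.35, 33.32, 30.42, 30.77, 26.80, 37.31, 29.12, 35.29, 19.24],
    "data2vec large":     [44.99, 39.03, 36.88, 40.62, 29.10, 31.57, 26.50, 26.73, 32.54, 26.82, 24.05, 16.96],
    "wav2vec 2.0 base":   [29.78, 18.25, 28.84, 22.50, 31.39, 35.24, 27.15, 23.20, 33.77, 31.34, 29.19, 21.36],
    "data2vec 2.0 large": [47.43, 17.80, 29.67, 41.75, 31.80, 29.66, 38.79, 34.21, 35.43, 36.39, 37.79, 23.58],
}


def best_per_pair(scores, pairs):
    """Return {pair: (model, accuracy)} holding the highest accuracy in each column."""
    best = {}
    for col, pair in enumerate(pairs):
        # Pick the model whose accuracy in this column is largest.
        model = max(scores, key=lambda name: scores[name][col])
        best[pair] = (model, scores[model][col])
    return best


if __name__ == "__main__":
    for pair, (model, acc) in best_per_pair(SCORES, PAIRS).items():
        print(f"{pair}: {model} ({acc:.2f})")
```

Storing each row as a list aligned with `PAIRS` keeps the sketch close to the table layout, so adding a newly reported model only requires one more dictionary entry.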