If you would like to report your results here, please follow the instructions in the EmoBox GitHub repository.
The EmoBox leaderboard compiles intra-corpus results for models that can be applied to all 32 emotion datasets. For models that work on a single dataset, please use the tabs below to navigate to the corresponding leaderboard.
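The table below reports intra-corpus speech emotion recognition (SER) results of pre-trained speech models on 32 emotion datasets spanning 14 distinct languages, using the EmoBox data partitioning. Each cell is the weighted average accuracy (WA, %) on that dataset, and models are ranked by their mean WA across all datasets.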
Rank | Model | AESDD | ASED | ASVP-ESD | CaFE | CASIA | CREMA-D | EMNS | EmoDB | EmoV-DB | EMOVO | Emozionalmente | eNTERFACE | ESD | IEMOCAP | JL-Corpus | M3ED | MEAD | MELD | MER2023 | MESD | MSP-Podcast | Oreau | PAVOQUE | Polish | RAVDESS | RESD | SAVEE | ShEMO | SUBESCO | TESS | TurEV-DB | URDU |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Whisper large v3 | 79.18 | 96.73 | 71.52 | 68.84 | 59.58 | 76.48 | 70.58 | 92.43 | 99.37 | 57.82 | 76.91 | 97.68 | 84.62 | 72.86 | 66.71 | 49.42 | 77.34 | 51.89 | 65.23 | 69.67 | 44.10 | 84.79 | 93.40 | 83.27 | 75.87 | 55.54 | 77.24 | 89.55 | 73.05 | 99.96 | 81.58 | 82.52 |
2 | WavLM large | 84.49 | 96.45 | 65.91 | 61.33 | 52.12 | 74.32 | 84.12 | 92.67 | 99.47 | 48.82 | 75.00 | 92.42 | 79.14 | 69.07 | 60.86 | 44.86 | 82.03 | 49.31 | 54.77 | 62.49 | 40.47 | 66.29 | 93.40 | 79.29 | 72.22 | 56.47 | 70.80 | 87.13 | 65.33 | 99.78 | 80.09 | 86.61 |
3 | HuBERT large | 78.90 | 96.19 | 63.00 | 58.73 | 45.30 | 73.64 | 74.28 | 90.26 | 99.40 | 45.74 | 69.83 | 88.17 | 75.85 | 66.69 | 56.62 | 44.49 | 77.84 | 46.37 | 50.49 | 53.67 | 40.02 | 64.35 | 92.76 | 70.20 | 70.29 | 51.51 | 75.05 | 83.35 | 64.53 | 99.86 | 72.08 | 81.75 |
4 | data2vec 2.0 large | 72.34 | 94.28 | 62.35 | 58.02 | 45.57 | 69.27 | 58.60 | 80.41 | 98.32 | 45.88 | 64.10 | 94.02 | 76.81 | 56.23 | 65.14 | 43.02 | 75.24 | 47.72 | 46.81 | 48.45 | 41.33 | 64.88 | 92.07 | 74.00 | 71.63 | 44.64 | 78.59 | 82.68 | 66.45 | 99.54 | 64.13 | 78.10 |
5 | HuBERT base | 82.35 | 94.13 | 59.78 | 54.16 | 47.23 | 70.98 | 76.25 | 87.73 | 98.77 | 46.10 | 66.30 | 79.11 | 72.41 | 63.10 | 51.11 | 42.55 | 75.71 | 45.47 | 49.80 | 47.48 | 38.70 | 52.80 | 92.19 | 69.40 | 66.21 | 52.34 | 59.85 | 81.17 | 57.89 | 99.62 | 73.26 | 88.41 |
6 | WavLM base | 79.08 | 94.31 | 58.05 | 52.33 | 47.25 | 69.49 | 69.71 | 87.12 | 98.49 | 42.39 | 63.02 | 88.27 | 72.90 | 63.94 | 53.79 | 42.79 | 72.86 | 44.71 | 48.71 | 43.52 | 37.38 | 58.06 | 90.84 | 69.31 | 62.10 | 45.09 | 67.05 | 78.76 | 57.06 | 99.10 | 70.51 | 82.82 |
7 | data2vec 2.0 base | 46.68 | 93.99 | 57.57 | 51.67 | 43.31 | 65.48 | 48.39 | 75.86 | 96.09 | 42.96 | 56.22 | 90.80 | 73.40 | 53.19 | 54.04 | 41.42 | 70.64 | 46.65 | 46.64 | 44.85 | 39.97 | 59.73 | 87.82 | 75.75 | 64.84 | 44.03 | 75.50 | 79.03 | 57.97 | 98.55 | 63.69 | 69.97 |
8 | wav2vec 2.0 base | 67.65 | 88.63 | 59.12 | 42.47 | 39.56 | 61.90 | 65.27 | 83.14 | 98.03 | 31.07 | 56.69 | 64.19 | 69.17 | 57.73 | 45.27 | 43.20 | 73.57 | 45.17 | 46.78 | 62.89 | 37.15 | 41.06 | 91.23 | 67.35 | 55.40 | 53.39 | 49.83 | 78.96 | 51.25 | 97.92 | 68.01 | 87.50 |
9 | data2vec large | 45.69 | 88.31 | 56.36 | 42.85 | 37.65 | 63.51 | 48.96 | 61.95 | 94.93 | 35.66 | 53.92 | 89.46 | 72.01 | 51.11 | 48.87 | 38.73 | 68.44 | 45.74 | 40.27 | 36.61 | 37.45 | 49.12 | 84.89 | 68.05 | 59.50 | 31.57 | 71.45 | 74.09 | 53.06 | 96.89 | 55.11 | 64.81 |
10 | data2vec base | 50.03 | 86.34 | 50.79 | 42.36 | 34.72 | 57.78 | 34.58 | 60.01 | 93.61 | 32.47 | 48.97 | 91.62 | 65.05 | 53.20 | 49.29 | 37.32 | 66.05 | 45.57 | 43.06 | 34.35 | 37.15 | 55.67 | 85.11 | 71.31 | 52.22 | 37.90 | 78.25 | 70.07 | 46.22 | 94.67 | 52.58 | 65.42 |
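The ranking rule described above (sort models by mean WA across datasets) can be sketched as follows. This is a minimal illustration, not the leaderboard's own tooling: the function name `rank_by_mean_wa` is hypothetical, and only the first three dataset columns of three models are copied in, so the means differ from those over all 32 datasets.

```python
from statistics import mean

# WA (%) values copied from the table above.
# Assumption for brevity: only three of the 32 dataset columns are included.
wa_scores = {
    "HuBERT large": {"AESDD": 78.90, "ASED": 96.19, "ASVP-ESD": 63.00},
    "Whisper large v3": {"AESDD": 79.18, "ASED": 96.73, "ASVP-ESD": 71.52},
    "WavLM large": {"AESDD": 84.49, "ASED": 96.45, "ASVP-ESD": 65.91},
}


def rank_by_mean_wa(scores):
    """Return (model, mean WA) pairs sorted from highest to lowest mean WA."""
    means = {model: mean(per_dataset.values()) for model, per_dataset in scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_mean_wa(wa_scores)
for rank, (model, mean_wa) in enumerate(ranking, start=1):
    print(f"{rank} | {model} | {mean_wa:.2f}")
```

Because every dataset contributes equally to the mean, a model can lead the overall ranking without being the best on each individual dataset.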
For the dataset AESDD (el), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | WavLM large | 84.40 | 84.49 | 84.19 |
2 | HuBERT base | 82.30 | 82.35 | 82.36 |
3 | Whisper large v3 | 79.13 | 79.18 | 79.13 |
4 | WavLM base | 78.99 | 79.08 | 78.78 |
5 | HuBERT large | 78.85 | 78.90 | 78.88 |
6 | data2vec 2.0 large | 72.26 | 72.34 | 71.82 |
7 | wav2vec 2.0 base | 67.59 | 67.65 | 67.61 |
8 | data2vec base | 49.96 | 50.03 | 49.30 |
9 | data2vec 2.0 base | 46.55 | 46.68 | 45.33 |
10 | data2vec large | 45.63 | 45.69 | 44.65 |
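The per-dataset tables report three metrics: UA, WA, and F1. The sketch below shows how they are commonly computed in SER work, assuming the usual conventions: WA is overall accuracy, UA is the mean of per-class recalls, and F1 is the macro-averaged F1 score. The function name `ser_metrics` and the toy labels are hypothetical and are not taken from the EmoBox code.

```python
from collections import Counter


def ser_metrics(y_true, y_pred):
    """Compute UA, WA and macro-F1 (all in %) from reference and predicted labels."""
    labels = sorted(set(y_true))
    support = Counter(y_true)      # number of reference samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # WA: fraction of all utterances classified correctly.
    wa = sum(correct.values()) / len(y_true)

    recalls, f1_scores = [], []
    for label in labels:
        recall = correct[label] / support[label]
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)

    # UA and macro-F1 weight every class equally, regardless of its size.
    ua = sum(recalls) / len(labels)
    macro_f1 = sum(f1_scores) / len(labels)
    return {"UA": 100 * ua, "WA": 100 * wa, "F1": 100 * macro_f1}


# Toy imbalanced example: the minority class is never predicted.
y_true = ["angry", "angry", "angry", "happy"]
y_pred = ["angry", "angry", "angry", "angry"]
metrics = ser_metrics(y_true, y_pred)
print(metrics)
```

Under these definitions, UA and WA coincide when every class has the same number of test samples, while UA falls below WA when a model favors the majority classes of an imbalanced dataset.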
For the dataset ASED (am), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 96.75 | 96.73 | 96.74 |
2 | WavLM large | 96.44 | 96.45 | 96.42 |
3 | HuBERT large | 96.19 | 96.19 | 96.17 |
4 | WavLM base | 94.27 | 94.31 | 94.29 |
5 | data2vec 2.0 large | 94.30 | 94.28 | 94.27 |
6 | HuBERT base | 94.17 | 94.13 | 94.13 |
7 | data2vec 2.0 base | 93.99 | 93.99 | 93.98 |
8 | wav2vec 2.0 base | 88.59 | 88.63 | 88.63 |
9 | data2vec large | 88.31 | 88.31 | 88.30 |
10 | data2vec base | 86.39 | 86.34 | 86.34 |
For the dataset ASVP-ESD (mix), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 61.14 | 71.52 | 62.08 |
2 | WavLM large | 56.31 | 65.91 | 56.83 |
3 | HuBERT large | 53.33 | 63.00 | 54.14 |
4 | data2vec 2.0 large | 52.18 | 62.35 | 52.64 |
5 | HuBERT base | 48.69 | 59.78 | 49.72 |
6 | wav2vec 2.0 base | 48.99 | 59.12 | 49.78 |
7 | WavLM base | 46.38 | 58.05 | 47.35 |
8 | data2vec large | 46.95 | 56.36 | 47.33 |
9 | data2vec 2.0 base | 46.00 | 57.57 | 46.62 |
10 | data2vec base | 37.66 | 50.79 | 38.26 |
For the dataset CaFE (fr), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 68.84 | 68.84 | 68.06 |
2 | WavLM large | 62.20 | 61.33 | 61.14 |
3 | HuBERT large | 59.50 | 58.73 | 58.22 |
4 | data2vec 2.0 large | 59.04 | 58.02 | 57.51 |
5 | HuBERT base | 54.16 | 54.16 | 53.36 |
6 | WavLM base | 52.71 | 52.33 | 51.66 |
7 | data2vec 2.0 base | 51.83 | 51.67 | 50.52 |
8 | wav2vec 2.0 base | 42.76 | 42.47 | 41.03 |
9 | data2vec base | 42.18 | 42.36 | 41.69 |
10 | data2vec large | 42.24 | 42.85 | 41.11 |
For the dataset CASIA (zh), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 59.58 | 59.58 | 56.27 |
2 | WavLM large | 52.12 | 52.12 | 46.55 |
3 | HuBERT base | 47.23 | 47.23 | 42.47 |
4 | WavLM base | 47.25 | 47.25 | 41.78 |
5 | data2vec 2.0 large | 45.57 | 45.57 | 41.46 |
6 | HuBERT large | 45.30 | 45.30 | 39.10 |
7 | data2vec 2.0 base | 43.31 | 43.31 | 38.90 |
8 | wav2vec 2.0 base | 39.56 | 39.56 | 34.86 |
9 | data2vec large | 37.65 | 37.65 | 33.50 |
10 | data2vec base | 34.72 | 34.72 | 30.88 |
For the dataset CREMA-D (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 76.75 | 76.48 | 76.60 |
2 | WavLM large | 74.50 | 74.32 | 74.39 |
3 | HuBERT large | 73.83 | 73.64 | 73.73 |
4 | HuBERT base | 71.13 | 70.98 | 71.00 |
5 | WavLM base | 69.64 | 69.49 | 69.54 |
6 | data2vec 2.0 large | 69.55 | 69.27 | 69.25 |
7 | data2vec 2.0 base | 65.74 | 65.48 | 65.47 |
8 | data2vec large | 63.80 | 63.51 | 63.48 |
9 | wav2vec 2.0 base | 61.95 | 61.90 | 61.75 |
10 | data2vec base | 58.03 | 57.78 | 57.73 |
For the dataset EMNS (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | WavLM large | 84.12 | 84.12 | 83.97 |
2 | HuBERT base | 75.83 | 76.25 | 75.70 |
3 | HuBERT large | 73.94 | 74.28 | 73.67 |
4 | Whisper large v3 | 70.58 | 70.58 | 69.31 |
5 | WavLM base | 69.46 | 69.71 | 69.24 |
6 | wav2vec 2.0 base | 65.14 | 65.27 | 64.80 |
7 | data2vec 2.0 large | 57.80 | 58.60 | 57.15 |
8 | data2vec large | 48.52 | 48.96 | 48.39 |
9 | data2vec 2.0 base | 47.83 | 48.39 | 47.21 |
10 | data2vec base | 34.33 | 34.58 | 33.12 |
For the dataset EmoDB (de), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | WavLM large | 92.58 | 92.67 | 92.57 |
2 | Whisper large v3 | 91.26 | 92.43 | 91.84 |
3 | HuBERT large | 89.81 | 90.26 | 89.86 |
4 | HuBERT base | 87.73 | 87.73 | 87.82 |
5 | WavLM base | 87.03 | 87.12 | 86.76 |
6 | wav2vec 2.0 base | 82.06 | 83.14 | 82.21 |
7 | data2vec 2.0 large | 79.36 | 80.41 | 79.96 |
8 | data2vec 2.0 base | 75.07 | 75.86 | 75.49 |
9 | data2vec large | 60.96 | 61.95 | 61.26 |
10 | data2vec base | 58.12 | 60.01 | 58.32 |
For the dataset EmoV-DB (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | WavLM large | 99.44 | 99.47 | 99.45 |
2 | HuBERT large | 99.36 | 99.40 | 99.37 |
3 | Whisper large v3 | 99.36 | 99.37 | 99.34 |
4 | HuBERT base | 98.72 | 98.77 | 98.71 |
5 | WavLM base | 98.38 | 98.49 | 98.39 |
6 | data2vec 2.0 large | 98.17 | 98.32 | 98.19 |
7 | wav2vec 2.0 base | 97.85 | 98.03 | 97.90 |
8 | data2vec 2.0 base | 95.81 | 96.09 | 95.80 |
9 | data2vec large | 94.47 | 94.93 | 94.58 |
10 | data2vec base | 93.26 | 93.61 | 93.23 |
For the dataset EMOVO (it), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 57.82 | 57.82 | 56.06 |
2 | WavLM large | 48.82 | 48.82 | 44.16 |
3 | data2vec 2.0 large | 45.88 | 45.88 | 43.63 |
4 | HuBERT base | 46.10 | 46.10 | 41.28 |
5 | HuBERT large | 45.74 | 45.74 | 40.56 |
6 | data2vec 2.0 base | 42.96 | 42.96 | 41.01 |
7 | WavLM base | 42.39 | 42.39 | 37.33 |
8 | data2vec large | 35.66 | 35.66 | 33.59 |
9 | data2vec base | 32.47 | 32.47 | 29.22 |
10 | wav2vec 2.0 base | 31.07 | 31.07 | 27.24 |
For the dataset Emozionalmente (it), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 76.91 | 76.91 | 76.90 |
2 | WavLM large | 75.00 | 75.00 | 74.97 |
3 | HuBERT large | 69.83 | 69.83 | 69.81 |
4 | HuBERT base | 66.30 | 66.30 | 66.26 |
5 | data2vec 2.0 large | 64.10 | 64.10 | 63.93 |
6 | WavLM base | 63.02 | 63.02 | 63.02 |
7 | wav2vec 2.0 base | 56.69 | 56.69 | 56.64 |
8 | data2vec 2.0 base | 56.22 | 56.22 | 56.00 |
9 | data2vec large | 53.92 | 53.92 | 53.66 |
10 | data2vec base | 48.97 | 48.97 | 48.77 |
For the dataset eNTERFACE (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 97.69 | 97.68 | 97.68 |
2 | data2vec 2.0 large | 94.02 | 94.02 | 94.01 |
3 | WavLM large | 92.43 | 92.42 | 92.40 |
4 | data2vec 2.0 base | 90.81 | 90.80 | 90.83 |
5 | data2vec large | 89.46 | 89.46 | 89.65 |
6 | WavLM base | 88.30 | 88.27 | 88.20 |
7 | HuBERT large | 88.19 | 88.17 | 88.14 |
8 | HuBERT base | 79.14 | 79.11 | 78.97 |
9 | wav2vec 2.0 base | 64.19 | 64.12 | 63.81 |
10 | data2vec base | 34.33 | 34.58 | 33.12 |
For the dataset ESD (mix), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 84.62 | 84.62 | 84.33 |
2 | WavLM large | 79.14 | 79.14 | 78.87 |
3 | data2vec 2.0 large | 76.81 | 76.81 | 76.43 |
4 | HuBERT large | 75.85 | 75.85 | 75.38 |
5 | data2vec 2.0 base | 73.40 | 73.40 | 73.10 |
6 | WavLM base | 72.90 | 72.90 | 72.55 |
7 | HuBERT base | 72.41 | 72.41 | 72.11 |
8 | data2vec large | 72.01 | 72.01 | 71.77 |
9 | wav2vec 2.0 base | 69.17 | 69.17 | 68.66 |
10 | data2vec base | 65.05 | 65.05 | 64.55 |
For the dataset IEMOCAP (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 73.54 | 72.86 | 73.11 |
2 | WavLM large | 69.47 | 69.07 | 69.29 |
3 | HuBERT large | 67.42 | 66.69 | 67.24 |
4 | HuBERT base | 63.87 | 63.10 | 63.45 |
5 | WavLM base | 62.92 | 63.94 | 63.40 |
6 | wav2vec 2.0 base | 58.27 | 57.73 | 57.83 |
7 | data2vec 2.0 large | 57.30 | 56.23 | 56.70 |
8 | data2vec 2.0 base | 54.40 | 53.19 | 53.71 |
9 | data2vec base | 54.19 | 53.20 | 53.76 |
10 | data2vec large | 52.56 | 51.11 | 51.71 |
For the dataset JL-Corpus (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 66.71 | 66.71 | 65.19 |
2 | data2vec 2.0 large | 65.14 | 65.14 | 63.49 |
3 | WavLM large | 60.86 | 60.86 | 57.34 |
4 | HuBERT large | 56.62 | 56.62 | 52.77 |
5 | data2vec 2.0 base | 54.04 | 54.04 | 52.85 |
6 | WavLM base | 53.79 | 53.79 | 52.36 |
7 | HuBERT base | 51.11 | 51.11 | 50.56 |
8 | data2vec base | 49.29 | 49.29 | 47.86 |
9 | data2vec large | 48.87 | 48.87 | 46.92 |
10 | wav2vec 2.0 base | 45.27 | 45.27 | 40.73 |
For the dataset M3ED (zh), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 32.84 | 49.42 | 33.76 |
2 | WavLM large | 26.58 | 44.86 | 26.98 |
3 | HuBERT large | 23.25 | 44.49 | 23.28 |
4 | data2vec 2.0 large | 23.82 | 43.02 | 23.98 |
5 | HuBERT base | 23.80 | 42.55 | 24.03 |
6 | wav2vec 2.0 base | 23.13 | 43.20 | 22.91 |
7 | WavLM base | 22.76 | 42.79 | 22.03 |
8 | data2vec 2.0 base | 22.82 | 41.42 | 22.89 |
9 | data2vec large | 20.20 | 38.73 | 20.26 |
10 | data2vec base | 19.44 | 37.32 | 19.24 |
For the dataset MEAD (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | WavLM large | 81.27 | 82.03 | 81.43 |
2 | HuBERT large | 76.87 | 77.84 | 77.12 |
3 | Whisper large v3 | 76.35 | 77.34 | 76.55 |
4 | HuBERT base | 74.76 | 75.71 | 74.92 |
5 | data2vec 2.0 large | 74.13 | 75.24 | 74.43 |
6 | wav2vec 2.0 base | 72.17 | 73.57 | 72.32 |
7 | WavLM base | 71.85 | 72.86 | 72.14 |
8 | data2vec 2.0 base | 69.66 | 70.64 | 69.90 |
9 | data2vec large | 67.57 | 68.44 | 67.88 |
10 | data2vec base | 65.36 | 66.05 | 65.53 |
For the dataset MELD (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 31.54 | 51.89 | 32.95 |
2 | WavLM large | 28.18 | 49.31 | 29.11 |
3 | data2vec 2.0 large | 26.33 | 47.72 | 27.35 |
4 | data2vec 2.0 base | 24.79 | 46.65 | 25.28 |
5 | HuBERT large | 24.13 | 46.37 | 24.99 |
6 | data2vec base | 23.82 | 45.57 | 24.37 |
7 | HuBERT base | 23.53 | 45.47 | 24.29 |
8 | data2vec large | 23.35 | 45.74 | 24.10 |
9 | WavLM base | 23.44 | 44.71 | 24.25 |
10 | wav2vec 2.0 base | 20.06 | 45.17 | 20.04 |
For the dataset MER2023 (zh), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 61.22 | 65.23 | 62.29 |
2 | WavLM large | 48.17 | 54.77 | 49.36 |
3 | HuBERT large | 43.96 | 50.49 | 44.45 |
4 | HuBERT base | 42.56 | 49.80 | 42.77 |
5 | WavLM base | 41.80 | 48.71 | 41.97 |
6 | data2vec 2.0 base | 42.59 | 46.64 | 43.22 |
7 | data2vec 2.0 large | 42.05 | 46.81 | 42.08 |
8 | wav2vec 2.0 base | 40.40 | 46.78 | 40.73 |
9 | data2vec base | 37.94 | 43.06 | 38.15 |
10 | data2vec large | 35.14 | 40.27 | 34.28 |
For the dataset MESD (es), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 69.78 | 69.67 | 69.64 |
2 | wav2vec 2.0 base | 62.93 | 62.89 | 62.85 |
3 | WavLM large | 62.54 | 62.49 | 62.33 |
4 | HuBERT large | 53.71 | 53.67 | 53.67 |
5 | data2vec 2.0 large | 48.46 | 48.45 | 46.80 |
6 | HuBERT base | 47.52 | 47.48 | 46.33 |
7 | data2vec 2.0 base | 44.86 | 44.85 | 43.60 |
8 | WavLM base | 43.58 | 43.52 | 42.94 |
9 | data2vec large | 36.67 | 36.61 | 35.81 |
10 | data2vec base | 34.37 | 34.35 | 33.24 |
For the dataset MSP-Podcast (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 22.24 | 44.10 | 22.12 |
2 | WavLM large | 18.60 | 40.47 | 17.97 |
3 | data2vec 2.0 large | 18.35 | 41.33 | 17.07 |
4 | HuBERT large | 18.07 | 40.02 | 17.20 |
5 | data2vec 2.0 base | 16.79 | 39.97 | 15.33 |
6 | HuBERT base | 16.97 | 38.70 | 15.89 |
7 | data2vec large | 17.24 | 37.45 | 16.86 |
8 | WavLM base | 17.11 | 37.38 | 16.53 |
9 | data2vec base | 16.19 | 37.15 | 15.49 |
10 | wav2vec 2.0 base | 15.50 | 37.48 | 13.80 |
For the dataset Oreau (fr), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 84.48 | 84.79 | 85.01 |
2 | WavLM large | 65.54 | 66.29 | 65.67 |
3 | data2vec 2.0 large | 64.64 | 64.88 | 64.50 |
4 | HuBERT large | 63.69 | 64.35 | 63.66 |
5 | data2vec 2.0 base | 59.40 | 59.73 | 58.13 |
6 | WavLM base | 57.78 | 58.06 | 57.45 |
7 | data2vec base | 54.84 | 55.67 | 54.76 |
8 | HuBERT base | 51.98 | 52.80 | 51.89 |
9 | data2vec large | 48.29 | 49.12 | 48.21 |
10 | wav2vec 2.0 base | 40.14 | 41.06 | 39.23 |
For the dataset PAVOQUE (de), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | WavLM large | 87.73 | 93.40 | 88.43 |
2 | Whisper large v3 | 87.72 | 93.17 | 88.41 |
3 | HuBERT large | 87.04 | 92.76 | 87.94 |
4 | HuBERT base | 86.12 | 92.19 | 87.06 |
5 | data2vec 2.0 large | 85.38 | 92.07 | 86.75 |
6 | wav2vec 2.0 base | 84.95 | 91.23 | 86.09 |
7 | data2vec 2.0 base | 78.30 | 87.82 | 80.92 |
8 | data2vec base | 74.94 | 85.11 | 76.63 |
9 | data2vec base | 74.94 | 85.11 | 76.63 |
10 | data2vec large | 73.26 | 84.89 | 75.71 |
For the dataset Polish (pl), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 83.27 | 83.27 | 82.77 |
2 | WavLM large | 79.29 | 79.29 | 79.02 |
3 | data2vec 2.0 base | 75.75 | 75.75 | 75.52 |
4 | data2vec 2.0 large | 74.00 | 74.00 | 74.05 |
5 | data2vec base | 71.31 | 71.31 | 70.65 |
6 | HuBERT large | 70.20 | 70.20 | 70.74 |
7 | WavLM base | 69.31 | 69.31 | 69.46 |
8 | HuBERT base | 69.40 | 69.40 | 69.08 |
9 | data2vec large | 68.05 | 68.05 | 67.07 |
10 | wav2vec 2.0 base | 67.35 | 67.35 | 66.76 |
For the dataset RAVDESS (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 75.32 | 75.87 | 75.19 |
2 | WavLM large | 72.00 | 72.22 | 71.42 |
3 | data2vec 2.0 large | 71.15 | 71.63 | 70.94 |
4 | HuBERT large | 70.00 | 70.29 | 69.54 |
5 | HuBERT base | 65.43 | 66.21 | 65.31 |
6 | data2vec 2.0 base | 64.66 | 64.84 | 64.18 |
7 | WavLM base | 61.56 | 62.10 | 61.18 |
8 | data2vec large | 59.30 | 59.50 | 58.74 |
9 | wav2vec 2.0 base | 54.33 | 55.40 | 53.99 |
10 | data2vec base | 51.92 | 52.22 | 51.10 |
For the dataset RESD (ru), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | WavLM large | 55.87 | 56.47 | 55.82 |
2 | Whisper large v3 | 54.98 | 55.54 | 54.99 |
3 | wav2vec 2.0 base | 52.82 | 53.39 | 52.90 |
4 | HuBERT base | 51.56 | 52.34 | 51.65 |
5 | HuBERT large | 50.82 | 51.51 | 50.74 |
6 | WavLM base | 44.81 | 45.09 | 44.97 |
7 | data2vec 2.0 large | 44.08 | 44.64 | 44.25 |
8 | data2vec 2.0 base | 43.54 | 44.03 | 43.00 |
9 | data2vec base | 37.09 | 37.90 | 36.86 |
10 | data2vec large | 30.78 | 31.57 | 30.81 |
For the dataset SAVEE (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | data2vec 2.0 large | 75.75 | 78.59 | 78.24 |
2 | data2vec base | 75.65 | 78.25 | 78.38 |
3 | Whisper large v3 | 74.07 | 77.24 | 75.31 |
4 | HuBERT large | 71.91 | 75.05 | 71.83 |
5 | data2vec large | 68.01 | 71.45 | 71.08 |
6 | WavLM large | 66.74 | 70.80 | 66.37 |
7 | WavLM base | 63.57 | 67.05 | 62.83 |
8 | HuBERT base | 58.90 | 59.85 | 64.05 |
9 | data2vec base | 51.92 | 52.22 | 51.10 |
10 | wav2vec 2.0 base | 44.89 | 49.83 | 42.07 |
For the dataset ShEMO (fa), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 80.23 | 89.55 | 82.94 |
2 | WavLM large | 71.72 | 87.13 | 73.55 |
3 | data2vec 2.0 large | 64.09 | 82.68 | 68.47 |
4 | HuBERT large | 64.29 | 83.35 | 66.26 |
5 | data2vec 2.0 base | 60.59 | 79.03 | 64.03 |
6 | HuBERT base | 58.31 | 81.17 | 63.15 |
7 | WavLM base | 60.73 | 78.76 | 62.06 |
8 | wav2vec 2.0 base | 56.34 | 78.96 | 57.34 |
9 | data2vec large | 56.42 | 74.09 | 60.98 |
10 | data2vec base | 47.61 | 70.07 | 49.49 |
For the dataset SUBESCO (bn), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 73.05 | 73.05 | 72.94 |
2 | data2vec 2.0 large | 66.45 | 66.45 | 66.25 |
3 | WavLM large | 65.33 | 65.33 | 65.09 |
4 | HuBERT large | 64.53 | 64.53 | 64.33 |
5 | data2vec 2.0 base | 57.97 | 57.97 | 57.69 |
6 | HuBERT base | 57.89 | 57.89 | 57.64 |
7 | WavLM base | 57.06 | 57.06 | 56.78 |
8 | data2vec large | 53.06 | 53.06 | 52.62 |
9 | wav2vec 2.0 base | 51.25 | 51.25 | 50.91 |
10 | data2vec base | 46.22 | 46.22 | 45.80 |
For the dataset TESS (en), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 99.96 | 99.96 | 99.96 |
2 | HuBERT large | 99.86 | 99.86 | 99.86 |
3 | WavLM large | 99.78 | 99.78 | 99.78 |
4 | HuBERT base | 99.62 | 99.62 | 99.62 |
5 | data2vec 2.0 large | 99.54 | 99.54 | 99.54 |
6 | WavLM base | 99.10 | 99.10 | 99.10 |
7 | data2vec 2.0 base | 98.55 | 98.55 | 98.55 |
8 | wav2vec 2.0 base | 97.92 | 97.92 | 97.92 |
9 | data2vec large | 96.89 | 96.89 | 96.89 |
10 | data2vec base | 94.67 | 94.67 | 94.65 |
For the dataset TurEV-DB (tr), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | Whisper large v3 | 81.32 | 81.58 | 81.31 |
2 | WavLM large | 79.50 | 80.09 | 79.51 |
3 | HuBERT base | 72.81 | 73.26 | 72.89 |
4 | HuBERT large | 70.93 | 72.08 | 70.96 |
5 | WavLM base | 69.97 | 70.51 | 70.18 |
6 | wav2vec 2.0 base | 67.19 | 68.01 | 67.19 |
7 | data2vec 2.0 large | 63.77 | 64.13 | 63.00 |
8 | data2vec 2.0 base | 63.44 | 63.69 | 63.37 |
9 | data2vec large | 54.69 | 55.11 | 54.56 |
10 | data2vec base | 51.62 | 52.58 | 51.52 |
For the dataset URDU (ur), we report the following results:
Rank | Model | UA | WA | F1 |
---|---|---|---|---|
1 | HuBERT base | 88.41 | 88.41 | 88.40 |
2 | wav2vec 2.0 base | 87.50 | 87.50 | 87.57 |
3 | WavLM large | 86.61 | 86.61 | 86.64 |
4 | WavLM base | 82.82 | 82.82 | 82.85 |
5 | Whisper large v3 | 82.52 | 82.52 | 82.41 |
6 | HuBERT large | 81.75 | 81.75 | 81.66 |
7 | data2vec 2.0 large | 78.10 | 78.10 | 78.10 |
8 | data2vec 2.0 base | 69.97 | 69.97 | 69.91 |
9 | data2vec base | 65.42 | 65.42 | 65.35 |
10 | data2vec large | 64.81 | 64.81 | 64.93 |