If you would like to report your results here, please follow the instructions in the EmoBox GitHub repository.
The EmoBox leaderboard compiles results from intra-corpus models that can be applied to all 32 emotion datasets. For models that work on a single dataset, please use the tabs below to navigate to the corresponding leaderboard.
Intra-corpus speech emotion recognition (SER) results of pre-trained speech models on 32 emotion datasets spanning 14 distinct languages, using the EmoBox data partitioning. Each cell is the weighted average accuracy (WA, %) on that dataset, and the models are ranked by their mean WA across datasets.
Model | AESDD | ASED | ASVP-ESD | CaFE | CASIA | CREMA-D | EMNS | EmoDB | EmoV-DB | EMOVO | Emozionalmente | eNTERFACE | ESD | IEMOCAP | JL-Corpus | M3ED | MEAD | MELD | MER2023 | MESD | MSP-Podcast | Oreau | PAVOQUE | Polish | RAVDESS | RESD | SAVEE | ShEMO | SUBESCO | TESS | TurEV-DB | URDU |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Whisper large v3 | 79.18 | 96.73 | 71.52 | 68.84 | 59.58 | 76.48 | 70.58 | 92.43 | 99.37 | 57.82 | 76.91 | 97.68 | 84.62 | 72.86 | 66.71 | 49.42 | 77.34 | 51.89 | 65.23 | 69.67 | 44.10 | 84.79 | 93.40 | 83.27 | 75.87 | 55.54 | 77.24 | 89.55 | 73.05 | 99.96 | 81.58 | 82.52 | |
WavLM large | 84.49 | 96.45 | 65.91 | 61.33 | 52.12 | 74.32 | 84.12 | 92.67 | 99.47 | 48.82 | 75.00 | 92.42 | 79.14 | 69.07 | 60.86 | 44.86 | 82.03 | 49.31 | 54.77 | 62.49 | 40.47 | 66.29 | 93.40 | 79.29 | 72.22 | 56.47 | 70.80 | 87.13 | 65.33 | 99.78 | 80.09 | 86.61 | |
HuBERT large | 78.90 | 96.19 | 63.00 | 58.73 | 45.30 | 73.64 | 74.28 | 90.26 | 99.40 | 45.74 | 69.83 | 88.17 | 75.85 | 66.69 | 56.62 | 44.49 | 77.84 | 46.37 | 50.49 | 53.67 | 40.02 | 64.35 | 92.76 | 70.20 | 70.29 | 51.51 | 75.05 | 83.35 | 64.53 | 99.86 | 72.08 | 81.75 | |
HuBERT base | 82.35 | 94.13 | 59.78 | 54.16 | 47.23 | 70.98 | 76.25 | 87.73 | 98.77 | 46.10 | 66.30 | 79.11 | 72.41 | 63.10 | 51.11 | 42.55 | 75.71 | 45.47 | 49.80 | 47.48 | 38.70 | 52.80 | 92.19 | 69.40 | 66.21 | 52.34 | 59.85 | 81.17 | 57.89 | 99.62 | 73.26 | 88.41 | |
WavLM base | 79.08 | 94.31 | 58.05 | 52.33 | 47.25 | 69.49 | 69.71 | 87.12 | 98.49 | 42.39 | 63.02 | 88.27 | 72.90 | 63.94 | 53.79 | 42.79 | 72.86 | 44.71 | 48.71 | 43.52 | 37.38 | 58.06 | 90.84 | 69.31 | 62.10 | 45.09 | 67.05 | 78.76 | 57.06 | 99.10 | 70.51 | 82.82 | |
data2vec 2.0 large | 72.34 | 94.28 | 62.35 | 58.02 | 45.57 | 69.27 | 58.60 | 80.41 | 98.32 | 45.88 | 64.10 | 94.02 | 76.81 | 56.23 | 65.14 | 43.02 | 75.24 | 47.72 | 46.81 | 48.45 | 41.33 | 64.88 | 92.07 | 74.00 | 71.63 | 44.64 | 78.59 | 82.68 | 66.45 | 99.54 | 64.13 | 78.10 | |
data2vec 2.0 base | 46.68 | 93.99 | 57.57 | 51.67 | 43.31 | 65.48 | 48.39 | 75.86 | 96.09 | 42.96 | 56.22 | 90.80 | 73.40 | 53.19 | 54.04 | 41.42 | 70.64 | 46.65 | 46.64 | 44.85 | 39.97 | 59.73 | 87.82 | 75.75 | 64.84 | 44.03 | 75.50 | 79.03 | 57.97 | 98.55 | 63.69 | 69.97 | |
data2vec large | 45.69 | 88.31 | 56.36 | 42.85 | 37.65 | 63.51 | 48.96 | 61.95 | 94.93 | 35.66 | 53.92 | 89.46 | 72.01 | 51.11 | 48.87 | 38.73 | 68.44 | 45.74 | 40.27 | 36.61 | 37.45 | 49.12 | 84.89 | 68.05 | 59.50 | 31.57 | 71.45 | 74.09 | 53.06 | 96.89 | 55.11 | 64.81 | |
data2vec base | 50.03 | 86.34 | 50.79 | 42.36 | 34.72 | 57.78 | 34.58 | 60.01 | 93.61 | 32.47 | 48.97 | 91.62 | 65.05 | 53.20 | 49.29 | 37.32 | 66.05 | 45.57 | 43.06 | 34.35 | 37.15 | 55.67 | 85.11 | 71.31 | 52.22 | 37.90 | 78.25 | 70.07 | 46.22 | 94.67 | 52.58 | 65.42 | |
wav2vec 2.0 base | 67.65 | 88.63 | 59.12 | 42.47 | 39.56 | 61.90 | 65.27 | 83.14 | 98.03 | 31.07 | 56.69 | 64.19 | 69.17 | 57.73 | 45.27 | 43.20 | 73.57 | 45.17 | 46.78 | 62.89 | 37.15 | 41.06 | 91.23 | 67.35 | 55.40 | 53.39 | 49.83 | 78.96 | 51.25 | 97.92 | 68.01 | 87.50 |
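
The ranking criterion described above (mean WA across datasets) can be sketched as follows. This is a minimal illustration, not the leaderboard's own code: the names `wa_scores` and `rank_by_mean_wa` are hypothetical, and only the first three dataset columns of three models are copied in, so the means differ from those over all 32 datasets.

```python
from statistics import mean

# WA (%) values copied from the AESDD, ASED and ASVP-ESD columns of the table above.
# Assumption: a three-dataset subset is used purely for illustration.
wa_scores = {
    "HuBERT large": {"AESDD": 78.90, "ASED": 96.19, "ASVP-ESD": 63.00},
    "Whisper large v3": {"AESDD": 79.18, "ASED": 96.73, "ASVP-ESD": 71.52},
    "WavLM large": {"AESDD": 84.49, "ASED": 96.45, "ASVP-ESD": 65.91},
}


def rank_by_mean_wa(scores):
    """Return (model, mean WA) pairs sorted from highest to lowest mean WA."""
    means = {model: mean(per_dataset.values()) for model, per_dataset in scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_mean_wa(wa_scores)
for position, (model, mean_wa) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {mean_wa:.2f}")
```

A plain unweighted mean gives every dataset the same influence regardless of its size, which is one reasonable reading of "mean WA"; a size-weighted mean would be a different design choice.
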
For the dataset AESDD (el), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
WavLM large | 84.40 | 84.49 | 84.19 | |
Whisper large v3 | 79.13 | 79.18 | 79.13 | |
HuBERT base | 82.30 | 82.35 | 82.36 | |
WavLM base | 78.99 | 79.08 | 78.78 | |
HuBERT large | 78.85 | 78.90 | 78.88 | |
data2vec 2.0 large | 72.26 | 72.34 | 71.82 | |
wav2vec 2.0 base | 67.59 | 67.65 | 67.61 | |
data2vec base | 49.96 | 50.03 | 49.30 | |
data2vec 2.0 base | 46.55 | 46.68 | 45.33 | |
data2vec large | 45.63 | 45.69 | 44.65 |
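
The per-dataset tables report UA, WA, and F1. As a rough sketch of how these three numbers relate, the snippet below computes them under the common SER convention, which is an assumption here rather than something this page states: WA is overall accuracy, UA is the unweighted mean of per-class recall, and F1 is the macro-averaged F1 score. The function name `ser_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def ser_metrics(y_true, y_pred):
    """Compute UA, WA and macro F1 (all in %) from reference and predicted labels.

    Assumed definitions: WA = overall accuracy, UA = mean per-class recall,
    F1 = unweighted mean of per-class F1.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)

    # WA: fraction of utterances classified correctly.
    wa = sum(true_pos.values()) / len(y_true)

    # UA: recall averaged over the classes that occur in the references.
    recalls = [true_pos[c] / true_count[c] for c in labels if true_count[c] > 0]
    ua = sum(recalls) / len(recalls)

    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (#reference + #predicted).
    f1_scores = [2 * true_pos[c] / (true_count[c] + pred_count[c]) for c in labels]
    macro_f1 = sum(f1_scores) / len(f1_scores)

    return {"UA": 100 * ua, "WA": 100 * wa, "F1": 100 * macro_f1}


# Toy, class-imbalanced example (not taken from any dataset on this page).
y_true = ["angry", "angry", "angry", "happy", "sad", "sad"]
y_pred = ["angry", "angry", "happy", "happy", "sad", "angry"]
metrics = ser_metrics(y_true, y_pred)
print({name: round(value, 2) for name, value in metrics.items()})
```

Under these definitions UA and WA coincide when every class has the same number of test utterances and diverge on imbalanced data, which matches the pattern in the tables: several datasets show identical UA and WA, while others show a large gap.
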
For the dataset ASED (am), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 96.75 | 96.73 | 96.74 | |
WavLM large | 96.44 | 96.45 | 96.42 | |
HuBERT large | 96.19 | 96.19 | 96.17 | |
WavLM base | 94.27 | 94.31 | 94.29 | |
HuBERT base | 94.17 | 94.13 | 94.13 | |
data2vec 2.0 large | 94.30 | 94.28 | 94.27 | |
data2vec 2.0 base | 93.99 | 93.99 | 93.98 | |
wav2vec 2.0 base | 88.59 | 88.63 | 88.63 | |
data2vec large | 88.31 | 88.31 | 88.30 | |
data2vec base | 86.39 | 86.34 | 86.34 |
For the dataset ASVP-ESD (mix), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 61.14 | 71.52 | 62.08 | |
WavLM large | 56.31 | 65.91 | 56.83 | |
HuBERT large | 53.33 | 63.00 | 54.14 | |
WavLM base | 46.38 | 58.05 | 47.35 | |
HuBERT base | 48.69 | 59.78 | 49.72 | |
data2vec 2.0 large | 52.18 | 62.35 | 52.64 | |
data2vec 2.0 base | 46.00 | 57.57 | 46.62 | |
data2vec large | 46.95 | 56.36 | 47.33 | |
wav2vec 2.0 base | 48.99 | 59.12 | 49.78 | |
data2vec base | 37.66 | 50.79 | 38.26 |
For the dataset CaFE (fr), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 68.84 | 68.84 | 68.06 | |
WavLM large | 62.20 | 61.33 | 61.14 | |
HuBERT large | 59.50 | 58.73 | 58.22 | |
WavLM base | 52.71 | 52.33 | 51.66 | |
HuBERT base | 54.16 | 54.16 | 53.36 | |
data2vec 2.0 large | 59.04 | 58.02 | 57.51 | |
data2vec 2.0 base | 51.83 | 51.67 | 50.52 | |
data2vec base | 42.18 | 42.36 | 41.69 | |
wav2vec 2.0 base | 42.76 | 42.47 | 41.03 | |
data2vec large | 42.24 | 42.85 | 41.11 |
For the dataset CASIA (zh), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 59.58 | 59.58 | 56.27 | |
WavLM large | 52.12 | 52.12 | 46.55 | |
WavLM base | 47.25 | 47.25 | 41.78 | |
HuBERT base | 47.23 | 47.23 | 42.47 | |
HuBERT large | 45.30 | 45.30 | 39.10 | |
data2vec 2.0 base | 43.31 | 43.31 | 38.90 | |
data2vec 2.0 large | 45.57 | 45.57 | 41.46 | |
data2vec large | 37.65 | 37.65 | 33.50 | |
wav2vec 2.0 base | 39.56 | 39.56 | 34.86 | |
data2vec base | 34.72 | 34.72 | 30.88 |
For the dataset CREMA-D (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 76.75 | 76.48 | 76.60 | |
WavLM large | 74.50 | 74.32 | 74.39 | |
HuBERT large | 73.83 | 73.64 | 73.73 | |
HuBERT base | 71.13 | 70.98 | 71.00 | |
WavLM base | 69.64 | 69.49 | 69.54 | |
data2vec 2.0 large | 69.55 | 69.27 | 69.25 | |
data2vec 2.0 base | 65.74 | 65.48 | 65.47 | |
data2vec large | 63.80 | 63.51 | 63.48 | |
wav2vec 2.0 base | 61.95 | 61.90 | 61.75 | |
data2vec base | 58.03 | 57.78 | 57.73 |
For the dataset EMNS (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 70.58 | 70.58 | 69.31 | |
WavLM large | 84.12 | 84.12 | 83.97 | |
HuBERT base | 75.83 | 76.25 | 75.70 | |
HuBERT large | 73.94 | 74.28 | 73.67 | |
WavLM base | 69.46 | 69.71 | 69.24 | |
data2vec 2.0 large | 57.80 | 58.60 | 57.15 | |
data2vec 2.0 base | 47.83 | 48.39 | 47.21 | |
data2vec large | 48.52 | 48.96 | 48.39 | |
wav2vec 2.0 base | 65.14 | 65.27 | 64.80 | |
data2vec base | 34.33 | 34.58 | 33.12 |
For the dataset EmoDB (de), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 91.26 | 92.43 | 91.84 | |
WavLM large | 92.58 | 92.67 | 92.57 | |
HuBERT large | 89.81 | 90.26 | 89.86 | |
WavLM base | 87.03 | 87.12 | 86.76 | |
HuBERT base | 87.73 | 87.73 | 87.82 | |
data2vec 2.0 large | 79.36 | 80.41 | 79.96 | |
data2vec 2.0 base | 75.07 | 75.86 | 75.49 | |
wav2vec 2.0 base | 82.06 | 83.14 | 82.21 | |
data2vec large | 60.96 | 61.95 | 61.26 | |
data2vec base | 58.12 | 60.01 | 58.32 |
For the dataset EmoV-DB (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
WavLM large | 99.44 | 99.47 | 99.45 | |
Whisper large v3 | 99.36 | 99.37 | 99.34 | |
HuBERT large | 99.36 | 99.40 | 99.37 | |
HuBERT base | 98.72 | 98.77 | 98.71 | |
WavLM base | 98.38 | 98.49 | 98.39 | |
data2vec 2.0 large | 98.17 | 98.32 | 98.19 | |
data2vec 2.0 base | 95.81 | 96.09 | 95.80 | |
wav2vec 2.0 base | 97.85 | 98.03 | 97.90 | |
data2vec large | 94.47 | 94.93 | 94.58 | |
data2vec base | 93.26 | 93.61 | 93.23 |
For the dataset EMOVO (it), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 57.82 | 57.82 | 56.06 | |
WavLM large | 48.82 | 48.82 | 44.16 | |
data2vec 2.0 large | 45.88 | 45.88 | 43.63 | |
WavLM base | 42.39 | 42.39 | 37.33 | |
HuBERT base | 46.10 | 46.10 | 41.28 | |
HuBERT large | 45.74 | 45.74 | 40.56 | |
data2vec 2.0 base | 42.96 | 42.96 | 41.01 | |
wav2vec 2.0 base | 31.07 | 31.07 | 27.24 | |
data2vec large | 35.66 | 35.66 | 33.59 | |
data2vec base | 32.47 | 32.47 | 29.22 |
For the dataset Emozionalmente (it), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 76.91 | 76.91 | 76.90 | |
WavLM large | 75.00 | 75.00 | 74.97 | |
HuBERT large | 69.83 | 69.83 | 69.81 | |
HuBERT base | 66.30 | 66.30 | 66.26 | |
WavLM base | 63.02 | 63.02 | 63.02 | |
data2vec 2.0 large | 64.10 | 64.10 | 63.93 | |
data2vec 2.0 base | 56.22 | 56.22 | 56.00 | |
wav2vec 2.0 base | 56.69 | 56.69 | 56.64 | |
data2vec large | 53.92 | 53.92 | 53.66 | |
data2vec base | 48.97 | 48.97 | 48.77 |
For the dataset eNTERFACE (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 97.69 | 97.68 | 97.68 | |
data2vec 2.0 large | 94.02 | 94.02 | 94.01 | |
WavLM large | 92.43 | 92.42 | 92.40 | |
HuBERT large | 88.19 | 88.17 | 88.14 | |
WavLM base | 88.30 | 88.27 | 88.20 | |
HuBERT base | 79.14 | 79.11 | 78.97 | |
data2vec large | 89.46 | 89.46 | 89.65 | |
data2vec 2.0 base | 90.81 | 90.80 | 90.83 | |
wav2vec 2.0 base | 64.19 | 64.12 | 63.81 | |
data2vec base | 34.33 | 34.58 | 33.12 |
For the dataset ESD (mix), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 84.62 | 84.62 | 84.33 | |
WavLM large | 79.14 | 79.14 | 78.87 | |
data2vec 2.0 large | 76.81 | 76.81 | 76.43 | |
HuBERT large | 75.85 | 75.85 | 75.38 | |
data2vec 2.0 base | 73.40 | 73.40 | 73.10 | |
HuBERT base | 72.41 | 72.41 | 72.11 | |
WavLM base | 72.90 | 72.90 | 72.55 | |
data2vec large | 72.01 | 72.01 | 71.77 | |
wav2vec 2.0 base | 69.17 | 69.27 | 68.66 | |
data2vec base | 65.05 | 65.05 | 64.55 |
For the dataset IEMOCAP (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 73.54 | 72.86 | 73.11 | |
WavLM large | 69.47 | 69.07 | 69.29 | |
HuBERT large | 67.42 | 66.69 | 67.24 | |
WavLM base | 62.92 | 63.94 | 63.40 | |
HuBERT base | 63.87 | 63.10 | 63.45 | |
data2vec 2.0 large | 57.30 | 56.23 | 56.70 | |
data2vec 2.0 base | 54.40 | 53.19 | 53.71 | |
data2vec large | 52.56 | 51.11 | 51.71 | |
wav2vec 2.0 base | 58.27 | 57.73 | 57.83 | |
data2vec base | 54.19 | 53.20 | 53.76 |
For the dataset JL-Corpus (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 66.71 | 66.71 | 65.19 | |
data2vec 2.0 large | 65.14 | 65.14 | 63.49 | |
WavLM large | 60.86 | 60.86 | 57.34 | |
HuBERT large | 56.62 | 56.62 | 52.77 | |
WavLM base | 53.79 | 53.79 | 52.36 | |
HuBERT base | 51.11 | 51.11 | 50.56 | |
data2vec 2.0 base | 54.04 | 54.04 | 52.85 | |
data2vec large | 48.87 | 48.87 | 46.92 | |
wav2vec 2.0 base | 45.27 | 45.27 | 40.73 | |
data2vec base | 49.29 | 49.29 | 47.86 |
For the dataset M3ED (zh), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 32.84 | 49.42 | 33.76 | |
WavLM large | 26.58 | 44.86 | 26.98 | |
HuBERT large | 23.25 | 44.49 | 23.28 | |
WavLM base | 22.76 | 42.79 | 22.03 | |
HuBERT base | 23.80 | 42.55 | 24.03 | |
data2vec 2.0 large | 23.82 | 43.02 | 23.98 | |
data2vec 2.0 base | 22.82 | 41.42 | 22.89 | |
data2vec large | 20.20 | 38.73 | 20.26 | |
wav2vec 2.0 base | 23.13 | 43.20 | 22.91 | |
data2vec base | 19.44 | 37.32 | 19.24 |
For the dataset MEAD (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
WavLM large | 81.27 | 82.03 | 81.43 | |
Whisper large v3 | 76.35 | 77.34 | 76.55 | |
HuBERT large | 76.87 | 77.84 | 77.12 | |
HuBERT base | 74.76 | 75.71 | 74.92 | |
data2vec 2.0 large | 74.13 | 75.24 | 74.43 | |
data2vec 2.0 base | 69.66 | 70.64 | 69.90 | |
data2vec large | 67.57 | 68.44 | 67.88 | |
wav2vec 2.0 base | 72.17 | 73.57 | 72.32 | |
WavLM base | 71.85 | 72.86 | 72.14 | |
data2vec base | 65.36 | 66.05 | 65.53 |
For the dataset MELD (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 31.54 | 51.89 | 32.95 | |
WavLM large | 28.18 | 49.31 | 29.11 | |
data2vec 2.0 large | 26.33 | 47.72 | 27.35 | |
data2vec 2.0 base | 24.79 | 46.65 | 25.28 | |
HuBERT large | 24.13 | 46.37 | 24.99 | |
WavLM base | 23.44 | 44.71 | 24.25 | |
HuBERT base | 23.53 | 45.47 | 24.29 | |
data2vec large | 23.35 | 45.74 | 24.10 | |
wav2vec 2.0 base | 20.06 | 45.17 | 20.04 | |
data2vec base | 23.82 | 45.57 | 24.37 |
For the dataset MER2023 (zh), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 61.22 | 65.23 | 62.29 | |
WavLM large | 48.17 | 54.77 | 49.36 | |
HuBERT large | 43.96 | 50.49 | 44.45 | |
HuBERT base | 42.56 | 49.80 | 42.77 | |
data2vec 2.0 large | 42.05 | 46.81 | 42.08 | |
data2vec 2.0 base | 42.59 | 46.64 | 43.22 | |
wav2vec 2.0 base | 40.40 | 46.78 | 40.73 | |
data2vec large | 35.14 | 40.27 | 34.28 | |
WavLM base | 41.80 | 48.71 | 41.97 | |
data2vec base | 37.94 | 43.06 | 38.15 |
For the dataset MESD (es), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 69.78 | 69.67 | 69.64 | |
WavLM large | 62.54 | 62.49 | 62.33 | |
wav2vec 2.0 base | 62.93 | 62.89 | 62.85 | |
data2vec 2.0 large | 48.46 | 48.45 | 46.80 | |
WavLM base | 43.58 | 43.52 | 42.94 | |
HuBERT large | 53.71 | 53.67 | 53.67 | |
HuBERT base | 47.52 | 47.48 | 46.33 | |
data2vec large | 36.67 | 36.61 | 35.81 | |
data2vec 2.0 base | 44.86 | 44.85 | 43.60 | |
data2vec base | 34.37 | 34.35 | 33.24 |
For the dataset MSP-Podcast (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 22.24 | 44.10 | 22.12 | |
data2vec 2.0 large | 18.35 | 41.33 | 17.07 | |
WavLM large | 18.60 | 40.47 | 17.97 | |
HuBERT large | 18.07 | 40.02 | 17.20 | |
HuBERT base | 16.97 | 38.70 | 15.89 | |
WavLM base | 17.11 | 37.38 | 16.53 | |
data2vec large | 17.24 | 37.45 | 16.86 | |
data2vec 2.0 base | 16.79 | 39.97 | 15.33 | |
wav2vec 2.0 base | 15.50 | 37.48 | 13.80 | |
data2vec base | 16.19 | 37.15 | 15.49 |
For the dataset Oreau (fr), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 84.48 | 84.79 | 85.01 | |
WavLM large | 65.54 | 66.29 | 65.67 | |
data2vec 2.0 large | 64.64 | 64.88 | 64.50 | |
HuBERT large | 63.69 | 64.35 | 63.66 | |
WavLM base | 57.78 | 58.06 | 57.45 | |
HuBERT base | 51.98 | 52.80 | 51.89 | |
data2vec base | 54.84 | 55.67 | 54.76 | |
data2vec large | 48.29 | 49.12 | 48.21 | |
wav2vec 2.0 base | 40.14 | 41.06 | 39.23 | |
data2vec 2.0 base | 59.40 | 59.73 | 58.13 |
For the dataset PAVOQUE (de), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 87.72 | 93.17 | 88.41 | |
WavLM large | 87.73 | 93.40 | 88.43 | |
HuBERT large | 87.04 | 92.76 | 87.94 | |
HuBERT base | 86.12 | 92.19 | 87.06 | |
wav2vec 2.0 base | 84.95 | 91.23 | 86.09 | |
data2vec 2.0 large | 85.38 | 92.07 | 86.75 | |
data2vec 2.0 base | 78.30 | 87.82 | 80.92 | |
data2vec base | 74.94 | 85.11 | 76.63 | |
data2vec large | 73.26 | 84.89 | 75.71 | |
For the dataset Polish (pl), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 83.27 | 83.27 | 82.77 | |
WavLM large | 79.29 | 79.29 | 79.02 | |
data2vec 2.0 base | 75.75 | 75.75 | 75.52 | |
data2vec 2.0 large | 74.00 | 74.00 | 74.05 | |
wav2vec 2.0 base | 67.35 | 67.35 | 66.76 | |
HuBERT large | 70.20 | 70.20 | 70.74 | |
HuBERT base | 69.40 | 69.40 | 69.08 | |
WavLM base | 69.31 | 69.31 | 69.46 | |
data2vec large | 68.05 | 68.05 | 67.07 | |
data2vec base | 71.31 | 71.31 | 70.65 |
For the dataset RAVDESS (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 75.32 | 75.87 | 75.19 | |
WavLM large | 72.00 | 72.22 | 71.42 | |
data2vec 2.0 large | 71.15 | 71.63 | 70.94 | |
HuBERT large | 70.00 | 70.29 | 69.54 | |
data2vec 2.0 base | 64.66 | 64.84 | 64.18 | |
HuBERT base | 65.43 | 66.21 | 65.31 | |
data2vec large | 59.30 | 59.50 | 58.74 | |
WavLM base | 61.56 | 62.10 | 61.18 | |
wav2vec 2.0 base | 54.33 | 55.40 | 53.99 | |
data2vec base | 51.92 | 52.22 | 51.10 |
For the dataset RESD (ru), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
WavLM large | 55.87 | 56.47 | 55.82 | |
Whisper large v3 | 54.98 | 55.54 | 54.99 | |
HuBERT base | 51.56 | 52.34 | 51.65 | |
HuBERT large | 50.82 | 51.51 | 50.74 | |
wav2vec 2.0 base | 52.82 | 53.39 | 52.90 | |
data2vec 2.0 large | 44.08 | 44.64 | 44.25 | |
data2vec 2.0 base | 43.54 | 44.03 | 43.00 | |
data2vec large | 30.78 | 31.57 | 30.81 | |
WavLM base | 44.81 | 45.09 | 44.97 | |
data2vec base | 37.09 | 37.90 | 36.86 |
For the dataset SAVEE (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
data2vec base | 75.65 | 78.25 | 78.38 | |
data2vec 2.0 large | 75.75 | 78.59 | 78.24 | |
Whisper large v3 | 74.07 | 77.24 | 75.31 | |
HuBERT large | 71.91 | 75.05 | 71.83 | |
data2vec large | 68.01 | 71.45 | 71.08 | |
WavLM base | 63.57 | 67.05 | 62.83 | |
WavLM large | 66.74 | 70.80 | 66.37 | |
HuBERT base | 58.90 | 59.85 | 64.05 | |
wav2vec 2.0 base | 44.89 | 49.83 | 42.07 | |
data2vec base | 51.92 | 52.22 | 51.10 |
For the dataset ShEMO (fa), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 80.23 | 89.55 | 82.94 | |
WavLM large | 71.72 | 87.13 | 73.55 | |
HuBERT large | 64.29 | 83.35 | 66.26 | |
WavLM base | 60.73 | 78.76 | 62.06 | |
HuBERT base | 58.31 | 81.17 | 63.15 | |
data2vec 2.0 large | 64.09 | 82.68 | 68.47 | |
wav2vec 2.0 base | 56.34 | 78.96 | 57.34 | |
data2vec large | 56.42 | 74.09 | 60.98 | |
data2vec 2.0 base | 60.59 | 79.03 | 64.03 | |
data2vec base | 47.61 | 70.07 | 49.49 |
For the dataset SUBESCO (bn), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 73.05 | 73.05 | 72.94 | |
data2vec 2.0 large | 66.45 | 66.45 | 66.25 | |
WavLM large | 65.33 | 65.33 | 65.09 | |
HuBERT large | 64.53 | 64.53 | 64.33 | |
data2vec 2.0 base | 57.97 | 57.97 | 57.69 | |
WavLM base | 57.06 | 57.06 | 56.78 | |
HuBERT base | 57.89 | 57.89 | 57.64 | |
data2vec large | 53.06 | 53.06 | 52.62 | |
wav2vec 2.0 base | 51.25 | 51.25 | 50.91 | |
data2vec base | 46.22 | 46.22 | 45.80 |
For the dataset TESS (en), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 99.96 | 99.96 | 99.96 | |
HuBERT large | 99.86 | 99.86 | 99.86 | |
WavLM large | 99.78 | 99.78 | 99.78 | |
HuBERT base | 99.62 | 99.62 | 99.62 | |
WavLM base | 99.10 | 99.10 | 99.10 | |
data2vec 2.0 base | 98.55 | 98.55 | 98.55 | |
data2vec 2.0 large | 99.54 | 99.54 | 99.54 | |
wav2vec 2.0 base | 97.92 | 97.92 | 97.92 | |
data2vec large | 96.89 | 96.89 | 96.89 | |
data2vec base | 94.67 | 94.67 | 94.65 |
For the dataset TurEV-DB (tr), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 81.32 | 81.58 | 81.31 | |
WavLM large | 79.50 | 80.09 | 79.51 | |
HuBERT base | 72.81 | 73.26 | 72.89 | |
HuBERT large | 70.93 | 72.08 | 70.96 | |
wav2vec 2.0 base | 67.19 | 68.01 | 67.19 | |
WavLM base | 69.97 | 70.51 | 70.18 | |
data2vec 2.0 base | 63.44 | 63.69 | 63.37 | |
data2vec 2.0 large | 63.77 | 64.13 | 63.00 | |
data2vec large | 54.69 | 55.11 | 54.56 | |
data2vec base | 51.62 | 52.58 | 51.52 |
For the dataset URDU (ur), we report the following results:
Model | UA (%) | WA (%) | F1 (%) |
---|---|---|---|
Whisper large v3 | 82.52 | 82.52 | 82.41 | |
HuBERT base | 88.41 | 88.41 | 88.40 | |
wav2vec 2.0 base | 87.50 | 87.50 | 87.57 | |
WavLM large | 86.61 | 86.61 | 86.64 | |
HuBERT large | 81.75 | 81.75 | 81.66 | |
WavLM base | 82.82 | 82.82 | 82.85 | |
data2vec 2.0 base | 69.97 | 69.97 | 69.91 | |
data2vec 2.0 large | 78.10 | 78.10 | 78.10 | |
data2vec base | 65.42 | 65.42 | 65.35 | |
data2vec large | 64.81 | 64.81 | 64.93 |