An algorithm for separating tracks into vocal and instrumental parts, based on the MelBand Roformer neural network. The network was first proposed in the paper "Mel-Band RoFormer for Music Source Separation" by a group of researchers at ByteDance. The first high-quality open weights were published by Kimberley Jensen. The MVSep team then slightly modified the network with these open weights and trained it further to improve the quality metrics. High-quality weights are also available from @Bas Curtiz, @unwa, @becruily and @gabox.
Quality table
Algorithm name | Multisong dataset: SDR Vocals | Multisong dataset: SDR Instrumental | Synth dataset: SDR Vocals | Synth dataset: SDR Instrumental | MDX23 Leaderboard: SDR Vocals |
MelBand Roformer (Kimberley Jensen) | 11.01 | 17.32 | 12.68 | 12.38 | 11.543 |
MelBand Roformer (ver. 2024.08) | 11.17 | 17.48 | 13.34 | 13.05 | --- |
Bas Curtiz edition | 11.18 | 17.49 | 13.89 | 13.60 | --- |
unwa Instrumental v1 | 10.24 | 16.54 | 12.25 | 11.95 | --- |
unwa Instrumental v1e (Note: Max instrum fullness, but noisy) | 10.05 | 16.36 | --- | --- | --- |
unwa big beta v5e (Note: Max vocals fullness, but noisy) | 10.59 | 16.89 | --- | --- | --- |
MelBand Roformer (ver. 2024.10) | 11.28 | 17.59 | 13.89 | 13.59 | --- |
becruily instrum max fullness (Note: Max instrum fullness, but noisy) | 10.16 | 16.47 | --- | --- | --- |
becruily vocals max fullness (Note: Max vocals fullness, but noisy) | 10.55 | 16.86 | --- | --- | --- |
unwa Instrumental v1e plus (Note: Max instrum fullness, but noisy) | 10.33 | 16.64 | --- | --- | --- |
gabox Instrumental v7 (Note: Max instrum fullness, but noisy) | 10.32 | 16.63 | --- | --- | --- |
Detailed statistics on the Multisong dataset:
Model | Vocals fullness | Vocals bleedless | Vocals SDR | Vocals L1Freq | Instrum fullness | Instrum bleedless | Instrum SDR | Instrum L1Freq |
MelBand Roformer (Kimberley Jensen) | 16.66 | 36.51 | 11.01 | 38.96 | 27.71 | 46.72 | 17.32 | 39.77 |
MelBand Roformer (ver. 2024.08) | 16.39 | 39.13 | 11.18 | 39.26 | 27.74 | 47.07 | 17.49 | 40.16 |
Bas Curtiz edition | 16.30 | 38.94 | 11.18 | 39.18 | 27.49 | 47.00 | 17.49 | 40.15 |
MelBand Roformer (ver. 2024.10) | 16.92 | 37.78 | 11.28 | 39.41 | 27.71 | 47.29 | 17.59 | 40.29 |
unwa Instrumental v1 (SDR vocals: 10.24, SDR instrum: 16.54) | 15.89 | 27.48 | 10.24 | 36.06 | 35.44 | 38.02 | 16.55 | 38.67 |
unwa Instrumental v1e (SDR vocals: 10.05, SDR instrum: 16.36) | 14.67 | 26.83 | 10.06 | 34.37 | 38.85 | 35.68 | 16.37 | 38.31 |
unwa big beta v5e (SDR vocals: 10.59, SDR instrum: 16.89) | 20.78 | 32.02 | 10.59 | 38.53 | 25.65 | 45.90 | 16.90 | 37.31 |
becruily instrum high fullness (SDR instrum: 16.47) | 15.76 | 30.15 | 10.16 | 35.84 | 33.93 | 40.55 | 16.47 | 38.86 |
becruily vocals high fullness (SDR vocals: 10.55) | 20.72 | 31.25 | 10.55 | 38.84 | 28.28 | 40.85 | 16.86 | 38.24 |
unwa Instrumental v1e plus (SDR vocals: 10.33, SDR instrum: 16.64) | 14.96 | 31.89 | 10.33 | 35.76 | 36.20 | 38.57 | 16.64 | 39.04 |
gabox Instrumental v7 (SDR vocals: 10.32, SDR instrum: 16.63) | 16.25 | 27.28 | 10.32 | 36.85 | 29.34 | 45.06 | 16.63 | 38.70 |
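The SDR columns in the tables above are signal-to-distortion ratios in decibels, where higher is better. As a rough illustration, the sketch below computes SDR under the commonly used definition (energy of the reference stem divided by the energy of the estimation error). This is an assumption: the exact evaluation code behind these tables is not given here and may differ, for example in how channels or songs are averaged. The function name `sdr`, the `eps` stabilizer and the toy signals are hypothetical.

```python
import math
import random


def sdr(reference, estimate, eps=1e-8):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences.

    Assumed definition: 10 * log10(sum(ref^2) / sum((ref - est)^2)).
    `eps` only guards against division by zero and log of zero.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy data: a 220 Hz sine stands in for the true vocal stem.
sample_rate = 8000
random.seed(0)
vocals = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]
noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Two hypothetical separation results: one with more residual error, one with less.
noisy_estimate = [v + 0.1 * z for v, z in zip(vocals, noise)]
cleaner_estimate = [v + 0.01 * z for v, z in zip(vocals, noise)]

sdr_noisy = sdr(vocals, noisy_estimate)
sdr_cleaner = sdr(vocals, cleaner_estimate)
print(f"noisy estimate:   {sdr_noisy:.2f} dB")
print(f"cleaner estimate: {sdr_cleaner:.2f} dB")
```

In this toy case the second estimate has ten times less residual amplitude, so its SDR comes out about 20 dB higher. SDR alone does not capture the fullness versus bleedless trade-off reported in the detailed table, which is why the models marked "max fullness, but noisy" can score lower on SDR while still being preferred for some material.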