MVSep now offers many algorithms. Which one should you choose?
- If you need well-isolated vocals or instrumental, use one of: Ultimate Vocal Remover HQ, MDX-B, Demucs3 (Model B)
- If you need good bass, drums, and other stems, use Demucs3 (Model B)
To compare algorithms we use the SDR (signal-to-distortion ratio) metric. The larger the value, the better the algorithm's result.
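As a rough illustration of what the metric measures, here is a minimal sketch of a global SDR computation in plain Python. It assumes the common definition SDR = 10 * log10(||s||^2 / ||s - s_hat||^2); the page does not say which SDR variant or aggregation was used for the tables, and benchmark toolkits often aggregate per-frame or per-track values instead. The function name `sdr` and the toy signals are illustrative only.

```python
import math

def sdr(reference, estimate):
    """Global SDR in dB, assuming SDR = 10*log10(||s||^2 / ||s - s_hat||^2)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    eps = 1e-12  # guards against division by zero and log of zero
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))

# Toy signals (hypothetical): one second of a 440 Hz tone at 44.1 kHz.
reference = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
close = [0.9 * x for x in reference]  # error is 10% of the signal amplitude
rough = [0.5 * x for x in reference]  # error is 50% of the signal amplitude

print(f"close estimate: {sdr(reference, close):.2f} dB")  # about 20 dB
print(f"rough estimate: {sdr(reference, rough):.2f} dB")  # about 6 dB
```

The closer estimate leaves less residual error and therefore scores higher, which is why larger values in the tables indicate better separation.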
Table 1. Comparison on the MUSDB18HQ test set. SDR metric (higher is better)
Algorithm | SDR bass | SDR drums | SDR other | SDR vocals | SDR instrumental (inverted vocals) | Model link | Demos |
---|---|---|---|---|---|---|---|
Spleeter (2 stems) | --- | --- | --- | 6.8647 | 13.3231 | Link | Demos |
Spleeter (4 stems) | 4.8200 | 6.3390 | 4.5362 | 6.7021 | 13.1434 | Link | Demos |
Spleeter (5 stems) | 4.6376 | 6.1300 | 3.8689 | 6.5027 | 12.9646 | Link | Demos |
Unmix XL | 5.9577 | 7.7001 | 5.2165 | 7.6852 | 14.1339 | Link | Demos |
Unmix HQ | 4.6124 | 6.3807 | 3.6915 | 6.0783 | 12.5660 | Link | Demos |
Unmix SD | 4.7894 | 6.2632 | 3.8281 | 6.1822 | 12.6689 | Link | Demos |
Demucs 2 | 4.6145 | 6.1588 | 3.1786 | 5.3980 | 11.8388 | Link | Demos |
MDX-A | 4.9803 | 6.1111 | 4.1430 | 7.1758 | 13.6192 | Link | Demos |
MDX-B (Default + Demucs2 data) * | 5.2035 | 7.7192 | 5.3624 | 7.9621 | 14.3854 | Link | Demos |
MDX-B (ONNX Only) * | 6.5687 | 10.2110 | 7.3126 | 9.9084 | 16.3305 | Link | Demos |
UVR HQ (2 stems) | 4.1616 | 6.1976 | --- | 8.6975 | 14.7872 | Link | Demos |
Demucs 3 (Model A) | 7.6054 | 8.8748 | 5.5306 | 8.2012 | 14.6347 | Link | Demos |
Demucs 3 (Model B) * | 11.3270 | 12.0055 | 8.2793 | 9.9202 | 16.2890 | Link | Demos |
Zero Shot (QBLWLD) | 2.6324 | 3.3939 | 1.4146 | 4.1016 | --- | Link | Demos |
Danna Sep (CPU) | 6.3462 | 7.8521 | 5.0470 | 7.9611 | 14.4007 | Link | Demos |
Byte Dance | --- | --- | --- | 8.1485 | 14.5739 | Link | Demos |
UVR Demucs (Model 1) | --- | --- | --- | 9.0877 | 15.4612 | Link | Demos |
MVSep Vocal model v2 | --- | --- | --- | 8.8292 | 15.2719 | Link | Demos |
Demucs4 HT | 8.9770 | 10.0886 | 6.1301 | 9.0252 | 15.4318 | Link | Demos |
* - These numbers are not reliable because the MUSDB18 test set was used to train these models.
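The "SDR instrumental (inverted vocals)" column suggests that the instrumental is obtained by subtracting the estimated vocals from the mixture. The sketch below illustrates that idea under the same assumed global SDR definition, SDR = 10 * log10(||s||^2 / ||s - s_hat||^2); the helper names and toy signals are hypothetical, and the page does not describe its exact procedure. With inversion, the residual error is identical for both stems, so the two scores differ only by the energy ratio of the reference signals.

```python
import math

def energy(signal):
    """Sum of squared samples."""
    return sum(x * x for x in signal)

def sdr(reference, estimate):
    """Global SDR in dB (assumed definition, see the lead-in)."""
    error = [r - e for r, e in zip(reference, estimate)]
    eps = 1e-12  # guards against division by zero and log of zero
    return 10.0 * math.log10((energy(reference) + eps) / (energy(error) + eps))

def invert_vocals(mixture, vocals_estimate):
    """Instrumental estimate = mixture minus estimated vocals."""
    return [m - v for m, v in zip(mixture, vocals_estimate)]

# Toy stems (hypothetical): a quiet "vocal" tone and a louder "instrumental" tone.
n_samples, rate = 44100, 44100
vocals = [math.sin(2 * math.pi * 440 * n / rate) for n in range(n_samples)]
instrumental = [3.0 * math.sin(2 * math.pi * 110 * n / rate) for n in range(n_samples)]
mixture = [v + i for v, i in zip(vocals, instrumental)]

# An imperfect vocal estimate that misses 20% of the vocal amplitude.
vocals_estimate = [0.8 * v for v in vocals]
instrumental_estimate = invert_vocals(mixture, vocals_estimate)

sdr_vocals = sdr(vocals, vocals_estimate)
sdr_instrumental = sdr(instrumental, instrumental_estimate)
offset = 10.0 * math.log10(energy(instrumental) / energy(vocals))

print(f"SDR vocals:       {sdr_vocals:.2f} dB")
print(f"SDR instrumental: {sdr_instrumental:.2f} dB")
print(f"difference equals the reference energy ratio: {offset:.2f} dB")
```

In this toy case the instrumental is louder than the vocals, so the same residual error yields a higher instrumental SDR.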
Algorithm | Quality (Bass) | Quality (Drums) | Quality (Other) | Quality (Vocals) | Examples |
---|---|---|---|---|---|
Spleeter (4 stems) | 5.774 | 5.845 | 4.321 | 6.939 | Example |
UmxXL | 6.619 | 6.838 | 4.891 | 7.732 | Example |
MDX A | 7.232 | 7.173 | 5.636 | 8.901 | Example |
MDX B (Orig) | 7.495 | 7.554 | 5.533 | 8.896 | --- |
MDX B (UVR) | 7.495 | 7.554 | 5.533 | 9.482 | Example |
Ultimate Vocal Remover HQ | --- | --- | --- | --- | Example |
Demucs 3 Model A | 8.115 | 8.037 | 5.193 | 7.968 | Example |
Demucs 3 Model B | 8.856 | 8.850 | 5.978 | 8.756 | Example |
Danna Sep | 6.993 | 7.018 | 4.901 | 7.686 | --- |
Byte Dance | --- | --- | --- | 8.079 | --- |
Table 3. Comparison of algorithms on a synthetic dataset. SDR metric (higher is better)
Algorithm | Quality (Vocals) | Quality (Instrumental) |
---|---|---|
Spleeter (2 stems) | 7.1930 | 6.6612 |
Spleeter (4 stems) | 7.3168 | 7.0206 |
Spleeter (5 stems) | 7.1761 | 6.8799 |
Unmix XL | 8.4581 | 8.1619 |
Unmix HQ | 6.9301 | 6.6339 |
Unmix SD | 7.0438 | 6.7476 |
MDX-A | 8.6540 | 8.3578 |
MDX-B | 10.8872 | 10.4585 |
UVR HQ (2 stems) | 9.4008 | 9.0839 |
Demucs 3 (Model A) | 9.0464 | 8.7502 |
Demucs 3 (Model B) | 9.7837 | 9.4875 |
Demucs 2 | 8.5364 | 8.2402 |
Danna Sep | 8.5975 | 8.3013 |
Byte Dance | 7.9893 | 7.6931 |
UVR Demucs (Model 1) | 8.7951 | 8.6191 |
MVSep Vocal model v2 | 10.4523 | 10.1561 |
Demucs4 HT | 10.2397 | 9.9435 |
Table 4. Comparison of aggressiveness settings for the model HP2-4BAND-3090_4band_arch-500m_1 on the synthetic dataset. SDR metric (higher is better)
Aggressiveness | Quality (Vocals) | Quality (Instrumental) |
---|---|---|
0.0 | 9.3259 | 8.8948 |
0.1 | 9.3580 | 8.9277 |
0.2 | 9.3824 | 8.9527 |
0.3 | 9.4008 | 8.9719 |
0.4 | 9.4147 | 8.9864 |
0.5 | 9.4250 | 8.9972 |
0.6 | 9.4324 | 9.0051 |
0.7 | 9.4374 | 9.0106 |
0.8 | 9.4404 | 9.0142 |
0.9 | 9.4419 | 9.0161 |
1.0 | 9.4420 | 9.0167 |
turbo@mvsep.com