An algorithm based on the MelBand Roformer model that separates a track into lead vocals and everything else. It works with any music track. You can also extract the vocals first by selecting the "Extract vocals first" option under Extraction type. In that case, the backing vocals are also provided as a separate file.
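The two Extraction type modes can be pictured as one or two chained separation passes. Below is a minimal sketch under stated assumptions. Each model is treated as a function that returns a `(target, residual)` pair. `split_tracks`, `Separator`, `vocal_model` and `karaoke_model` are hypothetical names, not the service's API. No real MelBand Roformer checkpoint is loaded, and the source does not say which model performs the vocal pre-extraction step.

```python
from typing import Callable, Dict, Optional, Tuple

import numpy as np

# Hypothetical interface: a separator maps a waveform of shape
# (channels, samples) to (target, residual), both with the same shape.
Separator = Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]


def split_tracks(
    mix: np.ndarray,
    karaoke_model: Separator,
    vocal_model: Optional[Separator] = None,
    extract_vocals_first: bool = False,
) -> Dict[str, np.ndarray]:
    """Sketch of the two extraction modes (hypothetical helper)."""
    if not extract_vocals_first:
        # Single pass: lead vocals vs. everything else
        # (backing vocals and instrumental stay mixed together).
        lead, rest = karaoke_model(mix)
        return {"lead_vocals": lead, "everything_else": rest}

    if vocal_model is None:
        raise ValueError("extract_vocals_first=True requires a vocal_model")
    # Pass 1: split the mix into all vocals and the instrumental.
    vocals, instrumental = vocal_model(mix)
    # Pass 2: run the karaoke model on the vocals only, so its residual
    # holds the backing vocals instead of backing vocals + instrumental.
    lead, back = karaoke_model(vocals)
    return {"lead_vocals": lead, "back_vocals": back, "instrumental": instrumental}
```

In this arrangement the instrumental comes from the first pass alone. That would be consistent with the identical 15.49 dB instrumental score across the vocals-first rows in the table below.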
Two models are available: one prepared by the team of @aufr33 and viperx, the other by @becruily.
Quality metrics are listed below as SDR in dB (higher is better). A "---" means no value is reported. For comparison, the table also includes the older UVR and MDX-B Karaoke algorithms.
Algorithm | Lead vocals SDR (dB) | Backing vocals SDR (dB) | Backing vocals + instrumental SDR (dB) | Instrumental SDR (dB) |
--- | --- | --- | --- | --- |
UVR (HP-KAROKEE-MSB2-3BAND-3090) | 7.08 | --- | 12.23 | --- |
UVR (karokee_4band_v2_sn) | 7.20 | --- | 12.36 | --- |
UVR (UVR-BVE-4B_SN-44100-1) | --- | 0.68 | --- | --- |
MDX-B (Karaoke) | 8.32 | --- | 13.49 | --- |
MDX-B (Karaoke) Extract from vocals | 9.21 | 4.81 | 14.38 | 15.49 |
MelBand Roformer (@aufr33 and viperx) | 10.36 | --- | 15.53 | --- |
MelBand Roformer (@becruily) | 10.47 | --- | 15.64 | --- |
MelBand Roformer (@aufr33 and viperx) extract vocals first | 10.06 | 5.41 | 15.24 | 15.49 |
MelBand Roformer (@becruily) extract vocals first | 9.67 | 5.05 | 14.70 | 15.49 |
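The source does not define how SDR is computed. A common choice in music-separation benchmarks is the plain energy ratio between the reference stem and the estimation error. The sketch below assumes that definition. `sdr` and its `eps` guard are hypothetical helpers, not the evaluation code behind the table.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Assumed SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    reference = reference.astype(np.float64)
    estimate = estimate.astype(np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps avoids division by zero for a perfect estimate or a silent stem.
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))
```

Under this definition, for a fixed reference, an SDR gain of Δ dB means the error energy shrinks by a factor of 10^(Δ/10). For example, the 2.15 dB lead-vocal gap between MDX-B (Karaoke) and the @becruily model corresponds to roughly 1.64 times less error energy. Scores averaged over many tracks only approximate this.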