This algorithm, based on the MelBand Roformer model, separates a track into lead vocals and everything else. It can be applied directly to any music track, or you can pre-extract the vocals by selecting the "Extract vocals first" option under Extraction type. In the second case, back vocals are also provided as a separate file.
Four models are available: the first was prepared by the team of @aufr33 and viperx, the second by @becruily, the third by @gabox, and the fourth is a fused model combining the @gabox and @aufr33/viperx models.
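The two extraction modes described above can be sketched roughly as follows. This is only an illustration, not the actual implementation: `lead_model` and `vocal_model` are hypothetical callables standing in for the real separation models, and obtaining the remaining stems by subtraction is an assumption made here for simplicity.

```python
from typing import Callable, Dict, Optional

import numpy as np

# Hypothetical type: a separator takes audio samples and returns one stem.
Separator = Callable[[np.ndarray], np.ndarray]


def karaoke_split(
    mix: np.ndarray,
    lead_model: Separator,
    vocal_model: Optional[Separator] = None,
) -> Dict[str, np.ndarray]:
    """Split a mix into lead vocals and the rest (illustrative sketch only)."""
    if vocal_model is None:
        # Direct mode: the lead-vocal model runs on the full mix.
        lead = lead_model(mix)
        return {"lead_vocals": lead, "other": mix - lead}

    # "Extract vocals first" mode: isolate all vocals, then split them.
    vocals = vocal_model(mix)
    lead = lead_model(vocals)
    return {
        "lead_vocals": lead,
        "back_vocals": vocals - lead,    # assumed: residual of the vocal stem
        "instrumental": mix - vocals,    # assumed: residual of the mix
        "other": mix - lead,             # back vocals + instrumental
    }
```

In this sketch the instrumental stem in the second mode depends only on the vocal model, not on which lead-vocal model is chosen.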
Quality metrics are given below. For comparison, the table also provides quality metrics for the old UVR and MDX-B Karaoke algorithms.
Algorithm name | Lead Vocals (SDR) | Back Vocals (SDR) | Back Vocals + Instrumental (SDR) | Instrumental (SDR) |
UVR (HP-KAROKEE-MSB2-3BAND-3090) | 6.42 | --- | 11.79 | --- |
UVR (karokee_4band_v2_sn) | 6.72 | --- | 12.09 | --- |
UVR (UVR-BVE-4B_SN-44100-1) | --- | 0.87 | --- | 4.90 |
MDX-B (Karaoke) | 7.42 | --- | 12.81 | --- |
MDX-B (Karaoke) Extract from vocals | 8.28 | 4.46 | 13.67 | 15.94 |
MelBand Roformer (@aufr33 and viperx) | 9.45 | --- | 14.84 | --- |
MelBand Roformer (@becruily) | 9.61 | --- | 15.00 | --- |
MelBand Roformer (@gabox) | 9.67 | --- | 15.06 | --- |
MelBand Roformer (Fused @gabox and @aufr33/viperx) | 9.85 | --- | 15.23 | --- |
MelBand Roformer (@aufr33 and viperx) extract vocals first | 9.22 | 5.27 | 14.61 | 15.94 |
MelBand Roformer (@becruily) extract vocals first | 8.98 | 4.98 | 14.24 | 15.94 |
MelBand Roformer (@gabox) extract vocals first | 9.36 | 5.46 | 14.75 | 15.94 |
MelBand Roformer (Fused @gabox and @aufr33/viperx) extract vocals first | 9.62 | 5.63 | 15.01 | 15.94 |
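
For readers unfamiliar with the metric in the table, SDR (signal-to-distortion ratio) is measured in dB, and higher is better. The text does not say which SDR variant was used for these numbers, so the snippet below is only a minimal sketch of the plain definition, 10 * log10(signal energy / error energy). The function name `sdr` and the `eps` stabilizer are assumptions introduced for this illustration.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-9) -> float:
    """Plain signal-to-distortion ratio in dB (assumed definition)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Energy of the ground-truth stem.
    signal_energy = np.sum(reference ** 2)
    # Energy of the difference between ground truth and separated stem.
    error_energy = np.sum((reference - estimate) ** 2)
    # eps avoids division by zero and log of zero for silent or perfect stems.
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))
```

Under this definition, an estimate whose error energy is 100 times smaller than the reference energy scores about 20 dB, and each additional 3 dB corresponds to roughly halving the error energy.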