Vocal & Instrumental Isolation

An algorithm for separating lead vocals from everything else (back vocals plus instrumental), based on MelBand Roformer and SCNet models. It works on any music track as is, but you can also extract the vocals first by selecting the "Extract vocals first" option under Extraction type. In that case, the back vocals are also returned as a separate file.
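The two extraction types can be pictured as a one-stage and a two-stage pipeline. The sketch below is only an illustration: it assumes each complementary stem is obtained by residual subtraction (input minus estimate), and the names `karaoke_direct`, `karaoke_vocals_first` and the `Separator` signature are hypothetical, not MVSep's API. The service may produce each stem differently.

```python
import numpy as np
from typing import Callable, Dict

# Hypothetical separator signature: takes a (channels, samples) waveform and
# returns the estimated target stem with the same shape.
Separator = Callable[[np.ndarray], np.ndarray]


def karaoke_direct(mix: np.ndarray, lead_model: Separator) -> Dict[str, np.ndarray]:
    """Default mode: split the full mix into lead vocals and everything else."""
    lead = lead_model(mix)
    # Assumption: "everything else" is the residual of the mix.
    return {"lead_vocals": lead, "other": mix - lead}


def karaoke_vocals_first(
    mix: np.ndarray, vocal_model: Separator, lead_model: Separator
) -> Dict[str, np.ndarray]:
    """'Extract vocals first' mode: isolate all vocals, then split them into lead and back."""
    vocals = vocal_model(mix)        # stage 1: all vocals from the mix
    instrumental = mix - vocals      # residual of stage 1 (assumed)
    lead = lead_model(vocals)        # stage 2: lead vocals from the vocal stem only
    back = vocals - lead             # residual of stage 2 (assumed)
    return {
        "lead_vocals": lead,
        "back_vocals": back,
        "instrumental": instrumental,
        "back_plus_instrumental": back + instrumental,
    }
```

In this two-stage picture the instrumental depends only on the stage-1 vocal model, which would be consistent with every "extract vocals first" row in the table sharing the same instrumental SDR (15.94).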

There are five models. Four are MelBand Roformer models: the first was prepared by the team of @aufr33 and viperx, the second by @becruily, the third by @gabox, and the fourth is a fusion of the @gabox and @aufr33/viperx models. The fifth, added separately, is based on the SCNet XL IHF architecture by @becruily. The table below also lists a BS Roformer model by @frazer and @becruily.

Quality metrics (SDR, in dB; higher is better) are given below. For comparison, the table also includes metrics for the older UVR and MDX-B Karaoke algorithms. A dash (---) marks a value that is not available for that algorithm.

Algorithm	Lead vocals SDR	Back vocals SDR	Back vocals + instrumental SDR	Instrumental SDR
UVR (HP-KAROKEE-MSB2-3BAND-3090)	6.42	---	11.79	---
UVR (karokee_4band_v2_sn)	6.72	---	12.09	---
UVR (UVR-BVE-4B_SN-44100-1)	---	0.87	---	4.90
MDX-B (Karaoke)	7.42	---	12.81	---
MDX-B (Karaoke) Extract from vocals	8.28	4.46	13.67	15.94
MelBand Roformer (@aufr33 and viperx)	9.45	---	14.84	---
MelBand Roformer (@becruily)	9.61	---	15.00	---
MelBand Roformer (@gabox)	9.67	---	15.06	---
MelBand Roformer (Fused @gabox and @aufr33/viperx)	9.85	---	15.23	---
SCNet XL IHF (@becruily)	9.53	---	14.91	---
BS Roformer (@frazer and @becruily)	10.10	---	15.48	---
MelBand Roformer (@aufr33 and viperx) extract vocals first	9.22	5.27	14.61	15.94
MelBand Roformer (@becruily) extract vocals first	8.98	4.98	14.24	15.94
MelBand Roformer (@gabox) extract vocals first	9.36	5.46	14.75	15.94
MelBand Roformer (Fused @gabox and @aufr33/viperx) extract vocals first	9.62	5.63	15.01	15.94
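All scores in the table are SDR values. This page does not say which SDR variant was used, so the sketch below assumes the plain energy-ratio definition, SDR = 10·log10(‖s‖² / ‖s − ŝ‖²), where s is the reference stem and ŝ the estimate. The function name and the small `eps` stabilizer are illustrative additions.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-7) -> float:
    """Energy-ratio SDR in dB (assumed definition).

    Sums over all elements, so mono (samples,) and multichannel
    (channels, samples) arrays are both accepted.
    """
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps avoids division by zero for a perfect estimate or a silent reference
    return float(10 * np.log10((signal_energy + eps) / (error_energy + eps)))


t = np.linspace(0, 1, 1000)
ref = np.sin(2 * np.pi * 5 * t)
# An estimate off by 10% in amplitude leaves an error with 1% of the
# signal energy, i.e. a ratio of 100, which is 20 dB.
print(round(sdr(ref, 1.1 * ref), 2))
```

Because the scale is logarithmic, a gain of 1 dB means the error energy drops to about 10^(-0.1) ≈ 0.79 of its previous value relative to the signal, roughly a 21% reduction.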

MVSep Karaoke (lead/back vocals)
