Vocal & Instrumental Isolation

An algorithm for separating tracks into vocal and instrumental parts, based on the SCNet neural network. SCNet was proposed in the article "SCNet: Sparse Compression Network for Music Source Separation" by a group of researchers from China. The authors released the code as open source, and the MVSep team was able to reproduce results close to those reported in the article. We first trained a small version of SCNet and later prepared a larger, heavier version. Its quality metrics come quite close to those of the Roformer models (currently the top-performing models) but remain slightly lower. However, in some cases the model can outperform Roformers.

Quality metrics

Algorithm name	Multisong SDR Vocals (dB)	Multisong SDR Instrumental (dB)	Synth SDR Vocals (dB)	Synth SDR Instrumental (dB)	MDX23 Leaderboard SDR Vocals (dB)
SCNet	10.25	16.56	12.27	11.97	---
SCNet Large	10.74	17.05	12.89	12.59	---
SCNet XL	10.96	17.27	13.08	12.78	---
SCNet XL (high fullness)	10.92	17.23	---	---	---
SCNet XL (very high fullness)	10.40	16.60	---	---	---
SCNet XL IHF	11.11	17.41	13.29	12.99	---
SCNet XL IHF (high instrum fullness by becruily)	10.89	17.20	13.31	12.98	---
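The SDR columns above are signal-to-distortion ratios in decibels, where higher is better. As a rough sketch, assuming the plain (non-BSSEval) definition SDR = 10·log10(Σs² / Σ(s − ŝ)²) used in the Music Demixing Challenge, a per-track SDR could be computed as below. The function name `sdr`, the `eps` stabilizer and the test signal are illustrative assumptions; MVSep's own evaluation code may differ in details such as how scores are averaged across tracks.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Plain signal-to-distortion ratio in dB (assumed definition, not BSSEval).

    Both arrays must have the same shape, e.g. (channels, samples).
    """
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps keeps the ratio finite for silent references or perfect estimates
    return float(10 * np.log10((signal_energy + eps) / (error_energy + eps)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(44100) / 44100
    # Hypothetical stereo "vocal" stem: a 440 Hz tone in both channels
    vocals = np.stack([np.sin(2 * np.pi * 440 * t)] * 2)
    # Hypothetical model output: the stem plus a little residual noise
    estimate = vocals + 0.05 * rng.standard_normal(vocals.shape)
    print(f"SDR: {sdr(vocals, estimate):.2f} dB")
```

Because SDR is logarithmic, a 1 dB improvement means roughly 21% less error energy for the same reference signal (10^(-0.1) ≈ 0.79).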

Detailed statistics on the Multisong dataset:

Model	Vocals fullness	Vocals bleedless	Vocals SDR (dB)	Vocals L1Freq	Instrumental fullness	Instrumental bleedless	Instrumental SDR (dB)	Instrumental L1Freq
SCNet	17.34	25.24	10.25	35.47	29.35	32.34	16.56	36.24
SCNet Large	17.70	26.84	10.74	36.86	27.10	41.47	17.05	37.62
SCNet XL	17.96	26.95	10.96	37.35	28.74	39.42	17.27	38.09
SCNet XL (high fullness)	21.67	25.00	10.92	37.70	31.95	34.06	17.23	37.91
SCNet XL (very high fullness)	23.50	25.30	10.40	37.16	34.04	35.15	16.60	36.78
SCNet XL IHF	17.98	28.31	11.11	37.91	28.87	40.37	17.41	38.54
SCNet XL IHF (high instrum fullness by becruily)	22.70	25.48	10.89	38.18	32.31	38.15	17.20	38.43
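The detailed table shows a trade-off between fullness and bleedless: the high-fullness variants gain fullness at the cost of bleedless. One way to read it, assuming higher is better for both metrics, is to list the models that no other model beats on both at once (a Pareto front). The sketch below does this for the instrumental columns using the values from the table; `pareto_front` is a hypothetical helper, not part of MVSep.

```python
# Instrumental (fullness, bleedless) values from the Multisong table above.
INSTRUM = {
    "SCNet": (29.35, 32.34),
    "SCNet Large": (27.10, 41.47),
    "SCNet XL": (28.74, 39.42),
    "SCNet XL (high fullness)": (31.95, 34.06),
    "SCNet XL (very high fullness)": (34.04, 35.15),
    "SCNet XL IHF": (28.87, 40.37),
    "SCNet XL IHF (high instrum fullness by becruily)": (32.31, 38.15),
}


def pareto_front(scores: dict[str, tuple[float, float]]) -> list[str]:
    """Return models that no other model beats on both metrics (higher is better)."""
    front = []
    for name, (full, bleed) in scores.items():
        # A model is dominated if another is at least as good on both metrics
        # and not identical on both.
        dominated = any(
            f >= full and b >= bleed and (f, b) != (full, bleed)
            for other, (f, b) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    for model in pareto_front(INSTRUM):
        print(model)
```

With these numbers, SCNet Large, SCNet XL (very high fullness), SCNet XL IHF and the becruily variant end up on the front, while the base SCNet, SCNet XL and SCNet XL (high fullness) are each beaten on both metrics by another model.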
