Vocal & Instrumental Isolation

MVSep DnR v3 is a cinematic model that splits a track into three stems: music, sfx and speech. It was trained on DnR v3, a large multilingual dataset. On the test data, its quality metrics are better than those of Bandit v2, a comparable multilingual model. The model is available in three variants: one based on the SCNet architecture, one based on the MelBand Roformer architecture, and an ensemble of these two models. See the table below:

SDR metric on the DnR v3 leaderboard (dB)
Algorithm name	music (SDR)	sfx (SDR)	speech (SDR)
SCNet Large	9.94	11.35	12.59
Mel Band Roformer	9.45	11.24	12.27
Ensemble (Mel + SCNet)	10.15	11.67	12.81
Bandit v2 (for reference)	9.06	10.82	12.29
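
For readers unfamiliar with the metric, the sketch below shows how an SDR value in dB can be computed for a single stem, using the common definition of 10·log10 of reference energy over error energy. The exact evaluation protocol of the DnR v3 leaderboard is not described here, so this definition is an assumption. The `average_stems` helper is also hypothetical: it shows one simple way to combine two models' outputs (plain waveform averaging) and is not a statement of how the MVSep ensemble is actually built. The sample values are toy data.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """SDR in dB, assuming the common definition:
    10 * log10(sum(ref^2) / sum((ref - est)^2))."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps keeps the ratio finite for silent references or perfect estimates
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def average_stems(estimate_a, estimate_b):
    """Hypothetical ensemble: sample-wise mean of two estimates of one stem."""
    return [(a + b) / 2.0 for a, b in zip(estimate_a, estimate_b)]


# Toy data: a reference stem and two estimates with errors in different samples.
reference = [1.0, -1.0, 1.0, -1.0]
estimate_a = [1.2, -1.0, 1.2, -1.0]
estimate_b = [1.0, -1.2, 1.0, -1.2]
ensemble = average_stems(estimate_a, estimate_b)

print(f"model A:  {sdr_db(reference, estimate_a):.2f} dB")
print(f"model B:  {sdr_db(reference, estimate_b):.2f} dB")
print(f"ensemble: {sdr_db(reference, ensemble):.2f} dB")
```

In this toy case the two estimates make errors in different samples, so averaging halves each error and the ensemble scores higher than either input. That is the same direction as the ensemble row in the table, though real gains depend on how correlated the two models' errors are.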

