Vocal & Instrumental Isolation

1) After the release of the viperx BS Roformer weights, we fine-tuned them on our own dataset and improved their SDR scores even further, so we have added a new version of the BS Roformer weights. They are currently probably the best models available in the world.

Multisong validation dataset:
SDR vocals: 10.87 -> 11.24
SDR instrum: 17.17 -> 17.55

Synth validation dataset:
SDR vocals: 12.71 -> 13.47
SDR instrum: 12.41 -> 13.17
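
For readers who want to see what an SDR figure measures, here is a minimal sketch in Python. It assumes the common definition of SDR as the ratio, in dB, of the reference signal energy to the energy of the error between reference and estimate. The post does not state the exact formula or averaging scheme behind the numbers above, so the function name `sdr`, the `eps` stabiliser and the toy samples are illustrative assumptions, not the site's evaluation code.

```python
import math


def sdr(reference, estimate, eps=1e-9):
    """Signal-to-distortion ratio in dB (assumed definition, see lead-in).

    reference: ground-truth stem samples
    estimate:  separated stem samples of the same length
    eps:       small constant to avoid division by zero (assumption)
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true signal.
    signal_energy = sum(r * r for r in reference)
    # Energy of the separation error.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy data, not taken from any validation dataset.
reference = [0.5, -0.25, 0.75, -1.0]
rough = [0.4, -0.1, 0.9, -0.8]
closer = [0.45, -0.2, 0.8, -0.9]

print("rough estimate SDR: ", round(sdr(reference, rough), 2), "dB")
print("closer estimate SDR:", round(sdr(reference, closer), 2), "dB")
```

A higher value means the separated stem is closer to the reference, which is why an increase such as 10.87 -> 11.24 is reported as an improvement.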

2) The ensembles have also improved:

Ensemble (vocals, instrum) on Multisong dataset:
SDR vocals: 11.06 -> 11.33
SDR instrum: 17.37 -> 17.63

Ensemble (vocals, instrum) on Synth dataset:
SDR vocals: 13.00 -> 13.57
SDR instrum: 12.70 -> 13.27

Ensemble (vocals, instrum, bass, drums, other):
SDR vocals: 11.06 -> 11.33
SDR instrum: 17.37 -> 17.63
SDR bass: 12.57 -> 12.59
SDR drums: 11.94 -> 11.99
SDR other: 7.22 -> 7.33
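
The post does not describe how the ensembles combine their member models. One common approach, shown here purely as an assumption, is a weighted average of the waveforms that each model predicts for the same stem. The function name `average_ensemble` and the weights are hypothetical.

```python
def average_ensemble(estimates, weights=None):
    """Weighted sample-wise average of several estimates of one stem.

    estimates: list of equal-length sample lists, one per model
    weights:   optional per-model weights (hypothetical; defaults to equal)
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    length = len(estimates[0])
    if any(len(e) != length for e in estimates):
        raise ValueError("all estimates must have the same length")
    if weights is None:
        weights = [1.0] * len(estimates)
    if len(weights) != len(estimates):
        raise ValueError("one weight per estimate is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Combine the models sample by sample, normalising by the weight sum.
    return [
        sum(w * e[i] for w, e in zip(weights, estimates)) / total
        for i in range(length)
    ]


# Toy example: two models' vocal estimates, the first trusted three times more.
model_a = [1.0, 0.0]
model_b = [0.0, 1.0]
print(average_ensemble([model_a, model_b], weights=[3.0, 1.0]))
```

Averaging tends to cancel errors that the member models do not share, which is one plausible reason an ensemble can score slightly above its best single model.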

3) We received reports of "click" sounds in separated stems. We have improved our inference code, so they should be gone now. Please check and let us know if the problem persists.

