An ensemble of UVR-MDX-NET-Voc_FT, Demucs4 Vocals 2023 and two MDX23C models. This algorithm aims at the highest possible quality for the vocal and instrumental stems.
This ensemble is based on the algorithm that took 2nd place in the Music Demixing Track of the Sound Demixing Challenge 2023. The main change compared to the contest version is the use of much better vocal models. We use the following 4 models for vocals: UVR-MDX-NET-Voc_FT, Demucs4 Vocals 2023 and two MDX23C models. For the stems 'bass', 'drums' and 'other' we use the following 4 models: htdemucs_ft, htdemucs, htdemucs_6s and hdemucs_mmi. The original contest model is available here: https://github.com/ZFTurbo/MVSEP-MDX23-music-separation-model
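The text does not spell out how the outputs of the individual models are combined. One common approach is a weighted per-sample average of the estimates for the same stem, which can be sketched as follows. The function name, the weights and the toy signals are illustrative assumptions, not the values used by the actual ensemble.

```python
def ensemble_average(estimates, weights=None):
    """Weighted per-sample average of several estimates of the same stem.

    estimates: list of equal-length lists of samples, one per model.
    weights:   optional list of per-model weights (hypothetical values);
               defaults to equal weights.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    n = len(estimates[0])
    if any(len(e) != n for e in estimates):
        raise ValueError("all estimates must have the same length")
    if weights is None:
        weights = [1.0] * len(estimates)
    if len(weights) != len(estimates):
        raise ValueError("need exactly one weight per estimate")
    total = sum(weights)
    return [
        sum(w * e[i] for w, e in zip(weights, estimates)) / total
        for i in range(n)
    ]


# Toy example: four made-up "vocal" estimates of a 3-sample signal,
# standing in for the outputs of the four vocal models.
vocal_estimates = [
    [0.20, 0.40, -0.10],
    [0.22, 0.38, -0.12],
    [0.18, 0.41, -0.09],
    [0.21, 0.39, -0.11],
]
vocals = ensemble_average(vocal_estimates, weights=[2.0, 1.0, 1.0, 1.0])
```

Averaging tends to help most when the models make different kinds of errors, which is why models with different architectures are useful in an ensemble.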
The Demucs4 HT algorithm splits a track into 4 stems (bass, drums, vocals, other). It is currently the best option for bass/drums/other separation. It was released in 2022 and has 3 versions:
htdemucs_ft - best quality, but slow
htdemucs - lower quality, but fast
htdemucs_6s - has 2 additional stems, "piano" and "guitar" (their quality is still so-so).
The new MDX23C set of models is based on code released by kuielab for the Sound Demixing Challenge 2023. All models are full band, i.e. they don't cut high frequencies.
The Demucs4 Vocals 2023 model is a Demucs4 HT model fine-tuned on a large vocal/instrumental dataset. It has better metrics for vocal separation than Demucs4 HT (_ft version). It usually gives worse metrics than the MDX23C models, but can be useful in ensembles, since the model is very different from MDX23C.
The MDX-B Karaoke model was prepared as part of the Ultimate Vocal Remover project. The model produces high-quality lead vocal extraction from a music track. The model is available in two versions. In the first version, the neural network model is used directly on the entire track. In the second version, the track is first divided into two parts, vocal and instrumental, and then the neural network model is applied only to the vocal part. In the second version, the quality of separation is usually higher and it becomes possible to additionally separate the backing vocals into a separate track. The model was compared with two other models from UVR (they are also available on the website) on a large validation set. The metric used is SDR: the higher the better.
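For reference, SDR (signal-to-distortion ratio) in its basic form is the ratio, in decibels, of the energy of the reference stem to the energy of the error between reference and estimate. The sketch below implements only that basic definition; how the scores are aggregated over tracks and channels on the validation sets is not described here, and the small `eps` term is an assumption added to avoid division by zero.

```python
import math


def sdr(reference, estimate, eps=1e-10):
    """Basic signal-to-distortion ratio in dB (higher is better).

    reference: ground-truth stem samples.
    estimate:  separated stem samples of the same length.
    eps:       small constant (an assumption) guarding against zero energy.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: an estimate that is 10% too quiet on every sample.
reference = [1.0, -1.0, 1.0, -1.0]
rough = [0.9, -0.9, 0.9, -0.9]
closer = [0.99, -0.99, 0.99, -0.99]
print(f"rough:  {sdr(reference, rough):.2f} dB")
print(f"closer: {sdr(reference, closer):.2f} dB")
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in error energy for the same reference.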
The MVSep Piano model is based on the MDX23C architecture. It produces high-quality separation. The model was compared with two other models (Demucs4HT (6 stems) and GSEP) on two validation sets. The first validation set includes electric piano as part of piano, while the second contains only acoustic piano (grand piano). The metric used is SDR: the higher the better.
See the results in the table below.
Validation type \ Algorithm name | Demucs4HT (6 stems) | GSEP | MVSep Piano 2023 (Type 0) | MVSep Piano 2023 (Type 1)
Validation full | 2.4432 | 3.5589 | 4.9187 | 4.9772
Validation (only grand piano) | 4.5591 | 5.7180 | 7.2651 | 7.2948
The model is available in two variants. In the first variant, the Piano model is used directly on the entire track. In the second variant, the track is first divided into two parts, vocal and instrumental, and then the Piano model is applied to the instrumental part only. In the second case, the separation quality is usually a bit better.
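The difference between the two variants can be sketched as below. Everything here is an illustrative assumption: the function names are hypothetical, the models are passed in as plain callables, and the instrumental part is obtained by subtracting the vocal estimate from the mix, which is one common way to do it but is not confirmed by the text.

```python
def separate_direct(mix, target_model):
    """Variant 1: apply the target (e.g. piano) model to the full mix."""
    return target_model(mix)


def separate_two_stage(mix, vocal_model, target_model):
    """Variant 2: remove vocals first, then apply the target model
    to the instrumental part only."""
    vocals = vocal_model(mix)
    # Assumption: instrumental = mix - vocals, sample by sample.
    instrumental = [m - v for m, v in zip(mix, vocals)]
    return target_model(instrumental)


# Toy stand-ins for real neural networks (hypothetical behaviour):
def toy_vocal_model(signal):
    return [0.5 * s for s in signal]


def toy_piano_model(signal):
    return [0.5 * s for s in signal]


mix = [1.0, 2.0]
piano_direct = separate_direct(mix, toy_piano_model)
piano_two_stage = separate_two_stage(mix, toy_vocal_model, toy_piano_model)
```

The idea behind the second variant is that the target model no longer has to cope with vocals, at the cost of running an extra model first.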
The MVSep Guitar model is based on the MDX23C architecture. The model produces high-quality separation of music into a guitar part (including acoustic and electric) and everything else. The model was compared with the Demucs4HT model (6 stems) on a guitar validation set. The metric used is SDR: the higher the better.
See the results in the table below.
Validation type \ Algorithm name | Demucs4HT (6 stems) | MVSep Guitar 2023 (Type 0) | MVSep Guitar 2023 (Type 1)
Validation guitar | 7.2245 | 7.7716 | 7.9251
Validation other | 13.1756 | 13.7227 | 13.8762
The model is available in two versions. In the first version, the neural network model for the guitar is used directly on the entire track. In the second version, the track is first divided into two parts, vocal and instrumental, and then the neural network model for the guitar is applied only to the instrumental part. In the second case, the separation quality is usually slightly higher.
The Demucs3 algorithm splits a track into 4 stems (bass, drums, vocals, other). It won the Music Demixing Challenge 2021. Only MUSDB18 training data was used to train this model, so its quality is worse than that of Demucs3 Model B. Demucs3 Model A and Demucs3 Model B have the same architecture but different weights.