September updates

2023-09-18

1) We upgraded our main MDX23C 8K FFT model, which splits tracks into vocal and instrumental parts. Its SDR metrics have increased on both the MultiSong dataset and the Synth dataset, and separation results of the Ensemble 4 and Ensemble 8 models have improved accordingly. See the changes in the table below.

Algorithm name	Multisong dataset, SDR Vocals	Multisong dataset, SDR Instrumental	Synth dataset, SDR Vocals	Synth dataset, SDR Instrumental	MDX23 Leaderboard, SDR Vocals
8K FFT, Full Band (Old version)	10.01	16.32	12.07	11.77	10.85
8K FFT, Full Band (New version)	10.17	16.48	12.35	12.06	11.04
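
For readers unfamiliar with the metric: SDR (signal-to-distortion ratio) in source-separation benchmarks is commonly computed as ten times the base-10 logarithm of the ratio between the energy of the reference stem and the energy of the estimation error. The sketch below illustrates that common definition only. The function name `sdr` and the toy signals are made up for this example, and the exact evaluation code behind the numbers in the table is not described in this post.

```python
import math
from typing import Sequence


def sdr(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-9) -> float:
    """Signal-to-distortion ratio in dB: 10 * log10(signal energy / error energy)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps keeps the ratio finite for silent references or perfect estimates.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy signals (hypothetical): the closer estimate should score higher.
reference = [0.0, 0.5, -0.5, 0.25]
close_estimate = [0.0, 0.45, -0.5, 0.25]
rough_estimate = [0.1, 0.3, -0.2, 0.0]

print(sdr(reference, close_estimate) > sdr(reference, rough_estimate))
```

Because the scale is logarithmic, a gain of a few tenths of a dB, as in the table above, corresponds to a modest but measurable reduction in error energy.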

2) We have added two new models: MVSep Piano (demo) and MVSep Guitar (demo). Both are based on the MDX23C architecture and produce high-quality separation of music into a piano or guitar part and everything else. Each model is available in two variants. In the first variant (Type 0 in the tables below), the neural network is applied directly to the entire track. In the second variant (Type 1), the track is first split into vocal and instrumental parts, and the neural network is then applied only to the instrumental part. The second variant usually gives slightly higher separation quality. We also prepared a small internal validation set to compare models on how well they separate piano or guitar from the main track, and compared our models with two others: Demucs4HT (6 stems) and GSEP. For piano there are two validation sets: the first counts electric piano as part of the piano part, while the second includes only acoustic piano. The metric is SDR (higher is better). See the results in the two tables below.

Validation type	Demucs4HT (6 stems)	GSEP	MVSep Piano 2023 (Type 0)	MVSep Piano 2023 (Type 1)
Validation full	2.4432	3.5589	4.9187	4.9772
Validation (only grand piano)	4.5591	5.7180	7.2651	7.2948

Validation type	Demucs4HT (6 stems)	MVSep Guitar 2023 (Type 0)	MVSep Guitar 2023 (Type 1)
Validation guitar	7.2245	7.7716	7.9251
Validation other	13.1756	13.7227	13.8762
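
The difference between the direct variant and the two-stage variant described in this item can be sketched as follows. This is only an illustration: `separate_type0`, `separate_type1`, and the toy models are hypothetical names, the real models are neural networks rather than simple scalings, and the way "everything else" is assembled in the two-stage case is an assumption, not something this post specifies.

```python
from typing import Callable, List, Tuple

Audio = List[float]  # mono samples, for illustration only
Separator = Callable[[Audio], Tuple[Audio, Audio]]  # mix -> (target stem, remainder)


def separate_type0(mix: Audio, instrument_model: Separator) -> Tuple[Audio, Audio]:
    """Direct variant: run the instrument model on the full track."""
    return instrument_model(mix)


def separate_type1(mix: Audio, vocal_model: Separator,
                   instrument_model: Separator) -> Tuple[Audio, Audio]:
    """Two-stage variant: remove vocals first, then separate the instrumental."""
    vocals, instrumental = vocal_model(mix)
    target, rest = instrument_model(instrumental)
    # Assumption: "everything else" = vocals + what the instrument model left behind.
    other = [v + r for v, r in zip(vocals, rest)]
    return target, other


# Toy stand-ins that only scale the input (hypothetical, not real models).
def toy_vocal_model(x: Audio) -> Tuple[Audio, Audio]:
    return [0.5 * s for s in x], [0.5 * s for s in x]


def toy_instrument_model(x: Audio) -> Tuple[Audio, Audio]:
    return [0.25 * s for s in x], [0.75 * s for s in x]


mix = [1.0, -2.0, 4.0]
piano, other = separate_type1(mix, toy_vocal_model, toy_instrument_model)
```

In the two-stage variant the instrument model never sees the vocals, which is one plausible reason it scores slightly higher in the tables above.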

3) We have updated the MDX-B Karaoke model (demo), which now has better quality metrics. The model was originally prepared as part of the Ultimate Vocal Remover (UVR) project and produces high-quality extraction of the lead vocal part from a music track. It is also available in two variants. In the first variant (Type 0 in the table below), the neural network is applied directly to the whole track. In the second variant (Type 1), the track is first split into vocal and instrumental parts, and the neural network is then applied only to the vocal part. The second variant usually gives higher separation quality and also makes it possible to extract backing vocals into a separate track. The model was compared on a large validation set with two other Karaoke models from UVR, which are also available on the website. See the results in the table below.

Validation type	UVR (HP-KAROKEE-MSB2-3BAND-3090)	UVR (karokee_4band_v2_sn)	MDX-B Karaoke (Type 0)	MDX-B Karaoke (Type 1)
Validation lead vocals	6.46	6.34	6.81	7.94
Validation other	13.17	13.02	13.53	14.66
Validation back vocals	---	---	---	1.88
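
The table reports a backing-vocal score only for Type 1. One way to picture why: when the Karaoke model runs on an already isolated vocal stem, whatever it does not assign to the lead vocal can be kept as a backing-vocal stem. The sketch below shows that idea under stated assumptions. `karaoke_type1` and the toy models are hypothetical, and treating "other" as the instrumental plus the backing vocals is a guess about the stem layout, not something this post confirms.

```python
from typing import Callable, Dict, List, Tuple

Audio = List[float]  # mono samples, for illustration only
Separator = Callable[[Audio], Tuple[Audio, Audio]]  # input -> (target, remainder)


def karaoke_type1(mix: Audio, vocal_model: Separator,
                  karaoke_model: Separator) -> Dict[str, Audio]:
    """Split vocals from the mix, then split the vocal stem into lead and back."""
    vocals, instrumental = vocal_model(mix)
    lead, back = karaoke_model(vocals)  # Karaoke model sees the vocal stem only
    # Assumption: "other" is everything except the lead vocal.
    other = [i + b for i, b in zip(instrumental, back)]
    return {"lead": lead, "back": back, "other": other}


# Toy stand-ins that only scale the input (hypothetical, not real models).
def toy_vocal_model(x: Audio) -> Tuple[Audio, Audio]:
    return [0.5 * s for s in x], [0.5 * s for s in x]


def toy_karaoke_model(x: Audio) -> Tuple[Audio, Audio]:
    return [0.5 * s for s in x], [0.5 * s for s in x]


stems = karaoke_type1([2.0, -4.0], toy_vocal_model, toy_karaoke_model)
```

When the model is applied directly to the whole track, the remainder contains backing vocals mixed with the instruments, so no separate backing-vocal stem falls out of a single pass.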
