1) We have upgraded our main MDX23C 8K FFT model, which splits tracks into vocal and instrumental parts. SDR metrics have increased on both the MultiSong and Synth datasets, and separation results have improved accordingly for the Ensemble 4 and Ensemble 8 models. See the changes in the table below.
| Algorithm name | MultiSong dataset (SDR Vocals) | MultiSong dataset (SDR Instrumental) | Synth dataset (SDR Vocals) | Synth dataset (SDR Instrumental) | MDX23 Leaderboard (SDR Vocals) |
| --- | --- | --- | --- | --- | --- |
| 8K FFT, Full Band (Old version) | 10.01 | 16.32 | 12.07 | 11.77 | 10.85 |
| 8K FFT, Full Band (New version) | 10.17 | 16.48 | 12.35 | 12.06 | 11.04 |
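For reference, the numbers above (and in the tables further down) use the usual signal-to-distortion ratio in dB. A minimal sketch of that computation, assuming the reference stem and the model estimate are NumPy arrays of the same shape:

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-9) -> float:
    """Signal-to-Distortion Ratio in dB; higher is better."""
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))
```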
2) We have added two new models, MVSep Piano (demo) and MVSep Guitar (demo). Both are based on the MDX23C architecture and produce high-quality separation of music into a piano/guitar part and everything else. Each model is available in two variants. In the first variant, the neural network is applied directly to the entire track. In the second variant, the track is first split into vocal and instrumental parts, and the neural network is then applied only to the instrumental part; in this case the separation quality is usually a bit higher (a sketch of both variants follows the tables below). We also prepared a small internal validation set to compare the models by how well they separate piano/guitar from the main track. Our models were compared with two other models, Demucs4HT (6 stems) and GSEP. For piano we have two validation sets: the first counts the electric piano as part of the piano stem, while the second includes only the acoustic piano.
The metric used is SDR: the higher, the better. See the results in the two tables below.
| Validation type | Demucs4HT (6 stems) | GSEP | MVSep Piano 2023 (Type 0) | MVSep Piano 2023 (Type 1) |
| --- | --- | --- | --- | --- |
| Validation full | 2.4432 | 3.5589 | 4.9187 | 4.9772 |
| Validation (only grand piano) | 4.5591 | 5.7180 | 7.2651 | 7.2948 |
| Validation type | Demucs4HT (6 stems) | MVSep Guitar 2023 (Type 0) | MVSep Guitar 2023 (Type 1) |
| --- | --- | --- | --- |
| Validation guitar | 7.2245 | 7.7716 | 7.9251 |
| Validation other | 13.1756 | 13.7227 | 13.8762 |
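The two variants (Type 0 and Type 1) differ only in what audio the piano/guitar network sees. A rough sketch of the idea, where `separate_vocals` and `separate_piano` are hypothetical wrappers around the vocal/instrumental and piano models (not actual MVSep API calls):

```python
import numpy as np

def piano_type_0(mixture: np.ndarray, separate_piano):
    # Type 0: apply the piano model directly to the full mixture.
    piano = separate_piano(mixture)
    other = mixture - piano
    return piano, other

def piano_type_1(mixture: np.ndarray, separate_vocals, separate_piano):
    # Type 1: remove vocals first, then run the piano model
    # on the instrumental part only; vocals end up in "other".
    vocals = separate_vocals(mixture)
    instrumental = mixture - vocals
    piano = separate_piano(instrumental)
    other = mixture - piano
    return piano, other
```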
3) We have updated the MDX-B Karaoke model (demo); its quality metrics have improved. The model was originally prepared as part of the Ultimate Vocal Remover (UVR) project and produces high-quality extraction of the lead vocal part from a music track. It is also available in two variants. In the first variant, the neural network is applied directly to the whole track. In the second variant, the track is first split into vocal and instrumental parts, and the neural network is then applied only to the vocal part; in this case the separation quality is usually higher and the backing vocals can be extracted into a separate track (see the sketch after the table below). The model was compared on a large validation set with two other karaoke models from UVR (also available on the website). See the results in the table below.
| Validation type | UVR (HP-KAROKEE-MSB2-3BAND-3090) | UVR (karokee_4band_v2_sn) | MDX-B Karaoke (Type 0) | MDX-B Karaoke (Type 1) |
| --- | --- | --- | --- | --- |
| Validation lead vocals | 6.46 | 6.34 | 6.81 | 7.94 |
| Validation other | 13.17 | 13.02 | 13.53 | 14.66 |
| Validation back vocals | --- | --- | --- | 1.88 |
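In the Type 1 karaoke variant the lead-vocal model only ever sees the vocal stem, which is also what makes a separate backing-vocal track possible: backing vocals are simply the vocal stem minus the extracted lead. A minimal sketch, with `separate_vocals` and `extract_lead_vocal` as hypothetical helpers standing in for the vocal/instrumental and karaoke models:

```python
import numpy as np

def karaoke_type_1(mixture: np.ndarray, separate_vocals, extract_lead_vocal):
    # Split the mix into vocals and instrumental first.
    vocals = separate_vocals(mixture)
    instrumental = mixture - vocals
    # Run the karaoke (lead-vocal) model on the vocal stem only.
    lead = extract_lead_vocal(vocals)
    back = vocals - lead            # backing vocals as a separate track
    karaoke = instrumental + back   # everything except the lead vocal
    return lead, back, karaoke
```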