The MDX-B Karaoke model was prepared as part of the Ultimate Vocal Remover (UVR) project. It extracts high-quality lead vocals from a music track and is available in two versions. In the first version, the neural network is applied directly to the entire track. In the second version, the track is first split into vocal and instrumental parts, and the neural network is then applied only to the vocal part. The second version usually gives higher separation quality and also makes it possible to extract the backing vocals as a separate track.

The model was compared with two other UVR models (also available on the website) on a large validation set. The metric is SDR (signal-to-distortion ratio): the higher, the better.
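The difference between the two versions can be sketched in a few lines of Python. This is only an illustration of the data flow, not the project's actual code: `vocal_splitter` and `karaoke_model` are hypothetical callables that each take a waveform and return two stems, and forming the "other" stem as instrumental plus backing vocals is an assumption made for the sketch.

```python
import numpy as np
from typing import Callable, Dict, Tuple

# Hypothetical interface: a separator maps one waveform to two stems.
Separator = Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]


def karaoke_direct(mix: np.ndarray, karaoke_model: Separator) -> Dict[str, np.ndarray]:
    """First version: run the karaoke model on the entire track."""
    lead, other = karaoke_model(mix)
    return {"lead_vocals": lead, "other": other}


def karaoke_two_stage(
    mix: np.ndarray, vocal_splitter: Separator, karaoke_model: Separator
) -> Dict[str, np.ndarray]:
    """Second version: split vocals from instrumental, then run the
    karaoke model on the vocal part only."""
    vocals, instrumental = vocal_splitter(mix)
    lead, backing = karaoke_model(vocals)
    return {
        "lead_vocals": lead,
        "back_vocals": backing,
        # Assumption: everything that is not lead vocals counts as "other".
        "other": instrumental + backing,
    }


# Toy stand-ins so the sketch runs without any trained model.
def toy_splitter(x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    return 0.6 * x, 0.4 * x


def toy_karaoke(x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    return 0.5 * x, 0.5 * x


mix = np.linspace(-1.0, 1.0, 8)
stems = karaoke_two_stage(mix, toy_splitter, toy_karaoke)
```

Only the two-stage variant returns a backing vocals stem, because the karaoke model sees vocals only and its second output is therefore the remaining (backing) vocals.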
See the results in the table below.
Validation type | UVR (HP-KAROKEE-MSB2-3BAND-3090) | UVR (karokee_4band_v2_sn) | MDX-B Karaoke (Type 0) | MDX-B Karaoke (Type 1) |
Validation lead vocals | 6.46 | 6.34 | 6.81 | 7.94 |
Validation other | 13.17 | 13.02 | 13.53 | 14.66 |
Validation back vocals | --- | --- | --- | 1.88 |
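For readers unfamiliar with the metric, here is a minimal sketch of how SDR is commonly computed: the ratio of reference signal energy to error energy, expressed in decibels. The exact SDR variant used for the numbers above is not stated, so treat this as the plain textbook definition; the function name `sdr` and the `eps` stabiliser are illustrative choices.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-9) -> float:
    """Signal-to-distortion ratio in dB between a reference stem and an estimate.

    Computes 10 * log10(sum(reference^2) / sum((reference - estimate)^2)).
    `eps` avoids division by zero and log(0) for silent or perfect inputs.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))


# An estimate that is off by 10% in amplitude has 1/100 of the reference
# energy as error, which corresponds to about 20 dB.
reference = np.ones(1000)
rough = sdr(reference, 0.9 * reference)
better = sdr(reference, 0.99 * reference)
```

A closer estimate leaves less error energy and therefore scores a higher SDR, which is why larger values in the table indicate better separation.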