2023-11-20
We have prepared a unique model for removing crowd sounds (applause, clapping, whistling, noise, etc.) from music recordings. Current metrics on our internal quality-control dataset:
- SDR crowd: 5.65
- SDR other: 19.31
2023-11-11
We upgraded our main MDX23C 8K FFT model to split tracks into vocal and instrumental parts. SDR metrics have increased on MultiSong Dataset and on Synth Dataset. Separation results have improved accordingly on both Ensemble 4 and Ensemble 8 models. See the changes in the table below.
Algorithm name | Multisong SDR Vocals | Multisong SDR Instrumental | Synth SDR Vocals | Synth SDR Instrumental | MDX23 Leaderboard SDR Vocals |
---|---|---|---|---|---|
8K FFT, Full Band (Previous version) | 10.17 | 16.48 | 12.35 | 12.06 | 11.04 |
8K FFT, Full Band (New version) | 10.36 | 16.66 | 12.52 | 12.22 | 11.16 |
Ensemble 4 (Previous version) | 10.32 | 16.63 | 12.67 | 12.38 | 11.09 |
Ensemble 4 (New version) | 10.44 | 16.74 | 12.76 | 12.46 | 11.17 |
The previous version of MDX23C 8K FFT is also available for use.
2023-09-18
1) We upgraded our main MDX23C 8K FFT model to split tracks into vocal and instrumental parts. SDR metrics have increased on MultiSong Dataset and on Synth Dataset. Separation results have improved accordingly on both Ensemble 4 and Ensemble 8 models. See the changes in the table below.
Algorithm name | Multisong SDR Vocals | Multisong SDR Instrumental | Synth SDR Vocals | Synth SDR Instrumental | MDX23 Leaderboard SDR Vocals |
---|---|---|---|---|---|
8K FFT, Full Band (Old version) | 10.01 | 16.32 | 12.07 | 11.77 | 10.85 |
8K FFT, Full Band (New version) | 10.17 | 16.48 | 12.35 | 12.06 | 11.04 |
2) We have added two new models MVSep Piano (demo) and MVSep Guitar (demo). Both models are based on the MDX23C architecture. The models produce high quality separation of music into piano/guitar part and everything else. Each of the models is available in two variants. In the first variant, the neural network model is used directly on the entire track. In the second variant, the track is first split into two parts, vocal and instrumental, and then the neural network model is applied only to the instrumental part. In the second case, the separation quality is usually a bit higher. We also prepared a small internal validation set to compare the models by the quality of separation of piano/guitar from the main track. Our model was compared with two other models (Demucs4HT (6 stems) and GSEP). For the piano, we have two validation sets. The first set includes the electric piano as part of the piano part and the second set includes only the acoustic piano.
The metric used is SDR: the larger the better. See the results in the two tables below.
Validation type | Demucs4HT (6 stems) | GSEP | MVSep Piano 2023 (Type 0) | MVSep Piano 2023 (Type 1) |
---|---|---|---|---|
Validation full | 2.4432 | 3.5589 | 4.9187 | 4.9772 |
Validation (only grand piano) | 4.5591 | 5.7180 | 7.2651 | 7.2948 |

Validation type | Demucs4HT (6 stems) | MVSep Guitar 2023 (Type 0) | MVSep Guitar 2023 (Type 1) |
---|---|---|---|
Validation guitar | 7.2245 | 7.7716 | 7.9251 |
Validation other | 13.1756 | 13.7227 | 13.8762 |
3) We have updated the MDX-B Karaoke model (demo). It now has better quality metrics. The MDX-B Karaoke model was originally prepared as part of the Ultimate Vocal Remover project. The model produces high quality extraction of the lead vocal part from a music track. We have also made it available in two variants. In the first variant, the neural network model is used directly on the whole track. In the second variant, the track is first divided into two parts, vocal and instrumental, and then the neural network model is applied only to the vocal part. In the second case, the separation quality is usually higher and it is possible to extract backing vocals into a separate track. The model was compared on a large validation set with two other Karaoke models from UVR (they are also available on the website). See the results in the table below.
Validation type | UVR (HP-KAROKEE-MSB2-3BAND-3090) | UVR (karokee_4band_v2_sn) | MDX-B Karaoke (Type 0) | MDX-B Karaoke (Type 1) |
---|---|---|---|---|
Validation lead vocals | 6.46 | 6.34 | 6.81 | 7.94 |
Validation other | 13.17 | 13.02 | 13.53 | 14.66 |
Validation back vocals | --- | --- | --- | 1.88 |
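
The two variants described above (Type 0 and Type 1) can be sketched as a small pipeline. This is only an illustration under stated assumptions, not MVSep's actual code: `vocal_splitter`, `stem_model`, and the `second_stage_input` parameter are hypothetical names, and the toy models below merely scale the samples so that the example runs.

```python
def separate_type0(track, stem_model):
    """Type 0: run the stem model directly on the whole track."""
    return stem_model(track)


def separate_type1(track, vocal_splitter, stem_model,
                   second_stage_input="instrumental"):
    """Type 1: split into vocals/instrumental first, then run the stem
    model on only one of the two parts."""
    vocals, instrumental = vocal_splitter(track)
    if second_stage_input == "instrumental":    # piano / guitar case
        selected, untouched = instrumental, vocals
    elif second_stage_input == "vocals":        # karaoke case
        selected, untouched = vocals, instrumental
    else:
        raise ValueError("second_stage_input must be 'instrumental' or 'vocals'")
    stem, rest = stem_model(selected)
    return stem, rest, untouched


# Hypothetical toy models: they only scale samples so the sketch is runnable.
def toy_vocal_splitter(track):
    return [0.25 * x for x in track], [0.75 * x for x in track]


def toy_stem_model(track):
    return [0.5 * x for x in track], [0.5 * x for x in track]


track = [1.0, -2.0, 4.0]
piano0, other0 = separate_type0(track, toy_stem_model)
piano1, rest1, vocals1 = separate_type1(track, toy_vocal_splitter, toy_stem_model)
# In Type 1, "everything else" is the rest of the instrumental plus the vocals.
other1 = [r + v for r, v in zip(rest1, vocals1)]
```

In the karaoke case (`second_stage_input="vocals"`), the three returned values would correspond to lead vocals, backing vocals, and the instrumental, which is why only Type 1 can provide a separate backing-vocals track.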
2023-08-08
We have a lot of updates. First of all, we rebuilt the site from scratch. It has new features such as user registration, more informative pages, and a better design. We also added a set of new algorithms:
1) We have released the MDX23C models and updated them. One of the models reached 10 SDR on the Multisong dataset. It is currently the best single model for vocal/instrumental separation.
2) We added a new algorithm, Demucs4 Vocals 2023. It is the htdemucs_ft algorithm fine-tuned on a large dataset. Its metrics are better than the original's, but slightly worse than those of MDX23C. On some melodies it can give cleaner results.
3) We added new Ensemble algorithms. The first is "Ensemble 4 models (vocals, instrum)". It includes UVR-MDX-NET-Voc_FT, Demucs4 Vocals 2023 and two MDX23C models. This algorithm gives the highest quality among our algorithms for the vocal and instrumental stems. If you need a more detailed separation that includes 3 more stems ("bass", "drums", "other"), you can use "Ensemble 8 models (vocals, bass, drums, other)". This ensemble gives state-of-the-art results for 4-stem separation.
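
The text above does not say how the ensembles combine the outputs of their member models. One common approach is a weighted average of the separated waveforms, which the sketch below illustrates. The function name `ensemble_average`, the weights, and the sample values are all assumptions made for this example, not details of the MVSep ensembles.

```python
def ensemble_average(estimates, weights=None):
    """Weighted average of several estimates of the same stem.

    estimates: list of equal-length sample lists, one per model.
    weights:   optional list with one non-negative weight per model
               (assumed here; defaults to equal weights).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    length = len(estimates[0])
    if any(len(e) != length for e in estimates):
        raise ValueError("all estimates must have the same length")
    if weights is None:
        weights = [1.0] * len(estimates)
    if len(weights) != len(estimates):
        raise ValueError("one weight per estimate is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * e[i] for w, e in zip(weights, estimates)) / total
        for i in range(length)
    ]


# Two hypothetical vocal estimates of the same two-sample excerpt.
vocals_a = [1.0, 2.0]
vocals_b = [3.0, 4.0]
equal_mix = ensemble_average([vocals_a, vocals_b])
weighted_mix = ensemble_average([[0.0, 0.0], [4.0, 8.0]], weights=[3.0, 1.0])
```

Averaging tends to cancel errors that the individual models do not share, which is one plausible reason an ensemble can score above each of its members.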
You can find comparison tables below (larger SDR is better).
Algorithm name | Multisong SDR Vocals | Multisong SDR Instrumental | Synth SDR Vocals | Synth SDR Instrumental | MDX23 Leaderboard SDR Vocals |
---|---|---|---|---|---|
Ensemble of 4 models | 10.18 | 16.48 | 12.25 | 11.95 | 10.95 |
MDX23C, 8K FFT, Full Band | 10.01 | 16.32 | 12.07 | 11.77 | 10.85 |
UVR-MDX-NET-Voc_FT | 9.64 | 15.95 | 11.40 | 11.10 | 10.50 |
Demucs4 HT Vocals 2023 | 9.04 | 15.35 | 11.59 | 11.29 | 9.61 |
Demucs4 HT default (htdemucs_ft) | 8.33 | 14.63 | 10.23 | 9.94 | 9.08 |

Algorithm name | Multisong SDR Bass | Multisong SDR Drums | Multisong SDR Other | Multisong SDR Vocals | Multisong SDR Instrumental |
---|---|---|---|---|---|
Ensemble of 8 models | 12.52 | 11.73 | 6.93 | 10.17 | 16.48 |
Demucs 4 HT default (htdemucs_ft) | 12.05 | 11.24 | 5.74 | 8.33 | 14.63 |
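
For readers unfamiliar with the metric used in the tables above, SDR (signal-to-distortion ratio) compares the energy of the reference stem with the energy of the difference between the reference and the separated result, in decibels. The sketch below uses one common definition, 10·log10(‖ref‖² / ‖ref − est‖²). It is not stated here which SDR variant produced the reported numbers, so the formula, the helper name `sdr`, and the sample values are assumptions for illustration only.

```python
import math


def sdr(reference, estimate, eps=1e-9):
    """Signal-to-distortion ratio in dB for two equal-length sample lists.

    eps is a small constant (an assumption of this sketch) that avoids
    division by zero and log(0) for silent or perfectly matched signals.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


reference = [0.0, 0.5, -0.5, 1.0]
close_estimate = [0.0, 0.45, -0.55, 0.9]   # small errors
rough_estimate = [0.1, 0.2, -0.9, 0.4]     # larger errors

# The closer estimate gets the higher SDR: larger is better.
better = sdr(reference, close_estimate) > sdr(reference, rough_estimate)
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the error energy, so differences of a few tenths of a dB between rows reflect modest but measurable improvements.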
2023-07-06
2023-05-22
2023-04-30
2022-11-13
2022-07-29
An experimental MVSep DNR algorithm has been added to the site, which divides tracks into 3 parts: music, special effects and voice. The algorithm was trained on the "Divide and Remaster" dataset. Quality Metrics:
SDR DNR for music: 6.17
SDR DNR for sfx: 7.26
SDR DNR for speech: 14.13
The algorithm is not well suited to ordinary music, but it does a good job when you need to clean a speaker's voice of extraneous background noise.
Examples of the MVSep DNR algorithm
2022-07-07
Quality metrics for algorithms including UVD Demucs can be found here.
2022-04-18
Quality metrics for these and other algorithms can be found here.
2022-02-24
2021-12-23
Unfortunately, all of the highest-quality algorithms run very slowly, so large queues form periodically. We are considering what to do about this.
2021-11-12
We had to move to a new server due to a lack of space on the old one. The positive effect is that the video card has been replaced with a more powerful one with more memory. As a result, the waiting queues have decreased and there are fewer errors caused by a lack of GPU memory. The downside is that server costs have doubled.
A new algorithm, Ultimate Vocal Remover (UVR), has been added. It splits the track into two parts, music and vocals. UVR usually does this better than Spleeter. The original UVR offers many models and settings. We have chosen one of the best models and what we consider optimal settings. A flexible choice of settings for the algorithm may be added later.
The winner of the Music Demixing Challenge has finally released his code. We added his models to the site under the names Demucs 3 Model A and Demucs 3 Model B. Demucs 3 Model B gives the better result and works better for bass and drums than the other models, but is slightly inferior to the MDX-B algorithm on vocals.
Below is an updated table comparing the quality of the algorithms (data for UVR are not available). The values in the table are calculated on the private Music Demixing Challenge dataset (available only to the organizers). The higher the value, the better the algorithm works.
Algorithm | Quality (Bass) | Quality (Drums) | Quality (Other) | Quality (Vocals) | Example |
---|---|---|---|---|---|
Spleeter (4 stems) | 5.774 | 5.845 | 4.321 | 6.939 | Example |
UmxXL | 6.619 | 6.838 | 4.891 | 7.732 | Example |
MDX A | 7.232 | 7.173 | 5.636 | 8.901 | Example |
MDX B (Orig) | 7.495 | 7.554 | 5.533 | 8.896 | --- |
MDX B (UVR) | 7.495 | 7.554 | 5.533 | 9.482 | Example |
Ultimate Vocal Remover HQ | --- | --- | --- | --- | Example |
Demucs 3 Model A | 8.115 | 8.037 | 5.193 | 7.968 | Example |
Demucs 3 Model B | 8.856 | 8.850 | 5.978 | 8.756 | Example |
2021-10-19
Two new algorithms for separating tracks have been added to mvsep.com: MDX A and MDX B. These models were created by the participants who took second place in the Music Demixing Challenge. Their solution code and neural network models were made publicly available. We are still waiting for the first-place solution. Even so, these models significantly outperform Spleeter and UmxXL on the competition metrics (see the table above), although they are slower. MDX A differs from MDX B in that it was trained without external data, so its results are slightly worse than those of MDX B. Later, enthusiasts from the UVR project improved the vocal separation model, raising the quality metric from 8.896 to 9.482.
2021-08-30
Examples of separation based on the new algorithm:
umxXL: Monk Turner Fascinoma - Its Your Birthday
umxHQ: Robin Grey - These Days
umxSD: Brad Sucks - Total Breakdown
umxSE: Paper Navy - Swan Song
And finally, some statistics. About 600-750 tracks are separated on the site per day, and more than 300,000 tracks have been split in total. We are moving towards a million.
turbo@mvsep.com