1) We upgraded our main MDX23C 8K FFT model to split tracks into vocal and instrumental parts. SDR metrics have increased on MultiSong Dataset and on Synth Dataset. Separation results have improved accordingly on both Ensemble 4 and Ensemble 8 models. See the changes in the table below.
Algorithm name | Multisong dataset: SDR Vocals | Multisong dataset: SDR Instrumental | Synth dataset: SDR Vocals | Synth dataset: SDR Instrumental | MDX23 Leaderboard: SDR Vocals
--- | --- | --- | --- | --- | ---
8K FFT, Full Band (Old version) | 10.01 | 16.32 | 12.07 | 11.77 | 10.85
8K FFT, Full Band (New version) | 10.17 | 16.48 | 12.35 | 12.06 | 11.04
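For readers unfamiliar with the metric, here is a minimal sketch of how SDR is commonly computed for one track. The exact formula and per-song averaging used for the numbers in the table are not stated here, so the definition below (signal energy over error energy, in dB) and the helper name `sdr` are assumptions for illustration only.

```python
import math


def sdr(reference, estimate, eps=1e-9):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences.

    Assumed definition: 10 * log10(sum(ref^2) / sum((ref - est)^2)).
    eps avoids division by zero and log(0) for silent or perfect inputs.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: a slightly attenuated estimate of a short reference signal.
reference = [0.0, 0.5, -0.5, 1.0]
estimate = [0.0, 0.4, -0.4, 0.9]
print(round(sdr(reference, estimate), 2))  # about 16.99 dB
```

A larger value means the estimate is closer to the reference, which is why higher numbers in the table indicate better separation.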
2) We have added two new models MVSep Piano (demo) and MVSep Guitar (demo). Both models are based on the MDX23C architecture. The models produce high quality separation of music into piano/guitar part and everything else. Each of the models is available in two variants. In the first variant, the neural network model is used directly on the entire track. In the second variant, the track is first split into two parts, vocal and instrumental, and then the neural network model is applied only to the instrumental part. In the second case, the separation quality is usually a bit higher. We also prepared a small internal validation set to compare the models by the quality of separation of piano/guitar from the main track. Our model was compared with two other models (Demucs4HT (6 stems) and GSEP). For the piano, we have two validation sets. The first set includes the electric piano as part of the piano part and the second set includes only the acoustic piano. The metric used is SDR: the larger the better. See the results in the two tables below.
Validation type / Algorithm name | Demucs4HT (6 stems) | GSEP | MVSep Piano 2023 (Type 0) | MVSep Piano 2023 (Type 1)
--- | --- | --- | --- | ---
Validation full | 2.4432 | 3.5589 | 4.9187 | 4.9772
Validation (only grand piano) | 4.5591 | 5.7180 | 7.2651 | 7.2948
Validation type / Algorithm name | Demucs4HT (6 stems) | MVSep Guitar 2023 (Type 0) | MVSep Guitar 2023 (Type 1)
--- | --- | --- | ---
Validation guitar | 7.2245 | 7.7716 | 7.9251
Validation other | 13.1756 | 13.7227 | 13.8762
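The two variants described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `vocal_model` and `instrument_model` are hypothetical callables standing in for the real networks, the mapping of "Type 0" to the direct variant and "Type 1" to the two-stage variant is inferred from the scores, and adding the vocals back into the "everything else" output is our guess at a sensible design, not a description of the site's code.

```python
def run_direct(mix, instrument_model):
    """Assumed 'Type 0': apply the instrument model to the whole track."""
    return instrument_model(mix)


def run_two_stage(mix, vocal_model, instrument_model):
    """Assumed 'Type 1': remove vocals first, then separate the instrumental."""
    vocals = vocal_model(mix)
    # Instrumental is taken as the mix minus the estimated vocals.
    instrumental = [m - v for m, v in zip(mix, vocals)]
    target, rest = instrument_model(instrumental)
    # Assumption: vocals are returned as part of "everything else".
    rest = [r + v for r, v in zip(rest, vocals)]
    return target, rest


# Toy stand-ins for the neural networks (hypothetical, for illustration only).
def toy_vocal_model(audio):
    return [0.5 * x for x in audio]


def toy_instrument_model(audio):
    return [0.25 * x for x in audio], [0.75 * x for x in audio]


mix = [1.0, 2.0, -4.0]
piano, everything_else = run_two_stage(mix, toy_vocal_model, toy_instrument_model)
print(piano, everything_else)
```

In this sketch the two outputs always sum back to the original mix, which is a convenient sanity check for either variant.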
3) We have updated the MDX-B Karaoke model (demo). It now has better quality metrics. The MDX-B Karaoke model was originally prepared as part of the Ultimate Vocal Remover project. The model produces high quality extraction of the lead vocal part from a music track. We have also made it available in two variants. In the first variant, the neural network model is used directly on the whole track. In the second variant, the track is first divided into two parts, vocal and instrumental, and then the neural network model is applied only to the vocal part. In the second case, the separation quality is usually higher and it is possible to extract backing vocals into a separate track. The model was compared on a large validation set with two other Karaoke models from UVR (they are also available on the website). See the results in the table below.
We have a lot of updates. First of all, we rebuilt the site from scratch. It has new features such as user registration, more informative pages and a better design. We also added a set of new algorithms:
1) We have released MDX23C models and updated them. One of the models reached 10 SDR on the Multisong dataset. It is currently the best single model for separating vocals and instrumental.
2) We added a new algorithm, Demucs4 Vocals 2023. It is the demucsht_ft algorithm fine-tuned on a big dataset. Its metrics are better than the original's, but slightly worse than those of MDX23C. On some melodies it can give cleaner results.
3) We added new Ensemble algorithms. The first is "Ensemble 4 models (vocals, instrum)". It includes UVR-MDX-NET-Voc_FT, Demucs4 Vocals 2023 and two MDX23C models. This algorithm gives the highest possible quality for vocal and instrumental stems. If you need a more detailed separation that includes 3 more stems ("bass", "drums", "other"), you can use "Ensemble 8 models (vocals, bass, drums, other)". This ensemble gives state-of-the-art results for 4-stem separation.
You can find comparison tables below (larger SDR is better).
We have released new MDX23C models. They are based on code from kuielab that was prepared for the Sound Demixing Challenge 2023. The output of these models covers the entire frequency spectrum, and they have the highest quality metrics for vocals and music on the MultiSong Dataset. A total of 4 models are available; by default, the model with the highest quality metrics is used. We are currently working on further improvements to these models. More details...
We also prepared a model consisting of an ensemble of several single MDX23C models, which gives even better quality. It is available on the website under the name "MDX23C Ensemble".
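How the ensemble combines its member models is not described here. One common approach is a weighted sample-wise average of the waveforms each model produces for the same stem; the sketch below illustrates that idea only. The function name `ensemble_average` and the weights are hypothetical.

```python
def ensemble_average(estimates, weights=None):
    """Weighted sample-wise average of several estimates of the same stem.

    estimates: list of equal-length sample sequences, one per model.
    weights:   optional per-model weights (assumed; defaults to equal weights).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    if weights is None:
        weights = [1.0] * len(estimates)
    if len(weights) != len(estimates):
        raise ValueError("one weight per estimate is required")
    length = len(estimates[0])
    if any(len(e) != length for e in estimates):
        raise ValueError("all estimates must have the same length")
    total_weight = float(sum(weights))
    return [
        sum(w * e[i] for w, e in zip(weights, estimates)) / total_weight
        for i in range(length)
    ]


# Three toy "vocal" estimates; the third model is trusted twice as much.
vocals = ensemble_average(
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    weights=[1, 1, 2],
)
print(vocals)  # [0.5, 0.5]
```

Averaging tends to cancel errors that differ between models, which is one plausible reason an ensemble can score higher than any single member.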
Since the last update, the MDX-B algorithm produces only vocals and instrumental. This is because its other 3 stems (bass, drums, other) are noticeably worse than those of Demucs4. You can still access the old MDX-B (4 stems) in the Old Models section.
We added Kim_vocal_2 model (trained by Kimberley Jensen) and some other UVR MDX models. Kim_vocal_2 is now used by default.
We upgraded MDX processing to use overlap=0.8, so it now produces higher SDR. For example, Kim_vocal_2 alone gives 9.60 for vocals and 15.91 for instrumental on the Multisong dataset.
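As an illustration of what an overlap setting usually means, here is a minimal sketch of chunked processing in which consecutive chunks overlap and the overlapping predictions are averaged. It assumes that overlap=0.8 is the fraction of each chunk shared with the next one; `model` and `chunk_size` are hypothetical, and the real MDX pipeline may window or blend chunks differently.

```python
def process_with_overlap(samples, model, chunk_size, overlap=0.8):
    """Run `model` on overlapping chunks and average the overlapping outputs.

    overlap is assumed to be the fraction of a chunk shared with the next one,
    so the hop between chunk starts is chunk_size * (1 - overlap).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, int(round(chunk_size * (1.0 - overlap))))
    total = [0.0] * len(samples)
    counts = [0] * len(samples)
    start = 0
    while True:
        end = min(start + chunk_size, len(samples))
        output = model(samples[start:end])
        for i, value in enumerate(output):
            total[start + i] += value
            counts[start + i] += 1
        if end == len(samples):
            break
        start += hop
    # Every sample is covered by at least one chunk, so counts are non-zero.
    return [t / c for t, c in zip(total, counts)]


calls = []


def toy_model(chunk):
    """Identity 'model' that records how many chunks it was asked to process."""
    calls.append(len(chunk))
    return list(chunk)


audio = [float(i) for i in range(23)]
result = process_with_overlap(audio, toy_model, chunk_size=10, overlap=0.8)
print(len(calls), "chunks processed")
```

The trade-off is speed: with a higher overlap each sample is predicted several times, so quality tends to improve at the cost of more model calls.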
A new model has been added to the site to remove the reverb effect from music tracks. It is available under the name "FoxJoy Reverb Removal (other)". Examples of reverb removal can be found here.
All Demucs4 HT models are now available: htdemucs_ft [quality metrics], htdemucs [quality metrics] and htdemucs_6s [quality metrics]. htdemucs_6s divides the track into 6 parts: in addition to the standard parts, it also outputs piano and guitar. These models are the best for getting the bass, drums and other parts of tracks.
Added the best-quality MDX B model for vocal separation: "MDX Kimberley Jensen 2023.02.12 SDR: 9.30 (New)" [quality metrics].
Our own MVSep Vocal Model has been added to the site. It was trained on our own large dataset. It shows good results on test data:
Synth dataset vocal SDR: 10.4523
Synth dataset instrumental SDR: 10.1561
MUSDB18HQ dataset vocal SDR: 8.8292
MUSDB18HQ dataset instrumental SDR: 15.2719
An experimental MVSep DNR algorithm has been added to the site. It divides tracks into 3 parts: music, special effects and voice. The algorithm was trained on the "Divide and Remaster" dataset. Quality metrics:
SDR DNR for music: 6.17
SDR DNR for sfx: 7.26
SDR DNR for speech: 14.13
The algorithm is not well suited for ordinary music, but it does a good job when you need to clean a speaker's voice of extraneous background noise. Examples of the MVSep DNR algorithm
We created an independent synthetic dataset to compare different music source separation algorithms. We published the dataset here, along with an automatic judging system. A leaderboard of the best algorithms is also available.
A new MDX-B UVR vocal model was added. It is the latest release from the UVR Team. You can choose it when selecting the MDX-B algorithm in the form.
New models from Ultimate Vocal Remover based on the demucs3 architecture were added. They are available under the name UVR Demucs in the algorithm list.
Quality metrics for algorithms including UVR Demucs can be found here.
A new algorithm, Danna Sep, was added. This algorithm took 3rd place on Leaderboard A in the Sony Music Demixing Challenge.
A new algorithm, Byte Dance, was added. This algorithm took second place in the vocals category on Leaderboard A in the Sony Music Demixing Challenge. It is trained only on the MUSDB18HQ data and could improve further if more training data is added.
Quality metrics for these and other algorithms can be found here.
Added the ability to select lossless encoding of the created audio-files. Previously, it was possible to use only MP3. Now we added output to WAV and FLAC.
Added the output of the general instrumental track for all main algorithms: MDX, Demucs3 and Unmix.
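How the general instrumental track is built is not specified here. One straightforward way, sketched below as an assumption, is to sum every separated stem except the vocals. The function name `mix_stems` and the stem names in the example are illustrative only.

```python
def mix_stems(stems, exclude=("vocals",)):
    """Sum all stems except the excluded ones into a single track.

    stems: dict mapping stem name to an equal-length sample sequence.
    """
    selected = [audio for name, audio in stems.items() if name not in exclude]
    if not selected:
        raise ValueError("no stems left to mix")
    length = len(selected[0])
    if any(len(audio) != length for audio in selected):
        raise ValueError("all stems must have the same length")
    return [sum(values) for values in zip(*selected)]


# Toy 4-stem result; the instrumental is bass + drums + other.
stems = {
    "vocals": [0.5, 0.5],
    "bass": [0.25, -0.25],
    "drums": [0.5, 0.0],
    "other": [0.125, 0.125],
}
instrumental = mix_stems(stems)
print(instrumental)  # [0.875, -0.125]
```

An alternative with the same intent is to subtract the vocal stem from the original mix, which also keeps anything the other stems missed.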
Added translation of the site into Polish and Indonesian.
Added an automatic script to reset the GPU in case of errors. There should no longer be long server downtime.
Unfortunately, all the highest-quality algorithms are very slow, so large queues form from time to time. We are thinking about how to deal with this.
We had to move to a new server due to lack of space on the old one. The upside is that the video card has been replaced with a more powerful one with more memory. As a result, the waiting queues have decreased and there are fewer errors caused by a lack of GPU memory. The downside is that server costs have doubled.
A new algorithm, Ultimate Vocal Remover (UVR), has been added. It splits the track into two parts, music and vocals. UVR usually does this better than Spleeter. The original UVR has many models and settings. We have chosen one of the best models and optimal settings. A flexible choice of settings for the algorithm may be added later.
The winner of the Music Demixing Challenge has finally released his code. We added its models to the site under the names Demucs3 Model A and Demucs3 Model B. Demucs3 Model B gives the better result and works better for bass and drums than the other models, but is slightly inferior to the MDX-B algorithm on vocals.
Below is an updated table comparing the quality of the algorithms (data for UVR are not available). The values in the table are calculated on the private Music Demixing Challenge dataset (available only to the organizers). The higher the value, the better the algorithm works.
Two new algorithms for separating tracks have been added to mvsep.com: MDX A and MDX B. These models were created by the participants who took second place in the Music Demixing Challenge. Their solution code and neural network models were made publicly available. We are still waiting for the first-place solution. Even so, these models significantly outperform Spleeter and UmxXL in the competition metrics (see the table above), although they are slower. MDX A differs from MDX B in that it did not use external data for training, so its results are slightly worse than those of MDX B. Later, the enthusiasts of the UVR project improved the vocal separation model, getting a better value for the quality metric (8.896 -> 9.482).
Updated software and site code. Splitting tracks is faster and more stable. Backend crashes are becoming less and less common.
Added a new splitting algorithm called UnMix. The algorithm has 4 models: "umxXL", "umxHQ", "umxSD" and "umxSE". The first one, "umxXL", has the highest quality. According to the first tests, it separates the voice a little worse than Spleeter, but the instruments better. In any case, there is now a lot of room for experimenting with tracks.
The page with the split results has been redesigned: the original track has been added, so it is convenient to compare everything on one page. Added sharing settings. The page now displays information on the uploaded file, ID3 tags and an image (if any).
And finally, some statistics. About 600-750 tracks are split on the site per day. In total, more than 300,000 tracks have been split. We are moving towards a million.