分离人声与伴奏

Ensemble (vocals, instrum)

Ensemble of best vocal models. Algorithm gives the highest possible quality for vocal and instrumental stems. The latest ensemble consists of BS Roformer, MelBand Roformer and SCNet XL IHF vocal models.

Quality table

Algorithm name	Multisong dataset		Synth dataset		MDX23 Leaderboard
Algorithm name	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental	SDR Vocals
Ensemble (2023.09) (UVR-MDX-NET-Voc_FT, Demucs4 Vocals 2023, MDX23C, VitLarge23)	10.44	16.74	12.76	12.46	11.17
Ensemble (2024.02) (BS Roformer (v1), MDX23C, VitLarge23)	10.75	17.06	12.72	12.42	---
Ensemble (2024.03) (BS Roformer (viperx), MDX23C)	11.06	17.37	13.00	12.70	---
Ensemble (2024.04) (BS Roformer (finetuned), MDX23C)	11.33	17.63	13.57	13.27	---
Ensemble (2024.08) (BS Roformer (finetuned), MelBand Roformer)	11.50	17.81	13.79	13.50	---
Ensemble (2024.12) (BS Roformer (finetuned), MelBand Roformer, SCNet XL)	11.61	17.92	14.09	13.79	---
Ensemble (2025.06) (BS Roformer (x2), MelBand Roformer (ft), SCNet XL IHF)	11.93	18.23	14.46	14.17	---

Detailed statistics on Multisong dataset:

Model	Vocals fullness	Vocals bleedless	Vocals SDR	Vocals L1Freq	Instrum fullness	Instrum bleedless	Instrum SDR	Instrum L1Freq
Ensemble (2025.06)	17.73	36.29	11.93	39.94	28.75	47.64	18.23	40.90
Ensemble High Vocals Fullness (2025.06)	20.46	32.77	11.69	39.86	---	---	---	---
Ensemble High Instrumental Fullness (2025.06)	---	---	---	---	34.79	41.47	17.69	40.51

🗎 复制链接 | Use algorithm | Demo

Ensemble (vocals, instrum, bass, drums, other)

This ensemble is based on algorithm which took 2nd place at Music Demixing Track of Sound Demixing Challenge 2023. The main changes comparing to contest version is much better vocal models, which is used here. We use following different models for vocals: UVR-MDX-NET-Voc_FT, Demucs4 Vocals 2023, best MDX23C model, VitLarge23, BS Roformer, Mel Roformer and SCNet XL. For stems 'bass', 'drums' and 'other' we us the following 4 models: demucs4ht_ft, deumcs4_ht, demucs4_6s and demucs3_mmi. Initial winning model available here: https://github.com/ZFTurbo/MVSEP-MDX23-music-separation-model

Quality table

Algorithm name	Multisong dataset					Synth dataset
Algorithm name	SDR Bass	SDR Drums	SDR Other	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental
SDR average: 11.21 (v. 2023.09.01)	12.52	11.73	7.01	10.30	16.60	12.67	12.38
SDR average: 11.87 (v. 2024.03.08)	12.53	11.84	7.15	10.75	17.06	12.72	12.42
SDR average: 12.03 (v. 2024.03.28)	12.57	11.94	7.22	11.06	17.37	13.00	12.70
SDR average: 12.17 (v. 2024.04.04)	12.59	11.99	7.33	11.33	17.63	13.57	13.27
SDR average: 12.34 (v. 2024.05.21)	13.44	11.99	7.33	11.33	17.63	13.57	13.27
SDR average: 12.66 (v. 2024.07.14)	13.46	13.15	7.72	11.32	17.63	13.57	13.27
SDR average: 12.76 (v. 2024.08.15)	13.48	13.33	7.81	11.50	17.81	13.79	13.50
SDR average: 13.01 (v. 2024.12.20)	14.14	13.57	8.02	11.50	17.81	13.79	13.50
SDR average: 13.07 (v. 2024.12.28)	14.14	13.57	8.10	11.61	17.92	14.09	13.79
SDR average: 13.67 (v. 2025.06.30)	14.85	14.33	9.00	11.93	18.23	14.46	14.17

Algorithm name	MDX23 Leaderboard
Algorithm name	SDR Bass	SDR Drums	SDR Other	SDR Vocals
SDR average: 11.21 (v. 2023.09.01) (UVR-MDX-NET-Voc_FT, Demucs4 Vocals 2023, MDX23C, VitLarge23)	9.937	9.559	7.280	11.093

🗎 复制链接 | Use algorithm | Demo

Ensemble All-In (vocals, bass, drums, piano, guitar, lead/back vocals, other)

It's Ensemble (vocals, instrum, bass, drums, other) + more models included like guitars, piano, wind, strings, back/lead vocals and drumsep (4 stems extracted from drums: kick, toms, snare, cymbals). The algorithm works very slow but gives the most precise result plus a lot of additional stems. Guitar, piano, drums bass etc gives better results because they use filtered source from other stems.

🗎 复制链接 | Use algorithm | Demo

BS Roformer SW (vocals, bass, drums, guitar, piano, other)

BS Roformer SW model, which generates 6 stems at once with superior quality.

Quality table which shows SDR values from Multisong dataset and from leaderboards for Piano and Guitar:

vocals	instrum	bass	drums	guitar	piano	other
11.30	17.50	14.62	14.11	9.05	7.83	8.71

🗎 复制链接 | Use algorithm | Demo

Demucs4 HT (vocals, drums, bass, other)

Algorithm Demucs4 HT splits track into 4 stems (bass, drums, vocals, other). It's now best for bass/drums/other separation. It was released in year 2022 and has 3 versions:

htdemucs_ft - best quality, but slow
htdemucs - lower quality, but fast
htdemucs_6s - it has 2 additional stems "piano" and "guitar" (quality for them is still so-so).

Link: https://github.com/facebookresearch/demucs/tree/ht/demucs

Quality table

Algorithm name	Multisong dataset					Synth dataset		MDX23 Leaderboard
Algorithm name	SDR Bass	SDR Drums	SDR Other	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental	SDR Vocals
htdemucs_ft	12.05	11.24	5.74	8.33	14.63	10.23	9.94	9.08
htdemucs	11.74	10.90	5.57	8.18	14.49	---	---
htdemucs_6s	11.42	10.59	2.63	8.17	14.48	---	---

🗎 复制链接 | Use algorithm | Demo

BS Roformer (vocals, instrumental)

BS Roformer model. Excellent quality for vocals/instrumental separation. It's modified version of initial BS Roformer model. Modifications were made by lucidrains on github. 2nd version of weights for model (with better quality) was prepared by viperx. Latest versions are fintuned viperx model with better metrics on 3 different checking systems.

Quality table

Algorithm name	Multisong dataset		Synth dataset		MDX23 Leaderboard
Algorithm name	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental	SDR Vocals
BS Roformer (ver. 2024.02)	10.42	16.73	12.45	12.16	---
BS Roformer (viperx edition)	10.87	17.17	12.71	12.41	11.751
BS Roformer (ver. 2024.04)	11.24	17.55	13.47	13.17	11.915
BS Roformer (ver. 2024.08)	11.31	17.62	13.56	13.27	---
unwa high instrum fullness	10.94	17.25	13.23	12.94	---
unwa BS Roformer HyperACE v2 instrum	11.09	17.40	12.63	12.33	---
unwa BS Roformer HyperACE v2 vocals	11.39	17.70	12.86	12.56	---
BS Roformer (ver. 2025.07)	11.89	18.20	14.58	14.28	---

Detailed statistics on Multisong dataset:

Model	Vocals fullness	Vocals bleedless	Vocals SDR	Vocals L1Freq	Instrum fullness	Instrum bleedless	Instrum SDR	Instrum L1Freq
unwa high instrum fullness	15.85	24.29	10.94	36.92	34.72	40.43	17.25	39.50
unwa BS Roformer HyperACE v2 instrum	14.81	33.02	11.09	36.40	37.95	37.97	17.40	39.93
unwa BS Roformer HyperACE v2 vocals	19.07	34.12	11.39	39.55	27.31	45.48	17.70	39.55
BS Roformer (ver. 2025.07)	17.23	38.24	11.89	40.03	27.82	49.12	18.20	40.99

🗎 复制链接 | Use algorithm | Demo

MelBand Roformer (vocals, instrumental)

Algorithm for separating tracks into vocal and instrumental parts based on the MelBand Roformer neural network. The neural network was first proposed in the paper "Mel-Band RoFormer for Music Source Separation" by a group of scientists from ByteDance. The first high-quality weights were made publicly available by Kimberley Jensen. The neural network with open weights was then slightly modified and further trained by the MVSep team in order to improve quality metrics. Also there are high quality weights provided by: @Bas Curtiz, @unwa, @becruily and @gabox.

Quality metrics

Algorithm name	Multisong dataset		Synth dataset		MDX23 Leaderboard
Algorithm name	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental	SDR Vocals
MelBand Roformer (Kimberley Jensen)	11.01	17.32	12.68	12.38	11.543
MelBand Roformer (ver. 2024.08)	11.17	17.48	13.34	13.05	---
Bas Curtiz edition	11.18	17.49	13.89	13.60	---
unwa Instrumental	10.24	16.54	12.25	11.95	---
unwa Instrumental v1e Note: Max instrum fullness, but noisy	10.05	16.36	---	---	---
unwa big beta v5e Note: Max vocals fullness, but noisy	10.59	16.89	---	---	---
MelBand Roformer (ver. 2024.10)	11.28	17.59	13.89	13.59	---
becruily instrum max fullness Note: Max instrum fullness, but noisy	10.16	16.47	---	---	---
becruily vocals max fullness Note: Max vocals fullness, but noisy	10.55	16.86	---	---	---
unwa Instrumental v1e plus Note: Max instrum fullness, but noisy	10.33	16.64	---	---	---
gabox Instrumental v7 Note: Max instrum fullness, but noisy	10.32	16.63	---	---	---
becruily deux Note: Max vocals and instrum fullness, but noisy	11.35	17.66	13.11	12.82	---

Detailed statistics on Multisong dataset:

Model	Vocals fullness	Vocals bleedless	Vocals SDR	Vocals L1Freq	Instrum fullness	Instrum bleedless	Instrum SDR	Instrum L1Freq
MelBand Roformer (Kimberley Jensen)	16.66	36.51	11.01	38.96	27.71	46.72	17.32	39.77
MelBand Roformer (ver. 2024.08)	16.39	39.13	11.18	39.26	27.74	47.07	17.49	40.16
Bas Curtiz edition	16.30	38.94	11.18	39.18	27.49	47.00	17.49	40.15
MelBand Roformer (ver. 2024.10)	16.92	37.78	11.28	39.41	27.71	47.29	17.59	40.29
unwa Instrumental v1 (SDR vocals: 10.24, SDR instrum: 16.54)	15.89	27.48	10.24	36.06	35.44	38.02	16.55	38.67
unwa Instrumental v1e (SDR vocals: 10.05, SDR instrum: 16.36)	14.67	26.83	10.06	34.37	38.85	35.68	16.37	38.31
unwa big beta v5e (SDR vocals: 10.59, SDR instrum: 16.89)	20.78	32.02	10.59	38.53	25.65	45.90	16.90	37.31
becruily instrum high fullness (SDR instrum: 16.47)	15.76	30.15	10.16	35.84	33.93	40.55	16.47	38.86
becruily vocals high fullness (SDR vocals: 10.55)	20.72	31.25	10.55	38.84	28.28	40.85	16.86	38.24
unwa Instrumental v1e plus (SDR vocals: 10.33, SDR instrum: 16.64)	14.96	31.89	10.33	35.76	36.20	38.57	16.64	39.04
gabox Instrumental v7 (SDR vocals: 10.32, SDR instrum: 16.63)	16.25	27.28	10.32	36.85	29.34	45.06	16.63	38.70
becruily deux (SDR vocals: 11.35, SDR instrum: 17.66)	22.81	29.78	11.35	40.28	33.41	42.11	17.51	40.10

🗎 复制链接 | Use algorithm | Demo

MDX23C (vocals, instrumental)

New set of models MDX23C is based on code released by kuielab for Sound Demixing Challenge 2023. All models are full band, e.g. they don't cut high frequences.

Quality table

Algorithm name	Multisong dataset		Synth dataset		MDX23 Leaderboard
Algorithm name	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental	SDR Vocals
12K FFT, Full Band, Large Conv, Hop 1024	9.95	16.26	11.74	11.44	10.78
12K FFT, Full Band, Large Conv	9.71	16.02	---	---	---
12K FFT, Full Band	9.68	15.99	---	---	---
12K FFT, Full Band, 6 Poolings	9.49	15.79	---	---	---
8K FFT, Full Band (v1)	10.17	16.48	12.35	12.06	11.04
8K FFT, Full Band (v2)	10.36	16.66	12.52	12.22	11.16

🗎 复制链接 | Use algorithm | Demo

SCNet (vocals, instrumental)

Algorithm for separating tracks into vocal and instrumental parts based on the SCNet neural network. The neural network was proposed in the article "SCNet: Sparse Compression Network for Music Source Separation" by a group of scientists from China. The authors made the neural network code open source, and the MVSep team was able to reproduce results similar to those presented in the published article. First, we trained a small version of SCNet, and then after some time, a heavier version of SCNet was prepared. The quality metrics are quite close to the quality of Roformer models (which are the top models at the moment), but still slightly inferior. However, in some cases, the model can work better than Roformers.

Quality metrics

Algorithm name	Multisong dataset		Synth dataset		MDX23 Leaderboard
Algorithm name	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental	SDR Vocals
SCNet	10.25	16.56	12.27	11.97	---
SCNet Large	10.74	17.05	12.89	12.59	---
SCNet XL	10.96	17.27	13.08	12.78	---
SCNet XL (high fullness)	10.92	17.23	---	---	---
SCNet XL (very high fullness)	10.40	16.60	---	---	---
SCNet XL IHF	11.11	17.41	13.29	12.99	---
SCNet XL IHF (high instrum fullness by becruily)	10.89	17.20	13.31	12.98	---

Detailed statistics on Multisong dataset:

Model	Vocals fullness	Vocals bleedless	Vocals SDR	Vocals L1Freq	Instrum fullness	Instrum bleedless	Instrum SDR	Instrum L1Freq
SCNet	17.34	25.24	10.25	35.47	29.35	32.34	16.56	36.24
SCNet Large	17.70	26.84	10.74	36.86	27.10	41.47	17.05	37.62
SCNet XL	17.96	26.95	10.96	37.35	28.74	39.42	17.27	38.09
SCNet XL (high fullness)	21.67	25.00	10.92	37.70	31.95	34.06	17.23	37.91
SCNet XL (very high fullness)	23.50	25.30	10.40	37.16	34.04	35.15	16.60	36.78
SCNet XL IHF	17.98	28.31	11.11	37.91	28.87	40.37	17.41	38.54
SCNet XL IHF (high instrum fullness by becruily)	22.70	25.48	10.89	38.18	32.31	38.15	17.20	38.43

🗎 复制链接 | Use algorithm | Demo

MDX B (vocals, instrumental)

MDX B models are based on kuielab code from Music Demixing Challenge 2021. Models were retrained by UVR team on big dataset. For long time models were best for vocals/instrumental separation.

Quality table

Algorithm name	Multisong dataset		Synth dataset		MDX23 Leaderboard
Algorithm name	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental	SDR Vocals
UVR-MDX-NET-Inst_HQ_5	9.45	15.76	---	---	---
UVR-MDX-NET-Inst_HQ_4	9.71	16.01	11.53	11.23	---
UVR-MDX-NET-Voc_FT	9.64	15.95	11.40	11.10	10.505
MDX Kimberley Jensen v2	9.60	15.91	---	---	10.494
MDX Kimberley Jensen v1	9.48	15.79	---	---	---
UVR-MDX-NET-Inst_HQ_3	9.38	15.68	11.32	11.03	10.254
MDX Kimberley Jensen Inst	9.28	15.59	---	---	---
UVR-MDX-NET-Inst_HQ_2	9.12	15.42	---	---	---
MDX UVR 2022.01.01	8.83	15.14	---	---	---
UVR_MDXNET_Main	8.79	15.10	---	---	---
MDX UVR 2022.07.25	8.67	14.97	---	---	---

🗎 复制链接 | Use algorithm | Demo

Ultimate Vocal Remover VR (vocals, music)

A set of models from the Ultimate Vocal Remover program, which are based on the old VR architecture. Most of the models are vocal, but there are also special models for karaoke, piano, removing reverberation effects, etc. Overall the models have average separation quality, but may be useful in some scenarios.

🗎 复制链接 | Use algorithm | Demo

Demucs4 Vocals 2023 (vocals, instrum)

Demucs4 Vocals 2023 model - it's Demucs4 HT model fine-tuned on big vocal/instrumental dataset. It has better metrics for vocals separation compared to Demucs4 HT (_ft version). It usually gives worse metrics than MDX23C models, but can be useful for ensembles, since the model is very different from MDX23C.

Quality table

Algorithm name	Multisong dataset		Synth dataset		MDX23 Leaderboard
Algorithm name	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental	SDR Vocals
Demucs4 Vocals 2023	9.04	15.35	11.59	11.29	9.61

🗎 复制链接 | Use algorithm | Demo

MVSep Karaoke (lead/back vocals)

Algorithm for extracting only lead vocals and everything else based on the MelBand Roformer and SCNet models. It works for any music track, but you can also pre-extract vocals by selecting the "Extract vocals first" option in Extraction type. In the second case, back vocals will be available in a separate file.

There are 5 models, one prepared by the team @aufr33 and viperx, the second by @becruily, 3rd by @gabox and 4th it's fused model from @gabox's and team @aufr33/viperx. Additionally, a model based on the SCNet XL IHF architecture by @becruily has been added separately.

Quality metrics are given below. For comparison, the table also provides quality metrics for the old UVR and MDX-B Karaoke algorithms.

Algorithm name	Lead Vocals (SDR)	Back Vocals (SDR)	Back Vocals + Instrum SDR	Instrum SDR
UVR (HP-KAROKEE-MSB2-3BAND-3090)	6.42	---	11.79	---
UVR (karokee_4band_v2_sn)	6.72	---	12.09	---
UVR (UVR-BVE-4B_SN-44100-1)	---	0.87	---	4.90
MDX-B (Karaoke)	7.42	---	12.81	---
MDX-B (Karaoke) Extract from vocals	8.28	4.46	13.67	15.94
MelBand Roformer (@aufr33 и viperx)	9.45	---	14.84	---
MelBand Roformer (@becruily)	9.61	---	15.00	---
MelBand Roformer (@gabox)	9.67	---	15.06	---
MelBand Roformer (Fused @gabox and @aufr33/viperx)	9.85	---	15.23	---
SCNet XL IHF (@becruily)	9.53	---	14.91	---
BS Roformer (@frazer and @becruily)	10.10	---	15.48	---
BS Roformer (MVSep Team)	10.41	6.61	15.72	15.69
BS Roformer (@anvuew)	10.22	---	15.60	---
MelBand Roformer (@aufr33 и viperx) extract vocals first	9.22	5.27	14.61	15.94
MelBand Roformer (@becruily) extract vocals first	8.98	4.98	14.24	15.94
MelBand Roformer (@gabox) extract vocals first	9.36	5.46	14.75	15.94
MelBand Roformer (Fused @gabox and @aufr33/viperx) extract vocals first	9.62	5.63	15.01	15.94

🗎 复制链接 | Use algorithm | Demo

MDX-B Karaoke (lead/back vocals)

The MDX-B Karaoke model was prepared as part of the Ultimate Vocal Remover project. The model produces high-quality lead vocal extraction from a music track. The model is available in two versions. In the first version, the neural network model is used directly on the entire track. In the second version, the track is first divided into two parts, vocal and instrumental, and then the neural network model is applied only to the vocal part. In the second version, the quality of separation is usually higher and it becomes possible to additionally separate the backing vocals into a separate track. The model was compared with two other models from UVR (they are also available on the website) on a large validation set. The metric used is SDR: the higher the better.

See the results in the table below.

Validation type	Algorithm name
Validation type	UVR (HP-KAROKEE-MSB2-3BAND-3090)	UVR (karokee_4band_v2_sn)	MDX-B Karaoke (Type 0)	MDX-B Karaoke (Type 1)
Validation lead vocals	6.46	6.34	6.81	7.94
Validation other	13.17	13.02	13.53	14.66
Validation back vocals	---	---	---	1.88

🗎 复制链接 | Use algorithm | Demo

MVSep Crowd removal (crowd, other)

An unique model for removing crowd sounds from music recordings (applause, clapping, whistling, noise, laugh etc.). Current metrics on our internal dataset for quality control:

Algorithm name	Crowd dataset
Algorithm name	SDR crowd	SDR other
Crowd model MDX23C (v1)	5.57	18.79
Crowd model MDX23C (v2)	6.06	19.28
MelBand Roformer	6.07	19.29
Ensemble (MelRoformer + MDX23C)	6.27	19.49
BS Roformer	7.21	20.43

Examples of how the model works can be found: here и here.

🗎 复制链接 | Use algorithm | Demo

Medley Vox (Multi-singer separation)

Medley Vox is a dataset for testing algorithms for separating multiple singers within a single music track. Also, the authors of Medley Vox proposed a neural network architecture for separating singers. However, unfortunately, they did not publish the weights. Later, their training process was repeated by Cyru5, who trained several models and published the weights in the public domain. Now the trained neural network is available on MVSep.

🗎 复制链接 | Use algorithm | Demo

MVSep Multichannel BS (vocals, instrumental)

MVSep Multichannel BS - this model is prepared for extracting vocals from multichannel sound (5.1, 7.1, etc.). Emphasis on lack of transformation and loss of quality. After processing, the model returns multi-channel audio in the same format in which it was sent to the server with the same sample rate.

🗎 复制链接 | Use algorithm | Demo

MVSep Male/Female separation

A model for separating male and female voices within a single vocal track. The track should contain only voices, no music.

Quality metrics

Algorithm name	Male/Female validation dataset
Algorithm name	SDR Male	SDR Female	L1_Freq Male	L1_Freq Female
BSRoformer by Sucial (SDR: 6.52)	6.82	6.23	40.99	40.62
BSRoformer by aufr33 (SDR: 8.18)	8.47	7.89	46.65	44.73
SCNet XL (SDR: 11.83)	12.08	11.58	50.50	51.51
MelRoformer (2025.01) (SDR: 13.03)	13.39	12.68	57.61	56.76

🗎 复制链接 | Use algorithm | Demo

MVSep Choir (choir, other)

A high-quality model designed to extract choir vocals from audio. The 'How to extract' option allows you to choose between extracting directly from the full mix or extracting the vocals first and then applying the model to the vocal track.

🗎 复制链接 | Use algorithm | Demo

MVSep SATB Choir (soprano, alto, tenor, bass)

SATB is an acronym in music that stands for Soprano, Alto, Tenor, and Bass. It is the standard standard scoring for four-part harmony, most commonly used in choral music, but also serving as the foundational structure for Western music theory and composition. Here is a breakdown of the four voices, their ranges, and their roles:

The voices are arranged from highest pitch to lowest pitch:

S (Soprano): The highest female voice. Role: In most standard hymns and pop arrangements, the Soprano line carries the melody.
A (Alto): The lower female voice. Role: The Alto line usually provides harmony immediately below the melody. It is often the "glue" that connects the high melody to the lower male voices.
T (Tenor): The highest adult male voice. Role: The Tenor line provides lower harmony and adds texture to the middle of the chord.
B (Bass): The lowest male voice. Role: The Bass line provides the harmonic foundation. It usually sings the "root" of the chord, grounding the music.

While SATB describes human voices, the concept is applied to instruments as well. Many instrumental families are built to mimic the SATB range structure:

Strings: Violin 1 (S), Violin 2 (A), Viola (T), Cello (B)

The provided model works with vocals, strings, piano, and synth melodies. It does not work with full mixes. If you need to process vocal parts, you must provide isolated vocals without the instrumental. For full mixes use option "Extract vocals first".

Metric SDR on internal validation dataset:

	Soprano	Alto	Tenor	Bass
SCNet Masked by @Dry Paint Dealer Undr	5.28	2.80	4.56	3.64
BS Roformer	8.72	6.59	7.54	6.72

🗎 复制链接 | Use algorithm | Demo

MVSep Drums (drums, other)

The MVSep Drums model exists in 3 different variants based on following architectures: HTDemucs4, MelRoformer and SCNet. The model produces high-quality separation of music into a drums part and everything else.

Quality metrics

Algorithm name	Multisong dataset		MDX23 Leaderboard
Algorithm name	SDR Drums	SDR Other	SDR Drums
HTDemucs4	12.04	16.56	---
MelBand Roformer	12.76	17.28	---
SCNet Large	13.01	17.53	---
SCNet XL	13.42	18.00
MelBand + SCNet XL Ensemble	13.78	18.31	---
BS Roformer SW	14.11	---	---
MelBand + SCNet XL + BS Roformer SW Ensemble	14.35	---	---

Detailed statistics on Multisong dataset:

Model	Drums fullness	Drums bleedless	Drums SDR	Drums L1Freq	Other fullness	Other bleedless	Other SDR	Other L1Freq
HTDemucs4	15.36	25.00	12.04	37.47	33.03	37.22	16.56	38.37
MelBand Roformer	14.16	42.12	12.76	40.80	33.97	47.24	17.28	42.02
SCNet Large	14.91	28.23	13.01	38.04	35.39	35.03	17.53	39.36
SCNet XL	21.21	24.47	13.42	40.30	38.56	38.32	18.00	40.35
MelBand + SCNet XL Ensemble	19.66	30.23	13.78	41.74	38.09	42.90	18.31	42.00
BS Roformer SW	14.78	43.70	14.11	42.23	---	---	---	---
MelBand + SCNet XL + BS Roformer SW Ensemble	16.97	39.73	14.35	42.74	---	---	---	---

🗎 复制链接 | Use algorithm | Demo

MVSep Bass (bass, other)

The MVSep Bass model exists in 3 different variants based on following architectures: HTDemucs4, BS Roformer and SCNet XL. The model produces high-quality separation of music into a bass part and everything else.

Quality metrics

Algorithm name	Multisong dataset		MDX23 Leaderboard
Algorithm name	SDR Bass	SDR Other	SDR Bass
BS Roformer	12.49	16.59	---
HTDemucs4	12.52	16.64	---
SCNet XL	13.81	17.93	---
BS + HTDemucs + SCNet XL Ensemble	14.07	18.18	---
BS + HTDemucs + SCNet XL Ensemble (+extract from Instrumental)	14.12	---	---
BS Roformer SW	14.62	---	---
SCNet XL + BS Roformer SW	14.87	---	---

🗎 复制链接 | Use algorithm | Demo

MVSep Synth (synth, other)

Synth extraction model. Synth is included the following stems: Synth, Synthesizer, Synth Pad, Synth Bass, Synth Vocals, Synth Strings, Synth Percussion, Synth FX, Synth Keys, Synth Brass, Synth Guitar, Synth Flute, Synth Ambiant

🗎 复制链接 | Use algorithm | Demo

DrumSep (4-6 stems: kick, snare, cymbals, toms, ride, hh, crash)

The model separates the drum track into 4, 5, or 6 types: 'kick', 'snare', 'cymbals', 'toms'. In the 5-track models, 'hh' is separated from 'cymbals', and in the case of 6 tracks, 'cymbals' is split into 'hh', 'ride', and 'crash'.

A total of 8 models are available:
1) The DrumSep model from the GitHub repository. It was trained on the HDemucs architecture and splitting drums into 4 tracks.
2) A model based on the mdx23c architecture, prepared by @jarredou and @aufr33. The model splits drums into 6 tracks.
3) A model based on the SCNet XL architecture, which splits drums into 5 tracks.
4) A model based on the SCNet XL architecture, which splits drums into 6 tracks.
5) A model based on the SCNet XL architecture, which splits drums into 4 tracks.
6) Ensemble of 4 models (1 MDX23C + 3 SCNet XL)
7) A model based on the MelBand Roformer architecture, which splits drums into 4 tracks.
8) A model based on the MelBand Roformer architecture, which splits drums into 6 tracks.

All models work only with the drum track. If other instruments or vocals are present in the track, the model will not work correctly. Therefore, the algorithm has two modes of operation. In the first (default) mode, the best model for drums, MVSep Drums, is first applied to the track, extracting only the drum part. Then, the DrumSep model is applied. If your track consists only of drums, it makes sense to use the second mode, where the DrumSep model is applied directly to the uploaded audio.

Quality table (SDR metric, higher is better):

Algorithm name	kick	snare	toms	cymbals
Algorithm name	kick	snare	toms	hh	ride	crash
DrumSep model by inagoy (HDemucs, 4 stems)	14.13	8.42	5.67	5.63
DrumSep model by aufr33 and jarredou (MDX23C, 6 stems)	18.32	13.60	13.25	6.71	5.38	7.56
DrumSep SCNet XL (5 stems)	20.21	15.05	16.28	7.05	8.56
DrumSep SCNet XL (6 stems)	20.24	14.80	15.93	6.74	5.02	7.63
DrumSep SCNet XL (4 stems)	20.50	14.69	15.92	10.08
Ensemble of 4 models (3 * SCNet + MDX23C)	20.59	15.11	16.41	7.19	5.59	7.85
DrumSep Mel Band Roformer (4 stems)	22.22	17.09	15.86	11.87
DrumSep Mel Band Roformer (6 stems)	20.21	15.33	15.48	8.79	6.96	8.79

Quality table (L1 Freq metric, higher is better):

Algorithm name	kick	snare	toms	cymbals
Algorithm name	kick	snare	toms	hh	ride	crash
DrumSep model by inagoy (HDemucs, 4 stems)	74.34	62.20	73.52	68.87
DrumSep model by aufr33 and jarredou (MDX23C, 4 stems)	78.20	71.27	84.22	80.84	86.74	79.41
DrumSep SCNet XL (5 stems)	81.56	73.16	87.85	80.65	75.44
DrumSep SCNet XL (6 stems)	81.63	72.75	87.46	79.97	85.73	78.67
DrumSep SCNet XL (4 stems)	81.69	72.90	88.43	73.64
Ensemble of 4 models (3 * SCNet + MDX23C)	81.91	73.41	88.24	81.12	86.91	79.41
DrumSep Mel Band Roformer (4 stems)	84.97	77.78	90.13	78.16
DrumSep Mel Band Roformer (6 stems)	81.82	75.63	88.93	85.66	90.50	82.18

Quality table (Fullness metric, higher is better):

Algorithm name	kick	snare	toms	cymbals
Algorithm name	kick	snare	toms	hh	ride	crash
DrumSep model by inagoy (HDemucs, 4 stems)	13.61	18.80	20.86	15.80
DrumSep model by aufr33 and jarredou (MDX23C, 4 stems)	18.67	17.85	18.29	12.95	15.76	14.92
DrumSep SCNet XL (5 stems)	18.40	30.94	29.64	13.28	15.15
DrumSep SCNet XL (6 stems)	32.03	29.43	36.04	13.64	14.05	15.05
DrumSep SCNet XL (4 stems)	29.87	30.53	48.35	17.48
Ensemble of 4 models (3 * SCNet + MDX23C)	23.89	30.06	36.19	14.23	18.34	15.43
DrumSep Mel Band Roformer (4 stems)	19.45	23.09	40.32	16.44
DrumSep Mel Band Roformer (6 stems)	15.22	25.98	42.33	19.53	20.51	19.39

Quality table (Bleedless metric, higher is better):

Algorithm name	kick	snare	toms	cymbals
Algorithm name	kick	snare	toms	hh	ride	crash
DrumSep model by inagoy (HDemucs, 4 stems)	48.04	18.25	33.85	14.65
DrumSep model by aufr33 and jarredou (MDX23C, 4 stems)	53.25	38.81	56.08	10.52	8.17	14.55
DrumSep SCNet XL (5 stems)	53.33	26.00	51.72	7.97	12.66
DrumSep SCNet XL (6 stems)	36.82	28.82	40.28	7.43	8.25	11.93
DrumSep SCNet XL (4 stems)	44.34	29.05	28.87	16.35
Ensemble of 4 models (3 * SCNet + MDX23C)	51.58	32.20	46.38	8.32	8.51	14.26
DrumSep Mel Band Roformer (4 stems)	69.11	57.86	51.44	50.52
DrumSep Mel Band Roformer (6 stems)	74.12	52.23	46.14	35.19	31.70	36.12

@jarredou prepared new DrumSep validation dataset. It consists of 150 small different tracks. 1st part is Drumkits from 001 to 017 (5 tracks for each of these drumkits, with different playing style) are acoustic drums. From 018 to 082 (1 track by drumkit) are electro drums. This dataset is for 5 stems separation of drums: ['kick', 'snare', 'toms', 'hh', 'cymbals']. For 6 stem models 'ride' and 'crash' were sumed up to 'cymbals'. For 4 stem models 'hh' and 'cymbals' were sumed up to 'cymbals'.

Quality table (SDR metric, higher is better):

Algorithm name	kick	snare	toms	cymbals
Algorithm name	kick	snare	toms	hh	ride	crash
DrumSep model by inagoy (HDemucs, 4 stems)	10.52	6.05	4.68	5.03
DrumSep model by aufr33 and jarredou (MDX23C, 6 stems)	14.54	9.79	10.63	3.19	6.08
DrumSep SCNet XL (5 stems)	17.89	12.56	14.14	3.63	6.15
DrumSep SCNet XL (6 stems)	17.74	12.43	14.24	3.39	5.91
DrumSep SCNet XL (4 stems)	17.61	12.37	13.40	7.48
DrumSep Mel Band Roformer (4 stems)	18.67	13.55	13.60	8.76
DrumSep Mel Band Roformer (6 stems)	17.46	12.64	13.69	5.05	7.06

Quality table (L1 Freq metric, higher is better):

Algorithm name	kick	snare	toms	cymbals
Algorithm name	kick	snare	toms	hh	ride	crash
DrumSep model by inagoy (HDemucs, 4 stems)	48.68	30.27	42.44	39.26
DrumSep model by aufr33 and jarredou (MDX23C, 6 stems)	56.95	38.31	54.65	47.47	47.39
DrumSep SCNet XL (5 stems)	61.56	43.06	60.76	48.19	47.49
DrumSep SCNet XL (6 stems)	61.46	42.42	60.55	47.32	46.43
DrumSep SCNet XL (4 stems)	61.59	42.91	60.46	44.65
DrumSep Mel Band Roformer (4 stems)	65.24	47.13	63.50	49.77
DrumSep Mel Band Roformer (6 stems)	63.58	46.14	62.94	53.98	51.83

Quality table (Log WMSE metric, higher is better):

Algorithm name	kick	snare	toms	cymbals
Algorithm name	kick	snare	toms	hh	ride	crash
DrumSep model by inagoy (HDemucs, 4 stems)	12.76	11.70	11.41	19.27
DrumSep model by aufr33 and jarredou (MDX23C, 6 stems)	16.47	15.13	16.89	23.18	22.32
DrumSep SCNet XL (5 stems)	19.54	17.69	20.12	23.59	22.39
DrumSep SCNet XL (6 stems)	19.41	17.57	20.21	23.38	22.17
DrumSep SCNet XL (4 stems)	19.29	17.52	19.44	21.54
DrumSep Mel Band Roformer (4 stems)	20.27	18.62	19.63	22.74
DrumSep Mel Band Roformer (6 stems)	19.16	17.77	19.71	24.94	23.23

🗎 复制链接 | Use algorithm | Demo

MVSep Piano (piano, other)

MVSep Piano model is based on MDX23C, MelRoformer and SCNet Large architectures. It produces high quality separation for piano and other stems. We provide comparison with other public model (Demucs4HT (6 stems)). Used metrics is SDR - the more the better.

See the results in table below.

Algorithm name	Validation type
	piano (SDR)	other (SDR)
Demucs4HT (6 stems)	2.23	14.51
mdx23c (2023.08, SDR: 4.79)	4.79	17.07
mdx23c (2024.09, SDR: 5.59)	5.59	17.89
MelRoformer (viperx, SDR: 5.67)	5.67	17.95
SCNet Large (2024.09, SDR: 5.89)	5.89	18.16
Ensemble (SCNet + Mel, SDR: 6.19)	6.19	18.47
BS Roformer SW (SDR: 7.83)	7.83	19.97

🗎 复制链接 | Use algorithm | Demo

MVSep Keys (keys, other)

The MVSep Keys is a high quality model for separating music into keys instruments and everything else. List of instruments: organ, harpsichord, acoustic piano, electric piano, digital piano, mellotron, celesta, wurlitzer, rhodes, accordion.

🗎 复制链接 | Use algorithm | Demo

MVSep Organ (organ, other)

The MVSep Organ model produces high-quality separation of music into an organ part and everything else.

🗎 复制链接 | Use algorithm | Demo

MVSep Guitar (guitar, other)

The MVSep Guitar model is based on the MDX23C, Mel Roformer and BSRoformer architectures. The model produces high-quality separation of music into a guitar part (including acoustic and electronic) and everything else. The model was compared with the Demucs4HT model (6 stems) on a guitar validation set. The metric used is SDR: the higher the better.

See the results in the table below.

Algorithm name	Validation type
	guitar (SDR)	other (SDR)
Demucs4HT (6 stems)	5.22	12.19
mdx23c (2023.08, SDR: 4.78)	4.78	11.75
mdx23c (2024.06, SDR: 6.34)	6.34	13.31
MelRoformer (2024.06, SDR: 7.02)	7.02	13.99
BSRoformer (viperx, SDR: 7.16)	7.16	14.13
Ensemble (mdx23 + MelRoformer, SDR: 7.18)	7.18	14.15
Ensemble (BSRoformer+ MelRoformer, SDR: 7.51)	7.51	14.48
BS Roformer SW (SDR: 9.05)	9.05	16.02

🗎 复制链接 | Use algorithm | Demo

MVSep Plucked Strings (plucked-strings, other)

The MVSep Plucked Strings is a high quality model for separating music into plucked string instruments and everything else. List of instruments: ukulele, sitar, banjo, mandolin, dobro, harp, guitar, acoustic guitar, electric guitar.

🗎 复制链接 | Use algorithm | Demo

MVSep Bowed Strings (strings, other)

The MVSep Bowed Strings is a high quality model for separating music into bowed string instruments and everything else. List of instruments: Fiddle, Violin, Viola, Cello, Double Bass.

🗎 复制链接 | Use algorithm | Demo

Ensemble (vocals, instrum)

Ensemble (vocals, instrum, bass, drums, other)

Ensemble All-In (vocals, bass, drums, piano, guitar, lead/back vocals, other)

BS Roformer SW (vocals, bass, drums, guitar, piano, other)

Demucs4 HT (vocals, drums, bass, other)

BS Roformer (vocals, instrumental)

MelBand Roformer (vocals, instrumental)

MDX23C (vocals, instrumental)

SCNet (vocals, instrumental)

MDX B (vocals, instrumental)

Ultimate Vocal Remover VR (vocals, music)

Demucs4 Vocals 2023 (vocals, instrum)

MVSep Karaoke (lead/back vocals)

MDX-B Karaoke (lead/back vocals)

MVSep Crowd removal (crowd, other)

Medley Vox (Multi-singer separation)

MVSep Multichannel BS (vocals, instrumental)

MVSep Male/Female separation

MVSep Choir (choir, other)

MVSep SATB Choir (soprano, alto, tenor, bass)

MVSep Drums (drums, other)

MVSep Bass (bass, other)

MVSep Synth (synth, other)

DrumSep (4-6 stems: kick, snare, cymbals, toms, ride, hh, crash)

MVSep Piano (piano, other)

MVSep Keys (keys, other)

MVSep Organ (organ, other)

MVSep Guitar (guitar, other)

MVSep Plucked Strings (plucked-strings, other)

MVSep Bowed Strings (strings, other)

站点信息

公司

其他