
MelBand Roformer (vocals, instrumental)

Algorithm for separating tracks into vocal and instrumental parts, based on the MelBand Roformer neural network. The network was first proposed in the paper "Mel-Band RoFormer for Music Source Separation" by a group of researchers from ByteDance. The first high-quality weights were made publicly available by Kimberley Jensen. The MVSep team then slightly modified the open-weight network and trained it further to improve its quality metrics. High-quality weights are also provided by @Bas Curtiz, @unwa and @becruily.

Quality metrics

| Algorithm name | Multisong dataset: SDR Vocals | Multisong dataset: SDR Instrumental | Synth dataset: SDR Vocals | Synth dataset: SDR Instrumental | MDX23 Leaderboard: SDR Vocals |
|---|---|---|---|---|---|
| MelBand Roformer (Kimberley Jensen) | 11.01 | 17.32 | 12.68 | 12.38 | 11.543 |
| MelBand Roformer (ver. 2024.08) | 11.17 | 17.48 | 13.34 | 13.05 | --- |
| Bas Curtiz edition | 11.18 | 17.49 | 13.89 | 13.60 | --- |
| unwa Instrumental | 10.24 | 16.54 | 12.25 | 11.95 | --- |
| unwa Instrumental v1e (Note: Max instrum fullness, but noisy) | 10.05 | 16.36 | --- | --- | --- |
| unwa big beta v5e (Note: Max vocals fullness, but noisy) | 10.59 | 16.89 | --- | --- | --- |
| MelBand Roformer (ver. 2024.10) | 11.28 | 17.59 | 13.89 | 13.59 | --- |
| becruily instrum max fullness (Note: Max instrum fullness, but noisy) | 10.16 | 16.47 | --- | --- | --- |
| becruily vocals max fullness (Note: Max vocals fullness, but noisy) | 10.55 | 16.86 | --- | --- | --- |
| unwa Instrumental v1e plus (Note: Max instrum fullness, but noisy) | 10.33 | 16.64 | --- | --- | --- |
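
For readers unfamiliar with the SDR columns, the sketch below shows one common way a signal-to-distortion ratio (in dB) is computed: the energy of the reference stem divided by the energy of the estimation error. This page does not specify the exact evaluation script behind the reported numbers, so the formula, the function name `sdr`, and the `eps` stabilizer are assumptions used for illustration only.

```python
import math

def sdr(reference, estimate, eps=1e-9):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences.

    Assumed definition (not taken from this page):
    10 * log10(sum(ref^2) / sum((ref - est)^2)).
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps keeps the ratio finite for a silent reference or a perfect estimate
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))

# Toy example: the estimate is the reference scaled to 90% amplitude,
# so the error energy is 1% of the signal energy (about 20 dB).
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(round(sdr(reference, estimate), 2))
```

Higher values mean the separated stem is closer to the reference; a 1 dB difference between two models corresponds to roughly a 21% reduction in error energy under this definition.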

Detailed statistics on the Multisong dataset:

| Model | Vocals fullness | Vocals bleedless | Vocals SDR | Vocals L1Freq | Instrum fullness | Instrum bleedless | Instrum SDR | Instrum L1Freq |
|---|---|---|---|---|---|---|---|---|
| MelBand Roformer (Kimberley Jensen) | 16.66 | 36.51 | 11.01 | 38.96 | 27.71 | 46.72 | 17.32 | 39.77 |
| MelBand Roformer (ver. 2024.08) | 16.39 | 39.13 | 11.18 | 39.26 | 27.74 | 47.07 | 17.49 | 40.16 |
| Bas Curtiz edition | 16.30 | 38.94 | 11.18 | 39.18 | 27.49 | 47.00 | 17.49 | 40.15 |
| MelBand Roformer (ver. 2024.10) | 16.92 | 37.78 | 11.28 | 39.41 | 27.71 | 47.29 | 17.59 | 40.29 |
| unwa Instrumental v1 (SDR vocals: 10.24, SDR instrum: 16.54) | 15.89 | 27.48 | 10.24 | 36.06 | 35.44 | 38.02 | 16.55 | 38.67 |
| unwa Instrumental v1e (SDR vocals: 10.05, SDR instrum: 16.36) | 14.67 | 26.83 | 10.06 | 34.37 | 38.85 | 35.68 | 16.37 | 38.31 |
| unwa big beta v5e (SDR vocals: 10.59, SDR instrum: 16.89) | 20.78 | 32.02 | 10.59 | 38.53 | 25.65 | 45.90 | 16.90 | 37.31 |
| becruily instrum high fullness (SDR instrum: 16.47) | 15.76 | 30.15 | 10.16 | 35.84 | 33.93 | 40.55 | 16.47 | 38.86 |
| becruily vocals high fullness (SDR vocals: 10.55) | 20.72 | 31.25 | 10.55 | 38.84 | 28.28 | 40.85 | 16.86 | 38.24 |
| unwa Instrumental v1e plus (SDR vocals: 10.33, SDR instrum: 16.64) | 14.96 | 31.89 | 10.33 | 35.76 | 36.20 | 38.57 | 16.64 | 39.04 |