An algorithm for separating tracks into vocal and instrumental parts, based on the MelBand Roformer neural network. The architecture was first proposed in the paper "Mel-Band RoFormer for Music Source Separation" by researchers from ByteDance. The first high-quality weights were made publicly available by Kimberley Jensen. The MVSep team then slightly modified that open-weight network and trained it further to improve its quality metrics.
Quality metrics
Algorithm name | Multisong dataset: SDR Vocals | Multisong dataset: SDR Instrumental | Synth dataset: SDR Vocals | Synth dataset: SDR Instrumental | MDX23 Leaderboard: SDR Vocals |
MelBand Roformer (Kimberley Jensen) | 11.01 | 17.32 | 12.68 | 12.38 | 11.543 |
MelBand Roformer (ver. 2024.08) | 11.17 | 17.48 | 13.34 | 13.05 | --- |
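
The SDR (signal-to-distortion ratio) values in the table are not defined on this page. As a rough illustration only, the sketch below computes SDR in its common form, 10 * log10(signal energy / error energy), for a reference stem and a separated estimate. The function name `sdr`, the `eps` stabilizer, and the toy sine signal are assumptions made for this example; the exact evaluation procedure behind the table (for instance how scores are aggregated across tracks) is not stated in the source and may differ.

```python
import math


def sdr(reference, estimate, eps=1e-8):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences.

    Hypothetical helper for illustration; not taken from the MVSep code base.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the ground-truth stem.
    signal_energy = sum(r * r for r in reference)
    # Energy of the difference between ground truth and the separated stem.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps avoids division by zero and log(0) for silent or perfect inputs.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy data (assumed): a 440 Hz sine as the "true" stem, and an estimate
# that is the same signal scaled to 90% amplitude.
reference = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
estimate = [0.9 * r for r in reference]

print(f"SDR: {sdr(reference, estimate):.2f} dB")
```

In this toy case the error is 10% of the reference amplitude, so the energy ratio is about 100 and the result is about 20 dB. Higher values mean the estimate is closer to the reference, which is the sense in which the 2024.08 row improves on the original weights.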