An algorithm for separating tracks into vocal and instrumental parts, based on the SCNet neural network. SCNet was proposed in the paper "SCNet: Sparse Compression Network for Music Source Separation" by a group of researchers from China. The authors released the network's code as open source, and the MVSep team was able to reproduce results close to those reported in the paper. We first trained a small version of SCNet and later prepared a larger one. Its quality metrics are close to those of the Roformer models (currently the top-performing models) but still slightly lower. In some cases, however, SCNet can work better than the Roformers.
Quality metrics
Algorithm name | Multisong dataset: SDR Vocals | Multisong dataset: SDR Instrumental | Synth dataset: SDR Vocals | Synth dataset: SDR Instrumental | MDX23 Leaderboard: SDR Vocals |
SCNet | 10.25 | 16.56 | 12.27 | 11.97 | --- |
SCNet Large | 10.74 | 17.05 | 12.89 | 12.59 | --- |
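
For readers unfamiliar with the metric, SDR (signal-to-distortion ratio) is commonly defined as the ratio, in decibels, of the energy of the reference stem to the energy of the error between the reference and the separated estimate. Higher is better. The sketch below illustrates that common definition on a toy signal. It is not the exact evaluation code behind the numbers above, and the function name `sdr_db` and the `eps` parameter are assumptions made for illustration.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Return the signal-to-distortion ratio in dB.

    Uses the common definition 10 * log10(||s||^2 / ||s - s_hat||^2).
    `eps` (an assumption of this sketch) avoids division by zero.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: a 440 Hz tone stands in for the true vocal stem,
# and the "separated" estimate is the same tone scaled to 90 % amplitude.
sample_rate = 8000
reference = [math.sin(2 * math.pi * 440.0 * n / sample_rate)
             for n in range(sample_rate)]
estimate = [0.9 * r for r in reference]

toy_sdr = sdr_db(reference, estimate)
print(f"{toy_sdr:.2f} dB")  # prints "20.00 dB"
```

A 10 % amplitude error leaves an error signal with 1/100 of the reference energy, which is why the toy example yields 20 dB. On this scale, the gap between SCNet and SCNet Large in the table is roughly 0.5 to 0.6 dB per column.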