We have many updates related to vocal models:
1) The BS Roformer (vocals, instrumental) model has been updated. SDR has increased from 11.24 to 11.31 for vocals and from 17.55 to 17.62 for instrumental.
2) We have added a new MelBand Roformer (vocals, instrumental) model. The neural network was first proposed in the article "Mel-Band RoFormer for Music Source Separation" by a group of scientists from ByteDance. The first high-quality weights were made publicly available by Kimberley Jensen. The MVSep team then slightly modified and fine-tuned the network to improve the quality metrics. Its SDR is comparable to that of BS Roformer: 11.17 for vocals and 17.48 for instrumental.
3) Thanks to the new MelBand Roformer model, the metrics of all algorithms in the Ensemble series have improved: SDR rose from 11.33 to 11.50 for vocals and from 17.63 to 17.81 for instrumental.
4) We have added a new SCNet (vocals, instrumental) model. The neural network was proposed in the article "SCNet: Sparse Compression Network for Music Source Separation" by a group of scientists from China. The authors made the neural network code open source, and the MVSep team was able to reproduce results similar to those presented in the published paper. We first trained a small version of SCNet and later prepared a heavier version. Its quality metrics are close to those of the Roformer models (currently the top models), but still slightly lower. SDR for the large version of the network: 10.74 for vocals and 17.05 for instrumental.
5) We have added DeNoise by aufr, an experimental model for noise removal. The model was prepared and made publicly available by aufr.
All SDR measurements were carried out on the Multisong dataset.
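For readers unfamiliar with the metric, SDR (signal-to-distortion ratio) is reported in decibels, and higher is better. The sketch below uses the common definition, 10 * log10 of the reference energy divided by the energy of the error between reference and estimate. The exact evaluation protocol used on Multisong is not described in this post, so treat the function name `sdr`, the `eps` parameter, and the toy signals as illustrative assumptions rather than the actual MVSep evaluation code.

```python
import math


def sdr(reference, estimate, eps=1e-9):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences.

    Assumption: the plain energy-ratio definition of SDR; eps only guards
    against division by zero and log of zero on silent or perfect inputs.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy signals (hypothetical): the estimate is the reference scaled by 0.9,
# so the error amplitude is 10% of the reference amplitude.
reference = [math.sin(0.01 * n) for n in range(1000)]
estimate = [0.9 * x for x in reference]

print(round(sdr(reference, estimate), 2))  # 20.0 dB
```

Because the scale is logarithmic, a gain such as 11.33 to 11.50 dB corresponds to roughly a 4% reduction in error energy relative to the reference (10 ** (-0.17 / 10) is about 0.96).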