The MVSep Bass model is an ensemble of two models, HTDemucs4 and BSRoformer. It produces high-quality separation of a music track into two parts: the bass and everything else.
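How the two models' outputs are combined is not specified here. One common way to ensemble separation models is to average their waveform estimates. The sketch below assumes exactly that: a weighted average of the HTDemucs4 and BSRoformer bass estimates, with the remainder obtained by subtracting the bass from the mixture. The function names and the equal default weights are hypothetical, not MVSep's actual method.

```python
import numpy as np


def ensemble_bass(estimates, weights=None):
    """Hypothetical ensemble: weighted average of per-model bass waveforms.

    estimates: list of arrays shaped (channels, samples), one per model
               (e.g. HTDemucs4 and BSRoformer outputs).
    weights:   optional per-model weights (assumed); equal weights by default.
    """
    stacked = np.stack([np.asarray(e, dtype=np.float64) for e in estimates])
    if weights is None:
        weights = np.ones(len(estimates))
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # normalize so the weights sum to 1
    # Contract the model axis: sum_i w_i * estimate_i -> (channels, samples)
    return np.tensordot(w, stacked, axes=1)


def split_bass_other(mixture, bass):
    """Assumed convention: 'everything else' is the mixture minus the bass."""
    mixture = np.asarray(mixture, dtype=np.float64)
    return bass, mixture - bass
```

Averaging in the waveform domain keeps the sketch model-agnostic: any model that returns a bass waveform of the same shape can be added to the list.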
The model is available in two versions. In the first version, the bass neural network is applied directly to the entire track. In the second version, the track is first split into two parts, vocals and instrumental, and the bass model is then applied only to the instrumental part. The second version usually gives slightly higher separation quality.
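The two versions differ only in what the bass model receives as input. The sketch below shows both paths, assuming the vocal and bass models are callables that take and return waveforms of shape (channels, samples), that the instrumental is the mixture minus the extracted vocals, and that "everything else" is the mixture minus the bass. `separate_bass`, `bass_model` and `vocal_model` are hypothetical names, not an MVSep API.

```python
from typing import Callable, Optional

import numpy as np

Model = Callable[[np.ndarray], np.ndarray]


def separate_bass(
    mixture: np.ndarray,
    bass_model: Model,
    vocal_model: Optional[Model] = None,
):
    """Return (bass, other) for a mixture shaped (channels, samples).

    vocal_model is None -> version 1: bass model runs on the full track.
    vocal_model given   -> version 2: vocals are removed first and the
                           bass model runs only on the instrumental.
    """
    if vocal_model is None:
        bass = bass_model(mixture)
    else:
        vocals = vocal_model(mixture)
        instrumental = mixture - vocals  # assumed: instrumental = mix - vocals
        bass = bass_model(instrumental)
    # Assumed convention: the second stem is everything except the bass,
    # so in version 2 it still contains the vocals.
    other = mixture - bass
    return bass, other
```

Passing the vocal model as an optional argument keeps one code path for both versions; a plausible reason the second version scores higher is that the bass model no longer has to deal with vocal content.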
Bass SDR on the MultiSong dataset: 13.25
Bass SDR on the MultiSong dataset with vocals extracted first: 13.42
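SDR (signal-to-distortion ratio) is measured in dB, and higher is better. The exact evaluation code behind these numbers is not given here. The sketch below assumes the simple energy-ratio form often used for music separation benchmarks: the energy of the reference stem divided by the energy of the error between reference and estimate, with a small epsilon (assumed value) to avoid division by zero.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-7):
    """Energy-ratio SDR in dB (assumed formulation, not MVSep's code).

    reference, estimate: arrays of the same shape, e.g. (channels, samples).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    num = np.sum(reference ** 2) + eps               # reference energy
    den = np.sum((reference - estimate) ** 2) + eps  # error energy
    return 10.0 * np.log10(num / den)
```

For example, an estimate scaled to 90% of the reference leaves an error with 1% of the reference energy, giving 10 * log10(100) = 20 dB. On this scale, the reported gap between the two versions is 13.42 - 13.25 = 0.17 dB.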