The MVSep Drums model is an ensemble of two models: HTDemucs4 and MelRoformer. It separates a music track into two parts: drums and everything else (all remaining instruments and vocals).
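The description does not say how the outputs of HTDemucs4 and MelRoformer are combined. One common way to ensemble source-separation models is to take a weighted average of their estimated waveforms. The sketch below shows that technique with NumPy. It assumes both models return arrays of the same shape (channels, samples). The function name `ensemble_drums` and any weights are hypothetical and do not describe MVSep's actual method.

```python
import numpy as np

def ensemble_drums(estimates, weights=None):
    """Weighted average of drum-stem estimates from several models.

    Hypothetical sketch: `estimates` is a list of arrays with identical
    shape (channels, samples), e.g. one from HTDemucs4 and one from
    MelRoformer. Equal weights are used when none are given; MVSep's
    real combination rule is not stated in the source.
    """
    stacked = np.stack(estimates)  # shape: (n_models, channels, samples)
    if weights is None:
        weights = np.ones(len(estimates))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Contract over the model axis: sum_i w_i * estimate_i
    return np.tensordot(weights, stacked, axes=1)
```

Normalizing the weights keeps the combined estimate at the same loudness scale as the individual estimates.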
The model is available in two versions. In the first version, the drums model is applied directly to the entire track. In the second version, the track is first separated into vocal and instrumental parts, and the drums model is then applied only to the instrumental part. The second version usually gives slightly higher separation quality.
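The difference between the two versions is only in what the drums model receives as input. A minimal sketch, assuming each model is a callable that maps a (channels, samples) mixture to one estimated stem, and assuming "everything else" is obtained as the residual (mixture minus drums). The names `separate_direct`, `separate_via_instrumental`, `vocals_model` and `drums_model` are hypothetical placeholders, not MVSep API.

```python
from typing import Callable, Tuple

import numpy as np

# Hypothetical type: a separator maps a (channels, samples) mixture to one stem.
Separator = Callable[[np.ndarray], np.ndarray]

def separate_direct(mix: np.ndarray, drums_model: Separator) -> Tuple[np.ndarray, np.ndarray]:
    """Version 1: run the drums model on the whole track."""
    drums = drums_model(mix)
    return drums, mix - drums  # everything else = residual (assumption)

def separate_via_instrumental(
    mix: np.ndarray, vocals_model: Separator, drums_model: Separator
) -> Tuple[np.ndarray, np.ndarray]:
    """Version 2: remove vocals first, then run the drums model on the instrumental."""
    vocals = vocals_model(mix)
    instrumental = mix - vocals  # instrumental part = mixture without vocals
    drums = drums_model(instrumental)
    return drums, mix - drums  # everything else still includes the vocals
```

In the second version the drums model never sees the vocals, which is one plausible reason it can produce a cleaner drum stem.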
SDR for drums on the MultiSong dataset (dB):
Drums model applied to the whole track: 13.05
Vocals extracted first: 13.14
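For reference, the signal-to-distortion ratio compares the energy of the true stem with the energy of the estimation error. The sketch below uses the plain definition common in music demixing evaluations, SDR = 10 log10(||s||^2 / ||s - ŝ||^2). It is an assumption that MVSep computes the metric exactly this way; details such as the small `eps` stabilizer and how scores are averaged over tracks are not stated in the source.

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-7) -> float:
    """Signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    `eps` avoids division by zero and log of zero (assumed value).
    """
    num = np.sum(reference ** 2) + eps
    den = np.sum((reference - estimate) ** 2) + eps
    return float(10.0 * np.log10(num / den))
```

Under this definition, and treating the reference energy as fixed, a 0.09 dB gain (13.14 vs 13.05) corresponds to roughly 2% less error energy, since 10^(-0.09/10) ≈ 0.98.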