The model separates a drum track into 4, 5, or 6 tracks, depending on the model chosen. The 4-track models output 'kick', 'snare', 'toms', and 'cymbals'. The 5-track model separates 'hh' (hi-hat) from 'cymbals', and the 6-track models split 'cymbals' into 'hh', 'ride', and 'crash'.
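The three track layouts described above can be written down as a small lookup table. The sketch below is illustrative only: `STEM_LAYOUTS` and `merge_stems` are hypothetical names, the stem format (lists of float samples) is an assumption, and so is the idea that a coarser track equals the sample-wise sum of the finer ones.

```python
# Track names per layout, as listed in the description above.
STEM_LAYOUTS = {
    4: ("kick", "snare", "toms", "cymbals"),
    5: ("kick", "snare", "toms", "hh", "cymbals"),
    6: ("kick", "snare", "toms", "hh", "ride", "crash"),
}


def merge_stems(stems, target):
    """Collapse a 6-track result into a 5- or 4-track layout.

    Assumption: a coarser track is the sample-wise sum of the finer ones.
    `stems` maps track name -> list of float samples (hypothetical format).
    """
    def mix(names):
        # Add the named tracks together sample by sample.
        return [sum(samples) for samples in zip(*(stems[n] for n in names))]

    if target == 6:
        return dict(stems)
    out = {name: stems[name] for name in ("kick", "snare", "toms")}
    if target == 5:
        out["hh"] = stems["hh"]
        out["cymbals"] = mix(("ride", "crash"))
    elif target == 4:
        out["cymbals"] = mix(("hh", "ride", "crash"))
    else:
        raise ValueError("target must be 4, 5, or 6")
    return out
```

Under that assumption, a 6-track result can be compared track by track with a 4- or 5-track one after merging.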
A total of 6 models are available:
1) The DrumSep model from the GitHub repository. It is based on the HDemucs architecture and splits drums into 4 tracks.
2) A model based on the mdx23c architecture, prepared by @jarredou and @aufr33. The model splits drums into 6 tracks.
3) A model based on the SCNet XL architecture, which splits drums into 5 tracks.
4) A model based on the SCNet XL architecture, which splits drums into 6 tracks.
5) A model based on the SCNet XL architecture, which splits drums into 4 tracks.
6) Ensemble of 4 models (1 MDX23C + 3 SCNet XL)
All models expect a drums-only input. If other instruments or vocals are present in the track, the separation will not work correctly. The algorithm therefore has two modes of operation. In the first (default) mode, MVSep Drums, the best model for drums, is applied to the track first to extract only the drum part, and the DrumSep model is then applied to the result. If your track consists only of drums, use the second mode, in which the DrumSep model is applied directly to the uploaded audio.
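The two modes amount to an optional pre-processing step in front of the DrumSep model. The following is a minimal sketch of that control flow, not the service's actual code: `separate_drum_parts`, `extract_drums`, `drumsep`, and `drums_only` are hypothetical names, and the two models are passed in as plain callables.

```python
def separate_drum_parts(audio, extract_drums, drumsep, drums_only=False):
    """Run the two-mode workflow described above.

    extract_drums: callable standing in for the MVSep Drums model
                   (hypothetical interface: audio in, drums-only audio out).
    drumsep:       callable standing in for the chosen DrumSep model
                   (hypothetical interface: drums in, dict of tracks out).
    drums_only:    False = default mode, extract the drum part first;
                   True  = second mode, apply DrumSep directly to the input.
    """
    drums = audio if drums_only else extract_drums(audio)
    return drumsep(drums)
```

With a full mix, the default call runs both stages; with a drums-only file, passing `drums_only=True` skips the extraction stage.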
Quality table (SDR metric, higher is better):
Algorithm name | kick | snare | toms | cymbals (all) | hh | cymbals (ride + crash) | ride | crash
DrumSep model by inagoy (HDemucs, 4 stems) | 14.13 | 8.42 | 5.67 | 5.63 | - | - | - | -
DrumSep model by aufr33 and jarredou (MDX23C, 6 stems) | 18.32 | 13.60 | 13.25 | - | 6.71 | - | 5.38 | 7.56
DrumSep model (SCNet XL, 5 stems) | 20.21 | 15.05 | 16.28 | - | 7.05 | 8.56 | - | -
DrumSep model (SCNet XL, 6 stems) | 20.24 | 14.80 | 15.93 | - | 6.74 | - | 5.02 | 7.63
DrumSep model (SCNet XL, 4 stems) | 20.50 | 14.69 | 15.92 | 10.08 | - | - | - | -
Ensemble of 4 models | 20.59 | 15.11 | 16.41 | - | 7.19 | - | 5.59 | 7.85
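For readers unfamiliar with the metric, the sketch below shows one common way to compute SDR in decibels: the energy of the reference track divided by the energy of the estimation error. It assumes this simple definition; the exact evaluation procedure behind the table above is not stated here, and `sdr_db` and `eps` are hypothetical names.

```python
import math


def sdr_db(reference, estimate, eps=1e-9):
    """SDR in dB: 10 * log10(reference energy / error energy).

    reference, estimate: equal-length sequences of float samples.
    eps: small constant that avoids division by zero and log of zero.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (error + eps))
```

Because the scale is logarithmic, under this definition a gain of 10 dB means ten times less error energy.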