MVSEP - Music & Voice Separation

Introduction

On this page you can find tools for checking the quality of models that split tracks into stems such as vocals, bass and drums, as well as a table of the most recent checks.

Datasets

At the moment we have two datasets. The first is a synthetic dataset for checking the quality of separation into two parts: vocals and instrumental. The second, the Multisong dataset, consists of individual songs from different genres and can be used to check the vocals, instrumental, bass, drums and other stems.

Synthetic dataset

The synthetic dataset is made up of random vocal and instrumental samples mixed together. It doesn't always sound like a real melody, but it does allow you to test track splitting techniques. The dataset consists of 100 tracks, each exactly one minute long, with a sample rate of 44100 Hz. The dataset has a closed part, which consists of audio files containing only the instrumental part and only the vocal part of each composition. This part is kept hidden so that algorithms can be checked fairly on our server. Dataset size: ~1.9 GB.

Download synthetic dataset (~1.9 GB)
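Once the archive is downloaded, the stated format (44100 Hz, one minute per track) can be spot-checked on any file, including your own output files. The following is a minimal sketch using Python's standard `wave` module. It assumes the files are PCM-encoded WAV, which this page does not state; the function names and the tolerance are hypothetical.

```python
import wave


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) of a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    return rate, channels, frames / rate


def matches_spec(path, rate=44100, seconds=60.0, tol=1e-3):
    """True if the file has the expected sample rate and duration.

    The defaults mirror the values given on this page; `tol` (in seconds)
    is an assumption made for this sketch.
    """
    actual_rate, _, duration = wav_info(path)
    return actual_rate == rate and abs(duration - seconds) <= tol
```

Running `matches_spec` on each separated file before uploading is one way to catch accidental resampling or trimming introduced by a separation pipeline.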

Multisong dataset

The Multisong dataset is made up of 100 songs from different genres found on the Internet (Acoustic, Folk, Modern Blues, American Roots Rock, Modern Country, Ambient, Beats, Dance, Deep House, Disco, Drum n Bass, Electro, Euro Pop, Future Bass, House, Soft House, Funk, Alternative Hip Hop, Mainstream Hip Hop, Old School Hip Hop, Trap, Acid Jazz, Big Band, Modern Jazz, Smooth Jazz, Bossa Nova, Modern Latin, Salsa, 1970s Pop, 1980s Pop, 1990s Pop, 2000s Pop, 2010s Pop, 2020s Pop, Afrobeats, Indie Pop, K-pop, Synth Pop, RnB, Soul, 1960s Rock, Alternative, Hard Rock, Punk, Modern Hymns, Praise & Worship, India). Models that score well on this dataset can claim to be versatile. However, there is a small chance that some of the songs were used to train some models, which makes the comparison less fair. The dataset consists of 100 tracks, each exactly one minute long, with a sample rate of 44100 Hz. The dataset has a closed part, which consists of audio files containing the 5 parts that make up each composition. This part is kept hidden so that algorithms can be checked fairly on our server. Dataset size: ~1.8 GB.

Download Multisong dataset (~1.8 GB)

Check other datasets

How to test your algorithm on the synthetic dataset?

To do this, download the dataset and run your algorithm on it. Each of the 100 melodies should be split into 2 parts: instrumental and vocal, so you should end up with exactly 200 files. The naming is as follows:

For example, for the file melody_086_mixture.wav you should get 2 files: melody_086_instrum.wav, which contains the instrumental part of the composition, and melody_086_vocals.wav, which contains the vocal part of the composition.
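The naming rule above can be sketched as a small helper that derives the expected output names from a mixture file name. This is only an illustration: the function name `expected_outputs` and its parameters are hypothetical, and the suffixes are taken from the example on this page.

```python
from pathlib import Path

# Suffixes taken from the naming example on this page.
SYNTH_SUFFIXES = ("instrum", "vocals")


def expected_outputs(mixture_name, suffixes=SYNTH_SUFFIXES, ext=".wav"):
    """Map 'melody_086_mixture.wav' to the output names for each suffix."""
    stem = Path(mixture_name).stem  # e.g. "melody_086_mixture"
    marker = "_mixture"
    if not stem.endswith(marker):
        raise ValueError(f"not a mixture file name: {mixture_name!r}")
    base = stem[: -len(marker)]  # e.g. "melody_086"
    return [f"{base}_{suffix}{ext}" for suffix in suffixes]
```

Passing a different `ext` (for example ".flac") gives the names for the other accepted upload formats.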

You can upload the result in .wav, .flac or .mp3 format. The .mp3 format is not recommended because it loses some quality, which lowers the metric value.

The resulting 200 files must be packed into a .zip file and uploaded to our server for automatic verification. For your submission, the average SDR metric will be calculated (similar to the Music Demixing Challenge). The SDR metric is calculated independently for the instrumental and vocal parts. After some time, the result will appear in the overall leaderboard. You can upload your solution using the form below.
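For a rough local estimate before uploading, SDR can be computed on your own material where you have the true stems. The sketch below assumes the commonly used challenge-style definition, 10 * log10(sum(s^2) / sum((s - s_est)^2)); the exact formula and averaging used by the server are not given on this page, and the small constant `eps` as well as the function names are assumptions.

```python
import math


def sdr(reference, estimate, eps=1e-7):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences.

    `eps` avoids division by zero on silent signals; its value is an
    assumption for this sketch, not something stated on this page.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal = sum(r * r for r in reference) + eps
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate)) + eps
    return 10.0 * math.log10(signal / error)


def average_sdr(pairs):
    """Mean SDR over an iterable of (reference, estimate) pairs."""
    values = [sdr(ref, est) for ref, est in pairs]
    return sum(values) / len(values)
```

Under this definition an estimate equal to the reference scaled by 0.9 scores about 20 dB, since the error energy is one hundredth of the signal energy. Scores computed this way on your own data are not directly comparable to the leaderboard, because the reference stems of the closed part are not available.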

How to test your algorithm on the Multisong dataset?

To test your model, download the Multisong dataset and run your algorithm on it. Each of the 100 songs should be split into 5 parts: "instrumental", "vocal", "bass", "drums" and "other", so you should end up with exactly 500 files. You can also check each part separately (which makes uploading large files to the server more convenient). For example, if you only want to check the "bass" stem, upload only the 100 files with the "_bass" suffix.
The naming is as follows:

For example, for the file song_086_mixture.wav you should get 5 files:

song_086_instrum.wav - contains only the instrumental part of the song
song_086_vocals.wav - contains only the vocal part of the song
song_086_bass.wav - contains only the bass part of the song
song_086_drums.wav - contains only the drums part of the song
song_086_other.wav - contains everything else in the song except vocals, bass and drums
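Because a submission may contain all five stems or only some of them, it can help to count the files per stem before uploading. The sketch below groups file names by the suffixes listed above and reports which stems have a full set of 100 files. The helper names are hypothetical, and the sketch does not check which track numbers are present.

```python
from collections import Counter
from pathlib import Path

# Stem suffixes taken from the file list on this page.
MULTISONG_STEMS = ("instrum", "vocals", "bass", "drums", "other")


def count_stems(filenames):
    """Count files per stem suffix; names with unknown suffixes are ignored."""
    counts = Counter()
    for name in filenames:
        suffix = Path(name).stem.rsplit("_", 1)[-1]
        if suffix in MULTISONG_STEMS:
            counts[suffix] += 1
    return counts


def complete_stems(filenames, n_tracks=100):
    """Return the stems that have exactly `n_tracks` files."""
    counts = count_stems(filenames)
    return [stem for stem in MULTISONG_STEMS if counts[stem] == n_tracks]
```

For a full submission `complete_stems` should return all five stems; for a bass-only upload it should return just "bass".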

You can upload the result in .wav, .flac or .mp3 format. The .mp3 format is not recommended because it loses some quality, which lowers the metric value.

The resulting 100 or 500 files must be packed into a .zip file and uploaded to our server for automatic verification. For your submission, the average SDR metric will be calculated (similar to the Music Demixing Challenge). The SDR metric is calculated independently for the instrumental and vocal parts. After some time, the result will appear in the overall leaderboard. You can upload your solution using the form below.
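Packing the files can be scripted with Python's standard `zipfile` module. The sketch below writes the audio files of one folder into an archive, optionally keeping only one stem (for example "bass"). The flat archive layout (no subfolders) is an assumption, since this page does not specify the expected layout; the function and parameter names are hypothetical.

```python
import zipfile
from pathlib import Path

# Upload formats listed on this page.
ALLOWED_EXTS = {".wav", ".flac", ".mp3"}


def zip_submission(src_dir, zip_path, stem=None):
    """Zip the audio files in `src_dir`; return the number of files written.

    If `stem` is given (e.g. "bass"), only files whose name ends with
    "_<stem>" before the extension are included. Files are stored at the
    top level of the archive, which is an assumption of this sketch.
    """
    files = sorted(
        p for p in Path(src_dir).iterdir()
        if p.is_file()
        and p.suffix.lower() in ALLOWED_EXTS
        and (stem is None or p.stem.endswith(f"_{stem}"))
    )
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, arcname=p.name)
    return len(files)
```

The return value gives a quick check that the archive holds the expected 100 or 500 files.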

You can find a guide in this video:

Citation

arXiv paper

@misc{solovyev2023benchmarks,
      title={Benchmarks and leaderboards for sound demixing tasks}, 
      author={Roman Solovyev and Alexander Stempkovskiy and Tatiana Habruseva},
      year={2023},
      eprint={2305.07489},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}