| Algorithm info: trained for 40 days on 2x A100-40GB -
 Trained on over 4500 songs
 -
 (My dataset consists of private studio stems, WAV files with frequency content ranging from 16 kHz to 22 kHz, in this format: Arr Acoustic Guitar, Backing Vocals, Bass, Brass Instruments, Drum Kit, Electric Guitar, English Horn, Flute, Lead Vocal, Piano, Rhythm Acoustic Guitar, String Section...)
 -
 I didn't use any songs from the Multisong dataset (ZFTurbo offered me over 5000 songs plus the Multisong dataset in exchange for my models, but I declined the offer...)
 -
 chunk size: 7.99 s
 dim: 384
 depth: 12
 model size: 961 MB
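As a rough illustration, the listed hyperparameters could map to a training config like the sketch below. All key names (chunk_seconds, sample_rate, etc.) and the 44.1 kHz sample rate are assumptions for illustration, not the author's actual configuration.

```python
# Hypothetical config sketch for a Mel-RoFormer-style model.
# SAMPLE_RATE is an assumed CD-quality rate; the card does not state it.
SAMPLE_RATE = 44100

config = {
    "chunk_seconds": 7.99,  # length of each training chunk in seconds
    "dim": 384,             # transformer embedding dimension
    "depth": 12,            # number of transformer layers
}

# Chunk length in samples at the assumed sample rate
chunk_samples = int(config["chunk_seconds"] * SAMPLE_RATE)
print(chunk_samples)  # 352359 at 44.1 kHz
```

At 44.1 kHz a 7.99 s chunk is about 352k samples; the actual value depends on the real sample rate and rounding used in training.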
 -
 Training ran for 3005 epochs. For a few days it used 600 steps per epoch, but the majority of the training used 1000 steps per epoch.
 
 Each epoch with 600 steps took approximately 7 to 10 minutes, while epochs with 1000 steps took around 14 to 15 minutes. These are estimated times.
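A back-of-the-envelope check of these figures, assuming for simplicity that all 3005 epochs ran at 1000 steps each with the ~14.5-minute midpoint of the stated range (the epochs at 600 steps would lower this):

```python
# Rough estimate of total training time from the epoch counts above.
epochs = 3005
minutes_per_epoch = 14.5  # midpoint of the 14-15 minute estimate

total_steps = epochs * 1000          # ~3 million optimizer steps
total_minutes = epochs * minutes_per_epoch
total_days = total_minutes / (60 * 24)
print(total_steps, round(total_days, 1))
```

This comes out to roughly 30 days of pure epoch time, which is in the same ballpark as the 40-day figure once validation, checkpointing, and the earlier 600-step experimentation are accounted for.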
 
 Initially, I suspected that the SDR was not improving because I was using only 2x A100-40GB GPUs. However, after testing with 8x A100-80GB GPUs, the SDR remained stagnant, suggesting the issue might instead be an error in the Mel-RoFormer architecture implementation.
 
 Trained by viperx.
 
 Date added: 2024-02-11
 |