| Algorithm info: trained for 40 days on 2x A100-40GB -
 Trained on over 4500 songs
 -
 (My dataset consists of private studio stems, WAV files with frequency content ranging from 16 kHz to 22 kHz, in this format: Arr Acoustic Guitar, Backing Vocals, Bass, Brass Instruments, Drum Kit, Electric Guitar, English Horn, Flute, Lead Vocal, Piano, Rhythm Acoustic Guitar, String Section...)
 -
 I didn't use any songs from the Multisong dataset (ZFTurbo offered me over 5000 songs plus the Multisong dataset in exchange for my models, but I declined the offer...)
 -
 chunk size: 7.99 s
 dim: 384
 depth: 12
 model size: 961 MB
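As a rough illustration, the listed hyperparameters could map to a training config like the sketch below. All key names (chunk_seconds, sample_rate, etc.) and the 44.1 kHz sample rate are assumptions for illustration, not the author's actual configuration.

```python
# Hypothetical config sketch for a Mel-RoFormer-style model.
# SAMPLE_RATE is an assumed CD-quality rate; the card does not state it.
SAMPLE_RATE = 44100

config = {
    "chunk_seconds": 7.99,  # length of each training chunk in seconds
    "dim": 384,             # transformer embedding dimension
    "depth": 12,            # number of transformer layers
}

# Chunk length in samples at the assumed sample rate
chunk_samples = int(config["chunk_seconds"] * SAMPLE_RATE)
print(chunk_samples)  # 352359 at 44.1 kHz
```

At 44.1 kHz a 7.99 s chunk is about 352k samples; the actual value depends on the real sample rate and rounding used in training.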
 -
 Training ran for 3005 epochs. For a few days it used 600 steps per epoch, but the majority of the training used 1000 steps per epoch.
 
 Each epoch with 600 steps took approximately 7 to 10 minutes, while epochs with 1000 steps took around 14 to 15 minutes. These are estimated times.
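A back-of-the-envelope check of these figures, assuming for simplicity that all 3005 epochs ran at 1000 steps each with the ~14.5-minute midpoint of the stated range (the epochs at 600 steps would lower this):

```python
# Rough estimate of total training time from the epoch counts above.
epochs = 3005
minutes_per_epoch = 14.5  # midpoint of the 14-15 minute estimate

total_steps = epochs * 1000          # ~3 million optimizer steps
total_minutes = epochs * minutes_per_epoch
total_days = total_minutes / (60 * 24)
print(total_steps, round(total_days, 1))
```

This comes out to roughly 30 days of pure epoch time, which is in the same ballpark as the 40-day figure once validation, checkpointing, and the earlier 600-step experimentation are accounted for.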
 
 Initially, I suspected that the SDR was not improving because I was using only 2x A100-40GB GPUs. However, after testing with 8x A100-80GB GPUs, the SDR remained stagnant, suggesting the issue might instead be an error in the Mel-RoFormer architecture implementation.
 
 Trained by viperx.
 
 Date added: 2024-02-11
 |