Algorithm info: 40 days trained on 2xA100-40GB
-
Trained on 4,500+ songs
-
(My dataset consists of private studio stems, WAV songs ranging from 16 kHz to 22 kHz, in this format: Arr Acoustic Guitar, Backing Vocals, Bass, Drum Kit, Electric Guitar, English Horn, Lead Vocal, Piano, Rhythm Acoustic Guitar, Brass Instruments, Flute, String Section...)
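For context on how a stem folder like this is typically consumed by separation training code, here is a minimal sketch that loads one song's stem WAVs and sums them into a training mixture. The folder layout, file names, and the soundfile/numpy usage are illustrative assumptions, not details from this card.

```python
# Illustrative only: assumed layout of one song folder containing per-stem WAVs,
# e.g. "Lead Vocal.wav", "Bass.wav", "Drum Kit.wav", ...
from pathlib import Path

import numpy as np
import soundfile as sf  # pip install soundfile


def load_song(song_dir: str) -> tuple[np.ndarray, dict[str, np.ndarray]]:
    """Load every stem WAV in a song folder and build the mixture as their sum.

    Assumes all stems share the same sample rate and length (a real pipeline
    would resample/pad; the actual preprocessing used here is not documented).
    """
    stems = {}
    for wav_path in sorted(Path(song_dir).glob("*.wav")):
        audio, _sr = sf.read(str(wav_path), dtype="float32", always_2d=True)
        stems[wav_path.stem] = audio
    mixture = np.sum(list(stems.values()), axis=0)  # simple linear sum of the stems
    return mixture, stems
```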
-
I didn't use any songs from the Multisong dataset (ZFTurbo offered me 5,000+ songs and the Multisong dataset in exchange for my models, but I rejected the offer).
-
chunk size: 7.99 s
dim: 384
depth: 12
model size: 961 MB
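As a rough illustration of what these hyperparameters correspond to, the sketch below instantiates a mel-band RoFormer with the listed dim/depth, assuming the lucidrains bs-roformer package (`pip install BS-RoFormer`) and a 44.1 kHz sample rate; which mel-roformer implementation was actually trained is not stated on this card, and all other settings are left at package defaults.

```python
from bs_roformer import MelBandRoformer  # assumed implementation, pip install BS-RoFormer

# dim and depth taken from the card above; everything else is left at defaults.
model = MelBandRoformer(dim=384, depth=12)

# 7.99 s chunks; 44.1 kHz is an assumption, the card only gives the chunk length in seconds.
chunk_samples = int(7.99 * 44_100)

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M, fp32 checkpoint ~ {n_params * 4 / 2**20:.0f} MB")
print(f"chunk length: {chunk_samples} samples")
```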
-
I conducted training over 3005 epochs. For a few days the training ran at 600 steps per epoch, but the majority of the training used 1000 steps per epoch.
Each 600-step epoch took approximately 7 to 10 minutes, while 1000-step epochs took around 14 to 15 minutes; these times are estimates.
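A quick back-of-the-envelope check of these numbers against the "40 days on 2xA100-40GB" figure at the top, using only the per-epoch estimates given above (the exact split between 600-step and 1000-step epochs isn't stated, so the 1000-step timing is applied to all epochs):

```python
# Rough wall-clock time implied by the per-epoch estimates above.
epochs = 3005
minutes_per_epoch = (14 + 15) / 2  # ~14-15 min for a 1000-step epoch

total_minutes = epochs * minutes_per_epoch
print(f"~{total_minutes / 60 / 24:.0f} days")  # ~30 days, same order as the ~40 days quoted above
```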
Initially, I suspected that the SDR was not improving because I was using only 2xA100-40GB GPUs. After running tests on 8xA100-80GB GPUs, I observed that the SDR remained stagnant, suggesting that the issue might be an error in the implementation of the mel-roformer architecture.
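For reference, the SDR tracked here is the signal-to-distortion ratio; the sketch below is the plain global definition, which is my assumption — the exact evaluation protocol used during these runs (e.g. museval/BSSEval-style chunked median SDR) is not stated.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Global signal-to-distortion ratio in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    return float(10.0 * np.log10((signal_power + eps) / (error_power + eps)))


# e.g. sdr(target_vocals, separated_vocals) -> higher is better; a value that stays
# flat across epochs is the stagnation described above.
```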
Trained by viperx.
Date added: 2024-02-11