Experimental model VitLarge23 based on Vision Transformers. In terms of metrics, it is slightly inferior to the MDX23C, but may work better in some cases.
Quality table
Algorithm name | Multisong dataset | Synth dataset | MDX23 Leaderboard |
||
SDR Vocals | SDR Instrumental | SDR Vocals | SDR Instrumental | SDR Vocals | |
Vit Large 23 (512px) v1 | 9.78 | 16.09 | 12.33 | 12.03 | 10.47 |
Vit Large 23 (512px) v2 | 9.90 | 16.20 | 12.38 | 12.08 | --- |