
MVSep Male/Female separation

A model for separating male and female voices within a single vocal track. The track should contain only voices, no music.

Quality metrics

Dataset: Male/Female validation dataset

Algorithm name | SDR Male | SDR Female | L1_Freq Male | L1_Freq Female
BSRoformer by Sucial (SDR: 6.52) | 6.82 | 6.23 | 40.99 | 40.62
BSRoformer by aufr33 (SDR: 8.18) | 8.47 | 7.89 | 46.65 | 44.73
SCNet XL (SDR: 11.83) | 12.08 | 11.58 | 50.50 | 51.51
MelRoformer (2025.01) (SDR: 13.03) | 13.39 | 12.68 | 57.61 | 56.76
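The SDR columns above are signal-to-distortion ratios in dB. As a minimal sketch, assuming the common definition used in demixing challenges (energy of the reference divided by energy of the error), the metric could be computed as below. The function names `sdr_db` and `mean_sdr` are hypothetical, and MVSep's own evaluation code may differ in detail.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences.

    Assumed definition: 10 * log10(sum(ref^2) / sum((ref - est)^2)).
    eps avoids division by zero for silent or perfectly matched signals.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def mean_sdr(*stem_sdrs):
    """Plain average of per-stem SDR values (for example, male and female)."""
    return sum(stem_sdrs) / len(stem_sdrs)
```

The value in parentheses after each algorithm name appears to be the average of the two SDR columns: for the first row, `mean_sdr(6.82, 6.23)` gives 6.525, which is consistent with the listed 6.52.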

 


Demucs3 Model (vocals, drums, bass, other)

The Demucs3 algorithm splits a track into 4 stems (bass, drums, vocals, other). It won the Music Demixing Challenge 2021.

Link: https://github.com/facebookresearch/demucs/tree/v3

Quality table

Algorithm name | Multisong SDR Bass | Multisong SDR Drums | Multisong SDR Other | Multisong SDR Vocals | Multisong SDR Instrumental | Synth SDR Vocals | Synth SDR Instrumental
Demucs3 (Model A) | 9.50 | 8.97 | 4.40 | 7.21 | 13.52 | --- | ---
Demucs3 (Model B) | 10.69 | 10.27 | 5.35 | 8.13 | 14.44 | 9.78 | 9.48

Note: Model A was trained only on MUSDB18 data, so its quality is lower than that of Demucs3 Model B. Demucs3 Model A and Demucs3 Model B have the same architecture but different weights.
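For readers who want to try Demucs3 locally, the following is a rough sketch of calling the open-source `demucs` command-line tool from Python. It assumes the `demucs` package is installed. The model names `mdx` and `mdx_extra` are pretrained models from the upstream repository; treating them as counterparts of Model A and Model B is an assumption, since this page does not say which weights MVSep uses. The helper names are hypothetical.

```python
import subprocess
import sys


def build_demucs_command(track, model_name="mdx_extra", out_dir="separated"):
    """Build the argument list for the demucs CLI.

    Assumption: "mdx" (MUSDB18-only training) and "mdx_extra" (extra training
    data) are used here as stand-ins for Model A and Model B.
    """
    return [
        sys.executable, "-m", "demucs",
        "-n", model_name,      # name of the pretrained model
        "-o", str(out_dir),    # output folder for the separated stems
        str(track),
    ]


def separate(track, model_name="mdx_extra", out_dir="separated"):
    """Run the separation; raises CalledProcessError if demucs fails."""
    subprocess.run(build_demucs_command(track, model_name, out_dir), check=True)
```

Building the command separately from running it makes it easy to inspect or log the exact invocation before starting a long separation job.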


Vit Large 23 (vocals, instrum)

VitLarge23 is an experimental model based on Vision Transformers. Its metrics are slightly below those of MDX23C, but it may work better in some cases.

Quality table

Algorithm name | Multisong SDR Vocals | Multisong SDR Instrumental | Synth SDR Vocals | Synth SDR Instrumental | MDX23 Leaderboard SDR Vocals
Vit Large 23 (512px) v1 | 9.78 | 16.09 | 12.33 | 12.03 | 10.47
Vit Large 23 (512px) v2 | 9.90 | 16.20 | 12.38 | 12.08 | ---

MVSep MelBand Roformer (vocals, instrum)

Mel Band Roformer is a model proposed by ByteDance employees for the Sound Demixing Challenge 2023, where it took first place on Leaderboard C. Unfortunately, the model was not released publicly; it was reproduced from the scientific article by the developer @lucidrains on GitHub. The vocal model was trained from scratch on our internal dataset. We have not yet been able to match the metrics reported by the authors.

Quality table

Algorithm name | Multisong SDR Vocals | Multisong SDR Instrumental | Synth SDR Vocals | Synth SDR Instrumental | MDX23 Leaderboard SDR Vocals
Mel Band Roformer v1 (vocals) | 9.07 | --- | 11.76 | --- | ---

LarsNet (kick, snare, cymbals, toms, hihat)

The LarsNet model splits the drums stem into 5 parts: 'kick', 'snare', 'cymbals', 'toms', 'hihat'. The model is from this github repository and was trained on the StemGMD dataset. It has two operating modes. The first (default) mode works in two stages: the Demucs4 HT model extracts only the drum part from the track, and then LarsNet is applied to that drum part. If your track consists only of drums, it makes sense to use the second mode, where LarsNet is applied directly to the uploaded audio. Unfortunately, the separation quality is subjectively inferior to that of the DrumSep model.
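The two operating modes can be pictured with a small sketch. This is only an illustration of the control flow described above, not MVSep's implementation: `extract_drums` stands in for the Demucs4 HT stage and `split_kit` for LarsNet, and both are hypothetical callables supplied by the caller.

```python
from typing import Callable, Dict, List, Sequence

Audio = Sequence[float]
DRUM_PARTS = ("kick", "snare", "cymbals", "toms", "hihat")


def split_drum_kit(
    audio: Audio,
    extract_drums: Callable[[Audio], Audio],            # hypothetical stage 1 (Demucs4 HT)
    split_kit: Callable[[Audio], Dict[str, List[float]]],  # hypothetical stage 2 (LarsNet)
    drums_only: bool = False,
) -> Dict[str, List[float]]:
    """Return one signal per drum part.

    drums_only=False (default mode): isolate the drum stem first, then split it.
    drums_only=True (second mode): the upload is already drums, so split it directly.
    """
    drums = audio if drums_only else extract_drums(audio)
    parts = split_kit(drums)
    missing = set(DRUM_PARTS) - set(parts)
    if missing:
        raise ValueError(f"missing drum parts: {sorted(missing)}")
    return parts
```

Skipping the first stage for drum-only uploads avoids running a full-mix separator on material that has nothing else to remove.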


MVSep MultiSpeaker (MDX23C)

MVSep MultiSpeaker (MDX23C) tries to isolate the loudest voice from all other voices. It uses the MDX23C architecture and is still under development.


Aspiration (by Sucial)

The algorithm adds a "whispering" effect to vocals. The model was created by SUC-DriverOld. More details here.


AudioSR (Super Resolution)

AudioSR (Versatile Audio Super-resolution at Scale) is an algorithm that restores high frequencies. It works on all types of audio (e.g., music, speech, dog sounds, rain, ...). It was originally trained on mono audio, so results on stereo audio may be less stable.

Metric on Super Resolution Checker for Music Leaderboard (Restored): 25.3195
Authors' paper: https://arxiv.org/pdf/2309.07314
Original repository: https://github.com/haoheliu/versatile_audio_super_resolution
Original inference script prepared by @jarredou: https://github.com/jarredou/AudioSR-Colab-Fork
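One simple way to run a mono-only model on stereo input is to process each channel on its own. The sketch below illustrates that idea under stated assumptions: `upsample_mono` is a hypothetical callable standing in for the super-resolution model, and nothing here is confirmed to be how MVSep or the linked scripts handle stereo. Independent processing gives no guarantee that the two channels stay consistent with each other, which is one plausible reason for less stable stereo results.

```python
from typing import Callable, List, Sequence, Tuple


def super_resolve_stereo(
    left: Sequence[float],
    right: Sequence[float],
    upsample_mono: Callable[[Sequence[float]], List[float]],  # hypothetical mono model
) -> Tuple[List[float], List[float]]:
    """Apply a mono super-resolution function to each stereo channel separately."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    out_left = upsample_mono(left)
    out_right = upsample_mono(right)
    # Trim to a common length in case the model returns slightly different sizes.
    n = min(len(out_left), len(out_right))
    return out_left[:n], out_right[:n]
```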


FlashSR (Super Resolution)

FlashSR is an audio super-resolution algorithm for restoring high frequencies. It is based on the paper "FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation".

Metric on Super Resolution Checker for Music Leaderboard (Restored): 22.1397
Original repository: https://github.com/jakeoneijk/FlashSR_Inference
Inference script by @jarredou: https://github.com/jarredou/FlashSR-Colab-Inference

