News

March 2026 News

2026-03-21

1) We added an iOS app and updated our Android app. They are both now live.

The latest release adds the following features:

Auto-update checker
You can now send reviews for separations, just like on the website
Bug fixes

2) We have introduced a variety of new models for separating individual instruments:

1) MVSep Lead/Rhythm Guitar (Demo)	2) MVSep Plucked Strings (Demo)	3) MVSep Percussion (Demo)
4) MVSep Keys (Demo)	5) MVSep Brass (Demo)	6) MVSep Woodwind (Demo)
7) MVSep Xylophone (Demo)	8) MVSep Celesta (Demo)	9) MVSep Choir (Demo)
10) MVSep Bagpipes (Demo)	11) MVSep Braam (Demo)	12) MVSep FX (Demo)

The current separation scheme can be found below:

3) A new model, MVSep SATB Choir (soprano, alto, tenor, bass), has been added.

Description: https://mvsep.com/algorithms/104
Demo 1 vocals
Demo 2 vocals
Demo strings

A huge thanks to @Dry Paint Dealer Undr for helping me create this model.
P.S. The model works not only with vocals but also with strings and some other instruments.

4) We added the powerful VibeVoice model to the Experimental section. It is available in 2 variants: Voice Cloning and Text-to-Speech.

Key Features:

Two models: small (1.5B parameters) and large (7B parameters)
Up to 4 speakers in a single recording
Up to 90 minutes of generated audio
Language support: Officially supports English (default) and Chinese, but it has been verified to work decently for other languages as well.
Voice cloning: The ability to upload a reference audio recording

VibeVoice (Voice Cloning): Info | Demo 1 | Demo 2
VibeVoice (TTS): Info | Demo 1

We also noted that if a sample contains some music along with words, it can make the generated voice sing.

5) We added a new Crowd removal model based on the BSRoformer architecture. It's available in "MVSep Crowd removal (crowd, other)" under the name "BS Roformer (SDR crowd: 7.21)". The SDR has increased from 6.27 to 7.21.

6) Three new vocal models have been added.

In BS Roformer (vocals, instrumental):

unwa BS Roformer HyperACE v2 instrum (SDR instrum: 17.40)
unwa BS Roformer HyperACE v2 vocals (SDR vocals: 11.39)

In MelBand Roformer (vocals, instrumental):

becruily deux (SDR vocals: 11.35, SDR instrum: 17.66)

7) We added the new Transkun model. Transkun is a modern, open-source model for automatic piano music transcription (Audio-to-MIDI). The official page for the model is here. It is considered one of the best (SOTA — State of the Art) in its class. The model can recognize not only the notes themselves but also their duration, loudness (velocity), and pedal usage.

Demo | Model link

8) We added the new Basic Pitch model. Basic Pitch is a modern neural network from Spotify’s Audio Intelligence Lab that converts melodic audio recordings into notes (MIDI format). Unlike outdated converters, this model can "hear" not only individual notes but also chords, along with the finest nuances of a performance. Basic Pitch is an "instrument-agnostic" model. This means it handles different timbres equally well:

Vocals
Strings: Acoustic and electric guitars, violins, and cellos.
Keyboards: Pianos, organs, and synthesizers.
Winds: Flutes, saxophones, trumpets, and others.

Important: The model is designed for melodic instruments. It is not suitable for drums or percussion, as it focuses on pitch rather than rhythmic noise.

Demo | Description | Model link

9) We added the Bark (Speech Gen) algorithm to the Experimental section. Bark is a transformer-based model created by Suno, representing not just a traditional text-to-speech tool, but a fully generative "text-to-audio" system. Its capabilities go far beyond ordinary voicing: besides creating highly realistic speech in multiple languages, Bark can generate music, background noises, and simple sound effects. A unique feature of the model is its ability to reproduce subtle non-verbal communication, such as laughter, sighs, and crying, making the resulting sound maximally alive and natural.

Demo | Description

In our experiments, it sometimes doesn't follow the text or instructions. See the demo as an example.

10) We added Qwen3-TTS, a powerful speech generation model offering support for voice cloning, voice design, ultra-high-quality human-like speech generation, and natural language-based voice control. At MVSep, we use the largest 1.7 billion parameter model. The model is available in 3 variants:

Qwen3-TTS (Custom Voice) - A model with predefined speakers | Demo
Qwen3-TTS (Voice Design) - A model capable of creating a voice based on a description | Demo
Qwen3-TTS (Voice Cloning) - A model capable of cloning a voice based on a reference audio file | Demo

11) We added the new HeartMuLa algorithm to the site. It is an advanced open-source family of multimodal foundation models (Apache 2.0 license) designed for high-quality music synthesis and audio processing. Unlike proprietary cloud services (such as Suno or Udio), HeartMuLa gives developers the ability to run it locally on their own hardware. The quality of the generated songs is quite good.

Official repository | Demos 1 | Demos 2 | Documentation

Current limitations:
1) The model struggles to follow tags.
2) The model is computationally heavy and uses a lot of VRAM.

Model	Vocals fullness	Vocals bleedless	Vocals SDR	Vocals L1Freq	Instrum fullness	Instrum bleedless	Instrum SDR	Instrum L1Freq
SCNet XL IHF	17.98	28.31	11.11	37.91	28.87	40.37	17.41	38.54
SCNet XL IHF (high instrum fullness by becruily)	22.70	25.48	10.89	38.18	32.31	38.15	17.20	38.43

vocals	instrum	bass	drums	guitar	piano	other
11.30	17.50	14.62	14.11	9.05	7.83	8.71

Algorithm name	Multisong dataset		Synth dataset
Algorithm name	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental
BS Roformer (ver. 2024.08)	11.31	17.62	13.56	13.27
BS Roformer (ver. 2025.07)	11.89	18.20	14.58	14.28

Algorithm name	Multisong dataset					Synth dataset
Algorithm name	SDR Bass	SDR Drums	SDR Other	SDR Vocals	SDR Instrumental	SDR Vocals	SDR Instrumental
SDR average: 13.07 (v. 2024.12.28)	14.14	13.57	8.10	11.61	17.92	14.09	13.79
SDR average: 13.67 (v. 2025.06.30)	14.85	14.33	9.00	11.93	18.23	14.58	14.28

News

March 2026 News

November News

September News

May News

January news

November news

September news

August Updates

July updates

Summer updates

Move to new server

Updates of bass models and ensembles

Updates of vocal models and ensembles

Latest march updates

February and march updates

January updates

Christmas updates

December updates

New model for crowd noise removal

November updates (MDX23C vocal model improvements)

September updates

Summer updates

New MDX23C models for vocal separation

News

Reverb effect removal and other news

MVSep Vocal Model has been added to the site

Experimental MVSep DNR algorithm

July changes on MVSep

April changes on MVSep

February changes on MVSep

Site information

Company

Extra