✨ SuperVoice Enhance [BETA]

Feel free to join my Discord Server to discuss this model!

A diffusion neural network for enhancing single-speaker speech, based on the Speech Flow architecture. Evaluation notebook.

Important

The network was trained on 5-second intervals. It can process audio of any length, but quality may be slightly reduced.
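If you want each input to stay close to the 5-second training length, one option is to split long recordings into windows and enhance them one at a time. The sketch below only computes the window boundaries. The helper name `chunk_bounds` and the `min_tail_seconds` rule (folding a very short final piece into the previous window) are illustrative assumptions, not part of SuperVoice Enhance, and the default of 24000 assumes the model's 24 kHz sample rate.

```python
def chunk_bounds(num_samples: int, sample_rate: int = 24000,
                 chunk_seconds: float = 5.0,
                 min_tail_seconds: float = 1.0) -> list[tuple[int, int]]:
    """Hypothetical helper: return (start, end) sample indices that cover
    [0, num_samples) in windows of about chunk_seconds each."""
    if num_samples <= 0:
        return []
    chunk = int(round(sample_rate * chunk_seconds))
    if chunk <= 0:
        raise ValueError("chunk_seconds * sample_rate must be positive")
    min_tail = int(round(sample_rate * min_tail_seconds))

    # Fixed-size windows; the last one may be shorter.
    bounds = [(start, min(start + chunk, num_samples))
              for start in range(0, num_samples, chunk)]

    # Assumption: a very short final window is merged into the previous one
    # so that no window is much shorter than the training length.
    if len(bounds) > 1 and bounds[-1][1] - bounds[-1][0] < min_tail:
        tail_end = bounds.pop()[1]
        prev_start, _ = bounds.pop()
        bounds.append((prev_start, tail_end))
    return bounds
```

With the model loaded as shown under Usage, each slice `audio[start:end]` could then be passed to `model.enhance` and the outputs joined with `torch.cat`. This assumes `enhance` returns a waveform with the same number of samples as its input, which is not documented here, and hard cuts between windows may produce audible seams.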

Features

⚡️ Restoring and improving audio
🎤 24 kHz mono audio
🚀 Can work directly with spectrograms for faster processing and tight pipelining
🤹‍♂️ Can work with languages it was not trained on

enhance_demo.mp4

Usage

SuperVoice Enhance consists of multiple networks, but they are all published via Torch Hub and loaded with a single command:

import torch
import torchaudio

# Load model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.hub.load(repo_or_dir='ex3ndr/supervoice-enhance', model='enhance', vocoder=True)  # set vocoder=False if you don't need the vocoder
model.to(device)
model.eval()

# Load audio
def load_mono_audio(path):
    audio, sr = torchaudio.load(path)
    if sr != model.sample_rate:
        audio = torchaudio.transforms.Resample(sr, model.sample_rate)(audio)
        sr = model.sample_rate
    if audio.shape[0] > 1:
        audio = audio.mean(dim=0, keepdim=True)
    return audio[0]
audio = load_mono_audio("./eval/eval_2.wav")

# Enhance
enhanced = model.enhance(waveform=audio, steps=8)  # 8 steps is the recommended default; 32 gives higher quality but sometimes hallucinates
enhanced_spec = model.enhance(waveform=audio, steps=8, vocoder=False)  # Returns a spectrogram without running the vocoder

License

MIT
