Voice Conversion
149 papers with code • 2 benchmarks • 5 datasets
Voice conversion (VC) modifies the speech of a source speaker so that it sounds like that of a target speaker, without changing the linguistic content.
Most implemented papers
StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks
This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN.
One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
Recently, voice conversion (VC) without parallel data has been successfully adapted to the multi-target scenario, in which a single model is trained to convert the input voice to many different speakers.
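The instance normalization named in the title above can be illustrated with a minimal sketch. It assumes features arranged as channels by frames and normalizes each channel over time, which removes per-channel global offset and scale. The function name and the toy data are hypothetical, and the paper's actual network is not reproduced here.

```python
import math

def instance_norm(features, eps=1e-5):
    """Normalize each channel (row) of a channels x frames list over the time axis."""
    normalized = []
    for channel in features:
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        scale = math.sqrt(var + eps)
        normalized.append([(v - mean) / scale for v in channel])
    return normalized

# Hypothetical 2-channel feature map: both channels follow the same trajectory
# over time but differ in global offset and scale.
features = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
normed = instance_norm(features)
```

After normalization the two channels are nearly identical: the time-varying pattern is kept, while the global statistics are discarded.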
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
CVAE training is simple but does not come with the distribution-matching property of a GAN.
Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
A subjective evaluation showed that the quality of the converted speech was comparable to that obtained with a Gaussian mixture model-based method trained under advantageous conditions, with parallel data and twice the amount of data.
CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion
Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data.
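The cycle-consistency idea behind the CycleGAN-based entries above can be sketched in a few lines. This is an illustration of the cycle-consistency term only, assuming an L1 distance and toy feature vectors. The mapping functions `g_xy` and `g_yx` are hypothetical stand-ins for learned generators, and the adversarial and other losses used in practice are omitted.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, y, g_xy, g_yx):
    """Penalize failure to recover x after X->Y->X and y after Y->X->Y."""
    forward = l1_distance(g_yx(g_xy(x)), x)
    backward = l1_distance(g_xy(g_yx(y)), y)
    return forward + backward

# Hypothetical toy mappings standing in for learned source->target generators.
def g_xy(features):
    return [2.0 * v + 1.0 for v in features]

def exact_inverse(features):
    return [(v - 1.0) / 2.0 for v in features]

def poor_inverse(features):
    return [v / 2.0 for v in features]

x = [0.0, 1.0, 2.0]  # unpaired source-domain features
y = [1.0, 3.0, 5.0]  # unpaired target-domain features

loss_exact = cycle_consistency_loss(x, y, g_xy, exact_inverse)
loss_poor = cycle_consistency_loss(x, y, g_xy, poor_inverse)
```

No source/target pairing is needed to compute this term: each sample is compared only with its own round-trip reconstruction, which is why it suits non-parallel data.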
MOSNet: Deep Learning based Objective Assessment for Voice Conversion
In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.
Unsupervised Speech Decomposition via Triple Information Bottleneck
Speech information can be roughly decomposed into four components: language content, timbre, pitch, and rhythm.
Utilizing Self-supervised Representations for MOS Prediction
In this paper, we use self-supervised pre-trained models for MOS prediction.
Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning
To explore this issue, we propose employing Mockingjay, a self-supervised learning-based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario.
Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder
We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora.