Speech Synthesis

290 papers with code • 4 benchmarks • 19 datasets

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Note that results on these leaderboards are not directly comparable across studies. They are reported as mean opinion scores (MOS), and each study collects its ratings from a different pool of Amazon Mechanical Turk listeners.
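MOS is the mean of listener ratings on a 1 to 5 scale, so its value depends on who is rating. The following is a minimal sketch. It assumes integer ratings and a normal-approximation 95% confidence interval. The two rater pools are hypothetical data, not results from any study.

```python
import math
import statistics


def mos_with_ci(ratings: list[int], z: float = 1.96) -> tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Uses the normal approximation mean ± z * s / sqrt(n), where s is the
    sample standard deviation of the ratings.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mean = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings of the SAME system from two different listener pools.
pool_a = [4, 5, 4, 4, 5, 3, 4, 5]
pool_b = [3, 4, 3, 4, 4, 3, 3, 4]
for name, pool in [("pool A", pool_a), ("pool B", pool_b)]:
    mos, ci = mos_with_ci(pool)
    print(f"{name}: MOS = {mos:.2f} ± {ci:.2f}")
```

In this toy data, pool A gives the system a MOS of 4.25 and pool B gives 3.50. This is why a MOS reported by one study cannot be ranked directly against a MOS from another.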

(Image credit: WaveNet: A Generative Model for Raw Audio)

Benchmarks

These leaderboards are used to track progress in speech synthesis.

Dataset	Best Model
LibriTTS	EVA-GAN-big
North American English	
LJSpeech	BDDM vocoder
Mandarin Chinese	WaveNet (L+F)

Libraries

Use these libraries to find Speech Synthesis models and implementations.

coqui-ai/TTS: 15 papers, 29,239 GitHub stars
PaddlePaddle/PaddleSpeech: 15 papers, 10,142 GitHub stars
TensorSpeech/TensorflowTTS: 6 papers, 3,698 GitHub stars
keonlee9420/Expressive-FastSpeech2: 4 papers, 259 GitHub stars

The full list contains 22 libraries.

Datasets

Subtasks

Speech Synthesis - Tamil

Speech Synthesis - Kannada

Speech Synthesis - Malayalam

Speech Synthesis - Telugu

Speech Synthesis - Assamese

Speech Synthesis - Bengali

Speech Synthesis - Bodo

Speech Synthesis - Gujarati

Speech Synthesis - Hindi

Speech Synthesis - Manipuri

Speech Synthesis - Marathi

Speech Synthesis - Rajasthani

Most implemented papers

WaveNet: A Generative Model for Raw Audio

ibab/tensorflow-wavenet • 12 Sep 2016

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
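WaveNet predicts each audio sample from the samples before it. It does this with stacks of dilated causal convolutions whose dilation doubles from layer to layer and then repeats. The following is a minimal sketch of one such layer and of the receptive-field arithmetic. Three repetitions of dilations 1 to 512 are used as an illustrative configuration. The real model's gated activations and its residual and skip connections are omitted.

```python
def causal_dilated_conv(x: list[float], weights: list[float], dilation: int) -> list[float]:
    """1-D causal convolution: output[t] depends only on x[t], x[t-d], x[t-2d], ...

    weights[0] multiplies the current sample and weights[k] the sample k * dilation
    steps back. Samples before the start of the signal are treated as zeros.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Number of current and past input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Dilations 1, 2, 4, ..., 512, repeated three times (an assumed configuration).
dilations = [2 ** i for i in range(10)] * 3
print(receptive_field(2, dilations))  # 3070
print(causal_dilated_conv([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2))
```

Doubling the dilation makes the receptive field grow exponentially with depth while the cost per layer stays constant. This is what lets the model condition on thousands of past samples.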

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

coqui-ai/TTS • ICLR 2021

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with ground-truth target instead of the simplified output from teacher, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs.
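FastSpeech 2 feeds variation information such as pitch and energy to the decoder as conditional inputs. One way to do this, in the spirit of the paper, is as follows: quantize each frame's value into a fixed set of bins (log-spaced for pitch), then add an embedding of the bin index to the hidden sequence. The following is a minimal sketch of that step. Several details are assumptions: the 256 bins, the 80 to 800 Hz range, the 4-dimensional vectors, and the randomly initialized table that stands in for learned embeddings.

```python
import bisect
import math
import random


def log_bin_boundaries(f_min: float, f_max: float, n_bins: int) -> list[float]:
    """n_bins - 1 boundaries evenly spaced in log frequency between f_min and f_max."""
    lo, hi = math.log(f_min), math.log(f_max)
    step = (hi - lo) / n_bins
    return [math.exp(lo + step * i) for i in range(1, n_bins)]


def quantize(value: float, boundaries: list[float]) -> int:
    """Bin index in [0, len(boundaries)]; values below the first boundary
    (including 0 Hz for unvoiced frames) fall into bin 0."""
    return bisect.bisect_right(boundaries, value)


def add_variance_embedding(hidden: list[list[float]], values: list[float],
                           boundaries: list[float],
                           table: list[list[float]]) -> list[list[float]]:
    """Add the embedding of each frame's bin to that frame's hidden vector."""
    out = []
    for h, v in zip(hidden, values):
        e = table[quantize(v, boundaries)]
        out.append([a + b for a, b in zip(h, e)])
    return out


N_BINS, DIM = 256, 4
boundaries = log_bin_boundaries(80.0, 800.0, N_BINS)
rng = random.Random(0)
# Hypothetical stand-in for a learned embedding table.
table = [[rng.gauss(0.0, 0.1) for _ in range(DIM)] for _ in range(N_BINS)]

hidden = [[0.0] * DIM for _ in range(3)]   # frame-level hidden states
f0 = [110.0, 220.0, 440.0]                 # per-frame pitch in Hz
print([quantize(f, boundaries) for f in f0])
conditioned = add_variance_embedding(hidden, f0, boundaries, table)
```

Energy can follow the same pattern with uniformly spaced rather than log-spaced boundaries.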

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

coqui-ai/TTS • 16 Dec 2017

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text.
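Tacotron 2 predicts a mel spectrogram from text and lets a WaveNet vocoder turn it into audio. A mel spectrogram is a short-time power spectrum pooled through triangular filters spaced evenly on the mel scale. The paper uses 80 mel channels spanning 125 Hz to 7.6 kHz. The following is a minimal sketch of building such a filterbank. It assumes the HTK mel formula (other conventions exist, and librosa's default differs), a 22,050 Hz sample rate, and a 1024-point FFT.

```python
import math


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr: int, n_fft: int, n_mels: int,
                   f_min: float, f_max: float) -> list[list[float]]:
    """Return an n_mels x (n_fft // 2 + 1) matrix of triangular filter weights."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 edge frequencies equally spaced on the mel scale.
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising edge
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(power_spectrum: list[float], bank: list[list[float]],
            floor: float = 1e-5) -> list[float]:
    """Pool one frame's power spectrum into mel bands and log-compress it."""
    return [math.log(max(floor, sum(w * p for w, p in zip(row, power_spectrum))))
            for row in bank]


bank = mel_filterbank(sr=22050, n_fft=1024, n_mels=80, f_min=125.0, f_max=7600.0)
frame = log_mel([1.0] * 513, bank)  # a flat toy spectrum, one value per mel band
```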

Tacotron: Towards End-to-End Speech Synthesis

CorentinJ/Real-Time-Voice-Cloning • 29 Mar 2017

A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.
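The multi-stage design that Tacotron sets out to replace can be pictured as three functions chained together. The following is a deliberately toy sketch in which every rule is a hypothetical stand-in: the digit spelling, the fixed two frames per symbol, the vowels-get-pitch rule, and the sine-wave "vocoder". They exist only to make the hand-offs between the frontend, the acoustic model, and the synthesis module concrete.

```python
import math

SAMPLE_RATE = 8000      # assumed toy sample rate
HOP = 80                # samples generated per acoustic frame
FRAMES_PER_SYMBOL = 2   # toy duration rule
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def text_frontend(text: str) -> list[str]:
    """Text analysis: lowercase, spell out ASCII digits, keep letters and single spaces."""
    pieces = []
    for ch in text.lower():
        if ch in "0123456789":
            pieces.append(" " + DIGIT_WORDS[int(ch)] + " ")
        elif ch.isalpha() or ch == " ":
            pieces.append(ch)
    return list(" ".join("".join(pieces).split()))


def acoustic_model(symbols: list[str]) -> list[float]:
    """Toy acoustic model: one pitch value (Hz) per frame; vowels voiced, others silent."""
    frames = []
    for s in symbols:
        f0 = 220.0 if s in "aeiou" else 0.0
        frames.extend([f0] * FRAMES_PER_SYMBOL)
    return frames


def vocoder(frames: list[float]) -> list[float]:
    """Toy synthesis module: a sine wave at each frame's pitch, silence when f0 is 0."""
    samples = []
    for f0 in frames:
        for _ in range(HOP):
            n = len(samples)
            samples.append(math.sin(2 * math.pi * f0 * n / SAMPLE_RATE))
    return samples


def synthesize(text: str) -> list[float]:
    """Chain the three separately designed stages."""
    return vocoder(acoustic_model(text_frontend(text)))


wave = synthesize("Hi 2")
```

Each stage is built and tuned separately, so errors made early in the chain carry into later stages. Tacotron instead maps characters to spectrogram frames with a single network trained end to end.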

FastSpeech: Fast, Robust and Controllable Text to Speech

coqui-ai/TTS • NeurIPS 2019

In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS.
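To generate all mel-spectrogram frames in parallel, FastSpeech first stretches the phoneme-level encoder output to frame level. Its length regulator repeats each phoneme's hidden state according to the phoneme's duration. A scale factor alpha multiplies those durations to make speech slower or faster. The following is a minimal sketch, assuming integer frame durations and half-up rounding after scaling.

```python
import math


def length_regulate(hidden: list[list[float]], durations: list[int],
                    alpha: float = 1.0) -> list[list[float]]:
    """Expand phoneme-level vectors to frame level.

    Each hidden[i] is repeated round(durations[i] * alpha) times; alpha > 1
    lengthens the output (slower speech), alpha < 1 shortens it (faster speech).
    """
    if len(hidden) != len(durations):
        raise ValueError("need one duration per phoneme")
    frames = []
    for vec, d in zip(hidden, durations):
        repeats = max(0, math.floor(d * alpha + 0.5))  # round half up
        frames.extend(list(vec) for _ in range(repeats))
    return frames


phonemes = [[1.0], [2.0], [3.0]]   # toy 1-D hidden states
durations = [2, 1, 3]
print(length_regulate(phonemes, durations))
# [[1.0], [1.0], [2.0], [3.0], [3.0], [3.0]]
print(len(length_regulate(phonemes, durations, alpha=0.5)))  # 4
```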

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

descriptinc/melgan-neurips • NeurIPS 2019

In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques.
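MelGAN's training objective combines two pieces. The first is the hinge form of the GAN loss. The second is a feature-matching term: an L1 distance between discriminator feature maps for real and generated audio. The following is a minimal sketch of that loss arithmetic on precomputed discriminator outputs. Several details are assumptions: the weight lam=10 on the feature-matching term, the collapse of the multi-scale discriminators into one, and all score and feature values.

```python
def mean(values: list[float]) -> float:
    return sum(values) / len(values)


def discriminator_hinge_loss(real_scores: list[float], fake_scores: list[float]) -> float:
    """Hinge loss: penalize real scores below +1 and fake scores above -1."""
    return (mean([max(0.0, 1.0 - s) for s in real_scores])
            + mean([max(0.0, 1.0 + s) for s in fake_scores]))


def generator_adversarial_loss(fake_scores: list[float]) -> float:
    """Generator term: raise the discriminator's score on generated audio."""
    return -mean(fake_scores)


def feature_matching_loss(real_feats: list[list[float]],
                          fake_feats: list[list[float]]) -> float:
    """Sum over discriminator layers of the mean absolute difference between feature maps."""
    return sum(mean([abs(r - f) for r, f in zip(real_layer, fake_layer)])
               for real_layer, fake_layer in zip(real_feats, fake_feats))


def generator_loss(fake_scores: list[float], real_feats: list[list[float]],
                   fake_feats: list[list[float]], lam: float = 10.0) -> float:
    return (generator_adversarial_loss(fake_scores)
            + lam * feature_matching_loss(real_feats, fake_feats))


# Hypothetical discriminator outputs for one batch.
real_scores, fake_scores = [1.4, 0.6], [-0.2, 0.3]
real_feats = [[0.5, 0.1, -0.3], [1.0, 2.0]]   # two layers of features for real audio
fake_feats = [[0.4, 0.1, -0.1], [1.5, 1.0]]   # the same layers for generated audio
print(discriminator_hinge_loss(real_scores, fake_scores))
print(generator_loss(fake_scores, real_feats, fake_feats))
```

The feature-matching term gives the generator a dense, stable training signal in addition to the single adversarial score.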

Efficient Neural Audio Synthesis

CorentinJ/Real-Time-Voice-Cloning • ICML 2018

The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile CPU in real time.
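The Sparse WaveRNN reaches its small weight count by pruning during training. The lowest-magnitude weights are masked to zero, and the fraction removed is raised gradually toward a target. The following is a minimal sketch of element-wise magnitude pruning. The cubic ramp follows the gradual-pruning schedule of Zhu and Gupta and is assumed here, as are the 90% target and the step counts. The paper's block-structured sparsity is not modeled.

```python
def target_sparsity(step: int, final: float, start: int, duration: int) -> float:
    """Cubic ramp: 0 before `start`, `final` from `start + duration` onward."""
    if step < start:
        return 0.0
    if step >= start + duration:
        return final
    progress = (step - start) / duration
    return final * (1.0 - (1.0 - progress) ** 3)


def magnitude_mask(weights: list[float], sparsity: float) -> list[int]:
    """0/1 mask that zeros the floor(sparsity * n) smallest-magnitude weights."""
    k = int(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:k]:
        mask[i] = 0
    return mask


weights = [0.1, -3.0, 0.5, -0.05, 2.2, -0.7, 0.02, 1.1]   # hypothetical weights
for step in (0, 1000, 3000, 6000):
    s = target_sparsity(step, final=0.9, start=1000, duration=4000)
    mask = magnitude_mask(weights, s)
    pruned = [w * m for w, m in zip(weights, mask)]
    print(step, round(s, 3), pruned)
```

In training, the mask would be recomputed at intervals and applied after each weight update so that pruned weights stay at zero.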

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

coqui-ai/TTS • 25 Oct 2019

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network.
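The "multi-resolution spectrogram" in the title refers to an auxiliary loss. For several STFT settings, it computes a spectral convergence term plus a mean absolute log-magnitude difference, then averages over the settings. The following is a minimal pure-Python sketch that uses a naive DFT, so it only suits short signals. The three small (FFT size, hop, window length) settings and the test signals are assumptions chosen to keep it fast; they are not the paper's configuration.

```python
import cmath
import math

EPS = 1e-7  # floor that keeps log() and division finite


def stft_magnitude(x: list[float], n_fft: int, hop: int, win_len: int) -> list[list[float]]:
    """Magnitude STFT with a periodic Hann window and a naive DFT (O(n_fft^2) per frame)."""
    if win_len > n_fft or len(x) < win_len:
        raise ValueError("need win_len <= n_fft and len(x) >= win_len")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(win_len)] + [0.0] * (n_fft - win_len)
        frames.append([abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                               for n in range(n_fft)))
                       for k in range(n_fft // 2 + 1)])
    return frames


def stft_loss(target: list[float], generated: list[float],
              n_fft: int, hop: int, win_len: int) -> float:
    """Spectral convergence plus mean absolute log-magnitude difference."""
    if len(target) != len(generated):
        raise ValueError("signals must have equal length")
    s_t = [v for f in stft_magnitude(target, n_fft, hop, win_len) for v in f]
    s_g = [v for f in stft_magnitude(generated, n_fft, hop, win_len) for v in f]
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(s_t, s_g)))
    norm = math.sqrt(sum(a * a for a in s_t))
    spectral_convergence = diff / max(norm, EPS)
    log_mag = sum(abs(math.log(max(a, EPS)) - math.log(max(b, EPS)))
                  for a, b in zip(s_t, s_g)) / len(s_t)
    return spectral_convergence + log_mag


def multi_resolution_stft_loss(target: list[float], generated: list[float],
                               resolutions: list[tuple[int, int, int]]) -> float:
    """Average stft_loss over a list of (n_fft, hop, win_len) settings."""
    return sum(stft_loss(target, generated, *r) for r in resolutions) / len(resolutions)


RESOLUTIONS = [(16, 4, 16), (32, 8, 32), (64, 16, 64)]
sr = 8000
clean = [math.sin(2 * math.pi * 440 * n / sr) for n in range(256)]
noisy = [s + 0.1 * math.sin(2 * math.pi * 3000 * n / sr) for n, s in enumerate(clean)]
print(multi_resolution_stft_loss(clean, clean, RESOLUTIONS))  # 0.0
print(multi_resolution_stft_loss(clean, noisy, RESOLUTIONS))
```

Each resolution trades time detail against frequency detail. Averaging over several resolutions discourages the generator from overfitting to any single one.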

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

PaddlePaddle/PaddleSpeech • ICML 2018

In this work, we propose "global style tokens" (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system.
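In a GST layer, a reference embedding computed from an audio clip attends over the bank of token embeddings. The attention-weighted sum of the tokens becomes the style embedding that conditions the synthesizer. The following is a minimal single-head sketch. It assumes plain scaled dot-product attention without learned multi-head projections, and tanh applied to the tokens. The three 2-D tokens and the reference vector are hypothetical values.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_tokens(weights: list[float], tokens: list[list[float]]) -> list[float]:
    """Weighted sum of token vectors."""
    dim = len(tokens[0])
    return [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(dim)]


def style_embedding(reference: list[float],
                    bank: list[list[float]]) -> tuple[list[float], list[float]]:
    """Attend from a reference embedding over the style-token bank.

    Returns (attention weights, style embedding).
    """
    tokens = [[math.tanh(v) for v in t] for t in bank]
    scale = math.sqrt(len(reference))
    scores = [sum(r * k for r, k in zip(reference, t)) / scale for t in tokens]
    weights = softmax(scores)
    return weights, mix_tokens(weights, tokens)


bank = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # hypothetical learned tokens
weights, style = style_embedding([2.0, 0.5], bank)
# Control without reference audio: choose the weights by hand, e.g. all on token 1.
manual_style = mix_tokens([0.0, 1.0, 0.0], [[math.tanh(v) for v in t] for t in bank])
```

The same mixing function serves both transfer, with weights derived from a reference clip, and control, with weights chosen directly.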

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

CorentinJ/Real-Time-Voice-Cloning • NeurIPS 2018

Clone a voice in 5 seconds to generate arbitrary speech in real-time
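Cloning a voice from a few seconds of audio relies on a fixed-size speaker embedding from an encoder trained for speaker verification. The synthesizer is conditioned on that embedding. The following is a minimal sketch of the bookkeeping under these assumptions: per-window embeddings of the reference clip are L2-normalized, averaged, and renormalized, and the result is concatenated to every encoder time step of the synthesizer. The window embeddings and encoder outputs are hypothetical values, not real encoder outputs.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def utterance_embedding(windows: list[list[float]]) -> list[float]:
    """Average L2-normalized window embeddings, then renormalize to unit length."""
    unit = [l2_normalize(w) for w in windows]
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    return l2_normalize(mean)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def condition_encoder(encoder_out: list[list[float]],
                      speaker: list[float]) -> list[list[float]]:
    """Concatenate the speaker embedding to every encoder time step."""
    return [frame + speaker for frame in encoder_out]


speaker_a = utterance_embedding([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])
speaker_b = utterance_embedding([[0.0, 0.2, 0.9], [0.1, 0.1, 1.0]])
print(cosine_similarity(speaker_a, speaker_a), cosine_similarity(speaker_a, speaker_b))
conditioned = condition_encoder([[0.5, -0.5], [0.3, 0.7]], speaker_a)
```

Because only the embedding changes between speakers, a new voice needs no retraining. Only a short reference clip must be embedded.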
