Speech Enhancement
218 papers with code • 12 benchmarks • 19 datasets
Speech Enhancement is a signal processing task that improves the quality of speech signals captured under noisy or degraded conditions. The goal is to make speech clearer, more intelligible, and more pleasant to listen to, for applications such as voice recognition, teleconferencing, and hearing aids.
(Image credit: A Fully Convolutional Neural Network for Speech Enhancement)
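As a rough illustration of what an enhancement system does, the sketch below applies classical spectral subtraction: the noise spectrum is estimated from the first few frames of the noisy signal and subtracted from its magnitude spectrogram before resynthesis. Function and parameter names are illustrative and not taken from any paper listed below.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs=16000, n_fft=512, noise_frames=10, floor=0.05):
    """Enhance a mono waveform by simple spectral subtraction (classical baseline)."""
    _, _, Z = stft(noisy, fs=fs, nperseg=n_fft)            # complex STFT: (freq, time)
    mag, phase = np.abs(Z), np.angle(Z)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)  # noise estimate per bin
    gain = np.maximum(1.0 - noise_mag / np.maximum(mag, 1e-8), floor)
    enhanced_mag = gain * mag                               # attenuate noise-dominated bins
    _, clean = istft(enhanced_mag * np.exp(1j * phase), fs=fs, nperseg=n_fft)
    return clean
```

Most of the papers below replace this hand-designed gain with a learned model, operating either on the spectrogram or directly on the waveform.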
Most implemented papers
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
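A minimal sketch of the clipped surrogate objective the abstract refers to; variable names are illustrative and not taken from the paper's reference implementation.

```python
import torch

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective (to be maximized; returned negated as a loss)."""
    ratio = torch.exp(new_logp - old_logp)                  # pi_new / pi_old per action
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()            # negate for gradient descent
```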
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
We consider image transformation problems, where an input image is transformed into an output image.
SEGAN: Speech Enhancement Generative Adversarial Network
In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them.
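A toy raw-waveform encoder-decoder generator in this spirit; the layer widths are made up, and the skip connections and adversarial discriminator of SEGAN itself are omitted for brevity.

```python
import torch
import torch.nn as nn

class WaveGenerator(nn.Module):
    """Illustrative waveform-in / waveform-out generator with strided 1-D convolutions."""
    def __init__(self, widths=(1, 16, 32, 64)):
        super().__init__()
        self.encoder = nn.ModuleList(
            [nn.Conv1d(cin, cout, kernel_size=31, stride=2, padding=15)
             for cin, cout in zip(widths[:-1], widths[1:])])
        rev = widths[::-1]
        self.decoder = nn.ModuleList(
            [nn.ConvTranspose1d(cin, cout, kernel_size=32, stride=2, padding=15)
             for cin, cout in zip(rev[:-1], rev[1:])])
        self.act = nn.PReLU()

    def forward(self, x):                        # x: (batch, 1, samples), samples % 8 == 0
        for conv in self.encoder:
            x = self.act(conv(x))                # downsample the waveform by 2 per layer
        for deconv in self.decoder[:-1]:
            x = self.act(deconv(x))              # upsample back toward the waveform rate
        return torch.tanh(self.decoder[-1](x))   # enhanced waveform in [-1, 1]
```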
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms.
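A minimal sketch of the time-domain masking recipe this contrasts with spectrogram-based methods: a learned 1-D conv encoder replaces the STFT, a mask is applied to the learned representation, and a transposed convolution decodes back to a waveform. The separator here is a placeholder, not the paper's temporal convolutional network, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class TimeDomainMasker(nn.Module):
    """Learned encoder -> mask -> decoder, all in the time domain."""
    def __init__(self, filters=256, kernel=16, stride=8):
        super().__init__()
        self.encoder = nn.Conv1d(1, filters, kernel, stride=stride, bias=False)
        self.separator = nn.Sequential(
            nn.Conv1d(filters, filters, 1), nn.ReLU(),
            nn.Conv1d(filters, filters, 1), nn.Sigmoid())    # mask in (0, 1)
        self.decoder = nn.ConvTranspose1d(filters, 1, kernel, stride=stride, bias=False)

    def forward(self, mix):                 # mix: (batch, 1, samples)
        rep = self.encoder(mix)             # learned basis coefficients, no magnitude/phase split
        mask = self.separator(rep)
        # output length matches the input when (samples - kernel) is a multiple of stride
        return self.decoder(rep * mask)
```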
Phase-aware Speech Enhancement with Deep Complex U-Net
Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of spectrogram while reusing the phase from noisy speech for reconstruction.
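The two masking strategies being contrasted, written out as a pair of hypothetical helpers: a real-valued mask that reuses the noisy phase versus a complex-valued mask that can also rotate it.

```python
import torch

def magnitude_mask(noisy_stft, mask):
    """Common baseline: scale the magnitude, keep the noisy phase unchanged."""
    return torch.polar(mask * noisy_stft.abs(), noisy_stft.angle())

def complex_mask(noisy_stft, mask_real, mask_imag):
    """Phase-aware alternative: a complex mask scales and rotates each bin,
    so the network can also correct the phase (mask values are illustrative)."""
    return torch.complex(mask_real, mask_imag) * noisy_stft
```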
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality.
A Fully Convolutional Neural Network for Speech Enhancement
In hearing aids, the presence of babble noise degrades hearing intelligibility of human speech greatly.
FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement
In our proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate these two types of models' advantages.
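A rough sketch of that fusion, assuming a magnitude spectrogram input: a full-band model sees the whole spectrum of each frame, and its output is concatenated with each frequency's local neighbourhood before a shared sub-band model predicts a per-bin mask. The recurrent layers of the actual FullSubNet are replaced with linear ones here, and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullSubFusion(nn.Module):
    """Full-band embedding followed by a shared sub-band mask estimator."""
    def __init__(self, n_freq=257, context=2):
        super().__init__()
        self.context = context
        self.full_band = nn.Sequential(nn.Linear(n_freq, n_freq), nn.ReLU())
        self.sub_band = nn.Sequential(nn.Linear(2 * context + 2, 32), nn.ReLU(),
                                      nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, mag):                        # mag: (batch, frames, n_freq)
        fb = self.full_band(mag)                   # full-band embedding per frame
        pad = F.pad(mag, (self.context, self.context), mode="reflect")
        neigh = pad.unfold(2, 2 * self.context + 1, 1)   # (batch, frames, n_freq, 2c+1)
        sub_in = torch.cat([neigh, fb.unsqueeze(-1)], dim=-1)
        return self.sub_band(sub_in).squeeze(-1)   # per-bin mask, (batch, frames, n_freq)
```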
MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement
Adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize evaluation metrics of a target task, and thus, may not always guide the generator in a GAN to generate data with improved metric scores.
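A sketch of the resulting loss pair, assuming the target metric is normalized so that clean speech scores 1.0; the variable names and squared-error form are illustrative.

```python
import torch

def metricgan_losses(d_enhanced, d_clean, metric_enhanced, target_score=1.0):
    """Discriminator D is regressed onto a black-box metric (e.g. a normalized PESQ);
    the generator is trained, through d_enhanced, to push D's prediction toward the best score."""
    d_loss = (d_enhanced - metric_enhanced).pow(2).mean() + (d_clean - 1.0).pow(2).mean()
    g_loss = (d_enhanced - target_score).pow(2).mean()
    return d_loss, g_loss
```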
SoundStream: An End-to-End Neural Audio Codec
We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs.
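A minimal residual vector quantization sketch, the kind of bottleneck such neural codecs place between encoder and decoder: each stage quantizes what the previous stages left unexplained, and the chosen indices form the bitstream. Shapes and the nearest-neighbour search are illustrative.

```python
import torch

def residual_vq(latents, codebooks):
    """latents: (frames, dim); codebooks: list of (codebook_size, dim) tensors."""
    residual = latents
    quantized = torch.zeros_like(latents)
    indices = []
    for codebook in codebooks:
        dists = torch.cdist(residual, codebook)    # (frames, codebook_size)
        idx = dists.argmin(dim=-1)
        chosen = codebook[idx]                     # nearest code vector per frame
        quantized = quantized + chosen
        residual = residual - chosen
        indices.append(idx)                        # transmitted as part of the bitstream
    return quantized, indices
```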