This project implements the DEMUCS model proposed in "Real Time Speech Enhancement in the Waveform Domain" from scratch in PyTorch. The model is an encoder-decoder architecture with skip connections, and it is optimized in both the time and frequency domains using multiple loss functions. A web interface for the project is available on Hugging Face: you can record your voice in noisy conditions and get a denoised version produced by the DEMUCS model. The project uses the Valentini dataset, a parallel database of clean and noisy speech designed for training and testing speech enhancement methods that operate at 48 kHz. It contains 56 speakers and about 10 GB of speech data. To improve the model further, a larger training set from the DNS Challenge could be used.
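To make the idea of optimizing in both the time and frequency domains concrete, here is a minimal sketch of a combined loss in plain Python. It is only an illustration and not the project's training code: the function names, the frame length, the hop size, and the `spectral_weight` parameter are all assumptions, and a naive single-resolution DFT without windowing stands in for whatever spectral transform the actual implementation uses.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency bins of a naive O(n^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def split_frames(signal, frame_len, hop):
    """Split a signal into overlapping frames, dropping any incomplete tail."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def time_loss(estimate, clean):
    """Time-domain term: mean absolute error between the two waveforms."""
    return sum(abs(e - c) for e, c in zip(estimate, clean)) / len(clean)


def spectral_loss(estimate, clean, frame_len=16, hop=8):
    """Frequency-domain term: mean absolute difference of frame-wise magnitudes."""
    total, count = 0.0, 0
    pairs = zip(split_frames(estimate, frame_len, hop), split_frames(clean, frame_len, hop))
    for est_frame, clean_frame in pairs:
        for m_est, m_clean in zip(dft_magnitudes(est_frame), dft_magnitudes(clean_frame)):
            total += abs(m_est - m_clean)
            count += 1
    return total / count


def combined_loss(estimate, clean, spectral_weight=1.0):
    """Sum of the time-domain and (weighted) frequency-domain terms."""
    if len(estimate) != len(clean) or len(clean) < 16:
        raise ValueError("signals must have equal length of at least one frame (16 samples)")
    return time_loss(estimate, clean) + spectral_weight * spectral_loss(estimate, clean)


# Hypothetical toy signals: a clean sine wave and a copy with alternating-sign noise.
clean = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
noisy = [c + 0.1 * (-1) ** t for t, c in enumerate(clean)]
print(combined_loss(clean, clean))      # identical signals give zero loss
print(combined_loss(noisy, clean) > 0)  # a noisy estimate is penalized
```

In a PyTorch training loop the same structure would be written with tensor operations so that gradients flow through both terms; the plain-Python version is meant only to show how the two terms are combined.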
Link to Github/etc: https://github.com/BorisovMaksim/denoising