This project implements the DEMUCS model proposed in "Real Time Speech Enhancement in the Waveform Domain" from scratch in PyTorch. The model is based on an encoder-decoder architecture with skip connections and is optimized in both the time and frequency domains using multiple loss functions. A web interface for the project is available on Hugging Face: you can record your voice in noisy conditions and get a denoised version produced by the DEMUCS model. The project uses the Valentini dataset, a parallel database of clean and noisy speech designed to train and test speech enhancement methods that operate at 48 kHz. It contains 56 speakers and about 10 GB of speech data. To improve the model further, a larger training set from the DNS Challenge could be used.
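To make the idea of optimizing in both the time and frequency domains concrete, here is a minimal sketch of one way such a combined loss could look in PyTorch. The description above does not say which losses the project uses, so everything here is an assumption for illustration: the function names `stft_magnitude` and `combined_loss`, the L1 waveform term, the spectral-convergence and log-magnitude STFT terms, and the three STFT resolutions are not taken from the repository.

```python
import torch
import torch.nn.functional as F


def stft_magnitude(x: torch.Tensor, n_fft: int, hop: int) -> torch.Tensor:
    """Magnitude spectrogram of waveforms shaped (batch, samples)."""
    window = torch.hann_window(n_fft, device=x.device)
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)
    # Clamp so the logarithm taken later stays finite.
    return spec.abs().clamp(min=1e-7)


def combined_loss(estimate: torch.Tensor, clean: torch.Tensor,
                  resolutions=((512, 128), (1024, 256), (2048, 512))) -> torch.Tensor:
    """Hypothetical time-domain plus multi-resolution STFT loss."""
    # Time domain: mean absolute error between the waveforms.
    time_loss = F.l1_loss(estimate, clean)

    # Frequency domain: average two spectral terms over several STFT sizes.
    freq_loss = estimate.new_zeros(())
    for n_fft, hop in resolutions:
        est_mag = stft_magnitude(estimate, n_fft, hop)
        ref_mag = stft_magnitude(clean, n_fft, hop)
        # Spectral convergence: relative error of the magnitude spectrograms.
        diff_norm = torch.sqrt(torch.sum((ref_mag - est_mag) ** 2))
        ref_norm = torch.sqrt(torch.sum(ref_mag ** 2))
        spectral_convergence = diff_norm / ref_norm
        # L1 distance between log-magnitudes.
        log_magnitude = F.l1_loss(torch.log(est_mag), torch.log(ref_mag))
        freq_loss = freq_loss + spectral_convergence + log_magnitude

    return time_loss + freq_loss / len(resolutions)
```

In a training loop, `estimate` would be the model output for a noisy batch and `clean` the matching clean recordings; calling `combined_loss(estimate, clean).backward()` would then propagate gradients from both domains through the network.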
GitHub repository: https://github.com/BorisovMaksim/denoising