dns-challenge's Introduction

ICASSP 2023 Deep Noise Suppression Challenge

Website: https://aka.ms/dns-challenge
Git Repo: https://github.com/microsoft/DNS-Challenge
Challenge Paper:

Important features of this challenge

  1. Along with noise suppression, it includes de-reverberation and suppression of interfering talkers for headset and speakerphone scenarios.
  2. The challenge has two tracks: (i) Headset (wired/wireless headphones, earbuds such as AirPods, etc.) speech enhancement; (ii) Non-headset (speakerphone, built-in mic in a laptop/desktop/mobile phone/other meeting device, etc.) speech enhancement.
  3. This challenge adopts the ITU-T P.835 subjective test framework to measure speech quality (SIG), background noise quality (BAK), and overall audio quality (OVRL). We modified ITU-T P.835 to make it reliable for test clips with interfering (undesired neighboring) talkers. Along with P.835 scores, Word Accuracy (WAcc) is used to measure model performance.
  4. Please NOTE that the intellectual property (IP) is not transferred to the challenge organizers, i.e., if code is shared/submitted, the participants remain the owners of their code (when the code is made publicly available, an appropriate license should be added).
  5. There are new requirements for model-related latency. Please check all requirements listed at https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2023/

Baseline Speaker Embeddings

This challenge adopts the pretrained ECAPA-TDNN model from SpeechBrain as the baseline speaker-embedding model, available at https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb. Participants can use any other publicly available speaker-embedding model or develop their own speaker-embedding extractor. Participants are encouraged to explore the RawNet3 models available at https://github.com/jungjee/RawNet

The previous DNS Challenge used RawNet2 speaker embeddings. So far, the impact of different speaker embeddings on personalized speech enhancement has not been studied in sufficient depth.

Install SpeechBrain with the command below:

pip install speechbrain

Compute speaker embeddings for your wav file with the commands below:

import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
signal, fs = torchaudio.load('tests/samples/ASR/spk1_snt1.wav')
embeddings = classifier.encode_batch(signal)
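
For the personalized tracks, these embeddings are typically compared between an enrollment clip and a test clip. Below is a minimal sketch of such a comparison; the enrollment file name and the cosine-similarity scoring are illustrative, not part of an official pipeline:

import torch.nn.functional as F

# Hypothetical 30 s enrollment clip for the target speaker
enroll_signal, fs = torchaudio.load('enrollment.wav')
enroll_emb = classifier.encode_batch(enroll_signal)

# Higher cosine similarity suggests the test clip matches the enrolled speaker
score = F.cosine_similarity(embeddings.squeeze(), enroll_emb.squeeze(), dim=0)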

In this repository

This repository contains the datasets and scripts required for the 5th DNS Challenge at ICASSP 2023, aka DNS Challenge 5, or simply DNS5. For more details about the challenge, please see our website and paper. For more details on the testing framework, please visit P.835.

Details

  • The datasets_fullband folder is a placeholder for the datasets. That is, our data downloader script by default will place the downloaded audio data there. After the download, it will contain clean speech, noise, and room impulse responses required for creating the training data.

  • The Baseline directory contains the enhanced clips from the dev testset for both tracks.

  • download-dns-challenge-5-headset-training.sh - this is the script to download the data for the headset track (Track 1). By default, the data will be placed into the ./datasets_fullband/ folder. Please take a look at the script and uncomment the preferred download method. Unmodified, the script performs a dry run and retrieves only the HTTP headers for each archive.

  • download-dns-challenge-5-speakerphone-training.sh - this is the script to download the data for the speakerphone track (Track 2).

  • noisyspeech_synthesizer_singleprocess.py - is used to synthesize noisy-clean speech pairs for training purposes.

  • noisyspeech_synthesizer.cfg - is the configuration file used to synthesize the data. Users are required to accurately specify different parameters and provide the right paths to the datasets required to synthesize noisy speech.

  • audiolib.py - contains modules required to synthesize datasets.

  • utils.py - contains some utility functions required to synthesize the data.

  • unit_tests_synthesizer.py - contains the unit tests to ensure sanity of the data.

  • requirements.txt - contains all the libraries required for synthesizing the data.

Datasets

V5_dev_testset: directory containing the dev testsets for both tracks. Each test clip is 10 s long, and the corresponding enrollment clips are 30 s long.

BLIND testset:

WAcc script

https://github.com/microsoft/DNS-Challenge/tree/master/WAcc

WAcc ground-truth transcripts

Dev testset: available only for the speakerphone track; see the V5_dev_testset directory. For the headset track, we are providing the ASR output and the list of prompts read during recording of the test clips. Participants can help correct the ASR output to generate the ground-truth transcripts. Blind testset:

Data info

The default directory structure and the sizes of the datasets of the 5th DNS Challenge are:

datasets_fullband 
+-- dev_testset 
+-- impulse_responses 5.9G
+-- noise_fullband 58G
\-- clean_fullband 827G
    +-- emotional_speech 2.4G
    +-- french_speech 62G
    +-- german_speech 319G
    +-- italian_speech 42G
    +-- read_speech 299G
    +-- russian_speech 12G
    +-- spanish_speech 65G
    +-- vctk_wav48_silence_trimmed 27G
    \-- VocalSet_48kHz_mono 974M

In all, you will need about 1TB to store the unpacked data. Archived, the same data takes about 550GB total.

Headset DNS track

Data checksums

A CSV file containing file sizes and SHA1 checksums for audio clips in both Real-time and Personalized DNS datasets is available at: dns5-datasets-files-sha1.csv.bz2. The archive is 41.3MB in size and can be read in Python like this:

import pandas as pd

sha1sums = pd.read_csv("dns5-datasets-files-sha1.csv.bz2", names=["size", "sha1", "path"])
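
As a sketch, the downloaded files can then be verified with hashlib. This assumes the path column is relative to the dataset root directory (an assumption; adjust to your local layout):

import hashlib
import os

def verify_file(dataset_root, row):
    # Compare the SHA1 of the local file against the checksum listed in the CSV
    full_path = os.path.join(dataset_root, row.path)
    with open(full_path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest() == row.sha1

bad_files = [r.path for r in sha1sums.itertuples() if not verify_file("datasets_fullband", r)]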

Code prerequisites

  • Python 3.6 and above
  • Python libraries: soundfile, librosa

NOTE: git LFS is no longer required for DNS Challenge. Please use the download-dns-challenge-5*.sh scripts in this repo to download the data.

Usage:

  1. Install Python libraries:
pip3 install soundfile librosa
  2. Clone the repository:
git clone https://github.com/microsoft/DNS-Challenge
  3. Edit noisyspeech_synthesizer.cfg to specify the required parameters described in the file and include the paths to the clean speech, noise, and impulse response CSV files. Also specify the paths to the destination directories and where to store the logs. A sketch of the relevant fields follows below.

  4. Create the dataset:

python3 noisyspeech_synthesizer_singleprocess.py
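
For orientation, here is a minimal sketch of a noisyspeech_synthesizer.cfg fragment. It uses only parameter names that appear in this README and in the issues below; all values shown are illustrative placeholders, and the shipped noisyspeech_synthesizer.cfg remains the authoritative reference for names and defaults:

# Sketch only -- values are placeholders, not the shipped defaults
fileindex_start: None       # set an integer range to synthesize a subset of files
fileindex_end: None
total_hours: 500            # used to derive the number of files when no index range is given
audio_length: 30            # seconds per synthesized clip
lower_t60: 0.3              # T60 range used when convolving RIRs
upper_t60: 1.3
target_level_lower: -35     # output level normalization range (dBFS)
target_level_upper: -15
rir_choice: 3               # 1: real RIRs only, 2: synthetic only, 3 (default): both
use_singing_data: 1         # include singing voice in the clean speech pool
singing_choice: 3           # 1: male, 2: female, 3: both
speech_csv: None            # optional CSV listing the clean files to use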

Citation:

If you use this dataset in a publication, please cite the following paper:

@inproceedings{dubey2023icassp,
  title={ICASSP 2023 Deep Noise Suppression Challenge},
  author={Dubey, Harishchandra and Aazami, Ashkan and Gopal, Vishak and Naderi, Babak and Braun, Sebastian and Cutler, Ross and Gamper, Hannes and Golestaneh, Mehrsa and Aichner, Robert},
  booktitle={ICASSP},
  year={2023}
}

The previous challenges were:

@inproceedings{dubey2022icassp,
  title={ICASSP 2022 Deep Noise Suppression Challenge},
  author={Dubey, Harishchandra and Gopal, Vishak and Cutler, Ross and Matusevych, Sergiy and Braun, Sebastian and Eskimez, Emre Sefik and Thakker, Manthan and Yoshioka, Takuya and Gamper, Hannes and Aichner, Robert},
  booktitle={ICASSP},
  year={2022}
}

@inproceedings{reddy2021interspeech,
  title={INTERSPEECH 2021 Deep Noise Suppression Challenge},
  author={Reddy, Chandan KA and Dubey, Harishchandra and Koishida, Kazuhito and Nair, Arun and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram},
  booktitle={INTERSPEECH},
  year={2021}
}
@inproceedings{reddy2021icassp,
  title={ICASSP 2021 deep noise suppression challenge},
  author={Reddy, Chandan KA and Dubey, Harishchandra and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram},
  booktitle={ICASSP},
  year={2021},
}
@inproceedings{reddy2020interspeech,
  title={The INTERSPEECH 2020 deep noise suppression challenge: Datasets, subjective testing framework, and challenge results},
  author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross and Beyrami, Ebrahim and Cheng, Roger and Dubey, Harishchandra and Matusevych, Sergiy and Aichner, Robert and Aazami, Ashkan and Braun, Sebastian and others},
  booktitle={INTERSPEECH},
  year={2020}
}

The baseline NSNet noise suppression:

@inproceedings{9054254,
    author={Y. {Xia} and S. {Braun} and C. K. A. {Reddy} and H. {Dubey} and R. {Cutler} and I. {Tashev}},
    booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
    title={Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement},
    year={2020},
    pages={871-875}
}
@misc{braun2020data,
    title={Data augmentation and loss normalization for deep noise suppression},
    author={Sebastian Braun and Ivan Tashev},
    year={2020},
    eprint={2008.06412},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

The P.835 test framework:

@inproceedings{naderi2021crowdsourcing,
  title={Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing},
  author={Naderi, Babak and Cutler, Ross},
  booktitle={INTERSPEECH},
  year={2021}
}

DNSMOS API:

@inproceedings{reddy2021dnsmos,
  title={DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors},
  author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
  booktitle={ICASSP},
  year={2021}
}
@inproceedings{reddy2022dnsmos,
  title={DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors},
  author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
  booktitle={ICASSP},
  year={2022}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.

Dataset licenses

MICROSOFT PROVIDES THE DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

The datasets are provided under the original terms under which Microsoft received them. See below for more information about each dataset.

The datasets used in this project are licensed as follows:

  1. Clean speech:
  2. Noise:
  3. RIR datasets: OpenSLR26 and OpenSLR28:

Code license

MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

dns-challenge's People

Contributors

ashaazami, chandanka90, ddjanke, hadubey, hdubey, microsoftopensource, motus, mpariente, rocheng, rosscutler, vishakg


dns-challenge's Issues

LibriVox dataset

Could you specify the reader and book names for the LibriVox samples in the clean dataset? I would like to use them, but at a higher sampling rate (so I have to download them again from LibriVox).

download data

Hi, I have tried the method from the asteroid repo to install git-lfs and download the dataset, but it's very big, and I don't know what to do if the download is interrupted by internet connection problems: do I have to start over from the first file, or can I resume from the break point and keep the files I have already finished? Thanks!

datasets size

Thanks for sharing these datasets.

I have already downloaded 140GB, but the .git folder takes about 70GB. Could you tell me how much disk space I should prepare before running git clone?

Thank you so much.

Small issue: file path error

my_rir = os.path.normpath(os.path.join('datasets\impulse_responses', params['myrir'][rir_index]))

In 'datasets\impulse_responses', the backslash has to be flipped,
so 'datasets/impulse_responses' is the correct one.

Otherwise a FileNotFoundError occurs:

FileNotFoundError: [Errno 2] No such file or directory: 'datasets\impulse_responses/SLR28/RIRS_NOISES/simulated_rirs/mediumroom/Room059/Room059-00049.wav'
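
A portable fix is to let os.path.join supply the separator instead of hard-coding a backslash; a sketch of the corrected line:

import os

# Build the path from components so it works on both Windows and Linux
my_rir = os.path.normpath(os.path.join('datasets', 'impulse_responses', params['myrir'][rir_index]))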

Noise files from DEMAND

Hello,

I'm curious how to go about identifying which files in the noise dataset are pulled from the DEMAND dataset. I'd like to use both the DNS21 noise and DEMAND datasets for model training, but am hoping to exclude the DEMAND files from the DNS set. This was pretty straightforward to do for Freesound, but DEMAND is less clear.

Thank you!
Amie

Do we need to do dereverberation?

Hello, thank you for this challenge.
In the 'datasets/test set/synthetic/with reverb/' folder, the clean speech still contains reverberation.
So I think the aim of this challenge is to remove the noise, but not the reverberation. Right?

Request: labels for the datasets

Hi,

Can you supply the labels associated with the audio? The README gives the impression that you have them, but they are nowhere to be found in the repo. It would help a good deal to combat bias. For starters, we could subsample classes that are overrepresented.

code for training

Hi,
Is it possible to get the NSNet code files that were used for training the network, not only the inference code?

#bug The wav files seem all broken.

I used the command from the README.md with noisyspeech_synthesizer_singleprocess.py to synthesize the data.
But there is a problem:

Traceback (most recent call last):
File "noisyspeech_synthesizer_singleprocess.py", line 361, in
main_body()
File "noisyspeech_synthesizer_singleprocess.py", line 333, in main_body
noise_source_files, noise_clipped_files, noise_low_activity_files = main_gen(params)
File "noisyspeech_synthesizer_singleprocess.py", line 164, in main_gen
gen_audio(True, params, clean_index)
File "noisyspeech_synthesizer_singleprocess.py", line 119, in gen_audio
build_audio(is_clean, params, index, audio_samples_length)
File "noisyspeech_synthesizer_singleprocess.py", line 69, in build_audio
input_audio, fs_input = audioread(source_files[idx])
File "/mnt/lustre/xushuang4/shijing/codes/DNS-Challenge-master/audiolib.py", line 42, in audioread
audio, sample_rate = sf.read(path, start=start, stop=stop)
File "/mnt/lustre/xushuang4/anaconda2/envs/conda_shin_pyannote/lib/python3.6/site-packages/soundfile.py", line 373, in read
subtype, endian, format, closefd) as f:
File "/mnt/lustre/xushuang4/anaconda2/envs/conda_shin_pyannote/lib/python3.6/site-packages/soundfile.py", line 740, in init
self._file = self._open(file, mode_int, closefd)
File "/mnt/lustre/xushuang4/anaconda2/envs/conda_shin_pyannote/lib/python3.6/site-packages/soundfile.py", line 1265, in _open
"Error opening {0!r}: ".format(self.name))
File "/mnt/lustre/xushuang4/anaconda2/envs/conda_shin_pyannote/lib/python3.6/site-packages/soundfile.py", line 1455, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening './datasets/clean/book_11069_chp_0022_reader_08200_33.wav': File contains data in an unknown format.

I thought it was a problem with my soundfile or librosa at first, but I tried several different versions and all had the same problem. Then I found I could use them to read other .wav files.

Then I tried to open the files directly in a player on Windows 10; it says the files may be broken.
Please check it, because I think it's a very serious problem.

NSNet 2 baseline ONNX model with 512-size STFT and 32ms window

Hello,

In the paper describing the NSNet2 baseline, you use an STFT of size 512 with a 32 ms square-root Hann window, but the provided NSNet2 baseline model under NSNet2-baseline uses a size-320 STFT with a 20 ms window. Is there a link where I can find the pretrained ONNX model for the 512-size STFT with the 32 ms window? I'm working on hardware-accelerating NSNet2 inference using Spatial, and the FFT algorithm I'm using is the efficient Cooley-Tukey algorithm, which requires power-of-2 inputs. Right now I need to zero-pad the size-320 input audio frame to size 512, after which I discard redundant/useless information from the output frame DFT to get a 161-size feature vector to feed into the provided model (after computing the log-power spectrum). This wastes computation, so having access to the 512-size STFT model would be very helpful.

I can't train it myself because I don't have the compute resources or even the storage to store the training data available at the moment.

About image source

Is there any other image source (mirror) for the datasets? Git is too slow in my location for downloading large datasets.

Total size of all files?

Hi, what's the total size of the entire dataset? It would also be helpful to have info on the size of the top-level directories, if possible.

Thank you.

Missing files when using SAS_URL download

I downloaded the wideband dataset through azcopy using the SAS_URL. It ran without any errors or warnings. When I run noisyspeech_synthesizer_singleprocess.py I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/impulse_responses/SLR26/simulated_rirs_16k/mediumroom/Room162/Room162-00096.wav'

I checked, and there are no .wav files in the smallroom, mediumroom, or largeroom subdirectories, only the rir_list and rir_info files. There seems to be a large subset of the data (at least the impulse responses) missing.

48 kHz impulse responses

I want to add reverb to the 48 kHz training set.

Should I just use the 16 kHz impulse responses, or will new impulse responses be uploaded to the datasets_fullband directory?

Thanks!

Different shape after model

Hi, I am testing the NSNet2 baseline.
I found that my .wav file's shape is different after NSNet2.
Is this correct, or what did I miss?

Lots of Warnings due to unexpected clipping

I am able to process the wave files; however, I am getting lots of warnings:

Number of files to be synthesized: 60000
Warning: File #0 has unexpected clipping, returning without writing audio to disk
Warning: File #9 has unexpected clipping, returning without writing audio to disk
Warning: File #10 has unexpected clipping, returning without writing audio to disk
Warning: File #10 has unexpected clipping, returning without writing audio to disk
Warning: File #19 has unexpected clipping, returning without writing audio to disk
Warning: File #27 has unexpected clipping, returning without writing audio to disk
Warning: File #27 has unexpected clipping, returning without writing audio to disk
Warning: File #28 has unexpected clipping, returning without writing audio to disk
Warning: File #33 has unexpected clipping, returning without writing audio to disk
Warning: File #34 has unexpected clipping, returning without writing audio to disk
Warning: File #37 has unexpected clipping, returning without writing audio to disk
Warning: File #42 has unexpected clipping, returning without writing audio to disk
Warning: File #44 has unexpected clipping, returning without writing audio to disk
Warning: File #65 has unexpected clipping, returning without writing audio to disk
Warning: File #65 has unexpected clipping, returning without writing audio to disk
Warning: File #66 has unexpected clipping, returning without writing audio to disk
Warning: File #69 has unexpected clipping, returning without writing audio to disk
Warning: File #79 has unexpected clipping, returning without writing audio to disk
Warning: File #82 has unexpected clipping, returning without writing audio to disk
Warning: File #83 has unexpected clipping, returning without writing audio to disk
Warning: File #88 has unexpected clipping, returning without writing audio to disk
Warning: File #89 has unexpected clipping, returning without writing audio to disk
Warning: File #97 has unexpected clipping, returning without writing audio to disk
Warning: File #98 has unexpected clipping, returning without writing audio to disk
Warning: File #104 has unexpected clipping, returning without writing audio to disk

Is that normal?

How can I get 3076 real and about 115,000 synthetic RIRs?

In the paper and readme file, it states there are 3076 real and about 115,000 synthetic RIRs obtained from SLR26 and SLR28. Actually, SLR28 contains SLR26, and there are only 325 real and 60,000 synthetic RIRs. So how can I get 3076 real and about 115,000 synthetic RIRs?

Exclude reverberation in training data

Is there a way to not include reverberation in the training set? Does setting lower_t60 and upper_t60 to zero in noisyspeech_synthesizer.cfg remove reverberation? Also, what do target_level_lower and target_level_upper do exactly?

How to download the test set in the dev_testset folder?

Hi,

I want to use part of the DNS dataset, and I downloaded the clean and noise speech according to your Readme.md file. However, I did not find the link to download the test set in the dev_testset folder.
If it is convenient, can you tell me how to download the speech in "DNS-Challenge/datasets/dev_testset/"?

Thanks so much!

Audio type not supported

Hi,
I cloned the repository (I opened the 'git bash' application and used the command 'git clone https://github.com/microsoft/DNS-Challenge.git').
It took a few minutes, but it downloaded all the files and datasets.
But the wav files seem to be corrupted: I can't play them using Windows media tools. It says that the file is not playable and that something is corrupted.

I didn't manage to install and use the LFS option, so I just waited for everything to download (it didn't take that much time).
Is it necessary to work with LFS in order to download the data correctly?

My current status:
When I ran the file "noisyspeech_synthesizer_multiprocessing.py" I got the following message:
"WARNING: Audio type not supported", and the program terminated.
I blame the corrupted wav files.

How can i download the wav files correctly?

BTW, the code requires the "pandas" package too.

Incorrect LibriVox Labelling

The data in the read_speech directory appears to reference the book, chapter, and speaker from the LibriVox database associated with the wav file. This metadata does not appear to be correct. For example, the file:

book_00082_chp_0011_reader_01593_0.wav

is not from reader 01593 but instead from reader 11223. This is just one example; this seems to happen for many, many files.

I'm wondering if this is a known issue and if there is any mapping from the DNS-based labelling to the actual LibriVox data.

Clean up the acoustic_params data

CSV files in the datasets/acoustic_params/ directory have a number of issues:

  • Some files are not valid CSV - e.g. having single quotes, Python binary strings like b"..." etc.
  • Data for emotional speech has absolute paths like /mnt/noisesuppeastusfilestore/ldc-corpora/kanhawin_git3/DNS-Challenge/...
  • Some file paths have \ instead of /

README should indicate that using git-lfs is required

Without git-lfs, the wav files that were downloaded seemed to be corrupt or incomplete, causing me to get a "This audio type is not supported" error when running noisyspeech_synthesizer_multiprocessing.py. They also took up only around 500MB. With git-lfs, git clone downloaded around 80GB of data. So using git-lfs should not be a recommendation for "faster downloading" but a requirement, unless I'm missing something.

DNSMOS web-API service

Does the DNSMOS web-API service still work? The SCORING_URI and AUTH_KEY I applied for several months ago no longer allow me to use the web-API service. Do I need to apply for them again?

A minor error in `noisyspeech_synthesizer_singleprocess.py`

In line 320:

    # The two items are both 'fileindex_start'. One of them should be 'fileindex_end'
    if cfg['fileindex_start'] != 'None' and cfg['fileindex_start'] != 'None':
        params['num_files'] = int(cfg['fileindex_end'])-int(cfg['fileindex_start'])
        params['fileindex_start'] = int(cfg['fileindex_start'])
        params['fileindex_end'] = int(cfg['fileindex_end'])
    else:
        params['num_files'] = int((params['total_hours']*60*60)/params['audio_length'])
        params['fileindex_start'] = 0
        params['fileindex_end'] = params['num_files']
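
A sketch of the corrected check, testing both bounds before using them:

    if cfg['fileindex_start'] != 'None' and cfg['fileindex_end'] != 'None':
        params['num_files'] = int(cfg['fileindex_end'])-int(cfg['fileindex_start'])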

Cloning the repo has huge space requirements.

I cloned the repo last year for the Interspeech 2020 version of the dataset. Now that I'm cloning again for the latest version, on a new machine, I'm not able to complete the clone. I've gotten insufficient disk space errors twice already. I'm cloning onto an empty 1TB NVMe drive.

error on running noisyspeech_synthesizer_singleprocess.py

python noisyspeech_synthesizer_singleprocess.py
Number of files to be synthesized: 60000
WARNING: Audio type not supported
Traceback (most recent call last):
File "noisyspeech_synthesizer_singleprocess.py", line 537, in
main_body()
File "noisyspeech_synthesizer_singleprocess.py", line 508, in main_body
noise_source_files, noise_clipped_files, noise_low_activity_files = main_gen(params)
File "noisyspeech_synthesizer_singleprocess.py", line 168, in main_gen
gen_audio(True, params, clean_index)
File "noisyspeech_synthesizer_singleprocess.py", line 132, in gen_audio
build_audio(is_clean, params, index, audio_samples_length)
File "noisyspeech_synthesizer_singleprocess.py", line 82, in build_audio
input_audio, fs_input = audioread(source_files[idx])
File "/data1/dtln/DNS-Challenge/audiolib.py", line 45, in audioread
if len(audio.shape) == 1: # mono
UnboundLocalError: local variable 'audio' referenced before assignment

DNSMOS API service problem

I have obtained a SCORING_URI and AUTH_KEY in accordance with the requirements of the DNSMOS web-API service. But I got the following problem when using the given demo to predict the MOS of wavs locally:
'Connection aborted.', RemoteDisconnected('Remote end closed connection without response')
How can I solve this problem?

Will the subjective evaluation data be released?

Hi organizers,

The subjective evaluation data is very useful for building an automatic speech assessment system, like DNSMOS.

So the question is: are you going to release the subjective evaluation data (paired MOS and corresponding speech)?

dataset wavefile format - WARNING: Audio type not supported

Number of files to be synthesized: 60000
WARNING: Audio type not supported
Traceback (most recent call last):
File "noisyspeech_synthesizer_singleprocess.py", line 351, in
main_body()
File "noisyspeech_synthesizer_singleprocess.py", line 323, in main_body
noise_source_files, noise_clipped_files, noise_low_activity_files = main_gen(params)
File "noisyspeech_synthesizer_singleprocess.py", line 155, in main_gen
gen_audio(True, params, clean_index)
File "noisyspeech_synthesizer_singleprocess.py", line 119, in gen_audio
build_audio(is_clean, params, index, audio_samples_length)
File "noisyspeech_synthesizer_singleprocess.py", line 69, in build_audio
input_audio, fs_input = audioread(source_files[idx])
File "/home/stuart/sagar/DNS-Challenge/audiolib.py", line 45, in audioread
if len(audio.shape) == 1: # mono
UnboundLocalError: local variable 'audio' referenced before assignment

The wav files are in audio/x-wav format. It seems they don't have a header. Do we have to reformat them, or is this a download failure?

Any support is appreciated
Thanks in advance

segmental_snr_mixer(): snr is used after rms has already changed?

in function

def segmental_snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99):

line 165 and on:

clean = normalize_segmental_rms(clean, rms=rmsclean, target_level=target_level)
noise = normalize_segmental_rms(noise, rms=rmsnoise, target_level=target_level)
# Set the noise level for a given SNR
noisescalar = rmsclean / (10**(snr/20)) / (rmsnoise+EPS)
noisenewlevel = noise * noisescalar

When snr is used, the RMS of the clean and noise signals has already changed, because both are modified in normalize_segmental_rms(). So why use the old rmsclean and rmsnoise when calculating noisescalar?
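
For reference, one way to make the computation self-consistent is to recompute the RMS on the already-normalized signals, so that noisescalar describes the signals actually being mixed. A sketch (whether the original behavior is intentional is exactly the question raised here):

import numpy as np

EPS = 1e-8

def rms(x):
    return np.sqrt(np.mean(x**2))

# Recompute RMS after normalize_segmental_rms() so the SNR scaling
# matches the normalized clean and noise signals
rmsclean, rmsnoise = rms(clean), rms(noise)
noisescalar = rmsclean / (10**(snr/20)) / (rmsnoise + EPS)
noisenewlevel = noise * noisescalar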

Some wave files are lost when downloaded by Azcopy

I've downloaded the wide- and full-band datasets with azcopy using the SAS_URL, but lots of wave files are missing, especially in the non-English (French, German, Italian, Russian, Spanish) datasets. The number of files in the clean dataset in this project is about 800k, but I got about 270k files via azcopy.

I tried to download the non-English data from https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/, but it does not seem to work. So how can I get the wide- and full-band non-English datasets?

RNNoise training

I want to use this dataset for training an RNNoise model. Can anyone explain how to use this dataset repo? It's a little complex.

Buzzing sound in predictions from a model trained at a 44.1 kHz sampling rate

We generated a dataset with a 44.1 kHz sampling rate using the process mentioned.

Then we trained a model with this data.

But when we did inference:

1] For each noisy audio clip, we did predictions on 10 ms windows.
2] We concatenated all the 10 ms output windows to weave back the denoised audio.

We are hearing a buzzing sound in the audio along with the voice.
The same is not observed if we do not do windowing and pass the whole audio in for inference.

Please can you help us with this?

Please can you help us with creating a training dataset without reverberations?

Looking at the code of 'noisyspeech_synthesizer_singleprocess.py' and the configuration 'noisyspeech_synthesizer.cfg', we cannot find a setting for this.

The parameter 'rir_choice' always needs to be specified: 1 for only real RIRs, 2 for only synthetic RIRs, or 3 (default) to use both real and synthetic.
But there is no value for 'no reverberation', as in the 'DNS test set no reverb' specified in the paper.

pesq

Audio referenced before assignment

If I run the code as the readme file says, I get this error. After looking at the code, it seems to me that line 41 (inside the try) isn't being correctly executed. Any ideas how to fix this?

Error:

~/DNS-Challenge$ python noisyspeech_synthesizer_singleprocess.py
Number of files to be synthesized: 60000
WARNING: Audio type not supported
Traceback (most recent call last):
  File "noisyspeech_synthesizer_singleprocess.py", line 537, in <module>
    main_body()
  File "noisyspeech_synthesizer_singleprocess.py", line 508, in main_body
    noise_source_files, noise_clipped_files, noise_low_activity_files = main_gen(params)
  File "noisyspeech_synthesizer_singleprocess.py", line 168, in main_gen
    gen_audio(True, params, clean_index)
  File "noisyspeech_synthesizer_singleprocess.py", line 132, in gen_audio
    build_audio(is_clean, params, index, audio_samples_length)
  File "noisyspeech_synthesizer_singleprocess.py", line 82, in build_audio
    input_audio, fs_input = audioread(source_files[idx])
  File "/home/alberto/DNS-Challenge/audiolib.py", line 45, in audioread
    if len(audio.shape) == 1:  # mono
UnboundLocalError: local variable 'audio' referenced before assignment
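
For context, the tracebacks above suggest that the sf.read call in audiolib.py sits in a try block whose handler prints the warning and falls through, leaving audio unbound. A defensive sketch, assuming that structure (check the actual file):

import soundfile as sf

def audioread(path, start=0, stop=None):
    try:
        audio, sample_rate = sf.read(path, start=start, stop=stop)
    except RuntimeError as err:
        # Re-raise with the offending path instead of falling through
        raise RuntimeError(f"Audio type not supported: {path}") from err
    if len(audio.shape) == 1:  # mono
        return audio, sample_rate
    return audio.mean(axis=1), sample_rate  # illustrative multichannel downmix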

git lfs error while downloading dataset

Below is the error I'm getting while trying to clone the repo:

Cloning into 'DNS-Challenge'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 132377 (delta 0), reused 0 (delta 0), pack-reused 132374
Receiving objects: 100% (132377/132377), 23.95 MiB | 2.25 MiB/s, done.
Resolving deltas: 100% (134/134), done.
Updating files: 100% (132177/132177), done.
Downloading datasets/clean/book_00648_chp_0004_reader_11560_3.wav (992 KB)
Error downloading object: datasets/clean/book_00648_chp_0004_reader_11560_3.wav (a6a917c): Smudge error: Error downloading datasets/clean/book_00648_chp_0004_reader_11560_3.wav (a6a917c0a7fdcdce1147c20a77ef5eeb79ee54ffdd69d4fcb4b13996d978fef5): batch response: Post "https://github.com/microsoft/DNS-Challenge.git/info/lfs/objects/batch": dial tcp: lookup github.com: no such host

Errors logged to D:\TF\DTLN-tfjs\DNS-Challenge.git\lfs\logs\20200716T111635.8111709.log
Use git lfs logs last to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: datasets/clean/book_00648_chp_0004_reader_11560_3.wav: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

Multiprocessing script lacks a good error message

I am running Linux-on-Windows, so the initial clean speech and noise source directories set in the .cfg file did not exist. When I ran noisyspeech_synthesizer_multiprocessing.py, it crashed at row 73, idx = idx_counter.value % np.size(source_files), as source_files was empty, making np.size(source_files) == 0, which in turn meant a modulo/division by zero.

noisyspeech_synthesizer_singleprocess.py raised a proper error (couldn't find clean files), which was helpful. It could be worth adding the same error message to the multiprocessing file as well; a sketch follows.
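
A sketch of such a guard before the modulo, using the variable names from the description above:

import numpy as np

# Fail fast with a clear message instead of crashing with a modulo-by-zero
if np.size(source_files) == 0:
    raise FileNotFoundError("No source audio files found; check the clean/noise paths in noisyspeech_synthesizer.cfg")
idx = idx_counter.value % np.size(source_files)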

Noise categories

Hi,

Is there a CSV or some other documentation of the category/categories per noise file, i.e., what is contained in each file?

Maybe it's the other way around

Perhaps we could build a model of the speaker's voice before the call and then directly block out other noises during the call. The model could also be saved on the user's hardware for continuous optimization.

Some WAV files are used twice in noisyspeech_synthesizer_singleprocess.py

# line 350
    if 'speech_csv' in cfg.keys() and cfg['speech_csv'] != 'None':
        cleanfilenames = pd.read_csv(cfg['speech_csv'])
        cleanfilenames = cleanfilenames['filename']
    else:
        #cleanfilenames = glob.glob(os.path.join(clean_dir, params['audioformat']))
        cleanfilenames= []
        for path in Path(clean_dir).rglob('*.wav'):
            cleanfilenames.append(str(path.resolve()))

The default repo does not provide a 'speech_csv', so the list cleanfilenames contains all clean wav files.

# line 360
#   add singing voice to clean speech
    if params['use_singing_data'] ==1:
        all_singing= []
        for path in Path(params['clean_singing']).rglob('*.wav'):
            all_singing.append(str(path.resolve()))
            
        if params['singing_choice']==1: # male speakers
            mysinging = [s for s in all_singing if ("male" in s and "female" not in s)]
    
        elif params['singing_choice']==2: # female speakers
            mysinging = [s for s in all_singing if "female" in s]
    
        elif params['singing_choice']==3: # both male and female
            mysinging = all_singing
        else: # default both male and female
            mysinging = all_singing
            
        shuffle(mysinging)
        if mysinging is not None:
            all_cleanfiles= cleanfilenames + mysinging
    else: 
        all_cleanfiles= cleanfilenames

Adding singing voice to clean speech therefore makes the list all_cleanfiles contain the wavs in the 'datasets/clean/singing_voice' directory twice.
I am very confused about this problem.
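
One way to avoid the double counting is to exclude the singing directory from the generic glob; a sketch, assuming the singing clips live under a 'singing_voice' subdirectory of clean_dir as described above:

from pathlib import Path

# Skip singing clips here; they are appended separately via params['clean_singing']
cleanfilenames = [str(p.resolve()) for p in Path(clean_dir).rglob('*.wav')
                  if 'singing_voice' not in p.parts]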

Problems with data download and git lfs

Hi I'm having trouble downloading the dataset and would appreciate some help.

The first problem I encounter is a rate limit error while cloning the repo:

Error downloading object: datasets/blind_test_set/noreverb_fileid_54.wav (3355ba7): 
Smudge error: Error downloading datasets/blind_test_set/noreverb_fileid_54.wav (3355ba7982eb4f8f12514e77920bf965ce5b8c45a8f9681be06da9f4a16ec6cc): 
batch response: Rate limit exceeded: https://github.com/microsoft/DNS-Challenge.git/info/lfs/objects/batch

I then proceeded to git lfs pull, hoping that it would finish the clone, and it was going well until I hit another error:

Error updating the git index: (131998/131998), 86 GB | 5.5 MB/s                                                                                               
error: datasets/blind_test_set/realrec_fileid_0.wav: cannot add to the index - missing --add option?
fatal: Unable to process path datasets/blind_test_set/realrec_fileid_0.wav

I now get an incomplete version of the repo no matter what I do. Both git lfs pull and git lfs fetch hang and do nothing. I tried checking out before trying them again, and they still hang. Any ideas what may be going wrong? Is there any alternative to using git lfs?

Dataset generation report

Hi, I'm setting up a simple recipe for the challenge here, and I have some questions/remarks about the data generation:

  • The .csv files generated at the end of the scripts don't really make sense; I think columns and lines were inverted.
  • The default SNR range in the config file produces lots of almost-noiseless utterances; is this the intended behavior?
  • The noise files are saved at the original scale, not at the same scale as in the mixture, and the scaling factors are saved nowhere. It would be nice to have the same scale as in the mixture, or at least a .csv file with the scaling factor for each utterance.

No enhanced audios are generated

Hi. I'm trying to run NSNet. I don't get errors, but no enhanced audios are generated in the assigned folder. In a Linux terminal I get the following message (I'm testing it with 6 .wav audios):
2020-04-01 09:49:30,581 DEBUG NSNet local workers start with 6 input files
2020-04-01 09:49:30,589 INFO NSNet local workers complete

Could you advise what the issue could be?

Thank you
