NISQA-s: Speech Quality and Naturalness Assessment for Online Inference

NISQA-s is a heavily stripped-down and optimized version of the original NISQA metric. It aims to provide a universal set of metrics for both offline and online evaluation of audio quality.

This version supports only the CNN+LSTM variant of the original model (the other modifications either don't support streaming or are too slow). It uses the same architecture with some tweaks for streaming purposes. There is also no separate MOS-only model, since the main model already predicts MOS (which keeps the code and repo simpler).

Installation

(Optional) Create a new venv or conda environment

Then install the dependencies with pip install -r requirements.txt

Note that there may be problems with the torch installation. If so, follow the official PyTorch instructions.

Quick start

To run this repo with the provided config and samples:

python -m scripts.run_infer_file

To test online inference from your microphone:

python -m scripts.run_infer_mic

Inference results are logged to the terminal, so keep an eye on it.

Config options

The default config is config/nisqa_s.yaml. All configuration for training and inference happens here. Each parameter has a detailed comment in the file, so we'll only cover the most important ones for inference:

  • ckp: path to the trained checkpoint (weights/nisqa_s.tar by default)

  • sample: path to the file being evaluated

If you plan to run online inference, pay close attention to the last four arguments in this config:

  • frame sets the length of the buffer fed into the model;

  • updates makes the model emit metrics more often (see the argument description);

  • sd_device is the ID of the input device; provide it if you want to run on a different input device (e.g. a sound-card mic). The first run of run_infer_mic.py will print the available IDs;

  • sd_dump lets you save the mic input so you can check the results offline later.
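The interplay between frame and updates can be sketched in plain Python. This is an illustrative sliding-buffer model, not the repo's actual code; stream_frames and its parameter names are hypothetical:

```python
from collections import deque

def stream_frames(samples, frame_len, update_len):
    """Yield an analysis buffer of `frame_len` samples every `update_len`
    incoming samples, once the buffer has filled (illustrative sketch)."""
    buf = deque(maxlen=frame_len)
    since_update = 0
    for s in samples:
        buf.append(s)
        since_update += 1
        if len(buf) == frame_len and since_update >= update_len:
            yield list(buf)
            since_update = 0

# With frame_len=4 and update_len=2, 8 samples produce 3 overlapping buffers:
# [0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]
frames = list(stream_frames(range(8), frame_len=4, update_len=2))
```

A smaller update step yields metrics more frequently at the cost of more model invocations per second, which is the trade-off the updates parameter controls.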

Finally, you can run a custom config for your experiments: add the --yaml argument to python -m scripts.run_infer_file or python -m scripts.run_infer_mic and provide the path to your own config:

python -m scripts.run_infer_file --yaml path/to/custom/config.yaml

Training

We provide a simple interface for training your own version of NISQA-s.

First, you will need the dataset. You can obtain it from the official NISQA repo. This is probably the only (but definitely the best) way to train this model, since the data needs to be labeled in a very specific way for training to work.

To train the same version as the one provided:

python -m scripts.run_train

Remember to check the experiment name in nisqa_s.yaml, the path to the NISQA Corpus in data_dir, and the path where the model will be saved (output_dir).
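For example, the relevant entries of nisqa_s.yaml might look like this (data_dir, output_dir, and ckp are named in this README; the experiment-name key and all values are illustrative placeholders):

```yaml
name: my_nisqa_s_run              # experiment name (check before training)
data_dir: /path/to/NISQA_Corpus   # root of the NISQA Corpus
output_dir: /path/to/checkpoints  # where trained models are saved
ckp: weights/nisqa_s.tar          # checkpoint used for inference
```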

Training and model parameters in config

  • Since you're most likely using the NISQA Corpus, there is no need to change anything in the Dataset options. If you use a hand-made dataset, you will need to refer to this guide.

  • Training options contains all parameters related to the training setup (learning rate, batch size, etc.).

  • You can also experiment with bias loss by enabling the Bias loss options.

  • Change the Mel-Specs options to experiment with different sample rates, Fourier lengths, or sample lengths for training (although lowering ms_max_length is strongly discouraged because of the NISQA Corpus labeling).

  • CNN parameters and LSTM parameters: change these to experiment with different settings of the convolutional and recurrent layers.

Note that the provided checkpoint was trained with the provided config.

Citations

@inproceedings{Mittag_Naderi_Chehadi_Möller_2021,
  title={{NISQA}: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets},
  doi={10.21437/interspeech.2021-299},
  booktitle={Interspeech 2021},
  author={Mittag, Gabriel and Naderi, Babak and Chehadi, Assmaa and Möller, Sebastian},
  year={2021}
}
@misc{deepvk2024nisqa,
  author = {Beskrovnyi, Ivan},
  title = {nisqa-s},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {https://github.com/deepvk/nisqa-s}
}


nisqa-s's Issues

"Maximum size for tensor at dimension" error for some audio files

For some files, the program fails with an error like this:

  File "/Users/agershun/repo/whisper/NISQA-s/src/utils/process_utils.py", line 62, in segment_specs
    unfolded_x = transposed_x.unfold(0, seg_length, 1)
RuntimeError: maximum size for tensor at dimension 0 is 2 but size is 15

Here is a sample file:
https://drive.google.com/file/d/1n2CwhTZsp8DJdlFQfXWFGvL9rASFBrXQ/view?usp=sharing

I tried converting the .ogg file to .wav, but the problem still occurs.

I use a MacBook Pro M1.
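The trace points at Tensor.unfold in segment_specs: unfold cannot extract seg_length-frame windows (15 here) from a spectrogram that only has 2 frames, which suggests the clip is too short for the model's segment length. Below is a plain-Python sketch of that windowing plus a zero-padding workaround; segment, segment_padded, and the padding strategy are illustrative, not the repo's code:

```python
def segment(frames, seg_length):
    """Sliding windows of seg_length frames with hop 1,
    mimicking Tensor.unfold(0, seg_length, 1)."""
    if len(frames) < seg_length:
        raise RuntimeError(
            f"maximum size for tensor at dimension 0 is {len(frames)} "
            f"but size is {seg_length}"
        )
    return [frames[i:i + seg_length] for i in range(len(frames) - seg_length + 1)]

def segment_padded(frames, seg_length, pad_value=0.0):
    """Workaround sketch: zero-pad inputs shorter than one segment."""
    if len(frames) < seg_length:
        frames = list(frames) + [pad_value] * (seg_length - len(frames))
    return segment(frames, seg_length)

# A 2-frame spectrogram fails outright, but padding yields one (mostly zero) window:
windows = segment_padded([0.5, 0.5], seg_length=15)
```

Whether padding, repeating the clip, or simply rejecting too-short inputs is the right fix depends on how the model was trained, so treat this only as a diagnosis aid.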

"Padding size should be less.." error for some WAV files

Another problem occurs for some WAV files:

/Users/agershun/repo/whisper/NISQA-s/.venv/lib/python3.10/site-packages/torch/nn/modules/rnn.py:82: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=1 and num_layers=1
  warnings.warn("dropout option adds dropout after all but last "
NOI    COL   DISC  LOUD  MOS
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/agershun/repo/whisper/NISQA-s/scripts/run_infer_file.py", line 33, in <module>
    out, h0, c0 = process(audio, sr, model, h0, c0, args)
  File "/Users/agershun/repo/whisper/NISQA-s/src/utils/process_utils.py", line 79, in process
    audio = get_ta_melspec(
  File "/Users/agershun/repo/whisper/NISQA-s/src/utils/process_utils.py", line 39, in get_ta_melspec
    S = melSpec(y)
  File "/Users/agershun/repo/whisper/NISQA-s/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/agershun/repo/whisper/NISQA-s/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/agershun/repo/whisper/NISQA-s/.venv/lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 619, in forward
    specgram = self.spectrogram(waveform)
  File "/Users/agershun/repo/whisper/NISQA-s/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/agershun/repo/whisper/NISQA-s/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/agershun/repo/whisper/NISQA-s/.venv/lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 110, in forward
    return F.spectrogram(
  File "/Users/agershun/repo/whisper/NISQA-s/.venv/lib/python3.10/site-packages/torchaudio/functional/functional.py", line 126, in spectrogram
    spec_f = torch.stft(
  File "/Users/agershun/repo/whisper/NISQA-s/.venv/lib/python3.10/site-packages/torch/functional.py", line 648, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (480, 480) at dimension 2 of input [1, 88200, 2]

This is a sample of the file with the problem:
https://drive.google.com/file/d/1CHZ4TZaILu-K5rsA8XkPAEDbt_2VGXVd/view?usp=sharing
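The failing input shape [1, 88200, 2] is a hint: the trailing dimension of size 2 looks like a stereo channel axis, so torch.stft ends up trying to pad a 2-sample dimension instead of the 88200-sample time axis. One plausible workaround is to downmix the file to mono before inference; to_mono below is an illustrative stdlib sketch, not the repo's API:

```python
def to_mono(frames):
    """Downmix per-frame multichannel audio ([[L, R], ...]) to mono
    by averaging channels (illustrative workaround sketch)."""
    return [sum(frame) / len(frame) for frame in frames]

stereo = [[0.25, 0.75], [1.0, -1.0], [0.5, 0.5]]
mono = to_mono(stereo)  # [0.5, 0.0, 0.5]
```

Equivalently, converting the WAV to mono beforehand (e.g. with ffmpeg's -ac 1 option) should sidestep the padding error, assuming stereo input really is the cause.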
