google / visqol

Perceptual Quality Estimator for speech and audio

License: Apache License 2.0

Starlark 5.58% Python 10.33% Shell 0.99% C++ 82.51% C 0.58%

visqol's Introduction

ViSQOL

ViSQOL (Virtual Speech Quality Objective Listener) is an objective, full-reference metric for perceived audio quality. It uses a spectro-temporal measure of similarity between a reference and a test speech signal to produce a MOS-LQO (Mean Opinion Score - Listening Quality Objective) score. MOS-LQO scores range from 1 (the worst) to 5 (the best).

Features
Build
Command Line Usage
API Usage
Dependencies
License
Papers
FAQ
Acknowledgement

Guidelines

ViSQOL can be run from the command line, or integrated into a project and used through its C++ or Python APIs. In either case, ViSQOL runs in one of two modes:

Audio Mode:

When running in audio mode, input signals must have a 48kHz sample rate. Input at any other rate should be resampled to 48kHz first.
Input signals can be multi-channel, but they will be down-mixed to mono for performing the comparison.
Audio mode uses support vector regression, with the maximum range at ~4.75.

Speech Mode:

When running in speech mode, ViSQOL uses a wideband model. It therefore expects an input sample rate of 16kHz. Input at any other rate should be resampled to 16kHz first.
As part of the speech mode processing, root-mean-square (RMS) based voice activity detection is performed on the reference signal to determine which parts of the signal contain voice activity and should therefore be included in the comparison. The signal is normalized before the voice activity detection is performed.
Input signals can be multi-channel, but they will be down-mixed to mono for performing the comparison.
Speech mode is scaled to have a maximum MOS of 5.0 to match previous version behavior.
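
The sample-rate expectations of the two modes can be checked before a comparison is run. The following is a minimal sketch using Python's standard `wave` module. `EXPECTED_RATE_HZ` and `check_sample_rate` are hypothetical helper names for illustration and are not part of ViSQOL.

```python
import wave

# Sample rates the two modes expect, as stated above.
EXPECTED_RATE_HZ = {"audio": 48000, "speech": 16000}


def check_sample_rate(wav_path, mode):
    """Return (ok, actual_rate) for a PCM WAV file and a mode name."""
    if mode not in EXPECTED_RATE_HZ:
        raise ValueError(f"Unrecognized mode: {mode}")
    with wave.open(wav_path, "rb") as wav_file:
        actual_rate = wav_file.getframerate()
    return actual_rate == EXPECTED_RATE_HZ[mode], actual_rate
```

A file that fails this check would need to be resampled with an external tool before being passed to ViSQOL.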

General guidelines for input

ViSQOL was trained with data from subjective tests that roughly follow industry standards, such as ITU-T Rec. P.863. As a result certain assumptions are made, and your input to ViSQOL should probably have these properties:

The input audio files should be approximately 8-10 seconds long, with little silence inside them and around 0.5s of silence around the audible part.
When comparing audio from different sources, be aware of the sample rate of the files. If you compare the result from a 16kHz file and a 48kHz file with very similar content, the scores can be quite different.
The reference audio is clean and equal or higher quality than the degraded audio.
ITU-T P.800 describes a standard listening test to measure MOS. It has various recommendations about the audio and environment that may be useful as a reference.
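
The length guideline above is easy to check automatically. This is a small sketch using the standard `wave` module. The helper names are hypothetical, and the 8-10 second bounds are simply the approximate range given above, exposed as parameters.

```python
import wave


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def within_recommended_length(wav_path, min_s=8.0, max_s=10.0):
    """True if the file length falls in the roughly 8-10 s range suggested above."""
    return min_s <= wav_duration_seconds(wav_path) <= max_s
```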

General guidelines for interpreting the output

Single scores are not very meaningful. Rather, scores should be aggregated over several samples that received the same treatment.
The choice of audio mode vs speech mode can have large effects on the output.
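
As an illustration of aggregating over samples, the sketch below averages MOS-LQO values per treatment. The function name and the idea of labelling each score with a treatment string are assumptions made for this example. ViSQOL itself does not produce treatment labels.

```python
from collections import defaultdict
from statistics import mean


def mean_moslqo_by_treatment(scores):
    """Average MOS-LQO per treatment.

    `scores` is an iterable of (treatment_label, moslqo) pairs.
    """
    grouped = defaultdict(list)
    for treatment, moslqo in scores:
        grouped[treatment].append(moslqo)
    return {treatment: mean(values) for treatment, values in grouped.items()}
```

Comparing these per-treatment means, rather than individual file scores, follows the guideline above.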

Build

Linux/Mac Build Instructions

Install Bazel

Bazel can be installed following the instructions for Linux or Mac.
Tested with Bazel version 5.1.0.

Install Numpy

NumPy can be installed with pip install numpy

Build ViSQOL

Change directory to the root of the ViSQOL project (i.e. where the WORKSPACE file is) and run the following command: bazel build :visqol -c opt

Windows Build Instructions (Experimental, last tested on Windows 10 x64, 2020 August)

Install Bazel

Bazel can be installed for Windows from here.
Tested with Bazel version 5.1.0.

Install git

git for Windows can be obtained from the official git website.
When installing, select the option that allows git to be accessed from the system shells.

Install Tensorflow dependencies

Follow the instructions detailed here to install tensorflow build dependencies for windows.

Build ViSQOL:

Change directory to the root of the ViSQOL project (i.e. where the WORKSPACE file is) and run the following command: bazel build :visqol -c opt

Command Line Usage

Note Regarding Usage

When run from the command line, input signals must be in WAV format.

Flags

--reference_file

The WAV file used as the reference audio (48k sample rate in audio mode, 16k in speech mode).

--degraded_file

The WAV file that will be compared to the reference audio (48k sample rate in audio mode, 16k in speech mode).

--batch_input_csv

Used to specify a path to a CSV file with the format:

reference,degraded
ref1.wav,deg1.wav
ref2.wav,deg2.wav

If the batch_input_csv flag is used, the reference_file and degraded_file flags will be ignored.
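
A batch input file in this layout can be generated with Python's standard `csv` module. This is a minimal sketch, and `write_batch_input_csv` is a hypothetical helper name.

```python
import csv


def write_batch_input_csv(csv_path, file_pairs):
    """Write (reference, degraded) path pairs in the --batch_input_csv layout."""
    with open(csv_path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["reference", "degraded"])
        writer.writerows(file_pairs)
```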

--results_csv

Used to specify a path that the similarity score results will be output to. This will be a CSV file with the format:

reference,degraded,moslqo
ref1.wav,deg1.wav,3.4
ref2.wav,deg2.wav,4.1
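
A results file with these three columns can be read back for further analysis. The sketch below assumes the header row is exactly `reference,degraded,moslqo` as shown. `read_results_csv` is a hypothetical helper name.

```python
import csv


def read_results_csv(csv_path):
    """Return a list of (reference, degraded, moslqo) tuples from a results CSV."""
    results = []
    with open(csv_path, newline="") as csv_file:
        for row in csv.DictReader(csv_file):
            results.append((row["reference"], row["degraded"], float(row["moslqo"])))
    return results
```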

--verbose

The reference file path, degraded file path and the MOS-LQO values will be output to the console after the MOS-LQO has been calculated, along with similarity scores on a per-patch and per-frequency band basis.

--output_debug

Used to specify a file path where output debug information will be written to. This debug info contains the full details of the comparison between the reference and degraded audio signals and is in JSON format. The file does not need to previously exist. Contents will be appended to the file if it does already exist or if ViSQOL is run in batch mode.
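
Because results are appended, a debug file may end up holding several JSON objects one after another rather than a single JSON document. The exact layout is an assumption here. Under that assumption, the sketch below reads the objects one at a time with the standard `json` module. `iter_json_objects` is a hypothetical helper and makes no assumption about the fields inside each object.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(text):
        # Skip whitespace and any separating commas between objects.
        if text[position].isspace() or text[position] == ",":
            position += 1
            continue
        value, position = decoder.raw_decode(text, position)
        yield value
```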

--similarity_to_quality_model

The lattice or libsvm model to use during comparison. Use this only if you want to explicitly specify the model file location, otherwise the default model will be used.

--use_speech_mode

Use a wideband model (sensitive up to 8kHz) with voice activity detection. In this mode the polynomial NSIM->MOS mapping is normalized so that a perfect NSIM score of 1.0 translates to 5.0.

--use_unscaled_speech_mos_mapping

When used in conjunction with --use_speech_mode, this flag will prevent a perfect NSIM score of 1.0 being translated to a MOS score of 5.0. Perfect NSIM scores will instead result in MOS scores of ~4.x.
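
To make the difference between the scaled and unscaled mappings concrete, the sketch below rescales a mapping multiplicatively so that an NSIM of 1.0 lands on 5.0. Whether ViSQOL applies exactly this form of normalization is an assumption, and `example_unscaled_mapping` uses made-up coefficients purely for illustration.

```python
def scale_mapping_to_max(unscaled_mapping, max_mos=5.0):
    """Return a mapping rescaled so that an NSIM of 1.0 maps to `max_mos`."""
    scale = max_mos / unscaled_mapping(1.0)
    return lambda nsim: unscaled_mapping(nsim) * scale


# Hypothetical stand-in for an unscaled NSIM->MOS curve. These coefficients
# are illustrative only and are NOT ViSQOL's fitted values.
def example_unscaled_mapping(nsim):
    return 1.0 + 3.5 * nsim ** 2


scaled = scale_mapping_to_max(example_unscaled_mapping)
```

With this stand-in curve, the unscaled mapping tops out at 4.5 for a perfect NSIM, while the scaled version reaches approximately 5.0.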

--use_lattice_model

(default: true) Use a deep lattice network model to map similarity to quality. This produces more accurate results for speech (audio mode is not yet supported).

Example Command Line Usage

To compare two files and output their similarity to the console:

Linux/Mac:

./bazel-bin/visqol --reference_file ref1.wav --degraded_file deg1.wav --verbose

Windows:

bazel-bin\visqol.exe --reference_file "ref1.wav" --degraded_file "deg1.wav" --verbose

To compare all reference-degraded file pairs in a CSV file, outputting the results to another file and also outputting additional "debug" information:

Linux/Mac:

./bazel-bin/visqol --batch_input_csv input.csv --results_csv results.csv --output_debug debug.json

Windows:

bazel-bin\visqol.exe --batch_input_csv "input.csv" --results_csv "results.csv" --output_debug "debug.json"

To compare two files using scaled speech mode and output their similarity to the console:

Linux/Mac:

./bazel-bin/visqol --reference_file ref1.wav --degraded_file deg1.wav --use_speech_mode --verbose

Windows:

bazel-bin\visqol.exe --reference_file "ref1.wav" --degraded_file "deg1.wav" --use_speech_mode --verbose

To compare two files using unscaled speech mode and output their similarity to the console:

Linux/Mac:

./bazel-bin/visqol --reference_file ref1.wav --degraded_file deg1.wav --use_speech_mode --use_unscaled_speech_mos_mapping --verbose

Windows:

bazel-bin\visqol.exe --reference_file "ref1.wav" --degraded_file "deg1.wav" --use_speech_mode --use_unscaled_speech_mos_mapping --verbose
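
When many comparisons are scripted, the command lines above can be assembled programmatically. The following sketch builds an argument list from the flags documented in this section only. The helper names are hypothetical, and `run_visqol` assumes the binary path you pass in exists on your machine.

```python
import subprocess


def build_visqol_command(binary, reference, degraded,
                         speech_mode=False, unscaled=False, verbose=True):
    """Assemble an argument list using only the flags documented above."""
    command = [binary, "--reference_file", reference, "--degraded_file", degraded]
    if speech_mode:
        command.append("--use_speech_mode")
        if unscaled:
            command.append("--use_unscaled_speech_mos_mapping")
    if verbose:
        command.append("--verbose")
    return command


def run_visqol(command):
    """Run the command and return its console output as text."""
    return subprocess.run(command, check=True, capture_output=True, text=True).stdout
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.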

C++ API Usage

ViSQOL Integration

To integrate ViSQOL with your Bazel project:

Add ViSQOL to your WORKSPACE file as a local_repository:

local_repository (
    name = "visqol",
    path = "/path/to/visqol",
)

Then in your project's BUILD file, add the ViSQOL library as a dependency to your binary/library dependency list:
```
deps = ["@visqol//:visqol_lib"],
```
Note that Bazel does not currently resolve transitive dependencies (see issue #2391). As a workaround, it is required that you copy the contents of the ViSQOL WORKSPACE file to your own project's WORKSPACE file until this is resolved.

Sample Program

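#include <iostream>

#include "absl/status/status.h"
#include "absl/status/statusor.h"
// ViSQOL's own API header and generated protobuf headers must also be
// included here; their paths depend on your build setup.

int main(int argc, char **argv) {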

  // Create an instance of the ViSQOL API configuration class.
  Visqol::VisqolConfig config;

  // Set the sample rate of the signals that are to be compared.
  // Both signals must have the same sample rate.
  config.mutable_audio()->set_sample_rate(48000);

  // When running in audio mode, a sample rate of 48k is recommended for the input signals.
  // Using non-48k input will very likely negatively affect the comparison result.
  // If, however, API users wish to run with non-48k input, set this to true.
  config.mutable_options()->set_allow_unsupported_sample_rates(false);

  // Optionally, set the location of the model file to use.
  // If not set, the default model file will be used.
  config.mutable_options()->set_svr_model_path("visqol/model/libsvm_nu_svr_model.txt");

  // ViSQOL will run in audio mode comparison by default.
  // If speech mode comparison is desired, set to true.
  config.mutable_options()->set_use_speech_scoring(false);

  // Speech mode will scale the MOS mapping by default. This means that a
  // perfect NSIM score of 1.0 will be mapped to a perfect MOS-LQO of 5.0.
  // Set to true to use unscaled speech mode. This means that a perfect
  // NSIM score will instead be mapped to a MOS-LQO of ~4.x.
  config.mutable_options()->set_use_unscaled_speech_mos_mapping(false);

  // Create an instance of the ViSQOL API.
  Visqol::VisqolApi visqol;
  absl::Status status = visqol.Create(config);

  // Ensure that the creation succeeded.
  if (!status.ok()) {
    std::cout<<status.ToString()<<std::endl;
    return -1;
  }

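  // Perform the comparison.
  // reference_signal and degraded_signal are not defined in this sample.
  // They are assumed to hold the mono samples of each signal, loaded by
  // the caller at the sample rate configured above.
  absl::StatusOr<Visqol::SimilarityResultMsg> comparison_status_or =
          visqol.Measure(reference_signal, degraded_signal);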

  // Ensure that the comparison succeeded.
  if (!comparison_status_or.ok()) {
    std::cout<<comparison_status_or.status().ToString()<<std::endl;
    return -1;
  }

  // Extract the comparison result from the StatusOr.
  Visqol::SimilarityResultMsg similarity_result = comparison_status_or.value();

  // Get the "Mean Opinion Score - Listening Quality Objective" for the degraded
  // signal, following the comparison to the reference signal.
  double moslqo = similarity_result.moslqo();

  // Get the similarity results for each frequency band.
  google::protobuf::RepeatedField<double> fvnsim = similarity_result.fvnsim();

  // Get the center frequency bands that the above FVNSIM results correspond to.
  google::protobuf::RepeatedField<double> cfb = similarity_result.center_freq_bands();

  // Get the mean of the FVNSIM values (the VNSIM).
  double vnsim = similarity_result.vnsim();

  // Get the comparison results for each patch that was compared.
  google::protobuf::RepeatedPtrField<Visqol::SimilarityResultMsg_PatchSimilarityMsg> patch_sims =
          similarity_result.patch_sims();

  for (Visqol::SimilarityResultMsg_PatchSimilarityMsg each_patch : patch_sims) {
    // Get the similarity score for this patch.
    double patch_similarity = each_patch.similarity();

    // Get the similarity results for each frequency band for this patch.
    // The center frequencies that these values correspond to are the
    // same as those that are returned in the parent center_freq_bands().
    google::protobuf::RepeatedField<double> patch_fvnsim = each_patch.freq_band_means();

    // Get the time (in sec) where this patch starts in the reference signal.
    double ref_patch_start_time = each_patch.ref_patch_start_time();

    // Get the time (in sec) where this patch ends in the reference signal.
    double ref_patch_end_time = each_patch.ref_patch_end_time();

    // Get the time (in sec) where this patch starts in the degraded signal.
    double deg_patch_start_time = each_patch.deg_patch_start_time();

    // Get the time (in sec) where this patch ends in the degraded signal.
    double deg_patch_end_time = each_patch.deg_patch_end_time();
  }

  return 0;
}

Python API Usage

ViSQOL Installation

From within the root directory install ViSQOL using pip.

pip install .

Sample Program

import os

from visqol import visqol_lib_py
from visqol.pb2 import visqol_config_pb2
from visqol.pb2 import similarity_result_pb2

config = visqol_config_pb2.VisqolConfig()

mode = "audio"
if mode == "audio":
    config.audio.sample_rate = 48000
    config.options.use_speech_scoring = False
    svr_model_path = "libsvm_nu_svr_model.txt"
elif mode == "speech":
    config.audio.sample_rate = 16000
    config.options.use_speech_scoring = True
    svr_model_path = "lattice_tcditugenmeetpackhref_ls2_nl60_lr12_bs2048_learn.005_ep2400_train1_7_raw.tflite"
else:
    raise ValueError(f"Unrecognized mode: {mode}")

config.options.svr_model_path = os.path.join(
    os.path.dirname(visqol_lib_py.__file__), "model", svr_model_path)

api = visqol_lib_py.VisqolApi()

api.Create(config)

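# reference and degraded are not defined in this sample. They are assumed to
# hold the mono samples of each signal at config.audio.sample_rate, loaded by
# the caller.
similarity_result = api.Measure(reference, degraded)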

print(similarity_result.moslqo)
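
The sample program leaves loading the audio to the caller. As one possible approach, the sketch below reads a 16-bit PCM WAV file with the standard library and down-mixes it to mono by averaging channels. `load_wav_as_mono_floats` is a hypothetical helper. Averaging is an assumption about how to down-mix, and converting the returned list to whatever array type `api.Measure` expects (likely a NumPy float64 array) is left to the caller.

```python
import array
import sys
import wave


def load_wav_as_mono_floats(wav_path):
    """Read a 16-bit PCM WAV and return (sample_rate, mono samples in [-1, 1))."""
    with wave.open(wav_path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("This sketch only handles 16-bit PCM input")
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Average the interleaved channels of each frame to down-mix to mono.
    mono = [sum(samples[i:i + channels]) / (channels * 32768.0)
            for i in range(0, len(samples), channels)]
    return sample_rate, mono
```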

Dependencies

Armadillo - http://arma.sourceforge.net/

Libsvm - http://www.csie.ntu.edu.tw/~cjlin/libsvm/

PFFFT - https://bitbucket.org/jpommier/pffft

Boost - https://www.boost.org/

Support Vector Regression Model Training

Using the libsvm codebase, you can train a model specific to your data. The procedure is as follows:

Gather audio file pairs in 48kHz (for audio mode) with subjective test scores.
Create 2 CSV files, one that lists the file pairs to be compared according to --batch_input_csv, and one that has the MOS-LQS (mean subjective scores) that correspond to the same rows in the batch csv file under a 'moslqs' column.
Modify src/include/sim_results_writer.h to set output_fvnsim=true and output_moslqo=false
Run ViSQOLAudio in batch mode, using --batch_input_csv and --results_csv
Run scripts:make_svm_train_file on myvisqoloutput.csv
Run a grid search to find the SVM parameters. See the docs in scripts/make_svm_train_file.py for help with that.
This model can be passed into ViSQOL in audio mode using --similarity_to_quality_model

Currently, SVR is only supported for audio mode.
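
Since the MOS-LQS file has to line up row by row with the batch file, a quick consistency check before training can catch mismatches. The sketch below only verifies that a 'moslqs' column is present and that both files have the same number of data rows. `rows_are_aligned` is a hypothetical helper, and any further layout requirements of scripts/make_svm_train_file.py are not covered.

```python
import csv


def rows_are_aligned(batch_csv_path, moslqs_csv_path):
    """True if both CSVs have equally many data rows and a 'moslqs' column exists."""
    with open(batch_csv_path, newline="") as batch_file:
        batch_rows = list(csv.DictReader(batch_file))
    with open(moslqs_csv_path, newline="") as scores_file:
        reader = csv.DictReader(scores_file)
        score_rows = list(reader)
        has_column = "moslqs" in (reader.fieldnames or [])
    return has_column and len(batch_rows) == len(score_rows)
```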

License

Use of this source code is governed by an Apache v2.0 license that can be found in the LICENSE file.

Papers

There have been several papers that describe the design of the ViSQOL algorithm and compare it to other metrics. These three should serve as an overview:

ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric (2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX))

ViSQOL: an objective speech quality model (2015 EURASIP Journal on Audio, Speech, and Music Processing)

Objective Assessment of Perceptual Audio Quality Using ViSQOLAudio (The 2017 IEEE Transactions on Broadcasting)

FAQ

Why do I get a compile error about undeclared inclusion(s) in rule '//:visqol_lib'?

This may have to do with bazel being out of sync. You may need to run bazel clean --expunge and rebuild.

Why are the MOS predictions on my files so bad?

There are a number of possible explanations. Here are the most common ones:

In audio mode, ViSQOL was trained with clean references and full-band degraded files (audio containing frequencies up to 24 kHz) with bit rates as low as 24 kbps. If the bit rate of the degraded audio is lower than this, ViSQOL may behave poorly. If you have subjective scores, you might consider training your own model, as described in scripts/make_svm_train_file.py.
Another explanation is that too much silence is being analyzed. We recommend 3 to 10 seconds of audio (typically 5 seconds) that has significant activity in the reference audio.
ViSQOL is designed as a proxy for evaluating codecs and VoIP network degradations with a subjective test similar to ITU-T P.800. In practice, users try it for other use cases, such as denoising, regression testing on preprocessing, and deep learning-based generative models. ViSQOL performs reasonably for some of these, and poorly for others.
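
To get a rough idea of how much of a reference file is silence, a simple frame-wise RMS measure can be used. The sketch below is not ViSQOL's voice activity detector. The frame length and threshold are illustrative defaults, and `active_fraction` is a hypothetical helper.

```python
import math


def active_fraction(samples, frame_length=320, rms_threshold=0.01):
    """Fraction of full frames whose RMS exceeds `rms_threshold`.

    `samples` is a sequence of floats in [-1, 1]. The frame length and
    threshold are illustrative defaults, not ViSQOL's internal values.
    """
    frame_count = len(samples) // frame_length
    if frame_count == 0:
        return 0.0
    active = 0
    for index in range(frame_count):
        frame = samples[index * frame_length:(index + 1) * frame_length]
        rms = math.sqrt(sum(value * value for value in frame) / frame_length)
        if rms > rms_threshold:
            active += 1
    return active / frame_count
```

A low value suggests that the file is dominated by silence and may benefit from trimming.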

Acknowledgement

In addition to the contributions visible on the repository history, Colm Sloan and Feargus O'Gorman have significantly contributed to the codebase in the collaboration between Andrew Hines and Google.


visqol's Issues

Question about --batch_input_csv, --results_csv and --output_debug commands.

Using this command as example: bazel-bin\visqol.exe --batch_input_csv "input.csv" --results_csv "results.csv" --output_debug "debug.json"

Can you check if the command below is correct?

bazel-bin\visqol.exe --batch_input_csv "C:\AudioQuality\visqol-master\testevisqol\input.csv" --results_csv "C:\AudioQuality\visqol-master\testevisqol" --output_debug "C:\AudioQuality\visqol-master\testevisqol" --use_speech_mode --verbose

I'm not sure how this feature works...

I have created an input.csv file with the following structure:
reference, degraded
path_reference1.wav, path_degraded1.wav
path_reference2.wav, path_degraded2.wav ...

Regarding results.csv and debug.json: are these files created automatically? In the command I mentioned above, I only pointed to a folder path.

Can you help me?

Thank you and Regards.

MOS-LQO results are low in speech mode

We tried to apply VISQOL in the audio quality evaluation of a security camera device.
Here is our recording process:
Human voice -> Recorded by high-quality microphone (48kHz, 16bit, mono) -> Resample (16kHz, 16bit, mono) -> reference audio (REF.MONO.16KHZ.VOICE.01.wav)
Human voice -> Recorded by camera's microphone -> Resample (16kHz, 16bit, mono) -> degraded audio (DEG.MONO.16KHZ.VOICE.01.wav)
VISQOL command:
visqol --reference_file REF.MONO.16KHZ.VOICE.01.wav --degraded_file DEG.MONO.16KHZ.VOICE.01.wav --verbose --use_speech_mode
The returned MOS is 1.64007 (lower than we expected).
However, the MOS is 3.41819 when run in audio mode.

Is our test method OK? What do we need to do to improve the MOS results in speech mode?
Audio files

Error downloading [https://github.com/protocolbuffers/protobuf/releases/download/v3.11.1/protobuf-all-3.11.1.tar.gz]

When I execute bazel build :visqol -c opt, this error occurs.
That version of protobuf does not exist.
How can I fix it?

Compile succeeds in Windows but fails to run

So I managed to compile Visqol in windows using the following:

PS D:\Encode\Tools\Visqol> .\bazel-3.5.0-windows-x86_64.exe --output_user_root D:\Encode\Tools\Visqol\binary build :visqol -c opt

This worked out fine. If I try to run it I get a model error:

PS D:\Encode\Tools\test> D:\Encode\Tools\Visqol\bazel-bin\visqol.exe --reference_file "cloud_age_source.wav" --degraded_file "cloud_age_opus_256.wav" --verbose
[commandline_parser.cc : 193] RAW: File not found: D:\Encode\Tools\test/model/libsvm_nu_svr_model.txt
[main.cc : 28] RAW: INVALID_ARGUMENT: Failed to load the default SVR model D:\Encode\Tools\test/model/libsvm_nu_svr_model.txt. Specify the correct path using '--similarity_to_quality_model <path/to/libsvm_nu_svr_model.txt>'?

If I point it to the model folder like so:

PS D:\Encode\Tools\test> D:\Encode\Tools\Visqol\bazel-bin\visqol.exe --similarity_to_quality_model "D:\Encode\Tools\Visqol\model/libsvm_nu_svr_model.txt" --reference_file "cloud_age_source.wav" --degraded_file "cloud_age_opus_256.wav" --verbose

It doesn't throw an error but it does absolutely nothing. No output, no error, nothing. It seems to be running but frozen or just doing nothing.

Debug JSON output invalid

Report from user: Curly brackets need comma between them.

visqol not found

Hi, I didn't find visqol under bazel-bin after building on Ubuntu. Could you give any advice on how to fix this? Thanks

Building fails when compiling under Fedora 35 with GCC 11 (w/Workaround)

Hi,

What the title says:

I tried to build with:

~/g/visqol (master)> bazel build :visqol -c opt

ERROR: /home/nomanos/.cache/bazel/_bazel_nomanos/f39c3da94de7eb0b5ebe28033691f633/external/com_google_absl/absl/synchronization/BUILD.bazel:30:11: Compiling absl/synchronization/internal/graphcycles.cc failed: (Exit 1): gcc failed: error executing command (from target @com_google_absl//absl/synchronization:graphcycles_internal) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 35 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox
external/com_google_absl/absl/synchronization/internal/graphcycles.cc: In member function 'void absl::lts_2020_09_23::synchronization_internal::GraphCycles::RemoveNode(void*)':
external/com_google_absl/absl/synchronization/internal/graphcycles.cc:451:26: error: 'numeric_limits' is not a member of 'std'
  451 |   if (x->version == std::numeric_limits<uint32_t>::max()) {
      |                          ^~~~~~~~~~~~~~
external/com_google_absl/absl/synchronization/internal/graphcycles.cc:451:49: error: expected primary-expression before '>' token
  451 |   if (x->version == std::numeric_limits<uint32_t>::max()) {
      |                                                 ^
external/com_google_absl/absl/synchronization/internal/graphcycles.cc:451:52: error: '::max' has not been declared; did you mean 'std::max'?
  451 |   if (x->version == std::numeric_limits<uint32_t>::max()) {
      |                                                    ^~~
      |                                                    std::max
In file included from /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/algorithm:62,
                 from external/com_google_absl/absl/synchronization/internal/graphcycles.cc:38:
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_algo.h:3467:5: note: 'std::max' declared here
 3467 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
Target //:visqol failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 23.325s, Critical Path: 7.52s
INFO: 322 processes: 48 internal, 274 linux-sandbox.
FAILED: Build did NOT complete successfully

Workaround:

I managed to resume building by adding #include <limits> in bazel-visqol/external/com_google_absl/absl/synchronization/internal/graphcycles.cc:40 and running bazel build :visqol -c opt from the ViSQOL root again. However, I am reporting this issue here in case someone can implement a more permanent, less hacky solution, since I am not familiar with Bazel.

GCC Ver: gcc (GCC) 11.2.1 20211203 (Red Hat 11.2.1-7)

Out of memory with many files in batch mode

Hi,
thanks for making this implementation available.

Tested version: master as of April 15, 30abce1332b961b9b0234a22785e9de95fbcdb8e
Machine platform. Arch Linux current, x86_64. Intel CPU, 16 GB RAM

Steps to reproduce:

Use visqol -batch_input_csv ... with a CSV with many pairs of audio.
The first file I tested had 930 pairs of samples of 5 seconds each, with a sample rate of 44.1kHz

Expected result

visqol will eventually complete all the files, write the CSV output and exit with exit code 0

Actual result

visqol was killed by Linux after 35 minutes, after processing around 750 files.

Retesting with a smaller subset of the data (45 files) the program completes successfully. But when looking at the memory usage of the process, it seems to grow linearly. Around 250 MB resident for 45 files.

Looking at VisqolManager::Run, I see the AudioSamples being loaded with MiscAudio::LoadAsMono, containing an AMatrix with the audio data. But I do not see any destructor in AudioSignal or AMatrix, nor any manual cleanup of these after a sample pair has been processed.

Could it be that these are never freed?
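
Since the smaller run of 45 files completed, one possible workaround until the cause is found is to split a large batch into smaller chunks and run visqol once per chunk. This is an untested assumption, not a confirmed fix. `chunk_pairs` is a hypothetical helper that only does the splitting.

```python
def chunk_pairs(file_pairs, chunk_size):
    """Split a list of (reference, degraded) pairs into lists of at most `chunk_size`."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [file_pairs[start:start + chunk_size]
            for start in range(0, len(file_pairs), chunk_size)]
```

Each chunk could then be written to its own CSV and passed to a separate visqol invocation, so that memory is released when each process exits.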

Degraded audio sample rate: 0.

Hi, I'm taking my first steps with visqol and have the following problem:
I have two files (good and bad), both with a 16kHz sample rate.
When I try to compare these files, I get an error saying that the degraded sample has a sample rate of 0, as below:

C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\bazel-bin>visqol.exe --reference_file "C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\testdata\clean_speech\mysample\good_16000_Hz.wav" --degraded_file "C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\testdata\clean_speech\mysample\bad_16000_Hz.wav" --verbose --use_speech_mode --similarity_to_quality_model C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\model\libsvm_nu_svr_model.txt
[wav_reader.cc : 174] RAW: Error parsing WAV Header - Expected 16bit samples.
[misc_audio.cc : 143] RAW: Error reading header for file C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\testdata\clean_speech\mysample\good_16000_Hz.wav.
[wav_reader.cc : 174] RAW: Error parsing WAV Header - Expected 16bit samples.
[misc_audio.cc : 143] RAW: Error reading header for file C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\testdata\clean_speech\mysample\bad_16000_Hz.wav.
[main.cc : 57] RAW: Error executing ViSQOL: INVALID_ARGUMENT: Input audio signals have different sample rates! Reference audio sample rate: 138441597507072. Degraded audio sample rate: 0.

But after changing the sample rate of these files to 48kHz, it works, and they can be compared with the following result:

"C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\testdata\clean_speech\mysample\good1_16000_Hz.wav" --degraded_file "C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\testdata\clean_speech\mysample\bad1_16000_Hz.wav" --verbose --use_speech_mode --similarity_to_quality_model C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\model\libsvm_nu_svr_model.txt
[visqol_manager.cc : 227] RAW: Input audio sample rate is above 16kHz, which may have undesired effects for speech mode. Consider resampling to 16kHz.
ViSQOL conformance version: 310
Speech mode

Reference Filepath: C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\testdata\clean_speech\mysample\good1_16000_Hz.wav
Degraded Filepath: C:\Users\user1\scoop\apps\bazel\4.2.1\visqol_test\visqol\testdata\clean_speech\mysample\bad1_16000_Hz.wav
MOS-LQO: 1.59276

| FVNSIM | Freq Band |

| 0.401463 | 50.000Hz |
| 0.490032 | 98.767Hz |
| 0.458079 | 156.063Hz |
| 0.417346 | 223.380Hz |
| 0.361086 | 302.471Hz |
| 0.430890 | 395.394Hz |
| 0.422619 | 504.570Hz |
| 0.387232 | 632.839Hz |
| 0.394989 | 783.543Hz |
| 0.428883 | 960.604Hz |
| 0.433549 | 1168.633Hz |
| 0.392224 | 1413.046Hz |
| 0.373587 | 1700.205Hz |
| 0.323136 | 2037.587Hz |
| 0.293950 | 2433.977Hz |
| 0.344104 | 2899.694Hz |
| 0.340415 | 3446.863Hz |
| 0.315738 | 4089.731Hz |
| 0.345335 | 4845.034Hz |
| 0.382120 | 5732.437Hz |
| 0.340726 | 6775.044Hz |

Can you please help me? What am I doing wrong?

Mentioned files attached
mysample.zip

Thanks in advance
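
The "Expected 16bit samples" lines in the first log suggest that the 16kHz files may not be stored as 16-bit PCM. That reading is an inference from the log, not a confirmed diagnosis. A quick way to inspect a file's sample format with Python's standard `wave` module is sketched below. `describe_wav_format` is a hypothetical helper.

```python
import wave


def describe_wav_format(wav_path):
    """Return a short note on why a WAV may be rejected, or None if it is 16-bit PCM."""
    try:
        with wave.open(wav_path, "rb") as wav_file:
            bits = 8 * wav_file.getsampwidth()
    except wave.Error as error:
        # Raised by the standard library for data it cannot parse as PCM.
        return f"not plain PCM: {error}"
    if bits != 16:
        return f"{bits}-bit samples, expected 16-bit"
    return None
```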

Fail to build on Windows

I have tried to build Visqol on Windows 10 but got an error message with Bazel:

ERROR: C:/visqol-master/BUILD:36:11: //:visqol_lib depends on @armadillo_headers//:armadillo_header in repository @armadillo_headers which failed to fetch. no such package '@armadillo_headers//': java.io.IOException: Error downloading [http://sourceforge.net/projects/arma/files/armadillo-9.860.2.tar.xz] to C:/users/peiqi/_bazel_peiqi/d3jan4lu/external/armadillo_headers/armadillo-9.860.2.tar.xz: Redirect loop detected
ERROR: Analysis of target '//:visqol' failed; build aborted: Analysis failed
INFO: Elapsed time: 129.077s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (34 packages loaded, 278 targets configured)
    currently loading: @com_google_protobuf//

Bazel version is 3.5.0 (also tried 3.7.0)
Visual Studio 2019

Thanks for your help.

ViSQOL speech mode compresses low MOS examples around 2.4

ViSQOL in speech mode has an issue only with scaled MOS values below 2.4, which are compressed to near a MOS of 2.4. I am working on a solution that changes the mapping function from NSIM to MOS. Since this will change the scores and require a conformance version bump, I am looking into the details, which may take a week or two. In the meantime, if anyone would like a patch to work around this issue, I could provide that, but it may not be the final version.

Does visqol use gpu? Best settings for evaluating noise supression?

Hi thanks for the repo.

Quick question: when I am running visqol I am not seeing any GPU usage. Should I be? Perhaps my Bazel did not install correctly, or the version of TF being used is not utilizing the GPU. I am running over thousands of files and it's taking quite some time...

Also, I wanted to check what the best settings are for evaluating noise suppression using visqol. I see the two flags
--use_speech_mode --use_unscaled_speech_mos_mapping. If I use these, might it ignore some bands of noise that may be present in the file (I see it's sensitive up to 8kHz)? Should I run visqol in audio mode and speech mode and average the two (perhaps a weighted avg)?

Thanks for your guidance in advance.

pffft repo down

I received the following error during build.

ERROR: An error occurred during the fetch of repository 'pffft_lib_linux':
java.io.IOException: Error downloading [https://bitbucket.org/jpommier/pffft/get/29e4f76ac53b.zip] to /home/user/.cache/bazel/_bazel_user/631aac0edc06f05dddb4a0ea29b61903/external/pffft_lib_linux/29e4f76ac53b.zip: GET returned 403 Forbidden

Looks like https://bitbucket.org/jpommier is down.

Too few samples

I obtain this error for various files although the reference files are long enough and can for example be predicted with POLQA.

[comparison_patches_selector.cc : 331] RAW: Error building ref spectrogram: INVALID_ARGUMENT: Too few samples (484) in signal to build spectrogram (320 required minimum).
[visqol_manager.cc : 119] RAW: Error executing ViSQOL: INVALID_ARGUMENT: Too few samples (484) in signal to build spectrogram (320 required minimum)..

ViSQOL uses protobuf-internal headers

Hello,

while building ViSQOL with CMake against a CMake-built protobuf I noticed that ViSQOL uses protobuf-internal headers like src/google/protobuf/stubs/statusor.h which won't be available in a CMake-protobuf installation.

I opened an issue in protocolbuffers/protobuf#7358 and asked to align both: the CMake and Bazel buildsystem in terms of installed or available headers. Based on the outcome you might lose access to StatusOr.

Are you aware of any replacement for StatusOr (e.g. in abseil-cpp)?

Thanks,
Gregor

Shows Build did NOT complete successfully when building visqol

Hi, I tried to build visqol on Windows 10 (x64, Bazel version 5.3.0), but it gives an error: An error occurred during the fetch of repository 'local_execution_config_python'

Building Python Bindings

I am running into some issues when trying to build the Python bindings.
Would it be possible to add some documentation on how to do so?

Unit Tests Failing on Windows with File Not Found/File Missing Errors

Hi,

While I'm able to build the current version of visqol on Windows, when I run bazel test all_unit_tests from the root project directory (containing the workspace file), the test suite appears to be failing:

//:commandline_parser_test                                               FAILED in 0.3s
//:gammatone_spectrogram_builder_test                                    FAILED in 0.3s
//:misc_audio_test                                                       FAILED in 0.3s
//:vad_patch_creator_test                                                FAILED in 0.3s
//:visqol_api_test                                                       FAILED in 6 out of 15 in 0.6s
  Stats over 15 runs: max = 0.6s, min = 0.3s, avg = 0.5s, dev = 0.1s
//:visqol_manager_test                                                   FAILED in 12 out of 15 in 0.5s
  Stats over 15 runs: max = 0.5s, min = 0.3s, avg = 0.4s, dev = 0.1s

Executed 17 out of 17 tests: 11 tests pass and 6 fail locally.

Looking into the failed test logs, I see notes like this (from misc_audio_test):

[misc_audio.cc : 98] RAW: Could not find file testdata/clean_speech/CA01_01.wav.

I can confirm the wave files are in the appropriate testdata directory under the project root. For reference, here is the output of a tree command for the testdata dir:

C:\GITPROJECTS\VISQOL\TESTDATA
│   BUILD
│
├───alignment
│       degraded.wav
│       reference.wav
│
├───clean_speech
│       CA01_01.wav
│       transcoded_CA01_01.wav
│
├───conformance_testdata_subset
│       BUILD
│       castanets48_stereo.wav
│       contrabassoon48_stereo.wav
│       contrabassoon48_stereo_24kbps_aac.wav
│       glock48_stereo.wav
│       glock48_stereo_48kbps_aac.wav
│       guitar48_stereo.wav
│       guitar48_stereo_64kbps_aac.wav
│       harpsichord48_stereo.wav
│       harpsichord48_stereo_96kbps_mp3.wav
│       moonlight48_stereo.wav
│       moonlight48_stereo_128kbps_aac.wav
│       ravel48_stereo.wav
│       ravel48_stereo_128kbps_opus.wav
│       README
│       sopr48_stereo.wav
│       sopr48_stereo_256kbps_aac.wav
│       steely48_stereo.wav
│       steely48_stereo_lp7.wav
│       strauss48_stereo.wav
│       strauss48_stereo_lp35.wav
│
├───example_batch
│       batch_input.csv
│
├───filtered_freqs
│       guitar48_stereo_10k_filtered_freqs.wav
│
├───long_duration
│   └───1_min
│           guitar48_stereo_deg_1min.wav
│           guitar48_stereo_deg_25s.wav
│           guitar48_stereo_ref_1min.wav
│           guitar48_stereo_ref_25s.wav
│
├───mismatched_duration
│       guitar48_stereo_middle_2sec_cut.wav
│       guitar48_stereo_middle_50ms_cut.wav
│       guitar48_stereo_x2.wav
│
├───non_48k_sample_rate
│       guitar48_stereo_44100Hz.wav
│
├───short_duration
│   ├───10000_sample
│   │       guitar48_stereo_10000_sample.wav
│   │
│   ├───1000_sample
│   │       guitar48_stereo_1000_sample.wav
│   │
│   ├───100_sample
│   │       guitar48_stereo_100_sample.wav
│   │
│   ├───10_sample
│   │       guitar48_stereo_10_sample.wav
│   │
│   ├───1_sample
│   │       guitar48_stereo_1_sample.wav
│   │
│   ├───1_second
│   │       guitar48_stereo_1_sec.wav
│   │
│   └───5_second
│           guitar48_stereo_5_sec.wav
│
├───svr_training
│       training_mat_tcdaudio14_aacvopus15_fvnsims.txt
│       training_mat_tcdaudio14_aacvopus15_moslqs.txt
│
└───test_model
        cpp_model.txt

There are also entries like this (from visqol_manager_test):

[visqol_manager.cc : 65] RAW: INVALID_ARGUMENT: Failed to load the SVR model file: C:\users\USERNAME\_bazel_USERNAME\amczxuak\execroot\__main__\bazel-out\x64_windows-opt\bin\visqol_manager_test.exe.runfiles\__main__/model/libsvm_nu_svr_model.txt

This file is missing (there are no files under C:\users\USERNAME\_bazel_USERNAME\amczxuak\execroot\__main__\bazel-out\x64_windows-opt\bin\visqol_manager_test.exe.runfiles\__main__/ )

Are there some other command line flags or other setup work required when running the tests on Windows?

I'm happy to provide more detailed test logs if needed, or more version info.

Usage in Android

Can we use this on Android via the NDK?

Audio mode vs Speech mode and macos install

Would audio mode generally be used for audio files other than speech (e.g. a piano recording), and speech mode only for people talking?

Additionally, I can't find any installation instructions for macOS.

Thanks

SegFault with Python bindings

I installed ViSQOL with the Python bindings as described in the setup section. When I run the Python example snippet, I get a segmentation fault at the following line:

api.Create(config)

Is this happening to anyone else?

Build the bazel failed

I tried to build with Bazel as described in the README (bazel build :visqol -c opt), but I get the following error:

external/org_tensorflow/tensorflow/lite/kernels/internal/optimized/optimized_ops.h:3603:41:   required from here
external/eigen_archive/Eigen/src/Core/AssignEvaluator.h:889:3: error: static assertion failed: YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY
Target //:visqol failed to build

Is there a solution for this problem?
(gcc version 6.5.0, Ubuntu 19.04)

Windows Build Requires `git`

Hi,

Building visqol on Windows requires git, but it is not listed as a requirement in the build section.

This seems like an easy thing to add, I'd volunteer to add it.

Andrew

Build failed on windows (unable to fetch llvm-raw)

Hi,

I'm trying to build visqol on Windows 11 (x64, Bazel version 5.3.2), but it seems unable to fetch the llvm-raw repository.

Here is what it says:

Build did NOT complete successfully

Hello,
I failed to run the build command. The error log is attached. Can you help me?
gcc version: 7.5.0
Ubuntu version: 18.04
build.zip

Seg fault caused by calling Visqol::VisqolApi::Measure

I found that with longer inputs (longer than 10 seconds), an application using ViSQOL may crash with a segmentation fault.
The crash appears only when I use multiple threads, each of which may call VisqolApi::Measure.
This is the GDB backtrace I caught several times:

memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
Visqol::AMatrix::AMatrix(absl...
Visqol::VisqolApi::Measure:

This matters for our use case.
After reading the ViSQOL documentation and the code comments, I did not find any statement that MOS measurement can be run from multiple threads.
Of course, I can add a mutex or a signal handler.

Just to be sure that I'm using the ViSQOL API correctly:
can we assume that the current implementation is only safe to call from a single thread,
and that with multiple threads we would need to apply synchronization?
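The mutex approach mentioned above can be sketched as follows. This is only an illustration of serializing calls behind one lock; the thread does not confirm whether Measure is thread-safe, and `SerializedMeasure` and its `measure_fn` argument are hypothetical names, not part of the ViSQOL API. It is written in Python for brevity; the same pattern applies in C++ with a `std::mutex`.

```python
import threading


class SerializedMeasure:
    """Wrap a measure callable so that only one thread runs it at a time."""

    def __init__(self, measure_fn):
        # measure_fn stands in for something like a bound Measure method
        # (assumption: it takes a reference and a degraded signal).
        self._measure_fn = measure_fn
        self._lock = threading.Lock()

    def __call__(self, reference, degraded):
        # The lock is held for the whole call, so concurrent callers queue up.
        with self._lock:
            return self._measure_fn(reference, degraded)
```

Every worker thread would then call the shared wrapper instead of the underlying function. This trades throughput for safety; using a separate API instance per thread is another option, but whether that avoids the crash is not established here.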

The performance on P.Sup23 Dataset is not as expected

Hi, I tested ViSQOL (both audio mode and speech mode) on the ITU P.Sup23 dataset and made two scatter plots containing the results of ViSQOL and PESQ, but the performance of ViSQOL is not as expected.

As you can see, the correlation between PESQ results and subjective MOS is higher. Did I do something wrong, or is there something extra I need to do? Do you have another dataset for testing? There seem to be very few public datasets with MOS.
Thanks.

Use of Visqol in lieu of Polqa

Hi,

I'm looking into ViSQOL as a replacement for, or complement to, POLQA.

I have several campaigns consisting of various inbound and outbound calls with recorded samples.

When performing the audio analysis with the POLQA system, I can correlate the MOS result with the audio, i.e. bad scores when the sound is degraded and good ones when the audio is of good quality.

When using ViSQOL as a drop-in replacement, I don't see this correlation. Most samples score in a middle zone (around 2), including samples that are 'blank' between calls (i.e. no audio) and some with good quality (scored 4+ by POLQA).

For the silent parts, I would expect a score of 0.00 or close to 1.0, as given by POLQA.

The command line used is:
visqol --degraded_file /share/record.210317-114801.wav --reference_file /share/French_SWB_f1s3_m1s3_8s.wav

When using the --use_speech_mode flag, the gap between a silent sample and a voice sample widens: the silence score is lower, but still significantly higher than expected (1.45 vs. 0.0 or close to 1.0).

What would be the appropriate way to use Visqol as a Polqa replacement?

Thanks
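When comparing many recordings like this, it can help to collect the scores programmatically. The sketch below extracts the value from a `MOS-LQO: ...` line, the format that appears in the ViSQOL outputs quoted in these issues. The function name `parse_mos_lqo` is made up for this example, and the exact layout of the surrounding output is an assumption.

```python
import re

# Matches lines such as "MOS-LQO: 2.52259".
MOS_PATTERN = re.compile(r"MOS-LQO:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_mos_lqo(cli_output):
    """Return the first MOS-LQO value found in the text, or None if absent."""
    match = MOS_PATTERN.search(cli_output)
    return float(match.group(1)) if match else None
```

Collecting the parsed scores next to the POLQA scores for the same recordings makes it easier to see whether the two metrics disagree everywhere or only on particular samples, such as the silent ones.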

compile problem

ERROR: An error occurred during the fetch of repository 'svm_lib':
java.io.IOException: Error downloading [https://github.com/cjlin1/libsvm/archive/v324.zip] to /home/shuqinjun/.cache/bazel/_bazel_shuqinjun/97cbfcd434c59c5b83328fa3fb73d0de/external/svm_lib/v324.zip: Tried to reconnect at offset 589,812 but server didn't support it

I am building on Ubuntu 16.04 but get the error above. Can anyone help?

Voice activity detection/patch alignment

Hello!

I have some questions regarding the behaviour of the patch detection and alignment.

Firstly, I would like to clarify: in ViSQOL's --verbose output, are the patch start and end times displayed before or after the global alignment step? I.e., if there is an overall 2 second delay in the degraded signal, which to my understanding is detected during global alignment, will the patch times show pre- or post-alignment values?

Secondly, do the sample guidelines mentioned in the README (8-10 seconds long, 0.5 seconds of silence at the beginning and end, not much silence in the middle of the sample) apply to speech mode too, or should the alignment and voice activity detection of speech mode handle audio with delay and with a lot of silence in the sample?

An example of a sample that I am currently trying to use with ViSQOL:
View from Audacity:

Visqol Speech mode output
MOS-LQO: 2.52259

| FVNSIM | Freq Band |

| 0.377030 | 50.000Hz |
| 0.476088 | 98.767Hz |
| 0.459481 | 156.063Hz |
| 0.763644 | 223.380Hz |
| 0.835369 | 302.471Hz |
| 0.923244 | 395.394Hz |
| 0.926039 | 504.570Hz |
| 0.903042 | 632.839Hz |
| 0.884796 | 783.543Hz |
| 0.844095 | 960.604Hz |
| 0.841896 | 1168.633Hz |
| 0.866645 | 1413.046Hz |
| 0.860731 | 1700.205Hz |
| 0.856387 | 2037.587Hz |
| 0.825984 | 2433.977Hz |
| 0.823645 | 2899.694Hz |
| 0.745470 | 3446.863Hz |
| 0.706072 | 4089.731Hz |
| 0.700011 | 4845.034Hz |
| 0.635031 | 5732.437Hz |
| 0.547395 | 6775.044Hz |

| Patch Idx | Similarity | Ref Patch: Start - End | Deg Patch: Start - End |

| 0 | 1.000000 | 0.180 - 0.580 | 1.440 - 1.840 |
| 1 | 0.764560 | 2.181 - 2.580 | 2.180 - 2.579 |
| 2 | 0.772817 | 2.580 - 2.980 | 2.580 - 2.980 |
| 3 | 0.843457 | 3.780 - 4.180 | 3.780 - 4.180 |
| 4 | 0.814809 | 4.180 - 4.580 | 4.180 - 4.580 |
| 5 | 0.780449 | 4.580 - 4.980 | 4.580 - 4.980 |
| 6 | 0.699916 | 5.380 - 5.780 | 5.380 - 5.780 |
| 7 | 0.773998 | 5.781 - 6.180 | 5.780 - 6.179 |
| 8 | 0.693399 | 6.181 - 6.580 | 6.180 - 6.579 |
| 9 | 0.529567 | 6.580 - 6.980 | 6.560 - 6.960 |
| 10 | 0.728254 | 8.180 - 8.580 | 8.180 - 8.580 |
| 11 | 0.673384 | 8.580 - 8.980 | 8.580 - 8.980 |
| 12 | 0.707640 | 8.980 - 9.380 | 8.980 - 9.380 |

For reference, Visqol Audio mode output:
MOS-LQO: 3.41303

| FVNSIM | Freq Band |

| 0.533289 | 50.000Hz |
| 0.544615 | 91.748Hz |
| 0.645831 | 139.746Hz |
| 0.804246 | 194.931Hz |
| 0.902527 | 258.379Hz |
| 0.936618 | 331.326Hz |
| 0.961259 | 415.195Hz |
| 0.957324 | 511.621Hz |
| 0.950879 | 622.484Hz |
| 0.941872 | 749.946Hz |
| 0.922163 | 896.492Hz |
| 0.927609 | 1064.979Hz |
| 0.931144 | 1258.694Hz |
| 0.944402 | 1481.411Hz |
| 0.929926 | 1737.475Hz |
| 0.933558 | 2031.877Hz |
| 0.926355 | 2370.358Hz |
| 0.924536 | 2759.518Hz |
| 0.879479 | 3206.945Hz |
| 0.863854 | 3721.361Hz |
| 0.881097 | 4312.798Hz |
| 0.862002 | 4992.786Hz |
| 0.802238 | 5774.585Hz |
| 0.704898 | 6673.438Hz |
| 0.588221 | 7706.870Hz |
| 0.578189 | 8895.030Hz |
| 0.593581 | 10261.087Hz |
| 0.599670 | 11831.674Hz |
| 0.602659 | 13637.414Hz |
| 0.621666 | 15713.517Hz |
| 0.694320 | 18100.460Hz |
| 0.786449 | 20844.785Hz |

| Patch Idx | Similarity | Ref Patch: Start - End | Deg Patch: Start - End |

| 0 | 1.000000 | 0.280 - 0.880 | 1.200 - 1.800 |
| 1 | 1.000000 | 0.880 - 1.480 | 1.220 - 1.820 |
| 2 | 1.000000 | 1.480 - 2.079 | 1.241 - 1.840 |
| 3 | 0.681553 | 2.081 - 2.680 | 2.080 - 2.679 |
| 4 | 0.632587 | 2.680 - 3.280 | 2.680 - 3.280 |
| 5 | 0.891469 | 3.280 - 3.880 | 3.280 - 3.880 |
| 6 | 0.648068 | 3.880 - 4.480 | 3.880 - 4.480 |
| 7 | 0.681742 | 4.480 - 5.080 | 4.480 - 5.080 |
| 8 | 0.599973 | 5.080 - 5.680 | 5.080 - 5.680 |
| 9 | 0.611158 | 5.681 - 6.280 | 5.680 - 6.279 |
| 10 | 0.511901 | 6.280 - 6.880 | 6.280 - 6.880 |
| 11 | 0.938552 | 6.880 - 7.480 | 6.880 - 7.480 |
| 12 | 0.884550 | 7.480 - 8.075 | 7.505 - 8.100 |
| 13 | 0.611468 | 8.080 - 8.678 | 8.082 - 8.680 |
| 14 | 0.567279 | 8.680 - 9.280 | 8.680 - 9.280 |
| 15 | 0.990090 | 9.280 - 9.880 | 10.420 - 11.020 |
| 16 | 1.000000 | 9.880 - 10.480 | 10.440 - 11.040 |
| 17 | 1.000000 | 10.482 - 11.080 | 10.460 - 11.058 |
| 18 | 0.995017 | 11.080 - 11.649 | 10.591 - 11.160 |

The audio sample itself is a voice recording.
Essentially, what I am trying to figure out is: could feeding ViSQOL voice samples with delay and a lot of silence be the culprit behind the questionable scores we've been getting, or should we look for problems elsewhere?
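To spot patches that were matched far from their reference position, the patch rows of the verbose output can be parsed and the start-time difference computed per patch. The sketch below assumes rows shaped exactly like the ones quoted above (index, similarity, reference start and end, degraded start and end). The name `patch_offsets` is made up for this example, and the code does not say anything about whether the printed times are pre- or post-alignment.

```python
import re

# One patch row: | idx | similarity | ref start - ref end | deg start - deg end |
PATCH_ROW = re.compile(
    r"^\|\s*(\d+)\s*\|\s*([0-9.]+)\s*\|"
    r"\s*([0-9.]+)\s*-\s*([0-9.]+)\s*\|"
    r"\s*([0-9.]+)\s*-\s*([0-9.]+)\s*\|\s*$"
)


def patch_offsets(verbose_output):
    """Yield (patch index, similarity, degraded start minus reference start)."""
    for line in verbose_output.splitlines():
        m = PATCH_ROW.match(line.strip())
        if m:
            idx, sim, ref_start, _ref_end, deg_start, _deg_end = m.groups()
            offset = round(float(deg_start) - float(ref_start), 3)
            yield int(idx), float(sim), offset
```

Applied to the speech mode output above, patch 0 comes out with an offset of 1.26 s, while the remaining patches are within about 20 ms of their reference position. The two-column FVNSIM rows and the header rows do not match the pattern and are skipped.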

float32 numpy arrays don't work with Python API

VisqolApi::Measure() when accessed through the python interface requires float64 ndarrays and will throw a type error on float32 ndarrays. Measure() is a native function that pybind11 translates. This may be a python casting/typing issue. The workaround is simple: cast to float64.
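The cast described above could look like the following minimal sketch. It uses NumPy's `ascontiguousarray`; the helper name `as_float64` is made up for this example, and the need for a contiguous array is an assumption rather than something the issue states.

```python
import numpy as np


def as_float64(signal):
    """Return the signal as a contiguous float64 array, copying only if needed."""
    return np.ascontiguousarray(signal, dtype=np.float64)
```

Both the reference and the degraded array would be passed through the helper before being handed to Measure().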

ImportError: initialization failed when trying to import in python

Hi all,
I'm trying to get ViSQOL to work through python
I followed all the installation steps and got the main script to work from the command line.
However, when I run the example code mentioned in the README, I hit this error:

>>> from visqol import visqol_lib_py
Add a python dependency on "@com_google_protobuf//:protobuf_python"
ModuleNotFoundError: No module named 'google'
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ModuleNotFoundError: No module named 'google'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: initialization failed

It seems an extra dependency is needed, but I'm having a hard time deciphering the error message.
What exactly am I missing?
Thanks in advance :)

Sampling rate conversion

If I were to have an audio file of a person speaking with a 48k sampling rate, would converting the file to 16k and running it in speech mode or leaving it as 48k and running it in speech mode despite the warning yield better results?

Does the test speech need to have the same length as the reference speech?

Hi, I would like to know whether ViSQOL can be used when the length of the test speech is different from that of the reference speech.

speech mode: Patch 0 is always the beginning of the file, even when there is no voice activity

Example output from files with -50db silence in the first second:

| Patch Idx | Similarity | Ref Patch: Start - End | Deg Patch: Start - End |

| 0 | 1.000000 | 0.186 - 0.580 | 0.580 - 0.974 |
| 1 | 0.384118 | 1.380 - 1.780 | 1.420 - 1.820 |
| 2 | 0.522233 | 2.180 - 2.580 | 2.180 - 2.580 |
| 3 | 0.742840 | 6.180 - 6.580 | 6.180 - 6.580 |
| 4 | 0.688050 | 10.180 - 10.576 | 10.184 - 10.580 |
| 5 | 0.905596 | 10.580 - 10.980 | 10.580 - 10.980 |

Build failed on windows

Building the solution on Windows is hard, and I still haven't figured it out. Has anyone succeeded in setting this up on Windows?

Difference in spectrogram implementation compared to the paper - potential degradation

Hi, I've read the papers describing ViSQOL, and tried to wrap my head around the implementation you provided.

The second paper, "Objective Assessment of Perceptual Audio Quality Using ViSQOLAudio", describes the spectrogram procedure (Part III.C) as:

"a short-time Fourier transform is performed with a 32 band Gammatone filter bank with a minimum frequency of..."

And in the following paper, "ViSQOL v3", a change is described to use an 80ms window with 25% overlap.

When examining the code, I saw that the Gammatone filterbank is implemented directly in the time domain. However, I noticed that no windowing function is applied (the paper describes a Hamming window, and some comments in the code still refer to it). Won't this cause windowing artifacts that degrade the spectrogram? Is the change to an 80ms window enough to counteract this?

When run outside of the visqol/ directory speech mode requires --similarity_to_quality_model but should not

Currently --similarity_to_quality_model is required for speech mode even though that model file is not used (the exponential mapping is always used for speech).
As a workaround, passing model/libsvm_nu_svr_model.txt as --similarity_to_quality_model allows running from external directories.
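When ViSQOL is driven from a script, the workaround amounts to always passing an explicit model path. The sketch below assembles the argument list using only flags quoted in these issues (`--reference_file`, `--degraded_file`, `--use_speech_mode`, `--similarity_to_quality_model`). The function name `build_visqol_command` and the example paths are made up for illustration.

```python
def build_visqol_command(binary, reference, degraded,
                         model_path=None, speech_mode=False):
    """Assemble an argv list for the visqol CLI (hypothetical helper)."""
    cmd = [binary, "--reference_file", reference, "--degraded_file", degraded]
    if speech_mode:
        cmd.append("--use_speech_mode")
    if model_path is not None:
        # An absolute model path lets the command run from any directory.
        cmd += ["--similarity_to_quality_model", model_path]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)`. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.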

Windows installation

I have a problem with the Windows installation. Can anyone explain step 3 (Build ViSQOL) in more detail?

Thanks

symbol not found in flat namespace on MacOS

I got an error like

dyld[2485]: symbol not found in flat namespace (_CFRelease)
zsh: abort ./bazel-bin/visqol --reference_file XXXXX

when I tried to run ./bazel-bin/visqol on macOS.

Are there any solutions for this problem?

PS: I installed Bazel through Bazelisk with brew install bazelisk, which gives me Bazel version 5.3.0. Could that be the problem? If so, how can I use Bazel version 5.1.0, as indicated in the README file?

Thanks so much in advance

Feature request: pip-installable package with Python API

Thanks for making visqol! Would you consider making it easy to install and use in a Python environment?

Gammatone

I saw a reference in the code to Dan Ellis's Gammatone spectrogram: https://labrosa.ee.columbia.edu/matlab/gammatonegram/
Which version of Dan Ellis's Gammatone spectrogram was used, the fast one or the accurate one?

Do not get the maximum of MOS value using two same audio under speech mode

Hi, thanks for the good work!
When running in speech mode with two identical audio files sampled at 16 kHz, many of the MOS values are around 4.4-4.6 and do not reach the maximum value of 5.0. However, the NSIM score and the similarity of all audio segments are 1.0. Is this normal?
I got these results using the SVR model you provided: "lattice_tcditugenmeetpackhref_ls2_nl60_lr12_bs2048_learn.005_ep2400_train1_7_raw.tflite"

Need a direct version of visqol

Thanks for the good work.
However, it needs to be compiled with Bazel first. Bazel always downloads dependencies, which is not friendly to secure network environments where access to outside networks is controlled.
We need a directly usable version on Linux for testing. A Python script that loads the model and makes MOS predictions would also be fine.
Can anyone help?

Building fails under Ubuntu 18.04 with GCC 7

Hi thanks for the repo. I am compiling visqol in Ubuntu 18.04 with GCC 7.5.0

bazel build :visqol -c opt

I get the filesystem include error.

Then I found similar issue in here https://stackoverflow.com/questions/73974753/undefined-reference-to-stdfilesystem-using-bazel-build
Now I am trying to use gcc 8 to compile visqol based on the link above.

Just a quick question: is there a gcc version that you recommend, gcc 8 or above?

Duration of the Files

What is the maximum file length (duration) we can pass to ViSQOL? Can we pass files longer than 8-10 seconds? How will it handle 2-minute input/output files?

Cannot compile on Windows

Hello

I have tried to compile Visqol on Windows 10 but I have an error message with Bazel:

Cannot open include file: 'boost/filesystem.hpp': No such file or directory

Boost is installed in c:\boost, and I have checked that this matches the WORKSPACE file.
I also tried to clean the cache with bazel clean --expunge.

Bazel version is 1.0.0 (to avoid bash usage in previous version)
Boost version is 1.73
Visual Studio 2019

Thanks for your help.

Getting error - no such package '@com_google_protobuf//

I'm trying to build ViSQOL with following command - bazel build :visqol -c opt, but I'm getting the following error

ERROR: no such package '@com_google_protobuf//': java.io.IOException: Error downloading [https://github.com/protocolbuffers/protobuf/releases/download/v3.11.1/protobuf-all-3.11.1.tar.gz] to /private/var/tmp/_bazel_n689415/9b94f57562fb1bf28a7df8c596c92f3e/external/com_google_protobuf/temp9151135432370956210/protobuf-all-3.11.1.tar.gz: connect timed out
INFO: Elapsed time: 55.329s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)

I'm running on Mac with Bazel version as bazel 3.7.2-homebrew and I'm able to download if I paste the URL (https://github.com/protocolbuffers/protobuf/releases/download/v3.11.1/protobuf-all-3.11.1.tar.gz) into the browser directly.

Questions about degraded file.

Hello,
Could you share a detailed procedure for producing degraded files?

I saw in the ViSQOL tutorial that "degraded files" must be: "48k sample rate WAV file that will be compared to the reference audio."

So, if I need to compare the performance of different codecs (MP3, AAC, OGG, FAAC, FLAC, etc.) and I have a reference audio file (WAV format, stereo, 48k), what should I do?
How can I get MOS scores for files using the MP3 codec, for example?
Do I need to convert the reference audio file to MP3 and then convert it back to WAV?
Which tool is recommended for producing the degraded file?

Thanks in Advance.
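One possible workflow is the round trip described in the question: encode the reference with the codec under test, then decode the result back to a 48 kHz WAV and use that as the degraded file. The repository's own test data, with names such as contrabassoon48_stereo_24kbps_aac.wav, suggests degraded files of this kind. The sketch below builds the two commands for ffmpeg, which is an assumption here since the thread does not name a tool. The function name and the default bitrate are illustrative only.

```python
def codec_round_trip_commands(reference_wav, encoded_path, degraded_wav,
                              bitrate="128k", sample_rate=48000):
    """Return two ffmpeg argv lists: encode the reference, then decode to WAV."""
    # ffmpeg picks the encoder from the output extension (e.g. .mp3, .ogg).
    encode = ["ffmpeg", "-y", "-i", reference_wav, "-b:a", bitrate, encoded_path]
    # Decoding to .wav with -ar keeps the degraded file at the expected rate.
    decode = ["ffmpeg", "-y", "-i", encoded_path, "-ar", str(sample_rate), degraded_wav]
    return encode, decode
```

Each list can be run with `subprocess.run(cmd, check=True)`. The degraded WAV is then compared against the original reference WAV, so the score reflects what the codec did to the audio.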

Cross correlation alignment

Is the fine-scaled time alignment enabled by default? Currently testing with both the reference and degraded file being 3s. Value is approximately 2.3. However if I cut off the first 0.5s of the degraded file the score drops to about 1.9. I was expecting only the last 2.5s of each audio file to be used in analysis and receive a score closer to 2.3, as I read in https://arxiv.org/pdf/2004.09584.pdf that ViSQOL did alignment.

Thank you.
