stable-audio-metrics

Collection of metrics for evaluating music and audio generative models:

  • Fréchet Distance at 48kHz, based on OpenL3.
  • Kullback–Leibler divergence at 32kHz, based on PaSST.
  • CLAP score at 48kHz, based on CLAP-LAION.

stable-audio-metrics adapts established metrics to assess the more realistic use case of long-form full-band stereo generations. All metrics can deal with variable-length inputs.
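As a rough illustration of what the three metrics measure, here is a minimal NumPy sketch of their textbook forms. It is not this repository's implementation: the function names are hypothetical, and the inputs are assumed to be embeddings or class probabilities that have already been extracted (with OpenL3, PaSST, and CLAP-LAION respectively).

```python
import numpy as np


def frechet_distance(x, y):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    x, y: arrays of shape (num_items, embedding_dim), e.g. reference and
    generated embeddings (assumed to be extracted beforehand).
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr(sqrt(cov_x @ cov_y)) is the sum of the square roots of the
    # eigenvalues of the product; clip tiny negatives from rounding.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * trace_sqrt)


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two discrete probability vectors.

    Assumes p and q are classifier output probabilities for a reference
    clip and a generated clip; eps avoids log(0).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def cosine_similarity(a, b):
    """Cosine similarity, the usual basis of a text-audio CLAP score."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Lower Fréchet distance and KL divergence indicate generations that are closer to the reference, while a higher cosine similarity indicates better agreement between a prompt and its generated audio.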

Installation

Clone this repository, create a Python virtual environment (python3 -m venv env), activate it (source env/bin/activate), and install the dependencies (pip install -r requirements.txt).

  • GPU SUPPORT – We only support GPU usage, because running the metrics on CPU can be too slow.
  • TROUBLESHOOTING – An older version of CUDA might be required because of OpenL3 dependencies. Try CUDA 11.8 if it does not run on the GPU as expected.

Documentation

Main documentation is available in:

Each example script further details how to use it:

Usage

Modify our examples so that they point to the folder you want to evaluate, then run them. For example, modify and run CUDA_VISIBLE_DEVICES=0 python examples/musiccaps_no-audio.py to evaluate with the MusicCaps dataset, or CUDA_VISIBLE_DEVICES=6 python examples/audiocaps_no-audio.py to evaluate with AudioCaps. Check the examples' documentation for details.

  • METRICS WITHOUT DATASETS – The no-audio examples allow running the evaluations without downloading the datasets, because the reference statistics and embeddings are already computed in load. We do not provide pre-computed embeddings for the CLAP score, because it is fast to compute.
  • COMPARING WITH STABLE AUDIO – To compare against Stable Audio, you must set all parameters as in the no-audio examples, even if your model outputs mono audio at a different sampling rate. stable-audio-metrics will do the resampling and mono/stereo handling to deliver a fair comparison.
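You do not need to convert your audio yourself, but it can help to see what resampling and mono/stereo handling typically involve. The sketch below is only an illustration under stated assumptions: both helper names are hypothetical, audio is assumed to be a (channels, samples) array, and the naive linear interpolation stands in for the band-limited resampling a real audio library would use.

```python
import numpy as np


def to_channels(audio, num_channels):
    """Return audio with the requested channel count.

    audio: array of shape (channels, samples) or (samples,).
    """
    audio = np.atleast_2d(np.asarray(audio, dtype=float))
    if audio.shape[0] == num_channels:
        return audio
    if num_channels == 1:
        # Downmix by averaging all channels.
        return audio.mean(axis=0, keepdims=True)
    if audio.shape[0] == 1:
        # Duplicate the mono channel.
        return np.repeat(audio, num_channels, axis=0)
    raise ValueError("unsupported channel conversion")


def resample_linear(audio, sr_in, sr_out):
    """Naive linear-interpolation resampling (illustrative stand-in only)."""
    num_out = int(round(audio.shape[-1] * sr_out / sr_in))
    t_in = np.arange(audio.shape[-1]) / sr_in
    t_out = np.arange(num_out) / sr_out
    return np.stack([np.interp(t_out, t_in, channel) for channel in audio])


# Example: one second of mono audio at 16kHz becomes stereo at 48kHz.
mono_16k = np.zeros((1, 16000))
stereo_48k = resample_linear(to_channels(mono_16k, 2), 16000, 48000)
```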

Data structure

Generate one audio file for every prompt in each dataset, and name each generated file after its corresponding id.

Our musiccaps examples assume the following structure, where 5,521 generations are named after the ytid from the prompts file load/musiccaps-public.csv: your_model_outputs_folder/-kssA-FOzU.wav, your_model_outputs_folder/_0-2meOf9qY.wav, ... your_model_outputs_folder/ZzyWbehtt0M.wav.

Our audiocaps examples assume the following structure, where 4,875 generations are named after the audiocap_id from the prompts file load/audiocaps-test.csv: your_model_outputs_folder/3.wav, your_model_outputs_folder/481.wav, ... your_model_outputs_folder/107432.wav.
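Before running an evaluation, it may be worth checking that every prompt has a matching file. The following is a minimal sketch rather than part of stable-audio-metrics: the helper name is hypothetical, and it assumes the prompts files are CSVs with a header row containing a column named ytid or audiocap_id.

```python
import csv
from pathlib import Path


def missing_generations(prompts_csv, id_column, outputs_folder):
    """Return the ids from prompts_csv that have no <id>.wav in outputs_folder."""
    with open(prompts_csv, newline="", encoding="utf-8") as f:
        ids = [row[id_column] for row in csv.DictReader(f)]
    folder = Path(outputs_folder)
    return [i for i in ids if not (folder / f"{i}.wav").is_file()]


# Example usage (paths as in the structure described above):
# missing_generations("load/musiccaps-public.csv", "ytid", "your_model_outputs_folder")
# missing_generations("load/audiocaps-test.csv", "audiocap_id", "your_model_outputs_folder")
```

An empty list means that every id in the prompts file has a corresponding generation.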

Contributors

jordipons
