Speech-to-Singing Conversion

Setup

You can set up a new conda environment with the environment.yml file (recommended). If you are completely unfamiliar with Anaconda, start by installing Miniconda.

conda env create -f environment.yml

To run this code, you will need to download the dataset (NUS-48E) and the model weights, and clone a repo for melody extraction.

  • The NUS-48E dataset can be downloaded from this link. The downloaded dataset (folder named 'NUS_48E') should be saved inside this repository (current working directory).

  • The model weights can be downloaded from here. You should place the complete folder (named 'models') in the 'output' folder of this repo.

  • Download the code for the melody extraction system (by Li Su) here. Extract the zip and place the folder named 'VocalMelodyExtPatchCNN-master' inside this repo (current working directory). Then move the file 'VocalMelodyExtPatchCNN-master/model3_patch25' into the current working directory, alongside the other files of the repo.
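
Before running anything, it can help to confirm that the three downloads ended up where the steps above expect them. The following is a minimal sketch and not part of the repository: the helper name `missing_setup_paths` is hypothetical, the paths are taken from the bullet points above, and it assumes the repository root is the current working directory.

```python
from pathlib import Path

# Paths named in the setup steps above, relative to the repository root.
REQUIRED_PATHS = [
    "NUS_48E",                        # dataset folder
    "output/models",                  # downloaded model weights
    "VocalMelodyExtPatchCNN-master",  # melody extractor code
    "model3_patch25",                 # file moved out of the extractor folder
]


def missing_setup_paths(repo_root="."):
    """Return the required paths that do not exist under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_setup_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required files and folders are in place.")
```

An empty result only means the names exist; it does not verify that the downloads are complete.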

Usage

The first time you run the code, it will also organize the audio from NUS-48E into a dictionary. This can take 10-15 minutes.

You can currently:

  1. Compute LSD for different models on random samples generated from the NUS-48E dataset (function eval_sys()).
  2. Compute random predictions for multiple models on the NUS-48E data (function random_pred()).
  3. Compute the prediction of a model for any given speech file and melody file (function eval()).
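
LSD in item 1 presumably stands for log-spectral distance. For readers unfamiliar with the metric, here is a minimal sketch of one common definition: the RMS difference in dB between two power spectra, averaged over frames. This is an assumption for illustration only; the function name `log_spectral_distance` is hypothetical, and the exact variant computed by eval_sys() may differ.

```python
import math


def log_spectral_distance(ref_power, est_power, eps=1e-10):
    """Average over frames of the RMS dB difference between two power spectrograms.

    Both inputs are lists of frames; each frame is a list of non-negative
    power values, one per frequency bin. eps guards against log(0).
    This is one common definition, not necessarily the one used by eval_sys().
    """
    if not ref_power or len(ref_power) != len(est_power):
        raise ValueError("spectrograms must be non-empty with equal frame counts")
    frame_distances = []
    for ref_frame, est_frame in zip(ref_power, est_power):
        if not ref_frame or len(ref_frame) != len(est_frame):
            raise ValueError("frames must be non-empty with equal bin counts")
        # Squared dB difference for every frequency bin in this frame.
        squared = [
            (10.0 * math.log10((r + eps) / (e + eps))) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        frame_distances.append(math.sqrt(sum(squared) / len(squared)))
    return sum(frame_distances) / len(frame_distances)
```

Identical spectrograms give a distance of 0, and lower values indicate a closer spectral match.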

The functions eval_sys() and random_pred() share a common set of arguments:

  • model_list: List of models you want to compute results for. E.g. ['PMTL', 'PMSE']. Currently available model options are 'PMTL', 'PMSE', 'B1', 'B2'.
  • n_samp: Number of samples for prediction.
  • min_length: Minimum length of the input speech signal (in seconds). Default value is 1.0.
  • fld: List of singer folders from the NUS_48E dataset for which you want to conduct evaluation. E.g. ['ADIZ']. Default value is ['ADIZ', 'SAMF'].
  • psongs: List of songs for evaluation, one list for each singer specified in fld. E.g. [['01', '18']]. Default value is [['18'], ['18']].
  • random (not available with the random_pred() function): Denotes whether the input samples used for prediction are randomly selected from the generated samples. Default value is True.
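
The arguments above can be assembled and sanity-checked before being passed on. The following is a minimal sketch under stated assumptions: `build_eval_kwargs` is a hypothetical helper that does not exist in the repository, and it only encodes the model names, defaults, and the one-song-list-per-singer rule described in the list above.

```python
# Model names listed as available in the argument description above.
AVAILABLE_MODELS = {"PMTL", "PMSE", "B1", "B2"}


def build_eval_kwargs(model_list, n_samp, min_length=1.0, fld=None, psongs=None):
    """Validate and collect the arguments shared by eval_sys() and random_pred().

    Hypothetical helper: defaults mirror the values documented above.
    """
    fld = ["ADIZ", "SAMF"] if fld is None else fld
    psongs = [["18"], ["18"]] if psongs is None else psongs

    unknown = [m for m in model_list if m not in AVAILABLE_MODELS]
    if unknown:
        raise ValueError(f"unknown model(s): {unknown}")
    if n_samp <= 0:
        raise ValueError("n_samp must be positive")
    if min_length <= 0:
        raise ValueError("min_length must be positive (seconds)")
    if len(fld) != len(psongs):
        raise ValueError("psongs needs one list of songs per singer in fld")

    return {
        "model_list": list(model_list),
        "n_samp": n_samp,
        "min_length": min_length,
        "fld": fld,
        "psongs": psongs,
    }


kwargs = build_eval_kwargs(["PMTL", "PMSE"], n_samp=5, fld=["ADIZ"], psongs=[["01", "18"]])
# Assumption: the functions accept these names as keyword arguments, e.g.
# eval_sys(**kwargs) or random_pred(**kwargs) inside evaluation_sp2si.py.
```

Note that overriding fld without also overriding psongs raises an error when the lengths differ, since each singer needs a song list.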

Arguments for function eval()

  • net1, net2: Networks which you want to use for prediction.
  • speech_file_loc: Location of the file containing speech.
  • melody_file_loc: Location of the file to extract the melody from.

Run the script with:

python evaluation_sp2si.py
