
ensemble's Introduction

System Requirements:

  1. Python3
    • google cloud text to speech
    • music21
    • textgrid
  2. MATLAB
    • digital signals processing toolbox

How to use:

$ python3 generate.py [optional flags]

Select a valid MusicXML file from the dialog. The song performances will then be placed in the /output folder.

Several command-line flags can be used to affect how the program runs:

  • --validate - Run validation on speech alignment (typically the program will crash the first time running a new song without this option)
  • --no-tts - Skip downloading speech audio for the words in the song. Speech files are cached, so this typically isn't necessary
  • --no-align - Skip the speech audio alignment to phoneme information
  • --reset-cache - Delete all previously downloaded speech audio files that have been cached
  • --no-text - Replace all lyrics with the vowel 'Ah'
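
As a rough illustration of how the flags above could be declared, here is a minimal argparse sketch. It is an assumption for illustration only, not the actual contents of generate.py: the function name build_parser and the destination names (tts, align, text) are hypothetical, and only the flag names come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags documented above."""
    parser = argparse.ArgumentParser(prog="generate.py")
    parser.add_argument("--validate", action="store_true",
                        help="run validation on the speech alignment")
    # The --no-* flags switch off a step that is enabled by default.
    parser.add_argument("--no-tts", dest="tts", action="store_false",
                        help="skip downloading speech audio (cached files are reused)")
    parser.add_argument("--no-align", dest="align", action="store_false",
                        help="skip aligning speech audio to phoneme information")
    parser.add_argument("--reset-cache", action="store_true",
                        help="delete all cached speech audio files")
    parser.add_argument("--no-text", dest="text", action="store_false",
                        help="replace all lyrics with the vowel 'Ah'")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--validate", "--no-text"])
print(args)
```

Using store_false with an explicit dest keeps the code reading positively (args.tts, args.align) while the command line keeps the --no-* spelling.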

Examples:

The sheet music (MusicXML) for several pieces is available in the /sheet_music folder. Example performances (WAV) of the pieces are available in the /output/demos folder.

To-Do:

  • fix the extend method so that it is always a smooth transition into the stretched center of the vowel. It works most of the time, but there are still a lot of cases where it does not get a proper pitch for the vowel. Perhaps look at the mean pitch in the vowel, and try to match that rather than just whatever period we landed on.
  • look into dynamics control based on the intensity of the waveform
  • make it so that parts can have multiple notes at the same time (chords I think?)
  • make it so that multiple voices can be on the same line at a time
  • when stitching syllables, make each boundary a zero crossing (if next word derivative at zcc is wrong, invert the sound signal)
  • in Python, detect when the same syllable is repeated over multiple notes (i.e. redo the function for extracting words and determining which syllable of the word (according to the music) is being sung)
  • figure out why the forced aligner fails in a lot of cases
  • make each voice part use multiple singers
  • add vibrato and tremolo to voices, esp. for sustained notes
  • for time stretching samples, look into some method to evaluate the quality of the period selected. Sometimes it sounds like an artifact is selected as part of the period, so perhaps the method could be to sweep over the vowel and pull out a period that is most average
  • look into integrating Audiveris for optical music recognition, so that the software can do a full end-to-end performance, starting with a PDF of sheet music and ending with the audio recording
  • move all signal processing from MATLAB to Python (i.e. replace pitch detection, pitch shifting, and time stretching with numpy and probably C++ libraries)
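
The zero-crossing item in the list above can be sketched in a few lines. This is only one possible reading of that idea, not the project's implementation: the function names zero_crossings and stitch are hypothetical, samples are plain Python lists of floats, and a real port would more likely use numpy arrays. The first segment is cut at its last zero crossing, the second at its first, and the second is inverted when the two crossings run in opposite directions.

```python
def zero_crossings(samples):
    """Return (index, direction) pairs for every sign change.

    An index i means the sign changes between samples[i-1] and samples[i];
    direction is +1 for a rising crossing and -1 for a falling one.
    """
    found = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0 <= cur:
            found.append((i, +1))
        elif prev >= 0 > cur:
            found.append((i, -1))
    return found


def stitch(a, b):
    """Join two segments at zero crossings, inverting b if the slopes disagree."""
    a_cross = zero_crossings(a)
    b_cross = zero_crossings(b)
    if not a_cross or not b_cross:
        # No crossing to align on: fall back to plain concatenation.
        return list(a) + list(b)
    a_idx, a_dir = a_cross[-1]   # last crossing in the first segment
    b_idx, b_dir = b_cross[0]    # first crossing in the second segment
    head = list(a[:a_idx])       # ends just before a's crossing
    tail = list(b[b_idx:])       # starts just after b's crossing
    if a_dir != b_dir:
        # Slope at the boundary points the wrong way: flip the polarity.
        tail = [-x for x in tail]
    return head + tail


first = [0.5, 1.0, 0.5, -0.5, -1.0, -0.5, 0.5, 1.0]
second = [0.8, 0.3, -0.4, -0.9, -0.2, 0.6]
print(stitch(first, second))
```

Note that this trims a few samples from each side of the boundary, so the joined audio is slightly shorter than the two inputs combined.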

ensemble's People

Contributors

david-andrew

ensemble's Issues

Potential performance improvement of synthesize_speech

Hi,

I come from a research team which works on bugs in real-world applications.

I want to report a few potential performance improvements in your project.

  1. generate.py downloads per-word audio through the Google synthesize_speech API sequentially. Because cloud API calls do not compete for local resources, issuing them in parallel would greatly reduce the execution time.

  2. The Google synthesize_speech API also returns timing information for the transcript. It would be more time-efficient to send several words to the API in one request and split the audio afterwards. Because each request has a cold-start cost, two 1-word requests usually take much longer than one 2-word request.
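
A minimal sketch of the first suggestion, using only the standard library. Everything here is an assumption for illustration: fetch_word_audio is a hypothetical stand-in that fabricates bytes instead of calling the real text-to-speech API, and fetch_all and max_workers are invented names. A thread pool suits this case because each request spends its time waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_word_audio(word: str) -> bytes:
    """Hypothetical stand-in for the per-word text-to-speech request.

    A real version would call the cloud API here and return the audio bytes.
    """
    return f"<audio for {word}>".encode("utf-8")


def fetch_all(words, max_workers=8):
    """Fetch audio for each unique word concurrently; returns {word: audio}."""
    unique = list(dict.fromkeys(words))  # keep first-seen order, drop repeats
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `unique`.
        audio = list(pool.map(fetch_word_audio, unique))
    return dict(zip(unique, audio))


clips = fetch_all(["amazing", "grace", "amazing"])
print(sorted(clips))
```

In practice the worker count would likely need to respect the API's request quota, and words already in the local cache could be filtered out before submitting anything.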

Perhaps someone could confirm whether these seem reasonable. If you need any help, I'm happy to provide code snippets for a fix.

If you do not object, we may include your response as a percentage (e.g. xx% of applications face problem xx, and xx% of their developers confirm it) in our research paper.

Thanks.
