
sinsymaker's Introduction

sinsymaker

HTS voice creator for arbitrary karaoke songs: generates a singer library usable by the Sinsy singing synthesizer.

Beads

Until Beads is in Maven Central, you'll need to install it first.

  1. Download Beads

  2. Install the main JARs:

     # Windows
     mvn install:install-file -Dfile=D:/beads/library/beads.jar -Dpackaging=jar -DgroupId=net.beadsproject -DartifactId=beads -Dversion=1.02
     mvn install:install-file -Dfile=D:/beads/library/beads-io.jar -Dpackaging=jar -DgroupId=net.beadsproject -DartifactId=beads-io -Dversion=1.02
     # Linux
     mvn install:install-file -Dfile=/media/ceefour/passport/project_passport/lumen/speech/beads/beads/library/beads.jar -Dpackaging=jar -DgroupId=net.beadsproject -DartifactId=beads -Dversion=1.02
     mvn install:install-file -Dfile=/media/ceefour/passport/project_passport/lumen/speech/beads/beads/library/beads-io.jar -Dpackaging=jar -DgroupId=net.beadsproject -DartifactId=beads-io -Dversion=1.02
    
  3. Package and install the "source" JARs:

     # I already generated these (uncomment to regenerate them):
     #jar -cvf F:/project_passport/lumen/speech/beads/beads-1.02-sources.jar -C "D:/beads/src/beads_main" .
     #jar -cvf F:/project_passport/lumen/speech/beads/beads-io-1.02-sources.jar -C "D:/beads/src/beads_io" .
     # All you have to do is install them:
     # Windows
     mvn install:install-file -Dfile=F:/project_passport/lumen/speech/beads/beads-1.02-sources.jar -Dpackaging=jar -DgroupId=net.beadsproject -DartifactId=beads -Dversion=1.02 -Dclassifier=sources
     mvn install:install-file -Dfile=F:/project_passport/lumen/speech/beads/beads-io-1.02-sources.jar -Dpackaging=jar -DgroupId=net.beadsproject -DartifactId=beads-io -Dversion=1.02 -Dclassifier=sources
     # Linux
     mvn install:install-file -Dfile=/media/ceefour/passport/project_passport/lumen/speech/beads/beads-1.02-sources.jar -Dpackaging=jar -DgroupId=net.beadsproject -DartifactId=beads -Dversion=1.02 -Dclassifier=sources
     mvn install:install-file -Dfile=/media/ceefour/passport/project_passport/lumen/speech/beads/beads-io-1.02-sources.jar -Dpackaging=jar -DgroupId=net.beadsproject -DartifactId=beads-io -Dversion=1.02 -Dclassifier=sources
    

VorbisSPI

One VorbisSPI dependency (jogg) is missing from Maven, so install it manually:

mvn install:install-file -Dfile=/media/ceefour/passport/project_passport/lumen/speech/VorbisSPI1.0.3/lib/jogg-0.0.7.jar -DgroupId=com.jcraft -DartifactId=jogg -Dversion=0.0.7 -Dpackaging=jar

TarsosDSP (TODO: replace with Beads)

Until TarsosDSP is in Maven Central, you'll need to install it first.

  1. Download TarsosDSP

  2. Install the main JAR:

     mvn install:install-file -Dfile=/media/ceefour/passport/project_passport/lumen/speech/TarsosDSP-2.0-bin.jar -Dpackaging=jar -DgroupId=be.tarsos.dsp -DartifactId=tarsosdsp -Dversion=2.0
    
  3. Install the "source" JAR (it actually contains both .class and .java files):

     mvn install:install-file -Dfile=/media/ceefour/passport/project_passport/lumen/speech/TarsosDSP-2.0-with-sources.jar -Dpackaging=jar -DgroupId=be.tarsos.dsp -DartifactId=tarsosdsp -Dversion=2.0 -Dclassifier=sources
    

Audio Output Format

We'll use the ACID loop file format (i.e. a WAV file with an extra 'acid' chunk) for all generated output audio.

Thanks, I've now found a RIFF viewer and the specification of the RIFF chunks. In my example WAV, I see a chunk called 'acid' which is 24 bytes long. I suppose it contains the tempo information, but I haven't yet figured out how this field is structured.

Update: My test file had a tempo of 138.00 BPM. I couldn't find 138 in either ASCII or integer format in the acid chunk, but 138 is 00 00 0A 43 as a little-endian floating-point value, and these were exactly the last 4 bytes of the acid chunk. Now I still need to find out whether the tempo is at a fixed offset in the chunk, or whether there's some other way to locate it. The acid chunk that Fruity Loops created was 24 bytes long, by the way.

Via http://www.kvraudio.com/forum/viewtopic.php?p=3061898#p3061898 :

** The acid chunk goes a little something like this:
**
** 4 bytes          'acid'
** 4 bytes (int)     length of chunk starting at next byte
**
** 4 bytes (int)     type of file:
**        this appears to be a bit mask, however some combinations
**        are probably impossible and/or qualified as "errors"
**
**        0x01 On: One Shot         Off: Loop
**        0x02 On: Root note is Set Off: No root
**        0x04 On: Stretch is On,   Off: Stretch is OFF
**        0x08 On: Disk Based       Off: Ram based
**        0x10 On: ??????????       Off: ????????? (Acidizer puts that ON)
**
** 2 bytes (short)      root note
**        if type 0x10 is OFF : [C,C#,(...),B] -> [0x30 to 0x3B]
**        if type 0x10 is ON  : [C,C#,(...),B] -> [0x3C to 0x47]
**         (both types fit on same MIDI pitch albeit different octaves, so who cares)
**
** 2 bytes (short)      ??? always set to 0x8000
** 4 bytes (float)      ??? seems to be always 0
** 4 bytes (int)        number of beats
** 2 bytes (short)      meter denominator   //always 4 in SF/ACID
** 2 bytes (short)      meter numerator     //always 4 in SF/ACID
**                      //are we sure about the order?? usually it's num/denom
** 4 bytes (float)      tempo
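A minimal Java sketch of writing and reading this chunk, assuming the layout above: every field little-endian (as everywhere in RIFF), a 24-byte payload, and the two unknown fields written with their "always" values. The class name AcidChunk is hypothetical; inserting the chunk into a WAV file and updating the RIFF size field are not shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for the RIFF 'acid' chunk, following the forum layout above. */
final class AcidChunk {
    static final int PAYLOAD_SIZE = 24;

    // Type bit flags as listed in the forum post.
    static final int ONE_SHOT = 0x01;
    static final int ROOT_NOTE_SET = 0x02;
    static final int STRETCH = 0x04;
    static final int DISK_BASED = 0x08;

    final int type;
    final short rootNote;
    final int numBeats;
    final short meterDenominator;
    final short meterNumerator;
    final float tempo;

    AcidChunk(int type, short rootNote, int numBeats,
              short meterDenominator, short meterNumerator, float tempo) {
        this.type = type;
        this.rootNote = rootNote;
        this.numBeats = numBeats;
        this.meterDenominator = meterDenominator;
        this.meterNumerator = meterNumerator;
        this.tempo = tempo;
    }

    /** Serializes id + length + 24-byte payload, little-endian like the rest of RIFF. */
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(8 + PAYLOAD_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("acid".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(PAYLOAD_SIZE);
        buf.putInt(type);
        buf.putShort(rootNote);
        buf.putShort((short) 0x8000); // unknown, "always set to 0x8000"
        buf.putFloat(0f);             // unknown, "seems to be always 0"
        buf.putInt(numBeats);
        buf.putShort(meterDenominator);
        buf.putShort(meterNumerator);
        buf.putFloat(tempo);          // last 4 bytes of the chunk
        return buf.array();
    }

    /** Parses a chunk starting at its 'acid' id. */
    static AcidChunk parse(byte[] chunk) {
        ByteBuffer buf = ByteBuffer.wrap(chunk).order(ByteOrder.LITTLE_ENDIAN);
        byte[] id = new byte[4];
        buf.get(id);
        if (!"acid".equals(new String(id, StandardCharsets.US_ASCII))) {
            throw new IllegalArgumentException("not an acid chunk");
        }
        int length = buf.getInt();
        if (length < PAYLOAD_SIZE) {
            throw new IllegalArgumentException("acid chunk too short: " + length);
        }
        int type = buf.getInt();
        short rootNote = buf.getShort();
        buf.getShort(); // skip the 0x8000 field
        buf.getFloat(); // skip the unknown float
        int numBeats = buf.getInt();
        short denominator = buf.getShort();
        short numerator = buf.getShort();
        float tempo = buf.getFloat();
        return new AcidChunk(type, rootNote, numBeats, denominator, numerator, tempo);
    }
}
```

With a tempo of 138.0, the last four bytes come out as 00 00 0A 43, matching the Fruity Loops observation quoted above.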

TBD: Use FluidSynth's / GrandOrgue's format?

TBD: RIFF WAVE cue-point chunks: http://sharkysoft.com/archive/lava/docs/javadocs/lava/riff/wave/doc-files/riffwave-content.htm

Legacy Documentation

JMathStudio -- NO LONGER USED

Until JMathStudio is in Maven Central, you'll need to install it first.

  1. Download the JMathStudio ZIP and extract it to ~/tmp.

  2. Install the main JAR:

     mvn install:install-file -Dfile=$HOME/tmp/JMathStudio_Package/Bin/JMathStudio.jar -Dpackaging=jar -DgroupId=org.jmathstudio -DartifactId=jmathstudio -Dversion=1.2.0
    
  3. Create the Javadoc JAR then install it:

     jar -cvf ~/tmp/jmathstudio-1.2.0-javadoc.jar -C "$HOME/tmp/JMathStudio_Package/API Doc" .
     mvn install:install-file -Dfile=$HOME/tmp/jmathstudio-1.2.0-javadoc.jar -Dpackaging=jar -DgroupId=org.jmathstudio -DartifactId=jmathstudio -Dversion=1.2.0 -Dclassifier=javadoc
    

sinsymaker's People

Contributors

ceefour


sinsymaker's Issues

Generate ASS subtitle file from simple story script for rythmo band recording

To support this workflow:

  1. Simple story script (i.e. as close to plain text as possible, with no timing, but annotated with prosody)
  2. Process this script into an ASS subtitle file, synthesizing timings suited to optimum audio segmentation
  3. Dub using this ASS subtitle file; the output is an annotated WAV including a noise profile
  4. Noise removal
  5. Process this annotated WAV with the ASS subtitle to generate voice database
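Step 2 could start from something as simple as the sketch below, which turns plain script lines into an ASS [Events] section. The speaking rate (15 characters per second), minimum duration, and 1-second gap between lines are assumptions, not measured values; prosody annotations are ignored, and a complete ASS file would also need [Script Info] and [V4+ Styles] sections. ScriptToAss is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: plain script lines -> ASS [Events] with synthesized timings. */
final class ScriptToAss {
    // Assumed values; tune them for the actual speaker.
    static final double CHARS_PER_SECOND = 15.0;
    static final double MIN_DURATION_S = 1.0;
    static final double GAP_S = 1.0; // silence between lines eases later segmentation

    /** Formats seconds as an ASS timestamp H:MM:SS.cc (centiseconds). */
    static String assTime(double seconds) {
        long cs = Math.round(seconds * 100);
        return String.format(Locale.ROOT, "%d:%02d:%02d.%02d",
                cs / 360000, (cs / 6000) % 60, (cs / 100) % 60, cs % 100);
    }

    static List<String> toEvents(List<String> lines) {
        List<String> out = new ArrayList<>();
        out.add("[Events]");
        out.add("Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text");
        double t = GAP_S; // leading silence
        for (String line : lines) {
            // Estimate duration from text length, with a floor for very short lines.
            double dur = Math.max(MIN_DURATION_S, line.length() / CHARS_PER_SECOND);
            out.add(String.format(Locale.ROOT, "Dialogue: 0,%s,%s,Default,,0,0,0,,%s",
                    assTime(t), assTime(t + dur), line));
            t += dur + GAP_S;
        }
        return out;
    }
}
```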

Segment 2-word clauses into separate word audios

This should be the easiest. Examples:

  1. jiwit hahaha
  2. perangkap oleh

Simple silence/valley (local minimum) detection around the "middle" probably works, where the "middle" is biased by the relative word lengths.
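A minimal Java sketch of this idea, assuming mono float samples and using each word's character count as a stand-in for its duration (an assumption). It picks the frame with the lowest RMS energy inside a search window around the length-weighted middle; TwoWordSplitter and its parameters are hypothetical names.

```java
/** Hypothetical sketch: cut two words at the quietest frame near a length-weighted middle. */
final class TwoWordSplitter {
    /**
     * @param samples  mono PCM in [-1, 1]
     * @param frameLen frame size in samples for RMS energy (e.g. 10 ms worth)
     * @param len1     character count of word 1 (duration proxy)
     * @param len2     character count of word 2 (duration proxy)
     * @param window   fraction of the clip to search on each side of the expected cut
     * @return sample index at the start of the lowest-energy frame
     */
    static int findSplit(float[] samples, int frameLen, int len1, int len2, double window) {
        int frames = samples.length / frameLen;
        if (frames == 0) throw new IllegalArgumentException("clip shorter than one frame");
        double ratio = (double) len1 / (len1 + len2);
        int expected = Math.min(frames - 1, (int) Math.round(ratio * frames));
        int radius = (int) Math.ceil(window * frames);
        int lo = Math.max(0, expected - radius);
        int hi = Math.min(frames - 1, expected + radius);
        int best = lo;
        double bestRms = Double.MAX_VALUE;
        for (int f = lo; f <= hi; f++) {
            double sum = 0;
            for (int i = f * frameLen; i < (f + 1) * frameLen; i++) sum += samples[i] * samples[i];
            double rms = Math.sqrt(sum / frameLen);
            if (rms < bestRms) { // first (earliest) minimum wins on ties
                bestRms = rms;
                best = f;
            }
        }
        return best * frameLen;
    }
}
```

For "jiwit hahaha" (5 and 6 letters) the search centers at 5/11, roughly 45% into the clip.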

Noise removal algorithm + normalization

Given a noiseprofile.wav and raw.wav (both mono), perform noise removal.

Settings that work well for me:

  • strength 36 dB (enough for SonicGear's)
  • sensitivity 5
  • smoothing 150 Hz

Afterwards, normalize the entire WAV to a peak of -1 dB.
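The normalization step is simple enough to sketch; the noise removal itself is not shown. This assumes "-1 dB" means peak level relative to full scale, so the target amplitude is 10^(-1/20), about 0.891. PeakNormalizer is a hypothetical name.

```java
/** Hypothetical sketch: scale mono samples so the loudest one sits at the target dBFS level. */
final class PeakNormalizer {
    static float[] normalize(float[] samples, double targetDbfs) {
        float peak = 0f;
        for (float s : samples) peak = Math.max(peak, Math.abs(s));
        float[] out = samples.clone(); // leave the input untouched
        if (peak == 0f) return out;    // silent input: nothing to scale
        double target = Math.pow(10.0, targetDbfs / 20.0); // -1 dB -> ~0.891
        float gain = (float) (target / peak);
        for (int i = 0; i < out.length; i++) out[i] *= gain;
        return out;
    }
}
```

Running it after noise removal matters: the noise reduction changes the peaks, so normalizing first would not guarantee the final -1 dB level.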

HTML5 record with noise removal

Record noise for 3 seconds, then compute its noise profile.

Then record the actual take.

After recording, send both to the server to get noise-removed, normalized output.

Depends on #4, #5.

Voice database browser/player (HTML5)

HTML5 App (Ionic) to browse a voice database in JSON and play voices (with proper pre/post/attack/release) in WAV/OggFLAC

It should be easy to support a Neo4j-powered voice database as well as JSON, and later on the official (or extended) HTS/Sinsy voice database.

Segment 3-word clauses into separate audios

This is harder than #1, but still doable, and very useful and common.

  1. dan sedang meronta-ronta (you can call this either 3 or 4 words, but emotionally you'll treat meronta-ronta as 1 word)
  2. jeratan bangau itu
  3. saat menjual kayu
  4. terjerat di perangkap
  5. terus dibelikan makanan
  6. uang hasil penjualannya
  7. Yosaku segera melepaskan
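The two-word valley search generalizes to N words by placing N - 1 cuts, each near its length-weighted boundary and each strictly after the previous one. A sketch under the same assumptions (mono float samples, character counts as a duration proxy); the caller decides what counts as one word, e.g. passing "meronta-ronta" as a single unit. MultiWordSplitter is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical sketch: cut an N-word clause at the quietest frames near length-weighted boundaries. */
final class MultiWordSplitter {
    /** RMS energy per frame of frameLen samples (a trailing partial frame is ignored). */
    static double[] frameRms(float[] samples, int frameLen) {
        double[] rms = new double[samples.length / frameLen];
        for (int f = 0; f < rms.length; f++) {
            double sum = 0;
            for (int i = f * frameLen; i < (f + 1) * frameLen; i++) sum += samples[i] * samples[i];
            rms[f] = Math.sqrt(sum / frameLen);
        }
        return rms;
    }

    /**
     * @param wordLens character counts per word (or per emotional unit)
     * @return wordLens.length - 1 strictly increasing cut positions, in samples
     */
    static int[] findCuts(float[] samples, int frameLen, int[] wordLens, double window) {
        double[] rms = frameRms(samples, frameLen);
        int frames = rms.length;
        int total = Arrays.stream(wordLens).sum();
        int radius = Math.max(1, (int) Math.ceil(window * frames));
        int[] cuts = new int[wordLens.length - 1];
        int cumulative = 0;
        int prev = 0;
        for (int k = 0; k < cuts.length; k++) {
            cumulative += wordLens[k];
            int expected = (int) Math.round((double) cumulative / total * frames);
            // Search near the expected boundary, but never at or before the previous cut.
            int lo = Math.max(prev + 1, expected - radius);
            int hi = Math.min(frames - 1, expected + radius);
            if (lo > hi) throw new IllegalArgumentException("clip too short for " + wordLens.length + " words");
            int best = lo;
            for (int f = lo + 1; f <= hi; f++) if (rms[f] < rms[best]) best = f;
            cuts[k] = best;
            prev = best;
        }
        for (int k = 0; k < cuts.length; k++) cuts[k] *= frameLen; // frames -> samples
        return cuts;
    }
}
```

For "saat menjual kayu" the word lengths 4, 7, 4 put the expected boundaries at 4/15 and 11/15 of the clip.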
