
libllsm2's People

Contributors

atsushieno, chenz85, sleepwalking


libllsm2's Issues

Get new pitches for the speech

Hi, your work is really good.
If we want to get a new pitch, for example pitch *= 0.9, ignoring formant shift,
how many parameters do we need to modify?
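
For reference, a minimal sketch of what that modification amounts to, assuming the analysis stage yields a per-frame f0 value; the struct and field names below are illustrative only, not libllsm2's actual API:

    /* Hypothetical per-frame parameters produced by analysis. */
    typedef struct {
      float f0;   /* fundamental frequency in Hz; 0 means unvoiced */
      /* ... harmonic amplitudes, phases, noise envelope, etc. ... */
    } frame_params;

    /* Scale the pitch of every voiced frame by a constant ratio
       (e.g. 0.9), leaving all other parameters untouched, which also
       leaves the formants where they are. */
    static void scale_pitch(frame_params* frames, int nfrm, float ratio) {
      for (int i = 0; i < nfrm; i++)
        if (frames[i].f0 > 0)
          frames[i].f0 *= ratio;
    }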

Consider FChT for f0 estimation and harmonic analysis

Problems that we have now:

  1. pYIN is not robust and accurate enough on low-SNR signals with a small window (4 * T); we have to fix some frames manually.
    For example, in tuxzz/revoice_prototype/yuri_orig, the ranges [0.19s, 0.354s), [2.669s, 2.689s), and [2.814s, 2.86s) should be voiced, but pYIN marks them unvoiced and gives poor predicted f0 values.
    It may be a good choice to use an FChT-based method [2] or an SRH (Summation of Residual Harmonics) based method to get better f0 and voicing decisions.
  2. Our current harmonic analysis method isn't really perfect, even though Chirp-Z and Direct-DFT [4] are better than the FFT-spectrum-based method.
    We can't even reconstruct a clean linear chirp or a stationary harmonic signal perfectly, due to inaccurate f0 and an imperfect harmonic analysis-synthesis method.
    FChT provides a way to analyze chirp signals near-perfectly; we get better resolution on both the time and frequency axes, which yields better analysis results.
  3. A hop size of 0.005 s is not enough for signals with very fast f0 variation, which is common in spoken voice.
    0.0025 s is a good compromise between quality and speed.

By the way, a chirp harmonic model may be better than our current model.

References

  • [1] Weruaga, Luis, and Marián Képesi. "The fan-chirp transform for non-stationary harmonic signals." Signal Processing 87.6 (2007): 1504-1522.
  • [2] Cancela, Pablo, Ernesto López, and Martín Rocamora. "Fan chirp transform for music representation." 13th Int. Conf. on Digital Audio Effects, Austria. 2010.
  • [3] Dunn, Robert, and Thomas F. Quatieri. "Sinewave analysis/synthesis based on the Fan-Chirp transform." Applications of Signal Processing to Audio and Acoustics, 2007 IEEE Workshop on. IEEE, 2007.
  • [4] Direct-DFT is a modified DFT. The normal DFT is X[k] = sum_{n=0}^{N-1} x[n] * exp(-j*2*pi*k*n/N), while Direct-DFT is X(f) = sum_{n=0}^{N-1} x[n] * exp(-j*2*pi*f*n/fs), where f is the frequency in Hz and fs is the sample rate, so it lets you evaluate the spectrum at any frequency point of a signal. It works better than CZT in my program (see the sketch after this list).
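
As a companion to [4], a minimal C sketch of evaluating a single DFT bin directly at an arbitrary frequency (a generic illustration of the definition above, not the code used in this project):

    #include <math.h>
    #include <complex.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Evaluate the DFT of x[0..n-1] at an arbitrary frequency f (Hz),
       given the sample rate fs; unlike the FFT, f does not have to lie
       on the k * fs / N grid. */
    static double complex direct_dft(const float* x, int n, double f, double fs) {
      double complex acc = 0;
      for (int i = 0; i < n; i++)
        acc += x[i] * cexp(-I * 2.0 * M_PI * f * i / fs);
      return acc;
    }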

fatal error: libihnm/ihnm.h: No such file or directory

When I run make, the following error occurs:
mkdir -p ./build
cc -DFP_TYPE=float -std=c99 -Wall -fPIC -pthread -DUSE_PTHREAD -I/usr/include/ -I/usr/include/ -I/usr/include/ -fopenmp -Og -g -D_DEBUG -o ./build/layer0.o -c layer0.c
layer0.c:21:26: fatal error: libihnm/ihnm.h: No such file or directory
#include <libihnm/ihnm.h>
^
compilation terminated.
makefile:129: recipe for target 'build/layer0.o' failed
make: *** [build/layer0.o] Error 1

How to do concatenative synthesis?

Thank you for the great work!

Can I use libllsm2 to implement a simple concatenative synthesis program? If so, could you give me some tips on how to do time stretching and how to cross-fade between two recorded vowel samples?

Thank you very much!
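
For the cross-fading part of the question above, a generic equal-gain linear cross-fade over two already time-aligned sample buffers could look like the sketch below; it works on raw waveforms and is independent of libllsm2's parameter format (with a parametric model it is usually smoother to interpolate the frame parameters instead):

    /* Blend the last `overlap` samples of note A (a_tail) into the first
       `overlap` samples of note B (b_head) with a linear ramp. */
    static void crossfade(const float* a_tail, const float* b_head,
                          float* out, int overlap) {
      for (int i = 0; i < overlap; i++) {
        float w = overlap > 1 ? (float)i / (float)(overlap - 1) : 1.0f;  /* ramps 0 -> 1 */
        out[i] = (1.0f - w) * a_tail[i] + w * b_head[i];
      }
    }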

How to synthesize voice using a statistical model (e.g. DNN, HMM) with libllsm2?

Hi, thank you for the great work!

  1. Can I use libllsm2 as a vocoder for a statistical model?
    I find that the feature dimension in layer 0 is different for each frame when the f0 is different. How should I use libllsm2 as a vocoder for a statistical acoustic model? (See the sketch below.)
  2. Can I implement features similar to Synthesizer V with libllsm2?
    Synthesizer V can control parameters such as tension and breathiness and achieves amazing results, and libllsm2 can disentangle the voice into a harmonic signal and a noise signal. Can I use libllsm2 to implement similar features? Can you give me some tips on how to implement this?

Thank you very much!
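
Regarding point 1 above, a common workaround (standard practice with harmonic vocoders, not something specific to libllsm2) is to resample the variable-length per-harmonic amplitudes onto a fixed-size frequency grid before training the statistical model, and to resample back at synthesis time. A rough sketch, with all names hypothetical:

    /* Resample nhar per-harmonic log-amplitudes (harmonic k lies at
       (k + 1) * f0 Hz) onto a fixed ngrid-point (ngrid >= 2) linear
       frequency grid spanning [0, fmax], using linear interpolation
       between neighboring harmonics. */
    static void harmonics_to_fixed_grid(const float* logamp, int nhar, float f0,
                                        float fmax, float* grid, int ngrid) {
      for (int i = 0; i < ngrid; i++) {
        float f = fmax * i / (ngrid - 1);
        float pos = f / f0 - 1.0f;              /* fractional harmonic index */
        if (pos <= 0.0f)              { grid[i] = logamp[0];        continue; }
        if (pos >= (float)(nhar - 1)) { grid[i] = logamp[nhar - 1]; continue; }
        int k = (int)pos;
        float frac = pos - (float)k;
        grid[i] = (1.0f - frac) * logamp[k] + frac * logamp[k + 1];
      }
    }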

Vocal Tract Params

Hi! Thank you so much for your great work.
I am new to the speech analysis-synthesis field, so my question may seem foolish. (#^.^#)
I hope to modify not only f0 and Rd but also the vocal tract filter. Can I get the vocal tract parameters (vocal tract filter) using your library? If yes, which function should I use?
