
opentts's People

Contributors

adelriosantiago, alexbarcelo, dependabot[bot], drsensor, nagyrobi, synesthesiam


opentts's Issues

PermissionError Operation Not Permitted

I'm running the docker command docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak but getting the following error:

Traceback (most recent call last):
  File "/app/app.py", line 46, in <module>
    _LOOP = asyncio.get_event_loop()
  File "/app/usr/local/lib/python3.7/asyncio/events.py", line 640, in get_event_loop
    self.set_event_loop(self.new_event_loop())
  File "/app/usr/local/lib/python3.7/asyncio/events.py", line 660, in new_event_loop
    return self._loop_factory()
  File "/app/usr/local/lib/python3.7/asyncio/unix_events.py", line 51, in __init__
    super().__init__(selector)
  File "/app/usr/local/lib/python3.7/asyncio/selector_events.py", line 54, in __init__
    super().__init__()
  File "/app/usr/local/lib/python3.7/asyncio/base_events.py", line 370, in __init__
    self._clock_resolution = time.get_clock_info('monotonic').resolution
PermissionError: [Errno 1] Operation not permitted
Exception ignored in: <function BaseEventLoop.__del__ at 0x765cf150>
Traceback (most recent call last):
  File "/app/usr/local/lib/python3.7/asyncio/base_events.py", line 625, in __del__
    warnings.warn(f"unclosed event loop {self!r}", ResourceWarning,
  File "/app/usr/local/lib/python3.7/asyncio/base_events.py", line 389, in __repr__
    f'<{self.__class__.__name__} running={self.is_running()} '
  File "/app/usr/local/lib/python3.7/asyncio/base_events.py", line 1805, in get_debug
    return self._debug
AttributeError: '_UnixSelectorEventLoop' object has no attribute '_debug'

I can see that line 46 of app.py is where the error surfaces, but I have no idea why or how. Maybe the real cause is something else?
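This PermissionError at time.get_clock_info is often caused by an outdated libseccomp on the Docker host (common on 32-bit ARM boards), which blocks the clock_gettime64 syscall inside newer container images. A hedged workaround, assuming that is the cause here, is to relax the seccomp profile while testing:

```shell
# Workaround sketch (assumes the host's libseccomp is blocking clock
# syscalls): disable the seccomp profile for this container and retest.
docker run -it -p 5500:5500 --security-opt seccomp=unconfined \
    synesthesiam/opentts:en --no-espeak
```

Note that disabling seccomp reduces container isolation; updating the host's libseccomp package is the cleaner long-term fix.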

MaryTTS : ValueError: invalid literal for int() with base 10: ''


It looks like MaryTTS crashes when asked to convert a large text. After the error in the screenshot above, clicking the "Speak" button again produces this error message:

ConnectionResetError: Connection lost

Here's the piece of text I took from a random article to try out the voices:


Rassurez-vous, on peut aussi faire de très jolies photos à d’autres moments de la journée. Mais vous devrez sans doute composer avec les ombres. À midi par exemple, le soleil est à son zénith ce qui laisse apparaître beaucoup d’ombre, notamment sur les sujets humains ou les animaux. En revanche, cette lumière apporte du contraste sur les photos de paysages majestueux.

Festival breaks on special characters

I was trying the Catalan Festival voice and it works well except on special characters (like à, é, í, ç...). The same misbehaviour also happens in Spanish. Other backends (espeak, nanotts) handle those same characters correctly.

These languages require those special characters. When one appears in a word, the TTS behaves oddly and skips that letter.

Maybe it is related to Festival not supporting UTF-8? I found this link (https://www.web3.lu/character-encoding-for-festival-tts-files/), but I know nothing about OpenTTS or Festival internals. If that is indeed the case, maybe the Festival backend needs its input re-encoded from UTF-8 to ISO-8859-15? Does that make sense?
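If Festival really does expect a legacy charset, one hedged approach (the names here are illustrative, not OpenTTS code) would be to transcode the text from UTF-8 to ISO-8859-15 before handing it to the Festival backend:

```python
# Transcode UTF-8 input to ISO-8859-15 (Latin-9), which covers à, é, í, ç
# and the euro sign; errors="ignore" drops anything unrepresentable.
text = "très jolies photos à midi, ç and é included"
latin9_bytes = text.encode("iso-8859-15", errors="ignore")

# Festival would then be fed latin9_bytes instead of the raw UTF-8 bytes.
roundtrip = latin9_bytes.decode("iso-8859-15")
print(roundtrip)
```

Since every character above exists in Latin-9, the round trip is lossless here; genuinely unrepresentable characters would simply be dropped.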

FileNotFoundError: [Errno 2] No such file or directory: '/home/opentts/app/VERSION'

Docker Image: synesthesiam/opentts:en-2.1

Docker container fails to run and produces the following error in the log:

Traceback (most recent call last):
  File "/home/opentts/app/app.py", line 50, in <module>
...
FileNotFoundError: [Errno 2] No such file or directory: '/home/opentts/app/VERSION'

I skimmed the Dockerfile and it seems it isn't configured to copy the VERSION file into the /home/opentts/app directory during the build. I'm not quite savvy enough to test or fix it, but that looks like where the problem is.
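If the Dockerfile is indeed missing that step, the fix would be a one-line copy instruction; this is a sketch, assuming the directory layout shown in the error message:

```dockerfile
# Copy the VERSION file next to app.py so app.py can read it at startup.
COPY VERSION /home/opentts/app/VERSION
```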

docker container

Hi, I'm a fairly basic user. I installed Docker on my Mac, pulled the Docker image, and am running it.

I'm getting the following error; can anyone help, please?

Error when trying to use any larynx voices

Thank you for the amazing project/container, but I am getting the following error when trying to use any larynx voice with :latest (v2.1):

ERROR:opentts:/onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg:
Traceback (most recent call last):
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
    return await self.ensure_async(handler)(**request_.view_args)
  File "/home/opentts/app/app.py", line 718, in app_say
    wav_bytes = await text_to_wav(
  File "/home/opentts/app/app.py", line 368, in text_to_wav
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 368, in <listcomp>
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 469, in text_to_wavs
    line_wav_bytes = await tts.say(line, voice_id, **say_args)
  File "/home/opentts/app/tts.py", line 1288, in say
    for result in results:
  File "/home/opentts/app/larynx/__init__.py", line 88, in text_to_speech
    tts_model = get_tts_model(
  File "/home/opentts/app/larynx/__init__.py", line 300, in get_tts_model
    model = load_tts_model(voice_model_type, model_dir,)
  File "/home/opentts/app/larynx/__init__.py", line 337, in load_tts_model
    return GlowTextToSpeech(config)
  File "/home/opentts/app/larynx/glow_tts.py", line 25, in __init__
    self.onnx_model = onnxruntime.InferenceSession(
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 335, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 368, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg:

I did find this, though I'm not sure whether it's the actual issue or how to apply the property:

microsoft/onnxruntime#10113

docker image with nanotts voice

I would like to use the nanotts voice because I think its quality for Italian is good.
But the "latest" and "all" images don't contain this component; only "it-2.1" works with nanotts.
Is there a pre-built image that contains all voices and all languages?

How to install as an engine on Windows 10?

Hello. I want a third-party application to use a voice other than the defaults available on Windows 10.
I've been googling this but can only find the frameworks and Docker containers; I can't figure out how to install anything. I need to install it into Windows itself so the application would offer it as a choice.

[Question] Guidance adding (coquitts) voices

Hi!
Thank you for all the work!

I wanted to ask for guidance on adding voices to OpenTTS.
I understand that I would have to rebuild OpenTTS and its Docker image, as explained in the README.
What I am unsure about is what the voice files are and how to add more of them.

In the case of CoquiTTS, I gather that I should put the voice files in the voices\coqui-tts folder before building.
Some of the voice files OpenTTS uses by default are part of the release files.

My main question is: which files from Coqui are needed to make up a voice?
I am personally interested in adding a German voice to OpenTTS.

An execution error occurs when certain strings are included.

An execution error occurs when certain strings are included in the text.
For example, when the text contains a string such as "<".

Error Message in Terminal

$ docker run -it -p 5500:5500 synesthesiam/opentts:ja

Traceback (most recent call last):
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
    return await self.ensure_async(handler)(**request_.view_args)
  File "/home/opentts/app/app.py", line 718, in app_say
    wav_bytes = await text_to_wav(
  File "/home/opentts/app/app.py", line 368, in text_to_wav
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 368, in <listcomp>
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 492, in ssml_to_wavs
    for sent_index, sentence in enumerate(
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/gruut/__init__.py", line 79, in sentences
    graph, root = text_processor(text, lang=lang, ssml=ssml, **process_args)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/gruut/text_processor.py", line 439, in __call__
    return self.process(*args, **kwargs)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/gruut/text_processor.py", line 490, in process
    root_element = etree.fromstring(f"<speak>{text}</speak>")
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1347, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 10

Error Message in Devtools

       GET http://0.0.0.0:5500/api/tts?voice=coqui-tts%3Aja_kokoro&lang=ja&vocoder=high&denoiserStrength=0.005&text=%3C%E6%A6%82%E8%A6%81&speakerId=&ssml=true&ssmlNumbers=true&ssmlDates=true&ssmlCurrency=true&cache=false 500 (Internal Server Error)
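The traceback shows gruut wrapping the raw text in <speak>...</speak> and parsing it as XML, so a literal "<" makes the document malformed. Until the server guards against this, a client-side workaround (a sketch, not OpenTTS code) is to escape XML metacharacters before sending text with ssml=true:

```python
# Escape XML special characters so the server-side <speak>{text}</speak>
# wrapping stays well-formed. "<概要" is the failing input from this issue.
from xml.sax.saxutils import escape

text = "<概要"
escaped = escape(text)
print(escaped)  # &lt;概要
```

escape() handles "<", ">", and "&"; quotes only need escaping inside attribute values.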

How can I add a custom voice? Should I add it under /home/opentts/app/voices/?

Thanks for opentts.

I'm curious about how to add more voices.

As far as I can tell, adding a voice means creating a new folder and deploying some files, such as generator.onnx. For example, for the ko_kss voice there is /app/voices/glow-speak/ko_kss.

So my question is how to add custom voices. I think I have to produce a generator.onnx file, but that's the hard part.

Can anyone help with that?

[CONTRIBUTION] Speech Dataset Generator

Hi everyone!

My name is David Martin Rius and I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/

Now you can create datasets automatically with any audio or lists of audios.

I hope you find it useful.

Here are the key functionalities of the project:

  1. Dataset Generation: Creation of multilingual datasets with Mean Opinion Score (MOS).

  2. Silence Removal: It includes a feature to remove silences from audio files, enhancing the overall quality.

  3. Sound Quality Improvement: It improves the quality of the audio when needed.

  4. Audio Segmentation: It can segment audio files within specified second ranges.

  5. Transcription: The project transcribes the segmented audio, providing a textual representation.

  6. Gender Identification: It identifies the gender of each speaker in the audio.

  7. Pyannote Embeddings: Utilizes pyannote embeddings for speaker detection across multiple audio files.

  8. Automatic Speaker Naming: Automatically assigns names to speakers detected in multiple audios.

  9. Multiple Speaker Detection: Capable of detecting multiple speakers within each audio file.

  10. Store speaker embeddings: The speakers are detected and stored in a Chroma database, so you do not need to assign a speaker name.

  11. Syllabic and words-per-minute metrics

Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator

David Martin Rius

MozillaTTS support removed in v2.1

In v2.1, support for MozillaTTS was removed; see 6de77a7, file tts.py, lines 920ff.

As I do not see this mentioned in the CHANGELOG, I was wondering if this was intentional, and if yes, why?

OpenTTS provided a very handy way to use MozillaTTS with MaryTTS-compatible applications such as Home Assistant.

Add Festival Spanish voice

Hi, is there a manual for installing other voices? I ran OpenTTS via a Docker container, then installed a .deb Festival package with the new voice and tested it in the console, but when I refreshed the OpenTTS site and API, the list of Spanish voices was never updated. I think I need to update some files so these new languages appear in the web interface.

Alternatively, here are the packages if you want to add them to OpenTTS (I'm almost sure these files are open source):
https://github.com/guadalinex-archive/hispavoces

Larynx voices sometimes erroring

Hello, today I was trying to use one of the new larynx voices and got this traceback.

Traceback (most recent call last):
  File "/app/usr/local/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/app/usr/local/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
    return await handler(**request_.view_args)
  File "/app/app.py", line 371, in app_say
    use_cache=use_cache,
  File "/app/app.py", line 244, in text_to_wav
    line, voice_id, vocoder=vocoder, denoiser_strength=denoiser_strength
  File "/app/tts.py", line 1377, in say
    for _, audio in text_and_audios:
  File "/app/usr/local/lib/python3.7/site-packages/larynx/__init__.py", line 82, in text_to_speech
    sentence.tokens, word_indexes=word_indexes, word_breaks=True
  File "/app/usr/local/lib/python3.7/site-packages/gruut/phonemize.py", line 207, in phonemize
    for word, word_phonemes in self.predict(words=words_to_guess):
  File "/app/usr/local/lib/python3.7/site-packages/gruut/phonemize.py", line 247, in predict
    words, model_path=self.g2p_model_path, **kwargs
  File "/app/usr/local/lib/python3.7/site-packages/phonetisaurus/__init__.py", line 60, in predict
    phonetisaurus_cmd, env=env, universal_newlines=True
  File "/app/usr/local/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/app/usr/local/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['phonetisaurus-apply', '--model', '/app/usr/local/lib/python3.7/site-packages/gruut/data/en-us/g2p.fst', '--word_list', '/tmp/tmpyuws4fry.txt', '--nbest', '1']' returned non-zero exit status 127.
Exception ignored in: <function Wave_write.__del__ at 0x7f81e4ad1e60>
Traceback (most recent call last):
  File "/app/usr/local/lib/python3.7/wave.py", line 327, in __del__
    self.close()
  File "/app/usr/local/lib/python3.7/wave.py", line 445, in close
    self._ensure_header_written(0)
  File "/app/usr/local/lib/python3.7/wave.py", line 463, in _ensure_header_written
    raise Error('# channels not specified')
wave.Error: # channels not specified

BTW: Thank you so much for making this! It is amazing to have so many voices easily accessible.

Fast synthesis for speech length estimation

Hi, thanks for the great software,

I was wondering if there is a way to scale down the voice quality (which is very good, by the way) to accelerate synthesis. I often use OpenTTS merely to estimate the length of a given spoken text, and don't need the high-quality version for that. It currently takes quite some time to synthesize a ten-minute text. Any ideas?

Cheers
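One possible lever, based on the vocoder parameter that appears in request URLs elsewhere in these issues: the larynx voices accept a vocoder quality setting, and lower settings synthesize noticeably faster. A hedged sketch (host, port, and voice id are placeholders):

```shell
# Request the low-quality vocoder for a rough length estimate.
# The vocoder/denoiserStrength parameters are taken from other
# requests on this page; adjust the voice id to one you have.
curl -G "http://localhost:5500/api/tts" \
    --data-urlencode "voice=larynx:cmu_aew-glow_tts" \
    --data-urlencode "text=How long will this take to say?" \
    --data-urlencode "vocoder=low" \
    --data-urlencode "denoiserStrength=0" \
    -o estimate.wav
```

The returned WAV's duration then gives the length estimate without paying for high-quality synthesis.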

{request} Repository not updated!

On Docker Hub you have updated the voice quality and features, but when I build the image from the GitHub Dockerfile, the voice quality is poor.
Please update the repository.

OpenBLAS Warning when using coqui-tts

Running OpenTTS in docker.

When using coqui-tts the following message appears in the console and the application becomes non responsive.

OpenBLAS Warning: Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option

Still active?

Is this repo going to be maintained? It looks like the last updates were 3 years ago.

Could not initialize NNPACK! Reason: Unsupported hardware

INFO:opentts:Synthesizing with coqui-tts:zh_baker (3 char(s))...

Using model: tacotron2
Model's reduction rate r is set to: 2
Vocoder Model: fullband_melgan
Generator Model: fullband_melgan_generator
Discriminator Model: melgan_multiscale_discriminator
INFO:opentts:Synthesizing with coqui-tts:zh_baker (9 char(s))...
Text splitted to sentences.
Text splitted to sentences.
['门开了.']
['开锁失败请再试一次.']
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
DEBUG:jieba:Loading model from cache /tmp/jieba.cache
Loading model cost 0.891 seconds.
DEBUG:jieba:Loading model cost 0.891 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
[W NNPACK.cpp:80] Could not initialize NNPACK! Reason: Unsupported hardware.
ERROR:opentts:Sizes of tensors must match except in dimension 1. Got 15 and 39 in dimension 2 (The offending index is 1)
Traceback (most recent call last):
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
    return await self.ensure_async(handler)(**request_.view_args)
  File "/home/opentts/app/app.py", line 718, in app_say
    wav_bytes = await text_to_wav(
  File "/home/opentts/app/app.py", line 368, in text_to_wav
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 368, in <listcomp>
    wavs = [result async for result in wavs_gen]
  File "/home/opentts/app/app.py", line 469, in text_to_wavs
    line_wav_bytes = await tts.say(line, voice_id, **say_args)
  File "/home/opentts/app/tts.py", line 1716, in say
    audio = await loop.run_in_executor(
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/opentts/app/TTS/utils/synthesizer.py", line 303, in tts
    outputs = synthesis(
  File "/home/opentts/app/TTS/tts/utils/synthesis.py", line 271, in synthesis
    outputs = run_model_torch(
  File "/home/opentts/app/TTS/tts/utils/synthesis.py", line 100, in run_model_torch
    outputs = _func(
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/opentts/app/TTS/tts/models/tacotron2.py", line 229, in inference
    decoder_outputs, alignments, stop_tokens = self.decoder.inference(
  File "/home/opentts/app/TTS/tts/layers/tacotron/tacotron2.py", line 397, in inference
    decoder_output, alignment, stop_token = self.decode(memory)
  File "/home/opentts/app/TTS/tts/layers/tacotron/tacotron2.py", line 314, in decode
    self.context = self.attention(
  File "/home/opentts/app/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/opentts/app/TTS/tts/layers/tacotron/attentions.py", line 322, in forward
    attention, _ = self.get_location_attention(query, processed_inputs)
  File "/home/opentts/app/TTS/tts/layers/tacotron/attentions.py", line 252, in get_location_attention
    attention_cat = torch.cat(
RuntimeError: Sizes of tensors must match except in dimension 1. Got 15 and 39 in dimension 2 (The offending index is 1)

Standalone version OS X

Sure would be cool for us non-coders to be able to use something besides MaryTTS's standalone from 2016. I can't stand the Mac OS X voices, especially since they are not available for commercial usage.

SSML doesn't work (for me?)

Hi,

I might be doing it wrong, but I'm trying to use SSML to add breaks to my text. I activated the SSML checkbox and wrapped everything in a <speak> tag, then added <break> tags to my transcript. They get totally ignored. Am I missing anything here?
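For reference, a minimal SSML payload with a pause might look like this (a sketch, assuming the server honours the standard <break> element when the SSML checkbox is enabled):

```xml
<speak>
  First part of the sentence.
  <break time="750ms"/>
  Second part, after the pause.
</speak>
```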


mozilla-tts: how to build

I am currently trying to reproduce your Docker container, but my version sounds a lot more metallic than yours. I copied out the model, but it does not sound as good. Can you tell me which git commit you used?

ValueError on ARM chip

Hi Michael. Your work allows me to install TTS with ease. There is no issue on Intel chips, but after I recently installed it on an ARM-based VPS, problems showed up. While other voice ids work fine, the following error always appears with voice id coqui-tts:en_ljspeech, which IMO is the best one.

Hope you might have time to have a look.

The instance I have installed is here: http://168.138.190.231:5555/

Cheers.

Voice id: coqui-tts:en_ljspeech
ValueError: On entry to DLASCL parameter number 4 had an illegal value

Integrating Opentts into a android tts engine.

Hello!
I thought I'd raise this issue here, as it seems a good fit. Be aware that I'm quite new to working with or understanding source code.
I have been trying to integrate the OpenTTS API into an open-source Android system-wide TTS app (tts-server-android). It was going well, except there seems to be a conflict with the OpenTTS API when making an HTTP request to it.

The application allows for custom http requests, in this format:

"The format is the same as the Legado APP network TTS engine:
http://url, {"method":"POST", "body": "POST body. support using {{js code or variable}} "}

Built-in variables:

  • Text:{{speakText}}
  • Speed:{{speakSpeed}}
  • Volume:{{speakVolume}}

Baidu Example:
http://tsn.baidu.com/text2audio,{"method": "POST", "body": "tex={{encodeURI(speakText)}}&spd={{speakSpeed}}&per=4114&cuid=baidu_speech_demo&idx=1&cod=2&lan=zh&ctp=1&pdt=220&vol={{speakVolume}}&aue=6&pit=5&res_tag=audio"} "

I tried to make a custom http request to the opentts server running in docker.
Using this url:
http://192.168.0.226:5500/api/tts?voice=larynx%3Acmu_aew-glow_tts&text={{java.encodeURI(speakText)}}&vocoder=low&denoiserStrength=0&cache=true

Some raw inputs work and others seem to conflict with the syntax.

This does not work:

from an intelligence explosion (Good 1965): a process in which software based intelligent minds enter a runaway reaction of self improvement cycles, with each new and more intelligent generation appearing faster than its predecessor

log output:

Failed: (1) cc.l: Expected start of the object '{', but had 'EOF' instead at path: $ JSON input: %20with%20each%20new%20and%20more%20intelligent%20generation%20appearing%20faster%20than%20its%20predecessor.&vocoder=low&denoiserStrength=0&cache=true

This does work:

Part I of this volume is dedicated to essays which argue that progress in artificial intelligence and machine learning may indeed increase machine intelligence beyond that of any human being.

I'm curious to see what you think (or whether you notice an issue I can't seem to detect), as I strongly believe that if I can get this reliably integrated into the application, I will have a functioning and incredibly good quality TTS, which might encourage further development. So far, the text that does get parsed sounds incredible.

Additionally, here is a link to an issue I raised with the developer of the Android application as well. It has more detail specific to the application itself.
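A guess at the failure mode, based on the log output: the app appears to split its configuration on the first comma into "url,{json}", and encodeURI-style encoding leaves commas unescaped, so the comma inside "self improvement cycles, with each..." truncates the URL and the JSON parser sees only the tail. Fully percent-encoding the text (encodeURIComponent-style) should avoid this; a sketch in Python:

```python
# Percent-encode everything, including "," and ":", which encodeURI leaves
# alone; an unescaped comma in the text would otherwise split the app's
# "url,{json}" configuration string.
from urllib.parse import quote

text = "cycles, with each new generation"
encoded = quote(text, safe="")
print(encoded)
```

In the Android app, that would mean using an encodeURIComponent-style helper on {{speakText}} instead of encodeURI, if one is available.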

generated audio timestamps

I'm trying to use the generated audio for some automation.
Is there any way to obtain word- or character-level "timestamps" from the generation process? Either would work.
Obviously the TTS blends; it isn't sounding out one character or one word at a time, but I'd imagine it still has to organise itself somehow.

Sorry, I'm not too familiar with how TTS engines work; hopefully that makes sense.

Python Basics Example/Demo

Hi Team,
I ended up here from browsing HackerNews, where many people were looking for open-source TTS software packages: https://news.ycombinator.com/item?id=34211457
I started having a go with OpenTTS but was significantly slowed down because I could not quickly find a basic Python example showing exactly how to get it up and running (i.e., in Python, read "hello world" aloud in one of the many voices). Could such an example be added to the repo for people to build upon, rather than the current HTML-interface focus?
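In the meantime, here is a minimal sketch of driving a locally running OpenTTS server from Python with only the standard library. The host/port, voice id, and /api/tts parameters are assumptions taken from other issues on this page, not documented API guarantees:

```python
import urllib.parse
import urllib.request

def tts_url(text: str, voice: str = "espeak:en",
            base: str = "http://localhost:5500") -> str:
    """Build a /api/tts request URL for the given voice and text."""
    params = urllib.parse.urlencode({"voice": voice, "text": text})
    return f"{base}/api/tts?{params}"

def speak_to_file(text: str, path: str, voice: str = "espeak:en") -> None:
    """Fetch synthesized WAV bytes from the server and save them to a file."""
    with urllib.request.urlopen(tts_url(text, voice)) as response:
        wav_bytes = response.read()
    with open(path, "wb") as wav_file:
        wav_file.write(wav_bytes)

# Usage, with the Docker container running:
#   speak_to_file("hello world", "hello.wav")
# then play hello.wav with any audio player.
```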

Original error was: libcblas.so.3: cannot open shared object file: No such file or directory

I use
docker run -it -p 5500:5500 synesthesiam/opentts:zh

Error
ImportError: IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed.
We have compiled some common reasons and troubleshooting tips at: https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
  * The Python version is: Python3.9 from "/home/opentts/app/.venv/bin/python3"
  * The NumPy version is: "1.20.3"
and make sure that they are the versions you expect. Please carefully study the documentation linked above for further help.
Original error was: libcblas.so.3: cannot open shared object file: No such file or directory
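On Debian-based images for ARM (a guess, since this error commonly hits Raspberry Pi setups), libcblas.so.3 is provided by the ATLAS packages; installing them inside the container or image usually resolves this NumPy import error:

```shell
# Hypothetical fix inside a Debian-based container: install the ATLAS
# libraries that provide libcblas.so.3, then restart the service.
apt-get update && apt-get install -y libatlas3-base
```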

Anyway to queue operations?

I am using OpenTTS to dynamically generate audio files through an automation platform. When I call the API to generate multiple files at the same time, each file gets distorted if the text is long. For smaller files this isn't a problem, but we will be scaling up to larger text generations and it will become one. Is there a way to queue operations so that only one is processed at a time and the current file isn't distorted?
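One client-side way to serialize requests (a sketch; OpenTTS itself is unchanged) is to funnel every synthesis job through a queue consumed by a single worker thread, so only one request is ever in flight. `synthesize` here is a placeholder for the actual /api/tts HTTP call:

```python
# Serialize synthesis requests through one worker thread so the server
# never handles two long generations concurrently.
import queue
import threading

jobs = queue.Queue()
results = []

def synthesize(text):
    # Placeholder: replace with the real HTTP request to /api/tts.
    return f"wav:{text}"

def worker():
    while True:
        text = jobs.get()
        if text is None:  # sentinel: stop the worker
            break
        results.append(synthesize(text))

t = threading.Thread(target=worker)
t.start()
for text in ["first long text", "second long text"]:
    jobs.put(text)
jobs.put(None)  # tell the worker to exit once the queue drains
t.join()
```

Because the queue is FIFO and there is a single consumer, jobs complete strictly in submission order.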

PermissionError: [Errno 1] Operation not permitted

opentts_1  | Traceback (most recent call last):
opentts_1  |   File "/home/opentts/app/app.py", line 54, in <module>
opentts_1  |     _LOOP = asyncio.get_event_loop()
opentts_1  |   File "/usr/lib/python3.9/asyncio/events.py", line 639, in get_event_loop
opentts_1  |     self.set_event_loop(self.new_event_loop())
opentts_1  |   File "/usr/lib/python3.9/asyncio/events.py", line 659, in new_event_loop
opentts_1  |     return self._loop_factory()
opentts_1  |   File "/usr/lib/python3.9/asyncio/unix_events.py", line 54, in __init__
opentts_1  |     super().__init__(selector)
opentts_1  |   File "/usr/lib/python3.9/asyncio/selector_events.py", line 55, in __init__
opentts_1  |     super().__init__()
opentts_1  |   File "/usr/lib/python3.9/asyncio/base_events.py", line 397, in __init__
opentts_1  |     self._clock_resolution = time.get_clock_info('monotonic').resolution
opentts_1  | PermissionError: [Errno 1] Operation not permitted
opentts_1  | Exception ignored in: <function BaseEventLoop.__del__ at 0x76750f58>
opentts_1  | Traceback (most recent call last):
opentts_1  |   File "/usr/lib/python3.9/asyncio/base_events.py", line 681, in __del__
opentts_1  |     _warn(f"unclosed event loop {self!r}", ResourceWarning, source=self)
opentts_1  |   File "/usr/lib/python3.9/asyncio/base_events.py", line 419, in __repr__
opentts_1  |     f'closed={self.is_closed()} debug={self.get_debug()}>'
opentts_1  |   File "/usr/lib/python3.9/asyncio/base_events.py", line 1909, in get_debug
opentts_1  |     return self._debug

I'm trying to run this on a Raspberry Pi 3. My docker-compose:

  opentts:
    image: synesthesiam/opentts:fi
    restart: unless-stopped
    volumes:
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "5500:5500"

Unable to build: .dockerargs: No such file or directory

Hi, I've made some source changes to give cache files human-readable names (rather than hashed file names). I'm now trying to build the project with make en so that those changes will take effect. But I'm getting the following error:

$ make en
./configure --language en
en
./configure: line 484: build_packages[@]: unbound variable
xargs < .dockerargs docker buildx build . -f Dockerfile  --output=type=docker --tag synesthesiam/opentts:en --tag synesthesiam/opentts:latest
/bin/sh: .dockerargs: No such file or directory
make: *** [en] Error 1

In the Makefile, I see the reference to .dockerargs but there doesn't seem to be a .dockerargs file in the directory.

I'm on MacOS 10.15.
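A guess at the cause: ./configure is a bash script, and expanding an empty build_packages array fails under `set -u` in the bash 3.2 that macOS ships, so configure aborts before writing .dockerargs and make then trips over the missing file. If that reading is right, running configure under a newer bash may get past it:

```shell
# Assumption: macOS system bash (3.2) trips on empty-array expansion under
# `set -u`; run the configure step with a modern bash from Homebrew instead.
brew install bash
"$(brew --prefix)/bin/bash" ./configure --language en
make en
```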
