text-to-speech-nodejs's Introduction

# DEPRECATED

This demo and repo are no longer supported. You can find the newly supported Text to Speech Demo here.

🔊 Text to Speech Demo

A Node.js sample application that shows some of the IBM Watson Text to Speech service features.

Text to Speech is designed for streaming, low-latency synthesis of audio from text. It is the inverse of automatic speech recognition.

You can view a demo of this app.

Prerequisites

  1. Sign up for an IBM Cloud account.
  2. Download the IBM Cloud CLI.
  3. Create an instance of the Text to Speech service and get your credentials:
    • Go to the Text to Speech page in the IBM Cloud Catalog.
    • Log in to your IBM Cloud account.
    • Click Create.
    • Click Show to view the service credentials.
    • Copy the apikey value.
    • Copy the url value.

Configuring the application

  1. In the application folder, copy the .env.example file and create a file called .env

    cp .env.example .env
    
  2. Open the .env file and add the service credentials that you obtained in the previous step.

    Example .env file that configures the apikey and url for a Text to Speech service instance hosted in the US East region:

    TEXT_TO_SPEECH_IAM_APIKEY=X4rbi8vwZmKpXfowaS3GAsA7vdy17Qh7km5D6EzKLHL2
    TEXT_TO_SPEECH_URL=https://gateway-wdc.watsonplatform.net/text-to-speech/api
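
A minimal sketch of how an application might check these two settings at startup, so that a missing value fails early with a clear message. The helper name `readTtsConfig` is hypothetical and is not part of this repo; only the two variable names come from the example above.

```javascript
// Hypothetical helper (not from this repo): validates the two settings
// named in the example .env file and returns them in one object.
function readTtsConfig(env) {
  const apikey = (env.TEXT_TO_SPEECH_IAM_APIKEY || '').trim();
  const url = (env.TEXT_TO_SPEECH_URL || '').trim();

  // Collect every missing setting so the error names all of them at once.
  const missing = [];
  if (!apikey) missing.push('TEXT_TO_SPEECH_IAM_APIKEY');
  if (!url) missing.push('TEXT_TO_SPEECH_URL');
  if (missing.length > 0) {
    throw new Error('Missing required settings: ' + missing.join(', '));
  }

  // new URL() throws a TypeError if the value is not a valid absolute URL.
  const parsed = new URL(url);
  if (parsed.protocol !== 'https:') {
    throw new Error('TEXT_TO_SPEECH_URL must use https');
  }

  // Drop trailing slashes so paths can be appended consistently.
  return { apikey, url: url.replace(/\/+$/, '') };
}
```

Assuming the .env file has already been loaded into the environment, the helper would typically be called once as `readTtsConfig(process.env)`.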
    

Running locally

  1. Install the dependencies

    npm install
    
  2. Run the application

    npm start
    
  3. View the application in a browser at localhost:3000

Deploying to IBM Cloud as a Cloud Foundry Application

  1. Log in to IBM Cloud with the IBM Cloud CLI

    ibmcloud login
    
  2. Target a Cloud Foundry organization and space.

    ibmcloud target --cf
    
  3. Edit the manifest.yml file. Change the name field to something unique. For example, - name: my-app-name.

  4. Deploy the application

    ibmcloud app push
    
  5. View the application online at the app URL, for example: https://my-app-name.mybluemix.net

Directory structure

.
├── app.js                      // express routes
├── config                      // express configuration
│   ├── error-handler.js
│   ├── express.js
│   └── security.js
├── manifest.yml
├── package.json
├── public                      // static resources
├── server.js                   // entry point
├── test                        // tests
└── views                       // react components

License

This sample code is licensed under Apache 2.0.

Contributing

See CONTRIBUTING.

Open Source @ IBM

Find more open source projects on the IBM GitHub page.

text-to-speech-nodejs's People

Contributors

aclm, andresfvilla, arlemi, daniel-bolanos, dependabot[bot], ehdsouza, esbullington, germanattanasio, greenkeeperio-bot, jeff-arn, kasaby, kevinkowa, kkeerthana, kognate, leonrch, lhuihui, lpatino10, mamoonraja, mikemosca, nfriedly, sirspidey

text-to-speech-nodejs's Issues

developer experience checklist

  • Make sure it scales; it could end up on the front page of Reddit.
    • Mainly looking at page weight: shoot for 2-3 MB max and rendering in 5 seconds or less.
    • Test with dev tools throttling to 3G speeds and make sure things are still reasonable.
  • Add Google Analytics
  • blue-green deployment + travis (see this)
  • testing + travis (see this)
  • security.js (helmet + express-rate-limit) + ~~~CSRF (see personality-insights and speech demos)~~~
    • Skipping CSRF protection here because it doesn't work on GET, and we have to use GET to stream to <audio> elements. (It honestly doesn't make a lot of sense in general for unauthenticated apps... but we're employing it for the side-effect of making scraping harder. But, again, it doesn't work on GET requests.)
  • package.json should not specify a Node engine version, so that Bluemix always uses the latest one.
  • Google reCAPTCHA support (make sure design takes this into account when designing a demo)
    • Talk to design to add it for existing demos.
  • Update Travis to send emails when there is a tag release.
  • Bluemix deployment tracker and privacy notice

First syllable is lost in created audio after it's downloaded

  1. Issue: First syllable is lost in created audio after it's downloaded, for all en-US English voices, and some other voices.

Example text:
tomato, tomato, tomato.

Allison voice and most languages
After pressing "Download" and listening ...
Expected: tomato, tomato, tomato
Actual: mato, tomato, tomato (note: first syllable of first word is missing)

Alexandra

Service failing using a dedicated Bluemix account.

I am getting a challenge to enter ID information to pass through a Watson gateway when trying to use the demo application. In Bluemix Public, I am able to provide my user information and use the service; however, when I try to access the service using a Bluemix dedicated account, my credentials do not work. Assistance would be most appreciated.

Code 500 error

{"code":500,"error":"Block-scoped declarations (let, const, function, class) not yet supported outside strict mode"}

How can I fix this? Is it about the Node.js version?
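
This message is typical of older Node.js releases (4.x and 5.x), which accepted `let`, `const` and `class` only in strict mode, so the Node.js version is a likely cause. A small sketch of a startup check under that assumption; the function name and the cutoff at major version 6 are assumptions for illustration, not something this repo defines.

```javascript
// Hypothetical check: returns true when the given Node.js version string
// (for example process.version, "v12.0.0") has a major version of 6 or
// higher, the assumed cutoff for block-scoped declarations outside
// strict mode.
function supportsBlockScopeOutsideStrictMode(version) {
  const major = parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
  return Number.isInteger(major) && major >= 6;
}
```

Calling it with `process.version` before loading the rest of the app would allow a clearer error than the 500 response above.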

Redeploy the demo

Hi, German&James, I've updated the navigation link, service icon and favicon. Please help redeploy the demo :-)

Drop down menu

The order of the voices in the drop-down box needs to change. Each entry should read "Language (Country code): Name of voice", for example "English (US): Michael". This makes it easier to sort by language.
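
The proposed labeling could be sketched as follows. The lookup table, the function names, and the assumption that voice identifiers look like `en-US_MichaelVoice` are illustrative; this is not the demo's actual implementation.

```javascript
// Assumed lookup table from locale code to display label; extend as needed.
const LANGUAGE_LABELS = {
  'en-US': 'English (US)',
  'en-GB': 'English (UK)',
  'es-ES': 'Spanish (ES)',
};

// Turns an identifier such as "en-US_MichaelVoice" into
// "English (US): Michael". Unrecognized identifiers are returned unchanged.
function voiceLabel(voiceName) {
  const match = /^([a-z]{2}-[A-Z]{2})_(.+)Voice$/.exec(voiceName);
  if (!match) return voiceName;
  const locale = match[1];
  const name = match[2];
  const language = LANGUAGE_LABELS[locale] || locale;
  return language + ': ' + name;
}

// Builds the labels and sorts them, which groups voices by language first.
function sortedVoiceLabels(voiceNames) {
  return voiceNames
    .map(voiceLabel)
    .sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}
```

Because the language comes first in each label, a plain string sort is enough to group the voices by language.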

Language translator V2 or V3?

Hello, is this demo already using the Language Translation V3 service?
Or is it still on V2?
I cannot find an entry for TTS V3 in the current Watson SDK.
Is there a way to bypass that?
Thank you

Copyright on output?

Thanks for providing!

One quick question: Is the output licensed in a certain way or is it open source as well?

TTS demo does not seem to be streaming

I'm not sure, but from the latency it looks like it is not streaming. I thought this was a recent change by Eric, but maybe not in the published version yet?

Investigate out of memory crashes

Long running app can crash due to out of memory error. (With 768 MB).

Should investigate whether this is due to heavy simultaneous usage, a very large input, an ongoing memory leak, or something else.

Pause in the synthesis of "Consapevole"

There is a pause in the middle of the first word "Consapevole" when synthesizing the default Italian text.
This problem seems to only happen when using Firefox. Works ok on Chrome.
Downloading the audio works fine. I only see this problem when the audio is streamed.

Speech should start playing before it finishes downloading

The service supports streaming of audio; however, both Chrome and Firefox wait until the entire download is finished before they begin playing, creating a lot of lag. Our old demo did not have this bug (it started playing immediately, although the slider position was incorrect, since it didn't know the file size).

Audio not downloaded as "transcript.mp3" in Chrome

In Chrome, I see the option to download the audio here:

image

This always downloads the audio as 60aa3761-59f1-4519-baca-f029e9a0d8bc.mp3 instead of transcript.mp3 like when I download through this:

image

Should the GUID be removed?
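
One common way for a server to suggest a download name is the Content-Disposition response header. A minimal sketch of building that header value; the helper name is hypothetical, and whether the browser's built-in audio control honors the header here is an assumption that would need testing.

```javascript
// Hypothetical helper: builds a Content-Disposition value that suggests a
// fixed download name. Characters outside letters, digits, "_", "." and "-"
// are replaced with "_" to keep the quoted value simple.
function attachmentHeader(filename) {
  const safe = String(filename).replace(/[^\w.\-]/g, '_');
  return 'attachment; filename="' + safe + '"';
}
```

In an Express-style handler the value would be set on the response before the audio is piped, for example `res.setHeader('Content-Disposition', attachmentHeader('transcript.mp3'))`.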

"SyntaxError: Unexpected token :" on running node app.js

I'm getting an error while trying to start the node server:

"credentials": {
^
SyntaxError: Unexpected token :
at exports.runInThisContext (vm.js:73:16)
at Module._compile (module.js:443:25)
at Object.Module._extensions..js (module.js:478:10)
at Module.load (module.js:355:32)
at Function.Module._load (module.js:310:12)
at Function.Module.runMain (module.js:501:10)
at startup (node.js:129:16)
at node.js:814:3

My app.js looks like the following and the username and password field are filled in properly

{
"credentials": {
"url": "https://stream.watsonplatform.net/text-to-speech/api",
"password": "passwordhere",
"username": "usernamehere"
}
}
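
The error is consistent with app.js containing a bare JSON document rather than JavaScript: Node parses the leading `{` as the start of a block, so the `:` after `"credentials"` is unexpected. A minimal sketch, assuming the credentials are kept in a separate JSON file and parsed from there; the helper name and any file name are hypothetical.

```javascript
// Hypothetical helper: parses JSON text in the shape shown above and
// returns the three fields, throwing if any of them is missing.
function parseCredentials(jsonText) {
  const parsed = JSON.parse(jsonText);
  const credentials = parsed.credentials || {};
  const { url, username, password } = credentials;
  if (!url || !username || !password) {
    throw new Error('credentials must include url, username and password');
  }
  return { url, username, password };
}
```

The JSON text could be read with `fs.readFileSync` from a separate file, which leaves app.js as ordinary JavaScript.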

Should have visual feedback while waiting for speech

Pressing "Speak" results in no user-visible feedback until the entire download is finished and speech begins. This can often take 10-20 seconds, resulting in a bad user experience. As soon as the Speak button is pressed there should be some visual indication that the request was sent and is waiting for the server.

ETA on non-supported tags?

I'm making a voice file that requires emphasis on some words and audio clips to be played, but neither of those features is supported. Do you know either A) when those features will become supported or B) what other programs I can use that do support those features?

update this fragment for work

var tts_credentials = extend({
  url: 'https://stream.watsonplatform.net/text-to-speech/api',
  version: 'v1',
  username: 'xxxxxxx',
  password: 'xxxxx',
}, bluemix.getServiceCreds('text_to_speech'));

// Create the service wrappers
var textToSpeech = watson.text_to_speech(tts_credentials);

app.get('/synthesize', function(req, res) {
  var transcript = textToSpeech.synthesize(req.query);
  transcript.on('response', function(response) {
    if (req.query.download) {
      response.headers['content-disposition'] = 'attachment; filename=transcript.ogg';
    }
  });
  transcript.on('error', function(error) {
    console.log('Synthesize error: ', error);
  });
  transcript.pipe(res);
});

I need to update this fragment because I can't hear any speech.
Thanks

Text to Speech demo returns 500 error

Overview
I checked out the demo project from https://github.com/watson-developer-cloud/text-to-speech-nodejs.
I created a .env file and entered the url and apikey from my IBM Cloud account.
Then I ran the app locally.
I opened it in a browser and clicked the Speak button.
I got a 500 error, with the error message shown in the screenshots.
After a few seconds the error message disappeared.
I clicked the Speak button again, and this time it ran correctly.
The 500 error happens only on the first request after starting the local server with "npm start".

Expected behavior
Run locally, then click the Speak button for the first time: speech is synthesized correctly.

Actual behavior
Run locally, then click the Speak button for the first time: a 500 error and an error message appear.

How to reproduce
Check out the demo project from https://github.com/watson-developer-cloud/text-to-speech-nodejs.
Follow the README and run locally, then click the Speak button.

Screenshots
Before clicking the Speak button
Screenshot 2019-06-04 10 38 58
After clicking the Speak button
Screenshot 2019-06-04 10 39 19
Error message
Screenshot 2019-06-04 10 23 52

Additional information:

  • OS: MacOS Mojave
  • Which version of Node are you using?: 12.0.0

Ogg Vorbis vs Ogg Opus

The docs (and my experience a while back) indicate that the service only supports Ogg containers with the Opus codec but the demo says "The audio is returned in the Ogg Vorbis format which..."

(FWIW, I'd love to see both codecs supported - some of the IoT hardware I wanted to do a demo with supports Ogg Vorbis but not Ogg Opus.)

Synthesize stops when text contains invalid character(s)

TTS DEMO -- For US English voices only, after pressing the "Speak" button the program begins to synthesize text to speech, but stops before the end when the text contains an invalid character.

Expected: For all voices, the program synthesizes all of the text to speech, whatever characters it contains.
Actual: For US English voices, the program does not synthesize text that contains an invalid character.

Note:
As per Radek, "This is a bug in the demopage; it works fine when communicating directly with https://stream-d.watsonplatform.net/text-to-speech/api/v1/synthesize?voice=en-US_AllisonVoice"

See details below:
When en-US_AllisonVoice, en-US_LisaVoice, or en-US_MichaelVoice is selected, and the text below is entered, and you press "Speak" the program stops speaking at the point "Stops speaking here" is stated in the text box.
And an audio file is successfully created containing only the part spoken out loud and not the entire text.
When any other voice is selected, the program works correctly and completes successfully by not stopping in the middle of the text.

Example text in text box:
Italian:
<phoneme alphabet="ipa" ph=".ʀe.ˈla.to"></phoneme> <phoneme alphabet="ibm" ph="1byan.0ko"></phoneme>

English:
<phoneme alphabet="ipa" ph="təˈmeɪ.ɾoʊ"></phoneme> <phoneme alphabet="ipa" ph="təˈmeɪ.ɾoʊ"></phoneme> <phoneme alphabet="ibm" ph=".0tx.1me.0Fo"></phoneme>

Castilian Spanish:
<phoneme alphabet="ipa" ph=".ˈθiŋ.ko"></phoneme> <phoneme alphabet="ibm" ph="1Tin.0ko"></phoneme>

Italian:
<phoneme alphabet="ipa" ph=".ʀe.ˈla.to"></phoneme> <phoneme alphabet="ibm" ph="1byan.0ko"></phoneme>

English:
<phoneme alphabet="ipa" ph="təˈmeɪ.ɾoʊ"></phoneme> <phoneme alphabet="ipa" ph="təˈmeɪ.ɾoʊ"></phoneme> <phoneme alphabet="ibm" ph=".0tx.1me.0Fo"></phoneme>

Castilian Spanish:
<phoneme alphabet="ipa" ph=".ˈθiŋ.ko"></phoneme> <phoneme alphabet="ibm" ph="1Tin.0ko"></phoneme> -- Stops speaking here

North American Spanish:
<phoneme alphabet="ipa" ph=".in.di.ˈβi.ðwo"></phoneme> <phoneme alphabet="ibm" ph=".0e.1DaD"></phoneme>

Brazilian Portuguese


French:

US English


UK English


German:

Alexandra

Support for MP3 output

Add support for MP3 output (doesn't have to be streamed). Currently, MP3 has the broadest audio support in HTML; Ogg is only supported by a few browsers. If we had MP3 support it would be easier to use this service in our product.

Add the ability to receive the Word timing metadata of a TTS stream

We currently use TTS for our product, but the quality of this is much better than anything else out there. We'd like to use this, but our product requires that we know the timings of when words (or phonemes) are said. One of the TTS products we currently use gives us this data and we use it. It'd be nice if this would export that metadata along with the audio file so we could use this service.

We currently parse this from metadata stored in the Comment sections of ID3v2 tags. We currently get something like this:

timed_phonemes
word start_in_ms end_in_ms amplitude
word ...
...

Not married to the format, but just to illustrate what type of information we'd need.
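
To illustrate how a consumer might read such data, here is a minimal parser sketch for the layout above. It assumes one whitespace-separated entry per line with numeric start, end and amplitude fields; the function name and the field names in the result are hypothetical.

```javascript
// Hypothetical parser for the illustrative layout above. Lines that do not
// have exactly four fields, or whose last three fields are not numeric
// (such as the "timed_phonemes" marker and the column-name line), are skipped.
function parseTimings(text) {
  const entries = [];
  for (const line of text.split(/\r?\n/)) {
    const parts = line.trim().split(/\s+/);
    if (parts.length !== 4) continue;
    const word = parts[0];
    const startMs = Number(parts[1]);
    const endMs = Number(parts[2]);
    const amplitude = Number(parts[3]);
    if (Number.isNaN(startMs) || Number.isNaN(endMs) || Number.isNaN(amplitude)) {
      continue;
    }
    entries.push({ word, startMs, endMs, amplitude });
  }
  return entries;
}
```

Each returned entry carries the word plus its start and end offsets in milliseconds, which is the information needed to synchronize highlighting or animation with playback.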

TTS Demo -- "Too many requests, please try again later.", when I press "Speak" button too often

Issue: TTS Demo -- Too many requests, please try again later.
When I make about 5 updates to the text box in a minute, I get an error message: "Too many requests, please try again later."

Or, when I press the "Speak" button too often (23 times in a row), I get an error message: "Too many requests, please try again later." (See attachment for a screenshot after this test.)

Expected: No error message
Actual: I get an error message: "Too many requests, please try again later."

Alexandra
TTS Demo -- Too many requests, please try again later on June 28 2016.zip
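
The message is consistent with a per-client request limit, such as the rate limiting the checklist lists for security.js. The sketch below shows how a fixed-window limiter of that kind behaves; it is an illustration with assumed names and values, not the demo's actual configuration.

```javascript
// Illustrative fixed-window limiter: each key (for example a client IP) may
// make at most `max` requests per `windowMs` milliseconds.
function createRateLimiter({ windowMs, max }) {
  const hits = new Map(); // key -> { count, windowStart }

  return function isAllowed(key, now = Date.now()) {
    const entry = hits.get(key);
    // Start a new window on the first request or once the old one expired.
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= max;
  };
}
```

With a small `max`, a burst of clicks on "Speak" exhausts the window quickly, and later requests are rejected until the window resets, which matches the behavior reported above.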

Support for semi colon (;)

Based on a question in dAnswers: https://developer.ibm.com/answers/questions/186320/text-to-speech-doesnt-like-semi-colons.html

Having just tried out the text to speech demo (English) all was going well until it reached a semi colon (;) in the text.
It was a fair amount of text (several paragraphs) and I thought it might only read the first paragraph, so I deleted this character and tried again. It did indeed read past that point, and past all other candidate ending points, all the way to the end successfully.
I can only therefore deduce that a semi-colon is not supported? Is this intentional or a bug? Are there any other characters that should be stripped out of the text?

Demo throws error for valid Unicode representation of "E"

The demo page at reports error

Input contains unsupported IPA character 0X00000045

for SSML input

<speak version="1.0">
  <phoneme alphabet="ipa" ph="&#x02C8;&#x0045;s">s</phoneme>
</speak>

where &#x02C8;&#x0045;s should correspond to ˈEs
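
To make the claim concrete, the numeric character references can be decoded with a few lines of code. This is an illustrative helper with a hypothetical name; it handles only numeric references, not named entities.

```javascript
// Illustrative decoder for XML numeric character references such as
// "&#x02C8;" (hexadecimal) or "&#69;" (decimal).
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-fA-F]+|\d+);/g, (match, code) => {
    const codePoint = code[0] === 'x'
      ? parseInt(code.slice(1), 16)
      : parseInt(code, 10);
    return String.fromCodePoint(codePoint);
  });
}
```

Decoding the `ph` value from the SSML above yields U+02C8 (the IPA primary stress mark) followed by a plain capital "E" (U+0045) and "s", which is the character the error message reports as 0X00000045.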

Add GDPR Language to Stand Alone Demo Pages

Context:
GDPR will go into effect on May 25th; we need the demos to display the following language in order to make them compliant with the law:

This system is for demonstration purposes only and is not intended to process Personal Data. No Personal Data is to be entered into this system as it may not have the necessary controls in place to meet the requirements of the General Data Protection Regulation (EU) 2016/679.

For more information see: https://github.ibm.com/Watson/developer-experience/issues/4342
