pgmichael / wavenet-for-chrome

Chrome extension that transforms highlighted text into high-quality natural sounding audio using Google Cloud's Text-to-Speech.

Home Page: http://wavenet-for-chrome.com

License: MIT License

HTML 1.67% JavaScript 20.61% CSS 0.38% TypeScript 77.34%

wavenet-for-chrome's Issues

Some voices skewed and distorted

en-US-Wavenet-G.
This happens randomly, and the audio is either fully or partially affected.

Shortcut downloads a 0-byte file.

Using the shortcut Ctrl+Shift+E downloads an empty 0-byte file. Text is selected when the keys are pressed.
Running it from the context menu works fine.

Google Chrome
Version 87.0.4280.66 (Official Build) (64-bit)

Audio - Invalid argument

I get an error:

Failed to synthesize text
Request contains an invalid argument.

for SSML like:

Audio MP3 inclusions worked fine just yesterday, but not today.

I tried a couple hostings and WAV-format as well, same thing.

Any ideas?

Replicating Wavenet-for-Chrome in Google TTS NodeJS

This is brilliant, so handy.

Just a query regarding bitrate and overall quality. When using "Download as MP3" the quality from Wavenet-for-Chrome far exceeds what I am getting out of using the NodeJS version of Google Text-to-Speech.

https://github.com/pgmichael/wavenet-for-chrome/blob/master/js/background.js

I take it you are using LINEAR16 to create the better quality bitrate and then writing it as an MP3?

Here is my code:

    const fs = require('fs');
    const util = require('util');
    const textToSpeech = require('@google-cloud/text-to-speech');

    // myMessage, langcode and voicename are supplied by the caller
    async function synthesizeToMp3(myMessage, langcode, voicename) {
      // Start the Google Text-to-Speech client
      const client = new textToSpeech.TextToSpeechClient();

      const request = {
        input: { text: myMessage },
        // Select the language and the voice name
        voice: { languageCode: langcode, name: voicename },
        // Select the type of audio encoding
        audioConfig: { audioEncoding: 'MP3' },
      };

      // Perform the Text-to-Speech request
      const [response] = await client.synthesizeSpeech(request);
      // Write the binary audio content to the temp directory
      const writeMP3File = util.promisify(fs.writeFile);
      await writeMP3File('/tmp/tts.mp3', response.audioContent, 'binary');
      console.log('MP3 audio content written to temp');
    }

This is what I get from using this Node code
Audio: mp3, 24000 Hz, mono, fltp, 32 kb/s

Can you help edit this code so I get the same top quality as I get when I use your plugin? (I use NodeJS for the bigger files)

SSML not working in sandbox

I tried using the Sandbox with SSML. Unfortunately, the tags are not recognized and are always read out. Whether I use <script></script> or encode it like &lt;script&gt;, it's always read out loud.

Mixed ffprobe outputs over last 2-3 weeks MP3 creation

I have downloaded MP3s created from SSML for some projects, to add to videos that I then build with ffmpeg. Has something changed over the last month? The ffprobe output for each batch of work we did seems to have changed (three times, by the looks of it). This has led to some of the resulting videos not playing properly on certain devices.

Work done on 19th Jan, ffprobe output:

Input #0, wav, from 'download (65).mp3':
  Duration: 00:00:18.04, bitrate: 384 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s

Work done on 23rd Jan, ffprobe output:

Input #0, wav, from 'download (0).mp3':
  Duration: 00:00:35.94, bitrate: 384 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s

Work done on 31st Jan, ffprobe output:

Input #0, ogg, from 'download (0).mp3':
  Metadata:
    TLEN            : 37003
  Duration: 00:00:37.00, start: 0.000000, bitrate: 23 kb/s
    Stream #0:0: Audio: opus, 48000 Hz, mono, fltp

Running it today 19th Feb now brings another set of file attributes!

Input #0, mp3, from 'download (1).mp3':
  Duration: 00:00:25.30, start: 0.000000, bitrate: 32 kb/s
    Stream #0:0: Audio: mp3, 24000 Hz, mono, fltp, 32 kb/s

But BTW, this plugin is brilliant! Just a bit of an issue with using these with video creation (ffmpeg) for me.

PS - Just for additional reference, this is the output from a file in June 2019 I created via Google TTS in Node myself

Input #0, wav, from 'download.mp3':
  Duration: 00:00:33.06, bitrate: 384 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, 1 channels, s16, 384 kb/s
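
The ffprobe outputs above show files named .mp3 whose container is reported as wav, ogg, or mp3. A minimal sketch, assuming Node.js, of checking a file's leading bytes before handing it to ffmpeg. `detectAudioContainer` is a hypothetical helper name, not something the extension provides:

```javascript
// Hypothetical helper: guess the container from the file's leading bytes.
// Signatures: WAV = "RIFF"...."WAVE", Ogg = "OggS", MP3 = "ID3" tag or MPEG frame sync.
function detectAudioContainer(bytes) {
  const ascii = (start, end) =>
    String.fromCharCode(...bytes.slice(start, end));

  if (bytes.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') {
    return 'wav';
  }
  if (bytes.length >= 4 && ascii(0, 4) === 'OggS') {
    return 'ogg';
  }
  if (bytes.length >= 3 && ascii(0, 3) === 'ID3') {
    return 'mp3';
  }
  // An MPEG audio frame starts with 11 set bits: 0xFF, then the top 3 bits of the next byte.
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0) {
    return 'mp3';
  }
  return 'unknown';
}
```

It accepts a Buffer or Uint8Array, for example the result of `fs.readFileSync('download (0).mp3')`, so a batch script could rename or transcode files whose extension does not match their contents.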

Failed to synthesize text - new error experienced

"Trying to acquire 1 unit; currently 1500 units in use; maximum of 1500 units allowed. Server: /bns/jt/borg/jt/bns/cloud-ml-tts-composer/prod-standard-voice-global.tts-composer/4"

I tried using smaller groupings of text, but it didn't help.

Ability to stop, pause and resume current selection.

The shortcut to start reading should also stop reading, or a second shortcut should be added. If you select a large section of text and start reading, you can't stop it!

"Queue mode"

Add an option to enable "queue mode" which changes the behaviour when activating the extension. Normally, if it is already speaking, it will interrupt the previous speech with whatever new selection is sent. Instead, in this mode, it will just keep queueing requests back to back. If the user stops playback, it will jump to the next queued selection of text, and so on. This is good for sites that have a lot of crap between sections of text, so this way we could queue up all the readable text on the site without having to listen to it read out URLs and image sources.
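
The behaviour requested above can be sketched as a small queue. This is only an illustration of the idea, not the extension's code. `SpeechQueue` and the `speak` callback are hypothetical names standing in for whatever actually starts playback:

```javascript
// Sketch of "queue mode": new selections wait their turn instead of interrupting.
// `speak` is a hypothetical callback standing in for whatever starts playback.
class SpeechQueue {
  constructor(speak) {
    this.speak = speak;
    this.pending = [];
    this.current = null;
  }

  // Called when the user activates the extension on a selection.
  enqueue(text) {
    this.pending.push(text);
    if (this.current === null) this.next();
  }

  // Called when playback finishes or the user presses stop:
  // drop the current item and start the next queued one, if any.
  next() {
    this.current = this.pending.length > 0 ? this.pending.shift() : null;
    if (this.current !== null) this.speak(this.current);
  }
}
```

Calling `next()` from both the "playback ended" and the "stop" handlers gives the described effect: stopping skips to the next queued selection rather than silencing everything.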

ssml not working as of ver 10 update

As of today (7/26/23), SSML is no longer working.

Example:
<speak> <prosody pitch="low">Absolutely!</prosody> The power of the Horde is <emphasis>undeniable</emphasis>! For the Horde! </speak>

saying please enable billing?

This API method requires billing to be enabled. Please enable billing on project #58906374421 by visiting https://console.developers.google.com/billing/enable?project=58906374421 then retry. If you enabled billing for this project recently, wait a few minutes for the action to propagate to our systems and retry.

I just bought this and set it up about a month ago. Any ideas on how to fix this issue?

Adding the extension

Greetings,

Unfortunately, I cannot add Wavenet either from the Chrome Store or manually through the extension developer mode. It gives some sort of certificate error.

Also, because I have this extension on Chromium as well, I receive a different error stating that the API key is not valid, which is wrong; after copy-pasting the key multiple times, it works.

This extension is very handy, so I would appreciate it if you would fix the issue.

Thank you very much.

Is it possible to read ssml?

Thanks for making this. It's been of great help, but is it possible to read SSML? Right now it reads the syntax aloud.
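
For context, the Text-to-Speech API's synthesis input takes either a `text` field or an `ssml` field, and markup sent as `text` is read literally. A minimal sketch of choosing between the two, assuming SSML is recognised by a wrapping <speak> element. `buildSynthesisInput` is a hypothetical name, and this is not necessarily how the extension decides:

```javascript
// Sketch: choose between the API's `input.text` and `input.ssml` fields.
// Assumption: SSML input is recognised by a wrapping <speak>...</speak> element.
function buildSynthesisInput(selection) {
  const trimmed = selection.trim();
  const looksLikeSsml = /^<speak[\s>][\s\S]*<\/speak>$/.test(trimmed);
  return looksLikeSsml ? { ssml: trimmed } : { text: selection };
}
```

The returned object would be used as the `input` of a synthesize request, so plain selections keep working unchanged.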

Mp3 Download?

Pg, I love your work, it has been a blessing for me and my family, but it would be perfect if it had the "download as mp3" functionality.

Anyways, cheers and thank you again.

New voices! (August 2019)

Looks like Google just announced some new voices and variants to existing languages.

Cloud Text-to-Speech expands its number of voices by nearly 70%, now covering 33 languages and variants

If I have some time this week I'll create a PR for this. But I am wondering whether @pgmichael has thought of getting the voices programmatically instead? I wonder if there are any implications: https://cloud.google.com/text-to-speech/docs/list-voices
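
A minimal sketch of what "programmatically getting the voices" could look like once a voice list has been fetched. It assumes each voice object carries `name` and `languageCodes`, as in the list-voices response documented at the link above. `groupVoicesByLanguage` is a hypothetical helper:

```javascript
// Sketch: group voices from a list-voices response by language code.
// Assumes each voice has `name` and `languageCodes`, as in the voices list response.
function groupVoicesByLanguage(voices) {
  const byLanguage = {};
  for (const voice of voices) {
    for (const code of voice.languageCodes) {
      (byLanguage[code] = byLanguage[code] || []).push(voice.name);
    }
  }
  // Sort names so menus built from this map are stable between fetches.
  for (const code of Object.keys(byLanguage)) byLanguage[code].sort();
  return byLanguage;
}
```

With the Node.js client, the input would presumably come from something like `const [result] = await client.listVoices({})` followed by `groupVoicesByLanguage(result.voices)`. New voices would then appear without a code change, at the cost of one extra request.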

Education

Toefl and Ielts education

Bypass Google's 5000 characters limit.

Hello, let's say I want to read this article https://blog.cloudflare.com/empowering-your-privacy/, I can't in one shot, I have to select multiple times, it's a hassle! Why the 5000 chars limit? If this could be removed or manually configured, it would be awesome!

Google Cloud Speech-to-Text - Speech Recognition for Chrome.

Hello, thank you for the Wavenet plugin that you developed; it is very easy to use. I have a question: do you have any plans to develop a Google Speech-to-Text plugin for Chrome? Thank you very much.

external use

How can I use the extension from my own local scripts, inside my own tabs?

Add newlines between titles and paragraph.

As mentioned in #15, adding newlines between titles and paragraphs will introduce a short pause when reading, which will sound more natural to the end user.

beans

`npm run build` does not work on Ubuntu 18.04

Hi, I love this package and I use it every day! It changed my life. I wanted to modify it this morning but I ran into the following error when I tried to npm run watch.

Note

Ubuntu 18.04
node v8.10.0
npm v3.5.2

My Error

npm ERR! Linux 4.15.0-140-generic
npm ERR! argv "/usr/bin/node" "/usr/bin/npm" "run" "build"
npm ERR! node v8.10.0
npm ERR! npm  v3.5.2

npm ERR! Invalid version: "5"
npm ERR! 
npm ERR! If you need help, you may report this error at:
npm ERR!     <https://github.com/npm/npm/issues>

npm ERR! Please include the following file with any support request:
npm ERR!     /home/conor/Dropbox/07_liquidity/18_wavenet_for_chrome/wavenet-for-chrome/extension/npm-debug.log

oh and here is the npm-debug.log

(py383) ➜  extension git:(master) ✗ cat npm-debug.log 
0 info it worked if it ends with ok
1 verbose cli [ '/usr/bin/node', '/usr/bin/npm', 'run', 'build' ]
2 info using npm@3.5.2
3 info using node@v8.10.0
4 verbose stack Error: Invalid version: "5"
4 verbose stack     at Object.fixVersionField (/usr/share/npm/node_modules/normalize-package-data/lib/fixer.js:191:13)
4 verbose stack     at /usr/share/npm/node_modules/normalize-package-data/lib/normalize.js:32:38
4 verbose stack     at Array.forEach (<anonymous>)
4 verbose stack     at normalize (/usr/share/npm/node_modules/normalize-package-data/lib/normalize.js:31:15)
4 verbose stack     at final (/usr/share/npm/node_modules/read-package-json/read-json.js:338:5)
4 verbose stack     at then (/usr/share/npm/node_modules/read-package-json/read-json.js:113:5)
4 verbose stack     at /usr/share/npm/node_modules/read-package-json/read-json.js:300:12
4 verbose stack     at /usr/share/npm/node_modules/graceful-fs/graceful-fs.js:76:16
4 verbose stack     at FSReqWrap.readFileAfterClose [as oncomplete] (fs.js:511:3)
5 verbose cwd /home/conor/Dropbox/07_liquidity/18_wavenet_for_chrome/wavenet-for-chrome/extension
6 error Linux 4.15.0-140-generic
7 error argv "/usr/bin/node" "/usr/bin/npm" "run" "build"
8 error node v8.10.0
9 error npm  v3.5.2
10 error Invalid version: "5"
11 error If you need help, you may report this error at:
11 error     <https://github.com/npm/npm/issues>
12 verbose exit [ 1, true ]

Add KB shortcut to read or read whole page

On some online book-reading sites, e.g. O'Reilly.com, when I select text and right-click to reach the read option, the site deselects the text and hides the menu option.

Could we have an option to read the main text of the page body, or the ability to highlight text and then start reading via a keyboard shortcut?

For reference, the read aloud chrome plugin allows this (read whole page) but doesn't support the GoogleWavenet voices ;)

Ability to retrieve selection from all environments.

We currently retrieve the selected text from the selection context or the window.getSelection method. Unfortunately, this does not seem to work in certain environments:

PDFs
Google Docs
Text inputs
iFrames
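
For the text-input case only, one possible fallback is to read the selection range from the focused field. A minimal sketch under that assumption: `getSelectedText` is a hypothetical helper, and its two arguments stand in for `window.getSelection().toString()` and `document.activeElement` so the logic can be exercised outside a browser. PDFs, Google Docs, and iframes would need other approaches.

```javascript
// Sketch: fall back to the focused text field when the page selection is empty.
// `pageSelection` mirrors window.getSelection().toString();
// `activeElement` mirrors document.activeElement.
function getSelectedText(pageSelection, activeElement) {
  if (pageSelection && pageSelection.trim() !== '') return pageSelection;

  const isTextField =
    Boolean(activeElement) &&
    typeof activeElement.value === 'string' &&
    typeof activeElement.selectionStart === 'number' &&
    typeof activeElement.selectionEnd === 'number';

  return isTextField
    ? activeElement.value.slice(activeElement.selectionStart, activeElement.selectionEnd)
    : '';
}
```

In a content script this might be called as `getSelectedText(window.getSelection().toString(), document.activeElement)`.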

Increase quotas

I am always getting the same message, even though the sentences are quite short.

Either input.text or input.ssml is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis.

I read the article on Google but I don't understand what I have to do.

How can I increase that limit?

On my API I see all these quotas:

Count of requests for Neural2 voices per minute
All requests per minute
Count of requests for Long Audio Synthesis per minute
Count of requests for querying Long Audio Synthesis operations per minute
Count of requests for Studio voices per minute

What exactly is the process? Can anybody help me with this? Thanks!

Ability to download in .wav format

This extension is super useful, but an option to download audio in .wav format would be awesome!

Download not working when SSML used

Thank you for the great extension which saves a ton of time.

I am facing an issue with downloading, using both the sandbox and the right-click method.
Here is the scenario.

The content below throws an error.
Studio O is selected as the voice.

<speak>
In a lush jungle, a thankful tiger and a cheerful monkey were close friends. <break time="200ms" /> They often explored the jungle together, admiring its beauty.
</speak>

When I remove the break time, the download works fine.
So the below content works.

<speak>
In a lush jungle, a thankful tiger and a cheerful monkey were close friends. They often explored the jungle together, admiring its beauty.
</speak>

Also, there are no issues when I press "Read Aloud" with either content. Only downloading is the problem.

Voices missing for Nederlands / Dutch

Your plugin is fantastic. The only thing missing is the set of voices for Netherlands/Dutch: Wavenet A/B/C/D etc. It would be awesome if you could add them!

Shortcuts not working for me

Hi,

Thanks for the awesome extension!

None of the shortcuts configured via chrome://extensions/shortcuts work for me. Additionally, there is no configurable shortcut to pause the audio.

I'm using Chrome browser production build on Mac OS 10.15.2.

Let me know if I can provide any additional info.

Keyboard shortcut for download option?

A keyboard shortcut for the Download option would be very helpful (control/command + shift + D)

Feature Request: Play selected text on keyboard shortcut

Hi @pgmichael, thanks so much for creating this extension! I have a feature request to play a selected piece of text upon executing a keyboard shortcut!

Thanks so much!
Jonathan

Start speaking shortcut is not working in some URL

Please check this url: http://futurepress.github.io/epub.js/examples/scrolled.html
"Start speaking" does not work after pressing the shortcut.
Speaking does work from the right-click menu.

Highlight text when reading

This is an advanced feature but has the potential to really differentiate this extension from the rest.

Android users enjoy a Google Assistant feature called "read it". What is great about it, apart from the free WaveNet voices, is that it highlights the text so you can read while listening. This is a game-changer and significantly boosts understanding!

I would love to have this feature here! This extension is the freaking best, and I know you @pgmichael or others can add this feature!

API Expired - no solution? Unable to use the application

For the last few weeks the app hasn't been working.

The API key has apparently expired and I'm unable to figure out how to make it work again. This needs to be simpler, and the issue needs to be explained by the developers.

Failed to synthesize text Resource has been exhausted (e.g. check quota).

Can someone tell me what happened and why it won't complete text-to-speech MP3s anymore?

20240505

audio

Is it possible to add the "Audio device profile" option? Thank you.

Seemingly random skipping of sentences when reading over 5000 characters.

I've noticed that sometimes portions of sentences, and sometimes entire sentences, get completely skipped when reading more than 5000 characters. My guess is that it has something to do with the regex that handles splitting the text, but I can't be sure of that. Either way, it doesn't happen with every body of text. I think it happens more in bodies with odd characters sprinkled in, but I've also had the last sentence just get skipped with no explanation. The easiest way to test would be to select all the text on various sites and see if it reads everything.
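
As a point of comparison for debugging, here is a minimal sketch of a splitter that cannot drop text by construction: joining its chunks always reproduces the input. It is not the extension's implementation. `splitIntoChunks` is a hypothetical name, and the 5000 default mirrors the limit quoted in the issues:

```javascript
// Sketch: split text into chunks of at most `maxBytes` UTF-8 bytes without
// dropping any characters. Sentences are kept whole where possible; a single
// sentence longer than the limit is split at character boundaries.
function splitIntoChunks(text, maxBytes = 5000) {
  const byteLength = (s) => new TextEncoder().encode(s).length;
  // Every character lands in exactly one piece: a sentence with its
  // terminators and trailing whitespace, or a run of leading terminators.
  const pieces = text.match(/[^.!?]+[.!?]*\s*|[.!?]+\s*/g) || [];
  const chunks = [];
  let current = '';

  for (const piece of pieces) {
    if (byteLength(current + piece) <= maxBytes) {
      current += piece;
      continue;
    }
    if (current !== '') chunks.push(current);
    current = '';
    // Oversized sentence: fall back to splitting by code point.
    for (const char of piece) {
      if (current !== '' && byteLength(current + char) > maxBytes) {
        chunks.push(current);
        current = '';
      }
      current += char;
    }
  }
  if (current !== '') chunks.push(current);
  return chunks;
}
```

A quick check against any page is `splitIntoChunks(text).join('') === text`. If that holds but audio is still missing, the skipping probably happens somewhere other than the split step.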

Firefox version?

I'm sure this is a bit of a reach, but due to Chrome's impending removal of Manifest V2, I am starting to transition to Firefox. However, this extension doesn't seem to have an equivalent on Firefox. I'm not sure how simple it is to port this extension to Firefox, but is that in the cards? Even if it's just this current version and never gets updated, it would help me a great deal with my transition, as most of my other extensions are available. I'm sure I'm not the only person thinking this, but I thought I'd put it out there. I use this extension liberally and it would be a huge loss for me to not be able to use it on Firefox. Thank you for your consideration.

`Options` in menu is disabled, unable to see settings for pitch or speed.

Add media properties to downloaded file.

No media properties are created for the MP3 file when the download MP3 option is selected.

Google Docs

I can't seem to play or download scripts through Google Docs in Chrome. I set the shortcuts to be global in Chrome as well, but no luck. Any ideas?

It's showing API key expired

My account is working well and I only registered yesterday. After facing this issue I deleted the old API key and generated a new one, but it still shows the same error.

English Education

IELTS、TOEFL、SSAT、SAT、GMAT、GRE

I love this app.

I love, love this app. I struggle to read and having Wavenet as an option is fantastic. Though I think that by the time I've listened to a novel it may cost the same as buying the audio version. LOL

Anyhow, would love to see this ported out of chrome into a desktop app that could read text copied to the clipboard automatically. Be even more useful if it displayed and highlighted the text in a pop-up window.

Thank you for your work!

Can an option to play/pause audio and a corresponding shortcut for that be added?

Thanks!

Add a word filter list

Add a user-customizable list of words or text cues to find and replace with different text before sending the query to WaveNet. This could allow us to fix bad pronunciation of commonly spoken words.
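
The requested pre-processing step could be sketched as below. `applyWordFilters` and the `{ find, replace }` rule shape are hypothetical, and whole-word, case-insensitive matching is an assumption rather than part of the request:

```javascript
// Sketch: apply a user-defined find-and-replace list before synthesis.
// Each rule is { find, replace }; matching is whole-word and case-insensitive
// here, which is an assumption rather than a stated requirement.
function applyWordFilters(text, rules) {
  const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return rules.reduce((result, { find, replace }) => {
    const pattern = new RegExp(`\\b${escapeRegExp(find)}\\b`, 'gi');
    // A replacer function keeps any `$` in the replacement text literal.
    return result.replace(pattern, () => replace);
  }, text);
}
```

Rules are applied in order, so a later rule sees the output of earlier ones. Note that `\b` only matches next to letters, digits, or underscores, so search terms that begin or end with punctuation would need a different boundary check.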
