thev360 / qcs-tts

Text-to-Speech System for 12's QCS Frontend

JavaScript 100.00%

qcs-tts's Introduction

Strange experiments and stuff, etc.

qcs-tts's Issues

espeak likes to use the wrong voice constantly?

This'll probably be fixed by the improved voice profiles system thing, but espeak has a nasty habit of not using the voices I'm asking it to use. My wild guess is that I don't tell espeak to switch to another voice in time for it to load that voice completely, and then I ask it to speak and it uses the fallback default voice? If I remember correctly, there is a way to set the current voice of the TTS on the speechSynthesis object. I'd just need to set that at startup. This should run after user configs are loaded, so it's kinda difficult to do in this current UI-less "edit the JavaScript yourself if you're so smart" setup.
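A rough sketch of the "wait until the voice is actually loaded" idea. As far as I know, the Web Speech API picks the voice per utterance (`utterance.voice`) rather than through a global setting on `speechSynthesis`, and `getVoices()` can return an empty list until the `voiceschanged` event fires. The helper names `pickVoice` and `loadVoices`, the 2-second timeout, and the `"en"` fallback are all assumptions for illustration, not anything from the repo.

```javascript
// Pick a voice by name from a list of SpeechSynthesisVoice-like objects.
// Falls back to the first voice whose lang starts with `fallbackLang`,
// then to null (null leaves the choice to the browser default).
function pickVoice(voices, wantedName, fallbackLang = "en") {
  const exact = voices.find((v) => v.name === wantedName);
  if (exact) return exact;
  const sameLang = voices.find((v) => v.lang && v.lang.startsWith(fallbackLang));
  return sameLang || null;
}

// Resolve once the synth reports a non-empty voice list, or after a timeout.
// `synth` is expected to look like window.speechSynthesis.
function loadVoices(synth, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const ready = synth.getVoices();
    if (ready.length > 0) {
      resolve(ready);
      return;
    }
    const timer = setTimeout(() => resolve(synth.getVoices()), timeoutMs);
    synth.addEventListener(
      "voiceschanged",
      () => {
        clearTimeout(timer);
        resolve(synth.getVoices());
      },
      { once: true }
    );
  });
}

// Browser-only usage (the voice name comes from user config):
// const voices = await loadVoices(window.speechSynthesis);
// const utterance = new SpeechSynthesisUtterance("hello");
// utterance.voice = pickVoice(voices, configuredVoiceName);
// window.speechSynthesis.speak(utterance);
```

Running `loadVoices` once after user configs are loaded, and only then building utterances, should avoid asking for a voice that is not in the list yet.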

Single TTS reading of a new message with multiple tabs open

(sloppier spitballing writing)

Really the only way to coordinate like this is to use a broadcast channel? It does make a few things more complex - like iirc multiple tabs can't run their own TTS while others are playing (needs research: is this OS-specific or standardized?) -- either new utterances are queued favoring the currently-speaking tab (bad, means if both pages are very active, the current page has to finish all its new messages before the other page starts its oldest new message) or they aren't (good).

Oh, and the multi-page side-by-side view: how does it interact with that? Oops.

In both cases I want to ensure that it doesn't play the notification sound in the other tabs if a page is configured to speak in the foreground and play the notification sound in the background. You know what I mean.
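One possible shape for the coordination, as a sketch only: every tab that sees a new message broadcasts a "claim", and after a short grace period each tab checks locally whether it won. The claim shape, the "foreground wins, then lowest tab ID" rule, and the channel name in the comment are all made up here. The channel is passed in, so a real `BroadcastChannel` or a stand-in both work.

```javascript
// Decide which tab speaks a message, given every claim seen for it.
// Claim shape (hypothetical): { tabId: string, foreground: boolean }
function pickSpeaker(claims) {
  if (claims.length === 0) return null;
  const ranked = [...claims].sort((a, b) => {
    // Foreground tabs win; tabId is a stable tie-breaker.
    if (a.foreground !== b.foreground) return a.foreground ? -1 : 1;
    return a.tabId < b.tabId ? -1 : a.tabId > b.tabId ? 1 : 0;
  });
  return ranked[0].tabId;
}

// `channel` is anything with postMessage() and an onmessage property,
// e.g. new BroadcastChannel("qcs-tts") (the channel name is an assumption).
function createCoordinator(channel, tabId, isForeground) {
  const claimsByMessage = new Map();

  function record(messageId, claim) {
    if (!claimsByMessage.has(messageId)) claimsByMessage.set(messageId, []);
    claimsByMessage.get(messageId).push(claim);
  }

  channel.onmessage = (event) => {
    const { type, messageId, claim } = event.data;
    if (type === "claim") record(messageId, claim);
  };

  return {
    // Call when a new chat message arrives in this tab.
    announce(messageId) {
      const claim = { tabId, foreground: isForeground() };
      record(messageId, claim); // a BroadcastChannel does not echo to its sender
      channel.postMessage({ type: "claim", messageId, claim });
    },
    // Call after a short grace period so other tabs' claims can arrive.
    isSpeaker(messageId) {
      return pickSpeaker(claimsByMessage.get(messageId) || []) === tabId;
    },
  };
}
```

A tab that loses the check would skip both the utterance and the notification sound for that message, which covers the "don't play the sound in the other tabs" case. The side-by-side view is not handled by this sketch.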

Interface to manage per-user settings and per-room settings

Writing out JavaScript objects to configure the TTS sucks actually! I'd add a button to open up a modal <dialog>, which opens a tabbed view with a new place for some less-often-changed settings (skip key, maybe a notchless volume bar?? (god i wish holding alt just ignored notches lol)) and a tab to manage user settings and another for room settings.

User settings will be like some accordion list. Global user settings are expanded by default. Each "user" entry consists of a list of aliases, including integer IDs (contentapi user IDs) and string names (bridge usernames) and the whole shebang blah blah my head hurts again. I'll look through my docs when I get to this. Don't worry.
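To make the alias idea concrete, here is a guess at how the data behind that accordion could look. Every field name (`global`, `users`, `aliases`, `overrides`) and every value is invented for illustration; the real option names would come from the existing config docs.

```javascript
// Hypothetical settings shape: global defaults plus per-user overrides,
// where each user entry lists every alias that should match it.
const userSettings = {
  global: { pitch: 1, rate: 1, muted: false },
  users: [
    // A contentapi user ID and a bridge username for the same person.
    { aliases: [1234, "exampleBridgeName"], overrides: { pitch: 1.4 } },
    { aliases: [5678], overrides: { muted: true } },
  ],
};

// Look up a user by integer ID or string name and layer their
// overrides on top of the global settings.
function resolveUserSettings(settings, idOrName) {
  const entry = settings.users.find((u) => u.aliases.includes(idOrName));
  return { ...settings.global, ...(entry ? entry.overrides : {}) };
}
```

Note that `includes` compares strictly, so the number `1234` and the string `"1234"` are different aliases; that seems useful here, since a bridge username could happen to be numeric.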

Room settings will be the same deal, but -- ah damn I just realized contentapi is a tree structure and I don't have a way to easily "apply to children". It doesn't matter. Maybe it'd be appropriate to organize this into profiles -- oh, and the TTS Notify picker will just pick from these profiles for un-overridden rooms. The "profiles" system often leads to gross UIs but I think it's appropriate here and it's a logical abstraction over what I have now. So I guess it means that room settings will be a list of profiles and each profile can be attached to a list of room IDs, much like the user settings accordion.
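The profile lookup could be as small as the sketch below. The profile names, option fields, and room IDs are placeholders, and `defaultProfile` stands in for whatever the TTS Notify picker currently has selected.

```javascript
// Hypothetical room settings: a list of profiles, each attached to room IDs.
const roomSettings = {
  defaultProfile: "quiet", // what the picker has selected for un-overridden rooms
  profiles: [
    { name: "quiet", rooms: [], options: { speak: false, notifySound: true } },
    { name: "read-aloud", rooms: [101, 202], options: { speak: true, notifySound: false } },
  ],
};

// Return the profile attached to a room, or the picker's default profile
// when the room has no override. Returns null if neither exists.
function profileForRoom(settings, roomId) {
  const attached = settings.profiles.find((p) => p.rooms.includes(roomId));
  if (attached) return attached;
  return settings.profiles.find((p) => p.name === settings.defaultProfile) || null;
}
```

This does nothing about "apply to children"; a room only matches if its own ID is listed.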

I also would like to pretty-print IDs as their respective contents' names rather than just paste in numbers.

OCR integration to read posted images

Tesseract looks good enough. It runs in-browser, so data doesn't leave the browser, which makes me feel okay with using it. Even with that, this should be optional as the OCR models are 10 MB downloads.

The default eng model isn't perfect - giving it a few of my screenshots yielded mixed results until I gave it a screenshot of just text. Maybe cancel the utterance of the recognized text if it meets any of these conditions:

doesn't contain a longer-than-5-characters alphabetic word
contains strange non-alphanumeric non-punctuation symbols
(maybe use the positional data in some way?? maybe even trim out single-character bits and try to keep what's remaining?)
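The first two checks above can be sketched as a plain text filter that runs on whatever string the OCR step returns. The function name `isWorthSpeaking` and the exact set of "normal" punctuation are assumptions that would need tuning against real screenshots; the positional-data idea is not covered.

```javascript
// At least one alphabetic word longer than 5 characters.
const LONG_WORD = /[A-Za-z]{6,}/;

// Anything outside letters, digits, whitespace, and a hand-picked set of
// common punctuation counts as a "strange" symbol (the set is a guess).
const STRANGE_SYMBOL = /[^A-Za-z0-9\s.,!?;:'"()\-\/&%$#@+=*]/;

// Return true if recognized text looks like real text worth reading aloud.
function isWorthSpeaking(text) {
  const trimmed = text.trim();
  if (trimmed.length === 0) return false;
  if (!LONG_WORD.test(trimmed)) return false; // no long alphabetic word
  if (STRANGE_SYMBOL.test(trimmed)) return false; // OCR noise symbols
  return true;
}
```

Because the allow-list is ASCII-only, accented or non-Latin text would be rejected as noise; that fits the eng model but would have to change for other languages.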
