👋 Hi, I’m @VRCWizard. This is my personal GitHub, where I work on projects related to my hobbies.
TTS Voice Wizard:
TTSVoiceWizard.com
Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) (VTuber TTS)
Home Page: https://TTSVoiceWizard.com
License: MIT License
The old version had Optimus Prime in English; now it's Portuguese? And Mexican? WTH. And now there are fewer voices. It's bad. I'll delete it and never use it again.
Hi, there! Using STT (Whisper), I found a couple of bugs. (I'm running the app in Visual Studio 2022 Debug.)
Clicking "Speech to Text to Speech" on/off several times causes an exception.
Closing the app (pressing the "" icon) with "Speech to Text to Speech" enabled does not kill the app. Dangling threads remain, preventing the app from exiting. You will see this behavior when the app is launched from Visual Studio Debug.
I think that's all for now and many thanks for the wonderful app.
Greetings! I just upgraded to version 1.5.1.7 and tried to use Text-to-Speech in System Speech mode; the program crashes after the second use. Crash dumps didn't appear in the program directory. Using Event Viewer, I found that TTSVoiceWizard crashes with a System.IO.DirectoryNotFoundException while trying to find assets not in its own directory, but in the directory of the voice used. Here are the general details straight from the Event Viewer:
Application: TTSVoiceWizard.exe
CoreCLR Version: 6.0.1823.26907
.NET Version: 6.0.18
Description: The process was terminated due to an unhandled exception.
Exception Info: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\Program Files (x86)\Speech2Go Voice Package\x64\Assets\sounds\TTSButton.wav'.
at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
at System.IO.Strategies.FileStreamHelpers.ChooseStrategy(FileStream fileStream, String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, Int64 preallocationSize)
at System.IO.File.OpenRead(String path)
at NAudio.Wave.WaveFileReader..ctor(String waveFile)
at OSCVRCWiz.Resources.Audio.AudioDevices.PlaySoundAsync(String soundName) in {Filtered}\OSCVRCWiz\Resources\Audio\AudioDevices.cs:line 483
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()
I excluded part of the path for privacy reasons.
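One plausible reading of the trace above is that the relative path Assets\sounds\TTSButton.wav was resolved against the process's current working directory, which had ended up pointing at the voice package folder. The sketch below illustrates the general remedy in Python, not the app's own C# code: build asset paths from the application's own folder. The function name `resolve_asset` and the example folder are hypothetical. In .NET, the analogous anchor would likely be `AppContext.BaseDirectory`.

```python
from pathlib import Path


def resolve_asset(app_dir: Path, relative: str) -> Path:
    """Return an absolute asset path anchored at the application's own folder.

    Because the result never depends on the current working directory,
    another component changing that directory cannot redirect the lookup.
    """
    return (app_dir / relative).resolve()


# Hypothetical usage: the install folder below is an assumption for illustration.
# resolve_asset(Path("C:/Apps/TTSVoiceWizard"), "Assets/sounds/TTSButton.wav")
```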
Extremely cool project, but I noticed a few points of concern in the git repo worth addressing.
There are a number of prebuilt blobs in the git project, and a large amount of content that could be better served as git submodules with their own sub-builds wired into the main build script.
This could lead to a much improved ability to modify the program to suit individual needs, to future contributions, and of course to support for building and running on more platforms than win32.
Is cleaning up the source repo on the roadmap?
I would like to make a suggestion here. This is about OBS.
I am currently using Closed Captioning via Google Speech Recognition 0.0.8.
This plug-in converts my voice into text in real time.
This text is entered as an OBS text source and disappears after a while.
I want to convert this real-time text into AI voice and send it to the broadcast using this text source.
Currently, in your wizard program, you have to type text directly into the program with the keyboard to convert it into voice.
Can you create an OBS plug-in that takes the text source of OBS and converts it into voice in real time?
As the title suggests, it would be awesome if there was an option to have TTSVoiceWizard listen on a custom port and allow it to be controlled by other OSC applications. Things like starting/stopping the chatbox output and KAT output separately with endpoints would be cool. Thank you for your work, VRCWizard!
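A minimal sketch of what such a control listener could look like, written in Python with only the standard library. It decodes a single OSC 1.0 message (strings, int32, float32 and the T/F boolean tags; bundles are not handled) and flips toggles in a state dictionary. The addresses, the port and the function names are all hypothetical. They are not endpoints TTSVoiceWizard exposes.

```python
import socket
import struct
from typing import List, Tuple


def _read_padded_string(data: bytes, offset: int) -> Tuple[str, int]:
    """Read a null-terminated OSC string; OSC pads strings to 4-byte multiples."""
    end = data.index(b"\x00", offset)
    text = data[offset:end].decode("utf-8")
    return text, (end + 4) & ~3  # skip the terminator and padding


def parse_osc_message(data: bytes) -> Tuple[str, List[object]]:
    """Decode one OSC message into (address, arguments)."""
    address, offset = _read_padded_string(data, 0)
    type_tags, offset = _read_padded_string(data, offset)
    if not type_tags.startswith(","):
        raise ValueError("missing OSC type tag string")
    args: List[object] = []
    for tag in type_tags[1:]:
        if tag == "i":
            args.append(struct.unpack_from(">i", data, offset)[0])
            offset += 4
        elif tag == "f":
            args.append(struct.unpack_from(">f", data, offset)[0])
            offset += 4
        elif tag == "s":
            value, offset = _read_padded_string(data, offset)
            args.append(value)
        elif tag in "TF":  # booleans carry no payload bytes
            args.append(tag == "T")
        else:
            raise ValueError(f"unsupported OSC type tag: {tag}")
    return address, args


def apply_control(state: dict, address: str, args: list) -> None:
    """Update a known toggle when the first argument is a boolean."""
    if address in state and args and isinstance(args[0], bool):
        state[address] = args[0]


def serve(port: int, state: dict) -> None:
    """Listen for OSC control messages on a user-chosen UDP port (runs forever)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", port))
        while True:
            data, _ = sock.recvfrom(4096)
            apply_control(state, *parse_osc_message(data))
```

For example, `serve(9100, {"/wizard/chatbox/enabled": True, "/wizard/kat/enabled": True})` would let another OSC application switch the two outputs independently, with both the port and the addresses being placeholders.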
Version: v0.9.4.6
OS: Win10
After the latest update, the text box does not clear itself after TTS.
I've checked that the toggle is on and tried with it toggled off as well. I also tried extracting the zip file again, with the same results.
Updated to the 1.0.3 version today; I think I was on 0.9.9 before. Anyway, I got the .NET error again, like you warned in the 1.0.2 release notes. However, it would not start after installing the package from that error message. I installed the x86 package for console apps, and after that it still would not start. After I installed the Hosting Bundle, it started normally. I've never used Fonixtalk/Moonbase, but apparently it must be checking for that dependency before it will even launch.
Hi, I've been trying the Web Captioner hook, and no matter how long or short the recognized speech is, TTS Voice Wizard ignores the first word.
On a related note, it sometimes repeats the sentences as individual words.
The webhook is also set to 4-5 word batches, so short sentences should be sent instead of individual words.
Scoop is a command-line installer for Windows that is well-suited to managing portable apps like this one. Scoop users can install the app from my scoop bucket.
You can add a note to the wiki about installing with scoop, which only requires a couple of commands:
scoop bucket add xrtools "https://github.com/babo4d/scoop-xrtools"
scoop install tts-voice-wizard
The scoop installation also creates a start menu shortcut, recommends that the user install VB-CABLE, and provides instructions on how to install the .NET desktop runtime dependency with scoop:
> scoop info tts-voice-wizard
...
...
Suggestions : extras/windowsdesktop-runtime-lts
Notes : Some features require a virtual audio cable like VB-CABLE <https://vb-audio.com/Cable/>
Requires .NET Desktop Runtime 6.0. To install with scoop:
scoop install sudo
sudo scoop install windowsdesktop-runtime-lts
I have submitted the manifest to the scoop Extras repository (ScoopInstaller/Extras#11918), which would further simplify installation and improve discoverability.
Hi, um, I'm having an issue with the TTS system that I've never experienced before. Upon pressing Enter to make the program do the TTS process, it's doubled. Let me try to explain it the best I can: instead of hearing the TTS say the text I inputted only once, I hear it twice, the second time with a slight delay after the first. It's a little weird, and I don't know why this is happening.
Clip of this happening: https://streamable.com/rsh7fc
Any help with this would be appreciated. (This is my first time ever making a post on GitHub, by the way, lol.)
So DeepL supports capturing from applications, translating directly in applications, and piping text back and forth. It also supports screen capture and OCR.
As someone who uses it to communicate with a massive Japanese team on a daily basis, I find it by far the best and most accurate translator. And it's completely free for the kind of use this application would generate.
Its paid options are for team translations, documents, and CAT tools, none of which you need.
I'm planning on using your tool to give myself Japanese closed captions in game (I'm less interested in the TTS portion).
The Azure requirement is a huge turn-off for many, and to be honest its translations aren't even all that good in comparison, especially for Japanese.
This would create a completely free option for users, built on what is considered the best tool for working with Japanese users.
Is there any way to convert the audio into MP3?
Hi, there!
I have a small request.
Every time I use the app, I have to change the preset again. For example, "Text to Speech" -> "Presets" always goes back to "Non Selected".
Is it possible to remember the selected preset?
Thanks.
When playing music through YouTube, it shows the first song you're listening to, along with the minute and second you're on if you have that displayed, and after the text updates it keeps showing the same thing indefinitely until I restart the app. Please get this fixed ASAP.
As soon as I hop on VRC, TTS dies every 5 minutes. Need help.
Just wanted to make this known just in case it isn't.
Compact mode doesn't display the current duration when "Output Current Song Periodically" is enabled.
Also, it isn't compact upon switching songs.
Additionally, please add support for the hour mark if possible (only if song length >1hr, otherwise hide the hour mark). When I added local files that are long mixes, it didn't display the hour mark.
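The hour-mark request can be sketched as a small formatting helper. This is an illustration in Python, not the app's code; the function name `format_position` and the "position / length" layout are assumptions. The hour field is shown only when the song is at least one hour long, and then on both sides so the two timestamps line up.

```python
def format_position(position_s: int, length_s: int) -> str:
    """Format 'position / length', adding an hour field only for long tracks."""
    with_hours = length_s >= 3600  # show hours only for songs of one hour or more

    def fmt(total: int) -> str:
        hours, rem = divmod(total, 3600)
        minutes, seconds = divmod(rem, 60)
        if with_hours:
            return f"{hours}:{minutes:02d}:{seconds:02d}"
        return f"{minutes}:{seconds:02d}"

    return f"{fmt(position_s)} / {fmt(length_s)}"


print(format_position(75, 215))   # 1:15 / 3:35
print(format_position(75, 4000))  # 0:01:15 / 1:06:40
```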
I would like all changed fields on the Speech to TTS menu to be saved when the app closes. Thank you <3
I am getting [OBSText File Error: Could not find a part of the path 'C:\Users\User\Desktop\TTSVoiceWizard-v1.5.8.1-x64\Output\TextOut\OBSText.txt'.. Try moving folder location.]. It was in Downloads, but no matter where I move TTS Voice Wizard, it still does it. I also ran it as administrator. It even did this before I updated. I can click "open file location" and that works; it just won't update or overwrite the txt file.
I tried several other TTS voices in this program, but some are not working. The most important one to me is "Acapela Elan TTS Digalo Nikolai"; even though it's SAPI5, it's just not on the list.
Hi bro, can you add more APIs, like OpenAI for translation and TTS at the same time, or Edge TTS? Just a thought; I hope you see it.
I like your work a lot, by the way! I'm a Chinese user, and applying and paying for the API is not friendly for us, so maybe you can add more APIs to choose from.
Respect~
It would be really nice if there were toggles or some way to control which parts of the program are influenced by websocket commands. It could be handled similarly to the OSC endpoint variables or just with toggles in the UI itself. For example, there could be a way to control whether the speech output, chatbox, etc. are influenced by websocket commands or not. Also, having another OSC endpoint for TTS output would be a nice addition alongside the current chatbox and KAT endpoints. Thanks again, Wizard, for catering to my weird needs; I still really appreciate you adding the OSC to TTS function. It's been really helpful for integrating my own niche chatbox projects alongside this, and I hope others will take advantage of it too <3
Hi! Thanks for sharing.
Could you support downloading the voice as an MP3 file when using TTS?
While the TTS is great to use, there is no way to control the volume in TTS Voice Wizard itself. The GLADOS voice comes out at a very high volume. Having a simple volume slider for the TTS would solve this issue.
Hi.
Would you add the option so that we can add our own .pth RVC voice file to it to use as basis for our STTTS?
Thank you.
There appears to be a bug where Spotify periodically outputs when the setting to do so is disabled, except without the current time.
This happens when "Send Text to VRChat with KAT" is disabled.
I found a way to pull the Quest 2 battery life, but it requires ADB (Android Debug Bridge) to detect it.
To get the battery life, you have to find a way to connect to the Quest 2, wired or wirelessly, using ADB on port 5555.
The Oculus app mainly uses ADB to pull the battery life of the Quest 2 and its controllers, all directly from the headset, and ADB reports it.
So there might be a way to get Quest and Quest 2 supported. I already forked the current project today, and I'll see if I can find a way to integrate it and push changes with Quest 1/2 integration, unless someone else gets to it before me.
Hi, STT (Whisper) is the biggest use-case for me. I think it's probably the most important feature for now until I can use it reliably.
Hopefully, it's the same for everyone as it's the starting point for using TTSVoiceWizard.
Anyway, here is what I found using the latest v.1.5.0 from the GitHub main branch.
In the Log View, I see the new "Whisper Debug: ..." output. When STT mode is on, it always shows one of the following, seemingly at random. I think it's clear what each one means.
(A) "Listening" (listening and there is no sound input)
(B) "Listening, Voice" (listening and sound input is detected)
(C) "Listening, Transcribing" (processing recorded voice)
But the problem is that they do not accurately represent what's really happening, and the behaviors are a bit random.
Here are my observations. (I always launch it from VS Debug, but I think the behaviors are the same from the .exe.)
Here is another observation/question.
I see the following logs in VS Console Output.
It seems to recreate the same threads infinitely.
Can you please tell me what these threads are for?
Perhaps the instability is related to these threads constantly being recreated?
Many thanks.
I have tested almost all Whisper models of all sizes, and similar things happen.
When the recognition input language is set to Chinese and there is no voice input, the Whisper model continually outputs spam. If it is set to English, there is no similar issue.
I have not seen the same issue with Japanese, and I have not tried other languages.
Then I tried switching the model to Vosk, and there was no similar issue.
When I set the input to a microphone without any signal input, as can be seen in the screenshot, the Whisper models output garbage.
This issue occurs more often when the mic is set to the one I use every day, at times when I'm not saying anything.
I don't know what causes this problem. I tried to use text replacement to delete these spam messages, but the model often randomly combines some common words and randomly adds spaces, which makes that approach basically unusable.
I checked the Whisper model's GitHub, as well as some technical write-ups that use the Whisper model directly.
But there seems to be no such problem.
It seems that Whisper Model can be set to recognize multiple languages at the same time.
Is it possible to manually add startup commands, or provide an option to select multiple languages?
I wonder if this problem will be eliminated when multiple languages are recognized at the same time.
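A common general-purpose mitigation for speech models that emit text on silent input is to skip transcription when a chunk of audio carries too little energy. The sketch below shows the idea in Python for 16-bit little-endian mono PCM. It is not how TTS Voice Wizard or Whisper handles audio. The function names and the threshold of 500 are assumptions that would need tuning per microphone.

```python
import math
import struct


def rms_int16(pcm: bytes) -> float:
    """Root-mean-square level of 16-bit little-endian PCM samples."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def should_transcribe(pcm: bytes, threshold: float = 500.0) -> bool:
    """Return True only when the chunk is loud enough to be worth transcribing.

    The default threshold is an arbitrary placeholder, not a measured value.
    """
    return rms_int16(pcm) >= threshold
```

A capture loop would call `should_transcribe(chunk)` before handing the chunk to the recognizer and drop the chunk when it returns False, so silence never reaches the model.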
Hi, I have a couple of feature requests that could make working with STT easier.
I like to change my chatting language quickly to talk to people who speak different languages.
Right now, in order to change the language, I have to restart STT (click on "Speech to Text to Speech").
This has a couple of problems. Clicking "Speech to Text to Speech" quickly can cause crashes, and restarting STT can take a bit of time to load ggml, and it's a bit unstable when it starts.
Therefore, it would be nice to change the language without restarting STT. Ggml should be loaded only once, or only when it's changed, I think.
I would also like to temporarily "pause" capturing voice without stopping STT. Restarting STT can be problematic, as I mentioned above. Is it possible to add a toggle button right next to "Speech to Text to Speech" to temporarily stop capturing voice? This way, I can keep STT from going crazy, and STT can have time to clear the buffer when not capturing.
Please let me know if you have a better idea.
Cheers!
Hello, I saw that the glados-tts project that was forked to work with TTS Voice Wizard put out a new version of its TTS model about 8 months ago. Would it be possible to make this new model compatible with TTS Voice Wizard like the old model? Thanks!