adrianlyjak / obsidian-aloud-tts
License: MIT License
I find 1.0 speed too slow, and 1.25 too fast. Wondering if we could have the option to set the speed manually or have a 1.15 option? Thanks.
The audio visualization seems like it's frequently getting truncated, and it overemphasizes low audio ranges.
Adjust it and make it nicer.
It would be pretty cool to support Parler-TTS, an open source TTS with fairly good quality and voice customization.
If running the model locally, the inference code is currently just Python. Perhaps there could be a bridge, and this would be a Node.js-only feature.
I don't know how much work it would be to make the model work with javascript+ONNX, but that would also be pretty cool, and likely useful to others to do better on device TTS.
When running the player on a mobile device, such as an iPhone, I want the text player to continue to play while the device is locked, and show an OS-level play/pause/track switcher.
When I exit the note, re-enter it, and hit play, the audio sometimes gets regenerated from scratch. It also happens more generally when I change my typing cursor position inside the note and hit play. I'm assuming this has something to do with my cursor position: the plugin seems unable to consistently remember which already-generated audio file pairs with which sentence, so it falls back to generating new audio for text it has already rendered.
It also seems that the plugin generates new audio directly from OpenAI if I increase or decrease the speed. Isn't there a way to just make the audio faster without having to generate new audio? Like how you would download a video from YouTube and play it at faster speeds just fine in VLC media player. This would also probably allow the plugin to play the audio at 3x, 4x, or 5x, which would be really nice for digesting information from a note quickly without having to read slowly.
Honestly, all this comes down to is my wallet crying in agony :). It would also be nice if, in the future, we could use local models for notes. Or, at least, have audio be generated once and then embedded into the note once and for all.
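One way to avoid the regeneration described above is to key cached audio on the sentence content rather than on cursor position or offsets. The following is only a minimal sketch under assumptions: `SpeechRequest`, `fnv1a`, and `cacheKey` are hypothetical names, not the plugin's actual code, and the hash is a simple non-cryptographic one chosen to keep the example self-contained.

```typescript
// Hypothetical request shape; the plugin's real types may differ.
interface SpeechRequest {
  text: string;
  voice: string;
  model: string;
}

// FNV-1a style 32-bit hash over UTF-16 code units (non-cryptographic).
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  // Format as an unsigned, zero-padded 8-digit hex string.
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// The key depends only on model, voice, and whitespace-normalized text,
// so moving the cursor or reopening the note yields the same key.
function cacheKey(req: SpeechRequest): string {
  const normalized = req.text.trim().replace(/\s+/g, " ");
  return fnv1a(JSON.stringify([req.model, req.voice, normalized]));
}
```

Speed is deliberately left out of the key in this sketch, on the assumption that speed is applied at playback time rather than at generation time.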
Eleven Labs has a lot of high quality voices and voice configuration. It would be nice to support them as well. Initially this could support just their canned voices, but it could also eventually support custom voices.
Azure TTS is the best TTS service I've ever used. You can try this service on this website. Additionally, it supports many different programming languages.
OpenAI's TTS is trained on multiple languages, but in Chinese, Japanese, and other languages it still falls short of specialized single-language models such as Azure TTS's single-language voices.
May I ask if aloud can provide support for Azure TTS API? Thanks.
Obsidian notes frequently contain links. It's not very helpful to read the links aloud.
Strip links and other non-useful markdown characters before converting text to speech.
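A rough sketch of what such stripping could look like, assuming a regex-based pass is acceptable. `stripLinksForSpeech` is a hypothetical helper, not part of the plugin, and it handles only common cases (inline links, images, wikilinks, bare URLs, heading and emphasis markers).

```typescript
// Hypothetical helper: removes link syntax but keeps the readable label.
function stripLinksForSpeech(markdown: string): string {
  return markdown
    // Images: ![alt](url) -> alt
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")
    // Inline links: [label](url) -> label
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")
    // Wikilinks with alias: [[target|alias]] -> alias
    .replace(/\[\[[^\]|]+\|([^\]]+)\]\]/g, "$1")
    // Plain wikilinks: [[target]] -> target
    .replace(/\[\[([^\]]+)\]\]/g, "$1")
    // Bare URLs are dropped entirely
    .replace(/https?:\/\/\S+/g, "")
    // Heading markers at the start of a line
    .replace(/^#{1,6}\s+/gm, "")
    // Emphasis and inline-code markers
    .replace(/[*`]+/g, "")
    // Collapse runs of spaces left behind by removals
    .replace(/[ \t]{2,}/g, " ")
    .trim();
}
```

A real Markdown parser would be more robust than regexes for nested or escaped syntax; this only illustrates the idea.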
The player splits text sentence-by-sentence, and renders those to speech. It does this so that content is streamable and navigable.
However, the speech synthesis isn't as good.
Would it be possible to stream the audio chunks from OpenAI, and somehow rethink the navigation chunks within the paragraph?
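One possible middle ground, sketched here under assumptions, is to keep sentence boundaries for navigation but send several sentences per request so the model sees more context. `splitIntoSentences` and `chunkSentences` are hypothetical helpers; the plugin's actual splitting logic is not shown in this issue.

```typescript
// Naive sentence splitter: breaks after ., !, or ? (plus closing quotes/brackets).
// Abbreviations such as "e.g." will be split incorrectly.
function splitIntoSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]+["')\]]*\s*|[^.!?]+$/g);
  return (matches ?? []).map((s) => s.trim()).filter((s) => s.length > 0);
}

// Greedily packs consecutive sentences into chunks of at most maxChars.
// A single sentence longer than maxChars still becomes its own chunk.
function chunkSentences(sentences: string[], maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? current + " " + sentence : sentence;
    if (current && candidate.length > maxChars) {
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Navigation within a chunk would still need per-sentence timing information, which this sketch does not address.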
Hi,
I am uncertain if this is an unusual request. However, I would greatly appreciate the functionality to permanently mark a portion of text, which would subsequently generate a small audio icon alongside it, enabling playback.
Whether this could be done in a callout or as part of an embedded code snippet would be up to you (There may be better ways apart from what I illustrate below)
e.g.
> [!tts|1.2 voice]
> Text to play back
or
```tts
playbackspeed: 1.2
voice: xxx
Text to play back
```
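To illustrate how the proposed `tts` block could be read, here is a minimal sketch of a parser for the format shown above. `TtsBlock` and `parseTtsBlock` are hypothetical names, and the rule that leading `playbackspeed:` and `voice:` lines are treated as options while everything after them is text is an assumption, not something the plugin defines.

```typescript
// Hypothetical parsed form of the proposed tts block.
interface TtsBlock {
  playbackSpeed?: number;
  voice?: string;
  text: string;
}

function parseTtsBlock(source: string): TtsBlock {
  const lines = source.split(/\r?\n/);
  const block: TtsBlock = { text: "" };
  let i = 0;
  // Consume leading "key: value" option lines; stop at the first other line.
  for (; i < lines.length; i++) {
    const match = /^(playbackspeed|voice):\s*(.+)$/.exec(lines[i].trim());
    if (!match) break;
    if (match[1] === "voice") {
      block.voice = match[2];
    } else {
      const speed = Number(match[2]);
      if (!Number.isNaN(speed)) block.playbackSpeed = speed;
    }
  }
  // Everything after the options is the text to speak.
  block.text = lines.slice(i).join("\n").trim();
  return block;
}
```

In an Obsidian plugin, a parser like this could plausibly be called from a handler registered with `registerMarkdownCodeBlockProcessor("tts", ...)`, which receives the block source as a string.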
Thanks
Pause and Resume functionality is very commonly used, but setting Pause and Resume to two different hotkeys is not very reasonable, as it creates a disjointed experience by requiring different keys each time. Most media players bind Pause and Resume to the same hotkey. I hope you can modify this feature.
Thank you very much.
This is a great Obsidian plugin, but the highlight colors for the text being read and the text not yet read are too similar. Could you add an extra configuration option in the settings or an additional configuration JSON file to set the highlight RGBA values?
Thank you very much.
The API costs money, so I'd like to see cost and usage over time. Might be nice to even set a spending cap
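A sketch of how usage tracking and a cap might work, assuming the provider bills per character. `UsageEntry`, `estimateCostUsd`, and `wouldExceedCap` are hypothetical names, and the price is passed in as a parameter because actual rates vary by provider and model.

```typescript
// Hypothetical record of one synthesis request.
interface UsageEntry {
  timestamp: number; // milliseconds since epoch
  characters: number; // characters sent for synthesis
}

// Estimated cost for a number of characters at a given per-million-character price.
function estimateCostUsd(characters: number, usdPerMillionChars: number): number {
  return (characters / 1_000_000) * usdPerMillionChars;
}

// True if sending nextCharacters more would push total estimated spend past capUsd.
function wouldExceedCap(
  history: UsageEntry[],
  nextCharacters: number,
  usdPerMillionChars: number,
  capUsd: number
): boolean {
  const used = history.reduce((sum, entry) => sum + entry.characters, 0);
  return estimateCostUsd(used + nextCharacters, usdPerMillionChars) > capUsd;
}
```

Cached playback would not add entries to the history, so the estimate only reflects text actually sent to the API.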
Right now audio files are cached for a short duration to make replaying audio faster. The plugin expires audio files after 8 hours. The audio is stored in a .tts directory in the vault. This has the downside of syncing the audio caches. I previously had the caches lasting for much longer, but it was slowing down my iOS syncs significantly.
Is there perhaps instead a good way to store the caches outside the vault so that they don't sync?
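Wherever the cache ends up living, the expiry rule described above can be expressed independently of the storage location. This is a minimal sketch with hypothetical names (`CacheEntry`, `expiredEntries`); it does not reflect how the plugin actually tracks its cache files.

```typescript
// Hypothetical metadata for one cached audio file.
interface CacheEntry {
  path: string;
  createdAtMs: number; // milliseconds since epoch
}

// Returns the entries older than maxAgeHours relative to nowMs.
function expiredEntries(
  entries: CacheEntry[],
  nowMs: number,
  maxAgeHours: number
): CacheEntry[] {
  const maxAgeMs = maxAgeHours * 60 * 60 * 1000;
  return entries.filter((entry) => nowMs - entry.createdAtMs > maxAgeMs);
}
```

Keeping the rule separate from the storage backend would make it easier to move the cache outside the vault later without changing the expiry behavior.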
When playing text, the player is only supposed to play in the selected pane that the text is from, but it's showing in all of the panes.
I want to convert text to audio and save it in markdown file for later. Is it possible?
Hello Adrian, the plugin's cache duration of 168 hours is ideal to avoid additional charges for new inferences. I was unable to locate the hidden folder on my Windows system, although I'm familiar with the process. The primary purpose is for backup. Is there a method to determine which portion of the text has been converted to audio? Any suggestions would be appreciated. Thanks for the plugin.
I use three kinds of markup in my documents:
[ @authortitleyear, page ]
[^1]
{>> notes to self <<}
I would like to have Aloud skip text wrapped in these kinds of markup when reading my document, as it would make reading much more fluid without all my notes and citations. Thanks.
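The three kinds of markup listed above could be filtered with a few regular expressions before the text is sent for synthesis. This is only a sketch: `stripAnnotations` is a hypothetical helper, and the patterns assume bracketed `@` citations, `[^n]` footnote references, and `{>> ... <<}` comments exactly as written in this request.

```typescript
// Hypothetical helper: removes citations, footnote references, and inline comments.
function stripAnnotations(text: string): string {
  return text
    // Bracketed citations such as [ @authortitleyear, page ]
    .replace(/\[\s*@[^\]]*\]/g, "")
    // Footnote references such as [^1]
    .replace(/\[\^[^\]]+\]/g, "")
    // Comments such as {>> notes to self <<}, possibly spanning lines
    .replace(/\{>>[\s\S]*?<<\}/g, "")
    // Collapse runs of spaces left behind by removals
    .replace(/ {2,}/g, " ")
    .trim();
}
```

Footnote definition lines (for example `[^1]: ...`) are not handled here and would need a separate rule.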
Hey there,
I use Dataview to generate a document of randomly ordered "tidbits" from a folder. But when I highlight the Dataview text, Aloud says "No text selected to speak". If it's possible to fix (or if there's a workaround?), that'd be amazing.
Thanks!
This is something you are probably already planning, but can you add a setting to control the speed?
There is a speed option in the OpenAI text-to-speech API:
speed: The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.
Thanks for the plugin, by the way. It was a missing piece in my workflow.
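If a free-form speed setting were added, the value would need to stay inside the range quoted above (0.25 to 4.0, default 1.0). A minimal sketch, where `clampSpeed` is a hypothetical helper rather than anything in the plugin:

```typescript
// Clamps a user-entered speed to the documented range; falls back to 1.0
// when the input is not a finite number (for example, an empty settings field).
function clampSpeed(requested: number, min = 0.25, max = 4.0): number {
  if (!Number.isFinite(requested)) return 1.0;
  return Math.min(max, Math.max(min, requested));
}
```

With this, an in-between value such as 1.15 passes through unchanged, while out-of-range values are pulled back to the nearest limit.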
Hi, thanks for a great plugin! It works great on Mac, but on iOS, there's neither sound, nor controls when I press "Aloud: text to speech".
Since I sync with Working Copy, I can confirm that the sound files are being generated in the .tts directory and can be played directly from there (the audio content matches the text in the note).
iOS Version: 17.2.1
Model: iPhone 11 Pro Max
Right now you have to select text and then run the command. I'd like to have a command that will just directly read the entirety of the current document, without having to first "select all." Thanks.
See #28 for original context
OpenAI has a speed option in their API that the plugin is using. This was just the most straightforward way to implement the feature. I haven't been impressed with the quality of the non-1x audio though. It seems like it's lower quality. There's a playbackRate parameter in Web Audio that this could use (example); however, it ends up changing the pitch of the audio as well, so there would be more involved here than just tuning that parameter. Perhaps the detune parameter could counter-balance the pitch change.
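A note on the detune idea: as far as I understand the Web Audio specification, an AudioBufferSourceNode combines `playbackRate` and `detune` into a single computed rate, `playbackRate * 2^(detune / 1200)`. If that reading is right, a detune that cancels the pitch change also cancels the speed change. The arithmetic can be checked without any audio APIs; the function names below are hypothetical.

```typescript
// Computed playback rate of an AudioBufferSourceNode, using the formula
// playbackRate * 2^(detune / 1200). Plain arithmetic, no audio APIs involved.
function computedPlaybackRate(playbackRate: number, detuneCents: number): number {
  return playbackRate * Math.pow(2, detuneCents / 1200);
}

// Detune (in cents) that would cancel the pitch shift of a given playbackRate.
function detuneToCancel(playbackRate: number): number {
  return -1200 * Math.log2(playbackRate);
}

const rate = 1.5;
const detune = detuneToCancel(rate);
// The combined rate comes back to about 1, so the speed-up is cancelled too.
const combined = computedPlaybackRate(rate, detune);
```

An alternative worth checking is playing the cached file through an HTML media element, which exposes `playbackRate` together with a `preservesPitch` property; support in Obsidian's desktop and mobile webviews would need to be verified.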
Hi, the plugin doesn't work on Mac, with this error in the console:
app.js:1 Uncaught (in promise) Error: Request failed, status 429
at new t (app.js:1:1977769)
at qG (app.js:1:1977961)
at app.js:1:1978638
at app.js:1:237056
at Object.next (app.js:1:237161)
at a (app.js:1:235879)
I have the latest version (0.2.0) and Obsidian version 1.5.12.
Let me know if you need any additional info.
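For context, HTTP status 429 means "Too Many Requests": the API is rate limiting the caller, or, with some providers, the account has run out of quota. Whether the plugin retries such failures is not stated here. A common way to handle transient rate limits is exponential backoff, sketched below with hypothetical helper names and arbitrary default values.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Retry only on rate limiting (429) or server errors (5xx), up to maxAttempts.
function shouldRetry(status: number, attempt: number, maxAttempts = 5): boolean {
  return (status === 429 || status >= 500) && attempt < maxAttempts;
}
```

If the 429 is caused by exhausted quota rather than request rate, retrying will not help, and the account's billing or usage limits would be the thing to check.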
I need a quick guide to set it up with opendai.speech.
I have been trying for 10 hours. opendai.speech is working, but I couldn't link it to the Aloud plugin yet.
I tried pasting the URL and it gives a "parlor tts not found" error, while I have other TTS (piper) set up.
On mobile, it's harder to access and configure commands. It would be nice to have the player always open on mobile.